---
title: "FilterRetriever"
id: filterretriever
slug: "/filterretriever"
description: "Use this Retriever with any Document Store to get the Documents that match specific filters."
---

# FilterRetriever

Use this Retriever with any Document Store to get the Documents that match specific filters.

<div className="key-value-table">

|  |  |
| --- | --- |
| **Most common position in a pipeline** | At the beginning of a Pipeline |
| **Mandatory init variables** | `document_store`: An instance of a Document Store |
| **Mandatory run variables** | `filters`: A dictionary of filters in the same syntax supported by the Document Stores |
| **Output variables** | `documents`: All the documents that match these filters |
| **API reference** | [Retrievers](/reference/retrievers-api) |
| **GitHub link** | https://github.com/deepset-ai/haystack/blob/main/haystack/components/retrievers/filter_retriever.py |

</div>

## Overview

`FilterRetriever` retrieves Documents that match the provided filters.

It’s a special kind of Retriever: it works with any Document Store instead of being specialized for a single one.

However, like every other Retriever, it needs a Document Store instance at initialization time, and it only filters the contents of that instance.

Therefore, you can use it like any other Retriever in a Pipeline.

Be careful when using `FilterRetriever` on a Document Store that contains many Documents, because `FilterRetriever` returns all Documents that match the filters. Calling `run` with empty filters can return every Document in the store and easily overwhelm other components in the Pipeline (for example, Generators):

```python
filter_retriever.run({})
```
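One way to avoid an accidental unfiltered call is to validate the filters before they reach the Retriever. The following is a minimal sketch in plain Python. `require_filters` is a hypothetical helper, not part of Haystack, and it only assumes that filters are passed as a dictionary.

```python
from typing import Any, Dict, Optional


def require_filters(filters: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Return the filters unchanged, or raise if they are missing or empty.

    Hypothetical helper (not a Haystack API): it only guards against
    passing `None` or `{}` to a Retriever by mistake.
    """
    if not filters:
        raise ValueError("Refusing to run FilterRetriever without filters.")
    return filters


# A non-empty filter dictionary passes through untouched
checked = require_filters({"field": "lang", "operator": "==", "value": "en"})
```

You could then call `filter_retriever.run(filters=require_filters(my_filters))`, where `my_filters` is whatever dictionary your application builds, so that an empty dictionary fails fast instead of returning the whole store.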

Note also that `FilterRetriever` does not score or rank your Documents in any way. If you need to rank the Documents by similarity to a query, consider using Ranker components.
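If a deterministic order and an upper bound on the number of Documents are enough (rather than similarity ranking), you can post-process the Retriever output yourself. This is a minimal sketch, assuming each returned Document exposes a `meta` dictionary with a `year` field. `newest_first` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for Documents so that the snippet runs without a Document Store.

```python
from types import SimpleNamespace
from typing import Any, List


def newest_first(documents: List[Any], top_k: int) -> List[Any]:
    """Sort documents by their `year` metadata, newest first, and keep `top_k`.

    Hypothetical helper (not a Haystack API). Documents without a `year`
    are sorted last.
    """
    ordered = sorted(
        documents,
        key=lambda doc: doc.meta.get("year", float("-inf")),
        reverse=True,
    )
    return ordered[:top_k]


# Stand-ins for the Documents found in `result["documents"]`
docs = [
    SimpleNamespace(content="Mark lives in Berlin.", meta={"year": 2018}),
    SimpleNamespace(content="Mark lives in New York.", meta={"year": 2023}),
    SimpleNamespace(content="Mark lives in Paris.", meta={"year": 2021}),
]
top_docs = newest_first(docs, top_k=2)
```

Capping the list this way also limits how much text reaches downstream components such as Generators.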

## Usage

### On its own

```python
from haystack import Document
from haystack.components.retrievers import FilterRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore

docs = [
    Document(content="Python is a popular programming language", meta={"lang": "en"}),
    Document(
        content="python ist eine beliebte Programmiersprache",
        meta={"lang": "de"},
    ),
]

# Write the Documents to an in-memory Document Store
doc_store = InMemoryDocumentStore()
doc_store.write_documents(docs)

# Retrieve only the Documents whose `lang` metadata field equals "en"
retriever = FilterRetriever(doc_store)
result = retriever.run(filters={"field": "lang", "operator": "==", "value": "en"})

assert "documents" in result
assert len(result["documents"]) == 1
assert result["documents"][0].content == "Python is a popular programming language"
```
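Filters can also combine several conditions. The sketch below builds such a dictionary in plain Python, assuming the nested form from Haystack's metadata filtering syntax: a logical operator (such as `AND` or `OR`) with a list of `conditions`. The helpers `eq` and `all_of` are hypothetical conveniences, not Haystack functions, and the `meta.` prefix is the form that syntax uses to address metadata fields.

```python
from typing import Any, Dict


def eq(field: str, value: Any) -> Dict[str, Any]:
    """Build a single equality condition (hypothetical helper)."""
    return {"field": field, "operator": "==", "value": value}


def all_of(*conditions: Dict[str, Any]) -> Dict[str, Any]:
    """Combine conditions so that a Document must match every one of them."""
    return {"operator": "AND", "conditions": list(conditions)}


# Match English Documents from 2021
filters = all_of(eq("meta.lang", "en"), eq("meta.year", 2021))
```

The resulting dictionary can be passed to the Retriever as `retriever.run(filters=filters)`.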

### In a RAG pipeline

Set your `OPENAI_API_KEY` as an environment variable and then run the following code:

```python
from haystack import Document, Pipeline
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.components.retrievers.filter_retriever import FilterRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore

document_store = InMemoryDocumentStore()
documents = [
    Document(content="Mark lives in Berlin.", meta={"year": 2018}),
    Document(content="Mark lives in Paris.", meta={"year": 2021}),
    Document(content="Mark is Danish.", meta={"year": 2021}),
    Document(content="Mark lives in New York.", meta={"year": 2023}),
]
document_store.write_documents(documents=documents)

# Create a RAG query pipeline
prompt_template = """
    Given these documents, answer the question.\nDocuments:
    {% for doc in documents %}
        {{ doc.content }}
    {% endfor %}

    \nQuestion: {{question}}
    \nAnswer:
    """

rag_pipeline = Pipeline()
rag_pipeline.add_component(name="retriever", instance=FilterRetriever(document_store=document_store))
rag_pipeline.add_component(instance=PromptBuilder(template=prompt_template), name="prompt_builder")
# OpenAIGenerator reads the API key from the OPENAI_API_KEY environment variable
rag_pipeline.add_component(instance=OpenAIGenerator(), name="llm")
rag_pipeline.connect("retriever", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder", "llm")

result = rag_pipeline.run(
    {
        "retriever": {"filters": {"field": "year", "operator": "==", "value": 2021}},
        "prompt_builder": {"question": "Where does Mark live?"},
    }
)
# The Generator returns its output under `replies`
print(result["llm"]["replies"][0])
```

Here’s an example output you might get:

```
According to the provided documents, Mark lives in Paris.
```