---
title: "FilterRetriever"
id: filterretriever
slug: "/filterretriever"
description: "Use this Retriever with any Document Store to get the Documents that match specific filters."
---

# FilterRetriever

Use this Retriever with any Document Store to get the Documents that match specific filters.

<div className="key-value-table">

|  |  |
| --- | --- |
| **Most common position in a pipeline** | At the beginning of a Pipeline |
| **Mandatory init variables** | `document_store`: An instance of a Document Store |
| **Mandatory run variables** | `filters`: A dictionary of filters in the same syntax supported by the Document Stores |
| **Output variables** | `documents`: All the Documents that match these filters |
| **API reference** | [Retrievers](/reference/retrievers-api) |
| **GitHub link** | https://github.com/deepset-ai/haystack/blob/main/haystack/components/retrievers/filter_retriever.py |

</div>

## Overview

`FilterRetriever` retrieves Documents that match the provided filters.

It’s a special kind of Retriever: it works with any Document Store instead of being specialized for a single one.

However, like every other Retriever, it needs a Document Store at initialization time, and it filters the contents of that instance only.

As a result, you can use it in a Pipeline like any other Retriever.

Be careful when using `FilterRetriever` on a Document Store that contains many Documents, because it returns all Documents that match the filters. Calling `run` with no filters returns every Document in the store, which can easily overwhelm other components in the Pipeline (for example, Generators):

```python
filter_retriever.run({})
```

Note also that `FilterRetriever` does not score or rank your Documents in any way. If you need to rank the Documents by similarity to a query, consider using Ranker components.
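
Because `FilterRetriever` returns every matching Document without ranking them, one option is to order and trim the results yourself before they reach a prompt. The snippet below is a minimal sketch, not part of Haystack: `SimpleDoc` is a hypothetical stand-in for `Document` (so the example runs without Haystack installed), and `newest_first`, `meta_key`, and `limit` are illustrative names. It assumes each Document carries a `meta` dictionary with a sortable value under the chosen key.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SimpleDoc:
    """Hypothetical stand-in for a Haystack Document (content and meta only)."""

    content: str
    meta: dict[str, Any] = field(default_factory=dict)


def newest_first(documents, meta_key="year", limit=2):
    """Sort documents by a metadata value (descending) and keep at most `limit`."""
    # Documents that lack the key are dropped rather than given a guessed position.
    with_key = [doc for doc in documents if meta_key in doc.meta]
    ordered = sorted(with_key, key=lambda doc: doc.meta[meta_key], reverse=True)
    return ordered[:limit]


# Pretend this list is the `documents` output of a FilterRetriever run.
retrieved = [
    SimpleDoc("Mark lives in Berlin.", {"year": 2018}),
    SimpleDoc("Mark lives in New York.", {"year": 2023}),
    SimpleDoc("Mark lives in Paris.", {"year": 2021}),
]

top = newest_first(retrieved, meta_key="year", limit=2)
print([doc.content for doc in top])
# ['Mark lives in New York.', 'Mark lives in Paris.']
```

The helper only reads each Document's `meta` attribute, so it should also work on the list returned under `documents` by a real `FilterRetriever`. For relevance to a query rather than a metadata ordering, a Ranker component remains the better fit.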

## Usage

### On its own

```python
from haystack import Document
from haystack.components.retrievers import FilterRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore

# Two Documents that differ only in their "lang" metadata
docs = [
    Document(content="Python is a popular programming language", meta={"lang": "en"}),
    Document(
        content="python ist eine beliebte Programmiersprache",
        meta={"lang": "de"},
    ),
]

doc_store = InMemoryDocumentStore()
doc_store.write_documents(docs)
retriever = FilterRetriever(doc_store)

# Retrieve only the Documents whose "lang" is "en"
result = retriever.run(filters={"field": "lang", "operator": "==", "value": "en"})

assert "documents" in result
assert len(result["documents"]) == 1
assert result["documents"][0].content == "Python is a popular programming language"
```

### In a RAG pipeline

Set your `OPENAI_API_KEY` as an environment variable and then run the following code:

```python
from haystack import Document, Pipeline
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.components.retrievers.filter_retriever import FilterRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.utils import Secret

document_store = InMemoryDocumentStore()
documents = [
    Document(content="Mark lives in Berlin.", meta={"year": 2018}),
    Document(content="Mark lives in Paris.", meta={"year": 2021}),
    Document(content="Mark is Danish.", meta={"year": 2021}),
    Document(content="Mark lives in New York.", meta={"year": 2023}),
]
document_store.write_documents(documents=documents)

# Create a RAG query pipeline
prompt_template = """
Given these documents, answer the question.\nDocuments:
{% for doc in documents %}
    {{ doc.content }}
{% endfor %}

\nQuestion: {{question}}
\nAnswer:
"""

rag_pipeline = Pipeline()
rag_pipeline.add_component(name="retriever", instance=FilterRetriever(document_store=document_store))
rag_pipeline.add_component(instance=PromptBuilder(template=prompt_template), name="prompt_builder")
# The API key is read from the OPENAI_API_KEY environment variable
rag_pipeline.add_component(
    instance=OpenAIGenerator(api_key=Secret.from_env_var("OPENAI_API_KEY")), name="llm"
)
rag_pipeline.connect("retriever", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder", "llm")

result = rag_pipeline.run(
    {
        "retriever": {"filters": {"field": "year", "operator": "==", "value": 2021}},
        "prompt_builder": {"question": "Where does Mark live?"},
    }
)
# The pipeline ends at the Generator, so the answer is in its "replies" output
print(result["llm"]["replies"][0])
```

Here’s an example output you might get:

```
According to the provided documents, Mark lives in Paris.
```
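
In a pipeline like this one, an empty filter dictionary would send every stored Document to the prompt. One way to guard against that is to validate the filters before calling `run`. The following is a minimal sketch under that assumption: `require_filters` is a hypothetical helper, not a Haystack function, and it only checks that the filters are non-empty, not that they are well-formed.

```python
from typing import Any, Optional


def require_filters(filters: Optional[dict[str, Any]]) -> dict[str, Any]:
    """Return `filters` unchanged, or raise if it is None or empty."""
    if not filters:
        raise ValueError("Refusing to retrieve without filters: this would return every Document.")
    return filters


# A non-empty filter passes through untouched.
checked = require_filters({"field": "year", "operator": "==", "value": 2021})
print(checked)

# An empty filter is rejected before it can reach the retriever.
try:
    require_filters({})
except ValueError as err:
    print(f"Blocked: {err}")
```

You could then pass the validated dictionary as the `filters` input of the `retriever` component, for example `{"retriever": {"filters": require_filters(user_filters)}}`, where `user_filters` is whatever your application collects.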