---
title: "AzureAISearchBM25Retriever"
id: azureaisearchbm25retriever
slug: "/azureaisearchbm25retriever"
description: "A keyword-based Retriever that fetches Documents matching a query from the Azure AI Search Document Store."
---

# AzureAISearchBM25Retriever

A keyword-based Retriever that fetches Documents matching a query from the Azure AI Search Document Store.

<div className="key-value-table">

| | |
| --- | --- |
| **Most common position in a pipeline** | 1. Before a [`PromptBuilder`](../builders/promptbuilder.mdx) in a RAG pipeline 2. The last component in the semantic search pipeline 3. Before an [`ExtractiveReader`](../readers/extractivereader.mdx) in an extractive QA pipeline |
| **Mandatory init variables** | `document_store`: An instance of [`AzureAISearchDocumentStore`](../../document-stores/azureaisearchdocumentstore.mdx) |
| **Mandatory run variables** | `query`: A string |
| **Output variables** | `documents`: A list of documents (matching the query) |
| **API reference** | [Azure AI Search](/reference/integrations-azure_ai_search) |
| **GitHub link** | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/azure_ai_search |

</div>

## Overview

The `AzureAISearchBM25Retriever` is a keyword-based Retriever designed to fetch documents that match a query from an `AzureAISearchDocumentStore`. It uses the BM25 algorithm, which calculates a weighted word overlap between the query and the documents to determine their similarity. The Retriever accepts a textual query, but you can also provide a combination of terms with boolean operators. Some examples of valid queries are `"pool"`, `"pool spa"`, and `"pool spa +airport"`.

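
To make the idea of a weighted word overlap more concrete, here is a minimal, self-contained sketch of Okapi BM25 scoring in plain Python. It is illustrative only: Azure AI Search computes relevance scores on the service side, and the function name `bm25_scores`, the whitespace tokenizer, the sample hotel texts, and the `k1` and `b` values are assumptions made for this sketch, not part of the integration. The sketch also ignores boolean operators such as `+`.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score each document against the query with a basic Okapi BM25 formula.

    Hypothetical helper for illustration; tokenization is a plain lowercase split.
    """
    if not docs:
        return []

    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Average document length, used to normalize term frequencies.
    avg_len = (sum(len(tokens) for tokens in tokenized) / n_docs) or 1.0
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Rare terms get a higher weight than common ones.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates (k1) and is length-normalized (b).
            tf = term_freq[term]
            denom = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / denom
        scores.append(score)
    return scores


docs = [
    "hotel with pool and spa",
    "hotel near airport",
    "spa resort with heated pool and pool bar",
]
for doc, score in zip(docs, bm25_scores("pool spa", docs)):
    print(f"{score:.3f}  {doc}")
```

A document that shares no terms with the query scores zero, while documents that contain the query terms receive a positive score that depends on how rare each term is and on the document's length.
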

In addition to the `query`, the `AzureAISearchBM25Retriever` accepts other optional parameters, including `top_k` (the maximum number of documents to retrieve) and `filters` to narrow down the search space.

If your search index includes a [semantic configuration](https://learn.microsoft.com/en-us/azure/search/semantic-how-to-query-request), you can enable semantic ranking to apply it to the Retriever's results. For more details, refer to the [Azure AI documentation](https://learn.microsoft.com/en-us/azure/search/hybrid-search-how-to-query#semantic-hybrid-search).

If you want a combination of BM25 and vector retrieval, use the `AzureAISearchHybridRetriever`, which uses both vector search and BM25 search to match documents to the query.

## Usage

### Installation

This integration requires you to have an active Azure subscription with a deployed [Azure AI Search](https://azure.microsoft.com/en-us/products/ai-services/ai-search) service.

To start using Azure AI Search with Haystack, install the package with:

```shell
pip install azure-ai-search-haystack
```

### On its own

This Retriever needs `AzureAISearchDocumentStore` and indexed documents to run.

```python
from haystack import Document
from haystack_integrations.components.retrievers.azure_ai_search import (
    AzureAISearchBM25Retriever,
)
from haystack_integrations.document_stores.azure_ai_search import (
    AzureAISearchDocumentStore,
)

# Connect to the search index that stores the documents
document_store = AzureAISearchDocumentStore(index_name="haystack_docs")

# Index a few sample documents
documents = [
    Document(content="There are over 7,000 languages spoken around the world today."),
    Document(
        content="Elephants have been observed to behave in a way that indicates a high level of self-awareness, such as recognizing themselves in mirrors.",
    ),
    Document(
        content="In certain parts of the world, like the Maldives, Puerto Rico, and San Diego, you can witness the phenomenon of bioluminescent waves.",
    ),
]
document_store.write_documents(documents=documents)

# Retrieve the documents that best match the query
retriever = AzureAISearchBM25Retriever(document_store=document_store)
result = retriever.run(query="How many languages are spoken around the world today?")
print(result["documents"])
```

### In a RAG pipeline

The example below shows how to use the `AzureAISearchBM25Retriever` in a RAG pipeline.
Set your `OPENAI_API_KEY` as an environment variable and then run the following code:

```python
import os

from haystack import Document
from haystack import Pipeline
from haystack.components.builders.answer_builder import AnswerBuilder
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.document_stores.types import DuplicatePolicy
from haystack_integrations.components.retrievers.azure_ai_search import (
    AzureAISearchBM25Retriever,
)
from haystack_integrations.document_stores.azure_ai_search import (
    AzureAISearchDocumentStore,
)

# Fail early if the key is missing
api_key = os.environ["OPENAI_API_KEY"]

# Create a RAG query pipeline
prompt_template = """
Given these documents, answer the question.\nDocuments:
{% for doc in documents %}
{{ doc.content }}
{% endfor %}

\nQuestion: {{question}}
\nAnswer:
"""

document_store = AzureAISearchDocumentStore(index_name="haystack-docs")

# Add Documents
documents = [
    Document(content="There are over 7,000 languages spoken around the world today."),
    Document(
        content="Elephants have been observed to behave in a way that indicates a high level of self-awareness, such as recognizing themselves in mirrors.",
    ),
    Document(
        content="In certain parts of the world, like the Maldives, Puerto Rico, and San Diego, you can witness the phenomenon of bioluminescent waves.",
    ),
]

# policy param is optional, as AzureAISearchDocumentStore has a default policy of DuplicatePolicy.OVERWRITE
document_store.write_documents(documents=documents, policy=DuplicatePolicy.OVERWRITE)

retriever = AzureAISearchBM25Retriever(document_store=document_store)
rag_pipeline = Pipeline()
rag_pipeline.add_component(name="retriever", instance=retriever)
rag_pipeline.add_component(
    instance=PromptBuilder(template=prompt_template),
    name="prompt_builder",
)
rag_pipeline.add_component(instance=OpenAIGenerator(), name="llm")
rag_pipeline.add_component(instance=AnswerBuilder(), name="answer_builder")

# Retrieved documents feed the prompt; the LLM output and documents feed the answer builder
rag_pipeline.connect("retriever", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder", "llm")
rag_pipeline.connect("llm.replies", "answer_builder.replies")
rag_pipeline.connect("llm.meta", "answer_builder.meta")
rag_pipeline.connect("retriever", "answer_builder.documents")

question = "Tell me something about languages?"
result = rag_pipeline.run(
    {
        "retriever": {"query": question},
        "prompt_builder": {"question": question},
        "answer_builder": {"query": question},
    },
)
print(result["answer_builder"]["answers"][0])
```
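
### With `top_k` and `filters`

The optional `top_k` and `filters` parameters mentioned in the Overview can be passed at run time. The sketch below is a hedged illustration rather than a definitive recipe: the helper names `build_category_filter` and `retrieve_top_matches` are hypothetical, and so is the `category` metadata field, which is assumed to exist as a filterable field in your index. The filter uses the general Haystack filter dictionary shape (`field`, `operator`, `value`).

```python
from typing import Any


def build_category_filter(category: str) -> dict[str, Any]:
    """Build a Haystack-style filter on a hypothetical `category` metadata field."""
    return {"field": "meta.category", "operator": "==", "value": category}


def retrieve_top_matches(retriever: Any, query: str, category: str, top_k: int = 3) -> list:
    """Run a retriever with a result cap and a metadata filter.

    `retriever` is expected to expose `run(query=..., filters=..., top_k=...)`
    and to return a dictionary with a `documents` key.
    """
    result = retriever.run(
        query=query,
        filters=build_category_filter(category),
        top_k=top_k,
    )
    return result["documents"]
```

With a configured `AzureAISearchBM25Retriever`, a call could look like `retrieve_top_matches(retriever, "pool spa", category="travel", top_k=2)`.
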