---
title: "AzureAISearchBM25Retriever"
id: azureaisearchbm25retriever
slug: "/azureaisearchbm25retriever"
description: "A keyword-based Retriever that fetches Documents matching a query from the Azure AI Search Document Store."
---

# AzureAISearchBM25Retriever

A keyword-based Retriever that fetches Documents matching a query from the Azure AI Search Document Store.

<div className="key-value-table">

|  |  |
| --- | --- |
| **Most common position in a pipeline** | 1. Before a [`PromptBuilder`](../builders/promptbuilder.mdx) in a RAG pipeline 2. The last component in the semantic search pipeline 3. Before an [`ExtractiveReader`](../readers/extractivereader.mdx) in an extractive QA pipeline |
| **Mandatory init variables** | `document_store`: An instance of [`AzureAISearchDocumentStore`](../../document-stores/azureaisearchdocumentstore.mdx) |
| **Mandatory run variables** | `query`: A string |
| **Output variables** | `documents`: A list of documents (matching the query) |
| **API reference** | [Azure AI Search](/reference/integrations-azure_ai_search) |
| **GitHub link** | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/azure_ai_search |

</div>

## Overview

The `AzureAISearchBM25Retriever` is a keyword-based Retriever that fetches documents matching a query from an `AzureAISearchDocumentStore`. It uses the BM25 algorithm, which scores each document by a weighted word overlap with the query. The Retriever accepts a plain-text query, and you can also combine terms with boolean operators. Examples of valid queries are `"pool"`, `"pool spa"`, and `"pool spa +airport"`.

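To make "weighted word overlap" concrete, here is a minimal sketch of the textbook Okapi BM25 formula in plain Python. It is an illustration only, not the scoring code that Azure AI Search runs: the whitespace tokenizer, the `bm25_scores` helper, the sample documents, and the parameter values `k1=1.2` and `b=0.75` are all assumptions chosen for the example.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with textbook Okapi BM25.

    Illustrative helper: tokenization is a naive lowercase whitespace split,
    which is much simpler than the analyzers a real search service applies.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Rare terms get a higher weight than common ones.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates (k1) and is normalized by document length (b).
            tf = term_freq[term]
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "pool and spa near the airport",
    "quiet hotel with a garden",
    "airport shuttle",
]
scores = bm25_scores("pool spa", docs)
```

Only the first document shares terms with the query `"pool spa"`, so it is the only one with a non-zero score. Documents that match more query terms, or rarer ones, rank higher.
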
In addition to the `query`, the `AzureAISearchBM25Retriever` accepts optional parameters, including `top_k` (the maximum number of documents to retrieve) and `filters` (conditions that narrow down the search space).

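As a rough sketch of how these parameters fit together, the snippet below assembles the arguments for a `run()` call using Haystack's filter dictionary format. The `build_run_kwargs` helper and the `meta.category` field are hypothetical. The example also assumes that the field exists in your index and is filterable there, so check the integration's API reference before relying on it.

```python
def build_run_kwargs(query, category, top_k=3):
    """Assemble keyword arguments for the Retriever's run() method.

    Hypothetical helper for illustration. "meta.category" is an assumed
    metadata field, not something the integration defines for you.
    """
    filters = {"field": "meta.category", "operator": "==", "value": category}
    return {"query": query, "filters": filters, "top_k": top_k}


run_kwargs = build_run_kwargs("pool spa", category="hotels")

# With a configured Retriever, the call would then look like:
# result = retriever.run(**run_kwargs)
```

Keeping the arguments in a plain dictionary makes it easy to log or test the query parameters without contacting the search service.
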
If your search index includes a [semantic configuration](https://learn.microsoft.com/en-us/azure/search/semantic-how-to-query-request), you can enable semantic ranking to apply it to the Retriever's results. For more details, refer to the [Azure AI Search documentation](https://learn.microsoft.com/en-us/azure/search/hybrid-search-how-to-query#semantic-hybrid-search).

If you want to combine BM25 and vector retrieval, use the `AzureAISearchHybridRetriever`, which uses both vector search and BM25 search to match documents to the query.

## Usage

### Installation

This integration requires an active Azure subscription with a deployed [Azure AI Search](https://azure.microsoft.com/en-us/products/ai-services/ai-search) service.

To start using Azure AI Search with Haystack, install the package with:

```shell
pip install azure-ai-search-haystack
```

### On its own

This Retriever needs an `AzureAISearchDocumentStore` and indexed documents to run.

```python
from haystack import Document
from haystack_integrations.components.retrievers.azure_ai_search import (
    AzureAISearchBM25Retriever,
)
from haystack_integrations.document_stores.azure_ai_search import (
    AzureAISearchDocumentStore,
)

# Assumption: the service endpoint and API key are supplied through the
# AZURE_AI_SEARCH_ENDPOINT and AZURE_AI_SEARCH_API_KEY environment variables.
# The index name uses a dash because Azure AI Search index names allow
# lowercase letters, digits, and dashes, but not underscores.
document_store = AzureAISearchDocumentStore(index_name="haystack-docs")

# Index a few sample documents
documents = [
    Document(content="There are over 7,000 languages spoken around the world today."),
    Document(
        content="Elephants have been observed to behave in a way that indicates a high level of self-awareness, such as recognizing themselves in mirrors.",
    ),
    Document(
        content="In certain parts of the world, like the Maldives, Puerto Rico, and San Diego, you can witness the phenomenon of bioluminescent waves.",
    ),
]
document_store.write_documents(documents=documents)

# Retrieve the documents that best match the query
retriever = AzureAISearchBM25Retriever(document_store=document_store)
result = retriever.run(query="How many languages are spoken around the world today?")
print(result["documents"])
```

### In a RAG pipeline

The example below shows how to use the `AzureAISearchBM25Retriever` in a RAG pipeline. Set your `OPENAI_API_KEY` as an environment variable and then run the following code:

```python
from haystack import Document, Pipeline
from haystack.components.builders.answer_builder import AnswerBuilder
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.document_stores.types import DuplicatePolicy
from haystack_integrations.components.retrievers.azure_ai_search import (
    AzureAISearchBM25Retriever,
)
from haystack_integrations.document_stores.azure_ai_search import (
    AzureAISearchDocumentStore,
)

# OpenAIGenerator reads OPENAI_API_KEY from the environment by default,
# so the key does not need to be passed explicitly.

# Prompt template for the RAG query pipeline
prompt_template = """
    Given these documents, answer the question.\nDocuments:
    {% for doc in documents %}
        {{ doc.content }}
    {% endfor %}

    \nQuestion: {{question}}
    \nAnswer:
    """

document_store = AzureAISearchDocumentStore(index_name="haystack-docs")

# Add documents
documents = [
    Document(content="There are over 7,000 languages spoken around the world today."),
    Document(
        content="Elephants have been observed to behave in a way that indicates a high level of self-awareness, such as recognizing themselves in mirrors.",
    ),
    Document(
        content="In certain parts of the world, like the Maldives, Puerto Rico, and San Diego, you can witness the phenomenon of bioluminescent waves.",
    ),
]

# The policy parameter is optional: AzureAISearchDocumentStore defaults to DuplicatePolicy.OVERWRITE
document_store.write_documents(documents=documents, policy=DuplicatePolicy.OVERWRITE)

# Build the pipeline: retriever -> prompt builder -> LLM -> answer builder
retriever = AzureAISearchBM25Retriever(document_store=document_store)
rag_pipeline = Pipeline()
rag_pipeline.add_component(name="retriever", instance=retriever)
rag_pipeline.add_component(
    instance=PromptBuilder(template=prompt_template),
    name="prompt_builder",
)
rag_pipeline.add_component(instance=OpenAIGenerator(), name="llm")
rag_pipeline.add_component(instance=AnswerBuilder(), name="answer_builder")
rag_pipeline.connect("retriever", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder", "llm")
rag_pipeline.connect("llm.replies", "answer_builder.replies")
rag_pipeline.connect("llm.meta", "answer_builder.meta")
rag_pipeline.connect("retriever", "answer_builder.documents")

# Run the pipeline with the same question for every component that needs it
question = "Tell me something about languages?"
result = rag_pipeline.run(
    {
        "retriever": {"query": question},
        "prompt_builder": {"question": question},
        "answer_builder": {"query": question},
    },
)
print(result["answer_builder"]["answers"][0])
```