---
title: "AzureAISearchHybridRetriever"
id: azureaisearchhybridretriever
slug: "/azureaisearchhybridretriever"
description: "A Retriever based both on dense and sparse embeddings, compatible with the Azure AI Search Document Store."
---

# AzureAISearchHybridRetriever

A Retriever based both on dense and sparse embeddings, compatible with the Azure AI Search Document Store.

This Retriever combines embedding-based retrieval with BM25 text search to find matching documents in the search index and return more relevant results.

<div className="key-value-table">

|  |  |
| --- | --- |
| **Most common position in a pipeline** | 1. After a TextEmbedder and before a [`PromptBuilder`](../builders/promptbuilder.mdx) in a RAG pipeline 2. The last component in a hybrid search pipeline 3. After a TextEmbedder and before an [`ExtractiveReader`](../readers/extractivereader.mdx) in an extractive QA pipeline |
| **Mandatory init variables** | `document_store`: An instance of [`AzureAISearchDocumentStore`](../../document-stores/azureaisearchdocumentstore.mdx) |
| **Mandatory run variables** | `query`: A string  <br /> <br />`query_embedding`: A list of floats |
| **Output variables** | `documents`: A list of documents (matching the query) |
| **API reference** | [Azure AI Search](/reference/integrations-azure_ai_search) |
| **GitHub link** | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/azure_ai_search |

</div>

## Overview

The `AzureAISearchHybridRetriever` combines vector retrieval and BM25 text search to fetch relevant documents from the `AzureAISearchDocumentStore`. It processes both textual (keyword) queries and query embeddings in a single request, executing all subqueries in parallel. The results are merged and reordered using [Reciprocal Rank Fusion (RRF)](https://learn.microsoft.com/en-us/azure/search/hybrid-search-ranking) to create a unified result set.
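
To make the merging step more concrete, here is a minimal, self-contained sketch of Reciprocal Rank Fusion in plain Python. It is an illustration of the general technique, not the code Azure AI Search runs: the function name `reciprocal_rank_fusion`, the document IDs, and the constant `k=60` (a commonly cited value) are assumptions chosen for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document receives 1 / (k + rank) from every list it appears in,
    where rank starts at 1. The scores are summed across lists.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from the keyword and the vector subquery.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
vector_ranking = ["doc_b", "doc_c", "doc_a"]

fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print(fused)  # ['doc_b', 'doc_a', 'doc_c']
```

In this toy input, `doc_b` comes out on top because it ranks well in both lists, even though it is first in only one of them.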

Besides the `query` and `query_embedding`, the `AzureAISearchHybridRetriever` accepts optional parameters such as `top_k` (the maximum number of documents to retrieve) and `filters` to refine the search. Additional keyword arguments can also be passed during initialization for further customization.
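
As a rough sketch of how these optional parameters might be supplied at query time, the helper below builds the input dictionary for a query pipeline. It assumes the pipeline's components are named `text_embedder` and `retriever`, that `top_k` and `filters` are accepted as run-time inputs of the Retriever, and that the index exposes a filterable metadata field. The helper name `build_query_inputs` and the `language` field are hypothetical.

```python
def build_query_inputs(query, top_k=3, language="en"):
    """Build run-time inputs for a pipeline with a text embedder and a retriever."""
    # Haystack-style metadata filter; "meta.language" is a hypothetical field
    # that would have to exist (and be filterable) in the search index.
    filters = {"field": "meta.language", "operator": "==", "value": language}
    return {
        "text_embedder": {"text": query},
        "retriever": {"query": query, "top_k": top_k, "filters": filters},
    }


inputs = build_query_inputs("How many languages are there?", top_k=2)
print(inputs["retriever"])
```

The resulting dictionary could then be passed to the pipeline's `run` method in place of a hand-written one.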

If your search index includes a [semantic configuration](https://learn.microsoft.com/en-us/azure/search/semantic-how-to-query-request), you can enable semantic ranking to apply it to the Retriever's results. For more details, refer to the [Azure AI Search documentation](https://learn.microsoft.com/en-us/azure/search/hybrid-search-how-to-query#semantic-hybrid-search).
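
One way this might look in practice is sketched below. It assumes that extra keyword arguments given at initialization are forwarded to the Azure AI Search query, and that the parameter names `query_type` and `semantic_configuration_name` match those of the Azure search API. The helper `semantic_ranking_kwargs` and the configuration name `my-semantic-config` are placeholders, so check the integration's API reference before relying on them.

```python
def semantic_ranking_kwargs(configuration_name):
    """Return keyword arguments that request semantic ranking (assumed names)."""
    return {
        "query_type": "semantic",
        "semantic_configuration_name": configuration_name,
    }


# "my-semantic-config" is a placeholder for a configuration defined on your index.
kwargs = semantic_ranking_kwargs("my-semantic-config")
print(kwargs)

# Intended use (needs a live Azure AI Search index, so it is not executed here):
# retriever = AzureAISearchHybridRetriever(document_store=document_store, **kwargs)
```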

For purely keyword-based retrieval, use `AzureAISearchBM25Retriever`; for purely embedding-based retrieval, use `AzureAISearchEmbeddingRetriever`.

## Usage

### Installation

This integration requires an active Azure subscription with a deployed [Azure AI Search](https://azure.microsoft.com/en-us/products/ai-services/ai-search) service.

To start using Azure AI Search with Haystack, install the package with:

```shell
pip install azure-ai-search-haystack
```

### On its own

This Retriever needs an `AzureAISearchDocumentStore` with indexed documents to run.

```python
from haystack import Document
from haystack_integrations.components.retrievers.azure_ai_search import (
    AzureAISearchHybridRetriever,
)
from haystack_integrations.document_stores.azure_ai_search import (
    AzureAISearchDocumentStore,
)

document_store = AzureAISearchDocumentStore(index_name="haystack_docs")
documents = [
    Document(content="There are over 7,000 languages spoken around the world today."),
    Document(
        content="Elephants have been observed to behave in a way that indicates a high level of self-awareness, such as recognizing themselves in mirrors.",
    ),
    Document(
        content="In certain parts of the world, like the Maldives, Puerto Rico, and San Diego, you can witness the phenomenon of bioluminescent waves.",
    ),
]
document_store.write_documents(documents=documents)

retriever = AzureAISearchHybridRetriever(document_store=document_store)
# Fake embedding to keep the example simple; its length must match
# the vector dimension configured for the search index.
retriever.run(
    query="How many languages are spoken around the world today?",
    query_embedding=[0.1] * 384,
)
```

### In a RAG pipeline

The following example uses the `AzureAISearchHybridRetriever` in a pipeline. An indexing pipeline embeds the documents and writes them to the `AzureAISearchDocumentStore`, and a query pipeline then uses hybrid retrieval to fetch the documents relevant to a given query.

```python
from haystack import Document, Pipeline
from haystack.components.embedders import (
    SentenceTransformersDocumentEmbedder,
    SentenceTransformersTextEmbedder,
)
from haystack.components.writers import DocumentWriter

from haystack_integrations.components.retrievers.azure_ai_search import (
    AzureAISearchHybridRetriever,
)
from haystack_integrations.document_stores.azure_ai_search import (
    AzureAISearchDocumentStore,
)

document_store = AzureAISearchDocumentStore(index_name="hybrid-retrieval-example")

# The same model embeds both the documents and the query.
model = "sentence-transformers/all-mpnet-base-v2"

documents = [
    Document(content="There are over 7,000 languages spoken around the world today."),
    Document(
        content=(
            "Elephants have been observed to behave in a way that indicates a "
            "high level of self-awareness, such as recognizing themselves in mirrors."
        ),
    ),
    Document(
        content=(
            "In certain parts of the world, like the Maldives, Puerto Rico, and "
            "San Diego, you can witness the phenomenon of bioluminescent waves."
        ),
    ),
]

document_embedder = SentenceTransformersDocumentEmbedder(model=model)

# Indexing pipeline: embed the documents, then write them to the document store.
indexing_pipeline = Pipeline()
indexing_pipeline.add_component(instance=document_embedder, name="doc_embedder")
indexing_pipeline.add_component(
    instance=DocumentWriter(document_store=document_store),
    name="doc_writer",
)
indexing_pipeline.connect("doc_embedder", "doc_writer")

indexing_pipeline.run({"doc_embedder": {"documents": documents}})

# Query pipeline: embed the query, then run hybrid retrieval with text and embedding.
query_pipeline = Pipeline()
query_pipeline.add_component(
    "text_embedder",
    SentenceTransformersTextEmbedder(model=model),
)
query_pipeline.add_component(
    "retriever",
    AzureAISearchHybridRetriever(document_store=document_store),
)
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

query = "How many languages are there?"

# The query string goes to both the embedder and the Retriever.
result = query_pipeline.run(
    {"text_embedder": {"text": query}, "retriever": {"query": query}},
)

print(result["retriever"]["documents"][0])
```