---
title: "AzureAISearchHybridRetriever"
id: azureaisearchhybridretriever
slug: "/azureaisearchhybridretriever"
description: "A Retriever based both on dense and sparse embeddings, compatible with the Azure AI Search Document Store."
---

# AzureAISearchHybridRetriever

A Retriever based both on dense and sparse embeddings, compatible with the Azure AI Search Document Store.

This Retriever combines embedding-based retrieval and BM25 text search to find matching documents in the search index and return more relevant results.

<div className="key-value-table">

| | |
| --- | --- |
| **Most common position in a pipeline** | 1. After a TextEmbedder and before a [`PromptBuilder`](../builders/promptbuilder.mdx) in a RAG pipeline <br />2. The last component in a hybrid search pipeline <br />3. After a TextEmbedder and before an [`ExtractiveReader`](../readers/extractivereader.mdx) in an extractive QA pipeline |
| **Mandatory init variables** | `document_store`: An instance of [`AzureAISearchDocumentStore`](../../document-stores/azureaisearchdocumentstore.mdx) |
| **Mandatory run variables** | `query`: A string <br /> <br />`query_embedding`: A list of floats |
| **Output variables** | `documents`: A list of documents (matching the query) |
| **API reference** | [Azure AI Search](/reference/integrations-azure_ai_search) |
| **GitHub link** | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/azure_ai_search |

</div>

## Overview

The `AzureAISearchHybridRetriever` combines vector retrieval and BM25 text search to fetch relevant documents from the `AzureAISearchDocumentStore`. It processes both textual (keyword) queries and query embeddings in a single request, executing all subqueries in parallel. The results are merged and reordered using [Reciprocal Rank Fusion (RRF)](https://learn.microsoft.com/en-us/azure/search/hybrid-search-ranking) to create a unified result set.
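
Azure AI Search performs the fusion on the service side, so you never call it yourself. To build intuition for how RRF merges a keyword ranking with a vector ranking, here is a minimal sketch of the general technique. It assumes 1-based ranks and the commonly cited constant `k = 60`. The function name `reciprocal_rank_fusion` and the document IDs are hypothetical and are not part of the Haystack or Azure APIs.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document receives 1 / (k + rank) from every list it appears in,
    where rank is its 1-based position in that list.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Illustrative rankings only: one from BM25, one from vector search.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
vector_ranking = ["doc_b", "doc_c", "doc_a"]

fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
fused_ids = [doc_id for doc_id, _ in fused]
print(fused_ids)
```

In this toy input, `doc_b` ends up first because it sits near the top of both lists, even though it is not the top BM25 hit.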

Besides the `query` and `query_embedding`, the `AzureAISearchHybridRetriever` accepts optional parameters such as `top_k` (the maximum number of documents to retrieve) and `filters` to refine the search. Additional keyword arguments can also be passed during initialization for further customization.

If your search index includes a [semantic configuration](https://learn.microsoft.com/en-us/azure/search/semantic-how-to-query-request), you can enable semantic ranking to apply it to the Retriever's results. For more details, refer to the [Azure AI Search documentation](https://learn.microsoft.com/en-us/azure/search/hybrid-search-how-to-query#semantic-hybrid-search).

For purely keyword-based retrieval, you can use `AzureAISearchBM25Retriever`, and for embedding-based retrieval, `AzureAISearchEmbeddingRetriever` is available.

## Usage

### Installation

This integration requires you to have an active Azure subscription with a deployed [Azure AI Search](https://azure.microsoft.com/en-us/products/ai-services/ai-search) service.

To start using Azure AI Search with Haystack, install the package with:

```shell
pip install azure-ai-search-haystack
```

### On its own

This Retriever needs `AzureAISearchDocumentStore` and indexed documents to run.

```python
from haystack import Document
from haystack_integrations.components.retrievers.azure_ai_search import (
    AzureAISearchHybridRetriever,
)
from haystack_integrations.document_stores.azure_ai_search import (
    AzureAISearchDocumentStore,
)

document_store = AzureAISearchDocumentStore(index_name="haystack_docs")
documents = [
    Document(content="There are over 7,000 languages spoken around the world today."),
    Document(
        content="Elephants have been observed to behave in a way that indicates a high level of self-awareness, such as recognizing themselves in mirrors.",
    ),
    Document(
        content="In certain parts of the world, like the Maldives, Puerto Rico, and San Diego, you can witness the phenomenon of bioluminescent waves.",
    ),
]
document_store.write_documents(documents=documents)

retriever = AzureAISearchHybridRetriever(document_store=document_store)
# Fake embeddings to keep the example simple. In a real setup, the vector
# length must match the embedding dimension of the search index.
retriever.run(
    query="How many languages are spoken around the world today?",
    query_embedding=[0.1] * 384,
)
```

### In a RAG pipeline

The following example demonstrates using the `AzureAISearchHybridRetriever` in a pipeline. An indexing pipeline is responsible for indexing and storing documents with embeddings in the `AzureAISearchDocumentStore`, while the query pipeline uses hybrid retrieval to fetch relevant documents based on a given query.

```python
from haystack import Document, Pipeline
from haystack.components.embedders import (
    SentenceTransformersDocumentEmbedder,
    SentenceTransformersTextEmbedder,
)
from haystack.components.writers import DocumentWriter

from haystack_integrations.components.retrievers.azure_ai_search import (
    AzureAISearchHybridRetriever,
)
from haystack_integrations.document_stores.azure_ai_search import (
    AzureAISearchDocumentStore,
)

document_store = AzureAISearchDocumentStore(index_name="hybrid-retrieval-example")

# The same model embeds documents at indexing time and queries at search time.
model = "sentence-transformers/all-mpnet-base-v2"

documents = [
    Document(content="There are over 7,000 languages spoken around the world today."),
    Document(
        content=(
            "Elephants have been observed to behave in a way that indicates a "
            "high level of self-awareness, such as recognizing themselves in mirrors."
        ),
    ),
    Document(
        content=(
            "In certain parts of the world, like the Maldives, Puerto Rico, and "
            "San Diego, you can witness the phenomenon of bioluminescent waves."
        ),
    ),
]

document_embedder = SentenceTransformersDocumentEmbedder(model=model)

# Indexing pipeline: embed the documents, then write them to the document store.
indexing_pipeline = Pipeline()
indexing_pipeline.add_component(instance=document_embedder, name="doc_embedder")
indexing_pipeline.add_component(
    instance=DocumentWriter(document_store=document_store),
    name="doc_writer",
)
indexing_pipeline.connect("doc_embedder", "doc_writer")

indexing_pipeline.run({"doc_embedder": {"documents": documents}})

# Query pipeline: embed the query, then run hybrid retrieval with both the
# query text and its embedding.
query_pipeline = Pipeline()
query_pipeline.add_component(
    "text_embedder",
    SentenceTransformersTextEmbedder(model=model),
)
query_pipeline.add_component(
    "retriever",
    AzureAISearchHybridRetriever(document_store=document_store),
)
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

query = "How many languages are there?"

result = query_pipeline.run(
    {"text_embedder": {"text": query}, "retriever": {"query": query}},
)

print(result["retriever"]["documents"][0])
```