---
title: "QdrantSparseEmbeddingRetriever"
id: qdrantsparseembeddingretriever
slug: "/qdrantsparseembeddingretriever"
description: "A Retriever based on sparse embeddings, compatible with the Qdrant Document Store."
---

# QdrantSparseEmbeddingRetriever

A Retriever based on sparse embeddings, compatible with the Qdrant Document Store.

<div className="key-value-table">

|  |  |
| --- | --- |
| **Most common position in a pipeline** | 1\. After a Text Embedder and before a [`PromptBuilder`](../builders/promptbuilder.mdx) in a RAG pipeline <br /> <br />2. The last component in the semantic search pipeline <br /> 3. After a Text Embedder and before an [`ExtractiveReader`](../readers/extractivereader.mdx) in an extractive QA pipeline |
| **Mandatory init variables** | `document_store`: An instance of a [QdrantDocumentStore](../../document-stores/qdrant-document-store.mdx) |
| **Mandatory run variables** | `query_sparse_embedding`: A [`SparseEmbedding`](../../concepts/data-classes.mdx#sparseembedding) object containing a vectorial representation of the query |
| **Output variables** | `documents`: A list of documents |
| **API reference** | [Qdrant](/reference/integrations-qdrant) |
| **GitHub link** | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/qdrant |

</div>

## Overview

The `QdrantSparseEmbeddingRetriever` is a Retriever based on sparse embeddings, compatible with the [`QdrantDocumentStore`](../../document-stores/qdrant-document-store.mdx).

It compares the query and document sparse embeddings and, based on the outcome, fetches the documents most relevant to the query from the `QdrantDocumentStore`.

When using the `QdrantSparseEmbeddingRetriever`, make sure it has the query and document sparse embeddings available. You can do so by adding a sparse document Embedder to your indexing pipeline and a sparse text Embedder to your query pipeline.
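
To build intuition for what comparing sparse embeddings means, here is a minimal sketch in plain Python. It assumes relevance is scored as a dot product over the indices that two sparse vectors share, which is a common way to score sparse vectors. The `sparse_dot` helper and the sample vectors are hypothetical and only illustrate the idea; the real scoring happens inside Qdrant, not in this code.

```python
def sparse_dot(indices_a, values_a, indices_b, values_b):
    """Dot product of two sparse vectors given as parallel index/value lists.

    Hypothetical helper for illustration only; it is not part of Haystack or Qdrant.
    """
    weights_b = dict(zip(indices_b, values_b))
    # Only indices present in both vectors contribute to the score.
    return sum(v * weights_b[i] for i, v in zip(indices_a, values_a) if i in weights_b)


# Sparse vectors as (indices, values) pairs; all values are made up.
query = ([0, 1, 2, 3], [0.1, 0.8, 0.05, 0.33])
doc_a = ([0, 3, 5], [0.1, 0.5, 0.12])  # shares indices 0 and 3 with the query
doc_b = ([1, 4], [0.2, 0.9])  # shares only index 1 with the query

scores = {
    "doc_a": sparse_dot(*query, *doc_a),
    "doc_b": sparse_dot(*query, *doc_b),
}

# Rank document names from the highest score to the lowest.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Under this assumption, a document scores higher the more weight it shares with the query on the same indices, which is why both the query and the documents need sparse embeddings produced by compatible Embedders.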

In addition to the `query_sparse_embedding`, the `QdrantSparseEmbeddingRetriever` accepts other optional parameters, including `top_k` (the maximum number of documents to retrieve) and `filters` to narrow down the search space.

:::note[Sparse Embedding Support]

To use Sparse Embedding support, you need to initialize the `QdrantDocumentStore` with `use_sparse_embeddings=True`, which is `False` by default.

If you want to use a Document Store or collection that was previously created with this feature disabled, you must migrate the existing data. You can do this with the `migrate_to_sparse_embeddings_support` utility function.
:::

### Installation

To start using Qdrant with Haystack, first install the package with:

```shell
pip install qdrant-haystack
```

## Usage

### On its own

This Retriever needs the `QdrantDocumentStore` and indexed documents to run.

```python
from haystack_integrations.components.retrievers.qdrant import (
    QdrantSparseEmbeddingRetriever,
)
from haystack_integrations.document_stores.qdrant import QdrantDocumentStore
from haystack.dataclasses import Document, SparseEmbedding

# In-memory store with sparse embedding support enabled
document_store = QdrantDocumentStore(
    ":memory:",
    use_sparse_embeddings=True,
    recreate_index=True,
    return_embedding=True,
)

# Index one document that already carries a sparse embedding
doc = Document(
    content="test",
    sparse_embedding=SparseEmbedding(indices=[0, 3, 5], values=[0.1, 0.5, 0.12]),
)
document_store.write_documents([doc])

retriever = QdrantSparseEmbeddingRetriever(document_store=document_store)
sparse_embedding = SparseEmbedding(indices=[0, 1, 2, 3], values=[0.1, 0.8, 0.05, 0.33])
retriever.run(query_sparse_embedding=sparse_embedding)
```

### In a pipeline

In Haystack, you can compute sparse embeddings using Fastembed Embedders.

First, install the package with:

```shell
pip install fastembed-haystack
```

Then, try out this pipeline:

```python
from haystack import Document, Pipeline
from haystack.components.writers import DocumentWriter
from haystack_integrations.components.retrievers.qdrant import (
    QdrantSparseEmbeddingRetriever,
)
from haystack_integrations.document_stores.qdrant import QdrantDocumentStore
from haystack.document_stores.types import DuplicatePolicy
from haystack_integrations.components.embedders.fastembed import (
    FastembedSparseDocumentEmbedder,
    FastembedSparseTextEmbedder,
)

document_store = QdrantDocumentStore(
    ":memory:",
    recreate_index=True,
    use_sparse_embeddings=True,
)

documents = [
    Document(content="My name is Wolfgang and I live in Berlin"),
    Document(content="I saw a black horse running"),
    Document(content="Germany has many big cities"),
    Document(content="fastembed is supported by and maintained by Qdrant."),
]

sparse_document_embedder = FastembedSparseDocumentEmbedder()
writer = DocumentWriter(document_store=document_store, policy=DuplicatePolicy.OVERWRITE)

# Indexing pipeline: compute sparse embeddings, then write the documents
indexing_pipeline = Pipeline()
indexing_pipeline.add_component("sparse_document_embedder", sparse_document_embedder)
indexing_pipeline.add_component("writer", writer)
indexing_pipeline.connect("sparse_document_embedder", "writer")

indexing_pipeline.run({"sparse_document_embedder": {"documents": documents}})

# Query pipeline: embed the query text, then retrieve matching documents
query_pipeline = Pipeline()
query_pipeline.add_component("sparse_text_embedder", FastembedSparseTextEmbedder())
query_pipeline.add_component(
    "sparse_retriever",
    QdrantSparseEmbeddingRetriever(document_store=document_store),
)
query_pipeline.connect(
    "sparse_text_embedder.sparse_embedding",
    "sparse_retriever.query_sparse_embedding",
)

query = "Who supports fastembed?"

result = query_pipeline.run({"sparse_text_embedder": {"text": query}})

print(result["sparse_retriever"]["documents"][0])  # noqa: T201

## Document(id=...,
##  content: 'fastembed is supported by and maintained by Qdrant.',
##  score: 0.758..)
```

## Additional References

🧑‍🍳 Cookbook: [Sparse Embedding Retrieval with Qdrant and FastEmbed](https://haystack.deepset.ai/cookbook/sparse_embedding_retrieval)