---
title: "QdrantSparseEmbeddingRetriever"
id: qdrantsparseembeddingretriever
slug: "/qdrantsparseembeddingretriever"
description: "A Retriever based on sparse embeddings, compatible with the Qdrant Document Store."
---

# QdrantSparseEmbeddingRetriever

A Retriever based on sparse embeddings, compatible with the Qdrant Document Store.

<div className="key-value-table">

|  |  |
| --- | --- |
| **Most common position in a pipeline** | 1\. After a Text Embedder and before a [`PromptBuilder`](../builders/promptbuilder.mdx) in a RAG pipeline <br /> 2\. The last component in a semantic search pipeline <br /> 3\. After a Text Embedder and before an [`ExtractiveReader`](../readers/extractivereader.mdx) in an extractive QA pipeline |
| **Mandatory init variables** | `document_store`: An instance of a [QdrantDocumentStore](../../document-stores/qdrant-document-store.mdx) |
| **Mandatory run variables** | `query_sparse_embedding`: A [`SparseEmbedding`](../../concepts/data-classes.mdx#sparseembedding) object containing a sparse vector representation of the query |
| **Output variables** | `documents`: A list of documents |
| **API reference** | [Qdrant](/reference/integrations-qdrant) |
| **GitHub link** | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/qdrant |

</div>

## Overview

The `QdrantSparseEmbeddingRetriever` is a Retriever based on sparse embeddings, compatible with the [`QdrantDocumentStore`](../../document-stores/qdrant-document-store.mdx).

It compares the query sparse embedding with the sparse embeddings of the stored documents and fetches the documents most relevant to the query from the `QdrantDocumentStore`.
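
To make the comparison concrete, here is a minimal pure-Python sketch of scoring one document against a query. It assumes dot-product scoring over the indices the two sparse vectors share, which is how Qdrant documents sparse-vector similarity. The helper `sparse_dot` is hypothetical and not part of Haystack or Qdrant, and the retriever itself may apply further score processing.

```python
def sparse_dot(query_indices, query_values, doc_indices, doc_values):
    """Dot product of two sparse vectors given as parallel index/value lists."""
    # Map each document index to its weight for constant-time lookups.
    doc_weights = dict(zip(doc_indices, doc_values))
    # Only indices present in both vectors contribute to the score.
    return sum(
        value * doc_weights[index]
        for index, value in zip(query_indices, query_values)
        if index in doc_weights
    )


# Illustrative query and document vectors (hypothetical values).
query_indices, query_values = [0, 1, 2, 3], [0.1, 0.8, 0.05, 0.33]
doc_indices, doc_values = [0, 3, 5], [0.1, 0.5, 0.12]

score = sparse_dot(query_indices, query_values, doc_indices, doc_values)
print(round(score, 3))  # shared indices are 0 and 3: 0.1*0.1 + 0.33*0.5
```

Indices that appear in only one of the two vectors (here 1, 2, and 5) contribute nothing, so a document with no overlapping indices scores zero.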

When using the `QdrantSparseEmbeddingRetriever`, make sure that both the query and the documents have sparse embeddings. To do so, add a sparse document Embedder to your indexing pipeline and a sparse text Embedder to your query pipeline.

In addition to the `query_sparse_embedding`, the `QdrantSparseEmbeddingRetriever` accepts other optional parameters, including `top_k` (the maximum number of documents to retrieve) and `filters` to narrow down the search space.
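
As a rough mental model of how these two parameters interact, the following pure-Python sketch first narrows the candidates with a filter and then keeps the `top_k` best-scoring ones. It is not the retriever's implementation: `matches` and `retrieve` are hypothetical helpers, the documents are plain dictionaries with made-up scores, and only an equality condition is supported. The filter dictionary follows the shape of Haystack's metadata filter syntax.

```python
from typing import Any, Optional


def matches(meta: dict[str, Any], condition: dict[str, Any]) -> bool:
    """Evaluate a single equality condition against a document's metadata."""
    if condition["operator"] != "==":
        raise ValueError("this sketch only supports the '==' operator")
    # Filter fields are written as "meta.<key>"; strip the prefix for lookup.
    key = condition["field"].removeprefix("meta.")
    return meta.get(key) == condition["value"]


def retrieve(scored_docs, filters: Optional[dict[str, Any]] = None, top_k: int = 10):
    """Narrow the candidates with the filter, then keep the top_k best scores."""
    candidates = [
        doc for doc in scored_docs if filters is None or matches(doc["meta"], filters)
    ]
    candidates.sort(key=lambda doc: doc["score"], reverse=True)
    return candidates[:top_k]


# Hypothetical documents with precomputed relevance scores.
scored_docs = [
    {"content": "Berlin guide", "meta": {"lang": "en"}, "score": 0.42},
    {"content": "Stadtplan Berlin", "meta": {"lang": "de"}, "score": 0.90},
    {"content": "Munich guide", "meta": {"lang": "en"}, "score": 0.77},
    {"content": "Hamburg guide", "meta": {"lang": "en"}, "score": 0.10},
]

results = retrieve(
    scored_docs,
    filters={"field": "meta.lang", "operator": "==", "value": "en"},
    top_k=2,
)
print([doc["content"] for doc in results])
```

The highest-scoring document overall is excluded because it does not match the filter, which shows that `top_k` caps the results only after the search space has been narrowed.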

:::note[Sparse Embedding Support]

To use Sparse Embedding support, you need to initialize the `QdrantDocumentStore` with `use_sparse_embeddings=True`, which is `False` by default.

If you want to use a Document Store or collection that was previously created with this feature disabled, you must migrate the existing data. You can do this with the `migrate_to_sparse_embeddings_support` utility function.
:::

### Installation

To start using Qdrant with Haystack, first install the package with:

```shell
pip install qdrant-haystack
```

## Usage

### On its own

This Retriever needs the `QdrantDocumentStore` and indexed documents to run.

```python
from haystack_integrations.components.retrievers.qdrant import (
    QdrantSparseEmbeddingRetriever,
)
from haystack_integrations.document_stores.qdrant import QdrantDocumentStore
from haystack.dataclasses import Document, SparseEmbedding

# In-memory Qdrant instance with sparse embedding support enabled
document_store = QdrantDocumentStore(
    ":memory:",
    use_sparse_embeddings=True,
    recreate_index=True,
    return_embedding=True,
)

# A document with a hand-written sparse embedding, for illustration only
doc = Document(
    content="test",
    sparse_embedding=SparseEmbedding(indices=[0, 3, 5], values=[0.1, 0.5, 0.12]),
)
document_store.write_documents([doc])

retriever = QdrantSparseEmbeddingRetriever(document_store=document_store)
sparse_embedding = SparseEmbedding(indices=[0, 1, 2, 3], values=[0.1, 0.8, 0.05, 0.33])
retriever.run(query_sparse_embedding=sparse_embedding)
```

### In a pipeline

In Haystack, you can compute sparse embeddings using Fastembed Embedders.

First, install the package with:

```shell
pip install fastembed-haystack
```

Then, try out this pipeline:

```python
from haystack import Document, Pipeline
from haystack.components.writers import DocumentWriter
from haystack_integrations.components.retrievers.qdrant import (
    QdrantSparseEmbeddingRetriever,
)
from haystack_integrations.document_stores.qdrant import QdrantDocumentStore
from haystack.document_stores.types import DuplicatePolicy
from haystack_integrations.components.embedders.fastembed import (
    FastembedSparseDocumentEmbedder,
    FastembedSparseTextEmbedder,
)

document_store = QdrantDocumentStore(
    ":memory:",
    recreate_index=True,
    use_sparse_embeddings=True,
)

documents = [
    Document(content="My name is Wolfgang and I live in Berlin"),
    Document(content="I saw a black horse running"),
    Document(content="Germany has many big cities"),
    Document(content="fastembed is supported by and maintained by Qdrant."),
]

sparse_document_embedder = FastembedSparseDocumentEmbedder()
writer = DocumentWriter(document_store=document_store, policy=DuplicatePolicy.OVERWRITE)

# Indexing pipeline: compute sparse embeddings, then write the documents
indexing_pipeline = Pipeline()
indexing_pipeline.add_component("sparse_document_embedder", sparse_document_embedder)
indexing_pipeline.add_component("writer", writer)
indexing_pipeline.connect("sparse_document_embedder", "writer")

indexing_pipeline.run({"sparse_document_embedder": {"documents": documents}})

# Query pipeline: embed the query text, then retrieve by sparse embedding
query_pipeline = Pipeline()
query_pipeline.add_component("sparse_text_embedder", FastembedSparseTextEmbedder())
query_pipeline.add_component(
    "sparse_retriever",
    QdrantSparseEmbeddingRetriever(document_store=document_store),
)
query_pipeline.connect(
    "sparse_text_embedder.sparse_embedding",
    "sparse_retriever.query_sparse_embedding",
)

query = "Who supports fastembed?"

result = query_pipeline.run({"sparse_text_embedder": {"text": query}})

print(result["sparse_retriever"]["documents"][0])  # noqa: T201

## Document(id=...,
## content: 'fastembed is supported by and maintained by Qdrant.',
## score: 0.758..)
```

## Additional References

🧑‍🍳 Cookbook: [Sparse Embedding Retrieval with Qdrant and FastEmbed](https://haystack.deepset.ai/cookbook/sparse_embedding_retrieval)