/ docs-website / docs / pipeline-components / retrievers / qdrantembeddingretriever.mdx
---
title: "QdrantEmbeddingRetriever"
id: qdrantembeddingretriever
slug: "/qdrantembeddingretriever"
description: "An embedding-based Retriever compatible with the Qdrant Document Store."
---

# QdrantEmbeddingRetriever

An embedding-based Retriever compatible with the Qdrant Document Store.

<div className="key-value-table">

|  |  |
| --- | --- |
| **Most common position in a pipeline** | 1\. After a Text Embedder and before a [`PromptBuilder`](../builders/promptbuilder.mdx) in a RAG pipeline <br />2. The last component in a semantic search pipeline <br />3. After a Text Embedder and before an [`ExtractiveReader`](../readers/extractivereader.mdx) in an extractive QA pipeline |
| **Mandatory init variables** | `document_store`: An instance of a [QdrantDocumentStore](../../document-stores/qdrant-document-store.mdx) |
| **Mandatory run variables** | `query_embedding`: A vector representing the query (a list of floats) |
| **Output variables** | `documents`: A list of documents |
| **API reference** | [Qdrant](/reference/integrations-qdrant) |
| **GitHub link** | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/qdrant |

</div>

## Overview

The `QdrantEmbeddingRetriever` is an embedding-based Retriever compatible with the `QdrantDocumentStore`. It compares the query embedding with the Document embeddings and returns the Documents from the `QdrantDocumentStore` whose embeddings are most similar to the query.
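
As a rough illustration of what comparing embeddings means, the sketch below ranks a few toy vectors by cosine similarity in plain Python. It is not how Qdrant works internally (Qdrant typically relies on an HNSW index rather than scoring every vector), and the `rank_by_cosine` helper and the three-dimensional vectors are made up for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_cosine(query_embedding, doc_embeddings, top_k=2):
    """Hypothetical helper: return the top_k (doc_id, score) pairs, best first."""
    scored = [
        (doc_id, cosine_similarity(query_embedding, embedding))
        for doc_id, embedding in doc_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional embeddings; real models produce hundreds of dimensions.
toy_embeddings = {
    "languages": [0.9, 0.1, 0.0],
    "elephants": [0.1, 0.9, 0.0],
    "waves": [0.0, 0.2, 0.9],
}

ranked = rank_by_cosine([1.0, 0.0, 0.0], toy_embeddings, top_k=2)
print(ranked)  # "languages" ranks first, then "elephants"
```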

When using the `QdrantEmbeddingRetriever`, make sure that both the query embedding and the Document embeddings are available. To produce them, add a Document Embedder to your indexing Pipeline and a Text Embedder to your query Pipeline.

In addition to the `query_embedding`, the `QdrantEmbeddingRetriever` accepts other optional parameters, including `top_k` (the maximum number of Documents to retrieve) and `filters` to narrow down the search space.
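
The optional parameters could be passed at query time roughly as follows. The filter uses Haystack's dictionary filter syntax. The `meta.category` field, the `retrieve` helper, and the placeholder vector are assumptions made for illustration only.

```python
# Metadata filter: keep only Documents whose meta["category"] equals "animals".
# "category" is a made-up field; use a metadata field your Documents carry.
filters = {"field": "meta.category", "operator": "==", "value": "animals"}

run_args = {
    "query_embedding": [0.1] * 768,  # placeholder vector, not a real embedding
    "top_k": 3,  # return at most three Documents
    "filters": filters,
}


def retrieve(retriever):
    """Hypothetical helper: call an already constructed QdrantEmbeddingRetriever."""
    return retriever.run(**run_args)["documents"]
```

Calling `retrieve(retriever)` with a Retriever backed by a populated Document Store would return at most three Documents that match the filter.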

Some parameters that affect embedding retrieval must be defined when the corresponding `QdrantDocumentStore` is initialized. These include the embedding dimension (`embedding_dim`), the `similarity` function to use when comparing embeddings, and the HNSW configuration (`hnsw_config`).
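
A minimal sketch of setting these parameters at initialization time is shown below. The values are illustrative assumptions rather than recommendations, and `build_document_store` is a hypothetical helper. The import sits inside the helper so the snippet can be read and loaded even without the integration installed.

```python
# Illustrative values only; pick them to match your embedding model and data.
store_args = {
    "embedding_dim": 384,  # must equal the length of the vectors your embedder produces
    "similarity": "cosine",  # function used to compare embeddings
    "hnsw_config": {"m": 16, "ef_construct": 100},  # passed on to Qdrant's HNSW index settings
}


def build_document_store():
    """Hypothetical helper: create an in-memory store with the settings above."""
    from haystack_integrations.document_stores.qdrant import QdrantDocumentStore

    return QdrantDocumentStore(":memory:", **store_args)
```

Because these settings belong to the Document Store, changing them later generally means recreating the collection and indexing the Documents again.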

### Installation

To start using Qdrant with Haystack, first install the package with:

```shell
pip install qdrant-haystack
```

### Usage

#### On its own

This Retriever needs the `QdrantDocumentStore` and indexed Documents to run.

```python
from haystack_integrations.components.retrievers.qdrant import QdrantEmbeddingRetriever
from haystack_integrations.document_stores.qdrant import QdrantDocumentStore

document_store = QdrantDocumentStore(
    ":memory:",
    recreate_index=True,
    return_embedding=True,
    wait_result_from_api=True,
)
retriever = QdrantEmbeddingRetriever(document_store=document_store)

# Use a fake vector to keep the example simple. Its length matches the
# Document Store's default `embedding_dim` of 768.
# No Documents are indexed here; write some to the store first to get results.
retriever.run(query_embedding=[0.1] * 768)
```

#### In a Pipeline

```python
from haystack.document_stores.types import DuplicatePolicy
from haystack import Document
from haystack import Pipeline
from haystack.components.embedders import (
    SentenceTransformersTextEmbedder,
    SentenceTransformersDocumentEmbedder,
)

from haystack_integrations.components.retrievers.qdrant import QdrantEmbeddingRetriever
from haystack_integrations.document_stores.qdrant import QdrantDocumentStore

document_store = QdrantDocumentStore(
    ":memory:",
    recreate_index=True,
    return_embedding=True,
    wait_result_from_api=True,
)

documents = [
    Document(content="There are over 7,000 languages spoken around the world today."),
    Document(
        content="Elephants have been observed to behave in a way that indicates a high level of self-awareness, such as recognizing themselves in mirrors.",
    ),
    Document(
        content="In certain parts of the world, like the Maldives, Puerto Rico, and San Diego, you can witness the phenomenon of bioluminescent waves.",
    ),
]

# Indexing: embed the Documents and write them to the Document Store.
document_embedder = SentenceTransformersDocumentEmbedder()
# Load the model explicitly; some Haystack releases require this before run().
document_embedder.warm_up()
documents_with_embeddings = document_embedder.run(documents)

document_store.write_documents(
    documents_with_embeddings.get("documents"),
    policy=DuplicatePolicy.OVERWRITE,
)

# Querying: embed the query text, then retrieve the most similar Documents.
query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", SentenceTransformersTextEmbedder())
query_pipeline.add_component(
    "retriever",
    QdrantEmbeddingRetriever(document_store=document_store),
)
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

query = "How many languages are there?"

result = query_pipeline.run({"text_embedder": {"text": query}})

# Print the best-matching Document.
print(result["retriever"]["documents"][0])
```