---
title: "ChromaEmbeddingRetriever"
id: chromaembeddingretriever
slug: "/chromaembeddingretriever"
description: "This is an embedding Retriever compatible with the Chroma Document Store."
---

# ChromaEmbeddingRetriever

This is an embedding Retriever compatible with the Chroma Document Store.

<div className="key-value-table">

|  |  |
| --- | --- |
| **Most common position in a pipeline** | 1. After a Text Embedder and before a [`PromptBuilder`](../builders/promptbuilder.mdx) in a RAG pipeline <br /> 2. The last component in the semantic search pipeline <br /> 3. After a Text Embedder and before an [`ExtractiveReader`](../readers/extractivereader.mdx) in an extractive QA pipeline |
| **Mandatory init variables** | `document_store`: An instance of a [ChromaDocumentStore](../../document-stores/chromadocumentstore.mdx) |
| **Mandatory run variables** | `query_embedding`: A list of floats |
| **Output variables** | `documents`: A list of documents |
| **API reference** | [Chroma](/reference/integrations-chroma) |
| **GitHub link** | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/chroma |

</div>

## Overview

The `ChromaEmbeddingRetriever` is an embedding-based Retriever compatible with the `ChromaDocumentStore`. It compares the query embedding with the document embeddings and returns the documents from the `ChromaDocumentStore` that are most similar to the query.

The query must be embedded before it is passed to this component, for example with a Text [Embedder](../embedders.mdx) component.

In addition to the `query_embedding`, the `ChromaEmbeddingRetriever` accepts other optional parameters, including `top_k` (the maximum number of documents to retrieve) and `filters` to narrow down the search space.
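
To make the roles of `query_embedding`, `top_k`, and `filters` concrete, here is a minimal, dependency-free sketch of the general idea behind embedding retrieval. It is not the integration's implementation: the `retrieve` helper, its `where` argument, and the use of cosine similarity are assumptions made for illustration, and Chroma computes the ranking internally with whatever distance function is configured for the collection.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_embedding, docs, top_k=10, where=None):
    """Hypothetical helper: filter by metadata, rank by similarity, keep top_k."""
    where = where or {}
    # Narrow the search space first: keep documents whose metadata matches every key.
    candidates = [
        d for d in docs
        if all(d["meta"].get(key) == value for key, value in where.items())
    ]
    # Rank the remaining documents from most to least similar to the query.
    ranked = sorted(
        candidates,
        key=lambda d: cosine_similarity(query_embedding, d["embedding"]),
        reverse=True,
    )
    # Return at most top_k documents.
    return ranked[:top_k]


# Toy two-dimensional embeddings; real embedders produce hundreds of dimensions.
docs = [
    {"content": "variable declarations", "meta": {"title": "one"}, "embedding": [1.0, 0.0]},
    {"content": "a random doc", "meta": {"title": "four"}, "embedding": [0.0, 1.0]},
    {"content": "more declarations", "meta": {"title": "two"}, "embedding": [0.8, 0.6]},
]
query = [1.0, 0.1]

top_two = retrieve(query, docs, top_k=2)
only_four = retrieve(query, docs, top_k=2, where={"title": "four"})
```

In this sketch, `top_k` only caps the number of results, while the filter removes documents before any similarity is computed, so a filtered query can return fewer than `top_k` documents.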

### Usage

#### On its own

This Retriever needs the `ChromaDocumentStore` and indexed documents to run.

```python
from haystack_integrations.document_stores.chroma import ChromaDocumentStore
from haystack_integrations.components.retrievers.chroma import ChromaEmbeddingRetriever

document_store = ChromaDocumentStore()

retriever = ChromaEmbeddingRetriever(document_store=document_store)

# Example query run with a placeholder 384-dimensional embedding
retriever.run(query_embedding=[0.1] * 384)
```

#### In a pipeline

Here is how you could use the `ChromaEmbeddingRetriever` in a pipeline. This example creates two pipelines: an indexing one and a querying one.

In the indexing pipeline, the documents are passed to the Document Embedder and then written into the Document Store.

In the querying pipeline, a Text Embedder produces the vector representation of the input query, which is then passed to the `ChromaEmbeddingRetriever` to get the results.

```python
from haystack import Pipeline
from haystack.dataclasses import Document
from haystack.components.writers import DocumentWriter

# Note: the following requires a "pip install sentence-transformers"
from haystack.components.embedders import (
    SentenceTransformersDocumentEmbedder,
    SentenceTransformersTextEmbedder,
)

from haystack_integrations.document_stores.chroma import ChromaDocumentStore
from haystack_integrations.components.retrievers.chroma import ChromaEmbeddingRetriever

# Chroma is used in-memory, so the same instance is shared by the two pipelines below
document_store = ChromaDocumentStore()

documents = [
    Document(content="This contains variable declarations", meta={"title": "one"}),
    Document(
        content="This contains another sort of variable declarations",
        meta={"title": "two"},
    ),
    Document(
        content="This has nothing to do with variable declarations",
        meta={"title": "three"},
    ),
    Document(content="A random doc", meta={"title": "four"}),
]

# Indexing pipeline: embed the documents, then write them to the Document Store
indexing = Pipeline()
indexing.add_component("embedder", SentenceTransformersDocumentEmbedder())
indexing.add_component("writer", DocumentWriter(document_store))
indexing.connect("embedder.documents", "writer.documents")
indexing.run({"embedder": {"documents": documents}})

# Querying pipeline: embed the query text, then retrieve the most similar documents
querying = Pipeline()
querying.add_component("query_embedder", SentenceTransformersTextEmbedder())
querying.add_component("retriever", ChromaEmbeddingRetriever(document_store))
querying.connect("query_embedder.embedding", "retriever.query_embedding")
results = querying.run({"query_embedder": {"text": "Variable declarations"}})

# Print the metadata and score of each retrieved document
for d in results["retriever"]["documents"]:
    print(d.meta, d.score)
```

## Additional References

🧑‍🍳 Cookbook: [Use Chroma for RAG and Indexing](https://haystack.deepset.ai/cookbook/chroma-indexing-and-rag-examples)