---
title: "ChromaEmbeddingRetriever"
id: chromaembeddingretriever
slug: "/chromaembeddingretriever"
description: "This is an embedding Retriever compatible with the Chroma Document Store."
---

# ChromaEmbeddingRetriever

This is an embedding Retriever compatible with the Chroma Document Store.

<div className="key-value-table">

| | |
| --- | --- |
| **Most common position in a pipeline** | 1. After a Text Embedder and before a [`PromptBuilder`](../builders/promptbuilder.mdx) in a RAG pipeline <br /> 2. The last component in the semantic search pipeline <br /> 3. After a Text Embedder and before an [`ExtractiveReader`](../readers/extractivereader.mdx) in an extractive QA pipeline |
| **Mandatory init variables** | `document_store`: An instance of a [ChromaDocumentStore](../../document-stores/chromadocumentstore.mdx) |
| **Mandatory run variables** | `query_embedding`: A list of floats |
| **Output variables** | `documents`: A list of documents |
| **API reference** | [Chroma](/reference/integrations-chroma) |
| **GitHub link** | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/chroma |

</div>

## Overview

The `ChromaEmbeddingRetriever` is an embedding-based Retriever compatible with the `ChromaDocumentStore`. It compares the query embedding with the document embeddings and returns the documents from the `ChromaDocumentStore` that are most similar to the query.

The query needs to be embedded before being passed to this component. For example, you could use a text [embedder](../embedders.mdx) component.

In addition to the `query_embedding`, the `ChromaEmbeddingRetriever` accepts other optional parameters, including `top_k` (the maximum number of documents to retrieve) and `filters` to narrow down the search space.

### Usage

#### On its own

This Retriever needs the `ChromaDocumentStore` and indexed documents to run.
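
For intuition, the roles of `query_embedding`, `top_k`, and `filters` can be pictured with a small plain-Python sketch. It is only an illustration, not Chroma's or Haystack's implementation. The `retrieve` helper, the dictionary-based documents, and the exact-match `filters` dictionary are all hypothetical, and cosine similarity is assumed purely for the example (the metric actually used depends on how the Chroma collection is configured).

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_embedding, docs, top_k=10, filters=None):
    """Hypothetical helper: return up to `top_k` docs, most similar first.

    `filters` is a simplified exact-match dict on metadata,
    not Haystack's filter syntax.
    """
    filters = filters or {}
    # Narrow the search space first, then rank what is left.
    candidates = [
        d for d in docs
        if all(d["meta"].get(key) == value for key, value in filters.items())
    ]
    scored = [(cosine_similarity(query_embedding, d["embedding"]), d) for d in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:top_k]]


# Toy 2-dimensional embeddings, made up for the illustration.
docs = [
    {"content": "variable declarations", "meta": {"title": "one"}, "embedding": [1.0, 0.0]},
    {"content": "a random doc", "meta": {"title": "four"}, "embedding": [0.0, 1.0]},
    {"content": "more declarations", "meta": {"title": "two"}, "embedding": [0.8, 0.6]},
]

top = retrieve([1.0, 0.1], docs, top_k=2)
print([d["meta"]["title"] for d in top])
```

In this sketch, `top_k` caps the number of results and `filters` removes candidates before ranking. With a real `ChromaDocumentStore`, the Retriever is called like this: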

```python
from haystack_integrations.document_stores.chroma import ChromaDocumentStore
from haystack_integrations.components.retrievers.chroma import ChromaEmbeddingRetriever

document_store = ChromaDocumentStore()

retriever = ChromaEmbeddingRetriever(document_store=document_store)

# Example run with a dummy 384-dimensional query embedding
retriever.run(query_embedding=[0.1] * 384)
```

#### In a pipeline

Here is how you could use the `ChromaEmbeddingRetriever` in a pipeline. This example creates two pipelines: an indexing one and a querying one.

In the indexing pipeline, the documents are passed to the Document Embedder and then written into the Document Store.

Then, in the querying pipeline, a text embedder produces the vector representation of the input query, which is then passed to the `ChromaEmbeddingRetriever` to get the results.

```python
from haystack import Pipeline
from haystack.dataclasses import Document
from haystack.components.writers import DocumentWriter

# Note: the following requires a "pip install sentence-transformers"
from haystack.components.embedders import (
    SentenceTransformersDocumentEmbedder,
    SentenceTransformersTextEmbedder,
)

from haystack_integrations.document_stores.chroma import ChromaDocumentStore
from haystack_integrations.components.retrievers.chroma import ChromaEmbeddingRetriever

# Chroma runs in-memory here, so both pipelines below share the same document store instance
document_store = ChromaDocumentStore()

documents = [
    Document(content="This contains variable declarations", meta={"title": "one"}),
    Document(
        content="This contains another sort of variable declarations",
        meta={"title": "two"},
    ),
    Document(
        content="This has nothing to do with variable declarations",
        meta={"title": "three"},
    ),
    Document(content="A random doc", meta={"title": "four"}),
]

# Indexing pipeline: embed the documents, then write them to the store
indexing = Pipeline()
indexing.add_component("embedder", SentenceTransformersDocumentEmbedder())
indexing.add_component("writer", DocumentWriter(document_store))
indexing.connect("embedder.documents", "writer.documents")
indexing.run({"embedder": {"documents": documents}})

# Querying pipeline: embed the query text, then retrieve similar documents
querying = Pipeline()
querying.add_component("query_embedder", SentenceTransformersTextEmbedder())
querying.add_component("retriever", ChromaEmbeddingRetriever(document_store))
querying.connect("query_embedder.embedding", "retriever.query_embedding")
results = querying.run({"query_embedder": {"text": "Variable declarations"}})

for d in results["retriever"]["documents"]:
    print(d.meta, d.score)
```

## Additional References

🧑‍🍳 Cookbook: [Use Chroma for RAG and Indexing](https://haystack.deepset.ai/cookbook/chroma-indexing-and-rag-examples)