---
title: "ChromaQueryTextRetriever"
id: chromaqueryretriever
slug: "/chromaqueryretriever"
description: "This is a Retriever compatible with the Chroma Document Store."
---

# ChromaQueryTextRetriever

This is a Retriever compatible with the Chroma Document Store.

<div className="key-value-table">

|  |  |
| --- | --- |
| **Most common position in a pipeline** | 1. Before a [`PromptBuilder`](../builders/promptbuilder.mdx) in a RAG pipeline<br />2. The last component in the semantic search pipeline<br />3. Before an [`ExtractiveReader`](../readers/extractivereader.mdx) in an extractive QA pipeline |
| **Mandatory init variables** | `document_store`: An instance of a [ChromaDocumentStore](../../document-stores/chromadocumentstore.mdx) |
| **Mandatory run variables** | `query`: A single query in plain-text format to be processed by the [Retriever](../retrievers.mdx) |
| **Output variables** | `documents`: A list of documents |
| **API reference** | [Chroma](/reference/integrations-chroma) |
| **GitHub link** | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/chroma |

</div>

## Overview

The `ChromaQueryTextRetriever` is an embedding-based Retriever compatible with the `ChromaDocumentStore` that uses the Chroma [query API](https://docs.trychroma.com/reference/Collection#query).
This component takes a plain-text query string as input and returns the matching documents.
Chroma creates the query embedding with its [embedding function](https://docs.trychroma.com/embeddings#default-all-minilm-l6-v2). If you don't want to use the default embedding function, you must specify a different one when you initialize the `ChromaDocumentStore` (with its `embedding_function` parameter).
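
Conceptually, the query text is embedded with the same function as the stored documents, the documents are ranked by their distance to the query embedding, and the closest `top_k` are returned. The following is a toy, dependency-free sketch of that flow under stated assumptions: `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model, `l2_distance` and `query_text` are hypothetical helpers, and L2 (Euclidean) distance is assumed as the metric. It is not how Chroma is implemented.

```python
import math
from collections import Counter


# Hypothetical toy embedding function: bag-of-words counts over a fixed vocabulary.
# A real Chroma embedding function uses a trained model instead.
def toy_embed(text: str, vocab: list[str]) -> list[float]:
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocab]


# Euclidean (L2) distance between two vectors; smaller means more similar.
def l2_distance(a: list[float], b: list[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical helper that mimics the retrieval flow; not Chroma's or Haystack's API.
def query_text(query: str, docs: list[str], vocab: list[str], top_k: int = 2) -> list[str]:
    # 1. Embed the query with the same function used for the documents.
    query_vec = toy_embed(query, vocab)
    # 2. Rank documents by their distance to the query embedding.
    ranked = sorted(docs, key=lambda doc: l2_distance(toy_embed(doc, vocab), query_vec))
    # 3. Keep only the top_k closest documents.
    return ranked[:top_k]


vocab = ["variable", "declarations", "random", "doc"]
docs = ["This contains variable declarations", "A random doc", "Nothing relevant here"]
print(query_text("Variable declarations", docs, vocab, top_k=1))
# ['This contains variable declarations']
```

Because the same function embeds both the documents and the query, changing it changes the ranking. With Chroma, that choice is made when the `ChromaDocumentStore` is initialized, not when the Retriever runs.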

### Usage

#### On its own

This Retriever needs the `ChromaDocumentStore` and indexed documents to run.

```python
from haystack_integrations.document_stores.chroma import ChromaDocumentStore
from haystack_integrations.components.retrievers.chroma import ChromaQueryTextRetriever

document_store = ChromaDocumentStore()

retriever = ChromaQueryTextRetriever(document_store=document_store)

# Run an example query; the result is a dictionary with a "documents" list
retriever.run(query="How does Chroma Retriever work?")
```

#### In a pipeline

Here is how you can use the `ChromaQueryTextRetriever` in a pipeline. This example creates two pipelines: an indexing one and a querying one.

In the indexing pipeline, the documents are written to the Document Store.

Then, in the querying pipeline, the `ChromaQueryTextRetriever` retrieves the documents that best match the provided query from the Document Store.

```python
from haystack import Pipeline
from haystack.dataclasses import Document
from haystack.components.writers import DocumentWriter

from haystack_integrations.document_stores.chroma import ChromaDocumentStore
from haystack_integrations.components.retrievers.chroma import ChromaQueryTextRetriever

# Chroma runs in memory here, so both pipelines below share the same document store instance
document_store = ChromaDocumentStore()

documents = [
    Document(content="This contains variable declarations", meta={"title": "one"}),
    Document(
        content="This contains another sort of variable declarations",
        meta={"title": "two"},
    ),
    Document(
        content="This has nothing to do with variable declarations",
        meta={"title": "three"},
    ),
    Document(content="A random doc", meta={"title": "four"}),
]

# Indexing pipeline: write the documents to the document store
indexing = Pipeline()
indexing.add_component("writer", DocumentWriter(document_store))
indexing.run({"writer": {"documents": documents}})

# Querying pipeline: retrieve the three documents that best match the query
querying = Pipeline()
querying.add_component("retriever", ChromaQueryTextRetriever(document_store))
results = querying.run({"retriever": {"query": "Variable declarations", "top_k": 3}})

# Print the metadata and score of each retrieved document
for d in results["retriever"]["documents"]:
    print(d.meta, d.score)
```

## Additional References

🧑‍🍳 Cookbook: [Use Chroma for RAG and Indexing](https://haystack.deepset.ai/cookbook/chroma-indexing-and-rag-examples)