  1  ---
  2  title: "Chroma"
  3  id: integrations-chroma
  4  description: "Chroma integration for Haystack"
  5  slug: "/integrations-chroma"
  6  ---
  7  
  8  
  9  ## haystack_integrations.components.retrievers.chroma.retriever
 10  
 11  ### ChromaQueryTextRetriever
 12  
 13  A component for retrieving documents from a [Chroma database](https://docs.trychroma.com/) using the `query` API.
 14  
 15  Example usage:
 16  
 17  ```python
 18  from haystack import Pipeline
 19  from haystack.components.converters import TextFileToDocument
 20  from haystack.components.writers import DocumentWriter
 21  
 22  from haystack_integrations.document_stores.chroma import ChromaDocumentStore
 23  from haystack_integrations.components.retrievers.chroma import ChromaQueryTextRetriever
 24  
 25  file_paths = ...
 26  
 27  # Chroma is used in-memory so we use the same instances in the two pipelines below
 28  document_store = ChromaDocumentStore()
 29  
 30  indexing = Pipeline()
 31  indexing.add_component("converter", TextFileToDocument())
 32  indexing.add_component("writer", DocumentWriter(document_store))
 33  indexing.connect("converter", "writer")
 34  indexing.run({"converter": {"sources": file_paths}})
 35  
 36  querying = Pipeline()
 37  querying.add_component("retriever", ChromaQueryTextRetriever(document_store))
 38  results = querying.run({"retriever": {"query": "Variable declarations", "top_k": 3}})
 39  
 40  for d in results["retriever"]["documents"]:
 41      print(d.meta, d.score)
 42  ```
 43  
 44  #### __init__
 45  
 46  ```python
 47  __init__(
 48      document_store: ChromaDocumentStore,
 49      filters: dict[str, Any] | None = None,
 50      top_k: int = 10,
 51      filter_policy: str | FilterPolicy = FilterPolicy.REPLACE,
 52  )
 53  ```
 54  
 55  **Parameters:**
 56  
 57  - **document_store** (<code>ChromaDocumentStore</code>) – an instance of `ChromaDocumentStore`.
 58  - **filters** (<code>dict\[str, Any\] | None</code>) – filters to narrow down the search space.
 59  - **top_k** (<code>int</code>) – the maximum number of documents to retrieve.
 60  - **filter_policy** (<code>str | FilterPolicy</code>) – Policy to determine how filters are applied.
 61  
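As a rough illustration of what `filter_policy` controls, the sketch below resolves init-time and runtime filters in plain Python. It does not import Haystack: the `FilterPolicy` enum is a local stand-in, `resolve_filters` is a hypothetical helper, and the `MERGE` branch simply ANDs the two filters together, which is likely simpler than the merging Haystack actually performs.

```python
from enum import Enum
from typing import Any, Optional


class FilterPolicy(Enum):
    """Local stand-in for Haystack's FilterPolicy, for illustration only."""

    REPLACE = "replace"
    MERGE = "merge"


def resolve_filters(
    policy: FilterPolicy,
    init_filters: Optional[dict[str, Any]],
    runtime_filters: Optional[dict[str, Any]],
) -> Optional[dict[str, Any]]:
    """Pick the filters a retriever would apply for one `run` call."""
    if runtime_filters is None:
        # Nothing passed at run time: fall back to the init-time filters.
        return init_filters
    if policy is FilterPolicy.REPLACE or init_filters is None:
        # REPLACE: runtime filters override the init-time ones entirely.
        return runtime_filters
    # MERGE (simplified assumption): require both filters to match.
    return {"operator": "AND", "conditions": [init_filters, runtime_filters]}


init_f = {"field": "meta.lang", "operator": "==", "value": "en"}
run_f = {"field": "meta.year", "operator": ">=", "value": 2020}

replaced = resolve_filters(FilterPolicy.REPLACE, init_f, run_f)
merged = resolve_filters(FilterPolicy.MERGE, init_f, run_f)
print(replaced)
print(merged)
```

With the default `REPLACE` policy, only the runtime filter survives; with `MERGE`, both conditions are kept.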
 62  #### run
 63  
 64  ```python
 65  run(
 66      query: str, filters: dict[str, Any] | None = None, top_k: int | None = None
 67  ) -> dict[str, Any]
 68  ```
 69  
 70  Run the retriever on the given input data.
 71  
 72  **Parameters:**
 73  
 74  - **query** (<code>str</code>) – The input data for the retriever. In this case, a plain-text query.
 75  - **filters** (<code>dict\[str, Any\] | None</code>) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on
 76    the `filter_policy` chosen at retriever initialization. See init method docstring for more
 77    details.
 78  - **top_k** (<code>int | None</code>) – The maximum number of documents to retrieve.
 79    If not specified, the default value from the constructor is used.
 80  
 81  **Returns:**
 82  
 83  - <code>dict\[str, Any\]</code> – A dictionary with the following keys:
 84    - `documents`: List of documents returned by the search engine.
 85  
 86  **Raises:**
 87  
 88  - <code>ValueError</code> – If the specified document store is not found or is not a `ChromaDocumentStore` instance.
 89  
 90  #### run_async
 91  
 92  ```python
 93  run_async(
 94      query: str, filters: dict[str, Any] | None = None, top_k: int | None = None
 95  ) -> dict[str, Any]
 96  ```
 97  
 98  Asynchronously run the retriever on the given input data.
 99  
100  Asynchronous methods are only supported for HTTP connections.
101  
102  **Parameters:**
103  
104  - **query** (<code>str</code>) – The input data for the retriever. In this case, a plain-text query.
105  - **filters** (<code>dict\[str, Any\] | None</code>) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on
106    the `filter_policy` chosen at retriever initialization. See init method docstring for more
107    details.
108  - **top_k** (<code>int | None</code>) – The maximum number of documents to retrieve.
109    If not specified, the default value from the constructor is used.
110  
111  **Returns:**
112  
113  - <code>dict\[str, Any\]</code> – A dictionary with the following keys:
114    - `documents`: List of documents returned by the search engine.
115  
116  **Raises:**
117  
118  - <code>ValueError</code> – If the specified document store is not found or is not a `ChromaDocumentStore` instance.
119  
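The awaiting pattern for `run_async` can be sketched with a stand-in object. `StubRetriever` below is hypothetical and only mimics the `run_async` signature shown above, so the snippet runs without a Chroma server. With the real component, the document store would have to be created with `host` and `port`, since asynchronous methods require an HTTP connection.

```python
import asyncio
from typing import Any, Optional


class StubRetriever:
    """Hypothetical stand-in that mimics the `run_async` signature."""

    def __init__(self, top_k: int = 10) -> None:
        self.top_k = top_k

    async def run_async(
        self,
        query: str,
        filters: Optional[dict[str, Any]] = None,
        top_k: Optional[int] = None,
    ) -> dict[str, Any]:
        # Fall back to the constructor default when top_k is not given.
        limit = self.top_k if top_k is None else top_k
        await asyncio.sleep(0)  # placeholder for the network round trip
        hits = [f"{query} #{i}" for i in range(3)]
        return {"documents": hits[:limit]}


async def main() -> list[dict[str, Any]]:
    retriever = StubRetriever(top_k=2)
    # Independent queries can be awaited concurrently.
    return await asyncio.gather(
        retriever.run_async("variable declarations"),
        retriever.run_async("loops", top_k=1),
    )


results = asyncio.run(main())
print(results)
```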
120  #### from_dict
121  
122  ```python
123  from_dict(data: dict[str, Any]) -> ChromaQueryTextRetriever
124  ```
125  
126  Deserializes the component from a dictionary.
127  
128  **Parameters:**
129  
130  - **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from.
131  
132  **Returns:**
133  
134  - <code>ChromaQueryTextRetriever</code> – Deserialized component.
135  
136  #### to_dict
137  
138  ```python
139  to_dict() -> dict[str, Any]
140  ```
141  
142  Serializes the component to a dictionary.
143  
144  **Returns:**
145  
146  - <code>dict\[str, Any\]</code> – Dictionary with serialized data.
147  
148  ### ChromaEmbeddingRetriever
149  
150  A component for retrieving documents from a [Chroma database](https://docs.trychroma.com/) using embeddings.
151  
152  #### __init__
153  
154  ```python
155  __init__(
156      document_store: ChromaDocumentStore,
157      filters: dict[str, Any] | None = None,
158      top_k: int = 10,
159      filter_policy: str | FilterPolicy = FilterPolicy.REPLACE,
160  )
161  ```
162  
163  **Parameters:**
164  
165  - **document_store** (<code>ChromaDocumentStore</code>) – an instance of `ChromaDocumentStore`.
166  - **filters** (<code>dict\[str, Any\] | None</code>) – filters to narrow down the search space.
167  - **top_k** (<code>int</code>) – the maximum number of documents to retrieve.
168  - **filter_policy** (<code>str | FilterPolicy</code>) – Policy to determine how filters are applied.
169  
170  #### run
171  
172  ```python
173  run(
174      query_embedding: list[float],
175      filters: dict[str, Any] | None = None,
176      top_k: int | None = None,
177  ) -> dict[str, Any]
178  ```
179  
180  Run the retriever on the given input data.
181  
182  **Parameters:**
183  
184  - **query_embedding** (<code>list\[float\]</code>) – the embedding of the query.
185  - **filters** (<code>dict\[str, Any\] | None</code>) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on
186    the `filter_policy` chosen at retriever initialization. See init method docstring for more
187    details.
188  - **top_k** (<code>int | None</code>) – the maximum number of documents to retrieve.
189    If not specified, the default value from the constructor is used.
190  
191  **Returns:**
192  
193  - <code>dict\[str, Any\]</code> – a dictionary with the following keys:
194    - `documents`: List of documents returned by the search engine.
195  
196  #### run_async
197  
198  ```python
199  run_async(
200      query_embedding: list[float],
201      filters: dict[str, Any] | None = None,
202      top_k: int | None = None,
203  ) -> dict[str, Any]
204  ```
205  
206  Asynchronously run the retriever on the given input data.
207  
208  Asynchronous methods are only supported for HTTP connections.
209  
210  **Parameters:**
211  
212  - **query_embedding** (<code>list\[float\]</code>) – the embedding of the query.
213  - **filters** (<code>dict\[str, Any\] | None</code>) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on
214    the `filter_policy` chosen at retriever initialization. See init method docstring for more
215    details.
216  - **top_k** (<code>int | None</code>) – the maximum number of documents to retrieve.
217    If not specified, the default value from the constructor is used.
218  
219  **Returns:**
220  
221  - <code>dict\[str, Any\]</code> – a dictionary with the following keys:
222    - `documents`: List of documents returned by the search engine.
223  
224  #### from_dict
225  
226  ```python
227  from_dict(data: dict[str, Any]) -> ChromaEmbeddingRetriever
228  ```
229  
230  Deserializes the component from a dictionary.
231  
232  **Parameters:**
233  
234  - **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from.
235  
236  **Returns:**
237  
238  - <code>ChromaEmbeddingRetriever</code> – Deserialized component.
239  
240  #### to_dict
241  
242  ```python
243  to_dict() -> dict[str, Any]
244  ```
245  
246  Serializes the component to a dictionary.
247  
248  **Returns:**
249  
250  - <code>dict\[str, Any\]</code> – Dictionary with serialized data.
251  
252  ## haystack_integrations.document_stores.chroma.document_store
253  
254  ### ChromaDocumentStore
255  
256  A document store using [Chroma](https://docs.trychroma.com/) as the backend.
257  
258  We use the `collection.get` API to implement the document store protocol;
259  the retrievers use the `collection.query` API instead.
260  
261  #### __init__
262  
263  ```python
264  __init__(
265      collection_name: str = "documents",
266      embedding_function: str = "default",
267      persist_path: str | None = None,
268      host: str | None = None,
269      port: int | None = None,
270      distance_function: Literal["l2", "cosine", "ip"] = "l2",
271      metadata: dict | None = None,
272      client_settings: dict[str, Any] | None = None,
273      **embedding_function_params: Any
274  )
275  ```
276  
277  Creates a new ChromaDocumentStore instance.
278  It is meant to be connected to a Chroma collection.
279  
280  Note: for the component to be part of a serializable pipeline, the `__init__`
281  parameters must be serializable. This is why the embedding function is configured
282  through a registry, by passing its name as a string.
283  
284  **Parameters:**
285  
286  - **collection_name** (<code>str</code>) – the name of the collection to use in the database.
287  - **embedding_function** (<code>str</code>) – the name of the embedding function to use to embed the query
288  - **persist_path** (<code>str | None</code>) – Path for local persistent storage. Cannot be used in combination with `host` and `port`.
289    If none of `persist_path`, `host`, and `port` is specified, the database will be `in-memory`.
290  - **host** (<code>str | None</code>) – The host address for the remote Chroma HTTP client connection. Cannot be used with `persist_path`.
291  - **port** (<code>int | None</code>) – The port number for the remote Chroma HTTP client connection. Cannot be used with `persist_path`.
292  - **distance_function** (<code>Literal['l2', 'cosine', 'ip']</code>) – The distance metric for the embedding space.
293    - `"l2"` computes the Euclidean (straight-line) distance between vectors,
294      where smaller scores indicate more similarity.
295    - `"cosine"` computes the cosine similarity between vectors,
296      with higher scores indicating greater similarity.
297    - `"ip"` stands for inner product, where higher scores indicate greater similarity between vectors.
298      **Note**: `distance_function` can only be set during the creation of a collection.
299      To change the distance metric of an existing collection, consider cloning the collection.
300  - **metadata** (<code>dict | None</code>) – a dictionary of chromadb collection parameters passed directly to chromadb's client
301    method `create_collection`. If it contains the key `"hnsw:space"`, the value will take precedence over the
302    `distance_function` parameter above.
303  - **client_settings** (<code>dict\[str, Any\] | None</code>) – a dictionary of Chroma Settings configuration options passed to
304    `chromadb.config.Settings`. These settings configure the underlying Chroma client behavior.
305    For available options, see [Chroma's config.py](https://github.com/chroma-core/chroma/blob/main/chromadb/config.py).
306    **Note**: specifying these settings may interfere with standard client initialization parameters.
307    This option is intended for advanced customization.
308  - **embedding_function_params** (<code>Any</code>) – additional parameters to pass to the embedding function.
309  
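For intuition about the three metrics, the plain-Python sketch below computes the textbook Euclidean distance, inner product, and cosine similarity for two small vectors. It does not call Chroma, and the scores Chroma reports for each space may be scaled or transformed differently. The `resolve_space` helper is hypothetical and only mirrors the precedence rule described for `metadata`.

```python
import math
from typing import Any, Optional


def l2(a: list[float], b: list[float]) -> float:
    """Euclidean distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a: list[float], b: list[float]) -> float:
    """Inner (dot) product: larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: inner product of the length-normalized vectors."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return inner_product(a, b) / (norm_a * norm_b)


def resolve_space(distance_function: str, metadata: Optional[dict[str, Any]] = None) -> str:
    """Hypothetical helper: "hnsw:space" in metadata wins over distance_function."""
    return (metadata or {}).get("hnsw:space", distance_function)


a, b = [3.0, 4.0], [6.0, 8.0]  # same direction, different length
print(l2(a, b))             # 5.0
print(inner_product(a, b))  # 50.0
print(cosine(a, b))         # 1.0
print(resolve_space("l2", {"hnsw:space": "cosine"}))  # cosine
```

The two vectors point in the same direction, so cosine similarity is maximal even though their Euclidean distance is not zero. This is one reason the choice of metric matters for unnormalized embeddings.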
310  #### count_documents
311  
312  ```python
313  count_documents() -> int
314  ```
315  
316  Returns how many documents are present in the document store.
317  
318  **Returns:**
319  
320  - <code>int</code> – how many documents are present in the document store.
321  
322  #### count_documents_async
323  
324  ```python
325  count_documents_async() -> int
326  ```
327  
328  Asynchronously returns how many documents are present in the document store.
329  
330  Asynchronous methods are only supported for HTTP connections.
331  
332  **Returns:**
333  
334  - <code>int</code> – how many documents are present in the document store.
335  
336  #### filter_documents
337  
338  ```python
339  filter_documents(filters: dict[str, Any] | None = None) -> list[Document]
340  ```
341  
342  Returns the documents that match the filters provided.
343  
344  For a detailed specification of the filters,
345  refer to the [documentation](https://docs.haystack.deepset.ai/docs/metadata-filtering).
346  
347  **Parameters:**
348  
349  - **filters** (<code>dict\[str, Any\] | None</code>) – the filters to apply to the document list.
350  
351  **Returns:**
352  
353  - <code>list\[Document\]</code> – a list of Documents that match the given filters.
354  
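To make the filter format concrete, the sketch below evaluates a Haystack-style filter dictionary against documents represented as plain dicts. The `matches` function is a hypothetical helper written for this example; it covers only `AND`, `OR`, and basic comparisons, and it is not how the integration translates filters for Chroma.

```python
import operator
from typing import Any

COMPARATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "in": lambda value, options: value in options,
    "not in": lambda value, options: value not in options,
}


def matches(doc: dict[str, Any], flt: dict[str, Any]) -> bool:
    """Evaluate a Haystack-style filter against a document given as a dict."""
    op = flt["operator"]
    if op == "AND":
        return all(matches(doc, cond) for cond in flt["conditions"])
    if op == "OR":
        return any(matches(doc, cond) for cond in flt["conditions"])
    # Comparison: "meta.category" addresses doc["meta"]["category"].
    field = flt["field"].removeprefix("meta.")
    if field not in doc["meta"]:
        return False  # simplification: a missing field never matches
    return COMPARATORS[op](doc["meta"][field], flt["value"])


docs = [
    {"content": "Doc 1", "meta": {"category": "A", "priority": 1}},
    {"content": "Doc 2", "meta": {"category": "B", "priority": 5}},
    {"content": "Doc 3", "meta": {"category": "A", "priority": 7}},
]
filters = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.category", "operator": "==", "value": "A"},
        {"field": "meta.priority", "operator": ">", "value": 3},
    ],
}
selected = [d["content"] for d in docs if matches(d, filters)]
print(selected)  # ['Doc 3']
```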
355  #### filter_documents_async
356  
357  ```python
358  filter_documents_async(filters: dict[str, Any] | None = None) -> list[Document]
359  ```
360  
361  Asynchronously returns the documents that match the filters provided.
362  
363  Asynchronous methods are only supported for HTTP connections.
364  
365  For a detailed specification of the filters,
366  refer to the [documentation](https://docs.haystack.deepset.ai/docs/metadata-filtering).
367  
368  **Parameters:**
369  
370  - **filters** (<code>dict\[str, Any\] | None</code>) – the filters to apply to the document list.
371  
372  **Returns:**
373  
374  - <code>list\[Document\]</code> – a list of Documents that match the given filters.
375  
376  #### write_documents
377  
378  ```python
379  write_documents(
380      documents: list[Document], policy: DuplicatePolicy = DuplicatePolicy.FAIL
381  ) -> int
382  ```
383  
384  Writes (or overwrites) documents into the store.
385  
386  **Parameters:**
387  
388  - **documents** (<code>list\[Document\]</code>) – A list of documents to write into the document store.
389  - **policy** (<code>DuplicatePolicy</code>) – Not supported at the moment.
390  
391  **Returns:**
392  
393  - <code>int</code> – The number of documents written
394  
395  **Raises:**
396  
397  - <code>ValueError</code> – When input is not valid.
398  
399  #### write_documents_async
400  
401  ```python
402  write_documents_async(
403      documents: list[Document], policy: DuplicatePolicy = DuplicatePolicy.FAIL
404  ) -> int
405  ```
406  
407  Asynchronously writes (or overwrites) documents into the store.
408  
409  Asynchronous methods are only supported for HTTP connections.
410  
411  **Parameters:**
412  
413  - **documents** (<code>list\[Document\]</code>) – A list of documents to write into the document store.
414  - **policy** (<code>DuplicatePolicy</code>) – Not supported at the moment.
415  
416  **Returns:**
417  
418  - <code>int</code> – The number of documents written
419  
420  **Raises:**
421  
422  - <code>ValueError</code> – When input is not valid.
423  
424  #### delete_documents
425  
426  ```python
427  delete_documents(document_ids: list[str]) -> None
428  ```
429  
430  Deletes all documents with matching `document_ids` from the document store.
431  
432  **Parameters:**
433  
434  - **document_ids** (<code>list\[str\]</code>) – the document ids to delete
435  
436  #### delete_documents_async
437  
438  ```python
439  delete_documents_async(document_ids: list[str]) -> None
440  ```
441  
442  Asynchronously deletes all documents with matching `document_ids` from the document store.
443  
444  Asynchronous methods are only supported for HTTP connections.
445  
446  **Parameters:**
447  
448  - **document_ids** (<code>list\[str\]</code>) – the document ids to delete
449  
450  #### delete_by_filter
451  
452  ```python
453  delete_by_filter(filters: dict[str, Any]) -> int
454  ```
455  
456  Deletes all documents that match the provided filters.
457  
458  **Parameters:**
459  
460  - **filters** (<code>dict\[str, Any\]</code>) – The filters to apply to select documents for deletion.
461    For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering)
462  
463  **Returns:**
464  
465  - <code>int</code> – The number of documents deleted.
466  
467  #### delete_by_filter_async
468  
469  ```python
470  delete_by_filter_async(filters: dict[str, Any]) -> int
471  ```
472  
473  Asynchronously deletes all documents that match the provided filters.
474  
475  Asynchronous methods are only supported for HTTP connections.
476  
477  **Parameters:**
478  
479  - **filters** (<code>dict\[str, Any\]</code>) – The filters to apply to select documents for deletion.
480    For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering)
481  
482  **Returns:**
483  
484  - <code>int</code> – The number of documents deleted.
485  
486  #### update_by_filter
487  
488  ```python
489  update_by_filter(filters: dict[str, Any], meta: dict[str, Any]) -> int
490  ```
491  
492  Updates the metadata of all documents that match the provided filters.
493  
494  **Note**: This operation is not atomic. Documents matching the filter are fetched first,
495  then updated. If documents are modified between the fetch and update operations,
496  those changes may be lost.
497  
498  **Parameters:**
499  
500  - **filters** (<code>dict\[str, Any\]</code>) – The filters to apply to select documents for updating.
501    For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering)
502  - **meta** (<code>dict\[str, Any\]</code>) – The metadata fields to update. This will be merged with existing metadata.
503  
504  **Returns:**
505  
506  - <code>int</code> – The number of documents updated.
507  
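The fetch-then-update behavior described in the note can be pictured with a toy in-memory store. Everything in this sketch is hypothetical (the `store` dict, the predicate in place of a filter dictionary), and it assumes that merging overwrites existing keys with the new values.

```python
from typing import Any, Callable

# Toy in-memory "collection": document id -> metadata.
store: dict[str, dict[str, Any]] = {
    "doc-1": {"category": "A", "status": "active"},
    "doc-2": {"category": "B", "status": "active"},
    "doc-3": {"category": "A", "status": "inactive"},
}


def update_by_filter(predicate: Callable[[dict[str, Any]], bool], meta: dict[str, Any]) -> int:
    """Fetch matching ids first, then merge `meta` into each match."""
    # Step 1: fetch. Changes made by others after this point are not seen.
    matching_ids = [doc_id for doc_id, m in store.items() if predicate(m)]
    # Step 2: update. New keys are added, existing keys are overwritten.
    for doc_id in matching_ids:
        store[doc_id] = {**store[doc_id], **meta}
    return len(matching_ids)


updated = update_by_filter(
    lambda m: m["category"] == "A",
    {"status": "archived", "reviewed": True},
)
print(updated)         # 2
print(store["doc-1"])  # {'category': 'A', 'status': 'archived', 'reviewed': True}
```

Because the two steps are separate, a write that lands between them can be overwritten by step 2, which is the lost-update risk the note warns about.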
508  #### update_by_filter_async
509  
510  ```python
511  update_by_filter_async(filters: dict[str, Any], meta: dict[str, Any]) -> int
512  ```
513  
514  Asynchronously updates the metadata of all documents that match the provided filters.
515  
516  Asynchronous methods are only supported for HTTP connections.
517  
518  **Note**: This operation is not atomic. Documents matching the filter are fetched first,
519  then updated. If documents are modified between the fetch and update operations,
520  those changes may be lost.
521  
522  **Parameters:**
523  
524  - **filters** (<code>dict\[str, Any\]</code>) – The filters to apply to select documents for updating.
525    For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering)
526  - **meta** (<code>dict\[str, Any\]</code>) – The metadata fields to update. This will be merged with existing metadata.
527  
528  **Returns:**
529  
530  - <code>int</code> – The number of documents updated.
531  
532  #### delete_all_documents
533  
534  ```python
535  delete_all_documents(*, recreate_index: bool = False) -> None
536  ```
537  
538  Deletes all documents in the document store.
539  
540  A fast way to clear all documents from the document store while preserving any collection settings and mappings.
541  
542  **Parameters:**
543  
544  - **recreate_index** (<code>bool</code>) – Whether to recreate the index after deleting all documents.
545  
546  #### delete_all_documents_async
547  
548  ```python
549  delete_all_documents_async(*, recreate_index: bool = False) -> None
550  ```
551  
552  Asynchronously deletes all documents in the document store.
553  
554  A fast way to clear all documents from the document store while preserving any collection settings and mappings.
555  
556  **Parameters:**
557  
558  - **recreate_index** (<code>bool</code>) – Whether to recreate the index after deleting all documents.
559  
560  #### search
561  
562  ```python
563  search(
564      queries: list[str], top_k: int, filters: dict[str, Any] | None = None
565  ) -> list[list[Document]]
566  ```
567  
568  Search the documents in the store using the provided text queries.
569  
570  **Parameters:**
571  
572  - **queries** (<code>list\[str\]</code>) – the list of queries to search for.
573  - **top_k** (<code>int</code>) – the maximum number of documents to return for each query.
574  - **filters** (<code>dict\[str, Any\] | None</code>) – a dictionary of filters to apply to the search. Accepts filters in haystack format.
575  
576  **Returns:**
577  
578  - <code>list\[list\[Document\]\]</code> – matching documents for each query.
579  
580  #### search_async
581  
582  ```python
583  search_async(
584      queries: list[str], top_k: int, filters: dict[str, Any] | None = None
585  ) -> list[list[Document]]
586  ```
587  
588  Asynchronously search the documents in the store using the provided text queries.
589  
590  Asynchronous methods are only supported for HTTP connections.
591  
592  **Parameters:**
593  
594  - **queries** (<code>list\[str\]</code>) – the list of queries to search for.
595  - **top_k** (<code>int</code>) – the maximum number of documents to return for each query.
596  - **filters** (<code>dict\[str, Any\] | None</code>) – a dictionary of filters to apply to the search. Accepts filters in haystack format.
597  
598  **Returns:**
599  
600  - <code>list\[list\[Document\]\]</code> – matching documents for each query.
601  
602  #### search_embeddings
603  
604  ```python
605  search_embeddings(
606      query_embeddings: list[list[float]],
607      top_k: int,
608      filters: dict[str, Any] | None = None,
609  ) -> list[list[Document]]
610  ```
611  
612  Performs vector search on the stored documents, using the embeddings of the queries instead of their text.
613  
614  **Parameters:**
615  
616  - **query_embeddings** (<code>list\[list\[float\]\]</code>) – a list of embeddings to use as queries.
617  - **top_k** (<code>int</code>) – the maximum number of documents to retrieve.
618  - **filters** (<code>dict\[str, Any\] | None</code>) – a dictionary of filters to apply to the search. Accepts filters in haystack format.
619  
620  **Returns:**
621  
622  - <code>list\[list\[Document\]\]</code> – a list of lists of documents that match the given filters.
623  
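The shape of the return value, one result list per query embedding, can be illustrated with a brute-force search over a handful of vectors. This is a sketch under simplifying assumptions: it ranks by plain Euclidean distance, returns document ids rather than `Document` objects, and ignores filters. Chroma itself uses an approximate index, not an exhaustive scan.

```python
import math


def search_embeddings(
    stored: dict[str, list[float]],
    query_embeddings: list[list[float]],
    top_k: int,
) -> list[list[str]]:
    """Return, for each query embedding, the ids of the `top_k` nearest stored vectors."""
    results = []
    for query in query_embeddings:
        # Sort all ids by distance to this query, nearest first.
        ranked = sorted(stored, key=lambda doc_id: math.dist(stored[doc_id], query))
        results.append(ranked[:top_k])
    return results


stored = {
    "doc-1": [0.0, 0.0],
    "doc-2": [1.0, 0.0],
    "doc-3": [0.0, 5.0],
}
hits = search_embeddings(stored, [[0.9, 0.1], [0.0, 4.0]], top_k=2)
print(hits)  # [['doc-2', 'doc-1'], ['doc-3', 'doc-1']]
```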
624  #### search_embeddings_async
625  
626  ```python
627  search_embeddings_async(
628      query_embeddings: list[list[float]],
629      top_k: int,
630      filters: dict[str, Any] | None = None,
631  ) -> list[list[Document]]
632  ```
633  
634  Asynchronously performs vector search on the stored documents, using the embeddings of the queries instead of
635  their text.
636  
637  Asynchronous methods are only supported for HTTP connections.
638  
639  **Parameters:**
640  
641  - **query_embeddings** (<code>list\[list\[float\]\]</code>) – a list of embeddings to use as queries.
642  - **top_k** (<code>int</code>) – the maximum number of documents to retrieve.
643  - **filters** (<code>dict\[str, Any\] | None</code>) – a dictionary of filters to apply to the search. Accepts filters in haystack format.
644  
645  **Returns:**
646  
647  - <code>list\[list\[Document\]\]</code> – a list of lists of documents that match the given filters.
648  
649  #### count_documents_by_filter
650  
651  ```python
652  count_documents_by_filter(filters: dict[str, Any]) -> int
653  ```
654  
655  Returns the number of documents that match the provided filters.
656  
657  **Parameters:**
658  
659  - **filters** (<code>dict\[str, Any\]</code>) – The filters to apply to count documents.
660    For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering)
661  
662  **Returns:**
663  
664  - <code>int</code> – The number of documents that match the filters.
665  
666  #### count_documents_by_filter_async
667  
668  ```python
669  count_documents_by_filter_async(filters: dict[str, Any]) -> int
670  ```
671  
672  Asynchronously returns the number of documents that match the provided filters.
673  
674  Asynchronous methods are only supported for HTTP connections.
675  
676  **Parameters:**
677  
678  - **filters** (<code>dict\[str, Any\]</code>) – The filters to apply to count documents.
679    For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering)
680  
681  **Returns:**
682  
683  - <code>int</code> – The number of documents that match the filters.
684  
685  #### count_unique_metadata_by_filter
686  
687  ```python
688  count_unique_metadata_by_filter(
689      filters: dict[str, Any], metadata_fields: list[str]
690  ) -> dict[str, int]
691  ```
692  
693  Returns the number of unique values for each specified metadata field
694  of the documents that match the provided filters.
695  
696  **Parameters:**
697  
698  - **filters** (<code>dict\[str, Any\]</code>) – The filters to apply to count documents.
699    For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering)
700  - **metadata_fields** (<code>list\[str\]</code>) – List of field names to calculate unique values for.
701    Field names can include or omit the "meta." prefix.
702  
703  **Returns:**
704  
705  - <code>dict\[str, int\]</code> – A dictionary mapping each metadata field name to the count of
706    its unique values among the filtered documents.
707  
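A minimal sketch of the counting logic, assuming the documents have already been filtered and are represented by their metadata dicts. `count_unique_metadata` is a hypothetical helper, and returning the field names without the `meta.` prefix is an assumption of this example.

```python
from typing import Any


def count_unique_metadata(docs: list[dict[str, Any]], metadata_fields: list[str]) -> dict[str, int]:
    """Count distinct values per metadata field across already-filtered documents."""
    counts = {}
    for name in metadata_fields:
        # "meta.category" and "category" address the same field.
        key = name.removeprefix("meta.")
        values = {doc[key] for doc in docs if key in doc}
        counts[key] = len(values)
    return counts


filtered = [
    {"category": "A", "status": "active"},
    {"category": "B", "status": "active"},
    {"category": "A"},
]
unique_counts = count_unique_metadata(filtered, ["meta.category", "status"])
print(unique_counts)  # {'category': 2, 'status': 1}
```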
708  #### count_unique_metadata_by_filter_async
709  
710  ```python
711  count_unique_metadata_by_filter_async(
712      filters: dict[str, Any], metadata_fields: list[str]
713  ) -> dict[str, int]
714  ```
715  
716  Asynchronously returns the number of unique values for each specified metadata field
717  of the documents that match the provided filters.
718  
719  Asynchronous methods are only supported for HTTP connections.
720  
721  **Parameters:**
722  
723  - **filters** (<code>dict\[str, Any\]</code>) – The filters to apply to count documents.
724    For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering)
725  - **metadata_fields** (<code>list\[str\]</code>) – List of field names to calculate unique values for.
726    Field names can include or omit the "meta." prefix.
727  
728  **Returns:**
729  
730  - <code>dict\[str, int\]</code> – A dictionary mapping each metadata field name to the count of
731    its unique values among the filtered documents.
732  
733  #### get_metadata_fields_info
734  
735  ```python
736  get_metadata_fields_info() -> dict[str, dict[str, str]]
737  ```
738  
739  Returns information about the metadata fields in the collection.
740  
741  Since ChromaDB doesn't maintain a schema, this method samples documents
742  to infer field types.
743  
744  If we populated the collection with documents like:
745  
746  ```python
747  Document(content="Doc 1", meta={"category": "A", "status": "active", "priority": 1})
748  Document(content="Doc 2", meta={"category": "B", "status": "inactive"})
749  ```
750  
751  This method would return:
752  
753  ```python
754  {
755      'category': {'type': 'keyword'},
756      'status': {'type': 'keyword'},
757      'priority': {'type': 'long'},
758  }
759  ```
760  
761  **Returns:**
762  
763  - <code>dict\[str, dict\[str, str\]\]</code> – Dictionary mapping field names to their type information.
764  
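One way to picture the sampling-based inference is the sketch below. It is not the integration's code: `infer_type` and `fields_info` are hypothetical, and only the `keyword` and `long` labels come from the example above, while `boolean` and `float` are assumed labels for the remaining scalar types.

```python
from typing import Any


def infer_type(value: Any) -> str:
    """Map a Python value to a type label ('boolean' and 'float' are assumed labels)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    return "keyword"


def fields_info(sampled_meta: list[dict[str, Any]]) -> dict[str, dict[str, str]]:
    """Infer one type per field from a sample of metadata dicts."""
    info: dict[str, dict[str, str]] = {}
    for meta in sampled_meta:
        for field, value in meta.items():
            # The first sampled value decides the type of a field.
            info.setdefault(field, {"type": infer_type(value)})
    return info


sample = [
    {"category": "A", "status": "active", "priority": 1},
    {"category": "B", "status": "inactive"},
]
inferred = fields_info(sample)
print(inferred)
```

A field that is absent from every sampled document, or whose values have mixed types, would be reported incompletely by this kind of inference.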
765  #### get_metadata_fields_info_async
766  
767  ```python
768  get_metadata_fields_info_async() -> dict[str, dict[str, str]]
769  ```
770  
771  Asynchronously returns information about the metadata fields in the collection.
772  
773  Asynchronous methods are only supported for HTTP connections.
774  
775  Since ChromaDB doesn't maintain a schema, this method samples documents
776  to infer field types.
777  
778  If we populated the collection with documents like:
779  
780  ```python
781  Document(content="Doc 1", meta={"category": "A", "status": "active", "priority": 1})
782  Document(content="Doc 2", meta={"category": "B", "status": "inactive"})
783  ```
784  
785  This method would return:
786  
787  ```python
788  {
789      'category': {'type': 'keyword'},
790      'status': {'type': 'keyword'},
791      'priority': {'type': 'long'},
792  }
793  ```
794  
795  **Returns:**
796  
797  - <code>dict\[str, dict\[str, str\]\]</code> – Dictionary mapping field names to their type information.
798  
799  #### get_metadata_field_min_max
800  
801  ```python
802  get_metadata_field_min_max(metadata_field: str) -> dict[str, Any]
803  ```
804  
805  Returns the minimum and maximum values for the given metadata field.
806  
807  **Parameters:**
808  
809  - **metadata_field** (<code>str</code>) – The metadata field to get the minimum and maximum values for.
810    Can include or omit the "meta." prefix.
811  
812  **Returns:**
813  
814  - <code>dict\[str, Any\]</code> – A dictionary with the keys "min" and "max", where each value is
815    the minimum or maximum value of the metadata field across all documents.
816    Returns `{"min": None, "max": None}` if the field doesn't exist or has no values.
823  
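The return contract, including the `None` fallback, can be sketched over plain metadata dicts. `field_min_max` is a hypothetical helper for illustration and assumes the field's values are mutually comparable.

```python
from typing import Any


def field_min_max(metas: list[dict[str, Any]], metadata_field: str) -> dict[str, Any]:
    """Return the smallest and largest value of one metadata field."""
    key = metadata_field.removeprefix("meta.")  # the prefix is optional
    values = [m[key] for m in metas if m.get(key) is not None]
    if not values:
        # Field is missing everywhere or has no values.
        return {"min": None, "max": None}
    return {"min": min(values), "max": max(values)}


metas = [{"priority": 3}, {"priority": 1}, {"category": "A"}, {"priority": 7}]
print(field_min_max(metas, "meta.priority"))  # {'min': 1, 'max': 7}
print(field_min_max(metas, "rating"))         # {'min': None, 'max': None}
```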
824  #### get_metadata_field_min_max_async
825  
826  ```python
827  get_metadata_field_min_max_async(metadata_field: str) -> dict[str, Any]
828  ```
829  
830  Asynchronously returns the minimum and maximum values for the given metadata field.
831  
832  Asynchronous methods are only supported for HTTP connections.
833  
834  **Parameters:**
835  
836  - **metadata_field** (<code>str</code>) – The metadata field to get the minimum and maximum values for.
837    Can include or omit the "meta." prefix.
838  
839  **Returns:**
840  
841  - <code>dict\[str, Any\]</code> – A dictionary with the keys "min" and "max", where each value is
842    the minimum or maximum value of the metadata field across all documents.
843    Returns `{"min": None, "max": None}` if the field doesn't exist or has no values.
850  
851  #### get_metadata_field_unique_values
852  
853  ```python
854  get_metadata_field_unique_values(
855      metadata_field: str,
856      search_term: str | None = None,
857      from_: int = 0,
858      size: int = 10,
859  ) -> tuple[list[str], int]
860  ```
861  
862  Returns unique values for a metadata field, optionally filtered by
863  a search term in the content field, with pagination support.
864  
865  **Parameters:**
866  
867  - **metadata_field** (<code>str</code>) – The metadata field to get unique values for.
868    Can include or omit the "meta." prefix.
869  - **search_term** (<code>str | None</code>) – Optional search term to filter documents by matching
870    in the content field.
871  - **from\_** (<code>int</code>) – The offset to start returning values from (for pagination).
872  - **size** (<code>int</code>) – The maximum number of unique values to return.
873  
874  **Returns:**
875  
876  - <code>tuple\[list\[str\], int\]</code> – A tuple containing the list of unique values and the total count of unique values.
877  
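How `from_` and `size` interact with the total count can be sketched as follows. `unique_values` is a hypothetical helper, and two details are assumptions of this example: `search_term` is matched as a case-insensitive substring of the content, and values are sorted before paging.

```python
from typing import Any, Optional


def unique_values(
    docs: list[dict[str, Any]],
    metadata_field: str,
    search_term: Optional[str] = None,
    from_: int = 0,
    size: int = 10,
) -> tuple[list[str], int]:
    """Return one page of unique values plus the total number of unique values."""
    key = metadata_field.removeprefix("meta.")
    if search_term is not None:
        # Assumed matching rule: case-insensitive substring match on the content.
        docs = [d for d in docs if search_term.lower() in d["content"].lower()]
    values = sorted({str(d["meta"][key]) for d in docs if key in d["meta"]})
    return values[from_ : from_ + size], len(values)


docs = [
    {"content": "Intro to Python", "meta": {"category": "B"}},
    {"content": "Python typing", "meta": {"category": "A"}},
    {"content": "Rust ownership", "meta": {"category": "C"}},
    {"content": "Python asyncio", "meta": {"category": "A"}},
]
page, total = unique_values(docs, "meta.category", search_term="python", from_=1, size=1)
print(page, total)  # ['B'] 2
```

The total counts all unique values that match, not just those on the returned page, so a caller can tell how many pages remain.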
878  #### get_metadata_field_unique_values_async
879  
880  ```python
881  get_metadata_field_unique_values_async(
882      metadata_field: str,
883      search_term: str | None = None,
884      from_: int = 0,
885      size: int = 10,
886  ) -> tuple[list[str], int]
887  ```
888  
889  Asynchronously returns unique values for a metadata field, optionally filtered by
890  a search term in the content field, with pagination support.
891  
892  Asynchronous methods are only supported for HTTP connections.
893  
894  **Parameters:**
895  
896  - **metadata_field** (<code>str</code>) – The metadata field to get unique values for.
897    Can include or omit the "meta." prefix.
898  - **search_term** (<code>str | None</code>) – Optional search term to filter documents by matching
899    in the content field.
900  - **from\_** (<code>int</code>) – The offset to start returning values from (for pagination).
901  - **size** (<code>int</code>) – The maximum number of unique values to return.
902  
903  **Returns:**
904  
905  - <code>tuple\[list\[str\], int\]</code> – A tuple containing the list of unique values and the total count of unique values.
906  
907  #### from_dict
908  
909  ```python
910  from_dict(data: dict[str, Any]) -> ChromaDocumentStore
911  ```
912  
913  Deserializes the component from a dictionary.
914  
915  **Parameters:**
916  
917  - **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from.
918  
919  **Returns:**
920  
921  - <code>ChromaDocumentStore</code> – Deserialized component.
922  
923  #### to_dict
924  
925  ```python
926  to_dict() -> dict[str, Any]
927  ```
928  
929  Serializes the component to a dictionary.
930  
931  **Returns:**
932  
933  - <code>dict\[str, Any\]</code> – Dictionary with serialized data.
934  
935  ## haystack_integrations.document_stores.chroma.errors
936  
937  ### ChromaDocumentStoreError
938  
939  Bases: <code>DocumentStoreError</code>
940  
941  Parent class for all ChromaDocumentStore exceptions.
942  
943  ### ChromaDocumentStoreFilterError
944  
945  Bases: <code>FilterError</code>, <code>ValueError</code>
946  
947  Raised when a filter is not valid for a ChromaDocumentStore.
948  
949  ### ChromaDocumentStoreConfigError
950  
951  Bases: <code>ChromaDocumentStoreError</code>
952  
953  Raised when a configuration is not valid for a ChromaDocumentStore.
954  
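The base classes listed above affect which `except` clauses catch which error. The sketch below rebuilds the hierarchy with local stand-in classes, so it runs without Haystack installed. `DocumentStoreError` and `FilterError` are placeholders whose own bases are assumed here.

```python
class DocumentStoreError(Exception):
    """Stand-in for the Haystack base class named above."""


class FilterError(Exception):
    """Stand-in; the real class's own bases are not shown in this reference."""


class ChromaDocumentStoreError(DocumentStoreError):
    """Parent class for all ChromaDocumentStore exceptions."""


class ChromaDocumentStoreFilterError(FilterError, ValueError):
    """Raised when a filter is not valid."""


class ChromaDocumentStoreConfigError(ChromaDocumentStoreError):
    """Raised when a configuration is not valid."""


def classify(exc: Exception) -> str:
    """Report which handler catches the given exception."""
    try:
        raise exc
    except ChromaDocumentStoreError:
        return "store error"
    except ValueError:
        return "value error"


print(classify(ChromaDocumentStoreConfigError("bad host")))      # store error
print(classify(ChromaDocumentStoreFilterError("bad operator")))  # value error
```

Going by the listed bases, a handler for `ChromaDocumentStoreError` alone would not catch filter errors. Those need a `FilterError` or `ValueError` handler.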
955  ## haystack_integrations.document_stores.chroma.utils
956  
957  ### get_embedding_function
958  
959  ```python
960  get_embedding_function(function_name: str, **kwargs: Any) -> EmbeddingFunction
961  ```
962  
963  Load an embedding function by name.
964  
965  **Parameters:**
966  
967  - **function_name** (<code>str</code>) – the name of the embedding function.
968  - **kwargs** (<code>Any</code>) – additional arguments to pass to the embedding function.
969  
970  **Returns:**
971  
972  - <code>EmbeddingFunction</code> – the loaded embedding function.
973  
974  **Raises:**
975  
976  - <code>ChromaDocumentStoreConfigError</code> – if the function name is invalid.