---
title: "Chroma"
id: integrations-chroma
description: "Chroma integration for Haystack"
slug: "/integrations-chroma"
---


## haystack_integrations.components.retrievers.chroma.retriever

### ChromaQueryTextRetriever

A component for retrieving documents from a [Chroma database](https://docs.trychroma.com/) using the `query` API.

Example usage:

```python
from haystack import Pipeline
from haystack.components.converters import TextFileToDocument
from haystack.components.writers import DocumentWriter

from haystack_integrations.document_stores.chroma import ChromaDocumentStore
from haystack_integrations.components.retrievers.chroma import ChromaQueryTextRetriever

file_paths = ...

# Chroma is used in-memory so we use the same instances in the two pipelines below
document_store = ChromaDocumentStore()

indexing = Pipeline()
indexing.add_component("converter", TextFileToDocument())
indexing.add_component("writer", DocumentWriter(document_store))
indexing.connect("converter", "writer")
indexing.run({"converter": {"sources": file_paths}})

querying = Pipeline()
querying.add_component("retriever", ChromaQueryTextRetriever(document_store))
results = querying.run({"retriever": {"query": "Variable declarations", "top_k": 3}})

for d in results["retriever"]["documents"]:
    print(d.meta, d.score)
```

#### __init__

```python
__init__(
    document_store: ChromaDocumentStore,
    filters: dict[str, Any] | None = None,
    top_k: int = 10,
    filter_policy: str | FilterPolicy = FilterPolicy.REPLACE,
)
```

**Parameters:**

- **document_store** (<code>ChromaDocumentStore</code>) – an instance of `ChromaDocumentStore`.
- **filters** (<code>dict\[str, Any\] | None</code>) – filters to narrow down the search space.
- **top_k** (<code>int</code>) – the maximum number of documents to retrieve.
- **filter_policy** (<code>str | FilterPolicy</code>) – Policy to determine how filters are applied.

#### run

```python
run(
    query: str, filters: dict[str, Any] | None = None, top_k: int | None = None
) -> dict[str, Any]
```

Run the retriever on the given input data.

**Parameters:**

- **query** (<code>str</code>) – The input data for the retriever. In this case, a plain-text query.
- **filters** (<code>dict\[str, Any\] | None</code>) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on
  the `filter_policy` chosen at retriever initialization. See init method docstring for more
  details.
- **top_k** (<code>int | None</code>) – The maximum number of documents to retrieve.
  If not specified, the default value from the constructor is used.

**Returns:**

- <code>dict\[str, Any\]</code> – A dictionary with the following keys:
  - `documents`: List of documents returned by the search engine.

**Raises:**

- <code>ValueError</code> – If the specified document store is not found or is not a `ChromaDocumentStore` instance.

#### run_async

```python
run_async(
    query: str, filters: dict[str, Any] | None = None, top_k: int | None = None
) -> dict[str, Any]
```

Asynchronously run the retriever on the given input data.

Asynchronous methods are only supported for HTTP connections.

**Parameters:**

- **query** (<code>str</code>) – The input data for the retriever. In this case, a plain-text query.
- **filters** (<code>dict\[str, Any\] | None</code>) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on
  the `filter_policy` chosen at retriever initialization. See init method docstring for more
  details.
- **top_k** (<code>int | None</code>) – The maximum number of documents to retrieve.
  If not specified, the default value from the constructor is used.

**Returns:**

- <code>dict\[str, Any\]</code> – A dictionary with the following keys:
  - `documents`: List of documents returned by the search engine.

**Raises:**

- <code>ValueError</code> – If the specified document store is not found or is not a `ChromaDocumentStore` instance.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> ChromaQueryTextRetriever
```

Deserializes the component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from.

**Returns:**

- <code>ChromaQueryTextRetriever</code> – Deserialized component.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

### ChromaEmbeddingRetriever

A component for retrieving documents from a [Chroma database](https://docs.trychroma.com/) using embeddings.

#### __init__

```python
__init__(
    document_store: ChromaDocumentStore,
    filters: dict[str, Any] | None = None,
    top_k: int = 10,
    filter_policy: str | FilterPolicy = FilterPolicy.REPLACE,
)
```

**Parameters:**

- **document_store** (<code>ChromaDocumentStore</code>) – an instance of `ChromaDocumentStore`.
- **filters** (<code>dict\[str, Any\] | None</code>) – filters to narrow down the search space.
- **top_k** (<code>int</code>) – the maximum number of documents to retrieve.
- **filter_policy** (<code>str | FilterPolicy</code>) – Policy to determine how filters are applied.

#### run

```python
run(
    query_embedding: list[float],
    filters: dict[str, Any] | None = None,
    top_k: int | None = None,
) -> dict[str, Any]
```

Run the retriever on the given input data.

**Parameters:**

- **query_embedding** (<code>list\[float\]</code>) – the query embeddings.
- **filters** (<code>dict\[str, Any\] | None</code>) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on
  the `filter_policy` chosen at retriever initialization. See init method docstring for more
  details.
- **top_k** (<code>int | None</code>) – the maximum number of documents to retrieve.
  If not specified, the default value from the constructor is used.

**Returns:**

- <code>dict\[str, Any\]</code> – a dictionary with the following keys:
  - `documents`: List of documents returned by the search engine.

#### run_async

```python
run_async(
    query_embedding: list[float],
    filters: dict[str, Any] | None = None,
    top_k: int | None = None,
) -> dict[str, Any]
```

Asynchronously run the retriever on the given input data.

Asynchronous methods are only supported for HTTP connections.

**Parameters:**

- **query_embedding** (<code>list\[float\]</code>) – the query embeddings.
- **filters** (<code>dict\[str, Any\] | None</code>) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on
  the `filter_policy` chosen at retriever initialization. See init method docstring for more
  details.
- **top_k** (<code>int | None</code>) – the maximum number of documents to retrieve.
  If not specified, the default value from the constructor is used.

**Returns:**

- <code>dict\[str, Any\]</code> – a dictionary with the following keys:
  - `documents`: List of documents returned by the search engine.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> ChromaEmbeddingRetriever
```

Deserializes the component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from.

**Returns:**

- <code>ChromaEmbeddingRetriever</code> – Deserialized component.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

## haystack_integrations.document_stores.chroma.document_store

### ChromaDocumentStore

A document store using [Chroma](https://docs.trychroma.com/) as the backend.

We use the `collection.get` API to implement the document store protocol;
the `collection.search` API is used in the retriever instead.

#### __init__

```python
__init__(
    collection_name: str = "documents",
    embedding_function: str = "default",
    persist_path: str | None = None,
    host: str | None = None,
    port: int | None = None,
    distance_function: Literal["l2", "cosine", "ip"] = "l2",
    metadata: dict | None = None,
    client_settings: dict[str, Any] | None = None,
    **embedding_function_params: Any
)
```

Creates a new ChromaDocumentStore instance.
It is meant to be connected to a Chroma collection.

Note: for the component to be part of a serializable pipeline, the `__init__`
parameters must be serializable, which is why we use a registry to configure the
embedding function by passing a string.

**Parameters:**

- **collection_name** (<code>str</code>) – the name of the collection to use in the database.
- **embedding_function** (<code>str</code>) – the name of the embedding function to use to embed the query.
- **persist_path** (<code>str | None</code>) – Path for local persistent storage. Cannot be used in combination with `host` and `port`.
  If none of `persist_path`, `host`, and `port` is specified, the database will be `in-memory`.
- **host** (<code>str | None</code>) – The host address for the remote Chroma HTTP client connection.
  Cannot be used with `persist_path`.
- **port** (<code>int | None</code>) – The port number for the remote Chroma HTTP client connection. Cannot be used with `persist_path`.
- **distance_function** (<code>Literal['l2', 'cosine', 'ip']</code>) – The distance metric for the embedding space.
  - `"l2"` computes the Euclidean (straight-line) distance between vectors,
    where smaller scores indicate more similarity.
  - `"cosine"` computes the cosine similarity between vectors,
    with higher scores indicating greater similarity.
  - `"ip"` stands for inner product, where higher scores indicate greater similarity between vectors.

  **Note**: `distance_function` can only be set during the creation of a collection.
  To change the distance metric of an existing collection, consider cloning the collection.
- **metadata** (<code>dict | None</code>) – a dictionary of chromadb collection parameters passed directly to chromadb's client
  method `create_collection`. If it contains the key `"hnsw:space"`, the value will take precedence over the
  `distance_function` parameter above.
- **client_settings** (<code>dict\[str, Any\] | None</code>) – a dictionary of Chroma Settings configuration options passed to
  `chromadb.config.Settings`. These settings configure the underlying Chroma client behavior.
  For available options, see [Chroma's config.py](https://github.com/chroma-core/chroma/blob/main/chromadb/config.py).
  **Note**: specifying these settings may interfere with standard client initialization parameters.
  This option is intended for advanced customization.
- **embedding_function_params** (<code>Any</code>) – additional parameters to pass to the embedding function.

#### count_documents

```python
count_documents() -> int
```

Returns how many documents are present in the document store.

**Returns:**

- <code>int</code> – how many documents are present in the document store.
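
The `persist_path`, `host`, and `port` rules described for `__init__` above can be sketched in plain Python. This is only an illustration of the documented constraints, not the integration's actual code: `resolve_connection_mode` is a hypothetical helper, and rejecting a lone `host` or `port` is an assumption of this sketch, since the parameter descriptions do not say what happens in that case.

```python
from typing import Optional


def resolve_connection_mode(
    persist_path: Optional[str] = None,
    host: Optional[str] = None,
    port: Optional[int] = None,
) -> str:
    """Hypothetical helper: classify the connection implied by the arguments."""
    remote_requested = host is not None or port is not None
    if persist_path is not None and remote_requested:
        # Documented rule: persist_path cannot be combined with host or port.
        raise ValueError("persist_path cannot be combined with host or port")
    if remote_requested:
        if host is None or port is None:
            # Assumption of this sketch only: require both values for HTTP.
            raise ValueError("both host and port are needed for an HTTP connection")
        return "http"
    if persist_path is not None:
        return "persistent"
    # Documented rule: with none of the three set, the database is in-memory.
    return "in-memory"


print(resolve_connection_mode())
print(resolve_connection_mode(persist_path="./chroma_data"))
print(resolve_connection_mode(host="localhost", port=8000))
```

The distinction matters for the `*_async` methods in this reference, which are documented as supported only for HTTP connections.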

#### count_documents_async

```python
count_documents_async() -> int
```

Asynchronously returns how many documents are present in the document store.

Asynchronous methods are only supported for HTTP connections.

**Returns:**

- <code>int</code> – how many documents are present in the document store.

#### filter_documents

```python
filter_documents(filters: dict[str, Any] | None = None) -> list[Document]
```

Returns the documents that match the filters provided.

For a detailed specification of the filters,
refer to the [documentation](https://docs.haystack.deepset.ai/docs/metadata-filtering).

**Parameters:**

- **filters** (<code>dict\[str, Any\] | None</code>) – the filters to apply to the document list.

**Returns:**

- <code>list\[Document\]</code> – a list of Documents that match the given filters.

#### filter_documents_async

```python
filter_documents_async(filters: dict[str, Any] | None = None) -> list[Document]
```

Asynchronously returns the documents that match the filters provided.

Asynchronous methods are only supported for HTTP connections.

For a detailed specification of the filters,
refer to the [documentation](https://docs.haystack.deepset.ai/docs/metadata-filtering).

**Parameters:**

- **filters** (<code>dict\[str, Any\] | None</code>) – the filters to apply to the document list.

**Returns:**

- <code>list\[Document\]</code> – a list of Documents that match the given filters.

#### write_documents

```python
write_documents(
    documents: list[Document], policy: DuplicatePolicy = DuplicatePolicy.FAIL
) -> int
```

Writes (or overwrites) documents into the store.

**Parameters:**

- **documents** (<code>list\[Document\]</code>) – A list of documents to write into the document store.
- **policy** (<code>DuplicatePolicy</code>) – Not supported at the moment.

**Returns:**

- <code>int</code> – The number of documents written.

**Raises:**

- <code>ValueError</code> – When input is not valid.

#### write_documents_async

```python
write_documents_async(
    documents: list[Document], policy: DuplicatePolicy = DuplicatePolicy.FAIL
) -> int
```

Asynchronously writes (or overwrites) documents into the store.

Asynchronous methods are only supported for HTTP connections.

**Parameters:**

- **documents** (<code>list\[Document\]</code>) – A list of documents to write into the document store.
- **policy** (<code>DuplicatePolicy</code>) – Not supported at the moment.

**Returns:**

- <code>int</code> – The number of documents written.

**Raises:**

- <code>ValueError</code> – When input is not valid.

#### delete_documents

```python
delete_documents(document_ids: list[str]) -> None
```

Deletes all documents with matching `document_ids` from the document store.

**Parameters:**

- **document_ids** (<code>list\[str\]</code>) – the document ids to delete.

#### delete_documents_async

```python
delete_documents_async(document_ids: list[str]) -> None
```

Asynchronously deletes all documents with matching `document_ids` from the document store.

Asynchronous methods are only supported for HTTP connections.

**Parameters:**

- **document_ids** (<code>list\[str\]</code>) – the document ids to delete.

#### delete_by_filter

```python
delete_by_filter(filters: dict[str, Any]) -> int
```

Deletes all documents that match the provided filters.

**Parameters:**

- **filters** (<code>dict\[str, Any\]</code>) – The filters to apply to select documents for deletion.
  For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering)

**Returns:**

- <code>int</code> – The number of documents deleted.

#### delete_by_filter_async

```python
delete_by_filter_async(filters: dict[str, Any]) -> int
```

Asynchronously deletes all documents that match the provided filters.

Asynchronous methods are only supported for HTTP connections.

**Parameters:**

- **filters** (<code>dict\[str, Any\]</code>) – The filters to apply to select documents for deletion.
  For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering)

**Returns:**

- <code>int</code> – The number of documents deleted.

#### update_by_filter

```python
update_by_filter(filters: dict[str, Any], meta: dict[str, Any]) -> int
```

Updates the metadata of all documents that match the provided filters.

**Note**: This operation is not atomic. Documents matching the filter are fetched first,
then updated. If documents are modified between the fetch and update operations,
those changes may be lost.

**Parameters:**

- **filters** (<code>dict\[str, Any\]</code>) – The filters to apply to select documents for updating.
  For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering)
- **meta** (<code>dict\[str, Any\]</code>) – The metadata fields to update. This will be merged with existing metadata.

**Returns:**

- <code>int</code> – The number of documents updated.

#### update_by_filter_async

```python
update_by_filter_async(filters: dict[str, Any], meta: dict[str, Any]) -> int
```

Asynchronously updates the metadata of all documents that match the provided filters.

Asynchronous methods are only supported for HTTP connections.

**Note**: This operation is not atomic. Documents matching the filter are fetched first,
then updated. If documents are modified between the fetch and update operations,
those changes may be lost.

**Parameters:**

- **filters** (<code>dict\[str, Any\]</code>) – The filters to apply to select documents for updating.
  For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering)
- **meta** (<code>dict\[str, Any\]</code>) – The metadata fields to update. This will be merged with existing metadata.

**Returns:**

- <code>int</code> – The number of documents updated.

#### delete_all_documents

```python
delete_all_documents(*, recreate_index: bool = False) -> None
```

Deletes all documents in the document store.

A fast way to clear all documents from the document store while preserving any collection settings and mappings.

**Parameters:**

- **recreate_index** (<code>bool</code>) – Whether to recreate the index after deleting all documents.

#### delete_all_documents_async

```python
delete_all_documents_async(*, recreate_index: bool = False) -> None
```

Asynchronously deletes all documents in the document store.

A fast way to clear all documents from the document store while preserving any collection settings and mappings.

**Parameters:**

- **recreate_index** (<code>bool</code>) – Whether to recreate the index after deleting all documents.

#### search

```python
search(
    queries: list[str], top_k: int, filters: dict[str, Any] | None = None
) -> list[list[Document]]
```

Search the documents in the store using the provided text queries.

**Parameters:**

- **queries** (<code>list\[str\]</code>) – the list of queries to search for.
- **top_k** (<code>int</code>) – top_k documents to return for each query.
- **filters** (<code>dict\[str, Any\] | None</code>) – a dictionary of filters to apply to the search. Accepts filters in haystack format.

**Returns:**

- <code>list\[list\[Document\]\]</code> – matching documents for each query.

#### search_async

```python
search_async(
    queries: list[str], top_k: int, filters: dict[str, Any] | None = None
) -> list[list[Document]]
```

Asynchronously search the documents in the store using the provided text queries.

Asynchronous methods are only supported for HTTP connections.

**Parameters:**

- **queries** (<code>list\[str\]</code>) – the list of queries to search for.
- **top_k** (<code>int</code>) – top_k documents to return for each query.
- **filters** (<code>dict\[str, Any\] | None</code>) – a dictionary of filters to apply to the search. Accepts filters in haystack format.

**Returns:**

- <code>list\[list\[Document\]\]</code> – matching documents for each query.

#### search_embeddings

```python
search_embeddings(
    query_embeddings: list[list[float]],
    top_k: int,
    filters: dict[str, Any] | None = None,
) -> list[list[Document]]
```

Performs vector search on the stored documents, taking the embeddings of the queries instead of their text.

**Parameters:**

- **query_embeddings** (<code>list\[list\[float\]\]</code>) – a list of embeddings to use as queries.
- **top_k** (<code>int</code>) – the maximum number of documents to retrieve.
- **filters** (<code>dict\[str, Any\] | None</code>) – a dictionary of filters to apply to the search. Accepts filters in haystack format.

**Returns:**

- <code>list\[list\[Document\]\]</code> – a list of lists of documents that match the given filters.
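
To make the nested return shape concrete, here is a minimal pure-Python sketch of a brute-force vector search that returns one ranked list per query embedding. It does not call Chroma or Haystack: `l2_distance` and `search_embeddings_sketch` are hypothetical names, plain document ids stand in for `Document` objects, and ranking uses the Euclidean distance described for `"l2"` (smaller means more similar).

```python
import math
from typing import Dict, List


def l2_distance(a: List[float], b: List[float]) -> float:
    """Euclidean (straight-line) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search_embeddings_sketch(
    stored: Dict[str, List[float]],
    query_embeddings: List[List[float]],
    top_k: int,
) -> List[List[str]]:
    """Hypothetical stand-in: return the top_k closest ids for each query."""
    results = []
    for query in query_embeddings:
        # Rank every stored id by its distance to this query, closest first.
        ranked = sorted(stored, key=lambda doc_id: l2_distance(stored[doc_id], query))
        results.append(ranked[:top_k])
    return results


stored = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [5.0, 5.0]}
results = search_embeddings_sketch(stored, [[0.9, 0.0], [4.0, 4.0]], top_k=2)
print(results)  # [['b', 'a'], ['c', 'b']]
```

The outer list has one entry per query embedding, in the same order as the input, and each inner list holds at most `top_k` matches.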

#### search_embeddings_async

```python
search_embeddings_async(
    query_embeddings: list[list[float]],
    top_k: int,
    filters: dict[str, Any] | None = None,
) -> list[list[Document]]
```

Asynchronously performs vector search on the stored documents, taking the embeddings of the queries instead of
their text.

Asynchronous methods are only supported for HTTP connections.

**Parameters:**

- **query_embeddings** (<code>list\[list\[float\]\]</code>) – a list of embeddings to use as queries.
- **top_k** (<code>int</code>) – the maximum number of documents to retrieve.
- **filters** (<code>dict\[str, Any\] | None</code>) – a dictionary of filters to apply to the search. Accepts filters in haystack format.

**Returns:**

- <code>list\[list\[Document\]\]</code> – a list of lists of documents that match the given filters.

#### count_documents_by_filter

```python
count_documents_by_filter(filters: dict[str, Any]) -> int
```

Returns the number of documents that match the provided filters.

**Parameters:**

- **filters** (<code>dict\[str, Any\]</code>) – The filters to apply to count documents.
  For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering)

**Returns:**

- <code>int</code> – The number of documents that match the filters.

#### count_documents_by_filter_async

```python
count_documents_by_filter_async(filters: dict[str, Any]) -> int
```

Asynchronously returns the number of documents that match the provided filters.

Asynchronous methods are only supported for HTTP connections.

**Parameters:**

- **filters** (<code>dict\[str, Any\]</code>) – The filters to apply to count documents.
  For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering)

**Returns:**

- <code>int</code> – The number of documents that match the filters.

#### count_unique_metadata_by_filter

```python
count_unique_metadata_by_filter(
    filters: dict[str, Any], metadata_fields: list[str]
) -> dict[str, int]
```

Returns the number of unique values for each specified metadata field
of the documents that match the provided filters.

**Parameters:**

- **filters** (<code>dict\[str, Any\]</code>) – The filters to apply to count documents.
  For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering)
- **metadata_fields** (<code>list\[str\]</code>) – List of field names to calculate unique values for.
  Field names can include or omit the "meta." prefix.

**Returns:**

- <code>dict\[str, int\]</code> – A dictionary mapping each metadata field name to the count of
  its unique values among the filtered documents.

#### count_unique_metadata_by_filter_async

```python
count_unique_metadata_by_filter_async(
    filters: dict[str, Any], metadata_fields: list[str]
) -> dict[str, int]
```

Asynchronously returns the number of unique values for each specified metadata field
of the documents that match the provided filters.

Asynchronous methods are only supported for HTTP connections.

**Parameters:**

- **filters** (<code>dict\[str, Any\]</code>) – The filters to apply to count documents.
  For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering)
- **metadata_fields** (<code>list\[str\]</code>) – List of field names to calculate unique values for.
  Field names can include or omit the "meta." prefix.

**Returns:**

- <code>dict\[str, int\]</code> – A dictionary mapping each metadata field name to the count of
  its unique values among the filtered documents.

#### get_metadata_fields_info

```python
get_metadata_fields_info() -> dict[str, dict[str, str]]
```

Returns information about the metadata fields in the collection.

Since ChromaDB doesn't maintain a schema, this method samples documents
to infer field types.

If we populated the collection with documents like:

```python
Document(content="Doc 1", meta={"category": "A", "status": "active", "priority": 1})
Document(content="Doc 2", meta={"category": "B", "status": "inactive"})
```

This method would return:

```python
{
    'category': {'type': 'keyword'},
    'status': {'type': 'keyword'},
    'priority': {'type': 'long'},
}
```

**Returns:**

- <code>dict\[str, dict\[str, str\]\]</code> – Dictionary mapping field names to their type information.

#### get_metadata_fields_info_async

```python
get_metadata_fields_info_async() -> dict[str, dict[str, str]]
```

Asynchronously returns information about the metadata fields in the collection.

Asynchronous methods are only supported for HTTP connections.

Since ChromaDB doesn't maintain a schema, this method samples documents
to infer field types.

If we populated the collection with documents like:

```python
Document(content="Doc 1", meta={"category": "A", "status": "active", "priority": 1})
Document(content="Doc 2", meta={"category": "B", "status": "inactive"})
```

This method would return:

```python
{
    'category': {'type': 'keyword'},
    'status': {'type': 'keyword'},
    'priority': {'type': 'long'},
}
```

**Returns:**

- <code>dict\[str, dict\[str, str\]\]</code> – Dictionary mapping field names to their type information.

#### get_metadata_field_min_max

```python
get_metadata_field_min_max(metadata_field: str) -> dict[str, Any]
```

Returns the minimum and maximum values for the given metadata field.

**Parameters:**

- **metadata_field** (<code>str</code>) – The metadata field to get the minimum and maximum values for.
  Can include or omit the "meta." prefix.

**Returns:**

- <code>dict\[str, Any\]</code> – A dictionary with the keys "min" and "max", where each value is
  the minimum or maximum value of the metadata field across all documents.
  Returns:

  ```python
  {"min": None, "max": None}
  ```

  if the field doesn't exist or has no values.

#### get_metadata_field_min_max_async

```python
get_metadata_field_min_max_async(metadata_field: str) -> dict[str, Any]
```

Asynchronously returns the minimum and maximum values for the given metadata field.

Asynchronous methods are only supported for HTTP connections.

**Parameters:**

- **metadata_field** (<code>str</code>) – The metadata field to get the minimum and maximum values for.
  Can include or omit the "meta." prefix.

**Returns:**

- <code>dict\[str, Any\]</code> – A dictionary with the keys "min" and "max", where each value is
  the minimum or maximum value of the metadata field across all documents.
  Returns:

  ```python
  {"min": None, "max": None}
  ```

  if the field doesn't exist or has no values.

#### get_metadata_field_unique_values

```python
get_metadata_field_unique_values(
    metadata_field: str,
    search_term: str | None = None,
    from_: int = 0,
    size: int = 10,
) -> tuple[list[str], int]
```

Returns unique values for a metadata field, optionally filtered by
a search term in the content field, with pagination support.

**Parameters:**

- **metadata_field** (<code>str</code>) – The metadata field to get unique values for.
  Can include or omit the "meta." prefix.
- **search_term** (<code>str | None</code>) – Optional search term to filter documents by matching
  in the content field.
- **from\_** (<code>int</code>) – The offset to start returning values from (for pagination).
- **size** (<code>int</code>) – The maximum number of unique values to return.

**Returns:**

- <code>tuple\[list\[str\], int\]</code> – A tuple containing the list of unique values and the total count of unique values.

#### get_metadata_field_unique_values_async

```python
get_metadata_field_unique_values_async(
    metadata_field: str,
    search_term: str | None = None,
    from_: int = 0,
    size: int = 10,
) -> tuple[list[str], int]
```

Asynchronously returns unique values for a metadata field, optionally filtered by
a search term in the content field, with pagination support.

Asynchronous methods are only supported for HTTP connections.

**Parameters:**

- **metadata_field** (<code>str</code>) – The metadata field to get unique values for.
  Can include or omit the "meta." prefix.
- **search_term** (<code>str | None</code>) – Optional search term to filter documents by matching
  in the content field.
- **from\_** (<code>int</code>) – The offset to start returning values from (for pagination).
- **size** (<code>int</code>) – The maximum number of unique values to return.

**Returns:**

- <code>tuple\[list\[str\], int\]</code> – A tuple containing list of unique values and total count of unique values.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> ChromaDocumentStore
```

Deserializes the component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from.

**Returns:**

- <code>ChromaDocumentStore</code> – Deserialized component.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

## haystack_integrations.document_stores.chroma.errors

### ChromaDocumentStoreError

Bases: <code>DocumentStoreError</code>

Parent class for all ChromaDocumentStore exceptions.

### ChromaDocumentStoreFilterError

Bases: <code>FilterError</code>, <code>ValueError</code>

Raised when a filter is not valid for a ChromaDocumentStore.

### ChromaDocumentStoreConfigError

Bases: <code>ChromaDocumentStoreError</code>

Raised when a configuration is not valid for a ChromaDocumentStore.

## haystack_integrations.document_stores.chroma.utils

### get_embedding_function

```python
get_embedding_function(function_name: str, **kwargs: Any) -> EmbeddingFunction
```

Load an embedding function by name.

**Parameters:**

- **function_name** (<code>str</code>) – the name of the embedding function.
- **kwargs** (<code>Any</code>) – additional arguments to pass to the embedding function.

**Returns:**

- <code>EmbeddingFunction</code> – the loaded embedding function.

**Raises:**

- <code>ChromaDocumentStoreConfigError</code> – if the function name is invalid.
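
The idea of loading an embedding function by name, so that only a serializable string needs to be stored in the component configuration, can be sketched with a small registry. Everything in this block is hypothetical and independent of the real integration: `REGISTRY`, `toy_length`, `EmbeddingConfigError` (standing in for `ChromaDocumentStoreConfigError`), and `get_embedding_function_sketch` are illustrative names, and the toy embedder simply maps each text to its scaled length.

```python
from typing import Any, Callable, Dict, List

Embedder = Callable[[List[str]], List[List[float]]]


class EmbeddingConfigError(Exception):
    """Stand-in for ChromaDocumentStoreConfigError in this sketch."""


def _length_embedder(**kwargs: Any) -> Embedder:
    """Toy factory: embeds each text as a one-dimensional scaled length."""
    scale = kwargs.get("scale", 1.0)

    def embed(texts: List[str]) -> List[List[float]]:
        return [[len(text) * scale] for text in texts]

    return embed


# Hypothetical registry mapping serializable names to factories.
REGISTRY: Dict[str, Callable[..., Embedder]] = {"toy_length": _length_embedder}


def get_embedding_function_sketch(function_name: str, **kwargs: Any) -> Embedder:
    """Look up a factory by name and build the embedder with the given kwargs."""
    try:
        factory = REGISTRY[function_name]
    except KeyError:
        raise EmbeddingConfigError(f"Invalid function name: {function_name}") from None
    return factory(**kwargs)


embed = get_embedding_function_sketch("toy_length", scale=2.0)
print(embed(["ab", "abcd"]))  # [[4.0], [8.0]]
```

Because the configuration holds only the name and keyword arguments, it can be written out by `to_dict` and restored by `from_dict` without serializing a callable.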