---
title: "OpenSearch"
id: integrations-opensearch
description: "OpenSearch integration for Haystack"
slug: "/integrations-opensearch"
---


## haystack_integrations.components.retrievers.opensearch.bm25_retriever

### OpenSearchBM25Retriever

Fetches documents from OpenSearchDocumentStore using the keyword-based BM25 algorithm.

BM25 computes a weighted word overlap between the query string and a document to determine its similarity.

#### __init__

```python
__init__(
    *,
    document_store: OpenSearchDocumentStore,
    filters: dict[str, Any] | None = None,
    fuzziness: int | str = "AUTO",
    top_k: int = 10,
    scale_score: bool = False,
    all_terms_must_match: bool = False,
    filter_policy: str | FilterPolicy = FilterPolicy.REPLACE,
    custom_query: dict[str, Any] | None = None,
    raise_on_failure: bool = True
) -> None
```

Creates the OpenSearchBM25Retriever component.

**Parameters:**

- **document_store** (<code>OpenSearchDocumentStore</code>) – An instance of OpenSearchDocumentStore to use with the Retriever.
- **filters** (<code>dict\[str, Any\] | None</code>) – Filters to narrow down the search for documents in the Document Store.
- **fuzziness** (<code>int | str</code>) – Determines how approximate string matching is applied in full-text queries.
  This parameter sets the maximum number of character edits (insertions, deletions, or substitutions)
  allowed when matching one word to another. For example, the edit distance between the words
  "wined" and "wind" is 1 because a single edit turns one into the other.

  Use "AUTO" (the default) for automatic adjustment based on term length, which is optimal for
  most scenarios. For detailed guidance, refer to the
  [OpenSearch fuzzy query documentation](https://opensearch.org/docs/latest/query-dsl/term/fuzzy/).

- **top_k** (<code>int</code>) – Maximum number of documents to return.

- **scale_score** (<code>bool</code>) – If `True`, scales the score of retrieved documents to a range between 0 and 1.
  This is useful when comparing documents across different indexes.

- **all_terms_must_match** (<code>bool</code>) – If `True`, all terms in the query string must be present in the
  retrieved documents. This is useful when searching for short text where even one term
  can make a difference.

- **filter_policy** (<code>str | FilterPolicy</code>) – Policy to determine how filters are applied. Possible options:

  - `replace`: Runtime filters replace initialization filters. Use this policy to change the filtering scope
    for specific queries.

  - `merge`: Runtime filters are merged with initialization filters.

- **custom_query** (<code>dict\[str, Any\] | None</code>) – The query containing a mandatory `$query` and an optional `$filters` placeholder.

  **An example custom_query:**

  ```python
  {
      "query": {
          "bool": {
              "should": [{"multi_match": {
                  "query": "$query",                 # mandatory query placeholder
                  "type": "most_fields",
                  "fields": ["content", "title"]}}],
              "filter": "$filters"                  # optional filter placeholder
          }
      }
  }
  ```

  An example `run()` method for this `custom_query`:

  ```python
  retriever.run(
      query="Why did the revenue increase?",
      filters={
          "operator": "AND",
          "conditions": [
              {"field": "meta.years", "operator": "==", "value": "2019"},
              {"field": "meta.quarters", "operator": "in", "value": ["Q1", "Q2"]},
          ],
      },
  )
  ```

- **raise_on_failure** (<code>bool</code>) – If `True`, raises an exception if the API call fails.
  If `False`, logs a warning and returns an empty list.

**Raises:**

- <code>ValueError</code> – If `document_store` is not an instance of OpenSearchDocumentStore.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> OpenSearchBM25Retriever
```

Deserializes the component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from.

**Returns:**

- <code>OpenSearchBM25Retriever</code> – Deserialized component.

#### run

```python
run(
    query: str,
    filters: dict[str, Any] | None = None,
    all_terms_must_match: bool | None = None,
    top_k: int | None = None,
    fuzziness: int | str | None = None,
    scale_score: bool | None = None,
    custom_query: dict[str, Any] | None = None,
    document_store: OpenSearchDocumentStore | None = None,
) -> dict[str, list[Document]]
```

Retrieve documents using BM25 retrieval.

**Parameters:**

- **query** (<code>str</code>) – The query string.

- **filters** (<code>dict\[str, Any\] | None</code>) – Filters applied to the retrieved documents. The way runtime filters are applied depends on
  the `filter_policy` specified at Retriever's initialization.

- **all_terms_must_match** (<code>bool | None</code>) – If `True`, all terms in the query string must be present in the
  retrieved documents.

- **top_k** (<code>int | None</code>) – Maximum number of documents to return.

- **fuzziness** (<code>int | str | None</code>) – Fuzziness parameter for full-text queries to apply approximate string matching.
  For more information, see [OpenSearch fuzzy query](https://opensearch.org/docs/latest/query-dsl/term/fuzzy/).

- **scale_score** (<code>bool | None</code>) – If `True`, scales the score of retrieved documents to a range between 0 and 1.
  This is useful when comparing documents across different indexes.

- **custom_query** (<code>dict\[str, Any\] | None</code>) – A custom OpenSearch query. It must include a `$query` placeholder and may optionally
  include a `$filters` placeholder.

  **An example custom_query:**

  ```python
  {
      "query": {
          "bool": {
              "should": [{"multi_match": {
                  "query": "$query",                 # mandatory query placeholder
                  "type": "most_fields",
                  "fields": ["content", "title"]}}],
              "filter": "$filters"                  # optional filter placeholder
          }
      }
  }
  ```

  **For this custom_query, a sample `run()` could be:**

  ```python
  retriever.run(
      query="Why did the revenue increase?",
      filters={
          "operator": "AND",
          "conditions": [
              {"field": "meta.years", "operator": "==", "value": "2019"},
              {"field": "meta.quarters", "operator": "in", "value": ["Q1", "Q2"]},
          ],
      },
  )
  ```

- **document_store** (<code>OpenSearchDocumentStore | None</code>) – Optionally, an instance of OpenSearchDocumentStore to use with the Retriever.

**Returns:**

- <code>dict\[str, list\[Document\]\]</code> – A dictionary containing the retrieved documents with the following structure:
  - documents: List of retrieved Documents.

#### run_async

```python
run_async(
    query: str,
    filters: dict[str, Any] | None = None,
    all_terms_must_match: bool | None = None,
    top_k: int | None = None,
    fuzziness: int | str | None = None,
    scale_score: bool | None = None,
    custom_query: dict[str, Any] | None = None,
    document_store: OpenSearchDocumentStore | None = None,
) -> dict[str, list[Document]]
```

Asynchronously retrieve documents using BM25 retrieval.

**Parameters:**

- **query** (<code>str</code>) – The query string.
- **filters** (<code>dict\[str, Any\] | None</code>) – Filters applied to the retrieved documents.
  The way runtime filters are applied depends on
  the `filter_policy` specified at Retriever's initialization.
- **all_terms_must_match** (<code>bool | None</code>) – If `True`, all terms in the query string must be present in the
  retrieved documents.
- **top_k** (<code>int | None</code>) – Maximum number of documents to return.
- **fuzziness** (<code>int | str | None</code>) – Fuzziness parameter for full-text queries to apply approximate string matching.
  For more information, see [OpenSearch fuzzy query](https://opensearch.org/docs/latest/query-dsl/term/fuzzy/).
- **scale_score** (<code>bool | None</code>) – If `True`, scales the score of retrieved documents to a range between 0 and 1.
  This is useful when comparing documents across different indexes.
- **custom_query** (<code>dict\[str, Any\] | None</code>) – A custom OpenSearch query. It must include a `$query` placeholder and may optionally
  include a `$filters` placeholder.
- **document_store** (<code>OpenSearchDocumentStore | None</code>) – Optionally, an instance of OpenSearchDocumentStore to use with the Retriever.

**Returns:**

- <code>dict\[str, list\[Document\]\]</code> – A dictionary containing the retrieved documents with the following structure:
  - documents: List of retrieved Documents.

## haystack_integrations.components.retrievers.opensearch.embedding_retriever

### OpenSearchEmbeddingRetriever

Retrieves documents from the OpenSearchDocumentStore using a vector similarity metric.

Must be connected to the OpenSearchDocumentStore to run.
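
To make "vector similarity" concrete, the following is a minimal brute-force sketch in plain Python. It is not how the Retriever or OpenSearch executes the search, which runs as an approximate kNN query on the server. The helper names `cosine_similarity` and `top_k_by_cosine` and the toy vectors are hypothetical, and cosine similarity is only one possible metric.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_by_cosine(
    query_embedding: list[float],
    doc_embeddings: dict[str, list[float]],
    top_k: int = 10,
) -> list[tuple[str, float]]:
    """Score every document against the query and keep the top_k highest scores."""
    scored = [
        (doc_id, cosine_similarity(query_embedding, embedding))
        for doc_id, embedding in doc_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical 2-dimensional embeddings, for illustration only
toy_embeddings = {
    "doc_a": [1.0, 0.0],
    "doc_b": [0.7, 0.7],
    "doc_c": [0.0, 1.0],
}
ranked = top_k_by_cosine([1.0, 0.1], toy_embeddings, top_k=2)
print([doc_id for doc_id, _ in ranked])  # ['doc_a', 'doc_b']
```

In this sketch `top_k` plays the same role as the Retriever's `top_k` parameter: it caps how many of the best-scoring documents are returned.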

#### __init__

```python
__init__(
    *,
    document_store: OpenSearchDocumentStore,
    filters: dict[str, Any] | None = None,
    top_k: int = 10,
    filter_policy: str | FilterPolicy = FilterPolicy.REPLACE,
    custom_query: dict[str, Any] | None = None,
    raise_on_failure: bool = True,
    efficient_filtering: bool = False,
    search_kwargs: dict[str, Any] | None = None
) -> None
```

Create the OpenSearchEmbeddingRetriever component.

**Parameters:**

- **document_store** (<code>OpenSearchDocumentStore</code>) – An instance of OpenSearchDocumentStore to use with the Retriever.

- **filters** (<code>dict\[str, Any\] | None</code>) – Filters applied when fetching documents from the Document Store.
  Filters are applied during the approximate kNN search to ensure the Retriever returns
  `top_k` matching documents.

- **top_k** (<code>int</code>) – Maximum number of documents to return.

- **filter_policy** (<code>str | FilterPolicy</code>) – Policy to determine how filters are applied. Possible options:

  - `merge`: Runtime filters are merged with initialization filters.

  - `replace`: Runtime filters replace initialization filters. Use this policy to change the filtering scope.

- **custom_query** (<code>dict\[str, Any\] | None</code>) – The custom OpenSearch query containing a mandatory `$query_embedding` and
  an optional `$filters` placeholder.

  **An example custom_query:**

  ```python
  {
      "query": {
          "bool": {
              "must": [
                  {
                      "knn": {
                          "embedding": {
                              "vector": "$query_embedding",   # mandatory query placeholder
                              "k": 10000,
                          }
                      }
                  }
              ],
              "filter": "$filters"                            # optional filter placeholder
          }
      }
  }
  ```

  For this `custom_query`, an example `run()` could be:

  ```python
  retriever.run(
      query_embedding=embedding,
      filters={
          "operator": "AND",
          "conditions": [
              {"field": "meta.years", "operator": "==", "value": "2019"},
              {"field": "meta.quarters", "operator": "in", "value": ["Q1", "Q2"]},
          ],
      },
  )
  ```

- **raise_on_failure** (<code>bool</code>) – If `True`, raises an exception if the API call fails.
  If `False`, logs a warning and returns an empty list.
- **efficient_filtering** (<code>bool</code>) – If `True`, the filter will be applied during the approximate kNN search.
  This is only supported for knn engines "faiss" and "lucene" and does not work with the default "nmslib".
- **search_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for fine-tuning the embedding search.
  For example, to specify `k` and `ef_search`:

  ```python
  {
      "k": 20,  # See https://docs.opensearch.org/latest/vector-search/vector-search-techniques/approximate-knn/#the-number-of-returned-results
      "method_parameters": {
          "ef_search": 512,  # See https://docs.opensearch.org/latest/query-dsl/specialized/k-nn/index/#ef_search
      }
  }
  ```

  For a full list of available parameters, see the OpenSearch documentation:
  https://docs.opensearch.org/latest/query-dsl/specialized/k-nn/index/#request-body-fields

**Raises:**

- <code>ValueError</code> – If `document_store` is not an instance of OpenSearchDocumentStore.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> OpenSearchEmbeddingRetriever
```

Deserializes the component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from.

**Returns:**

- <code>OpenSearchEmbeddingRetriever</code> – Deserialized component.

#### run

```python
run(
    query_embedding: list[float],
    filters: dict[str, Any] | None = None,
    top_k: int | None = None,
    custom_query: dict[str, Any] | None = None,
    efficient_filtering: bool | None = None,
    document_store: OpenSearchDocumentStore | None = None,
    search_kwargs: dict[str, Any] | None = None,
) -> dict[str, list[Document]]
```

Retrieve documents using a vector similarity metric.

**Parameters:**

- **query_embedding** (<code>list\[float\]</code>) – Embedding of the query.

- **filters** (<code>dict\[str, Any\] | None</code>) – Filters applied when fetching documents from the Document Store.
  Filters are applied during the approximate kNN search to ensure the Retriever returns `top_k` matching
  documents.
  The way runtime filters are applied depends on the `filter_policy` selected when initializing the Retriever.

- **top_k** (<code>int | None</code>) – Maximum number of documents to return.

- **custom_query** (<code>dict\[str, Any\] | None</code>) – A custom OpenSearch query containing a mandatory `$query_embedding` and an
  optional `$filters` placeholder.

  **An example custom_query:**

  ```python
  {
      "query": {
          "bool": {
              "must": [
                  {
                      "knn": {
                          "embedding": {
                              "vector": "$query_embedding",   # mandatory query placeholder
                              "k": 10000,
                          }
                      }
                  }
              ],
              "filter": "$filters"                            # optional filter placeholder
          }
      }
  }
  ```

  For this `custom_query`, an example `run()` could be:

  ```python
  retriever.run(
      query_embedding=embedding,
      filters={
          "operator": "AND",
          "conditions": [
              {"field": "meta.years", "operator": "==", "value": "2019"},
              {"field": "meta.quarters", "operator": "in", "value": ["Q1", "Q2"]},
          ],
      },
  )
  ```

- **efficient_filtering** (<code>bool | None</code>) – If `True`, the filter will be applied during the approximate kNN search.
  This is only supported for knn engines "faiss" and "lucene" and does not work with the default "nmslib".
- **document_store** (<code>OpenSearchDocumentStore | None</code>) – Optional instance of OpenSearchDocumentStore to use with the Retriever.
- **search_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for fine-tuning the embedding search. If not provided,
  defaults to the parameter set at initialization (if any).
  For example, to specify `k` and `ef_search`:

  ```python
  {
      "k": 20,  # See https://docs.opensearch.org/latest/vector-search/vector-search-techniques/approximate-knn/#the-number-of-returned-results
      "method_parameters": {
          "ef_search": 512,  # See https://docs.opensearch.org/latest/query-dsl/specialized/k-nn/index/#ef_search
      }
  }
  ```

  For a full list of available parameters, see the OpenSearch documentation:
  https://docs.opensearch.org/latest/query-dsl/specialized/k-nn/index/#request-body-fields

**Returns:**

- <code>dict\[str, list\[Document\]\]</code> – Dictionary with key "documents" containing the retrieved Documents.
  - documents: List of Documents similar to `query_embedding`.

#### run_async

```python
run_async(
    query_embedding: list[float],
    filters: dict[str, Any] | None = None,
    top_k: int | None = None,
    custom_query: dict[str, Any] | None = None,
    efficient_filtering: bool | None = None,
    document_store: OpenSearchDocumentStore | None = None,
    search_kwargs: dict[str, Any] | None = None,
) -> dict[str, list[Document]]
```

Asynchronously retrieve documents using a vector similarity metric.

**Parameters:**

- **query_embedding** (<code>list\[float\]</code>) – Embedding of the query.

- **filters** (<code>dict\[str, Any\] | None</code>) – Filters applied when fetching documents from the Document Store.
  Filters are applied during the approximate kNN search to ensure the Retriever
  returns `top_k` matching documents.
  The way runtime filters are applied depends on the `filter_policy` selected when initializing the Retriever.

- **top_k** (<code>int | None</code>) – Maximum number of documents to return.

- **custom_query** (<code>dict\[str, Any\] | None</code>) – A custom OpenSearch query containing a mandatory `$query_embedding` and an
  optional `$filters` placeholder.

  **An example custom_query:**

  ```python
  {
      "query": {
          "bool": {
              "must": [
                  {
                      "knn": {
                          "embedding": {
                              "vector": "$query_embedding",   # mandatory query placeholder
                              "k": 10000,
                          }
                      }
                  }
              ],
              "filter": "$filters"                            # optional filter placeholder
          }
      }
  }
  ```

  For this `custom_query`, an example `run_async()` call could be:

  ```python
  await retriever.run_async(
      query_embedding=embedding,
      filters={
          "operator": "AND",
          "conditions": [
              {"field": "meta.years", "operator": "==", "value": "2019"},
              {"field": "meta.quarters", "operator": "in", "value": ["Q1", "Q2"]},
          ],
      },
  )
  ```

- **efficient_filtering** (<code>bool | None</code>) – If `True`, the filter will be applied during the approximate kNN search.
  This is only supported for knn engines "faiss" and "lucene" and does not work with the default "nmslib".
- **document_store** (<code>OpenSearchDocumentStore | None</code>) – Optional instance of OpenSearchDocumentStore to use with the Retriever.
- **search_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for fine-tuning the embedding search. If not provided,
  defaults to the parameter set at initialization (if any).
  For example, to specify `k` and `ef_search`:

  ```python
  {
      "k": 20,  # See https://docs.opensearch.org/latest/vector-search/vector-search-techniques/approximate-knn/#the-number-of-returned-results
      "method_parameters": {
          "ef_search": 512,  # See https://docs.opensearch.org/latest/query-dsl/specialized/k-nn/index/#ef_search
      }
  }
  ```

  For a full list of available parameters, see the OpenSearch documentation:
  https://docs.opensearch.org/latest/query-dsl/specialized/k-nn/index/#request-body-fields

**Returns:**

- <code>dict\[str, list\[Document\]\]</code> – Dictionary with key "documents" containing the retrieved Documents.
  - documents: List of Documents similar to `query_embedding`.

## haystack_integrations.components.retrievers.opensearch.metadata_retriever

### OpenSearchMetadataRetriever

Retrieves and ranks metadata from documents stored in an OpenSearchDocumentStore.

It searches specified metadata fields for matches to a given query, ranks the results based on relevance using
Jaccard similarity, and returns the top-k results containing only the specified metadata fields. Additionally, it
adds a boost to the score of exact matches.

The search is designed for metadata fields whose values are **text** (strings). It uses prefix, wildcard and fuzzy
matching to find candidate documents; these query types operate only on text/keyword fields in OpenSearch.

Metadata fields with **non-string types** (integers, floats, booleans, lists of non-strings) are indexed by
OpenSearch as numeric, boolean, or array types. Those field types do not support prefix, wildcard, or full-text
match queries, so documents are typically not found when you search only by such fields.

**Mixed types** in the same metadata field (e.g. a list containing both strings and numbers) are not supported.

Must be connected to the OpenSearchDocumentStore to run.
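
As a rough illustration of the Jaccard ranking mentioned above, here is a minimal sketch in plain Python. The real scoring runs server-side in OpenSearch, so this is not the Retriever's implementation: the use of character n-grams, the lowercasing, and the handling of strings shorter than `n` are all assumptions, and the helper names `char_ngrams` and `jaccard_similarity` are hypothetical. The argument `n` mirrors the Retriever's `jaccard_n` parameter, and the exact-match boost is not modeled.

```python
def char_ngrams(text: str, n: int = 3) -> set[str]:
    """Return the set of lowercased character n-grams of `text`.

    Assumption: a non-empty string no longer than n is treated as a single gram.
    """
    text = text.lower()
    if len(text) <= n:
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard_similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard similarity: |intersection| / |union| of the two n-gram sets."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    union = grams_a | grams_b
    if not union:
        return 0.0
    return len(grams_a & grams_b) / len(union)


print(jaccard_similarity("Python", "Python"))   # 1.0 (identical strings)
print(jaccard_similarity("Python", "Pythons"))  # 0.8 (4 shared trigrams out of 5)
print(jaccard_similarity("Python", "Java"))     # 0.0 (no shared trigrams)
```

With `n=3`, "python" yields the trigrams `pyt`, `yth`, `tho`, `hon`; "pythons" adds `ons`, so the two strings share 4 of 5 distinct trigrams.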

Example:

```python
from haystack import Document
from haystack_integrations.document_stores.opensearch import OpenSearchDocumentStore
from haystack_integrations.components.retrievers.opensearch import OpenSearchMetadataRetriever

# Assumes an OpenSearch instance is reachable at this address
document_store = OpenSearchDocumentStore(hosts="http://localhost:9200")

# Create documents with metadata
docs = [
    Document(
        content="Python programming guide",
        meta={"category": "Python", "status": "active", "priority": 1, "author": "John Doe"}
    ),
    Document(
        content="Java tutorial",
        meta={"category": "Java", "status": "active", "priority": 2, "author": "Jane Smith"}
    ),
    Document(
        content="Python advanced topics",
        meta={"category": "Python", "status": "inactive", "priority": 3, "author": "John Doe"}
    ),
]
document_store.write_documents(docs, refresh=True)

# Create retriever specifying which metadata fields to search and return
retriever = OpenSearchMetadataRetriever(
    document_store=document_store,
    metadata_fields=["category", "status", "priority"],
    top_k=10,
)

# Search for metadata
result = retriever.run(query="Python")

# Result structure:
# {
#     "metadata": [
#         {"category": "Python", "status": "active", "priority": 1},
#         {"category": "Python", "status": "inactive", "priority": 3},
#     ]
# }
#
# Note: Only the specified metadata_fields are returned in the results.
# Other metadata fields (like "author") and document content are excluded.
```

#### __init__

```python
__init__(
    *,
    document_store: OpenSearchDocumentStore,
    metadata_fields: list[str],
    top_k: int = 20,
    exact_match_weight: float = 0.6,
    mode: Literal["strict", "fuzzy"] = "fuzzy",
    fuzziness: int | Literal["AUTO"] = 2,
    prefix_length: int = 0,
    max_expansions: int = 200,
    tie_breaker: float = 0.7,
    jaccard_n: int = 3,
    raise_on_failure: bool = True
) -> None
```

Create the OpenSearchMetadataRetriever component.

**Parameters:**

- **document_store** (<code>OpenSearchDocumentStore</code>) – An instance of OpenSearchDocumentStore to use with the Retriever.
- **metadata_fields** (<code>list\[str\]</code>) – List of metadata field names to search within each document's metadata.
- **top_k** (<code>int</code>) – Maximum number of top results to return based on relevance. Default is 20.
- **exact_match_weight** (<code>float</code>) – Weight to boost the score of exact matches in metadata fields.
  Default is 0.6. It is used in both "strict" and "fuzzy" modes and applied after the search executes.
- **mode** (<code>Literal['strict', 'fuzzy']</code>) – Search mode. "strict" uses prefix and wildcard matching,
  "fuzzy" uses fuzzy matching with dis_max queries. Default is "fuzzy".
  In both modes, results are scored using Jaccard similarity (n-gram based)
  computed server-side via a Painless script; n is controlled by jaccard_n.
- **fuzziness** (<code>int | Literal['AUTO']</code>) – Maximum allowed Damerau-Levenshtein distance (edit distance) for fuzzy matching.
  Accepts an integer (e.g., 0, 1, 2) or "AUTO" which chooses based on term length.
  Default is 2. Only applies when mode is "fuzzy".
- **prefix_length** (<code>int</code>) – Number of leading characters that must match exactly before fuzzy matching applies.
  Default is 0 (no prefix requirement). Only applies when mode is "fuzzy".
- **max_expansions** (<code>int</code>) – Maximum number of term variations the fuzzy query can generate.
  Default is 200. Only applies when mode is "fuzzy".
- **tie_breaker** (<code>float</code>) – Weight (0..1) for other matching clauses in the dis_max query.
  Boosts documents that match multiple clauses. Default is 0.7. Only applies when mode is "fuzzy".
- **jaccard_n** (<code>int</code>) – N-gram size for Jaccard similarity scoring. Default 3; larger n favors longer token matches.
- **raise_on_failure** (<code>bool</code>) – If `True`, raises an exception if the API call fails.
  If `False`, logs a warning and returns an empty list.

**Raises:**

- <code>ValueError</code> – If `document_store` is not an instance of OpenSearchDocumentStore.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> OpenSearchMetadataRetriever
```

Deserializes the component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from.

**Returns:**

- <code>OpenSearchMetadataRetriever</code> – Deserialized component.

#### run

```python
run(
    query: str,
    *,
    document_store: OpenSearchDocumentStore | None = None,
    metadata_fields: list[str] | None = None,
    top_k: int | None = None,
    exact_match_weight: float | None = None,
    mode: Literal["strict", "fuzzy"] | None = None,
    fuzziness: int | Literal["AUTO"] | None = None,
    prefix_length: int | None = None,
    max_expansions: int | None = None,
    tie_breaker: float | None = None,
    jaccard_n: int | None = None,
    filters: dict[str, Any] | None = None
) -> dict[str, list[dict[str, Any]]]
```

Execute a search query against the metadata fields of documents stored in the Document Store.

**Parameters:**

- **query** (<code>str</code>) – The search query string, which can contain multiple comma-separated parts.
  Each part will be searched across all specified fields.
- **document_store** (<code>OpenSearchDocumentStore | None</code>) – The Document Store to run the query against.
  If not provided, the one provided in `__init__` is used.
- **metadata_fields** (<code>list\[str\] | None</code>) – List of metadata field names to search within.
  If not provided, the fields provided in `__init__` are used.
- **top_k** (<code>int | None</code>) – Maximum number of top results to return based on relevance.
  The search retrieves up to 1000 hits from OpenSearch, then applies boosting and filters
  the results to the top_k most relevant matches.
  If not provided, the top_k provided in `__init__` is used.
- **exact_match_weight** (<code>float | None</code>) – Weight to boost the score of exact matches in metadata fields.
  If not provided, the exact_match_weight provided in `__init__` is used.
- **mode** (<code>Literal['strict', 'fuzzy'] | None</code>) – Search mode. "strict" uses prefix and wildcard matching,
  "fuzzy" uses fuzzy matching with dis_max queries.
  In both modes, results are scored using Jaccard similarity (n-gram based) via a Painless script.
  If not provided, the mode provided in `__init__` is used.
- **fuzziness** (<code>int | Literal['AUTO'] | None</code>) – Maximum allowed Damerau-Levenshtein distance (edit distance) for fuzzy matching.
  Accepts an integer (e.g., 0, 1, 2) or "AUTO" which chooses based on term length.
  Only applies when mode is "fuzzy". If not provided, the fuzziness provided in `__init__` is used.
- **prefix_length** (<code>int | None</code>) – Number of leading characters that must match exactly before fuzzy matching applies.
  Only applies when mode is "fuzzy". If not provided, the prefix_length provided in `__init__` is used.
- **max_expansions** (<code>int | None</code>) – Maximum number of term variations the fuzzy query can generate.
  Only applies when mode is "fuzzy". If not provided, the max_expansions provided in `__init__` is used.
- **tie_breaker** (<code>float | None</code>) – Weight (0..1) for other matching clauses; boosts docs matching multiple
  clauses. Only applies when mode is "fuzzy". If not provided, the tie_breaker provided in `__init__` is used.
- **jaccard_n** (<code>int | None</code>) – N-gram size for Jaccard similarity scoring. If not provided, the jaccard_n from `__init__`
  is used.
- **filters** (<code>dict\[str, Any\] | None</code>) – Additional filters to apply to the search query.

**Returns:**

- <code>dict\[str, list\[dict\[str, Any\]\]\]</code> – A dictionary containing the top-k retrieved metadata results.

Example:

```python
from haystack import Document
from haystack_integrations.document_stores.opensearch import OpenSearchDocumentStore
from haystack_integrations.components.retrievers.opensearch import OpenSearchMetadataRetriever

# Assumes an OpenSearch instance is reachable at this address
store = OpenSearchDocumentStore(hosts="http://localhost:9200")

# First, add a document with matching metadata to the store
store.write_documents([
    Document(
        content="Python programming guide",
        meta={"category": "Python", "status": "active", "priority": 1}
    )
])

retriever = OpenSearchMetadataRetriever(
    document_store=store,
    metadata_fields=["category", "status", "priority"]
)
result = retriever.run(query="Python, active")
# Returns: {"metadata": [{"category": "Python", "status": "active", "priority": 1}]}
```

#### run_async

```python
run_async(
    query: str,
    *,
    document_store: OpenSearchDocumentStore | None = None,
    metadata_fields: list[str] | None = None,
    top_k: int | None = None,
    exact_match_weight: float | None = None,
    mode: Literal["strict", "fuzzy"] | None = None,
    fuzziness: int | Literal["AUTO"] | None = None,
    prefix_length: int | None = None,
    max_expansions: int | None = None,
    tie_breaker: float | None = None,
    jaccard_n: int | None = None,
    filters: dict[str, Any] | None = None
) -> dict[str, list[dict[str, Any]]]
```

Asynchronously execute a search query against the metadata fields of documents stored in the Document Store.

**Parameters:**

- **query** (<code>str</code>) – The search query string, which can contain multiple comma-separated parts.
  Each part will be searched across all specified fields.
- **document_store** (<code>OpenSearchDocumentStore | None</code>) – The Document Store to run the query against.
  If not provided, the one provided in `__init__` is used.
- **metadata_fields** (<code>list\[str\] | None</code>) – List of metadata field names to search within.
  If not provided, the fields provided in `__init__` are used.
- **top_k** (<code>int | None</code>) – Maximum number of top results to return based on relevance.
  The search retrieves up to 1000 hits from OpenSearch, then applies boosting and filters
  the results to the top_k most relevant matches.
  If not provided, the top_k provided in `__init__` is used.
- **exact_match_weight** (<code>float | None</code>) – Weight to boost the score of exact matches in metadata fields.
  If not provided, the exact_match_weight provided in `__init__` is used.
- **mode** (<code>Literal['strict', 'fuzzy'] | None</code>) – Search mode. "strict" uses prefix and wildcard matching,
  "fuzzy" uses fuzzy matching with dis_max queries.
  In both modes, results are scored using Jaccard similarity (n-gram based) via a Painless script.
  If not provided, the mode provided in `__init__` is used.
- **fuzziness** (<code>int | Literal['AUTO'] | None</code>) – Maximum allowed Damerau-Levenshtein distance (edit distance) for fuzzy matching.
  Accepts an integer (e.g., 0, 1, 2) or "AUTO" which chooses based on term length.
  Only applies when mode is "fuzzy". If not provided, the fuzziness provided in `__init__` is used.
- **prefix_length** (<code>int | None</code>) – Number of leading characters that must match exactly before fuzzy matching applies.
  Only applies when mode is "fuzzy". If not provided, the prefix_length provided in `__init__` is used.
- **max_expansions** (<code>int | None</code>) – Maximum number of term variations the fuzzy query can generate.
  Only applies when mode is "fuzzy". If not provided, the max_expansions provided in `__init__` is used.
- **tie_breaker** (<code>float | None</code>) – Weight (0..1) for other matching clauses; boosts docs matching multiple clauses.
  Only applies when mode is "fuzzy". If not provided, the tie_breaker provided in `__init__` is used.
- **jaccard_n** (<code>int | None</code>) – N-gram size for Jaccard similarity scoring. If not provided, the jaccard_n from `__init__`
  is used.
- **filters** (<code>dict\[str, Any\] | None</code>) – Additional filters to apply to the search query.

**Returns:**

- <code>dict\[str, list\[dict\[str, Any\]\]\]</code> – A dictionary containing the top-k retrieved metadata results.

Example:

```python
from haystack import Document

# First, add a document with matching metadata to the store
await store.write_documents_async([
    Document(
        content="Python programming guide",
        meta={"category": "Python", "status": "active", "priority": 1}
    )
])

retriever = OpenSearchMetadataRetriever(
    document_store=store,
    metadata_fields=["category", "status", "priority"]
)
result = await retriever.run_async(query="Python, active")
# Returns: {"metadata": [{"category": "Python", "status": "active", "priority": 1}]}
```

## haystack_integrations.components.retrievers.opensearch.open_search_hybrid_retriever

### OpenSearchHybridRetriever

A hybrid retriever that combines embedding-based and keyword-based retrieval from OpenSearch.

Example usage:

Make sure you have "sentence-transformers>=3.0.0":

```
pip install haystack-ai datasets "sentence-transformers>=3.0.0"
```

And OpenSearch running.
You can run OpenSearch with Docker:

```
docker run -d --name opensearch-nosec -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" \
  -e "DISABLE_SECURITY_PLUGIN=true" opensearchproject/opensearch:2.12.0
```

```python
from haystack import Document
from haystack.components.embedders import SentenceTransformersTextEmbedder, SentenceTransformersDocumentEmbedder
from haystack_integrations.components.retrievers.opensearch import OpenSearchHybridRetriever
from haystack_integrations.document_stores.opensearch import OpenSearchDocumentStore

# Initialize the document store
doc_store = OpenSearchDocumentStore(
    hosts=["http://localhost:9200"],
    index="document_store",
    embedding_dim=384,
)

# Create some sample documents
docs = [
    Document(content="Machine learning is a subset of artificial intelligence."),
    Document(content="Deep learning is a subset of machine learning."),
    Document(content="Natural language processing is a field of AI."),
    Document(content="Reinforcement learning is a type of machine learning."),
    Document(content="Supervised learning is a type of machine learning."),
]

# Embed the documents and add them to the document store
doc_embedder = SentenceTransformersDocumentEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
doc_embedder.warm_up()
docs = doc_embedder.run(docs)
doc_store.write_documents(docs['documents'])

# Initialize some haystack text embedder, in this case the SentenceTransformersTextEmbedder
embedder = SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")

# Initialize the hybrid retriever
retriever = OpenSearchHybridRetriever(
    document_store=doc_store,
    embedder=embedder,
    top_k_bm25=3,
    top_k_embedding=3,
    join_mode="reciprocal_rank_fusion"
)

# Run the retriever
results = retriever.run(query="What is reinforcement learning?",
                        filters_bm25=None, filters_embedding=None)

# Example output (document IDs elided):
# >> results
# {'documents': [Document(id=..., content: 'Reinforcement learning is a type of machine learning.', score: 1.0),
#  Document(id=..., content: 'Supervised learning is a type of machine learning.', score: 0.9760624679979518),
#  Document(id=..., content: 'Deep learning is a subset of machine learning.', score: 0.4919354838709677),
#  Document(id=..., content: 'Machine learning is a subset of artificial intelligence.', score: 0.4841269841269841)]}
```

#### __init__

```python
__init__(
    document_store: OpenSearchDocumentStore,
    *,
    embedder: TextEmbedder,
    filters_bm25: dict[str, Any] | None = None,
    fuzziness: int | str = "AUTO",
    top_k_bm25: int = 10,
    scale_score: bool = False,
    all_terms_must_match: bool = False,
    filter_policy_bm25: str | FilterPolicy = FilterPolicy.REPLACE,
    custom_query_bm25: dict[str, Any] | None = None,
    filters_embedding: dict[str, Any] | None = None,
    top_k_embedding: int = 10,
    filter_policy_embedding: str | FilterPolicy = FilterPolicy.REPLACE,
    custom_query_embedding: dict[str, Any] | None = None,
    search_kwargs_embedding: dict[str, Any] | None = None,
    join_mode: str | JoinMode = JoinMode.RECIPROCAL_RANK_FUSION,
    weights: list[float] | None = None,
    top_k: int | None = None,
    sort_by_score: bool = True,
    **kwargs: Any
) -> None
```

Initialize the OpenSearchHybridRetriever using both embedding-based and keyword-based retrieval methods.

This is a super component to retrieve documents from OpenSearch using both retrieval methods.

The constructor does not expose every init parameter of each underlying component, since that would
add up to more than 20 parameters. Instead, it defines the most important ones and accepts the rest
as kwargs, which keeps the constructor clean and easy to read.

To pass extra parameters to the underlying components, supply them as kwargs. Each kwarg is
a dictionary with the component name as the key and the parameters as the value. The component name must be one of:

```
- "bm25_retriever" -> OpenSearchBM25Retriever
- "embedding_retriever" -> OpenSearchEmbeddingRetriever
```

**Parameters:**

- **document_store** (<code>OpenSearchDocumentStore</code>) – The OpenSearchDocumentStore to use for retrieval.
- **embedder** (<code>TextEmbedder</code>) – A TextEmbedder to use for embedding the query.
  See `haystack.components.embedders.types.protocol.TextEmbedder` for more information.
- **filters_bm25** (<code>dict\[str, Any\] | None</code>) – Filters for the BM25 retriever.
- **fuzziness** (<code>int | str</code>) – The fuzziness for the BM25 retriever.
- **top_k_bm25** (<code>int</code>) – The number of results to return from the BM25 retriever.
- **scale_score** (<code>bool</code>) – Whether to scale the score for the BM25 retriever.
- **all_terms_must_match** (<code>bool</code>) – Whether all terms must match for the BM25 retriever.
- **filter_policy_bm25** (<code>str | FilterPolicy</code>) – The filter policy for the BM25 retriever.
- **custom_query_bm25** (<code>dict\[str, Any\] | None</code>) – A custom query for the BM25 retriever.
- **filters_embedding** (<code>dict\[str, Any\] | None</code>) – Filters for the embedding retriever.
- **top_k_embedding** (<code>int</code>) – The number of results to return from the embedding retriever.
- **filter_policy_embedding** (<code>str | FilterPolicy</code>) – The filter policy for the embedding retriever.
- **custom_query_embedding** (<code>dict\[str, Any\] | None</code>) – A custom query for the embedding retriever.
- **search_kwargs_embedding** (<code>dict\[str, Any\] | None</code>) – Additional search kwargs for the embedding retriever.
- **join_mode** (<code>str | JoinMode</code>) – The mode to use for joining the results from the BM25 and embedding retrievers.
- **weights** (<code>list\[float\] | None</code>) – The weights for the joiner.
- **top_k** (<code>int | None</code>) – The number of results to return from the joiner.
- **sort_by_score** (<code>bool</code>) – Whether to sort the results by score.
- \*\***kwargs** (<code>Any</code>) – Additional keyword arguments. Use the following keys to pass extra parameters to the retrievers:
  - "bm25_retriever" -> OpenSearchBM25Retriever
  - "embedding_retriever" -> OpenSearchEmbeddingRetriever

#### warm_up

```python
warm_up() -> None
```

Warm up the underlying pipeline components.

#### run

```python
run(
    query: str,
    filters_bm25: dict[str, Any] | None = None,
    filters_embedding: dict[str, Any] | None = None,
    top_k_bm25: int | None = None,
    top_k_embedding: int | None = None,
) -> dict[str, list[Document]]
```

Run the hybrid retrieval pipeline and return retrieved documents.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize OpenSearchHybridRetriever to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> OpenSearchHybridRetriever
```

Deserialize an OpenSearchHybridRetriever from a dictionary.

## haystack_integrations.components.retrievers.opensearch.sql_retriever

### OpenSearchSQLRetriever

Executes raw OpenSearch SQL queries against an OpenSearchDocumentStore.

This component allows you to execute SQL queries directly against the OpenSearch index,
which is useful for fetching metadata, aggregations, and other structured data at runtime.

Returns the raw JSON response from the OpenSearch SQL API.

#### __init__

```python
__init__(
    *,
    document_store: OpenSearchDocumentStore,
    raise_on_failure: bool = True,
    fetch_size: int | None = None
) -> None
```

Creates the OpenSearchSQLRetriever component.

**Parameters:**

- **document_store** (<code>OpenSearchDocumentStore</code>) – An instance of OpenSearchDocumentStore to use with the Retriever.
- **raise_on_failure** (<code>bool</code>) – Whether to raise an exception if the API call fails. Otherwise, log a warning and return None.
- **fetch_size** (<code>int | None</code>) – Optional number of results to fetch per page. If not provided, the default
  fetch size set in OpenSearch is used.

**Raises:**

- <code>ValueError</code> – If `document_store` is not an instance of OpenSearchDocumentStore.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> OpenSearchSQLRetriever
```

Deserializes the component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from.

**Returns:**

- <code>OpenSearchSQLRetriever</code> – Deserialized component.

#### run

```python
run(
    query: str,
    document_store: OpenSearchDocumentStore | None = None,
    fetch_size: int | None = None,
) -> dict[str, dict[str, Any]]
```

Execute a raw OpenSearch SQL query against the index.

**Parameters:**

- **query** (<code>str</code>) – The OpenSearch SQL query to execute.
- **document_store** (<code>OpenSearchDocumentStore | None</code>) – Optionally, an instance of OpenSearchDocumentStore to use with the Retriever.
- **fetch_size** (<code>int | None</code>) – Optional number of results to fetch per page. If not provided, uses the value
  specified during initialization, or the default fetch size set in OpenSearch.

**Returns:**

- <code>dict\[str, dict\[str, Any\]\]</code> – A dictionary containing the raw JSON response from OpenSearch SQL API:
  - result: The raw JSON response from OpenSearch (dict) or None on error.

Example:

```python
retriever = OpenSearchSQLRetriever(document_store=document_store)
result = retriever.run(
    query="SELECT content, category FROM my_index WHERE category = 'A'"
)
# result["result"] contains the raw OpenSearch JSON response
# For regular queries: result["result"]["hits"]["hits"] contains documents
# For aggregate queries: result["result"]["aggregations"] contains aggregations
```

#### run_async

```python
run_async(
    query: str,
    document_store: OpenSearchDocumentStore | None = None,
    fetch_size: int | None = None,
) -> dict[str, dict[str, Any]]
```

Asynchronously execute a raw OpenSearch SQL query against the index.

**Parameters:**

- **query** (<code>str</code>) – The OpenSearch SQL query to execute.
- **document_store** (<code>OpenSearchDocumentStore | None</code>) – Optionally, an instance of OpenSearchDocumentStore to use with the Retriever.
- **fetch_size** (<code>int | None</code>) – Optional number of results to fetch per page. If not provided, uses the value
  specified during initialization, or the default fetch size set in OpenSearch.

**Returns:**

- <code>dict\[str, dict\[str, Any\]\]</code> – A dictionary containing the raw JSON response from OpenSearch SQL API:
  - result: The raw JSON response from OpenSearch (dict) or None on error.

Example:

```python
retriever = OpenSearchSQLRetriever(document_store=document_store)
result = await retriever.run_async(
    query="SELECT content, category FROM my_index WHERE category = 'A'"
)
# result["result"] contains the raw OpenSearch JSON response
# For regular queries: result["result"]["hits"]["hits"] contains documents
# For aggregate queries: result["result"]["aggregations"] contains aggregations
```

## haystack_integrations.document_stores.opensearch.document_store

### OpenSearchDocumentStore

An instance of an OpenSearch database you can use to store all types of data.

This document store is a thin wrapper around the OpenSearch client.
It allows you to store and retrieve documents from an OpenSearch index.

Usage example:

```python
from haystack_integrations.document_stores.opensearch import (
    OpenSearchDocumentStore,
)
from haystack import Document

document_store = OpenSearchDocumentStore(hosts="localhost:9200")

document_store.write_documents(
    [
        Document(content="My first document", id="1"),
        Document(content="My second document", id="2"),
    ]
)

print(document_store.count_documents())
# 2

print(document_store.filter_documents())
# [Document(id='1', content='My first document', ...), Document(id='2', content='My second document', ...)]
```

#### __init__

```python
__init__(
    *,
    hosts: Hosts | None = None,
    index: str = "default",
    max_chunk_bytes: int = DEFAULT_MAX_CHUNK_BYTES,
    embedding_dim: int = 768,
    return_embedding: bool = False,
    method: dict[str, Any] | None = None,
    mappings: dict[str, Any] | None = None,
    settings: dict[str, Any] | None = DEFAULT_SETTINGS,
    create_index: bool = True,
    http_auth: (
        tuple[Secret, Secret]
        | tuple[str, str]
        | list[str]
        | str
        | AWSAuth
        | None
    ) =
    (
        Secret.from_env_var("OPENSEARCH_USERNAME", strict=False),
        Secret.from_env_var("OPENSEARCH_PASSWORD", strict=False),
    ),
    use_ssl: bool | None = None,
    verify_certs: bool | None = None,
    timeout: int | None = None,
    nested_fields: list[str] | Literal["*"] | None = None,
    **kwargs: Any
) -> None
```

Creates a new OpenSearchDocumentStore instance.

The `embedding_dim`, `method`, `mappings`, and `settings` arguments are only used if the index does not
exist and needs to be created. If the index already exists, its current configuration is used.

For more information on connection parameters, see the [official OpenSearch documentation](https://opensearch.org/docs/latest/clients/python-low-level/#connecting-to-opensearch).

**Parameters:**

- **hosts** (<code>Hosts | None</code>) – List of hosts running the OpenSearch client. Defaults to None
- **index** (<code>str</code>) – Name of index in OpenSearch, if it doesn't exist it will be created. Defaults to "default"
- **max_chunk_bytes** (<code>int</code>) – Maximum size of the requests in bytes. Defaults to 100MB
- **embedding_dim** (<code>int</code>) – Dimension of the embeddings. Defaults to 768
- **return_embedding** (<code>bool</code>) – Whether to return the embedding of the retrieved Documents. This parameter also applies to the
  `filter_documents` and `filter_documents_async` methods.
- **method** (<code>dict\[str, Any\] | None</code>) – The method definition of the underlying configuration of the approximate k-NN algorithm. Please
  see the [official OpenSearch docs](https://opensearch.org/docs/latest/search-plugins/knn/knn-index/#method-definitions)
  for more information. Defaults to None
- **mappings** (<code>dict\[str, Any\] | None</code>) – The mapping of how the documents are stored and indexed.
  Please see the [official OpenSearch docs](https://opensearch.org/docs/latest/field-types/)
  for more information. If None, it uses the embedding_dim and method arguments to create default mappings.
  Defaults to None
- **settings** (<code>dict\[str, Any\] | None</code>) – The settings of the index to be created. Please see the [official OpenSearch docs](https://opensearch.org/docs/latest/search-plugins/knn/knn-index/#index-settings)
  for more information. Defaults to `{"index.knn": True}`.
- **create_index** (<code>bool</code>) – Whether to create the index if it doesn't exist. Defaults to True
- **http_auth** (<code>tuple\[Secret, Secret\] | tuple\[str, str\] | list\[str\] | str | AWSAuth | None</code>) – http_auth param passed to the underlying connection class.
  For basic authentication with default connection class `Urllib3HttpConnection` this can be
  - a tuple of (username, password)
  - a list of [username, password]
  - a string of "username:password"

  If not provided, will read values from OPENSEARCH_USERNAME and OPENSEARCH_PASSWORD environment variables.
  For AWS authentication with `Urllib3HttpConnection` pass an instance of `AWSAuth`.
  Defaults to None
- **use_ssl** (<code>bool | None</code>) – Whether to use SSL. Defaults to None
- **verify_certs** (<code>bool | None</code>) – Whether to verify certificates. Defaults to None
- **timeout** (<code>int | None</code>) – Timeout in seconds. Defaults to None
- **nested_fields** (<code>list\[str\] | Literal['\*'] | None</code>) – List of metadata field paths (without the `meta.` prefix) that should be mapped
  as OpenSearch `nested` type, enabling multi-condition filtering on array-of-objects fields.
  Pass `"*"` to auto-detect `list[dict]` fields and map them as nested from
  the first `write_documents` batch.
  When the index already exists, nested fields are discovered from the live mapping.
  Defaults to None (no nested support).
- \*\***kwargs** (<code>Any</code>) – Optional arguments that `OpenSearch` takes. For the full list of supported kwargs,
  see the [official OpenSearch reference](https://opensearch-project.github.io/opensearch-py/api-ref/clients/opensearch_client.html)

#### create_index

```python
create_index(
    index: str | None = None,
    mappings: dict[str, Any] | None = None,
    settings: dict[str, Any] | None = None,
) -> None
```

Creates an index in OpenSearch.

Note that this method ignores the `create_index` argument from the constructor.

**Parameters:**

- **index** (<code>str | None</code>) – Name of the index to create. If None, the index name from the constructor is used.
- **mappings** (<code>dict\[str, Any\] | None</code>) – The mapping of how the documents are stored and indexed. Please see the [official OpenSearch docs](https://opensearch.org/docs/latest/field-types/)
  for more information. If None, the mappings from the constructor are used.
- **settings** (<code>dict\[str, Any\] | None</code>) – The settings of the index to be created. Please see the [official OpenSearch docs](https://opensearch.org/docs/latest/search-plugins/knn/knn-index/#index-settings)
  for more information. If None, the settings from the constructor are used.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> OpenSearchDocumentStore
```

Deserializes the component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from.

**Returns:**

- <code>OpenSearchDocumentStore</code> – Deserialized component.

#### count_documents

```python
count_documents() -> int
```

Returns how many documents are present in the document store.

#### count_documents_async

```python
count_documents_async() -> int
```

Asynchronously returns the total number of documents in the document store.

#### filter_documents

```python
filter_documents(filters: dict[str, Any] | None = None) -> list[Document]
```

Returns the documents that match the filters provided.

For a detailed specification of the filters,
refer to the [documentation](https://docs.haystack.deepset.ai/docs/metadata-filtering)

**Parameters:**

- **filters** (<code>dict\[str, Any\] | None</code>) – The filters to apply to the document list.

**Returns:**

- <code>list\[Document\]</code> – A list of Documents that match the given filters.

#### filter_documents_async

```python
filter_documents_async(filters: dict[str, Any] | None = None) -> list[Document]
```

Asynchronously returns the documents that match the filters provided.

For a detailed specification of the filters,
refer to the [documentation](https://docs.haystack.deepset.ai/docs/metadata-filtering)

**Parameters:**

- **filters** (<code>dict\[str, Any\] | None</code>) – The filters to apply to the document list.

**Returns:**

- <code>list\[Document\]</code> – A list of Documents that match the given filters.

#### write_documents

```python
write_documents(
    documents: list[Document],
    policy: DuplicatePolicy = DuplicatePolicy.NONE,
    refresh: Literal["wait_for", True, False] = "wait_for",
) -> int
```

Writes documents to the document store.

**Parameters:**

- **documents** (<code>list\[Document\]</code>) – A list of Documents to write to the document store.
- **policy** (<code>DuplicatePolicy</code>) – The duplicate policy to use when writing documents.
- **refresh** (<code>Literal['wait_for', True, False]</code>) – Controls when changes are made visible to search operations.
  - `True`: Force refresh immediately after the operation.
  - `False`: Do not refresh (better performance for bulk operations).
  - `"wait_for"`: Wait for the next refresh cycle (default, ensures read-your-writes consistency).

  For more details, see the [OpenSearch refresh documentation](https://opensearch.org/docs/latest/api-reference/document-apis/index-document/).

**Returns:**

- <code>int</code> – The number of documents written to the document store.

**Raises:**

- <code>DuplicateDocumentError</code> – If a document with the same id already exists in the document store
  and the policy is set to `DuplicatePolicy.FAIL` (or not specified).

#### write_documents_async

```python
write_documents_async(
    documents: list[Document],
    policy: DuplicatePolicy = DuplicatePolicy.NONE,
    refresh: Literal["wait_for", True, False] = "wait_for",
) -> int
```

Asynchronously writes documents to the document store.

**Parameters:**

- **documents** (<code>list\[Document\]</code>) – A list of Documents to write to the document store.
- **policy** (<code>DuplicatePolicy</code>) – The duplicate policy to use when writing documents.
- **refresh** (<code>Literal['wait_for', True, False]</code>) – Controls when changes are made visible to search operations.
  - `True`: Force refresh immediately after the operation.
  - `False`: Do not refresh (better performance for bulk operations).
  - `"wait_for"`: Wait for the next refresh cycle (default, ensures read-your-writes consistency).

  For more details, see the [OpenSearch refresh documentation](https://opensearch.org/docs/latest/api-reference/document-apis/index-document/).

**Returns:**

- <code>int</code> – The number of documents written to the document store.

#### delete_documents

```python
delete_documents(
    document_ids: list[str],
    refresh: Literal["wait_for", True, False] = "wait_for",
    routing: dict[str, str] | None = None,
) -> None
```

Deletes documents that match the provided `document_ids` from the document store.

**Parameters:**

- **document_ids** (<code>list\[str\]</code>) – The IDs of the documents to delete.
- **refresh** (<code>Literal['wait_for', True, False]</code>) – Controls when changes are made visible to search operations.
  - `True`: Force refresh immediately after the operation.
  - `False`: Do not refresh (better performance for bulk operations).
  - `"wait_for"`: Wait for the next refresh cycle (default, ensures read-your-writes consistency).

  For more details, see the [OpenSearch refresh documentation](https://opensearch.org/docs/latest/api-reference/document-apis/index-document/).
- **routing** (<code>dict\[str, str\] | None</code>) – A dictionary mapping document IDs to their routing values.
  Routing values are used to determine the shard where documents are stored.
  If provided, the routing value for each document will be used during deletion.

#### delete_documents_async

```python
delete_documents_async(
    document_ids: list[str],
    refresh: Literal["wait_for", True, False] = "wait_for",
    routing: dict[str, str] | None = None,
) -> None
```

Asynchronously deletes documents that match the provided `document_ids` from the document store.

**Parameters:**

- **document_ids** (<code>list\[str\]</code>) – The IDs of the documents to delete.
- **refresh** (<code>Literal['wait_for', True, False]</code>) – Controls when changes are made visible to search operations.
  - `True`: Force refresh immediately after the operation.
  - `False`: Do not refresh (better performance for bulk operations).
  - `"wait_for"`: Wait for the next refresh cycle (default, ensures read-your-writes consistency).

  For more details, see the [OpenSearch refresh documentation](https://opensearch.org/docs/latest/api-reference/document-apis/index-document/).
- **routing** (<code>dict\[str, str\] | None</code>) – A dictionary mapping document IDs to their routing values.
  Routing values are used to determine the shard where documents are stored.
  If provided, the routing value for each document will be used during deletion.

#### delete_all_documents

```python
delete_all_documents(
    recreate_index: bool = False, refresh: bool = True
) -> None
```

Deletes all documents in the document store.

**Parameters:**

- **recreate_index** (<code>bool</code>) – If True, the index will be deleted and recreated with the original mappings and
  settings. If False, all documents will be deleted using the `delete_by_query` API.
- **refresh** (<code>bool</code>) – If True, OpenSearch refreshes all shards involved in the delete by query after the request
  completes. If False, no refresh is performed. For more details, see the
  [OpenSearch delete_by_query refresh documentation](https://opensearch.org/docs/latest/api-reference/document-apis/delete-by-query/).

#### delete_all_documents_async

```python
delete_all_documents_async(
    recreate_index: bool = False, refresh: bool = True
) -> None
```

Asynchronously deletes all documents in the document store.

**Parameters:**

- **recreate_index** (<code>bool</code>) – If True, the index will be deleted and recreated with the original mappings and
  settings. If False, all documents will be deleted using the `delete_by_query` API.
- **refresh** (<code>bool</code>) – If True, OpenSearch refreshes all shards involved in the delete by query after the request
  completes. If False, no refresh is performed. For more details, see the
  [OpenSearch delete_by_query refresh documentation](https://opensearch.org/docs/latest/api-reference/document-apis/delete-by-query/).

#### delete_by_filter

```python
delete_by_filter(filters: dict[str, Any], refresh: bool = False) -> int
```

Deletes all documents that match the provided filters.

**Parameters:**

- **filters** (<code>dict\[str, Any\]</code>) – The filters to apply to select documents for deletion.
  For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering)
- **refresh** (<code>bool</code>) – If True, OpenSearch refreshes all shards involved in the delete by query after the request
  completes so that subsequent reads (e.g. count_documents) see the update. If False, no refresh is
  performed (better for bulk deletes). For more details, see the
  [OpenSearch delete_by_query refresh documentation](https://opensearch.org/docs/latest/api-reference/document-apis/delete-by-query/).

**Returns:**

- <code>int</code> – The number of documents deleted.

#### delete_by_filter_async

```python
delete_by_filter_async(filters: dict[str, Any], refresh: bool = False) -> int
```

Asynchronously deletes all documents that match the provided filters.

**Parameters:**

- **filters** (<code>dict\[str, Any\]</code>) – The filters to apply to select documents for deletion.
  For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering)
- **refresh** (<code>bool</code>) – If True, OpenSearch refreshes all shards involved in the delete by query after the request
  completes so that subsequent reads see the update. If False, no refresh is performed. For more details,
  see the [OpenSearch delete_by_query refresh documentation](https://opensearch.org/docs/latest/api-reference/document-apis/delete-by-query/).

**Returns:**

- <code>int</code> – The number of documents deleted.

#### update_by_filter

```python
update_by_filter(
    filters: dict[str, Any], meta: dict[str, Any], refresh: bool = False
) -> int
```

Updates the metadata of all documents that match the provided filters.

**Parameters:**

- **filters** (<code>dict\[str, Any\]</code>) – The filters to apply to select documents for updating.
  For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering)
- **meta** (<code>dict\[str, Any\]</code>) – The metadata fields to update.
- **refresh** (<code>bool</code>) – If True, OpenSearch refreshes all shards involved in the update by query after the request
  completes. If False, no refresh is performed. For more details, see the
  [OpenSearch update_by_query refresh documentation](https://opensearch.org/docs/latest/api-reference/document-apis/update-by-query/).

**Returns:**

- <code>int</code> – The number of documents updated.

#### update_by_filter_async

```python
update_by_filter_async(
    filters: dict[str, Any], meta: dict[str, Any], refresh: bool = False
) -> int
```

Asynchronously updates the metadata of all documents that match the provided filters.

**Parameters:**

- **filters** (<code>dict\[str, Any\]</code>) – The filters to apply to select documents for updating.
  For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering)
- **meta** (<code>dict\[str, Any\]</code>) – The metadata fields to update.
- **refresh** (<code>bool</code>) – If True, OpenSearch refreshes all shards involved in the update by query after the request
  completes. If False, no refresh is performed. For more details, see the
  [OpenSearch update_by_query refresh documentation](https://opensearch.org/docs/latest/api-reference/document-apis/update-by-query/).

**Returns:**

- <code>int</code> – The number of documents updated.

#### count_documents_by_filter

```python
count_documents_by_filter(filters: dict[str, Any]) -> int
```

Returns the number of documents that match the provided filters.

**Parameters:**

- **filters** (<code>dict\[str, Any\]</code>) – The filters to apply to count documents.
  For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering)

**Returns:**

- <code>int</code> – The number of documents that match the filters.

#### count_documents_by_filter_async

```python
count_documents_by_filter_async(filters: dict[str, Any]) -> int
```

Asynchronously returns the number of documents that match the provided filters.

**Parameters:**

- **filters** (<code>dict\[str, Any\]</code>) – The filters to apply to count documents.
  For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering)

**Returns:**

- <code>int</code> – The number of documents that match the filters.
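
The filter-based methods documented above can be combined into a count, delete, and verify sequence. The following is a minimal sketch, not part of the integration: `delete_and_verify` and `DRAFTS_2019` are hypothetical names, the `meta.year` and `meta.status` fields are made up for illustration, and `store` is assumed to be any object exposing `count_documents_by_filter` and `delete_by_filter` with the signatures shown in this section (for example, an `OpenSearchDocumentStore`).

```python
from typing import Any

# Haystack metadata filter selecting 2019 documents that are still drafts.
# The field names and values are illustrative assumptions.
DRAFTS_2019: dict[str, Any] = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.year", "operator": "==", "value": 2019},
        {"field": "meta.status", "operator": "==", "value": "draft"},
    ],
}


def delete_and_verify(store: Any, filters: dict[str, Any]) -> tuple[int, int]:
    """Delete matching documents and return (deleted, remaining).

    Hypothetical helper: it relies only on the two documented methods
    `count_documents_by_filter` and `delete_by_filter`.
    """
    # Skip the delete request entirely when nothing matches.
    if store.count_documents_by_filter(filters) == 0:
        return 0, 0

    # refresh=True asks OpenSearch to refresh the affected shards so the
    # follow-up count sees the deletion.
    deleted = store.delete_by_filter(filters, refresh=True)
    remaining = store.count_documents_by_filter(filters)
    return deleted, remaining
```

With a configured store, `delete_and_verify(document_store, DRAFTS_2019)` would be expected to report zero remaining documents. For large bulk deletions where an immediate read is not needed, leaving `refresh` at its default of `False` is likely the better choice, as noted in the parameter description.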

#### count_unique_metadata_by_filter

```python
count_unique_metadata_by_filter(
    filters: dict[str, Any], metadata_fields: list[str]
) -> dict[str, int]
```

Returns the number of unique values for each specified metadata field of the documents that match the filters.

**Parameters:**

- **filters** (<code>dict\[str, Any\]</code>) – The filters to apply to count documents.
  For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering)
- **metadata_fields** (<code>list\[str\]</code>) – List of field names to calculate unique values for.
  Field names can include or omit the "meta." prefix.

**Returns:**

- <code>dict\[str, int\]</code> – A dictionary mapping each metadata field name to the count of its unique values among the filtered
  documents.

**Raises:**

- <code>ValueError</code> – If any of the requested fields don't exist in the index mapping.

#### count_unique_metadata_by_filter_async

```python
count_unique_metadata_by_filter_async(
    filters: dict[str, Any], metadata_fields: list[str]
) -> dict[str, int]
```

Asynchronously returns the number of unique values for each specified metadata field matching the filters.

**Parameters:**

- **filters** (<code>dict\[str, Any\]</code>) – The filters to apply to count documents.
  For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering)
- **metadata_fields** (<code>list\[str\]</code>) – List of field names to calculate unique values for.
  Field names can include or omit the "meta." prefix.

**Returns:**

- <code>dict\[str, int\]</code> – A dictionary mapping each metadata field name to the count of its unique values among the filtered
  documents.

**Raises:**

- <code>ValueError</code> – If any of the requested fields don't exist in the index mapping.

#### get_metadata_fields_info

```python
get_metadata_fields_info() -> dict[str, dict[str, str]]
```

Returns the information about the fields in the index.

If we populated the index with documents like:

```python
Document(content="Doc 1", meta={"category": "A", "status": "active", "priority": 1})
Document(content="Doc 2", meta={"category": "B", "status": "inactive"})
```

This method would return:

```python
{
    'content': {'type': 'text'},
    'category': {'type': 'keyword'},
    'status': {'type': 'keyword'},
    'priority': {'type': 'long'},
}
```

**Returns:**

- <code>dict\[str, dict\[str, str\]\]</code> – The information about the fields in the index.

#### get_metadata_fields_info_async

```python
get_metadata_fields_info_async() -> dict[str, dict[str, str]]
```

Asynchronously returns the information about the fields in the index.

If we populated the index with documents like:

```python
Document(content="Doc 1", meta={"category": "A", "status": "active", "priority": 1})
Document(content="Doc 2", meta={"category": "B", "status": "inactive"})
```

This method would return:

```python
{
    'content': {'type': 'text'},
    'category': {'type': 'keyword'},
    'status': {'type': 'keyword'},
    'priority': {'type': 'long'},
}
```

**Returns:**

- <code>dict\[str, dict\[str, str\]\]</code> – The information about the fields in the index.

#### get_metadata_field_min_max

```python
get_metadata_field_min_max(metadata_field: str) -> dict[str, int | None]
```

Returns the minimum and maximum values for the given metadata field.

**Parameters:**

- **metadata_field** (<code>str</code>) – The metadata field to get the minimum and maximum values for.

**Returns:**

- <code>dict\[str, int | None\]</code> – A dictionary with the keys "min" and "max", where each value is the minimum or maximum value of the
  metadata field across all documents.

#### get_metadata_field_min_max_async

```python
get_metadata_field_min_max_async(metadata_field: str) -> dict[str, int | None]
```

Asynchronously returns the minimum and maximum values for the given metadata field.

**Parameters:**

- **metadata_field** (<code>str</code>) – The metadata field to get the minimum and maximum values for.

**Returns:**

- <code>dict\[str, int | None\]</code> – A dictionary with the keys "min" and "max", where each value is the minimum or maximum value of the
  metadata field across all documents.

#### get_metadata_field_unique_values

```python
get_metadata_field_unique_values(
    metadata_field: str,
    search_term: str | None = None,
    size: int | None = 10000,
    after: dict[str, Any] | None = None,
) -> tuple[list[str], dict[str, Any] | None]
```

Returns unique values for a metadata field, optionally filtered by a search term in the content.

Uses composite aggregations for proper pagination beyond 10k results.

**Parameters:**

- **metadata_field** (<code>str</code>) – The metadata field to get unique values for.
- **search_term** (<code>str | None</code>) – Optional search term to filter documents by matching in the content field.
- **size** (<code>int | None</code>) – The number of unique values to return per page. Defaults to 10000.
- **after** (<code>dict\[str, Any\] | None</code>) – Optional pagination key from the previous response. Use None for the first page.
  For subsequent pages, pass the `after_key` from the previous response.

**Returns:**

- <code>tuple\[list\[str\], dict\[str, Any\] | None\]</code> – A tuple containing (list of unique values, after_key for pagination).
  The after_key is None when there are no more results. Use it in the `after` parameter
  for the next page.

#### get_metadata_field_unique_values_async

```python
get_metadata_field_unique_values_async(
    metadata_field: str,
    search_term: str | None = None,
    size: int | None = 10000,
    after: dict[str, Any] | None = None,
) -> tuple[list[str], dict[str, Any] | None]
```

Asynchronously returns unique values for a metadata field, optionally filtered by a search term in the content.

Uses composite aggregations for proper pagination beyond 10k results.

**Parameters:**

- **metadata_field** (<code>str</code>) – The metadata field to get unique values for.
- **search_term** (<code>str | None</code>) – Optional search term to filter documents by matching in the content field.
- **size** (<code>int | None</code>) – The number of unique values to return per page. Defaults to 10000.
- **after** (<code>dict\[str, Any\] | None</code>) – Optional pagination key from the previous response. Use None for the first page.
  For subsequent pages, pass the `after_key` from the previous response.

**Returns:**

- <code>tuple\[list\[str\], dict\[str, Any\] | None\]</code> – A tuple containing (list of unique values, after_key for pagination).
  The after_key is None when there are no more results. Use it in the `after` parameter
  for the next page.
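
The `after` / `after_key` pagination contract described above can be wrapped in a small generator. This is a sketch under stated assumptions: `iter_unique_values` is a hypothetical helper, not part of the integration, and `store` is assumed to be any object whose `get_metadata_field_unique_values` method has the signature documented in this section.

```python
from typing import Any, Iterator, Optional


def iter_unique_values(
    store: Any,
    metadata_field: str,
    page_size: int = 1000,
    search_term: Optional[str] = None,
) -> Iterator[str]:
    """Yield every unique value of `metadata_field`, one page at a time.

    Hypothetical helper built on the documented
    `get_metadata_field_unique_values` method.
    """
    after: Optional[dict[str, Any]] = None  # None requests the first page
    while True:
        values, after = store.get_metadata_field_unique_values(
            metadata_field,
            search_term=search_term,
            size=page_size,
            after=after,
        )
        yield from values
        # A None after_key means there are no more results; the empty-page
        # check is an extra guard against looping forever.
        if after is None or not values:
            break
```

A call such as `list(iter_unique_values(document_store, "category"))` would collect all values into memory; iterating lazily keeps only one page in memory at a time, which is the main reason to prefer a generator here.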

## haystack_integrations.document_stores.opensearch.filters

### normalize_filters

```python
normalize_filters(
    filters: dict[str, Any], nested_fields: set[str] | None = None
) -> dict[str, Any]
```

Converts Haystack filters into OpenSearch-compatible filters.

**Parameters:**

- **filters** (<code>dict\[str, Any\]</code>) – Haystack filter dictionary.
- **nested_fields** (<code>set\[str\] | None</code>) – Set of metadata field paths that are mapped as `nested` type in OpenSearch.
  When provided, conditions targeting sub-fields of these paths are wrapped in `nested` queries.