opensearch.md
   1  ---
   2  title: "OpenSearch"
   3  id: integrations-opensearch
   4  description: "OpenSearch integration for Haystack"
   5  slug: "/integrations-opensearch"
   6  ---
   7  
   8  
   9  ## haystack_integrations.components.retrievers.opensearch.bm25_retriever
  10  
  11  ### OpenSearchBM25Retriever
  12  
  13  Fetches documents from OpenSearchDocumentStore using the keyword-based BM25 algorithm.
  14  
  15  BM25 computes a weighted word overlap between the query string and a document to determine its similarity.
  16  
  17  #### __init__
  18  
  19  ```python
  20  __init__(
  21      *,
  22      document_store: OpenSearchDocumentStore,
  23      filters: dict[str, Any] | None = None,
  24      fuzziness: int | str = "AUTO",
  25      top_k: int = 10,
  26      scale_score: bool = False,
  27      all_terms_must_match: bool = False,
  28      filter_policy: str | FilterPolicy = FilterPolicy.REPLACE,
  29      custom_query: dict[str, Any] | None = None,
  30      raise_on_failure: bool = True
  31  ) -> None
  32  ```
  33  
  34  Creates the OpenSearchBM25Retriever component.
  35  
  36  **Parameters:**
  37  
  38  - **document_store** (<code>OpenSearchDocumentStore</code>) – An instance of OpenSearchDocumentStore to use with the Retriever.
  39  - **filters** (<code>dict\[str, Any\] | None</code>) – Filters to narrow down the search for documents in the Document Store.
- **fuzziness** (<code>int | str</code>) – Controls how much approximate string matching is applied in full-text queries.
  It sets the maximum number of character edits (insertions, deletions, or substitutions)
  allowed when matching one word against another. For example, the edit distance between
  "wined" and "wind" is 1 because a single edit turns one word into the other.

  Use "AUTO" (the default) to adjust the allowed number of edits automatically based on term length,
  which suits most scenarios. For detailed guidance, refer to the
  [OpenSearch fuzzy query documentation](https://opensearch.org/docs/latest/query-dsl/term/fuzzy/).
  48  
  49  - **top_k** (<code>int</code>) – Maximum number of documents to return.
  50  
  51  - **scale_score** (<code>bool</code>) – If `True`, scales the score of retrieved documents to a range between 0 and 1.
  52    This is useful when comparing documents across different indexes.
  53  
  54  - **all_terms_must_match** (<code>bool</code>) – If `True`, all terms in the query string must be present in the
  55    retrieved documents. This is useful when searching for short text where even one term
  56    can make a difference.
  57  
- **filter_policy** (<code>str | FilterPolicy</code>) – Policy to determine how filters are applied. Possible options:

  - `replace`: Runtime filters replace initialization filters. Use this policy to change the filtering scope
    for specific queries.

  - `merge`: Runtime filters are merged with initialization filters.
  64  
  65  - **custom_query** (<code>dict\[str, Any\] | None</code>) – The query containing a mandatory `$query` and an optional `$filters` placeholder.
  66  
  67    **An example custom_query:**
  68  
  ```python
  {
      "query": {
          "bool": {
              "should": [{"multi_match": {
                  "query": "$query",                 # mandatory query placeholder
                  "type": "most_fields",
                  "fields": ["content", "title"]}}],
              "filter": "$filters"                  # optional filter placeholder
          }
      }
  }
  ```
  82  
  83  An example `run()` method for this `custom_query`:
  84  
  85  ```python
  86  retriever.run(
  87      query="Why did the revenue increase?",
  88      filters={
  89          "operator": "AND",
  90          "conditions": [
  91              {"field": "meta.years", "operator": "==", "value": "2019"},
  92              {"field": "meta.quarters", "operator": "in", "value": ["Q1", "Q2"]},
  93          ],
  94      },
  95  )
  96  ```
  97  
- **raise_on_failure** (<code>bool</code>) – If `True`, raises an exception if the API call fails.
  If `False`, logs a warning and returns an empty list.
  99  
 100  **Raises:**
 101  
 102  - <code>ValueError</code> – If `document_store` is not an instance of OpenSearchDocumentStore.
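
To make the `fuzziness` parameter concrete, the sketch below computes the plain Levenshtein edit distance between two words. It is an illustration only: `edit_distance` is a hypothetical helper, not part of the integration, and OpenSearch's own fuzzy matching may additionally count a swap of two adjacent characters as a single edit.

```python
def edit_distance(a: str, b: str) -> int:
    """Return the Levenshtein distance between two strings."""
    # previous[j] holds the distance between the processed prefix of `a` and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


print(edit_distance("wined", "wind"))  # 1
```

Under this measure, a query term "wined" is within one edit of "wind", so a fuzziness of 1 or more would be enough to treat the two as a match.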
 103  
 104  #### to_dict
 105  
 106  ```python
 107  to_dict() -> dict[str, Any]
 108  ```
 109  
 110  Serializes the component to a dictionary.
 111  
 112  **Returns:**
 113  
 114  - <code>dict\[str, Any\]</code> – Dictionary with serialized data.
 115  
 116  #### from_dict
 117  
 118  ```python
 119  from_dict(data: dict[str, Any]) -> OpenSearchBM25Retriever
 120  ```
 121  
 122  Deserializes the component from a dictionary.
 123  
 124  **Parameters:**
 125  
 126  - **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from.
 127  
 128  **Returns:**
 129  
 130  - <code>OpenSearchBM25Retriever</code> – Deserialized component.
 131  
 132  #### run
 133  
 134  ```python
 135  run(
 136      query: str,
 137      filters: dict[str, Any] | None = None,
 138      all_terms_must_match: bool | None = None,
 139      top_k: int | None = None,
 140      fuzziness: int | str | None = None,
 141      scale_score: bool | None = None,
 142      custom_query: dict[str, Any] | None = None,
 143      document_store: OpenSearchDocumentStore | None = None,
 144  ) -> dict[str, list[Document]]
 145  ```
 146  
 147  Retrieve documents using BM25 retrieval.
 148  
 149  **Parameters:**
 150  
 151  - **query** (<code>str</code>) – The query string.
 152  
 153  - **filters** (<code>dict\[str, Any\] | None</code>) – Filters applied to the retrieved documents. The way runtime filters are applied depends on
 154    the `filter_policy` specified at Retriever's initialization.
 155  
 156  - **all_terms_must_match** (<code>bool | None</code>) – If `True`, all terms in the query string must be present in the
 157    retrieved documents.
 158  
 159  - **top_k** (<code>int | None</code>) – Maximum number of documents to return.
 160  
 161  - **fuzziness** (<code>int | str | None</code>) – Fuzziness parameter for full-text queries to apply approximate string matching.
 162    For more information, see [OpenSearch fuzzy query](https://opensearch.org/docs/latest/query-dsl/term/fuzzy/).
 163  
 164  - **scale_score** (<code>bool | None</code>) – If `True`, scales the score of retrieved documents to a range between 0 and 1.
 165    This is useful when comparing documents across different indexes.
 166  
 167  - **custom_query** (<code>dict\[str, Any\] | None</code>) – A custom OpenSearch query. It must include a `$query` and may optionally
 168    include a `$filters` placeholder.
 169  
 170    **An example custom_query:**
 171  
  ```python
  {
      "query": {
          "bool": {
              "should": [{"multi_match": {
                  "query": "$query",                 # mandatory query placeholder
                  "type": "most_fields",
                  "fields": ["content", "title"]}}],
              "filter": "$filters"                  # optional filter placeholder
          }
      }
  }
  ```
 185  
 186  **For this custom_query, a sample `run()` could be:**
 187  
 188  ```python
 189  retriever.run(
 190      query="Why did the revenue increase?",
 191      filters={
 192          "operator": "AND",
 193          "conditions": [
 194              {"field": "meta.years", "operator": "==", "value": "2019"},
 195              {"field": "meta.quarters", "operator": "in", "value": ["Q1", "Q2"]},
 196          ],
 197      },
 198  )
 199  ```
 200  
- **document_store** (<code>OpenSearchDocumentStore | None</code>) – An optional instance of OpenSearchDocumentStore to use with the Retriever.
 202  
 203  **Returns:**
 204  
- <code>dict\[str, list\[Document\]\]</code> – A dictionary with the following structure:
  - `documents`: List of retrieved Documents.
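
The `$query` and `$filters` placeholders in `custom_query` can be pictured as plain string markers that are swapped for real values before the request is sent. A minimal sketch of that idea follows. It assumes a hypothetical `fill_placeholders` helper (not the integration's actual code) and uses an illustrative OpenSearch `term` clause as the filter value.

```python
from typing import Any


def fill_placeholders(template: Any, values: dict[str, Any]) -> Any:
    """Return a copy of `template` with placeholder strings swapped for their values."""
    if isinstance(template, dict):
        return {key: fill_placeholders(item, values) for key, item in template.items()}
    if isinstance(template, list):
        return [fill_placeholders(item, values) for item in template]
    if isinstance(template, str) and template in values:
        return values[template]
    return template


custom_query = {
    "query": {
        "bool": {
            "should": [{"multi_match": {
                "query": "$query",
                "type": "most_fields",
                "fields": ["content", "title"]}}],
            "filter": "$filters",
        }
    }
}

body = fill_placeholders(
    custom_query,
    {
        "$query": "Why did the revenue increase?",
        # Illustrative filter clause; the real one is derived from the `filters` argument.
        "$filters": {"term": {"years": "2019"}},
    },
)
print(body["query"]["bool"]["should"][0]["multi_match"]["query"])
```

Because the helper builds a new structure, the original `custom_query` template stays untouched and can be reused for the next call.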
 207  
 208  #### run_async
 209  
 210  ```python
 211  run_async(
 212      query: str,
 213      filters: dict[str, Any] | None = None,
 214      all_terms_must_match: bool | None = None,
 215      top_k: int | None = None,
 216      fuzziness: int | str | None = None,
 217      scale_score: bool | None = None,
 218      custom_query: dict[str, Any] | None = None,
 219      document_store: OpenSearchDocumentStore | None = None,
 220  ) -> dict[str, list[Document]]
 221  ```
 222  
 223  Asynchronously retrieve documents using BM25 retrieval.
 224  
 225  **Parameters:**
 226  
 227  - **query** (<code>str</code>) – The query string.
 228  - **filters** (<code>dict\[str, Any\] | None</code>) – Filters applied to the retrieved documents. The way runtime filters are applied depends on
 229    the `filter_policy` specified at Retriever's initialization.
 230  - **all_terms_must_match** (<code>bool | None</code>) – If `True`, all terms in the query string must be present in the
 231    retrieved documents.
 232  - **top_k** (<code>int | None</code>) – Maximum number of documents to return.
 233  - **fuzziness** (<code>int | str | None</code>) – Fuzziness parameter for full-text queries to apply approximate string matching.
 234    For more information, see [OpenSearch fuzzy query](https://opensearch.org/docs/latest/query-dsl/term/fuzzy/).
 235  - **scale_score** (<code>bool | None</code>) – If `True`, scales the score of retrieved documents to a range between 0 and 1.
 236    This is useful when comparing documents across different indexes.
 237  - **custom_query** (<code>dict\[str, Any\] | None</code>) – A custom OpenSearch query. It must include a `$query` and may optionally
 238    include a `$filters` placeholder.
- **document_store** (<code>OpenSearchDocumentStore | None</code>) – An optional instance of OpenSearchDocumentStore to use with the Retriever.
 240  
 241  **Returns:**
 242  
- <code>dict\[str, list\[Document\]\]</code> – A dictionary with the following structure:
  - `documents`: List of retrieved Documents.
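
How runtime `filters` interact with initialization filters depends on `filter_policy`. The sketch below is a simplified model of the two policies, not the library's implementation: `combine_filters` is a hypothetical helper, and representing `merge` as a logical `AND` of both filters is an assumption made for illustration.

```python
from typing import Any, Optional

Filters = Optional[dict[str, Any]]


def combine_filters(policy: str, init_filters: Filters, runtime_filters: Filters) -> Filters:
    """Simplified model of the `replace` and `merge` filter policies."""
    if runtime_filters is None:
        # Nothing passed at runtime: fall back to the initialization filters.
        return init_filters
    if policy == "replace" or init_filters is None:
        return runtime_filters
    if policy == "merge":
        # Assumption: merging is modelled as a logical AND of both filters.
        return {"operator": "AND", "conditions": [init_filters, runtime_filters]}
    raise ValueError(f"Unknown filter policy: {policy}")


init = {"field": "meta.years", "operator": "==", "value": "2019"}
runtime = {"field": "meta.quarters", "operator": "in", "value": ["Q1", "Q2"]}

print(combine_filters("replace", init, runtime))
print(combine_filters("merge", init, runtime))
```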
 245  
 246  ## haystack_integrations.components.retrievers.opensearch.embedding_retriever
 247  
 248  ### OpenSearchEmbeddingRetriever
 249  
 250  Retrieves documents from the OpenSearchDocumentStore using a vector similarity metric.
 251  
 252  Must be connected to the OpenSearchDocumentStore to run.
 253  
 254  #### __init__
 255  
 256  ```python
 257  __init__(
 258      *,
 259      document_store: OpenSearchDocumentStore,
 260      filters: dict[str, Any] | None = None,
 261      top_k: int = 10,
 262      filter_policy: str | FilterPolicy = FilterPolicy.REPLACE,
 263      custom_query: dict[str, Any] | None = None,
 264      raise_on_failure: bool = True,
 265      efficient_filtering: bool = False,
 266      search_kwargs: dict[str, Any] | None = None
 267  ) -> None
 268  ```
 269  
 270  Create the OpenSearchEmbeddingRetriever component.
 271  
 272  **Parameters:**
 273  
 274  - **document_store** (<code>OpenSearchDocumentStore</code>) – An instance of OpenSearchDocumentStore to use with the Retriever.
 275  
 276  - **filters** (<code>dict\[str, Any\] | None</code>) – Filters applied when fetching documents from the Document Store.
 277    Filters are applied during the approximate kNN search to ensure the Retriever returns
 278    `top_k` matching documents.
 279  
 280  - **top_k** (<code>int</code>) – Maximum number of documents to return.
 281  
- **filter_policy** (<code>str | FilterPolicy</code>) – Policy to determine how filters are applied. Possible options:

  - `merge`: Runtime filters are merged with initialization filters.

  - `replace`: Runtime filters replace initialization filters. Use this policy to change the filtering scope.
 287  
 288  - **custom_query** (<code>dict\[str, Any\] | None</code>) – The custom OpenSearch query containing a mandatory `$query_embedding` and
 289    an optional `$filters` placeholder.
 290  
 291    **An example custom_query:**
 292  
  ```python
  {
      "query": {
          "bool": {
              "must": [
                  {
                      "knn": {
                          "embedding": {
                              "vector": "$query_embedding",   # mandatory query placeholder
                              "k": 10000,
                          }
                      }
                  }
              ],
              "filter": "$filters"                            # optional filter placeholder
          }
      }
  }
  ```
 312  
 313  For this `custom_query`, an example `run()` could be:
 314  
 315  ```python
 316  retriever.run(
 317      query_embedding=embedding,
 318      filters={
 319          "operator": "AND",
 320          "conditions": [
 321              {"field": "meta.years", "operator": "==", "value": "2019"},
 322              {"field": "meta.quarters", "operator": "in", "value": ["Q1", "Q2"]},
 323          ],
 324      },
 325  )
 326  ```
 327  
 328  - **raise_on_failure** (<code>bool</code>) – If `True`, raises an exception if the API call fails.
 329    If `False`, logs a warning and returns an empty list.
 330  - **efficient_filtering** (<code>bool</code>) – If `True`, the filter will be applied during the approximate kNN search.
 331    This is only supported for knn engines "faiss" and "lucene" and does not work with the default "nmslib".
- **search_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for fine-tuning the embedding search,
  for example, to specify `k` and `ef_search`:
 334  
 335  ```python
 336  {
 337      "k": 20, # See https://docs.opensearch.org/latest/vector-search/vector-search-techniques/approximate-knn/#the-number-of-returned-results
 338      "method_parameters": {
 339          "ef_search": 512, # See https://docs.opensearch.org/latest/query-dsl/specialized/k-nn/index/#ef_search
 340      }
 341  }
 342  ```
 343  
 344  For a full list of available parameters, see the OpenSearch documentation:
 345  https://docs.opensearch.org/latest/query-dsl/specialized/k-nn/index/#request-body-fields
 346  
 347  **Raises:**
 348  
 349  - <code>ValueError</code> – If `document_store` is not an instance of OpenSearchDocumentStore.
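
The difference between `efficient_filtering` on and off, and the role of `search_kwargs`, can be pictured through the shape of the k-NN request body. The sketch below is an assumption-laden illustration, not the integration's actual query builder: `build_knn_body` is a hypothetical helper, the vector field name `embedding` is taken from the `custom_query` example, and merging `search_kwargs` directly into the k-NN clause is assumed.

```python
from typing import Any, Optional


def build_knn_body(
    query_embedding: list[float],
    top_k: int,
    os_filter: Optional[dict[str, Any]] = None,
    efficient_filtering: bool = False,
    search_kwargs: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Sketch of an OpenSearch k-NN request body; not the integration's actual code."""
    # Assumption: vectors live in a field named "embedding".
    knn_params: dict[str, Any] = {"vector": query_embedding, "k": top_k}
    # Assumption: extra search parameters are merged into the k-NN clause and may override `k`.
    knn_params.update(search_kwargs or {})
    bool_query: dict[str, Any] = {"must": [{"knn": {"embedding": knn_params}}]}
    if os_filter is not None:
        if efficient_filtering:
            # Filter is evaluated during the approximate k-NN search.
            knn_params["filter"] = os_filter
        else:
            # Filter sits next to the k-NN clause in the bool query.
            bool_query["filter"] = os_filter
    return {"query": {"bool": bool_query}, "size": top_k}


body = build_knn_body(
    [0.1, 0.2, 0.3],
    top_k=10,
    os_filter={"term": {"years": "2019"}},
    efficient_filtering=True,
    search_kwargs={"k": 20, "method_parameters": {"ef_search": 512}},
)
print(body["query"]["bool"]["must"][0]["knn"]["embedding"])
```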
 350  
 351  #### to_dict
 352  
 353  ```python
 354  to_dict() -> dict[str, Any]
 355  ```
 356  
 357  Serializes the component to a dictionary.
 358  
 359  **Returns:**
 360  
 361  - <code>dict\[str, Any\]</code> – Dictionary with serialized data.
 362  
 363  #### from_dict
 364  
 365  ```python
 366  from_dict(data: dict[str, Any]) -> OpenSearchEmbeddingRetriever
 367  ```
 368  
 369  Deserializes the component from a dictionary.
 370  
 371  **Parameters:**
 372  
 373  - **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from.
 374  
 375  **Returns:**
 376  
 377  - <code>OpenSearchEmbeddingRetriever</code> – Deserialized component.
 378  
 379  #### run
 380  
 381  ```python
 382  run(
 383      query_embedding: list[float],
 384      filters: dict[str, Any] | None = None,
 385      top_k: int | None = None,
 386      custom_query: dict[str, Any] | None = None,
 387      efficient_filtering: bool | None = None,
 388      document_store: OpenSearchDocumentStore | None = None,
 389      search_kwargs: dict[str, Any] | None = None,
 390  ) -> dict[str, list[Document]]
 391  ```
 392  
 393  Retrieve documents using a vector similarity metric.
 394  
 395  **Parameters:**
 396  
 397  - **query_embedding** (<code>list\[float\]</code>) – Embedding of the query.
 398  
 399  - **filters** (<code>dict\[str, Any\] | None</code>) – Filters applied when fetching documents from the Document Store.
 400    Filters are applied during the approximate kNN search to ensure the Retriever returns `top_k` matching
 401    documents.
 402    The way runtime filters are applied depends on the `filter_policy` selected when initializing the Retriever.
 403  
 404  - **top_k** (<code>int | None</code>) – Maximum number of documents to return.
 405  
 406  - **custom_query** (<code>dict\[str, Any\] | None</code>) – A custom OpenSearch query containing a mandatory `$query_embedding` and an
 407    optional `$filters` placeholder.
 408  
 409    **An example custom_query:**
 410  
  ```python
  {
      "query": {
          "bool": {
              "must": [
                  {
                      "knn": {
                          "embedding": {
                              "vector": "$query_embedding",   # mandatory query placeholder
                              "k": 10000,
                          }
                      }
                  }
              ],
              "filter": "$filters"                            # optional filter placeholder
          }
      }
  }
  ```
 430  
 431  For this `custom_query`, an example `run()` could be:
 432  
 433  ```python
 434  retriever.run(
 435      query_embedding=embedding,
 436      filters={
 437          "operator": "AND",
 438          "conditions": [
 439              {"field": "meta.years", "operator": "==", "value": "2019"},
 440              {"field": "meta.quarters", "operator": "in", "value": ["Q1", "Q2"]},
 441          ],
 442      },
 443  )
 444  ```
 445  
 446  - **efficient_filtering** (<code>bool | None</code>) – If `True`, the filter will be applied during the approximate kNN search.
 447    This is only supported for knn engines "faiss" and "lucene" and does not work with the default "nmslib".
 448  - **document_store** (<code>OpenSearchDocumentStore | None</code>) – Optional instance of OpenSearchDocumentStore to use with the Retriever.
- **search_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for fine-tuning the embedding search. If not provided,
  defaults to the value set at initialization (if any).
  For example, to specify `k` and `ef_search`:
 452  
 453  ```python
 454  {
 455      "k": 20, # See https://docs.opensearch.org/latest/vector-search/vector-search-techniques/approximate-knn/#the-number-of-returned-results
 456      "method_parameters": {
 457          "ef_search": 512, # See https://docs.opensearch.org/latest/query-dsl/specialized/k-nn/index/#ef_search
 458      }
 459  }
 460  ```
 461  
 462  For a full list of available parameters, see the OpenSearch documentation:
 463  https://docs.opensearch.org/latest/query-dsl/specialized/k-nn/index/#request-body-fields
 464  
 465  **Returns:**
 466  
- <code>dict\[str, list\[Document\]\]</code> – A dictionary with the following structure:
  - `documents`: List of Documents most similar to `query_embedding`.
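
As a mental model for "Documents most similar to `query_embedding`", the sketch below ranks a handful of vectors by exact cosine similarity and keeps the best `top_k`. It is only an illustration: OpenSearch performs an approximate search, the metric actually used depends on how the index is configured, and `cosine_similarity` and `top_k_similar` are hypothetical helpers.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(
    query_embedding: list[float],
    embeddings: dict[str, list[float]],
    top_k: int,
) -> list[tuple[str, float]]:
    """Return the `top_k` (id, score) pairs with the highest similarity."""
    scored = [(doc_id, cosine_similarity(query_embedding, vec)) for doc_id, vec in embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


embeddings = {"doc-a": [1.0, 0.0], "doc-b": [0.0, 1.0], "doc-c": [1.0, 1.0]}
print(top_k_similar([1.0, 0.1], embeddings, top_k=2))
```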
 469  
 470  #### run_async
 471  
 472  ```python
 473  run_async(
 474      query_embedding: list[float],
 475      filters: dict[str, Any] | None = None,
 476      top_k: int | None = None,
 477      custom_query: dict[str, Any] | None = None,
 478      efficient_filtering: bool | None = None,
 479      document_store: OpenSearchDocumentStore | None = None,
 480      search_kwargs: dict[str, Any] | None = None,
 481  ) -> dict[str, list[Document]]
 482  ```
 483  
 484  Asynchronously retrieve documents using a vector similarity metric.
 485  
 486  **Parameters:**
 487  
 488  - **query_embedding** (<code>list\[float\]</code>) – Embedding of the query.
 489  
 490  - **filters** (<code>dict\[str, Any\] | None</code>) – Filters applied when fetching documents from the Document Store.
 491    Filters are applied during the approximate kNN search to ensure the Retriever
 492    returns `top_k` matching documents.
 493    The way runtime filters are applied depends on the `filter_policy` selected when initializing the Retriever.
 494  
 495  - **top_k** (<code>int | None</code>) – Maximum number of documents to return.
 496  
 497  - **custom_query** (<code>dict\[str, Any\] | None</code>) – A custom OpenSearch query containing a mandatory `$query_embedding` and an
 498    optional `$filters` placeholder.
 499  
 500    **An example custom_query:**
 501  
  ```python
  {
      "query": {
          "bool": {
              "must": [
                  {
                      "knn": {
                          "embedding": {
                              "vector": "$query_embedding",   # mandatory query placeholder
                              "k": 10000,
                          }
                      }
                  }
              ],
              "filter": "$filters"                            # optional filter placeholder
          }
      }
  }
  ```
 521  
For this `custom_query`, an example `run_async()` call could be:

```python
# Call from inside an async function (coroutine).
await retriever.run_async(
    query_embedding=embedding,
    filters={
        "operator": "AND",
        "conditions": [
            {"field": "meta.years", "operator": "==", "value": "2019"},
            {"field": "meta.quarters", "operator": "in", "value": ["Q1", "Q2"]},
        ],
    },
)
```
 536  
 537  - **efficient_filtering** (<code>bool | None</code>) – If `True`, the filter will be applied during the approximate kNN search.
 538    This is only supported for knn engines "faiss" and "lucene" and does not work with the default "nmslib".
 539  - **document_store** (<code>OpenSearchDocumentStore | None</code>) – Optional instance of OpenSearchDocumentStore to use with the Retriever.
- **search_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for fine-tuning the embedding search. If not provided,
  defaults to the value set at initialization (if any).
  For example, to specify `k` and `ef_search`:
 543  
 544  ```python
 545  {
 546      "k": 20, # See https://docs.opensearch.org/latest/vector-search/vector-search-techniques/approximate-knn/#the-number-of-returned-results
 547      "method_parameters": {
 548          "ef_search": 512, # See https://docs.opensearch.org/latest/query-dsl/specialized/k-nn/index/#ef_search
 549      }
 550  }
 551  ```
 552  
 553  For a full list of available parameters, see the OpenSearch documentation:
 554  https://docs.opensearch.org/latest/query-dsl/specialized/k-nn/index/#request-body-fields
 555  
 556  **Returns:**
 557  
- <code>dict\[str, list\[Document\]\]</code> – A dictionary with the following structure:
  - `documents`: List of Documents most similar to `query_embedding`.
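
Because `run_async` is a coroutine, several retrievals can be awaited concurrently. The sketch below shows that pattern with `asyncio.gather`. To keep it runnable without an OpenSearch cluster, it uses `StubRetriever`, a hypothetical stand-in that only mimics the coroutine shape of `run_async` and returns placeholder strings instead of Documents.

```python
import asyncio


class StubRetriever:
    """Hypothetical stand-in exposing the same coroutine shape as `run_async`."""

    async def run_async(self, query_embedding: list[float], top_k: int = 10) -> dict[str, list[str]]:
        await asyncio.sleep(0)  # Simulates waiting on a network call.
        return {"documents": [f"doc for {query_embedding}"][:top_k]}


async def retrieve_many(retriever: StubRetriever, embeddings: list[list[float]]) -> list[dict[str, list[str]]]:
    # Issue all retrieval calls concurrently and wait for every result.
    return await asyncio.gather(*(retriever.run_async(query_embedding=e) for e in embeddings))


results = asyncio.run(retrieve_many(StubRetriever(), [[0.1, 0.2], [0.3, 0.4]]))
print(len(results))  # 2
```

With a real retriever, the same `retrieve_many` shape would apply, with each result holding Documents under the `documents` key.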
 560  
 561  ## haystack_integrations.components.retrievers.opensearch.metadata_retriever
 562  
 563  ### OpenSearchMetadataRetriever
 564  
 565  Retrieves and ranks metadata from documents stored in an OpenSearchDocumentStore.
 566  
 567  It searches specified metadata fields for matches to a given query, ranks the results based on relevance using
 568  Jaccard similarity, and returns the top-k results containing only the specified metadata fields. Additionally, it
 569  adds a boost to the score of exact matches.
 570  
 571  The search is designed for metadata fields whose values are **text** (strings). It uses prefix, wildcard and fuzzy
 572  matching to find candidate documents; these query types operate only on text/keyword fields in OpenSearch.
 573  
 574  Metadata fields with **non-string types** (integers, floats, booleans, lists of non-strings) are indexed by
 575  OpenSearch as numeric, boolean, or array types. Those field types do not support prefix, wildcard, or full-text
 576  match queries, so documents are typically not found when you search only by such fields.
 577  
 578  **Mixed types** in the same metadata field (e.g. a list containing both strings and numbers) are not supported.
 579  
 580  Must be connected to the OpenSearchDocumentStore to run.
 581  
Example:

```python
from haystack import Document
from haystack_integrations.document_stores.opensearch import OpenSearchDocumentStore
from haystack_integrations.components.retrievers.opensearch import OpenSearchMetadataRetriever

# Connect to OpenSearch (the host below is an assumption; adjust it for your setup)
document_store = OpenSearchDocumentStore(hosts="http://localhost:9200")

# Create documents with metadata
docs = [
    Document(
        content="Python programming guide",
        meta={"category": "Python", "status": "active", "priority": 1, "author": "John Doe"}
    ),
    Document(
        content="Java tutorial",
        meta={"category": "Java", "status": "active", "priority": 2, "author": "Jane Smith"}
    ),
    Document(
        content="Python advanced topics",
        meta={"category": "Python", "status": "inactive", "priority": 3, "author": "John Doe"}
    ),
]
document_store.write_documents(docs, refresh=True)

# Create retriever specifying which metadata fields to search and return
retriever = OpenSearchMetadataRetriever(
    document_store=document_store,
    metadata_fields=["category", "status", "priority"],
    top_k=10,
)

# Search for metadata
result = retriever.run(query="Python")

# Result structure:
# {
#     "metadata": [
#         {"category": "Python", "status": "active", "priority": 1},
#         {"category": "Python", "status": "inactive", "priority": 3},
#     ]
# }
#
# Note: Only the specified metadata_fields are returned in the results.
# Other metadata fields (like "author") and document content are excluded.
```
 628  
 629  #### __init__
 630  
 631  ```python
 632  __init__(
 633      *,
 634      document_store: OpenSearchDocumentStore,
 635      metadata_fields: list[str],
 636      top_k: int = 20,
 637      exact_match_weight: float = 0.6,
 638      mode: Literal["strict", "fuzzy"] = "fuzzy",
 639      fuzziness: int | Literal["AUTO"] = 2,
 640      prefix_length: int = 0,
 641      max_expansions: int = 200,
 642      tie_breaker: float = 0.7,
 643      jaccard_n: int = 3,
 644      raise_on_failure: bool = True
 645  ) -> None
 646  ```
 647  
 648  Create the OpenSearchMetadataRetriever component.
 649  
 650  **Parameters:**
 651  
 652  - **document_store** (<code>OpenSearchDocumentStore</code>) – An instance of OpenSearchDocumentStore to use with the Retriever.
 653  - **metadata_fields** (<code>list\[str\]</code>) – List of metadata field names to search within each document's metadata.
 654  - **top_k** (<code>int</code>) – Maximum number of top results to return based on relevance. Default is 20.
 655  - **exact_match_weight** (<code>float</code>) – Weight to boost the score of exact matches in metadata fields.
 656    Default is 0.6. It's used on both "strict" and "fuzzy" modes and applied after the search executes.
 657  - **mode** (<code>Literal['strict', 'fuzzy']</code>) – Search mode. "strict" uses prefix and wildcard matching,
 658    "fuzzy" uses fuzzy matching with dis_max queries. Default is "fuzzy".
 659    In both modes, results are scored using Jaccard similarity (n-gram based)
 660    computed server-side via a Painless script; n is controlled by jaccard_n.
 661  - **fuzziness** (<code>int | Literal['AUTO']</code>) – Maximum allowed Damerau-Levenshtein distance (edit distance) for fuzzy matching.
 662    Accepts an integer (e.g., 0, 1, 2) or "AUTO" which chooses based on term length.
 663    Default is 2. Only applies when mode is "fuzzy".
 664  - **prefix_length** (<code>int</code>) – Number of leading characters that must match exactly before fuzzy matching applies.
 665    Default is 0 (no prefix requirement). Only applies when mode is "fuzzy".
 666  - **max_expansions** (<code>int</code>) – Maximum number of term variations the fuzzy query can generate.
 667    Default is 200. Only applies when mode is "fuzzy".
 668  - **tie_breaker** (<code>float</code>) – Weight (0..1) for other matching clauses in the dis_max query.
 669    Boosts documents that match multiple clauses. Default is 0.7. Only applies when mode is "fuzzy".
 670  - **jaccard_n** (<code>int</code>) – N-gram size for Jaccard similarity scoring. Default 3; larger n favors longer token matches.
 671  - **raise_on_failure** (<code>bool</code>) – If `True`, raises an exception if the API call fails.
 672    If `False`, logs a warning and returns an empty list.
 673  
 674  **Raises:**
 675  
 676  - <code>ValueError</code> – If `document_store` is not an instance of OpenSearchDocumentStore.
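
The roles of `jaccard_n` and `exact_match_weight` can be illustrated with a small local scoring sketch. It is not the retriever's server-side Painless script: the use of lowercase character n-grams, the handling of strings shorter than n, and adding the weight to the score on an exact match are all assumptions, and the helper names are hypothetical.

```python
def char_ngrams(text: str, n: int = 3) -> set[str]:
    """Return the set of lowercase character n-grams of `text`."""
    text = text.lower()
    if len(text) < n:
        # Short strings are kept whole so they can still be compared.
        return {text} if text else set()
    return {text[i : i + n] for i in range(len(text) - n + 1)}


def jaccard_similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard similarity: shared n-grams divided by all distinct n-grams."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def boosted_score(query: str, value: str, n: int = 3, exact_match_weight: float = 0.6) -> float:
    """Add `exact_match_weight` to the Jaccard score when the value equals the query."""
    score = jaccard_similarity(query, value, n)
    if query.lower() == value.lower():
        score += exact_match_weight
    return score


print(jaccard_similarity("python", "pythons"))  # 0.8
print(boosted_score("Python", "python"))        # 1.6
```

Here "python" and "pythons" share four of five distinct trigrams, giving 0.8, while an exact match receives the full similarity of 1.0 plus the boost.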
 677  
 678  #### to_dict
 679  
 680  ```python
 681  to_dict() -> dict[str, Any]
 682  ```
 683  
 684  Serializes the component to a dictionary.
 685  
 686  **Returns:**
 687  
 688  - <code>dict\[str, Any\]</code> – Dictionary with serialized data.
 689  
 690  #### from_dict
 691  
 692  ```python
 693  from_dict(data: dict[str, Any]) -> OpenSearchMetadataRetriever
 694  ```
 695  
 696  Deserializes the component from a dictionary.
 697  
 698  **Parameters:**
 699  
 700  - **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from.
 701  
 702  **Returns:**
 703  
 704  - <code>OpenSearchMetadataRetriever</code> – Deserialized component.
 705  
 706  #### run
 707  
 708  ```python
 709  run(
 710      query: str,
 711      *,
 712      document_store: OpenSearchDocumentStore | None = None,
 713      metadata_fields: list[str] | None = None,
 714      top_k: int | None = None,
 715      exact_match_weight: float | None = None,
 716      mode: Literal["strict", "fuzzy"] | None = None,
 717      fuzziness: int | Literal["AUTO"] | None = None,
 718      prefix_length: int | None = None,
 719      max_expansions: int | None = None,
 720      tie_breaker: float | None = None,
 721      jaccard_n: int | None = None,
 722      filters: dict[str, Any] | None = None
 723  ) -> dict[str, list[dict[str, Any]]]
 724  ```
 725  
 726  Execute a search query against the metadata fields of documents stored in the Document Store.
 727  
 728  **Parameters:**
 729  
 730  - **query** (<code>str</code>) – The search query string, which can contain multiple comma-separated parts.
 731    Each part will be searched across all specified fields.
 732  - **document_store** (<code>OpenSearchDocumentStore | None</code>) – The Document Store to run the query against.
 733    If not provided, the one provided in `__init__` is used.
 734  - **metadata_fields** (<code>list\[str\] | None</code>) – List of metadata field names to search within.
 735    If not provided, the fields provided in `__init__` are used.
 736  - **top_k** (<code>int | None</code>) – Maximum number of top results to return based on relevance.
 737    The search retrieves up to 1000 hits from OpenSearch, then applies boosting and filters
 738    the results to the top_k most relevant matches.
 739    If not provided, the top_k provided in `__init__` is used.
 740  - **exact_match_weight** (<code>float | None</code>) – Weight to boost the score of exact matches in metadata fields.
 741    If not provided, the exact_match_weight provided in `__init__` is used.
 742  - **mode** (<code>Literal['strict', 'fuzzy'] | None</code>) – Search mode. "strict" uses prefix and wildcard matching,
 743    "fuzzy" uses fuzzy matching with dis_max queries.
 744    In both modes, results are scored using Jaccard similarity (n-gram based) via a Painless script.
 745    If not provided, the mode provided in `__init__` is used.
 746  - **fuzziness** (<code>int | Literal['AUTO'] | None</code>) – Maximum allowed Damerau-Levenshtein distance (edit distance) for fuzzy matching.
 747    Accepts an integer (e.g., 0, 1, 2) or "AUTO" which chooses based on term length.
 748    Only applies when mode is "fuzzy". If not provided, the fuzziness provided in `__init__` is used.
 749  - **prefix_length** (<code>int | None</code>) – Number of leading characters that must match exactly before fuzzy matching applies.
 750    Only applies when mode is "fuzzy". If not provided, the prefix_length provided in `__init__` is used.
 751  - **max_expansions** (<code>int | None</code>) – Maximum number of term variations the fuzzy query can generate.
 752    Only applies when mode is "fuzzy". If not provided, the max_expansions provided in `__init__` is used.
 753  - **tie_breaker** (<code>float | None</code>) – Weight (0..1) for other matching clauses; boosts docs matching multiple
 754    clauses. Only applies when mode is "fuzzy". If not provided, the tie_breaker provided in `__init__` is used.
 755  - **jaccard_n** (<code>int | None</code>) – N-gram size for Jaccard similarity scoring. If not provided, the jaccard_n from `__init__`
 756    is used.
 757  - **filters** (<code>dict\[str, Any\] | None</code>) – Additional filters to apply to the search query.
 758  
 759  **Returns:**
 760  
 761  - <code>dict\[str, list\[dict\[str, Any\]\]\]</code> – A dictionary containing the top-k retrieved metadata results.
 762  
Example:

```python
from haystack import Document

# `store` is an existing OpenSearchDocumentStore instance.
# First, add a document with matching metadata to the store
store.write_documents([
    Document(
        content="Python programming guide",
        meta={"category": "Python", "status": "active", "priority": 1}
    )
])

retriever = OpenSearchMetadataRetriever(
    document_store=store,
    metadata_fields=["category", "status", "priority"]
)
result = retriever.run(query="Python, active")
# Returns: {"metadata": [{"category": "Python", "status": "active", "priority": 1}]}
```
 784  
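The n-gram based Jaccard scoring mentioned for `mode` and `jaccard_n` can be pictured with a small pure-Python sketch. This is not the Painless script the retriever runs. The helper names `char_ngrams` and `jaccard_similarity` are hypothetical, and the sketch assumes lowercased input and a fallback to the whole string when it is shorter than `n`. It only shows how character n-gram overlap yields a score between 0 and 1.

```python
def char_ngrams(text: str, n: int) -> set:
    """Return the set of character n-grams of `text` (lowercased)."""
    normalized = text.lower()
    if len(normalized) < n:
        # Assumption: a string shorter than n contributes itself as a single gram.
        return {normalized} if normalized else set()
    return {normalized[i : i + n] for i in range(len(normalized) - n + 1)}


def jaccard_similarity(query: str, value: str, n: int = 3) -> float:
    """Jaccard similarity |A & B| / |A | B| over character n-grams."""
    a, b = char_ngrams(query, n), char_ngrams(value, n)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


print(jaccard_similarity("Python", "python"))   # 1.0: identical after lowercasing
print(jaccard_similarity("Python", "Pythons"))  # 0.8: 4 shared trigrams out of 5
print(jaccard_similarity("Python", "active"))   # 0.0: no shared trigrams
```

A larger `n` makes the score stricter for short metadata values, because fewer n-grams are available to overlap.
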
 785  #### run_async
 786  
 787  ```python
 788  run_async(
 789      query: str,
 790      *,
 791      document_store: OpenSearchDocumentStore | None = None,
 792      metadata_fields: list[str] | None = None,
 793      top_k: int | None = None,
 794      exact_match_weight: float | None = None,
 795      mode: Literal["strict", "fuzzy"] | None = None,
 796      fuzziness: int | Literal["AUTO"] | None = None,
 797      prefix_length: int | None = None,
 798      max_expansions: int | None = None,
 799      tie_breaker: float | None = None,
 800      jaccard_n: int | None = None,
 801      filters: dict[str, Any] | None = None
 802  ) -> dict[str, list[dict[str, Any]]]
 803  ```
 804  
 805  Asynchronously execute a search query against the metadata fields of documents stored in the Document Store.
 806  
 807  **Parameters:**
 808  
 809  - **query** (<code>str</code>) – The search query string, which can contain multiple comma-separated parts.
 810    Each part will be searched across all specified fields.
 811  - **document_store** (<code>OpenSearchDocumentStore | None</code>) – The Document Store to run the query against.
 812    If not provided, the one provided in `__init__` is used.
 813  - **metadata_fields** (<code>list\[str\] | None</code>) – List of metadata field names to search within.
 814    If not provided, the fields provided in `__init__` are used.
 815  - **top_k** (<code>int | None</code>) – Maximum number of top results to return based on relevance.
 816    The search retrieves up to 1000 hits from OpenSearch, then applies boosting and filters
 817    the results to the top_k most relevant matches.
 818    If not provided, the top_k provided in `__init__` is used.
 819  - **exact_match_weight** (<code>float | None</code>) – Weight to boost the score of exact matches in metadata fields.
 820    If not provided, the exact_match_weight provided in `__init__` is used.
 821  - **mode** (<code>Literal['strict', 'fuzzy'] | None</code>) – Search mode. "strict" uses prefix and wildcard matching,
 822    "fuzzy" uses fuzzy matching with dis_max queries.
 823    In both modes, results are scored using Jaccard similarity (n-gram based) via a Painless script.
 824    If not provided, the mode provided in `__init__` is used.
 825  - **fuzziness** (<code>int | Literal['AUTO'] | None</code>) – Maximum allowed Damerau-Levenshtein distance (edit distance) for fuzzy matching.
 826    Accepts an integer (e.g., 0, 1, 2) or "AUTO" which chooses based on term length.
 827    Only applies when mode is "fuzzy". If not provided, the fuzziness provided in `__init__` is used.
 828  - **prefix_length** (<code>int | None</code>) – Number of leading characters that must match exactly before fuzzy matching applies.
 829    Only applies when mode is "fuzzy". If not provided, the prefix_length provided in `__init__` is used.
 830  - **max_expansions** (<code>int | None</code>) – Maximum number of term variations the fuzzy query can generate.
 831    Only applies when mode is "fuzzy". If not provided, the max_expansions provided in `__init__` is used.
 832  - **tie_breaker** (<code>float | None</code>) – Weight (0..1) for other matching clauses; boosts docs matching multiple clauses.
 833    Only applies when mode is "fuzzy". If not provided, the tie_breaker provided in `__init__` is used.
 834  - **jaccard_n** (<code>int | None</code>) – N-gram size for Jaccard similarity scoring. If not provided, the jaccard_n from `__init__`
 835    is used.
 836  - **filters** (<code>dict\[str, Any\] | None</code>) – Additional filters to apply to the search query.
 837  
 838  **Returns:**
 839  
 840  - <code>dict\[str, list\[dict\[str, Any\]\]\]</code> – A dictionary containing the top-k retrieved metadata results.
 841  
Example:

```python
from haystack import Document

# `store` is an existing OpenSearchDocumentStore instance.
# First, add a document with matching metadata to the store
await store.write_documents_async([
    Document(
        content="Python programming guide",
        meta={"category": "Python", "status": "active", "priority": 1}
    )
])

retriever = OpenSearchMetadataRetriever(
    document_store=store,
    metadata_fields=["category", "status", "priority"]
)
result = await retriever.run_async(query="Python, active")
# Returns: {"metadata": [{"category": "Python", "status": "active", "priority": 1}]}
```
 863  
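To make the `fuzziness` and `prefix_length` parameters more concrete, here is a minimal sketch of an edit-distance check. It implements the optimal string alignment variant of the Damerau-Levenshtein distance in plain Python. OpenSearch performs fuzzy matching internally, so the functions `osa_distance` and `fuzzy_match` are hypothetical illustrations and not part of the retriever.

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, substitutions,
    and transpositions of two adjacent characters."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def fuzzy_match(term: str, candidate: str, max_edits: int, prefix_length: int = 0) -> bool:
    """True if the first `prefix_length` characters match exactly and the
    remaining difference is within `max_edits`."""
    if term[:prefix_length] != candidate[:prefix_length]:
        return False
    return osa_distance(term, candidate) <= max_edits


print(osa_distance("active", "acitve"))                   # 1: one transposition
print(fuzzy_match("active", "acitve", 1))                  # True
print(fuzzy_match("active", "acitve", 1, prefix_length=3)) # False: "act" != "aci"
```
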
 864  ## haystack_integrations.components.retrievers.opensearch.open_search_hybrid_retriever
 865  
 866  ### OpenSearchHybridRetriever
 867  
 868  A hybrid retriever that combines embedding-based and keyword-based retrieval from OpenSearch.
 869  
 870  Example usage:
 871  
 872  Make sure you have "sentence-transformers>=3.0.0":
 873  
 874  ```
 875  pip install haystack-ai datasets "sentence-transformers>=3.0.0"
 876  ```
 877  
You also need a running OpenSearch instance. You can start one with Docker:
 879  
 880  ```
docker run -d --name opensearch-nosec -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" \
  -e "DISABLE_SECURITY_PLUGIN=true" opensearchproject/opensearch:2.12.0
 883  ```
 884  
 885  ```python
 886  from haystack import Document
 887  from haystack.components.embedders import SentenceTransformersTextEmbedder, SentenceTransformersDocumentEmbedder
 888  from haystack_integrations.components.retrievers.opensearch import OpenSearchHybridRetriever
 889  from haystack_integrations.document_stores.opensearch import OpenSearchDocumentStore
 890  
 891  # Initialize the document store
 892  doc_store = OpenSearchDocumentStore(
    hosts=["http://localhost:9200"],
 894      index="document_store",
 895      embedding_dim=384,
 896  )
 897  
 898  # Create some sample documents
 899  docs = [
 900      Document(content="Machine learning is a subset of artificial intelligence."),
 901      Document(content="Deep learning is a subset of machine learning."),
 902      Document(content="Natural language processing is a field of AI."),
 903      Document(content="Reinforcement learning is a type of machine learning."),
 904      Document(content="Supervised learning is a type of machine learning."),
 905  ]
 906  
 907  # Embed the documents and add them to the document store
 908  doc_embedder = SentenceTransformersDocumentEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
 909  doc_embedder.warm_up()
 910  docs = doc_embedder.run(docs)
 911  doc_store.write_documents(docs['documents'])
 912  
# Initialize a Haystack text embedder, in this case the SentenceTransformersTextEmbedder
 914  embedder = SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
 915  
 916  # Initialize the hybrid retriever
 917  retriever = OpenSearchHybridRetriever(
 918      document_store=doc_store,
 919      embedder=embedder,
 920      top_k_bm25=3,
 921      top_k_embedding=3,
 922      join_mode="reciprocal_rank_fusion"
 923  )
 924  
 925  # Run the retriever
 926  results = retriever.run(query="What is reinforcement learning?", filters_bm25=None, filters_embedding=None)
 927  
print(results["documents"])
# Example output:
# [Document(id=..., content: 'Reinforcement learning is a type of machine learning.', score: 1.0),
#  Document(id=..., content: 'Supervised learning is a type of machine learning.', score: 0.9760624679979518),
#  Document(id=..., content: 'Deep learning is a subset of machine learning.', score: 0.4919354838709677),
#  Document(id=..., content: 'Machine learning is a subset of artificial intelligence.', score: 0.4841269841269841)]
```
 933  ```
 934  
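The `join_mode="reciprocal_rank_fusion"` setting used above merges the BM25 and embedding rankings by rank position. The sketch below shows the general technique in plain Python. The constant `k=60`, the function name `reciprocal_rank_fusion`, and the sample document IDs are assumptions for illustration. The joiner used by the retriever may weight or rescale scores differently.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list, k: int = 60) -> list:
    """Fuse several ranked lists of document IDs.

    Each document receives 1 / (k + rank) from every list it appears in,
    where rank starts at 1. Higher fused scores rank first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical IDs standing in for the documents returned by each retriever.
bm25_ranking = ["reinforcement", "supervised", "deep"]
embedding_ranking = ["reinforcement", "supervised", "ml"]

fused = reciprocal_rank_fusion([bm25_ranking, embedding_ranking])
for doc_id, score in fused:
    print(doc_id, round(score, 4))
```

Documents found by both retrievers accumulate two contributions, which is why three results per retriever can yield four fused documents with the shared ones on top.
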
 935  #### __init__
 936  
 937  ```python
 938  __init__(
 939      document_store: OpenSearchDocumentStore,
 940      *,
 941      embedder: TextEmbedder,
 942      filters_bm25: dict[str, Any] | None = None,
 943      fuzziness: int | str = "AUTO",
 944      top_k_bm25: int = 10,
 945      scale_score: bool = False,
 946      all_terms_must_match: bool = False,
 947      filter_policy_bm25: str | FilterPolicy = FilterPolicy.REPLACE,
 948      custom_query_bm25: dict[str, Any] | None = None,
 949      filters_embedding: dict[str, Any] | None = None,
 950      top_k_embedding: int = 10,
 951      filter_policy_embedding: str | FilterPolicy = FilterPolicy.REPLACE,
 952      custom_query_embedding: dict[str, Any] | None = None,
 953      search_kwargs_embedding: dict[str, Any] | None = None,
 954      join_mode: str | JoinMode = JoinMode.RECIPROCAL_RANK_FUSION,
 955      weights: list[float] | None = None,
 956      top_k: int | None = None,
 957      sort_by_score: bool = True,
 958      **kwargs: Any
 959  ) -> None
 960  ```
 961  
 962  Initialize the OpenSearchHybridRetriever using both embedding-based and keyword-based retrieval methods.
 963  
 964  This is a super component to retrieve documents from OpenSearch using both retrieval methods.
 965  
The constructor does not expose every init parameter of the underlying components, since that would add
more than 20 parameters. Instead, it defines the most important ones and accepts the rest as kwargs,
which keeps the constructor clean and easy to read.

To pass extra parameters to a component, provide them as kwargs: a dictionary with the component name as the
key and the parameters as the value. The component name should be one of:
 972  
 973  ```
 974  - "bm25_retriever" -> OpenSearchBM25Retriever
 975  - "embedding_retriever" -> OpenSearchEmbeddingRetriever
 976  ```
 977  
 978  **Parameters:**
 979  
 980  - **document_store** (<code>OpenSearchDocumentStore</code>) – The OpenSearchDocumentStore to use for retrieval.
 981  - **embedder** (<code>TextEmbedder</code>) – A TextEmbedder to use for embedding the query.
 982    See `haystack.components.embedders.types.protocol.TextEmbedder` for more information.
 983  - **filters_bm25** (<code>dict\[str, Any\] | None</code>) – Filters for the BM25 retriever.
 984  - **fuzziness** (<code>int | str</code>) – The fuzziness for the BM25 retriever.
 985  - **top_k_bm25** (<code>int</code>) – The number of results to return from the BM25 retriever.
 986  - **scale_score** (<code>bool</code>) – Whether to scale the score for the BM25 retriever.
 987  - **all_terms_must_match** (<code>bool</code>) – Whether all terms must match for the BM25 retriever.
 988  - **filter_policy_bm25** (<code>str | FilterPolicy</code>) – The filter policy for the BM25 retriever.
 989  - **custom_query_bm25** (<code>dict\[str, Any\] | None</code>) – A custom query for the BM25 retriever.
 990  - **filters_embedding** (<code>dict\[str, Any\] | None</code>) – Filters for the embedding retriever.
 991  - **top_k_embedding** (<code>int</code>) – The number of results to return from the embedding retriever.
 992  - **filter_policy_embedding** (<code>str | FilterPolicy</code>) – The filter policy for the embedding retriever.
 993  - **custom_query_embedding** (<code>dict\[str, Any\] | None</code>) – A custom query for the embedding retriever.
 994  - **search_kwargs_embedding** (<code>dict\[str, Any\] | None</code>) – Additional search kwargs for the embedding retriever.
 995  - **join_mode** (<code>str | JoinMode</code>) – The mode to use for joining the results from the BM25 and embedding retrievers.
 996  - **weights** (<code>list\[float\] | None</code>) – The weights for the joiner.
 997  - **top_k** (<code>int | None</code>) – The number of results to return from the joiner.
 998  - **sort_by_score** (<code>bool</code>) – Whether to sort the results by score.
- \*\***kwargs** (<code>Any</code>) – Additional keyword arguments. Use the following keys to pass extra parameters to the retrievers:
  - "bm25_retriever" -> OpenSearchBM25Retriever
  - "embedding_retriever" -> OpenSearchEmbeddingRetriever
1002  
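The shape of these extra keyword arguments can be sketched as follows. The helper `check_component_kwargs` is hypothetical and only validates the structure described above. The commented-out constructor call shows how the dictionary would be forwarded, assuming `doc_store` and `embedder` already exist.

```python
from typing import Any

# Component names accepted as kwargs keys, as listed in the parameter description.
ALLOWED_COMPONENTS = {"bm25_retriever", "embedding_retriever"}


def check_component_kwargs(kwargs: dict) -> dict:
    """Hypothetical check: every key is a known component name and
    every value is a dictionary of init parameters."""
    unknown = set(kwargs) - ALLOWED_COMPONENTS
    if unknown:
        raise ValueError(f"Unknown component names: {sorted(unknown)}")
    for name, params in kwargs.items():
        if not isinstance(params, dict):
            raise TypeError(f"Parameters for '{name}' must be a dict")
    return kwargs


# `raise_on_failure` is an init parameter of OpenSearchBM25Retriever.
extra: dict[str, Any] = check_component_kwargs(
    {"bm25_retriever": {"raise_on_failure": False}}
)
print(extra)

# Forwarded to the hybrid retriever (requires a running OpenSearch instance):
# retriever = OpenSearchHybridRetriever(doc_store, embedder=embedder, **extra)
```
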
1003  #### warm_up
1004  
1005  ```python
1006  warm_up() -> None
1007  ```
1008  
1009  Warm up the underlying pipeline components.
1010  
1011  #### run
1012  
1013  ```python
1014  run(
1015      query: str,
1016      filters_bm25: dict[str, Any] | None = None,
1017      filters_embedding: dict[str, Any] | None = None,
1018      top_k_bm25: int | None = None,
1019      top_k_embedding: int | None = None,
1020  ) -> dict[str, list[Document]]
1021  ```
1022  
1023  Run the hybrid retrieval pipeline and return retrieved documents.
1024  
1025  #### to_dict
1026  
1027  ```python
1028  to_dict() -> dict[str, Any]
1029  ```
1030  
1031  Serialize OpenSearchHybridRetriever to a dictionary.
1032  
1033  **Returns:**
1034  
1035  - <code>dict\[str, Any\]</code> – Dictionary with serialized data.
1036  
1037  #### from_dict
1038  
1039  ```python
1040  from_dict(data: dict[str, Any]) -> OpenSearchHybridRetriever
1041  ```
1042  
1043  Deserialize an OpenSearchHybridRetriever from a dictionary.
1044  
1045  ## haystack_integrations.components.retrievers.opensearch.sql_retriever
1046  
1047  ### OpenSearchSQLRetriever
1048  
1049  Executes raw OpenSearch SQL queries against an OpenSearchDocumentStore.
1050  
1051  This component allows you to execute SQL queries directly against the OpenSearch index,
1052  which is useful for fetching metadata, aggregations, and other structured data at runtime.
1053  
1054  Returns the raw JSON response from the OpenSearch SQL API.
1055  
1056  #### __init__
1057  
1058  ```python
1059  __init__(
1060      *,
1061      document_store: OpenSearchDocumentStore,
1062      raise_on_failure: bool = True,
1063      fetch_size: int | None = None
1064  ) -> None
1065  ```
1066  
1067  Creates the OpenSearchSQLRetriever component.
1068  
1069  **Parameters:**
1070  
1071  - **document_store** (<code>OpenSearchDocumentStore</code>) – An instance of OpenSearchDocumentStore to use with the Retriever.
- **raise_on_failure** (<code>bool</code>) – Whether to raise an exception if the API call fails. If `False`, a warning is logged and `None` is returned instead.
1073  - **fetch_size** (<code>int | None</code>) – Optional number of results to fetch per page. If not provided, the default
1074    fetch size set in OpenSearch is used.
1075  
1076  **Raises:**
1077  
1078  - <code>ValueError</code> – If `document_store` is not an instance of OpenSearchDocumentStore.
1079  
1080  #### to_dict
1081  
1082  ```python
1083  to_dict() -> dict[str, Any]
1084  ```
1085  
1086  Serializes the component to a dictionary.
1087  
1088  **Returns:**
1089  
1090  - <code>dict\[str, Any\]</code> – Dictionary with serialized data.
1091  
1092  #### from_dict
1093  
1094  ```python
1095  from_dict(data: dict[str, Any]) -> OpenSearchSQLRetriever
1096  ```
1097  
1098  Deserializes the component from a dictionary.
1099  
1100  **Parameters:**
1101  
1102  - **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from.
1103  
1104  **Returns:**
1105  
1106  - <code>OpenSearchSQLRetriever</code> – Deserialized component.
1107  
1108  #### run
1109  
1110  ```python
1111  run(
1112      query: str,
1113      document_store: OpenSearchDocumentStore | None = None,
1114      fetch_size: int | None = None,
1115  ) -> dict[str, dict[str, Any]]
1116  ```
1117  
1118  Execute a raw OpenSearch SQL query against the index.
1119  
1120  **Parameters:**
1121  
1122  - **query** (<code>str</code>) – The OpenSearch SQL query to execute.
1123  - **document_store** (<code>OpenSearchDocumentStore | None</code>) – Optionally, an instance of OpenSearchDocumentStore to use with the Retriever.
1124  - **fetch_size** (<code>int | None</code>) – Optional number of results to fetch per page. If not provided, uses the value
1125    specified during initialization, or the default fetch size set in OpenSearch.
1126  
1127  **Returns:**
1128  
1129  - <code>dict\[str, dict\[str, Any\]\]</code> – A dictionary containing the raw JSON response from OpenSearch SQL API:
1130    - result: The raw JSON response from OpenSearch (dict) or None on error.
1131  
Example:

```python
retriever = OpenSearchSQLRetriever(document_store=document_store)
result = retriever.run(
    query="SELECT content, category FROM my_index WHERE category = 'A'"
)
# result["result"] contains the raw OpenSearch JSON response
# For regular queries: result["result"]["hits"]["hits"] contains documents
# For aggregate queries: result["result"]["aggregations"] contains aggregations
```
1134  
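Because `run` returns the raw response, callers usually need to branch on its shape. The following sketch assumes only the three cases described above: `None` on error, `aggregations` for aggregate queries, and `hits.hits` for regular queries. The function `extract_sql_payload` and the sample response are hypothetical.

```python
from typing import Any


def extract_sql_payload(output: dict) -> tuple:
    """Classify the retriever output and return (kind, payload)."""
    raw = output.get("result")
    if raw is None:
        # The call failed and raise_on_failure was False.
        return "error", None
    if "aggregations" in raw:
        return "aggregations", raw["aggregations"]
    return "hits", raw.get("hits", {}).get("hits", [])


# Made-up response shaped like a regular (non-aggregate) query result.
sample_output: dict[str, Any] = {
    "result": {"hits": {"hits": [{"_source": {"content": "Doc A", "category": "A"}}]}}
}

kind, payload = extract_sql_payload(sample_output)
print(kind, len(payload))
```
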
1135  #### run_async
1136  
1137  ```python
1138  run_async(
1139      query: str,
1140      document_store: OpenSearchDocumentStore | None = None,
1141      fetch_size: int | None = None,
1142  ) -> dict[str, dict[str, Any]]
1143  ```
1144  
1145  Asynchronously execute a raw OpenSearch SQL query against the index.
1146  
1147  **Parameters:**
1148  
1149  - **query** (<code>str</code>) – The OpenSearch SQL query to execute.
1150  - **document_store** (<code>OpenSearchDocumentStore | None</code>) – Optionally, an instance of OpenSearchDocumentStore to use with the Retriever.
1151  - **fetch_size** (<code>int | None</code>) – Optional number of results to fetch per page. If not provided, uses the value
1152    specified during initialization, or the default fetch size set in OpenSearch.
1153  
1154  **Returns:**
1155  
1156  - <code>dict\[str, dict\[str, Any\]\]</code> – A dictionary containing the raw JSON response from OpenSearch SQL API:
1157    - result: The raw JSON response from OpenSearch (dict) or None on error.
1158  
Example:

```python
retriever = OpenSearchSQLRetriever(document_store=document_store)
result = await retriever.run_async(
    query="SELECT content, category FROM my_index WHERE category = 'A'"
)
# result["result"] contains the raw OpenSearch JSON response
# For regular queries: result["result"]["hits"]["hits"] contains documents
# For aggregate queries: result["result"]["aggregations"] contains aggregations
```
1161  
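The `fetch_size` fallback order (run-time value, then the value from `__init__`, then the OpenSearch default) can be sketched as below. The function `build_sql_request_body` is hypothetical. The body layout with `query` and `fetch_size` keys follows the OpenSearch SQL plugin request format, and whether the retriever builds its request exactly this way is an assumption.

```python
from typing import Optional


def build_sql_request_body(
    query: str,
    run_fetch_size: Optional[int] = None,
    init_fetch_size: Optional[int] = None,
) -> dict:
    """Build a request body, preferring the run-time fetch size."""
    body = {"query": query}
    fetch_size = run_fetch_size if run_fetch_size is not None else init_fetch_size
    if fetch_size is not None:
        body["fetch_size"] = fetch_size
    # When both values are None, the key is omitted so OpenSearch applies its default.
    return body


print(build_sql_request_body("SELECT content FROM my_index", run_fetch_size=50, init_fetch_size=10))
print(build_sql_request_body("SELECT content FROM my_index"))
```
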
1162  ## haystack_integrations.document_stores.opensearch.document_store
1163  
1164  ### OpenSearchDocumentStore
1165  
1166  An instance of an OpenSearch database you can use to store all types of data.
1167  
1168  This document store is a thin wrapper around the OpenSearch client.
1169  It allows you to store and retrieve documents from an OpenSearch index.
1170  
1171  Usage example:
1172  
1173  ```python
1174  from haystack_integrations.document_stores.opensearch import (
1175      OpenSearchDocumentStore,
1176  )
1177  from haystack import Document
1178  
1179  document_store = OpenSearchDocumentStore(hosts="localhost:9200")
1180  
1181  document_store.write_documents(
1182      [
1183          Document(content="My first document", id="1"),
1184          Document(content="My second document", id="2"),
1185      ]
1186  )
1187  
1188  print(document_store.count_documents())
1189  # 2
1190  
1191  print(document_store.filter_documents())
1192  # [Document(id='1', content='My first document', ...), Document(id='2', content='My second document', ...)]
1193  ```
1194  
1195  #### __init__
1196  
1197  ```python
1198  __init__(
1199      *,
1200      hosts: Hosts | None = None,
1201      index: str = "default",
1202      max_chunk_bytes: int = DEFAULT_MAX_CHUNK_BYTES,
1203      embedding_dim: int = 768,
1204      return_embedding: bool = False,
1205      method: dict[str, Any] | None = None,
1206      mappings: dict[str, Any] | None = None,
1207      settings: dict[str, Any] | None = DEFAULT_SETTINGS,
1208      create_index: bool = True,
1209      http_auth: (
1210          tuple[Secret, Secret]
1211          | tuple[str, str]
1212          | list[str]
1213          | str
1214          | AWSAuth
1215          | None
1216      ) = (
1217          Secret.from_env_var("OPENSEARCH_USERNAME", strict=False),
1218          Secret.from_env_var("OPENSEARCH_PASSWORD", strict=False),
1219      ),
1220      use_ssl: bool | None = None,
1221      verify_certs: bool | None = None,
1222      timeout: int | None = None,
1223      nested_fields: list[str] | Literal["*"] | None = None,
1224      **kwargs: Any
1225  ) -> None
1226  ```
1227  
1228  Creates a new OpenSearchDocumentStore instance.
1229  
The `embedding_dim`, `method`, `mappings`, and `settings` arguments are only used if the index does not
exist and needs to be created. If the index already exists, its current configuration is used.

For more information on connection parameters, see the [official OpenSearch documentation](https://opensearch.org/docs/latest/clients/python-low-level/#connecting-to-opensearch).
1234  
1235  **Parameters:**
1236  
1237  - **hosts** (<code>Hosts | None</code>) – List of hosts running the OpenSearch client. Defaults to None
- **index** (<code>str</code>) – Name of the index in OpenSearch. If it doesn't exist, it will be created. Defaults to "default"
1239  - **max_chunk_bytes** (<code>int</code>) – Maximum size of the requests in bytes. Defaults to 100MB
1240  - **embedding_dim** (<code>int</code>) – Dimension of the embeddings. Defaults to 768
1241  - **return_embedding** (<code>bool</code>) – Whether to return the embedding of the retrieved Documents. This parameter also applies to the
1242    `filter_documents` and `filter_documents_async` methods.
1243  - **method** (<code>dict\[str, Any\] | None</code>) – The method definition of the underlying configuration of the approximate k-NN algorithm. Please
1244    see the [official OpenSearch docs](https://opensearch.org/docs/latest/search-plugins/knn/knn-index/#method-definitions)
1245    for more information. Defaults to None
1246  - **mappings** (<code>dict\[str, Any\] | None</code>) – The mapping of how the documents are stored and indexed. Please see the [official OpenSearch docs](https://opensearch.org/docs/latest/field-types/)
1247    for more information. If None, it uses the embedding_dim and method arguments to create default mappings.
1248    Defaults to None
1249  - **settings** (<code>dict\[str, Any\] | None</code>) – The settings of the index to be created. Please see the [official OpenSearch docs](https://opensearch.org/docs/latest/search-plugins/knn/knn-index/#index-settings)
1250    for more information. Defaults to `{"index.knn": True}`.
1251  - **create_index** (<code>bool</code>) – Whether to create the index if it doesn't exist. Defaults to True
- **http_auth** (<code>tuple\[Secret, Secret\] | tuple\[str, str\] | list\[str\] | str | AWSAuth | None</code>) – The `http_auth` parameter passed to the underlying connection class.
  If not provided, the values are read from the `OPENSEARCH_USERNAME` and `OPENSEARCH_PASSWORD` environment variables.
  For AWS authentication with `Urllib3HttpConnection`, pass an instance of `AWSAuth`.
  For basic authentication with the default connection class `Urllib3HttpConnection`, this can be:
  - a tuple of (username, password)
  - a list of [username, password]
  - a string of "username:password"
1260  - **use_ssl** (<code>bool | None</code>) – Whether to use SSL. Defaults to None
1261  - **verify_certs** (<code>bool | None</code>) – Whether to verify certificates. Defaults to None
1262  - **timeout** (<code>int | None</code>) – Timeout in seconds. Defaults to None
1263  - **nested_fields** (<code>list\[str\] | Literal['\*'] | None</code>) – List of metadata field paths (without the `meta.` prefix) that should be mapped
1264    as OpenSearch `nested` type, enabling multi-condition filtering on array-of-objects fields.
1265    Pass `"*"` to auto-detect `list[dict]` fields and map them as nested from
1266    the first `write_documents` batch.
1267    When the index already exists, nested fields are discovered from the live mapping.
1268    Defaults to None (no nested support).
1269  - \*\***kwargs** (<code>Any</code>) – Optional arguments that `OpenSearch` takes. For the full list of supported kwargs,
1270    see the [official OpenSearch reference](https://opensearch-project.github.io/opensearch-py/api-ref/clients/opensearch_client.html)
1271  
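The `nested_fields="*"` behavior, which auto-detects `list[dict]` metadata fields, can be pictured with the sketch below. The function `detect_nested_fields` is hypothetical and inspects only top-level metadata keys. The document store's own detection may handle field paths differently.

```python
from typing import Any


def detect_nested_fields(metas: list) -> list:
    """Return metadata keys whose values are non-empty lists of dicts."""
    nested = set()
    for meta in metas:
        for key, value in meta.items():
            if isinstance(value, list) and value and all(isinstance(item, dict) for item in value):
                nested.add(key)
    return sorted(nested)


# Made-up metadata from a first batch of documents.
batch_metas: list[dict[str, Any]] = [
    {"authors": [{"name": "Ada", "role": "editor"}], "tags": ["a", "b"], "year": 2024},
    {"authors": [{"name": "Lin", "role": "author"}], "tags": []},
]

fields = detect_nested_fields(batch_metas)
# Each detected field would be mapped with the OpenSearch `nested` type.
nested_mapping = {field: {"type": "nested"} for field in fields}
print(nested_mapping)
```
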
1272  #### create_index
1273  
1274  ```python
1275  create_index(
1276      index: str | None = None,
1277      mappings: dict[str, Any] | None = None,
1278      settings: dict[str, Any] | None = None,
1279  ) -> None
1280  ```
1281  
1282  Creates an index in OpenSearch.
1283  
1284  Note that this method ignores the `create_index` argument from the constructor.
1285  
1286  **Parameters:**
1287  
1288  - **index** (<code>str | None</code>) – Name of the index to create. If None, the index name from the constructor is used.
1289  - **mappings** (<code>dict\[str, Any\] | None</code>) – The mapping of how the documents are stored and indexed. Please see the [official OpenSearch docs](https://opensearch.org/docs/latest/field-types/)
1290    for more information. If None, the mappings from the constructor are used.
1291  - **settings** (<code>dict\[str, Any\] | None</code>) – The settings of the index to be created. Please see the [official OpenSearch docs](https://opensearch.org/docs/latest/search-plugins/knn/knn-index/#index-settings)
1292    for more information. If None, the settings from the constructor are used.
1293  
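As a rough illustration of how `embedding_dim`, `method`, and `settings` fit together, the sketch below builds a k-NN mapping as a plain dictionary. The field names `embedding` and `content`, the function `build_knn_mappings`, and the sample `method` values are assumptions. Only the `knn_vector` field type with its `dimension` and `method` keys and the `{"index.knn": True}` setting come from OpenSearch and the defaults described above.

```python
from typing import Any, Optional


def build_knn_mappings(embedding_dim: int, method: Optional[dict] = None) -> dict:
    """Build an index mapping with a k-NN vector field and a text field."""
    embedding_field: dict[str, Any] = {"type": "knn_vector", "dimension": embedding_dim}
    if method is not None:
        # Optional approximate k-NN method definition.
        embedding_field["method"] = method
    return {"properties": {"embedding": embedding_field, "content": {"type": "text"}}}


settings = {"index.knn": True}
mappings = build_knn_mappings(
    384, method={"name": "hnsw", "space_type": "l2", "engine": "lucene"}
)
print(mappings["properties"]["embedding"])

# With an existing store, these could be passed on explicitly:
# document_store.create_index(index="my_index", mappings=mappings, settings=settings)
```
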
1294  #### to_dict
1295  
1296  ```python
1297  to_dict() -> dict[str, Any]
1298  ```
1299  
1300  Serializes the component to a dictionary.
1301  
1302  **Returns:**
1303  
1304  - <code>dict\[str, Any\]</code> – Dictionary with serialized data.
1305  
1306  #### from_dict
1307  
1308  ```python
1309  from_dict(data: dict[str, Any]) -> OpenSearchDocumentStore
1310  ```
1311  
1312  Deserializes the component from a dictionary.
1313  
1314  **Parameters:**
1315  
1316  - **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from.
1317  
1318  **Returns:**
1319  
1320  - <code>OpenSearchDocumentStore</code> – Deserialized component.
1321  
1322  #### count_documents
1323  
1324  ```python
1325  count_documents() -> int
1326  ```
1327  
1328  Returns how many documents are present in the document store.
1329  
1330  #### count_documents_async
1331  
1332  ```python
1333  count_documents_async() -> int
1334  ```
1335  
1336  Asynchronously returns the total number of documents in the document store.
1337  
1338  #### filter_documents
1339  
1340  ```python
1341  filter_documents(filters: dict[str, Any] | None = None) -> list[Document]
1342  ```
1343  
1344  Returns the documents that match the filters provided.
1345  
1346  For a detailed specification of the filters,
1347  refer to the [documentation](https://docs.haystack.deepset.ai/docs/metadata-filtering)
1348  
1349  **Parameters:**
1350  
1351  - **filters** (<code>dict\[str, Any\] | None</code>) – The filters to apply to the document list.
1352  
1353  **Returns:**
1354  
1355  - <code>list\[Document\]</code> – A list of Documents that match the given filters.
1356  
1357  #### filter_documents_async
1358  
1359  ```python
1360  filter_documents_async(filters: dict[str, Any] | None = None) -> list[Document]
1361  ```
1362  
1363  Asynchronously returns the documents that match the filters provided.
1364  
1365  For a detailed specification of the filters,
1366  refer to the [documentation](https://docs.haystack.deepset.ai/docs/metadata-filtering)
1367  
1368  **Parameters:**
1369  
1370  - **filters** (<code>dict\[str, Any\] | None</code>) – The filters to apply to the document list.
1371  
1372  **Returns:**
1373  
1374  - <code>list\[Document\]</code> – A list of Documents that match the given filters.
1375  
1376  #### write_documents
1377  
1378  ```python
1379  write_documents(
1380      documents: list[Document],
1381      policy: DuplicatePolicy = DuplicatePolicy.NONE,
1382      refresh: Literal["wait_for", True, False] = "wait_for",
1383  ) -> int
1384  ```
1385  
1386  Writes documents to the document store.
1387  
1388  **Parameters:**
1389  
1390  - **documents** (<code>list\[Document\]</code>) – A list of Documents to write to the document store.
1391  - **policy** (<code>DuplicatePolicy</code>) – The duplicate policy to use when writing documents.
- **refresh** (<code>Literal['wait_for', True, False]</code>) – Controls when changes are made visible to search operations.
  For more details, see the [OpenSearch refresh documentation](https://opensearch.org/docs/latest/api-reference/document-apis/index-document/).
  - `True`: Force refresh immediately after the operation.
  - `False`: Do not refresh (better performance for bulk operations).
  - `"wait_for"`: Wait for the next refresh cycle (default, ensures read-your-writes consistency).
1397  
1398  **Returns:**
1399  
1400  - <code>int</code> – The number of documents written to the document store.
1401  
1402  **Raises:**
1403  
1404  - <code>DuplicateDocumentError</code> – If a document with the same id already exists in the document store
1405    and the policy is set to `DuplicatePolicy.FAIL` (or not specified).
1406  
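The effect of the duplicate policy can be sketched with an in-memory stand-in. The `Policy` enum, `DuplicateError`, and `write` function below are hypothetical substitutes for `DuplicatePolicy`, `DuplicateDocumentError`, and `write_documents`. They model only the behavior described above, where an unspecified policy is treated like `FAIL`.

```python
from enum import Enum


class Policy(Enum):  # stand-in for DuplicatePolicy
    NONE = "none"
    SKIP = "skip"
    OVERWRITE = "overwrite"
    FAIL = "fail"


class DuplicateError(Exception):  # stand-in for DuplicateDocumentError
    pass


def write(store: dict, docs: dict, policy: Policy = Policy.NONE) -> int:
    """Write {id: content} pairs into `store` and return the number written."""
    if policy is Policy.NONE:
        policy = Policy.FAIL  # not specified behaves like FAIL
    duplicates = [doc_id for doc_id in docs if doc_id in store]
    if duplicates and policy is Policy.FAIL:
        raise DuplicateError(f"IDs already exist: {duplicates}")
    written = 0
    for doc_id, content in docs.items():
        if doc_id in store and policy is Policy.SKIP:
            continue  # keep the existing document
        store[doc_id] = content
        written += 1
    return written


store = {"1": "My first document"}
print(write(store, {"1": "changed", "2": "My second document"}, Policy.SKIP))  # 1
print(write(store, {"1": "changed"}, Policy.OVERWRITE))                         # 1
```
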
1407  #### write_documents_async
1408  
1409  ```python
1410  write_documents_async(
1411      documents: list[Document],
1412      policy: DuplicatePolicy = DuplicatePolicy.NONE,
1413      refresh: Literal["wait_for", True, False] = "wait_for",
1414  ) -> int
1415  ```
1416  
1417  Asynchronously writes documents to the document store.
1418  
1419  **Parameters:**
1420  
1421  - **documents** (<code>list\[Document\]</code>) – A list of Documents to write to the document store.
1422  - **policy** (<code>DuplicatePolicy</code>) – The duplicate policy to use when writing documents.
- **refresh** (<code>Literal['wait_for', True, False]</code>) – Controls when changes are made visible to search operations.
  For more details, see the [OpenSearch refresh documentation](https://opensearch.org/docs/latest/api-reference/document-apis/index-document/).
  - `True`: Force refresh immediately after the operation.
  - `False`: Do not refresh (better performance for bulk operations).
  - `"wait_for"`: Wait for the next refresh cycle (default, ensures read-your-writes consistency).
1428  
1429  **Returns:**
1430  
1431  - <code>int</code> – The number of documents written to the document store.
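
As a rough illustration of the `refresh` trade-off described above, the sketch below writes a large list in batches and waits for a refresh only on the final batch. `write_in_batches` and `batch_size` are hypothetical names that are not part of the integration, and `store` is assumed to be an `OpenSearchDocumentStore` (or any object exposing `write_documents_async` with the signature above).

```python
import asyncio
from typing import Any


async def write_in_batches(store: Any, documents: list[Any], batch_size: int = 500) -> int:
    """Write `documents` in batches and return the total count reported by the store.

    Hypothetical helper: `store` is assumed to expose `write_documents_async`
    with the signature documented above.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    written = 0
    for start in range(0, len(documents), batch_size):
        batch = documents[start : start + batch_size]
        is_last_batch = start + batch_size >= len(documents)
        # Skip the refresh for intermediate batches; wait for it only on the last one.
        written += await store.write_documents_async(
            documents=batch,
            refresh="wait_for" if is_last_batch else False,
        )
    return written


# Possible usage (needs a reachable OpenSearch instance):
# asyncio.run(write_in_batches(store, docs))
```

Whether skipping the intermediate refreshes is acceptable depends on whether anything reads the index while the write is still in progress.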
1432  
1433  #### delete_documents
1434  
1435  ```python
1436  delete_documents(
1437      document_ids: list[str],
1438      refresh: Literal["wait_for", True, False] = "wait_for",
1439      routing: dict[str, str] | None = None,
1440  ) -> None
1441  ```
1442  
1443  Deletes documents that match the provided `document_ids` from the document store.
1444  
1445  **Parameters:**
1446  
- **document_ids** (<code>list\[str\]</code>) – The IDs of the documents to delete.
1448  - **refresh** (<code>Literal['wait_for', True, False]</code>) – Controls when changes are made visible to search operations.
  - `True`: Force refresh immediately after the operation.
  - `False`: Do not refresh (better performance for bulk operations).
  - `"wait_for"`: Wait for the next refresh cycle (default, ensures read-your-writes consistency).
    For more details, see the [OpenSearch refresh documentation](https://opensearch.org/docs/latest/api-reference/document-apis/index-document/).
1453  - **routing** (<code>dict\[str, str\] | None</code>) – A dictionary mapping document IDs to their routing values.
1454    Routing values are used to determine the shard where documents are stored.
1455    If provided, the routing value for each document will be used during deletion.
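
The `routing` mapping described above can be built from the ID list when all documents share one routing value. This is a minimal sketch under that assumption; `delete_with_shared_routing` and `routing_value` are hypothetical names, and `store` stands for an `OpenSearchDocumentStore`.

```python
from typing import Any


def delete_with_shared_routing(store: Any, document_ids: list[str], routing_value: str) -> None:
    """Delete documents that are assumed to have been indexed under one routing value."""
    # Build the {document_id: routing_value} mapping expected by the `routing` parameter.
    routing = {doc_id: routing_value for doc_id in document_ids}
    store.delete_documents(document_ids=document_ids, refresh="wait_for", routing=routing)
```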
1456  
1457  #### delete_documents_async
1458  
1459  ```python
1460  delete_documents_async(
1461      document_ids: list[str],
1462      refresh: Literal["wait_for", True, False] = "wait_for",
1463      routing: dict[str, str] | None = None,
1464  ) -> None
1465  ```
1466  
1467  Asynchronously deletes documents that match the provided `document_ids` from the document store.
1468  
1469  **Parameters:**
1470  
- **document_ids** (<code>list\[str\]</code>) – The IDs of the documents to delete.
1472  - **refresh** (<code>Literal['wait_for', True, False]</code>) – Controls when changes are made visible to search operations.
  - `True`: Force refresh immediately after the operation.
  - `False`: Do not refresh (better performance for bulk operations).
  - `"wait_for"`: Wait for the next refresh cycle (default, ensures read-your-writes consistency).
    For more details, see the [OpenSearch refresh documentation](https://opensearch.org/docs/latest/api-reference/document-apis/index-document/).
1477  - **routing** (<code>dict\[str, str\] | None</code>) – A dictionary mapping document IDs to their routing values.
1478    Routing values are used to determine the shard where documents are stored.
1479    If provided, the routing value for each document will be used during deletion.
1480  
1481  #### delete_all_documents
1482  
1483  ```python
1484  delete_all_documents(
1485      recreate_index: bool = False, refresh: bool = True
1486  ) -> None
1487  ```
1488  
1489  Deletes all documents in the document store.
1490  
1491  **Parameters:**
1492  
1493  - **recreate_index** (<code>bool</code>) – If True, the index will be deleted and recreated with the original mappings and
1494    settings. If False, all documents will be deleted using the `delete_by_query` API.
1495  - **refresh** (<code>bool</code>) – If True, OpenSearch refreshes all shards involved in the delete by query after the request
1496    completes. If False, no refresh is performed. For more details, see the
1497    [OpenSearch delete_by_query refresh documentation](https://opensearch.org/docs/latest/api-reference/document-apis/delete-by-query/).
1498  
1499  #### delete_all_documents_async
1500  
1501  ```python
1502  delete_all_documents_async(
1503      recreate_index: bool = False, refresh: bool = True
1504  ) -> None
1505  ```
1506  
1507  Asynchronously deletes all documents in the document store.
1508  
1509  **Parameters:**
1510  
1511  - **recreate_index** (<code>bool</code>) – If True, the index will be deleted and recreated with the original mappings and
1512    settings. If False, all documents will be deleted using the `delete_by_query` API.
1513  - **refresh** (<code>bool</code>) – If True, OpenSearch refreshes all shards involved in the delete by query after the request
1514    completes. If False, no refresh is performed. For more details, see the
1515    [OpenSearch delete_by_query refresh documentation](https://opensearch.org/docs/latest/api-reference/document-apis/delete-by-query/).
1516  
1517  #### delete_by_filter
1518  
1519  ```python
1520  delete_by_filter(filters: dict[str, Any], refresh: bool = False) -> int
1521  ```
1522  
1523  Deletes all documents that match the provided filters.
1524  
1525  **Parameters:**
1526  
1527  - **filters** (<code>dict\[str, Any\]</code>) – The filters to apply to select documents for deletion.
  For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering).
1529  - **refresh** (<code>bool</code>) – If True, OpenSearch refreshes all shards involved in the delete by query after the request
1530    completes so that subsequent reads (e.g. count_documents) see the update. If False, no refresh is
1531    performed (better for bulk deletes). For more details, see the
1532    [OpenSearch delete_by_query refresh documentation](https://opensearch.org/docs/latest/api-reference/document-apis/delete-by-query/).
1533  
1534  **Returns:**
1535  
1536  - <code>int</code> – The number of documents deleted.
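
The interaction between `refresh` and a follow-up read can be sketched as below. The `meta.status` field and the `delete_by_status` helper are hypothetical; `store` is assumed to be an `OpenSearchDocumentStore`, and the filter uses the Haystack comparison syntax.

```python
from typing import Any


def delete_by_status(store: Any, status: str) -> tuple[int, int]:
    """Delete documents whose `meta.status` equals `status`.

    Returns a (deleted, remaining) pair. Hypothetical helper for illustration.
    """
    filters = {"field": "meta.status", "operator": "==", "value": status}
    # refresh=True so that the count issued right afterwards sees the deletion.
    deleted = store.delete_by_filter(filters=filters, refresh=True)
    remaining = store.count_documents_by_filter(filters=filters)
    return deleted, remaining
```

With the default `refresh=False`, the follow-up count may still include documents that were just deleted.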
1537  
1538  #### delete_by_filter_async
1539  
1540  ```python
1541  delete_by_filter_async(filters: dict[str, Any], refresh: bool = False) -> int
1542  ```
1543  
1544  Asynchronously deletes all documents that match the provided filters.
1545  
1546  **Parameters:**
1547  
1548  - **filters** (<code>dict\[str, Any\]</code>) – The filters to apply to select documents for deletion.
  For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering).
1550  - **refresh** (<code>bool</code>) – If True, OpenSearch refreshes all shards involved in the delete by query after the request
1551    completes so that subsequent reads see the update. If False, no refresh is performed. For more details,
1552    see the [OpenSearch delete_by_query refresh documentation](https://opensearch.org/docs/latest/api-reference/document-apis/delete-by-query/).
1553  
1554  **Returns:**
1555  
1556  - <code>int</code> – The number of documents deleted.
1557  
1558  #### update_by_filter
1559  
1560  ```python
1561  update_by_filter(
1562      filters: dict[str, Any], meta: dict[str, Any], refresh: bool = False
1563  ) -> int
1564  ```
1565  
1566  Updates the metadata of all documents that match the provided filters.
1567  
1568  **Parameters:**
1569  
1570  - **filters** (<code>dict\[str, Any\]</code>) – The filters to apply to select documents for updating.
  For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering).
1572  - **meta** (<code>dict\[str, Any\]</code>) – The metadata fields to update.
1573  - **refresh** (<code>bool</code>) – If True, OpenSearch refreshes all shards involved in the update by query after the request
1574    completes. If False, no refresh is performed. For more details, see the
1575    [OpenSearch update_by_query refresh documentation](https://opensearch.org/docs/latest/api-reference/document-apis/update-by-query/).
1576  
1577  **Returns:**
1578  
1579  - <code>int</code> – The number of documents updated.
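
A compound filter combined with a metadata update might look like the sketch below. The `category` and `status` fields and the `archive_inactive` helper are hypothetical, and it is an assumption here that keys in `meta` are given without the `meta.` prefix.

```python
from typing import Any


def archive_inactive(store: Any, category: str) -> int:
    """Mark inactive documents of one category as archived; return the updated count."""
    filters = {
        "operator": "AND",
        "conditions": [
            {"field": "meta.category", "operator": "==", "value": category},
            {"field": "meta.status", "operator": "==", "value": "inactive"},
        ],
    }
    # Assumption: keys in `meta` name the metadata fields directly (no "meta." prefix).
    return store.update_by_filter(filters=filters, meta={"status": "archived"}, refresh=True)
```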
1580  
1581  #### update_by_filter_async
1582  
1583  ```python
1584  update_by_filter_async(
1585      filters: dict[str, Any], meta: dict[str, Any], refresh: bool = False
1586  ) -> int
1587  ```
1588  
1589  Asynchronously updates the metadata of all documents that match the provided filters.
1590  
1591  **Parameters:**
1592  
1593  - **filters** (<code>dict\[str, Any\]</code>) – The filters to apply to select documents for updating.
  For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering).
1595  - **meta** (<code>dict\[str, Any\]</code>) – The metadata fields to update.
1596  - **refresh** (<code>bool</code>) – If True, OpenSearch refreshes all shards involved in the update by query after the request
1597    completes. If False, no refresh is performed. For more details, see the
1598    [OpenSearch update_by_query refresh documentation](https://opensearch.org/docs/latest/api-reference/document-apis/update-by-query/).
1599  
1600  **Returns:**
1601  
1602  - <code>int</code> – The number of documents updated.
1603  
1604  #### count_documents_by_filter
1605  
1606  ```python
1607  count_documents_by_filter(filters: dict[str, Any]) -> int
1608  ```
1609  
1610  Returns the number of documents that match the provided filters.
1611  
1612  **Parameters:**
1613  
- **filters** (<code>dict\[str, Any\]</code>) – The filters that select the documents to count.
  For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering).
1616  
1617  **Returns:**
1618  
1619  - <code>int</code> – The number of documents that match the filters.
1620  
1621  #### count_documents_by_filter_async
1622  
1623  ```python
1624  count_documents_by_filter_async(filters: dict[str, Any]) -> int
1625  ```
1626  
1627  Asynchronously returns the number of documents that match the provided filters.
1628  
1629  **Parameters:**
1630  
- **filters** (<code>dict\[str, Any\]</code>) – The filters that select the documents to count.
  For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering).
1633  
1634  **Returns:**
1635  
1636  - <code>int</code> – The number of documents that match the filters.
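
Because the async variant returns an awaitable, several counts can be requested concurrently. The following is a sketch only: `count_per_value` is a hypothetical helper, and `store` is assumed to be an `OpenSearchDocumentStore`.

```python
import asyncio
from typing import Any


async def count_per_value(store: Any, field: str, values: list[str]) -> dict[str, int]:
    """Count the documents matching each value of `field`, issuing the requests concurrently."""
    counts = await asyncio.gather(
        *(
            store.count_documents_by_filter_async(
                filters={"field": field, "operator": "==", "value": value}
            )
            for value in values
        )
    )
    # asyncio.gather preserves the order of its inputs, so zip pairs values with counts.
    return dict(zip(values, counts))
```

A call such as `await count_per_value(store, "meta.status", ["active", "inactive"])` would return one count per status.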
1637  
1638  #### count_unique_metadata_by_filter
1639  
1640  ```python
1641  count_unique_metadata_by_filter(
1642      filters: dict[str, Any], metadata_fields: list[str]
1643  ) -> dict[str, int]
1644  ```
1645  
1646  Returns the number of unique values for each specified metadata field of the documents that match the filters.
1647  
1648  **Parameters:**
1649  
- **filters** (<code>dict\[str, Any\]</code>) – The filters that select the documents whose metadata values are counted.
  For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering).
1652  - **metadata_fields** (<code>list\[str\]</code>) – List of field names to calculate unique values for.
1653    Field names can include or omit the "meta." prefix.
1654  
1655  **Returns:**
1656  
1657  - <code>dict\[str, int\]</code> – A dictionary mapping each metadata field name to the count of its unique values among the filtered
1658    documents.
1659  
1660  **Raises:**
1661  
1662  - <code>ValueError</code> – If any of the requested fields don't exist in the index mapping.
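
One way to avoid the `ValueError` is to check the requested fields against the index mapping first. The sketch below assumes that `get_metadata_fields_info` returns field names without the `meta.` prefix, as in its example output; `count_unique_for_known_fields` is a hypothetical helper.

```python
from typing import Any


def count_unique_for_known_fields(
    store: Any, filters: dict[str, Any], metadata_fields: list[str]
) -> dict[str, int]:
    """Count unique values only for the fields that exist in the index mapping."""
    known = set(store.get_metadata_fields_info())
    # Strip an optional "meta." prefix before comparing against the mapping keys.
    requested = [name.removeprefix("meta.") for name in metadata_fields]
    present = [name for name in requested if name in known]
    if not present:
        return {}
    return store.count_unique_metadata_by_filter(filters=filters, metadata_fields=present)
```

Silently dropping unknown fields is a design choice of this sketch; raising or logging may be preferable when a missing field indicates a bug.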
1663  
1664  #### count_unique_metadata_by_filter_async
1665  
1666  ```python
1667  count_unique_metadata_by_filter_async(
1668      filters: dict[str, Any], metadata_fields: list[str]
1669  ) -> dict[str, int]
1670  ```
1671  
1672  Asynchronously returns the number of unique values for each specified metadata field matching the filters.
1673  
1674  **Parameters:**
1675  
- **filters** (<code>dict\[str, Any\]</code>) – The filters that select the documents whose metadata values are counted.
  For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering).
1678  - **metadata_fields** (<code>list\[str\]</code>) – List of field names to calculate unique values for.
1679    Field names can include or omit the "meta." prefix.
1680  
1681  **Returns:**
1682  
1683  - <code>dict\[str, int\]</code> – A dictionary mapping each metadata field name to the count of its unique values among the filtered
1684    documents.
1685  
1686  **Raises:**
1687  
1688  - <code>ValueError</code> – If any of the requested fields don't exist in the index mapping.
1689  
1690  #### get_metadata_fields_info
1691  
1692  ```python
1693  get_metadata_fields_info() -> dict[str, dict[str, str]]
1694  ```
1695  
Returns information about the fields in the index.
1697  
1698  If we populated the index with documents like:
1699  
1700  ```python
Document(content="Doc 1", meta={"category": "A", "status": "active", "priority": 1})
Document(content="Doc 2", meta={"category": "B", "status": "inactive"})
1703  ```
1704  
1705  This method would return:
1706  
1707  ```python
{
    'content': {'type': 'text'},
    'category': {'type': 'keyword'},
    'status': {'type': 'keyword'},
    'priority': {'type': 'long'},
}
1714  ```
1715  
1716  **Returns:**
1717  
1718  - <code>dict\[str, dict\[str, str\]\]</code> – The information about the fields in the index.
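
The returned mapping can be post-processed with plain dictionary operations, for example to list the fields of a given type. The helper name `fields_of_type` is hypothetical, and the input below simply reuses the example output shown above.

```python
def fields_of_type(fields_info: dict[str, dict[str, str]], field_type: str) -> list[str]:
    """Return the sorted names of all fields whose mapping type equals `field_type`."""
    return sorted(name for name, info in fields_info.items() if info.get("type") == field_type)


# Example output of get_metadata_fields_info(), copied from the section above.
fields_info = {
    "content": {"type": "text"},
    "category": {"type": "keyword"},
    "status": {"type": "keyword"},
    "priority": {"type": "long"},
}

print(fields_of_type(fields_info, "keyword"))  # ['category', 'status']
```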
1719  
1720  #### get_metadata_fields_info_async
1721  
1722  ```python
1723  get_metadata_fields_info_async() -> dict[str, dict[str, str]]
1724  ```
1725  
1726  Asynchronously returns the information about the fields in the index.
1727  
1728  If we populated the index with documents like:
1729  
1730  ```python
Document(content="Doc 1", meta={"category": "A", "status": "active", "priority": 1})
Document(content="Doc 2", meta={"category": "B", "status": "inactive"})
1733  ```
1734  
1735  This method would return:
1736  
1737  ```python
{
    'content': {'type': 'text'},
    'category': {'type': 'keyword'},
    'status': {'type': 'keyword'},
    'priority': {'type': 'long'},
}
1744  ```
1745  
1746  **Returns:**
1747  
1748  - <code>dict\[str, dict\[str, str\]\]</code> – The information about the fields in the index.
1749  
1750  #### get_metadata_field_min_max
1751  
1752  ```python
1753  get_metadata_field_min_max(metadata_field: str) -> dict[str, int | None]
1754  ```
1755  
1756  Returns the minimum and maximum values for the given metadata field.
1757  
1758  **Parameters:**
1759  
1760  - **metadata_field** (<code>str</code>) – The metadata field to get the minimum and maximum values for.
1761  
1762  **Returns:**
1763  
1764  - <code>dict\[str, int | None\]</code> – A dictionary with the keys "min" and "max", where each value is the minimum or maximum value of the
1765    metadata field across all documents.
1766  
1767  #### get_metadata_field_min_max_async
1768  
1769  ```python
1770  get_metadata_field_min_max_async(metadata_field: str) -> dict[str, int | None]
1771  ```
1772  
1773  Asynchronously returns the minimum and maximum values for the given metadata field.
1774  
1775  **Parameters:**
1776  
1777  - **metadata_field** (<code>str</code>) – The metadata field to get the minimum and maximum values for.
1778  
1779  **Returns:**
1780  
1781  - <code>dict\[str, int | None\]</code> – A dictionary with the keys "min" and "max", where each value is the minimum or maximum value of the
1782    metadata field across all documents.
1783  
1784  #### get_metadata_field_unique_values
1785  
1786  ```python
1787  get_metadata_field_unique_values(
1788      metadata_field: str,
1789      search_term: str | None = None,
1790      size: int | None = 10000,
1791      after: dict[str, Any] | None = None,
1792  ) -> tuple[list[str], dict[str, Any] | None]
1793  ```
1794  
1795  Returns unique values for a metadata field, optionally filtered by a search term in the content.
1796  
1797  Uses composite aggregations for proper pagination beyond 10k results.
1798  
1799  **Parameters:**
1800  
1801  - **metadata_field** (<code>str</code>) – The metadata field to get unique values for.
1802  - **search_term** (<code>str | None</code>) – Optional search term to filter documents by matching in the content field.
1803  - **size** (<code>int | None</code>) – The number of unique values to return per page. Defaults to 10000.
1804  - **after** (<code>dict\[str, Any\] | None</code>) – Optional pagination key from the previous response. Use None for the first page.
1805    For subsequent pages, pass the `after_key` from the previous response.
1806  
1807  **Returns:**
1808  
1809  - <code>tuple\[list\[str\], dict\[str, Any\] | None\]</code> – A tuple containing (list of unique values, after_key for pagination).
1810    The after_key is None when there are no more results. Use it in the `after` parameter
1811    for the next page.
1812  
1813  #### get_metadata_field_unique_values_async
1814  
1815  ```python
1816  get_metadata_field_unique_values_async(
1817      metadata_field: str,
1818      search_term: str | None = None,
1819      size: int | None = 10000,
1820      after: dict[str, Any] | None = None,
1821  ) -> tuple[list[str], dict[str, Any] | None]
1822  ```
1823  
1824  Asynchronously returns unique values for a metadata field, optionally filtered by a search term in the content.
1825  
1826  Uses composite aggregations for proper pagination beyond 10k results.
1827  
1828  **Parameters:**
1829  
1830  - **metadata_field** (<code>str</code>) – The metadata field to get unique values for.
1831  - **search_term** (<code>str | None</code>) – Optional search term to filter documents by matching in the content field.
1832  - **size** (<code>int | None</code>) – The number of unique values to return per page. Defaults to 10000.
1833  - **after** (<code>dict\[str, Any\] | None</code>) – Optional pagination key from the previous response. Use None for the first page.
1834    For subsequent pages, pass the `after_key` from the previous response.
1835  
1836  **Returns:**
1837  
1838  - <code>tuple\[list\[str\], dict\[str, Any\] | None\]</code> – A tuple containing (list of unique values, after_key for pagination).
1839    The after_key is None when there are no more results. Use it in the `after` parameter
1840    for the next page.
1841  
1842  ## haystack_integrations.document_stores.opensearch.filters
1843  
1844  ### normalize_filters
1845  
1846  ```python
1847  normalize_filters(
1848      filters: dict[str, Any], nested_fields: set[str] | None = None
1849  ) -> dict[str, Any]
1850  ```
1851  
Converts Haystack filters into OpenSearch-compatible filters.
1853  
1854  **Parameters:**
1855  
1856  - **filters** (<code>dict\[str, Any\]</code>) – Haystack filter dictionary.
1857  - **nested_fields** (<code>set\[str\] | None</code>) – Set of metadata field paths that are mapped as `nested` type in OpenSearch.
1858    When provided, conditions targeting sub-fields of these paths are wrapped in `nested` queries.