---
title: "Weaviate"
id: integrations-weaviate
description: "Weaviate integration for Haystack"
slug: "/integrations-weaviate"
---

## haystack_integrations.components.retrievers.weaviate.bm25_retriever

### WeaviateBM25Retriever

A component for retrieving documents from Weaviate using the BM25 algorithm.

Example usage:

```python
from haystack_integrations.document_stores.weaviate.document_store import (
    WeaviateDocumentStore,
)
from haystack_integrations.components.retrievers.weaviate.bm25_retriever import (
    WeaviateBM25Retriever,
)

document_store = WeaviateDocumentStore(url="http://localhost:8080")
retriever = WeaviateBM25Retriever(document_store=document_store)
retriever.run(query="How to make a pizza", top_k=3)
```

#### __init__

```python
__init__(
    *,
    document_store: WeaviateDocumentStore,
    filters: dict[str, Any] | None = None,
    top_k: int = 10,
    filter_policy: str | FilterPolicy = FilterPolicy.REPLACE
)
```

Creates a new instance of WeaviateBM25Retriever.

**Parameters:**

- **document_store** (<code>WeaviateDocumentStore</code>) – Instance of WeaviateDocumentStore used by this retriever.
- **filters** (<code>dict\[str, Any\] | None</code>) – Custom filters applied when running the retriever.
- **top_k** (<code>int</code>) – Maximum number of documents to return.
- **filter_policy** (<code>str | FilterPolicy</code>) – Policy to determine how filters are applied.

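The `filter_policy` setting decides how filters passed to `run` interact with the filters set at initialization. The sketch below is a simplified, plain-Python illustration of that idea, assuming that `replace` lets runtime filters take precedence over the init filters and that `merge` combines both with a logical `AND`. The helper `resolve_filters` and the sample filters are hypothetical; this is not the integration's code, and the real merge logic may handle more cases.

```python
from typing import Any, Optional

Filters = Optional[dict[str, Any]]


def resolve_filters(policy: str, init_filters: Filters, runtime_filters: Filters) -> Filters:
    """Hypothetical helper showing how a filter policy could pick the effective filters."""
    if policy == "replace":
        # Runtime filters win when given; otherwise fall back to the init filters.
        return runtime_filters if runtime_filters is not None else init_filters
    if policy == "merge":
        # Assumption: merging means both filters must match (logical AND).
        if init_filters and runtime_filters:
            return {"operator": "AND", "conditions": [init_filters, runtime_filters]}
        return runtime_filters or init_filters
    raise ValueError(f"Unknown filter policy: {policy}")


# Illustrative filters; the field names are made up for this example.
init_filters = {"field": "meta.language", "operator": "==", "value": "en"}
runtime_filters = {"field": "meta.year", "operator": ">=", "value": 2020}

replaced = resolve_filters("replace", init_filters, runtime_filters)
merged = resolve_filters("merge", init_filters, runtime_filters)
```

With `replace`, only the runtime filter remains in effect; with `merge`, a document has to satisfy both conditions.
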
#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> WeaviateBM25Retriever
```

Deserializes the component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from.

**Returns:**

- <code>WeaviateBM25Retriever</code> – Deserialized component.

#### run

```python
run(
    query: str, filters: dict[str, Any] | None = None, top_k: int | None = None
) -> dict[str, list[Document]]
```

Retrieves documents from Weaviate using the BM25 algorithm.

**Parameters:**

- **query** (<code>str</code>) – The query text.
- **filters** (<code>dict\[str, Any\] | None</code>) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on
  the `filter_policy` chosen at retriever initialization. See init method docstring for more
  details.
- **top_k** (<code>int | None</code>) – The maximum number of documents to return.

**Returns:**

- <code>dict\[str, list\[Document\]\]</code> – A dictionary with the following keys:
  - `documents`: List of documents returned by the search engine.

#### run_async

```python
run_async(
    query: str, filters: dict[str, Any] | None = None, top_k: int | None = None
) -> dict[str, list[Document]]
```

Asynchronously retrieves documents from Weaviate using the BM25 algorithm.

**Parameters:**

- **query** (<code>str</code>) – The query text.
- **filters** (<code>dict\[str, Any\] | None</code>) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on
  the `filter_policy` chosen at retriever initialization. See init method docstring for more
  details.
- **top_k** (<code>int | None</code>) – The maximum number of documents to return.

**Returns:**

- <code>dict\[str, list\[Document\]\]</code> – A dictionary with the following keys:
  - `documents`: List of documents returned by the search engine.

## haystack_integrations.components.retrievers.weaviate.embedding_retriever

### WeaviateEmbeddingRetriever

A retriever that uses Weaviate's vector search to find similar documents based on the embeddings of the query.

#### __init__

```python
__init__(
    *,
    document_store: WeaviateDocumentStore,
    filters: dict[str, Any] | None = None,
    top_k: int = 10,
    distance: float | None = None,
    certainty: float | None = None,
    filter_policy: str | FilterPolicy = FilterPolicy.REPLACE
)
```

Creates a new instance of WeaviateEmbeddingRetriever.

**Parameters:**

- **document_store** (<code>WeaviateDocumentStore</code>) – Instance of WeaviateDocumentStore used by this retriever.
- **filters** (<code>dict\[str, Any\] | None</code>) – Custom filters applied when running the retriever.
- **top_k** (<code>int</code>) – Maximum number of documents to return.
- **distance** (<code>float | None</code>) – The maximum allowed distance between Documents' embeddings.
- **certainty** (<code>float | None</code>) – Normalized distance between the result item and the search vector.
- **filter_policy** (<code>str | FilterPolicy</code>) – Policy to determine how filters are applied.

**Raises:**

- <code>ValueError</code> – If both `distance` and `certainty` are provided.
  See https://weaviate.io/developers/weaviate/api/graphql/search-operators#variables to learn more about
  `distance` and `certainty` parameters.

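The retriever accepts either `distance` or `certainty`, but not both. The sketch below mirrors that check in plain Python. `check_thresholds` and `cosine_distance_to_certainty` are hypothetical helpers, not part of the integration. The conversion assumes the cosine metric, where certainty is commonly described as `1 - distance / 2`; treat that formula as an assumption and confirm it against the Weaviate documentation for your setup.

```python
from typing import Optional


def check_thresholds(distance: Optional[float], certainty: Optional[float]) -> None:
    """Hypothetical validation: the two thresholds are mutually exclusive."""
    if distance is not None and certainty is not None:
        raise ValueError("Provide either `distance` or `certainty`, not both.")


def cosine_distance_to_certainty(distance: float) -> float:
    """Assumed relation for the cosine metric: certainty = 1 - distance / 2."""
    return 1.0 - distance / 2.0


# A single threshold passes the check.
check_thresholds(distance=0.3, certainty=None)

# Supplying both is rejected.
try:
    check_thresholds(distance=0.3, certainty=0.85)
    both_rejected = False
except ValueError:
    both_rejected = True

certainty = cosine_distance_to_certainty(0.3)
```
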
#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> WeaviateEmbeddingRetriever
```

Deserializes the component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from.

**Returns:**

- <code>WeaviateEmbeddingRetriever</code> – Deserialized component.

#### run

```python
run(
    query_embedding: list[float],
    filters: dict[str, Any] | None = None,
    top_k: int | None = None,
    distance: float | None = None,
    certainty: float | None = None,
) -> dict[str, list[Document]]
```

Retrieves documents from Weaviate using the vector search.

**Parameters:**

- **query_embedding** (<code>list\[float\]</code>) – Embedding of the query.
- **filters** (<code>dict\[str, Any\] | None</code>) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on
  the `filter_policy` chosen at retriever initialization. See init method docstring for more
  details.
- **top_k** (<code>int | None</code>) – The maximum number of documents to return.
- **distance** (<code>float | None</code>) – The maximum allowed distance between Documents' embeddings.
- **certainty** (<code>float | None</code>) – Normalized distance between the result item and the search vector.

**Returns:**

- <code>dict\[str, list\[Document\]\]</code> – A dictionary with the following keys:
  - `documents`: List of documents returned by the search engine.

**Raises:**

- <code>ValueError</code> – If both `distance` and `certainty` are provided.
  See https://weaviate.io/developers/weaviate/api/graphql/search-operators#variables to learn more about
  `distance` and `certainty` parameters.

#### run_async

```python
run_async(
    query_embedding: list[float],
    filters: dict[str, Any] | None = None,
    top_k: int | None = None,
    distance: float | None = None,
    certainty: float | None = None,
) -> dict[str, list[Document]]
```

Asynchronously retrieves documents from Weaviate using the vector search.

**Parameters:**

- **query_embedding** (<code>list\[float\]</code>) – Embedding of the query.
- **filters** (<code>dict\[str, Any\] | None</code>) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on
  the `filter_policy` chosen at retriever initialization. See init method docstring for more
  details.
- **top_k** (<code>int | None</code>) – The maximum number of documents to return.
- **distance** (<code>float | None</code>) – The maximum allowed distance between Documents' embeddings.
- **certainty** (<code>float | None</code>) – Normalized distance between the result item and the search vector.

**Returns:**

- <code>dict\[str, list\[Document\]\]</code> – A dictionary with the following keys:
  - `documents`: List of documents returned by the search engine.

**Raises:**

- <code>ValueError</code> – If both `distance` and `certainty` are provided.
  See https://weaviate.io/developers/weaviate/api/graphql/search-operators#variables to learn more about
  `distance` and `certainty` parameters.

## haystack_integrations.components.retrievers.weaviate.hybrid_retriever

### WeaviateHybridRetriever

A retriever that uses Weaviate's hybrid search to find similar documents based on the query text and its embedding.

#### __init__

```python
__init__(
    *,
    document_store: WeaviateDocumentStore,
    filters: dict[str, Any] | None = None,
    top_k: int = 10,
    alpha: float = 0.7,
    max_vector_distance: float | None = None,
    filter_policy: str | FilterPolicy = FilterPolicy.REPLACE
)
```

Creates a new instance of WeaviateHybridRetriever.

**Parameters:**

- **document_store** (<code>WeaviateDocumentStore</code>) – Instance of WeaviateDocumentStore used by this retriever.
- **filters** (<code>dict\[str, Any\] | None</code>) – Custom filters applied when running the retriever.
- **top_k** (<code>int</code>) – Maximum number of documents to return.
- **alpha** (<code>float</code>) – Blending factor for hybrid retrieval in Weaviate. Must be in the range `[0.0, 1.0]`.

  Weaviate hybrid search combines keyword (BM25) and vector scores into a single ranking. `alpha` controls
  how much each part contributes to the final score:

  - `alpha = 0.0`: only keyword (BM25) scoring is used.
  - `alpha = 1.0`: only vector similarity scoring is used.
  - Values in between blend the two; higher values favor the vector score, lower values favor BM25.

  By default, 0.7 is used, which is the Weaviate server default.

  See the official Weaviate docs on Hybrid Search parameters for more details:

  - [Hybrid search parameters](https://weaviate.io/developers/weaviate/search/hybrid#parameters)
  - [Hybrid Search](https://docs.weaviate.io/weaviate/concepts/search/hybrid-search)
- **max_vector_distance** (<code>float | None</code>) – Optional threshold that restricts the vector part of the hybrid search to candidates within a maximum
  vector distance. Candidates with a distance larger than this threshold are excluded from the vector portion
  before blending.

  Use this to prune low-quality vector matches while still benefiting from keyword recall. Leave `None` to
  use Weaviate's default behavior without an explicit cutoff.

  See the official Weaviate docs on Hybrid Search parameters for more details:

  - [Hybrid search parameters](https://weaviate.io/developers/weaviate/search/hybrid#parameters)
  - [Hybrid Search](https://docs.weaviate.io/weaviate/concepts/search/hybrid-search)
- **filter_policy** (<code>str | FilterPolicy</code>) – Policy to determine how filters are applied.

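To make the roles of `alpha` and `max_vector_distance` more concrete, here is a simplified plain-Python sketch. It assumes both score sets are already normalized to `[0, 1]` and that blending is a plain weighted sum; Weaviate's actual fusion algorithms differ in detail. The function `blend_scores` and the sample scores are hypothetical and exist only for illustration.

```python
from typing import Optional


def blend_scores(
    keyword_scores: dict[str, float],
    vector_scores: dict[str, float],
    vector_distances: dict[str, float],
    alpha: float = 0.7,
    max_vector_distance: Optional[float] = None,
) -> list[tuple[str, float]]:
    """Hypothetical weighted blend of keyword and vector scores, best match first."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in the range [0.0, 1.0]")

    # Drop vector candidates beyond the distance cutoff before blending.
    if max_vector_distance is not None:
        vector_scores = {
            doc_id: score
            for doc_id, score in vector_scores.items()
            if vector_distances[doc_id] <= max_vector_distance
        }

    doc_ids = set(keyword_scores) | set(vector_scores)
    blended = {
        doc_id: alpha * vector_scores.get(doc_id, 0.0)
        + (1 - alpha) * keyword_scores.get(doc_id, 0.0)
        for doc_id in doc_ids
    }
    return sorted(blended.items(), key=lambda item: item[1], reverse=True)


# Made-up scores and distances for three documents.
keyword = {"doc_a": 1.0, "doc_b": 0.2}
vector = {"doc_b": 1.0, "doc_c": 0.5}
distances = {"doc_b": 0.1, "doc_c": 0.9}

keyword_only = blend_scores(keyword, vector, distances, alpha=0.0)
vector_only = blend_scores(keyword, vector, distances, alpha=1.0)
pruned = blend_scores(keyword, vector, distances, alpha=0.5, max_vector_distance=0.5)
```

In this toy data, `alpha=0.0` ranks the best keyword match first, `alpha=1.0` ranks the best vector match first, and the distance cutoff removes `doc_c` from the result because it only matched through a distant vector.
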
#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> WeaviateHybridRetriever
```

Deserializes the component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from.

**Returns:**

- <code>WeaviateHybridRetriever</code> – Deserialized component.

#### run

```python
run(
    query: str,
    query_embedding: list[float],
    filters: dict[str, Any] | None = None,
    top_k: int | None = None,
    alpha: float | None = None,
    max_vector_distance: float | None = None,
) -> dict[str, list[Document]]
```

Retrieves documents from Weaviate using hybrid search.

**Parameters:**

- **query** (<code>str</code>) – The query text.
- **query_embedding** (<code>list\[float\]</code>) – Embedding of the query.
- **filters** (<code>dict\[str, Any\] | None</code>) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on
  the `filter_policy` chosen at retriever initialization. See init method docstring for more
  details.
- **top_k** (<code>int | None</code>) – The maximum number of documents to return.
- **alpha** (<code>float | None</code>) – Blending factor for hybrid retrieval in Weaviate. Must be in the range `[0.0, 1.0]`.

  Weaviate hybrid search combines keyword (BM25) and vector scores into a single ranking. `alpha` controls
  how much each part contributes to the final score:

  - `alpha = 0.0`: only keyword (BM25) scoring is used.
  - `alpha = 1.0`: only vector similarity scoring is used.
  - Values in between blend the two; higher values favor the vector score, lower values favor BM25.

  If `None`, the Weaviate server default is used.

  See the official Weaviate docs on Hybrid Search parameters for more details:

  - [Hybrid search parameters](https://weaviate.io/developers/weaviate/search/hybrid#parameters)
  - [Hybrid Search](https://docs.weaviate.io/weaviate/concepts/search/hybrid-search)
- **max_vector_distance** (<code>float | None</code>) – Optional threshold that restricts the vector part of the hybrid search to candidates within a maximum
  vector distance. Candidates with a distance larger than this threshold are excluded from the vector portion
  before blending.

  Use this to prune low-quality vector matches while still benefiting from keyword recall. Leave `None` to
  use Weaviate's default behavior without an explicit cutoff.

  See the official Weaviate docs on Hybrid Search parameters for more details:

  - [Hybrid search parameters](https://weaviate.io/developers/weaviate/search/hybrid#parameters)
  - [Hybrid Search](https://docs.weaviate.io/weaviate/concepts/search/hybrid-search)

**Returns:**

- <code>dict\[str, list\[Document\]\]</code> – A dictionary with the following keys:
  - `documents`: List of documents returned by the search engine.

#### run_async

```python
run_async(
    query: str,
    query_embedding: list[float],
    filters: dict[str, Any] | None = None,
    top_k: int | None = None,
    alpha: float | None = None,
    max_vector_distance: float | None = None,
) -> dict[str, list[Document]]
```

Asynchronously retrieves documents from Weaviate using hybrid search.

**Parameters:**

- **query** (<code>str</code>) – The query text.
- **query_embedding** (<code>list\[float\]</code>) – Embedding of the query.
- **filters** (<code>dict\[str, Any\] | None</code>) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on
  the `filter_policy` chosen at retriever initialization. See init method docstring for more
  details.
- **top_k** (<code>int | None</code>) – The maximum number of documents to return.
- **alpha** (<code>float | None</code>) – Blending factor for hybrid retrieval in Weaviate. Must be in the range `[0.0, 1.0]`.

  Weaviate hybrid search combines keyword (BM25) and vector scores into a single ranking. `alpha` controls
  how much each part contributes to the final score:

  - `alpha = 0.0`: only keyword (BM25) scoring is used.
  - `alpha = 1.0`: only vector similarity scoring is used.
  - Values in between blend the two; higher values favor the vector score, lower values favor BM25.

  If `None`, the Weaviate server default is used.

  See the official Weaviate docs on Hybrid Search parameters for more details:

  - [Hybrid search parameters](https://weaviate.io/developers/weaviate/search/hybrid#parameters)
  - [Hybrid Search](https://docs.weaviate.io/weaviate/concepts/search/hybrid-search)
- **max_vector_distance** (<code>float | None</code>) – Optional threshold that restricts the vector part of the hybrid search to candidates within a maximum
  vector distance. Candidates with a distance larger than this threshold are excluded from the vector portion
  before blending.

  Use this to prune low-quality vector matches while still benefiting from keyword recall. Leave `None` to
  use Weaviate's default behavior without an explicit cutoff.

  See the official Weaviate docs on Hybrid Search parameters for more details:

  - [Hybrid search parameters](https://weaviate.io/developers/weaviate/search/hybrid#parameters)
  - [Hybrid Search](https://docs.weaviate.io/weaviate/concepts/search/hybrid-search)

**Returns:**

- <code>dict\[str, list\[Document\]\]</code> – A dictionary with the following keys:
  - `documents`: List of documents returned by the search engine.

## haystack_integrations.document_stores.weaviate.auth

### SupportedAuthTypes

Bases: <code>Enum</code>

Supported auth credentials for WeaviateDocumentStore.

### AuthCredentials

Bases: <code>ABC</code>

Base class for all auth credentials supported by WeaviateDocumentStore.
Can be used to deserialize any of the supported auth credentials from a dictionary.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Converts the object to a dictionary representation for serialization.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> AuthCredentials
```

Converts a dictionary representation to an auth credentials object.

#### resolve_value

```python
resolve_value()
```

Resolves all the secrets in the auth credentials object and returns the corresponding Weaviate object.
All subclasses must implement this method.

### AuthApiKey

Bases: <code>AuthCredentials</code>

AuthCredentials for API key authentication.
By default, it loads `api_key` from the environment variable `WEAVIATE_API_KEY`.

### AuthBearerToken

Bases: <code>AuthCredentials</code>

AuthCredentials for Bearer token authentication.
By default, it loads `access_token` from the environment variable `WEAVIATE_ACCESS_TOKEN`
and `refresh_token` from the environment variable `WEAVIATE_REFRESH_TOKEN`.
The `WEAVIATE_REFRESH_TOKEN` environment variable is optional.

### AuthClientCredentials

Bases: <code>AuthCredentials</code>

AuthCredentials for client credentials authentication.
By default, it loads `client_secret` from the environment variable `WEAVIATE_CLIENT_SECRET` and
`scope` from the environment variable `WEAVIATE_SCOPE`.
The `WEAVIATE_SCOPE` environment variable is optional. If set, it can be either a single scope or a
space-separated list of scopes, for example `"scope1"` or `"scope1 scope2"`.

### AuthClientPassword

Bases: <code>AuthCredentials</code>

AuthCredentials for username and password authentication.
By default, it loads `username` from the environment variable `WEAVIATE_USERNAME`,
`password` from the environment variable `WEAVIATE_PASSWORD`, and
`scope` from the environment variable `WEAVIATE_SCOPE`.
The `WEAVIATE_SCOPE` environment variable is optional. If set, it can be either a single scope or a
space-separated list of scopes, for example `"scope1"` or `"scope1 scope2"`.

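Since `WEAVIATE_SCOPE` may hold one scope or several separated by spaces, a value read from the environment has to be split before it can be treated as a list. The snippet below is a minimal plain-Python sketch of that step. `parse_scope` is a hypothetical helper, and the integration may process the variable differently.

```python
import os
from typing import Optional


def parse_scope(raw: Optional[str]) -> Optional[list[str]]:
    """Hypothetical helper: split a space-separated scope string into a list."""
    if not raw:
        # The variable is optional, so an unset or empty value yields no scopes.
        return None
    return raw.split()


os.environ["WEAVIATE_SCOPE"] = "scope1 scope2"

scopes = parse_scope(os.environ.get("WEAVIATE_SCOPE"))
single = parse_scope("scope1")
missing = parse_scope(None)
```
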
## haystack_integrations.document_stores.weaviate.document_store

### WeaviateDocumentStore

A document store backed by Weaviate that you can use with Weaviate Cloud Services or self-hosted instances.

Usage example with Weaviate Cloud Services:

```python
import os
from haystack_integrations.document_stores.weaviate.auth import AuthApiKey
from haystack_integrations.document_stores.weaviate.document_store import (
    WeaviateDocumentStore,
)

os.environ["WEAVIATE_API_KEY"] = "MY_API_KEY"

document_store = WeaviateDocumentStore(
    url="rAnD0mD1g1t5.something.weaviate.cloud",
    auth_client_secret=AuthApiKey(),
)
```

Usage example with self-hosted Weaviate:

```python
from haystack_integrations.document_stores.weaviate.document_store import (
    WeaviateDocumentStore,
)

document_store = WeaviateDocumentStore(url="http://localhost:8080")
```

#### __init__

```python
__init__(
    *,
    url: str | None = None,
    collection_settings: dict[str, Any] | None = None,
    auth_client_secret: AuthCredentials | None = None,
    additional_headers: dict | None = None,
    embedded_options: EmbeddedOptions | None = None,
    additional_config: AdditionalConfig | None = None,
    grpc_port: int = 50051,
    grpc_secure: bool = False
) -> None
```

Creates a new instance of WeaviateDocumentStore and connects to the Weaviate instance.

**Parameters:**

- **url** (<code>str | None</code>) – The URL of the Weaviate instance.
- **collection_settings** (<code>dict\[str, Any\] | None</code>) – The collection settings to use. If `None`, it will use a collection named `default` with the following
  properties:
  - \_original_id: text
  - content: text
  - blob_data: blob
  - blob_mime_type: text
  - score: number

  The Document `meta` fields are omitted in the default collection settings as we can't make assumptions
  on the structure of the meta field.
  We strongly recommend creating a custom collection with the correct meta properties
  for your use case.
  Another option is relying on the automatic schema generation, but that's not recommended for
  production use.
  See the official [Weaviate documentation](https://weaviate.io/developers/weaviate/manage-data/collections)
  for more information on collections and their properties.
- **auth_client_secret** (<code>AuthCredentials | None</code>) – Authentication credentials. Can be one of the following types depending on the authentication mode:
  - `AuthBearerToken` to use existing access and (optionally, but recommended) refresh tokens
  - `AuthClientPassword` to use username and password for the OIDC Resource Owner Password flow
  - `AuthClientCredentials` to use a client secret for the OIDC client credentials flow
  - `AuthApiKey` to use an API key
- **additional_headers** (<code>dict | None</code>) – Additional headers to include in the requests. Can be used to set OpenAI/HuggingFace keys.
  OpenAI/HuggingFace keys look like this:

  ```
  {"X-OpenAI-Api-Key": "<THE-KEY>"}, {"X-HuggingFace-Api-Key": "<THE-KEY>"}
  ```

- **embedded_options** (<code>EmbeddedOptions | None</code>) – If set, create an embedded Weaviate cluster inside the client. For a full list of options see
  `weaviate.embedded.EmbeddedOptions`.
- **additional_config** (<code>AdditionalConfig | None</code>) – Additional and advanced configuration options for Weaviate.
- **grpc_port** (<code>int</code>) – The port to use for the gRPC connection.
- **grpc_secure** (<code>bool</code>) – Whether to use a secure channel for the underlying gRPC API.

#### close

```python
close() -> None
```

Close the synchronous Weaviate client connection.

#### close_async

```python
close_async() -> None
```

Close the asynchronous Weaviate client connection.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> WeaviateDocumentStore
```

Deserializes the component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary to deserialize from.

**Returns:**

- <code>WeaviateDocumentStore</code> – The deserialized component.

#### count_documents

```python
count_documents() -> int
```

Returns the number of documents present in the DocumentStore.

#### count_documents_async

```python
count_documents_async() -> int
```

Asynchronously returns the number of documents present in the DocumentStore.

#### count_documents_by_filter

```python
count_documents_by_filter(filters: dict[str, Any]) -> int
```

Returns the number of documents that match the provided filters.

**Parameters:**

- **filters** (<code>dict\[str, Any\]</code>) – The filters to apply to count documents.
  For filter syntax, see
  [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering).

**Returns:**

- <code>int</code> – The number of documents that match the filters.

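As an illustration of the filter syntax referenced above, the dictionaries below show a single comparison and a logical combination of two comparisons. The field names and values are made up for this example; only the general shape follows the Haystack metadata filtering format.

```python
# A single comparison on a metadata field (the field name is illustrative).
category_filter = {"field": "meta.category", "operator": "==", "value": "news"}

# A logical combination: both conditions must match.
combined_filter = {
    "operator": "AND",
    "conditions": [
        category_filter,
        {"field": "meta.year", "operator": ">=", "value": 2020},
    ],
}
```

A dictionary such as `combined_filter` could then be passed as the `filters` argument of `count_documents_by_filter`.
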
#### count_documents_by_filter_async

```python
count_documents_by_filter_async(filters: dict[str, Any]) -> int
```

Asynchronously returns the number of documents that match the provided filters.

**Parameters:**

- **filters** (<code>dict\[str, Any\]</code>) – The filters to apply to count documents.
  For filter syntax, see
  [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering).

**Returns:**

- <code>int</code> – The number of documents that match the filters.

#### get_metadata_fields_info

```python
get_metadata_fields_info() -> dict[str, dict[str, str]]
```

Returns metadata field names and their types, excluding special fields.

Special fields (content, blob_data, blob_mime_type, \_original_id, score) are excluded
as they are not user metadata fields.

**Returns:**

- <code>dict\[str, dict\[str, str\]\]</code> – A dictionary where keys are field names and values are dictionaries
  containing type information, e.g.:

```python
{
    'number': {'type': 'int'},
    'date': {'type': 'date'},
    'category': {'type': 'text'},
    'status': {'type': 'text'}
}
```

#### get_metadata_fields_info_async

```python
get_metadata_fields_info_async() -> dict[str, dict[str, str]]
```

Asynchronously returns metadata field names and their types, excluding special fields.

Special fields (content, blob_data, blob_mime_type, \_original_id, score) are excluded
as they are not user metadata fields.

**Returns:**

- <code>dict\[str, dict\[str, str\]\]</code> – A dictionary where keys are field names and values are dictionaries
  containing type information, e.g.:

```python
{
    'number': {'type': 'int'},
    'date': {'type': 'date'},
    'category': {'type': 'text'},
    'status': {'type': 'text'}
}
```

#### get_metadata_field_min_max

```python
get_metadata_field_min_max(metadata_field: str) -> dict[str, Any]
```

Returns the minimum and maximum values for a numeric or date metadata field.

**Parameters:**

- **metadata_field** (<code>str</code>) – The metadata field name to get min/max for.
  Can be prefixed with 'meta.' (e.g., 'meta.year' or 'year').

**Returns:**

- <code>dict\[str, Any\]</code> – A dictionary with 'min' and 'max' keys containing the respective values.

**Raises:**

- <code>ValueError</code> – If the field is not found or doesn't support min/max operations.

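To make the result shape concrete, here is a plain-Python sketch that builds the same kind of `min`/`max` dictionary from an in-memory list of metadata dictionaries. It does not call Weaviate. `min_max_for_field` and the sample data are hypothetical, and the optional `meta.` prefix is simply stripped before the lookup.

```python
from typing import Any


def min_max_for_field(metas: list[dict[str, Any]], metadata_field: str) -> dict[str, Any]:
    """Hypothetical in-memory counterpart that returns {'min': ..., 'max': ...}."""
    # Accept both 'meta.year' and 'year'.
    field = metadata_field.removeprefix("meta.")
    values = [meta[field] for meta in metas if field in meta]
    if not values:
        raise ValueError(f"Field '{field}' not found")
    return {"min": min(values), "max": max(values)}


# Made-up metadata for three documents.
metas = [{"year": 2019}, {"year": 2023}, {"year": 2021}]

result = min_max_for_field(metas, "meta.year")
same_result = min_max_for_field(metas, "year")
```
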
#### get_metadata_field_min_max_async

```python
get_metadata_field_min_max_async(metadata_field: str) -> dict[str, Any]
```

Asynchronously returns the minimum and maximum values for a numeric or date metadata field.

**Parameters:**

- **metadata_field** (<code>str</code>) – The metadata field name to get min/max for.
  Can be prefixed with 'meta.' (e.g., 'meta.year' or 'year').

**Returns:**

- <code>dict\[str, Any\]</code> – A dictionary with 'min' and 'max' keys containing the respective values.

**Raises:**

- <code>ValueError</code> – If the field is not found or doesn't support min/max operations.

#### count_unique_metadata_by_filter

```python
count_unique_metadata_by_filter(
    filters: dict[str, Any], metadata_fields: list[str]
) -> dict[str, int]
```

Returns the count of unique values for each specified metadata field.

**Parameters:**

- **filters** (<code>dict\[str, Any\]</code>) – The filters to apply when counting unique values.
  For filter syntax, see
  [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering).
- **metadata_fields** (<code>list\[str\]</code>) – List of metadata field names to count unique values for.
  Field names can be prefixed with 'meta.' (e.g., 'meta.category' or 'category').

**Returns:**

- <code>dict\[str, int\]</code> – A dictionary mapping field names to counts of unique values.

**Raises:**

- <code>ValueError</code> – If any of the requested fields don't exist in the collection schema.

 833  #### count_unique_metadata_by_filter_async
 834  
 835  ```python
 836  count_unique_metadata_by_filter_async(
 837      filters: dict[str, Any], metadata_fields: list[str]
 838  ) -> dict[str, int]
 839  ```
 840  
 841  Asynchronously returns the count of unique values for each specified metadata field.
 842  
 843  **Parameters:**
 844  
 845  - **filters** (<code>dict\[str, Any\]</code>) – The filters to apply when counting unique values.
 846    For filter syntax, see
 847    [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering).
 848  - **metadata_fields** (<code>list\[str\]</code>) – List of metadata field names to count unique values for.
 849    Field names can be prefixed with 'meta.' (e.g., 'meta.category' or 'category').
 850  
 851  **Returns:**
 852  
 853  - <code>dict\[str, int\]</code> – A dictionary mapping field names to counts of unique values.
 854  
 855  **Raises:**
 856  
 857  - <code>ValueError</code> – If any of the requested fields don't exist in the collection schema.
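
One possible use of the counts is to decide which metadata fields are small enough to offer as facets. The sketch below is illustrative only: `low_cardinality_fields` is a hypothetical helper, the `counts` dictionary is made-up data standing in for a real result, and whether the returned keys carry the 'meta.' prefix is an assumption to verify against your deployment.

```python
from typing import Any

# Filter in Haystack's metadata filtering syntax: documents with meta.year >= 2020.
filters: dict[str, Any] = {"field": "meta.year", "operator": ">=", "value": 2020}
# Field names may be given with or without the 'meta.' prefix.
metadata_fields = ["meta.category", "author"]

# With a connected store the call would look like:
#   counts = document_store.count_unique_metadata_by_filter(filters, metadata_fields)
# Hypothetical result (assumed key format) so the snippet runs without Weaviate.
counts: dict[str, int] = {"category": 4, "author": 1250}


def low_cardinality_fields(counts: dict[str, int], max_values: int = 50) -> list[str]:
    """Return the fields whose number of unique values is at most max_values."""
    return sorted(field for field, n in counts.items() if n <= max_values)


facets = low_cardinality_fields(counts)
print(facets)
```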
 858  
 859  #### get_metadata_field_unique_values
 860  
 861  ```python
 862  get_metadata_field_unique_values(
 863      metadata_field: str,
 864      search_term: str | None = None,
 865      from_: int = 0,
 866      size: int = 10000,
 867  ) -> tuple[list[str], int]
 868  ```
 869  
 870  Returns unique values for a metadata field with pagination support.
 871  
 872  **Parameters:**
 873  
 874  - **metadata_field** (<code>str</code>) – The metadata field name to get unique values for.
 875    Can be prefixed with 'meta.' (e.g., 'meta.category' or 'category').
 876  - **search_term** (<code>str | None</code>) – Optional term to filter documents by content before
 877    extracting unique values. If provided, only documents whose content
 878    contains this term will be considered.
 879    Note: Uses substring matching (case-sensitive, no stemming).
 880  - **from\_** (<code>int</code>) – The starting offset for pagination (0-indexed). Defaults to 0.
 881  - **size** (<code>int</code>) – The maximum number of unique values to return. Defaults to 10000.
 882  
 883  **Returns:**
 884  
 885  - <code>tuple\[list\[str\], int\]</code> – A tuple of (list of unique values, total count of unique values).
 886  
 887  **Raises:**
 888  
 889  - <code>ValueError</code> – If the field is not found in the collection schema.
 890  
 891  #### get_metadata_field_unique_values_async
 892  
 893  ```python
 894  get_metadata_field_unique_values_async(
 895      metadata_field: str,
 896      search_term: str | None = None,
 897      from_: int = 0,
 898      size: int = 10000,
 899  ) -> tuple[list[str], int]
 900  ```
 901  
 902  Asynchronously returns unique values for a metadata field with pagination support.
 903  
 904  **Parameters:**
 905  
 906  - **metadata_field** (<code>str</code>) – The metadata field name to get unique values for.
 907    Can be prefixed with 'meta.' (e.g., 'meta.category' or 'category').
 908  - **search_term** (<code>str | None</code>) – Optional term to filter documents by content before
 909    extracting unique values. If provided, only documents whose content
 910    contains this term will be considered.
 911    Note: Uses substring matching (case-sensitive, no stemming).
 912  - **from\_** (<code>int</code>) – The starting offset for pagination (0-indexed). Defaults to 0.
 913  - **size** (<code>int</code>) – The maximum number of unique values to return. Defaults to 10000.
 914  
 915  **Returns:**
 916  
 917  - <code>tuple\[list\[str\], int\]</code> – A tuple of (list of unique values, total count of unique values).
 918  
 919  **Raises:**
 920  
 921  - <code>ValueError</code> – If the field is not found in the collection schema.
 922  
 923  #### filter_documents
 924  
 925  ```python
 926  filter_documents(filters: dict[str, Any] | None = None) -> list[Document]
 927  ```
 928  
 929  Returns the documents that match the filters provided.
 930  
 931  For a detailed specification of the filters, refer to the
 932  DocumentStore.filter_documents() protocol documentation.
 933  
 934  Note: The `contains` filter operator is case-sensitive (substring
 935  matching). For case-insensitive matching, normalize the value before
 936  building the filter.
 937  
 938  **Parameters:**
 939  
 940  - **filters** (<code>dict\[str, Any\] | None</code>) – The filters to apply to the document list.
 941  
 942  **Returns:**
 943  
 944  - <code>list\[Document\]</code> – A list of Documents that match the given filters.
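
The note on case sensitivity can be handled by normalizing the search value before building the filter. The sketch below assumes the stored values were also lowercased at indexing time; `contains_filter` is a hypothetical helper and `meta.title_lower` is a hypothetical field holding such a lowercased copy.

```python
from typing import Any


def contains_filter(field: str, value: str, *, lowercase: bool = True) -> dict[str, Any]:
    """Build a `contains` filter, optionally lowercasing the search value."""
    needle = value.lower() if lowercase else value
    return {"field": field, "operator": "contains", "value": needle}


# "Pizza" is normalized to "pizza" before it reaches the document store.
filters = contains_filter("meta.title_lower", "Pizza")
print(filters)
```

The resulting dictionary can then be passed as `document_store.filter_documents(filters=filters)`.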
 945  
 946  #### filter_documents_async
 947  
 948  ```python
 949  filter_documents_async(filters: dict[str, Any] | None = None) -> list[Document]
 950  ```
 951  
 952  Asynchronously returns the documents that match the filters provided.
 953  
 954  For a detailed specification of the filters, refer to the
 955  DocumentStore.filter_documents() protocol documentation.
 956  
 957  Note: The `contains` filter operator is case-sensitive (substring
 958  matching). For case-insensitive matching, normalize the value before
 959  building the filter.
 960  
 961  **Parameters:**
 962  
 963  - **filters** (<code>dict\[str, Any\] | None</code>) – The filters to apply to the document list.
 964  
 965  **Returns:**
 966  
 967  - <code>list\[Document\]</code> – A list of Documents that match the given filters.
 968  
 969  #### write_documents
 970  
 971  ```python
 972  write_documents(
 973      documents: list[Document], policy: DuplicatePolicy = DuplicatePolicy.NONE
 974  ) -> int
 975  ```
 976  
 977  Writes documents to Weaviate using the specified policy.
We recommend the OVERWRITE policy: it is faster than the other policies for Weaviate because it uses
the batch API.
The batch API cannot be used for the other policies because it does not report whether a document
already exists. Without that information we cannot raise an error under the FAIL policy or skip a
Document under the SKIP policy.
 983  
 984  **Parameters:**
 985  
 986  - **documents** (<code>list\[Document\]</code>) – A list of documents to write into the document store.
 987  - **policy** (<code>DuplicatePolicy</code>) – DuplicatePolicy to apply when a document with the same ID already exists in the document store.
 988  
 989  **Returns:**
 990  
 991  - <code>int</code> – The number of documents written.
 992  
 993  **Raises:**
 994  
 995  - <code>ValueError</code> – When input is not valid.
- <code>DuplicateDocumentError</code> – When a document with the same ID already exists and the FAIL policy is used.
- <code>DocumentStoreError</code> – When one or more documents fail to be written in a batch.
 998  
 999  #### write_documents_async
1000  
1001  ```python
1002  write_documents_async(
1003      documents: list[Document], policy: DuplicatePolicy = DuplicatePolicy.NONE
1004  ) -> int
1005  ```
1006  
1007  Asynchronously writes documents to Weaviate using the specified policy.
We recommend the OVERWRITE policy: it is faster than the other policies for Weaviate because it uses
the batch API.
The batch API cannot be used for the other policies because it does not report whether a document
already exists. Without that information we cannot raise an error under the FAIL policy or skip a
Document under the SKIP policy.
1013  
1014  **Parameters:**
1015  
1016  - **documents** (<code>list\[Document\]</code>) – A list of documents to write into the document store.
1017  - **policy** (<code>DuplicatePolicy</code>) – DuplicatePolicy to apply when a document with the same ID already exists in the document store.
1018  
1019  **Returns:**
1020  
1021  - <code>int</code> – The number of documents written.
1022  
1023  **Raises:**
1024  
1025  - <code>ValueError</code> – When input is not valid.
- <code>DuplicateDocumentError</code> – When a document with the same ID already exists and the FAIL policy is used.
- <code>DocumentStoreError</code> – When one or more documents fail to be written in a batch.
1028  
1029  #### delete_documents
1030  
1031  ```python
1032  delete_documents(document_ids: list[str]) -> None
1033  ```
1034  
1035  Deletes all documents with matching document_ids from the DocumentStore.
1036  
1037  **Parameters:**
1038  
- **document_ids** (<code>list\[str\]</code>) – The IDs of the documents to delete.
1040  
1041  #### delete_documents_async
1042  
1043  ```python
1044  delete_documents_async(document_ids: list[str]) -> None
1045  ```
1046  
1047  Asynchronously deletes all documents with matching document_ids from the DocumentStore.
1048  
1049  **Parameters:**
1050  
- **document_ids** (<code>list\[str\]</code>) – The IDs of the documents to delete.
1052  
1053  #### delete_all_documents
1054  
1055  ```python
1056  delete_all_documents(
1057      *, recreate_index: bool = False, batch_size: int = 1000
1058  ) -> None
1059  ```
1060  
1061  Deletes all documents in a collection.
1062  
If recreate_index is False, the collection is kept and its documents are deleted in batches.
If recreate_index is True, the collection is dropped and recreated with the same configuration.
Dropping and recreating is the recommended option for performance.
1066  
1067  **Parameters:**
1068  
- **recreate_index** (<code>bool</code>) – Whether to drop and recreate the collection instead of deleting its documents in batches (recommended for performance).
- **batch_size** (<code>int</code>) – Only relevant if recreate_index is False. The number of documents deleted per batch.
  This value must be less than or equal to the `QUERY_MAXIMUM_RESULTS` value
  configured for the Weaviate deployment (default is 10000).
1073    Reference: https://docs.weaviate.io/weaviate/manage-objects/delete#delete-all-objects
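
The `batch_size` constraint can be enforced on the client side before calling the method. This is a minimal sketch: `clamp_batch_size` is a hypothetical helper, and the limit must be supplied by the caller because `QUERY_MAXIMUM_RESULTS` is configured on the Weaviate deployment, not in this integration.

```python
def clamp_batch_size(requested: int, query_maximum_results: int = 10_000) -> int:
    """Return a batch size that does not exceed the deployment's query limit."""
    if requested < 1:
        raise ValueError("batch_size must be a positive integer")
    # 10_000 mirrors the default limit mentioned in the parameter description.
    return min(requested, query_maximum_results)


batch_size = clamp_batch_size(50_000)
print(batch_size)
```

The result could then be passed as `document_store.delete_all_documents(batch_size=batch_size)`.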
1074  
1075  #### delete_all_documents_async
1076  
1077  ```python
1078  delete_all_documents_async(
1079      *, recreate_index: bool = False, batch_size: int = 1000
1080  ) -> None

Asynchronously deletes all documents in a collection.

If recreate_index is False, the collection is kept and its documents are deleted in batches.
If recreate_index is True, the collection is dropped and recreated with the same configuration.
Dropping and recreating is the recommended option for performance.
1088  
1089  **Parameters:**
1090  
- **recreate_index** (<code>bool</code>) – Whether to drop and recreate the collection instead of deleting its documents in batches (recommended for performance).
- **batch_size** (<code>int</code>) – Only relevant if recreate_index is False. The number of documents deleted per batch.
  This value must be less than or equal to the `QUERY_MAXIMUM_RESULTS` value
  configured for the Weaviate deployment (default is 10000).
1095    Reference: https://docs.weaviate.io/weaviate/manage-objects/delete#delete-all-objects
1096  
1097  #### delete_by_filter
1098  
1099  ```python
1100  delete_by_filter(filters: dict[str, Any]) -> int
1101  ```
1102  
1103  Deletes all documents that match the provided filters.
1104  
1105  **Parameters:**
1106  
1107  - **filters** (<code>dict\[str, Any\]</code>) – The filters to apply to select documents for deletion.
1108    For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering)
1109  
1110  **Returns:**
1111  
1112  - <code>int</code> – The number of documents deleted.
1113  
1114  #### delete_by_filter_async
1115  
1116  ```python
1117  delete_by_filter_async(filters: dict[str, Any]) -> int
1118  ```
1119  
1120  Asynchronously deletes all documents that match the provided filters.
1121  
1122  **Parameters:**
1123  
1124  - **filters** (<code>dict\[str, Any\]</code>) – The filters to apply to select documents for deletion.
1125    For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering)
1126  
1127  **Returns:**
1128  
1129  - <code>int</code> – The number of documents deleted.
1130  
1131  #### update_by_filter
1132  
1133  ```python
1134  update_by_filter(filters: dict[str, Any], meta: dict[str, Any]) -> int
1135  ```
1136  
1137  Updates the metadata of all documents that match the provided filters.
1138  
1139  **Parameters:**
1140  
1141  - **filters** (<code>dict\[str, Any\]</code>) – The filters to apply to select documents for updating.
1142    For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering)
1143  - **meta** (<code>dict\[str, Any\]</code>) – The metadata fields to update. These will be merged with existing metadata.
1144  
1145  **Returns:**
1146  
1147  - <code>int</code> – The number of documents updated.
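
The merge behavior described for `meta` can be pictured with plain dictionaries. The snippet assumes a shallow, key-by-key merge in which supplied keys overwrite existing ones and all other keys are kept. That reading of "merged" is an assumption, not something this page confirms, and the data is made up.

```python
# Metadata already stored on a matching document (made-up example data).
existing_meta = {"category": "news", "year": 2023, "reviewed": False}

# Value that would be passed as the `meta` argument of update_by_filter.
meta_update = {"reviewed": True, "reviewer": "alice"}

# Assumed semantics: keys in the update win, untouched keys are preserved.
merged = {**existing_meta, **meta_update}
print(merged)
```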
1148  
1149  #### update_by_filter_async
1150  
1151  ```python
1152  update_by_filter_async(filters: dict[str, Any], meta: dict[str, Any]) -> int
1153  ```
1154  
1155  Asynchronously updates the metadata of all documents that match the provided filters.
1156  
1157  **Parameters:**
1158  
1159  - **filters** (<code>dict\[str, Any\]</code>) – The filters to apply to select documents for updating.
1160    For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering)
1161  - **meta** (<code>dict\[str, Any\]</code>) – The metadata fields to update. These will be merged with existing metadata.
1162  
1163  **Returns:**
1164  
1165  - <code>int</code> – The number of documents updated.
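
The `_async` methods are meant to be awaited from an event loop. The sketch below assumes they are coroutines and uses `FakeStore`, a hypothetical stand-in that only mimics the documented signatures and return types, so that it runs without a Weaviate instance. The field name and values are made up.

```python
import asyncio
from typing import Any


class FakeStore:
    """Hypothetical stand-in for a document store with async methods."""

    async def delete_by_filter_async(self, filters: dict[str, Any]) -> int:
        await asyncio.sleep(0)  # yield to the event loop like a network call would
        return 2  # pretend two documents were deleted

    async def update_by_filter_async(self, filters: dict[str, Any], meta: dict[str, Any]) -> int:
        await asyncio.sleep(0)
        return 3  # pretend three documents were updated


async def clean_up(store: Any) -> tuple[int, int]:
    stale = {"field": "meta.status", "operator": "==", "value": "stale"}
    draft = {"field": "meta.status", "operator": "==", "value": "draft"}
    # Await the calls one after the other; each returns a document count.
    deleted = await store.delete_by_filter_async(stale)
    updated = await store.update_by_filter_async(draft, {"status": "published"})
    return deleted, updated


result = asyncio.run(clean_up(FakeStore()))
print(result)
```

Replacing `FakeStore()` with a configured `WeaviateDocumentStore` would leave `clean_up` unchanged, provided the assumption about coroutines holds.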