---
title: "Pgvector"
id: integrations-pgvector
description: "Pgvector integration for Haystack"
slug: "/integrations-pgvector"
---

## haystack_integrations.components.retrievers.pgvector.embedding_retriever

### PgvectorEmbeddingRetriever

Retrieves documents from the `PgvectorDocumentStore`, based on their dense embeddings.

Example usage:

```python
from haystack.document_stores import DuplicatePolicy
from haystack import Document, Pipeline
from haystack.components.embedders import SentenceTransformersTextEmbedder, SentenceTransformersDocumentEmbedder

from haystack_integrations.document_stores.pgvector import PgvectorDocumentStore
from haystack_integrations.components.retrievers.pgvector import PgvectorEmbeddingRetriever

# Set an environment variable `PG_CONN_STR` with the connection string to your PostgreSQL database.
# e.g., "postgresql://USER:PASSWORD@HOST:PORT/DB_NAME"

document_store = PgvectorDocumentStore(
    embedding_dimension=768,
    vector_function="cosine_similarity",
    recreate_table=True,
)

documents = [Document(content="There are over 7,000 languages spoken around the world today."),
             Document(content="Elephants have been observed to behave in a way that indicates..."),
             Document(content="In certain places, you can witness the phenomenon of bioluminescent waves.")]

document_embedder = SentenceTransformersDocumentEmbedder()
document_embedder.warm_up()
documents_with_embeddings = document_embedder.run(documents)

document_store.write_documents(documents_with_embeddings.get("documents"), policy=DuplicatePolicy.OVERWRITE)

query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", SentenceTransformersTextEmbedder())
query_pipeline.add_component("retriever", PgvectorEmbeddingRetriever(document_store=document_store))
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

query = "How many languages are there?"

res = query_pipeline.run({"text_embedder": {"text": query}})

assert res['retriever']['documents'][0].content == "There are over 7,000 languages spoken around the world today."
```

#### __init__

```python
__init__(
    *,
    document_store: PgvectorDocumentStore,
    filters: dict[str, Any] | None = None,
    top_k: int = 10,
    vector_function: (
        Literal["cosine_similarity", "inner_product", "l2_distance"] | None
    ) = None,
    filter_policy: str | FilterPolicy = FilterPolicy.REPLACE
)
```

**Parameters:**

- **document_store** (<code>PgvectorDocumentStore</code>) – An instance of `PgvectorDocumentStore`.
- **filters** (<code>dict\[str, Any\] | None</code>) – Filters applied to the retrieved Documents.
- **top_k** (<code>int</code>) – Maximum number of Documents to return.
- **vector_function** (<code>Literal['cosine_similarity', 'inner_product', 'l2_distance'] | None</code>) – The similarity function to use when searching for similar embeddings.
  Defaults to the one set in the `document_store` instance.
  `"cosine_similarity"` and `"inner_product"` are similarity functions and
  higher scores indicate greater similarity between the documents.
  `"l2_distance"` returns the straight-line distance between vectors,
  and the most similar documents are the ones with the smallest score.
  **Important**: if the document store is using the `"hnsw"` search strategy, the vector function
  should match the one utilized during index creation to take advantage of the index.
- **filter_policy** (<code>str | FilterPolicy</code>) – Policy to determine how filters are applied.

**Raises:**

- <code>ValueError</code> – If `document_store` is not an instance of `PgvectorDocumentStore` or if `vector_function`
  is not one of the valid options.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.
**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> PgvectorEmbeddingRetriever
```

Deserializes the component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from.

**Returns:**

- <code>PgvectorEmbeddingRetriever</code> – Deserialized component.

#### run

```python
run(
    query_embedding: list[float],
    filters: dict[str, Any] | None = None,
    top_k: int | None = None,
    vector_function: (
        Literal["cosine_similarity", "inner_product", "l2_distance"] | None
    ) = None,
) -> dict[str, list[Document]]
```

Retrieve documents from the `PgvectorDocumentStore`, based on their embeddings.

**Parameters:**

- **query_embedding** (<code>list\[float\]</code>) – Embedding of the query.
- **filters** (<code>dict\[str, Any\] | None</code>) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on
  the `filter_policy` chosen at retriever initialization. See init method docstring for more
  details.
- **top_k** (<code>int | None</code>) – Maximum number of Documents to return.
- **vector_function** (<code>Literal['cosine_similarity', 'inner_product', 'l2_distance'] | None</code>) – The similarity function to use when searching for similar embeddings.

**Returns:**

- <code>dict\[str, list\[Document\]\]</code> – A dictionary with the following keys:
    - `documents`: List of `Document`s that are similar to `query_embedding`.
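The three `vector_function` options differ in how scores are ordered: `cosine_similarity` and `inner_product` are similarities (higher is better), while `l2_distance` is a distance (lower is better). The following pure-Python sketch only illustrates the three measures; the actual computation happens inside PostgreSQL via pgvector:

```python
import math

def cosine_similarity(a, b):
    # dot(a, b) / (|a| * |b|): higher means more similar
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def inner_product(a, b):
    # raw dot product: higher means more similar
    return sum(x * y for x, y in zip(a, b))

def l2_distance(a, b):
    # straight-line distance: lower means more similar
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

query = [1.0, 0.0]
close, far = [0.9, 0.1], [-1.0, 0.2]

assert cosine_similarity(query, close) > cosine_similarity(query, far)
assert inner_product(query, close) > inner_product(query, far)
assert l2_distance(query, close) < l2_distance(query, far)  # note the flipped comparison
```

This flipped ordering for `"l2_distance"` is why the most similar documents are the ones with the smallest score, as noted in the parameter description above.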
#### run_async

```python
run_async(
    query_embedding: list[float],
    filters: dict[str, Any] | None = None,
    top_k: int | None = None,
    vector_function: (
        Literal["cosine_similarity", "inner_product", "l2_distance"] | None
    ) = None,
) -> dict[str, list[Document]]
```

Asynchronously retrieve documents from the `PgvectorDocumentStore`, based on their embeddings.

**Parameters:**

- **query_embedding** (<code>list\[float\]</code>) – Embedding of the query.
- **filters** (<code>dict\[str, Any\] | None</code>) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on
  the `filter_policy` chosen at retriever initialization. See init method docstring for more
  details.
- **top_k** (<code>int | None</code>) – Maximum number of Documents to return.
- **vector_function** (<code>Literal['cosine_similarity', 'inner_product', 'l2_distance'] | None</code>) – The similarity function to use when searching for similar embeddings.

**Returns:**

- <code>dict\[str, list\[Document\]\]</code> – A dictionary with the following keys:
    - `documents`: List of `Document`s that are similar to `query_embedding`.

## haystack_integrations.components.retrievers.pgvector.keyword_retriever

### PgvectorKeywordRetriever

Retrieve documents from the `PgvectorDocumentStore`, based on keywords.

To rank the documents, the `ts_rank_cd` function of PostgreSQL is used.
It considers how often the query terms appear in the document, how close together the terms are in the document,
and how important the part of the document where they occur is.
For more details, see
[Postgres documentation](https://www.postgresql.org/docs/current/textsearch-controls.html#TEXTSEARCH-RANKING).
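As a rough intuition for the two main factors `ts_rank_cd` combines (term frequency and term proximity), here is a toy scorer in pure Python. This only illustrates the idea; it is not PostgreSQL's actual cover-density algorithm, which also weights document regions:

```python
import itertools

def toy_rank(content, query_terms):
    """Toy score: query-term frequency divided by the smallest window covering all terms."""
    tokens = content.lower().split()
    positions = {t: [i for i, tok in enumerate(tokens) if tok == t] for t in query_terms}
    if any(not pos for pos in positions.values()):
        return 0.0  # a document missing any query term does not match at all
    frequency = sum(len(pos) for pos in positions.values())
    # Proximity: the smallest span containing one occurrence of each query term.
    smallest_window = min(
        max(combo) - min(combo) + 1
        for combo in itertools.product(*positions.values())
    )
    return frequency / smallest_window

close = "the world has many languages"
spread = "the world is big and its people speak many languages"
# Terms appearing close together score higher than the same terms far apart.
assert toy_rank(close, ["world", "languages"]) > toy_rank(spread, ["world", "languages"])
assert toy_rank("cats and dogs", ["languages"]) == 0.0
```

In the real retriever, this ranking happens entirely inside PostgreSQL over the `tsvector` built with the configured `language`.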
Usage example:

```python
from haystack.document_stores import DuplicatePolicy
from haystack import Document

from haystack_integrations.document_stores.pgvector import PgvectorDocumentStore
from haystack_integrations.components.retrievers.pgvector import PgvectorKeywordRetriever

# Set an environment variable `PG_CONN_STR` with the connection string to your PostgreSQL database.
# e.g., "postgresql://USER:PASSWORD@HOST:PORT/DB_NAME"

document_store = PgvectorDocumentStore(language="english", recreate_table=True)

documents = [Document(content="There are over 7,000 languages spoken around the world today."),
             Document(content="Elephants have been observed to behave in a way that indicates..."),
             Document(content="In certain places, you can witness the phenomenon of bioluminescent waves.")]

document_store.write_documents(documents, policy=DuplicatePolicy.OVERWRITE)

retriever = PgvectorKeywordRetriever(document_store=document_store)

result = retriever.run(query="languages")

assert result['documents'][0].content == "There are over 7,000 languages spoken around the world today."
```

#### __init__

```python
__init__(
    *,
    document_store: PgvectorDocumentStore,
    filters: dict[str, Any] | None = None,
    top_k: int = 10,
    filter_policy: str | FilterPolicy = FilterPolicy.REPLACE
)
```

**Parameters:**

- **document_store** (<code>PgvectorDocumentStore</code>) – An instance of `PgvectorDocumentStore`.
- **filters** (<code>dict\[str, Any\] | None</code>) – Filters applied to the retrieved Documents.
- **top_k** (<code>int</code>) – Maximum number of Documents to return.
- **filter_policy** (<code>str | FilterPolicy</code>) – Policy to determine how filters are applied.
**Raises:**

- <code>ValueError</code> – If `document_store` is not an instance of `PgvectorDocumentStore`.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> PgvectorKeywordRetriever
```

Deserializes the component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from.

**Returns:**

- <code>PgvectorKeywordRetriever</code> – Deserialized component.

#### run

```python
run(
    query: str, filters: dict[str, Any] | None = None, top_k: int | None = None
) -> dict[str, list[Document]]
```

Retrieve documents from the `PgvectorDocumentStore`, based on keywords.

**Parameters:**

- **query** (<code>str</code>) – String to search in `Document`s' content.
- **filters** (<code>dict\[str, Any\] | None</code>) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on
  the `filter_policy` chosen at retriever initialization. See init method docstring for more
  details.
- **top_k** (<code>int | None</code>) – Maximum number of Documents to return.

**Returns:**

- <code>dict\[str, list\[Document\]\]</code> – A dictionary with the following keys:
    - `documents`: List of `Document`s that match the query.

#### run_async

```python
run_async(
    query: str, filters: dict[str, Any] | None = None, top_k: int | None = None
) -> dict[str, list[Document]]
```

Asynchronously retrieve documents from the `PgvectorDocumentStore`, based on keywords.

**Parameters:**

- **query** (<code>str</code>) – String to search in `Document`s' content.
- **filters** (<code>dict\[str, Any\] | None</code>) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on
  the `filter_policy` chosen at retriever initialization. See init method docstring for more
  details.
- **top_k** (<code>int | None</code>) – Maximum number of Documents to return.

**Returns:**

- <code>dict\[str, list\[Document\]\]</code> – A dictionary with the following keys:
    - `documents`: List of `Document`s that match the query.

## haystack_integrations.document_stores.pgvector.document_store

### PgvectorDocumentStore

A Document Store using PostgreSQL with the [pgvector extension](https://github.com/pgvector/pgvector) installed.

#### __init__

```python
__init__(
    *,
    connection_string: Secret = Secret.from_env_var("PG_CONN_STR"),
    create_extension: bool = True,
    schema_name: str = "public",
    table_name: str = "haystack_documents",
    language: str = "english",
    embedding_dimension: int = 768,
    vector_type: Literal["vector", "halfvec"] = "vector",
    vector_function: Literal[
        "cosine_similarity", "inner_product", "l2_distance"
    ] = "cosine_similarity",
    recreate_table: bool = False,
    search_strategy: Literal[
        "exact_nearest_neighbor", "hnsw"
    ] = "exact_nearest_neighbor",
    hnsw_recreate_index_if_exists: bool = False,
    hnsw_index_creation_kwargs: dict[str, int] | None = None,
    hnsw_index_name: str = "haystack_hnsw_index",
    hnsw_ef_search: int | None = None,
    keyword_index_name: str = "haystack_keyword_index"
)
```

Creates a new PgvectorDocumentStore instance.
It is meant to be connected to a PostgreSQL database with the pgvector extension installed.
A specific table to store Haystack documents will be created if it doesn't exist yet.
**Parameters:**

- **connection_string** (<code>Secret</code>) – The connection string to use to connect to the PostgreSQL database, defined as an
  environment variable. Supported formats:
    - URI, e.g. `PG_CONN_STR="postgresql://USER:PASSWORD@HOST:PORT/DB_NAME"` (use percent-encoding for special
      characters)
    - keyword/value format, e.g. `PG_CONN_STR="host=HOST port=PORT dbname=DBNAME user=USER password=PASSWORD"`

  See [PostgreSQL Documentation](https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-CONNSTRING)
  for more details.
- **create_extension** (<code>bool</code>) – Whether to create the pgvector extension if it doesn't exist.
  Set this to `True` (default) to automatically create the extension if it is missing.
  Creating the extension may require superuser privileges.
  If set to `False`, ensure the extension is already installed; otherwise, an error will be raised.
- **schema_name** (<code>str</code>) – The name of the schema the table is created in. The schema must already exist.
- **table_name** (<code>str</code>) – The name of the table to use to store Haystack documents.
- **language** (<code>str</code>) – The language to be used to parse query and document content in keyword retrieval.
  To see the list of available languages, you can run the following SQL query in your PostgreSQL database:
  `SELECT cfgname FROM pg_ts_config;`.
  More information can be found in this [StackOverflow answer](https://stackoverflow.com/a/39752553).
- **embedding_dimension** (<code>int</code>) – The dimension of the embedding.
- **vector_type** (<code>Literal['vector', 'halfvec']</code>) – The type of vector used for embedding storage.
  `"vector"` is the default.
  `"halfvec"` stores embeddings in half-precision, which is particularly useful for high-dimensional embeddings
  (dimension greater than 2,000 and up to 4,000). Requires pgvector version 0.7.0 or later.
For more 384 information, see the [pgvector documentation](https://github.com/pgvector/pgvector?tab=readme-ov-file). 385 - **vector_function** (<code>Literal['cosine_similarity', 'inner_product', 'l2_distance']</code>) – The similarity function to use when searching for similar embeddings. 386 `"cosine_similarity"` and `"inner_product"` are similarity functions and 387 higher scores indicate greater similarity between the documents. 388 `"l2_distance"` returns the straight-line distance between vectors, 389 and the most similar documents are the ones with the smallest score. 390 **Important**: when using the `"hnsw"` search strategy, an index will be created that depends on the 391 `vector_function` passed here. Make sure subsequent queries will keep using the same 392 vector similarity function in order to take advantage of the index. 393 - **recreate_table** (<code>bool</code>) – Whether to recreate the table if it already exists. 394 - **search_strategy** (<code>Literal['exact_nearest_neighbor', 'hnsw']</code>) – The search strategy to use when searching for similar embeddings. 395 `"exact_nearest_neighbor"` provides perfect recall but can be slow for large numbers of documents. 396 `"hnsw"` is an approximate nearest neighbor search strategy, 397 which trades off some accuracy for speed; it is recommended for large numbers of documents. 398 **Important**: when using the `"hnsw"` search strategy, an index will be created that depends on the 399 `vector_function` passed here. Make sure subsequent queries will keep using the same 400 vector similarity function in order to take advantage of the index. 401 - **hnsw_recreate_index_if_exists** (<code>bool</code>) – Whether to recreate the HNSW index if it already exists. 402 Only used if search_strategy is set to `"hnsw"`. 403 - **hnsw_index_creation_kwargs** (<code>dict\[str, int\] | None</code>) – Additional keyword arguments to pass to the HNSW index creation. 404 Only used if search_strategy is set to `"hnsw"`. 
  You can find the list of valid arguments in the
  [pgvector documentation](https://github.com/pgvector/pgvector?tab=readme-ov-file#hnsw).
- **hnsw_index_name** (<code>str</code>) – Index name for the HNSW index.
- **hnsw_ef_search** (<code>int | None</code>) – The `ef_search` parameter to use at query time. Only used if search_strategy is set to
  `"hnsw"`. You can find more information about this parameter in the
  [pgvector documentation](https://github.com/pgvector/pgvector?tab=readme-ov-file#hnsw).
- **keyword_index_name** (<code>str</code>) – Index name for the Keyword index.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> PgvectorDocumentStore
```

Deserializes the component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from.

**Returns:**

- <code>PgvectorDocumentStore</code> – Deserialized component.

#### delete_table

```python
delete_table()
```

Deletes the table used to store Haystack documents.
The name of the schema (`schema_name`) and the name of the table (`table_name`)
are defined when initializing the `PgvectorDocumentStore`.

#### delete_table_async

```python
delete_table_async()
```

Async method to delete the table used to store Haystack documents.

#### count_documents

```python
count_documents() -> int
```

Returns how many documents are present in the document store.

**Returns:**

- <code>int</code> – Number of documents in the document store.
#### count_documents_async

```python
count_documents_async() -> int
```

Asynchronously returns how many documents are present in the document store.

**Returns:**

- <code>int</code> – Number of documents in the document store.

#### filter_documents

```python
filter_documents(filters: dict[str, Any] | None = None) -> list[Document]
```

Returns the documents that match the filters provided.

For a detailed specification of the filters,
refer to the [documentation](https://docs.haystack.deepset.ai/docs/metadata-filtering)

**Parameters:**

- **filters** (<code>dict\[str, Any\] | None</code>) – The filters to apply to the document list.

**Returns:**

- <code>list\[Document\]</code> – A list of Documents that match the given filters.

**Raises:**

- <code>TypeError</code> – If `filters` is not a dictionary.
- <code>ValueError</code> – If `filters` syntax is invalid.

#### filter_documents_async

```python
filter_documents_async(filters: dict[str, Any] | None = None) -> list[Document]
```

Asynchronously returns the documents that match the filters provided.

For a detailed specification of the filters,
refer to the [documentation](https://docs.haystack.deepset.ai/docs/metadata-filtering)

**Parameters:**

- **filters** (<code>dict\[str, Any\] | None</code>) – The filters to apply to the document list.

**Returns:**

- <code>list\[Document\]</code> – A list of Documents that match the given filters.

**Raises:**

- <code>TypeError</code> – If `filters` is not a dictionary.
- <code>ValueError</code> – If `filters` syntax is invalid.

#### write_documents

```python
write_documents(
    documents: list[Document], policy: DuplicatePolicy = DuplicatePolicy.NONE
) -> int
```

Writes documents to the document store.
**Parameters:**

- **documents** (<code>list\[Document\]</code>) – A list of Documents to write to the document store.
- **policy** (<code>DuplicatePolicy</code>) – The duplicate policy to use when writing documents.

**Returns:**

- <code>int</code> – The number of documents written to the document store.

**Raises:**

- <code>ValueError</code> – If `documents` contains objects that are not of type `Document`.
- <code>DuplicateDocumentError</code> – If a document with the same id already exists in the document store
  and the policy is set to `DuplicatePolicy.FAIL` (or not specified).
- <code>DocumentStoreError</code> – If the write operation fails for any other reason.

#### write_documents_async

```python
write_documents_async(
    documents: list[Document], policy: DuplicatePolicy = DuplicatePolicy.NONE
) -> int
```

Asynchronously writes documents to the document store.

**Parameters:**

- **documents** (<code>list\[Document\]</code>) – A list of Documents to write to the document store.
- **policy** (<code>DuplicatePolicy</code>) – The duplicate policy to use when writing documents.

**Returns:**

- <code>int</code> – The number of documents written to the document store.

**Raises:**

- <code>ValueError</code> – If `documents` contains objects that are not of type `Document`.
- <code>DuplicateDocumentError</code> – If a document with the same id already exists in the document store
  and the policy is set to `DuplicatePolicy.FAIL` (or not specified).
- <code>DocumentStoreError</code> – If the write operation fails for any other reason.

#### delete_documents

```python
delete_documents(document_ids: list[str]) -> None
```

Deletes documents that match the provided `document_ids` from the document store.
**Parameters:**

- **document_ids** (<code>list\[str\]</code>) – The document IDs to delete.

#### delete_documents_async

```python
delete_documents_async(document_ids: list[str]) -> None
```

Asynchronously deletes documents that match the provided `document_ids` from the document store.

**Parameters:**

- **document_ids** (<code>list\[str\]</code>) – The document IDs to delete.

#### delete_all_documents

```python
delete_all_documents() -> None
```

Deletes all documents in the document store.

#### delete_all_documents_async

```python
delete_all_documents_async() -> None
```

Asynchronously deletes all documents in the document store.

#### delete_by_filter

```python
delete_by_filter(filters: dict[str, Any]) -> int
```

Deletes all documents that match the provided filters.

**Parameters:**

- **filters** (<code>dict\[str, Any\]</code>) – The filters to apply to select documents for deletion.
  For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering)

**Returns:**

- <code>int</code> – The number of documents deleted.

#### delete_by_filter_async

```python
delete_by_filter_async(filters: dict[str, Any]) -> int
```

Asynchronously deletes all documents that match the provided filters.

**Parameters:**

- **filters** (<code>dict\[str, Any\]</code>) – The filters to apply to select documents for deletion.
  For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering)

**Returns:**

- <code>int</code> – The number of documents deleted.
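The `filters` dictionary accepted by the filter-based methods follows Haystack's metadata filtering syntax (linked above). A small sketch, using hypothetical metadata fields `category` and `year` for illustration:

```python
# Select documents whose category is "news" AND whose year is 2024 or later.
# "meta.category" / "meta.year" are hypothetical fields, chosen only for this example.
filters = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.category", "operator": "==", "value": "news"},
        {"field": "meta.year", "operator": ">=", "value": 2024},
    ],
}

# With a connected store, this filter could then be passed to any filter-based method, e.g.:
# deleted_count = document_store.delete_by_filter(filters)
```

The same dictionary shape works for `filter_documents`, `update_by_filter`, and the counting methods below.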
#### update_by_filter

```python
update_by_filter(filters: dict[str, Any], meta: dict[str, Any]) -> int
```

Updates the metadata of all documents that match the provided filters.

**Parameters:**

- **filters** (<code>dict\[str, Any\]</code>) – The filters to apply to select documents for updating.
  For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering)
- **meta** (<code>dict\[str, Any\]</code>) – The metadata fields to update.

**Returns:**

- <code>int</code> – The number of documents updated.

#### update_by_filter_async

```python
update_by_filter_async(filters: dict[str, Any], meta: dict[str, Any]) -> int
```

Asynchronously updates the metadata of all documents that match the provided filters.

**Parameters:**

- **filters** (<code>dict\[str, Any\]</code>) – The filters to apply to select documents for updating.
  For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering)
- **meta** (<code>dict\[str, Any\]</code>) – The metadata fields to update.

**Returns:**

- <code>int</code> – The number of documents updated.

#### count_documents_by_filter

```python
count_documents_by_filter(filters: dict[str, Any]) -> int
```

Returns the number of documents that match the provided filters.

**Parameters:**

- **filters** (<code>dict\[str, Any\]</code>) – The filters to apply to count documents.
  For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering)

**Returns:**

- <code>int</code> – The number of documents that match the filters.
#### count_documents_by_filter_async

```python
count_documents_by_filter_async(filters: dict[str, Any]) -> int
```

Asynchronously returns the number of documents that match the provided filters.

**Parameters:**

- **filters** (<code>dict\[str, Any\]</code>) – The filters to apply to count documents.
  For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering)

**Returns:**

- <code>int</code> – The number of documents that match the filters.

#### count_unique_metadata_by_filter

```python
count_unique_metadata_by_filter(
    filters: dict[str, Any], metadata_fields: list[str]
) -> dict[str, int]
```

Returns the count of unique values for each specified metadata field,
considering only documents that match the provided filters.

**Parameters:**

- **filters** (<code>dict\[str, Any\]</code>) – The filters to apply to select documents.
  For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering)
- **metadata_fields** (<code>list\[str\]</code>) – List of metadata field names to count unique values for.
  Field names can include or omit the "meta." prefix.

**Returns:**

- <code>dict\[str, int\]</code> – A dictionary mapping field names to their unique value counts.

#### count_unique_metadata_by_filter_async

```python
count_unique_metadata_by_filter_async(
    filters: dict[str, Any], metadata_fields: list[str]
) -> dict[str, int]
```

Asynchronously returns the count of unique values for each specified metadata field,
considering only documents that match the provided filters.

**Parameters:**

- **filters** (<code>dict\[str, Any\]</code>) – The filters to apply to select documents.
  For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering)
- **metadata_fields** (<code>list\[str\]</code>) – List of metadata field names to count unique values for.
  Field names can include or omit the "meta." prefix.

**Returns:**

- <code>dict\[str, int\]</code> – A dictionary mapping field names to their unique value counts.

#### get_metadata_fields_info

```python
get_metadata_fields_info() -> dict[str, dict[str, str]]
```

Returns information about the metadata fields in the document store.

Since metadata is stored in a JSONB field, this method analyzes actual data
to infer field types.

Example return:

```python
{
    'content': {'type': 'text'},
    'category': {'type': 'text'},
    'status': {'type': 'text'},
    'priority': {'type': 'integer'},
}
```

**Returns:**

- <code>dict\[str, dict\[str, str\]\]</code> – A dictionary mapping field names to their type information.

#### get_metadata_fields_info_async

```python
get_metadata_fields_info_async() -> dict[str, dict[str, str]]
```

Asynchronously returns information about the metadata fields in the document store.

Since metadata is stored in a JSONB field, this method analyzes actual data
to infer field types.

**Returns:**

- <code>dict\[str, dict\[str, str\]\]</code> – A dictionary mapping field names to their type information.

#### get_metadata_field_min_max

```python
get_metadata_field_min_max(metadata_field: str) -> dict[str, Any]
```

Returns the minimum and maximum values for a given metadata field.

**Parameters:**

- **metadata_field** (<code>str</code>) – The name of the metadata field. Can include or omit the "meta." prefix.
**Returns:**

- <code>dict\[str, Any\]</code> – A dictionary with 'min' and 'max' keys containing the minimum and maximum values.
  For numeric fields (integer, real), returns numeric min/max.
  For text fields, returns lexicographic min/max based on database collation.

**Raises:**

- <code>ValueError</code> – If the field doesn't exist or has no values.

#### get_metadata_field_min_max_async

```python
get_metadata_field_min_max_async(metadata_field: str) -> dict[str, Any]
```

Asynchronously returns the minimum and maximum values for a given metadata field.

**Parameters:**

- **metadata_field** (<code>str</code>) – The name of the metadata field. Can include or omit the "meta." prefix.

**Returns:**

- <code>dict\[str, Any\]</code> – A dictionary with 'min' and 'max' keys containing the minimum and maximum values.
  For numeric fields (integer, real), returns numeric min/max.
  For text fields, returns lexicographic min/max based on database collation.

**Raises:**

- <code>ValueError</code> – If the field doesn't exist or has no values.

#### get_metadata_field_unique_values

```python
get_metadata_field_unique_values(
    metadata_field: str, search_term: str | None, from_: int, size: int
) -> tuple[list[str], int]
```

Returns unique values for a given metadata field, optionally filtered by a search term.

**Parameters:**

- **metadata_field** (<code>str</code>) – The name of the metadata field. Can include or omit the "meta." prefix.
- **search_term** (<code>str | None</code>) – Optional search term to filter documents by content before extracting unique values.
  If None, all documents are considered.
- **from\_** (<code>int</code>) – The offset for pagination (0-based).
- **size** (<code>int</code>) – The number of unique values to return.
**Returns:**

- <code>tuple\[list\[str\], int\]</code> – A tuple containing:
    - A list of unique values (as strings)
    - The total count of unique values

#### get_metadata_field_unique_values_async

```python
get_metadata_field_unique_values_async(
    metadata_field: str, search_term: str | None, from_: int, size: int
) -> tuple[list[str], int]
```

Asynchronously returns unique values for a given metadata field, optionally filtered by a search term.

**Parameters:**

- **metadata_field** (<code>str</code>) – The name of the metadata field. Can include or omit the "meta." prefix.
- **search_term** (<code>str | None</code>) – Optional search term to filter documents by content before extracting unique values.
  If None, all documents are considered.
- **from\_** (<code>int</code>) – The offset for pagination (0-based).
- **size** (<code>int</code>) – The number of unique values to return.

**Returns:**

- <code>tuple\[list\[str\], int\]</code> – A tuple containing:
    - A list of unique values (as strings)
    - The total count of unique values
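The `from_` and `size` parameters implement offset pagination over the unique values, with the second tuple element giving the total so callers know when to stop. A toy sketch of the paging loop, using a stand-in function in place of a live document store:

```python
def fetch_page(values, from_, size):
    # Stand-in for get_metadata_field_unique_values: returns one page of
    # sorted unique values plus the total count of unique values.
    unique_sorted = sorted(set(values))
    return unique_sorted[from_: from_ + size], len(unique_sorted)

values = ["news", "blog", "news", "paper", "wiki", "blog"]

collected, offset = [], 0
while True:
    page, total = fetch_page(values, from_=offset, size=2)
    collected.extend(page)
    offset += len(page)
    if offset >= total or not page:
        break

assert collected == ["blog", "news", "paper", "wiki"]
```

Against a real store, the same loop shape applies with `document_store.get_metadata_field_unique_values(...)` in place of the stand-in.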