---
title: "ArcadeDB"
id: integrations-arcadedb
description: "ArcadeDB integration for Haystack"
slug: "/integrations-arcadedb"
---


## haystack_integrations.components.retrievers.arcadedb.embedding_retriever

### ArcadeDBEmbeddingRetriever

Retrieve documents from ArcadeDB using vector similarity (LSM_VECTOR / HNSW index).

Usage example:

```python
from haystack import Document
from haystack.components.embedders import (
    SentenceTransformersDocumentEmbedder,
    SentenceTransformersTextEmbedder,
)
from haystack_integrations.components.retrievers.arcadedb import ArcadeDBEmbeddingRetriever
from haystack_integrations.document_stores.arcadedb import ArcadeDBDocumentStore

store = ArcadeDBDocumentStore(database="mydb")
retriever = ArcadeDBEmbeddingRetriever(document_store=store, top_k=5)

# Embed the documents before writing them: vector search needs stored embeddings
documents = [
    Document(content="My name is Carla and I live in Berlin"),
    Document(content="My name is Paul and I live in New York"),
    Document(content="My name is Silvano and I live in Matera"),
    Document(content="My name is Usagi Tsukino and I live in Tokyo"),
]
document_embedder = SentenceTransformersDocumentEmbedder()
document_embedder.warm_up()
store.write_documents(document_embedder.run(documents)["documents"])

# Embed the query with the matching text embedder
text_embedder = SentenceTransformersTextEmbedder()
text_embedder.warm_up()
query_embedding = text_embedder.run("Who lives in Berlin?")["embedding"]

result = retriever.run(query_embedding=query_embedding)
for doc in result["documents"]:
    print(doc.content)
```

#### __init__

```python
__init__(
    *,
    document_store: ArcadeDBDocumentStore,
    filters: dict[str, Any] | None = None,
    top_k: int = 10,
    filter_policy: FilterPolicy = FilterPolicy.REPLACE
) -> None
```

Create an ArcadeDBEmbeddingRetriever.

**Parameters:**

- **document_store** (<code>ArcadeDBDocumentStore</code>) – An instance of `ArcadeDBDocumentStore`.
- **filters** (<code>dict\[str, Any\] | None</code>) – Default filters applied to every retrieval call.
- **top_k** (<code>int</code>) – Maximum number of documents to return.
- **filter_policy** (<code>FilterPolicy</code>) – How runtime filters interact with default filters.

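
As a rough illustration of what `filter_policy` controls, the sketch below resolves default and runtime filters in plain Python. The `FilterPolicy` enum is redefined locally as a stand-in and `resolve_filters` is a hypothetical helper, not part of the integration. The `MERGE` branch is simplified to a plain `AND` of both filter sets, and the real merge logic in Haystack may handle more cases.

```python
from enum import Enum
from typing import Any, Optional


class FilterPolicy(Enum):
    # Local stand-in for Haystack's FilterPolicy, so the sketch runs on its own.
    REPLACE = "replace"
    MERGE = "merge"


def resolve_filters(
    policy: FilterPolicy,
    init_filters: Optional[dict[str, Any]],
    runtime_filters: Optional[dict[str, Any]],
) -> Optional[dict[str, Any]]:
    """Hypothetical helper: pick the filters a retrieval call would use."""
    if runtime_filters is None:
        return init_filters
    if policy is FilterPolicy.REPLACE or init_filters is None:
        return runtime_filters
    # MERGE (simplified assumption): a document must satisfy both filter sets.
    return {"operator": "AND", "conditions": [init_filters, runtime_filters]}


default = {"field": "meta.lang", "operator": "==", "value": "en"}
runtime = {"field": "meta.year", "operator": ">=", "value": 2020}

replaced = resolve_filters(FilterPolicy.REPLACE, default, runtime)
merged = resolve_filters(FilterPolicy.MERGE, default, runtime)
print(replaced)
print(merged)
```

With `REPLACE`, the runtime filters win outright. With `MERGE`, both sets constrain the result.
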
#### run

```python
run(
    query_embedding: list[float],
    filters: dict[str, Any] | None = None,
    top_k: int | None = None,
) -> dict[str, list[Document]]
```

Retrieve documents by vector similarity.

**Parameters:**

- **query_embedding** (<code>list\[float\]</code>) – The embedding vector to search with.
- **filters** (<code>dict\[str, Any\] | None</code>) – Optional filters to narrow results.
- **top_k** (<code>int | None</code>) – Maximum number of documents to return.

**Returns:**

- <code>dict\[str, list\[Document\]\]</code> – A dictionary with the following keys:
  - `documents`: List of `Document`s most similar to the given `query_embedding`.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> ArcadeDBEmbeddingRetriever
```

Deserializes the component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from.

**Returns:**

- <code>ArcadeDBEmbeddingRetriever</code> – Deserialized component.

## haystack_integrations.document_stores.arcadedb.document_store

ArcadeDB DocumentStore for Haystack 2.x: document storage and vector search via the HTTP/JSON API.

### ArcadeDBDocumentStore

An ArcadeDB-backed DocumentStore for Haystack 2.x.

Uses ArcadeDB's HTTP/JSON API for all operations, so no special drivers are required.
Supports HNSW vector search (LSM_VECTOR) and SQL metadata filtering.

Usage example:

```python
from haystack.dataclasses.document import Document
from haystack_integrations.document_stores.arcadedb import ArcadeDBDocumentStore

document_store = ArcadeDBDocumentStore(
    url="http://localhost:2480",
    database="haystack",
    embedding_dimension=5,  # must match the length of the embeddings written below
)
document_store.write_documents([
    Document(content="This is first", embedding=[0.1] * 5),
    Document(content="This is second", embedding=[0.1, 0.2, 0.3, 0.4, 0.5]),
])
```

#### __init__

```python
__init__(
    *,
    url: str = "http://localhost:2480",
    database: str = "haystack",
    username: Secret = Secret.from_env_var("ARCADEDB_USERNAME", strict=False),
    password: Secret = Secret.from_env_var("ARCADEDB_PASSWORD", strict=False),
    type_name: str = "Document",
    embedding_dimension: int = 768,
    similarity_function: str = "cosine",
    recreate_type: bool = False,
    create_database: bool = True
) -> None
```

Create an ArcadeDBDocumentStore instance.

**Parameters:**

- **url** (<code>str</code>) – ArcadeDB HTTP endpoint.
- **database** (<code>str</code>) – Database name.
- **username** (<code>Secret</code>) – HTTP Basic Auth username (default: `ARCADEDB_USERNAME` env var).
- **password** (<code>Secret</code>) – HTTP Basic Auth password (default: `ARCADEDB_PASSWORD` env var).
- **type_name** (<code>str</code>) – Vertex type name for documents.
- **embedding_dimension** (<code>int</code>) – Vector dimension for the HNSW index. Must match the length of the stored embeddings.
- **similarity_function** (<code>str</code>) – Distance metric: `"cosine"`, `"euclidean"`, or `"dot"`.
- **recreate_type** (<code>bool</code>) – If `True`, drop and recreate the type on initialization.
- **create_database** (<code>bool</code>) – If `True`, create the database if it doesn't exist.

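
To make the `similarity_function` choice more concrete, here is a minimal sketch of the three metrics in plain Python. The function names are illustrative and say nothing about how ArcadeDB computes or reports scores internally. The point is only that cosine and dot product are similarities (higher means closer), while Euclidean is a distance (lower means closer), so they can rank the same vectors differently.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: list[float], b: list[float]) -> float:
    # Angle-based: ignores vector length.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def euclidean(a: list[float], b: list[float]) -> float:
    # Straight-line distance: sensitive to vector length.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


query = [1.0, 0.0]
same_direction = [3.0, 0.0]  # parallel to the query, but three times longer
nearby = [1.0, 0.5]          # close to the query, slightly rotated

for name, vec in [("same_direction", same_direction), ("nearby", nearby)]:
    print(name, cosine(query, vec), dot(query, vec), euclidean(query, vec))
```

Cosine and dot product both favor `same_direction` here, whereas Euclidean distance favors `nearby`.
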
#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serializes the DocumentStore to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> ArcadeDBDocumentStore
```

Deserializes the DocumentStore from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary to deserialize from.

**Returns:**

- <code>ArcadeDBDocumentStore</code> – The deserialized DocumentStore.

#### count_documents

```python
count_documents() -> int
```

Returns how many documents are present in the document store.

**Returns:**

- <code>int</code> – Number of documents in the document store.

#### filter_documents

```python
filter_documents(filters: dict[str, Any] | None = None) -> list[Document]
```

Return documents matching the given filters.

**Parameters:**

- **filters** (<code>dict\[str, Any\] | None</code>) – Haystack filter dictionary.

**Returns:**

- <code>list\[Document\]</code> – List of matching documents.

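
For orientation, a Haystack filter dictionary nests comparison conditions (`field`, `operator`, `value`) under logical operators (`operator`, `conditions`). The sketch below builds one such filter and evaluates it against plain dictionaries. The `matches` helper is a hypothetical in-memory evaluator covering only a few operators, included to show the semantics. It does not reflect how the store executes filters on the server.

```python
from typing import Any

# English documents published in 2020 or later.
filters = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.lang", "operator": "==", "value": "en"},
        {"field": "meta.year", "operator": ">=", "value": 2020},
    ],
}

COMPARATORS = {
    "==": lambda actual, expected: actual == expected,
    "!=": lambda actual, expected: actual != expected,
    ">=": lambda actual, expected: actual is not None and actual >= expected,
    "<=": lambda actual, expected: actual is not None and actual <= expected,
}


def matches(doc: dict[str, Any], node: dict[str, Any]) -> bool:
    """Hypothetical evaluator for a subset of the filter syntax."""
    if "conditions" in node:
        results = [matches(doc, child) for child in node["conditions"]]
        if node["operator"] == "AND":
            return all(results)
        if node["operator"] == "OR":
            return any(results)
        raise ValueError(f"Unsupported logical operator: {node['operator']}")
    # "meta.year" addresses the "year" key of the document's metadata.
    key = node["field"].removeprefix("meta.")
    actual = doc.get("meta", {}).get(key)
    return COMPARATORS[node["operator"]](actual, node["value"])


docs = [
    {"content": "a", "meta": {"lang": "en", "year": 2021}},
    {"content": "b", "meta": {"lang": "de", "year": 2022}},
    {"content": "c", "meta": {"lang": "en", "year": 2019}},
]
selected = [d["content"] for d in docs if matches(d, filters)]
print(selected)
```

The same dictionary shape is what `filter_documents` and the other filter-based methods accept.
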
#### write_documents

```python
write_documents(
    documents: list[Document], policy: DuplicatePolicy = DuplicatePolicy.NONE
) -> int
```

Write documents to the store.

**Parameters:**

- **documents** (<code>list\[Document\]</code>) – List of Haystack Documents to write.
- **policy** (<code>DuplicatePolicy</code>) – How to handle duplicate document IDs.

**Returns:**

- <code>int</code> – Number of documents written.

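
The effect of `policy` can be pictured with a small in-memory sketch. `DuplicatePolicy` is redefined locally with the same member names as Haystack's enum, and `write` is a hypothetical helper operating on a plain dict. Treating `NONE` like `FAIL` is an assumption made for this sketch. The default this store actually applies may differ.

```python
from enum import Enum


class DuplicatePolicy(Enum):
    # Local stand-in mirroring the member names of Haystack's DuplicatePolicy.
    NONE = "none"
    SKIP = "skip"
    OVERWRITE = "overwrite"
    FAIL = "fail"


def write(store: dict[str, str], docs: dict[str, str], policy: DuplicatePolicy) -> int:
    """Hypothetical in-memory writer; returns the number of documents written."""
    written = 0
    for doc_id, content in docs.items():
        if doc_id in store:
            if policy is DuplicatePolicy.SKIP:
                continue  # keep the existing document untouched
            if policy is not DuplicatePolicy.OVERWRITE:
                # Assumption: NONE behaves like FAIL in this sketch.
                raise ValueError(f"Document {doc_id!r} already exists")
        store[doc_id] = content
        written += 1
    return written


store = {"1": "old"}
skipped = write(store, {"1": "new", "2": "second"}, DuplicatePolicy.SKIP)
after_skip = dict(store)
overwritten = write(store, {"1": "new"}, DuplicatePolicy.OVERWRITE)
print(skipped, after_skip, overwritten, store)
```

Note that the return value counts only documents actually written, so skipped duplicates are not included.
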
#### delete_documents

```python
delete_documents(document_ids: list[str]) -> None
```

Delete documents by their IDs.

**Parameters:**

- **document_ids** (<code>list\[str\]</code>) – List of document IDs to delete.

#### delete_all_documents

```python
delete_all_documents() -> None
```

Deletes all documents in the document store.

#### delete_by_filter

```python
delete_by_filter(filters: dict[str, Any]) -> int
```

Deletes all documents that match the provided filters.

**Parameters:**

- **filters** (<code>dict\[str, Any\]</code>) – The filters to apply to select documents for deletion.
  For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering).

**Returns:**

- <code>int</code> – The number of documents deleted.

#### update_by_filter

```python
update_by_filter(filters: dict[str, Any], meta: dict[str, Any]) -> int
```

Updates the metadata of all documents that match the provided filters.

**Parameters:**

- **filters** (<code>dict\[str, Any\]</code>) – The filters to apply to select documents for updating.
  For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering).
- **meta** (<code>dict\[str, Any\]</code>) – The metadata fields to update.

**Returns:**

- <code>int</code> – The number of documents updated.

#### count_documents_by_filter

```python
count_documents_by_filter(filters: dict[str, Any]) -> int
```

Counts the number of documents matching the provided filters.

**Parameters:**

- **filters** (<code>dict\[str, Any\]</code>) – The filters to apply to the documents.

**Returns:**

- <code>int</code> – The number of documents that match the filters.

#### count_unique_metadata_by_filter

```python
count_unique_metadata_by_filter(
    filters: dict[str, Any], metadata_fields: list[str]
) -> dict[str, int]
```

Counts unique values for each metadata field in documents matching the provided filters.

**Parameters:**

- **filters** (<code>dict\[str, Any\]</code>) – The filters to apply to the document list.
- **metadata_fields** (<code>list\[str\]</code>) – Metadata fields for which to count unique values.

**Returns:**

- <code>dict\[str, int\]</code> – A dictionary where keys are metadata field names and values are the
  counts of unique values for that field.

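
The shape of the result can be illustrated with an in-memory equivalent. `count_unique` is a hypothetical helper that works on plain dictionaries standing in for documents that already passed the filters. Skipping documents that lack a field is an assumption of this sketch.

```python
from typing import Any


def count_unique(docs: list[dict[str, Any]], metadata_fields: list[str]) -> dict[str, int]:
    """Hypothetical in-memory equivalent: count distinct values per metadata field."""
    return {
        name: len({doc["meta"][name] for doc in docs if name in doc["meta"]})
        for name in metadata_fields
    }


# Stand-ins for the documents that matched the filters.
matching_docs = [
    {"meta": {"lang": "en", "year": 2021}},
    {"meta": {"lang": "en", "year": 2022}},
    {"meta": {"lang": "de"}},
]
counts = count_unique(matching_docs, ["lang", "year"])
print(counts)  # {'lang': 2, 'year': 2}
```
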
#### get_metadata_fields_info

```python
get_metadata_fields_info() -> dict[str, dict[str, str]]
```

Returns the metadata fields and their corresponding types based on sampled documents.

**Returns:**

- <code>dict\[str, dict\[str, str\]\]</code> – A dictionary mapping field names to dictionaries with a `type` key.

#### get_metadata_field_min_max

```python
get_metadata_field_min_max(metadata_field: str) -> dict[str, Any]
```

For a given metadata field, finds its min and max values.

**Parameters:**

- **metadata_field** (<code>str</code>) – The metadata field to inspect.

**Returns:**

- <code>dict\[str, Any\]</code> – A dictionary with `min` and `max` keys and their corresponding values.

#### get_metadata_field_unique_values

```python
get_metadata_field_unique_values(
    metadata_field: str,
    search_term: str | None = None,
    from_: int = 0,
    size: int = 10,
) -> tuple[list[str], int]
```

Retrieves the unique values of a metadata field that match a search term, or all unique values
if no search term is given.

**Parameters:**

- **metadata_field** (<code>str</code>) – The metadata field to inspect.
- **search_term** (<code>str | None</code>) – Optional case-insensitive substring search term.
- **from\_** (<code>int</code>) – The starting index for pagination.
- **size** (<code>int</code>) – The number of values to return.

**Returns:**

- <code>tuple\[list\[str\], int\]</code> – A tuple containing the paginated values and the total count.
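
How `search_term`, `from_`, and `size` combine can be sketched in memory. `unique_values_page` is a hypothetical helper, not the store's implementation. Sorting the values and counting the total after the search term is applied are both assumptions of this sketch.

```python
from typing import Optional


def unique_values_page(
    values: list[str],
    search_term: Optional[str] = None,
    from_: int = 0,
    size: int = 10,
) -> tuple[list[str], int]:
    """Hypothetical in-memory equivalent of paginated unique-value lookup."""
    unique = sorted(set(values))  # assumption: a stable, sorted order
    if search_term is not None:
        needle = search_term.lower()
        # Case-insensitive substring match, as described for search_term.
        unique = [value for value in unique if needle in value.lower()]
    return unique[from_:from_ + size], len(unique)


langs = ["English", "german", "English", "Greek", "French"]
page, total = unique_values_page(langs, search_term="g", from_=1, size=2)
print(page, total)  # ['Greek', 'german'] 3
```

Three unique values contain "g" regardless of case, and the page starts at index 1 of that filtered list.
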