---
title: "ArcadeDB"
id: integrations-arcadedb
description: "ArcadeDB integration for Haystack"
slug: "/integrations-arcadedb"
---


## haystack_integrations.components.retrievers.arcadedb.embedding_retriever

### ArcadeDBEmbeddingRetriever

Retrieve documents from ArcadeDB using vector similarity (LSM_VECTOR / HNSW index).

Usage example:

```python
from haystack import Document
from haystack.components.embedders import (
    SentenceTransformersDocumentEmbedder,
    SentenceTransformersTextEmbedder,
)
from haystack_integrations.components.retrievers.arcadedb import ArcadeDBEmbeddingRetriever
from haystack_integrations.document_stores.arcadedb import ArcadeDBDocumentStore

store = ArcadeDBDocumentStore(database="mydb")
retriever = ArcadeDBEmbeddingRetriever(document_store=store, top_k=5)

documents = [
    Document(content="My name is Carla and I live in Berlin"),
    Document(content="My name is Paul and I live in New York"),
    Document(content="My name is Silvano and I live in Matera"),
    Document(content="My name is Usagi Tsukino and I live in Tokyo"),
]

# Embed the documents first: vector retrieval needs stored embeddings
doc_embedder = SentenceTransformersDocumentEmbedder()
doc_embedder.warm_up()
documents = doc_embedder.run(documents=documents)["documents"]

# Add documents to DocumentStore
store.write_documents(documents)

# Embed the query with the matching text embedder
embedder = SentenceTransformersTextEmbedder()
embedder.warm_up()
query_embedding = embedder.run(text="Who lives in Berlin?")["embedding"]

result = retriever.run(query_embedding=query_embedding)
for doc in result["documents"]:
    print(doc.content)
```

#### __init__

```python
__init__(
    *,
    document_store: ArcadeDBDocumentStore,
    filters: dict[str, Any] | None = None,
    top_k: int = 10,
    filter_policy: FilterPolicy = FilterPolicy.REPLACE
) -> None
```

Create an ArcadeDBEmbeddingRetriever.

**Parameters:**

- **document_store** (<code>ArcadeDBDocumentStore</code>) – An instance of `ArcadeDBDocumentStore`.
- **filters** (<code>dict\[str, Any\] | None</code>) – Default filters applied to every retrieval call.
- **top_k** (<code>int</code>) – Maximum number of documents to return.
- **filter_policy** (<code>FilterPolicy</code>) – How runtime filters interact with default filters.

#### run

```python
run(
    query_embedding: list[float],
    filters: dict[str, Any] | None = None,
    top_k: int | None = None,
) -> dict[str, list[Document]]
```

Retrieve documents by vector similarity.

**Parameters:**

- **query_embedding** (<code>list\[float\]</code>) – The embedding vector to search with.
- **filters** (<code>dict\[str, Any\] | None</code>) – Optional filters to narrow results.
- **top_k** (<code>int | None</code>) – Maximum number of documents to return.

**Returns:**

- <code>dict\[str, list\[Document\]\]</code> – A dictionary with the following keys:
  - `documents`: List of `Document`s most similar to the given `query_embedding`.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> ArcadeDBEmbeddingRetriever
```

Deserializes the component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from.

**Returns:**

- <code>ArcadeDBEmbeddingRetriever</code> – Deserialized component.

## haystack_integrations.document_stores.arcadedb.document_store

ArcadeDB DocumentStore for Haystack 2.x: document storage and vector search via the HTTP/JSON API.

### ArcadeDBDocumentStore

An ArcadeDB-backed DocumentStore for Haystack 2.x.

Uses ArcadeDB's HTTP/JSON API for all operations, so no special drivers are required.
Supports HNSW vector search (LSM_VECTOR) and SQL metadata filtering.

Usage example:

```python
from haystack.dataclasses.document import Document
from haystack_integrations.document_stores.arcadedb import ArcadeDBDocumentStore

document_store = ArcadeDBDocumentStore(
    url="http://localhost:2480",
    database="haystack",
    embedding_dimension=5,  # must match the length of the embeddings written below
)
document_store.write_documents([
    Document(content="This is first", embedding=[0.0] * 5),
    Document(content="This is second", embedding=[0.1, 0.2, 0.3, 0.4, 0.5]),
])
```

#### __init__

```python
__init__(
    *,
    url: str = "http://localhost:2480",
    database: str = "haystack",
    username: Secret = Secret.from_env_var("ARCADEDB_USERNAME", strict=False),
    password: Secret = Secret.from_env_var("ARCADEDB_PASSWORD", strict=False),
    type_name: str = "Document",
    embedding_dimension: int = 768,
    similarity_function: str = "cosine",
    recreate_type: bool = False,
    create_database: bool = True
) -> None
```

Create an ArcadeDBDocumentStore instance.

**Parameters:**

- **url** (<code>str</code>) – ArcadeDB HTTP endpoint.
- **database** (<code>str</code>) – Database name.
- **username** (<code>Secret</code>) – HTTP Basic Auth username (default: `ARCADEDB_USERNAME` env var).
- **password** (<code>Secret</code>) – HTTP Basic Auth password (default: `ARCADEDB_PASSWORD` env var).
- **type_name** (<code>str</code>) – Vertex type name for documents.
- **embedding_dimension** (<code>int</code>) – Vector dimension for the HNSW index.
- **similarity_function** (<code>str</code>) – Distance metric: `"cosine"`, `"euclidean"`, or `"dot"`.
- **recreate_type** (<code>bool</code>) – If `True`, drop and recreate the type on initialization.
- **create_database** (<code>bool</code>) – If `True`, create the database if it doesn't exist.
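
The three `similarity_function` options correspond to standard vector comparisons. The following is a minimal pure-Python sketch of what each metric computes, for intuition only. The helper names (`dot`, `cosine`, `euclidean`) are hypothetical and are not part of the integration, and nothing here describes how ArcadeDB evaluates these metrics inside its index.

```python
import math


def dot(a, b):
    """Dot product: larger means more similar (sensitive to vector length)."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity: compares direction only; undefined for zero vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def euclidean(a, b):
    """Euclidean distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


query = [1.0, 0.0]
same_direction = [2.0, 0.0]  # same direction as the query, twice as long
orthogonal = [0.0, 3.0]      # at a right angle to the query

print(cosine(query, same_direction), cosine(query, orthogonal))
print(dot(query, same_direction), euclidean(query, same_direction))
```

Cosine ignores vector length, while dot product and Euclidean distance do not, so the choice can matter when the embedding model does not produce normalized vectors.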

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serializes the DocumentStore to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> ArcadeDBDocumentStore
```

Deserializes the DocumentStore from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary to deserialize from.

**Returns:**

- <code>ArcadeDBDocumentStore</code> – The deserialized DocumentStore.

#### count_documents

```python
count_documents() -> int
```

Returns how many documents are present in the document store.

**Returns:**

- <code>int</code> – Number of documents in the document store.

#### filter_documents

```python
filter_documents(filters: dict[str, Any] | None = None) -> list[Document]
```

Return documents matching the given filters.

**Parameters:**

- **filters** (<code>dict\[str, Any\] | None</code>) – Haystack filter dictionary.

**Returns:**

- <code>list\[Document\]</code> – List of matching documents.

#### write_documents

```python
write_documents(
    documents: list[Document], policy: DuplicatePolicy = DuplicatePolicy.NONE
) -> int
```

Write documents to the store.

**Parameters:**

- **documents** (<code>list\[Document\]</code>) – List of Haystack Documents to write.
- **policy** (<code>DuplicatePolicy</code>) – How to handle duplicate document IDs.

**Returns:**

- <code>int</code> – Number of documents written.

#### delete_documents

```python
delete_documents(document_ids: list[str]) -> None
```

Delete documents by their IDs.

**Parameters:**

- **document_ids** (<code>list\[str\]</code>) – List of document IDs to delete.

#### delete_all_documents

```python
delete_all_documents() -> None
```

Deletes all documents in the document store.

#### delete_by_filter

```python
delete_by_filter(filters: dict[str, Any]) -> int
```

Deletes all documents that match the provided filters.

**Parameters:**

- **filters** (<code>dict\[str, Any\]</code>) – The filters to apply to select documents for deletion.
  For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering).

**Returns:**

- <code>int</code> – The number of documents deleted.

#### update_by_filter

```python
update_by_filter(filters: dict[str, Any], meta: dict[str, Any]) -> int
```

Updates the metadata of all documents that match the provided filters.

**Parameters:**

- **filters** (<code>dict\[str, Any\]</code>) – The filters to apply to select documents for updating.
  For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering).
- **meta** (<code>dict\[str, Any\]</code>) – The metadata fields to update.

**Returns:**

- <code>int</code> – The number of documents updated.
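
To make the `filters` argument of `delete_by_filter` and `update_by_filter` more concrete, here is a rough sketch of a filter dictionary in the Haystack 2.x syntax described at the linked page, together with a tiny in-memory evaluator. The `matches` helper and the sample metadata are hypothetical and exist only to show which documents such a filter would select. The document store does not work this way internally, and the sketch simplifies the real semantics: it omits `NOT`, and a missing field never matches.

```python
import operator

# Comparison operators used in Haystack filter dictionaries
COMPARATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "in": lambda value, options: value in options,
    "not in": lambda value, options: value not in options,
}


def matches(filters, meta):
    """Evaluate a filter dictionary against one document's metadata (illustration only)."""
    op = filters["operator"]
    if op in ("AND", "OR"):
        results = (matches(cond, meta) for cond in filters["conditions"])
        return all(results) if op == "AND" else any(results)
    name = filters["field"].removeprefix("meta.")
    if name not in meta:
        return False  # simplification: a missing field never matches
    return COMPARATORS[op](meta[name], filters["value"])


# Select documents whose category is "news" and whose year is 2023 or later
filters = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.category", "operator": "==", "value": "news"},
        {"field": "meta.year", "operator": ">=", "value": 2023},
    ],
}

# Hypothetical metadata of three stored documents
docs_meta = [
    {"category": "news", "year": 2024},
    {"category": "news", "year": 2021},
    {"category": "blog", "year": 2024},
]

selected = [meta for meta in docs_meta if matches(filters, meta)]
print(selected)
```

The same `filters` dictionary could be passed to `delete_by_filter(filters)` or, together with a `meta` dictionary such as `{"reviewed": True}`, to `update_by_filter(filters, meta)`.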

#### count_documents_by_filter

```python
count_documents_by_filter(filters: dict[str, Any]) -> int
```

Counts the number of documents matching the provided filter.

**Parameters:**

- **filters** (<code>dict\[str, Any\]</code>) – The filters to apply to the documents.

**Returns:**

- <code>int</code> – The number of documents that match the filter.

#### count_unique_metadata_by_filter

```python
count_unique_metadata_by_filter(
    filters: dict[str, Any], metadata_fields: list[str]
) -> dict[str, int]
```

Counts unique values for each metadata field in documents matching the provided filters.

**Parameters:**

- **filters** (<code>dict\[str, Any\]</code>) – The filters to apply to the document list.
- **metadata_fields** (<code>list\[str\]</code>) – Metadata fields for which to count unique values.

**Returns:**

- <code>dict\[str, int\]</code> – A dictionary where keys are metadata field names and values are the
  counts of unique values for that field.

#### get_metadata_fields_info

```python
get_metadata_fields_info() -> dict[str, dict[str, str]]
```

Returns the metadata fields and their corresponding types based on sampled documents.

**Returns:**

- <code>dict\[str, dict\[str, str\]\]</code> – A dictionary mapping field names to dictionaries with a `type` key.

#### get_metadata_field_min_max

```python
get_metadata_field_min_max(metadata_field: str) -> dict[str, Any]
```

For a given metadata field, finds its min and max values.

**Parameters:**

- **metadata_field** (<code>str</code>) – The metadata field to inspect.

**Returns:**

- <code>dict\[str, Any\]</code> – A dictionary with `min` and `max` keys and their corresponding values.
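
The return shapes of `count_unique_metadata_by_filter` and `get_metadata_field_min_max` can be illustrated with a small in-memory sketch. The helpers `count_unique` and `min_max` and the sample metadata are hypothetical. They only mirror the documented result structures and say nothing about how the store computes them or how it treats empty results and mixed types.

```python
# Hypothetical metadata of three stored documents
metas = [
    {"category": "news", "year": 2021},
    {"category": "news", "year": 2024},
    {"category": "blog", "year": 2023},
]


def count_unique(metas, fields):
    """Map each field name to the number of distinct values seen for it."""
    return {field: len({m[field] for m in metas if field in m}) for field in fields}


def min_max(metas, field):
    """Return the smallest and largest value of one field."""
    values = [m[field] for m in metas if field in m]
    return {"min": min(values), "max": max(values)}


unique_counts = count_unique(metas, ["category", "year"])
year_range = min_max(metas, "year")
print(unique_counts)
print(year_range)
```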

#### get_metadata_field_unique_values

```python
get_metadata_field_unique_values(
    metadata_field: str,
    search_term: str | None = None,
    from_: int = 0,
    size: int = 10,
) -> tuple[list[str], int]
```

Retrieves unique values for a field matching a search term or all possible values
if no search term is given.

**Parameters:**

- **metadata_field** (<code>str</code>) – The metadata field to inspect.
- **search_term** (<code>str | None</code>) – Optional case-insensitive substring search term.
- **from\_** (<code>int</code>) – The starting index for pagination.
- **size** (<code>int</code>) – The number of values to return.

**Returns:**

- <code>tuple\[list\[str\], int\]</code> – A tuple containing the paginated values and the total count.
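
The interplay of `search_term`, `from_`, and `size` can be sketched in plain Python. The `unique_values` helper below is hypothetical and rests on two assumptions the reference above does not confirm: values are returned in sorted order, and the total count refers to the values remaining after the search term is applied.

```python
def unique_values(values, search_term=None, from_=0, size=10):
    """Return one page of distinct values plus the total number of matches."""
    unique = sorted({str(v) for v in values})  # assumption: sorted order
    if search_term is not None:
        needle = search_term.lower()
        # Case-insensitive substring match, as described for `search_term`
        unique = [v for v in unique if needle in v.lower()]
    return unique[from_:from_ + size], len(unique)


# Hypothetical values of one metadata field across stored documents
values = ["News", "news-archive", "blog", "Blog", "newsletter", "blog"]

# Second page (one value per page) of the values containing "news"
page, total = unique_values(values, search_term="NEWS", from_=1, size=1)
print(page, total)
```

Because the total count comes back alongside each page, a caller can work out how many more pages to request without a separate count query.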