---
title: "FastEmbed"
id: fastembed-embedders
description: "FastEmbed integration for Haystack"
slug: "/fastembed-embedders"
---

## haystack_integrations.components.embedders.fastembed.fastembed_document_embedder

### FastembedDocumentEmbedder

FastembedDocumentEmbedder computes Document embeddings using Fastembed embedding models.

The embedding of each Document is stored in the `embedding` field of the Document.

Usage example:

```python
# To use this component, install the "fastembed-haystack" package.
# pip install fastembed-haystack

from haystack_integrations.components.embedders.fastembed import FastembedDocumentEmbedder
from haystack.dataclasses import Document

doc_embedder = FastembedDocumentEmbedder(
    model="BAAI/bge-small-en-v1.5",
    batch_size=256,
)
# Load the model before running the component standalone
doc_embedder.warm_up()

# Text taken from PubMed QA Dataset (https://huggingface.co/datasets/pubmed_qa)
document_list = [
    Document(
        content=("Oxidative stress generated within inflammatory joints can produce autoimmune phenomena and joint "
                 "destruction. Radical species with oxidative activity, including reactive nitrogen species, "
                 "represent mediators of inflammation and cartilage damage."),
        meta={
            "pubid": "25,445,628",
            "long_answer": "yes",
        },
    ),
    Document(
        content=("Plasma levels of pancreatic polypeptide (PP) rise upon food intake. Although other pancreatic "
                 "islet hormones, such as insulin and glucagon, have been extensively investigated, PP secretion "
                 "and actions are still poorly understood."),
        meta={
            "pubid": "25,445,712",
            "long_answer": "yes",
        },
    ),
]

result = doc_embedder.run(document_list)
print(f"Document Text: {result['documents'][0].content}")
print(f"Document Embedding: {result['documents'][0].embedding}")
print(f"Embedding Dimension: {len(result['documents'][0].embedding)}")
```

#### __init__

```python
__init__(
    model: str = "BAAI/bge-small-en-v1.5",
    cache_dir: str | None = None,
    threads: int | None = None,
    prefix: str = "",
    suffix: str = "",
    batch_size: int = 256,
    progress_bar: bool = True,
    parallel: int | None = None,
    local_files_only: bool = False,
    meta_fields_to_embed: list[str] | None = None,
    embedding_separator: str = "\n",
) -> None
```

Create a FastembedDocumentEmbedder component.

**Parameters:**

- **model** (<code>str</code>) – Local path or name of the model in Hugging Face's model hub,
  such as `BAAI/bge-small-en-v1.5`.
- **cache_dir** (<code>str | None</code>) – The path to the cache directory.
  Can be set using the `FASTEMBED_CACHE_PATH` env variable.
  Defaults to `fastembed_cache` in the system's temp directory.
- **threads** (<code>int | None</code>) – The number of threads a single onnxruntime session can use. Defaults to None.
- **prefix** (<code>str</code>) – A string to add to the beginning of each text.
- **suffix** (<code>str</code>) – A string to add to the end of each text.
- **batch_size** (<code>int</code>) – Number of strings to encode at once.
- **progress_bar** (<code>bool</code>) – If `True`, displays a progress bar during embedding.
- **parallel** (<code>int | None</code>) – If > 1, data-parallel encoding will be used; recommended for offline encoding of large datasets.
  If 0, use all available cores.
  If None, don't use data-parallel processing; use default onnxruntime threading instead.
- **local_files_only** (<code>bool</code>) – If `True`, only use the model files in the `cache_dir`.
- **meta_fields_to_embed** (<code>list\[str\] | None</code>) – List of meta fields that should be embedded along with the Document content (see the sketch after this list).
- **embedding_separator** (<code>str</code>) – Separator used to concatenate the meta fields to the Document content.
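When `meta_fields_to_embed` is set, the selected meta values are prepended to the Document content, joined by `embedding_separator`, before the text is embedded. A minimal sketch, assuming an illustrative `"title"` meta field:

```python
from haystack.dataclasses import Document
from haystack_integrations.components.embedders.fastembed import FastembedDocumentEmbedder

embedder = FastembedDocumentEmbedder(
    model="BAAI/bge-small-en-v1.5",
    meta_fields_to_embed=["title"],  # illustrative meta field name
    embedding_separator="\n",
)
embedder.warm_up()

doc = Document(
    content="PP secretion and actions are still poorly understood.",
    meta={"title": "Pancreatic polypeptide"},
)
# The text that is actually embedded is roughly:
# "Pancreatic polypeptide\nPP secretion and actions are still poorly understood."
result = embedder.run([doc])
print(len(result["documents"][0].embedding))  # 384 for bge-small-en-v1.5
```

Embedding a title or category alongside the content can help retrieval when the content alone is ambiguous.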
#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### warm_up

```python
warm_up() -> None
```

Initializes the component.

#### run

```python
run(documents: list[Document]) -> dict[str, list[Document]]
```

Embeds a list of Documents.

**Parameters:**

- **documents** (<code>list\[Document\]</code>) – List of Documents to embed.

**Returns:**

- <code>dict\[str, list\[Document\]\]</code> – A dictionary with the following keys:
  - `documents`: List of Documents with each Document's `embedding` field set to the computed embeddings.

**Raises:**

- <code>TypeError</code> – If the input is not a list of Documents.

## haystack_integrations.components.embedders.fastembed.fastembed_sparse_document_embedder

### FastembedSparseDocumentEmbedder

FastembedSparseDocumentEmbedder computes Document embeddings using Fastembed sparse models.

The sparse embedding of each Document is stored in the `sparse_embedding` field of the Document.

Usage example:

```python
from haystack_integrations.components.embedders.fastembed import FastembedSparseDocumentEmbedder
from haystack.dataclasses import Document

sparse_doc_embedder = FastembedSparseDocumentEmbedder(
    model="prithivida/Splade_PP_en_v1",
    batch_size=32,
)
# Load the model before running the component standalone
sparse_doc_embedder.warm_up()

# Text taken from PubMed QA Dataset (https://huggingface.co/datasets/pubmed_qa)
document_list = [
    Document(
        content=("Oxidative stress generated within inflammatory joints can produce autoimmune phenomena and joint "
                 "destruction. Radical species with oxidative activity, including reactive nitrogen species, "
                 "represent mediators of inflammation and cartilage damage."),
        meta={
            "pubid": "25,445,628",
            "long_answer": "yes",
        },
    ),
    Document(
        content=("Plasma levels of pancreatic polypeptide (PP) rise upon food intake. Although other pancreatic "
                 "islet hormones, such as insulin and glucagon, have been extensively investigated, PP secretion "
                 "and actions are still poorly understood."),
        meta={
            "pubid": "25,445,712",
            "long_answer": "yes",
        },
    ),
]

result = sparse_doc_embedder.run(document_list)
print(f"Document Text: {result['documents'][0].content}")
print(f"Document Sparse Embedding: {result['documents'][0].sparse_embedding}")
# A SparseEmbedding stores non-zero entries only; count them via its indices list
print(f"Non-Zero Dimensions: {len(result['documents'][0].sparse_embedding.indices)}")
```

#### __init__

```python
__init__(
    model: str = "prithivida/Splade_PP_en_v1",
    cache_dir: str | None = None,
    threads: int | None = None,
    batch_size: int = 32,
    progress_bar: bool = True,
    parallel: int | None = None,
    local_files_only: bool = False,
    meta_fields_to_embed: list[str] | None = None,
    embedding_separator: str = "\n",
    model_kwargs: dict[str, Any] | None = None,
) -> None
```

Create a FastembedSparseDocumentEmbedder component.

**Parameters:**

- **model** (<code>str</code>) – Local path or name of the model in Hugging Face's model hub,
  such as `prithivida/Splade_PP_en_v1`.
- **cache_dir** (<code>str | None</code>) – The path to the cache directory.
  Can be set using the `FASTEMBED_CACHE_PATH` env variable.
  Defaults to `fastembed_cache` in the system's temp directory.
- **threads** (<code>int | None</code>) – The number of threads a single onnxruntime session can use.
- **batch_size** (<code>int</code>) – Number of strings to encode at once.
- **progress_bar** (<code>bool</code>) – If `True`, displays a progress bar during embedding.
- **parallel** (<code>int | None</code>) – If > 1, data-parallel encoding will be used; recommended for offline encoding of large datasets.
  If 0, use all available cores.
  If None, don't use data-parallel processing; use default onnxruntime threading instead.
- **local_files_only** (<code>bool</code>) – If `True`, only use the model files in the `cache_dir`.
- **meta_fields_to_embed** (<code>list\[str\] | None</code>) – List of meta fields that should be embedded along with the Document content.
- **embedding_separator** (<code>str</code>) – Separator used to concatenate the meta fields to the Document content.
- **model_kwargs** (<code>dict\[str, Any\] | None</code>) – Dictionary containing model parameters such as `k`, `b`, `avg_len`, `language` (see the sketch after this list).
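The `k`, `b`, `avg_len`, and `language` entries apply to BM25-style sparse models rather than SPLADE. A minimal sketch, assuming the `Qdrant/bm25` model is available in your fastembed version:

```python
from haystack_integrations.components.embedders.fastembed import FastembedSparseDocumentEmbedder

# Assumption: "Qdrant/bm25" ships with your fastembed version; k and b are the
# usual BM25 term-saturation and length-normalization parameters.
bm25_embedder = FastembedSparseDocumentEmbedder(
    model="Qdrant/bm25",
    model_kwargs={"k": 1.2, "b": 0.75, "avg_len": 256.0, "language": "english"},
)
bm25_embedder.warm_up()
```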
#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### warm_up

```python
warm_up() -> None
```

Initializes the component.

#### run

```python
run(documents: list[Document]) -> dict[str, list[Document]]
```

Embeds a list of Documents.

**Parameters:**

- **documents** (<code>list\[Document\]</code>) – List of Documents to embed.

**Returns:**

- <code>dict\[str, list\[Document\]\]</code> – A dictionary with the following keys:
  - `documents`: List of Documents with each Document's `sparse_embedding`
    field set to the computed embeddings.

**Raises:**

- <code>TypeError</code> – If the input is not a list of Documents.

## haystack_integrations.components.embedders.fastembed.fastembed_sparse_text_embedder

### FastembedSparseTextEmbedder

FastembedSparseTextEmbedder computes string embeddings using Fastembed sparse models.

Usage example:

```python
from haystack_integrations.components.embedders.fastembed import FastembedSparseTextEmbedder

text = ("It clearly says online this will work on a Mac OS system. "
        "The disk comes and it does not, only Windows. Do Not order this if you have a Mac!!")

sparse_text_embedder = FastembedSparseTextEmbedder(
    model="prithivida/Splade_PP_en_v1"
)
# Load the model before running the component standalone
sparse_text_embedder.warm_up()

sparse_embedding = sparse_text_embedder.run(text)["sparse_embedding"]
```

#### __init__

```python
__init__(
    model: str = "prithivida/Splade_PP_en_v1",
    cache_dir: str | None = None,
    threads: int | None = None,
    progress_bar: bool = True,
    parallel: int | None = None,
    local_files_only: bool = False,
    model_kwargs: dict[str, Any] | None = None,
) -> None
```

Create a FastembedSparseTextEmbedder component.

**Parameters:**

- **model** (<code>str</code>) – Local path or name of the model in Fastembed's model hub, such as `prithivida/Splade_PP_en_v1`.
- **cache_dir** (<code>str | None</code>) – The path to the cache directory.
  Can be set using the `FASTEMBED_CACHE_PATH` env variable.
  Defaults to `fastembed_cache` in the system's temp directory.
- **threads** (<code>int | None</code>) – The number of threads a single onnxruntime session can use. Defaults to None.
- **progress_bar** (<code>bool</code>) – If `True`, displays a progress bar during embedding.
- **parallel** (<code>int | None</code>) – If > 1, data-parallel encoding will be used; recommended for offline encoding of large datasets.
  If 0, use all available cores.
  If None, don't use data-parallel processing; use default onnxruntime threading instead.
- **local_files_only** (<code>bool</code>) – If `True`, only use the model files in the `cache_dir`.
- **model_kwargs** (<code>dict\[str, Any\] | None</code>) – Dictionary containing model parameters such as `k`, `b`, `avg_len`, `language`.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### warm_up

```python
warm_up() -> None
```

Initializes the component.

#### run

```python
run(text: str) -> dict[str, SparseEmbedding]
```

Embeds text using the Fastembed model.

**Parameters:**

- **text** (<code>str</code>) – A string to embed.

**Returns:**

- <code>dict\[str, SparseEmbedding\]</code> – A dictionary with the following keys:
  - `sparse_embedding`: A `SparseEmbedding` object representing the sparse embedding of the input text.

**Raises:**

- <code>TypeError</code> – If the input is not a string.
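The returned `SparseEmbedding` stores only the non-zero entries of the vector, as parallel `indices` (vocabulary positions) and `values` (weights) lists. A minimal sketch of inspecting a result:

```python
from haystack_integrations.components.embedders.fastembed import FastembedSparseTextEmbedder

embedder = FastembedSparseTextEmbedder(model="prithivida/Splade_PP_en_v1")
embedder.warm_up()

sparse = embedder.run("pancreatic polypeptide secretion")["sparse_embedding"]
# Parallel lists: vocabulary indices and the weight assigned to each.
print(f"{len(sparse.indices)} non-zero dimensions")
for idx, weight in list(zip(sparse.indices, sparse.values))[:5]:
    print(idx, round(weight, 3))
```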
## haystack_integrations.components.embedders.fastembed.fastembed_text_embedder

### FastembedTextEmbedder

FastembedTextEmbedder computes string embeddings using Fastembed embedding models.

Usage example:

```python
from haystack_integrations.components.embedders.fastembed import FastembedTextEmbedder

text = ("It clearly says online this will work on a Mac OS system. "
        "The disk comes and it does not, only Windows. Do Not order this if you have a Mac!!")

text_embedder = FastembedTextEmbedder(
    model="BAAI/bge-small-en-v1.5"
)
# Load the model before running the component standalone
text_embedder.warm_up()

embedding = text_embedder.run(text)["embedding"]
```

#### __init__

```python
__init__(
    model: str = "BAAI/bge-small-en-v1.5",
    cache_dir: str | None = None,
    threads: int | None = None,
    prefix: str = "",
    suffix: str = "",
    progress_bar: bool = True,
    parallel: int | None = None,
    local_files_only: bool = False,
) -> None
```

Create a FastembedTextEmbedder component.

**Parameters:**

- **model** (<code>str</code>) – Local path or name of the model in Fastembed's model hub, such as `BAAI/bge-small-en-v1.5`.
- **cache_dir** (<code>str | None</code>) – The path to the cache directory.
  Can be set using the `FASTEMBED_CACHE_PATH` env variable.
  Defaults to `fastembed_cache` in the system's temp directory.
- **threads** (<code>int | None</code>) – The number of threads a single onnxruntime session can use. Defaults to None.
- **prefix** (<code>str</code>) – A string to add to the beginning of each text (see the sketch after this list).
- **suffix** (<code>str</code>) – A string to add to the end of each text.
- **progress_bar** (<code>bool</code>) – If `True`, displays a progress bar during embedding.
- **parallel** (<code>int | None</code>) – If > 1, data-parallel encoding will be used; recommended for offline encoding of large datasets.
  If 0, use all available cores.
  If None, don't use data-parallel processing; use default onnxruntime threading instead.
- **local_files_only** (<code>bool</code>) – If `True`, only use the model files in the `cache_dir`.
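`prefix` is useful for instruction-tuned retrieval models that expect a query-side instruction. A minimal sketch, assuming the query instruction commonly suggested for bge-style English models (verify the exact string on the model card):

```python
from haystack_integrations.components.embedders.fastembed import FastembedTextEmbedder

# Assumption: bge-style English models suggest this query instruction.
query_embedder = FastembedTextEmbedder(
    model="BAAI/bge-small-en-v1.5",
    prefix="Represent this sentence for searching relevant passages: ",
)
query_embedder.warm_up()

embedding = query_embedder.run("How is pancreatic polypeptide regulated?")["embedding"]
# Document-side texts are typically embedded without the instruction,
# e.g. with FastembedDocumentEmbedder using the default empty prefix.
```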
#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### warm_up

```python
warm_up() -> None
```

Initializes the component.

#### run

```python
run(text: str) -> dict[str, list[float]]
```

Embeds text using the Fastembed model.

**Parameters:**

- **text** (<code>str</code>) – A string to embed.

**Returns:**

- <code>dict\[str, list\[float\]\]</code> – A dictionary with the following keys:
  - `embedding`: A list of floats representing the embedding of the input text.

**Raises:**

- <code>TypeError</code> – If the input is not a string.

## haystack_integrations.components.rankers.fastembed.late_interaction_ranker

### FastembedLateInteractionRanker

Ranks Documents based on their similarity to the query using ColBERT models via Fastembed.

Uses late interaction (MaxSim) scoring to compute token-level similarity between
query and document embeddings, then ranks documents accordingly.

See https://qdrant.github.io/fastembed/examples/Supported_Models/ for supported models.
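Under MaxSim, every query token is matched to its best-scoring document token, and those per-token maxima are summed into the document score. A minimal numpy sketch of the scoring rule (the random matrices stand in for ColBERT token embeddings):

```python
import numpy as np

def maxsim_score(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """MaxSim: for each query token, take the best dot product over all
    document tokens, then sum over query tokens."""
    # query_emb: (num_query_tokens, dim), doc_emb: (num_doc_tokens, dim)
    similarities = query_emb @ doc_emb.T          # (num_query_tokens, num_doc_tokens)
    return float(similarities.max(axis=1).sum())  # best doc token per query token

# Toy token embeddings, standing in for real ColBERT output.
query = np.random.rand(4, 128)
doc_a = np.random.rand(20, 128)
doc_b = np.random.rand(35, 128)
print(sorted([maxsim_score(query, doc_a), maxsim_score(query, doc_b)], reverse=True))
```

Because the score is a sum over query tokens, it is unnormalized; this is why the `score_threshold` parameter below operates on raw, model-dependent values.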
Usage example:

```python
from haystack import Document
from haystack_integrations.components.rankers.fastembed import FastembedLateInteractionRanker

ranker = FastembedLateInteractionRanker(model_name="colbert-ir/colbertv2.0", top_k=2)
# Load the model before running the component standalone
ranker.warm_up()

docs = [Document(content="Paris"), Document(content="Berlin")]
query = "What is the capital of Germany?"
output = ranker.run(query=query, documents=docs)
print(output["documents"][0].content)

# Berlin
```

#### __init__

```python
__init__(
    model_name: str = "colbert-ir/colbertv2.0",
    top_k: int = 10,
    cache_dir: str | None = None,
    threads: int | None = None,
    batch_size: int = 64,
    parallel: int | None = None,
    local_files_only: bool = False,
    meta_fields_to_embed: list[str] | None = None,
    meta_data_separator: str = "\n",
    score_threshold: float | None = None,
) -> None
```

Creates an instance of the `FastembedLateInteractionRanker`.

**Parameters:**

- **model_name** (<code>str</code>) – Fastembed ColBERT model name. Check the list of supported models in the
  [Fastembed documentation](https://qdrant.github.io/fastembed/examples/Supported_Models/).
- **top_k** (<code>int</code>) – The maximum number of documents to return.
- **cache_dir** (<code>str | None</code>) – The path to the cache directory.
  Can be set using the `FASTEMBED_CACHE_PATH` env variable.
  Defaults to `fastembed_cache` in the system's temp directory.
- **threads** (<code>int | None</code>) – The number of threads a single onnxruntime session can use. Defaults to None.
- **batch_size** (<code>int</code>) – Number of strings to encode at once.
- **parallel** (<code>int | None</code>) – If > 1, data-parallel encoding will be used; recommended for offline encoding of large datasets.
  If 0, use all available cores.
  If None, don't use data-parallel processing; use default onnxruntime threading instead.
- **local_files_only** (<code>bool</code>) – If `True`, only use the model files in the `cache_dir`.
- **meta_fields_to_embed** (<code>list\[str\] | None</code>) – List of meta fields that should be concatenated
  with the document content for reranking.
- **meta_data_separator** (<code>str</code>) – Separator used to concatenate the meta fields
  to the Document content.
- **score_threshold** (<code>float | None</code>) – If provided, only documents with a score above the threshold are returned.
  Note that ColBERT scores are unnormalized sums and typically range from 3 to 25 (see the sketch after this list).
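A minimal sketch combining `top_k` with `score_threshold`; the cutoff of 10.0 is an illustrative guess, not a recommendation, so calibrate it against your model and queries:

```python
from haystack import Document
from haystack_integrations.components.rankers.fastembed import FastembedLateInteractionRanker

# Assumption: 10.0 is a plausible mid-range cutoff for unnormalized ColBERT scores.
ranker = FastembedLateInteractionRanker(
    model_name="colbert-ir/colbertv2.0",
    top_k=5,
    score_threshold=10.0,
)
ranker.warm_up()

docs = [Document(content="Berlin is the capital of Germany."),
        Document(content="A recipe for lentil soup.")]
output = ranker.run(query="What is the capital of Germany?", documents=docs)
# Only documents scoring above 10.0 survive; at most top_k are returned.
for doc in output["documents"]:
    print(doc.score, doc.content)
```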
#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> FastembedLateInteractionRanker
```

Deserializes the component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary to deserialize from.

**Returns:**

- <code>FastembedLateInteractionRanker</code> – The deserialized component.

#### warm_up

```python
warm_up() -> None
```

Initializes the component.

#### run

```python
run(
    query: str, documents: list[Document], top_k: int | None = None
) -> dict[str, list[Document]]
```

Returns a list of documents ranked by their similarity to the given query using ColBERT MaxSim scoring.

**Parameters:**

- **query** (<code>str</code>) – The input query to compare the documents to.
- **documents** (<code>list\[Document\]</code>) – A list of documents to be ranked.
- **top_k** (<code>int | None</code>) – The maximum number of documents to return.

**Returns:**

- <code>dict\[str, list\[Document\]\]</code> – A dictionary with the following keys:
  - `documents`: A list of documents closest to the query, sorted from most similar to least similar.

**Raises:**

- <code>ValueError</code> – If `top_k` is not > 0.

## haystack_integrations.components.rankers.fastembed.ranker

### FastembedRanker

Ranks Documents based on their similarity to the query using Fastembed models.

See https://qdrant.github.io/fastembed/examples/Supported_Models/ for supported models.

Documents are returned sorted from most to least semantically relevant to the query.

Usage example:

```python
from haystack import Document
from haystack_integrations.components.rankers.fastembed import FastembedRanker

ranker = FastembedRanker(model_name="Xenova/ms-marco-MiniLM-L-6-v2", top_k=2)
# Load the model before running the component standalone
ranker.warm_up()

docs = [Document(content="Paris"), Document(content="Berlin")]
query = "What is the capital of Germany?"
output = ranker.run(query=query, documents=docs)
print(output["documents"][0].content)

# Berlin
```

#### __init__

```python
__init__(
    model_name: str = "Xenova/ms-marco-MiniLM-L-6-v2",
    top_k: int = 10,
    cache_dir: str | None = None,
    threads: int | None = None,
    batch_size: int = 64,
    parallel: int | None = None,
    local_files_only: bool = False,
    meta_fields_to_embed: list[str] | None = None,
    meta_data_separator: str = "\n",
) -> None
```

Creates an instance of the `FastembedRanker`.

**Parameters:**

- **model_name** (<code>str</code>) – Fastembed model name. Check the list of supported models in the [Fastembed documentation](https://qdrant.github.io/fastembed/examples/Supported_Models/).
- **top_k** (<code>int</code>) – The maximum number of documents to return.
- **cache_dir** (<code>str | None</code>) – The path to the cache directory.
  Can be set using the `FASTEMBED_CACHE_PATH` env variable.
  Defaults to `fastembed_cache` in the system's temp directory.
- **threads** (<code>int | None</code>) – The number of threads a single onnxruntime session can use. Defaults to None.
- **batch_size** (<code>int</code>) – Number of strings to encode at once.
- **parallel** (<code>int | None</code>) – If > 1, data-parallel encoding will be used; recommended for offline encoding of large datasets.
  If 0, use all available cores.
  If None, don't use data-parallel processing; use default onnxruntime threading instead.
- **local_files_only** (<code>bool</code>) – If `True`, only use the model files in the `cache_dir`.
- **meta_fields_to_embed** (<code>list\[str\] | None</code>) – List of meta fields that should be concatenated
  with the document content for reranking.
- **meta_data_separator** (<code>str</code>) – Separator used to concatenate the meta fields
  to the Document content.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> FastembedRanker
```

Deserializes the component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary to deserialize from.

**Returns:**

- <code>FastembedRanker</code> – The deserialized component.

#### warm_up

```python
warm_up() -> None
```

Initializes the component.

#### run

```python
run(
    query: str, documents: list[Document], top_k: int | None = None
) -> dict[str, list[Document]]
```

Returns a list of documents ranked by their similarity to the given query, using FastEmbed.

**Parameters:**

- **query** (<code>str</code>) – The input query to compare the documents to.
- **documents** (<code>list\[Document\]</code>) – A list of documents to be ranked.
- **top_k** (<code>int | None</code>) – The maximum number of documents to return.

**Returns:**

- <code>dict\[str, list\[Document\]\]</code> – A dictionary with the following keys:
  - `documents`: A list of documents closest to the query, sorted from most similar to least similar.

**Raises:**

- <code>ValueError</code> – If `top_k` is not > 0.
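A common pattern is to use FastembedRanker as a second stage after a cheap first-stage retriever. A minimal pipeline sketch, assuming Haystack's in-memory document store and BM25 retriever:

```python
from haystack import Document, Pipeline
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.rankers.fastembed import FastembedRanker

document_store = InMemoryDocumentStore()
document_store.write_documents([
    Document(content="Berlin is the capital of Germany."),
    Document(content="Paris is the capital of France."),
    Document(content="A recipe for lentil soup."),
])

pipeline = Pipeline()
# Stage 1: cheap lexical recall; stage 2: cross-encoder reranking.
pipeline.add_component("retriever", InMemoryBM25Retriever(document_store=document_store, top_k=10))
pipeline.add_component("ranker", FastembedRanker(model_name="Xenova/ms-marco-MiniLM-L-6-v2", top_k=3))
pipeline.connect("retriever.documents", "ranker.documents")

query = "What is the capital of Germany?"
# The Pipeline warms up its components on run, so no explicit warm_up() is needed here.
result = pipeline.run({"retriever": {"query": query}, "ranker": {"query": query}})
print([doc.content for doc in result["ranker"]["documents"]])
```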