---
title: "Optimum"
id: integrations-optimum
description: "Optimum integration for Haystack"
slug: "/integrations-optimum"
---

<a id="haystack_integrations.components.embedders.optimum.optimization"></a>

## Module haystack\_integrations.components.embedders.optimum.optimization

<a id="haystack_integrations.components.embedders.optimum.optimization.OptimumEmbedderOptimizationMode"></a>

### OptimumEmbedderOptimizationMode

[ONNX optimization modes](https://huggingface.co/docs/optimum/onnxruntime/usage_guides/optimization)
supported by the Optimum Embedders.

<a id="haystack_integrations.components.embedders.optimum.optimization.OptimumEmbedderOptimizationMode.O1"></a>

#### O1

Basic general optimizations.

<a id="haystack_integrations.components.embedders.optimum.optimization.OptimumEmbedderOptimizationMode.O2"></a>

#### O2

Basic and extended general optimizations, plus transformers-specific fusions.

<a id="haystack_integrations.components.embedders.optimum.optimization.OptimumEmbedderOptimizationMode.O3"></a>

#### O3

Same as O2, with GELU approximation.

<a id="haystack_integrations.components.embedders.optimum.optimization.OptimumEmbedderOptimizationMode.O4"></a>

#### O4

Same as O3, with mixed precision.

<a id="haystack_integrations.components.embedders.optimum.optimization.OptimumEmbedderOptimizationMode.from_str"></a>

#### OptimumEmbedderOptimizationMode.from\_str

```python
@classmethod
def from_str(cls, string: str) -> "OptimumEmbedderOptimizationMode"
```

Create an optimization mode from a string.

**Arguments**:

- `string`: String to convert.

**Returns**:

Optimization mode.

<a id="haystack_integrations.components.embedders.optimum.optimization.OptimumEmbedderOptimizationConfig"></a>

### OptimumEmbedderOptimizationConfig

Configuration for Optimum Embedder Optimization.
**Arguments**:

- `mode`: Optimization mode.
- `for_gpu`: Whether to optimize for GPUs.

<a id="haystack_integrations.components.embedders.optimum.optimization.OptimumEmbedderOptimizationConfig.to_optimum_config"></a>

#### OptimumEmbedderOptimizationConfig.to\_optimum\_config

```python
def to_optimum_config() -> OptimizationConfig
```

Convert the configuration to an Optimum configuration.

**Returns**:

Optimum configuration.

<a id="haystack_integrations.components.embedders.optimum.optimization.OptimumEmbedderOptimizationConfig.to_dict"></a>

#### OptimumEmbedderOptimizationConfig.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Convert the configuration to a dictionary.

**Returns**:

Dictionary with serialized data.

<a id="haystack_integrations.components.embedders.optimum.optimization.OptimumEmbedderOptimizationConfig.from_dict"></a>

#### OptimumEmbedderOptimizationConfig.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "OptimumEmbedderOptimizationConfig"
```

Create an optimization configuration from a dictionary.

**Arguments**:

- `data`: Dictionary to deserialize from.

**Returns**:

Optimization configuration.

<a id="haystack_integrations.components.embedders.optimum.optimum_document_embedder"></a>

## Module haystack\_integrations.components.embedders.optimum.optimum\_document\_embedder

<a id="haystack_integrations.components.embedders.optimum.optimum_document_embedder.OptimumDocumentEmbedder"></a>

### OptimumDocumentEmbedder

A component for computing `Document` embeddings using models loaded with the
[HuggingFace Optimum](https://huggingface.co/docs/optimum/index) library,
leveraging the ONNX runtime for high-speed inference.
The embedding of each Document is stored in the `embedding` field of the Document.

Usage example:
```python
from haystack.dataclasses import Document
from haystack_integrations.components.embedders.optimum import OptimumDocumentEmbedder

doc = Document(content="I love pizza!")

document_embedder = OptimumDocumentEmbedder(model="sentence-transformers/all-mpnet-base-v2")
document_embedder.warm_up()

result = document_embedder.run([doc])
print(result["documents"][0].embedding)

# [0.017020374536514282, -0.023255806416273117, ...]
```

<a id="haystack_integrations.components.embedders.optimum.optimum_document_embedder.OptimumDocumentEmbedder.__init__"></a>

#### OptimumDocumentEmbedder.\_\_init\_\_

```python
def __init__(model: str = "sentence-transformers/all-mpnet-base-v2",
             token: Secret | None = Secret.from_env_var("HF_API_TOKEN",
                                                        strict=False),
             prefix: str = "",
             suffix: str = "",
             normalize_embeddings: bool = True,
             onnx_execution_provider: str = "CPUExecutionProvider",
             pooling_mode: str | OptimumEmbedderPooling | None = None,
             model_kwargs: dict[str, Any] | None = None,
             working_dir: str | None = None,
             optimizer_settings: OptimumEmbedderOptimizationConfig
             | None = None,
             quantizer_settings: OptimumEmbedderQuantizationConfig
             | None = None,
             batch_size: int = 32,
             progress_bar: bool = True,
             meta_fields_to_embed: list[str] | None = None,
             embedding_separator: str = "\n") -> None
```

Create an OptimumDocumentEmbedder component.

**Arguments**:

- `model`: A string representing the model id on HF Hub.
- `token`: The HuggingFace token to use as HTTP bearer authorization.
- `prefix`: A string to add to the beginning of each text.
- `suffix`: A string to add to the end of each text.
- `normalize_embeddings`: Whether to normalize the embeddings to unit length.
- `onnx_execution_provider`: The [execution provider](https://onnxruntime.ai/docs/execution-providers/)
to use for ONNX models.

    Note: When using the TensorRT execution provider,
    TensorRT needs to build its inference engine ahead of inference,
    which takes some time due to model optimization and node fusion.
    To avoid rebuilding the engine every time the model is loaded, ONNX
    Runtime provides a pair of options to cache the engine: `trt_engine_cache_enable`
    and `trt_engine_cache_path`. We recommend setting these two provider
    options via the `model_kwargs` parameter when using the TensorRT execution provider:
    ```python
    embedder = OptimumDocumentEmbedder(
        model="sentence-transformers/all-mpnet-base-v2",
        onnx_execution_provider="TensorrtExecutionProvider",
        model_kwargs={
            "provider_options": {
                "trt_engine_cache_enable": True,
                "trt_engine_cache_path": "tmp/trt_cache",
            }
        },
    )
    ```
- `pooling_mode`: The pooling mode to use. When `None`, the pooling mode is inferred from the model config.
- `model_kwargs`: Dictionary containing additional keyword arguments to pass to the model.
In case of duplication, these kwargs override the `model`, `onnx_execution_provider`
and `token` initialization parameters.
- `working_dir`: The directory to use for storing intermediate files
generated during model optimization/quantization. Required
for optimization and quantization.
- `optimizer_settings`: Configuration for Optimum Embedder Optimization.
If `None`, no additional optimization is applied.
- `quantizer_settings`: Configuration for Optimum Embedder Quantization.
If `None`, no quantization is applied.
- `batch_size`: Number of Documents to encode at once.
- `progress_bar`: Whether to show a progress bar or not.
- `meta_fields_to_embed`: List of meta fields that should be embedded along with the Document text.
- `embedding_separator`: Separator used to concatenate the meta fields to the Document text.

<a id="haystack_integrations.components.embedders.optimum.optimum_document_embedder.OptimumDocumentEmbedder.warm_up"></a>

#### OptimumDocumentEmbedder.warm\_up

```python
def warm_up() -> None
```

Initializes the component.

<a id="haystack_integrations.components.embedders.optimum.optimum_document_embedder.OptimumDocumentEmbedder.to_dict"></a>

#### OptimumDocumentEmbedder.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns**:

Dictionary with serialized data.

<a id="haystack_integrations.components.embedders.optimum.optimum_document_embedder.OptimumDocumentEmbedder.from_dict"></a>

#### OptimumDocumentEmbedder.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "OptimumDocumentEmbedder"
```

Deserializes the component from a dictionary.

**Arguments**:

- `data`: The dictionary to deserialize from.

**Returns**:

The deserialized component.

<a id="haystack_integrations.components.embedders.optimum.optimum_document_embedder.OptimumDocumentEmbedder.run"></a>

#### OptimumDocumentEmbedder.run

```python
@component.output_types(documents=list[Document])
def run(documents: list[Document]) -> dict[str, list[Document]]
```

Embed a list of Documents.

The embedding of each Document is stored in the `embedding` field of the Document.

**Arguments**:

- `documents`: A list of Documents to embed.

**Raises**:

- `TypeError`: If the input is not a list of Documents.

**Returns**:

The updated Documents with their embeddings.
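To make the `meta_fields_to_embed`, `embedding_separator`, `prefix`, and `suffix` parameters concrete, here is a hedged pure-Python sketch of how the text to embed may be assembled for each Document. The function name and exact concatenation order are assumptions for illustration; this is not the integration's actual code.

```python
# Hypothetical sketch: assemble the text to embed from a Document's
# selected meta fields and its content. Parameter names mirror the
# OptimumDocumentEmbedder __init__ arguments described above.
def build_text_to_embed(
    content: str,
    meta: dict[str, str],
    meta_fields_to_embed: list[str],
    embedding_separator: str = "\n",
    prefix: str = "",
    suffix: str = "",
) -> str:
    # Keep only the requested meta fields that are actually present.
    meta_values = [str(meta[k]) for k in meta_fields_to_embed if meta.get(k) is not None]
    # Concatenate meta values and content with the separator, then wrap
    # the result in the prefix and suffix.
    text = embedding_separator.join([*meta_values, content])
    return prefix + text + suffix


print(build_text_to_embed(
    content="I love pizza!",
    meta={"title": "Food review"},
    meta_fields_to_embed=["title"],
))
# Food review
# I love pizza!
```

The embedded text then feeds into batching (`batch_size` Documents at a time) before inference.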
<a id="haystack_integrations.components.embedders.optimum.optimum_text_embedder"></a>

## Module haystack\_integrations.components.embedders.optimum.optimum\_text\_embedder

<a id="haystack_integrations.components.embedders.optimum.optimum_text_embedder.OptimumTextEmbedder"></a>

### OptimumTextEmbedder

A component to embed text using models loaded with the
[HuggingFace Optimum](https://huggingface.co/docs/optimum/index) library,
leveraging the ONNX runtime for high-speed inference.

Usage example:
```python
from haystack_integrations.components.embedders.optimum import OptimumTextEmbedder

text_to_embed = "I love pizza!"

text_embedder = OptimumTextEmbedder(model="sentence-transformers/all-mpnet-base-v2")
text_embedder.warm_up()

print(text_embedder.run(text_to_embed))

# {'embedding': [-0.07804739475250244, 0.1498992145061493, ...]}
```

<a id="haystack_integrations.components.embedders.optimum.optimum_text_embedder.OptimumTextEmbedder.__init__"></a>

#### OptimumTextEmbedder.\_\_init\_\_

```python
def __init__(
        model: str = "sentence-transformers/all-mpnet-base-v2",
        token: Secret | None = Secret.from_env_var("HF_API_TOKEN",
                                                   strict=False),
        prefix: str = "",
        suffix: str = "",
        normalize_embeddings: bool = True,
        onnx_execution_provider: str = "CPUExecutionProvider",
        pooling_mode: str | OptimumEmbedderPooling | None = None,
        model_kwargs: dict[str, Any] | None = None,
        working_dir: str | None = None,
        optimizer_settings: OptimumEmbedderOptimizationConfig | None = None,
        quantizer_settings: OptimumEmbedderQuantizationConfig | None = None)
```

Create an OptimumTextEmbedder component.

**Arguments**:

- `model`: A string representing the model id on HF Hub.
- `token`: The HuggingFace token to use as HTTP bearer authorization.
- `prefix`: A string to add to the beginning of each text.
- `suffix`: A string to add to the end of each text.
- `normalize_embeddings`: Whether to normalize the embeddings to unit length.
- `onnx_execution_provider`: The [execution provider](https://onnxruntime.ai/docs/execution-providers/)
to use for ONNX models.

    Note: When using the TensorRT execution provider,
    TensorRT needs to build its inference engine ahead of inference,
    which takes some time due to model optimization and node fusion.
    To avoid rebuilding the engine every time the model is loaded, ONNX
    Runtime provides a pair of options to cache the engine: `trt_engine_cache_enable`
    and `trt_engine_cache_path`. We recommend setting these two provider
    options via the `model_kwargs` parameter when using the TensorRT execution provider:
    ```python
    embedder = OptimumTextEmbedder(
        model="sentence-transformers/all-mpnet-base-v2",
        onnx_execution_provider="TensorrtExecutionProvider",
        model_kwargs={
            "provider_options": {
                "trt_engine_cache_enable": True,
                "trt_engine_cache_path": "tmp/trt_cache",
            }
        },
    )
    ```
- `pooling_mode`: The pooling mode to use. When `None`, the pooling mode is inferred from the model config.
- `model_kwargs`: Dictionary containing additional keyword arguments to pass to the model.
In case of duplication, these kwargs override the `model`, `onnx_execution_provider`
and `token` initialization parameters.
- `working_dir`: The directory to use for storing intermediate files
generated during model optimization/quantization. Required
for optimization and quantization.
- `optimizer_settings`: Configuration for Optimum Embedder Optimization.
If `None`, no additional optimization is applied.
- `quantizer_settings`: Configuration for Optimum Embedder Quantization.
If `None`, no quantization is applied.
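For intuition on `normalize_embeddings=True`: normalizing to unit length scales each embedding by its L2 norm, so cosine similarity between two embeddings reduces to a plain dot product. A minimal sketch of that operation (illustrative only, not the integration's code):

```python
import math


def l2_normalize(embedding: list[float]) -> list[float]:
    # Divide every component by the vector's Euclidean (L2) norm so the
    # resulting vector has length 1.
    norm = math.sqrt(sum(x * x for x in embedding))
    return [x / norm for x in embedding]


vec = l2_normalize([3.0, 4.0])  # norm is 5.0 (a 3-4-5 triangle)
print(vec)  # [0.6, 0.8]
```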
<a id="haystack_integrations.components.embedders.optimum.optimum_text_embedder.OptimumTextEmbedder.warm_up"></a>

#### OptimumTextEmbedder.warm\_up

```python
def warm_up()
```

Initializes the component.

<a id="haystack_integrations.components.embedders.optimum.optimum_text_embedder.OptimumTextEmbedder.to_dict"></a>

#### OptimumTextEmbedder.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns**:

Dictionary with serialized data.

<a id="haystack_integrations.components.embedders.optimum.optimum_text_embedder.OptimumTextEmbedder.from_dict"></a>

#### OptimumTextEmbedder.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "OptimumTextEmbedder"
```

Deserializes the component from a dictionary.

**Arguments**:

- `data`: The dictionary to deserialize from.

**Returns**:

The deserialized component.

<a id="haystack_integrations.components.embedders.optimum.optimum_text_embedder.OptimumTextEmbedder.run"></a>

#### OptimumTextEmbedder.run

```python
@component.output_types(embedding=list[float])
def run(text: str) -> dict[str, list[float]]
```

Embed a string.

**Arguments**:

- `text`: The text to embed.

**Raises**:

- `TypeError`: If the input is not a string.

**Returns**:

The embeddings of the text.

<a id="haystack_integrations.components.embedders.optimum.pooling"></a>

## Module haystack\_integrations.components.embedders.optimum.pooling

<a id="haystack_integrations.components.embedders.optimum.pooling.OptimumEmbedderPooling"></a>

### OptimumEmbedderPooling

Pooling modes supported by the Optimum Embedders.
<a id="haystack_integrations.components.embedders.optimum.pooling.OptimumEmbedderPooling.CLS"></a>

#### CLS

Perform CLS pooling on the output of the embedding model,
using the first token (CLS token).

<a id="haystack_integrations.components.embedders.optimum.pooling.OptimumEmbedderPooling.MEAN"></a>

#### MEAN

Perform mean pooling on the output of the embedding model.

<a id="haystack_integrations.components.embedders.optimum.pooling.OptimumEmbedderPooling.MAX"></a>

#### MAX

Perform max pooling on the output of the embedding model,
using the maximum value in each dimension over all the tokens.

<a id="haystack_integrations.components.embedders.optimum.pooling.OptimumEmbedderPooling.MEAN_SQRT_LEN"></a>

#### MEAN\_SQRT\_LEN

Perform mean pooling on the output of the embedding model, but
divide by the square root of the sequence length.

<a id="haystack_integrations.components.embedders.optimum.pooling.OptimumEmbedderPooling.WEIGHTED_MEAN"></a>

#### WEIGHTED\_MEAN

Perform weighted (position) mean pooling on the output of the
embedding model.

<a id="haystack_integrations.components.embedders.optimum.pooling.OptimumEmbedderPooling.LAST_TOKEN"></a>

#### LAST\_TOKEN

Perform last-token pooling on the output of the embedding model.

<a id="haystack_integrations.components.embedders.optimum.pooling.OptimumEmbedderPooling.from_str"></a>

#### OptimumEmbedderPooling.from\_str

```python
@classmethod
def from_str(cls, string: str) -> "OptimumEmbedderPooling"
```

Create a pooling mode from a string.

**Arguments**:

- `string`: String to convert.

**Returns**:

Pooling mode.
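To make the pooling modes above concrete, here is a hedged pure-Python sketch of three of them (CLS, mean, and max pooling) applied to a matrix of per-token embeddings. It illustrates the math only; the integration performs these operations on model output tensors, not Python lists.

```python
# Each row is one token's embedding; columns are embedding dimensions.
TokenEmbeddings = list[list[float]]


def cls_pooling(token_embeddings: TokenEmbeddings) -> list[float]:
    # Use the first token's (CLS token's) embedding as the text embedding.
    return token_embeddings[0]


def mean_pooling(token_embeddings: TokenEmbeddings) -> list[float]:
    # Average each dimension over all tokens.
    n = len(token_embeddings)
    dims = len(token_embeddings[0])
    return [sum(tok[d] for tok in token_embeddings) / n for d in range(dims)]


def max_pooling(token_embeddings: TokenEmbeddings) -> list[float]:
    # Take the maximum value in each dimension over all tokens.
    dims = len(token_embeddings[0])
    return [max(tok[d] for tok in token_embeddings) for d in range(dims)]


tokens = [[1.0, 4.0], [3.0, 2.0]]  # 2 tokens, 2 dimensions
print(cls_pooling(tokens))   # [1.0, 4.0]
print(mean_pooling(tokens))  # [2.0, 3.0]
print(max_pooling(tokens))   # [3.0, 4.0]
```

`MEAN_SQRT_LEN` follows the same shape as `mean_pooling` but divides the per-dimension sums by `sqrt(n)` instead of `n`.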
<a id="haystack_integrations.components.embedders.optimum.quantization"></a>

## Module haystack\_integrations.components.embedders.optimum.quantization

<a id="haystack_integrations.components.embedders.optimum.quantization.OptimumEmbedderQuantizationMode"></a>

### OptimumEmbedderQuantizationMode

[Dynamic quantization modes](https://huggingface.co/docs/optimum/onnxruntime/usage_guides/quantization)
supported by the Optimum Embedders.

<a id="haystack_integrations.components.embedders.optimum.quantization.OptimumEmbedderQuantizationMode.ARM64"></a>

#### ARM64

Quantization for the ARM64 architecture.

<a id="haystack_integrations.components.embedders.optimum.quantization.OptimumEmbedderQuantizationMode.AVX2"></a>

#### AVX2

Quantization with AVX-2 instructions.

<a id="haystack_integrations.components.embedders.optimum.quantization.OptimumEmbedderQuantizationMode.AVX512"></a>

#### AVX512

Quantization with AVX-512 instructions.

<a id="haystack_integrations.components.embedders.optimum.quantization.OptimumEmbedderQuantizationMode.AVX512_VNNI"></a>

#### AVX512\_VNNI

Quantization with AVX-512 and VNNI instructions.

<a id="haystack_integrations.components.embedders.optimum.quantization.OptimumEmbedderQuantizationMode.from_str"></a>

#### OptimumEmbedderQuantizationMode.from\_str

```python
@classmethod
def from_str(cls, string: str) -> "OptimumEmbedderQuantizationMode"
```

Create a quantization mode from a string.

**Arguments**:

- `string`: String to convert.

**Returns**:

Quantization mode.

<a id="haystack_integrations.components.embedders.optimum.quantization.OptimumEmbedderQuantizationConfig"></a>

### OptimumEmbedderQuantizationConfig

Configuration for Optimum Embedder Quantization.

**Arguments**:

- `mode`: Quantization mode.
- `per_channel`: Whether to apply per-channel quantization.

<a id="haystack_integrations.components.embedders.optimum.quantization.OptimumEmbedderQuantizationConfig.to_optimum_config"></a>

#### OptimumEmbedderQuantizationConfig.to\_optimum\_config

```python
def to_optimum_config() -> QuantizationConfig
```

Convert the configuration to an Optimum configuration.

**Returns**:

Optimum configuration.

<a id="haystack_integrations.components.embedders.optimum.quantization.OptimumEmbedderQuantizationConfig.to_dict"></a>

#### OptimumEmbedderQuantizationConfig.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Convert the configuration to a dictionary.

**Returns**:

Dictionary with serialized data.

<a id="haystack_integrations.components.embedders.optimum.quantization.OptimumEmbedderQuantizationConfig.from_dict"></a>

#### OptimumEmbedderQuantizationConfig.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "OptimumEmbedderQuantizationConfig"
```

Create a configuration from a dictionary.

**Arguments**:

- `data`: Dictionary to deserialize from.

**Returns**:

Quantization configuration.
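The several `from_str` classmethods in this reference all follow the same string-to-enum pattern. A hypothetical sketch of that pattern, using a stand-in enum whose member names mirror this module (the actual enum values and error messages in the integration may differ):

```python
from enum import Enum


class QuantizationMode(Enum):
    # Illustrative stand-in for OptimumEmbedderQuantizationMode; the real
    # enum's string values are an assumption here.
    ARM64 = "arm64"
    AVX2 = "avx2"
    AVX512 = "avx512"
    AVX512_VNNI = "avx512_vnni"

    @classmethod
    def from_str(cls, string: str) -> "QuantizationMode":
        # Look up the enum by its string value, case-insensitively,
        # raising a clear error for unknown modes.
        try:
            return cls(string.lower())
        except ValueError:
            raise ValueError(f"Unknown quantization mode '{string}'") from None


print(QuantizationMode.from_str("AVX2"))  # QuantizationMode.AVX2
```

The `to_dict`/`from_dict` pairs on the config classes round-trip through such string values, which is what makes the configurations serializable.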