  1  ---
  2  title: "Optimum"
  3  id: integrations-optimum
  4  description: "Optimum integration for Haystack"
  5  slug: "/integrations-optimum"
  6  ---
  7  
  8  <a id="haystack_integrations.components.embedders.optimum.optimization"></a>
  9  
 10  ## Module haystack\_integrations.components.embedders.optimum.optimization
 11  
 12  <a id="haystack_integrations.components.embedders.optimum.optimization.OptimumEmbedderOptimizationMode"></a>
 13  
 14  ### OptimumEmbedderOptimizationMode
 15  
 16  [ONNX optimization modes](https://huggingface.co/docs/optimum/onnxruntime/usage_guides/optimization)
 17  supported by the Optimum Embedders.
 18  
 19  <a id="haystack_integrations.components.embedders.optimum.optimization.OptimumEmbedderOptimizationMode.O1"></a>
 20  
 21  #### O1
 22  
 23  Basic general optimizations.
 24  
 25  <a id="haystack_integrations.components.embedders.optimum.optimization.OptimumEmbedderOptimizationMode.O2"></a>
 26  
 27  #### O2
 28  
 29  Basic and extended general optimizations, transformers-specific fusions.
 30  
 31  <a id="haystack_integrations.components.embedders.optimum.optimization.OptimumEmbedderOptimizationMode.O3"></a>
 32  
 33  #### O3
 34  
 35  Same as O2, with GELU approximation.
 36  
 37  <a id="haystack_integrations.components.embedders.optimum.optimization.OptimumEmbedderOptimizationMode.O4"></a>
 38  
 39  #### O4
 40  
 41  Same as O3 with mixed precision.
 42  
 43  <a id="haystack_integrations.components.embedders.optimum.optimization.OptimumEmbedderOptimizationMode.from_str"></a>
 44  
 45  #### OptimumEmbedderOptimizationMode.from\_str
 46  
 47  ```python
 48  @classmethod
 49  def from_str(cls, string: str) -> "OptimumEmbedderOptimizationMode"
 50  ```
 51  
 52  Create an optimization mode from a string.
 53  
 54  **Arguments**:
 55  
 56  - `string`: String to convert.
 57  
 58  **Returns**:
 59  
 60  Optimization mode.
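The string-to-enum conversion can be pictured with the following stdlib-only sketch; the class name, values, and error handling here are illustrative assumptions, not the integration's actual implementation.

```python
from enum import Enum


class OptimizationMode(str, Enum):
    """Illustrative stand-in for OptimumEmbedderOptimizationMode."""

    O1 = "o1"
    O2 = "o2"
    O3 = "o3"
    O4 = "o4"

    @classmethod
    def from_str(cls, string: str) -> "OptimizationMode":
        # Look up the member by its string value; fail loudly on unknown input.
        try:
            return cls(string.lower())
        except ValueError:
            msg = f"Unknown optimization mode '{string}'"
            raise ValueError(msg) from None


mode = OptimizationMode.from_str("O2")
```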
 61  
 62  <a id="haystack_integrations.components.embedders.optimum.optimization.OptimumEmbedderOptimizationConfig"></a>
 63  
 64  ### OptimumEmbedderOptimizationConfig
 65  
 66  Configuration for Optimum Embedder Optimization.
 67  
 68  **Arguments**:
 69  
 70  - `mode`: Optimization mode.
 71  - `for_gpu`: Whether to optimize for GPUs.
 72  
 73  <a id="haystack_integrations.components.embedders.optimum.optimization.OptimumEmbedderOptimizationConfig.to_optimum_config"></a>
 74  
 75  #### OptimumEmbedderOptimizationConfig.to\_optimum\_config
 76  
 77  ```python
 78  def to_optimum_config() -> OptimizationConfig
 79  ```
 80  
 81  Convert the configuration to an Optimum configuration.
 82  
 83  **Returns**:
 84  
 85  Optimum configuration.
 86  
 87  <a id="haystack_integrations.components.embedders.optimum.optimization.OptimumEmbedderOptimizationConfig.to_dict"></a>
 88  
 89  #### OptimumEmbedderOptimizationConfig.to\_dict
 90  
 91  ```python
 92  def to_dict() -> dict[str, Any]
 93  ```
 94  
 95  Convert the configuration to a dictionary.
 96  
 97  **Returns**:
 98  
 99  Dictionary with serialized data.
100  
101  <a id="haystack_integrations.components.embedders.optimum.optimization.OptimumEmbedderOptimizationConfig.from_dict"></a>
102  
103  #### OptimumEmbedderOptimizationConfig.from\_dict
104  
105  ```python
106  @classmethod
107  def from_dict(cls, data: dict[str,
108                                Any]) -> "OptimumEmbedderOptimizationConfig"
109  ```
110  
111  Create an optimization configuration from a dictionary.
112  
113  **Arguments**:
114  
115  - `data`: Dictionary to deserialize from.
116  
117  **Returns**:
118  
119  Optimization configuration.
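The `to_dict`/`from_dict` pair amounts to a plain serialization round trip. A minimal sketch with a stand-in dataclass (the names and dict layout are assumptions for illustration, not the integration's actual code):

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any


class Mode(str, Enum):
    O1 = "o1"
    O2 = "o2"


@dataclass
class OptimizationConfigSketch:
    mode: Mode
    for_gpu: bool = False

    def to_dict(self) -> dict[str, Any]:
        # Store the enum as its plain string value so the dict stays JSON-serializable.
        return {"mode": self.mode.value, "for_gpu": self.for_gpu}

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "OptimizationConfigSketch":
        return cls(mode=Mode(data["mode"]), for_gpu=data["for_gpu"])


config = OptimizationConfigSketch(mode=Mode.O2, for_gpu=True)
restored = OptimizationConfigSketch.from_dict(config.to_dict())
```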
120  
121  <a id="haystack_integrations.components.embedders.optimum.optimum_document_embedder"></a>
122  
123  ## Module haystack\_integrations.components.embedders.optimum.optimum\_document\_embedder
124  
125  <a id="haystack_integrations.components.embedders.optimum.optimum_document_embedder.OptimumDocumentEmbedder"></a>
126  
127  ### OptimumDocumentEmbedder
128  
129  A component for computing `Document` embeddings using models loaded with the
130  [HuggingFace Optimum](https://huggingface.co/docs/optimum/index) library,
131  leveraging the ONNX runtime for high-speed inference.
132  
133  The embedding of each Document is stored in the `embedding` field of the Document.
134  
135  Usage example:
136  ```python
137  from haystack.dataclasses import Document
138  from haystack_integrations.components.embedders.optimum import OptimumDocumentEmbedder
139  
140  doc = Document(content="I love pizza!")
141  
142  document_embedder = OptimumDocumentEmbedder(model="sentence-transformers/all-mpnet-base-v2")
143  document_embedder.warm_up()
144  
145  result = document_embedder.run([doc])
146  print(result["documents"][0].embedding)
147  
148  # [0.017020374536514282, -0.023255806416273117, ...]
149  ```
150  
151  <a id="haystack_integrations.components.embedders.optimum.optimum_document_embedder.OptimumDocumentEmbedder.__init__"></a>
152  
153  #### OptimumDocumentEmbedder.\_\_init\_\_
154  
155  ```python
156  def __init__(model: str = "sentence-transformers/all-mpnet-base-v2",
157               token: Secret | None = Secret.from_env_var("HF_API_TOKEN",
158                                                          strict=False),
159               prefix: str = "",
160               suffix: str = "",
161               normalize_embeddings: bool = True,
162               onnx_execution_provider: str = "CPUExecutionProvider",
163               pooling_mode: str | OptimumEmbedderPooling | None = None,
164               model_kwargs: dict[str, Any] | None = None,
165               working_dir: str | None = None,
166               optimizer_settings: OptimumEmbedderOptimizationConfig
167               | None = None,
168               quantizer_settings: OptimumEmbedderQuantizationConfig
169               | None = None,
170               batch_size: int = 32,
171               progress_bar: bool = True,
172               meta_fields_to_embed: list[str] | None = None,
173               embedding_separator: str = "\n") -> None
174  ```
175  
176  Create an OptimumDocumentEmbedder component.
177  
178  **Arguments**:
179  
180  - `model`: A string representing the model id on HF Hub.
181  - `token`: The HuggingFace token to use as HTTP bearer authorization.
182  - `prefix`: A string to add to the beginning of each text.
183  - `suffix`: A string to add to the end of each text.
184  - `normalize_embeddings`: Whether to normalize the embeddings to unit length.
185  - `onnx_execution_provider`: The [execution provider](https://onnxruntime.ai/docs/execution-providers/)
186  to use for ONNX models.
187  
188  Note: Using the TensorRT execution provider
189  TensorRT must build its inference engine ahead of inference,
190  which takes some time due to model optimization and node fusion.
191  To avoid rebuilding the engine every time the model is loaded, ONNX
192  Runtime provides a pair of options to cache the engine: `trt_engine_cache_enable`
193  and `trt_engine_cache_path`. We recommend setting these two provider
194  options through the `model_kwargs` parameter when using the TensorRT execution provider.
195  The usage is as follows:
196  ```python
197  embedder = OptimumDocumentEmbedder(
198      model="sentence-transformers/all-mpnet-base-v2",
199      onnx_execution_provider="TensorrtExecutionProvider",
200      model_kwargs={
201          "provider_options": {
202              "trt_engine_cache_enable": True,
203              "trt_engine_cache_path": "tmp/trt_cache",
204          }
205      },
206  )
207  ```
208  - `pooling_mode`: The pooling mode to use. When `None`, pooling mode will be inferred from the model config.
209  - `model_kwargs`: Dictionary containing additional keyword arguments to pass to the model.
210  In case of duplication, these kwargs override `model`, `onnx_execution_provider`
211  and `token` initialization parameters.
212  - `working_dir`: The directory to use for storing intermediate files
213  generated during model optimization/quantization. Required
214  for optimization and quantization.
215  - `optimizer_settings`: Configuration for Optimum Embedder Optimization.
216  If `None`, no additional optimization is applied.
217  - `quantizer_settings`: Configuration for Optimum Embedder Quantization.
218  If `None`, no quantization is applied.
219  - `batch_size`: Number of Documents to encode at once.
220  - `progress_bar`: Whether to show a progress bar or not.
221  - `meta_fields_to_embed`: List of meta fields that should be embedded along with the Document text.
222  - `embedding_separator`: Separator used to concatenate the meta fields to the Document text.
223  
224  <a id="haystack_integrations.components.embedders.optimum.optimum_document_embedder.OptimumDocumentEmbedder.warm_up"></a>
225  
226  #### OptimumDocumentEmbedder.warm\_up
227  
228  ```python
229  def warm_up() -> None
230  ```
231  
232  Initializes the component.
233  
234  <a id="haystack_integrations.components.embedders.optimum.optimum_document_embedder.OptimumDocumentEmbedder.to_dict"></a>
235  
236  #### OptimumDocumentEmbedder.to\_dict
237  
238  ```python
239  def to_dict() -> dict[str, Any]
240  ```
241  
242  Serializes the component to a dictionary.
243  
244  **Returns**:
245  
246  Dictionary with serialized data.
247  
248  <a id="haystack_integrations.components.embedders.optimum.optimum_document_embedder.OptimumDocumentEmbedder.from_dict"></a>
249  
250  #### OptimumDocumentEmbedder.from\_dict
251  
252  ```python
253  @classmethod
254  def from_dict(cls, data: dict[str, Any]) -> "OptimumDocumentEmbedder"
255  ```
256  
257  Deserializes the component from a dictionary.
258  
259  **Arguments**:
260  
261  - `data`: The dictionary to deserialize from.
262  
263  **Returns**:
264  
265  The deserialized component.
266  
267  <a id="haystack_integrations.components.embedders.optimum.optimum_document_embedder.OptimumDocumentEmbedder.run"></a>
268  
269  #### OptimumDocumentEmbedder.run
270  
271  ```python
272  @component.output_types(documents=list[Document])
273  def run(documents: list[Document]) -> dict[str, list[Document]]
274  ```
275  
276  Embed a list of Documents.
277  
278  The embedding of each Document is stored in the `embedding` field of the Document.
279  
280  **Arguments**:
281  
282  - `documents`: A list of Documents to embed.
283  
284  **Raises**:
285  
286  - `TypeError`: If the input is not a list of Documents.
287  
288  **Returns**:
289  
290  The updated Documents with their embeddings.
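`batch_size` controls how many prepared texts are encoded per forward pass. Conceptually the batching loop looks like this stdlib-only sketch (illustrative, not the component's code):

```python
def batched(items: list, batch_size: int = 32):
    # Yield successive slices of at most batch_size items each.
    for i in range(0, len(items), batch_size):
        yield items[i : i + batch_size]


texts = [f"doc {i}" for i in range(5)]
batches = list(batched(texts, batch_size=2))
# -> [['doc 0', 'doc 1'], ['doc 2', 'doc 3'], ['doc 4']]
```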
291  
292  <a id="haystack_integrations.components.embedders.optimum.optimum_text_embedder"></a>
293  
294  ## Module haystack\_integrations.components.embedders.optimum.optimum\_text\_embedder
295  
296  <a id="haystack_integrations.components.embedders.optimum.optimum_text_embedder.OptimumTextEmbedder"></a>
297  
298  ### OptimumTextEmbedder
299  
300  A component to embed text using models loaded with the
301  [HuggingFace Optimum](https://huggingface.co/docs/optimum/index) library,
302  leveraging the ONNX runtime for high-speed inference.
303  
304  Usage example:
305  ```python
306  from haystack_integrations.components.embedders.optimum import OptimumTextEmbedder
307  
308  text_to_embed = "I love pizza!"
309  
310  text_embedder = OptimumTextEmbedder(model="sentence-transformers/all-mpnet-base-v2")
311  text_embedder.warm_up()
312  
313  print(text_embedder.run(text_to_embed))
314  
315  # {'embedding': [-0.07804739475250244, 0.1498992145061493, ...]}
316  ```
317  
318  <a id="haystack_integrations.components.embedders.optimum.optimum_text_embedder.OptimumTextEmbedder.__init__"></a>
319  
320  #### OptimumTextEmbedder.\_\_init\_\_
321  
322  ```python
323  def __init__(
324          model: str = "sentence-transformers/all-mpnet-base-v2",
325          token: Secret | None = Secret.from_env_var("HF_API_TOKEN",
326                                                     strict=False),
327          prefix: str = "",
328          suffix: str = "",
329          normalize_embeddings: bool = True,
330          onnx_execution_provider: str = "CPUExecutionProvider",
331          pooling_mode: str | OptimumEmbedderPooling | None = None,
332          model_kwargs: dict[str, Any] | None = None,
333          working_dir: str | None = None,
334          optimizer_settings: OptimumEmbedderOptimizationConfig | None = None,
335          quantizer_settings: OptimumEmbedderQuantizationConfig | None = None)
336  ```
337  
338  Create an OptimumTextEmbedder component.
339  
340  **Arguments**:
341  
342  - `model`: A string representing the model id on HF Hub.
343  - `token`: The HuggingFace token to use as HTTP bearer authorization.
344  - `prefix`: A string to add to the beginning of each text.
345  - `suffix`: A string to add to the end of each text.
346  - `normalize_embeddings`: Whether to normalize the embeddings to unit length.
347  - `onnx_execution_provider`: The [execution provider](https://onnxruntime.ai/docs/execution-providers/)
348  to use for ONNX models.
349  
350  Note: Using the TensorRT execution provider
351  TensorRT must build its inference engine ahead of inference,
352  which takes some time due to model optimization and node fusion.
353  To avoid rebuilding the engine every time the model is loaded, ONNX
354  Runtime provides a pair of options to cache the engine: `trt_engine_cache_enable`
355  and `trt_engine_cache_path`. We recommend setting these two provider
356  options through the `model_kwargs` parameter when using the TensorRT execution provider.
357  The usage is as follows:
358  ```python
359  embedder = OptimumTextEmbedder(
360      model="sentence-transformers/all-mpnet-base-v2",
361      onnx_execution_provider="TensorrtExecutionProvider",
362      model_kwargs={
363          "provider_options": {
364              "trt_engine_cache_enable": True,
365              "trt_engine_cache_path": "tmp/trt_cache",
366          }
367      },
368  )
369  ```
370  - `pooling_mode`: The pooling mode to use. When `None`, pooling mode will be inferred from the model config.
371  - `model_kwargs`: Dictionary containing additional keyword arguments to pass to the model.
372  In case of duplication, these kwargs override `model`, `onnx_execution_provider`
373  and `token` initialization parameters.
374  - `working_dir`: The directory to use for storing intermediate files
375  generated during model optimization/quantization. Required
376  for optimization and quantization.
377  - `optimizer_settings`: Configuration for Optimum Embedder Optimization.
378  If `None`, no additional optimization is applied.
379  - `quantizer_settings`: Configuration for Optimum Embedder Quantization.
380  If `None`, no quantization is applied.
381  
382  <a id="haystack_integrations.components.embedders.optimum.optimum_text_embedder.OptimumTextEmbedder.warm_up"></a>
383  
384  #### OptimumTextEmbedder.warm\_up
385  
386  ```python
387  def warm_up()
388  ```
389  
390  Initializes the component.
391  
392  <a id="haystack_integrations.components.embedders.optimum.optimum_text_embedder.OptimumTextEmbedder.to_dict"></a>
393  
394  #### OptimumTextEmbedder.to\_dict
395  
396  ```python
397  def to_dict() -> dict[str, Any]
398  ```
399  
400  Serializes the component to a dictionary.
401  
402  **Returns**:
403  
404  Dictionary with serialized data.
405  
406  <a id="haystack_integrations.components.embedders.optimum.optimum_text_embedder.OptimumTextEmbedder.from_dict"></a>
407  
408  #### OptimumTextEmbedder.from\_dict
409  
410  ```python
411  @classmethod
412  def from_dict(cls, data: dict[str, Any]) -> "OptimumTextEmbedder"
413  ```
414  
415  Deserializes the component from a dictionary.
416  
417  **Arguments**:
418  
419  - `data`: The dictionary to deserialize from.
420  
421  **Returns**:
422  
423  The deserialized component.
424  
425  <a id="haystack_integrations.components.embedders.optimum.optimum_text_embedder.OptimumTextEmbedder.run"></a>
426  
427  #### OptimumTextEmbedder.run
428  
429  ```python
430  @component.output_types(embedding=list[float])
431  def run(text: str) -> dict[str, list[float]]
432  ```
433  
434  Embed a string.
435  
436  **Arguments**:
437  
438  - `text`: The text to embed.
439  
440  **Raises**:
441  
442  - `TypeError`: If the input is not a string.
443  
444  **Returns**:
445  
446  The embeddings of the text.
447  
448  <a id="haystack_integrations.components.embedders.optimum.pooling"></a>
449  
450  ## Module haystack\_integrations.components.embedders.optimum.pooling
451  
452  <a id="haystack_integrations.components.embedders.optimum.pooling.OptimumEmbedderPooling"></a>
453  
454  ### OptimumEmbedderPooling
455  
456  Pooling modes supported by the Optimum Embedders.
457  
458  <a id="haystack_integrations.components.embedders.optimum.pooling.OptimumEmbedderPooling.CLS"></a>
459  
460  #### CLS
461  
462  Perform CLS Pooling on the output of the embedding model
463  using the first token (CLS token).
464  
465  <a id="haystack_integrations.components.embedders.optimum.pooling.OptimumEmbedderPooling.MEAN"></a>
466  
467  #### MEAN
468  
469  Perform Mean Pooling on the output of the embedding model.
470  
471  <a id="haystack_integrations.components.embedders.optimum.pooling.OptimumEmbedderPooling.MAX"></a>
472  
473  #### MAX
474  
475  Perform Max Pooling on the output of the embedding model
476  using the maximum value in each dimension over all the tokens.
477  
478  <a id="haystack_integrations.components.embedders.optimum.pooling.OptimumEmbedderPooling.MEAN_SQRT_LEN"></a>
479  
480  #### MEAN\_SQRT\_LEN
481  
482  Perform mean-pooling on the output of the embedding model but
483  divide by the square root of the sequence length.
484  
485  <a id="haystack_integrations.components.embedders.optimum.pooling.OptimumEmbedderPooling.WEIGHTED_MEAN"></a>
486  
487  #### WEIGHTED\_MEAN
488  
489  Perform weighted (position) mean pooling on the output of the
490  embedding model.
491  
492  <a id="haystack_integrations.components.embedders.optimum.pooling.OptimumEmbedderPooling.LAST_TOKEN"></a>
493  
494  #### LAST\_TOKEN
495  
496  Perform Last Token Pooling on the output of the embedding model.
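Each pooling mode reduces the model's per-token output (a sequence of vectors) to a single sentence vector. A stdlib-only sketch of the three most common modes; the formulas are standard, but this is an illustration, not the embedder backend's actual implementation.

```python
# Toy per-token output: three tokens, two dimensions each.
token_embeddings = [
    [1.0, 4.0],  # first token (CLS position)
    [3.0, 0.0],
    [2.0, 2.0],
]


def cls_pooling(tokens: list[list[float]]) -> list[float]:
    # Use the first token's vector as the sentence embedding.
    return tokens[0]


def mean_pooling(tokens: list[list[float]]) -> list[float]:
    # Average each dimension over all tokens.
    return [sum(dim) / len(tokens) for dim in zip(*tokens)]


def max_pooling(tokens: list[list[float]]) -> list[float]:
    # Take the maximum value in each dimension over all tokens.
    return [max(dim) for dim in zip(*tokens)]


cls_pooling(token_embeddings)   # -> [1.0, 4.0]
mean_pooling(token_embeddings)  # -> [2.0, 2.0]
max_pooling(token_embeddings)   # -> [3.0, 4.0]
```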
497  
498  <a id="haystack_integrations.components.embedders.optimum.pooling.OptimumEmbedderPooling.from_str"></a>
499  
500  #### OptimumEmbedderPooling.from\_str
501  
502  ```python
503  @classmethod
504  def from_str(cls, string: str) -> "OptimumEmbedderPooling"
505  ```
506  
507  Create a pooling mode from a string.
508  
509  **Arguments**:
510  
511  - `string`: String to convert.
512  
513  **Returns**:
514  
515  Pooling mode.
516  
517  <a id="haystack_integrations.components.embedders.optimum.quantization"></a>
518  
519  ## Module haystack\_integrations.components.embedders.optimum.quantization
520  
521  <a id="haystack_integrations.components.embedders.optimum.quantization.OptimumEmbedderQuantizationMode"></a>
522  
523  ### OptimumEmbedderQuantizationMode
524  
525  [Dynamic Quantization modes](https://huggingface.co/docs/optimum/onnxruntime/usage_guides/quantization)
526  supported by the Optimum Embedders.
527  
528  <a id="haystack_integrations.components.embedders.optimum.quantization.OptimumEmbedderQuantizationMode.ARM64"></a>
529  
530  #### ARM64
531  
532  Quantization for the ARM64 architecture.
533  
534  <a id="haystack_integrations.components.embedders.optimum.quantization.OptimumEmbedderQuantizationMode.AVX2"></a>
535  
536  #### AVX2
537  
538  Quantization with AVX-2 instructions.
539  
540  <a id="haystack_integrations.components.embedders.optimum.quantization.OptimumEmbedderQuantizationMode.AVX512"></a>
541  
542  #### AVX512
543  
544  Quantization with AVX-512 instructions.
545  
546  <a id="haystack_integrations.components.embedders.optimum.quantization.OptimumEmbedderQuantizationMode.AVX512_VNNI"></a>
547  
548  #### AVX512\_VNNI
549  
550  Quantization with AVX-512 and VNNI instructions.
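Which dynamic-quantization mode fits depends on the host CPU's architecture and instruction-set extensions. A hedged sketch of how one might pick a mode; the function and its mapping are assumptions for illustration, so verify your CPU's actual capabilities rather than relying on this logic.

```python
def pick_quantization_mode(
    machine: str,
    has_avx512_vnni: bool = False,
    has_avx512: bool = False,
) -> str:
    # Map a platform architecture string to one of the documented modes,
    # preferring the most capable instruction set available.
    if machine in ("arm64", "aarch64"):
        return "arm64"
    if has_avx512_vnni:
        return "avx512_vnni"
    if has_avx512:
        return "avx512"
    return "avx2"


pick_quantization_mode("aarch64")                   # -> "arm64"
pick_quantization_mode("x86_64", has_avx512=True)   # -> "avx512"
```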
551  
552  <a id="haystack_integrations.components.embedders.optimum.quantization.OptimumEmbedderQuantizationMode.from_str"></a>
553  
554  #### OptimumEmbedderQuantizationMode.from\_str
555  
556  ```python
557  @classmethod
558  def from_str(cls, string: str) -> "OptimumEmbedderQuantizationMode"
559  ```
560  
561  Create a quantization mode from a string.
562  
563  **Arguments**:
564  
565  - `string`: String to convert.
566  
567  **Returns**:
568  
569  Quantization mode.
570  
571  <a id="haystack_integrations.components.embedders.optimum.quantization.OptimumEmbedderQuantizationConfig"></a>
572  
573  ### OptimumEmbedderQuantizationConfig
574  
575  Configuration for Optimum Embedder Quantization.
576  
577  **Arguments**:
578  
579  - `mode`: Quantization mode.
580  - `per_channel`: Whether to apply per-channel quantization.
581  
582  <a id="haystack_integrations.components.embedders.optimum.quantization.OptimumEmbedderQuantizationConfig.to_optimum_config"></a>
583  
584  #### OptimumEmbedderQuantizationConfig.to\_optimum\_config
585  
586  ```python
587  def to_optimum_config() -> QuantizationConfig
588  ```
589  
590  Convert the configuration to an Optimum configuration.
591  
592  **Returns**:
593  
594  Optimum configuration.
595  
596  <a id="haystack_integrations.components.embedders.optimum.quantization.OptimumEmbedderQuantizationConfig.to_dict"></a>
597  
598  #### OptimumEmbedderQuantizationConfig.to\_dict
599  
600  ```python
601  def to_dict() -> dict[str, Any]
602  ```
603  
604  Convert the configuration to a dictionary.
605  
606  **Returns**:
607  
608  Dictionary with serialized data.
609  
610  <a id="haystack_integrations.components.embedders.optimum.quantization.OptimumEmbedderQuantizationConfig.from_dict"></a>
611  
612  #### OptimumEmbedderQuantizationConfig.from\_dict
613  
614  ```python
615  @classmethod
616  def from_dict(cls, data: dict[str,
617                                Any]) -> "OptimumEmbedderQuantizationConfig"
618  ```
619  
620  Create a configuration from a dictionary.
621  
622  **Arguments**:
623  
624  - `data`: Dictionary to deserialize from.
625  
626  **Returns**:
627  
628  Quantization configuration.
629