   1  ---
   2  title: "Embedders"
   3  id: embedders-api
   4  description: "Transforms queries into vectors to look for similar or relevant Documents."
   5  slug: "/embedders-api"
   6  ---
   7  
   8  
   9  ## azure_document_embedder
  10  
  11  ### AzureOpenAIDocumentEmbedder
  12  
  13  Bases: <code>OpenAIDocumentEmbedder</code>
  14  
  15  Calculates document embeddings using OpenAI models deployed on Azure.
  16  
  17  ### Usage example
  18  
  19  <!-- test-ignore -->
  20  
  21  ```python
  22  from haystack import Document
  23  from haystack.components.embedders import AzureOpenAIDocumentEmbedder
  24  
  25  doc = Document(content="I love pizza!")
  26  document_embedder = AzureOpenAIDocumentEmbedder()
  27  
  28  result = document_embedder.run([doc])
  29  print(result['documents'][0].embedding)
  30  
  31  # [0.017020374536514282, -0.023255806416273117, ...]
  32  ```
  33  
  34  #### __init__
  35  
  36  ```python
  37  __init__(
  38      azure_endpoint: str | None = None,
  39      api_version: str | None = "2023-05-15",
  40      azure_deployment: str = "text-embedding-ada-002",
  41      dimensions: int | None = None,
  42      api_key: Secret | None = Secret.from_env_var(
  43          "AZURE_OPENAI_API_KEY", strict=False
  44      ),
  45      azure_ad_token: Secret | None = Secret.from_env_var(
  46          "AZURE_OPENAI_AD_TOKEN", strict=False
  47      ),
  48      organization: str | None = None,
  49      prefix: str = "",
  50      suffix: str = "",
  51      batch_size: int = 32,
  52      progress_bar: bool = True,
  53      meta_fields_to_embed: list[str] | None = None,
  54      embedding_separator: str = "\n",
  55      timeout: float | None = None,
  56      max_retries: int | None = None,
  57      *,
  58      default_headers: dict[str, str] | None = None,
  59      azure_ad_token_provider: AzureADTokenProvider | None = None,
  60      http_client_kwargs: dict[str, Any] | None = None,
  61      raise_on_failure: bool = False
  62  ) -> None
  63  ```
  64  
  65  Creates an AzureOpenAIDocumentEmbedder component.
  66  
  67  **Parameters:**
  68  
  69  - **azure_endpoint** (<code>str | None</code>) – The endpoint of the model deployed on Azure.
  70  - **api_version** (<code>str | None</code>) – The version of the API to use.
  71  - **azure_deployment** (<code>str</code>) – The name of the model deployed on Azure. The default model is text-embedding-ada-002.
  72  - **dimensions** (<code>int | None</code>) – The number of dimensions of the resulting embeddings. Only supported in text-embedding-3
  73    and later models.
  74  - **api_key** (<code>Secret | None</code>) – The Azure OpenAI API key.
  75    You can set it with an environment variable `AZURE_OPENAI_API_KEY`, or pass with this
  76    parameter during initialization.
  77  - **azure_ad_token** (<code>Secret | None</code>) – Microsoft Entra ID token, see Microsoft's
  78    [Entra ID](https://www.microsoft.com/en-us/security/business/identity-access/microsoft-entra-id)
  79    documentation for more information. You can set it with an environment variable
  80    `AZURE_OPENAI_AD_TOKEN`, or pass with this parameter during initialization.
  81    Previously called Azure Active Directory.
  82  - **organization** (<code>str | None</code>) – Your organization ID. See OpenAI's
  83    [Setting Up Your Organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization)
  84    for more information.
  85  - **prefix** (<code>str</code>) – A string to add at the beginning of each text.
  86  - **suffix** (<code>str</code>) – A string to add at the end of each text.
  87  - **batch_size** (<code>int</code>) – Number of documents to embed at once.
  88  - **progress_bar** (<code>bool</code>) – If `True`, shows a progress bar when running.
  89  - **meta_fields_to_embed** (<code>list\[str\] | None</code>) – List of metadata fields to embed along with the document text.
  90  - **embedding_separator** (<code>str</code>) – Separator used to concatenate the metadata fields to the document text.
  91  - **timeout** (<code>float | None</code>) – The timeout for `AzureOpenAI` client calls, in seconds.
  92    If not set, defaults to either the
  93    `OPENAI_TIMEOUT` environment variable, or 30 seconds.
  94  - **max_retries** (<code>int | None</code>) – Maximum number of retries to contact AzureOpenAI after an internal error.
  95    If not set, defaults to either the `OPENAI_MAX_RETRIES` environment variable or to 5 retries.
  96  - **default_headers** (<code>dict\[str, str\] | None</code>) – Default headers to send to the AzureOpenAI client.
- **azure_ad_token_provider** (<code>AzureADTokenProvider | None</code>) – A function that returns an Azure Active Directory token. It is invoked on
  every request.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or
  `httpx.AsyncClient`. For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).
 101  - **raise_on_failure** (<code>bool</code>) – Whether to raise an exception if the embedding request fails. If `False`, the component will log the error
 102    and continue processing the remaining documents. If `True`, it will raise an exception on failure.
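
The text that is actually embedded for each document is assembled from `prefix`, the values of `meta_fields_to_embed` joined with `embedding_separator`, the document content, and `suffix`. A simplified sketch of that preparation step (illustrative only, not the component's actual implementation):

```python
# Simplified sketch of how a document embedder builds the text to embed:
# prefix + selected meta fields + separator + content + suffix.
def prepare_text(content: str, meta: dict, meta_fields_to_embed: list[str],
                 embedding_separator: str = "\n", prefix: str = "", suffix: str = "") -> str:
    values = [str(meta[field]) for field in meta_fields_to_embed
              if field in meta and meta[field] is not None]
    return prefix + embedding_separator.join(values + [content]) + suffix

text = prepare_text("I love pizza!", {"title": "Food review"}, ["title"])
print(repr(text))  # 'Food review\nI love pizza!'
```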
 103  
 104  #### to_dict
 105  
 106  ```python
 107  to_dict() -> dict[str, Any]
 108  ```
 109  
 110  Serializes the component to a dictionary.
 111  
 112  **Returns:**
 113  
 114  - <code>dict\[str, Any\]</code> – Dictionary with serialized data.
 115  
 116  #### from_dict
 117  
 118  ```python
 119  from_dict(data: dict[str, Any]) -> AzureOpenAIDocumentEmbedder
 120  ```
 121  
 122  Deserializes the component from a dictionary.
 123  
 124  **Parameters:**
 125  
 126  - **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from.
 127  
 128  **Returns:**
 129  
 130  - <code>AzureOpenAIDocumentEmbedder</code> – Deserialized component.
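
Haystack components typically serialize to a dictionary with a `type` key (the import path) and an `init_parameters` key (the constructor arguments); secrets are stored as environment-variable references, never as raw values. A minimal illustration of the round-trip contract, using a hand-written dict rather than a real `to_dict()` call:

```python
import json

# Illustrative shape of a serialized embedder (hypothetical values,
# not the output of a real to_dict() call).
serialized = {
    "type": "haystack.components.embedders.azure_document_embedder.AzureOpenAIDocumentEmbedder",
    "init_parameters": {
        "azure_deployment": "text-embedding-ada-002",
        "api_version": "2023-05-15",
        "batch_size": 32,
    },
}

# The to_dict/from_dict contract: init_parameters must survive a lossless
# round trip, e.g. through JSON when a pipeline is saved to disk.
restored = json.loads(json.dumps(serialized))
assert restored == serialized
```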
 131  
 132  ## azure_text_embedder
 133  
 134  ### AzureOpenAITextEmbedder
 135  
 136  Bases: <code>OpenAITextEmbedder</code>
 137  
 138  Embeds strings using OpenAI models deployed on Azure.
 139  
 140  ### Usage example
 141  
 142  <!-- test-ignore -->
 143  
 144  ```python
 145  from haystack.components.embedders import AzureOpenAITextEmbedder
 146  
 147  text_to_embed = "I love pizza!"
 148  text_embedder = AzureOpenAITextEmbedder()
 149  
 150  print(text_embedder.run(text_to_embed))
 151  
 152  # {'embedding': [0.017020374536514282, -0.023255806416273117, ...],
 153  # 'meta': {'model': 'text-embedding-ada-002-v2',
 154  #          'usage': {'prompt_tokens': 4, 'total_tokens': 4}}}
 155  ```
 156  
 157  #### __init__
 158  
 159  ```python
 160  __init__(
 161      azure_endpoint: str | None = None,
 162      api_version: str | None = "2023-05-15",
 163      azure_deployment: str = "text-embedding-ada-002",
 164      dimensions: int | None = None,
 165      api_key: Secret | None = Secret.from_env_var(
 166          "AZURE_OPENAI_API_KEY", strict=False
 167      ),
 168      azure_ad_token: Secret | None = Secret.from_env_var(
 169          "AZURE_OPENAI_AD_TOKEN", strict=False
 170      ),
 171      organization: str | None = None,
 172      timeout: float | None = None,
 173      max_retries: int | None = None,
 174      prefix: str = "",
 175      suffix: str = "",
 176      *,
 177      default_headers: dict[str, str] | None = None,
 178      azure_ad_token_provider: AzureADTokenProvider | None = None,
 179      http_client_kwargs: dict[str, Any] | None = None
 180  ) -> None
 181  ```
 182  
 183  Creates an AzureOpenAITextEmbedder component.
 184  
 185  **Parameters:**
 186  
 187  - **azure_endpoint** (<code>str | None</code>) – The endpoint of the model deployed on Azure.
 188  - **api_version** (<code>str | None</code>) – The version of the API to use.
 189  - **azure_deployment** (<code>str</code>) – The name of the model deployed on Azure. The default model is text-embedding-ada-002.
 190  - **dimensions** (<code>int | None</code>) – The number of dimensions the resulting output embeddings should have. Only supported in text-embedding-3
 191    and later models.
 192  - **api_key** (<code>Secret | None</code>) – The Azure OpenAI API key.
 193    You can set it with an environment variable `AZURE_OPENAI_API_KEY`, or pass with this
 194    parameter during initialization.
 195  - **azure_ad_token** (<code>Secret | None</code>) – Microsoft Entra ID token, see Microsoft's
 196    [Entra ID](https://www.microsoft.com/en-us/security/business/identity-access/microsoft-entra-id)
 197    documentation for more information. You can set it with an environment variable
 198    `AZURE_OPENAI_AD_TOKEN`, or pass with this parameter during initialization.
 199    Previously called Azure Active Directory.
 200  - **organization** (<code>str | None</code>) – Your organization ID. See OpenAI's
 201    [Setting Up Your Organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization)
 202    for more information.
 203  - **timeout** (<code>float | None</code>) – The timeout for `AzureOpenAI` client calls, in seconds.
 204    If not set, defaults to either the
 205    `OPENAI_TIMEOUT` environment variable, or 30 seconds.
 206  - **max_retries** (<code>int | None</code>) – Maximum number of retries to contact AzureOpenAI after an internal error.
 207    If not set, defaults to either the `OPENAI_MAX_RETRIES` environment variable, or to 5 retries.
 208  - **prefix** (<code>str</code>) – A string to add at the beginning of each text.
 209  - **suffix** (<code>str</code>) – A string to add at the end of each text.
 210  - **default_headers** (<code>dict\[str, str\] | None</code>) – Default headers to send to the AzureOpenAI client.
- **azure_ad_token_provider** (<code>AzureADTokenProvider | None</code>) – A function that returns an Azure Active Directory token. It is invoked on
  every request.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or
  `httpx.AsyncClient`. For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).
 215  
 216  #### to_dict
 217  
 218  ```python
 219  to_dict() -> dict[str, Any]
 220  ```
 221  
 222  Serializes the component to a dictionary.
 223  
 224  **Returns:**
 225  
 226  - <code>dict\[str, Any\]</code> – Dictionary with serialized data.
 227  
 228  #### from_dict
 229  
 230  ```python
 231  from_dict(data: dict[str, Any]) -> AzureOpenAITextEmbedder
 232  ```
 233  
 234  Deserializes the component from a dictionary.
 235  
 236  **Parameters:**
 237  
 238  - **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from.
 239  
 240  **Returns:**
 241  
 242  - <code>AzureOpenAITextEmbedder</code> – Deserialized component.
 243  
 244  ## hugging_face_api_document_embedder
 245  
 246  ### HuggingFaceAPIDocumentEmbedder
 247  
 248  Embeds documents using Hugging Face APIs.
 249  
 250  Use it with the following Hugging Face APIs:
 251  
 252  - [Free Serverless Inference API](https://huggingface.co/inference-api)
 253  - [Paid Inference Endpoints](https://huggingface.co/inference-endpoints)
 254  - [Self-hosted Text Embeddings Inference](https://github.com/huggingface/text-embeddings-inference)
 255  
 256  ### Usage examples
 257  
 258  #### With free serverless inference API
 259  
 260  <!-- test-ignore -->
 261  
 262  ```python
 263  from haystack.components.embedders import HuggingFaceAPIDocumentEmbedder
 264  from haystack.utils import Secret
 265  from haystack.dataclasses import Document
 266  
 267  doc = Document(content="I love pizza!")
 268  
 269  doc_embedder = HuggingFaceAPIDocumentEmbedder(api_type="serverless_inference_api",
 270                                                api_params={"model": "BAAI/bge-small-en-v1.5"},
 271                                                token=Secret.from_token("<your-api-key>"))
 272  
result = doc_embedder.run([doc])
 274  print(result["documents"][0].embedding)
 275  
 276  # [0.017020374536514282, -0.023255806416273117, ...]
 277  ```
 278  
 279  #### With paid inference endpoints
 280  
 281  <!-- test-ignore -->
 282  
 283  ```python
 284  from haystack.components.embedders import HuggingFaceAPIDocumentEmbedder
 285  from haystack.utils import Secret
 286  from haystack.dataclasses import Document
 287  
 288  doc = Document(content="I love pizza!")
 289  
 290  doc_embedder = HuggingFaceAPIDocumentEmbedder(api_type="inference_endpoints",
 291                                                api_params={"url": "<your-inference-endpoint-url>"},
 292                                                token=Secret.from_token("<your-api-key>"))
 293  
result = doc_embedder.run([doc])
 295  print(result["documents"][0].embedding)
 296  
 297  # [0.017020374536514282, -0.023255806416273117, ...]
 298  ```
 299  
 300  #### With self-hosted text embeddings inference
 301  
 302  <!-- test-ignore -->
 303  
 304  ```python
 305  from haystack.components.embedders import HuggingFaceAPIDocumentEmbedder
 306  from haystack.dataclasses import Document
 307  
 308  doc = Document(content="I love pizza!")
 309  
 310  doc_embedder = HuggingFaceAPIDocumentEmbedder(api_type="text_embeddings_inference",
 311                                                api_params={"url": "http://localhost:8080"})
 312  
 313  result = document_embedder.run([doc])
 314  print(result["documents"][0].embedding)
 315  
 316  # [0.017020374536514282, -0.023255806416273117, ...]
 317  ```
 318  
 319  #### __init__
 320  
 321  ```python
 322  __init__(
 323      api_type: HFEmbeddingAPIType | str,
 324      api_params: dict[str, str],
 325      token: Secret | None = Secret.from_env_var(
 326          ["HF_API_TOKEN", "HF_TOKEN"], strict=False
 327      ),
 328      prefix: str = "",
 329      suffix: str = "",
 330      truncate: bool | None = True,
 331      normalize: bool | None = False,
 332      batch_size: int = 32,
 333      progress_bar: bool = True,
 334      meta_fields_to_embed: list[str] | None = None,
 335      embedding_separator: str = "\n",
 336      concurrency_limit: int = 4,
 337  ) -> None
 338  ```
 339  
 340  Creates a HuggingFaceAPIDocumentEmbedder component.
 341  
 342  **Parameters:**
 343  
 344  - **api_type** (<code>HFEmbeddingAPIType | str</code>) – The type of Hugging Face API to use.
- **api_params** (<code>dict\[str, str\]</code>) – A dictionary with the following keys:
  - `model`: Hugging Face model ID. Required when `api_type` is `SERVERLESS_INFERENCE_API`.
  - `url`: URL of the inference endpoint. Required when `api_type` is `INFERENCE_ENDPOINTS` or
    `TEXT_EMBEDDINGS_INFERENCE`.
 349  - **token** (<code>Secret | None</code>) – The Hugging Face token to use as HTTP bearer authorization.
 350    Check your HF token in your [account settings](https://huggingface.co/settings/tokens).
 351  - **prefix** (<code>str</code>) – A string to add at the beginning of each text.
 352  - **suffix** (<code>str</code>) – A string to add at the end of each text.
 353  - **truncate** (<code>bool | None</code>) – Truncates the input text to the maximum length supported by the model.
 354    Applicable when `api_type` is `TEXT_EMBEDDINGS_INFERENCE`, or `INFERENCE_ENDPOINTS`
 355    if the backend uses Text Embeddings Inference.
 356    If `api_type` is `SERVERLESS_INFERENCE_API`, this parameter is ignored.
 357  - **normalize** (<code>bool | None</code>) – Normalizes the embeddings to unit length.
 358    Applicable when `api_type` is `TEXT_EMBEDDINGS_INFERENCE`, or `INFERENCE_ENDPOINTS`
 359    if the backend uses Text Embeddings Inference.
 360    If `api_type` is `SERVERLESS_INFERENCE_API`, this parameter is ignored.
 361  - **batch_size** (<code>int</code>) – Number of documents to process at once.
 362  - **progress_bar** (<code>bool</code>) – If `True`, shows a progress bar when running.
 363  - **meta_fields_to_embed** (<code>list\[str\] | None</code>) – List of metadata fields to embed along with the document text.
 364  - **embedding_separator** (<code>str</code>) – Separator used to concatenate the metadata fields to the document text.
 365  - **concurrency_limit** (<code>int</code>) – The maximum number of requests that should be allowed to run concurrently.
 366    This parameter is only used in the `run_async` method.
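
The pairing between `api_type` and the required `api_params` key can be made explicit with a small validation sketch (illustrative only, not the component's actual code):

```python
# Which api_params key each api_type requires, per the parameter docs above.
REQUIRED_KEY = {
    "serverless_inference_api": "model",
    "inference_endpoints": "url",
    "text_embeddings_inference": "url",
}

def check_api_params(api_type: str, api_params: dict) -> None:
    # Raise early if the key required by this api_type is missing.
    required = REQUIRED_KEY[api_type]
    if required not in api_params:
        raise ValueError(f"api_type '{api_type}' requires api_params['{required}']")

check_api_params("serverless_inference_api", {"model": "BAAI/bge-small-en-v1.5"})  # ok
```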
 367  
 368  #### to_dict
 369  
 370  ```python
 371  to_dict() -> dict[str, Any]
 372  ```
 373  
 374  Serializes the component to a dictionary.
 375  
 376  **Returns:**
 377  
 378  - <code>dict\[str, Any\]</code> – Dictionary with serialized data.
 379  
 380  #### from_dict
 381  
 382  ```python
 383  from_dict(data: dict[str, Any]) -> HuggingFaceAPIDocumentEmbedder
 384  ```
 385  
 386  Deserializes the component from a dictionary.
 387  
 388  **Parameters:**
 389  
 390  - **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from.
 391  
 392  **Returns:**
 393  
 394  - <code>HuggingFaceAPIDocumentEmbedder</code> – Deserialized component.
 395  
 396  #### run
 397  
 398  ```python
 399  run(documents: list[Document]) -> dict[str, list[Document]]
 400  ```
 401  
 402  Embeds a list of documents.
 403  
 404  **Parameters:**
 405  
 406  - **documents** (<code>list\[Document\]</code>) – Documents to embed.
 407  
 408  **Returns:**
 409  
- <code>dict\[str, list\[Document\]\]</code> – A dictionary with the following keys:
  - `documents`: A list of documents with embeddings.
 412  
 413  #### run_async
 414  
 415  ```python
 416  run_async(documents: list[Document]) -> dict[str, list[Document]]
 417  ```
 418  
 419  Embeds a list of documents asynchronously.
 420  
 421  **Parameters:**
 422  
 423  - **documents** (<code>list\[Document\]</code>) – Documents to embed.
 424  
 425  **Returns:**
 426  
- <code>dict\[str, list\[Document\]\]</code> – A dictionary with the following keys:
  - `documents`: A list of documents with embeddings.
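
`run_async` issues embedding requests concurrently, bounded by the `concurrency_limit` set at construction time. The usual pattern for such a bound is an `asyncio.Semaphore`; a standalone sketch with a stand-in coroutine instead of a real API call:

```python
import asyncio

async def embed_batch(batch: list[str]) -> list[str]:
    # Stand-in for a real embedding request.
    await asyncio.sleep(0)
    return [f"embedding({text})" for text in batch]

async def embed_all(batches: list[list[str]], concurrency_limit: int = 4):
    semaphore = asyncio.Semaphore(concurrency_limit)

    async def bounded(batch):
        async with semaphore:  # at most `concurrency_limit` requests in flight
            return await embed_batch(batch)

    return await asyncio.gather(*(bounded(b) for b in batches))

results = asyncio.run(embed_all([["a", "b"], ["c"]], concurrency_limit=2))
print(results)  # [['embedding(a)', 'embedding(b)'], ['embedding(c)']]
```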
 429  
 430  ## hugging_face_api_text_embedder
 431  
 432  ### HuggingFaceAPITextEmbedder
 433  
 434  Embeds strings using Hugging Face APIs.
 435  
 436  Use it with the following Hugging Face APIs:
 437  
 438  - [Free Serverless Inference API](https://huggingface.co/inference-api)
 439  - [Paid Inference Endpoints](https://huggingface.co/inference-endpoints)
 440  - [Self-hosted Text Embeddings Inference](https://github.com/huggingface/text-embeddings-inference)
 441  
 442  ### Usage examples
 443  
 444  #### With free serverless inference API
 445  
 446  <!-- test-ignore -->
 447  
 448  ```python
 449  from haystack.components.embedders import HuggingFaceAPITextEmbedder
 450  from haystack.utils import Secret
 451  
 452  text_embedder = HuggingFaceAPITextEmbedder(api_type="serverless_inference_api",
 453                                             api_params={"model": "BAAI/bge-small-en-v1.5"},
 454                                             token=Secret.from_token("<your-api-key>"))
 455  
 456  print(text_embedder.run("I love pizza!"))
 457  
# {'embedding': [0.017020374536514282, -0.023255806416273117, ...]}
 459  ```
 460  
 461  #### With paid inference endpoints
 462  
 463  <!-- test-ignore -->
 464  
 465  ```python
 466  from haystack.components.embedders import HuggingFaceAPITextEmbedder
from haystack.utils import Secret

text_embedder = HuggingFaceAPITextEmbedder(api_type="inference_endpoints",
                                           api_params={"url": "<your-inference-endpoint-url>"},
                                           token=Secret.from_token("<your-api-key>"))
 471  
 472  print(text_embedder.run("I love pizza!"))
 473  
# {'embedding': [0.017020374536514282, -0.023255806416273117, ...]}
 475  ```
 476  
 477  #### With self-hosted text embeddings inference
 478  
 479  <!-- test-ignore -->
 480  
 481  ```python
 482  from haystack.components.embedders import HuggingFaceAPITextEmbedder
 484  
 485  text_embedder = HuggingFaceAPITextEmbedder(api_type="text_embeddings_inference",
 486                                             api_params={"url": "http://localhost:8080"})
 487  
 488  print(text_embedder.run("I love pizza!"))
 489  
# {'embedding': [0.017020374536514282, -0.023255806416273117, ...]}
 491  ```
 492  
 493  #### __init__
 494  
 495  ```python
 496  __init__(
 497      api_type: HFEmbeddingAPIType | str,
 498      api_params: dict[str, str],
 499      token: Secret | None = Secret.from_env_var(
 500          ["HF_API_TOKEN", "HF_TOKEN"], strict=False
 501      ),
 502      prefix: str = "",
 503      suffix: str = "",
 504      truncate: bool | None = True,
 505      normalize: bool | None = False,
 506  ) -> None
 507  ```
 508  
 509  Creates a HuggingFaceAPITextEmbedder component.
 510  
 511  **Parameters:**
 512  
 513  - **api_type** (<code>HFEmbeddingAPIType | str</code>) – The type of Hugging Face API to use.
- **api_params** (<code>dict\[str, str\]</code>) – A dictionary with the following keys:
  - `model`: Hugging Face model ID. Required when `api_type` is `SERVERLESS_INFERENCE_API`.
  - `url`: URL of the inference endpoint. Required when `api_type` is `INFERENCE_ENDPOINTS` or
    `TEXT_EMBEDDINGS_INFERENCE`.
 518  - **token** (<code>Secret | None</code>) – The Hugging Face token to use as HTTP bearer authorization.
 519    Check your HF token in your [account settings](https://huggingface.co/settings/tokens).
 520  - **prefix** (<code>str</code>) – A string to add at the beginning of each text.
 521  - **suffix** (<code>str</code>) – A string to add at the end of each text.
 522  - **truncate** (<code>bool | None</code>) – Truncates the input text to the maximum length supported by the model.
 523    Applicable when `api_type` is `TEXT_EMBEDDINGS_INFERENCE`, or `INFERENCE_ENDPOINTS`
 524    if the backend uses Text Embeddings Inference.
 525    If `api_type` is `SERVERLESS_INFERENCE_API`, this parameter is ignored.
 526  - **normalize** (<code>bool | None</code>) – Normalizes the embeddings to unit length.
 527    Applicable when `api_type` is `TEXT_EMBEDDINGS_INFERENCE`, or `INFERENCE_ENDPOINTS`
 528    if the backend uses Text Embeddings Inference.
 529    If `api_type` is `SERVERLESS_INFERENCE_API`, this parameter is ignored.
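
What `normalize=True` asks the backend to do is plain L2 normalization: each embedding is scaled to unit length, so cosine similarity reduces to a dot product. In pure Python:

```python
import math

def l2_normalize(vector: list[float]) -> list[float]:
    # Scale the vector so its Euclidean (L2) norm is 1.
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

print(l2_normalize([3.0, 4.0]))  # [0.6, 0.8]
```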
 530  
 531  #### to_dict
 532  
 533  ```python
 534  to_dict() -> dict[str, Any]
 535  ```
 536  
 537  Serializes the component to a dictionary.
 538  
 539  **Returns:**
 540  
 541  - <code>dict\[str, Any\]</code> – Dictionary with serialized data.
 542  
 543  #### from_dict
 544  
 545  ```python
 546  from_dict(data: dict[str, Any]) -> HuggingFaceAPITextEmbedder
 547  ```
 548  
 549  Deserializes the component from a dictionary.
 550  
 551  **Parameters:**
 552  
 553  - **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from.
 554  
 555  **Returns:**
 556  
 557  - <code>HuggingFaceAPITextEmbedder</code> – Deserialized component.
 558  
 559  #### run
 560  
 561  ```python
 562  run(text: str) -> dict[str, Any]
 563  ```
 564  
 565  Embeds a single string.
 566  
 567  **Parameters:**
 568  
 569  - **text** (<code>str</code>) – Text to embed.
 570  
 571  **Returns:**
 572  
- <code>dict\[str, Any\]</code> – A dictionary with the following keys:
  - `embedding`: The embedding of the input text.
 575  
 576  #### run_async
 577  
 578  ```python
 579  run_async(text: str) -> dict[str, Any]
 580  ```
 581  
 582  Embeds a single string asynchronously.
 583  
 584  **Parameters:**
 585  
 586  - **text** (<code>str</code>) – Text to embed.
 587  
 588  **Returns:**
 589  
- <code>dict\[str, Any\]</code> – A dictionary with the following keys:
  - `embedding`: The embedding of the input text.
 592  
 593  ## image/sentence_transformers_doc_image_embedder
 594  
 595  ### SentenceTransformersDocumentImageEmbedder
 596  
 597  A component for computing Document embeddings based on images using Sentence Transformers models.
 598  
 599  The embedding of each Document is stored in the `embedding` field of the Document.
 600  
 601  ### Usage example
 602  
 603  <!-- test-ignore -->
 604  
 605  ```python
 606  from haystack import Document
 607  from haystack.components.embedders.image import SentenceTransformersDocumentImageEmbedder
 608  
 609  embedder = SentenceTransformersDocumentImageEmbedder(model="sentence-transformers/clip-ViT-B-32")
 610  
 611  documents = [
 612      Document(content="A photo of a cat", meta={"file_path": "cat.jpg"}),
 613      Document(content="A photo of a dog", meta={"file_path": "dog.jpg"}),
 614  ]
 615  
 616  result = embedder.run(documents=documents)
 617  documents_with_embeddings = result["documents"]
 618  print(documents_with_embeddings)
 619  
 620  # [Document(id=...,
 621  #           content='A photo of a cat',
 622  #           meta={'file_path': 'cat.jpg',
 623  #                 'embedding_source': {'type': 'image', 'file_path_meta_field': 'file_path'}},
 624  #           embedding=vector of size 512),
 625  #  ...]
 626  ```
 627  
 628  #### __init__
 629  
 630  ```python
 631  __init__(
 632      *,
 633      file_path_meta_field: str = "file_path",
 634      root_path: str | None = None,
 635      model: str = "sentence-transformers/clip-ViT-B-32",
 636      device: ComponentDevice | None = None,
 637      token: Secret | None = Secret.from_env_var(
 638          ["HF_API_TOKEN", "HF_TOKEN"], strict=False
 639      ),
 640      batch_size: int = 32,
 641      progress_bar: bool = True,
 642      normalize_embeddings: bool = False,
 643      trust_remote_code: bool = False,
 644      local_files_only: bool = False,
 645      model_kwargs: dict[str, Any] | None = None,
 646      tokenizer_kwargs: dict[str, Any] | None = None,
 647      config_kwargs: dict[str, Any] | None = None,
 648      precision: Literal[
 649          "float32", "int8", "uint8", "binary", "ubinary"
 650      ] = "float32",
 651      encode_kwargs: dict[str, Any] | None = None,
 652      backend: Literal["torch", "onnx", "openvino"] = "torch"
 653  ) -> None
 654  ```
 655  
Creates a SentenceTransformersDocumentImageEmbedder component.
 657  
 658  **Parameters:**
 659  
 660  - **file_path_meta_field** (<code>str</code>) – The metadata field in the Document that contains the file path to the image or PDF.
 661  - **root_path** (<code>str | None</code>) – The root directory path where document files are located. If provided, file paths in
 662    document metadata will be resolved relative to this path. If None, file paths are treated as absolute paths.
- **model** (<code>str</code>) – The Sentence Transformers model to use for calculating embeddings. Pass a local path or ID of the model on
  Hugging Face. To be used with this component, the model must be able to embed images and text into the same
  vector space. Compatible models include:
  - "sentence-transformers/clip-ViT-B-32"
  - "sentence-transformers/clip-ViT-L-14"
  - "sentence-transformers/clip-ViT-B-16"
  - "sentence-transformers/clip-ViT-B-32-multilingual-v1"
  - "jinaai/jina-embeddings-v4"
  - "jinaai/jina-clip-v1"
  - "jinaai/jina-clip-v2"
 673  - **device** (<code>ComponentDevice | None</code>) – The device to use for loading the model.
 674    Overrides the default device.
 675  - **token** (<code>Secret | None</code>) – The API token to download private models from Hugging Face.
 676  - **batch_size** (<code>int</code>) – Number of documents to embed at once.
 677  - **progress_bar** (<code>bool</code>) – If `True`, shows a progress bar when embedding documents.
 678  - **normalize_embeddings** (<code>bool</code>) – If `True`, the embeddings are normalized using L2 normalization, so that each embedding has a norm of 1.
 679  - **trust_remote_code** (<code>bool</code>) – If `False`, allows only Hugging Face verified model architectures.
 680    If `True`, allows custom models and scripts.
 681  - **local_files_only** (<code>bool</code>) – If `True`, does not attempt to download the model from Hugging Face Hub and only looks at local files.
 682  - **model_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for `AutoModelForSequenceClassification.from_pretrained`
 683    when loading the model. Refer to specific model documentation for available kwargs.
 684  - **tokenizer_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for `AutoTokenizer.from_pretrained` when loading the tokenizer.
 685    Refer to specific model documentation for available kwargs.
 686  - **config_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for `AutoConfig.from_pretrained` when loading the model configuration.
 687  - **precision** (<code>Literal['float32', 'int8', 'uint8', 'binary', 'ubinary']</code>) – The precision to use for the embeddings.
 688    All non-float32 precisions are quantized embeddings.
 689    Quantized embeddings are smaller and faster to compute, but may have a lower accuracy.
 690    They are useful for reducing the size of the embeddings of a corpus for semantic search, among other tasks.
 691  - **encode_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for `SentenceTransformer.encode` when embedding documents.
 692    This parameter is provided for fine customization. Be careful not to clash with already set parameters and
 693    avoid passing parameters that change the output type.
 694  - **backend** (<code>Literal['torch', 'onnx', 'openvino']</code>) – The backend to use for the Sentence Transformers model. Choose from "torch", "onnx", or "openvino".
 695    Refer to the [Sentence Transformers documentation](https://sbert.net/docs/sentence_transformer/usage/efficiency.html)
 696    for more information on acceleration and quantization options.
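
The non-float32 `precision` options store each dimension in fewer bits. For example, `uint8` quantization maps each float to an integer in [0, 255] over a calibration range; the following is a sketch of the general idea, not Sentence Transformers' exact calibration procedure:

```python
def quantize_uint8(vector: list[float], lo: float, hi: float) -> list[int]:
    # Map each value from the calibration range [lo, hi] onto 0..255,
    # clipping values that fall outside the range.
    scale = 255.0 / (hi - lo)
    return [max(0, min(255, round((x - lo) * scale))) for x in vector]

print(quantize_uint8([-1.0, 0.0, 1.0], lo=-1.0, hi=1.0))  # [0, 128, 255]
```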
 697  
 698  #### to_dict
 699  
 700  ```python
 701  to_dict() -> dict[str, Any]
 702  ```
 703  
 704  Serializes the component to a dictionary.
 705  
 706  **Returns:**
 707  
 708  - <code>dict\[str, Any\]</code> – Dictionary with serialized data.
 709  
 710  #### from_dict
 711  
 712  ```python
 713  from_dict(data: dict[str, Any]) -> SentenceTransformersDocumentImageEmbedder
 714  ```
 715  
 716  Deserializes the component from a dictionary.
 717  
 718  **Parameters:**
 719  
 720  - **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from.
 721  
 722  **Returns:**
 723  
 724  - <code>SentenceTransformersDocumentImageEmbedder</code> – Deserialized component.
 725  
 726  #### warm_up
 727  
 728  ```python
 729  warm_up() -> None
 730  ```
 731  
 732  Initializes the component.
 733  
 734  #### run
 735  
 736  ```python
 737  run(documents: list[Document]) -> dict[str, list[Document]]
 738  ```
 739  
Embeds a list of documents.
 741  
 742  **Parameters:**
 743  
 744  - **documents** (<code>list\[Document\]</code>) – Documents to embed.
 745  
 746  **Returns:**
 747  
- <code>dict\[str, list\[Document\]\]</code> – A dictionary with the following keys:
  - `documents`: Documents with embeddings.
 750  
 751  ## openai_document_embedder
 752  
 753  ### OpenAIDocumentEmbedder
 754  
 755  Computes document embeddings using OpenAI models.
 756  
 757  ### Usage example
 758  
 759  <!-- test-ignore -->
 760  
 761  ```python
 762  from haystack import Document
 763  from haystack.components.embedders import OpenAIDocumentEmbedder
 764  
 765  doc = Document(content="I love pizza!")
 766  document_embedder = OpenAIDocumentEmbedder()
 767  result = document_embedder.run([doc])
 768  
 769  print(result['documents'][0].embedding)
 770  
 771  # [0.017020374536514282, -0.023255806416273117, ...]
 772  ```
 773  
 774  #### __init__
 775  
 776  ```python
 777  __init__(
 778      api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
 779      model: str = "text-embedding-ada-002",
 780      dimensions: int | None = None,
 781      api_base_url: str | None = None,
 782      organization: str | None = None,
 783      prefix: str = "",
 784      suffix: str = "",
 785      batch_size: int = 32,
 786      progress_bar: bool = True,
 787      meta_fields_to_embed: list[str] | None = None,
 788      embedding_separator: str = "\n",
 789      timeout: float | None = None,
 790      max_retries: int | None = None,
 791      http_client_kwargs: dict[str, Any] | None = None,
 792      *,
 793      raise_on_failure: bool = False
 794  ) -> None
 795  ```
 796  
 797  Creates an OpenAIDocumentEmbedder component.
 798  
 799  Before initializing the component, you can set the 'OPENAI_TIMEOUT' and 'OPENAI_MAX_RETRIES'
 800  environment variables to override the `timeout` and `max_retries` parameters respectively
 801  in the OpenAI client.
 802  
 803  **Parameters:**
 804  
 805  - **api_key** (<code>Secret</code>) – The OpenAI API key.
  You can set it with the environment variable `OPENAI_API_KEY`, or pass it with this parameter
 807    during initialization.
 808  - **model** (<code>str</code>) – The name of the model to use for calculating embeddings.
 809    The default model is `text-embedding-ada-002`.
 810  - **dimensions** (<code>int | None</code>) – The number of dimensions of the resulting embeddings. Only `text-embedding-3` and
 811    later models support this parameter.
 812  - **api_base_url** (<code>str | None</code>) – Overrides the default base URL for all HTTP requests.
 813  - **organization** (<code>str | None</code>) – Your OpenAI organization ID. See OpenAI's
 814    [Setting Up Your Organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization)
 815    for more information.
 816  - **prefix** (<code>str</code>) – A string to add at the beginning of each text.
 817  - **suffix** (<code>str</code>) – A string to add at the end of each text.
 818  - **batch_size** (<code>int</code>) – Number of documents to embed at once.
 819  - **progress_bar** (<code>bool</code>) – If `True`, shows a progress bar when running.
 820  - **meta_fields_to_embed** (<code>list\[str\] | None</code>) – List of metadata fields to embed along with the document text.
 821  - **embedding_separator** (<code>str</code>) – Separator used to concatenate the metadata fields to the document text.
 822  - **timeout** (<code>float | None</code>) – Timeout for OpenAI client calls. If not set, it defaults to either the
 823    `OPENAI_TIMEOUT` environment variable, or 30 seconds.
 824  - **max_retries** (<code>int | None</code>) – Maximum number of retries to contact OpenAI after an internal error.
 825    If not set, it defaults to either the `OPENAI_MAX_RETRIES` environment variable, or 5 retries.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
 827    For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).
 828  - **raise_on_failure** (<code>bool</code>) – Whether to raise an exception if the embedding request fails. If `False`, the component will log the error
 829    and continue processing the remaining documents. If `True`, it will raise an exception on failure.
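
The interplay of `prefix`, `suffix`, `meta_fields_to_embed`, and `embedding_separator` can be sketched as follows. This is an illustrative approximation of how the text to embed is assembled for each document, not the component's actual implementation:

```python
# Simplified sketch: how the text sent to the embedding model is assembled
# from a document's content and selected metadata fields. Illustrative only.

def text_to_embed(content, meta, meta_fields_to_embed, embedding_separator="\n",
                  prefix="", suffix=""):
    # Selected metadata values are joined with the content using the separator,
    # then the prefix and suffix are wrapped around the result.
    meta_values = [str(meta[field]) for field in meta_fields_to_embed
                   if meta.get(field) is not None]
    combined = embedding_separator.join(meta_values + [content])
    return prefix + combined + suffix

doc_meta = {"title": "Pizza facts", "source": "blog"}
print(text_to_embed("I love pizza!", doc_meta, meta_fields_to_embed=["title"]))
# Pizza facts
# I love pizza!
```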
 830  
 831  #### to_dict
 832  
 833  ```python
 834  to_dict() -> dict[str, Any]
 835  ```
 836  
 837  Serializes the component to a dictionary.
 838  
 839  **Returns:**
 840  
 841  - <code>dict\[str, Any\]</code> – Dictionary with serialized data.
 842  
 843  #### from_dict
 844  
 845  ```python
 846  from_dict(data: dict[str, Any]) -> OpenAIDocumentEmbedder
 847  ```
 848  
 849  Deserializes the component from a dictionary.
 850  
 851  **Parameters:**
 852  
 853  - **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from.
 854  
 855  **Returns:**
 856  
 857  - <code>OpenAIDocumentEmbedder</code> – Deserialized component.
 858  
 859  #### run
 860  
 861  ```python
 862  run(documents: list[Document]) -> dict[str, Any]
 863  ```
 864  
 865  Embeds a list of documents.
 866  
 867  **Parameters:**
 868  
 869  - **documents** (<code>list\[Document\]</code>) – A list of documents to embed.
 870  
 871  **Returns:**
 872  
 873  - <code>dict\[str, Any\]</code> – A dictionary with the following keys:
 874  - `documents`: A list of documents with embeddings.
 875  - `meta`: Information about the usage of the model.
 876  
 877  #### run_async
 878  
 879  ```python
 880  run_async(documents: list[Document]) -> dict[str, Any]
 881  ```
 882  
 883  Embeds a list of documents asynchronously.
 884  
 885  **Parameters:**
 886  
 887  - **documents** (<code>list\[Document\]</code>) – A list of documents to embed.
 888  
 889  **Returns:**
 890  
 891  - <code>dict\[str, Any\]</code> – A dictionary with the following keys:
 892  - `documents`: A list of documents with embeddings.
 893  - `meta`: Information about the usage of the model.
 894  
 895  ## openai_text_embedder
 896  
 897  ### OpenAITextEmbedder
 898  
 899  Embeds strings using OpenAI models.
 900  
You can use it to embed a user query and send it to an embedding Retriever.
 902  
 903  ### Usage example
 904  
 905  <!-- test-ignore -->
 906  
 907  ```python
 908  from haystack.components.embedders import OpenAITextEmbedder
 909  
 910  text_to_embed = "I love pizza!"
 911  text_embedder = OpenAITextEmbedder()
 912  
 913  print(text_embedder.run(text_to_embed))
 914  
 915  # {'embedding': [0.017020374536514282, -0.023255806416273117, ...],
 916  # 'meta': {'model': 'text-embedding-ada-002-v2',
 917  #          'usage': {'prompt_tokens': 4, 'total_tokens': 4}}}
 918  ```
 919  
 920  #### __init__
 921  
 922  ```python
 923  __init__(
 924      api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
 925      model: str = "text-embedding-ada-002",
 926      dimensions: int | None = None,
 927      api_base_url: str | None = None,
 928      organization: str | None = None,
 929      prefix: str = "",
 930      suffix: str = "",
 931      timeout: float | None = None,
 932      max_retries: int | None = None,
 933      http_client_kwargs: dict[str, Any] | None = None,
 934  ) -> None
 935  ```
 936  
 937  Creates an OpenAITextEmbedder component.
 938  
 939  Before initializing the component, you can set the 'OPENAI_TIMEOUT' and 'OPENAI_MAX_RETRIES'
 940  environment variables to override the `timeout` and `max_retries` parameters respectively
 941  in the OpenAI client.
 942  
 943  **Parameters:**
 944  
 945  - **api_key** (<code>Secret</code>) – The OpenAI API key.
  You can set it with the environment variable `OPENAI_API_KEY`, or pass it with this parameter
 947    during initialization.
 948  - **model** (<code>str</code>) – The name of the model to use for calculating embeddings.
 949    The default model is `text-embedding-ada-002`.
 950  - **dimensions** (<code>int | None</code>) – The number of dimensions of the resulting embeddings. Only `text-embedding-3` and
 951    later models support this parameter.
- **api_base_url** (<code>str | None</code>) – Overrides the default base URL for all HTTP requests.
 953  - **organization** (<code>str | None</code>) – Your organization ID. See OpenAI's
 954    [production best practices](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization)
 955    for more information.
 956  - **prefix** (<code>str</code>) – A string to add at the beginning of each text to embed.
 957  - **suffix** (<code>str</code>) – A string to add at the end of each text to embed.
 958  - **timeout** (<code>float | None</code>) – Timeout for OpenAI client calls. If not set, it defaults to either the
 959    `OPENAI_TIMEOUT` environment variable, or 30 seconds.
 960  - **max_retries** (<code>int | None</code>) – Maximum number of retries to contact OpenAI after an internal error.
  If not set, it defaults to either the `OPENAI_MAX_RETRIES` environment variable, or 5 retries.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
 963    For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).
 964  
 965  #### to_dict
 966  
 967  ```python
 968  to_dict() -> dict[str, Any]
 969  ```
 970  
 971  Serializes the component to a dictionary.
 972  
 973  **Returns:**
 974  
 975  - <code>dict\[str, Any\]</code> – Dictionary with serialized data.
 976  
 977  #### from_dict
 978  
 979  ```python
 980  from_dict(data: dict[str, Any]) -> OpenAITextEmbedder
 981  ```
 982  
 983  Deserializes the component from a dictionary.
 984  
 985  **Parameters:**
 986  
 987  - **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from.
 988  
 989  **Returns:**
 990  
 991  - <code>OpenAITextEmbedder</code> – Deserialized component.
 992  
 993  #### run
 994  
 995  ```python
 996  run(text: str) -> dict[str, Any]
 997  ```
 998  
 999  Embeds a single string.
1000  
1001  **Parameters:**
1002  
1003  - **text** (<code>str</code>) – Text to embed.
1004  
1005  **Returns:**
1006  
1007  - <code>dict\[str, Any\]</code> – A dictionary with the following keys:
1008  - `embedding`: The embedding of the input text.
1009  - `meta`: Information about the usage of the model.
1010  
1011  #### run_async
1012  
1013  ```python
1014  run_async(text: str) -> dict[str, Any]
1015  ```
1016  
1017  Asynchronously embed a single string.
1018  
1019  This is the asynchronous version of the `run` method. It has the same parameters and return values
1020  but can be used with `await` in async code.
1021  
1022  **Parameters:**
1023  
1024  - **text** (<code>str</code>) – Text to embed.
1025  
1026  **Returns:**
1027  
1028  - <code>dict\[str, Any\]</code> – A dictionary with the following keys:
1029  - `embedding`: The embedding of the input text.
1030  - `meta`: Information about the usage of the model.
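
With `run_async` you can embed several texts concurrently. The sketch below substitutes a hypothetical stand-in coroutine for `OpenAITextEmbedder.run_async` so it runs offline; with the real component you would `await text_embedder.run_async(text)` in the same way:

```python
import asyncio

# Stand-in for OpenAITextEmbedder.run_async, so this sketch runs offline.
# The real method returns {"embedding": [...], "meta": {...}}.
async def fake_run_async(text: str) -> dict:
    await asyncio.sleep(0)  # simulate the non-blocking API call
    return {"embedding": [float(len(text))], "meta": {"model": "stub"}}

async def embed_all(texts):
    # All requests are awaited together, so they run concurrently
    results = await asyncio.gather(*(fake_run_async(t) for t in texts))
    return [r["embedding"] for r in results]

embeddings = asyncio.run(embed_all(["I love pizza!", "I love pasta!"]))
print(embeddings)  # [[13.0], [13.0]]
```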
1031  
1032  ## sentence_transformers_document_embedder
1033  
1034  ### SentenceTransformersDocumentEmbedder
1035  
1036  Calculates document embeddings using Sentence Transformers models.
1037  
1038  It stores the embeddings in the `embedding` metadata field of each document.
1039  You can also embed documents' metadata.
1040  Use this component in indexing pipelines to embed input documents
1041  and send them to DocumentWriter to write into a Document Store.
1042  
### Usage example
1044  
1045  <!-- test-ignore -->
1046  
1047  ```python
1048  from haystack import Document
1049  from haystack.components.embedders import SentenceTransformersDocumentEmbedder
1050  doc = Document(content="I love pizza!")
1051  doc_embedder = SentenceTransformersDocumentEmbedder()
1052  
1053  result = doc_embedder.run([doc])
1054  print(result['documents'][0].embedding)
1055  
1056  # [-0.07804739475250244, 0.1498992145061493, ...]
1057  ```
1058  
1059  #### __init__
1060  
1061  ```python
1062  __init__(
1063      model: str = "sentence-transformers/all-mpnet-base-v2",
1064      device: ComponentDevice | None = None,
1065      token: Secret | None = Secret.from_env_var(
1066          ["HF_API_TOKEN", "HF_TOKEN"], strict=False
1067      ),
1068      prefix: str = "",
1069      suffix: str = "",
1070      batch_size: int = 32,
1071      progress_bar: bool = True,
1072      normalize_embeddings: bool = False,
1073      meta_fields_to_embed: list[str] | None = None,
1074      embedding_separator: str = "\n",
1075      trust_remote_code: bool = False,
1076      local_files_only: bool = False,
1077      truncate_dim: int | None = None,
1078      model_kwargs: dict[str, Any] | None = None,
1079      tokenizer_kwargs: dict[str, Any] | None = None,
1080      config_kwargs: dict[str, Any] | None = None,
1081      precision: Literal[
1082          "float32", "int8", "uint8", "binary", "ubinary"
1083      ] = "float32",
1084      encode_kwargs: dict[str, Any] | None = None,
1085      backend: Literal["torch", "onnx", "openvino"] = "torch",
1086      revision: str | None = None,
1087  ) -> None
1088  ```
1089  
1090  Creates a SentenceTransformersDocumentEmbedder component.
1091  
1092  **Parameters:**
1093  
1094  - **model** (<code>str</code>) – The model to use for calculating embeddings.
1095    Pass a local path or ID of the model on Hugging Face.
1096  - **device** (<code>ComponentDevice | None</code>) – The device to use for loading the model.
1097    Overrides the default device.
1098  - **token** (<code>Secret | None</code>) – The API token to download private models from Hugging Face.
1099  - **prefix** (<code>str</code>) – A string to add at the beginning of each document text.
1100    Can be used to prepend the text with an instruction, as required by some embedding models,
1101    such as E5 and bge.
1102  - **suffix** (<code>str</code>) – A string to add at the end of each document text.
1103  - **batch_size** (<code>int</code>) – Number of documents to embed at once.
1104  - **progress_bar** (<code>bool</code>) – If `True`, shows a progress bar when embedding documents.
1105  - **normalize_embeddings** (<code>bool</code>) – If `True`, the embeddings are normalized using L2 normalization, so that each embedding has a norm of 1.
1106  - **meta_fields_to_embed** (<code>list\[str\] | None</code>) – List of metadata fields to embed along with the document text.
1107  - **embedding_separator** (<code>str</code>) – Separator used to concatenate the metadata fields to the document text.
1108  - **trust_remote_code** (<code>bool</code>) – If `False`, allows only Hugging Face verified model architectures.
1109    If `True`, allows custom models and scripts.
1110  - **local_files_only** (<code>bool</code>) – If `True`, does not attempt to download the model from Hugging Face Hub and only looks at local files.
1111  - **truncate_dim** (<code>int | None</code>) – The dimension to truncate sentence embeddings to. `None` does no truncation.
1112    If the model wasn't trained with Matryoshka Representation Learning,
1113    truncating embeddings can significantly affect performance.
1114  - **model_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for `AutoModelForSequenceClassification.from_pretrained`
1115    when loading the model. Refer to specific model documentation for available kwargs.
1116  - **tokenizer_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for `AutoTokenizer.from_pretrained` when loading the tokenizer.
1117    Refer to specific model documentation for available kwargs.
1118  - **config_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for `AutoConfig.from_pretrained` when loading the model configuration.
1119  - **precision** (<code>Literal['float32', 'int8', 'uint8', 'binary', 'ubinary']</code>) – The precision to use for the embeddings.
1120    All non-float32 precisions are quantized embeddings.
1121    Quantized embeddings are smaller and faster to compute, but may have a lower accuracy.
1122    They are useful for reducing the size of the embeddings of a corpus for semantic search, among other tasks.
1123  - **encode_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for `SentenceTransformer.encode` when embedding documents.
1124    This parameter is provided for fine customization. Be careful not to clash with already set parameters and
1125    avoid passing parameters that change the output type.
1126  - **backend** (<code>Literal['torch', 'onnx', 'openvino']</code>) – The backend to use for the Sentence Transformers model. Choose from "torch", "onnx", or "openvino".
1127    Refer to the [Sentence Transformers documentation](https://sbert.net/docs/sentence_transformer/usage/efficiency.html)
1128    for more information on acceleration and quantization options.
- **revision** (<code>str | None</code>) – The specific model version to use. It can be a branch name, a tag name, or a commit id
  of a model stored on Hugging Face.
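
The effect of `normalize_embeddings=True` is plain L2 normalization. A minimal sketch of the math (illustrative, not the component's code):

```python
import math

def l2_normalize(vec):
    # Divide each component by the vector's Euclidean norm so the result has norm 1
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

v = l2_normalize([3.0, 4.0])
print(v)  # [0.6, 0.8]
print(math.sqrt(sum(x * x for x in v)))  # ~1.0
```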
1131  
1132  #### to_dict
1133  
1134  ```python
1135  to_dict() -> dict[str, Any]
1136  ```
1137  
1138  Serializes the component to a dictionary.
1139  
1140  **Returns:**
1141  
1142  - <code>dict\[str, Any\]</code> – Dictionary with serialized data.
1143  
1144  #### from_dict
1145  
1146  ```python
1147  from_dict(data: dict[str, Any]) -> SentenceTransformersDocumentEmbedder
1148  ```
1149  
1150  Deserializes the component from a dictionary.
1151  
1152  **Parameters:**
1153  
1154  - **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from.
1155  
1156  **Returns:**
1157  
1158  - <code>SentenceTransformersDocumentEmbedder</code> – Deserialized component.
1159  
1160  #### warm_up
1161  
1162  ```python
1163  warm_up() -> None
1164  ```
1165  
1166  Initializes the component.
1167  
1168  #### run
1169  
1170  ```python
1171  run(documents: list[Document]) -> dict[str, list[Document]]
1172  ```
1173  
1174  Embed a list of documents.
1175  
1176  **Parameters:**
1177  
1178  - **documents** (<code>list\[Document\]</code>) – Documents to embed.
1179  
1180  **Returns:**
1181  
1182  - <code>dict\[str, list\[Document\]\]</code> – A dictionary with the following keys:
1183  - `documents`: Documents with embeddings.
1184  
1185  ## sentence_transformers_sparse_document_embedder
1186  
1187  ### SentenceTransformersSparseDocumentEmbedder
1188  
1189  Calculates document sparse embeddings using sparse embedding models from Sentence Transformers.
1190  
1191  It stores the sparse embeddings in the `sparse_embedding` metadata field of each document.
1192  You can also embed documents' metadata.
1193  Use this component in indexing pipelines to embed input documents
and send them to DocumentWriter to write into a Document Store.
1195  
### Usage example
1197  
1198  <!-- test-ignore -->
1199  
1200  ```python
1201  from haystack import Document
1202  from haystack.components.embedders import SentenceTransformersSparseDocumentEmbedder
1203  
1204  doc = Document(content="I love pizza!")
1205  doc_embedder = SentenceTransformersSparseDocumentEmbedder()
1206  
1207  result = doc_embedder.run([doc])
1208  print(result['documents'][0].sparse_embedding)
1209  
1210  # SparseEmbedding(indices=[999, 1045, ...], values=[0.918, 0.867, ...])
1211  ```
1212  
1213  #### __init__
1214  
1215  ```python
1216  __init__(
1217      *,
1218      model: str = "prithivida/Splade_PP_en_v2",
1219      device: ComponentDevice | None = None,
1220      token: Secret | None = Secret.from_env_var(
1221          ["HF_API_TOKEN", "HF_TOKEN"], strict=False
1222      ),
1223      prefix: str = "",
1224      suffix: str = "",
1225      batch_size: int = 32,
1226      progress_bar: bool = True,
1227      meta_fields_to_embed: list[str] | None = None,
1228      embedding_separator: str = "\n",
1229      trust_remote_code: bool = False,
1230      local_files_only: bool = False,
1231      model_kwargs: dict[str, Any] | None = None,
1232      tokenizer_kwargs: dict[str, Any] | None = None,
1233      config_kwargs: dict[str, Any] | None = None,
1234      backend: Literal["torch", "onnx", "openvino"] = "torch",
1235      revision: str | None = None
1236  ) -> None
1237  ```
1238  
1239  Creates a SentenceTransformersSparseDocumentEmbedder component.
1240  
1241  **Parameters:**
1242  
1243  - **model** (<code>str</code>) – The model to use for calculating sparse embeddings.
1244    Pass a local path or ID of the model on Hugging Face.
1245  - **device** (<code>ComponentDevice | None</code>) – The device to use for loading the model.
1246    Overrides the default device.
1247  - **token** (<code>Secret | None</code>) – The API token to download private models from Hugging Face.
1248  - **prefix** (<code>str</code>) – A string to add at the beginning of each document text.
1249  - **suffix** (<code>str</code>) – A string to add at the end of each document text.
1250  - **batch_size** (<code>int</code>) – Number of documents to embed at once.
1251  - **progress_bar** (<code>bool</code>) – If `True`, shows a progress bar when embedding documents.
1252  - **meta_fields_to_embed** (<code>list\[str\] | None</code>) – List of metadata fields to embed along with the document text.
1253  - **embedding_separator** (<code>str</code>) – Separator used to concatenate the metadata fields to the document text.
1254  - **trust_remote_code** (<code>bool</code>) – If `False`, allows only Hugging Face verified model architectures.
1255    If `True`, allows custom models and scripts.
1256  - **local_files_only** (<code>bool</code>) – If `True`, does not attempt to download the model from Hugging Face Hub and only looks at local files.
1257  - **model_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for `AutoModelForSequenceClassification.from_pretrained`
1258    when loading the model. Refer to specific model documentation for available kwargs.
1259  - **tokenizer_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for `AutoTokenizer.from_pretrained` when loading the tokenizer.
1260    Refer to specific model documentation for available kwargs.
1261  - **config_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for `AutoConfig.from_pretrained` when loading the model configuration.
1262  - **backend** (<code>Literal['torch', 'onnx', 'openvino']</code>) – The backend to use for the Sentence Transformers model. Choose from "torch", "onnx", or "openvino".
1263    Refer to the [Sentence Transformers documentation](https://sbert.net/docs/sentence_transformer/usage/efficiency.html)
1264    for more information on acceleration and quantization options.
- **revision** (<code>str | None</code>) – The specific model version to use. It can be a branch name, a tag name, or a commit id
  of a model stored on Hugging Face.
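
A sparse embedding keeps only the non-zero entries of a vocabulary-sized vector as paired indices and values, which is what makes it compact and fast to compare. A minimal sketch of that representation (illustrative; Haystack's `SparseEmbedding` class has its own API):

```python
# Convert a mostly-zero vector into a compact indices/values pair,
# and compare two such vectors. Illustrative sketch only.

def to_sparse(dense):
    indices = [i for i, v in enumerate(dense) if v != 0.0]
    values = [dense[i] for i in indices]
    return indices, values

def sparse_dot(a, b):
    # Similarity between two sparse vectors: sum products over shared indices
    b_map = dict(zip(*b))
    return sum(v * b_map.get(i, 0.0) for i, v in zip(*a))

q = to_sparse([0.0, 0.9, 0.0, 0.5])
d = to_sparse([0.2, 0.8, 0.0, 0.0])
print(q)                 # ([1, 3], [0.9, 0.5])
print(sparse_dot(q, d))  # ~0.72: only index 1 overlaps
```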
1267  
1268  #### to_dict
1269  
1270  ```python
1271  to_dict() -> dict[str, Any]
1272  ```
1273  
1274  Serializes the component to a dictionary.
1275  
1276  **Returns:**
1277  
1278  - <code>dict\[str, Any\]</code> – Dictionary with serialized data.
1279  
1280  #### from_dict
1281  
1282  ```python
1283  from_dict(data: dict[str, Any]) -> SentenceTransformersSparseDocumentEmbedder
1284  ```
1285  
1286  Deserializes the component from a dictionary.
1287  
1288  **Parameters:**
1289  
1290  - **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from.
1291  
1292  **Returns:**
1293  
1294  - <code>SentenceTransformersSparseDocumentEmbedder</code> – Deserialized component.
1295  
1296  #### warm_up
1297  
1298  ```python
1299  warm_up() -> None
1300  ```
1301  
1302  Initializes the component.
1303  
1304  #### run
1305  
1306  ```python
1307  run(documents: list[Document]) -> dict[str, list[Document]]
1308  ```
1309  
1310  Embed a list of documents.
1311  
1312  **Parameters:**
1313  
1314  - **documents** (<code>list\[Document\]</code>) – Documents to embed.
1315  
1316  **Returns:**
1317  
1318  - <code>dict\[str, list\[Document\]\]</code> – A dictionary with the following keys:
1319  - `documents`: Documents with sparse embeddings under the `sparse_embedding` field.
1320  
1321  ## sentence_transformers_sparse_text_embedder
1322  
1323  ### SentenceTransformersSparseTextEmbedder
1324  
1325  Embeds strings using sparse embedding models from Sentence Transformers.
1326  
You can use it to embed a user query and send it to a sparse embedding retriever.
1328  
### Usage example
1330  
1331  <!-- test-ignore -->
1332  
1333  ```python
1334  from haystack.components.embedders import SentenceTransformersSparseTextEmbedder
1335  
1336  text_to_embed = "I love pizza!"
1337  
1338  text_embedder = SentenceTransformersSparseTextEmbedder()
1339  
1340  print(text_embedder.run(text_to_embed))
1341  
1342  # {'sparse_embedding': SparseEmbedding(indices=[999, 1045, ...], values=[0.918, 0.867, ...])}
1343  ```
1344  
1345  #### __init__
1346  
1347  ```python
1348  __init__(
1349      *,
1350      model: str = "prithivida/Splade_PP_en_v2",
1351      device: ComponentDevice | None = None,
1352      token: Secret | None = Secret.from_env_var(
1353          ["HF_API_TOKEN", "HF_TOKEN"], strict=False
1354      ),
1355      prefix: str = "",
1356      suffix: str = "",
1357      trust_remote_code: bool = False,
1358      local_files_only: bool = False,
1359      model_kwargs: dict[str, Any] | None = None,
1360      tokenizer_kwargs: dict[str, Any] | None = None,
1361      config_kwargs: dict[str, Any] | None = None,
1362      backend: Literal["torch", "onnx", "openvino"] = "torch",
1363      revision: str | None = None
1364  ) -> None
1365  ```
1366  
1367  Create a SentenceTransformersSparseTextEmbedder component.
1368  
1369  **Parameters:**
1370  
1371  - **model** (<code>str</code>) – The model to use for calculating sparse embeddings.
1372    Specify the path to a local model or the ID of the model on Hugging Face.
1373  - **device** (<code>ComponentDevice | None</code>) – Overrides the default device used to load the model.
1374  - **token** (<code>Secret | None</code>) – An API token to use private models from Hugging Face.
1375  - **prefix** (<code>str</code>) – A string to add at the beginning of each text to be embedded.
1376  - **suffix** (<code>str</code>) – A string to add at the end of each text to embed.
1377  - **trust_remote_code** (<code>bool</code>) – If `False`, permits only Hugging Face verified model architectures.
1378    If `True`, permits custom models and scripts.
1379  - **local_files_only** (<code>bool</code>) – If `True`, does not attempt to download the model from Hugging Face Hub and only looks at local files.
1380  - **model_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for `AutoModelForSequenceClassification.from_pretrained`
1381    when loading the model. Refer to specific model documentation for available kwargs.
1382  - **tokenizer_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for `AutoTokenizer.from_pretrained` when loading the tokenizer.
1383    Refer to specific model documentation for available kwargs.
1384  - **config_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for `AutoConfig.from_pretrained` when loading the model configuration.
1385  - **backend** (<code>Literal['torch', 'onnx', 'openvino']</code>) – The backend to use for the Sentence Transformers model. Choose from "torch", "onnx", or "openvino".
1386    Refer to the [Sentence Transformers documentation](https://sbert.net/docs/sentence_transformer/usage/efficiency.html)
1387    for more information on acceleration and quantization options.
- **revision** (<code>str | None</code>) – The specific model version to use. It can be a branch name, a tag name, or a commit id
  of a model stored on Hugging Face.
1390  
1391  #### to_dict
1392  
1393  ```python
1394  to_dict() -> dict[str, Any]
1395  ```
1396  
1397  Serializes the component to a dictionary.
1398  
1399  **Returns:**
1400  
1401  - <code>dict\[str, Any\]</code> – Dictionary with serialized data.
1402  
1403  #### from_dict
1404  
1405  ```python
1406  from_dict(data: dict[str, Any]) -> SentenceTransformersSparseTextEmbedder
1407  ```
1408  
1409  Deserializes the component from a dictionary.
1410  
1411  **Parameters:**
1412  
1413  - **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from.
1414  
1415  **Returns:**
1416  
1417  - <code>SentenceTransformersSparseTextEmbedder</code> – Deserialized component.
1418  
1419  #### warm_up
1420  
1421  ```python
1422  warm_up() -> None
1423  ```
1424  
1425  Initializes the component.
1426  
1427  #### run
1428  
1429  ```python
1430  run(text: str) -> dict[str, Any]
1431  ```
1432  
1433  Embed a single string.
1434  
1435  **Parameters:**
1436  
1437  - **text** (<code>str</code>) – Text to embed.
1438  
1439  **Returns:**
1440  
1441  - <code>dict\[str, Any\]</code> – A dictionary with the following keys:
1442  - `sparse_embedding`: The sparse embedding of the input text.
1443  
1444  ## sentence_transformers_text_embedder
1445  
1446  ### SentenceTransformersTextEmbedder
1447  
1448  Embeds strings using Sentence Transformers models.
1449  
You can use it to embed a user query and send it to an embedding retriever.
1451  
### Usage example
1453  
1454  <!-- test-ignore -->
1455  
1456  ```python
1457  from haystack.components.embedders import SentenceTransformersTextEmbedder
1458  
1459  text_to_embed = "I love pizza!"
1460  
1461  text_embedder = SentenceTransformersTextEmbedder()
1462  
1463  print(text_embedder.run(text_to_embed))
1464  
# {'embedding': [-0.07804739475250244, 0.1498992145061493, ...]}
1466  ```
1467  
1468  #### __init__
1469  
1470  ```python
1471  __init__(
1472      model: str = "sentence-transformers/all-mpnet-base-v2",
1473      device: ComponentDevice | None = None,
1474      token: Secret | None = Secret.from_env_var(
1475          ["HF_API_TOKEN", "HF_TOKEN"], strict=False
1476      ),
1477      prefix: str = "",
1478      suffix: str = "",
1479      batch_size: int = 32,
1480      progress_bar: bool = True,
1481      normalize_embeddings: bool = False,
1482      trust_remote_code: bool = False,
1483      local_files_only: bool = False,
1484      truncate_dim: int | None = None,
1485      model_kwargs: dict[str, Any] | None = None,
1486      tokenizer_kwargs: dict[str, Any] | None = None,
1487      config_kwargs: dict[str, Any] | None = None,
1488      precision: Literal[
1489          "float32", "int8", "uint8", "binary", "ubinary"
1490      ] = "float32",
1491      encode_kwargs: dict[str, Any] | None = None,
1492      backend: Literal["torch", "onnx", "openvino"] = "torch",
1493      revision: str | None = None,
1494  ) -> None
1495  ```
1496  
1497  Create a SentenceTransformersTextEmbedder component.
1498  
1499  **Parameters:**
1500  
1501  - **model** (<code>str</code>) – The model to use for calculating embeddings.
1502    Specify the path to a local model or the ID of the model on Hugging Face.
1503  - **device** (<code>ComponentDevice | None</code>) – Overrides the default device used to load the model.
1504  - **token** (<code>Secret | None</code>) – An API token to use private models from Hugging Face.
1505  - **prefix** (<code>str</code>) – A string to add at the beginning of each text to be embedded.
1506    You can use it to prepend the text with an instruction, as required by some embedding models,
1507    such as E5 and bge.
1508  - **suffix** (<code>str</code>) – A string to add at the end of each text to embed.
1509  - **batch_size** (<code>int</code>) – Number of texts to embed at once.
1510  - **progress_bar** (<code>bool</code>) – If `True`, shows a progress bar for calculating embeddings.
1511    If `False`, disables the progress bar.
1512  - **normalize_embeddings** (<code>bool</code>) – If `True`, the embeddings are normalized using L2 normalization, so that the embeddings have a norm of 1.
1513  - **trust_remote_code** (<code>bool</code>) – If `False`, permits only Hugging Face verified model architectures.
1514    If `True`, permits custom models and scripts.
1515  - **local_files_only** (<code>bool</code>) – If `True`, does not attempt to download the model from Hugging Face Hub and only looks at local files.
1516  - **truncate_dim** (<code>int | None</code>) – The dimension to truncate sentence embeddings to. `None` does no truncation.
1517    If the model has not been trained with Matryoshka Representation Learning,
1518    truncation of embeddings can significantly affect performance.
1519  - **model_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for `AutoModel.from_pretrained`
1520    when loading the model. Refer to specific model documentation for available kwargs.
1521  - **tokenizer_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for `AutoTokenizer.from_pretrained` when loading the tokenizer.
1522    Refer to specific model documentation for available kwargs.
1523  - **config_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for `AutoConfig.from_pretrained` when loading the model configuration.
1524  - **precision** (<code>Literal['float32', 'int8', 'uint8', 'binary', 'ubinary']</code>) – The precision to use for the embeddings.
1525    All non-float32 precisions are quantized embeddings.
1526    Quantized embeddings are smaller in size and faster to compute, but may have a lower accuracy.
1527    They are useful for reducing the size of the embeddings of a corpus for semantic search, among other tasks.
1528  - **encode_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for `SentenceTransformer.encode` when embedding texts.
1529    This parameter is intended for fine-grained customization. Be careful not to clash with parameters that are
1530    already set, and avoid passing parameters that change the output type.
1531  - **backend** (<code>Literal['torch', 'onnx', 'openvino']</code>) – The backend to use for the Sentence Transformers model. Choose from "torch", "onnx", or "openvino".
1532    Refer to the [Sentence Transformers documentation](https://sbert.net/docs/sentence_transformer/usage/efficiency.html)
1533    for more information on acceleration and quantization options.
1534  - **revision** (<code>str | None</code>) – The specific model version to use. It can be a branch name, a tag name,
1535    or a commit hash of a model stored on the Hugging Face Hub.
1536  
1537  #### to_dict
1538  
1539  ```python
1540  to_dict() -> dict[str, Any]
1541  ```
1542  
1543  Serializes the component to a dictionary.
1544  
1545  **Returns:**
1546  
1547  - <code>dict\[str, Any\]</code> – Dictionary with serialized data.
1548  
1549  #### from_dict
1550  
1551  ```python
1552  from_dict(data: dict[str, Any]) -> SentenceTransformersTextEmbedder
1553  ```
1554  
1555  Deserializes the component from a dictionary.
1556  
1557  **Parameters:**
1558  
1559  - **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from.
1560  
1561  **Returns:**
1562  
1563  - <code>SentenceTransformersTextEmbedder</code> – Deserialized component.
1564  
1565  #### warm_up
1566  
1567  ```python
1568  warm_up() -> None
1569  ```
1570  
1571  Initializes the component by loading the embedding model.
1572  
1573  #### run
1574  
1575  ```python
1576  run(text: str) -> dict[str, Any]
1577  ```
1578  
1579  Embed a single string.
1580  
1581  **Parameters:**
1582  
1583  - **text** (<code>str</code>) – Text to embed.
1584  
1585  **Returns:**
1586  
1587  - <code>dict\[str, Any\]</code> – A dictionary with the following keys:
1588    - `embedding`: The embedding of the input text.