---
title: Embedders
id: embedders-api
description: Transforms queries into vectors to look for similar or relevant Documents.
slug: "/embedders-api"
---

<a id="azure_document_embedder"></a>

# Module azure\_document\_embedder

<a id="azure_document_embedder.AzureOpenAIDocumentEmbedder"></a>

## AzureOpenAIDocumentEmbedder

Calculates document embeddings using OpenAI models deployed on Azure.

### Usage example

```python
from haystack import Document
from haystack.components.embedders import AzureOpenAIDocumentEmbedder

doc = Document(content="I love pizza!")

document_embedder = AzureOpenAIDocumentEmbedder()

result = document_embedder.run([doc])
print(result['documents'][0].embedding)

# [0.017020374536514282, -0.023255806416273117, ...]
```

<a id="azure_document_embedder.AzureOpenAIDocumentEmbedder.__init__"></a>

#### AzureOpenAIDocumentEmbedder.\_\_init\_\_

```python
def __init__(azure_endpoint: Optional[str] = None,
             api_version: Optional[str] = "2023-05-15",
             azure_deployment: str = "text-embedding-ada-002",
             dimensions: Optional[int] = None,
             api_key: Optional[Secret] = Secret.from_env_var(
                 "AZURE_OPENAI_API_KEY", strict=False),
             azure_ad_token: Optional[Secret] = Secret.from_env_var(
                 "AZURE_OPENAI_AD_TOKEN", strict=False),
             organization: Optional[str] = None,
             prefix: str = "",
             suffix: str = "",
             batch_size: int = 32,
             progress_bar: bool = True,
             meta_fields_to_embed: Optional[list[str]] = None,
             embedding_separator: str = "\n",
             timeout: Optional[float] = None,
             max_retries: Optional[int] = None,
             *,
             default_headers: Optional[dict[str, str]] = None,
             azure_ad_token_provider: Optional[AzureADTokenProvider] = None,
             http_client_kwargs: Optional[dict[str, Any]] = None,
             raise_on_failure: bool = False)
```

Creates an AzureOpenAIDocumentEmbedder component.

**Arguments**:

- `azure_endpoint`: The endpoint of the model deployed on Azure.
- `api_version`: The version of the API to use.
- `azure_deployment`: The name of the model deployed on Azure. The default model is text-embedding-ada-002.
- `dimensions`: The number of dimensions of the resulting embeddings. Only supported in text-embedding-3
and later models.
- `api_key`: The Azure OpenAI API key.
You can set it with the environment variable `AZURE_OPENAI_API_KEY`, or pass it with this
parameter during initialization.
- `azure_ad_token`: Microsoft Entra ID token, see Microsoft's
[Entra ID](https://www.microsoft.com/en-us/security/business/identity-access/microsoft-entra-id)
documentation for more information. You can set it with the environment variable
`AZURE_OPENAI_AD_TOKEN`, or pass it with this parameter during initialization.
Previously called Azure Active Directory.
- `organization`: Your organization ID. See OpenAI's
[Setting Up Your Organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization)
for more information.
- `prefix`: A string to add at the beginning of each text.
- `suffix`: A string to add at the end of each text.
- `batch_size`: Number of documents to embed at once.
- `progress_bar`: If `True`, shows a progress bar when running.
- `meta_fields_to_embed`: List of metadata fields to embed along with the document text.
- `embedding_separator`: Separator used to concatenate the metadata fields to the document text.
- `timeout`: The timeout for `AzureOpenAI` client calls, in seconds.
If not set, defaults to either the `OPENAI_TIMEOUT` environment variable or 30 seconds.
- `max_retries`: Maximum number of retries to contact AzureOpenAI after an internal error.
If not set, defaults to either the `OPENAI_MAX_RETRIES` environment variable or 5 retries.
- `default_headers`: Default headers to send to the AzureOpenAI client.
- `azure_ad_token_provider`: A function that returns an Azure Active Directory token. It is invoked on
every request.
- `http_client_kwargs`: A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/).
- `raise_on_failure`: Whether to raise an exception if the embedding request fails. If `False`, the component logs the error
and continues processing the remaining documents. If `True`, it raises an exception on failure.

<a id="azure_document_embedder.AzureOpenAIDocumentEmbedder.to_dict"></a>

#### AzureOpenAIDocumentEmbedder.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns**:

Dictionary with serialized data.

<a id="azure_document_embedder.AzureOpenAIDocumentEmbedder.from_dict"></a>

#### AzureOpenAIDocumentEmbedder.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "AzureOpenAIDocumentEmbedder"
```

Deserializes the component from a dictionary.

**Arguments**:

- `data`: Dictionary to deserialize from.

**Returns**:

Deserialized component.

<a id="azure_document_embedder.AzureOpenAIDocumentEmbedder.run"></a>

#### AzureOpenAIDocumentEmbedder.run

```python
@component.output_types(documents=list[Document], meta=dict[str, Any])
def run(documents: list[Document])
```

Embeds a list of documents.

**Arguments**:

- `documents`: A list of documents to embed.

**Returns**:

A dictionary with the following keys:
- `documents`: A list of documents with embeddings.
- `meta`: Information about the usage of the model.
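The `meta_fields_to_embed`, `embedding_separator`, `prefix`, and `suffix` parameters described above together determine the exact string sent to the embedding model. A minimal sketch of that text-preparation step, assuming the behavior stated in the parameter descriptions (`prepare_text_to_embed` is a hypothetical helper, not the component's actual implementation):

```python
def prepare_text_to_embed(content, meta, meta_fields_to_embed,
                          embedding_separator, prefix="", suffix=""):
    # Collect the selected metadata fields (skipping missing/empty ones),
    # append the document content, join with the separator, then wrap
    # with the prefix and suffix.
    parts = [str(meta[f]) for f in meta_fields_to_embed if meta.get(f)]
    parts.append(content)
    return prefix + embedding_separator.join(parts) + suffix

text = prepare_text_to_embed(
    content="I love pizza!",
    meta={"title": "Food blog", "url": "https://example.com"},
    meta_fields_to_embed=["title"],
    embedding_separator="\n",
)
print(text)
# Food blog
# I love pizza!
```

With a model that expects instruction prefixes (such as E5), you would set `prefix="passage: "` to prepend that instruction to every document text.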
<a id="azure_document_embedder.AzureOpenAIDocumentEmbedder.run_async"></a>

#### AzureOpenAIDocumentEmbedder.run\_async

```python
@component.output_types(documents=list[Document], meta=dict[str, Any])
async def run_async(documents: list[Document])
```

Embeds a list of documents asynchronously.

**Arguments**:

- `documents`: A list of documents to embed.

**Returns**:

A dictionary with the following keys:
- `documents`: A list of documents with embeddings.
- `meta`: Information about the usage of the model.

<a id="azure_text_embedder"></a>

# Module azure\_text\_embedder

<a id="azure_text_embedder.AzureOpenAITextEmbedder"></a>

## AzureOpenAITextEmbedder

Embeds strings using OpenAI models deployed on Azure.

### Usage example

```python
from haystack.components.embedders import AzureOpenAITextEmbedder

text_to_embed = "I love pizza!"

text_embedder = AzureOpenAITextEmbedder()

print(text_embedder.run(text_to_embed))

# {'embedding': [0.017020374536514282, -0.023255806416273117, ...],
#  'meta': {'model': 'text-embedding-ada-002-v2',
#           'usage': {'prompt_tokens': 4, 'total_tokens': 4}}}
```

<a id="azure_text_embedder.AzureOpenAITextEmbedder.__init__"></a>

#### AzureOpenAITextEmbedder.\_\_init\_\_

```python
def __init__(azure_endpoint: Optional[str] = None,
             api_version: Optional[str] = "2023-05-15",
             azure_deployment: str = "text-embedding-ada-002",
             dimensions: Optional[int] = None,
             api_key: Optional[Secret] = Secret.from_env_var(
                 "AZURE_OPENAI_API_KEY", strict=False),
             azure_ad_token: Optional[Secret] = Secret.from_env_var(
                 "AZURE_OPENAI_AD_TOKEN", strict=False),
             organization: Optional[str] = None,
             timeout: Optional[float] = None,
             max_retries: Optional[int] = None,
             prefix: str = "",
             suffix: str = "",
             *,
             default_headers: Optional[dict[str, str]] = None,
             azure_ad_token_provider: Optional[AzureADTokenProvider] = None,
             http_client_kwargs: Optional[dict[str, Any]] = None)
```

Creates an AzureOpenAITextEmbedder component.

**Arguments**:

- `azure_endpoint`: The endpoint of the model deployed on Azure.
- `api_version`: The version of the API to use.
- `azure_deployment`: The name of the model deployed on Azure. The default model is text-embedding-ada-002.
- `dimensions`: The number of dimensions the resulting output embeddings should have. Only supported in text-embedding-3
and later models.
- `api_key`: The Azure OpenAI API key.
You can set it with the environment variable `AZURE_OPENAI_API_KEY`, or pass it with this
parameter during initialization.
- `azure_ad_token`: Microsoft Entra ID token, see Microsoft's
[Entra ID](https://www.microsoft.com/en-us/security/business/identity-access/microsoft-entra-id)
documentation for more information. You can set it with the environment variable
`AZURE_OPENAI_AD_TOKEN`, or pass it with this parameter during initialization.
Previously called Azure Active Directory.
- `organization`: Your organization ID. See OpenAI's
[Setting Up Your Organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization)
for more information.
- `timeout`: The timeout for `AzureOpenAI` client calls, in seconds.
If not set, defaults to either the `OPENAI_TIMEOUT` environment variable or 30 seconds.
- `max_retries`: Maximum number of retries to contact AzureOpenAI after an internal error.
If not set, defaults to either the `OPENAI_MAX_RETRIES` environment variable or 5 retries.
- `prefix`: A string to add at the beginning of each text.
- `suffix`: A string to add at the end of each text.
- `default_headers`: Default headers to send to the AzureOpenAI client.
- `azure_ad_token_provider`: A function that returns an Azure Active Directory token. It is invoked on
every request.
- `http_client_kwargs`: A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/).

<a id="azure_text_embedder.AzureOpenAITextEmbedder.to_dict"></a>

#### AzureOpenAITextEmbedder.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns**:

Dictionary with serialized data.

<a id="azure_text_embedder.AzureOpenAITextEmbedder.from_dict"></a>

#### AzureOpenAITextEmbedder.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "AzureOpenAITextEmbedder"
```

Deserializes the component from a dictionary.

**Arguments**:

- `data`: Dictionary to deserialize from.

**Returns**:

Deserialized component.

<a id="azure_text_embedder.AzureOpenAITextEmbedder.run"></a>

#### AzureOpenAITextEmbedder.run

```python
@component.output_types(embedding=list[float], meta=dict[str, Any])
def run(text: str)
```

Embeds a single string.

**Arguments**:

- `text`: Text to embed.

**Returns**:

A dictionary with the following keys:
- `embedding`: The embedding of the input text.
- `meta`: Information about the usage of the model.

<a id="azure_text_embedder.AzureOpenAITextEmbedder.run_async"></a>

#### AzureOpenAITextEmbedder.run\_async

```python
@component.output_types(embedding=list[float], meta=dict[str, Any])
async def run_async(text: str)
```

Embeds a single string asynchronously.

This is the asynchronous version of the `run` method. It has the same parameters and return values
but can be used with `await` in async code.

**Arguments**:

- `text`: Text to embed.

**Returns**:

A dictionary with the following keys:
- `embedding`: The embedding of the input text.
- `meta`: Information about the usage of the model.

<a id="hugging_face_api_document_embedder"></a>

# Module hugging\_face\_api\_document\_embedder

<a id="hugging_face_api_document_embedder.HuggingFaceAPIDocumentEmbedder"></a>

## HuggingFaceAPIDocumentEmbedder

Embeds documents using Hugging Face APIs.

Use it with the following Hugging Face APIs:
- [Free Serverless Inference API](https://huggingface.co/inference-api)
- [Paid Inference Endpoints](https://huggingface.co/inference-endpoints)
- [Self-hosted Text Embeddings Inference](https://github.com/huggingface/text-embeddings-inference)

### Usage examples

#### With free serverless inference API

```python
from haystack.components.embedders import HuggingFaceAPIDocumentEmbedder
from haystack.utils import Secret
from haystack.dataclasses import Document

doc = Document(content="I love pizza!")

doc_embedder = HuggingFaceAPIDocumentEmbedder(api_type="serverless_inference_api",
                                              api_params={"model": "BAAI/bge-small-en-v1.5"},
                                              token=Secret.from_token("<your-api-key>"))

result = doc_embedder.run([doc])
print(result["documents"][0].embedding)

# [0.017020374536514282, -0.023255806416273117, ...]
```

#### With paid inference endpoints

```python
from haystack.components.embedders import HuggingFaceAPIDocumentEmbedder
from haystack.utils import Secret
from haystack.dataclasses import Document

doc = Document(content="I love pizza!")

doc_embedder = HuggingFaceAPIDocumentEmbedder(api_type="inference_endpoints",
                                              api_params={"url": "<your-inference-endpoint-url>"},
                                              token=Secret.from_token("<your-api-key>"))

result = doc_embedder.run([doc])
print(result["documents"][0].embedding)

# [0.017020374536514282, -0.023255806416273117, ...]
```

#### With self-hosted text embeddings inference

```python
from haystack.components.embedders import HuggingFaceAPIDocumentEmbedder
from haystack.dataclasses import Document

doc = Document(content="I love pizza!")

doc_embedder = HuggingFaceAPIDocumentEmbedder(api_type="text_embeddings_inference",
                                              api_params={"url": "http://localhost:8080"})

result = doc_embedder.run([doc])
print(result["documents"][0].embedding)

# [0.017020374536514282, -0.023255806416273117, ...]
```

<a id="hugging_face_api_document_embedder.HuggingFaceAPIDocumentEmbedder.__init__"></a>

#### HuggingFaceAPIDocumentEmbedder.\_\_init\_\_

```python
def __init__(api_type: Union[HFEmbeddingAPIType, str],
             api_params: dict[str, str],
             token: Optional[Secret] = Secret.from_env_var(
                 ["HF_API_TOKEN", "HF_TOKEN"], strict=False),
             prefix: str = "",
             suffix: str = "",
             truncate: Optional[bool] = True,
             normalize: Optional[bool] = False,
             batch_size: int = 32,
             progress_bar: bool = True,
             meta_fields_to_embed: Optional[list[str]] = None,
             embedding_separator: str = "\n")
```

Creates a HuggingFaceAPIDocumentEmbedder component.

**Arguments**:

- `api_type`: The type of Hugging Face API to use.
- `api_params`: A dictionary with the following keys:
  - `model`: Hugging Face model ID. Required when `api_type` is `SERVERLESS_INFERENCE_API`.
  - `url`: URL of the inference endpoint. Required when `api_type` is `INFERENCE_ENDPOINTS` or
  `TEXT_EMBEDDINGS_INFERENCE`.
- `token`: The Hugging Face token to use as HTTP bearer authorization.
Check your HF token in your [account settings](https://huggingface.co/settings/tokens).
- `prefix`: A string to add at the beginning of each text.
- `suffix`: A string to add at the end of each text.
- `truncate`: Truncates the input text to the maximum length supported by the model.
Applicable when `api_type` is `TEXT_EMBEDDINGS_INFERENCE`, or `INFERENCE_ENDPOINTS`
if the backend uses Text Embeddings Inference.
If `api_type` is `SERVERLESS_INFERENCE_API`, this parameter is ignored.
- `normalize`: Normalizes the embeddings to unit length.
Applicable when `api_type` is `TEXT_EMBEDDINGS_INFERENCE`, or `INFERENCE_ENDPOINTS`
if the backend uses Text Embeddings Inference.
If `api_type` is `SERVERLESS_INFERENCE_API`, this parameter is ignored.
- `batch_size`: Number of documents to process at once.
- `progress_bar`: If `True`, shows a progress bar when running.
- `meta_fields_to_embed`: List of metadata fields to embed along with the document text.
- `embedding_separator`: Separator used to concatenate the metadata fields to the document text.

<a id="hugging_face_api_document_embedder.HuggingFaceAPIDocumentEmbedder.to_dict"></a>

#### HuggingFaceAPIDocumentEmbedder.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns**:

Dictionary with serialized data.
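The `batch_size` parameter above caps how many documents go into a single embedding request. A small sketch of the chunking this implies (illustrative only; the component handles batching internally):

```python
def batched(items, batch_size=32):
    # Yield consecutive slices of at most batch_size items,
    # one slice per embedding request.
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

texts = [f"doc {n}" for n in range(70)]
batches = list(batched(texts, batch_size=32))
print([len(b) for b in batches])
# [32, 32, 6]
```

Smaller batches reduce per-request payload size (useful with strict endpoint limits) at the cost of more round trips.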
<a id="hugging_face_api_document_embedder.HuggingFaceAPIDocumentEmbedder.from_dict"></a>

#### HuggingFaceAPIDocumentEmbedder.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "HuggingFaceAPIDocumentEmbedder"
```

Deserializes the component from a dictionary.

**Arguments**:

- `data`: Dictionary to deserialize from.

**Returns**:

Deserialized component.

<a id="hugging_face_api_document_embedder.HuggingFaceAPIDocumentEmbedder.run"></a>

#### HuggingFaceAPIDocumentEmbedder.run

```python
@component.output_types(documents=list[Document])
def run(documents: list[Document])
```

Embeds a list of documents.

**Arguments**:

- `documents`: Documents to embed.

**Returns**:

A dictionary with the following keys:
- `documents`: A list of documents with embeddings.

<a id="hugging_face_api_document_embedder.HuggingFaceAPIDocumentEmbedder.run_async"></a>

#### HuggingFaceAPIDocumentEmbedder.run\_async

```python
@component.output_types(documents=list[Document])
async def run_async(documents: list[Document])
```

Embeds a list of documents asynchronously.

**Arguments**:

- `documents`: Documents to embed.

**Returns**:

A dictionary with the following keys:
- `documents`: A list of documents with embeddings.

<a id="hugging_face_api_text_embedder"></a>

# Module hugging\_face\_api\_text\_embedder

<a id="hugging_face_api_text_embedder.HuggingFaceAPITextEmbedder"></a>

## HuggingFaceAPITextEmbedder

Embeds strings using Hugging Face APIs.

Use it with the following Hugging Face APIs:
- [Free Serverless Inference API](https://huggingface.co/inference-api)
- [Paid Inference Endpoints](https://huggingface.co/inference-endpoints)
- [Self-hosted Text Embeddings Inference](https://github.com/huggingface/text-embeddings-inference)

### Usage examples

#### With free serverless inference API

```python
from haystack.components.embedders import HuggingFaceAPITextEmbedder
from haystack.utils import Secret

text_embedder = HuggingFaceAPITextEmbedder(api_type="serverless_inference_api",
                                           api_params={"model": "BAAI/bge-small-en-v1.5"},
                                           token=Secret.from_token("<your-api-key>"))

print(text_embedder.run("I love pizza!"))

# {'embedding': [0.017020374536514282, -0.023255806416273117, ...],
```

#### With paid inference endpoints

```python
from haystack.components.embedders import HuggingFaceAPITextEmbedder
from haystack.utils import Secret

text_embedder = HuggingFaceAPITextEmbedder(api_type="inference_endpoints",
                                           api_params={"url": "<your-inference-endpoint-url>"},
                                           token=Secret.from_token("<your-api-key>"))

print(text_embedder.run("I love pizza!"))

# {'embedding': [0.017020374536514282, -0.023255806416273117, ...],
```

#### With self-hosted text embeddings inference

```python
from haystack.components.embedders import HuggingFaceAPITextEmbedder

text_embedder = HuggingFaceAPITextEmbedder(api_type="text_embeddings_inference",
                                           api_params={"url": "http://localhost:8080"})

print(text_embedder.run("I love pizza!"))

# {'embedding': [0.017020374536514282, -0.023255806416273117, ...],
```

<a id="hugging_face_api_text_embedder.HuggingFaceAPITextEmbedder.__init__"></a>

#### HuggingFaceAPITextEmbedder.\_\_init\_\_

```python
def __init__(api_type: Union[HFEmbeddingAPIType, str],
             api_params: dict[str, str],
             token: Optional[Secret] = Secret.from_env_var(
                 ["HF_API_TOKEN", "HF_TOKEN"], strict=False),
             prefix: str = "",
             suffix: str = "",
             truncate: Optional[bool] = True,
             normalize: Optional[bool] = False)
```

Creates a HuggingFaceAPITextEmbedder component.

**Arguments**:

- `api_type`: The type of Hugging Face API to use.
- `api_params`: A dictionary with the following keys:
  - `model`: Hugging Face model ID. Required when `api_type` is `SERVERLESS_INFERENCE_API`.
  - `url`: URL of the inference endpoint. Required when `api_type` is `INFERENCE_ENDPOINTS` or
  `TEXT_EMBEDDINGS_INFERENCE`.
- `token`: The Hugging Face token to use as HTTP bearer authorization.
Check your HF token in your [account settings](https://huggingface.co/settings/tokens).
- `prefix`: A string to add at the beginning of each text.
- `suffix`: A string to add at the end of each text.
- `truncate`: Truncates the input text to the maximum length supported by the model.
Applicable when `api_type` is `TEXT_EMBEDDINGS_INFERENCE`, or `INFERENCE_ENDPOINTS`
if the backend uses Text Embeddings Inference.
If `api_type` is `SERVERLESS_INFERENCE_API`, this parameter is ignored.
- `normalize`: Normalizes the embeddings to unit length.
Applicable when `api_type` is `TEXT_EMBEDDINGS_INFERENCE`, or `INFERENCE_ENDPOINTS`
if the backend uses Text Embeddings Inference.
If `api_type` is `SERVERLESS_INFERENCE_API`, this parameter is ignored.

<a id="hugging_face_api_text_embedder.HuggingFaceAPITextEmbedder.to_dict"></a>

#### HuggingFaceAPITextEmbedder.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns**:

Dictionary with serialized data.
<a id="hugging_face_api_text_embedder.HuggingFaceAPITextEmbedder.from_dict"></a>

#### HuggingFaceAPITextEmbedder.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "HuggingFaceAPITextEmbedder"
```

Deserializes the component from a dictionary.

**Arguments**:

- `data`: Dictionary to deserialize from.

**Returns**:

Deserialized component.

<a id="hugging_face_api_text_embedder.HuggingFaceAPITextEmbedder.run"></a>

#### HuggingFaceAPITextEmbedder.run

```python
@component.output_types(embedding=list[float])
def run(text: str)
```

Embeds a single string.

**Arguments**:

- `text`: Text to embed.

**Returns**:

A dictionary with the following keys:
- `embedding`: The embedding of the input text.

<a id="hugging_face_api_text_embedder.HuggingFaceAPITextEmbedder.run_async"></a>

#### HuggingFaceAPITextEmbedder.run\_async

```python
@component.output_types(embedding=list[float])
async def run_async(text: str)
```

Embeds a single string asynchronously.

**Arguments**:

- `text`: Text to embed.

**Returns**:

A dictionary with the following keys:
- `embedding`: The embedding of the input text.

<a id="openai_document_embedder"></a>

# Module openai\_document\_embedder

<a id="openai_document_embedder.OpenAIDocumentEmbedder"></a>

## OpenAIDocumentEmbedder

Computes document embeddings using OpenAI models.

### Usage example

```python
from haystack import Document
from haystack.components.embedders import OpenAIDocumentEmbedder

doc = Document(content="I love pizza!")

document_embedder = OpenAIDocumentEmbedder()

result = document_embedder.run([doc])
print(result['documents'][0].embedding)

# [0.017020374536514282, -0.023255806416273117, ...]
```

<a id="openai_document_embedder.OpenAIDocumentEmbedder.__init__"></a>

#### OpenAIDocumentEmbedder.\_\_init\_\_

```python
def __init__(api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
             model: str = "text-embedding-ada-002",
             dimensions: Optional[int] = None,
             api_base_url: Optional[str] = None,
             organization: Optional[str] = None,
             prefix: str = "",
             suffix: str = "",
             batch_size: int = 32,
             progress_bar: bool = True,
             meta_fields_to_embed: Optional[list[str]] = None,
             embedding_separator: str = "\n",
             timeout: Optional[float] = None,
             max_retries: Optional[int] = None,
             http_client_kwargs: Optional[dict[str, Any]] = None,
             *,
             raise_on_failure: bool = False)
```

Creates an OpenAIDocumentEmbedder component.

Before initializing the component, you can set the 'OPENAI_TIMEOUT' and 'OPENAI_MAX_RETRIES'
environment variables to override the `timeout` and `max_retries` parameters respectively
in the OpenAI client.

**Arguments**:

- `api_key`: The OpenAI API key.
You can set it with the environment variable `OPENAI_API_KEY`, or pass it with this parameter
during initialization.
- `model`: The name of the model to use for calculating embeddings.
The default model is `text-embedding-ada-002`.
- `dimensions`: The number of dimensions of the resulting embeddings. Only `text-embedding-3` and
later models support this parameter.
- `api_base_url`: Overrides the default base URL for all HTTP requests.
- `organization`: Your OpenAI organization ID. See OpenAI's
[Setting Up Your Organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization)
for more information.
- `prefix`: A string to add at the beginning of each text.
- `suffix`: A string to add at the end of each text.
- `batch_size`: Number of documents to embed at once.
- `progress_bar`: If `True`, shows a progress bar when running.
- `meta_fields_to_embed`: List of metadata fields to embed along with the document text.
- `embedding_separator`: Separator used to concatenate the metadata fields to the document text.
- `timeout`: Timeout for OpenAI client calls. If not set, it defaults to either the
`OPENAI_TIMEOUT` environment variable or 30 seconds.
- `max_retries`: Maximum number of retries to contact OpenAI after an internal error.
If not set, it defaults to either the `OPENAI_MAX_RETRIES` environment variable or 5 retries.
- `http_client_kwargs`: A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/).
- `raise_on_failure`: Whether to raise an exception if the embedding request fails. If `False`, the component logs the error
and continues processing the remaining documents. If `True`, it raises an exception on failure.

<a id="openai_document_embedder.OpenAIDocumentEmbedder.to_dict"></a>

#### OpenAIDocumentEmbedder.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns**:

Dictionary with serialized data.

<a id="openai_document_embedder.OpenAIDocumentEmbedder.from_dict"></a>

#### OpenAIDocumentEmbedder.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "OpenAIDocumentEmbedder"
```

Deserializes the component from a dictionary.

**Arguments**:

- `data`: Dictionary to deserialize from.

**Returns**:

Deserialized component.
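The `dimensions` parameter of `OpenAIDocumentEmbedder` asks the API to return shortened embeddings; for `text-embedding-3` models, OpenAI describes this as truncating the full vector and re-normalizing it to unit length. A rough local sketch of that operation, under the stated assumption (the shortening actually happens server-side; this is an illustration, not what the component does):

```python
import math

def shorten_embedding(embedding, dimensions):
    # Truncate to the target number of dimensions, then rescale so the
    # shortened vector has unit L2 norm again.
    truncated = embedding[:dimensions]
    norm = math.sqrt(sum(x * x for x in truncated))
    return [x / norm for x in truncated]

vec = [3.0, 4.0, 12.0]
print(shorten_embedding(vec, 2))
# [0.6, 0.8]
```

Shortened embeddings trade some accuracy for smaller storage and faster similarity search in the Document Store.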
<a id="openai_document_embedder.OpenAIDocumentEmbedder.run"></a>

#### OpenAIDocumentEmbedder.run

```python
@component.output_types(documents=list[Document], meta=dict[str, Any])
def run(documents: list[Document])
```

Embeds a list of documents.

**Arguments**:

- `documents`: A list of documents to embed.

**Returns**:

A dictionary with the following keys:
- `documents`: A list of documents with embeddings.
- `meta`: Information about the usage of the model.

<a id="openai_document_embedder.OpenAIDocumentEmbedder.run_async"></a>

#### OpenAIDocumentEmbedder.run\_async

```python
@component.output_types(documents=list[Document], meta=dict[str, Any])
async def run_async(documents: list[Document])
```

Embeds a list of documents asynchronously.

**Arguments**:

- `documents`: A list of documents to embed.

**Returns**:

A dictionary with the following keys:
- `documents`: A list of documents with embeddings.
- `meta`: Information about the usage of the model.

<a id="openai_text_embedder"></a>

# Module openai\_text\_embedder

<a id="openai_text_embedder.OpenAITextEmbedder"></a>

## OpenAITextEmbedder

Embeds strings using OpenAI models.

You can use it to embed user queries and send them to an embedding Retriever.

### Usage example

```python
from haystack.components.embedders import OpenAITextEmbedder

text_to_embed = "I love pizza!"

text_embedder = OpenAITextEmbedder()

print(text_embedder.run(text_to_embed))

# {'embedding': [0.017020374536514282, -0.023255806416273117, ...],
#  'meta': {'model': 'text-embedding-ada-002-v2',
#           'usage': {'prompt_tokens': 4, 'total_tokens': 4}}}
```

<a id="openai_text_embedder.OpenAITextEmbedder.__init__"></a>

#### OpenAITextEmbedder.\_\_init\_\_

```python
def __init__(api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
             model: str = "text-embedding-ada-002",
             dimensions: Optional[int] = None,
             api_base_url: Optional[str] = None,
             organization: Optional[str] = None,
             prefix: str = "",
             suffix: str = "",
             timeout: Optional[float] = None,
             max_retries: Optional[int] = None,
             http_client_kwargs: Optional[dict[str, Any]] = None)
```

Creates an OpenAITextEmbedder component.

Before initializing the component, you can set the 'OPENAI_TIMEOUT' and 'OPENAI_MAX_RETRIES'
environment variables to override the `timeout` and `max_retries` parameters respectively
in the OpenAI client.

**Arguments**:

- `api_key`: The OpenAI API key.
You can set it with the environment variable `OPENAI_API_KEY`, or pass it with this parameter
during initialization.
- `model`: The name of the model to use for calculating embeddings.
The default model is `text-embedding-ada-002`.
- `dimensions`: The number of dimensions of the resulting embeddings. Only `text-embedding-3` and
later models support this parameter.
- `api_base_url`: Overrides the default base URL for all HTTP requests.
- `organization`: Your organization ID. See OpenAI's
[production best practices](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization)
for more information.
- `prefix`: A string to add at the beginning of each text to embed.
- `suffix`: A string to add at the end of each text to embed.
- `timeout`: Timeout for OpenAI client calls. If not set, it defaults to either the
`OPENAI_TIMEOUT` environment variable or 30 seconds.
- `max_retries`: Maximum number of retries to contact OpenAI after an internal error.
If not set, it defaults to either the `OPENAI_MAX_RETRIES` environment variable or 5 retries.
- `http_client_kwargs`: A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/).

<a id="openai_text_embedder.OpenAITextEmbedder.to_dict"></a>

#### OpenAITextEmbedder.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns**:

Dictionary with serialized data.

<a id="openai_text_embedder.OpenAITextEmbedder.from_dict"></a>

#### OpenAITextEmbedder.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "OpenAITextEmbedder"
```

Deserializes the component from a dictionary.

**Arguments**:

- `data`: Dictionary to deserialize from.

**Returns**:

Deserialized component.

<a id="openai_text_embedder.OpenAITextEmbedder.run"></a>

#### OpenAITextEmbedder.run

```python
@component.output_types(embedding=list[float], meta=dict[str, Any])
def run(text: str)
```

Embeds a single string.

**Arguments**:

- `text`: Text to embed.

**Returns**:

A dictionary with the following keys:
- `embedding`: The embedding of the input text.
- `meta`: Information about the usage of the model.
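A query embedding returned by `run` is typically compared to stored document embeddings with cosine similarity inside an embedding Retriever. A self-contained sketch of that comparison with made-up vectors (no API call; the document names are hypothetical):

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of L2 norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

query_emb = [1.0, 0.0]
doc_embs = {"doc_a": [0.9, 0.1], "doc_b": [0.0, 1.0]}

# Rank documents by similarity to the query, most similar first.
ranked = sorted(doc_embs, key=lambda d: cosine(query_emb, doc_embs[d]),
                reverse=True)
print(ranked)
# ['doc_a', 'doc_b']
```

In a real pipeline, the query side uses a text embedder and the document side a document embedder with the same model, so both vectors live in the same space.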

<a id="openai_text_embedder.OpenAITextEmbedder.run_async"></a>

#### OpenAITextEmbedder.run\_async

```python
@component.output_types(embedding=list[float], meta=dict[str, Any])
async def run_async(text: str)
```

Asynchronously embeds a single string.

This is the asynchronous version of the `run` method. It has the same parameters and return values
but can be used with `await` in async code.

**Arguments**:

- `text`: Text to embed.

**Returns**:

A dictionary with the following keys:
- `embedding`: The embedding of the input text.
- `meta`: Information about the usage of the model.

<a id="sentence_transformers_document_embedder"></a>

# Module sentence\_transformers\_document\_embedder

<a id="sentence_transformers_document_embedder.SentenceTransformersDocumentEmbedder"></a>

## SentenceTransformersDocumentEmbedder

Calculates document embeddings using Sentence Transformers models.

It stores the embeddings in the `embedding` metadata field of each document.
You can also embed documents' metadata.
Use this component in indexing pipelines to embed input documents
and send them to DocumentWriter to write them into a Document Store.

### Usage example

```python
from haystack import Document
from haystack.components.embedders import SentenceTransformersDocumentEmbedder

doc = Document(content="I love pizza!")
doc_embedder = SentenceTransformersDocumentEmbedder()
doc_embedder.warm_up()

result = doc_embedder.run([doc])
print(result['documents'][0].embedding)

# [-0.07804739475250244, 0.1498992145061493, ...]
```

<a id="sentence_transformers_document_embedder.SentenceTransformersDocumentEmbedder.__init__"></a>

#### SentenceTransformersDocumentEmbedder.\_\_init\_\_

```python
def __init__(model: str = "sentence-transformers/all-mpnet-base-v2",
             device: Optional[ComponentDevice] = None,
             token: Optional[Secret] = Secret.from_env_var(
                 ["HF_API_TOKEN", "HF_TOKEN"], strict=False),
             prefix: str = "",
             suffix: str = "",
             batch_size: int = 32,
             progress_bar: bool = True,
             normalize_embeddings: bool = False,
             meta_fields_to_embed: Optional[list[str]] = None,
             embedding_separator: str = "\n",
             trust_remote_code: bool = False,
             local_files_only: bool = False,
             truncate_dim: Optional[int] = None,
             model_kwargs: Optional[dict[str, Any]] = None,
             tokenizer_kwargs: Optional[dict[str, Any]] = None,
             config_kwargs: Optional[dict[str, Any]] = None,
             precision: Literal["float32", "int8", "uint8", "binary",
                                "ubinary"] = "float32",
             encode_kwargs: Optional[dict[str, Any]] = None,
             backend: Literal["torch", "onnx", "openvino"] = "torch")
```

Creates a SentenceTransformersDocumentEmbedder component.

**Arguments**:

- `model`: The model to use for calculating embeddings.
Pass a local path or ID of the model on Hugging Face.
- `device`: The device to use for loading the model.
Overrides the default device.
- `token`: The API token to download private models from Hugging Face.
- `prefix`: A string to add at the beginning of each document text.
Can be used to prepend the text with an instruction, as required by some embedding models,
such as E5 and bge.
- `suffix`: A string to add at the end of each document text.
- `batch_size`: Number of documents to embed at once.
- `progress_bar`: If `True`, shows a progress bar when embedding documents.
- `normalize_embeddings`: If `True`, the embeddings are normalized using L2 normalization, so that each embedding has a norm of 1.
- `meta_fields_to_embed`: List of metadata fields to embed along with the document text.
- `embedding_separator`: Separator used to concatenate the metadata fields to the document text.
- `trust_remote_code`: If `False`, allows only Hugging Face verified model architectures.
If `True`, allows custom models and scripts.
- `local_files_only`: If `True`, does not attempt to download the model from Hugging Face Hub and only looks at local files.
- `truncate_dim`: The dimension to truncate sentence embeddings to. `None` does no truncation.
If the model wasn't trained with Matryoshka Representation Learning,
truncating embeddings can significantly affect performance.
- `model_kwargs`: Additional keyword arguments for `AutoModelForSequenceClassification.from_pretrained`
when loading the model. Refer to specific model documentation for available kwargs.
- `tokenizer_kwargs`: Additional keyword arguments for `AutoTokenizer.from_pretrained` when loading the tokenizer.
Refer to specific model documentation for available kwargs.
- `config_kwargs`: Additional keyword arguments for `AutoConfig.from_pretrained` when loading the model configuration.
- `precision`: The precision to use for the embeddings.
All non-float32 precisions are quantized embeddings.
Quantized embeddings are smaller and faster to compute, but may have a lower accuracy.
They are useful for reducing the size of the embeddings of a corpus for semantic search, among other tasks.
- `encode_kwargs`: Additional keyword arguments for `SentenceTransformer.encode` when embedding documents.
This parameter is provided for fine customization. Be careful not to clash with already set parameters and
avoid passing parameters that change the output type.
- `backend`: The backend to use for the Sentence Transformers model. Choose from "torch", "onnx", or "openvino".
Refer to the [Sentence Transformers documentation](https://sbert.net/docs/sentence_transformer/usage/efficiency.html)
for more information on acceleration and quantization options.

<a id="sentence_transformers_document_embedder.SentenceTransformersDocumentEmbedder.to_dict"></a>

#### SentenceTransformersDocumentEmbedder.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns**:

Dictionary with serialized data.

<a id="sentence_transformers_document_embedder.SentenceTransformersDocumentEmbedder.from_dict"></a>

#### SentenceTransformersDocumentEmbedder.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "SentenceTransformersDocumentEmbedder"
```

Deserializes the component from a dictionary.

**Arguments**:

- `data`: Dictionary to deserialize from.

**Returns**:

Deserialized component.

<a id="sentence_transformers_document_embedder.SentenceTransformersDocumentEmbedder.warm_up"></a>

#### SentenceTransformersDocumentEmbedder.warm\_up

```python
def warm_up()
```

Initializes the component.

<a id="sentence_transformers_document_embedder.SentenceTransformersDocumentEmbedder.run"></a>

#### SentenceTransformersDocumentEmbedder.run

```python
@component.output_types(documents=list[Document])
def run(documents: list[Document])
```

Embeds a list of documents.

**Arguments**:

- `documents`: Documents to embed.

**Returns**:

A dictionary with the following keys:
- `documents`: Documents with embeddings.
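How `prefix`, `suffix`, `meta_fields_to_embed`, and `embedding_separator` combine into the text that actually gets embedded can be pictured with a short sketch. `build_text_to_embed` is a hypothetical helper illustrating the documented behavior, not the component's internal code:

```python
def build_text_to_embed(content, meta, prefix="", suffix="",
                        meta_fields_to_embed=None, embedding_separator="\n"):
    """Join the selected metadata fields and the document content with the
    separator, then wrap the result in the prefix and suffix."""
    fields = [str(meta[f]) for f in (meta_fields_to_embed or []) if f in meta]
    return prefix + embedding_separator.join(fields + [content]) + suffix

text = build_text_to_embed(
    "I love pizza!",
    meta={"title": "Food", "year": 2024},
    meta_fields_to_embed=["title"],
)
print(repr(text))  # 'Food\nI love pizza!'
```

Only the fields listed in `meta_fields_to_embed` are included; other metadata (here, `year`) is ignored.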

<a id="sentence_transformers_text_embedder"></a>

# Module sentence\_transformers\_text\_embedder

<a id="sentence_transformers_text_embedder.SentenceTransformersTextEmbedder"></a>

## SentenceTransformersTextEmbedder

Embeds strings using Sentence Transformers models.

You can use it to embed a user query and send it to an embedding retriever.

### Usage example

```python
from haystack.components.embedders import SentenceTransformersTextEmbedder

text_to_embed = "I love pizza!"

text_embedder = SentenceTransformersTextEmbedder()
text_embedder.warm_up()

print(text_embedder.run(text_to_embed))

# {'embedding': [-0.07804739475250244, 0.1498992145061493, ...]}
```

<a id="sentence_transformers_text_embedder.SentenceTransformersTextEmbedder.__init__"></a>

#### SentenceTransformersTextEmbedder.\_\_init\_\_

```python
def __init__(model: str = "sentence-transformers/all-mpnet-base-v2",
             device: Optional[ComponentDevice] = None,
             token: Optional[Secret] = Secret.from_env_var(
                 ["HF_API_TOKEN", "HF_TOKEN"], strict=False),
             prefix: str = "",
             suffix: str = "",
             batch_size: int = 32,
             progress_bar: bool = True,
             normalize_embeddings: bool = False,
             trust_remote_code: bool = False,
             local_files_only: bool = False,
             truncate_dim: Optional[int] = None,
             model_kwargs: Optional[dict[str, Any]] = None,
             tokenizer_kwargs: Optional[dict[str, Any]] = None,
             config_kwargs: Optional[dict[str, Any]] = None,
             precision: Literal["float32", "int8", "uint8", "binary",
                                "ubinary"] = "float32",
             encode_kwargs: Optional[dict[str, Any]] = None,
             backend: Literal["torch", "onnx", "openvino"] = "torch")
```

Creates a SentenceTransformersTextEmbedder component.

**Arguments**:

- `model`: The model to use for calculating embeddings.
Specify the path to a local model or the ID of the model on Hugging Face.
- `device`: Overrides the default device used to load the model.
- `token`: An API token to use private models from Hugging Face.
- `prefix`: A string to add at the beginning of each text to be embedded.
You can use it to prepend the text with an instruction, as required by some embedding models,
such as E5 and bge.
- `suffix`: A string to add at the end of each text to embed.
- `batch_size`: Number of texts to embed at once.
- `progress_bar`: If `True`, shows a progress bar for calculating embeddings.
If `False`, disables the progress bar.
- `normalize_embeddings`: If `True`, the embeddings are normalized using L2 normalization, so that the embeddings have a norm of 1.
- `trust_remote_code`: If `False`, permits only Hugging Face verified model architectures.
If `True`, permits custom models and scripts.
- `local_files_only`: If `True`, does not attempt to download the model from Hugging Face Hub and only looks at local files.
- `truncate_dim`: The dimension to truncate sentence embeddings to. `None` does no truncation.
If the model has not been trained with Matryoshka Representation Learning,
truncation of embeddings can significantly affect performance.
- `model_kwargs`: Additional keyword arguments for `AutoModelForSequenceClassification.from_pretrained`
when loading the model. Refer to specific model documentation for available kwargs.
- `tokenizer_kwargs`: Additional keyword arguments for `AutoTokenizer.from_pretrained` when loading the tokenizer.
Refer to specific model documentation for available kwargs.
- `config_kwargs`: Additional keyword arguments for `AutoConfig.from_pretrained` when loading the model configuration.
- `precision`: The precision to use for the embeddings.
All non-float32 precisions are quantized embeddings.
Quantized embeddings are smaller in size and faster to compute, but may have a lower accuracy.
They are useful for reducing the size of the embeddings of a corpus for semantic search, among other tasks.
- `encode_kwargs`: Additional keyword arguments for `SentenceTransformer.encode` when embedding texts.
This parameter is provided for fine customization. Be careful not to clash with already set parameters and
avoid passing parameters that change the output type.
- `backend`: The backend to use for the Sentence Transformers model. Choose from "torch", "onnx", or "openvino".
Refer to the [Sentence Transformers documentation](https://sbert.net/docs/sentence_transformer/usage/efficiency.html)
for more information on acceleration and quantization options.

<a id="sentence_transformers_text_embedder.SentenceTransformersTextEmbedder.to_dict"></a>

#### SentenceTransformersTextEmbedder.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns**:

Dictionary with serialized data.

<a id="sentence_transformers_text_embedder.SentenceTransformersTextEmbedder.from_dict"></a>

#### SentenceTransformersTextEmbedder.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "SentenceTransformersTextEmbedder"
```

Deserializes the component from a dictionary.

**Arguments**:

- `data`: Dictionary to deserialize from.

**Returns**:

Deserialized component.

<a id="sentence_transformers_text_embedder.SentenceTransformersTextEmbedder.warm_up"></a>

#### SentenceTransformersTextEmbedder.warm\_up

```python
def warm_up()
```

Initializes the component.
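Setting `normalize_embeddings=True` rescales each vector to unit L2 norm, which makes cosine similarity and dot product interchangeable. A plain-Python sketch of the operation:

```python
import math

def l2_normalize(vector):
    """Scale a vector so its Euclidean (L2) norm becomes 1."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)

unit = l2_normalize([3.0, 4.0])
print(unit)               # [0.6, 0.8]
print(math.hypot(*unit))  # 1.0
```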

<a id="sentence_transformers_text_embedder.SentenceTransformersTextEmbedder.run"></a>

#### SentenceTransformersTextEmbedder.run

```python
@component.output_types(embedding=list[float])
def run(text: str)
```

Embeds a single string.

**Arguments**:

- `text`: Text to embed.

**Returns**:

A dictionary with the following keys:
- `embedding`: The embedding of the input text.

<a id="sentence_transformers_sparse_document_embedder"></a>

# Module sentence\_transformers\_sparse\_document\_embedder

<a id="sentence_transformers_sparse_document_embedder.SentenceTransformersSparseDocumentEmbedder"></a>

## SentenceTransformersSparseDocumentEmbedder

Calculates document sparse embeddings using sparse embedding models from Sentence Transformers.

It stores the sparse embeddings in the `sparse_embedding` metadata field of each document.
You can also embed documents' metadata.
Use this component in indexing pipelines to embed input documents
and send them to DocumentWriter to write them into a Document Store.

### Usage example

```python
from haystack import Document
from haystack.components.embedders import SentenceTransformersSparseDocumentEmbedder

doc = Document(content="I love pizza!")
doc_embedder = SentenceTransformersSparseDocumentEmbedder()
doc_embedder.warm_up()

result = doc_embedder.run([doc])
print(result['documents'][0].sparse_embedding)

# SparseEmbedding(indices=[999, 1045, ...], values=[0.918, 0.867, ...])
```

<a id="sentence_transformers_sparse_document_embedder.SentenceTransformersSparseDocumentEmbedder.__init__"></a>

#### SentenceTransformersSparseDocumentEmbedder.\_\_init\_\_

```python
def __init__(*,
             model: str = "prithivida/Splade_PP_en_v2",
             device: Optional[ComponentDevice] = None,
             token: Optional[Secret] = Secret.from_env_var(
                 ["HF_API_TOKEN", "HF_TOKEN"], strict=False),
             prefix: str = "",
             suffix: str = "",
             batch_size: int = 32,
             progress_bar: bool = True,
             meta_fields_to_embed: Optional[list[str]] = None,
             embedding_separator: str = "\n",
             trust_remote_code: bool = False,
             local_files_only: bool = False,
             model_kwargs: Optional[dict[str, Any]] = None,
             tokenizer_kwargs: Optional[dict[str, Any]] = None,
             config_kwargs: Optional[dict[str, Any]] = None,
             backend: Literal["torch", "onnx", "openvino"] = "torch")
```

Creates a SentenceTransformersSparseDocumentEmbedder component.

**Arguments**:

- `model`: The model to use for calculating sparse embeddings.
Pass a local path or ID of the model on Hugging Face.
- `device`: The device to use for loading the model.
Overrides the default device.
- `token`: The API token to download private models from Hugging Face.
- `prefix`: A string to add at the beginning of each document text.
- `suffix`: A string to add at the end of each document text.
- `batch_size`: Number of documents to embed at once.
- `progress_bar`: If `True`, shows a progress bar when embedding documents.
- `meta_fields_to_embed`: List of metadata fields to embed along with the document text.
- `embedding_separator`: Separator used to concatenate the metadata fields to the document text.
- `trust_remote_code`: If `False`, allows only Hugging Face verified model architectures.
If `True`, allows custom models and scripts.
- `local_files_only`: If `True`, does not attempt to download the model from Hugging Face Hub and only looks at local files.
- `model_kwargs`: Additional keyword arguments for `AutoModelForSequenceClassification.from_pretrained`
when loading the model. Refer to specific model documentation for available kwargs.
- `tokenizer_kwargs`: Additional keyword arguments for `AutoTokenizer.from_pretrained` when loading the tokenizer.
Refer to specific model documentation for available kwargs.
- `config_kwargs`: Additional keyword arguments for `AutoConfig.from_pretrained` when loading the model configuration.
- `backend`: The backend to use for the Sentence Transformers model. Choose from "torch", "onnx", or "openvino".
Refer to the [Sentence Transformers documentation](https://sbert.net/docs/sentence_transformer/usage/efficiency.html)
for more information on acceleration and quantization options.

<a id="sentence_transformers_sparse_document_embedder.SentenceTransformersSparseDocumentEmbedder.to_dict"></a>

#### SentenceTransformersSparseDocumentEmbedder.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns**:

Dictionary with serialized data.
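A sparse embedding stores only the non-zero dimensions, as parallel `indices` and `values` lists. A sketch of how two such vectors could be scored against each other with a dot product (the helper below is illustrative, not part of the API; the index/value pairs mirror the usage example above):

```python
def sparse_dot(indices_a, values_a, indices_b, values_b):
    """Dot product of two sparse vectors given as parallel index/value lists."""
    b = dict(zip(indices_b, values_b))
    return sum(v * b.get(i, 0.0) for i, v in zip(indices_a, values_a))

# The two vectors overlap only on index 1045, so only that term contributes:
score = sparse_dot([999, 1045], [0.918, 0.867], [1045, 2001], [0.5, 0.3])
print(score)  # 0.4335
```

Because only overlapping indices contribute, scoring stays cheap even when the nominal dimensionality (the model's vocabulary size) is very large.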

<a id="sentence_transformers_sparse_document_embedder.SentenceTransformersSparseDocumentEmbedder.from_dict"></a>

#### SentenceTransformersSparseDocumentEmbedder.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "SentenceTransformersSparseDocumentEmbedder"
```

Deserializes the component from a dictionary.

**Arguments**:

- `data`: Dictionary to deserialize from.

**Returns**:

Deserialized component.

<a id="sentence_transformers_sparse_document_embedder.SentenceTransformersSparseDocumentEmbedder.warm_up"></a>

#### SentenceTransformersSparseDocumentEmbedder.warm\_up

```python
def warm_up()
```

Initializes the component.

<a id="sentence_transformers_sparse_document_embedder.SentenceTransformersSparseDocumentEmbedder.run"></a>

#### SentenceTransformersSparseDocumentEmbedder.run

```python
@component.output_types(documents=list[Document])
def run(documents: list[Document])
```

Embeds a list of documents.

**Arguments**:

- `documents`: Documents to embed.

**Returns**:

A dictionary with the following keys:
- `documents`: Documents with sparse embeddings under the `sparse_embedding` field.

<a id="sentence_transformers_sparse_text_embedder"></a>

# Module sentence\_transformers\_sparse\_text\_embedder

<a id="sentence_transformers_sparse_text_embedder.SentenceTransformersSparseTextEmbedder"></a>

## SentenceTransformersSparseTextEmbedder

Embeds strings using sparse embedding models from Sentence Transformers.

You can use it to embed a user query and send it to a sparse embedding retriever.

### Usage example

```python
from haystack.components.embedders import SentenceTransformersSparseTextEmbedder

text_to_embed = "I love pizza!"

text_embedder = SentenceTransformersSparseTextEmbedder()
text_embedder.warm_up()

print(text_embedder.run(text_to_embed))

# {'sparse_embedding': SparseEmbedding(indices=[999, 1045, ...], values=[0.918, 0.867, ...])}
```

<a id="sentence_transformers_sparse_text_embedder.SentenceTransformersSparseTextEmbedder.__init__"></a>

#### SentenceTransformersSparseTextEmbedder.\_\_init\_\_

```python
def __init__(*,
             model: str = "prithivida/Splade_PP_en_v2",
             device: Optional[ComponentDevice] = None,
             token: Optional[Secret] = Secret.from_env_var(
                 ["HF_API_TOKEN", "HF_TOKEN"], strict=False),
             prefix: str = "",
             suffix: str = "",
             trust_remote_code: bool = False,
             local_files_only: bool = False,
             model_kwargs: Optional[dict[str, Any]] = None,
             tokenizer_kwargs: Optional[dict[str, Any]] = None,
             config_kwargs: Optional[dict[str, Any]] = None,
             encode_kwargs: Optional[dict[str, Any]] = None,
             backend: Literal["torch", "onnx", "openvino"] = "torch")
```

Creates a SentenceTransformersSparseTextEmbedder component.

**Arguments**:

- `model`: The model to use for calculating sparse embeddings.
Specify the path to a local model or the ID of the model on Hugging Face.
- `device`: Overrides the default device used to load the model.
- `token`: An API token to use private models from Hugging Face.
- `prefix`: A string to add at the beginning of each text to be embedded.
- `suffix`: A string to add at the end of each text to embed.
- `trust_remote_code`: If `False`, permits only Hugging Face verified model architectures.
If `True`, permits custom models and scripts.
- `local_files_only`: If `True`, does not attempt to download the model from Hugging Face Hub and only looks at local files.
- `model_kwargs`: Additional keyword arguments for `AutoModelForSequenceClassification.from_pretrained`
when loading the model. Refer to specific model documentation for available kwargs.
- `tokenizer_kwargs`: Additional keyword arguments for `AutoTokenizer.from_pretrained` when loading the tokenizer.
Refer to specific model documentation for available kwargs.
- `config_kwargs`: Additional keyword arguments for `AutoConfig.from_pretrained` when loading the model configuration.
- `encode_kwargs`: Additional keyword arguments for the model's `encode` method when embedding texts.
- `backend`: The backend to use for the Sentence Transformers model. Choose from "torch", "onnx", or "openvino".
Refer to the [Sentence Transformers documentation](https://sbert.net/docs/sentence_transformer/usage/efficiency.html)
for more information on acceleration and quantization options.

<a id="sentence_transformers_sparse_text_embedder.SentenceTransformersSparseTextEmbedder.to_dict"></a>

#### SentenceTransformersSparseTextEmbedder.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns**:

Dictionary with serialized data.

<a id="sentence_transformers_sparse_text_embedder.SentenceTransformersSparseTextEmbedder.from_dict"></a>

#### SentenceTransformersSparseTextEmbedder.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "SentenceTransformersSparseTextEmbedder"
```

Deserializes the component from a dictionary.

**Arguments**:

- `data`: Dictionary to deserialize from.

**Returns**:

Deserialized component.

<a id="sentence_transformers_sparse_text_embedder.SentenceTransformersSparseTextEmbedder.warm_up"></a>

#### SentenceTransformersSparseTextEmbedder.warm\_up

```python
def warm_up()
```

Initializes the component.

<a id="sentence_transformers_sparse_text_embedder.SentenceTransformersSparseTextEmbedder.run"></a>

#### SentenceTransformersSparseTextEmbedder.run

```python
@component.output_types(sparse_embedding=SparseEmbedding)
def run(text: str)
```

Embeds a single string.

**Arguments**:

- `text`: Text to embed.

**Returns**:

A dictionary with the following keys:
- `sparse_embedding`: The sparse embedding of the input text.

<a id="image/sentence_transformers_doc_image_embedder"></a>

# Module image/sentence\_transformers\_doc\_image\_embedder

<a id="image/sentence_transformers_doc_image_embedder.SentenceTransformersDocumentImageEmbedder"></a>

## SentenceTransformersDocumentImageEmbedder

A component for computing Document embeddings based on images using Sentence Transformers models.

The embedding of each Document is stored in the `embedding` field of the Document.

### Usage example

```python
from haystack import Document
from haystack.components.embedders.image import SentenceTransformersDocumentImageEmbedder

embedder = SentenceTransformersDocumentImageEmbedder(model="sentence-transformers/clip-ViT-B-32")
embedder.warm_up()

documents = [
    Document(content="A photo of a cat", meta={"file_path": "cat.jpg"}),
    Document(content="A photo of a dog", meta={"file_path": "dog.jpg"}),
]

result = embedder.run(documents=documents)
documents_with_embeddings = result["documents"]
print(documents_with_embeddings)

# [Document(id=...,
#  content='A photo of a cat',
#  meta={'file_path': 'cat.jpg',
#        'embedding_source': {'type': 'image', 'file_path_meta_field': 'file_path'}},
#  embedding=vector of size 512),
# ...]
```

<a id="image/sentence_transformers_doc_image_embedder.SentenceTransformersDocumentImageEmbedder.__init__"></a>

#### SentenceTransformersDocumentImageEmbedder.\_\_init\_\_

```python
def __init__(*,
             file_path_meta_field: str = "file_path",
             root_path: Optional[str] = None,
             model: str = "sentence-transformers/clip-ViT-B-32",
             device: Optional[ComponentDevice] = None,
             token: Optional[Secret] = Secret.from_env_var(
                 ["HF_API_TOKEN", "HF_TOKEN"], strict=False),
             batch_size: int = 32,
             progress_bar: bool = True,
             normalize_embeddings: bool = False,
             trust_remote_code: bool = False,
             local_files_only: bool = False,
             model_kwargs: Optional[dict[str, Any]] = None,
             tokenizer_kwargs: Optional[dict[str, Any]] = None,
             config_kwargs: Optional[dict[str, Any]] = None,
             precision: Literal["float32", "int8", "uint8", "binary",
                                "ubinary"] = "float32",
             encode_kwargs: Optional[dict[str, Any]] = None,
             backend: Literal["torch", "onnx", "openvino"] = "torch") -> None
```

Creates a SentenceTransformersDocumentImageEmbedder component.

**Arguments**:

- `file_path_meta_field`: The metadata field in the Document that contains the file path to the image or PDF.
- `root_path`: The root directory path where document files are located. If provided, file paths in
document metadata will be resolved relative to this path. If None, file paths are treated as absolute paths.
- `model`: The Sentence Transformers model to use for calculating embeddings. Pass a local path or ID of the model on
Hugging Face. To be used with this component, the model must be able to embed images and text into the same
vector space. Compatible models include:
  - "sentence-transformers/clip-ViT-B-32"
  - "sentence-transformers/clip-ViT-L-14"
  - "sentence-transformers/clip-ViT-B-16"
  - "sentence-transformers/clip-ViT-B-32-multilingual-v1"
  - "jinaai/jina-embeddings-v4"
  - "jinaai/jina-clip-v1"
  - "jinaai/jina-clip-v2"
- `device`: The device to use for loading the model.
Overrides the default device.
- `token`: The API token to download private models from Hugging Face.
- `batch_size`: Number of documents to embed at once.
- `progress_bar`: If `True`, shows a progress bar when embedding documents.
- `normalize_embeddings`: If `True`, the embeddings are normalized using L2 normalization, so that each embedding has a norm of 1.
- `trust_remote_code`: If `False`, allows only Hugging Face verified model architectures.
If `True`, allows custom models and scripts.
- `local_files_only`: If `True`, does not attempt to download the model from Hugging Face Hub and only looks at local files.
- `model_kwargs`: Additional keyword arguments for `AutoModelForSequenceClassification.from_pretrained`
when loading the model. Refer to specific model documentation for available kwargs.
- `tokenizer_kwargs`: Additional keyword arguments for `AutoTokenizer.from_pretrained` when loading the tokenizer.
Refer to specific model documentation for available kwargs.
- `config_kwargs`: Additional keyword arguments for `AutoConfig.from_pretrained` when loading the model configuration.
- `precision`: The precision to use for the embeddings.
All non-float32 precisions are quantized embeddings.
Quantized embeddings are smaller and faster to compute, but may have a lower accuracy.
They are useful for reducing the size of the embeddings of a corpus for semantic search, among other tasks.
- `encode_kwargs`: Additional keyword arguments for `SentenceTransformer.encode` when embedding documents.
This parameter is provided for fine customization. Be careful not to clash with already set parameters and
avoid passing parameters that change the output type.
- `backend`: The backend to use for the Sentence Transformers model. Choose from "torch", "onnx", or "openvino".
Refer to the [Sentence Transformers documentation](https://sbert.net/docs/sentence_transformer/usage/efficiency.html)
for more information on acceleration and quantization options.

<a id="image/sentence_transformers_doc_image_embedder.SentenceTransformersDocumentImageEmbedder.to_dict"></a>

#### SentenceTransformersDocumentImageEmbedder.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns**:

Dictionary with serialized data.

<a id="image/sentence_transformers_doc_image_embedder.SentenceTransformersDocumentImageEmbedder.from_dict"></a>

#### SentenceTransformersDocumentImageEmbedder.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "SentenceTransformersDocumentImageEmbedder"
```

Deserializes the component from a dictionary.

**Arguments**:

- `data`: Dictionary to deserialize from.

**Returns**:

Deserialized component.

<a id="image/sentence_transformers_doc_image_embedder.SentenceTransformersDocumentImageEmbedder.warm_up"></a>

#### SentenceTransformersDocumentImageEmbedder.warm\_up

```python
def warm_up() -> None
```

Initializes the component.

<a id="image/sentence_transformers_doc_image_embedder.SentenceTransformersDocumentImageEmbedder.run"></a>

#### SentenceTransformersDocumentImageEmbedder.run

```python
@component.output_types(documents=list[Document])
def run(documents: list[Document]) -> dict[str, list[Document]]
```

Embeds a list of documents.

**Arguments**:

- `documents`: Documents to embed.

**Returns**:

A dictionary with the following keys:
- `documents`: Documents with embeddings.
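The interaction of `file_path_meta_field` and `root_path` can be pictured with a short sketch; `resolve_image_path` is a hypothetical helper mirroring the documented behavior (paths are resolved relative to `root_path` when it is set, and used as-is otherwise), not the component's own code:

```python
from pathlib import Path

def resolve_image_path(meta, file_path_meta_field="file_path", root_path=None):
    """Return the path a document's image would be read from."""
    file_path = Path(meta[file_path_meta_field])
    # If a root_path is given, treat the stored path as relative to it.
    return Path(root_path) / file_path if root_path else file_path

print(resolve_image_path({"file_path": "cat.jpg"}, root_path="/data/images"))
# /data/images/cat.jpg
print(resolve_image_path({"file_path": "/abs/cat.jpg"}))
# /abs/cat.jpg
```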