   1  ---
   2  title: Generators
   3  id: generators-api
   4  description: Enables text generation using LLMs.
   5  slug: "/generators-api"
   6  ---
   7  
   8  <a id="azure"></a>
   9  
  10  # Module azure
  11  
  12  <a id="azure.AzureOpenAIGenerator"></a>
  13  
  14  ## AzureOpenAIGenerator
  15  
  16  Generates text using OpenAI's large language models (LLMs).
  17  
It works with gpt-4 and similar models and supports streaming responses
from the OpenAI API.
  20  
  21  You can customize how the text is generated by passing parameters to the
  22  OpenAI API. Use the `**generation_kwargs` argument when you initialize
  23  the component or when you run it. Any parameter that works with
  24  `openai.ChatCompletion.create` will work here too.
  25  
  26  
  27  For details on OpenAI API parameters, see
  28  [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).
  29  
  30  
  31  ### Usage example
  32  
  33  ```python
  34  from haystack.components.generators import AzureOpenAIGenerator
  35  from haystack.utils import Secret
  36  client = AzureOpenAIGenerator(
    azure_endpoint="<your Azure endpoint, e.g. https://your-company.openai.azure.com/>",
    api_key=Secret.from_token("<your-api-key>"),
    azure_deployment="<your deployment name, e.g. gpt-4o-mini>")
  40  response = client.run("What's Natural Language Processing? Be brief.")
  41  print(response)
  42  ```
  43  
  44  ```
  45  >> {'replies': ['Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on
  46  >> the interaction between computers and human language. It involves enabling computers to understand, interpret,
  47  >> and respond to natural human language in a way that is both meaningful and useful.'], 'meta': [{'model':
  48  >> 'gpt-4o-mini', 'index': 0, 'finish_reason': 'stop', 'usage': {'prompt_tokens': 16,
  49  >> 'completion_tokens': 49, 'total_tokens': 65}}]}
  50  ```
  51  
  52  <a id="azure.AzureOpenAIGenerator.__init__"></a>
  53  
  54  #### AzureOpenAIGenerator.\_\_init\_\_
  55  
  56  ```python
  57  def __init__(azure_endpoint: Optional[str] = None,
  58               api_version: Optional[str] = "2023-05-15",
  59               azure_deployment: Optional[str] = "gpt-4o-mini",
  60               api_key: Optional[Secret] = Secret.from_env_var(
  61                   "AZURE_OPENAI_API_KEY", strict=False),
  62               azure_ad_token: Optional[Secret] = Secret.from_env_var(
  63                   "AZURE_OPENAI_AD_TOKEN", strict=False),
  64               organization: Optional[str] = None,
  65               streaming_callback: Optional[StreamingCallbackT] = None,
  66               system_prompt: Optional[str] = None,
  67               timeout: Optional[float] = None,
  68               max_retries: Optional[int] = None,
  69               http_client_kwargs: Optional[dict[str, Any]] = None,
  70               generation_kwargs: Optional[dict[str, Any]] = None,
  71               default_headers: Optional[dict[str, str]] = None,
  72               *,
  73               azure_ad_token_provider: Optional[AzureADTokenProvider] = None)
  74  ```
  75  
  76  Initialize the Azure OpenAI Generator.
  77  
  78  **Arguments**:
  79  
- `azure_endpoint`: The endpoint of the deployed model, for example `https://example-resource.openai.azure.com/`.
  81  - `api_version`: The version of the API to use. Defaults to 2023-05-15.
  82  - `azure_deployment`: The deployment of the model, usually the model name.
  83  - `api_key`: The API key to use for authentication.
  84  - `azure_ad_token`: [Azure Active Directory token](https://www.microsoft.com/en-us/security/business/identity-access/microsoft-entra-id).
  85  - `organization`: Your organization ID, defaults to `None`. For help, see
  86  [Setting up your organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
  87  - `streaming_callback`: A callback function called when a new token is received from the stream.
  88  It accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
  89  as an argument.
- `system_prompt`: The system prompt to use for text generation. If not provided, the system prompt is
omitted and the model's default behavior is used.
  92  - `timeout`: Timeout for AzureOpenAI client. If not set, it is inferred from the
  93  `OPENAI_TIMEOUT` environment variable or set to 30.
  94  - `max_retries`: Maximum retries to establish contact with AzureOpenAI if it returns an internal error.
  95  If not set, it is inferred from the `OPENAI_MAX_RETRIES` environment variable or set to 5.
- `http_client_kwargs`: A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/).
  98  - `generation_kwargs`: Other parameters to use for the model, sent directly to
  99  the OpenAI endpoint. See [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat) for
 100  more details.
 101  Some of the supported parameters:
 102  - `max_tokens`: The maximum number of tokens the output text can have.
 103  - `temperature`: The sampling temperature to use. Higher values mean the model takes more risks.
 104      Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
 105  - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
 106      considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
 107      comprising the top 10% probability mass are considered.
 108  - `n`: The number of completions to generate for each prompt. For example, with 3 prompts and n=2,
 109      the LLM will generate two completions per prompt, resulting in 6 completions total.
 110  - `stop`: One or more sequences after which the LLM should stop generating tokens.
 111  - `presence_penalty`: The penalty applied if a token is already present.
 112      Higher values make the model less likely to repeat the token.
 113  - `frequency_penalty`: Penalty applied if a token has already been generated.
 114      Higher values make the model less likely to repeat the token.
 115  - `logit_bias`: Adds a logit bias to specific tokens. The keys of the dictionary are tokens, and the
 116      values are the bias to add to that token.
 117  - `default_headers`: Default headers to use for the AzureOpenAI client.
- `azure_ad_token_provider`: A function that returns an Azure Active Directory token. It is invoked on
every request.
 120  
 121  <a id="azure.AzureOpenAIGenerator.to_dict"></a>
 122  
 123  #### AzureOpenAIGenerator.to\_dict
 124  
 125  ```python
 126  def to_dict() -> dict[str, Any]
 127  ```
 128  
 129  Serialize this component to a dictionary.
 130  
 131  **Returns**:
 132  
 133  The serialized component as a dictionary.
 134  
 135  <a id="azure.AzureOpenAIGenerator.from_dict"></a>
 136  
 137  #### AzureOpenAIGenerator.from\_dict
 138  
 139  ```python
 140  @classmethod
 141  def from_dict(cls, data: dict[str, Any]) -> "AzureOpenAIGenerator"
 142  ```
 143  
 144  Deserialize this component from a dictionary.
 145  
 146  **Arguments**:
 147  
 148  - `data`: The dictionary representation of this component.
 149  
 150  **Returns**:
 151  
 152  The deserialized component instance.
 153  
 154  <a id="azure.AzureOpenAIGenerator.run"></a>
 155  
 156  #### AzureOpenAIGenerator.run
 157  
 158  ```python
 159  @component.output_types(replies=list[str], meta=list[dict[str, Any]])
 160  def run(prompt: str,
 161          system_prompt: Optional[str] = None,
 162          streaming_callback: Optional[StreamingCallbackT] = None,
 163          generation_kwargs: Optional[dict[str, Any]] = None)
 164  ```
 165  
 166  Invoke the text generation inference based on the provided messages and generation parameters.
 167  
 168  **Arguments**:
 169  
 170  - `prompt`: The string prompt to use for text generation.
- `system_prompt`: The system prompt to use for text generation. If omitted at run time, the system
prompt defined at initialization time, if any, is used.
 173  - `streaming_callback`: A callback function that is called when a new token is received from the stream.
 174  - `generation_kwargs`: Additional keyword arguments for text generation. These parameters will potentially override the parameters
 175  passed in the `__init__` method. For more details on the parameters supported by the OpenAI API, refer to
 176  the OpenAI [documentation](https://platform.openai.com/docs/api-reference/chat/create).
 177  
 178  **Returns**:
 179  
 180  A list of strings containing the generated responses and a list of dictionaries containing the metadata
 181  for each response.
 182  
 183  <a id="hugging_face_local"></a>
 184  
 185  # Module hugging\_face\_local
 186  
 187  <a id="hugging_face_local.HuggingFaceLocalGenerator"></a>
 188  
 189  ## HuggingFaceLocalGenerator
 190  
 191  Generates text using models from Hugging Face that run locally.
 192  
 193  LLMs running locally may need powerful hardware.
 194  
 195  ### Usage example
 196  
 197  ```python
 198  from haystack.components.generators import HuggingFaceLocalGenerator
 199  
 200  generator = HuggingFaceLocalGenerator(
 201      model="google/flan-t5-large",
 202      task="text2text-generation",
 203      generation_kwargs={"max_new_tokens": 100, "temperature": 0.9})
 204  
 205  generator.warm_up()
 206  
 207  print(generator.run("Who is the best American actor?"))
 208  # {'replies': ['John Cusack']}
 209  ```
 210  
 211  <a id="hugging_face_local.HuggingFaceLocalGenerator.__init__"></a>
 212  
 213  #### HuggingFaceLocalGenerator.\_\_init\_\_
 214  
 215  ```python
 216  def __init__(model: str = "google/flan-t5-base",
 217               task: Optional[Literal["text-generation",
 218                                      "text2text-generation"]] = None,
 219               device: Optional[ComponentDevice] = None,
 220               token: Optional[Secret] = Secret.from_env_var(
 221                   ["HF_API_TOKEN", "HF_TOKEN"], strict=False),
 222               generation_kwargs: Optional[dict[str, Any]] = None,
 223               huggingface_pipeline_kwargs: Optional[dict[str, Any]] = None,
 224               stop_words: Optional[list[str]] = None,
 225               streaming_callback: Optional[StreamingCallbackT] = None)
 226  ```
 227  
 228  Creates an instance of a HuggingFaceLocalGenerator.
 229  
 230  **Arguments**:
 231  
 232  - `model`: The Hugging Face text generation model name or path.
 233  - `task`: The task for the Hugging Face pipeline. Possible options:
 234  - `text-generation`: Supported by decoder models, like GPT.
 235  - `text2text-generation`: Supported by encoder-decoder models, like T5.
 236  If the task is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
 237  If not specified, the component calls the Hugging Face API to infer the task from the model name.
 238  - `device`: The device for loading the model. If `None`, automatically selects the default device.
 239  If a device or device map is specified in `huggingface_pipeline_kwargs`, it overrides this parameter.
 240  - `token`: The token to use as HTTP bearer authorization for remote files.
 241  If the token is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
 242  - `generation_kwargs`: A dictionary with keyword arguments to customize text generation.
 243  Some examples: `max_length`, `max_new_tokens`, `temperature`, `top_k`, `top_p`.
 244  See Hugging Face's documentation for more information:
 245  - [customize-text-generation](https://huggingface.co/docs/transformers/main/en/generation_strategies#customize-text-generation)
 246  - [transformers.GenerationConfig](https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.GenerationConfig)
 247  - `huggingface_pipeline_kwargs`: Dictionary with keyword arguments to initialize the
 248  Hugging Face pipeline for text generation.
 249  These keyword arguments provide fine-grained control over the Hugging Face pipeline.
 250  In case of duplication, these kwargs override `model`, `task`, `device`, and `token` init parameters.
 251  For available kwargs, see [Hugging Face documentation](https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.pipeline.task).
 252  In this dictionary, you can also include `model_kwargs` to specify the kwargs for model initialization:
 253  [transformers.PreTrainedModel.from_pretrained](https://huggingface.co/docs/transformers/en/main_classes/model#transformers.PreTrainedModel.from_pretrained)
 254  - `stop_words`: If the model generates a stop word, the generation stops.
 255  If you provide this parameter, don't specify the `stopping_criteria` in `generation_kwargs`.
 256  For some chat models, the output includes both the new text and the original prompt.
 257  In these cases, make sure your prompt has no stop words.
 258  - `streaming_callback`: An optional callable for handling streaming responses.
 259  
 260  <a id="hugging_face_local.HuggingFaceLocalGenerator.warm_up"></a>
 261  
 262  #### HuggingFaceLocalGenerator.warm\_up
 263  
 264  ```python
 265  def warm_up()
 266  ```
 267  
 268  Initializes the component.
 269  
 270  <a id="hugging_face_local.HuggingFaceLocalGenerator.to_dict"></a>
 271  
 272  #### HuggingFaceLocalGenerator.to\_dict
 273  
 274  ```python
 275  def to_dict() -> dict[str, Any]
 276  ```
 277  
 278  Serializes the component to a dictionary.
 279  
 280  **Returns**:
 281  
 282  Dictionary with serialized data.
 283  
 284  <a id="hugging_face_local.HuggingFaceLocalGenerator.from_dict"></a>
 285  
 286  #### HuggingFaceLocalGenerator.from\_dict
 287  
 288  ```python
 289  @classmethod
 290  def from_dict(cls, data: dict[str, Any]) -> "HuggingFaceLocalGenerator"
 291  ```
 292  
 293  Deserializes the component from a dictionary.
 294  
 295  **Arguments**:
 296  
 297  - `data`: The dictionary to deserialize from.
 298  
 299  **Returns**:
 300  
 301  The deserialized component.
 302  
 303  <a id="hugging_face_local.HuggingFaceLocalGenerator.run"></a>
 304  
 305  #### HuggingFaceLocalGenerator.run
 306  
 307  ```python
 308  @component.output_types(replies=list[str])
 309  def run(prompt: str,
 310          streaming_callback: Optional[StreamingCallbackT] = None,
 311          generation_kwargs: Optional[dict[str, Any]] = None)
 312  ```
 313  
 314  Run the text generation model on the given prompt.
 315  
 316  **Arguments**:
 317  
 318  - `prompt`: A string representing the prompt.
 319  - `streaming_callback`: A callback function that is called when a new token is received from the stream.
 320  - `generation_kwargs`: Additional keyword arguments for text generation.
 321  
 322  **Returns**:
 323  
 324  A dictionary containing the generated replies.
 325  - replies: A list of strings representing the generated replies.
 326  
 327  <a id="hugging_face_api"></a>
 328  
 329  # Module hugging\_face\_api
 330  
 331  <a id="hugging_face_api.HuggingFaceAPIGenerator"></a>
 332  
 333  ## HuggingFaceAPIGenerator
 334  
 335  Generates text using Hugging Face APIs.
 336  
 337  Use it with the following Hugging Face APIs:
 338  - [Paid Inference Endpoints](https://huggingface.co/inference-endpoints)
 339  - [Self-hosted Text Generation Inference](https://github.com/huggingface/text-generation-inference)
 340  
 341  **Note:** As of July 2025, the Hugging Face Inference API no longer offers generative models through the
 342  `text_generation` endpoint. Generative models are now only available through providers supporting the
 343  `chat_completion` endpoint. As a result, this component might no longer work with the Hugging Face Inference API.
 344  Use the `HuggingFaceAPIChatGenerator` component, which supports the `chat_completion` endpoint.
 345  
### Usage examples

#### With Hugging Face Inference Endpoints

```python
from haystack.components.generators import HuggingFaceAPIGenerator
from haystack.utils import Secret

generator = HuggingFaceAPIGenerator(api_type="inference_endpoints",
                                    api_params={"url": "<your-inference-endpoint-url>"},
                                    token=Secret.from_token("<your-api-key>"))

result = generator.run(prompt="What's Natural Language Processing?")
print(result)
```

#### With self-hosted text generation inference

```python
from haystack.components.generators import HuggingFaceAPIGenerator

generator = HuggingFaceAPIGenerator(api_type="text_generation_inference",
                                    api_params={"url": "http://localhost:8080"})

result = generator.run(prompt="What's Natural Language Processing?")
print(result)
```

#### With the free serverless inference API

Be aware that this example might not work, as the Hugging Face Inference API no longer offers models that support the
`text_generation` endpoint. Use the `HuggingFaceAPIChatGenerator` for generative models through the
`chat_completion` endpoint.

```python
from haystack.components.generators import HuggingFaceAPIGenerator
from haystack.utils import Secret

generator = HuggingFaceAPIGenerator(api_type="serverless_inference_api",
                                    api_params={"model": "HuggingFaceH4/zephyr-7b-beta"},
                                    token=Secret.from_token("<your-api-key>"))

result = generator.run(prompt="What's Natural Language Processing?")
print(result)
```
 390  
 391  <a id="hugging_face_api.HuggingFaceAPIGenerator.__init__"></a>
 392  
 393  #### HuggingFaceAPIGenerator.\_\_init\_\_
 394  
 395  ```python
 396  def __init__(api_type: Union[HFGenerationAPIType, str],
 397               api_params: dict[str, str],
 398               token: Optional[Secret] = Secret.from_env_var(
 399                   ["HF_API_TOKEN", "HF_TOKEN"], strict=False),
 400               generation_kwargs: Optional[dict[str, Any]] = None,
 401               stop_words: Optional[list[str]] = None,
 402               streaming_callback: Optional[StreamingCallbackT] = None)
 403  ```
 404  
 405  Initialize the HuggingFaceAPIGenerator instance.
 406  
 407  **Arguments**:
 408  
 409  - `api_type`: The type of Hugging Face API to use. Available types:
 410  - `text_generation_inference`: See [TGI](https://github.com/huggingface/text-generation-inference).
 411  - `inference_endpoints`: See [Inference Endpoints](https://huggingface.co/inference-endpoints).
 412  - `serverless_inference_api`: See [Serverless Inference API](https://huggingface.co/inference-api).
 413    This might no longer work due to changes in the models offered in the Hugging Face Inference API.
 414    Please use the `HuggingFaceAPIChatGenerator` component instead.
 415  - `api_params`: A dictionary with the following keys:
 416  - `model`: Hugging Face model ID. Required when `api_type` is `SERVERLESS_INFERENCE_API`.
 417  - `url`: URL of the inference endpoint. Required when `api_type` is `INFERENCE_ENDPOINTS` or
 418  `TEXT_GENERATION_INFERENCE`.
- Other parameters specific to the chosen API type, such as `timeout`, `headers`, and `provider`.
 420  - `token`: The Hugging Face token to use as HTTP bearer authorization.
 421  Check your HF token in your [account settings](https://huggingface.co/settings/tokens).
- `generation_kwargs`: A dictionary with keyword arguments to customize text generation. Some examples: `max_new_tokens`,
`temperature`, `top_k`, `top_p`.
For details, see the [Hugging Face documentation](https://huggingface.co/docs/huggingface_hub/en/package_reference/inference_client#huggingface_hub.InferenceClient.text_generation).
 426  - `stop_words`: An optional list of strings representing the stop words.
 427  - `streaming_callback`: An optional callable for handling streaming responses.
 428  
 429  <a id="hugging_face_api.HuggingFaceAPIGenerator.to_dict"></a>
 430  
 431  #### HuggingFaceAPIGenerator.to\_dict
 432  
 433  ```python
 434  def to_dict() -> dict[str, Any]
 435  ```
 436  
 437  Serialize this component to a dictionary.
 438  
 439  **Returns**:
 440  
 441  A dictionary containing the serialized component.
 442  
 443  <a id="hugging_face_api.HuggingFaceAPIGenerator.from_dict"></a>
 444  
 445  #### HuggingFaceAPIGenerator.from\_dict
 446  
 447  ```python
 448  @classmethod
 449  def from_dict(cls, data: dict[str, Any]) -> "HuggingFaceAPIGenerator"
 450  ```
 451  
 452  Deserialize this component from a dictionary.
 453  
 454  <a id="hugging_face_api.HuggingFaceAPIGenerator.run"></a>
 455  
 456  #### HuggingFaceAPIGenerator.run
 457  
 458  ```python
 459  @component.output_types(replies=list[str], meta=list[dict[str, Any]])
 460  def run(prompt: str,
 461          streaming_callback: Optional[StreamingCallbackT] = None,
 462          generation_kwargs: Optional[dict[str, Any]] = None)
 463  ```
 464  
 465  Invoke the text generation inference for the given prompt and generation parameters.
 466  
 467  **Arguments**:
 468  
 469  - `prompt`: A string representing the prompt.
 470  - `streaming_callback`: A callback function that is called when a new token is received from the stream.
 471  - `generation_kwargs`: Additional keyword arguments for text generation.
 472  
 473  **Returns**:
 474  
A dictionary with the generated replies and metadata. Both are lists of length n.
- replies: A list of strings representing the generated replies.
- meta: A list of dictionaries containing the metadata for each reply.
 477  
 478  <a id="openai"></a>
 479  
 480  # Module openai
 481  
 482  <a id="openai.OpenAIGenerator"></a>
 483  
 484  ## OpenAIGenerator
 485  
 486  Generates text using OpenAI's large language models (LLMs).
 487  
It works with gpt-4 and o-series models and supports streaming responses
from the OpenAI API. It uses strings as input and output.
 490  
 491  You can customize how the text is generated by passing parameters to the
 492  OpenAI API. Use the `**generation_kwargs` argument when you initialize
 493  the component or when you run it. Any parameter that works with
 494  `openai.ChatCompletion.create` will work here too.
 495  
 496  
 497  For details on OpenAI API parameters, see
 498  [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).
 499  
 500  ### Usage example
 501  
 502  ```python
 503  from haystack.components.generators import OpenAIGenerator
 504  client = OpenAIGenerator()
 505  response = client.run("What's Natural Language Processing? Be brief.")
 506  print(response)
 507  
 508  >> {'replies': ['Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on
 509  >> the interaction between computers and human language. It involves enabling computers to understand, interpret,
 510  >> and respond to natural human language in a way that is both meaningful and useful.'], 'meta': [{'model':
 511  >> 'gpt-4o-mini', 'index': 0, 'finish_reason': 'stop', 'usage': {'prompt_tokens': 16,
 512  >> 'completion_tokens': 49, 'total_tokens': 65}}]}
 513  ```
 514  
 515  <a id="openai.OpenAIGenerator.__init__"></a>
 516  
 517  #### OpenAIGenerator.\_\_init\_\_
 518  
 519  ```python
 520  def __init__(api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
 521               model: str = "gpt-4o-mini",
 522               streaming_callback: Optional[StreamingCallbackT] = None,
 523               api_base_url: Optional[str] = None,
 524               organization: Optional[str] = None,
 525               system_prompt: Optional[str] = None,
 526               generation_kwargs: Optional[dict[str, Any]] = None,
 527               timeout: Optional[float] = None,
 528               max_retries: Optional[int] = None,
 529               http_client_kwargs: Optional[dict[str, Any]] = None)
 530  ```
 531  
Creates an instance of OpenAIGenerator. Unless specified otherwise in `model`, uses OpenAI's gpt-4o-mini.

By setting the `OPENAI_TIMEOUT` and `OPENAI_MAX_RETRIES` environment variables, you can change the timeout
and max_retries parameters in the OpenAI client.
 536  
 537  **Arguments**:
 538  
 539  - `api_key`: The OpenAI API key to connect to OpenAI.
 540  - `model`: The name of the model to use.
 541  - `streaming_callback`: A callback function that is called when a new token is received from the stream.
 542  The callback function accepts StreamingChunk as an argument.
 543  - `api_base_url`: An optional base URL.
 544  - `organization`: The Organization ID, defaults to `None`.
 545  - `system_prompt`: The system prompt to use for text generation. If not provided, the system prompt is
 546  omitted, and the default system prompt of the model is used.
 547  - `generation_kwargs`: Other parameters to use for the model. These parameters are all sent directly to
 548  the OpenAI endpoint. See OpenAI [documentation](https://platform.openai.com/docs/api-reference/chat) for
 549  more details.
 550  Some of the supported parameters:
 551  - `max_tokens`: The maximum number of tokens the output text can have.
 552  - `temperature`: What sampling temperature to use. Higher values mean the model will take more risks.
 553      Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
 554  - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
 555      considers the results of the tokens with top_p probability mass. So, 0.1 means only the tokens
 556      comprising the top 10% probability mass are considered.
 557  - `n`: How many completions to generate for each prompt. For example, if the LLM gets 3 prompts and n is 2,
 558      it will generate two completions for each of the three prompts, ending up with 6 completions in total.
 559  - `stop`: One or more sequences after which the LLM should stop generating tokens.
- `presence_penalty`: The penalty applied if a token is already present.
    Higher values make the model less likely to repeat the token.
- `frequency_penalty`: The penalty applied if a token has already been generated.
    Higher values make the model less likely to repeat the token.
 564  - `logit_bias`: Add a logit bias to specific tokens. The keys of the dictionary are tokens, and the
 565      values are the bias to add to that token.
 566  - `timeout`: Timeout for OpenAI Client calls, if not set it is inferred from the `OPENAI_TIMEOUT` environment variable
 567  or set to 30.
 568  - `max_retries`: Maximum retries to establish contact with OpenAI if it returns an internal error, if not set it is inferred
 569  from the `OPENAI_MAX_RETRIES` environment variable or set to 5.
- `http_client_kwargs`: A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/).
 572  
 573  <a id="openai.OpenAIGenerator.to_dict"></a>
 574  
 575  #### OpenAIGenerator.to\_dict
 576  
 577  ```python
 578  def to_dict() -> dict[str, Any]
 579  ```
 580  
 581  Serialize this component to a dictionary.
 582  
 583  **Returns**:
 584  
 585  The serialized component as a dictionary.
 586  
 587  <a id="openai.OpenAIGenerator.from_dict"></a>
 588  
 589  #### OpenAIGenerator.from\_dict
 590  
 591  ```python
 592  @classmethod
 593  def from_dict(cls, data: dict[str, Any]) -> "OpenAIGenerator"
 594  ```
 595  
 596  Deserialize this component from a dictionary.
 597  
 598  **Arguments**:
 599  
 600  - `data`: The dictionary representation of this component.
 601  
 602  **Returns**:
 603  
 604  The deserialized component instance.
 605  
 606  <a id="openai.OpenAIGenerator.run"></a>
 607  
 608  #### OpenAIGenerator.run
 609  
 610  ```python
 611  @component.output_types(replies=list[str], meta=list[dict[str, Any]])
 612  def run(prompt: str,
 613          system_prompt: Optional[str] = None,
 614          streaming_callback: Optional[StreamingCallbackT] = None,
 615          generation_kwargs: Optional[dict[str, Any]] = None)
 616  ```
 617  
 618  Invoke the text generation inference based on the provided messages and generation parameters.
 619  
 620  **Arguments**:
 621  
 622  - `prompt`: The string prompt to use for text generation.
- `system_prompt`: The system prompt to use for text generation. If omitted at run time, the system
prompt defined at initialization time, if any, is used.
 625  - `streaming_callback`: A callback function that is called when a new token is received from the stream.
 626  - `generation_kwargs`: Additional keyword arguments for text generation. These parameters will potentially override the parameters
 627  passed in the `__init__` method. For more details on the parameters supported by the OpenAI API, refer to
 628  the OpenAI [documentation](https://platform.openai.com/docs/api-reference/chat/create).
 629  
 630  **Returns**:
 631  
 632  A list of strings containing the generated responses and a list of dictionaries containing the metadata
 633  for each response.
 634  
 635  <a id="openai_dalle"></a>
 636  
 637  # Module openai\_dalle
 638  
 639  <a id="openai_dalle.DALLEImageGenerator"></a>
 640  
 641  ## DALLEImageGenerator
 642  
 643  Generates images using OpenAI's DALL-E model.
 644  
 645  For details on OpenAI API parameters, see
 646  [OpenAI documentation](https://platform.openai.com/docs/api-reference/images/create).
 647  
 648  ### Usage example
 649  
 650  ```python
 651  from haystack.components.generators import DALLEImageGenerator
 652  image_generator = DALLEImageGenerator()
 653  response = image_generator.run("Show me a picture of a black cat.")
 654  print(response)
 655  ```
 656  
 657  <a id="openai_dalle.DALLEImageGenerator.__init__"></a>
 658  
 659  #### DALLEImageGenerator.\_\_init\_\_
 660  
 661  ```python
 662  def __init__(model: str = "dall-e-3",
 663               quality: Literal["standard", "hd"] = "standard",
 664               size: Literal["256x256", "512x512", "1024x1024", "1792x1024",
 665                             "1024x1792"] = "1024x1024",
 666               response_format: Literal["url", "b64_json"] = "url",
 667               api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
 668               api_base_url: Optional[str] = None,
 669               organization: Optional[str] = None,
 670               timeout: Optional[float] = None,
 671               max_retries: Optional[int] = None,
 672               http_client_kwargs: Optional[dict[str, Any]] = None)
 673  ```
 674  
 675  Creates an instance of DALLEImageGenerator. Unless specified otherwise in `model`, uses OpenAI's dall-e-3.
 676  
 677  **Arguments**:
 678  
 679  - `model`: The model to use for image generation. Can be "dall-e-2" or "dall-e-3".
 680  - `quality`: The quality of the generated image. Can be "standard" or "hd".
 681  - `size`: The size of the generated images.
 682  Must be one of 256x256, 512x512, or 1024x1024 for dall-e-2.
 683  Must be one of 1024x1024, 1792x1024, or 1024x1792 for dall-e-3 models.
 684  - `response_format`: The format of the response. Can be "url" or "b64_json".
 685  - `api_key`: The OpenAI API key to connect to OpenAI.
 686  - `api_base_url`: An optional base URL.
 687  - `organization`: The Organization ID, defaults to `None`.
 688  - `timeout`: Timeout for OpenAI Client calls. If not set, it is inferred from the `OPENAI_TIMEOUT` environment variable
 689  or set to 30.
 690  - `max_retries`: Maximum retries to establish contact with OpenAI if it returns an internal error. If not set, it is inferred
 691  from the `OPENAI_MAX_RETRIES` environment variable or set to 5.
- `http_client_kwargs`: A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/).
 694  
 695  <a id="openai_dalle.DALLEImageGenerator.warm_up"></a>
 696  
 697  #### DALLEImageGenerator.warm\_up
 698  
 699  ```python
 700  def warm_up() -> None
 701  ```
 702  
 703  Warm up the OpenAI client.
 704  
 705  <a id="openai_dalle.DALLEImageGenerator.run"></a>
 706  
 707  #### DALLEImageGenerator.run
 708  
 709  ```python
 710  @component.output_types(images=list[str], revised_prompt=str)
 711  def run(prompt: str,
 712          size: Optional[Literal["256x256", "512x512", "1024x1024", "1792x1024",
 713                                 "1024x1792"]] = None,
 714          quality: Optional[Literal["standard", "hd"]] = None,
        response_format: Optional[Literal["url", "b64_json"]] = None)
 717  ```
 718  
 719  Invokes the image generation inference based on the provided prompt and generation parameters.
 720  
 721  **Arguments**:
 722  
 723  - `prompt`: The prompt to generate the image.
 724  - `size`: If provided, overrides the size provided during initialization.
 725  - `quality`: If provided, overrides the quality provided during initialization.
 726  - `response_format`: If provided, overrides the response format provided during initialization.
 727  
 728  **Returns**:
 729  
 730  A dictionary containing the generated list of images and the revised prompt.
 731  Depending on the `response_format` parameter, the list of images can be URLs or base64 encoded JSON strings.
 732  The revised prompt is the prompt that was used to generate the image, if there was any revision
 733  to the prompt made by OpenAI.
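
When `response_format` is `"b64_json"`, each entry in `images` is a base64-encoded string that you can decode into raw image bytes. A minimal sketch of consuming such a response (the payload below is fabricated, not real image data):

```python
import base64

# Fabricated stand-in for response["images"][0] when response_format="b64_json";
# a real response contains actual image bytes.
fake_b64 = base64.b64encode(b"not-a-real-image").decode("ascii")
response = {"images": [fake_b64], "revised_prompt": "A black cat on a windowsill"}

for b64_image in response["images"]:
    image_bytes = base64.b64decode(b64_image)
    print(f"decoded {len(image_bytes)} bytes")
```

With `response_format="url"` (the default), the entries are plain URLs and can be downloaded directly instead.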
 734  
 735  <a id="openai_dalle.DALLEImageGenerator.to_dict"></a>
 736  
 737  #### DALLEImageGenerator.to\_dict
 738  
 739  ```python
 740  def to_dict() -> dict[str, Any]
 741  ```
 742  
 743  Serialize this component to a dictionary.
 744  
 745  **Returns**:
 746  
 747  The serialized component as a dictionary.
 748  
 749  <a id="openai_dalle.DALLEImageGenerator.from_dict"></a>
 750  
 751  #### DALLEImageGenerator.from\_dict
 752  
 753  ```python
 754  @classmethod
 755  def from_dict(cls, data: dict[str, Any]) -> "DALLEImageGenerator"
 756  ```
 757  
 758  Deserialize this component from a dictionary.
 759  
 760  **Arguments**:
 761  
 762  - `data`: The dictionary representation of this component.
 763  
 764  **Returns**:
 765  
 766  The deserialized component instance.
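
The `to_dict`/`from_dict` pair follows Haystack's generic component-serialization pattern. The toy class below illustrates the round trip only; the real implementation also records the component's import path and serializes `Secret` fields specially:

```python
from typing import Any

# Toy class illustrating the to_dict/from_dict round trip; not the real
# DALLEImageGenerator implementation.
class ToyGenerator:
    def __init__(self, model: str = "dall-e-3", quality: str = "standard"):
        self.model = model
        self.quality = quality

    def to_dict(self) -> dict[str, Any]:
        # Store the init parameters so the object can be rebuilt later.
        return {"init_parameters": {"model": self.model, "quality": self.quality}}

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "ToyGenerator":
        return cls(**data["init_parameters"])

original = ToyGenerator(model="dall-e-2", quality="hd")
restored = ToyGenerator.from_dict(original.to_dict())
```
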
 767  
 768  <a id="chat/azure"></a>
 769  
 770  # Module chat/azure
 771  
 772  <a id="chat/azure.AzureOpenAIChatGenerator"></a>
 773  
 774  ## AzureOpenAIChatGenerator
 775  
 776  Generates text using OpenAI's models on Azure.
 777  
It works with gpt-4 type models and supports streaming responses
from the OpenAI API. It uses the [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
format for input and output.
 781  
 782  You can customize how the text is generated by passing parameters to the
 783  OpenAI API. Use the `**generation_kwargs` argument when you initialize
 784  the component or when you run it. Any parameter that works with
 785  `openai.ChatCompletion.create` will work here too.
 786  
 787  For details on OpenAI API parameters, see
 788  [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).
 789  
 790  ### Usage example
 791  
 792  ```python
 793  from haystack.components.generators.chat import AzureOpenAIChatGenerator
 794  from haystack.dataclasses import ChatMessage
 795  from haystack.utils import Secret
 796  
 797  messages = [ChatMessage.from_user("What's Natural Language Processing?")]
 798  
 799  client = AzureOpenAIChatGenerator(
    azure_endpoint="<Your Azure endpoint, e.g. https://your-company.azure.openai.com/>",
    api_key=Secret.from_token("<your-api-key>"),
    azure_deployment="<your deployment name, e.g. gpt-4o-mini>")
 803  response = client.run(messages)
 804  print(response)
 805  ```
 806  
 807  ```
 808  {'replies':
 809      [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=[TextContent(text=
 810      "Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on
 811       enabling computers to understand, interpret, and generate human language in a way that is useful.")],
 812       _name=None,
 813       _meta={'model': 'gpt-4o-mini', 'index': 0, 'finish_reason': 'stop',
 814       'usage': {'prompt_tokens': 15, 'completion_tokens': 36, 'total_tokens': 51}})]
 815  }
 816  ```
 817  
 818  <a id="chat/azure.AzureOpenAIChatGenerator.__init__"></a>
 819  
 820  #### AzureOpenAIChatGenerator.\_\_init\_\_
 821  
 822  ```python
 823  def __init__(azure_endpoint: Optional[str] = None,
 824               api_version: Optional[str] = "2023-05-15",
 825               azure_deployment: Optional[str] = "gpt-4o-mini",
 826               api_key: Optional[Secret] = Secret.from_env_var(
 827                   "AZURE_OPENAI_API_KEY", strict=False),
 828               azure_ad_token: Optional[Secret] = Secret.from_env_var(
 829                   "AZURE_OPENAI_AD_TOKEN", strict=False),
 830               organization: Optional[str] = None,
 831               streaming_callback: Optional[StreamingCallbackT] = None,
 832               timeout: Optional[float] = None,
 833               max_retries: Optional[int] = None,
 834               generation_kwargs: Optional[dict[str, Any]] = None,
 835               default_headers: Optional[dict[str, str]] = None,
 836               tools: Optional[Union[list[Tool], Toolset]] = None,
 837               tools_strict: bool = False,
 838               *,
 839               azure_ad_token_provider: Optional[Union[
 840                   AzureADTokenProvider, AsyncAzureADTokenProvider]] = None,
 841               http_client_kwargs: Optional[dict[str, Any]] = None)
 842  ```
 843  
 844  Initialize the Azure OpenAI Chat Generator component.
 845  
 846  **Arguments**:
 847  
 848  - `azure_endpoint`: The endpoint of the deployed model, for example `"https://example-resource.azure.openai.com/"`.
 849  - `api_version`: The version of the API to use. Defaults to 2023-05-15.
 850  - `azure_deployment`: The deployment of the model, usually the model name.
 851  - `api_key`: The API key to use for authentication.
 852  - `azure_ad_token`: [Azure Active Directory token](https://www.microsoft.com/en-us/security/business/identity-access/microsoft-entra-id).
 853  - `organization`: Your organization ID, defaults to `None`. For help, see
 854  [Setting up your organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
 855  - `streaming_callback`: A callback function called when a new token is received from the stream.
 856  It accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
 857  as an argument.
 858  - `timeout`: Timeout for OpenAI client calls. If not set, it defaults to either the
 859  `OPENAI_TIMEOUT` environment variable, or 30 seconds.
- `max_retries`: Maximum number of retries to contact OpenAI after an internal error.
If not set, it defaults to either the `OPENAI_MAX_RETRIES` environment variable, or 5.
 862  - `generation_kwargs`: Other parameters to use for the model. These parameters are sent directly to
 863  the OpenAI endpoint. For details, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).
 864  Some of the supported parameters:
 865  - `max_tokens`: The maximum number of tokens the output text can have.
 866  - `temperature`: The sampling temperature to use. Higher values mean the model takes more risks.
 867      Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
 868  - `top_p`: Nucleus sampling is an alternative to sampling with temperature, where the model considers
 869      tokens with a top_p probability mass. For example, 0.1 means only the tokens comprising
 870      the top 10% probability mass are considered.
 871  - `n`: The number of completions to generate for each prompt. For example, with 3 prompts and n=2,
 872      the LLM will generate two completions per prompt, resulting in 6 completions total.
 873  - `stop`: One or more sequences after which the LLM should stop generating tokens.
 874  - `presence_penalty`: The penalty applied if a token is already present.
 875      Higher values make the model less likely to repeat the token.
 876  - `frequency_penalty`: Penalty applied if a token has already been generated.
 877      Higher values make the model less likely to repeat the token.
 878  - `logit_bias`: Adds a logit bias to specific tokens. The keys of the dictionary are tokens, and the
 879      values are the bias to add to that token.
 880  - `response_format`: A JSON schema or a Pydantic model that enforces the structure of the model's response.
 881      If provided, the output will always be validated against this
 882      format (unless the model returns a tool call).
 883      For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
 884      Notes:
    - This parameter accepts Pydantic models and JSON schemas for the latest models, starting from GPT-4o.
      Older models only support a basic version of structured outputs through `{"type": "json_object"}`.
 887        For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
 888      - For structured outputs with streaming,
 889        the `response_format` must be a JSON schema and not a Pydantic model.
 890  - `default_headers`: Default headers to use for the AzureOpenAI client.
 891  - `tools`: A list of tools or a Toolset for which the model can prepare calls. This parameter can accept either a
 892  list of `Tool` objects or a `Toolset` instance.
 893  - `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
 894  the schema provided in the `parameters` field of the tool definition, but this may increase latency.
 895  - `azure_ad_token_provider`: A function that returns an Azure Active Directory token, will be invoked on
 896  every request.
- `http_client_kwargs`: A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/).
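
The `streaming_callback` receives each chunk of the streamed response as it arrives. The sketch below shows the calling pattern with a minimal stand-in for `StreamingChunk` (the real class lives in `haystack.dataclasses` and carries the incremental text in its `content` attribute; only that attribute is modeled here):

```python
from dataclasses import dataclass

# Minimal stand-in for haystack's StreamingChunk; illustrative only.
@dataclass
class StreamingChunk:
    content: str

collected = []

def collecting_callback(chunk: StreamingChunk) -> None:
    # Print tokens as they arrive and keep them for later inspection.
    print(chunk.content, end="", flush=True)
    collected.append(chunk.content)

# Simulate a stream of three chunks.
for piece in ["Natural ", "Language ", "Processing"]:
    collecting_callback(StreamingChunk(content=piece))

full_text = "".join(collected)
```

In real use, you would pass such a function as `streaming_callback=collecting_callback` when initializing or running the generator.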
 899  
 900  <a id="chat/azure.AzureOpenAIChatGenerator.to_dict"></a>
 901  
 902  #### AzureOpenAIChatGenerator.to\_dict
 903  
 904  ```python
 905  def to_dict() -> dict[str, Any]
 906  ```
 907  
 908  Serialize this component to a dictionary.
 909  
 910  **Returns**:
 911  
 912  The serialized component as a dictionary.
 913  
 914  <a id="chat/azure.AzureOpenAIChatGenerator.from_dict"></a>
 915  
 916  #### AzureOpenAIChatGenerator.from\_dict
 917  
 918  ```python
 919  @classmethod
 920  def from_dict(cls, data: dict[str, Any]) -> "AzureOpenAIChatGenerator"
 921  ```
 922  
 923  Deserialize this component from a dictionary.
 924  
 925  **Arguments**:
 926  
 927  - `data`: The dictionary representation of this component.
 928  
 929  **Returns**:
 930  
 931  The deserialized component instance.
 932  
 933  <a id="chat/azure.AzureOpenAIChatGenerator.run"></a>
 934  
 935  #### AzureOpenAIChatGenerator.run
 936  
 937  ```python
 938  @component.output_types(replies=list[ChatMessage])
 939  def run(messages: list[ChatMessage],
 940          streaming_callback: Optional[StreamingCallbackT] = None,
 941          generation_kwargs: Optional[dict[str, Any]] = None,
 942          *,
 943          tools: Optional[Union[list[Tool], Toolset]] = None,
 944          tools_strict: Optional[bool] = None)
 945  ```
 946  
 947  Invokes chat completion based on the provided messages and generation parameters.
 948  
 949  **Arguments**:
 950  
 951  - `messages`: A list of ChatMessage instances representing the input messages.
 952  - `streaming_callback`: A callback function that is called when a new token is received from the stream.
 953  - `generation_kwargs`: Additional keyword arguments for text generation. These parameters will
 954  override the parameters passed during component initialization.
 955  For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat/create).
 956  - `tools`: A list of tools or a Toolset for which the model can prepare calls. If set, it will override the
 957  `tools` parameter set during component initialization. This parameter can accept either a list of
 958  `Tool` objects or a `Toolset` instance.
 959  - `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
 960  the schema provided in the `parameters` field of the tool definition, but this may increase latency.
 961  If set, it will override the `tools_strict` parameter set during component initialization.
 962  
 963  **Returns**:
 964  
 965  A dictionary with the following key:
 966  - `replies`: A list containing the generated responses as ChatMessage instances.
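
As noted for `generation_kwargs`, parameters passed to `run` override those set at initialization. The precedence amounts to a dictionary merge; the snippet below is a sketch of the override rule, not the component's actual code:

```python
# Parameters set when the component was initialized.
init_kwargs = {"temperature": 0.7, "max_tokens": 256}
# Parameters passed to run(); these take precedence.
run_kwargs = {"temperature": 0.0}

effective = {**init_kwargs, **run_kwargs}
print(effective)
```
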
 967  
 968  <a id="chat/azure.AzureOpenAIChatGenerator.run_async"></a>
 969  
 970  #### AzureOpenAIChatGenerator.run\_async
 971  
 972  ```python
 973  @component.output_types(replies=list[ChatMessage])
 974  async def run_async(messages: list[ChatMessage],
 975                      streaming_callback: Optional[StreamingCallbackT] = None,
 976                      generation_kwargs: Optional[dict[str, Any]] = None,
 977                      *,
 978                      tools: Optional[Union[list[Tool], Toolset]] = None,
 979                      tools_strict: Optional[bool] = None)
 980  ```
 981  
 982  Asynchronously invokes chat completion based on the provided messages and generation parameters.
 983  
 984  This is the asynchronous version of the `run` method. It has the same parameters and return values
 985  but can be used with `await` in async code.
 986  
 987  **Arguments**:
 988  
 989  - `messages`: A list of ChatMessage instances representing the input messages.
 990  - `streaming_callback`: A callback function that is called when a new token is received from the stream.
 991  Must be a coroutine.
 992  - `generation_kwargs`: Additional keyword arguments for text generation. These parameters will
 993  override the parameters passed during component initialization.
 994  For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat/create).
 995  - `tools`: A list of tools or a Toolset for which the model can prepare calls. If set, it will override the
 996  `tools` parameter set during component initialization. This parameter can accept either a list of
 997  `Tool` objects or a `Toolset` instance.
 998  - `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
 999  the schema provided in the `parameters` field of the tool definition, but this may increase latency.
1000  If set, it will override the `tools_strict` parameter set during component initialization.
1001  
1002  **Returns**:
1003  
1004  A dictionary with the following key:
1005  - `replies`: A list containing the generated responses as ChatMessage instances.
1006  
1007  <a id="chat/hugging_face_local"></a>
1008  
1009  # Module chat/hugging\_face\_local
1010  
1011  <a id="chat/hugging_face_local.default_tool_parser"></a>
1012  
1013  #### default\_tool\_parser
1014  
1015  ```python
1016  def default_tool_parser(text: str) -> Optional[list[ToolCall]]
1017  ```
1018  
1019  Default implementation for parsing tool calls from model output text.
1020  
1021  Uses DEFAULT_TOOL_PATTERN to extract tool calls.
1022  
1023  **Arguments**:
1024  
1025  - `text`: The text to parse for tool calls.
1026  
1027  **Returns**:
1028  
1029  A list containing a single ToolCall if a valid tool call is found, None otherwise.
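
A custom `tool_parsing_function` follows the same contract: it takes the generated text and returns a list of tool calls or `None`. The sketch below uses plain dicts in place of `ToolCall` objects, and the `<tool_call>` tag format is an illustrative assumption, not the actual `DEFAULT_TOOL_PATTERN`:

```python
import json
import re

# Illustrative tag format; the real DEFAULT_TOOL_PATTERN may differ.
TOOL_PATTERN = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def parse_tool_call(text):
    """Return a list with one parsed tool call, or None if parsing fails."""
    match = TOOL_PATTERN.search(text)
    if not match:
        return None
    try:
        payload = json.loads(match.group(1))
    except json.JSONDecodeError:
        return None
    if "name" not in payload:
        return None
    # A real implementation would build a haystack ToolCall here.
    return [{"tool_name": payload["name"], "arguments": payload.get("arguments", {})}]

print(parse_tool_call('<tool_call>{"name": "get_weather", "arguments": {"city": "Paris"}}</tool_call>'))
```

Pass such a function as `tool_parsing_function` when initializing `HuggingFaceLocalChatGenerator` to replace the default parser.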
1030  
1031  <a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator"></a>
1032  
1033  ## HuggingFaceLocalChatGenerator
1034  
1035  Generates chat responses using models from Hugging Face that run locally.
1036  
1037  Use this component with chat-based models,
1038  such as `HuggingFaceH4/zephyr-7b-beta` or `meta-llama/Llama-2-7b-chat-hf`.
1039  LLMs running locally may need powerful hardware.
1040  
1041  ### Usage example
1042  
1043  ```python
1044  from haystack.components.generators.chat import HuggingFaceLocalChatGenerator
1045  from haystack.dataclasses import ChatMessage
1046  
1047  generator = HuggingFaceLocalChatGenerator(model="HuggingFaceH4/zephyr-7b-beta")
1048  generator.warm_up()
1049  messages = [ChatMessage.from_user("What's Natural Language Processing? Be brief.")]
1050  print(generator.run(messages))
1051  ```
1052  
1053  ```
1054  {'replies':
1055      [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=[TextContent(text=
1056      "Natural Language Processing (NLP) is a subfield of artificial intelligence that deals
1057      with the interaction between computers and human language. It enables computers to understand, interpret, and
1058      generate human language in a valuable way. NLP involves various techniques such as speech recognition, text
1059      analysis, sentiment analysis, and machine translation. The ultimate goal is to make it easier for computers to
1060      process and derive meaning from human language, improving communication between humans and machines.")],
1061      _name=None,
1062      _meta={'finish_reason': 'stop', 'index': 0, 'model':
1063            'mistralai/Mistral-7B-Instruct-v0.2',
1064            'usage': {'completion_tokens': 90, 'prompt_tokens': 19, 'total_tokens': 109}})
1065            ]
1066  }
1067  ```
1068  
1069  <a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator.__init__"></a>
1070  
1071  #### HuggingFaceLocalChatGenerator.\_\_init\_\_
1072  
1073  ```python
1074  def __init__(model: str = "HuggingFaceH4/zephyr-7b-beta",
1075               task: Optional[Literal["text-generation",
1076                                      "text2text-generation"]] = None,
1077               device: Optional[ComponentDevice] = None,
1078               token: Optional[Secret] = Secret.from_env_var(
1079                   ["HF_API_TOKEN", "HF_TOKEN"], strict=False),
1080               chat_template: Optional[str] = None,
1081               generation_kwargs: Optional[dict[str, Any]] = None,
1082               huggingface_pipeline_kwargs: Optional[dict[str, Any]] = None,
1083               stop_words: Optional[list[str]] = None,
1084               streaming_callback: Optional[StreamingCallbackT] = None,
1085               tools: Optional[Union[list[Tool], Toolset]] = None,
1086               tool_parsing_function: Optional[Callable[
1087                   [str], Optional[list[ToolCall]]]] = None,
1088               async_executor: Optional[ThreadPoolExecutor] = None) -> None
1089  ```
1090  
1091  Initializes the HuggingFaceLocalChatGenerator component.
1092  
1093  **Arguments**:
1094  
1095  - `model`: The Hugging Face text generation model name or path,
1096  for example, `mistralai/Mistral-7B-Instruct-v0.2` or `TheBloke/OpenHermes-2.5-Mistral-7B-16k-AWQ`.
1097  The model must be a chat model supporting the ChatML messaging
1098  format.
1099  If the model is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
1100  - `task`: The task for the Hugging Face pipeline. Possible options:
1101  - `text-generation`: Supported by decoder models, like GPT.
1102  - `text2text-generation`: Supported by encoder-decoder models, like T5.
1103  If the task is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
1104  If not specified, the component calls the Hugging Face API to infer the task from the model name.
1105  - `device`: The device for loading the model. If `None`, automatically selects the default device.
1106  If a device or device map is specified in `huggingface_pipeline_kwargs`, it overrides this parameter.
1107  - `token`: The token to use as HTTP bearer authorization for remote files.
1108  If the token is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
1109  - `chat_template`: Specifies an optional Jinja template for formatting chat
1110  messages. Most high-quality chat models have their own templates, but for models without this
1111  feature or if you prefer a custom template, use this parameter.
1112  - `generation_kwargs`: A dictionary with keyword arguments to customize text generation.
1113  Some examples: `max_length`, `max_new_tokens`, `temperature`, `top_k`, `top_p`.
1114  See Hugging Face's documentation for more information:
- [customize-text-generation](https://huggingface.co/docs/transformers/main/en/generation_strategies#customize-text-generation)
- [GenerationConfig](https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.GenerationConfig)
1117  The only `generation_kwargs` set by default is `max_new_tokens`, which is set to 512 tokens.
1118  - `huggingface_pipeline_kwargs`: Dictionary with keyword arguments to initialize the
1119  Hugging Face pipeline for text generation.
1120  These keyword arguments provide fine-grained control over the Hugging Face pipeline.
1121  In case of duplication, these kwargs override `model`, `task`, `device`, and `token` init parameters.
1122  For kwargs, see [Hugging Face documentation](https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.pipeline.task).
1123  In this dictionary, you can also include `model_kwargs` to specify the kwargs for [model initialization](https://huggingface.co/docs/transformers/en/main_classes/model#transformers.PreTrainedModel.from_pretrained)
1124  - `stop_words`: A list of stop words. If the model generates a stop word, the generation stops.
1125  If you provide this parameter, don't specify the `stopping_criteria` in `generation_kwargs`.
1126  For some chat models, the output includes both the new text and the original prompt.
1127  In these cases, make sure your prompt has no stop words.
1128  - `streaming_callback`: An optional callable for handling streaming responses.
1129  - `tools`: A list of tools or a Toolset for which the model can prepare calls.
1130  This parameter can accept either a list of `Tool` objects or a `Toolset` instance.
1131  - `tool_parsing_function`: A callable that takes a string and returns a list of ToolCall objects or None.
1132  If None, the default_tool_parser will be used which extracts tool calls using a predefined pattern.
- `async_executor`: An optional ThreadPoolExecutor to use for async calls. If not provided, a single-threaded
executor is initialized and used.
1135  
1136  <a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator.__del__"></a>
1137  
1138  #### HuggingFaceLocalChatGenerator.\_\_del\_\_
1139  
1140  ```python
1141  def __del__() -> None
1142  ```
1143  
1144  Cleanup when the instance is being destroyed.
1145  
1146  <a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator.shutdown"></a>
1147  
1148  #### HuggingFaceLocalChatGenerator.shutdown
1149  
1150  ```python
1151  def shutdown() -> None
1152  ```
1153  
Explicitly shut down the executor if this component owns it.
1155  
1156  <a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator.warm_up"></a>
1157  
1158  #### HuggingFaceLocalChatGenerator.warm\_up
1159  
1160  ```python
1161  def warm_up() -> None
1162  ```
1163  
1164  Initializes the component.
1165  
1166  <a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator.to_dict"></a>
1167  
1168  #### HuggingFaceLocalChatGenerator.to\_dict
1169  
1170  ```python
1171  def to_dict() -> dict[str, Any]
1172  ```
1173  
1174  Serializes the component to a dictionary.
1175  
1176  **Returns**:
1177  
1178  Dictionary with serialized data.
1179  
1180  <a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator.from_dict"></a>
1181  
1182  #### HuggingFaceLocalChatGenerator.from\_dict
1183  
1184  ```python
1185  @classmethod
1186  def from_dict(cls, data: dict[str, Any]) -> "HuggingFaceLocalChatGenerator"
1187  ```
1188  
1189  Deserializes the component from a dictionary.
1190  
1191  **Arguments**:
1192  
1193  - `data`: The dictionary to deserialize from.
1194  
1195  **Returns**:
1196  
1197  The deserialized component.
1198  
1199  <a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator.run"></a>
1200  
1201  #### HuggingFaceLocalChatGenerator.run
1202  
1203  ```python
1204  @component.output_types(replies=list[ChatMessage])
1205  def run(
1206      messages: list[ChatMessage],
1207      generation_kwargs: Optional[dict[str, Any]] = None,
1208      streaming_callback: Optional[StreamingCallbackT] = None,
1209      tools: Optional[Union[list[Tool], Toolset]] = None
1210  ) -> dict[str, list[ChatMessage]]
1211  ```
1212  
1213  Invoke text generation inference based on the provided messages and generation parameters.
1214  
1215  **Arguments**:
1216  
1217  - `messages`: A list of ChatMessage objects representing the input messages.
1218  - `generation_kwargs`: Additional keyword arguments for text generation.
1219  - `streaming_callback`: An optional callable for handling streaming responses.
1220  - `tools`: A list of tools or a Toolset for which the model can prepare calls. If set, it will override
1221  the `tools` parameter provided during initialization. This parameter can accept either a list
1222  of `Tool` objects or a `Toolset` instance.
1223  
1224  **Returns**:
1225  
1226  A dictionary with the following keys:
1227  - `replies`: A list containing the generated responses as ChatMessage instances.
1228  
1229  <a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator.create_message"></a>
1230  
1231  #### HuggingFaceLocalChatGenerator.create\_message
1232  
1233  ```python
1234  def create_message(text: str,
1235                     index: int,
1236                     tokenizer: Union["PreTrainedTokenizer",
1237                                      "PreTrainedTokenizerFast"],
1238                     prompt: str,
1239                     generation_kwargs: dict[str, Any],
1240                     parse_tool_calls: bool = False) -> ChatMessage
1241  ```
1242  
1243  Create a ChatMessage instance from the provided text, populated with metadata.
1244  
1245  **Arguments**:
1246  
1247  - `text`: The generated text.
1248  - `index`: The index of the generated text.
1249  - `tokenizer`: The tokenizer used for generation.
1250  - `prompt`: The prompt used for generation.
1251  - `generation_kwargs`: The generation parameters.
1252  - `parse_tool_calls`: Whether to attempt parsing tool calls from the text.
1253  
1254  **Returns**:
1255  
1256  A ChatMessage instance.
1257  
1258  <a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator.run_async"></a>
1259  
1260  #### HuggingFaceLocalChatGenerator.run\_async
1261  
1262  ```python
1263  @component.output_types(replies=list[ChatMessage])
1264  async def run_async(
1265      messages: list[ChatMessage],
1266      generation_kwargs: Optional[dict[str, Any]] = None,
1267      streaming_callback: Optional[StreamingCallbackT] = None,
1268      tools: Optional[Union[list[Tool], Toolset]] = None
1269  ) -> dict[str, list[ChatMessage]]
1270  ```
1271  
1272  Asynchronously invokes text generation inference based on the provided messages and generation parameters.
1273  
This is the asynchronous version of the `run` method. It has the same parameters
and return values but can be used with `await` in async code.
1276  
1277  **Arguments**:
1278  
1279  - `messages`: A list of ChatMessage objects representing the input messages.
1280  - `generation_kwargs`: Additional keyword arguments for text generation.
1281  - `streaming_callback`: An optional callable for handling streaming responses.
1282  - `tools`: A list of tools or a Toolset for which the model can prepare calls.
1283  This parameter can accept either a list of `Tool` objects or a `Toolset` instance.
1284  
1285  **Returns**:
1286  
1287  A dictionary with the following keys:
1288  - `replies`: A list containing the generated responses as ChatMessage instances.
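
The calling pattern for `run_async` is the same as for any coroutine. The sketch below uses a stand-in class instead of the real component so it runs without loading a model; the echo behavior is purely illustrative:

```python
import asyncio

# Stand-in for HuggingFaceLocalChatGenerator, showing only the await pattern;
# the real component returns {"replies": [ChatMessage, ...]}.
class FakeGenerator:
    async def run_async(self, messages):
        await asyncio.sleep(0)  # simulate non-blocking inference
        return {"replies": [f"echo: {messages[-1]}"]}

async def main():
    generator = FakeGenerator()
    return await generator.run_async(["What's NLP?"])

result = asyncio.run(main())
print(result["replies"][0])
```
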
1289  
1290  <a id="chat/hugging_face_api"></a>
1291  
1292  # Module chat/hugging\_face\_api
1293  
1294  <a id="chat/hugging_face_api.HuggingFaceAPIChatGenerator"></a>
1295  
1296  ## HuggingFaceAPIChatGenerator
1297  
1298  Completes chats using Hugging Face APIs.
1299  
1300  HuggingFaceAPIChatGenerator uses the [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
1301  format for input and output. Use it to generate text with Hugging Face APIs:
1302  - [Serverless Inference API (Inference Providers)](https://huggingface.co/docs/inference-providers)
1303  - [Paid Inference Endpoints](https://huggingface.co/inference-endpoints)
1304  - [Self-hosted Text Generation Inference](https://github.com/huggingface/text-generation-inference)
1305  
1306  ### Usage examples
1307  
1308  #### With the serverless inference API (Inference Providers) - free tier available
1309  
1310  ```python
1311  from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
1312  from haystack.dataclasses import ChatMessage
1313  from haystack.utils import Secret
1314  from haystack.utils.hf import HFGenerationAPIType
1315  
1316  messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
1317              ChatMessage.from_user("What's Natural Language Processing?")]
1318  
1319  # the api_type can be expressed using the HFGenerationAPIType enum or as a string
1320  api_type = HFGenerationAPIType.SERVERLESS_INFERENCE_API
1321  api_type = "serverless_inference_api" # this is equivalent to the above
1322  
1323  generator = HuggingFaceAPIChatGenerator(api_type=api_type,
1324                                          api_params={"model": "Qwen/Qwen2.5-7B-Instruct",
1325                                                      "provider": "together"},
1326                                          token=Secret.from_token("<your-api-key>"))
1327  
1328  result = generator.run(messages)
1329  print(result)
1330  ```
1331  
1332  #### With the serverless inference API (Inference Providers) and text+image input
1333  
1334  ```python
1335  from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
1336  from haystack.dataclasses import ChatMessage, ImageContent
1337  from haystack.utils import Secret
1338  from haystack.utils.hf import HFGenerationAPIType
1339  
1340  # Create an image from file path, URL, or base64
1341  image = ImageContent.from_file_path("path/to/your/image.jpg")
1342  
1343  # Create a multimodal message with both text and image
1344  messages = [ChatMessage.from_user(content_parts=["Describe this image in detail", image])]
1345  
1346  generator = HuggingFaceAPIChatGenerator(
1347      api_type=HFGenerationAPIType.SERVERLESS_INFERENCE_API,
1348      api_params={
1349          "model": "Qwen/Qwen2.5-VL-7B-Instruct",  # Vision Language Model
1350          "provider": "hyperbolic"
1351      },
1352      token=Secret.from_token("<your-api-key>")
1353  )
1354  
1355  result = generator.run(messages)
1356  print(result)
1357  ```
1358  
1359  #### With paid inference endpoints
1360  
1361  ```python
1362  from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
1363  from haystack.dataclasses import ChatMessage
1364  from haystack.utils import Secret
1365  
1366  messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
1367              ChatMessage.from_user("What's Natural Language Processing?")]
1368  
1369  generator = HuggingFaceAPIChatGenerator(api_type="inference_endpoints",
1370                                          api_params={"url": "<your-inference-endpoint-url>"},
1371                                          token=Secret.from_token("<your-api-key>"))
1372  
1373  result = generator.run(messages)
1374  print(result)
```

1376  #### With self-hosted text generation inference
1377  
1378  ```python
1379  from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
1380  from haystack.dataclasses import ChatMessage
1381  
1382  messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
1383              ChatMessage.from_user("What's Natural Language Processing?")]
1384  
1385  generator = HuggingFaceAPIChatGenerator(api_type="text_generation_inference",
1386                                          api_params={"url": "http://localhost:8080"})
1387  
1388  result = generator.run(messages)
1389  print(result)
1390  ```
1391  
1392  <a id="chat/hugging_face_api.HuggingFaceAPIChatGenerator.__init__"></a>
1393  
1394  #### HuggingFaceAPIChatGenerator.\_\_init\_\_
1395  
1396  ```python
1397  def __init__(api_type: Union[HFGenerationAPIType, str],
1398               api_params: dict[str, str],
1399               token: Optional[Secret] = Secret.from_env_var(
1400                   ["HF_API_TOKEN", "HF_TOKEN"], strict=False),
1401               generation_kwargs: Optional[dict[str, Any]] = None,
1402               stop_words: Optional[list[str]] = None,
1403               streaming_callback: Optional[StreamingCallbackT] = None,
1404               tools: Optional[Union[list[Tool], Toolset]] = None)
1405  ```
1406  
1407  Initialize the HuggingFaceAPIChatGenerator instance.
1408  
1409  **Arguments**:
1410  
- `api_type`: The type of Hugging Face API to use. Available types:
  - `text_generation_inference`: See [TGI](https://github.com/huggingface/text-generation-inference).
  - `inference_endpoints`: See [Inference Endpoints](https://huggingface.co/inference-endpoints).
  - `serverless_inference_api`: See
    [Serverless Inference API - Inference Providers](https://huggingface.co/docs/inference-providers).
- `api_params`: A dictionary with the following keys:
  - `model`: Hugging Face model ID. Required when `api_type` is `SERVERLESS_INFERENCE_API`.
  - `provider`: Provider name. Recommended when `api_type` is `SERVERLESS_INFERENCE_API`.
  - `url`: URL of the inference endpoint. Required when `api_type` is `INFERENCE_ENDPOINTS` or
    `TEXT_GENERATION_INFERENCE`.
  - Other parameters specific to the chosen API type, such as `timeout` or `headers`.
1422  - `token`: The Hugging Face token to use as HTTP bearer authorization.
1423  Check your HF token in your [account settings](https://huggingface.co/settings/tokens).
1424  - `generation_kwargs`: A dictionary with keyword arguments to customize text generation.
1425  Some examples: `max_tokens`, `temperature`, `top_p`.
1426  For details, see [Hugging Face chat_completion documentation](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.chat_completion).
1427  - `stop_words`: An optional list of strings representing the stop words.
1428  - `streaming_callback`: An optional callable for handling streaming responses.
1429  - `tools`: A list of tools or a Toolset for which the model can prepare calls.
1430  The chosen model should support tool/function calling, according to the model card.
1431  Support for tools in the Hugging Face API and TGI is not yet fully refined and you may experience
1432  unexpected behavior. This parameter can accept either a list of `Tool` objects or a `Toolset` instance.
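
The per-`api_type` requirements on `api_params` listed above can be made concrete with a small check. The helper below is hypothetical (not part of Haystack); it only restates the rules from this docstring:

```python
# Hypothetical helper that restates the api_params rules documented above.
# Not part of Haystack; shown only to make the requirements concrete.

def validate_api_params(api_type: str, api_params: dict) -> None:
    """Raise ValueError if a required key for the given api_type is missing."""
    if api_type == "serverless_inference_api":
        if "model" not in api_params:
            raise ValueError("`model` is required for the serverless inference API")
    elif api_type in ("inference_endpoints", "text_generation_inference"):
        if "url" not in api_params:
            raise ValueError(f"`url` is required when api_type is {api_type!r}")
    else:
        raise ValueError(f"Unknown api_type: {api_type!r}")

# These parameter sets match the usage examples above:
validate_api_params("serverless_inference_api",
                    {"model": "Qwen/Qwen2.5-7B-Instruct", "provider": "together"})
validate_api_params("text_generation_inference", {"url": "http://localhost:8080"})
```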
1433  
1434  <a id="chat/hugging_face_api.HuggingFaceAPIChatGenerator.to_dict"></a>
1435  
1436  #### HuggingFaceAPIChatGenerator.to\_dict
1437  
1438  ```python
1439  def to_dict() -> dict[str, Any]
1440  ```
1441  
1442  Serialize this component to a dictionary.
1443  
1444  **Returns**:
1445  
1446  A dictionary containing the serialized component.
1447  
1448  <a id="chat/hugging_face_api.HuggingFaceAPIChatGenerator.from_dict"></a>
1449  
1450  #### HuggingFaceAPIChatGenerator.from\_dict
1451  
1452  ```python
1453  @classmethod
1454  def from_dict(cls, data: dict[str, Any]) -> "HuggingFaceAPIChatGenerator"
1455  ```
1456  
1457  Deserialize this component from a dictionary.
1458  
1459  <a id="chat/hugging_face_api.HuggingFaceAPIChatGenerator.run"></a>
1460  
1461  #### HuggingFaceAPIChatGenerator.run
1462  
1463  ```python
1464  @component.output_types(replies=list[ChatMessage])
1465  def run(messages: list[ChatMessage],
1466          generation_kwargs: Optional[dict[str, Any]] = None,
1467          tools: Optional[Union[list[Tool], Toolset]] = None,
1468          streaming_callback: Optional[StreamingCallbackT] = None)
1469  ```
1470  
1471  Invoke the text generation inference based on the provided messages and generation parameters.
1472  
1473  **Arguments**:
1474  
1475  - `messages`: A list of ChatMessage objects representing the input messages.
1476  - `generation_kwargs`: Additional keyword arguments for text generation.
1477  - `tools`: A list of tools or a Toolset for which the model can prepare calls. If set, it will override
1478  the `tools` parameter set during component initialization. This parameter can accept either a
1479  list of `Tool` objects or a `Toolset` instance.
1480  - `streaming_callback`: An optional callable for handling streaming responses. If set, it will override the `streaming_callback`
1481  parameter set during component initialization.
1482  
1483  **Returns**:
1484  
1485  A dictionary with the following keys:
1486  - `replies`: A list containing the generated responses as ChatMessage objects.
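
Per-call `generation_kwargs` take precedence over those passed at initialization. Assuming the merge behaves like a plain dict update (a sketch of the semantics, not Haystack's actual implementation):

```python
# Kwargs set at component initialization:
init_kwargs = {"max_tokens": 128, "temperature": 0.7}
# Kwargs passed to run() for this call only:
run_kwargs = {"temperature": 0.2}

# Run-time kwargs override init-time kwargs key by key.
effective = {**init_kwargs, **(run_kwargs or {})}
print(effective)  # → {'max_tokens': 128, 'temperature': 0.2}
```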
1487  
1488  <a id="chat/hugging_face_api.HuggingFaceAPIChatGenerator.run_async"></a>
1489  
1490  #### HuggingFaceAPIChatGenerator.run\_async
1491  
1492  ```python
1493  @component.output_types(replies=list[ChatMessage])
1494  async def run_async(messages: list[ChatMessage],
1495                      generation_kwargs: Optional[dict[str, Any]] = None,
1496                      tools: Optional[Union[list[Tool], Toolset]] = None,
1497                      streaming_callback: Optional[StreamingCallbackT] = None)
1498  ```
1499  
1500  Asynchronously invokes the text generation inference based on the provided messages and generation parameters.
1501  
This is the asynchronous version of the `run` method. It has the same parameters
and return values but can be used with `await` in async code.
1504  
1505  **Arguments**:
1506  
1507  - `messages`: A list of ChatMessage objects representing the input messages.
1508  - `generation_kwargs`: Additional keyword arguments for text generation.
1509  - `tools`: A list of tools or a Toolset for which the model can prepare calls. If set, it will override the `tools`
1510  parameter set during component initialization. This parameter can accept either a list of `Tool` objects
1511  or a `Toolset` instance.
1512  - `streaming_callback`: An optional callable for handling streaming responses. If set, it will override the `streaming_callback`
1513  parameter set during component initialization.
1514  
1515  **Returns**:
1516  
1517  A dictionary with the following keys:
1518  - `replies`: A list containing the generated responses as ChatMessage objects.
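
`run_async` is driven like any other coroutine. The sketch below uses a stand-in class with the same `run`/`run_async` shape, since the real component needs a Hugging Face API token and network access:

```python
import asyncio

class FakeChatGenerator:
    """Stand-in with the same run/run_async shape as HuggingFaceAPIChatGenerator."""

    def run(self, messages):
        # The real component returns {"replies": [ChatMessage, ...]}.
        return {"replies": [f"echo: {m}" for m in messages]}

    async def run_async(self, messages):
        # The real component awaits its HTTP call here; we just yield control.
        await asyncio.sleep(0)
        return self.run(messages)

async def main():
    generator = FakeChatGenerator()
    return await generator.run_async(["What's Natural Language Processing?"])

result = asyncio.run(main())
print(result["replies"][0])  # → echo: What's Natural Language Processing?
```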
1519  
1520  <a id="chat/openai"></a>
1521  
1522  # Module chat/openai
1523  
1524  <a id="chat/openai.OpenAIChatGenerator"></a>
1525  
1526  ## OpenAIChatGenerator
1527  
1528  Completes chats using OpenAI's large language models (LLMs).
1529  
It works with the gpt-4 and o-series models and supports streaming responses
from the OpenAI API. It uses the [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
format for input and output.
1533  
1534  You can customize how the text is generated by passing parameters to the
1535  OpenAI API. Use the `**generation_kwargs` argument when you initialize
1536  the component or when you run it. Any parameter that works with
1537  `openai.ChatCompletion.create` will work here too.
1538  
1539  For details on OpenAI API parameters, see
1540  [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).
1541  
1542  ### Usage example
1543  
1544  ```python
1545  from haystack.components.generators.chat import OpenAIChatGenerator
1546  from haystack.dataclasses import ChatMessage
1547  
1548  messages = [ChatMessage.from_user("What's Natural Language Processing?")]
1549  
1550  client = OpenAIChatGenerator()
1551  response = client.run(messages)
1552  print(response)
1553  ```
1554  Output:
1555  ```
1556  {'replies':
1557      [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=
1558      [TextContent(text="Natural Language Processing (NLP) is a branch of artificial intelligence
1559          that focuses on enabling computers to understand, interpret, and generate human language in
1560          a way that is meaningful and useful.")],
1561       _name=None,
1562       _meta={'model': 'gpt-4o-mini', 'index': 0, 'finish_reason': 'stop',
1563       'usage': {'prompt_tokens': 15, 'completion_tokens': 36, 'total_tokens': 51}})
1564      ]
1565  }
1566  ```
1567  
1568  <a id="chat/openai.OpenAIChatGenerator.__init__"></a>
1569  
1570  #### OpenAIChatGenerator.\_\_init\_\_
1571  
1572  ```python
1573  def __init__(api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
1574               model: str = "gpt-4o-mini",
1575               streaming_callback: Optional[StreamingCallbackT] = None,
1576               api_base_url: Optional[str] = None,
1577               organization: Optional[str] = None,
1578               generation_kwargs: Optional[dict[str, Any]] = None,
1579               timeout: Optional[float] = None,
1580               max_retries: Optional[int] = None,
1581               tools: Optional[Union[list[Tool], Toolset]] = None,
1582               tools_strict: bool = False,
1583               http_client_kwargs: Optional[dict[str, Any]] = None)
1584  ```
1585  
Creates an instance of OpenAIChatGenerator. Unless specified otherwise in `model`, it uses OpenAI's gpt-4o-mini model.
1588  Before initializing the component, you can set the 'OPENAI_TIMEOUT' and 'OPENAI_MAX_RETRIES'
1589  environment variables to override the `timeout` and `max_retries` parameters respectively
1590  in the OpenAI client.
1591  
1592  **Arguments**:
1593  
1594  - `api_key`: The OpenAI API key.
1595  You can set it with an environment variable `OPENAI_API_KEY`, or pass with this parameter
1596  during initialization.
1597  - `model`: The name of the model to use.
1598  - `streaming_callback`: A callback function that is called when a new token is received from the stream.
1599  The callback function accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
1600  as an argument.
1601  - `api_base_url`: An optional base URL.
1602  - `organization`: Your organization ID, defaults to `None`. See
1603  [production best practices](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
1604  - `generation_kwargs`: Other parameters to use for the model. These parameters are sent directly to
1605  the OpenAI endpoint. See OpenAI [documentation](https://platform.openai.com/docs/api-reference/chat) for
1606  more details.
1607  Some of the supported parameters:
1608  - `max_tokens`: The maximum number of tokens the output text can have.
1609  - `temperature`: What sampling temperature to use. Higher values mean the model will take more risks.
1610      Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
1611  - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
1612      considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
1613      comprising the top 10% probability mass are considered.
1614  - `n`: How many completions to generate for each prompt. For example, if the LLM gets 3 prompts and n is 2,
1615      it will generate two completions for each of the three prompts, ending up with 6 completions in total.
1616  - `stop`: One or more sequences after which the LLM should stop generating tokens.
- `presence_penalty`: The penalty applied when a token has already appeared in the text at all. Larger
    values make the model less likely to repeat the same token.
- `frequency_penalty`: The penalty applied in proportion to how often a token has already appeared in the
    text. Larger values make the model less likely to repeat the same token.
1621  - `logit_bias`: Add a logit bias to specific tokens. The keys of the dictionary are tokens, and the
1622      values are the bias to add to that token.
1623  - `response_format`: A JSON schema or a Pydantic model that enforces the structure of the model's response.
1624      If provided, the output will always be validated against this
1625      format (unless the model returns a tool call).
1626      For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
1627      Notes:
    - This parameter accepts Pydantic models and JSON schemas for the latest models, starting from GPT-4o.
      Older models only support a basic version of structured outputs through `{"type": "json_object"}`.
      For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
1631      - For structured outputs with streaming,
1632        the `response_format` must be a JSON schema and not a Pydantic model.
1633  - `timeout`: Timeout for OpenAI client calls. If not set, it defaults to either the
1634  `OPENAI_TIMEOUT` environment variable, or 30 seconds.
- `max_retries`: Maximum number of retries to contact OpenAI after an internal error.
If not set, it defaults to either the `OPENAI_MAX_RETRIES` environment variable or 5.
1637  - `tools`: A list of tools or a Toolset for which the model can prepare calls. This parameter can accept either a
1638  list of `Tool` objects or a `Toolset` instance.
1639  - `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
1640  the schema provided in the `parameters` field of the tool definition, but this may increase latency.
- `http_client_kwargs`: A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/).
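
As an illustration of the `response_format` note above, a JSON schema (the form that also works with streaming) can be built as a plain dict and passed through `generation_kwargs`. The `person` schema below is purely an example:

```python
# An illustrative JSON schema for structured outputs, in OpenAI's
# {"type": "json_schema", ...} wrapper format. Adapt the fields to your use case.
person_schema = {
    "type": "json_schema",
    "json_schema": {
        "name": "person",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "age": {"type": "integer"},
            },
            "required": ["name", "age"],
            "additionalProperties": False,
        },
    },
}

# Passed to the component at init or run time:
generation_kwargs = {"response_format": person_schema}
print(sorted(person_schema["json_schema"]["schema"]["required"]))  # → ['age', 'name']
```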
1643  
1644  <a id="chat/openai.OpenAIChatGenerator.to_dict"></a>
1645  
1646  #### OpenAIChatGenerator.to\_dict
1647  
1648  ```python
1649  def to_dict() -> dict[str, Any]
1650  ```
1651  
1652  Serialize this component to a dictionary.
1653  
1654  **Returns**:
1655  
1656  The serialized component as a dictionary.
1657  
1658  <a id="chat/openai.OpenAIChatGenerator.from_dict"></a>
1659  
1660  #### OpenAIChatGenerator.from\_dict
1661  
1662  ```python
1663  @classmethod
1664  def from_dict(cls, data: dict[str, Any]) -> "OpenAIChatGenerator"
1665  ```
1666  
1667  Deserialize this component from a dictionary.
1668  
1669  **Arguments**:
1670  
1671  - `data`: The dictionary representation of this component.
1672  
1673  **Returns**:
1674  
1675  The deserialized component instance.
1676  
1677  <a id="chat/openai.OpenAIChatGenerator.run"></a>
1678  
1679  #### OpenAIChatGenerator.run
1680  
1681  ```python
1682  @component.output_types(replies=list[ChatMessage])
1683  def run(messages: list[ChatMessage],
1684          streaming_callback: Optional[StreamingCallbackT] = None,
1685          generation_kwargs: Optional[dict[str, Any]] = None,
1686          *,
1687          tools: Optional[Union[list[Tool], Toolset]] = None,
1688          tools_strict: Optional[bool] = None)
1689  ```
1690  
1691  Invokes chat completion based on the provided messages and generation parameters.
1692  
1693  **Arguments**:
1694  
1695  - `messages`: A list of ChatMessage instances representing the input messages.
1696  - `streaming_callback`: A callback function that is called when a new token is received from the stream.
1697  - `generation_kwargs`: Additional keyword arguments for text generation. These parameters will
1698  override the parameters passed during component initialization.
1699  For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat/create).
1700  - `tools`: A list of tools or a Toolset for which the model can prepare calls. If set, it will override the
1701  `tools` parameter set during component initialization. This parameter can accept either a list of
1702  `Tool` objects or a `Toolset` instance.
1703  - `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
1704  the schema provided in the `parameters` field of the tool definition, but this may increase latency.
1705  If set, it will override the `tools_strict` parameter set during component initialization.
1706  
1707  **Returns**:
1708  
1709  A dictionary with the following key:
1710  - `replies`: A list containing the generated responses as ChatMessage instances.
1711  
1712  <a id="chat/openai.OpenAIChatGenerator.run_async"></a>
1713  
1714  #### OpenAIChatGenerator.run\_async
1715  
1716  ```python
1717  @component.output_types(replies=list[ChatMessage])
1718  async def run_async(messages: list[ChatMessage],
1719                      streaming_callback: Optional[StreamingCallbackT] = None,
1720                      generation_kwargs: Optional[dict[str, Any]] = None,
1721                      *,
1722                      tools: Optional[Union[list[Tool], Toolset]] = None,
1723                      tools_strict: Optional[bool] = None)
1724  ```
1725  
1726  Asynchronously invokes chat completion based on the provided messages and generation parameters.
1727  
1728  This is the asynchronous version of the `run` method. It has the same parameters and return values
1729  but can be used with `await` in async code.
1730  
1731  **Arguments**:
1732  
1733  - `messages`: A list of ChatMessage instances representing the input messages.
1734  - `streaming_callback`: A callback function that is called when a new token is received from the stream.
1735  Must be a coroutine.
1736  - `generation_kwargs`: Additional keyword arguments for text generation. These parameters will
1737  override the parameters passed during component initialization.
1738  For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat/create).
1739  - `tools`: A list of tools or a Toolset for which the model can prepare calls. If set, it will override the
1740  `tools` parameter set during component initialization. This parameter can accept either a list of
1741  `Tool` objects or a `Toolset` instance.
1742  - `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
1743  the schema provided in the `parameters` field of the tool definition, but this may increase latency.
1744  If set, it will override the `tools_strict` parameter set during component initialization.
1745  
1746  **Returns**:
1747  
1748  A dictionary with the following key:
1749  - `replies`: A list containing the generated responses as ChatMessage instances.