   1  ---
   2  title: "Generators"
   3  id: generators-api
   4  description: "Enables text generation using LLMs."
   5  slug: "/generators-api"
   6  ---
   7  
   8  
   9  ## azure
  10  
  11  ### AzureOpenAIGenerator
  12  
  13  Bases: <code>OpenAIGenerator</code>
  14  
  15  Generates text using OpenAI's large language models (LLMs).
  16  
It works with gpt-4-type models and supports streaming responses
from the OpenAI API.
  19  
  20  You can customize how the text is generated by passing parameters to the
  21  OpenAI API. Use the `**generation_kwargs` argument when you initialize
  22  the component or when you run it. Any parameter that works with
  23  `openai.ChatCompletion.create` will work here too.
  24  
  25  For details on OpenAI API parameters, see
  26  [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).
  27  
  28  ### Usage example
  29  
  30  <!-- test-ignore -->
  31  
```python
from haystack.components.generators import AzureOpenAIGenerator
from haystack.utils import Secret

client = AzureOpenAIGenerator(
    azure_endpoint="<your Azure endpoint, e.g. https://your-company.azure.openai.com/>",
    api_key=Secret.from_token("<your-api-key>"),
    azure_deployment="<the model name, e.g. gpt-4.1-mini>")
response = client.run("What's Natural Language Processing? Be brief.")
print(response)
```
  42  
```
# >> {'replies': ['Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on
# >> the interaction between computers and human language. It involves enabling computers to understand, interpret,
# >> and respond to natural human language in a way that is both meaningful and useful.'], 'meta': [{'model':
# >> 'gpt-4.1-mini', 'index': 0, 'finish_reason': 'stop', 'usage': {'prompt_tokens': 16,
# >> 'completion_tokens': 49, 'total_tokens': 65}}]}
```
  50  
  51  #### __init__
  52  
```python
__init__(
    azure_endpoint: str | None = None,
    api_version: str | None = "2024-12-01-preview",
    azure_deployment: str | None = "gpt-4.1-mini",
    api_key: Secret | None = Secret.from_env_var(
        "AZURE_OPENAI_API_KEY", strict=False
    ),
    azure_ad_token: Secret | None = Secret.from_env_var(
        "AZURE_OPENAI_AD_TOKEN", strict=False
    ),
    organization: str | None = None,
    streaming_callback: StreamingCallbackT | None = None,
    system_prompt: str | None = None,
    timeout: float | None = None,
    max_retries: int | None = None,
    http_client_kwargs: dict[str, Any] | None = None,
    generation_kwargs: dict[str, Any] | None = None,
    default_headers: dict[str, str] | None = None,
    *,
    azure_ad_token_provider: AzureADTokenProvider | None = None
) -> None
```
  76  
  77  Initialize the Azure OpenAI Generator.
  78  
  79  **Parameters:**
  80  
  81  - **azure_endpoint** (<code>str | None</code>) – The endpoint of the deployed model, for example `https://example-resource.azure.openai.com/`.
  82  - **api_version** (<code>str | None</code>) – The version of the API to use. Defaults to 2024-12-01-preview.
  83  - **azure_deployment** (<code>str | None</code>) – The deployment of the model, usually the model name.
  84  - **api_key** (<code>Secret | None</code>) – The API key to use for authentication.
  85  - **azure_ad_token** (<code>Secret | None</code>) – [Azure Active Directory token](https://www.microsoft.com/en-us/security/business/identity-access/microsoft-entra-id).
  86  - **organization** (<code>str | None</code>) – Your organization ID, defaults to `None`. For help, see
  87    [Setting up your organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
  88  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function called when a new token is received from the stream.
  89    It accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
  90    as an argument.
- **system_prompt** (<code>str | None</code>) – The system prompt to use for text generation. If not provided, no system
  prompt is sent and the model's default behavior applies.
- **timeout** (<code>float | None</code>) – Timeout for the AzureOpenAI client. If not set, it is inferred from the
  `OPENAI_TIMEOUT` environment variable or defaults to 30 seconds.
- **max_retries** (<code>int | None</code>) – Maximum number of retries to contact AzureOpenAI if it returns an internal error.
  If not set, it is inferred from the `OPENAI_MAX_RETRIES` environment variable or defaults to 5.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
  For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).
  99  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model, sent directly to
 100    the OpenAI endpoint. See [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat) for
 101    more details.
 102    Some of the supported parameters:
 103  - `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion,
 104    including visible output tokens and reasoning tokens.
 105  - `temperature`: The sampling temperature to use. Higher values mean the model takes more risks.
 106    Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
 107  - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
 108    considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
 109    comprising the top 10% probability mass are considered.
 110  - `n`: The number of completions to generate for each prompt. For example, with 3 prompts and n=2,
 111    the LLM will generate two completions per prompt, resulting in 6 completions total.
 112  - `stop`: One or more sequences after which the LLM should stop generating tokens.
 113  - `presence_penalty`: The penalty applied if a token is already present.
 114    Higher values make the model less likely to repeat the token.
 115  - `frequency_penalty`: Penalty applied if a token has already been generated.
 116    Higher values make the model less likely to repeat the token.
 117  - `logit_bias`: Adds a logit bias to specific tokens. The keys of the dictionary are tokens, and the
 118    values are the bias to add to that token.
 119  - **default_headers** (<code>dict\[str, str\] | None</code>) – Default headers to use for the AzureOpenAI client.
 120  - **azure_ad_token_provider** (<code>AzureADTokenProvider | None</code>) – A function that returns an Azure Active Directory token, will be invoked on
 121    every request.
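
Since `generation_kwargs` can be set both at initialization and at run time, it helps to know how the two interact: run-time values override init-time ones key by key. A minimal sketch of that merge, using plain dicts and a hypothetical helper name (verify the exact behavior against the component's `run()` implementation):

```python
# Sketch of how init-time and run-time generation_kwargs are combined:
# run-time values override init-time values key by key.
def merge_generation_kwargs(init_kwargs, runtime_kwargs):
    return {**(init_kwargs or {}), **(runtime_kwargs or {})}

init_kwargs = {"temperature": 0.2, "max_completion_tokens": 256}
runtime_kwargs = {"temperature": 0.9}

print(merge_generation_kwargs(init_kwargs, runtime_kwargs))
# {'temperature': 0.9, 'max_completion_tokens': 256}
```

In practice this means you can set conservative defaults once at construction and loosen them for individual `run()` calls without rebuilding the component.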
 122  
 123  #### to_dict
 124  
```python
to_dict() -> dict[str, Any]
```
 128  
 129  Serialize this component to a dictionary.
 130  
 131  **Returns:**
 132  
 133  - <code>dict\[str, Any\]</code> – The serialized component as a dictionary.
 134  
 135  #### from_dict
 136  
```python
from_dict(data: dict[str, Any]) -> AzureOpenAIGenerator
```
 140  
 141  Deserialize this component from a dictionary.
 142  
 143  **Parameters:**
 144  
 145  - **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.
 146  
 147  **Returns:**
 148  
 149  - <code>AzureOpenAIGenerator</code> – The deserialized component instance.
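
`to_dict` and `from_dict` follow the usual component serialization contract: the dictionary records the component's type and init parameters so an equivalent instance can be rebuilt later. A toy sketch of that round trip, not the actual Haystack implementation:

```python
# Toy illustration of the to_dict/from_dict round-trip contract: the
# serialized dict captures the init parameters, and from_dict rebuilds
# an equivalent instance from them.
class ToyGenerator:
    def __init__(self, azure_deployment="gpt-4.1-mini", api_version="2024-12-01-preview"):
        self.azure_deployment = azure_deployment
        self.api_version = api_version

    def to_dict(self):
        return {
            "type": "ToyGenerator",
            "init_parameters": {
                "azure_deployment": self.azure_deployment,
                "api_version": self.api_version,
            },
        }

    @classmethod
    def from_dict(cls, data):
        return cls(**data["init_parameters"])

original = ToyGenerator(azure_deployment="gpt-4o")
restored = ToyGenerator.from_dict(original.to_dict())
print(restored.azure_deployment)  # gpt-4o
```

Note that the real component stores secrets as environment-variable references rather than raw values, so serialized pipelines stay safe to share.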
 150  
 151  ## chat/azure
 152  
 153  ### AzureOpenAIChatGenerator
 154  
 155  Bases: <code>OpenAIChatGenerator</code>
 156  
 157  Generates text using OpenAI's models on Azure.
 158  
It works with gpt-4-type models and supports streaming responses
from the OpenAI API. It uses [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
 161  format in input and output.
 162  
 163  You can customize how the text is generated by passing parameters to the
 164  OpenAI API. Use the `**generation_kwargs` argument when you initialize
 165  the component or when you run it. Any parameter that works with
 166  `openai.ChatCompletion.create` will work here too.
 167  
 168  For details on OpenAI API parameters, see
 169  [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).
 170  
 171  ### Usage example
 172  
 173  <!-- test-ignore -->
 174  
```python
from haystack.components.generators.chat import AzureOpenAIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret

messages = [ChatMessage.from_user("What's Natural Language Processing?")]

client = AzureOpenAIChatGenerator(
    azure_endpoint="<your Azure endpoint, e.g. https://your-company.azure.openai.com/>",
    api_key=Secret.from_token("<your-api-key>"),
    azure_deployment="<the model name, e.g. gpt-4.1-mini>")
response = client.run(messages)
print(response)
```
 189  
```
{'replies':
    [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=[TextContent(text=
    "Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on
     enabling computers to understand, interpret, and generate human language in a way that is useful.")],
     _name=None,
     _meta={'model': 'gpt-4.1-mini', 'index': 0, 'finish_reason': 'stop',
     'usage': {'prompt_tokens': 15, 'completion_tokens': 36, 'total_tokens': 51}})]
}
```
 200  
 201  #### SUPPORTED_MODELS
 202  
```python
SUPPORTED_MODELS: list[str] = [
    "gpt-5.4",
    "gpt-5.4-pro",
    "gpt-5.3-codex",
    "gpt-5.2",
    "gpt-5.2-codex",
    "gpt-5.2-chat",
    "gpt-5.1",
    "gpt-5.1-chat",
    "gpt-5.1-codex",
    "gpt-5.1-codex-mini",
    "gpt-5",
    "gpt-5-mini",
    "gpt-5-nano",
    "gpt-5-chat",
    "gpt-4.1",
    "gpt-4.1-mini",
    "gpt-4.1-nano",
    "gpt-4o",
    "gpt-4o-mini",
    "gpt-4o-audio-preview",
    "gpt-realtime-1.5",
    "gpt-audio-1.5",
    "o1",
    "o1-mini",
    "o3",
    "o3-mini",
    "o4-mini",
    "codex-mini",
    "gpt-4",
    "gpt-35-turbo",
    "gpt-oss-120b",
    "computer-use-preview",
]
```
 240  
 241  A non-exhaustive list of chat models supported by this component.
 242  See https://learn.microsoft.com/en-us/azure/foundry/foundry-models/concepts/models-sold-directly-by-azure
 243  for the full list.
 244  
 245  #### __init__
 246  
```python
__init__(
    azure_endpoint: str | None = None,
    api_version: str | None = "2024-12-01-preview",
    azure_deployment: str | None = "gpt-4.1-mini",
    api_key: Secret | None = Secret.from_env_var(
        "AZURE_OPENAI_API_KEY", strict=False
    ),
    azure_ad_token: Secret | None = Secret.from_env_var(
        "AZURE_OPENAI_AD_TOKEN", strict=False
    ),
    organization: str | None = None,
    streaming_callback: StreamingCallbackT | None = None,
    timeout: float | None = None,
    max_retries: int | None = None,
    generation_kwargs: dict[str, Any] | None = None,
    default_headers: dict[str, str] | None = None,
    tools: ToolsType | None = None,
    tools_strict: bool = False,
    *,
    azure_ad_token_provider: (
        AzureADTokenProvider | AsyncAzureADTokenProvider | None
    ) = None,
    http_client_kwargs: dict[str, Any] | None = None
) -> None
```
 273  
 274  Initialize the Azure OpenAI Chat Generator component.
 275  
 276  **Parameters:**
 277  
 278  - **azure_endpoint** (<code>str | None</code>) – The endpoint of the deployed model, for example `"https://example-resource.azure.openai.com/"`.
 279  - **api_version** (<code>str | None</code>) – The version of the API to use. Defaults to 2024-12-01-preview.
 280  - **azure_deployment** (<code>str | None</code>) – The deployment of the model, usually the model name.
 281  - **api_key** (<code>Secret | None</code>) – The API key to use for authentication.
 282  - **azure_ad_token** (<code>Secret | None</code>) – [Azure Active Directory token](https://www.microsoft.com/en-us/security/business/identity-access/microsoft-entra-id).
 283  - **organization** (<code>str | None</code>) – Your organization ID, defaults to `None`. For help, see
 284    [Setting up your organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
 285  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function called when a new token is received from the stream.
 286    It accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
 287    as an argument.
- **timeout** (<code>float | None</code>) – Timeout for OpenAI client calls. If not set, it defaults to the
  `OPENAI_TIMEOUT` environment variable, or 30 seconds.
- **max_retries** (<code>int | None</code>) – Maximum number of retries to contact OpenAI after an internal error.
  If not set, it defaults to the `OPENAI_MAX_RETRIES` environment variable, or 5.
 292  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model. These parameters are sent directly to
 293    the OpenAI endpoint. For details, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).
 294    Some of the supported parameters:
 295  - `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion,
 296    including visible output tokens and reasoning tokens.
 297  - `temperature`: The sampling temperature to use. Higher values mean the model takes more risks.
 298    Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
 299  - `top_p`: Nucleus sampling is an alternative to sampling with temperature, where the model considers
 300    tokens with a top_p probability mass. For example, 0.1 means only the tokens comprising
 301    the top 10% probability mass are considered.
 302  - `n`: The number of completions to generate for each prompt. For example, with 3 prompts and n=2,
 303    the LLM will generate two completions per prompt, resulting in 6 completions total.
 304  - `stop`: One or more sequences after which the LLM should stop generating tokens.
 305  - `presence_penalty`: The penalty applied if a token is already present.
 306    Higher values make the model less likely to repeat the token.
 307  - `frequency_penalty`: Penalty applied if a token has already been generated.
 308    Higher values make the model less likely to repeat the token.
 309  - `logit_bias`: Adds a logit bias to specific tokens. The keys of the dictionary are tokens, and the
 310    values are the bias to add to that token.
 311  - `response_format`: A JSON schema or a Pydantic model that enforces the structure of the model's response.
 312    If provided, the output will always be validated against this
 313    format (unless the model returns a tool call).
 314    For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
 315    Notes:
  - This parameter accepts Pydantic models and JSON schemas for the latest models, starting from GPT-4o.
    Older models only support a basic version of structured outputs through `{"type": "json_object"}`.
 318      For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
 319    - For structured outputs with streaming,
 320      the `response_format` must be a JSON schema and not a Pydantic model.
 321  - **default_headers** (<code>dict\[str, str\] | None</code>) – Default headers to use for the AzureOpenAI client.
 322  - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
 323  - **tools_strict** (<code>bool</code>) – Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
 324    the schema provided in the `parameters` field of the tool definition, but this may increase latency.
 325  - **azure_ad_token_provider** (<code>AzureADTokenProvider | AsyncAzureADTokenProvider | None</code>) – A function that returns an Azure Active Directory token, will be invoked on
 326    every request.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
  For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).
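
The `streaming_callback` described above is invoked once per streamed chunk, so a common pattern is to print or accumulate text as it arrives. A minimal sketch using a stand-in chunk type (Haystack's real `StreamingChunk` carries more metadata than the single field shown here):

```python
from dataclasses import dataclass

# Stand-in for Haystack's StreamingChunk; a `content` field is enough
# to show the shape of the callback.
@dataclass
class Chunk:
    content: str

collected: list[str] = []

def streaming_callback(chunk: Chunk) -> None:
    """Called once per streamed chunk; accumulate (or print) the text."""
    collected.append(chunk.content)

# Simulate a streamed reply arriving chunk by chunk.
for piece in ["Natural ", "Language ", "Processing"]:
    streaming_callback(Chunk(content=piece))

print("".join(collected))  # Natural Language Processing
```

The same callback shape works for the synchronous and asynchronous clients; pass it as `streaming_callback=streaming_callback` at initialization or at run time.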
 329  
 330  #### warm_up
 331  
```python
warm_up() -> None
```
 335  
 336  Warm up the Azure OpenAI chat generator.
 337  
 338  This will warm up the tools registered in the chat generator.
 339  This method is idempotent and will only warm up the tools once.
 340  
 341  #### to_dict
 342  
```python
to_dict() -> dict[str, Any]
```
 346  
 347  Serialize this component to a dictionary.
 348  
 349  **Returns:**
 350  
 351  - <code>dict\[str, Any\]</code> – The serialized component as a dictionary.
 352  
 353  #### from_dict
 354  
```python
from_dict(data: dict[str, Any]) -> AzureOpenAIChatGenerator
```
 358  
 359  Deserialize this component from a dictionary.
 360  
 361  **Parameters:**
 362  
 363  - **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.
 364  
 365  **Returns:**
 366  
 367  - <code>AzureOpenAIChatGenerator</code> – The deserialized component instance.
 368  
 369  ## chat/azure_responses
 370  
 371  ### AzureOpenAIResponsesChatGenerator
 372  
 373  Bases: <code>OpenAIResponsesChatGenerator</code>
 374  
 375  Completes chats using OpenAI's Responses API on Azure.
 376  
It works with the gpt-5 and o-series models and supports streaming responses
from the OpenAI API. It uses [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
 379  format in input and output.
 380  
 381  You can customize how the text is generated by passing parameters to the
 382  OpenAI API. Use the `**generation_kwargs` argument when you initialize
 383  the component or when you run it. Any parameter that works with
 384  `openai.Responses.create` will work here too.
 385  
 386  For details on OpenAI API parameters, see
 387  [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses).
 388  
 389  ### Usage example
 390  
 391  <!-- test-ignore -->
 392  
```python
from haystack.components.generators.chat import AzureOpenAIResponsesChatGenerator
from haystack.dataclasses import ChatMessage

messages = [ChatMessage.from_user("What's Natural Language Processing?")]

client = AzureOpenAIResponsesChatGenerator(
    azure_endpoint="https://example-resource.azure.openai.com/",
    generation_kwargs={"reasoning": {"effort": "low", "summary": "auto"}}
)
response = client.run(messages)
print(response)
```
 406  
 407  #### SUPPORTED_MODELS
 408  
```python
SUPPORTED_MODELS: list[str] = [
    "gpt-5.4-pro",
    "gpt-5.4",
    "gpt-5.3-chat",
    "gpt-5.3-codex",
    "gpt-5.2-codex",
    "gpt-5.2",
    "gpt-5.2-chat",
    "gpt-5.1-codex-max",
    "gpt-5.1",
    "gpt-5.1-chat",
    "gpt-5.1-codex",
    "gpt-5.1-codex-mini",
    "gpt-5-pro",
    "gpt-5-codex",
    "gpt-5",
    "gpt-5-mini",
    "gpt-5-nano",
    "gpt-5-chat",
    "gpt-4o",
    "gpt-4o-mini",
    "computer-use-preview",
    "gpt-4.1",
    "gpt-4.1-nano",
    "gpt-4.1-mini",
    "gpt-image-1",
    "gpt-image-1-mini",
    "gpt-image-1.5",
    "o1",
    "o3-mini",
    "o3",
    "o4-mini",
]
```
 445  
 446  A non-exhaustive list of chat models supported by this component.
 447  See https://learn.microsoft.com/en-us/azure/foundry/openai/how-to/responses#model-support for the full list.
 448  
 449  #### __init__
 450  
```python
__init__(
    *,
    api_key: (
        Secret | Callable[[], str] | Callable[[], Awaitable[str]]
    ) = Secret.from_env_var("AZURE_OPENAI_API_KEY", strict=False),
    azure_endpoint: str | None = None,
    azure_deployment: str = "gpt-5-mini",
    streaming_callback: StreamingCallbackT | None = None,
    organization: str | None = None,
    generation_kwargs: dict[str, Any] | None = None,
    timeout: float | None = None,
    max_retries: int | None = None,
    tools: ToolsType | None = None,
    tools_strict: bool = False,
    http_client_kwargs: dict[str, Any] | None = None
) -> None
```
 469  
 470  Initialize the AzureOpenAIResponsesChatGenerator component.
 471  
 472  **Parameters:**
 473  
 474  - **api_key** (<code>Secret | Callable\[[], str\] | Callable\[[], Awaitable\[str\]\]</code>) – The API key to use for authentication. Can be:
 475  - A `Secret` object containing the API key.
 476  - A `Secret` object containing the [Azure Active Directory token](https://www.microsoft.com/en-us/security/business/identity-access/microsoft-entra-id).
 477  - A function that returns an Azure Active Directory token.
 478  - **azure_endpoint** (<code>str | None</code>) – The endpoint of the deployed model, for example `"https://example-resource.azure.openai.com/"`.
 479  - **azure_deployment** (<code>str</code>) – The deployment of the model, usually the model name.
 480  - **organization** (<code>str | None</code>) – Your organization ID, defaults to `None`. For help, see
 481    [Setting up your organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
 482  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function called when a new token is received from the stream.
 483    It accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
 484    as an argument.
- **timeout** (<code>float | None</code>) – Timeout for OpenAI client calls. If not set, it defaults to the
  `OPENAI_TIMEOUT` environment variable, or 30 seconds.
- **max_retries** (<code>int | None</code>) – Maximum number of retries to contact OpenAI after an internal error.
  If not set, it defaults to the `OPENAI_MAX_RETRIES` environment variable, or 5.
 489  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model. These parameters are sent
 490    directly to the OpenAI endpoint.
 491    See OpenAI [documentation](https://platform.openai.com/docs/api-reference/responses) for
 492    more details.
 493    Some of the supported parameters:
 494  - `temperature`: What sampling temperature to use. Higher values like 0.8 will make the output more random,
 495    while lower values like 0.2 will make it more focused and deterministic.
 496  - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
 497    considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
 498    comprising the top 10% probability mass are considered.
 499  - `previous_response_id`: The ID of the previous response.
 500    Use this to create multi-turn conversations.
 501  - `text_format`: A Pydantic model that enforces the structure of the model's response.
 502    If provided, the output will always be validated against this
 503    format (unless the model returns a tool call).
 504    For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
 505  - `text`: A JSON schema that enforces the structure of the model's response.
 506    If provided, the output will always be validated against this
 507    format (unless the model returns a tool call).
 508    Notes:
  - Both JSON Schema and Pydantic models are supported for the latest models, starting from GPT-4o.
  - If both are provided, `text_format` takes precedence and the JSON schema passed to `text` is ignored.
  - Currently, this component doesn't support streaming for structured outputs.
  - Older models only support a basic version of structured outputs through `{"type": "json_object"}`.
    For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
 514  - `reasoning`: A dictionary of parameters for reasoning. For example:
 515    - `summary`: The summary of the reasoning.
 516    - `effort`: The level of effort to put into the reasoning. Can be `low`, `medium` or `high`.
 517    - `generate_summary`: Whether to generate a summary of the reasoning.
    Note: OpenAI does not return the reasoning tokens, but you can view the summary if it's enabled.
 519      For details, see the [OpenAI Reasoning documentation](https://platform.openai.com/docs/guides/reasoning).
 520  - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
 521  - **tools_strict** (<code>bool</code>) – Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
 522    the schema provided in the `parameters` field of the tool definition, but this may increase latency.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
  For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).
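
As an illustration of the `text` option above, a JSON-schema response format can be assembled as a plain dictionary and passed through `generation_kwargs`. The `person` schema below is hypothetical, and the wrapping keys (`format`, `json_schema`-style `type`, `name`, `strict`, `schema`) are an assumption based on the OpenAI Responses structured-output format; verify them against the OpenAI documentation linked above:

```python
# Hypothetical JSON schema for structured output via the `text` parameter.
person_schema = {
    "format": {
        "type": "json_schema",
        "name": "person",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "age": {"type": "integer"},
            },
            "required": ["name", "age"],
            "additionalProperties": False,
        },
    }
}

# Passed at initialization or run time, e.g.:
generation_kwargs = {"text": person_schema}
print(sorted(generation_kwargs["text"]["format"]["schema"]["required"]))
# ['age', 'name']
```

With `strict` enabled, the model's reply is constrained to match the schema exactly (unless it returns a tool call).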
 525  
 526  #### to_dict
 527  
```python
to_dict() -> dict[str, Any]
```
 531  
 532  Serialize this component to a dictionary.
 533  
 534  **Returns:**
 535  
 536  - <code>dict\[str, Any\]</code> – The serialized component as a dictionary.
 537  
 538  #### from_dict
 539  
```python
from_dict(data: dict[str, Any]) -> AzureOpenAIResponsesChatGenerator
```
 543  
 544  Deserialize this component from a dictionary.
 545  
 546  **Parameters:**
 547  
 548  - **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.
 549  
 550  **Returns:**
 551  
 552  - <code>AzureOpenAIResponsesChatGenerator</code> – The deserialized component instance.
 553  
 554  ## chat/fallback
 555  
 556  ### FallbackChatGenerator
 557  
 558  A chat generator wrapper that tries multiple chat generators sequentially.
 559  
 560  It forwards all parameters transparently to the underlying chat generators and returns the first successful result.
 561  Calls chat generators sequentially until one succeeds. Falls back on any exception raised by a generator.
 562  If all chat generators fail, it raises a RuntimeError with details.
 563  
 564  Timeout enforcement is fully delegated to the underlying chat generators. The fallback mechanism will only
 565  work correctly if the underlying chat generators implement proper timeout handling and raise exceptions
 566  when timeouts occur. For predictable latency guarantees, ensure your chat generators:
 567  
 568  - Support a `timeout` parameter in their initialization
 569  - Implement timeout as total wall-clock time (shared deadline for both streaming and non-streaming)
 570  - Raise timeout exceptions (e.g., TimeoutError, asyncio.TimeoutError, httpx.TimeoutException) when exceeded
 571  
 572  Note: Most well-implemented chat generators (OpenAI, Anthropic, Cohere, etc.) support timeout parameters
 573  with consistent semantics. For HTTP-based LLM providers, a single timeout value (e.g., `timeout=30`)
 574  typically applies to all connection phases: connection setup, read, write, and pool. For streaming
 575  responses, read timeout is the maximum gap between chunks. For non-streaming, it's the time limit for
 576  receiving the complete response.
 577  
Failover is triggered automatically when a generator raises any exception, including:
 579  
 580  - Timeout errors (if the generator implements and raises them)
 581  - Rate limit errors (429)
 582  - Authentication errors (401)
 583  - Context length errors (400)
 584  - Server errors (500+)
 585  - Any other exception
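
The fallback loop itself can be sketched in a few lines. This is an illustrative reimplementation of the behavior described above (using bare callables in place of chat generator components), not the component's actual code:

```python
# Sequential fallback: try each generator in order, treat any exception
# as a signal to fail over, and raise RuntimeError if every one fails.
def run_with_fallback(generators, messages):
    failures = []
    for index, generator in enumerate(generators):
        try:
            result = generator(messages)
        except Exception as exc:  # any exception triggers failover
            failures.append((type(generator).__name__, repr(exc)))
            continue
        result.setdefault("meta", {})["successful_chat_generator_index"] = index
        return result
    raise RuntimeError(f"All chat generators failed: {failures}")

def flaky(messages):
    raise TimeoutError("deadline exceeded")

def healthy(messages):
    return {"replies": [f"echo: {messages[-1]}"]}

result = run_with_fallback([flaky, healthy], ["Hello"])
print(result["replies"], result["meta"]["successful_chat_generator_index"])
# ['echo: Hello'] 1
```

The real component records richer metadata (failed generator names, total attempts) but follows the same try-in-order, first-success-wins logic.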
 586  
 587  #### __init__
 588  
```python
__init__(chat_generators: list[ChatGenerator]) -> None
```
 592  
 593  Creates an instance of FallbackChatGenerator.
 594  
 595  **Parameters:**
 596  
 597  - **chat_generators** (<code>list\[ChatGenerator\]</code>) – A non-empty list of chat generator components to try in order.
 598  
 599  #### to_dict
 600  
```python
to_dict() -> dict[str, Any]
```
 604  
 605  Serialize the component, including nested chat generators when they support serialization.
 606  
 607  #### from_dict
 608  
```python
from_dict(data: dict[str, Any]) -> FallbackChatGenerator
```
 612  
 613  Rebuild the component from a serialized representation, restoring nested chat generators.
 614  
 615  #### warm_up
 616  
```python
warm_up() -> None
```
 620  
 621  Warm up all underlying chat generators.
 622  
 623  This method calls warm_up() on each underlying generator that supports it.
 624  
 625  #### run
 626  
```python
run(
    messages: list[ChatMessage],
    generation_kwargs: dict[str, Any] | None = None,
    tools: ToolsType | None = None,
    streaming_callback: StreamingCallbackT | None = None,
) -> dict[str, list[ChatMessage] | dict[str, Any]]
```
 635  
 636  Execute chat generators sequentially until one succeeds.
 637  
 638  **Parameters:**
 639  
 640  - **messages** (<code>list\[ChatMessage\]</code>) – The conversation history as a list of ChatMessage instances.
 641  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Optional parameters for the chat generator (e.g., temperature, max_tokens).
 642  - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for function calling capabilities.
 643  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – Optional callable for handling streaming responses.
 644  
 645  **Returns:**
 646  
 647  - <code>dict\[str, list\[ChatMessage\] | dict\[str, Any\]\]</code> – A dictionary with:
 648  - "replies": Generated ChatMessage instances from the first successful generator.
 649  - "meta": Execution metadata including successful_chat_generator_index, successful_chat_generator_class,
 650    total_attempts, failed_chat_generators, plus any metadata from the successful generator.
 651  
 652  **Raises:**
 653  
 654  - <code>RuntimeError</code> – If all chat generators fail.
 655  
 656  #### run_async
 657  
```python
run_async(
    messages: list[ChatMessage],
    generation_kwargs: dict[str, Any] | None = None,
    tools: ToolsType | None = None,
    streaming_callback: StreamingCallbackT | None = None,
) -> dict[str, list[ChatMessage] | dict[str, Any]]
```
 666  
 667  Asynchronously execute chat generators sequentially until one succeeds.
 668  
 669  **Parameters:**
 670  
 671  - **messages** (<code>list\[ChatMessage\]</code>) – The conversation history as a list of ChatMessage instances.
 672  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Optional parameters for the chat generator (e.g., temperature, max_tokens).
 673  - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for function calling capabilities.
 674  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – Optional callable for handling streaming responses.
 675  
 676  **Returns:**
 677  
 678  - <code>dict\[str, list\[ChatMessage\] | dict\[str, Any\]\]</code> – A dictionary with:
 679  - "replies": Generated ChatMessage instances from the first successful generator.
 680  - "meta": Execution metadata including successful_chat_generator_index, successful_chat_generator_class,
 681    total_attempts, failed_chat_generators, plus any metadata from the successful generator.
 682  
 683  **Raises:**
 684  
 685  - <code>RuntimeError</code> – If all chat generators fail.
 686  
 687  ## chat/hugging_face_api
 688  
 689  ### HuggingFaceAPIChatGenerator
 690  
 691  Completes chats using Hugging Face APIs.
 692  
 693  HuggingFaceAPIChatGenerator uses the [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
 694  format for input and output. Use it to generate text with Hugging Face APIs:
 695  
 696  - [Serverless Inference API (Inference Providers)](https://huggingface.co/docs/inference-providers)
 697  - [Paid Inference Endpoints](https://huggingface.co/inference-endpoints)
 698  - [Self-hosted Text Generation Inference](https://github.com/huggingface/text-generation-inference)
 699  
 700  ### Usage examples
 701  
 702  #### With the serverless inference API (Inference Providers) - free tier available
 703  
 704  <!-- test-ignore -->
 705  
 706  ```python
 707  from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
 708  from haystack.dataclasses import ChatMessage
 709  from haystack.utils import Secret
 710  from haystack.utils.hf import HFGenerationAPIType
 711  
 712  messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
 713              ChatMessage.from_user("What's Natural Language Processing?")]
 714  
 715  # the api_type can be expressed using the HFGenerationAPIType enum or as a string
 716  api_type = HFGenerationAPIType.SERVERLESS_INFERENCE_API
 717  api_type = "serverless_inference_api" # this is equivalent to the above
 718  
 719  generator = HuggingFaceAPIChatGenerator(api_type=api_type,
 720                                          api_params={"model": "Qwen/Qwen2.5-7B-Instruct",
 721                                                      "provider": "together"},
 722                                          token=Secret.from_token("<your-api-key>"))
 723  
 724  result = generator.run(messages)
 725  print(result)
 726  ```
 727  
 728  #### With the serverless inference API (Inference Providers) and text+image input
 729  
 730  <!-- test-ignore -->
 731  
 732  ```python
 733  from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
 734  from haystack.dataclasses import ChatMessage, ImageContent
 735  from haystack.utils import Secret
 736  from haystack.utils.hf import HFGenerationAPIType
 737  
 738  # Create an image from file path, URL, or base64
 739  image = ImageContent.from_file_path("path/to/your/image.jpg")
 740  
 741  # Create a multimodal message with both text and image
 742  messages = [ChatMessage.from_user(content_parts=["Describe this image in detail", image])]
 743  
 744  generator = HuggingFaceAPIChatGenerator(
 745      api_type=HFGenerationAPIType.SERVERLESS_INFERENCE_API,
 746      api_params={
 747          "model": "Qwen/Qwen2.5-VL-7B-Instruct",  # Vision Language Model
 748          "provider": "hyperbolic"
 749      },
 750      token=Secret.from_token("<your-api-key>")
 751  )
 752  
 753  result = generator.run(messages)
 754  print(result)
 755  ```
 756  
 757  #### With paid inference endpoints
 758  
 759  <!-- test-ignore -->
 760  
 761  ```python
 762  from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
 763  from haystack.dataclasses import ChatMessage
 764  from haystack.utils import Secret
 765  
 766  messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
 767              ChatMessage.from_user("What's Natural Language Processing?")]
 768  
 769  generator = HuggingFaceAPIChatGenerator(api_type="inference_endpoints",
 770                                          api_params={"url": "<your-inference-endpoint-url>"},
 771                                          token=Secret.from_token("<your-api-key>"))
 772  
 773  result = generator.run(messages)
 774  print(result)
 775  ```
 776  
 777  #### With self-hosted text generation inference
 778  
 779  <!-- test-ignore -->
 780  
 781  ```python
 782  from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
 783  from haystack.dataclasses import ChatMessage
 784  
 785  messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
 786              ChatMessage.from_user("What's Natural Language Processing?")]
 787  
 788  generator = HuggingFaceAPIChatGenerator(api_type="text_generation_inference",
 789                                          api_params={"url": "http://localhost:8080"})
 790  
 791  result = generator.run(messages)
 792  print(result)
 793  ```
 794  
 795  #### __init__
 796  
 797  ```python
 798  __init__(
 799      api_type: HFGenerationAPIType | str,
 800      api_params: dict[str, str],
 801      token: Secret | None = Secret.from_env_var(
 802          ["HF_API_TOKEN", "HF_TOKEN"], strict=False
 803      ),
 804      generation_kwargs: dict[str, Any] | None = None,
 805      stop_words: list[str] | None = None,
 806      streaming_callback: StreamingCallbackT | None = None,
 807      tools: ToolsType | None = None,
 808  ) -> None
 809  ```
 810  
 811  Initialize the HuggingFaceAPIChatGenerator instance.
 812  
 813  **Parameters:**
 814  
 815  - **api_type** (<code>HFGenerationAPIType | str</code>) – The type of Hugging Face API to use. Available types:
 816  - `text_generation_inference`: See [TGI](https://github.com/huggingface/text-generation-inference).
 817  - `inference_endpoints`: See [Inference Endpoints](https://huggingface.co/inference-endpoints).
 818  - `serverless_inference_api`: See
 819    [Serverless Inference API - Inference Providers](https://huggingface.co/docs/inference-providers).
 820  - **api_params** (<code>dict\[str, str\]</code>) – A dictionary with the following keys:
 821  - `model`: Hugging Face model ID. Required when `api_type` is `SERVERLESS_INFERENCE_API`.
 822  - `provider`: Provider name. Recommended when `api_type` is `SERVERLESS_INFERENCE_API`.
 823  - `url`: URL of the inference endpoint. Required when `api_type` is `INFERENCE_ENDPOINTS` or
 824    `TEXT_GENERATION_INFERENCE`.
 825  - Other parameters specific to the chosen API type, such as `timeout`, `headers`, etc.
 826  - **token** (<code>Secret | None</code>) – The Hugging Face token to use as HTTP bearer authorization.
 827    Check your HF token in your [account settings](https://huggingface.co/settings/tokens).
 828  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary with keyword arguments to customize text generation.
 829    Some examples: `max_tokens`, `temperature`, `top_p`.
 830    For details, see [Hugging Face chat_completion documentation](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.chat_completion).
 831  - **stop_words** (<code>list\[str\] | None</code>) – An optional list of strings representing the stop words.
 832  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses.
 833  - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
 834    The chosen model should support tool/function calling, according to the model card.
 835    Support for tools in the Hugging Face API and TGI is not yet fully refined and you may experience
 836    unexpected behavior.
 837  
 838  #### warm_up
 839  
 840  ```python
 841  warm_up() -> None
 842  ```
 843  
 844  Warm up the Hugging Face API chat generator.
 845  
 846  This will warm up the tools registered in the chat generator.
 847  This method is idempotent and will only warm up the tools once.
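
Idempotent warm-up is typically achieved with a guard flag; a minimal sketch of that pattern (the names here are illustrative, not the component's internals):

```python
class ToolWarmup:
    """Minimal idempotent warm-up guard, mirroring the behavior described above."""

    def __init__(self, tools):
        self.tools = tools
        self._warmed_up = False
        self.warmup_calls = 0  # counter for demonstration only

    def warm_up(self) -> None:
        if self._warmed_up:      # second and later calls are no-ops
            return
        self.warmup_calls += 1   # stands in for warming each registered tool
        self._warmed_up = True
```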
 848  
 849  #### to_dict
 850  
 851  ```python
 852  to_dict() -> dict[str, Any]
 853  ```
 854  
 855  Serialize this component to a dictionary.
 856  
 857  **Returns:**
 858  
 859  - <code>dict\[str, Any\]</code> – A dictionary containing the serialized component.
 860  
 861  #### from_dict
 862  
 863  ```python
 864  from_dict(data: dict[str, Any]) -> HuggingFaceAPIChatGenerator
 865  ```
 866  
 867  Deserialize this component from a dictionary.
 868  
 869  #### run
 870  
 871  ```python
 872  run(
 873      messages: list[ChatMessage],
 874      generation_kwargs: dict[str, Any] | None = None,
 875      tools: ToolsType | None = None,
 876      streaming_callback: StreamingCallbackT | None = None,
 877  ) -> dict[str, list[ChatMessage]]
 878  ```
 879  
 880  Invoke the text generation inference based on the provided messages and generation parameters.
 881  
 882  **Parameters:**
 883  
 884  - **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage objects representing the input messages.
 885  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation.
 886  - **tools** (<code>ToolsType | None</code>) – A list of tools or a Toolset for which the model can prepare calls. If set, it will override
 887    the `tools` parameter set during component initialization. This parameter can accept either a
 888    list of `Tool` objects or a `Toolset` instance.
 889  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses. If set, it will override the `streaming_callback`
 890    parameter set during component initialization.
 891  
 892  **Returns:**
 893  
 894  - <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following keys:
 895  - `replies`: A list containing the generated responses as ChatMessage objects.
 896  
 897  #### run_async
 898  
 899  ```python
 900  run_async(
 901      messages: list[ChatMessage],
 902      generation_kwargs: dict[str, Any] | None = None,
 903      tools: ToolsType | None = None,
 904      streaming_callback: StreamingCallbackT | None = None,
 905  ) -> dict[str, list[ChatMessage]]
 906  ```
 907  
 908  Asynchronously invokes the text generation inference based on the provided messages and generation parameters.
 909  
 910  This is the asynchronous version of the `run` method. It has the same parameters
and return values, but can be used with `await` in async code.
 912  
 913  **Parameters:**
 914  
 915  - **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage objects representing the input messages.
 916  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation.
 917  - **tools** (<code>ToolsType | None</code>) – A list of tools or a Toolset for which the model can prepare calls. If set, it will override the `tools`
 918    parameter set during component initialization. This parameter can accept either a list of `Tool` objects
 919    or a `Toolset` instance.
 920  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses. If set, it will override the `streaming_callback`
 921    parameter set during component initialization.
 922  
 923  **Returns:**
 924  
 925  - <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following keys:
 926  - `replies`: A list containing the generated responses as ChatMessage objects.
 927  
 928  ## chat/hugging_face_local
 929  
 930  ### default_tool_parser
 931  
 932  ```python
 933  default_tool_parser(text: str) -> list[ToolCall] | None
 934  ```
 935  
 936  Default implementation for parsing tool calls from model output text.
 937  
 938  Uses DEFAULT_TOOL_PATTERN to extract tool calls.
 939  
 940  **Parameters:**
 941  
 942  - **text** (<code>str</code>) – The text to parse for tool calls.
 943  
 944  **Returns:**
 945  
 946  - <code>list\[ToolCall\] | None</code> – A list containing a single ToolCall if a valid tool call is found, None otherwise.
 947  
 948  ### HuggingFaceLocalChatGenerator
 949  
 950  Generates chat responses using models from Hugging Face that run locally.
 951  
 952  Use this component with chat-based models,
 953  such as `Qwen/Qwen3-0.6B` or `meta-llama/Llama-2-7b-chat-hf`.
 954  LLMs running locally may need powerful hardware.
 955  
 956  ### Usage example
 957  
 958  <!-- test-ignore -->
 959  
 960  ```python
 961  from haystack.components.generators.chat import HuggingFaceLocalChatGenerator
 962  from haystack.dataclasses import ChatMessage
 963  
 964  generator = HuggingFaceLocalChatGenerator(model="Qwen/Qwen3-0.6B")
 965  messages = [ChatMessage.from_user("What's Natural Language Processing? Be brief.")]
 966  print(generator.run(messages))
 967  ```
 968  
 969  ```
 970  {'replies':
 971      [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=[TextContent(text=
 972      "Natural Language Processing (NLP) is a subfield of artificial intelligence that deals
 973      with the interaction between computers and human language. It enables computers to understand, interpret, and
 974      generate human language in a valuable way. NLP involves various techniques such as speech recognition, text
 975      analysis, sentiment analysis, and machine translation. The ultimate goal is to make it easier for computers to
 976      process and derive meaning from human language, improving communication between humans and machines.")],
 977      _name=None,
    _meta={'finish_reason': 'stop', 'index': 0, 'model':
          'Qwen/Qwen3-0.6B',
 980            'usage': {'completion_tokens': 90, 'prompt_tokens': 19, 'total_tokens': 109}})
 981            ]
 982  }
 983  ```
 984  
 985  #### __init__
 986  
 987  ```python
 988  __init__(
 989      model: str = "Qwen/Qwen3-0.6B",
 990      task: (
 991          Literal["text-generation", "text2text-generation", "image-text-to-text"]
 992          | None
 993      ) = None,
 994      device: ComponentDevice | None = None,
 995      token: Secret | None = Secret.from_env_var(
 996          ["HF_API_TOKEN", "HF_TOKEN"], strict=False
 997      ),
 998      chat_template: str | None = None,
 999      generation_kwargs: dict[str, Any] | None = None,
1000      huggingface_pipeline_kwargs: dict[str, Any] | None = None,
1001      stop_words: list[str] | None = None,
1002      streaming_callback: StreamingCallbackT | None = None,
1003      tools: ToolsType | None = None,
1004      tool_parsing_function: Callable[[str], list[ToolCall] | None] | None = None,
1005      async_executor: ThreadPoolExecutor | None = None,
1006      *,
1007      enable_thinking: bool = False
1008  ) -> None
1009  ```
1010  
1011  Initializes the HuggingFaceLocalChatGenerator component.
1012  
1013  **Parameters:**
1014  
1015  - **model** (<code>str</code>) – The Hugging Face text generation model name or path,
1016    for example, `mistralai/Mistral-7B-Instruct-v0.2` or `TheBloke/OpenHermes-2.5-Mistral-7B-16k-AWQ`.
1017    The model must be a chat model supporting the ChatML messaging
1018    format.
1019    If the model is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
1020  - **task** (<code>Literal['text-generation', 'text2text-generation', 'image-text-to-text'] | None</code>) – The task for the Hugging Face pipeline. Possible options:
1021  - `text-generation`: Supported by decoder models, like GPT.
1022  - `text2text-generation`: Deprecated as of Transformers v5; use `text-generation` instead.
1023    Previously supported by encoder–decoder models such as T5.
1024  - `image-text-to-text`: Supported by vision-language models.
1025    If the task is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
1026    If not specified, the component calls the Hugging Face API to infer the task from the model name.
1027  - **device** (<code>ComponentDevice | None</code>) – The device for loading the model. If `None`, automatically selects the default device.
1028    If a device or device map is specified in `huggingface_pipeline_kwargs`, it overrides this parameter.
1029  - **token** (<code>Secret | None</code>) – The token to use as HTTP bearer authorization for remote files.
1030    If the token is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
1031  - **chat_template** (<code>str | None</code>) – Specifies an optional Jinja template for formatting chat
1032    messages. Most high-quality chat models have their own templates, but for models without this
1033    feature or if you prefer a custom template, use this parameter.
1034  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary with keyword arguments to customize text generation.
1035    Some examples: `max_length`, `max_new_tokens`, `temperature`, `top_k`, `top_p`.
1036    See Hugging Face's documentation for more information:
  - [customize-text-generation](https://huggingface.co/docs/transformers/main/en/generation_strategies#customize-text-generation)
  - [GenerationConfig](https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.GenerationConfig)
    The only `generation_kwargs` set by default is `max_new_tokens`, which is set to 512 tokens.
1040  - **huggingface_pipeline_kwargs** (<code>dict\[str, Any\] | None</code>) – Dictionary with keyword arguments to initialize the
1041    Hugging Face pipeline for text generation.
1042    These keyword arguments provide fine-grained control over the Hugging Face pipeline.
1043    In case of duplication, these kwargs override `model`, `task`, `device`, and `token` init parameters.
1044    For kwargs, see [Hugging Face documentation](https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.pipeline.task).
  In this dictionary, you can also include `model_kwargs` to specify the kwargs for [model initialization](https://huggingface.co/docs/transformers/en/main_classes/model#transformers.PreTrainedModel.from_pretrained).
1046  - **stop_words** (<code>list\[str\] | None</code>) – A list of stop words. If the model generates a stop word, the generation stops.
1047    If you provide this parameter, don't specify the `stopping_criteria` in `generation_kwargs`.
1048    For some chat models, the output includes both the new text and the original prompt.
1049    In these cases, make sure your prompt has no stop words.
1050  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses.
1051  - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
- **tool_parsing_function** (<code>Callable\[\[str\], list\[ToolCall\] | None\] | None</code>) – A callable that takes a string and returns a list of ToolCall objects or None.
  If None, `default_tool_parser` is used, which extracts tool calls using a predefined pattern.
1054  - **async_executor** (<code>ThreadPoolExecutor | None</code>) – Optional ThreadPoolExecutor to use for async calls. If not provided, a single-threaded executor will be
  initialized and used.
1056  - **enable_thinking** (<code>bool</code>) – Whether to enable thinking mode in the chat template for thinking-capable models.
1057    When enabled, the model generates intermediate reasoning before the final response. Defaults to False.
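
The precedence rules above (values in `huggingface_pipeline_kwargs` override the `model`, `task`, `device`, and `token` init parameters) can be sketched as a simple merge. The parameter names come from the docstring; the merge function itself is illustrative, not the component's actual code:

```python
def resolve_pipeline_kwargs(model, task=None, huggingface_pipeline_kwargs=None):
    """Illustrative merge: explicit pipeline kwargs win over the init parameters."""
    resolved = {"model": model}
    if task is not None:
        resolved["task"] = task
    # In case of duplication, huggingface_pipeline_kwargs overrides the init parameters.
    resolved.update(huggingface_pipeline_kwargs or {})
    return resolved
```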
1058  
1059  #### shutdown
1060  
1061  ```python
1062  shutdown() -> None
1063  ```
1064  
Explicitly shut down the executor if this component owns it.
1066  
1067  #### warm_up
1068  
1069  ```python
1070  warm_up() -> None
1071  ```
1072  
1073  Initializes the component and warms up tools if provided.
1074  
1075  #### to_dict
1076  
1077  ```python
1078  to_dict() -> dict[str, Any]
1079  ```
1080  
1081  Serializes the component to a dictionary.
1082  
1083  **Returns:**
1084  
1085  - <code>dict\[str, Any\]</code> – Dictionary with serialized data.
1086  
1087  #### from_dict
1088  
1089  ```python
1090  from_dict(data: dict[str, Any]) -> HuggingFaceLocalChatGenerator
1091  ```
1092  
1093  Deserializes the component from a dictionary.
1094  
1095  **Parameters:**
1096  
1097  - **data** (<code>dict\[str, Any\]</code>) – The dictionary to deserialize from.
1098  
1099  **Returns:**
1100  
1101  - <code>HuggingFaceLocalChatGenerator</code> – The deserialized component.
1102  
1103  #### run
1104  
1105  ```python
1106  run(
1107      messages: list[ChatMessage],
1108      generation_kwargs: dict[str, Any] | None = None,
1109      streaming_callback: StreamingCallbackT | None = None,
1110      tools: ToolsType | None = None,
1111  ) -> dict[str, list[ChatMessage]]
1112  ```
1113  
1114  Invoke text generation inference based on the provided messages and generation parameters.
1115  
1116  **Parameters:**
1117  
1118  - **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage objects representing the input messages.
1119  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation.
1120  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses.
1121  - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
1122    If set, it will override the `tools` parameter provided during initialization.
1123  
1124  **Returns:**
1125  
1126  - <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following keys:
1127  - `replies`: A list containing the generated responses as ChatMessage instances.
1128  
1129  #### create_message
1130  
1131  ```python
1132  create_message(
1133      text: str,
1134      index: int,
1135      tokenizer: Union[PreTrainedTokenizer, PreTrainedTokenizerFast],
1136      prompt: str,
1137      generation_kwargs: dict[str, Any],
1138      parse_tool_calls: bool = False,
1139  ) -> ChatMessage
1140  ```
1141  
1142  Create a ChatMessage instance from the provided text, populated with metadata.
1143  
1144  **Parameters:**
1145  
1146  - **text** (<code>str</code>) – The generated text.
1147  - **index** (<code>int</code>) – The index of the generated text.
1148  - **tokenizer** (<code>Union\[PreTrainedTokenizer, PreTrainedTokenizerFast\]</code>) – The tokenizer used for generation.
1149  - **prompt** (<code>str</code>) – The prompt used for generation.
1150  - **generation_kwargs** (<code>dict\[str, Any\]</code>) – The generation parameters.
1151  - **parse_tool_calls** (<code>bool</code>) – Whether to attempt parsing tool calls from the text.
1152  
1153  **Returns:**
1154  
1155  - <code>ChatMessage</code> – A ChatMessage instance.
1156  
1157  #### run_async
1158  
1159  ```python
1160  run_async(
1161      messages: list[ChatMessage],
1162      generation_kwargs: dict[str, Any] | None = None,
1163      streaming_callback: StreamingCallbackT | None = None,
1164      tools: ToolsType | None = None,
1165  ) -> dict[str, list[ChatMessage]]
1166  ```
1167  
1168  Asynchronously invokes text generation inference based on the provided messages and generation parameters.
1169  
1170  This is the asynchronous version of the `run` method. It has the same parameters
and return values, but can be used with `await` in async code.
1172  
1173  **Parameters:**
1174  
1175  - **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage objects representing the input messages.
1176  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation.
1177  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses.
1178  - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
1179    If set, it will override the `tools` parameter provided during initialization.
1180  
1181  **Returns:**
1182  
1183  - <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following keys:
1184  - `replies`: A list containing the generated responses as ChatMessage instances.
1185  
1186  ## chat/llm
1187  
1188  ### LLM
1189  
1190  Bases: <code>Agent</code>
1191  
1192  A text generation component powered by a large language model.
1193  
1194  The LLM component is a simplified version of the Agent that focuses solely on text generation
1195  without tool usage. It processes messages and returns a single response from the language model.
1196  
1197  ### Usage examples
1198  
1199  ```python
1200  from haystack.components.generators.chat import LLM
1201  from haystack.components.generators.chat import OpenAIChatGenerator
1202  from haystack.dataclasses import ChatMessage
1203  
1204  llm = LLM(
1205      chat_generator=OpenAIChatGenerator(),
    system_prompt="You are a helpful summarization assistant.",
    user_prompt="""{% message role="user" %}
1208  Summarize the following document: {{ document }}
1209  {% endmessage %}""",
1210      required_variables=["document"],
1211  )
1212  
1213  result = llm.run(document="The weather is lovely today and the sun is shining. ")
1214  print(result["last_message"].text)
1215  ```
1216  
1217  #### __init__
1218  
1219  ```python
1220  __init__(
1221      *,
1222      chat_generator: ChatGenerator,
1223      system_prompt: str | None = None,
1224      user_prompt: str | None = None,
1225      required_variables: list[str] | Literal["*"] | None = None,
1226      streaming_callback: StreamingCallbackT | None = None
1227  ) -> None
1228  ```
1229  
1230  Initialize the LLM component.
1231  
1232  **Parameters:**
1233  
1234  - **chat_generator** (<code>ChatGenerator</code>) – An instance of the chat generator that the LLM should use.
1235  - **system_prompt** (<code>str | None</code>) – System prompt for the LLM.
- **user_prompt** (<code>str | None</code>) – User prompt for the LLM. If provided, this is appended to the messages provided at runtime.
- **required_variables** (<code>list\[str\] | Literal['\*'] | None</code>) – A list of variables that must be provided as input to `user_prompt`.
1238    If a variable listed as required is not provided, an exception is raised.
1239    If set to `"*"`, all variables found in the prompt are required. Optional.
1240  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback that will be invoked when a response is streamed from the LLM.
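
The required-variable check described above can be sketched like this. The function is illustrative only (Haystack's actual validation lives inside its prompt handling); the `{{ variable }}` syntax matches the template style used in the usage example:

```python
import re


def check_required_variables(prompt: str, required, provided: dict) -> None:
    """Raise if a required template variable is missing from the runtime inputs."""
    if required == "*":
        # With "*", every {{ variable }} found in the prompt is required.
        required = re.findall(r"\{\{\s*(\w+)\s*\}\}", prompt)
    missing = [name for name in (required or []) if name not in provided]
    if missing:
        raise ValueError(f"Missing required prompt variables: {missing}")
```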
1241  
1242  #### to_dict
1243  
1244  ```python
1245  to_dict() -> dict[str, Any]
1246  ```
1247  
1248  Serialize the LLM component to a dictionary.
1249  
1250  **Returns:**
1251  
1252  - <code>dict\[str, Any\]</code> – Dictionary with serialized data.
1253  
1254  #### from_dict
1255  
1256  ```python
1257  from_dict(data: dict[str, Any]) -> LLM
1258  ```
1259  
1260  Deserialize the LLM from a dictionary.
1261  
1262  **Parameters:**
1263  
1264  - **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from.
1265  
1266  **Returns:**
1267  
1268  - <code>LLM</code> – Deserialized LLM instance.
1269  
1270  #### run
1271  
1272  ```python
1273  run(
1274      messages: list[ChatMessage] | None = None,
1275      streaming_callback: StreamingCallbackT | None = None,
1276      *,
1277      generation_kwargs: dict[str, Any] | None = None,
1278      system_prompt: str | None = None,
1279      user_prompt: str | None = None,
1280      **kwargs: Any
1281  ) -> dict[str, Any]
1282  ```
1283  
1284  Process messages and generate a response from the language model.
1285  
1286  **Parameters:**
1287  
1288  - **messages** (<code>list\[ChatMessage\] | None</code>) – List of Haystack ChatMessage objects to process.
1289  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback that will be invoked when a response is streamed from the LLM.
1290  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for the underlying chat generator. These parameters
1291    will override the parameters passed during component initialization.
1292  - **system_prompt** (<code>str | None</code>) – System prompt for the LLM. If provided, it overrides the default system prompt.
1293  - **user_prompt** (<code>str | None</code>) – User prompt for the LLM. If provided, it overrides the default user prompt and is
1294    appended to the messages provided at runtime.
1295  - **kwargs** (<code>Any</code>) – Additional keyword arguments. These are used to fill template variables in the `user_prompt`
1296    (the keys must match template variable names).
1297  
1298  **Returns:**
1299  
1300  - <code>dict\[str, Any\]</code> – A dictionary with the following keys:
1301  - "messages": List of all messages exchanged during the LLM's run.
1302  - "last_message": The last message exchanged during the LLM's run.
1303  
1304  #### run_async
1305  
1306  ```python
1307  run_async(
1308      messages: list[ChatMessage] | None = None,
1309      streaming_callback: StreamingCallbackT | None = None,
1310      *,
1311      generation_kwargs: dict[str, Any] | None = None,
1312      system_prompt: str | None = None,
1313      user_prompt: str | None = None,
1314      **kwargs: Any
1315  ) -> dict[str, Any]
1316  ```
1317  
1318  Asynchronously process messages and generate a response from the language model.
1319  
1320  **Parameters:**
1321  
1322  - **messages** (<code>list\[ChatMessage\] | None</code>) – List of Haystack ChatMessage objects to process.
1323  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An asynchronous callback that will be invoked when a response is streamed
1324    from the LLM.
1325  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for the underlying chat generator. These parameters
1326    will override the parameters passed during component initialization.
1327  - **system_prompt** (<code>str | None</code>) – System prompt for the LLM. If provided, it overrides the default system prompt.
1328  - **user_prompt** (<code>str | None</code>) – User prompt for the LLM. If provided, it overrides the default user prompt and is
1329    appended to the messages provided at runtime.
1330  - **kwargs** (<code>Any</code>) – Additional keyword arguments. These are used to fill template variables in the `user_prompt`
1331    (the keys must match template variable names).
1332  
1333  **Returns:**
1334  
1335  - <code>dict\[str, Any\]</code> – A dictionary with the following keys:
1336  - "messages": List of all messages exchanged during the LLM's run.
1337  - "last_message": The last message exchanged during the LLM's run.
1338  
1339  ## chat/openai
1340  
1341  ### OpenAIChatGenerator
1342  
1343  Completes chats using OpenAI's large language models (LLMs).
1344  
It works with the gpt-4 and gpt-5 series models and supports streaming responses
from the OpenAI API. It uses the [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
format for input and output.
1348  
1349  You can customize how the text is generated by passing parameters to the
1350  OpenAI API. Use the `**generation_kwargs` argument when you initialize
1351  the component or when you run it. Any parameter that works with
1352  `openai.ChatCompletion.create` will work here too.
1353  
1354  For details on OpenAI API parameters, see
1355  [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).
1356  
1357  ### Usage example
1358  
1359  ```python
1360  from haystack.components.generators.chat import OpenAIChatGenerator
1361  from haystack.dataclasses import ChatMessage
1362  
1363  messages = [ChatMessage.from_user("What's Natural Language Processing?")]
1364  
1365  client = OpenAIChatGenerator()
1366  response = client.run(messages)
1367  print(response)
1368  ```
1369  
1370  Output:
1371  
1372  ```
1373  {'replies':
1374      [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=
1375      [TextContent(text="Natural Language Processing (NLP) is a branch of artificial intelligence
1376          that focuses on enabling computers to understand, interpret, and generate human language in
1377          a way that is meaningful and useful.")],
1378       _name=None,
1379       _meta={'model': 'gpt-5-mini', 'index': 0, 'finish_reason': 'stop',
1380       'usage': {'prompt_tokens': 15, 'completion_tokens': 36, 'total_tokens': 51}})
1381      ]
1382  }
1383  ```
1384  
1385  #### SUPPORTED_MODELS
1386  
1387  ```python
1388  SUPPORTED_MODELS: list[str] = [
1389      "gpt-5-mini",
1390      "gpt-5-nano",
1391      "gpt-5",
1392      "gpt-5.1",
1393      "gpt-5.2",
1394      "gpt-5.2-pro",
1395      "gpt-5.4",
1396      "gpt-5-pro",
1397      "gpt-4.1",
1398      "gpt-4.1-mini",
1399      "gpt-4.1-nano",
1400      "gpt-4o",
1401      "gpt-4o-mini",
1402      "gpt-4-turbo",
1403      "gpt-4",
1404      "gpt-3.5-turbo",
1405  ]
1406  
1407  ```
1408  
1409  A non-exhaustive list of chat models supported by this component.
See https://platform.openai.com/docs/models for the full list and snapshot IDs.
1411  
1412  #### __init__
1413  
1414  ```python
1415  __init__(
1416      api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
1417      model: str = "gpt-5-mini",
1418      streaming_callback: StreamingCallbackT | None = None,
1419      api_base_url: str | None = None,
1420      organization: str | None = None,
1421      generation_kwargs: dict[str, Any] | None = None,
1422      timeout: float | None = None,
1423      max_retries: int | None = None,
1424      tools: ToolsType | None = None,
1425      tools_strict: bool = False,
1426      http_client_kwargs: dict[str, Any] | None = None,
1427  ) -> None
1428  ```
1429  
Creates an instance of OpenAIChatGenerator. Unless specified otherwise in `model`, it uses OpenAI's gpt-5-mini.
1431  
1432  Before initializing the component, you can set the 'OPENAI_TIMEOUT' and 'OPENAI_MAX_RETRIES'
1433  environment variables to override the `timeout` and `max_retries` parameters respectively
1434  in the OpenAI client.
1435  
1436  **Parameters:**
1437  
1438  - **api_key** (<code>Secret</code>) – The OpenAI API key.
1439    You can set it with an environment variable `OPENAI_API_KEY`, or pass with this parameter
1440    during initialization.
1441  - **model** (<code>str</code>) – The name of the model to use.
1442  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
1443    The callback function accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
1444    as an argument.
1445  - **api_base_url** (<code>str | None</code>) – An optional base URL.
1446  - **organization** (<code>str | None</code>) – Your organization ID, defaults to `None`. See
1447    [production best practices](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
1448  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model. These parameters are sent directly to
1449    the OpenAI endpoint. See OpenAI [documentation](https://platform.openai.com/docs/api-reference/chat) for
1450    more details.
1451    Some of the supported parameters:
1452  - `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion,
1453    including visible output tokens and reasoning tokens.
1454  - `temperature`: What sampling temperature to use. Higher values mean the model will take more risks.
1455    Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
1456  - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
1457    considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
1458    comprising the top 10% probability mass are considered.
1459  - `n`: How many completions to generate for each prompt. For example, if the LLM gets 3 prompts and n is 2,
1460    it will generate two completions for each of the three prompts, ending up with 6 completions in total.
1461  - `stop`: One or more sequences after which the LLM should stop generating tokens.
- `presence_penalty`: The penalty to apply if a token has already appeared in the text at least once.
  Bigger values mean the model will be less likely to repeat the same token in the text.
- `frequency_penalty`: The penalty to apply each time a token is generated, proportional to how often it
  has already appeared in the text. Bigger values mean the model will be less likely to repeat the same token.
- `logit_bias`: Adds a logit bias to specific tokens. The keys of the dictionary are token IDs, and the
  values are the bias to add to those tokens.
1468  - `response_format`: A JSON schema or a Pydantic model that enforces the structure of the model's response.
1469    If provided, the output will always be validated against this
1470    format (unless the model returns a tool call).
1471    For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
1472    Notes:
  - This parameter accepts Pydantic models and JSON schemas for the latest models, starting from GPT-4o.
    Older models only support a basic version of structured outputs through `{"type": "json_object"}`.
    For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
1476    - For structured outputs with streaming,
1477      the `response_format` must be a JSON schema and not a Pydantic model.
1478  - **timeout** (<code>float | None</code>) – Timeout for OpenAI client calls. If not set, it defaults to either the
1479    `OPENAI_TIMEOUT` environment variable, or 30 seconds.
1480  - **max_retries** (<code>int | None</code>) – Maximum number of retries to contact OpenAI after an internal error.
  If not set, it defaults to either the `OPENAI_MAX_RETRIES` environment variable, or 5.
1482  - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
1483  - **tools_strict** (<code>bool</code>) – Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
1484    the schema provided in the `parameters` field of the tool definition, but this may increase latency.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
1486    For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).
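
As a sketch of the `response_format` options described above, a JSON-schema value (the form required when streaming) might look like the following; the schema name and fields are made up for illustration:

```python
# Hypothetical JSON schema for `response_format`; a Pydantic model can be used
# instead when streaming is not involved.
city_info_schema = {
    "type": "json_schema",
    "json_schema": {
        "name": "CityInfo",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "country": {"type": "string"},
            },
            "required": ["name", "country"],
            "additionalProperties": False,
        },
    },
}

# It would then be passed at init or run time, e.g.:
# client = OpenAIChatGenerator(generation_kwargs={"response_format": city_info_schema})
```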
1487  
1488  #### warm_up
1489  
1490  ```python
1491  warm_up() -> None
1492  ```
1493  
1494  Warm up the OpenAI chat generator.
1495  
1496  This will warm up the tools registered in the chat generator.
1497  This method is idempotent and will only warm up the tools once.
1498  
1499  #### to_dict
1500  
1501  ```python
1502  to_dict() -> dict[str, Any]
1503  ```
1504  
1505  Serialize this component to a dictionary.
1506  
1507  **Returns:**
1508  
1509  - <code>dict\[str, Any\]</code> – The serialized component as a dictionary.
1510  
1511  #### from_dict
1512  
1513  ```python
1514  from_dict(data: dict[str, Any]) -> OpenAIChatGenerator
1515  ```
1516  
1517  Deserialize this component from a dictionary.
1518  
1519  **Parameters:**
1520  
1521  - **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.
1522  
1523  **Returns:**
1524  
1525  - <code>OpenAIChatGenerator</code> – The deserialized component instance.
1526  
1527  #### run
1528  
1529  ```python
1530  run(
1531      messages: list[ChatMessage],
1532      streaming_callback: StreamingCallbackT | None = None,
1533      generation_kwargs: dict[str, Any] | None = None,
1534      *,
1535      tools: ToolsType | None = None,
1536      tools_strict: bool | None = None
1537  ) -> dict[str, list[ChatMessage]]
1538  ```
1539  
1540  Invokes chat completion based on the provided messages and generation parameters.
1541  
1542  **Parameters:**
1543  
1544  - **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage instances representing the input messages.
1545  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
1546  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. These parameters will
1547    override the parameters passed during component initialization.
1548    For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat/create).
1549  - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
1550    If set, it will override the `tools` parameter provided during initialization.
1551  - **tools_strict** (<code>bool | None</code>) – Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
1552    the schema provided in the `parameters` field of the tool definition, but this may increase latency.
1553    If set, it will override the `tools_strict` parameter set during component initialization.
1554  
1555  **Returns:**
1556  
1557  - <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following key:
1558  - `replies`: A list containing the generated responses as ChatMessage instances.
1559  
1560  #### run_async
1561  
1562  ```python
1563  run_async(
1564      messages: list[ChatMessage],
1565      streaming_callback: StreamingCallbackT | None = None,
1566      generation_kwargs: dict[str, Any] | None = None,
1567      *,
1568      tools: ToolsType | None = None,
1569      tools_strict: bool | None = None
1570  ) -> dict[str, list[ChatMessage]]
1571  ```
1572  
1573  Asynchronously invokes chat completion based on the provided messages and generation parameters.
1574  
1575  This is the asynchronous version of the `run` method. It has the same parameters and return values
1576  but can be used with `await` in async code.
1577  
1578  **Parameters:**
1579  
1580  - **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage instances representing the input messages.
1581  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
1582    Must be a coroutine.
1583  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. These parameters will
1584    override the parameters passed during component initialization.
1585    For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat/create).
1586  - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
1587    If set, it will override the `tools` parameter provided during initialization.
1588  - **tools_strict** (<code>bool | None</code>) – Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
1589    the schema provided in the `parameters` field of the tool definition, but this may increase latency.
1590    If set, it will override the `tools_strict` parameter set during component initialization.
1591  
1592  **Returns:**
1593  
1594  - <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following key:
1595  - `replies`: A list containing the generated responses as ChatMessage instances.
1596  
1597  ## chat/openai_responses
1598  
1599  ### OpenAIResponsesChatGenerator
1600  
1601  Completes chats using OpenAI's Responses API.
1602  
It works with the gpt-4 and o-series models and supports streaming responses
from the OpenAI API. It uses the [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
format for input and output.
1606  
1607  You can customize how the text is generated by passing parameters to the
1608  OpenAI API. Use the `**generation_kwargs` argument when you initialize
1609  the component or when you run it. Any parameter that works with
1610  `openai.Responses.create` will work here too.
1611  
1612  For details on OpenAI API parameters, see
1613  [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses).
1614  
1615  ### Usage example
1616  
1617  ```python
1618  from haystack.components.generators.chat import OpenAIResponsesChatGenerator
1619  from haystack.dataclasses import ChatMessage
1620  
1621  messages = [ChatMessage.from_user("What's Natural Language Processing?")]
1622  
1623  client = OpenAIResponsesChatGenerator(generation_kwargs={"reasoning": {"effort": "low", "summary": "auto"}})
1624  response = client.run(messages)
1625  print(response)
1626  ```
1627  
1628  #### SUPPORTED_MODELS
1629  
1630  ```python
1631  SUPPORTED_MODELS: list[str] = [
1632      "gpt-5-mini",
1633      "gpt-5-nano",
1634      "gpt-5",
1635      "gpt-5.1",
1636      "gpt-5.2",
1637      "gpt-5.2-pro",
1638      "gpt-5.4",
1639      "gpt-5-pro",
1640      "gpt-4.1",
1641      "gpt-4.1-mini",
1642      "gpt-4.1-nano",
1643      "gpt-4o",
1644      "gpt-4o-mini",
1645      "o1",
1646      "o1-mini",
1647      "o1-pro",
1648      "o3",
1649      "o3-mini",
1650      "o3-pro",
1651      "o4-mini",
1652  ]
1653  
1654  ```
1655  
1656  A non-exhaustive list of chat models supported by this component.
1657  See https://platform.openai.com/docs/models for the full list and snapshot IDs.
1658  
1659  #### __init__
1660  
1661  ```python
1662  __init__(
1663      *,
1664      api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
1665      model: str = "gpt-5-mini",
1666      streaming_callback: StreamingCallbackT | None = None,
1667      api_base_url: str | None = None,
1668      organization: str | None = None,
1669      generation_kwargs: dict[str, Any] | None = None,
1670      timeout: float | None = None,
1671      max_retries: int | None = None,
1672      tools: ToolsType | list[dict] | None = None,
1673      tools_strict: bool = False,
1674      http_client_kwargs: dict[str, Any] | None = None
1675  ) -> None
1676  ```
1677  
1678  Creates an instance of OpenAIResponsesChatGenerator. Uses OpenAI's gpt-5-mini by default.
1679  
1680  Before initializing the component, you can set the 'OPENAI_TIMEOUT' and 'OPENAI_MAX_RETRIES'
1681  environment variables to override the `timeout` and `max_retries` parameters respectively
1682  in the OpenAI client.
1683  
1684  **Parameters:**
1685  
1686  - **api_key** (<code>Secret</code>) – The OpenAI API key.
1687    You can set it with an environment variable `OPENAI_API_KEY`, or pass with this parameter
1688    during initialization.
1689  - **model** (<code>str</code>) – The name of the model to use.
1690  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
1691    The callback function accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
1692    as an argument.
1693  - **api_base_url** (<code>str | None</code>) – An optional base URL.
1694  - **organization** (<code>str | None</code>) – Your organization ID, defaults to `None`. See
1695    [production best practices](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
1696  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model. These parameters are sent
1697    directly to the OpenAI endpoint.
1698    See OpenAI [documentation](https://platform.openai.com/docs/api-reference/responses) for
1699    more details.
1700    Some of the supported parameters:
1701  - `temperature`: What sampling temperature to use. Higher values like 0.8 will make the output more random,
1702    while lower values like 0.2 will make it more focused and deterministic.
1703  - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
1704    considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
1705    comprising the top 10% probability mass are considered.
1706  - `previous_response_id`: The ID of the previous response.
1707    Use this to create multi-turn conversations.
1708  - `text_format`: A Pydantic model that enforces the structure of the model's response.
1709    If provided, the output will always be validated against this
1710    format (unless the model returns a tool call).
1711    For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
1712  - `text`: A JSON schema that enforces the structure of the model's response.
1713    If provided, the output will always be validated against this
1714    format (unless the model returns a tool call).
1715    Notes:
1716    - Both JSON Schema and Pydantic models are supported for latest models starting from GPT-4o.
  - If both are provided, `text_format` takes precedence and the JSON schema passed to `text` is ignored.
1718    - Currently, this component doesn't support streaming for structured outputs.
  - Older models only support a basic version of structured outputs through `{"type": "json_object"}`.
1720      For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
1721  - `reasoning`: A dictionary of parameters for reasoning. For example:
1722    - `summary`: The summary of the reasoning.
1723    - `effort`: The level of effort to put into the reasoning. Can be `low`, `medium` or `high`.
1724    - `generate_summary`: Whether to generate a summary of the reasoning.
    Note: OpenAI does not return the reasoning tokens, but the summary can be viewed if it is enabled.
1726      For details, see the [OpenAI Reasoning documentation](https://platform.openai.com/docs/guides/reasoning).
1727  - **timeout** (<code>float | None</code>) – Timeout for OpenAI client calls. If not set, it defaults to either the
1728    `OPENAI_TIMEOUT` environment variable, or 30 seconds.
1729  - **max_retries** (<code>int | None</code>) – Maximum number of retries to contact OpenAI after an internal error.
  If not set, it defaults to either the `OPENAI_MAX_RETRIES` environment variable, or 5.
- **tools** (<code>ToolsType | list\[dict\] | None</code>) – The tools that the model can use to prepare calls. This parameter accepts either a
  mixed list of Haystack `Tool` and `Toolset` objects, or a list of OpenAI/MCP tool definition
  dictionaries.
  Note: You cannot pass OpenAI/MCP tools and Haystack tools together.
1735    For details on tool support, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create#responses-create-tools).
- **tools_strict** (<code>bool</code>) – Whether to enable strict schema adherence for tool calls. If set to `False`, the model may not exactly
  follow the schema provided in the `parameters` field of the tool definition. In the Responses API, tool calls
  are strict by default.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
1740    For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).
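
For instance, the `reasoning` and `text` options described above could be combined into a single `generation_kwargs` dictionary. This is a sketch; the schema name and fields are illustrative:

```python
# Sketch of generation_kwargs for the Responses API.
generation_kwargs = {
    # Reasoning options: low effort, with an automatically generated summary.
    "reasoning": {"effort": "low", "summary": "auto"},
    # `text` takes a JSON schema; it is ignored if `text_format` (a Pydantic model) is also set.
    "text": {
        "format": {
            "type": "json_schema",
            "name": "ShortAnswer",
            "schema": {
                "type": "object",
                "properties": {"answer": {"type": "string"}},
                "required": ["answer"],
                "additionalProperties": False,
            },
        }
    },
}

# client = OpenAIResponsesChatGenerator(generation_kwargs=generation_kwargs)
```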
1741  
1742  #### warm_up
1743  
1744  ```python
1745  warm_up() -> None
1746  ```
1747  
1748  Warm up the OpenAI responses chat generator.
1749  
1750  This will warm up the tools registered in the chat generator.
1751  This method is idempotent and will only warm up the tools once.
1752  
1753  #### to_dict
1754  
1755  ```python
1756  to_dict() -> dict[str, Any]
1757  ```
1758  
1759  Serialize this component to a dictionary.
1760  
1761  **Returns:**
1762  
1763  - <code>dict\[str, Any\]</code> – The serialized component as a dictionary.
1764  
1765  #### from_dict
1766  
1767  ```python
1768  from_dict(data: dict[str, Any]) -> OpenAIResponsesChatGenerator
1769  ```
1770  
1771  Deserialize this component from a dictionary.
1772  
1773  **Parameters:**
1774  
1775  - **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.
1776  
1777  **Returns:**
1778  
1779  - <code>OpenAIResponsesChatGenerator</code> – The deserialized component instance.
1780  
1781  #### run
1782  
1783  ```python
1784  run(
1785      messages: list[ChatMessage],
1786      *,
1787      streaming_callback: StreamingCallbackT | None = None,
1788      generation_kwargs: dict[str, Any] | None = None,
1789      tools: ToolsType | list[dict] | None = None,
1790      tools_strict: bool | None = None
1791  ) -> dict[str, list[ChatMessage]]
1792  ```
1793  
1794  Invokes response generation based on the provided messages and generation parameters.
1795  
1796  **Parameters:**
1797  
1798  - **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage instances representing the input messages.
1799  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
1800  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. These parameters will
1801    override the parameters passed during component initialization.
1802    For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create).
- **tools** (<code>ToolsType | list\[dict\] | None</code>) – The tools that the model can use to prepare calls. If set, it will override the
  `tools` parameter set during component initialization. This parameter accepts either a
  mixed list of Haystack `Tool` and `Toolset` objects, or a list of OpenAI/MCP tool definition
  dictionaries.
  Note: You cannot pass OpenAI/MCP tools and Haystack tools together.
1808    For details on tool support, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create#responses-create-tools).
- **tools_strict** (<code>bool | None</code>) – Whether to enable strict schema adherence for tool calls. If set to `False`, the model may not exactly
  follow the schema provided in the `parameters` field of the tool definition. In the Responses API, tool calls
  are strict by default.
1812    If set, it will override the `tools_strict` parameter set during component initialization.
1813  
1814  **Returns:**
1815  
1816  - <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following key:
1817  - `replies`: A list containing the generated responses as ChatMessage instances.
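
Since `tools` also accepts raw OpenAI tool definitions, a function tool in the flat Responses API format might be sketched as follows; the function name and parameters are invented for illustration:

```python
# Hypothetical function-tool definition for the OpenAI Responses API.
get_weather_tool = {
    "type": "function",
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

# result = client.run(messages, tools=[get_weather_tool])
```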
1818  
1819  #### run_async
1820  
1821  ```python
1822  run_async(
1823      messages: list[ChatMessage],
1824      *,
1825      streaming_callback: StreamingCallbackT | None = None,
1826      generation_kwargs: dict[str, Any] | None = None,
1827      tools: ToolsType | list[dict] | None = None,
1828      tools_strict: bool | None = None
1829  ) -> dict[str, list[ChatMessage]]
1830  ```
1831  
1832  Asynchronously invokes response generation based on the provided messages and generation parameters.
1833  
1834  This is the asynchronous version of the `run` method. It has the same parameters and return values
1835  but can be used with `await` in async code.
1836  
1837  **Parameters:**
1838  
1839  - **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage instances representing the input messages.
1840  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
1841    Must be a coroutine.
1842  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. These parameters will
1843    override the parameters passed during component initialization.
1844    For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create).
- **tools** (<code>ToolsType | list\[dict\] | None</code>) – The tools that the model can use to prepare calls. If set, it will override the
  `tools` parameter set during component initialization. This parameter accepts either a
  mixed list of Haystack `Tool` and `Toolset` objects, or a list of OpenAI/MCP tool definition
  dictionaries.
  Note: You cannot pass OpenAI/MCP tools and Haystack tools together.
- **tools_strict** (<code>bool | None</code>) – Whether to enable strict schema adherence for tool calls. If set to `False`, the model may not exactly
  follow the schema provided in the `parameters` field of the tool definition. In the Responses API, tool calls
  are strict by default.
  If set, it will override the `tools_strict` parameter set during component initialization.
1853  
1854  **Returns:**
1855  
1856  - <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following key:
1857  - `replies`: A list containing the generated responses as ChatMessage instances.
1858  
1859  ## hugging_face_api
1860  
1861  ### HuggingFaceAPIGenerator
1862  
1863  Generates text using Hugging Face APIs.
1864  
1865  Use it with the following Hugging Face APIs:
1866  
1867  - [Paid Inference Endpoints](https://huggingface.co/inference-endpoints)
1868  - [Self-hosted Text Generation Inference](https://github.com/huggingface/text-generation-inference)
1869  
1870  **Note:** As of July 2025, the Hugging Face Inference API no longer offers generative models through the
1871  `text_generation` endpoint. Generative models are now only available through providers supporting the
1872  `chat_completion` endpoint. As a result, this component might no longer work with the Hugging Face Inference API.
1873  Use the `HuggingFaceAPIChatGenerator` component, which supports the `chat_completion` endpoint.
1874  
1875  ### Usage examples
1876  
1877  #### With Hugging Face Inference Endpoints
1878  
1879  <!-- test-ignore -->
1880  
1881  ```python
1882  from haystack.components.generators import HuggingFaceAPIGenerator
1883  from haystack.utils import Secret
1884  
1885  generator = HuggingFaceAPIGenerator(api_type="inference_endpoints",
1886                                      api_params={"url": "<your-inference-endpoint-url>"},
1887                                      token=Secret.from_token("<your-api-key>"))
1888  
1889  result = generator.run(prompt="What's Natural Language Processing?")
1890  print(result)
1891  ```
1892  
1893  #### With self-hosted text generation inference
1894  
1895  <!-- test-ignore -->
1896  
1897  ```python
1898  from haystack.components.generators import HuggingFaceAPIGenerator
1899  
1900  generator = HuggingFaceAPIGenerator(api_type="text_generation_inference",
1901                                      api_params={"url": "http://localhost:8080"})
1902  
1903  result = generator.run(prompt="What's Natural Language Processing?")
1904  print(result)
1905  ```
1906  
1907  #### With the free serverless inference API
1908  
Be aware that this example might not work, as the Hugging Face Inference API no longer offers models that
support the `text_generation` endpoint. Use the `HuggingFaceAPIChatGenerator` for generative models through
the `chat_completion` endpoint.
1912  
1913  <!-- test-ignore -->
1914  
1915  ```python
1916  from haystack.components.generators import HuggingFaceAPIGenerator
1917  from haystack.utils import Secret
1918  
1919  generator = HuggingFaceAPIGenerator(api_type="serverless_inference_api",
1920                                      api_params={"model": "HuggingFaceH4/zephyr-7b-beta"},
1921                                      token=Secret.from_token("<your-api-key>"))
1922  
1923  result = generator.run(prompt="What's Natural Language Processing?")
1924  print(result)
1925  ```
1926  
1927  #### __init__
1928  
1929  ```python
1930  __init__(
1931      api_type: HFGenerationAPIType | str,
1932      api_params: dict[str, str],
1933      token: Secret | None = Secret.from_env_var(
1934          ["HF_API_TOKEN", "HF_TOKEN"], strict=False
1935      ),
1936      generation_kwargs: dict[str, Any] | None = None,
1937      stop_words: list[str] | None = None,
1938      streaming_callback: StreamingCallbackT | None = None,
1939  ) -> None
1940  ```
1941  
1942  Initialize the HuggingFaceAPIGenerator instance.
1943  
1944  **Parameters:**
1945  
1946  - **api_type** (<code>HFGenerationAPIType | str</code>) – The type of Hugging Face API to use. Available types:
1947  - `text_generation_inference`: See [TGI](https://github.com/huggingface/text-generation-inference).
1948  - `inference_endpoints`: See [Inference Endpoints](https://huggingface.co/inference-endpoints).
1949  - `serverless_inference_api`: See [Serverless Inference API](https://huggingface.co/inference-api).
1950    This might no longer work due to changes in the models offered in the Hugging Face Inference API.
1951    Please use the `HuggingFaceAPIChatGenerator` component instead.
1952  - **api_params** (<code>dict\[str, str\]</code>) – A dictionary with the following keys:
1953  - `model`: Hugging Face model ID. Required when `api_type` is `SERVERLESS_INFERENCE_API`.
1954  - `url`: URL of the inference endpoint. Required when `api_type` is `INFERENCE_ENDPOINTS` or
1955    `TEXT_GENERATION_INFERENCE`.
- Other parameters specific to the chosen API type, such as `timeout`, `headers`, or `provider`.
1957  - **token** (<code>Secret | None</code>) – The Hugging Face token to use as HTTP bearer authorization.
1958    Check your HF token in your [account settings](https://huggingface.co/settings/tokens).
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary with keyword arguments to customize text generation. Some examples: `max_new_tokens`,
  `temperature`, `top_k`, `top_p`.
  For details, see the [Hugging Face documentation](https://huggingface.co/docs/huggingface_hub/en/package_reference/inference_client#huggingface_hub.InferenceClient.text_generation).
1963  - **stop_words** (<code>list\[str\] | None</code>) – An optional list of strings representing the stop words.
1964  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses.
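
To illustrate the parameters above, a `generation_kwargs` dictionary and stop-words list might look like this; the values are arbitrary examples:

```python
# Sketch: keyword arguments forwarded to the underlying text generation call.
generation_kwargs = {
    "max_new_tokens": 128,  # upper bound on the number of generated tokens
    "temperature": 0.7,     # sampling temperature
    "top_k": 50,            # sample only from the 50 most likely tokens
    "top_p": 0.95,          # nucleus sampling threshold
}
stop_words = ["###"]  # generation stops when this sequence is produced

# generator = HuggingFaceAPIGenerator(..., generation_kwargs=generation_kwargs, stop_words=stop_words)
```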
1965  
1966  #### to_dict
1967  
1968  ```python
1969  to_dict() -> dict[str, Any]
1970  ```
1971  
1972  Serialize this component to a dictionary.
1973  
1974  **Returns:**
1975  
1976  - <code>dict\[str, Any\]</code> – A dictionary containing the serialized component.
1977  
1978  #### from_dict
1979  
1980  ```python
1981  from_dict(data: dict[str, Any]) -> HuggingFaceAPIGenerator
1982  ```
1983  
1984  Deserialize this component from a dictionary.
1985  
1986  #### run
1987  
1988  ```python
1989  run(
1990      prompt: str,
1991      streaming_callback: StreamingCallbackT | None = None,
1992      generation_kwargs: dict[str, Any] | None = None,
1993  ) -> dict[str, Any]
1994  ```
1995  
1996  Invoke the text generation inference for the given prompt and generation parameters.
1997  
1998  **Parameters:**
1999  
2000  - **prompt** (<code>str</code>) – A string representing the prompt.
2001  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
2002  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation.
2003  
2004  **Returns:**
2005  
- <code>dict\[str, Any\]</code> – A dictionary with the generated replies and metadata. Both are lists of length n.
- replies: A list of strings representing the generated replies.
- meta: A list of dictionaries with the metadata for each reply.
2008  
2009  ## hugging_face_local
2010  
2011  ### HuggingFaceLocalGenerator
2012  
2013  Generates text using models from Hugging Face that run locally.
2014  
2015  LLMs running locally may need powerful hardware.
2016  
2017  ### Usage example
2018  
2019  ```python
2020  from haystack.components.generators import HuggingFaceLocalGenerator
2021  
2022  generator = HuggingFaceLocalGenerator(
2023      model="Qwen/Qwen3-0.6B",
2024      task="text-generation",
2025      generation_kwargs={"max_new_tokens": 100, "temperature": 0.9}
2026  )
2027  
2028  print(generator.run("Who is the best American actor?"))
2029  # >> {'replies': ['John Cusack']}
2030  ```
2031  
2032  #### __init__
2033  
2034  ```python
2035  __init__(
2036      model: str = "Qwen/Qwen3-0.6B",
2037      task: Literal["text-generation", "text2text-generation"] | None = None,
2038      device: ComponentDevice | None = None,
2039      token: Secret | None = Secret.from_env_var(
2040          ["HF_API_TOKEN", "HF_TOKEN"], strict=False
2041      ),
2042      generation_kwargs: dict[str, Any] | None = None,
2043      huggingface_pipeline_kwargs: dict[str, Any] | None = None,
2044      stop_words: list[str] | None = None,
2045      streaming_callback: StreamingCallbackT | None = None,
2046  ) -> None
2047  ```
2048  
2049  Creates an instance of a HuggingFaceLocalGenerator.
2050  
2051  **Parameters:**
2052  
2053  - **model** (<code>str</code>) – The Hugging Face text generation model name or path.
2054  - **task** (<code>Literal['text-generation', 'text2text-generation'] | None</code>) – The task for the Hugging Face pipeline. Possible options:
2055  - `text-generation`: Supported by decoder models, like GPT.
2056  - `text2text-generation`: Deprecated as of Transformers v5; use `text-generation` instead.
2057    Previously supported by encoder–decoder models such as T5.
2058    If the task is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
2059    If not specified, the component calls the Hugging Face API to infer the task from the model name.
2060  - **device** (<code>ComponentDevice | None</code>) – The device for loading the model. If `None`, automatically selects the default device.
2061    If a device or device map is specified in `huggingface_pipeline_kwargs`, it overrides this parameter.
2062  - **token** (<code>Secret | None</code>) – The token to use as HTTP bearer authorization for remote files.
2063    If the token is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
2064  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary with keyword arguments to customize text generation.
2065    Some examples: `max_length`, `max_new_tokens`, `temperature`, `top_k`, `top_p`.
2066    See Hugging Face's documentation for more information:
2067  - [customize-text-generation](https://huggingface.co/docs/transformers/main/en/generation_strategies#customize-text-generation)
2068  - [transformers.GenerationConfig](https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.GenerationConfig)
2069  - **huggingface_pipeline_kwargs** (<code>dict\[str, Any\] | None</code>) – Dictionary with keyword arguments to initialize the
2070    Hugging Face pipeline for text generation.
2071    These keyword arguments provide fine-grained control over the Hugging Face pipeline.
2072    In case of duplication, these kwargs override `model`, `task`, `device`, and `token` init parameters.
2073    For available kwargs, see [Hugging Face documentation](https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.pipeline.task).
2074    In this dictionary, you can also include `model_kwargs` to specify the kwargs for model initialization:
2075    [transformers.PreTrainedModel.from_pretrained](https://huggingface.co/docs/transformers/en/main_classes/model#transformers.PreTrainedModel.from_pretrained)
2076  - **stop_words** (<code>list\[str\] | None</code>) – If the model generates a stop word, the generation stops.
2077    If you provide this parameter, don't specify the `stopping_criteria` in `generation_kwargs`.
2078    For some chat models, the output includes both the new text and the original prompt.
2079    In these cases, make sure your prompt has no stop words.
2080  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses.
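To make the override behavior concrete, the following plain-Python sketch shows a `huggingface_pipeline_kwargs` dictionary with nested `model_kwargs`. The values are illustrative, not recommendations:

```python
# Illustrative huggingface_pipeline_kwargs. Keys given here override the
# matching `model`, `task`, `device`, and `token` init parameters.
huggingface_pipeline_kwargs = {
    "model": "Qwen/Qwen3-0.6B",
    "task": "text-generation",
    "device_map": "auto",
    # model_kwargs are forwarded to PreTrainedModel.from_pretrained
    "model_kwargs": {"low_cpu_mem_usage": True},
}

print(huggingface_pipeline_kwargs["task"])
```

Because `model` and `task` appear in this dictionary, the component would ignore the `model` and `task` init parameters in favor of these values.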
2081  
2082  #### warm_up
2083  
2084  ```python
2085  warm_up() -> None
2086  ```
2087  
2088  Initializes the component.
2089  
2090  #### to_dict
2091  
2092  ```python
2093  to_dict() -> dict[str, Any]
2094  ```
2095  
2096  Serializes the component to a dictionary.
2097  
2098  **Returns:**
2099  
2100  - <code>dict\[str, Any\]</code> – Dictionary with serialized data.
2101  
2102  #### from_dict
2103  
2104  ```python
2105  from_dict(data: dict[str, Any]) -> HuggingFaceLocalGenerator
2106  ```
2107  
2108  Deserializes the component from a dictionary.
2109  
2110  **Parameters:**
2111  
2112  - **data** (<code>dict\[str, Any\]</code>) – The dictionary to deserialize from.
2113  
2114  **Returns:**
2115  
2116  - <code>HuggingFaceLocalGenerator</code> – The deserialized component.
2117  
2118  #### run
2119  
2120  ```python
2121  run(
2122      prompt: str,
2123      streaming_callback: StreamingCallbackT | None = None,
2124      generation_kwargs: dict[str, Any] | None = None,
2125  ) -> dict[str, Any]
2126  ```
2127  
2128  Run the text generation model on the given prompt.
2129  
2130  **Parameters:**
2131  
2132  - **prompt** (<code>str</code>) – A string representing the prompt.
2133  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
2134  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation.
2135  
2136  **Returns:**
2137  
2138  - <code>dict\[str, Any\]</code> – A dictionary containing the generated replies.
2139  - replies: A list of strings representing the generated replies.
2140  
2141  ## openai
2142  
2143  ### OpenAIGenerator
2144  
2145  Generates text using OpenAI's large language models (LLMs).
2146  
It works with the gpt-4 and gpt-5 series models and supports streaming responses
from the OpenAI API. It uses strings as input and output.
2149  
2150  You can customize how the text is generated by passing parameters to the
2151  OpenAI API. Use the `**generation_kwargs` argument when you initialize
2152  the component or when you run it. Any parameter that works with
2153  `openai.ChatCompletion.create` will work here too.
2154  
2155  For details on OpenAI API parameters, see
2156  [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).
2157  
2158  ### Usage example
2159  
2160  ```python
2161  from haystack.components.generators import OpenAIGenerator
2162  client = OpenAIGenerator()
2163  response = client.run("What's Natural Language Processing? Be brief.")
2164  print(response)
2165  
2166  # >> {'replies': ['Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on
2167  # >> the interaction between computers and human language. It involves enabling computers to understand, interpret,
2168  # >> and respond to natural human language in a way that is both meaningful and useful.'], 'meta': [{'model':
2169  # >> 'gpt-5-mini', 'index': 0, 'finish_reason': 'stop', 'usage': {'prompt_tokens': 16,
2170  # >> 'completion_tokens': 49, 'total_tokens': 65}}]}
2171  ```
2172  
2173  #### __init__
2174  
2175  ```python
2176  __init__(
2177      api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
2178      model: str = "gpt-5-mini",
2179      streaming_callback: StreamingCallbackT | None = None,
2180      api_base_url: str | None = None,
2181      organization: str | None = None,
2182      system_prompt: str | None = None,
2183      generation_kwargs: dict[str, Any] | None = None,
2184      timeout: float | None = None,
2185      max_retries: int | None = None,
2186      http_client_kwargs: dict[str, Any] | None = None,
2187  ) -> None
2188  ```
2189  
Creates an instance of OpenAIGenerator. Unless specified otherwise in `model`, uses OpenAI's gpt-5-mini model.

By setting the `OPENAI_TIMEOUT` and `OPENAI_MAX_RETRIES` environment variables, you can change the `timeout`
and `max_retries` parameters in the OpenAI client.
2194  
2195  **Parameters:**
2196  
2197  - **api_key** (<code>Secret</code>) – The OpenAI API key to connect to OpenAI.
2198  - **model** (<code>str</code>) – The name of the model to use.
2199  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
2200    The callback function accepts StreamingChunk as an argument.
2201  - **api_base_url** (<code>str | None</code>) – An optional base URL.
2202  - **organization** (<code>str | None</code>) – The Organization ID, defaults to `None`.
2203  - **system_prompt** (<code>str | None</code>) – The system prompt to use for text generation. If not provided, the system prompt is
2204    omitted, and the default system prompt of the model is used.
2205  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model. These parameters are all sent directly to
2206    the OpenAI endpoint. See OpenAI [documentation](https://platform.openai.com/docs/api-reference/chat) for
2207    more details.
2208    Some of the supported parameters:
2209  - `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion,
2210    including visible output tokens and reasoning tokens.
2211  - `temperature`: What sampling temperature to use. Higher values mean the model will take more risks.
2212    Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
2213  - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
2214    considers the results of the tokens with top_p probability mass. So, 0.1 means only the tokens
2215    comprising the top 10% probability mass are considered.
2216  - `n`: How many completions to generate for each prompt. For example, if the LLM gets 3 prompts and n is 2,
2217    it will generate two completions for each of the three prompts, ending up with 6 completions in total.
2218  - `stop`: One or more sequences after which the LLM should stop generating tokens.
- `presence_penalty`: The penalty to apply if a token has already appeared in the text at all. Higher
  values make the model less likely to repeat the same token.
- `frequency_penalty`: The penalty to apply based on how often a token has already appeared in the
  generated text. Higher values make the model less likely to repeat the same token.
2223  - `logit_bias`: Add a logit bias to specific tokens. The keys of the dictionary are tokens, and the
2224    values are the bias to add to that token.
- **timeout** (<code>float | None</code>) – Timeout for OpenAI client calls. If not set, it is inferred from the `OPENAI_TIMEOUT` environment variable
  or set to 30.
- **max_retries** (<code>int | None</code>) – Maximum retries to establish contact with OpenAI if it returns an internal error. If not set, it is inferred
  from the `OPENAI_MAX_RETRIES` environment variable or set to 5.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
2230    For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).
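As a sketch of how the generation parameters listed above combine, here is a plain-Python `generation_kwargs` dictionary. The values are examples only, and the `logit_bias` token ID is arbitrary:

```python
# Illustrative generation_kwargs combining the parameters described above
# (values are examples, not tuned recommendations).
generation_kwargs = {
    "max_completion_tokens": 128,
    "temperature": 0.0,             # deterministic-leaning sampling
    "top_p": 1.0,
    "n": 1,
    "stop": ["\n\n"],               # stop at the first blank line
    "presence_penalty": 0.0,
    "frequency_penalty": 0.5,
    "logit_bias": {"50256": -100},  # token ID -> bias; the ID here is arbitrary
}

print(sorted(generation_kwargs))
```

A dictionary like this can be passed either at initialization or per `run()` call; values passed at run time take precedence.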
2231  
2232  #### to_dict
2233  
2234  ```python
2235  to_dict() -> dict[str, Any]
2236  ```
2237  
2238  Serialize this component to a dictionary.
2239  
2240  **Returns:**
2241  
2242  - <code>dict\[str, Any\]</code> – The serialized component as a dictionary.
2243  
2244  #### from_dict
2245  
2246  ```python
2247  from_dict(data: dict[str, Any]) -> OpenAIGenerator
2248  ```
2249  
2250  Deserialize this component from a dictionary.
2251  
2252  **Parameters:**
2253  
2254  - **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.
2255  
2256  **Returns:**
2257  
2258  - <code>OpenAIGenerator</code> – The deserialized component instance.
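As a rough sketch of the serialization round trip: Haystack components commonly serialize to a dictionary with `type` and `init_parameters` keys. The exact payload below is an assumption for illustration, not a guaranteed schema:

```python
# Hypothetical serialized payload, sketched from the usual
# {"type": ..., "init_parameters": ...} convention (an assumption,
# not a guaranteed schema).
serialized = {
    "type": "haystack.components.generators.openai.OpenAIGenerator",
    "init_parameters": {"model": "gpt-5-mini", "streaming_callback": None},
}

# from_dict conceptually reads the init parameters back out and
# reconstructs the component from them.
init_params = serialized["init_parameters"]
print(init_params["model"])
```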
2259  
2260  #### run
2261  
2262  ```python
2263  run(
2264      prompt: str,
2265      system_prompt: str | None = None,
2266      streaming_callback: StreamingCallbackT | None = None,
2267      generation_kwargs: dict[str, Any] | None = None,
2268  ) -> dict[str, list[str] | list[dict[str, Any]]]
2269  ```
2270  
Invoke the text generation inference based on the provided prompt and generation parameters.
2272  
2273  **Parameters:**
2274  
2275  - **prompt** (<code>str</code>) – The string prompt to use for text generation.
- **system_prompt** (<code>str | None</code>) – The system prompt to use for text generation. If omitted at run time, the system prompt
  defined at initialization time is used, if any.
2278  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
2279  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. These parameters will potentially override the parameters
2280    passed in the `__init__` method. For more details on the parameters supported by the OpenAI API, refer to
2281    the OpenAI [documentation](https://platform.openai.com/docs/api-reference/chat/create).
2282  
2283  **Returns:**
2284  
- <code>dict\[str, list\[str\] | list\[dict\[str, Any\]\]\]</code> – A dictionary with `replies`, a list of strings containing the generated responses, and `meta`,
  a list of dictionaries containing the metadata for each response.
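A hedged sketch of consuming that result shape; the `response` dictionary below is hard-coded to mimic the documented structure, not produced by a real API call:

```python
# Hard-coded stand-in for the dictionary run() returns, mirroring the
# replies/meta structure described above.
response = {
    "replies": ["Natural Language Processing (NLP) is ..."],
    "meta": [{"model": "gpt-5-mini", "index": 0, "finish_reason": "stop"}],
}

# replies and meta are parallel lists: one metadata dict per reply.
for reply, meta in zip(response["replies"], response["meta"]):
    print(f"[{meta['finish_reason']}] {reply}")
```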
2287  
2288  ## openai_dalle
2289  
2290  ### DALLEImageGenerator
2291  
2292  Generates images using OpenAI's DALL-E model.
2293  
2294  For details on OpenAI API parameters, see
2295  [OpenAI documentation](https://platform.openai.com/docs/api-reference/images/create).
2296  
2297  ### Usage example
2298  
2299  ```python
2300  from haystack.components.generators import DALLEImageGenerator
2301  image_generator = DALLEImageGenerator()
2302  response = image_generator.run("Show me a picture of a black cat.")
2303  print(response)
2304  ```
2305  
2306  #### __init__
2307  
2308  ```python
2309  __init__(
2310      model: str = "dall-e-3",
2311      quality: Literal["standard", "hd"] = "standard",
2312      size: Literal[
2313          "256x256", "512x512", "1024x1024", "1792x1024", "1024x1792"
2314      ] = "1024x1024",
2315      response_format: Literal["url", "b64_json"] = "url",
2316      api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
2317      api_base_url: str | None = None,
2318      organization: str | None = None,
2319      timeout: float | None = None,
2320      max_retries: int | None = None,
2321      http_client_kwargs: dict[str, Any] | None = None,
2322  ) -> None
2323  ```
2324  
2325  Creates an instance of DALLEImageGenerator. Unless specified otherwise in `model`, uses OpenAI's dall-e-3.
2326  
2327  **Parameters:**
2328  
2329  - **model** (<code>str</code>) – The model to use for image generation. Can be "dall-e-2" or "dall-e-3".
2330  - **quality** (<code>Literal['standard', 'hd']</code>) – The quality of the generated image. Can be "standard" or "hd".
2331  - **size** (<code>Literal['256x256', '512x512', '1024x1024', '1792x1024', '1024x1792']</code>) – The size of the generated images.
2332    Must be one of 256x256, 512x512, or 1024x1024 for dall-e-2.
2333    Must be one of 1024x1024, 1792x1024, or 1024x1792 for dall-e-3 models.
2334  - **response_format** (<code>Literal['url', 'b64_json']</code>) – The format of the response. Can be "url" or "b64_json".
2335  - **api_key** (<code>Secret</code>) – The OpenAI API key to connect to OpenAI.
2336  - **api_base_url** (<code>str | None</code>) – An optional base URL.
2337  - **organization** (<code>str | None</code>) – The Organization ID, defaults to `None`.
2338  - **timeout** (<code>float | None</code>) – Timeout for OpenAI Client calls. If not set, it is inferred from the `OPENAI_TIMEOUT` environment variable
2339    or set to 30.
2340  - **max_retries** (<code>int | None</code>) – Maximum retries to establish contact with OpenAI if it returns an internal error. If not set, it is inferred
2341    from the `OPENAI_MAX_RETRIES` environment variable or set to 5.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
2343    For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).
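The per-model size constraints above can be sketched as a small lookup. The helper name is hypothetical and not part of the Haystack API; it only restates the constraints documented for the `size` parameter:

```python
# Sketch of the size constraints listed above; is_valid_size is a
# hypothetical helper, not part of Haystack.
VALID_SIZES = {
    "dall-e-2": {"256x256", "512x512", "1024x1024"},
    "dall-e-3": {"1024x1024", "1792x1024", "1024x1792"},
}

def is_valid_size(model: str, size: str) -> bool:
    return size in VALID_SIZES.get(model, set())

print(is_valid_size("dall-e-3", "1792x1024"))  # True
print(is_valid_size("dall-e-2", "1792x1024"))  # False
```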
2344  
2345  #### warm_up
2346  
2347  ```python
2348  warm_up() -> None
2349  ```
2350  
2351  Warm up the OpenAI client.
2352  
2353  #### run
2354  
2355  ```python
2356  run(
2357      prompt: str,
2358      size: (
2359          Literal["256x256", "512x512", "1024x1024", "1792x1024", "1024x1792"]
2360          | None
2361      ) = None,
2362      quality: Literal["standard", "hd"] | None = None,
2363      response_format: Literal["url", "b64_json"] | None = None,
2364  ) -> dict[str, Any]
2365  ```
2366  
2367  Invokes the image generation inference based on the provided prompt and generation parameters.
2368  
2369  **Parameters:**
2370  
2371  - **prompt** (<code>str</code>) – The prompt to generate the image.
2372  - **size** (<code>Literal['256x256', '512x512', '1024x1024', '1792x1024', '1024x1792'] | None</code>) – If provided, overrides the size provided during initialization.
2373  - **quality** (<code>Literal['standard', 'hd'] | None</code>) – If provided, overrides the quality provided during initialization.
2374  - **response_format** (<code>Literal['url', 'b64_json'] | None</code>) – If provided, overrides the response format provided during initialization.
2375  
2376  **Returns:**
2377  
2378  - <code>dict\[str, Any\]</code> – A dictionary containing the generated list of images and the revised prompt.
2379    Depending on the `response_format` parameter, the list of images can be URLs or base64 encoded JSON strings.
2380    The revised prompt is the prompt that was used to generate the image, if there was any revision
2381    to the prompt made by OpenAI.
2382  
2383  #### to_dict
2384  
2385  ```python
2386  to_dict() -> dict[str, Any]
2387  ```
2388  
2389  Serialize this component to a dictionary.
2390  
2391  **Returns:**
2392  
2393  - <code>dict\[str, Any\]</code> – The serialized component as a dictionary.
2394  
2395  #### from_dict
2396  
2397  ```python
2398  from_dict(data: dict[str, Any]) -> DALLEImageGenerator
2399  ```
2400  
2401  Deserialize this component from a dictionary.
2402  
2403  **Parameters:**
2404  
2405  - **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.
2406  
2407  **Returns:**
2408  
2409  - <code>DALLEImageGenerator</code> – The deserialized component instance.
2410  
2411  ## utils
2412  
2413  ### print_streaming_chunk
2414  
2415  ```python
2416  print_streaming_chunk(chunk: StreamingChunk) -> None
2417  ```
2418  
2419  Callback function to handle and display streaming output chunks.
2420  
2421  This function processes a `StreamingChunk` object by:
2422  
2423  - Printing tool call metadata (if any), including function names and arguments, as they arrive.
2424  - Printing tool call results when available.
2425  - Printing the main content (e.g., text tokens) of the chunk as it is received.
2426  
2427  The function outputs data directly to stdout and flushes output buffers to ensure immediate display during
2428  streaming.
2429  
2430  **Parameters:**
2431  
2432  - **chunk** (<code>StreamingChunk</code>) – A chunk of streaming data containing content and optional metadata, such as tool calls and
2433    tool results.
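A custom streaming callback follows the same shape as `print_streaming_chunk`: it receives each chunk as it arrives and does something with its `content`. The sketch below uses a stand-in `FakeChunk` class that imitates only the `content` attribute of Haystack's `StreamingChunk`; it is not the real class:

```python
# Minimal stand-in for StreamingChunk, imitating only its `content` attribute.
class FakeChunk:
    def __init__(self, content: str) -> None:
        self.content = content

collected: list[str] = []

def collecting_callback(chunk: FakeChunk) -> None:
    # A real callback would receive a StreamingChunk as tokens arrive
    # from the stream; here we just accumulate the content.
    collected.append(chunk.content)

# Simulate a stream of three tokens arriving one at a time.
for token in ["Streaming ", "works ", "token by token."]:
    collecting_callback(FakeChunk(token))

print("".join(collected))  # Streaming works token by token.
```

A callback like this can be passed as `streaming_callback` at initialization or at run time in place of `print_streaming_chunk`.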