   1  ---
   2  title: "Generators"
   3  id: generators-api
   4  description: "Enables text generation using LLMs."
   5  slug: "/generators-api"
   6  ---
   7  
   8  
   9  ## azure
  10  
  11  ### AzureOpenAIGenerator
  12  
  13  Bases: <code>OpenAIGenerator</code>
  14  
  15  Generates text using OpenAI's large language models (LLMs).
  16  
   17  It works with gpt-4-type models and supports streaming responses
   18  from the OpenAI API.
  19  
  20  You can customize how the text is generated by passing parameters to the
  21  OpenAI API. Use the `**generation_kwargs` argument when you initialize
  22  the component or when you run it. Any parameter that works with
  23  `openai.ChatCompletion.create` will work here too.
  24  
  25  For details on OpenAI API parameters, see
  26  [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).
  27  
  28  ### Usage example
  29  
  30  <!-- test-ignore -->
  31  
  32  ```python
  33  from haystack.components.generators import AzureOpenAIGenerator
  34  from haystack.utils import Secret
  35  client = AzureOpenAIGenerator(
   36      azure_endpoint="<Your Azure endpoint, e.g. https://your-company.azure.openai.com/>",
   37      api_key=Secret.from_token("<your-api-key>"),
   38      azure_deployment="<your model name, e.g. gpt-4.1-mini>")
  39  response = client.run("What's Natural Language Processing? Be brief.")
  40  print(response)
  41  ```
  42  
  43  ```
  44  # >> {'replies': ['Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on
  45  # >> the interaction between computers and human language. It involves enabling computers to understand, interpret,
  46  # >> and respond to natural human language in a way that is both meaningful and useful.'], 'meta': [{'model':
  47  # >> 'gpt-4.1-mini', 'index': 0, 'finish_reason': 'stop', 'usage': {'prompt_tokens': 16,
  48  # >> 'completion_tokens': 49, 'total_tokens': 65}}]}
  49  ```
  50  
  51  #### __init__
  52  
  53  ```python
  54  __init__(
  55      azure_endpoint: str | None = None,
  56      api_version: str | None = "2024-12-01-preview",
  57      azure_deployment: str | None = "gpt-4.1-mini",
  58      api_key: Secret | None = Secret.from_env_var(
  59          "AZURE_OPENAI_API_KEY", strict=False
  60      ),
  61      azure_ad_token: Secret | None = Secret.from_env_var(
  62          "AZURE_OPENAI_AD_TOKEN", strict=False
  63      ),
  64      organization: str | None = None,
  65      streaming_callback: StreamingCallbackT | None = None,
  66      system_prompt: str | None = None,
  67      timeout: float | None = None,
  68      max_retries: int | None = None,
  69      http_client_kwargs: dict[str, Any] | None = None,
  70      generation_kwargs: dict[str, Any] | None = None,
  71      default_headers: dict[str, str] | None = None,
  72      *,
  73      azure_ad_token_provider: AzureADTokenProvider | None = None
  74  ) -> None
  75  ```
  76  
  77  Initialize the Azure OpenAI Generator.
  78  
  79  **Parameters:**
  80  
  81  - **azure_endpoint** (<code>str | None</code>) – The endpoint of the deployed model, for example `https://example-resource.azure.openai.com/`.
  82  - **api_version** (<code>str | None</code>) – The version of the API to use. Defaults to 2024-12-01-preview.
  83  - **azure_deployment** (<code>str | None</code>) – The deployment of the model, usually the model name.
  84  - **api_key** (<code>Secret | None</code>) – The API key to use for authentication.
  85  - **azure_ad_token** (<code>Secret | None</code>) – [Azure Active Directory token](https://www.microsoft.com/en-us/security/business/identity-access/microsoft-entra-id).
  86  - **organization** (<code>str | None</code>) – Your organization ID, defaults to `None`. For help, see
  87    [Setting up your organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
  88  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function called when a new token is received from the stream.
  89    It accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
  90    as an argument.
   91  - **system_prompt** (<code>str | None</code>) – The system prompt to use for text generation. If not provided,
   92    the system prompt is omitted and the model's default system prompt is used.
  93  - **timeout** (<code>float | None</code>) – Timeout for AzureOpenAI client. If not set, it is inferred from the
  94    `OPENAI_TIMEOUT` environment variable or set to 30.
  95  - **max_retries** (<code>int | None</code>) – Maximum retries to establish contact with AzureOpenAI if it returns an internal error.
  96    If not set, it is inferred from the `OPENAI_MAX_RETRIES` environment variable or set to 5.
   97  - **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
  98    For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).
  99  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model, sent directly to
 100    the OpenAI endpoint. See [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat) for
 101    more details.
 102    Some of the supported parameters:
  103    - `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion,
  104      including visible output tokens and reasoning tokens.
  105    - `temperature`: The sampling temperature to use. Higher values mean the model takes more risks.
  106      Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
  107    - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
  108      considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
  109      comprising the top 10% probability mass are considered.
  110    - `n`: The number of completions to generate for each prompt. For example, with 3 prompts and n=2,
  111      the LLM will generate two completions per prompt, resulting in 6 completions total.
  112    - `stop`: One or more sequences after which the LLM should stop generating tokens.
  113    - `presence_penalty`: The penalty applied if a token is already present.
  114      Higher values make the model less likely to repeat the token.
  115    - `frequency_penalty`: Penalty applied if a token has already been generated.
  116      Higher values make the model less likely to repeat the token.
  117    - `logit_bias`: Adds a logit bias to specific tokens. The keys of the dictionary are tokens, and the
  118      values are the bias to add to that token.
 119  - **default_headers** (<code>dict\[str, str\] | None</code>) – Default headers to use for the AzureOpenAI client.
  120  - **azure_ad_token_provider** (<code>AzureADTokenProvider | None</code>) – A function that returns an Azure Active Directory token; it is invoked on
  121    every request.
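The two places to pass `generation_kwargs` (at initialization and at run time) can be thought of as a plain dictionary merge, with run-time values taking precedence. The sketch below illustrates that assumed semantics with made-up values; it is not the component's actual code:

```python
# Hypothetical illustration: run-time generation_kwargs are assumed to
# override init-time ones via a simple dict merge.
init_kwargs = {"temperature": 0.2, "max_completion_tokens": 256}
run_kwargs = {"temperature": 0.9}

merged = {**init_kwargs, **(run_kwargs or {})}
print(merged)  # {'temperature': 0.9, 'max_completion_tokens': 256}
```

Any parameter accepted by the OpenAI chat endpoint can appear in either dictionary.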
 122  
 123  #### to_dict
 124  
 125  ```python
 126  to_dict() -> dict[str, Any]
 127  ```
 128  
 129  Serialize this component to a dictionary.
 130  
 131  **Returns:**
 132  
 133  - <code>dict\[str, Any\]</code> – The serialized component as a dictionary.
 134  
 135  #### from_dict
 136  
 137  ```python
 138  from_dict(data: dict[str, Any]) -> AzureOpenAIGenerator
 139  ```
 140  
 141  Deserialize this component from a dictionary.
 142  
 143  **Parameters:**
 144  
 145  - **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.
 146  
 147  **Returns:**
 148  
 149  - <code>AzureOpenAIGenerator</code> – The deserialized component instance.
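Together, `to_dict` and `from_dict` let you persist a configured generator, for example as JSON on disk. A minimal sketch, assuming Haystack's usual `{"type": ..., "init_parameters": {...}}` layout (the exact dictionary contents are an implementation detail and may differ):

```python
import json

# Hypothetical serialized form; the real to_dict() output may contain more fields.
serialized = {
    "type": "haystack.components.generators.azure.AzureOpenAIGenerator",
    "init_parameters": {"azure_deployment": "gpt-4.1-mini", "api_version": "2024-12-01-preview"},
}

# Round-trip through JSON, as you would when saving a configuration to disk.
restored = json.loads(json.dumps(serialized))
```

In practice you would pass the restored dictionary to `AzureOpenAIGenerator.from_dict` to rebuild the component.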
 150  
 151  ## chat/azure
 152  
 153  ### AzureOpenAIChatGenerator
 154  
 155  Bases: <code>OpenAIChatGenerator</code>
 156  
 157  Generates text using OpenAI's models on Azure.
 158  
  159  It works with gpt-4-type models and supports streaming responses
  160  from the OpenAI API. It uses [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
 161  format in input and output.
 162  
 163  You can customize how the text is generated by passing parameters to the
 164  OpenAI API. Use the `**generation_kwargs` argument when you initialize
 165  the component or when you run it. Any parameter that works with
 166  `openai.ChatCompletion.create` will work here too.
 167  
 168  For details on OpenAI API parameters, see
 169  [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).
 170  
 171  ### Usage example
 172  
 173  <!-- test-ignore -->
 174  
 175  ```python
 176  from haystack.components.generators.chat import AzureOpenAIChatGenerator
 177  from haystack.dataclasses import ChatMessage
 178  from haystack.utils import Secret
 179  
 180  messages = [ChatMessage.from_user("What's Natural Language Processing?")]
 181  
 182  client = AzureOpenAIChatGenerator(
  183      azure_endpoint="<Your Azure endpoint, e.g. https://your-company.azure.openai.com/>",
  184      api_key=Secret.from_token("<your-api-key>"),
  185      azure_deployment="<your model name, e.g. gpt-4.1-mini>")
 186  response = client.run(messages)
 187  print(response)
 188  ```
 189  
 190  ```
 191  {'replies':
 192      [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=[TextContent(text=
 193      "Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on
 194       enabling computers to understand, interpret, and generate human language in a way that is useful.")],
 195       _name=None,
 196       _meta={'model': 'gpt-4.1-mini', 'index': 0, 'finish_reason': 'stop',
 197       'usage': {'prompt_tokens': 15, 'completion_tokens': 36, 'total_tokens': 51}})]
 198  }
 199  ```
 200  
 201  #### SUPPORTED_MODELS
 202  
 203  ```python
 204  SUPPORTED_MODELS: list[str] = [
 205      "gpt-5.4",
 206      "gpt-5.4-pro",
 207      "gpt-5.3-codex",
 208      "gpt-5.2",
 209      "gpt-5.2-codex",
 210      "gpt-5.2-chat",
 211      "gpt-5.1",
 212      "gpt-5.1-chat",
 213      "gpt-5.1-codex",
 214      "gpt-5.1-codex-mini",
 215      "gpt-5",
 216      "gpt-5-mini",
 217      "gpt-5-nano",
 218      "gpt-5-chat",
 219      "gpt-4.1",
 220      "gpt-4.1-mini",
 221      "gpt-4.1-nano",
 222      "gpt-4o",
 223      "gpt-4o-mini",
 224      "gpt-4o-audio-preview",
 225      "gpt-realtime-1.5",
 226      "gpt-audio-1.5",
 227      "o1",
 228      "o1-mini",
 229      "o3",
 230      "o3-mini",
 231      "o4-mini",
 232      "codex-mini",
 233      "gpt-4",
 234      "gpt-35-turbo",
 235      "gpt-oss-120b",
 236      "computer-use-preview",
 237  ]
 238  
 239  ```
 240  
  241  A non-exhaustive list of chat models supported by this component. See the
  242  [Azure OpenAI model documentation](https://learn.microsoft.com/en-us/azure/foundry/foundry-models/concepts/models-sold-directly-by-azure)
  243  for the full list.
 244  
 245  #### __init__
 246  
 247  ```python
 248  __init__(
 249      azure_endpoint: str | None = None,
 250      api_version: str | None = "2024-12-01-preview",
 251      azure_deployment: str | None = "gpt-4.1-mini",
 252      api_key: Secret | None = Secret.from_env_var(
 253          "AZURE_OPENAI_API_KEY", strict=False
 254      ),
 255      azure_ad_token: Secret | None = Secret.from_env_var(
 256          "AZURE_OPENAI_AD_TOKEN", strict=False
 257      ),
 258      organization: str | None = None,
 259      streaming_callback: StreamingCallbackT | None = None,
 260      timeout: float | None = None,
 261      max_retries: int | None = None,
 262      generation_kwargs: dict[str, Any] | None = None,
 263      default_headers: dict[str, str] | None = None,
 264      tools: ToolsType | None = None,
 265      tools_strict: bool = False,
 266      *,
 267      azure_ad_token_provider: (
 268          AzureADTokenProvider | AsyncAzureADTokenProvider | None
 269      ) = None,
 270      http_client_kwargs: dict[str, Any] | None = None
 271  ) -> None
 272  ```
 273  
 274  Initialize the Azure OpenAI Chat Generator component.
 275  
 276  **Parameters:**
 277  
 278  - **azure_endpoint** (<code>str | None</code>) – The endpoint of the deployed model, for example `"https://example-resource.azure.openai.com/"`.
 279  - **api_version** (<code>str | None</code>) – The version of the API to use. Defaults to 2024-12-01-preview.
 280  - **azure_deployment** (<code>str | None</code>) – The deployment of the model, usually the model name.
 281  - **api_key** (<code>Secret | None</code>) – The API key to use for authentication.
 282  - **azure_ad_token** (<code>Secret | None</code>) – [Azure Active Directory token](https://www.microsoft.com/en-us/security/business/identity-access/microsoft-entra-id).
 283  - **organization** (<code>str | None</code>) – Your organization ID, defaults to `None`. For help, see
 284    [Setting up your organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
 285  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function called when a new token is received from the stream.
 286    It accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
 287    as an argument.
 288  - **timeout** (<code>float | None</code>) – Timeout for OpenAI client calls. If not set, it defaults to either the
 289    `OPENAI_TIMEOUT` environment variable, or 30 seconds.
  290  - **max_retries** (<code>int | None</code>) – Maximum number of retries to contact OpenAI after an internal error.
  291    If not set, it defaults to either the `OPENAI_MAX_RETRIES` environment variable or 5.
 292  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model. These parameters are sent directly to
 293    the OpenAI endpoint. For details, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).
 294    Some of the supported parameters:
  295    - `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion,
  296      including visible output tokens and reasoning tokens.
  297    - `temperature`: The sampling temperature to use. Higher values mean the model takes more risks.
  298      Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
  299    - `top_p`: Nucleus sampling is an alternative to sampling with temperature, where the model considers
  300      tokens with a top_p probability mass. For example, 0.1 means only the tokens comprising
  301      the top 10% probability mass are considered.
  302    - `n`: The number of completions to generate for each prompt. For example, with 3 prompts and n=2,
  303      the LLM will generate two completions per prompt, resulting in 6 completions total.
  304    - `stop`: One or more sequences after which the LLM should stop generating tokens.
  305    - `presence_penalty`: The penalty applied if a token is already present.
  306      Higher values make the model less likely to repeat the token.
  307    - `frequency_penalty`: Penalty applied if a token has already been generated.
  308      Higher values make the model less likely to repeat the token.
  309    - `logit_bias`: Adds a logit bias to specific tokens. The keys of the dictionary are tokens, and the
  310      values are the bias to add to that token.
  311    - `response_format`: A JSON schema or a Pydantic model that enforces the structure of the model's response.
  312      If provided, the output will always be validated against this
  313      format (unless the model returns a tool call).
  314      For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
  315      Notes:
  316      - This parameter accepts Pydantic models and JSON schemas for the latest models, starting from GPT-4o.
  317        Older models only support a basic version of structured outputs through `{"type": "json_object"}`.
  318        For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
  319      - For structured outputs with streaming,
  320        the `response_format` must be a JSON schema and not a Pydantic model.
 321  - **default_headers** (<code>dict\[str, str\] | None</code>) – Default headers to use for the AzureOpenAI client.
 322  - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
 323  - **tools_strict** (<code>bool</code>) – Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
 324    the schema provided in the `parameters` field of the tool definition, but this may increase latency.
  325  - **azure_ad_token_provider** (<code>AzureADTokenProvider | AsyncAzureADTokenProvider | None</code>) – A function that returns an Azure Active Directory token; it is invoked on
  326    every request.
  327  - **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
 328    For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).
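As an illustration of the `response_format` option described above, a JSON schema can be assembled as a plain dictionary and passed through `generation_kwargs`. This sketch follows OpenAI's structured-outputs layout; the schema name and fields are made up:

```python
# Hypothetical schema enforcing a {"name": ..., "age": ...} reply.
person_schema = {
    "type": "json_schema",
    "json_schema": {
        "name": "person",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
            "required": ["name", "age"],
            "additionalProperties": False,
        },
    },
}

# Passed at init or run time; see the generation_kwargs parameter above.
generation_kwargs = {"response_format": person_schema, "temperature": 0}
```

You would then supply `generation_kwargs=generation_kwargs` when initializing or running `AzureOpenAIChatGenerator`.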
 329  
 330  #### warm_up
 331  
 332  ```python
 333  warm_up() -> None
 334  ```
 335  
 336  Warm up the Azure OpenAI chat generator.
 337  
 338  This will warm up the tools registered in the chat generator.
 339  This method is idempotent and will only warm up the tools once.
 340  
 341  #### to_dict
 342  
 343  ```python
 344  to_dict() -> dict[str, Any]
 345  ```
 346  
 347  Serialize this component to a dictionary.
 348  
 349  **Returns:**
 350  
 351  - <code>dict\[str, Any\]</code> – The serialized component as a dictionary.
 352  
 353  #### from_dict
 354  
 355  ```python
 356  from_dict(data: dict[str, Any]) -> AzureOpenAIChatGenerator
 357  ```
 358  
 359  Deserialize this component from a dictionary.
 360  
 361  **Parameters:**
 362  
 363  - **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.
 364  
 365  **Returns:**
 366  
 367  - <code>AzureOpenAIChatGenerator</code> – The deserialized component instance.
 368  
 369  ## chat/azure_responses
 370  
 371  ### AzureOpenAIResponsesChatGenerator
 372  
 373  Bases: <code>OpenAIResponsesChatGenerator</code>
 374  
 375  Completes chats using OpenAI's Responses API on Azure.
 376  
  377  It works with gpt-5 and o-series models and supports streaming responses
  378  from the OpenAI API. It uses [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
  379  format in input and output.
 380  
 381  You can customize how the text is generated by passing parameters to the
 382  OpenAI API. Use the `**generation_kwargs` argument when you initialize
 383  the component or when you run it. Any parameter that works with
 384  `openai.Responses.create` will work here too.
 385  
 386  For details on OpenAI API parameters, see
 387  [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses).
 388  
 389  ### Usage example
 390  
 391  <!-- test-ignore -->
 392  
 393  ```python
 394  from haystack.components.generators.chat import AzureOpenAIResponsesChatGenerator
 395  from haystack.dataclasses import ChatMessage
 396  
 397  messages = [ChatMessage.from_user("What's Natural Language Processing?")]
 398  
 399  client = AzureOpenAIResponsesChatGenerator(
 400      azure_endpoint="https://example-resource.azure.openai.com/",
 401      generation_kwargs={"reasoning": {"effort": "low", "summary": "auto"}}
 402  )
 403  response = client.run(messages)
 404  print(response)
 405  ```
 406  
 407  #### SUPPORTED_MODELS
 408  
 409  ```python
 410  SUPPORTED_MODELS: list[str] = [
 411      "gpt-5.4-pro",
 412      "gpt-5.4",
 413      "gpt-5.3-chat",
 414      "gpt-5.3-codex",
 415      "gpt-5.2-codex",
 416      "gpt-5.2",
 417      "gpt-5.2-chat",
 418      "gpt-5.1-codex-max",
 419      "gpt-5.1",
 420      "gpt-5.1-chat",
 421      "gpt-5.1-codex",
 422      "gpt-5.1-codex-mini",
 423      "gpt-5-pro",
 424      "gpt-5-codex",
 425      "gpt-5",
 426      "gpt-5-mini",
 427      "gpt-5-nano",
 428      "gpt-5-chat",
 429      "gpt-4o",
 430      "gpt-4o-mini",
 431      "computer-use-preview",
 432      "gpt-4.1",
 433      "gpt-4.1-nano",
 434      "gpt-4.1-mini",
 435      "gpt-image-1",
 436      "gpt-image-1-mini",
 437      "gpt-image-1.5",
 438      "o1",
 439      "o3-mini",
 440      "o3",
 441      "o4-mini",
 442  ]
 443  
 444  ```
 445  
  446  A non-exhaustive list of chat models supported by this component. See the
  447  [Azure OpenAI Responses documentation](https://learn.microsoft.com/en-us/azure/foundry/openai/how-to/responses#model-support) for the full list.
 448  
 449  #### __init__
 450  
 451  ```python
 452  __init__(
 453      *,
 454      api_key: (
 455          Secret | Callable[[], str] | Callable[[], Awaitable[str]]
 456      ) = Secret.from_env_var("AZURE_OPENAI_API_KEY", strict=False),
 457      azure_endpoint: str | None = None,
 458      azure_deployment: str = "gpt-5-mini",
 459      streaming_callback: StreamingCallbackT | None = None,
 460      organization: str | None = None,
 461      generation_kwargs: dict[str, Any] | None = None,
 462      timeout: float | None = None,
 463      max_retries: int | None = None,
 464      tools: ToolsType | None = None,
 465      tools_strict: bool = False,
 466      http_client_kwargs: dict[str, Any] | None = None
 467  ) -> None
 468  ```
 469  
 470  Initialize the AzureOpenAIResponsesChatGenerator component.
 471  
 472  **Parameters:**
 473  
 474  - **api_key** (<code>Secret | Callable\[[], str\] | Callable\[[], Awaitable\[str\]\]</code>) – The API key to use for authentication. Can be:
  475    - A `Secret` object containing the API key.
  476    - A `Secret` object containing the [Azure Active Directory token](https://www.microsoft.com/en-us/security/business/identity-access/microsoft-entra-id).
  477    - A function that returns an Azure Active Directory token.
 478  - **azure_endpoint** (<code>str | None</code>) – The endpoint of the deployed model, for example `"https://example-resource.azure.openai.com/"`.
 479  - **azure_deployment** (<code>str</code>) – The deployment of the model, usually the model name.
 480  - **organization** (<code>str | None</code>) – Your organization ID, defaults to `None`. For help, see
 481    [Setting up your organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
 482  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function called when a new token is received from the stream.
 483    It accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
 484    as an argument.
 485  - **timeout** (<code>float | None</code>) – Timeout for OpenAI client calls. If not set, it defaults to either the
 486    `OPENAI_TIMEOUT` environment variable, or 30 seconds.
  487  - **max_retries** (<code>int | None</code>) – Maximum number of retries to contact OpenAI after an internal error.
  488    If not set, it defaults to either the `OPENAI_MAX_RETRIES` environment variable or 5.
 489  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model. These parameters are sent
 490    directly to the OpenAI endpoint.
 491    See OpenAI [documentation](https://platform.openai.com/docs/api-reference/responses) for
 492    more details.
 493    Some of the supported parameters:
  494    - `temperature`: What sampling temperature to use. Higher values like 0.8 will make the output more random,
  495      while lower values like 0.2 will make it more focused and deterministic.
  496    - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
  497      considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
  498      comprising the top 10% probability mass are considered.
  499    - `previous_response_id`: The ID of the previous response.
  500      Use this to create multi-turn conversations.
  501    - `text_format`: A Pydantic model that enforces the structure of the model's response.
  502      If provided, the output will always be validated against this
  503      format (unless the model returns a tool call).
  504      For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
  505    - `text`: A JSON schema that enforces the structure of the model's response.
  506      If provided, the output will always be validated against this
  507      format (unless the model returns a tool call).
  508      Notes:
  509      - Both JSON Schema and Pydantic models are supported for the latest models, starting from GPT-4o.
  510      - If both are provided, `text_format` takes precedence and the JSON schema passed to `text` is ignored.
  511      - Currently, this component doesn't support streaming for structured outputs.
  512      - Older models only support a basic version of structured outputs through `{"type": "json_object"}`.
  513        For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
  514    - `reasoning`: A dictionary of parameters for reasoning. For example:
  515      - `summary`: The summary of the reasoning.
  516      - `effort`: The level of effort to put into the reasoning. Can be `low`, `medium` or `high`.
  517      - `generate_summary`: Whether to generate a summary of the reasoning.
  518        Note: OpenAI does not return the reasoning tokens, but the summary can be viewed if it is enabled.
  519        For details, see the [OpenAI Reasoning documentation](https://platform.openai.com/docs/guides/reasoning).
 520  - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
 521  - **tools_strict** (<code>bool</code>) – Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
 522    the schema provided in the `parameters` field of the tool definition, but this may increase latency.
  523  - **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
 524    For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).
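To make the reasoning options above concrete, here is a sketch of a `generation_kwargs` dictionary that combines reasoning settings with a multi-turn `previous_response_id` (the ID value is a made-up placeholder):

```python
# Hypothetical generation_kwargs for the Responses API: low reasoning effort,
# an auto-generated reasoning summary, and continuation of an earlier response.
generation_kwargs = {
    "reasoning": {"effort": "low", "summary": "auto"},
    "previous_response_id": "resp_123",  # placeholder ID from a prior turn
}
```

Pass this dictionary when initializing or running `AzureOpenAIResponsesChatGenerator`, as shown in the usage example above.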
 525  
 526  #### to_dict
 527  
 528  ```python
 529  to_dict() -> dict[str, Any]
 530  ```
 531  
 532  Serialize this component to a dictionary.
 533  
 534  **Returns:**
 535  
 536  - <code>dict\[str, Any\]</code> – The serialized component as a dictionary.
 537  
 538  #### from_dict
 539  
 540  ```python
 541  from_dict(data: dict[str, Any]) -> AzureOpenAIResponsesChatGenerator
 542  ```
 543  
 544  Deserialize this component from a dictionary.
 545  
 546  **Parameters:**
 547  
 548  - **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.
 549  
 550  **Returns:**
 551  
 552  - <code>AzureOpenAIResponsesChatGenerator</code> – The deserialized component instance.
 553  
 554  ## chat/fallback
 555  
 556  ### FallbackChatGenerator
 557  
 558  A chat generator wrapper that tries multiple chat generators sequentially.
 559  
  560  It forwards all parameters transparently to the underlying chat generators, calling them in order
  561  until one succeeds, and returns the first successful result. Any exception raised by a generator triggers a fallback to the next one.
  562  If all chat generators fail, it raises a `RuntimeError` with details.
 563  
 564  Timeout enforcement is fully delegated to the underlying chat generators. The fallback mechanism will only
 565  work correctly if the underlying chat generators implement proper timeout handling and raise exceptions
 566  when timeouts occur. For predictable latency guarantees, ensure your chat generators:
 567  
 568  - Support a `timeout` parameter in their initialization
 569  - Implement timeout as total wall-clock time (shared deadline for both streaming and non-streaming)
 570  - Raise timeout exceptions (e.g., TimeoutError, asyncio.TimeoutError, httpx.TimeoutException) when exceeded
 571  
 572  Note: Most well-implemented chat generators (OpenAI, Anthropic, Cohere, etc.) support timeout parameters
 573  with consistent semantics. For HTTP-based LLM providers, a single timeout value (e.g., `timeout=30`)
 574  typically applies to all connection phases: connection setup, read, write, and pool. For streaming
 575  responses, read timeout is the maximum gap between chunks. For non-streaming, it's the time limit for
 576  receiving the complete response.
 577  
  578  Failover is automatically triggered when a generator raises any exception, including:
 579  
 580  - Timeout errors (if the generator implements and raises them)
 581  - Rate limit errors (429)
 582  - Authentication errors (401)
 583  - Context length errors (400)
 584  - Server errors (500+)
 585  - Any other exception
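The failover behavior described above can be sketched in plain Python. This is a simplified stand-in, not the component's actual implementation, and the helper names are made up:

```python
def run_with_fallback(chat_generators, prompt):
    """Try each generator in order; return the first success, raise if all fail."""
    failed = []
    for index, generate in enumerate(chat_generators):
        try:
            reply = generate(prompt)
            return {
                "replies": [reply],
                "meta": {"successful_index": index, "failed_generators": failed},
            }
        except Exception as exc:  # any exception triggers failover to the next generator
            failed.append(type(exc).__name__)
    raise RuntimeError(f"All generators failed: {failed}")


def flaky_generator(prompt):
    raise TimeoutError("simulated provider timeout")


def stable_generator(prompt):
    return f"echo: {prompt}"


result = run_with_fallback([flaky_generator, stable_generator], "hi")
print(result["meta"])  # {'successful_index': 1, 'failed_generators': ['TimeoutError']}
```

The real component reports similar metadata (successful generator index and class, total attempts, failed generators) in its `"meta"` output.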
 586  
 587  #### __init__
 588  
 589  ```python
 590  __init__(chat_generators: list[ChatGenerator]) -> None
 591  ```
 592  
 593  Creates an instance of FallbackChatGenerator.
 594  
 595  **Parameters:**
 596  
 597  - **chat_generators** (<code>list\[ChatGenerator\]</code>) – A non-empty list of chat generator components to try in order.
 598  
 599  #### to_dict
 600  
 601  ```python
 602  to_dict() -> dict[str, Any]
 603  ```
 604  
 605  Serialize the component, including nested chat generators when they support serialization.
 606  
 607  #### from_dict
 608  
 609  ```python
 610  from_dict(data: dict[str, Any]) -> FallbackChatGenerator
 611  ```
 612  
 613  Rebuild the component from a serialized representation, restoring nested chat generators.
 614  
 615  #### warm_up
 616  
 617  ```python
 618  warm_up() -> None
 619  ```
 620  
 621  Warm up all underlying chat generators.
 622  
 623  This method calls warm_up() on each underlying generator that supports it.
 624  
 625  #### run
 626  
 627  ```python
 628  run(
 629      messages: list[ChatMessage],
 630      generation_kwargs: dict[str, Any] | None = None,
 631      tools: ToolsType | None = None,
 632      streaming_callback: StreamingCallbackT | None = None,
 633  ) -> dict[str, list[ChatMessage] | dict[str, Any]]
 634  ```
 635  
 636  Execute chat generators sequentially until one succeeds.
 637  
 638  **Parameters:**
 639  
 640  - **messages** (<code>list\[ChatMessage\]</code>) – The conversation history as a list of ChatMessage instances.
 641  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Optional parameters for the chat generator (e.g., temperature, max_tokens).
 642  - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for function calling capabilities.
 643  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – Optional callable for handling streaming responses.
 644  
 645  **Returns:**
 646  
 647  - <code>dict\[str, list\[ChatMessage\] | dict\[str, Any\]\]</code> – A dictionary with:
 648  - "replies": Generated ChatMessage instances from the first successful generator.
 649  - "meta": Execution metadata including successful_chat_generator_index, successful_chat_generator_class,
 650    total_attempts, failed_chat_generators, plus any metadata from the successful generator.
 651  
 652  **Raises:**
 653  
 654  - <code>RuntimeError</code> – If all chat generators fail.
 655  
 656  #### run_async
 657  
 658  ```python
 659  run_async(
 660      messages: list[ChatMessage],
 661      generation_kwargs: dict[str, Any] | None = None,
 662      tools: ToolsType | None = None,
 663      streaming_callback: StreamingCallbackT | None = None,
 664  ) -> dict[str, list[ChatMessage] | dict[str, Any]]
 665  ```
 666  
 667  Asynchronously execute chat generators sequentially until one succeeds.
 668  
 669  **Parameters:**
 670  
 671  - **messages** (<code>list\[ChatMessage\]</code>) – The conversation history as a list of ChatMessage instances.
 672  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Optional parameters for the chat generator (e.g., temperature, max_tokens).
 673  - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for function calling capabilities.
 674  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – Optional callable for handling streaming responses.
 675  
 676  **Returns:**
 677  
 678  - <code>dict\[str, list\[ChatMessage\] | dict\[str, Any\]\]</code> – A dictionary with:
 679  - "replies": Generated ChatMessage instances from the first successful generator.
 680  - "meta": Execution metadata including successful_chat_generator_index, successful_chat_generator_class,
 681    total_attempts, failed_chat_generators, plus any metadata from the successful generator.
 682  
 683  **Raises:**
 684  
 685  - <code>RuntimeError</code> – If all chat generators fail.
 686  
 687  ## chat/hugging_face_api
 688  
 689  ### HuggingFaceAPIChatGenerator
 690  
 691  Completes chats using Hugging Face APIs.
 692  
 693  HuggingFaceAPIChatGenerator uses the [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
 694  format for input and output. Use it to generate text with Hugging Face APIs:
 695  
 696  - [Serverless Inference API (Inference Providers)](https://huggingface.co/docs/inference-providers)
 697  - [Paid Inference Endpoints](https://huggingface.co/inference-endpoints)
 698  - [Self-hosted Text Generation Inference](https://github.com/huggingface/text-generation-inference)
 699  
 700  ### Usage examples
 701  
 702  #### With the serverless inference API (Inference Providers) - free tier available
 703  
 704  <!-- test-ignore -->
 705  
 706  ```python
 707  from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
 708  from haystack.dataclasses import ChatMessage
 709  from haystack.utils import Secret
 710  from haystack.utils.hf import HFGenerationAPIType
 711  
 712  messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
 713              ChatMessage.from_user("What's Natural Language Processing?")]
 714  
 715  # the api_type can be expressed using the HFGenerationAPIType enum or as a string
 716  api_type = HFGenerationAPIType.SERVERLESS_INFERENCE_API
 717  api_type = "serverless_inference_api" # this is equivalent to the above
 718  
 719  generator = HuggingFaceAPIChatGenerator(api_type=api_type,
 720                                          api_params={"model": "Qwen/Qwen2.5-7B-Instruct",
 721                                                      "provider": "together"},
 722                                          token=Secret.from_token("<your-api-key>"))
 723  
 724  result = generator.run(messages)
 725  print(result)
 726  ```
 727  
 728  #### With the serverless inference API (Inference Providers) and text+image input
 729  
 730  <!-- test-ignore -->
 731  
 732  ```python
 733  from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
 734  from haystack.dataclasses import ChatMessage, ImageContent
 735  from haystack.utils import Secret
 736  from haystack.utils.hf import HFGenerationAPIType
 737  
 738  # Create an image from file path, URL, or base64
 739  image = ImageContent.from_file_path("path/to/your/image.jpg")
 740  
 741  # Create a multimodal message with both text and image
 742  messages = [ChatMessage.from_user(content_parts=["Describe this image in detail", image])]
 743  
 744  generator = HuggingFaceAPIChatGenerator(
 745      api_type=HFGenerationAPIType.SERVERLESS_INFERENCE_API,
 746      api_params={
 747          "model": "Qwen/Qwen2.5-VL-7B-Instruct",  # Vision Language Model
 748          "provider": "hyperbolic"
 749      },
 750      token=Secret.from_token("<your-api-key>")
 751  )
 752  
 753  result = generator.run(messages)
 754  print(result)
 755  ```
 756  
 757  #### With paid inference endpoints
 758  
 759  <!-- test-ignore -->
 760  
 761  ```python
 762  from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
 763  from haystack.dataclasses import ChatMessage
 764  from haystack.utils import Secret
 765  
 766  messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
 767              ChatMessage.from_user("What's Natural Language Processing?")]
 768  
 769  generator = HuggingFaceAPIChatGenerator(api_type="inference_endpoints",
 770                                          api_params={"url": "<your-inference-endpoint-url>"},
 771                                          token=Secret.from_token("<your-api-key>"))
 772  
 773  result = generator.run(messages)
 774  print(result)
 775  ```
 776  
 777  #### With self-hosted text generation inference
 778  
 779  <!-- test-ignore -->
 780  
 781  ```python
 782  from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
 783  from haystack.dataclasses import ChatMessage
 784  
 785  messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
 786              ChatMessage.from_user("What's Natural Language Processing?")]
 787  
 788  generator = HuggingFaceAPIChatGenerator(api_type="text_generation_inference",
 789                                          api_params={"url": "http://localhost:8080"})
 790  
 791  result = generator.run(messages)
 792  print(result)
 793  ```
 794  
 795  #### __init__
 796  
 797  ```python
 798  __init__(
 799      api_type: HFGenerationAPIType | str,
 800      api_params: dict[str, str],
 801      token: Secret | None = Secret.from_env_var(
 802          ["HF_API_TOKEN", "HF_TOKEN"], strict=False
 803      ),
 804      generation_kwargs: dict[str, Any] | None = None,
 805      stop_words: list[str] | None = None,
 806      streaming_callback: StreamingCallbackT | None = None,
 807      tools: ToolsType | None = None,
 808  ) -> None
 809  ```
 810  
 811  Initialize the HuggingFaceAPIChatGenerator instance.
 812  
 813  **Parameters:**
 814  
 815  - **api_type** (<code>HFGenerationAPIType | str</code>) – The type of Hugging Face API to use. Available types:
 816  - `text_generation_inference`: See [TGI](https://github.com/huggingface/text-generation-inference).
 817  - `inference_endpoints`: See [Inference Endpoints](https://huggingface.co/inference-endpoints).
 818  - `serverless_inference_api`: See
 819    [Serverless Inference API - Inference Providers](https://huggingface.co/docs/inference-providers).
 820  - **api_params** (<code>dict\[str, str\]</code>) – A dictionary with the following keys:
 821  - `model`: Hugging Face model ID. Required when `api_type` is `SERVERLESS_INFERENCE_API`.
 822  - `provider`: Provider name. Recommended when `api_type` is `SERVERLESS_INFERENCE_API`.
 823  - `url`: URL of the inference endpoint. Required when `api_type` is `INFERENCE_ENDPOINTS` or
 824    `TEXT_GENERATION_INFERENCE`.
 825  - Other parameters specific to the chosen API type, such as `timeout`, `headers`, etc.
 826  - **token** (<code>Secret | None</code>) – The Hugging Face token to use as HTTP bearer authorization.
 827    Check your HF token in your [account settings](https://huggingface.co/settings/tokens).
 828  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary with keyword arguments to customize text generation.
 829    Some examples: `max_tokens`, `temperature`, `top_p`.
 830    For details, see [Hugging Face chat_completion documentation](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.chat_completion).
 831  - **stop_words** (<code>list\[str\] | None</code>) – An optional list of strings representing the stop words.
 832  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses.
 833  - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
 834    The chosen model should support tool/function calling, according to the model card.
 835    Support for tools in the Hugging Face API and TGI is not yet fully refined and you may experience
 836    unexpected behavior.
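
The `api_params` requirements above can be sketched as a small validation helper. This is illustrative only, assuming the key rules exactly as documented; the real component performs its own checks:

```python
# Which api_params key is required for each api_type, per the documentation.
REQUIRED_KEYS = {
    "serverless_inference_api": "model",
    "inference_endpoints": "url",
    "text_generation_inference": "url",
}

def check_api_params(api_type: str, api_params: dict) -> None:
    """Raise ValueError if the required key for this api_type is missing."""
    required = REQUIRED_KEYS.get(api_type)
    if required is None:
        raise ValueError(f"Unknown api_type: {api_type!r}")
    if required not in api_params:
        raise ValueError(f"api_params must contain {required!r} when api_type is {api_type!r}")
```

For example, `serverless_inference_api` needs `model`, while both endpoint-based types need `url`; extra keys such as `provider` or `timeout` pass through untouched.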
 837  
 838  #### warm_up
 839  
 840  ```python
 841  warm_up() -> None
 842  ```
 843  
 844  Warm up the Hugging Face API chat generator.
 845  
 846  This will warm up the tools registered in the chat generator.
 847  This method is idempotent and will only warm up the tools once.
 848  
 849  #### to_dict
 850  
 851  ```python
 852  to_dict() -> dict[str, Any]
 853  ```
 854  
 855  Serialize this component to a dictionary.
 856  
 857  **Returns:**
 858  
 859  - <code>dict\[str, Any\]</code> – A dictionary containing the serialized component.
 860  
 861  #### from_dict
 862  
 863  ```python
 864  from_dict(data: dict[str, Any]) -> HuggingFaceAPIChatGenerator
 865  ```
 866  
 867  Deserialize this component from a dictionary.
 868  
 869  #### run
 870  
 871  ```python
 872  run(
 873      messages: list[ChatMessage],
 874      generation_kwargs: dict[str, Any] | None = None,
 875      tools: ToolsType | None = None,
 876      streaming_callback: StreamingCallbackT | None = None,
 877  ) -> dict[str, list[ChatMessage]]
 878  ```
 879  
 880  Invoke the text generation inference based on the provided messages and generation parameters.
 881  
 882  **Parameters:**
 883  
 884  - **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage objects representing the input messages.
 885  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation.
 886  - **tools** (<code>ToolsType | None</code>) – A list of tools or a Toolset for which the model can prepare calls. If set, it will override
 887    the `tools` parameter set during component initialization. This parameter can accept either a
 888    list of `Tool` objects or a `Toolset` instance.
 889  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses. If set, it will override the `streaming_callback`
 890    parameter set during component initialization.
 891  
 892  **Returns:**
 893  
 894  - <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following keys:
 895  - `replies`: A list containing the generated responses as ChatMessage objects.
 896  
 897  #### run_async
 898  
 899  ```python
 900  run_async(
 901      messages: list[ChatMessage],
 902      generation_kwargs: dict[str, Any] | None = None,
 903      tools: ToolsType | None = None,
 904      streaming_callback: StreamingCallbackT | None = None,
 905  ) -> dict[str, list[ChatMessage]]
 906  ```
 907  
Asynchronously invoke the text generation inference based on the provided messages and generation parameters.

This is the asynchronous version of the `run` method. It has the same parameters
and return values, but it can be used with `await` in async code.
 912  
 913  **Parameters:**
 914  
 915  - **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage objects representing the input messages.
 916  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation.
 917  - **tools** (<code>ToolsType | None</code>) – A list of tools or a Toolset for which the model can prepare calls. If set, it will override the `tools`
 918    parameter set during component initialization. This parameter can accept either a list of `Tool` objects
 919    or a `Toolset` instance.
 920  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses. If set, it will override the `streaming_callback`
 921    parameter set during component initialization.
 922  
 923  **Returns:**
 924  
 925  - <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following keys:
 926  - `replies`: A list containing the generated responses as ChatMessage objects.
 927  
 928  ## chat/hugging_face_local
 929  
 930  ### default_tool_parser
 931  
 932  ```python
 933  default_tool_parser(text: str) -> list[ToolCall] | None
 934  ```
 935  
 936  Default implementation for parsing tool calls from model output text.
 937  
 938  Uses DEFAULT_TOOL_PATTERN to extract tool calls.
 939  
 940  **Parameters:**
 941  
 942  - **text** (<code>str</code>) – The text to parse for tool calls.
 943  
 944  **Returns:**
 945  
 946  - <code>list\[ToolCall\] | None</code> – A list containing a single ToolCall if a valid tool call is found, None otherwise.
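
The parsing step can be sketched with a regex-based helper in the same spirit. The pattern below is an assumption for illustration; Haystack's actual `DEFAULT_TOOL_PATTERN` may differ, and the helper returns plain dicts rather than `ToolCall` objects:

```python
import json
import re

# Illustrative pattern: a JSON object containing "name" and "arguments" keys
# somewhere in the model output. Not Haystack's real DEFAULT_TOOL_PATTERN.
TOOL_PATTERN = re.compile(r"\{.*\"name\".*\"arguments\".*\}", re.DOTALL)

def parse_tool_call(text: str):
    """Return a single-element list for a valid tool call, or None."""
    match = TOOL_PATTERN.search(text)
    if not match:
        return None
    try:
        payload = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    if "name" not in payload:
        return None
    return [{"name": payload["name"], "arguments": payload.get("arguments", {})}]
```

Anything that fails to match or parse yields `None`, which matches the documented contract: one tool call on success, `None` otherwise.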
 947  
 948  ### HuggingFaceLocalChatGenerator
 949  
 950  Generates chat responses using models from Hugging Face that run locally.
 951  
 952  Use this component with chat-based models,
 953  such as `Qwen/Qwen3-0.6B` or `meta-llama/Llama-2-7b-chat-hf`.
 954  LLMs running locally may need powerful hardware.
 955  
 956  ### Usage example
 957  
 958  <!-- test-ignore -->
 959  
 960  ```python
 961  from haystack.components.generators.chat import HuggingFaceLocalChatGenerator
 962  from haystack.dataclasses import ChatMessage
 963  
 964  generator = HuggingFaceLocalChatGenerator(model="Qwen/Qwen3-0.6B")
 965  messages = [ChatMessage.from_user("What's Natural Language Processing? Be brief.")]
 966  print(generator.run(messages))
 967  ```
 968  
 969  ```
 970  {'replies':
 971      [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=[TextContent(text=
 972      "Natural Language Processing (NLP) is a subfield of artificial intelligence that deals
 973      with the interaction between computers and human language. It enables computers to understand, interpret, and
 974      generate human language in a valuable way. NLP involves various techniques such as speech recognition, text
 975      analysis, sentiment analysis, and machine translation. The ultimate goal is to make it easier for computers to
 976      process and derive meaning from human language, improving communication between humans and machines.")],
 977      _name=None,
    _meta={'finish_reason': 'stop', 'index': 0, 'model':
          'Qwen/Qwen3-0.6B',
 980            'usage': {'completion_tokens': 90, 'prompt_tokens': 19, 'total_tokens': 109}})
 981            ]
 982  }
 983  ```
 984  
 985  #### __init__
 986  
 987  ```python
 988  __init__(
 989      model: str = "Qwen/Qwen3-0.6B",
 990      task: (
 991          Literal["text-generation", "text2text-generation", "image-text-to-text"]
 992          | None
 993      ) = None,
 994      device: ComponentDevice | None = None,
 995      token: Secret | None = Secret.from_env_var(
 996          ["HF_API_TOKEN", "HF_TOKEN"], strict=False
 997      ),
 998      chat_template: str | None = None,
 999      generation_kwargs: dict[str, Any] | None = None,
1000      huggingface_pipeline_kwargs: dict[str, Any] | None = None,
1001      stop_words: list[str] | None = None,
1002      streaming_callback: StreamingCallbackT | None = None,
1003      tools: ToolsType | None = None,
1004      tool_parsing_function: Callable[[str], list[ToolCall] | None] | None = None,
1005      async_executor: ThreadPoolExecutor | None = None,
1006      *,
1007      enable_thinking: bool = False
1008  ) -> None
1009  ```
1010  
1011  Initializes the HuggingFaceLocalChatGenerator component.
1012  
1013  **Parameters:**
1014  
1015  - **model** (<code>str</code>) – The Hugging Face text generation model name or path,
1016    for example, `mistralai/Mistral-7B-Instruct-v0.2` or `TheBloke/OpenHermes-2.5-Mistral-7B-16k-AWQ`.
1017    The model must be a chat model supporting the ChatML messaging
1018    format.
1019    If the model is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
1020  - **task** (<code>Literal['text-generation', 'text2text-generation', 'image-text-to-text'] | None</code>) – The task for the Hugging Face pipeline. Possible options:
1021  - `text-generation`: Supported by decoder models, like GPT.
1022  - `text2text-generation`: Deprecated as of Transformers v5; use `text-generation` instead.
1023    Previously supported by encoder–decoder models such as T5.
1024  - `image-text-to-text`: Supported by vision-language models.
1025    If the task is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
1026    If not specified, the component calls the Hugging Face API to infer the task from the model name.
1027  - **device** (<code>ComponentDevice | None</code>) – The device for loading the model. If `None`, automatically selects the default device.
1028    If a device or device map is specified in `huggingface_pipeline_kwargs`, it overrides this parameter.
1029  - **token** (<code>Secret | None</code>) – The token to use as HTTP bearer authorization for remote files.
1030    If the token is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
1031  - **chat_template** (<code>str | None</code>) – Specifies an optional Jinja template for formatting chat
1032    messages. Most high-quality chat models have their own templates, but for models without this
1033    feature or if you prefer a custom template, use this parameter.
1034  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary with keyword arguments to customize text generation.
1035    Some examples: `max_length`, `max_new_tokens`, `temperature`, `top_k`, `top_p`.
1036    See Hugging Face's documentation for more information:
  - [customize-text-generation](https://huggingface.co/docs/transformers/main/en/generation_strategies#customize-text-generation)
  - [GenerationConfig](https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.GenerationConfig)
  The only `generation_kwargs` set by default is `max_new_tokens`, which is set to 512 tokens.
1040  - **huggingface_pipeline_kwargs** (<code>dict\[str, Any\] | None</code>) – Dictionary with keyword arguments to initialize the
1041    Hugging Face pipeline for text generation.
1042    These keyword arguments provide fine-grained control over the Hugging Face pipeline.
1043    In case of duplication, these kwargs override `model`, `task`, `device`, and `token` init parameters.
1044    For kwargs, see [Hugging Face documentation](https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.pipeline.task).
  In this dictionary, you can also include `model_kwargs` to specify the kwargs for [model initialization](https://huggingface.co/docs/transformers/en/main_classes/model#transformers.PreTrainedModel.from_pretrained).
1046  - **stop_words** (<code>list\[str\] | None</code>) – A list of stop words. If the model generates a stop word, the generation stops.
1047    If you provide this parameter, don't specify the `stopping_criteria` in `generation_kwargs`.
1048    For some chat models, the output includes both the new text and the original prompt.
1049    In these cases, make sure your prompt has no stop words.
1050  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses.
1051  - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
- **tool_parsing_function** (<code>Callable\[\[str\], list\[ToolCall\] | None\] | None</code>) – A callable that takes a string and returns a list of ToolCall objects or None.
1053    If None, the default_tool_parser will be used which extracts tool calls using a predefined pattern.
1054  - **async_executor** (<code>ThreadPoolExecutor | None</code>) – Optional ThreadPoolExecutor to use for async calls. If not provided, a single-threaded executor will be
  initialized and used.
1056  - **enable_thinking** (<code>bool</code>) – Whether to enable thinking mode in the chat template for thinking-capable models.
1057    When enabled, the model generates intermediate reasoning before the final response. Defaults to False.
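
The precedence rule above, where values in `huggingface_pipeline_kwargs` override the `model`, `task`, `device`, and `token` init parameters, amounts to a plain dictionary merge. A minimal sketch (illustrative, not Haystack internals):

```python
def resolve_pipeline_kwargs(model, task, device, token, huggingface_pipeline_kwargs=None):
    """Start from the explicit init parameters, then let any duplicate
    keys in huggingface_pipeline_kwargs win, as the documentation describes."""
    resolved = {"model": model, "task": task, "device": device, "token": token}
    resolved.update(huggingface_pipeline_kwargs or {})
    return resolved
```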
1058  
1059  #### shutdown
1060  
1061  ```python
1062  shutdown() -> None
1063  ```
1064  
Explicitly shut down the executor if this component owns it.
1066  
1067  #### warm_up
1068  
1069  ```python
1070  warm_up() -> None
1071  ```
1072  
1073  Initializes the component and warms up tools if provided.
1074  
1075  #### to_dict
1076  
1077  ```python
1078  to_dict() -> dict[str, Any]
1079  ```
1080  
1081  Serializes the component to a dictionary.
1082  
1083  **Returns:**
1084  
1085  - <code>dict\[str, Any\]</code> – Dictionary with serialized data.
1086  
1087  #### from_dict
1088  
1089  ```python
1090  from_dict(data: dict[str, Any]) -> HuggingFaceLocalChatGenerator
1091  ```
1092  
1093  Deserializes the component from a dictionary.
1094  
1095  **Parameters:**
1096  
1097  - **data** (<code>dict\[str, Any\]</code>) – The dictionary to deserialize from.
1098  
1099  **Returns:**
1100  
1101  - <code>HuggingFaceLocalChatGenerator</code> – The deserialized component.
1102  
1103  #### run
1104  
1105  ```python
1106  run(
1107      messages: list[ChatMessage],
1108      generation_kwargs: dict[str, Any] | None = None,
1109      streaming_callback: StreamingCallbackT | None = None,
1110      tools: ToolsType | None = None,
1111  ) -> dict[str, list[ChatMessage]]
1112  ```
1113  
1114  Invoke text generation inference based on the provided messages and generation parameters.
1115  
1116  **Parameters:**
1117  
1118  - **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage objects representing the input messages.
1119  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation.
1120  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses.
1121  - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
1122    If set, it will override the `tools` parameter provided during initialization.
1123  
1124  **Returns:**
1125  
1126  - <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following keys:
1127  - `replies`: A list containing the generated responses as ChatMessage instances.
1128  
1129  #### create_message
1130  
1131  ```python
1132  create_message(
1133      text: str,
1134      index: int,
1135      tokenizer: Union[PreTrainedTokenizer, PreTrainedTokenizerFast],
1136      prompt: str,
1137      generation_kwargs: dict[str, Any],
1138      parse_tool_calls: bool = False,
1139  ) -> ChatMessage
1140  ```
1141  
1142  Create a ChatMessage instance from the provided text, populated with metadata.
1143  
1144  **Parameters:**
1145  
1146  - **text** (<code>str</code>) – The generated text.
1147  - **index** (<code>int</code>) – The index of the generated text.
1148  - **tokenizer** (<code>Union\[PreTrainedTokenizer, PreTrainedTokenizerFast\]</code>) – The tokenizer used for generation.
1149  - **prompt** (<code>str</code>) – The prompt used for generation.
1150  - **generation_kwargs** (<code>dict\[str, Any\]</code>) – The generation parameters.
1151  - **parse_tool_calls** (<code>bool</code>) – Whether to attempt parsing tool calls from the text.
1152  
1153  **Returns:**
1154  
1155  - <code>ChatMessage</code> – A ChatMessage instance.
1156  
1157  #### run_async
1158  
1159  ```python
1160  run_async(
1161      messages: list[ChatMessage],
1162      generation_kwargs: dict[str, Any] | None = None,
1163      streaming_callback: StreamingCallbackT | None = None,
1164      tools: ToolsType | None = None,
1165  ) -> dict[str, list[ChatMessage]]
1166  ```
1167  
Asynchronously invoke text generation inference based on the provided messages and generation parameters.

This is the asynchronous version of the `run` method. It has the same parameters
and return values, but it can be used with `await` in async code.
1172  
1173  **Parameters:**
1174  
1175  - **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage objects representing the input messages.
1176  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation.
1177  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses.
1178  - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
1179    If set, it will override the `tools` parameter provided during initialization.
1180  
1181  **Returns:**
1182  
1183  - <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following keys:
1184  - `replies`: A list containing the generated responses as ChatMessage instances.
1185  
1186  ## chat/llm
1187  
1188  ### LLM
1189  
1190  Bases: <code>Agent</code>
1191  
1192  A text generation component powered by a large language model.
1193  
1194  The LLM component is a simplified version of the Agent that focuses solely on text generation
1195  without tool usage. It processes messages and returns a single response from the language model.
1196  
1197  ### Usage examples
1198  
1199  ```python
1200  from haystack.components.generators.chat import LLM
1201  from haystack.components.generators.chat import OpenAIChatGenerator
1202  from haystack.dataclasses import ChatMessage
1203  
1204  llm = LLM(
1205      chat_generator=OpenAIChatGenerator(),
    system_prompt="You are a helpful summarization assistant.",
    user_prompt="""{% message role="user" %}
1208  Summarize the following document: {{ document }}
1209  {% endmessage %}""",
1210      required_variables=["document"],
1211  )
1212  
1213  result = llm.run(document="The weather is lovely today and the sun is shining. ")
1214  print(result["last_message"].text)
1215  ```
1216  
1217  #### __init__
1218  
1219  ```python
1220  __init__(
1221      *,
1222      chat_generator: ChatGenerator,
1223      system_prompt: str | None = None,
1224      user_prompt: str,
1225      required_variables: list[str] | Literal["*"] = "*",
1226      streaming_callback: StreamingCallbackT | None = None
1227  ) -> None
1228  ```
1229  
1230  Initialize the LLM component.
1231  
1232  **Parameters:**
1233  
1234  - **chat_generator** (<code>ChatGenerator</code>) – An instance of the chat generator that the LLM should use.
1235  - **system_prompt** (<code>str | None</code>) – System prompt for the LLM.
1236  - **user_prompt** (<code>str</code>) – User prompt for the LLM. Must contain at least one Jinja2 template variable
1237    (e.g., `{{ variable_name }}`). This prompt is appended to the messages provided at runtime.
1238  - **required_variables** (<code>list\[str\] | Literal['\*']</code>) – Variables that must be provided as input to user_prompt.
1239    If a variable listed as required is not provided, an exception is raised.
1240    If set to `"*"`, all variables found in the prompt are required. Defaults to `"*"`.
1241  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback that will be invoked when a response is streamed from the LLM.
1242  
1243  **Raises:**
1244  
1245  - <code>ValueError</code> – If user_prompt contains no template variables.
1246  - <code>ValueError</code> – If required_variables is an empty list.
1247  
1248  #### to_dict
1249  
1250  ```python
1251  to_dict() -> dict[str, Any]
1252  ```
1253  
1254  Serialize the LLM component to a dictionary.
1255  
1256  **Returns:**
1257  
1258  - <code>dict\[str, Any\]</code> – Dictionary with serialized data.
1259  
1260  #### from_dict
1261  
1262  ```python
1263  from_dict(data: dict[str, Any]) -> LLM
1264  ```
1265  
1266  Deserialize the LLM from a dictionary.
1267  
1268  **Parameters:**
1269  
1270  - **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from.
1271  
1272  **Returns:**
1273  
1274  - <code>LLM</code> – Deserialized LLM instance.
1275  
1276  #### run
1277  
1278  ```python
1279  run(
1280      messages: list[ChatMessage] | None = None,
1281      streaming_callback: StreamingCallbackT | None = None,
1282      *,
1283      generation_kwargs: dict[str, Any] | None = None,
1284      system_prompt: str | None = None,
1285      user_prompt: str | None = None,
1286      **kwargs: Any
1287  ) -> dict[str, Any]
1288  ```
1289  
1290  Process messages and generate a response from the language model.
1291  
1292  **Parameters:**
1293  
1294  - **messages** (<code>list\[ChatMessage\] | None</code>) – List of Haystack ChatMessage objects to process.
1295  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback that will be invoked when a response is streamed from the LLM.
1296  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for the underlying chat generator. These parameters
1297    will override the parameters passed during component initialization.
1298  - **system_prompt** (<code>str | None</code>) – System prompt for the LLM. If provided, it overrides the default system prompt.
1299  - **user_prompt** (<code>str | None</code>) – User prompt for the LLM. If provided, it overrides the default user prompt and is
1300    appended to the messages provided at runtime.
1301  - **kwargs** (<code>Any</code>) – Additional keyword arguments. These are used to fill template variables in the `user_prompt`
1302    (the keys must match template variable names).
1303  
1304  **Returns:**
1305  
- <code>dict\[str, Any\]</code> – A dictionary with the following keys:
- "messages": List of all messages exchanged during the LLM's run.
- "last_message": The last message exchanged during the LLM's run.

#### run_async

```python
run_async(
    messages: list[ChatMessage] | None = None,
    streaming_callback: StreamingCallbackT | None = None,
    *,
    generation_kwargs: dict[str, Any] | None = None,
    system_prompt: str | None = None,
    user_prompt: str | None = None,
    **kwargs: Any
) -> dict[str, Any]
```

Asynchronously process messages and generate a response from the language model.

**Parameters:**

- **messages** (<code>list\[ChatMessage\] | None</code>) – List of Haystack ChatMessage objects to process.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – An asynchronous callback that will be invoked when a response is streamed
  from the LLM.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for the underlying chat generator. These parameters
  will override the parameters passed during component initialization.
- **system_prompt** (<code>str | None</code>) – System prompt for the LLM. If provided, it overrides the default system prompt.
- **user_prompt** (<code>str | None</code>) – User prompt for the LLM. If provided, it overrides the default user prompt and is
  appended to the messages provided at runtime.
- **kwargs** (<code>Any</code>) – Additional keyword arguments. These are used to fill template variables in the `user_prompt`
  (the keys must match template variable names).

**Returns:**

- <code>dict\[str, Any\]</code> – A dictionary with the following keys:
- "messages": List of all messages exchanged during the LLM's run.
- "last_message": The last message exchanged during the LLM's run.

## chat/openai

### OpenAIChatGenerator

Completes chats using OpenAI's large language models (LLMs).

It works with the gpt-4 and gpt-5 series models and supports streaming responses
from the OpenAI API. It uses [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
format in input and output.

You can customize how the text is generated by passing parameters to the
OpenAI API. Use the `**generation_kwargs` argument when you initialize
the component or when you run it. Any parameter that works with
`openai.ChatCompletion.create` will work here too.

For details on OpenAI API parameters, see
[OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).

### Usage example

```python
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage

messages = [ChatMessage.from_user("What's Natural Language Processing?")]

client = OpenAIChatGenerator()
response = client.run(messages)
print(response)
```

Output:

```
{'replies':
    [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=
    [TextContent(text="Natural Language Processing (NLP) is a branch of artificial intelligence
        that focuses on enabling computers to understand, interpret, and generate human language in
        a way that is meaningful and useful.")],
     _name=None,
     _meta={'model': 'gpt-5-mini', 'index': 0, 'finish_reason': 'stop',
     'usage': {'prompt_tokens': 15, 'completion_tokens': 36, 'total_tokens': 51}})
    ]
}
```

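
To stream tokens as they arrive instead of waiting for the full reply, pass a `streaming_callback`. This sketch uses Haystack's built-in `print_streaming_chunk` helper:

<!-- test-ignore -->

```python
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.components.generators.utils import print_streaming_chunk
from haystack.dataclasses import ChatMessage

# Each StreamingChunk is printed to stdout as soon as it arrives.
client = OpenAIChatGenerator(streaming_callback=print_streaming_chunk)
response = client.run([ChatMessage.from_user("What's Natural Language Processing? Be brief.")])
```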
#### SUPPORTED_MODELS

```python
SUPPORTED_MODELS: list[str] = [
    "gpt-5-mini",
    "gpt-5-nano",
    "gpt-5",
    "gpt-5.1",
    "gpt-5.2",
    "gpt-5.2-pro",
    "gpt-5.4",
    "gpt-5-pro",
    "gpt-4.1",
    "gpt-4.1-mini",
    "gpt-4.1-nano",
    "gpt-4o",
    "gpt-4o-mini",
    "gpt-4-turbo",
    "gpt-4",
    "gpt-3.5-turbo",
]
```
A non-exhaustive list of chat models supported by this component.
See https://platform.openai.com/docs/models for the full list and snapshot IDs.

#### __init__

```python
__init__(
    api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
    model: str = "gpt-5-mini",
    streaming_callback: StreamingCallbackT | None = None,
    api_base_url: str | None = None,
    organization: str | None = None,
    generation_kwargs: dict[str, Any] | None = None,
    timeout: float | None = None,
    max_retries: int | None = None,
    tools: ToolsType | None = None,
    tools_strict: bool = False,
    http_client_kwargs: dict[str, Any] | None = None,
) -> None
```

Creates an instance of OpenAIChatGenerator. Unless specified otherwise in `model`, uses OpenAI's gpt-5-mini.

Before initializing the component, you can set the 'OPENAI_TIMEOUT' and 'OPENAI_MAX_RETRIES'
environment variables to override the `timeout` and `max_retries` parameters respectively
in the OpenAI client.

**Parameters:**

- **api_key** (<code>Secret</code>) – The OpenAI API key.
  You can set it with the `OPENAI_API_KEY` environment variable or pass it with this parameter
  during initialization.
- **model** (<code>str</code>) – The name of the model to use.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
  The callback function accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
  as an argument.
- **api_base_url** (<code>str | None</code>) – An optional base URL.
- **organization** (<code>str | None</code>) – Your organization ID, defaults to `None`. See
  [production best practices](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model. These parameters are sent directly to
  the OpenAI endpoint. See OpenAI [documentation](https://platform.openai.com/docs/api-reference/chat) for
  more details.
  Some of the supported parameters:
- `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion,
  including visible output tokens and reasoning tokens.
- `temperature`: What sampling temperature to use. Higher values mean the model will take more risks.
  Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
- `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
  considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
  comprising the top 10% probability mass are considered.
- `n`: How many completions to generate for each prompt. For example, if the LLM gets 3 prompts and n is 2,
  it will generate two completions for each of the three prompts, ending up with 6 completions in total.
- `stop`: One or more sequences after which the LLM should stop generating tokens.
- `presence_penalty`: The penalty applied if a token is already present in the text. Higher values make
  the model less likely to repeat the same token.
- `frequency_penalty`: The penalty applied if a token has already been generated in the text. Higher
  values make the model less likely to repeat the same token.
- `logit_bias`: Add a logit bias to specific tokens. The keys of the dictionary are token IDs, and the
  values are the bias to add to each token.
- `response_format`: A JSON schema or a Pydantic model that enforces the structure of the model's response.
  If provided, the output will always be validated against this
  format (unless the model returns a tool call).
  For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
  Notes:
  - This parameter accepts Pydantic models and JSON schemas for the latest models, starting from GPT-4o.
    Older models only support a basic version of structured outputs through `{"type": "json_object"}`.
    For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
  - For structured outputs with streaming,
    the `response_format` must be a JSON schema and not a Pydantic model.
- **timeout** (<code>float | None</code>) – Timeout for OpenAI client calls. If not set, it defaults to either the
  `OPENAI_TIMEOUT` environment variable, or 30 seconds.
- **max_retries** (<code>int | None</code>) – Maximum number of retries to contact OpenAI after an internal error.
  If not set, it defaults to either the `OPENAI_MAX_RETRIES` environment variable, or 5.
- **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
- **tools_strict** (<code>bool</code>) – Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
  the schema provided in the `parameters` field of the tool definition, but this may increase latency.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
  For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).

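
As a sketch of structured outputs via `response_format` (the `CityInfo` model is a hypothetical example; this requires a model with structured-output support, such as gpt-4o or later):

<!-- test-ignore -->

```python
from pydantic import BaseModel

from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage

# Hypothetical schema, used only for illustration.
class CityInfo(BaseModel):
    city: str
    country: str

client = OpenAIChatGenerator(
    model="gpt-4o",
    generation_kwargs={"response_format": CityInfo},
)
response = client.run([ChatMessage.from_user("Where is the Eiffel Tower?")])
print(response["replies"][0].text)  # JSON text matching the CityInfo schema
```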
#### warm_up

```python
warm_up() -> None
```

Warm up the OpenAI chat generator.

This will warm up the tools registered in the chat generator.
This method is idempotent and will only warm up the tools once.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – The serialized component as a dictionary.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> OpenAIChatGenerator
```

Deserialize this component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.

**Returns:**

- <code>OpenAIChatGenerator</code> – The deserialized component instance.

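
A round trip through `to_dict`/`from_dict` can look like the sketch below. Note that token-based secrets cannot be serialized, so the component must use an environment-variable secret (the default) for `to_dict` to succeed:

<!-- test-ignore -->

```python
from haystack.components.generators.chat import OpenAIChatGenerator

# Assumes OPENAI_API_KEY is set in the environment.
client = OpenAIChatGenerator(model="gpt-4o-mini")
data = client.to_dict()  # plain dict, safe to store as JSON/YAML
restored = OpenAIChatGenerator.from_dict(data)
```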
#### run

```python
run(
    messages: list[ChatMessage],
    streaming_callback: StreamingCallbackT | None = None,
    generation_kwargs: dict[str, Any] | None = None,
    *,
    tools: ToolsType | None = None,
    tools_strict: bool | None = None
) -> dict[str, list[ChatMessage]]
```

Invokes chat completion based on the provided messages and generation parameters.

**Parameters:**

- **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage instances representing the input messages.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. These parameters will
  override the parameters passed during component initialization.
  For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat/create).
- **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
  If set, it will override the `tools` parameter provided during initialization.
- **tools_strict** (<code>bool | None</code>) – Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
  the schema provided in the `parameters` field of the tool definition, but this may increase latency.
  If set, it will override the `tools_strict` parameter set during component initialization.

**Returns:**

- <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following key:
- `replies`: A list containing the generated responses as ChatMessage instances.

#### run_async

```python
run_async(
    messages: list[ChatMessage],
    streaming_callback: StreamingCallbackT | None = None,
    generation_kwargs: dict[str, Any] | None = None,
    *,
    tools: ToolsType | None = None,
    tools_strict: bool | None = None
) -> dict[str, list[ChatMessage]]
```

Asynchronously invokes chat completion based on the provided messages and generation parameters.

This is the asynchronous version of the `run` method. It has the same parameters and return values
but can be used with `await` in async code.

**Parameters:**

- **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage instances representing the input messages.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
  Must be a coroutine.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. These parameters will
  override the parameters passed during component initialization.
  For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat/create).
- **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
  If set, it will override the `tools` parameter provided during initialization.
- **tools_strict** (<code>bool | None</code>) – Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
  the schema provided in the `parameters` field of the tool definition, but this may increase latency.
  If set, it will override the `tools_strict` parameter set during component initialization.

**Returns:**

- <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following key:
- `replies`: A list containing the generated responses as ChatMessage instances.

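
`run_async` is driven with `asyncio`; a minimal sketch:

<!-- test-ignore -->

```python
import asyncio

from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage

async def main() -> None:
    client = OpenAIChatGenerator()
    # Same parameters and return value as run(), but awaitable.
    response = await client.run_async([ChatMessage.from_user("What's NLP? Be brief.")])
    print(response["replies"][0].text)

asyncio.run(main())
```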
## chat/openai_responses

### OpenAIResponsesChatGenerator

Completes chats using OpenAI's Responses API.

It works with the gpt-4 and o-series models and supports streaming responses
from the OpenAI API. It uses [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
format in input and output.

You can customize how the text is generated by passing parameters to the
OpenAI API. Use the `**generation_kwargs` argument when you initialize
the component or when you run it. Any parameter that works with
`openai.Responses.create` will work here too.

For details on OpenAI API parameters, see
[OpenAI documentation](https://platform.openai.com/docs/api-reference/responses).

### Usage example

```python
from haystack.components.generators.chat import OpenAIResponsesChatGenerator
from haystack.dataclasses import ChatMessage

messages = [ChatMessage.from_user("What's Natural Language Processing?")]

client = OpenAIResponsesChatGenerator(generation_kwargs={"reasoning": {"effort": "low", "summary": "auto"}})
response = client.run(messages)
print(response)
```

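
Multi-turn conversations can reuse server-side context through `previous_response_id`. A hedged sketch follows; the exact key under which the response ID appears in the reply's `meta` dict may differ across versions, so check your installed version:

<!-- test-ignore -->

```python
from haystack.components.generators.chat import OpenAIResponsesChatGenerator
from haystack.dataclasses import ChatMessage

client = OpenAIResponsesChatGenerator()
first = client.run([ChatMessage.from_user("Pick a random city.")])

# Assumption: the response ID is exposed in the reply metadata under "id".
previous_id = first["replies"][0].meta.get("id")

# The follow-up turn references the previous response instead of resending it.
followup = client.run(
    [ChatMessage.from_user("What country is it in?")],
    generation_kwargs={"previous_response_id": previous_id},
)
```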
#### SUPPORTED_MODELS

```python
SUPPORTED_MODELS: list[str] = [
    "gpt-5-mini",
    "gpt-5-nano",
    "gpt-5",
    "gpt-5.1",
    "gpt-5.2",
    "gpt-5.2-pro",
    "gpt-5.4",
    "gpt-5-pro",
    "gpt-4.1",
    "gpt-4.1-mini",
    "gpt-4.1-nano",
    "gpt-4o",
    "gpt-4o-mini",
    "o1",
    "o1-mini",
    "o1-pro",
    "o3",
    "o3-mini",
    "o3-pro",
    "o4-mini",
]
```

A non-exhaustive list of chat models supported by this component.
See https://platform.openai.com/docs/models for the full list and snapshot IDs.

#### __init__

```python
__init__(
    *,
    api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
    model: str = "gpt-5-mini",
    streaming_callback: StreamingCallbackT | None = None,
    api_base_url: str | None = None,
    organization: str | None = None,
    generation_kwargs: dict[str, Any] | None = None,
    timeout: float | None = None,
    max_retries: int | None = None,
    tools: ToolsType | list[dict] | None = None,
    tools_strict: bool = False,
    http_client_kwargs: dict[str, Any] | None = None
) -> None
```

Creates an instance of OpenAIResponsesChatGenerator. Uses OpenAI's gpt-5-mini by default.

Before initializing the component, you can set the 'OPENAI_TIMEOUT' and 'OPENAI_MAX_RETRIES'
environment variables to override the `timeout` and `max_retries` parameters respectively
in the OpenAI client.

**Parameters:**

- **api_key** (<code>Secret</code>) – The OpenAI API key.
  You can set it with the `OPENAI_API_KEY` environment variable or pass it with this parameter
  during initialization.
- **model** (<code>str</code>) – The name of the model to use.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
  The callback function accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
  as an argument.
- **api_base_url** (<code>str | None</code>) – An optional base URL.
- **organization** (<code>str | None</code>) – Your organization ID, defaults to `None`. See
  [production best practices](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model. These parameters are sent
  directly to the OpenAI endpoint.
  See OpenAI [documentation](https://platform.openai.com/docs/api-reference/responses) for
  more details.
  Some of the supported parameters:
- `temperature`: What sampling temperature to use. Higher values like 0.8 will make the output more random,
  while lower values like 0.2 will make it more focused and deterministic.
- `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
  considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
  comprising the top 10% probability mass are considered.
- `previous_response_id`: The ID of the previous response.
  Use this to create multi-turn conversations.
- `text_format`: A Pydantic model that enforces the structure of the model's response.
  If provided, the output will always be validated against this
  format (unless the model returns a tool call).
  For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
- `text`: A JSON schema that enforces the structure of the model's response.
  If provided, the output will always be validated against this
  format (unless the model returns a tool call).
  Notes:
  - Both JSON schemas and Pydantic models are supported for the latest models, starting from GPT-4o.
  - If both are provided, `text_format` takes precedence and the JSON schema passed to `text` is ignored.
  - Currently, this component doesn't support streaming for structured outputs.
  - Older models only support a basic version of structured outputs through `{"type": "json_object"}`.
    For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
- `reasoning`: A dictionary of parameters for reasoning. For example:
  - `summary`: The summary of the reasoning.
  - `effort`: The level of effort to put into the reasoning. Can be `low`, `medium` or `high`.
  - `generate_summary`: Whether to generate a summary of the reasoning.
    Note: OpenAI does not return the reasoning tokens, but you can view the summary if it is enabled.
    For details, see the [OpenAI Reasoning documentation](https://platform.openai.com/docs/guides/reasoning).
- **timeout** (<code>float | None</code>) – Timeout for OpenAI client calls. If not set, it defaults to either the
  `OPENAI_TIMEOUT` environment variable, or 30 seconds.
- **max_retries** (<code>int | None</code>) – Maximum number of retries to contact OpenAI after an internal error.
  If not set, it defaults to either the `OPENAI_MAX_RETRIES` environment variable, or 5.
- **tools** (<code>ToolsType | list\[dict\] | None</code>) – The tools that the model can use to prepare calls. This parameter accepts either a
  mixed list of Haystack `Tool` objects and Haystack `Toolset` instances, or a list of dictionaries with
  OpenAI/MCP tool definitions.
  Note: You cannot pass OpenAI/MCP tools and Haystack tools together.
  For details on tool support, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create#responses-create-tools).
- **tools_strict** (<code>bool</code>) – Whether to enable strict schema adherence for tool calls. If set to `False`, the model may not exactly
  follow the schema provided in the `parameters` field of the tool definition. In the Responses API, tool calls
  are strict by default.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
  For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).

#### warm_up

```python
warm_up() -> None
```

Warm up the OpenAI responses chat generator.

This will warm up the tools registered in the chat generator.
This method is idempotent and will only warm up the tools once.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – The serialized component as a dictionary.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> OpenAIResponsesChatGenerator
```

Deserialize this component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.

**Returns:**

- <code>OpenAIResponsesChatGenerator</code> – The deserialized component instance.

#### run

```python
run(
    messages: list[ChatMessage],
    *,
    streaming_callback: StreamingCallbackT | None = None,
    generation_kwargs: dict[str, Any] | None = None,
    tools: ToolsType | list[dict] | None = None,
    tools_strict: bool | None = None
) -> dict[str, list[ChatMessage]]
```

Invokes response generation based on the provided messages and generation parameters.

**Parameters:**

- **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage instances representing the input messages.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. These parameters will
  override the parameters passed during component initialization.
  For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create).
- **tools** (<code>ToolsType | list\[dict\] | None</code>) – The tools that the model can use to prepare calls. If set, it will override the
  `tools` parameter set during component initialization. This parameter accepts either a
  mixed list of Haystack `Tool` objects and Haystack `Toolset` instances, or a list of dictionaries with
  OpenAI/MCP tool definitions.
  Note: You cannot pass OpenAI/MCP tools and Haystack tools together.
  For details on tool support, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create#responses-create-tools).
- **tools_strict** (<code>bool | None</code>) – Whether to enable strict schema adherence for tool calls. If set to `False`, the model may not exactly
  follow the schema provided in the `parameters` field of the tool definition. In the Responses API, tool calls
  are strict by default.
  If set, it will override the `tools_strict` parameter set during component initialization.

**Returns:**

- <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following key:
- `replies`: A list containing the generated responses as ChatMessage instances.

#### run_async

```python
run_async(
    messages: list[ChatMessage],
    *,
    streaming_callback: StreamingCallbackT | None = None,
    generation_kwargs: dict[str, Any] | None = None,
    tools: ToolsType | list[dict] | None = None,
    tools_strict: bool | None = None
) -> dict[str, list[ChatMessage]]
```

Asynchronously invokes response generation based on the provided messages and generation parameters.

This is the asynchronous version of the `run` method. It has the same parameters and return values
but can be used with `await` in async code.

**Parameters:**

- **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage instances representing the input messages.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
  Must be a coroutine.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. These parameters will
  override the parameters passed during component initialization.
  For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create).
- **tools** (<code>ToolsType | list\[dict\] | None</code>) – The tools that the model can use to prepare calls. If set, it will override the
  `tools` parameter set during component initialization. This parameter accepts either a
  mixed list of Haystack `Tool` objects and Haystack `Toolset` instances, or a list of dictionaries with
  OpenAI/MCP tool definitions.
  Note: You cannot pass OpenAI/MCP tools and Haystack tools together.
- **tools_strict** (<code>bool | None</code>) – Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
  the schema provided in the `parameters` field of the tool definition, but this may increase latency.
  If set, it will override the `tools_strict` parameter set during component initialization.

**Returns:**

- <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following key:
- `replies`: A list containing the generated responses as ChatMessage instances.

## hugging_face_api

### HuggingFaceAPIGenerator

Generates text using Hugging Face APIs.

Use it with the following Hugging Face APIs:

- [Paid Inference Endpoints](https://huggingface.co/inference-endpoints)
- [Self-hosted Text Generation Inference](https://github.com/huggingface/text-generation-inference)

**Note:** As of July 2025, the Hugging Face Inference API no longer offers generative models through the
`text_generation` endpoint. Generative models are now only available through providers supporting the
`chat_completion` endpoint. As a result, this component might no longer work with the Hugging Face Inference API.
Use the `HuggingFaceAPIChatGenerator` component, which supports the `chat_completion` endpoint.

### Usage examples

#### With Hugging Face Inference Endpoints

<!-- test-ignore -->

```python
from haystack.components.generators import HuggingFaceAPIGenerator
from haystack.utils import Secret

generator = HuggingFaceAPIGenerator(api_type="inference_endpoints",
                                    api_params={"url": "<your-inference-endpoint-url>"},
                                    token=Secret.from_token("<your-api-key>"))

result = generator.run(prompt="What's Natural Language Processing?")
print(result)
```

#### With self-hosted text generation inference

<!-- test-ignore -->

```python
from haystack.components.generators import HuggingFaceAPIGenerator

generator = HuggingFaceAPIGenerator(api_type="text_generation_inference",
                                    api_params={"url": "http://localhost:8080"})

result = generator.run(prompt="What's Natural Language Processing?")
print(result)
```

#### With the free serverless inference API

Be aware that this example might not work, as the Hugging Face Inference API no longer offers models that support the
`text_generation` endpoint. Use the `HuggingFaceAPIChatGenerator` for generative models through the
`chat_completion` endpoint.

<!-- test-ignore -->

```python
from haystack.components.generators import HuggingFaceAPIGenerator
from haystack.utils import Secret

generator = HuggingFaceAPIGenerator(api_type="serverless_inference_api",
                                    api_params={"model": "HuggingFaceH4/zephyr-7b-beta"},
                                    token=Secret.from_token("<your-api-key>"))

result = generator.run(prompt="What's Natural Language Processing?")
print(result)
```

1933  #### __init__
1934  
1935  ```python
1936  __init__(
1937      api_type: HFGenerationAPIType | str,
1938      api_params: dict[str, str],
1939      token: Secret | None = Secret.from_env_var(
1940          ["HF_API_TOKEN", "HF_TOKEN"], strict=False
1941      ),
1942      generation_kwargs: dict[str, Any] | None = None,
1943      stop_words: list[str] | None = None,
1944      streaming_callback: StreamingCallbackT | None = None,
1945  ) -> None
1946  ```
1947  
1948  Initialize the HuggingFaceAPIGenerator instance.
1949  
1950  **Parameters:**
1951  
1952  - **api_type** (<code>HFGenerationAPIType | str</code>) – The type of Hugging Face API to use. Available types:
1953  - `text_generation_inference`: See [TGI](https://github.com/huggingface/text-generation-inference).
1954  - `inference_endpoints`: See [Inference Endpoints](https://huggingface.co/inference-endpoints).
1955  - `serverless_inference_api`: See [Serverless Inference API](https://huggingface.co/inference-api).
1956    This might no longer work due to changes in the models offered in the Hugging Face Inference API.
1957    Please use the `HuggingFaceAPIChatGenerator` component instead.
1958  - **api_params** (<code>dict\[str, str\]</code>) – A dictionary with the following keys:
1959  - `model`: Hugging Face model ID. Required when `api_type` is `SERVERLESS_INFERENCE_API`.
1960  - `url`: URL of the inference endpoint. Required when `api_type` is `INFERENCE_ENDPOINTS` or
1961    `TEXT_GENERATION_INFERENCE`.
1962  - Other parameters specific to the chosen API type, such as `timeout`, `headers`, `provider` etc.
1963  - **token** (<code>Secret | None</code>) – The Hugging Face token to use as HTTP bearer authorization.
1964    Check your HF token in your [account settings](https://huggingface.co/settings/tokens).
1965  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary with keyword arguments to customize text generation. Some examples: `max_new_tokens`,
1966    `temperature`, `top_k`, `top_p`.
  For details, see the [Hugging Face documentation](https://huggingface.co/docs/huggingface_hub/en/package_reference/inference_client#huggingface_hub.InferenceClient.text_generation).
1969  - **stop_words** (<code>list\[str\] | None</code>) – An optional list of strings representing the stop words.
1970  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses.
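
The relationship between `api_type` and the required `api_params` key can be sketched in plain Python. The model ID and endpoint URL below are placeholders, not real resources, and the helper is purely illustrative, not Haystack's implementation:

```python
def required_api_params_key(api_type: str) -> str:
    """Return which api_params key the given API type requires."""
    if api_type == "serverless_inference_api":
        return "model"  # Hugging Face model ID
    # inference_endpoints and text_generation_inference need an endpoint URL
    return "url"

serverless_params = {"model": "mistralai/Mistral-7B-Instruct-v0.2"}
endpoint_params = {"url": "https://my-endpoint.endpoints.huggingface.cloud"}

assert required_api_params_key("serverless_inference_api") in serverless_params
assert required_api_params_key("inference_endpoints") in endpoint_params
```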
1971  
1972  #### to_dict
1973  
1974  ```python
1975  to_dict() -> dict[str, Any]
1976  ```
1977  
1978  Serialize this component to a dictionary.
1979  
1980  **Returns:**
1981  
1982  - <code>dict\[str, Any\]</code> – A dictionary containing the serialized component.
1983  
1984  #### from_dict
1985  
1986  ```python
1987  from_dict(data: dict[str, Any]) -> HuggingFaceAPIGenerator
1988  ```
1989  
1990  Deserialize this component from a dictionary.
1991  
1992  #### run
1993  
1994  ```python
1995  run(
1996      prompt: str,
1997      streaming_callback: StreamingCallbackT | None = None,
1998      generation_kwargs: dict[str, Any] | None = None,
1999  ) -> dict[str, Any]
2000  ```
2001  
2002  Invoke the text generation inference for the given prompt and generation parameters.
2003  
2004  **Parameters:**
2005  
2006  - **prompt** (<code>str</code>) – A string representing the prompt.
2007  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
2008  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation.
2009  
2010  **Returns:**
2011  
- <code>dict\[str, Any\]</code> – A dictionary with the generated replies and metadata. Both are lists of length n.
- replies: A list of strings representing the generated replies.
- meta: A list of dictionaries containing the metadata for each reply.
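
The return shape described above can be illustrated with a hand-written dictionary; the reply text, model name, and metadata values are placeholders:

```python
# Illustrative run() output for a single completion (n = 1); keys follow
# the description above, values are placeholders.
result = {
    "replies": ["Natural Language Processing is a field of AI ..."],
    "meta": [{"model": "my-model", "finish_reason": "stop"}],
}

# replies and meta are parallel lists of length n
assert len(result["replies"]) == len(result["meta"])
```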
2014  
2015  ## hugging_face_local
2016  
2017  ### HuggingFaceLocalGenerator
2018  
2019  Generates text using models from Hugging Face that run locally.
2020  
2021  LLMs running locally may need powerful hardware.
2022  
2023  ### Usage example
2024  
2025  ```python
2026  from haystack.components.generators import HuggingFaceLocalGenerator
2027  
2028  generator = HuggingFaceLocalGenerator(
2029      model="Qwen/Qwen3-0.6B",
2030      task="text-generation",
2031      generation_kwargs={"max_new_tokens": 100, "temperature": 0.9}
2032  )
2033  
2034  print(generator.run("Who is the best American actor?"))
2035  # >> {'replies': ['John Cusack']}
2036  ```
2037  
2038  #### __init__
2039  
2040  ```python
2041  __init__(
2042      model: str = "Qwen/Qwen3-0.6B",
2043      task: Literal["text-generation", "text2text-generation"] | None = None,
2044      device: ComponentDevice | None = None,
2045      token: Secret | None = Secret.from_env_var(
2046          ["HF_API_TOKEN", "HF_TOKEN"], strict=False
2047      ),
2048      generation_kwargs: dict[str, Any] | None = None,
2049      huggingface_pipeline_kwargs: dict[str, Any] | None = None,
2050      stop_words: list[str] | None = None,
2051      streaming_callback: StreamingCallbackT | None = None,
2052  ) -> None
2053  ```
2054  
2055  Creates an instance of a HuggingFaceLocalGenerator.
2056  
2057  **Parameters:**
2058  
2059  - **model** (<code>str</code>) – The Hugging Face text generation model name or path.
2060  - **task** (<code>Literal['text-generation', 'text2text-generation'] | None</code>) – The task for the Hugging Face pipeline. Possible options:
2061  - `text-generation`: Supported by decoder models, like GPT.
2062  - `text2text-generation`: Deprecated as of Transformers v5; use `text-generation` instead.
2063    Previously supported by encoder–decoder models such as T5.
2064    If the task is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
2065    If not specified, the component calls the Hugging Face API to infer the task from the model name.
2066  - **device** (<code>ComponentDevice | None</code>) – The device for loading the model. If `None`, automatically selects the default device.
2067    If a device or device map is specified in `huggingface_pipeline_kwargs`, it overrides this parameter.
2068  - **token** (<code>Secret | None</code>) – The token to use as HTTP bearer authorization for remote files.
2069    If the token is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
2070  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary with keyword arguments to customize text generation.
2071    Some examples: `max_length`, `max_new_tokens`, `temperature`, `top_k`, `top_p`.
2072    See Hugging Face's documentation for more information:
2073  - [customize-text-generation](https://huggingface.co/docs/transformers/main/en/generation_strategies#customize-text-generation)
2074  - [transformers.GenerationConfig](https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.GenerationConfig)
2075  - **huggingface_pipeline_kwargs** (<code>dict\[str, Any\] | None</code>) – Dictionary with keyword arguments to initialize the
2076    Hugging Face pipeline for text generation.
2077    These keyword arguments provide fine-grained control over the Hugging Face pipeline.
2078    In case of duplication, these kwargs override `model`, `task`, `device`, and `token` init parameters.
2079    For available kwargs, see [Hugging Face documentation](https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.pipeline.task).
2080    In this dictionary, you can also include `model_kwargs` to specify the kwargs for model initialization:
2081    [transformers.PreTrainedModel.from_pretrained](https://huggingface.co/docs/transformers/en/main_classes/model#transformers.PreTrainedModel.from_pretrained)
2082  - **stop_words** (<code>list\[str\] | None</code>) – If the model generates a stop word, the generation stops.
2083    If you provide this parameter, don't specify the `stopping_criteria` in `generation_kwargs`.
2084    For some chat models, the output includes both the new text and the original prompt.
2085    In these cases, make sure your prompt has no stop words.
2086  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses.
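
Stop-word handling can be sketched as truncating the output at the first occurrence of any stop word. This hypothetical helper is for illustration only and is not Haystack's implementation:

```python
def truncate_at_stop_words(text: str, stop_words: list[str]) -> str:
    """Cut the generated text at the first occurrence of any stop word."""
    cut = len(text)
    for word in stop_words:
        idx = text.find(word)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

print(truncate_at_stop_words("The answer is 42. END extra text", ["END"]).strip())
# >> The answer is 42.
```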
2087  
2088  #### warm_up
2089  
2090  ```python
2091  warm_up() -> None
2092  ```
2093  
2094  Initializes the component.
2095  
2096  #### to_dict
2097  
2098  ```python
2099  to_dict() -> dict[str, Any]
2100  ```
2101  
2102  Serializes the component to a dictionary.
2103  
2104  **Returns:**
2105  
2106  - <code>dict\[str, Any\]</code> – Dictionary with serialized data.
2107  
2108  #### from_dict
2109  
2110  ```python
2111  from_dict(data: dict[str, Any]) -> HuggingFaceLocalGenerator
2112  ```
2113  
2114  Deserializes the component from a dictionary.
2115  
2116  **Parameters:**
2117  
2118  - **data** (<code>dict\[str, Any\]</code>) – The dictionary to deserialize from.
2119  
2120  **Returns:**
2121  
2122  - <code>HuggingFaceLocalGenerator</code> – The deserialized component.
2123  
2124  #### run
2125  
2126  ```python
2127  run(
2128      prompt: str,
2129      streaming_callback: StreamingCallbackT | None = None,
2130      generation_kwargs: dict[str, Any] | None = None,
2131  ) -> dict[str, Any]
2132  ```
2133  
2134  Run the text generation model on the given prompt.
2135  
2136  **Parameters:**
2137  
2138  - **prompt** (<code>str</code>) – A string representing the prompt.
2139  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
2140  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation.
2141  
2142  **Returns:**
2143  
2144  - <code>dict\[str, Any\]</code> – A dictionary containing the generated replies.
2145  - replies: A list of strings representing the generated replies.
2146  
2147  ## openai
2148  
2149  ### OpenAIGenerator
2150  
2151  Generates text using OpenAI's large language models (LLMs).
2152  
It works with the gpt-4 and gpt-5 series models and supports streaming responses
from the OpenAI API. It uses strings as input and output.
2155  
2156  You can customize how the text is generated by passing parameters to the
2157  OpenAI API. Use the `**generation_kwargs` argument when you initialize
2158  the component or when you run it. Any parameter that works with
2159  `openai.ChatCompletion.create` will work here too.
2160  
2161  For details on OpenAI API parameters, see
2162  [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).
2163  
2164  ### Usage example
2165  
2166  ```python
2167  from haystack.components.generators import OpenAIGenerator
2168  client = OpenAIGenerator()
2169  response = client.run("What's Natural Language Processing? Be brief.")
2170  print(response)
2171  
2172  # >> {'replies': ['Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on
2173  # >> the interaction between computers and human language. It involves enabling computers to understand, interpret,
2174  # >> and respond to natural human language in a way that is both meaningful and useful.'], 'meta': [{'model':
2175  # >> 'gpt-5-mini', 'index': 0, 'finish_reason': 'stop', 'usage': {'prompt_tokens': 16,
2176  # >> 'completion_tokens': 49, 'total_tokens': 65}}]}
2177  ```
2178  
2179  #### __init__
2180  
2181  ```python
2182  __init__(
2183      api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
2184      model: str = "gpt-5-mini",
2185      streaming_callback: StreamingCallbackT | None = None,
2186      api_base_url: str | None = None,
2187      organization: str | None = None,
2188      system_prompt: str | None = None,
2189      generation_kwargs: dict[str, Any] | None = None,
2190      timeout: float | None = None,
2191      max_retries: int | None = None,
2192      http_client_kwargs: dict[str, Any] | None = None,
2193  ) -> None
2194  ```
2195  
Creates an instance of OpenAIGenerator. Unless specified otherwise in `model`, uses OpenAI's gpt-5-mini model.
2197  
By setting the `OPENAI_TIMEOUT` and `OPENAI_MAX_RETRIES` environment variables, you can change the timeout
and max_retries parameters of the OpenAI client.
2200  
2201  **Parameters:**
2202  
2203  - **api_key** (<code>Secret</code>) – The OpenAI API key to connect to OpenAI.
2204  - **model** (<code>str</code>) – The name of the model to use.
2205  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
2206    The callback function accepts StreamingChunk as an argument.
2207  - **api_base_url** (<code>str | None</code>) – An optional base URL.
2208  - **organization** (<code>str | None</code>) – The Organization ID, defaults to `None`.
2209  - **system_prompt** (<code>str | None</code>) – The system prompt to use for text generation. If not provided, the system prompt is
2210    omitted, and the default system prompt of the model is used.
2211  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model. These parameters are all sent directly to
2212    the OpenAI endpoint. See OpenAI [documentation](https://platform.openai.com/docs/api-reference/chat) for
2213    more details.
2214    Some of the supported parameters:
2215  - `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion,
2216    including visible output tokens and reasoning tokens.
2217  - `temperature`: What sampling temperature to use. Higher values mean the model will take more risks.
2218    Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
2219  - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
2220    considers the results of the tokens with top_p probability mass. So, 0.1 means only the tokens
2221    comprising the top 10% probability mass are considered.
2222  - `n`: How many completions to generate for each prompt. For example, if the LLM gets 3 prompts and n is 2,
2223    it will generate two completions for each of the three prompts, ending up with 6 completions in total.
2224  - `stop`: One or more sequences after which the LLM should stop generating tokens.
- `presence_penalty`: The penalty to apply if a token has already appeared in the text at least once.
  Higher values make the model less likely to repeat the same token.
- `frequency_penalty`: The penalty to apply based on how often a token has already appeared in the text.
  Higher values make the model less likely to repeat the same token.
2229  - `logit_bias`: Add a logit bias to specific tokens. The keys of the dictionary are tokens, and the
2230    values are the bias to add to that token.
2231  - **timeout** (<code>float | None</code>) – Timeout for OpenAI Client calls, if not set it is inferred from the `OPENAI_TIMEOUT` environment variable
2232    or set to 30.
2233  - **max_retries** (<code>int | None</code>) – Maximum retries to establish contact with OpenAI if it returns an internal error, if not set it is inferred
2234    from the `OPENAI_MAX_RETRIES` environment variable or set to 5.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
2236    For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).
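
As an illustration, a `generation_kwargs` dictionary combining several of the parameters above might look like this; the values are arbitrary examples, not recommendations:

```python
generation_kwargs = {
    "max_completion_tokens": 256,  # cap on generated tokens, incl. reasoning tokens
    "temperature": 0.2,            # low temperature for more deterministic output
    "top_p": 0.9,                  # nucleus sampling
    "n": 1,                        # one completion per prompt
    "stop": ["\n\n"],              # stop at the first blank line
}
```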
2237  
2238  #### to_dict
2239  
2240  ```python
2241  to_dict() -> dict[str, Any]
2242  ```
2243  
2244  Serialize this component to a dictionary.
2245  
2246  **Returns:**
2247  
2248  - <code>dict\[str, Any\]</code> – The serialized component as a dictionary.
2249  
2250  #### from_dict
2251  
2252  ```python
2253  from_dict(data: dict[str, Any]) -> OpenAIGenerator
2254  ```
2255  
2256  Deserialize this component from a dictionary.
2257  
2258  **Parameters:**
2259  
2260  - **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.
2261  
2262  **Returns:**
2263  
2264  - <code>OpenAIGenerator</code> – The deserialized component instance.
2265  
2266  #### run
2267  
2268  ```python
2269  run(
2270      prompt: str,
2271      system_prompt: str | None = None,
2272      streaming_callback: StreamingCallbackT | None = None,
2273      generation_kwargs: dict[str, Any] | None = None,
2274  ) -> dict[str, list[str] | list[dict[str, Any]]]
2275  ```
2276  
Invoke the text generation inference based on the provided prompt and generation parameters.
2278  
2279  **Parameters:**
2280  
2281  - **prompt** (<code>str</code>) – The string prompt to use for text generation.
- **system_prompt** (<code>str | None</code>) – The system prompt to use for text generation. If this run-time system prompt is omitted, the
  system prompt defined at initialization time, if any, is used.
2284  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
2285  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. These parameters will potentially override the parameters
2286    passed in the `__init__` method. For more details on the parameters supported by the OpenAI API, refer to
2287    the OpenAI [documentation](https://platform.openai.com/docs/api-reference/chat/create).
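
The override behavior of run-time `generation_kwargs` can be sketched with a plain dictionary merge, assuming the common pattern where run-time values take precedence:

```python
init_kwargs = {"max_completion_tokens": 256, "temperature": 0.7}
run_kwargs = {"temperature": 0.2}

# run-time values override init-time values with the same key
merged = {**init_kwargs, **run_kwargs}
assert merged == {"max_completion_tokens": 256, "temperature": 0.2}
```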
2288  
2289  **Returns:**
2290  
- <code>dict\[str, list\[str\] | list\[dict\[str, Any\]\]\]</code> – A dictionary with a list of strings containing the generated responses and a list of
  dictionaries containing the metadata for each response.
2293  
2294  ## openai_dalle
2295  
2296  ### DALLEImageGenerator
2297  
2298  Generates images using OpenAI's DALL-E model.
2299  
2300  For details on OpenAI API parameters, see
2301  [OpenAI documentation](https://platform.openai.com/docs/api-reference/images/create).
2302  
2303  ### Usage example
2304  
2305  ```python
2306  from haystack.components.generators import DALLEImageGenerator
2307  image_generator = DALLEImageGenerator()
2308  response = image_generator.run("Show me a picture of a black cat.")
2309  print(response)
2310  ```
2311  
2312  #### __init__
2313  
2314  ```python
2315  __init__(
2316      model: str = "dall-e-3",
2317      quality: Literal["standard", "hd"] = "standard",
2318      size: Literal[
2319          "256x256", "512x512", "1024x1024", "1792x1024", "1024x1792"
2320      ] = "1024x1024",
2321      response_format: Literal["url", "b64_json"] = "url",
2322      api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
2323      api_base_url: str | None = None,
2324      organization: str | None = None,
2325      timeout: float | None = None,
2326      max_retries: int | None = None,
2327      http_client_kwargs: dict[str, Any] | None = None,
2328  ) -> None
2329  ```
2330  
2331  Creates an instance of DALLEImageGenerator. Unless specified otherwise in `model`, uses OpenAI's dall-e-3.
2332  
2333  **Parameters:**
2334  
2335  - **model** (<code>str</code>) – The model to use for image generation. Can be "dall-e-2" or "dall-e-3".
2336  - **quality** (<code>Literal['standard', 'hd']</code>) – The quality of the generated image. Can be "standard" or "hd".
2337  - **size** (<code>Literal['256x256', '512x512', '1024x1024', '1792x1024', '1024x1792']</code>) – The size of the generated images.
2338    Must be one of 256x256, 512x512, or 1024x1024 for dall-e-2.
2339    Must be one of 1024x1024, 1792x1024, or 1024x1792 for dall-e-3 models.
2340  - **response_format** (<code>Literal['url', 'b64_json']</code>) – The format of the response. Can be "url" or "b64_json".
2341  - **api_key** (<code>Secret</code>) – The OpenAI API key to connect to OpenAI.
2342  - **api_base_url** (<code>str | None</code>) – An optional base URL.
2343  - **organization** (<code>str | None</code>) – The Organization ID, defaults to `None`.
2344  - **timeout** (<code>float | None</code>) – Timeout for OpenAI Client calls. If not set, it is inferred from the `OPENAI_TIMEOUT` environment variable
2345    or set to 30.
2346  - **max_retries** (<code>int | None</code>) – Maximum retries to establish contact with OpenAI if it returns an internal error. If not set, it is inferred
2347    from the `OPENAI_MAX_RETRIES` environment variable or set to 5.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
2349    For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).
2350  
2351  #### warm_up
2352  
2353  ```python
2354  warm_up() -> None
2355  ```
2356  
2357  Warm up the OpenAI client.
2358  
2359  #### run
2360  
2361  ```python
2362  run(
2363      prompt: str,
2364      size: (
2365          Literal["256x256", "512x512", "1024x1024", "1792x1024", "1024x1792"]
2366          | None
2367      ) = None,
2368      quality: Literal["standard", "hd"] | None = None,
2369      response_format: Literal["url", "b64_json"] | None = None,
2370  ) -> dict[str, Any]
2371  ```
2372  
2373  Invokes the image generation inference based on the provided prompt and generation parameters.
2374  
2375  **Parameters:**
2376  
2377  - **prompt** (<code>str</code>) – The prompt to generate the image.
2378  - **size** (<code>Literal['256x256', '512x512', '1024x1024', '1792x1024', '1024x1792'] | None</code>) – If provided, overrides the size provided during initialization.
2379  - **quality** (<code>Literal['standard', 'hd'] | None</code>) – If provided, overrides the quality provided during initialization.
2380  - **response_format** (<code>Literal['url', 'b64_json'] | None</code>) – If provided, overrides the response format provided during initialization.
2381  
2382  **Returns:**
2383  
2384  - <code>dict\[str, Any\]</code> – A dictionary containing the generated list of images and the revised prompt.
2385    Depending on the `response_format` parameter, the list of images can be URLs or base64 encoded JSON strings.
2386    The revised prompt is the prompt that was used to generate the image, if there was any revision
2387    to the prompt made by OpenAI.
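
When `response_format` is `"b64_json"`, each returned image string can be decoded into raw bytes with the standard library. The response dictionary below is a hand-written stand-in for an actual result, with fake image bytes:

```python
import base64

# Hand-written stand-in for a run() result with response_format="b64_json";
# the encoded value is fake data, not a real image.
response = {
    "images": [base64.b64encode(b"\x89PNG\r\n...fake bytes...").decode("ascii")],
    "revised_prompt": "a black cat sitting on a windowsill",
}

image_bytes = base64.b64decode(response["images"][0])
assert image_bytes.startswith(b"\x89PNG")
```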
2388  
2389  #### to_dict
2390  
2391  ```python
2392  to_dict() -> dict[str, Any]
2393  ```
2394  
2395  Serialize this component to a dictionary.
2396  
2397  **Returns:**
2398  
2399  - <code>dict\[str, Any\]</code> – The serialized component as a dictionary.
2400  
2401  #### from_dict
2402  
2403  ```python
2404  from_dict(data: dict[str, Any]) -> DALLEImageGenerator
2405  ```
2406  
2407  Deserialize this component from a dictionary.
2408  
2409  **Parameters:**
2410  
2411  - **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.
2412  
2413  **Returns:**
2414  
2415  - <code>DALLEImageGenerator</code> – The deserialized component instance.
2416  
2417  ## utils
2418  
2419  ### print_streaming_chunk
2420  
2421  ```python
2422  print_streaming_chunk(chunk: StreamingChunk) -> None
2423  ```
2424  
2425  Callback function to handle and display streaming output chunks.
2426  
2427  This function processes a `StreamingChunk` object by:
2428  
2429  - Printing tool call metadata (if any), including function names and arguments, as they arrive.
2430  - Printing tool call results when available.
2431  - Printing the main content (e.g., text tokens) of the chunk as it is received.
2432  
2433  The function outputs data directly to stdout and flushes output buffers to ensure immediate display during
2434  streaming.
2435  
2436  **Parameters:**
2437  
2438  - **chunk** (<code>StreamingChunk</code>) – A chunk of streaming data containing content and optional metadata, such as tool calls and
2439    tool results.
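
A custom callback with the same shape as `print_streaming_chunk` can be as small as the sketch below; it assumes the chunk exposes a `content` attribute holding the text token, as described above:

```python
import sys

def write_chunk_content(chunk) -> None:
    """Write each streamed text token to stdout immediately."""
    if getattr(chunk, "content", None):
        sys.stdout.write(chunk.content)
        sys.stdout.flush()
```

Pass such a callable as `streaming_callback` when initializing or running a generator.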