   1  ---
   2  title: "Generators"
   3  id: generators-api
   4  description: "Enables text generation using LLMs."
   5  slug: "/generators-api"
   6  ---
   7  
   8  
   9  ## azure
  10  
  11  ### AzureOpenAIGenerator
  12  
  13  Bases: <code>OpenAIGenerator</code>
  14  
   15  Generates text using OpenAI's large language models (LLMs) on Azure.
  16  
   17  It works with gpt-4 type models and supports streaming responses
   18  from the OpenAI API.
  19  
  20  You can customize how the text is generated by passing parameters to the
  21  OpenAI API. Use the `**generation_kwargs` argument when you initialize
  22  the component or when you run it. Any parameter that works with
  23  `openai.ChatCompletion.create` will work here too.
  24  
  25  For details on OpenAI API parameters, see
  26  [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).
  27  
  28  ### Usage example
  29  
  30  ```python
  31  from haystack.components.generators import AzureOpenAIGenerator
  32  from haystack.utils import Secret
  33  client = AzureOpenAIGenerator(
   34      azure_endpoint="<your Azure endpoint, e.g. https://your-company.azure.openai.com/>",
   35      api_key=Secret.from_token("<your-api-key>"),
   36      azure_deployment="<your deployment name, usually the model name, e.g. gpt-4.1-mini>")
  37  response = client.run("What's Natural Language Processing? Be brief.")
  38  print(response)
  39  ```
  40  
  41  ```
  42  >> {'replies': ['Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on
  43  >> the interaction between computers and human language. It involves enabling computers to understand, interpret,
  44  >> and respond to natural human language in a way that is both meaningful and useful.'], 'meta': [{'model':
  45  >> 'gpt-4.1-mini', 'index': 0, 'finish_reason': 'stop', 'usage': {'prompt_tokens': 16,
  46  >> 'completion_tokens': 49, 'total_tokens': 65}}]}
  47  ```
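
The printed structure above can be unpacked directly. A minimal sketch using a hand-built response dict of the same shape (not a live API call):

```python
# Hand-built stand-in mirroring the output shape shown above
response = {
    "replies": ["Natural Language Processing (NLP) is a branch of artificial intelligence ..."],
    "meta": [{"model": "gpt-4.1-mini", "finish_reason": "stop",
              "usage": {"prompt_tokens": 16, "completion_tokens": 49, "total_tokens": 65}}],
}

first_reply = response["replies"][0]                         # the generated text
tokens_used = response["meta"][0]["usage"]["total_tokens"]   # token accounting
print(first_reply, tokens_used)
```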
  48  
  49  #### __init__
  50  
  51  ```python
  52  __init__(
  53      azure_endpoint: str | None = None,
  54      api_version: str | None = "2024-12-01-preview",
  55      azure_deployment: str | None = "gpt-4.1-mini",
  56      api_key: Secret | None = Secret.from_env_var(
  57          "AZURE_OPENAI_API_KEY", strict=False
  58      ),
  59      azure_ad_token: Secret | None = Secret.from_env_var(
  60          "AZURE_OPENAI_AD_TOKEN", strict=False
  61      ),
  62      organization: str | None = None,
  63      streaming_callback: StreamingCallbackT | None = None,
  64      system_prompt: str | None = None,
  65      timeout: float | None = None,
  66      max_retries: int | None = None,
  67      http_client_kwargs: dict[str, Any] | None = None,
  68      generation_kwargs: dict[str, Any] | None = None,
  69      default_headers: dict[str, str] | None = None,
  70      *,
  71      azure_ad_token_provider: AzureADTokenProvider | None = None
  72  )
  73  ```
  74  
  75  Initialize the Azure OpenAI Generator.
  76  
  77  **Parameters:**
  78  
  79  - **azure_endpoint** (<code>str | None</code>) – The endpoint of the deployed model, for example `https://example-resource.azure.openai.com/`.
  80  - **api_version** (<code>str | None</code>) – The version of the API to use. Defaults to 2024-12-01-preview.
  81  - **azure_deployment** (<code>str | None</code>) – The deployment of the model, usually the model name.
  82  - **api_key** (<code>Secret | None</code>) – The API key to use for authentication.
  83  - **azure_ad_token** (<code>Secret | None</code>) – [Azure Active Directory token](https://www.microsoft.com/en-us/security/business/identity-access/microsoft-entra-id).
  84  - **organization** (<code>str | None</code>) – Your organization ID, defaults to `None`. For help, see
  85    [Setting up your organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
  86  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function called when a new token is received from the stream.
  87    It accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
  88    as an argument.
   89  - **system_prompt** (<code>str | None</code>) – The system prompt to use for text generation. If not provided,
   90    the system prompt is omitted.
   91  - **timeout** (<code>float | None</code>) – Timeout for the AzureOpenAI client. If not set, it is inferred from the
   92    `OPENAI_TIMEOUT` environment variable or defaults to 30 seconds.
   93  - **max_retries** (<code>int | None</code>) – Maximum number of retries to contact AzureOpenAI after an internal error.
   94    If not set, it is inferred from the `OPENAI_MAX_RETRIES` environment variable or defaults to 5.
   95  - **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
  96    For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).
  97  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model, sent directly to
  98    the OpenAI endpoint. See [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat) for
  99    more details.
 100    Some of the supported parameters:
 101  - `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion,
 102    including visible output tokens and reasoning tokens.
 103  - `temperature`: The sampling temperature to use. Higher values mean the model takes more risks.
 104    Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
 105  - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
 106    considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
 107    comprising the top 10% probability mass are considered.
 108  - `n`: The number of completions to generate for each prompt. For example, with 3 prompts and n=2,
 109    the LLM will generate two completions per prompt, resulting in 6 completions total.
 110  - `stop`: One or more sequences after which the LLM should stop generating tokens.
 111  - `presence_penalty`: The penalty applied if a token is already present.
 112    Higher values make the model less likely to repeat the token.
 113  - `frequency_penalty`: Penalty applied if a token has already been generated.
 114    Higher values make the model less likely to repeat the token.
 115  - `logit_bias`: Adds a logit bias to specific tokens. The keys of the dictionary are tokens, and the
 116    values are the bias to add to that token.
 117  - **default_headers** (<code>dict\[str, str\] | None</code>) – Default headers to use for the AzureOpenAI client.
  118  - **azure_ad_token_provider** (<code>AzureADTokenProvider | None</code>) – A function that returns an Azure Active Directory token. It is invoked
  119    on every request.
 120  
 121  #### to_dict
 122  
 123  ```python
 124  to_dict() -> dict[str, Any]
 125  ```
 126  
 127  Serialize this component to a dictionary.
 128  
 129  **Returns:**
 130  
 131  - <code>dict\[str, Any\]</code> – The serialized component as a dictionary.
 132  
 133  #### from_dict
 134  
 135  ```python
 136  from_dict(data: dict[str, Any]) -> AzureOpenAIGenerator
 137  ```
 138  
 139  Deserialize this component from a dictionary.
 140  
 141  **Parameters:**
 142  
 143  - **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.
 144  
 145  **Returns:**
 146  
 147  - <code>AzureOpenAIGenerator</code> – The deserialized component instance.
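
The `to_dict`/`from_dict` pair follows the usual component serialization contract. A toy illustration of the round trip (the `type`/`init_parameters` layout here is an assumption for illustration, not this class's exact output):

```python
class ToyGenerator:
    """A stand-in component showing the to_dict/from_dict round trip."""

    def __init__(self, model: str = "gpt-4.1-mini", temperature: float = 0.7):
        self.model = model
        self.temperature = temperature

    def to_dict(self) -> dict:
        # Serialize the init parameters so the instance can be rebuilt later
        return {"type": "ToyGenerator",
                "init_parameters": {"model": self.model, "temperature": self.temperature}}

    @classmethod
    def from_dict(cls, data: dict) -> "ToyGenerator":
        # Rebuild the instance from the serialized init parameters
        return cls(**data["init_parameters"])

restored = ToyGenerator.from_dict(ToyGenerator(temperature=0.2).to_dict())
```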
 148  
 149  ## chat/azure
 150  
 151  ### AzureOpenAIChatGenerator
 152  
 153  Bases: <code>OpenAIChatGenerator</code>
 154  
 155  Generates text using OpenAI's models on Azure.
 156  
  157  It works with gpt-4 type models and supports streaming responses
  158  from the OpenAI API. It uses [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
  159  format for input and output.
 160  
 161  You can customize how the text is generated by passing parameters to the
 162  OpenAI API. Use the `**generation_kwargs` argument when you initialize
 163  the component or when you run it. Any parameter that works with
 164  `openai.ChatCompletion.create` will work here too.
 165  
 166  For details on OpenAI API parameters, see
 167  [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).
 168  
 169  ### Usage example
 170  
 171  ```python
 172  from haystack.components.generators.chat import AzureOpenAIChatGenerator
 173  from haystack.dataclasses import ChatMessage
 174  from haystack.utils import Secret
 175  
 176  messages = [ChatMessage.from_user("What's Natural Language Processing?")]
 177  
 178  client = AzureOpenAIChatGenerator(
  179      azure_endpoint="<your Azure endpoint, e.g. https://your-company.azure.openai.com/>",
  180      api_key=Secret.from_token("<your-api-key>"),
  181      azure_deployment="<your deployment name, usually the model name, e.g. gpt-4.1-mini>")
 182  response = client.run(messages)
 183  print(response)
 184  ```
 185  
 186  ```
 187  {'replies':
 188      [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=[TextContent(text=
 189      "Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on
 190       enabling computers to understand, interpret, and generate human language in a way that is useful.")],
 191       _name=None,
 192       _meta={'model': 'gpt-4.1-mini', 'index': 0, 'finish_reason': 'stop',
 193       'usage': {'prompt_tokens': 15, 'completion_tokens': 36, 'total_tokens': 51}})]
 194  }
 195  ```
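
A `streaming_callback` is invoked once per chunk as tokens arrive. The toy callback below accumulates plain strings; a real callback receives a `StreamingChunk` and would read its content rather than a bare string:

```python
collected = []

def on_chunk(chunk_text: str) -> None:
    # Toy stand-in: a real StreamingChunk carries content plus metadata
    collected.append(chunk_text)
    print(chunk_text, end="", flush=True)

# Simulate three chunks arriving from a stream
for piece in ["Natural ", "Language ", "Processing"]:
    on_chunk(piece)
```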
 196  
 197  #### __init__
 198  
 199  ```python
 200  __init__(
 201      azure_endpoint: str | None = None,
 202      api_version: str | None = "2024-12-01-preview",
 203      azure_deployment: str | None = "gpt-4.1-mini",
 204      api_key: Secret | None = Secret.from_env_var(
 205          "AZURE_OPENAI_API_KEY", strict=False
 206      ),
 207      azure_ad_token: Secret | None = Secret.from_env_var(
 208          "AZURE_OPENAI_AD_TOKEN", strict=False
 209      ),
 210      organization: str | None = None,
 211      streaming_callback: StreamingCallbackT | None = None,
 212      timeout: float | None = None,
 213      max_retries: int | None = None,
 214      generation_kwargs: dict[str, Any] | None = None,
 215      default_headers: dict[str, str] | None = None,
 216      tools: ToolsType | None = None,
 217      tools_strict: bool = False,
 218      *,
 219      azure_ad_token_provider: (
 220          AzureADTokenProvider | AsyncAzureADTokenProvider | None
 221      ) = None,
 222      http_client_kwargs: dict[str, Any] | None = None
 223  )
 224  ```
 225  
 226  Initialize the Azure OpenAI Chat Generator component.
 227  
 228  **Parameters:**
 229  
 230  - **azure_endpoint** (<code>str | None</code>) – The endpoint of the deployed model, for example `"https://example-resource.azure.openai.com/"`.
 231  - **api_version** (<code>str | None</code>) – The version of the API to use. Defaults to 2024-12-01-preview.
 232  - **azure_deployment** (<code>str | None</code>) – The deployment of the model, usually the model name.
 233  - **api_key** (<code>Secret | None</code>) – The API key to use for authentication.
 234  - **azure_ad_token** (<code>Secret | None</code>) – [Azure Active Directory token](https://www.microsoft.com/en-us/security/business/identity-access/microsoft-entra-id).
 235  - **organization** (<code>str | None</code>) – Your organization ID, defaults to `None`. For help, see
 236    [Setting up your organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
 237  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function called when a new token is received from the stream.
 238    It accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
 239    as an argument.
  240  - **timeout** (<code>float | None</code>) – Timeout for OpenAI client calls. If not set, it defaults to the
  241    `OPENAI_TIMEOUT` environment variable or 30 seconds.
  242  - **max_retries** (<code>int | None</code>) – Maximum number of retries to contact OpenAI after an internal error.
  243    If not set, it defaults to the `OPENAI_MAX_RETRIES` environment variable or 5.
 244  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model. These parameters are sent directly to
 245    the OpenAI endpoint. For details, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).
 246    Some of the supported parameters:
 247  - `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion,
 248    including visible output tokens and reasoning tokens.
 249  - `temperature`: The sampling temperature to use. Higher values mean the model takes more risks.
 250    Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
 251  - `top_p`: Nucleus sampling is an alternative to sampling with temperature, where the model considers
 252    tokens with a top_p probability mass. For example, 0.1 means only the tokens comprising
 253    the top 10% probability mass are considered.
 254  - `n`: The number of completions to generate for each prompt. For example, with 3 prompts and n=2,
 255    the LLM will generate two completions per prompt, resulting in 6 completions total.
 256  - `stop`: One or more sequences after which the LLM should stop generating tokens.
 257  - `presence_penalty`: The penalty applied if a token is already present.
 258    Higher values make the model less likely to repeat the token.
 259  - `frequency_penalty`: Penalty applied if a token has already been generated.
 260    Higher values make the model less likely to repeat the token.
 261  - `logit_bias`: Adds a logit bias to specific tokens. The keys of the dictionary are tokens, and the
 262    values are the bias to add to that token.
 263  - `response_format`: A JSON schema or a Pydantic model that enforces the structure of the model's response.
 264    If provided, the output will always be validated against this
 265    format (unless the model returns a tool call).
 266    For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
 267    Notes:
  268    - This parameter accepts Pydantic models and JSON schemas for the latest models, starting from GPT-4o.
  269      Older models only support a basic version of structured outputs through `{"type": "json_object"}`.
 270      For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
 271    - For structured outputs with streaming,
 272      the `response_format` must be a JSON schema and not a Pydantic model.
 273  - **default_headers** (<code>dict\[str, str\] | None</code>) – Default headers to use for the AzureOpenAI client.
 274  - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
 275  - **tools_strict** (<code>bool</code>) – Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
 276    the schema provided in the `parameters` field of the tool definition, but this may increase latency.
  277  - **azure_ad_token_provider** (<code>AzureADTokenProvider | AsyncAzureADTokenProvider | None</code>) – A function that returns an Azure Active Directory token. It is invoked
  278    on every request.
  279  - **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
 280    For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).
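
As an illustration of the `response_format` option, here is a JSON schema in the shape OpenAI's Structured Outputs feature expects; the schema name and fields are made up for this example:

```python
# Hypothetical schema enforcing a single string field in the reply
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "nlp_definition",  # made-up schema name
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {"definition": {"type": "string"}},
            "required": ["definition"],
            "additionalProperties": False,
        },
    },
}

# This would be passed via generation_kwargs={"response_format": response_format}
```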
 281  
 282  #### warm_up
 283  
 284  ```python
 285  warm_up()
 286  ```
 287  
 288  Warm up the Azure OpenAI chat generator.
 289  
 290  This will warm up the tools registered in the chat generator.
 291  This method is idempotent and will only warm up the tools once.
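
The idempotence described above can be modeled with a simple guard flag. A toy sketch of the pattern, not the actual implementation:

```python
class WarmableTools:
    """Toy component whose warm_up runs its expensive setup at most once."""

    def __init__(self):
        self._is_warmed_up = False
        self.setup_calls = 0

    def warm_up(self) -> None:
        if self._is_warmed_up:
            return  # already warmed up: do nothing
        self.setup_calls += 1  # stands in for loading/registering tools
        self._is_warmed_up = True

tools = WarmableTools()
tools.warm_up()
tools.warm_up()  # second call is a no-op
```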
 292  
 293  #### to_dict
 294  
 295  ```python
 296  to_dict() -> dict[str, Any]
 297  ```
 298  
 299  Serialize this component to a dictionary.
 300  
 301  **Returns:**
 302  
 303  - <code>dict\[str, Any\]</code> – The serialized component as a dictionary.
 304  
 305  #### from_dict
 306  
 307  ```python
 308  from_dict(data: dict[str, Any]) -> AzureOpenAIChatGenerator
 309  ```
 310  
 311  Deserialize this component from a dictionary.
 312  
 313  **Parameters:**
 314  
 315  - **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.
 316  
 317  **Returns:**
 318  
 319  - <code>AzureOpenAIChatGenerator</code> – The deserialized component instance.
 320  
 321  ## chat/azure_responses
 322  
 323  ### AzureOpenAIResponsesChatGenerator
 324  
 325  Bases: <code>OpenAIResponsesChatGenerator</code>
 326  
 327  Completes chats using OpenAI's Responses API on Azure.
 328  
  329  It works with gpt-5 and o-series models and supports streaming responses
  330  from the OpenAI API. It uses [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
  331  format for input and output.
 332  
 333  You can customize how the text is generated by passing parameters to the
 334  OpenAI API. Use the `**generation_kwargs` argument when you initialize
 335  the component or when you run it. Any parameter that works with
 336  `openai.Responses.create` will work here too.
 337  
 338  For details on OpenAI API parameters, see
 339  [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses).
 340  
 341  ### Usage example
 342  
 343  ```python
 344  from haystack.components.generators.chat import AzureOpenAIResponsesChatGenerator
 345  from haystack.dataclasses import ChatMessage
 346  
 347  messages = [ChatMessage.from_user("What's Natural Language Processing?")]
 348  
 349  client = AzureOpenAIResponsesChatGenerator(
 350      azure_endpoint="https://example-resource.azure.openai.com/",
 351      generation_kwargs={"reasoning": {"effort": "low", "summary": "auto"}}
 352  )
 353  response = client.run(messages)
 354  print(response)
 355  ```
 356  
 357  #### __init__
 358  
 359  ```python
 360  __init__(
 361      *,
 362      api_key: (
 363          Secret | Callable[[], str] | Callable[[], Awaitable[str]]
 364      ) = Secret.from_env_var("AZURE_OPENAI_API_KEY", strict=False),
 365      azure_endpoint: str | None = None,
 366      azure_deployment: str = "gpt-5-mini",
 367      streaming_callback: StreamingCallbackT | None = None,
 368      organization: str | None = None,
 369      generation_kwargs: dict[str, Any] | None = None,
 370      timeout: float | None = None,
 371      max_retries: int | None = None,
 372      tools: ToolsType | None = None,
 373      tools_strict: bool = False,
 374      http_client_kwargs: dict[str, Any] | None = None
 375  )
 376  ```
 377  
 378  Initialize the AzureOpenAIResponsesChatGenerator component.
 379  
 380  **Parameters:**
 381  
 382  - **api_key** (<code>Secret | Callable\[[], str\] | Callable\[[], Awaitable\[str\]\]</code>) – The API key to use for authentication. Can be:
 383  - A `Secret` object containing the API key.
 384  - A `Secret` object containing the [Azure Active Directory token](https://www.microsoft.com/en-us/security/business/identity-access/microsoft-entra-id).
 385  - A function that returns an Azure Active Directory token.
 386  - **azure_endpoint** (<code>str | None</code>) – The endpoint of the deployed model, for example `"https://example-resource.azure.openai.com/"`.
 387  - **azure_deployment** (<code>str</code>) – The deployment of the model, usually the model name.
 388  - **organization** (<code>str | None</code>) – Your organization ID, defaults to `None`. For help, see
 389    [Setting up your organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
 390  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function called when a new token is received from the stream.
 391    It accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
 392    as an argument.
  393  - **timeout** (<code>float | None</code>) – Timeout for OpenAI client calls. If not set, it defaults to the
  394    `OPENAI_TIMEOUT` environment variable or 30 seconds.
  395  - **max_retries** (<code>int | None</code>) – Maximum number of retries to contact OpenAI after an internal error.
  396    If not set, it defaults to the `OPENAI_MAX_RETRIES` environment variable or 5.
 397  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model. These parameters are sent
 398    directly to the OpenAI endpoint.
 399    See OpenAI [documentation](https://platform.openai.com/docs/api-reference/responses) for
 400    more details.
 401    Some of the supported parameters:
 402  - `temperature`: What sampling temperature to use. Higher values like 0.8 will make the output more random,
 403    while lower values like 0.2 will make it more focused and deterministic.
 404  - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
 405    considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
 406    comprising the top 10% probability mass are considered.
 407  - `previous_response_id`: The ID of the previous response.
 408    Use this to create multi-turn conversations.
 409  - `text_format`: A Pydantic model that enforces the structure of the model's response.
 410    If provided, the output will always be validated against this
 411    format (unless the model returns a tool call).
 412    For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
 413  - `text`: A JSON schema that enforces the structure of the model's response.
 414    If provided, the output will always be validated against this
 415    format (unless the model returns a tool call).
 416    Notes:
  417    - Both JSON schemas and Pydantic models are supported for the latest models, starting from GPT-4o.
  418    - If both are provided, `text_format` takes precedence and the JSON schema passed to `text` is ignored.
  419    - Currently, this component doesn't support streaming for structured outputs.
  420    - Older models only support a basic version of structured outputs through `{"type": "json_object"}`.
  421      For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
 422  - `reasoning`: A dictionary of parameters for reasoning. For example:
 423    - `summary`: The summary of the reasoning.
 424    - `effort`: The level of effort to put into the reasoning. Can be `low`, `medium` or `high`.
 425    - `generate_summary`: Whether to generate a summary of the reasoning.
  426      Note: OpenAI does not return the reasoning tokens, but you can view the summary if it is enabled.
 427      For details, see the [OpenAI Reasoning documentation](https://platform.openai.com/docs/guides/reasoning).
 428  - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
 429  - **tools_strict** (<code>bool</code>) – Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
 430    the schema provided in the `parameters` field of the tool definition, but this may increase latency.
  431  - **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
 432    For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).
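
The `top_p` parameter described above restricts sampling to the smallest set of tokens whose cumulative probability mass reaches the threshold. A toy illustration of that filtering over a made-up distribution:

```python
def nucleus_filter(token_probs: dict, top_p: float) -> list:
    """Keep tokens, highest probability first, until cumulative mass reaches top_p."""
    ranked = sorted(token_probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, mass = [], 0.0
    for token, prob in ranked:
        kept.append(token)
        mass += prob
        if mass >= top_p:
            break
    return kept

probs = {"cat": 0.5, "dog": 0.3, "axolotl": 0.2}  # made-up distribution
```

With a low `top_p` only the single most likely token survives; raising it widens the candidate set.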
 433  
 434  #### to_dict
 435  
 436  ```python
 437  to_dict() -> dict[str, Any]
 438  ```
 439  
 440  Serialize this component to a dictionary.
 441  
 442  **Returns:**
 443  
 444  - <code>dict\[str, Any\]</code> – The serialized component as a dictionary.
 445  
 446  #### from_dict
 447  
 448  ```python
 449  from_dict(data: dict[str, Any]) -> AzureOpenAIResponsesChatGenerator
 450  ```
 451  
 452  Deserialize this component from a dictionary.
 453  
 454  **Parameters:**
 455  
 456  - **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.
 457  
 458  **Returns:**
 459  
 460  - <code>AzureOpenAIResponsesChatGenerator</code> – The deserialized component instance.
 461  
 462  ## chat/fallback
 463  
 464  ### FallbackChatGenerator
 465  
 466  A chat generator wrapper that tries multiple chat generators sequentially.
 467  
  468  It forwards all parameters transparently to the underlying chat generators and calls them sequentially,
  469  returning the first successful result. Any exception raised by a generator triggers failover to the next one.
  470  If all chat generators fail, it raises a RuntimeError with details.
 471  
 472  Timeout enforcement is fully delegated to the underlying chat generators. The fallback mechanism will only
 473  work correctly if the underlying chat generators implement proper timeout handling and raise exceptions
 474  when timeouts occur. For predictable latency guarantees, ensure your chat generators:
 475  
 476  - Support a `timeout` parameter in their initialization
 477  - Implement timeout as total wall-clock time (shared deadline for both streaming and non-streaming)
 478  - Raise timeout exceptions (e.g., TimeoutError, asyncio.TimeoutError, httpx.TimeoutException) when exceeded
 479  
 480  Note: Most well-implemented chat generators (OpenAI, Anthropic, Cohere, etc.) support timeout parameters
 481  with consistent semantics. For HTTP-based LLM providers, a single timeout value (e.g., `timeout=30`)
 482  typically applies to all connection phases: connection setup, read, write, and pool. For streaming
 483  responses, read timeout is the maximum gap between chunks. For non-streaming, it's the time limit for
 484  receiving the complete response.
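
The streaming read-timeout semantics above can be modeled as a maximum allowed gap between chunks. A toy sketch under that interpretation (not a real HTTP client):

```python
import time

def consume_stream(chunks, read_timeout_s: float = 30.0) -> list:
    """Toy model: fail if the gap between consecutive chunks exceeds the read timeout."""
    last_seen = time.monotonic()
    received = []
    for chunk in chunks:
        now = time.monotonic()
        if now - last_seen > read_timeout_s:
            raise TimeoutError("gap between chunks exceeded the read timeout")
        received.append(chunk)
        last_seen = now
    return received
```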
 485  
 486  Failover is automatically triggered when a generator raises any exception, including:
 487  
 488  - Timeout errors (if the generator implements and raises them)
 489  - Rate limit errors (429)
 490  - Authentication errors (401)
 491  - Context length errors (400)
 492  - Server errors (500+)
 493  - Any other exception
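
The failover behavior can be sketched as a sequential try/except loop over toy generators; the callables below are stand-ins for real chat generator components, and the meta keys mirror the ones documented for `run`:

```python
def run_with_fallback(generators, messages):
    """Try each generator in order; return the first success plus failover metadata."""
    failed = []
    for index, generator in enumerate(generators):
        try:
            replies = generator(messages)
        except Exception:
            failed.append(type(generator).__name__)
            continue
        return {
            "replies": replies,
            "meta": {
                "successful_chat_generator_index": index,
                "successful_chat_generator_class": type(generator).__name__,
                "total_attempts": index + 1,
                "failed_chat_generators": failed,
            },
        }
    raise RuntimeError(f"All chat generators failed: {failed}")

class AlwaysTimesOut:
    def __call__(self, messages):
        raise TimeoutError("deadline exceeded")

class Echo:
    def __call__(self, messages):
        return [f"echo: {messages[-1]}"]

result = run_with_fallback([AlwaysTimesOut(), Echo()], ["hello"])
```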
 494  
 495  #### __init__
 496  
 497  ```python
 498  __init__(chat_generators: list[ChatGenerator]) -> None
 499  ```
 500  
 501  Creates an instance of FallbackChatGenerator.
 502  
 503  **Parameters:**
 504  
 505  - **chat_generators** (<code>list\[ChatGenerator\]</code>) – A non-empty list of chat generator components to try in order.
 506  
 507  #### to_dict
 508  
 509  ```python
 510  to_dict() -> dict[str, Any]
 511  ```
 512  
 513  Serialize the component, including nested chat generators when they support serialization.
 514  
 515  #### from_dict
 516  
 517  ```python
 518  from_dict(data: dict[str, Any]) -> FallbackChatGenerator
 519  ```
 520  
 521  Rebuild the component from a serialized representation, restoring nested chat generators.
 522  
 523  #### warm_up
 524  
 525  ```python
 526  warm_up() -> None
 527  ```
 528  
 529  Warm up all underlying chat generators.
 530  
 531  This method calls warm_up() on each underlying generator that supports it.
 532  
 533  #### run
 534  
 535  ```python
 536  run(
 537      messages: list[ChatMessage],
 538      generation_kwargs: dict[str, Any] | None = None,
 539      tools: ToolsType | None = None,
 540      streaming_callback: StreamingCallbackT | None = None,
 541  ) -> dict[str, list[ChatMessage] | dict[str, Any]]
 542  ```
 543  
 544  Execute chat generators sequentially until one succeeds.
 545  
 546  **Parameters:**
 547  
 548  - **messages** (<code>list\[ChatMessage\]</code>) – The conversation history as a list of ChatMessage instances.
 549  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Optional parameters for the chat generator (e.g., temperature, max_tokens).
 550  - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for function calling capabilities.
 551  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – Optional callable for handling streaming responses.
 552  
 553  **Returns:**
 554  
 555  - <code>dict\[str, list\[ChatMessage\] | dict\[str, Any\]\]</code> – A dictionary with:
 556  - "replies": Generated ChatMessage instances from the first successful generator.
 557  - "meta": Execution metadata including successful_chat_generator_index, successful_chat_generator_class,
 558    total_attempts, failed_chat_generators, plus any metadata from the successful generator.
 559  
 560  **Raises:**
 561  
 562  - <code>RuntimeError</code> – If all chat generators fail.
 563  
 564  #### run_async
 565  
 566  ```python
 567  run_async(
 568      messages: list[ChatMessage],
 569      generation_kwargs: dict[str, Any] | None = None,
 570      tools: ToolsType | None = None,
 571      streaming_callback: StreamingCallbackT | None = None,
 572  ) -> dict[str, list[ChatMessage] | dict[str, Any]]
 573  ```
 574  
 575  Asynchronously execute chat generators sequentially until one succeeds.
 576  
 577  **Parameters:**
 578  
 579  - **messages** (<code>list\[ChatMessage\]</code>) – The conversation history as a list of ChatMessage instances.
 580  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Optional parameters for the chat generator (e.g., temperature, max_tokens).
 581  - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for function calling capabilities.
 582  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – Optional callable for handling streaming responses.
 583  
 584  **Returns:**
 585  
 586  - <code>dict\[str, list\[ChatMessage\] | dict\[str, Any\]\]</code> – A dictionary with:
 587  - "replies": Generated ChatMessage instances from the first successful generator.
 588  - "meta": Execution metadata including successful_chat_generator_index, successful_chat_generator_class,
 589    total_attempts, failed_chat_generators, plus any metadata from the successful generator.
 590  
 591  **Raises:**
 592  
 593  - <code>RuntimeError</code> – If all chat generators fail.
 594  
 595  ## chat/hugging_face_api
 596  
 597  ### HuggingFaceAPIChatGenerator
 598  
 599  Completes chats using Hugging Face APIs.
 600  
 601  HuggingFaceAPIChatGenerator uses the [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
 602  format for input and output. Use it to generate text with Hugging Face APIs:
 603  
 604  - [Serverless Inference API (Inference Providers)](https://huggingface.co/docs/inference-providers)
 605  - [Paid Inference Endpoints](https://huggingface.co/inference-endpoints)
 606  - [Self-hosted Text Generation Inference](https://github.com/huggingface/text-generation-inference)
 607  
 608  ### Usage examples
 609  
 610  #### With the serverless inference API (Inference Providers) - free tier available
 611  
 612  ```python
 613  from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
 614  from haystack.dataclasses import ChatMessage
 615  from haystack.utils import Secret
 616  from haystack.utils.hf import HFGenerationAPIType
 617  
 618  messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
 619              ChatMessage.from_user("What's Natural Language Processing?")]
 620  
 621  # the api_type can be expressed using the HFGenerationAPIType enum or as a string
 622  api_type = HFGenerationAPIType.SERVERLESS_INFERENCE_API
 623  api_type = "serverless_inference_api" # this is equivalent to the above
 624  
 625  generator = HuggingFaceAPIChatGenerator(api_type=api_type,
 626                                          api_params={"model": "Qwen/Qwen2.5-7B-Instruct",
 627                                                      "provider": "together"},
 628                                          token=Secret.from_token("<your-api-key>"))
 629  
 630  result = generator.run(messages)
 631  print(result)
 632  ```
 633  
 634  #### With the serverless inference API (Inference Providers) and text+image input
 635  
 636  ```python
 637  from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
 638  from haystack.dataclasses import ChatMessage, ImageContent
 639  from haystack.utils import Secret
 640  from haystack.utils.hf import HFGenerationAPIType
 641  
 642  # Create an image from file path, URL, or base64
 643  image = ImageContent.from_file_path("path/to/your/image.jpg")
 644  
 645  # Create a multimodal message with both text and image
 646  messages = [ChatMessage.from_user(content_parts=["Describe this image in detail", image])]
 647  
 648  generator = HuggingFaceAPIChatGenerator(
 649      api_type=HFGenerationAPIType.SERVERLESS_INFERENCE_API,
 650      api_params={
 651          "model": "Qwen/Qwen2.5-VL-7B-Instruct",  # Vision Language Model
 652          "provider": "hyperbolic"
 653      },
 654      token=Secret.from_token("<your-api-key>")
 655  )
 656  
 657  result = generator.run(messages)
 658  print(result)
 659  ```
 660  
 661  #### With paid inference endpoints
 662  
```python
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret

messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
            ChatMessage.from_user("What's Natural Language Processing?")]

generator = HuggingFaceAPIChatGenerator(api_type="inference_endpoints",
                                        api_params={"url": "<your-inference-endpoint-url>"},
                                        token=Secret.from_token("<your-api-key>"))

result = generator.run(messages)
print(result)
```

#### With self-hosted text generation inference

```python
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage

messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
            ChatMessage.from_user("What's Natural Language Processing?")]

generator = HuggingFaceAPIChatGenerator(api_type="text_generation_inference",
                                        api_params={"url": "http://localhost:8080"})

result = generator.run(messages)
print(result)
```
 693  
 694  #### __init__
 695  
 696  ```python
 697  __init__(
 698      api_type: HFGenerationAPIType | str,
 699      api_params: dict[str, str],
 700      token: Secret | None = Secret.from_env_var(
 701          ["HF_API_TOKEN", "HF_TOKEN"], strict=False
 702      ),
 703      generation_kwargs: dict[str, Any] | None = None,
 704      stop_words: list[str] | None = None,
 705      streaming_callback: StreamingCallbackT | None = None,
 706      tools: ToolsType | None = None,
 707  )
 708  ```
 709  
 710  Initialize the HuggingFaceAPIChatGenerator instance.
 711  
 712  **Parameters:**
 713  
 714  - **api_type** (<code>HFGenerationAPIType | str</code>) – The type of Hugging Face API to use. Available types:
 715  - `text_generation_inference`: See [TGI](https://github.com/huggingface/text-generation-inference).
 716  - `inference_endpoints`: See [Inference Endpoints](https://huggingface.co/inference-endpoints).
 717  - `serverless_inference_api`: See
 718    [Serverless Inference API - Inference Providers](https://huggingface.co/docs/inference-providers).
 719  - **api_params** (<code>dict\[str, str\]</code>) – A dictionary with the following keys:
 720  - `model`: Hugging Face model ID. Required when `api_type` is `SERVERLESS_INFERENCE_API`.
 721  - `provider`: Provider name. Recommended when `api_type` is `SERVERLESS_INFERENCE_API`.
 722  - `url`: URL of the inference endpoint. Required when `api_type` is `INFERENCE_ENDPOINTS` or
 723    `TEXT_GENERATION_INFERENCE`.
 724  - Other parameters specific to the chosen API type, such as `timeout`, `headers`, etc.
 725  - **token** (<code>Secret | None</code>) – The Hugging Face token to use as HTTP bearer authorization.
 726    Check your HF token in your [account settings](https://huggingface.co/settings/tokens).
 727  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary with keyword arguments to customize text generation.
 728    Some examples: `max_tokens`, `temperature`, `top_p`.
 729    For details, see [Hugging Face chat_completion documentation](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.chat_completion).
 730  - **stop_words** (<code>list\[str\] | None</code>) – An optional list of strings representing the stop words.
 731  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses.
 732  - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
 733    The chosen model should support tool/function calling, according to the model card.
  Support for tools in the Hugging Face API and TGI is not yet fully refined, and you may experience
  unexpected behavior.
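The tool-calling flow can be sketched as follows. This is a hedged example: the `get_weather` function, its JSON schema, and the model choice are illustrative placeholders, not part of the API.

```python
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.tools import Tool
from haystack.utils import Secret

# Illustrative tool: a plain Python function plus a JSON schema for its parameters.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

weather_tool = Tool(
    name="get_weather",
    description="Get the current weather for a city.",
    parameters={
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
    function=get_weather,
)

generator = HuggingFaceAPIChatGenerator(
    api_type="serverless_inference_api",
    api_params={"model": "Qwen/Qwen2.5-7B-Instruct"},
    token=Secret.from_token("<your-api-key>"),
    tools=[weather_tool],
)

result = generator.run([ChatMessage.from_user("What's the weather in Paris?")])
# If the model decides to call the tool, the reply exposes it via `tool_calls`.
print(result["replies"][0].tool_calls)
```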
 736  
 737  #### warm_up
 738  
 739  ```python
 740  warm_up()
 741  ```
 742  
 743  Warm up the Hugging Face API chat generator.
 744  
 745  This will warm up the tools registered in the chat generator.
 746  This method is idempotent and will only warm up the tools once.
 747  
 748  #### to_dict
 749  
 750  ```python
 751  to_dict() -> dict[str, Any]
 752  ```
 753  
 754  Serialize this component to a dictionary.
 755  
 756  **Returns:**
 757  
 758  - <code>dict\[str, Any\]</code> – A dictionary containing the serialized component.
 759  
 760  #### from_dict
 761  
 762  ```python
 763  from_dict(data: dict[str, Any]) -> HuggingFaceAPIChatGenerator
 764  ```
 765  
 766  Deserialize this component from a dictionary.
 767  
 768  #### run
 769  
 770  ```python
 771  run(
 772      messages: list[ChatMessage],
 773      generation_kwargs: dict[str, Any] | None = None,
 774      tools: ToolsType | None = None,
 775      streaming_callback: StreamingCallbackT | None = None,
 776  ) -> dict[str, list[ChatMessage]]
 777  ```
 778  
 779  Invoke the text generation inference based on the provided messages and generation parameters.
 780  
 781  **Parameters:**
 782  
 783  - **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage objects representing the input messages.
 784  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation.
 785  - **tools** (<code>ToolsType | None</code>) – A list of tools or a Toolset for which the model can prepare calls. If set, it will override
 786    the `tools` parameter set during component initialization. This parameter can accept either a
 787    list of `Tool` objects or a `Toolset` instance.
 788  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses. If set, it will override the `streaming_callback`
 789    parameter set during component initialization.
 790  
 791  **Returns:**
 792  
 793  - <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following keys:
 794  - `replies`: A list containing the generated responses as ChatMessage objects.
 795  
 796  #### run_async
 797  
 798  ```python
 799  run_async(
 800      messages: list[ChatMessage],
 801      generation_kwargs: dict[str, Any] | None = None,
 802      tools: ToolsType | None = None,
 803      streaming_callback: StreamingCallbackT | None = None,
 804  ) -> dict[str, list[ChatMessage]]
 805  ```
 806  
 807  Asynchronously invokes the text generation inference based on the provided messages and generation parameters.
 808  
 809  This is the asynchronous version of the `run` method. It has the same parameters
and return values but can be used with `await` in async code.
 811  
 812  **Parameters:**
 813  
 814  - **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage objects representing the input messages.
 815  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation.
 816  - **tools** (<code>ToolsType | None</code>) – A list of tools or a Toolset for which the model can prepare calls. If set, it will override the `tools`
 817    parameter set during component initialization. This parameter can accept either a list of `Tool` objects
 818    or a `Toolset` instance.
 819  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses. If set, it will override the `streaming_callback`
 820    parameter set during component initialization.
 821  
 822  **Returns:**
 823  
 824  - <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following keys:
 825  - `replies`: A list containing the generated responses as ChatMessage objects.
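`run_async` fits naturally inside an event loop. A minimal sketch, assuming a serverless endpoint and a placeholder API key:

```python
import asyncio

from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret

generator = HuggingFaceAPIChatGenerator(
    api_type="serverless_inference_api",
    api_params={"model": "Qwen/Qwen2.5-7B-Instruct"},
    token=Secret.from_token("<your-api-key>"),
)

async def main():
    # Same parameters and return value as `run`, but awaitable.
    result = await generator.run_async([ChatMessage.from_user("What's NLP? Be brief.")])
    print(result["replies"][0].text)

asyncio.run(main())
```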
 826  
 827  ## chat/hugging_face_local
 828  
 829  ### default_tool_parser
 830  
 831  ```python
 832  default_tool_parser(text: str) -> list[ToolCall] | None
 833  ```
 834  
 835  Default implementation for parsing tool calls from model output text.
 836  
 837  Uses DEFAULT_TOOL_PATTERN to extract tool calls.
 838  
 839  **Parameters:**
 840  
 841  - **text** (<code>str</code>) – The text to parse for tool calls.
 842  
 843  **Returns:**
 844  
 845  - <code>list\[ToolCall\] | None</code> – A list containing a single ToolCall if a valid tool call is found, None otherwise.
 846  
 847  ### HuggingFaceLocalChatGenerator
 848  
 849  Generates chat responses using models from Hugging Face that run locally.
 850  
 851  Use this component with chat-based models,
 852  such as `Qwen/Qwen3-0.6B` or `meta-llama/Llama-2-7b-chat-hf`.
 853  LLMs running locally may need powerful hardware.
 854  
 855  ### Usage example
 856  
 857  ```python
 858  from haystack.components.generators.chat import HuggingFaceLocalChatGenerator
 859  from haystack.dataclasses import ChatMessage
 860  
 861  generator = HuggingFaceLocalChatGenerator(model="Qwen/Qwen3-0.6B")
 862  messages = [ChatMessage.from_user("What's Natural Language Processing? Be brief.")]
 863  print(generator.run(messages))
 864  ```
 865  
 866  ```
 867  {'replies':
 868      [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=[TextContent(text=
 869      "Natural Language Processing (NLP) is a subfield of artificial intelligence that deals
 870      with the interaction between computers and human language. It enables computers to understand, interpret, and
 871      generate human language in a valuable way. NLP involves various techniques such as speech recognition, text
 872      analysis, sentiment analysis, and machine translation. The ultimate goal is to make it easier for computers to
 873      process and derive meaning from human language, improving communication between humans and machines.")],
 874      _name=None,
    _meta={'finish_reason': 'stop', 'index': 0, 'model':
          'Qwen/Qwen3-0.6B',
 877            'usage': {'completion_tokens': 90, 'prompt_tokens': 19, 'total_tokens': 109}})
 878            ]
 879  }
 880  ```
 881  
 882  #### __init__
 883  
 884  ```python
 885  __init__(
 886      model: str = "Qwen/Qwen3-0.6B",
 887      task: Literal["text-generation", "text2text-generation"] | None = None,
 888      device: ComponentDevice | None = None,
 889      token: Secret | None = Secret.from_env_var(
 890          ["HF_API_TOKEN", "HF_TOKEN"], strict=False
 891      ),
 892      chat_template: str | None = None,
 893      generation_kwargs: dict[str, Any] | None = None,
 894      huggingface_pipeline_kwargs: dict[str, Any] | None = None,
 895      stop_words: list[str] | None = None,
 896      streaming_callback: StreamingCallbackT | None = None,
 897      tools: ToolsType | None = None,
 898      tool_parsing_function: Callable[[str], list[ToolCall] | None] | None = None,
 899      async_executor: ThreadPoolExecutor | None = None,
 900      *,
 901      enable_thinking: bool = False
 902  ) -> None
 903  ```
 904  
 905  Initializes the HuggingFaceLocalChatGenerator component.
 906  
 907  **Parameters:**
 908  
 909  - **model** (<code>str</code>) – The Hugging Face text generation model name or path,
 910    for example, `mistralai/Mistral-7B-Instruct-v0.2` or `TheBloke/OpenHermes-2.5-Mistral-7B-16k-AWQ`.
 911    The model must be a chat model supporting the ChatML messaging
 912    format.
 913    If the model is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
 914  - **task** (<code>Literal['text-generation', 'text2text-generation'] | None</code>) – The task for the Hugging Face pipeline. Possible options:
 915  - `text-generation`: Supported by decoder models, like GPT.
 916  - `text2text-generation`: Deprecated as of Transformers v5; use `text-generation` instead.
 917    Previously supported by encoder–decoder models such as T5.
 918    If the task is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
 919    If not specified, the component calls the Hugging Face API to infer the task from the model name.
 920  - **device** (<code>ComponentDevice | None</code>) – The device for loading the model. If `None`, automatically selects the default device.
 921    If a device or device map is specified in `huggingface_pipeline_kwargs`, it overrides this parameter.
 922  - **token** (<code>Secret | None</code>) – The token to use as HTTP bearer authorization for remote files.
 923    If the token is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
 924  - **chat_template** (<code>str | None</code>) – Specifies an optional Jinja template for formatting chat
 925    messages. Most high-quality chat models have their own templates, but for models without this
 926    feature or if you prefer a custom template, use this parameter.
 927  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary with keyword arguments to customize text generation.
 928    Some examples: `max_length`, `max_new_tokens`, `temperature`, `top_k`, `top_p`.
 929    See Hugging Face's documentation for more information:
  - [customize-text-generation](https://huggingface.co/docs/transformers/main/en/generation_strategies#customize-text-generation)
  - [GenerationConfig](https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.GenerationConfig)
 932      The only `generation_kwargs` set by default is `max_new_tokens`, which is set to 512 tokens.
 933  - **huggingface_pipeline_kwargs** (<code>dict\[str, Any\] | None</code>) – Dictionary with keyword arguments to initialize the
 934    Hugging Face pipeline for text generation.
 935    These keyword arguments provide fine-grained control over the Hugging Face pipeline.
 936    In case of duplication, these kwargs override `model`, `task`, `device`, and `token` init parameters.
 937    For kwargs, see [Hugging Face documentation](https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.pipeline.task).
 938    In this dictionary, you can also include `model_kwargs` to specify the kwargs for [model initialization](https://huggingface.co/docs/transformers/en/main_classes/model#transformers.PreTrainedModel.from_pretrained)
 939  - **stop_words** (<code>list\[str\] | None</code>) – A list of stop words. If the model generates a stop word, the generation stops.
 940    If you provide this parameter, don't specify the `stopping_criteria` in `generation_kwargs`.
 941    For some chat models, the output includes both the new text and the original prompt.
 942    In these cases, make sure your prompt has no stop words.
 943  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses.
 944  - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
- **tool_parsing_function** (<code>Callable\[\[str\], list\[ToolCall\] | None\] | None</code>) – A callable that takes a string and returns a list of ToolCall objects or None.
  If None, `default_tool_parser` is used, which extracts tool calls using a predefined pattern.
- **async_executor** (<code>ThreadPoolExecutor | None</code>) – Optional ThreadPoolExecutor to use for async calls. If not provided, a single-threaded executor is
  initialized and used.
 949  - **enable_thinking** (<code>bool</code>) – Whether to enable thinking mode in the chat template for thinking-capable models.
 950    When enabled, the model generates intermediate reasoning before the final response. Defaults to False.
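To illustrate how the init parameters interact, here is a hedged sketch; the kwargs values are examples, not recommended defaults:

```python
from haystack.components.generators.chat import HuggingFaceLocalChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import ComponentDevice

generator = HuggingFaceLocalChatGenerator(
    model="Qwen/Qwen3-0.6B",
    device=ComponentDevice.from_str("cpu"),           # or "cuda:0" on a GPU machine
    generation_kwargs={"max_new_tokens": 128, "temperature": 0.7},
    stop_words=["<|endoftext|>"],                     # don't combine with stopping_criteria
)
generator.warm_up()  # loads the model before the first run

print(generator.run([ChatMessage.from_user("What's NLP? Be brief.")]))
```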
 951  
 952  #### shutdown
 953  
 954  ```python
 955  shutdown() -> None
 956  ```
 957  
Explicitly shut down the executor if this component owns it.
 959  
 960  #### warm_up
 961  
 962  ```python
 963  warm_up() -> None
 964  ```
 965  
 966  Initializes the component and warms up tools if provided.
 967  
 968  #### to_dict
 969  
 970  ```python
 971  to_dict() -> dict[str, Any]
 972  ```
 973  
 974  Serializes the component to a dictionary.
 975  
 976  **Returns:**
 977  
 978  - <code>dict\[str, Any\]</code> – Dictionary with serialized data.
 979  
 980  #### from_dict
 981  
 982  ```python
 983  from_dict(data: dict[str, Any]) -> HuggingFaceLocalChatGenerator
 984  ```
 985  
 986  Deserializes the component from a dictionary.
 987  
 988  **Parameters:**
 989  
 990  - **data** (<code>dict\[str, Any\]</code>) – The dictionary to deserialize from.
 991  
 992  **Returns:**
 993  
 994  - <code>HuggingFaceLocalChatGenerator</code> – The deserialized component.
 995  
 996  #### run
 997  
 998  ```python
 999  run(
1000      messages: list[ChatMessage],
1001      generation_kwargs: dict[str, Any] | None = None,
1002      streaming_callback: StreamingCallbackT | None = None,
1003      tools: ToolsType | None = None,
1004  ) -> dict[str, list[ChatMessage]]
1005  ```
1006  
1007  Invoke text generation inference based on the provided messages and generation parameters.
1008  
1009  **Parameters:**
1010  
1011  - **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage objects representing the input messages.
1012  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation.
1013  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses.
1014  - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
1015    If set, it will override the `tools` parameter provided during initialization.
1016  
1017  **Returns:**
1018  
1019  - <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following keys:
1020  - `replies`: A list containing the generated responses as ChatMessage instances.
1021  
1022  #### create_message
1023  
1024  ```python
1025  create_message(
1026      text: str,
1027      index: int,
1028      tokenizer: Union[PreTrainedTokenizer, PreTrainedTokenizerFast],
1029      prompt: str,
1030      generation_kwargs: dict[str, Any],
1031      parse_tool_calls: bool = False,
1032  ) -> ChatMessage
1033  ```
1034  
1035  Create a ChatMessage instance from the provided text, populated with metadata.
1036  
1037  **Parameters:**
1038  
1039  - **text** (<code>str</code>) – The generated text.
1040  - **index** (<code>int</code>) – The index of the generated text.
1041  - **tokenizer** (<code>Union\[PreTrainedTokenizer, PreTrainedTokenizerFast\]</code>) – The tokenizer used for generation.
1042  - **prompt** (<code>str</code>) – The prompt used for generation.
1043  - **generation_kwargs** (<code>dict\[str, Any\]</code>) – The generation parameters.
1044  - **parse_tool_calls** (<code>bool</code>) – Whether to attempt parsing tool calls from the text.
1045  
1046  **Returns:**
1047  
1048  - <code>ChatMessage</code> – A ChatMessage instance.
1049  
1050  #### run_async
1051  
1052  ```python
1053  run_async(
1054      messages: list[ChatMessage],
1055      generation_kwargs: dict[str, Any] | None = None,
1056      streaming_callback: StreamingCallbackT | None = None,
1057      tools: ToolsType | None = None,
1058  ) -> dict[str, list[ChatMessage]]
1059  ```
1060  
1061  Asynchronously invokes text generation inference based on the provided messages and generation parameters.
1062  
1063  This is the asynchronous version of the `run` method. It has the same parameters
and return values but can be used with `await` in async code.
1065  
1066  **Parameters:**
1067  
1068  - **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage objects representing the input messages.
1069  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation.
1070  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses.
1071  - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
1072    If set, it will override the `tools` parameter provided during initialization.
1073  
1074  **Returns:**
1075  
1076  - <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following keys:
1077  - `replies`: A list containing the generated responses as ChatMessage instances.
1078  
1079  ## chat/llm
1080  
1081  ### LLM
1082  
1083  Bases: <code>Agent</code>
1084  
1085  A text generation component powered by a large language model.
1086  
1087  The LLM component is a simplified version of the Agent that focuses solely on text generation
1088  without tool usage. It processes messages and returns a single response from the language model.
1089  
1090  ### Usage examples
1091  
1092  ```python
1093  from haystack.components.generators.chat import LLM
1094  from haystack.components.generators.chat import OpenAIChatGenerator
1095  from haystack.dataclasses import ChatMessage
1096  
1097  llm = LLM(
1098      chat_generator=OpenAIChatGenerator(),
    system_prompt="You are a helpful summarization assistant.",
    user_prompt="""{% message role="user" %}
1101  Summarize the following document: {{ document }}
1102  {% endmessage %}""",
1103      required_variables=["document"],
1104  )
1105  
1106  result = llm.run(document="The weather is lovely today and the sun is shining. ")
1107  print(result["last_message"].text)
1108  ```
1109  
1110  #### __init__
1111  
1112  ```python
1113  __init__(
1114      *,
1115      chat_generator: ChatGenerator,
1116      system_prompt: str | None = None,
1117      user_prompt: str | None = None,
1118      required_variables: list[str] | Literal["*"] | None = None,
1119      streaming_callback: StreamingCallbackT | None = None
1120  ) -> None
1121  ```
1122  
1123  Initialize the LLM component.
1124  
1125  **Parameters:**
1126  
1127  - **chat_generator** (<code>ChatGenerator</code>) – An instance of the chat generator that the LLM should use.
1128  - **system_prompt** (<code>str | None</code>) – System prompt for the LLM.
- **user_prompt** (<code>str | None</code>) – User prompt for the LLM. If provided, this is appended to the messages provided at runtime.
- **required_variables** (<code>list\[str\] | Literal['\*'] | None</code>) – List of variables that must be provided as input to `user_prompt`.
1131    If a variable listed as required is not provided, an exception is raised.
1132    If set to `"*"`, all variables found in the prompt are required. Optional.
1133  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback that will be invoked when a response is streamed from the LLM.
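Besides the templated usage shown above, the component also accepts plain runtime messages. A hedged sketch:

```python
from haystack.components.generators.chat import LLM, OpenAIChatGenerator
from haystack.dataclasses import ChatMessage

llm = LLM(
    chat_generator=OpenAIChatGenerator(),
    system_prompt="You answer in exactly one sentence.",
)

# With no user_prompt template, the runtime messages are sent as-is.
result = llm.run(messages=[ChatMessage.from_user("What's Natural Language Processing?")])
print(result["last_message"].text)
```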
1134  
1135  #### to_dict
1136  
1137  ```python
1138  to_dict() -> dict[str, Any]
1139  ```
1140  
1141  Serialize the LLM component to a dictionary.
1142  
1143  **Returns:**
1144  
1145  - <code>dict\[str, Any\]</code> – Dictionary with serialized data.
1146  
1147  #### from_dict
1148  
1149  ```python
1150  from_dict(data: dict[str, Any]) -> LLM
1151  ```
1152  
1153  Deserialize the LLM from a dictionary.
1154  
1155  **Parameters:**
1156  
1157  - **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from.
1158  
1159  **Returns:**
1160  
1161  - <code>LLM</code> – Deserialized LLM instance.
1162  
1163  #### run
1164  
1165  ```python
1166  run(
1167      messages: list[ChatMessage] | None = None,
1168      streaming_callback: StreamingCallbackT | None = None,
1169      *,
1170      generation_kwargs: dict[str, Any] | None = None,
1171      system_prompt: str | None = None,
1172      user_prompt: str | None = None,
1173      **kwargs: Any
1174  ) -> dict[str, Any]
1175  ```
1176  
1177  Process messages and generate a response from the language model.
1178  
1179  **Parameters:**
1180  
1181  - **messages** (<code>list\[ChatMessage\] | None</code>) – List of Haystack ChatMessage objects to process.
1182  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback that will be invoked when a response is streamed from the LLM.
1183  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for the underlying chat generator. These parameters
1184    will override the parameters passed during component initialization.
1185  - **system_prompt** (<code>str | None</code>) – System prompt for the LLM. If provided, it overrides the default system prompt.
1186  - **user_prompt** (<code>str | None</code>) – User prompt for the LLM. If provided, it overrides the default user prompt and is
1187    appended to the messages provided at runtime.
1188  - **kwargs** (<code>Any</code>) – Additional keyword arguments. These are used to fill template variables in the `user_prompt`
1189    (the keys must match template variable names).
1190  
1191  **Returns:**
1192  
1193  - <code>dict\[str, Any\]</code> – A dictionary with the following keys:
1194  - "messages": List of all messages exchanged during the LLM's run.
1195  - "last_message": The last message exchanged during the LLM's run.
1196  
1197  #### run_async
1198  
1199  ```python
1200  run_async(
1201      messages: list[ChatMessage] | None = None,
1202      streaming_callback: StreamingCallbackT | None = None,
1203      *,
1204      generation_kwargs: dict[str, Any] | None = None,
1205      system_prompt: str | None = None,
1206      user_prompt: str | None = None,
1207      **kwargs: Any
1208  ) -> dict[str, Any]
1209  ```
1210  
1211  Asynchronously process messages and generate a response from the language model.
1212  
1213  **Parameters:**
1214  
1215  - **messages** (<code>list\[ChatMessage\] | None</code>) – List of Haystack ChatMessage objects to process.
1216  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An asynchronous callback that will be invoked when a response is streamed
1217    from the LLM.
1218  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for the underlying chat generator. These parameters
1219    will override the parameters passed during component initialization.
1220  - **system_prompt** (<code>str | None</code>) – System prompt for the LLM. If provided, it overrides the default system prompt.
1221  - **user_prompt** (<code>str | None</code>) – User prompt for the LLM. If provided, it overrides the default user prompt and is
1222    appended to the messages provided at runtime.
1223  - **kwargs** (<code>Any</code>) – Additional keyword arguments. These are used to fill template variables in the `user_prompt`
1224    (the keys must match template variable names).
1225  
1226  **Returns:**
1227  
1228  - <code>dict\[str, Any\]</code> – A dictionary with the following keys:
1229  - "messages": List of all messages exchanged during the LLM's run.
1230  - "last_message": The last message exchanged during the LLM's run.
1231  
1232  ## chat/openai
1233  
1234  ### OpenAIChatGenerator
1235  
1236  Completes chats using OpenAI's large language models (LLMs).
1237  
It works with the gpt-4 and gpt-5 series models and supports streaming responses
from the OpenAI API. It uses the [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
format for input and output.
1241  
1242  You can customize how the text is generated by passing parameters to the
1243  OpenAI API. Use the `**generation_kwargs` argument when you initialize
1244  the component or when you run it. Any parameter that works with
1245  `openai.ChatCompletion.create` will work here too.
1246  
1247  For details on OpenAI API parameters, see
1248  [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).
1249  
1250  ### Usage example
1251  
1252  ```python
1253  from haystack.components.generators.chat import OpenAIChatGenerator
1254  from haystack.dataclasses import ChatMessage
1255  
1256  messages = [ChatMessage.from_user("What's Natural Language Processing?")]
1257  
1258  client = OpenAIChatGenerator()
1259  response = client.run(messages)
1260  print(response)
1261  ```
1262  
1263  Output:
1264  
1265  ```
1266  {'replies':
1267      [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=
1268      [TextContent(text="Natural Language Processing (NLP) is a branch of artificial intelligence
1269          that focuses on enabling computers to understand, interpret, and generate human language in
1270          a way that is meaningful and useful.")],
1271       _name=None,
1272       _meta={'model': 'gpt-5-mini', 'index': 0, 'finish_reason': 'stop',
1273       'usage': {'prompt_tokens': 15, 'completion_tokens': 36, 'total_tokens': 51}})
1274      ]
1275  }
1276  ```
1277  
1278  #### __init__
1279  
1280  ```python
1281  __init__(
1282      api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
1283      model: str = "gpt-5-mini",
1284      streaming_callback: StreamingCallbackT | None = None,
1285      api_base_url: str | None = None,
1286      organization: str | None = None,
1287      generation_kwargs: dict[str, Any] | None = None,
1288      timeout: float | None = None,
1289      max_retries: int | None = None,
1290      tools: ToolsType | None = None,
1291      tools_strict: bool = False,
1292      http_client_kwargs: dict[str, Any] | None = None,
1293  )
1294  ```
1295  
Creates an instance of OpenAIChatGenerator. Unless specified otherwise in `model`, uses OpenAI's gpt-5-mini model.
1297  
1298  Before initializing the component, you can set the 'OPENAI_TIMEOUT' and 'OPENAI_MAX_RETRIES'
1299  environment variables to override the `timeout` and `max_retries` parameters respectively
1300  in the OpenAI client.
1301  
1302  **Parameters:**
1303  
1304  - **api_key** (<code>Secret</code>) – The OpenAI API key.
1305    You can set it with an environment variable `OPENAI_API_KEY`, or pass with this parameter
1306    during initialization.
1307  - **model** (<code>str</code>) – The name of the model to use.
1308  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
1309    The callback function accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
1310    as an argument.
1311  - **api_base_url** (<code>str | None</code>) – An optional base URL.
1312  - **organization** (<code>str | None</code>) – Your organization ID, defaults to `None`. See
1313    [production best practices](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
1314  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model. These parameters are sent directly to
1315    the OpenAI endpoint. See OpenAI [documentation](https://platform.openai.com/docs/api-reference/chat) for
1316    more details.
1317    Some of the supported parameters:
1318  - `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion,
1319    including visible output tokens and reasoning tokens.
1320  - `temperature`: What sampling temperature to use. Higher values mean the model will take more risks.
1321    Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
1322  - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
1323    considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
1324    comprising the top 10% probability mass are considered.
1325  - `n`: How many completions to generate for each prompt. For example, if the LLM gets 3 prompts and n is 2,
1326    it will generate two completions for each of the three prompts, ending up with 6 completions in total.
1327  - `stop`: One or more sequences after which the LLM should stop generating tokens.
- `presence_penalty`: The penalty applied if a token is already present in the text at all. Bigger values mean
  the model is less likely to repeat the same token.
- `frequency_penalty`: The penalty applied to a token based on how often it already appears in the text.
  Bigger values mean the model is less likely to repeat the same token.
1332  - `logit_bias`: Add a logit bias to specific tokens. The keys of the dictionary are tokens, and the
1333    values are the bias to add to that token.
1334  - `response_format`: A JSON schema or a Pydantic model that enforces the structure of the model's response.
1335    If provided, the output will always be validated against this
1336    format (unless the model returns a tool call).
1337    For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
1338    Notes:
  - This parameter accepts Pydantic models and JSON schemas for the latest models, starting from GPT-4o.
    Older models only support a basic version of structured outputs through `{"type": "json_object"}`.
1341      For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
1342    - For structured outputs with streaming,
1343      the `response_format` must be a JSON schema and not a Pydantic model.
1344  - **timeout** (<code>float | None</code>) – Timeout for OpenAI client calls. If not set, it defaults to either the
1345    `OPENAI_TIMEOUT` environment variable, or 30 seconds.
- **max_retries** (<code>int | None</code>) – Maximum number of retries to contact OpenAI after an internal error.
  If not set, it defaults to either the `OPENAI_MAX_RETRIES` environment variable, or 5.
1348  - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
1349  - **tools_strict** (<code>bool</code>) – Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
1350    the schema provided in the `parameters` field of the tool definition, but this may increase latency.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
1352    For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).
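
For instance, generation parameters and a structured-output format can be combined at initialization. A minimal sketch, assuming `OPENAI_API_KEY` is set and a model with structured-output support; `CityInfo` is a hypothetical Pydantic model:

```python
from pydantic import BaseModel

from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage

class CityInfo(BaseModel):
    # Hypothetical schema the reply must conform to
    city: str
    country: str

client = OpenAIChatGenerator(
    model="gpt-5-mini",
    generation_kwargs={
        "max_completion_tokens": 200,
        "response_format": CityInfo,  # validated structured output
    },
)
response = client.run([ChatMessage.from_user("Name one European capital.")])
print(response["replies"][0].text)  # JSON text matching the CityInfo schema
```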
1353  
1354  #### warm_up
1355  
1356  ```python
1357  warm_up()
1358  ```
1359  
1360  Warm up the OpenAI chat generator.
1361  
1362  This will warm up the tools registered in the chat generator.
1363  This method is idempotent and will only warm up the tools once.
1364  
1365  #### to_dict
1366  
1367  ```python
1368  to_dict() -> dict[str, Any]
1369  ```
1370  
1371  Serialize this component to a dictionary.
1372  
1373  **Returns:**
1374  
1375  - <code>dict\[str, Any\]</code> – The serialized component as a dictionary.
1376  
1377  #### from_dict
1378  
1379  ```python
1380  from_dict(data: dict[str, Any]) -> OpenAIChatGenerator
1381  ```
1382  
1383  Deserialize this component from a dictionary.
1384  
1385  **Parameters:**
1386  
1387  - **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.
1388  
1389  **Returns:**
1390  
1391  - <code>OpenAIChatGenerator</code> – The deserialized component instance.
1392  
1393  #### run
1394  
1395  ```python
1396  run(
1397      messages: list[ChatMessage],
1398      streaming_callback: StreamingCallbackT | None = None,
1399      generation_kwargs: dict[str, Any] | None = None,
1400      *,
1401      tools: ToolsType | None = None,
1402      tools_strict: bool | None = None
1403  ) -> dict[str, list[ChatMessage]]
1404  ```
1405  
1406  Invokes chat completion based on the provided messages and generation parameters.
1407  
1408  **Parameters:**
1409  
1410  - **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage instances representing the input messages.
1411  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
1412  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. These parameters will
1413    override the parameters passed during component initialization.
1414    For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat/create).
1415  - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
1416    If set, it will override the `tools` parameter provided during initialization.
1417  - **tools_strict** (<code>bool | None</code>) – Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
1418    the schema provided in the `parameters` field of the tool definition, but this may increase latency.
1419    If set, it will override the `tools_strict` parameter set during component initialization.
1420  
1421  **Returns:**
1422  
1423  - <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following key:
1424  - `replies`: A list containing the generated responses as ChatMessage instances.
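
Streaming can be enabled per call by passing a callback to `run`. A sketch, assuming `OPENAI_API_KEY` is set:

```python
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage, StreamingChunk

def print_chunk(chunk: StreamingChunk) -> None:
    # Called once per streamed delta as tokens arrive
    print(chunk.content, end="", flush=True)

client = OpenAIChatGenerator()
client.run(
    [ChatMessage.from_user("What's Natural Language Processing? Be brief.")],
    streaming_callback=print_chunk,
)
```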
1425  
1426  #### run_async
1427  
1428  ```python
1429  run_async(
1430      messages: list[ChatMessage],
1431      streaming_callback: StreamingCallbackT | None = None,
1432      generation_kwargs: dict[str, Any] | None = None,
1433      *,
1434      tools: ToolsType | None = None,
1435      tools_strict: bool | None = None
1436  ) -> dict[str, list[ChatMessage]]
1437  ```
1438  
1439  Asynchronously invokes chat completion based on the provided messages and generation parameters.
1440  
1441  This is the asynchronous version of the `run` method. It has the same parameters and return values
1442  but can be used with `await` in async code.
1443  
1444  **Parameters:**
1445  
1446  - **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage instances representing the input messages.
1447  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
1448    Must be a coroutine.
1449  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. These parameters will
1450    override the parameters passed during component initialization.
1451    For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat/create).
1452  - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
1453    If set, it will override the `tools` parameter provided during initialization.
1454  - **tools_strict** (<code>bool | None</code>) – Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
1455    the schema provided in the `parameters` field of the tool definition, but this may increase latency.
1456    If set, it will override the `tools_strict` parameter set during component initialization.
1457  
1458  **Returns:**
1459  
1460  - <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following key:
1461  - `replies`: A list containing the generated responses as ChatMessage instances.
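
In async code, the call looks like the following sketch, assuming `OPENAI_API_KEY` is set:

```python
import asyncio

from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage

async def main() -> None:
    client = OpenAIChatGenerator()
    result = await client.run_async(
        [ChatMessage.from_user("What's Natural Language Processing? Be brief.")]
    )
    print(result["replies"][0].text)

asyncio.run(main())
```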
1462  
1463  ## chat/openai_responses
1464  
1465  ### OpenAIResponsesChatGenerator
1466  
1467  Completes chats using OpenAI's Responses API.
1468  
1469  It works with the gpt-4 and o-series models and supports streaming responses
1470  from OpenAI API. It uses [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
1471  format in input and output.
1472  
1473  You can customize how the text is generated by passing parameters to the
1474  OpenAI API. Use the `**generation_kwargs` argument when you initialize
1475  the component or when you run it. Any parameter that works with
1476  `openai.Responses.create` will work here too.
1477  
1478  For details on OpenAI API parameters, see
1479  [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses).
1480  
1481  ### Usage example
1482  
1483  ```python
1484  from haystack.components.generators.chat import OpenAIResponsesChatGenerator
1485  from haystack.dataclasses import ChatMessage
1486  
1487  messages = [ChatMessage.from_user("What's Natural Language Processing?")]
1488  
1489  client = OpenAIResponsesChatGenerator(generation_kwargs={"reasoning": {"effort": "low", "summary": "auto"}})
1490  response = client.run(messages)
1491  print(response)
1492  ```
1493  
1494  #### __init__
1495  
1496  ```python
1497  __init__(
1498      *,
1499      api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
1500      model: str = "gpt-5-mini",
1501      streaming_callback: StreamingCallbackT | None = None,
1502      api_base_url: str | None = None,
1503      organization: str | None = None,
1504      generation_kwargs: dict[str, Any] | None = None,
1505      timeout: float | None = None,
1506      max_retries: int | None = None,
1507      tools: ToolsType | list[dict] | None = None,
1508      tools_strict: bool = False,
1509      http_client_kwargs: dict[str, Any] | None = None
1510  )
1511  ```
1512  
1513  Creates an instance of OpenAIResponsesChatGenerator. Uses OpenAI's gpt-5-mini by default.
1514  
1515  Before initializing the component, you can set the 'OPENAI_TIMEOUT' and 'OPENAI_MAX_RETRIES'
1516  environment variables to override the `timeout` and `max_retries` parameters respectively
1517  in the OpenAI client.
1518  
1519  **Parameters:**
1520  
1521  - **api_key** (<code>Secret</code>) – The OpenAI API key.
  You can set it with the `OPENAI_API_KEY` environment variable, or pass it with this parameter
  during initialization.
1524  - **model** (<code>str</code>) – The name of the model to use.
1525  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
1526    The callback function accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
1527    as an argument.
1528  - **api_base_url** (<code>str | None</code>) – An optional base URL.
1529  - **organization** (<code>str | None</code>) – Your organization ID, defaults to `None`. See
1530    [production best practices](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
1531  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model. These parameters are sent
1532    directly to the OpenAI endpoint.
1533    See OpenAI [documentation](https://platform.openai.com/docs/api-reference/responses) for
1534    more details.
1535    Some of the supported parameters:
1536  - `temperature`: What sampling temperature to use. Higher values like 0.8 will make the output more random,
1537    while lower values like 0.2 will make it more focused and deterministic.
1538  - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
1539    considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
1540    comprising the top 10% probability mass are considered.
1541  - `previous_response_id`: The ID of the previous response.
1542    Use this to create multi-turn conversations.
1543  - `text_format`: A Pydantic model that enforces the structure of the model's response.
1544    If provided, the output will always be validated against this
1545    format (unless the model returns a tool call).
1546    For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
1547  - `text`: A JSON schema that enforces the structure of the model's response.
1548    If provided, the output will always be validated against this
1549    format (unless the model returns a tool call).
1550    Notes:
1551    - Both JSON Schema and Pydantic models are supported for latest models starting from GPT-4o.
  - If both are provided, `text_format` takes precedence and the JSON schema passed to `text` is ignored.
1553    - Currently, this component doesn't support streaming for structured outputs.
  - Older models only support a basic version of structured outputs through `{"type": "json_object"}`.
1555      For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
1556  - `reasoning`: A dictionary of parameters for reasoning. For example:
  - `summary`: Controls the reasoning summary returned with the response, for example `auto`, `concise`, or `detailed`.
1558    - `effort`: The level of effort to put into the reasoning. Can be `low`, `medium` or `high`.
1559    - `generate_summary`: Whether to generate a summary of the reasoning.
    Note: OpenAI does not return the reasoning tokens, but you can view the summary if it's enabled.
1561      For details, see the [OpenAI Reasoning documentation](https://platform.openai.com/docs/guides/reasoning).
1562  - **timeout** (<code>float | None</code>) – Timeout for OpenAI client calls. If not set, it defaults to either the
1563    `OPENAI_TIMEOUT` environment variable, or 30 seconds.
- **max_retries** (<code>int | None</code>) – Maximum number of retries to contact OpenAI after an internal error.
  If not set, it defaults to either the `OPENAI_MAX_RETRIES` environment variable, or 5.
- **tools** (<code>ToolsType | list\[dict\] | None</code>) – The tools that the model can use to prepare calls. This parameter accepts either a
  mixed list of Haystack `Tool` and `Toolset` objects, or a list of OpenAI/MCP tool definition dictionaries.
1569    Note: You cannot pass OpenAI/MCP tools and Haystack tools together.
1570    For details on tool support, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create#responses-create-tools).
- **tools_strict** (<code>bool</code>) – Whether to enable strict schema adherence for tool calls. If set to `False`, the model may not exactly
  follow the schema provided in the `parameters` field of the tool definition. In the Responses API, tool calls
  are strict by default.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
1575    For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).
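
Multi-turn conversations can be carried server-side via `previous_response_id`. A sketch, assuming `OPENAI_API_KEY` is set and that the response id is exposed in the reply's `meta` under the `"id"` key:

```python
from haystack.components.generators.chat import OpenAIResponsesChatGenerator
from haystack.dataclasses import ChatMessage

client = OpenAIResponsesChatGenerator()
first = client.run([ChatMessage.from_user("Pick a number between 1 and 10.")])

# Assumption: the server-side response id is available in the reply's meta as "id".
prev_id = first["replies"][0].meta.get("id")

follow_up = client.run(
    [ChatMessage.from_user("Now double it.")],
    generation_kwargs={"previous_response_id": prev_id},
)
print(follow_up["replies"][0].text)
```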
1576  
1577  #### warm_up
1578  
1579  ```python
1580  warm_up()
1581  ```
1582  
1583  Warm up the OpenAI responses chat generator.
1584  
1585  This will warm up the tools registered in the chat generator.
1586  This method is idempotent and will only warm up the tools once.
1587  
1588  #### to_dict
1589  
1590  ```python
1591  to_dict() -> dict[str, Any]
1592  ```
1593  
1594  Serialize this component to a dictionary.
1595  
1596  **Returns:**
1597  
1598  - <code>dict\[str, Any\]</code> – The serialized component as a dictionary.
1599  
1600  #### from_dict
1601  
1602  ```python
1603  from_dict(data: dict[str, Any]) -> OpenAIResponsesChatGenerator
1604  ```
1605  
1606  Deserialize this component from a dictionary.
1607  
1608  **Parameters:**
1609  
1610  - **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.
1611  
1612  **Returns:**
1613  
1614  - <code>OpenAIResponsesChatGenerator</code> – The deserialized component instance.
1615  
1616  #### run
1617  
1618  ```python
1619  run(
1620      messages: list[ChatMessage],
1621      *,
1622      streaming_callback: StreamingCallbackT | None = None,
1623      generation_kwargs: dict[str, Any] | None = None,
1624      tools: ToolsType | list[dict] | None = None,
1625      tools_strict: bool | None = None
1626  ) -> dict[str, list[ChatMessage]]
1627  ```
1628  
1629  Invokes response generation based on the provided messages and generation parameters.
1630  
1631  **Parameters:**
1632  
1633  - **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage instances representing the input messages.
1634  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
1635  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. These parameters will
1636    override the parameters passed during component initialization.
1637    For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create).
- **tools** (<code>ToolsType | list\[dict\] | None</code>) – The tools that the model can use to prepare calls. If set, it overrides the
  `tools` parameter set during component initialization. This parameter accepts either a mixed list of
  Haystack `Tool` and `Toolset` objects, or a list of OpenAI/MCP tool definition dictionaries.
1642    Note: You cannot pass OpenAI/MCP tools and Haystack tools together.
1643    For details on tool support, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create#responses-create-tools).
- **tools_strict** (<code>bool | None</code>) – Whether to enable strict schema adherence for tool calls. If set to `False`, the model may not exactly
  follow the schema provided in the `parameters` field of the tool definition. In the Responses API, tool calls
  are strict by default.
  If set, it overrides the `tools_strict` parameter set during component initialization.
1648  
1649  **Returns:**
1650  
1651  - <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following key:
1652  - `replies`: A list containing the generated responses as ChatMessage instances.
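
Raw OpenAI tool definitions can also be passed at run time. A sketch using OpenAI's built-in web search tool (the `"web_search_preview"` tool type is taken from OpenAI's Responses API reference; `OPENAI_API_KEY` is assumed to be set):

```python
from haystack.components.generators.chat import OpenAIResponsesChatGenerator
from haystack.dataclasses import ChatMessage

client = OpenAIResponsesChatGenerator()
response = client.run(
    [ChatMessage.from_user("What's new in Haystack this month?")],
    # A raw OpenAI tool definition dict; cannot be mixed with Haystack Tool objects.
    tools=[{"type": "web_search_preview"}],
)
print(response["replies"][0].text)
```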
1653  
1654  #### run_async
1655  
1656  ```python
1657  run_async(
1658      messages: list[ChatMessage],
1659      *,
1660      streaming_callback: StreamingCallbackT | None = None,
1661      generation_kwargs: dict[str, Any] | None = None,
1662      tools: ToolsType | list[dict] | None = None,
1663      tools_strict: bool | None = None
1664  ) -> dict[str, list[ChatMessage]]
1665  ```
1666  
1667  Asynchronously invokes response generation based on the provided messages and generation parameters.
1668  
1669  This is the asynchronous version of the `run` method. It has the same parameters and return values
1670  but can be used with `await` in async code.
1671  
1672  **Parameters:**
1673  
1674  - **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage instances representing the input messages.
1675  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
1676    Must be a coroutine.
1677  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. These parameters will
1678    override the parameters passed during component initialization.
1679    For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create).
- **tools** (<code>ToolsType | list\[dict\] | None</code>) – The tools that the model can use to prepare calls. If set, it overrides the
  `tools` parameter set during component initialization. This parameter accepts either a mixed list of
  Haystack `Tool` and `Toolset` objects, or a list of OpenAI/MCP tool definition dictionaries.
1684    Note: You cannot pass OpenAI/MCP tools and Haystack tools together.
- **tools_strict** (<code>bool | None</code>) – Whether to enable strict schema adherence for tool calls. If set to `False`, the model may not exactly
  follow the schema provided in the `parameters` field of the tool definition. In the Responses API, tool calls
  are strict by default.
  If set, it overrides the `tools_strict` parameter set during component initialization.
1688  
1689  **Returns:**
1690  
1691  - <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following key:
1692  - `replies`: A list containing the generated responses as ChatMessage instances.
1693  
1694  ## hugging_face_api
1695  
1696  ### HuggingFaceAPIGenerator
1697  
1698  Generates text using Hugging Face APIs.
1699  
1700  Use it with the following Hugging Face APIs:
1701  
1702  - [Paid Inference Endpoints](https://huggingface.co/inference-endpoints)
1703  - [Self-hosted Text Generation Inference](https://github.com/huggingface/text-generation-inference)
1704  
1705  **Note:** As of July 2025, the Hugging Face Inference API no longer offers generative models through the
1706  `text_generation` endpoint. Generative models are now only available through providers supporting the
1707  `chat_completion` endpoint. As a result, this component might no longer work with the Hugging Face Inference API.
1708  Use the `HuggingFaceAPIChatGenerator` component, which supports the `chat_completion` endpoint.
1709  
1710  ### Usage examples
1711  
1712  #### With Hugging Face Inference Endpoints
1713  
1714  ```python
1715  from haystack.components.generators import HuggingFaceAPIGenerator
1716  from haystack.utils import Secret
1717  
1718  generator = HuggingFaceAPIGenerator(api_type="inference_endpoints",
1719                                      api_params={"url": "<your-inference-endpoint-url>"},
1720                                      token=Secret.from_token("<your-api-key>"))
1721  
1722  result = generator.run(prompt="What's Natural Language Processing?")
1723  print(result)
1724  ```
1725  
1726  #### With self-hosted text generation inference
1727  
1728  ```python
1729  from haystack.components.generators import HuggingFaceAPIGenerator
1730  
1731  generator = HuggingFaceAPIGenerator(api_type="text_generation_inference",
1732                                      api_params={"url": "http://localhost:8080"})
1733  
1734  result = generator.run(prompt="What's Natural Language Processing?")
1735  print(result)
1736  ```
1737  
1738  #### With the free serverless inference API
1739  
Be aware that this example might not work, as the Hugging Face Inference API no longer offers models that support the
`text_generation` endpoint. Use the `HuggingFaceAPIChatGenerator` for generative models through the
`chat_completion` endpoint.
1743  
1744  ```python
1745  from haystack.components.generators import HuggingFaceAPIGenerator
1746  from haystack.utils import Secret
1747  
1748  generator = HuggingFaceAPIGenerator(api_type="serverless_inference_api",
1749                                      api_params={"model": "HuggingFaceH4/zephyr-7b-beta"},
1750                                      token=Secret.from_token("<your-api-key>"))
1751  
1752  result = generator.run(prompt="What's Natural Language Processing?")
1753  print(result)
1754  ```
1755  
1756  #### __init__
1757  
1758  ```python
1759  __init__(
1760      api_type: HFGenerationAPIType | str,
1761      api_params: dict[str, str],
1762      token: Secret | None = Secret.from_env_var(
1763          ["HF_API_TOKEN", "HF_TOKEN"], strict=False
1764      ),
1765      generation_kwargs: dict[str, Any] | None = None,
1766      stop_words: list[str] | None = None,
1767      streaming_callback: StreamingCallbackT | None = None,
1768  )
1769  ```
1770  
1771  Initialize the HuggingFaceAPIGenerator instance.
1772  
1773  **Parameters:**
1774  
1775  - **api_type** (<code>HFGenerationAPIType | str</code>) – The type of Hugging Face API to use. Available types:
1776  - `text_generation_inference`: See [TGI](https://github.com/huggingface/text-generation-inference).
1777  - `inference_endpoints`: See [Inference Endpoints](https://huggingface.co/inference-endpoints).
1778  - `serverless_inference_api`: See [Serverless Inference API](https://huggingface.co/inference-api).
1779    This might no longer work due to changes in the models offered in the Hugging Face Inference API.
1780    Please use the `HuggingFaceAPIChatGenerator` component instead.
1781  - **api_params** (<code>dict\[str, str\]</code>) – A dictionary with the following keys:
1782  - `model`: Hugging Face model ID. Required when `api_type` is `SERVERLESS_INFERENCE_API`.
1783  - `url`: URL of the inference endpoint. Required when `api_type` is `INFERENCE_ENDPOINTS` or
1784    `TEXT_GENERATION_INFERENCE`.
- Other parameters specific to the chosen API type, such as `timeout`, `headers`, and `provider`.
1786  - **token** (<code>Secret | None</code>) – The Hugging Face token to use as HTTP bearer authorization.
1787    Check your HF token in your [account settings](https://huggingface.co/settings/tokens).
1788  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary with keyword arguments to customize text generation. Some examples: `max_new_tokens`,
1789    `temperature`, `top_k`, `top_p`.
  For details, see the [Hugging Face documentation](https://huggingface.co/docs/huggingface_hub/en/package_reference/inference_client#huggingface_hub.InferenceClient.text_generation).
1792  - **stop_words** (<code>list\[str\] | None</code>) – An optional list of strings representing the stop words.
1793  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses.
1794  
1795  #### to_dict
1796  
1797  ```python
1798  to_dict() -> dict[str, Any]
1799  ```
1800  
1801  Serialize this component to a dictionary.
1802  
1803  **Returns:**
1804  
1805  - <code>dict\[str, Any\]</code> – A dictionary containing the serialized component.
1806  
1807  #### from_dict
1808  
1809  ```python
1810  from_dict(data: dict[str, Any]) -> HuggingFaceAPIGenerator
1811  ```
1812  
1813  Deserialize this component from a dictionary.
1814  
1815  #### run
1816  
1817  ```python
1818  run(
1819      prompt: str,
1820      streaming_callback: StreamingCallbackT | None = None,
1821      generation_kwargs: dict[str, Any] | None = None,
1822  )
1823  ```
1824  
1825  Invoke the text generation inference for the given prompt and generation parameters.
1826  
1827  **Parameters:**
1828  
1829  - **prompt** (<code>str</code>) – A string representing the prompt.
1830  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
1831  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation.
1832  
1833  **Returns:**
1834  
- – A dictionary with the generated replies and metadata. Both are lists of length n.
- `replies`: A list of strings representing the generated replies.
- `meta`: A list of dictionaries containing the metadata for each reply.
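
Per-call `generation_kwargs` override those set at initialization. A sketch with placeholder credentials, in the style of the usage examples above:

```python
from haystack.components.generators import HuggingFaceAPIGenerator
from haystack.utils import Secret

generator = HuggingFaceAPIGenerator(
    api_type="inference_endpoints",
    api_params={"url": "<your-inference-endpoint-url>"},
    token=Secret.from_token("<your-api-key>"),
    generation_kwargs={"max_new_tokens": 256},
)

# These kwargs override the ones set at initialization, for this call only.
result = generator.run(
    prompt="Summarize Natural Language Processing in one sentence.",
    generation_kwargs={"max_new_tokens": 40, "temperature": 0.2},
)
print(result["replies"][0])
```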
1837  
1838  ## hugging_face_local
1839  
1840  ### HuggingFaceLocalGenerator
1841  
1842  Generates text using models from Hugging Face that run locally.
1843  
1844  LLMs running locally may need powerful hardware.
1845  
1846  ### Usage example
1847  
1848  ```python
1849  from haystack.components.generators import HuggingFaceLocalGenerator
1850  
1851  generator = HuggingFaceLocalGenerator(
1852      model="Qwen/Qwen3-0.6B",
1853      task="text-generation",
1854      generation_kwargs={"max_new_tokens": 100, "temperature": 0.9}
1855  )
1856  
1857  print(generator.run("Who is the best American actor?"))
1858  # {'replies': ['John Cusack']}
1859  ```
1860  
1861  #### __init__
1862  
1863  ```python
1864  __init__(
1865      model: str = "Qwen/Qwen3-0.6B",
1866      task: Literal["text-generation", "text2text-generation"] | None = None,
1867      device: ComponentDevice | None = None,
1868      token: Secret | None = Secret.from_env_var(
1869          ["HF_API_TOKEN", "HF_TOKEN"], strict=False
1870      ),
1871      generation_kwargs: dict[str, Any] | None = None,
1872      huggingface_pipeline_kwargs: dict[str, Any] | None = None,
1873      stop_words: list[str] | None = None,
1874      streaming_callback: StreamingCallbackT | None = None,
1875  )
1876  ```
1877  
1878  Creates an instance of a HuggingFaceLocalGenerator.
1879  
1880  **Parameters:**
1881  
1882  - **model** (<code>str</code>) – The Hugging Face text generation model name or path.
1883  - **task** (<code>Literal['text-generation', 'text2text-generation'] | None</code>) – The task for the Hugging Face pipeline. Possible options:
1884  - `text-generation`: Supported by decoder models, like GPT.
1885  - `text2text-generation`: Deprecated as of Transformers v5; use `text-generation` instead.
1886    Previously supported by encoder–decoder models such as T5.
1887    If the task is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
1888    If not specified, the component calls the Hugging Face API to infer the task from the model name.
1889  - **device** (<code>ComponentDevice | None</code>) – The device for loading the model. If `None`, automatically selects the default device.
1890    If a device or device map is specified in `huggingface_pipeline_kwargs`, it overrides this parameter.
1891  - **token** (<code>Secret | None</code>) – The token to use as HTTP bearer authorization for remote files.
1892    If the token is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
1893  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary with keyword arguments to customize text generation.
1894    Some examples: `max_length`, `max_new_tokens`, `temperature`, `top_k`, `top_p`.
1895    See Hugging Face's documentation for more information:
1896  - [customize-text-generation](https://huggingface.co/docs/transformers/main/en/generation_strategies#customize-text-generation)
1897  - [transformers.GenerationConfig](https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.GenerationConfig)
1898  - **huggingface_pipeline_kwargs** (<code>dict\[str, Any\] | None</code>) – Dictionary with keyword arguments to initialize the
1899    Hugging Face pipeline for text generation.
1900    These keyword arguments provide fine-grained control over the Hugging Face pipeline.
1901    In case of duplication, these kwargs override `model`, `task`, `device`, and `token` init parameters.
1902    For available kwargs, see [Hugging Face documentation](https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.pipeline.task).
1903    In this dictionary, you can also include `model_kwargs` to specify the kwargs for model initialization:
1904    [transformers.PreTrainedModel.from_pretrained](https://huggingface.co/docs/transformers/en/main_classes/model#transformers.PreTrainedModel.from_pretrained)
1905  - **stop_words** (<code>list\[str\] | None</code>) – If the model generates a stop word, the generation stops.
1906    If you provide this parameter, don't specify the `stopping_criteria` in `generation_kwargs`.
1907    For some chat models, the output includes both the new text and the original prompt.
1908    In these cases, make sure your prompt has no stop words.
1909  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses.
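
The `stop_words` behavior can be pictured as truncating the output at the first stop word. A minimal pure-Python sketch of that idea (Haystack itself implements this through the underlying `transformers` generation utilities, not a helper like this):

```python
def truncate_at_stop_word(text: str, stop_words: list[str]) -> str:
    """Cut the text at the first occurrence of any stop word."""
    cut = len(text)
    for word in stop_words:
        idx = text.find(word)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

print(truncate_at_stop_word("Paris is the capital. END extra tokens", ["END"]))
# → "Paris is the capital. "
```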
1910  
1911  #### warm_up
1912  
1913  ```python
1914  warm_up()
1915  ```
1916  
1917  Initializes the component.
1918  
1919  #### to_dict
1920  
1921  ```python
1922  to_dict() -> dict[str, Any]
1923  ```
1924  
1925  Serializes the component to a dictionary.
1926  
1927  **Returns:**
1928  
1929  - <code>dict\[str, Any\]</code> – Dictionary with serialized data.
1930  
1931  #### from_dict
1932  
1933  ```python
1934  from_dict(data: dict[str, Any]) -> HuggingFaceLocalGenerator
1935  ```
1936  
1937  Deserializes the component from a dictionary.
1938  
1939  **Parameters:**
1940  
1941  - **data** (<code>dict\[str, Any\]</code>) – The dictionary to deserialize from.
1942  
1943  **Returns:**
1944  
1945  - <code>HuggingFaceLocalGenerator</code> – The deserialized component.
1946  
1947  #### run
1948  
1949  ```python
1950  run(
1951      prompt: str,
1952      streaming_callback: StreamingCallbackT | None = None,
1953      generation_kwargs: dict[str, Any] | None = None,
1954  )
1955  ```
1956  
1957  Run the text generation model on the given prompt.
1958  
1959  **Parameters:**
1960  
1961  - **prompt** (<code>str</code>) – A string representing the prompt.
1962  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
1963  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation.
1964  
1965  **Returns:**
1966  
1967  - – A dictionary containing the generated replies.
1968  - replies: A list of strings representing the generated replies.
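
The dictionary returned by `run` has the shape described above; a short sketch of reading it (the reply text here is invented for illustration):

```python
# Hypothetical output in the documented shape: a dict with a "replies" list.
result = {"replies": ["Natural Language Processing (NLP) is a field of AI ..."]}

first_reply = result["replies"][0]
print(first_reply)
```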
1969  
1970  ## openai
1971  
1972  ### OpenAIGenerator
1973  
1974  Generates text using OpenAI's large language models (LLMs).
1975  
1976  It works with the gpt-4 and gpt-5 series models and supports streaming responses
1977  from OpenAI API. It uses strings as input and output.
1978  
1979  You can customize how the text is generated by passing parameters to the
1980  OpenAI API. Use the `**generation_kwargs` argument when you initialize
1981  the component or when you run it. Any parameter that works with
1982  `openai.ChatCompletion.create` will work here too.
1983  
1984  For details on OpenAI API parameters, see
1985  [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).
1986  
1987  ### Usage example
1988  
1989  ```python
1990  from haystack.components.generators import OpenAIGenerator
1991  client = OpenAIGenerator()
1992  response = client.run("What's Natural Language Processing? Be brief.")
1993  print(response)
1994  
1995  >> {'replies': ['Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on
1996  >> the interaction between computers and human language. It involves enabling computers to understand, interpret,
1997  >> and respond to natural human language in a way that is both meaningful and useful.'], 'meta': [{'model':
1998  >> 'gpt-5-mini', 'index': 0, 'finish_reason': 'stop', 'usage': {'prompt_tokens': 16,
1999  >> 'completion_tokens': 49, 'total_tokens': 65}}]}
2000  ```
2001  
2002  #### __init__
2003  
2004  ```python
2005  __init__(
2006      api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
2007      model: str = "gpt-5-mini",
2008      streaming_callback: StreamingCallbackT | None = None,
2009      api_base_url: str | None = None,
2010      organization: str | None = None,
2011      system_prompt: str | None = None,
2012      generation_kwargs: dict[str, Any] | None = None,
2013      timeout: float | None = None,
2014      max_retries: int | None = None,
2015      http_client_kwargs: dict[str, Any] | None = None,
2016  )
2017  ```
2018  
Creates an instance of OpenAIGenerator. Unless specified otherwise in `model`, uses OpenAI's gpt-5-mini.
2020  
By setting the `OPENAI_TIMEOUT` and `OPENAI_MAX_RETRIES` environment variables, you can change the timeout
and max_retries parameters in the OpenAI client.
2023  
2024  **Parameters:**
2025  
2026  - **api_key** (<code>Secret</code>) – The OpenAI API key to connect to OpenAI.
2027  - **model** (<code>str</code>) – The name of the model to use.
2028  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
2029    The callback function accepts StreamingChunk as an argument.
2030  - **api_base_url** (<code>str | None</code>) – An optional base URL.
2031  - **organization** (<code>str | None</code>) – The Organization ID, defaults to `None`.
2032  - **system_prompt** (<code>str | None</code>) – The system prompt to use for text generation. If not provided, the system prompt is
2033    omitted, and the default system prompt of the model is used.
2034  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model. These parameters are all sent directly to
2035    the OpenAI endpoint. See OpenAI [documentation](https://platform.openai.com/docs/api-reference/chat) for
2036    more details.
2037    Some of the supported parameters:
2038  - `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion,
2039    including visible output tokens and reasoning tokens.
2040  - `temperature`: What sampling temperature to use. Higher values mean the model will take more risks.
2041    Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
2042  - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
2043    considers the results of the tokens with top_p probability mass. So, 0.1 means only the tokens
2044    comprising the top 10% probability mass are considered.
2045  - `n`: How many completions to generate for each prompt. For example, if the LLM gets 3 prompts and n is 2,
2046    it will generate two completions for each of the three prompts, ending up with 6 completions in total.
2047  - `stop`: One or more sequences after which the LLM should stop generating tokens.
- `presence_penalty`: The penalty to apply if a token is already present in the text at all. Higher values
  make the model less likely to repeat the same token.
- `frequency_penalty`: The penalty to apply based on how often a token has already been generated in the
  text. Higher values make the model less likely to repeat the same token.
2052  - `logit_bias`: Add a logit bias to specific tokens. The keys of the dictionary are tokens, and the
2053    values are the bias to add to that token.
2054  - **timeout** (<code>float | None</code>) – Timeout for OpenAI Client calls, if not set it is inferred from the `OPENAI_TIMEOUT` environment variable
2055    or set to 30.
2056  - **max_retries** (<code>int | None</code>) – Maximum retries to establish contact with OpenAI if it returns an internal error, if not set it is inferred
2057    from the `OPENAI_MAX_RETRIES` environment variable or set to 5.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
2059    For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).
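
Parameters passed to `run` at run time take precedence over these init-time `generation_kwargs`; the effective behavior resembles a plain dictionary merge. An illustrative sketch, not Haystack internals:

```python
# generation_kwargs set at init time vs. at run time: run-time values win.
init_kwargs = {"temperature": 0.7, "max_completion_tokens": 256}
runtime_kwargs = {"temperature": 0.2}

merged = {**init_kwargs, **runtime_kwargs}
print(merged)  # → {'temperature': 0.2, 'max_completion_tokens': 256}
```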
2060  
2061  #### to_dict
2062  
2063  ```python
2064  to_dict() -> dict[str, Any]
2065  ```
2066  
2067  Serialize this component to a dictionary.
2068  
2069  **Returns:**
2070  
2071  - <code>dict\[str, Any\]</code> – The serialized component as a dictionary.
2072  
2073  #### from_dict
2074  
2075  ```python
2076  from_dict(data: dict[str, Any]) -> OpenAIGenerator
2077  ```
2078  
2079  Deserialize this component from a dictionary.
2080  
2081  **Parameters:**
2082  
2083  - **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.
2084  
2085  **Returns:**
2086  
2087  - <code>OpenAIGenerator</code> – The deserialized component instance.
2088  
2089  #### run
2090  
2091  ```python
2092  run(
2093      prompt: str,
2094      system_prompt: str | None = None,
2095      streaming_callback: StreamingCallbackT | None = None,
2096      generation_kwargs: dict[str, Any] | None = None,
2097  ) -> dict[str, list[str] | list[dict[str, Any]]]
2098  ```
2099  
Invoke the text generation inference based on the provided prompt and generation parameters.
2101  
2102  **Parameters:**
2103  
2104  - **prompt** (<code>str</code>) – The string prompt to use for text generation.
- **system_prompt** (<code>str | None</code>) – The system prompt to use for text generation. If omitted, the system prompt defined at
  initialization time (if any) is used.
2107  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
2108  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. These parameters will potentially override the parameters
2109    passed in the `__init__` method. For more details on the parameters supported by the OpenAI API, refer to
2110    the OpenAI [documentation](https://platform.openai.com/docs/api-reference/chat/create).
2111  
2112  **Returns:**
2113  
- <code>dict\[str, list\[str\] | list\[dict\[str, Any\]\]\]</code> – A dictionary containing a list of the generated responses (`replies`) and a list of dictionaries
  with the metadata for each response (`meta`).
2116  
2117  ## openai_dalle
2118  
2119  ### DALLEImageGenerator
2120  
2121  Generates images using OpenAI's DALL-E model.
2122  
2123  For details on OpenAI API parameters, see
2124  [OpenAI documentation](https://platform.openai.com/docs/api-reference/images/create).
2125  
2126  ### Usage example
2127  
2128  ```python
2129  from haystack.components.generators import DALLEImageGenerator
2130  image_generator = DALLEImageGenerator()
2131  response = image_generator.run("Show me a picture of a black cat.")
2132  print(response)
2133  ```
2134  
2135  #### __init__
2136  
2137  ```python
2138  __init__(
2139      model: str = "dall-e-3",
2140      quality: Literal["standard", "hd"] = "standard",
2141      size: Literal[
2142          "256x256", "512x512", "1024x1024", "1792x1024", "1024x1792"
2143      ] = "1024x1024",
2144      response_format: Literal["url", "b64_json"] = "url",
2145      api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
2146      api_base_url: str | None = None,
2147      organization: str | None = None,
2148      timeout: float | None = None,
2149      max_retries: int | None = None,
2150      http_client_kwargs: dict[str, Any] | None = None,
2151  )
2152  ```
2153  
2154  Creates an instance of DALLEImageGenerator. Unless specified otherwise in `model`, uses OpenAI's dall-e-3.
2155  
2156  **Parameters:**
2157  
2158  - **model** (<code>str</code>) – The model to use for image generation. Can be "dall-e-2" or "dall-e-3".
2159  - **quality** (<code>Literal['standard', 'hd']</code>) – The quality of the generated image. Can be "standard" or "hd".
2160  - **size** (<code>Literal['256x256', '512x512', '1024x1024', '1792x1024', '1024x1792']</code>) – The size of the generated images.
2161    Must be one of 256x256, 512x512, or 1024x1024 for dall-e-2.
2162    Must be one of 1024x1024, 1792x1024, or 1024x1792 for dall-e-3 models.
2163  - **response_format** (<code>Literal['url', 'b64_json']</code>) – The format of the response. Can be "url" or "b64_json".
2164  - **api_key** (<code>Secret</code>) – The OpenAI API key to connect to OpenAI.
2165  - **api_base_url** (<code>str | None</code>) – An optional base URL.
2166  - **organization** (<code>str | None</code>) – The Organization ID, defaults to `None`.
2167  - **timeout** (<code>float | None</code>) – Timeout for OpenAI Client calls. If not set, it is inferred from the `OPENAI_TIMEOUT` environment variable
2168    or set to 30.
2169  - **max_retries** (<code>int | None</code>) – Maximum retries to establish contact with OpenAI if it returns an internal error. If not set, it is inferred
2170    from the `OPENAI_MAX_RETRIES` environment variable or set to 5.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
2172    For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).
2173  
2174  #### warm_up
2175  
2176  ```python
2177  warm_up() -> None
2178  ```
2179  
2180  Warm up the OpenAI client.
2181  
2182  #### run
2183  
2184  ```python
2185  run(
2186      prompt: str,
2187      size: (
2188          Literal["256x256", "512x512", "1024x1024", "1792x1024", "1024x1792"]
2189          | None
2190      ) = None,
2191      quality: Literal["standard", "hd"] | None = None,
2192      response_format: Literal["url", "b64_json"] | None = None,
2193  )
2194  ```
2195  
2196  Invokes the image generation inference based on the provided prompt and generation parameters.
2197  
2198  **Parameters:**
2199  
2200  - **prompt** (<code>str</code>) – The prompt to generate the image.
2201  - **size** (<code>Literal['256x256', '512x512', '1024x1024', '1792x1024', '1024x1792'] | None</code>) – If provided, overrides the size provided during initialization.
2202  - **quality** (<code>Literal['standard', 'hd'] | None</code>) – If provided, overrides the quality provided during initialization.
2203  - **response_format** (<code>Literal['url', 'b64_json'] | None</code>) – If provided, overrides the response format provided during initialization.
2204  
2205  **Returns:**
2206  
2207  - – A dictionary containing the generated list of images and the revised prompt.
2208    Depending on the `response_format` parameter, the list of images can be URLs or base64 encoded JSON strings.
2209    The revised prompt is the prompt that was used to generate the image, if there was any revision
2210    to the prompt made by OpenAI.
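
A sketch of handling the two documented `response_format` options. The key names in `response` below are assumptions for illustration, not the guaranteed output keys:

```python
import base64

# Hypothetical response shape; the real keys may differ.
response = {"images": ["aGVsbG8="], "revised_prompt": "A black cat."}

response_format = "b64_json"
if response_format == "b64_json":
    raw = base64.b64decode(response["images"][0])  # bytes of the image file
else:  # "url"
    raw = response["images"][0]  # a URL string to download

print(raw)
```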
2211  
2212  #### to_dict
2213  
2214  ```python
2215  to_dict() -> dict[str, Any]
2216  ```
2217  
2218  Serialize this component to a dictionary.
2219  
2220  **Returns:**
2221  
2222  - <code>dict\[str, Any\]</code> – The serialized component as a dictionary.
2223  
2224  #### from_dict
2225  
2226  ```python
2227  from_dict(data: dict[str, Any]) -> DALLEImageGenerator
2228  ```
2229  
2230  Deserialize this component from a dictionary.
2231  
2232  **Parameters:**
2233  
2234  - **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.
2235  
2236  **Returns:**
2237  
2238  - <code>DALLEImageGenerator</code> – The deserialized component instance.
2239  
2240  ## utils
2241  
2242  ### print_streaming_chunk
2243  
2244  ```python
2245  print_streaming_chunk(chunk: StreamingChunk) -> None
2246  ```
2247  
2248  Callback function to handle and display streaming output chunks.
2249  
2250  This function processes a `StreamingChunk` object by:
2251  
2252  - Printing tool call metadata (if any), including function names and arguments, as they arrive.
2253  - Printing tool call results when available.
2254  - Printing the main content (e.g., text tokens) of the chunk as it is received.
2255  
2256  The function outputs data directly to stdout and flushes output buffers to ensure immediate display during
2257  streaming.
2258  
2259  **Parameters:**
2260  
2261  - **chunk** (<code>StreamingChunk</code>) – A chunk of streaming data containing content and optional metadata, such as tool calls and
2262    tool results.