   1  ---
   2  title: "Generators"
   3  id: generators-api
   4  description: "Enables text generation using LLMs."
   5  slug: "/generators-api"
   6  ---
   7  
   8  
   9  ## azure
  10  
  11  ### AzureOpenAIGenerator
  12  
  13  Bases: <code>OpenAIGenerator</code>
  14  
  15  Generates text using OpenAI's large language models (LLMs).
  16  
   17  It works with gpt-4-type models and supports streaming responses
   18  from the OpenAI API.
  19  
  20  You can customize how the text is generated by passing parameters to the
  21  OpenAI API. Use the `**generation_kwargs` argument when you initialize
  22  the component or when you run it. Any parameter that works with
  23  `openai.ChatCompletion.create` will work here too.
  24  
  25  For details on OpenAI API parameters, see
  26  [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).
  27  
  28  ### Usage example
  29  
  30  ```python
  31  from haystack.components.generators import AzureOpenAIGenerator
  32  from haystack.utils import Secret
  33  client = AzureOpenAIGenerator(
   34      azure_endpoint="<your Azure endpoint, e.g. https://your-company.azure.openai.com/>",
   35      api_key=Secret.from_token("<your-api-key>"),
   36      azure_deployment="<your deployment name, usually the model name, e.g. gpt-4.1-mini>")
  37  response = client.run("What's Natural Language Processing? Be brief.")
  38  print(response)
  39  ```
  40  
  41  ```
  42  >> {'replies': ['Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on
  43  >> the interaction between computers and human language. It involves enabling computers to understand, interpret,
  44  >> and respond to natural human language in a way that is both meaningful and useful.'], 'meta': [{'model':
  45  >> 'gpt-4.1-mini', 'index': 0, 'finish_reason': 'stop', 'usage': {'prompt_tokens': 16,
  46  >> 'completion_tokens': 49, 'total_tokens': 65}}]}
  47  ```
  48  
  49  #### __init__
  50  
  51  ```python
  52  __init__(
  53      azure_endpoint: str | None = None,
  54      api_version: str | None = "2024-12-01-preview",
  55      azure_deployment: str | None = "gpt-4.1-mini",
  56      api_key: Secret | None = Secret.from_env_var(
  57          "AZURE_OPENAI_API_KEY", strict=False
  58      ),
  59      azure_ad_token: Secret | None = Secret.from_env_var(
  60          "AZURE_OPENAI_AD_TOKEN", strict=False
  61      ),
  62      organization: str | None = None,
  63      streaming_callback: StreamingCallbackT | None = None,
  64      system_prompt: str | None = None,
  65      timeout: float | None = None,
  66      max_retries: int | None = None,
  67      http_client_kwargs: dict[str, Any] | None = None,
  68      generation_kwargs: dict[str, Any] | None = None,
  69      default_headers: dict[str, str] | None = None,
  70      *,
  71      azure_ad_token_provider: AzureADTokenProvider | None = None
  72  )
  73  ```
  74  
  75  Initialize the Azure OpenAI Generator.
  76  
  77  **Parameters:**
  78  
  79  - **azure_endpoint** (<code>str | None</code>) – The endpoint of the deployed model, for example `https://example-resource.azure.openai.com/`.
  80  - **api_version** (<code>str | None</code>) – The version of the API to use. Defaults to 2024-12-01-preview.
  81  - **azure_deployment** (<code>str | None</code>) – The deployment of the model, usually the model name.
  82  - **api_key** (<code>Secret | None</code>) – The API key to use for authentication.
  83  - **azure_ad_token** (<code>Secret | None</code>) – [Azure Active Directory token](https://www.microsoft.com/en-us/security/business/identity-access/microsoft-entra-id).
  84  - **organization** (<code>str | None</code>) – Your organization ID, defaults to `None`. For help, see
  85    [Setting up your organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
  86  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function called when a new token is received from the stream.
  87    It accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
  88    as an argument.
   89  - **system_prompt** (<code>str | None</code>) – The system prompt to use for text generation. If not provided,
   90    the system prompt is omitted and the model's default behavior applies.
   91  - **timeout** (<code>float | None</code>) – Timeout for the AzureOpenAI client. If not set, it is read from the
   92    `OPENAI_TIMEOUT` environment variable or defaults to 30 seconds.
   93  - **max_retries** (<code>int | None</code>) – Maximum number of retries to contact AzureOpenAI after an internal error.
   94    If not set, it is read from the `OPENAI_MAX_RETRIES` environment variable or defaults to 5.
   95  - **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
  96    For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).
  97  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model, sent directly to
  98    the OpenAI endpoint. See [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat) for
  99    more details.
 100    Some of the supported parameters:
 101  - `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion,
 102    including visible output tokens and reasoning tokens.
 103  - `temperature`: The sampling temperature to use. Higher values mean the model takes more risks.
 104    Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
 105  - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
 106    considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
 107    comprising the top 10% probability mass are considered.
 108  - `n`: The number of completions to generate for each prompt. For example, with 3 prompts and n=2,
 109    the LLM will generate two completions per prompt, resulting in 6 completions total.
 110  - `stop`: One or more sequences after which the LLM should stop generating tokens.
 111  - `presence_penalty`: The penalty applied if a token is already present.
 112    Higher values make the model less likely to repeat the token.
 113  - `frequency_penalty`: Penalty applied if a token has already been generated.
 114    Higher values make the model less likely to repeat the token.
 115  - `logit_bias`: Adds a logit bias to specific tokens. The keys of the dictionary are tokens, and the
 116    values are the bias to add to that token.
 117  - **default_headers** (<code>dict\[str, str\] | None</code>) – Default headers to use for the AzureOpenAI client.
  118  - **azure_ad_token_provider** (<code>AzureADTokenProvider | None</code>) – A function that returns an Azure Active Directory token. It is invoked on
  119    every request.
 120  
 121  #### to_dict
 122  
 123  ```python
 124  to_dict() -> dict[str, Any]
 125  ```
 126  
 127  Serialize this component to a dictionary.
 128  
 129  **Returns:**
 130  
 131  - <code>dict\[str, Any\]</code> – The serialized component as a dictionary.
 132  
 133  #### from_dict
 134  
 135  ```python
 136  from_dict(data: dict[str, Any]) -> AzureOpenAIGenerator
 137  ```
 138  
 139  Deserialize this component from a dictionary.
 140  
 141  **Parameters:**
 142  
 143  - **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.
 144  
 145  **Returns:**
 146  
 147  - <code>AzureOpenAIGenerator</code> – The deserialized component instance.
 148  
 149  ## chat/azure
 150  
 151  ### AzureOpenAIChatGenerator
 152  
 153  Bases: <code>OpenAIChatGenerator</code>
 154  
 155  Generates text using OpenAI's models on Azure.
 156  
  157  It works with gpt-4-type models and supports streaming responses
  158  from the OpenAI API. It uses the [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
  159  format for input and output.
 160  
 161  You can customize how the text is generated by passing parameters to the
 162  OpenAI API. Use the `**generation_kwargs` argument when you initialize
 163  the component or when you run it. Any parameter that works with
 164  `openai.ChatCompletion.create` will work here too.
 165  
 166  For details on OpenAI API parameters, see
 167  [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).
 168  
 169  ### Usage example
 170  
 171  ```python
 172  from haystack.components.generators.chat import AzureOpenAIChatGenerator
 173  from haystack.dataclasses import ChatMessage
 174  from haystack.utils import Secret
 175  
 176  messages = [ChatMessage.from_user("What's Natural Language Processing?")]
 177  
 178  client = AzureOpenAIChatGenerator(
  179      azure_endpoint="<your Azure endpoint, e.g. https://your-company.azure.openai.com/>",
  180      api_key=Secret.from_token("<your-api-key>"),
  181      azure_deployment="<your deployment name, usually the model name, e.g. gpt-4.1-mini>")
 182  response = client.run(messages)
 183  print(response)
 184  ```
 185  
 186  ```
 187  {'replies':
 188      [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=[TextContent(text=
 189      "Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on
 190       enabling computers to understand, interpret, and generate human language in a way that is useful.")],
 191       _name=None,
 192       _meta={'model': 'gpt-4.1-mini', 'index': 0, 'finish_reason': 'stop',
 193       'usage': {'prompt_tokens': 15, 'completion_tokens': 36, 'total_tokens': 51}})]
 194  }
 195  ```
 196  
 197  #### __init__
 198  
 199  ```python
 200  __init__(
 201      azure_endpoint: str | None = None,
 202      api_version: str | None = "2024-12-01-preview",
 203      azure_deployment: str | None = "gpt-4.1-mini",
 204      api_key: Secret | None = Secret.from_env_var(
 205          "AZURE_OPENAI_API_KEY", strict=False
 206      ),
 207      azure_ad_token: Secret | None = Secret.from_env_var(
 208          "AZURE_OPENAI_AD_TOKEN", strict=False
 209      ),
 210      organization: str | None = None,
 211      streaming_callback: StreamingCallbackT | None = None,
 212      timeout: float | None = None,
 213      max_retries: int | None = None,
 214      generation_kwargs: dict[str, Any] | None = None,
 215      default_headers: dict[str, str] | None = None,
 216      tools: ToolsType | None = None,
 217      tools_strict: bool = False,
 218      *,
 219      azure_ad_token_provider: (
 220          AzureADTokenProvider | AsyncAzureADTokenProvider | None
 221      ) = None,
 222      http_client_kwargs: dict[str, Any] | None = None
 223  )
 224  ```
 225  
 226  Initialize the Azure OpenAI Chat Generator component.
 227  
 228  **Parameters:**
 229  
 230  - **azure_endpoint** (<code>str | None</code>) – The endpoint of the deployed model, for example `"https://example-resource.azure.openai.com/"`.
 231  - **api_version** (<code>str | None</code>) – The version of the API to use. Defaults to 2024-12-01-preview.
 232  - **azure_deployment** (<code>str | None</code>) – The deployment of the model, usually the model name.
 233  - **api_key** (<code>Secret | None</code>) – The API key to use for authentication.
 234  - **azure_ad_token** (<code>Secret | None</code>) – [Azure Active Directory token](https://www.microsoft.com/en-us/security/business/identity-access/microsoft-entra-id).
 235  - **organization** (<code>str | None</code>) – Your organization ID, defaults to `None`. For help, see
 236    [Setting up your organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
 237  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function called when a new token is received from the stream.
 238    It accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
 239    as an argument.
 240  - **timeout** (<code>float | None</code>) – Timeout for OpenAI client calls. If not set, it defaults to either the
 241    `OPENAI_TIMEOUT` environment variable, or 30 seconds.
  242  - **max_retries** (<code>int | None</code>) – Maximum number of retries to contact OpenAI after an internal error.
  243    If not set, it defaults to either the `OPENAI_MAX_RETRIES` environment variable, or 5.
 244  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model. These parameters are sent directly to
 245    the OpenAI endpoint. For details, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).
 246    Some of the supported parameters:
 247  - `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion,
 248    including visible output tokens and reasoning tokens.
 249  - `temperature`: The sampling temperature to use. Higher values mean the model takes more risks.
 250    Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
 251  - `top_p`: Nucleus sampling is an alternative to sampling with temperature, where the model considers
 252    tokens with a top_p probability mass. For example, 0.1 means only the tokens comprising
 253    the top 10% probability mass are considered.
 254  - `n`: The number of completions to generate for each prompt. For example, with 3 prompts and n=2,
 255    the LLM will generate two completions per prompt, resulting in 6 completions total.
 256  - `stop`: One or more sequences after which the LLM should stop generating tokens.
 257  - `presence_penalty`: The penalty applied if a token is already present.
 258    Higher values make the model less likely to repeat the token.
 259  - `frequency_penalty`: Penalty applied if a token has already been generated.
 260    Higher values make the model less likely to repeat the token.
 261  - `logit_bias`: Adds a logit bias to specific tokens. The keys of the dictionary are tokens, and the
 262    values are the bias to add to that token.
 263  - `response_format`: A JSON schema or a Pydantic model that enforces the structure of the model's response.
 264    If provided, the output will always be validated against this
 265    format (unless the model returns a tool call).
 266    For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
 267    Notes:
  268    - This parameter accepts Pydantic models and JSON schemas for the latest models, starting from GPT-4o.
  269      Older models only support a basic version of structured outputs through `{"type": "json_object"}`.
 270      For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
 271    - For structured outputs with streaming,
 272      the `response_format` must be a JSON schema and not a Pydantic model.
 273  - **default_headers** (<code>dict\[str, str\] | None</code>) – Default headers to use for the AzureOpenAI client.
 274  - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
 275  - **tools_strict** (<code>bool</code>) – Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
 276    the schema provided in the `parameters` field of the tool definition, but this may increase latency.
  277  - **azure_ad_token_provider** (<code>AzureADTokenProvider | AsyncAzureADTokenProvider | None</code>) – A function that returns an Azure Active Directory token. It is invoked on
  278    every request.
  279  - **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
 280    For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).
 281  
 282  #### warm_up
 283  
 284  ```python
 285  warm_up()
 286  ```
 287  
 288  Warm up the Azure OpenAI chat generator.
 289  
 290  This will warm up the tools registered in the chat generator.
 291  This method is idempotent and will only warm up the tools once.
 292  
 293  #### to_dict
 294  
 295  ```python
 296  to_dict() -> dict[str, Any]
 297  ```
 298  
 299  Serialize this component to a dictionary.
 300  
 301  **Returns:**
 302  
 303  - <code>dict\[str, Any\]</code> – The serialized component as a dictionary.
 304  
 305  #### from_dict
 306  
 307  ```python
 308  from_dict(data: dict[str, Any]) -> AzureOpenAIChatGenerator
 309  ```
 310  
 311  Deserialize this component from a dictionary.
 312  
 313  **Parameters:**
 314  
 315  - **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.
 316  
 317  **Returns:**
 318  
 319  - <code>AzureOpenAIChatGenerator</code> – The deserialized component instance.
 320  
 321  ## chat/azure_responses
 322  
 323  ### AzureOpenAIResponsesChatGenerator
 324  
 325  Bases: <code>OpenAIResponsesChatGenerator</code>
 326  
 327  Completes chats using OpenAI's Responses API on Azure.
 328  
 329  It works with the gpt-5 and o-series models and supports streaming responses
 330  from OpenAI API. It uses [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
 331  format in input and output.
 332  
 333  You can customize how the text is generated by passing parameters to the
 334  OpenAI API. Use the `**generation_kwargs` argument when you initialize
 335  the component or when you run it. Any parameter that works with
 336  `openai.Responses.create` will work here too.
 337  
 338  For details on OpenAI API parameters, see
 339  [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses).
 340  
 341  ### Usage example
 342  
 343  ```python
 344  from haystack.components.generators.chat import AzureOpenAIResponsesChatGenerator
 345  from haystack.dataclasses import ChatMessage
 346  
 347  messages = [ChatMessage.from_user("What's Natural Language Processing?")]
 348  
 349  client = AzureOpenAIResponsesChatGenerator(
 350      azure_endpoint="https://example-resource.azure.openai.com/",
 351      generation_kwargs={"reasoning": {"effort": "low", "summary": "auto"}}
 352  )
 353  response = client.run(messages)
 354  print(response)
 355  ```
 356  
 357  #### SUPPORTED_MODELS
 358  
 359  ```python
 360  SUPPORTED_MODELS: list[str] = [
 361      "gpt-5.4-pro",
 362      "gpt-5.4",
 363      "gpt-5.3-chat",
 364      "gpt-5.3-codex",
 365      "gpt-5.2-codex",
 366      "gpt-5.2",
 367      "gpt-5.2-chat",
 368      "gpt-5.1-codex-max",
 369      "gpt-5.1",
 370      "gpt-5.1-chat",
 371      "gpt-5.1-codex",
 372      "gpt-5.1-codex-mini",
 373      "gpt-5-pro",
 374      "gpt-5-codex",
 375      "gpt-5",
 376      "gpt-5-mini",
 377      "gpt-5-nano",
 378      "gpt-5-chat",
 379      "gpt-4o",
 380      "gpt-4o-mini",
 381      "computer-use-preview",
 382      "gpt-4.1",
 383      "gpt-4.1-nano",
 384      "gpt-4.1-mini",
 385      "gpt-image-1",
 386      "gpt-image-1-mini",
 387      "gpt-image-1.5",
 388      "o1",
 389      "o3-mini",
 390      "o3",
 391      "o4-mini",
 392  ]
 393  
 394  ```
 395  
 396  A non-exhaustive list of chat models supported by this component.
 397  See https://learn.microsoft.com/en-us/azure/foundry/openai/how-to/responses#model-support for the full list.
 398  
 399  #### __init__
 400  
 401  ```python
 402  __init__(
 403      *,
 404      api_key: (
 405          Secret | Callable[[], str] | Callable[[], Awaitable[str]]
 406      ) = Secret.from_env_var("AZURE_OPENAI_API_KEY", strict=False),
 407      azure_endpoint: str | None = None,
 408      azure_deployment: str = "gpt-5-mini",
 409      streaming_callback: StreamingCallbackT | None = None,
 410      organization: str | None = None,
 411      generation_kwargs: dict[str, Any] | None = None,
 412      timeout: float | None = None,
 413      max_retries: int | None = None,
 414      tools: ToolsType | None = None,
 415      tools_strict: bool = False,
 416      http_client_kwargs: dict[str, Any] | None = None
 417  )
 418  ```
 419  
 420  Initialize the AzureOpenAIResponsesChatGenerator component.
 421  
 422  **Parameters:**
 423  
 424  - **api_key** (<code>Secret | Callable\[[], str\] | Callable\[[], Awaitable\[str\]\]</code>) – The API key to use for authentication. Can be:
 425  - A `Secret` object containing the API key.
 426  - A `Secret` object containing the [Azure Active Directory token](https://www.microsoft.com/en-us/security/business/identity-access/microsoft-entra-id).
 427  - A function that returns an Azure Active Directory token.
 428  - **azure_endpoint** (<code>str | None</code>) – The endpoint of the deployed model, for example `"https://example-resource.azure.openai.com/"`.
 429  - **azure_deployment** (<code>str</code>) – The deployment of the model, usually the model name.
 430  - **organization** (<code>str | None</code>) – Your organization ID, defaults to `None`. For help, see
 431    [Setting up your organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
 432  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function called when a new token is received from the stream.
 433    It accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
 434    as an argument.
 435  - **timeout** (<code>float | None</code>) – Timeout for OpenAI client calls. If not set, it defaults to either the
 436    `OPENAI_TIMEOUT` environment variable, or 30 seconds.
 437  - **max_retries** (<code>int | None</code>) – Maximum number of retries to contact OpenAI after an internal error.
  438    If not set, it defaults to either the `OPENAI_MAX_RETRIES` environment variable, or 5.
 439  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model. These parameters are sent
 440    directly to the OpenAI endpoint.
 441    See OpenAI [documentation](https://platform.openai.com/docs/api-reference/responses) for
 442    more details.
 443    Some of the supported parameters:
 444  - `temperature`: What sampling temperature to use. Higher values like 0.8 will make the output more random,
 445    while lower values like 0.2 will make it more focused and deterministic.
 446  - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
 447    considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
 448    comprising the top 10% probability mass are considered.
 449  - `previous_response_id`: The ID of the previous response.
 450    Use this to create multi-turn conversations.
 451  - `text_format`: A Pydantic model that enforces the structure of the model's response.
 452    If provided, the output will always be validated against this
 453    format (unless the model returns a tool call).
 454    For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
 455  - `text`: A JSON schema that enforces the structure of the model's response.
 456    If provided, the output will always be validated against this
 457    format (unless the model returns a tool call).
 458    Notes:
  459    - Both JSON Schema and Pydantic models are supported for the latest models, starting from GPT-4o.
  460    - If both are provided, `text_format` takes precedence and the JSON schema passed to `text` is ignored.
  461    - Currently, this component doesn't support streaming for structured outputs.
  462    - Older models only support a basic version of structured outputs through `{"type": "json_object"}`.
 463      For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
 464  - `reasoning`: A dictionary of parameters for reasoning. For example:
 465    - `summary`: The summary of the reasoning.
 466    - `effort`: The level of effort to put into the reasoning. Can be `low`, `medium` or `high`.
 467    - `generate_summary`: Whether to generate a summary of the reasoning.
  468      Note: OpenAI does not return the reasoning tokens, but the summary can be viewed if it is enabled.
 469      For details, see the [OpenAI Reasoning documentation](https://platform.openai.com/docs/guides/reasoning).
 470  - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
 471  - **tools_strict** (<code>bool</code>) – Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
 472    the schema provided in the `parameters` field of the tool definition, but this may increase latency.
  473  - **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
 474    For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).
 475  
 476  #### to_dict
 477  
 478  ```python
 479  to_dict() -> dict[str, Any]
 480  ```
 481  
 482  Serialize this component to a dictionary.
 483  
 484  **Returns:**
 485  
 486  - <code>dict\[str, Any\]</code> – The serialized component as a dictionary.
 487  
 488  #### from_dict
 489  
 490  ```python
 491  from_dict(data: dict[str, Any]) -> AzureOpenAIResponsesChatGenerator
 492  ```
 493  
 494  Deserialize this component from a dictionary.
 495  
 496  **Parameters:**
 497  
 498  - **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.
 499  
 500  **Returns:**
 501  
 502  - <code>AzureOpenAIResponsesChatGenerator</code> – The deserialized component instance.
 503  
 504  ## chat/fallback
 505  
 506  ### FallbackChatGenerator
 507  
 508  A chat generator wrapper that tries multiple chat generators sequentially.
 509  
  510  It forwards all parameters transparently to the underlying chat generators and returns the first successful result.
  511  The generators are called sequentially, and any exception raised by one triggers a fallback to the next.
  512  If all chat generators fail, it raises a RuntimeError with details.
 513  
 514  Timeout enforcement is fully delegated to the underlying chat generators. The fallback mechanism will only
 515  work correctly if the underlying chat generators implement proper timeout handling and raise exceptions
 516  when timeouts occur. For predictable latency guarantees, ensure your chat generators:
 517  
 518  - Support a `timeout` parameter in their initialization
 519  - Implement timeout as total wall-clock time (shared deadline for both streaming and non-streaming)
 520  - Raise timeout exceptions (e.g., TimeoutError, asyncio.TimeoutError, httpx.TimeoutException) when exceeded
 521  
 522  Note: Most well-implemented chat generators (OpenAI, Anthropic, Cohere, etc.) support timeout parameters
 523  with consistent semantics. For HTTP-based LLM providers, a single timeout value (e.g., `timeout=30`)
 524  typically applies to all connection phases: connection setup, read, write, and pool. For streaming
 525  responses, read timeout is the maximum gap between chunks. For non-streaming, it's the time limit for
 526  receiving the complete response.
 527  
 528  Failover is automatically triggered when a generator raises any exception, including:
 529  
 530  - Timeout errors (if the generator implements and raises them)
 531  - Rate limit errors (429)
 532  - Authentication errors (401)
 533  - Context length errors (400)
 534  - Server errors (500+)
 535  - Any other exception
 536  
 537  #### __init__
 538  
 539  ```python
 540  __init__(chat_generators: list[ChatGenerator]) -> None
 541  ```
 542  
 543  Creates an instance of FallbackChatGenerator.
 544  
 545  **Parameters:**
 546  
 547  - **chat_generators** (<code>list\[ChatGenerator\]</code>) – A non-empty list of chat generator components to try in order.
 548  
 549  #### to_dict
 550  
 551  ```python
 552  to_dict() -> dict[str, Any]
 553  ```
 554  
 555  Serialize the component, including nested chat generators when they support serialization.
 556  
 557  #### from_dict
 558  
 559  ```python
 560  from_dict(data: dict[str, Any]) -> FallbackChatGenerator
 561  ```
 562  
 563  Rebuild the component from a serialized representation, restoring nested chat generators.
 564  
 565  #### warm_up
 566  
 567  ```python
 568  warm_up() -> None
 569  ```
 570  
 571  Warm up all underlying chat generators.
 572  
 573  This method calls warm_up() on each underlying generator that supports it.
 574  
 575  #### run
 576  
 577  ```python
 578  run(
 579      messages: list[ChatMessage],
 580      generation_kwargs: dict[str, Any] | None = None,
 581      tools: ToolsType | None = None,
 582      streaming_callback: StreamingCallbackT | None = None,
 583  ) -> dict[str, list[ChatMessage] | dict[str, Any]]
 584  ```
 585  
 586  Execute chat generators sequentially until one succeeds.
 587  
 588  **Parameters:**
 589  
 590  - **messages** (<code>list\[ChatMessage\]</code>) – The conversation history as a list of ChatMessage instances.
 591  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Optional parameters for the chat generator (e.g., temperature, max_tokens).
 592  - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for function calling capabilities.
 593  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – Optional callable for handling streaming responses.
 594  
 595  **Returns:**
 596  
 597  - <code>dict\[str, list\[ChatMessage\] | dict\[str, Any\]\]</code> – A dictionary with:
 598  - "replies": Generated ChatMessage instances from the first successful generator.
 599  - "meta": Execution metadata including successful_chat_generator_index, successful_chat_generator_class,
 600    total_attempts, failed_chat_generators, plus any metadata from the successful generator.
 601  
 602  **Raises:**
 603  
 604  - <code>RuntimeError</code> – If all chat generators fail.
 605  
 606  #### run_async
 607  
 608  ```python
 609  run_async(
 610      messages: list[ChatMessage],
 611      generation_kwargs: dict[str, Any] | None = None,
 612      tools: ToolsType | None = None,
 613      streaming_callback: StreamingCallbackT | None = None,
 614  ) -> dict[str, list[ChatMessage] | dict[str, Any]]
 615  ```
 616  
 617  Asynchronously execute chat generators sequentially until one succeeds.
 618  
 619  **Parameters:**
 620  
 621  - **messages** (<code>list\[ChatMessage\]</code>) – The conversation history as a list of ChatMessage instances.
 622  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Optional parameters for the chat generator (e.g., temperature, max_tokens).
 623  - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for function calling capabilities.
 624  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – Optional callable for handling streaming responses.
 625  
 626  **Returns:**
 627  
 628  - <code>dict\[str, list\[ChatMessage\] | dict\[str, Any\]\]</code> – A dictionary with:
 629  - "replies": Generated ChatMessage instances from the first successful generator.
 630  - "meta": Execution metadata including successful_chat_generator_index, successful_chat_generator_class,
 631    total_attempts, failed_chat_generators, plus any metadata from the successful generator.
 632  
 633  **Raises:**
 634  
 635  - <code>RuntimeError</code> – If all chat generators fail.
 636  
 637  ## chat/hugging_face_api
 638  
 639  ### HuggingFaceAPIChatGenerator
 640  
 641  Completes chats using Hugging Face APIs.
 642  
 643  HuggingFaceAPIChatGenerator uses the [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
 644  format for input and output. Use it to generate text with Hugging Face APIs:
 645  
 646  - [Serverless Inference API (Inference Providers)](https://huggingface.co/docs/inference-providers)
 647  - [Paid Inference Endpoints](https://huggingface.co/inference-endpoints)
 648  - [Self-hosted Text Generation Inference](https://github.com/huggingface/text-generation-inference)
 649  
 650  ### Usage examples
 651  
 652  #### With the serverless inference API (Inference Providers) - free tier available
 653  
 654  ```python
 655  from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
 656  from haystack.dataclasses import ChatMessage
 657  from haystack.utils import Secret
 658  from haystack.utils.hf import HFGenerationAPIType
 659  
 660  messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
 661              ChatMessage.from_user("What's Natural Language Processing?")]
 662  
 663  # the api_type can be expressed using the HFGenerationAPIType enum or as a string
 664  api_type = HFGenerationAPIType.SERVERLESS_INFERENCE_API
 665  api_type = "serverless_inference_api" # this is equivalent to the above
 666  
 667  generator = HuggingFaceAPIChatGenerator(api_type=api_type,
 668                                          api_params={"model": "Qwen/Qwen2.5-7B-Instruct",
 669                                                      "provider": "together"},
 670                                          token=Secret.from_token("<your-api-key>"))
 671  
 672  result = generator.run(messages)
 673  print(result)
 674  ```
 675  
 676  #### With the serverless inference API (Inference Providers) and text+image input
 677  
 678  ```python
 679  from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
 680  from haystack.dataclasses import ChatMessage, ImageContent
 681  from haystack.utils import Secret
 682  from haystack.utils.hf import HFGenerationAPIType
 683  
 684  # Create an image from file path, URL, or base64
 685  image = ImageContent.from_file_path("path/to/your/image.jpg")
 686  
 687  # Create a multimodal message with both text and image
 688  messages = [ChatMessage.from_user(content_parts=["Describe this image in detail", image])]
 689  
 690  generator = HuggingFaceAPIChatGenerator(
 691      api_type=HFGenerationAPIType.SERVERLESS_INFERENCE_API,
 692      api_params={
 693          "model": "Qwen/Qwen2.5-VL-7B-Instruct",  # Vision Language Model
 694          "provider": "hyperbolic"
 695      },
 696      token=Secret.from_token("<your-api-key>")
 697  )
 698  
 699  result = generator.run(messages)
 700  print(result)
 701  ```
 702  
 703  #### With paid inference endpoints
 704  
```python
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret

messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
            ChatMessage.from_user("What's Natural Language Processing?")]

generator = HuggingFaceAPIChatGenerator(api_type="inference_endpoints",
                                        api_params={"url": "<your-inference-endpoint-url>"},
                                        token=Secret.from_token("<your-api-key>"))

result = generator.run(messages)
print(result)
```

#### With self-hosted text generation inference

```python
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage

messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
            ChatMessage.from_user("What's Natural Language Processing?")]

generator = HuggingFaceAPIChatGenerator(api_type="text_generation_inference",
                                        api_params={"url": "http://localhost:8080"})

result = generator.run(messages)
print(result)
```
 735  
 736  #### __init__
 737  
 738  ```python
 739  __init__(
 740      api_type: HFGenerationAPIType | str,
 741      api_params: dict[str, str],
 742      token: Secret | None = Secret.from_env_var(
 743          ["HF_API_TOKEN", "HF_TOKEN"], strict=False
 744      ),
 745      generation_kwargs: dict[str, Any] | None = None,
 746      stop_words: list[str] | None = None,
 747      streaming_callback: StreamingCallbackT | None = None,
 748      tools: ToolsType | None = None,
 749  )
 750  ```
 751  
 752  Initialize the HuggingFaceAPIChatGenerator instance.
 753  
 754  **Parameters:**
 755  
 756  - **api_type** (<code>HFGenerationAPIType | str</code>) – The type of Hugging Face API to use. Available types:
 757  - `text_generation_inference`: See [TGI](https://github.com/huggingface/text-generation-inference).
 758  - `inference_endpoints`: See [Inference Endpoints](https://huggingface.co/inference-endpoints).
 759  - `serverless_inference_api`: See
 760    [Serverless Inference API - Inference Providers](https://huggingface.co/docs/inference-providers).
 761  - **api_params** (<code>dict\[str, str\]</code>) – A dictionary with the following keys:
 762  - `model`: Hugging Face model ID. Required when `api_type` is `SERVERLESS_INFERENCE_API`.
 763  - `provider`: Provider name. Recommended when `api_type` is `SERVERLESS_INFERENCE_API`.
 764  - `url`: URL of the inference endpoint. Required when `api_type` is `INFERENCE_ENDPOINTS` or
 765    `TEXT_GENERATION_INFERENCE`.
 766  - Other parameters specific to the chosen API type, such as `timeout`, `headers`, etc.
 767  - **token** (<code>Secret | None</code>) – The Hugging Face token to use as HTTP bearer authorization.
 768    Check your HF token in your [account settings](https://huggingface.co/settings/tokens).
 769  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary with keyword arguments to customize text generation.
 770    Some examples: `max_tokens`, `temperature`, `top_p`.
 771    For details, see [Hugging Face chat_completion documentation](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.chat_completion).
 772  - **stop_words** (<code>list\[str\] | None</code>) – An optional list of strings representing the stop words.
 773  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses.
 774  - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
 775    The chosen model should support tool/function calling, according to the model card.
 776    Support for tools in the Hugging Face API and TGI is not yet fully refined and you may experience
 777    unexpected behavior.
 778  
 779  #### warm_up
 780  
 781  ```python
 782  warm_up()
 783  ```
 784  
 785  Warm up the Hugging Face API chat generator.
 786  
 787  This will warm up the tools registered in the chat generator.
 788  This method is idempotent and will only warm up the tools once.
 789  
 790  #### to_dict
 791  
 792  ```python
 793  to_dict() -> dict[str, Any]
 794  ```
 795  
 796  Serialize this component to a dictionary.
 797  
 798  **Returns:**
 799  
 800  - <code>dict\[str, Any\]</code> – A dictionary containing the serialized component.
 801  
 802  #### from_dict
 803  
 804  ```python
 805  from_dict(data: dict[str, Any]) -> HuggingFaceAPIChatGenerator
 806  ```
 807  
 808  Deserialize this component from a dictionary.
 809  
 810  #### run
 811  
 812  ```python
 813  run(
 814      messages: list[ChatMessage],
 815      generation_kwargs: dict[str, Any] | None = None,
 816      tools: ToolsType | None = None,
 817      streaming_callback: StreamingCallbackT | None = None,
 818  ) -> dict[str, list[ChatMessage]]
 819  ```
 820  
 821  Invoke the text generation inference based on the provided messages and generation parameters.
 822  
 823  **Parameters:**
 824  
 825  - **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage objects representing the input messages.
 826  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation.
 827  - **tools** (<code>ToolsType | None</code>) – A list of tools or a Toolset for which the model can prepare calls. If set, it will override
 828    the `tools` parameter set during component initialization. This parameter can accept either a
 829    list of `Tool` objects or a `Toolset` instance.
 830  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses. If set, it will override the `streaming_callback`
 831    parameter set during component initialization.
 832  
 833  **Returns:**
 834  
 835  - <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following keys:
 836  - `replies`: A list containing the generated responses as ChatMessage objects.
 837  
 838  #### run_async
 839  
 840  ```python
 841  run_async(
 842      messages: list[ChatMessage],
 843      generation_kwargs: dict[str, Any] | None = None,
 844      tools: ToolsType | None = None,
 845      streaming_callback: StreamingCallbackT | None = None,
 846  ) -> dict[str, list[ChatMessage]]
 847  ```
 848  
 849  Asynchronously invokes the text generation inference based on the provided messages and generation parameters.
 850  
 851  This is the asynchronous version of the `run` method. It has the same parameters
and return values but can be used with `await` in async code.
 853  
 854  **Parameters:**
 855  
 856  - **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage objects representing the input messages.
 857  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation.
 858  - **tools** (<code>ToolsType | None</code>) – A list of tools or a Toolset for which the model can prepare calls. If set, it will override the `tools`
 859    parameter set during component initialization. This parameter can accept either a list of `Tool` objects
 860    or a `Toolset` instance.
 861  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses. If set, it will override the `streaming_callback`
 862    parameter set during component initialization.
 863  
 864  **Returns:**
 865  
 866  - <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following keys:
 867  - `replies`: A list containing the generated responses as ChatMessage objects.
 868  
 869  ## chat/hugging_face_local
 870  
 871  ### default_tool_parser
 872  
 873  ```python
 874  default_tool_parser(text: str) -> list[ToolCall] | None
 875  ```
 876  
 877  Default implementation for parsing tool calls from model output text.
 878  
 879  Uses DEFAULT_TOOL_PATTERN to extract tool calls.
 880  
 881  **Parameters:**
 882  
 883  - **text** (<code>str</code>) – The text to parse for tool calls.
 884  
 885  **Returns:**
 886  
 887  - <code>list\[ToolCall\] | None</code> – A list containing a single ToolCall if a valid tool call is found, None otherwise.
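
A custom `tool_parsing_function` only needs to turn model output text into tool calls or `None`. The sketch below shows the shape of such a parser using a hypothetical `<tool_call>` tag convention (the actual `DEFAULT_TOOL_PATTERN` may differ), and returns plain tuples instead of Haystack `ToolCall` objects:

```python
import json
import re

# Hypothetical stand-in for DEFAULT_TOOL_PATTERN: a JSON object wrapped in
# <tool_call> ... </tool_call> tags, a convention used by several chat models.
TOOL_PATTERN = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def parse_tool_call(text):
    """Return [(name, arguments)] for the first valid tool call, else None."""
    match = TOOL_PATTERN.search(text)
    if match is None:
        return None
    try:
        payload = json.loads(match.group(1))
    except json.JSONDecodeError:
        return None
    if "name" not in payload:
        return None
    return [(payload["name"], payload.get("arguments", {}))]
```

Returning `None` for anything that does not parse cleanly keeps ordinary text responses flowing through unchanged.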
 888  
 889  ### HuggingFaceLocalChatGenerator
 890  
 891  Generates chat responses using models from Hugging Face that run locally.
 892  
 893  Use this component with chat-based models,
 894  such as `Qwen/Qwen3-0.6B` or `meta-llama/Llama-2-7b-chat-hf`.
 895  LLMs running locally may need powerful hardware.
 896  
 897  ### Usage example
 898  
 899  ```python
 900  from haystack.components.generators.chat import HuggingFaceLocalChatGenerator
 901  from haystack.dataclasses import ChatMessage
 902  
 903  generator = HuggingFaceLocalChatGenerator(model="Qwen/Qwen3-0.6B")
 904  messages = [ChatMessage.from_user("What's Natural Language Processing? Be brief.")]
 905  print(generator.run(messages))
 906  ```
 907  
 908  ```
 909  {'replies':
 910      [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=[TextContent(text=
 911      "Natural Language Processing (NLP) is a subfield of artificial intelligence that deals
 912      with the interaction between computers and human language. It enables computers to understand, interpret, and
 913      generate human language in a valuable way. NLP involves various techniques such as speech recognition, text
 914      analysis, sentiment analysis, and machine translation. The ultimate goal is to make it easier for computers to
 915      process and derive meaning from human language, improving communication between humans and machines.")],
 916      _name=None,
    _meta={'finish_reason': 'stop', 'index': 0, 'model':
          'Qwen/Qwen3-0.6B',
 919            'usage': {'completion_tokens': 90, 'prompt_tokens': 19, 'total_tokens': 109}})
 920            ]
 921  }
 922  ```
 923  
 924  #### __init__
 925  
 926  ```python
 927  __init__(
 928      model: str = "Qwen/Qwen3-0.6B",
 929      task: Literal["text-generation", "text2text-generation"] | None = None,
 930      device: ComponentDevice | None = None,
 931      token: Secret | None = Secret.from_env_var(
 932          ["HF_API_TOKEN", "HF_TOKEN"], strict=False
 933      ),
 934      chat_template: str | None = None,
 935      generation_kwargs: dict[str, Any] | None = None,
 936      huggingface_pipeline_kwargs: dict[str, Any] | None = None,
 937      stop_words: list[str] | None = None,
 938      streaming_callback: StreamingCallbackT | None = None,
 939      tools: ToolsType | None = None,
 940      tool_parsing_function: Callable[[str], list[ToolCall] | None] | None = None,
 941      async_executor: ThreadPoolExecutor | None = None,
 942      *,
 943      enable_thinking: bool = False
 944  ) -> None
 945  ```
 946  
 947  Initializes the HuggingFaceLocalChatGenerator component.
 948  
 949  **Parameters:**
 950  
 951  - **model** (<code>str</code>) – The Hugging Face text generation model name or path,
 952    for example, `mistralai/Mistral-7B-Instruct-v0.2` or `TheBloke/OpenHermes-2.5-Mistral-7B-16k-AWQ`.
 953    The model must be a chat model supporting the ChatML messaging
 954    format.
 955    If the model is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
 956  - **task** (<code>Literal['text-generation', 'text2text-generation'] | None</code>) – The task for the Hugging Face pipeline. Possible options:
 957  - `text-generation`: Supported by decoder models, like GPT.
 958  - `text2text-generation`: Deprecated as of Transformers v5; use `text-generation` instead.
 959    Previously supported by encoder–decoder models such as T5.
 960    If the task is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
 961    If not specified, the component calls the Hugging Face API to infer the task from the model name.
 962  - **device** (<code>ComponentDevice | None</code>) – The device for loading the model. If `None`, automatically selects the default device.
 963    If a device or device map is specified in `huggingface_pipeline_kwargs`, it overrides this parameter.
 964  - **token** (<code>Secret | None</code>) – The token to use as HTTP bearer authorization for remote files.
 965    If the token is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
 966  - **chat_template** (<code>str | None</code>) – Specifies an optional Jinja template for formatting chat
 967    messages. Most high-quality chat models have their own templates, but for models without this
 968    feature or if you prefer a custom template, use this parameter.
 969  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary with keyword arguments to customize text generation.
 970    Some examples: `max_length`, `max_new_tokens`, `temperature`, `top_k`, `top_p`.
 971    See Hugging Face's documentation for more information:
- [customize-text-generation](https://huggingface.co/docs/transformers/main/en/generation_strategies#customize-text-generation)
- [GenerationConfig](https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.GenerationConfig)
 974      The only `generation_kwargs` set by default is `max_new_tokens`, which is set to 512 tokens.
 975  - **huggingface_pipeline_kwargs** (<code>dict\[str, Any\] | None</code>) – Dictionary with keyword arguments to initialize the
 976    Hugging Face pipeline for text generation.
 977    These keyword arguments provide fine-grained control over the Hugging Face pipeline.
 978    In case of duplication, these kwargs override `model`, `task`, `device`, and `token` init parameters.
 979    For kwargs, see [Hugging Face documentation](https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.pipeline.task).
 980    In this dictionary, you can also include `model_kwargs` to specify the kwargs for [model initialization](https://huggingface.co/docs/transformers/en/main_classes/model#transformers.PreTrainedModel.from_pretrained)
 981  - **stop_words** (<code>list\[str\] | None</code>) – A list of stop words. If the model generates a stop word, the generation stops.
 982    If you provide this parameter, don't specify the `stopping_criteria` in `generation_kwargs`.
 983    For some chat models, the output includes both the new text and the original prompt.
 984    In these cases, make sure your prompt has no stop words.
 985  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses.
 986  - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
- **tool_parsing_function** (<code>Callable\[\[str\], list\[ToolCall\] | None\] | None</code>) – A callable that takes a string and returns a list of ToolCall objects or None.
 988    If None, the default_tool_parser will be used which extracts tool calls using a predefined pattern.
 989  - **async_executor** (<code>ThreadPoolExecutor | None</code>) – Optional ThreadPoolExecutor to use for async calls. If not provided, a single-threaded executor will be
  initialized and used.
 991  - **enable_thinking** (<code>bool</code>) – Whether to enable thinking mode in the chat template for thinking-capable models.
 992    When enabled, the model generates intermediate reasoning before the final response. Defaults to False.
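
To illustrate the `stop_words` semantics described above, here is a minimal truncation helper. This is a sketch of the idea only; the component itself stops generation through Transformers stopping criteria rather than post-hoc truncation:

```python
def truncate_at_stop_word(text: str, stop_words: list[str]) -> str:
    """Cut generated text at the earliest occurrence of any stop word."""
    cut = len(text)
    for word in stop_words:
        position = text.find(word)
        if position != -1:
            # Keep everything before the earliest stop word; the stop word
            # itself is not included in the output.
            cut = min(cut, position)
    return text[:cut]
```

This also shows why the prompt itself must not contain a stop word: for models that echo the prompt, the match would trigger immediately.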
 993  
 994  #### shutdown
 995  
 996  ```python
 997  shutdown() -> None
 998  ```
 999  
Explicitly shut down the executor if the component owns it.
1001  
1002  #### warm_up
1003  
1004  ```python
1005  warm_up() -> None
1006  ```
1007  
1008  Initializes the component and warms up tools if provided.
1009  
1010  #### to_dict
1011  
1012  ```python
1013  to_dict() -> dict[str, Any]
1014  ```
1015  
1016  Serializes the component to a dictionary.
1017  
1018  **Returns:**
1019  
1020  - <code>dict\[str, Any\]</code> – Dictionary with serialized data.
1021  
1022  #### from_dict
1023  
1024  ```python
1025  from_dict(data: dict[str, Any]) -> HuggingFaceLocalChatGenerator
1026  ```
1027  
1028  Deserializes the component from a dictionary.
1029  
1030  **Parameters:**
1031  
1032  - **data** (<code>dict\[str, Any\]</code>) – The dictionary to deserialize from.
1033  
1034  **Returns:**
1035  
1036  - <code>HuggingFaceLocalChatGenerator</code> – The deserialized component.
1037  
1038  #### run
1039  
1040  ```python
1041  run(
1042      messages: list[ChatMessage],
1043      generation_kwargs: dict[str, Any] | None = None,
1044      streaming_callback: StreamingCallbackT | None = None,
1045      tools: ToolsType | None = None,
1046  ) -> dict[str, list[ChatMessage]]
1047  ```
1048  
1049  Invoke text generation inference based on the provided messages and generation parameters.
1050  
1051  **Parameters:**
1052  
1053  - **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage objects representing the input messages.
1054  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation.
1055  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses.
1056  - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
1057    If set, it will override the `tools` parameter provided during initialization.
1058  
1059  **Returns:**
1060  
1061  - <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following keys:
1062  - `replies`: A list containing the generated responses as ChatMessage instances.
1063  
1064  #### create_message
1065  
1066  ```python
1067  create_message(
1068      text: str,
1069      index: int,
1070      tokenizer: Union[PreTrainedTokenizer, PreTrainedTokenizerFast],
1071      prompt: str,
1072      generation_kwargs: dict[str, Any],
1073      parse_tool_calls: bool = False,
1074  ) -> ChatMessage
1075  ```
1076  
1077  Create a ChatMessage instance from the provided text, populated with metadata.
1078  
1079  **Parameters:**
1080  
1081  - **text** (<code>str</code>) – The generated text.
1082  - **index** (<code>int</code>) – The index of the generated text.
1083  - **tokenizer** (<code>Union\[PreTrainedTokenizer, PreTrainedTokenizerFast\]</code>) – The tokenizer used for generation.
1084  - **prompt** (<code>str</code>) – The prompt used for generation.
1085  - **generation_kwargs** (<code>dict\[str, Any\]</code>) – The generation parameters.
1086  - **parse_tool_calls** (<code>bool</code>) – Whether to attempt parsing tool calls from the text.
1087  
1088  **Returns:**
1089  
1090  - <code>ChatMessage</code> – A ChatMessage instance.
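
As a rough illustration of the metadata this method assembles, the sketch below builds a plain dictionary with a toy whitespace "tokenizer" standing in for the real Hugging Face tokenizer; the field names follow the usage example output earlier in this section, and the function itself is hypothetical:

```python
def build_message(text, index, prompt, finish_reason="stop"):
    """Assemble an assistant message with usage metadata (illustrative only)."""
    # A real implementation would count tokens with the model's tokenizer;
    # whitespace splitting is a toy substitute for the sketch.
    prompt_tokens = len(prompt.split())
    completion_tokens = len(text.split())
    return {
        "role": "assistant",
        "content": text,
        "meta": {
            "finish_reason": finish_reason,
            "index": index,
            "usage": {
                "prompt_tokens": prompt_tokens,
                "completion_tokens": completion_tokens,
                "total_tokens": prompt_tokens + completion_tokens,
            },
        },
    }
```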
1091  
1092  #### run_async
1093  
1094  ```python
1095  run_async(
1096      messages: list[ChatMessage],
1097      generation_kwargs: dict[str, Any] | None = None,
1098      streaming_callback: StreamingCallbackT | None = None,
1099      tools: ToolsType | None = None,
1100  ) -> dict[str, list[ChatMessage]]
1101  ```
1102  
1103  Asynchronously invokes text generation inference based on the provided messages and generation parameters.
1104  
1105  This is the asynchronous version of the `run` method. It has the same parameters
and return values but can be used with `await` in async code.
1107  
1108  **Parameters:**
1109  
1110  - **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage objects representing the input messages.
1111  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation.
1112  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses.
1113  - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
1114    If set, it will override the `tools` parameter provided during initialization.
1115  
1116  **Returns:**
1117  
1118  - <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following keys:
1119  - `replies`: A list containing the generated responses as ChatMessage instances.
1120  
1121  ## chat/llm
1122  
1123  ### LLM
1124  
1125  Bases: <code>Agent</code>
1126  
1127  A text generation component powered by a large language model.
1128  
1129  The LLM component is a simplified version of the Agent that focuses solely on text generation
1130  without tool usage. It processes messages and returns a single response from the language model.
1131  
1132  ### Usage examples
1133  
1134  ```python
1135  from haystack.components.generators.chat import LLM
1136  from haystack.components.generators.chat import OpenAIChatGenerator
1137  from haystack.dataclasses import ChatMessage
1138  
1139  llm = LLM(
1140      chat_generator=OpenAIChatGenerator(),
    system_prompt="You are a helpful summarization assistant.",
    user_prompt="""{% message role="user" %}
1143  Summarize the following document: {{ document }}
1144  {% endmessage %}""",
1145      required_variables=["document"],
1146  )
1147  
1148  result = llm.run(document="The weather is lovely today and the sun is shining. ")
1149  print(result["last_message"].text)
1150  ```
1151  
1152  #### __init__
1153  
1154  ```python
1155  __init__(
1156      *,
1157      chat_generator: ChatGenerator,
1158      system_prompt: str | None = None,
1159      user_prompt: str | None = None,
1160      required_variables: list[str] | Literal["*"] | None = None,
1161      streaming_callback: StreamingCallbackT | None = None
1162  ) -> None
1163  ```
1164  
1165  Initialize the LLM component.
1166  
1167  **Parameters:**
1168  
1169  - **chat_generator** (<code>ChatGenerator</code>) – An instance of the chat generator that the LLM should use.
1170  - **system_prompt** (<code>str | None</code>) – System prompt for the LLM.
- **user_prompt** (<code>str | None</code>) – User prompt for the LLM. If provided, this is appended to the messages provided at runtime.
- **required_variables** (<code>list\[str\] | Literal['\*'] | None</code>) – A list of variables that must be provided as input to `user_prompt`.
1173    If a variable listed as required is not provided, an exception is raised.
1174    If set to `"*"`, all variables found in the prompt are required. Optional.
1175  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback that will be invoked when a response is streamed from the LLM.
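
The `required_variables` check can be pictured as follows. This is an illustrative sketch, not the component's implementation: it scans the prompt for Jinja-style `{{ variable }}` placeholders and verifies that the caller supplied each required one.

```python
import re

# Matches Jinja-style placeholders such as {{ document }}.
VARIABLE_PATTERN = re.compile(r"\{\{\s*(\w+)\s*\}\}")

def check_required_variables(user_prompt, required_variables, provided):
    """Raise if a required template variable was not supplied by the caller."""
    found = set(VARIABLE_PATTERN.findall(user_prompt))
    # "*" means every variable found in the prompt is required.
    required = found if required_variables == "*" else set(required_variables)
    missing = required - set(provided)
    if missing:
        raise ValueError(f"Missing required prompt variables: {sorted(missing)}")
```

In the usage example above, `document` would be the single required variable, supplied as a keyword argument to `run`.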
1176  
1177  #### to_dict
1178  
1179  ```python
1180  to_dict() -> dict[str, Any]
1181  ```
1182  
1183  Serialize the LLM component to a dictionary.
1184  
1185  **Returns:**
1186  
1187  - <code>dict\[str, Any\]</code> – Dictionary with serialized data.
1188  
1189  #### from_dict
1190  
1191  ```python
1192  from_dict(data: dict[str, Any]) -> LLM
1193  ```
1194  
1195  Deserialize the LLM from a dictionary.
1196  
1197  **Parameters:**
1198  
1199  - **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from.
1200  
1201  **Returns:**
1202  
1203  - <code>LLM</code> – Deserialized LLM instance.
1204  
1205  #### run
1206  
1207  ```python
1208  run(
1209      messages: list[ChatMessage] | None = None,
1210      streaming_callback: StreamingCallbackT | None = None,
1211      *,
1212      generation_kwargs: dict[str, Any] | None = None,
1213      system_prompt: str | None = None,
1214      user_prompt: str | None = None,
1215      **kwargs: Any
1216  ) -> dict[str, Any]
1217  ```
1218  
1219  Process messages and generate a response from the language model.
1220  
1221  **Parameters:**
1222  
1223  - **messages** (<code>list\[ChatMessage\] | None</code>) – List of Haystack ChatMessage objects to process.
1224  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback that will be invoked when a response is streamed from the LLM.
1225  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for the underlying chat generator. These parameters
1226    will override the parameters passed during component initialization.
1227  - **system_prompt** (<code>str | None</code>) – System prompt for the LLM. If provided, it overrides the default system prompt.
1228  - **user_prompt** (<code>str | None</code>) – User prompt for the LLM. If provided, it overrides the default user prompt and is
1229    appended to the messages provided at runtime.
1230  - **kwargs** (<code>Any</code>) – Additional keyword arguments. These are used to fill template variables in the `user_prompt`
1231    (the keys must match template variable names).
1232  
1233  **Returns:**
1234  
1235  - <code>dict\[str, Any\]</code> – A dictionary with the following keys:
1236  - "messages": List of all messages exchanged during the LLM's run.
1237  - "last_message": The last message exchanged during the LLM's run.
1238  
1239  #### run_async
1240  
1241  ```python
1242  run_async(
1243      messages: list[ChatMessage] | None = None,
1244      streaming_callback: StreamingCallbackT | None = None,
1245      *,
1246      generation_kwargs: dict[str, Any] | None = None,
1247      system_prompt: str | None = None,
1248      user_prompt: str | None = None,
1249      **kwargs: Any
1250  ) -> dict[str, Any]
1251  ```
1252  
1253  Asynchronously process messages and generate a response from the language model.
1254  
1255  **Parameters:**
1256  
1257  - **messages** (<code>list\[ChatMessage\] | None</code>) – List of Haystack ChatMessage objects to process.
1258  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An asynchronous callback that will be invoked when a response is streamed
1259    from the LLM.
1260  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for the underlying chat generator. These parameters
1261    will override the parameters passed during component initialization.
1262  - **system_prompt** (<code>str | None</code>) – System prompt for the LLM. If provided, it overrides the default system prompt.
1263  - **user_prompt** (<code>str | None</code>) – User prompt for the LLM. If provided, it overrides the default user prompt and is
1264    appended to the messages provided at runtime.
1265  - **kwargs** (<code>Any</code>) – Additional keyword arguments. These are used to fill template variables in the `user_prompt`
1266    (the keys must match template variable names).
1267  
1268  **Returns:**
1269  
1270  - <code>dict\[str, Any\]</code> – A dictionary with the following keys:
1271  - "messages": List of all messages exchanged during the LLM's run.
1272  - "last_message": The last message exchanged during the LLM's run.
1273  
1274  ## chat/openai
1275  
1276  ### OpenAIChatGenerator
1277  
1278  Completes chats using OpenAI's large language models (LLMs).
1279  
1280  It works with the gpt-4 and gpt-5 series models and supports streaming responses
from the OpenAI API. It uses the [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
format for input and output.
1283  
1284  You can customize how the text is generated by passing parameters to the
1285  OpenAI API. Use the `**generation_kwargs` argument when you initialize
1286  the component or when you run it. Any parameter that works with
1287  `openai.ChatCompletion.create` will work here too.
1288  
1289  For details on OpenAI API parameters, see
1290  [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).
1291  
1292  ### Usage example
1293  
1294  ```python
1295  from haystack.components.generators.chat import OpenAIChatGenerator
1296  from haystack.dataclasses import ChatMessage
1297  
1298  messages = [ChatMessage.from_user("What's Natural Language Processing?")]
1299  
1300  client = OpenAIChatGenerator()
1301  response = client.run(messages)
1302  print(response)
1303  ```
1304  
1305  Output:
1306  
1307  ```
1308  {'replies':
1309      [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=
1310      [TextContent(text="Natural Language Processing (NLP) is a branch of artificial intelligence
1311          that focuses on enabling computers to understand, interpret, and generate human language in
1312          a way that is meaningful and useful.")],
1313       _name=None,
1314       _meta={'model': 'gpt-5-mini', 'index': 0, 'finish_reason': 'stop',
1315       'usage': {'prompt_tokens': 15, 'completion_tokens': 36, 'total_tokens': 51}})
1316      ]
1317  }
1318  ```
1319  
1320  #### SUPPORTED_MODELS
1321  
1322  ```python
1323  SUPPORTED_MODELS = [
1324      "gpt-5-mini",
1325      "gpt-5-nano",
1326      "gpt-5",
1327      "gpt-5.1",
1328      "gpt-5.2",
1329      "gpt-5.2-pro",
1330      "gpt-5.4",
1331      "gpt-5-pro",
1332      "gpt-4.1",
1333      "gpt-4.1-mini",
1334      "gpt-4.1-nano",
1335      "gpt-4o",
1336      "gpt-4o-mini",
1337      "gpt-4-turbo",
1338      "gpt-4",
1339      "gpt-3.5-turbo",
]
```
1343  
A non-exhaustive list of chat models supported by this component.
See the [OpenAI models documentation](https://developers.openai.com/api/docs/models) for the full list and snapshot IDs.
1346  
1347  #### __init__
1348  
1349  ```python
1350  __init__(
1351      api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
1352      model: str = "gpt-5-mini",
1353      streaming_callback: StreamingCallbackT | None = None,
1354      api_base_url: str | None = None,
1355      organization: str | None = None,
1356      generation_kwargs: dict[str, Any] | None = None,
1357      timeout: float | None = None,
1358      max_retries: int | None = None,
1359      tools: ToolsType | None = None,
1360      tools_strict: bool = False,
1361      http_client_kwargs: dict[str, Any] | None = None,
1362  )
1363  ```
1364  
Creates an instance of OpenAIChatGenerator. Unless specified otherwise in `model`, it uses OpenAI's gpt-5-mini.
1366  
1367  Before initializing the component, you can set the 'OPENAI_TIMEOUT' and 'OPENAI_MAX_RETRIES'
1368  environment variables to override the `timeout` and `max_retries` parameters respectively
1369  in the OpenAI client.
1370  
1371  **Parameters:**
1372  
1373  - **api_key** (<code>Secret</code>) – The OpenAI API key.
1374    You can set it with an environment variable `OPENAI_API_KEY`, or pass with this parameter
1375    during initialization.
1376  - **model** (<code>str</code>) – The name of the model to use.
1377  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
1378    The callback function accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
1379    as an argument.
1380  - **api_base_url** (<code>str | None</code>) – An optional base URL.
1381  - **organization** (<code>str | None</code>) – Your organization ID, defaults to `None`. See
1382    [production best practices](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
1383  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model. These parameters are sent directly to
1384    the OpenAI endpoint. See OpenAI [documentation](https://platform.openai.com/docs/api-reference/chat) for
1385    more details.
1386    Some of the supported parameters:
1387  - `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion,
1388    including visible output tokens and reasoning tokens.
1389  - `temperature`: What sampling temperature to use. Higher values mean the model will take more risks.
1390    Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
1391  - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
1392    considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
1393    comprising the top 10% probability mass are considered.
1394  - `n`: How many completions to generate for each prompt. For example, if the LLM gets 3 prompts and n is 2,
1395    it will generate two completions for each of the three prompts, ending up with 6 completions in total.
1396  - `stop`: One or more sequences after which the LLM should stop generating tokens.
- `presence_penalty`: The penalty to apply if a token has already appeared in the text at all. Higher values
  make the model less likely to repeat the same token.
- `frequency_penalty`: The penalty to apply based on how often a token has already appeared in the text.
  Higher values make the model less likely to repeat the same token.
1401  - `logit_bias`: Add a logit bias to specific tokens. The keys of the dictionary are tokens, and the
1402    values are the bias to add to that token.
1403  - `response_format`: A JSON schema or a Pydantic model that enforces the structure of the model's response.
1404    If provided, the output will always be validated against this
1405    format (unless the model returns a tool call).
1406    For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
1407    Notes:
  - This parameter accepts Pydantic models and JSON schemas for the latest models, starting from GPT-4o.
    Older models only support a basic version of structured outputs through `{"type": "json_object"}`.
1410      For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
1411    - For structured outputs with streaming,
1412      the `response_format` must be a JSON schema and not a Pydantic model.
1413  - **timeout** (<code>float | None</code>) – Timeout for OpenAI client calls. If not set, it defaults to either the
1414    `OPENAI_TIMEOUT` environment variable, or 30 seconds.
1415  - **max_retries** (<code>int | None</code>) – Maximum number of retries to contact OpenAI after an internal error.
  If not set, it defaults to either the `OPENAI_MAX_RETRIES` environment variable, or 5.
1417  - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
1418  - **tools_strict** (<code>bool</code>) – Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
1419    the schema provided in the `parameters` field of the tool definition, but this may increase latency.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
1421    For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).
1422  
1423  #### warm_up
1424  
1425  ```python
1426  warm_up()
1427  ```
1428  
1429  Warm up the OpenAI chat generator.
1430  
1431  This will warm up the tools registered in the chat generator.
1432  This method is idempotent and will only warm up the tools once.
1433  
1434  #### to_dict
1435  
1436  ```python
1437  to_dict() -> dict[str, Any]
1438  ```
1439  
1440  Serialize this component to a dictionary.
1441  
1442  **Returns:**
1443  
1444  - <code>dict\[str, Any\]</code> – The serialized component as a dictionary.
1445  
1446  #### from_dict
1447  
1448  ```python
1449  from_dict(data: dict[str, Any]) -> OpenAIChatGenerator
1450  ```
1451  
1452  Deserialize this component from a dictionary.
1453  
1454  **Parameters:**
1455  
1456  - **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.
1457  
1458  **Returns:**
1459  
1460  - <code>OpenAIChatGenerator</code> – The deserialized component instance.
1461  
1462  #### run
1463  
1464  ```python
1465  run(
1466      messages: list[ChatMessage],
1467      streaming_callback: StreamingCallbackT | None = None,
1468      generation_kwargs: dict[str, Any] | None = None,
1469      *,
1470      tools: ToolsType | None = None,
1471      tools_strict: bool | None = None
1472  ) -> dict[str, list[ChatMessage]]
1473  ```
1474  
1475  Invokes chat completion based on the provided messages and generation parameters.
1476  
1477  **Parameters:**
1478  
1479  - **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage instances representing the input messages.
1480  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
1481  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. These parameters will
1482    override the parameters passed during component initialization.
1483    For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat/create).
1484  - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
1485    If set, it will override the `tools` parameter provided during initialization.
1486  - **tools_strict** (<code>bool | None</code>) – Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
1487    the schema provided in the `parameters` field of the tool definition, but this may increase latency.
1488    If set, it will override the `tools_strict` parameter set during component initialization.
1489  
1490  **Returns:**
1491  
1492  - <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following key:
1493  - `replies`: A list containing the generated responses as ChatMessage instances.
1494  
1495  #### run_async
1496  
1497  ```python
1498  run_async(
1499      messages: list[ChatMessage],
1500      streaming_callback: StreamingCallbackT | None = None,
1501      generation_kwargs: dict[str, Any] | None = None,
1502      *,
1503      tools: ToolsType | None = None,
1504      tools_strict: bool | None = None
1505  ) -> dict[str, list[ChatMessage]]
1506  ```
1507  
1508  Asynchronously invokes chat completion based on the provided messages and generation parameters.
1509  
1510  This is the asynchronous version of the `run` method. It has the same parameters and return values
1511  but can be used with `await` in async code.
1512  
1513  **Parameters:**
1514  
1515  - **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage instances representing the input messages.
1516  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
1517    Must be a coroutine.
1518  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. These parameters will
1519    override the parameters passed during component initialization.
1520    For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat/create).
1521  - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
1522    If set, it will override the `tools` parameter provided during initialization.
1523  - **tools_strict** (<code>bool | None</code>) – Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
1524    the schema provided in the `parameters` field of the tool definition, but this may increase latency.
1525    If set, it will override the `tools_strict` parameter set during component initialization.
1526  
1527  **Returns:**
1528  
1529  - <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following key:
1530  - `replies`: A list containing the generated responses as ChatMessage instances.
1531  
1532  ## chat/openai_responses
1533  
1534  ### OpenAIResponsesChatGenerator
1535  
1536  Completes chats using OpenAI's Responses API.
1537  
It works with the gpt-4 and o-series models and supports streaming responses
from the OpenAI API. It uses the [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
format for input and output.
1541  
1542  You can customize how the text is generated by passing parameters to the
1543  OpenAI API. Use the `**generation_kwargs` argument when you initialize
1544  the component or when you run it. Any parameter that works with
1545  `openai.Responses.create` will work here too.
1546  
1547  For details on OpenAI API parameters, see
1548  [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses).
1549  
1550  ### Usage example
1551  
1552  ```python
1553  from haystack.components.generators.chat import OpenAIResponsesChatGenerator
1554  from haystack.dataclasses import ChatMessage
1555  
1556  messages = [ChatMessage.from_user("What's Natural Language Processing?")]
1557  
1558  client = OpenAIResponsesChatGenerator(generation_kwargs={"reasoning": {"effort": "low", "summary": "auto"}})
1559  response = client.run(messages)
1560  print(response)
1561  ```
1562  
1563  #### __init__
1564  
1565  ```python
1566  __init__(
1567      *,
1568      api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
1569      model: str = "gpt-5-mini",
1570      streaming_callback: StreamingCallbackT | None = None,
1571      api_base_url: str | None = None,
1572      organization: str | None = None,
1573      generation_kwargs: dict[str, Any] | None = None,
1574      timeout: float | None = None,
1575      max_retries: int | None = None,
1576      tools: ToolsType | list[dict] | None = None,
1577      tools_strict: bool = False,
1578      http_client_kwargs: dict[str, Any] | None = None
1579  )
1580  ```
1581  
1582  Creates an instance of OpenAIResponsesChatGenerator. Uses OpenAI's gpt-5-mini by default.
1583  
1584  Before initializing the component, you can set the 'OPENAI_TIMEOUT' and 'OPENAI_MAX_RETRIES'
1585  environment variables to override the `timeout` and `max_retries` parameters respectively
1586  in the OpenAI client.
1587  
1588  **Parameters:**
1589  
1590  - **api_key** (<code>Secret</code>) – The OpenAI API key.
1591    You can set it with an environment variable `OPENAI_API_KEY`, or pass with this parameter
1592    during initialization.
1593  - **model** (<code>str</code>) – The name of the model to use.
1594  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
1595    The callback function accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
1596    as an argument.
1597  - **api_base_url** (<code>str | None</code>) – An optional base URL.
1598  - **organization** (<code>str | None</code>) – Your organization ID, defaults to `None`. See
1599    [production best practices](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
1600  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model. These parameters are sent
1601    directly to the OpenAI endpoint.
1602    See OpenAI [documentation](https://platform.openai.com/docs/api-reference/responses) for
1603    more details.
1604    Some of the supported parameters:
1605  - `temperature`: What sampling temperature to use. Higher values like 0.8 will make the output more random,
1606    while lower values like 0.2 will make it more focused and deterministic.
1607  - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
1608    considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
1609    comprising the top 10% probability mass are considered.
1610  - `previous_response_id`: The ID of the previous response.
1611    Use this to create multi-turn conversations.
1612  - `text_format`: A Pydantic model that enforces the structure of the model's response.
1613    If provided, the output will always be validated against this
1614    format (unless the model returns a tool call).
1615    For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
1616  - `text`: A JSON schema that enforces the structure of the model's response.
1617    If provided, the output will always be validated against this
1618    format (unless the model returns a tool call).
1619    Notes:
  - Both JSON Schema and Pydantic models are supported for the latest models, starting from GPT-4o.
  - If both are provided, `text_format` takes precedence and the JSON schema passed to `text` is ignored.
  - Currently, this component doesn't support streaming for structured outputs.
  - Older models only support a basic version of structured outputs through `{"type": "json_object"}`.
1624      For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
1625  - `reasoning`: A dictionary of parameters for reasoning. For example:
1626    - `summary`: The summary of the reasoning.
1627    - `effort`: The level of effort to put into the reasoning. Can be `low`, `medium` or `high`.
1628    - `generate_summary`: Whether to generate a summary of the reasoning.
    Note: OpenAI does not return the reasoning tokens, but you can view the summary if it is enabled.
1630      For details, see the [OpenAI Reasoning documentation](https://platform.openai.com/docs/guides/reasoning).
1631  - **timeout** (<code>float | None</code>) – Timeout for OpenAI client calls. If not set, it defaults to either the
1632    `OPENAI_TIMEOUT` environment variable, or 30 seconds.
1633  - **max_retries** (<code>int | None</code>) – Maximum number of retries to contact OpenAI after an internal error.
  If not set, it defaults to either the `OPENAI_MAX_RETRIES` environment variable, or 5.
- **tools** (<code>ToolsType | list\[dict\] | None</code>) – The tools that the model can use to prepare calls. This parameter accepts either a
  mixed list of Haystack `Tool` and `Toolset` objects, or a list of dictionaries with
  OpenAI/MCP tool definitions.
  Note: You cannot pass OpenAI/MCP tools and Haystack tools together.
1639    For details on tool support, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create#responses-create-tools).
- **tools_strict** (<code>bool</code>) – Whether to enable strict schema adherence for tool calls. If set to `False`, the model may not exactly
  follow the schema provided in the `parameters` field of the tool definition. In the Responses API, tool calls
  are strict by default.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
1644    For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).
1645  
1646  #### warm_up
1647  
1648  ```python
1649  warm_up()
1650  ```
1651  
1652  Warm up the OpenAI responses chat generator.
1653  
1654  This will warm up the tools registered in the chat generator.
1655  This method is idempotent and will only warm up the tools once.
1656  
1657  #### to_dict
1658  
1659  ```python
1660  to_dict() -> dict[str, Any]
1661  ```
1662  
1663  Serialize this component to a dictionary.
1664  
1665  **Returns:**
1666  
1667  - <code>dict\[str, Any\]</code> – The serialized component as a dictionary.
1668  
1669  #### from_dict
1670  
1671  ```python
1672  from_dict(data: dict[str, Any]) -> OpenAIResponsesChatGenerator
1673  ```
1674  
1675  Deserialize this component from a dictionary.
1676  
1677  **Parameters:**
1678  
1679  - **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.
1680  
1681  **Returns:**
1682  
1683  - <code>OpenAIResponsesChatGenerator</code> – The deserialized component instance.
1684  
1685  #### run
1686  
1687  ```python
1688  run(
1689      messages: list[ChatMessage],
1690      *,
1691      streaming_callback: StreamingCallbackT | None = None,
1692      generation_kwargs: dict[str, Any] | None = None,
1693      tools: ToolsType | list[dict] | None = None,
1694      tools_strict: bool | None = None
1695  ) -> dict[str, list[ChatMessage]]
1696  ```
1697  
1698  Invokes response generation based on the provided messages and generation parameters.
1699  
1700  **Parameters:**
1701  
1702  - **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage instances representing the input messages.
1703  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
1704  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. These parameters will
1705    override the parameters passed during component initialization.
1706    For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create).
- **tools** (<code>ToolsType | list\[dict\] | None</code>) – The tools that the model can use to prepare calls. If set, it will override the
  `tools` parameter set during component initialization. This parameter accepts either a
  mixed list of Haystack `Tool` and `Toolset` objects, or a list of dictionaries with
  OpenAI/MCP tool definitions.
  Note: You cannot pass OpenAI/MCP tools and Haystack tools together.
1712    For details on tool support, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create#responses-create-tools).
- **tools_strict** (<code>bool | None</code>) – Whether to enable strict schema adherence for tool calls. If set to `False`, the model may not exactly
  follow the schema provided in the `parameters` field of the tool definition. In the Responses API, tool calls
  are strict by default.
  If set, it will override the `tools_strict` parameter set during component initialization.
1717  
1718  **Returns:**
1719  
1720  - <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following key:
1721  - `replies`: A list containing the generated responses as ChatMessage instances.
1722  
1723  #### run_async
1724  
1725  ```python
1726  run_async(
1727      messages: list[ChatMessage],
1728      *,
1729      streaming_callback: StreamingCallbackT | None = None,
1730      generation_kwargs: dict[str, Any] | None = None,
1731      tools: ToolsType | list[dict] | None = None,
1732      tools_strict: bool | None = None
1733  ) -> dict[str, list[ChatMessage]]
1734  ```
1735  
1736  Asynchronously invokes response generation based on the provided messages and generation parameters.
1737  
1738  This is the asynchronous version of the `run` method. It has the same parameters and return values
1739  but can be used with `await` in async code.
1740  
1741  **Parameters:**
1742  
1743  - **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage instances representing the input messages.
1744  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
1745    Must be a coroutine.
1746  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. These parameters will
1747    override the parameters passed during component initialization.
1748    For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create).
- **tools** (<code>ToolsType | list\[dict\] | None</code>) – The tools that the model can use to prepare calls. If set, it will override the
  `tools` parameter set during component initialization. This parameter accepts either a
  mixed list of Haystack `Tool` and `Toolset` objects, or a list of dictionaries with
  OpenAI/MCP tool definitions.
  Note: You cannot pass OpenAI/MCP tools and Haystack tools together.
1754  - **tools_strict** (<code>bool | None</code>) – Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
1755    the schema provided in the `parameters` field of the tool definition, but this may increase latency.
1756    If set, it will override the `tools_strict` parameter set during component initialization.
1757  
1758  **Returns:**
1759  
1760  - <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following key:
1761  - `replies`: A list containing the generated responses as ChatMessage instances.
1762  
1763  ## hugging_face_api
1764  
1765  ### HuggingFaceAPIGenerator
1766  
1767  Generates text using Hugging Face APIs.
1768  
1769  Use it with the following Hugging Face APIs:
1770  
1771  - [Paid Inference Endpoints](https://huggingface.co/inference-endpoints)
1772  - [Self-hosted Text Generation Inference](https://github.com/huggingface/text-generation-inference)
1773  
1774  **Note:** As of July 2025, the Hugging Face Inference API no longer offers generative models through the
1775  `text_generation` endpoint. Generative models are now only available through providers supporting the
1776  `chat_completion` endpoint. As a result, this component might no longer work with the Hugging Face Inference API.
1777  Use the `HuggingFaceAPIChatGenerator` component, which supports the `chat_completion` endpoint.
1778  
1779  ### Usage examples
1780  
1781  #### With Hugging Face Inference Endpoints
1782  
1783  ```python
1784  from haystack.components.generators import HuggingFaceAPIGenerator
1785  from haystack.utils import Secret
1786  
1787  generator = HuggingFaceAPIGenerator(api_type="inference_endpoints",
1788                                      api_params={"url": "<your-inference-endpoint-url>"},
1789                                      token=Secret.from_token("<your-api-key>"))
1790  
1791  result = generator.run(prompt="What's Natural Language Processing?")
1792  print(result)
1793  ```
1794  
1795  #### With self-hosted text generation inference
1796  
1797  ```python
1798  from haystack.components.generators import HuggingFaceAPIGenerator
1799  
1800  generator = HuggingFaceAPIGenerator(api_type="text_generation_inference",
1801                                      api_params={"url": "http://localhost:8080"})
1802  
1803  result = generator.run(prompt="What's Natural Language Processing?")
1804  print(result)
1805  ```
1806  
1807  #### With the free serverless inference API
1808  
Be aware that this example might not work, as the Hugging Face Inference API no longer offers models that support the
`text_generation` endpoint. Use the `HuggingFaceAPIChatGenerator` for generative models through the
`chat_completion` endpoint.
1812  
1813  ```python
1814  from haystack.components.generators import HuggingFaceAPIGenerator
1815  from haystack.utils import Secret
1816  
1817  generator = HuggingFaceAPIGenerator(api_type="serverless_inference_api",
1818                                      api_params={"model": "HuggingFaceH4/zephyr-7b-beta"},
1819                                      token=Secret.from_token("<your-api-key>"))
1820  
1821  result = generator.run(prompt="What's Natural Language Processing?")
1822  print(result)
1823  ```
1824  
1825  #### __init__
1826  
1827  ```python
1828  __init__(
1829      api_type: HFGenerationAPIType | str,
1830      api_params: dict[str, str],
1831      token: Secret | None = Secret.from_env_var(
1832          ["HF_API_TOKEN", "HF_TOKEN"], strict=False
1833      ),
1834      generation_kwargs: dict[str, Any] | None = None,
1835      stop_words: list[str] | None = None,
1836      streaming_callback: StreamingCallbackT | None = None,
1837  )
1838  ```
1839  
1840  Initialize the HuggingFaceAPIGenerator instance.
1841  
1842  **Parameters:**
1843  
1844  - **api_type** (<code>HFGenerationAPIType | str</code>) – The type of Hugging Face API to use. Available types:
1845  - `text_generation_inference`: See [TGI](https://github.com/huggingface/text-generation-inference).
1846  - `inference_endpoints`: See [Inference Endpoints](https://huggingface.co/inference-endpoints).
1847  - `serverless_inference_api`: See [Serverless Inference API](https://huggingface.co/inference-api).
1848    This might no longer work due to changes in the models offered in the Hugging Face Inference API.
1849    Please use the `HuggingFaceAPIChatGenerator` component instead.
1850  - **api_params** (<code>dict\[str, str\]</code>) – A dictionary with the following keys:
1851  - `model`: Hugging Face model ID. Required when `api_type` is `SERVERLESS_INFERENCE_API`.
1852  - `url`: URL of the inference endpoint. Required when `api_type` is `INFERENCE_ENDPOINTS` or
1853    `TEXT_GENERATION_INFERENCE`.
1854  - Other parameters specific to the chosen API type, such as `timeout`, `headers`, `provider` etc.
1855  - **token** (<code>Secret | None</code>) – The Hugging Face token to use as HTTP bearer authorization.
1856    Check your HF token in your [account settings](https://huggingface.co/settings/tokens).
1857  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary with keyword arguments to customize text generation. Some examples: `max_new_tokens`,
1858    `temperature`, `top_k`, `top_p`.
  For details, see the [Hugging Face documentation](https://huggingface.co/docs/huggingface_hub/en/package_reference/inference_client#huggingface_hub.InferenceClient.text_generation).
1861  - **stop_words** (<code>list\[str\] | None</code>) – An optional list of strings representing the stop words.
1862  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses.
1863  
1864  #### to_dict
1865  
1866  ```python
1867  to_dict() -> dict[str, Any]
1868  ```
1869  
1870  Serialize this component to a dictionary.
1871  
1872  **Returns:**
1873  
1874  - <code>dict\[str, Any\]</code> – A dictionary containing the serialized component.
1875  
1876  #### from_dict
1877  
1878  ```python
1879  from_dict(data: dict[str, Any]) -> HuggingFaceAPIGenerator
1880  ```
1881  
1882  Deserialize this component from a dictionary.
1883  
1884  #### run
1885  
1886  ```python
1887  run(
1888      prompt: str,
1889      streaming_callback: StreamingCallbackT | None = None,
1890      generation_kwargs: dict[str, Any] | None = None,
1891  )
1892  ```
1893  
1894  Invoke the text generation inference for the given prompt and generation parameters.
1895  
1896  **Parameters:**
1897  
1898  - **prompt** (<code>str</code>) – A string representing the prompt.
1899  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
1900  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation.
1901  
1902  **Returns:**
1903  
- – A dictionary with the generated replies and metadata. Both are lists of length n.
- `replies`: A list of strings representing the generated replies.
- `meta`: A list of dictionaries containing the metadata for each reply.
1906  
1907  ## hugging_face_local
1908  
1909  ### HuggingFaceLocalGenerator
1910  
1911  Generates text using models from Hugging Face that run locally.
1912  
1913  LLMs running locally may need powerful hardware.
1914  
1915  ### Usage example
1916  
1917  ```python
1918  from haystack.components.generators import HuggingFaceLocalGenerator
1919  
1920  generator = HuggingFaceLocalGenerator(
1921      model="Qwen/Qwen3-0.6B",
1922      task="text-generation",
1923      generation_kwargs={"max_new_tokens": 100, "temperature": 0.9}
1924  )
1925  
1926  print(generator.run("Who is the best American actor?"))
1927  # {'replies': ['John Cusack']}
1928  ```
1929  
1930  #### __init__
1931  
1932  ```python
1933  __init__(
1934      model: str = "Qwen/Qwen3-0.6B",
1935      task: Literal["text-generation", "text2text-generation"] | None = None,
1936      device: ComponentDevice | None = None,
1937      token: Secret | None = Secret.from_env_var(
1938          ["HF_API_TOKEN", "HF_TOKEN"], strict=False
1939      ),
1940      generation_kwargs: dict[str, Any] | None = None,
1941      huggingface_pipeline_kwargs: dict[str, Any] | None = None,
1942      stop_words: list[str] | None = None,
1943      streaming_callback: StreamingCallbackT | None = None,
1944  )
1945  ```
1946  
Creates an instance of HuggingFaceLocalGenerator.
1948  
1949  **Parameters:**
1950  
1951  - **model** (<code>str</code>) – The Hugging Face text generation model name or path.
1952  - **task** (<code>Literal['text-generation', 'text2text-generation'] | None</code>) – The task for the Hugging Face pipeline. Possible options:
1953  - `text-generation`: Supported by decoder models, like GPT.
1954  - `text2text-generation`: Deprecated as of Transformers v5; use `text-generation` instead.
1955    Previously supported by encoder–decoder models such as T5.
1956    If the task is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
1957    If not specified, the component calls the Hugging Face API to infer the task from the model name.
1958  - **device** (<code>ComponentDevice | None</code>) – The device for loading the model. If `None`, automatically selects the default device.
1959    If a device or device map is specified in `huggingface_pipeline_kwargs`, it overrides this parameter.
1960  - **token** (<code>Secret | None</code>) – The token to use as HTTP bearer authorization for remote files.
1961    If the token is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
1962  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary with keyword arguments to customize text generation.
1963    Some examples: `max_length`, `max_new_tokens`, `temperature`, `top_k`, `top_p`.
1964    See Hugging Face's documentation for more information:
1965  - [customize-text-generation](https://huggingface.co/docs/transformers/main/en/generation_strategies#customize-text-generation)
1966  - [transformers.GenerationConfig](https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.GenerationConfig)
1967  - **huggingface_pipeline_kwargs** (<code>dict\[str, Any\] | None</code>) – Dictionary with keyword arguments to initialize the
1968    Hugging Face pipeline for text generation.
1969    These keyword arguments provide fine-grained control over the Hugging Face pipeline.
1970    In case of duplication, these kwargs override `model`, `task`, `device`, and `token` init parameters.
1971    For available kwargs, see [Hugging Face documentation](https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.pipeline.task).
1972    In this dictionary, you can also include `model_kwargs` to specify the kwargs for model initialization:
1973    [transformers.PreTrainedModel.from_pretrained](https://huggingface.co/docs/transformers/en/main_classes/model#transformers.PreTrainedModel.from_pretrained)
1974  - **stop_words** (<code>list\[str\] | None</code>) – If the model generates a stop word, the generation stops.
1975    If you provide this parameter, don't specify the `stopping_criteria` in `generation_kwargs`.
1976    For some chat models, the output includes both the new text and the original prompt.
1977    In these cases, make sure your prompt has no stop words.
1978  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses.
1979  
1980  #### warm_up
1981  
1982  ```python
1983  warm_up()
1984  ```
1985  
1986  Initializes the component.
1987  
1988  #### to_dict
1989  
1990  ```python
1991  to_dict() -> dict[str, Any]
1992  ```
1993  
1994  Serializes the component to a dictionary.
1995  
1996  **Returns:**
1997  
1998  - <code>dict\[str, Any\]</code> – Dictionary with serialized data.
1999  
2000  #### from_dict
2001  
2002  ```python
2003  from_dict(data: dict[str, Any]) -> HuggingFaceLocalGenerator
2004  ```
2005  
2006  Deserializes the component from a dictionary.
2007  
2008  **Parameters:**
2009  
2010  - **data** (<code>dict\[str, Any\]</code>) – The dictionary to deserialize from.
2011  
2012  **Returns:**
2013  
2014  - <code>HuggingFaceLocalGenerator</code> – The deserialized component.
2015  
2016  #### run
2017  
2018  ```python
2019  run(
2020      prompt: str,
2021      streaming_callback: StreamingCallbackT | None = None,
2022      generation_kwargs: dict[str, Any] | None = None,
2023  )
2024  ```
2025  
2026  Run the text generation model on the given prompt.
2027  
2028  **Parameters:**
2029  
2030  - **prompt** (<code>str</code>) – A string representing the prompt.
2031  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
2032  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation.
2033  
2034  **Returns:**
2035  
- – A dictionary containing the generated replies.
  - `replies`: A list of strings representing the generated replies.
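
Any callable that accepts a streamed chunk can serve as `streaming_callback`. A minimal collector sketch, assuming only that each chunk exposes a `.content` string (as Haystack's `StreamingChunk` does):

```python
class TokenCollector:
    """Illustrative sketch: collects streamed tokens instead of printing them."""

    def __init__(self) -> None:
        self.tokens: list[str] = []

    def __call__(self, chunk) -> None:
        # Called once per streamed chunk; chunk.content holds the newly generated text.
        self.tokens.append(chunk.content)

    @property
    def text(self) -> str:
        # The full reply is the concatenation of all streamed pieces.
        return "".join(self.tokens)
```

An instance would then be passed as `generator.run(prompt, streaming_callback=collector)`; after the call, `collector.text` holds the complete reply.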
2038  
2039  ## openai
2040  
2041  ### OpenAIGenerator
2042  
2043  Generates text using OpenAI's large language models (LLMs).
2044  
It works with the gpt-4 and gpt-5 series models and supports streaming responses
from the OpenAI API. It uses strings as input and output.
2047  
2048  You can customize how the text is generated by passing parameters to the
2049  OpenAI API. Use the `**generation_kwargs` argument when you initialize
2050  the component or when you run it. Any parameter that works with
2051  `openai.ChatCompletion.create` will work here too.
2052  
2053  For details on OpenAI API parameters, see
2054  [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).
2055  
2056  ### Usage example
2057  
2058  ```python
2059  from haystack.components.generators import OpenAIGenerator
2060  client = OpenAIGenerator()
2061  response = client.run("What's Natural Language Processing? Be brief.")
2062  print(response)
2063  
2064  >> {'replies': ['Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on
2065  >> the interaction between computers and human language. It involves enabling computers to understand, interpret,
2066  >> and respond to natural human language in a way that is both meaningful and useful.'], 'meta': [{'model':
2067  >> 'gpt-5-mini', 'index': 0, 'finish_reason': 'stop', 'usage': {'prompt_tokens': 16,
2068  >> 'completion_tokens': 49, 'total_tokens': 65}}]}
2069  ```
2070  
2071  #### __init__
2072  
2073  ```python
2074  __init__(
2075      api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
2076      model: str = "gpt-5-mini",
2077      streaming_callback: StreamingCallbackT | None = None,
2078      api_base_url: str | None = None,
2079      organization: str | None = None,
2080      system_prompt: str | None = None,
2081      generation_kwargs: dict[str, Any] | None = None,
2082      timeout: float | None = None,
2083      max_retries: int | None = None,
2084      http_client_kwargs: dict[str, Any] | None = None,
2085  )
2086  ```
2087  
Creates an instance of OpenAIGenerator. Unless specified otherwise in `model`, uses OpenAI's gpt-5-mini.

By setting the `OPENAI_TIMEOUT` and `OPENAI_MAX_RETRIES` environment variables, you can change the timeout
and max_retries parameters in the OpenAI client.
2092  
2093  **Parameters:**
2094  
2095  - **api_key** (<code>Secret</code>) – The OpenAI API key to connect to OpenAI.
2096  - **model** (<code>str</code>) – The name of the model to use.
2097  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
2098    The callback function accepts StreamingChunk as an argument.
2099  - **api_base_url** (<code>str | None</code>) – An optional base URL.
2100  - **organization** (<code>str | None</code>) – The Organization ID, defaults to `None`.
2101  - **system_prompt** (<code>str | None</code>) – The system prompt to use for text generation. If not provided, the system prompt is
2102    omitted, and the default system prompt of the model is used.
2103  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model. These parameters are all sent directly to
2104    the OpenAI endpoint. See OpenAI [documentation](https://platform.openai.com/docs/api-reference/chat) for
2105    more details.
2106    Some of the supported parameters:
2107  - `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion,
2108    including visible output tokens and reasoning tokens.
2109  - `temperature`: What sampling temperature to use. Higher values mean the model will take more risks.
2110    Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
2111  - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
2112    considers the results of the tokens with top_p probability mass. So, 0.1 means only the tokens
2113    comprising the top 10% probability mass are considered.
2114  - `n`: How many completions to generate for each prompt. For example, if the LLM gets 3 prompts and n is 2,
2115    it will generate two completions for each of the three prompts, ending up with 6 completions in total.
2116  - `stop`: One or more sequences after which the LLM should stop generating tokens.
- `presence_penalty`: The penalty to apply if a token is already present in the text. Higher values mean
  the model is less likely to repeat the same token.
- `frequency_penalty`: The penalty to apply if a token has already been generated in the text.
  Higher values mean the model is less likely to repeat the same token.
2121  - `logit_bias`: Add a logit bias to specific tokens. The keys of the dictionary are tokens, and the
2122    values are the bias to add to that token.
2123  - **timeout** (<code>float | None</code>) – Timeout for OpenAI Client calls, if not set it is inferred from the `OPENAI_TIMEOUT` environment variable
2124    or set to 30.
2125  - **max_retries** (<code>int | None</code>) – Maximum retries to establish contact with OpenAI if it returns an internal error, if not set it is inferred
2126    from the `OPENAI_MAX_RETRIES` environment variable or set to 5.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
2128    For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).
2129  
2130  #### to_dict
2131  
2132  ```python
2133  to_dict() -> dict[str, Any]
2134  ```
2135  
2136  Serialize this component to a dictionary.
2137  
2138  **Returns:**
2139  
2140  - <code>dict\[str, Any\]</code> – The serialized component as a dictionary.
2141  
2142  #### from_dict
2143  
2144  ```python
2145  from_dict(data: dict[str, Any]) -> OpenAIGenerator
2146  ```
2147  
2148  Deserialize this component from a dictionary.
2149  
2150  **Parameters:**
2151  
2152  - **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.
2153  
2154  **Returns:**
2155  
2156  - <code>OpenAIGenerator</code> – The deserialized component instance.
2157  
2158  #### run
2159  
2160  ```python
2161  run(
2162      prompt: str,
2163      system_prompt: str | None = None,
2164      streaming_callback: StreamingCallbackT | None = None,
2165      generation_kwargs: dict[str, Any] | None = None,
2166  ) -> dict[str, list[str] | list[dict[str, Any]]]
2167  ```
2168  
Invoke the text generation inference based on the provided prompt and generation parameters.
2170  
2171  **Parameters:**
2172  
2173  - **prompt** (<code>str</code>) – The string prompt to use for text generation.
- **system_prompt** (<code>str | None</code>) – The system prompt to use for text generation. If omitted at run time, the system
  prompt defined at initialization time, if any, is used.
2176  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
2177  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. These parameters will potentially override the parameters
2178    passed in the `__init__` method. For more details on the parameters supported by the OpenAI API, refer to
2179    the OpenAI [documentation](https://platform.openai.com/docs/api-reference/chat/create).
2180  
2181  **Returns:**
2182  
- <code>dict\[str, list\[str\] | list\[dict\[str, Any\]\]\]</code> – A dictionary containing a list of the generated responses and a list of dictionaries with the metadata
  for each response.
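
The `replies` and `meta` lists align index for index, so pairing them is a `zip`. The dictionary below mirrors the response shape from the usage example above:

```python
# Response shape taken from the usage example above.
response = {
    "replies": ["Natural Language Processing (NLP) is a branch of AI ..."],
    "meta": [
        {
            "model": "gpt-5-mini",
            "finish_reason": "stop",
            "usage": {"prompt_tokens": 16, "completion_tokens": 49, "total_tokens": 65},
        }
    ],
}

for reply, meta in zip(response["replies"], response["meta"]):
    # Each reply aligns index for index with its metadata entry.
    print(f"[{meta['model']}, {meta['usage']['total_tokens']} tokens] {reply}")
```
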
2185  
2186  ## openai_dalle
2187  
2188  ### DALLEImageGenerator
2189  
2190  Generates images using OpenAI's DALL-E model.
2191  
2192  For details on OpenAI API parameters, see
2193  [OpenAI documentation](https://platform.openai.com/docs/api-reference/images/create).
2194  
2195  ### Usage example
2196  
2197  ```python
2198  from haystack.components.generators import DALLEImageGenerator
2199  image_generator = DALLEImageGenerator()
2200  response = image_generator.run("Show me a picture of a black cat.")
2201  print(response)
2202  ```
2203  
2204  #### __init__
2205  
2206  ```python
2207  __init__(
2208      model: str = "dall-e-3",
2209      quality: Literal["standard", "hd"] = "standard",
2210      size: Literal[
2211          "256x256", "512x512", "1024x1024", "1792x1024", "1024x1792"
2212      ] = "1024x1024",
2213      response_format: Literal["url", "b64_json"] = "url",
2214      api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
2215      api_base_url: str | None = None,
2216      organization: str | None = None,
2217      timeout: float | None = None,
2218      max_retries: int | None = None,
2219      http_client_kwargs: dict[str, Any] | None = None,
2220  )
2221  ```
2222  
2223  Creates an instance of DALLEImageGenerator. Unless specified otherwise in `model`, uses OpenAI's dall-e-3.
2224  
2225  **Parameters:**
2226  
2227  - **model** (<code>str</code>) – The model to use for image generation. Can be "dall-e-2" or "dall-e-3".
2228  - **quality** (<code>Literal['standard', 'hd']</code>) – The quality of the generated image. Can be "standard" or "hd".
2229  - **size** (<code>Literal['256x256', '512x512', '1024x1024', '1792x1024', '1024x1792']</code>) – The size of the generated images.
2230    Must be one of 256x256, 512x512, or 1024x1024 for dall-e-2.
2231    Must be one of 1024x1024, 1792x1024, or 1024x1792 for dall-e-3 models.
2232  - **response_format** (<code>Literal['url', 'b64_json']</code>) – The format of the response. Can be "url" or "b64_json".
2233  - **api_key** (<code>Secret</code>) – The OpenAI API key to connect to OpenAI.
2234  - **api_base_url** (<code>str | None</code>) – An optional base URL.
2235  - **organization** (<code>str | None</code>) – The Organization ID, defaults to `None`.
2236  - **timeout** (<code>float | None</code>) – Timeout for OpenAI Client calls. If not set, it is inferred from the `OPENAI_TIMEOUT` environment variable
2237    or set to 30.
2238  - **max_retries** (<code>int | None</code>) – Maximum retries to establish contact with OpenAI if it returns an internal error. If not set, it is inferred
2239    from the `OPENAI_MAX_RETRIES` environment variable or set to 5.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
2241    For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).
2242  
2243  #### warm_up
2244  
2245  ```python
2246  warm_up() -> None
2247  ```
2248  
2249  Warm up the OpenAI client.
2250  
2251  #### run
2252  
2253  ```python
2254  run(
2255      prompt: str,
2256      size: (
2257          Literal["256x256", "512x512", "1024x1024", "1792x1024", "1024x1792"]
2258          | None
2259      ) = None,
2260      quality: Literal["standard", "hd"] | None = None,
2261      response_format: Literal["url", "b64_json"] | None = None,
2262  )
2263  ```
2264  
2265  Invokes the image generation inference based on the provided prompt and generation parameters.
2266  
2267  **Parameters:**
2268  
2269  - **prompt** (<code>str</code>) – The prompt to generate the image.
2270  - **size** (<code>Literal['256x256', '512x512', '1024x1024', '1792x1024', '1024x1792'] | None</code>) – If provided, overrides the size provided during initialization.
2271  - **quality** (<code>Literal['standard', 'hd'] | None</code>) – If provided, overrides the quality provided during initialization.
2272  - **response_format** (<code>Literal['url', 'b64_json'] | None</code>) – If provided, overrides the response format provided during initialization.
2273  
2274  **Returns:**
2275  
2276  - – A dictionary containing the generated list of images and the revised prompt.
2277    Depending on the `response_format` parameter, the list of images can be URLs or base64 encoded JSON strings.
2278    The revised prompt is the prompt that was used to generate the image, if there was any revision
2279    to the prompt made by OpenAI.
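
Handling both response formats can be sketched as below, assuming (as in Haystack's implementation) that the result dictionary uses the keys `images` and `revised_prompt` for the two values described above:

```python
import base64

def save_first_image(result: dict, response_format: str, path: str = "image.png") -> str:
    """Illustrative sketch: persist or return the first generated image."""
    image = result["images"][0]
    if response_format == "b64_json":
        # The image arrives as a base64-encoded string; decode and write it to disk.
        with open(path, "wb") as f:
            f.write(base64.b64decode(image))
        return path
    # Otherwise the image is a URL, to be fetched with your HTTP client of choice.
    return image
```
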
2280  
2281  #### to_dict
2282  
2283  ```python
2284  to_dict() -> dict[str, Any]
2285  ```
2286  
2287  Serialize this component to a dictionary.
2288  
2289  **Returns:**
2290  
2291  - <code>dict\[str, Any\]</code> – The serialized component as a dictionary.
2292  
2293  #### from_dict
2294  
2295  ```python
2296  from_dict(data: dict[str, Any]) -> DALLEImageGenerator
2297  ```
2298  
2299  Deserialize this component from a dictionary.
2300  
2301  **Parameters:**
2302  
2303  - **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.
2304  
2305  **Returns:**
2306  
2307  - <code>DALLEImageGenerator</code> – The deserialized component instance.
2308  
2309  ## utils
2310  
2311  ### print_streaming_chunk
2312  
2313  ```python
2314  print_streaming_chunk(chunk: StreamingChunk) -> None
2315  ```
2316  
2317  Callback function to handle and display streaming output chunks.
2318  
2319  This function processes a `StreamingChunk` object by:
2320  
2321  - Printing tool call metadata (if any), including function names and arguments, as they arrive.
2322  - Printing tool call results when available.
2323  - Printing the main content (e.g., text tokens) of the chunk as it is received.
2324  
2325  The function outputs data directly to stdout and flushes output buffers to ensure immediate display during
2326  streaming.
2327  
2328  **Parameters:**
2329  
2330  - **chunk** (<code>StreamingChunk</code>) – A chunk of streaming data containing content and optional metadata, such as tool calls and
2331    tool results.
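
The text-printing core of this helper can be sketched in a few lines (the real function also prints tool-call metadata and results; the only assumption here is that `chunk.content` holds the streamed text):

```python
import sys

def print_chunk_content(chunk) -> None:
    # Write the streamed text without a trailing newline and flush
    # immediately, so tokens appear on screen as soon as they arrive.
    sys.stdout.write(chunk.content)
    sys.stdout.flush()
```

In practice you pass the real helper, not this sketch, as the callback, for example `OpenAIGenerator(streaming_callback=print_streaming_chunk)`.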