---
title: "Generators"
id: generators-api
description: "Enables text generation using LLMs."
slug: "/generators-api"
---

## azure

### AzureOpenAIGenerator

Bases: <code>OpenAIGenerator</code>

Generates text using OpenAI's large language models (LLMs).

It works with gpt-4-type models and supports streaming responses
from the OpenAI API.

You can customize how the text is generated by passing parameters to the
OpenAI API. Use the `**generation_kwargs` argument when you initialize
the component or when you run it. Any parameter that works with
`openai.ChatCompletion.create` will work here too.

For details on OpenAI API parameters, see
[OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).

### Usage example

```python
from haystack.components.generators import AzureOpenAIGenerator
from haystack.utils import Secret

client = AzureOpenAIGenerator(
    azure_endpoint="<your Azure endpoint, e.g. https://your-company.azure.openai.com/>",
    api_key=Secret.from_token("<your-api-key>"),
    azure_deployment="<the deployment name, e.g. gpt-4.1-mini>")
response = client.run("What's Natural Language Processing? Be brief.")
print(response)
```

```
>> {'replies': ['Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on
>> the interaction between computers and human language. It involves enabling computers to understand, interpret,
>> and respond to natural human language in a way that is both meaningful and useful.'], 'meta': [{'model':
>> 'gpt-4.1-mini', 'index': 0, 'finish_reason': 'stop', 'usage': {'prompt_tokens': 16,
>> 'completion_tokens': 49, 'total_tokens': 65}}]}
```

#### __init__

```python
__init__(
    azure_endpoint: str | None = None,
    api_version: str | None = "2024-12-01-preview",
    azure_deployment: str | None = "gpt-4.1-mini",
    api_key: Secret | None = Secret.from_env_var(
        "AZURE_OPENAI_API_KEY", strict=False
    ),
    azure_ad_token: Secret | None = Secret.from_env_var(
        "AZURE_OPENAI_AD_TOKEN", strict=False
    ),
    organization: str | None = None,
    streaming_callback: StreamingCallbackT | None = None,
    system_prompt: str | None = None,
    timeout: float | None = None,
    max_retries: int | None = None,
    http_client_kwargs: dict[str, Any] | None = None,
    generation_kwargs: dict[str, Any] | None = None,
    default_headers: dict[str, str] | None = None,
    *,
    azure_ad_token_provider: AzureADTokenProvider | None = None
) -> None
```

Initialize the Azure OpenAI Generator.

**Parameters:**

- **azure_endpoint** (<code>str | None</code>) – The endpoint of the deployed model, for example `https://example-resource.azure.openai.com/`.
- **api_version** (<code>str | None</code>) – The version of the API to use. Defaults to `2024-12-01-preview`.
- **azure_deployment** (<code>str | None</code>) – The deployment of the model, usually the model name.
- **api_key** (<code>Secret | None</code>) – The API key to use for authentication.
- **azure_ad_token** (<code>Secret | None</code>) – [Azure Active Directory token](https://www.microsoft.com/en-us/security/business/identity-access/microsoft-entra-id).
- **organization** (<code>str | None</code>) – Your organization ID, defaults to `None`. For help, see
  [Setting up your organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function called when a new token is received from the stream.
  It accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
  as an argument.
- **system_prompt** (<code>str | None</code>) – The system prompt to use for text generation. If not provided, the
  system prompt is omitted.
- **timeout** (<code>float | None</code>) – Timeout for the AzureOpenAI client. If not set, it is inferred from the
  `OPENAI_TIMEOUT` environment variable or set to 30 seconds.
- **max_retries** (<code>int | None</code>) – Maximum number of retries to contact AzureOpenAI if it returns an internal error.
  If not set, it is inferred from the `OPENAI_MAX_RETRIES` environment variable or set to 5.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
  For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model, sent directly to
  the OpenAI endpoint. See [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat) for
  more details.
  Some of the supported parameters:
  - `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion,
    including visible output tokens and reasoning tokens.
  - `temperature`: The sampling temperature to use. Higher values mean the model takes more risks.
    Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
  - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
    considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
    comprising the top 10% probability mass are considered.
  - `n`: The number of completions to generate for each prompt. For example, with 3 prompts and n=2,
    the LLM will generate two completions per prompt, resulting in 6 completions total.
  - `stop`: One or more sequences after which the LLM should stop generating tokens.
  - `presence_penalty`: The penalty applied if a token is already present.
    Higher values make the model less likely to repeat the token.
  - `frequency_penalty`: The penalty applied if a token has already been generated.
    Higher values make the model less likely to repeat the token.
  - `logit_bias`: Adds a logit bias to specific tokens. The keys of the dictionary are tokens, and the
    values are the bias to add to that token.
- **default_headers** (<code>dict\[str, str\] | None</code>) – Default headers to use for the AzureOpenAI client.
- **azure_ad_token_provider** (<code>AzureADTokenProvider | None</code>) – A function that returns an Azure Active Directory token. It is invoked on
  every request.
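
Since `generation_kwargs` can be set both when initializing the component and when running it, it helps to see how the two combine. The sketch below uses plain dictionaries to illustrate the usual precedence, with run-time values overriding init-time values for the same key; the merge itself is an illustration of this pattern, not Haystack's actual implementation.

```python
# Illustration of how init-time and run-time generation_kwargs combine:
# run-time values take precedence over init-time values for the same key.

init_kwargs = {"temperature": 0.9, "max_completion_tokens": 256}  # set on the component
run_kwargs = {"temperature": 0.2, "stop": ["\n\n"]}               # passed at run time

merged = {**init_kwargs, **run_kwargs}
print(merged)
# {'temperature': 0.2, 'max_completion_tokens': 256, 'stop': ['\n\n']}
```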

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – The serialized component as a dictionary.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> AzureOpenAIGenerator
```

Deserialize this component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.

**Returns:**

- <code>AzureOpenAIGenerator</code> – The deserialized component instance.

## chat/azure

### AzureOpenAIChatGenerator

Bases: <code>OpenAIChatGenerator</code>

Generates text using OpenAI's models on Azure.

It works with gpt-4-type models and supports streaming responses
from the OpenAI API. It uses the [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
format in input and output.

You can customize how the text is generated by passing parameters to the
OpenAI API. Use the `**generation_kwargs` argument when you initialize
the component or when you run it. Any parameter that works with
`openai.ChatCompletion.create` will work here too.

For details on OpenAI API parameters, see
[OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).

### Usage example

```python
from haystack.components.generators.chat import AzureOpenAIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret

messages = [ChatMessage.from_user("What's Natural Language Processing?")]

client = AzureOpenAIChatGenerator(
    azure_endpoint="<your Azure endpoint, e.g. https://your-company.azure.openai.com/>",
    api_key=Secret.from_token("<your-api-key>"),
    azure_deployment="<the deployment name, e.g. gpt-4.1-mini>")
response = client.run(messages)
print(response)
```

```
{'replies':
    [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=[TextContent(text=
    "Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on
     enabling computers to understand, interpret, and generate human language in a way that is useful.")],
     _name=None,
     _meta={'model': 'gpt-4.1-mini', 'index': 0, 'finish_reason': 'stop',
     'usage': {'prompt_tokens': 15, 'completion_tokens': 36, 'total_tokens': 51}})]
}
```

#### SUPPORTED_MODELS

```python
SUPPORTED_MODELS: list[str] = [
    "gpt-5.4",
    "gpt-5.4-pro",
    "gpt-5.3-codex",
    "gpt-5.2",
    "gpt-5.2-codex",
    "gpt-5.2-chat",
    "gpt-5.1",
    "gpt-5.1-chat",
    "gpt-5.1-codex",
    "gpt-5.1-codex-mini",
    "gpt-5",
    "gpt-5-mini",
    "gpt-5-nano",
    "gpt-5-chat",
    "gpt-4.1",
    "gpt-4.1-mini",
    "gpt-4.1-nano",
    "gpt-4o",
    "gpt-4o-mini",
    "gpt-4o-audio-preview",
    "gpt-realtime-1.5",
    "gpt-audio-1.5",
    "o1",
    "o1-mini",
    "o3",
    "o3-mini",
    "o4-mini",
    "codex-mini",
    "gpt-4",
    "gpt-35-turbo",
    "gpt-oss-120b",
    "computer-use-preview",
]
```

A non-exhaustive list of chat models supported by this component.
See https://learn.microsoft.com/en-us/azure/foundry/foundry-models/concepts/models-sold-directly-by-azure
for the full list.

#### __init__

```python
__init__(
    azure_endpoint: str | None = None,
    api_version: str | None = "2024-12-01-preview",
    azure_deployment: str | None = "gpt-4.1-mini",
    api_key: Secret | None = Secret.from_env_var(
        "AZURE_OPENAI_API_KEY", strict=False
    ),
    azure_ad_token: Secret | None = Secret.from_env_var(
        "AZURE_OPENAI_AD_TOKEN", strict=False
    ),
    organization: str | None = None,
    streaming_callback: StreamingCallbackT | None = None,
    timeout: float | None = None,
    max_retries: int | None = None,
    generation_kwargs: dict[str, Any] | None = None,
    default_headers: dict[str, str] | None = None,
    tools: ToolsType | None = None,
    tools_strict: bool = False,
    *,
    azure_ad_token_provider: (
        AzureADTokenProvider | AsyncAzureADTokenProvider | None
    ) = None,
    http_client_kwargs: dict[str, Any] | None = None
) -> None
```

Initialize the Azure OpenAI Chat Generator component.

**Parameters:**

- **azure_endpoint** (<code>str | None</code>) – The endpoint of the deployed model, for example `"https://example-resource.azure.openai.com/"`.
- **api_version** (<code>str | None</code>) – The version of the API to use. Defaults to `2024-12-01-preview`.
- **azure_deployment** (<code>str | None</code>) – The deployment of the model, usually the model name.
- **api_key** (<code>Secret | None</code>) – The API key to use for authentication.
- **azure_ad_token** (<code>Secret | None</code>) – [Azure Active Directory token](https://www.microsoft.com/en-us/security/business/identity-access/microsoft-entra-id).
- **organization** (<code>str | None</code>) – Your organization ID, defaults to `None`. For help, see
  [Setting up your organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function called when a new token is received from the stream.
  It accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
  as an argument.
- **timeout** (<code>float | None</code>) – Timeout for OpenAI client calls. If not set, it defaults to the
  `OPENAI_TIMEOUT` environment variable or 30 seconds.
- **max_retries** (<code>int | None</code>) – Maximum number of retries to contact OpenAI after an internal error.
  If not set, it defaults to the `OPENAI_MAX_RETRIES` environment variable or 5.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model. These parameters are sent directly to
  the OpenAI endpoint. For details, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).
  Some of the supported parameters:
  - `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion,
    including visible output tokens and reasoning tokens.
  - `temperature`: The sampling temperature to use. Higher values mean the model takes more risks.
    Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
  - `top_p`: Nucleus sampling is an alternative to sampling with temperature, where the model considers
    tokens with a top_p probability mass. For example, 0.1 means only the tokens comprising
    the top 10% probability mass are considered.
  - `n`: The number of completions to generate for each prompt. For example, with 3 prompts and n=2,
    the LLM will generate two completions per prompt, resulting in 6 completions total.
  - `stop`: One or more sequences after which the LLM should stop generating tokens.
  - `presence_penalty`: The penalty applied if a token is already present.
    Higher values make the model less likely to repeat the token.
  - `frequency_penalty`: The penalty applied if a token has already been generated.
    Higher values make the model less likely to repeat the token.
  - `logit_bias`: Adds a logit bias to specific tokens. The keys of the dictionary are tokens, and the
    values are the bias to add to that token.
  - `response_format`: A JSON schema or a Pydantic model that enforces the structure of the model's response.
    If provided, the output will always be validated against this
    format (unless the model returns a tool call).
    For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
    Notes:
    - This parameter accepts Pydantic models and JSON schemas for the latest models, starting from GPT-4o.
      Older models only support a basic version of structured outputs through `{"type": "json_object"}`.
      For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
    - For structured outputs with streaming,
      the `response_format` must be a JSON schema and not a Pydantic model.
- **default_headers** (<code>dict\[str, str\] | None</code>) – Default headers to use for the AzureOpenAI client.
- **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
- **tools_strict** (<code>bool</code>) – Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
  the schema provided in the `parameters` field of the tool definition, but this may increase latency.
- **azure_ad_token_provider** (<code>AzureADTokenProvider | AsyncAzureADTokenProvider | None</code>) – A function that returns an Azure Active Directory token. It is invoked on
  every request.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
  For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).
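
A `streaming_callback` is simply a callable that receives one chunk at a time. The sketch below shows the accumulate-and-print pattern such a callback usually follows; the `Chunk` dataclass is a stand-in for Haystack's `StreamingChunk` (assumed here to expose the new text as `.content`), so the example stays self-contained.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """Stand-in for StreamingChunk: carries the newly streamed text."""
    content: str


collected: list[str] = []


def print_streaming_chunk(chunk: Chunk) -> None:
    # A typical callback: emit the token immediately and keep a transcript.
    print(chunk.content, end="", flush=True)
    collected.append(chunk.content)


# Simulate a stream of chunks arriving from the API.
for piece in ["Natural ", "Language ", "Processing"]:
    print_streaming_chunk(Chunk(content=piece))

full_text = "".join(collected)
```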

#### warm_up

```python
warm_up() -> None
```

Warm up the Azure OpenAI chat generator.

This will warm up the tools registered in the chat generator.
This method is idempotent and will only warm up the tools once.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – The serialized component as a dictionary.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> AzureOpenAIChatGenerator
```

Deserialize this component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.

**Returns:**

- <code>AzureOpenAIChatGenerator</code> – The deserialized component instance.

## chat/azure_responses

### AzureOpenAIResponsesChatGenerator

Bases: <code>OpenAIResponsesChatGenerator</code>

Completes chats using OpenAI's Responses API on Azure.

It works with the gpt-5 and o-series models and supports streaming responses
from the OpenAI API. It uses the [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
format in input and output.

You can customize how the text is generated by passing parameters to the
OpenAI API. Use the `**generation_kwargs` argument when you initialize
the component or when you run it. Any parameter that works with
`openai.Responses.create` will work here too.

For details on OpenAI API parameters, see
[OpenAI documentation](https://platform.openai.com/docs/api-reference/responses).

### Usage example

```python
from haystack.components.generators.chat import AzureOpenAIResponsesChatGenerator
from haystack.dataclasses import ChatMessage

messages = [ChatMessage.from_user("What's Natural Language Processing?")]

client = AzureOpenAIResponsesChatGenerator(
    azure_endpoint="https://example-resource.azure.openai.com/",
    generation_kwargs={"reasoning": {"effort": "low", "summary": "auto"}}
)
response = client.run(messages)
print(response)
```

#### SUPPORTED_MODELS

```python
SUPPORTED_MODELS: list[str] = [
    "gpt-5.4-pro",
    "gpt-5.4",
    "gpt-5.3-chat",
    "gpt-5.3-codex",
    "gpt-5.2-codex",
    "gpt-5.2",
    "gpt-5.2-chat",
    "gpt-5.1-codex-max",
    "gpt-5.1",
    "gpt-5.1-chat",
    "gpt-5.1-codex",
    "gpt-5.1-codex-mini",
    "gpt-5-pro",
    "gpt-5-codex",
    "gpt-5",
    "gpt-5-mini",
    "gpt-5-nano",
    "gpt-5-chat",
    "gpt-4o",
    "gpt-4o-mini",
    "computer-use-preview",
    "gpt-4.1",
    "gpt-4.1-nano",
    "gpt-4.1-mini",
    "gpt-image-1",
    "gpt-image-1-mini",
    "gpt-image-1.5",
    "o1",
    "o3-mini",
    "o3",
    "o4-mini",
]
```

A non-exhaustive list of chat models supported by this component.
See https://learn.microsoft.com/en-us/azure/foundry/openai/how-to/responses#model-support for the full list.

#### __init__

```python
__init__(
    *,
    api_key: (
        Secret | Callable[[], str] | Callable[[], Awaitable[str]]
    ) = Secret.from_env_var("AZURE_OPENAI_API_KEY", strict=False),
    azure_endpoint: str | None = None,
    azure_deployment: str = "gpt-5-mini",
    streaming_callback: StreamingCallbackT | None = None,
    organization: str | None = None,
    generation_kwargs: dict[str, Any] | None = None,
    timeout: float | None = None,
    max_retries: int | None = None,
    tools: ToolsType | None = None,
    tools_strict: bool = False,
    http_client_kwargs: dict[str, Any] | None = None
) -> None
```

Initialize the AzureOpenAIResponsesChatGenerator component.

**Parameters:**

- **api_key** (<code>Secret | Callable\[[], str\] | Callable\[[], Awaitable\[str\]\]</code>) – The API key to use for authentication. Can be:
  - A `Secret` object containing the API key.
  - A `Secret` object containing the [Azure Active Directory token](https://www.microsoft.com/en-us/security/business/identity-access/microsoft-entra-id).
  - A function that returns an Azure Active Directory token.
- **azure_endpoint** (<code>str | None</code>) – The endpoint of the deployed model, for example `"https://example-resource.azure.openai.com/"`.
- **azure_deployment** (<code>str</code>) – The deployment of the model, usually the model name.
- **organization** (<code>str | None</code>) – Your organization ID, defaults to `None`. For help, see
  [Setting up your organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function called when a new token is received from the stream.
  It accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
  as an argument.
- **timeout** (<code>float | None</code>) – Timeout for OpenAI client calls. If not set, it defaults to the
  `OPENAI_TIMEOUT` environment variable or 30 seconds.
- **max_retries** (<code>int | None</code>) – Maximum number of retries to contact OpenAI after an internal error.
  If not set, it defaults to the `OPENAI_MAX_RETRIES` environment variable or 5.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model. These parameters are sent
  directly to the OpenAI endpoint.
  See the OpenAI [documentation](https://platform.openai.com/docs/api-reference/responses) for
  more details.
  Some of the supported parameters:
  - `temperature`: What sampling temperature to use. Higher values like 0.8 will make the output more random,
    while lower values like 0.2 will make it more focused and deterministic.
  - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
    considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
    comprising the top 10% probability mass are considered.
  - `previous_response_id`: The ID of the previous response.
    Use this to create multi-turn conversations.
  - `text_format`: A Pydantic model that enforces the structure of the model's response.
    If provided, the output will always be validated against this
    format (unless the model returns a tool call).
    For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
  - `text`: A JSON schema that enforces the structure of the model's response.
    If provided, the output will always be validated against this
    format (unless the model returns a tool call).
    Notes:
    - Both JSON Schema and Pydantic models are supported for the latest models, starting from GPT-4o.
    - If both are provided, `text_format` takes precedence and the JSON schema passed to `text` is ignored.
    - Currently, this component doesn't support streaming for structured outputs.
    - Older models only support a basic version of structured outputs through `{"type": "json_object"}`.
      For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
  - `reasoning`: A dictionary of parameters for reasoning. For example:
    - `summary`: The summary of the reasoning.
    - `effort`: The level of effort to put into the reasoning. Can be `low`, `medium`, or `high`.
    - `generate_summary`: Whether to generate a summary of the reasoning.
      Note: OpenAI does not return the reasoning tokens, but the summary can be viewed if it is enabled.
      For details, see the [OpenAI Reasoning documentation](https://platform.openai.com/docs/guides/reasoning).
- **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
- **tools_strict** (<code>bool</code>) – Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
  the schema provided in the `parameters` field of the tool definition, but this may increase latency.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
  For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).
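
The `reasoning` entry in `generation_kwargs` is a plain dictionary, as shown in the usage example earlier in this section. The sketch below assembles such a dictionary and layers a run-time override on top; the only keys used (`effort`, `summary`) come from that example, and any other reasoning keys should be checked against the OpenAI Reasoning documentation.

```python
# Build generation_kwargs for a Responses-API call with reasoning enabled.
generation_kwargs = {
    "reasoning": {"effort": "low", "summary": "auto"},
    "temperature": 0.2,
}

# A run-time override can raise the reasoning effort for a harder question.
override = {"reasoning": {**generation_kwargs["reasoning"], "effort": "high"}}
final_kwargs = {**generation_kwargs, **override}
```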

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – The serialized component as a dictionary.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> AzureOpenAIResponsesChatGenerator
```

Deserialize this component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.

**Returns:**

- <code>AzureOpenAIResponsesChatGenerator</code> – The deserialized component instance.

## chat/fallback

### FallbackChatGenerator

A chat generator wrapper that tries multiple chat generators sequentially.

It forwards all parameters transparently to the underlying chat generators and returns the first successful result.
Chat generators are called sequentially until one succeeds; any exception raised by a generator triggers the fallback.
If all chat generators fail, it raises a RuntimeError with details.

Timeout enforcement is fully delegated to the underlying chat generators. The fallback mechanism only
works correctly if the underlying chat generators implement proper timeout handling and raise exceptions
when timeouts occur. For predictable latency guarantees, ensure your chat generators:

- Support a `timeout` parameter in their initialization
- Implement timeout as total wall-clock time (a shared deadline for both streaming and non-streaming)
- Raise timeout exceptions (e.g., TimeoutError, asyncio.TimeoutError, httpx.TimeoutException) when the timeout is exceeded

Note: Most well-implemented chat generators (OpenAI, Anthropic, Cohere, etc.) support timeout parameters
with consistent semantics. For HTTP-based LLM providers, a single timeout value (e.g., `timeout=30`)
typically applies to all connection phases: connection setup, read, write, and pool. For streaming
responses, the read timeout is the maximum gap between chunks. For non-streaming responses, it is the time limit for
receiving the complete response.

Failover is automatically triggered when a generator raises any exception, including:

- Timeout errors (if the generator implements and raises them)
- Rate limit errors (429)
- Authentication errors (401)
- Context length errors (400)
- Server errors (500+)
- Any other exception
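
The failover behavior above can be sketched in a few lines of plain Python. The `try_in_order` helper below is illustrative, not FallbackChatGenerator's actual implementation: it calls each generator in turn, records failures, and raises `RuntimeError` once every generator has failed.

```python
from typing import Any, Callable


def try_in_order(generators: list[Callable[[str], str]], prompt: str) -> dict[str, Any]:
    """Call each generator until one succeeds; raise RuntimeError if all fail."""
    failed: list[str] = []
    for index, generate in enumerate(generators):
        try:
            reply = generate(prompt)
        except Exception as exc:  # any exception triggers failover to the next generator
            failed.append(f"{getattr(generate, '__name__', 'generator')}: {exc}")
            continue
        return {
            "replies": [reply],
            "meta": {"successful_index": index, "total_attempts": index + 1, "failed": failed},
        }
    raise RuntimeError(f"All generators failed: {failed}")


def flaky(prompt: str) -> str:
    raise TimeoutError("deadline exceeded")


def reliable(prompt: str) -> str:
    return f"echo: {prompt}"


result = try_in_order([flaky, reliable], "hello")
```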

#### __init__

```python
__init__(chat_generators: list[ChatGenerator]) -> None
```

Creates an instance of FallbackChatGenerator.

**Parameters:**

- **chat_generators** (<code>list\[ChatGenerator\]</code>) – A non-empty list of chat generator components to try in order.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize the component, including nested chat generators when they support serialization.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> FallbackChatGenerator
```

Rebuild the component from a serialized representation, restoring nested chat generators.

#### warm_up

```python
warm_up() -> None
```

Warm up all underlying chat generators.

This method calls `warm_up()` on each underlying generator that supports it.

#### run

```python
run(
    messages: list[ChatMessage],
    generation_kwargs: dict[str, Any] | None = None,
    tools: ToolsType | None = None,
    streaming_callback: StreamingCallbackT | None = None,
) -> dict[str, list[ChatMessage] | dict[str, Any]]
```

Execute chat generators sequentially until one succeeds.

**Parameters:**

- **messages** (<code>list\[ChatMessage\]</code>) – The conversation history as a list of ChatMessage instances.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Optional parameters for the chat generator (e.g., temperature, max_tokens).
- **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for function calling capabilities.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – Optional callable for handling streaming responses.

**Returns:**

- <code>dict\[str, list\[ChatMessage\] | dict\[str, Any\]\]</code> – A dictionary with:
  - "replies": Generated ChatMessage instances from the first successful generator.
  - "meta": Execution metadata including `successful_chat_generator_index`, `successful_chat_generator_class`,
    `total_attempts`, and `failed_chat_generators`, plus any metadata from the successful generator.

**Raises:**

- <code>RuntimeError</code> – If all chat generators fail.
 649  
 650  #### run_async
 651  
 652  ```python
 653  run_async(
 654      messages: list[ChatMessage],
 655      generation_kwargs: dict[str, Any] | None = None,
 656      tools: ToolsType | None = None,
 657      streaming_callback: StreamingCallbackT | None = None,
 658  ) -> dict[str, list[ChatMessage] | dict[str, Any]]
 659  ```
 660  
 661  Asynchronously execute chat generators sequentially until one succeeds.
 662  
 663  **Parameters:**
 664  
 665  - **messages** (<code>list\[ChatMessage\]</code>) – The conversation history as a list of ChatMessage instances.
 666  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Optional parameters for the chat generator (e.g., temperature, max_tokens).
 667  - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for function calling capabilities.
 668  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – Optional callable for handling streaming responses.
 669  
 670  **Returns:**
 671  
 672  - <code>dict\[str, list\[ChatMessage\] | dict\[str, Any\]\]</code> – A dictionary with:
 673  - "replies": Generated ChatMessage instances from the first successful generator.
 674  - "meta": Execution metadata including successful_chat_generator_index, successful_chat_generator_class,
 675    total_attempts, failed_chat_generators, plus any metadata from the successful generator.
 676  
 677  **Raises:**
 678  
 679  - <code>RuntimeError</code> – If all chat generators fail.
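
The fallback flow described above can be sketched in plain Python with hypothetical stub generators (the stubs and the helper below are illustrative only, not part of Haystack):

```python
class StubChatGenerator:
    """Hypothetical stand-in for a real chat generator."""

    def __init__(self, name: str, fail: bool = False):
        self.name = name
        self.fail = fail

    def run(self, messages, generation_kwargs=None):
        if self.fail:
            raise RuntimeError(f"{self.name} is unavailable")
        return {"replies": [f"reply from {self.name}"], "meta": {}}


def run_with_fallback(generators, messages):
    """Try each generator in order and return the first successful result."""
    failed = []
    for index, generator in enumerate(generators):
        try:
            result = generator.run(messages)
        except Exception:
            failed.append(generator.name)
            continue
        meta = {
            "successful_chat_generator_index": index,
            "successful_chat_generator_class": type(generator).__name__,
            "total_attempts": index + 1,
            "failed_chat_generators": failed,
            **result.get("meta", {}),
        }
        return {"replies": result["replies"], "meta": meta}
    raise RuntimeError(f"All chat generators failed: {failed}")


generators = [StubChatGenerator("primary", fail=True), StubChatGenerator("backup")]
result = run_with_fallback(generators, ["What's NLP?"])
print(result["meta"])
```

Because the first stub raises, the helper records it under `failed_chat_generators` and returns the second stub's reply with `successful_chat_generator_index` set to 1.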
 680  
 681  ## chat/hugging_face_api
 682  
 683  ### HuggingFaceAPIChatGenerator
 684  
 685  Completes chats using Hugging Face APIs.
 686  
 687  HuggingFaceAPIChatGenerator uses the [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
 688  format for input and output. Use it to generate text with Hugging Face APIs:
 689  
 690  - [Serverless Inference API (Inference Providers)](https://huggingface.co/docs/inference-providers)
 691  - [Paid Inference Endpoints](https://huggingface.co/inference-endpoints)
 692  - [Self-hosted Text Generation Inference](https://github.com/huggingface/text-generation-inference)
 693  
 694  ### Usage examples
 695  
 696  #### With the serverless inference API (Inference Providers) - free tier available
 697  
 698  ```python
 699  from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
 700  from haystack.dataclasses import ChatMessage
 701  from haystack.utils import Secret
 702  from haystack.utils.hf import HFGenerationAPIType
 703  
 704  messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
 705              ChatMessage.from_user("What's Natural Language Processing?")]
 706  
 707  # the api_type can be expressed using the HFGenerationAPIType enum or as a string
 708  api_type = HFGenerationAPIType.SERVERLESS_INFERENCE_API
 709  api_type = "serverless_inference_api" # this is equivalent to the above
 710  
 711  generator = HuggingFaceAPIChatGenerator(api_type=api_type,
 712                                          api_params={"model": "Qwen/Qwen2.5-7B-Instruct",
 713                                                      "provider": "together"},
 714                                          token=Secret.from_token("<your-api-key>"))
 715  
 716  result = generator.run(messages)
 717  print(result)
 718  ```
 719  
 720  #### With the serverless inference API (Inference Providers) and text+image input
 721  
 722  ```python
 723  from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
 724  from haystack.dataclasses import ChatMessage, ImageContent
 725  from haystack.utils import Secret
 726  from haystack.utils.hf import HFGenerationAPIType
 727  
 728  # Create an image from file path, URL, or base64
 729  image = ImageContent.from_file_path("path/to/your/image.jpg")
 730  
 731  # Create a multimodal message with both text and image
 732  messages = [ChatMessage.from_user(content_parts=["Describe this image in detail", image])]
 733  
 734  generator = HuggingFaceAPIChatGenerator(
 735      api_type=HFGenerationAPIType.SERVERLESS_INFERENCE_API,
 736      api_params={
 737          "model": "Qwen/Qwen2.5-VL-7B-Instruct",  # Vision Language Model
 738          "provider": "hyperbolic"
 739      },
 740      token=Secret.from_token("<your-api-key>")
 741  )
 742  
 743  result = generator.run(messages)
 744  print(result)
 745  ```
 746  
 747  #### With paid inference endpoints
 748  
 749  ```python
 750  from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
 751  from haystack.dataclasses import ChatMessage
 752  from haystack.utils import Secret
 753  
 754  messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
 755              ChatMessage.from_user("What's Natural Language Processing?")]
 756  
 757  generator = HuggingFaceAPIChatGenerator(api_type="inference_endpoints",
 758                                          api_params={"url": "<your-inference-endpoint-url>"},
 759                                          token=Secret.from_token("<your-api-key>"))
 760  
 761  result = generator.run(messages)
 762  print(result)
 763  ```
 764  
 765  #### With self-hosted text generation inference
 766  
 767  ```python
 768  from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
 769  from haystack.dataclasses import ChatMessage
 770  
 771  messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
 772              ChatMessage.from_user("What's Natural Language Processing?")]
 773  
 774  generator = HuggingFaceAPIChatGenerator(api_type="text_generation_inference",
 775                                          api_params={"url": "http://localhost:8080"})
 776  
 777  result = generator.run(messages)
 778  print(result)
 779  ```
 780  
 781  #### __init__
 782  
 783  ```python
 784  __init__(
 785      api_type: HFGenerationAPIType | str,
 786      api_params: dict[str, str],
 787      token: Secret | None = Secret.from_env_var(
 788          ["HF_API_TOKEN", "HF_TOKEN"], strict=False
 789      ),
 790      generation_kwargs: dict[str, Any] | None = None,
 791      stop_words: list[str] | None = None,
 792      streaming_callback: StreamingCallbackT | None = None,
 793      tools: ToolsType | None = None,
 794  ) -> None
 795  ```
 796  
 797  Initialize the HuggingFaceAPIChatGenerator instance.
 798  
 799  **Parameters:**
 800  
 801  - **api_type** (<code>HFGenerationAPIType | str</code>) – The type of Hugging Face API to use. Available types:
 802  - `text_generation_inference`: See [TGI](https://github.com/huggingface/text-generation-inference).
 803  - `inference_endpoints`: See [Inference Endpoints](https://huggingface.co/inference-endpoints).
 804  - `serverless_inference_api`: See
 805    [Serverless Inference API - Inference Providers](https://huggingface.co/docs/inference-providers).
 806  - **api_params** (<code>dict\[str, str\]</code>) – A dictionary with the following keys:
 807  - `model`: Hugging Face model ID. Required when `api_type` is `SERVERLESS_INFERENCE_API`.
 808  - `provider`: Provider name. Recommended when `api_type` is `SERVERLESS_INFERENCE_API`.
 809  - `url`: URL of the inference endpoint. Required when `api_type` is `INFERENCE_ENDPOINTS` or
 810    `TEXT_GENERATION_INFERENCE`.
 811  - Other parameters specific to the chosen API type, such as `timeout`, `headers`, etc.
 812  - **token** (<code>Secret | None</code>) – The Hugging Face token to use as HTTP bearer authorization.
 813    Check your HF token in your [account settings](https://huggingface.co/settings/tokens).
 814  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary with keyword arguments to customize text generation.
 815    Some examples: `max_tokens`, `temperature`, `top_p`.
 816    For details, see [Hugging Face chat_completion documentation](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.chat_completion).
 817  - **stop_words** (<code>list\[str\] | None</code>) – An optional list of strings representing the stop words.
 818  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses.
 819  - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
 820    The chosen model should support tool/function calling, according to the model card.
 821    Support for tools in the Hugging Face API and TGI is not yet fully refined and you may experience
 822    unexpected behavior.
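
The per-API-type requirements on `api_params` listed above can be summarized in a small validation sketch (illustrative only; the component performs its own checks at initialization):

```python
# Required api_params keys per api_type, as described above.
REQUIRED_API_PARAMS = {
    "serverless_inference_api": {"model"},
    "inference_endpoints": {"url"},
    "text_generation_inference": {"url"},
}


def validate_api_params(api_type: str, api_params: dict) -> None:
    """Raise if a key required for the chosen API type is missing."""
    missing = REQUIRED_API_PARAMS.get(api_type, set()) - api_params.keys()
    if missing:
        raise ValueError(f"Missing api_params for {api_type}: {sorted(missing)}")


validate_api_params("serverless_inference_api",
                    {"model": "Qwen/Qwen2.5-7B-Instruct", "provider": "together"})
validate_api_params("text_generation_inference", {"url": "http://localhost:8080"})
```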
 823  
 824  #### warm_up
 825  
 826  ```python
 827  warm_up() -> None
 828  ```
 829  
 830  Warm up the Hugging Face API chat generator.
 831  
 832  This will warm up the tools registered in the chat generator.
 833  This method is idempotent and will only warm up the tools once.
 834  
 835  #### to_dict
 836  
 837  ```python
 838  to_dict() -> dict[str, Any]
 839  ```
 840  
 841  Serialize this component to a dictionary.
 842  
 843  **Returns:**
 844  
 845  - <code>dict\[str, Any\]</code> – A dictionary containing the serialized component.
 846  
 847  #### from_dict
 848  
 849  ```python
 850  from_dict(data: dict[str, Any]) -> HuggingFaceAPIChatGenerator
 851  ```
 852  
 853  Deserialize this component from a dictionary.
 854  
 855  #### run
 856  
 857  ```python
 858  run(
 859      messages: list[ChatMessage],
 860      generation_kwargs: dict[str, Any] | None = None,
 861      tools: ToolsType | None = None,
 862      streaming_callback: StreamingCallbackT | None = None,
 863  ) -> dict[str, list[ChatMessage]]
 864  ```
 865  
 866  Invoke the text generation inference based on the provided messages and generation parameters.
 867  
 868  **Parameters:**
 869  
 870  - **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage objects representing the input messages.
 871  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation.
 872  - **tools** (<code>ToolsType | None</code>) – A list of tools or a Toolset for which the model can prepare calls. If set, it will override
 873    the `tools` parameter set during component initialization. This parameter can accept either a
 874    list of `Tool` objects or a `Toolset` instance.
 875  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses. If set, it will override the `streaming_callback`
 876    parameter set during component initialization.
 877  
 878  **Returns:**
 879  
 880  - <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following keys:
 881  - `replies`: A list containing the generated responses as ChatMessage objects.
 882  
 883  #### run_async
 884  
 885  ```python
 886  run_async(
 887      messages: list[ChatMessage],
 888      generation_kwargs: dict[str, Any] | None = None,
 889      tools: ToolsType | None = None,
 890      streaming_callback: StreamingCallbackT | None = None,
 891  ) -> dict[str, list[ChatMessage]]
 892  ```
 893  
 894  Asynchronously invokes the text generation inference based on the provided messages and generation parameters.
 895  
 896  This is the asynchronous version of the `run` method. It has the same parameters
and return values but can be used with `await` in async code.
 898  
 899  **Parameters:**
 900  
 901  - **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage objects representing the input messages.
 902  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation.
 903  - **tools** (<code>ToolsType | None</code>) – A list of tools or a Toolset for which the model can prepare calls. If set, it will override the `tools`
 904    parameter set during component initialization. This parameter can accept either a list of `Tool` objects
 905    or a `Toolset` instance.
 906  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses. If set, it will override the `streaming_callback`
 907    parameter set during component initialization.
 908  
 909  **Returns:**
 910  
 911  - <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following keys:
 912  - `replies`: A list containing the generated responses as ChatMessage objects.
 913  
 914  ## chat/hugging_face_local
 915  
 916  ### default_tool_parser
 917  
 918  ```python
 919  default_tool_parser(text: str) -> list[ToolCall] | None
 920  ```
 921  
 922  Default implementation for parsing tool calls from model output text.
 923  
 924  Uses DEFAULT_TOOL_PATTERN to extract tool calls.
 925  
 926  **Parameters:**
 927  
 928  - **text** (<code>str</code>) – The text to parse for tool calls.
 929  
 930  **Returns:**
 931  
 932  - <code>list\[ToolCall\] | None</code> – A list containing a single ToolCall if a valid tool call is found, None otherwise.
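
DEFAULT_TOOL_PATTERN itself is not reproduced here, but the general technique, extracting a JSON tool call from model output with a regular expression, can be sketched as follows (the pattern and helper are illustrative and may differ from the real implementation):

```python
import json
import re

# Illustrative pattern: a JSON object with a "name" key, optionally wrapped
# in <tool_call> tags. The real DEFAULT_TOOL_PATTERN may differ.
TOOL_PATTERN = re.compile(
    r"<tool_call>\s*(\{.*?\})\s*</tool_call>|(\{\s*\"name\".*\})", re.DOTALL
)


def parse_tool_call(text: str):
    """Return a single-element list with the parsed call, or None (simplified)."""
    match = TOOL_PATTERN.search(text)
    if not match:
        return None
    payload = match.group(1) or match.group(2)
    try:
        data = json.loads(payload)
    except json.JSONDecodeError:
        return None
    if "name" not in data:
        return None
    return [{"name": data["name"], "arguments": data.get("arguments", {})}]


print(parse_tool_call('<tool_call>{"name": "get_weather", "arguments": {"city": "Rome"}}</tool_call>'))
```

Text with no parseable tool call (or invalid JSON) yields `None`, mirroring the contract above.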
 933  
 934  ### HuggingFaceLocalChatGenerator
 935  
 936  Generates chat responses using models from Hugging Face that run locally.
 937  
 938  Use this component with chat-based models,
 939  such as `Qwen/Qwen3-0.6B` or `meta-llama/Llama-2-7b-chat-hf`.
 940  LLMs running locally may need powerful hardware.
 941  
 942  ### Usage example
 943  
 944  ```python
 945  from haystack.components.generators.chat import HuggingFaceLocalChatGenerator
 946  from haystack.dataclasses import ChatMessage
 947  
 948  generator = HuggingFaceLocalChatGenerator(model="Qwen/Qwen3-0.6B")
 949  messages = [ChatMessage.from_user("What's Natural Language Processing? Be brief.")]
 950  print(generator.run(messages))
 951  ```
 952  
 953  ```
 954  {'replies':
 955      [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=[TextContent(text=
 956      "Natural Language Processing (NLP) is a subfield of artificial intelligence that deals
 957      with the interaction between computers and human language. It enables computers to understand, interpret, and
 958      generate human language in a valuable way. NLP involves various techniques such as speech recognition, text
 959      analysis, sentiment analysis, and machine translation. The ultimate goal is to make it easier for computers to
 960      process and derive meaning from human language, improving communication between humans and machines.")],
 961      _name=None,
    _meta={'finish_reason': 'stop', 'index': 0, 'model':
          'Qwen/Qwen3-0.6B',
 964            'usage': {'completion_tokens': 90, 'prompt_tokens': 19, 'total_tokens': 109}})
 965            ]
 966  }
 967  ```
 968  
 969  #### __init__
 970  
 971  ```python
 972  __init__(
 973      model: str = "Qwen/Qwen3-0.6B",
 974      task: (
 975          Literal["text-generation", "text2text-generation", "image-text-to-text"]
 976          | None
 977      ) = None,
 978      device: ComponentDevice | None = None,
 979      token: Secret | None = Secret.from_env_var(
 980          ["HF_API_TOKEN", "HF_TOKEN"], strict=False
 981      ),
 982      chat_template: str | None = None,
 983      generation_kwargs: dict[str, Any] | None = None,
 984      huggingface_pipeline_kwargs: dict[str, Any] | None = None,
 985      stop_words: list[str] | None = None,
 986      streaming_callback: StreamingCallbackT | None = None,
 987      tools: ToolsType | None = None,
 988      tool_parsing_function: Callable[[str], list[ToolCall] | None] | None = None,
 989      async_executor: ThreadPoolExecutor | None = None,
 990      *,
 991      enable_thinking: bool = False
 992  ) -> None
 993  ```
 994  
 995  Initializes the HuggingFaceLocalChatGenerator component.
 996  
 997  **Parameters:**
 998  
 999  - **model** (<code>str</code>) – The Hugging Face text generation model name or path,
1000    for example, `mistralai/Mistral-7B-Instruct-v0.2` or `TheBloke/OpenHermes-2.5-Mistral-7B-16k-AWQ`.
1001    The model must be a chat model supporting the ChatML messaging
1002    format.
1003    If the model is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
1004  - **task** (<code>Literal['text-generation', 'text2text-generation', 'image-text-to-text'] | None</code>) – The task for the Hugging Face pipeline. Possible options:
1005  - `text-generation`: Supported by decoder models, like GPT.
1006  - `text2text-generation`: Deprecated as of Transformers v5; use `text-generation` instead.
1007    Previously supported by encoder–decoder models such as T5.
1008  - `image-text-to-text`: Supported by vision-language models.
1009    If the task is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
1010    If not specified, the component calls the Hugging Face API to infer the task from the model name.
1011  - **device** (<code>ComponentDevice | None</code>) – The device for loading the model. If `None`, automatically selects the default device.
1012    If a device or device map is specified in `huggingface_pipeline_kwargs`, it overrides this parameter.
1013  - **token** (<code>Secret | None</code>) – The token to use as HTTP bearer authorization for remote files.
1014    If the token is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
1015  - **chat_template** (<code>str | None</code>) – Specifies an optional Jinja template for formatting chat
1016    messages. Most high-quality chat models have their own templates, but for models without this
1017    feature or if you prefer a custom template, use this parameter.
1018  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary with keyword arguments to customize text generation.
1019    Some examples: `max_length`, `max_new_tokens`, `temperature`, `top_k`, `top_p`.
1020    See Hugging Face's documentation for more information:
- [customize-text-generation](https://huggingface.co/docs/transformers/main/en/generation_strategies#customize-text-generation)
- [GenerationConfig](https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.GenerationConfig)
  The only `generation_kwargs` set by default is `max_new_tokens`, which is set to 512 tokens.
1024  - **huggingface_pipeline_kwargs** (<code>dict\[str, Any\] | None</code>) – Dictionary with keyword arguments to initialize the
1025    Hugging Face pipeline for text generation.
1026    These keyword arguments provide fine-grained control over the Hugging Face pipeline.
1027    In case of duplication, these kwargs override `model`, `task`, `device`, and `token` init parameters.
1028    For kwargs, see [Hugging Face documentation](https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.pipeline.task).
  In this dictionary, you can also include `model_kwargs` to specify the kwargs for [model initialization](https://huggingface.co/docs/transformers/en/main_classes/model#transformers.PreTrainedModel.from_pretrained).
1030  - **stop_words** (<code>list\[str\] | None</code>) – A list of stop words. If the model generates a stop word, the generation stops.
1031    If you provide this parameter, don't specify the `stopping_criteria` in `generation_kwargs`.
1032    For some chat models, the output includes both the new text and the original prompt.
1033    In these cases, make sure your prompt has no stop words.
1034  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses.
1035  - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
- **tool_parsing_function** (<code>Callable\[\[str\], list\[ToolCall\] | None\] | None</code>) – A callable that takes a string and returns a list of ToolCall objects or None.
  If None, the default_tool_parser is used, which extracts tool calls using a predefined pattern.
- **async_executor** (<code>ThreadPoolExecutor | None</code>) – Optional ThreadPoolExecutor to use for async calls. If not provided, a single-threaded executor will be
  initialized and used.
1040  - **enable_thinking** (<code>bool</code>) – Whether to enable thinking mode in the chat template for thinking-capable models.
1041    When enabled, the model generates intermediate reasoning before the final response. Defaults to False.
1042  
1043  #### shutdown
1044  
1045  ```python
1046  shutdown() -> None
1047  ```
1048  
Explicitly shut down the executor if this component owns it.
1050  
1051  #### warm_up
1052  
1053  ```python
1054  warm_up() -> None
1055  ```
1056  
1057  Initializes the component and warms up tools if provided.
1058  
1059  #### to_dict
1060  
1061  ```python
1062  to_dict() -> dict[str, Any]
1063  ```
1064  
1065  Serializes the component to a dictionary.
1066  
1067  **Returns:**
1068  
1069  - <code>dict\[str, Any\]</code> – Dictionary with serialized data.
1070  
1071  #### from_dict
1072  
1073  ```python
1074  from_dict(data: dict[str, Any]) -> HuggingFaceLocalChatGenerator
1075  ```
1076  
1077  Deserializes the component from a dictionary.
1078  
1079  **Parameters:**
1080  
1081  - **data** (<code>dict\[str, Any\]</code>) – The dictionary to deserialize from.
1082  
1083  **Returns:**
1084  
1085  - <code>HuggingFaceLocalChatGenerator</code> – The deserialized component.
1086  
1087  #### run
1088  
1089  ```python
1090  run(
1091      messages: list[ChatMessage],
1092      generation_kwargs: dict[str, Any] | None = None,
1093      streaming_callback: StreamingCallbackT | None = None,
1094      tools: ToolsType | None = None,
1095  ) -> dict[str, list[ChatMessage]]
1096  ```
1097  
1098  Invoke text generation inference based on the provided messages and generation parameters.
1099  
1100  **Parameters:**
1101  
1102  - **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage objects representing the input messages.
1103  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation.
1104  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses.
1105  - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
1106    If set, it will override the `tools` parameter provided during initialization.
1107  
1108  **Returns:**
1109  
1110  - <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following keys:
1111  - `replies`: A list containing the generated responses as ChatMessage instances.
1112  
1113  #### create_message
1114  
1115  ```python
1116  create_message(
1117      text: str,
1118      index: int,
1119      tokenizer: Union[PreTrainedTokenizer, PreTrainedTokenizerFast],
1120      prompt: str,
1121      generation_kwargs: dict[str, Any],
1122      parse_tool_calls: bool = False,
1123  ) -> ChatMessage
1124  ```
1125  
1126  Create a ChatMessage instance from the provided text, populated with metadata.
1127  
1128  **Parameters:**
1129  
1130  - **text** (<code>str</code>) – The generated text.
1131  - **index** (<code>int</code>) – The index of the generated text.
1132  - **tokenizer** (<code>Union\[PreTrainedTokenizer, PreTrainedTokenizerFast\]</code>) – The tokenizer used for generation.
1133  - **prompt** (<code>str</code>) – The prompt used for generation.
1134  - **generation_kwargs** (<code>dict\[str, Any\]</code>) – The generation parameters.
1135  - **parse_tool_calls** (<code>bool</code>) – Whether to attempt parsing tool calls from the text.
1136  
1137  **Returns:**
1138  
1139  - <code>ChatMessage</code> – A ChatMessage instance.
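
The kind of metadata this method assembles can be sketched in simplified form; the toy whitespace "tokenizer" below stands in for a real Hugging Face tokenizer, and the helper is illustrative only:

```python
def build_reply_meta(text, index, prompt, tokenize, max_new_tokens=512):
    """Illustrative only: count tokens and infer a finish reason,
    similar in shape to the meta recorded on the returned ChatMessage."""
    completion_tokens = len(tokenize(text))
    prompt_tokens = len(tokenize(prompt))
    finish_reason = "length" if completion_tokens >= max_new_tokens else "stop"
    return {
        "index": index,
        "finish_reason": finish_reason,
        "usage": {
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "total_tokens": prompt_tokens + completion_tokens,
        },
    }


# str.split is a toy whitespace "tokenizer" used only for this sketch
meta = build_reply_meta("NLP is a field of AI.", 0, "What is NLP?", str.split)
print(meta["usage"])
```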
1140  
1141  #### run_async
1142  
1143  ```python
1144  run_async(
1145      messages: list[ChatMessage],
1146      generation_kwargs: dict[str, Any] | None = None,
1147      streaming_callback: StreamingCallbackT | None = None,
1148      tools: ToolsType | None = None,
1149  ) -> dict[str, list[ChatMessage]]
1150  ```
1151  
1152  Asynchronously invokes text generation inference based on the provided messages and generation parameters.
1153  
1154  This is the asynchronous version of the `run` method. It has the same parameters
and return values but can be used with `await` in async code.
1156  
1157  **Parameters:**
1158  
1159  - **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage objects representing the input messages.
1160  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation.
1161  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses.
1162  - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
1163    If set, it will override the `tools` parameter provided during initialization.
1164  
1165  **Returns:**
1166  
1167  - <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following keys:
1168  - `replies`: A list containing the generated responses as ChatMessage instances.
1169  
1170  ## chat/llm
1171  
1172  ### LLM
1173  
1174  Bases: <code>Agent</code>
1175  
1176  A text generation component powered by a large language model.
1177  
1178  The LLM component is a simplified version of the Agent that focuses solely on text generation
1179  without tool usage. It processes messages and returns a single response from the language model.
1180  
1181  ### Usage examples
1182  
1183  ```python
1184  from haystack.components.generators.chat import LLM
1185  from haystack.components.generators.chat import OpenAIChatGenerator
1186  from haystack.dataclasses import ChatMessage
1187  
1188  llm = LLM(
1189      chat_generator=OpenAIChatGenerator(),
    system_prompt="You are a helpful summarization assistant.",
    user_prompt="""{% message role="user" %}
1192  Summarize the following document: {{ document }}
1193  {% endmessage %}""",
1194      required_variables=["document"],
1195  )
1196  
1197  result = llm.run(document="The weather is lovely today and the sun is shining. ")
1198  print(result["last_message"].text)
1199  ```
1200  
1201  #### __init__
1202  
1203  ```python
1204  __init__(
1205      *,
1206      chat_generator: ChatGenerator,
1207      system_prompt: str | None = None,
1208      user_prompt: str | None = None,
1209      required_variables: list[str] | Literal["*"] | None = None,
1210      streaming_callback: StreamingCallbackT | None = None
1211  ) -> None
1212  ```
1213  
1214  Initialize the LLM component.
1215  
1216  **Parameters:**
1217  
1218  - **chat_generator** (<code>ChatGenerator</code>) – An instance of the chat generator that the LLM should use.
1219  - **system_prompt** (<code>str | None</code>) – System prompt for the LLM.
1220  - **user_prompt** (<code>str | None</code>) – User prompt for the LLM. If provided this is appended to the messages provided at runtime.
- **required_variables** (<code>list\[str\] | Literal['\*'] | None</code>) – List of variables that must be provided as input to the user_prompt.
1222    If a variable listed as required is not provided, an exception is raised.
1223    If set to `"*"`, all variables found in the prompt are required. Optional.
1224  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback that will be invoked when a response is streamed from the LLM.
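
The required_variables behavior can be illustrated with a simplified, regex-based sketch (the real component relies on its prompt builder's Jinja template parsing, so this is an approximation):

```python
import re


def check_required_variables(template, provided, required):
    """Raise if a required template variable was not provided (simplified)."""
    found = set(re.findall(r"\{\{\s*(\w+)\s*\}\}", template))
    required_set = found if required == "*" else set(required)
    missing = required_set - set(provided)
    if missing:
        raise ValueError(f"Missing required template variables: {sorted(missing)}")
    return found


template = "Summarize the following document: {{ document }}"
print(check_required_variables(template, {"document"}, ["document"]))
```

With `required="*"`, every variable found in the template becomes required, matching the `"*"` behavior described above.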
1225  
1226  #### to_dict
1227  
1228  ```python
1229  to_dict() -> dict[str, Any]
1230  ```
1231  
1232  Serialize the LLM component to a dictionary.
1233  
1234  **Returns:**
1235  
1236  - <code>dict\[str, Any\]</code> – Dictionary with serialized data.
1237  
1238  #### from_dict
1239  
1240  ```python
1241  from_dict(data: dict[str, Any]) -> LLM
1242  ```
1243  
1244  Deserialize the LLM from a dictionary.
1245  
1246  **Parameters:**
1247  
1248  - **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from.
1249  
1250  **Returns:**
1251  
1252  - <code>LLM</code> – Deserialized LLM instance.
1253  
1254  #### run
1255  
1256  ```python
1257  run(
1258      messages: list[ChatMessage] | None = None,
1259      streaming_callback: StreamingCallbackT | None = None,
1260      *,
1261      generation_kwargs: dict[str, Any] | None = None,
1262      system_prompt: str | None = None,
1263      user_prompt: str | None = None,
1264      **kwargs: Any
1265  ) -> dict[str, Any]
1266  ```
1267  
1268  Process messages and generate a response from the language model.
1269  
1270  **Parameters:**
1271  
1272  - **messages** (<code>list\[ChatMessage\] | None</code>) – List of Haystack ChatMessage objects to process.
1273  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback that will be invoked when a response is streamed from the LLM.
1274  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for the underlying chat generator. These parameters
1275    will override the parameters passed during component initialization.
1276  - **system_prompt** (<code>str | None</code>) – System prompt for the LLM. If provided, it overrides the default system prompt.
1277  - **user_prompt** (<code>str | None</code>) – User prompt for the LLM. If provided, it overrides the default user prompt and is
1278    appended to the messages provided at runtime.
1279  - **kwargs** (<code>Any</code>) – Additional keyword arguments. These are used to fill template variables in the `user_prompt`
1280    (the keys must match template variable names).
1281  
1282  **Returns:**
1283  
1284  - <code>dict\[str, Any\]</code> – A dictionary with the following keys:
1285  - "messages": List of all messages exchanged during the LLM's run.
1286  - "last_message": The last message exchanged during the LLM's run.
1287  
1288  #### run_async
1289  
1290  ```python
1291  run_async(
1292      messages: list[ChatMessage] | None = None,
1293      streaming_callback: StreamingCallbackT | None = None,
1294      *,
1295      generation_kwargs: dict[str, Any] | None = None,
1296      system_prompt: str | None = None,
1297      user_prompt: str | None = None,
1298      **kwargs: Any
1299  ) -> dict[str, Any]
1300  ```
1301  
1302  Asynchronously process messages and generate a response from the language model.
1303  
1304  **Parameters:**
1305  
1306  - **messages** (<code>list\[ChatMessage\] | None</code>) – List of Haystack ChatMessage objects to process.
1307  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An asynchronous callback that will be invoked when a response is streamed
1308    from the LLM.
1309  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for the underlying chat generator. These parameters
1310    will override the parameters passed during component initialization.
1311  - **system_prompt** (<code>str | None</code>) – System prompt for the LLM. If provided, it overrides the default system prompt.
1312  - **user_prompt** (<code>str | None</code>) – User prompt for the LLM. If provided, it overrides the default user prompt and is
1313    appended to the messages provided at runtime.
1314  - **kwargs** (<code>Any</code>) – Additional keyword arguments. These are used to fill template variables in the `user_prompt`
1315    (the keys must match template variable names).
1316  
1317  **Returns:**
1318  
1319  - <code>dict\[str, Any\]</code> – A dictionary with the following keys:
1320  - "messages": List of all messages exchanged during the LLM's run.
1321  - "last_message": The last message exchanged during the LLM's run.
1322  
1323  ## chat/openai
1324  
1325  ### OpenAIChatGenerator
1326  
1327  Completes chats using OpenAI's large language models (LLMs).
1328  
It works with the gpt-4 and gpt-5 series models and supports streaming responses
from the OpenAI API. It uses the [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
format for input and output.
1332  
1333  You can customize how the text is generated by passing parameters to the
1334  OpenAI API. Use the `**generation_kwargs` argument when you initialize
1335  the component or when you run it. Any parameter that works with
1336  `openai.ChatCompletion.create` will work here too.
1337  
1338  For details on OpenAI API parameters, see
1339  [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).
1340  
1341  ### Usage example
1342  
1343  ```python
1344  from haystack.components.generators.chat import OpenAIChatGenerator
1345  from haystack.dataclasses import ChatMessage
1346  
1347  messages = [ChatMessage.from_user("What's Natural Language Processing?")]
1348  
1349  client = OpenAIChatGenerator()
1350  response = client.run(messages)
1351  print(response)
1352  ```
1353  
1354  Output:
1355  
1356  ```
1357  {'replies':
1358      [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=
1359      [TextContent(text="Natural Language Processing (NLP) is a branch of artificial intelligence
1360          that focuses on enabling computers to understand, interpret, and generate human language in
1361          a way that is meaningful and useful.")],
1362       _name=None,
1363       _meta={'model': 'gpt-5-mini', 'index': 0, 'finish_reason': 'stop',
1364       'usage': {'prompt_tokens': 15, 'completion_tokens': 36, 'total_tokens': 51}})
1365      ]
1366  }
1367  ```
1368  
1369  #### SUPPORTED_MODELS
1370  
1371  ```python
1372  SUPPORTED_MODELS: list[str] = [
1373      "gpt-5-mini",
1374      "gpt-5-nano",
1375      "gpt-5",
1376      "gpt-5.1",
1377      "gpt-5.2",
1378      "gpt-5.2-pro",
1379      "gpt-5.4",
1380      "gpt-5-pro",
1381      "gpt-4.1",
1382      "gpt-4.1-mini",
1383      "gpt-4.1-nano",
1384      "gpt-4o",
1385      "gpt-4o-mini",
1386      "gpt-4-turbo",
1387      "gpt-4",
1388      "gpt-3.5-turbo",
1389  ]
1390  
1391  ```
1392  
1393  A non-exhaustive list of chat models supported by this component.
See https://platform.openai.com/docs/models for the full list and snapshot IDs.
1395  
1396  #### __init__
1397  
1398  ```python
1399  __init__(
1400      api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
1401      model: str = "gpt-5-mini",
1402      streaming_callback: StreamingCallbackT | None = None,
1403      api_base_url: str | None = None,
1404      organization: str | None = None,
1405      generation_kwargs: dict[str, Any] | None = None,
1406      timeout: float | None = None,
1407      max_retries: int | None = None,
1408      tools: ToolsType | None = None,
1409      tools_strict: bool = False,
1410      http_client_kwargs: dict[str, Any] | None = None,
1411  ) -> None
1412  ```
1413  
Creates an instance of OpenAIChatGenerator. Unless you specify a different model in `model`, it uses OpenAI's gpt-5-mini.
1415  
1416  Before initializing the component, you can set the 'OPENAI_TIMEOUT' and 'OPENAI_MAX_RETRIES'
1417  environment variables to override the `timeout` and `max_retries` parameters respectively
1418  in the OpenAI client.
1419  
1420  **Parameters:**
1421  
1422  - **api_key** (<code>Secret</code>) – The OpenAI API key.
1423    You can set it with an environment variable `OPENAI_API_KEY`, or pass with this parameter
1424    during initialization.
1425  - **model** (<code>str</code>) – The name of the model to use.
1426  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
1427    The callback function accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
1428    as an argument.
1429  - **api_base_url** (<code>str | None</code>) – An optional base URL.
1430  - **organization** (<code>str | None</code>) – Your organization ID, defaults to `None`. See
1431    [production best practices](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
1432  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model. These parameters are sent directly to
1433    the OpenAI endpoint. See OpenAI [documentation](https://platform.openai.com/docs/api-reference/chat) for
1434    more details.
1435    Some of the supported parameters:
1436  - `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion,
1437    including visible output tokens and reasoning tokens.
1438  - `temperature`: What sampling temperature to use. Higher values mean the model will take more risks.
1439    Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
1440  - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
1441    considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
1442    comprising the top 10% probability mass are considered.
1443  - `n`: How many completions to generate for each prompt. For example, if the LLM gets 3 prompts and n is 2,
1444    it will generate two completions for each of the three prompts, ending up with 6 completions in total.
1445  - `stop`: One or more sequences after which the LLM should stop generating tokens.
- `presence_penalty`: The penalty to apply if a token has already appeared in the text at all. Bigger values mean
  the model will be less likely to repeat the same token in the text.
1448  - `frequency_penalty`: What penalty to apply if a token has already been generated in the text.
1449    Bigger values mean the model will be less likely to repeat the same token in the text.
1450  - `logit_bias`: Add a logit bias to specific tokens. The keys of the dictionary are tokens, and the
1451    values are the bias to add to that token.
1452  - `response_format`: A JSON schema or a Pydantic model that enforces the structure of the model's response.
1453    If provided, the output will always be validated against this
1454    format (unless the model returns a tool call).
1455    For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
1456    Notes:
  - This parameter accepts Pydantic models and JSON schemas for the latest models, starting from GPT-4o.
    Older models only support a basic version of structured outputs through `{"type": "json_object"}`.
1459      For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
1460    - For structured outputs with streaming,
1461      the `response_format` must be a JSON schema and not a Pydantic model.
1462  - **timeout** (<code>float | None</code>) – Timeout for OpenAI client calls. If not set, it defaults to either the
1463    `OPENAI_TIMEOUT` environment variable, or 30 seconds.
1464  - **max_retries** (<code>int | None</code>) – Maximum number of retries to contact OpenAI after an internal error.
  If not set, it defaults to either the `OPENAI_MAX_RETRIES` environment variable or 5.
1466  - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
1467  - **tools_strict** (<code>bool</code>) – Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
1468    the schema provided in the `parameters` field of the tool definition, but this may increase latency.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
1470    For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).
1471  
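The notes on `response_format` above can be made concrete. Below is a hedged sketch of a JSON-schema format, the form required when combining structured outputs with streaming (a Pydantic model works only without streaming); the field layout follows OpenAI's structured-outputs documentation.

```python
# A JSON-schema `response_format` describing a simple object with two
# required fields. Pass it via `generation_kwargs` at init or run time.
person_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "person",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "age": {"type": "integer"},
            },
            "required": ["name", "age"],
            "additionalProperties": False,
        },
    },
}

# For example (assumes a valid OPENAI_API_KEY):
# client = OpenAIChatGenerator(generation_kwargs={"response_format": person_format})
```
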
1472  #### warm_up
1473  
1474  ```python
1475  warm_up() -> None
1476  ```
1477  
1478  Warm up the OpenAI chat generator.
1479  
1480  This will warm up the tools registered in the chat generator.
1481  This method is idempotent and will only warm up the tools once.
1482  
1483  #### to_dict
1484  
1485  ```python
1486  to_dict() -> dict[str, Any]
1487  ```
1488  
1489  Serialize this component to a dictionary.
1490  
1491  **Returns:**
1492  
1493  - <code>dict\[str, Any\]</code> – The serialized component as a dictionary.
1494  
1495  #### from_dict
1496  
1497  ```python
1498  from_dict(data: dict[str, Any]) -> OpenAIChatGenerator
1499  ```
1500  
1501  Deserialize this component from a dictionary.
1502  
1503  **Parameters:**
1504  
1505  - **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.
1506  
1507  **Returns:**
1508  
1509  - <code>OpenAIChatGenerator</code> – The deserialized component instance.
1510  
1511  #### run
1512  
1513  ```python
1514  run(
1515      messages: list[ChatMessage],
1516      streaming_callback: StreamingCallbackT | None = None,
1517      generation_kwargs: dict[str, Any] | None = None,
1518      *,
1519      tools: ToolsType | None = None,
1520      tools_strict: bool | None = None
1521  ) -> dict[str, list[ChatMessage]]
1522  ```
1523  
1524  Invokes chat completion based on the provided messages and generation parameters.
1525  
1526  **Parameters:**
1527  
1528  - **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage instances representing the input messages.
1529  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
1530  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. These parameters will
1531    override the parameters passed during component initialization.
1532    For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat/create).
1533  - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
1534    If set, it will override the `tools` parameter provided during initialization.
1535  - **tools_strict** (<code>bool | None</code>) – Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
1536    the schema provided in the `parameters` field of the tool definition, but this may increase latency.
1537    If set, it will override the `tools_strict` parameter set during component initialization.
1538  
1539  **Returns:**
1540  
1541  - <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following key:
1542  - `replies`: A list containing the generated responses as ChatMessage instances.
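
Because run-time `generation_kwargs` override init-time ones, the effective parameters behave like a plain dict merge; a small illustrative sketch (not the actual implementation):

```python
# Values passed at initialization...
init_kwargs = {"temperature": 0.7, "max_completion_tokens": 128}
# ...are overridden per call by values passed to run().
run_kwargs = {"temperature": 0.0}

effective = {**init_kwargs, **run_kwargs}
# effective == {"temperature": 0.0, "max_completion_tokens": 128}
```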
1543  
1544  #### run_async
1545  
1546  ```python
1547  run_async(
1548      messages: list[ChatMessage],
1549      streaming_callback: StreamingCallbackT | None = None,
1550      generation_kwargs: dict[str, Any] | None = None,
1551      *,
1552      tools: ToolsType | None = None,
1553      tools_strict: bool | None = None
1554  ) -> dict[str, list[ChatMessage]]
1555  ```
1556  
1557  Asynchronously invokes chat completion based on the provided messages and generation parameters.
1558  
1559  This is the asynchronous version of the `run` method. It has the same parameters and return values
1560  but can be used with `await` in async code.
1561  
1562  **Parameters:**
1563  
1564  - **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage instances representing the input messages.
1565  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
1566    Must be a coroutine.
1567  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. These parameters will
1568    override the parameters passed during component initialization.
1569    For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat/create).
1570  - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
1571    If set, it will override the `tools` parameter provided during initialization.
1572  - **tools_strict** (<code>bool | None</code>) – Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
1573    the schema provided in the `parameters` field of the tool definition, but this may increase latency.
1574    If set, it will override the `tools_strict` parameter set during component initialization.
1575  
1576  **Returns:**
1577  
1578  - <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following key:
1579  - `replies`: A list containing the generated responses as ChatMessage instances.
1580  
1581  ## chat/openai_responses
1582  
1583  ### OpenAIResponsesChatGenerator
1584  
1585  Completes chats using OpenAI's Responses API.
1586  
1587  It works with the gpt-4 and o-series models and supports streaming responses
1588  from OpenAI API. It uses [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
1589  format in input and output.
1590  
1591  You can customize how the text is generated by passing parameters to the
1592  OpenAI API. Use the `**generation_kwargs` argument when you initialize
1593  the component or when you run it. Any parameter that works with
1594  `openai.Responses.create` will work here too.
1595  
1596  For details on OpenAI API parameters, see
1597  [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses).
1598  
1599  ### Usage example
1600  
1601  ```python
1602  from haystack.components.generators.chat import OpenAIResponsesChatGenerator
1603  from haystack.dataclasses import ChatMessage
1604  
1605  messages = [ChatMessage.from_user("What's Natural Language Processing?")]
1606  
1607  client = OpenAIResponsesChatGenerator(generation_kwargs={"reasoning": {"effort": "low", "summary": "auto"}})
1608  response = client.run(messages)
1609  print(response)
1610  ```
1611  
1612  #### SUPPORTED_MODELS
1613  
1614  ```python
1615  SUPPORTED_MODELS: list[str] = [
1616      "gpt-5-mini",
1617      "gpt-5-nano",
1618      "gpt-5",
1619      "gpt-5.1",
1620      "gpt-5.2",
1621      "gpt-5.2-pro",
1622      "gpt-5.4",
1623      "gpt-5-pro",
1624      "gpt-4.1",
1625      "gpt-4.1-mini",
1626      "gpt-4.1-nano",
1627      "gpt-4o",
1628      "gpt-4o-mini",
1629      "o1",
1630      "o1-mini",
1631      "o1-pro",
1632      "o3",
1633      "o3-mini",
1634      "o3-pro",
1635      "o4-mini",
1636  ]
1637  
1638  ```
1639  
1640  A non-exhaustive list of chat models supported by this component.
1641  See https://platform.openai.com/docs/models for the full list and snapshot IDs.
1642  
1643  #### __init__
1644  
1645  ```python
1646  __init__(
1647      *,
1648      api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
1649      model: str = "gpt-5-mini",
1650      streaming_callback: StreamingCallbackT | None = None,
1651      api_base_url: str | None = None,
1652      organization: str | None = None,
1653      generation_kwargs: dict[str, Any] | None = None,
1654      timeout: float | None = None,
1655      max_retries: int | None = None,
1656      tools: ToolsType | list[dict] | None = None,
1657      tools_strict: bool = False,
1658      http_client_kwargs: dict[str, Any] | None = None
1659  ) -> None
1660  ```
1661  
1662  Creates an instance of OpenAIResponsesChatGenerator. Uses OpenAI's gpt-5-mini by default.
1663  
1664  Before initializing the component, you can set the 'OPENAI_TIMEOUT' and 'OPENAI_MAX_RETRIES'
1665  environment variables to override the `timeout` and `max_retries` parameters respectively
1666  in the OpenAI client.
1667  
1668  **Parameters:**
1669  
1670  - **api_key** (<code>Secret</code>) – The OpenAI API key.
1671    You can set it with an environment variable `OPENAI_API_KEY`, or pass with this parameter
1672    during initialization.
1673  - **model** (<code>str</code>) – The name of the model to use.
1674  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
1675    The callback function accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
1676    as an argument.
1677  - **api_base_url** (<code>str | None</code>) – An optional base URL.
1678  - **organization** (<code>str | None</code>) – Your organization ID, defaults to `None`. See
1679    [production best practices](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
1680  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model. These parameters are sent
1681    directly to the OpenAI endpoint.
1682    See OpenAI [documentation](https://platform.openai.com/docs/api-reference/responses) for
1683    more details.
1684    Some of the supported parameters:
1685  - `temperature`: What sampling temperature to use. Higher values like 0.8 will make the output more random,
1686    while lower values like 0.2 will make it more focused and deterministic.
1687  - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
1688    considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
1689    comprising the top 10% probability mass are considered.
1690  - `previous_response_id`: The ID of the previous response.
1691    Use this to create multi-turn conversations.
1692  - `text_format`: A Pydantic model that enforces the structure of the model's response.
1693    If provided, the output will always be validated against this
1694    format (unless the model returns a tool call).
1695    For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
1696  - `text`: A JSON schema that enforces the structure of the model's response.
1697    If provided, the output will always be validated against this
1698    format (unless the model returns a tool call).
1699    Notes:
  - Both JSON Schema and Pydantic models are supported for the latest models, starting from GPT-4o.
  - If both are provided, `text_format` takes precedence and the JSON schema passed to `text` is ignored.
  - Currently, this component doesn't support streaming for structured outputs.
  - Older models only support a basic version of structured outputs through `{"type": "json_object"}`.
1704      For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
1705  - `reasoning`: A dictionary of parameters for reasoning. For example:
1706    - `summary`: The summary of the reasoning.
1707    - `effort`: The level of effort to put into the reasoning. Can be `low`, `medium` or `high`.
1708    - `generate_summary`: Whether to generate a summary of the reasoning.
    Note: OpenAI does not return the reasoning tokens, but you can view the summary if it's enabled.
1710      For details, see the [OpenAI Reasoning documentation](https://platform.openai.com/docs/guides/reasoning).
1711  - **timeout** (<code>float | None</code>) – Timeout for OpenAI client calls. If not set, it defaults to either the
1712    `OPENAI_TIMEOUT` environment variable, or 30 seconds.
1713  - **max_retries** (<code>int | None</code>) – Maximum number of retries to contact OpenAI after an internal error.
  If not set, it defaults to either the `OPENAI_MAX_RETRIES` environment variable or 5.
- **tools** (<code>ToolsType | list\[dict\] | None</code>) – The tools that the model can use to prepare calls. This parameter accepts either a
  mixed list of Haystack `Tool` and `Toolset` objects, or a list of OpenAI/MCP tool definition dictionaries.
1718    Note: You cannot pass OpenAI/MCP tools and Haystack tools together.
1719    For details on tool support, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create#responses-create-tools).
1720  - **tools_strict** (<code>bool</code>) – Whether to enable strict schema adherence for tool calls. If set to `False`, the model may not exactly
  follow the schema provided in the `parameters` field of the tool definition. In the Responses API, tool calls
  are strict by default.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
1724    For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).
1725  
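As noted above, `tools` also accepts raw OpenAI tool definitions instead of Haystack `Tool` objects. A hedged sketch of one function-tool dictionary in the Responses API shape (`name` and `parameters` sit at the top level of the definition; verify the exact layout against the OpenAI tools reference linked above):

```python
# Hypothetical function tool definition for the Responses API.
get_weather_tool = {
    "type": "function",
    "name": "get_weather",
    "description": "Get the current weather for a given city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

# For example (assumes a valid OPENAI_API_KEY):
# client = OpenAIResponsesChatGenerator(tools=[get_weather_tool])
```
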
1726  #### warm_up
1727  
1728  ```python
1729  warm_up() -> None
1730  ```
1731  
1732  Warm up the OpenAI responses chat generator.
1733  
1734  This will warm up the tools registered in the chat generator.
1735  This method is idempotent and will only warm up the tools once.
1736  
1737  #### to_dict
1738  
1739  ```python
1740  to_dict() -> dict[str, Any]
1741  ```
1742  
1743  Serialize this component to a dictionary.
1744  
1745  **Returns:**
1746  
1747  - <code>dict\[str, Any\]</code> – The serialized component as a dictionary.
1748  
1749  #### from_dict
1750  
1751  ```python
1752  from_dict(data: dict[str, Any]) -> OpenAIResponsesChatGenerator
1753  ```
1754  
1755  Deserialize this component from a dictionary.
1756  
1757  **Parameters:**
1758  
1759  - **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.
1760  
1761  **Returns:**
1762  
1763  - <code>OpenAIResponsesChatGenerator</code> – The deserialized component instance.
1764  
1765  #### run
1766  
1767  ```python
1768  run(
1769      messages: list[ChatMessage],
1770      *,
1771      streaming_callback: StreamingCallbackT | None = None,
1772      generation_kwargs: dict[str, Any] | None = None,
1773      tools: ToolsType | list[dict] | None = None,
1774      tools_strict: bool | None = None
1775  ) -> dict[str, list[ChatMessage]]
1776  ```
1777  
1778  Invokes response generation based on the provided messages and generation parameters.
1779  
1780  **Parameters:**
1781  
1782  - **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage instances representing the input messages.
1783  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
1784  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. These parameters will
1785    override the parameters passed during component initialization.
1786    For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create).
- **tools** (<code>ToolsType | list\[dict\] | None</code>) – The tools that the model can use to prepare calls. If set, it will override the
  `tools` parameter set during component initialization. This parameter accepts either a
  mixed list of Haystack `Tool` and `Toolset` objects, or a list of OpenAI/MCP tool definition dictionaries.
1791    Note: You cannot pass OpenAI/MCP tools and Haystack tools together.
1792    For details on tool support, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create#responses-create-tools).
1793  - **tools_strict** (<code>bool | None</code>) – Whether to enable strict schema adherence for tool calls. If set to `False`, the model may not exactly
  follow the schema provided in the `parameters` field of the tool definition. In the Responses API, tool calls
  are strict by default.
1796    If set, it will override the `tools_strict` parameter set during component initialization.
1797  
1798  **Returns:**
1799  
1800  - <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following key:
1801  - `replies`: A list containing the generated responses as ChatMessage instances.
1802  
1803  #### run_async
1804  
1805  ```python
1806  run_async(
1807      messages: list[ChatMessage],
1808      *,
1809      streaming_callback: StreamingCallbackT | None = None,
1810      generation_kwargs: dict[str, Any] | None = None,
1811      tools: ToolsType | list[dict] | None = None,
1812      tools_strict: bool | None = None
1813  ) -> dict[str, list[ChatMessage]]
1814  ```
1815  
1816  Asynchronously invokes response generation based on the provided messages and generation parameters.
1817  
1818  This is the asynchronous version of the `run` method. It has the same parameters and return values
1819  but can be used with `await` in async code.
1820  
1821  **Parameters:**
1822  
1823  - **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage instances representing the input messages.
1824  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
1825    Must be a coroutine.
1826  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. These parameters will
1827    override the parameters passed during component initialization.
1828    For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create).
- **tools** (<code>ToolsType | list\[dict\] | None</code>) – The tools that the model can use to prepare calls. If set, it will override the
  `tools` parameter set during component initialization. This parameter accepts either a
  mixed list of Haystack `Tool` and `Toolset` objects, or a list of OpenAI/MCP tool definition dictionaries.
1833    Note: You cannot pass OpenAI/MCP tools and Haystack tools together.
1834  - **tools_strict** (<code>bool | None</code>) – Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
1835    the schema provided in the `parameters` field of the tool definition, but this may increase latency.
1836    If set, it will override the `tools_strict` parameter set during component initialization.
1837  
1838  **Returns:**
1839  
1840  - <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following key:
1841  - `replies`: A list containing the generated responses as ChatMessage instances.
1842  
1843  ## hugging_face_api
1844  
1845  ### HuggingFaceAPIGenerator
1846  
1847  Generates text using Hugging Face APIs.
1848  
1849  Use it with the following Hugging Face APIs:
1850  
1851  - [Paid Inference Endpoints](https://huggingface.co/inference-endpoints)
1852  - [Self-hosted Text Generation Inference](https://github.com/huggingface/text-generation-inference)
1853  
1854  **Note:** As of July 2025, the Hugging Face Inference API no longer offers generative models through the
1855  `text_generation` endpoint. Generative models are now only available through providers supporting the
1856  `chat_completion` endpoint. As a result, this component might no longer work with the Hugging Face Inference API.
1857  Use the `HuggingFaceAPIChatGenerator` component, which supports the `chat_completion` endpoint.
1858  
1859  ### Usage examples
1860  
1861  #### With Hugging Face Inference Endpoints
1862  
1863  ```python
1864  from haystack.components.generators import HuggingFaceAPIGenerator
1865  from haystack.utils import Secret
1866  
1867  generator = HuggingFaceAPIGenerator(api_type="inference_endpoints",
1868                                      api_params={"url": "<your-inference-endpoint-url>"},
1869                                      token=Secret.from_token("<your-api-key>"))
1870  
1871  result = generator.run(prompt="What's Natural Language Processing?")
1872  print(result)
1873  ```
1874  
1875  #### With self-hosted text generation inference
1876  
1877  ```python
1878  from haystack.components.generators import HuggingFaceAPIGenerator
1879  
1880  generator = HuggingFaceAPIGenerator(api_type="text_generation_inference",
1881                                      api_params={"url": "http://localhost:8080"})
1882  
1883  result = generator.run(prompt="What's Natural Language Processing?")
1884  print(result)
1885  ```
1886  
1887  #### With the free serverless inference API
1888  
Be aware that this example might not work, as the Hugging Face Inference API no longer offers models that support the
1890  `text_generation` endpoint. Use the `HuggingFaceAPIChatGenerator` for generative models through the
1891  `chat_completion` endpoint.
1892  
1893  ```python
1894  from haystack.components.generators import HuggingFaceAPIGenerator
1895  from haystack.utils import Secret
1896  
1897  generator = HuggingFaceAPIGenerator(api_type="serverless_inference_api",
1898                                      api_params={"model": "HuggingFaceH4/zephyr-7b-beta"},
1899                                      token=Secret.from_token("<your-api-key>"))
1900  
1901  result = generator.run(prompt="What's Natural Language Processing?")
1902  print(result)
1903  ```
1904  
1905  #### __init__
1906  
1907  ```python
1908  __init__(
1909      api_type: HFGenerationAPIType | str,
1910      api_params: dict[str, str],
1911      token: Secret | None = Secret.from_env_var(
1912          ["HF_API_TOKEN", "HF_TOKEN"], strict=False
1913      ),
1914      generation_kwargs: dict[str, Any] | None = None,
1915      stop_words: list[str] | None = None,
1916      streaming_callback: StreamingCallbackT | None = None,
1917  ) -> None
1918  ```
1919  
1920  Initialize the HuggingFaceAPIGenerator instance.
1921  
1922  **Parameters:**
1923  
1924  - **api_type** (<code>HFGenerationAPIType | str</code>) – The type of Hugging Face API to use. Available types:
1925  - `text_generation_inference`: See [TGI](https://github.com/huggingface/text-generation-inference).
1926  - `inference_endpoints`: See [Inference Endpoints](https://huggingface.co/inference-endpoints).
1927  - `serverless_inference_api`: See [Serverless Inference API](https://huggingface.co/inference-api).
1928    This might no longer work due to changes in the models offered in the Hugging Face Inference API.
1929    Please use the `HuggingFaceAPIChatGenerator` component instead.
1930  - **api_params** (<code>dict\[str, str\]</code>) – A dictionary with the following keys:
1931  - `model`: Hugging Face model ID. Required when `api_type` is `SERVERLESS_INFERENCE_API`.
1932  - `url`: URL of the inference endpoint. Required when `api_type` is `INFERENCE_ENDPOINTS` or
1933    `TEXT_GENERATION_INFERENCE`.
1934  - Other parameters specific to the chosen API type, such as `timeout`, `headers`, `provider` etc.
1935  - **token** (<code>Secret | None</code>) – The Hugging Face token to use as HTTP bearer authorization.
1936    Check your HF token in your [account settings](https://huggingface.co/settings/tokens).
1937  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary with keyword arguments to customize text generation. Some examples: `max_new_tokens`,
1938    `temperature`, `top_k`, `top_p`.
  For details, see the [Hugging Face documentation](https://huggingface.co/docs/huggingface_hub/en/package_reference/inference_client#huggingface_hub.InferenceClient.text_generation).
1941  - **stop_words** (<code>list\[str\] | None</code>) – An optional list of strings representing the stop words.
1942  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses.
1943  
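The default `token` above is resolved from two environment variables in order; a stdlib sketch of that lookup (an illustration of `Secret.from_env_var([...], strict=False)`, not the actual implementation):

```python
import os

def first_available_token(names=("HF_API_TOKEN", "HF_TOKEN")):
    # HF_API_TOKEN is checked first, then HF_TOKEN; with strict=False a
    # missing token is not an error and simply yields None.
    for name in names:
        value = os.environ.get(name)
        if value:
            return value
    return None
```
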
1944  #### to_dict
1945  
1946  ```python
1947  to_dict() -> dict[str, Any]
1948  ```
1949  
1950  Serialize this component to a dictionary.
1951  
1952  **Returns:**
1953  
1954  - <code>dict\[str, Any\]</code> – A dictionary containing the serialized component.
1955  
1956  #### from_dict
1957  
1958  ```python
1959  from_dict(data: dict[str, Any]) -> HuggingFaceAPIGenerator
1960  ```
1961  
1962  Deserialize this component from a dictionary.
1963  
1964  #### run
1965  
1966  ```python
1967  run(
1968      prompt: str,
1969      streaming_callback: StreamingCallbackT | None = None,
1970      generation_kwargs: dict[str, Any] | None = None,
1971  ) -> dict[str, Any]
1972  ```
1973  
1974  Invoke the text generation inference for the given prompt and generation parameters.
1975  
1976  **Parameters:**
1977  
1978  - **prompt** (<code>str</code>) – A string representing the prompt.
1979  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
1980  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation.
1981  
1982  **Returns:**
1983  
- <code>dict\[str, Any\]</code> – A dictionary with the generated replies and metadata. Both are lists of the same length, with one entry per generated completion.
1985  - replies: A list of strings representing the generated replies.
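
The shape of `run`'s output can be sketched with static data. The reply text and metadata values below are illustrative, not real API output:

```python
# Illustrative run() output: replies and meta are parallel lists.
response = {
    "replies": ["Paris is the capital of France."],
    "meta": [{"model": "HuggingFaceH4/zephyr-7b-beta", "finish_reason": "stop"}],
}

# Both lists have one entry per generated completion.
assert len(response["replies"]) == len(response["meta"])
first_reply, first_meta = response["replies"][0], response["meta"][0]
```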
1986  
1987  ## hugging_face_local
1988  
1989  ### HuggingFaceLocalGenerator
1990  
1991  Generates text using models from Hugging Face that run locally.
1992  
1993  LLMs running locally may need powerful hardware.
1994  
1995  ### Usage example
1996  
1997  ```python
1998  from haystack.components.generators import HuggingFaceLocalGenerator
1999  
2000  generator = HuggingFaceLocalGenerator(
2001      model="Qwen/Qwen3-0.6B",
2002      task="text-generation",
2003      generation_kwargs={"max_new_tokens": 100, "temperature": 0.9}
2004  )
2005  
2006  print(generator.run("Who is the best American actor?"))
2007  # {'replies': ['John Cusack']}
2008  ```
2009  
2010  #### __init__
2011  
2012  ```python
2013  __init__(
2014      model: str = "Qwen/Qwen3-0.6B",
2015      task: Literal["text-generation", "text2text-generation"] | None = None,
2016      device: ComponentDevice | None = None,
2017      token: Secret | None = Secret.from_env_var(
2018          ["HF_API_TOKEN", "HF_TOKEN"], strict=False
2019      ),
2020      generation_kwargs: dict[str, Any] | None = None,
2021      huggingface_pipeline_kwargs: dict[str, Any] | None = None,
2022      stop_words: list[str] | None = None,
2023      streaming_callback: StreamingCallbackT | None = None,
2024  ) -> None
2025  ```
2026  
2027  Creates an instance of a HuggingFaceLocalGenerator.
2028  
2029  **Parameters:**
2030  
2031  - **model** (<code>str</code>) – The Hugging Face text generation model name or path.
2032  - **task** (<code>Literal['text-generation', 'text2text-generation'] | None</code>) – The task for the Hugging Face pipeline. Possible options:
2033  - `text-generation`: Supported by decoder models, like GPT.
2034  - `text2text-generation`: Deprecated as of Transformers v5; use `text-generation` instead.
2035    Previously supported by encoder–decoder models such as T5.
2036    If the task is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
2037    If not specified, the component calls the Hugging Face API to infer the task from the model name.
2038  - **device** (<code>ComponentDevice | None</code>) – The device for loading the model. If `None`, automatically selects the default device.
2039    If a device or device map is specified in `huggingface_pipeline_kwargs`, it overrides this parameter.
2040  - **token** (<code>Secret | None</code>) – The token to use as HTTP bearer authorization for remote files.
2041    If the token is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
2042  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary with keyword arguments to customize text generation.
2043    Some examples: `max_length`, `max_new_tokens`, `temperature`, `top_k`, `top_p`.
2044    See Hugging Face's documentation for more information:
2045  - [customize-text-generation](https://huggingface.co/docs/transformers/main/en/generation_strategies#customize-text-generation)
2046  - [transformers.GenerationConfig](https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.GenerationConfig)
2047  - **huggingface_pipeline_kwargs** (<code>dict\[str, Any\] | None</code>) – Dictionary with keyword arguments to initialize the
2048    Hugging Face pipeline for text generation.
2049    These keyword arguments provide fine-grained control over the Hugging Face pipeline.
2050    In case of duplication, these kwargs override `model`, `task`, `device`, and `token` init parameters.
2051    For available kwargs, see [Hugging Face documentation](https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.pipeline.task).
2052    In this dictionary, you can also include `model_kwargs` to specify the kwargs for model initialization:
2053    [transformers.PreTrainedModel.from_pretrained](https://huggingface.co/docs/transformers/en/main_classes/model#transformers.PreTrainedModel.from_pretrained)
2054  - **stop_words** (<code>list\[str\] | None</code>) – If the model generates a stop word, the generation stops.
2055    If you provide this parameter, don't specify the `stopping_criteria` in `generation_kwargs`.
2056    For some chat models, the output includes both the new text and the original prompt.
2057    In these cases, make sure your prompt has no stop words.
2058  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses.
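
The effect of `stop_words` can be illustrated with a simplified sketch. The real component implements stopping via `transformers` stopping criteria during generation, so the helper below is a hypothetical approximation of the observable behavior, not the actual implementation:

```python
def truncate_at_stop_word(text: str, stop_words: list[str]) -> str:
    """Cut the generated text at the first occurrence of any stop word."""
    cut = len(text)
    for word in stop_words:
        idx = text.find(word)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

generated = "The answer is 42. END And here is some extra text."
# Prints everything before the first stop word.
print(truncate_at_stop_word(generated, ["END", "STOP"]))
```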
2059  
2060  #### warm_up
2061  
2062  ```python
2063  warm_up() -> None
2064  ```
2065  
2066  Initializes the component.
2067  
2068  #### to_dict
2069  
2070  ```python
2071  to_dict() -> dict[str, Any]
2072  ```
2073  
2074  Serializes the component to a dictionary.
2075  
2076  **Returns:**
2077  
2078  - <code>dict\[str, Any\]</code> – Dictionary with serialized data.
2079  
2080  #### from_dict
2081  
2082  ```python
2083  from_dict(data: dict[str, Any]) -> HuggingFaceLocalGenerator
2084  ```
2085  
2086  Deserializes the component from a dictionary.
2087  
2088  **Parameters:**
2089  
2090  - **data** (<code>dict\[str, Any\]</code>) – The dictionary to deserialize from.
2091  
2092  **Returns:**
2093  
2094  - <code>HuggingFaceLocalGenerator</code> – The deserialized component.
2095  
2096  #### run
2097  
2098  ```python
2099  run(
2100      prompt: str,
2101      streaming_callback: StreamingCallbackT | None = None,
2102      generation_kwargs: dict[str, Any] | None = None,
2103  ) -> dict[str, Any]
2104  ```
2105  
2106  Run the text generation model on the given prompt.
2107  
2108  **Parameters:**
2109  
2110  - **prompt** (<code>str</code>) – A string representing the prompt.
2111  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
2112  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation.
2113  
2114  **Returns:**
2115  
2116  - <code>dict\[str, Any\]</code> – A dictionary containing the generated replies.
2117  - replies: A list of strings representing the generated replies.
2118  
2119  ## openai
2120  
2121  ### OpenAIGenerator
2122  
2123  Generates text using OpenAI's large language models (LLMs).
2124  
It works with the gpt-4 and gpt-5 series models and supports streaming responses
from the OpenAI API. It uses strings as input and output.
2127  
2128  You can customize how the text is generated by passing parameters to the
2129  OpenAI API. Use the `**generation_kwargs` argument when you initialize
2130  the component or when you run it. Any parameter that works with
2131  `openai.ChatCompletion.create` will work here too.
2132  
2133  For details on OpenAI API parameters, see
2134  [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).
2135  
2136  ### Usage example
2137  
2138  ```python
2139  from haystack.components.generators import OpenAIGenerator
2140  client = OpenAIGenerator()
2141  response = client.run("What's Natural Language Processing? Be brief.")
2142  print(response)
2143  
2144  # >> {'replies': ['Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on
2145  # >> the interaction between computers and human language. It involves enabling computers to understand, interpret,
2146  # >> and respond to natural human language in a way that is both meaningful and useful.'], 'meta': [{'model':
2147  # >> 'gpt-5-mini', 'index': 0, 'finish_reason': 'stop', 'usage': {'prompt_tokens': 16,
2148  # >> 'completion_tokens': 49, 'total_tokens': 65}}]}
2149  ```
2150  
2151  #### __init__
2152  
2153  ```python
2154  __init__(
2155      api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
2156      model: str = "gpt-5-mini",
2157      streaming_callback: StreamingCallbackT | None = None,
2158      api_base_url: str | None = None,
2159      organization: str | None = None,
2160      system_prompt: str | None = None,
2161      generation_kwargs: dict[str, Any] | None = None,
2162      timeout: float | None = None,
2163      max_retries: int | None = None,
2164      http_client_kwargs: dict[str, Any] | None = None,
2165  ) -> None
2166  ```
2167  
Creates an instance of OpenAIGenerator. Unless specified otherwise in `model`, uses OpenAI's gpt-5-mini.
2169  
By setting the `OPENAI_TIMEOUT` and `OPENAI_MAX_RETRIES` environment variables, you can change the timeout
and max_retries parameters in the OpenAI client.
2172  
2173  **Parameters:**
2174  
2175  - **api_key** (<code>Secret</code>) – The OpenAI API key to connect to OpenAI.
2176  - **model** (<code>str</code>) – The name of the model to use.
2177  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
2178    The callback function accepts StreamingChunk as an argument.
2179  - **api_base_url** (<code>str | None</code>) – An optional base URL.
2180  - **organization** (<code>str | None</code>) – The Organization ID, defaults to `None`.
2181  - **system_prompt** (<code>str | None</code>) – The system prompt to use for text generation. If not provided, the system prompt is
2182    omitted, and the default system prompt of the model is used.
2183  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model. These parameters are all sent directly to
2184    the OpenAI endpoint. See OpenAI [documentation](https://platform.openai.com/docs/api-reference/chat) for
2185    more details.
2186    Some of the supported parameters:
2187  - `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion,
2188    including visible output tokens and reasoning tokens.
2189  - `temperature`: What sampling temperature to use. Higher values mean the model will take more risks.
2190    Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
2191  - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
2192    considers the results of the tokens with top_p probability mass. So, 0.1 means only the tokens
2193    comprising the top 10% probability mass are considered.
2194  - `n`: How many completions to generate for each prompt. For example, if the LLM gets 3 prompts and n is 2,
2195    it will generate two completions for each of the three prompts, ending up with 6 completions in total.
2196  - `stop`: One or more sequences after which the LLM should stop generating tokens.
- `presence_penalty`: The penalty to apply if a token has already appeared in the text at least once.
  Higher values make the model less likely to repeat the same token.
- `frequency_penalty`: The penalty to apply based on how often a token has already appeared in the text.
  Higher values make the model less likely to repeat the same token.
2201  - `logit_bias`: Add a logit bias to specific tokens. The keys of the dictionary are tokens, and the
2202    values are the bias to add to that token.
2203  - **timeout** (<code>float | None</code>) – Timeout for OpenAI Client calls, if not set it is inferred from the `OPENAI_TIMEOUT` environment variable
2204    or set to 30.
2205  - **max_retries** (<code>int | None</code>) – Maximum retries to establish contact with OpenAI if it returns an internal error, if not set it is inferred
2206    from the `OPENAI_MAX_RETRIES` environment variable or set to 5.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
2208    For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).
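
The generation parameters listed above are sent to the OpenAI endpoint unchanged, so they can be collected into a single `generation_kwargs` dictionary passed at init or run time. The values below are illustrative examples, not recommended settings:

```python
# Illustrative generation_kwargs for OpenAIGenerator.
generation_kwargs = {
    "max_completion_tokens": 128,  # cap on visible + reasoning tokens
    "temperature": 0.7,            # higher values sample more randomly
    "top_p": 0.9,                  # nucleus sampling probability mass
    "n": 1,                        # completions per prompt
    "stop": ["\n\n"],              # stop generating after these sequences
}

# Every key here is one of the parameters documented above.
assert set(generation_kwargs) <= {
    "max_completion_tokens", "temperature", "top_p", "n",
    "stop", "presence_penalty", "frequency_penalty", "logit_bias",
}
```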
2209  
2210  #### to_dict
2211  
2212  ```python
2213  to_dict() -> dict[str, Any]
2214  ```
2215  
2216  Serialize this component to a dictionary.
2217  
2218  **Returns:**
2219  
2220  - <code>dict\[str, Any\]</code> – The serialized component as a dictionary.
2221  
2222  #### from_dict
2223  
2224  ```python
2225  from_dict(data: dict[str, Any]) -> OpenAIGenerator
2226  ```
2227  
2228  Deserialize this component from a dictionary.
2229  
2230  **Parameters:**
2231  
2232  - **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.
2233  
2234  **Returns:**
2235  
2236  - <code>OpenAIGenerator</code> – The deserialized component instance.
2237  
2238  #### run
2239  
2240  ```python
2241  run(
2242      prompt: str,
2243      system_prompt: str | None = None,
2244      streaming_callback: StreamingCallbackT | None = None,
2245      generation_kwargs: dict[str, Any] | None = None,
2246  ) -> dict[str, list[str] | list[dict[str, Any]]]
2247  ```
2248  
2249  Invoke the text generation inference based on the provided messages and generation parameters.
2250  
2251  **Parameters:**
2252  
2253  - **prompt** (<code>str</code>) – The string prompt to use for text generation.
- **system_prompt** (<code>str | None</code>) – The system prompt to use for text generation. If omitted, the system
  prompt defined at initialization time, if any, is used.
2256  - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
2257  - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. These parameters will potentially override the parameters
2258    passed in the `__init__` method. For more details on the parameters supported by the OpenAI API, refer to
2259    the OpenAI [documentation](https://platform.openai.com/docs/api-reference/chat/create).
2260  
2261  **Returns:**
2262  
- <code>dict\[str, list\[str\] | list\[dict\[str, Any\]\]\]</code> – A dictionary with a list of strings containing the generated responses and a list of dictionaries
  containing the metadata for each response.
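
The precedence between init-time and run-time `generation_kwargs` can be sketched with plain dictionaries. Run-time values win, matching the override behavior described above; the values here are illustrative:

```python
# Init-time defaults and run-time overrides.
init_kwargs = {"temperature": 0.2, "max_completion_tokens": 256}
run_kwargs = {"temperature": 0.9}

# Run-time kwargs take precedence over init-time kwargs.
merged = {**init_kwargs, **run_kwargs}
print(merged)
# {'temperature': 0.9, 'max_completion_tokens': 256}
```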
2265  
2266  ## openai_dalle
2267  
2268  ### DALLEImageGenerator
2269  
2270  Generates images using OpenAI's DALL-E model.
2271  
2272  For details on OpenAI API parameters, see
2273  [OpenAI documentation](https://platform.openai.com/docs/api-reference/images/create).
2274  
2275  ### Usage example
2276  
2277  ```python
2278  from haystack.components.generators import DALLEImageGenerator
2279  image_generator = DALLEImageGenerator()
2280  response = image_generator.run("Show me a picture of a black cat.")
2281  print(response)
2282  ```
2283  
2284  #### __init__
2285  
2286  ```python
2287  __init__(
2288      model: str = "dall-e-3",
2289      quality: Literal["standard", "hd"] = "standard",
2290      size: Literal[
2291          "256x256", "512x512", "1024x1024", "1792x1024", "1024x1792"
2292      ] = "1024x1024",
2293      response_format: Literal["url", "b64_json"] = "url",
2294      api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
2295      api_base_url: str | None = None,
2296      organization: str | None = None,
2297      timeout: float | None = None,
2298      max_retries: int | None = None,
2299      http_client_kwargs: dict[str, Any] | None = None,
2300  ) -> None
2301  ```
2302  
2303  Creates an instance of DALLEImageGenerator. Unless specified otherwise in `model`, uses OpenAI's dall-e-3.
2304  
2305  **Parameters:**
2306  
2307  - **model** (<code>str</code>) – The model to use for image generation. Can be "dall-e-2" or "dall-e-3".
2308  - **quality** (<code>Literal['standard', 'hd']</code>) – The quality of the generated image. Can be "standard" or "hd".
2309  - **size** (<code>Literal['256x256', '512x512', '1024x1024', '1792x1024', '1024x1792']</code>) – The size of the generated images.
2310    Must be one of 256x256, 512x512, or 1024x1024 for dall-e-2.
2311    Must be one of 1024x1024, 1792x1024, or 1024x1792 for dall-e-3 models.
2312  - **response_format** (<code>Literal['url', 'b64_json']</code>) – The format of the response. Can be "url" or "b64_json".
2313  - **api_key** (<code>Secret</code>) – The OpenAI API key to connect to OpenAI.
2314  - **api_base_url** (<code>str | None</code>) – An optional base URL.
2315  - **organization** (<code>str | None</code>) – The Organization ID, defaults to `None`.
2316  - **timeout** (<code>float | None</code>) – Timeout for OpenAI Client calls. If not set, it is inferred from the `OPENAI_TIMEOUT` environment variable
2317    or set to 30.
2318  - **max_retries** (<code>int | None</code>) – Maximum retries to establish contact with OpenAI if it returns an internal error. If not set, it is inferred
2319    from the `OPENAI_MAX_RETRIES` environment variable or set to 5.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
2321    For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).
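
The per-model size constraints above can be captured in a small lookup table. This is an illustrative sketch, not code from the component:

```python
# Allowed image sizes per DALL-E model, as documented above.
ALLOWED_SIZES = {
    "dall-e-2": {"256x256", "512x512", "1024x1024"},
    "dall-e-3": {"1024x1024", "1792x1024", "1024x1792"},
}

def is_valid_size(model: str, size: str) -> bool:
    """Return True if the size is supported by the given model."""
    return size in ALLOWED_SIZES.get(model, set())

print(is_valid_size("dall-e-3", "1792x1024"))  # True
print(is_valid_size("dall-e-2", "1792x1024"))  # False
```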
2322  
2323  #### warm_up
2324  
2325  ```python
2326  warm_up() -> None
2327  ```
2328  
2329  Warm up the OpenAI client.
2330  
2331  #### run
2332  
2333  ```python
2334  run(
2335      prompt: str,
2336      size: (
2337          Literal["256x256", "512x512", "1024x1024", "1792x1024", "1024x1792"]
2338          | None
2339      ) = None,
2340      quality: Literal["standard", "hd"] | None = None,
2341      response_format: Literal["url", "b64_json"] | None = None,
2342  ) -> dict[str, Any]
2343  ```
2344  
2345  Invokes the image generation inference based on the provided prompt and generation parameters.
2346  
2347  **Parameters:**
2348  
2349  - **prompt** (<code>str</code>) – The prompt to generate the image.
2350  - **size** (<code>Literal['256x256', '512x512', '1024x1024', '1792x1024', '1024x1792'] | None</code>) – If provided, overrides the size provided during initialization.
2351  - **quality** (<code>Literal['standard', 'hd'] | None</code>) – If provided, overrides the quality provided during initialization.
2352  - **response_format** (<code>Literal['url', 'b64_json'] | None</code>) – If provided, overrides the response format provided during initialization.
2353  
2354  **Returns:**
2355  
2356  - <code>dict\[str, Any\]</code> – A dictionary containing the generated list of images and the revised prompt.
2357    Depending on the `response_format` parameter, the list of images can be URLs or base64 encoded JSON strings.
2358    The revised prompt is the prompt that was used to generate the image, if there was any revision
2359    to the prompt made by OpenAI.
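
Depending on `response_format`, each entry in the returned image list is either a URL string to download or a base64-encoded payload. Decoding the latter can be sketched with the standard library; the payload below is an illustrative stand-in, not real image data:

```python
import base64

# An illustrative payload standing in for b64_json image data.
fake_image_bytes = b"\x89PNG-fake-bytes"
b64_payload = base64.b64encode(fake_image_bytes).decode("ascii")

# With response_format="b64_json", decode each entry back to raw bytes.
decoded = base64.b64decode(b64_payload)
assert decoded == fake_image_bytes

# With response_format="url", each entry is simply a URL string instead.
```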
2360  
2361  #### to_dict
2362  
2363  ```python
2364  to_dict() -> dict[str, Any]
2365  ```
2366  
2367  Serialize this component to a dictionary.
2368  
2369  **Returns:**
2370  
2371  - <code>dict\[str, Any\]</code> – The serialized component as a dictionary.
2372  
2373  #### from_dict
2374  
2375  ```python
2376  from_dict(data: dict[str, Any]) -> DALLEImageGenerator
2377  ```
2378  
2379  Deserialize this component from a dictionary.
2380  
2381  **Parameters:**
2382  
2383  - **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.
2384  
2385  **Returns:**
2386  
2387  - <code>DALLEImageGenerator</code> – The deserialized component instance.
2388  
2389  ## utils
2390  
2391  ### print_streaming_chunk
2392  
2393  ```python
2394  print_streaming_chunk(chunk: StreamingChunk) -> None
2395  ```
2396  
2397  Callback function to handle and display streaming output chunks.
2398  
2399  This function processes a `StreamingChunk` object by:
2400  
2401  - Printing tool call metadata (if any), including function names and arguments, as they arrive.
2402  - Printing tool call results when available.
2403  - Printing the main content (e.g., text tokens) of the chunk as it is received.
2404  
2405  The function outputs data directly to stdout and flushes output buffers to ensure immediate display during
2406  streaming.
2407  
2408  **Parameters:**
2409  
2410  - **chunk** (<code>StreamingChunk</code>) – A chunk of streaming data containing content and optional metadata, such as tool calls and
2411    tool results.
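
A custom callback with the same signature can do more than print; the sketch below accumulates streamed content into a transcript instead. `FakeChunk` is a hypothetical stand-in for `StreamingChunk`, so this illustrates only the callback shape, not Haystack internals:

```python
from dataclasses import dataclass

@dataclass
class FakeChunk:
    """Minimal stand-in for haystack's StreamingChunk."""
    content: str

collected: list[str] = []

def collecting_callback(chunk: FakeChunk) -> None:
    """Accumulate streamed content instead of printing it."""
    collected.append(chunk.content)

# Simulate a short stream of tokens.
for token in ["Natural ", "Language ", "Processing"]:
    collecting_callback(FakeChunk(content=token))

print("".join(collected))
# Natural Language Processing
```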