generators_api.md
   1  ---
   2  title: "Generators"
   3  id: generators-api
   4  description: "Enables text generation using LLMs."
   5  slug: "/generators-api"
   6  ---
   7  
   8  <a id="azure"></a>
   9  
  10  ## Module azure
  11  
  12  <a id="azure.AzureOpenAIGenerator"></a>
  13  
  14  ### AzureOpenAIGenerator
  15  
  16  Generates text using OpenAI's large language models (LLMs).
  17  
  18  It works with the gpt-4 - type models and supports streaming responses
  19  from OpenAI API.
  20  
  21  You can customize how the text is generated by passing parameters to the
  22  OpenAI API. Use the `**generation_kwargs` argument when you initialize
  23  the component or when you run it. Any parameter that works with
  24  `openai.ChatCompletion.create` will work here too.
  25  
  26  
  27  For details on OpenAI API parameters, see
  28  [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).
  29  
  30  
  31  ### Usage example
  32  
  33  ```python
  34  from haystack.components.generators import AzureOpenAIGenerator
  35  from haystack.utils import Secret
  36  client = AzureOpenAIGenerator(
  37      azure_endpoint="<Your Azure endpoint e.g. `https://your-company.azure.openai.com/>",
  38      api_key=Secret.from_token("<your-api-key>"),
  39      azure_deployment="<this a model name, e.g.  gpt-4.1-mini>")
  40  response = client.run("What's Natural Language Processing? Be brief.")
  41  print(response)
  42  ```
  43  
  44  ```
  45  >> {'replies': ['Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on
  46  >> the interaction between computers and human language. It involves enabling computers to understand, interpret,
  47  >> and respond to natural human language in a way that is both meaningful and useful.'], 'meta': [{'model':
  48  >> 'gpt-4.1-mini', 'index': 0, 'finish_reason': 'stop', 'usage': {'prompt_tokens': 16,
  49  >> 'completion_tokens': 49, 'total_tokens': 65}}]}
  50  ```
  51  
  52  <a id="azure.AzureOpenAIGenerator.__init__"></a>
  53  
  54  #### AzureOpenAIGenerator.\_\_init\_\_
  55  
  56  ```python
  57  def __init__(azure_endpoint: str | None = None,
  58               api_version: str | None = "2024-12-01-preview",
  59               azure_deployment: str | None = "gpt-4.1-mini",
  60               api_key: Secret | None = Secret.from_env_var(
  61                   "AZURE_OPENAI_API_KEY", strict=False),
  62               azure_ad_token: Secret | None = Secret.from_env_var(
  63                   "AZURE_OPENAI_AD_TOKEN", strict=False),
  64               organization: str | None = None,
  65               streaming_callback: StreamingCallbackT | None = None,
  66               system_prompt: str | None = None,
  67               timeout: float | None = None,
  68               max_retries: int | None = None,
  69               http_client_kwargs: dict[str, Any] | None = None,
  70               generation_kwargs: dict[str, Any] | None = None,
  71               default_headers: dict[str, str] | None = None,
  72               *,
  73               azure_ad_token_provider: AzureADTokenProvider | None = None)
  74  ```
  75  
  76  Initialize the Azure OpenAI Generator.
  77  
  78  **Arguments**:
  79  
  80  - `azure_endpoint`: The endpoint of the deployed model, for example `https://example-resource.azure.openai.com/`.
  81  - `api_version`: The version of the API to use. Defaults to 2024-12-01-preview.
  82  - `azure_deployment`: The deployment of the model, usually the model name.
  83  - `api_key`: The API key to use for authentication.
  84  - `azure_ad_token`: [Azure Active Directory token](https://www.microsoft.com/en-us/security/business/identity-access/microsoft-entra-id).
  85  - `organization`: Your organization ID, defaults to `None`. For help, see
  86  [Setting up your organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
  87  - `streaming_callback`: A callback function called when a new token is received from the stream.
  88  It accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
  89  as an argument.
  90  - `system_prompt`: The system prompt to use for text generation. If not provided, the Generator
  91  omits the system prompt and uses the default system prompt.
  92  - `timeout`: Timeout for AzureOpenAI client. If not set, it is inferred from the
  93  `OPENAI_TIMEOUT` environment variable or set to 30.
  94  - `max_retries`: Maximum retries to establish contact with AzureOpenAI if it returns an internal error.
  95  If not set, it is inferred from the `OPENAI_MAX_RETRIES` environment variable or set to 5.
  96  - `http_client_kwargs`: A dictionary of keyword arguments to configure a custom `httpx.Client`or `httpx.AsyncClient`.
  97  For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/`client`).
  98  - `generation_kwargs`: Other parameters to use for the model, sent directly to
  99  the OpenAI endpoint. See [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat) for
 100  more details.
 101  Some of the supported parameters:
 102  - `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion,
 103      including visible output tokens and reasoning tokens.
 104  - `temperature`: The sampling temperature to use. Higher values mean the model takes more risks.
 105      Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
 106  - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
 107      considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
 108      comprising the top 10% probability mass are considered.
 109  - `n`: The number of completions to generate for each prompt. For example, with 3 prompts and n=2,
 110      the LLM will generate two completions per prompt, resulting in 6 completions total.
 111  - `stop`: One or more sequences after which the LLM should stop generating tokens.
 112  - `presence_penalty`: The penalty applied if a token is already present.
 113      Higher values make the model less likely to repeat the token.
 114  - `frequency_penalty`: Penalty applied if a token has already been generated.
 115      Higher values make the model less likely to repeat the token.
 116  - `logit_bias`: Adds a logit bias to specific tokens. The keys of the dictionary are tokens, and the
 117      values are the bias to add to that token.
 118  - `default_headers`: Default headers to use for the AzureOpenAI client.
 119  - `azure_ad_token_provider`: A function that returns an Azure Active Directory token, will be invoked on
 120  every request.
 121  
 122  <a id="azure.AzureOpenAIGenerator.to_dict"></a>
 123  
 124  #### AzureOpenAIGenerator.to\_dict
 125  
 126  ```python
 127  def to_dict() -> dict[str, Any]
 128  ```
 129  
 130  Serialize this component to a dictionary.
 131  
 132  **Returns**:
 133  
 134  The serialized component as a dictionary.
 135  
 136  <a id="azure.AzureOpenAIGenerator.from_dict"></a>
 137  
 138  #### AzureOpenAIGenerator.from\_dict
 139  
 140  ```python
 141  @classmethod
 142  def from_dict(cls, data: dict[str, Any]) -> "AzureOpenAIGenerator"
 143  ```
 144  
 145  Deserialize this component from a dictionary.
 146  
 147  **Arguments**:
 148  
 149  - `data`: The dictionary representation of this component.
 150  
 151  **Returns**:
 152  
 153  The deserialized component instance.
 154  
 155  <a id="azure.AzureOpenAIGenerator.run"></a>
 156  
 157  #### AzureOpenAIGenerator.run
 158  
 159  ```python
 160  @component.output_types(replies=list[str], meta=list[dict[str, Any]])
 161  def run(
 162      prompt: str,
 163      system_prompt: str | None = None,
 164      streaming_callback: StreamingCallbackT | None = None,
 165      generation_kwargs: dict[str, Any] | None = None
 166  ) -> dict[str, list[str] | list[dict[str, Any]]]
 167  ```
 168  
 169  Invoke the text generation inference based on the provided messages and generation parameters.
 170  
 171  **Arguments**:
 172  
 173  - `prompt`: The string prompt to use for text generation.
 174  - `system_prompt`: The system prompt to use for text generation. If this run time system prompt is omitted, the system
 175  prompt, if defined at initialisation time, is used.
 176  - `streaming_callback`: A callback function that is called when a new token is received from the stream.
 177  - `generation_kwargs`: Additional keyword arguments for text generation. These parameters will potentially override the parameters
 178  passed in the `__init__` method. For more details on the parameters supported by the OpenAI API, refer to
 179  the OpenAI [documentation](https://platform.openai.com/docs/api-reference/chat/create).
 180  
 181  **Returns**:
 182  
 183  A list of strings containing the generated responses and a list of dictionaries containing the metadata
 184  for each response.
 185  
 186  <a id="chat/azure"></a>
 187  
 188  ## Module chat/azure
 189  
 190  <a id="chat/azure.AzureOpenAIChatGenerator"></a>
 191  
 192  ### AzureOpenAIChatGenerator
 193  
 194  Generates text using OpenAI's models on Azure.
 195  
 196  It works with the gpt-4 - type models and supports streaming responses
 197  from OpenAI API. It uses [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
 198  format in input and output.
 199  
 200  You can customize how the text is generated by passing parameters to the
 201  OpenAI API. Use the `**generation_kwargs` argument when you initialize
 202  the component or when you run it. Any parameter that works with
 203  `openai.ChatCompletion.create` will work here too.
 204  
 205  For details on OpenAI API parameters, see
 206  [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).
 207  
 208  ### Usage example
 209  
 210  ```python
 211  from haystack.components.generators.chat import AzureOpenAIChatGenerator
 212  from haystack.dataclasses import ChatMessage
 213  from haystack.utils import Secret
 214  
 215  messages = [ChatMessage.from_user("What's Natural Language Processing?")]
 216  
 217  client = AzureOpenAIChatGenerator(
 218      azure_endpoint="<Your Azure endpoint e.g. `https://your-company.azure.openai.com/>",
 219      api_key=Secret.from_token("<your-api-key>"),
 220      azure_deployment="<this a model name, e.g. gpt-4.1-mini>")
 221  response = client.run(messages)
 222  print(response)
 223  ```
 224  
 225  ```
 226  {'replies':
 227      [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=[TextContent(text=
 228      "Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on
 229       enabling computers to understand, interpret, and generate human language in a way that is useful.")],
 230       _name=None,
 231       _meta={'model': 'gpt-4.1-mini', 'index': 0, 'finish_reason': 'stop',
 232       'usage': {'prompt_tokens': 15, 'completion_tokens': 36, 'total_tokens': 51}})]
 233  }
 234  ```
 235  
 236  <a id="chat/azure.AzureOpenAIChatGenerator.__init__"></a>
 237  
 238  #### AzureOpenAIChatGenerator.\_\_init\_\_
 239  
 240  ```python
 241  def __init__(azure_endpoint: str | None = None,
 242               api_version: str | None = "2024-12-01-preview",
 243               azure_deployment: str | None = "gpt-4.1-mini",
 244               api_key: Secret | None = Secret.from_env_var(
 245                   "AZURE_OPENAI_API_KEY", strict=False),
 246               azure_ad_token: Secret | None = Secret.from_env_var(
 247                   "AZURE_OPENAI_AD_TOKEN", strict=False),
 248               organization: str | None = None,
 249               streaming_callback: StreamingCallbackT | None = None,
 250               timeout: float | None = None,
 251               max_retries: int | None = None,
 252               generation_kwargs: dict[str, Any] | None = None,
 253               default_headers: dict[str, str] | None = None,
 254               tools: ToolsType | None = None,
 255               tools_strict: bool = False,
 256               *,
 257               azure_ad_token_provider: AzureADTokenProvider
 258               | AsyncAzureADTokenProvider | None = None,
 259               http_client_kwargs: dict[str, Any] | None = None)
 260  ```
 261  
 262  Initialize the Azure OpenAI Chat Generator component.
 263  
 264  **Arguments**:
 265  
 266  - `azure_endpoint`: The endpoint of the deployed model, for example `"https://example-resource.azure.openai.com/"`.
 267  - `api_version`: The version of the API to use. Defaults to 2024-12-01-preview.
 268  - `azure_deployment`: The deployment of the model, usually the model name.
 269  - `api_key`: The API key to use for authentication.
 270  - `azure_ad_token`: [Azure Active Directory token](https://www.microsoft.com/en-us/security/business/identity-access/microsoft-entra-id).
 271  - `organization`: Your organization ID, defaults to `None`. For help, see
 272  [Setting up your organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
 273  - `streaming_callback`: A callback function called when a new token is received from the stream.
 274  It accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
 275  as an argument.
 276  - `timeout`: Timeout for OpenAI client calls. If not set, it defaults to either the
 277  `OPENAI_TIMEOUT` environment variable, or 30 seconds.
 278  - `max_retries`: Maximum number of retries to contact OpenAI after an internal error.
 279  If not set, it defaults to either the `OPENAI_MAX_RETRIES` environment variable, or set to 5.
 280  - `generation_kwargs`: Other parameters to use for the model. These parameters are sent directly to
 281  the OpenAI endpoint. For details, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).
 282  Some of the supported parameters:
 283  - `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion,
 284      including visible output tokens and reasoning tokens.
 285  - `temperature`: The sampling temperature to use. Higher values mean the model takes more risks.
 286      Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
 287  - `top_p`: Nucleus sampling is an alternative to sampling with temperature, where the model considers
 288      tokens with a top_p probability mass. For example, 0.1 means only the tokens comprising
 289      the top 10% probability mass are considered.
 290  - `n`: The number of completions to generate for each prompt. For example, with 3 prompts and n=2,
 291      the LLM will generate two completions per prompt, resulting in 6 completions total.
 292  - `stop`: One or more sequences after which the LLM should stop generating tokens.
 293  - `presence_penalty`: The penalty applied if a token is already present.
 294      Higher values make the model less likely to repeat the token.
 295  - `frequency_penalty`: Penalty applied if a token has already been generated.
 296      Higher values make the model less likely to repeat the token.
 297  - `logit_bias`: Adds a logit bias to specific tokens. The keys of the dictionary are tokens, and the
 298      values are the bias to add to that token.
 299  - `response_format`: A JSON schema or a Pydantic model that enforces the structure of the model's response.
 300      If provided, the output will always be validated against this
 301      format (unless the model returns a tool call).
 302      For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
 303      Notes:
 304      - This parameter accepts Pydantic models and JSON schemas for latest models starting from GPT-4o.
 305        Older models only support basic version of structured outputs through `{"type": "json_object"}`.
 306        For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
 307      - For structured outputs with streaming,
 308        the `response_format` must be a JSON schema and not a Pydantic model.
 309  - `default_headers`: Default headers to use for the AzureOpenAI client.
 310  - `tools`: A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
 311  - `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
 312  the schema provided in the `parameters` field of the tool definition, but this may increase latency.
 313  - `azure_ad_token_provider`: A function that returns an Azure Active Directory token, will be invoked on
 314  every request.
 315  - `http_client_kwargs`: A dictionary of keyword arguments to configure a custom `httpx.Client`or `httpx.AsyncClient`.
 316  For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/`client`).
 317  
 318  <a id="chat/azure.AzureOpenAIChatGenerator.warm_up"></a>
 319  
 320  #### AzureOpenAIChatGenerator.warm\_up
 321  
 322  ```python
 323  def warm_up()
 324  ```
 325  
 326  Warm up the Azure OpenAI chat generator.
 327  
 328  This will warm up the tools registered in the chat generator.
 329  This method is idempotent and will only warm up the tools once.
 330  
 331  <a id="chat/azure.AzureOpenAIChatGenerator.to_dict"></a>
 332  
 333  #### AzureOpenAIChatGenerator.to\_dict
 334  
 335  ```python
 336  def to_dict() -> dict[str, Any]
 337  ```
 338  
 339  Serialize this component to a dictionary.
 340  
 341  **Returns**:
 342  
 343  The serialized component as a dictionary.
 344  
 345  <a id="chat/azure.AzureOpenAIChatGenerator.from_dict"></a>
 346  
 347  #### AzureOpenAIChatGenerator.from\_dict
 348  
 349  ```python
 350  @classmethod
 351  def from_dict(cls, data: dict[str, Any]) -> "AzureOpenAIChatGenerator"
 352  ```
 353  
 354  Deserialize this component from a dictionary.
 355  
 356  **Arguments**:
 357  
 358  - `data`: The dictionary representation of this component.
 359  
 360  **Returns**:
 361  
 362  The deserialized component instance.
 363  
 364  <a id="chat/azure.AzureOpenAIChatGenerator.run"></a>
 365  
 366  #### AzureOpenAIChatGenerator.run
 367  
 368  ```python
 369  @component.output_types(replies=list[ChatMessage])
 370  def run(messages: list[ChatMessage],
 371          streaming_callback: StreamingCallbackT | None = None,
 372          generation_kwargs: dict[str, Any] | None = None,
 373          *,
 374          tools: ToolsType | None = None,
 375          tools_strict: bool | None = None) -> dict[str, list[ChatMessage]]
 376  ```
 377  
 378  Invokes chat completion based on the provided messages and generation parameters.
 379  
 380  **Arguments**:
 381  
 382  - `messages`: A list of ChatMessage instances representing the input messages.
 383  - `streaming_callback`: A callback function that is called when a new token is received from the stream.
 384  - `generation_kwargs`: Additional keyword arguments for text generation. These parameters will
 385  override the parameters passed during component initialization.
 386  For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat/create).
 387  - `tools`: A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
 388  If set, it will override the `tools` parameter provided during initialization.
 389  - `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
 390  the schema provided in the `parameters` field of the tool definition, but this may increase latency.
 391  If set, it will override the `tools_strict` parameter set during component initialization.
 392  
 393  **Returns**:
 394  
 395  A dictionary with the following key:
 396  - `replies`: A list containing the generated responses as ChatMessage instances.
 397  
 398  <a id="chat/azure.AzureOpenAIChatGenerator.run_async"></a>
 399  
 400  #### AzureOpenAIChatGenerator.run\_async
 401  
 402  ```python
 403  @component.output_types(replies=list[ChatMessage])
 404  async def run_async(
 405          messages: list[ChatMessage],
 406          streaming_callback: StreamingCallbackT | None = None,
 407          generation_kwargs: dict[str, Any] | None = None,
 408          *,
 409          tools: ToolsType | None = None,
 410          tools_strict: bool | None = None) -> dict[str, list[ChatMessage]]
 411  ```
 412  
 413  Asynchronously invokes chat completion based on the provided messages and generation parameters.
 414  
 415  This is the asynchronous version of the `run` method. It has the same parameters and return values
 416  but can be used with `await` in async code.
 417  
 418  **Arguments**:
 419  
 420  - `messages`: A list of ChatMessage instances representing the input messages.
 421  - `streaming_callback`: A callback function that is called when a new token is received from the stream.
 422  Must be a coroutine.
 423  - `generation_kwargs`: Additional keyword arguments for text generation. These parameters will
 424  override the parameters passed during component initialization.
 425  For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat/create).
 426  - `tools`: A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
 427  If set, it will override the `tools` parameter provided during initialization.
 428  - `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
 429  the schema provided in the `parameters` field of the tool definition, but this may increase latency.
 430  If set, it will override the `tools_strict` parameter set during component initialization.
 431  
 432  **Returns**:
 433  
 434  A dictionary with the following key:
 435  - `replies`: A list containing the generated responses as ChatMessage instances.
 436  
 437  <a id="chat/azure_responses"></a>
 438  
 439  ## Module chat/azure\_responses
 440  
 441  <a id="chat/azure_responses.AzureOpenAIResponsesChatGenerator"></a>
 442  
 443  ### AzureOpenAIResponsesChatGenerator
 444  
 445  Completes chats using OpenAI's Responses API on Azure.
 446  
 447  It works with the gpt-5 and o-series models and supports streaming responses
 448  from OpenAI API. It uses [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
 449  format in input and output.
 450  
 451  You can customize how the text is generated by passing parameters to the
 452  OpenAI API. Use the `**generation_kwargs` argument when you initialize
 453  the component or when you run it. Any parameter that works with
 454  `openai.Responses.create` will work here too.
 455  
 456  For details on OpenAI API parameters, see
 457  [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses).
 458  
 459  ### Usage example
 460  
 461  ```python
 462  from haystack.components.generators.chat import AzureOpenAIResponsesChatGenerator
 463  from haystack.dataclasses import ChatMessage
 464  
 465  messages = [ChatMessage.from_user("What's Natural Language Processing?")]
 466  
 467  client = AzureOpenAIResponsesChatGenerator(
 468      azure_endpoint="https://example-resource.azure.openai.com/",
 469      generation_kwargs={"reasoning": {"effort": "low", "summary": "auto"}}
 470  )
 471  response = client.run(messages)
 472  print(response)
 473  ```
 474  
 475  <a id="chat/azure_responses.AzureOpenAIResponsesChatGenerator.__init__"></a>
 476  
 477  #### AzureOpenAIResponsesChatGenerator.\_\_init\_\_
 478  
 479  ```python
 480  def __init__(*,
 481               api_key: Secret | Callable[[], str]
 482               | Callable[[], Awaitable[str]] = Secret.from_env_var(
 483                   "AZURE_OPENAI_API_KEY", strict=False),
 484               azure_endpoint: str | None = None,
 485               azure_deployment: str = "gpt-5-mini",
 486               streaming_callback: StreamingCallbackT | None = None,
 487               organization: str | None = None,
 488               generation_kwargs: dict[str, Any] | None = None,
 489               timeout: float | None = None,
 490               max_retries: int | None = None,
 491               tools: ToolsType | None = None,
 492               tools_strict: bool = False,
 493               http_client_kwargs: dict[str, Any] | None = None)
 494  ```
 495  
 496  Initialize the AzureOpenAIResponsesChatGenerator component.
 497  
 498  **Arguments**:
 499  
 500  - `api_key`: The API key to use for authentication. Can be:
 501  - A `Secret` object containing the API key.
 502  - A `Secret` object containing the [Azure Active Directory token](https://www.microsoft.com/en-us/security/business/identity-access/microsoft-entra-id).
 503  - A function that returns an Azure Active Directory token.
 504  - `azure_endpoint`: The endpoint of the deployed model, for example `"https://example-resource.azure.openai.com/"`.
 505  - `azure_deployment`: The deployment of the model, usually the model name.
 506  - `organization`: Your organization ID, defaults to `None`. For help, see
 507  [Setting up your organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
 508  - `streaming_callback`: A callback function called when a new token is received from the stream.
 509  It accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
 510  as an argument.
 511  - `timeout`: Timeout for OpenAI client calls. If not set, it defaults to either the
 512  `OPENAI_TIMEOUT` environment variable, or 30 seconds.
 513  - `max_retries`: Maximum number of retries to contact OpenAI after an internal error.
 514  If not set, it defaults to either the `OPENAI_MAX_RETRIES` environment variable, or set to 5.
 515  - `generation_kwargs`: Other parameters to use for the model. These parameters are sent
 516  directly to the OpenAI endpoint.
 517  See OpenAI [documentation](https://platform.openai.com/docs/api-reference/responses) for
 518   more details.
 519   Some of the supported parameters:
 520   - `temperature`: What sampling temperature to use. Higher values like 0.8 will make the output more random,
 521       while lower values like 0.2 will make it more focused and deterministic.
 522   - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
 523       considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
 524       comprising the top 10% probability mass are considered.
 525   - `previous_response_id`: The ID of the previous response.
 526       Use this to create multi-turn conversations.
 527   - `text_format`: A Pydantic model that enforces the structure of the model's response.
 528       If provided, the output will always be validated against this
 529       format (unless the model returns a tool call).
 530       For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
 531   - `text`: A JSON schema that enforces the structure of the model's response.
 532       If provided, the output will always be validated against this
 533       format (unless the model returns a tool call).
 534       Notes:
 535       - Both JSON Schema and Pydantic models are supported for latest models starting from GPT-4o.
 536       - If both are provided, `text_format` takes precedence and json schema passed to `text` is ignored.
 537       - Currently, this component doesn't support streaming for structured outputs.
 538       - Older models only support basic version of structured outputs through `{"type": "json_object"}`.
 539           For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
 540   - `reasoning`: A dictionary of parameters for reasoning. For example:
 541       - `summary`: The summary of the reasoning.
 542       - `effort`: The level of effort to put into the reasoning. Can be `low`, `medium` or `high`.
 543       - `generate_summary`: Whether to generate a summary of the reasoning.
 544       Note: OpenAI does not return the reasoning tokens, but we can view summary if its enabled.
 545       For details, see the [OpenAI Reasoning documentation](https://platform.openai.com/docs/guides/reasoning).
 546  - `tools`: A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
 547  - `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
 548  the schema provided in the `parameters` field of the tool definition, but this may increase latency.
 549  - `http_client_kwargs`: A dictionary of keyword arguments to configure a custom `httpx.Client`or `httpx.AsyncClient`.
 550  For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/`client`).
 551  
 552  <a id="chat/azure_responses.AzureOpenAIResponsesChatGenerator.to_dict"></a>
 553  
 554  #### AzureOpenAIResponsesChatGenerator.to\_dict
 555  
 556  ```python
 557  def to_dict() -> dict[str, Any]
 558  ```
 559  
 560  Serialize this component to a dictionary.
 561  
 562  **Returns**:
 563  
 564  The serialized component as a dictionary.
 565  
 566  <a id="chat/azure_responses.AzureOpenAIResponsesChatGenerator.from_dict"></a>
 567  
 568  #### AzureOpenAIResponsesChatGenerator.from\_dict
 569  
 570  ```python
 571  @classmethod
 572  def from_dict(cls, data: dict[str,
 573                                Any]) -> "AzureOpenAIResponsesChatGenerator"
 574  ```
 575  
 576  Deserialize this component from a dictionary.
 577  
 578  **Arguments**:
 579  
 580  - `data`: The dictionary representation of this component.
 581  
 582  **Returns**:
 583  
 584  The deserialized component instance.
 585  
 586  <a id="chat/azure_responses.AzureOpenAIResponsesChatGenerator.warm_up"></a>
 587  
 588  #### AzureOpenAIResponsesChatGenerator.warm\_up
 589  
 590  ```python
 591  def warm_up()
 592  ```
 593  
 594  Warm up the OpenAI responses chat generator.
 595  
 596  This will warm up the tools registered in the chat generator.
 597  This method is idempotent and will only warm up the tools once.
 598  
 599  <a id="chat/azure_responses.AzureOpenAIResponsesChatGenerator.run"></a>
 600  
 601  #### AzureOpenAIResponsesChatGenerator.run
 602  
 603  ```python
 604  @component.output_types(replies=list[ChatMessage])
 605  def run(messages: list[ChatMessage],
 606          *,
 607          streaming_callback: StreamingCallbackT | None = None,
 608          generation_kwargs: dict[str, Any] | None = None,
 609          tools: ToolsType | list[dict] | None = None,
 610          tools_strict: bool | None = None) -> dict[str, list[ChatMessage]]
 611  ```
 612  
 613  Invokes response generation based on the provided messages and generation parameters.
 614  
 615  **Arguments**:
 616  
 617  - `messages`: A list of ChatMessage instances representing the input messages.
 618  - `streaming_callback`: A callback function that is called when a new token is received from the stream.
 619  - `generation_kwargs`: Additional keyword arguments for text generation. These parameters will
 620  override the parameters passed during component initialization.
 621  For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create).
 622  - `tools`: The tools that the model can use to prepare calls. If set, it will override the
 623  `tools` parameter set during component initialization. This parameter can accept either a
 624  mixed list of Haystack `Tool` objects and Haystack `Toolset`. Or you can pass a dictionary of
 625  OpenAI/MCP tool definitions.
 626  Note: You cannot pass OpenAI/MCP tools and Haystack tools together.
 627  For details on tool support, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create#responses-create-tools).
 628  - `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `False`, the model may not exactly
 629  follow the schema provided in the `parameters` field of the tool definition. In Response API, tool calls
 630  are strict by default.
 631  If set, it will override the `tools_strict` parameter set during component initialization.
 632  
 633  **Returns**:
 634  
 635  A dictionary with the following key:
 636  - `replies`: A list containing the generated responses as ChatMessage instances.
 637  
 638  <a id="chat/azure_responses.AzureOpenAIResponsesChatGenerator.run_async"></a>
 639  
 640  #### AzureOpenAIResponsesChatGenerator.run\_async
 641  
 642  ```python
 643  @component.output_types(replies=list[ChatMessage])
 644  async def run_async(
 645          messages: list[ChatMessage],
 646          *,
 647          streaming_callback: StreamingCallbackT | None = None,
 648          generation_kwargs: dict[str, Any] | None = None,
 649          tools: ToolsType | list[dict] | None = None,
 650          tools_strict: bool | None = None) -> dict[str, list[ChatMessage]]
 651  ```
 652  
 653  Asynchronously invokes response generation based on the provided messages and generation parameters.
 654  
 655  This is the asynchronous version of the `run` method. It has the same parameters and return values
 656  but can be used with `await` in async code.
 657  
 658  **Arguments**:
 659  
 660  - `messages`: A list of ChatMessage instances representing the input messages.
 661  - `streaming_callback`: A callback function that is called when a new token is received from the stream.
 662  Must be a coroutine.
 663  - `generation_kwargs`: Additional keyword arguments for text generation. These parameters will
 664  override the parameters passed during component initialization.
 665  For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create).
 666  - `tools`: A list of tools or a Toolset for which the model can prepare calls. If set, it will override the
 667  `tools` parameter set during component initialization. This parameter can accept either a list of
 668  mixed list of Haystack `Tool` objects and Haystack `Toolset`. Or you can pass a dictionary of
 669  OpenAI/MCP tool definitions.
 670  Note: You cannot pass OpenAI/MCP tools and Haystack tools together.
 671  - `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
 672  the schema provided in the `parameters` field of the tool definition, but this may increase latency.
 673  If set, it will override the `tools_strict` parameter set during component initialization.
 674  
 675  **Returns**:
 676  
 677  A dictionary with the following key:
 678  - `replies`: A list containing the generated responses as ChatMessage instances.
 679  
 680  <a id="chat/fallback"></a>
 681  
 682  ## Module chat/fallback
 683  
 684  <a id="chat/fallback.FallbackChatGenerator"></a>
 685  
 686  ### FallbackChatGenerator
 687  
 688  A chat generator wrapper that tries multiple chat generators sequentially.
 689  
 690  It forwards all parameters transparently to the underlying chat generators and returns the first successful result.
 691  Calls chat generators sequentially until one succeeds. Falls back on any exception raised by a generator.
 692  If all chat generators fail, it raises a RuntimeError with details.
 693  
 694  Timeout enforcement is fully delegated to the underlying chat generators. The fallback mechanism will only
 695  work correctly if the underlying chat generators implement proper timeout handling and raise exceptions
 696  when timeouts occur. For predictable latency guarantees, ensure your chat generators:
 697  - Support a `timeout` parameter in their initialization
 698  - Implement timeout as total wall-clock time (shared deadline for both streaming and non-streaming)
 699  - Raise timeout exceptions (e.g., TimeoutError, asyncio.TimeoutError, httpx.TimeoutException) when exceeded
 700  
 701  Note: Most well-implemented chat generators (OpenAI, Anthropic, Cohere, etc.) support timeout parameters
 702  with consistent semantics. For HTTP-based LLM providers, a single timeout value (e.g., `timeout=30`)
 703  typically applies to all connection phases: connection setup, read, write, and pool. For streaming
 704  responses, read timeout is the maximum gap between chunks. For non-streaming, it's the time limit for
 705  receiving the complete response.
 706  
 707  Failover is automatically triggered when a generator raises any exception, including:
 708  - Timeout errors (if the generator implements and raises them)
 709  - Rate limit errors (429)
 710  - Authentication errors (401)
 711  - Context length errors (400)
 712  - Server errors (500+)
 713  - Any other exception
 714  
 715  <a id="chat/fallback.FallbackChatGenerator.__init__"></a>
 716  
 717  #### FallbackChatGenerator.\_\_init\_\_
 718  
 719  ```python
 720  def __init__(chat_generators: list[ChatGenerator]) -> None
 721  ```
 722  
 723  Creates an instance of FallbackChatGenerator.
 724  
 725  **Arguments**:
 726  
 727  - `chat_generators`: A non-empty list of chat generator components to try in order.
 728  
 729  <a id="chat/fallback.FallbackChatGenerator.to_dict"></a>
 730  
 731  #### FallbackChatGenerator.to\_dict
 732  
 733  ```python
 734  def to_dict() -> dict[str, Any]
 735  ```
 736  
 737  Serialize the component, including nested chat generators when they support serialization.
 738  
 739  <a id="chat/fallback.FallbackChatGenerator.from_dict"></a>
 740  
 741  #### FallbackChatGenerator.from\_dict
 742  
 743  ```python
 744  @classmethod
 745  def from_dict(cls, data: dict[str, Any]) -> FallbackChatGenerator
 746  ```
 747  
 748  Rebuild the component from a serialized representation, restoring nested chat generators.
 749  
 750  <a id="chat/fallback.FallbackChatGenerator.warm_up"></a>
 751  
 752  #### FallbackChatGenerator.warm\_up
 753  
 754  ```python
 755  def warm_up() -> None
 756  ```
 757  
 758  Warm up all underlying chat generators.
 759  
 760  This method calls warm_up() on each underlying generator that supports it.
 761  
 762  <a id="chat/fallback.FallbackChatGenerator.run"></a>
 763  
 764  #### FallbackChatGenerator.run
 765  
 766  ```python
 767  @component.output_types(replies=list[ChatMessage], meta=dict[str, Any])
 768  def run(
 769      messages: list[ChatMessage],
 770      generation_kwargs: dict[str, Any] | None = None,
 771      tools: ToolsType | None = None,
 772      streaming_callback: StreamingCallbackT | None = None
 773  ) -> dict[str, list[ChatMessage] | dict[str, Any]]
 774  ```
 775  
 776  Execute chat generators sequentially until one succeeds.
 777  
 778  **Arguments**:
 779  
 780  - `messages`: The conversation history as a list of ChatMessage instances.
 781  - `generation_kwargs`: Optional parameters for the chat generator (e.g., temperature, max_tokens).
 782  - `tools`: A list of Tool and/or Toolset objects, or a single Toolset for function calling capabilities.
 783  - `streaming_callback`: Optional callable for handling streaming responses.
 784  
 785  **Raises**:
 786  
 787  - `RuntimeError`: If all chat generators fail.
 788  
 789  **Returns**:
 790  
 791  A dictionary with:
 792  - "replies": Generated ChatMessage instances from the first successful generator.
 793  - "meta": Execution metadata including successful_chat_generator_index, successful_chat_generator_class,
 794    total_attempts, failed_chat_generators, plus any metadata from the successful generator.
 795  
 796  <a id="chat/fallback.FallbackChatGenerator.run_async"></a>
 797  
 798  #### FallbackChatGenerator.run\_async
 799  
 800  ```python
 801  @component.output_types(replies=list[ChatMessage], meta=dict[str, Any])
 802  async def run_async(
 803      messages: list[ChatMessage],
 804      generation_kwargs: dict[str, Any] | None = None,
 805      tools: ToolsType | None = None,
 806      streaming_callback: StreamingCallbackT | None = None
 807  ) -> dict[str, list[ChatMessage] | dict[str, Any]]
 808  ```
 809  
 810  Asynchronously execute chat generators sequentially until one succeeds.
 811  
 812  **Arguments**:
 813  
 814  - `messages`: The conversation history as a list of ChatMessage instances.
 815  - `generation_kwargs`: Optional parameters for the chat generator (e.g., temperature, max_tokens).
 816  - `tools`: A list of Tool and/or Toolset objects, or a single Toolset for function calling capabilities.
 817  - `streaming_callback`: Optional callable for handling streaming responses.
 818  
 819  **Raises**:
 820  
 821  - `RuntimeError`: If all chat generators fail.
 822  
 823  **Returns**:
 824  
 825  A dictionary with:
 826  - "replies": Generated ChatMessage instances from the first successful generator.
 827  - "meta": Execution metadata including successful_chat_generator_index, successful_chat_generator_class,
 828    total_attempts, failed_chat_generators, plus any metadata from the successful generator.
 829  
 830  <a id="chat/hugging_face_api"></a>
 831  
 832  ## Module chat/hugging\_face\_api
 833  
 834  <a id="chat/hugging_face_api.HuggingFaceAPIChatGenerator"></a>
 835  
 836  ### HuggingFaceAPIChatGenerator
 837  
 838  Completes chats using Hugging Face APIs.
 839  
 840  HuggingFaceAPIChatGenerator uses the [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
 841  format for input and output. Use it to generate text with Hugging Face APIs:
 842  - [Serverless Inference API (Inference Providers)](https://huggingface.co/docs/inference-providers)
 843  - [Paid Inference Endpoints](https://huggingface.co/inference-endpoints)
 844  - [Self-hosted Text Generation Inference](https://github.com/huggingface/text-generation-inference)
 845  
 846  ### Usage examples
 847  
 848  #### With the serverless inference API (Inference Providers) - free tier available
 849  
 850  ```python
 851  from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
 852  from haystack.dataclasses import ChatMessage
 853  from haystack.utils import Secret
 854  from haystack.utils.hf import HFGenerationAPIType
 855  
 856  messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
 857              ChatMessage.from_user("What's Natural Language Processing?")]
 858  
 859  # the api_type can be expressed using the HFGenerationAPIType enum or as a string
 860  api_type = HFGenerationAPIType.SERVERLESS_INFERENCE_API
 861  api_type = "serverless_inference_api" # this is equivalent to the above
 862  
 863  generator = HuggingFaceAPIChatGenerator(api_type=api_type,
 864                                          api_params={"model": "Qwen/Qwen2.5-7B-Instruct",
 865                                                      "provider": "together"},
 866                                          token=Secret.from_token("<your-api-key>"))
 867  
 868  result = generator.run(messages)
 869  print(result)
 870  ```
 871  
 872  #### With the serverless inference API (Inference Providers) and text+image input
 873  
 874  ```python
 875  from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
 876  from haystack.dataclasses import ChatMessage, ImageContent
 877  from haystack.utils import Secret
 878  from haystack.utils.hf import HFGenerationAPIType
 879  
 880  # Create an image from file path, URL, or base64
 881  image = ImageContent.from_file_path("path/to/your/image.jpg")
 882  
 883  # Create a multimodal message with both text and image
 884  messages = [ChatMessage.from_user(content_parts=["Describe this image in detail", image])]
 885  
 886  generator = HuggingFaceAPIChatGenerator(
 887      api_type=HFGenerationAPIType.SERVERLESS_INFERENCE_API,
 888      api_params={
 889          "model": "Qwen/Qwen2.5-VL-7B-Instruct",  # Vision Language Model
 890          "provider": "hyperbolic"
 891      },
 892      token=Secret.from_token("<your-api-key>")
 893  )
 894  
 895  result = generator.run(messages)
 896  print(result)
 897  ```
 898  
 899  #### With paid inference endpoints
 900  
 901  ```python
 902  from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
 903  from haystack.dataclasses import ChatMessage
 904  from haystack.utils import Secret
 905  
 906  messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
 907              ChatMessage.from_user("What's Natural Language Processing?")]
 908  
 909  generator = HuggingFaceAPIChatGenerator(api_type="inference_endpoints",
 910                                          api_params={"url": "<your-inference-endpoint-url>"},
 911                                          token=Secret.from_token("<your-api-key>"))
 912  
 913  result = generator.run(messages)
 914  print(result)
 915  
 916  #### With self-hosted text generation inference
 917  
 918  ```python
 919  from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
 920  from haystack.dataclasses import ChatMessage
 921  
 922  messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
 923              ChatMessage.from_user("What's Natural Language Processing?")]
 924  
 925  generator = HuggingFaceAPIChatGenerator(api_type="text_generation_inference",
 926                                          api_params={"url": "http://localhost:8080"})
 927  
 928  result = generator.run(messages)
 929  print(result)
 930  ```
 931  
 932  <a id="chat/hugging_face_api.HuggingFaceAPIChatGenerator.__init__"></a>
 933  
 934  #### HuggingFaceAPIChatGenerator.\_\_init\_\_
 935  
 936  ```python
 937  def __init__(api_type: HFGenerationAPIType | str,
 938               api_params: dict[str, str],
 939               token: Secret | None = Secret.from_env_var(
 940                   ["HF_API_TOKEN", "HF_TOKEN"], strict=False),
 941               generation_kwargs: dict[str, Any] | None = None,
 942               stop_words: list[str] | None = None,
 943               streaming_callback: StreamingCallbackT | None = None,
 944               tools: ToolsType | None = None)
 945  ```
 946  
 947  Initialize the HuggingFaceAPIChatGenerator instance.
 948  
 949  **Arguments**:
 950  
 951  - `api_type`: The type of Hugging Face API to use. Available types:
 952  - `text_generation_inference`: See [TGI](https://github.com/huggingface/text-generation-inference).
 953  - `inference_endpoints`: See [Inference Endpoints](https://huggingface.co/inference-endpoints).
 954  - `serverless_inference_api`: See
 955  [Serverless Inference API - Inference Providers](https://huggingface.co/docs/inference-providers).
 956  - `api_params`: A dictionary with the following keys:
 957  - `model`: Hugging Face model ID. Required when `api_type` is `SERVERLESS_INFERENCE_API`.
 958  - `provider`: Provider name. Recommended when `api_type` is `SERVERLESS_INFERENCE_API`.
 959  - `url`: URL of the inference endpoint. Required when `api_type` is `INFERENCE_ENDPOINTS` or
 960  `TEXT_GENERATION_INFERENCE`.
 961  - Other parameters specific to the chosen API type, such as `timeout`, `headers`, etc.
 962  - `token`: The Hugging Face token to use as HTTP bearer authorization.
 963  Check your HF token in your [account settings](https://huggingface.co/settings/tokens).
 964  - `generation_kwargs`: A dictionary with keyword arguments to customize text generation.
 965  Some examples: `max_tokens`, `temperature`, `top_p`.
 966  For details, see [Hugging Face chat_completion documentation](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.chat_completion).
 967  - `stop_words`: An optional list of strings representing the stop words.
 968  - `streaming_callback`: An optional callable for handling streaming responses.
 969  - `tools`: A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
 970  The chosen model should support tool/function calling, according to the model card.
 971  Support for tools in the Hugging Face API and TGI is not yet fully refined and you may experience
 972  unexpected behavior.
 973  
 974  <a id="chat/hugging_face_api.HuggingFaceAPIChatGenerator.warm_up"></a>
 975  
 976  #### HuggingFaceAPIChatGenerator.warm\_up
 977  
 978  ```python
 979  def warm_up()
 980  ```
 981  
 982  Warm up the Hugging Face API chat generator.
 983  
 984  This will warm up the tools registered in the chat generator.
 985  This method is idempotent and will only warm up the tools once.
 986  
 987  <a id="chat/hugging_face_api.HuggingFaceAPIChatGenerator.to_dict"></a>
 988  
 989  #### HuggingFaceAPIChatGenerator.to\_dict
 990  
 991  ```python
 992  def to_dict() -> dict[str, Any]
 993  ```
 994  
 995  Serialize this component to a dictionary.
 996  
 997  **Returns**:
 998  
 999  A dictionary containing the serialized component.
1000  
1001  <a id="chat/hugging_face_api.HuggingFaceAPIChatGenerator.from_dict"></a>
1002  
1003  #### HuggingFaceAPIChatGenerator.from\_dict
1004  
1005  ```python
1006  @classmethod
1007  def from_dict(cls, data: dict[str, Any]) -> "HuggingFaceAPIChatGenerator"
1008  ```
1009  
1010  Deserialize this component from a dictionary.
1011  
1012  <a id="chat/hugging_face_api.HuggingFaceAPIChatGenerator.run"></a>
1013  
1014  #### HuggingFaceAPIChatGenerator.run
1015  
1016  ```python
1017  @component.output_types(replies=list[ChatMessage])
1018  def run(
1019      messages: list[ChatMessage],
1020      generation_kwargs: dict[str, Any] | None = None,
1021      tools: ToolsType | None = None,
1022      streaming_callback: StreamingCallbackT | None = None
1023  ) -> dict[str, list[ChatMessage]]
1024  ```
1025  
1026  Invoke the text generation inference based on the provided messages and generation parameters.
1027  
1028  **Arguments**:
1029  
1030  - `messages`: A list of ChatMessage objects representing the input messages.
1031  - `generation_kwargs`: Additional keyword arguments for text generation.
1032  - `tools`: A list of tools or a Toolset for which the model can prepare calls. If set, it will override
1033  the `tools` parameter set during component initialization. This parameter can accept either a
1034  list of `Tool` objects or a `Toolset` instance.
1035  - `streaming_callback`: An optional callable for handling streaming responses. If set, it will override the `streaming_callback`
1036  parameter set during component initialization.
1037  
1038  **Returns**:
1039  
1040  A dictionary with the following keys:
1041  - `replies`: A list containing the generated responses as ChatMessage objects.
1042  
1043  <a id="chat/hugging_face_api.HuggingFaceAPIChatGenerator.run_async"></a>
1044  
1045  #### HuggingFaceAPIChatGenerator.run\_async
1046  
1047  ```python
1048  @component.output_types(replies=list[ChatMessage])
1049  async def run_async(
1050      messages: list[ChatMessage],
1051      generation_kwargs: dict[str, Any] | None = None,
1052      tools: ToolsType | None = None,
1053      streaming_callback: StreamingCallbackT | None = None
1054  ) -> dict[str, list[ChatMessage]]
1055  ```
1056  
1057  Asynchronously invokes the text generation inference based on the provided messages and generation parameters.
1058  
1059  This is the asynchronous version of the `run` method. It has the same parameters
1060  and return values but can be used with `await` in an async code.
1061  
1062  **Arguments**:
1063  
1064  - `messages`: A list of ChatMessage objects representing the input messages.
1065  - `generation_kwargs`: Additional keyword arguments for text generation.
1066  - `tools`: A list of tools or a Toolset for which the model can prepare calls. If set, it will override the `tools`
1067  parameter set during component initialization. This parameter can accept either a list of `Tool` objects
1068  or a `Toolset` instance.
1069  - `streaming_callback`: An optional callable for handling streaming responses. If set, it will override the `streaming_callback`
1070  parameter set during component initialization.
1071  
1072  **Returns**:
1073  
1074  A dictionary with the following keys:
1075  - `replies`: A list containing the generated responses as ChatMessage objects.
1076  
1077  <a id="chat/hugging_face_local"></a>
1078  
1079  ## Module chat/hugging\_face\_local
1080  
1081  <a id="chat/hugging_face_local.default_tool_parser"></a>
1082  
1083  #### default\_tool\_parser
1084  
1085  ```python
1086  def default_tool_parser(text: str) -> list[ToolCall] | None
1087  ```
1088  
1089  Default implementation for parsing tool calls from model output text.
1090  
1091  Uses DEFAULT_TOOL_PATTERN to extract tool calls.
1092  
1093  **Arguments**:
1094  
1095  - `text`: The text to parse for tool calls.
1096  
1097  **Returns**:
1098  
1099  A list containing a single ToolCall if a valid tool call is found, None otherwise.
1100  
1101  <a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator"></a>
1102  
1103  ### HuggingFaceLocalChatGenerator
1104  
1105  Generates chat responses using models from Hugging Face that run locally.
1106  
1107  Use this component with chat-based models,
1108  such as `Qwen/Qwen3-0.6B` or `meta-llama/Llama-2-7b-chat-hf`.
1109  LLMs running locally may need powerful hardware.
1110  
1111  ### Usage example
1112  
1113  ```python
1114  from haystack.components.generators.chat import HuggingFaceLocalChatGenerator
1115  from haystack.dataclasses import ChatMessage
1116  
1117  generator = HuggingFaceLocalChatGenerator(model="Qwen/Qwen3-0.6B")
1118  generator.warm_up()
1119  messages = [ChatMessage.from_user("What's Natural Language Processing? Be brief.")]
1120  print(generator.run(messages))
1121  ```
1122  
1123  ```
1124  {'replies':
1125      [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=[TextContent(text=
1126      "Natural Language Processing (NLP) is a subfield of artificial intelligence that deals
1127      with the interaction between computers and human language. It enables computers to understand, interpret, and
1128      generate human language in a valuable way. NLP involves various techniques such as speech recognition, text
1129      analysis, sentiment analysis, and machine translation. The ultimate goal is to make it easier for computers to
1130      process and derive meaning from human language, improving communication between humans and machines.")],
1131      _name=None,
1132      _meta={'finish_reason': 'stop', 'index': 0, 'model':
1133            'mistralai/Mistral-7B-Instruct-v0.2',
1134            'usage': {'completion_tokens': 90, 'prompt_tokens': 19, 'total_tokens': 109}})
1135            ]
1136  }
1137  ```
1138  
1139  <a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator.__init__"></a>
1140  
1141  #### HuggingFaceLocalChatGenerator.\_\_init\_\_
1142  
1143  ```python
1144  def __init__(model: str = "Qwen/Qwen3-0.6B",
1145               task: Literal["text-generation", "text2text-generation"]
1146               | None = None,
1147               device: ComponentDevice | None = None,
1148               token: Secret | None = Secret.from_env_var(
1149                   ["HF_API_TOKEN", "HF_TOKEN"], strict=False),
1150               chat_template: str | None = None,
1151               generation_kwargs: dict[str, Any] | None = None,
1152               huggingface_pipeline_kwargs: dict[str, Any] | None = None,
1153               stop_words: list[str] | None = None,
1154               streaming_callback: StreamingCallbackT | None = None,
1155               tools: ToolsType | None = None,
1156               tool_parsing_function: Callable[[str], list[ToolCall] | None]
1157               | None = None,
1158               async_executor: ThreadPoolExecutor | None = None,
1159               *,
1160               enable_thinking: bool = False) -> None
1161  ```
1162  
1163  Initializes the HuggingFaceLocalChatGenerator component.
1164  
1165  **Arguments**:
1166  
1167  - `model`: The Hugging Face text generation model name or path,
1168  for example, `mistralai/Mistral-7B-Instruct-v0.2` or `TheBloke/OpenHermes-2.5-Mistral-7B-16k-AWQ`.
1169  The model must be a chat model supporting the ChatML messaging
1170  format.
1171  If the model is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
1172  - `task`: The task for the Hugging Face pipeline. Possible options:
1173  - `text-generation`: Supported by decoder models, like GPT.
1174  - `text2text-generation`: Supported by encoder-decoder models, like T5.
1175  If the task is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
1176  If not specified, the component calls the Hugging Face API to infer the task from the model name.
1177  - `device`: The device for loading the model. If `None`, automatically selects the default device.
1178  If a device or device map is specified in `huggingface_pipeline_kwargs`, it overrides this parameter.
1179  - `token`: The token to use as HTTP bearer authorization for remote files.
1180  If the token is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
1181  - `chat_template`: Specifies an optional Jinja template for formatting chat
1182  messages. Most high-quality chat models have their own templates, but for models without this
1183  feature or if you prefer a custom template, use this parameter.
1184  - `generation_kwargs`: A dictionary with keyword arguments to customize text generation.
1185  Some examples: `max_length`, `max_new_tokens`, `temperature`, `top_k`, `top_p`.
1186  See Hugging Face's documentation for more information:
1187  - - [customize-text-generation](https://huggingface.co/docs/transformers/main/en/generation_strategies#customize-text-generation)
1188  - - [GenerationConfig](https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.GenerationConfig)
1189  The only `generation_kwargs` set by default is `max_new_tokens`, which is set to 512 tokens.
1190  - `huggingface_pipeline_kwargs`: Dictionary with keyword arguments to initialize the
1191  Hugging Face pipeline for text generation.
1192  These keyword arguments provide fine-grained control over the Hugging Face pipeline.
1193  In case of duplication, these kwargs override `model`, `task`, `device`, and `token` init parameters.
1194  For kwargs, see [Hugging Face documentation](https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.pipeline.task).
1195  In this dictionary, you can also include `model_kwargs` to specify the kwargs for [model initialization](https://huggingface.co/docs/transformers/en/main_classes/model#transformers.PreTrainedModel.from_pretrained)
1196  - `stop_words`: A list of stop words. If the model generates a stop word, the generation stops.
1197  If you provide this parameter, don't specify the `stopping_criteria` in `generation_kwargs`.
1198  For some chat models, the output includes both the new text and the original prompt.
1199  In these cases, make sure your prompt has no stop words.
1200  - `streaming_callback`: An optional callable for handling streaming responses.
1201  - `tools`: A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
1202  - `tool_parsing_function`: A callable that takes a string and returns a list of ToolCall objects or None.
1203  If None, the default_tool_parser will be used which extracts tool calls using a predefined pattern.
1204  - `async_executor`: Optional ThreadPoolExecutor to use for async calls. If not provided, a single-threaded executor will be
1205  initialized and used
1206  - `enable_thinking`: Whether to enable thinking mode in the chat template for thinking-capable models.
1207  When enabled, the model generates intermediate reasoning before the final response. Defaults to False.
1208  
1209  <a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator.__del__"></a>
1210  
1211  #### HuggingFaceLocalChatGenerator.\_\_del\_\_
1212  
1213  ```python
1214  def __del__() -> None
1215  ```
1216  
1217  Cleanup when the instance is being destroyed.
1218  
1219  <a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator.shutdown"></a>
1220  
1221  #### HuggingFaceLocalChatGenerator.shutdown
1222  
1223  ```python
1224  def shutdown() -> None
1225  ```
1226  
1227  Explicitly shutdown the executor if we own it.
1228  
1229  <a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator.warm_up"></a>
1230  
1231  #### HuggingFaceLocalChatGenerator.warm\_up
1232  
1233  ```python
1234  def warm_up() -> None
1235  ```
1236  
1237  Initializes the component and warms up tools if provided.
1238  
1239  <a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator.to_dict"></a>
1240  
1241  #### HuggingFaceLocalChatGenerator.to\_dict
1242  
1243  ```python
1244  def to_dict() -> dict[str, Any]
1245  ```
1246  
1247  Serializes the component to a dictionary.
1248  
1249  **Returns**:
1250  
1251  Dictionary with serialized data.
1252  
1253  <a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator.from_dict"></a>
1254  
1255  #### HuggingFaceLocalChatGenerator.from\_dict
1256  
1257  ```python
1258  @classmethod
1259  def from_dict(cls, data: dict[str, Any]) -> "HuggingFaceLocalChatGenerator"
1260  ```
1261  
1262  Deserializes the component from a dictionary.
1263  
1264  **Arguments**:
1265  
1266  - `data`: The dictionary to deserialize from.
1267  
1268  **Returns**:
1269  
1270  The deserialized component.
1271  
1272  <a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator.run"></a>
1273  
1274  #### HuggingFaceLocalChatGenerator.run
1275  
1276  ```python
1277  @component.output_types(replies=list[ChatMessage])
1278  def run(messages: list[ChatMessage],
1279          generation_kwargs: dict[str, Any] | None = None,
1280          streaming_callback: StreamingCallbackT | None = None,
1281          tools: ToolsType | None = None) -> dict[str, list[ChatMessage]]
1282  ```
1283  
1284  Invoke text generation inference based on the provided messages and generation parameters.
1285  
1286  **Arguments**:
1287  
1288  - `messages`: A list of ChatMessage objects representing the input messages.
1289  - `generation_kwargs`: Additional keyword arguments for text generation.
1290  - `streaming_callback`: An optional callable for handling streaming responses.
1291  - `tools`: A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
1292  If set, it will override the `tools` parameter provided during initialization.
1293  
1294  **Returns**:
1295  
1296  A dictionary with the following keys:
1297  - `replies`: A list containing the generated responses as ChatMessage instances.
1298  
1299  <a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator.create_message"></a>
1300  
1301  #### HuggingFaceLocalChatGenerator.create\_message
1302  
1303  ```python
1304  def create_message(text: str,
1305                     index: int,
1306                     tokenizer: Union["PreTrainedTokenizer",
1307                                      "PreTrainedTokenizerFast"],
1308                     prompt: str,
1309                     generation_kwargs: dict[str, Any],
1310                     parse_tool_calls: bool = False) -> ChatMessage
1311  ```
1312  
1313  Create a ChatMessage instance from the provided text, populated with metadata.
1314  
1315  **Arguments**:
1316  
1317  - `text`: The generated text.
1318  - `index`: The index of the generated text.
1319  - `tokenizer`: The tokenizer used for generation.
1320  - `prompt`: The prompt used for generation.
1321  - `generation_kwargs`: The generation parameters.
1322  - `parse_tool_calls`: Whether to attempt parsing tool calls from the text.
1323  
1324  **Returns**:
1325  
1326  A ChatMessage instance.
1327  
1328  <a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator.run_async"></a>
1329  
1330  #### HuggingFaceLocalChatGenerator.run\_async
1331  
1332  ```python
1333  @component.output_types(replies=list[ChatMessage])
1334  async def run_async(
1335          messages: list[ChatMessage],
1336          generation_kwargs: dict[str, Any] | None = None,
1337          streaming_callback: StreamingCallbackT | None = None,
1338          tools: ToolsType | None = None) -> dict[str, list[ChatMessage]]
1339  ```
1340  
1341  Asynchronously invokes text generation inference based on the provided messages and generation parameters.
1342  
1343  This is the asynchronous version of the `run` method. It has the same parameters
1344  and return values but can be used with `await` in an async code.
1345  
1346  **Arguments**:
1347  
1348  - `messages`: A list of ChatMessage objects representing the input messages.
1349  - `generation_kwargs`: Additional keyword arguments for text generation.
1350  - `streaming_callback`: An optional callable for handling streaming responses.
1351  - `tools`: A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
1352  If set, it will override the `tools` parameter provided during initialization.
1353  
1354  **Returns**:
1355  
1356  A dictionary with the following keys:
1357  - `replies`: A list containing the generated responses as ChatMessage instances.
1358  
1359  <a id="chat/openai"></a>
1360  
1361  ## Module chat/openai
1362  
1363  <a id="chat/openai.OpenAIChatGenerator"></a>
1364  
1365  ### OpenAIChatGenerator
1366  
1367  Completes chats using OpenAI's large language models (LLMs).
1368  
1369  It works with the gpt-4 and gpt-5 series models and supports streaming responses
1370  from OpenAI API. It uses [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
1371  format in input and output.
1372  
1373  You can customize how the text is generated by passing parameters to the
1374  OpenAI API. Use the `**generation_kwargs` argument when you initialize
1375  the component or when you run it. Any parameter that works with
1376  `openai.ChatCompletion.create` will work here too.
1377  
1378  For details on OpenAI API parameters, see
1379  [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).
1380  
1381  ### Usage example
1382  
1383  ```python
1384  from haystack.components.generators.chat import OpenAIChatGenerator
1385  from haystack.dataclasses import ChatMessage
1386  
1387  messages = [ChatMessage.from_user("What's Natural Language Processing?")]
1388  
1389  client = OpenAIChatGenerator()
1390  response = client.run(messages)
1391  print(response)
1392  ```
1393  Output:
1394  ```
1395  {'replies':
1396      [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=
1397      [TextContent(text="Natural Language Processing (NLP) is a branch of artificial intelligence
1398          that focuses on enabling computers to understand, interpret, and generate human language in
1399          a way that is meaningful and useful.")],
1400       _name=None,
1401       _meta={'model': 'gpt-5-mini', 'index': 0, 'finish_reason': 'stop',
1402       'usage': {'prompt_tokens': 15, 'completion_tokens': 36, 'total_tokens': 51}})
1403      ]
1404  }
1405  ```
1406  
1407  <a id="chat/openai.OpenAIChatGenerator.__init__"></a>
1408  
1409  #### OpenAIChatGenerator.\_\_init\_\_
1410  
1411  ```python
1412  def __init__(api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
1413               model: str = "gpt-5-mini",
1414               streaming_callback: StreamingCallbackT | None = None,
1415               api_base_url: str | None = None,
1416               organization: str | None = None,
1417               generation_kwargs: dict[str, Any] | None = None,
1418               timeout: float | None = None,
1419               max_retries: int | None = None,
1420               tools: ToolsType | None = None,
1421               tools_strict: bool = False,
1422               http_client_kwargs: dict[str, Any] | None = None)
1423  ```
1424  
1425  Creates an instance of OpenAIChatGenerator. Unless specified otherwise in `model`, uses OpenAI's gpt-5-mini
1426  
1427  Before initializing the component, you can set the 'OPENAI_TIMEOUT' and 'OPENAI_MAX_RETRIES'
1428  environment variables to override the `timeout` and `max_retries` parameters respectively
1429  in the OpenAI client.
1430  
1431  **Arguments**:
1432  
1433  - `api_key`: The OpenAI API key.
1434  You can set it with an environment variable `OPENAI_API_KEY`, or pass with this parameter
1435  during initialization.
1436  - `model`: The name of the model to use.
1437  - `streaming_callback`: A callback function that is called when a new token is received from the stream.
1438  The callback function accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
1439  as an argument.
1440  - `api_base_url`: An optional base URL.
1441  - `organization`: Your organization ID, defaults to `None`. See
1442  [production best practices](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
1443  - `generation_kwargs`: Other parameters to use for the model. These parameters are sent directly to
1444  the OpenAI endpoint. See OpenAI [documentation](https://platform.openai.com/docs/api-reference/chat) for
1445  more details.
1446  Some of the supported parameters:
1447  - `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion,
1448      including visible output tokens and reasoning tokens.
1449  - `temperature`: What sampling temperature to use. Higher values mean the model will take more risks.
1450      Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
1451  - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
1452      considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
1453      comprising the top 10% probability mass are considered.
1454  - `n`: How many completions to generate for each prompt. For example, if the LLM gets 3 prompts and n is 2,
1455      it will generate two completions for each of the three prompts, ending up with 6 completions in total.
1456  - `stop`: One or more sequences after which the LLM should stop generating tokens.
1457  - `presence_penalty`: What penalty to apply if a token is already present at all. Bigger values mean
1458      the model will be less likely to repeat the same token in the text.
1459  - `frequency_penalty`: What penalty to apply if a token has already been generated in the text.
1460      Bigger values mean the model will be less likely to repeat the same token in the text.
1461  - `logit_bias`: Add a logit bias to specific tokens. The keys of the dictionary are tokens, and the
1462      values are the bias to add to that token.
1463  - `response_format`: A JSON schema or a Pydantic model that enforces the structure of the model's response.
1464      If provided, the output will always be validated against this
1465      format (unless the model returns a tool call).
1466      For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
1467      Notes:
1468      - This parameter accepts Pydantic models and JSON schemas for latest models starting from GPT-4o.
1469        Older models only support basic version of structured outputs through `{"type": "json_object"}`.
1470        For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
1471      - For structured outputs with streaming,
1472        the `response_format` must be a JSON schema and not a Pydantic model.
1473  - `timeout`: Timeout for OpenAI client calls. If not set, it defaults to either the
1474  `OPENAI_TIMEOUT` environment variable, or 30 seconds.
1475  - `max_retries`: Maximum number of retries to contact OpenAI after an internal error.
1476  If not set, it defaults to either the `OPENAI_MAX_RETRIES` environment variable, or set to 5.
1477  - `tools`: A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
1478  - `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
1479  the schema provided in the `parameters` field of the tool definition, but this may increase latency.
1480  - `http_client_kwargs`: A dictionary of keyword arguments to configure a custom `httpx.Client`or `httpx.AsyncClient`.
1481  For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/`client`).
1482  
1483  <a id="chat/openai.OpenAIChatGenerator.warm_up"></a>
1484  
1485  #### OpenAIChatGenerator.warm\_up
1486  
1487  ```python
1488  def warm_up()
1489  ```
1490  
1491  Warm up the OpenAI chat generator.
1492  
1493  This will warm up the tools registered in the chat generator.
1494  This method is idempotent and will only warm up the tools once.
1495  
1496  <a id="chat/openai.OpenAIChatGenerator.to_dict"></a>
1497  
1498  #### OpenAIChatGenerator.to\_dict
1499  
1500  ```python
1501  def to_dict() -> dict[str, Any]
1502  ```
1503  
1504  Serialize this component to a dictionary.
1505  
1506  **Returns**:
1507  
1508  The serialized component as a dictionary.
1509  
1510  <a id="chat/openai.OpenAIChatGenerator.from_dict"></a>
1511  
1512  #### OpenAIChatGenerator.from\_dict
1513  
1514  ```python
1515  @classmethod
1516  def from_dict(cls, data: dict[str, Any]) -> "OpenAIChatGenerator"
1517  ```
1518  
1519  Deserialize this component from a dictionary.
1520  
1521  **Arguments**:
1522  
1523  - `data`: The dictionary representation of this component.
1524  
1525  **Returns**:
1526  
1527  The deserialized component instance.
1528  
1529  <a id="chat/openai.OpenAIChatGenerator.run"></a>
1530  
1531  #### OpenAIChatGenerator.run
1532  
1533  ```python
1534  @component.output_types(replies=list[ChatMessage])
1535  def run(messages: list[ChatMessage],
1536          streaming_callback: StreamingCallbackT | None = None,
1537          generation_kwargs: dict[str, Any] | None = None,
1538          *,
1539          tools: ToolsType | None = None,
1540          tools_strict: bool | None = None) -> dict[str, list[ChatMessage]]
1541  ```
1542  
1543  Invokes chat completion based on the provided messages and generation parameters.
1544  
1545  **Arguments**:
1546  
1547  - `messages`: A list of ChatMessage instances representing the input messages.
1548  - `streaming_callback`: A callback function that is called when a new token is received from the stream.
1549  - `generation_kwargs`: Additional keyword arguments for text generation. These parameters will
1550  override the parameters passed during component initialization.
1551  For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat/create).
1552  - `tools`: A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
1553  If set, it will override the `tools` parameter provided during initialization.
1554  - `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
1555  the schema provided in the `parameters` field of the tool definition, but this may increase latency.
1556  If set, it will override the `tools_strict` parameter set during component initialization.
1557  
1558  **Returns**:
1559  
1560  A dictionary with the following key:
1561  - `replies`: A list containing the generated responses as ChatMessage instances.
1562  
1563  <a id="chat/openai.OpenAIChatGenerator.run_async"></a>
1564  
1565  #### OpenAIChatGenerator.run\_async
1566  
1567  ```python
1568  @component.output_types(replies=list[ChatMessage])
1569  async def run_async(
1570          messages: list[ChatMessage],
1571          streaming_callback: StreamingCallbackT | None = None,
1572          generation_kwargs: dict[str, Any] | None = None,
1573          *,
1574          tools: ToolsType | None = None,
1575          tools_strict: bool | None = None) -> dict[str, list[ChatMessage]]
1576  ```
1577  
1578  Asynchronously invokes chat completion based on the provided messages and generation parameters.
1579  
1580  This is the asynchronous version of the `run` method. It has the same parameters and return values
1581  but can be used with `await` in async code.
1582  
1583  **Arguments**:
1584  
1585  - `messages`: A list of ChatMessage instances representing the input messages.
1586  - `streaming_callback`: A callback function that is called when a new token is received from the stream.
1587  Must be a coroutine.
1588  - `generation_kwargs`: Additional keyword arguments for text generation. These parameters will
1589  override the parameters passed during component initialization.
1590  For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat/create).
1591  - `tools`: A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
1592  If set, it will override the `tools` parameter provided during initialization.
1593  - `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
1594  the schema provided in the `parameters` field of the tool definition, but this may increase latency.
1595  If set, it will override the `tools_strict` parameter set during component initialization.
1596  
1597  **Returns**:
1598  
1599  A dictionary with the following key:
1600  - `replies`: A list containing the generated responses as ChatMessage instances.
1601  
1602  <a id="chat/openai_responses"></a>
1603  
1604  ## Module chat/openai\_responses
1605  
1606  <a id="chat/openai_responses.OpenAIResponsesChatGenerator"></a>
1607  
1608  ### OpenAIResponsesChatGenerator
1609  
1610  Completes chats using OpenAI's Responses API.
1611  
1612  It works with the gpt-4 and o-series models and supports streaming responses
1613  from OpenAI API. It uses [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
1614  format in input and output.
1615  
1616  You can customize how the text is generated by passing parameters to the
1617  OpenAI API. Use the `**generation_kwargs` argument when you initialize
1618  the component or when you run it. Any parameter that works with
1619  `openai.Responses.create` will work here too.
1620  
1621  For details on OpenAI API parameters, see
1622  [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses).
1623  
1624  ### Usage example
1625  
1626  ```python
1627  from haystack.components.generators.chat import OpenAIResponsesChatGenerator
1628  from haystack.dataclasses import ChatMessage
1629  
1630  messages = [ChatMessage.from_user("What's Natural Language Processing?")]
1631  
1632  client = OpenAIResponsesChatGenerator(generation_kwargs={"reasoning": {"effort": "low", "summary": "auto"}})
1633  response = client.run(messages)
1634  print(response)
1635  ```
1636  
1637  <a id="chat/openai_responses.OpenAIResponsesChatGenerator.__init__"></a>
1638  
1639  #### OpenAIResponsesChatGenerator.\_\_init\_\_
1640  
1641  ```python
1642  def __init__(*,
1643               api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
1644               model: str = "gpt-5-mini",
1645               streaming_callback: StreamingCallbackT | None = None,
1646               api_base_url: str | None = None,
1647               organization: str | None = None,
1648               generation_kwargs: dict[str, Any] | None = None,
1649               timeout: float | None = None,
1650               max_retries: int | None = None,
1651               tools: ToolsType | list[dict] | None = None,
1652               tools_strict: bool = False,
1653               http_client_kwargs: dict[str, Any] | None = None)
1654  ```
1655  
1656  Creates an instance of OpenAIResponsesChatGenerator. Uses OpenAI's gpt-5-mini by default.
1657  
1658  Before initializing the component, you can set the 'OPENAI_TIMEOUT' and 'OPENAI_MAX_RETRIES'
1659  environment variables to override the `timeout` and `max_retries` parameters respectively
1660  in the OpenAI client.
1661  
1662  **Arguments**:
1663  
1664  - `api_key`: The OpenAI API key.
1665  You can set it with an environment variable `OPENAI_API_KEY`, or pass with this parameter
1666  during initialization.
1667  - `model`: The name of the model to use.
1668  - `streaming_callback`: A callback function that is called when a new token is received from the stream.
1669  The callback function accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
1670  as an argument.
1671  - `api_base_url`: An optional base URL.
1672  - `organization`: Your organization ID, defaults to `None`. See
1673  [production best practices](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
1674  - `generation_kwargs`: Other parameters to use for the model. These parameters are sent
1675  directly to the OpenAI endpoint.
1676  See OpenAI [documentation](https://platform.openai.com/docs/api-reference/responses) for
1677   more details.
1678   Some of the supported parameters:
1679   - `temperature`: What sampling temperature to use. Higher values like 0.8 will make the output more random,
1680       while lower values like 0.2 will make it more focused and deterministic.
1681   - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
1682       considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
1683       comprising the top 10% probability mass are considered.
1684   - `previous_response_id`: The ID of the previous response.
1685       Use this to create multi-turn conversations.
1686   - `text_format`: A Pydantic model that enforces the structure of the model's response.
1687       If provided, the output will always be validated against this
1688       format (unless the model returns a tool call).
1689       For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
1690   - `text`: A JSON schema that enforces the structure of the model's response.
1691       If provided, the output will always be validated against this
1692       format (unless the model returns a tool call).
1693       Notes:
1694       - Both JSON Schema and Pydantic models are supported for latest models starting from GPT-4o.
1695       - If both are provided, `text_format` takes precedence and json schema passed to `text` is ignored.
1696       - Currently, this component doesn't support streaming for structured outputs.
1697       - Older models only support basic version of structured outputs through `{"type": "json_object"}`.
1698           For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
1699   - `reasoning`: A dictionary of parameters for reasoning. For example:
1700       - `summary`: The summary of the reasoning.
1701       - `effort`: The level of effort to put into the reasoning. Can be `low`, `medium` or `high`.
1702       - `generate_summary`: Whether to generate a summary of the reasoning.
1703       Note: OpenAI does not return the reasoning tokens, but we can view summary if its enabled.
1704       For details, see the [OpenAI Reasoning documentation](https://platform.openai.com/docs/guides/reasoning).
1705  - `timeout`: Timeout for OpenAI client calls. If not set, it defaults to either the
1706  `OPENAI_TIMEOUT` environment variable, or 30 seconds.
1707  - `max_retries`: Maximum number of retries to contact OpenAI after an internal error.
1708  If not set, it defaults to either the `OPENAI_MAX_RETRIES` environment variable, or set to 5.
1709  - `tools`: The tools that the model can use to prepare calls. This parameter can accept either a
1710  mixed list of Haystack `Tool` objects and Haystack `Toolset`. Or you can pass a dictionary of
1711  OpenAI/MCP tool definitions.
1712  Note: You cannot pass OpenAI/MCP tools and Haystack tools together.
1713  For details on tool support, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create#responses-create-tools).
1714  - `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `False`, the model may not exactly
1715  follow the schema provided in the `parameters` field of the tool definition. In Response API, tool calls
1716  are strict by default.
1717  - `http_client_kwargs`: A dictionary of keyword arguments to configure a custom `httpx.Client`or `httpx.AsyncClient`.
1718  For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/`client`).
1719  
1720  <a id="chat/openai_responses.OpenAIResponsesChatGenerator.warm_up"></a>
1721  
1722  #### OpenAIResponsesChatGenerator.warm\_up
1723  
1724  ```python
1725  def warm_up()
1726  ```
1727  
1728  Warm up the OpenAI responses chat generator.
1729  
1730  This will warm up the tools registered in the chat generator.
1731  This method is idempotent and will only warm up the tools once.
1732  
1733  <a id="chat/openai_responses.OpenAIResponsesChatGenerator.to_dict"></a>
1734  
1735  #### OpenAIResponsesChatGenerator.to\_dict
1736  
1737  ```python
1738  def to_dict() -> dict[str, Any]
1739  ```
1740  
1741  Serialize this component to a dictionary.
1742  
1743  **Returns**:
1744  
1745  The serialized component as a dictionary.
1746  
1747  <a id="chat/openai_responses.OpenAIResponsesChatGenerator.from_dict"></a>
1748  
1749  #### OpenAIResponsesChatGenerator.from\_dict
1750  
1751  ```python
1752  @classmethod
1753  def from_dict(cls, data: dict[str, Any]) -> "OpenAIResponsesChatGenerator"
1754  ```
1755  
1756  Deserialize this component from a dictionary.
1757  
1758  **Arguments**:
1759  
1760  - `data`: The dictionary representation of this component.
1761  
1762  **Returns**:
1763  
1764  The deserialized component instance.
1765  
1766  <a id="chat/openai_responses.OpenAIResponsesChatGenerator.run"></a>
1767  
1768  #### OpenAIResponsesChatGenerator.run
1769  
1770  ```python
1771  @component.output_types(replies=list[ChatMessage])
1772  def run(messages: list[ChatMessage],
1773          *,
1774          streaming_callback: StreamingCallbackT | None = None,
1775          generation_kwargs: dict[str, Any] | None = None,
1776          tools: ToolsType | list[dict] | None = None,
1777          tools_strict: bool | None = None) -> dict[str, list[ChatMessage]]
1778  ```
1779  
1780  Invokes response generation based on the provided messages and generation parameters.
1781  
1782  **Arguments**:
1783  
1784  - `messages`: A list of ChatMessage instances representing the input messages.
1785  - `streaming_callback`: A callback function that is called when a new token is received from the stream.
1786  - `generation_kwargs`: Additional keyword arguments for text generation. These parameters will
1787  override the parameters passed during component initialization.
1788  For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create).
1789  - `tools`: The tools that the model can use to prepare calls. If set, it will override the
1790  `tools` parameter set during component initialization. This parameter can accept either a
1791  mixed list of Haystack `Tool` objects and Haystack `Toolset`. Or you can pass a dictionary of
1792  OpenAI/MCP tool definitions.
1793  Note: You cannot pass OpenAI/MCP tools and Haystack tools together.
1794  For details on tool support, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create#responses-create-tools).
1795  - `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `False`, the model may not exactly
1796  follow the schema provided in the `parameters` field of the tool definition. In Response API, tool calls
1797  are strict by default.
1798  If set, it will override the `tools_strict` parameter set during component initialization.
1799  
1800  **Returns**:
1801  
1802  A dictionary with the following key:
1803  - `replies`: A list containing the generated responses as ChatMessage instances.
1804  
1805  <a id="chat/openai_responses.OpenAIResponsesChatGenerator.run_async"></a>
1806  
1807  #### OpenAIResponsesChatGenerator.run\_async
1808  
1809  ```python
1810  @component.output_types(replies=list[ChatMessage])
1811  async def run_async(
1812          messages: list[ChatMessage],
1813          *,
1814          streaming_callback: StreamingCallbackT | None = None,
1815          generation_kwargs: dict[str, Any] | None = None,
1816          tools: ToolsType | list[dict] | None = None,
1817          tools_strict: bool | None = None) -> dict[str, list[ChatMessage]]
1818  ```
1819  
1820  Asynchronously invokes response generation based on the provided messages and generation parameters.
1821  
1822  This is the asynchronous version of the `run` method. It has the same parameters and return values
1823  but can be used with `await` in async code.
1824  
1825  **Arguments**:
1826  
1827  - `messages`: A list of ChatMessage instances representing the input messages.
1828  - `streaming_callback`: A callback function that is called when a new token is received from the stream.
1829  Must be a coroutine.
1830  - `generation_kwargs`: Additional keyword arguments for text generation. These parameters will
1831  override the parameters passed during component initialization.
1832  For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create).
1833  - `tools`: A list of tools or a Toolset for which the model can prepare calls. If set, it will override the
1834  `tools` parameter set during component initialization. This parameter can accept either a list of
1835  mixed list of Haystack `Tool` objects and Haystack `Toolset`. Or you can pass a dictionary of
1836  OpenAI/MCP tool definitions.
1837  Note: You cannot pass OpenAI/MCP tools and Haystack tools together.
1838  - `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
1839  the schema provided in the `parameters` field of the tool definition, but this may increase latency.
1840  If set, it will override the `tools_strict` parameter set during component initialization.
1841  
1842  **Returns**:
1843  
1844  A dictionary with the following key:
1845  - `replies`: A list containing the generated responses as ChatMessage instances.
1846  
1847  <a id="hugging_face_api"></a>
1848  
1849  ## Module hugging\_face\_api
1850  
1851  <a id="hugging_face_api.HuggingFaceAPIGenerator"></a>
1852  
1853  ### HuggingFaceAPIGenerator
1854  
1855  Generates text using Hugging Face APIs.
1856  
1857  Use it with the following Hugging Face APIs:
1858  - [Paid Inference Endpoints](https://huggingface.co/inference-endpoints)
1859  - [Self-hosted Text Generation Inference](https://github.com/huggingface/text-generation-inference)
1860  
1861  **Note:** As of July 2025, the Hugging Face Inference API no longer offers generative models through the
1862  `text_generation` endpoint. Generative models are now only available through providers supporting the
1863  `chat_completion` endpoint. As a result, this component might no longer work with the Hugging Face Inference API.
1864  Use the `HuggingFaceAPIChatGenerator` component, which supports the `chat_completion` endpoint.
1865  
1866  ### Usage examples
1867  
1868  #### With Hugging Face Inference Endpoints
1869  
1870  
1871  #### With self-hosted text generation inference
1872  
1873  #### With the free serverless inference API
1874  
1875  Be aware that this example might not work as the Hugging Face Inference API no longer offer models that support the
1876  `text_generation` endpoint. Use the `HuggingFaceAPIChatGenerator` for generative models through the
1877  `chat_completion` endpoint.
1878  
1879  ```python
1880  from haystack.components.generators import HuggingFaceAPIGenerator
1881  from haystack.utils import Secret
1882  
1883  generator = HuggingFaceAPIGenerator(api_type="inference_endpoints",
1884                                      api_params={"url": "<your-inference-endpoint-url>"},
1885                                      token=Secret.from_token("<your-api-key>"))
1886  
1887  result = generator.run(prompt="What's Natural Language Processing?")
1888  print(result)
1889  ```
1890  ```python
1891  from haystack.components.generators import HuggingFaceAPIGenerator
1892  
1893  generator = HuggingFaceAPIGenerator(api_type="text_generation_inference",
1894                                      api_params={"url": "http://localhost:8080"})
1895  
1896  result = generator.run(prompt="What's Natural Language Processing?")
1897  print(result)
1898  ```
1899  ```python
1900  from haystack.components.generators import HuggingFaceAPIGenerator
1901  from haystack.utils import Secret
1902  
1903  generator = HuggingFaceAPIGenerator(api_type="serverless_inference_api",
1904                                      api_params={"model": "HuggingFaceH4/zephyr-7b-beta"},
1905                                      token=Secret.from_token("<your-api-key>"))
1906  
1907  result = generator.run(prompt="What's Natural Language Processing?")
1908  print(result)
1909  ```
1910  
1911  <a id="hugging_face_api.HuggingFaceAPIGenerator.__init__"></a>
1912  
1913  #### HuggingFaceAPIGenerator.\_\_init\_\_
1914  
1915  ```python
1916  def __init__(api_type: HFGenerationAPIType | str,
1917               api_params: dict[str, str],
1918               token: Secret | None = Secret.from_env_var(
1919                   ["HF_API_TOKEN", "HF_TOKEN"], strict=False),
1920               generation_kwargs: dict[str, Any] | None = None,
1921               stop_words: list[str] | None = None,
1922               streaming_callback: StreamingCallbackT | None = None)
1923  ```
1924  
1925  Initialize the HuggingFaceAPIGenerator instance.
1926  
1927  **Arguments**:
1928  
1929  - `api_type`: The type of Hugging Face API to use. Available types:
1930  - `text_generation_inference`: See [TGI](https://github.com/huggingface/text-generation-inference).
1931  - `inference_endpoints`: See [Inference Endpoints](https://huggingface.co/inference-endpoints).
1932  - `serverless_inference_api`: See [Serverless Inference API](https://huggingface.co/inference-api).
1933    This might no longer work due to changes in the models offered in the Hugging Face Inference API.
1934    Please use the `HuggingFaceAPIChatGenerator` component instead.
1935  - `api_params`: A dictionary with the following keys:
1936  - `model`: Hugging Face model ID. Required when `api_type` is `SERVERLESS_INFERENCE_API`.
1937  - `url`: URL of the inference endpoint. Required when `api_type` is `INFERENCE_ENDPOINTS` or
1938  `TEXT_GENERATION_INFERENCE`.
1939  - Other parameters specific to the chosen API type, such as `timeout`, `headers`, `provider` etc.
1940  - `token`: The Hugging Face token to use as HTTP bearer authorization.
1941  Check your HF token in your [account settings](https://huggingface.co/settings/tokens).
1942  - `generation_kwargs`: A dictionary with keyword arguments to customize text generation. Some examples: `max_new_tokens`,
1943  `temperature`, `top_k`, `top_p`.
1944  For details, see [Hugging Face documentation](https://huggingface.co/docs/huggingface_hub/en/package_reference/inference_client#huggingface_hub.InferenceClient.text_generation)
1945  for more information.
1946  - `stop_words`: An optional list of strings representing the stop words.
1947  - `streaming_callback`: An optional callable for handling streaming responses.
1948  
1949  <a id="hugging_face_api.HuggingFaceAPIGenerator.to_dict"></a>
1950  
1951  #### HuggingFaceAPIGenerator.to\_dict
1952  
1953  ```python
1954  def to_dict() -> dict[str, Any]
1955  ```
1956  
1957  Serialize this component to a dictionary.
1958  
1959  **Returns**:
1960  
1961  A dictionary containing the serialized component.
1962  
1963  <a id="hugging_face_api.HuggingFaceAPIGenerator.from_dict"></a>
1964  
1965  #### HuggingFaceAPIGenerator.from\_dict
1966  
1967  ```python
1968  @classmethod
1969  def from_dict(cls, data: dict[str, Any]) -> "HuggingFaceAPIGenerator"
1970  ```
1971  
1972  Deserialize this component from a dictionary.
1973  
1974  <a id="hugging_face_api.HuggingFaceAPIGenerator.run"></a>
1975  
1976  #### HuggingFaceAPIGenerator.run
1977  
1978  ```python
1979  @component.output_types(replies=list[str], meta=list[dict[str, Any]])
1980  def run(prompt: str,
1981          streaming_callback: StreamingCallbackT | None = None,
1982          generation_kwargs: dict[str, Any] | None = None)
1983  ```
1984  
1985  Invoke the text generation inference for the given prompt and generation parameters.
1986  
1987  **Arguments**:
1988  
1989  - `prompt`: A string representing the prompt.
1990  - `streaming_callback`: A callback function that is called when a new token is received from the stream.
1991  - `generation_kwargs`: Additional keyword arguments for text generation.
1992  
1993  **Returns**:
1994  
1995  A dictionary with the generated replies and metadata. Both are lists of length n.
1996  - replies: A list of strings representing the generated replies.
1997  
1998  <a id="hugging_face_local"></a>
1999  
2000  ## Module hugging\_face\_local
2001  
2002  <a id="hugging_face_local.HuggingFaceLocalGenerator"></a>
2003  
2004  ### HuggingFaceLocalGenerator
2005  
2006  Generates text using models from Hugging Face that run locally.
2007  
2008  LLMs running locally may need powerful hardware.
2009  
2010  ### Usage example
2011  
2012  ```python
2013  from haystack.components.generators import HuggingFaceLocalGenerator
2014  
2015  generator = HuggingFaceLocalGenerator(
2016      model="google/flan-t5-large",
2017      task="text2text-generation",
2018      generation_kwargs={"max_new_tokens": 100, "temperature": 0.9})
2019  
2020  generator.warm_up()
2021  
2022  print(generator.run("Who is the best American actor?"))
2023  # {'replies': ['John Cusack']}
2024  ```
2025  
2026  <a id="hugging_face_local.HuggingFaceLocalGenerator.__init__"></a>
2027  
2028  #### HuggingFaceLocalGenerator.\_\_init\_\_
2029  
2030  ```python
2031  def __init__(model: str = "google/flan-t5-base",
2032               task: Literal["text-generation", "text2text-generation"]
2033               | None = None,
2034               device: ComponentDevice | None = None,
2035               token: Secret | None = Secret.from_env_var(
2036                   ["HF_API_TOKEN", "HF_TOKEN"], strict=False),
2037               generation_kwargs: dict[str, Any] | None = None,
2038               huggingface_pipeline_kwargs: dict[str, Any] | None = None,
2039               stop_words: list[str] | None = None,
2040               streaming_callback: StreamingCallbackT | None = None)
2041  ```
2042  
2043  Creates an instance of a HuggingFaceLocalGenerator.
2044  
2045  **Arguments**:
2046  
2047  - `model`: The Hugging Face text generation model name or path.
2048  - `task`: The task for the Hugging Face pipeline. Possible options:
2049  - `text-generation`: Supported by decoder models, like GPT.
2050  - `text2text-generation`: Supported by encoder-decoder models, like T5.
2051  If the task is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
2052  If not specified, the component calls the Hugging Face API to infer the task from the model name.
2053  - `device`: The device for loading the model. If `None`, automatically selects the default device.
2054  If a device or device map is specified in `huggingface_pipeline_kwargs`, it overrides this parameter.
2055  - `token`: The token to use as HTTP bearer authorization for remote files.
2056  If the token is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
2057  - `generation_kwargs`: A dictionary with keyword arguments to customize text generation.
2058  Some examples: `max_length`, `max_new_tokens`, `temperature`, `top_k`, `top_p`.
2059  See Hugging Face's documentation for more information:
2060  - [customize-text-generation](https://huggingface.co/docs/transformers/main/en/generation_strategies#customize-text-generation)
2061  - [transformers.GenerationConfig](https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.GenerationConfig)
2062  - `huggingface_pipeline_kwargs`: Dictionary with keyword arguments to initialize the
2063  Hugging Face pipeline for text generation.
2064  These keyword arguments provide fine-grained control over the Hugging Face pipeline.
2065  In case of duplication, these kwargs override `model`, `task`, `device`, and `token` init parameters.
2066  For available kwargs, see [Hugging Face documentation](https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.pipeline.task).
2067  In this dictionary, you can also include `model_kwargs` to specify the kwargs for model initialization:
2068  [transformers.PreTrainedModel.from_pretrained](https://huggingface.co/docs/transformers/en/main_classes/model#transformers.PreTrainedModel.from_pretrained)
2069  - `stop_words`: If the model generates a stop word, the generation stops.
2070  If you provide this parameter, don't specify the `stopping_criteria` in `generation_kwargs`.
2071  For some chat models, the output includes both the new text and the original prompt.
2072  In these cases, make sure your prompt has no stop words.
2073  - `streaming_callback`: An optional callable for handling streaming responses.
2074  
2075  <a id="hugging_face_local.HuggingFaceLocalGenerator.warm_up"></a>
2076  
2077  #### HuggingFaceLocalGenerator.warm\_up
2078  
2079  ```python
2080  def warm_up()
2081  ```
2082  
2083  Initializes the component.
2084  
2085  <a id="hugging_face_local.HuggingFaceLocalGenerator.to_dict"></a>
2086  
2087  #### HuggingFaceLocalGenerator.to\_dict
2088  
2089  ```python
2090  def to_dict() -> dict[str, Any]
2091  ```
2092  
2093  Serializes the component to a dictionary.
2094  
2095  **Returns**:
2096  
2097  Dictionary with serialized data.
2098  
2099  <a id="hugging_face_local.HuggingFaceLocalGenerator.from_dict"></a>
2100  
2101  #### HuggingFaceLocalGenerator.from\_dict
2102  
2103  ```python
2104  @classmethod
2105  def from_dict(cls, data: dict[str, Any]) -> "HuggingFaceLocalGenerator"
2106  ```
2107  
2108  Deserializes the component from a dictionary.
2109  
2110  **Arguments**:
2111  
2112  - `data`: The dictionary to deserialize from.
2113  
2114  **Returns**:
2115  
2116  The deserialized component.
2117  
2118  <a id="hugging_face_local.HuggingFaceLocalGenerator.run"></a>
2119  
2120  #### HuggingFaceLocalGenerator.run
2121  
2122  ```python
2123  @component.output_types(replies=list[str])
2124  def run(prompt: str,
2125          streaming_callback: StreamingCallbackT | None = None,
2126          generation_kwargs: dict[str, Any] | None = None)
2127  ```
2128  
2129  Run the text generation model on the given prompt.
2130  
2131  **Arguments**:
2132  
2133  - `prompt`: A string representing the prompt.
2134  - `streaming_callback`: A callback function that is called when a new token is received from the stream.
2135  - `generation_kwargs`: Additional keyword arguments for text generation.
2136  
2137  **Returns**:
2138  
2139  A dictionary containing the generated replies.
2140  - replies: A list of strings representing the generated replies.
2141  
2142  <a id="openai"></a>
2143  
2144  ## Module openai
2145  
2146  <a id="openai.OpenAIGenerator"></a>
2147  
2148  ### OpenAIGenerator
2149  
2150  Generates text using OpenAI's large language models (LLMs).
2151  
2152  It works with the gpt-4 and gpt-5 series models and supports streaming responses
2153  from OpenAI API. It uses strings as input and output.
2154  
2155  You can customize how the text is generated by passing parameters to the
2156  OpenAI API. Use the `**generation_kwargs` argument when you initialize
2157  the component or when you run it. Any parameter that works with
2158  `openai.ChatCompletion.create` will work here too.
2159  
2160  
2161  For details on OpenAI API parameters, see
2162  [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).
2163  
2164  ### Usage example
2165  
2166  ```python
2167  from haystack.components.generators import OpenAIGenerator
2168  client = OpenAIGenerator()
2169  response = client.run("What's Natural Language Processing? Be brief.")
2170  print(response)
2171  
2172  >> {'replies': ['Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on
2173  >> the interaction between computers and human language. It involves enabling computers to understand, interpret,
2174  >> and respond to natural human language in a way that is both meaningful and useful.'], 'meta': [{'model':
2175  >> 'gpt-5-mini', 'index': 0, 'finish_reason': 'stop', 'usage': {'prompt_tokens': 16,
2176  >> 'completion_tokens': 49, 'total_tokens': 65}}]}
2177  ```
2178  
2179  <a id="openai.OpenAIGenerator.__init__"></a>
2180  
2181  #### OpenAIGenerator.\_\_init\_\_
2182  
2183  ```python
2184  def __init__(api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
2185               model: str = "gpt-5-mini",
2186               streaming_callback: StreamingCallbackT | None = None,
2187               api_base_url: str | None = None,
2188               organization: str | None = None,
2189               system_prompt: str | None = None,
2190               generation_kwargs: dict[str, Any] | None = None,
2191               timeout: float | None = None,
2192               max_retries: int | None = None,
2193               http_client_kwargs: dict[str, Any] | None = None)
2194  ```
2195  
2196  Creates an instance of OpenAIGenerator. Unless specified otherwise in `model`, uses OpenAI's gpt-5-mini
2197  
2198  By setting the 'OPENAI_TIMEOUT' and 'OPENAI_MAX_RETRIES' you can change the timeout and max_retries parameters
2199  in the OpenAI client.
2200  
2201  **Arguments**:
2202  
2203  - `api_key`: The OpenAI API key to connect to OpenAI.
2204  - `model`: The name of the model to use.
2205  - `streaming_callback`: A callback function that is called when a new token is received from the stream.
2206  The callback function accepts StreamingChunk as an argument.
2207  - `api_base_url`: An optional base URL.
2208  - `organization`: The Organization ID, defaults to `None`.
2209  - `system_prompt`: The system prompt to use for text generation. If not provided, the system prompt is
2210  omitted, and the default system prompt of the model is used.
2211  - `generation_kwargs`: Other parameters to use for the model. These parameters are all sent directly to
2212  the OpenAI endpoint. See OpenAI [documentation](https://platform.openai.com/docs/api-reference/chat) for
2213  more details.
2214  Some of the supported parameters:
2215  - `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion,
2216      including visible output tokens and reasoning tokens.
2217  - `temperature`: What sampling temperature to use. Higher values mean the model will take more risks.
2218      Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
2219  - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
2220      considers the results of the tokens with top_p probability mass. So, 0.1 means only the tokens
2221      comprising the top 10% probability mass are considered.
2222  - `n`: How many completions to generate for each prompt. For example, if the LLM gets 3 prompts and n is 2,
2223      it will generate two completions for each of the three prompts, ending up with 6 completions in total.
2224  - `stop`: One or more sequences after which the LLM should stop generating tokens.
2225  - `presence_penalty`: What penalty to apply if a token is already present at all. Bigger values mean
2226      the model will be less likely to repeat the same token in the text.
2227  - `frequency_penalty`: What penalty to apply if a token has already been generated in the text.
2228      Bigger values mean the model will be less likely to repeat the same token in the text.
2229  - `logit_bias`: Add a logit bias to specific tokens. The keys of the dictionary are tokens, and the
2230      values are the bias to add to that token.
2231  - `timeout`: Timeout for OpenAI Client calls, if not set it is inferred from the `OPENAI_TIMEOUT` environment variable
2232  or set to 30.
2233  - `max_retries`: Maximum retries to establish contact with OpenAI if it returns an internal error, if not set it is inferred
2234  from the `OPENAI_MAX_RETRIES` environment variable or set to 5.
2235  - `http_client_kwargs`: A dictionary of keyword arguments to configure a custom `httpx.Client`or `httpx.AsyncClient`.
2236  For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/`client`).
2237  
2238  <a id="openai.OpenAIGenerator.to_dict"></a>
2239  
2240  #### OpenAIGenerator.to\_dict
2241  
2242  ```python
2243  def to_dict() -> dict[str, Any]
2244  ```
2245  
2246  Serialize this component to a dictionary.
2247  
2248  **Returns**:
2249  
2250  The serialized component as a dictionary.
2251  
2252  <a id="openai.OpenAIGenerator.from_dict"></a>
2253  
2254  #### OpenAIGenerator.from\_dict
2255  
2256  ```python
2257  @classmethod
2258  def from_dict(cls, data: dict[str, Any]) -> "OpenAIGenerator"
2259  ```
2260  
2261  Deserialize this component from a dictionary.
2262  
2263  **Arguments**:
2264  
2265  - `data`: The dictionary representation of this component.
2266  
2267  **Returns**:
2268  
2269  The deserialized component instance.
2270  
2271  <a id="openai.OpenAIGenerator.run"></a>
2272  
2273  #### OpenAIGenerator.run
2274  
2275  ```python
2276  @component.output_types(replies=list[str], meta=list[dict[str, Any]])
2277  def run(
2278      prompt: str,
2279      system_prompt: str | None = None,
2280      streaming_callback: StreamingCallbackT | None = None,
2281      generation_kwargs: dict[str, Any] | None = None
2282  ) -> dict[str, list[str] | list[dict[str, Any]]]
2283  ```
2284  
2285  Invoke the text generation inference based on the provided messages and generation parameters.
2286  
2287  **Arguments**:
2288  
2289  - `prompt`: The string prompt to use for text generation.
2290  - `system_prompt`: The system prompt to use for text generation. If this run time system prompt is omitted, the system
2291  prompt, if defined at initialisation time, is used.
2292  - `streaming_callback`: A callback function that is called when a new token is received from the stream.
2293  - `generation_kwargs`: Additional keyword arguments for text generation. These parameters will potentially override the parameters
2294  passed in the `__init__` method. For more details on the parameters supported by the OpenAI API, refer to
2295  the OpenAI [documentation](https://platform.openai.com/docs/api-reference/chat/create).
2296  
2297  **Returns**:
2298  
2299  A list of strings containing the generated responses and a list of dictionaries containing the metadata
2300  for each response.
2301  
2302  <a id="openai_dalle"></a>
2303  
2304  ## Module openai\_dalle
2305  
2306  <a id="openai_dalle.DALLEImageGenerator"></a>
2307  
2308  ### DALLEImageGenerator
2309  
2310  Generates images using OpenAI's DALL-E model.
2311  
2312  For details on OpenAI API parameters, see
2313  [OpenAI documentation](https://platform.openai.com/docs/api-reference/images/create).
2314  
2315  ### Usage example
2316  
2317  ```python
2318  from haystack.components.generators import DALLEImageGenerator
2319  image_generator = DALLEImageGenerator()
2320  response = image_generator.run("Show me a picture of a black cat.")
2321  print(response)
2322  ```
2323  
2324  <a id="openai_dalle.DALLEImageGenerator.__init__"></a>
2325  
2326  #### DALLEImageGenerator.\_\_init\_\_
2327  
2328  ```python
2329  def __init__(model: str = "dall-e-3",
2330               quality: Literal["standard", "hd"] = "standard",
2331               size: Literal["256x256", "512x512", "1024x1024", "1792x1024",
2332                             "1024x1792"] = "1024x1024",
2333               response_format: Literal["url", "b64_json"] = "url",
2334               api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
2335               api_base_url: str | None = None,
2336               organization: str | None = None,
2337               timeout: float | None = None,
2338               max_retries: int | None = None,
2339               http_client_kwargs: dict[str, Any] | None = None)
2340  ```
2341  
2342  Creates an instance of DALLEImageGenerator. Unless specified otherwise in `model`, uses OpenAI's dall-e-3.
2343  
2344  **Arguments**:
2345  
2346  - `model`: The model to use for image generation. Can be "dall-e-2" or "dall-e-3".
2347  - `quality`: The quality of the generated image. Can be "standard" or "hd".
2348  - `size`: The size of the generated images.
2349  Must be one of 256x256, 512x512, or 1024x1024 for dall-e-2.
2350  Must be one of 1024x1024, 1792x1024, or 1024x1792 for dall-e-3 models.
2351  - `response_format`: The format of the response. Can be "url" or "b64_json".
2352  - `api_key`: The OpenAI API key to connect to OpenAI.
2353  - `api_base_url`: An optional base URL.
2354  - `organization`: The Organization ID, defaults to `None`.
2355  - `timeout`: Timeout for OpenAI Client calls. If not set, it is inferred from the `OPENAI_TIMEOUT` environment variable
2356  or set to 30.
2357  - `max_retries`: Maximum retries to establish contact with OpenAI if it returns an internal error. If not set, it is inferred
2358  from the `OPENAI_MAX_RETRIES` environment variable or set to 5.
2359  - `http_client_kwargs`: A dictionary of keyword arguments to configure a custom `httpx.Client`or `httpx.AsyncClient`.
2360  For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/`client`).
2361  
2362  <a id="openai_dalle.DALLEImageGenerator.warm_up"></a>
2363  
2364  #### DALLEImageGenerator.warm\_up
2365  
2366  ```python
2367  def warm_up() -> None
2368  ```
2369  
2370  Warm up the OpenAI client.
2371  
2372  <a id="openai_dalle.DALLEImageGenerator.run"></a>
2373  
2374  #### DALLEImageGenerator.run
2375  
2376  ```python
2377  @component.output_types(images=list[str], revised_prompt=str)
2378  def run(prompt: str,
2379          size: Literal["256x256", "512x512", "1024x1024", "1792x1024",
2380                        "1024x1792"] | None = None,
2381          quality: Literal["standard", "hd"] | None = None,
2382          response_format: Literal["url", "b64_json"] | None = None)
2383  ```
2384  
2385  Invokes the image generation inference based on the provided prompt and generation parameters.
2386  
2387  **Arguments**:
2388  
2389  - `prompt`: The prompt to generate the image.
2390  - `size`: If provided, overrides the size provided during initialization.
2391  - `quality`: If provided, overrides the quality provided during initialization.
2392  - `response_format`: If provided, overrides the response format provided during initialization.
2393  
2394  **Returns**:
2395  
2396  A dictionary containing the generated list of images and the revised prompt.
2397  Depending on the `response_format` parameter, the list of images can be URLs or base64 encoded JSON strings.
2398  The revised prompt is the prompt that was used to generate the image, if there was any revision
2399  to the prompt made by OpenAI.
2400  
2401  <a id="openai_dalle.DALLEImageGenerator.to_dict"></a>
2402  
2403  #### DALLEImageGenerator.to\_dict
2404  
2405  ```python
2406  def to_dict() -> dict[str, Any]
2407  ```
2408  
2409  Serialize this component to a dictionary.
2410  
2411  **Returns**:
2412  
2413  The serialized component as a dictionary.
2414  
2415  <a id="openai_dalle.DALLEImageGenerator.from_dict"></a>
2416  
2417  #### DALLEImageGenerator.from\_dict
2418  
2419  ```python
2420  @classmethod
2421  def from_dict(cls, data: dict[str, Any]) -> "DALLEImageGenerator"
2422  ```
2423  
2424  Deserialize this component from a dictionary.
2425  
2426  **Arguments**:
2427  
2428  - `data`: The dictionary representation of this component.
2429  
2430  **Returns**:
2431  
2432  The deserialized component instance.
2433  
2434  <a id="utils"></a>
2435  
2436  ## Module utils
2437  
2438  <a id="utils.print_streaming_chunk"></a>
2439  
2440  #### print\_streaming\_chunk
2441  
2442  ```python
2443  def print_streaming_chunk(chunk: StreamingChunk) -> None
2444  ```
2445  
2446  Callback function to handle and display streaming output chunks.
2447  
2448  This function processes a `StreamingChunk` object by:
2449  - Printing tool call metadata (if any), including function names and arguments, as they arrive.
2450  - Printing tool call results when available.
2451  - Printing the main content (e.g., text tokens) of the chunk as it is received.
2452  
2453  The function outputs data directly to stdout and flushes output buffers to ensure immediate display during
2454  streaming.
2455  
2456  **Arguments**:
2457  
2458  - `chunk`: A chunk of streaming data containing content and optional metadata, such as tool calls and
2459  tool results.
2460