generators_api.md
   1  ---
   2  title: "Generators"
   3  id: generators-api
   4  description: "Enables text generation using LLMs."
   5  slug: "/generators-api"
   6  ---
   7  
   8  <a id="azure"></a>
   9  
  10  ## Module azure
  11  
  12  <a id="azure.AzureOpenAIGenerator"></a>
  13  
  14  ### AzureOpenAIGenerator
  15  
  16  Generates text using OpenAI's large language models (LLMs).
  17  
  18  It works with the gpt-4 - type models and supports streaming responses
  19  from OpenAI API.
  20  
  21  You can customize how the text is generated by passing parameters to the
  22  OpenAI API. Use the `**generation_kwargs` argument when you initialize
  23  the component or when you run it. Any parameter that works with
  24  `openai.ChatCompletion.create` will work here too.
  25  
  26  
  27  For details on OpenAI API parameters, see
  28  [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).
  29  
  30  
  31  ### Usage example
  32  
  33  ```python
  34  from haystack.components.generators import AzureOpenAIGenerator
  35  from haystack.utils import Secret
  36  client = AzureOpenAIGenerator(
  37      azure_endpoint="<Your Azure endpoint e.g. `https://your-company.azure.openai.com/>",
  38      api_key=Secret.from_token("<your-api-key>"),
  39      azure_deployment="<this a model name, e.g.  gpt-4o-mini>")
  40  response = client.run("What's Natural Language Processing? Be brief.")
  41  print(response)
  42  ```
  43  
  44  ```
  45  >> {'replies': ['Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on
  46  >> the interaction between computers and human language. It involves enabling computers to understand, interpret,
  47  >> and respond to natural human language in a way that is both meaningful and useful.'], 'meta': [{'model':
  48  >> 'gpt-4o-mini', 'index': 0, 'finish_reason': 'stop', 'usage': {'prompt_tokens': 16,
  49  >> 'completion_tokens': 49, 'total_tokens': 65}}]}
  50  ```
  51  
  52  <a id="azure.AzureOpenAIGenerator.__init__"></a>
  53  
  54  #### AzureOpenAIGenerator.\_\_init\_\_
  55  
  56  ```python
  57  def __init__(azure_endpoint: Optional[str] = None,
  58               api_version: Optional[str] = "2023-05-15",
  59               azure_deployment: Optional[str] = "gpt-4o-mini",
  60               api_key: Optional[Secret] = Secret.from_env_var(
  61                   "AZURE_OPENAI_API_KEY", strict=False),
  62               azure_ad_token: Optional[Secret] = Secret.from_env_var(
  63                   "AZURE_OPENAI_AD_TOKEN", strict=False),
  64               organization: Optional[str] = None,
  65               streaming_callback: Optional[StreamingCallbackT] = None,
  66               system_prompt: Optional[str] = None,
  67               timeout: Optional[float] = None,
  68               max_retries: Optional[int] = None,
  69               http_client_kwargs: Optional[dict[str, Any]] = None,
  70               generation_kwargs: Optional[dict[str, Any]] = None,
  71               default_headers: Optional[dict[str, str]] = None,
  72               *,
  73               azure_ad_token_provider: Optional[AzureADTokenProvider] = None)
  74  ```
  75  
  76  Initialize the Azure OpenAI Generator.
  77  
  78  **Arguments**:
  79  
  80  - `azure_endpoint`: The endpoint of the deployed model, for example `https://example-resource.azure.openai.com/`.
  81  - `api_version`: The version of the API to use. Defaults to 2023-05-15.
  82  - `azure_deployment`: The deployment of the model, usually the model name.
  83  - `api_key`: The API key to use for authentication.
  84  - `azure_ad_token`: [Azure Active Directory token](https://www.microsoft.com/en-us/security/business/identity-access/microsoft-entra-id).
  85  - `organization`: Your organization ID, defaults to `None`. For help, see
  86  [Setting up your organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
  87  - `streaming_callback`: A callback function called when a new token is received from the stream.
  88  It accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
  89  as an argument.
  90  - `system_prompt`: The system prompt to use for text generation. If not provided, the Generator
  91  omits the system prompt and uses the default system prompt.
  92  - `timeout`: Timeout for AzureOpenAI client. If not set, it is inferred from the
  93  `OPENAI_TIMEOUT` environment variable or set to 30.
  94  - `max_retries`: Maximum retries to establish contact with AzureOpenAI if it returns an internal error.
  95  If not set, it is inferred from the `OPENAI_MAX_RETRIES` environment variable or set to 5.
  96  - `http_client_kwargs`: A dictionary of keyword arguments to configure a custom `httpx.Client`or `httpx.AsyncClient`.
  97  For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/`client`).
  98  - `generation_kwargs`: Other parameters to use for the model, sent directly to
  99  the OpenAI endpoint. See [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat) for
 100  more details.
 101  Some of the supported parameters:
 102  - `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion,
 103      including visible output tokens and reasoning tokens.
 104  - `temperature`: The sampling temperature to use. Higher values mean the model takes more risks.
 105      Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
 106  - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
 107      considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
 108      comprising the top 10% probability mass are considered.
 109  - `n`: The number of completions to generate for each prompt. For example, with 3 prompts and n=2,
 110      the LLM will generate two completions per prompt, resulting in 6 completions total.
 111  - `stop`: One or more sequences after which the LLM should stop generating tokens.
 112  - `presence_penalty`: The penalty applied if a token is already present.
 113      Higher values make the model less likely to repeat the token.
 114  - `frequency_penalty`: Penalty applied if a token has already been generated.
 115      Higher values make the model less likely to repeat the token.
 116  - `logit_bias`: Adds a logit bias to specific tokens. The keys of the dictionary are tokens, and the
 117      values are the bias to add to that token.
 118  - `default_headers`: Default headers to use for the AzureOpenAI client.
 119  - `azure_ad_token_provider`: A function that returns an Azure Active Directory token, will be invoked on
 120  every request.
 121  
 122  <a id="azure.AzureOpenAIGenerator.to_dict"></a>
 123  
 124  #### AzureOpenAIGenerator.to\_dict
 125  
 126  ```python
 127  def to_dict() -> dict[str, Any]
 128  ```
 129  
 130  Serialize this component to a dictionary.
 131  
 132  **Returns**:
 133  
 134  The serialized component as a dictionary.
 135  
 136  <a id="azure.AzureOpenAIGenerator.from_dict"></a>
 137  
 138  #### AzureOpenAIGenerator.from\_dict
 139  
 140  ```python
 141  @classmethod
 142  def from_dict(cls, data: dict[str, Any]) -> "AzureOpenAIGenerator"
 143  ```
 144  
 145  Deserialize this component from a dictionary.
 146  
 147  **Arguments**:
 148  
 149  - `data`: The dictionary representation of this component.
 150  
 151  **Returns**:
 152  
 153  The deserialized component instance.
 154  
 155  <a id="azure.AzureOpenAIGenerator.run"></a>
 156  
 157  #### AzureOpenAIGenerator.run
 158  
 159  ```python
 160  @component.output_types(replies=list[str], meta=list[dict[str, Any]])
 161  def run(prompt: str,
 162          system_prompt: Optional[str] = None,
 163          streaming_callback: Optional[StreamingCallbackT] = None,
 164          generation_kwargs: Optional[dict[str, Any]] = None)
 165  ```
 166  
 167  Invoke the text generation inference based on the provided messages and generation parameters.
 168  
 169  **Arguments**:
 170  
 171  - `prompt`: The string prompt to use for text generation.
 172  - `system_prompt`: The system prompt to use for text generation. If this run time system prompt is omitted, the system
 173  prompt, if defined at initialisation time, is used.
 174  - `streaming_callback`: A callback function that is called when a new token is received from the stream.
 175  - `generation_kwargs`: Additional keyword arguments for text generation. These parameters will potentially override the parameters
 176  passed in the `__init__` method. For more details on the parameters supported by the OpenAI API, refer to
 177  the OpenAI [documentation](https://platform.openai.com/docs/api-reference/chat/create).
 178  
 179  **Returns**:
 180  
 181  A list of strings containing the generated responses and a list of dictionaries containing the metadata
 182  for each response.
 183  
 184  <a id="hugging_face_local"></a>
 185  
 186  ## Module hugging\_face\_local
 187  
 188  <a id="hugging_face_local.HuggingFaceLocalGenerator"></a>
 189  
 190  ### HuggingFaceLocalGenerator
 191  
 192  Generates text using models from Hugging Face that run locally.
 193  
 194  LLMs running locally may need powerful hardware.
 195  
 196  ### Usage example
 197  
 198  ```python
 199  from haystack.components.generators import HuggingFaceLocalGenerator
 200  
 201  generator = HuggingFaceLocalGenerator(
 202      model="google/flan-t5-large",
 203      task="text2text-generation",
 204      generation_kwargs={"max_new_tokens": 100, "temperature": 0.9})
 205  
 206  generator.warm_up()
 207  
 208  print(generator.run("Who is the best American actor?"))
 209  # {'replies': ['John Cusack']}
 210  ```
 211  
 212  <a id="hugging_face_local.HuggingFaceLocalGenerator.__init__"></a>
 213  
 214  #### HuggingFaceLocalGenerator.\_\_init\_\_
 215  
 216  ```python
 217  def __init__(model: str = "google/flan-t5-base",
 218               task: Optional[Literal["text-generation",
 219                                      "text2text-generation"]] = None,
 220               device: Optional[ComponentDevice] = None,
 221               token: Optional[Secret] = Secret.from_env_var(
 222                   ["HF_API_TOKEN", "HF_TOKEN"], strict=False),
 223               generation_kwargs: Optional[dict[str, Any]] = None,
 224               huggingface_pipeline_kwargs: Optional[dict[str, Any]] = None,
 225               stop_words: Optional[list[str]] = None,
 226               streaming_callback: Optional[StreamingCallbackT] = None)
 227  ```
 228  
 229  Creates an instance of a HuggingFaceLocalGenerator.
 230  
 231  **Arguments**:
 232  
 233  - `model`: The Hugging Face text generation model name or path.
 234  - `task`: The task for the Hugging Face pipeline. Possible options:
 235  - `text-generation`: Supported by decoder models, like GPT.
 236  - `text2text-generation`: Supported by encoder-decoder models, like T5.
 237  If the task is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
 238  If not specified, the component calls the Hugging Face API to infer the task from the model name.
 239  - `device`: The device for loading the model. If `None`, automatically selects the default device.
 240  If a device or device map is specified in `huggingface_pipeline_kwargs`, it overrides this parameter.
 241  - `token`: The token to use as HTTP bearer authorization for remote files.
 242  If the token is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
 243  - `generation_kwargs`: A dictionary with keyword arguments to customize text generation.
 244  Some examples: `max_length`, `max_new_tokens`, `temperature`, `top_k`, `top_p`.
 245  See Hugging Face's documentation for more information:
 246  - [customize-text-generation](https://huggingface.co/docs/transformers/main/en/generation_strategies#customize-text-generation)
 247  - [transformers.GenerationConfig](https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.GenerationConfig)
 248  - `huggingface_pipeline_kwargs`: Dictionary with keyword arguments to initialize the
 249  Hugging Face pipeline for text generation.
 250  These keyword arguments provide fine-grained control over the Hugging Face pipeline.
 251  In case of duplication, these kwargs override `model`, `task`, `device`, and `token` init parameters.
 252  For available kwargs, see [Hugging Face documentation](https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.pipeline.task).
 253  In this dictionary, you can also include `model_kwargs` to specify the kwargs for model initialization:
 254  [transformers.PreTrainedModel.from_pretrained](https://huggingface.co/docs/transformers/en/main_classes/model#transformers.PreTrainedModel.from_pretrained)
 255  - `stop_words`: If the model generates a stop word, the generation stops.
 256  If you provide this parameter, don't specify the `stopping_criteria` in `generation_kwargs`.
 257  For some chat models, the output includes both the new text and the original prompt.
 258  In these cases, make sure your prompt has no stop words.
 259  - `streaming_callback`: An optional callable for handling streaming responses.
 260  
 261  <a id="hugging_face_local.HuggingFaceLocalGenerator.warm_up"></a>
 262  
 263  #### HuggingFaceLocalGenerator.warm\_up
 264  
 265  ```python
 266  def warm_up()
 267  ```
 268  
 269  Initializes the component.
 270  
 271  <a id="hugging_face_local.HuggingFaceLocalGenerator.to_dict"></a>
 272  
 273  #### HuggingFaceLocalGenerator.to\_dict
 274  
 275  ```python
 276  def to_dict() -> dict[str, Any]
 277  ```
 278  
 279  Serializes the component to a dictionary.
 280  
 281  **Returns**:
 282  
 283  Dictionary with serialized data.
 284  
 285  <a id="hugging_face_local.HuggingFaceLocalGenerator.from_dict"></a>
 286  
 287  #### HuggingFaceLocalGenerator.from\_dict
 288  
 289  ```python
 290  @classmethod
 291  def from_dict(cls, data: dict[str, Any]) -> "HuggingFaceLocalGenerator"
 292  ```
 293  
 294  Deserializes the component from a dictionary.
 295  
 296  **Arguments**:
 297  
 298  - `data`: The dictionary to deserialize from.
 299  
 300  **Returns**:
 301  
 302  The deserialized component.
 303  
 304  <a id="hugging_face_local.HuggingFaceLocalGenerator.run"></a>
 305  
 306  #### HuggingFaceLocalGenerator.run
 307  
 308  ```python
 309  @component.output_types(replies=list[str])
 310  def run(prompt: str,
 311          streaming_callback: Optional[StreamingCallbackT] = None,
 312          generation_kwargs: Optional[dict[str, Any]] = None)
 313  ```
 314  
 315  Run the text generation model on the given prompt.
 316  
 317  **Arguments**:
 318  
 319  - `prompt`: A string representing the prompt.
 320  - `streaming_callback`: A callback function that is called when a new token is received from the stream.
 321  - `generation_kwargs`: Additional keyword arguments for text generation.
 322  
 323  **Returns**:
 324  
 325  A dictionary containing the generated replies.
 326  - replies: A list of strings representing the generated replies.
 327  
 328  <a id="hugging_face_api"></a>
 329  
 330  ## Module hugging\_face\_api
 331  
 332  <a id="hugging_face_api.HuggingFaceAPIGenerator"></a>
 333  
 334  ### HuggingFaceAPIGenerator
 335  
 336  Generates text using Hugging Face APIs.
 337  
 338  Use it with the following Hugging Face APIs:
 339  - [Paid Inference Endpoints](https://huggingface.co/inference-endpoints)
 340  - [Self-hosted Text Generation Inference](https://github.com/huggingface/text-generation-inference)
 341  
 342  **Note:** As of July 2025, the Hugging Face Inference API no longer offers generative models through the
 343  `text_generation` endpoint. Generative models are now only available through providers supporting the
 344  `chat_completion` endpoint. As a result, this component might no longer work with the Hugging Face Inference API.
 345  Use the `HuggingFaceAPIChatGenerator` component, which supports the `chat_completion` endpoint.
 346  
 347  ### Usage examples
 348  
 349  #### With Hugging Face Inference Endpoints
 350  
 351  
 352  #### With self-hosted text generation inference
 353  
 354  #### With the free serverless inference API
 355  
 356  Be aware that this example might not work as the Hugging Face Inference API no longer offer models that support the
 357  `text_generation` endpoint. Use the `HuggingFaceAPIChatGenerator` for generative models through the
 358  `chat_completion` endpoint.
 359  
 360  ```python
 361  from haystack.components.generators import HuggingFaceAPIGenerator
 362  from haystack.utils import Secret
 363  
 364  generator = HuggingFaceAPIGenerator(api_type="inference_endpoints",
 365                                      api_params={"url": "<your-inference-endpoint-url>"},
 366                                      token=Secret.from_token("<your-api-key>"))
 367  
 368  result = generator.run(prompt="What's Natural Language Processing?")
 369  print(result)
 370  ```
 371  ```python
 372  from haystack.components.generators import HuggingFaceAPIGenerator
 373  
 374  generator = HuggingFaceAPIGenerator(api_type="text_generation_inference",
 375                                      api_params={"url": "http://localhost:8080"})
 376  
 377  result = generator.run(prompt="What's Natural Language Processing?")
 378  print(result)
 379  ```
 380  ```python
 381  from haystack.components.generators import HuggingFaceAPIGenerator
 382  from haystack.utils import Secret
 383  
 384  generator = HuggingFaceAPIGenerator(api_type="serverless_inference_api",
 385                                      api_params={"model": "HuggingFaceH4/zephyr-7b-beta"},
 386                                      token=Secret.from_token("<your-api-key>"))
 387  
 388  result = generator.run(prompt="What's Natural Language Processing?")
 389  print(result)
 390  ```
 391  
 392  <a id="hugging_face_api.HuggingFaceAPIGenerator.__init__"></a>
 393  
 394  #### HuggingFaceAPIGenerator.\_\_init\_\_
 395  
 396  ```python
 397  def __init__(api_type: Union[HFGenerationAPIType, str],
 398               api_params: dict[str, str],
 399               token: Optional[Secret] = Secret.from_env_var(
 400                   ["HF_API_TOKEN", "HF_TOKEN"], strict=False),
 401               generation_kwargs: Optional[dict[str, Any]] = None,
 402               stop_words: Optional[list[str]] = None,
 403               streaming_callback: Optional[StreamingCallbackT] = None)
 404  ```
 405  
 406  Initialize the HuggingFaceAPIGenerator instance.
 407  
 408  **Arguments**:
 409  
 410  - `api_type`: The type of Hugging Face API to use. Available types:
 411  - `text_generation_inference`: See [TGI](https://github.com/huggingface/text-generation-inference).
 412  - `inference_endpoints`: See [Inference Endpoints](https://huggingface.co/inference-endpoints).
 413  - `serverless_inference_api`: See [Serverless Inference API](https://huggingface.co/inference-api).
 414    This might no longer work due to changes in the models offered in the Hugging Face Inference API.
 415    Please use the `HuggingFaceAPIChatGenerator` component instead.
 416  - `api_params`: A dictionary with the following keys:
 417  - `model`: Hugging Face model ID. Required when `api_type` is `SERVERLESS_INFERENCE_API`.
 418  - `url`: URL of the inference endpoint. Required when `api_type` is `INFERENCE_ENDPOINTS` or
 419  `TEXT_GENERATION_INFERENCE`.
 420  - Other parameters specific to the chosen API type, such as `timeout`, `headers`, `provider` etc.
 421  - `token`: The Hugging Face token to use as HTTP bearer authorization.
 422  Check your HF token in your [account settings](https://huggingface.co/settings/tokens).
 423  - `generation_kwargs`: A dictionary with keyword arguments to customize text generation. Some examples: `max_new_tokens`,
 424  `temperature`, `top_k`, `top_p`.
 425  For details, see [Hugging Face documentation](https://huggingface.co/docs/huggingface_hub/en/package_reference/inference_client#huggingface_hub.InferenceClient.text_generation)
 426  for more information.
 427  - `stop_words`: An optional list of strings representing the stop words.
 428  - `streaming_callback`: An optional callable for handling streaming responses.
 429  
 430  <a id="hugging_face_api.HuggingFaceAPIGenerator.to_dict"></a>
 431  
 432  #### HuggingFaceAPIGenerator.to\_dict
 433  
 434  ```python
 435  def to_dict() -> dict[str, Any]
 436  ```
 437  
 438  Serialize this component to a dictionary.
 439  
 440  **Returns**:
 441  
 442  A dictionary containing the serialized component.
 443  
 444  <a id="hugging_face_api.HuggingFaceAPIGenerator.from_dict"></a>
 445  
 446  #### HuggingFaceAPIGenerator.from\_dict
 447  
 448  ```python
 449  @classmethod
 450  def from_dict(cls, data: dict[str, Any]) -> "HuggingFaceAPIGenerator"
 451  ```
 452  
 453  Deserialize this component from a dictionary.
 454  
 455  <a id="hugging_face_api.HuggingFaceAPIGenerator.run"></a>
 456  
 457  #### HuggingFaceAPIGenerator.run
 458  
 459  ```python
 460  @component.output_types(replies=list[str], meta=list[dict[str, Any]])
 461  def run(prompt: str,
 462          streaming_callback: Optional[StreamingCallbackT] = None,
 463          generation_kwargs: Optional[dict[str, Any]] = None)
 464  ```
 465  
 466  Invoke the text generation inference for the given prompt and generation parameters.
 467  
 468  **Arguments**:
 469  
 470  - `prompt`: A string representing the prompt.
 471  - `streaming_callback`: A callback function that is called when a new token is received from the stream.
 472  - `generation_kwargs`: Additional keyword arguments for text generation.
 473  
 474  **Returns**:
 475  
 476  A dictionary with the generated replies and metadata. Both are lists of length n.
 477  - replies: A list of strings representing the generated replies.
 478  
 479  <a id="openai"></a>
 480  
 481  ## Module openai
 482  
 483  <a id="openai.OpenAIGenerator"></a>
 484  
 485  ### OpenAIGenerator
 486  
 487  Generates text using OpenAI's large language models (LLMs).
 488  
 489  It works with the gpt-4 and o-series models and supports streaming responses
 490  from OpenAI API. It uses strings as input and output.
 491  
 492  You can customize how the text is generated by passing parameters to the
 493  OpenAI API. Use the `**generation_kwargs` argument when you initialize
 494  the component or when you run it. Any parameter that works with
 495  `openai.ChatCompletion.create` will work here too.
 496  
 497  
 498  For details on OpenAI API parameters, see
 499  [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).
 500  
 501  ### Usage example
 502  
 503  ```python
 504  from haystack.components.generators import OpenAIGenerator
 505  client = OpenAIGenerator()
 506  response = client.run("What's Natural Language Processing? Be brief.")
 507  print(response)
 508  
 509  >> {'replies': ['Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on
 510  >> the interaction between computers and human language. It involves enabling computers to understand, interpret,
 511  >> and respond to natural human language in a way that is both meaningful and useful.'], 'meta': [{'model':
 512  >> 'gpt-4o-mini', 'index': 0, 'finish_reason': 'stop', 'usage': {'prompt_tokens': 16,
 513  >> 'completion_tokens': 49, 'total_tokens': 65}}]}
 514  ```
 515  
 516  <a id="openai.OpenAIGenerator.__init__"></a>
 517  
 518  #### OpenAIGenerator.\_\_init\_\_
 519  
 520  ```python
 521  def __init__(api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
 522               model: str = "gpt-4o-mini",
 523               streaming_callback: Optional[StreamingCallbackT] = None,
 524               api_base_url: Optional[str] = None,
 525               organization: Optional[str] = None,
 526               system_prompt: Optional[str] = None,
 527               generation_kwargs: Optional[dict[str, Any]] = None,
 528               timeout: Optional[float] = None,
 529               max_retries: Optional[int] = None,
 530               http_client_kwargs: Optional[dict[str, Any]] = None)
 531  ```
 532  
 533  Creates an instance of OpenAIGenerator. Unless specified otherwise in `model`, uses OpenAI's gpt-4o-mini
 534  
 535  By setting the 'OPENAI_TIMEOUT' and 'OPENAI_MAX_RETRIES' you can change the timeout and max_retries parameters
 536  in the OpenAI client.
 537  
 538  **Arguments**:
 539  
 540  - `api_key`: The OpenAI API key to connect to OpenAI.
 541  - `model`: The name of the model to use.
 542  - `streaming_callback`: A callback function that is called when a new token is received from the stream.
 543  The callback function accepts StreamingChunk as an argument.
 544  - `api_base_url`: An optional base URL.
 545  - `organization`: The Organization ID, defaults to `None`.
 546  - `system_prompt`: The system prompt to use for text generation. If not provided, the system prompt is
 547  omitted, and the default system prompt of the model is used.
 548  - `generation_kwargs`: Other parameters to use for the model. These parameters are all sent directly to
 549  the OpenAI endpoint. See OpenAI [documentation](https://platform.openai.com/docs/api-reference/chat) for
 550  more details.
 551  Some of the supported parameters:
 552  - `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion,
 553      including visible output tokens and reasoning tokens.
 554  - `temperature`: What sampling temperature to use. Higher values mean the model will take more risks.
 555      Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
 556  - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
 557      considers the results of the tokens with top_p probability mass. So, 0.1 means only the tokens
 558      comprising the top 10% probability mass are considered.
 559  - `n`: How many completions to generate for each prompt. For example, if the LLM gets 3 prompts and n is 2,
 560      it will generate two completions for each of the three prompts, ending up with 6 completions in total.
 561  - `stop`: One or more sequences after which the LLM should stop generating tokens.
 562  - `presence_penalty`: What penalty to apply if a token is already present at all. Bigger values mean
 563      the model will be less likely to repeat the same token in the text.
 564  - `frequency_penalty`: What penalty to apply if a token has already been generated in the text.
 565      Bigger values mean the model will be less likely to repeat the same token in the text.
 566  - `logit_bias`: Add a logit bias to specific tokens. The keys of the dictionary are tokens, and the
 567      values are the bias to add to that token.
 568  - `timeout`: Timeout for OpenAI Client calls, if not set it is inferred from the `OPENAI_TIMEOUT` environment variable
 569  or set to 30.
 570  - `max_retries`: Maximum retries to establish contact with OpenAI if it returns an internal error, if not set it is inferred
 571  from the `OPENAI_MAX_RETRIES` environment variable or set to 5.
 572  - `http_client_kwargs`: A dictionary of keyword arguments to configure a custom `httpx.Client`or `httpx.AsyncClient`.
 573  For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/`client`).
 574  
 575  <a id="openai.OpenAIGenerator.to_dict"></a>
 576  
 577  #### OpenAIGenerator.to\_dict
 578  
 579  ```python
 580  def to_dict() -> dict[str, Any]
 581  ```
 582  
 583  Serialize this component to a dictionary.
 584  
 585  **Returns**:
 586  
 587  The serialized component as a dictionary.
 588  
 589  <a id="openai.OpenAIGenerator.from_dict"></a>
 590  
 591  #### OpenAIGenerator.from\_dict
 592  
 593  ```python
 594  @classmethod
 595  def from_dict(cls, data: dict[str, Any]) -> "OpenAIGenerator"
 596  ```
 597  
 598  Deserialize this component from a dictionary.
 599  
 600  **Arguments**:
 601  
 602  - `data`: The dictionary representation of this component.
 603  
 604  **Returns**:
 605  
 606  The deserialized component instance.
 607  
 608  <a id="openai.OpenAIGenerator.run"></a>
 609  
 610  #### OpenAIGenerator.run
 611  
 612  ```python
 613  @component.output_types(replies=list[str], meta=list[dict[str, Any]])
 614  def run(prompt: str,
 615          system_prompt: Optional[str] = None,
 616          streaming_callback: Optional[StreamingCallbackT] = None,
 617          generation_kwargs: Optional[dict[str, Any]] = None)
 618  ```
 619  
 620  Invoke the text generation inference based on the provided messages and generation parameters.
 621  
 622  **Arguments**:
 623  
 624  - `prompt`: The string prompt to use for text generation.
 625  - `system_prompt`: The system prompt to use for text generation. If this run time system prompt is omitted, the system
 626  prompt, if defined at initialisation time, is used.
 627  - `streaming_callback`: A callback function that is called when a new token is received from the stream.
 628  - `generation_kwargs`: Additional keyword arguments for text generation. These parameters will potentially override the parameters
 629  passed in the `__init__` method. For more details on the parameters supported by the OpenAI API, refer to
 630  the OpenAI [documentation](https://platform.openai.com/docs/api-reference/chat/create).
 631  
 632  **Returns**:
 633  
 634  A list of strings containing the generated responses and a list of dictionaries containing the metadata
 635  for each response.
 636  
 637  <a id="openai_dalle"></a>
 638  
 639  ## Module openai\_dalle
 640  
 641  <a id="openai_dalle.DALLEImageGenerator"></a>
 642  
 643  ### DALLEImageGenerator
 644  
 645  Generates images using OpenAI's DALL-E model.
 646  
 647  For details on OpenAI API parameters, see
 648  [OpenAI documentation](https://platform.openai.com/docs/api-reference/images/create).
 649  
 650  ### Usage example
 651  
 652  ```python
 653  from haystack.components.generators import DALLEImageGenerator
 654  image_generator = DALLEImageGenerator()
 655  response = image_generator.run("Show me a picture of a black cat.")
 656  print(response)
 657  ```
 658  
 659  <a id="openai_dalle.DALLEImageGenerator.__init__"></a>
 660  
 661  #### DALLEImageGenerator.\_\_init\_\_
 662  
 663  ```python
 664  def __init__(model: str = "dall-e-3",
 665               quality: Literal["standard", "hd"] = "standard",
 666               size: Literal["256x256", "512x512", "1024x1024", "1792x1024",
 667                             "1024x1792"] = "1024x1024",
 668               response_format: Literal["url", "b64_json"] = "url",
 669               api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
 670               api_base_url: Optional[str] = None,
 671               organization: Optional[str] = None,
 672               timeout: Optional[float] = None,
 673               max_retries: Optional[int] = None,
 674               http_client_kwargs: Optional[dict[str, Any]] = None)
 675  ```
 676  
 677  Creates an instance of DALLEImageGenerator. Unless specified otherwise in `model`, uses OpenAI's dall-e-3.
 678  
 679  **Arguments**:
 680  
 681  - `model`: The model to use for image generation. Can be "dall-e-2" or "dall-e-3".
 682  - `quality`: The quality of the generated image. Can be "standard" or "hd".
 683  - `size`: The size of the generated images.
 684  Must be one of 256x256, 512x512, or 1024x1024 for dall-e-2.
 685  Must be one of 1024x1024, 1792x1024, or 1024x1792 for dall-e-3 models.
 686  - `response_format`: The format of the response. Can be "url" or "b64_json".
 687  - `api_key`: The OpenAI API key to connect to OpenAI.
 688  - `api_base_url`: An optional base URL.
 689  - `organization`: The Organization ID, defaults to `None`.
 690  - `timeout`: Timeout for OpenAI Client calls. If not set, it is inferred from the `OPENAI_TIMEOUT` environment variable
 691  or set to 30.
 692  - `max_retries`: Maximum retries to establish contact with OpenAI if it returns an internal error. If not set, it is inferred
 693  from the `OPENAI_MAX_RETRIES` environment variable or set to 5.
 694  - `http_client_kwargs`: A dictionary of keyword arguments to configure a custom `httpx.Client`or `httpx.AsyncClient`.
 695  For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/`client`).
 696  
 697  <a id="openai_dalle.DALLEImageGenerator.warm_up"></a>
 698  
 699  #### DALLEImageGenerator.warm\_up
 700  
 701  ```python
 702  def warm_up() -> None
 703  ```
 704  
 705  Warm up the OpenAI client.
 706  
 707  <a id="openai_dalle.DALLEImageGenerator.run"></a>
 708  
 709  #### DALLEImageGenerator.run
 710  
 711  ```python
 712  @component.output_types(images=list[str], revised_prompt=str)
 713  def run(prompt: str,
 714          size: Optional[Literal["256x256", "512x512", "1024x1024", "1792x1024",
 715                                 "1024x1792"]] = None,
 716          quality: Optional[Literal["standard", "hd"]] = None,
 717          response_format: Optional[Optional[Literal["url",
 718                                                     "b64_json"]]] = None)
 719  ```
 720  
 721  Invokes the image generation inference based on the provided prompt and generation parameters.
 722  
 723  **Arguments**:
 724  
 725  - `prompt`: The prompt to generate the image.
 726  - `size`: If provided, overrides the size provided during initialization.
 727  - `quality`: If provided, overrides the quality provided during initialization.
 728  - `response_format`: If provided, overrides the response format provided during initialization.
 729  
 730  **Returns**:
 731  
 732  A dictionary containing the generated list of images and the revised prompt.
 733  Depending on the `response_format` parameter, the list of images can be URLs or base64 encoded JSON strings.
 734  The revised prompt is the prompt that was used to generate the image, if there was any revision
 735  to the prompt made by OpenAI.
 736  
 737  <a id="openai_dalle.DALLEImageGenerator.to_dict"></a>
 738  
 739  #### DALLEImageGenerator.to\_dict
 740  
 741  ```python
 742  def to_dict() -> dict[str, Any]
 743  ```
 744  
 745  Serialize this component to a dictionary.
 746  
 747  **Returns**:
 748  
 749  The serialized component as a dictionary.
 750  
 751  <a id="openai_dalle.DALLEImageGenerator.from_dict"></a>
 752  
 753  #### DALLEImageGenerator.from\_dict
 754  
 755  ```python
 756  @classmethod
 757  def from_dict(cls, data: dict[str, Any]) -> "DALLEImageGenerator"
 758  ```
 759  
 760  Deserialize this component from a dictionary.
 761  
 762  **Arguments**:
 763  
 764  - `data`: The dictionary representation of this component.
 765  
 766  **Returns**:
 767  
 768  The deserialized component instance.
 769  
 770  <a id="chat/azure"></a>
 771  
 772  ## Module chat/azure
 773  
 774  <a id="chat/azure.AzureOpenAIChatGenerator"></a>
 775  
 776  ### AzureOpenAIChatGenerator
 777  
 778  Generates text using OpenAI's models on Azure.
 779  
 780  It works with the gpt-4 - type models and supports streaming responses
 781  from OpenAI API. It uses [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
 782  format in input and output.
 783  
 784  You can customize how the text is generated by passing parameters to the
 785  OpenAI API. Use the `**generation_kwargs` argument when you initialize
 786  the component or when you run it. Any parameter that works with
 787  `openai.ChatCompletion.create` will work here too.
 788  
 789  For details on OpenAI API parameters, see
 790  [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).
 791  
 792  ### Usage example
 793  
 794  ```python
 795  from haystack.components.generators.chat import AzureOpenAIChatGenerator
 796  from haystack.dataclasses import ChatMessage
 797  from haystack.utils import Secret
 798  
 799  messages = [ChatMessage.from_user("What's Natural Language Processing?")]
 800  
 801  client = AzureOpenAIChatGenerator(
 802      azure_endpoint="<Your Azure endpoint e.g. `https://your-company.azure.openai.com/>",
 803      api_key=Secret.from_token("<your-api-key>"),
 804      azure_deployment="<this a model name, e.g. gpt-4o-mini>")
 805  response = client.run(messages)
 806  print(response)
 807  ```
 808  
 809  ```
 810  {'replies':
 811      [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=[TextContent(text=
 812      "Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on
 813       enabling computers to understand, interpret, and generate human language in a way that is useful.")],
 814       _name=None,
 815       _meta={'model': 'gpt-4o-mini', 'index': 0, 'finish_reason': 'stop',
 816       'usage': {'prompt_tokens': 15, 'completion_tokens': 36, 'total_tokens': 51}})]
 817  }
 818  ```
 819  
 820  <a id="chat/azure.AzureOpenAIChatGenerator.__init__"></a>
 821  
 822  #### AzureOpenAIChatGenerator.\_\_init\_\_
 823  
 824  ```python
 825  def __init__(azure_endpoint: Optional[str] = None,
 826               api_version: Optional[str] = "2023-05-15",
 827               azure_deployment: Optional[str] = "gpt-4o-mini",
 828               api_key: Optional[Secret] = Secret.from_env_var(
 829                   "AZURE_OPENAI_API_KEY", strict=False),
 830               azure_ad_token: Optional[Secret] = Secret.from_env_var(
 831                   "AZURE_OPENAI_AD_TOKEN", strict=False),
 832               organization: Optional[str] = None,
 833               streaming_callback: Optional[StreamingCallbackT] = None,
 834               timeout: Optional[float] = None,
 835               max_retries: Optional[int] = None,
 836               generation_kwargs: Optional[dict[str, Any]] = None,
 837               default_headers: Optional[dict[str, str]] = None,
 838               tools: Optional[ToolsType] = None,
 839               tools_strict: bool = False,
 840               *,
 841               azure_ad_token_provider: Optional[Union[
 842                   AzureADTokenProvider, AsyncAzureADTokenProvider]] = None,
 843               http_client_kwargs: Optional[dict[str, Any]] = None)
 844  ```
 845  
 846  Initialize the Azure OpenAI Chat Generator component.
 847  
 848  **Arguments**:
 849  
 850  - `azure_endpoint`: The endpoint of the deployed model, for example `"https://example-resource.azure.openai.com/"`.
 851  - `api_version`: The version of the API to use. Defaults to 2023-05-15.
 852  - `azure_deployment`: The deployment of the model, usually the model name.
 853  - `api_key`: The API key to use for authentication.
 854  - `azure_ad_token`: [Azure Active Directory token](https://www.microsoft.com/en-us/security/business/identity-access/microsoft-entra-id).
 855  - `organization`: Your organization ID, defaults to `None`. For help, see
 856  [Setting up your organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
 857  - `streaming_callback`: A callback function called when a new token is received from the stream.
 858  It accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
 859  as an argument.
 860  - `timeout`: Timeout for OpenAI client calls. If not set, it defaults to either the
 861  `OPENAI_TIMEOUT` environment variable, or 30 seconds.
 862  - `max_retries`: Maximum number of retries to contact OpenAI after an internal error.
 863  If not set, it defaults to either the `OPENAI_MAX_RETRIES` environment variable, or set to 5.
 864  - `generation_kwargs`: Other parameters to use for the model. These parameters are sent directly to
 865  the OpenAI endpoint. For details, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).
 866  Some of the supported parameters:
 867  - `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion,
 868      including visible output tokens and reasoning tokens.
 869  - `temperature`: The sampling temperature to use. Higher values mean the model takes more risks.
 870      Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
 871  - `top_p`: Nucleus sampling is an alternative to sampling with temperature, where the model considers
 872      tokens with a top_p probability mass. For example, 0.1 means only the tokens comprising
 873      the top 10% probability mass are considered.
 874  - `n`: The number of completions to generate for each prompt. For example, with 3 prompts and n=2,
 875      the LLM will generate two completions per prompt, resulting in 6 completions total.
 876  - `stop`: One or more sequences after which the LLM should stop generating tokens.
 877  - `presence_penalty`: The penalty applied if a token is already present.
 878      Higher values make the model less likely to repeat the token.
 879  - `frequency_penalty`: Penalty applied if a token has already been generated.
 880      Higher values make the model less likely to repeat the token.
 881  - `logit_bias`: Adds a logit bias to specific tokens. The keys of the dictionary are tokens, and the
 882      values are the bias to add to that token.
 883  - `response_format`: A JSON schema or a Pydantic model that enforces the structure of the model's response.
 884      If provided, the output will always be validated against this
 885      format (unless the model returns a tool call).
 886      For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
 887      Notes:
 888      - This parameter accepts Pydantic models and JSON schemas for latest models starting from GPT-4o.
 889        Older models only support basic version of structured outputs through `{"type": "json_object"}`.
 890        For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
 891      - For structured outputs with streaming,
 892        the `response_format` must be a JSON schema and not a Pydantic model.
 893  - `default_headers`: Default headers to use for the AzureOpenAI client.
 894  - `tools`: A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
 895  - `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
 896  the schema provided in the `parameters` field of the tool definition, but this may increase latency.
 897  - `azure_ad_token_provider`: A function that returns an Azure Active Directory token, will be invoked on
 898  every request.
 899  - `http_client_kwargs`: A dictionary of keyword arguments to configure a custom `httpx.Client`or `httpx.AsyncClient`.
 900  For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/`client`).
 901  
 902  <a id="chat/azure.AzureOpenAIChatGenerator.warm_up"></a>
 903  
 904  #### AzureOpenAIChatGenerator.warm\_up
 905  
 906  ```python
 907  def warm_up()
 908  ```
 909  
 910  Warm up the Azure OpenAI chat generator.
 911  
 912  This will warm up the tools registered in the chat generator.
 913  This method is idempotent and will only warm up the tools once.
 914  
 915  <a id="chat/azure.AzureOpenAIChatGenerator.to_dict"></a>
 916  
 917  #### AzureOpenAIChatGenerator.to\_dict
 918  
 919  ```python
 920  def to_dict() -> dict[str, Any]
 921  ```
 922  
 923  Serialize this component to a dictionary.
 924  
 925  **Returns**:
 926  
 927  The serialized component as a dictionary.
 928  
 929  <a id="chat/azure.AzureOpenAIChatGenerator.from_dict"></a>
 930  
 931  #### AzureOpenAIChatGenerator.from\_dict
 932  
 933  ```python
 934  @classmethod
 935  def from_dict(cls, data: dict[str, Any]) -> "AzureOpenAIChatGenerator"
 936  ```
 937  
 938  Deserialize this component from a dictionary.
 939  
 940  **Arguments**:
 941  
 942  - `data`: The dictionary representation of this component.
 943  
 944  **Returns**:
 945  
 946  The deserialized component instance.
 947  
 948  <a id="chat/azure.AzureOpenAIChatGenerator.run"></a>
 949  
 950  #### AzureOpenAIChatGenerator.run
 951  
 952  ```python
 953  @component.output_types(replies=list[ChatMessage])
 954  def run(messages: list[ChatMessage],
 955          streaming_callback: Optional[StreamingCallbackT] = None,
 956          generation_kwargs: Optional[dict[str, Any]] = None,
 957          *,
 958          tools: Optional[ToolsType] = None,
 959          tools_strict: Optional[bool] = None)
 960  ```
 961  
 962  Invokes chat completion based on the provided messages and generation parameters.
 963  
 964  **Arguments**:
 965  
 966  - `messages`: A list of ChatMessage instances representing the input messages.
 967  - `streaming_callback`: A callback function that is called when a new token is received from the stream.
 968  - `generation_kwargs`: Additional keyword arguments for text generation. These parameters will
 969  override the parameters passed during component initialization.
 970  For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat/create).
 971  - `tools`: A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
 972  If set, it will override the `tools` parameter provided during initialization.
 973  - `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
 974  the schema provided in the `parameters` field of the tool definition, but this may increase latency.
 975  If set, it will override the `tools_strict` parameter set during component initialization.
 976  
 977  **Returns**:
 978  
 979  A dictionary with the following key:
 980  - `replies`: A list containing the generated responses as ChatMessage instances.
 981  
 982  <a id="chat/azure.AzureOpenAIChatGenerator.run_async"></a>
 983  
 984  #### AzureOpenAIChatGenerator.run\_async
 985  
 986  ```python
 987  @component.output_types(replies=list[ChatMessage])
 988  async def run_async(messages: list[ChatMessage],
 989                      streaming_callback: Optional[StreamingCallbackT] = None,
 990                      generation_kwargs: Optional[dict[str, Any]] = None,
 991                      *,
 992                      tools: Optional[ToolsType] = None,
 993                      tools_strict: Optional[bool] = None)
 994  ```
 995  
 996  Asynchronously invokes chat completion based on the provided messages and generation parameters.
 997  
 998  This is the asynchronous version of the `run` method. It has the same parameters and return values
 999  but can be used with `await` in async code.
1000  
1001  **Arguments**:
1002  
1003  - `messages`: A list of ChatMessage instances representing the input messages.
1004  - `streaming_callback`: A callback function that is called when a new token is received from the stream.
1005  Must be a coroutine.
1006  - `generation_kwargs`: Additional keyword arguments for text generation. These parameters will
1007  override the parameters passed during component initialization.
1008  For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat/create).
1009  - `tools`: A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
1010  If set, it will override the `tools` parameter provided during initialization.
1011  - `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
1012  the schema provided in the `parameters` field of the tool definition, but this may increase latency.
1013  If set, it will override the `tools_strict` parameter set during component initialization.
1014  
1015  **Returns**:
1016  
1017  A dictionary with the following key:
1018  - `replies`: A list containing the generated responses as ChatMessage instances.
1019  
1020  <a id="chat/azure_responses"></a>
1021  
1022  ## Module chat/azure\_responses
1023  
1024  <a id="chat/azure_responses.AzureOpenAIResponsesChatGenerator"></a>
1025  
1026  ### AzureOpenAIResponsesChatGenerator
1027  
1028  Completes chats using OpenAI's Responses API on Azure.
1029  
1030  It works with the gpt-5 and o-series models and supports streaming responses
1031  from OpenAI API. It uses [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
1032  format in input and output.
1033  
1034  You can customize how the text is generated by passing parameters to the
1035  OpenAI API. Use the `**generation_kwargs` argument when you initialize
1036  the component or when you run it. Any parameter that works with
1037  `openai.Responses.create` will work here too.
1038  
1039  For details on OpenAI API parameters, see
1040  [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses).
1041  
1042  ### Usage example
1043  
1044  ```python
1045  from haystack.components.generators.chat import AzureOpenAIResponsesChatGenerator
1046  from haystack.dataclasses import ChatMessage
1047  
1048  messages = [ChatMessage.from_user("What's Natural Language Processing?")]
1049  
1050  client = AzureOpenAIResponsesChatGenerator(
1051      azure_endpoint="https://example-resource.azure.openai.com/",
1052      generation_kwargs={"reasoning": {"effort": "low", "summary": "auto"}}
1053  )
1054  response = client.run(messages)
1055  print(response)
1056  ```
1057  
1058  <a id="chat/azure_responses.AzureOpenAIResponsesChatGenerator.__init__"></a>
1059  
1060  #### AzureOpenAIResponsesChatGenerator.\_\_init\_\_
1061  
1062  ```python
1063  def __init__(*,
1064               api_key: Union[Secret, Callable[[], str],
1065                              Callable[[],
1066                                       Awaitable[str]]] = Secret.from_env_var(
1067                                           "AZURE_OPENAI_API_KEY", strict=False),
1068               azure_endpoint: Optional[str] = None,
1069               azure_deployment: str = "gpt-5-mini",
1070               streaming_callback: Optional[StreamingCallbackT] = None,
1071               organization: Optional[str] = None,
1072               generation_kwargs: Optional[dict[str, Any]] = None,
1073               timeout: Optional[float] = None,
1074               max_retries: Optional[int] = None,
1075               tools: Optional[ToolsType] = None,
1076               tools_strict: bool = False,
1077               http_client_kwargs: Optional[dict[str, Any]] = None)
1078  ```
1079  
1080  Initialize the AzureOpenAIResponsesChatGenerator component.
1081  
1082  **Arguments**:
1083  
1084  - `api_key`: The API key to use for authentication. Can be:
1085  - A `Secret` object containing the API key.
1086  - A `Secret` object containing the [Azure Active Directory token](https://www.microsoft.com/en-us/security/business/identity-access/microsoft-entra-id).
1087  - A function that returns an Azure Active Directory token.
1088  - `azure_endpoint`: The endpoint of the deployed model, for example `"https://example-resource.azure.openai.com/"`.
1089  - `azure_deployment`: The deployment of the model, usually the model name.
1090  - `organization`: Your organization ID, defaults to `None`. For help, see
1091  [Setting up your organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
1092  - `streaming_callback`: A callback function called when a new token is received from the stream.
1093  It accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
1094  as an argument.
1095  - `timeout`: Timeout for OpenAI client calls. If not set, it defaults to either the
1096  `OPENAI_TIMEOUT` environment variable, or 30 seconds.
1097  - `max_retries`: Maximum number of retries to contact OpenAI after an internal error.
1098  If not set, it defaults to either the `OPENAI_MAX_RETRIES` environment variable, or set to 5.
1099  - `generation_kwargs`: Other parameters to use for the model. These parameters are sent
1100  directly to the OpenAI endpoint.
1101  See OpenAI [documentation](https://platform.openai.com/docs/api-reference/responses) for
1102   more details.
1103   Some of the supported parameters:
1104   - `temperature`: What sampling temperature to use. Higher values like 0.8 will make the output more random,
1105       while lower values like 0.2 will make it more focused and deterministic.
1106   - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
1107       considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
1108       comprising the top 10% probability mass are considered.
1109   - `previous_response_id`: The ID of the previous response.
1110       Use this to create multi-turn conversations.
1111   - `text_format`: A Pydantic model that enforces the structure of the model's response.
1112       If provided, the output will always be validated against this
1113       format (unless the model returns a tool call).
1114       For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
1115   - `text`: A JSON schema that enforces the structure of the model's response.
1116       If provided, the output will always be validated against this
1117       format (unless the model returns a tool call).
1118       Notes:
1119       - Both JSON Schema and Pydantic models are supported for latest models starting from GPT-4o.
1120       - If both are provided, `text_format` takes precedence and json schema passed to `text` is ignored.
1121       - Currently, this component doesn't support streaming for structured outputs.
1122       - Older models only support basic version of structured outputs through `{"type": "json_object"}`.
1123           For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
1124   - `reasoning`: A dictionary of parameters for reasoning. For example:
1125       - `summary`: The summary of the reasoning.
1126       - `effort`: The level of effort to put into the reasoning. Can be `low`, `medium` or `high`.
1127       - `generate_summary`: Whether to generate a summary of the reasoning.
1128       Note: OpenAI does not return the reasoning tokens, but we can view summary if its enabled.
1129       For details, see the [OpenAI Reasoning documentation](https://platform.openai.com/docs/guides/reasoning).
1130  - `tools`: A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
1131  - `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
1132  the schema provided in the `parameters` field of the tool definition, but this may increase latency.
1133  - `http_client_kwargs`: A dictionary of keyword arguments to configure a custom `httpx.Client`or `httpx.AsyncClient`.
1134  For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/`client`).
1135  
1136  <a id="chat/azure_responses.AzureOpenAIResponsesChatGenerator.to_dict"></a>
1137  
1138  #### AzureOpenAIResponsesChatGenerator.to\_dict
1139  
1140  ```python
1141  def to_dict() -> dict[str, Any]
1142  ```
1143  
1144  Serialize this component to a dictionary.
1145  
1146  **Returns**:
1147  
1148  The serialized component as a dictionary.
1149  
1150  <a id="chat/azure_responses.AzureOpenAIResponsesChatGenerator.from_dict"></a>
1151  
1152  #### AzureOpenAIResponsesChatGenerator.from\_dict
1153  
1154  ```python
1155  @classmethod
1156  def from_dict(cls, data: dict[str,
1157                                Any]) -> "AzureOpenAIResponsesChatGenerator"
1158  ```
1159  
1160  Deserialize this component from a dictionary.
1161  
1162  **Arguments**:
1163  
1164  - `data`: The dictionary representation of this component.
1165  
1166  **Returns**:
1167  
1168  The deserialized component instance.
1169  
1170  <a id="chat/azure_responses.AzureOpenAIResponsesChatGenerator.warm_up"></a>
1171  
1172  #### AzureOpenAIResponsesChatGenerator.warm\_up
1173  
1174  ```python
1175  def warm_up()
1176  ```
1177  
1178  Warm up the OpenAI responses chat generator.
1179  
1180  This will warm up the tools registered in the chat generator.
1181  This method is idempotent and will only warm up the tools once.
1182  
1183  <a id="chat/azure_responses.AzureOpenAIResponsesChatGenerator.run"></a>
1184  
1185  #### AzureOpenAIResponsesChatGenerator.run
1186  
1187  ```python
1188  @component.output_types(replies=list[ChatMessage])
1189  def run(messages: list[ChatMessage],
1190          *,
1191          streaming_callback: Optional[StreamingCallbackT] = None,
1192          generation_kwargs: Optional[dict[str, Any]] = None,
1193          tools: Optional[Union[ToolsType, list[dict]]] = None,
1194          tools_strict: Optional[bool] = None)
1195  ```
1196  
1197  Invokes response generation based on the provided messages and generation parameters.
1198  
1199  **Arguments**:
1200  
1201  - `messages`: A list of ChatMessage instances representing the input messages.
1202  - `streaming_callback`: A callback function that is called when a new token is received from the stream.
1203  - `generation_kwargs`: Additional keyword arguments for text generation. These parameters will
1204  override the parameters passed during component initialization.
1205  For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create).
1206  - `tools`: The tools that the model can use to prepare calls. If set, it will override the
1207  `tools` parameter set during component initialization. This parameter can accept either a
1208  mixed list of Haystack `Tool` objects and Haystack `Toolset`. Or you can pass a dictionary of
1209  OpenAI/MCP tool definitions.
1210  Note: You cannot pass OpenAI/MCP tools and Haystack tools together.
1211  For details on tool support, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create#responses-create-tools).
1212  - `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `False`, the model may not exactly
1213  follow the schema provided in the `parameters` field of the tool definition. In Response API, tool calls
1214  are strict by default.
1215  If set, it will override the `tools_strict` parameter set during component initialization.
1216  
1217  **Returns**:
1218  
1219  A dictionary with the following key:
1220  - `replies`: A list containing the generated responses as ChatMessage instances.
1221  
1222  <a id="chat/azure_responses.AzureOpenAIResponsesChatGenerator.run_async"></a>
1223  
1224  #### AzureOpenAIResponsesChatGenerator.run\_async
1225  
1226  ```python
1227  @component.output_types(replies=list[ChatMessage])
1228  async def run_async(messages: list[ChatMessage],
1229                      *,
1230                      streaming_callback: Optional[StreamingCallbackT] = None,
1231                      generation_kwargs: Optional[dict[str, Any]] = None,
1232                      tools: Optional[Union[ToolsType, list[dict]]] = None,
1233                      tools_strict: Optional[bool] = None)
1234  ```
1235  
1236  Asynchronously invokes response generation based on the provided messages and generation parameters.
1237  
1238  This is the asynchronous version of the `run` method. It has the same parameters and return values
1239  but can be used with `await` in async code.
1240  
1241  **Arguments**:
1242  
1243  - `messages`: A list of ChatMessage instances representing the input messages.
1244  - `streaming_callback`: A callback function that is called when a new token is received from the stream.
1245  Must be a coroutine.
1246  - `generation_kwargs`: Additional keyword arguments for text generation. These parameters will
1247  override the parameters passed during component initialization.
1248  For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create).
1249  - `tools`: A list of tools or a Toolset for which the model can prepare calls. If set, it will override the
1250  `tools` parameter set during component initialization. This parameter can accept either a list of
1251  mixed list of Haystack `Tool` objects and Haystack `Toolset`. Or you can pass a dictionary of
1252  OpenAI/MCP tool definitions.
1253  Note: You cannot pass OpenAI/MCP tools and Haystack tools together.
1254  - `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
1255  the schema provided in the `parameters` field of the tool definition, but this may increase latency.
1256  If set, it will override the `tools_strict` parameter set during component initialization.
1257  
1258  **Returns**:
1259  
1260  A dictionary with the following key:
1261  - `replies`: A list containing the generated responses as ChatMessage instances.
1262  
1263  <a id="chat/hugging_face_local"></a>
1264  
1265  ## Module chat/hugging\_face\_local
1266  
1267  <a id="chat/hugging_face_local.default_tool_parser"></a>
1268  
1269  #### default\_tool\_parser
1270  
1271  ```python
1272  def default_tool_parser(text: str) -> Optional[list[ToolCall]]
1273  ```
1274  
1275  Default implementation for parsing tool calls from model output text.
1276  
1277  Uses DEFAULT_TOOL_PATTERN to extract tool calls.
1278  
1279  **Arguments**:
1280  
1281  - `text`: The text to parse for tool calls.
1282  
1283  **Returns**:
1284  
1285  A list containing a single ToolCall if a valid tool call is found, None otherwise.
1286  
1287  <a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator"></a>
1288  
1289  ### HuggingFaceLocalChatGenerator
1290  
1291  Generates chat responses using models from Hugging Face that run locally.
1292  
1293  Use this component with chat-based models,
1294  such as `HuggingFaceH4/zephyr-7b-beta` or `meta-llama/Llama-2-7b-chat-hf`.
1295  LLMs running locally may need powerful hardware.
1296  
1297  ### Usage example
1298  
1299  ```python
1300  from haystack.components.generators.chat import HuggingFaceLocalChatGenerator
1301  from haystack.dataclasses import ChatMessage
1302  
1303  generator = HuggingFaceLocalChatGenerator(model="HuggingFaceH4/zephyr-7b-beta")
1304  generator.warm_up()
1305  messages = [ChatMessage.from_user("What's Natural Language Processing? Be brief.")]
1306  print(generator.run(messages))
1307  ```
1308  
1309  ```
1310  {'replies':
1311      [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=[TextContent(text=
1312      "Natural Language Processing (NLP) is a subfield of artificial intelligence that deals
1313      with the interaction between computers and human language. It enables computers to understand, interpret, and
1314      generate human language in a valuable way. NLP involves various techniques such as speech recognition, text
1315      analysis, sentiment analysis, and machine translation. The ultimate goal is to make it easier for computers to
1316      process and derive meaning from human language, improving communication between humans and machines.")],
1317      _name=None,
1318      _meta={'finish_reason': 'stop', 'index': 0, 'model':
1319            'mistralai/Mistral-7B-Instruct-v0.2',
1320            'usage': {'completion_tokens': 90, 'prompt_tokens': 19, 'total_tokens': 109}})
1321            ]
1322  }
1323  ```
1324  
1325  <a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator.__init__"></a>
1326  
1327  #### HuggingFaceLocalChatGenerator.\_\_init\_\_
1328  
1329  ```python
1330  def __init__(model: str = "HuggingFaceH4/zephyr-7b-beta",
1331               task: Optional[Literal["text-generation",
1332                                      "text2text-generation"]] = None,
1333               device: Optional[ComponentDevice] = None,
1334               token: Optional[Secret] = Secret.from_env_var(
1335                   ["HF_API_TOKEN", "HF_TOKEN"], strict=False),
1336               chat_template: Optional[str] = None,
1337               generation_kwargs: Optional[dict[str, Any]] = None,
1338               huggingface_pipeline_kwargs: Optional[dict[str, Any]] = None,
1339               stop_words: Optional[list[str]] = None,
1340               streaming_callback: Optional[StreamingCallbackT] = None,
1341               tools: Optional[ToolsType] = None,
1342               tool_parsing_function: Optional[Callable[
1343                   [str], Optional[list[ToolCall]]]] = None,
1344               async_executor: Optional[ThreadPoolExecutor] = None) -> None
1345  ```
1346  
1347  Initializes the HuggingFaceLocalChatGenerator component.
1348  
1349  **Arguments**:
1350  
1351  - `model`: The Hugging Face text generation model name or path,
1352  for example, `mistralai/Mistral-7B-Instruct-v0.2` or `TheBloke/OpenHermes-2.5-Mistral-7B-16k-AWQ`.
1353  The model must be a chat model supporting the ChatML messaging
1354  format.
1355  If the model is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
1356  - `task`: The task for the Hugging Face pipeline. Possible options:
1357  - `text-generation`: Supported by decoder models, like GPT.
1358  - `text2text-generation`: Supported by encoder-decoder models, like T5.
1359  If the task is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
1360  If not specified, the component calls the Hugging Face API to infer the task from the model name.
1361  - `device`: The device for loading the model. If `None`, automatically selects the default device.
1362  If a device or device map is specified in `huggingface_pipeline_kwargs`, it overrides this parameter.
1363  - `token`: The token to use as HTTP bearer authorization for remote files.
1364  If the token is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
1365  - `chat_template`: Specifies an optional Jinja template for formatting chat
1366  messages. Most high-quality chat models have their own templates, but for models without this
1367  feature or if you prefer a custom template, use this parameter.
1368  - `generation_kwargs`: A dictionary with keyword arguments to customize text generation.
1369  Some examples: `max_length`, `max_new_tokens`, `temperature`, `top_k`, `top_p`.
1370  See Hugging Face's documentation for more information:
1371  - - [customize-text-generation](https://huggingface.co/docs/transformers/main/en/generation_strategies#customize-text-generation)
1372  - - [GenerationConfig](https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.GenerationConfig)
1373  The only `generation_kwargs` set by default is `max_new_tokens`, which is set to 512 tokens.
1374  - `huggingface_pipeline_kwargs`: Dictionary with keyword arguments to initialize the
1375  Hugging Face pipeline for text generation.
1376  These keyword arguments provide fine-grained control over the Hugging Face pipeline.
1377  In case of duplication, these kwargs override `model`, `task`, `device`, and `token` init parameters.
1378  For kwargs, see [Hugging Face documentation](https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.pipeline.task).
1379  In this dictionary, you can also include `model_kwargs` to specify the kwargs for [model initialization](https://huggingface.co/docs/transformers/en/main_classes/model#transformers.PreTrainedModel.from_pretrained)
1380  - `stop_words`: A list of stop words. If the model generates a stop word, the generation stops.
1381  If you provide this parameter, don't specify the `stopping_criteria` in `generation_kwargs`.
1382  For some chat models, the output includes both the new text and the original prompt.
1383  In these cases, make sure your prompt has no stop words.
1384  - `streaming_callback`: An optional callable for handling streaming responses.
1385  - `tools`: A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
1386  - `tool_parsing_function`: A callable that takes a string and returns a list of ToolCall objects or None.
1387  If None, the default_tool_parser will be used which extracts tool calls using a predefined pattern.
1388  - `async_executor`: Optional ThreadPoolExecutor to use for async calls. If not provided, a single-threaded executor will be
1389  initialized and used
1390  
1391  <a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator.__del__"></a>
1392  
1393  #### HuggingFaceLocalChatGenerator.\_\_del\_\_
1394  
1395  ```python
1396  def __del__() -> None
1397  ```
1398  
1399  Cleanup when the instance is being destroyed.
1400  
1401  <a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator.shutdown"></a>
1402  
1403  #### HuggingFaceLocalChatGenerator.shutdown
1404  
1405  ```python
1406  def shutdown() -> None
1407  ```
1408  
1409  Explicitly shutdown the executor if we own it.
1410  
1411  <a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator.warm_up"></a>
1412  
1413  #### HuggingFaceLocalChatGenerator.warm\_up
1414  
1415  ```python
1416  def warm_up() -> None
1417  ```
1418  
1419  Initializes the component and warms up tools if provided.
1420  
1421  <a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator.to_dict"></a>
1422  
1423  #### HuggingFaceLocalChatGenerator.to\_dict
1424  
1425  ```python
1426  def to_dict() -> dict[str, Any]
1427  ```
1428  
1429  Serializes the component to a dictionary.
1430  
1431  **Returns**:
1432  
1433  Dictionary with serialized data.
1434  
1435  <a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator.from_dict"></a>
1436  
1437  #### HuggingFaceLocalChatGenerator.from\_dict
1438  
1439  ```python
1440  @classmethod
1441  def from_dict(cls, data: dict[str, Any]) -> "HuggingFaceLocalChatGenerator"
1442  ```
1443  
1444  Deserializes the component from a dictionary.
1445  
1446  **Arguments**:
1447  
1448  - `data`: The dictionary to deserialize from.
1449  
1450  **Returns**:
1451  
1452  The deserialized component.
1453  
1454  <a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator.run"></a>
1455  
1456  #### HuggingFaceLocalChatGenerator.run
1457  
1458  ```python
1459  @component.output_types(replies=list[ChatMessage])
1460  def run(messages: list[ChatMessage],
1461          generation_kwargs: Optional[dict[str, Any]] = None,
1462          streaming_callback: Optional[StreamingCallbackT] = None,
1463          tools: Optional[ToolsType] = None) -> dict[str, list[ChatMessage]]
1464  ```
1465  
1466  Invoke text generation inference based on the provided messages and generation parameters.
1467  
1468  **Arguments**:
1469  
1470  - `messages`: A list of ChatMessage objects representing the input messages.
1471  - `generation_kwargs`: Additional keyword arguments for text generation.
1472  - `streaming_callback`: An optional callable for handling streaming responses.
1473  - `tools`: A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
1474  If set, it will override the `tools` parameter provided during initialization.
1475  
1476  **Returns**:
1477  
1478  A dictionary with the following keys:
1479  - `replies`: A list containing the generated responses as ChatMessage instances.
1480  
1481  <a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator.create_message"></a>
1482  
1483  #### HuggingFaceLocalChatGenerator.create\_message
1484  
1485  ```python
1486  def create_message(text: str,
1487                     index: int,
1488                     tokenizer: Union["PreTrainedTokenizer",
1489                                      "PreTrainedTokenizerFast"],
1490                     prompt: str,
1491                     generation_kwargs: dict[str, Any],
1492                     parse_tool_calls: bool = False) -> ChatMessage
1493  ```
1494  
1495  Create a ChatMessage instance from the provided text, populated with metadata.
1496  
1497  **Arguments**:
1498  
1499  - `text`: The generated text.
1500  - `index`: The index of the generated text.
1501  - `tokenizer`: The tokenizer used for generation.
1502  - `prompt`: The prompt used for generation.
1503  - `generation_kwargs`: The generation parameters.
1504  - `parse_tool_calls`: Whether to attempt parsing tool calls from the text.
1505  
1506  **Returns**:
1507  
1508  A ChatMessage instance.
1509  
1510  <a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator.run_async"></a>
1511  
1512  #### HuggingFaceLocalChatGenerator.run\_async
1513  
1514  ```python
1515  @component.output_types(replies=list[ChatMessage])
1516  async def run_async(
1517          messages: list[ChatMessage],
1518          generation_kwargs: Optional[dict[str, Any]] = None,
1519          streaming_callback: Optional[StreamingCallbackT] = None,
1520          tools: Optional[ToolsType] = None) -> dict[str, list[ChatMessage]]
1521  ```
1522  
1523  Asynchronously invokes text generation inference based on the provided messages and generation parameters.
1524  
1525  This is the asynchronous version of the `run` method. It has the same parameters
1526  and return values but can be used with `await` in an async code.
1527  
1528  **Arguments**:
1529  
1530  - `messages`: A list of ChatMessage objects representing the input messages.
1531  - `generation_kwargs`: Additional keyword arguments for text generation.
1532  - `streaming_callback`: An optional callable for handling streaming responses.
1533  - `tools`: A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
1534  If set, it will override the `tools` parameter provided during initialization.
1535  
1536  **Returns**:
1537  
1538  A dictionary with the following keys:
1539  - `replies`: A list containing the generated responses as ChatMessage instances.
1540  
1541  <a id="chat/hugging_face_api"></a>
1542  
1543  ## Module chat/hugging\_face\_api
1544  
1545  <a id="chat/hugging_face_api.HuggingFaceAPIChatGenerator"></a>
1546  
1547  ### HuggingFaceAPIChatGenerator
1548  
1549  Completes chats using Hugging Face APIs.
1550  
1551  HuggingFaceAPIChatGenerator uses the [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
1552  format for input and output. Use it to generate text with Hugging Face APIs:
1553  - [Serverless Inference API (Inference Providers)](https://huggingface.co/docs/inference-providers)
1554  - [Paid Inference Endpoints](https://huggingface.co/inference-endpoints)
1555  - [Self-hosted Text Generation Inference](https://github.com/huggingface/text-generation-inference)
1556  
1557  ### Usage examples
1558  
1559  #### With the serverless inference API (Inference Providers) - free tier available
1560  
1561  ```python
1562  from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
1563  from haystack.dataclasses import ChatMessage
1564  from haystack.utils import Secret
1565  from haystack.utils.hf import HFGenerationAPIType
1566  
1567  messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
1568              ChatMessage.from_user("What's Natural Language Processing?")]
1569  
1570  # the api_type can be expressed using the HFGenerationAPIType enum or as a string
1571  api_type = HFGenerationAPIType.SERVERLESS_INFERENCE_API
1572  api_type = "serverless_inference_api" # this is equivalent to the above
1573  
1574  generator = HuggingFaceAPIChatGenerator(api_type=api_type,
1575                                          api_params={"model": "Qwen/Qwen2.5-7B-Instruct",
1576                                                      "provider": "together"},
1577                                          token=Secret.from_token("<your-api-key>"))
1578  
1579  result = generator.run(messages)
1580  print(result)
1581  ```
1582  
1583  #### With the serverless inference API (Inference Providers) and text+image input
1584  
1585  ```python
1586  from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
1587  from haystack.dataclasses import ChatMessage, ImageContent
1588  from haystack.utils import Secret
1589  from haystack.utils.hf import HFGenerationAPIType
1590  
1591  # Create an image from file path, URL, or base64
1592  image = ImageContent.from_file_path("path/to/your/image.jpg")
1593  
1594  # Create a multimodal message with both text and image
1595  messages = [ChatMessage.from_user(content_parts=["Describe this image in detail", image])]
1596  
1597  generator = HuggingFaceAPIChatGenerator(
1598      api_type=HFGenerationAPIType.SERVERLESS_INFERENCE_API,
1599      api_params={
1600          "model": "Qwen/Qwen2.5-VL-7B-Instruct",  # Vision Language Model
1601          "provider": "hyperbolic"
1602      },
1603      token=Secret.from_token("<your-api-key>")
1604  )
1605  
1606  result = generator.run(messages)
1607  print(result)
1608  ```
1609  
1610  #### With paid inference endpoints
1611  
1612  ```python
1613  from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
1614  from haystack.dataclasses import ChatMessage
1615  from haystack.utils import Secret
1616  
1617  messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
1618              ChatMessage.from_user("What's Natural Language Processing?")]
1619  
1620  generator = HuggingFaceAPIChatGenerator(api_type="inference_endpoints",
1621                                          api_params={"url": "<your-inference-endpoint-url>"},
1622                                          token=Secret.from_token("<your-api-key>"))
1623  
1624  result = generator.run(messages)
1625  print(result)
1626  
1627  #### With self-hosted text generation inference
1628  
1629  ```python
1630  from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
1631  from haystack.dataclasses import ChatMessage
1632  
1633  messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
1634              ChatMessage.from_user("What's Natural Language Processing?")]
1635  
1636  generator = HuggingFaceAPIChatGenerator(api_type="text_generation_inference",
1637                                          api_params={"url": "http://localhost:8080"})
1638  
1639  result = generator.run(messages)
1640  print(result)
1641  ```
1642  
1643  <a id="chat/hugging_face_api.HuggingFaceAPIChatGenerator.__init__"></a>
1644  
1645  #### HuggingFaceAPIChatGenerator.\_\_init\_\_
1646  
1647  ```python
1648  def __init__(api_type: Union[HFGenerationAPIType, str],
1649               api_params: dict[str, str],
1650               token: Optional[Secret] = Secret.from_env_var(
1651                   ["HF_API_TOKEN", "HF_TOKEN"], strict=False),
1652               generation_kwargs: Optional[dict[str, Any]] = None,
1653               stop_words: Optional[list[str]] = None,
1654               streaming_callback: Optional[StreamingCallbackT] = None,
1655               tools: Optional[ToolsType] = None)
1656  ```
1657  
1658  Initialize the HuggingFaceAPIChatGenerator instance.
1659  
1660  **Arguments**:
1661  
1662  - `api_type`: The type of Hugging Face API to use. Available types:
1663  - `text_generation_inference`: See [TGI](https://github.com/huggingface/text-generation-inference).
1664  - `inference_endpoints`: See [Inference Endpoints](https://huggingface.co/inference-endpoints).
1665  - `serverless_inference_api`: See
1666  [Serverless Inference API - Inference Providers](https://huggingface.co/docs/inference-providers).
1667  - `api_params`: A dictionary with the following keys:
1668  - `model`: Hugging Face model ID. Required when `api_type` is `SERVERLESS_INFERENCE_API`.
1669  - `provider`: Provider name. Recommended when `api_type` is `SERVERLESS_INFERENCE_API`.
1670  - `url`: URL of the inference endpoint. Required when `api_type` is `INFERENCE_ENDPOINTS` or
1671  `TEXT_GENERATION_INFERENCE`.
1672  - Other parameters specific to the chosen API type, such as `timeout`, `headers`, etc.
1673  - `token`: The Hugging Face token to use as HTTP bearer authorization.
1674  Check your HF token in your [account settings](https://huggingface.co/settings/tokens).
1675  - `generation_kwargs`: A dictionary with keyword arguments to customize text generation.
1676  Some examples: `max_tokens`, `temperature`, `top_p`.
1677  For details, see [Hugging Face chat_completion documentation](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.chat_completion).
1678  - `stop_words`: An optional list of strings representing the stop words.
1679  - `streaming_callback`: An optional callable for handling streaming responses.
1680  - `tools`: A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
1681  The chosen model should support tool/function calling, according to the model card.
1682  Support for tools in the Hugging Face API and TGI is not yet fully refined and you may experience
1683  unexpected behavior.
1684  
1685  <a id="chat/hugging_face_api.HuggingFaceAPIChatGenerator.warm_up"></a>
1686  
1687  #### HuggingFaceAPIChatGenerator.warm\_up
1688  
1689  ```python
1690  def warm_up()
1691  ```
1692  
1693  Warm up the Hugging Face API chat generator.
1694  
1695  This will warm up the tools registered in the chat generator.
1696  This method is idempotent and will only warm up the tools once.
1697  
1698  <a id="chat/hugging_face_api.HuggingFaceAPIChatGenerator.to_dict"></a>
1699  
1700  #### HuggingFaceAPIChatGenerator.to\_dict
1701  
1702  ```python
1703  def to_dict() -> dict[str, Any]
1704  ```
1705  
1706  Serialize this component to a dictionary.
1707  
1708  **Returns**:
1709  
1710  A dictionary containing the serialized component.
1711  
1712  <a id="chat/hugging_face_api.HuggingFaceAPIChatGenerator.from_dict"></a>
1713  
1714  #### HuggingFaceAPIChatGenerator.from\_dict
1715  
1716  ```python
1717  @classmethod
1718  def from_dict(cls, data: dict[str, Any]) -> "HuggingFaceAPIChatGenerator"
1719  ```
1720  
1721  Deserialize this component from a dictionary.
1722  
1723  <a id="chat/hugging_face_api.HuggingFaceAPIChatGenerator.run"></a>
1724  
1725  #### HuggingFaceAPIChatGenerator.run
1726  
1727  ```python
1728  @component.output_types(replies=list[ChatMessage])
1729  def run(messages: list[ChatMessage],
1730          generation_kwargs: Optional[dict[str, Any]] = None,
1731          tools: Optional[ToolsType] = None,
1732          streaming_callback: Optional[StreamingCallbackT] = None)
1733  ```
1734  
1735  Invoke the text generation inference based on the provided messages and generation parameters.
1736  
1737  **Arguments**:
1738  
1739  - `messages`: A list of ChatMessage objects representing the input messages.
1740  - `generation_kwargs`: Additional keyword arguments for text generation.
1741  - `tools`: A list of tools or a Toolset for which the model can prepare calls. If set, it will override
1742  the `tools` parameter set during component initialization. This parameter can accept either a
1743  list of `Tool` objects or a `Toolset` instance.
1744  - `streaming_callback`: An optional callable for handling streaming responses. If set, it will override the `streaming_callback`
1745  parameter set during component initialization.
1746  
1747  **Returns**:
1748  
1749  A dictionary with the following keys:
1750  - `replies`: A list containing the generated responses as ChatMessage objects.
1751  
1752  <a id="chat/hugging_face_api.HuggingFaceAPIChatGenerator.run_async"></a>
1753  
1754  #### HuggingFaceAPIChatGenerator.run\_async
1755  
1756  ```python
1757  @component.output_types(replies=list[ChatMessage])
1758  async def run_async(messages: list[ChatMessage],
1759                      generation_kwargs: Optional[dict[str, Any]] = None,
1760                      tools: Optional[ToolsType] = None,
1761                      streaming_callback: Optional[StreamingCallbackT] = None)
1762  ```
1763  
1764  Asynchronously invokes the text generation inference based on the provided messages and generation parameters.
1765  
1766  This is the asynchronous version of the `run` method. It has the same parameters
1767  and return values but can be used with `await` in an async code.
1768  
1769  **Arguments**:
1770  
1771  - `messages`: A list of ChatMessage objects representing the input messages.
1772  - `generation_kwargs`: Additional keyword arguments for text generation.
1773  - `tools`: A list of tools or a Toolset for which the model can prepare calls. If set, it will override the `tools`
1774  parameter set during component initialization. This parameter can accept either a list of `Tool` objects
1775  or a `Toolset` instance.
1776  - `streaming_callback`: An optional callable for handling streaming responses. If set, it will override the `streaming_callback`
1777  parameter set during component initialization.
1778  
1779  **Returns**:
1780  
1781  A dictionary with the following keys:
1782  - `replies`: A list containing the generated responses as ChatMessage objects.
1783  
1784  <a id="chat/openai"></a>
1785  
1786  ## Module chat/openai
1787  
1788  <a id="chat/openai.OpenAIChatGenerator"></a>
1789  
1790  ### OpenAIChatGenerator
1791  
1792  Completes chats using OpenAI's large language models (LLMs).
1793  
1794  It works with the gpt-4 and o-series models and supports streaming responses
1795  from OpenAI API. It uses [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
1796  format in input and output.
1797  
1798  You can customize how the text is generated by passing parameters to the
1799  OpenAI API. Use the `**generation_kwargs` argument when you initialize
1800  the component or when you run it. Any parameter that works with
1801  `openai.ChatCompletion.create` will work here too.
1802  
1803  For details on OpenAI API parameters, see
1804  [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).
1805  
1806  ### Usage example
1807  
1808  ```python
1809  from haystack.components.generators.chat import OpenAIChatGenerator
1810  from haystack.dataclasses import ChatMessage
1811  
1812  messages = [ChatMessage.from_user("What's Natural Language Processing?")]
1813  
1814  client = OpenAIChatGenerator()
1815  response = client.run(messages)
1816  print(response)
1817  ```
1818  Output:
1819  ```
1820  {'replies':
1821      [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=
1822      [TextContent(text="Natural Language Processing (NLP) is a branch of artificial intelligence
1823          that focuses on enabling computers to understand, interpret, and generate human language in
1824          a way that is meaningful and useful.")],
1825       _name=None,
1826       _meta={'model': 'gpt-4o-mini', 'index': 0, 'finish_reason': 'stop',
1827       'usage': {'prompt_tokens': 15, 'completion_tokens': 36, 'total_tokens': 51}})
1828      ]
1829  }
1830  ```
1831  
1832  <a id="chat/openai.OpenAIChatGenerator.__init__"></a>
1833  
1834  #### OpenAIChatGenerator.\_\_init\_\_
1835  
1836  ```python
1837  def __init__(api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
1838               model: str = "gpt-4o-mini",
1839               streaming_callback: Optional[StreamingCallbackT] = None,
1840               api_base_url: Optional[str] = None,
1841               organization: Optional[str] = None,
1842               generation_kwargs: Optional[dict[str, Any]] = None,
1843               timeout: Optional[float] = None,
1844               max_retries: Optional[int] = None,
1845               tools: Optional[ToolsType] = None,
1846               tools_strict: bool = False,
1847               http_client_kwargs: Optional[dict[str, Any]] = None)
1848  ```
1849  
1850  Creates an instance of OpenAIChatGenerator. Unless specified otherwise in `model`, uses OpenAI's gpt-4o-mini
1851  
1852  Before initializing the component, you can set the 'OPENAI_TIMEOUT' and 'OPENAI_MAX_RETRIES'
1853  environment variables to override the `timeout` and `max_retries` parameters respectively
1854  in the OpenAI client.
1855  
1856  **Arguments**:
1857  
1858  - `api_key`: The OpenAI API key.
1859  You can set it with an environment variable `OPENAI_API_KEY`, or pass with this parameter
1860  during initialization.
1861  - `model`: The name of the model to use.
1862  - `streaming_callback`: A callback function that is called when a new token is received from the stream.
1863  The callback function accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
1864  as an argument.
1865  - `api_base_url`: An optional base URL.
1866  - `organization`: Your organization ID, defaults to `None`. See
1867  [production best practices](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
1868  - `generation_kwargs`: Other parameters to use for the model. These parameters are sent directly to
1869  the OpenAI endpoint. See OpenAI [documentation](https://platform.openai.com/docs/api-reference/chat) for
1870  more details.
1871  Some of the supported parameters:
1872  - `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion,
1873      including visible output tokens and reasoning tokens.
1874  - `temperature`: What sampling temperature to use. Higher values mean the model will take more risks.
1875      Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
1876  - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
1877      considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
1878      comprising the top 10% probability mass are considered.
1879  - `n`: How many completions to generate for each prompt. For example, if the LLM gets 3 prompts and n is 2,
1880      it will generate two completions for each of the three prompts, ending up with 6 completions in total.
1881  - `stop`: One or more sequences after which the LLM should stop generating tokens.
1882  - `presence_penalty`: What penalty to apply if a token is already present at all. Bigger values mean
1883      the model will be less likely to repeat the same token in the text.
1884  - `frequency_penalty`: What penalty to apply if a token has already been generated in the text.
1885      Bigger values mean the model will be less likely to repeat the same token in the text.
1886  - `logit_bias`: Add a logit bias to specific tokens. The keys of the dictionary are tokens, and the
1887      values are the bias to add to that token.
1888  - `response_format`: A JSON schema or a Pydantic model that enforces the structure of the model's response.
1889      If provided, the output will always be validated against this
1890      format (unless the model returns a tool call).
1891      For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
1892      Notes:
1893      - This parameter accepts Pydantic models and JSON schemas for latest models starting from GPT-4o.
1894        Older models only support basic version of structured outputs through `{"type": "json_object"}`.
1895        For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
1896      - For structured outputs with streaming,
1897        the `response_format` must be a JSON schema and not a Pydantic model.
1898  - `timeout`: Timeout for OpenAI client calls. If not set, it defaults to either the
1899  `OPENAI_TIMEOUT` environment variable, or 30 seconds.
1900  - `max_retries`: Maximum number of retries to contact OpenAI after an internal error.
1901  If not set, it defaults to either the `OPENAI_MAX_RETRIES` environment variable, or set to 5.
1902  - `tools`: A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
1903  - `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
1904  the schema provided in the `parameters` field of the tool definition, but this may increase latency.
1905  - `http_client_kwargs`: A dictionary of keyword arguments to configure a custom `httpx.Client`or `httpx.AsyncClient`.
1906  For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/`client`).
1907  
1908  <a id="chat/openai.OpenAIChatGenerator.warm_up"></a>
1909  
1910  #### OpenAIChatGenerator.warm\_up
1911  
1912  ```python
1913  def warm_up()
1914  ```
1915  
1916  Warm up the OpenAI chat generator.
1917  
1918  This will warm up the tools registered in the chat generator.
1919  This method is idempotent and will only warm up the tools once.
1920  
1921  <a id="chat/openai.OpenAIChatGenerator.to_dict"></a>
1922  
1923  #### OpenAIChatGenerator.to\_dict
1924  
1925  ```python
1926  def to_dict() -> dict[str, Any]
1927  ```
1928  
1929  Serialize this component to a dictionary.
1930  
1931  **Returns**:
1932  
1933  The serialized component as a dictionary.
1934  
1935  <a id="chat/openai.OpenAIChatGenerator.from_dict"></a>
1936  
1937  #### OpenAIChatGenerator.from\_dict
1938  
1939  ```python
1940  @classmethod
1941  def from_dict(cls, data: dict[str, Any]) -> "OpenAIChatGenerator"
1942  ```
1943  
1944  Deserialize this component from a dictionary.
1945  
1946  **Arguments**:
1947  
1948  - `data`: The dictionary representation of this component.
1949  
1950  **Returns**:
1951  
1952  The deserialized component instance.
1953  
1954  <a id="chat/openai.OpenAIChatGenerator.run"></a>
1955  
1956  #### OpenAIChatGenerator.run
1957  
1958  ```python
1959  @component.output_types(replies=list[ChatMessage])
1960  def run(messages: list[ChatMessage],
1961          streaming_callback: Optional[StreamingCallbackT] = None,
1962          generation_kwargs: Optional[dict[str, Any]] = None,
1963          *,
1964          tools: Optional[ToolsType] = None,
1965          tools_strict: Optional[bool] = None)
1966  ```
1967  
1968  Invokes chat completion based on the provided messages and generation parameters.
1969  
1970  **Arguments**:
1971  
1972  - `messages`: A list of ChatMessage instances representing the input messages.
1973  - `streaming_callback`: A callback function that is called when a new token is received from the stream.
1974  - `generation_kwargs`: Additional keyword arguments for text generation. These parameters will
1975  override the parameters passed during component initialization.
1976  For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat/create).
1977  - `tools`: A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
1978  If set, it will override the `tools` parameter provided during initialization.
1979  - `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
1980  the schema provided in the `parameters` field of the tool definition, but this may increase latency.
1981  If set, it will override the `tools_strict` parameter set during component initialization.
1982  
1983  **Returns**:
1984  
1985  A dictionary with the following key:
1986  - `replies`: A list containing the generated responses as ChatMessage instances.
1987  
1988  <a id="chat/openai.OpenAIChatGenerator.run_async"></a>
1989  
1990  #### OpenAIChatGenerator.run\_async
1991  
1992  ```python
1993  @component.output_types(replies=list[ChatMessage])
1994  async def run_async(messages: list[ChatMessage],
1995                      streaming_callback: Optional[StreamingCallbackT] = None,
1996                      generation_kwargs: Optional[dict[str, Any]] = None,
1997                      *,
1998                      tools: Optional[ToolsType] = None,
1999                      tools_strict: Optional[bool] = None)
2000  ```
2001  
2002  Asynchronously invokes chat completion based on the provided messages and generation parameters.
2003  
2004  This is the asynchronous version of the `run` method. It has the same parameters and return values
2005  but can be used with `await` in async code.
2006  
2007  **Arguments**:
2008  
2009  - `messages`: A list of ChatMessage instances representing the input messages.
2010  - `streaming_callback`: A callback function that is called when a new token is received from the stream.
2011  Must be a coroutine.
2012  - `generation_kwargs`: Additional keyword arguments for text generation. These parameters will
2013  override the parameters passed during component initialization.
2014  For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat/create).
2015  - `tools`: A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
2016  If set, it will override the `tools` parameter provided during initialization.
2017  - `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
2018  the schema provided in the `parameters` field of the tool definition, but this may increase latency.
2019  If set, it will override the `tools_strict` parameter set during component initialization.
2020  
2021  **Returns**:
2022  
2023  A dictionary with the following key:
2024  - `replies`: A list containing the generated responses as ChatMessage instances.
2025  
2026  <a id="chat/openai_responses"></a>
2027  
2028  ## Module chat/openai\_responses
2029  
2030  <a id="chat/openai_responses.OpenAIResponsesChatGenerator"></a>
2031  
2032  ### OpenAIResponsesChatGenerator
2033  
2034  Completes chats using OpenAI's Responses API.
2035  
2036  It works with the gpt-4 and o-series models and supports streaming responses
2037  from OpenAI API. It uses [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
2038  format in input and output.
2039  
2040  You can customize how the text is generated by passing parameters to the
2041  OpenAI API. Use the `**generation_kwargs` argument when you initialize
2042  the component or when you run it. Any parameter that works with
2043  `openai.Responses.create` will work here too.
2044  
2045  For details on OpenAI API parameters, see
2046  [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses).
2047  
2048  ### Usage example
2049  
2050  ```python
2051  from haystack.components.generators.chat import OpenAIResponsesChatGenerator
2052  from haystack.dataclasses import ChatMessage
2053  
2054  messages = [ChatMessage.from_user("What's Natural Language Processing?")]
2055  
2056  client = OpenAIResponsesChatGenerator(generation_kwargs={"reasoning": {"effort": "low", "summary": "auto"}})
2057  response = client.run(messages)
2058  print(response)
2059  ```
2060  
2061  <a id="chat/openai_responses.OpenAIResponsesChatGenerator.__init__"></a>
2062  
2063  #### OpenAIResponsesChatGenerator.\_\_init\_\_
2064  
2065  ```python
2066  def __init__(*,
2067               api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
2068               model: str = "gpt-5-mini",
2069               streaming_callback: Optional[StreamingCallbackT] = None,
2070               api_base_url: Optional[str] = None,
2071               organization: Optional[str] = None,
2072               generation_kwargs: Optional[dict[str, Any]] = None,
2073               timeout: Optional[float] = None,
2074               max_retries: Optional[int] = None,
2075               tools: Optional[Union[ToolsType, list[dict]]] = None,
2076               tools_strict: bool = False,
2077               http_client_kwargs: Optional[dict[str, Any]] = None)
2078  ```
2079  
2080  Creates an instance of OpenAIResponsesChatGenerator. Uses OpenAI's gpt-5-mini by default.
2081  
2082  Before initializing the component, you can set the 'OPENAI_TIMEOUT' and 'OPENAI_MAX_RETRIES'
2083  environment variables to override the `timeout` and `max_retries` parameters respectively
2084  in the OpenAI client.
2085  
2086  **Arguments**:
2087  
2088  - `api_key`: The OpenAI API key.
2089  You can set it with an environment variable `OPENAI_API_KEY`, or pass with this parameter
2090  during initialization.
2091  - `model`: The name of the model to use.
2092  - `streaming_callback`: A callback function that is called when a new token is received from the stream.
2093  The callback function accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
2094  as an argument.
2095  - `api_base_url`: An optional base URL.
2096  - `organization`: Your organization ID, defaults to `None`. See
2097  [production best practices](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
2098  - `generation_kwargs`: Other parameters to use for the model. These parameters are sent
2099  directly to the OpenAI endpoint.
2100  See OpenAI [documentation](https://platform.openai.com/docs/api-reference/responses) for
2101   more details.
2102   Some of the supported parameters:
2103   - `temperature`: What sampling temperature to use. Higher values like 0.8 will make the output more random,
2104       while lower values like 0.2 will make it more focused and deterministic.
2105   - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
2106       considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
2107       comprising the top 10% probability mass are considered.
2108   - `previous_response_id`: The ID of the previous response.
2109       Use this to create multi-turn conversations.
2110   - `text_format`: A Pydantic model that enforces the structure of the model's response.
2111       If provided, the output will always be validated against this
2112       format (unless the model returns a tool call).
2113       For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
2114   - `text`: A JSON schema that enforces the structure of the model's response.
2115       If provided, the output will always be validated against this
2116       format (unless the model returns a tool call).
2117       Notes:
2118       - Both JSON Schema and Pydantic models are supported for latest models starting from GPT-4o.
2119       - If both are provided, `text_format` takes precedence and json schema passed to `text` is ignored.
2120       - Currently, this component doesn't support streaming for structured outputs.
2121       - Older models only support basic version of structured outputs through `{"type": "json_object"}`.
2122           For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
2123   - `reasoning`: A dictionary of parameters for reasoning. For example:
2124       - `summary`: The summary of the reasoning.
2125       - `effort`: The level of effort to put into the reasoning. Can be `low`, `medium` or `high`.
2126       - `generate_summary`: Whether to generate a summary of the reasoning.
2127       Note: OpenAI does not return the reasoning tokens, but we can view summary if its enabled.
2128       For details, see the [OpenAI Reasoning documentation](https://platform.openai.com/docs/guides/reasoning).
2129  - `timeout`: Timeout for OpenAI client calls. If not set, it defaults to either the
2130  `OPENAI_TIMEOUT` environment variable, or 30 seconds.
2131  - `max_retries`: Maximum number of retries to contact OpenAI after an internal error.
2132  If not set, it defaults to either the `OPENAI_MAX_RETRIES` environment variable, or set to 5.
2133  - `tools`: The tools that the model can use to prepare calls. This parameter can accept either a
2134  mixed list of Haystack `Tool` objects and Haystack `Toolset`. Or you can pass a dictionary of
2135  OpenAI/MCP tool definitions.
2136  Note: You cannot pass OpenAI/MCP tools and Haystack tools together.
2137  For details on tool support, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create#responses-create-tools).
2138  - `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `False`, the model may not exactly
2139  follow the schema provided in the `parameters` field of the tool definition. In Response API, tool calls
2140  are strict by default.
2141  - `http_client_kwargs`: A dictionary of keyword arguments to configure a custom `httpx.Client`or `httpx.AsyncClient`.
2142  For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/`client`).
2143  
2144  <a id="chat/openai_responses.OpenAIResponsesChatGenerator.warm_up"></a>
2145  
2146  #### OpenAIResponsesChatGenerator.warm\_up
2147  
2148  ```python
2149  def warm_up()
2150  ```
2151  
2152  Warm up the OpenAI responses chat generator.
2153  
2154  This will warm up the tools registered in the chat generator.
2155  This method is idempotent and will only warm up the tools once.
2156  
2157  <a id="chat/openai_responses.OpenAIResponsesChatGenerator.to_dict"></a>
2158  
2159  #### OpenAIResponsesChatGenerator.to\_dict
2160  
2161  ```python
2162  def to_dict() -> dict[str, Any]
2163  ```
2164  
2165  Serialize this component to a dictionary.
2166  
2167  **Returns**:
2168  
2169  The serialized component as a dictionary.
2170  
2171  <a id="chat/openai_responses.OpenAIResponsesChatGenerator.from_dict"></a>
2172  
2173  #### OpenAIResponsesChatGenerator.from\_dict
2174  
2175  ```python
2176  @classmethod
2177  def from_dict(cls, data: dict[str, Any]) -> "OpenAIResponsesChatGenerator"
2178  ```
2179  
2180  Deserialize this component from a dictionary.
2181  
2182  **Arguments**:
2183  
2184  - `data`: The dictionary representation of this component.
2185  
2186  **Returns**:
2187  
2188  The deserialized component instance.
2189  
2190  <a id="chat/openai_responses.OpenAIResponsesChatGenerator.run"></a>
2191  
2192  #### OpenAIResponsesChatGenerator.run
2193  
2194  ```python
2195  @component.output_types(replies=list[ChatMessage])
2196  def run(messages: list[ChatMessage],
2197          *,
2198          streaming_callback: Optional[StreamingCallbackT] = None,
2199          generation_kwargs: Optional[dict[str, Any]] = None,
2200          tools: Optional[Union[ToolsType, list[dict]]] = None,
2201          tools_strict: Optional[bool] = None)
2202  ```
2203  
2204  Invokes response generation based on the provided messages and generation parameters.
2205  
2206  **Arguments**:
2207  
2208  - `messages`: A list of ChatMessage instances representing the input messages.
2209  - `streaming_callback`: A callback function that is called when a new token is received from the stream.
2210  - `generation_kwargs`: Additional keyword arguments for text generation. These parameters will
2211  override the parameters passed during component initialization.
2212  For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create).
2213  - `tools`: The tools that the model can use to prepare calls. If set, it will override the
2214  `tools` parameter set during component initialization. This parameter can accept either a
2215  mixed list of Haystack `Tool` objects and Haystack `Toolset`. Or you can pass a dictionary of
2216  OpenAI/MCP tool definitions.
2217  Note: You cannot pass OpenAI/MCP tools and Haystack tools together.
2218  For details on tool support, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create#responses-create-tools).
2219  - `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `False`, the model may not exactly
2220  follow the schema provided in the `parameters` field of the tool definition. In Response API, tool calls
2221  are strict by default.
2222  If set, it will override the `tools_strict` parameter set during component initialization.
2223  
2224  **Returns**:
2225  
2226  A dictionary with the following key:
2227  - `replies`: A list containing the generated responses as ChatMessage instances.
2228  
2229  <a id="chat/openai_responses.OpenAIResponsesChatGenerator.run_async"></a>
2230  
2231  #### OpenAIResponsesChatGenerator.run\_async
2232  
2233  ```python
2234  @component.output_types(replies=list[ChatMessage])
2235  async def run_async(messages: list[ChatMessage],
2236                      *,
2237                      streaming_callback: Optional[StreamingCallbackT] = None,
2238                      generation_kwargs: Optional[dict[str, Any]] = None,
2239                      tools: Optional[Union[ToolsType, list[dict]]] = None,
2240                      tools_strict: Optional[bool] = None)
2241  ```
2242  
2243  Asynchronously invokes response generation based on the provided messages and generation parameters.
2244  
2245  This is the asynchronous version of the `run` method. It has the same parameters and return values
2246  but can be used with `await` in async code.
2247  
2248  **Arguments**:
2249  
2250  - `messages`: A list of ChatMessage instances representing the input messages.
2251  - `streaming_callback`: A callback function that is called when a new token is received from the stream.
2252  Must be a coroutine.
2253  - `generation_kwargs`: Additional keyword arguments for text generation. These parameters will
2254  override the parameters passed during component initialization.
2255  For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create).
2256  - `tools`: A list of tools or a Toolset for which the model can prepare calls. If set, it will override the
2257  `tools` parameter set during component initialization. This parameter can accept either a list of
2258  mixed list of Haystack `Tool` objects and Haystack `Toolset`. Or you can pass a dictionary of
2259  OpenAI/MCP tool definitions.
2260  Note: You cannot pass OpenAI/MCP tools and Haystack tools together.
2261  - `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
2262  the schema provided in the `parameters` field of the tool definition, but this may increase latency.
2263  If set, it will override the `tools_strict` parameter set during component initialization.
2264  
2265  **Returns**:
2266  
2267  A dictionary with the following key:
2268  - `replies`: A list containing the generated responses as ChatMessage instances.
2269  
2270  <a id="chat/fallback"></a>
2271  
2272  ## Module chat/fallback
2273  
2274  <a id="chat/fallback.FallbackChatGenerator"></a>
2275  
2276  ### FallbackChatGenerator
2277  
2278  A chat generator wrapper that tries multiple chat generators sequentially.
2279  
2280  It forwards all parameters transparently to the underlying chat generators and returns the first successful result.
2281  Calls chat generators sequentially until one succeeds. Falls back on any exception raised by a generator.
2282  If all chat generators fail, it raises a RuntimeError with details.
2283  
2284  Timeout enforcement is fully delegated to the underlying chat generators. The fallback mechanism will only
2285  work correctly if the underlying chat generators implement proper timeout handling and raise exceptions
2286  when timeouts occur. For predictable latency guarantees, ensure your chat generators:
2287  - Support a `timeout` parameter in their initialization
2288  - Implement timeout as total wall-clock time (shared deadline for both streaming and non-streaming)
2289  - Raise timeout exceptions (e.g., TimeoutError, asyncio.TimeoutError, httpx.TimeoutException) when exceeded
2290  
2291  Note: Most well-implemented chat generators (OpenAI, Anthropic, Cohere, etc.) support timeout parameters
2292  with consistent semantics. For HTTP-based LLM providers, a single timeout value (e.g., `timeout=30`)
2293  typically applies to all connection phases: connection setup, read, write, and pool. For streaming
2294  responses, read timeout is the maximum gap between chunks. For non-streaming, it's the time limit for
2295  receiving the complete response.
2296  
2297  Failover is automatically triggered when a generator raises any exception, including:
2298  - Timeout errors (if the generator implements and raises them)
2299  - Rate limit errors (429)
2300  - Authentication errors (401)
2301  - Context length errors (400)
2302  - Server errors (500+)
2303  - Any other exception
2304  
2305  <a id="chat/fallback.FallbackChatGenerator.__init__"></a>
2306  
2307  #### FallbackChatGenerator.\_\_init\_\_
2308  
2309  ```python
2310  def __init__(chat_generators: list[ChatGenerator])
2311  ```
2312  
2313  Creates an instance of FallbackChatGenerator.
2314  
2315  **Arguments**:
2316  
2317  - `chat_generators`: A non-empty list of chat generator components to try in order.
2318  
2319  <a id="chat/fallback.FallbackChatGenerator.to_dict"></a>
2320  
2321  #### FallbackChatGenerator.to\_dict
2322  
2323  ```python
2324  def to_dict() -> dict[str, Any]
2325  ```
2326  
2327  Serialize the component, including nested chat generators when they support serialization.
2328  
2329  <a id="chat/fallback.FallbackChatGenerator.from_dict"></a>
2330  
2331  #### FallbackChatGenerator.from\_dict
2332  
2333  ```python
2334  @classmethod
2335  def from_dict(cls, data: dict[str, Any]) -> FallbackChatGenerator
2336  ```
2337  
2338  Rebuild the component from a serialized representation, restoring nested chat generators.
2339  
2340  <a id="chat/fallback.FallbackChatGenerator.warm_up"></a>
2341  
2342  #### FallbackChatGenerator.warm\_up
2343  
2344  ```python
2345  def warm_up() -> None
2346  ```
2347  
2348  Warm up all underlying chat generators.
2349  
2350  This method calls warm_up() on each underlying generator that supports it.
2351  
2352  <a id="chat/fallback.FallbackChatGenerator.run"></a>
2353  
2354  #### FallbackChatGenerator.run
2355  
2356  ```python
2357  @component.output_types(replies=list[ChatMessage], meta=dict[str, Any])
2358  def run(
2359      messages: list[ChatMessage],
2360      generation_kwargs: Union[dict[str, Any], None] = None,
2361      tools: Optional[ToolsType] = None,
2362      streaming_callback: Union[StreamingCallbackT,
2363                                None] = None) -> dict[str, Any]
2364  ```
2365  
2366  Execute chat generators sequentially until one succeeds.
2367  
2368  **Arguments**:
2369  
2370  - `messages`: The conversation history as a list of ChatMessage instances.
2371  - `generation_kwargs`: Optional parameters for the chat generator (e.g., temperature, max_tokens).
2372  - `tools`: A list of Tool and/or Toolset objects, or a single Toolset for function calling capabilities.
2373  - `streaming_callback`: Optional callable for handling streaming responses.
2374  
2375  **Raises**:
2376  
2377  - `RuntimeError`: If all chat generators fail.
2378  
2379  **Returns**:
2380  
2381  A dictionary with:
2382  - "replies": Generated ChatMessage instances from the first successful generator.
2383  - "meta": Execution metadata including successful_chat_generator_index, successful_chat_generator_class,
2384    total_attempts, failed_chat_generators, plus any metadata from the successful generator.
2385  
2386  <a id="chat/fallback.FallbackChatGenerator.run_async"></a>
2387  
2388  #### FallbackChatGenerator.run\_async
2389  
2390  ```python
2391  @component.output_types(replies=list[ChatMessage], meta=dict[str, Any])
2392  async def run_async(
2393      messages: list[ChatMessage],
2394      generation_kwargs: Union[dict[str, Any], None] = None,
2395      tools: Optional[ToolsType] = None,
2396      streaming_callback: Union[StreamingCallbackT,
2397                                None] = None) -> dict[str, Any]
2398  ```
2399  
2400  Asynchronously execute chat generators sequentially until one succeeds.
2401  
2402  **Arguments**:
2403  
2404  - `messages`: The conversation history as a list of ChatMessage instances.
2405  - `generation_kwargs`: Optional parameters for the chat generator (e.g., temperature, max_tokens).
2406  - `tools`: A list of Tool and/or Toolset objects, or a single Toolset for function calling capabilities.
2407  - `streaming_callback`: Optional callable for handling streaming responses.
2408  
2409  **Raises**:
2410  
2411  - `RuntimeError`: If all chat generators fail.
2412  
2413  **Returns**:
2414  
2415  A dictionary with:
2416  - "replies": Generated ChatMessage instances from the first successful generator.
2417  - "meta": Execution metadata including successful_chat_generator_index, successful_chat_generator_class,
2418    total_attempts, failed_chat_generators, plus any metadata from the successful generator.
2419