---
title: "Generators"
id: generators-api
description: "Enables text generation using LLMs."
slug: "/generators-api"
---

<a id="azure"></a>

## Module azure

<a id="azure.AzureOpenAIGenerator"></a>

### AzureOpenAIGenerator

Generates text using OpenAI's large language models (LLMs).

It works with gpt-4 type models and supports streaming responses
from the OpenAI API.

You can customize how the text is generated by passing parameters to the
OpenAI API. Use the `**generation_kwargs` argument when you initialize
the component or when you run it. Any parameter that works with
`openai.ChatCompletion.create` will work here too.

For details on OpenAI API parameters, see
[OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).

### Usage example

```python
from haystack.components.generators import AzureOpenAIGenerator
from haystack.utils import Secret

client = AzureOpenAIGenerator(
    azure_endpoint="<Your Azure endpoint, e.g. https://your-company.openai.azure.com/>",
    api_key=Secret.from_token("<your-api-key>"),
    azure_deployment="<your deployment name, e.g. gpt-4o-mini>")
response = client.run("What's Natural Language Processing? Be brief.")
print(response)
```

```
>> {'replies': ['Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on
>> the interaction between computers and human language. It involves enabling computers to understand, interpret,
>> and respond to natural human language in a way that is both meaningful and useful.'], 'meta': [{'model':
>> 'gpt-4o-mini', 'index': 0, 'finish_reason': 'stop', 'usage': {'prompt_tokens': 16,
>> 'completion_tokens': 49, 'total_tokens': 65}}]}
```

<a id="azure.AzureOpenAIGenerator.__init__"></a>

#### AzureOpenAIGenerator.\_\_init\_\_

```python
def __init__(azure_endpoint: Optional[str] = None,
             api_version: Optional[str] = "2023-05-15",
             azure_deployment: Optional[str] = "gpt-4o-mini",
             api_key: Optional[Secret] = Secret.from_env_var(
                 "AZURE_OPENAI_API_KEY", strict=False),
             azure_ad_token: Optional[Secret] = Secret.from_env_var(
                 "AZURE_OPENAI_AD_TOKEN", strict=False),
             organization: Optional[str] = None,
             streaming_callback: Optional[StreamingCallbackT] = None,
             system_prompt: Optional[str] = None,
             timeout: Optional[float] = None,
             max_retries: Optional[int] = None,
             http_client_kwargs: Optional[dict[str, Any]] = None,
             generation_kwargs: Optional[dict[str, Any]] = None,
             default_headers: Optional[dict[str, str]] = None,
             *,
             azure_ad_token_provider: Optional[AzureADTokenProvider] = None)
```

Initialize the Azure OpenAI Generator.

**Arguments**:

- `azure_endpoint`: The endpoint of the deployed model, for example `https://example-resource.openai.azure.com/`.
- `api_version`: The version of the API to use. Defaults to `2023-05-15`.
- `azure_deployment`: The deployment of the model, usually the model name.
- `api_key`: The API key to use for authentication.
- `azure_ad_token`: [Azure Active Directory token](https://www.microsoft.com/en-us/security/business/identity-access/microsoft-entra-id).
- `organization`: Your organization ID, defaults to `None`. For help, see
[Setting up your organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
- `streaming_callback`: A callback function called when a new token is received from the stream.
It accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
as an argument.
- `system_prompt`: The system prompt to use for text generation. If not provided, the system prompt is
omitted, and the model's default behavior is used.
- `timeout`: Timeout for the AzureOpenAI client. If not set, it is inferred from the
`OPENAI_TIMEOUT` environment variable or set to 30.
- `max_retries`: Maximum number of retries to establish contact with AzureOpenAI if it returns an internal error.
If not set, it is inferred from the `OPENAI_MAX_RETRIES` environment variable or set to 5.
- `http_client_kwargs`: A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).
- `generation_kwargs`: Other parameters to use for the model, sent directly to
the OpenAI endpoint. See the [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat) for
more details.
Some of the supported parameters:
  - `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion,
  including visible output tokens and reasoning tokens.
  - `temperature`: The sampling temperature to use. Higher values mean the model takes more risks.
  Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
  - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
  considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
  comprising the top 10% probability mass are considered.
  - `n`: The number of completions to generate for each prompt. For example, with 3 prompts and n=2,
  the LLM generates two completions per prompt, resulting in 6 completions total.
  - `stop`: One or more sequences after which the LLM should stop generating tokens.
  - `presence_penalty`: The penalty applied if a token is already present.
  Higher values make the model less likely to repeat the token.
  - `frequency_penalty`: The penalty applied if a token has already been generated.
  Higher values make the model less likely to repeat the token.
  - `logit_bias`: Adds a logit bias to specific tokens. The keys of the dictionary are tokens, and the
  values are the bias to add to that token.
- `default_headers`: Default headers to use for the AzureOpenAI client.
- `azure_ad_token_provider`: A function that returns an Azure Active Directory token, invoked on
every request.

<a id="azure.AzureOpenAIGenerator.to_dict"></a>

#### AzureOpenAIGenerator.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns**:

The serialized component as a dictionary.

<a id="azure.AzureOpenAIGenerator.from_dict"></a>

#### AzureOpenAIGenerator.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "AzureOpenAIGenerator"
```

Deserialize this component from a dictionary.

**Arguments**:

- `data`: The dictionary representation of this component.

**Returns**:

The deserialized component instance.
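Init-time and run-time `generation_kwargs` are merged when the component runs, with run-time values taking precedence. A minimal sketch of that precedence rule using plain dictionaries (illustrative only, not Haystack's internal code):

```python
# Illustrative sketch: how run-time generation_kwargs override
# init-time defaults. This mimics the documented precedence; it is
# not Haystack's actual implementation.

init_kwargs = {"temperature": 0.2, "max_completion_tokens": 128}

def merge_generation_kwargs(init_kwargs, runtime_kwargs=None):
    """Run-time values win over init-time values for the same key."""
    merged = dict(init_kwargs)
    merged.update(runtime_kwargs or {})
    return merged

# A run-time override of temperature leaves other defaults intact:
merged = merge_generation_kwargs(init_kwargs, {"temperature": 0.9})
print(merged)  # {'temperature': 0.9, 'max_completion_tokens': 128}
```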
<a id="azure.AzureOpenAIGenerator.run"></a>

#### AzureOpenAIGenerator.run

```python
@component.output_types(replies=list[str], meta=list[dict[str, Any]])
def run(prompt: str,
        system_prompt: Optional[str] = None,
        streaming_callback: Optional[StreamingCallbackT] = None,
        generation_kwargs: Optional[dict[str, Any]] = None)
```

Invoke the text generation inference based on the provided prompt and generation parameters.

**Arguments**:

- `prompt`: The string prompt to use for text generation.
- `system_prompt`: The system prompt to use for text generation. If this run-time system prompt is omitted, the system
prompt defined at initialization time, if any, is used.
- `streaming_callback`: A callback function that is called when a new token is received from the stream.
- `generation_kwargs`: Additional keyword arguments for text generation. These parameters potentially override the parameters
passed in the `__init__` method. For more details on the parameters supported by the OpenAI API, refer to
the OpenAI [documentation](https://platform.openai.com/docs/api-reference/chat/create).

**Returns**:

A list of strings containing the generated responses and a list of dictionaries containing the metadata
for each response.

<a id="hugging_face_local"></a>

## Module hugging\_face\_local

<a id="hugging_face_local.HuggingFaceLocalGenerator"></a>

### HuggingFaceLocalGenerator

Generates text using models from Hugging Face that run locally.

LLMs running locally may need powerful hardware.
### Usage example

```python
from haystack.components.generators import HuggingFaceLocalGenerator

generator = HuggingFaceLocalGenerator(
    model="google/flan-t5-large",
    task="text2text-generation",
    generation_kwargs={"max_new_tokens": 100, "temperature": 0.9})

generator.warm_up()

print(generator.run("Who is the best American actor?"))
# {'replies': ['John Cusack']}
```

<a id="hugging_face_local.HuggingFaceLocalGenerator.__init__"></a>

#### HuggingFaceLocalGenerator.\_\_init\_\_

```python
def __init__(model: str = "google/flan-t5-base",
             task: Optional[Literal["text-generation",
                                    "text2text-generation"]] = None,
             device: Optional[ComponentDevice] = None,
             token: Optional[Secret] = Secret.from_env_var(
                 ["HF_API_TOKEN", "HF_TOKEN"], strict=False),
             generation_kwargs: Optional[dict[str, Any]] = None,
             huggingface_pipeline_kwargs: Optional[dict[str, Any]] = None,
             stop_words: Optional[list[str]] = None,
             streaming_callback: Optional[StreamingCallbackT] = None)
```

Creates an instance of a HuggingFaceLocalGenerator.

**Arguments**:

- `model`: The Hugging Face text generation model name or path.
- `task`: The task for the Hugging Face pipeline. Possible options:
  - `text-generation`: Supported by decoder models, like GPT.
  - `text2text-generation`: Supported by encoder-decoder models, like T5.
If the task is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
If not specified, the component calls the Hugging Face API to infer the task from the model name.
- `device`: The device for loading the model. If `None`, automatically selects the default device.
If a device or device map is specified in `huggingface_pipeline_kwargs`, it overrides this parameter.
- `token`: The token to use as HTTP bearer authorization for remote files.
If the token is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
- `generation_kwargs`: A dictionary with keyword arguments to customize text generation.
Some examples: `max_length`, `max_new_tokens`, `temperature`, `top_k`, `top_p`.
For more information, see Hugging Face's documentation:
  - [customize-text-generation](https://huggingface.co/docs/transformers/main/en/generation_strategies#customize-text-generation)
  - [transformers.GenerationConfig](https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.GenerationConfig)
- `huggingface_pipeline_kwargs`: Dictionary with keyword arguments to initialize the
Hugging Face pipeline for text generation.
These keyword arguments provide fine-grained control over the Hugging Face pipeline.
In case of duplication, these kwargs override the `model`, `task`, `device`, and `token` init parameters.
For available kwargs, see the [Hugging Face documentation](https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.pipeline.task).
In this dictionary, you can also include `model_kwargs` to specify the kwargs for model initialization:
[transformers.PreTrainedModel.from_pretrained](https://huggingface.co/docs/transformers/en/main_classes/model#transformers.PreTrainedModel.from_pretrained)
- `stop_words`: If the model generates a stop word, the generation stops.
If you provide this parameter, don't specify `stopping_criteria` in `generation_kwargs`.
For some chat models, the output includes both the new text and the original prompt.
In these cases, make sure your prompt has no stop words.
- `streaming_callback`: An optional callable for handling streaming responses.

<a id="hugging_face_local.HuggingFaceLocalGenerator.warm_up"></a>

#### HuggingFaceLocalGenerator.warm\_up

```python
def warm_up()
```

Initializes the component.
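Loading the local model is deferred until `warm_up` is called, so constructing the component stays cheap. That lazy-initialization pattern can be sketched as follows (an illustrative stand-in with a hypothetical `_pipeline` attribute, not Haystack's actual class):

```python
# Illustrative sketch of the warm-up pattern used by components that
# load heavy models lazily. Hypothetical names; not Haystack's code.

class LazyGenerator:
    def __init__(self, model: str):
        self.model = model
        self._pipeline = None  # nothing heavy is loaded yet

    def warm_up(self):
        # The real component would build a transformers pipeline here;
        # this sketch just records that loading happened.
        if self._pipeline is None:
            self._pipeline = f"loaded:{self.model}"

    def run(self, prompt: str) -> dict:
        if self._pipeline is None:
            raise RuntimeError("Component not warmed up. Call warm_up() before run().")
        return {"replies": [f"echo: {prompt}"]}

generator = LazyGenerator("google/flan-t5-base")
generator.warm_up()
result = generator.run("hello")
print(result)  # {'replies': ['echo: hello']}
```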
<a id="hugging_face_local.HuggingFaceLocalGenerator.to_dict"></a>

#### HuggingFaceLocalGenerator.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns**:

Dictionary with serialized data.

<a id="hugging_face_local.HuggingFaceLocalGenerator.from_dict"></a>

#### HuggingFaceLocalGenerator.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "HuggingFaceLocalGenerator"
```

Deserializes the component from a dictionary.

**Arguments**:

- `data`: The dictionary to deserialize from.

**Returns**:

The deserialized component.

<a id="hugging_face_local.HuggingFaceLocalGenerator.run"></a>

#### HuggingFaceLocalGenerator.run

```python
@component.output_types(replies=list[str])
def run(prompt: str,
        streaming_callback: Optional[StreamingCallbackT] = None,
        generation_kwargs: Optional[dict[str, Any]] = None)
```

Run the text generation model on the given prompt.

**Arguments**:

- `prompt`: A string representing the prompt.
- `streaming_callback`: A callback function that is called when a new token is received from the stream.
- `generation_kwargs`: Additional keyword arguments for text generation.

**Returns**:

A dictionary containing the generated replies.
- `replies`: A list of strings representing the generated replies.

<a id="hugging_face_api"></a>

## Module hugging\_face\_api

<a id="hugging_face_api.HuggingFaceAPIGenerator"></a>

### HuggingFaceAPIGenerator

Generates text using Hugging Face APIs.
Use it with the following Hugging Face APIs:
- [Paid Inference Endpoints](https://huggingface.co/inference-endpoints)
- [Self-hosted Text Generation Inference](https://github.com/huggingface/text-generation-inference)

**Note:** As of July 2025, the Hugging Face Inference API no longer offers generative models through the
`text_generation` endpoint. Generative models are now only available through providers supporting the
`chat_completion` endpoint. As a result, this component might no longer work with the Hugging Face Inference API.
Use the `HuggingFaceAPIChatGenerator` component, which supports the `chat_completion` endpoint.

### Usage examples

#### With Hugging Face Inference Endpoints

```python
from haystack.components.generators import HuggingFaceAPIGenerator
from haystack.utils import Secret

generator = HuggingFaceAPIGenerator(api_type="inference_endpoints",
                                    api_params={"url": "<your-inference-endpoint-url>"},
                                    token=Secret.from_token("<your-api-key>"))

result = generator.run(prompt="What's Natural Language Processing?")
print(result)
```

#### With self-hosted text generation inference

```python
from haystack.components.generators import HuggingFaceAPIGenerator

generator = HuggingFaceAPIGenerator(api_type="text_generation_inference",
                                    api_params={"url": "http://localhost:8080"})

result = generator.run(prompt="What's Natural Language Processing?")
print(result)
```

#### With the free serverless inference API

Be aware that this example might not work, as the Hugging Face Inference API no longer offers models that support the
`text_generation` endpoint. Use the `HuggingFaceAPIChatGenerator` for generative models through the
`chat_completion` endpoint.

```python
from haystack.components.generators import HuggingFaceAPIGenerator
from haystack.utils import Secret

generator = HuggingFaceAPIGenerator(api_type="serverless_inference_api",
                                    api_params={"model": "HuggingFaceH4/zephyr-7b-beta"},
                                    token=Secret.from_token("<your-api-key>"))

result = generator.run(prompt="What's Natural Language Processing?")
print(result)
```

<a id="hugging_face_api.HuggingFaceAPIGenerator.__init__"></a>

#### HuggingFaceAPIGenerator.\_\_init\_\_

```python
def __init__(api_type: Union[HFGenerationAPIType, str],
             api_params: dict[str, str],
             token: Optional[Secret] = Secret.from_env_var(
                 ["HF_API_TOKEN", "HF_TOKEN"], strict=False),
             generation_kwargs: Optional[dict[str, Any]] = None,
             stop_words: Optional[list[str]] = None,
             streaming_callback: Optional[StreamingCallbackT] = None)
```

Initialize the HuggingFaceAPIGenerator instance.

**Arguments**:

- `api_type`: The type of Hugging Face API to use. Available types:
  - `text_generation_inference`: See [TGI](https://github.com/huggingface/text-generation-inference).
  - `inference_endpoints`: See [Inference Endpoints](https://huggingface.co/inference-endpoints).
  - `serverless_inference_api`: See [Serverless Inference API](https://huggingface.co/inference-api).
  This might no longer work due to changes in the models offered in the Hugging Face Inference API.
  Please use the `HuggingFaceAPIChatGenerator` component instead.
- `api_params`: A dictionary with the following keys:
  - `model`: Hugging Face model ID. Required when `api_type` is `SERVERLESS_INFERENCE_API`.
  - `url`: URL of the inference endpoint. Required when `api_type` is `INFERENCE_ENDPOINTS` or
  `TEXT_GENERATION_INFERENCE`.
  - Other parameters specific to the chosen API type, such as `timeout`, `headers`, `provider`, etc.
- `token`: The Hugging Face token to use as HTTP bearer authorization.
Check your HF token in your [account settings](https://huggingface.co/settings/tokens).
- `generation_kwargs`: A dictionary with keyword arguments to customize text generation. Some examples: `max_new_tokens`,
`temperature`, `top_k`, `top_p`.
For details, see the [Hugging Face documentation](https://huggingface.co/docs/huggingface_hub/en/package_reference/inference_client#huggingface_hub.InferenceClient.text_generation).
- `stop_words`: An optional list of strings representing the stop words.
- `streaming_callback`: An optional callable for handling streaming responses.

<a id="hugging_face_api.HuggingFaceAPIGenerator.to_dict"></a>

#### HuggingFaceAPIGenerator.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns**:

A dictionary containing the serialized component.
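As described above, the required key in `api_params` depends on the chosen `api_type`: `model` for the serverless API, `url` for Inference Endpoints and TGI. A small sketch of that rule in plain Python (a hypothetical helper for illustration, not Haystack's actual validation logic):

```python
# Illustrative sketch: which api_params key each api_type requires.
# Hypothetical helper; not Haystack's actual validation code.

REQUIRED_KEY = {
    "serverless_inference_api": "model",
    "inference_endpoints": "url",
    "text_generation_inference": "url",
}

def check_api_params(api_type: str, api_params: dict) -> str:
    """Return the required key if present, otherwise raise ValueError."""
    key = REQUIRED_KEY[api_type]
    if key not in api_params:
        raise ValueError(f"api_type {api_type!r} requires {key!r} in api_params")
    return key

print(check_api_params("serverless_inference_api",
                       {"model": "HuggingFaceH4/zephyr-7b-beta"}))  # model
```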
<a id="hugging_face_api.HuggingFaceAPIGenerator.from_dict"></a>

#### HuggingFaceAPIGenerator.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "HuggingFaceAPIGenerator"
```

Deserialize this component from a dictionary.

<a id="hugging_face_api.HuggingFaceAPIGenerator.run"></a>

#### HuggingFaceAPIGenerator.run

```python
@component.output_types(replies=list[str], meta=list[dict[str, Any]])
def run(prompt: str,
        streaming_callback: Optional[StreamingCallbackT] = None,
        generation_kwargs: Optional[dict[str, Any]] = None)
```

Invoke the text generation inference for the given prompt and generation parameters.

**Arguments**:

- `prompt`: A string representing the prompt.
- `streaming_callback`: A callback function that is called when a new token is received from the stream.
- `generation_kwargs`: Additional keyword arguments for text generation.

**Returns**:

A dictionary with the generated replies and metadata, one entry per generated completion.
- `replies`: A list of strings representing the generated replies.

<a id="openai"></a>

## Module openai

<a id="openai.OpenAIGenerator"></a>

### OpenAIGenerator

Generates text using OpenAI's large language models (LLMs).

It works with the gpt-4 and o-series models and supports streaming responses
from the OpenAI API. It uses strings as input and output.

You can customize how the text is generated by passing parameters to the
OpenAI API. Use the `**generation_kwargs` argument when you initialize
the component or when you run it. Any parameter that works with
`openai.ChatCompletion.create` will work here too.

For details on OpenAI API parameters, see
[OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).
### Usage example

```python
from haystack.components.generators import OpenAIGenerator

client = OpenAIGenerator()
response = client.run("What's Natural Language Processing? Be brief.")
print(response)

>> {'replies': ['Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on
>> the interaction between computers and human language. It involves enabling computers to understand, interpret,
>> and respond to natural human language in a way that is both meaningful and useful.'], 'meta': [{'model':
>> 'gpt-4o-mini', 'index': 0, 'finish_reason': 'stop', 'usage': {'prompt_tokens': 16,
>> 'completion_tokens': 49, 'total_tokens': 65}}]}
```

<a id="openai.OpenAIGenerator.__init__"></a>

#### OpenAIGenerator.\_\_init\_\_

```python
def __init__(api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
             model: str = "gpt-4o-mini",
             streaming_callback: Optional[StreamingCallbackT] = None,
             api_base_url: Optional[str] = None,
             organization: Optional[str] = None,
             system_prompt: Optional[str] = None,
             generation_kwargs: Optional[dict[str, Any]] = None,
             timeout: Optional[float] = None,
             max_retries: Optional[int] = None,
             http_client_kwargs: Optional[dict[str, Any]] = None)
```

Creates an instance of OpenAIGenerator. Unless specified otherwise in `model`, uses OpenAI's gpt-4o-mini.

By setting the `OPENAI_TIMEOUT` and `OPENAI_MAX_RETRIES` environment variables, you can change the timeout
and max_retries parameters in the OpenAI client.

**Arguments**:

- `api_key`: The OpenAI API key to connect to OpenAI.
- `model`: The name of the model to use.
- `streaming_callback`: A callback function that is called when a new token is received from the stream.
The callback function accepts StreamingChunk as an argument.
- `api_base_url`: An optional base URL.
- `organization`: The Organization ID, defaults to `None`.
- `system_prompt`: The system prompt to use for text generation. If not provided, the system prompt is
omitted, and the default system prompt of the model is used.
- `generation_kwargs`: Other parameters to use for the model. These parameters are all sent directly to
the OpenAI endpoint. See the OpenAI [documentation](https://platform.openai.com/docs/api-reference/chat) for
more details.
Some of the supported parameters:
  - `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion,
  including visible output tokens and reasoning tokens.
  - `temperature`: The sampling temperature to use. Higher values mean the model takes more risks.
  Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
  - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
  considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
  comprising the top 10% probability mass are considered.
  - `n`: The number of completions to generate for each prompt. For example, with 3 prompts and n=2,
  the LLM generates two completions per prompt, resulting in 6 completions total.
  - `stop`: One or more sequences after which the LLM should stop generating tokens.
  - `presence_penalty`: The penalty applied if a token is already present.
  Higher values make the model less likely to repeat the token.
  - `frequency_penalty`: The penalty applied if a token has already been generated.
  Higher values make the model less likely to repeat the token.
  - `logit_bias`: Adds a logit bias to specific tokens. The keys of the dictionary are tokens, and the
  values are the bias to add to that token.
- `timeout`: Timeout for OpenAI client calls. If not set, it is inferred from the `OPENAI_TIMEOUT` environment variable
or set to 30.
- `max_retries`: Maximum number of retries to establish contact with OpenAI if it returns an internal error. If not set,
it is inferred from the `OPENAI_MAX_RETRIES` environment variable or set to 5.
- `http_client_kwargs`: A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).

<a id="openai.OpenAIGenerator.to_dict"></a>

#### OpenAIGenerator.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns**:

The serialized component as a dictionary.

<a id="openai.OpenAIGenerator.from_dict"></a>

#### OpenAIGenerator.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "OpenAIGenerator"
```

Deserialize this component from a dictionary.

**Arguments**:

- `data`: The dictionary representation of this component.

**Returns**:

The deserialized component instance.

<a id="openai.OpenAIGenerator.run"></a>

#### OpenAIGenerator.run

```python
@component.output_types(replies=list[str], meta=list[dict[str, Any]])
def run(prompt: str,
        system_prompt: Optional[str] = None,
        streaming_callback: Optional[StreamingCallbackT] = None,
        generation_kwargs: Optional[dict[str, Any]] = None)
```

Invoke the text generation inference based on the provided prompt and generation parameters.

**Arguments**:

- `prompt`: The string prompt to use for text generation.
- `system_prompt`: The system prompt to use for text generation. If this run-time system prompt is omitted, the system
prompt defined at initialization time, if any, is used.
- `streaming_callback`: A callback function that is called when a new token is received from the stream.
- `generation_kwargs`: Additional keyword arguments for text generation. These parameters potentially override the parameters
passed in the `__init__` method. For more details on the parameters supported by the OpenAI API, refer to
the OpenAI [documentation](https://platform.openai.com/docs/api-reference/chat/create).

**Returns**:

A list of strings containing the generated responses and a list of dictionaries containing the metadata
for each response.

<a id="openai_dalle"></a>

## Module openai\_dalle

<a id="openai_dalle.DALLEImageGenerator"></a>

### DALLEImageGenerator

Generates images using OpenAI's DALL-E model.

For details on OpenAI API parameters, see
[OpenAI documentation](https://platform.openai.com/docs/api-reference/images/create).

### Usage example

```python
from haystack.components.generators import DALLEImageGenerator

image_generator = DALLEImageGenerator()
response = image_generator.run("Show me a picture of a black cat.")
print(response)
```

<a id="openai_dalle.DALLEImageGenerator.__init__"></a>

#### DALLEImageGenerator.\_\_init\_\_

```python
def __init__(model: str = "dall-e-3",
             quality: Literal["standard", "hd"] = "standard",
             size: Literal["256x256", "512x512", "1024x1024", "1792x1024",
                           "1024x1792"] = "1024x1024",
             response_format: Literal["url", "b64_json"] = "url",
             api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
             api_base_url: Optional[str] = None,
             organization: Optional[str] = None,
             timeout: Optional[float] = None,
             max_retries: Optional[int] = None,
             http_client_kwargs: Optional[dict[str, Any]] = None)
```

Creates an instance of DALLEImageGenerator. Unless specified otherwise in `model`, uses OpenAI's dall-e-3.
**Arguments**:

- `model`: The model to use for image generation. Can be "dall-e-2" or "dall-e-3".
- `quality`: The quality of the generated image. Can be "standard" or "hd".
- `size`: The size of the generated images.
Must be one of 256x256, 512x512, or 1024x1024 for dall-e-2.
Must be one of 1024x1024, 1792x1024, or 1024x1792 for dall-e-3 models.
- `response_format`: The format of the response. Can be "url" or "b64_json".
- `api_key`: The OpenAI API key to connect to OpenAI.
- `api_base_url`: An optional base URL.
- `organization`: The Organization ID, defaults to `None`.
- `timeout`: Timeout for OpenAI client calls. If not set, it is inferred from the `OPENAI_TIMEOUT` environment variable
or set to 30.
- `max_retries`: Maximum number of retries to establish contact with OpenAI if it returns an internal error. If not set,
it is inferred from the `OPENAI_MAX_RETRIES` environment variable or set to 5.
- `http_client_kwargs`: A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).

<a id="openai_dalle.DALLEImageGenerator.warm_up"></a>

#### DALLEImageGenerator.warm\_up

```python
def warm_up() -> None
```

Warm up the OpenAI client.

<a id="openai_dalle.DALLEImageGenerator.run"></a>

#### DALLEImageGenerator.run

```python
@component.output_types(images=list[str], revised_prompt=str)
def run(prompt: str,
        size: Optional[Literal["256x256", "512x512", "1024x1024", "1792x1024",
                               "1024x1792"]] = None,
        quality: Optional[Literal["standard", "hd"]] = None,
        response_format: Optional[Literal["url", "b64_json"]] = None)
```

Invokes the image generation inference based on the provided prompt and generation parameters.
722 723 **Arguments**: 724 725 - `prompt`: The prompt to generate the image. 726 - `size`: If provided, overrides the size provided during initialization. 727 - `quality`: If provided, overrides the quality provided during initialization. 728 - `response_format`: If provided, overrides the response format provided during initialization. 729 730 **Returns**: 731 732 A dictionary containing the generated list of images and the revised prompt. 733 Depending on the `response_format` parameter, the list of images can be URLs or base64 encoded JSON strings. 734 The revised prompt is the prompt that was used to generate the image, if there was any revision 735 to the prompt made by OpenAI. 736 737 <a id="openai_dalle.DALLEImageGenerator.to_dict"></a> 738 739 #### DALLEImageGenerator.to\_dict 740 741 ```python 742 def to_dict() -> dict[str, Any] 743 ``` 744 745 Serialize this component to a dictionary. 746 747 **Returns**: 748 749 The serialized component as a dictionary. 750 751 <a id="openai_dalle.DALLEImageGenerator.from_dict"></a> 752 753 #### DALLEImageGenerator.from\_dict 754 755 ```python 756 @classmethod 757 def from_dict(cls, data: dict[str, Any]) -> "DALLEImageGenerator" 758 ``` 759 760 Deserialize this component from a dictionary. 761 762 **Arguments**: 763 764 - `data`: The dictionary representation of this component. 765 766 **Returns**: 767 768 The deserialized component instance. 769 770 <a id="chat/azure"></a> 771 772 ## Module chat/azure 773 774 <a id="chat/azure.AzureOpenAIChatGenerator"></a> 775 776 ### AzureOpenAIChatGenerator 777 778 Generates text using OpenAI's models on Azure. 779 780 It works with the gpt-4 - type models and supports streaming responses 781 from OpenAI API. It uses [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage) 782 format in input and output. 783 784 You can customize how the text is generated by passing parameters to the 785 OpenAI API. 
Use the `**generation_kwargs` argument when you initialize
the component or when you run it. Any parameter that works with
`openai.ChatCompletion.create` will work here too.

For details on OpenAI API parameters, see the
[OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).

### Usage example

```python
from haystack.components.generators.chat import AzureOpenAIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret

messages = [ChatMessage.from_user("What's Natural Language Processing?")]

client = AzureOpenAIChatGenerator(
    azure_endpoint="<your Azure endpoint, e.g. https://your-company.azure.openai.com/>",
    api_key=Secret.from_token("<your-api-key>"),
    azure_deployment="<the model name, e.g. gpt-4o-mini>")
response = client.run(messages)
print(response)
```

```
{'replies':
    [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=[TextContent(text=
    "Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on
    enabling computers to understand, interpret, and generate human language in a way that is useful.")],
    _name=None,
    _meta={'model': 'gpt-4o-mini', 'index': 0, 'finish_reason': 'stop',
    'usage': {'prompt_tokens': 15, 'completion_tokens': 36, 'total_tokens': 51}})]
}
```

<a id="chat/azure.AzureOpenAIChatGenerator.__init__"></a>

#### AzureOpenAIChatGenerator.\_\_init\_\_

```python
def __init__(azure_endpoint: Optional[str] = None,
             api_version: Optional[str] = "2023-05-15",
             azure_deployment: Optional[str] = "gpt-4o-mini",
             api_key: Optional[Secret] = Secret.from_env_var(
                 "AZURE_OPENAI_API_KEY", strict=False),
             azure_ad_token: Optional[Secret] = Secret.from_env_var(
                 "AZURE_OPENAI_AD_TOKEN", strict=False),
             organization: Optional[str] = None,
             streaming_callback: Optional[StreamingCallbackT] = None,
             timeout: Optional[float] = None,
             max_retries: Optional[int] = None,
             generation_kwargs: Optional[dict[str, Any]] = None,
             default_headers: Optional[dict[str, str]] = None,
             tools: Optional[ToolsType] = None,
             tools_strict: bool = False,
             *,
             azure_ad_token_provider: Optional[Union[
                 AzureADTokenProvider, AsyncAzureADTokenProvider]] = None,
             http_client_kwargs: Optional[dict[str, Any]] = None)
```

Initialize the Azure OpenAI Chat Generator component.

**Arguments**:

- `azure_endpoint`: The endpoint of the deployed model, for example `"https://example-resource.azure.openai.com/"`.
- `api_version`: The version of the API to use. Defaults to `2023-05-15`.
- `azure_deployment`: The deployment of the model, usually the model name.
- `api_key`: The API key to use for authentication.
- `azure_ad_token`: [Azure Active Directory token](https://www.microsoft.com/en-us/security/business/identity-access/microsoft-entra-id).
- `organization`: Your organization ID. Defaults to `None`. For help, see
[Setting up your organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
- `streaming_callback`: A callback function called when a new token is received from the stream.
It accepts a [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
as an argument.
- `timeout`: Timeout for OpenAI client calls. If not set, it defaults to the
`OPENAI_TIMEOUT` environment variable or 30 seconds.
- `max_retries`: Maximum number of retries to contact OpenAI after an internal error.
If not set, it defaults to the `OPENAI_MAX_RETRIES` environment variable or 5.
- `generation_kwargs`: Other parameters to use for the model. These parameters are sent directly to
the OpenAI endpoint. For details, see the [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).
Some of the supported parameters:
- `max_completion_tokens`: An upper bound on the number of tokens that can be generated for a completion,
including visible output tokens and reasoning tokens.
- `temperature`: The sampling temperature to use. Higher values mean the model takes more risks.
Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
- `top_p`: Nucleus sampling is an alternative to sampling with temperature, where the model considers
tokens within a top_p probability mass. For example, 0.1 means only the tokens comprising
the top 10% probability mass are considered.
- `n`: The number of completions to generate for each prompt. For example, with 3 prompts and n=2,
the LLM generates two completions per prompt, resulting in 6 completions total.
- `stop`: One or more sequences after which the LLM should stop generating tokens.
- `presence_penalty`: The penalty applied if a token is already present.
Higher values make the model less likely to repeat the token.
- `frequency_penalty`: The penalty applied if a token has already been generated.
Higher values make the model less likely to repeat the token.
- `logit_bias`: Adds a logit bias to specific tokens. The keys of the dictionary are tokens, and the
values are the bias to add to each token.
- `response_format`: A JSON schema or a Pydantic model that enforces the structure of the model's response.
If provided, the output is always validated against this
format (unless the model returns a tool call).
For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
Notes:
- This parameter accepts Pydantic models and JSON schemas for the latest models, starting from GPT-4o.
Older models only support a basic version of structured outputs through `{"type": "json_object"}`.
For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
- For structured outputs with streaming,
the `response_format` must be a JSON schema, not a Pydantic model.
- `default_headers`: Default headers to use for the AzureOpenAI client.
- `tools`: A list of Tool and/or Toolset objects, or a single Toolset, for which the model can prepare calls.
- `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `True`, the model follows exactly
the schema provided in the `parameters` field of the tool definition, but this may increase latency.
- `azure_ad_token_provider`: A function that returns an Azure Active Directory token; it is invoked on
every request.
- `http_client_kwargs`: A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/).

<a id="chat/azure.AzureOpenAIChatGenerator.warm_up"></a>

#### AzureOpenAIChatGenerator.warm\_up

```python
def warm_up()
```

Warm up the Azure OpenAI chat generator.

This warms up the tools registered in the chat generator.
This method is idempotent and warms up the tools only once.

<a id="chat/azure.AzureOpenAIChatGenerator.to_dict"></a>

#### AzureOpenAIChatGenerator.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns**:

The serialized component as a dictionary.

<a id="chat/azure.AzureOpenAIChatGenerator.from_dict"></a>

#### AzureOpenAIChatGenerator.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "AzureOpenAIChatGenerator"
```

Deserialize this component from a dictionary.
**Arguments**:

- `data`: The dictionary representation of this component.

**Returns**:

The deserialized component instance.

<a id="chat/azure.AzureOpenAIChatGenerator.run"></a>

#### AzureOpenAIChatGenerator.run

```python
@component.output_types(replies=list[ChatMessage])
def run(messages: list[ChatMessage],
        streaming_callback: Optional[StreamingCallbackT] = None,
        generation_kwargs: Optional[dict[str, Any]] = None,
        *,
        tools: Optional[ToolsType] = None,
        tools_strict: Optional[bool] = None)
```

Invokes chat completion based on the provided messages and generation parameters.

**Arguments**:

- `messages`: A list of ChatMessage instances representing the input messages.
- `streaming_callback`: A callback function that is called when a new token is received from the stream.
- `generation_kwargs`: Additional keyword arguments for text generation. These parameters
override the parameters passed during component initialization.
For details on OpenAI API parameters, see the [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat/create).
- `tools`: A list of Tool and/or Toolset objects, or a single Toolset, for which the model can prepare calls.
If set, it overrides the `tools` parameter provided during initialization.
- `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `True`, the model follows exactly
the schema provided in the `parameters` field of the tool definition, but this may increase latency.
If set, it overrides the `tools_strict` parameter set during component initialization.

**Returns**:

A dictionary with the following key:
- `replies`: A list containing the generated responses as ChatMessage instances.
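Run-time `generation_kwargs` override init-time values on a per-key basis, which behaves like a dictionary update. A minimal sketch of that merge semantics in plain Python (the `merge_generation_kwargs` helper is hypothetical and independent of Haystack; the actual merge happens inside the component):

```python
from typing import Any, Optional


def merge_generation_kwargs(init_kwargs: Optional[dict[str, Any]],
                            run_kwargs: Optional[dict[str, Any]]) -> dict[str, Any]:
    """Illustrates how per-run kwargs take precedence over init-time defaults."""
    merged = dict(init_kwargs or {})  # start from the init-time defaults
    merged.update(run_kwargs or {})   # per-run values win on key conflicts
    return merged


# temperature set at init is overridden at run time; max_completion_tokens is kept.
init_kwargs = {"temperature": 0.2, "max_completion_tokens": 256}
run_kwargs = {"temperature": 0.9}
print(merge_generation_kwargs(init_kwargs, run_kwargs))
# → {'temperature': 0.9, 'max_completion_tokens': 256}
```

This is why you can set conservative defaults once at construction and loosen them only for specific calls.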
<a id="chat/azure.AzureOpenAIChatGenerator.run_async"></a>

#### AzureOpenAIChatGenerator.run\_async

```python
@component.output_types(replies=list[ChatMessage])
async def run_async(messages: list[ChatMessage],
                    streaming_callback: Optional[StreamingCallbackT] = None,
                    generation_kwargs: Optional[dict[str, Any]] = None,
                    *,
                    tools: Optional[ToolsType] = None,
                    tools_strict: Optional[bool] = None)
```

Asynchronously invokes chat completion based on the provided messages and generation parameters.

This is the asynchronous version of the `run` method. It has the same parameters and return values
but can be used with `await` in async code.

**Arguments**:

- `messages`: A list of ChatMessage instances representing the input messages.
- `streaming_callback`: A callback function that is called when a new token is received from the stream.
Must be a coroutine.
- `generation_kwargs`: Additional keyword arguments for text generation. These parameters
override the parameters passed during component initialization.
For details on OpenAI API parameters, see the [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat/create).
- `tools`: A list of Tool and/or Toolset objects, or a single Toolset, for which the model can prepare calls.
If set, it overrides the `tools` parameter provided during initialization.
- `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `True`, the model follows exactly
the schema provided in the `parameters` field of the tool definition, but this may increase latency.
If set, it overrides the `tools_strict` parameter set during component initialization.

**Returns**:

A dictionary with the following key:
- `replies`: A list containing the generated responses as ChatMessage instances.
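Because `run_async` is a coroutine, several independent chats can be awaited concurrently with `asyncio.gather` instead of serially. A minimal sketch of that pattern; the `StubChatGenerator` below is a hypothetical stand-in with the same `run_async` shape, so the example runs without credentials (substitute a configured `AzureOpenAIChatGenerator` in real code):

```python
import asyncio


class StubChatGenerator:
    """Hypothetical stand-in mimicking the run_async return shape."""

    async def run_async(self, messages: list[str]) -> dict:
        await asyncio.sleep(0)  # stands in for the network round-trip
        return {"replies": [f"echo: {messages[-1]}"]}


async def main() -> list[dict]:
    client = StubChatGenerator()
    # Fan out two independent requests concurrently.
    return await asyncio.gather(
        client.run_async(["What is NLP?"]),
        client.run_async(["What is ASR?"]),
    )


results = asyncio.run(main())
print([r["replies"][0] for r in results])
# → ['echo: What is NLP?', 'echo: What is ASR?']
```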
<a id="chat/azure_responses"></a>

## Module chat/azure\_responses

<a id="chat/azure_responses.AzureOpenAIResponsesChatGenerator"></a>

### AzureOpenAIResponsesChatGenerator

Completes chats using OpenAI's Responses API on Azure.

It works with gpt-5 and o-series models and supports streaming responses
from the OpenAI API. It uses the [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
format for input and output.

You can customize how the text is generated by passing parameters to the
OpenAI API. Use the `**generation_kwargs` argument when you initialize
the component or when you run it. Any parameter that works with
`openai.Responses.create` will work here too.

For details on OpenAI API parameters, see the
[OpenAI documentation](https://platform.openai.com/docs/api-reference/responses).

### Usage example

```python
from haystack.components.generators.chat import AzureOpenAIResponsesChatGenerator
from haystack.dataclasses import ChatMessage

messages = [ChatMessage.from_user("What's Natural Language Processing?")]

client = AzureOpenAIResponsesChatGenerator(
    azure_endpoint="https://example-resource.azure.openai.com/",
    generation_kwargs={"reasoning": {"effort": "low", "summary": "auto"}}
)
response = client.run(messages)
print(response)
```

<a id="chat/azure_responses.AzureOpenAIResponsesChatGenerator.__init__"></a>

#### AzureOpenAIResponsesChatGenerator.\_\_init\_\_

```python
def __init__(*,
             api_key: Union[Secret, Callable[[], str],
                            Callable[[], Awaitable[str]]] = Secret.from_env_var(
                                "AZURE_OPENAI_API_KEY", strict=False),
             azure_endpoint: Optional[str] = None,
             azure_deployment: str = "gpt-5-mini",
             streaming_callback: Optional[StreamingCallbackT] = None,
             organization: Optional[str] = None,
             generation_kwargs: Optional[dict[str, Any]] = None,
             timeout: Optional[float] = None,
             max_retries: Optional[int] = None,
             tools: Optional[ToolsType] = None,
             tools_strict: bool = False,
             http_client_kwargs: Optional[dict[str, Any]] = None)
```

Initialize the AzureOpenAIResponsesChatGenerator component.

**Arguments**:

- `api_key`: The API key to use for authentication. Can be:
- A `Secret` object containing the API key.
- A `Secret` object containing the [Azure Active Directory token](https://www.microsoft.com/en-us/security/business/identity-access/microsoft-entra-id).
- A function that returns an Azure Active Directory token.
- `azure_endpoint`: The endpoint of the deployed model, for example `"https://example-resource.azure.openai.com/"`.
- `azure_deployment`: The deployment of the model, usually the model name.
- `organization`: Your organization ID. Defaults to `None`. For help, see
[Setting up your organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
- `streaming_callback`: A callback function called when a new token is received from the stream.
It accepts a [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
as an argument.
- `timeout`: Timeout for OpenAI client calls. If not set, it defaults to the
`OPENAI_TIMEOUT` environment variable or 30 seconds.
- `max_retries`: Maximum number of retries to contact OpenAI after an internal error.
If not set, it defaults to the `OPENAI_MAX_RETRIES` environment variable or 5.
- `generation_kwargs`: Other parameters to use for the model. These parameters are sent
directly to the OpenAI endpoint.
See the [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses) for
more details.
Some of the supported parameters:
- `temperature`: The sampling temperature to use. Higher values like 0.8 make the output more random,
while lower values like 0.2 make it more focused and deterministic.
- `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
considers the tokens with top_p probability mass. For example, 0.1 means only the tokens
comprising the top 10% probability mass are considered.
- `previous_response_id`: The ID of the previous response.
Use this to create multi-turn conversations.
- `text_format`: A Pydantic model that enforces the structure of the model's response.
If provided, the output is always validated against this
format (unless the model returns a tool call).
For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
- `text`: A JSON schema that enforces the structure of the model's response.
If provided, the output is always validated against this
format (unless the model returns a tool call).
Notes:
- Both JSON schemas and Pydantic models are supported for the latest models, starting from GPT-4o.
- If both are provided, `text_format` takes precedence and the JSON schema passed to `text` is ignored.
- Currently, this component doesn't support streaming for structured outputs.
- Older models only support a basic version of structured outputs through `{"type": "json_object"}`.
For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
- `reasoning`: A dictionary of parameters for reasoning. For example:
- `summary`: The summary of the reasoning.
- `effort`: The level of effort to put into the reasoning. Can be `low`, `medium`, or `high`.
- `generate_summary`: Whether to generate a summary of the reasoning.
Note: OpenAI does not return the reasoning tokens, but you can view the summary if it is enabled.
For details, see the [OpenAI Reasoning documentation](https://platform.openai.com/docs/guides/reasoning).
- `tools`: A list of Tool and/or Toolset objects, or a single Toolset, for which the model can prepare calls.
- `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `True`, the model follows exactly
the schema provided in the `parameters` field of the tool definition, but this may increase latency.
- `http_client_kwargs`: A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/).

<a id="chat/azure_responses.AzureOpenAIResponsesChatGenerator.to_dict"></a>

#### AzureOpenAIResponsesChatGenerator.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns**:

The serialized component as a dictionary.

<a id="chat/azure_responses.AzureOpenAIResponsesChatGenerator.from_dict"></a>

#### AzureOpenAIResponsesChatGenerator.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "AzureOpenAIResponsesChatGenerator"
```

Deserialize this component from a dictionary.

**Arguments**:

- `data`: The dictionary representation of this component.

**Returns**:

The deserialized component instance.

<a id="chat/azure_responses.AzureOpenAIResponsesChatGenerator.warm_up"></a>

#### AzureOpenAIResponsesChatGenerator.warm\_up

```python
def warm_up()
```

Warm up the OpenAI responses chat generator.
This warms up the tools registered in the chat generator.
This method is idempotent and warms up the tools only once.

<a id="chat/azure_responses.AzureOpenAIResponsesChatGenerator.run"></a>

#### AzureOpenAIResponsesChatGenerator.run

```python
@component.output_types(replies=list[ChatMessage])
def run(messages: list[ChatMessage],
        *,
        streaming_callback: Optional[StreamingCallbackT] = None,
        generation_kwargs: Optional[dict[str, Any]] = None,
        tools: Optional[Union[ToolsType, list[dict]]] = None,
        tools_strict: Optional[bool] = None)
```

Invokes response generation based on the provided messages and generation parameters.

**Arguments**:

- `messages`: A list of ChatMessage instances representing the input messages.
- `streaming_callback`: A callback function that is called when a new token is received from the stream.
- `generation_kwargs`: Additional keyword arguments for text generation. These parameters
override the parameters passed during component initialization.
For details on OpenAI API parameters, see the [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create).
- `tools`: The tools the model can use to prepare calls. If set, it overrides the
`tools` parameter set during component initialization. This parameter accepts either a
mixed list of Haystack `Tool` and `Toolset` objects, or a list of
OpenAI/MCP tool definitions as dictionaries.
Note: You cannot pass OpenAI/MCP tools and Haystack tools together.
For details on tool support, see the [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create#responses-create-tools).
- `tools_strict`: Whether to enable strict schema adherence for tool calls.
If set to `False`, the model may not exactly
follow the schema provided in the `parameters` field of the tool definition. In the Responses API, tool calls
are strict by default.
If set, it overrides the `tools_strict` parameter set during component initialization.

**Returns**:

A dictionary with the following key:
- `replies`: A list containing the generated responses as ChatMessage instances.

<a id="chat/azure_responses.AzureOpenAIResponsesChatGenerator.run_async"></a>

#### AzureOpenAIResponsesChatGenerator.run\_async

```python
@component.output_types(replies=list[ChatMessage])
async def run_async(messages: list[ChatMessage],
                    *,
                    streaming_callback: Optional[StreamingCallbackT] = None,
                    generation_kwargs: Optional[dict[str, Any]] = None,
                    tools: Optional[Union[ToolsType, list[dict]]] = None,
                    tools_strict: Optional[bool] = None)
```

Asynchronously invokes response generation based on the provided messages and generation parameters.

This is the asynchronous version of the `run` method. It has the same parameters and return values
but can be used with `await` in async code.

**Arguments**:

- `messages`: A list of ChatMessage instances representing the input messages.
- `streaming_callback`: A callback function that is called when a new token is received from the stream.
Must be a coroutine.
- `generation_kwargs`: Additional keyword arguments for text generation. These parameters
override the parameters passed during component initialization.
For details on OpenAI API parameters, see the [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create).
- `tools`: A list of tools or a Toolset for which the model can prepare calls. If set, it overrides the
`tools` parameter set during component initialization.
This parameter accepts either a
mixed list of Haystack `Tool` and `Toolset` objects, or a list of
OpenAI/MCP tool definitions as dictionaries.
Note: You cannot pass OpenAI/MCP tools and Haystack tools together.
- `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `True`, the model follows exactly
the schema provided in the `parameters` field of the tool definition, but this may increase latency.
If set, it overrides the `tools_strict` parameter set during component initialization.

**Returns**:

A dictionary with the following key:
- `replies`: A list containing the generated responses as ChatMessage instances.

<a id="chat/hugging_face_local"></a>

## Module chat/hugging\_face\_local

<a id="chat/hugging_face_local.default_tool_parser"></a>

#### default\_tool\_parser

```python
def default_tool_parser(text: str) -> Optional[list[ToolCall]]
```

Default implementation for parsing tool calls from model output text.

Uses DEFAULT_TOOL_PATTERN to extract tool calls.

**Arguments**:

- `text`: The text to parse for tool calls.

**Returns**:

A list containing a single ToolCall if a valid tool call is found, `None` otherwise.

<a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator"></a>

### HuggingFaceLocalChatGenerator

Generates chat responses using models from Hugging Face that run locally.

Use this component with chat-based models,
such as `HuggingFaceH4/zephyr-7b-beta` or `meta-llama/Llama-2-7b-chat-hf`.
LLMs running locally may need powerful hardware.
### Usage example

```python
from haystack.components.generators.chat import HuggingFaceLocalChatGenerator
from haystack.dataclasses import ChatMessage

generator = HuggingFaceLocalChatGenerator(model="HuggingFaceH4/zephyr-7b-beta")
generator.warm_up()
messages = [ChatMessage.from_user("What's Natural Language Processing? Be brief.")]
print(generator.run(messages))
```

```
{'replies':
    [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=[TextContent(text=
    "Natural Language Processing (NLP) is a subfield of artificial intelligence that deals
    with the interaction between computers and human language. It enables computers to understand, interpret, and
    generate human language in a valuable way. NLP involves various techniques such as speech recognition, text
    analysis, sentiment analysis, and machine translation. The ultimate goal is to make it easier for computers to
    process and derive meaning from human language, improving communication between humans and machines.")],
    _name=None,
    _meta={'finish_reason': 'stop', 'index': 0, 'model':
    'mistralai/Mistral-7B-Instruct-v0.2',
    'usage': {'completion_tokens': 90, 'prompt_tokens': 19, 'total_tokens': 109}})
    ]
}
```

<a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator.__init__"></a>

#### HuggingFaceLocalChatGenerator.\_\_init\_\_

```python
def __init__(model: str = "HuggingFaceH4/zephyr-7b-beta",
             task: Optional[Literal["text-generation",
                                    "text2text-generation"]] = None,
             device: Optional[ComponentDevice] = None,
             token: Optional[Secret] = Secret.from_env_var(
                 ["HF_API_TOKEN", "HF_TOKEN"], strict=False),
             chat_template: Optional[str] = None,
             generation_kwargs: Optional[dict[str, Any]] = None,
             huggingface_pipeline_kwargs: Optional[dict[str, Any]] = None,
             stop_words: Optional[list[str]] = None,
             streaming_callback: Optional[StreamingCallbackT] = None,
             tools: Optional[ToolsType] = None,
             tool_parsing_function: Optional[Callable[
                 [str], Optional[list[ToolCall]]]] = None,
             async_executor: Optional[ThreadPoolExecutor] = None) -> None
```

Initializes the HuggingFaceLocalChatGenerator component.

**Arguments**:

- `model`: The Hugging Face text generation model name or path,
for example, `mistralai/Mistral-7B-Instruct-v0.2` or `TheBloke/OpenHermes-2.5-Mistral-7B-16k-AWQ`.
The model must be a chat model supporting the ChatML messaging format.
If the model is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
- `task`: The task for the Hugging Face pipeline. Possible options:
- `text-generation`: Supported by decoder models, like GPT.
- `text2text-generation`: Supported by encoder-decoder models, like T5.
If the task is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
If not specified, the component calls the Hugging Face API to infer the task from the model name.
- `device`: The device for loading the model. If `None`, automatically selects the default device.
If a device or device map is specified in `huggingface_pipeline_kwargs`, it overrides this parameter.
- `token`: The token to use as HTTP bearer authorization for remote files.
If the token is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
- `chat_template`: Specifies an optional Jinja template for formatting chat
messages. Most high-quality chat models have their own templates, but for models without this
feature or if you prefer a custom template, use this parameter.
- `generation_kwargs`: A dictionary with keyword arguments to customize text generation.
Some examples: `max_length`, `max_new_tokens`, `temperature`, `top_k`, `top_p`.
See Hugging Face's documentation for more information:
- [Customize text generation](https://huggingface.co/docs/transformers/main/en/generation_strategies#customize-text-generation)
- [GenerationConfig](https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.GenerationConfig)
The only `generation_kwargs` value set by default is `max_new_tokens`, which is set to 512 tokens.
- `huggingface_pipeline_kwargs`: Dictionary with keyword arguments to initialize the
Hugging Face pipeline for text generation.
These keyword arguments provide fine-grained control over the Hugging Face pipeline.
In case of duplication, these kwargs override the `model`, `task`, `device`, and `token` init parameters.
For available kwargs, see the [Hugging Face documentation](https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.pipeline.task).
In this dictionary, you can also include `model_kwargs` to specify the kwargs for [model initialization](https://huggingface.co/docs/transformers/en/main_classes/model#transformers.PreTrainedModel.from_pretrained).
- `stop_words`: A list of stop words. If the model generates a stop word, the generation stops.
If you provide this parameter, don't specify `stopping_criteria` in `generation_kwargs`.
For some chat models, the output includes both the new text and the original prompt.
In these cases, make sure your prompt has no stop words.
- `streaming_callback`: An optional callable for handling streaming responses.
- `tools`: A list of Tool and/or Toolset objects, or a single Toolset, for which the model can prepare calls.
- `tool_parsing_function`: A callable that takes a string and returns a list of ToolCall objects or `None`.
If `None`, the default_tool_parser is used, which extracts tool calls using a predefined pattern.
- `async_executor`: Optional ThreadPoolExecutor to use for async calls.
If not provided, a single-threaded executor is
initialized and used.

<a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator.__del__"></a>

#### HuggingFaceLocalChatGenerator.\_\_del\_\_

```python
def __del__() -> None
```

Cleanup when the instance is being destroyed.

<a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator.shutdown"></a>

#### HuggingFaceLocalChatGenerator.shutdown

```python
def shutdown() -> None
```

Explicitly shut down the executor if this component owns it.

<a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator.warm_up"></a>

#### HuggingFaceLocalChatGenerator.warm\_up

```python
def warm_up() -> None
```

Initializes the component and warms up tools if provided.

<a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator.to_dict"></a>

#### HuggingFaceLocalChatGenerator.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns**:

Dictionary with serialized data.

<a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator.from_dict"></a>

#### HuggingFaceLocalChatGenerator.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "HuggingFaceLocalChatGenerator"
```

Deserializes the component from a dictionary.

**Arguments**:

- `data`: The dictionary to deserialize from.

**Returns**:

The deserialized component.
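The `to_dict`/`from_dict` pair follows the usual round-trip contract: serializing a component and deserializing the result yields an equivalent instance. A minimal, Haystack-independent sketch of that contract (the `EchoComponent` class below is hypothetical and only illustrates the pattern):

```python
from typing import Any


class EchoComponent:
    """Hypothetical component illustrating the to_dict/from_dict round trip."""

    def __init__(self, prefix: str = ">> "):
        self.prefix = prefix

    def to_dict(self) -> dict[str, Any]:
        # Record the type and the init parameters needed to rebuild the instance.
        return {"type": "EchoComponent", "init_parameters": {"prefix": self.prefix}}

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "EchoComponent":
        return cls(**data["init_parameters"])


original = EchoComponent(prefix="local: ")
restored = EchoComponent.from_dict(original.to_dict())
print(restored.prefix == original.prefix)
# → True
```

Serializing init parameters rather than runtime state is what lets pipelines be saved to YAML and rebuilt later.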
1453 1454 <a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator.run"></a> 1455 1456 #### HuggingFaceLocalChatGenerator.run 1457 1458 ```python 1459 @component.output_types(replies=list[ChatMessage]) 1460 def run(messages: list[ChatMessage], 1461 generation_kwargs: Optional[dict[str, Any]] = None, 1462 streaming_callback: Optional[StreamingCallbackT] = None, 1463 tools: Optional[ToolsType] = None) -> dict[str, list[ChatMessage]] 1464 ``` 1465 1466 Invoke text generation inference based on the provided messages and generation parameters. 1467 1468 **Arguments**: 1469 1470 - `messages`: A list of ChatMessage objects representing the input messages. 1471 - `generation_kwargs`: Additional keyword arguments for text generation. 1472 - `streaming_callback`: An optional callable for handling streaming responses. 1473 - `tools`: A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls. 1474 If set, it will override the `tools` parameter provided during initialization. 1475 1476 **Returns**: 1477 1478 A dictionary with the following keys: 1479 - `replies`: A list containing the generated responses as ChatMessage instances. 1480 1481 <a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator.create_message"></a> 1482 1483 #### HuggingFaceLocalChatGenerator.create\_message 1484 1485 ```python 1486 def create_message(text: str, 1487 index: int, 1488 tokenizer: Union["PreTrainedTokenizer", 1489 "PreTrainedTokenizerFast"], 1490 prompt: str, 1491 generation_kwargs: dict[str, Any], 1492 parse_tool_calls: bool = False) -> ChatMessage 1493 ``` 1494 1495 Create a ChatMessage instance from the provided text, populated with metadata. 1496 1497 **Arguments**: 1498 1499 - `text`: The generated text. 1500 - `index`: The index of the generated text. 1501 - `tokenizer`: The tokenizer used for generation. 1502 - `prompt`: The prompt used for generation. 1503 - `generation_kwargs`: The generation parameters. 
- `parse_tool_calls`: Whether to attempt parsing tool calls from the text.

**Returns**:

A ChatMessage instance.

<a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator.run_async"></a>

#### HuggingFaceLocalChatGenerator.run\_async

```python
@component.output_types(replies=list[ChatMessage])
async def run_async(
        messages: list[ChatMessage],
        generation_kwargs: Optional[dict[str, Any]] = None,
        streaming_callback: Optional[StreamingCallbackT] = None,
        tools: Optional[ToolsType] = None) -> dict[str, list[ChatMessage]]
```

Asynchronously invokes text generation inference based on the provided messages and generation parameters.

This is the asynchronous version of the `run` method. It has the same parameters
and return values but can be used with `await` in async code.

**Arguments**:

- `messages`: A list of ChatMessage objects representing the input messages.
- `generation_kwargs`: Additional keyword arguments for text generation.
- `streaming_callback`: An optional callable for handling streaming responses.
- `tools`: A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
If set, it will override the `tools` parameter provided during initialization.

**Returns**:

A dictionary with the following keys:
- `replies`: A list containing the generated responses as ChatMessage instances.

<a id="chat/hugging_face_api"></a>

## Module chat/hugging\_face\_api

<a id="chat/hugging_face_api.HuggingFaceAPIChatGenerator"></a>

### HuggingFaceAPIChatGenerator

Completes chats using Hugging Face APIs.

HuggingFaceAPIChatGenerator uses the [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
format for input and output.
Use it to generate text with Hugging Face APIs: 1553 - [Serverless Inference API (Inference Providers)](https://huggingface.co/docs/inference-providers) 1554 - [Paid Inference Endpoints](https://huggingface.co/inference-endpoints) 1555 - [Self-hosted Text Generation Inference](https://github.com/huggingface/text-generation-inference) 1556 1557 ### Usage examples 1558 1559 #### With the serverless inference API (Inference Providers) - free tier available 1560 1561 ```python 1562 from haystack.components.generators.chat import HuggingFaceAPIChatGenerator 1563 from haystack.dataclasses import ChatMessage 1564 from haystack.utils import Secret 1565 from haystack.utils.hf import HFGenerationAPIType 1566 1567 messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"), 1568 ChatMessage.from_user("What's Natural Language Processing?")] 1569 1570 # the api_type can be expressed using the HFGenerationAPIType enum or as a string 1571 api_type = HFGenerationAPIType.SERVERLESS_INFERENCE_API 1572 api_type = "serverless_inference_api" # this is equivalent to the above 1573 1574 generator = HuggingFaceAPIChatGenerator(api_type=api_type, 1575 api_params={"model": "Qwen/Qwen2.5-7B-Instruct", 1576 "provider": "together"}, 1577 token=Secret.from_token("<your-api-key>")) 1578 1579 result = generator.run(messages) 1580 print(result) 1581 ``` 1582 1583 #### With the serverless inference API (Inference Providers) and text+image input 1584 1585 ```python 1586 from haystack.components.generators.chat import HuggingFaceAPIChatGenerator 1587 from haystack.dataclasses import ChatMessage, ImageContent 1588 from haystack.utils import Secret 1589 from haystack.utils.hf import HFGenerationAPIType 1590 1591 # Create an image from file path, URL, or base64 1592 image = ImageContent.from_file_path("path/to/your/image.jpg") 1593 1594 # Create a multimodal message with both text and image 1595 messages = [ChatMessage.from_user(content_parts=["Describe this image in 
detail", image])]

generator = HuggingFaceAPIChatGenerator(
    api_type=HFGenerationAPIType.SERVERLESS_INFERENCE_API,
    api_params={
        "model": "Qwen/Qwen2.5-VL-7B-Instruct",  # Vision Language Model
        "provider": "hyperbolic"
    },
    token=Secret.from_token("<your-api-key>")
)

result = generator.run(messages)
print(result)
```

#### With paid inference endpoints

```python
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret

messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
            ChatMessage.from_user("What's Natural Language Processing?")]

generator = HuggingFaceAPIChatGenerator(api_type="inference_endpoints",
                                        api_params={"url": "<your-inference-endpoint-url>"},
                                        token=Secret.from_token("<your-api-key>"))

result = generator.run(messages)
print(result)
```

#### With self-hosted text generation inference

```python
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage

messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
            ChatMessage.from_user("What's Natural Language Processing?")]

generator = HuggingFaceAPIChatGenerator(api_type="text_generation_inference",
                                        api_params={"url": "http://localhost:8080"})

result = generator.run(messages)
print(result)
```

<a id="chat/hugging_face_api.HuggingFaceAPIChatGenerator.__init__"></a>

#### HuggingFaceAPIChatGenerator.\_\_init\_\_

```python
def __init__(api_type: Union[HFGenerationAPIType, str],
             api_params: dict[str, str],
             token: Optional[Secret] = Secret.from_env_var(
                 ["HF_API_TOKEN", "HF_TOKEN"], strict=False),
             generation_kwargs:
Optional[dict[str, Any]] = None, 1653 stop_words: Optional[list[str]] = None, 1654 streaming_callback: Optional[StreamingCallbackT] = None, 1655 tools: Optional[ToolsType] = None) 1656 ``` 1657 1658 Initialize the HuggingFaceAPIChatGenerator instance. 1659 1660 **Arguments**: 1661 1662 - `api_type`: The type of Hugging Face API to use. Available types: 1663 - `text_generation_inference`: See [TGI](https://github.com/huggingface/text-generation-inference). 1664 - `inference_endpoints`: See [Inference Endpoints](https://huggingface.co/inference-endpoints). 1665 - `serverless_inference_api`: See 1666 [Serverless Inference API - Inference Providers](https://huggingface.co/docs/inference-providers). 1667 - `api_params`: A dictionary with the following keys: 1668 - `model`: Hugging Face model ID. Required when `api_type` is `SERVERLESS_INFERENCE_API`. 1669 - `provider`: Provider name. Recommended when `api_type` is `SERVERLESS_INFERENCE_API`. 1670 - `url`: URL of the inference endpoint. Required when `api_type` is `INFERENCE_ENDPOINTS` or 1671 `TEXT_GENERATION_INFERENCE`. 1672 - Other parameters specific to the chosen API type, such as `timeout`, `headers`, etc. 1673 - `token`: The Hugging Face token to use as HTTP bearer authorization. 1674 Check your HF token in your [account settings](https://huggingface.co/settings/tokens). 1675 - `generation_kwargs`: A dictionary with keyword arguments to customize text generation. 1676 Some examples: `max_tokens`, `temperature`, `top_p`. 1677 For details, see [Hugging Face chat_completion documentation](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.chat_completion). 1678 - `stop_words`: An optional list of strings representing the stop words. 1679 - `streaming_callback`: An optional callable for handling streaming responses. 1680 - `tools`: A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls. 
1681 The chosen model should support tool/function calling, according to the model card. 1682 Support for tools in the Hugging Face API and TGI is not yet fully refined and you may experience 1683 unexpected behavior. 1684 1685 <a id="chat/hugging_face_api.HuggingFaceAPIChatGenerator.warm_up"></a> 1686 1687 #### HuggingFaceAPIChatGenerator.warm\_up 1688 1689 ```python 1690 def warm_up() 1691 ``` 1692 1693 Warm up the Hugging Face API chat generator. 1694 1695 This will warm up the tools registered in the chat generator. 1696 This method is idempotent and will only warm up the tools once. 1697 1698 <a id="chat/hugging_face_api.HuggingFaceAPIChatGenerator.to_dict"></a> 1699 1700 #### HuggingFaceAPIChatGenerator.to\_dict 1701 1702 ```python 1703 def to_dict() -> dict[str, Any] 1704 ``` 1705 1706 Serialize this component to a dictionary. 1707 1708 **Returns**: 1709 1710 A dictionary containing the serialized component. 1711 1712 <a id="chat/hugging_face_api.HuggingFaceAPIChatGenerator.from_dict"></a> 1713 1714 #### HuggingFaceAPIChatGenerator.from\_dict 1715 1716 ```python 1717 @classmethod 1718 def from_dict(cls, data: dict[str, Any]) -> "HuggingFaceAPIChatGenerator" 1719 ``` 1720 1721 Deserialize this component from a dictionary. 1722 1723 <a id="chat/hugging_face_api.HuggingFaceAPIChatGenerator.run"></a> 1724 1725 #### HuggingFaceAPIChatGenerator.run 1726 1727 ```python 1728 @component.output_types(replies=list[ChatMessage]) 1729 def run(messages: list[ChatMessage], 1730 generation_kwargs: Optional[dict[str, Any]] = None, 1731 tools: Optional[ToolsType] = None, 1732 streaming_callback: Optional[StreamingCallbackT] = None) 1733 ``` 1734 1735 Invoke the text generation inference based on the provided messages and generation parameters. 1736 1737 **Arguments**: 1738 1739 - `messages`: A list of ChatMessage objects representing the input messages. 1740 - `generation_kwargs`: Additional keyword arguments for text generation. 
- `tools`: A list of tools or a Toolset for which the model can prepare calls. If set, it will override
the `tools` parameter set during component initialization. This parameter can accept either a
list of `Tool` objects or a `Toolset` instance.
- `streaming_callback`: An optional callable for handling streaming responses. If set, it will override the `streaming_callback`
parameter set during component initialization.

**Returns**:

A dictionary with the following keys:
- `replies`: A list containing the generated responses as ChatMessage objects.

<a id="chat/hugging_face_api.HuggingFaceAPIChatGenerator.run_async"></a>

#### HuggingFaceAPIChatGenerator.run\_async

```python
@component.output_types(replies=list[ChatMessage])
async def run_async(messages: list[ChatMessage],
                    generation_kwargs: Optional[dict[str, Any]] = None,
                    tools: Optional[ToolsType] = None,
                    streaming_callback: Optional[StreamingCallbackT] = None)
```

Asynchronously invokes the text generation inference based on the provided messages and generation parameters.

This is the asynchronous version of the `run` method. It has the same parameters
and return values but can be used with `await` in async code.

**Arguments**:

- `messages`: A list of ChatMessage objects representing the input messages.
- `generation_kwargs`: Additional keyword arguments for text generation.
- `tools`: A list of tools or a Toolset for which the model can prepare calls. If set, it will override the `tools`
parameter set during component initialization. This parameter can accept either a list of `Tool` objects
or a `Toolset` instance.
- `streaming_callback`: An optional callable for handling streaming responses. If set, it will override the `streaming_callback`
parameter set during component initialization.
1778 1779 **Returns**: 1780 1781 A dictionary with the following keys: 1782 - `replies`: A list containing the generated responses as ChatMessage objects. 1783 1784 <a id="chat/openai"></a> 1785 1786 ## Module chat/openai 1787 1788 <a id="chat/openai.OpenAIChatGenerator"></a> 1789 1790 ### OpenAIChatGenerator 1791 1792 Completes chats using OpenAI's large language models (LLMs). 1793 1794 It works with the gpt-4 and o-series models and supports streaming responses 1795 from OpenAI API. It uses [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage) 1796 format in input and output. 1797 1798 You can customize how the text is generated by passing parameters to the 1799 OpenAI API. Use the `**generation_kwargs` argument when you initialize 1800 the component or when you run it. Any parameter that works with 1801 `openai.ChatCompletion.create` will work here too. 1802 1803 For details on OpenAI API parameters, see 1804 [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat). 
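The interaction between init-time and run-time `generation_kwargs` described above can be pictured as a plain dictionary merge in which run-time values win. This is only an illustrative sketch of the override semantics, not the component's actual code; the parameter names (`temperature`, `max_completion_tokens`) are just example OpenAI chat parameters:

```python
# Sketch: run-time generation_kwargs override init-time ones, key by key.
init_kwargs = {"temperature": 0.7, "max_completion_tokens": 256}
run_kwargs = {"temperature": 0.2}  # passed at run() time

# Later keys win in a dict merge, so run-time values take precedence
# while untouched init-time values are kept.
merged = {**init_kwargs, **(run_kwargs or {})}
print(merged)  # {'temperature': 0.2, 'max_completion_tokens': 256}
```

The merged dictionary is what ends up being sent alongside the messages, so a parameter set at initialization acts as a default that any single `run` call can override.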
### Usage example

```python
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage

messages = [ChatMessage.from_user("What's Natural Language Processing?")]

client = OpenAIChatGenerator()
response = client.run(messages)
print(response)
```
Output:
```
{'replies':
[ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=
[TextContent(text="Natural Language Processing (NLP) is a branch of artificial intelligence
that focuses on enabling computers to understand, interpret, and generate human language in
a way that is meaningful and useful.")],
_name=None,
_meta={'model': 'gpt-4o-mini', 'index': 0, 'finish_reason': 'stop',
'usage': {'prompt_tokens': 15, 'completion_tokens': 36, 'total_tokens': 51}})
]
}
```

<a id="chat/openai.OpenAIChatGenerator.__init__"></a>

#### OpenAIChatGenerator.\_\_init\_\_

```python
def __init__(api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
             model: str = "gpt-4o-mini",
             streaming_callback: Optional[StreamingCallbackT] = None,
             api_base_url: Optional[str] = None,
             organization: Optional[str] = None,
             generation_kwargs: Optional[dict[str, Any]] = None,
             timeout: Optional[float] = None,
             max_retries: Optional[int] = None,
             tools: Optional[ToolsType] = None,
             tools_strict: bool = False,
             http_client_kwargs: Optional[dict[str, Any]] = None)
```

Creates an instance of OpenAIChatGenerator. Unless specified otherwise in `model`, OpenAI's gpt-4o-mini is used.

Before initializing the component, you can set the 'OPENAI_TIMEOUT' and 'OPENAI_MAX_RETRIES'
environment variables to override the `timeout` and `max_retries` parameters respectively
in the OpenAI client.

**Arguments**:

- `api_key`: The OpenAI API key.
You can set it with the environment variable `OPENAI_API_KEY`, or pass it with this parameter
during initialization.
- `model`: The name of the model to use.
- `streaming_callback`: A callback function that is called when a new token is received from the stream.
The callback function accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
as an argument.
- `api_base_url`: An optional base URL.
- `organization`: Your organization ID, defaults to `None`. See
[production best practices](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
- `generation_kwargs`: Other parameters to use for the model. These parameters are sent directly to
the OpenAI endpoint. See OpenAI [documentation](https://platform.openai.com/docs/api-reference/chat) for
more details.
Some of the supported parameters:
- `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion,
including visible output tokens and reasoning tokens.
- `temperature`: What sampling temperature to use. Higher values mean the model will take more risks.
Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
- `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
comprising the top 10% probability mass are considered.
- `n`: How many completions to generate for each prompt. For example, if the LLM gets 3 prompts and n is 2,
it will generate two completions for each of the three prompts, ending up with 6 completions in total.
- `stop`: One or more sequences after which the LLM should stop generating tokens.
- `presence_penalty`: The penalty to apply if a token is already present in the text.
Bigger values mean
the model will be less likely to repeat the same token in the text.
- `frequency_penalty`: What penalty to apply if a token has already been generated in the text.
Bigger values mean the model will be less likely to repeat the same token in the text.
- `logit_bias`: Add a logit bias to specific tokens. The keys of the dictionary are tokens, and the
values are the bias to add to that token.
- `response_format`: A JSON schema or a Pydantic model that enforces the structure of the model's response.
If provided, the output will always be validated against this
format (unless the model returns a tool call).
For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
Notes:
- This parameter accepts Pydantic models and JSON schemas for the latest models starting from GPT-4o.
Older models only support a basic version of structured outputs through `{"type": "json_object"}`.
For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
- For structured outputs with streaming,
the `response_format` must be a JSON schema and not a Pydantic model.
- `timeout`: Timeout for OpenAI client calls. If not set, it defaults to either the
`OPENAI_TIMEOUT` environment variable, or 30 seconds.
- `max_retries`: Maximum number of retries to contact OpenAI after an internal error.
If not set, it defaults to the `OPENAI_MAX_RETRIES` environment variable, or 5.
- `tools`: A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
- `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
the schema provided in the `parameters` field of the tool definition, but this may increase latency.
- `http_client_kwargs`: A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/).

<a id="chat/openai.OpenAIChatGenerator.warm_up"></a>

#### OpenAIChatGenerator.warm\_up

```python
def warm_up()
```

Warm up the OpenAI chat generator.

This will warm up the tools registered in the chat generator.
This method is idempotent and will only warm up the tools once.

<a id="chat/openai.OpenAIChatGenerator.to_dict"></a>

#### OpenAIChatGenerator.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns**:

The serialized component as a dictionary.

<a id="chat/openai.OpenAIChatGenerator.from_dict"></a>

#### OpenAIChatGenerator.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "OpenAIChatGenerator"
```

Deserialize this component from a dictionary.

**Arguments**:

- `data`: The dictionary representation of this component.

**Returns**:

The deserialized component instance.

<a id="chat/openai.OpenAIChatGenerator.run"></a>

#### OpenAIChatGenerator.run

```python
@component.output_types(replies=list[ChatMessage])
def run(messages: list[ChatMessage],
        streaming_callback: Optional[StreamingCallbackT] = None,
        generation_kwargs: Optional[dict[str, Any]] = None,
        *,
        tools: Optional[ToolsType] = None,
        tools_strict: Optional[bool] = None)
```

Invokes chat completion based on the provided messages and generation parameters.

**Arguments**:

- `messages`: A list of ChatMessage instances representing the input messages.
1973 - `streaming_callback`: A callback function that is called when a new token is received from the stream. 1974 - `generation_kwargs`: Additional keyword arguments for text generation. These parameters will 1975 override the parameters passed during component initialization. 1976 For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat/create). 1977 - `tools`: A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls. 1978 If set, it will override the `tools` parameter provided during initialization. 1979 - `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly 1980 the schema provided in the `parameters` field of the tool definition, but this may increase latency. 1981 If set, it will override the `tools_strict` parameter set during component initialization. 1982 1983 **Returns**: 1984 1985 A dictionary with the following key: 1986 - `replies`: A list containing the generated responses as ChatMessage instances. 1987 1988 <a id="chat/openai.OpenAIChatGenerator.run_async"></a> 1989 1990 #### OpenAIChatGenerator.run\_async 1991 1992 ```python 1993 @component.output_types(replies=list[ChatMessage]) 1994 async def run_async(messages: list[ChatMessage], 1995 streaming_callback: Optional[StreamingCallbackT] = None, 1996 generation_kwargs: Optional[dict[str, Any]] = None, 1997 *, 1998 tools: Optional[ToolsType] = None, 1999 tools_strict: Optional[bool] = None) 2000 ``` 2001 2002 Asynchronously invokes chat completion based on the provided messages and generation parameters. 2003 2004 This is the asynchronous version of the `run` method. It has the same parameters and return values 2005 but can be used with `await` in async code. 2006 2007 **Arguments**: 2008 2009 - `messages`: A list of ChatMessage instances representing the input messages. 
2010 - `streaming_callback`: A callback function that is called when a new token is received from the stream. 2011 Must be a coroutine. 2012 - `generation_kwargs`: Additional keyword arguments for text generation. These parameters will 2013 override the parameters passed during component initialization. 2014 For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat/create). 2015 - `tools`: A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls. 2016 If set, it will override the `tools` parameter provided during initialization. 2017 - `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly 2018 the schema provided in the `parameters` field of the tool definition, but this may increase latency. 2019 If set, it will override the `tools_strict` parameter set during component initialization. 2020 2021 **Returns**: 2022 2023 A dictionary with the following key: 2024 - `replies`: A list containing the generated responses as ChatMessage instances. 2025 2026 <a id="chat/openai_responses"></a> 2027 2028 ## Module chat/openai\_responses 2029 2030 <a id="chat/openai_responses.OpenAIResponsesChatGenerator"></a> 2031 2032 ### OpenAIResponsesChatGenerator 2033 2034 Completes chats using OpenAI's Responses API. 2035 2036 It works with the gpt-4 and o-series models and supports streaming responses 2037 from OpenAI API. It uses [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage) 2038 format in input and output. 2039 2040 You can customize how the text is generated by passing parameters to the 2041 OpenAI API. Use the `**generation_kwargs` argument when you initialize 2042 the component or when you run it. Any parameter that works with 2043 `openai.Responses.create` will work here too. 
2044 2045 For details on OpenAI API parameters, see 2046 [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses). 2047 2048 ### Usage example 2049 2050 ```python 2051 from haystack.components.generators.chat import OpenAIResponsesChatGenerator 2052 from haystack.dataclasses import ChatMessage 2053 2054 messages = [ChatMessage.from_user("What's Natural Language Processing?")] 2055 2056 client = OpenAIResponsesChatGenerator(generation_kwargs={"reasoning": {"effort": "low", "summary": "auto"}}) 2057 response = client.run(messages) 2058 print(response) 2059 ``` 2060 2061 <a id="chat/openai_responses.OpenAIResponsesChatGenerator.__init__"></a> 2062 2063 #### OpenAIResponsesChatGenerator.\_\_init\_\_ 2064 2065 ```python 2066 def __init__(*, 2067 api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"), 2068 model: str = "gpt-5-mini", 2069 streaming_callback: Optional[StreamingCallbackT] = None, 2070 api_base_url: Optional[str] = None, 2071 organization: Optional[str] = None, 2072 generation_kwargs: Optional[dict[str, Any]] = None, 2073 timeout: Optional[float] = None, 2074 max_retries: Optional[int] = None, 2075 tools: Optional[Union[ToolsType, list[dict]]] = None, 2076 tools_strict: bool = False, 2077 http_client_kwargs: Optional[dict[str, Any]] = None) 2078 ``` 2079 2080 Creates an instance of OpenAIResponsesChatGenerator. Uses OpenAI's gpt-5-mini by default. 2081 2082 Before initializing the component, you can set the 'OPENAI_TIMEOUT' and 'OPENAI_MAX_RETRIES' 2083 environment variables to override the `timeout` and `max_retries` parameters respectively 2084 in the OpenAI client. 2085 2086 **Arguments**: 2087 2088 - `api_key`: The OpenAI API key. 2089 You can set it with an environment variable `OPENAI_API_KEY`, or pass with this parameter 2090 during initialization. 2091 - `model`: The name of the model to use. 2092 - `streaming_callback`: A callback function that is called when a new token is received from the stream. 
The callback function accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
as an argument.
- `api_base_url`: An optional base URL.
- `organization`: Your organization ID, defaults to `None`. See
[production best practices](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
- `generation_kwargs`: Other parameters to use for the model. These parameters are sent
directly to the OpenAI endpoint.
See OpenAI [documentation](https://platform.openai.com/docs/api-reference/responses) for
more details.
Some of the supported parameters:
- `temperature`: What sampling temperature to use. Higher values like 0.8 will make the output more random,
while lower values like 0.2 will make it more focused and deterministic.
- `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
comprising the top 10% probability mass are considered.
- `previous_response_id`: The ID of the previous response.
Use this to create multi-turn conversations.
- `text_format`: A Pydantic model that enforces the structure of the model's response.
If provided, the output will always be validated against this
format (unless the model returns a tool call).
For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
- `text`: A JSON schema that enforces the structure of the model's response.
If provided, the output will always be validated against this
format (unless the model returns a tool call).
Notes:
- Both JSON Schema and Pydantic models are supported for the latest models starting from GPT-4o.
- If both are provided, `text_format` takes precedence and the JSON schema passed to `text` is ignored.
- Currently, this component doesn't support streaming for structured outputs.
- Older models only support a basic version of structured outputs through `{"type": "json_object"}`.
For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
- `reasoning`: A dictionary of parameters for reasoning. For example:
- `summary`: The summary of the reasoning.
- `effort`: The level of effort to put into the reasoning. Can be `low`, `medium`, or `high`.
- `generate_summary`: Whether to generate a summary of the reasoning.
Note: OpenAI does not return the reasoning tokens, but you can view the summary if it is enabled.
For details, see the [OpenAI Reasoning documentation](https://platform.openai.com/docs/guides/reasoning).
- `timeout`: Timeout for OpenAI client calls. If not set, it defaults to either the
`OPENAI_TIMEOUT` environment variable, or 30 seconds.
- `max_retries`: Maximum number of retries to contact OpenAI after an internal error.
If not set, it defaults to the `OPENAI_MAX_RETRIES` environment variable, or 5.
- `tools`: The tools that the model can use to prepare calls. This parameter can accept either a
mixed list of Haystack `Tool` objects and Haystack `Toolset`s, or a list of OpenAI/MCP tool
definitions as dictionaries.
Note: You cannot pass OpenAI/MCP tools and Haystack tools together.
For details on tool support, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create#responses-create-tools).
- `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `False`, the model may not exactly
follow the schema provided in the `parameters` field of the tool definition. In the Responses API, tool calls
are strict by default.
- `http_client_kwargs`: A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/).

<a id="chat/openai_responses.OpenAIResponsesChatGenerator.warm_up"></a>

#### OpenAIResponsesChatGenerator.warm\_up

```python
def warm_up()
```

Warm up the OpenAI responses chat generator.

This will warm up the tools registered in the chat generator.
This method is idempotent and will only warm up the tools once.

<a id="chat/openai_responses.OpenAIResponsesChatGenerator.to_dict"></a>

#### OpenAIResponsesChatGenerator.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns**:

The serialized component as a dictionary.

<a id="chat/openai_responses.OpenAIResponsesChatGenerator.from_dict"></a>

#### OpenAIResponsesChatGenerator.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "OpenAIResponsesChatGenerator"
```

Deserialize this component from a dictionary.

**Arguments**:

- `data`: The dictionary representation of this component.

**Returns**:

The deserialized component instance.

<a id="chat/openai_responses.OpenAIResponsesChatGenerator.run"></a>

#### OpenAIResponsesChatGenerator.run

```python
@component.output_types(replies=list[ChatMessage])
def run(messages: list[ChatMessage],
        *,
        streaming_callback: Optional[StreamingCallbackT] = None,
        generation_kwargs: Optional[dict[str, Any]] = None,
        tools: Optional[Union[ToolsType, list[dict]]] = None,
        tools_strict: Optional[bool] = None)
```

Invokes response generation based on the provided messages and generation parameters.

**Arguments**:

- `messages`: A list of ChatMessage instances representing the input messages.
- `streaming_callback`: A callback function that is called when a new token is received from the stream.
- `generation_kwargs`: Additional keyword arguments for text generation. These parameters
override the parameters passed during component initialization.
For details on OpenAI API parameters, see the [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create).
- `tools`: The tools that the model can use to prepare calls. If set, it overrides the
`tools` parameter set during component initialization. This parameter accepts either a
mixed list of Haystack `Tool` objects and Haystack `Toolset` instances, or a list of dictionaries with
OpenAI/MCP tool definitions.
Note: You cannot pass OpenAI/MCP tools and Haystack tools together.
For details on tool support, see the [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create#responses-create-tools).
- `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `False`, the model may not exactly
follow the schema provided in the `parameters` field of the tool definition. In the Responses API, tool calls
are strict by default.
If set, it overrides the `tools_strict` parameter set during component initialization.

**Returns**:

A dictionary with the following key:
- `replies`: A list containing the generated responses as ChatMessage instances.
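The precedence between init-time and run-time `generation_kwargs` can be sketched in plain Python. This is an illustration of the documented override behavior, not the component's actual implementation; `merge_generation_kwargs` and its argument names are hypothetical:

```python
# Sketch: run-time generation_kwargs override init-time ones key by key.
from typing import Any, Optional


def merge_generation_kwargs(
    init_kwargs: Optional[dict[str, Any]],
    runtime_kwargs: Optional[dict[str, Any]],
) -> dict[str, Any]:
    # The later dict wins, so keys passed to run() shadow the init-time values.
    return {**(init_kwargs or {}), **(runtime_kwargs or {})}


merged = merge_generation_kwargs(
    {"temperature": 0.2, "top_p": 0.9},  # set at component init
    {"temperature": 0.8},                # passed to run()
)
print(merged)  # {'temperature': 0.8, 'top_p': 0.9}
```

Keys not mentioned at run time (here `top_p`) keep their init-time values.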

<a id="chat/openai_responses.OpenAIResponsesChatGenerator.run_async"></a>

#### OpenAIResponsesChatGenerator.run\_async

```python
@component.output_types(replies=list[ChatMessage])
async def run_async(messages: list[ChatMessage],
                    *,
                    streaming_callback: Optional[StreamingCallbackT] = None,
                    generation_kwargs: Optional[dict[str, Any]] = None,
                    tools: Optional[Union[ToolsType, list[dict]]] = None,
                    tools_strict: Optional[bool] = None)
```

Asynchronously invokes response generation based on the provided messages and generation parameters.

This is the asynchronous version of the `run` method. It has the same parameters and return values
but can be used with `await` in async code.

**Arguments**:

- `messages`: A list of ChatMessage instances representing the input messages.
- `streaming_callback`: A callback function that is called when a new token is received from the stream.
Must be a coroutine.
- `generation_kwargs`: Additional keyword arguments for text generation. These parameters
override the parameters passed during component initialization.
For details on OpenAI API parameters, see the [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create).
- `tools`: The tools that the model can use to prepare calls. If set, it overrides the
`tools` parameter set during component initialization. This parameter accepts either a
mixed list of Haystack `Tool` objects and Haystack `Toolset` instances, or a list of dictionaries with
OpenAI/MCP tool definitions.
Note: You cannot pass OpenAI/MCP tools and Haystack tools together.
- `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `True`, the model follows exactly
the schema provided in the `parameters` field of the tool definition, but this may increase latency.
If set, it overrides the `tools_strict` parameter set during component initialization.

**Returns**:

A dictionary with the following key:
- `replies`: A list containing the generated responses as ChatMessage instances.

<a id="chat/fallback"></a>

## Module chat/fallback

<a id="chat/fallback.FallbackChatGenerator"></a>

### FallbackChatGenerator

A chat generator wrapper that tries multiple chat generators sequentially.

It forwards all parameters transparently to the underlying chat generators and returns the first successful result.
It calls the chat generators in order until one succeeds, falling back on any exception a generator raises.
If all chat generators fail, it raises a RuntimeError with details.

Timeout enforcement is fully delegated to the underlying chat generators. The fallback mechanism only
works correctly if the underlying chat generators implement proper timeout handling and raise exceptions
when timeouts occur. For predictable latency guarantees, ensure your chat generators:
- Support a `timeout` parameter in their initialization
- Implement timeout as total wall-clock time (a shared deadline for both streaming and non-streaming)
- Raise timeout exceptions (e.g., TimeoutError, asyncio.TimeoutError, httpx.TimeoutException) when the limit is exceeded

Note: Most well-implemented chat generators (OpenAI, Anthropic, Cohere, etc.) support timeout parameters
with consistent semantics. For HTTP-based LLM providers, a single timeout value (e.g., `timeout=30`)
typically applies to all connection phases: connection setup, read, write, and pool. For streaming
responses, the read timeout is the maximum gap between chunks. For non-streaming responses, it is the time limit for
receiving the complete response.

Failover is automatically triggered when a generator raises any exception, including:
- Timeout errors (if the generator implements and raises them)
- Rate limit errors (429)
- Authentication errors (401)
- Context length errors (400)
- Server errors (500+)
- Any other exception

<a id="chat/fallback.FallbackChatGenerator.__init__"></a>

#### FallbackChatGenerator.\_\_init\_\_

```python
def __init__(chat_generators: list[ChatGenerator])
```

Creates an instance of FallbackChatGenerator.

**Arguments**:

- `chat_generators`: A non-empty list of chat generator components to try in order.

<a id="chat/fallback.FallbackChatGenerator.to_dict"></a>

#### FallbackChatGenerator.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serialize the component, including nested chat generators when they support serialization.

<a id="chat/fallback.FallbackChatGenerator.from_dict"></a>

#### FallbackChatGenerator.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> FallbackChatGenerator
```

Rebuild the component from a serialized representation, restoring nested chat generators.

<a id="chat/fallback.FallbackChatGenerator.warm_up"></a>

#### FallbackChatGenerator.warm\_up

```python
def warm_up() -> None
```

Warm up all underlying chat generators.

This method calls warm_up() on each underlying generator that supports it.
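The "each generator that supports it" delegation can be sketched in plain Python. The helper `warm_up_all` and the two dummy classes below are hypothetical illustrations, not the actual Haystack implementation:

```python
# Sketch: delegate warm_up() only to wrapped generators that define it.
from typing import Any


def warm_up_all(chat_generators: list[Any]) -> list[str]:
    warmed = []
    for generator in chat_generators:
        # Skip generators without a warm_up method instead of failing.
        if hasattr(generator, "warm_up"):
            generator.warm_up()
            warmed.append(type(generator).__name__)
    return warmed


class WithWarmUp:
    def __init__(self):
        self.warmed = False

    def warm_up(self):
        self.warmed = True


class WithoutWarmUp:
    pass


print(warm_up_all([WithWarmUp(), WithoutWarmUp()]))  # ['WithWarmUp']
```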

<a id="chat/fallback.FallbackChatGenerator.run"></a>

#### FallbackChatGenerator.run

```python
@component.output_types(replies=list[ChatMessage], meta=dict[str, Any])
def run(
    messages: list[ChatMessage],
    generation_kwargs: Union[dict[str, Any], None] = None,
    tools: Optional[ToolsType] = None,
    streaming_callback: Union[StreamingCallbackT, None] = None) -> dict[str, Any]
```

Execute chat generators sequentially until one succeeds.

**Arguments**:

- `messages`: The conversation history as a list of ChatMessage instances.
- `generation_kwargs`: Optional parameters for the chat generator (e.g., temperature, max_tokens).
- `tools`: A list of Tool and/or Toolset objects, or a single Toolset, for function calling capabilities.
- `streaming_callback`: Optional callable for handling streaming responses.

**Raises**:

- `RuntimeError`: If all chat generators fail.

**Returns**:

A dictionary with:
- "replies": Generated ChatMessage instances from the first successful generator.
- "meta": Execution metadata including successful_chat_generator_index, successful_chat_generator_class,
total_attempts, and failed_chat_generators, plus any metadata from the successful generator.

<a id="chat/fallback.FallbackChatGenerator.run_async"></a>

#### FallbackChatGenerator.run\_async

```python
@component.output_types(replies=list[ChatMessage], meta=dict[str, Any])
async def run_async(
    messages: list[ChatMessage],
    generation_kwargs: Union[dict[str, Any], None] = None,
    tools: Optional[ToolsType] = None,
    streaming_callback: Union[StreamingCallbackT, None] = None) -> dict[str, Any]
```

Asynchronously execute chat generators sequentially until one succeeds.

**Arguments**:

- `messages`: The conversation history as a list of ChatMessage instances.
- `generation_kwargs`: Optional parameters for the chat generator (e.g., temperature, max_tokens).
- `tools`: A list of Tool and/or Toolset objects, or a single Toolset, for function calling capabilities.
- `streaming_callback`: Optional callable for handling streaming responses.

**Raises**:

- `RuntimeError`: If all chat generators fail.

**Returns**:

A dictionary with:
- "replies": Generated ChatMessage instances from the first successful generator.
- "meta": Execution metadata including successful_chat_generator_index, successful_chat_generator_class,
total_attempts, and failed_chat_generators, plus any metadata from the successful generator.
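The sequential failover contract documented above can be sketched in plain Python. This is an illustrative model of the behavior (a subset of the meta keys from the docstring), not the Haystack implementation; `run_with_fallback`, `flaky`, and `stable` are hypothetical names:

```python
# Sketch: try callables in order, return the first success plus failover metadata.
from typing import Any, Callable


def run_with_fallback(generators: list[Callable[[], Any]]) -> dict[str, Any]:
    failed: list[str] = []
    for index, generator in enumerate(generators):
        try:
            replies = generator()
        except Exception:
            # Any exception (timeout, 429, 401, 500, ...) triggers failover.
            failed.append(getattr(generator, "__name__", repr(generator)))
            continue
        return {
            "replies": replies,
            "meta": {
                "successful_chat_generator_index": index,
                "total_attempts": index + 1,
                "failed_chat_generators": failed,
            },
        }
    raise RuntimeError(f"All chat generators failed: {failed}")


def flaky():
    raise TimeoutError("deadline exceeded")


def stable():
    return ["Hello!"]


result = run_with_fallback([flaky, stable])
print(result["meta"])  # index 1 succeeded after 2 attempts; 'flaky' recorded as failed
```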