---
title: "Generators"
id: generators-api
description: "Enables text generation using LLMs."
slug: "/generators-api"
---

<a id="azure"></a>

## Module azure

<a id="azure.AzureOpenAIGenerator"></a>

### AzureOpenAIGenerator

Generates text using OpenAI's large language models (LLMs).

It works with gpt-4-type models and supports streaming responses
from the OpenAI API.

You can customize how the text is generated by passing parameters to the
OpenAI API. Use the `**generation_kwargs` argument when you initialize
the component or when you run it. Any parameter that works with
`openai.ChatCompletion.create` works here too.

For details on OpenAI API parameters, see
[OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).

### Usage example

```python
from haystack.components.generators import AzureOpenAIGenerator
from haystack.utils import Secret

client = AzureOpenAIGenerator(
    azure_endpoint="<your Azure endpoint, e.g. https://your-company.azure.openai.com/>",
    api_key=Secret.from_token("<your-api-key>"),
    azure_deployment="<the model name, e.g. gpt-4.1-mini>",
)
response = client.run("What's Natural Language Processing? Be brief.")
print(response)
```

```
>> {'replies': ['Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on
>> the interaction between computers and human language.
>> It involves enabling computers to understand, interpret,
>> and respond to natural human language in a way that is both meaningful and useful.'],
>> 'meta': [{'model': 'gpt-4.1-mini', 'index': 0, 'finish_reason': 'stop',
>> 'usage': {'prompt_tokens': 16, 'completion_tokens': 49, 'total_tokens': 65}}]}
```

<a id="azure.AzureOpenAIGenerator.__init__"></a>

#### AzureOpenAIGenerator.\_\_init\_\_

```python
def __init__(azure_endpoint: str | None = None,
             api_version: str | None = "2024-12-01-preview",
             azure_deployment: str | None = "gpt-4.1-mini",
             api_key: Secret | None = Secret.from_env_var(
                 "AZURE_OPENAI_API_KEY", strict=False),
             azure_ad_token: Secret | None = Secret.from_env_var(
                 "AZURE_OPENAI_AD_TOKEN", strict=False),
             organization: str | None = None,
             streaming_callback: StreamingCallbackT | None = None,
             system_prompt: str | None = None,
             timeout: float | None = None,
             max_retries: int | None = None,
             http_client_kwargs: dict[str, Any] | None = None,
             generation_kwargs: dict[str, Any] | None = None,
             default_headers: dict[str, str] | None = None,
             *,
             azure_ad_token_provider: AzureADTokenProvider | None = None)
```

Initialize the Azure OpenAI Generator.

**Arguments**:

- `azure_endpoint`: The endpoint of the deployed model, for example `https://example-resource.azure.openai.com/`.
- `api_version`: The version of the API to use. Defaults to `2024-12-01-preview`.
- `azure_deployment`: The deployment of the model, usually the model name.
- `api_key`: The API key to use for authentication.
- `azure_ad_token`: [Azure Active Directory token](https://www.microsoft.com/en-us/security/business/identity-access/microsoft-entra-id).
- `organization`: Your organization ID, defaults to `None`. For help, see
[Setting up your organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
- `streaming_callback`: A callback function called when a new token is received from the stream.
It accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
as an argument.
- `system_prompt`: The system prompt to use for text generation. If not provided, no system prompt is sent
and the model's default behavior is used.
- `timeout`: Timeout for the AzureOpenAI client. If not set, it is inferred from the
`OPENAI_TIMEOUT` environment variable or set to 30.
- `max_retries`: Maximum number of retries to contact AzureOpenAI after an internal error.
If not set, it is inferred from the `OPENAI_MAX_RETRIES` environment variable or set to 5.
- `http_client_kwargs`: A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/).
- `generation_kwargs`: Other parameters to use for the model, sent directly to
the OpenAI endpoint. See the [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat) for
more details.
Some of the supported parameters:
- `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion,
including visible output tokens and reasoning tokens.
- `temperature`: The sampling temperature to use. Higher values mean the model takes more risks.
Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
- `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
comprising the top 10% probability mass are considered.
- `n`: The number of completions to generate for each prompt.
For example, with 3 prompts and n=2,
the LLM will generate two completions per prompt, resulting in 6 completions total.
- `stop`: One or more sequences after which the LLM should stop generating tokens.
- `presence_penalty`: The penalty applied if a token is already present.
Higher values make the model less likely to repeat the token.
- `frequency_penalty`: The penalty applied if a token has already been generated.
Higher values make the model less likely to repeat the token.
- `logit_bias`: Adds a logit bias to specific tokens. The keys of the dictionary are tokens, and the
values are the bias to add to that token.
- `default_headers`: Default headers to use for the AzureOpenAI client.
- `azure_ad_token_provider`: A function that returns an Azure Active Directory token. It is invoked on
every request.

<a id="azure.AzureOpenAIGenerator.to_dict"></a>

#### AzureOpenAIGenerator.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns**:

The serialized component as a dictionary.

<a id="azure.AzureOpenAIGenerator.from_dict"></a>

#### AzureOpenAIGenerator.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "AzureOpenAIGenerator"
```

Deserialize this component from a dictionary.

**Arguments**:

- `data`: The dictionary representation of this component.

**Returns**:

The deserialized component instance.
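The `generation_kwargs` given at initialization and at run time compose by plain dictionary merging, with run-time values taking precedence. This is a minimal sketch of those override semantics, not Haystack's actual implementation:

```python
# Sketch of the documented override semantics: generation_kwargs passed at
# run time take precedence over those set at initialization time.
init_kwargs = {"temperature": 0.9, "max_completion_tokens": 256}
runtime_kwargs = {"temperature": 0.2}

merged = {**init_kwargs, **(runtime_kwargs or {})}
print(merged)  # {'temperature': 0.2, 'max_completion_tokens': 256}
```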
<a id="azure.AzureOpenAIGenerator.run"></a>

#### AzureOpenAIGenerator.run

```python
@component.output_types(replies=list[str], meta=list[dict[str, Any]])
def run(
    prompt: str,
    system_prompt: str | None = None,
    streaming_callback: StreamingCallbackT | None = None,
    generation_kwargs: dict[str, Any] | None = None
) -> dict[str, list[str] | list[dict[str, Any]]]
```

Invoke text generation based on the provided prompt and generation parameters.

**Arguments**:

- `prompt`: The string prompt to use for text generation.
- `system_prompt`: The system prompt to use for text generation. If this runtime system prompt is omitted, the system
prompt defined at initialization time, if any, is used.
- `streaming_callback`: A callback function that is called when a new token is received from the stream.
- `generation_kwargs`: Additional keyword arguments for text generation. These parameters potentially override the parameters
passed in the `__init__` method. For more details on the parameters supported by the OpenAI API, refer to
the [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat/create).

**Returns**:

A list of strings containing the generated responses and a list of dictionaries containing the metadata
for each response.

<a id="chat/azure"></a>

## Module chat/azure

<a id="chat/azure.AzureOpenAIChatGenerator"></a>

### AzureOpenAIChatGenerator

Generates text using OpenAI's models on Azure.

It works with gpt-4-type models and supports streaming responses
from the OpenAI API. It uses the [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
format for input and output.

You can customize how the text is generated by passing parameters to the
OpenAI API.
Use the `**generation_kwargs` argument when you initialize
the component or when you run it. Any parameter that works with
`openai.ChatCompletion.create` works here too.

For details on OpenAI API parameters, see
[OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).

### Usage example

```python
from haystack.components.generators.chat import AzureOpenAIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret

messages = [ChatMessage.from_user("What's Natural Language Processing?")]

client = AzureOpenAIChatGenerator(
    azure_endpoint="<your Azure endpoint, e.g. https://your-company.azure.openai.com/>",
    api_key=Secret.from_token("<your-api-key>"),
    azure_deployment="<the model name, e.g. gpt-4.1-mini>",
)
response = client.run(messages)
print(response)
```

```
{'replies':
  [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=[TextContent(text=
  "Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on
  enabling computers to understand, interpret, and generate human language in a way that is useful.")],
  _name=None,
  _meta={'model': 'gpt-4.1-mini', 'index': 0, 'finish_reason': 'stop',
  'usage': {'prompt_tokens': 15, 'completion_tokens': 36, 'total_tokens': 51}})]
}
```

<a id="chat/azure.AzureOpenAIChatGenerator.__init__"></a>

#### AzureOpenAIChatGenerator.\_\_init\_\_

```python
def __init__(azure_endpoint: str | None = None,
             api_version: str | None = "2024-12-01-preview",
             azure_deployment: str | None = "gpt-4.1-mini",
             api_key: Secret | None = Secret.from_env_var(
                 "AZURE_OPENAI_API_KEY", strict=False),
             azure_ad_token: Secret | None = Secret.from_env_var(
                 "AZURE_OPENAI_AD_TOKEN", strict=False),
             organization: str | None = None,
             streaming_callback: StreamingCallbackT | None = None,
             timeout: float | None = None,
             max_retries: int | None = None,
             generation_kwargs: dict[str, Any] | None = None,
             default_headers: dict[str, str] | None = None,
             tools: ToolsType | None = None,
             tools_strict: bool = False,
             *,
             azure_ad_token_provider: AzureADTokenProvider
             | AsyncAzureADTokenProvider | None = None,
             http_client_kwargs: dict[str, Any] | None = None)
```

Initialize the Azure OpenAI Chat Generator component.

**Arguments**:

- `azure_endpoint`: The endpoint of the deployed model, for example `"https://example-resource.azure.openai.com/"`.
- `api_version`: The version of the API to use. Defaults to `2024-12-01-preview`.
- `azure_deployment`: The deployment of the model, usually the model name.
- `api_key`: The API key to use for authentication.
- `azure_ad_token`: [Azure Active Directory token](https://www.microsoft.com/en-us/security/business/identity-access/microsoft-entra-id).
- `organization`: Your organization ID, defaults to `None`. For help, see
[Setting up your organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
- `streaming_callback`: A callback function called when a new token is received from the stream.
It accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
as an argument.
- `timeout`: Timeout for OpenAI client calls. If not set, it defaults to either the
`OPENAI_TIMEOUT` environment variable or 30 seconds.
- `max_retries`: Maximum number of retries to contact OpenAI after an internal error.
If not set, it defaults to either the `OPENAI_MAX_RETRIES` environment variable or 5.
- `generation_kwargs`: Other parameters to use for the model. These parameters are sent directly to
the OpenAI endpoint. For details, see the [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).
Some of the supported parameters:
- `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion,
including visible output tokens and reasoning tokens.
- `temperature`: The sampling temperature to use. Higher values mean the model takes more risks.
Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
- `top_p`: Nucleus sampling, an alternative to sampling with temperature, where the model considers
only the tokens comprising the top_p probability mass. For example, 0.1 means only the tokens
comprising the top 10% probability mass are considered.
- `n`: The number of completions to generate for each prompt. For example, with 3 prompts and n=2,
the LLM will generate two completions per prompt, resulting in 6 completions total.
- `stop`: One or more sequences after which the LLM should stop generating tokens.
- `presence_penalty`: The penalty applied if a token is already present.
Higher values make the model less likely to repeat the token.
- `frequency_penalty`: The penalty applied if a token has already been generated.
Higher values make the model less likely to repeat the token.
- `logit_bias`: Adds a logit bias to specific tokens. The keys of the dictionary are tokens, and the
values are the bias to add to that token.
- `response_format`: A JSON schema or a Pydantic model that enforces the structure of the model's response.
If provided, the output will always be validated against this
format (unless the model returns a tool call).
For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
Notes:
- This parameter accepts Pydantic models and JSON schemas for the latest models, starting from GPT-4o.
Older models only support a basic version of structured outputs through `{"type": "json_object"}`.
For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
- For structured outputs with streaming,
the `response_format` must be a JSON schema and not a Pydantic model.
- `default_headers`: Default headers to use for the AzureOpenAI client.
- `tools`: A list of Tool and/or Toolset objects, or a single Toolset, for which the model can prepare calls.
- `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `True`, the model follows exactly
the schema provided in the `parameters` field of the tool definition, but this may increase latency.
- `azure_ad_token_provider`: A function that returns an Azure Active Directory token. It is invoked on
every request.
- `http_client_kwargs`: A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/).

<a id="chat/azure.AzureOpenAIChatGenerator.warm_up"></a>

#### AzureOpenAIChatGenerator.warm\_up

```python
def warm_up()
```

Warm up the Azure OpenAI chat generator.

This warms up the tools registered in the chat generator.
This method is idempotent and only warms up the tools once.

<a id="chat/azure.AzureOpenAIChatGenerator.to_dict"></a>

#### AzureOpenAIChatGenerator.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns**:

The serialized component as a dictionary.

<a id="chat/azure.AzureOpenAIChatGenerator.from_dict"></a>

#### AzureOpenAIChatGenerator.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "AzureOpenAIChatGenerator"
```

Deserialize this component from a dictionary.
**Arguments**:

- `data`: The dictionary representation of this component.

**Returns**:

The deserialized component instance.

<a id="chat/azure.AzureOpenAIChatGenerator.run"></a>

#### AzureOpenAIChatGenerator.run

```python
@component.output_types(replies=list[ChatMessage])
def run(messages: list[ChatMessage],
        streaming_callback: StreamingCallbackT | None = None,
        generation_kwargs: dict[str, Any] | None = None,
        *,
        tools: ToolsType | None = None,
        tools_strict: bool | None = None) -> dict[str, list[ChatMessage]]
```

Invokes chat completion based on the provided messages and generation parameters.

**Arguments**:

- `messages`: A list of ChatMessage instances representing the input messages.
- `streaming_callback`: A callback function that is called when a new token is received from the stream.
- `generation_kwargs`: Additional keyword arguments for text generation. These parameters
override the parameters passed during component initialization.
For details on OpenAI API parameters, see the [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat/create).
- `tools`: A list of Tool and/or Toolset objects, or a single Toolset, for which the model can prepare calls.
If set, it overrides the `tools` parameter provided during initialization.
- `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `True`, the model follows exactly
the schema provided in the `parameters` field of the tool definition, but this may increase latency.
If set, it overrides the `tools_strict` parameter set during component initialization.

**Returns**:

A dictionary with the following key:
- `replies`: A list containing the generated responses as ChatMessage instances.
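A `streaming_callback` is simply a callable that receives each chunk as it arrives. The sketch below uses a hypothetical stand-in for Haystack's `StreamingChunk` to show the shape of such a callback; the real dataclass carries additional metadata:

```python
from dataclasses import dataclass


# Minimal stand-in for Haystack's StreamingChunk (illustration only).
@dataclass
class StreamingChunk:
    content: str


collected: list[str] = []


def collecting_callback(chunk: StreamingChunk) -> None:
    # Called once per streamed chunk; here we simply accumulate the text.
    collected.append(chunk.content)


# Simulate a stream of three chunks arriving in order.
for piece in ["Natural ", "Language ", "Processing"]:
    collecting_callback(StreamingChunk(content=piece))

print("".join(collected))  # Natural Language Processing
```

Passing such a callable as `streaming_callback` lets you surface partial output (for example, printing tokens as they arrive) instead of waiting for the full reply.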
<a id="chat/azure.AzureOpenAIChatGenerator.run_async"></a>

#### AzureOpenAIChatGenerator.run\_async

```python
@component.output_types(replies=list[ChatMessage])
async def run_async(
    messages: list[ChatMessage],
    streaming_callback: StreamingCallbackT | None = None,
    generation_kwargs: dict[str, Any] | None = None,
    *,
    tools: ToolsType | None = None,
    tools_strict: bool | None = None) -> dict[str, list[ChatMessage]]
```

Asynchronously invokes chat completion based on the provided messages and generation parameters.

This is the asynchronous version of the `run` method. It has the same parameters and return values
but can be used with `await` in async code.

**Arguments**:

- `messages`: A list of ChatMessage instances representing the input messages.
- `streaming_callback`: A callback function that is called when a new token is received from the stream.
Must be a coroutine.
- `generation_kwargs`: Additional keyword arguments for text generation. These parameters
override the parameters passed during component initialization.
For details on OpenAI API parameters, see the [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat/create).
- `tools`: A list of Tool and/or Toolset objects, or a single Toolset, for which the model can prepare calls.
If set, it overrides the `tools` parameter provided during initialization.
- `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `True`, the model follows exactly
the schema provided in the `parameters` field of the tool definition, but this may increase latency.
If set, it overrides the `tools_strict` parameter set during component initialization.

**Returns**:

A dictionary with the following key:
- `replies`: A list containing the generated responses as ChatMessage instances.
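`run_async` follows the same `{'replies': [...]}` contract as `run` but must be awaited. A self-contained sketch with a stub generator (a hypothetical stand-in, not the real component) shows the calling pattern:

```python
import asyncio


class StubChatGenerator:
    """Stand-in mimicking the run/run_async contract of a chat generator."""

    def run(self, messages: list[str]) -> dict[str, list[str]]:
        return {"replies": [f"echo: {m}" for m in messages]}

    async def run_async(self, messages: list[str]) -> dict[str, list[str]]:
        # A real generator would await the provider's HTTP call here.
        await asyncio.sleep(0)
        return self.run(messages)


result = asyncio.run(StubChatGenerator().run_async(["What's NLP?"]))
print(result)  # {'replies': ["echo: What's NLP?"]}
```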
<a id="chat/azure_responses"></a>

## Module chat/azure\_responses

<a id="chat/azure_responses.AzureOpenAIResponsesChatGenerator"></a>

### AzureOpenAIResponsesChatGenerator

Completes chats using OpenAI's Responses API on Azure.

It works with gpt-5 and o-series models and supports streaming responses
from the OpenAI API. It uses the [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
format for input and output.

You can customize how the text is generated by passing parameters to the
OpenAI API. Use the `**generation_kwargs` argument when you initialize
the component or when you run it. Any parameter that works with
`openai.Responses.create` works here too.

For details on OpenAI API parameters, see
[OpenAI documentation](https://platform.openai.com/docs/api-reference/responses).

### Usage example

```python
from haystack.components.generators.chat import AzureOpenAIResponsesChatGenerator
from haystack.dataclasses import ChatMessage

messages = [ChatMessage.from_user("What's Natural Language Processing?")]

client = AzureOpenAIResponsesChatGenerator(
    azure_endpoint="https://example-resource.azure.openai.com/",
    generation_kwargs={"reasoning": {"effort": "low", "summary": "auto"}},
)
response = client.run(messages)
print(response)
```

<a id="chat/azure_responses.AzureOpenAIResponsesChatGenerator.__init__"></a>

#### AzureOpenAIResponsesChatGenerator.\_\_init\_\_

```python
def __init__(*,
             api_key: Secret | Callable[[], str]
             | Callable[[], Awaitable[str]] = Secret.from_env_var(
                 "AZURE_OPENAI_API_KEY", strict=False),
             azure_endpoint: str | None = None,
             azure_deployment: str = "gpt-5-mini",
             streaming_callback: StreamingCallbackT | None = None,
             organization: str | None = None,
             generation_kwargs: dict[str, Any] | None = None,
             timeout: float | None = None,
             max_retries: int | None = None,
             tools: ToolsType | None = None,
             tools_strict: bool = False,
             http_client_kwargs: dict[str, Any] | None = None)
```

Initialize the AzureOpenAIResponsesChatGenerator component.

**Arguments**:

- `api_key`: The API key to use for authentication. Can be:
  - A `Secret` object containing the API key.
  - A `Secret` object containing the [Azure Active Directory token](https://www.microsoft.com/en-us/security/business/identity-access/microsoft-entra-id).
  - A function that returns an Azure Active Directory token.
- `azure_endpoint`: The endpoint of the deployed model, for example `"https://example-resource.azure.openai.com/"`.
- `azure_deployment`: The deployment of the model, usually the model name.
- `organization`: Your organization ID, defaults to `None`. For help, see
[Setting up your organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
- `streaming_callback`: A callback function called when a new token is received from the stream.
It accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
as an argument.
- `timeout`: Timeout for OpenAI client calls. If not set, it defaults to either the
`OPENAI_TIMEOUT` environment variable or 30 seconds.
- `max_retries`: Maximum number of retries to contact OpenAI after an internal error.
If not set, it defaults to either the `OPENAI_MAX_RETRIES` environment variable or 5.
- `generation_kwargs`: Other parameters to use for the model. These parameters are sent
directly to the OpenAI endpoint.
See the OpenAI [documentation](https://platform.openai.com/docs/api-reference/responses) for
more details.
Some of the supported parameters:
- `temperature`: What sampling temperature to use.
Higher values like 0.8 make the output more random,
while lower values like 0.2 make it more focused and deterministic.
- `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
comprising the top 10% probability mass are considered.
- `previous_response_id`: The ID of the previous response.
Use this to create multi-turn conversations.
- `text_format`: A Pydantic model that enforces the structure of the model's response.
If provided, the output will always be validated against this
format (unless the model returns a tool call).
For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
- `text`: A JSON schema that enforces the structure of the model's response.
If provided, the output will always be validated against this
format (unless the model returns a tool call).
Notes:
- Both JSON schemas and Pydantic models are supported for the latest models, starting from GPT-4o.
- If both are provided, `text_format` takes precedence and the JSON schema passed to `text` is ignored.
- Currently, this component doesn't support streaming for structured outputs.
- Older models only support a basic version of structured outputs through `{"type": "json_object"}`.
For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
- `reasoning`: A dictionary of parameters for reasoning. For example:
  - `summary`: The summary of the reasoning.
  - `effort`: The level of effort to put into the reasoning. Can be `low`, `medium`, or `high`.
  - `generate_summary`: Whether to generate a summary of the reasoning.
Note: OpenAI does not return the reasoning tokens, but the summary can be viewed if it is enabled.
For details, see the [OpenAI Reasoning documentation](https://platform.openai.com/docs/guides/reasoning).
- `tools`: A list of Tool and/or Toolset objects, or a single Toolset, for which the model can prepare calls.
- `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `True`, the model follows exactly
the schema provided in the `parameters` field of the tool definition, but this may increase latency.
- `http_client_kwargs`: A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/).

<a id="chat/azure_responses.AzureOpenAIResponsesChatGenerator.to_dict"></a>

#### AzureOpenAIResponsesChatGenerator.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns**:

The serialized component as a dictionary.

<a id="chat/azure_responses.AzureOpenAIResponsesChatGenerator.from_dict"></a>

#### AzureOpenAIResponsesChatGenerator.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "AzureOpenAIResponsesChatGenerator"
```

Deserialize this component from a dictionary.

**Arguments**:

- `data`: The dictionary representation of this component.

**Returns**:

The deserialized component instance.

<a id="chat/azure_responses.AzureOpenAIResponsesChatGenerator.warm_up"></a>

#### AzureOpenAIResponsesChatGenerator.warm\_up

```python
def warm_up()
```

Warm up the OpenAI responses chat generator.

This warms up the tools registered in the chat generator.
This method is idempotent and only warms up the tools once.
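The documented precedence between `text_format` (a Pydantic model) and `text` (a JSON schema) can be sketched as a small resolution function. This is a hypothetical helper for illustration, not part of the component's API:

```python
def resolve_structured_output(text_format=None, text=None):
    # text_format wins; a JSON schema passed via `text` is ignored when both
    # are supplied, matching the precedence documented above.
    if text_format is not None:
        return ("pydantic_model", text_format)
    if text is not None:
        return ("json_schema", text)
    return ("unstructured", None)


choice = resolve_structured_output(text_format="NLPAnswer",
                                   text={"type": "json_object"})
print(choice)  # ('pydantic_model', 'NLPAnswer')
```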
<a id="chat/azure_responses.AzureOpenAIResponsesChatGenerator.run"></a>

#### AzureOpenAIResponsesChatGenerator.run

```python
@component.output_types(replies=list[ChatMessage])
def run(messages: list[ChatMessage],
        *,
        streaming_callback: StreamingCallbackT | None = None,
        generation_kwargs: dict[str, Any] | None = None,
        tools: ToolsType | list[dict] | None = None,
        tools_strict: bool | None = None) -> dict[str, list[ChatMessage]]
```

Invokes response generation based on the provided messages and generation parameters.

**Arguments**:

- `messages`: A list of ChatMessage instances representing the input messages.
- `streaming_callback`: A callback function that is called when a new token is received from the stream.
- `generation_kwargs`: Additional keyword arguments for text generation. These parameters
override the parameters passed during component initialization.
For details on OpenAI API parameters, see the [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create).
- `tools`: The tools that the model can use to prepare calls. If set, it overrides the
`tools` parameter set during component initialization. This parameter accepts either a
mixed list of Haystack `Tool` objects and Haystack `Toolset` instances, or a list of
OpenAI/MCP tool definition dictionaries.
Note: You cannot pass OpenAI/MCP tools and Haystack tools together.
For details on tool support, see the [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create#responses-create-tools).
- `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `False`, the model may not exactly
follow the schema provided in the `parameters` field of the tool definition. In the Responses API, tool calls
are strict by default.
If set, it overrides the `tools_strict` parameter set during component initialization.

**Returns**:

A dictionary with the following key:
- `replies`: A list containing the generated responses as ChatMessage instances.

<a id="chat/azure_responses.AzureOpenAIResponsesChatGenerator.run_async"></a>

#### AzureOpenAIResponsesChatGenerator.run\_async

```python
@component.output_types(replies=list[ChatMessage])
async def run_async(
    messages: list[ChatMessage],
    *,
    streaming_callback: StreamingCallbackT | None = None,
    generation_kwargs: dict[str, Any] | None = None,
    tools: ToolsType | list[dict] | None = None,
    tools_strict: bool | None = None) -> dict[str, list[ChatMessage]]
```

Asynchronously invokes response generation based on the provided messages and generation parameters.

This is the asynchronous version of the `run` method. It has the same parameters and return values
but can be used with `await` in async code.

**Arguments**:

- `messages`: A list of ChatMessage instances representing the input messages.
- `streaming_callback`: A callback function that is called when a new token is received from the stream.
Must be a coroutine.
- `generation_kwargs`: Additional keyword arguments for text generation. These parameters
override the parameters passed during component initialization.
For details on OpenAI API parameters, see the [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create).
- `tools`: A list of tools or a Toolset for which the model can prepare calls. If set, it overrides the
`tools` parameter set during component initialization. This parameter accepts either a
mixed list of Haystack `Tool` objects and Haystack `Toolset` instances, or a list of
OpenAI/MCP tool definition dictionaries.
Note: You cannot pass OpenAI/MCP tools and Haystack tools together.
671 - `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly 672 the schema provided in the `parameters` field of the tool definition, but this may increase latency. 673 If set, it will override the `tools_strict` parameter set during component initialization. 674 675 **Returns**: 676 677 A dictionary with the following key: 678 - `replies`: A list containing the generated responses as ChatMessage instances. 679 680 <a id="chat/fallback"></a> 681 682 ## Module chat/fallback 683 684 <a id="chat/fallback.FallbackChatGenerator"></a> 685 686 ### FallbackChatGenerator 687 688 A chat generator wrapper that tries multiple chat generators sequentially. 689 690 It forwards all parameters transparently to the underlying chat generators and returns the first successful result. 691 Calls chat generators sequentially until one succeeds. Falls back on any exception raised by a generator. 692 If all chat generators fail, it raises a RuntimeError with details. 693 694 Timeout enforcement is fully delegated to the underlying chat generators. The fallback mechanism will only 695 work correctly if the underlying chat generators implement proper timeout handling and raise exceptions 696 when timeouts occur. For predictable latency guarantees, ensure your chat generators: 697 - Support a `timeout` parameter in their initialization 698 - Implement timeout as total wall-clock time (shared deadline for both streaming and non-streaming) 699 - Raise timeout exceptions (e.g., TimeoutError, asyncio.TimeoutError, httpx.TimeoutException) when exceeded 700 701 Note: Most well-implemented chat generators (OpenAI, Anthropic, Cohere, etc.) support timeout parameters 702 with consistent semantics. For HTTP-based LLM providers, a single timeout value (e.g., `timeout=30`) 703 typically applies to all connection phases: connection setup, read, write, and pool. For streaming 704 responses, read timeout is the maximum gap between chunks. 
For non-streaming, it's the time limit for 705 receiving the complete response. 706 707 Failover is automatically triggered when a generator raises any exception, including: 708 - Timeout errors (if the generator implements and raises them) 709 - Rate limit errors (429) 710 - Authentication errors (401) 711 - Context length errors (400) 712 - Server errors (500+) 713 - Any other exception 714 715 <a id="chat/fallback.FallbackChatGenerator.__init__"></a> 716 717 #### FallbackChatGenerator.\_\_init\_\_ 718 719 ```python 720 def __init__(chat_generators: list[ChatGenerator]) -> None 721 ``` 722 723 Creates an instance of FallbackChatGenerator. 724 725 **Arguments**: 726 727 - `chat_generators`: A non-empty list of chat generator components to try in order. 728 729 <a id="chat/fallback.FallbackChatGenerator.to_dict"></a> 730 731 #### FallbackChatGenerator.to\_dict 732 733 ```python 734 def to_dict() -> dict[str, Any] 735 ``` 736 737 Serialize the component, including nested chat generators when they support serialization. 738 739 <a id="chat/fallback.FallbackChatGenerator.from_dict"></a> 740 741 #### FallbackChatGenerator.from\_dict 742 743 ```python 744 @classmethod 745 def from_dict(cls, data: dict[str, Any]) -> FallbackChatGenerator 746 ``` 747 748 Rebuild the component from a serialized representation, restoring nested chat generators. 749 750 <a id="chat/fallback.FallbackChatGenerator.warm_up"></a> 751 752 #### FallbackChatGenerator.warm\_up 753 754 ```python 755 def warm_up() -> None 756 ``` 757 758 Warm up all underlying chat generators. 759 760 This method calls warm_up() on each underlying generator that supports it. 
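The sequential fallback described above can be sketched in plain Python. This is an illustrative stand-in, not Haystack's implementation: the generator classes below are fakes that only mimic the `run(messages)` interface, and the meta keys mirror the ones documented for this component.

```python
class FlakyGenerator:
    """Stand-in for a chat generator that always raises (e.g. a timeout)."""

    def run(self, messages):
        raise TimeoutError("deadline exceeded")


class EchoGenerator:
    """Stand-in for a chat generator that succeeds."""

    def run(self, messages):
        return {"replies": [f"echo: {messages[-1]}"], "meta": {}}


def run_with_fallback(chat_generators, messages):
    """Try each generator in order and return the first successful result."""
    failed = []
    for index, generator in enumerate(chat_generators):
        try:
            result = generator.run(messages)
            # Attach execution metadata, mirroring the documented meta keys.
            result.setdefault("meta", {}).update({
                "successful_chat_generator_index": index,
                "successful_chat_generator_class": type(generator).__name__,
                "total_attempts": index + 1,
                "failed_chat_generators": failed,
            })
            return result
        except Exception:  # any exception triggers failover to the next generator
            failed.append(type(generator).__name__)
    raise RuntimeError(f"All chat generators failed: {failed}")


out = run_with_fallback([FlakyGenerator(), EchoGenerator()], ["hello"])
print(out["meta"]["successful_chat_generator_class"])  # EchoGenerator
```

The first generator's `TimeoutError` is swallowed, the second succeeds, and the failure is recorded in `failed_chat_generators`.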
761 762 <a id="chat/fallback.FallbackChatGenerator.run"></a> 763 764 #### FallbackChatGenerator.run 765 766 ```python 767 @component.output_types(replies=list[ChatMessage], meta=dict[str, Any]) 768 def run( 769 messages: list[ChatMessage], 770 generation_kwargs: dict[str, Any] | None = None, 771 tools: ToolsType | None = None, 772 streaming_callback: StreamingCallbackT | None = None 773 ) -> dict[str, list[ChatMessage] | dict[str, Any]] 774 ``` 775 776 Execute chat generators sequentially until one succeeds. 777 778 **Arguments**: 779 780 - `messages`: The conversation history as a list of ChatMessage instances. 781 - `generation_kwargs`: Optional parameters for the chat generator (e.g., temperature, max_tokens). 782 - `tools`: A list of Tool and/or Toolset objects, or a single Toolset for function calling capabilities. 783 - `streaming_callback`: Optional callable for handling streaming responses. 784 785 **Raises**: 786 787 - `RuntimeError`: If all chat generators fail. 788 789 **Returns**: 790 791 A dictionary with: 792 - "replies": Generated ChatMessage instances from the first successful generator. 793 - "meta": Execution metadata including successful_chat_generator_index, successful_chat_generator_class, 794 total_attempts, failed_chat_generators, plus any metadata from the successful generator. 795 796 <a id="chat/fallback.FallbackChatGenerator.run_async"></a> 797 798 #### FallbackChatGenerator.run\_async 799 800 ```python 801 @component.output_types(replies=list[ChatMessage], meta=dict[str, Any]) 802 async def run_async( 803 messages: list[ChatMessage], 804 generation_kwargs: dict[str, Any] | None = None, 805 tools: ToolsType | None = None, 806 streaming_callback: StreamingCallbackT | None = None 807 ) -> dict[str, list[ChatMessage] | dict[str, Any]] 808 ``` 809 810 Asynchronously execute chat generators sequentially until one succeeds. 811 812 **Arguments**: 813 814 - `messages`: The conversation history as a list of ChatMessage instances. 
815 - `generation_kwargs`: Optional parameters for the chat generator (e.g., temperature, max_tokens). 816 - `tools`: A list of Tool and/or Toolset objects, or a single Toolset for function calling capabilities. 817 - `streaming_callback`: Optional callable for handling streaming responses. 818 819 **Raises**: 820 821 - `RuntimeError`: If all chat generators fail. 822 823 **Returns**: 824 825 A dictionary with: 826 - "replies": Generated ChatMessage instances from the first successful generator. 827 - "meta": Execution metadata including successful_chat_generator_index, successful_chat_generator_class, 828 total_attempts, failed_chat_generators, plus any metadata from the successful generator. 829 830 <a id="chat/hugging_face_api"></a> 831 832 ## Module chat/hugging\_face\_api 833 834 <a id="chat/hugging_face_api.HuggingFaceAPIChatGenerator"></a> 835 836 ### HuggingFaceAPIChatGenerator 837 838 Completes chats using Hugging Face APIs. 839 840 HuggingFaceAPIChatGenerator uses the [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage) 841 format for input and output. 
Use it to generate text with Hugging Face APIs: 842 - [Serverless Inference API (Inference Providers)](https://huggingface.co/docs/inference-providers) 843 - [Paid Inference Endpoints](https://huggingface.co/inference-endpoints) 844 - [Self-hosted Text Generation Inference](https://github.com/huggingface/text-generation-inference) 845 846 ### Usage examples 847 848 #### With the serverless inference API (Inference Providers) - free tier available 849 850 ```python 851 from haystack.components.generators.chat import HuggingFaceAPIChatGenerator 852 from haystack.dataclasses import ChatMessage 853 from haystack.utils import Secret 854 from haystack.utils.hf import HFGenerationAPIType 855 856 messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"), 857 ChatMessage.from_user("What's Natural Language Processing?")] 858 859 # the api_type can be expressed using the HFGenerationAPIType enum or as a string 860 api_type = HFGenerationAPIType.SERVERLESS_INFERENCE_API 861 api_type = "serverless_inference_api" # this is equivalent to the above 862 863 generator = HuggingFaceAPIChatGenerator(api_type=api_type, 864 api_params={"model": "Qwen/Qwen2.5-7B-Instruct", 865 "provider": "together"}, 866 token=Secret.from_token("<your-api-key>")) 867 868 result = generator.run(messages) 869 print(result) 870 ``` 871 872 #### With the serverless inference API (Inference Providers) and text+image input 873 874 ```python 875 from haystack.components.generators.chat import HuggingFaceAPIChatGenerator 876 from haystack.dataclasses import ChatMessage, ImageContent 877 from haystack.utils import Secret 878 from haystack.utils.hf import HFGenerationAPIType 879 880 # Create an image from file path, URL, or base64 881 image = ImageContent.from_file_path("path/to/your/image.jpg") 882 883 # Create a multimodal message with both text and image 884 messages = [ChatMessage.from_user(content_parts=["Describe this image in detail", image])] 885 886 generator = 
HuggingFaceAPIChatGenerator(
    api_type=HFGenerationAPIType.SERVERLESS_INFERENCE_API,
    api_params={
        "model": "Qwen/Qwen2.5-VL-7B-Instruct",  # Vision Language Model
        "provider": "hyperbolic"
    },
    token=Secret.from_token("<your-api-key>")
)

result = generator.run(messages)
print(result)
```

#### With paid inference endpoints

```python
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret

messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
            ChatMessage.from_user("What's Natural Language Processing?")]

generator = HuggingFaceAPIChatGenerator(api_type="inference_endpoints",
                                        api_params={"url": "<your-inference-endpoint-url>"},
                                        token=Secret.from_token("<your-api-key>"))

result = generator.run(messages)
print(result)
```

#### With self-hosted text generation inference

```python
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage

messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
            ChatMessage.from_user("What's Natural Language Processing?")]

generator = HuggingFaceAPIChatGenerator(api_type="text_generation_inference",
                                        api_params={"url": "http://localhost:8080"})

result = generator.run(messages)
print(result)
```

<a id="chat/hugging_face_api.HuggingFaceAPIChatGenerator.__init__"></a>

#### HuggingFaceAPIChatGenerator.\_\_init\_\_

```python
def __init__(api_type: HFGenerationAPIType | str,
             api_params: dict[str, str],
             token: Secret | None = Secret.from_env_var(
                 ["HF_API_TOKEN", "HF_TOKEN"], strict=False),
             generation_kwargs: dict[str, Any] | None = None,
             stop_words: list[str] | None = None,
             streaming_callback:
StreamingCallbackT | None = None, 944 tools: ToolsType | None = None) 945 ``` 946 947 Initialize the HuggingFaceAPIChatGenerator instance. 948 949 **Arguments**: 950 951 - `api_type`: The type of Hugging Face API to use. Available types: 952 - `text_generation_inference`: See [TGI](https://github.com/huggingface/text-generation-inference). 953 - `inference_endpoints`: See [Inference Endpoints](https://huggingface.co/inference-endpoints). 954 - `serverless_inference_api`: See 955 [Serverless Inference API - Inference Providers](https://huggingface.co/docs/inference-providers). 956 - `api_params`: A dictionary with the following keys: 957 - `model`: Hugging Face model ID. Required when `api_type` is `SERVERLESS_INFERENCE_API`. 958 - `provider`: Provider name. Recommended when `api_type` is `SERVERLESS_INFERENCE_API`. 959 - `url`: URL of the inference endpoint. Required when `api_type` is `INFERENCE_ENDPOINTS` or 960 `TEXT_GENERATION_INFERENCE`. 961 - Other parameters specific to the chosen API type, such as `timeout`, `headers`, etc. 962 - `token`: The Hugging Face token to use as HTTP bearer authorization. 963 Check your HF token in your [account settings](https://huggingface.co/settings/tokens). 964 - `generation_kwargs`: A dictionary with keyword arguments to customize text generation. 965 Some examples: `max_tokens`, `temperature`, `top_p`. 966 For details, see [Hugging Face chat_completion documentation](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.chat_completion). 967 - `stop_words`: An optional list of strings representing the stop words. 968 - `streaming_callback`: An optional callable for handling streaming responses. 969 - `tools`: A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls. 970 The chosen model should support tool/function calling, according to the model card. 
971 Support for tools in the Hugging Face API and TGI is not yet fully refined and you may experience 972 unexpected behavior. 973 974 <a id="chat/hugging_face_api.HuggingFaceAPIChatGenerator.warm_up"></a> 975 976 #### HuggingFaceAPIChatGenerator.warm\_up 977 978 ```python 979 def warm_up() 980 ``` 981 982 Warm up the Hugging Face API chat generator. 983 984 This will warm up the tools registered in the chat generator. 985 This method is idempotent and will only warm up the tools once. 986 987 <a id="chat/hugging_face_api.HuggingFaceAPIChatGenerator.to_dict"></a> 988 989 #### HuggingFaceAPIChatGenerator.to\_dict 990 991 ```python 992 def to_dict() -> dict[str, Any] 993 ``` 994 995 Serialize this component to a dictionary. 996 997 **Returns**: 998 999 A dictionary containing the serialized component. 1000 1001 <a id="chat/hugging_face_api.HuggingFaceAPIChatGenerator.from_dict"></a> 1002 1003 #### HuggingFaceAPIChatGenerator.from\_dict 1004 1005 ```python 1006 @classmethod 1007 def from_dict(cls, data: dict[str, Any]) -> "HuggingFaceAPIChatGenerator" 1008 ``` 1009 1010 Deserialize this component from a dictionary. 1011 1012 <a id="chat/hugging_face_api.HuggingFaceAPIChatGenerator.run"></a> 1013 1014 #### HuggingFaceAPIChatGenerator.run 1015 1016 ```python 1017 @component.output_types(replies=list[ChatMessage]) 1018 def run( 1019 messages: list[ChatMessage], 1020 generation_kwargs: dict[str, Any] | None = None, 1021 tools: ToolsType | None = None, 1022 streaming_callback: StreamingCallbackT | None = None 1023 ) -> dict[str, list[ChatMessage]] 1024 ``` 1025 1026 Invoke the text generation inference based on the provided messages and generation parameters. 1027 1028 **Arguments**: 1029 1030 - `messages`: A list of ChatMessage objects representing the input messages. 1031 - `generation_kwargs`: Additional keyword arguments for text generation. 1032 - `tools`: A list of tools or a Toolset for which the model can prepare calls. 
If set, it will override
the `tools` parameter set during component initialization. This parameter can accept either a
list of `Tool` objects or a `Toolset` instance.
- `streaming_callback`: An optional callable for handling streaming responses. If set, it will override the `streaming_callback`
parameter set during component initialization.

**Returns**:

A dictionary with the following keys:
- `replies`: A list containing the generated responses as ChatMessage objects.

<a id="chat/hugging_face_api.HuggingFaceAPIChatGenerator.run_async"></a>

#### HuggingFaceAPIChatGenerator.run\_async

```python
@component.output_types(replies=list[ChatMessage])
async def run_async(
        messages: list[ChatMessage],
        generation_kwargs: dict[str, Any] | None = None,
        tools: ToolsType | None = None,
        streaming_callback: StreamingCallbackT | None = None
) -> dict[str, list[ChatMessage]]
```

Asynchronously invokes the text generation inference based on the provided messages and generation parameters.

This is the asynchronous version of the `run` method. It has the same parameters
and return values but can be used with `await` in async code.

**Arguments**:

- `messages`: A list of ChatMessage objects representing the input messages.
- `generation_kwargs`: Additional keyword arguments for text generation.
- `tools`: A list of tools or a Toolset for which the model can prepare calls. If set, it will override the `tools`
parameter set during component initialization. This parameter can accept either a list of `Tool` objects
or a `Toolset` instance.
- `streaming_callback`: An optional callable for handling streaming responses. If set, it will override the `streaming_callback`
parameter set during component initialization.
1071 1072 **Returns**: 1073 1074 A dictionary with the following keys: 1075 - `replies`: A list containing the generated responses as ChatMessage objects. 1076 1077 <a id="chat/hugging_face_local"></a> 1078 1079 ## Module chat/hugging\_face\_local 1080 1081 <a id="chat/hugging_face_local.default_tool_parser"></a> 1082 1083 #### default\_tool\_parser 1084 1085 ```python 1086 def default_tool_parser(text: str) -> list[ToolCall] | None 1087 ``` 1088 1089 Default implementation for parsing tool calls from model output text. 1090 1091 Uses DEFAULT_TOOL_PATTERN to extract tool calls. 1092 1093 **Arguments**: 1094 1095 - `text`: The text to parse for tool calls. 1096 1097 **Returns**: 1098 1099 A list containing a single ToolCall if a valid tool call is found, None otherwise. 1100 1101 <a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator"></a> 1102 1103 ### HuggingFaceLocalChatGenerator 1104 1105 Generates chat responses using models from Hugging Face that run locally. 1106 1107 Use this component with chat-based models, 1108 such as `Qwen/Qwen3-0.6B` or `meta-llama/Llama-2-7b-chat-hf`. 1109 LLMs running locally may need powerful hardware. 1110 1111 ### Usage example 1112 1113 ```python 1114 from haystack.components.generators.chat import HuggingFaceLocalChatGenerator 1115 from haystack.dataclasses import ChatMessage 1116 1117 generator = HuggingFaceLocalChatGenerator(model="Qwen/Qwen3-0.6B") 1118 generator.warm_up() 1119 messages = [ChatMessage.from_user("What's Natural Language Processing? Be brief.")] 1120 print(generator.run(messages)) 1121 ``` 1122 1123 ``` 1124 {'replies': 1125 [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=[TextContent(text= 1126 "Natural Language Processing (NLP) is a subfield of artificial intelligence that deals 1127 with the interaction between computers and human language. It enables computers to understand, interpret, and 1128 generate human language in a valuable way. 
NLP involves various techniques such as speech recognition, text
analysis, sentiment analysis, and machine translation. The ultimate goal is to make it easier for computers to
process and derive meaning from human language, improving communication between humans and machines.")],
_name=None,
_meta={'finish_reason': 'stop', 'index': 0, 'model':
'Qwen/Qwen3-0.6B',
'usage': {'completion_tokens': 90, 'prompt_tokens': 19, 'total_tokens': 109}})
]
}
```

<a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator.__init__"></a>

#### HuggingFaceLocalChatGenerator.\_\_init\_\_

```python
def __init__(model: str = "Qwen/Qwen3-0.6B",
             task: Literal["text-generation", "text2text-generation"]
             | None = None,
             device: ComponentDevice | None = None,
             token: Secret | None = Secret.from_env_var(
                 ["HF_API_TOKEN", "HF_TOKEN"], strict=False),
             chat_template: str | None = None,
             generation_kwargs: dict[str, Any] | None = None,
             huggingface_pipeline_kwargs: dict[str, Any] | None = None,
             stop_words: list[str] | None = None,
             streaming_callback: StreamingCallbackT | None = None,
             tools: ToolsType | None = None,
             tool_parsing_function: Callable[[str], list[ToolCall] | None]
             | None = None,
             async_executor: ThreadPoolExecutor | None = None,
             *,
             enable_thinking: bool = False) -> None
```

Initializes the HuggingFaceLocalChatGenerator component.

**Arguments**:

- `model`: The Hugging Face text generation model name or path,
for example, `mistralai/Mistral-7B-Instruct-v0.2` or `TheBloke/OpenHermes-2.5-Mistral-7B-16k-AWQ`.
The model must be a chat model supporting the ChatML messaging format.
If the model is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
- `task`: The task for the Hugging Face pipeline.
Possible options:
- `text-generation`: Supported by decoder models, like GPT.
- `text2text-generation`: Supported by encoder-decoder models, like T5.
If the task is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
If not specified, the component calls the Hugging Face API to infer the task from the model name.
- `device`: The device for loading the model. If `None`, automatically selects the default device.
If a device or device map is specified in `huggingface_pipeline_kwargs`, it overrides this parameter.
- `token`: The token to use as HTTP bearer authorization for remote files.
If the token is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
- `chat_template`: Specifies an optional Jinja template for formatting chat
messages. Most high-quality chat models have their own templates, but for models without this
feature or if you prefer a custom template, use this parameter.
- `generation_kwargs`: A dictionary with keyword arguments to customize text generation.
Some examples: `max_length`, `max_new_tokens`, `temperature`, `top_k`, `top_p`.
See Hugging Face's documentation for more information:
- [customize-text-generation](https://huggingface.co/docs/transformers/main/en/generation_strategies#customize-text-generation)
- [GenerationConfig](https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.GenerationConfig)
The only `generation_kwargs` set by default is `max_new_tokens`, which is set to 512 tokens.
- `huggingface_pipeline_kwargs`: Dictionary with keyword arguments to initialize the
Hugging Face pipeline for text generation.
These keyword arguments provide fine-grained control over the Hugging Face pipeline.
In case of duplication, these kwargs override `model`, `task`, `device`, and `token` init parameters.
For kwargs, see [Hugging Face documentation](https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.pipeline.task).
In this dictionary, you can also include `model_kwargs` to specify the kwargs for [model initialization](https://huggingface.co/docs/transformers/en/main_classes/model#transformers.PreTrainedModel.from_pretrained).
- `stop_words`: A list of stop words. If the model generates a stop word, the generation stops.
If you provide this parameter, don't specify the `stopping_criteria` in `generation_kwargs`.
For some chat models, the output includes both the new text and the original prompt.
In these cases, make sure your prompt has no stop words.
- `streaming_callback`: An optional callable for handling streaming responses.
- `tools`: A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
- `tool_parsing_function`: A callable that takes a string and returns a list of ToolCall objects or None.
If None, the default_tool_parser is used, which extracts tool calls using a predefined pattern.
- `async_executor`: Optional ThreadPoolExecutor to use for async calls. If not provided, a single-threaded
executor is initialized and used.
- `enable_thinking`: Whether to enable thinking mode in the chat template for thinking-capable models.
When enabled, the model generates intermediate reasoning before the final response. Defaults to False.

<a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator.__del__"></a>

#### HuggingFaceLocalChatGenerator.\_\_del\_\_

```python
def __del__() -> None
```

Cleanup when the instance is being destroyed.
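A custom `tool_parsing_function` is just a callable from the raw model text to parsed tool calls (or `None`). Below is a minimal sketch assuming the model emits a single JSON object with `name` and `arguments` keys; in real code you would construct `haystack.dataclasses.ToolCall` objects instead of the stand-in dataclass used here, and the JSON format itself is an illustrative assumption.

```python
import json
import re
from dataclasses import dataclass


@dataclass
class ToolCall:  # stand-in for haystack.dataclasses.ToolCall
    tool_name: str
    arguments: dict


def json_tool_parser(text):
    """Extract a tool call of the form {"name": ..., "arguments": {...}} from model output."""
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if not match:
        return None  # no JSON object found: treat the output as plain text
    try:
        payload = json.loads(match.group())
    except json.JSONDecodeError:
        return None
    if "name" not in payload or "arguments" not in payload:
        return None
    return [ToolCall(tool_name=payload["name"], arguments=payload["arguments"])]


calls = json_tool_parser('Sure! {"name": "get_weather", "arguments": {"city": "Berlin"}}')
```

Returning `None` signals that no tool call was found, so the text is kept as a regular assistant reply.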
1218 1219 <a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator.shutdown"></a> 1220 1221 #### HuggingFaceLocalChatGenerator.shutdown 1222 1223 ```python 1224 def shutdown() -> None 1225 ``` 1226 1227 Explicitly shutdown the executor if we own it. 1228 1229 <a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator.warm_up"></a> 1230 1231 #### HuggingFaceLocalChatGenerator.warm\_up 1232 1233 ```python 1234 def warm_up() -> None 1235 ``` 1236 1237 Initializes the component and warms up tools if provided. 1238 1239 <a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator.to_dict"></a> 1240 1241 #### HuggingFaceLocalChatGenerator.to\_dict 1242 1243 ```python 1244 def to_dict() -> dict[str, Any] 1245 ``` 1246 1247 Serializes the component to a dictionary. 1248 1249 **Returns**: 1250 1251 Dictionary with serialized data. 1252 1253 <a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator.from_dict"></a> 1254 1255 #### HuggingFaceLocalChatGenerator.from\_dict 1256 1257 ```python 1258 @classmethod 1259 def from_dict(cls, data: dict[str, Any]) -> "HuggingFaceLocalChatGenerator" 1260 ``` 1261 1262 Deserializes the component from a dictionary. 1263 1264 **Arguments**: 1265 1266 - `data`: The dictionary to deserialize from. 1267 1268 **Returns**: 1269 1270 The deserialized component. 1271 1272 <a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator.run"></a> 1273 1274 #### HuggingFaceLocalChatGenerator.run 1275 1276 ```python 1277 @component.output_types(replies=list[ChatMessage]) 1278 def run(messages: list[ChatMessage], 1279 generation_kwargs: dict[str, Any] | None = None, 1280 streaming_callback: StreamingCallbackT | None = None, 1281 tools: ToolsType | None = None) -> dict[str, list[ChatMessage]] 1282 ``` 1283 1284 Invoke text generation inference based on the provided messages and generation parameters. 1285 1286 **Arguments**: 1287 1288 - `messages`: A list of ChatMessage objects representing the input messages. 
1289 - `generation_kwargs`: Additional keyword arguments for text generation. 1290 - `streaming_callback`: An optional callable for handling streaming responses. 1291 - `tools`: A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls. 1292 If set, it will override the `tools` parameter provided during initialization. 1293 1294 **Returns**: 1295 1296 A dictionary with the following keys: 1297 - `replies`: A list containing the generated responses as ChatMessage instances. 1298 1299 <a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator.create_message"></a> 1300 1301 #### HuggingFaceLocalChatGenerator.create\_message 1302 1303 ```python 1304 def create_message(text: str, 1305 index: int, 1306 tokenizer: Union["PreTrainedTokenizer", 1307 "PreTrainedTokenizerFast"], 1308 prompt: str, 1309 generation_kwargs: dict[str, Any], 1310 parse_tool_calls: bool = False) -> ChatMessage 1311 ``` 1312 1313 Create a ChatMessage instance from the provided text, populated with metadata. 1314 1315 **Arguments**: 1316 1317 - `text`: The generated text. 1318 - `index`: The index of the generated text. 1319 - `tokenizer`: The tokenizer used for generation. 1320 - `prompt`: The prompt used for generation. 1321 - `generation_kwargs`: The generation parameters. 1322 - `parse_tool_calls`: Whether to attempt parsing tool calls from the text. 1323 1324 **Returns**: 1325 1326 A ChatMessage instance. 
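A `streaming_callback` passed to `run` is any callable invoked once per streamed chunk; chunks expose the newly generated text via their `content` attribute. A minimal collector sketch follows, using a stand-in chunk class in place of Haystack's `StreamingChunk`:

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:  # stand-in for haystack.dataclasses.StreamingChunk
    content: str


@dataclass
class StreamCollector:
    """Accumulates streamed text; an instance can be passed as streaming_callback."""

    parts: list = field(default_factory=list)

    def __call__(self, chunk):
        self.parts.append(chunk.content)  # called once per received chunk

    @property
    def text(self):
        return "".join(self.parts)


collector = StreamCollector()
for piece in ["Natural ", "Language ", "Processing"]:
    collector(Chunk(content=piece))  # the generator would invoke this per token
```

Because the collector keeps state, the full text is available after `run` returns, while each chunk could also be printed or forwarded as it arrives.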
<a id="chat/hugging_face_local.HuggingFaceLocalChatGenerator.run_async"></a>

#### HuggingFaceLocalChatGenerator.run\_async

```python
@component.output_types(replies=list[ChatMessage])
async def run_async(
        messages: list[ChatMessage],
        generation_kwargs: dict[str, Any] | None = None,
        streaming_callback: StreamingCallbackT | None = None,
        tools: ToolsType | None = None) -> dict[str, list[ChatMessage]]
```

Asynchronously invokes text generation inference based on the provided messages and generation parameters.

This is the asynchronous version of the `run` method. It has the same parameters
and return values but can be used with `await` in async code.

**Arguments**:

- `messages`: A list of ChatMessage objects representing the input messages.
- `generation_kwargs`: Additional keyword arguments for text generation.
- `streaming_callback`: An optional callable for handling streaming responses.
- `tools`: A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
If set, it will override the `tools` parameter provided during initialization.

**Returns**:

A dictionary with the following keys:
- `replies`: A list containing the generated responses as ChatMessage instances.

<a id="chat/openai"></a>

## Module chat/openai

<a id="chat/openai.OpenAIChatGenerator"></a>

### OpenAIChatGenerator

Completes chats using OpenAI's large language models (LLMs).

It works with the gpt-4 and gpt-5 series models and supports streaming responses
from the OpenAI API. It uses the [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
format for input and output.

You can customize how the text is generated by passing parameters to the
OpenAI API.
Use the `**generation_kwargs` argument when you initialize 1375 the component or when you run it. Any parameter that works with 1376 `openai.ChatCompletion.create` will work here too. 1377 1378 For details on OpenAI API parameters, see 1379 [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat). 1380 1381 ### Usage example 1382 1383 ```python 1384 from haystack.components.generators.chat import OpenAIChatGenerator 1385 from haystack.dataclasses import ChatMessage 1386 1387 messages = [ChatMessage.from_user("What's Natural Language Processing?")] 1388 1389 client = OpenAIChatGenerator() 1390 response = client.run(messages) 1391 print(response) 1392 ``` 1393 Output: 1394 ``` 1395 {'replies': 1396 [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content= 1397 [TextContent(text="Natural Language Processing (NLP) is a branch of artificial intelligence 1398 that focuses on enabling computers to understand, interpret, and generate human language in 1399 a way that is meaningful and useful.")], 1400 _name=None, 1401 _meta={'model': 'gpt-5-mini', 'index': 0, 'finish_reason': 'stop', 1402 'usage': {'prompt_tokens': 15, 'completion_tokens': 36, 'total_tokens': 51}}) 1403 ] 1404 } 1405 ``` 1406 1407 <a id="chat/openai.OpenAIChatGenerator.__init__"></a> 1408 1409 #### OpenAIChatGenerator.\_\_init\_\_ 1410 1411 ```python 1412 def __init__(api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"), 1413 model: str = "gpt-5-mini", 1414 streaming_callback: StreamingCallbackT | None = None, 1415 api_base_url: str | None = None, 1416 organization: str | None = None, 1417 generation_kwargs: dict[str, Any] | None = None, 1418 timeout: float | None = None, 1419 max_retries: int | None = None, 1420 tools: ToolsType | None = None, 1421 tools_strict: bool = False, 1422 http_client_kwargs: dict[str, Any] | None = None) 1423 ``` 1424 1425 Creates an instance of OpenAIChatGenerator. 
Unless specified otherwise in `model`, uses OpenAI's gpt-5-mini.

Before initializing the component, you can set the 'OPENAI_TIMEOUT' and 'OPENAI_MAX_RETRIES'
environment variables to override the `timeout` and `max_retries` parameters respectively
in the OpenAI client.

**Arguments**:

- `api_key`: The OpenAI API key.
You can set it with an environment variable `OPENAI_API_KEY`, or pass with this parameter
during initialization.
- `model`: The name of the model to use.
- `streaming_callback`: A callback function that is called when a new token is received from the stream.
The callback function accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
as an argument.
- `api_base_url`: An optional base URL.
- `organization`: Your organization ID, defaults to `None`. See
[production best practices](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
- `generation_kwargs`: Other parameters to use for the model. These parameters are sent directly to
the OpenAI endpoint. See OpenAI [documentation](https://platform.openai.com/docs/api-reference/chat) for
more details.
Some of the supported parameters:
- `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion,
including visible output tokens and reasoning tokens.
- `temperature`: What sampling temperature to use. Higher values mean the model will take more risks.
Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
- `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
comprising the top 10% probability mass are considered.
- `n`: How many completions to generate for each prompt. For example, if the LLM gets 3 prompts and n is 2,
it will generate two completions for each of the three prompts, ending up with 6 completions in total.
- `stop`: One or more sequences after which the LLM should stop generating tokens.
- `presence_penalty`: What penalty to apply if a token is already present at all. Bigger values mean
the model will be less likely to repeat the same token in the text.
- `frequency_penalty`: What penalty to apply if a token has already been generated in the text.
Bigger values mean the model will be less likely to repeat the same token in the text.
- `logit_bias`: Add a logit bias to specific tokens. The keys of the dictionary are tokens, and the
values are the bias to add to that token.
- `response_format`: A JSON schema or a Pydantic model that enforces the structure of the model's response.
If provided, the output will always be validated against this
format (unless the model returns a tool call).
For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
Notes:
- This parameter accepts Pydantic models and JSON schemas for the latest models, starting from GPT-4o.
Older models only support a basic version of structured outputs through `{"type": "json_object"}`.
For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
- For structured outputs with streaming,
the `response_format` must be a JSON schema and not a Pydantic model.
- `timeout`: Timeout for OpenAI client calls. If not set, it defaults to either the
`OPENAI_TIMEOUT` environment variable, or 30 seconds.
- `max_retries`: Maximum number of retries to contact OpenAI after an internal error.
If not set, it defaults to either the `OPENAI_MAX_RETRIES` environment variable, or 5.
- `tools`: A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
- `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
the schema provided in the `parameters` field of the tool definition, but this may increase latency.
- `http_client_kwargs`: A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).

<a id="chat/openai.OpenAIChatGenerator.warm_up"></a>

#### OpenAIChatGenerator.warm\_up

```python
def warm_up()
```

Warm up the OpenAI chat generator.

This will warm up the tools registered in the chat generator.
This method is idempotent and will only warm up the tools once.

<a id="chat/openai.OpenAIChatGenerator.to_dict"></a>

#### OpenAIChatGenerator.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns**:

The serialized component as a dictionary.

<a id="chat/openai.OpenAIChatGenerator.from_dict"></a>

#### OpenAIChatGenerator.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "OpenAIChatGenerator"
```

Deserialize this component from a dictionary.

**Arguments**:

- `data`: The dictionary representation of this component.

**Returns**:

The deserialized component instance.
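The `response_format` entry listed under `generation_kwargs` above accepts either a Pydantic model or a raw JSON schema. As a hedged sketch of the JSON-schema variant — the `{"type": "json_schema", ...}` envelope follows OpenAI's structured-outputs convention, and the schema name and fields here are invented for illustration:

```python
import json

# Hypothetical structured-output format; the envelope shape follows OpenAI's
# structured-outputs convention for the Chat Completions API.
city_info_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "city_info",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "country": {"type": "string"},
            },
            "required": ["city", "country"],
            "additionalProperties": False,
        },
    },
}

# The value must survive JSON serialization unchanged to reach the API intact.
assert json.loads(json.dumps(city_info_format)) == city_info_format
```

You would then pass it as `generation_kwargs={"response_format": city_info_format}` when initializing the component or when calling `run`.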
1528 1529 <a id="chat/openai.OpenAIChatGenerator.run"></a> 1530 1531 #### OpenAIChatGenerator.run 1532 1533 ```python 1534 @component.output_types(replies=list[ChatMessage]) 1535 def run(messages: list[ChatMessage], 1536 streaming_callback: StreamingCallbackT | None = None, 1537 generation_kwargs: dict[str, Any] | None = None, 1538 *, 1539 tools: ToolsType | None = None, 1540 tools_strict: bool | None = None) -> dict[str, list[ChatMessage]] 1541 ``` 1542 1543 Invokes chat completion based on the provided messages and generation parameters. 1544 1545 **Arguments**: 1546 1547 - `messages`: A list of ChatMessage instances representing the input messages. 1548 - `streaming_callback`: A callback function that is called when a new token is received from the stream. 1549 - `generation_kwargs`: Additional keyword arguments for text generation. These parameters will 1550 override the parameters passed during component initialization. 1551 For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat/create). 1552 - `tools`: A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls. 1553 If set, it will override the `tools` parameter provided during initialization. 1554 - `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly 1555 the schema provided in the `parameters` field of the tool definition, but this may increase latency. 1556 If set, it will override the `tools_strict` parameter set during component initialization. 1557 1558 **Returns**: 1559 1560 A dictionary with the following key: 1561 - `replies`: A list containing the generated responses as ChatMessage instances. 
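The override rule above (run-time `generation_kwargs` take precedence over those set at initialization) behaves like a shallow dictionary merge. A minimal sketch of the semantics, not Haystack's internal implementation:

```python
# Init-time and run-time generation kwargs; run-time values win on key collisions.
init_kwargs = {"temperature": 0.7, "max_completion_tokens": 256}
run_kwargs = {"temperature": 0.2}  # what you would pass to run()

# Shallow merge: later dict overrides earlier one key by key.
effective = {**init_kwargs, **(run_kwargs or {})}
assert effective == {"temperature": 0.2, "max_completion_tokens": 256}
```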
1562 1563 <a id="chat/openai.OpenAIChatGenerator.run_async"></a> 1564 1565 #### OpenAIChatGenerator.run\_async 1566 1567 ```python 1568 @component.output_types(replies=list[ChatMessage]) 1569 async def run_async( 1570 messages: list[ChatMessage], 1571 streaming_callback: StreamingCallbackT | None = None, 1572 generation_kwargs: dict[str, Any] | None = None, 1573 *, 1574 tools: ToolsType | None = None, 1575 tools_strict: bool | None = None) -> dict[str, list[ChatMessage]] 1576 ``` 1577 1578 Asynchronously invokes chat completion based on the provided messages and generation parameters. 1579 1580 This is the asynchronous version of the `run` method. It has the same parameters and return values 1581 but can be used with `await` in async code. 1582 1583 **Arguments**: 1584 1585 - `messages`: A list of ChatMessage instances representing the input messages. 1586 - `streaming_callback`: A callback function that is called when a new token is received from the stream. 1587 Must be a coroutine. 1588 - `generation_kwargs`: Additional keyword arguments for text generation. These parameters will 1589 override the parameters passed during component initialization. 1590 For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat/create). 1591 - `tools`: A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls. 1592 If set, it will override the `tools` parameter provided during initialization. 1593 - `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly 1594 the schema provided in the `parameters` field of the tool definition, but this may increase latency. 1595 If set, it will override the `tools_strict` parameter set during component initialization. 1596 1597 **Returns**: 1598 1599 A dictionary with the following key: 1600 - `replies`: A list containing the generated responses as ChatMessage instances. 
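As noted above, the `streaming_callback` passed to `run_async` must be a coroutine. A self-contained sketch of that pattern, using a stand-in class instead of Haystack's `StreamingChunk`:

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Chunk:
    """Stand-in for haystack.dataclasses.StreamingChunk (illustration only)."""
    content: str


collected: list[str] = []


async def on_chunk(chunk: Chunk) -> None:
    # A coroutine callback, as run_async requires; here it just gathers text.
    collected.append(chunk.content)


async def fake_stream() -> None:
    # Simulates the generator awaiting the callback once per streamed token.
    for token in ["Natural ", "Language ", "Processing"]:
        await on_chunk(Chunk(content=token))


asyncio.run(fake_stream())
assert "".join(collected) == "Natural Language Processing"
```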
1601 1602 <a id="chat/openai_responses"></a> 1603 1604 ## Module chat/openai\_responses 1605 1606 <a id="chat/openai_responses.OpenAIResponsesChatGenerator"></a> 1607 1608 ### OpenAIResponsesChatGenerator 1609 1610 Completes chats using OpenAI's Responses API. 1611 1612 It works with the gpt-4 and o-series models and supports streaming responses 1613 from OpenAI API. It uses [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage) 1614 format in input and output. 1615 1616 You can customize how the text is generated by passing parameters to the 1617 OpenAI API. Use the `**generation_kwargs` argument when you initialize 1618 the component or when you run it. Any parameter that works with 1619 `openai.Responses.create` will work here too. 1620 1621 For details on OpenAI API parameters, see 1622 [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses). 1623 1624 ### Usage example 1625 1626 ```python 1627 from haystack.components.generators.chat import OpenAIResponsesChatGenerator 1628 from haystack.dataclasses import ChatMessage 1629 1630 messages = [ChatMessage.from_user("What's Natural Language Processing?")] 1631 1632 client = OpenAIResponsesChatGenerator(generation_kwargs={"reasoning": {"effort": "low", "summary": "auto"}}) 1633 response = client.run(messages) 1634 print(response) 1635 ``` 1636 1637 <a id="chat/openai_responses.OpenAIResponsesChatGenerator.__init__"></a> 1638 1639 #### OpenAIResponsesChatGenerator.\_\_init\_\_ 1640 1641 ```python 1642 def __init__(*, 1643 api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"), 1644 model: str = "gpt-5-mini", 1645 streaming_callback: StreamingCallbackT | None = None, 1646 api_base_url: str | None = None, 1647 organization: str | None = None, 1648 generation_kwargs: dict[str, Any] | None = None, 1649 timeout: float | None = None, 1650 max_retries: int | None = None, 1651 tools: ToolsType | list[dict] | None = None, 1652 tools_strict: bool = False, 1653 http_client_kwargs: dict[str, 
Any] | None = None)
```

Creates an instance of OpenAIResponsesChatGenerator. Uses OpenAI's gpt-5-mini by default.

Before initializing the component, you can set the 'OPENAI_TIMEOUT' and 'OPENAI_MAX_RETRIES'
environment variables to override the `timeout` and `max_retries` parameters respectively
in the OpenAI client.

**Arguments**:

- `api_key`: The OpenAI API key.
You can set it with an environment variable `OPENAI_API_KEY`, or pass it with this parameter
during initialization.
- `model`: The name of the model to use.
- `streaming_callback`: A callback function that is called when a new token is received from the stream.
The callback function accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
as an argument.
- `api_base_url`: An optional base URL.
- `organization`: Your organization ID, defaults to `None`. See
[production best practices](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
- `generation_kwargs`: Other parameters to use for the model. These parameters are sent
directly to the OpenAI endpoint.
See OpenAI [documentation](https://platform.openai.com/docs/api-reference/responses) for
more details.
Some of the supported parameters:
- `temperature`: What sampling temperature to use. Higher values like 0.8 will make the output more random,
while lower values like 0.2 will make it more focused and deterministic.
- `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
comprising the top 10% probability mass are considered.
- `previous_response_id`: The ID of the previous response.
Use this to create multi-turn conversations.
- `text_format`: A Pydantic model that enforces the structure of the model's response.
If provided, the output will always be validated against this
format (unless the model returns a tool call).
For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
- `text`: A JSON schema that enforces the structure of the model's response.
If provided, the output will always be validated against this
format (unless the model returns a tool call).
Notes:
- Both JSON Schema and Pydantic models are supported for the latest models, starting from GPT-4o.
- If both are provided, `text_format` takes precedence and the JSON schema passed to `text` is ignored.
- Currently, this component doesn't support streaming for structured outputs.
- Older models only support a basic version of structured outputs through `{"type": "json_object"}`.
For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
- `reasoning`: A dictionary of parameters for reasoning. For example:
- `summary`: The summary of the reasoning.
- `effort`: The level of effort to put into the reasoning. Can be `low`, `medium`, or `high`.
- `generate_summary`: Whether to generate a summary of the reasoning.
Note: OpenAI does not return the reasoning tokens, but you can view the summary if it's enabled.
For details, see the [OpenAI Reasoning documentation](https://platform.openai.com/docs/guides/reasoning).
- `timeout`: Timeout for OpenAI client calls. If not set, it defaults to either the
`OPENAI_TIMEOUT` environment variable, or 30 seconds.
- `max_retries`: Maximum number of retries to contact OpenAI after an internal error.
If not set, it defaults to either the `OPENAI_MAX_RETRIES` environment variable, or 5.
- `tools`: The tools that the model can use to prepare calls. This parameter accepts either a
mixed list of Haystack `Tool` objects and Haystack `Toolset`s, or a list of
OpenAI/MCP tool definition dictionaries.
Note: You cannot pass OpenAI/MCP tools and Haystack tools together.
For details on tool support, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create#responses-create-tools).
- `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `False`, the model may not exactly
follow the schema provided in the `parameters` field of the tool definition. In the Responses API, tool calls
are strict by default.
- `http_client_kwargs`: A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).

<a id="chat/openai_responses.OpenAIResponsesChatGenerator.warm_up"></a>

#### OpenAIResponsesChatGenerator.warm\_up

```python
def warm_up()
```

Warm up the OpenAI responses chat generator.

This will warm up the tools registered in the chat generator.
This method is idempotent and will only warm up the tools once.

<a id="chat/openai_responses.OpenAIResponsesChatGenerator.to_dict"></a>

#### OpenAIResponsesChatGenerator.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns**:

The serialized component as a dictionary.

<a id="chat/openai_responses.OpenAIResponsesChatGenerator.from_dict"></a>

#### OpenAIResponsesChatGenerator.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "OpenAIResponsesChatGenerator"
```

Deserialize this component from a dictionary.
**Arguments**:

- `data`: The dictionary representation of this component.

**Returns**:

The deserialized component instance.

<a id="chat/openai_responses.OpenAIResponsesChatGenerator.run"></a>

#### OpenAIResponsesChatGenerator.run

```python
@component.output_types(replies=list[ChatMessage])
def run(messages: list[ChatMessage],
        *,
        streaming_callback: StreamingCallbackT | None = None,
        generation_kwargs: dict[str, Any] | None = None,
        tools: ToolsType | list[dict] | None = None,
        tools_strict: bool | None = None) -> dict[str, list[ChatMessage]]
```

Invokes response generation based on the provided messages and generation parameters.

**Arguments**:

- `messages`: A list of ChatMessage instances representing the input messages.
- `streaming_callback`: A callback function that is called when a new token is received from the stream.
- `generation_kwargs`: Additional keyword arguments for text generation. These parameters will
override the parameters passed during component initialization.
For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create).
- `tools`: The tools that the model can use to prepare calls. If set, it will override the
`tools` parameter set during component initialization. This parameter accepts either a
mixed list of Haystack `Tool` objects and Haystack `Toolset`s, or a list of
OpenAI/MCP tool definition dictionaries.
Note: You cannot pass OpenAI/MCP tools and Haystack tools together.
For details on tool support, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create#responses-create-tools).
- `tools_strict`: Whether to enable strict schema adherence for tool calls.
If set to `False`, the model may not exactly
follow the schema provided in the `parameters` field of the tool definition. In the Responses API, tool calls
are strict by default.
If set, it will override the `tools_strict` parameter set during component initialization.

**Returns**:

A dictionary with the following key:
- `replies`: A list containing the generated responses as ChatMessage instances.

<a id="chat/openai_responses.OpenAIResponsesChatGenerator.run_async"></a>

#### OpenAIResponsesChatGenerator.run\_async

```python
@component.output_types(replies=list[ChatMessage])
async def run_async(
        messages: list[ChatMessage],
        *,
        streaming_callback: StreamingCallbackT | None = None,
        generation_kwargs: dict[str, Any] | None = None,
        tools: ToolsType | list[dict] | None = None,
        tools_strict: bool | None = None) -> dict[str, list[ChatMessage]]
```

Asynchronously invokes response generation based on the provided messages and generation parameters.

This is the asynchronous version of the `run` method. It has the same parameters and return values
but can be used with `await` in async code.

**Arguments**:

- `messages`: A list of ChatMessage instances representing the input messages.
- `streaming_callback`: A callback function that is called when a new token is received from the stream.
Must be a coroutine.
- `generation_kwargs`: Additional keyword arguments for text generation. These parameters will
override the parameters passed during component initialization.
For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create).
- `tools`: A list of tools or a Toolset for which the model can prepare calls. If set, it will override the
`tools` parameter set during component initialization.
This parameter accepts either a
mixed list of Haystack `Tool` objects and Haystack `Toolset`s, or a list of
OpenAI/MCP tool definition dictionaries.
Note: You cannot pass OpenAI/MCP tools and Haystack tools together.
- `tools_strict`: Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
the schema provided in the `parameters` field of the tool definition, but this may increase latency.
If set, it will override the `tools_strict` parameter set during component initialization.

**Returns**:

A dictionary with the following key:
- `replies`: A list containing the generated responses as ChatMessage instances.

<a id="hugging_face_api"></a>

## Module hugging\_face\_api

<a id="hugging_face_api.HuggingFaceAPIGenerator"></a>

### HuggingFaceAPIGenerator

Generates text using Hugging Face APIs.

Use it with the following Hugging Face APIs:
- [Paid Inference Endpoints](https://huggingface.co/inference-endpoints)
- [Self-hosted Text Generation Inference](https://github.com/huggingface/text-generation-inference)

**Note:** As of July 2025, the Hugging Face Inference API no longer offers generative models through the
`text_generation` endpoint. Generative models are now only available through providers supporting the
`chat_completion` endpoint. As a result, this component might no longer work with the Hugging Face Inference API.
Use the `HuggingFaceAPIChatGenerator` component, which supports the `chat_completion` endpoint.
### Usage examples

#### With Hugging Face Inference Endpoints

```python
from haystack.components.generators import HuggingFaceAPIGenerator
from haystack.utils import Secret

generator = HuggingFaceAPIGenerator(api_type="inference_endpoints",
                                    api_params={"url": "<your-inference-endpoint-url>"},
                                    token=Secret.from_token("<your-api-key>"))

result = generator.run(prompt="What's Natural Language Processing?")
print(result)
```

#### With self-hosted text generation inference

```python
from haystack.components.generators import HuggingFaceAPIGenerator

generator = HuggingFaceAPIGenerator(api_type="text_generation_inference",
                                    api_params={"url": "http://localhost:8080"})

result = generator.run(prompt="What's Natural Language Processing?")
print(result)
```

#### With the free serverless inference API

Be aware that this example might not work, as the Hugging Face Inference API no longer offers models that support the
`text_generation` endpoint. Use the `HuggingFaceAPIChatGenerator` for generative models through the
`chat_completion` endpoint.

```python
from haystack.components.generators import HuggingFaceAPIGenerator
from haystack.utils import Secret

generator = HuggingFaceAPIGenerator(api_type="serverless_inference_api",
                                    api_params={"model": "HuggingFaceH4/zephyr-7b-beta"},
                                    token=Secret.from_token("<your-api-key>"))

result = generator.run(prompt="What's Natural Language Processing?")
print(result)
```

<a id="hugging_face_api.HuggingFaceAPIGenerator.__init__"></a>

#### HuggingFaceAPIGenerator.\_\_init\_\_

```python
def __init__(api_type: HFGenerationAPIType | str,
             api_params: dict[str, str],
             token: Secret | None = Secret.from_env_var(
                 ["HF_API_TOKEN", "HF_TOKEN"], strict=False),
             generation_kwargs:
dict[str, Any] | None = None,
             stop_words: list[str] | None = None,
             streaming_callback: StreamingCallbackT | None = None)
```

Initialize the HuggingFaceAPIGenerator instance.

**Arguments**:

- `api_type`: The type of Hugging Face API to use. Available types:
- `text_generation_inference`: See [TGI](https://github.com/huggingface/text-generation-inference).
- `inference_endpoints`: See [Inference Endpoints](https://huggingface.co/inference-endpoints).
- `serverless_inference_api`: See [Serverless Inference API](https://huggingface.co/inference-api).
This might no longer work due to changes in the models offered in the Hugging Face Inference API.
Please use the `HuggingFaceAPIChatGenerator` component instead.
- `api_params`: A dictionary with the following keys:
- `model`: Hugging Face model ID. Required when `api_type` is `SERVERLESS_INFERENCE_API`.
- `url`: URL of the inference endpoint. Required when `api_type` is `INFERENCE_ENDPOINTS` or
`TEXT_GENERATION_INFERENCE`.
- Other parameters specific to the chosen API type, such as `timeout`, `headers`, and `provider`.
- `token`: The Hugging Face token to use as HTTP bearer authorization.
Check your HF token in your [account settings](https://huggingface.co/settings/tokens).
- `generation_kwargs`: A dictionary with keyword arguments to customize text generation. Some examples: `max_new_tokens`,
`temperature`, `top_k`, `top_p`.
For details, see the [Hugging Face documentation](https://huggingface.co/docs/huggingface_hub/en/package_reference/inference_client#huggingface_hub.InferenceClient.text_generation).
- `stop_words`: An optional list of strings representing the stop words.
- `streaming_callback`: An optional callable for handling streaming responses.
<a id="hugging_face_api.HuggingFaceAPIGenerator.to_dict"></a>

#### HuggingFaceAPIGenerator.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns**:

A dictionary containing the serialized component.

<a id="hugging_face_api.HuggingFaceAPIGenerator.from_dict"></a>

#### HuggingFaceAPIGenerator.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "HuggingFaceAPIGenerator"
```

Deserialize this component from a dictionary.

<a id="hugging_face_api.HuggingFaceAPIGenerator.run"></a>

#### HuggingFaceAPIGenerator.run

```python
@component.output_types(replies=list[str], meta=list[dict[str, Any]])
def run(prompt: str,
        streaming_callback: StreamingCallbackT | None = None,
        generation_kwargs: dict[str, Any] | None = None)
```

Invoke the text generation inference for the given prompt and generation parameters.

**Arguments**:

- `prompt`: A string representing the prompt.
- `streaming_callback`: A callback function that is called when a new token is received from the stream.
- `generation_kwargs`: Additional keyword arguments for text generation.

**Returns**:

A dictionary with the generated replies and metadata. Both are lists of length n.
- replies: A list of strings representing the generated replies.
- meta: A list of dictionaries containing the metadata for each reply.

<a id="hugging_face_local"></a>

## Module hugging\_face\_local

<a id="hugging_face_local.HuggingFaceLocalGenerator"></a>

### HuggingFaceLocalGenerator

Generates text using models from Hugging Face that run locally.

LLMs running locally may need powerful hardware.
2009 2010 ### Usage example 2011 2012 ```python 2013 from haystack.components.generators import HuggingFaceLocalGenerator 2014 2015 generator = HuggingFaceLocalGenerator( 2016 model="google/flan-t5-large", 2017 task="text2text-generation", 2018 generation_kwargs={"max_new_tokens": 100, "temperature": 0.9}) 2019 2020 generator.warm_up() 2021 2022 print(generator.run("Who is the best American actor?")) 2023 # {'replies': ['John Cusack']} 2024 ``` 2025 2026 <a id="hugging_face_local.HuggingFaceLocalGenerator.__init__"></a> 2027 2028 #### HuggingFaceLocalGenerator.\_\_init\_\_ 2029 2030 ```python 2031 def __init__(model: str = "google/flan-t5-base", 2032 task: Literal["text-generation", "text2text-generation"] 2033 | None = None, 2034 device: ComponentDevice | None = None, 2035 token: Secret | None = Secret.from_env_var( 2036 ["HF_API_TOKEN", "HF_TOKEN"], strict=False), 2037 generation_kwargs: dict[str, Any] | None = None, 2038 huggingface_pipeline_kwargs: dict[str, Any] | None = None, 2039 stop_words: list[str] | None = None, 2040 streaming_callback: StreamingCallbackT | None = None) 2041 ``` 2042 2043 Creates an instance of a HuggingFaceLocalGenerator. 2044 2045 **Arguments**: 2046 2047 - `model`: The Hugging Face text generation model name or path. 2048 - `task`: The task for the Hugging Face pipeline. Possible options: 2049 - `text-generation`: Supported by decoder models, like GPT. 2050 - `text2text-generation`: Supported by encoder-decoder models, like T5. 2051 If the task is specified in `huggingface_pipeline_kwargs`, this parameter is ignored. 2052 If not specified, the component calls the Hugging Face API to infer the task from the model name. 2053 - `device`: The device for loading the model. If `None`, automatically selects the default device. 2054 If a device or device map is specified in `huggingface_pipeline_kwargs`, it overrides this parameter. 2055 - `token`: The token to use as HTTP bearer authorization for remote files. 
2056 If the token is specified in `huggingface_pipeline_kwargs`, this parameter is ignored. 2057 - `generation_kwargs`: A dictionary with keyword arguments to customize text generation. 2058 Some examples: `max_length`, `max_new_tokens`, `temperature`, `top_k`, `top_p`. 2059 See Hugging Face's documentation for more information: 2060 - [customize-text-generation](https://huggingface.co/docs/transformers/main/en/generation_strategies#customize-text-generation) 2061 - [transformers.GenerationConfig](https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.GenerationConfig) 2062 - `huggingface_pipeline_kwargs`: Dictionary with keyword arguments to initialize the 2063 Hugging Face pipeline for text generation. 2064 These keyword arguments provide fine-grained control over the Hugging Face pipeline. 2065 In case of duplication, these kwargs override `model`, `task`, `device`, and `token` init parameters. 2066 For available kwargs, see [Hugging Face documentation](https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.pipeline.task). 2067 In this dictionary, you can also include `model_kwargs` to specify the kwargs for model initialization: 2068 [transformers.PreTrainedModel.from_pretrained](https://huggingface.co/docs/transformers/en/main_classes/model#transformers.PreTrainedModel.from_pretrained) 2069 - `stop_words`: If the model generates a stop word, the generation stops. 2070 If you provide this parameter, don't specify the `stopping_criteria` in `generation_kwargs`. 2071 For some chat models, the output includes both the new text and the original prompt. 2072 In these cases, make sure your prompt has no stop words. 2073 - `streaming_callback`: An optional callable for handling streaming responses. 2074 2075 <a id="hugging_face_local.HuggingFaceLocalGenerator.warm_up"></a> 2076 2077 #### HuggingFaceLocalGenerator.warm\_up 2078 2079 ```python 2080 def warm_up() 2081 ``` 2082 2083 Initializes the component. 
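The `stop_words` parameter documented in `__init__` above halts generation when a stop word is produced (internally the component builds on transformers stopping criteria, per the `stopping_criteria` note). As an illustration only, the observable effect resembles truncating the output at the first stop word:

```python
def truncate_at_stop_word(text: str, stop_words: list[str]) -> str:
    # Cut the generated text at the earliest occurrence of any stop word.
    cut = len(text)
    for word in stop_words:
        idx = text.find(word)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


generated = "Flan-T5 is an encoder-decoder model. Question: what else?"
assert truncate_at_stop_word(generated, ["Question:"]) == (
    "Flan-T5 is an encoder-decoder model. "
)
```

This is also why the docstring warns against stop words that appear in the prompt: for models that echo the prompt, the cut would land inside it.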
2084 2085 <a id="hugging_face_local.HuggingFaceLocalGenerator.to_dict"></a> 2086 2087 #### HuggingFaceLocalGenerator.to\_dict 2088 2089 ```python 2090 def to_dict() -> dict[str, Any] 2091 ``` 2092 2093 Serializes the component to a dictionary. 2094 2095 **Returns**: 2096 2097 Dictionary with serialized data. 2098 2099 <a id="hugging_face_local.HuggingFaceLocalGenerator.from_dict"></a> 2100 2101 #### HuggingFaceLocalGenerator.from\_dict 2102 2103 ```python 2104 @classmethod 2105 def from_dict(cls, data: dict[str, Any]) -> "HuggingFaceLocalGenerator" 2106 ``` 2107 2108 Deserializes the component from a dictionary. 2109 2110 **Arguments**: 2111 2112 - `data`: The dictionary to deserialize from. 2113 2114 **Returns**: 2115 2116 The deserialized component. 2117 2118 <a id="hugging_face_local.HuggingFaceLocalGenerator.run"></a> 2119 2120 #### HuggingFaceLocalGenerator.run 2121 2122 ```python 2123 @component.output_types(replies=list[str]) 2124 def run(prompt: str, 2125 streaming_callback: StreamingCallbackT | None = None, 2126 generation_kwargs: dict[str, Any] | None = None) 2127 ``` 2128 2129 Run the text generation model on the given prompt. 2130 2131 **Arguments**: 2132 2133 - `prompt`: A string representing the prompt. 2134 - `streaming_callback`: A callback function that is called when a new token is received from the stream. 2135 - `generation_kwargs`: Additional keyword arguments for text generation. 2136 2137 **Returns**: 2138 2139 A dictionary containing the generated replies. 2140 - replies: A list of strings representing the generated replies. 2141 2142 <a id="openai"></a> 2143 2144 ## Module openai 2145 2146 <a id="openai.OpenAIGenerator"></a> 2147 2148 ### OpenAIGenerator 2149 2150 Generates text using OpenAI's large language models (LLMs). 2151 2152 It works with the gpt-4 and gpt-5 series models and supports streaming responses 2153 from OpenAI API. It uses strings as input and output. 
You can customize how the text is generated by passing parameters to the
OpenAI API. Use the `**generation_kwargs` argument when you initialize
the component or when you run it. Any parameter that works with
`openai.ChatCompletion.create` will work here too.

For details on OpenAI API parameters, see
[OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).

### Usage example

```python
from haystack.components.generators import OpenAIGenerator

client = OpenAIGenerator()
response = client.run("What's Natural Language Processing? Be brief.")
print(response)

>> {'replies': ['Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on
>> the interaction between computers and human language. It involves enabling computers to understand, interpret,
>> and respond to natural human language in a way that is both meaningful and useful.'], 'meta': [{'model':
>> 'gpt-5-mini', 'index': 0, 'finish_reason': 'stop', 'usage': {'prompt_tokens': 16,
>> 'completion_tokens': 49, 'total_tokens': 65}}]}
```

<a id="openai.OpenAIGenerator.__init__"></a>

#### OpenAIGenerator.\_\_init\_\_

```python
def __init__(api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
             model: str = "gpt-5-mini",
             streaming_callback: StreamingCallbackT | None = None,
             api_base_url: str | None = None,
             organization: str | None = None,
             system_prompt: str | None = None,
             generation_kwargs: dict[str, Any] | None = None,
             timeout: float | None = None,
             max_retries: int | None = None,
             http_client_kwargs: dict[str, Any] | None = None)
```

Creates an instance of OpenAIGenerator.
Unless specified otherwise in `model`, uses OpenAI's gpt-5-mini model.

By setting the `OPENAI_TIMEOUT` and `OPENAI_MAX_RETRIES` environment variables, you can change the timeout and max_retries parameters
in the OpenAI client.

**Arguments**:

- `api_key`: The OpenAI API key to connect to OpenAI.
- `model`: The name of the model to use.
- `streaming_callback`: A callback function that is called when a new token is received from the stream.
The callback function accepts StreamingChunk as an argument.
- `api_base_url`: An optional base URL.
- `organization`: The Organization ID, defaults to `None`.
- `system_prompt`: The system prompt to use for text generation. If not provided, the system prompt is
omitted, and the default system prompt of the model is used.
- `generation_kwargs`: Other parameters to use for the model. These parameters are all sent directly to
the OpenAI endpoint. See the OpenAI [documentation](https://platform.openai.com/docs/api-reference/chat) for
more details.
Some of the supported parameters:
    - `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion,
    including visible output tokens and reasoning tokens.
    - `temperature`: What sampling temperature to use. Higher values mean the model takes more risks.
    Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
    - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
    considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
    comprising the top 10% probability mass are considered.
    - `n`: How many completions to generate for each prompt. For example, if the LLM gets 3 prompts and n is 2,
    it generates two completions for each of the three prompts, ending up with 6 completions in total.
    - `stop`: One or more sequences after which the LLM should stop generating tokens.
    - `presence_penalty`: The penalty to apply if a token is already present in the text. Higher values mean
    the model is less likely to repeat the same token.
    - `frequency_penalty`: The penalty to apply if a token has already been generated in the text.
    Higher values mean the model is less likely to repeat the same token.
    - `logit_bias`: Adds a logit bias to specific tokens. The keys of the dictionary are tokens, and the
    values are the bias to add to that token.
- `timeout`: Timeout for OpenAI client calls. If not set, it is inferred from the `OPENAI_TIMEOUT` environment variable
or set to 30.
- `max_retries`: Maximum number of retries to contact OpenAI after an internal error. If not set, it is inferred
from the `OPENAI_MAX_RETRIES` environment variable or set to 5.
- `http_client_kwargs`: A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/).

<a id="openai.OpenAIGenerator.to_dict"></a>

#### OpenAIGenerator.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns**:

The serialized component as a dictionary.

<a id="openai.OpenAIGenerator.from_dict"></a>

#### OpenAIGenerator.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "OpenAIGenerator"
```

Deserialize this component from a dictionary.

**Arguments**:

- `data`: The dictionary representation of this component.

**Returns**:

The deserialized component instance.
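The `to_dict`/`from_dict` pair follows a simple round-trip contract: serializing a component and deserializing the result yields an equivalent instance. A minimal sketch with a hypothetical `ToyComponent` (standard library only, not Haystack code; the `init_parameters` key mirrors the shape shown above, as an assumption for illustration):

```python
from typing import Any


class ToyComponent:
    """Hypothetical stand-in for a component, illustrating the round-trip contract."""

    def __init__(self, model: str = "gpt-5-mini", temperature: float = 0.7):
        self.model = model
        self.temperature = temperature

    def to_dict(self) -> dict[str, Any]:
        # Record the init parameters under an "init_parameters" key.
        return {
            "type": "ToyComponent",
            "init_parameters": {"model": self.model, "temperature": self.temperature},
        }

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "ToyComponent":
        # Rebuild the instance from the serialized init parameters.
        return cls(**data["init_parameters"])


restored = ToyComponent.from_dict(ToyComponent(model="gpt-4").to_dict())
```

This is what lets pipelines that contain generators be saved to YAML and loaded back without re-specifying every parameter.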
<a id="openai.OpenAIGenerator.run"></a>

#### OpenAIGenerator.run

```python
@component.output_types(replies=list[str], meta=list[dict[str, Any]])
def run(
    prompt: str,
    system_prompt: str | None = None,
    streaming_callback: StreamingCallbackT | None = None,
    generation_kwargs: dict[str, Any] | None = None
) -> dict[str, list[str] | list[dict[str, Any]]]
```

Invokes the text generation inference based on the provided prompt and generation parameters.

**Arguments**:

- `prompt`: The string prompt to use for text generation.
- `system_prompt`: The system prompt to use for text generation. If this runtime system prompt is omitted, the system
prompt defined at initialization time, if any, is used.
- `streaming_callback`: A callback function that is called when a new token is received from the stream.
- `generation_kwargs`: Additional keyword arguments for text generation. These parameters potentially override the parameters
passed in the `__init__` method. For more details on the parameters supported by the OpenAI API, refer to
the OpenAI [documentation](https://platform.openai.com/docs/api-reference/chat/create).

**Returns**:

A dictionary with a list of strings containing the generated responses and a list of dictionaries containing the metadata
for each response.

<a id="openai_dalle"></a>

## Module openai\_dalle

<a id="openai_dalle.DALLEImageGenerator"></a>

### DALLEImageGenerator

Generates images using OpenAI's DALL-E model.

For details on OpenAI API parameters, see
[OpenAI documentation](https://platform.openai.com/docs/api-reference/images/create).
### Usage example

```python
from haystack.components.generators import DALLEImageGenerator

image_generator = DALLEImageGenerator()
response = image_generator.run("Show me a picture of a black cat.")
print(response)
```

<a id="openai_dalle.DALLEImageGenerator.__init__"></a>

#### DALLEImageGenerator.\_\_init\_\_

```python
def __init__(model: str = "dall-e-3",
             quality: Literal["standard", "hd"] = "standard",
             size: Literal["256x256", "512x512", "1024x1024", "1792x1024",
                           "1024x1792"] = "1024x1024",
             response_format: Literal["url", "b64_json"] = "url",
             api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
             api_base_url: str | None = None,
             organization: str | None = None,
             timeout: float | None = None,
             max_retries: int | None = None,
             http_client_kwargs: dict[str, Any] | None = None)
```

Creates an instance of DALLEImageGenerator. Unless specified otherwise in `model`, uses OpenAI's dall-e-3.

**Arguments**:

- `model`: The model to use for image generation. Can be "dall-e-2" or "dall-e-3".
- `quality`: The quality of the generated image. Can be "standard" or "hd".
- `size`: The size of the generated images.
Must be one of 256x256, 512x512, or 1024x1024 for dall-e-2.
Must be one of 1024x1024, 1792x1024, or 1024x1792 for dall-e-3 models.
- `response_format`: The format of the response. Can be "url" or "b64_json".
- `api_key`: The OpenAI API key to connect to OpenAI.
- `api_base_url`: An optional base URL.
- `organization`: The Organization ID, defaults to `None`.
- `timeout`: Timeout for OpenAI client calls. If not set, it is inferred from the `OPENAI_TIMEOUT` environment variable
or set to 30.
- `max_retries`: Maximum retries to establish contact with OpenAI if it returns an internal error.
If not set, it is inferred
from the `OPENAI_MAX_RETRIES` environment variable or set to 5.
- `http_client_kwargs`: A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/).

<a id="openai_dalle.DALLEImageGenerator.warm_up"></a>

#### DALLEImageGenerator.warm\_up

```python
def warm_up() -> None
```

Warm up the OpenAI client.

<a id="openai_dalle.DALLEImageGenerator.run"></a>

#### DALLEImageGenerator.run

```python
@component.output_types(images=list[str], revised_prompt=str)
def run(prompt: str,
        size: Literal["256x256", "512x512", "1024x1024", "1792x1024",
                      "1024x1792"] | None = None,
        quality: Literal["standard", "hd"] | None = None,
        response_format: Literal["url", "b64_json"] | None = None)
```

Invokes the image generation inference based on the provided prompt and generation parameters.

**Arguments**:

- `prompt`: The prompt to generate the image.
- `size`: If provided, overrides the size provided during initialization.
- `quality`: If provided, overrides the quality provided during initialization.
- `response_format`: If provided, overrides the response format provided during initialization.

**Returns**:

A dictionary containing the generated list of images and the revised prompt.
Depending on the `response_format` parameter, the list of images can be URLs or base64-encoded JSON strings.
The revised prompt is the prompt that was used to generate the image, if OpenAI made any revision
to the prompt.

<a id="openai_dalle.DALLEImageGenerator.to_dict"></a>

#### DALLEImageGenerator.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.
**Returns**:

The serialized component as a dictionary.

<a id="openai_dalle.DALLEImageGenerator.from_dict"></a>

#### DALLEImageGenerator.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "DALLEImageGenerator"
```

Deserialize this component from a dictionary.

**Arguments**:

- `data`: The dictionary representation of this component.

**Returns**:

The deserialized component instance.

<a id="utils"></a>

## Module utils

<a id="utils.print_streaming_chunk"></a>

#### print\_streaming\_chunk

```python
def print_streaming_chunk(chunk: StreamingChunk) -> None
```

Callback function to handle and display streaming output chunks.

This function processes a `StreamingChunk` object by:
- Printing tool call metadata (if any), including function names and arguments, as they arrive.
- Printing tool call results when available.
- Printing the main content (for example, text tokens) of the chunk as it is received.

The function outputs data directly to stdout and flushes output buffers to ensure immediate display during
streaming.

**Arguments**:

- `chunk`: A chunk of streaming data containing content and optional metadata, such as tool calls and
tool results.
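A custom streaming callback follows the same shape as `print_streaming_chunk`: a callable that receives each chunk as it arrives. A minimal sketch that collects content instead of printing it (the `FakeChunk` class is a hypothetical stand-in modeling only the `content` field, not Haystack's actual `StreamingChunk`):

```python
from dataclasses import dataclass


@dataclass
class FakeChunk:
    # Hypothetical stand-in for StreamingChunk; only `content` is modeled here.
    content: str


collected: list[str] = []


def collect_chunk(chunk: FakeChunk) -> None:
    # Same call shape as print_streaming_chunk, but accumulates instead of printing.
    collected.append(chunk.content)


for piece in ("Natural ", "Language ", "Processing"):
    collect_chunk(FakeChunk(content=piece))

print("".join(collected))  # prints: Natural Language Processing
```

Pass such a callable as `streaming_callback` when you initialize or run a generator to capture tokens, forward them to a UI, or log them as they stream in.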