---
title: "Generators"
id: generators-api
description: "Enables text generation using LLMs."
slug: "/generators-api"
---

## azure

### AzureOpenAIGenerator

Bases: <code>OpenAIGenerator</code>

Generates text using OpenAI's large language models (LLMs).

It works with gpt-4 and similar models and supports streaming responses
from the OpenAI API.

You can customize how the text is generated by passing parameters to the
OpenAI API. Use the `**generation_kwargs` argument when you initialize
the component or when you run it. Any parameter that works with
`openai.ChatCompletion.create` will work here too.

For details on OpenAI API parameters, see
[OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).

### Usage example

<!-- test-ignore -->

```python
from haystack.components.generators import AzureOpenAIGenerator
from haystack.utils import Secret
client = AzureOpenAIGenerator(
    azure_endpoint="<Your Azure endpoint, e.g. https://your-company.azure.openai.com/>",
    api_key=Secret.from_token("<your-api-key>"),
    azure_deployment="<the model name, e.g. gpt-4.1-mini>")
response = client.run("What's Natural Language Processing? Be brief.")
print(response)
```

```
# >> {'replies': ['Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on
# >> the interaction between computers and human language. It involves enabling computers to understand, interpret,
# >> and respond to natural human language in a way that is both meaningful and useful.'], 'meta': [{'model':
# >> 'gpt-4.1-mini', 'index': 0, 'finish_reason': 'stop', 'usage': {'prompt_tokens': 16,
# >> 'completion_tokens': 49, 'total_tokens': 65}}]}
```

#### __init__

```python
__init__(
    azure_endpoint: str | None = None,
    api_version: str | None = "2024-12-01-preview",
    azure_deployment: str | None = "gpt-4.1-mini",
    api_key: Secret | None = Secret.from_env_var(
        "AZURE_OPENAI_API_KEY", strict=False
    ),
    azure_ad_token: Secret | None = Secret.from_env_var(
        "AZURE_OPENAI_AD_TOKEN", strict=False
    ),
    organization: str | None = None,
    streaming_callback: StreamingCallbackT | None = None,
    system_prompt: str | None = None,
    timeout: float | None = None,
    max_retries: int | None = None,
    http_client_kwargs: dict[str, Any] | None = None,
    generation_kwargs: dict[str, Any] | None = None,
    default_headers: dict[str, str] | None = None,
    *,
    azure_ad_token_provider: AzureADTokenProvider | None = None
) -> None
```

Initialize the Azure OpenAI Generator.

**Parameters:**

- **azure_endpoint** (<code>str | None</code>) – The endpoint of the deployed model, for example `https://example-resource.azure.openai.com/`.
- **api_version** (<code>str | None</code>) – The version of the API to use. Defaults to 2024-12-01-preview.
- **azure_deployment** (<code>str | None</code>) – The deployment of the model, usually the model name.
- **api_key** (<code>Secret | None</code>) – The API key to use for authentication.
- **azure_ad_token** (<code>Secret | None</code>) – [Azure Active Directory token](https://www.microsoft.com/en-us/security/business/identity-access/microsoft-entra-id).
- **organization** (<code>str | None</code>) – Your organization ID. Defaults to `None`. For help, see
  [Setting up your organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function called when a new token is received from the stream.
  It accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
  as an argument.
- **system_prompt** (<code>str | None</code>) – The system prompt to use for text generation. If not provided, the system
  prompt is omitted and the model's default behavior is used.
- **timeout** (<code>float | None</code>) – Timeout for the AzureOpenAI client. If not set, it is inferred from the
  `OPENAI_TIMEOUT` environment variable or set to 30.
- **max_retries** (<code>int | None</code>) – Maximum number of retries to contact AzureOpenAI after an internal error.
  If not set, it is inferred from the `OPENAI_MAX_RETRIES` environment variable or set to 5.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
  For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model, sent directly to
  the OpenAI endpoint. See the [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat) for
  more details.
  Some of the supported parameters:
    - `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion,
      including visible output tokens and reasoning tokens.
    - `temperature`: The sampling temperature to use. Higher values mean the model takes more risks.
      Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
    - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
      considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
      comprising the top 10% probability mass are considered.
    - `n`: The number of completions to generate for each prompt. For example, with 3 prompts and n=2,
      the LLM will generate two completions per prompt, resulting in 6 completions total.
    - `stop`: One or more sequences after which the LLM should stop generating tokens.
    - `presence_penalty`: The penalty applied if a token is already present.
      Higher values make the model less likely to repeat the token.
    - `frequency_penalty`: The penalty applied if a token has already been generated.
      Higher values make the model less likely to repeat the token.
    - `logit_bias`: Adds a logit bias to specific tokens. The keys of the dictionary are tokens, and the
      values are the bias to add to that token.
- **default_headers** (<code>dict\[str, str\] | None</code>) – Default headers to use for the AzureOpenAI client.
- **azure_ad_token_provider** (<code>AzureADTokenProvider | None</code>) – A function that returns an Azure Active Directory token; it is invoked on
  every request.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – The serialized component as a dictionary.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> AzureOpenAIGenerator
```

Deserialize this component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.

**Returns:**

- <code>AzureOpenAIGenerator</code> – The deserialized component instance.
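The `top_p` option described above is plain nucleus sampling, which can be illustrated without any API call. The sketch below is standalone Python, not Haystack or OpenAI code: it keeps the smallest set of highest-probability tokens whose cumulative mass reaches `top_p`, then samples from that set.

```python
import random

def nucleus_sample(probs: dict, top_p: float, rng: random.Random) -> str:
    """Sample a token from the smallest set of highest-probability tokens
    whose cumulative probability mass reaches top_p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, mass = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        mass += p
        if mass >= top_p:
            break  # the nucleus is complete; drop the low-probability tail
    tokens, weights = zip(*kept)
    return rng.choices(tokens, weights=weights, k=1)[0]

probs = {"cat": 0.55, "dog": 0.30, "fish": 0.10, "newt": 0.05}
# With top_p=0.5 only "cat" survives the cutoff, so it is always chosen.
print(nucleus_sample(probs, top_p=0.5, rng=random.Random(0)))  # cat
```

Low `top_p` values shrink the nucleus toward the single most likely token, which is why they make the output more deterministic.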
## chat/azure

### AzureOpenAIChatGenerator

Bases: <code>OpenAIChatGenerator</code>

Generates text using OpenAI's models on Azure.

It works with gpt-4 and similar models and supports streaming responses
from the OpenAI API. It uses [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
format in input and output.

You can customize how the text is generated by passing parameters to the
OpenAI API. Use the `**generation_kwargs` argument when you initialize
the component or when you run it. Any parameter that works with
`openai.ChatCompletion.create` will work here too.

For details on OpenAI API parameters, see
[OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).

### Usage example

<!-- test-ignore -->

```python
from haystack.components.generators.chat import AzureOpenAIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret

messages = [ChatMessage.from_user("What's Natural Language Processing?")]

client = AzureOpenAIChatGenerator(
    azure_endpoint="<Your Azure endpoint, e.g. https://your-company.azure.openai.com/>",
    api_key=Secret.from_token("<your-api-key>"),
    azure_deployment="<the model name, e.g. gpt-4.1-mini>")
response = client.run(messages)
print(response)
```

```
{'replies':
  [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=[TextContent(text=
  "Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on
  enabling computers to understand, interpret, and generate human language in a way that is useful.")],
  _name=None,
  _meta={'model': 'gpt-4.1-mini', 'index': 0, 'finish_reason': 'stop',
  'usage': {'prompt_tokens': 15, 'completion_tokens': 36, 'total_tokens': 51}})]
}
```

#### SUPPORTED_MODELS

```python
SUPPORTED_MODELS: list[str] = [
    "gpt-5.4",
    "gpt-5.4-pro",
    "gpt-5.3-codex",
    "gpt-5.2",
    "gpt-5.2-codex",
    "gpt-5.2-chat",
    "gpt-5.1",
    "gpt-5.1-chat",
    "gpt-5.1-codex",
    "gpt-5.1-codex-mini",
    "gpt-5",
    "gpt-5-mini",
    "gpt-5-nano",
    "gpt-5-chat",
    "gpt-4.1",
    "gpt-4.1-mini",
    "gpt-4.1-nano",
    "gpt-4o",
    "gpt-4o-mini",
    "gpt-4o-audio-preview",
    "gpt-realtime-1.5",
    "gpt-audio-1.5",
    "o1",
    "o1-mini",
    "o3",
    "o3-mini",
    "o4-mini",
    "codex-mini",
    "gpt-4",
    "gpt-35-turbo",
    "gpt-oss-120b",
    "computer-use-preview",
]
```

A non-exhaustive list of chat models supported by this component.
See https://learn.microsoft.com/en-us/azure/foundry/foundry-models/concepts/models-sold-directly-by-azure
for the full list.
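Streaming, mentioned in the component description above, delivers the reply incrementally through `streaming_callback`. The callback pattern can be sketched without contacting any API; `fake_stream` below is a hypothetical stand-in for the model, and real Haystack callbacks receive `StreamingChunk` objects rather than plain strings:

```python
chunks_seen = []

def on_chunk(chunk_text: str) -> None:
    # Collect each piece as it arrives and echo it immediately,
    # which is the typical use of a streaming callback.
    chunks_seen.append(chunk_text)
    print(chunk_text, end="", flush=True)

def fake_stream(callback) -> str:
    # Stand-in for a streaming model response: emits tokens one by one
    # and returns the assembled reply at the end.
    reply = ""
    for token in ["NLP ", "is ", "fun."]:
        callback(token)
        reply += token
    return reply

reply = fake_stream(on_chunk)
print()  # newline after the streamed output
```

The component invokes the callback once per received chunk, so anything from simple printing to progress reporting can hook in here.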
#### __init__

```python
__init__(
    azure_endpoint: str | None = None,
    api_version: str | None = "2024-12-01-preview",
    azure_deployment: str | None = "gpt-4.1-mini",
    api_key: Secret | None = Secret.from_env_var(
        "AZURE_OPENAI_API_KEY", strict=False
    ),
    azure_ad_token: Secret | None = Secret.from_env_var(
        "AZURE_OPENAI_AD_TOKEN", strict=False
    ),
    organization: str | None = None,
    streaming_callback: StreamingCallbackT | None = None,
    timeout: float | None = None,
    max_retries: int | None = None,
    generation_kwargs: dict[str, Any] | None = None,
    default_headers: dict[str, str] | None = None,
    tools: ToolsType | None = None,
    tools_strict: bool = False,
    *,
    azure_ad_token_provider: (
        AzureADTokenProvider | AsyncAzureADTokenProvider | None
    ) = None,
    http_client_kwargs: dict[str, Any] | None = None
) -> None
```

Initialize the Azure OpenAI Chat Generator component.

**Parameters:**

- **azure_endpoint** (<code>str | None</code>) – The endpoint of the deployed model, for example `"https://example-resource.azure.openai.com/"`.
- **api_version** (<code>str | None</code>) – The version of the API to use. Defaults to 2024-12-01-preview.
- **azure_deployment** (<code>str | None</code>) – The deployment of the model, usually the model name.
- **api_key** (<code>Secret | None</code>) – The API key to use for authentication.
- **azure_ad_token** (<code>Secret | None</code>) – [Azure Active Directory token](https://www.microsoft.com/en-us/security/business/identity-access/microsoft-entra-id).
- **organization** (<code>str | None</code>) – Your organization ID. Defaults to `None`. For help, see
  [Setting up your organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function called when a new token is received from the stream.
  It accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
  as an argument.
- **timeout** (<code>float | None</code>) – Timeout for OpenAI client calls. If not set, it defaults to the
  `OPENAI_TIMEOUT` environment variable or 30 seconds.
- **max_retries** (<code>int | None</code>) – Maximum number of retries to contact OpenAI after an internal error.
  If not set, it defaults to the `OPENAI_MAX_RETRIES` environment variable or 5.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model. These parameters are sent directly to
  the OpenAI endpoint. For details, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).
  Some of the supported parameters:
    - `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion,
      including visible output tokens and reasoning tokens.
    - `temperature`: The sampling temperature to use. Higher values mean the model takes more risks.
      Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
    - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model considers
      the tokens comprising the top_p probability mass. For example, 0.1 means only the tokens comprising
      the top 10% probability mass are considered.
    - `n`: The number of completions to generate for each prompt. For example, with 3 prompts and n=2,
      the LLM will generate two completions per prompt, resulting in 6 completions total.
    - `stop`: One or more sequences after which the LLM should stop generating tokens.
    - `presence_penalty`: The penalty applied if a token is already present.
      Higher values make the model less likely to repeat the token.
    - `frequency_penalty`: The penalty applied if a token has already been generated.
      Higher values make the model less likely to repeat the token.
    - `logit_bias`: Adds a logit bias to specific tokens. The keys of the dictionary are tokens, and the
      values are the bias to add to that token.
    - `response_format`: A JSON schema or a Pydantic model that enforces the structure of the model's response.
      If provided, the output will always be validated against this
      format (unless the model returns a tool call).
      For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
      Notes:
        - This parameter accepts Pydantic models and JSON schemas for the latest models, starting from GPT-4o.
          Older models only support a basic version of structured outputs through `{"type": "json_object"}`.
          For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
        - For structured outputs with streaming,
          the `response_format` must be a JSON schema and not a Pydantic model.
- **default_headers** (<code>dict\[str, str\] | None</code>) – Default headers to use for the AzureOpenAI client.
- **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
- **tools_strict** (<code>bool</code>) – Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
  the schema provided in the `parameters` field of the tool definition, but this may increase latency.
- **azure_ad_token_provider** (<code>AzureADTokenProvider | AsyncAzureADTokenProvider | None</code>) – A function that returns an Azure Active Directory token; it is invoked on
  every request.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
  For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).

#### warm_up

```python
warm_up() -> None
```

Warm up the Azure OpenAI chat generator.

This warms up the tools registered in the chat generator.
The method is idempotent and warms up the tools only once.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – The serialized component as a dictionary.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> AzureOpenAIChatGenerator
```

Deserialize this component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.

**Returns:**

- <code>AzureOpenAIChatGenerator</code> – The deserialized component instance.

## chat/azure_responses

### AzureOpenAIResponsesChatGenerator

Bases: <code>OpenAIResponsesChatGenerator</code>

Completes chats using OpenAI's Responses API on Azure.

It works with the gpt-5 and o-series models and supports streaming responses
from the OpenAI API. It uses [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
format in input and output.

You can customize how the text is generated by passing parameters to the
OpenAI API. Use the `**generation_kwargs` argument when you initialize
the component or when you run it. Any parameter that works with
`openai.Responses.create` will work here too.

For details on OpenAI API parameters, see
[OpenAI documentation](https://platform.openai.com/docs/api-reference/responses).
### Usage example

<!-- test-ignore -->

```python
from haystack.components.generators.chat import AzureOpenAIResponsesChatGenerator
from haystack.dataclasses import ChatMessage

messages = [ChatMessage.from_user("What's Natural Language Processing?")]

client = AzureOpenAIResponsesChatGenerator(
    azure_endpoint="https://example-resource.azure.openai.com/",
    generation_kwargs={"reasoning": {"effort": "low", "summary": "auto"}}
)
response = client.run(messages)
print(response)
```

#### SUPPORTED_MODELS

```python
SUPPORTED_MODELS: list[str] = [
    "gpt-5.4-pro",
    "gpt-5.4",
    "gpt-5.3-chat",
    "gpt-5.3-codex",
    "gpt-5.2-codex",
    "gpt-5.2",
    "gpt-5.2-chat",
    "gpt-5.1-codex-max",
    "gpt-5.1",
    "gpt-5.1-chat",
    "gpt-5.1-codex",
    "gpt-5.1-codex-mini",
    "gpt-5-pro",
    "gpt-5-codex",
    "gpt-5",
    "gpt-5-mini",
    "gpt-5-nano",
    "gpt-5-chat",
    "gpt-4o",
    "gpt-4o-mini",
    "computer-use-preview",
    "gpt-4.1",
    "gpt-4.1-nano",
    "gpt-4.1-mini",
    "gpt-image-1",
    "gpt-image-1-mini",
    "gpt-image-1.5",
    "o1",
    "o3-mini",
    "o3",
    "o4-mini",
]
```

A non-exhaustive list of chat models supported by this component.
See https://learn.microsoft.com/en-us/azure/foundry/openai/how-to/responses#model-support for the full list.
#### __init__

```python
__init__(
    *,
    api_key: (
        Secret | Callable[[], str] | Callable[[], Awaitable[str]]
    ) = Secret.from_env_var("AZURE_OPENAI_API_KEY", strict=False),
    azure_endpoint: str | None = None,
    azure_deployment: str = "gpt-5-mini",
    streaming_callback: StreamingCallbackT | None = None,
    organization: str | None = None,
    generation_kwargs: dict[str, Any] | None = None,
    timeout: float | None = None,
    max_retries: int | None = None,
    tools: ToolsType | None = None,
    tools_strict: bool = False,
    http_client_kwargs: dict[str, Any] | None = None
) -> None
```

Initialize the AzureOpenAIResponsesChatGenerator component.

**Parameters:**

- **api_key** (<code>Secret | Callable\[[], str\] | Callable\[[], Awaitable\[str\]\]</code>) – The API key to use for authentication. Can be:
    - A `Secret` object containing the API key.
    - A `Secret` object containing the [Azure Active Directory token](https://www.microsoft.com/en-us/security/business/identity-access/microsoft-entra-id).
    - A function that returns an Azure Active Directory token.
- **azure_endpoint** (<code>str | None</code>) – The endpoint of the deployed model, for example `"https://example-resource.azure.openai.com/"`.
- **azure_deployment** (<code>str</code>) – The deployment of the model, usually the model name.
- **organization** (<code>str | None</code>) – Your organization ID. Defaults to `None`. For help, see
  [Setting up your organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function called when a new token is received from the stream.
  It accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
  as an argument.
- **timeout** (<code>float | None</code>) – Timeout for OpenAI client calls. If not set, it defaults to the
  `OPENAI_TIMEOUT` environment variable or 30 seconds.
- **max_retries** (<code>int | None</code>) – Maximum number of retries to contact OpenAI after an internal error.
  If not set, it defaults to the `OPENAI_MAX_RETRIES` environment variable or 5.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model. These parameters are sent
  directly to the OpenAI endpoint.
  See the OpenAI [documentation](https://platform.openai.com/docs/api-reference/responses) for
  more details.
  Some of the supported parameters:
    - `temperature`: What sampling temperature to use. Higher values like 0.8 will make the output more random,
      while lower values like 0.2 will make it more focused and deterministic.
    - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
      considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
      comprising the top 10% probability mass are considered.
    - `previous_response_id`: The ID of the previous response.
      Use this to create multi-turn conversations.
    - `text_format`: A Pydantic model that enforces the structure of the model's response.
      If provided, the output will always be validated against this
      format (unless the model returns a tool call).
      For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
    - `text`: A JSON schema that enforces the structure of the model's response.
      If provided, the output will always be validated against this
      format (unless the model returns a tool call).
      Notes:
        - Both JSON schemas and Pydantic models are supported for the latest models, starting from GPT-4o.
        - If both are provided, `text_format` takes precedence and the JSON schema passed to `text` is ignored.
        - Currently, this component doesn't support streaming for structured outputs.
        - Older models only support a basic version of structured outputs through `{"type": "json_object"}`.
          For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
    - `reasoning`: A dictionary of parameters for reasoning. For example:
        - `summary`: The summary of the reasoning.
        - `effort`: The level of effort to put into the reasoning. Can be `low`, `medium`, or `high`.
        - `generate_summary`: Whether to generate a summary of the reasoning.
      Note: OpenAI does not return the reasoning tokens, but you can view the summary if it is enabled.
      For details, see the [OpenAI Reasoning documentation](https://platform.openai.com/docs/guides/reasoning).
- **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
- **tools_strict** (<code>bool</code>) – Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
  the schema provided in the `parameters` field of the tool definition, but this may increase latency.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
  For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – The serialized component as a dictionary.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> AzureOpenAIResponsesChatGenerator
```

Deserialize this component from a dictionary.
**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.

**Returns:**

- <code>AzureOpenAIResponsesChatGenerator</code> – The deserialized component instance.

## chat/fallback

### FallbackChatGenerator

A chat generator wrapper that tries multiple chat generators sequentially.

It forwards all parameters transparently to the underlying chat generators and returns the first successful result.
It calls the chat generators in order and falls back on any exception a generator raises.
If all chat generators fail, it raises a `RuntimeError` with details.

Timeout enforcement is fully delegated to the underlying chat generators. The fallback mechanism only
works correctly if the underlying chat generators implement proper timeout handling and raise exceptions
when timeouts occur. For predictable latency guarantees, ensure your chat generators:

- Support a `timeout` parameter in their initialization
- Implement timeout as total wall-clock time (a shared deadline for both streaming and non-streaming calls)
- Raise timeout exceptions (e.g., `TimeoutError`, `asyncio.TimeoutError`, `httpx.TimeoutException`) when exceeded

Note: Most well-implemented chat generators (OpenAI, Anthropic, Cohere, etc.) support timeout parameters
with consistent semantics. For HTTP-based LLM providers, a single timeout value (e.g., `timeout=30`)
typically applies to all connection phases: connection setup, read, write, and pool. For streaming
responses, the read timeout is the maximum gap between chunks. For non-streaming responses, it is the time limit for
receiving the complete response.
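The sequential try-and-fail-over behavior described above can be sketched in plain Python. This is an illustration of the algorithm, not the component's actual implementation; the generators are modeled as simple callables:

```python
def run_with_fallback(generators, prompt: str) -> dict:
    """Try each generator in order; return the first success plus
    metadata about the attempts. Raise RuntimeError if all fail."""
    failures = []
    for index, generate in enumerate(generators):
        try:
            reply = generate(prompt)
        except Exception as exc:  # any exception triggers failover
            failures.append(str(exc))
            continue
        return {
            "reply": reply,
            "meta": {
                "successful_index": index,
                "total_attempts": index + 1,
                "failures": failures,
            },
        }
    raise RuntimeError(f"All generators failed: {failures}")

def flaky(prompt):
    raise TimeoutError("primary timed out")

def backup(prompt):
    return f"echo: {prompt}"

result = run_with_fallback([flaky, backup], "hi")
print(result["reply"])                     # echo: hi
print(result["meta"]["successful_index"])  # 1
```

The real component applies the same loop to `ChatGenerator.run()` calls and records the failing generators in the returned metadata.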
Failover is automatically triggered when a generator raises any exception, including:

- Timeout errors (if the generator implements and raises them)
- Rate limit errors (429)
- Authentication errors (401)
- Context length errors (400)
- Server errors (500+)
- Any other exception

#### __init__

```python
__init__(chat_generators: list[ChatGenerator]) -> None
```

Creates an instance of FallbackChatGenerator.

**Parameters:**

- **chat_generators** (<code>list\[ChatGenerator\]</code>) – A non-empty list of chat generator components to try in order.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize the component, including nested chat generators when they support serialization.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> FallbackChatGenerator
```

Rebuild the component from a serialized representation, restoring nested chat generators.

#### warm_up

```python
warm_up() -> None
```

Warm up all underlying chat generators.

This method calls warm_up() on each underlying generator that supports it.

#### run

```python
run(
    messages: list[ChatMessage],
    generation_kwargs: dict[str, Any] | None = None,
    tools: ToolsType | None = None,
    streaming_callback: StreamingCallbackT | None = None,
) -> dict[str, list[ChatMessage] | dict[str, Any]]
```

Execute chat generators sequentially until one succeeds.

**Parameters:**

- **messages** (<code>list\[ChatMessage\]</code>) – The conversation history as a list of ChatMessage instances.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Optional parameters for the chat generator (e.g., temperature, max_tokens).
- **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for function calling capabilities.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – Optional callable for handling streaming responses.

**Returns:**

- <code>dict\[str, list\[ChatMessage\] | dict\[str, Any\]\]</code> – A dictionary with:
    - `"replies"`: Generated ChatMessage instances from the first successful generator.
    - `"meta"`: Execution metadata including successful_chat_generator_index, successful_chat_generator_class,
      total_attempts, failed_chat_generators, plus any metadata from the successful generator.

**Raises:**

- <code>RuntimeError</code> – If all chat generators fail.

#### run_async

```python
run_async(
    messages: list[ChatMessage],
    generation_kwargs: dict[str, Any] | None = None,
    tools: ToolsType | None = None,
    streaming_callback: StreamingCallbackT | None = None,
) -> dict[str, list[ChatMessage] | dict[str, Any]]
```

Asynchronously execute chat generators sequentially until one succeeds.

**Parameters:**

- **messages** (<code>list\[ChatMessage\]</code>) – The conversation history as a list of ChatMessage instances.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Optional parameters for the chat generator (e.g., temperature, max_tokens).
- **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for function calling capabilities.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – Optional callable for handling streaming responses.

**Returns:**

- <code>dict\[str, list\[ChatMessage\] | dict\[str, Any\]\]</code> – A dictionary with:
    - `"replies"`: Generated ChatMessage instances from the first successful generator.
  - "meta": Execution metadata including successful_chat_generator_index, successful_chat_generator_class,
    total_attempts, failed_chat_generators, plus any metadata from the successful generator.

**Raises:**

- <code>RuntimeError</code> – If all chat generators fail.

## chat/hugging_face_api

### HuggingFaceAPIChatGenerator

Completes chats using Hugging Face APIs.

HuggingFaceAPIChatGenerator uses the [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
format for input and output. Use it to generate text with Hugging Face APIs:

- [Serverless Inference API (Inference Providers)](https://huggingface.co/docs/inference-providers)
- [Paid Inference Endpoints](https://huggingface.co/inference-endpoints)
- [Self-hosted Text Generation Inference](https://github.com/huggingface/text-generation-inference)

### Usage examples

#### With the serverless inference API (Inference Providers) - free tier available

<!-- test-ignore -->

```python
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret
from haystack.utils.hf import HFGenerationAPIType

messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
            ChatMessage.from_user("What's Natural Language Processing?")]

# the api_type can be expressed using the HFGenerationAPIType enum or as a string
api_type = HFGenerationAPIType.SERVERLESS_INFERENCE_API
api_type = "serverless_inference_api"  # this is equivalent to the above

generator = HuggingFaceAPIChatGenerator(api_type=api_type,
                                        api_params={"model": "Qwen/Qwen2.5-7B-Instruct",
                                                    "provider": "together"},
                                        token=Secret.from_token("<your-api-key>"))

result = generator.run(messages)
print(result)
```

#### With the serverless inference API (Inference Providers) and text+image input

<!-- test-ignore -->

```python
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage, ImageContent
from haystack.utils import Secret
from haystack.utils.hf import HFGenerationAPIType

# Create an image from file path, URL, or base64
image = ImageContent.from_file_path("path/to/your/image.jpg")

# Create a multimodal message with both text and image
messages = [ChatMessage.from_user(content_parts=["Describe this image in detail", image])]

generator = HuggingFaceAPIChatGenerator(
    api_type=HFGenerationAPIType.SERVERLESS_INFERENCE_API,
    api_params={
        "model": "Qwen/Qwen2.5-VL-7B-Instruct",  # Vision Language Model
        "provider": "hyperbolic"
    },
    token=Secret.from_token("<your-api-key>")
)

result = generator.run(messages)
print(result)
```

#### With paid inference endpoints

<!-- test-ignore -->

```python
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret

messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
            ChatMessage.from_user("What's Natural Language Processing?")]

generator = HuggingFaceAPIChatGenerator(api_type="inference_endpoints",
                                        api_params={"url": "<your-inference-endpoint-url>"},
                                        token=Secret.from_token("<your-api-key>"))

result = generator.run(messages)
print(result)
```

#### With self-hosted text generation inference

<!-- test-ignore -->

```python
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage

messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
            ChatMessage.from_user("What's Natural Language Processing?")]

generator = HuggingFaceAPIChatGenerator(api_type="text_generation_inference",
                                        api_params={"url": "http://localhost:8080"})

result = generator.run(messages)
print(result)
```

#### __init__

```python
__init__(
    api_type: HFGenerationAPIType | str,
    api_params: dict[str, str],
    token: Secret | None = Secret.from_env_var(
        ["HF_API_TOKEN", "HF_TOKEN"], strict=False
    ),
    generation_kwargs: dict[str, Any] | None = None,
    stop_words: list[str] | None = None,
    streaming_callback: StreamingCallbackT | None = None,
    tools: ToolsType | None = None,
) -> None
```

Initialize the HuggingFaceAPIChatGenerator instance.

**Parameters:**

- **api_type** (<code>HFGenerationAPIType | str</code>) – The type of Hugging Face API to use. Available types:
  - `text_generation_inference`: See [TGI](https://github.com/huggingface/text-generation-inference).
  - `inference_endpoints`: See [Inference Endpoints](https://huggingface.co/inference-endpoints).
  - `serverless_inference_api`: See
    [Serverless Inference API - Inference Providers](https://huggingface.co/docs/inference-providers).
- **api_params** (<code>dict\[str, str\]</code>) – A dictionary with the following keys:
  - `model`: Hugging Face model ID. Required when `api_type` is `SERVERLESS_INFERENCE_API`.
  - `provider`: Provider name. Recommended when `api_type` is `SERVERLESS_INFERENCE_API`.
  - `url`: URL of the inference endpoint. Required when `api_type` is `INFERENCE_ENDPOINTS` or
    `TEXT_GENERATION_INFERENCE`.
  - Other parameters specific to the chosen API type, such as `timeout`, `headers`, etc.
- **token** (<code>Secret | None</code>) – The Hugging Face token to use as HTTP bearer authorization.
  Check your HF token in your [account settings](https://huggingface.co/settings/tokens).
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary with keyword arguments to customize text generation.
  Some examples: `max_tokens`, `temperature`, `top_p`.
  For details, see [Hugging Face chat_completion documentation](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.chat_completion).
- **stop_words** (<code>list\[str\] | None</code>) – An optional list of strings representing the stop words.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses.
- **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
  The chosen model should support tool/function calling, according to the model card.
  Support for tools in the Hugging Face API and TGI is not yet fully refined and you may experience
  unexpected behavior.

#### warm_up

```python
warm_up() -> None
```

Warm up the Hugging Face API chat generator.

This will warm up the tools registered in the chat generator.
This method is idempotent and will only warm up the tools once.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – A dictionary containing the serialized component.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> HuggingFaceAPIChatGenerator
```

Deserialize this component from a dictionary.
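The `to_dict`/`from_dict` pair round-trips the component's configuration. A minimal, self-contained sketch of the pattern with a hypothetical stand-in class (not the actual Haystack implementation, which also handles secrets and callbacks):

```python
# Hypothetical stand-in illustrating the to_dict/from_dict round-trip pattern.
class TinyComponent:
    def __init__(self, model: str = "some-model", timeout: int = 30):
        self.model = model
        self.timeout = timeout

    def to_dict(self) -> dict:
        # Record the class identifier and the init parameters needed to rebuild it.
        return {
            "type": "TinyComponent",
            "init_parameters": {"model": self.model, "timeout": self.timeout},
        }

    @classmethod
    def from_dict(cls, data: dict) -> "TinyComponent":
        return cls(**data["init_parameters"])


original = TinyComponent(model="Qwen/Qwen2.5-7B-Instruct", timeout=10)
restored = TinyComponent.from_dict(original.to_dict())
assert restored.model == original.model and restored.timeout == original.timeout
```

Serializing to a plain dictionary like this is what makes components embeddable in YAML pipeline definitions.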
#### run

```python
run(
    messages: list[ChatMessage],
    generation_kwargs: dict[str, Any] | None = None,
    tools: ToolsType | None = None,
    streaming_callback: StreamingCallbackT | None = None,
) -> dict[str, list[ChatMessage]]
```

Invoke the text generation inference based on the provided messages and generation parameters.

**Parameters:**

- **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage objects representing the input messages.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation.
- **tools** (<code>ToolsType | None</code>) – A list of tools or a Toolset for which the model can prepare calls. If set, it will override
  the `tools` parameter set during component initialization. This parameter can accept either a
  list of `Tool` objects or a `Toolset` instance.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses. If set, it will override the `streaming_callback`
  parameter set during component initialization.

**Returns:**

- <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following keys:
  - `replies`: A list containing the generated responses as ChatMessage objects.

#### run_async

```python
run_async(
    messages: list[ChatMessage],
    generation_kwargs: dict[str, Any] | None = None,
    tools: ToolsType | None = None,
    streaming_callback: StreamingCallbackT | None = None,
) -> dict[str, list[ChatMessage]]
```

Asynchronously invokes the text generation inference based on the provided messages and generation parameters.

This is the asynchronous version of the `run` method. It has the same parameters
and return values but can be used with `await` in async code.
**Parameters:**

- **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage objects representing the input messages.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation.
- **tools** (<code>ToolsType | None</code>) – A list of tools or a Toolset for which the model can prepare calls. If set, it will override the `tools`
  parameter set during component initialization. This parameter can accept either a list of `Tool` objects
  or a `Toolset` instance.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses. If set, it will override the `streaming_callback`
  parameter set during component initialization.

**Returns:**

- <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following keys:
  - `replies`: A list containing the generated responses as ChatMessage objects.

## chat/hugging_face_local

### default_tool_parser

```python
default_tool_parser(text: str) -> list[ToolCall] | None
```

Default implementation for parsing tool calls from model output text.

Uses DEFAULT_TOOL_PATTERN to extract tool calls.

**Parameters:**

- **text** (<code>str</code>) – The text to parse for tool calls.

**Returns:**

- <code>list\[ToolCall\] | None</code> – A list containing a single ToolCall if a valid tool call is found, None otherwise.

### HuggingFaceLocalChatGenerator

Generates chat responses using models from Hugging Face that run locally.

Use this component with chat-based models,
such as `Qwen/Qwen3-0.6B` or `meta-llama/Llama-2-7b-chat-hf`.
LLMs running locally may need powerful hardware.
### Usage example

<!-- test-ignore -->

```python
from haystack.components.generators.chat import HuggingFaceLocalChatGenerator
from haystack.dataclasses import ChatMessage

generator = HuggingFaceLocalChatGenerator(model="Qwen/Qwen3-0.6B")
messages = [ChatMessage.from_user("What's Natural Language Processing? Be brief.")]
print(generator.run(messages))
```

```
{'replies':
    [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=[TextContent(text=
    "Natural Language Processing (NLP) is a subfield of artificial intelligence that deals
    with the interaction between computers and human language. It enables computers to understand, interpret, and
    generate human language in a valuable way. NLP involves various techniques such as speech recognition, text
    analysis, sentiment analysis, and machine translation. The ultimate goal is to make it easier for computers to
    process and derive meaning from human language, improving communication between humans and machines.")],
    _name=None,
    _meta={'finish_reason': 'stop', 'index': 0, 'model': 'Qwen/Qwen3-0.6B',
    'usage': {'completion_tokens': 90, 'prompt_tokens': 19, 'total_tokens': 109}})
    ]
}
```

#### __init__

```python
__init__(
    model: str = "Qwen/Qwen3-0.6B",
    task: (
        Literal["text-generation", "text2text-generation", "image-text-to-text"]
        | None
    ) = None,
    device: ComponentDevice | None = None,
    token: Secret | None = Secret.from_env_var(
        ["HF_API_TOKEN", "HF_TOKEN"], strict=False
    ),
    chat_template: str | None = None,
    generation_kwargs: dict[str, Any] | None = None,
    huggingface_pipeline_kwargs: dict[str, Any] | None = None,
    stop_words: list[str] | None = None,
    streaming_callback: StreamingCallbackT | None = None,
    tools: ToolsType | None = None,
    tool_parsing_function: Callable[[str], list[ToolCall] | None] | None = None,
    async_executor: ThreadPoolExecutor | None = None,
    *,
    enable_thinking: bool = False
) -> None
```

Initializes the HuggingFaceLocalChatGenerator component.

**Parameters:**

- **model** (<code>str</code>) – The Hugging Face text generation model name or path,
  for example, `mistralai/Mistral-7B-Instruct-v0.2` or `TheBloke/OpenHermes-2.5-Mistral-7B-16k-AWQ`.
  The model must be a chat model supporting the ChatML messaging format.
  If the model is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
- **task** (<code>Literal['text-generation', 'text2text-generation', 'image-text-to-text'] | None</code>) – The task for the Hugging Face pipeline. Possible options:
  - `text-generation`: Supported by decoder models, like GPT.
  - `text2text-generation`: Deprecated as of Transformers v5; use `text-generation` instead.
    Previously supported by encoder–decoder models such as T5.
  - `image-text-to-text`: Supported by vision-language models.
  If the task is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
  If not specified, the component calls the Hugging Face API to infer the task from the model name.
- **device** (<code>ComponentDevice | None</code>) – The device for loading the model. If `None`, automatically selects the default device.
  If a device or device map is specified in `huggingface_pipeline_kwargs`, it overrides this parameter.
- **token** (<code>Secret | None</code>) – The token to use as HTTP bearer authorization for remote files.
  If the token is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
- **chat_template** (<code>str | None</code>) – Specifies an optional Jinja template for formatting chat
  messages. Most high-quality chat models have their own templates, but for models without this
  feature or if you prefer a custom template, use this parameter.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary with keyword arguments to customize text generation.
  Some examples: `max_length`, `max_new_tokens`, `temperature`, `top_k`, `top_p`.
  See Hugging Face's documentation for more information:
  - [customize-text-generation](https://huggingface.co/docs/transformers/main/en/generation_strategies#customize-text-generation)
  - [GenerationConfig](https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.GenerationConfig)
  The only `generation_kwargs` set by default is `max_new_tokens`, which is set to 512 tokens.
- **huggingface_pipeline_kwargs** (<code>dict\[str, Any\] | None</code>) – Dictionary with keyword arguments to initialize the
  Hugging Face pipeline for text generation.
  These keyword arguments provide fine-grained control over the Hugging Face pipeline.
  In case of duplication, these kwargs override `model`, `task`, `device`, and `token` init parameters.
  For kwargs, see [Hugging Face documentation](https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.pipeline.task).
  In this dictionary, you can also include `model_kwargs` to specify the kwargs for
  [model initialization](https://huggingface.co/docs/transformers/en/main_classes/model#transformers.PreTrainedModel.from_pretrained).
- **stop_words** (<code>list\[str\] | None</code>) – A list of stop words. If the model generates a stop word, the generation stops.
  If you provide this parameter, don't specify the `stopping_criteria` in `generation_kwargs`.
  For some chat models, the output includes both the new text and the original prompt.
  In these cases, make sure your prompt has no stop words.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses.
- **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
- **tool_parsing_function** (<code>Callable\[\[str\], list\[ToolCall\] | None\] | None</code>) – A callable that takes a string and returns a list of ToolCall objects or None.
  If None, the default_tool_parser will be used, which extracts tool calls using a predefined pattern.
- **async_executor** (<code>ThreadPoolExecutor | None</code>) – Optional ThreadPoolExecutor to use for async calls. If not provided, a single-threaded executor will be
  initialized and used.
- **enable_thinking** (<code>bool</code>) – Whether to enable thinking mode in the chat template for thinking-capable models.
  When enabled, the model generates intermediate reasoning before the final response. Defaults to False.

#### shutdown

```python
shutdown() -> None
```

Explicitly shut down the executor if this component owns it.

#### warm_up

```python
warm_up() -> None
```

Initializes the component and warms up tools if provided.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> HuggingFaceLocalChatGenerator
```

Deserializes the component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary to deserialize from.

**Returns:**

- <code>HuggingFaceLocalChatGenerator</code> – The deserialized component.
#### run

```python
run(
    messages: list[ChatMessage],
    generation_kwargs: dict[str, Any] | None = None,
    streaming_callback: StreamingCallbackT | None = None,
    tools: ToolsType | None = None,
) -> dict[str, list[ChatMessage]]
```

Invoke text generation inference based on the provided messages and generation parameters.

**Parameters:**

- **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage objects representing the input messages.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses.
- **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
  If set, it will override the `tools` parameter provided during initialization.

**Returns:**

- <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following keys:
  - `replies`: A list containing the generated responses as ChatMessage instances.

#### create_message

```python
create_message(
    text: str,
    index: int,
    tokenizer: Union[PreTrainedTokenizer, PreTrainedTokenizerFast],
    prompt: str,
    generation_kwargs: dict[str, Any],
    parse_tool_calls: bool = False,
) -> ChatMessage
```

Create a ChatMessage instance from the provided text, populated with metadata.

**Parameters:**

- **text** (<code>str</code>) – The generated text.
- **index** (<code>int</code>) – The index of the generated text.
- **tokenizer** (<code>Union\[PreTrainedTokenizer, PreTrainedTokenizerFast\]</code>) – The tokenizer used for generation.
- **prompt** (<code>str</code>) – The prompt used for generation.
- **generation_kwargs** (<code>dict\[str, Any\]</code>) – The generation parameters.
- **parse_tool_calls** (<code>bool</code>) – Whether to attempt parsing tool calls from the text.

**Returns:**

- <code>ChatMessage</code> – A ChatMessage instance.

#### run_async

```python
run_async(
    messages: list[ChatMessage],
    generation_kwargs: dict[str, Any] | None = None,
    streaming_callback: StreamingCallbackT | None = None,
    tools: ToolsType | None = None,
) -> dict[str, list[ChatMessage]]
```

Asynchronously invokes text generation inference based on the provided messages and generation parameters.

This is the asynchronous version of the `run` method. It has the same parameters
and return values but can be used with `await` in async code.

**Parameters:**

- **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage objects representing the input messages.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses.
- **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
  If set, it will override the `tools` parameter provided during initialization.

**Returns:**

- <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following keys:
  - `replies`: A list containing the generated responses as ChatMessage instances.

## chat/llm

### LLM

Bases: <code>Agent</code>

A text generation component powered by a large language model.

The LLM component is a simplified version of the Agent that focuses solely on text generation
without tool usage.
It processes messages and returns a single response from the language model.

### Usage examples

```python
from haystack.components.generators.chat import LLM
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage

llm = LLM(
    chat_generator=OpenAIChatGenerator(),
    system_prompt="You are a helpful summarization assistant.",
    user_prompt="""{% message role="user" %}
    Summarize the following document: {{ document }}
    {% endmessage %}""",
    required_variables=["document"],
)

result = llm.run(document="The weather is lovely today and the sun is shining. ")
print(result["last_message"].text)
```

#### __init__

```python
__init__(
    *,
    chat_generator: ChatGenerator,
    system_prompt: str | None = None,
    user_prompt: str | None = None,
    required_variables: list[str] | Literal["*"] | None = None,
    streaming_callback: StreamingCallbackT | None = None
) -> None
```

Initialize the LLM component.

**Parameters:**

- **chat_generator** (<code>ChatGenerator</code>) – An instance of the chat generator that the LLM should use.
- **system_prompt** (<code>str | None</code>) – System prompt for the LLM.
- **user_prompt** (<code>str | None</code>) – User prompt for the LLM. If provided, this is appended to the messages provided at runtime.
- **required_variables** (<code>list\[str\] | Literal['\*'] | None</code>) – List of variables that must be provided as input to user_prompt.
  If a variable listed as required is not provided, an exception is raised.
  If set to `"*"`, all variables found in the prompt are required. Optional.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback that will be invoked when a response is streamed from the LLM.
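The `required_variables` check can be sketched in a few lines of plain Python (a simplified stand-in assuming `{{ var }}`-style Jinja variables; the real validation lives inside Haystack's prompt handling):

```python
import re

# Simplified sketch of the required_variables behavior described above
# (an assumption for illustration, not the actual Haystack implementation).
def check_required_variables(prompt: str, required, provided: dict) -> None:
    found = set(re.findall(r"\{\{\s*(\w+)\s*\}\}", prompt))
    # "*" means every variable found in the prompt is required.
    required_names = found if required == "*" else set(required or [])
    missing = required_names - provided.keys()
    if missing:
        raise ValueError(f"Missing required prompt variables: {sorted(missing)}")


prompt = "Summarize the following document: {{ document }}"
check_required_variables(prompt, ["document"], {"document": "some text"})  # passes

try:
    check_required_variables(prompt, "*", {})
except ValueError as exc:
    assert "document" in str(exc)
```

Raising early on a missing variable is what turns a silent half-filled prompt into an explicit error at run time.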
#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize the LLM component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> LLM
```

Deserialize the LLM from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from.

**Returns:**

- <code>LLM</code> – Deserialized LLM instance.

#### run

```python
run(
    messages: list[ChatMessage] | None = None,
    streaming_callback: StreamingCallbackT | None = None,
    *,
    generation_kwargs: dict[str, Any] | None = None,
    system_prompt: str | None = None,
    user_prompt: str | None = None,
    **kwargs: Any
) -> dict[str, Any]
```

Process messages and generate a response from the language model.

**Parameters:**

- **messages** (<code>list\[ChatMessage\] | None</code>) – List of Haystack ChatMessage objects to process.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback that will be invoked when a response is streamed from the LLM.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for the underlying chat generator. These parameters
  will override the parameters passed during component initialization.
- **system_prompt** (<code>str | None</code>) – System prompt for the LLM. If provided, it overrides the default system prompt.
- **user_prompt** (<code>str | None</code>) – User prompt for the LLM. If provided, it overrides the default user prompt and is
  appended to the messages provided at runtime.
- **kwargs** (<code>Any</code>) – Additional keyword arguments.
  These are used to fill template variables in the `user_prompt`
  (the keys must match template variable names).

**Returns:**

- <code>dict\[str, Any\]</code> – A dictionary with the following keys:
  - "messages": List of all messages exchanged during the LLM's run.
  - "last_message": The last message exchanged during the LLM's run.

#### run_async

```python
run_async(
    messages: list[ChatMessage] | None = None,
    streaming_callback: StreamingCallbackT | None = None,
    *,
    generation_kwargs: dict[str, Any] | None = None,
    system_prompt: str | None = None,
    user_prompt: str | None = None,
    **kwargs: Any
) -> dict[str, Any]
```

Asynchronously process messages and generate a response from the language model.

**Parameters:**

- **messages** (<code>list\[ChatMessage\] | None</code>) – List of Haystack ChatMessage objects to process.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – An asynchronous callback that will be invoked when a response is streamed
  from the LLM.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for the underlying chat generator. These parameters
  will override the parameters passed during component initialization.
- **system_prompt** (<code>str | None</code>) – System prompt for the LLM. If provided, it overrides the default system prompt.
- **user_prompt** (<code>str | None</code>) – User prompt for the LLM. If provided, it overrides the default user prompt and is
  appended to the messages provided at runtime.
- **kwargs** (<code>Any</code>) – Additional keyword arguments. These are used to fill template variables in the `user_prompt`
  (the keys must match template variable names).
**Returns:**

- <code>dict\[str, Any\]</code> – A dictionary with the following keys:
  - "messages": List of all messages exchanged during the LLM's run.
  - "last_message": The last message exchanged during the LLM's run.

## chat/openai

### OpenAIChatGenerator

Completes chats using OpenAI's large language models (LLMs).

It works with the gpt-4 and gpt-5 series models and supports streaming responses
from the OpenAI API. It uses the [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
format for input and output.

You can customize how the text is generated by passing parameters to the
OpenAI API. Use the `**generation_kwargs` argument when you initialize
the component or when you run it. Any parameter that works with
`openai.ChatCompletion.create` will work here too.

For details on OpenAI API parameters, see
[OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).
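Because `generation_kwargs` can be supplied both at initialization and at run time, the two sets are merged with the run-time values taking precedence key by key. A sketch of that assumed merge behavior (not the component's actual code):

```python
# Sketch of the assumed generation_kwargs precedence: kwargs passed to run()
# override the ones passed at initialization, key by key.
init_kwargs = {"temperature": 0.7, "max_completion_tokens": 256}
run_kwargs = {"temperature": 0.2}

effective = {**init_kwargs, **(run_kwargs or {})}
assert effective == {"temperature": 0.2, "max_completion_tokens": 256}
```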
### Usage example

```python
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage

messages = [ChatMessage.from_user("What's Natural Language Processing?")]

client = OpenAIChatGenerator()
response = client.run(messages)
print(response)
```

Output:

```
{'replies':
    [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=
    [TextContent(text="Natural Language Processing (NLP) is a branch of artificial intelligence
    that focuses on enabling computers to understand, interpret, and generate human language in
    a way that is meaningful and useful.")],
    _name=None,
    _meta={'model': 'gpt-5-mini', 'index': 0, 'finish_reason': 'stop',
    'usage': {'prompt_tokens': 15, 'completion_tokens': 36, 'total_tokens': 51}})
    ]
}
```

#### SUPPORTED_MODELS

```python
SUPPORTED_MODELS: list[str] = [
    "gpt-5-mini",
    "gpt-5-nano",
    "gpt-5",
    "gpt-5.1",
    "gpt-5.2",
    "gpt-5.2-pro",
    "gpt-5.4",
    "gpt-5-pro",
    "gpt-4.1",
    "gpt-4.1-mini",
    "gpt-4.1-nano",
    "gpt-4o",
    "gpt-4o-mini",
    "gpt-4-turbo",
    "gpt-4",
    "gpt-3.5-turbo",
]
```

A non-exhaustive list of chat models supported by this component.
See https://developers.openai.com/api/docs/models for the full list and snapshot IDs.
#### __init__

```python
__init__(
    api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
    model: str = "gpt-5-mini",
    streaming_callback: StreamingCallbackT | None = None,
    api_base_url: str | None = None,
    organization: str | None = None,
    generation_kwargs: dict[str, Any] | None = None,
    timeout: float | None = None,
    max_retries: int | None = None,
    tools: ToolsType | None = None,
    tools_strict: bool = False,
    http_client_kwargs: dict[str, Any] | None = None,
) -> None
```

Creates an instance of OpenAIChatGenerator. Unless specified otherwise in `model`, uses OpenAI's gpt-5-mini.

Before initializing the component, you can set the 'OPENAI_TIMEOUT' and 'OPENAI_MAX_RETRIES'
environment variables to override the `timeout` and `max_retries` parameters respectively
in the OpenAI client.

**Parameters:**

- **api_key** (<code>Secret</code>) – The OpenAI API key.
  You can set it with an environment variable `OPENAI_API_KEY`, or pass it with this parameter
  during initialization.
- **model** (<code>str</code>) – The name of the model to use.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
  The callback function accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
  as an argument.
- **api_base_url** (<code>str | None</code>) – An optional base URL.
- **organization** (<code>str | None</code>) – Your organization ID, defaults to `None`. See
  [production best practices](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model. These parameters are sent directly to
  the OpenAI endpoint.
  See OpenAI [documentation](https://platform.openai.com/docs/api-reference/chat) for
  more details.
  Some of the supported parameters:
  - `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion,
    including visible output tokens and reasoning tokens.
  - `temperature`: What sampling temperature to use. Higher values mean the model will take more risks.
    Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
  - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
    considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
    comprising the top 10% probability mass are considered.
  - `n`: How many completions to generate for each prompt. For example, if the LLM gets 3 prompts and n is 2,
    it will generate two completions for each of the three prompts, ending up with 6 completions in total.
  - `stop`: One or more sequences after which the LLM should stop generating tokens.
  - `presence_penalty`: The penalty to apply if a token has already appeared in the text at all. Bigger values mean
    the model will be less likely to repeat the same token in the text.
  - `frequency_penalty`: The penalty to apply to a token based on how often it has already appeared in the text.
    Bigger values mean the model will be less likely to repeat the same token.
  - `logit_bias`: Adds a logit bias to specific tokens. The keys of the dictionary are tokens, and the
    values are the bias to add to that token.
  - `response_format`: A JSON schema or a Pydantic model that enforces the structure of the model's response.
    If provided, the output will always be validated against this
    format (unless the model returns a tool call).
    For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
1472 Notes: 1473 - This parameter accepts Pydantic models and JSON schemas for latest models starting from GPT-4o. 1474 Older models only support basic version of structured outputs through `{"type": "json_object"}`. 1475 For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode). 1476 - For structured outputs with streaming, 1477 the `response_format` must be a JSON schema and not a Pydantic model. 1478 - **timeout** (<code>float | None</code>) – Timeout for OpenAI client calls. If not set, it defaults to either the 1479 `OPENAI_TIMEOUT` environment variable, or 30 seconds. 1480 - **max_retries** (<code>int | None</code>) – Maximum number of retries to contact OpenAI after an internal error. 1481 If not set, it defaults to either the `OPENAI_MAX_RETRIES` environment variable, or set to 5. 1482 - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls. 1483 - **tools_strict** (<code>bool</code>) – Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly 1484 the schema provided in the `parameters` field of the tool definition, but this may increase latency. 1485 - **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client`or `httpx.AsyncClient`. 1486 For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client). 1487 1488 #### warm_up 1489 1490 ```python 1491 warm_up() -> None 1492 ``` 1493 1494 Warm up the OpenAI chat generator. 1495 1496 This will warm up the tools registered in the chat generator. 1497 This method is idempotent and will only warm up the tools once. 1498 1499 #### to_dict 1500 1501 ```python 1502 to_dict() -> dict[str, Any] 1503 ``` 1504 1505 Serialize this component to a dictionary. 
**Returns:**

- <code>dict\[str, Any\]</code> – The serialized component as a dictionary.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> OpenAIChatGenerator
```

Deserialize this component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.

**Returns:**

- <code>OpenAIChatGenerator</code> – The deserialized component instance.

#### run

```python
run(
    messages: list[ChatMessage],
    streaming_callback: StreamingCallbackT | None = None,
    generation_kwargs: dict[str, Any] | None = None,
    *,
    tools: ToolsType | None = None,
    tools_strict: bool | None = None
) -> dict[str, list[ChatMessage]]
```

Invokes chat completion based on the provided messages and generation parameters.

**Parameters:**

- **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage instances representing the input messages.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. These parameters will
  override the parameters passed during component initialization.
  For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat/create).
- **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset, for which the model can prepare calls.
  If set, it will override the `tools` parameter provided during initialization.
- **tools_strict** (<code>bool | None</code>) – Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
  the schema provided in the `parameters` field of the tool definition, but this may increase latency.
  If set, it will override the `tools_strict` parameter set during component initialization.

**Returns:**

- <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following key:
  - `replies`: A list containing the generated responses as ChatMessage instances.

#### run_async

```python
run_async(
    messages: list[ChatMessage],
    streaming_callback: StreamingCallbackT | None = None,
    generation_kwargs: dict[str, Any] | None = None,
    *,
    tools: ToolsType | None = None,
    tools_strict: bool | None = None
) -> dict[str, list[ChatMessage]]
```

Asynchronously invokes chat completion based on the provided messages and generation parameters.

This is the asynchronous version of the `run` method. It has the same parameters and return values
but can be used with `await` in async code.

**Parameters:**

- **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage instances representing the input messages.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
  Must be a coroutine.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. These parameters will
  override the parameters passed during component initialization.
  For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat/create).
- **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset, for which the model can prepare calls.
  If set, it will override the `tools` parameter provided during initialization.
- **tools_strict** (<code>bool | None</code>) – Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
  the schema provided in the `parameters` field of the tool definition, but this may increase latency.
  If set, it will override the `tools_strict` parameter set during component initialization.

**Returns:**

- <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following key:
  - `replies`: A list containing the generated responses as ChatMessage instances.

## chat/openai_responses

### OpenAIResponsesChatGenerator

Completes chats using OpenAI's Responses API.

It works with the gpt-4 and o-series models and supports streaming responses
from the OpenAI API. It uses the [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
format for input and output.

You can customize how the text is generated by passing parameters to the
OpenAI API. Use the `**generation_kwargs` argument when you initialize
the component or when you run it. Any parameter that works with
`openai.Responses.create` will work here too.

For details on OpenAI API parameters, see
[OpenAI documentation](https://platform.openai.com/docs/api-reference/responses).
### Usage example

```python
from haystack.components.generators.chat import OpenAIResponsesChatGenerator
from haystack.dataclasses import ChatMessage

messages = [ChatMessage.from_user("What's Natural Language Processing?")]

client = OpenAIResponsesChatGenerator(generation_kwargs={"reasoning": {"effort": "low", "summary": "auto"}})
response = client.run(messages)
print(response)
```

#### SUPPORTED_MODELS

```python
SUPPORTED_MODELS: list[str] = [
    "gpt-5-mini",
    "gpt-5-nano",
    "gpt-5",
    "gpt-5.1",
    "gpt-5.2",
    "gpt-5.2-pro",
    "gpt-5.4",
    "gpt-5-pro",
    "gpt-4.1",
    "gpt-4.1-mini",
    "gpt-4.1-nano",
    "gpt-4o",
    "gpt-4o-mini",
    "o1",
    "o1-mini",
    "o1-pro",
    "o3",
    "o3-mini",
    "o3-pro",
    "o4-mini",
]
```

A non-exhaustive list of chat models supported by this component.
See https://platform.openai.com/docs/models for the full list and snapshot IDs.

#### __init__

```python
__init__(
    *,
    api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
    model: str = "gpt-5-mini",
    streaming_callback: StreamingCallbackT | None = None,
    api_base_url: str | None = None,
    organization: str | None = None,
    generation_kwargs: dict[str, Any] | None = None,
    timeout: float | None = None,
    max_retries: int | None = None,
    tools: ToolsType | list[dict] | None = None,
    tools_strict: bool = False,
    http_client_kwargs: dict[str, Any] | None = None
) -> None
```

Creates an instance of OpenAIResponsesChatGenerator. Uses OpenAI's gpt-5-mini by default.

Before initializing the component, you can set the `OPENAI_TIMEOUT` and `OPENAI_MAX_RETRIES`
environment variables to override the `timeout` and `max_retries` parameters respectively
in the OpenAI client.

**Parameters:**

- **api_key** (<code>Secret</code>) – The OpenAI API key.
  You can set it with the `OPENAI_API_KEY` environment variable or pass it with this parameter
  during initialization.
- **model** (<code>str</code>) – The name of the model to use.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
  The callback function accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
  as an argument.
- **api_base_url** (<code>str | None</code>) – An optional base URL.
- **organization** (<code>str | None</code>) – Your organization ID, defaults to `None`. See
  [production best practices](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model. These parameters are sent
  directly to the OpenAI endpoint.
  See the OpenAI [documentation](https://platform.openai.com/docs/api-reference/responses) for
  more details.
  Some of the supported parameters:
  - `temperature`: What sampling temperature to use. Higher values like 0.8 will make the output more random,
    while lower values like 0.2 will make it more focused and deterministic.
  - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
    considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
    comprising the top 10% probability mass are considered.
  - `previous_response_id`: The ID of the previous response.
    Use this to create multi-turn conversations.
  - `text_format`: A Pydantic model that enforces the structure of the model's response.
    If provided, the output will always be validated against this format (unless the model returns a tool call).
    For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
  - `text`: A JSON schema that enforces the structure of the model's response.
    If provided, the output will always be validated against this format (unless the model returns a tool call).
    Notes:
    - Both JSON schemas and Pydantic models are supported for the latest models, starting from GPT-4o.
    - If both are provided, `text_format` takes precedence and the JSON schema passed to `text` is ignored.
    - Currently, this component doesn't support streaming for structured outputs.
    - Older models only support a basic version of structured outputs through `{"type": "json_object"}`.
      For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
  - `reasoning`: A dictionary of parameters for reasoning. For example:
    - `summary`: The summary of the reasoning.
    - `effort`: The level of effort to put into the reasoning. Can be `low`, `medium`, or `high`.
    - `generate_summary`: Whether to generate a summary of the reasoning.
    Note: OpenAI does not return the reasoning tokens, but you can view the summary if it is enabled.
    For details, see the [OpenAI Reasoning documentation](https://platform.openai.com/docs/guides/reasoning).
- **timeout** (<code>float | None</code>) – Timeout for OpenAI client calls. If not set, it defaults to either the
  `OPENAI_TIMEOUT` environment variable or 30 seconds.
- **max_retries** (<code>int | None</code>) – Maximum number of retries to contact OpenAI after an internal error.
  If not set, it defaults to either the `OPENAI_MAX_RETRIES` environment variable or 5.
- **tools** (<code>ToolsType | list\[dict\] | None</code>) – The tools that the model can use to prepare calls. This parameter accepts either a
  mixed list of Haystack `Tool` objects and Haystack `Toolset`s, or a list of OpenAI/MCP tool definition
  dictionaries.
  Note: You cannot pass OpenAI/MCP tools and Haystack tools together.
  For details on tool support, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create#responses-create-tools).
- **tools_strict** (<code>bool</code>) – Whether to enable strict schema adherence for tool calls. If set to `False`, the model may not exactly
  follow the schema provided in the `parameters` field of the tool definition. In the Responses API, tool calls
  are strict by default.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
  For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).

#### warm_up

```python
warm_up() -> None
```

Warm up the OpenAI responses chat generator.

This will warm up the tools registered in the chat generator.
This method is idempotent and will only warm up the tools once.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – The serialized component as a dictionary.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> OpenAIResponsesChatGenerator
```

Deserialize this component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.

**Returns:**

- <code>OpenAIResponsesChatGenerator</code> – The deserialized component instance.
#### run

```python
run(
    messages: list[ChatMessage],
    *,
    streaming_callback: StreamingCallbackT | None = None,
    generation_kwargs: dict[str, Any] | None = None,
    tools: ToolsType | list[dict] | None = None,
    tools_strict: bool | None = None
) -> dict[str, list[ChatMessage]]
```

Invokes response generation based on the provided messages and generation parameters.

**Parameters:**

- **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage instances representing the input messages.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. These parameters will
  override the parameters passed during component initialization.
  For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create).
- **tools** (<code>ToolsType | list\[dict\] | None</code>) – The tools that the model can use to prepare calls. If set, it will override the
  `tools` parameter set during component initialization. This parameter accepts either a
  mixed list of Haystack `Tool` objects and Haystack `Toolset`s, or a list of OpenAI/MCP tool definition
  dictionaries.
  Note: You cannot pass OpenAI/MCP tools and Haystack tools together.
  For details on tool support, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create#responses-create-tools).
- **tools_strict** (<code>bool | None</code>) – Whether to enable strict schema adherence for tool calls. If set to `False`, the model may not exactly
  follow the schema provided in the `parameters` field of the tool definition. In the Responses API, tool calls
  are strict by default.
  If set, it will override the `tools_strict` parameter set during component initialization.

**Returns:**

- <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following key:
  - `replies`: A list containing the generated responses as ChatMessage instances.

#### run_async

```python
run_async(
    messages: list[ChatMessage],
    *,
    streaming_callback: StreamingCallbackT | None = None,
    generation_kwargs: dict[str, Any] | None = None,
    tools: ToolsType | list[dict] | None = None,
    tools_strict: bool | None = None
) -> dict[str, list[ChatMessage]]
```

Asynchronously invokes response generation based on the provided messages and generation parameters.

This is the asynchronous version of the `run` method. It has the same parameters and return values
but can be used with `await` in async code.

**Parameters:**

- **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage instances representing the input messages.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
  Must be a coroutine.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. These parameters will
  override the parameters passed during component initialization.
  For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create).
- **tools** (<code>ToolsType | list\[dict\] | None</code>) – The tools that the model can use to prepare calls. If set, it will override the
  `tools` parameter set during component initialization. This parameter accepts either a
  mixed list of Haystack `Tool` objects and Haystack `Toolset`s, or a list of OpenAI/MCP tool definition
  dictionaries.
  Note: You cannot pass OpenAI/MCP tools and Haystack tools together.
- **tools_strict** (<code>bool | None</code>) – Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
  the schema provided in the `parameters` field of the tool definition, but this may increase latency.
  If set, it will override the `tools_strict` parameter set during component initialization.

**Returns:**

- <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following key:
  - `replies`: A list containing the generated responses as ChatMessage instances.

## hugging_face_api

### HuggingFaceAPIGenerator

Generates text using Hugging Face APIs.

Use it with the following Hugging Face APIs:

- [Paid Inference Endpoints](https://huggingface.co/inference-endpoints)
- [Self-hosted Text Generation Inference](https://github.com/huggingface/text-generation-inference)

**Note:** As of July 2025, the Hugging Face Inference API no longer offers generative models through the
`text_generation` endpoint. Generative models are now only available through providers supporting the
`chat_completion` endpoint. As a result, this component might no longer work with the Hugging Face Inference API.
Use the `HuggingFaceAPIChatGenerator` component, which supports the `chat_completion` endpoint.
### Usage examples

#### With Hugging Face Inference Endpoints

<!-- test-ignore -->

```python
from haystack.components.generators import HuggingFaceAPIGenerator
from haystack.utils import Secret

generator = HuggingFaceAPIGenerator(api_type="inference_endpoints",
                                    api_params={"url": "<your-inference-endpoint-url>"},
                                    token=Secret.from_token("<your-api-key>"))

result = generator.run(prompt="What's Natural Language Processing?")
print(result)
```

#### With self-hosted text generation inference

<!-- test-ignore -->

```python
from haystack.components.generators import HuggingFaceAPIGenerator

generator = HuggingFaceAPIGenerator(api_type="text_generation_inference",
                                    api_params={"url": "http://localhost:8080"})

result = generator.run(prompt="What's Natural Language Processing?")
print(result)
```

#### With the free serverless inference API

Be aware that this example might not work, as the Hugging Face Inference API no longer offers models that support
the `text_generation` endpoint. Use the `HuggingFaceAPIChatGenerator` for generative models through the
`chat_completion` endpoint.

<!-- test-ignore -->

```python
from haystack.components.generators import HuggingFaceAPIGenerator
from haystack.utils import Secret

generator = HuggingFaceAPIGenerator(api_type="serverless_inference_api",
                                    api_params={"model": "HuggingFaceH4/zephyr-7b-beta"},
                                    token=Secret.from_token("<your-api-key>"))

result = generator.run(prompt="What's Natural Language Processing?")
print(result)
```

#### __init__

```python
__init__(
    api_type: HFGenerationAPIType | str,
    api_params: dict[str, str],
    token: Secret | None = Secret.from_env_var(
        ["HF_API_TOKEN", "HF_TOKEN"], strict=False
    ),
    generation_kwargs: dict[str, Any] | None = None,
    stop_words: list[str] | None = None,
    streaming_callback: StreamingCallbackT | None = None,
) -> None
```

Initialize the HuggingFaceAPIGenerator instance.

**Parameters:**

- **api_type** (<code>HFGenerationAPIType | str</code>) – The type of Hugging Face API to use. Available types:
  - `text_generation_inference`: See [TGI](https://github.com/huggingface/text-generation-inference).
  - `inference_endpoints`: See [Inference Endpoints](https://huggingface.co/inference-endpoints).
  - `serverless_inference_api`: See [Serverless Inference API](https://huggingface.co/inference-api).
    This might no longer work due to changes in the models offered in the Hugging Face Inference API.
    Please use the `HuggingFaceAPIChatGenerator` component instead.
- **api_params** (<code>dict\[str, str\]</code>) – A dictionary with the following keys:
  - `model`: Hugging Face model ID. Required when `api_type` is `SERVERLESS_INFERENCE_API`.
  - `url`: URL of the inference endpoint. Required when `api_type` is `INFERENCE_ENDPOINTS` or
    `TEXT_GENERATION_INFERENCE`.
  - Other parameters specific to the chosen API type, such as `timeout`, `headers`, and `provider`.
- **token** (<code>Secret | None</code>) – The Hugging Face token to use as HTTP bearer authorization.
  Check your HF token in your [account settings](https://huggingface.co/settings/tokens).
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary with keyword arguments to customize text generation. Some examples: `max_new_tokens`,
  `temperature`, `top_k`, `top_p`.
  See the [Hugging Face documentation](https://huggingface.co/docs/huggingface_hub/en/package_reference/inference_client#huggingface_hub.InferenceClient.text_generation)
  for more details.
- **stop_words** (<code>list\[str\] | None</code>) – An optional list of strings representing the stop words.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – A dictionary containing the serialized component.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> HuggingFaceAPIGenerator
```

Deserialize this component from a dictionary.

#### run

```python
run(
    prompt: str,
    streaming_callback: StreamingCallbackT | None = None,
    generation_kwargs: dict[str, Any] | None = None,
) -> dict[str, Any]
```

Invoke the text generation inference for the given prompt and generation parameters.

**Parameters:**

- **prompt** (<code>str</code>) – A string representing the prompt.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation.

**Returns:**

- <code>dict\[str, Any\]</code> – A dictionary with the generated replies and metadata. Both are lists of length n.
  - `replies`: A list of strings representing the generated replies.

## hugging_face_local

### HuggingFaceLocalGenerator

Generates text using models from Hugging Face that run locally.

LLMs running locally may need powerful hardware.

### Usage example

```python
from haystack.components.generators import HuggingFaceLocalGenerator

generator = HuggingFaceLocalGenerator(
    model="Qwen/Qwen3-0.6B",
    task="text-generation",
    generation_kwargs={"max_new_tokens": 100, "temperature": 0.9}
)

print(generator.run("Who is the best American actor?"))
# >> {'replies': ['John Cusack']}
```

#### __init__

```python
__init__(
    model: str = "Qwen/Qwen3-0.6B",
    task: Literal["text-generation", "text2text-generation"] | None = None,
    device: ComponentDevice | None = None,
    token: Secret | None = Secret.from_env_var(
        ["HF_API_TOKEN", "HF_TOKEN"], strict=False
    ),
    generation_kwargs: dict[str, Any] | None = None,
    huggingface_pipeline_kwargs: dict[str, Any] | None = None,
    stop_words: list[str] | None = None,
    streaming_callback: StreamingCallbackT | None = None,
) -> None
```

Creates an instance of a HuggingFaceLocalGenerator.

**Parameters:**

- **model** (<code>str</code>) – The Hugging Face text generation model name or path.
- **task** (<code>Literal['text-generation', 'text2text-generation'] | None</code>) – The task for the Hugging Face pipeline. Possible options:
  - `text-generation`: Supported by decoder models, like GPT.
  - `text2text-generation`: Deprecated as of Transformers v5; use `text-generation` instead.
    Previously supported by encoder–decoder models such as T5.
2058 If the task is specified in `huggingface_pipeline_kwargs`, this parameter is ignored. 2059 If not specified, the component calls the Hugging Face API to infer the task from the model name. 2060 - **device** (<code>ComponentDevice | None</code>) – The device for loading the model. If `None`, automatically selects the default device. 2061 If a device or device map is specified in `huggingface_pipeline_kwargs`, it overrides this parameter. 2062 - **token** (<code>Secret | None</code>) – The token to use as HTTP bearer authorization for remote files. 2063 If the token is specified in `huggingface_pipeline_kwargs`, this parameter is ignored. 2064 - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary with keyword arguments to customize text generation. 2065 Some examples: `max_length`, `max_new_tokens`, `temperature`, `top_k`, `top_p`. 2066 See Hugging Face's documentation for more information: 2067 - [customize-text-generation](https://huggingface.co/docs/transformers/main/en/generation_strategies#customize-text-generation) 2068 - [transformers.GenerationConfig](https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.GenerationConfig) 2069 - **huggingface_pipeline_kwargs** (<code>dict\[str, Any\] | None</code>) – Dictionary with keyword arguments to initialize the 2070 Hugging Face pipeline for text generation. 2071 These keyword arguments provide fine-grained control over the Hugging Face pipeline. 2072 In case of duplication, these kwargs override `model`, `task`, `device`, and `token` init parameters. 2073 For available kwargs, see [Hugging Face documentation](https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.pipeline.task). 
2074 In this dictionary, you can also include `model_kwargs` to specify the kwargs for model initialization: 2075 [transformers.PreTrainedModel.from_pretrained](https://huggingface.co/docs/transformers/en/main_classes/model#transformers.PreTrainedModel.from_pretrained) 2076 - **stop_words** (<code>list\[str\] | None</code>) – If the model generates a stop word, the generation stops. 2077 If you provide this parameter, don't specify the `stopping_criteria` in `generation_kwargs`. 2078 For some chat models, the output includes both the new text and the original prompt. 2079 In these cases, make sure your prompt has no stop words. 2080 - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses. 2081 2082 #### warm_up 2083 2084 ```python 2085 warm_up() -> None 2086 ``` 2087 2088 Initializes the component. 2089 2090 #### to_dict 2091 2092 ```python 2093 to_dict() -> dict[str, Any] 2094 ``` 2095 2096 Serializes the component to a dictionary. 2097 2098 **Returns:** 2099 2100 - <code>dict\[str, Any\]</code> – Dictionary with serialized data. 2101 2102 #### from_dict 2103 2104 ```python 2105 from_dict(data: dict[str, Any]) -> HuggingFaceLocalGenerator 2106 ``` 2107 2108 Deserializes the component from a dictionary. 2109 2110 **Parameters:** 2111 2112 - **data** (<code>dict\[str, Any\]</code>) – The dictionary to deserialize from. 2113 2114 **Returns:** 2115 2116 - <code>HuggingFaceLocalGenerator</code> – The deserialized component. 2117 2118 #### run 2119 2120 ```python 2121 run( 2122 prompt: str, 2123 streaming_callback: StreamingCallbackT | None = None, 2124 generation_kwargs: dict[str, Any] | None = None, 2125 ) -> dict[str, Any] 2126 ``` 2127 2128 Run the text generation model on the given prompt. 2129 2130 **Parameters:** 2131 2132 - **prompt** (<code>str</code>) – A string representing the prompt. 
2133 - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream. 2134 - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. 2135 2136 **Returns:** 2137 2138 - <code>dict\[str, Any\]</code> – A dictionary containing the generated replies. 2139 - replies: A list of strings representing the generated replies. 2140 2141 ## openai 2142 2143 ### OpenAIGenerator 2144 2145 Generates text using OpenAI's large language models (LLMs). 2146 2147 It works with the gpt-4 and gpt-5 series models and supports streaming responses 2148 from OpenAI API. It uses strings as input and output. 2149 2150 You can customize how the text is generated by passing parameters to the 2151 OpenAI API. Use the `**generation_kwargs` argument when you initialize 2152 the component or when you run it. Any parameter that works with 2153 `openai.ChatCompletion.create` will work here too. 2154 2155 For details on OpenAI API parameters, see 2156 [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat). 2157 2158 ### Usage example 2159 2160 ```python 2161 from haystack.components.generators import OpenAIGenerator 2162 client = OpenAIGenerator() 2163 response = client.run("What's Natural Language Processing? Be brief.") 2164 print(response) 2165 2166 # >> {'replies': ['Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on 2167 # >> the interaction between computers and human language. 
It involves enabling computers to understand, interpret, 2168 # >> and respond to natural human language in a way that is both meaningful and useful.'], 'meta': [{'model': 2169 # >> 'gpt-5-mini', 'index': 0, 'finish_reason': 'stop', 'usage': {'prompt_tokens': 16, 2170 # >> 'completion_tokens': 49, 'total_tokens': 65}}]} 2171 ``` 2172 2173 #### __init__ 2174 2175 ```python 2176 __init__( 2177 api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"), 2178 model: str = "gpt-5-mini", 2179 streaming_callback: StreamingCallbackT | None = None, 2180 api_base_url: str | None = None, 2181 organization: str | None = None, 2182 system_prompt: str | None = None, 2183 generation_kwargs: dict[str, Any] | None = None, 2184 timeout: float | None = None, 2185 max_retries: int | None = None, 2186 http_client_kwargs: dict[str, Any] | None = None, 2187 ) -> None 2188 ``` 2189 2190 Creates an instance of OpenAIGenerator. Unless specified otherwise in `model`, uses OpenAI's gpt-5-mini. 2191 2192 By setting the `OPENAI_TIMEOUT` and `OPENAI_MAX_RETRIES` environment variables, you can change the timeout and max_retries parameters 2193 in the OpenAI client. 2194 2195 **Parameters:** 2196 2197 - **api_key** (<code>Secret</code>) – The OpenAI API key to connect to OpenAI. 2198 - **model** (<code>str</code>) – The name of the model to use. 2199 - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream. 2200 The callback function accepts StreamingChunk as an argument. 2201 - **api_base_url** (<code>str | None</code>) – An optional base URL. 2202 - **organization** (<code>str | None</code>) – The Organization ID, defaults to `None`. 2203 - **system_prompt** (<code>str | None</code>) – The system prompt to use for text generation. If not provided, the system prompt is 2204 omitted, and the default system prompt of the model is used.
2205 - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model. These parameters are all sent directly to 2206 the OpenAI endpoint. See OpenAI [documentation](https://platform.openai.com/docs/api-reference/chat) for 2207 more details. 2208 Some of the supported parameters: 2209 - `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion, 2210 including visible output tokens and reasoning tokens. 2211 - `temperature`: The sampling temperature to use. Higher values mean the model takes more risks. 2212 Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer. 2213 - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model 2214 considers the results of the tokens with top_p probability mass. So, 0.1 means only the tokens 2215 comprising the top 10% probability mass are considered. 2216 - `n`: How many completions to generate for each prompt. For example, if the LLM gets 3 prompts and n is 2, 2217 it will generate two completions for each of the three prompts, ending up with 6 completions in total. 2218 - `stop`: One or more sequences after which the LLM should stop generating tokens. 2219 - `presence_penalty`: The penalty applied to a token that already appears in the text at least once. Higher values make 2220 the model less likely to repeat the same token in the text. 2221 - `frequency_penalty`: A penalty that scales with how often a token has already been generated in the text. 2222 Higher values make the model less likely to repeat the same token in the text. 2223 - `logit_bias`: Add a logit bias to specific tokens. The keys of the dictionary are token IDs, and the 2224 values are the bias to add to those tokens. 2225 - **timeout** (<code>float | None</code>) – Timeout for OpenAI client calls. If not set, it is inferred from the `OPENAI_TIMEOUT` environment variable 2226 or defaults to 30.
2227 - **max_retries** (<code>int | None</code>) – Maximum number of retries to contact OpenAI after an internal error. If not set, it is inferred 2228 from the `OPENAI_MAX_RETRIES` environment variable or defaults to 5. 2229 - **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`. 2230 For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client). 2231 2232 #### to_dict 2233 2234 ```python 2235 to_dict() -> dict[str, Any] 2236 ``` 2237 2238 Serialize this component to a dictionary. 2239 2240 **Returns:** 2241 2242 - <code>dict\[str, Any\]</code> – The serialized component as a dictionary. 2243 2244 #### from_dict 2245 2246 ```python 2247 from_dict(data: dict[str, Any]) -> OpenAIGenerator 2248 ``` 2249 2250 Deserialize this component from a dictionary. 2251 2252 **Parameters:** 2253 2254 - **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component. 2255 2256 **Returns:** 2257 2258 - <code>OpenAIGenerator</code> – The deserialized component instance. 2259 2260 #### run 2261 2262 ```python 2263 run( 2264 prompt: str, 2265 system_prompt: str | None = None, 2266 streaming_callback: StreamingCallbackT | None = None, 2267 generation_kwargs: dict[str, Any] | None = None, 2268 ) -> dict[str, list[str] | list[dict[str, Any]]] 2269 ``` 2270 2271 Invoke the text generation inference based on the provided messages and generation parameters. 2272 2273 **Parameters:** 2274 2275 - **prompt** (<code>str</code>) – The string prompt to use for text generation. 2276 - **system_prompt** (<code>str | None</code>) – The system prompt to use for text generation. If omitted at run time, the system 2277 prompt defined at initialization time, if any, is used.
2278 - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream. 2279 - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. These parameters take precedence over the parameters 2280 passed in the `__init__` method. For more details on the parameters supported by the OpenAI API, refer to 2281 the OpenAI [documentation](https://platform.openai.com/docs/api-reference/chat/create). 2282 2283 **Returns:** 2284 2285 - <code>dict\[str, list\[str\] | list\[dict\[str, Any\]\]\]</code> – A dictionary with `replies`, a list of strings containing the generated responses, and `meta`, a list of dictionaries containing the metadata 2286 for each response. 2287 2288 ## openai_dalle 2289 2290 ### DALLEImageGenerator 2291 2292 Generates images using OpenAI's DALL-E model. 2293 2294 For details on OpenAI API parameters, see 2295 [OpenAI documentation](https://platform.openai.com/docs/api-reference/images/create). 2296 2297 ### Usage example 2298 2299 ```python 2300 from haystack.components.generators import DALLEImageGenerator 2301 image_generator = DALLEImageGenerator() 2302 response = image_generator.run("Show me a picture of a black cat.") 2303 print(response) 2304 ``` 2305 2306 #### __init__ 2307 2308 ```python 2309 __init__( 2310 model: str = "dall-e-3", 2311 quality: Literal["standard", "hd"] = "standard", 2312 size: Literal[ 2313 "256x256", "512x512", "1024x1024", "1792x1024", "1024x1792" 2314 ] = "1024x1024", 2315 response_format: Literal["url", "b64_json"] = "url", 2316 api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"), 2317 api_base_url: str | None = None, 2318 organization: str | None = None, 2319 timeout: float | None = None, 2320 max_retries: int | None = None, 2321 http_client_kwargs: dict[str, Any] | None = None, 2322 ) -> None 2323 ``` 2324 2325 Creates an instance of DALLEImageGenerator.
Unless specified otherwise in `model`, uses OpenAI's dall-e-3. 2326 2327 **Parameters:** 2328 2329 - **model** (<code>str</code>) – The model to use for image generation. Can be "dall-e-2" or "dall-e-3". 2330 - **quality** (<code>Literal['standard', 'hd']</code>) – The quality of the generated image. Can be "standard" or "hd". 2331 - **size** (<code>Literal['256x256', '512x512', '1024x1024', '1792x1024', '1024x1792']</code>) – The size of the generated images. 2332 Must be one of 256x256, 512x512, or 1024x1024 for dall-e-2. 2333 Must be one of 1024x1024, 1792x1024, or 1024x1792 for dall-e-3 models. 2334 - **response_format** (<code>Literal['url', 'b64_json']</code>) – The format of the response. Can be "url" or "b64_json". 2335 - **api_key** (<code>Secret</code>) – The OpenAI API key to connect to OpenAI. 2336 - **api_base_url** (<code>str | None</code>) – An optional base URL. 2337 - **organization** (<code>str | None</code>) – The Organization ID, defaults to `None`. 2338 - **timeout** (<code>float | None</code>) – Timeout for OpenAI client calls. If not set, it is inferred from the `OPENAI_TIMEOUT` environment variable 2339 or defaults to 30. 2340 - **max_retries** (<code>int | None</code>) – Maximum number of retries to contact OpenAI after an internal error. If not set, it is inferred 2341 from the `OPENAI_MAX_RETRIES` environment variable or defaults to 5. 2342 - **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`. 2343 For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client). 2344 2345 #### warm_up 2346 2347 ```python 2348 warm_up() -> None 2349 ``` 2350 2351 Warm up the OpenAI client.
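The `run` parameters described next follow a simple precedence rule: a value passed at run time overrides the one set at initialization, and `None` means "fall back to the init-time default". A minimal, self-contained sketch of that resolution rule (the `resolve` helper is hypothetical, not the actual Haystack implementation):

```python
def resolve(run_value, init_value):
    """Return the run-time value when provided, else the init-time default."""
    return run_value if run_value is not None else init_value

# Init-time defaults, as in DALLEImageGenerator(size="1024x1024", quality="standard")
init_size, init_quality = "1024x1024", "standard"

# A run() call that overrides only the quality:
size = resolve(None, init_size)        # size stays "1024x1024"
quality = resolve("hd", init_quality)  # quality becomes "hd"
print(size, quality)
```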
2352 2353 #### run 2354 2355 ```python 2356 run( 2357 prompt: str, 2358 size: ( 2359 Literal["256x256", "512x512", "1024x1024", "1792x1024", "1024x1792"] 2360 | None 2361 ) = None, 2362 quality: Literal["standard", "hd"] | None = None, 2363 response_format: Literal["url", "b64_json"] | None = None, 2364 ) -> dict[str, Any] 2365 ``` 2366 2367 Invokes the image generation inference based on the provided prompt and generation parameters. 2368 2369 **Parameters:** 2370 2371 - **prompt** (<code>str</code>) – The prompt to generate the image. 2372 - **size** (<code>Literal['256x256', '512x512', '1024x1024', '1792x1024', '1024x1792'] | None</code>) – If provided, overrides the size provided during initialization. 2373 - **quality** (<code>Literal['standard', 'hd'] | None</code>) – If provided, overrides the quality provided during initialization. 2374 - **response_format** (<code>Literal['url', 'b64_json'] | None</code>) – If provided, overrides the response format provided during initialization. 2375 2376 **Returns:** 2377 2378 - <code>dict\[str, Any\]</code> – A dictionary containing the generated list of images and the revised prompt. 2379 Depending on the `response_format` parameter, the list of images can be URLs or base64 encoded JSON strings. 2380 The revised prompt is the prompt that was used to generate the image, if there was any revision 2381 to the prompt made by OpenAI. 2382 2383 #### to_dict 2384 2385 ```python 2386 to_dict() -> dict[str, Any] 2387 ``` 2388 2389 Serialize this component to a dictionary. 2390 2391 **Returns:** 2392 2393 - <code>dict\[str, Any\]</code> – The serialized component as a dictionary. 2394 2395 #### from_dict 2396 2397 ```python 2398 from_dict(data: dict[str, Any]) -> DALLEImageGenerator 2399 ``` 2400 2401 Deserialize this component from a dictionary. 2402 2403 **Parameters:** 2404 2405 - **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component. 
2406 2407 **Returns:** 2408 2409 - <code>DALLEImageGenerator</code> – The deserialized component instance. 2410 2411 ## utils 2412 2413 ### print_streaming_chunk 2414 2415 ```python 2416 print_streaming_chunk(chunk: StreamingChunk) -> None 2417 ``` 2418 2419 Callback function to handle and display streaming output chunks. 2420 2421 This function processes a `StreamingChunk` object by: 2422 2423 - Printing tool call metadata (if any), including function names and arguments, as they arrive. 2424 - Printing tool call results when available. 2425 - Printing the main content (e.g., text tokens) of the chunk as it is received. 2426 2427 The function outputs data directly to stdout and flushes output buffers to ensure immediate display during 2428 streaming. 2429 2430 **Parameters:** 2431 2432 - **chunk** (<code>StreamingChunk</code>) – A chunk of streaming data containing content and optional metadata, such as tool calls and 2433 tool results.
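`print_streaming_chunk` is a ready-made callback, but any callable with the same `(chunk) -> None` signature can serve as a `streaming_callback`. As a sketch, here is a custom callback that collects content instead of printing it; the simulated chunks below are stand-ins that only mimic the `content` attribute of `StreamingChunk`:

```python
from types import SimpleNamespace

collected: list[str] = []

def collecting_callback(chunk) -> None:
    # StreamingChunk.content carries the text delta for this chunk
    if chunk.content:
        collected.append(chunk.content)

# Simulated stream; in practice you would pass the callback to a generator,
# e.g. OpenAIGenerator(streaming_callback=collecting_callback)
for token in ["Natural", " Language", " Processing"]:
    collecting_callback(SimpleNamespace(content=token))

print("".join(collected))  # Natural Language Processing
```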