---
title: "Generators"
id: generators-api
description: "Enables text generation using LLMs."
slug: "/generators-api"
---

## azure

### AzureOpenAIGenerator

Bases: <code>OpenAIGenerator</code>

Generates text using OpenAI's large language models (LLMs).

It works with gpt-4-type models and supports streaming responses
from the OpenAI API.

You can customize how the text is generated by passing parameters to the
OpenAI API. Use the `**generation_kwargs` argument when you initialize
the component or when you run it. Any parameter that works with
`openai.ChatCompletion.create` will work here too.

For details on OpenAI API parameters, see
[OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).

### Usage example

<!-- test-ignore -->

```python
from haystack.components.generators import AzureOpenAIGenerator
from haystack.utils import Secret

client = AzureOpenAIGenerator(
    azure_endpoint="<Your Azure endpoint, e.g. `https://your-company.azure.openai.com/`>",
    api_key=Secret.from_token("<your-api-key>"),
    azure_deployment="<this is the model name, e.g. gpt-4.1-mini>")
response = client.run("What's Natural Language Processing? Be brief.")
print(response)
```

```
# >> {'replies': ['Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on
# >> the interaction between computers and human language. It involves enabling computers to understand, interpret,
# >> and respond to natural human language in a way that is both meaningful and useful.'], 'meta': [{'model':
# >> 'gpt-4.1-mini', 'index': 0, 'finish_reason': 'stop', 'usage': {'prompt_tokens': 16,
# >> 'completion_tokens': 49, 'total_tokens': 65}}]}
```

#### __init__

```python
__init__(
    azure_endpoint: str | None = None,
    api_version: str | None = "2024-12-01-preview",
    azure_deployment: str | None = "gpt-4.1-mini",
    api_key: Secret | None = Secret.from_env_var(
        "AZURE_OPENAI_API_KEY", strict=False
    ),
    azure_ad_token: Secret | None = Secret.from_env_var(
        "AZURE_OPENAI_AD_TOKEN", strict=False
    ),
    organization: str | None = None,
    streaming_callback: StreamingCallbackT | None = None,
    system_prompt: str | None = None,
    timeout: float | None = None,
    max_retries: int | None = None,
    http_client_kwargs: dict[str, Any] | None = None,
    generation_kwargs: dict[str, Any] | None = None,
    default_headers: dict[str, str] | None = None,
    *,
    azure_ad_token_provider: AzureADTokenProvider | None = None
) -> None
```

Initialize the Azure OpenAI Generator.

**Parameters:**

- **azure_endpoint** (<code>str | None</code>) – The endpoint of the deployed model, for example `https://example-resource.azure.openai.com/`.
- **api_version** (<code>str | None</code>) – The version of the API to use. Defaults to 2024-12-01-preview.
- **azure_deployment** (<code>str | None</code>) – The deployment of the model, usually the model name.
- **api_key** (<code>Secret | None</code>) – The API key to use for authentication.
- **azure_ad_token** (<code>Secret | None</code>) – [Azure Active Directory token](https://www.microsoft.com/en-us/security/business/identity-access/microsoft-entra-id).
- **organization** (<code>str | None</code>) – Your organization ID, defaults to `None`.
For help, see
[Setting up your organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function called when a new token is received from the stream.
It accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
as an argument.
- **system_prompt** (<code>str | None</code>) – The system prompt to use for text generation. If not provided, the
system prompt is omitted.
- **timeout** (<code>float | None</code>) – Timeout for the AzureOpenAI client. If not set, it is inferred from the
`OPENAI_TIMEOUT` environment variable or set to 30.
- **max_retries** (<code>int | None</code>) – Maximum number of retries to contact AzureOpenAI if it returns an internal error.
If not set, it is inferred from the `OPENAI_MAX_RETRIES` environment variable or set to 5.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model, sent directly to
the OpenAI endpoint. See [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat) for
more details.
Some of the supported parameters:
  - `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion,
    including visible output tokens and reasoning tokens.
  - `temperature`: The sampling temperature to use. Higher values mean the model takes more risks.
    Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
  - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
    considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
    comprising the top 10% probability mass are considered.
  - `n`: The number of completions to generate for each prompt. For example, with 3 prompts and n=2,
    the LLM generates two completions per prompt, resulting in 6 completions total.
  - `stop`: One or more sequences after which the LLM should stop generating tokens.
  - `presence_penalty`: The penalty applied if a token is already present.
    Higher values make the model less likely to repeat the token.
  - `frequency_penalty`: The penalty applied if a token has already been generated.
    Higher values make the model less likely to repeat the token.
  - `logit_bias`: Adds a logit bias to specific tokens. The keys of the dictionary are tokens, and the
    values are the bias to add to that token.
- **default_headers** (<code>dict\[str, str\] | None</code>) – Default headers to use for the AzureOpenAI client.
- **azure_ad_token_provider** (<code>AzureADTokenProvider | None</code>) – A function that returns an Azure Active Directory token. It is invoked on
every request.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – The serialized component as a dictionary.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> AzureOpenAIGenerator
```

Deserialize this component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.

**Returns:**

- <code>AzureOpenAIGenerator</code> – The deserialized component instance.
## chat/azure

### AzureOpenAIChatGenerator

Bases: <code>OpenAIChatGenerator</code>

Generates text using OpenAI's models on Azure.

It works with gpt-4-type models and supports streaming responses
from the OpenAI API. It uses the [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
format for input and output.

You can customize how the text is generated by passing parameters to the
OpenAI API. Use the `**generation_kwargs` argument when you initialize
the component or when you run it. Any parameter that works with
`openai.ChatCompletion.create` will work here too.

For details on OpenAI API parameters, see
[OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).

### Usage example

<!-- test-ignore -->

```python
from haystack.components.generators.chat import AzureOpenAIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret

messages = [ChatMessage.from_user("What's Natural Language Processing?")]

client = AzureOpenAIChatGenerator(
    azure_endpoint="<Your Azure endpoint, e.g. `https://your-company.azure.openai.com/`>",
    api_key=Secret.from_token("<your-api-key>"),
    azure_deployment="<this is the model name, e.g. gpt-4.1-mini>")
response = client.run(messages)
print(response)
```

```
{'replies':
[ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=[TextContent(text=
"Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on
enabling computers to understand, interpret, and generate human language in a way that is useful.")],
_name=None,
_meta={'model': 'gpt-4.1-mini', 'index': 0, 'finish_reason': 'stop',
'usage': {'prompt_tokens': 15, 'completion_tokens': 36, 'total_tokens': 51}})]
}
```

#### SUPPORTED_MODELS

```python
SUPPORTED_MODELS: list[str] = [
    "gpt-5.4",
    "gpt-5.4-pro",
    "gpt-5.3-codex",
    "gpt-5.2",
    "gpt-5.2-codex",
    "gpt-5.2-chat",
    "gpt-5.1",
    "gpt-5.1-chat",
    "gpt-5.1-codex",
    "gpt-5.1-codex-mini",
    "gpt-5",
    "gpt-5-mini",
    "gpt-5-nano",
    "gpt-5-chat",
    "gpt-4.1",
    "gpt-4.1-mini",
    "gpt-4.1-nano",
    "gpt-4o",
    "gpt-4o-mini",
    "gpt-4o-audio-preview",
    "gpt-realtime-1.5",
    "gpt-audio-1.5",
    "o1",
    "o1-mini",
    "o3",
    "o3-mini",
    "o4-mini",
    "codex-mini",
    "gpt-4",
    "gpt-35-turbo",
    "gpt-oss-120b",
    "computer-use-preview",
]
```

A non-exhaustive list of chat models supported by this component.
See https://learn.microsoft.com/en-us/azure/foundry/foundry-models/concepts/models-sold-directly-by-azure
for the full list.
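Since the list is non-exhaustive, a deployment name missing from `SUPPORTED_MODELS` is not necessarily unsupported, so a hard membership check would be wrong. A minimal sketch of a soft check instead (the `warn_if_unlisted` helper is hypothetical, not part of Haystack):

```python
# Hypothetical helper: warn (don't fail) when a deployment name is missing from
# the known-good list, since SUPPORTED_MODELS is non-exhaustive.
import warnings

SUPPORTED_MODELS = ["gpt-4.1-mini", "gpt-4o", "gpt-35-turbo"]  # excerpt only


def warn_if_unlisted(deployment: str, supported: list[str]) -> bool:
    """Return True if the deployment is in the known-good list; warn otherwise."""
    if deployment in supported:
        return True
    warnings.warn(f"{deployment!r} is not in SUPPORTED_MODELS; it may still work, "
                  "check the Azure model catalog.")
    return False


print(warn_if_unlisted("gpt-4.1-mini", SUPPORTED_MODELS))  # True
```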
#### __init__

```python
__init__(
    azure_endpoint: str | None = None,
    api_version: str | None = "2024-12-01-preview",
    azure_deployment: str | None = "gpt-4.1-mini",
    api_key: Secret | None = Secret.from_env_var(
        "AZURE_OPENAI_API_KEY", strict=False
    ),
    azure_ad_token: Secret | None = Secret.from_env_var(
        "AZURE_OPENAI_AD_TOKEN", strict=False
    ),
    organization: str | None = None,
    streaming_callback: StreamingCallbackT | None = None,
    timeout: float | None = None,
    max_retries: int | None = None,
    generation_kwargs: dict[str, Any] | None = None,
    default_headers: dict[str, str] | None = None,
    tools: ToolsType | None = None,
    tools_strict: bool = False,
    *,
    azure_ad_token_provider: (
        AzureADTokenProvider | AsyncAzureADTokenProvider | None
    ) = None,
    http_client_kwargs: dict[str, Any] | None = None
) -> None
```

Initialize the Azure OpenAI Chat Generator component.

**Parameters:**

- **azure_endpoint** (<code>str | None</code>) – The endpoint of the deployed model, for example `"https://example-resource.azure.openai.com/"`.
- **api_version** (<code>str | None</code>) – The version of the API to use. Defaults to 2024-12-01-preview.
- **azure_deployment** (<code>str | None</code>) – The deployment of the model, usually the model name.
- **api_key** (<code>Secret | None</code>) – The API key to use for authentication.
- **azure_ad_token** (<code>Secret | None</code>) – [Azure Active Directory token](https://www.microsoft.com/en-us/security/business/identity-access/microsoft-entra-id).
- **organization** (<code>str | None</code>) – Your organization ID, defaults to `None`. For help, see
[Setting up your organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function called when a new token is received from the stream.
It accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
as an argument.
- **timeout** (<code>float | None</code>) – Timeout for OpenAI client calls. If not set, it defaults to the
`OPENAI_TIMEOUT` environment variable or 30 seconds.
- **max_retries** (<code>int | None</code>) – Maximum number of retries to contact OpenAI after an internal error.
If not set, it defaults to the `OPENAI_MAX_RETRIES` environment variable or 5.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model. These parameters are sent directly to
the OpenAI endpoint. For details, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).
Some of the supported parameters:
  - `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion,
    including visible output tokens and reasoning tokens.
  - `temperature`: The sampling temperature to use. Higher values mean the model takes more risks.
    Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
  - `top_p`: Nucleus sampling, an alternative to sampling with temperature, where the model considers
    the tokens with top_p probability mass. For example, 0.1 means only the tokens comprising
    the top 10% probability mass are considered.
  - `n`: The number of completions to generate for each prompt. For example, with 3 prompts and n=2,
    the LLM generates two completions per prompt, resulting in 6 completions total.
  - `stop`: One or more sequences after which the LLM should stop generating tokens.
  - `presence_penalty`: The penalty applied if a token is already present.
    Higher values make the model less likely to repeat the token.
  - `frequency_penalty`: The penalty applied if a token has already been generated.
    Higher values make the model less likely to repeat the token.
  - `logit_bias`: Adds a logit bias to specific tokens. The keys of the dictionary are tokens, and the
    values are the bias to add to that token.
  - `response_format`: A JSON schema or a Pydantic model that enforces the structure of the model's response.
    If provided, the output will always be validated against this
    format (unless the model returns a tool call).
    For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
    Notes:
    - This parameter accepts Pydantic models and JSON schemas for the latest models, starting from GPT-4o.
      Older models only support a basic version of structured outputs through `{"type": "json_object"}`.
      For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
    - For structured outputs with streaming,
      the `response_format` must be a JSON schema and not a Pydantic model.
- **default_headers** (<code>dict\[str, str\] | None</code>) – Default headers to use for the AzureOpenAI client.
- **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset, for which the model can prepare calls.
- **tools_strict** (<code>bool</code>) – Whether to enable strict schema adherence for tool calls. If set to `True`, the model follows exactly
the schema provided in the `parameters` field of the tool definition, but this may increase latency.
- **azure_ad_token_provider** (<code>AzureADTokenProvider | AsyncAzureADTokenProvider | None</code>) – A function that returns an Azure Active Directory token. It is invoked on
every request.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).

#### warm_up

```python
warm_up() -> None
```

Warm up the Azure OpenAI chat generator.

This warms up the tools registered in the chat generator.
The method is idempotent: tools are only warmed up once.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – The serialized component as a dictionary.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> AzureOpenAIChatGenerator
```

Deserialize this component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.

**Returns:**

- <code>AzureOpenAIChatGenerator</code> – The deserialized component instance.

## chat/azure_responses

### AzureOpenAIResponsesChatGenerator

Bases: <code>OpenAIResponsesChatGenerator</code>

Completes chats using OpenAI's Responses API on Azure.

It works with gpt-5 and o-series models and supports streaming responses
from the OpenAI API. It uses the [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
format for input and output.

You can customize how the text is generated by passing parameters to the
OpenAI API. Use the `**generation_kwargs` argument when you initialize
the component or when you run it. Any parameter that works with
`openai.Responses.create` will work here too.

For details on OpenAI API parameters, see
[OpenAI documentation](https://platform.openai.com/docs/api-reference/responses).
### Usage example

<!-- test-ignore -->

```python
from haystack.components.generators.chat import AzureOpenAIResponsesChatGenerator
from haystack.dataclasses import ChatMessage

messages = [ChatMessage.from_user("What's Natural Language Processing?")]

client = AzureOpenAIResponsesChatGenerator(
    azure_endpoint="https://example-resource.azure.openai.com/",
    generation_kwargs={"reasoning": {"effort": "low", "summary": "auto"}}
)
response = client.run(messages)
print(response)
```

#### SUPPORTED_MODELS

```python
SUPPORTED_MODELS: list[str] = [
    "gpt-5.4-pro",
    "gpt-5.4",
    "gpt-5.3-chat",
    "gpt-5.3-codex",
    "gpt-5.2-codex",
    "gpt-5.2",
    "gpt-5.2-chat",
    "gpt-5.1-codex-max",
    "gpt-5.1",
    "gpt-5.1-chat",
    "gpt-5.1-codex",
    "gpt-5.1-codex-mini",
    "gpt-5-pro",
    "gpt-5-codex",
    "gpt-5",
    "gpt-5-mini",
    "gpt-5-nano",
    "gpt-5-chat",
    "gpt-4o",
    "gpt-4o-mini",
    "computer-use-preview",
    "gpt-4.1",
    "gpt-4.1-nano",
    "gpt-4.1-mini",
    "gpt-image-1",
    "gpt-image-1-mini",
    "gpt-image-1.5",
    "o1",
    "o3-mini",
    "o3",
    "o4-mini",
]
```

A non-exhaustive list of chat models supported by this component.
See https://learn.microsoft.com/en-us/azure/foundry/openai/how-to/responses#model-support for the full list.
#### __init__

```python
__init__(
    *,
    api_key: (
        Secret | Callable[[], str] | Callable[[], Awaitable[str]]
    ) = Secret.from_env_var("AZURE_OPENAI_API_KEY", strict=False),
    azure_endpoint: str | None = None,
    azure_deployment: str = "gpt-5-mini",
    streaming_callback: StreamingCallbackT | None = None,
    organization: str | None = None,
    generation_kwargs: dict[str, Any] | None = None,
    timeout: float | None = None,
    max_retries: int | None = None,
    tools: ToolsType | None = None,
    tools_strict: bool = False,
    http_client_kwargs: dict[str, Any] | None = None
) -> None
```

Initialize the AzureOpenAIResponsesChatGenerator component.

**Parameters:**

- **api_key** (<code>Secret | Callable\[[], str\] | Callable\[[], Awaitable\[str\]\]</code>) – The API key to use for authentication. Can be:
  - A `Secret` object containing the API key.
  - A `Secret` object containing the [Azure Active Directory token](https://www.microsoft.com/en-us/security/business/identity-access/microsoft-entra-id).
  - A function that returns an Azure Active Directory token.
- **azure_endpoint** (<code>str | None</code>) – The endpoint of the deployed model, for example `"https://example-resource.azure.openai.com/"`.
- **azure_deployment** (<code>str</code>) – The deployment of the model, usually the model name.
- **organization** (<code>str | None</code>) – Your organization ID, defaults to `None`. For help, see
[Setting up your organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function called when a new token is received from the stream.
It accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
as an argument.
- **timeout** (<code>float | None</code>) – Timeout for OpenAI client calls. If not set, it defaults to the
`OPENAI_TIMEOUT` environment variable or 30 seconds.
- **max_retries** (<code>int | None</code>) – Maximum number of retries to contact OpenAI after an internal error.
If not set, it defaults to the `OPENAI_MAX_RETRIES` environment variable or 5.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model. These parameters are sent
directly to the OpenAI endpoint.
See the OpenAI [documentation](https://platform.openai.com/docs/api-reference/responses) for
more details.
Some of the supported parameters:
  - `temperature`: What sampling temperature to use. Higher values like 0.8 make the output more random,
    while lower values like 0.2 make it more focused and deterministic.
  - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
    considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
    comprising the top 10% probability mass are considered.
  - `previous_response_id`: The ID of the previous response.
    Use this to create multi-turn conversations.
  - `text_format`: A Pydantic model that enforces the structure of the model's response.
    If provided, the output will always be validated against this
    format (unless the model returns a tool call).
    For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
  - `text`: A JSON schema that enforces the structure of the model's response.
    If provided, the output will always be validated against this
    format (unless the model returns a tool call).
    Notes:
    - Both JSON Schema and Pydantic models are supported for the latest models, starting from GPT-4o.
    - If both are provided, `text_format` takes precedence and the JSON schema passed to `text` is ignored.
    - Currently, this component doesn't support streaming for structured outputs.
    - Older models only support a basic version of structured outputs through `{"type": "json_object"}`.
      For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
  - `reasoning`: A dictionary of parameters for reasoning. For example:
    - `summary`: The summary of the reasoning.
    - `effort`: The level of effort to put into the reasoning. Can be `low`, `medium`, or `high`.
    - `generate_summary`: Whether to generate a summary of the reasoning.
    Note: OpenAI does not return the reasoning tokens, but the summary can be viewed if it is enabled.
    For details, see the [OpenAI Reasoning documentation](https://platform.openai.com/docs/guides/reasoning).
- **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset, for which the model can prepare calls.
- **tools_strict** (<code>bool</code>) – Whether to enable strict schema adherence for tool calls. If set to `True`, the model follows exactly
the schema provided in the `parameters` field of the tool definition, but this may increase latency.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – The serialized component as a dictionary.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> AzureOpenAIResponsesChatGenerator
```

Deserialize this component from a dictionary.
**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.

**Returns:**

- <code>AzureOpenAIResponsesChatGenerator</code> – The deserialized component instance.

## chat/fallback

### FallbackChatGenerator

A chat generator wrapper that tries multiple chat generators sequentially.

It forwards all parameters transparently to the underlying chat generators and returns the first successful result.
Calls chat generators sequentially until one succeeds. Falls back on any exception raised by a generator.
If all chat generators fail, it raises a RuntimeError with details.

Timeout enforcement is fully delegated to the underlying chat generators. The fallback mechanism will only
work correctly if the underlying chat generators implement proper timeout handling and raise exceptions
when timeouts occur. For predictable latency guarantees, ensure your chat generators:

- Support a `timeout` parameter in their initialization
- Implement timeout as total wall-clock time (a shared deadline for both streaming and non-streaming)
- Raise timeout exceptions (e.g., TimeoutError, asyncio.TimeoutError, httpx.TimeoutException) when exceeded

Note: Most well-implemented chat generators (OpenAI, Anthropic, Cohere, etc.) support timeout parameters
with consistent semantics. For HTTP-based LLM providers, a single timeout value (e.g., `timeout=30`)
typically applies to all connection phases: connection setup, read, write, and pool. For streaming
responses, the read timeout is the maximum gap between chunks. For non-streaming responses, it is the time limit for
receiving the complete response.
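The sequential try-and-fall-back strategy can be sketched in plain Python with stand-in callables instead of real chat generators. This mirrors, but is not, the component's actual implementation; `run_with_fallback` and its return shape are illustrative only.

```python
# Minimal sketch of the fallback strategy: try each generator in order and
# return the first successful result, recording failures along the way.
# Stand-in callables replace real chat generators.
from collections.abc import Callable
from typing import Any


def run_with_fallback(generators: list[Callable[[str], str]], prompt: str) -> dict[str, Any]:
    failed: list[str] = []
    for index, generate in enumerate(generators):
        try:
            reply = generate(prompt)
        except Exception as exc:  # any exception triggers failover to the next generator
            failed.append(f"generator_{index}: {exc}")
            continue
        return {"reply": reply, "meta": {"successful_index": index, "failed": failed}}
    raise RuntimeError(f"All generators failed: {failed}")


def flaky(prompt: str) -> str:
    raise TimeoutError("deadline exceeded")


def stable(prompt: str) -> str:
    return f"echo: {prompt}"


result = run_with_fallback([flaky, stable], "hello")
print(result["meta"]["successful_index"])  # 1
```

The first generator's TimeoutError is swallowed and recorded; the second generator's reply is returned, together with metadata about the attempt history.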
Failover is automatically triggered when a generator raises any exception, including:

- Timeout errors (if the generator implements and raises them)
- Rate limit errors (429)
- Authentication errors (401)
- Context length errors (400)
- Server errors (500+)
- Any other exception

#### __init__

```python
__init__(chat_generators: list[ChatGenerator]) -> None
```

Creates an instance of FallbackChatGenerator.

**Parameters:**

- **chat_generators** (<code>list\[ChatGenerator\]</code>) – A non-empty list of chat generator components to try in order.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize the component, including nested chat generators when they support serialization.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> FallbackChatGenerator
```

Rebuild the component from a serialized representation, restoring nested chat generators.

#### warm_up

```python
warm_up() -> None
```

Warm up all underlying chat generators.

This method calls warm_up() on each underlying generator that supports it.

#### run

```python
run(
    messages: list[ChatMessage],
    generation_kwargs: dict[str, Any] | None = None,
    tools: ToolsType | None = None,
    streaming_callback: StreamingCallbackT | None = None,
) -> dict[str, list[ChatMessage] | dict[str, Any]]
```

Execute chat generators sequentially until one succeeds.

**Parameters:**

- **messages** (<code>list\[ChatMessage\]</code>) – The conversation history as a list of ChatMessage instances.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Optional parameters for the chat generator (e.g., temperature, max_tokens).
- **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset, for function calling capabilities.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – Optional callable for handling streaming responses.

**Returns:**

- <code>dict\[str, list\[ChatMessage\] | dict\[str, Any\]\]</code> – A dictionary with:
  - "replies": Generated ChatMessage instances from the first successful generator.
  - "meta": Execution metadata including successful_chat_generator_index, successful_chat_generator_class,
    total_attempts, failed_chat_generators, plus any metadata from the successful generator.

**Raises:**

- <code>RuntimeError</code> – If all chat generators fail.

#### run_async

```python
run_async(
    messages: list[ChatMessage],
    generation_kwargs: dict[str, Any] | None = None,
    tools: ToolsType | None = None,
    streaming_callback: StreamingCallbackT | None = None,
) -> dict[str, list[ChatMessage] | dict[str, Any]]
```

Asynchronously execute chat generators sequentially until one succeeds.

**Parameters:**

- **messages** (<code>list\[ChatMessage\]</code>) – The conversation history as a list of ChatMessage instances.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Optional parameters for the chat generator (e.g., temperature, max_tokens).
- **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset, for function calling capabilities.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – Optional callable for handling streaming responses.

**Returns:**

- <code>dict\[str, list\[ChatMessage\] | dict\[str, Any\]\]</code> – A dictionary with:
  - "replies": Generated ChatMessage instances from the first successful generator.
  - "meta": Execution metadata including successful_chat_generator_index, successful_chat_generator_class,
    total_attempts, failed_chat_generators, plus any metadata from the successful generator.

**Raises:**

- <code>RuntimeError</code> – If all chat generators fail.

## chat/hugging_face_api

### HuggingFaceAPIChatGenerator

Completes chats using Hugging Face APIs.

HuggingFaceAPIChatGenerator uses the [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
format for input and output. Use it to generate text with Hugging Face APIs:

- [Serverless Inference API (Inference Providers)](https://huggingface.co/docs/inference-providers)
- [Paid Inference Endpoints](https://huggingface.co/inference-endpoints)
- [Self-hosted Text Generation Inference](https://github.com/huggingface/text-generation-inference)

### Usage examples

#### With the serverless inference API (Inference Providers) - free tier available

<!-- test-ignore -->

```python
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret
from haystack.utils.hf import HFGenerationAPIType

messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
            ChatMessage.from_user("What's Natural Language Processing?")]

# the api_type can be expressed using the HFGenerationAPIType enum or as a string
api_type = HFGenerationAPIType.SERVERLESS_INFERENCE_API
api_type = "serverless_inference_api"  # this is equivalent to the above

generator = HuggingFaceAPIChatGenerator(api_type=api_type,
                                        api_params={"model": "Qwen/Qwen2.5-7B-Instruct",
                                                    "provider": "together"},
                                        token=Secret.from_token("<your-api-key>"))

result = generator.run(messages)
print(result)
```

#### With the serverless inference API (Inference Providers) and text+image input

<!-- test-ignore -->

```python
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage, ImageContent
from haystack.utils import Secret
from haystack.utils.hf import HFGenerationAPIType

# Create an image from a file path, URL, or base64 string
image = ImageContent.from_file_path("path/to/your/image.jpg")

# Create a multimodal message with both text and image
messages = [ChatMessage.from_user(content_parts=["Describe this image in detail", image])]

generator = HuggingFaceAPIChatGenerator(
    api_type=HFGenerationAPIType.SERVERLESS_INFERENCE_API,
    api_params={
        "model": "Qwen/Qwen2.5-VL-7B-Instruct",  # Vision Language Model
        "provider": "hyperbolic"
    },
    token=Secret.from_token("<your-api-key>")
)

result = generator.run(messages)
print(result)
```

#### With paid inference endpoints

<!-- test-ignore -->

```python
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret

messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
            ChatMessage.from_user("What's Natural Language Processing?")]

generator = HuggingFaceAPIChatGenerator(api_type="inference_endpoints",
                                        api_params={"url": "<your-inference-endpoint-url>"},
                                        token=Secret.from_token("<your-api-key>"))

result = generator.run(messages)
print(result)
```

#### With self-hosted text generation inference

<!-- test-ignore -->

```python
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage

messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
            ChatMessage.from_user("What's Natural Language Processing?")]

generator = HuggingFaceAPIChatGenerator(api_type="text_generation_inference",
                                        api_params={"url": "http://localhost:8080"})

result = generator.run(messages)
print(result)
```

#### __init__

```python
__init__(
    api_type: HFGenerationAPIType | str,
    api_params: dict[str, str],
    token: Secret | None = Secret.from_env_var(
        ["HF_API_TOKEN", "HF_TOKEN"], strict=False
    ),
    generation_kwargs: dict[str, Any] | None = None,
    stop_words: list[str] | None = None,
    streaming_callback: StreamingCallbackT | None = None,
    tools: ToolsType | None = None,
) -> None
```

Initialize the HuggingFaceAPIChatGenerator instance.

**Parameters:**

- **api_type** (<code>HFGenerationAPIType | str</code>) – The type of Hugging Face API to use. Available types:
  - `text_generation_inference`: See [TGI](https://github.com/huggingface/text-generation-inference).
  - `inference_endpoints`: See [Inference Endpoints](https://huggingface.co/inference-endpoints).
  - `serverless_inference_api`: See
    [Serverless Inference API - Inference Providers](https://huggingface.co/docs/inference-providers).
- **api_params** (<code>dict\[str, str\]</code>) – A dictionary with the following keys:
  - `model`: Hugging Face model ID. Required when `api_type` is `SERVERLESS_INFERENCE_API`.
  - `provider`: Provider name. Recommended when `api_type` is `SERVERLESS_INFERENCE_API`.
  - `url`: URL of the inference endpoint. Required when `api_type` is `INFERENCE_ENDPOINTS` or
    `TEXT_GENERATION_INFERENCE`.
  - Other parameters specific to the chosen API type, such as `timeout`, `headers`, etc.
- **token** (<code>Secret | None</code>) – The Hugging Face token to use as HTTP bearer authorization.
Check your HF token in your [account settings](https://huggingface.co/settings/tokens).
828 - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary with keyword arguments to customize text generation. 829 Some examples: `max_tokens`, `temperature`, `top_p`. 830 For details, see [Hugging Face chat_completion documentation](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.chat_completion). 831 - **stop_words** (<code>list\[str\] | None</code>) – An optional list of strings representing the stop words. 832 - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses. 833 - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls. 834 The chosen model should support tool/function calling, according to the model card. 835 Support for tools in the Hugging Face API and TGI is not yet fully refined and you may experience 836 unexpected behavior. 837 838 #### warm_up 839 840 ```python 841 warm_up() -> None 842 ``` 843 844 Warm up the Hugging Face API chat generator. 845 846 This will warm up the tools registered in the chat generator. 847 This method is idempotent and will only warm up the tools once. 848 849 #### to_dict 850 851 ```python 852 to_dict() -> dict[str, Any] 853 ``` 854 855 Serialize this component to a dictionary. 856 857 **Returns:** 858 859 - <code>dict\[str, Any\]</code> – A dictionary containing the serialized component. 860 861 #### from_dict 862 863 ```python 864 from_dict(data: dict[str, Any]) -> HuggingFaceAPIChatGenerator 865 ``` 866 867 Deserialize this component from a dictionary. 
868 869 #### run 870 871 ```python 872 run( 873 messages: list[ChatMessage], 874 generation_kwargs: dict[str, Any] | None = None, 875 tools: ToolsType | None = None, 876 streaming_callback: StreamingCallbackT | None = None, 877 ) -> dict[str, list[ChatMessage]] 878 ``` 879 880 Invoke the text generation inference based on the provided messages and generation parameters. 881 882 **Parameters:** 883 884 - **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage objects representing the input messages. 885 - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. 886 - **tools** (<code>ToolsType | None</code>) – A list of tools or a Toolset for which the model can prepare calls. If set, it will override 887 the `tools` parameter set during component initialization. This parameter can accept either a 888 list of `Tool` objects or a `Toolset` instance. 889 - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses. If set, it will override the `streaming_callback` 890 parameter set during component initialization. 891 892 **Returns:** 893 894 - <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following keys: 895 - `replies`: A list containing the generated responses as ChatMessage objects. 896 897 #### run_async 898 899 ```python 900 run_async( 901 messages: list[ChatMessage], 902 generation_kwargs: dict[str, Any] | None = None, 903 tools: ToolsType | None = None, 904 streaming_callback: StreamingCallbackT | None = None, 905 ) -> dict[str, list[ChatMessage]] 906 ``` 907 908 Asynchronously invokes the text generation inference based on the provided messages and generation parameters. 909 910 This is the asynchronous version of the `run` method. It has the same parameters 911 and return values but can be used with `await` in async code.
912 913 **Parameters:** 914 915 - **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage objects representing the input messages. 916 - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. 917 - **tools** (<code>ToolsType | None</code>) – A list of tools or a Toolset for which the model can prepare calls. If set, it will override the `tools` 918 parameter set during component initialization. This parameter can accept either a list of `Tool` objects 919 or a `Toolset` instance. 920 - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses. If set, it will override the `streaming_callback` 921 parameter set during component initialization. 922 923 **Returns:** 924 925 - <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following keys: 926 - `replies`: A list containing the generated responses as ChatMessage objects. 927 928 ## chat/hugging_face_local 929 930 ### default_tool_parser 931 932 ```python 933 default_tool_parser(text: str) -> list[ToolCall] | None 934 ``` 935 936 Default implementation for parsing tool calls from model output text. 937 938 Uses DEFAULT_TOOL_PATTERN to extract tool calls. 939 940 **Parameters:** 941 942 - **text** (<code>str</code>) – The text to parse for tool calls. 943 944 **Returns:** 945 946 - <code>list\[ToolCall\] | None</code> – A list containing a single ToolCall if a valid tool call is found, None otherwise. 947 948 ### HuggingFaceLocalChatGenerator 949 950 Generates chat responses using models from Hugging Face that run locally. 951 952 Use this component with chat-based models, 953 such as `Qwen/Qwen3-0.6B` or `meta-llama/Llama-2-7b-chat-hf`. 954 LLMs running locally may need powerful hardware. 
955 956 ### Usage example 957 958 <!-- test-ignore --> 959 960 ```python 961 from haystack.components.generators.chat import HuggingFaceLocalChatGenerator 962 from haystack.dataclasses import ChatMessage 963 964 generator = HuggingFaceLocalChatGenerator(model="Qwen/Qwen3-0.6B") 965 messages = [ChatMessage.from_user("What's Natural Language Processing? Be brief.")] 966 print(generator.run(messages)) 967 ``` 968 969 ``` 970 {'replies': 971 [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=[TextContent(text= 972 "Natural Language Processing (NLP) is a subfield of artificial intelligence that deals 973 with the interaction between computers and human language. It enables computers to understand, interpret, and 974 generate human language in a valuable way. NLP involves various techniques such as speech recognition, text 975 analysis, sentiment analysis, and machine translation. The ultimate goal is to make it easier for computers to 976 process and derive meaning from human language, improving communication between humans and machines.")], 977 _name=None, 978 _meta={'finish_reason': 'stop', 'index': 0, 'model': 979 'Qwen/Qwen3-0.6B', 980 'usage': {'completion_tokens': 90, 'prompt_tokens': 19, 'total_tokens': 109}}) 981 ] 982 } 983 ``` 984 985 #### __init__ 986 987 ```python 988 __init__( 989 model: str = "Qwen/Qwen3-0.6B", 990 task: ( 991 Literal["text-generation", "text2text-generation", "image-text-to-text"] 992 | None 993 ) = None, 994 device: ComponentDevice | None = None, 995 token: Secret | None = Secret.from_env_var( 996 ["HF_API_TOKEN", "HF_TOKEN"], strict=False 997 ), 998 chat_template: str | None = None, 999 generation_kwargs: dict[str, Any] | None = None, 1000 huggingface_pipeline_kwargs: dict[str, Any] | None = None, 1001 stop_words: list[str] | None = None, 1002 streaming_callback: StreamingCallbackT | None = None, 1003 tools: ToolsType | None = None, 1004 tool_parsing_function: Callable[[str], list[ToolCall] | None] | None
= None, 1005 async_executor: ThreadPoolExecutor | None = None, 1006 *, 1007 enable_thinking: bool = False 1008 ) -> None 1009 ``` 1010 1011 Initializes the HuggingFaceLocalChatGenerator component. 1012 1013 **Parameters:** 1014 1015 - **model** (<code>str</code>) – The Hugging Face text generation model name or path, 1016 for example, `mistralai/Mistral-7B-Instruct-v0.2` or `TheBloke/OpenHermes-2.5-Mistral-7B-16k-AWQ`. 1017 The model must be a chat model supporting the ChatML messaging 1018 format. 1019 If the model is specified in `huggingface_pipeline_kwargs`, this parameter is ignored. 1020 - **task** (<code>Literal['text-generation', 'text2text-generation', 'image-text-to-text'] | None</code>) – The task for the Hugging Face pipeline. Possible options: 1021 - `text-generation`: Supported by decoder models, like GPT. 1022 - `text2text-generation`: Deprecated as of Transformers v5; use `text-generation` instead. 1023 Previously supported by encoder–decoder models such as T5. 1024 - `image-text-to-text`: Supported by vision-language models. 1025 If the task is specified in `huggingface_pipeline_kwargs`, this parameter is ignored. 1026 If not specified, the component calls the Hugging Face API to infer the task from the model name. 1027 - **device** (<code>ComponentDevice | None</code>) – The device for loading the model. If `None`, automatically selects the default device. 1028 If a device or device map is specified in `huggingface_pipeline_kwargs`, it overrides this parameter. 1029 - **token** (<code>Secret | None</code>) – The token to use as HTTP bearer authorization for remote files. 1030 If the token is specified in `huggingface_pipeline_kwargs`, this parameter is ignored. 1031 - **chat_template** (<code>str | None</code>) – Specifies an optional Jinja template for formatting chat 1032 messages. Most high-quality chat models have their own templates, but for models without this 1033 feature or if you prefer a custom template, use this parameter. 
1034 - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary with keyword arguments to customize text generation. 1035 Some examples: `max_length`, `max_new_tokens`, `temperature`, `top_k`, `top_p`. 1036 See Hugging Face's documentation for more information: 1037 - [customize-text-generation](https://huggingface.co/docs/transformers/main/en/generation_strategies#customize-text-generation) 1038 - [GenerationConfig](https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.GenerationConfig) 1039 The only `generation_kwargs` entry set by default is `max_new_tokens`, which defaults to 512 tokens. 1040 - **huggingface_pipeline_kwargs** (<code>dict\[str, Any\] | None</code>) – Dictionary with keyword arguments to initialize the 1041 Hugging Face pipeline for text generation. 1042 These keyword arguments provide fine-grained control over the Hugging Face pipeline. 1043 In case of duplication, these kwargs override `model`, `task`, `device`, and `token` init parameters. 1044 For kwargs, see [Hugging Face documentation](https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.pipeline.task). 1045 In this dictionary, you can also include `model_kwargs` to specify the kwargs for [model initialization](https://huggingface.co/docs/transformers/en/main_classes/model#transformers.PreTrainedModel.from_pretrained). 1046 - **stop_words** (<code>list\[str\] | None</code>) – A list of stop words. If the model generates a stop word, the generation stops. 1047 If you provide this parameter, don't specify the `stopping_criteria` in `generation_kwargs`. 1048 For some chat models, the output includes both the new text and the original prompt. 1049 In these cases, make sure your prompt has no stop words. 1050 - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses.
1051 - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls. 1052 - **tool_parsing_function** (<code>Callable\\[[str\], list\[ToolCall\] | None\] | None</code>) – A callable that takes a string and returns a list of ToolCall objects or None. 1053 If None, `default_tool_parser` is used, which extracts tool calls using a predefined pattern. 1054 - **async_executor** (<code>ThreadPoolExecutor | None</code>) – Optional ThreadPoolExecutor to use for async calls. If not provided, a single-threaded executor will be 1055 initialized and used. 1056 - **enable_thinking** (<code>bool</code>) – Whether to enable thinking mode in the chat template for thinking-capable models. 1057 When enabled, the model generates intermediate reasoning before the final response. Defaults to False. 1058 1059 #### shutdown 1060 1061 ```python 1062 shutdown() -> None 1063 ``` 1064 1065 Explicitly shut down the executor if this component owns it. 1066 1067 #### warm_up 1068 1069 ```python 1070 warm_up() -> None 1071 ``` 1072 1073 Initializes the component and warms up tools if provided. 1074 1075 #### to_dict 1076 1077 ```python 1078 to_dict() -> dict[str, Any] 1079 ``` 1080 1081 Serializes the component to a dictionary. 1082 1083 **Returns:** 1084 1085 - <code>dict\[str, Any\]</code> – Dictionary with serialized data. 1086 1087 #### from_dict 1088 1089 ```python 1090 from_dict(data: dict[str, Any]) -> HuggingFaceLocalChatGenerator 1091 ``` 1092 1093 Deserializes the component from a dictionary. 1094 1095 **Parameters:** 1096 1097 - **data** (<code>dict\[str, Any\]</code>) – The dictionary to deserialize from. 1098 1099 **Returns:** 1100 1101 - <code>HuggingFaceLocalChatGenerator</code> – The deserialized component.
1102 1103 #### run 1104 1105 ```python 1106 run( 1107 messages: list[ChatMessage], 1108 generation_kwargs: dict[str, Any] | None = None, 1109 streaming_callback: StreamingCallbackT | None = None, 1110 tools: ToolsType | None = None, 1111 ) -> dict[str, list[ChatMessage]] 1112 ``` 1113 1114 Invoke text generation inference based on the provided messages and generation parameters. 1115 1116 **Parameters:** 1117 1118 - **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage objects representing the input messages. 1119 - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. 1120 - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses. 1121 - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls. 1122 If set, it will override the `tools` parameter provided during initialization. 1123 1124 **Returns:** 1125 1126 - <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following keys: 1127 - `replies`: A list containing the generated responses as ChatMessage instances. 1128 1129 #### create_message 1130 1131 ```python 1132 create_message( 1133 text: str, 1134 index: int, 1135 tokenizer: Union[PreTrainedTokenizer, PreTrainedTokenizerFast], 1136 prompt: str, 1137 generation_kwargs: dict[str, Any], 1138 parse_tool_calls: bool = False, 1139 ) -> ChatMessage 1140 ``` 1141 1142 Create a ChatMessage instance from the provided text, populated with metadata. 1143 1144 **Parameters:** 1145 1146 - **text** (<code>str</code>) – The generated text. 1147 - **index** (<code>int</code>) – The index of the generated text. 1148 - **tokenizer** (<code>Union\[PreTrainedTokenizer, PreTrainedTokenizerFast\]</code>) – The tokenizer used for generation. 1149 - **prompt** (<code>str</code>) – The prompt used for generation. 
1150 - **generation_kwargs** (<code>dict\[str, Any\]</code>) – The generation parameters. 1151 - **parse_tool_calls** (<code>bool</code>) – Whether to attempt parsing tool calls from the text. 1152 1153 **Returns:** 1154 1155 - <code>ChatMessage</code> – A ChatMessage instance. 1156 1157 #### run_async 1158 1159 ```python 1160 run_async( 1161 messages: list[ChatMessage], 1162 generation_kwargs: dict[str, Any] | None = None, 1163 streaming_callback: StreamingCallbackT | None = None, 1164 tools: ToolsType | None = None, 1165 ) -> dict[str, list[ChatMessage]] 1166 ``` 1167 1168 Asynchronously invokes text generation inference based on the provided messages and generation parameters. 1169 1170 This is the asynchronous version of the `run` method. It has the same parameters 1171 and return values but can be used with `await` in async code. 1172 1173 **Parameters:** 1174 1175 - **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage objects representing the input messages. 1176 - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. 1177 - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses. 1178 - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls. 1179 If set, it will override the `tools` parameter provided during initialization. 1180 1181 **Returns:** 1182 1183 - <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following keys: 1184 - `replies`: A list containing the generated responses as ChatMessage instances. 1185 1186 ## chat/llm 1187 1188 ### LLM 1189 1190 Bases: <code>Agent</code> 1191 1192 A text generation component powered by a large language model. 1193 1194 The LLM component is a simplified version of the Agent that focuses solely on text generation 1195 without tool usage.
It processes messages and returns a single response from the language model. 1196 1197 ### Usage examples 1198 1199 ```python 1200 from haystack.components.generators.chat import LLM 1201 from haystack.components.generators.chat import OpenAIChatGenerator 1202 from haystack.dataclasses import ChatMessage 1203 1204 llm = LLM( 1205 chat_generator=OpenAIChatGenerator(), 1206 system_prompt="You are a helpful summarization assistant.", 1207 user_prompt="""{% message role="user" %} 1208 Summarize the following document: {{ document }} 1209 {% endmessage %}""", 1210 required_variables=["document"], 1211 ) 1212 1213 result = llm.run(document="The weather is lovely today and the sun is shining. ") 1214 print(result["last_message"].text) 1215 ``` 1216 1217 #### __init__ 1218 1219 ```python 1220 __init__( 1221 *, 1222 chat_generator: ChatGenerator, 1223 system_prompt: str | None = None, 1224 user_prompt: str, 1225 required_variables: list[str] | Literal["*"] = "*", 1226 streaming_callback: StreamingCallbackT | None = None 1227 ) -> None 1228 ``` 1229 1230 Initialize the LLM component. 1231 1232 **Parameters:** 1233 1234 - **chat_generator** (<code>ChatGenerator</code>) – An instance of the chat generator that the LLM should use. 1235 - **system_prompt** (<code>str | None</code>) – System prompt for the LLM. 1236 - **user_prompt** (<code>str</code>) – User prompt for the LLM. Must contain at least one Jinja2 template variable 1237 (e.g., `{{ variable_name }}`). This prompt is appended to the messages provided at runtime. 1238 - **required_variables** (<code>list\[str\] | Literal['\*']</code>) – Variables that must be provided as input to user_prompt. 1239 If a variable listed as required is not provided, an exception is raised. 1240 If set to `"*"`, all variables found in the prompt are required. Defaults to `"*"`. 1241 - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback that will be invoked when a response is streamed from the LLM.
1242 1243 **Raises:** 1244 1245 - <code>ValueError</code> – If user_prompt contains no template variables. 1246 - <code>ValueError</code> – If required_variables is an empty list. 1247 1248 #### to_dict 1249 1250 ```python 1251 to_dict() -> dict[str, Any] 1252 ``` 1253 1254 Serialize the LLM component to a dictionary. 1255 1256 **Returns:** 1257 1258 - <code>dict\[str, Any\]</code> – Dictionary with serialized data. 1259 1260 #### from_dict 1261 1262 ```python 1263 from_dict(data: dict[str, Any]) -> LLM 1264 ``` 1265 1266 Deserialize the LLM from a dictionary. 1267 1268 **Parameters:** 1269 1270 - **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from. 1271 1272 **Returns:** 1273 1274 - <code>LLM</code> – Deserialized LLM instance. 1275 1276 #### run 1277 1278 ```python 1279 run( 1280 messages: list[ChatMessage] | None = None, 1281 streaming_callback: StreamingCallbackT | None = None, 1282 *, 1283 generation_kwargs: dict[str, Any] | None = None, 1284 system_prompt: str | None = None, 1285 user_prompt: str | None = None, 1286 **kwargs: Any 1287 ) -> dict[str, Any] 1288 ``` 1289 1290 Process messages and generate a response from the language model. 1291 1292 **Parameters:** 1293 1294 - **messages** (<code>list\[ChatMessage\] | None</code>) – List of Haystack ChatMessage objects to process. 1295 - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback that will be invoked when a response is streamed from the LLM. 1296 - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for the underlying chat generator. These parameters 1297 will override the parameters passed during component initialization. 1298 - **system_prompt** (<code>str | None</code>) – System prompt for the LLM. If provided, it overrides the default system prompt. 1299 - **user_prompt** (<code>str | None</code>) – User prompt for the LLM. 
If provided, it overrides the default user prompt and is 1300 appended to the messages provided at runtime. 1301 - **kwargs** (<code>Any</code>) – Additional keyword arguments. These are used to fill template variables in the `user_prompt` 1302 (the keys must match template variable names). 1303 1304 **Returns:** 1305 1306 - <code>dict\[str, Any\]</code> – A dictionary with the following keys: 1307 - "messages": List of all messages exchanged during the LLM's run. 1308 - "last_message": The last message exchanged during the LLM's run. 1309 1310 #### run_async 1311 1312 ```python 1313 run_async( 1314 messages: list[ChatMessage] | None = None, 1315 streaming_callback: StreamingCallbackT | None = None, 1316 *, 1317 generation_kwargs: dict[str, Any] | None = None, 1318 system_prompt: str | None = None, 1319 user_prompt: str | None = None, 1320 **kwargs: Any 1321 ) -> dict[str, Any] 1322 ``` 1323 1324 Asynchronously process messages and generate a response from the language model. 1325 1326 **Parameters:** 1327 1328 - **messages** (<code>list\[ChatMessage\] | None</code>) – List of Haystack ChatMessage objects to process. 1329 - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An asynchronous callback that will be invoked when a response is streamed 1330 from the LLM. 1331 - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for the underlying chat generator. These parameters 1332 will override the parameters passed during component initialization. 1333 - **system_prompt** (<code>str | None</code>) – System prompt for the LLM. If provided, it overrides the default system prompt. 1334 - **user_prompt** (<code>str | None</code>) – User prompt for the LLM. If provided, it overrides the default user prompt and is 1335 appended to the messages provided at runtime. 1336 - **kwargs** (<code>Any</code>) – Additional keyword arguments. 
These are used to fill template variables in the `user_prompt` 1337 (the keys must match template variable names). 1338 1339 **Returns:** 1340 1341 - <code>dict\[str, Any\]</code> – A dictionary with the following keys: 1342 - "messages": List of all messages exchanged during the LLM's run. 1343 - "last_message": The last message exchanged during the LLM's run. 1344 1345 ## chat/openai 1346 1347 ### OpenAIChatGenerator 1348 1349 Completes chats using OpenAI's large language models (LLMs). 1350 1351 It works with the gpt-4 and gpt-5 series models and supports streaming responses 1352 from the OpenAI API. It uses the [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage) 1353 format for input and output. 1354 1355 You can customize how the text is generated by passing parameters to the 1356 OpenAI API. Use the `**generation_kwargs` argument when you initialize 1357 the component or when you run it. Any parameter that works with 1358 `openai.ChatCompletion.create` will work here too. 1359 1360 For details on OpenAI API parameters, see 1361 [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).
1362 1363 ### Usage example 1364 1365 ```python 1366 from haystack.components.generators.chat import OpenAIChatGenerator 1367 from haystack.dataclasses import ChatMessage 1368 1369 messages = [ChatMessage.from_user("What's Natural Language Processing?")] 1370 1371 client = OpenAIChatGenerator() 1372 response = client.run(messages) 1373 print(response) 1374 ``` 1375 1376 Output: 1377 1378 ``` 1379 {'replies': 1380 [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content= 1381 [TextContent(text="Natural Language Processing (NLP) is a branch of artificial intelligence 1382 that focuses on enabling computers to understand, interpret, and generate human language in 1383 a way that is meaningful and useful.")], 1384 _name=None, 1385 _meta={'model': 'gpt-5-mini', 'index': 0, 'finish_reason': 'stop', 1386 'usage': {'prompt_tokens': 15, 'completion_tokens': 36, 'total_tokens': 51}}) 1387 ] 1388 } 1389 ``` 1390 1391 #### SUPPORTED_MODELS 1392 1393 ```python 1394 SUPPORTED_MODELS: list[str] = [ 1395 "gpt-5-mini", 1396 "gpt-5-nano", 1397 "gpt-5", 1398 "gpt-5.1", 1399 "gpt-5.2", 1400 "gpt-5.2-pro", 1401 "gpt-5.4", 1402 "gpt-5-pro", 1403 "gpt-4.1", 1404 "gpt-4.1-mini", 1405 "gpt-4.1-nano", 1406 "gpt-4o", 1407 "gpt-4o-mini", 1408 "gpt-4-turbo", 1409 "gpt-4", 1410 "gpt-3.5-turbo", 1411 ] 1412 1413 ``` 1414 1415 A non-exhaustive list of chat models supported by this component. 1416 See https://developers.openai.com/api/docs/models for the full list and snapshot IDs. 
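Because `SUPPORTED_MODELS` is explicitly non-exhaustive, client code that validates a model name should treat a miss as a warning rather than an error. A small hypothetical helper (not part of the Haystack API; the list below copies a subset of the one documented above):

```python
import warnings

# Subset of the SUPPORTED_MODELS list documented above. The helper itself is a
# hypothetical sketch, not part of the Haystack API.
SUPPORTED_MODELS = ["gpt-5-mini", "gpt-5-nano", "gpt-5", "gpt-4.1", "gpt-4o", "gpt-4o-mini"]


def is_known_model(model: str) -> bool:
    """Return True for models on the known-supported list; warn (don't fail) otherwise."""
    if model in SUPPORTED_MODELS:
        return True
    warnings.warn(f"{model!r} is not in SUPPORTED_MODELS; it may still work.")
    return False


assert is_known_model("gpt-5-mini")
```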
1417 1418 #### __init__ 1419 1420 ```python 1421 __init__( 1422 api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"), 1423 model: str = "gpt-5-mini", 1424 streaming_callback: StreamingCallbackT | None = None, 1425 api_base_url: str | None = None, 1426 organization: str | None = None, 1427 generation_kwargs: dict[str, Any] | None = None, 1428 timeout: float | None = None, 1429 max_retries: int | None = None, 1430 tools: ToolsType | None = None, 1431 tools_strict: bool = False, 1432 http_client_kwargs: dict[str, Any] | None = None, 1433 ) -> None 1434 ``` 1435 1436 Creates an instance of OpenAIChatGenerator. Unless specified otherwise in `model`, uses OpenAI's gpt-5-mini. 1437 1438 Before initializing the component, you can set the 'OPENAI_TIMEOUT' and 'OPENAI_MAX_RETRIES' 1439 environment variables to override the `timeout` and `max_retries` parameters respectively 1440 in the OpenAI client. 1441 1442 **Parameters:** 1443 1444 - **api_key** (<code>Secret</code>) – The OpenAI API key. 1445 You can set it with an environment variable `OPENAI_API_KEY`, or pass it with this parameter 1446 during initialization. 1447 - **model** (<code>str</code>) – The name of the model to use. 1448 - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream. 1449 The callback function accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk) 1450 as an argument. 1451 - **api_base_url** (<code>str | None</code>) – An optional base URL. 1452 - **organization** (<code>str | None</code>) – Your organization ID, defaults to `None`. See 1453 [production best practices](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization). 1454 - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model. These parameters are sent directly to 1455 the OpenAI endpoint.
See OpenAI [documentation](https://platform.openai.com/docs/api-reference/chat) for 1456 more details. 1457 Some of the supported parameters: 1458 - `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion, 1459 including visible output tokens and reasoning tokens. 1460 - `temperature`: What sampling temperature to use. Higher values mean the model will take more risks. 1461 Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer. 1462 - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model 1463 considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens 1464 comprising the top 10% probability mass are considered. 1465 - `n`: How many completions to generate for each prompt. For example, if the LLM gets 3 prompts and n is 2, 1466 it will generate two completions for each of the three prompts, ending up with 6 completions in total. 1467 - `stop`: One or more sequences after which the LLM should stop generating tokens. 1468 - `presence_penalty`: The penalty applied to tokens that have already appeared in the text, regardless of 1469 how often. Bigger values make the model less likely to repeat the same token. 1470 - `frequency_penalty`: The penalty applied to tokens in proportion to how often they have already appeared 1471 in the text. Bigger values make the model less likely to repeat the same token. 1472 - `logit_bias`: Add a logit bias to specific tokens. The keys of the dictionary are tokens, and the 1473 values are the bias to add to that token. 1474 - `response_format`: A JSON schema or a Pydantic model that enforces the structure of the model's response. 1475 If provided, the output will always be validated against this 1476 format (unless the model returns a tool call). 1477 For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
Notes:
- This parameter accepts Pydantic models and JSON schemas for the latest models starting from GPT-4o.
Older models only support a basic version of structured outputs through `{"type": "json_object"}`.
For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
- For structured outputs with streaming,
the `response_format` must be a JSON schema and not a Pydantic model.
- **timeout** (<code>float | None</code>) – Timeout for OpenAI client calls. If not set, it defaults to either the
`OPENAI_TIMEOUT` environment variable, or 30 seconds.
- **max_retries** (<code>int | None</code>) – Maximum number of retries to contact OpenAI after an internal error.
If not set, it defaults to either the `OPENAI_MAX_RETRIES` environment variable, or 5.
- **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
- **tools_strict** (<code>bool</code>) – Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
the schema provided in the `parameters` field of the tool definition, but this may increase latency.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).

#### warm_up

```python
warm_up() -> None
```

Warm up the OpenAI chat generator.

This will warm up the tools registered in the chat generator.
This method is idempotent and will only warm up the tools once.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.
1512 1513 **Returns:** 1514 1515 - <code>dict\[str, Any\]</code> – The serialized component as a dictionary. 1516 1517 #### from_dict 1518 1519 ```python 1520 from_dict(data: dict[str, Any]) -> OpenAIChatGenerator 1521 ``` 1522 1523 Deserialize this component from a dictionary. 1524 1525 **Parameters:** 1526 1527 - **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component. 1528 1529 **Returns:** 1530 1531 - <code>OpenAIChatGenerator</code> – The deserialized component instance. 1532 1533 #### run 1534 1535 ```python 1536 run( 1537 messages: list[ChatMessage], 1538 streaming_callback: StreamingCallbackT | None = None, 1539 generation_kwargs: dict[str, Any] | None = None, 1540 *, 1541 tools: ToolsType | None = None, 1542 tools_strict: bool | None = None 1543 ) -> dict[str, list[ChatMessage]] 1544 ``` 1545 1546 Invokes chat completion based on the provided messages and generation parameters. 1547 1548 **Parameters:** 1549 1550 - **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage instances representing the input messages. 1551 - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream. 1552 - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. These parameters will 1553 override the parameters passed during component initialization. 1554 For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat/create). 1555 - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls. 1556 If set, it will override the `tools` parameter provided during initialization. 1557 - **tools_strict** (<code>bool | None</code>) – Whether to enable strict schema adherence for tool calls. 
If set to `True`, the model will follow exactly 1558 the schema provided in the `parameters` field of the tool definition, but this may increase latency. 1559 If set, it will override the `tools_strict` parameter set during component initialization. 1560 1561 **Returns:** 1562 1563 - <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following key: 1564 - `replies`: A list containing the generated responses as ChatMessage instances. 1565 1566 #### run_async 1567 1568 ```python 1569 run_async( 1570 messages: list[ChatMessage], 1571 streaming_callback: StreamingCallbackT | None = None, 1572 generation_kwargs: dict[str, Any] | None = None, 1573 *, 1574 tools: ToolsType | None = None, 1575 tools_strict: bool | None = None 1576 ) -> dict[str, list[ChatMessage]] 1577 ``` 1578 1579 Asynchronously invokes chat completion based on the provided messages and generation parameters. 1580 1581 This is the asynchronous version of the `run` method. It has the same parameters and return values 1582 but can be used with `await` in async code. 1583 1584 **Parameters:** 1585 1586 - **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage instances representing the input messages. 1587 - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream. 1588 Must be a coroutine. 1589 - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. These parameters will 1590 override the parameters passed during component initialization. 1591 For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat/create). 1592 - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls. 1593 If set, it will override the `tools` parameter provided during initialization. 
1594 - **tools_strict** (<code>bool | None</code>) – Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly 1595 the schema provided in the `parameters` field of the tool definition, but this may increase latency. 1596 If set, it will override the `tools_strict` parameter set during component initialization. 1597 1598 **Returns:** 1599 1600 - <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following key: 1601 - `replies`: A list containing the generated responses as ChatMessage instances. 1602 1603 ## chat/openai_responses 1604 1605 ### OpenAIResponsesChatGenerator 1606 1607 Completes chats using OpenAI's Responses API. 1608 1609 It works with the gpt-4 and o-series models and supports streaming responses 1610 from OpenAI API. It uses [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage) 1611 format in input and output. 1612 1613 You can customize how the text is generated by passing parameters to the 1614 OpenAI API. Use the `**generation_kwargs` argument when you initialize 1615 the component or when you run it. Any parameter that works with 1616 `openai.Responses.create` will work here too. 1617 1618 For details on OpenAI API parameters, see 1619 [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses). 
1620 1621 ### Usage example 1622 1623 ```python 1624 from haystack.components.generators.chat import OpenAIResponsesChatGenerator 1625 from haystack.dataclasses import ChatMessage 1626 1627 messages = [ChatMessage.from_user("What's Natural Language Processing?")] 1628 1629 client = OpenAIResponsesChatGenerator(generation_kwargs={"reasoning": {"effort": "low", "summary": "auto"}}) 1630 response = client.run(messages) 1631 print(response) 1632 ``` 1633 1634 #### SUPPORTED_MODELS 1635 1636 ```python 1637 SUPPORTED_MODELS: list[str] = [ 1638 "gpt-5-mini", 1639 "gpt-5-nano", 1640 "gpt-5", 1641 "gpt-5.1", 1642 "gpt-5.2", 1643 "gpt-5.2-pro", 1644 "gpt-5.4", 1645 "gpt-5-pro", 1646 "gpt-4.1", 1647 "gpt-4.1-mini", 1648 "gpt-4.1-nano", 1649 "gpt-4o", 1650 "gpt-4o-mini", 1651 "o1", 1652 "o1-mini", 1653 "o1-pro", 1654 "o3", 1655 "o3-mini", 1656 "o3-pro", 1657 "o4-mini", 1658 ] 1659 1660 ``` 1661 1662 A non-exhaustive list of chat models supported by this component. 1663 See https://platform.openai.com/docs/models for the full list and snapshot IDs. 1664 1665 #### __init__ 1666 1667 ```python 1668 __init__( 1669 *, 1670 api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"), 1671 model: str = "gpt-5-mini", 1672 streaming_callback: StreamingCallbackT | None = None, 1673 api_base_url: str | None = None, 1674 organization: str | None = None, 1675 generation_kwargs: dict[str, Any] | None = None, 1676 timeout: float | None = None, 1677 max_retries: int | None = None, 1678 tools: ToolsType | list[dict] | None = None, 1679 tools_strict: bool = False, 1680 http_client_kwargs: dict[str, Any] | None = None 1681 ) -> None 1682 ``` 1683 1684 Creates an instance of OpenAIResponsesChatGenerator. Uses OpenAI's gpt-5-mini by default. 1685 1686 Before initializing the component, you can set the 'OPENAI_TIMEOUT' and 'OPENAI_MAX_RETRIES' 1687 environment variables to override the `timeout` and `max_retries` parameters respectively 1688 in the OpenAI client. 
**Parameters:**

- **api_key** (<code>Secret</code>) – The OpenAI API key.
You can set it with the `OPENAI_API_KEY` environment variable, or pass it with this parameter
during initialization.
- **model** (<code>str</code>) – The name of the model to use.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
The callback function accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
as an argument.
- **api_base_url** (<code>str | None</code>) – An optional base URL.
- **organization** (<code>str | None</code>) – Your organization ID, defaults to `None`. See
[production best practices](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model. These parameters are sent
directly to the OpenAI endpoint.
See OpenAI [documentation](https://platform.openai.com/docs/api-reference/responses) for
more details.
Some of the supported parameters:
- `temperature`: What sampling temperature to use. Higher values like 0.8 will make the output more random,
while lower values like 0.2 will make it more focused and deterministic.
- `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
comprising the top 10% probability mass are considered.
- `previous_response_id`: The ID of the previous response.
Use this to create multi-turn conversations.
- `text_format`: A Pydantic model that enforces the structure of the model's response.
If provided, the output will always be validated against this
format (unless the model returns a tool call).
For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
- `text`: A JSON schema that enforces the structure of the model's response.
If provided, the output will always be validated against this
format (unless the model returns a tool call).
Notes:
- Both JSON Schema and Pydantic models are supported for the latest models starting from GPT-4o.
- If both are provided, `text_format` takes precedence and the JSON schema passed to `text` is ignored.
- Currently, this component doesn't support streaming for structured outputs.
- Older models only support a basic version of structured outputs through `{"type": "json_object"}`.
For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
- `reasoning`: A dictionary of parameters for reasoning. For example:
  - `summary`: The summary of the reasoning.
  - `effort`: The level of effort to put into the reasoning. Can be `low`, `medium`, or `high`.
  - `generate_summary`: Whether to generate a summary of the reasoning.
Note: OpenAI does not return the reasoning tokens, but you can view the summary if it's enabled.
For details, see the [OpenAI Reasoning documentation](https://platform.openai.com/docs/guides/reasoning).
- **timeout** (<code>float | None</code>) – Timeout for OpenAI client calls. If not set, it defaults to either the
`OPENAI_TIMEOUT` environment variable, or 30 seconds.
- **max_retries** (<code>int | None</code>) – Maximum number of retries to contact OpenAI after an internal error.
If not set, it defaults to either the `OPENAI_MAX_RETRIES` environment variable, or 5.
- **tools** (<code>ToolsType | list\[dict\] | None</code>) – The tools that the model can use to prepare calls.
This parameter accepts either a mixed list of Haystack `Tool` objects and Haystack `Toolset`s, or
OpenAI/MCP tool definitions as dictionaries.
Note: You cannot pass OpenAI/MCP tools and Haystack tools together.
For details on tool support, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create#responses-create-tools).
- **tools_strict** (<code>bool</code>) – Whether to enable strict schema adherence for tool calls. If set to `False`, the model may not exactly
follow the schema provided in the `parameters` field of the tool definition. In the Responses API, tool calls
are strict by default.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).

#### warm_up

```python
warm_up() -> None
```

Warm up the OpenAI responses chat generator.

This will warm up the tools registered in the chat generator.
This method is idempotent and will only warm up the tools once.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – The serialized component as a dictionary.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> OpenAIResponsesChatGenerator
```

Deserialize this component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.

**Returns:**

- <code>OpenAIResponsesChatGenerator</code> – The deserialized component instance.
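Generation parameters passed at run time override those set at initialization. A minimal sketch of that precedence, modeled here as a plain shallow dictionary merge (an assumption about the component's internal merge strategy, shown without any API call):

```python
# Hedged sketch: run-time generation_kwargs take precedence over
# init-time generation_kwargs. The shallow-merge behavior below is an
# illustrative assumption, not the component's verbatim implementation.
init_kwargs = {"reasoning": {"effort": "low", "summary": "auto"}, "temperature": 0.2}
run_kwargs = {"temperature": 0.7}

# Later keys win: run-time values replace init-time values key by key.
merged = {**(init_kwargs or {}), **(run_kwargs or {})}
print(merged)
# {'reasoning': {'effort': 'low', 'summary': 'auto'}, 'temperature': 0.7}
```

In practice, this means you can set defaults once when constructing the component and adjust individual parameters per call.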
#### run

```python
run(
    messages: list[ChatMessage],
    *,
    streaming_callback: StreamingCallbackT | None = None,
    generation_kwargs: dict[str, Any] | None = None,
    tools: ToolsType | list[dict] | None = None,
    tools_strict: bool | None = None
) -> dict[str, list[ChatMessage]]
```

Invokes response generation based on the provided messages and generation parameters.

**Parameters:**

- **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage instances representing the input messages.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. These parameters will
override the parameters passed during component initialization.
For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create).
- **tools** (<code>ToolsType | list\[dict\] | None</code>) – The tools that the model can use to prepare calls. If set, it will override the
`tools` parameter set during component initialization. This parameter accepts either a mixed list of
Haystack `Tool` objects and Haystack `Toolset`s, or OpenAI/MCP tool definitions as dictionaries.
Note: You cannot pass OpenAI/MCP tools and Haystack tools together.
For details on tool support, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create#responses-create-tools).
- **tools_strict** (<code>bool | None</code>) – Whether to enable strict schema adherence for tool calls. If set to `False`, the model may not exactly
follow the schema provided in the `parameters` field of the tool definition. In the Responses API, tool calls
are strict by default.
If set, it will override the `tools_strict` parameter set during component initialization.

**Returns:**

- <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following key:
  - `replies`: A list containing the generated responses as ChatMessage instances.

#### run_async

```python
run_async(
    messages: list[ChatMessage],
    *,
    streaming_callback: StreamingCallbackT | None = None,
    generation_kwargs: dict[str, Any] | None = None,
    tools: ToolsType | list[dict] | None = None,
    tools_strict: bool | None = None
) -> dict[str, list[ChatMessage]]
```

Asynchronously invokes response generation based on the provided messages and generation parameters.

This is the asynchronous version of the `run` method. It has the same parameters and return values
but can be used with `await` in async code.

**Parameters:**

- **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage instances representing the input messages.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
Must be a coroutine.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. These parameters will
override the parameters passed during component initialization.
For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create).
- **tools** (<code>ToolsType | list\[dict\] | None</code>) – A list of tools or a Toolset for which the model can prepare calls. If set, it will override the
`tools` parameter set during component initialization. This parameter accepts either a mixed list of
Haystack `Tool` objects and Haystack `Toolset`s.
Or you can pass a dictionary of 1854 OpenAI/MCP tool definitions. 1855 Note: You cannot pass OpenAI/MCP tools and Haystack tools together. 1856 - **tools_strict** (<code>bool | None</code>) – Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly 1857 the schema provided in the `parameters` field of the tool definition, but this may increase latency. 1858 If set, it will override the `tools_strict` parameter set during component initialization. 1859 1860 **Returns:** 1861 1862 - <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following key: 1863 - `replies`: A list containing the generated responses as ChatMessage instances. 1864 1865 ## hugging_face_api 1866 1867 ### HuggingFaceAPIGenerator 1868 1869 Generates text using Hugging Face APIs. 1870 1871 Use it with the following Hugging Face APIs: 1872 1873 - [Paid Inference Endpoints](https://huggingface.co/inference-endpoints) 1874 - [Self-hosted Text Generation Inference](https://github.com/huggingface/text-generation-inference) 1875 1876 **Note:** As of July 2025, the Hugging Face Inference API no longer offers generative models through the 1877 `text_generation` endpoint. Generative models are now only available through providers supporting the 1878 `chat_completion` endpoint. As a result, this component might no longer work with the Hugging Face Inference API. 1879 Use the `HuggingFaceAPIChatGenerator` component, which supports the `chat_completion` endpoint. 
### Usage examples

#### With Hugging Face Inference Endpoints

<!-- test-ignore -->

```python
from haystack.components.generators import HuggingFaceAPIGenerator
from haystack.utils import Secret

generator = HuggingFaceAPIGenerator(api_type="inference_endpoints",
                                    api_params={"url": "<your-inference-endpoint-url>"},
                                    token=Secret.from_token("<your-api-key>"))

result = generator.run(prompt="What's Natural Language Processing?")
print(result)
```

#### With self-hosted text generation inference

<!-- test-ignore -->

```python
from haystack.components.generators import HuggingFaceAPIGenerator

generator = HuggingFaceAPIGenerator(api_type="text_generation_inference",
                                    api_params={"url": "http://localhost:8080"})

result = generator.run(prompt="What's Natural Language Processing?")
print(result)
```

#### With the free serverless inference API

Be aware that this example might not work, as the Hugging Face Inference API no longer offers models that support the
`text_generation` endpoint. Use the `HuggingFaceAPIChatGenerator` for generative models through the
`chat_completion` endpoint.
1918 1919 <!-- test-ignore --> 1920 1921 ```python 1922 from haystack.components.generators import HuggingFaceAPIGenerator 1923 from haystack.utils import Secret 1924 1925 generator = HuggingFaceAPIGenerator(api_type="serverless_inference_api", 1926 api_params={"model": "HuggingFaceH4/zephyr-7b-beta"}, 1927 token=Secret.from_token("<your-api-key>")) 1928 1929 result = generator.run(prompt="What's Natural Language Processing?") 1930 print(result) 1931 ``` 1932 1933 #### __init__ 1934 1935 ```python 1936 __init__( 1937 api_type: HFGenerationAPIType | str, 1938 api_params: dict[str, str], 1939 token: Secret | None = Secret.from_env_var( 1940 ["HF_API_TOKEN", "HF_TOKEN"], strict=False 1941 ), 1942 generation_kwargs: dict[str, Any] | None = None, 1943 stop_words: list[str] | None = None, 1944 streaming_callback: StreamingCallbackT | None = None, 1945 ) -> None 1946 ``` 1947 1948 Initialize the HuggingFaceAPIGenerator instance. 1949 1950 **Parameters:** 1951 1952 - **api_type** (<code>HFGenerationAPIType | str</code>) – The type of Hugging Face API to use. Available types: 1953 - `text_generation_inference`: See [TGI](https://github.com/huggingface/text-generation-inference). 1954 - `inference_endpoints`: See [Inference Endpoints](https://huggingface.co/inference-endpoints). 1955 - `serverless_inference_api`: See [Serverless Inference API](https://huggingface.co/inference-api). 1956 This might no longer work due to changes in the models offered in the Hugging Face Inference API. 1957 Please use the `HuggingFaceAPIChatGenerator` component instead. 1958 - **api_params** (<code>dict\[str, str\]</code>) – A dictionary with the following keys: 1959 - `model`: Hugging Face model ID. Required when `api_type` is `SERVERLESS_INFERENCE_API`. 1960 - `url`: URL of the inference endpoint. Required when `api_type` is `INFERENCE_ENDPOINTS` or 1961 `TEXT_GENERATION_INFERENCE`. 1962 - Other parameters specific to the chosen API type, such as `timeout`, `headers`, `provider` etc. 
- **token** (<code>Secret | None</code>) – The Hugging Face token to use as HTTP bearer authorization.
Check your HF token in your [account settings](https://huggingface.co/settings/tokens).
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary with keyword arguments to customize text generation. Some examples: `max_new_tokens`,
`temperature`, `top_k`, `top_p`.
For details, see the [Hugging Face documentation](https://huggingface.co/docs/huggingface_hub/en/package_reference/inference_client#huggingface_hub.InferenceClient.text_generation).
- **stop_words** (<code>list\[str\] | None</code>) – An optional list of strings representing the stop words.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – A dictionary containing the serialized component.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> HuggingFaceAPIGenerator
```

Deserialize this component from a dictionary.

#### run

```python
run(
    prompt: str,
    streaming_callback: StreamingCallbackT | None = None,
    generation_kwargs: dict[str, Any] | None = None,
) -> dict[str, Any]
```

Invoke the text generation inference for the given prompt and generation parameters.

**Parameters:**

- **prompt** (<code>str</code>) – A string representing the prompt.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation.
2009 2010 **Returns:** 2011 2012 - <code>dict\[str, Any\]</code> – A dictionary with the generated replies and metadata. Both are lists of length n. 2013 - replies: A list of strings representing the generated replies. 2014 2015 ## hugging_face_local 2016 2017 ### HuggingFaceLocalGenerator 2018 2019 Generates text using models from Hugging Face that run locally. 2020 2021 LLMs running locally may need powerful hardware. 2022 2023 ### Usage example 2024 2025 ```python 2026 from haystack.components.generators import HuggingFaceLocalGenerator 2027 2028 generator = HuggingFaceLocalGenerator( 2029 model="Qwen/Qwen3-0.6B", 2030 task="text-generation", 2031 generation_kwargs={"max_new_tokens": 100, "temperature": 0.9} 2032 ) 2033 2034 print(generator.run("Who is the best American actor?")) 2035 # >> {'replies': ['John Cusack']} 2036 ``` 2037 2038 #### __init__ 2039 2040 ```python 2041 __init__( 2042 model: str = "Qwen/Qwen3-0.6B", 2043 task: Literal["text-generation", "text2text-generation"] | None = None, 2044 device: ComponentDevice | None = None, 2045 token: Secret | None = Secret.from_env_var( 2046 ["HF_API_TOKEN", "HF_TOKEN"], strict=False 2047 ), 2048 generation_kwargs: dict[str, Any] | None = None, 2049 huggingface_pipeline_kwargs: dict[str, Any] | None = None, 2050 stop_words: list[str] | None = None, 2051 streaming_callback: StreamingCallbackT | None = None, 2052 ) -> None 2053 ``` 2054 2055 Creates an instance of a HuggingFaceLocalGenerator. 2056 2057 **Parameters:** 2058 2059 - **model** (<code>str</code>) – The Hugging Face text generation model name or path. 2060 - **task** (<code>Literal['text-generation', 'text2text-generation'] | None</code>) – The task for the Hugging Face pipeline. Possible options: 2061 - `text-generation`: Supported by decoder models, like GPT. 2062 - `text2text-generation`: Deprecated as of Transformers v5; use `text-generation` instead. 2063 Previously supported by encoder–decoder models such as T5. 
2064 If the task is specified in `huggingface_pipeline_kwargs`, this parameter is ignored. 2065 If not specified, the component calls the Hugging Face API to infer the task from the model name. 2066 - **device** (<code>ComponentDevice | None</code>) – The device for loading the model. If `None`, automatically selects the default device. 2067 If a device or device map is specified in `huggingface_pipeline_kwargs`, it overrides this parameter. 2068 - **token** (<code>Secret | None</code>) – The token to use as HTTP bearer authorization for remote files. 2069 If the token is specified in `huggingface_pipeline_kwargs`, this parameter is ignored. 2070 - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary with keyword arguments to customize text generation. 2071 Some examples: `max_length`, `max_new_tokens`, `temperature`, `top_k`, `top_p`. 2072 See Hugging Face's documentation for more information: 2073 - [customize-text-generation](https://huggingface.co/docs/transformers/main/en/generation_strategies#customize-text-generation) 2074 - [transformers.GenerationConfig](https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.GenerationConfig) 2075 - **huggingface_pipeline_kwargs** (<code>dict\[str, Any\] | None</code>) – Dictionary with keyword arguments to initialize the 2076 Hugging Face pipeline for text generation. 2077 These keyword arguments provide fine-grained control over the Hugging Face pipeline. 2078 In case of duplication, these kwargs override `model`, `task`, `device`, and `token` init parameters. 2079 For available kwargs, see [Hugging Face documentation](https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.pipeline.task). 
2080 In this dictionary, you can also include `model_kwargs` to specify the kwargs for model initialization: 2081 [transformers.PreTrainedModel.from_pretrained](https://huggingface.co/docs/transformers/en/main_classes/model#transformers.PreTrainedModel.from_pretrained) 2082 - **stop_words** (<code>list\[str\] | None</code>) – If the model generates a stop word, the generation stops. 2083 If you provide this parameter, don't specify the `stopping_criteria` in `generation_kwargs`. 2084 For some chat models, the output includes both the new text and the original prompt. 2085 In these cases, make sure your prompt has no stop words. 2086 - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses. 2087 2088 #### warm_up 2089 2090 ```python 2091 warm_up() -> None 2092 ``` 2093 2094 Initializes the component. 2095 2096 #### to_dict 2097 2098 ```python 2099 to_dict() -> dict[str, Any] 2100 ``` 2101 2102 Serializes the component to a dictionary. 2103 2104 **Returns:** 2105 2106 - <code>dict\[str, Any\]</code> – Dictionary with serialized data. 2107 2108 #### from_dict 2109 2110 ```python 2111 from_dict(data: dict[str, Any]) -> HuggingFaceLocalGenerator 2112 ``` 2113 2114 Deserializes the component from a dictionary. 2115 2116 **Parameters:** 2117 2118 - **data** (<code>dict\[str, Any\]</code>) – The dictionary to deserialize from. 2119 2120 **Returns:** 2121 2122 - <code>HuggingFaceLocalGenerator</code> – The deserialized component. 2123 2124 #### run 2125 2126 ```python 2127 run( 2128 prompt: str, 2129 streaming_callback: StreamingCallbackT | None = None, 2130 generation_kwargs: dict[str, Any] | None = None, 2131 ) -> dict[str, Any] 2132 ``` 2133 2134 Run the text generation model on the given prompt. 2135 2136 **Parameters:** 2137 2138 - **prompt** (<code>str</code>) – A string representing the prompt. 
2139 - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream. 2140 - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. 2141 2142 **Returns:** 2143 2144 - <code>dict\[str, Any\]</code> – A dictionary containing the generated replies. 2145 - replies: A list of strings representing the generated replies. 2146 2147 ## openai 2148 2149 ### OpenAIGenerator 2150 2151 Generates text using OpenAI's large language models (LLMs). 2152 2153 It works with the gpt-4 and gpt-5 series models and supports streaming responses 2154 from OpenAI API. It uses strings as input and output. 2155 2156 You can customize how the text is generated by passing parameters to the 2157 OpenAI API. Use the `**generation_kwargs` argument when you initialize 2158 the component or when you run it. Any parameter that works with 2159 `openai.ChatCompletion.create` will work here too. 2160 2161 For details on OpenAI API parameters, see 2162 [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat). 2163 2164 ### Usage example 2165 2166 ```python 2167 from haystack.components.generators import OpenAIGenerator 2168 client = OpenAIGenerator() 2169 response = client.run("What's Natural Language Processing? Be brief.") 2170 print(response) 2171 2172 # >> {'replies': ['Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on 2173 # >> the interaction between computers and human language. 
# >> It involves enabling computers to understand, interpret,
# >> and respond to natural human language in a way that is both meaningful and useful.'], 'meta': [{'model':
# >> 'gpt-5-mini', 'index': 0, 'finish_reason': 'stop', 'usage': {'prompt_tokens': 16,
# >> 'completion_tokens': 49, 'total_tokens': 65}}]}
```

#### __init__

```python
__init__(
    api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
    model: str = "gpt-5-mini",
    streaming_callback: StreamingCallbackT | None = None,
    api_base_url: str | None = None,
    organization: str | None = None,
    system_prompt: str | None = None,
    generation_kwargs: dict[str, Any] | None = None,
    timeout: float | None = None,
    max_retries: int | None = None,
    http_client_kwargs: dict[str, Any] | None = None,
) -> None
```

Creates an instance of OpenAIGenerator. Unless specified otherwise in `model`, uses OpenAI's gpt-5-mini.

By setting the `OPENAI_TIMEOUT` and `OPENAI_MAX_RETRIES` environment variables, you can change the timeout and max_retries parameters in the OpenAI client.

**Parameters:**

- **api_key** (<code>Secret</code>) – The OpenAI API key to connect to OpenAI.
- **model** (<code>str</code>) – The name of the model to use.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream. The callback function accepts StreamingChunk as an argument.
- **api_base_url** (<code>str | None</code>) – An optional base URL.
- **organization** (<code>str | None</code>) – The Organization ID, defaults to `None`.
- **system_prompt** (<code>str | None</code>) – The system prompt to use for text generation. If not provided, the system prompt is omitted, and the default system prompt of the model is used.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model. These parameters are all sent directly to the OpenAI endpoint. See the OpenAI [documentation](https://platform.openai.com/docs/api-reference/chat) for more details. Some of the supported parameters:
  - `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
  - `temperature`: The sampling temperature to use. Higher values mean the model takes more risks. Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
  - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens comprising the top 10% probability mass are considered.
  - `n`: How many completions to generate for each prompt. For example, if the LLM gets 3 prompts and n is 2, it generates two completions for each of the three prompts, ending up with 6 completions in total.
  - `stop`: One or more sequences after which the LLM should stop generating tokens.
  - `presence_penalty`: The penalty to apply if a token is already present in the text at all. Higher values make the model less likely to repeat the same token.
  - `frequency_penalty`: The penalty to apply based on how often a token has already been generated in the text. Higher values make the model less likely to repeat the same token.
  - `logit_bias`: Adds a logit bias to specific tokens. The keys of the dictionary are tokens, and the values are the bias to add to that token.
- **timeout** (<code>float | None</code>) – Timeout for OpenAI client calls. If not set, it is inferred from the `OPENAI_TIMEOUT` environment variable or set to 30.
- **max_retries** (<code>int | None</code>) – Maximum number of retries to contact OpenAI after it returns an internal error. If not set, it is inferred from the `OPENAI_MAX_RETRIES` environment variable or set to 5.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`. For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – The serialized component as a dictionary.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> OpenAIGenerator
```

Deserialize this component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.

**Returns:**

- <code>OpenAIGenerator</code> – The deserialized component instance.

#### run

```python
run(
    prompt: str,
    system_prompt: str | None = None,
    streaming_callback: StreamingCallbackT | None = None,
    generation_kwargs: dict[str, Any] | None = None,
) -> dict[str, list[str] | list[dict[str, Any]]]
```

Invoke the text generation inference based on the provided prompt and generation parameters.

**Parameters:**

- **prompt** (<code>str</code>) – The string prompt to use for text generation.
- **system_prompt** (<code>str | None</code>) – The system prompt to use for text generation. If this runtime system prompt is omitted, the system prompt defined at initialization time, if any, is used.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. These parameters potentially override the parameters passed in the `__init__` method. For more details on the parameters supported by the OpenAI API, refer to the OpenAI [documentation](https://platform.openai.com/docs/api-reference/chat/create).

**Returns:**

- <code>dict\[str, list\[str\] | list\[dict\[str, Any\]\]\]</code> – A dictionary containing a list of strings with the generated responses and a list of dictionaries with the metadata for each response.

## openai_dalle

### DALLEImageGenerator

Generates images using OpenAI's DALL-E model.

For details on OpenAI API parameters, see
[OpenAI documentation](https://platform.openai.com/docs/api-reference/images/create).

### Usage example

```python
from haystack.components.generators import DALLEImageGenerator

image_generator = DALLEImageGenerator()
response = image_generator.run("Show me a picture of a black cat.")
print(response)
```

#### __init__

```python
__init__(
    model: str = "dall-e-3",
    quality: Literal["standard", "hd"] = "standard",
    size: Literal[
        "256x256", "512x512", "1024x1024", "1792x1024", "1024x1792"
    ] = "1024x1024",
    response_format: Literal["url", "b64_json"] = "url",
    api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
    api_base_url: str | None = None,
    organization: str | None = None,
    timeout: float | None = None,
    max_retries: int | None = None,
    http_client_kwargs: dict[str, Any] | None = None,
) -> None
```

Creates an instance of DALLEImageGenerator.
Unless specified otherwise in `model`, uses OpenAI's dall-e-3.

**Parameters:**

- **model** (<code>str</code>) – The model to use for image generation. Can be "dall-e-2" or "dall-e-3".
- **quality** (<code>Literal['standard', 'hd']</code>) – The quality of the generated image. Can be "standard" or "hd".
- **size** (<code>Literal['256x256', '512x512', '1024x1024', '1792x1024', '1024x1792']</code>) – The size of the generated images. Must be one of 256x256, 512x512, or 1024x1024 for dall-e-2. Must be one of 1024x1024, 1792x1024, or 1024x1792 for dall-e-3 models.
- **response_format** (<code>Literal['url', 'b64_json']</code>) – The format of the response. Can be "url" or "b64_json".
- **api_key** (<code>Secret</code>) – The OpenAI API key to connect to OpenAI.
- **api_base_url** (<code>str | None</code>) – An optional base URL.
- **organization** (<code>str | None</code>) – The Organization ID, defaults to `None`.
- **timeout** (<code>float | None</code>) – Timeout for OpenAI client calls. If not set, it is inferred from the `OPENAI_TIMEOUT` environment variable or set to 30.
- **max_retries** (<code>int | None</code>) – Maximum number of retries to contact OpenAI after it returns an internal error. If not set, it is inferred from the `OPENAI_MAX_RETRIES` environment variable or set to 5.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`. For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).

#### warm_up

```python
warm_up() -> None
```

Warm up the OpenAI client.
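As a quick illustration of the `size` constraint described in the parameters above, the following helper (hypothetical, not part of Haystack) encodes which sizes each DALL-E model accepts:

```python
# Valid sizes per model, as listed for the `size` parameter above.
VALID_SIZES = {
    "dall-e-2": {"256x256", "512x512", "1024x1024"},
    "dall-e-3": {"1024x1024", "1792x1024", "1024x1792"},
}

def is_valid_size(model: str, size: str) -> bool:
    """Return True if `size` is accepted by the given DALL-E model."""
    return size in VALID_SIZES.get(model, set())

print(is_valid_size("dall-e-3", "1792x1024"))  # True
print(is_valid_size("dall-e-2", "1792x1024"))  # False
```

Checking the size up front avoids a round trip to the API that would fail with a validation error.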

#### run

```python
run(
    prompt: str,
    size: (
        Literal["256x256", "512x512", "1024x1024", "1792x1024", "1024x1792"]
        | None
    ) = None,
    quality: Literal["standard", "hd"] | None = None,
    response_format: Literal["url", "b64_json"] | None = None,
) -> dict[str, Any]
```

Invokes the image generation inference based on the provided prompt and generation parameters.

**Parameters:**

- **prompt** (<code>str</code>) – The prompt to generate the image.
- **size** (<code>Literal['256x256', '512x512', '1024x1024', '1792x1024', '1024x1792'] | None</code>) – If provided, overrides the size provided during initialization.
- **quality** (<code>Literal['standard', 'hd'] | None</code>) – If provided, overrides the quality provided during initialization.
- **response_format** (<code>Literal['url', 'b64_json'] | None</code>) – If provided, overrides the response format provided during initialization.

**Returns:**

- <code>dict\[str, Any\]</code> – A dictionary containing the generated list of images and the revised prompt. Depending on the `response_format` parameter, the images can be URLs or base64-encoded JSON strings. The revised prompt is the prompt that was used to generate the image, if OpenAI made any revision to it.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – The serialized component as a dictionary.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> DALLEImageGenerator
```

Deserialize this component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.

**Returns:**

- <code>DALLEImageGenerator</code> – The deserialized component instance.

## utils

### print_streaming_chunk

```python
print_streaming_chunk(chunk: StreamingChunk) -> None
```

Callback function to handle and display streaming output chunks.

This function processes a `StreamingChunk` object by:

- Printing tool call metadata (if any), including function names and arguments, as they arrive.
- Printing tool call results when available.
- Printing the main content (e.g., text tokens) of the chunk as it is received.

The function outputs data directly to stdout and flushes output buffers to ensure immediate display during streaming.

**Parameters:**

- **chunk** (<code>StreamingChunk</code>) – A chunk of streaming data containing content and optional metadata, such as tool calls and tool results.
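
To sketch the callback contract that a streaming callback implements, here is a minimal stand-in (the `Chunk` class here is hypothetical; Haystack's real `StreamingChunk` carries additional metadata such as tool calls and results):

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    """Hypothetical stand-in for Haystack's StreamingChunk."""
    content: str

def print_chunk(chunk: Chunk) -> None:
    # Write the token without a trailing newline and flush immediately,
    # so the text appears on stdout as the stream arrives.
    print(chunk.content, end="", flush=True)

# Simulate a stream of tokens arriving one at a time.
for token in ["Natural ", "Language ", "Processing"]:
    print_chunk(Chunk(content=token))
```

A generator's `streaming_callback` parameter receives one such chunk per token, so printing with `end=""` and `flush=True` is what makes the output appear incrementally rather than all at once.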