---
title: "Generators"
id: generators-api
description: "Enables text generation using LLMs."
slug: "/generators-api"
---

## azure

### AzureOpenAIGenerator

Bases: <code>OpenAIGenerator</code>

Generates text using OpenAI's large language models (LLMs).

It works with gpt-4-type models and supports streaming responses
from the OpenAI API.

You can customize how the text is generated by passing parameters to the
OpenAI API. Use the `**generation_kwargs` argument when you initialize
the component or when you run it. Any parameter that works with
`openai.ChatCompletion.create` works here too.

For details on OpenAI API parameters, see
[OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).

### Usage example

```python
from haystack.components.generators import AzureOpenAIGenerator
from haystack.utils import Secret

client = AzureOpenAIGenerator(
    azure_endpoint="<your Azure endpoint, e.g. https://your-company.azure.openai.com/>",
    api_key=Secret.from_token("<your-api-key>"),
    azure_deployment="<a model name, e.g. gpt-4.1-mini>")
response = client.run("What's Natural Language Processing? Be brief.")
print(response)
```

```
>> {'replies': ['Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on
>> the interaction between computers and human language. It involves enabling computers to understand, interpret,
>> and respond to natural human language in a way that is both meaningful and useful.'], 'meta': [{'model':
>> 'gpt-4.1-mini', 'index': 0, 'finish_reason': 'stop', 'usage': {'prompt_tokens': 16,
>> 'completion_tokens': 49, 'total_tokens': 65}}]}
```

#### __init__

```python
__init__(
    azure_endpoint: str | None = None,
    api_version: str | None = "2024-12-01-preview",
    azure_deployment: str | None = "gpt-4.1-mini",
    api_key: Secret | None = Secret.from_env_var(
        "AZURE_OPENAI_API_KEY", strict=False
    ),
    azure_ad_token: Secret | None = Secret.from_env_var(
        "AZURE_OPENAI_AD_TOKEN", strict=False
    ),
    organization: str | None = None,
    streaming_callback: StreamingCallbackT | None = None,
    system_prompt: str | None = None,
    timeout: float | None = None,
    max_retries: int | None = None,
    http_client_kwargs: dict[str, Any] | None = None,
    generation_kwargs: dict[str, Any] | None = None,
    default_headers: dict[str, str] | None = None,
    *,
    azure_ad_token_provider: AzureADTokenProvider | None = None
)
```

Initialize the Azure OpenAI Generator.

**Parameters:**

- **azure_endpoint** (<code>str | None</code>) – The endpoint of the deployed model, for example `https://example-resource.azure.openai.com/`.
- **api_version** (<code>str | None</code>) – The version of the API to use. Defaults to `2024-12-01-preview`.
- **azure_deployment** (<code>str | None</code>) – The deployment of the model, usually the model name.
- **api_key** (<code>Secret | None</code>) – The API key to use for authentication.
- **azure_ad_token** (<code>Secret | None</code>) – [Azure Active Directory token](https://www.microsoft.com/en-us/security/business/identity-access/microsoft-entra-id).
- **organization** (<code>str | None</code>) – Your organization ID. Defaults to `None`. For help, see
  [Setting up your organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function called when a new token is received from the stream.
  It accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
  as an argument.
- **system_prompt** (<code>str | None</code>) – The system prompt to use for text generation. If not provided, the system prompt is omitted.
- **timeout** (<code>float | None</code>) – Timeout for the AzureOpenAI client. If not set, it is inferred from the
  `OPENAI_TIMEOUT` environment variable or defaults to 30.
- **max_retries** (<code>int | None</code>) – Maximum number of retries to contact AzureOpenAI after an internal error.
  If not set, it is inferred from the `OPENAI_MAX_RETRIES` environment variable or defaults to 5.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
  For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model, sent directly to
  the OpenAI endpoint. See [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat) for
  more details.
  Some of the supported parameters:
  - `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion,
    including visible output tokens and reasoning tokens.
  - `temperature`: The sampling temperature to use. Higher values mean the model takes more risks.
    Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
  - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
    considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
    comprising the top 10% probability mass are considered.
  - `n`: The number of completions to generate for each prompt. For example, with 3 prompts and n=2,
    the LLM will generate two completions per prompt, resulting in 6 completions total.
  - `stop`: One or more sequences after which the LLM should stop generating tokens.
  - `presence_penalty`: The penalty applied if a token is already present.
    Higher values make the model less likely to repeat the token.
  - `frequency_penalty`: The penalty applied if a token has already been generated.
    Higher values make the model less likely to repeat the token.
  - `logit_bias`: Adds a logit bias to specific tokens. The keys of the dictionary are tokens, and the
    values are the bias to add to that token.
- **default_headers** (<code>dict\[str, str\] | None</code>) – Default headers to use for the AzureOpenAI client.
- **azure_ad_token_provider** (<code>AzureADTokenProvider | None</code>) – A function that returns an Azure Active Directory token. It is invoked on
  every request.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – The serialized component as a dictionary.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> AzureOpenAIGenerator
```

Deserialize this component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.

**Returns:**

- <code>AzureOpenAIGenerator</code> – The deserialized component instance.

## chat/azure

### AzureOpenAIChatGenerator

Bases: <code>OpenAIChatGenerator</code>

Generates text using OpenAI's models on Azure.

It works with gpt-4-type models and supports streaming responses
from the OpenAI API. It uses [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
format in input and output.

You can customize how the text is generated by passing parameters to the
OpenAI API. Use the `**generation_kwargs` argument when you initialize
the component or when you run it. Any parameter that works with
`openai.ChatCompletion.create` works here too.

For details on OpenAI API parameters, see
[OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).

### Usage example

```python
from haystack.components.generators.chat import AzureOpenAIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret

messages = [ChatMessage.from_user("What's Natural Language Processing?")]

client = AzureOpenAIChatGenerator(
    azure_endpoint="<your Azure endpoint, e.g. https://your-company.azure.openai.com/>",
    api_key=Secret.from_token("<your-api-key>"),
    azure_deployment="<a model name, e.g. gpt-4.1-mini>")
response = client.run(messages)
print(response)
```

```
{'replies':
    [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=[TextContent(text=
    "Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on
    enabling computers to understand, interpret, and generate human language in a way that is useful.")],
    _name=None,
    _meta={'model': 'gpt-4.1-mini', 'index': 0, 'finish_reason': 'stop',
    'usage': {'prompt_tokens': 15, 'completion_tokens': 36, 'total_tokens': 51}})]
}
```

#### __init__

```python
__init__(
    azure_endpoint: str | None = None,
    api_version: str | None = "2024-12-01-preview",
    azure_deployment: str | None = "gpt-4.1-mini",
    api_key: Secret | None = Secret.from_env_var(
        "AZURE_OPENAI_API_KEY", strict=False
    ),
    azure_ad_token: Secret | None = Secret.from_env_var(
        "AZURE_OPENAI_AD_TOKEN", strict=False
    ),
    organization: str | None = None,
    streaming_callback: StreamingCallbackT | None = None,
    timeout: float | None = None,
    max_retries: int | None = None,
    generation_kwargs: dict[str, Any] | None = None,
    default_headers: dict[str, str] | None = None,
    tools: ToolsType | None = None,
    tools_strict: bool = False,
    *,
    azure_ad_token_provider: (
        AzureADTokenProvider | AsyncAzureADTokenProvider | None
    ) = None,
    http_client_kwargs: dict[str, Any] | None = None
)
```

Initialize the Azure OpenAI Chat Generator component.

**Parameters:**

- **azure_endpoint** (<code>str | None</code>) – The endpoint of the deployed model, for example `"https://example-resource.azure.openai.com/"`.
- **api_version** (<code>str | None</code>) – The version of the API to use. Defaults to `2024-12-01-preview`.
- **azure_deployment** (<code>str | None</code>) – The deployment of the model, usually the model name.
- **api_key** (<code>Secret | None</code>) – The API key to use for authentication.
- **azure_ad_token** (<code>Secret | None</code>) – [Azure Active Directory token](https://www.microsoft.com/en-us/security/business/identity-access/microsoft-entra-id).
- **organization** (<code>str | None</code>) – Your organization ID. Defaults to `None`. For help, see
  [Setting up your organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function called when a new token is received from the stream.
  It accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
  as an argument.
- **timeout** (<code>float | None</code>) – Timeout for OpenAI client calls. If not set, it defaults to the
  `OPENAI_TIMEOUT` environment variable or 30 seconds.
- **max_retries** (<code>int | None</code>) – Maximum number of retries to contact OpenAI after an internal error.
  If not set, it defaults to the `OPENAI_MAX_RETRIES` environment variable or 5.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model. These parameters are sent directly to
  the OpenAI endpoint. For details, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).
  Some of the supported parameters:
  - `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion,
    including visible output tokens and reasoning tokens.
  - `temperature`: The sampling temperature to use. Higher values mean the model takes more risks.
    Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
  - `top_p`: Nucleus sampling is an alternative to sampling with temperature, where the model considers
    tokens with a top_p probability mass. For example, 0.1 means only the tokens comprising
    the top 10% probability mass are considered.
  - `n`: The number of completions to generate for each prompt. For example, with 3 prompts and n=2,
    the LLM will generate two completions per prompt, resulting in 6 completions total.
  - `stop`: One or more sequences after which the LLM should stop generating tokens.
  - `presence_penalty`: The penalty applied if a token is already present.
    Higher values make the model less likely to repeat the token.
  - `frequency_penalty`: The penalty applied if a token has already been generated.
    Higher values make the model less likely to repeat the token.
  - `logit_bias`: Adds a logit bias to specific tokens. The keys of the dictionary are tokens, and the
    values are the bias to add to that token.
  - `response_format`: A JSON schema or a Pydantic model that enforces the structure of the model's response.
    If provided, the output will always be validated against this
    format (unless the model returns a tool call).
    For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
    Notes:
    - This parameter accepts Pydantic models and JSON schemas for the latest models, starting from GPT-4o.
      Older models only support a basic version of structured outputs through `{"type": "json_object"}`.
      For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
    - For structured outputs with streaming,
      `response_format` must be a JSON schema, not a Pydantic model.
- **default_headers** (<code>dict\[str, str\] | None</code>) – Default headers to use for the AzureOpenAI client.
- **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset, for which the model can prepare calls.
- **tools_strict** (<code>bool</code>) – Whether to enable strict schema adherence for tool calls. If set to `True`, the model follows exactly
  the schema provided in the `parameters` field of the tool definition, but this may increase latency.
- **azure_ad_token_provider** (<code>AzureADTokenProvider | AsyncAzureADTokenProvider | None</code>) – A function that returns an Azure Active Directory token. It is invoked on
  every request.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
  For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).

#### warm_up

```python
warm_up()
```

Warm up the Azure OpenAI chat generator.

This will warm up the tools registered in the chat generator.
This method is idempotent and will only warm up the tools once.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – The serialized component as a dictionary.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> AzureOpenAIChatGenerator
```

Deserialize this component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.

**Returns:**

- <code>AzureOpenAIChatGenerator</code> – The deserialized component instance.

## chat/azure_responses

### AzureOpenAIResponsesChatGenerator

Bases: <code>OpenAIResponsesChatGenerator</code>

Completes chats using OpenAI's Responses API on Azure.

It works with gpt-5 and o-series models and supports streaming responses
from the OpenAI API. It uses [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
format in input and output.

You can customize how the text is generated by passing parameters to the
OpenAI API. Use the `**generation_kwargs` argument when you initialize
the component or when you run it. Any parameter that works with
`openai.Responses.create` works here too.

For details on OpenAI API parameters, see
[OpenAI documentation](https://platform.openai.com/docs/api-reference/responses).

### Usage example

```python
from haystack.components.generators.chat import AzureOpenAIResponsesChatGenerator
from haystack.dataclasses import ChatMessage

messages = [ChatMessage.from_user("What's Natural Language Processing?")]

client = AzureOpenAIResponsesChatGenerator(
    azure_endpoint="https://example-resource.azure.openai.com/",
    generation_kwargs={"reasoning": {"effort": "low", "summary": "auto"}}
)
response = client.run(messages)
print(response)
```

#### SUPPORTED_MODELS

```python
SUPPORTED_MODELS: list[str] = [
    "gpt-5.4-pro",
    "gpt-5.4",
    "gpt-5.3-chat",
    "gpt-5.3-codex",
    "gpt-5.2-codex",
    "gpt-5.2",
    "gpt-5.2-chat",
    "gpt-5.1-codex-max",
    "gpt-5.1",
    "gpt-5.1-chat",
    "gpt-5.1-codex",
    "gpt-5.1-codex-mini",
    "gpt-5-pro",
    "gpt-5-codex",
    "gpt-5",
    "gpt-5-mini",
    "gpt-5-nano",
    "gpt-5-chat",
    "gpt-4o",
    "gpt-4o-mini",
    "computer-use-preview",
    "gpt-4.1",
    "gpt-4.1-nano",
    "gpt-4.1-mini",
    "gpt-image-1",
    "gpt-image-1-mini",
    "gpt-image-1.5",
    "o1",
    "o3-mini",
    "o3",
    "o4-mini",
]
```

A non-exhaustive list of chat models supported by this component.
See https://learn.microsoft.com/en-us/azure/foundry/openai/how-to/responses#model-support for the full list.
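Because the list is non-exhaustive, a simple membership check against it can only confirm known support, not rule a model out. A minimal, illustrative sketch (the abbreviated set and helper below are hypothetical, not part of the Haystack API):

```python
# Abbreviated stand-in for the SUPPORTED_MODELS list, for illustration only.
KNOWN_SUPPORTED = {"gpt-5", "gpt-5-mini", "gpt-5-nano", "gpt-4o", "o3", "o4-mini"}


def is_known_supported(azure_deployment: str) -> bool:
    """Return True if the deployment name appears in the known-supported set.

    Since the upstream list is non-exhaustive, False means "not listed",
    not necessarily "unsupported".
    """
    return azure_deployment in KNOWN_SUPPORTED


print(is_known_supported("gpt-5-mini"))     # True
print(is_known_supported("gpt-3.5-turbo"))  # False
```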
#### __init__

```python
__init__(
    *,
    api_key: (
        Secret | Callable[[], str] | Callable[[], Awaitable[str]]
    ) = Secret.from_env_var("AZURE_OPENAI_API_KEY", strict=False),
    azure_endpoint: str | None = None,
    azure_deployment: str = "gpt-5-mini",
    streaming_callback: StreamingCallbackT | None = None,
    organization: str | None = None,
    generation_kwargs: dict[str, Any] | None = None,
    timeout: float | None = None,
    max_retries: int | None = None,
    tools: ToolsType | None = None,
    tools_strict: bool = False,
    http_client_kwargs: dict[str, Any] | None = None
)
```

Initialize the AzureOpenAIResponsesChatGenerator component.

**Parameters:**

- **api_key** (<code>Secret | Callable\[[], str\] | Callable\[[], Awaitable\[str\]\]</code>) – The API key to use for authentication. Can be:
  - A `Secret` object containing the API key.
  - A `Secret` object containing the [Azure Active Directory token](https://www.microsoft.com/en-us/security/business/identity-access/microsoft-entra-id).
  - A function that returns an Azure Active Directory token.
- **azure_endpoint** (<code>str | None</code>) – The endpoint of the deployed model, for example `"https://example-resource.azure.openai.com/"`.
- **azure_deployment** (<code>str</code>) – The deployment of the model, usually the model name.
- **organization** (<code>str | None</code>) – Your organization ID. Defaults to `None`. For help, see
  [Setting up your organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function called when a new token is received from the stream.
  It accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
  as an argument.
- **timeout** (<code>float | None</code>) – Timeout for OpenAI client calls. If not set, it defaults to the
  `OPENAI_TIMEOUT` environment variable or 30 seconds.
- **max_retries** (<code>int | None</code>) – Maximum number of retries to contact OpenAI after an internal error.
  If not set, it defaults to the `OPENAI_MAX_RETRIES` environment variable or 5.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model. These parameters are sent
  directly to the OpenAI endpoint.
  See OpenAI [documentation](https://platform.openai.com/docs/api-reference/responses) for
  more details.
  Some of the supported parameters:
  - `temperature`: What sampling temperature to use. Higher values like 0.8 make the output more random,
    while lower values like 0.2 make it more focused and deterministic.
  - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
    considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
    comprising the top 10% probability mass are considered.
  - `previous_response_id`: The ID of the previous response.
    Use this to create multi-turn conversations.
  - `text_format`: A Pydantic model that enforces the structure of the model's response.
    If provided, the output will always be validated against this
    format (unless the model returns a tool call).
    For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
  - `text`: A JSON schema that enforces the structure of the model's response.
    If provided, the output will always be validated against this
    format (unless the model returns a tool call).
    Notes:
    - Both JSON schemas and Pydantic models are supported for the latest models, starting from GPT-4o.
    - If both are provided, `text_format` takes precedence and the JSON schema passed to `text` is ignored.
    - Currently, this component doesn't support streaming for structured outputs.
    - Older models only support a basic version of structured outputs through `{"type": "json_object"}`.
      For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
  - `reasoning`: A dictionary of parameters for reasoning. For example:
    - `summary`: The summary of the reasoning.
    - `effort`: The level of effort to put into the reasoning. Can be `low`, `medium`, or `high`.
    - `generate_summary`: Whether to generate a summary of the reasoning.
    Note: OpenAI does not return the reasoning tokens, but you can view the summary if it is enabled.
    For details, see the [OpenAI Reasoning documentation](https://platform.openai.com/docs/guides/reasoning).
- **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset, for which the model can prepare calls.
- **tools_strict** (<code>bool</code>) – Whether to enable strict schema adherence for tool calls. If set to `True`, the model follows exactly
  the schema provided in the `parameters` field of the tool definition, but this may increase latency.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
  For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – The serialized component as a dictionary.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> AzureOpenAIResponsesChatGenerator
```

Deserialize this component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.

**Returns:**

- <code>AzureOpenAIResponsesChatGenerator</code> – The deserialized component instance.

## chat/fallback

### FallbackChatGenerator

A chat generator wrapper that tries multiple chat generators sequentially.

It forwards all parameters transparently to the underlying chat generators and returns the first successful result.
It calls the chat generators sequentially until one succeeds, falling back on any exception raised by a generator.
If all chat generators fail, it raises a RuntimeError with details.

Timeout enforcement is fully delegated to the underlying chat generators. The fallback mechanism only
works correctly if the underlying chat generators implement proper timeout handling and raise exceptions
when timeouts occur. For predictable latency guarantees, ensure your chat generators:

- Support a `timeout` parameter in their initialization
- Implement timeout as total wall-clock time (a shared deadline for both streaming and non-streaming)
- Raise timeout exceptions (for example, TimeoutError, asyncio.TimeoutError, httpx.TimeoutException) when exceeded

Note: Most well-implemented chat generators (OpenAI, Anthropic, Cohere, and others) support timeout parameters
with consistent semantics. For HTTP-based LLM providers, a single timeout value (for example, `timeout=30`)
typically applies to all connection phases: connection setup, read, write, and pool. For streaming
responses, the read timeout is the maximum gap between chunks. For non-streaming responses, it is the time limit for
receiving the complete response.
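The sequential failover loop can be sketched in plain Python. This is an illustrative simplification, not the component's actual implementation: each "generator" here is a bare callable rather than a Haystack chat generator, and the metadata keys mirror the ones documented for `run` below.

```python
# Simplified sketch of sequential failover: try each generator in order,
# return the first success, and raise RuntimeError if every one fails.
from typing import Any, Callable


def run_with_fallback(
    generators: list[Callable[[str], str]], prompt: str
) -> dict[str, Any]:
    failed: list[str] = []
    for index, generate in enumerate(generators):
        try:
            reply = generate(prompt)
        except Exception as exc:  # any exception triggers failover
            failed.append(f"{generate.__name__}: {exc}")
            continue
        return {
            "replies": [reply],
            "meta": {
                "successful_chat_generator_index": index,
                "total_attempts": index + 1,
                "failed_chat_generators": failed,
            },
        }
    raise RuntimeError(f"All chat generators failed: {failed}")


def flaky(prompt: str) -> str:
    raise TimeoutError("deadline exceeded")


def stable(prompt: str) -> str:
    return f"echo: {prompt}"


result = run_with_fallback([flaky, stable], "hello")
print(result["replies"])  # ['echo: hello']
print(result["meta"]["successful_chat_generator_index"])  # 1
```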

Failover is automatically triggered when a generator raises any exception, including:

- Timeout errors (if the generator implements and raises them)
- Rate limit errors (429)
- Authentication errors (401)
- Context length errors (400)
- Server errors (500+)
- Any other exception

#### __init__

```python
__init__(chat_generators: list[ChatGenerator]) -> None
```

Creates an instance of FallbackChatGenerator.

**Parameters:**

- **chat_generators** (<code>list\[ChatGenerator\]</code>) – A non-empty list of chat generator components to try in order.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize the component, including nested chat generators when they support serialization.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> FallbackChatGenerator
```

Rebuild the component from a serialized representation, restoring nested chat generators.

#### warm_up

```python
warm_up() -> None
```

Warm up all underlying chat generators.

This method calls warm_up() on each underlying generator that supports it.

#### run

```python
run(
    messages: list[ChatMessage],
    generation_kwargs: dict[str, Any] | None = None,
    tools: ToolsType | None = None,
    streaming_callback: StreamingCallbackT | None = None,
) -> dict[str, list[ChatMessage] | dict[str, Any]]
```

Execute chat generators sequentially until one succeeds.

**Parameters:**

- **messages** (<code>list\[ChatMessage\]</code>) – The conversation history as a list of ChatMessage instances.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Optional parameters for the chat generator (for example, temperature, max_tokens).
- **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset, for function calling capabilities.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses.

**Returns:**

- <code>dict\[str, list\[ChatMessage\] | dict\[str, Any\]\]</code> – A dictionary with:
  - "replies": Generated ChatMessage instances from the first successful generator.
  - "meta": Execution metadata including successful_chat_generator_index, successful_chat_generator_class,
    total_attempts, failed_chat_generators, plus any metadata from the successful generator.

**Raises:**

- <code>RuntimeError</code> – If all chat generators fail.

#### run_async

```python
run_async(
    messages: list[ChatMessage],
    generation_kwargs: dict[str, Any] | None = None,
    tools: ToolsType | None = None,
    streaming_callback: StreamingCallbackT | None = None,
) -> dict[str, list[ChatMessage] | dict[str, Any]]
```

Asynchronously execute chat generators sequentially until one succeeds.

**Parameters:**

- **messages** (<code>list\[ChatMessage\]</code>) – The conversation history as a list of ChatMessage instances.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Optional parameters for the chat generator (for example, temperature, max_tokens).
- **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset, for function calling capabilities.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses.

**Returns:**

- <code>dict\[str, list\[ChatMessage\] | dict\[str, Any\]\]</code> – A dictionary with:
  - "replies": Generated ChatMessage instances from the first successful generator.
  - "meta": Execution metadata including successful_chat_generator_index, successful_chat_generator_class,
    total_attempts, failed_chat_generators, plus any metadata from the successful generator.

**Raises:**

- <code>RuntimeError</code> – If all chat generators fail.

## chat/hugging_face_api

### HuggingFaceAPIChatGenerator

Completes chats using Hugging Face APIs.

HuggingFaceAPIChatGenerator uses the [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
format for input and output. Use it to generate text with Hugging Face APIs:

- [Serverless Inference API (Inference Providers)](https://huggingface.co/docs/inference-providers)
- [Paid Inference Endpoints](https://huggingface.co/inference-endpoints)
- [Self-hosted Text Generation Inference](https://github.com/huggingface/text-generation-inference)

### Usage examples

#### With the serverless inference API (Inference Providers) - free tier available

```python
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret
from haystack.utils.hf import HFGenerationAPIType

messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
            ChatMessage.from_user("What's Natural Language Processing?")]

# the api_type can be expressed using the HFGenerationAPIType enum or as a string
api_type = HFGenerationAPIType.SERVERLESS_INFERENCE_API
api_type = "serverless_inference_api"  # this is equivalent to the above

generator = HuggingFaceAPIChatGenerator(api_type=api_type,
                                        api_params={"model": "Qwen/Qwen2.5-7B-Instruct",
                                                    "provider": "together"},
                                        token=Secret.from_token("<your-api-key>"))

result = generator.run(messages)
print(result)
```

#### With the serverless inference API (Inference Providers) and text+image input
```python
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage, ImageContent
from haystack.utils import Secret
from haystack.utils.hf import HFGenerationAPIType

# Create an image from file path, URL, or base64
image = ImageContent.from_file_path("path/to/your/image.jpg")

# Create a multimodal message with both text and image
messages = [ChatMessage.from_user(content_parts=["Describe this image in detail", image])]

generator = HuggingFaceAPIChatGenerator(
    api_type=HFGenerationAPIType.SERVERLESS_INFERENCE_API,
    api_params={
        "model": "Qwen/Qwen2.5-VL-7B-Instruct",  # Vision Language Model
        "provider": "hyperbolic"
    },
    token=Secret.from_token("<your-api-key>")
)

result = generator.run(messages)
print(result)
```

#### With paid inference endpoints

```python
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret

messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
            ChatMessage.from_user("What's Natural Language Processing?")]

generator = HuggingFaceAPIChatGenerator(api_type="inference_endpoints",
                                        api_params={"url": "<your-inference-endpoint-url>"},
                                        token=Secret.from_token("<your-api-key>"))

result = generator.run(messages)
print(result)
```

#### With self-hosted text generation inference

```python
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage

messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
            ChatMessage.from_user("What's Natural Language Processing?")]

generator = HuggingFaceAPIChatGenerator(api_type="text_generation_inference",
                                        api_params={"url": "http://localhost:8080"})

result = generator.run(messages)
print(result)
```

#### __init__

```python
__init__(
    api_type: HFGenerationAPIType | str,
    api_params: dict[str, str],
    token: Secret | None = Secret.from_env_var(
        ["HF_API_TOKEN", "HF_TOKEN"], strict=False
    ),
    generation_kwargs: dict[str, Any] | None = None,
    stop_words: list[str] | None = None,
    streaming_callback: StreamingCallbackT | None = None,
    tools: ToolsType | None = None,
)
```

Initialize the HuggingFaceAPIChatGenerator instance.

**Parameters:**

- **api_type** (<code>HFGenerationAPIType | str</code>) – The type of Hugging Face API to use. Available types:
  - `text_generation_inference`: See [TGI](https://github.com/huggingface/text-generation-inference).
  - `inference_endpoints`: See [Inference Endpoints](https://huggingface.co/inference-endpoints).
  - `serverless_inference_api`: See
    [Serverless Inference API - Inference Providers](https://huggingface.co/docs/inference-providers).
- **api_params** (<code>dict\[str, str\]</code>) – A dictionary with the following keys:
  - `model`: Hugging Face model ID. Required when `api_type` is `SERVERLESS_INFERENCE_API`.
  - `provider`: Provider name. Recommended when `api_type` is `SERVERLESS_INFERENCE_API`.
  - `url`: URL of the inference endpoint. Required when `api_type` is `INFERENCE_ENDPOINTS` or
    `TEXT_GENERATION_INFERENCE`.
  - Other parameters specific to the chosen API type, such as `timeout`, `headers`, etc.
- **token** (<code>Secret | None</code>) – The Hugging Face token to use as HTTP bearer authorization.
  Check your HF token in your [account settings](https://huggingface.co/settings/tokens).
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary with keyword arguments to customize text generation.
  Some examples: `max_tokens`, `temperature`, `top_p`.
771 For details, see [Hugging Face chat_completion documentation](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.chat_completion). 772 - **stop_words** (<code>list\[str\] | None</code>) – An optional list of strings representing the stop words. 773 - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses. 774 - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls. 775 The chosen model should support tool/function calling, according to the model card. 776 Support for tools in the Hugging Face API and TGI is not yet fully refined and you may experience 777 unexpected behavior. 778 779 #### warm_up 780 781 ```python 782 warm_up() 783 ``` 784 785 Warm up the Hugging Face API chat generator. 786 787 This will warm up the tools registered in the chat generator. 788 This method is idempotent and will only warm up the tools once. 789 790 #### to_dict 791 792 ```python 793 to_dict() -> dict[str, Any] 794 ``` 795 796 Serialize this component to a dictionary. 797 798 **Returns:** 799 800 - <code>dict\[str, Any\]</code> – A dictionary containing the serialized component. 801 802 #### from_dict 803 804 ```python 805 from_dict(data: dict[str, Any]) -> HuggingFaceAPIChatGenerator 806 ``` 807 808 Deserialize this component from a dictionary. 809 810 #### run 811 812 ```python 813 run( 814 messages: list[ChatMessage], 815 generation_kwargs: dict[str, Any] | None = None, 816 tools: ToolsType | None = None, 817 streaming_callback: StreamingCallbackT | None = None, 818 ) -> dict[str, list[ChatMessage]] 819 ``` 820 821 Invoke the text generation inference based on the provided messages and generation parameters. 822 823 **Parameters:** 824 825 - **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage objects representing the input messages. 
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation.
- **tools** (<code>ToolsType | None</code>) – A list of tools or a Toolset for which the model can prepare calls. If set, it will override
  the `tools` parameter set during component initialization. This parameter can accept either a
  list of `Tool` objects or a `Toolset` instance.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses. If set, it will override the `streaming_callback`
  parameter set during component initialization.

**Returns:**

- <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following keys:
  - `replies`: A list containing the generated responses as ChatMessage objects.

#### run_async

```python
run_async(
    messages: list[ChatMessage],
    generation_kwargs: dict[str, Any] | None = None,
    tools: ToolsType | None = None,
    streaming_callback: StreamingCallbackT | None = None,
) -> dict[str, list[ChatMessage]]
```

Asynchronously invokes the text generation inference based on the provided messages and generation parameters.

This is the asynchronous version of the `run` method. It has the same parameters
and return values but can be used with `await` in async code.

**Parameters:**

- **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage objects representing the input messages.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation.
- **tools** (<code>ToolsType | None</code>) – A list of tools or a Toolset for which the model can prepare calls. If set, it will override the `tools`
  parameter set during component initialization. This parameter can accept either a list of `Tool` objects
  or a `Toolset` instance.
861 - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses. If set, it will override the `streaming_callback` 862 parameter set during component initialization. 863 864 **Returns:** 865 866 - <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following keys: 867 - `replies`: A list containing the generated responses as ChatMessage objects. 868 869 ## chat/hugging_face_local 870 871 ### default_tool_parser 872 873 ```python 874 default_tool_parser(text: str) -> list[ToolCall] | None 875 ``` 876 877 Default implementation for parsing tool calls from model output text. 878 879 Uses DEFAULT_TOOL_PATTERN to extract tool calls. 880 881 **Parameters:** 882 883 - **text** (<code>str</code>) – The text to parse for tool calls. 884 885 **Returns:** 886 887 - <code>list\[ToolCall\] | None</code> – A list containing a single ToolCall if a valid tool call is found, None otherwise. 888 889 ### HuggingFaceLocalChatGenerator 890 891 Generates chat responses using models from Hugging Face that run locally. 892 893 Use this component with chat-based models, 894 such as `Qwen/Qwen3-0.6B` or `meta-llama/Llama-2-7b-chat-hf`. 895 LLMs running locally may need powerful hardware. 896 897 ### Usage example 898 899 ```python 900 from haystack.components.generators.chat import HuggingFaceLocalChatGenerator 901 from haystack.dataclasses import ChatMessage 902 903 generator = HuggingFaceLocalChatGenerator(model="Qwen/Qwen3-0.6B") 904 messages = [ChatMessage.from_user("What's Natural Language Processing? Be brief.")] 905 print(generator.run(messages)) 906 ``` 907 908 ``` 909 {'replies': 910 [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=[TextContent(text= 911 "Natural Language Processing (NLP) is a subfield of artificial intelligence that deals 912 with the interaction between computers and human language. 
It enables computers to understand, interpret, and
generate human language in a valuable way. NLP involves various techniques such as speech recognition, text
analysis, sentiment analysis, and machine translation. The ultimate goal is to make it easier for computers to
process and derive meaning from human language, improving communication between humans and machines.")],
_name=None,
_meta={'finish_reason': 'stop', 'index': 0, 'model':
'Qwen/Qwen3-0.6B',
'usage': {'completion_tokens': 90, 'prompt_tokens': 19, 'total_tokens': 109}})
]
}
```

#### __init__

```python
__init__(
    model: str = "Qwen/Qwen3-0.6B",
    task: Literal["text-generation", "text2text-generation"] | None = None,
    device: ComponentDevice | None = None,
    token: Secret | None = Secret.from_env_var(
        ["HF_API_TOKEN", "HF_TOKEN"], strict=False
    ),
    chat_template: str | None = None,
    generation_kwargs: dict[str, Any] | None = None,
    huggingface_pipeline_kwargs: dict[str, Any] | None = None,
    stop_words: list[str] | None = None,
    streaming_callback: StreamingCallbackT | None = None,
    tools: ToolsType | None = None,
    tool_parsing_function: Callable[[str], list[ToolCall] | None] | None = None,
    async_executor: ThreadPoolExecutor | None = None,
    *,
    enable_thinking: bool = False
) -> None
```

Initializes the HuggingFaceLocalChatGenerator component.

**Parameters:**

- **model** (<code>str</code>) – The Hugging Face text generation model name or path,
  for example, `mistralai/Mistral-7B-Instruct-v0.2` or `TheBloke/OpenHermes-2.5-Mistral-7B-16k-AWQ`.
  The model must be a chat model supporting the ChatML messaging format.
  If the model is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
- **task** (<code>Literal['text-generation', 'text2text-generation'] | None</code>) – The task for the Hugging Face pipeline.
Possible options:
  - `text-generation`: Supported by decoder models, like GPT.
  - `text2text-generation`: Deprecated as of Transformers v5; use `text-generation` instead.
    Previously supported by encoder–decoder models such as T5.

  If the task is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
  If not specified, the component calls the Hugging Face API to infer the task from the model name.
- **device** (<code>ComponentDevice | None</code>) – The device for loading the model. If `None`, automatically selects the default device.
  If a device or device map is specified in `huggingface_pipeline_kwargs`, it overrides this parameter.
- **token** (<code>Secret | None</code>) – The token to use as HTTP bearer authorization for remote files.
  If the token is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
- **chat_template** (<code>str | None</code>) – Specifies an optional Jinja template for formatting chat
  messages. Most high-quality chat models have their own templates, but for models without this
  feature or if you prefer a custom template, use this parameter.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary with keyword arguments to customize text generation.
  Some examples: `max_length`, `max_new_tokens`, `temperature`, `top_k`, `top_p`.
  See Hugging Face's documentation for more information:
  - [customize-text-generation](https://huggingface.co/docs/transformers/main/en/generation_strategies#customize-text-generation)
  - [GenerationConfig](https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.GenerationConfig)

  The only `generation_kwargs` set by default is `max_new_tokens`, which is set to 512 tokens.
- **huggingface_pipeline_kwargs** (<code>dict\[str, Any\] | None</code>) – Dictionary with keyword arguments to initialize the
  Hugging Face pipeline for text generation.
These keyword arguments provide fine-grained control over the Hugging Face pipeline.
  In case of duplication, these kwargs override the `model`, `task`, `device`, and `token` init parameters.
  For kwargs, see [Hugging Face documentation](https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.pipeline.task).
  In this dictionary, you can also include `model_kwargs` to specify the kwargs for [model initialization](https://huggingface.co/docs/transformers/en/main_classes/model#transformers.PreTrainedModel.from_pretrained).
- **stop_words** (<code>list\[str\] | None</code>) – A list of stop words. If the model generates a stop word, the generation stops.
  If you provide this parameter, don't specify the `stopping_criteria` in `generation_kwargs`.
  For some chat models, the output includes both the new text and the original prompt.
  In these cases, make sure your prompt has no stop words.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses.
- **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
- **tool_parsing_function** (<code>Callable\[\[str\], list\[ToolCall\] | None\] | None</code>) – A callable that takes a string and returns a list of ToolCall objects or None.
  If None, the default_tool_parser is used, which extracts tool calls using a predefined pattern.
- **async_executor** (<code>ThreadPoolExecutor | None</code>) – Optional ThreadPoolExecutor to use for async calls. If not provided, a single-threaded executor is
  initialized and used.
- **enable_thinking** (<code>bool</code>) – Whether to enable thinking mode in the chat template for thinking-capable models.
  When enabled, the model generates intermediate reasoning before the final response. Defaults to False.
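A custom `tool_parsing_function` must follow the `str -> list[ToolCall] | None` contract described above. Below is a minimal plain-Python sketch of such a parser, using an assumed JSON output format and plain dicts standing in for `ToolCall` objects; the real default parser extracts calls with `DEFAULT_TOOL_PATTERN` instead:

```python
import json
import re


def parse_tool_calls(text: str):
    """Illustrative parser: pull a {"name": ..., "arguments": {...}} JSON
    object out of model output. Returns a list with one call, or None if
    no valid call is found, mirroring the contract described above."""
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if not match:
        return None
    try:
        payload = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    if "name" not in payload:
        return None
    return [{"name": payload["name"], "arguments": payload.get("arguments", {})}]


print(parse_tool_calls('Sure! {"name": "get_weather", "arguments": {"city": "Rome"}}'))
```

A function with this shape could be passed as `tool_parsing_function` if your model emits this (assumed) JSON format.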
#### shutdown

```python
shutdown() -> None
```

Explicitly shuts down the executor if this component owns it.

#### warm_up

```python
warm_up() -> None
```

Initializes the component and warms up tools if provided.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> HuggingFaceLocalChatGenerator
```

Deserializes the component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary to deserialize from.

**Returns:**

- <code>HuggingFaceLocalChatGenerator</code> – The deserialized component.

#### run

```python
run(
    messages: list[ChatMessage],
    generation_kwargs: dict[str, Any] | None = None,
    streaming_callback: StreamingCallbackT | None = None,
    tools: ToolsType | None = None,
) -> dict[str, list[ChatMessage]]
```

Invoke text generation inference based on the provided messages and generation parameters.

**Parameters:**

- **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage objects representing the input messages.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses.
- **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
  If set, it will override the `tools` parameter provided during initialization.
**Returns:**

- <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following keys:
  - `replies`: A list containing the generated responses as ChatMessage instances.

#### create_message

```python
create_message(
    text: str,
    index: int,
    tokenizer: Union[PreTrainedTokenizer, PreTrainedTokenizerFast],
    prompt: str,
    generation_kwargs: dict[str, Any],
    parse_tool_calls: bool = False,
) -> ChatMessage
```

Create a ChatMessage instance from the provided text, populated with metadata.

**Parameters:**

- **text** (<code>str</code>) – The generated text.
- **index** (<code>int</code>) – The index of the generated text.
- **tokenizer** (<code>Union\[PreTrainedTokenizer, PreTrainedTokenizerFast\]</code>) – The tokenizer used for generation.
- **prompt** (<code>str</code>) – The prompt used for generation.
- **generation_kwargs** (<code>dict\[str, Any\]</code>) – The generation parameters.
- **parse_tool_calls** (<code>bool</code>) – Whether to attempt parsing tool calls from the text.

**Returns:**

- <code>ChatMessage</code> – A ChatMessage instance.

#### run_async

```python
run_async(
    messages: list[ChatMessage],
    generation_kwargs: dict[str, Any] | None = None,
    streaming_callback: StreamingCallbackT | None = None,
    tools: ToolsType | None = None,
) -> dict[str, list[ChatMessage]]
```

Asynchronously invokes text generation inference based on the provided messages and generation parameters.

This is the asynchronous version of the `run` method. It has the same parameters
and return values but can be used with `await` in async code.

**Parameters:**

- **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage objects representing the input messages.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses.
- **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
  If set, it will override the `tools` parameter provided during initialization.

**Returns:**

- <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following keys:
  - `replies`: A list containing the generated responses as ChatMessage instances.

## chat/llm

### LLM

Bases: <code>Agent</code>

A text generation component powered by a large language model.

The LLM component is a simplified version of the Agent that focuses solely on text generation
without tool usage. It processes messages and returns a single response from the language model.

### Usage examples

```python
from haystack.components.generators.chat import LLM
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage

llm = LLM(
    chat_generator=OpenAIChatGenerator(),
    system_prompt="You are a helpful summarization assistant.",
    user_prompt="""{% message role="user" %}
Summarize the following document: {{ document }}
{% endmessage %}""",
    required_variables=["document"],
)

result = llm.run(document="The weather is lovely today and the sun is shining.")
print(result["last_message"].text)
```

#### __init__

```python
__init__(
    *,
    chat_generator: ChatGenerator,
    system_prompt: str | None = None,
    user_prompt: str | None = None,
    required_variables: list[str] | Literal["*"] | None = None,
    streaming_callback: StreamingCallbackT | None = None
) -> None
```

Initialize the LLM component.

**Parameters:**

- **chat_generator** (<code>ChatGenerator</code>) – An instance of the chat generator that the LLM should use.
- **system_prompt** (<code>str | None</code>) – System prompt for the LLM.
- **user_prompt** (<code>str | None</code>) – User prompt for the LLM. If provided, this is appended to the messages provided at runtime.
- **required_variables** (<code>list\[str\] | Literal['\*'] | None</code>) – A list of variables that must be provided as input to `user_prompt`.
  If a variable listed as required is not provided, an exception is raised.
  If set to `"*"`, all variables found in the prompt are required. Optional.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback that will be invoked when a response is streamed from the LLM.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize the LLM component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> LLM
```

Deserialize the LLM from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from.

**Returns:**

- <code>LLM</code> – Deserialized LLM instance.
1204 1205 #### run 1206 1207 ```python 1208 run( 1209 messages: list[ChatMessage] | None = None, 1210 streaming_callback: StreamingCallbackT | None = None, 1211 *, 1212 generation_kwargs: dict[str, Any] | None = None, 1213 system_prompt: str | None = None, 1214 user_prompt: str | None = None, 1215 **kwargs: Any 1216 ) -> dict[str, Any] 1217 ``` 1218 1219 Process messages and generate a response from the language model. 1220 1221 **Parameters:** 1222 1223 - **messages** (<code>list\[ChatMessage\] | None</code>) – List of Haystack ChatMessage objects to process. 1224 - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback that will be invoked when a response is streamed from the LLM. 1225 - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for the underlying chat generator. These parameters 1226 will override the parameters passed during component initialization. 1227 - **system_prompt** (<code>str | None</code>) – System prompt for the LLM. If provided, it overrides the default system prompt. 1228 - **user_prompt** (<code>str | None</code>) – User prompt for the LLM. If provided, it overrides the default user prompt and is 1229 appended to the messages provided at runtime. 1230 - **kwargs** (<code>Any</code>) – Additional keyword arguments. These are used to fill template variables in the `user_prompt` 1231 (the keys must match template variable names). 1232 1233 **Returns:** 1234 1235 - <code>dict\[str, Any\]</code> – A dictionary with the following keys: 1236 - "messages": List of all messages exchanged during the LLM's run. 1237 - "last_message": The last message exchanged during the LLM's run. 
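The way extra keyword arguments fill template variables in `user_prompt` can be sketched in plain Python. `fill_user_prompt` is a hypothetical helper that reimplements only the variable substitution and the required-variable check, not the component itself:

```python
import re


def fill_user_prompt(user_prompt: str, required_variables: list[str], **kwargs) -> str:
    """Sketch of template filling: every required variable must arrive as a
    keyword argument, mirroring the behavior described for LLM.run."""
    missing = [v for v in required_variables if v not in kwargs]
    if missing:
        raise ValueError(f"Missing required template variables: {missing}")
    # Replace {{ var }} placeholders with the supplied values.
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda m: str(kwargs.get(m.group(1), m.group(0))),
        user_prompt,
    )


prompt = "Summarize the following document: {{ document }}"
print(fill_user_prompt(prompt, ["document"], document="The weather is lovely today."))
# prints: Summarize the following document: The weather is lovely today.
```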
1238 1239 #### run_async 1240 1241 ```python 1242 run_async( 1243 messages: list[ChatMessage] | None = None, 1244 streaming_callback: StreamingCallbackT | None = None, 1245 *, 1246 generation_kwargs: dict[str, Any] | None = None, 1247 system_prompt: str | None = None, 1248 user_prompt: str | None = None, 1249 **kwargs: Any 1250 ) -> dict[str, Any] 1251 ``` 1252 1253 Asynchronously process messages and generate a response from the language model. 1254 1255 **Parameters:** 1256 1257 - **messages** (<code>list\[ChatMessage\] | None</code>) – List of Haystack ChatMessage objects to process. 1258 - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An asynchronous callback that will be invoked when a response is streamed 1259 from the LLM. 1260 - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for the underlying chat generator. These parameters 1261 will override the parameters passed during component initialization. 1262 - **system_prompt** (<code>str | None</code>) – System prompt for the LLM. If provided, it overrides the default system prompt. 1263 - **user_prompt** (<code>str | None</code>) – User prompt for the LLM. If provided, it overrides the default user prompt and is 1264 appended to the messages provided at runtime. 1265 - **kwargs** (<code>Any</code>) – Additional keyword arguments. These are used to fill template variables in the `user_prompt` 1266 (the keys must match template variable names). 1267 1268 **Returns:** 1269 1270 - <code>dict\[str, Any\]</code> – A dictionary with the following keys: 1271 - "messages": List of all messages exchanged during the LLM's run. 1272 - "last_message": The last message exchanged during the LLM's run. 1273 1274 ## chat/openai 1275 1276 ### OpenAIChatGenerator 1277 1278 Completes chats using OpenAI's large language models (LLMs). 1279 1280 It works with the gpt-4 and gpt-5 series models and supports streaming responses 1281 from OpenAI API. 
It uses [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage) 1282 format in input and output. 1283 1284 You can customize how the text is generated by passing parameters to the 1285 OpenAI API. Use the `**generation_kwargs` argument when you initialize 1286 the component or when you run it. Any parameter that works with 1287 `openai.ChatCompletion.create` will work here too. 1288 1289 For details on OpenAI API parameters, see 1290 [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat). 1291 1292 ### Usage example 1293 1294 ```python 1295 from haystack.components.generators.chat import OpenAIChatGenerator 1296 from haystack.dataclasses import ChatMessage 1297 1298 messages = [ChatMessage.from_user("What's Natural Language Processing?")] 1299 1300 client = OpenAIChatGenerator() 1301 response = client.run(messages) 1302 print(response) 1303 ``` 1304 1305 Output: 1306 1307 ``` 1308 {'replies': 1309 [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content= 1310 [TextContent(text="Natural Language Processing (NLP) is a branch of artificial intelligence 1311 that focuses on enabling computers to understand, interpret, and generate human language in 1312 a way that is meaningful and useful.")], 1313 _name=None, 1314 _meta={'model': 'gpt-5-mini', 'index': 0, 'finish_reason': 'stop', 1315 'usage': {'prompt_tokens': 15, 'completion_tokens': 36, 'total_tokens': 51}}) 1316 ] 1317 } 1318 ``` 1319 1320 #### SUPPORTED_MODELS 1321 1322 ```python 1323 SUPPORTED_MODELS = [ 1324 "gpt-5-mini", 1325 "gpt-5-nano", 1326 "gpt-5", 1327 "gpt-5.1", 1328 "gpt-5.2", 1329 "gpt-5.2-pro", 1330 "gpt-5.4", 1331 "gpt-5-pro", 1332 "gpt-4.1", 1333 "gpt-4.1-mini", 1334 "gpt-4.1-nano", 1335 "gpt-4o", 1336 "gpt-4o-mini", 1337 "gpt-4-turbo", 1338 "gpt-4", 1339 "gpt-3.5-turbo", 1340 ] 1341 1342 ``` 1343 1344 A non-exhaustive list of chat models supported by this component. 1345 See https://developers.openai.com/api/docs/models for the full list and snapshot IDs. 
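Because the list is non-exhaustive and each model also has dated snapshot IDs (for example, `gpt-4o-2024-08-06`), a membership check is best treated as "known to work" rather than a hard gate. A small illustrative helper, not part of the component's API:

```python
SUPPORTED_MODELS = [
    "gpt-5-mini", "gpt-5-nano", "gpt-5", "gpt-5.1", "gpt-5.2", "gpt-5.2-pro",
    "gpt-5.4", "gpt-5-pro", "gpt-4.1", "gpt-4.1-mini", "gpt-4.1-nano",
    "gpt-4o", "gpt-4o-mini", "gpt-4-turbo", "gpt-4", "gpt-3.5-turbo",
]


def is_known_supported(model: str) -> bool:
    """Return True if the model or its base family appears in the list.

    Snapshot IDs like 'gpt-4o-2024-08-06' match via their base name.
    Because the list is non-exhaustive, False does not mean unsupported.
    """
    if model in SUPPORTED_MODELS:
        return True
    return any(model.startswith(base + "-") for base in SUPPORTED_MODELS)


print(is_known_supported("gpt-4o-2024-08-06"))  # True
print(is_known_supported("o99-preview"))        # False
```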
1346 1347 #### __init__ 1348 1349 ```python 1350 __init__( 1351 api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"), 1352 model: str = "gpt-5-mini", 1353 streaming_callback: StreamingCallbackT | None = None, 1354 api_base_url: str | None = None, 1355 organization: str | None = None, 1356 generation_kwargs: dict[str, Any] | None = None, 1357 timeout: float | None = None, 1358 max_retries: int | None = None, 1359 tools: ToolsType | None = None, 1360 tools_strict: bool = False, 1361 http_client_kwargs: dict[str, Any] | None = None, 1362 ) 1363 ``` 1364 1365 Creates an instance of OpenAIChatGenerator. Unless specified otherwise in `model`, uses OpenAI's gpt-5-mini 1366 1367 Before initializing the component, you can set the 'OPENAI_TIMEOUT' and 'OPENAI_MAX_RETRIES' 1368 environment variables to override the `timeout` and `max_retries` parameters respectively 1369 in the OpenAI client. 1370 1371 **Parameters:** 1372 1373 - **api_key** (<code>Secret</code>) – The OpenAI API key. 1374 You can set it with an environment variable `OPENAI_API_KEY`, or pass with this parameter 1375 during initialization. 1376 - **model** (<code>str</code>) – The name of the model to use. 1377 - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream. 1378 The callback function accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk) 1379 as an argument. 1380 - **api_base_url** (<code>str | None</code>) – An optional base URL. 1381 - **organization** (<code>str | None</code>) – Your organization ID, defaults to `None`. See 1382 [production best practices](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization). 1383 - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model. These parameters are sent directly to 1384 the OpenAI endpoint. 
See OpenAI [documentation](https://platform.openai.com/docs/api-reference/chat) for 1385 more details. 1386 Some of the supported parameters: 1387 - `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion, 1388 including visible output tokens and reasoning tokens. 1389 - `temperature`: What sampling temperature to use. Higher values mean the model will take more risks. 1390 Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer. 1391 - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model 1392 considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens 1393 comprising the top 10% probability mass are considered. 1394 - `n`: How many completions to generate for each prompt. For example, if the LLM gets 3 prompts and n is 2, 1395 it will generate two completions for each of the three prompts, ending up with 6 completions in total. 1396 - `stop`: One or more sequences after which the LLM should stop generating tokens. 1397 - `presence_penalty`: What penalty to apply if a token is already present at all. Bigger values mean 1398 the model will be less likely to repeat the same token in the text. 1399 - `frequency_penalty`: What penalty to apply if a token has already been generated in the text. 1400 Bigger values mean the model will be less likely to repeat the same token in the text. 1401 - `logit_bias`: Add a logit bias to specific tokens. The keys of the dictionary are tokens, and the 1402 values are the bias to add to that token. 1403 - `response_format`: A JSON schema or a Pydantic model that enforces the structure of the model's response. 1404 If provided, the output will always be validated against this 1405 format (unless the model returns a tool call). 1406 For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs). 
Notes:
  - This parameter accepts Pydantic models and JSON schemas for the latest models starting from GPT-4o.
    Older models only support a basic version of structured outputs through `{"type": "json_object"}`.
    For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
  - For structured outputs with streaming,
    the `response_format` must be a JSON schema and not a Pydantic model.
- **timeout** (<code>float | None</code>) – Timeout for OpenAI client calls. If not set, it defaults to either the
  `OPENAI_TIMEOUT` environment variable or 30 seconds.
- **max_retries** (<code>int | None</code>) – Maximum number of retries to contact OpenAI after an internal error.
  If not set, it defaults to either the `OPENAI_MAX_RETRIES` environment variable or 5.
- **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
- **tools_strict** (<code>bool</code>) – Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
  the schema provided in the `parameters` field of the tool definition, but this may increase latency.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
  For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).

#### warm_up

```python
warm_up()
```

Warm up the OpenAI chat generator.

This will warm up the tools registered in the chat generator.
This method is idempotent and will only warm up the tools once.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.
1441 1442 **Returns:** 1443 1444 - <code>dict\[str, Any\]</code> – The serialized component as a dictionary. 1445 1446 #### from_dict 1447 1448 ```python 1449 from_dict(data: dict[str, Any]) -> OpenAIChatGenerator 1450 ``` 1451 1452 Deserialize this component from a dictionary. 1453 1454 **Parameters:** 1455 1456 - **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component. 1457 1458 **Returns:** 1459 1460 - <code>OpenAIChatGenerator</code> – The deserialized component instance. 1461 1462 #### run 1463 1464 ```python 1465 run( 1466 messages: list[ChatMessage], 1467 streaming_callback: StreamingCallbackT | None = None, 1468 generation_kwargs: dict[str, Any] | None = None, 1469 *, 1470 tools: ToolsType | None = None, 1471 tools_strict: bool | None = None 1472 ) -> dict[str, list[ChatMessage]] 1473 ``` 1474 1475 Invokes chat completion based on the provided messages and generation parameters. 1476 1477 **Parameters:** 1478 1479 - **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage instances representing the input messages. 1480 - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream. 1481 - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. These parameters will 1482 override the parameters passed during component initialization. 1483 For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat/create). 1484 - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls. 1485 If set, it will override the `tools` parameter provided during initialization. 1486 - **tools_strict** (<code>bool | None</code>) – Whether to enable strict schema adherence for tool calls. 
If set to `True`, the model will follow exactly 1487 the schema provided in the `parameters` field of the tool definition, but this may increase latency. 1488 If set, it will override the `tools_strict` parameter set during component initialization. 1489 1490 **Returns:** 1491 1492 - <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following key: 1493 - `replies`: A list containing the generated responses as ChatMessage instances. 1494 1495 #### run_async 1496 1497 ```python 1498 run_async( 1499 messages: list[ChatMessage], 1500 streaming_callback: StreamingCallbackT | None = None, 1501 generation_kwargs: dict[str, Any] | None = None, 1502 *, 1503 tools: ToolsType | None = None, 1504 tools_strict: bool | None = None 1505 ) -> dict[str, list[ChatMessage]] 1506 ``` 1507 1508 Asynchronously invokes chat completion based on the provided messages and generation parameters. 1509 1510 This is the asynchronous version of the `run` method. It has the same parameters and return values 1511 but can be used with `await` in async code. 1512 1513 **Parameters:** 1514 1515 - **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage instances representing the input messages. 1516 - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream. 1517 Must be a coroutine. 1518 - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. These parameters will 1519 override the parameters passed during component initialization. 1520 For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat/create). 1521 - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls. 1522 If set, it will override the `tools` parameter provided during initialization. 
1523 - **tools_strict** (<code>bool | None</code>) – Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly 1524 the schema provided in the `parameters` field of the tool definition, but this may increase latency. 1525 If set, it will override the `tools_strict` parameter set during component initialization. 1526 1527 **Returns:** 1528 1529 - <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following key: 1530 - `replies`: A list containing the generated responses as ChatMessage instances. 1531 1532 ## chat/openai_responses 1533 1534 ### OpenAIResponsesChatGenerator 1535 1536 Completes chats using OpenAI's Responses API. 1537 1538 It works with the gpt-4 and o-series models and supports streaming responses 1539 from OpenAI API. It uses [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage) 1540 format in input and output. 1541 1542 You can customize how the text is generated by passing parameters to the 1543 OpenAI API. Use the `**generation_kwargs` argument when you initialize 1544 the component or when you run it. Any parameter that works with 1545 `openai.Responses.create` will work here too. 1546 1547 For details on OpenAI API parameters, see 1548 [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses). 
1549 1550 ### Usage example 1551 1552 ```python 1553 from haystack.components.generators.chat import OpenAIResponsesChatGenerator 1554 from haystack.dataclasses import ChatMessage 1555 1556 messages = [ChatMessage.from_user("What's Natural Language Processing?")] 1557 1558 client = OpenAIResponsesChatGenerator(generation_kwargs={"reasoning": {"effort": "low", "summary": "auto"}}) 1559 response = client.run(messages) 1560 print(response) 1561 ``` 1562 1563 #### __init__ 1564 1565 ```python 1566 __init__( 1567 *, 1568 api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"), 1569 model: str = "gpt-5-mini", 1570 streaming_callback: StreamingCallbackT | None = None, 1571 api_base_url: str | None = None, 1572 organization: str | None = None, 1573 generation_kwargs: dict[str, Any] | None = None, 1574 timeout: float | None = None, 1575 max_retries: int | None = None, 1576 tools: ToolsType | list[dict] | None = None, 1577 tools_strict: bool = False, 1578 http_client_kwargs: dict[str, Any] | None = None 1579 ) 1580 ``` 1581 1582 Creates an instance of OpenAIResponsesChatGenerator. Uses OpenAI's gpt-5-mini by default. 1583 1584 Before initializing the component, you can set the 'OPENAI_TIMEOUT' and 'OPENAI_MAX_RETRIES' 1585 environment variables to override the `timeout` and `max_retries` parameters respectively 1586 in the OpenAI client. 1587 1588 **Parameters:** 1589 1590 - **api_key** (<code>Secret</code>) – The OpenAI API key. 1591 You can set it with an environment variable `OPENAI_API_KEY`, or pass with this parameter 1592 during initialization. 1593 - **model** (<code>str</code>) – The name of the model to use. 1594 - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream. 1595 The callback function accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk) 1596 as an argument. 
- **api_base_url** (<code>str | None</code>) – An optional base URL.
- **organization** (<code>str | None</code>) – Your organization ID, defaults to `None`. See
[production best practices](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model. These parameters are sent
directly to the OpenAI endpoint.
See OpenAI [documentation](https://platform.openai.com/docs/api-reference/responses) for
more details.
Some of the supported parameters:
- `temperature`: What sampling temperature to use. Higher values like 0.8 will make the output more random,
while lower values like 0.2 will make it more focused and deterministic.
- `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
comprising the top 10% probability mass are considered.
- `previous_response_id`: The ID of the previous response.
Use this to create multi-turn conversations.
- `text_format`: A Pydantic model that enforces the structure of the model's response.
If provided, the output will always be validated against this
format (unless the model returns a tool call).
For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
- `text`: A JSON schema that enforces the structure of the model's response.
If provided, the output will always be validated against this
format (unless the model returns a tool call).
Notes:
- Both JSON Schema and Pydantic models are supported for the latest models starting from GPT-4o.
- If both are provided, `text_format` takes precedence and the JSON schema passed to `text` is ignored.
- Currently, this component doesn't support streaming for structured outputs.
- Older models only support a basic version of structured outputs through `{"type": "json_object"}`.
For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
- `reasoning`: A dictionary of parameters for reasoning. For example:
- `summary`: The summary of the reasoning.
- `effort`: The level of effort to put into the reasoning. Can be `low`, `medium`, or `high`.
- `generate_summary`: Whether to generate a summary of the reasoning.
Note: OpenAI does not return the reasoning tokens, but you can view the summary if it is enabled.
For details, see the [OpenAI Reasoning documentation](https://platform.openai.com/docs/guides/reasoning).
- **timeout** (<code>float | None</code>) – Timeout for OpenAI client calls. If not set, it defaults to either the
`OPENAI_TIMEOUT` environment variable, or 30 seconds.
- **max_retries** (<code>int | None</code>) – Maximum number of retries to contact OpenAI after an internal error.
If not set, it defaults to either the `OPENAI_MAX_RETRIES` environment variable, or 5.
- **tools** (<code>ToolsType | list\[dict\] | None</code>) – The tools that the model can use to prepare calls. This parameter accepts either a
mixed list of Haystack `Tool` objects and Haystack `Toolset`s, or a list of dictionaries with
OpenAI/MCP tool definitions.
Note: You cannot pass OpenAI/MCP tools and Haystack tools together.
For details on tool support, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create#responses-create-tools).
- **tools_strict** (<code>bool</code>) – Whether to enable strict schema adherence for tool calls. If set to `False`, the model may not exactly
follow the schema provided in the `parameters` field of the tool definition.
In the Responses API, tool calls
are strict by default.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).

#### warm_up

```python
warm_up()
```

Warm up the OpenAI responses chat generator.

This will warm up the tools registered in the chat generator.
This method is idempotent and will only warm up the tools once.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – The serialized component as a dictionary.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> OpenAIResponsesChatGenerator
```

Deserialize this component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.

**Returns:**

- <code>OpenAIResponsesChatGenerator</code> – The deserialized component instance.

#### run

```python
run(
    messages: list[ChatMessage],
    *,
    streaming_callback: StreamingCallbackT | None = None,
    generation_kwargs: dict[str, Any] | None = None,
    tools: ToolsType | list[dict] | None = None,
    tools_strict: bool | None = None
) -> dict[str, list[ChatMessage]]
```

Invokes response generation based on the provided messages and generation parameters.

**Parameters:**

- **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage instances representing the input messages.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. These parameters will
override the parameters passed during component initialization.
For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create).
- **tools** (<code>ToolsType | list\[dict\] | None</code>) – The tools that the model can use to prepare calls. If set, it will override the
`tools` parameter set during component initialization. This parameter accepts either a
mixed list of Haystack `Tool` objects and Haystack `Toolset`s, or a list of dictionaries with
OpenAI/MCP tool definitions.
Note: You cannot pass OpenAI/MCP tools and Haystack tools together.
For details on tool support, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create#responses-create-tools).
- **tools_strict** (<code>bool | None</code>) – Whether to enable strict schema adherence for tool calls. If set to `False`, the model may not exactly
follow the schema provided in the `parameters` field of the tool definition. In the Responses API, tool calls
are strict by default.
If set, it will override the `tools_strict` parameter set during component initialization.

**Returns:**

- <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following key:
- `replies`: A list containing the generated responses as ChatMessage instances.
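As a concrete illustration of the structured-output options described in `__init__` above, the sketch below builds a JSON schema in the wrapper shape used by OpenAI's Responses API `text` parameter. The schema contents (`city_info`, `city`, `country`) are illustrative assumptions, not part of this API reference; check the OpenAI Structured Outputs guide for the exact form your model expects.

```python
# A hedged sketch of a JSON schema for the `text` option in `generation_kwargs`.
# The wrapper shape follows OpenAI's Responses API structured-output format;
# the schema fields (city_info, city, country) are illustrative assumptions.
city_schema = {
    "format": {
        "type": "json_schema",
        "name": "city_info",
        "schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "country": {"type": "string"},
            },
            "required": ["city", "country"],
            "additionalProperties": False,
        },
    }
}

# It could then be passed at initialization or at run time, for example:
# client = OpenAIResponsesChatGenerator(generation_kwargs={"text": city_schema})
```

Keep in mind that if a Pydantic model is also passed through `text_format`, it takes precedence and the JSON schema passed to `text` is ignored.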
#### run_async

```python
run_async(
    messages: list[ChatMessage],
    *,
    streaming_callback: StreamingCallbackT | None = None,
    generation_kwargs: dict[str, Any] | None = None,
    tools: ToolsType | list[dict] | None = None,
    tools_strict: bool | None = None
) -> dict[str, list[ChatMessage]]
```

Asynchronously invokes response generation based on the provided messages and generation parameters.

This is the asynchronous version of the `run` method. It has the same parameters and return values
but can be used with `await` in async code.

**Parameters:**

- **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage instances representing the input messages.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
Must be a coroutine.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. These parameters will
override the parameters passed during component initialization.
For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create).
- **tools** (<code>ToolsType | list\[dict\] | None</code>) – A list of tools or a Toolset for which the model can prepare calls. If set, it will override the
`tools` parameter set during component initialization. This parameter accepts either a
mixed list of Haystack `Tool` objects and Haystack `Toolset`s, or a list of dictionaries with
OpenAI/MCP tool definitions.
Note: You cannot pass OpenAI/MCP tools and Haystack tools together.
- **tools_strict** (<code>bool | None</code>) – Whether to enable strict schema adherence for tool calls.
If set to `True`, the model will follow exactly 1755 the schema provided in the `parameters` field of the tool definition, but this may increase latency. 1756 If set, it will override the `tools_strict` parameter set during component initialization. 1757 1758 **Returns:** 1759 1760 - <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following key: 1761 - `replies`: A list containing the generated responses as ChatMessage instances. 1762 1763 ## hugging_face_api 1764 1765 ### HuggingFaceAPIGenerator 1766 1767 Generates text using Hugging Face APIs. 1768 1769 Use it with the following Hugging Face APIs: 1770 1771 - [Paid Inference Endpoints](https://huggingface.co/inference-endpoints) 1772 - [Self-hosted Text Generation Inference](https://github.com/huggingface/text-generation-inference) 1773 1774 **Note:** As of July 2025, the Hugging Face Inference API no longer offers generative models through the 1775 `text_generation` endpoint. Generative models are now only available through providers supporting the 1776 `chat_completion` endpoint. As a result, this component might no longer work with the Hugging Face Inference API. 1777 Use the `HuggingFaceAPIChatGenerator` component, which supports the `chat_completion` endpoint. 
### Usage examples

#### With Hugging Face Inference Endpoints

```python
from haystack.components.generators import HuggingFaceAPIGenerator
from haystack.utils import Secret

generator = HuggingFaceAPIGenerator(api_type="inference_endpoints",
                                    api_params={"url": "<your-inference-endpoint-url>"},
                                    token=Secret.from_token("<your-api-key>"))

result = generator.run(prompt="What's Natural Language Processing?")
print(result)
```

#### With self-hosted text generation inference

```python
from haystack.components.generators import HuggingFaceAPIGenerator

generator = HuggingFaceAPIGenerator(api_type="text_generation_inference",
                                    api_params={"url": "http://localhost:8080"})

result = generator.run(prompt="What's Natural Language Processing?")
print(result)
```

#### With the free serverless inference API

Be aware that this example might not work as the Hugging Face Inference API no longer offers models that support the
`text_generation` endpoint. Use the `HuggingFaceAPIChatGenerator` for generative models through the
`chat_completion` endpoint.
1812 1813 ```python 1814 from haystack.components.generators import HuggingFaceAPIGenerator 1815 from haystack.utils import Secret 1816 1817 generator = HuggingFaceAPIGenerator(api_type="serverless_inference_api", 1818 api_params={"model": "HuggingFaceH4/zephyr-7b-beta"}, 1819 token=Secret.from_token("<your-api-key>")) 1820 1821 result = generator.run(prompt="What's Natural Language Processing?") 1822 print(result) 1823 ``` 1824 1825 #### __init__ 1826 1827 ```python 1828 __init__( 1829 api_type: HFGenerationAPIType | str, 1830 api_params: dict[str, str], 1831 token: Secret | None = Secret.from_env_var( 1832 ["HF_API_TOKEN", "HF_TOKEN"], strict=False 1833 ), 1834 generation_kwargs: dict[str, Any] | None = None, 1835 stop_words: list[str] | None = None, 1836 streaming_callback: StreamingCallbackT | None = None, 1837 ) 1838 ``` 1839 1840 Initialize the HuggingFaceAPIGenerator instance. 1841 1842 **Parameters:** 1843 1844 - **api_type** (<code>HFGenerationAPIType | str</code>) – The type of Hugging Face API to use. Available types: 1845 - `text_generation_inference`: See [TGI](https://github.com/huggingface/text-generation-inference). 1846 - `inference_endpoints`: See [Inference Endpoints](https://huggingface.co/inference-endpoints). 1847 - `serverless_inference_api`: See [Serverless Inference API](https://huggingface.co/inference-api). 1848 This might no longer work due to changes in the models offered in the Hugging Face Inference API. 1849 Please use the `HuggingFaceAPIChatGenerator` component instead. 1850 - **api_params** (<code>dict\[str, str\]</code>) – A dictionary with the following keys: 1851 - `model`: Hugging Face model ID. Required when `api_type` is `SERVERLESS_INFERENCE_API`. 1852 - `url`: URL of the inference endpoint. Required when `api_type` is `INFERENCE_ENDPOINTS` or 1853 `TEXT_GENERATION_INFERENCE`. 1854 - Other parameters specific to the chosen API type, such as `timeout`, `headers`, `provider` etc. 
- **token** (<code>Secret | None</code>) – The Hugging Face token to use as HTTP bearer authorization.
Check your HF token in your [account settings](https://huggingface.co/settings/tokens).
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary with keyword arguments to customize text generation. Some examples: `max_new_tokens`,
`temperature`, `top_k`, `top_p`.
For details, see the [Hugging Face documentation](https://huggingface.co/docs/huggingface_hub/en/package_reference/inference_client#huggingface_hub.InferenceClient.text_generation).
- **stop_words** (<code>list\[str\] | None</code>) – An optional list of strings representing the stop words.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – A dictionary containing the serialized component.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> HuggingFaceAPIGenerator
```

Deserialize this component from a dictionary.

#### run

```python
run(
    prompt: str,
    streaming_callback: StreamingCallbackT | None = None,
    generation_kwargs: dict[str, Any] | None = None,
)
```

Invoke the text generation inference for the given prompt and generation parameters.

**Parameters:**

- **prompt** (<code>str</code>) – A string representing the prompt.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation.
1901 1902 **Returns:** 1903 1904 - – A dictionary with the generated replies and metadata. Both are lists of length n. 1905 - replies: A list of strings representing the generated replies. 1906 1907 ## hugging_face_local 1908 1909 ### HuggingFaceLocalGenerator 1910 1911 Generates text using models from Hugging Face that run locally. 1912 1913 LLMs running locally may need powerful hardware. 1914 1915 ### Usage example 1916 1917 ```python 1918 from haystack.components.generators import HuggingFaceLocalGenerator 1919 1920 generator = HuggingFaceLocalGenerator( 1921 model="Qwen/Qwen3-0.6B", 1922 task="text-generation", 1923 generation_kwargs={"max_new_tokens": 100, "temperature": 0.9} 1924 ) 1925 1926 print(generator.run("Who is the best American actor?")) 1927 # {'replies': ['John Cusack']} 1928 ``` 1929 1930 #### __init__ 1931 1932 ```python 1933 __init__( 1934 model: str = "Qwen/Qwen3-0.6B", 1935 task: Literal["text-generation", "text2text-generation"] | None = None, 1936 device: ComponentDevice | None = None, 1937 token: Secret | None = Secret.from_env_var( 1938 ["HF_API_TOKEN", "HF_TOKEN"], strict=False 1939 ), 1940 generation_kwargs: dict[str, Any] | None = None, 1941 huggingface_pipeline_kwargs: dict[str, Any] | None = None, 1942 stop_words: list[str] | None = None, 1943 streaming_callback: StreamingCallbackT | None = None, 1944 ) 1945 ``` 1946 1947 Creates an instance of a HuggingFaceLocalGenerator. 1948 1949 **Parameters:** 1950 1951 - **model** (<code>str</code>) – The Hugging Face text generation model name or path. 1952 - **task** (<code>Literal['text-generation', 'text2text-generation'] | None</code>) – The task for the Hugging Face pipeline. Possible options: 1953 - `text-generation`: Supported by decoder models, like GPT. 1954 - `text2text-generation`: Deprecated as of Transformers v5; use `text-generation` instead. 1955 Previously supported by encoder–decoder models such as T5. 
1956 If the task is specified in `huggingface_pipeline_kwargs`, this parameter is ignored. 1957 If not specified, the component calls the Hugging Face API to infer the task from the model name. 1958 - **device** (<code>ComponentDevice | None</code>) – The device for loading the model. If `None`, automatically selects the default device. 1959 If a device or device map is specified in `huggingface_pipeline_kwargs`, it overrides this parameter. 1960 - **token** (<code>Secret | None</code>) – The token to use as HTTP bearer authorization for remote files. 1961 If the token is specified in `huggingface_pipeline_kwargs`, this parameter is ignored. 1962 - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary with keyword arguments to customize text generation. 1963 Some examples: `max_length`, `max_new_tokens`, `temperature`, `top_k`, `top_p`. 1964 See Hugging Face's documentation for more information: 1965 - [customize-text-generation](https://huggingface.co/docs/transformers/main/en/generation_strategies#customize-text-generation) 1966 - [transformers.GenerationConfig](https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.GenerationConfig) 1967 - **huggingface_pipeline_kwargs** (<code>dict\[str, Any\] | None</code>) – Dictionary with keyword arguments to initialize the 1968 Hugging Face pipeline for text generation. 1969 These keyword arguments provide fine-grained control over the Hugging Face pipeline. 1970 In case of duplication, these kwargs override `model`, `task`, `device`, and `token` init parameters. 1971 For available kwargs, see [Hugging Face documentation](https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.pipeline.task). 
1972 In this dictionary, you can also include `model_kwargs` to specify the kwargs for model initialization: 1973 [transformers.PreTrainedModel.from_pretrained](https://huggingface.co/docs/transformers/en/main_classes/model#transformers.PreTrainedModel.from_pretrained) 1974 - **stop_words** (<code>list\[str\] | None</code>) – If the model generates a stop word, the generation stops. 1975 If you provide this parameter, don't specify the `stopping_criteria` in `generation_kwargs`. 1976 For some chat models, the output includes both the new text and the original prompt. 1977 In these cases, make sure your prompt has no stop words. 1978 - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses. 1979 1980 #### warm_up 1981 1982 ```python 1983 warm_up() 1984 ``` 1985 1986 Initializes the component. 1987 1988 #### to_dict 1989 1990 ```python 1991 to_dict() -> dict[str, Any] 1992 ``` 1993 1994 Serializes the component to a dictionary. 1995 1996 **Returns:** 1997 1998 - <code>dict\[str, Any\]</code> – Dictionary with serialized data. 1999 2000 #### from_dict 2001 2002 ```python 2003 from_dict(data: dict[str, Any]) -> HuggingFaceLocalGenerator 2004 ``` 2005 2006 Deserializes the component from a dictionary. 2007 2008 **Parameters:** 2009 2010 - **data** (<code>dict\[str, Any\]</code>) – The dictionary to deserialize from. 2011 2012 **Returns:** 2013 2014 - <code>HuggingFaceLocalGenerator</code> – The deserialized component. 2015 2016 #### run 2017 2018 ```python 2019 run( 2020 prompt: str, 2021 streaming_callback: StreamingCallbackT | None = None, 2022 generation_kwargs: dict[str, Any] | None = None, 2023 ) 2024 ``` 2025 2026 Run the text generation model on the given prompt. 2027 2028 **Parameters:** 2029 2030 - **prompt** (<code>str</code>) – A string representing the prompt. 
2031 - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream. 2032 - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. 2033 2034 **Returns:** 2035 2036 - – A dictionary containing the generated replies. 2037 - replies: A list of strings representing the generated replies. 2038 2039 ## openai 2040 2041 ### OpenAIGenerator 2042 2043 Generates text using OpenAI's large language models (LLMs). 2044 2045 It works with the gpt-4 and gpt-5 series models and supports streaming responses 2046 from OpenAI API. It uses strings as input and output. 2047 2048 You can customize how the text is generated by passing parameters to the 2049 OpenAI API. Use the `**generation_kwargs` argument when you initialize 2050 the component or when you run it. Any parameter that works with 2051 `openai.ChatCompletion.create` will work here too. 2052 2053 For details on OpenAI API parameters, see 2054 [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat). 2055 2056 ### Usage example 2057 2058 ```python 2059 from haystack.components.generators import OpenAIGenerator 2060 client = OpenAIGenerator() 2061 response = client.run("What's Natural Language Processing? Be brief.") 2062 print(response) 2063 2064 >> {'replies': ['Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on 2065 >> the interaction between computers and human language. 
>> It involves enabling computers to understand, interpret,
>> and respond to natural human language in a way that is both meaningful and useful.'], 'meta': [{'model':
>> 'gpt-5-mini', 'index': 0, 'finish_reason': 'stop', 'usage': {'prompt_tokens': 16,
>> 'completion_tokens': 49, 'total_tokens': 65}}]}
```

#### __init__

```python
__init__(
    api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
    model: str = "gpt-5-mini",
    streaming_callback: StreamingCallbackT | None = None,
    api_base_url: str | None = None,
    organization: str | None = None,
    system_prompt: str | None = None,
    generation_kwargs: dict[str, Any] | None = None,
    timeout: float | None = None,
    max_retries: int | None = None,
    http_client_kwargs: dict[str, Any] | None = None,
)
```

Creates an instance of OpenAIGenerator. Unless specified otherwise in `model`, uses OpenAI's gpt-5-mini.

By setting the 'OPENAI_TIMEOUT' and 'OPENAI_MAX_RETRIES' environment variables, you can change the `timeout` and `max_retries` parameters
in the OpenAI client.

**Parameters:**

- **api_key** (<code>Secret</code>) – The OpenAI API key to connect to OpenAI.
- **model** (<code>str</code>) – The name of the model to use.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
The callback function accepts StreamingChunk as an argument.
- **api_base_url** (<code>str | None</code>) – An optional base URL.
- **organization** (<code>str | None</code>) – The Organization ID, defaults to `None`.
- **system_prompt** (<code>str | None</code>) – The system prompt to use for text generation. If not provided, the system prompt is
omitted, and the default system prompt of the model is used.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model.
These parameters are all sent directly to
the OpenAI endpoint. See OpenAI [documentation](https://platform.openai.com/docs/api-reference/chat) for
more details.
Some of the supported parameters:
- `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion,
including visible output tokens and reasoning tokens.
- `temperature`: What sampling temperature to use. Higher values mean the model will take more risks.
Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
- `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
considers the results of the tokens with top_p probability mass. So, 0.1 means only the tokens
comprising the top 10% probability mass are considered.
- `n`: How many completions to generate for each prompt. For example, if the LLM gets 3 prompts and n is 2,
it will generate two completions for each of the three prompts, ending up with 6 completions in total.
- `stop`: One or more sequences after which the LLM should stop generating tokens.
- `presence_penalty`: The penalty applied when a token has already appeared in the text. Higher values
make the model less likely to repeat the same token.
- `frequency_penalty`: The penalty applied based on how often a token has already appeared in the text.
Higher values make the model less likely to repeat the same token.
- `logit_bias`: Add a logit bias to specific tokens. The keys of the dictionary are tokens, and the
values are the bias to add to that token.
- **timeout** (<code>float | None</code>) – Timeout for OpenAI client calls. If not set, it is inferred from the `OPENAI_TIMEOUT` environment variable
or defaults to 30 seconds.
- **max_retries** (<code>int | None</code>) – Maximum retries to establish contact with OpenAI if it returns an internal error. If not set,
  it is inferred from the `OPENAI_MAX_RETRIES` environment variable or defaults to 5.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or
  `httpx.AsyncClient`. For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – The serialized component as a dictionary.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> OpenAIGenerator
```

Deserialize this component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.

**Returns:**

- <code>OpenAIGenerator</code> – The deserialized component instance.

#### run

```python
run(
    prompt: str,
    system_prompt: str | None = None,
    streaming_callback: StreamingCallbackT | None = None,
    generation_kwargs: dict[str, Any] | None = None,
) -> dict[str, list[str] | list[dict[str, Any]]]
```

Invoke the text generation inference based on the provided messages and generation parameters.

**Parameters:**

- **prompt** (<code>str</code>) – The string prompt to use for text generation.
- **system_prompt** (<code>str | None</code>) – The system prompt to use for text generation. If this runtime system prompt is omitted,
  the system prompt defined at initialization time, if any, is used.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. These parameters can override
  the parameters passed in the `__init__` method. For more details on the parameters supported by the OpenAI API, refer to
  the OpenAI [documentation](https://platform.openai.com/docs/api-reference/chat/create).

**Returns:**

- <code>dict\[str, list\[str\] | list\[dict\[str, Any\]\]\]</code> – A dictionary with a list of strings containing the generated responses
  and a list of dictionaries containing the metadata for each response.

## openai_dalle

### DALLEImageGenerator

Generates images using OpenAI's DALL-E model.

For details on OpenAI API parameters, see
[OpenAI documentation](https://platform.openai.com/docs/api-reference/images/create).

### Usage example

```python
from haystack.components.generators import DALLEImageGenerator

image_generator = DALLEImageGenerator()
response = image_generator.run("Show me a picture of a black cat.")
print(response)
```

#### __init__

```python
__init__(
    model: str = "dall-e-3",
    quality: Literal["standard", "hd"] = "standard",
    size: Literal[
        "256x256", "512x512", "1024x1024", "1792x1024", "1024x1792"
    ] = "1024x1024",
    response_format: Literal["url", "b64_json"] = "url",
    api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
    api_base_url: str | None = None,
    organization: str | None = None,
    timeout: float | None = None,
    max_retries: int | None = None,
    http_client_kwargs: dict[str, Any] | None = None,
)
```

Creates an instance of DALLEImageGenerator. Unless specified otherwise in `model`, uses OpenAI's dall-e-3.
**Parameters:**

- **model** (<code>str</code>) – The model to use for image generation. Can be "dall-e-2" or "dall-e-3".
- **quality** (<code>Literal['standard', 'hd']</code>) – The quality of the generated image. Can be "standard" or "hd".
- **size** (<code>Literal['256x256', '512x512', '1024x1024', '1792x1024', '1024x1792']</code>) – The size of the generated images.
  Must be one of 256x256, 512x512, or 1024x1024 for dall-e-2.
  Must be one of 1024x1024, 1792x1024, or 1024x1792 for dall-e-3 models.
- **response_format** (<code>Literal['url', 'b64_json']</code>) – The format of the response. Can be "url" or "b64_json".
- **api_key** (<code>Secret</code>) – The OpenAI API key to connect to OpenAI.
- **api_base_url** (<code>str | None</code>) – An optional base URL.
- **organization** (<code>str | None</code>) – The Organization ID, defaults to `None`.
- **timeout** (<code>float | None</code>) – Timeout for OpenAI client calls. If not set, it is inferred from the `OPENAI_TIMEOUT`
  environment variable or defaults to 30.
- **max_retries** (<code>int | None</code>) – Maximum retries to establish contact with OpenAI if it returns an internal error. If not set,
  it is inferred from the `OPENAI_MAX_RETRIES` environment variable or defaults to 5.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or
  `httpx.AsyncClient`. For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).

#### warm_up

```python
warm_up() -> None
```

Warm up the OpenAI client.
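`warm_up()` reflects Haystack's convention of deferring expensive client construction until a component is actually needed, so constructing the component itself stays cheap. The following is an illustrative sketch of that convention only, not Haystack's actual implementation; the class and attribute names are invented for illustration:

```python
class LazyClientComponent:
    """Illustrates the warm_up() convention: construction is cheap, and the
    expensive client object is only created when warm_up() is called."""

    def __init__(self, api_key: str):
        self._api_key = api_key
        self._client = None  # created lazily by warm_up()

    def warm_up(self) -> None:
        # Idempotent: calling warm_up() twice creates the client only once.
        if self._client is None:
            # Stand-in for constructing a real (expensive) API client.
            self._client = {"api_key": self._api_key}

    def run(self, prompt: str) -> dict:
        if self._client is None:
            raise RuntimeError("The component was not warmed up. Call warm_up() first.")
        return {"prompt": prompt}
```

When a component like this runs inside a Haystack `Pipeline`, the pipeline takes care of warming it up before the first `run()`; call `warm_up()` yourself when using the component standalone.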
#### run

```python
run(
    prompt: str,
    size: (
        Literal["256x256", "512x512", "1024x1024", "1792x1024", "1024x1792"]
        | None
    ) = None,
    quality: Literal["standard", "hd"] | None = None,
    response_format: Literal["url", "b64_json"] | None = None,
)
```

Invokes the image generation inference based on the provided prompt and generation parameters.

**Parameters:**

- **prompt** (<code>str</code>) – The prompt to generate the image.
- **size** (<code>Literal['256x256', '512x512', '1024x1024', '1792x1024', '1024x1792'] | None</code>) – If provided, overrides the size provided during initialization.
- **quality** (<code>Literal['standard', 'hd'] | None</code>) – If provided, overrides the quality provided during initialization.
- **response_format** (<code>Literal['url', 'b64_json'] | None</code>) – If provided, overrides the response format provided during initialization.

**Returns:**

- – A dictionary containing the generated list of images and the revised prompt.
  Depending on the `response_format` parameter, the list of images can be URLs or base64-encoded JSON strings.
  The revised prompt is the prompt that was used to generate the image, if OpenAI made any revision to it.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – The serialized component as a dictionary.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> DALLEImageGenerator
```

Deserialize this component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.
**Returns:**

- <code>DALLEImageGenerator</code> – The deserialized component instance.

## utils

### print_streaming_chunk

```python
print_streaming_chunk(chunk: StreamingChunk) -> None
```

Callback function to handle and display streaming output chunks.

This function processes a `StreamingChunk` object by:

- Printing tool call metadata (if any), including function names and arguments, as they arrive.
- Printing tool call results when available.
- Printing the main content (e.g., text tokens) of the chunk as it is received.

The function outputs data directly to stdout and flushes output buffers to ensure immediate display during
streaming.

**Parameters:**

- **chunk** (<code>StreamingChunk</code>) – A chunk of streaming data containing content and optional metadata, such as tool calls
  and tool results.