---
title: "Generators"
id: generators-api
description: "Enables text generation using LLMs."
slug: "/generators-api"
---

## azure

### AzureOpenAIGenerator

Bases: <code>OpenAIGenerator</code>

Generates text using OpenAI's large language models (LLMs).

It works with gpt-4-type models and supports streaming responses
from the OpenAI API.

You can customize how the text is generated by passing parameters to the
OpenAI API. Use the `**generation_kwargs` argument when you initialize
the component or when you run it. Any parameter that works with
`openai.ChatCompletion.create` works here too.

For details on OpenAI API parameters, see
[OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).

### Usage example

```python
from haystack.components.generators import AzureOpenAIGenerator
from haystack.utils import Secret

client = AzureOpenAIGenerator(
    azure_endpoint="<Your Azure endpoint, e.g. https://your-company.azure.openai.com/>",
    api_key=Secret.from_token("<your-api-key>"),
    azure_deployment="<the model name, e.g. gpt-4.1-mini>")
response = client.run("What's Natural Language Processing? Be brief.")
print(response)
```

```
>> {'replies': ['Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on
>> the interaction between computers and human language. It involves enabling computers to understand, interpret,
>> and respond to natural human language in a way that is both meaningful and useful.'], 'meta': [{'model':
>> 'gpt-4.1-mini', 'index': 0, 'finish_reason': 'stop', 'usage': {'prompt_tokens': 16,
>> 'completion_tokens': 49, 'total_tokens': 65}}]}
```

#### __init__

```python
__init__(
    azure_endpoint: str | None = None,
    api_version: str | None = "2024-12-01-preview",
    azure_deployment: str | None = "gpt-4.1-mini",
    api_key: Secret | None = Secret.from_env_var(
        "AZURE_OPENAI_API_KEY", strict=False
    ),
    azure_ad_token: Secret | None = Secret.from_env_var(
        "AZURE_OPENAI_AD_TOKEN", strict=False
    ),
    organization: str | None = None,
    streaming_callback: StreamingCallbackT | None = None,
    system_prompt: str | None = None,
    timeout: float | None = None,
    max_retries: int | None = None,
    http_client_kwargs: dict[str, Any] | None = None,
    generation_kwargs: dict[str, Any] | None = None,
    default_headers: dict[str, str] | None = None,
    *,
    azure_ad_token_provider: AzureADTokenProvider | None = None
)
```

Initialize the Azure OpenAI Generator.

**Parameters:**

- **azure_endpoint** (<code>str | None</code>) – The endpoint of the deployed model, for example `https://example-resource.azure.openai.com/`.
- **api_version** (<code>str | None</code>) – The version of the API to use. Defaults to `2024-12-01-preview`.
- **azure_deployment** (<code>str | None</code>) – The deployment of the model, usually the model name.
- **api_key** (<code>Secret | None</code>) – The API key to use for authentication.
- **azure_ad_token** (<code>Secret | None</code>) – [Azure Active Directory token](https://www.microsoft.com/en-us/security/business/identity-access/microsoft-entra-id).
- **organization** (<code>str | None</code>) – Your organization ID. Defaults to `None`. For help, see
  [Setting up your organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function called when a new token is received from the stream.
  It accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
  as an argument.
- **system_prompt** (<code>str | None</code>) – The system prompt to use for text generation. If not provided, the Generator
  omits the system prompt.
- **timeout** (<code>float | None</code>) – Timeout for the AzureOpenAI client. If not set, it is inferred from the
  `OPENAI_TIMEOUT` environment variable or set to 30.
- **max_retries** (<code>int | None</code>) – Maximum number of retries to contact AzureOpenAI after an internal error.
  If not set, it is inferred from the `OPENAI_MAX_RETRIES` environment variable or set to 5.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
  For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model, sent directly to
  the OpenAI endpoint. See [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat) for
  more details.
  Some of the supported parameters:
  - `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion,
    including visible output tokens and reasoning tokens.
  - `temperature`: The sampling temperature to use. Higher values mean the model takes more risks.
    Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
  - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
    considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
    comprising the top 10% probability mass are considered.
  - `n`: The number of completions to generate for each prompt. For example, with 3 prompts and n=2,
    the LLM generates two completions per prompt, resulting in 6 completions total.
  - `stop`: One or more sequences after which the LLM should stop generating tokens.
  - `presence_penalty`: The penalty applied if a token is already present.
    Higher values make the model less likely to repeat the token.
  - `frequency_penalty`: The penalty applied if a token has already been generated.
    Higher values make the model less likely to repeat the token.
  - `logit_bias`: Adds a logit bias to specific tokens. The keys of the dictionary are tokens, and the
    values are the bias to add to that token.
- **default_headers** (<code>dict\[str, str\] | None</code>) – Default headers to use for the AzureOpenAI client.
- **azure_ad_token_provider** (<code>AzureADTokenProvider | None</code>) – A function that returns an Azure Active Directory token, invoked on
  every request.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – The serialized component as a dictionary.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> AzureOpenAIGenerator
```

Deserialize this component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.

**Returns:**

- <code>AzureOpenAIGenerator</code> – The deserialized component instance.
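As noted above, `generation_kwargs` can be supplied both when you initialize the component and when you run it. A minimal, library-free sketch of the usual precedence (run-time values override init-time defaults; this merge convention is an assumption for illustration, not code from the component):

```python
# Sketch of combining init-time and run-time generation_kwargs.
# Run-time values take precedence over init-time defaults (assumed convention).
init_kwargs = {"temperature": 0.9, "max_completion_tokens": 256}  # set at __init__
run_kwargs = {"temperature": 0.0}                                 # passed to run()
merged = {**init_kwargs, **run_kwargs}
print(merged)  # {'temperature': 0.0, 'max_completion_tokens': 256}
```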
## chat/azure

### AzureOpenAIChatGenerator

Bases: <code>OpenAIChatGenerator</code>

Generates text using OpenAI's models on Azure.

It works with gpt-4-type models and supports streaming responses
from the OpenAI API. It uses [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
format for input and output.

You can customize how the text is generated by passing parameters to the
OpenAI API. Use the `**generation_kwargs` argument when you initialize
the component or when you run it. Any parameter that works with
`openai.ChatCompletion.create` works here too.

For details on OpenAI API parameters, see
[OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).

### Usage example

```python
from haystack.components.generators.chat import AzureOpenAIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret

messages = [ChatMessage.from_user("What's Natural Language Processing?")]

client = AzureOpenAIChatGenerator(
    azure_endpoint="<Your Azure endpoint, e.g. https://your-company.azure.openai.com/>",
    api_key=Secret.from_token("<your-api-key>"),
    azure_deployment="<the model name, e.g. gpt-4.1-mini>")
response = client.run(messages)
print(response)
```

```
{'replies':
  [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=[TextContent(text=
   "Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on
   enabling computers to understand, interpret, and generate human language in a way that is useful.")],
   _name=None,
   _meta={'model': 'gpt-4.1-mini', 'index': 0, 'finish_reason': 'stop',
          'usage': {'prompt_tokens': 15, 'completion_tokens': 36, 'total_tokens': 51}})]
}
```

#### __init__

```python
__init__(
    azure_endpoint: str | None = None,
    api_version: str | None = "2024-12-01-preview",
    azure_deployment: str | None = "gpt-4.1-mini",
    api_key: Secret | None = Secret.from_env_var(
        "AZURE_OPENAI_API_KEY", strict=False
    ),
    azure_ad_token: Secret | None = Secret.from_env_var(
        "AZURE_OPENAI_AD_TOKEN", strict=False
    ),
    organization: str | None = None,
    streaming_callback: StreamingCallbackT | None = None,
    timeout: float | None = None,
    max_retries: int | None = None,
    generation_kwargs: dict[str, Any] | None = None,
    default_headers: dict[str, str] | None = None,
    tools: ToolsType | None = None,
    tools_strict: bool = False,
    *,
    azure_ad_token_provider: (
        AzureADTokenProvider | AsyncAzureADTokenProvider | None
    ) = None,
    http_client_kwargs: dict[str, Any] | None = None
)
```

Initialize the Azure OpenAI Chat Generator component.

**Parameters:**

- **azure_endpoint** (<code>str | None</code>) – The endpoint of the deployed model, for example `"https://example-resource.azure.openai.com/"`.
- **api_version** (<code>str | None</code>) – The version of the API to use. Defaults to `2024-12-01-preview`.
- **azure_deployment** (<code>str | None</code>) – The deployment of the model, usually the model name.
- **api_key** (<code>Secret | None</code>) – The API key to use for authentication.
- **azure_ad_token** (<code>Secret | None</code>) – [Azure Active Directory token](https://www.microsoft.com/en-us/security/business/identity-access/microsoft-entra-id).
- **organization** (<code>str | None</code>) – Your organization ID. Defaults to `None`. For help, see
  [Setting up your organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function called when a new token is received from the stream.
  It accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
  as an argument.
- **timeout** (<code>float | None</code>) – Timeout for OpenAI client calls. If not set, it defaults to the
  `OPENAI_TIMEOUT` environment variable, or 30 seconds.
- **max_retries** (<code>int | None</code>) – Maximum number of retries to contact OpenAI after an internal error.
  If not set, it defaults to the `OPENAI_MAX_RETRIES` environment variable, or 5.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model. These parameters are sent directly to
  the OpenAI endpoint. For details, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).
  Some of the supported parameters:
  - `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion,
    including visible output tokens and reasoning tokens.
  - `temperature`: The sampling temperature to use. Higher values mean the model takes more risks.
    Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
  - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model considers
    the tokens with top_p probability mass. For example, 0.1 means only the tokens comprising
    the top 10% probability mass are considered.
  - `n`: The number of completions to generate for each prompt. For example, with 3 prompts and n=2,
    the LLM generates two completions per prompt, resulting in 6 completions total.
  - `stop`: One or more sequences after which the LLM should stop generating tokens.
  - `presence_penalty`: The penalty applied if a token is already present.
    Higher values make the model less likely to repeat the token.
  - `frequency_penalty`: The penalty applied if a token has already been generated.
    Higher values make the model less likely to repeat the token.
  - `logit_bias`: Adds a logit bias to specific tokens. The keys of the dictionary are tokens, and the
    values are the bias to add to that token.
  - `response_format`: A JSON schema or a Pydantic model that enforces the structure of the model's response.
    If provided, the output is always validated against this
    format (unless the model returns a tool call).
    For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
    Notes:
    - This parameter accepts Pydantic models and JSON schemas for the latest models, starting from GPT-4o.
      Older models only support a basic version of structured outputs through `{"type": "json_object"}`.
      For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
    - For structured outputs with streaming,
      the `response_format` must be a JSON schema and not a Pydantic model.
- **default_headers** (<code>dict\[str, str\] | None</code>) – Default headers to use for the AzureOpenAI client.
- **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset, for which the model can prepare calls.
- **tools_strict** (<code>bool</code>) – Whether to enable strict schema adherence for tool calls. If set to `True`, the model follows exactly
  the schema provided in the `parameters` field of the tool definition, but this may increase latency.
- **azure_ad_token_provider** (<code>AzureADTokenProvider | AsyncAzureADTokenProvider | None</code>) – A function that returns an Azure Active Directory token, invoked on
  every request.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
  For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).

#### warm_up

```python
warm_up()
```

Warm up the Azure OpenAI chat generator.

This warms up the tools registered in the chat generator.
This method is idempotent and only warms up the tools once.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – The serialized component as a dictionary.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> AzureOpenAIChatGenerator
```

Deserialize this component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.

**Returns:**

- <code>AzureOpenAIChatGenerator</code> – The deserialized component instance.

## chat/azure_responses

### AzureOpenAIResponsesChatGenerator

Bases: <code>OpenAIResponsesChatGenerator</code>

Completes chats using OpenAI's Responses API on Azure.

It works with gpt-5 and o-series models and supports streaming responses
from the OpenAI API. It uses [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
format for input and output.

You can customize how the text is generated by passing parameters to the
OpenAI API. Use the `**generation_kwargs` argument when you initialize
the component or when you run it. Any parameter that works with
`openai.Responses.create` works here too.

For details on OpenAI API parameters, see
[OpenAI documentation](https://platform.openai.com/docs/api-reference/responses).

### Usage example

```python
from haystack.components.generators.chat import AzureOpenAIResponsesChatGenerator
from haystack.dataclasses import ChatMessage

messages = [ChatMessage.from_user("What's Natural Language Processing?")]

client = AzureOpenAIResponsesChatGenerator(
    azure_endpoint="https://example-resource.azure.openai.com/",
    generation_kwargs={"reasoning": {"effort": "low", "summary": "auto"}}
)
response = client.run(messages)
print(response)
```

#### __init__

```python
__init__(
    *,
    api_key: (
        Secret | Callable[[], str] | Callable[[], Awaitable[str]]
    ) = Secret.from_env_var("AZURE_OPENAI_API_KEY", strict=False),
    azure_endpoint: str | None = None,
    azure_deployment: str = "gpt-5-mini",
    streaming_callback: StreamingCallbackT | None = None,
    organization: str | None = None,
    generation_kwargs: dict[str, Any] | None = None,
    timeout: float | None = None,
    max_retries: int | None = None,
    tools: ToolsType | None = None,
    tools_strict: bool = False,
    http_client_kwargs: dict[str, Any] | None = None
)
```

Initialize the AzureOpenAIResponsesChatGenerator component.

**Parameters:**

- **api_key** (<code>Secret | Callable\[[], str\] | Callable\[[], Awaitable\[str\]\]</code>) – The API key to use for authentication. Can be:
  - A `Secret` object containing the API key.
  - A `Secret` object containing the [Azure Active Directory token](https://www.microsoft.com/en-us/security/business/identity-access/microsoft-entra-id).
  - A function that returns an Azure Active Directory token.
- **azure_endpoint** (<code>str | None</code>) – The endpoint of the deployed model, for example `"https://example-resource.azure.openai.com/"`.
- **azure_deployment** (<code>str</code>) – The deployment of the model, usually the model name.
- **organization** (<code>str | None</code>) – Your organization ID. Defaults to `None`. For help, see
  [Setting up your organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function called when a new token is received from the stream.
  It accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
  as an argument.
- **timeout** (<code>float | None</code>) – Timeout for OpenAI client calls. If not set, it defaults to the
  `OPENAI_TIMEOUT` environment variable, or 30 seconds.
- **max_retries** (<code>int | None</code>) – Maximum number of retries to contact OpenAI after an internal error.
  If not set, it defaults to the `OPENAI_MAX_RETRIES` environment variable, or 5.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model. These parameters are sent
  directly to the OpenAI endpoint.
  See the OpenAI [documentation](https://platform.openai.com/docs/api-reference/responses) for
  more details.
  Some of the supported parameters:
  - `temperature`: What sampling temperature to use. Higher values like 0.8 make the output more random,
    while lower values like 0.2 make it more focused and deterministic.
  - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
    considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
    comprising the top 10% probability mass are considered.
  - `previous_response_id`: The ID of the previous response.
    Use this to create multi-turn conversations.
  - `text_format`: A Pydantic model that enforces the structure of the model's response.
    If provided, the output is always validated against this
    format (unless the model returns a tool call).
    For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
  - `text`: A JSON schema that enforces the structure of the model's response.
    If provided, the output is always validated against this
    format (unless the model returns a tool call).
    Notes:
    - Both JSON schemas and Pydantic models are supported for the latest models, starting from GPT-4o.
    - If both are provided, `text_format` takes precedence and the JSON schema passed to `text` is ignored.
    - Currently, this component doesn't support streaming for structured outputs.
    - Older models only support a basic version of structured outputs through `{"type": "json_object"}`.
      For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
  - `reasoning`: A dictionary of parameters for reasoning. For example:
    - `summary`: The summary of the reasoning.
    - `effort`: The level of effort to put into the reasoning. Can be `low`, `medium`, or `high`.
    - `generate_summary`: Whether to generate a summary of the reasoning.
    Note: OpenAI does not return the reasoning tokens, but the summary can be viewed if it is enabled.
    For details, see the [OpenAI Reasoning documentation](https://platform.openai.com/docs/guides/reasoning).
- **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset, for which the model can prepare calls.
- **tools_strict** (<code>bool</code>) – Whether to enable strict schema adherence for tool calls. If set to `True`, the model follows exactly
  the schema provided in the `parameters` field of the tool definition, but this may increase latency.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
  For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – The serialized component as a dictionary.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> AzureOpenAIResponsesChatGenerator
```

Deserialize this component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.

**Returns:**

- <code>AzureOpenAIResponsesChatGenerator</code> – The deserialized component instance.

## chat/fallback

### FallbackChatGenerator

A chat generator wrapper that tries multiple chat generators sequentially.

It calls the chat generators in order until one succeeds, forwarding all parameters transparently
to the underlying chat generators and returning the first successful result. Any exception raised
by a generator triggers a fallback to the next one. If all chat generators fail, it raises a
`RuntimeError` with details.

Timeout enforcement is fully delegated to the underlying chat generators. The fallback mechanism only
works correctly if the underlying chat generators implement proper timeout handling and raise exceptions
when timeouts occur. For predictable latency guarantees, ensure your chat generators:

- Support a `timeout` parameter in their initialization
- Implement the timeout as total wall-clock time (a shared deadline for both streaming and non-streaming)
- Raise timeout exceptions (e.g., `TimeoutError`, `asyncio.TimeoutError`, `httpx.TimeoutException`) when exceeded

Note: Most well-implemented chat generators (OpenAI, Anthropic, Cohere, etc.) support timeout parameters
with consistent semantics. For HTTP-based LLM providers, a single timeout value (e.g., `timeout=30`)
typically applies to all connection phases: connection setup, read, write, and pool. For streaming
responses, the read timeout is the maximum gap between chunks. For non-streaming responses, it's the
time limit for receiving the complete response.

Failover is automatically triggered when a generator raises any exception, including:

- Timeout errors (if the generator implements and raises them)
- Rate limit errors (429)
- Authentication errors (401)
- Context length errors (400)
- Server errors (500+)
- Any other exception

#### __init__

```python
__init__(chat_generators: list[ChatGenerator]) -> None
```

Creates an instance of FallbackChatGenerator.

**Parameters:**

- **chat_generators** (<code>list\[ChatGenerator\]</code>) – A non-empty list of chat generator components to try in order.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize the component, including nested chat generators when they support serialization.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> FallbackChatGenerator
```

Rebuild the component from a serialized representation, restoring nested chat generators.
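The sequential failover behavior described above can be sketched in plain Python. The stand-in generators below are hypothetical; the real component wraps Haystack chat generators and returns richer metadata:

```python
# Minimal sketch of sequential fallback: try each generator in order and
# return the first successful result; any exception triggers failover.
def run_with_fallback(chat_generators, prompt):
    failed = []
    for index, generate in enumerate(chat_generators):
        try:
            reply = generate(prompt)
            return {"replies": [reply],
                    "meta": {"successful_chat_generator_index": index,
                             "total_attempts": index + 1,
                             "failed_chat_generators": failed}}
        except Exception as exc:  # timeouts, 429s, 500s, anything
            failed.append(type(exc).__name__)
    raise RuntimeError(f"All chat generators failed: {failed}")

def flaky(prompt):  # hypothetical generator that always times out
    raise TimeoutError("deadline exceeded")

def stable(prompt):  # hypothetical generator that succeeds
    return f"echo: {prompt}"

result = run_with_fallback([flaky, stable], "What's NLP?")
print(result["meta"])
```

Here the first generator raises `TimeoutError`, so the wrapper records the failure and returns the second generator's reply, mirroring the failover rules listed above.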
#### warm_up

```python
warm_up() -> None
```

Warm up all underlying chat generators.

This method calls `warm_up()` on each underlying generator that supports it.

#### run

```python
run(
    messages: list[ChatMessage],
    generation_kwargs: dict[str, Any] | None = None,
    tools: ToolsType | None = None,
    streaming_callback: StreamingCallbackT | None = None,
) -> dict[str, list[ChatMessage] | dict[str, Any]]
```

Execute chat generators sequentially until one succeeds.

**Parameters:**

- **messages** (<code>list\[ChatMessage\]</code>) – The conversation history as a list of ChatMessage instances.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Optional parameters for the chat generator (e.g., temperature, max_tokens).
- **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset, for function calling capabilities.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – Optional callable for handling streaming responses.

**Returns:**

- <code>dict\[str, list\[ChatMessage\] | dict\[str, Any\]\]</code> – A dictionary with:
  - "replies": Generated ChatMessage instances from the first successful generator.
  - "meta": Execution metadata including successful_chat_generator_index, successful_chat_generator_class,
    total_attempts, failed_chat_generators, plus any metadata from the successful generator.

**Raises:**

- <code>RuntimeError</code> – If all chat generators fail.

#### run_async

```python
run_async(
    messages: list[ChatMessage],
    generation_kwargs: dict[str, Any] | None = None,
    tools: ToolsType | None = None,
    streaming_callback: StreamingCallbackT | None = None,
) -> dict[str, list[ChatMessage] | dict[str, Any]]
```

Asynchronously execute chat generators sequentially until one succeeds.

**Parameters:**

- **messages** (<code>list\[ChatMessage\]</code>) – The conversation history as a list of ChatMessage instances.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Optional parameters for the chat generator (e.g., temperature, max_tokens).
- **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset, for function calling capabilities.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – Optional callable for handling streaming responses.

**Returns:**

- <code>dict\[str, list\[ChatMessage\] | dict\[str, Any\]\]</code> – A dictionary with:
  - "replies": Generated ChatMessage instances from the first successful generator.
  - "meta": Execution metadata including successful_chat_generator_index, successful_chat_generator_class,
    total_attempts, failed_chat_generators, plus any metadata from the successful generator.

**Raises:**

- <code>RuntimeError</code> – If all chat generators fail.

## chat/hugging_face_api

### HuggingFaceAPIChatGenerator

Completes chats using Hugging Face APIs.

HuggingFaceAPIChatGenerator uses the [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
format for input and output.
Use it to generate text with Hugging Face APIs:

- [Serverless Inference API (Inference Providers)](https://huggingface.co/docs/inference-providers)
- [Paid Inference Endpoints](https://huggingface.co/inference-endpoints)
- [Self-hosted Text Generation Inference](https://github.com/huggingface/text-generation-inference)

### Usage examples

#### With the serverless inference API (Inference Providers) - free tier available

```python
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret
from haystack.utils.hf import HFGenerationAPIType

messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
            ChatMessage.from_user("What's Natural Language Processing?")]

# the api_type can be expressed using the HFGenerationAPIType enum or as a string
api_type = HFGenerationAPIType.SERVERLESS_INFERENCE_API
api_type = "serverless_inference_api"  # this is equivalent to the above

generator = HuggingFaceAPIChatGenerator(api_type=api_type,
                                        api_params={"model": "Qwen/Qwen2.5-7B-Instruct",
                                                    "provider": "together"},
                                        token=Secret.from_token("<your-api-key>"))

result = generator.run(messages)
print(result)
```

#### With the serverless inference API (Inference Providers) and text+image input

```python
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage, ImageContent
from haystack.utils import Secret
from haystack.utils.hf import HFGenerationAPIType

# Create an image from a file path, URL, or base64
image = ImageContent.from_file_path("path/to/your/image.jpg")

# Create a multimodal message with both text and image
messages = [ChatMessage.from_user(content_parts=["Describe this image in detail", image])]

generator = HuggingFaceAPIChatGenerator(
    api_type=HFGenerationAPIType.SERVERLESS_INFERENCE_API,
    api_params={
        "model": "Qwen/Qwen2.5-VL-7B-Instruct",  # Vision Language Model
        "provider": "hyperbolic"
    },
    token=Secret.from_token("<your-api-key>")
)

result = generator.run(messages)
print(result)
```

#### With paid inference endpoints

```python
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret

messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
            ChatMessage.from_user("What's Natural Language Processing?")]

generator = HuggingFaceAPIChatGenerator(api_type="inference_endpoints",
                                        api_params={"url": "<your-inference-endpoint-url>"},
                                        token=Secret.from_token("<your-api-key>"))

result = generator.run(messages)
print(result)
```

#### With self-hosted text generation inference

```python
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage

messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
            ChatMessage.from_user("What's Natural Language Processing?")]

generator = HuggingFaceAPIChatGenerator(api_type="text_generation_inference",
                                        api_params={"url": "http://localhost:8080"})

result = generator.run(messages)
print(result)
```

#### __init__

```python
__init__(
    api_type: HFGenerationAPIType | str,
    api_params: dict[str, str],
    token: Secret | None = Secret.from_env_var(
        ["HF_API_TOKEN", "HF_TOKEN"], strict=False
    ),
    generation_kwargs: dict[str, Any] | None = None,
    stop_words: list[str] | None = None,
    streaming_callback: StreamingCallbackT | None = None,
    tools: ToolsType | None = None,
)
```

Initialize the HuggingFaceAPIChatGenerator instance.

**Parameters:**

- **api_type** (<code>HFGenerationAPIType | str</code>) – The type of Hugging Face API to use. Available types:
  - `text_generation_inference`: See [TGI](https://github.com/huggingface/text-generation-inference).
  - `inference_endpoints`: See [Inference Endpoints](https://huggingface.co/inference-endpoints).
  - `serverless_inference_api`: See
    [Serverless Inference API - Inference Providers](https://huggingface.co/docs/inference-providers).
- **api_params** (<code>dict\[str, str\]</code>) – A dictionary with the following keys:
  - `model`: Hugging Face model ID. Required when `api_type` is `SERVERLESS_INFERENCE_API`.
  - `provider`: Provider name. Recommended when `api_type` is `SERVERLESS_INFERENCE_API`.
  - `url`: URL of the inference endpoint. Required when `api_type` is `INFERENCE_ENDPOINTS` or
    `TEXT_GENERATION_INFERENCE`.
  - Other parameters specific to the chosen API type, such as `timeout`, `headers`, etc.
- **token** (<code>Secret | None</code>) – The Hugging Face token to use as HTTP bearer authorization.
  Check your HF token in your [account settings](https://huggingface.co/settings/tokens).
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary with keyword arguments to customize text generation.
  Some examples: `max_tokens`, `temperature`, `top_p`.
  For details, see the [Hugging Face chat_completion documentation](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.chat_completion).
- **stop_words** (<code>list\[str\] | None</code>) – An optional list of strings representing the stop words.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses.
- **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset, for which the model can prepare calls.
  The chosen model should support tool/function calling, according to the model card.
  Support for tools in the Hugging Face API and TGI is not yet fully refined, and you may experience
  unexpected behavior.

#### warm_up

```python
warm_up()
```

Warm up the Hugging Face API chat generator.

This warms up the tools registered in the chat generator.
This method is idempotent and only warms up the tools once.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – A dictionary containing the serialized component.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> HuggingFaceAPIChatGenerator
```

Deserialize this component from a dictionary.

#### run

```python
run(
    messages: list[ChatMessage],
    generation_kwargs: dict[str, Any] | None = None,
    tools: ToolsType | None = None,
    streaming_callback: StreamingCallbackT | None = None,
) -> dict[str, list[ChatMessage]]
```

Invoke the text generation inference based on the provided messages and generation parameters.

**Parameters:**

- **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage objects representing the input messages.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation.
- **tools** (<code>ToolsType | None</code>) – A list of tools or a Toolset for which the model can prepare calls. If set, it overrides
  the `tools` parameter set during component initialization. This parameter can accept either a
  list of `Tool` objects or a `Toolset` instance.
788 - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses. If set, it will override the `streaming_callback` 789 parameter set during component initialization. 790 791 **Returns:** 792 793 - <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following keys: 794 - `replies`: A list containing the generated responses as ChatMessage objects. 795 796 #### run_async 797 798 ```python 799 run_async( 800 messages: list[ChatMessage], 801 generation_kwargs: dict[str, Any] | None = None, 802 tools: ToolsType | None = None, 803 streaming_callback: StreamingCallbackT | None = None, 804 ) -> dict[str, list[ChatMessage]] 805 ``` 806 807 Asynchronously invokes the text generation inference based on the provided messages and generation parameters. 808 809 This is the asynchronous version of the `run` method. It has the same parameters 810 and return values but can be used with `await` in an async code. 811 812 **Parameters:** 813 814 - **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage objects representing the input messages. 815 - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. 816 - **tools** (<code>ToolsType | None</code>) – A list of tools or a Toolset for which the model can prepare calls. If set, it will override the `tools` 817 parameter set during component initialization. This parameter can accept either a list of `Tool` objects 818 or a `Toolset` instance. 819 - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses. If set, it will override the `streaming_callback` 820 parameter set during component initialization. 821 822 **Returns:** 823 824 - <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following keys: 825 - `replies`: A list containing the generated responses as ChatMessage objects. 
## chat/hugging_face_local

### default_tool_parser

```python
default_tool_parser(text: str) -> list[ToolCall] | None
```

Default implementation for parsing tool calls from model output text.

Uses DEFAULT_TOOL_PATTERN to extract tool calls.

**Parameters:**

- **text** (<code>str</code>) – The text to parse for tool calls.

**Returns:**

- <code>list\[ToolCall\] | None</code> – A list containing a single ToolCall if a valid tool call is found, None otherwise.

### HuggingFaceLocalChatGenerator

Generates chat responses using models from Hugging Face that run locally.

Use this component with chat-based models,
such as `Qwen/Qwen3-0.6B` or `meta-llama/Llama-2-7b-chat-hf`.
LLMs running locally may need powerful hardware.

### Usage example

```python
from haystack.components.generators.chat import HuggingFaceLocalChatGenerator
from haystack.dataclasses import ChatMessage

generator = HuggingFaceLocalChatGenerator(model="Qwen/Qwen3-0.6B")
messages = [ChatMessage.from_user("What's Natural Language Processing? Be brief.")]
print(generator.run(messages))
```

```
{'replies':
    [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=[TextContent(text=
    "Natural Language Processing (NLP) is a subfield of artificial intelligence that deals
    with the interaction between computers and human language. It enables computers to understand, interpret, and
    generate human language in a valuable way. NLP involves various techniques such as speech recognition, text
    analysis, sentiment analysis, and machine translation. The ultimate goal is to make it easier for computers to
    process and derive meaning from human language, improving communication between humans and machines.")],
    _name=None,
    _meta={'finish_reason': 'stop', 'index': 0, 'model': 'Qwen/Qwen3-0.6B',
    'usage': {'completion_tokens': 90, 'prompt_tokens': 19, 'total_tokens': 109}})
    ]
}
```

#### __init__

```python
__init__(
    model: str = "Qwen/Qwen3-0.6B",
    task: Literal["text-generation", "text2text-generation"] | None = None,
    device: ComponentDevice | None = None,
    token: Secret | None = Secret.from_env_var(
        ["HF_API_TOKEN", "HF_TOKEN"], strict=False
    ),
    chat_template: str | None = None,
    generation_kwargs: dict[str, Any] | None = None,
    huggingface_pipeline_kwargs: dict[str, Any] | None = None,
    stop_words: list[str] | None = None,
    streaming_callback: StreamingCallbackT | None = None,
    tools: ToolsType | None = None,
    tool_parsing_function: Callable[[str], list[ToolCall] | None] | None = None,
    async_executor: ThreadPoolExecutor | None = None,
    *,
    enable_thinking: bool = False
) -> None
```

Initializes the HuggingFaceLocalChatGenerator component.

**Parameters:**

- **model** (<code>str</code>) – The Hugging Face text generation model name or path,
  for example, `mistralai/Mistral-7B-Instruct-v0.2` or `TheBloke/OpenHermes-2.5-Mistral-7B-16k-AWQ`.
  The model must be a chat model supporting the ChatML messaging format.
  If the model is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
- **task** (<code>Literal['text-generation', 'text2text-generation'] | None</code>) – The task for the Hugging Face pipeline. Possible options:
    - `text-generation`: Supported by decoder models, like GPT.
    - `text2text-generation`: Deprecated as of Transformers v5; use `text-generation` instead.
      Previously supported by encoder-decoder models such as T5.
  If the task is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
  If not specified, the component calls the Hugging Face API to infer the task from the model name.
- **device** (<code>ComponentDevice | None</code>) – The device for loading the model. If `None`, automatically selects the default device.
  If a device or device map is specified in `huggingface_pipeline_kwargs`, it overrides this parameter.
- **token** (<code>Secret | None</code>) – The token to use as HTTP bearer authorization for remote files.
  If the token is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
- **chat_template** (<code>str | None</code>) – Specifies an optional Jinja template for formatting chat
  messages. Most high-quality chat models have their own templates, but for models without this
  feature or if you prefer a custom template, use this parameter.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary with keyword arguments to customize text generation.
  Some examples: `max_length`, `max_new_tokens`, `temperature`, `top_k`, `top_p`.
  See Hugging Face's documentation for more information:
    - [customize-text-generation](https://huggingface.co/docs/transformers/main/en/generation_strategies#customize-text-generation)
    - [GenerationConfig](https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.GenerationConfig)
  The only `generation_kwargs` value set by default is `max_new_tokens`, which is set to 512 tokens.
- **huggingface_pipeline_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary with keyword arguments to initialize the
  Hugging Face pipeline for text generation.
  These keyword arguments provide fine-grained control over the Hugging Face pipeline.
  In case of duplication, these kwargs override the `model`, `task`, `device`, and `token` init parameters.
  For kwargs, see the [Hugging Face documentation](https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.pipeline.task).
  In this dictionary, you can also include `model_kwargs` to specify the kwargs for [model initialization](https://huggingface.co/docs/transformers/en/main_classes/model#transformers.PreTrainedModel.from_pretrained).
- **stop_words** (<code>list\[str\] | None</code>) – A list of stop words. If the model generates a stop word, the generation stops.
  If you provide this parameter, don't specify `stopping_criteria` in `generation_kwargs`.
  For some chat models, the output includes both the new text and the original prompt.
  In these cases, make sure your prompt has no stop words.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses.
- **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset, for which the model can prepare calls.
- **tool_parsing_function** (<code>Callable\[\[str\], list\[ToolCall\] | None\] | None</code>) – A callable that takes a string and returns a list of ToolCall objects or None.
  If None, `default_tool_parser` is used, which extracts tool calls using a predefined pattern.
- **async_executor** (<code>ThreadPoolExecutor | None</code>) – An optional ThreadPoolExecutor to use for async calls. If not provided, a single-threaded executor is
  initialized and used.
- **enable_thinking** (<code>bool</code>) – Whether to enable thinking mode in the chat template for thinking-capable models.
  When enabled, the model generates intermediate reasoning before the final response. Defaults to False.

#### shutdown

```python
shutdown() -> None
```

Explicitly shut down the executor if this component owns it.

#### warm_up

```python
warm_up() -> None
```

Initializes the component and warms up tools if provided.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> HuggingFaceLocalChatGenerator
```

Deserializes the component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary to deserialize from.

**Returns:**

- <code>HuggingFaceLocalChatGenerator</code> – The deserialized component.

#### run

```python
run(
    messages: list[ChatMessage],
    generation_kwargs: dict[str, Any] | None = None,
    streaming_callback: StreamingCallbackT | None = None,
    tools: ToolsType | None = None,
) -> dict[str, list[ChatMessage]]
```

Invoke text generation inference based on the provided messages and generation parameters.

**Parameters:**

- **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage objects representing the input messages.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses.
- **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset, for which the model can prepare calls.
  If set, it overrides the `tools` parameter provided during initialization.

**Returns:**

- <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following keys:
    - `replies`: A list containing the generated responses as ChatMessage instances.

#### create_message

```python
create_message(
    text: str,
    index: int,
    tokenizer: Union[PreTrainedTokenizer, PreTrainedTokenizerFast],
    prompt: str,
    generation_kwargs: dict[str, Any],
    parse_tool_calls: bool = False,
) -> ChatMessage
```

Create a ChatMessage instance from the provided text, populated with metadata.

**Parameters:**

- **text** (<code>str</code>) – The generated text.
- **index** (<code>int</code>) – The index of the generated text.
- **tokenizer** (<code>Union\[PreTrainedTokenizer, PreTrainedTokenizerFast\]</code>) – The tokenizer used for generation.
- **prompt** (<code>str</code>) – The prompt used for generation.
- **generation_kwargs** (<code>dict\[str, Any\]</code>) – The generation parameters.
- **parse_tool_calls** (<code>bool</code>) – Whether to attempt parsing tool calls from the text.

**Returns:**

- <code>ChatMessage</code> – A ChatMessage instance.

#### run_async

```python
run_async(
    messages: list[ChatMessage],
    generation_kwargs: dict[str, Any] | None = None,
    streaming_callback: StreamingCallbackT | None = None,
    tools: ToolsType | None = None,
) -> dict[str, list[ChatMessage]]
```

Asynchronously invokes text generation inference based on the provided messages and generation parameters.

This is the asynchronous version of the `run` method. It has the same parameters
and return values but can be used with `await` in async code.

**Parameters:**

- **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage objects representing the input messages.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses.
- **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset, for which the model can prepare calls.
  If set, it overrides the `tools` parameter provided during initialization.

**Returns:**

- <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following keys:
    - `replies`: A list containing the generated responses as ChatMessage instances.

## chat/llm

### LLM

Bases: <code>Agent</code>

A text generation component powered by a large language model.

The LLM component is a simplified version of the Agent that focuses solely on text generation
without tool usage. It processes messages and returns a single response from the language model.

### Usage examples

```python
from haystack.components.generators.chat import LLM
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage

llm = LLM(
    chat_generator=OpenAIChatGenerator(),
    system_prompt="You are a helpful summarization assistant.",
    user_prompt="""{% message role="user" %}
    Summarize the following document: {{ document }}
    {% endmessage %}""",
    required_variables=["document"],
)

result = llm.run(document="The weather is lovely today and the sun is shining.")
print(result["last_message"].text)
```

#### __init__

```python
__init__(
    *,
    chat_generator: ChatGenerator,
    system_prompt: str | None = None,
    user_prompt: str | None = None,
    required_variables: list[str] | Literal["*"] | None = None,
    streaming_callback: StreamingCallbackT | None = None
) -> None
```

Initialize the LLM component.

**Parameters:**

- **chat_generator** (<code>ChatGenerator</code>) – An instance of the chat generator that the LLM should use.
- **system_prompt** (<code>str | None</code>) – System prompt for the LLM.
- **user_prompt** (<code>str | None</code>) – User prompt for the LLM. If provided, it is appended to the messages provided at runtime.
- **required_variables** (<code>list\[str\] | Literal['\*'] | None</code>) – A list of variables that must be provided as input to `user_prompt`.
  If a variable listed as required is not provided, an exception is raised.
  If set to `"*"`, all variables found in the prompt are required. Optional.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback that is invoked when a response is streamed from the LLM.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize the LLM component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> LLM
```

Deserialize the LLM from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from.

**Returns:**

- <code>LLM</code> – Deserialized LLM instance.

#### run

```python
run(
    messages: list[ChatMessage] | None = None,
    streaming_callback: StreamingCallbackT | None = None,
    *,
    generation_kwargs: dict[str, Any] | None = None,
    system_prompt: str | None = None,
    user_prompt: str | None = None,
    **kwargs: Any
) -> dict[str, Any]
```

Process messages and generate a response from the language model.

**Parameters:**

- **messages** (<code>list\[ChatMessage\] | None</code>) – List of Haystack ChatMessage objects to process.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback that is invoked when a response is streamed from the LLM.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for the underlying chat generator. These parameters
  override the parameters passed during component initialization.
- **system_prompt** (<code>str | None</code>) – System prompt for the LLM. If provided, it overrides the default system prompt.
- **user_prompt** (<code>str | None</code>) – User prompt for the LLM. If provided, it overrides the default user prompt and is
  appended to the messages provided at runtime.
- **kwargs** (<code>Any</code>) – Additional keyword arguments. These are used to fill template variables in the `user_prompt`
  (the keys must match template variable names).

**Returns:**

- <code>dict\[str, Any\]</code> – A dictionary with the following keys:
    - "messages": List of all messages exchanged during the LLM's run.
    - "last_message": The last message exchanged during the LLM's run.

#### run_async

```python
run_async(
    messages: list[ChatMessage] | None = None,
    streaming_callback: StreamingCallbackT | None = None,
    *,
    generation_kwargs: dict[str, Any] | None = None,
    system_prompt: str | None = None,
    user_prompt: str | None = None,
    **kwargs: Any
) -> dict[str, Any]
```

Asynchronously process messages and generate a response from the language model.

**Parameters:**

- **messages** (<code>list\[ChatMessage\] | None</code>) – List of Haystack ChatMessage objects to process.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – An asynchronous callback that is invoked when a response is streamed
  from the LLM.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for the underlying chat generator. These parameters
  override the parameters passed during component initialization.
- **system_prompt** (<code>str | None</code>) – System prompt for the LLM. If provided, it overrides the default system prompt.
- **user_prompt** (<code>str | None</code>) – User prompt for the LLM. If provided, it overrides the default user prompt and is
  appended to the messages provided at runtime.
- **kwargs** (<code>Any</code>) – Additional keyword arguments. These are used to fill template variables in the `user_prompt`
  (the keys must match template variable names).

**Returns:**

- <code>dict\[str, Any\]</code> – A dictionary with the following keys:
    - "messages": List of all messages exchanged during the LLM's run.
    - "last_message": The last message exchanged during the LLM's run.

## chat/openai

### OpenAIChatGenerator

Completes chats using OpenAI's large language models (LLMs).

It works with the gpt-4 and gpt-5 series models and supports streaming responses
from the OpenAI API. It uses the [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
format in input and output.

You can customize how the text is generated by passing parameters to the
OpenAI API. Use the `**generation_kwargs` argument when you initialize
the component or when you run it. Any parameter that works with
`openai.ChatCompletion.create` will work here too.

For details on OpenAI API parameters, see the
[OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).
1249 1250 ### Usage example 1251 1252 ```python 1253 from haystack.components.generators.chat import OpenAIChatGenerator 1254 from haystack.dataclasses import ChatMessage 1255 1256 messages = [ChatMessage.from_user("What's Natural Language Processing?")] 1257 1258 client = OpenAIChatGenerator() 1259 response = client.run(messages) 1260 print(response) 1261 ``` 1262 1263 Output: 1264 1265 ``` 1266 {'replies': 1267 [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content= 1268 [TextContent(text="Natural Language Processing (NLP) is a branch of artificial intelligence 1269 that focuses on enabling computers to understand, interpret, and generate human language in 1270 a way that is meaningful and useful.")], 1271 _name=None, 1272 _meta={'model': 'gpt-5-mini', 'index': 0, 'finish_reason': 'stop', 1273 'usage': {'prompt_tokens': 15, 'completion_tokens': 36, 'total_tokens': 51}}) 1274 ] 1275 } 1276 ``` 1277 1278 #### __init__ 1279 1280 ```python 1281 __init__( 1282 api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"), 1283 model: str = "gpt-5-mini", 1284 streaming_callback: StreamingCallbackT | None = None, 1285 api_base_url: str | None = None, 1286 organization: str | None = None, 1287 generation_kwargs: dict[str, Any] | None = None, 1288 timeout: float | None = None, 1289 max_retries: int | None = None, 1290 tools: ToolsType | None = None, 1291 tools_strict: bool = False, 1292 http_client_kwargs: dict[str, Any] | None = None, 1293 ) 1294 ``` 1295 1296 Creates an instance of OpenAIChatGenerator. Unless specified otherwise in `model`, uses OpenAI's gpt-5-mini 1297 1298 Before initializing the component, you can set the 'OPENAI_TIMEOUT' and 'OPENAI_MAX_RETRIES' 1299 environment variables to override the `timeout` and `max_retries` parameters respectively 1300 in the OpenAI client. 1301 1302 **Parameters:** 1303 1304 - **api_key** (<code>Secret</code>) – The OpenAI API key. 
1305 You can set it with an environment variable `OPENAI_API_KEY`, or pass with this parameter 1306 during initialization. 1307 - **model** (<code>str</code>) – The name of the model to use. 1308 - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream. 1309 The callback function accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk) 1310 as an argument. 1311 - **api_base_url** (<code>str | None</code>) – An optional base URL. 1312 - **organization** (<code>str | None</code>) – Your organization ID, defaults to `None`. See 1313 [production best practices](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization). 1314 - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model. These parameters are sent directly to 1315 the OpenAI endpoint. See OpenAI [documentation](https://platform.openai.com/docs/api-reference/chat) for 1316 more details. 1317 Some of the supported parameters: 1318 - `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion, 1319 including visible output tokens and reasoning tokens. 1320 - `temperature`: What sampling temperature to use. Higher values mean the model will take more risks. 1321 Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer. 1322 - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model 1323 considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens 1324 comprising the top 10% probability mass are considered. 1325 - `n`: How many completions to generate for each prompt. For example, if the LLM gets 3 prompts and n is 2, 1326 it will generate two completions for each of the three prompts, ending up with 6 completions in total. 
1327 - `stop`: One or more sequences after which the LLM should stop generating tokens. 1328 - `presence_penalty`: What penalty to apply if a token is already present at all. Bigger values mean 1329 the model will be less likely to repeat the same token in the text. 1330 - `frequency_penalty`: What penalty to apply if a token has already been generated in the text. 1331 Bigger values mean the model will be less likely to repeat the same token in the text. 1332 - `logit_bias`: Add a logit bias to specific tokens. The keys of the dictionary are tokens, and the 1333 values are the bias to add to that token. 1334 - `response_format`: A JSON schema or a Pydantic model that enforces the structure of the model's response. 1335 If provided, the output will always be validated against this 1336 format (unless the model returns a tool call). 1337 For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs). 1338 Notes: 1339 - This parameter accepts Pydantic models and JSON schemas for latest models starting from GPT-4o. 1340 Older models only support basic version of structured outputs through `{"type": "json_object"}`. 1341 For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode). 1342 - For structured outputs with streaming, 1343 the `response_format` must be a JSON schema and not a Pydantic model. 1344 - **timeout** (<code>float | None</code>) – Timeout for OpenAI client calls. If not set, it defaults to either the 1345 `OPENAI_TIMEOUT` environment variable, or 30 seconds. 1346 - **max_retries** (<code>int | None</code>) – Maximum number of retries to contact OpenAI after an internal error. 1347 If not set, it defaults to either the `OPENAI_MAX_RETRIES` environment variable, or set to 5. 
1348 - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls. 1349 - **tools_strict** (<code>bool</code>) – Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly 1350 the schema provided in the `parameters` field of the tool definition, but this may increase latency. 1351 - **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client`or `httpx.AsyncClient`. 1352 For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client). 1353 1354 #### warm_up 1355 1356 ```python 1357 warm_up() 1358 ``` 1359 1360 Warm up the OpenAI chat generator. 1361 1362 This will warm up the tools registered in the chat generator. 1363 This method is idempotent and will only warm up the tools once. 1364 1365 #### to_dict 1366 1367 ```python 1368 to_dict() -> dict[str, Any] 1369 ``` 1370 1371 Serialize this component to a dictionary. 1372 1373 **Returns:** 1374 1375 - <code>dict\[str, Any\]</code> – The serialized component as a dictionary. 1376 1377 #### from_dict 1378 1379 ```python 1380 from_dict(data: dict[str, Any]) -> OpenAIChatGenerator 1381 ``` 1382 1383 Deserialize this component from a dictionary. 1384 1385 **Parameters:** 1386 1387 - **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component. 1388 1389 **Returns:** 1390 1391 - <code>OpenAIChatGenerator</code> – The deserialized component instance. 1392 1393 #### run 1394 1395 ```python 1396 run( 1397 messages: list[ChatMessage], 1398 streaming_callback: StreamingCallbackT | None = None, 1399 generation_kwargs: dict[str, Any] | None = None, 1400 *, 1401 tools: ToolsType | None = None, 1402 tools_strict: bool | None = None 1403 ) -> dict[str, list[ChatMessage]] 1404 ``` 1405 1406 Invokes chat completion based on the provided messages and generation parameters. 

**Parameters:**

- **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage instances representing the input messages.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. These parameters will
override the parameters passed during component initialization.
For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat/create).
- **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
If set, it will override the `tools` parameter provided during initialization.
- **tools_strict** (<code>bool | None</code>) – Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
the schema provided in the `parameters` field of the tool definition, but this may increase latency.
If set, it will override the `tools_strict` parameter set during component initialization.

**Returns:**

- <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following key:
- `replies`: A list containing the generated responses as ChatMessage instances.

#### run_async

```python
run_async(
    messages: list[ChatMessage],
    streaming_callback: StreamingCallbackT | None = None,
    generation_kwargs: dict[str, Any] | None = None,
    *,
    tools: ToolsType | None = None,
    tools_strict: bool | None = None
) -> dict[str, list[ChatMessage]]
```

Asynchronously invokes chat completion based on the provided messages and generation parameters.

This is the asynchronous version of the `run` method. 
It has the same parameters and return values
but can be used with `await` in async code.

**Parameters:**

- **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage instances representing the input messages.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
Must be a coroutine.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. These parameters will
override the parameters passed during component initialization.
For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat/create).
- **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
If set, it will override the `tools` parameter provided during initialization.
- **tools_strict** (<code>bool | None</code>) – Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
the schema provided in the `parameters` field of the tool definition, but this may increase latency.
If set, it will override the `tools_strict` parameter set during component initialization.

**Returns:**

- <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following key:
- `replies`: A list containing the generated responses as ChatMessage instances.

## chat/openai_responses

### OpenAIResponsesChatGenerator

Completes chats using OpenAI's Responses API.

It works with the gpt-4 and o-series models and supports streaming responses
from the OpenAI API. It uses the [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
format in input and output.

You can customize how the text is generated by passing parameters to the
OpenAI API. Use the `**generation_kwargs` argument when you initialize
the component or when you run it. Any parameter that works with
`openai.Responses.create` will work here too.

For details on OpenAI API parameters, see
[OpenAI documentation](https://platform.openai.com/docs/api-reference/responses).

### Usage example

```python
from haystack.components.generators.chat import OpenAIResponsesChatGenerator
from haystack.dataclasses import ChatMessage

messages = [ChatMessage.from_user("What's Natural Language Processing?")]

client = OpenAIResponsesChatGenerator(generation_kwargs={"reasoning": {"effort": "low", "summary": "auto"}})
response = client.run(messages)
print(response)
```

#### __init__

```python
__init__(
    *,
    api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
    model: str = "gpt-5-mini",
    streaming_callback: StreamingCallbackT | None = None,
    api_base_url: str | None = None,
    organization: str | None = None,
    generation_kwargs: dict[str, Any] | None = None,
    timeout: float | None = None,
    max_retries: int | None = None,
    tools: ToolsType | list[dict] | None = None,
    tools_strict: bool = False,
    http_client_kwargs: dict[str, Any] | None = None
)
```

Creates an instance of OpenAIResponsesChatGenerator. Uses OpenAI's gpt-5-mini by default.

Before initializing the component, you can set the 'OPENAI_TIMEOUT' and 'OPENAI_MAX_RETRIES'
environment variables to override the `timeout` and `max_retries` parameters respectively
in the OpenAI client.

**Parameters:**

- **api_key** (<code>Secret</code>) – The OpenAI API key.
You can set it with the environment variable `OPENAI_API_KEY`, or pass it with this parameter
during initialization.
- **model** (<code>str</code>) – The name of the model to use.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
The callback function accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
as an argument.
- **api_base_url** (<code>str | None</code>) – An optional base URL.
- **organization** (<code>str | None</code>) – Your organization ID, defaults to `None`. See
[production best practices](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model. These parameters are sent
directly to the OpenAI endpoint.
See OpenAI [documentation](https://platform.openai.com/docs/api-reference/responses) for
more details.
Some of the supported parameters:
- `temperature`: What sampling temperature to use. Higher values like 0.8 will make the output more random,
while lower values like 0.2 will make it more focused and deterministic.
- `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
comprising the top 10% probability mass are considered.
- `previous_response_id`: The ID of the previous response.
Use this to create multi-turn conversations.
- `text_format`: A Pydantic model that enforces the structure of the model's response.
If provided, the output will always be validated against this
format (unless the model returns a tool call).
For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
- `text`: A JSON schema that enforces the structure of the model's response.
If provided, the output will always be validated against this
format (unless the model returns a tool call).
Notes:
- Both JSON schemas and Pydantic models are supported for the latest models, starting from GPT-4o.
- If both are provided, `text_format` takes precedence and the JSON schema passed to `text` is ignored.
- Currently, this component doesn't support streaming for structured outputs.
- Older models only support a basic version of structured outputs through `{"type": "json_object"}`.
For details on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
- `reasoning`: A dictionary of reasoning parameters. For example:
- `summary`: The summary of the reasoning.
- `effort`: The level of effort to put into the reasoning. Can be `low`, `medium`, or `high`.
- `generate_summary`: Whether to generate a summary of the reasoning.
Note: OpenAI does not return the reasoning tokens, but you can view the summary if it is enabled.
For details, see the [OpenAI Reasoning documentation](https://platform.openai.com/docs/guides/reasoning).
- **timeout** (<code>float | None</code>) – Timeout for OpenAI client calls. If not set, it defaults to the
`OPENAI_TIMEOUT` environment variable or 30 seconds.
- **max_retries** (<code>int | None</code>) – Maximum number of retries to contact OpenAI after an internal error.
If not set, it defaults to the `OPENAI_MAX_RETRIES` environment variable or 5.
- **tools** (<code>ToolsType | list\[dict\] | None</code>) – The tools that the model can use to prepare calls. This parameter accepts either a
mixed list of Haystack `Tool` and `Toolset` objects, or a list of
OpenAI/MCP tool definition dictionaries.
Note: You cannot pass OpenAI/MCP tools and Haystack tools together.
For details on tool support, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create#responses-create-tools).
- **tools_strict** (<code>bool</code>) – Whether to enable strict schema adherence for tool calls. If set to `False`, the model may not exactly
follow the schema provided in the `parameters` field of the tool definition. In the Responses API, tool calls
are strict by default.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).

#### warm_up

```python
warm_up()
```

Warm up the OpenAI responses chat generator.

This will warm up the tools registered in the chat generator.
This method is idempotent and will only warm up the tools once.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – The serialized component as a dictionary.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> OpenAIResponsesChatGenerator
```

Deserialize this component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.

**Returns:**

- <code>OpenAIResponsesChatGenerator</code> – The deserialized component instance.

#### run

```python
run(
    messages: list[ChatMessage],
    *,
    streaming_callback: StreamingCallbackT | None = None,
    generation_kwargs: dict[str, Any] | None = None,
    tools: ToolsType | list[dict] | None = None,
    tools_strict: bool | None = None
) -> dict[str, list[ChatMessage]]
```

Invokes response generation based on the provided messages and generation parameters.

**Parameters:**

- **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage instances representing the input messages.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. These parameters will
override the parameters passed during component initialization.
For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create).
- **tools** (<code>ToolsType | list\[dict\] | None</code>) – The tools that the model can use to prepare calls. If set, it will override the
`tools` parameter set during component initialization. This parameter accepts either a
mixed list of Haystack `Tool` and `Toolset` objects, or a list of
OpenAI/MCP tool definition dictionaries.
Note: You cannot pass OpenAI/MCP tools and Haystack tools together.
For details on tool support, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create#responses-create-tools).
- **tools_strict** (<code>bool | None</code>) – Whether to enable strict schema adherence for tool calls. If set to `False`, the model may not exactly
follow the schema provided in the `parameters` field of the tool definition. In the Responses API, tool calls
are strict by default.
If set, it will override the `tools_strict` parameter set during component initialization.

**Returns:**

- <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following key:
- `replies`: A list containing the generated responses as ChatMessage instances.

#### run_async

```python
run_async(
    messages: list[ChatMessage],
    *,
    streaming_callback: StreamingCallbackT | None = None,
    generation_kwargs: dict[str, Any] | None = None,
    tools: ToolsType | list[dict] | None = None,
    tools_strict: bool | None = None
) -> dict[str, list[ChatMessage]]
```

Asynchronously invokes response generation based on the provided messages and generation parameters.

This is the asynchronous version of the `run` method. It has the same parameters and return values
but can be used with `await` in async code.

**Parameters:**

- **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage instances representing the input messages.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
Must be a coroutine.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. These parameters will
override the parameters passed during component initialization.
For details on OpenAI API parameters, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create).
- **tools** (<code>ToolsType | list\[dict\] | None</code>) – A list of tools or a Toolset for which the model can prepare calls. If set, it will override the
`tools` parameter set during component initialization. This parameter accepts either a
mixed list of Haystack `Tool` and `Toolset` objects. 
Or you can pass a list of
OpenAI/MCP tool definition dictionaries.
Note: You cannot pass OpenAI/MCP tools and Haystack tools together.
- **tools_strict** (<code>bool | None</code>) – Whether to enable strict schema adherence for tool calls. If set to `False`, the model may not exactly
follow the schema provided in the `parameters` field of the tool definition. In the Responses API, tool calls
are strict by default.
If set, it will override the `tools_strict` parameter set during component initialization.

**Returns:**

- <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following key:
- `replies`: A list containing the generated responses as ChatMessage instances.

## hugging_face_api

### HuggingFaceAPIGenerator

Generates text using Hugging Face APIs.

Use it with the following Hugging Face APIs:

- [Paid Inference Endpoints](https://huggingface.co/inference-endpoints)
- [Self-hosted Text Generation Inference](https://github.com/huggingface/text-generation-inference)

**Note:** As of July 2025, the Hugging Face Inference API no longer offers generative models through the
`text_generation` endpoint. Generative models are now only available through providers supporting the
`chat_completion` endpoint. As a result, this component might no longer work with the Hugging Face Inference API.
Use the `HuggingFaceAPIChatGenerator` component, which supports the `chat_completion` endpoint.

### Usage examples

#### With Hugging Face Inference Endpoints

```python
from haystack.components.generators import HuggingFaceAPIGenerator
from haystack.utils import Secret

generator = HuggingFaceAPIGenerator(api_type="inference_endpoints",
                                    api_params={"url": "<your-inference-endpoint-url>"},
                                    token=Secret.from_token("<your-api-key>"))

result = generator.run(prompt="What's Natural Language Processing?")
print(result)
```

#### With self-hosted text generation inference

```python
from haystack.components.generators import HuggingFaceAPIGenerator

generator = HuggingFaceAPIGenerator(api_type="text_generation_inference",
                                    api_params={"url": "http://localhost:8080"})

result = generator.run(prompt="What's Natural Language Processing?")
print(result)
```

#### With the free serverless inference API

Be aware that this example might not work, as the Hugging Face Inference API no longer offers models that support the
`text_generation` endpoint. Use the `HuggingFaceAPIChatGenerator` for generative models through the
`chat_completion` endpoint.

```python
from haystack.components.generators import HuggingFaceAPIGenerator
from haystack.utils import Secret

generator = HuggingFaceAPIGenerator(api_type="serverless_inference_api",
                                    api_params={"model": "HuggingFaceH4/zephyr-7b-beta"},
                                    token=Secret.from_token("<your-api-key>"))

result = generator.run(prompt="What's Natural Language Processing?")
print(result)
```

#### __init__

```python
__init__(
    api_type: HFGenerationAPIType | str,
    api_params: dict[str, str],
    token: Secret | None = Secret.from_env_var(
        ["HF_API_TOKEN", "HF_TOKEN"], strict=False
    ),
    generation_kwargs: dict[str, Any] | None = None,
    stop_words: list[str] | None = None,
    streaming_callback: StreamingCallbackT | None = None,
)
```

Initialize the HuggingFaceAPIGenerator instance.

**Parameters:**

- **api_type** (<code>HFGenerationAPIType | str</code>) – The type of Hugging Face API to use. Available types:
- `text_generation_inference`: See [TGI](https://github.com/huggingface/text-generation-inference).
- `inference_endpoints`: See [Inference Endpoints](https://huggingface.co/inference-endpoints).
- `serverless_inference_api`: See [Serverless Inference API](https://huggingface.co/inference-api).
This might no longer work due to changes in the models offered in the Hugging Face Inference API.
Please use the `HuggingFaceAPIChatGenerator` component instead.
- **api_params** (<code>dict\[str, str\]</code>) – A dictionary with the following keys:
- `model`: Hugging Face model ID. Required when `api_type` is `SERVERLESS_INFERENCE_API`.
- `url`: URL of the inference endpoint. Required when `api_type` is `INFERENCE_ENDPOINTS` or
`TEXT_GENERATION_INFERENCE`.
- Other parameters specific to the chosen API type, such as `timeout`, `headers`, `provider`, etc.
- **token** (<code>Secret | None</code>) – The Hugging Face token to use as HTTP bearer authorization.
Check your HF token in your [account settings](https://huggingface.co/settings/tokens).
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary with keyword arguments to customize text generation. Some examples: `max_new_tokens`,
`temperature`, `top_k`, `top_p`.
For details, see the [Hugging Face documentation](https://huggingface.co/docs/huggingface_hub/en/package_reference/inference_client#huggingface_hub.InferenceClient.text_generation).
- **stop_words** (<code>list\[str\] | None</code>) – An optional list of strings representing the stop words.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – A dictionary containing the serialized component.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> HuggingFaceAPIGenerator
```

Deserialize this component from a dictionary.

#### run

```python
run(
    prompt: str,
    streaming_callback: StreamingCallbackT | None = None,
    generation_kwargs: dict[str, Any] | None = None,
)
```

Invoke the text generation inference for the given prompt and generation parameters.

**Parameters:**

- **prompt** (<code>str</code>) – A string representing the prompt.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation.

**Returns:**

- – A dictionary with the generated replies and metadata. Both are lists of length n.
- replies: A list of strings representing the generated replies.
- meta: A list of dictionaries containing the metadata for each reply.

## hugging_face_local

### HuggingFaceLocalGenerator

Generates text using models from Hugging Face that run locally.

LLMs running locally may need powerful hardware.

### Usage example

```python
from haystack.components.generators import HuggingFaceLocalGenerator

generator = HuggingFaceLocalGenerator(
    model="Qwen/Qwen3-0.6B",
    task="text-generation",
    generation_kwargs={"max_new_tokens": 100, "temperature": 0.9}
)

print(generator.run("Who is the best American actor?"))
# {'replies': ['John Cusack']}
```

#### __init__

```python
__init__(
    model: str = "Qwen/Qwen3-0.6B",
    task: Literal["text-generation", "text2text-generation"] | None = None,
    device: ComponentDevice | None = None,
    token: Secret | None = Secret.from_env_var(
        ["HF_API_TOKEN", "HF_TOKEN"], strict=False
    ),
    generation_kwargs: dict[str, Any] | None = None,
    huggingface_pipeline_kwargs: dict[str, Any] | None = None,
    stop_words: list[str] | None = None,
    streaming_callback: StreamingCallbackT | None = None,
)
```

Creates an instance of a HuggingFaceLocalGenerator.

**Parameters:**

- **model** (<code>str</code>) – The Hugging Face text generation model name or path.
- **task** (<code>Literal['text-generation', 'text2text-generation'] | None</code>) – The task for the Hugging Face pipeline. Possible options:
- `text-generation`: Supported by decoder models, like GPT.
- `text2text-generation`: Deprecated as of Transformers v5; use `text-generation` instead.
Previously supported by encoder–decoder models such as T5.
If the task is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
If not specified, the component calls the Hugging Face API to infer the task from the model name.
- **device** (<code>ComponentDevice | None</code>) – The device for loading the model. If `None`, automatically selects the default device.
If a device or device map is specified in `huggingface_pipeline_kwargs`, it overrides this parameter.
- **token** (<code>Secret | None</code>) – The token to use as HTTP bearer authorization for remote files.
If the token is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary with keyword arguments to customize text generation.
Some examples: `max_length`, `max_new_tokens`, `temperature`, `top_k`, `top_p`.
See Hugging Face's documentation for more information:
- [customize-text-generation](https://huggingface.co/docs/transformers/main/en/generation_strategies#customize-text-generation)
- [transformers.GenerationConfig](https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.GenerationConfig)
- **huggingface_pipeline_kwargs** (<code>dict\[str, Any\] | None</code>) – Dictionary with keyword arguments to initialize the
Hugging Face pipeline for text generation.
These keyword arguments provide fine-grained control over the Hugging Face pipeline.
In case of duplication, these kwargs override `model`, `task`, `device`, and `token` init parameters.
For available kwargs, see [Hugging Face documentation](https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.pipeline.task).
In this dictionary, you can also include `model_kwargs` to specify the kwargs for model initialization:
[transformers.PreTrainedModel.from_pretrained](https://huggingface.co/docs/transformers/en/main_classes/model#transformers.PreTrainedModel.from_pretrained)
- **stop_words** (<code>list\[str\] | None</code>) – If the model generates a stop word, the generation stops.
If you provide this parameter, don't specify the `stopping_criteria` in `generation_kwargs`.
For some chat models, the output includes both the new text and the original prompt.
In these cases, make sure your prompt has no stop words.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses.

#### warm_up

```python
warm_up()
```

Initializes the component.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> HuggingFaceLocalGenerator
```

Deserializes the component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary to deserialize from.

**Returns:**

- <code>HuggingFaceLocalGenerator</code> – The deserialized component.

#### run

```python
run(
    prompt: str,
    streaming_callback: StreamingCallbackT | None = None,
    generation_kwargs: dict[str, Any] | None = None,
)
```

Run the text generation model on the given prompt.

**Parameters:**

- **prompt** (<code>str</code>) – A string representing the prompt.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation.

**Returns:**

- – A dictionary containing the generated replies.
- replies: A list of strings representing the generated replies.

## openai

### OpenAIGenerator

Generates text using OpenAI's large language models (LLMs).

It works with the gpt-4 and gpt-5 series models and supports streaming responses
from the OpenAI API. It uses strings as input and output.

You can customize how the text is generated by passing parameters to the
OpenAI API. Use the `**generation_kwargs` argument when you initialize
the component or when you run it. Any parameter that works with
`openai.ChatCompletion.create` will work here too.

For details on OpenAI API parameters, see
[OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).

### Usage example

```python
from haystack.components.generators import OpenAIGenerator
client = OpenAIGenerator()
response = client.run("What's Natural Language Processing? Be brief.")
print(response)

>> {'replies': ['Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on
>> the interaction between computers and human language. 
It involves enabling computers to understand, interpret,
>> and respond to natural human language in a way that is both meaningful and useful.'], 'meta': [{'model':
>> 'gpt-5-mini', 'index': 0, 'finish_reason': 'stop', 'usage': {'prompt_tokens': 16,
>> 'completion_tokens': 49, 'total_tokens': 65}}]}
```

#### __init__

```python
__init__(
    api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
    model: str = "gpt-5-mini",
    streaming_callback: StreamingCallbackT | None = None,
    api_base_url: str | None = None,
    organization: str | None = None,
    system_prompt: str | None = None,
    generation_kwargs: dict[str, Any] | None = None,
    timeout: float | None = None,
    max_retries: int | None = None,
    http_client_kwargs: dict[str, Any] | None = None,
)
```

Creates an instance of OpenAIGenerator. Unless specified otherwise in `model`, uses OpenAI's gpt-5-mini.

By setting the 'OPENAI_TIMEOUT' and 'OPENAI_MAX_RETRIES' environment variables, you can change the
timeout and max_retries parameters in the OpenAI client.

**Parameters:**

- **api_key** (<code>Secret</code>) – The OpenAI API key to connect to OpenAI.
- **model** (<code>str</code>) – The name of the model to use.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
The callback function accepts StreamingChunk as an argument.
- **api_base_url** (<code>str | None</code>) – An optional base URL.
- **organization** (<code>str | None</code>) – The Organization ID, defaults to `None`.
- **system_prompt** (<code>str | None</code>) – The system prompt to use for text generation. If not provided, the system prompt is
omitted, and the default system prompt of the model is used.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model. 
These parameters are all sent directly to
the OpenAI endpoint. See OpenAI [documentation](https://platform.openai.com/docs/api-reference/chat) for
more details.
Some of the supported parameters:
- `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion,
including visible output tokens and reasoning tokens.
- `temperature`: What sampling temperature to use. Higher values mean the model will take more risks.
Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
- `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
considers the results of the tokens with top_p probability mass. So, 0.1 means only the tokens
comprising the top 10% probability mass are considered.
- `n`: How many completions to generate for each prompt. For example, if the LLM gets 3 prompts and n is 2,
it will generate two completions for each of the three prompts, ending up with 6 completions in total.
- `stop`: One or more sequences after which the LLM should stop generating tokens.
- `presence_penalty`: The penalty to apply if a token is already present in the text. Higher values make
the model less likely to repeat the same token.
- `frequency_penalty`: The penalty to apply if a token has already been generated in the text.
Higher values make the model less likely to repeat the same token.
- `logit_bias`: Adds a logit bias to specific tokens. The keys of the dictionary are tokens, and the
values are the bias to add to each token.
- **timeout** (<code>float | None</code>) – Timeout for OpenAI client calls. If not set, it is inferred from the `OPENAI_TIMEOUT` environment
variable or defaults to 30 seconds.
- **max_retries** (<code>int | None</code>) – Maximum number of retries to contact OpenAI after an internal error. If not set, it is inferred
  from the `OPENAI_MAX_RETRIES` environment variable or set to 5.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
  For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – The serialized component as a dictionary.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> OpenAIGenerator
```

Deserialize this component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.

**Returns:**

- <code>OpenAIGenerator</code> – The deserialized component instance.

#### run

```python
run(
    prompt: str,
    system_prompt: str | None = None,
    streaming_callback: StreamingCallbackT | None = None,
    generation_kwargs: dict[str, Any] | None = None,
) -> dict[str, list[str] | list[dict[str, Any]]]
```

Invoke the text generation inference based on the provided prompt and generation parameters.

**Parameters:**

- **prompt** (<code>str</code>) – The string prompt to use for text generation.
- **system_prompt** (<code>str | None</code>) – The system prompt to use for text generation. If omitted at run time, the system prompt
  defined at initialization time, if any, is used.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. These parameters override the parameters
  passed to the `__init__` method. For more details on the parameters supported by the OpenAI API, refer to
  the OpenAI [documentation](https://platform.openai.com/docs/api-reference/chat/create).

**Returns:**

- <code>dict\[str, list\[str\] | list\[dict\[str, Any\]\]\]</code> – A dictionary containing a list of strings with the generated responses and a list of dictionaries
  with the metadata for each response.

## openai_dalle

### DALLEImageGenerator

Generates images using OpenAI's DALL-E model.

For details on OpenAI API parameters, see
[OpenAI documentation](https://platform.openai.com/docs/api-reference/images/create).

### Usage example

```python
from haystack.components.generators import DALLEImageGenerator
image_generator = DALLEImageGenerator()
response = image_generator.run("Show me a picture of a black cat.")
print(response)
```

#### __init__

```python
__init__(
    model: str = "dall-e-3",
    quality: Literal["standard", "hd"] = "standard",
    size: Literal[
        "256x256", "512x512", "1024x1024", "1792x1024", "1024x1792"
    ] = "1024x1024",
    response_format: Literal["url", "b64_json"] = "url",
    api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
    api_base_url: str | None = None,
    organization: str | None = None,
    timeout: float | None = None,
    max_retries: int | None = None,
    http_client_kwargs: dict[str, Any] | None = None,
)
```

Creates an instance of DALLEImageGenerator. Unless specified otherwise in `model`, uses OpenAI's dall-e-3.

**Parameters:**

- **model** (<code>str</code>) – The model to use for image generation. Can be "dall-e-2" or "dall-e-3".
- **quality** (<code>Literal['standard', 'hd']</code>) – The quality of the generated image. Can be "standard" or "hd".
- **size** (<code>Literal['256x256', '512x512', '1024x1024', '1792x1024', '1024x1792']</code>) – The size of the generated images.
  Must be one of 256x256, 512x512, or 1024x1024 for dall-e-2.
  Must be one of 1024x1024, 1792x1024, or 1024x1792 for dall-e-3.
- **response_format** (<code>Literal['url', 'b64_json']</code>) – The format of the response. Can be "url" or "b64_json".
- **api_key** (<code>Secret</code>) – The OpenAI API key to connect to OpenAI.
- **api_base_url** (<code>str | None</code>) – An optional base URL.
- **organization** (<code>str | None</code>) – The Organization ID, defaults to `None`.
- **timeout** (<code>float | None</code>) – Timeout for OpenAI client calls. If not set, it is inferred from the `OPENAI_TIMEOUT` environment variable
  or set to 30.
- **max_retries** (<code>int | None</code>) – Maximum number of retries to contact OpenAI after an internal error. If not set, it is inferred
  from the `OPENAI_MAX_RETRIES` environment variable or set to 5.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
  For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).

#### warm_up

```python
warm_up() -> None
```

Warm up the OpenAI client.
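
The allowed values of `size` depend on the model. As a quick illustration, the constraints above can be expressed as a lookup; `ALLOWED_SIZES` and `validate_size` are hypothetical names for this sketch, not part of Haystack:

```python
# Hypothetical helper (not part of Haystack): checks that a requested image
# size is allowed for the chosen DALL-E model, mirroring the constraints above.
ALLOWED_SIZES = {
    "dall-e-2": {"256x256", "512x512", "1024x1024"},
    "dall-e-3": {"1024x1024", "1792x1024", "1024x1792"},
}

def validate_size(model: str, size: str) -> str:
    allowed = ALLOWED_SIZES.get(model)
    if allowed is None:
        raise ValueError(f"unknown model: {model!r}")
    if size not in allowed:
        raise ValueError(f"{size!r} is not valid for {model}; choose one of {sorted(allowed)}")
    return size

print(validate_size("dall-e-3", "1792x1024"))  # prints: 1792x1024
```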

#### run

```python
run(
    prompt: str,
    size: (
        Literal["256x256", "512x512", "1024x1024", "1792x1024", "1024x1792"]
        | None
    ) = None,
    quality: Literal["standard", "hd"] | None = None,
    response_format: Literal["url", "b64_json"] | None = None,
)
```

Invokes the image generation inference based on the provided prompt and generation parameters.

**Parameters:**

- **prompt** (<code>str</code>) – The prompt to generate the image.
- **size** (<code>Literal['256x256', '512x512', '1024x1024', '1792x1024', '1024x1792'] | None</code>) – If provided, overrides the size provided during initialization.
- **quality** (<code>Literal['standard', 'hd'] | None</code>) – If provided, overrides the quality provided during initialization.
- **response_format** (<code>Literal['url', 'b64_json'] | None</code>) – If provided, overrides the response format provided during initialization.

**Returns:**

- – A dictionary containing the generated list of images and the revised prompt.
  Depending on the `response_format` parameter, the list of images can be URLs or base64-encoded JSON strings.
  The revised prompt is the prompt that was used to generate the image, if OpenAI made any revisions
  to the prompt.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – The serialized component as a dictionary.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> DALLEImageGenerator
```

Deserialize this component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.

**Returns:**

- <code>DALLEImageGenerator</code> – The deserialized component instance.

## utils

### print_streaming_chunk

```python
print_streaming_chunk(chunk: StreamingChunk) -> None
```

Callback function to handle and display streaming output chunks.

This function processes a `StreamingChunk` object by:

- Printing tool call metadata (if any), including function names and arguments, as they arrive.
- Printing tool call results when available.
- Printing the main content (e.g., text tokens) of the chunk as it is received.

The function outputs data directly to stdout and flushes output buffers to ensure immediate display during
streaming.

**Parameters:**

- **chunk** (<code>StreamingChunk</code>) – A chunk of streaming data containing content and optional metadata, such as tool calls and
  tool results.
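
As a rough sketch of the behavior described above, the following simplified stand-in prints each chunk's text content immediately and flushes stdout so tokens appear as they arrive. `Chunk` and `handle_chunk` are hypothetical names for this sketch; Haystack's real `StreamingChunk` and `print_streaming_chunk` also handle tool calls and metadata:

```python
import sys
from dataclasses import dataclass

# Hypothetical simplified stand-ins (not Haystack's actual StreamingChunk /
# print_streaming_chunk): the callback writes each chunk's content to stdout
# and flushes, so tokens are displayed as soon as they stream in.
@dataclass
class Chunk:
    content: str

def handle_chunk(chunk: Chunk) -> str:
    sys.stdout.write(chunk.content)
    sys.stdout.flush()
    return chunk.content

streamed = "".join(handle_chunk(Chunk(t)) for t in ["Natural ", "Language ", "Processing"])
print()  # prints: Natural Language Processing
```

With the real function, you pass `print_streaming_chunk` itself as the `streaming_callback` argument of a generator such as `OpenAIGenerator`.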