---
title: "Generators"
id: generators-api
description: "Enables text generation using LLMs."
slug: "/generators-api"
---

## azure

### AzureOpenAIGenerator

Bases: <code>OpenAIGenerator</code>

Generates text using OpenAI's large language models (LLMs).

It works with gpt-4-type models and supports streaming responses from the
OpenAI API.

You can customize how the text is generated by passing parameters to the
OpenAI API. Use the `**generation_kwargs` argument when you initialize
the component or when you run it. Any parameter that works with
`openai.ChatCompletion.create` will work here too.

For details on OpenAI API parameters, see
[OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).

### Usage example

```python
from haystack.components.generators import AzureOpenAIGenerator
from haystack.utils import Secret

client = AzureOpenAIGenerator(
    azure_endpoint="<Your Azure endpoint, e.g. https://your-company.azure.openai.com/>",
    api_key=Secret.from_token("<your-api-key>"),
    azure_deployment="<a model name, e.g. gpt-4.1-mini>")
response = client.run("What's Natural Language Processing? Be brief.")
print(response)
```

```
>> {'replies': ['Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on
>> the interaction between computers and human language. It involves enabling computers to understand, interpret,
>> and respond to natural human language in a way that is both meaningful and useful.'], 'meta': [{'model':
>> 'gpt-4.1-mini', 'index': 0, 'finish_reason': 'stop', 'usage': {'prompt_tokens': 16,
>> 'completion_tokens': 49, 'total_tokens': 65}}]}
```

#### __init__

```python
__init__(
    azure_endpoint: str | None = None,
    api_version: str | None = "2024-12-01-preview",
    azure_deployment: str | None = "gpt-4.1-mini",
    api_key: Secret | None = Secret.from_env_var(
        "AZURE_OPENAI_API_KEY", strict=False
    ),
    azure_ad_token: Secret | None = Secret.from_env_var(
        "AZURE_OPENAI_AD_TOKEN", strict=False
    ),
    organization: str | None = None,
    streaming_callback: StreamingCallbackT | None = None,
    system_prompt: str | None = None,
    timeout: float | None = None,
    max_retries: int | None = None,
    http_client_kwargs: dict[str, Any] | None = None,
    generation_kwargs: dict[str, Any] | None = None,
    default_headers: dict[str, str] | None = None,
    *,
    azure_ad_token_provider: AzureADTokenProvider | None = None
) -> None
```

Initialize the Azure OpenAI Generator.

**Parameters:**

- **azure_endpoint** (<code>str | None</code>) – The endpoint of the deployed model, for example `https://example-resource.azure.openai.com/`.
- **api_version** (<code>str | None</code>) – The version of the API to use. Defaults to 2024-12-01-preview.
- **azure_deployment** (<code>str | None</code>) – The deployment of the model, usually the model name.
- **api_key** (<code>Secret | None</code>) – The API key to use for authentication.
- **azure_ad_token** (<code>Secret | None</code>) – [Azure Active Directory token](https://www.microsoft.com/en-us/security/business/identity-access/microsoft-entra-id).
- **organization** (<code>str | None</code>) – Your organization ID, defaults to `None`. For help, see
  [Setting up your organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function called when a new token is received from the stream.
  It accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
  as an argument.
- **system_prompt** (<code>str | None</code>) – The system prompt to use for text generation. If not provided, the system
  prompt is omitted.
- **timeout** (<code>float | None</code>) – Timeout for the AzureOpenAI client. If not set, it is inferred from the
  `OPENAI_TIMEOUT` environment variable or set to 30.
- **max_retries** (<code>int | None</code>) – Maximum number of retries to contact AzureOpenAI if it returns an internal error.
  If not set, it is inferred from the `OPENAI_MAX_RETRIES` environment variable or set to 5.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
  For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model, sent directly to
  the OpenAI endpoint. See [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat) for
  more details.
  Some of the supported parameters:
  - `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion,
    including visible output tokens and reasoning tokens.
  - `temperature`: The sampling temperature to use. Higher values mean the model takes more risks.
    Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
  - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
    considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
    comprising the top 10% probability mass are considered.
  - `n`: The number of completions to generate for each prompt. For example, with 3 prompts and n=2,
    the LLM will generate two completions per prompt, resulting in 6 completions total.
  - `stop`: One or more sequences after which the LLM should stop generating tokens.
  - `presence_penalty`: The penalty applied if a token is already present.
    Higher values make the model less likely to repeat the token.
  - `frequency_penalty`: The penalty applied if a token has already been generated.
    Higher values make the model less likely to repeat the token.
  - `logit_bias`: Adds a logit bias to specific tokens. The keys of the dictionary are tokens, and the
    values are the bias to add to that token.
- **default_headers** (<code>dict\[str, str\] | None</code>) – Default headers to use for the AzureOpenAI client.
- **azure_ad_token_provider** (<code>AzureADTokenProvider | None</code>) – A function that returns an Azure Active Directory token, invoked on
  every request.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – The serialized component as a dictionary.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> AzureOpenAIGenerator
```

Deserialize this component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.

**Returns:**

- <code>AzureOpenAIGenerator</code> – The deserialized component instance.
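Since `generation_kwargs` can be passed both at initialization and at run time, it helps to see how the two sets combine. The following is a minimal sketch with plain dictionaries, assuming (as is typical for this kind of merge) that run-time values override init-time defaults; the parameter names are just examples of supported OpenAI parameters:

```python
# Init-time defaults vs. run-time overrides for generation parameters.
# Assumption: run-time kwargs take precedence, as with dict unpacking below.
init_kwargs = {"temperature": 0.2, "max_completion_tokens": 128}
run_kwargs = {"temperature": 0.9}

# Later keys win, so the run-time temperature replaces the init-time one.
effective = {**init_kwargs, **run_kwargs}
print(effective)
```

With this merge, `max_completion_tokens` keeps its init-time value while `temperature` takes the run-time value.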
## chat/azure

### AzureOpenAIChatGenerator

Bases: <code>OpenAIChatGenerator</code>

Generates text using OpenAI's models on Azure.

It works with gpt-4-type models and supports streaming responses from the
OpenAI API. It uses [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
format in input and output.

You can customize how the text is generated by passing parameters to the
OpenAI API. Use the `**generation_kwargs` argument when you initialize
the component or when you run it. Any parameter that works with
`openai.ChatCompletion.create` will work here too.

For details on OpenAI API parameters, see
[OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).

### Usage example

```python
from haystack.components.generators.chat import AzureOpenAIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret

messages = [ChatMessage.from_user("What's Natural Language Processing?")]

client = AzureOpenAIChatGenerator(
    azure_endpoint="<Your Azure endpoint, e.g. https://your-company.azure.openai.com/>",
    api_key=Secret.from_token("<your-api-key>"),
    azure_deployment="<a model name, e.g. gpt-4.1-mini>")
response = client.run(messages)
print(response)
```

```
{'replies':
    [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=[TextContent(text=
    "Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on
    enabling computers to understand, interpret, and generate human language in a way that is useful.")],
    _name=None,
    _meta={'model': 'gpt-4.1-mini', 'index': 0, 'finish_reason': 'stop',
    'usage': {'prompt_tokens': 15, 'completion_tokens': 36, 'total_tokens': 51}})]
}
```

#### SUPPORTED_MODELS

```python
SUPPORTED_MODELS: list[str] = [
    "gpt-5.4",
    "gpt-5.4-pro",
    "gpt-5.3-codex",
    "gpt-5.2",
    "gpt-5.2-codex",
    "gpt-5.2-chat",
    "gpt-5.1",
    "gpt-5.1-chat",
    "gpt-5.1-codex",
    "gpt-5.1-codex-mini",
    "gpt-5",
    "gpt-5-mini",
    "gpt-5-nano",
    "gpt-5-chat",
    "gpt-4.1",
    "gpt-4.1-mini",
    "gpt-4.1-nano",
    "gpt-4o",
    "gpt-4o-mini",
    "gpt-4o-audio-preview",
    "gpt-realtime-1.5",
    "gpt-audio-1.5",
    "o1",
    "o1-mini",
    "o3",
    "o3-mini",
    "o4-mini",
    "codex-mini",
    "gpt-4",
    "gpt-35-turbo",
    "gpt-oss-120b",
    "computer-use-preview",
]
```

A non-exhaustive list of chat models supported by this component.
See https://learn.microsoft.com/en-us/azure/foundry/foundry-models/concepts/models-sold-directly-by-azure
for the full list.
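Because the list is non-exhaustive, a pre-flight check against it should warn rather than fail hard. A minimal sketch of that pattern follows; the short local `SUPPORTED_MODELS` list here is a hypothetical excerpt standing in for the class attribute of the same name:

```python
# Pre-flight check against a supported-models list. This local excerpt stands
# in for AzureOpenAIChatGenerator.SUPPORTED_MODELS in the sketch.
SUPPORTED_MODELS = ["gpt-4.1-mini", "gpt-4o", "gpt-4o-mini", "gpt-35-turbo"]

def check_deployment(name: str) -> bool:
    """Return True if the deployment name is in the known-supported list."""
    if name not in SUPPORTED_MODELS:
        # The list is non-exhaustive: an unknown name may still work on Azure.
        print(f"warning: {name!r} not in SUPPORTED_MODELS; it may still work")
        return False
    return True

print(check_deployment("gpt-4o"))         # a known model
print(check_deployment("my-custom-dep"))  # an unknown deployment name
```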
#### __init__

```python
__init__(
    azure_endpoint: str | None = None,
    api_version: str | None = "2024-12-01-preview",
    azure_deployment: str | None = "gpt-4.1-mini",
    api_key: Secret | None = Secret.from_env_var(
        "AZURE_OPENAI_API_KEY", strict=False
    ),
    azure_ad_token: Secret | None = Secret.from_env_var(
        "AZURE_OPENAI_AD_TOKEN", strict=False
    ),
    organization: str | None = None,
    streaming_callback: StreamingCallbackT | None = None,
    timeout: float | None = None,
    max_retries: int | None = None,
    generation_kwargs: dict[str, Any] | None = None,
    default_headers: dict[str, str] | None = None,
    tools: ToolsType | None = None,
    tools_strict: bool = False,
    *,
    azure_ad_token_provider: (
        AzureADTokenProvider | AsyncAzureADTokenProvider | None
    ) = None,
    http_client_kwargs: dict[str, Any] | None = None
) -> None
```

Initialize the Azure OpenAI Chat Generator component.

**Parameters:**

- **azure_endpoint** (<code>str | None</code>) – The endpoint of the deployed model, for example `"https://example-resource.azure.openai.com/"`.
- **api_version** (<code>str | None</code>) – The version of the API to use. Defaults to 2024-12-01-preview.
- **azure_deployment** (<code>str | None</code>) – The deployment of the model, usually the model name.
- **api_key** (<code>Secret | None</code>) – The API key to use for authentication.
- **azure_ad_token** (<code>Secret | None</code>) – [Azure Active Directory token](https://www.microsoft.com/en-us/security/business/identity-access/microsoft-entra-id).
- **organization** (<code>str | None</code>) – Your organization ID, defaults to `None`. For help, see
  [Setting up your organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function called when a new token is received from the stream.
  It accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
  as an argument.
- **timeout** (<code>float | None</code>) – Timeout for OpenAI client calls. If not set, it defaults to either the
  `OPENAI_TIMEOUT` environment variable or 30 seconds.
- **max_retries** (<code>int | None</code>) – Maximum number of retries to contact OpenAI after an internal error.
  If not set, it defaults to either the `OPENAI_MAX_RETRIES` environment variable or 5.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model. These parameters are sent directly to
  the OpenAI endpoint. For details, see [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).
  Some of the supported parameters:
  - `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion,
    including visible output tokens and reasoning tokens.
  - `temperature`: The sampling temperature to use. Higher values mean the model takes more risks.
    Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
  - `top_p`: Nucleus sampling is an alternative to sampling with temperature, where the model considers
    tokens with a top_p probability mass. For example, 0.1 means only the tokens comprising
    the top 10% probability mass are considered.
  - `n`: The number of completions to generate for each prompt. For example, with 3 prompts and n=2,
    the LLM will generate two completions per prompt, resulting in 6 completions total.
  - `stop`: One or more sequences after which the LLM should stop generating tokens.
  - `presence_penalty`: The penalty applied if a token is already present.
    Higher values make the model less likely to repeat the token.
  - `frequency_penalty`: The penalty applied if a token has already been generated.
    Higher values make the model less likely to repeat the token.
  - `logit_bias`: Adds a logit bias to specific tokens. The keys of the dictionary are tokens, and the
    values are the bias to add to that token.
  - `response_format`: A JSON schema or a Pydantic model that enforces the structure of the model's response.
    If provided, the output will always be validated against this
    format (unless the model returns a tool call).
    For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
    Notes:
    - This parameter accepts Pydantic models and JSON schemas for the latest models, starting from GPT-4o.
      Older models only support a basic version of structured outputs through `{"type": "json_object"}`.
      For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
    - For structured outputs with streaming,
      the `response_format` must be a JSON schema and not a Pydantic model.
- **default_headers** (<code>dict\[str, str\] | None</code>) – Default headers to use for the AzureOpenAI client.
- **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
- **tools_strict** (<code>bool</code>) – Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
  the schema provided in the `parameters` field of the tool definition, but this may increase latency.
- **azure_ad_token_provider** (<code>AzureADTokenProvider | AsyncAzureADTokenProvider | None</code>) – A function that returns an Azure Active Directory token, invoked on
  every request.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
  For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).

#### warm_up

```python
warm_up() -> None
```

Warm up the Azure OpenAI chat generator.

This warms up the tools registered in the chat generator.
The method is idempotent and only warms up the tools once.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – The serialized component as a dictionary.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> AzureOpenAIChatGenerator
```

Deserialize this component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.

**Returns:**

- <code>AzureOpenAIChatGenerator</code> – The deserialized component instance.

## chat/azure_responses

### AzureOpenAIResponsesChatGenerator

Bases: <code>OpenAIResponsesChatGenerator</code>

Completes chats using OpenAI's Responses API on Azure.

It works with gpt-5 and o-series models and supports streaming responses
from the OpenAI API. It uses [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
format in input and output.

You can customize how the text is generated by passing parameters to the
OpenAI API. Use the `**generation_kwargs` argument when you initialize
the component or when you run it. Any parameter that works with
`openai.Responses.create` will work here too.

For details on OpenAI API parameters, see
[OpenAI documentation](https://platform.openai.com/docs/api-reference/responses).
### Usage example

```python
from haystack.components.generators.chat import AzureOpenAIResponsesChatGenerator
from haystack.dataclasses import ChatMessage

messages = [ChatMessage.from_user("What's Natural Language Processing?")]

client = AzureOpenAIResponsesChatGenerator(
    azure_endpoint="https://example-resource.azure.openai.com/",
    generation_kwargs={"reasoning": {"effort": "low", "summary": "auto"}}
)
response = client.run(messages)
print(response)
```

#### SUPPORTED_MODELS

```python
SUPPORTED_MODELS: list[str] = [
    "gpt-5.4-pro",
    "gpt-5.4",
    "gpt-5.3-chat",
    "gpt-5.3-codex",
    "gpt-5.2-codex",
    "gpt-5.2",
    "gpt-5.2-chat",
    "gpt-5.1-codex-max",
    "gpt-5.1",
    "gpt-5.1-chat",
    "gpt-5.1-codex",
    "gpt-5.1-codex-mini",
    "gpt-5-pro",
    "gpt-5-codex",
    "gpt-5",
    "gpt-5-mini",
    "gpt-5-nano",
    "gpt-5-chat",
    "gpt-4o",
    "gpt-4o-mini",
    "computer-use-preview",
    "gpt-4.1",
    "gpt-4.1-nano",
    "gpt-4.1-mini",
    "gpt-image-1",
    "gpt-image-1-mini",
    "gpt-image-1.5",
    "o1",
    "o3-mini",
    "o3",
    "o4-mini",
]
```

A non-exhaustive list of chat models supported by this component.
See https://learn.microsoft.com/en-us/azure/foundry/openai/how-to/responses#model-support for the full list.
#### __init__

```python
__init__(
    *,
    api_key: (
        Secret | Callable[[], str] | Callable[[], Awaitable[str]]
    ) = Secret.from_env_var("AZURE_OPENAI_API_KEY", strict=False),
    azure_endpoint: str | None = None,
    azure_deployment: str = "gpt-5-mini",
    streaming_callback: StreamingCallbackT | None = None,
    organization: str | None = None,
    generation_kwargs: dict[str, Any] | None = None,
    timeout: float | None = None,
    max_retries: int | None = None,
    tools: ToolsType | None = None,
    tools_strict: bool = False,
    http_client_kwargs: dict[str, Any] | None = None
) -> None
```

Initialize the AzureOpenAIResponsesChatGenerator component.

**Parameters:**

- **api_key** (<code>Secret | Callable\[[], str\] | Callable\[[], Awaitable\[str\]\]</code>) – The API key to use for authentication. Can be:
  - A `Secret` object containing the API key.
  - A `Secret` object containing the [Azure Active Directory token](https://www.microsoft.com/en-us/security/business/identity-access/microsoft-entra-id).
  - A function that returns an Azure Active Directory token.
- **azure_endpoint** (<code>str | None</code>) – The endpoint of the deployed model, for example `"https://example-resource.azure.openai.com/"`.
- **azure_deployment** (<code>str</code>) – The deployment of the model, usually the model name.
- **organization** (<code>str | None</code>) – Your organization ID, defaults to `None`. For help, see
  [Setting up your organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function called when a new token is received from the stream.
  It accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
  as an argument.
- **timeout** (<code>float | None</code>) – Timeout for OpenAI client calls. If not set, it defaults to either the
  `OPENAI_TIMEOUT` environment variable or 30 seconds.
- **max_retries** (<code>int | None</code>) – Maximum number of retries to contact OpenAI after an internal error.
  If not set, it defaults to either the `OPENAI_MAX_RETRIES` environment variable or 5.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model. These parameters are sent
  directly to the OpenAI endpoint.
  See OpenAI [documentation](https://platform.openai.com/docs/api-reference/responses) for
  more details.
  Some of the supported parameters:
  - `temperature`: What sampling temperature to use. Higher values like 0.8 will make the output more random,
    while lower values like 0.2 will make it more focused and deterministic.
  - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
    considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
    comprising the top 10% probability mass are considered.
  - `previous_response_id`: The ID of the previous response.
    Use this to create multi-turn conversations.
  - `text_format`: A Pydantic model that enforces the structure of the model's response.
    If provided, the output will always be validated against this
    format (unless the model returns a tool call).
    For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
  - `text`: A JSON schema that enforces the structure of the model's response.
    If provided, the output will always be validated against this
    format (unless the model returns a tool call).
    Notes:
    - Both JSON schemas and Pydantic models are supported for the latest models, starting from GPT-4o.
    - If both are provided, `text_format` takes precedence and the JSON schema passed to `text` is ignored.
    - Currently, this component doesn't support streaming for structured outputs.
    - Older models only support a basic version of structured outputs through `{"type": "json_object"}`.
      For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
  - `reasoning`: A dictionary of parameters for reasoning. For example:
    - `summary`: The summary of the reasoning.
    - `effort`: The level of effort to put into the reasoning. Can be `low`, `medium` or `high`.
    - `generate_summary`: Whether to generate a summary of the reasoning.
    Note: OpenAI does not return the reasoning tokens, but you can view the summary if it is enabled.
    For details, see the [OpenAI Reasoning documentation](https://platform.openai.com/docs/guides/reasoning).
- **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
- **tools_strict** (<code>bool</code>) – Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
  the schema provided in the `parameters` field of the tool definition, but this may increase latency.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
  For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – The serialized component as a dictionary.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> AzureOpenAIResponsesChatGenerator
```

Deserialize this component from a dictionary.
**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.

**Returns:**

- <code>AzureOpenAIResponsesChatGenerator</code> – The deserialized component instance.

## chat/fallback

### FallbackChatGenerator

A chat generator wrapper that tries multiple chat generators sequentially.

It forwards all parameters transparently to the underlying chat generators and returns the first successful result.
Calls chat generators sequentially until one succeeds. Falls back on any exception raised by a generator.
If all chat generators fail, it raises a RuntimeError with details.

Timeout enforcement is fully delegated to the underlying chat generators. The fallback mechanism only
works correctly if the underlying chat generators implement proper timeout handling and raise exceptions
when timeouts occur. For predictable latency guarantees, ensure your chat generators:

- Support a `timeout` parameter in their initialization
- Implement timeout as total wall-clock time (a shared deadline for both streaming and non-streaming)
- Raise timeout exceptions (e.g., TimeoutError, asyncio.TimeoutError, httpx.TimeoutException) when exceeded

Note: Most well-implemented chat generators (OpenAI, Anthropic, Cohere, etc.) support timeout parameters
with consistent semantics. For HTTP-based LLM providers, a single timeout value (e.g., `timeout=30`)
typically applies to all connection phases: connection setup, read, write, and pool. For streaming
responses, the read timeout is the maximum gap between chunks. For non-streaming, it's the time limit for
receiving the complete response.
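The sequential try-and-fall-back behavior described above can be sketched with plain callables standing in for real chat generators. This is a minimal, hypothetical illustration of the pattern, not the component's actual implementation:

```python
# Sketch of sequential fallback: try each generator in order, record failures,
# and return the first successful result together with execution metadata.
def run_with_fallback(generators, prompt):
    failures = []
    for index, gen in enumerate(generators):
        try:
            return {"reply": gen(prompt),
                    "meta": {"successful_index": index, "failed": failures}}
        except Exception as exc:
            # Any exception (timeout, rate limit, auth, ...) triggers failover.
            failures.append(type(exc).__name__)
    raise RuntimeError(f"All generators failed: {failures}")

def flaky(prompt):
    raise TimeoutError("simulated timeout")

def stable(prompt):
    return f"echo: {prompt}"

result = run_with_fallback([flaky, stable], "hello")
print(result)
```

Here the first generator times out, so the second one serves the request and the metadata records one failed attempt.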
Failover is automatically triggered when a generator raises any exception, including:

- Timeout errors (if the generator implements and raises them)
- Rate limit errors (429)
- Authentication errors (401)
- Context length errors (400)
- Server errors (500+)
- Any other exception

#### __init__

```python
__init__(chat_generators: list[ChatGenerator]) -> None
```

Creates an instance of FallbackChatGenerator.

**Parameters:**

- **chat_generators** (<code>list\[ChatGenerator\]</code>) – A non-empty list of chat generator components to try in order.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize the component, including nested chat generators when they support serialization.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> FallbackChatGenerator
```

Rebuild the component from a serialized representation, restoring nested chat generators.

#### warm_up

```python
warm_up() -> None
```

Warm up all underlying chat generators.

This method calls warm_up() on each underlying generator that supports it.

#### run

```python
run(
    messages: list[ChatMessage],
    generation_kwargs: dict[str, Any] | None = None,
    tools: ToolsType | None = None,
    streaming_callback: StreamingCallbackT | None = None,
) -> dict[str, list[ChatMessage] | dict[str, Any]]
```

Execute chat generators sequentially until one succeeds.

**Parameters:**

- **messages** (<code>list\[ChatMessage\]</code>) – The conversation history as a list of ChatMessage instances.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Optional parameters for the chat generator (e.g., temperature, max_tokens).
- **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for function calling capabilities.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – Optional callable for handling streaming responses.

**Returns:**

- <code>dict\[str, list\[ChatMessage\] | dict\[str, Any\]\]</code> – A dictionary with:
  - "replies": Generated ChatMessage instances from the first successful generator.
  - "meta": Execution metadata including successful_chat_generator_index, successful_chat_generator_class,
    total_attempts, failed_chat_generators, plus any metadata from the successful generator.

**Raises:**

- <code>RuntimeError</code> – If all chat generators fail.

#### run_async

```python
run_async(
    messages: list[ChatMessage],
    generation_kwargs: dict[str, Any] | None = None,
    tools: ToolsType | None = None,
    streaming_callback: StreamingCallbackT | None = None,
) -> dict[str, list[ChatMessage] | dict[str, Any]]
```

Asynchronously execute chat generators sequentially until one succeeds.

**Parameters:**

- **messages** (<code>list\[ChatMessage\]</code>) – The conversation history as a list of ChatMessage instances.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Optional parameters for the chat generator (e.g., temperature, max_tokens).
- **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for function calling capabilities.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – Optional callable for handling streaming responses.

**Returns:**

- <code>dict\[str, list\[ChatMessage\] | dict\[str, Any\]\]</code> – A dictionary with:
  - "replies": Generated ChatMessage instances from the first successful generator.
  - "meta": Execution metadata including successful_chat_generator_index, successful_chat_generator_class,
    total_attempts, failed_chat_generators, plus any metadata from the successful generator.

**Raises:**

- <code>RuntimeError</code> – If all chat generators fail.

## chat/hugging_face_api

### HuggingFaceAPIChatGenerator

Completes chats using Hugging Face APIs.

HuggingFaceAPIChatGenerator uses the [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
format for input and output. Use it to generate text with Hugging Face APIs:

- [Serverless Inference API (Inference Providers)](https://huggingface.co/docs/inference-providers)
- [Paid Inference Endpoints](https://huggingface.co/inference-endpoints)
- [Self-hosted Text Generation Inference](https://github.com/huggingface/text-generation-inference)

### Usage examples

#### With the serverless inference API (Inference Providers) - free tier available

```python
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret
from haystack.utils.hf import HFGenerationAPIType

messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
            ChatMessage.from_user("What's Natural Language Processing?")]

# the api_type can be expressed using the HFGenerationAPIType enum or as a string
api_type = HFGenerationAPIType.SERVERLESS_INFERENCE_API
api_type = "serverless_inference_api"  # this is equivalent to the above

generator = HuggingFaceAPIChatGenerator(api_type=api_type,
                                        api_params={"model": "Qwen/Qwen2.5-7B-Instruct",
                                                    "provider": "together"},
                                        token=Secret.from_token("<your-api-key>"))

result = generator.run(messages)
print(result)
```

#### With the serverless inference API (Inference Providers) and text+image input

```python
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage, ImageContent
from haystack.utils import Secret
from haystack.utils.hf import HFGenerationAPIType

# Create an image from file path, URL, or base64
image = ImageContent.from_file_path("path/to/your/image.jpg")

# Create a multimodal message with both text and image
messages = [ChatMessage.from_user(content_parts=["Describe this image in detail", image])]

generator = HuggingFaceAPIChatGenerator(
    api_type=HFGenerationAPIType.SERVERLESS_INFERENCE_API,
    api_params={
        "model": "Qwen/Qwen2.5-VL-7B-Instruct",  # Vision Language Model
        "provider": "hyperbolic"
    },
    token=Secret.from_token("<your-api-key>")
)

result = generator.run(messages)
print(result)
```

#### With paid inference endpoints

```python
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret

messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
            ChatMessage.from_user("What's Natural Language Processing?")]

generator = HuggingFaceAPIChatGenerator(api_type="inference_endpoints",
                                        api_params={"url": "<your-inference-endpoint-url>"},
                                        token=Secret.from_token("<your-api-key>"))

result = generator.run(messages)
print(result)
```

#### With self-hosted text generation inference

```python
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage

messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
            ChatMessage.from_user("What's Natural Language Processing?")]

generator = HuggingFaceAPIChatGenerator(api_type="text_generation_inference",
                                        api_params={"url": "http://localhost:8080"})

result = generator.run(messages)
print(result)
```

#### __init__

```python
__init__(
    api_type: HFGenerationAPIType | str,
    api_params: dict[str, str],
    token: Secret | None = Secret.from_env_var(
        ["HF_API_TOKEN", "HF_TOKEN"], strict=False
    ),
    generation_kwargs: dict[str, Any] | None = None,
    stop_words: list[str] | None = None,
    streaming_callback: StreamingCallbackT | None = None,
    tools: ToolsType | None = None,
) -> None
```

Initialize the HuggingFaceAPIChatGenerator instance.

**Parameters:**

- **api_type** (<code>HFGenerationAPIType | str</code>) – The type of Hugging Face API to use. Available types:
  - `text_generation_inference`: See [TGI](https://github.com/huggingface/text-generation-inference).
  - `inference_endpoints`: See [Inference Endpoints](https://huggingface.co/inference-endpoints).
  - `serverless_inference_api`: See
    [Serverless Inference API - Inference Providers](https://huggingface.co/docs/inference-providers).
- **api_params** (<code>dict\[str, str\]</code>) – A dictionary with the following keys:
  - `model`: Hugging Face model ID. Required when `api_type` is `SERVERLESS_INFERENCE_API`.
  - `provider`: Provider name. Recommended when `api_type` is `SERVERLESS_INFERENCE_API`.
  - `url`: URL of the inference endpoint. Required when `api_type` is `INFERENCE_ENDPOINTS` or
    `TEXT_GENERATION_INFERENCE`.
  - Other parameters specific to the chosen API type, such as `timeout`, `headers`, etc.
- **token** (<code>Secret | None</code>) – The Hugging Face token to use as HTTP bearer authorization.
  Check your HF token in your [account settings](https://huggingface.co/settings/tokens).
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary with keyword arguments to customize text generation.
  Some examples: `max_tokens`, `temperature`, `top_p`.
816 For details, see [Hugging Face chat_completion documentation](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.chat_completion). 817 - **stop_words** (<code>list\[str\] | None</code>) – An optional list of strings representing the stop words. 818 - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses. 819 - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls. 820 The chosen model should support tool/function calling, according to the model card. 821 Support for tools in the Hugging Face API and TGI is not yet fully refined and you may experience 822 unexpected behavior. 823 824 #### warm_up 825 826 ```python 827 warm_up() -> None 828 ``` 829 830 Warm up the Hugging Face API chat generator. 831 832 This will warm up the tools registered in the chat generator. 833 This method is idempotent and will only warm up the tools once. 834 835 #### to_dict 836 837 ```python 838 to_dict() -> dict[str, Any] 839 ``` 840 841 Serialize this component to a dictionary. 842 843 **Returns:** 844 845 - <code>dict\[str, Any\]</code> – A dictionary containing the serialized component. 846 847 #### from_dict 848 849 ```python 850 from_dict(data: dict[str, Any]) -> HuggingFaceAPIChatGenerator 851 ``` 852 853 Deserialize this component from a dictionary. 854 855 #### run 856 857 ```python 858 run( 859 messages: list[ChatMessage], 860 generation_kwargs: dict[str, Any] | None = None, 861 tools: ToolsType | None = None, 862 streaming_callback: StreamingCallbackT | None = None, 863 ) -> dict[str, list[ChatMessage]] 864 ``` 865 866 Invoke the text generation inference based on the provided messages and generation parameters. 867 868 **Parameters:** 869 870 - **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage objects representing the input messages. 
871 - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation.
872 - **tools** (<code>ToolsType | None</code>) – A list of tools or a Toolset for which the model can prepare calls. If set, it will override
873 the `tools` parameter set during component initialization. This parameter can accept either a
874 list of `Tool` objects or a `Toolset` instance.
875 - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses. If set, it will override the `streaming_callback`
876 parameter set during component initialization.
877 
878 **Returns:**
879 
880 - <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following keys:
881 - `replies`: A list containing the generated responses as ChatMessage objects.
882 
883 #### run_async
884 
885 ```python
886 run_async(
887 messages: list[ChatMessage],
888 generation_kwargs: dict[str, Any] | None = None,
889 tools: ToolsType | None = None,
890 streaming_callback: StreamingCallbackT | None = None,
891 ) -> dict[str, list[ChatMessage]]
892 ```
893 
894 Asynchronously invokes the text generation inference based on the provided messages and generation parameters.
895 
896 This is the asynchronous version of the `run` method. It has the same parameters
897 and return values but can be used with `await` in async code.
898 
899 **Parameters:**
900 
901 - **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage objects representing the input messages.
902 - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation.
903 - **tools** (<code>ToolsType | None</code>) – A list of tools or a Toolset for which the model can prepare calls. If set, it will override the `tools`
904 parameter set during component initialization. This parameter can accept either a list of `Tool` objects
905 or a `Toolset` instance.
906 - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses. If set, it will override the `streaming_callback` 907 parameter set during component initialization. 908 909 **Returns:** 910 911 - <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following keys: 912 - `replies`: A list containing the generated responses as ChatMessage objects. 913 914 ## chat/hugging_face_local 915 916 ### default_tool_parser 917 918 ```python 919 default_tool_parser(text: str) -> list[ToolCall] | None 920 ``` 921 922 Default implementation for parsing tool calls from model output text. 923 924 Uses DEFAULT_TOOL_PATTERN to extract tool calls. 925 926 **Parameters:** 927 928 - **text** (<code>str</code>) – The text to parse for tool calls. 929 930 **Returns:** 931 932 - <code>list\[ToolCall\] | None</code> – A list containing a single ToolCall if a valid tool call is found, None otherwise. 933 934 ### HuggingFaceLocalChatGenerator 935 936 Generates chat responses using models from Hugging Face that run locally. 937 938 Use this component with chat-based models, 939 such as `Qwen/Qwen3-0.6B` or `meta-llama/Llama-2-7b-chat-hf`. 940 LLMs running locally may need powerful hardware. 941 942 ### Usage example 943 944 ```python 945 from haystack.components.generators.chat import HuggingFaceLocalChatGenerator 946 from haystack.dataclasses import ChatMessage 947 948 generator = HuggingFaceLocalChatGenerator(model="Qwen/Qwen3-0.6B") 949 messages = [ChatMessage.from_user("What's Natural Language Processing? Be brief.")] 950 print(generator.run(messages)) 951 ``` 952 953 ``` 954 {'replies': 955 [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=[TextContent(text= 956 "Natural Language Processing (NLP) is a subfield of artificial intelligence that deals 957 with the interaction between computers and human language. 
It enables computers to understand, interpret, and
958 generate human language in a valuable way. NLP involves various techniques such as speech recognition, text
959 analysis, sentiment analysis, and machine translation. The ultimate goal is to make it easier for computers to
960 process and derive meaning from human language, improving communication between humans and machines.")],
961 _name=None,
962 _meta={'finish_reason': 'stop', 'index': 0, 'model':
963 'Qwen/Qwen3-0.6B',
964 'usage': {'completion_tokens': 90, 'prompt_tokens': 19, 'total_tokens': 109}})
965 ]
966 }
967 ```
968 
969 #### __init__
970 
971 ```python
972 __init__(
973 model: str = "Qwen/Qwen3-0.6B",
974 task: (
975 Literal["text-generation", "text2text-generation", "image-text-to-text"]
976 | None
977 ) = None,
978 device: ComponentDevice | None = None,
979 token: Secret | None = Secret.from_env_var(
980 ["HF_API_TOKEN", "HF_TOKEN"], strict=False
981 ),
982 chat_template: str | None = None,
983 generation_kwargs: dict[str, Any] | None = None,
984 huggingface_pipeline_kwargs: dict[str, Any] | None = None,
985 stop_words: list[str] | None = None,
986 streaming_callback: StreamingCallbackT | None = None,
987 tools: ToolsType | None = None,
988 tool_parsing_function: Callable[[str], list[ToolCall] | None] | None = None,
989 async_executor: ThreadPoolExecutor | None = None,
990 *,
991 enable_thinking: bool = False
992 ) -> None
993 ```
994 
995 Initializes the HuggingFaceLocalChatGenerator component.
996 
997 **Parameters:**
998 
999 - **model** (<code>str</code>) – The Hugging Face text generation model name or path,
1000 for example, `mistralai/Mistral-7B-Instruct-v0.2` or `TheBloke/OpenHermes-2.5-Mistral-7B-16k-AWQ`.
1001 The model must be a chat model supporting the ChatML messaging
1002 format.
1003 If the model is specified in `huggingface_pipeline_kwargs`, this parameter is ignored.
1004 - **task** (<code>Literal['text-generation', 'text2text-generation', 'image-text-to-text'] | None</code>) – The task for the Hugging Face pipeline. Possible options: 1005 - `text-generation`: Supported by decoder models, like GPT. 1006 - `text2text-generation`: Deprecated as of Transformers v5; use `text-generation` instead. 1007 Previously supported by encoder–decoder models such as T5. 1008 - `image-text-to-text`: Supported by vision-language models. 1009 If the task is specified in `huggingface_pipeline_kwargs`, this parameter is ignored. 1010 If not specified, the component calls the Hugging Face API to infer the task from the model name. 1011 - **device** (<code>ComponentDevice | None</code>) – The device for loading the model. If `None`, automatically selects the default device. 1012 If a device or device map is specified in `huggingface_pipeline_kwargs`, it overrides this parameter. 1013 - **token** (<code>Secret | None</code>) – The token to use as HTTP bearer authorization for remote files. 1014 If the token is specified in `huggingface_pipeline_kwargs`, this parameter is ignored. 1015 - **chat_template** (<code>str | None</code>) – Specifies an optional Jinja template for formatting chat 1016 messages. Most high-quality chat models have their own templates, but for models without this 1017 feature or if you prefer a custom template, use this parameter. 1018 - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary with keyword arguments to customize text generation. 1019 Some examples: `max_length`, `max_new_tokens`, `temperature`, `top_k`, `top_p`. 
See Hugging Face's documentation for more information:
1021 - [customize-text-generation](https://huggingface.co/docs/transformers/main/en/generation_strategies#customize-text-generation)
1022 - [GenerationConfig](https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.GenerationConfig)
1023 The only `generation_kwargs` set by default is `max_new_tokens`, which is set to 512 tokens.
1024 - **huggingface_pipeline_kwargs** (<code>dict\[str, Any\] | None</code>) – Dictionary with keyword arguments to initialize the
1025 Hugging Face pipeline for text generation.
1026 These keyword arguments provide fine-grained control over the Hugging Face pipeline.
1027 In case of duplication, these kwargs override `model`, `task`, `device`, and `token` init parameters.
1028 For kwargs, see [Hugging Face documentation](https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.pipeline.task).
1029 In this dictionary, you can also include `model_kwargs` to specify the kwargs for [model initialization](https://huggingface.co/docs/transformers/en/main_classes/model#transformers.PreTrainedModel.from_pretrained).
1030 - **stop_words** (<code>list\[str\] | None</code>) – A list of stop words. If the model generates a stop word, the generation stops.
1031 If you provide this parameter, don't specify the `stopping_criteria` in `generation_kwargs`.
1032 For some chat models, the output includes both the new text and the original prompt.
1033 In these cases, make sure your prompt has no stop words.
1034 - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses.
1035 - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
1036 - **tool_parsing_function** (<code>Callable\\[[str\], list\[ToolCall\] | None\] | None</code>) – A callable that takes a string and returns a list of ToolCall objects or None.
1037 If None, `default_tool_parser` is used, which extracts tool calls using a predefined pattern.
1038 - **async_executor** (<code>ThreadPoolExecutor | None</code>) – Optional ThreadPoolExecutor to use for async calls. If not provided, a single-threaded executor will be
1039 initialized and used.
1040 - **enable_thinking** (<code>bool</code>) – Whether to enable thinking mode in the chat template for thinking-capable models.
1041 When enabled, the model generates intermediate reasoning before the final response. Defaults to False.
1042 
1043 #### shutdown
1044 
1045 ```python
1046 shutdown() -> None
1047 ```
1048 
1049 Explicitly shuts down the executor if the component owns it.
1050 
1051 #### warm_up
1052 
1053 ```python
1054 warm_up() -> None
1055 ```
1056 
1057 Initializes the component and warms up tools if provided.
1058 
1059 #### to_dict
1060 
1061 ```python
1062 to_dict() -> dict[str, Any]
1063 ```
1064 
1065 Serializes the component to a dictionary.
1066 
1067 **Returns:**
1068 
1069 - <code>dict\[str, Any\]</code> – Dictionary with serialized data.
1070 
1071 #### from_dict
1072 
1073 ```python
1074 from_dict(data: dict[str, Any]) -> HuggingFaceLocalChatGenerator
1075 ```
1076 
1077 Deserializes the component from a dictionary.
1078 
1079 **Parameters:**
1080 
1081 - **data** (<code>dict\[str, Any\]</code>) – The dictionary to deserialize from.
1082 
1083 **Returns:**
1084 
1085 - <code>HuggingFaceLocalChatGenerator</code> – The deserialized component.
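The `tool_parsing_function` described above can be any callable with the signature `(str) -> list[ToolCall] | None`. As a rough sketch of how such a parser can work — the regex below is purely illustrative and not Haystack's actual `DEFAULT_TOOL_PATTERN`, and plain `(name, arguments)` tuples stand in for real `ToolCall` objects:

```python
import json
import re

# Hypothetical pattern: look for a JSON object containing a "name" key.
# Haystack's real DEFAULT_TOOL_PATTERN may differ.
TOOL_PATTERN = re.compile(r'\{.*"name".*\}', re.DOTALL)

def parse_tool_calls(text: str):
    """Return [(name, arguments)] if the text contains a JSON tool call, else None."""
    match = TOOL_PATTERN.search(text)
    if not match:
        return None
    try:
        payload = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    name = payload.get("name")
    if not name:
        return None
    return [(name, payload.get("arguments", {}))]

print(parse_tool_calls('Sure! {"name": "get_weather", "arguments": {"city": "Berlin"}}'))
# [('get_weather', {'city': 'Berlin'})]
```

Returning `None` (rather than an empty list) signals that the model produced plain text, so the component can treat the output as a normal assistant reply instead of a tool call.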
1086 1087 #### run 1088 1089 ```python 1090 run( 1091 messages: list[ChatMessage], 1092 generation_kwargs: dict[str, Any] | None = None, 1093 streaming_callback: StreamingCallbackT | None = None, 1094 tools: ToolsType | None = None, 1095 ) -> dict[str, list[ChatMessage]] 1096 ``` 1097 1098 Invoke text generation inference based on the provided messages and generation parameters. 1099 1100 **Parameters:** 1101 1102 - **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage objects representing the input messages. 1103 - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. 1104 - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses. 1105 - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls. 1106 If set, it will override the `tools` parameter provided during initialization. 1107 1108 **Returns:** 1109 1110 - <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following keys: 1111 - `replies`: A list containing the generated responses as ChatMessage instances. 1112 1113 #### create_message 1114 1115 ```python 1116 create_message( 1117 text: str, 1118 index: int, 1119 tokenizer: Union[PreTrainedTokenizer, PreTrainedTokenizerFast], 1120 prompt: str, 1121 generation_kwargs: dict[str, Any], 1122 parse_tool_calls: bool = False, 1123 ) -> ChatMessage 1124 ``` 1125 1126 Create a ChatMessage instance from the provided text, populated with metadata. 1127 1128 **Parameters:** 1129 1130 - **text** (<code>str</code>) – The generated text. 1131 - **index** (<code>int</code>) – The index of the generated text. 1132 - **tokenizer** (<code>Union\[PreTrainedTokenizer, PreTrainedTokenizerFast\]</code>) – The tokenizer used for generation. 1133 - **prompt** (<code>str</code>) – The prompt used for generation. 
1134 - **generation_kwargs** (<code>dict\[str, Any\]</code>) – The generation parameters.
1135 - **parse_tool_calls** (<code>bool</code>) – Whether to attempt parsing tool calls from the text.
1136 
1137 **Returns:**
1138 
1139 - <code>ChatMessage</code> – A ChatMessage instance.
1140 
1141 #### run_async
1142 
1143 ```python
1144 run_async(
1145 messages: list[ChatMessage],
1146 generation_kwargs: dict[str, Any] | None = None,
1147 streaming_callback: StreamingCallbackT | None = None,
1148 tools: ToolsType | None = None,
1149 ) -> dict[str, list[ChatMessage]]
1150 ```
1151 
1152 Asynchronously invokes text generation inference based on the provided messages and generation parameters.
1153 
1154 This is the asynchronous version of the `run` method. It has the same parameters
1155 and return values but can be used with `await` in async code.
1156 
1157 **Parameters:**
1158 
1159 - **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage objects representing the input messages.
1160 - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation.
1161 - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses.
1162 - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls.
1163 If set, it will override the `tools` parameter provided during initialization.
1164 
1165 **Returns:**
1166 
1167 - <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following keys:
1168 - `replies`: A list containing the generated responses as ChatMessage instances.
1169 
1170 ## chat/llm
1171 
1172 ### LLM
1173 
1174 Bases: <code>Agent</code>
1175 
1176 A text generation component powered by a large language model.
1177 
1178 The LLM component is a simplified version of the Agent that focuses solely on text generation
1179 without tool usage.
It processes messages and returns a single response from the language model.
1180 
1181 ### Usage example
1182 
1183 ```python
1184 from haystack.components.generators.chat import LLM
1185 from haystack.components.generators.chat import OpenAIChatGenerator
1186 from haystack.dataclasses import ChatMessage
1187 
1188 llm = LLM(
1189 chat_generator=OpenAIChatGenerator(),
1190 system_prompt="You are a helpful summarization assistant.",
1191 user_prompt="""{% message role="user" %}
1192 Summarize the following document: {{ document }}
1193 {% endmessage %}""",
1194 required_variables=["document"],
1195 )
1196 
1197 result = llm.run(document="The weather is lovely today and the sun is shining. ")
1198 print(result["last_message"].text)
1199 ```
1200 
1201 #### __init__
1202 
1203 ```python
1204 __init__(
1205 *,
1206 chat_generator: ChatGenerator,
1207 system_prompt: str | None = None,
1208 user_prompt: str | None = None,
1209 required_variables: list[str] | Literal["*"] | None = None,
1210 streaming_callback: StreamingCallbackT | None = None
1211 ) -> None
1212 ```
1213 
1214 Initialize the LLM component.
1215 
1216 **Parameters:**
1217 
1218 - **chat_generator** (<code>ChatGenerator</code>) – An instance of the chat generator that the LLM should use.
1219 - **system_prompt** (<code>str | None</code>) – System prompt for the LLM.
1220 - **user_prompt** (<code>str | None</code>) – User prompt for the LLM. If provided, this is appended to the messages provided at runtime.
1221 - **required_variables** (<code>list\[str\] | Literal['\*'] | None</code>) – List of variables that must be provided as input to `user_prompt`.
1222 If a variable listed as required is not provided, an exception is raised.
1223 If set to `"*"`, all variables found in the prompt are required. Optional.
1224 - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback that will be invoked when a response is streamed from the LLM.
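The `required_variables` check can be pictured as a simple comparison between the variable names found in the prompt template and the keyword arguments supplied at run time. A minimal sketch of that idea — illustrative only, since the real component delegates templating to Jinja:

```python
import re

def check_required_variables(template: str, required, provided: dict) -> None:
    """Raise if a required template variable is missing from the run-time kwargs.

    `required` is a list of names, or "*" to require every variable found in the
    template. Illustrative sketch, not the component's actual implementation.
    """
    # Collect {{ variable }} names from the template.
    found = set(re.findall(r"\{\{\s*(\w+)\s*\}\}", template))
    required_set = found if required == "*" else set(required or [])
    missing = required_set - provided.keys()
    if missing:
        raise ValueError(f"Missing required template variables: {sorted(missing)}")

template = "Summarize the following document: {{ document }}"
check_required_variables(template, ["document"], {"document": "some text"})  # passes
check_required_variables(template, "*", {"document": "some text"})           # passes
```

With `required_variables="*"`, forgetting to pass `document` to `run` would raise, which mirrors the behavior described for the parameter above.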
1225 1226 #### to_dict 1227 1228 ```python 1229 to_dict() -> dict[str, Any] 1230 ``` 1231 1232 Serialize the LLM component to a dictionary. 1233 1234 **Returns:** 1235 1236 - <code>dict\[str, Any\]</code> – Dictionary with serialized data. 1237 1238 #### from_dict 1239 1240 ```python 1241 from_dict(data: dict[str, Any]) -> LLM 1242 ``` 1243 1244 Deserialize the LLM from a dictionary. 1245 1246 **Parameters:** 1247 1248 - **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from. 1249 1250 **Returns:** 1251 1252 - <code>LLM</code> – Deserialized LLM instance. 1253 1254 #### run 1255 1256 ```python 1257 run( 1258 messages: list[ChatMessage] | None = None, 1259 streaming_callback: StreamingCallbackT | None = None, 1260 *, 1261 generation_kwargs: dict[str, Any] | None = None, 1262 system_prompt: str | None = None, 1263 user_prompt: str | None = None, 1264 **kwargs: Any 1265 ) -> dict[str, Any] 1266 ``` 1267 1268 Process messages and generate a response from the language model. 1269 1270 **Parameters:** 1271 1272 - **messages** (<code>list\[ChatMessage\] | None</code>) – List of Haystack ChatMessage objects to process. 1273 - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback that will be invoked when a response is streamed from the LLM. 1274 - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for the underlying chat generator. These parameters 1275 will override the parameters passed during component initialization. 1276 - **system_prompt** (<code>str | None</code>) – System prompt for the LLM. If provided, it overrides the default system prompt. 1277 - **user_prompt** (<code>str | None</code>) – User prompt for the LLM. If provided, it overrides the default user prompt and is 1278 appended to the messages provided at runtime. 1279 - **kwargs** (<code>Any</code>) – Additional keyword arguments. 
These are used to fill template variables in the `user_prompt` 1280 (the keys must match template variable names). 1281 1282 **Returns:** 1283 1284 - <code>dict\[str, Any\]</code> – A dictionary with the following keys: 1285 - "messages": List of all messages exchanged during the LLM's run. 1286 - "last_message": The last message exchanged during the LLM's run. 1287 1288 #### run_async 1289 1290 ```python 1291 run_async( 1292 messages: list[ChatMessage] | None = None, 1293 streaming_callback: StreamingCallbackT | None = None, 1294 *, 1295 generation_kwargs: dict[str, Any] | None = None, 1296 system_prompt: str | None = None, 1297 user_prompt: str | None = None, 1298 **kwargs: Any 1299 ) -> dict[str, Any] 1300 ``` 1301 1302 Asynchronously process messages and generate a response from the language model. 1303 1304 **Parameters:** 1305 1306 - **messages** (<code>list\[ChatMessage\] | None</code>) – List of Haystack ChatMessage objects to process. 1307 - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An asynchronous callback that will be invoked when a response is streamed 1308 from the LLM. 1309 - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for the underlying chat generator. These parameters 1310 will override the parameters passed during component initialization. 1311 - **system_prompt** (<code>str | None</code>) – System prompt for the LLM. If provided, it overrides the default system prompt. 1312 - **user_prompt** (<code>str | None</code>) – User prompt for the LLM. If provided, it overrides the default user prompt and is 1313 appended to the messages provided at runtime. 1314 - **kwargs** (<code>Any</code>) – Additional keyword arguments. These are used to fill template variables in the `user_prompt` 1315 (the keys must match template variable names). 
1316 
1317 **Returns:**
1318 
1319 - <code>dict\[str, Any\]</code> – A dictionary with the following keys:
1320 - "messages": List of all messages exchanged during the LLM's run.
1321 - "last_message": The last message exchanged during the LLM's run.
1322 
1323 ## chat/openai
1324 
1325 ### OpenAIChatGenerator
1326 
1327 Completes chats using OpenAI's large language models (LLMs).
1328 
1329 It works with the gpt-4 and gpt-5 series models and supports streaming responses
1330 from the OpenAI API. It uses the [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
1331 format for input and output.
1332 
1333 You can customize how the text is generated by passing parameters to the
1334 OpenAI API. Use the `**generation_kwargs` argument when you initialize
1335 the component or when you run it. Any parameter that works with
1336 `openai.ChatCompletion.create` will work here too.
1337 
1338 For details on OpenAI API parameters, see
1339 [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).
1340 1341 ### Usage example 1342 1343 ```python 1344 from haystack.components.generators.chat import OpenAIChatGenerator 1345 from haystack.dataclasses import ChatMessage 1346 1347 messages = [ChatMessage.from_user("What's Natural Language Processing?")] 1348 1349 client = OpenAIChatGenerator() 1350 response = client.run(messages) 1351 print(response) 1352 ``` 1353 1354 Output: 1355 1356 ``` 1357 {'replies': 1358 [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content= 1359 [TextContent(text="Natural Language Processing (NLP) is a branch of artificial intelligence 1360 that focuses on enabling computers to understand, interpret, and generate human language in 1361 a way that is meaningful and useful.")], 1362 _name=None, 1363 _meta={'model': 'gpt-5-mini', 'index': 0, 'finish_reason': 'stop', 1364 'usage': {'prompt_tokens': 15, 'completion_tokens': 36, 'total_tokens': 51}}) 1365 ] 1366 } 1367 ``` 1368 1369 #### SUPPORTED_MODELS 1370 1371 ```python 1372 SUPPORTED_MODELS: list[str] = [ 1373 "gpt-5-mini", 1374 "gpt-5-nano", 1375 "gpt-5", 1376 "gpt-5.1", 1377 "gpt-5.2", 1378 "gpt-5.2-pro", 1379 "gpt-5.4", 1380 "gpt-5-pro", 1381 "gpt-4.1", 1382 "gpt-4.1-mini", 1383 "gpt-4.1-nano", 1384 "gpt-4o", 1385 "gpt-4o-mini", 1386 "gpt-4-turbo", 1387 "gpt-4", 1388 "gpt-3.5-turbo", 1389 ] 1390 1391 ``` 1392 1393 A non-exhaustive list of chat models supported by this component. 1394 See https://developers.openai.com/api/docs/models for the full list and snapshot IDs. 
1395 
1396 #### __init__
1397 
1398 ```python
1399 __init__(
1400 api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
1401 model: str = "gpt-5-mini",
1402 streaming_callback: StreamingCallbackT | None = None,
1403 api_base_url: str | None = None,
1404 organization: str | None = None,
1405 generation_kwargs: dict[str, Any] | None = None,
1406 timeout: float | None = None,
1407 max_retries: int | None = None,
1408 tools: ToolsType | None = None,
1409 tools_strict: bool = False,
1410 http_client_kwargs: dict[str, Any] | None = None,
1411 ) -> None
1412 ```
1413 
1414 Creates an instance of OpenAIChatGenerator. Unless specified otherwise in `model`, it uses OpenAI's gpt-5-mini.
1415 
1416 Before initializing the component, you can set the 'OPENAI_TIMEOUT' and 'OPENAI_MAX_RETRIES'
1417 environment variables to override the `timeout` and `max_retries` parameters respectively
1418 in the OpenAI client.
1419 
1420 **Parameters:**
1421 
1422 - **api_key** (<code>Secret</code>) – The OpenAI API key.
1423 You can set it with the environment variable `OPENAI_API_KEY`, or pass it with this parameter
1424 during initialization.
1425 - **model** (<code>str</code>) – The name of the model to use.
1426 - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
1427 The callback function accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
1428 as an argument.
1429 - **api_base_url** (<code>str | None</code>) – An optional base URL.
1430 - **organization** (<code>str | None</code>) – Your organization ID, defaults to `None`. See
1431 [production best practices](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
1432 - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model. These parameters are sent directly to
1433 the OpenAI endpoint.
See OpenAI [documentation](https://platform.openai.com/docs/api-reference/chat) for
1434 more details.
1435 Some of the supported parameters:
1436 - `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion,
1437 including visible output tokens and reasoning tokens.
1438 - `temperature`: What sampling temperature to use. Higher values mean the model will take more risks.
1439 Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
1440 - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
1441 considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
1442 comprising the top 10% probability mass are considered.
1443 - `n`: How many completions to generate for each prompt. For example, if the LLM gets 3 prompts and n is 2,
1444 it will generate two completions for each of the three prompts, ending up with 6 completions in total.
1445 - `stop`: One or more sequences after which the LLM should stop generating tokens.
1446 - `presence_penalty`: A penalty applied to every token that has already appeared in the text at least once.
1447 Bigger values make the model less likely to repeat the same token.
1448 - `frequency_penalty`: A penalty that increases with how often a token has already been generated in the text.
1449 Bigger values make the model less likely to repeat the same token.
1450 - `logit_bias`: Add a logit bias to specific tokens. The keys of the dictionary are tokens, and the
1451 values are the bias to add to that token.
1452 - `response_format`: A JSON schema or a Pydantic model that enforces the structure of the model's response.
1453 If provided, the output will always be validated against this
1454 format (unless the model returns a tool call).
1455 For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
1456 Notes: 1457 - This parameter accepts Pydantic models and JSON schemas for latest models starting from GPT-4o. 1458 Older models only support basic version of structured outputs through `{"type": "json_object"}`. 1459 For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode). 1460 - For structured outputs with streaming, 1461 the `response_format` must be a JSON schema and not a Pydantic model. 1462 - **timeout** (<code>float | None</code>) – Timeout for OpenAI client calls. If not set, it defaults to either the 1463 `OPENAI_TIMEOUT` environment variable, or 30 seconds. 1464 - **max_retries** (<code>int | None</code>) – Maximum number of retries to contact OpenAI after an internal error. 1465 If not set, it defaults to either the `OPENAI_MAX_RETRIES` environment variable, or set to 5. 1466 - **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset for which the model can prepare calls. 1467 - **tools_strict** (<code>bool</code>) – Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly 1468 the schema provided in the `parameters` field of the tool definition, but this may increase latency. 1469 - **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client`or `httpx.AsyncClient`. 1470 For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client). 1471 1472 #### warm_up 1473 1474 ```python 1475 warm_up() -> None 1476 ``` 1477 1478 Warm up the OpenAI chat generator. 1479 1480 This will warm up the tools registered in the chat generator. 1481 This method is idempotent and will only warm up the tools once. 1482 1483 #### to_dict 1484 1485 ```python 1486 to_dict() -> dict[str, Any] 1487 ``` 1488 1489 Serialize this component to a dictionary. 
**Returns:**

- <code>dict\[str, Any\]</code> – The serialized component as a dictionary.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> OpenAIChatGenerator
```

Deserialize this component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.

**Returns:**

- <code>OpenAIChatGenerator</code> – The deserialized component instance.

#### run

```python
run(
    messages: list[ChatMessage],
    streaming_callback: StreamingCallbackT | None = None,
    generation_kwargs: dict[str, Any] | None = None,
    *,
    tools: ToolsType | None = None,
    tools_strict: bool | None = None
) -> dict[str, list[ChatMessage]]
```

Invokes chat completion based on the provided messages and generation parameters.

**Parameters:**

- **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage instances representing the input messages.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. These parameters
  override the parameters passed during component initialization.
  For details on OpenAI API parameters, see the [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat/create).
- **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset, for which the model can prepare calls.
  If set, it overrides the `tools` parameter provided during initialization.
- **tools_strict** (<code>bool | None</code>) – Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
  the schema provided in the `parameters` field of the tool definition, but this may increase latency.
  If set, it overrides the `tools_strict` parameter set during component initialization.

**Returns:**

- <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following key:
  - `replies`: A list containing the generated responses as ChatMessage instances.

#### run_async

```python
run_async(
    messages: list[ChatMessage],
    streaming_callback: StreamingCallbackT | None = None,
    generation_kwargs: dict[str, Any] | None = None,
    *,
    tools: ToolsType | None = None,
    tools_strict: bool | None = None
) -> dict[str, list[ChatMessage]]
```

Asynchronously invokes chat completion based on the provided messages and generation parameters.

This is the asynchronous version of the `run` method. It has the same parameters and return values
but can be used with `await` in async code.

**Parameters:**

- **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage instances representing the input messages.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
  Must be a coroutine.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. These parameters
  override the parameters passed during component initialization.
  For details on OpenAI API parameters, see the [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat/create).
- **tools** (<code>ToolsType | None</code>) – A list of Tool and/or Toolset objects, or a single Toolset, for which the model can prepare calls.
  If set, it overrides the `tools` parameter provided during initialization.
- **tools_strict** (<code>bool | None</code>) – Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
  the schema provided in the `parameters` field of the tool definition, but this may increase latency.
  If set, it overrides the `tools_strict` parameter set during component initialization.

**Returns:**

- <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following key:
  - `replies`: A list containing the generated responses as ChatMessage instances.

## chat/openai_responses

### OpenAIResponsesChatGenerator

Completes chats using OpenAI's Responses API.

It works with the gpt-4 and o-series models and supports streaming responses
from the OpenAI API. It uses the [ChatMessage](https://docs.haystack.deepset.ai/docs/chatmessage)
format for input and output.

You can customize how the text is generated by passing parameters to the
OpenAI API. Use the `**generation_kwargs` argument when you initialize
the component or when you run it. Any parameter that works with
`openai.Responses.create` will work here too.

For details on OpenAI API parameters, see the
[OpenAI documentation](https://platform.openai.com/docs/api-reference/responses).
### Usage example

```python
from haystack.components.generators.chat import OpenAIResponsesChatGenerator
from haystack.dataclasses import ChatMessage

messages = [ChatMessage.from_user("What's Natural Language Processing?")]

client = OpenAIResponsesChatGenerator(generation_kwargs={"reasoning": {"effort": "low", "summary": "auto"}})
response = client.run(messages)
print(response)
```

#### SUPPORTED_MODELS

```python
SUPPORTED_MODELS: list[str] = [
    "gpt-5-mini",
    "gpt-5-nano",
    "gpt-5",
    "gpt-5.1",
    "gpt-5.2",
    "gpt-5.2-pro",
    "gpt-5.4",
    "gpt-5-pro",
    "gpt-4.1",
    "gpt-4.1-mini",
    "gpt-4.1-nano",
    "gpt-4o",
    "gpt-4o-mini",
    "o1",
    "o1-mini",
    "o1-pro",
    "o3",
    "o3-mini",
    "o3-pro",
    "o4-mini",
]
```

A non-exhaustive list of chat models supported by this component.
See https://platform.openai.com/docs/models for the full list and snapshot IDs.

#### __init__

```python
__init__(
    *,
    api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
    model: str = "gpt-5-mini",
    streaming_callback: StreamingCallbackT | None = None,
    api_base_url: str | None = None,
    organization: str | None = None,
    generation_kwargs: dict[str, Any] | None = None,
    timeout: float | None = None,
    max_retries: int | None = None,
    tools: ToolsType | list[dict] | None = None,
    tools_strict: bool = False,
    http_client_kwargs: dict[str, Any] | None = None
) -> None
```

Creates an instance of OpenAIResponsesChatGenerator. Uses OpenAI's gpt-5-mini by default.

Before initializing the component, you can set the `OPENAI_TIMEOUT` and `OPENAI_MAX_RETRIES`
environment variables to override the `timeout` and `max_retries` parameters respectively
in the OpenAI client.
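As a quick illustration, the override amounts to exporting the two variables before the component is created. This is a minimal sketch; the specific values are only examples:

```python
import os

# Set these before constructing the generator; the OpenAI client
# reads them at initialization time. Example values only.
os.environ["OPENAI_TIMEOUT"] = "60"       # seconds
os.environ["OPENAI_MAX_RETRIES"] = "3"
```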
**Parameters:**

- **api_key** (<code>Secret</code>) – The OpenAI API key.
  You can set it with the `OPENAI_API_KEY` environment variable, or pass it with this parameter
  during initialization.
- **model** (<code>str</code>) – The name of the model to use.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
  The callback function accepts [StreamingChunk](https://docs.haystack.deepset.ai/docs/data-classes#streamingchunk)
  as an argument.
- **api_base_url** (<code>str | None</code>) – An optional base URL.
- **organization** (<code>str | None</code>) – Your organization ID, defaults to `None`. See
  [production best practices](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model. These parameters are sent
  directly to the OpenAI endpoint.
  See the OpenAI [documentation](https://platform.openai.com/docs/api-reference/responses) for
  more details.
  Some of the supported parameters:
    - `temperature`: What sampling temperature to use. Higher values like 0.8 will make the output more random,
      while lower values like 0.2 will make it more focused and deterministic.
    - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
      considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
      comprising the top 10% probability mass are considered.
    - `previous_response_id`: The ID of the previous response.
      Use this to create multi-turn conversations.
    - `text_format`: A Pydantic model that enforces the structure of the model's response.
      If provided, the output will always be validated against this
      format (unless the model returns a tool call).
      For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
    - `text`: A JSON schema that enforces the structure of the model's response.
      If provided, the output will always be validated against this
      format (unless the model returns a tool call).
      Notes:
        - Both JSON schemas and Pydantic models are supported for the latest models, starting from GPT-4o.
        - If both are provided, `text_format` takes precedence and the JSON schema passed to `text` is ignored.
        - Currently, this component doesn't support streaming for structured outputs.
        - Older models only support a basic version of structured outputs through `{"type": "json_object"}`.
          For detailed information on JSON mode, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs#json-mode).
    - `reasoning`: A dictionary of parameters for reasoning. For example:
        - `summary`: The summary of the reasoning.
        - `effort`: The level of effort to put into the reasoning. Can be `low`, `medium`, or `high`.
        - `generate_summary`: Whether to generate a summary of the reasoning.
      Note: OpenAI does not return the reasoning tokens, but you can view the summary if it's enabled.
      For details, see the [OpenAI Reasoning documentation](https://platform.openai.com/docs/guides/reasoning).
- **timeout** (<code>float | None</code>) – Timeout for OpenAI client calls. If not set, it defaults to either the
  `OPENAI_TIMEOUT` environment variable, or 30 seconds.
- **max_retries** (<code>int | None</code>) – Maximum number of retries to contact OpenAI after an internal error.
  If not set, it defaults to either the `OPENAI_MAX_RETRIES` environment variable, or 5.
- **tools** (<code>ToolsType | list\[dict\] | None</code>) – The tools that the model can use to prepare calls. This parameter accepts either a
  mixed list of Haystack `Tool` objects and Haystack `Toolset` objects, or a list of
  OpenAI/MCP tool definitions as dictionaries.
  Note: You cannot pass OpenAI/MCP tools and Haystack tools together.
  For details on tool support, see the [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create#responses-create-tools).
- **tools_strict** (<code>bool</code>) – Whether to enable strict schema adherence for tool calls. If set to `False`, the model may not exactly
  follow the schema provided in the `parameters` field of the tool definition. In the Responses API, tool calls
  are strict by default.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
  For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).

#### warm_up

```python
warm_up() -> None
```

Warm up the OpenAI responses chat generator.

This will warm up the tools registered in the chat generator.
This method is idempotent and will only warm up the tools once.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – The serialized component as a dictionary.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> OpenAIResponsesChatGenerator
```

Deserialize this component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.

**Returns:**

- <code>OpenAIResponsesChatGenerator</code> – The deserialized component instance.
#### run

```python
run(
    messages: list[ChatMessage],
    *,
    streaming_callback: StreamingCallbackT | None = None,
    generation_kwargs: dict[str, Any] | None = None,
    tools: ToolsType | list[dict] | None = None,
    tools_strict: bool | None = None
) -> dict[str, list[ChatMessage]]
```

Invokes response generation based on the provided messages and generation parameters.

**Parameters:**

- **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage instances representing the input messages.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. These parameters
  override the parameters passed during component initialization.
  For details on OpenAI API parameters, see the [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create).
- **tools** (<code>ToolsType | list\[dict\] | None</code>) – The tools that the model can use to prepare calls. If set, it overrides the
  `tools` parameter set during component initialization. This parameter accepts either a
  mixed list of Haystack `Tool` objects and Haystack `Toolset` objects, or a list of
  OpenAI/MCP tool definitions as dictionaries.
  Note: You cannot pass OpenAI/MCP tools and Haystack tools together.
  For details on tool support, see the [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create#responses-create-tools).
- **tools_strict** (<code>bool | None</code>) – Whether to enable strict schema adherence for tool calls. If set to `False`, the model may not exactly
  follow the schema provided in the `parameters` field of the tool definition. In the Responses API, tool calls
  are strict by default.
  If set, it overrides the `tools_strict` parameter set during component initialization.

**Returns:**

- <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following key:
  - `replies`: A list containing the generated responses as ChatMessage instances.

#### run_async

```python
run_async(
    messages: list[ChatMessage],
    *,
    streaming_callback: StreamingCallbackT | None = None,
    generation_kwargs: dict[str, Any] | None = None,
    tools: ToolsType | list[dict] | None = None,
    tools_strict: bool | None = None
) -> dict[str, list[ChatMessage]]
```

Asynchronously invokes response generation based on the provided messages and generation parameters.

This is the asynchronous version of the `run` method. It has the same parameters and return values
but can be used with `await` in async code.

**Parameters:**

- **messages** (<code>list\[ChatMessage\]</code>) – A list of ChatMessage instances representing the input messages.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
  Must be a coroutine.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. These parameters
  override the parameters passed during component initialization.
  For details on OpenAI API parameters, see the [OpenAI documentation](https://platform.openai.com/docs/api-reference/responses/create).
- **tools** (<code>ToolsType | list\[dict\] | None</code>) – The tools that the model can use to prepare calls. If set, it overrides the
  `tools` parameter set during component initialization. This parameter accepts either a
  mixed list of Haystack `Tool` objects and Haystack `Toolset` objects, or a list of
  OpenAI/MCP tool definitions as dictionaries.
  Note: You cannot pass OpenAI/MCP tools and Haystack tools together.
- **tools_strict** (<code>bool | None</code>) – Whether to enable strict schema adherence for tool calls. If set to `True`, the model will follow exactly
  the schema provided in the `parameters` field of the tool definition, but this may increase latency.
  If set, it overrides the `tools_strict` parameter set during component initialization.

**Returns:**

- <code>dict\[str, list\[ChatMessage\]\]</code> – A dictionary with the following key:
  - `replies`: A list containing the generated responses as ChatMessage instances.

## hugging_face_api

### HuggingFaceAPIGenerator

Generates text using Hugging Face APIs.

Use it with the following Hugging Face APIs:

- [Paid Inference Endpoints](https://huggingface.co/inference-endpoints)
- [Self-hosted Text Generation Inference](https://github.com/huggingface/text-generation-inference)

**Note:** As of July 2025, the Hugging Face Inference API no longer offers generative models through the
`text_generation` endpoint. Generative models are now only available through providers supporting the
`chat_completion` endpoint. As a result, this component might no longer work with the Hugging Face Inference API.
Use the `HuggingFaceAPIChatGenerator` component, which supports the `chat_completion` endpoint.
1858 1859 ### Usage examples 1860 1861 #### With Hugging Face Inference Endpoints 1862 1863 ```python 1864 from haystack.components.generators import HuggingFaceAPIGenerator 1865 from haystack.utils import Secret 1866 1867 generator = HuggingFaceAPIGenerator(api_type="inference_endpoints", 1868 api_params={"url": "<your-inference-endpoint-url>"}, 1869 token=Secret.from_token("<your-api-key>")) 1870 1871 result = generator.run(prompt="What's Natural Language Processing?") 1872 print(result) 1873 ``` 1874 1875 #### With self-hosted text generation inference 1876 1877 ```python 1878 from haystack.components.generators import HuggingFaceAPIGenerator 1879 1880 generator = HuggingFaceAPIGenerator(api_type="text_generation_inference", 1881 api_params={"url": "http://localhost:8080"}) 1882 1883 result = generator.run(prompt="What's Natural Language Processing?") 1884 print(result) 1885 ``` 1886 1887 #### With the free serverless inference API 1888 1889 Be aware that this example might not work as the Hugging Face Inference API no longer offer models that support the 1890 `text_generation` endpoint. Use the `HuggingFaceAPIChatGenerator` for generative models through the 1891 `chat_completion` endpoint. 
1892 1893 ```python 1894 from haystack.components.generators import HuggingFaceAPIGenerator 1895 from haystack.utils import Secret 1896 1897 generator = HuggingFaceAPIGenerator(api_type="serverless_inference_api", 1898 api_params={"model": "HuggingFaceH4/zephyr-7b-beta"}, 1899 token=Secret.from_token("<your-api-key>")) 1900 1901 result = generator.run(prompt="What's Natural Language Processing?") 1902 print(result) 1903 ``` 1904 1905 #### __init__ 1906 1907 ```python 1908 __init__( 1909 api_type: HFGenerationAPIType | str, 1910 api_params: dict[str, str], 1911 token: Secret | None = Secret.from_env_var( 1912 ["HF_API_TOKEN", "HF_TOKEN"], strict=False 1913 ), 1914 generation_kwargs: dict[str, Any] | None = None, 1915 stop_words: list[str] | None = None, 1916 streaming_callback: StreamingCallbackT | None = None, 1917 ) -> None 1918 ``` 1919 1920 Initialize the HuggingFaceAPIGenerator instance. 1921 1922 **Parameters:** 1923 1924 - **api_type** (<code>HFGenerationAPIType | str</code>) – The type of Hugging Face API to use. Available types: 1925 - `text_generation_inference`: See [TGI](https://github.com/huggingface/text-generation-inference). 1926 - `inference_endpoints`: See [Inference Endpoints](https://huggingface.co/inference-endpoints). 1927 - `serverless_inference_api`: See [Serverless Inference API](https://huggingface.co/inference-api). 1928 This might no longer work due to changes in the models offered in the Hugging Face Inference API. 1929 Please use the `HuggingFaceAPIChatGenerator` component instead. 1930 - **api_params** (<code>dict\[str, str\]</code>) – A dictionary with the following keys: 1931 - `model`: Hugging Face model ID. Required when `api_type` is `SERVERLESS_INFERENCE_API`. 1932 - `url`: URL of the inference endpoint. Required when `api_type` is `INFERENCE_ENDPOINTS` or 1933 `TEXT_GENERATION_INFERENCE`. 1934 - Other parameters specific to the chosen API type, such as `timeout`, `headers`, `provider` etc. 
1935 - **token** (<code>Secret | None</code>) – The Hugging Face token to use as HTTP bearer authorization. 1936 Check your HF token in your [account settings](https://huggingface.co/settings/tokens). 1937 - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary with keyword arguments to customize text generation. Some examples: `max_new_tokens`, 1938 `temperature`, `top_k`, `top_p`. 1939 For details, see [Hugging Face documentation](https://huggingface.co/docs/huggingface_hub/en/package_reference/inference_client#huggingface_hub.InferenceClient.text_generation) 1940 for more information. 1941 - **stop_words** (<code>list\[str\] | None</code>) – An optional list of strings representing the stop words. 1942 - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses. 1943 1944 #### to_dict 1945 1946 ```python 1947 to_dict() -> dict[str, Any] 1948 ``` 1949 1950 Serialize this component to a dictionary. 1951 1952 **Returns:** 1953 1954 - <code>dict\[str, Any\]</code> – A dictionary containing the serialized component. 1955 1956 #### from_dict 1957 1958 ```python 1959 from_dict(data: dict[str, Any]) -> HuggingFaceAPIGenerator 1960 ``` 1961 1962 Deserialize this component from a dictionary. 1963 1964 #### run 1965 1966 ```python 1967 run( 1968 prompt: str, 1969 streaming_callback: StreamingCallbackT | None = None, 1970 generation_kwargs: dict[str, Any] | None = None, 1971 ) -> dict[str, Any] 1972 ``` 1973 1974 Invoke the text generation inference for the given prompt and generation parameters. 1975 1976 **Parameters:** 1977 1978 - **prompt** (<code>str</code>) – A string representing the prompt. 1979 - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream. 1980 - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. 
1981 1982 **Returns:** 1983 1984 - <code>dict\[str, Any\]</code> – A dictionary with the generated replies and metadata. Both are lists of length n. 1985 - replies: A list of strings representing the generated replies. 1986 1987 ## hugging_face_local 1988 1989 ### HuggingFaceLocalGenerator 1990 1991 Generates text using models from Hugging Face that run locally. 1992 1993 LLMs running locally may need powerful hardware. 1994 1995 ### Usage example 1996 1997 ```python 1998 from haystack.components.generators import HuggingFaceLocalGenerator 1999 2000 generator = HuggingFaceLocalGenerator( 2001 model="Qwen/Qwen3-0.6B", 2002 task="text-generation", 2003 generation_kwargs={"max_new_tokens": 100, "temperature": 0.9} 2004 ) 2005 2006 print(generator.run("Who is the best American actor?")) 2007 # {'replies': ['John Cusack']} 2008 ``` 2009 2010 #### __init__ 2011 2012 ```python 2013 __init__( 2014 model: str = "Qwen/Qwen3-0.6B", 2015 task: Literal["text-generation", "text2text-generation"] | None = None, 2016 device: ComponentDevice | None = None, 2017 token: Secret | None = Secret.from_env_var( 2018 ["HF_API_TOKEN", "HF_TOKEN"], strict=False 2019 ), 2020 generation_kwargs: dict[str, Any] | None = None, 2021 huggingface_pipeline_kwargs: dict[str, Any] | None = None, 2022 stop_words: list[str] | None = None, 2023 streaming_callback: StreamingCallbackT | None = None, 2024 ) -> None 2025 ``` 2026 2027 Creates an instance of a HuggingFaceLocalGenerator. 2028 2029 **Parameters:** 2030 2031 - **model** (<code>str</code>) – The Hugging Face text generation model name or path. 2032 - **task** (<code>Literal['text-generation', 'text2text-generation'] | None</code>) – The task for the Hugging Face pipeline. Possible options: 2033 - `text-generation`: Supported by decoder models, like GPT. 2034 - `text2text-generation`: Deprecated as of Transformers v5; use `text-generation` instead. 2035 Previously supported by encoder–decoder models such as T5. 
2036 If the task is specified in `huggingface_pipeline_kwargs`, this parameter is ignored. 2037 If not specified, the component calls the Hugging Face API to infer the task from the model name. 2038 - **device** (<code>ComponentDevice | None</code>) – The device for loading the model. If `None`, automatically selects the default device. 2039 If a device or device map is specified in `huggingface_pipeline_kwargs`, it overrides this parameter. 2040 - **token** (<code>Secret | None</code>) – The token to use as HTTP bearer authorization for remote files. 2041 If the token is specified in `huggingface_pipeline_kwargs`, this parameter is ignored. 2042 - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary with keyword arguments to customize text generation. 2043 Some examples: `max_length`, `max_new_tokens`, `temperature`, `top_k`, `top_p`. 2044 See Hugging Face's documentation for more information: 2045 - [customize-text-generation](https://huggingface.co/docs/transformers/main/en/generation_strategies#customize-text-generation) 2046 - [transformers.GenerationConfig](https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.GenerationConfig) 2047 - **huggingface_pipeline_kwargs** (<code>dict\[str, Any\] | None</code>) – Dictionary with keyword arguments to initialize the 2048 Hugging Face pipeline for text generation. 2049 These keyword arguments provide fine-grained control over the Hugging Face pipeline. 2050 In case of duplication, these kwargs override `model`, `task`, `device`, and `token` init parameters. 2051 For available kwargs, see [Hugging Face documentation](https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.pipeline.task). 
2052 In this dictionary, you can also include `model_kwargs` to specify the kwargs for model initialization: 2053 [transformers.PreTrainedModel.from_pretrained](https://huggingface.co/docs/transformers/en/main_classes/model#transformers.PreTrainedModel.from_pretrained) 2054 - **stop_words** (<code>list\[str\] | None</code>) – If the model generates a stop word, the generation stops. 2055 If you provide this parameter, don't specify the `stopping_criteria` in `generation_kwargs`. 2056 For some chat models, the output includes both the new text and the original prompt. 2057 In these cases, make sure your prompt has no stop words. 2058 - **streaming_callback** (<code>StreamingCallbackT | None</code>) – An optional callable for handling streaming responses. 2059 2060 #### warm_up 2061 2062 ```python 2063 warm_up() -> None 2064 ``` 2065 2066 Initializes the component. 2067 2068 #### to_dict 2069 2070 ```python 2071 to_dict() -> dict[str, Any] 2072 ``` 2073 2074 Serializes the component to a dictionary. 2075 2076 **Returns:** 2077 2078 - <code>dict\[str, Any\]</code> – Dictionary with serialized data. 2079 2080 #### from_dict 2081 2082 ```python 2083 from_dict(data: dict[str, Any]) -> HuggingFaceLocalGenerator 2084 ``` 2085 2086 Deserializes the component from a dictionary. 2087 2088 **Parameters:** 2089 2090 - **data** (<code>dict\[str, Any\]</code>) – The dictionary to deserialize from. 2091 2092 **Returns:** 2093 2094 - <code>HuggingFaceLocalGenerator</code> – The deserialized component. 2095 2096 #### run 2097 2098 ```python 2099 run( 2100 prompt: str, 2101 streaming_callback: StreamingCallbackT | None = None, 2102 generation_kwargs: dict[str, Any] | None = None, 2103 ) -> dict[str, Any] 2104 ``` 2105 2106 Run the text generation model on the given prompt. 2107 2108 **Parameters:** 2109 2110 - **prompt** (<code>str</code>) – A string representing the prompt. 
2111 - **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream. 2112 - **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. 2113 2114 **Returns:** 2115 2116 - <code>dict\[str, Any\]</code> – A dictionary containing the generated replies. 2117 - replies: A list of strings representing the generated replies. 2118 2119 ## openai 2120 2121 ### OpenAIGenerator 2122 2123 Generates text using OpenAI's large language models (LLMs). 2124 2125 It works with the gpt-4 and gpt-5 series models and supports streaming responses 2126 from OpenAI API. It uses strings as input and output. 2127 2128 You can customize how the text is generated by passing parameters to the 2129 OpenAI API. Use the `**generation_kwargs` argument when you initialize 2130 the component or when you run it. Any parameter that works with 2131 `openai.ChatCompletion.create` will work here too. 2132 2133 For details on OpenAI API parameters, see 2134 [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat). 2135 2136 ### Usage example 2137 2138 ```python 2139 from haystack.components.generators import OpenAIGenerator 2140 client = OpenAIGenerator() 2141 response = client.run("What's Natural Language Processing? Be brief.") 2142 print(response) 2143 2144 # >> {'replies': ['Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on 2145 # >> the interaction between computers and human language. 
# >> It involves enabling computers to understand, interpret,
# >> and respond to natural human language in a way that is both meaningful and useful.'], 'meta': [{'model':
# >> 'gpt-5-mini', 'index': 0, 'finish_reason': 'stop', 'usage': {'prompt_tokens': 16,
# >> 'completion_tokens': 49, 'total_tokens': 65}}]}
```

#### __init__

```python
__init__(
    api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
    model: str = "gpt-5-mini",
    streaming_callback: StreamingCallbackT | None = None,
    api_base_url: str | None = None,
    organization: str | None = None,
    system_prompt: str | None = None,
    generation_kwargs: dict[str, Any] | None = None,
    timeout: float | None = None,
    max_retries: int | None = None,
    http_client_kwargs: dict[str, Any] | None = None,
) -> None
```

Creates an instance of OpenAIGenerator. Unless specified otherwise in `model`, uses OpenAI's gpt-5-mini.

By setting the `OPENAI_TIMEOUT` and `OPENAI_MAX_RETRIES` environment variables, you can change the timeout
and max_retries parameters of the OpenAI client.

**Parameters:**

- **api_key** (<code>Secret</code>) – The OpenAI API key to connect to OpenAI.
- **model** (<code>str</code>) – The name of the model to use.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
  The callback function accepts StreamingChunk as an argument.
- **api_base_url** (<code>str | None</code>) – An optional base URL.
- **organization** (<code>str | None</code>) – The Organization ID, defaults to `None`.
- **system_prompt** (<code>str | None</code>) – The system prompt to use for text generation. If not provided, the system prompt is
  omitted, and the default system prompt of the model is used.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model. These parameters are all sent directly to
  the OpenAI endpoint. See OpenAI [documentation](https://platform.openai.com/docs/api-reference/chat) for
  more details.
  Some of the supported parameters:
    - `max_completion_tokens`: An upper bound for the number of tokens that can be generated for a completion,
      including visible output tokens and reasoning tokens.
    - `temperature`: The sampling temperature to use. Higher values mean the model takes more risks.
      Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
    - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
      considers the results of the tokens with top_p probability mass. For example, 0.1 means only the tokens
      comprising the top 10% probability mass are considered.
    - `n`: How many completions to generate for each prompt. For example, if the LLM gets 3 prompts and n is 2,
      it will generate two completions for each of the three prompts, ending up with 6 completions in total.
    - `stop`: One or more sequences after which the LLM should stop generating tokens.
    - `presence_penalty`: The penalty to apply if a token is already present in the text. Higher values make
      the model less likely to repeat the same token.
    - `frequency_penalty`: The penalty to apply if a token has already been generated in the text. Higher
      values make the model less likely to repeat the same token.
    - `logit_bias`: Adds a logit bias to specific tokens. The keys of the dictionary are tokens, and the
      values are the bias to add to that token.
- **timeout** (<code>float | None</code>) – Timeout for OpenAI client calls. If not set, it is inferred from the `OPENAI_TIMEOUT` environment variable
  or set to 30.
- **max_retries** (<code>int | None</code>) – Maximum retries to establish contact with OpenAI if it returns an internal error. If not set, it is inferred
  from the `OPENAI_MAX_RETRIES` environment variable or set to 5.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
  For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – The serialized component as a dictionary.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> OpenAIGenerator
```

Deserialize this component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.

**Returns:**

- <code>OpenAIGenerator</code> – The deserialized component instance.

#### run

```python
run(
    prompt: str,
    system_prompt: str | None = None,
    streaming_callback: StreamingCallbackT | None = None,
    generation_kwargs: dict[str, Any] | None = None,
) -> dict[str, list[str] | list[dict[str, Any]]]
```

Invoke the text generation inference based on the provided prompt and generation parameters.

**Parameters:**

- **prompt** (<code>str</code>) – The string prompt to use for text generation.
- **system_prompt** (<code>str | None</code>) – The system prompt to use for text generation. If this runtime system prompt is omitted, the system
  prompt defined at initialization time, if any, is used.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Additional keyword arguments for text generation. These parameters can override the parameters
  passed in the `__init__` method. For more details on the parameters supported by the OpenAI API, refer to
  the OpenAI [documentation](https://platform.openai.com/docs/api-reference/chat/create).

**Returns:**

- <code>dict\[str, list\[str\] | list\[dict\[str, Any\]\]\]</code> – A dictionary with a list of strings containing the generated responses and a list of dictionaries
  containing the metadata for each response.

## openai_dalle

### DALLEImageGenerator

Generates images using OpenAI's DALL-E model.

For details on OpenAI API parameters, see
[OpenAI documentation](https://platform.openai.com/docs/api-reference/images/create).

### Usage example

```python
from haystack.components.generators import DALLEImageGenerator
image_generator = DALLEImageGenerator()
response = image_generator.run("Show me a picture of a black cat.")
print(response)
```

#### __init__

```python
__init__(
    model: str = "dall-e-3",
    quality: Literal["standard", "hd"] = "standard",
    size: Literal[
        "256x256", "512x512", "1024x1024", "1792x1024", "1024x1792"
    ] = "1024x1024",
    response_format: Literal["url", "b64_json"] = "url",
    api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
    api_base_url: str | None = None,
    organization: str | None = None,
    timeout: float | None = None,
    max_retries: int | None = None,
    http_client_kwargs: dict[str, Any] | None = None,
) -> None
```

Creates an instance of DALLEImageGenerator.
Unless specified otherwise in `model`, uses OpenAI's dall-e-3.

**Parameters:**

- **model** (<code>str</code>) – The model to use for image generation. Can be "dall-e-2" or "dall-e-3".
- **quality** (<code>Literal['standard', 'hd']</code>) – The quality of the generated image. Can be "standard" or "hd".
- **size** (<code>Literal['256x256', '512x512', '1024x1024', '1792x1024', '1024x1792']</code>) – The size of the generated images.
  Must be one of 256x256, 512x512, or 1024x1024 for dall-e-2.
  Must be one of 1024x1024, 1792x1024, or 1024x1792 for dall-e-3.
- **response_format** (<code>Literal['url', 'b64_json']</code>) – The format of the response. Can be "url" or "b64_json".
- **api_key** (<code>Secret</code>) – The OpenAI API key to connect to OpenAI.
- **api_base_url** (<code>str | None</code>) – An optional base URL.
- **organization** (<code>str | None</code>) – The Organization ID, defaults to `None`.
- **timeout** (<code>float | None</code>) – Timeout for OpenAI client calls. If not set, it is inferred from the `OPENAI_TIMEOUT` environment variable
  or set to 30.
- **max_retries** (<code>int | None</code>) – Maximum retries to establish contact with OpenAI if it returns an internal error. If not set, it is inferred
  from the `OPENAI_MAX_RETRIES` environment variable or set to 5.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
  For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).

#### warm_up

```python
warm_up() -> None
```

Warm up the OpenAI client.
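
The per-model size rules listed in the parameters above can be summarized in a small validation sketch. This is illustrative only: `VALID_SIZES` and `validate_size` are hypothetical helpers, not part of Haystack or the OpenAI client.

```python
# Illustrative sketch, not part of the Haystack API: mirrors the documented
# size constraints for dall-e-2 and dall-e-3.
VALID_SIZES = {
    "dall-e-2": {"256x256", "512x512", "1024x1024"},
    "dall-e-3": {"1024x1024", "1792x1024", "1024x1792"},
}


def validate_size(model: str, size: str) -> str:
    """Return `size` if it is valid for `model`, otherwise raise ValueError."""
    allowed = VALID_SIZES.get(model)
    if allowed is None:
        raise ValueError(f"Unknown model: {model!r}")
    if size not in allowed:
        raise ValueError(
            f"{size!r} is not a valid size for {model}; choose one of {sorted(allowed)}"
        )
    return size
```

For example, `1792x1024` is accepted for dall-e-3 but rejected for dall-e-2.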

#### run

```python
run(
    prompt: str,
    size: (
        Literal["256x256", "512x512", "1024x1024", "1792x1024", "1024x1792"]
        | None
    ) = None,
    quality: Literal["standard", "hd"] | None = None,
    response_format: Literal["url", "b64_json"] | None = None,
) -> dict[str, Any]
```

Invokes the image generation inference based on the provided prompt and generation parameters.

**Parameters:**

- **prompt** (<code>str</code>) – The prompt to generate the image.
- **size** (<code>Literal['256x256', '512x512', '1024x1024', '1792x1024', '1024x1792'] | None</code>) – If provided, overrides the size provided during initialization.
- **quality** (<code>Literal['standard', 'hd'] | None</code>) – If provided, overrides the quality provided during initialization.
- **response_format** (<code>Literal['url', 'b64_json'] | None</code>) – If provided, overrides the response format provided during initialization.

**Returns:**

- <code>dict\[str, Any\]</code> – A dictionary containing the generated list of images and the revised prompt.
  Depending on the `response_format` parameter, the list of images can be URLs or base64-encoded JSON strings.
  The revised prompt is the prompt that was used to generate the image, if OpenAI made any revisions to it.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – The serialized component as a dictionary.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> DALLEImageGenerator
```

Deserialize this component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary representation of this component.

**Returns:**

- <code>DALLEImageGenerator</code> – The deserialized component instance.

## utils

### print_streaming_chunk

```python
print_streaming_chunk(chunk: StreamingChunk) -> None
```

Callback function to handle and display streaming output chunks.

This function processes a `StreamingChunk` object by:

- Printing tool call metadata (if any), including function names and arguments, as they arrive.
- Printing tool call results when available.
- Printing the main content (e.g., text tokens) of the chunk as it is received.

The function outputs data directly to stdout and flushes output buffers to ensure immediate display during
streaming.

**Parameters:**

- **chunk** (<code>StreamingChunk</code>) – A chunk of streaming data containing content and optional metadata, such as tool calls and
  tool results.
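
To illustrate the print-and-flush behavior described above, the sketch below shows a minimal custom callback. It is a simplified stand-in, not Haystack code: `SimpleChunk` is a hypothetical substitute for `StreamingChunk` that is assumed to carry only a plain `content` string, and tool-call handling is omitted.

```python
import sys
from dataclasses import dataclass
from typing import TextIO


@dataclass
class SimpleChunk:
    # Hypothetical stand-in for Haystack's StreamingChunk: only text content.
    content: str


def print_chunk(chunk: SimpleChunk, file: TextIO = sys.stdout) -> None:
    """Print the chunk's text content immediately, without buffering."""
    # end="" keeps streamed tokens on one line; flush=True makes each token
    # appear as soon as it arrives, mirroring print_streaming_chunk's behavior.
    print(chunk.content, end="", file=file, flush=True)
```

A callable with this shape can then be passed as the `streaming_callback` argument when creating or running a generator.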