---
title: "Nvidia"
id: integrations-nvidia
description: "Nvidia integration for Haystack"
slug: "/integrations-nvidia"
---

## haystack_integrations.components.embedders.nvidia.document_embedder

### NvidiaDocumentEmbedder

A component for embedding documents using embedding models provided by [NVIDIA NIMs](https://ai.nvidia.com).

Usage example:

```python
from haystack import Document
from haystack_integrations.components.embedders.nvidia import NvidiaDocumentEmbedder

doc = Document(content="I love pizza!")

document_embedder = NvidiaDocumentEmbedder(
    model="nvidia/nv-embedqa-e5-v5",
    api_url="https://integrate.api.nvidia.com/v1",
)
# Components warm up automatically on first run.

result = document_embedder.run([doc])
print(result["documents"][0].embedding)
```

#### __init__

```python
__init__(
    model: str | None = None,
    api_key: Secret | None = Secret.from_env_var("NVIDIA_API_KEY"),
    api_url: str = os.getenv("NVIDIA_API_URL", DEFAULT_API_URL),
    prefix: str = "",
    suffix: str = "",
    batch_size: int = 32,
    progress_bar: bool = True,
    meta_fields_to_embed: list[str] | None = None,
    embedding_separator: str = "\n",
    truncate: EmbeddingTruncateMode | str | None = None,
    timeout: float | None = None,
) -> None
```

Create a NvidiaDocumentEmbedder component.

**Parameters:**

- **model** (<code>str | None</code>) – Embedding model to use.
  If no specific model is provided along with a locally hosted API URL,
  the component defaults to the first model found via the `/models` API.
- **api_key** (<code>Secret | None</code>) – API key for the NVIDIA NIM.
- **api_url** (<code>str</code>) – Custom API URL for the NVIDIA NIM.
  The expected format is `http://host:port`.
- **prefix** (<code>str</code>) – A string to add to the beginning of each text.
- **suffix** (<code>str</code>) – A string to add to the end of each text.
- **batch_size** (<code>int</code>) – Number of Documents to encode at once.
  Cannot be greater than 50.
- **progress_bar** (<code>bool</code>) – Whether to show a progress bar.
- **meta_fields_to_embed** (<code>list\[str\] | None</code>) – List of meta fields that should be embedded along with the Document text.
- **embedding_separator** (<code>str</code>) – Separator used to concatenate the meta fields to the Document text.
- **truncate** (<code>EmbeddingTruncateMode | str | None</code>) – Specifies how inputs longer than the maximum token length should be truncated.
  If None, the behavior is model-dependent; see the official documentation for more information.
- **timeout** (<code>float | None</code>) – Timeout for request calls. If not set, it is inferred from the `NVIDIA_TIMEOUT` environment variable
  or defaults to 60.

#### class_name

```python
class_name() -> str
```

Return the class name identifier for serialization.

#### default_model

```python
default_model() -> None
```

Set the default model in local NIM mode.

#### warm_up

```python
warm_up() -> None
```

Initializes the component.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### available_models

```python
available_models: list[Model]
```

Get a list of available models that work with NvidiaDocumentEmbedder.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> NvidiaDocumentEmbedder
```

Deserializes the component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary to deserialize from.

**Returns:**

- <code>NvidiaDocumentEmbedder</code> – The deserialized component.
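The interplay of `meta_fields_to_embed`, `embedding_separator`, `prefix`, and `suffix` described in the `__init__` parameters can be sketched as a standalone helper. This is an illustrative reimplementation of the documented concatenation behavior under the assumption that selected meta values are prepended to the content, not the component's actual code:

```python
# Illustrative sketch (not the component's implementation): selected meta
# values are joined with the document content using the separator, then
# wrapped with the prefix and suffix before being sent for embedding.
def text_to_embed(
    content: str,
    meta: dict[str, str],
    meta_fields_to_embed: list[str],
    embedding_separator: str = "\n",
    prefix: str = "",
    suffix: str = "",
) -> str:
    # Keep only meta fields that are present and non-None, in the given order.
    meta_values = [str(meta[key]) for key in meta_fields_to_embed if meta.get(key) is not None]
    return prefix + embedding_separator.join([*meta_values, content]) + suffix


combined = text_to_embed(
    content="Berlin is the capital of Germany.",
    meta={"title": "Geography", "source": "wiki"},
    meta_fields_to_embed=["title"],
)
print(combined)  # prints the title, a newline, then the content
```

With this scheme, a document's title (or any other selected meta field) contributes to its embedding, which can noticeably improve retrieval when titles carry signal the body lacks.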
#### run

```python
run(documents: list[Document]) -> dict[str, list[Document] | dict[str, Any]]
```

Embed a list of Documents.

The embedding of each Document is stored in the `embedding` field of the Document.

**Parameters:**

- **documents** (<code>list\[Document\]</code>) – A list of Documents to embed.

**Returns:**

- <code>dict\[str, list\[Document\] | dict\[str, Any\]\]</code> – A dictionary with the following keys and values:
  - `documents` - List of processed Documents with embeddings.
  - `meta` - Metadata on usage statistics, etc.

**Raises:**

- <code>TypeError</code> – If the input is not a list of Documents.

## haystack_integrations.components.embedders.nvidia.text_embedder

### NvidiaTextEmbedder

A component for embedding strings using embedding models provided by [NVIDIA NIMs](https://ai.nvidia.com).

For models that differentiate between query and document inputs,
this component embeds the input string as a query.

Usage example:

```python
from haystack_integrations.components.embedders.nvidia import NvidiaTextEmbedder

text_to_embed = "I love pizza!"

text_embedder = NvidiaTextEmbedder(
    model="nvidia/nv-embedqa-e5-v5",
    api_url="https://integrate.api.nvidia.com/v1",
)
# Components warm up automatically on first run.

print(text_embedder.run(text_to_embed))
```

#### __init__

```python
__init__(
    model: str | None = None,
    api_key: Secret | None = Secret.from_env_var("NVIDIA_API_KEY"),
    api_url: str = os.getenv("NVIDIA_API_URL", DEFAULT_API_URL),
    prefix: str = "",
    suffix: str = "",
    truncate: EmbeddingTruncateMode | str | None = None,
    timeout: float | None = None,
) -> None
```

Create a NvidiaTextEmbedder component.

**Parameters:**

- **model** (<code>str | None</code>) – Embedding model to use.
  If no specific model is provided along with a locally hosted API URL,
  the component defaults to the first model found via the `/models` API.
- **api_key** (<code>Secret | None</code>) – API key for the NVIDIA NIM.
- **api_url** (<code>str</code>) – Custom API URL for the NVIDIA NIM.
  The expected format is `http://host:port`.
- **prefix** (<code>str</code>) – A string to add to the beginning of each text.
- **suffix** (<code>str</code>) – A string to add to the end of each text.
- **truncate** (<code>EmbeddingTruncateMode | str | None</code>) – Specifies how inputs longer than the maximum token length should be truncated.
  If None, the behavior is model-dependent; see the official documentation for more information.
- **timeout** (<code>float | None</code>) – Timeout for request calls. If not set, it is inferred from the `NVIDIA_TIMEOUT` environment variable
  or defaults to 60.

#### class_name

```python
class_name() -> str
```

Return the class name identifier for serialization.

#### default_model

```python
default_model() -> None
```

Set the default model in local NIM mode.

#### warm_up

```python
warm_up() -> None
```

Initializes the component.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### available_models

```python
available_models: list[Model]
```

Get a list of available models that work with NvidiaTextEmbedder.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> NvidiaTextEmbedder
```

Deserializes the component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary to deserialize from.
**Returns:**

- <code>NvidiaTextEmbedder</code> – The deserialized component.

#### run

```python
run(text: str) -> dict[str, list[float] | dict[str, Any]]
```

Embed a string.

**Parameters:**

- **text** (<code>str</code>) – The text to embed.

**Returns:**

- <code>dict\[str, list\[float\] | dict\[str, Any\]\]</code> – A dictionary with the following keys and values:
  - `embedding` - Embedding of the text.
  - `meta` - Metadata on usage statistics, etc.

**Raises:**

- <code>TypeError</code> – If the input is not a string.
- <code>ValueError</code> – If the input string is empty.

## haystack_integrations.components.embedders.nvidia.truncate

### EmbeddingTruncateMode

Bases: <code>Enum</code>

Specifies how inputs to the NVIDIA embedding components are truncated.

If START, the input is truncated from the start.
If END, the input is truncated from the end.
If NONE, an error is returned if the input is too long.

#### from_str

```python
from_str(string: str) -> EmbeddingTruncateMode
```

Create a truncate mode from a string.

**Parameters:**

- **string** (<code>str</code>) – String to convert.

**Returns:**

- <code>EmbeddingTruncateMode</code> – Truncate mode.

## haystack_integrations.components.generators.nvidia.chat.chat_generator

### NvidiaChatGenerator

Bases: <code>OpenAIChatGenerator</code>

Enables text generation using NVIDIA generative models.

For supported models, see [NVIDIA Docs](https://build.nvidia.com/models).

Users can pass any text generation parameters valid for the NVIDIA Chat Completion API
directly to this component via the `generation_kwargs` parameter in `__init__` or the `generation_kwargs`
parameter in the `run` method.
This component uses the ChatMessage format for structuring both input and output,
ensuring coherent and contextually relevant responses in chat-based text generation scenarios.
Details on the ChatMessage format can be found in the
[Haystack docs](https://docs.haystack.deepset.ai/docs/data-classes#chatmessage).

For more details on the parameters supported by the NVIDIA API, refer to the
[NVIDIA Docs](https://build.nvidia.com/models).

Usage example:

```python
from haystack_integrations.components.generators.nvidia import NvidiaChatGenerator
from haystack.dataclasses import ChatMessage

messages = [ChatMessage.from_user("What's Natural Language Processing?")]

client = NvidiaChatGenerator()
response = client.run(messages)
print(response)
```

#### __init__

```python
__init__(
    *,
    api_key: Secret = Secret.from_env_var("NVIDIA_API_KEY"),
    model: str = "meta/llama-3.1-8b-instruct",
    streaming_callback: StreamingCallbackT | None = None,
    api_base_url: str | None = os.getenv("NVIDIA_API_URL", DEFAULT_API_URL),
    generation_kwargs: dict[str, Any] | None = None,
    tools: ToolsType | None = None,
    timeout: float | None = None,
    max_retries: int | None = None,
    http_client_kwargs: dict[str, Any] | None = None
) -> None
```

Creates an instance of NvidiaChatGenerator.

**Parameters:**

- **api_key** (<code>Secret</code>) – The NVIDIA API key.
- **model** (<code>str</code>) – The name of the NVIDIA chat completion model to use.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
  The callback function accepts StreamingChunk as an argument.
- **api_base_url** (<code>str | None</code>) – The NVIDIA API base URL.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model. These parameters are all sent directly to
  the NVIDIA API endpoint. See the [NVIDIA API docs](https://docs.nvcf.nvidia.com/ai/generative-models/)
  for more details.
  Some of the supported parameters:
  - `max_tokens`: The maximum number of tokens the output text can have.
  - `temperature`: What sampling temperature to use. Higher values mean the model takes more risks.
    Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
  - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
    considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens
    comprising the top 10% probability mass are considered.
  - `stream`: Whether to stream back partial progress. If set, tokens are sent as data-only server-sent
    events as they become available, with the stream terminated by a `data: [DONE]` message.
  - `response_format`: For NVIDIA NIM servers, this parameter has limited support.
    Basic JSON mode with `{"type": "json_object"}` is supported by compatible models to produce
    valid JSON output.
    To generate structured JSON output, use the `response_format` parameter.
    Example:

    ```python
    generation_kwargs={
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "my_schema",
                "schema": json_schema,
            },
        }
    }
    ```

    For more details, see the [NVIDIA NIM documentation](https://docs.nvidia.com/nim/vision-language-models/latest/structured-generation.html).
- **tools** (<code>ToolsType | None</code>) – A list of tools or a Toolset for which the model can prepare calls. This parameter can accept either a
  list of `Tool` objects or a `Toolset` instance.
- **timeout** (<code>float | None</code>) – The timeout for the NVIDIA API call.
- **max_retries** (<code>int | None</code>) – Maximum number of retries to contact NVIDIA after an internal error.
  If not set, it defaults to the `NVIDIA_MAX_RETRIES` environment variable or 5.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
  For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – The serialized component as a dictionary.

## haystack_integrations.components.generators.nvidia.generator

### NvidiaGenerator

Generates text using generative models hosted with [NVIDIA NIM](https://ai.nvidia.com).

Available via the [NVIDIA API Catalog](https://build.nvidia.com/explore/discover).

Usage example:

```python
from haystack_integrations.components.generators.nvidia import NvidiaGenerator

generator = NvidiaGenerator(
    model="meta/llama3-8b-instruct",
    model_arguments={
        "temperature": 0.2,
        "top_p": 0.7,
        "max_tokens": 1024,
    },
)
# Components warm up automatically on first run.

result = generator.run(prompt="What is the answer?")
print(result["replies"])
print(result["meta"])
```

You need an NVIDIA API key for this component to work.

#### __init__

```python
__init__(
    model: str | None = None,
    api_url: str = os.getenv("NVIDIA_API_URL", DEFAULT_API_URL),
    api_key: Secret | None = Secret.from_env_var("NVIDIA_API_KEY"),
    model_arguments: dict[str, Any] | None = None,
    timeout: float | None = None,
) -> None
```

Create a NvidiaGenerator component.
**Parameters:**

- **model** (<code>str | None</code>) – Name of the model to use for text generation.
  See [NVIDIA NIMs](https://ai.nvidia.com)
  for more information on the supported models.
  Note: if no specific model is provided along with a locally hosted API URL,
  the component defaults to the first model found via the `/models` API.
- **api_key** (<code>Secret | None</code>) – API key for the NVIDIA NIM. Set it as the `NVIDIA_API_KEY` environment
  variable or pass it here.
- **api_url** (<code>str</code>) – Custom API URL for the NVIDIA NIM.
- **model_arguments** (<code>dict\[str, Any\] | None</code>) – Additional arguments to pass to the model provider. These arguments are
  model-specific.
  Search for your model in [NVIDIA NIM](https://ai.nvidia.com)
  to find the arguments it accepts.
- **timeout** (<code>float | None</code>) – Timeout for request calls. If not set, it is inferred from the `NVIDIA_TIMEOUT` environment variable
  or defaults to 60.

#### class_name

```python
class_name() -> str
```

Return the class name identifier for serialization.

#### default_model

```python
default_model() -> None
```

Set the default model in local NIM mode.

#### warm_up

```python
warm_up() -> None
```

Initializes the component.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### available_models

```python
available_models: list[Model]
```

Get a list of available models that work with NvidiaGenerator.
#### from_dict

```python
from_dict(data: dict[str, Any]) -> NvidiaGenerator
```

Deserializes the component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from.

**Returns:**

- <code>NvidiaGenerator</code> – Deserialized component.

#### run

```python
run(prompt: str) -> dict[str, list[str] | list[dict[str, Any]]]
```

Queries the model with the provided prompt.

**Parameters:**

- **prompt** (<code>str</code>) – Text to be sent to the generative model.

**Returns:**

- <code>dict\[str, list\[str\] | list\[dict\[str, Any\]\]\]</code> – A dictionary with the following keys:
  - `replies` - Replies generated by the model.
  - `meta` - Metadata for each reply.

## haystack_integrations.components.rankers.nvidia.ranker

### NvidiaRanker

A component for ranking documents using ranking models provided by [NVIDIA NIMs](https://ai.nvidia.com).

Usage example:

```python
from haystack_integrations.components.rankers.nvidia import NvidiaRanker
from haystack import Document
from haystack.utils import Secret

ranker = NvidiaRanker(
    model="nvidia/nv-rerankqa-mistral-4b-v3",
    api_key=Secret.from_env_var("NVIDIA_API_KEY"),
)
# Components warm up automatically on first run.

query = "What is the capital of Germany?"
documents = [
    Document(content="Berlin is the capital of Germany."),
    Document(content="The capital of Germany is Berlin."),
    Document(content="Germany's capital is Berlin."),
]

result = ranker.run(query, documents, top_k=2)
print(result["documents"])
```

#### __init__

```python
__init__(
    model: str | None = None,
    truncate: RankerTruncateMode | str | None = None,
    api_url: str = os.getenv("NVIDIA_API_URL", DEFAULT_API_URL),
    api_key: Secret | None = Secret.from_env_var("NVIDIA_API_KEY"),
    top_k: int = 5,
    query_prefix: str = "",
    document_prefix: str = "",
    meta_fields_to_embed: list[str] | None = None,
    embedding_separator: str = "\n",
    timeout: float | None = None,
) -> None
```

Create a NvidiaRanker component.

**Parameters:**

- **model** (<code>str | None</code>) – Ranking model to use.
- **truncate** (<code>RankerTruncateMode | str | None</code>) – Truncation strategy to use. Can be "NONE", "END", or a RankerTruncateMode value. Defaults to the NIM's default.
- **api_key** (<code>Secret | None</code>) – API key for the NVIDIA NIM.
- **api_url** (<code>str</code>) – Custom API URL for the NVIDIA NIM.
- **top_k** (<code>int</code>) – Number of documents to return.
- **query_prefix** (<code>str</code>) – A string to add at the beginning of the query text before ranking.
  Use it to prepend the text with an instruction, as required by reranking models like `bge`.
- **document_prefix** (<code>str</code>) – A string to add at the beginning of each document before ranking. You can use it to prepend the document
  with an instruction, as required by embedding models like `bge`.
- **meta_fields_to_embed** (<code>list\[str\] | None</code>) – List of metadata fields to embed with the document.
- **embedding_separator** (<code>str</code>) – Separator used to concatenate metadata fields to the document.
- **timeout** (<code>float | None</code>) – Timeout for request calls. If not set, it is inferred from the `NVIDIA_TIMEOUT` environment variable
  or defaults to 60.

#### class_name

```python
class_name() -> str
```

Return the class name identifier for serialization.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize the ranker to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – A dictionary containing the ranker's attributes.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> NvidiaRanker
```

Deserialize the ranker from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – A dictionary containing the ranker's attributes.

**Returns:**

- <code>NvidiaRanker</code> – The deserialized ranker.

#### warm_up

```python
warm_up() -> None
```

Initialize the ranker.

**Raises:**

- <code>ValueError</code> – If an API key is required by the hosted NVIDIA NIM but was not provided.

#### run

```python
run(
    query: str, documents: list[Document], top_k: int | None = None
) -> dict[str, list[Document]]
```

Rank a list of documents based on a given query.

**Parameters:**

- **query** (<code>str</code>) – The query to rank the documents against.
- **documents** (<code>list\[Document\]</code>) – The list of documents to rank.
- **top_k** (<code>int | None</code>) – The number of documents to return.

**Returns:**

- <code>dict\[str, list\[Document\]\]</code> – A dictionary containing the ranked documents.

**Raises:**

- <code>TypeError</code> – If the arguments are of the wrong type.
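Conceptually, the `top_k` behavior above amounts to ordering documents by the model's relevance score and keeping the best `k`. A minimal standalone sketch of that selection step (with made-up scores; this is not the component's actual code, which obtains scores from the NIM):

```python
# Illustrative sketch: once a reranking model has assigned each document
# a relevance score for the query, top_k selection is a sort-and-slice.
def select_top_k(scored_docs: list[tuple[str, float]], top_k: int) -> list[str]:
    # Sort by score, highest first, and keep only the first top_k contents.
    ranked = sorted(scored_docs, key=lambda pair: pair[1], reverse=True)
    return [content for content, _score in ranked[:top_k]]


scored = [
    ("Berlin is the capital of Germany.", 0.97),
    ("Germany is in Europe.", 0.41),
    ("The capital of Germany is Berlin.", 0.95),
]
print(select_top_k(scored, top_k=2))
# ['Berlin is the capital of Germany.', 'The capital of Germany is Berlin.']
```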
## haystack_integrations.components.rankers.nvidia.truncate

### RankerTruncateMode

Bases: <code>str</code>, <code>Enum</code>

Specifies how inputs to the NVIDIA ranker components are truncated.

If NONE, the input is not truncated and an error is returned instead.
If END, the input is truncated from the end.

#### from_str

```python
from_str(string: str) -> RankerTruncateMode
```

Create a truncate mode from a string.

**Parameters:**

- **string** (<code>str</code>) – String to convert.

**Returns:**

- <code>RankerTruncateMode</code> – Truncate mode.
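As a rough illustration of the string-to-enum conversion that `from_str` performs, here is a standalone enum mirroring `RankerTruncateMode`'s documented members. The case-insensitive lookup and the error message are assumptions for illustration; the real class lives in `haystack_integrations` and may differ:

```python
from enum import Enum


class TruncateMode(str, Enum):
    """Illustrative stand-in for RankerTruncateMode (NONE and END members)."""

    NONE = "NONE"
    END = "END"

    @classmethod
    def from_str(cls, string: str) -> "TruncateMode":
        # Look up the member by value, ignoring case; fail loudly on unknown input.
        try:
            return cls(string.upper())
        except ValueError as err:
            modes = ", ".join(member.value for member in cls)
            raise ValueError(f"Unknown truncate mode '{string}'. Expected one of: {modes}") from err


assert TruncateMode.from_str("end") is TruncateMode.END
```

Accepting either the enum or a plain string (as the ranker's `truncate` parameter does) is a common ergonomics choice: callers can pass `"END"` from a config file without importing the enum.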