---
title: "Nvidia"
id: integrations-nvidia
description: "Nvidia integration for Haystack"
slug: "/integrations-nvidia"
---

## haystack_integrations.components.embedders.nvidia.document_embedder

### NvidiaDocumentEmbedder

A component for embedding documents using embedding models provided by [NVIDIA NIMs](https://ai.nvidia.com).

Usage example:

```python
from haystack import Document
from haystack_integrations.components.embedders.nvidia import NvidiaDocumentEmbedder

doc = Document(content="I love pizza!")

document_embedder = NvidiaDocumentEmbedder(
    model="nvidia/nv-embedqa-e5-v5",
    api_url="https://integrate.api.nvidia.com/v1",
)
# Components warm up automatically on first run.

result = document_embedder.run([doc])
print(result["documents"][0].embedding)
```

#### __init__

```python
__init__(
    model: str | None = None,
    api_key: Secret | None = Secret.from_env_var("NVIDIA_API_KEY"),
    api_url: str = os.getenv("NVIDIA_API_URL", DEFAULT_API_URL),
    prefix: str = "",
    suffix: str = "",
    batch_size: int = 32,
    progress_bar: bool = True,
    meta_fields_to_embed: list[str] | None = None,
    embedding_separator: str = "\n",
    truncate: EmbeddingTruncateMode | str | None = None,
    timeout: float | None = None,
) -> None
```

Create a NvidiaDocumentEmbedder component.

**Parameters:**

- **model** (<code>str | None</code>) – Embedding model to use.
  If no model is specified and a locally hosted API URL is provided,
  the component defaults to the model discovered via the `/models` API.
- **api_key** (<code>Secret | None</code>) – API key for the NVIDIA NIM.
- **api_url** (<code>str</code>) – Custom API URL for the NVIDIA NIM.
  The format for the API URL is `http://host:port`.
- **prefix** (<code>str</code>) – A string to add to the beginning of each text.
- **suffix** (<code>str</code>) – A string to add to the end of each text.
- **batch_size** (<code>int</code>) – Number of Documents to encode at once.
  Cannot be greater than 50.
- **progress_bar** (<code>bool</code>) – Whether to show a progress bar.
- **meta_fields_to_embed** (<code>list\[str\] | None</code>) – List of meta fields that should be embedded along with the Document text.
- **embedding_separator** (<code>str</code>) – Separator used to concatenate the meta fields to the Document text.
- **truncate** (<code>EmbeddingTruncateMode | str | None</code>) – Specifies how inputs longer than the maximum token length should be truncated.
  If None, the behavior is model-dependent; see the official documentation for more information.
- **timeout** (<code>float | None</code>) – Timeout for request calls. If not set, it is inferred from the `NVIDIA_TIMEOUT` environment variable
  or defaults to 60.
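
The interplay of `prefix`, `suffix`, `meta_fields_to_embed`, and `embedding_separator` can be sketched in plain Python. This is an illustrative approximation of the documented behavior, not the component's actual code:

```python
def build_text_to_embed(
    content: str,
    meta: dict,
    meta_fields_to_embed: list[str],
    embedding_separator: str = "\n",
    prefix: str = "",
    suffix: str = "",
) -> str:
    # Selected meta fields are concatenated in front of the document
    # content, joined by the separator, then wrapped with prefix/suffix.
    parts = [str(meta[field]) for field in meta_fields_to_embed if meta.get(field) is not None]
    parts.append(content)
    return prefix + embedding_separator.join(parts) + suffix


text = build_text_to_embed(
    content="I love pizza!",
    meta={"title": "Food"},
    meta_fields_to_embed=["title"],
)
print(text)  # -> "Food\nI love pizza!" (title, separator, then content)
```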

#### class_name

```python
class_name() -> str
```

Return the class name identifier for serialization.

#### default_model

```python
default_model() -> None
```

Set default model in local NIM mode.

#### warm_up

```python
warm_up() -> None
```

Initializes the component.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### available_models

```python
available_models: list[Model]
```

Get a list of available models that work with NvidiaDocumentEmbedder.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> NvidiaDocumentEmbedder
```

Deserializes the component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary to deserialize from.

**Returns:**

- <code>NvidiaDocumentEmbedder</code> – The deserialized component.

#### run

```python
run(documents: list[Document]) -> dict[str, list[Document] | dict[str, Any]]
```

Embed a list of Documents.

The embedding of each Document is stored in the `embedding` field of the Document.

**Parameters:**

- **documents** (<code>list\[Document\]</code>) – A list of Documents to embed.

**Returns:**

- <code>dict\[str, list\[Document\] | dict\[str, Any\]\]</code> – A dictionary with the following keys and values:
- `documents` - List of processed Documents with embeddings.
- `meta` - Metadata on usage statistics, etc.

**Raises:**

- <code>TypeError</code> – If the input is not a list of Documents.
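
Documents are sent to the API in slices of `batch_size`. The batching step itself is simple; a sketch of the idea (illustrative, not the component's actual code):

```python
from typing import Iterator, List


def batched(items: List, batch_size: int = 32) -> Iterator[List]:
    # Yield successive slices of at most batch_size items;
    # the documented default is 32 and the maximum is 50.
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


docs = list(range(70))
sizes = [len(batch) for batch in batched(docs, batch_size=32)]
print(sizes)  # [32, 32, 6]
```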
152  
153  ## haystack_integrations.components.embedders.nvidia.text_embedder
154  
155  ### NvidiaTextEmbedder
156  
157  A component for embedding strings using embedding models provided by [NVIDIA NIMs](https://ai.nvidia.com).
158  
159  For models that differentiate between query and document inputs,
160  this component embeds the input string as a query.
161  
162  Usage example:
163  
164  ```python
165  from haystack_integrations.components.embedders.nvidia import NvidiaTextEmbedder
166  
167  text_to_embed = "I love pizza!"
168  
169  text_embedder = NvidiaTextEmbedder(model="nvidia/nv-embedqa-e5-v5", api_url="https://integrate.api.nvidia.com/v1")
170  # Components warm up automatically on first run.
171  
172  print(text_embedder.run(text_to_embed))
173  ```
174  
175  #### __init__
176  
177  ```python
178  __init__(
179      model: str | None = None,
180      api_key: Secret | None = Secret.from_env_var("NVIDIA_API_KEY"),
181      api_url: str = os.getenv("NVIDIA_API_URL", DEFAULT_API_URL),
182      prefix: str = "",
183      suffix: str = "",
184      truncate: EmbeddingTruncateMode | str | None = None,
185      timeout: float | None = None,
186  ) -> None
187  ```
188  
189  Create a NvidiaTextEmbedder component.
190  
191  **Parameters:**
192  
193  - **model** (<code>str | None</code>) – Embedding model to use.
194    If no specific model along with locally hosted API URL is provided,
195    the system defaults to the available model found using /models API.
196  - **api_key** (<code>Secret | None</code>) – API key for the NVIDIA NIM.
197  - **api_url** (<code>str</code>) – Custom API URL for the NVIDIA NIM.
198    Format for API URL is `http://host:port`
199  - **prefix** (<code>str</code>) – A string to add to the beginning of each text.
200  - **suffix** (<code>str</code>) – A string to add to the end of each text.
201  - **truncate** (<code>EmbeddingTruncateMode | str | None</code>) – Specifies how inputs longer that the maximum token length should be truncated.
202    If None the behavior is model-dependent, see the official documentation for more information.
203  - **timeout** (<code>float | None</code>) – Timeout for request calls, if not set it is inferred from the `NVIDIA_TIMEOUT` environment variable
204    or set to 60 by default.

#### class_name

```python
class_name() -> str
```

Return the class name identifier for serialization.

#### default_model

```python
default_model() -> None
```

Set default model in local NIM mode.

#### warm_up

```python
warm_up() -> None
```

Initializes the component.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### available_models

```python
available_models: list[Model]
```

Get a list of available models that work with NvidiaTextEmbedder.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> NvidiaTextEmbedder
```

Deserializes the component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary to deserialize from.

**Returns:**

- <code>NvidiaTextEmbedder</code> – The deserialized component.

#### run

```python
run(text: str) -> dict[str, list[float] | dict[str, Any]]
```

Embed a string.

**Parameters:**

- **text** (<code>str</code>) – The text to embed.

**Returns:**

- <code>dict\[str, list\[float\] | dict\[str, Any\]\]</code> – A dictionary with the following keys and values:
- `embedding` - Embedding of the text.
- `meta` - Metadata on usage statistics, etc.

**Raises:**

- <code>TypeError</code> – If the input is not a string.
- <code>ValueError</code> – If the input string is empty.

## haystack_integrations.components.embedders.nvidia.truncate

### EmbeddingTruncateMode

Bases: <code>Enum</code>

Specifies how inputs to the NVIDIA embedding components are truncated.

If START, the input is truncated from the start.
If END, the input is truncated from the end.
If NONE, an error is returned if the input is too long.

#### from_str

```python
from_str(string: str) -> EmbeddingTruncateMode
```

Create a truncate mode from a string.

**Parameters:**

- **string** (<code>str</code>) – String to convert.

**Returns:**

- <code>EmbeddingTruncateMode</code> – Truncate mode.
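
A plausible implementation is a case-insensitive lookup by member name. The self-contained re-declaration below mirrors the documented modes; the member values and the case-insensitivity are assumptions for illustration, not taken from the source:

```python
from enum import Enum


class EmbeddingTruncateMode(Enum):
    # Members mirror the documented modes; the string values are illustrative.
    START = "START"
    END = "END"
    NONE = "NONE"

    @classmethod
    def from_str(cls, string: str) -> "EmbeddingTruncateMode":
        # Look the member up by name, ignoring case; reject unknown modes.
        try:
            return cls[string.upper()]
        except KeyError as err:
            raise ValueError(f"Unknown truncate mode: {string!r}") from err


print(EmbeddingTruncateMode.from_str("end"))  # EmbeddingTruncateMode.END
```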

## haystack_integrations.components.generators.nvidia.chat.chat_generator

### NvidiaChatGenerator

Bases: <code>OpenAIChatGenerator</code>

Enables text generation using NVIDIA generative models.

For supported models, see [NVIDIA Docs](https://build.nvidia.com/models).

Users can pass any text generation parameters valid for the NVIDIA Chat Completion API
directly to this component via the `generation_kwargs` parameter in `__init__` or the
`generation_kwargs` parameter in the `run` method.

This component uses the ChatMessage format for structuring both input and output,
ensuring coherent and contextually relevant responses in chat-based text generation scenarios.
Details on the ChatMessage format can be found in the
[Haystack docs](https://docs.haystack.deepset.ai/docs/data-classes#chatmessage).

For more details on the parameters supported by the NVIDIA API, refer to the
[NVIDIA Docs](https://build.nvidia.com/models).

Usage example:

```python
from haystack_integrations.components.generators.nvidia import NvidiaChatGenerator
from haystack.dataclasses import ChatMessage

messages = [ChatMessage.from_user("What's Natural Language Processing?")]

client = NvidiaChatGenerator()
response = client.run(messages)
print(response)
```

#### __init__

```python
__init__(
    *,
    api_key: Secret = Secret.from_env_var("NVIDIA_API_KEY"),
    model: str = "meta/llama-3.1-8b-instruct",
    streaming_callback: StreamingCallbackT | None = None,
    api_base_url: str | None = os.getenv("NVIDIA_API_URL", DEFAULT_API_URL),
    generation_kwargs: dict[str, Any] | None = None,
    tools: ToolsType | None = None,
    timeout: float | None = None,
    max_retries: int | None = None,
    http_client_kwargs: dict[str, Any] | None = None
) -> None
```

Creates an instance of NvidiaChatGenerator.

**Parameters:**

- **api_key** (<code>Secret</code>) – The NVIDIA API key.
- **model** (<code>str</code>) – The name of the NVIDIA chat completion model to use.
- **streaming_callback** (<code>StreamingCallbackT | None</code>) – A callback function that is called when a new token is received from the stream.
  The callback function accepts a `StreamingChunk` as an argument.
- **api_base_url** (<code>str | None</code>) – The NVIDIA API base URL.
- **generation_kwargs** (<code>dict\[str, Any\] | None</code>) – Other parameters to use for the model. These parameters are all sent directly to
  the NVIDIA API endpoint. See the [NVIDIA API docs](https://docs.nvcf.nvidia.com/ai/generative-models/)
  for more details.
  Some of the supported parameters:
- `max_tokens`: The maximum number of tokens the output text can have.
- `temperature`: The sampling temperature to use. Higher values mean the model will take more risks.
  Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
- `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
  considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens
  comprising the top 10% probability mass are considered.
- `stream`: Whether to stream back partial progress. If set, tokens will be sent as data-only server-sent
  events as they become available, with the stream terminated by a `data: [DONE]` message.
- `response_format`: For NVIDIA NIM servers, this parameter has limited support.
  Compatible models support basic JSON mode with `{"type": "json_object"}` to produce valid JSON output.
  For structured JSON output, pass a JSON schema. Example:
  ```python
  generation_kwargs={
      "response_format": {
          "type": "json_schema",
          "json_schema": {
              "name": "my_schema",
              "schema": json_schema,
          },
      }
  }
  ```
  For more details, see the [NVIDIA NIM documentation](https://docs.nvidia.com/nim/vision-language-models/latest/structured-generation.html).
- **tools** (<code>ToolsType | None</code>) – A list of tools or a Toolset for which the model can prepare calls. This parameter can accept either a
  list of `Tool` objects or a `Toolset` instance.
- **timeout** (<code>float | None</code>) – The timeout for the NVIDIA API call.
- **max_retries** (<code>int | None</code>) – Maximum number of retries to contact NVIDIA after an internal error.
  If not set, it defaults to the `NVIDIA_MAX_RETRIES` environment variable, or 5.
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
  For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).
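
Under the hood, a tool definition is serialized into an OpenAI-style function schema before being sent to the chat completion API. A minimal sketch of such a schema as a plain dict; the tool name and parameters here are made up for illustration:

```python
# Hypothetical tool schema: a "get_weather" function taking one string argument.
weather_tool_schema = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            # JSON Schema describing the function's arguments
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

print(weather_tool_schema["function"]["name"])  # get_weather
```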

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – The serialized component as a dictionary.

## haystack_integrations.components.generators.nvidia.generator

### NvidiaGenerator

Generates text using generative models hosted with [NVIDIA NIM](https://ai.nvidia.com).

Available via the [NVIDIA API Catalog](https://build.nvidia.com/explore/discover).

You need an NVIDIA API key for this component to work.

Usage example:

```python
from haystack_integrations.components.generators.nvidia import NvidiaGenerator

generator = NvidiaGenerator(
    model="meta/llama3-8b-instruct",
    model_arguments={
        "temperature": 0.2,
        "top_p": 0.7,
        "max_tokens": 1024,
    },
)
# Components warm up automatically on first run.

result = generator.run(prompt="What is the answer?")
print(result["replies"])
print(result["meta"])
```

#### __init__

```python
__init__(
    model: str | None = None,
    api_url: str = os.getenv("NVIDIA_API_URL", DEFAULT_API_URL),
    api_key: Secret | None = Secret.from_env_var("NVIDIA_API_KEY"),
    model_arguments: dict[str, Any] | None = None,
    timeout: float | None = None,
) -> None
```

Create a NvidiaGenerator component.

**Parameters:**

- **model** (<code>str | None</code>) – Name of the model to use for text generation.
  See [NVIDIA NIMs](https://ai.nvidia.com) for more information on the supported models.
  Note: If no model is specified and a locally hosted API URL is provided,
  the component defaults to the model discovered via the `/models` API.
- **api_key** (<code>Secret | None</code>) – API key for the NVIDIA NIM. Set it as the `NVIDIA_API_KEY` environment
  variable or pass it here.
- **api_url** (<code>str</code>) – Custom API URL for the NVIDIA NIM.
- **model_arguments** (<code>dict\[str, Any\] | None</code>) – Additional arguments to pass to the model provider. These arguments are
  model-specific.
  Search for your model in the [NVIDIA NIM](https://ai.nvidia.com) catalog
  to find the arguments it accepts.
- **timeout** (<code>float | None</code>) – Timeout for request calls. If not set, it is inferred from the `NVIDIA_TIMEOUT` environment variable
  or defaults to 60.
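
The documented timeout resolution order (explicit argument, then `NVIDIA_TIMEOUT`, then 60) can be sketched as plain Python; this is an illustration of the documented precedence, not the component's actual code:

```python
import os


def resolve_timeout(explicit=None) -> float:
    # An explicit value wins; otherwise fall back to the
    # NVIDIA_TIMEOUT environment variable, then to 60 seconds.
    if explicit is not None:
        return float(explicit)
    return float(os.getenv("NVIDIA_TIMEOUT", "60"))


os.environ.pop("NVIDIA_TIMEOUT", None)
print(resolve_timeout())     # 60.0
print(resolve_timeout(5.5))  # 5.5
```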

#### class_name

```python
class_name() -> str
```

Return the class name identifier for serialization.

#### default_model

```python
default_model() -> None
```

Set default model in local NIM mode.

#### warm_up

```python
warm_up() -> None
```

Initializes the component.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### available_models

```python
available_models: list[Model]
```

Get a list of available models that work with NvidiaGenerator.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> NvidiaGenerator
```

Deserializes the component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – Dictionary to deserialize from.

**Returns:**

- <code>NvidiaGenerator</code> – Deserialized component.

#### run

```python
run(prompt: str) -> dict[str, list[str] | list[dict[str, Any]]]
```

Queries the model with the provided prompt.

**Parameters:**

- **prompt** (<code>str</code>) – Text to be sent to the generative model.

**Returns:**

- <code>dict\[str, list\[str\] | list\[dict\[str, Any\]\]\]</code> – A dictionary with the following keys:
- `replies` - Replies generated by the model.
- `meta` - Metadata for each reply.

## haystack_integrations.components.rankers.nvidia.ranker

### NvidiaRanker

A component for ranking documents using ranking models provided by [NVIDIA NIMs](https://ai.nvidia.com).

Usage example:

```python
from haystack import Document
from haystack.utils import Secret
from haystack_integrations.components.rankers.nvidia import NvidiaRanker

ranker = NvidiaRanker(
    model="nvidia/nv-rerankqa-mistral-4b-v3",
    api_key=Secret.from_env_var("NVIDIA_API_KEY"),
)
# Components warm up automatically on first run.

query = "What is the capital of Germany?"
documents = [
    Document(content="Berlin is the capital of Germany."),
    Document(content="The capital of Germany is Berlin."),
    Document(content="Germany's capital is Berlin."),
]

result = ranker.run(query, documents, top_k=2)
print(result["documents"])
```

#### __init__

```python
__init__(
    model: str | None = None,
    truncate: RankerTruncateMode | str | None = None,
    api_url: str = os.getenv("NVIDIA_API_URL", DEFAULT_API_URL),
    api_key: Secret | None = Secret.from_env_var("NVIDIA_API_KEY"),
    top_k: int = 5,
    query_prefix: str = "",
    document_prefix: str = "",
    meta_fields_to_embed: list[str] | None = None,
    embedding_separator: str = "\n",
    timeout: float | None = None,
) -> None
```

Create a NvidiaRanker component.

**Parameters:**

- **model** (<code>str | None</code>) – Ranking model to use.
- **truncate** (<code>RankerTruncateMode | str | None</code>) – Truncation strategy to use. Can be "NONE", "END", or a RankerTruncateMode value. Defaults to the NIM's default.
- **api_key** (<code>Secret | None</code>) – API key for the NVIDIA NIM.
- **api_url** (<code>str</code>) – Custom API URL for the NVIDIA NIM.
- **top_k** (<code>int</code>) – Number of documents to return.
- **query_prefix** (<code>str</code>) – A string to add at the beginning of the query text before ranking.
  Use it to prepend the text with an instruction, as required by reranking models like `bge`.
- **document_prefix** (<code>str</code>) – A string to add at the beginning of each document before ranking. You can use it to prepend the document
  with an instruction, as required by embedding models like `bge`.
- **meta_fields_to_embed** (<code>list\[str\] | None</code>) – List of metadata fields to embed with the document.
- **embedding_separator** (<code>str</code>) – Separator used to concatenate metadata fields to the document.
- **timeout** (<code>float | None</code>) – Timeout for request calls. If not set, it is inferred from the `NVIDIA_TIMEOUT` environment variable
  or defaults to 60.

#### class_name

```python
class_name() -> str
```

Return the class name identifier for serialization.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize the ranker to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – A dictionary containing the ranker's attributes.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> NvidiaRanker
```

Deserialize the ranker from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – A dictionary containing the ranker's attributes.

**Returns:**

- <code>NvidiaRanker</code> – The deserialized ranker.

#### warm_up

```python
warm_up() -> None
```

Initialize the ranker.

**Raises:**

- <code>ValueError</code> – If an API key is required by a hosted NVIDIA NIM but is not provided.

#### run

```python
run(
    query: str, documents: list[Document], top_k: int | None = None
) -> dict[str, list[Document]]
```

Rank a list of documents based on a given query.

**Parameters:**

- **query** (<code>str</code>) – The query to rank the documents against.
- **documents** (<code>list\[Document\]</code>) – The list of documents to rank.
- **top_k** (<code>int | None</code>) – The number of documents to return.

**Returns:**

- <code>dict\[str, list\[Document\]\]</code> – A dictionary containing the ranked documents.

**Raises:**

- <code>TypeError</code> – If the arguments are of the wrong type.
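
Conceptually, the ranker scores each document against the query and keeps the `top_k` highest-scoring ones. The final selection step can be sketched in plain Python; the scores below are made up for illustration (real scores come from the NIM):

```python
def take_top_k(scored_docs: list, top_k: int) -> list:
    # Sort by model-assigned relevance score, descending, and keep top_k.
    return sorted(scored_docs, key=lambda d: d["score"], reverse=True)[:top_k]


scored = [
    {"content": "Berlin is the capital of Germany.", "score": 0.91},
    {"content": "Pizza was invented in Naples.", "score": 0.02},
    {"content": "Germany's capital is Berlin.", "score": 0.87},
]
top = take_top_k(scored, top_k=2)
print([d["content"] for d in top])
# ['Berlin is the capital of Germany.', "Germany's capital is Berlin."]
```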

## haystack_integrations.components.rankers.nvidia.truncate

### RankerTruncateMode

Bases: <code>str</code>, <code>Enum</code>

Specifies how inputs to the NVIDIA ranker components are truncated.

If NONE, the input is not truncated and an error is returned instead if it is too long.
If END, the input is truncated from the end.

#### from_str

```python
from_str(string: str) -> RankerTruncateMode
```

Create a truncate mode from a string.

**Parameters:**

- **string** (<code>str</code>) – String to convert.

**Returns:**

- <code>RankerTruncateMode</code> – Truncate mode.