---
title: "VLLMTextEmbedder"
id: vllmtextembedder
slug: "/vllmtextembedder"
description: "This component computes the embeddings of a string using models served with vLLM."
---

# VLLMTextEmbedder

This component computes the embeddings of a string using models served with [vLLM](https://docs.vllm.ai/).

<div className="key-value-table">

|  |  |
| --- | --- |
| **Most common position in a pipeline** | Before an embedding [Retriever](../retrievers.mdx) in a query/RAG pipeline |
| **Mandatory init variables** | `model`: The name of the model served by vLLM |
| **Mandatory run variables** | `text`: A string |
| **Output variables** | `embedding`: A vector (list of float numbers) |
| **API reference** | [vLLM](/reference/integrations-vllm) |
| **GitHub link** | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/vllm |

</div>

## Overview

[vLLM](https://docs.vllm.ai/) is a high-throughput and memory-efficient inference and serving engine for LLMs. It exposes an OpenAI-compatible HTTP server, which `VLLMTextEmbedder` uses to compute embeddings through the Embeddings API.
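
Purely for illustration, here is what that endpoint looks like when called directly with the `openai` Python client. This sketch assumes a server running at the default address and serving `google/embeddinggemma-300m`; `"EMPTY"` is a common placeholder key when the server was started without `--api-key`:

```python
from openai import OpenAI

# Illustrative only: this is the OpenAI-compatible endpoint the embedder talks to
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.embeddings.create(
    model="google/embeddinggemma-300m",  # must match the model the server is serving
    input="I love pizza!",
)
print(len(response.data[0].embedding))
```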

`VLLMTextEmbedder` expects a vLLM server to be running and reachable at the URL set in the `api_base_url` init parameter (`http://localhost:8000/v1` by default). Use this component to embed a single string (such as a query) into a vector. To embed lists of documents, use [`VLLMDocumentEmbedder`](vllmdocumentembedder.mdx).
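
For example, to point the embedder at a server running elsewhere, pass a different `api_base_url` (the host below is a placeholder):

```python
from haystack_integrations.components.embedders.vllm import VLLMTextEmbedder

# "my-vllm-host" is a placeholder; use your own server's address
embedder = VLLMTextEmbedder(
    model="google/embeddinggemma-300m",
    api_base_url="http://my-vllm-host:8000/v1",
)
```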

When you perform embedding retrieval, use this component first to turn your query into a vector. The embedding Retriever then uses that vector to search for similar or relevant documents.

If the vLLM server was started with `--api-key`, provide the API key through the `VLLM_API_KEY` environment variable or the `api_key` init parameter, using Haystack's [Secret](../../concepts/secret-management.mdx) API.
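
For example, a minimal sketch that reads the key from the `VLLM_API_KEY` environment variable:

```python
from haystack.utils import Secret
from haystack_integrations.components.embedders.vllm import VLLMTextEmbedder

# The key is resolved from the VLLM_API_KEY environment variable at runtime
embedder = VLLMTextEmbedder(
    model="google/embeddinggemma-300m",
    api_key=Secret.from_env_var("VLLM_API_KEY"),
)
```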

### Compatible models

vLLM supports a range of embedding models. Check the [vLLM pooling models docs](https://docs.vllm.ai/en/stable/models/pooling_models) for the list of supported architectures and models.

### vLLM-specific parameters

You can pass vLLM-specific parameters through the `extra_parameters` dictionary. These are forwarded as `extra_body` to the OpenAI-compatible embeddings endpoint. Use this to pass parameters that are not part of the standard OpenAI Embeddings API, such as `truncate_prompt_tokens` or `truncation_side`. See the [vLLM Embeddings API docs](https://docs.vllm.ai/en/stable/models/pooling_models/embed/#openai-compatible-embeddings-api) for details.

```python
from haystack_integrations.components.embedders.vllm import VLLMTextEmbedder

embedder = VLLMTextEmbedder(
    model="google/embeddinggemma-300m",
    extra_parameters={"truncate_prompt_tokens": 256, "truncation_side": "right"},
)
```

### Matryoshka embeddings

If the model was trained with Matryoshka Representation Learning, you can reduce the dimensionality of the output vector through the `dimensions` parameter. See the [vLLM Matryoshka docs](https://docs.vllm.ai/en/stable/models/pooling_models/embed/#matryoshka-embeddings) for details.
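
A minimal sketch, assuming the served model supports Matryoshka embeddings (the model and dimension below are examples):

```python
from haystack_integrations.components.embedders.vllm import VLLMTextEmbedder

# Truncate the output vector to 256 dimensions; only meaningful for
# models trained with Matryoshka Representation Learning
embedder = VLLMTextEmbedder(
    model="google/embeddinggemma-300m",
    dimensions=256,
)
```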

### Instructions

Some embedding models work better for retrieval when the input text is prepended with an instruction. For example, if you use [BAAI/bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5#model-list), you should prefix your query with the instruction "Represent this sentence for searching relevant passages:".

This is how it works with `VLLMTextEmbedder`:

```python
from haystack_integrations.components.embedders.vllm import VLLMTextEmbedder

instruction = "Represent this sentence for searching relevant passages:"
embedder = VLLMTextEmbedder(
    model="BAAI/bge-large-en-v1.5",
    prefix=instruction,
)
```

## Usage

Install the `vllm-haystack` package to use `VLLMTextEmbedder`:

```shell
pip install vllm-haystack
```

### Starting the vLLM server

Before using this component, start a vLLM server with an embedding model:

```bash
vllm serve google/embeddinggemma-300m
```

For details on server options, see the [vLLM CLI docs](https://docs.vllm.ai/en/stable/cli/serve/).

### On its own

```python
from haystack_integrations.components.embedders.vllm import VLLMTextEmbedder

text_embedder = VLLMTextEmbedder(model="google/embeddinggemma-300m")
print(text_embedder.run("I love pizza!"))

## {'embedding': [-0.0215301513671875, 0.01499176025390625, ...], 'meta': {...}}
```

### In a pipeline

```python
from haystack import Document, Pipeline
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.embedders.vllm import (
    VLLMDocumentEmbedder,
    VLLMTextEmbedder,
)

document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")

documents = [
    Document(content="My name is Wolfgang and I live in Berlin"),
    Document(content="I saw a black horse running"),
    Document(content="Germany has many big cities"),
]

document_embedder = VLLMDocumentEmbedder(model="google/embeddinggemma-300m")
documents_with_embeddings = document_embedder.run(documents)["documents"]
document_store.write_documents(documents_with_embeddings)

query_pipeline = Pipeline()
query_pipeline.add_component(
    "text_embedder",
    VLLMTextEmbedder(model="google/embeddinggemma-300m"),
)
query_pipeline.add_component(
    "retriever",
    InMemoryEmbeddingRetriever(document_store=document_store),
)
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

query = "Who lives in Berlin?"

result = query_pipeline.run({"text_embedder": {"text": query}})

print(result["retriever"]["documents"][0])

## Document(id=..., content: 'My name is Wolfgang and I live in Berlin', score: ...)
```