---
title: "VLLMTextEmbedder"
id: vllmtextembedder
slug: "/vllmtextembedder"
description: "This component computes the embeddings of a string using models served with vLLM."
---

# VLLMTextEmbedder

This component computes the embeddings of a string using models served with [vLLM](https://docs.vllm.ai/).

<div className="key-value-table">

| | |
| --- | --- |
| **Most common position in a pipeline** | Before an embedding [Retriever](../retrievers.mdx) in a query/RAG pipeline |
| **Mandatory init variables** | `model`: The name of the model served by vLLM |
| **Mandatory run variables** | `text`: A string |
| **Output variables** | `embedding`: A vector (list of float numbers) |
| **API reference** | [vLLM](/reference/integrations-vllm) |
| **GitHub link** | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/vllm |

</div>

## Overview

[vLLM](https://docs.vllm.ai/) is a high-throughput and memory-efficient inference and serving engine for LLMs. It exposes an OpenAI-compatible HTTP server, which `VLLMTextEmbedder` uses to compute embeddings through the Embeddings API.

`VLLMTextEmbedder` expects a vLLM server to be running and accessible at the URL set in the `api_base_url` parameter (`http://localhost:8000/v1` by default). Use this component to embed a simple string (such as a query) into a vector. To embed a list of documents, use the [`VLLMDocumentEmbedder`](vllmdocumentembedder.mdx) instead.

When you perform embedding retrieval, use this component first to transform your query into a vector. The embedding Retriever then uses this vector to search for similar or relevant documents.

If the vLLM server was started with `--api-key`, provide the API key through the `VLLM_API_KEY` environment variable or the `api_key` init parameter using Haystack's [Secret](../../concepts/secret-management.mdx) API.

### Compatible models

vLLM supports a range of embedding models. Check the [vLLM pooling models docs](https://docs.vllm.ai/en/stable/models/pooling_models) for the list of supported architectures and models.

### vLLM-specific parameters

You can pass vLLM-specific parameters through the `extra_parameters` dictionary. These are forwarded as `extra_body` to the OpenAI-compatible embeddings endpoint. Use this to pass parameters that are not part of the standard OpenAI Embeddings API, such as `truncate_prompt_tokens` or `truncation_side`. See the [vLLM Embeddings API docs](https://docs.vllm.ai/en/stable/models/pooling_models/embed/#openai-compatible-embeddings-api) for details.

```python
embedder = VLLMTextEmbedder(
    model="google/embeddinggemma-300m",
    extra_parameters={"truncate_prompt_tokens": 256, "truncation_side": "right"},
)
```

### Matryoshka embeddings

If the model was trained with Matryoshka Representation Learning, you can reduce the dimensionality of the output vector through the `dimensions` parameter. See the [vLLM Matryoshka docs](https://docs.vllm.ai/en/stable/models/pooling_models/embed/#matryoshka-embeddings) for details.
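For example, this minimal sketch (assuming the served model supports Matryoshka truncation) shortens each output vector to 256 dimensions:

```python
# Minimal sketch: assumes the served model was trained with Matryoshka
# Representation Learning, so a truncated prefix of the embedding
# remains meaningful.
embedder = VLLMTextEmbedder(
    model="google/embeddinggemma-300m",
    dimensions=256,  # truncate each output vector to 256 dimensions
)
```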
### Instructions

Some embedding models require prepending the text with an instruction to work better for retrieval. For example, if you use [BAAI/bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5#model-list), you should prefix your query with the following instruction: "Represent this sentence for searching relevant passages:".

This is how it works with `VLLMTextEmbedder`:

```python
instruction = "Represent this sentence for searching relevant passages:"
embedder = VLLMTextEmbedder(
    model="BAAI/bge-large-en-v1.5",
    prefix=instruction,
)
```

## Usage

Install the `vllm-haystack` package to use the `VLLMTextEmbedder`:

```shell
pip install vllm-haystack
```

### Starting the vLLM server

Before using this component, start a vLLM server with an embedding model:

```bash
vllm serve google/embeddinggemma-300m
```

For details on server options, see the [vLLM CLI docs](https://docs.vllm.ai/en/stable/cli/serve/).

### On its own

```python
from haystack_integrations.components.embedders.vllm import VLLMTextEmbedder

text_embedder = VLLMTextEmbedder(model="google/embeddinggemma-300m")
print(text_embedder.run("I love pizza!"))

# {'embedding': [-0.0215301513671875, 0.01499176025390625, ...], 'meta': {...}}
```

### In a pipeline

```python
from haystack import Document, Pipeline
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.embedders.vllm import (
    VLLMDocumentEmbedder,
    VLLMTextEmbedder,
)

document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")

documents = [
    Document(content="My name is Wolfgang and I live in Berlin"),
    Document(content="I saw a black horse running"),
    Document(content="Germany has many big cities"),
]

document_embedder = VLLMDocumentEmbedder(model="google/embeddinggemma-300m")
documents_with_embeddings = document_embedder.run(documents)["documents"]
document_store.write_documents(documents_with_embeddings)

query_pipeline = Pipeline()
query_pipeline.add_component(
    "text_embedder",
    VLLMTextEmbedder(model="google/embeddinggemma-300m"),
)
query_pipeline.add_component(
    "retriever",
    InMemoryEmbeddingRetriever(document_store=document_store),
)
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

query = "Who lives in Berlin?"

result = query_pipeline.run({"text_embedder": {"text": query}})

print(result["retriever"]["documents"][0])

# Document(id=..., content: 'My name is Wolfgang and I live in Berlin', score: ...)
```
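### Connecting to a remote or authenticated server

If your vLLM server runs somewhere other than the default `http://localhost:8000/v1`, or was started with `--api-key`, pass `api_base_url` and `api_key` at init time. A minimal sketch, assuming a hypothetical host `my-vllm-host` and the key exported in the `VLLM_API_KEY` environment variable:

```python
from haystack.utils import Secret

from haystack_integrations.components.embedders.vllm import VLLMTextEmbedder

embedder = VLLMTextEmbedder(
    model="google/embeddinggemma-300m",
    # Hypothetical address; replace with your server's URL.
    api_base_url="http://my-vllm-host:8000/v1",
    # Reads the key from the VLLM_API_KEY environment variable.
    api_key=Secret.from_env_var("VLLM_API_KEY"),
)
```

Using `Secret.from_env_var` keeps the key out of your code and out of any serialized pipeline configuration.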