---
title: "VLLMDocumentEmbedder"
id: vllmdocumentembedder
slug: "/vllmdocumentembedder"
description: "This component computes the embeddings of a list of documents using models served with vLLM."
---

# VLLMDocumentEmbedder

This component computes the embeddings of a list of documents using models served with [vLLM](https://docs.vllm.ai/).

<div className="key-value-table">

| | |
| --- | --- |
| **Most common position in a pipeline** | Before a [`DocumentWriter`](../writers/documentwriter.mdx) in an indexing pipeline |
| **Mandatory init variables** | `model`: The name of the model served by vLLM |
| **Mandatory run variables** | `documents`: A list of documents |
| **Output variables** | `documents`: A list of documents (enriched with embeddings) |
| **API reference** | [vLLM](/reference/integrations-vllm) |
| **GitHub link** | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/vllm |

</div>

## Overview

[vLLM](https://docs.vllm.ai/) is a high-throughput and memory-efficient inference and serving engine for LLMs. It exposes an OpenAI-compatible HTTP server, which `VLLMDocumentEmbedder` uses to compute embeddings through the Embeddings API.

`VLLMDocumentEmbedder` computes the embeddings of a list of documents and stores the obtained vectors in the `embedding` field of each document. It expects a vLLM server to be running and accessible at the `api_base_url` parameter (by default, `http://localhost:8000/v1`). To embed a string (such as a query), use the [`VLLMTextEmbedder`](vllmtextembedder.mdx).

The vectors computed by this component are necessary to perform embedding retrieval on a collection of documents. At retrieval time, the vector that represents the query is compared with those of the documents to find the most similar or relevant ones.
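To illustrate that comparison step (it happens in the retriever and document store, not in this component, which only computes the vectors), here is a minimal cosine-similarity sketch; the `cosine_similarity` helper and the toy two-dimensional vectors are illustrative only:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings
query_embedding = [1.0, 0.0]
doc_embeddings = {"doc_berlin": [0.9, 0.1], "doc_horse": [0.0, 1.0]}

# The document whose vector points in the most similar direction wins
best_match = max(doc_embeddings, key=lambda d: cosine_similarity(query_embedding, doc_embeddings[d]))
# best_match == "doc_berlin"
```

Real embedding vectors have hundreds or thousands of dimensions, but the ranking principle is the same.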
If the vLLM server was started with `--api-key`, provide the API key through the `VLLM_API_KEY` environment variable or the `api_key` init parameter using Haystack's [Secret](../../concepts/secret-management.mdx) API.

### Compatible models

vLLM supports a range of embedding models. Check the [vLLM pooling models docs](https://docs.vllm.ai/en/stable/models/pooling_models) for the list of supported architectures and models.

### vLLM-specific parameters

You can pass vLLM-specific parameters through the `extra_parameters` dictionary. These are forwarded as `extra_body` to the OpenAI-compatible embeddings endpoint. Use this to pass parameters that are not part of the standard OpenAI Embeddings API, such as `truncate_prompt_tokens` or `truncation_side`. See the [vLLM Embeddings API docs](https://docs.vllm.ai/en/stable/models/pooling_models/embed/#openai-compatible-embeddings-api) for details.

```python
embedder = VLLMDocumentEmbedder(
    model="google/embeddinggemma-300m",
    extra_parameters={"truncate_prompt_tokens": 256, "truncation_side": "right"},
)
```

### Matryoshka embeddings

If the model was trained with Matryoshka Representation Learning, you can reduce the dimensionality of the output vector through the `dimensions` parameter. See the [vLLM Matryoshka docs](https://docs.vllm.ai/en/stable/models/pooling_models/embed/#matryoshka-embeddings) for details.

### Batching and failure handling

`VLLMDocumentEmbedder` encodes documents in batches. Use `batch_size` (default `32`) to control how many documents are sent in a single request to the vLLM server, and `progress_bar` to toggle the progress indicator.

By default (`raise_on_failure=False`), failed embedding requests are logged and processing continues with the remaining documents. Set `raise_on_failure=True` to raise an exception instead.
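Conceptually, batching means the document list is split into chunks of at most `batch_size` before each request. A rough sketch of that chunking in plain Python (the `batched` helper is illustrative, not the component's actual code):

```python
def batched(items: list, batch_size: int = 32):
    """Yield successive chunks of at most batch_size items."""
    for i in range(0, len(items), batch_size):
        yield items[i : i + batch_size]


# 70 documents with the default batch_size of 32
documents = [f"doc-{i}" for i in range(70)]
batch_sizes = [len(batch) for batch in batched(documents, batch_size=32)]
# batch_sizes == [32, 32, 6] -> three requests to the vLLM server
```

With `raise_on_failure=False`, a failed request for one such chunk is logged and the remaining chunks are still processed.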
### Instructions

Some embedding models require prepending the document text with an instruction to work better for retrieval. For example, if you use [intfloat/e5-large-v2](https://huggingface.co/intfloat/e5-large-v2), you should prefix your document with the following instruction: "passage:".

This is how it works with `VLLMDocumentEmbedder`:

```python
instruction = "passage:"
embedder = VLLMDocumentEmbedder(
    model="intfloat/e5-large-v2",
    prefix=instruction,
)
```

### Embedding metadata

Documents often come with a set of metadata. If they are distinctive and semantically meaningful, you can embed them along with the text of the document to improve retrieval. Pass the relevant fields through `meta_fields_to_embed`; they are concatenated to the document text using `embedding_separator` (a newline by default):

```python
from haystack import Document
from haystack_integrations.components.embedders.vllm import VLLMDocumentEmbedder

doc = Document(content="some text", meta={"title": "relevant title", "page_number": 18})

embedder = VLLMDocumentEmbedder(
    model="google/embeddinggemma-300m",
    meta_fields_to_embed=["title"],
)

docs_with_embeddings = embedder.run(documents=[doc])["documents"]
```

## Usage

Install the `vllm-haystack` package to use the `VLLMDocumentEmbedder`:

```shell
pip install vllm-haystack
```

### Starting the vLLM server

Before using this component, start a vLLM server with an embedding model:

```bash
vllm serve google/embeddinggemma-300m
```

For details on server options, see the [vLLM CLI docs](https://docs.vllm.ai/en/stable/cli/serve/).
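As an aside on the "Embedding metadata" option above, the text that actually gets embedded can be sketched in plain Python; the `build_text_to_embed` helper mirrors, in simplified form, how `meta_fields_to_embed` and `embedding_separator` combine, and is not part of the integration:

```python
def build_text_to_embed(
    content: str,
    meta: dict,
    meta_fields_to_embed: list[str],
    embedding_separator: str = "\n",
) -> str:
    """Join selected metadata values and the document content with the separator."""
    parts = [str(meta[field]) for field in meta_fields_to_embed if field in meta]
    parts.append(content)
    return embedding_separator.join(parts)


text_to_embed = build_text_to_embed(
    content="some text",
    meta={"title": "relevant title", "page_number": 18},
    meta_fields_to_embed=["title"],
)
# text_to_embed == "relevant title\nsome text"
```

Only `title` is embedded here; `page_number` is not semantically meaningful, so it stays out of the embedded text.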
### On its own

```python
from haystack import Document
from haystack_integrations.components.embedders.vllm import VLLMDocumentEmbedder

doc = Document(content="I love pizza!")

document_embedder = VLLMDocumentEmbedder(model="google/embeddinggemma-300m")

result = document_embedder.run([doc])
print(result["documents"][0].embedding)

# [-0.0215301513671875, 0.01499176025390625, ...]
```

### In a pipeline

```python
from haystack import Document, Pipeline
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.components.writers import DocumentWriter
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.document_stores.types import DuplicatePolicy
from haystack_integrations.components.embedders.vllm import (
    VLLMDocumentEmbedder,
    VLLMTextEmbedder,
)

document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")

documents = [
    Document(content="My name is Wolfgang and I live in Berlin"),
    Document(content="I saw a black horse running"),
    Document(content="Germany has many big cities"),
]

document_embedder = VLLMDocumentEmbedder(model="google/embeddinggemma-300m")
writer = DocumentWriter(document_store=document_store, policy=DuplicatePolicy.OVERWRITE)

indexing_pipeline = Pipeline()
indexing_pipeline.add_component("document_embedder", document_embedder)
indexing_pipeline.add_component("writer", writer)
indexing_pipeline.connect("document_embedder", "writer")

indexing_pipeline.run({"document_embedder": {"documents": documents}})

query_pipeline = Pipeline()
query_pipeline.add_component(
    "text_embedder",
    VLLMTextEmbedder(model="google/embeddinggemma-300m"),
)
query_pipeline.add_component(
    "retriever",
    InMemoryEmbeddingRetriever(document_store=document_store),
)

query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

query = "Who lives in Berlin?"

result = query_pipeline.run({"text_embedder": {"text": query}})

print(result["retriever"]["documents"][0])

# Document(id=..., content: 'My name is Wolfgang and I live in Berlin', score: ...)
```