---
title: "FastembedTextEmbedder"
id: fastembedtextembedder
slug: "/fastembedtextembedder"
description: "This component computes the embeddings of a string using embedding models supported by FastEmbed."
---

# FastembedTextEmbedder

This component computes the embeddings of a string using embedding models supported by FastEmbed.

<div className="key-value-table">

| | |
| --- | --- |
| **Most common position in a pipeline** | Before an embedding [Retriever](../retrievers.mdx) in a query/RAG pipeline |
| **Mandatory run variables** | `text`: A string |
| **Output variables** | `embedding`: A vector (list of float numbers) |
| **API reference** | [FastEmbed](/reference/fastembed-embedders) |
| **GitHub link** | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/fastembed |

</div>

Use this component to embed a single string (such as a query) into a vector. To embed lists of documents, use the [`FastembedDocumentEmbedder`](fastembeddocumentembedder.mdx), which enriches each document with its computed embedding (a vector).

## Overview

`FastembedTextEmbedder` transforms a string into a vector that captures its semantics using embedding [models supported by FastEmbed](https://qdrant.github.io/fastembed/examples/Supported_Models/).

When you perform embedding retrieval, use this component first to transform your query into a vector. Then, the embedding Retriever will use the vector to search for similar or relevant documents.

### Compatible models

You can find the original models in the [FastEmbed documentation](https://qdrant.github.io/fastembed/).

Currently, most of the models in the [Massive Text Embedding Benchmark (MTEB) Leaderboard](https://huggingface.co/spaces/mteb/leaderboard) are compatible with FastEmbed.
You can look for compatibility in the [supported model list](https://qdrant.github.io/fastembed/examples/Supported_Models/).

### Installation

To start using this integration with Haystack, install the package with:

```bash
pip install fastembed-haystack
```

### Instructions

Some recent models that you can find in MTEB require prepending the text with an instruction to work better for retrieval.
For example, if you use the [`BAAI/bge-large-en-v1.5`](https://huggingface.co/BAAI/bge-large-en-v1.5#model-list) model, you should prefix your query with the instruction `"passage:"`.

This is how it works with `FastembedTextEmbedder`:

```python
from haystack_integrations.components.embedders.fastembed import FastembedTextEmbedder

# The prefix is prepended to the text before it is embedded
instruction = "passage:"
embedder = FastembedTextEmbedder(
    model="BAAI/bge-large-en-v1.5",
    prefix=instruction,
)
```

### Parameters

You can set the directory where the model is cached with `cache_dir`, and the number of threads a single `onnxruntime` session can use with `threads`.

```python
from haystack_integrations.components.embedders.fastembed import FastembedTextEmbedder

cache_dir = "/your_cacheDirectory"
embedder = FastembedTextEmbedder(
    model="BAAI/bge-large-en-v1.5",
    cache_dir=cache_dir,
    threads=2,
)
```

If you want to use data-parallel encoding, you can set the parameters `parallel` and `batch_size`.

- If `parallel` is greater than 1, data-parallel encoding is used. This is recommended for offline encoding of large datasets.
- If `parallel` is 0, all available cores are used.
- If `parallel` is `None`, data-parallel processing is not used; the default `onnxruntime` threading is used instead.

:::tip
If you create a Text Embedder and a Document Embedder based on the same model, Haystack reuses the same resource behind the scenes instead of creating it twice.
:::

## Usage

### On its own

```python
from haystack_integrations.components.embedders.fastembed import FastembedTextEmbedder

text = """It clearly says online this will work on a Mac OS system.
The disk comes and it does not, only Windows.
Do Not order this if you have a Mac!!"""
text_embedder = FastembedTextEmbedder(model="BAAI/bge-small-en-v1.5")
text_embedder.warm_up()
embedding = text_embedder.run(text)["embedding"]
```

### In a pipeline

```python
from haystack import Document, Pipeline
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.embedders.fastembed import (
    FastembedDocumentEmbedder,
    FastembedTextEmbedder,
)

document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")

documents = [
    Document(content="My name is Wolfgang and I live in Berlin"),
    Document(content="I saw a black horse running"),
    Document(content="Germany has many big cities"),
    Document(content="fastembed is supported by and maintained by Qdrant."),
]

document_embedder = FastembedDocumentEmbedder()
document_embedder.warm_up()
documents_with_embeddings = document_embedder.run(documents)["documents"]
document_store.write_documents(documents_with_embeddings)

query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", FastembedTextEmbedder())
query_pipeline.add_component(
    "retriever",
    InMemoryEmbeddingRetriever(document_store=document_store),
)
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

query = "Who supports FastEmbed?"

result = query_pipeline.run({"text_embedder": {"text": query}})

print(result["retriever"]["documents"][0])  # noqa: T201

## Document(id=...,
##  content: 'fastembed is supported by and maintained by Qdrant.',
##  score: 0.758..)
```

## Additional References

🧑‍🍳 Cookbook: [RAG Pipeline Using FastEmbed for Embeddings Generation](https://haystack.deepset.ai/cookbook/rag_fastembed)
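
To make the retrieval step from the Overview more concrete, the sketch below shows how a query vector can be compared against document vectors with cosine similarity, the similarity function configured on the `InMemoryDocumentStore` in the pipeline example. It is an illustration only and not Haystack's or FastEmbed's actual implementation: the helper names `cosine_similarity` and `rank_by_similarity` are hypothetical, and the three-dimensional vectors are made up (real embeddings have hundreds of dimensions).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_embedding, doc_embeddings):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [
        (doc_id, cosine_similarity(query_embedding, embedding))
        for doc_id, embedding in doc_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for the output of a text embedder (hypothetical values)
query_embedding = [1.0, 0.0, 1.0]
doc_embeddings = {
    "qdrant": [0.9, 0.1, 0.8],
    "horse": [0.0, 1.0, 0.0],
    "berlin": [0.5, 0.5, 0.0],
}

ranking = rank_by_similarity(query_embedding, doc_embeddings)
print(ranking[0][0])  # the document whose vector points in the most similar direction
```

In a real pipeline, the embedding Retriever performs this comparison for you; the only requirement is that the query and the documents are embedded with the same model so that their vectors live in the same space.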