---
title: "FastembedSparseTextEmbedder"
id: fastembedsparsetextembedder
slug: "/fastembedsparsetextembedder"
description: "Use this component to embed a simple string (such as a query) into a sparse vector."
---

# FastembedSparseTextEmbedder

Use this component to embed a simple string (such as a query) into a sparse vector.

<div className="key-value-table">

| | |
| --- | --- |
| **Most common position in a pipeline** | Before a sparse embedding [Retriever](../retrievers.mdx) in a query/RAG pipeline |
| **Mandatory run variables** | `text`: A string |
| **Output variables** | `sparse_embedding`: A [`SparseEmbedding`](../../concepts/data-classes.mdx#sparseembedding) object |
| **API reference** | [FastEmbed](/reference/fastembed-embedders) |
| **GitHub link** | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/fastembed |

</div>

For embedding lists of documents, use the [`FastembedSparseDocumentEmbedder`](fastembedsparsedocumentembedder.mdx), which enriches each document with its computed sparse embedding.

## Overview

`FastembedSparseTextEmbedder` transforms a string into a sparse vector using sparse embedding [models](https://qdrant.github.io/fastembed/examples/Supported_Models/#supported-sparse-text-embedding-models) supported by FastEmbed.

When you perform sparse embedding retrieval, use this component first to transform your query into a sparse vector. The sparse embedding Retriever then uses this vector to search for similar or relevant documents.

### Compatible Models

You can find the supported models in the [FastEmbed documentation](https://qdrant.github.io/fastembed/examples/Supported_Models/#supported-sparse-text-embedding-models).
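As background on what these models produce: a sparse embedding stores only the non-zero term weights, indexed by vocabulary position. The minimal plain-Python sketch below uses no FastEmbed or Haystack APIs, and the indices and weights are invented for illustration; it shows how two such vectors can be scored against each other with a dot product over their shared indices:

```python
# A sparse embedding keeps only the non-zero entries: {vocabulary_index: weight}.
# These indices and weights are made up for illustration.
query_embedding = {101: 0.9, 2054: 0.4, 30000: 0.7}
doc_embedding = {101: 0.8, 511: 0.2, 30000: 0.5}

# Relevance score: dot product over the indices both vectors share.
score = sum(
    weight * doc_embedding[idx]
    for idx, weight in query_embedding.items()
    if idx in doc_embedding
)
print(score)  # 0.9*0.8 + 0.7*0.5, i.e. about 1.07
```

Because most entries are zero and never stored, this representation stays compact even though the underlying vocabulary has tens of thousands of dimensions.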
Currently, the supported models are based on SPLADE, a technique for producing sparse representations of text, where each non-zero value in the embedding is the importance weight of a term in the BERT WordPiece vocabulary. For more information, see [our docs](../retrievers.mdx#sparse-embedding-based-retrievers), which explain sparse embedding-based Retrievers in more detail.

### Installation

To start using this integration with Haystack, install the package with:

```shell
pip install fastembed-haystack
```

### Parameters

You can set the cache directory where the model is stored, as well as the number of threads a single `onnxruntime` session can use:

```python
from haystack_integrations.components.embedders.fastembed import (
    FastembedSparseTextEmbedder,
)

cache_dir = "/your_cache_directory"
embedder = FastembedSparseTextEmbedder(
    model="prithivida/Splade_PP_en_v1",
    cache_dir=cache_dir,
    threads=2,
)
```

If you want to use data-parallel encoding, set the `parallel` parameter:

- If `parallel` > 1, data-parallel encoding is used. This is recommended for offline encoding of large datasets.
- If `parallel` is 0, all available cores are used.
- If `parallel` is `None`, data-parallel processing is not used, and the default `onnxruntime` threading applies instead.

:::tip
If you create both a Sparse Text Embedder and a Sparse Document Embedder based on the same model, Haystack shares the underlying model behind the scenes to conserve resources.
:::

## Usage

### On its own

```python
from haystack_integrations.components.embedders.fastembed import (
    FastembedSparseTextEmbedder,
)

text = """It clearly says online this will work on a Mac OS system.
The disk comes and it does not, only Windows.
Do Not order this if you have a Mac!!"""

text_embedder = FastembedSparseTextEmbedder(model="prithivida/Splade_PP_en_v1")

sparse_embedding = text_embedder.run(text)["sparse_embedding"]
```

### In a pipeline

Currently, sparse embedding retrieval is only supported by `QdrantDocumentStore`.
First, install the package with:

```shell
pip install qdrant-haystack
```

Then, try out this pipeline:

```python
from haystack import Document, Pipeline
from haystack_integrations.document_stores.qdrant import QdrantDocumentStore
from haystack_integrations.components.retrievers.qdrant import (
    QdrantSparseEmbeddingRetriever,
)
from haystack_integrations.components.embedders.fastembed import (
    FastembedSparseTextEmbedder,
    FastembedSparseDocumentEmbedder,
)

document_store = QdrantDocumentStore(
    ":memory:",
    recreate_index=True,
    use_sparse_embeddings=True,
)

documents = [
    Document(content="My name is Wolfgang and I live in Berlin"),
    Document(content="I saw a black horse running"),
    Document(content="Germany has many big cities"),
    Document(content="fastembed is supported by and maintained by Qdrant."),
]

sparse_document_embedder = FastembedSparseDocumentEmbedder(
    model="prithivida/Splade_PP_en_v1",
)

documents_with_sparse_embeddings = sparse_document_embedder.run(documents)["documents"]
document_store.write_documents(documents_with_sparse_embeddings)

query_pipeline = Pipeline()
query_pipeline.add_component(
    "sparse_text_embedder",
    FastembedSparseTextEmbedder(model="prithivida/Splade_PP_en_v1"),
)
query_pipeline.add_component(
    "sparse_retriever",
    QdrantSparseEmbeddingRetriever(document_store=document_store),
)
query_pipeline.connect(
    "sparse_text_embedder.sparse_embedding",
    "sparse_retriever.query_sparse_embedding",
)

query = "Who supports fastembed?"
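# `Pipeline.run` takes a dict that maps each component name to its inputs.
# Only "sparse_text_embedder" needs an input here; the retriever receives
# the query's sparse embedding through the connection declared above.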
result = query_pipeline.run({"sparse_text_embedder": {"text": query}})

print(result["sparse_retriever"]["documents"][0])

## Document(id=...,
##  content: 'fastembed is supported by and maintained by Qdrant.',
##  score: 0.561...)
```

## Additional References

🧑‍🍳 Cookbook: [Sparse Embedding Retrieval with Qdrant and FastEmbed](https://haystack.deepset.ai/cookbook/sparse_embedding_retrieval)