---
title: "FAISSEmbeddingRetriever"
id: faissembeddingretriever
slug: "/faissembeddingretriever"
description: "An embedding-based Retriever compatible with the FAISSDocumentStore."
---

# FAISSEmbeddingRetriever

An embedding-based Retriever compatible with the FAISSDocumentStore.

<div className="key-value-table">

|  |  |
| --- | --- |
| **Most common position in a pipeline** | 1. After a Text Embedder and before a [`PromptBuilder`](../builders/promptbuilder.mdx) in a RAG pipeline 2. The last component in a semantic search pipeline 3. After a Text Embedder and before an [`ExtractiveReader`](../readers/extractivereader.mdx) in an extractive QA pipeline |
| **Mandatory init variables**           | `document_store`: An instance of a [`FAISSDocumentStore`](../../document-stores/faissdocumentstore.mdx) |
| **Mandatory run variables**            | `query_embedding`: A vector representing the query (a list of floats) |
| **Output variables**                   | `documents`: A list of documents |
| **API reference**                      | [FAISS](/reference/integrations-faiss) |
| **GitHub link**                        | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/faiss |

</div>

## Overview

The `FAISSEmbeddingRetriever` is an embedding-based Retriever that queries a `FAISSDocumentStore`. It compares the query embedding to the document embeddings stored in FAISS and returns the most similar documents.

This Retriever expects precomputed document embeddings in the Document Store and a query embedding at runtime. Generate the document embeddings with a Document Embedder in your indexing pipeline and the query embedding with a Text Embedder in your query pipeline. Use the same embedding model for both so that the vectors are comparable.

In addition to `query_embedding`, you can pass the following parameters at runtime:

- `top_k`: The maximum number of documents to return.
- `filters`: Metadata filters to restrict retrieved documents.

You can also configure default filters and a `filter_policy` at initialization. The policy controls how run-time filters are combined with the defaults.
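
To make the retrieval step concrete, here is a minimal, library-free sketch of embedding-based retrieval with `top_k` and a metadata filter. It is a conceptual illustration only, not the integration's implementation: the real Retriever delegates the search to a FAISS index, and the similarity metric depends on how that index is configured. Cosine similarity, the brute-force scan, the exact-match `meta_filter` dictionary, and the names `cosine_similarity` and `retrieve` are all assumptions made for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_embedding, docs, top_k=10, meta_filter=None):
    """Hypothetical helper: filter by metadata, rank by similarity, keep top_k."""
    candidates = [
        d
        for d in docs
        if meta_filter is None
        or all(d["meta"].get(key) == value for key, value in meta_filter.items())
    ]
    ranked = sorted(
        candidates,
        key=lambda d: cosine_similarity(query_embedding, d["embedding"]),
        reverse=True,
    )
    return ranked[:top_k]


# Toy documents with 2-dimensional embeddings, for illustration only.
docs = [
    {"content": "languages", "embedding": [1.0, 0.0], "meta": {"topic": "linguistics"}},
    {"content": "elephants", "embedding": [0.0, 1.0], "meta": {"topic": "biology"}},
    {"content": "bioluminescence", "embedding": [0.6, 0.8], "meta": {"topic": "biology"}},
]

top = retrieve([1.0, 0.1], docs, top_k=2)
print([d["content"] for d in top])  # ['languages', 'bioluminescence']

filtered = retrieve([1.0, 0.1], docs, top_k=2, meta_filter={"topic": "biology"})
print([d["content"] for d in filtered])  # ['bioluminescence', 'elephants']
```

In this sketch the filter narrows the candidate set before ranking, so `top_k` applies only to documents that match the filter.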

## Usage

### On its own

```python
from haystack_integrations.document_stores.faiss import FAISSDocumentStore
from haystack_integrations.components.retrievers.faiss import FAISSEmbeddingRetriever

# The embedding dimension must match the model that produces your embeddings.
document_store = FAISSDocumentStore(embedding_dim=768)
retriever = FAISSEmbeddingRetriever(document_store=document_store, top_k=5)

# Placeholder query embedding; in practice, produce it with a Text Embedder.
result = retriever.run(query_embedding=[0.1] * 768)
print(result["documents"])
```

### In a pipeline

```python
from haystack import Document, Pipeline
from haystack.components.embedders import (
    SentenceTransformersDocumentEmbedder,
    SentenceTransformersTextEmbedder,
)
from haystack.document_stores.types import DuplicatePolicy
from haystack_integrations.document_stores.faiss import FAISSDocumentStore
from haystack_integrations.components.retrievers.faiss import FAISSEmbeddingRetriever

# 768 matches the output size of the embedders' default model.
document_store = FAISSDocumentStore(embedding_dim=768)

documents = [
    Document(content="There are over 7,000 languages spoken around the world today."),
    Document(
        content="Elephants have been observed to behave in a way that indicates a high level of intelligence.",
    ),
    Document(
        content="In certain places, you can witness the phenomenon of bioluminescent waves.",
    ),
]

# Indexing: embed the documents and write them to the Document Store.
document_embedder = SentenceTransformersDocumentEmbedder()
# Load the model explicitly; needed on Haystack versions that do not
# warm up standalone components automatically.
document_embedder.warm_up()
documents_with_embeddings = document_embedder.run(documents)["documents"]
document_store.write_documents(
    documents_with_embeddings,
    policy=DuplicatePolicy.OVERWRITE,
)

# Querying: embed the query text, then retrieve the most similar documents.
query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", SentenceTransformersTextEmbedder())
query_pipeline.add_component(
    "retriever",
    FAISSEmbeddingRetriever(document_store=document_store),
)
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

query = "How many languages are there?"
result = query_pipeline.run({"text_embedder": {"text": query}})

# Print the top-ranked document.
print(result["retriever"]["documents"][0])
```