---
title: "InMemoryEmbeddingRetriever"
id: inmemoryembeddingretriever
slug: "/inmemoryembeddingretriever"
description: "Use this Retriever with the InMemoryDocumentStore if you're looking for embedding-based retrieval."
---

# InMemoryEmbeddingRetriever

Use this Retriever with the InMemoryDocumentStore if you're looking for embedding-based retrieval.

<div className="key-value-table">

|  |  |
| --- | --- |
| **Most common position in a pipeline** | In query pipelines:  <br />In a RAG pipeline, before a [`PromptBuilder`](../builders/promptbuilder.mdx)  <br />In a semantic search pipeline, as the last component  <br />In an extractive QA pipeline, after a Text Embedder and before an [`ExtractiveReader`](../readers/extractivereader.mdx) |
| **Mandatory init variables** | `document_store`: An instance of [InMemoryDocumentStore](../../document-stores/inmemorydocumentstore.mdx) |
| **Mandatory run variables** | `query_embedding`: A list of floating point numbers |
| **Output variables** | `documents`: A list of documents |
| **API reference** | [Retrievers](/reference/retrievers-api) |
| **GitHub link** | https://github.com/deepset-ai/haystack/blob/main/haystack/components/retrievers/in_memory/embedding_retriever.py |

</div>

## Overview

The `InMemoryEmbeddingRetriever` is an embedding-based Retriever compatible with the `InMemoryDocumentStore`. It compares the query embedding with the Document embeddings and returns the Documents from the `InMemoryDocumentStore` that are most similar to the query.

When using the `InMemoryEmbeddingRetriever` in your NLP system, make sure that both the query and the Document embeddings are available. To do so, add a Document Embedder to your indexing pipeline and a Text Embedder to your query pipeline. For details, see [Embedders](../embedders.mdx).

In addition to the `query_embedding`, the `InMemoryEmbeddingRetriever` accepts other optional parameters, including `top_k` (the maximum number of Documents to retrieve) and `filters` to narrow down the search space.

The `embedding_similarity_function` used for embedding retrieval must be defined when the corresponding `InMemoryDocumentStore` is initialized.
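
To illustrate why this choice matters, here is a small plain-Python sketch (not Haystack's implementation) that compares the two similarity functions the `InMemoryDocumentStore` accepts, `dot_product` and `cosine`. The vectors and helper names are made up for illustration. Dot product rewards longer vectors, while cosine similarity only considers direction, so the two can rank the same Documents differently.

```python
import math


def dot_product(a, b):
    """Sum of element-wise products; sensitive to vector length."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Dot product of the length-normalized vectors; depends on direction only."""
    return dot_product(a, b) / (
        math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b))
    )


query = [1.0, 0.0]

# Hypothetical Document embeddings, keyed by a descriptive name.
embeddings = {
    "long_but_off_direction": [3.0, 3.0],
    "short_but_aligned": [1.0, 0.1],
}


def rank(similarity):
    """Return the embedding names ordered from most to least similar to the query."""
    return sorted(
        embeddings,
        key=lambda name: similarity(query, embeddings[name]),
        reverse=True,
    )


ranking_by_dot_product = rank(dot_product)
ranking_by_cosine = rank(cosine)

print(ranking_by_dot_product)
print(ranking_by_cosine)
```

If the Document embeddings are normalized to unit length, both functions produce the same ranking.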

## Usage

### In a pipeline

Use this Retriever in a query pipeline like this:

```python
from haystack import Document, Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.embedders import (
    SentenceTransformersTextEmbedder,
    SentenceTransformersDocumentEmbedder,
)
from haystack.components.retrievers import InMemoryEmbeddingRetriever

# The similarity function is fixed when the Document Store is created.
document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")

documents = [
    Document(content="There are over 7,000 languages spoken around the world today."),
    Document(
        content="Elephants have been observed to behave in a way that indicates a high level of self-awareness, such as recognizing themselves in mirrors.",
    ),
    Document(
        content="In certain parts of the world, like the Maldives, Puerto Rico, and San Diego, you can witness the phenomenon of bioluminescent waves.",
    ),
]

# Indexing: embed the Documents and write them to the Document Store.
document_embedder = SentenceTransformersDocumentEmbedder()
# Load the model explicitly; older Haystack 2.x releases require this
# before calling run() on a standalone Embedder.
document_embedder.warm_up()

documents_with_embeddings = document_embedder.run(documents)["documents"]
document_store.write_documents(documents_with_embeddings)

# Querying: embed the query text, then pass the embedding to the Retriever.
query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", SentenceTransformersTextEmbedder())
query_pipeline.add_component(
    "retriever",
    InMemoryEmbeddingRetriever(document_store=document_store),
)
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

query = "How many languages are there?"

result = query_pipeline.run({"text_embedder": {"text": query}})

# Print the Document most similar to the query.
print(result["retriever"]["documents"][0])
```