---
title: "InMemoryEmbeddingRetriever"
id: inmemoryembeddingretriever
slug: "/inmemoryembeddingretriever"
description: "Use this Retriever with the InMemoryDocumentStore if you're looking for embedding-based retrieval."
---

# InMemoryEmbeddingRetriever

Use this Retriever with the InMemoryDocumentStore if you're looking for embedding-based retrieval.

<div className="key-value-table">

|  |  |
| --- | --- |
| **Most common position in a pipeline** | In query pipelines: <br />In a RAG pipeline, before a [`PromptBuilder`](../builders/promptbuilder.mdx) <br />In a semantic search pipeline, as the last component <br />In an extractive QA pipeline, after a Text Embedder and before an [`ExtractiveReader`](../readers/extractivereader.mdx) |
| **Mandatory init variables** | `document_store`: An instance of [InMemoryDocumentStore](../../document-stores/inmemorydocumentstore.mdx) |
| **Mandatory run variables** | `query_embedding`: A list of floating point numbers |
| **Output variables** | `documents`: A list of documents |
| **API reference** | [Retrievers](/reference/retrievers-api) |
| **GitHub link** | https://github.com/deepset-ai/haystack/blob/main/haystack/components/retrievers/in_memory/embedding_retriever.py |

</div>

## Overview

The `InMemoryEmbeddingRetriever` is an embedding-based Retriever compatible with the `InMemoryDocumentStore`. It compares the query embedding with the Document embeddings and fetches the Documents most relevant to the query from the `InMemoryDocumentStore` based on their similarity.

When using the `InMemoryEmbeddingRetriever` in your NLP system, make sure both the query and Document embeddings are available. You can do so by adding a Document Embedder to your indexing pipeline and a Text Embedder to your query pipeline. For details, see [Embedders](../embedders.mdx).
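
To make the comparison step concrete, here is a minimal, library-independent sketch of ranking by cosine similarity. It is an illustration only, not Haystack's implementation: the helper names `cosine_similarity` and `rank_by_embedding` and the three-dimensional toy vectors are assumptions made for this example.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_embedding(query_embedding, doc_embeddings, top_k=2):
    """Return the names of the top_k documents most similar to the query."""
    scored = [
        (cosine_similarity(query_embedding, embedding), name)
        for name, embedding in doc_embeddings.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:top_k]]


# Hand-written toy embeddings; real ones come from an Embedder model.
doc_embeddings = {
    "languages": [0.9, 0.1, 0.0],
    "elephants": [0.1, 0.9, 0.1],
    "bioluminescence": [0.0, 0.2, 0.9],
}
query_embedding = [1.0, 0.0, 0.0]

ranked = rank_by_embedding(query_embedding, doc_embeddings, top_k=2)
print(ranked)
```

The query vector points in almost the same direction as the "languages" vector, so that document ranks first. Real embeddings typically have hundreds of dimensions, but the ranking idea is the same.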

In addition to the `query_embedding`, the `InMemoryEmbeddingRetriever` accepts other optional parameters, including `top_k` (the maximum number of Documents to retrieve) and `filters` to narrow down the search space.

The `embedding_similarity_function` to use for embedding retrieval must be defined when the corresponding `InMemoryDocumentStore` is initialized.

## Usage

### In a pipeline

Use this Retriever in a query pipeline like this:

```python
from haystack import Document, Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.embedders import (
    SentenceTransformersTextEmbedder,
    SentenceTransformersDocumentEmbedder,
)
from haystack.components.retrievers import InMemoryEmbeddingRetriever

# The similarity function is chosen on the Document Store, not on the Retriever.
document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")

documents = [
    Document(content="There are over 7,000 languages spoken around the world today."),
    Document(
        content="Elephants have been observed to behave in a way that indicates a high level of self-awareness, such as recognizing themselves in mirrors.",
    ),
    Document(
        content="In certain parts of the world, like the Maldives, Puerto Rico, and San Diego, you can witness the phenomenon of bioluminescent waves.",
    ),
]

# Indexing: embed the Documents and write them to the store.
document_embedder = SentenceTransformersDocumentEmbedder()
# Load the model before calling run() directly
# (harmless if the component already warms up on its own).
document_embedder.warm_up()

documents_with_embeddings = document_embedder.run(documents)["documents"]
document_store.write_documents(documents_with_embeddings)

# Querying: embed the query text and pass the embedding to the Retriever.
query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", SentenceTransformersTextEmbedder())
query_pipeline.add_component(
    "retriever",
    InMemoryEmbeddingRetriever(document_store=document_store),
)
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

query = "How many languages are there?"

result = query_pipeline.run({"text_embedder": {"text": query}})

# Print the most relevant Document.
print(result["retriever"]["documents"][0])
```