---
title: "PineconeEmbeddingRetriever"
id: pineconedenseretriever
slug: "/pineconedenseretriever"
description: "An embedding-based Retriever compatible with the Pinecone Document Store."
---

# PineconeEmbeddingRetriever

An embedding-based Retriever compatible with the Pinecone Document Store.

<div className="key-value-table">

|  |  |
| --- | --- |
| **Most common position in a pipeline** | 1. After a Text Embedder and before a [`PromptBuilder`](../builders/promptbuilder.mdx) in a RAG pipeline <br /> 2. The last component in the semantic search pipeline <br /> 3. After a Text Embedder and before an [`ExtractiveReader`](../readers/extractivereader.mdx) in an extractive QA pipeline |
| **Mandatory init variables** | `document_store`: An instance of a [PineconeDocumentStore](../../document-stores/pinecone-document-store.mdx) |
| **Mandatory run variables** | `query_embedding`: A vector representing the query (a list of floats) |
| **Output variables** | `documents`: A list of documents |
| **API reference** | [Pinecone](/reference/integrations-pinecone) |
| **GitHub link** | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/pinecone |

</div>

## Overview

The `PineconeEmbeddingRetriever` is an embedding-based Retriever compatible with the `PineconeDocumentStore`. It compares the query and Document embeddings and fetches the Documents most relevant to the query from the `PineconeDocumentStore` based on the outcome.

When using the `PineconeEmbeddingRetriever` in your NLP system, make sure it has the query and Document embeddings available. You can do so by adding a Document Embedder to your indexing Pipeline and a Text Embedder to your query Pipeline.

In addition to the `query_embedding`, the `PineconeEmbeddingRetriever` accepts other optional parameters, including `top_k` (the maximum number of Documents to retrieve) and `filters` to narrow down the search space.

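To make the roles of `query_embedding`, `top_k`, and `filters` concrete, here is a minimal, library-free sketch of embedding retrieval. It is a conceptual illustration only and not how Pinecone or the `PineconeEmbeddingRetriever` is implemented: the `retrieve` helper, its `keep` predicate, the toy two-dimensional vectors, and the choice of cosine similarity are all assumptions made for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_embedding, docs, top_k=10, keep=None):
    """Hypothetical helper: filter first, then rank by similarity, then cut to top_k."""
    # Narrow the search space (stands in for `filters`).
    candidates = [d for d in docs if keep is None or keep(d)]
    # Score every remaining document against the query embedding.
    scored = [(cosine_similarity(query_embedding, d["embedding"]), d) for d in candidates]
    # Highest similarity first; sort on the score only.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy documents with made-up embeddings and metadata.
docs = [
    {"content": "languages", "embedding": [1.0, 0.0], "meta": {"lang": "en"}},
    {"content": "elephants", "embedding": [0.0, 1.0], "meta": {"lang": "en"}},
    {"content": "waves", "embedding": [0.7, 0.7], "meta": {"lang": "de"}},
]

results = retrieve(
    [1.0, 0.1],
    docs,
    top_k=2,
    keep=lambda d: d["meta"]["lang"] == "en",
)
for score, doc in results:
    print(f"{doc['content']}: {score:.3f}")
```

In the real component, the filtering and similarity search happen inside the Pinecone index rather than in Python, and the similarity function is determined by the index configuration.
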
Some relevant parameters that impact the embedding retrieval must be defined when the corresponding `PineconeDocumentStore` is initialized: these include the `dimension` of the embeddings and the distance `metric` to use.

## Usage

### On its own

This Retriever needs the `PineconeDocumentStore` and indexed Documents to run.

```python
from haystack_integrations.components.retrievers.pinecone import (
    PineconeEmbeddingRetriever,
)
from haystack_integrations.document_stores.pinecone import PineconeDocumentStore

# Make sure you have the PINECONE_API_KEY environment variable set
document_store = PineconeDocumentStore(
    index="my_index_with_documents",
    namespace="my_namespace",
    dimension=768,
)

retriever = PineconeEmbeddingRetriever(document_store=document_store)

# using an imaginary vector to keep the example simple, example run query:
retriever.run(query_embedding=[0.1] * 768)
```

### In a pipeline

Install the dependencies you’ll need:

```shell
pip install pinecone-haystack
pip install sentence-transformers
```

Use this Retriever in a query Pipeline like this:

```python
from haystack.document_stores.types import DuplicatePolicy
from haystack import Document
from haystack import Pipeline
from haystack.components.embedders import (
    SentenceTransformersTextEmbedder,
    SentenceTransformersDocumentEmbedder,
)
from haystack_integrations.components.retrievers.pinecone import (
    PineconeEmbeddingRetriever,
)
from haystack_integrations.document_stores.pinecone import PineconeDocumentStore

# Make sure you have the PINECONE_API_KEY environment variable set
document_store = PineconeDocumentStore(
    index="my_index",
    namespace="my_namespace",
    dimension=768,
)

documents = [
    Document(content="There are over 7,000 languages spoken around the world today."),
    Document(
        content="Elephants have been observed to behave in a way that indicates a high level of self-awareness, such as recognizing themselves in mirrors.",
    ),
    Document(
        content="In certain parts of the world, like the Maldives, Puerto Rico, and San Diego, you can witness the phenomenon of bioluminescent waves.",
    ),
]

document_embedder = SentenceTransformersDocumentEmbedder()
documents_with_embeddings = document_embedder.run(documents)

document_store.write_documents(
    documents_with_embeddings.get("documents"),
    policy=DuplicatePolicy.OVERWRITE,
)

query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", SentenceTransformersTextEmbedder())
query_pipeline.add_component(
    "retriever",
    PineconeEmbeddingRetriever(document_store=document_store),
)
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

query = "How many languages are there?"

result = query_pipeline.run({"text_embedder": {"text": query}})

print(result["retriever"]["documents"][0])
```

The example output would be:

```python
Document(id=cfe93bc1c274908801e6670440bf2bbba54fad792770d57421f85ffa2a4fcc94, content: 'There are over 7,000 languages spoken around the world today.', score: 0.87717235, embedding: vector of size 768)
```
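
The optional `top_k` and `filters` parameters can also be supplied at query time by adding an entry for the retriever to the pipeline inputs. The following is a minimal sketch that only builds the inputs dictionary, so it can be read without a Pinecone index. It assumes the component names `text_embedder` and `retriever` from the pipeline above, and the `meta.topic` field is hypothetical: the example Documents carry no metadata, so you would need to index Documents with a matching `meta` entry for this filter to return anything.

```python
# Run-time inputs for the query pipeline: one entry per component name.
query = "How many languages are there?"

run_inputs = {
    "text_embedder": {"text": query},
    "retriever": {
        # Return at most two Documents.
        "top_k": 2,
        # Hypothetical metadata field, written in Haystack's filter syntax.
        "filters": {"field": "meta.topic", "operator": "==", "value": "languages"},
    },
}

# With the pipeline from the example above in scope:
# result = query_pipeline.run(run_inputs)
```

Passing these values at run time keeps the retriever reusable across queries that need different limits or filters.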