---
title: "PineconeEmbeddingRetriever"
id: pineconedenseretriever
slug: "/pineconedenseretriever"
description: "An embedding-based Retriever compatible with the Pinecone Document Store."
---

# PineconeEmbeddingRetriever

An embedding-based Retriever compatible with the Pinecone Document Store.

<div className="key-value-table">

|  |  |
| --- | --- |
| **Most common position in a pipeline** | 1. After a Text Embedder and before a [`PromptBuilder`](../builders/promptbuilder.mdx) in a RAG pipeline <br /> 2. The last component in the semantic search pipeline <br /> 3. After a Text Embedder and before an [`ExtractiveReader`](../readers/extractivereader.mdx) in an extractive QA pipeline |
| **Mandatory init variables** | `document_store`: An instance of a [PineconeDocumentStore](../../document-stores/pinecone-document-store.mdx) |
| **Mandatory run variables** | `query_embedding`: A vector representing the query (a list of floats) |
| **Output variables** | `documents`: A list of documents |
| **API reference** | [Pinecone](/reference/integrations-pinecone) |
| **GitHub link** | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/pinecone |

</div>

## Overview

The `PineconeEmbeddingRetriever` is an embedding-based Retriever compatible with the `PineconeDocumentStore`. It compares the query embedding with the Document embeddings and returns the Documents from the `PineconeDocumentStore` that are most similar to the query.
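
To make the comparison step concrete, here is a minimal, dependency-free sketch of embedding retrieval: score every stored vector against the query and keep the best `top_k`. It is only an illustration. The function names and the toy 3-dimensional vectors are made up, and Pinecone runs the search on its own servers rather than with a brute-force loop like this one.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_embedding: list[float], stored: dict[str, list[float]], top_k: int = 2):
    """Return the top_k (score, label) pairs, best match first."""
    scored = [
        (cosine_similarity(query_embedding, embedding), label)
        for label, embedding in stored.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy "index": labels mapped to made-up embeddings (illustrative values only)
stored = {
    "languages": [0.9, 0.1, 0.0],
    "elephants": [0.1, 0.9, 0.1],
    "bioluminescence": [0.0, 0.2, 0.9],
}

results = retrieve([1.0, 0.0, 0.0], stored, top_k=2)
for score, label in results:
    print(f"{label}: {score:.3f}")
```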

When using the `PineconeEmbeddingRetriever` in your NLP system, make sure both the query and Document embeddings are available. To do so, add a Document Embedder to your indexing Pipeline and a Text Embedder to your query Pipeline. Use the same embedding model in both so that the vectors are comparable.
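
As a rough sketch of the indexing side, a Document Embedder can be connected to a `DocumentWriter` so that Documents are embedded and stored in one step. The function name `build_indexing_pipeline` and the component names are illustrative choices, and the imports sit inside the function only so that the snippet can be defined where the packages are not installed.

```python
def build_indexing_pipeline(document_store):
    """Build a Pipeline that embeds Documents and writes them to the store."""
    # Imports are local so this sketch can be defined without haystack-ai
    # and sentence-transformers being installed.
    from haystack import Pipeline
    from haystack.components.embedders import SentenceTransformersDocumentEmbedder
    from haystack.components.writers import DocumentWriter
    from haystack.document_stores.types import DuplicatePolicy

    indexing = Pipeline()
    indexing.add_component("document_embedder", SentenceTransformersDocumentEmbedder())
    indexing.add_component(
        "writer",
        DocumentWriter(document_store=document_store, policy=DuplicatePolicy.OVERWRITE),
    )
    # The embedder outputs Documents with embeddings; the writer stores them
    indexing.connect("document_embedder.documents", "writer.documents")
    return indexing


# Hypothetical usage, given a PineconeDocumentStore and a list of Documents:
# indexing = build_indexing_pipeline(document_store)
# indexing.run({"document_embedder": {"documents": documents}})
```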

In addition to the `query_embedding`, the `PineconeEmbeddingRetriever` accepts other optional parameters, including `top_k` (the maximum number of Documents to retrieve) and `filters` to narrow down the search space.
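
As an illustration, the optional parameters can be passed at query time alongside the embedding. The sketch below only assembles the keyword arguments. The helper name `build_run_inputs` and the `meta.language` field are hypothetical, and the filter follows Haystack's general metadata filter syntax (`field`, `operator`, `value`); which operators are supported may depend on the version of the Pinecone integration.

```python
from typing import Any, Optional


def build_run_inputs(
    query_embedding: list[float],
    top_k: Optional[int] = None,
    filters: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Collect the arguments for retriever.run(), skipping unset options."""
    inputs: dict[str, Any] = {"query_embedding": query_embedding}
    if top_k is not None:
        if top_k <= 0:
            raise ValueError("top_k must be a positive integer")
        inputs["top_k"] = top_k
    if filters is not None:
        inputs["filters"] = filters
    return inputs


# Hypothetical metadata field; adapt it to the metadata of your Documents
language_filter = {"field": "meta.language", "operator": "==", "value": "en"}
inputs = build_run_inputs([0.1] * 768, top_k=3, filters=language_filter)

# With a configured retriever: retriever.run(**inputs)
```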

Some parameters that affect embedding retrieval must be set when the corresponding `PineconeDocumentStore` is initialized, including the `dimension` of the embeddings and the distance `metric` to use.
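
These settings describe the index itself: every stored vector and every query vector must have the same length, and the metric decides what "closest" means. The dependency-free sketch below illustrates both points with toy 2-dimensional vectors. The function names are made up for this example, and the three metrics shown (cosine, dot product, Euclidean) are the ones Pinecone documents for its indexes.

```python
import math


def check_dimension(vector: list[float], dimension: int) -> None:
    """Raise if a vector does not match the dimension of the index."""
    if len(vector) != dimension:
        raise ValueError(f"expected {dimension} values, got {len(vector)}")


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: list[float], b: list[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def euclidean(a: list[float], b: list[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


query = [1.0, 0.0]
short_doc = [0.9, 0.1]  # almost the same direction as the query
long_doc = [5.0, 5.0]   # different direction, much larger magnitude

for vector in (query, short_doc, long_doc):
    check_dimension(vector, 2)

candidates = (short_doc, long_doc)
# Higher is better for cosine and dot product; lower is better for distance
best_by_cosine = max(candidates, key=lambda d: cosine(query, d))
best_by_dot = max(candidates, key=lambda d: dot(query, d))
best_by_euclidean = min(candidates, key=lambda d: euclidean(query, d))
```

With these toy values, cosine and Euclidean both pick `short_doc`, while the dot product prefers `long_doc` because of its larger magnitude, so the same query can rank Documents differently depending on the metric.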

## Usage

### On its own

This Retriever needs the `PineconeDocumentStore` and indexed Documents to run.

```python
from haystack_integrations.components.retrievers.pinecone import (
    PineconeEmbeddingRetriever,
)
from haystack_integrations.document_stores.pinecone import PineconeDocumentStore

# Make sure you have the PINECONE_API_KEY environment variable set
document_store = PineconeDocumentStore(
    index="my_index_with_documents",
    namespace="my_namespace",
    dimension=768,
)

retriever = PineconeEmbeddingRetriever(document_store=document_store)

# Use a made-up vector to keep the example simple
retriever.run(query_embedding=[0.1] * 768)
```
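
The example above assumes that `PINECONE_API_KEY` is already set. One way to fail early with a clear message is a small preflight check like the sketch below; `require_env` is a hypothetical helper written for this example, not part of Haystack or the Pinecone integration.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; export it before creating the document store."
        )
    return value


# Typical use, before constructing PineconeDocumentStore:
# require_env("PINECONE_API_KEY")
```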

### In a pipeline

Install the dependencies you’ll need:

```shell
pip install pinecone-haystack
pip install sentence-transformers
```

Use this Retriever in a query Pipeline like this:

```python
from haystack import Document, Pipeline
from haystack.components.embedders import (
    SentenceTransformersDocumentEmbedder,
    SentenceTransformersTextEmbedder,
)
from haystack.document_stores.types import DuplicatePolicy
from haystack_integrations.components.retrievers.pinecone import (
    PineconeEmbeddingRetriever,
)
from haystack_integrations.document_stores.pinecone import PineconeDocumentStore

# Make sure you have the PINECONE_API_KEY environment variable set
document_store = PineconeDocumentStore(
    index="my_index",
    namespace="my_namespace",
    dimension=768,
)

documents = [
    Document(content="There are over 7,000 languages spoken around the world today."),
    Document(
        content="Elephants have been observed to behave in a way that indicates a high level of self-awareness, such as recognizing themselves in mirrors.",
    ),
    Document(
        content="In certain parts of the world, like the Maldives, Puerto Rico, and San Diego, you can witness the phenomenon of bioluminescent waves.",
    ),
]

# Embed the Documents and write them to the Document Store
document_embedder = SentenceTransformersDocumentEmbedder()
# Load the model before calling run() outside of a Pipeline
document_embedder.warm_up()
documents_with_embeddings = document_embedder.run(documents)

document_store.write_documents(
    documents_with_embeddings.get("documents"),
    policy=DuplicatePolicy.OVERWRITE,
)

# Build the query Pipeline: embed the query text, then retrieve by embedding
query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", SentenceTransformersTextEmbedder())
query_pipeline.add_component(
    "retriever",
    PineconeEmbeddingRetriever(document_store=document_store),
)
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

query = "How many languages are there?"

result = query_pipeline.run({"text_embedder": {"text": query}})

# Print the best-matching Document
print(result["retriever"]["documents"][0])
```

The example output would be:

```python
Document(id=cfe93bc1c274908801e6670440bf2bbba54fad792770d57421f85ffa2a4fcc94, content: 'There are over 7,000 languages spoken around the world today.', score: 0.87717235, embedding: vector of size 768)
```
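
Each returned Document carries a `score` next to its `content`. A small helper such as the hypothetical `format_hits` below can turn the list into readable lines. In this sketch, `SimpleNamespace` objects stand in for Haystack Documents so that it runs without a Pinecone index, and the content and score are copied from the example output above.

```python
from types import SimpleNamespace


def format_hits(documents, max_chars: int = 80) -> list[str]:
    """Format objects with .content and .score as ranked, one-line summaries."""
    lines = []
    for rank, doc in enumerate(documents, start=1):
        text = doc.content
        if len(text) > max_chars:
            # Shorten long content so each hit stays on one line
            text = text[: max_chars - 3] + "..."
        lines.append(f"{rank}. [{doc.score:.3f}] {text}")
    return lines


# Stand-in for result["retriever"]["documents"] from the query Pipeline
hits = [
    SimpleNamespace(
        content="There are over 7,000 languages spoken around the world today.",
        score=0.87717235,
    ),
]

for line in format_hits(hits):
    print(line)
```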