---
title: "CohereDocumentEmbedder"
id: coheredocumentembedder
slug: "/coheredocumentembedder"
description: "This component computes the embeddings of a list of documents and stores the obtained vectors in the embedding field of each document. It uses Cohere embedding models."
---

# CohereDocumentEmbedder

This component computes the embeddings of a list of documents and stores the obtained vectors in the embedding field of each document. It uses Cohere embedding models.

The vectors computed by this component are necessary to perform embedding retrieval on a collection of documents. At retrieval time, the vector that represents the query is compared with those of the documents to find the most similar or relevant documents.
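
One common way to compare these vectors is cosine similarity, which is also the similarity function configured for the `InMemoryDocumentStore` in the pipeline example on this page. For a query vector q and a document vector d, both with n dimensions, it is defined as:

```latex
\text{sim}(\mathbf{q}, \mathbf{d})
  = \frac{\mathbf{q} \cdot \mathbf{d}}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{d} \rVert}
  = \frac{\sum_{i=1}^{n} q_i d_i}{\sqrt{\sum_{i=1}^{n} q_i^{2}} \, \sqrt{\sum_{i=1}^{n} d_i^{2}}}
```

Scores range from -1 to 1, and higher scores mean more similar vectors. An embedding retriever returns the documents with the highest scores. Other similarity functions, such as the dot product, can be used instead.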

<div className="key-value-table">

|  |  |
| --- | --- |
| **Most common position in a pipeline** | Before a [`DocumentWriter`](../writers/documentwriter.mdx) in an indexing pipeline |
| **Mandatory init variables** | `api_key`: The Cohere API key. Can be set with the `COHERE_API_KEY` or `CO_API_KEY` environment variable. |
| **Mandatory run variables** | `documents`: A list of documents to be embedded |
| **Output variables** | `documents`: A list of documents (enriched with embeddings) <br /> <br />`meta`: A dictionary of metadata strings |
| **API reference** | [Cohere](/reference/integrations-cohere) |
| **GitHub link** | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/cohere |

</div>

## Overview

`CohereDocumentEmbedder` enriches each document with an embedding of its content and stores it in the document's `embedding` field. To embed a string, use the [`CohereTextEmbedder`](coheretextembedder.mdx) instead.

The component supports the following Cohere models:
`"embed-english-v3.0"`, `"embed-english-light-v3.0"`, `"embed-multilingual-v3.0"`,
`"embed-multilingual-light-v3.0"`, `"embed-english-v2.0"`, `"embed-english-light-v2.0"`,
`"embed-multilingual-v2.0"`. The default model is `embed-english-v2.0`; you can choose another one with the `model` init parameter. You can find the complete list of Cohere embedding models in Cohere’s [model documentation](https://docs.cohere.com/docs/models#representation).

To start using this integration with Haystack, install it with:

```shell
pip install cohere-haystack
```

The component reads the API key from the `COHERE_API_KEY` or `CO_API_KEY` environment variable by default. Otherwise, you can pass an API key at initialization with `api_key`:

```python
from haystack.utils import Secret
from haystack_integrations.components.embedders.cohere import CohereDocumentEmbedder

embedder = CohereDocumentEmbedder(api_key=Secret.from_token("<your-api-key>"))
```

To get a Cohere API key, head over to https://cohere.com/.
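
The default environment-variable lookup can be pictured with the small sketch below. It assumes, as Haystack's `Secret.from_env_var` does when given a list of names, that the first variable that is set wins. The helper `resolve_cohere_api_key` is hypothetical and is not part of Haystack or the Cohere integration.

```python
import os
from typing import Mapping, Optional


def resolve_cohere_api_key(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Hypothetical helper: return the first Cohere API key found in `env`.

    Checks COHERE_API_KEY before CO_API_KEY and returns None if neither is set.
    """
    for var_name in ("COHERE_API_KEY", "CO_API_KEY"):
        value = env.get(var_name)
        if value is not None:  # the first variable that is set wins
            return value
    return None


# Example: only CO_API_KEY is set, so its value is used.
print(resolve_cohere_api_key({"CO_API_KEY": "key-from-co-var"}))  # key-from-co-var
```

Unlike this sketch, a Haystack environment-variable secret is strict by default, so resolving it fails with an error when none of the variables is set instead of returning `None`.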

### Embedding Metadata

Text documents often come with a set of metadata. If these metadata fields are distinctive and semantically meaningful, you can embed them along with the text of the document to improve retrieval.

To do this, pass the names of the fields to embed in the `meta_fields_to_embed` init parameter:

```python
from haystack import Document
from haystack.utils import Secret
from haystack_integrations.components.embedders.cohere import CohereDocumentEmbedder

doc = Document(content="some text", meta={"title": "relevant title", "page number": 18})

# Embed the "title" metadata field together with the document content
embedder = CohereDocumentEmbedder(
    api_key=Secret.from_token("<your-api-key>"),
    meta_fields_to_embed=["title"],
)

docs_w_embeddings = embedder.run(documents=[doc])["documents"]
```
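
Conceptually, the selected metadata values and the content of each document are joined into a single text before it is sent to the model. The sketch below approximates this under two assumptions: metadata values come first and are joined with a newline (mirroring the `embedding_separator` default used by Haystack embedders), and fields that are missing or set to `None` are skipped. `SimpleDocument` and `prepare_text_to_embed` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for haystack.Document with only the fields used here."""

    content: Optional[str]
    meta: Dict[str, Any] = field(default_factory=dict)


def prepare_text_to_embed(
    doc: SimpleDocument,
    meta_fields_to_embed: List[str],
    embedding_separator: str = "\n",  # assumed default separator
) -> str:
    """Join the selected metadata values and the content into one string."""
    # Keep only requested fields that exist in the metadata and are not None.
    meta_values = [
        str(doc.meta[key])
        for key in meta_fields_to_embed
        if doc.meta.get(key) is not None
    ]
    # Metadata values first, then the content (empty string if there is none).
    return embedding_separator.join(meta_values + [doc.content or ""])


doc = SimpleDocument(content="some text", meta={"title": "relevant title", "page number": 18})
print(repr(prepare_text_to_embed(doc, ["title"])))  # 'relevant title\nsome text'
```

Fields that are not listed, such as `"page number"` here, do not affect the embedded text.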

## Usage

### On its own

Remember to set the `COHERE_API_KEY` (or `CO_API_KEY`) environment variable first, or pass the key directly with the `api_key` parameter.

Here is how you can use the component on its own:

```python
from haystack import Document
from haystack_integrations.components.embedders.cohere.document_embedder import (
    CohereDocumentEmbedder,
)

doc = Document(content="I love pizza!")

embedder = CohereDocumentEmbedder()

result = embedder.run([doc])
print(result["documents"][0].embedding)
# [-0.453125, 1.2236328, 2.0058594, 0.67871094...]
```

### In a pipeline

```python
from haystack import Document, Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.writers import DocumentWriter
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever

from haystack_integrations.components.embedders.cohere.document_embedder import (
    CohereDocumentEmbedder,
)
from haystack_integrations.components.embedders.cohere.text_embedder import (
    CohereTextEmbedder,
)

document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")

documents = [
    Document(content="My name is Wolfgang and I live in Berlin"),
    Document(content="I saw a black horse running"),
    Document(content="Germany has many big cities"),
]

# Indexing: embed the documents and write them to the document store
indexing_pipeline = Pipeline()
indexing_pipeline.add_component("embedder", CohereDocumentEmbedder())
indexing_pipeline.add_component("writer", DocumentWriter(document_store=document_store))
indexing_pipeline.connect("embedder", "writer")

indexing_pipeline.run({"embedder": {"documents": documents}})

# Querying: embed the query and retrieve the most similar documents
query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", CohereTextEmbedder())
query_pipeline.add_component(
    "retriever",
    InMemoryEmbeddingRetriever(document_store=document_store),
)
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

query = "Who lives in Berlin?"

result = query_pipeline.run({"text_embedder": {"text": query}})

print(result["retriever"]["documents"][0])

# Document(id=..., content: 'My name is Wolfgang and I live in Berlin', ...)
```
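
The pipeline example needs a Cohere API key and network access. To follow the same data flow offline, the sketch below replaces the Cohere models with a hypothetical bag-of-words embedder (`toy_embed`) and the document store with a plain Python list. It only illustrates the flow: each document gets a vector in its `embedding` field at indexing time, the query is embedded with the same embedder, and documents are ranked by cosine similarity. It does not reflect how Cohere models compute embeddings.

```python
import math
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyDocument:
    """Hypothetical stand-in for haystack.Document."""

    content: str
    embedding: Optional[List[float]] = None


def tokenize(text: str) -> List[str]:
    # Lowercase and keep only alphabetic runs, so "Berlin?" becomes "berlin".
    return re.findall(r"[a-z]+", text.lower())


def toy_embed(text: str, vocabulary: List[str]) -> List[float]:
    """Hypothetical embedder: word counts over a fixed vocabulary."""
    words = tokenize(text)
    return [float(words.count(term)) for term in vocabulary]


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # treat all-zero vectors as dissimilar


documents = [
    ToyDocument("My name is Wolfgang and I live in Berlin"),
    ToyDocument("I saw a black horse running"),
    ToyDocument("Germany has many big cities"),
]

# Documents and queries must use the same embedder (here, the same vocabulary).
vocabulary = sorted({word for doc in documents for word in tokenize(doc.content)})

# Indexing: store each vector in the document's embedding field.
for doc in documents:
    doc.embedding = toy_embed(doc.content, vocabulary)

# Querying: embed the query, then rank the stored documents by cosine similarity.
query_embedding = toy_embed("Who lives in Berlin?", vocabulary)
ranked = sorted(documents, key=lambda d: cosine(query_embedding, d.embedding), reverse=True)
print(ranked[0].content)  # My name is Wolfgang and I live in Berlin
```

The same reasoning explains why the example pipelines use the default model for both `CohereDocumentEmbedder` and `CohereTextEmbedder`: vectors are only comparable when they come from the same model.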