---
title: "JinaDocumentEmbedder"
id: jinadocumentembedder
slug: "/jinadocumentembedder"
description: "This component computes the embeddings of a list of documents and stores the obtained vectors in the embedding field of each document. It uses Jina AI Embeddings models. The vectors computed by this component are necessary to perform embedding retrieval on a collection of documents. At retrieval time, the vector representing the query is compared with those of the documents to find the most similar or relevant documents."
---

# JinaDocumentEmbedder

This component computes the embeddings of a list of documents and stores the obtained vectors in the embedding field of each document. It uses Jina AI Embeddings models. The vectors computed by this component are necessary to perform embedding retrieval on a collection of documents. At retrieval time, the vector representing the query is compared with those of the documents to find the most similar or relevant documents.

|  |  |
| --- | --- |
| **Most common position in a pipeline** | Before a [`DocumentWriter`](../writers/documentwriter.mdx) in an indexing pipeline |
| **Mandatory init variables** | `api_key`: The Jina API key. Can be set with the `JINA_API_KEY` env var. |
| **Mandatory run variables** | `documents`: A list of documents |
| **Output variables** | `documents`: A list of documents (enriched with embeddings) <br /> <br />`meta`: A dictionary of metadata |
| **API reference** | [Jina](/reference/integrations-jina) |
| **GitHub link** | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/jina |

## Overview

`JinaDocumentEmbedder` enriches documents with an embedding of their content, stored in each document's `embedding` field. To embed a string, use the [`JinaTextEmbedder`](jinatextembedder.mdx) instead. To see the list of compatible Jina Embeddings models, head to Jina AI’s [website](https://jina.ai/embeddings/).
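
To make the retrieval-time comparison mentioned in the introduction more concrete, here is a minimal sketch that ranks documents by cosine similarity to a query vector. It is an illustration only: the three-dimensional vectors are made-up placeholders rather than real Jina embeddings, and `cosine_similarity` and `rank_documents` are hypothetical helpers written for this example, not part of Haystack or the Jina integration. In a real pipeline, a retriever such as `InMemoryEmbeddingRetriever` performs this comparison for you.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (name, score) pairs sorted from most to least similar."""
    scores = {name: cosine_similarity(query_vec, vec) for name, vec in doc_vecs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy vectors for illustration only (assumed values, not model output).
doc_vectors = {
    "doc_berlin": [0.9, 0.1, 0.0],
    "doc_horse": [0.0, 0.2, 0.9],
}
query_vector = [1.0, 0.0, 0.1]

ranking = rank_documents(query_vector, doc_vectors)
print(ranking[0][0])  # name of the document closest to the query
```

The document whose vector points in nearly the same direction as the query vector gets the highest score, which is why the same embedding model should be used for both documents and queries.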
The default model for `JinaDocumentEmbedder` is `jina-embeddings-v2-base-en`.

To start using this integration with Haystack, install the package with:

```shell
pip install jina-haystack
```

The component uses a `JINA_API_KEY` environment variable by default. Otherwise, you can pass an API key at initialization with `api_key`:

```python
from haystack.utils import Secret
embedder = JinaDocumentEmbedder(api_key=Secret.from_token("<your-api-key>"))
```

To get a Jina Embeddings API key, head to https://jina.ai/embeddings/.

### Embedding Metadata

Text documents often come with a set of metadata. If the metadata is distinctive and semantically meaningful, you can embed it along with the text of the document to improve retrieval.

To do this, pass the names of the metadata fields to embed in `meta_fields_to_embed`:

```python
from haystack import Document
from haystack.utils import Secret
from haystack_integrations.components.embedders.jina import JinaDocumentEmbedder

doc = Document(content="some text", meta={"title": "relevant title", "page number": 18})

# Only the "title" field is embedded together with the content.
embedder = JinaDocumentEmbedder(
    api_key=Secret.from_token("<your-api-key>"),
    meta_fields_to_embed=["title"],
)

docs_w_embeddings = embedder.run(documents=[doc])["documents"]
```

## Usage

### On its own

Here is how you can use the component on its own:

```python
from haystack import Document
from haystack.utils import Secret
from haystack_integrations.components.embedders.jina import JinaDocumentEmbedder

doc = Document(content="I love pizza!")

document_embedder = JinaDocumentEmbedder(api_key=Secret.from_token("<your-api-key>"))

result = document_embedder.run([doc])
print(result["documents"][0].embedding)

# [0.017020374536514282, -0.023255806416273117, ...]
```

:::note
We recommend setting `JINA_API_KEY` as an environment variable instead of setting it as a parameter.

:::

### In a pipeline

```python
from haystack import Document, Pipeline
from haystack.utils import Secret
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.embedders.jina import JinaDocumentEmbedder
from haystack_integrations.components.embedders.jina import JinaTextEmbedder
from haystack.components.writers import DocumentWriter
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever

document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")

documents = [
    Document(content="My name is Wolfgang and I live in Berlin"),
    Document(content="I saw a black horse running"),
    Document(content="Germany has many big cities"),
]

# Indexing: embed the documents and write them to the document store.
indexing_pipeline = Pipeline()
indexing_pipeline.add_component(
    "embedder",
    JinaDocumentEmbedder(api_key=Secret.from_token("<your-api-key>")),
)
indexing_pipeline.add_component("writer", DocumentWriter(document_store=document_store))
indexing_pipeline.connect("embedder", "writer")

indexing_pipeline.run({"embedder": {"documents": documents}})

# Querying: embed the query text and retrieve the most similar documents.
query_pipeline = Pipeline()
query_pipeline.add_component(
    "text_embedder",
    JinaTextEmbedder(api_key=Secret.from_token("<your-api-key>")),
)
query_pipeline.add_component(
    "retriever",
    InMemoryEmbeddingRetriever(document_store=document_store),
)
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

query = "Who lives in Berlin?"

result = query_pipeline.run({"text_embedder": {"text": query}})

print(result["retriever"]["documents"][0])

# Document(id=..., mimetype: 'text/plain',
#  text: 'My name is Wolfgang and I live in Berlin')
```

## Additional References

🧑‍🍳 Cookbook: [Using the Jina-embeddings-v2-base-en model in a Haystack RAG pipeline for legal document analysis](https://haystack.deepset.ai/cookbook/jina-embeddings-v2-legal-analysis-rag)