---
title: "WatsonxDocumentEmbedder"
id: watsonxdocumentembedder
slug: "/watsonxdocumentembedder"
description: "The vectors computed by this component are necessary to perform embedding retrieval on a collection of documents. At retrieval time, the vector that represents the query is compared with those of the documents to find the most similar or relevant documents."
---

# WatsonxDocumentEmbedder

The vectors computed by this component are necessary to perform embedding retrieval on a collection of documents. At retrieval time, the vector that represents the query is compared with those of the documents to find the most similar or relevant documents.

<div className="key-value-table">

| | |
| --- | --- |
| **Most common position in a pipeline** | Before a [`DocumentWriter`](../writers/documentwriter.mdx) in an indexing pipeline |
| **Mandatory init variables** | `api_key`: The IBM Cloud API key. Can be set with `WATSONX_API_KEY` env var. <br /> <br />`project_id`: The IBM Cloud project ID. Can be set with `WATSONX_PROJECT_ID` env var. |
| **Mandatory run variables** | `documents`: A list of documents to be embedded |
| **Output variables** | `documents`: A list of documents (enriched with embeddings) <br /> <br />`meta`: A dictionary of metadata strings |
| **API reference** | [Watsonx](/reference/integrations-watsonx) |
| **GitHub link** | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/watsonx |

</div>

## Overview

`WatsonxDocumentEmbedder` computes an embedding of each document's content and stores it in the document's `embedding` field. To embed a string, use the [`WatsonxTextEmbedder`](watsonxtextembedder.mdx) instead.

The component supports IBM watsonx.ai embedding models such as `ibm/slate-30m-english-rtrvr` and similar. The default model is `ibm/slate-30m-english-rtrvr`.
The list of all supported models can be found in IBM's [model documentation](https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/fm-models-embed.html?context=wx).

To start using this integration with Haystack, install it with:

```shell
pip install watsonx-haystack
```

The component uses `WATSONX_API_KEY` and `WATSONX_PROJECT_ID` environment variables by default. Otherwise, you can pass API credentials at initialization with `api_key` and `project_id`:

```python
from haystack.utils import Secret
from haystack_integrations.components.embedders.watsonx.document_embedder import WatsonxDocumentEmbedder

embedder = WatsonxDocumentEmbedder(
    api_key=Secret.from_token("<your-api-key>"),
    project_id=Secret.from_token("<your-project-id>"),
)
```

To get IBM Cloud credentials, head over to https://cloud.ibm.com/.

### Embedding Metadata

Text documents often come with metadata. If the metadata fields are distinctive and semantically meaningful, you can embed them along with the text of the document to improve retrieval.

You can do this by passing `meta_fields_to_embed` to the Document Embedder:

```python
from haystack import Document
from haystack_integrations.components.embedders.watsonx.document_embedder import (
    WatsonxDocumentEmbedder,
)
from haystack.utils import Secret

doc = Document(content="some text", meta={"title": "relevant title", "page number": 18})

embedder = WatsonxDocumentEmbedder(
    api_key=Secret.from_env_var("WATSONX_API_KEY"),
    project_id=Secret.from_env_var("WATSONX_PROJECT_ID"),
    meta_fields_to_embed=["title"],
)

docs_w_embeddings = embedder.run(documents=[doc])["documents"]
```

## Usage

Install the `watsonx-haystack` package to use the `WatsonxDocumentEmbedder`:

```shell
pip install watsonx-haystack
```

### On its own

Remember to set `WATSONX_API_KEY` and `WATSONX_PROJECT_ID` as environment variables first, or pass them in directly.
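If you rely on the environment variables, it can help to check that both are present before constructing the component. The snippet below is a minimal sketch using only the standard library; `missing_watsonx_vars` is a hypothetical helper written for this example and is not part of the integration.

```python
import os

# The two variables the component reads by default, as named on this page.
REQUIRED_VARS = ("WATSONX_API_KEY", "WATSONX_PROJECT_ID")


def missing_watsonx_vars(environ=os.environ):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper for illustration only, not part of watsonx-haystack.
    """
    return [name for name in REQUIRED_VARS if not environ.get(name)]


missing = missing_watsonx_vars()
if missing:
    print("Set these environment variables first: " + ", ".join(missing))
```

Passing a plain dictionary as `environ` makes the helper easy to try out without touching your real environment.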

Here is how you can use the component on its own:

```python
from haystack import Document
from haystack_integrations.components.embedders.watsonx.document_embedder import (
    WatsonxDocumentEmbedder,
)

doc = Document(content="I love pizza!")

# Reads WATSONX_API_KEY and WATSONX_PROJECT_ID from the environment
embedder = WatsonxDocumentEmbedder()

result = embedder.run(documents=[doc])
print(result["documents"][0].embedding)
# [-0.453125, 1.2236328, 2.0058594, 0.67871094...]
```

### In a pipeline

```python
from haystack import Document, Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.writers import DocumentWriter
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever

from haystack_integrations.components.embedders.watsonx.document_embedder import (
    WatsonxDocumentEmbedder,
)
from haystack_integrations.components.embedders.watsonx.text_embedder import (
    WatsonxTextEmbedder,
)

document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")

documents = [
    Document(content="My name is Wolfgang and I live in Berlin"),
    Document(content="I saw a black horse running"),
    Document(content="Germany has many big cities"),
]

# Indexing: embed the documents and write them to the document store
indexing_pipeline = Pipeline()
indexing_pipeline.add_component("embedder", WatsonxDocumentEmbedder())
indexing_pipeline.add_component("writer", DocumentWriter(document_store=document_store))
indexing_pipeline.connect("embedder", "writer")

indexing_pipeline.run({"embedder": {"documents": documents}})

# Querying: embed the query text and retrieve the most similar documents
query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", WatsonxTextEmbedder())
query_pipeline.add_component(
    "retriever",
    InMemoryEmbeddingRetriever(document_store=document_store),
)
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

query = "Who lives in Berlin?"

result = query_pipeline.run({"text_embedder": {"text": query}})

print(result["retriever"]["documents"][0])

# Document(id=..., text: 'My name is Wolfgang and I live in Berlin')
```
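
At retrieval time, the query vector is compared with each stored document vector. As a rough illustration of that comparison, the following pure-Python sketch ranks a few toy vectors by cosine similarity. The vectors are made-up values rather than real watsonx.ai embeddings, and `cosine_similarity` and `rank_by_similarity` are helpers written for this example, not Haystack API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional "embeddings", invented for this sketch.
toy_docs = {
    "berlin": [0.9, 0.1, 0.0],
    "horse": [0.0, 0.2, 0.9],
    "cities": [0.6, 0.6, 0.1],
}
toy_query = [1.0, 0.2, 0.0]

ranking = rank_by_similarity(toy_query, toy_docs)
print(ranking[0][0])
# berlin
```

In the pipeline above, this scoring is handled by the document store and retriever, so you do not need to implement it yourself.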