---
title: "MistralDocumentEmbedder"
id: mistraldocumentembedder
slug: "/mistraldocumentembedder"
description: "This component computes the embeddings of a list of documents using the Mistral API and models."
---

# MistralDocumentEmbedder

This component computes the embeddings of a list of documents using the Mistral API and models.

<div className="key-value-table">

|  |  |
| --- | --- |
| **Most common position in a pipeline** | Before a [`DocumentWriter`](../writers/documentwriter.mdx) in an indexing pipeline |
| **Mandatory init variables** | `api_key`: The Mistral API key. Can be set with `MISTRAL_API_KEY` env var. |
| **Mandatory run variables** | `documents`: A list of documents to be embedded |
| **Output variables** | `documents`: A list of documents (enriched with embeddings) <br /> <br />`meta`: A dictionary of metadata strings |
| **API reference** | [Mistral](/reference/integrations-mistral) |
| **GitHub link** | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/mistral |

</div>

This component should be used to embed a list of Documents. To embed a string, use the [`MistralTextEmbedder`](mistraltextembedder.mdx).

## Overview

`MistralDocumentEmbedder` computes the embeddings of a list of documents and stores the obtained vectors in the `embedding` field of each document. It uses the Mistral API and its embedding models.

The component currently supports the `mistral-embed` embedding model. The list of all supported models can be found in Mistral’s [embedding models documentation](https://docs.mistral.ai/platform/endpoints/#embedding-models).

To start using this integration with Haystack, install it with:

```shell
pip install mistral-haystack
```

`MistralDocumentEmbedder` needs a Mistral API key to work. It uses a `MISTRAL_API_KEY` environment variable by default.
Otherwise, you can pass an API key at initialization with `api_key`:

```python
from haystack.utils import Secret
from haystack_integrations.components.embedders.mistral.document_embedder import MistralDocumentEmbedder

embedder = MistralDocumentEmbedder(
    api_key=Secret.from_token("<your-api-key>"),
    model="mistral-embed",
)
```

## Usage

### On its own

First, set the `MISTRAL_API_KEY` environment variable or pass the key in directly.

Here is how you can use the component on its own:

```python
from haystack import Document
from haystack.utils import Secret
from haystack_integrations.components.embedders.mistral.document_embedder import (
    MistralDocumentEmbedder,
)

doc = Document(content="I love pizza!")

embedder = MistralDocumentEmbedder(
    api_key=Secret.from_token("<your-api-key>"),
    model="mistral-embed",
)

# run() returns a dictionary; the embedded documents are under "documents"
result = embedder.run([doc])
print(result["documents"][0].embedding)
# [-0.453125, 1.2236328, 2.0058594, 0.67871094...]
```

### In a pipeline

Below is an example of the `MistralDocumentEmbedder` in an indexing pipeline. It indexes the contents of a webpage into an `InMemoryDocumentStore`.

```python
from haystack import Pipeline
from haystack.components.converters import HTMLToDocument
from haystack.components.fetchers import LinkContentFetcher
from haystack.components.preprocessors import DocumentSplitter
from haystack.components.writers import DocumentWriter
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.embedders.mistral.document_embedder import (
    MistralDocumentEmbedder,
)

document_store = InMemoryDocumentStore()
fetcher = LinkContentFetcher()
converter = HTMLToDocument()
chunker = DocumentSplitter()
# No api_key is passed here, so the MISTRAL_API_KEY environment variable is used
embedder = MistralDocumentEmbedder()
writer = DocumentWriter(document_store=document_store)

indexing = Pipeline()

indexing.add_component(name="fetcher", instance=fetcher)
indexing.add_component(name="converter", instance=converter)
indexing.add_component(name="chunker", instance=chunker)
indexing.add_component(name="embedder", instance=embedder)
indexing.add_component(name="writer", instance=writer)

# Data flow: fetch the page, convert HTML to documents, split, embed, then write
indexing.connect("fetcher", "converter")
indexing.connect("converter", "chunker")
indexing.connect("chunker", "embedder")
indexing.connect("embedder", "writer")

indexing.run(data={"fetcher": {"urls": ["https://mistral.ai/news/la-plateforme/"]}})
```
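
As a conceptual recap of the embedder step, the sketch below mimics what the Overview describes: each document's content is turned into a vector, and that vector is stored in the document's `embedding` field. This is not the component's actual implementation. `SimpleDocument`, `fake_embed`, and `embed_documents` are hypothetical names introduced for illustration, `fake_embed` stands in for the real Mistral API call, and the batching shown here is an assumption of the sketch. It runs without Haystack or an API key.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for Haystack's Document (illustration only)."""

    content: str
    embedding: Optional[List[float]] = None


def fake_embed(texts: List[str]) -> List[List[float]]:
    """Hypothetical placeholder for the Mistral embeddings API call.

    Returns a tiny deterministic 2-dimensional vector per text.
    Real embedding vectors are far larger and semantically meaningful.
    """
    return [[float(len(t)), float(sum(map(ord, t)) % 97)] for t in texts]


def embed_documents(
    documents: List[SimpleDocument],
    embed_fn: Callable[[List[str]], List[List[float]]] = fake_embed,
    batch_size: int = 32,  # assumed value for this sketch
) -> Dict[str, List[SimpleDocument]]:
    """Embed documents batch by batch and store each vector on its document."""
    for start in range(0, len(documents), batch_size):
        batch = documents[start : start + batch_size]
        vectors = embed_fn([doc.content for doc in batch])
        for doc, vector in zip(batch, vectors):
            doc.embedding = vector
    # Mirror the shape of the component's output: a dict with a "documents" key
    return {"documents": documents}


docs = [SimpleDocument("I love pizza!"), SimpleDocument("Mistral embeds text.")]
result = embed_documents(docs, batch_size=1)
print(result["documents"][0].embedding)
```

The input documents are enriched in place and returned under the `documents` key, which is why a downstream `DocumentWriter` can store them together with their vectors.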