---
title: "OpenAITextEmbedder"
id: openaitextembedder
slug: "/openaitextembedder"
description: "OpenAITextEmbedder transforms a string into a vector that captures its semantics using an OpenAI embedding model."
---

# OpenAITextEmbedder

OpenAITextEmbedder transforms a string into a vector that captures its semantics using an OpenAI embedding model.

When you perform embedding retrieval, you use this component to transform your query into a vector. The embedding Retriever then compares that vector against the stored document embeddings to find similar or relevant documents.

<div className="key-value-table">

|  |  |
| --- | --- |
| **Most common position in a pipeline** | Before an embedding [Retriever](../retrievers.mdx) in a query/RAG pipeline |
| **Mandatory init variables** | `api_key`: An OpenAI API key. Can be set with `OPENAI_API_KEY` env var. |
| **Mandatory run variables** | `text`: A string |
| **Output variables** | `embedding`: A list of float numbers <br /> <br />`meta`: A dictionary of metadata |
| **API reference** | [Embedders](/reference/embedders-api) |
| **GitHub link** | https://github.com/deepset-ai/haystack/blob/main/haystack/components/embedders/openai_text_embedder.py |

</div>

## Overview

For the list of compatible OpenAI embedding models, see the OpenAI [documentation](https://platform.openai.com/docs/guides/embeddings/embedding-models). The default model for `OpenAITextEmbedder` is `text-embedding-ada-002`. You can specify another model with the `model` parameter when initializing this component.

Use `OpenAITextEmbedder` to embed a single string (such as a query) into a vector. To embed lists of documents, use the [OpenAIDocumentEmbedder](openaidocumentembedder.mdx), which enriches each document with its computed embedding (vector).

The component reads the API key from the `OPENAI_API_KEY` environment variable by default. Otherwise, you can pass an API key at initialization with `api_key`, wrapped in a `Secret` (imported from `haystack.utils`):

```python
embedder = OpenAITextEmbedder(api_key=Secret.from_token("<your-api-key>"))
```

## Usage

### On its own

Here is how you can use the component on its own:

```python
from haystack.components.embedders import OpenAITextEmbedder
from haystack.utils import Secret

text_to_embed = "I love pizza!"

text_embedder = OpenAITextEmbedder(api_key=Secret.from_token("<your-api-key>"))

print(text_embedder.run(text_to_embed))

# {'embedding': [0.017020374536514282, -0.023255806416273117, ...],
# 'meta': {'model': 'text-embedding-ada-002-v2',
#          'usage': {'prompt_tokens': 4, 'total_tokens': 4}}}
```

:::info
We recommend setting `OPENAI_API_KEY` as an environment variable instead of passing the key as a parameter.
:::

### In a pipeline

```python
from haystack import Document, Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.embedders import OpenAITextEmbedder, OpenAIDocumentEmbedder
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever

document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")

documents = [
    Document(content="My name is Wolfgang and I live in Berlin"),
    Document(content="I saw a black horse running"),
    Document(content="Germany has many big cities"),
]

# Embed the documents and write them to the document store.
document_embedder = OpenAIDocumentEmbedder()
documents_with_embeddings = document_embedder.run(documents)["documents"]
document_store.write_documents(documents_with_embeddings)

# Build the query pipeline: embed the query, then retrieve by embedding.
query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", OpenAITextEmbedder())
query_pipeline.add_component(
    "retriever",
    InMemoryEmbeddingRetriever(document_store=document_store),
)
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

query = "Who lives in Berlin?"

result = query_pipeline.run({"text_embedder": {"text": query}})

print(result["retriever"]["documents"][0])

# Document(id=..., mimetype: 'text/plain',
#  text: 'My name is Wolfgang and I live in Berlin')
```
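
To make the retrieval step in the pipeline example more concrete, the following is a minimal, library-free sketch of what ranking by cosine similarity amounts to once the query has been turned into a vector. It is an illustration only, not Haystack's implementation: the helper names `cosine_similarity` and `rank_by_similarity` are hypothetical, and the 3-dimensional vectors are made-up stand-ins for real OpenAI embeddings, which have far more dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query_embedding: list[float],
    doc_embeddings: dict[str, list[float]],
) -> list[tuple[str, float]]:
    """Score every document against the query and sort from most to least similar."""
    scored = [
        (doc_id, cosine_similarity(query_embedding, embedding))
        for doc_id, embedding in doc_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up toy vectors (assumption): real embeddings come from the embedder components.
toy_doc_embeddings = {
    "wolfgang_berlin": [0.9, 0.1, 0.2],
    "black_horse": [0.1, 0.9, 0.1],
    "german_cities": [0.6, 0.2, 0.7],
}
toy_query_embedding = [0.8, 0.0, 0.3]

ranking = rank_by_similarity(toy_query_embedding, toy_doc_embeddings)
print(ranking[0][0])  # wolfgang_berlin
```

With these toy values, the document whose vector points in nearly the same direction as the query vector ranks first. This is why the query and the documents should be embedded with the same model: the comparison is only meaningful when both vectors live in the same embedding space.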