---
title: "Together AI"
id: integrations-togetherai
description: "Together AI integration for Haystack"
slug: "/integrations-togetherai"
---

<a id="haystack_integrations.components.generators.togetherai.chat.chat_generator"></a>

## Module haystack\_integrations.components.generators.togetherai.chat.chat\_generator

<a id="haystack_integrations.components.generators.togetherai.chat.chat_generator.TogetherAIChatGenerator"></a>

### TogetherAIChatGenerator

Enables text generation using Together AI generative models.
For supported models, see the [Together AI docs](https://docs.together.ai/docs).

Users can pass any text generation parameters valid for the Together AI chat completion API
directly to this component through the `generation_kwargs` parameter in `__init__` or in the
`run` method.

Key Features and Compatibility:
- **Primary Compatibility**: Designed to work seamlessly with the Together AI chat completion endpoint.
- **Streaming Support**: Supports streaming responses from the Together AI chat completion endpoint.
- **Customizability**: Supports all parameters accepted by the Together AI chat completion endpoint.

This component uses the ChatMessage format for structuring both input and output,
ensuring coherent and contextually relevant responses in chat-based text generation scenarios.
Details on the ChatMessage format can be found in the
[Haystack docs](https://docs.haystack.deepset.ai/docs/chatmessage).

For more details on the parameters supported by the Together AI API, refer to the
[Together AI API docs](https://docs.together.ai/reference/chat-completions-1).
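Streaming responses are delivered through the `streaming_callback`, which is invoked once per streamed chunk. The sketch below illustrates only the callback contract; `FakeStreamingChunk` is a stand-in (an assumption, modeling just the `.content` attribute of Haystack's `StreamingChunk`) so the snippet runs without the `haystack` package or an API key:

```python
# Stand-in for haystack.dataclasses.StreamingChunk: only the `.content`
# attribute is modeled here; the real class carries more metadata.
class FakeStreamingChunk:
    def __init__(self, content: str):
        self.content = content

collected: list[str] = []

def on_chunk(chunk) -> None:
    # Called once per streamed token; accumulate the text as it arrives.
    collected.append(chunk.content)

# Simulate a stream of three tokens.
for token in ["Hello", ", ", "world"]:
    on_chunk(FakeStreamingChunk(token))

print("".join(collected))  # Hello, world
```

With the real component, the same callback would be passed as `TogetherAIChatGenerator(streaming_callback=on_chunk)`.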
Usage example:
```python
from haystack_integrations.components.generators.togetherai import TogetherAIChatGenerator
from haystack.dataclasses import ChatMessage

messages = [ChatMessage.from_user("What's Natural Language Processing?")]

client = TogetherAIChatGenerator()
response = client.run(messages)
print(response)

>>{'replies': [ChatMessage(_content='Natural Language Processing (NLP) is a branch of artificial intelligence
>>that focuses on enabling computers to understand, interpret, and generate human language in a way that is
>>meaningful and useful.', _role=<ChatRole.ASSISTANT: 'assistant'>, _name=None,
>>_meta={'model': 'meta-llama/Llama-3.3-70B-Instruct-Turbo', 'index': 0, 'finish_reason': 'stop',
>>'usage': {'prompt_tokens': 15, 'completion_tokens': 36, 'total_tokens': 51}})]}
```

<a id="haystack_integrations.components.generators.togetherai.chat.chat_generator.TogetherAIChatGenerator.__init__"></a>

#### TogetherAIChatGenerator.\_\_init\_\_

```python
def __init__(*,
             api_key: Secret = Secret.from_env_var("TOGETHER_API_KEY"),
             model: str = "meta-llama/Llama-3.3-70B-Instruct-Turbo",
             streaming_callback: StreamingCallbackT | None = None,
             api_base_url: str | None = "https://api.together.xyz/v1",
             generation_kwargs: dict[str, Any] | None = None,
             tools: ToolsType | None = None,
             timeout: float | None = None,
             max_retries: int | None = None,
             http_client_kwargs: dict[str, Any] | None = None)
```

Creates an instance of TogetherAIChatGenerator. Unless specified otherwise,
the default model is `meta-llama/Llama-3.3-70B-Instruct-Turbo`.

**Arguments**:

- `api_key`: The Together API key.
- `model`: The name of the Together AI chat completion model to use.
- `streaming_callback`: A callback function that is called when a new token is received from the stream.
The callback function accepts StreamingChunk as an argument.
- `api_base_url`: The Together AI API base URL.
For more details, see the Together AI [docs](https://docs.together.ai/docs/openai-api-compatibility).
- `generation_kwargs`: Other parameters to use for the model. These parameters are all sent directly to
the Together AI endpoint. See the [Together AI API docs](https://docs.together.ai/reference/chat-completions-1)
for more details.
Some of the supported parameters:
  - `max_tokens`: The maximum number of tokens the output text can have.
  - `temperature`: What sampling temperature to use. Higher values mean the model will take more risks.
Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
  - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens
comprising the top 10% probability mass are considered.
  - `stream`: Whether to stream back partial progress. If set, tokens will be sent as data-only server-sent
events as they become available, with the stream terminated by a `data: [DONE]` message.
  - `safe_prompt`: Whether to inject a safety prompt before all conversations.
  - `random_seed`: The seed to use for random sampling.
  - `response_format`: A JSON schema or a Pydantic model that enforces the structure of the model's response.
If provided, the output will always be validated against this format (unless the model returns a tool call).
For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
Note: for structured outputs with streaming, the `response_format` must be a JSON schema, not a Pydantic model.
- `tools`: A list of Tool and/or Toolset objects, or a single Toolset, for which the model can prepare calls.
Each tool should have a unique name.
- `timeout`: The timeout for the Together AI API call.
- `max_retries`: Maximum number of retries to contact Together AI after an internal error.
If not set, it is inferred from the `OPENAI_MAX_RETRIES` environment variable or defaults to 5.
- `http_client_kwargs`: A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/).

<a id="haystack_integrations.components.generators.togetherai.chat.chat_generator.TogetherAIChatGenerator.to_dict"></a>

#### TogetherAIChatGenerator.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns**:

The serialized component as a dictionary.

<a id="haystack_integrations.components.generators.togetherai.generator"></a>

## Module haystack\_integrations.components.generators.togetherai.generator

<a id="haystack_integrations.components.generators.togetherai.generator.TogetherAIGenerator"></a>

### TogetherAIGenerator

Provides an interface to generate text using an LLM running on Together AI.
Usage example:
```python
from haystack_integrations.components.generators.togetherai import TogetherAIGenerator

generator = TogetherAIGenerator(
    model="deepseek-ai/DeepSeek-R1",
    generation_kwargs={"temperature": 0.9},
)

print(generator.run(prompt="Who is the best Italian actor?"))
```

<a id="haystack_integrations.components.generators.togetherai.generator.TogetherAIGenerator.__init__"></a>

#### TogetherAIGenerator.\_\_init\_\_

```python
def __init__(api_key: Secret = Secret.from_env_var("TOGETHER_API_KEY"),
             model: str = "meta-llama/Llama-3.3-70B-Instruct-Turbo",
             api_base_url: str | None = "https://api.together.xyz/v1",
             streaming_callback: StreamingCallbackT | None = None,
             system_prompt: str | None = None,
             generation_kwargs: dict[str, Any] | None = None,
             timeout: float | None = None,
             max_retries: int | None = None)
```

Initialize the TogetherAIGenerator.

**Arguments**:

- `api_key`: The Together API key.
- `model`: The name of the model to use.
- `api_base_url`: The base URL of the Together AI API.
- `streaming_callback`: A callback function that is called when a new token is received from the stream.
The callback function accepts StreamingChunk as an argument.
- `system_prompt`: The system prompt to use for text generation. If not provided, the system prompt is
omitted, and the default system prompt of the model is used.
- `generation_kwargs`: Other parameters to use for the model. These parameters are all sent directly to
the Together AI endpoint. See the Together AI
[documentation](https://docs.together.ai/reference/chat-completions-1) for more details.
Some of the supported parameters:
  - `max_tokens`: The maximum number of tokens the output text can have.
  - `temperature`: What sampling temperature to use. Higher values mean the model will take more risks.
Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
  - `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
considers the results of the tokens with top_p probability mass. So, 0.1 means only the tokens
comprising the top 10% probability mass are considered.
  - `n`: How many completions to generate for each prompt. For example, if the LLM gets 3 prompts and n is 2,
it will generate two completions for each of the three prompts, ending up with 6 completions in total.
  - `stop`: One or more sequences after which the LLM should stop generating tokens.
  - `presence_penalty`: The penalty to apply if a token is already present in the text. Higher values make
the model less likely to repeat the same token.
  - `frequency_penalty`: The penalty to apply if a token has already been generated in the text. Higher
values make the model less likely to repeat the same token.
  - `logit_bias`: Adds a logit bias to specific tokens. The keys of the dictionary are tokens, and the
values are the bias to add to each token.
- `timeout`: Timeout for Together AI client calls. If not set, it is inferred from the `OPENAI_TIMEOUT`
environment variable or defaults to 30.
- `max_retries`: Maximum number of retries to contact Together AI after an internal error. If not set, it is
inferred from the `OPENAI_MAX_RETRIES` environment variable or defaults to 5.

<a id="haystack_integrations.components.generators.togetherai.generator.TogetherAIGenerator.to_dict"></a>

#### TogetherAIGenerator.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns**:

The serialized component as a dictionary.
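Haystack components typically serialize to a dictionary with a `type` import path and an `init_parameters` mapping. The sketch below assumes that layout; the exact keys produced by `to_dict` may differ (secrets, for instance, are normally stored as environment-variable references rather than values), so verify against a real component before relying on specific fields:

```python
# Assumed shape of the to_dict() output, following the usual Haystack
# component-serialization layout (type path + init_parameters).
serialized = {
    "type": "haystack_integrations.components.generators.togetherai.generator.TogetherAIGenerator",
    "init_parameters": {
        "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo",
        "api_base_url": "https://api.together.xyz/v1",
        "generation_kwargs": {"temperature": 0.9},
    },
}

# The class name is the last segment of the dotted type path.
print(serialized["type"].rsplit(".", 1)[-1])  # TogetherAIGenerator
```

A dictionary of this shape is what `from_dict` consumes to rebuild the component.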
<a id="haystack_integrations.components.generators.togetherai.generator.TogetherAIGenerator.from_dict"></a>

#### TogetherAIGenerator.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "TogetherAIGenerator"
```

Deserialize this component from a dictionary.

**Arguments**:

- `data`: The dictionary representation of this component.

**Returns**:

The deserialized component instance.

<a id="haystack_integrations.components.generators.togetherai.generator.TogetherAIGenerator.run"></a>

#### TogetherAIGenerator.run

```python
@component.output_types(replies=list[str], meta=list[dict[str, Any]])
def run(*,
        prompt: str,
        system_prompt: str | None = None,
        streaming_callback: StreamingCallbackT | None = None,
        generation_kwargs: dict[str, Any] | None = None) -> dict[str, Any]
```

Generate text completions synchronously.

**Arguments**:

- `prompt`: The input prompt string for text generation.
- `system_prompt`: An optional system prompt to provide context or instructions for the generation.
If not provided, the system prompt set in the `__init__` method is used.
- `streaming_callback`: A callback function that is called when a new token is received from the stream.
If provided, it overrides the `streaming_callback` set in the `__init__` method.
- `generation_kwargs`: Additional keyword arguments for text generation. These parameters override those
passed in the `__init__` method. Supported parameters include `temperature`, `max_tokens`, `top_p`, and others.

**Returns**:

A dictionary with the following keys:
- `replies`: A list of generated text completions as strings.
- `meta`: A list of metadata dictionaries containing information about each generation,
including model name, finish reason, and token usage statistics.
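The precedence described above, where per-call `generation_kwargs` override the values given at `__init__` time, behaves like a plain dictionary merge. The snippet below illustrates the documented semantics only; it is not the component's actual implementation:

```python
# Init-time defaults and a per-call override, merged so that run() wins.
init_kwargs = {"temperature": 0.9, "max_tokens": 256}
run_kwargs = {"temperature": 0.2}  # passed to run()

# Later keys win, so per-call values take precedence over init-time ones.
merged = {**init_kwargs, **run_kwargs}
print(merged)  # {'temperature': 0.2, 'max_tokens': 256}
```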
<a id="haystack_integrations.components.generators.togetherai.generator.TogetherAIGenerator.run_async"></a>

#### TogetherAIGenerator.run\_async

```python
@component.output_types(replies=list[str], meta=list[dict[str, Any]])
async def run_async(
        *,
        prompt: str,
        system_prompt: str | None = None,
        streaming_callback: StreamingCallbackT | None = None,
        generation_kwargs: dict[str, Any] | None = None) -> dict[str, Any]
```

Generate text completions asynchronously.

**Arguments**:

- `prompt`: The input prompt string for text generation.
- `system_prompt`: An optional system prompt to provide context or instructions for the generation.
If not provided, the system prompt set in the `__init__` method is used.
- `streaming_callback`: A callback function that is called when a new token is received from the stream.
If provided, it overrides the `streaming_callback` set in the `__init__` method.
- `generation_kwargs`: Additional keyword arguments for text generation. These parameters override those
passed in the `__init__` method. Supported parameters include `temperature`, `max_tokens`, `top_p`, and others.

**Returns**:

A dictionary with the following keys:
- `replies`: A list of generated text completions as strings.
- `meta`: A list of metadata dictionaries containing information about each generation,
including model name, finish reason, and token usage statistics.
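`run_async` makes it straightforward to issue several generations concurrently with `asyncio.gather`. The real component requires a `TOGETHER_API_KEY`; the stand-in coroutine below (an assumption, mirroring only the output shape of `run_async`) lets the concurrency pattern run offline:

```python
import asyncio

# Stand-in for TogetherAIGenerator.run_async: returns the same output shape
# ({'replies': [...], 'meta': [...]}) without any network access.
async def fake_run_async(prompt: str) -> dict:
    await asyncio.sleep(0)  # stands in for the HTTP round trip
    return {"replies": [f"echo: {prompt}"], "meta": [{"model": "stub"}]}

async def main() -> list[dict]:
    prompts = ["What is NLP?", "What is RAG?"]
    # Fire both requests concurrently and wait for all results.
    return await asyncio.gather(*(fake_run_async(p) for p in prompts))

results = asyncio.run(main())
print([r["replies"][0] for r in results])  # ['echo: What is NLP?', 'echo: What is RAG?']
```

With the real generator, `fake_run_async(p)` would be replaced by `generator.run_async(prompt=p)`.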