---
title: "Together AI"
id: integrations-togetherai
description: "Together AI integration for Haystack"
slug: "/integrations-togetherai"
---

<a id="haystack_integrations.components.generators.togetherai.chat.chat_generator"></a>

## Module haystack\_integrations.components.generators.togetherai.chat.chat\_generator

<a id="haystack_integrations.components.generators.togetherai.chat.chat_generator.TogetherAIChatGenerator"></a>

### TogetherAIChatGenerator

Enables text generation using Together AI generative models.
For supported models, see [Together AI docs](https://docs.together.ai/docs).

Users can pass any text generation parameters valid for the Together AI chat completion API
directly to this component, using the `generation_kwargs` parameter in `__init__` or in
the `run` method.
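Run-time `generation_kwargs` take precedence over those set in `__init__`. Conceptually, the two dictionaries are merged like this (a plain-Python sketch of the override semantics with illustrative values, not the component's actual code):

```python
# Parameters set once at construction time (illustrative values).
init_kwargs = {"max_tokens": 512, "temperature": 0.7}

# Parameters passed to a single `run` call.
run_kwargs = {"temperature": 0.2}

# Later keys win, so run-time values override init-time values.
effective = {**init_kwargs, **run_kwargs}
print(effective)  # {'max_tokens': 512, 'temperature': 0.2}
```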

Key Features and Compatibility:
- **Primary Compatibility**: Designed to work seamlessly with the Together AI chat completion endpoint.
- **Streaming Support**: Supports streaming responses from the Together AI chat completion endpoint.
- **Customizability**: Supports all parameters supported by the Together AI chat completion endpoint.

This component uses the ChatMessage format for structuring both input and output,
ensuring coherent and contextually relevant responses in chat-based text generation scenarios.
Details on the ChatMessage format can be found in the
[Haystack docs](https://docs.haystack.deepset.ai/docs/chatmessage).

For more details on the parameters supported by the Together AI API, refer to the
[Together AI API Docs](https://docs.together.ai/reference/chat-completions-1).

Usage example:
```python
from haystack_integrations.components.generators.togetherai import TogetherAIChatGenerator
from haystack.dataclasses import ChatMessage

messages = [ChatMessage.from_user("What's Natural Language Processing?")]

client = TogetherAIChatGenerator()
response = client.run(messages)
print(response)

>>{'replies': [ChatMessage(_content='Natural Language Processing (NLP) is a branch of artificial intelligence
>>that focuses on enabling computers to understand, interpret, and generate human language in a way that is
>>meaningful and useful.', _role=<ChatRole.ASSISTANT: 'assistant'>, _name=None,
>>_meta={'model': 'meta-llama/Llama-3.3-70B-Instruct-Turbo', 'index': 0, 'finish_reason': 'stop',
>>'usage': {'prompt_tokens': 15, 'completion_tokens': 36, 'total_tokens': 51}})]}
```
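For streaming, pass a callback that consumes each `StreamingChunk` as it arrives. A minimal sketch of such a callback (the `content` attribute access follows the Haystack `StreamingChunk` dataclass; actually running the generator requires a valid `TOGETHER_API_KEY`):

```python
def print_streaming_chunk(chunk) -> None:
    """Print each streamed token as soon as it arrives."""
    # StreamingChunk carries the newly generated text in its `content` field.
    print(chunk.content, end="", flush=True)

# The callback would then be wired in at construction time, e.g.:
# client = TogetherAIChatGenerator(streaming_callback=print_streaming_chunk)
```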

<a id="haystack_integrations.components.generators.togetherai.chat.chat_generator.TogetherAIChatGenerator.__init__"></a>

#### TogetherAIChatGenerator.\_\_init\_\_

```python
def __init__(*,
             api_key: Secret = Secret.from_env_var("TOGETHER_API_KEY"),
             model: str = "meta-llama/Llama-3.3-70B-Instruct-Turbo",
             streaming_callback: StreamingCallbackT | None = None,
             api_base_url: str | None = "https://api.together.xyz/v1",
             generation_kwargs: dict[str, Any] | None = None,
             tools: ToolsType | None = None,
             timeout: float | None = None,
             max_retries: int | None = None,
             http_client_kwargs: dict[str, Any] | None = None)
```

Creates an instance of TogetherAIChatGenerator. Unless specified otherwise,
the default model is `meta-llama/Llama-3.3-70B-Instruct-Turbo`.

**Arguments**:

- `api_key`: The Together API key.
- `model`: The name of the Together AI chat completion model to use.
- `streaming_callback`: A callback function that is called when a new token is received from the stream.
The callback function accepts StreamingChunk as an argument.
- `api_base_url`: The Together AI API base URL.
For more details, see the Together AI [docs](https://docs.together.ai/docs/openai-api-compatibility).
- `generation_kwargs`: Other parameters to use for the model. These parameters are all sent directly to
the Together AI endpoint. See the [Together AI API docs](https://docs.together.ai/reference/chat-completions-1)
for more details.
Some of the supported parameters:
- `max_tokens`: The maximum number of tokens the output text can have.
- `temperature`: What sampling temperature to use. Higher values mean the model will take more risks.
    Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
- `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
    considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens
    comprising the top 10% probability mass are considered.
- `stream`: Whether to stream back partial progress. If set, tokens will be sent as data-only server-sent
    events as they become available, with the stream terminated by a `data: [DONE]` message.
- `safe_prompt`: Whether to inject a safety prompt before all conversations.
- `random_seed`: The seed to use for random sampling.
- `response_format`: A JSON schema or a Pydantic model that enforces the structure of the model's response.
    If provided, the output will always be validated against this
    format (unless the model returns a tool call).
    For details, see the [OpenAI Structured Outputs documentation](https://platform.openai.com/docs/guides/structured-outputs).
    Notes:
    - For structured outputs with streaming,
      the `response_format` must be a JSON schema and not a Pydantic model.
- `tools`: A list of Tool and/or Toolset objects, or a single Toolset, for which the model can prepare calls.
Each tool should have a unique name.
- `timeout`: The timeout for the Together AI API call.
- `max_retries`: Maximum number of retries to contact Together AI after an internal error.
If not set, it is inferred from the `OPENAI_MAX_RETRIES` environment variable or defaults to 5.
- `http_client_kwargs`: A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/).
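To make `response_format` concrete, here is a hypothetical JSON-schema payload that could be passed through `generation_kwargs`. The wrapper shape follows the OpenAI structured-outputs convention linked above; the field names inside the schema are illustrative, not taken from the Together AI docs:

```python
# Hypothetical schema enforcing a {"name": ..., "age": ...} reply.
person_schema = {
    "type": "json_schema",
    "json_schema": {
        "name": "person",
        "schema": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "age": {"type": "integer"},
            },
            "required": ["name", "age"],
            "additionalProperties": False,
        },
    },
}

# It would then be supplied as, e.g.:
# TogetherAIChatGenerator(generation_kwargs={"response_format": person_schema})
```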

<a id="haystack_integrations.components.generators.togetherai.chat.chat_generator.TogetherAIChatGenerator.to_dict"></a>

#### TogetherAIChatGenerator.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns**:

The serialized component as a dictionary.

<a id="haystack_integrations.components.generators.togetherai.generator"></a>

## Module haystack\_integrations.components.generators.togetherai.generator

<a id="haystack_integrations.components.generators.togetherai.generator.TogetherAIGenerator"></a>

### TogetherAIGenerator

Provides an interface to generate text using an LLM running on Together AI.

Usage example:
```python
from haystack_integrations.components.generators.togetherai import TogetherAIGenerator

generator = TogetherAIGenerator(
    model="deepseek-ai/DeepSeek-R1",
    generation_kwargs={"temperature": 0.9},
)

print(generator.run(prompt="Who is the best Italian actor?"))
```

<a id="haystack_integrations.components.generators.togetherai.generator.TogetherAIGenerator.__init__"></a>

#### TogetherAIGenerator.\_\_init\_\_

```python
def __init__(api_key: Secret = Secret.from_env_var("TOGETHER_API_KEY"),
             model: str = "meta-llama/Llama-3.3-70B-Instruct-Turbo",
             api_base_url: str | None = "https://api.together.xyz/v1",
             streaming_callback: StreamingCallbackT | None = None,
             system_prompt: str | None = None,
             generation_kwargs: dict[str, Any] | None = None,
             timeout: float | None = None,
             max_retries: int | None = None)
```

Initialize the TogetherAIGenerator.

**Arguments**:

- `api_key`: The Together API key.
- `model`: The name of the model to use.
- `api_base_url`: The base URL of the Together AI API.
- `streaming_callback`: A callback function that is called when a new token is received from the stream.
The callback function accepts StreamingChunk as an argument.
- `system_prompt`: The system prompt to use for text generation. If not provided, the system prompt is
omitted, and the default system prompt of the model is used.
- `generation_kwargs`: Other parameters to use for the model. These parameters are all sent directly to
the Together AI endpoint. See the Together AI
[documentation](https://docs.together.ai/reference/chat-completions-1) for more details.
Some of the supported parameters:
- `max_tokens`: The maximum number of tokens the output text can have.
- `temperature`: What sampling temperature to use. Higher values mean the model will take more risks.
    Try 0.9 for more creative applications and 0 (argmax sampling) for ones with a well-defined answer.
- `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model
    considers the results of the tokens with top_p probability mass. So, 0.1 means only the tokens
    comprising the top 10% probability mass are considered.
- `n`: How many completions to generate for each prompt. For example, if the LLM gets 3 prompts and n is 2,
    it will generate two completions for each of the three prompts, ending up with 6 completions in total.
- `stop`: One or more sequences after which the LLM should stop generating tokens.
- `presence_penalty`: The penalty applied if a token is already present in the text. Higher values make
    the model less likely to repeat the same token.
- `frequency_penalty`: The penalty applied if a token has already been generated in the text. Higher
    values make the model less likely to repeat the same token.
- `logit_bias`: Adds a logit bias to specific tokens. The keys of the dictionary are tokens, and the
    values are the bias to add to that token.
- `timeout`: Timeout for Together AI client calls. If not set, it is inferred from the `OPENAI_TIMEOUT`
environment variable or defaults to 30.
- `max_retries`: Maximum number of retries to contact Together AI after an internal error. If not set, it is
inferred from the `OPENAI_MAX_RETRIES` environment variable or defaults to 5.
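To make the `top_p` description concrete, the following stdlib-only sketch shows how nucleus sampling narrows a toy next-token distribution. It is purely illustrative of the idea, not the server's implementation:

```python
def nucleus_filter(probs: dict[str, float], top_p: float) -> dict[str, float]:
    """Keep the smallest set of most-likely tokens whose cumulative probability reaches top_p."""
    kept: dict[str, float] = {}
    cumulative = 0.0
    # Walk tokens from most to least likely, accumulating probability mass.
    for token, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:
            break
    return kept

# A toy next-token distribution.
probs = {"cat": 0.5, "dog": 0.25, "fish": 0.15, "newt": 0.10}
print(nucleus_filter(probs, top_p=0.75))  # {'cat': 0.5, 'dog': 0.25}
```

Sampling then happens only among the kept tokens (after renormalizing), which is why low `top_p` values make generations more conservative.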

<a id="haystack_integrations.components.generators.togetherai.generator.TogetherAIGenerator.to_dict"></a>

#### TogetherAIGenerator.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serialize this component to a dictionary.

**Returns**:

The serialized component as a dictionary.

<a id="haystack_integrations.components.generators.togetherai.generator.TogetherAIGenerator.from_dict"></a>

#### TogetherAIGenerator.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "TogetherAIGenerator"
```

Deserialize this component from a dictionary.

**Arguments**:

- `data`: The dictionary representation of this component.

**Returns**:

The deserialized component instance.

<a id="haystack_integrations.components.generators.togetherai.generator.TogetherAIGenerator.run"></a>

#### TogetherAIGenerator.run

```python
@component.output_types(replies=list[str], meta=list[dict[str, Any]])
def run(*,
        prompt: str,
        system_prompt: str | None = None,
        streaming_callback: StreamingCallbackT | None = None,
        generation_kwargs: dict[str, Any] | None = None) -> dict[str, Any]
```

Generate text completions synchronously.

**Arguments**:

- `prompt`: The input prompt string for text generation.
- `system_prompt`: An optional system prompt to provide context or instructions for the generation.
If not provided, the system prompt set in the `__init__` method will be used.
- `streaming_callback`: A callback function that is called when a new token is received from the stream.
If provided, this will override the `streaming_callback` set in the `__init__` method.
- `generation_kwargs`: Additional keyword arguments for text generation. These parameters override those
passed in the `__init__` method. Supported parameters include `temperature`, `max_tokens`, `top_p`, etc.

**Returns**:

A dictionary with the following keys:
- `replies`: A list of generated text completions as strings.
- `meta`: A list of metadata dictionaries containing information about each generation,
including model name, finish reason, and token usage statistics.
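The returned structure can be consumed like this. The values below are illustrative stand-ins for a real API response, so the shapes, not the contents, are what matter:

```python
# Shape of the dictionary returned by `run` (illustrative values).
result = {
    "replies": ["Natural Language Processing is a branch of AI."],
    "meta": [
        {
            "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo",
            "finish_reason": "stop",
            "usage": {"prompt_tokens": 15, "completion_tokens": 9, "total_tokens": 24},
        }
    ],
}

# `replies` and `meta` are parallel lists: meta[i] describes replies[i].
for reply, meta in zip(result["replies"], result["meta"]):
    print(f"{reply} (finish_reason={meta['finish_reason']})")
```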

<a id="haystack_integrations.components.generators.togetherai.generator.TogetherAIGenerator.run_async"></a>

#### TogetherAIGenerator.run\_async

```python
@component.output_types(replies=list[str], meta=list[dict[str, Any]])
async def run_async(
        *,
        prompt: str,
        system_prompt: str | None = None,
        streaming_callback: StreamingCallbackT | None = None,
        generation_kwargs: dict[str, Any] | None = None) -> dict[str, Any]
```

Generate text completions asynchronously.

**Arguments**:

- `prompt`: The input prompt string for text generation.
- `system_prompt`: An optional system prompt to provide context or instructions for the generation.
- `streaming_callback`: A callback function that is called when a new token is received from the stream.
If provided, this will override the `streaming_callback` set in the `__init__` method.
- `generation_kwargs`: Additional keyword arguments for text generation. These parameters override those
passed in the `__init__` method. Supported parameters include `temperature`, `max_tokens`, `top_p`, etc.

**Returns**:

A dictionary with the following keys:
- `replies`: A list of generated text completions as strings.
- `meta`: A list of metadata dictionaries containing information about each generation,
including model name, finish reason, and token usage statistics.
294