---
title: "MetaLlamaChatGenerator"
id: metallamachatgenerator
slug: "/metallamachatgenerator"
description: "This component enables chat completion with any model available through the Meta Llama API."
---

# MetaLlamaChatGenerator

This component enables chat completion with any model available through the Meta Llama API.

|                                        |                                                                                                                      |
| -------------------------------------- | -------------------------------------------------------------------------------------------------------------------- |
| **Most common position in a pipeline** | After a [ChatPromptBuilder](../builders/chatpromptbuilder.mdx)                                                       |
| **Mandatory init variables**           | “api_key”: A Meta Llama API key. Can be set with the `LLAMA_API_KEY` env variable or passed to the `init()` method. |
| **Mandatory run variables**            | “messages”: A list of [ChatMessage](../../concepts/data-classes/chatmessage.mdx) objects                            |
| **Output variables**                   | “replies”: A list of [ChatMessage](../../concepts/data-classes/chatmessage.mdx) objects                             |
| **API reference**                      | [Meta Llama API](/reference/integrations-meta-llama)                                                                 |
| **GitHub link**                        | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/meta_llama                           |

## Overview

The `MetaLlamaChatGenerator` enables you to use multiple Meta Llama models by making chat completion calls to the Meta [Llama API](https://llama.developer.meta.com/?utm_source=partner-haystack&utm_medium=website). The default model is `Llama-4-Scout-17B-16E-Instruct-FP8`.

Currently available models are:

| Model ID                                 | Input context length (tokens) | Output context length (tokens) | Input Modalities | Output Modalities |
| ---------------------------------------- | ----------------------------- | ------------------------------ | ---------------- | ----------------- |
| `Llama-4-Scout-17B-16E-Instruct-FP8`     | 128k                          | 4028                           | Text, Image      | Text              |
| `Llama-4-Maverick-17B-128E-Instruct-FP8` | 128k                          | 4028                           | Text, Image      | Text              |
| `Llama-3.3-70B-Instruct`                 | 128k                          | 4028                           | Text             | Text              |
| `Llama-3.3-8B-Instruct`                  | 128k                          | 4028                           | Text             | Text              |

This component uses the same `ChatMessage` format as other Haystack Chat Generators for structured input and output. For more information, see the [ChatMessage documentation](../../concepts/data-classes/chatmessage.mdx).
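
For example, a conversation is passed as a list of `ChatMessage` objects; a minimal sketch:

```python
from haystack.dataclasses import ChatMessage

# A system message sets the assistant's behavior; user messages carry the query.
messages = [
    ChatMessage.from_system("You are a concise technical assistant."),
    ChatMessage.from_user("Summarize what Haystack is in one sentence."),
]
# Passing these to `run()` returns {"replies": [ChatMessage, ...]}.
```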

It is also fully compatible with Haystack [Tools](../../tools/tool.mdx) and [Toolsets](../../tools/toolset.mdx), which enable function calling with supported models.
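
As a minimal sketch, a `Tool` can be passed at initialization; the `get_weather` function and its schema here are purely illustrative:

```python
from haystack.dataclasses import ChatMessage
from haystack.tools import Tool
from haystack_integrations.components.generators.meta_llama import (
    MetaLlamaChatGenerator,
)

# Illustrative tool function; any callable matching the schema works.
def get_weather(city: str) -> str:
    return f"It is sunny in {city}."

weather_tool = Tool(
    name="get_weather",
    description="Get the current weather for a city.",
    parameters={
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
    function=get_weather,
)

llm = MetaLlamaChatGenerator(tools=[weather_tool])
response = llm.run([ChatMessage.from_user("What is the weather in Paris?")])
# With a tool-capable model, the reply may contain tool calls rather than text.
print(response["replies"][0].tool_calls)
```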

### Initialization

To use this integration, you must have a Meta Llama API key. You can provide it with the `LLAMA_API_KEY` environment variable or by using a [Secret](../../concepts/secret-management.mdx).

Then, install the `meta-llama-haystack` integration:

```shell
pip install meta-llama-haystack
```
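
With the package installed, the key can be provided explicitly through a `Secret`; a minimal sketch:

```python
from haystack.utils import Secret
from haystack_integrations.components.generators.meta_llama import (
    MetaLlamaChatGenerator,
)

# Explicitly read the key from the LLAMA_API_KEY environment variable
# (this is also the default when no api_key is passed).
llm = MetaLlamaChatGenerator(api_key=Secret.from_env_var("LLAMA_API_KEY"))
```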

### Streaming

`MetaLlamaChatGenerator` supports [streaming](guides-to-generators/choosing-the-right-generator.mdx#streaming-support) responses from the LLM, allowing tokens to be emitted as they are generated. To enable streaming, pass a callable to the `streaming_callback` parameter during initialization.
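
The callback receives each `StreamingChunk` as it arrives. A minimal sketch with a custom callback (the built-in `print_streaming_chunk` from `haystack.components.generators.utils` can be used instead):

```python
from haystack.dataclasses import StreamingChunk
from haystack_integrations.components.generators.meta_llama import (
    MetaLlamaChatGenerator,
)

def on_chunk(chunk: StreamingChunk) -> None:
    # Print each token fragment as soon as the API emits it.
    print(chunk.content, end="", flush=True)

llm = MetaLlamaChatGenerator(streaming_callback=on_chunk)
```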

## Usage

### On its own

```python
from haystack.dataclasses import ChatMessage
from haystack_integrations.components.generators.meta_llama import (
    MetaLlamaChatGenerator,
)

llm = MetaLlamaChatGenerator()
response = llm.run([ChatMessage.from_user("What are Agentic Pipelines? Be brief.")])
print(response["replies"][0].text)
```
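
Since no `api_key` is passed here, the generator reads it from the `LLAMA_API_KEY` environment variable and uses the default model.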

With streaming and model routing:

```python
from haystack.dataclasses import ChatMessage
from haystack_integrations.components.generators.meta_llama import (
    MetaLlamaChatGenerator,
)

llm = MetaLlamaChatGenerator(
    model="Llama-3.3-8B-Instruct",
    streaming_callback=lambda chunk: print(chunk.content, end="", flush=True),
)

response = llm.run([ChatMessage.from_user("What are Agentic Pipelines? Be brief.")])

# Check the model used for the response
print("\n\n Model used: ", response["replies"][0].meta["model"])
```

### In a pipeline

```python
# To run this example, you will need to set a `LLAMA_API_KEY` environment variable.

from haystack import Document, Pipeline
from haystack.components.builders.chat_prompt_builder import ChatPromptBuilder
from haystack.components.generators.utils import print_streaming_chunk
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.dataclasses import ChatMessage
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.utils import Secret

from haystack_integrations.components.generators.meta_llama import (
    MetaLlamaChatGenerator,
)

# Write documents to InMemoryDocumentStore
document_store = InMemoryDocumentStore()
document_store.write_documents(
    [
        Document(content="My name is Jean and I live in Paris."),
        Document(content="My name is Mark and I live in Berlin."),
        Document(content="My name is Giorgio and I live in Rome."),
    ],
)

# Build a RAG pipeline
prompt_template = [
    ChatMessage.from_user(
        "Given these documents, answer the question.\n"
        "Documents:\n{% for doc in documents %}{{ doc.content }}{% endfor %}\n"
        "Question: {{question}}\n"
        "Answer:",
    ),
]

# Define required variables explicitly
prompt_builder = ChatPromptBuilder(
    template=prompt_template,
    required_variables=["question", "documents"],
)

retriever = InMemoryBM25Retriever(document_store=document_store)
llm = MetaLlamaChatGenerator(
    api_key=Secret.from_env_var("LLAMA_API_KEY"),
    streaming_callback=print_streaming_chunk,
)

rag_pipeline = Pipeline()
rag_pipeline.add_component("retriever", retriever)
rag_pipeline.add_component("prompt_builder", prompt_builder)
rag_pipeline.add_component("llm", llm)
rag_pipeline.connect("retriever", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder", "llm.messages")

# Ask a question
question = "Who lives in Paris?"
rag_pipeline.run(
    {
        "retriever": {"query": question},
        "prompt_builder": {"question": question},
    },
)
```
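
The retriever fills the prompt with matching documents, and the generated answer is streamed to standard output through `print_streaming_chunk`.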