---
title: "OllamaChatGenerator"
id: ollamachatgenerator
slug: "/ollamachatgenerator"
description: "This component enables chat completion using an LLM running on Ollama."
---

# OllamaChatGenerator

This component enables chat completion using an LLM running on Ollama.

<div className="key-value-table">

| | |
| --- | --- |
| **Most common position in a pipeline** | After a [ChatPromptBuilder](../builders/chatpromptbuilder.mdx) |
| **Mandatory run variables** | `messages`: A list of [`ChatMessage`](../../concepts/data-classes/chatmessage.mdx) objects representing the chat |
| **Output variables** | `replies`: A list of the LLM's alternative replies |
| **API reference** | [Ollama](/reference/integrations-ollama) |
| **GitHub link** | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/ollama |

</div>

## Overview

[Ollama](https://github.com/jmorganca/ollama) is a project focused on running LLMs locally. Internally, it uses the quantized GGUF format by default, which makes it possible to run LLMs on standard machines (even without GPUs) without complex installation procedures.

`OllamaChatGenerator` supports models running on Ollama, such as `llama2` and `mixtral`. Find the full list of supported models in the [Ollama library](https://ollama.ai/library).

`OllamaChatGenerator` needs a `model` name and a `url` to work. By default, it uses the `"orca-mini"` model and the `"http://localhost:11434"` URL.

You interact with `OllamaChatGenerator` through `ChatMessage` objects. [ChatMessage](../../concepts/data-classes/chatmessage.mdx) is a data class that contains a message, a role (who generated the message, such as `user`, `assistant`, `system`, `function`), and optional metadata. See the [usage](#usage) section for an example.

### Tool Support

`OllamaChatGenerator` supports function calling through the `tools` parameter, which accepts flexible tool configurations:

- **A list of Tool objects**: Pass individual tools as a list
- **A single Toolset**: Pass an entire Toolset directly
- **Mixed Tools and Toolsets**: Combine multiple Toolsets with standalone tools in a single list

This allows you to organize related tools into logical groups while also including standalone tools as needed.

```python
from haystack.tools import Tool, Toolset
from haystack_integrations.components.generators.ollama import OllamaChatGenerator

# Create individual tools (parameters and functions omitted for brevity)
weather_tool = Tool(name="weather", description="Get weather info", ...)
news_tool = Tool(name="news", description="Get latest news", ...)

# Group related tools into a toolset (assumes these tools are defined elsewhere)
math_toolset = Toolset([add_tool, subtract_tool, multiply_tool])

# Pass a mix of tools and toolsets to the generator
generator = OllamaChatGenerator(
    model="llama2",
    tools=[math_toolset, weather_tool, news_tool]  # Mix of Toolset and Tool objects
)
```

For more details on working with tools, see the [Tool](../../tools/tool.mdx) and [Toolset](../../tools/toolset.mdx) documentation.
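For instance, here is a minimal end-to-end sketch of tool calling. The `get_weather` function, its schema, and the choice of a tool-capable model (`llama3.2`) are illustrative assumptions; adjust them for your setup:

```python
from haystack.dataclasses import ChatMessage
from haystack.tools import Tool
from haystack_integrations.components.generators.ollama import OllamaChatGenerator

# Hypothetical function the model can request; replace with your own logic
def get_weather(city: str) -> str:
    return f"It is sunny in {city}"

weather_tool = Tool(
    name="get_weather",
    description="Get the current weather for a city",
    parameters={
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
    function=get_weather,
)

# Assumes a tool-capable model (for example, llama3.2) is pulled in Ollama
generator = OllamaChatGenerator(model="llama3.2", tools=[weather_tool])

reply = generator.run(messages=[ChatMessage.from_user("What's the weather in Berlin?")])["replies"][0]

# If the model decided to call the tool, the reply contains tool calls
if reply.tool_calls:
    print(reply.tool_calls)
```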
### Streaming

You can stream output as it's generated. Pass a callback to `streaming_callback`. Use the built-in `print_streaming_chunk` to print text tokens and tool events (tool calls and tool results).

```python
from haystack.components.generators.utils import print_streaming_chunk
from haystack.dataclasses import ChatMessage
from haystack_integrations.components.generators.ollama import OllamaChatGenerator

# Configure the generator with a streaming callback
generator = OllamaChatGenerator(model="zephyr", streaming_callback=print_streaming_chunk)

# Pass a list of messages; chunks are printed as they arrive
generator.run([ChatMessage.from_user("Your question here")])
```

:::info
Streaming works only with a single response. If a provider supports multiple candidates, set `n=1`.
:::

See our [Streaming Support](guides-to-generators/choosing-the-right-generator.mdx#streaming-support) docs to learn more about how `StreamingChunk` works and how to write a custom callback.

Prefer `print_streaming_chunk` by default. Write a custom callback only if you need a specific transport (for example, SSE/WebSocket) or custom UI formatting.

## Usage

1. You need a running instance of Ollama. Installation instructions are in the [Ollama GitHub repository](https://github.com/jmorganca/ollama).
   A fast way to run Ollama is using Docker:

   ```bash
   docker run -d -p 11434:11434 --name ollama ollama/ollama:latest
   ```

2. You need to download or pull the desired LLM. The model library is available on the [Ollama website](https://ollama.ai/library).
   If you are using Docker, you can, for example, pull the Zephyr model:

   ```bash
   docker exec ollama ollama pull zephyr
   ```

   If you already installed Ollama on your system, you can run:

   ```bash
   ollama pull zephyr
   ```

   :::tip[Choose a specific version of a model]

   You can also specify a tag to choose a specific (quantized) version of your model. The available tags are shown in the model card of the Ollama models library; see the [Zephyr tags](https://ollama.ai/library/zephyr/tags) for an example.
   In this case, simply run:

   ```shell
   # ollama pull model:tag
   ollama pull zephyr:7b-alpha-q3_K_S
   ```
   :::

3. Install the `ollama-haystack` package:

   ```bash
   pip install ollama-haystack
   ```

### On its own

```python
from haystack_integrations.components.generators.ollama import OllamaChatGenerator
from haystack.dataclasses import ChatMessage

generator = OllamaChatGenerator(
    model="zephyr",
    url="http://localhost:11434",
    generation_kwargs={
        "num_predict": 100,
        "temperature": 0.9,
    },
)

messages = [
    ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
    ChatMessage.from_user("What's Natural Language Processing?"),
]

print(generator.run(messages=messages))
>> {
    "replies": [
        ChatMessage(
            _role=<ChatRole.ASSISTANT: 'assistant'>,
            _content=[
                TextContent(
                    text=(
                        "Natural Language Processing (NLP) is a subfield of "
                        "Artificial Intelligence that deals with understanding, "
                        "interpreting, and generating human language in a meaningful "
                        "way. It enables tasks such as language translation, sentiment "
                        "analysis, and text summarization."
                    )
                )
            ],
            _name=None,
            _meta={
                "model": "zephyr",...
            }
        )
    ]
}
```
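You can also override generation parameters for a single call. This sketch assumes the standard `generation_kwargs` run argument that Haystack chat generators accept, reusing `generator` and `messages` from the example above:

```python
# Per-call override of generation parameters (init-time values stay unchanged)
result = generator.run(
    messages=messages,
    generation_kwargs={"temperature": 0.1},
)
print(result["replies"][0].text)
```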
With multimodal inputs:

```python
from haystack.dataclasses import ChatMessage, ImageContent
from haystack_integrations.components.generators.ollama import OllamaChatGenerator

llm = OllamaChatGenerator(model="llava", url="http://localhost:11434")

image = ImageContent.from_file_path("apple.jpg")
user_message = ChatMessage.from_user(
    content_parts=["What does the image show? Max 5 words.", image],
)

response = llm.run([user_message])["replies"][0].text
print(response)

# Red apple on straw.
```

### In a Pipeline

```python
from haystack.components.builders import ChatPromptBuilder
from haystack_integrations.components.generators.ollama import OllamaChatGenerator
from haystack.dataclasses import ChatMessage
from haystack import Pipeline

# Initialized without a template; the template and its variables are provided at runtime
prompt_builder = ChatPromptBuilder()
generator = OllamaChatGenerator(
    model="zephyr",
    url="http://localhost:11434",
    generation_kwargs={
        "temperature": 0.9,
    },
)

pipe = Pipeline()
pipe.add_component("prompt_builder", prompt_builder)
pipe.add_component("llm", generator)
pipe.connect("prompt_builder.prompt", "llm.messages")

location = "Berlin"
messages = [
    ChatMessage.from_system("Always respond in Spanish even if some input data is in other languages."),
    ChatMessage.from_user("Tell me about {{location}}"),
]

print(pipe.run(data={"prompt_builder": {"template_variables": {"location": location}, "template": messages}}))

>> {
    "llm": {
        "replies": [
            ChatMessage(
                _role=<ChatRole.ASSISTANT: 'assistant'>,
                _content=[
                    TextContent(
                        text=(
                            "Berlín es la capital y la mayor ciudad de Alemania. "
                            "Está ubicada en el estado federado de Berlín, y tiene más..."
                        )
                    )
                ],
                _name=None,
                _meta={
                    "model": "zephyr",...
                }
            )
        ]
    }
}
```
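To work with just the generated text rather than the full output dictionary, read the first reply's `text` from the generator's output, which is nested under the component name (`llm`) in the pipeline result:

```python
# Capture the pipeline result instead of printing the full dictionary
result = pipe.run(
    data={"prompt_builder": {"template_variables": {"location": location}, "template": messages}}
)

# The generator's replies are nested under its component name ("llm")
print(result["llm"]["replies"][0].text)
```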