# LLM

The LLM pipeline runs prompts through a large language model (LLM). This pipeline autodetects the LLM framework based on the model path.

## Example

The following shows a simple example using this pipeline.

```python
from txtai import LLM

# Create LLM pipeline
llm = LLM()

# Run prompt
llm(
  """
  Answer the following question using the provided context.

  Question:
  What are the applications of txtai?

  Context:
  txtai is an open-source platform for semantic search and
  workflows powered by language models.
  """
)

# Prompts with chat templating can be directly passed
# The template format varies by model
llm(
  """
  <|im_start|>system
  You are a friendly assistant.<|im_end|>
  <|im_start|>user
  Answer the following question...<|im_end|>
  <|im_start|>assistant
  """
)

# Chat messages automatically handle templating
llm([
  {"role": "system", "content": "You are a friendly assistant."},
  {"role": "user", "content": "Answer the following question..."}
])

# When no system prompt is passed to an instruction-tuned model,
# the default role is inferred (defaultrole="auto")
llm("Answer the following question...")

# Always generate chat messages for string inputs
llm("Answer the following question...", defaultrole="user")

# Never generate chat messages for string inputs
llm("Answer the following question...", defaultrole="prompt")
```
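To make the `defaultrole` behavior concrete, the handling of string inputs can be sketched standalone. The helper below is illustrative only, not txtai's internal code, and it covers just the `user` and `prompt` settings; `auto` additionally depends on whether the loaded model is instruction-tuned.

```python
def apply_defaultrole(text, defaultrole="user"):
    """Illustrative sketch of how a string prompt is treated (not txtai internals)."""
    if defaultrole == "prompt":
        # Pass the raw string through untouched, no chat templating
        return text

    # Wrap the string as a single chat message so the model's
    # chat template is applied
    return [{"role": defaultrole, "content": text}]

print(apply_defaultrole("Answer the following question..."))
# [{'role': 'user', 'content': 'Answer the following question...'}]
```

In both cases, passing a list of chat messages directly bypasses this logic, as shown above.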
The LLM pipeline automatically detects the underlying LLM framework. This can also be manually set. The following methods are supported.

- [Hugging Face Transformers](https://github.com/huggingface/transformers)
- [llama.cpp](https://github.com/abetlen/llama-cpp-python)
- [LLM APIs via LiteLLM](https://github.com/BerriAI/litellm)
- [OpenCode server](https://github.com/anomalyco/opencode)

`llama.cpp` models support both local and remote GGUF paths on the HF Hub. See the [LiteLLM documentation](https://litellm.vercel.app/docs/providers) for the options available with LiteLLM models. See the [OpenCode documentation](https://opencode.ai/docs/server/) for more on how to integrate the LLM pipeline with a running OpenCode instance.

```python
from txtai import LLM

# Transformers
llm = LLM("openai/gpt-oss-20b")
llm = LLM("openai/gpt-oss-20b", method="transformers")

# llama.cpp
llm = LLM("unsloth/gpt-oss-20b-GGUF/gpt-oss-20b-Q4_K_M.gguf")
llm = LLM("unsloth/gpt-oss-20b-GGUF/gpt-oss-20b-Q4_K_M.gguf",
          method="llama.cpp")

# LiteLLM
llm = LLM("ollama/gpt-oss")
llm = LLM("ollama/gpt-oss", method="litellm")

# Custom Ollama endpoint
llm = LLM("ollama/gpt-oss", api_base="http://localhost:11434")

# Custom OpenAI-compatible endpoint
llm = LLM("openai/gpt-oss", api_base="http://localhost:4000")

# LLM APIs - must also set API key via environment variable
llm = LLM("gpt-5.2")
llm = LLM("claude-opus-4-5-20251101")
llm = LLM("gemini/gemini-3-pro-preview")

# Local OpenCode server started via `opencode serve`
llm = LLM("opencode")
llm = LLM("opencode/big-pickle", url="http://localhost:4000")
```
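The path-based autodetection shown above can be pictured as a simple dispatch on the model string. The sketch below is illustrative only and deliberately simplified — txtai's real detection is more involved (for example, it also handles API-only model names and custom endpoints) — but it reflects the examples in this section.

```python
def detect(path):
    """Illustrative sketch of framework autodetection (not txtai's actual logic)."""
    # OpenCode server, optionally with a model suffix
    if path == "opencode" or path.startswith("opencode/"):
        return "opencode"

    # GGUF files, local or on the HF Hub, load with llama.cpp
    if path.lower().endswith(".gguf"):
        return "llama.cpp"

    # Provider-prefixed paths and bare API model names route through LiteLLM
    if "/" not in path or path.split("/", 1)[0] in ("ollama", "gemini"):
        return "litellm"

    # Everything else is treated as a Hugging Face Transformers model path
    return "transformers"

print(detect("unsloth/gpt-oss-20b-GGUF/gpt-oss-20b-Q4_K_M.gguf"))  # llama.cpp
```

Passing `method` explicitly, as in the examples above, skips detection entirely.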
Models can be externally loaded and passed to pipelines. This is useful for models that are not yet supported by Transformers and/or need special initialization.

```python
import torch

from transformers import AutoModelForCausalLM, AutoTokenizer
from txtai import LLM

# Load Qwen3 0.6B
path = "Qwen/Qwen3-0.6B"
model = AutoModelForCausalLM.from_pretrained(
    path,
    dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(path)

llm = LLM((model, tokenizer))
```

See the links below for more detailed examples.

| Notebook | Description | |
|:----------|:-------------|------:|
| [Prompt-driven search with LLMs](https://github.com/neuml/txtai/blob/master/examples/42_Prompt_driven_search_with_LLMs.ipynb) | Embeddings-guided and Prompt-driven search with Large Language Models (LLMs) | [](https://colab.research.google.com/github/neuml/txtai/blob/master/examples/42_Prompt_driven_search_with_LLMs.ipynb) |
| [Prompt templates and task chains](https://github.com/neuml/txtai/blob/master/examples/44_Prompt_templates_and_task_chains.ipynb) | Build model prompts and connect tasks together with workflows | [](https://colab.research.google.com/github/neuml/txtai/blob/master/examples/44_Prompt_templates_and_task_chains.ipynb) |
| [Build RAG pipelines with txtai](https://github.com/neuml/txtai/blob/master/examples/52_Build_RAG_pipelines_with_txtai.ipynb) [▶️](https://www.youtube.com/watch?v=t_OeAc8NVfQ) | Guide on retrieval augmented generation including how to create citations | [](https://colab.research.google.com/github/neuml/txtai/blob/master/examples/52_Build_RAG_pipelines_with_txtai.ipynb) |
| [Integrate LLM frameworks](https://github.com/neuml/txtai/blob/master/examples/53_Integrate_LLM_Frameworks.ipynb) | Integrate llama.cpp, LiteLLM and custom generation frameworks | [](https://colab.research.google.com/github/neuml/txtai/blob/master/examples/53_Integrate_LLM_Frameworks.ipynb) |
| [Generate knowledge with Semantic Graphs and RAG](https://github.com/neuml/txtai/blob/master/examples/55_Generate_knowledge_with_Semantic_Graphs_and_RAG.ipynb) | Knowledge exploration and discovery with Semantic Graphs and RAG | [](https://colab.research.google.com/github/neuml/txtai/blob/master/examples/55_Generate_knowledge_with_Semantic_Graphs_and_RAG.ipynb) |
| [Build knowledge graphs with LLMs](https://github.com/neuml/txtai/blob/master/examples/57_Build_knowledge_graphs_with_LLM_driven_entity_extraction.ipynb) | Build knowledge graphs with LLM-driven entity extraction | [](https://colab.research.google.com/github/neuml/txtai/blob/master/examples/57_Build_knowledge_graphs_with_LLM_driven_entity_extraction.ipynb) |
| [Advanced RAG with graph path traversal](https://github.com/neuml/txtai/blob/master/examples/58_Advanced_RAG_with_graph_path_traversal.ipynb) | Graph path traversal to collect complex sets of data for advanced RAG | [](https://colab.research.google.com/github/neuml/txtai/blob/master/examples/58_Advanced_RAG_with_graph_path_traversal.ipynb) |
| [Advanced RAG with guided generation](https://github.com/neuml/txtai/blob/master/examples/60_Advanced_RAG_with_guided_generation.ipynb) | Retrieval Augmented and Guided Generation | [](https://colab.research.google.com/github/neuml/txtai/blob/master/examples/60_Advanced_RAG_with_guided_generation.ipynb) |
| [RAG with llama.cpp and external API services](https://github.com/neuml/txtai/blob/master/examples/62_RAG_with_llama_cpp_and_external_API_services.ipynb) | RAG with additional vector and LLM frameworks | [](https://colab.research.google.com/github/neuml/txtai/blob/master/examples/62_RAG_with_llama_cpp_and_external_API_services.ipynb) |
| [How RAG with txtai works](https://github.com/neuml/txtai/blob/master/examples/63_How_RAG_with_txtai_works.ipynb) | Create RAG processes, API services and Docker instances | [](https://colab.research.google.com/github/neuml/txtai/blob/master/examples/63_How_RAG_with_txtai_works.ipynb) |
| [Speech to Speech RAG](https://github.com/neuml/txtai/blob/master/examples/65_Speech_to_Speech_RAG.ipynb) [▶️](https://www.youtube.com/watch?v=tH8QWwkVMKA) | Full cycle speech to speech workflow with RAG | [](https://colab.research.google.com/github/neuml/txtai/blob/master/examples/65_Speech_to_Speech_RAG.ipynb) |
| [Analyzing Hugging Face Posts with Graphs and Agents](https://github.com/neuml/txtai/blob/master/examples/68_Analyzing_Hugging_Face_Posts_with_Graphs_and_Agents.ipynb) | Explore a rich dataset with Graph Analysis and Agents | [](https://colab.research.google.com/github/neuml/txtai/blob/master/examples/68_Analyzing_Hugging_Face_Posts_with_Graphs_and_Agents.ipynb) |
| [Granting autonomy to agents](https://github.com/neuml/txtai/blob/master/examples/69_Granting_autonomy_to_agents.ipynb) | Agents that iteratively solve problems as they see fit | [](https://colab.research.google.com/github/neuml/txtai/blob/master/examples/69_Granting_autonomy_to_agents.ipynb) |
| [Getting started with LLM APIs](https://github.com/neuml/txtai/blob/master/examples/70_Getting_started_with_LLM_APIs.ipynb) | Generate embeddings and run LLMs with OpenAI, Claude, Gemini, Bedrock and more | [](https://colab.research.google.com/github/neuml/txtai/blob/master/examples/70_Getting_started_with_LLM_APIs.ipynb) |
| [Analyzing LinkedIn Company Posts with Graphs and Agents](https://github.com/neuml/txtai/blob/master/examples/71_Analyzing_LinkedIn_Company_Posts_with_Graphs_and_Agents.ipynb) | Exploring how to improve social media engagement with AI | [](https://colab.research.google.com/github/neuml/txtai/blob/master/examples/71_Analyzing_LinkedIn_Company_Posts_with_Graphs_and_Agents.ipynb) |
| [Parsing the stars with txtai](https://github.com/neuml/txtai/blob/master/examples/72_Parsing_the_stars_with_txtai.ipynb) | Explore an astronomical knowledge graph of known stars, planets, galaxies | [](https://colab.research.google.com/github/neuml/txtai/blob/master/examples/72_Parsing_the_stars_with_txtai.ipynb) |
| [Chunking your data for RAG](https://github.com/neuml/txtai/blob/master/examples/73_Chunking_your_data_for_RAG.ipynb) | Extract, chunk and index content for effective retrieval | [](https://colab.research.google.com/github/neuml/txtai/blob/master/examples/73_Chunking_your_data_for_RAG.ipynb) |
| [Medical RAG Research with txtai](https://github.com/neuml/txtai/blob/master/examples/75_Medical_RAG_Research_with_txtai.ipynb) | Analyze PubMed article metadata with RAG | [](https://colab.research.google.com/github/neuml/txtai/blob/master/examples/75_Medical_RAG_Research_with_txtai.ipynb) |
| [GraphRAG with Wikipedia and GPT OSS](https://github.com/neuml/txtai/blob/master/examples/77_GraphRAG_with_Wikipedia_and_GPT_OSS.ipynb) | Deep graph search powered RAG | [](https://colab.research.google.com/github/neuml/txtai/blob/master/examples/77_GraphRAG_with_Wikipedia_and_GPT_OSS.ipynb) |
| [RAG is more than Vector Search](https://github.com/neuml/txtai/blob/master/examples/79_RAG_is_more_than_Vector_Search.ipynb) | Context retrieval via Web, SQL and other sources | [](https://colab.research.google.com/github/neuml/txtai/blob/master/examples/79_RAG_is_more_than_Vector_Search.ipynb) |
| [OpenCode as a txtai LLM](https://github.com/neuml/txtai/blob/master/examples/81_OpenCode_as_a_txtai_LLM.ipynb) | Integrate OpenCode with the txtai ecosystem | [](https://colab.research.google.com/github/neuml/txtai/blob/master/examples/81_OpenCode_as_a_txtai_LLM.ipynb) |
| [Agentic College Search](https://github.com/neuml/txtai/blob/master/examples/82_Agentic_College_Search.ipynb) | Identify a list of strong engineering colleges | [](https://colab.research.google.com/github/neuml/txtai/blob/master/examples/82_Agentic_College_Search.ipynb) |
| [TxtAI got skills](https://github.com/neuml/txtai/blob/master/examples/83_TxtAI_got_skills.ipynb) | Integrate skill.md files with your agent | [](https://colab.research.google.com/github/neuml/txtai/blob/master/examples/83_TxtAI_got_skills.ipynb) |
| [Agent Tools](https://github.com/neuml/txtai/blob/master/examples/84_Agent_Tools.ipynb) [▶️](https://www.youtube.com/watch?v=RDNaFXQy3GQ) | Learn about the txtai agent toolkit | [](https://colab.research.google.com/github/neuml/txtai/blob/master/examples/84_Agent_Tools.ipynb) |

## Configuration-driven example

Pipelines are run with Python or configuration. Pipelines can be instantiated in [configuration](../../../api/configuration/#pipeline) using the lower case name of the pipeline. Configuration-driven pipelines are run with [workflows](../../../workflow/#configuration-driven-example) or the [API](../../../api#local-instance).

### config.yml
```yaml
# Create pipeline using lower case class name
llm:

# Run pipeline with workflow
workflow:
  llm:
    tasks:
      - action: llm
```

Similar to the Python example above, the underlying [Hugging Face pipeline parameters](https://huggingface.co/docs/transformers/main/main_classes/pipelines#transformers.pipeline.model) and [model parameters](https://huggingface.co/docs/transformers/model_doc/auto#transformers.AutoModel.from_pretrained) can be set in pipeline configuration.

```yaml
llm:
  path: Qwen/Qwen3-0.6B
  dtype: torch.bfloat16
```

### Run with Workflows

```python
from txtai import Application

# Create and run pipeline with workflow
app = Application("config.yml")
list(app.workflow("llm", [
  """
  Answer the following question using the provided context.

  Question:
  What are the applications of txtai?

  Context:
  txtai is an open-source platform for semantic search and
  workflows powered by language models.
  """
]))
```

### Run with API

```bash
CONFIG=config.yml uvicorn "txtai.api:app" &

curl \
  -X POST "http://localhost:8000/workflow" \
  -H "Content-Type: application/json" \
  -d '{"name":"llm", "elements": ["Answer the following question..."]}'
```

## Methods

Python documentation for the pipeline.

### ::: txtai.pipeline.LLM.__init__
### ::: txtai.pipeline.LLM.__call__