{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "4c8ab3b0",
   "metadata": {},
   "source": [
    "# RAG is more than Vector Search\n",
    "\n",
    "Retrieval Augmented Generation (RAG) is often associated with vector search. While that is a primary use case, any search method can supply the context.\n",
    "\n",
    "- ✅ Vector Search\n",
    "- ✅ Web Search\n",
    "- ✅ SQL Query\n",
    "\n",
    "This notebook walks through a few RAG examples, each using a different retrieval method. These examples require txtai 9.3+."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ab8a77d1",
   "metadata": {},
   "source": [
    "# Install dependencies\n",
    "\n",
    "Install `txtai` and all dependencies."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "21bebb46",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%capture\n",
    "!pip install git+https://github.com/neuml/txtai#egg=txtai[pipeline-data]\n",
    "\n",
    "# Download example SQL database\n",
    "!wget https://huggingface.co/NeuML/txtai-wikipedia-slim/resolve/main/documents"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1ece2b09",
   "metadata": {},
   "source": [
    "# RAG with Late Interaction\n",
    "\n",
    "The first example covers RAG with ColBERT / Late Interaction retrieval. txtai 9.0 added support for [MUVERA](https://arxiv.org/abs/2405.19504) and [ColBERT](https://arxiv.org/abs/2112.01488) multi-vector ranking.\n",
    "\n",
    "We'll build a pipeline that reads the ColBERT v2 paper, extracts the text into sections and builds an index with a ColBERT model. Then we'll wrap that index in a [Reranker pipeline](https://neuml.github.io/txtai/pipeline/text/reranker/) that re-scores the results with the same ColBERT model. Finally, a RAG pipeline will use the reranker for retrieval.\n",
    "\n",
    "_Note: This uses the custom [ColBERT Muvera Nano](https://huggingface.co/NeuML/colbert-muvera-nano) model, which has only 970K parameters! That's right, thousands. It's surprisingly effective._"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "dae2e6dc",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "This paper introduces ColBERTv2, a neural information retrieval model that enhances the quality and efficiency of late interaction by combining an aggressive residual compression mechanism with a denoised supervision strategy, achieving state-of-the-art performance across diverse benchmarks while reducing the model's space footprint by 6–10× compared to previous methods.\n"
     ]
    }
   ],
   "source": [
    "from txtai import Embeddings, RAG, Textractor\n",
    "from txtai.pipeline import Reranker, Similarity\n",
    "\n",
    "# Get text from ColBERT v2 paper\n",
    "textractor = Textractor(sections=True, backend=\"docling\")\n",
    "data = textractor(\"https://arxiv.org/pdf/2112.01488\")\n",
    "\n",
    "# MUVERA fixed dimensional encodings\n",
    "embeddings = Embeddings(content=True, path=\"neuml/colbert-muvera-nano\", vectors={\"trust_remote_code\": True})\n",
    "embeddings.index(data)\n",
    "\n",
    "# Re-rank using same late interaction model\n",
    "reranker = Reranker(embeddings, Similarity(\"neuml/colbert-muvera-nano\", lateencode=True, vectors={\"trust_remote_code\": True}))\n",
    "\n",
    "template = \"\"\"\n",
    "  Answer the following question using the provided context.\n",
    "\n",
    "  Question:\n",
    "  {question}\n",
    "\n",
    "  Context:\n",
    "  {context}\n",
    "\"\"\"\n",
    "\n",
    "# RAG with late interaction models\n",
    "rag = RAG(reranker, \"Qwen/Qwen3-4B-Instruct-2507\", template=template, output=\"flatten\")\n",
    "print(rag(\"Write a sentence abstract about this paper\", maxlength=2048))"
   ]
  },
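  {
   "cell_type": "markdown",
   "id": "maxsim-sketch-md",
   "metadata": {},
   "source": [
    "Late interaction models such as ColBERT keep one vector per token. A document is scored by matching each query token to its most similar document token and summing those maxima, often called MaxSim. Here's a minimal NumPy sketch of that scoring on made-up vectors. The `maxsim` function and the toy arrays are hypothetical and only for illustration. This is not how txtai or the model above implement it internally."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "maxsim-sketch-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def maxsim(query, document):\n",
    "    # Normalize token vectors so dot products are cosine similarities\n",
    "    query = query / np.linalg.norm(query, axis=1, keepdims=True)\n",
    "    document = document / np.linalg.norm(document, axis=1, keepdims=True)\n",
    "\n",
    "    # Similarity of every query token to every document token\n",
    "    scores = query @ document.T\n",
    "\n",
    "    # Best document token per query token, summed over the query tokens\n",
    "    return float(scores.max(axis=1).sum())\n",
    "\n",
    "# Toy token vectors (assumption: real models use many more dimensions)\n",
    "query = np.array([[1.0, 0.0], [0.0, 1.0]])\n",
    "doc1 = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])\n",
    "doc2 = np.array([[1.0, 0.0], [-1.0, 0.0]])\n",
    "\n",
    "# doc1 has a close match for both query tokens, doc2 only for the first\n",
    "print(maxsim(query, doc1), maxsim(query, doc2))"
   ]
  },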
  {
   "cell_type": "markdown",
   "id": "b31e46de",
   "metadata": {},
   "source": [
    "# RAG with a Web Search\n",
    "\n",
    "Next we'll run a RAG pipeline using a web search as the retrieval method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "b61b684b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Artificial intelligence (AI) is the capability of computational systems to perform tasks typically associated with human intelligence, such as learning, reasoning, problem-solving, perception, and decision-making. It involves technologies like machine learning, deep learning, and natural language processing, and enables machines to simulate human-like learning, comprehension, problem solving, decision-making, creativity, and autonomy.\n"
     ]
    }
   ],
   "source": [
    "from smolagents import WebSearchTool\n",
    "\n",
    "tool = WebSearchTool()\n",
    "\n",
    "def websearch(queries, limit):\n",
    "    # Retrieval function: takes a list of queries, returns a list of results per query\n",
    "    results = []\n",
    "    for query in queries:\n",
    "        # Map each search result to id, text and score fields\n",
    "        result = [\n",
    "            {\"id\": i, \"text\": f'{x[\"title\"]} {x[\"description\"]}', \"score\": 1.0} for i, x in enumerate(tool.search(query))\n",
    "        ]\n",
    "        results.append(result[:limit])\n",
    "\n",
    "    return results\n",
    "\n",
    "# RAG with a websearch\n",
    "rag = RAG(websearch, \"Qwen/Qwen3-4B-Instruct-2507\", template=template, output=\"flatten\")\n",
    "print(rag(\"What is AI?\", maxlength=2048))"
   ]
  },
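  {
   "cell_type": "markdown",
   "id": "retriever-shape-md",
   "metadata": {},
   "source": [
    "The `websearch` function above takes a list of queries and a limit, and returns one list of `id` / `text` / `score` dictionaries per query. Assuming that is the shape a retrieval callable needs, here is a minimal sketch that swaps the web search for an in-memory list, which makes it easy to test without a network connection. The `DOCUMENTS` list and the `listsearch` function are hypothetical names, and the term-overlap score is a simple stand-in, not how txtai ranks results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "retriever-shape-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical in-memory corpus, used only for illustration\n",
    "DOCUMENTS = [\n",
    "    \"Vector search finds text with similar meaning\",\n",
    "    \"SQL databases answer structured queries\",\n",
    "    \"Web search returns pages from the internet\",\n",
    "]\n",
    "\n",
    "def listsearch(queries, limit):\n",
    "    results = []\n",
    "    for query in queries:\n",
    "        terms = set(query.lower().split())\n",
    "\n",
    "        # Score each document by the fraction of query terms it contains\n",
    "        scored = []\n",
    "        for uid, text in enumerate(DOCUMENTS):\n",
    "            overlap = len(terms & set(text.lower().split()))\n",
    "            if overlap:\n",
    "                scored.append({\"id\": uid, \"text\": text, \"score\": overlap / len(terms)})\n",
    "\n",
    "        # Best matches first, capped at limit\n",
    "        scored.sort(key=lambda x: x[\"score\"], reverse=True)\n",
    "        results.append(scored[:limit])\n",
    "\n",
    "    return results\n",
    "\n",
    "# Same calling convention as websearch: a list of queries and a limit\n",
    "print(listsearch([\"sql queries\"], 1))"
   ]
  },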
  {
   "cell_type": "markdown",
   "id": "45c8a096",
   "metadata": {},
   "source": [
    "# RAG with a SQL Query\n",
    "\n",
    "The last example runs RAG with a SQL query. We'll use the SQL database that's a component of the [txtai-wikipedia-slim](https://huggingface.co/NeuML/txtai-wikipedia-slim) embeddings database.\n",
    "\n",
    "Since this is just a database of Wikipedia abstracts, we need a way to build a SQL query from a search query. For that, we'll use an LLM to extract a keyword to use in a `LIKE` clause.\n",
    "\n",
    "Given that the LLM used was released in August 2025, let's ask it something that can only be answered accurately with external data: what happened in the 2025 World Series, which ended in November."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c7ff2a35",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "In the 2025 World Series, the Los Angeles Dodgers defeated the Toronto Blue Jays in seven games to win the championship. The series took place from October 24 to November 1 (ending early on November 2, Toronto time). Dodgers pitcher Yoshinobu Yamamoto was named the World Series MVP. The series was televised by Fox in the United States and by Sportsnet in Canada.\n"
     ]
    }
   ],
   "source": [
    "import sqlite3\n",
    "\n",
    "from txtai import LLM\n",
    "\n",
    "def keyword(query):\n",
    "    # Ask the LLM for a keyword to search on\n",
    "    return llm(f\"\"\"\n",
    "        Extract a keyword for this search query: {query}.\n",
    "        Return only text with no other formatting or explanation.\n",
    "    \"\"\")\n",
    "\n",
    "def sqlsearch(queries, limit):\n",
    "    results = []\n",
    "\n",
    "    # The LIKE pattern and the limit are bound as parameters\n",
    "    sql = \"SELECT id, text FROM sections WHERE id LIKE ? LIMIT ?\"\n",
    "\n",
    "    for query in queries:\n",
    "        # Extract a keyword for this search\n",
    "        query = keyword(query)\n",
    "\n",
    "        # Run the SQL Query\n",
    "        results.append([\n",
    "            {\"id\": uid, \"text\": text, \"score\": 1.0}\n",
    "            for uid, text in connection.execute(sql, [f\"%{query}%\", limit])\n",
    "        ])\n",
    "\n",
    "    return results\n",
    "\n",
    "# Open a connection to the database\n",
    "connection = sqlite3.connect(\"documents\")\n",
    "\n",
    "# Load the LLM\n",
    "llm = LLM(\"Qwen/Qwen3-4B-Instruct-2507\")\n",
    "\n",
    "# RAG with a SQL query\n",
    "rag = RAG(sqlsearch, llm, template=template, output=\"flatten\")\n",
    "print(rag(\"Tell me what happened in the 2025 World Series\", maxlength=2048))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a9f78076",
   "metadata": {},
   "source": [
    "As the output shows, this answer uses the SQL database!"
   ]
  },
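  {
   "cell_type": "markdown",
   "id": "sql-like-sketch-md",
   "metadata": {},
   "source": [
    "The `sqlsearch` function above binds both the `LIKE` pattern and the `LIMIT` as query parameters. Here's a minimal, self-contained sketch of that pattern against an in-memory SQLite database. The table reuses the `sections` name and the `id` / `text` columns from the query above, but the schema is an assumption and the rows are made-up placeholders, not data from txtai-wikipedia-slim."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sql-like-sketch-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import sqlite3\n",
    "\n",
    "# In-memory database with placeholder rows (not from txtai-wikipedia-slim)\n",
    "memory = sqlite3.connect(\":memory:\")\n",
    "memory.execute(\"CREATE TABLE sections (id TEXT PRIMARY KEY, text TEXT)\")\n",
    "memory.executemany(\"INSERT INTO sections VALUES (?, ?)\", [\n",
    "    (\"2025 World Series\", \"Placeholder abstract A\"),\n",
    "    (\"World Series\", \"Placeholder abstract B\"),\n",
    "    (\"Vector database\", \"Placeholder abstract C\"),\n",
    "])\n",
    "\n",
    "# The % wildcards go in the bound value, not in the SQL string\n",
    "# SQLite's LIKE is case-insensitive for ASCII characters by default\n",
    "sql = \"SELECT id, text FROM sections WHERE id LIKE ? LIMIT ?\"\n",
    "for uid, text in memory.execute(sql, [\"%world series%\", 5]):\n",
    "    print(uid, text)\n",
    "\n",
    "memory.close()"
   ]
  },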
  {
   "cell_type": "markdown",
   "id": "97b1e282",
   "metadata": {},
   "source": [
    "# Wrapping up\n",
    "\n",
    "This notebook showed that RAG is about much more than vector search. With txtai 9.3+, any callable can be used for retrieval. Enjoy!"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "local",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.19"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}