15_recursive_context.py
1 """ 2 Example 15: Recursive Context Pattern 3 4 Demonstrates the RLM (Recursive Language Model) pattern for handling large context: 5 - Store large documents as artifacts (not in LLM context) 6 - Agent uses tools (grep/head/tail/chunk) to explore artifacts 7 - Dramatically reduces token usage (~2K vs ~10K+) 8 9 This pattern solves "context rot" by treating context as programmatically 10 explorable data instead of dumping everything into the LLM's context window. 11 """ 12 13 import tempfile 14 import os 15 from praisonaiagents import Agent 16 from praisonai.context import FileSystemArtifactStore 17 from praisonai.context.queue import create_artifact_tools 18 from praisonaiagents.context.artifacts import ArtifactMetadata 19 20 21 def create_sample_document(): 22 """Create a sample large document for testing.""" 23 # Simulate a research paper / large document 24 content = """ 25 # Attention Is All You Need 26 27 ## Abstract 28 29 The dominant sequence transduction models are based on complex recurrent or 30 convolutional neural networks that include an encoder and a decoder. The best 31 performing models also connect the encoder and decoder through an attention 32 mechanism. We propose a new simple network architecture, the Transformer, 33 based solely on attention mechanisms, dispensing with recurrence and convolutions 34 entirely. 35 36 ## Introduction 37 38 Recurrent neural networks, long short-term memory and gated recurrent neural 39 networks in particular, have been firmly established as state of the art 40 approaches in sequence modeling and transduction problems such as language 41 modeling and machine translation. 42 43 ## Background 44 45 The goal of reducing sequential computation also forms the foundation of the 46 Extended Neural GPU, ByteNet and ConvS2S, all of which use convolutional neural 47 networks as basic building block. 48 49 ## Model Architecture 50 51 ### Encoder and Decoder Stacks 52 53 The encoder is composed of a stack of N = 6 identical layers. Each layer has 54 two sub-layers. The first is a multi-head self-attention mechanism, and the 55 second is a simple, position-wise fully connected feed-forward network. 56 57 ### Attention 58 59 An attention function can be described as mapping a query and a set of key-value 60 pairs to an output, where the query, keys, values, and output are all vectors. 61 62 #### Scaled Dot-Product Attention 63 64 We call our particular attention "Scaled Dot-Product Attention". The input 65 consists of queries and keys of dimension dk, and values of dimension dv. 66 67 The formula is: Attention(Q, K, V) = softmax(QK^T / sqrt(d_k))V 68 69 ### Multi-Head Attention 70 71 Multi-head attention allows the model to jointly attend to information from 72 different representation subspaces at different positions. 73 74 ## Results 75 76 ### Machine Translation 77 78 On the WMT 2014 English-to-German translation task, the big transformer model 79 outperforms the best previously reported models including ensembles. 80 81 The model achieved a BLEU score of 28.4 on the WMT 2014 English-to-German 82 translation task, improving over the existing best results by over 2 BLEU. 83 84 On the WMT 2014 English-to-French translation task, our big model achieves 85 a BLEU score of 41.8, outperforming all published single models. 86 87 ### Training 88 89 Training took 3.5 days on 8 P100 GPUs. The big model was trained for 300,000 90 steps (3.5 days) at a cost of approximately $1000 in cloud compute. 91 92 ## Model Variations 93 94 | Model | Parameters | BLEU EN-DE | BLEU EN-FR | 95 |-------|------------|------------|------------| 96 | Base | 65M | 27.3 | 38.1 | 97 | Big | 213M | 28.4 | 41.8 | 98 99 The base model has 65 million parameters (65 × 10^6). 100 The big model has 213 million parameters. 101 102 ## Conclusion 103 104 In this work, we presented the Transformer, the first sequence transduction 105 model based entirely on attention, replacing the recurrent layers most commonly 106 used in encoder-decoder architectures with multi-headed self-attention. 107 108 The Transformer can be trained significantly faster than architectures based 109 on recurrent or convolutional layers. 110 111 ## Authors 112 113 Ashish Vaswani (Google Brain) 114 Noam Shazeer (Google Brain) 115 Niki Parmar (Google Research) 116 Jakob Uszkoreit (Google Research) 117 Llion Jones (Google Research) 118 Aidan N. Gomez (University of Toronto) 119 Łukasz Kaiser (Google Brain) 120 Illia Polosukhin 121 122 ## References 123 124 [1] Neural Machine Translation by Jointly Learning to Align and Translate 125 [2] Sequence to Sequence Learning with Neural Networks 126 [3] Learning Phrase Representations using RNN Encoder-Decoder 127 """ 128 # Repeat content to make it larger 129 full_content = (content + "\n\n") * 10 # ~35KB 130 return full_content 131 132 133 def main(): 134 print("=" * 70) 135 print("Recursive Context Pattern - Token Efficiency Demo") 136 print("=" * 70) 137 138 # Track token usage across methods 139 token_stats = { 140 "traditional": {"input": 0, "output": 0}, 141 "recursive": {"input": 0, "output": 0}, 142 } 143 144 with tempfile.TemporaryDirectory() as tmpdir: 145 # 1. Create artifact store and tools 146 store = FileSystemArtifactStore(base_dir=tmpdir) 147 artifact_tools = create_artifact_tools(store=store) 148 149 print(f"\n1. Created FileSystemArtifactStore") 150 print(f" Location: {tmpdir}") 151 print(f" Tools available: {[t.__name__ for t in artifact_tools]}") 152 153 # 2. Create sample document and store as artifact 154 document = create_sample_document() 155 doc_size = len(document) 156 doc_tokens = doc_size // 4 # Rough estimate: 4 chars per token 157 158 metadata = ArtifactMetadata( 159 agent_id="research_agent", 160 run_id="demo_run", 161 tool_name="document_loader", 162 turn_id=1 163 ) 164 165 ref = store.store(document, metadata) 166 167 print(f"\n2. Stored document as artifact") 168 print(f" Size: {doc_size:,} chars (~{doc_tokens:,} tokens)") 169 print(f" Path: {ref.path}") 170 print(f" Summary: {ref.summary[:80]}...") 171 172 # 3. Create agent WITH artifact tools (Recursive Context approach) 173 agent = Agent( 174 instructions="""You are a research paper analyst with access to artifact exploration tools. 175 When given an artifact path, use the tools to explore it efficiently: 176 - artifact_grep: Search for patterns (USE THIS FIRST) 177 - artifact_head: See first N lines 178 - artifact_tail: See last N lines 179 - artifact_chunk: Get specific line ranges 180 181 Be precise and cite line numbers when finding information.""", 182 tools=artifact_tools, 183 output='silent' # Quiet output for cleaner demo 184 ) 185 186 print(f"\n3. Created agent with {len(agent.tools)} artifact tools") 187 188 # 4. Query using RECURSIVE CONTEXT (small prompt + tools) 189 print(f"\n4. RECURSIVE CONTEXT APPROACH") 190 print("-" * 40) 191 192 prompt = f"""I have a research paper stored at: {ref.path} 193 194 Find the BLEU score on the WMT 2014 English-to-German translation task. 195 Use artifact_grep to search, then report the answer with line number.""" 196 197 prompt_tokens = len(prompt) // 4 198 print(f" Prompt size: {len(prompt)} chars (~{prompt_tokens} tokens)") 199 print(f" Document NOT in prompt context!") 200 201 response = agent.chat(prompt) 202 203 # Estimate output tokens (could use actual LiteLLM response if available) 204 output_tokens = len(response) // 4 205 206 # Calculate recursive approach tokens (prompt + tool calls + response) 207 # Tool call: ~200 tokens for tool desc + call 208 # Tool response: ~300 tokens for grep results 209 estimated_total = prompt_tokens + 200 + 300 + output_tokens 210 token_stats["recursive"]["input"] = estimated_total 211 212 print(f"\n Agent Response:") 213 print(f" {response[:200]}...") 214 print(f"\n Estimated tokens used: ~{estimated_total:,}") 215 216 # 5. Compare with TRADITIONAL approach (full context) 217 print(f"\n5. TRADITIONAL APPROACH (for comparison)") 218 print("-" * 40) 219 220 traditional_prompt_tokens = prompt_tokens + doc_tokens 221 token_stats["traditional"]["input"] = traditional_prompt_tokens 222 223 print(f" If we passed FULL document in prompt:") 224 print(f" Prompt: {prompt_tokens} + Document: {doc_tokens} = {traditional_prompt_tokens:,} tokens") 225 226 # 6. Show token savings 227 print(f"\n6. TOKEN COMPARISON") 228 print("=" * 40) 229 230 savings = token_stats["traditional"]["input"] - token_stats["recursive"]["input"] 231 savings_pct = (savings / token_stats["traditional"]["input"]) * 100 232 233 print(f" Traditional approach: ~{token_stats['traditional']['input']:,} tokens") 234 print(f" Recursive approach: ~{token_stats['recursive']['input']:,} tokens") 235 print(f" -----------------------------------") 236 print(f" SAVINGS: ~{savings:,} tokens ({savings_pct:.0f}%)") 237 238 # 7. Demonstrate other artifact operations 239 print(f"\n7. ARTIFACT OPERATIONS DEMO") 240 print("-" * 40) 241 242 # head 243 head_result = store.head(ref, lines=5) 244 print(f" head(5 lines):") 245 for line in head_result.split('\n')[:3]: 246 print(f" {line[:60]}") 247 248 # grep 249 grep_results = store.grep(ref, pattern=r"BLEU.*\d+", max_matches=3) 250 print(f"\n grep('BLEU.*\\d+'):") 251 for match in grep_results[:2]: 252 print(f" Line {match.line_number}: {match.line_content[:50].strip()}...") 253 254 # chunk 255 chunk_result = store.chunk(ref, start_line=50, end_line=55) 256 print(f"\n chunk(lines 50-55):") 257 for line in chunk_result.split('\n')[:3]: 258 print(f" {line[:60]}") 259 260 print("\n" + "=" * 70) 261 print("CONCLUSION: Recursive Context Pattern") 262 print("=" * 70) 263 print(""" 264 The Recursive Context pattern (RLM) provides: 265 ✓ Token efficiency: ~{savings_pct:.0f}% reduction in token usage 266 ✓ Scalability: Handle documents of any size 267 ✓ Precision: Agent searches for exactly what it needs 268 ✓ Cost savings: Lower API costs due to fewer tokens 269 ✓ Better accuracy: Avoids "context rot" from large contexts 270 """.format(savings_pct=savings_pct)) 271 272 273 if __name__ == "__main__": 274 main()