# examples/context/15_recursive_context.py
"""
Example 15: Recursive Context Pattern

Demonstrates the RLM (Recursive Language Model) pattern for handling large context:
- Store large documents as artifacts (not in LLM context)
- Agent uses tools (grep/head/tail/chunk) to explore artifacts
- Dramatically reduces token usage (~2K vs ~10K+)

This pattern solves "context rot" by treating context as programmatically
explorable data instead of dumping everything into the LLM's context window.
"""
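
# The minimal flow, sketched here with the same names used below
# (a comment-only sketch, not additional executable setup):
#   store = FileSystemArtifactStore(base_dir=...)
#   ref = store.store(large_text, metadata)      # the document lives on disk
#   agent = Agent(instructions=..., tools=create_artifact_tools(store=store))
#   agent.chat(f"Explore the artifact at {ref.path} ...")  # only the path enters the prompt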

import tempfile

from praisonaiagents import Agent
from praisonai.context import FileSystemArtifactStore
from praisonai.context.queue import create_artifact_tools
from praisonaiagents.context.artifacts import ArtifactMetadata


def create_sample_document():
    """Create a sample large document for testing."""
    # Simulate a research paper / large document
    content = """
# Attention Is All You Need

## Abstract

The dominant sequence transduction models are based on complex recurrent or
convolutional neural networks that include an encoder and a decoder. The best
performing models also connect the encoder and decoder through an attention
mechanism. We propose a new simple network architecture, the Transformer,
based solely on attention mechanisms, dispensing with recurrence and convolutions
entirely.

## Introduction

Recurrent neural networks, long short-term memory and gated recurrent neural
networks in particular, have been firmly established as state of the art
approaches in sequence modeling and transduction problems such as language
modeling and machine translation.

## Background

The goal of reducing sequential computation also forms the foundation of the
Extended Neural GPU, ByteNet and ConvS2S, all of which use convolutional neural
networks as basic building block.

## Model Architecture

### Encoder and Decoder Stacks

The encoder is composed of a stack of N = 6 identical layers. Each layer has
two sub-layers. The first is a multi-head self-attention mechanism, and the
second is a simple, position-wise fully connected feed-forward network.

### Attention

An attention function can be described as mapping a query and a set of key-value
pairs to an output, where the query, keys, values, and output are all vectors.

#### Scaled Dot-Product Attention

We call our particular attention "Scaled Dot-Product Attention". The input
consists of queries and keys of dimension d_k, and values of dimension d_v.

The formula is: Attention(Q, K, V) = softmax(QK^T / sqrt(d_k))V

### Multi-Head Attention

Multi-head attention allows the model to jointly attend to information from
different representation subspaces at different positions.

## Results

### Machine Translation

On the WMT 2014 English-to-German translation task, the big transformer model
outperforms the best previously reported models including ensembles.

The model achieved a BLEU score of 28.4 on the WMT 2014 English-to-German
translation task, improving over the existing best results by over 2 BLEU.

On the WMT 2014 English-to-French translation task, our big model achieves
a BLEU score of 41.8, outperforming all published single models.

### Training

Training took 3.5 days on 8 P100 GPUs. The big model was trained for 300,000
steps (3.5 days) at a cost of approximately $1000 in cloud compute.

## Model Variations

| Model | Parameters | BLEU EN-DE | BLEU EN-FR |
|-------|------------|------------|------------|
| Base  | 65M        | 27.3       | 38.1       |
| Big   | 213M       | 28.4       | 41.8       |

The base model has 65 million parameters (65 × 10^6).
The big model has 213 million parameters.

## Conclusion

In this work, we presented the Transformer, the first sequence transduction
model based entirely on attention, replacing the recurrent layers most commonly
used in encoder-decoder architectures with multi-headed self-attention.

The Transformer can be trained significantly faster than architectures based
on recurrent or convolutional layers.

## Authors

Ashish Vaswani (Google Brain)
Noam Shazeer (Google Brain)
Niki Parmar (Google Research)
Jakob Uszkoreit (Google Research)
Llion Jones (Google Research)
Aidan N. Gomez (University of Toronto)
Łukasz Kaiser (Google Brain)
Illia Polosukhin

## References

[1] Neural Machine Translation by Jointly Learning to Align and Translate
[2] Sequence to Sequence Learning with Neural Networks
[3] Learning Phrase Representations using RNN Encoder-Decoder
"""
    # Repeat the content to make it larger (~35KB total)
    full_content = (content + "\n\n") * 10
    return full_content


def main():
    print("=" * 70)
    print("Recursive Context Pattern - Token Efficiency Demo")
    print("=" * 70)

    # Track estimated token usage for each approach
    token_stats = {"traditional": 0, "recursive": 0}

    with tempfile.TemporaryDirectory() as tmpdir:
        # 1. Create the artifact store and its exploration tools
        store = FileSystemArtifactStore(base_dir=tmpdir)
        artifact_tools = create_artifact_tools(store=store)

        print("\n1. Created FileSystemArtifactStore")
        print(f"   Location: {tmpdir}")
        print(f"   Tools available: {[t.__name__ for t in artifact_tools]}")

        # 2. Create a sample document and store it as an artifact
        document = create_sample_document()
        doc_size = len(document)
        doc_tokens = doc_size // 4  # Rough estimate: 4 chars per token
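        # For an exact count you could tokenize instead of using the
        # 4-chars/token heuristic, e.g. with the optional tiktoken package
        # (an illustration, not required by this demo):
        #   import tiktoken
        #   doc_tokens = len(tiktoken.get_encoding("cl100k_base").encode(document))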

        metadata = ArtifactMetadata(
            agent_id="research_agent",
            run_id="demo_run",
            tool_name="document_loader",
            turn_id=1
        )

        ref = store.store(document, metadata)

        print("\n2. Stored document as artifact")
        print(f"   Size: {doc_size:,} chars (~{doc_tokens:,} tokens)")
        print(f"   Path: {ref.path}")
        print(f"   Summary: {ref.summary[:80]}...")

        # 3. Create an agent WITH artifact tools (the recursive context approach)
        agent = Agent(
            instructions="""You are a research paper analyst with access to artifact exploration tools.
When given an artifact path, use the tools to explore it efficiently:
- artifact_grep: search for patterns (USE THIS FIRST)
- artifact_head: see the first N lines
- artifact_tail: see the last N lines
- artifact_chunk: get a specific range of lines

Be precise and cite line numbers when reporting what you find.""",
            tools=artifact_tools,
            output='silent'  # Quiet output for a cleaner demo
        )

        print(f"\n3. Created agent with {len(agent.tools)} artifact tools")

        # 4. Query using RECURSIVE CONTEXT (small prompt + tools)
        print("\n4. RECURSIVE CONTEXT APPROACH")
        print("-" * 40)

        prompt = f"""I have a research paper stored at: {ref.path}

Find the BLEU score on the WMT 2014 English-to-German translation task.
Use artifact_grep to search, then report the answer with the line number."""

        prompt_tokens = len(prompt) // 4
        print(f"   Prompt size: {len(prompt)} chars (~{prompt_tokens} tokens)")
        print("   Document NOT in prompt context!")

        response = agent.chat(prompt)

        # Estimate output tokens (actual usage from the LiteLLM response could be used if available)
        output_tokens = len(response) // 4

        # Estimate the recursive approach's total (prompt + tool calls + response):
        # ~200 tokens for the tool description + call, ~300 for the grep results
        estimated_total = prompt_tokens + 200 + 300 + output_tokens
        token_stats["recursive"] = estimated_total

        print("\n   Agent Response:")
        print(f"   {response[:200]}...")
        print(f"\n   Estimated tokens used: ~{estimated_total:,}")

        # 5. Compare with the TRADITIONAL approach (full document in the prompt)
        print("\n5. TRADITIONAL APPROACH (for comparison)")
        print("-" * 40)

        traditional_prompt_tokens = prompt_tokens + doc_tokens
        token_stats["traditional"] = traditional_prompt_tokens

        print("   If we passed the FULL document in the prompt:")
        print(f"   Prompt: {prompt_tokens} + Document: {doc_tokens} = {traditional_prompt_tokens:,} tokens")

        # 6. Show the token savings
        print("\n6. TOKEN COMPARISON")
        print("=" * 40)

        savings = token_stats["traditional"] - token_stats["recursive"]
        savings_pct = (savings / token_stats["traditional"]) * 100

        print(f"   Traditional approach: ~{token_stats['traditional']:,} tokens")
        print(f"   Recursive approach:   ~{token_stats['recursive']:,} tokens")
        print("   -----------------------------------")
        print(f"   SAVINGS:             ~{savings:,} tokens ({savings_pct:.0f}%)")
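        # With the ~35 KB sample document (~8,700 tokens at 4 chars/token), the
        # savings typically land around 90%; the exact figure depends on the
        # prompt and response lengths.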

        # 7. Demonstrate the other artifact operations directly
        print("\n7. ARTIFACT OPERATIONS DEMO")
        print("-" * 40)

        # head
        head_result = store.head(ref, lines=5)
        print("   head(5 lines):")
        for line in head_result.split('\n')[:3]:  # show the first 3 for brevity
            print(f"      {line[:60]}")

        # grep
        grep_results = store.grep(ref, pattern=r"BLEU.*\d+", max_matches=3)
        print("\n   grep('BLEU.*\\d+'):")
        for match in grep_results[:2]:  # show the first 2 of up to 3 matches
            print(f"      Line {match.line_number}: {match.line_content[:50].strip()}...")

        # chunk
        chunk_result = store.chunk(ref, start_line=50, end_line=55)
        print("\n   chunk(lines 50-55):")
        for line in chunk_result.split('\n')[:3]:  # again truncated for brevity
            print(f"      {line[:60]}")
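
        # tail -- shown for symmetry with the artifact_tail tool listed above;
        # this sketch assumes the store exposes tail() with the same
        # signature as head()
        tail_result = store.tail(ref, lines=5)
        print("\n   tail(5 lines):")
        for line in tail_result.split('\n')[:3]:
            print(f"      {line[:60]}")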

        print("\n" + "=" * 70)
        print("CONCLUSION: Recursive Context Pattern")
        print("=" * 70)
        print(f"""
The Recursive Context pattern (RLM) provides:
✓ Token efficiency: ~{savings_pct:.0f}% reduction in token usage
✓ Scalability: Handle documents of any size
✓ Precision: Agent searches for exactly what it needs
✓ Cost savings: Lower API costs due to fewer tokens
✓ Better accuracy: Avoids "context rot" from large contexts
""")


if __name__ == "__main__":
    main()