# Oxyjen v0.2 Documentation

Oxyjen v0.2 introduces the **first stable execution layer** of the framework.

The focus of this release is correctness, determinism, and clean boundaries:
- Memory-aware graph nodes
- Deterministic retry & fallback execution
- Clear separation between policy and enforcement
- A minimal but stable public LLM API

This document describes **what v0.2 adds**, how the pieces fit together, and how to use them.

---

## Table of Contents
1. [What's New](#whats-new-in-v02)
2. [What's NOT Included](#explicitly-not-included)
3. [Core Concepts](#memory--nodecontext-core-concept)
4. [Public LLM API](#public-llm-api)
5. [LLMNode](#llmnode-graph-primitive)
6. [LLMChain](#llmchain-retry--fallback-execution)
7. [Exception Model](#exception-model-execution-semantics)
8. [OpenAI Transport](#transportopenai)
9. [Stability Guarantees](#stability-guarantees)

---

## What's New in v0.2

### Core additions
- **`Memory`** abstraction with ordered history
- **`NodeContext`** for stateful graph execution
- **`LLMNode`** as the primary graph primitive
- **`LLMChain`** for retry + fallback execution
- **Public `LLM` API** (`of`, `profile`, `chain`)
- **Explicit exception taxonomy** for deterministic retry behavior
- **OpenAI transport layer** (`transport/openai`)

---

## Explicitly NOT Included

The following features are **intentionally deferred** to v0.3+:

### Timeout Enforcement
- Timeout is **policy only** in v0.2
- `LLMChain.builder().timeout(Duration.ofSeconds(5))` sets intent but **does not enforce**
- No timeout exceptions thrown
- No retries triggered by timeout
- **Enforcement planned for v0.3**

### Model Registry
- No user-facing model registry in v0.2 (only a minimal internal one; see [transport/openai](#transportopenai))
- Model validation happens at runtime via the OpenAI API
- Unknown models fail on the first API call
- **Planned for v0.3**: `Models.register()` and pre-validation

### Conversation Replay
- Memory stores full conversation history
- **BUT**: LLMs are stateless and don't automatically use history
- Each `chat()` call is independent
- Users must manually build conversation context (see [Memory Limitations](#memory-limitations-v02))
- **Planned for v0.3**: Automatic conversation replay via `ChatMemory`

### Additional Features (v0.3+)
- DAG execution
- Concurrency
- Streaming
- Automatic OpenAI smoke tests (manual only)
- Token counting
- Conversation summarization

---

## Memory & NodeContext (Core Concept)

### Memory

`Memory` is a **state container** with two responsibilities:
1. **Key–value storage** for arbitrary data
2. **Ordered history** of events/messages

```java
Memory memory = new InMemoryMemory("chat");

// Key-value storage
memory.put("count", 42);
int count = memory.get("count", Integer.class);

// Ordered history
memory.append("user", "hello");
memory.append("assistant", "hi");

List<MemoryEntry> history = memory.entries();
```

**Key properties:**
- Thread-safe
- Ordered history (insertion order preserved)
- Type-safe retrieval
- History is immutable from the outside
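The key–value side accepts any object. A minimal sketch of typed storage and retrieval, where `UserProfile` is purely an illustrative type (not part of Oxyjen) and `get(key, Class)` behaves as in the example above:

```java
// Illustrative only: any user-defined type can be stored and read back type-safely
record UserProfile(String name, int age) {}

Memory memory = new InMemoryMemory("chat");
memory.put("profile", new UserProfile("Alice", 30));

// The class token tells Memory which type to return
UserProfile profile = memory.get("profile", UserProfile.class);
System.out.println(profile.name());   // Alice
```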
---

### NodeContext

`NodeContext` owns memory across executions.

```java
NodeContext ctx = new NodeContext();

Memory chat = ctx.memory("chat");
Memory system = ctx.memory("system");
```

**Important guarantees:**
- Same memory name → same instance
- Different names → isolated memory
- Memory survives across multiple node executions
- State lives in the context, **not inside nodes**

---

### Memory Limitations (v0.2)

**Critical Understanding:**

**Memory stores history, but LLMs don't automatically use it.**

```java
NodeContext ctx = new NodeContext();
LLMNode node = LLMNode.builder()
    .model("gpt-4o")
    .memory("chat")
    .build();

// Turn 1
String r1 = node.process("My name is Alice", ctx);
// Memory now has: [user: "My name is Alice", assistant: "..."]

// Turn 2
String r2 = node.process("What's my name?", ctx);
// MODEL WON'T REMEMBER! (in v0.2)
// The LLM receives ONLY "What's my name?" as a fresh prompt
```

**Why?**
- Memory **stores** the conversation
- LLMNode **appends** to memory
- But the underlying `ChatModel` is **stateless**
- Each `chat()` call is independent

**Workaround for v0.2:**
Users must manually build conversation context:

```java
Memory memory = ctx.memory("chat");
ChatModel model = LLM.of("gpt-4o");   // any ChatModel works here

StringBuilder prompt = new StringBuilder();
for (MemoryEntry entry : memory.entries()) {
    prompt.append(entry.type()).append(": ")
          .append(entry.value()).append("\n");
}

String response = model.chat(prompt.toString());
```

**v0.3 Solution:**
`ChatMemory` will automatically replay conversation history to the LLM.

---

## Public LLM API

### ChatModel (root abstraction)

```java
public interface ChatModel {
    String chat(String input);
}
```

Everything in Oxyjen depends **only** on `ChatModel`.

This allows:
- Real models (OpenAI)
- Fake models (tests)
- Chains (retry + fallback)
- Future providers (Anthropic, local models)

---

### LLM.of(...)

Create a model by name:

```java
ChatModel model = LLM.of("gpt-4o");
String out = model.chat("hello");
```

**Validation:**
- Unknown model → `IllegalArgumentException`
- Null / empty → `IllegalArgumentException`

---

### LLM.profile(...)

Profiles map use cases to models.

**Default profiles:**
- `fast` → `gpt-4o-mini`
- `cheap` → `gpt-3.5-turbo`
- `smart` → `gpt-4o`

```java
ChatModel model = LLM.profile("fast");
```

**Note:** Profiles are runtime configuration, not execution logic.

---

## LLMNode (Graph Primitive)

`LLMNode` is where Oxyjen differs from LangChain.

**It is:**
- Memory-aware
- Context-driven
- Stateless itself (state lives in `NodeContext`)

```java
LLMNode node = LLMNode.builder()
    .model("gpt-4o")
    .memory("chat")
    .build();

NodeContext ctx = new NodeContext();
String out = node.process("hello", ctx);
```

### What happens internally:

1. User input appended to memory
2. Model invoked (stateless!)
3. Assistant response appended to memory
4. Output returned
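Conceptually, one `process` call boils down to the following sketch. This is **not** the actual `LLMNode` source, just the four steps above expressed with the `NodeContext`, `Memory`, and `ChatModel` APIs from this document; `model` and `memoryName` stand for the values given to the builder:

```java
// Illustrative sketch only - the real LLMNode may differ in details
String process(String input, NodeContext ctx) {
    Memory memory = ctx.memory(memoryName);   // same name -> same Memory instance

    memory.append("user", input);             // 1. user input appended to memory
    String output = model.chat(input);        // 2. stateless model call (sees input only, no history)
    memory.append("assistant", output);       // 3. assistant response appended to memory

    return output;                            // 4. output returned
}
```

Because the model is handed only `input`, earlier turns are not replayed; this is exactly the limitation described in [Memory Limitations](#memory-limitations-v02).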
### Memory after two runs:

With a fake echo model (the kind used in tests), memory contains:

```
user → hello
assistant → echo:hello
user → world
assistant → echo:world
```

**This shows nodes are stateful through context, not internally.**

---

## LLMChain (Retry + Fallback Execution)

`LLMChain` composes multiple `ChatModel`s for resilience.

```java
ChatModel chain = LLMChain.builder()
    .primary("gpt-4o")
    .fallback("gpt-3.5-turbo")
    .retry(3)
    .build();
```

### Execution guarantees:

1. **Retries happen per model** (3 attempts on `gpt-4o`, then 3 on `gpt-3.5-turbo`)
2. **Fallback occurs after retries are exhausted**
3. **First successful response short-circuits execution**
4. **Final failure throws `LLMException`**

---

### Retry Semantics

Retries happen **only** for transient errors:

| Error Type | Retryable? |
|------------|------------|
| `RateLimitException` | Yes |
| `NetworkException` | Yes |
| `TimeoutException` (v0.3) | Yes |
| `InvalidAPIKeyException` | No |
| `ModelNotFoundException` | No |
| `TokenLimitExceededException` | No |

---

### Backoff

Two strategies:

1. **Fixed backoff**: 1s between retries
2. **Exponential backoff**: 1s, 2s, 4s, 8s...

```java
LLMChain.builder()
    .primary("gpt-4o")
    .retry(3)
    .exponentialBackoff()   // or fixedBackoff()
    .build();
```

**Note:** Backoff affects retry **delay** only, not execution order.

---

### Timeout (v0.2 - Policy Only)

```java
ChatModel chain = LLMChain.builder()
    .primary("gpt-4o")
    .timeout(Duration.ofSeconds(5))   // recorded as policy, not enforced in v0.2
    .build();
```

**In v0.2:**
- Timeout is **policy only**
- ❌ No enforcement
- ❌ No exceptions
- ❌ No retries triggered

**Enforcement is planned for v0.3.**

---

## Exception Model (Execution Semantics)

Oxyjen uses **explicit exception types** to drive execution:

| Exception | Meaning | Retry? |
|-----------|---------|--------|
| `InvalidAPIKeyException` | Auth failure | No |
| `ModelNotFoundException` | Model invalid | No |
| `TokenLimitExceededException` | Prompt too large | No |
| `RateLimitException` | Transient quota | Yes |
| `NetworkException` | Server failure | Yes |
| `TimeoutException` (v0.3) | Budget exceeded | Yes |

This makes retry & fallback **deterministic and testable**.

---

## transport/openai

### Models Registry (limited in v0.2)

```java
Models.isSupported("gpt-4o");   // true
Models.isSupported("foo");      // false
```

- A minimal internal model registry (`Models`) exists
- Used for validation and error classification
- Not user-extensible in v0.2
- No dynamic registration
- **Planned for v0.3**: `Models.register()` and preflight validation

Note: Model metadata is internal in v0.2 and not part of the public API.

---

### OpenAIClient

**Responsible for:**
1. HTTP request construction
2. JSON parsing (minimal, v0.2)
3. Error classification

**Example error mapping:**
- `401` → `InvalidAPIKeyException`
- `429` → `RateLimitException`
- `5xx` → `NetworkException`
- `400` + "context length" → `TokenLimitExceededException`
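A rough sketch of that classification; the method name `toException`, the exception constructors, and the catch-all branch are illustrative assumptions, not the actual `OpenAIClient` code:

```java
// Illustrative only: map an HTTP response onto the exception taxonomy above.
// Constructor signatures and the final catch-all are assumptions for this sketch.
static Exception toException(int status, String body) {
    if (status == 401) {
        return new InvalidAPIKeyException("OpenAI rejected the API key");
    }
    if (status == 429) {
        return new RateLimitException("rate limited (retryable)");
    }
    if (status >= 500) {
        return new NetworkException("server error: HTTP " + status);
    }
    if (status == 400 && body.contains("context length")) {
        return new TokenLimitExceededException("prompt exceeds the model's context window");
    }
    return new NetworkException("unexpected response: HTTP " + status);
}
```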
---

### OpenAIChatModel

Wraps `OpenAIClient` behind `ChatModel`.

```java
String apiKey = System.getenv("OPENAI_API_KEY");   // or any other key source

ChatModel model = LLM.openai("gpt-4o", apiKey);
String out = model.chat("hello");
```

**This allows OpenAI to plug cleanly into:**
- `LLMNode`
- `LLMChain`
- Future graph executors

---

## Stability Guarantees

For **v0.2.x**:
- Public APIs are frozen
- Behavior is deterministic
- Exception model is stable
- Breaking changes only in v0.3

---

## Quick Start Example

```java
import io.oxyjen.core.*;
import io.oxyjen.llm.*;

public class QuickStart {
    public static void main(String[] args) {
        // 1. Build a resilient chain
        ChatModel chain = LLMChain.builder()
            .primary("gpt-4o")
            .fallback("gpt-4o-mini")
            .retry(3)
            .build();

        // 2. Create a graph node
        LLMNode node = LLMNode.builder()
            .model(chain)
            .memory("conversation")
            .build();

        // 3. Execute with context
        NodeContext ctx = new NodeContext();
        String response = node.process("Explain quantum computing", ctx);

        System.out.println(response);

        // 4. Memory persists across calls
        Memory memory = ctx.memory("conversation");
        System.out.println("History size: " + memory.entries().size());
    }
}
```

---

## What's Next?

### v0.3 Roadmap
- Timeout enforcement
- Automatic conversation replay (`ChatMemory`)
- Model registry with pre-validation
- Token counting and management
- Conversation summarization
- Streaming support

---

## Feedback

Found a bug? Have a feature request?

- Open an issue on GitHub
- Join the discussion in Discussions
- Star the repo if you find Oxyjen useful!

---

**Oxyjen v0.2** - Simple. Deterministic. Production-ready graphs for AI.