# DriftKit Common Module

## Overview

The `driftkit-common` module serves as the foundational layer for the DriftKit AI ETL framework, providing shared domain objects, utilities, and core services that other modules depend on. This module contains the essential building blocks for AI-powered applications, including chat management, document processing, text analysis, and model integration.

## Spring Boot Initialization

The common module doesn't require special Spring Boot configuration, as it provides only domain objects and utilities. Simply include it as a dependency:

```java
@SpringBootApplication
public class YourApplication {
    public static void main(String[] args) {
        SpringApplication.run(YourApplication.class, args);
    }
}
```

The module provides:
- **Domain objects**: pure POJOs with no Spring annotations
- **Services**: stateless utility services that can be instantiated directly
- **Configuration**: `EtlConfig` can be bound with `@ConfigurationProperties`

## Architecture

### Module Structure

```
driftkit-common/
├── src/main/java/ai/driftkit/
│   ├── common/
│   │   ├── domain/     # Core domain objects
│   │   ├── service/    # Core services
│   │   └── utils/      # Utility classes
│   └── config/         # Configuration classes
```

### Key Dependencies

- **Lombok** - Code generation and boilerplate reduction
- **Jackson** - JSON serialization with JSR-310 support
- **Apache Commons Lang3** - String and general utilities
- **Apache Commons Collections4** - Enhanced collection operations
- **SLF4J** - Logging facade
- **Jakarta Validation** - Bean validation annotations

## Core Domain Objects

### Chat Management

#### Chat
Represents a chat session with comprehensive metadata:

```java
@Data
@Builder
public class Chat {
    private String id;
    private String name;
    private String systemMessage;
    private Language language;
    private Integer memoryLength;
    private ModelRole modelRole;
    private boolean hidden;
    private LocalDateTime createdAt;
    private LocalDateTime updatedAt;
}
```

**Key Features:**
- Unique identification and naming
- System message configuration for AI behavior
- Language specification (GENERAL, SPANISH, ENGLISH)
- Memory length control for conversation context
- Model role assignment (MAIN, ABTEST, CHECKER, NONE)
- Visibility controls with hidden flag
- Automatic timestamp tracking

**Usage Example:**
```java
Chat chat = Chat.builder()
    .id("chat-123")
    .name("Customer Support Session")
    .systemMessage("You are a helpful customer support agent")
    .language(Language.ENGLISH)
    .memoryLength(50)
    .modelRole(ModelRole.MAIN)
    .hidden(false)
    .createdAt(LocalDateTime.now())
    .build();
```

#### Message
Comprehensive message representation supporting multi-modal content:

```java
@Data
@Builder
public class Message implements ChatItem {
    private String id;
    private String chatId;
    private String parentId;
    private String content;
    private ChatMessageType messageType;
    private MessageType contentType;
    private Grade grade;
    private Map<String, Object> workflowContext;
    private LogProbs logProbs;
    private LocalDateTime requestInitTime;
    private LocalDateTime responseTime;
    private Map<String, Object> variables;
    private List<String> imageUrls;
    private String audioUrl;
    private String videoUrl;
    private String fileUrl;
}
```

**Key Features:**
- Multi-modal support (TEXT, IMAGE, AUDIO, VIDEO, FILE)
- Hierarchical message structure with parent-child relationships
- Grading system for quality assessment
- Workflow context integration
- Token log probabilities for advanced analysis
- Performance timing tracking
- Variable and media URL storage

**Usage Example:**
```java
Message message = Message.builder()
    .id("msg-456")
    .chatId("chat-123")
    .content("Hello, how can I help you today?")
    .messageType(ChatMessageType.AI)
    .contentType(MessageType.TEXT)
    .grade(Grade.EXCELLENT)
    .requestInitTime(LocalDateTime.now())
    .build();
```

#### AITask
Comprehensive task representation for AI operations:

```java
@Data
@Builder
public class AITask {
    private String id;
    private String chatId;
    private String workflowId;
    private String prompt;
    private Map<String, Object> variables;
    private List<String> imageUrls;
    private String audioUrl;
    private String videoUrl;
    private String fileUrl;
    private Map<String, Object> workflowContext;
    private Grade grade;
    private LocalDateTime createdAt;
    private LocalDateTime completedAt;
    private String errorMessage;
    private Map<String, Object> metadata;
}
```

**Key Features:**
- Multi-modal input support
- Workflow integration with context preservation
- Variable substitution support
- Performance and error tracking
- Metadata extensibility
- Quality grading system

### Model Client Abstraction

#### ModelClient
Abstract base class for AI model integrations. `ModelClient` provides unified access to different AI capabilities, including text generation, image generation, and function calling.
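As a rough illustration of this abstraction, a capability-aware base class might be sketched as follows. All names here (`ModelClientSketch`, `Capability`, `supports`, the `EchoModelClient` stub) are assumptions for the sketch, not the module's actual API:

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch of a capability-aware model client base class.
// Names are illustrative only; the real ModelClient signatures differ.
abstract class ModelClientSketch {
    enum Capability { TEXT_TO_TEXT, TEXT_TO_IMAGE, IMAGE_TO_TEXT, FUNCTION_CALLING }

    private final Set<Capability> capabilities;

    protected ModelClientSketch(Set<Capability> capabilities) {
        this.capabilities = EnumSet.copyOf(capabilities);
    }

    // Callers can check a capability before issuing a request
    public boolean supports(Capability capability) {
        return capabilities.contains(capability);
    }

    // Subclasses implement the actual model call
    public abstract String generateText(String prompt);
}

// Stub client used only to demonstrate the capability check
class EchoModelClient extends ModelClientSketch {
    EchoModelClient() {
        super(EnumSet.of(Capability.TEXT_TO_TEXT));
    }

    @Override
    public String generateText(String prompt) {
        return "echo: " + prompt;
    }
}
```

A caller would test `supports(Capability.TEXT_TO_TEXT)` before dispatching a request, falling back to another configured client when a capability is missing.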
**Supported Capabilities:**
- `TEXT_TO_TEXT` - Text completion and chat
- `TEXT_TO_IMAGE` - Image generation from text
- `IMAGE_TO_TEXT` - Image analysis and description
- `FUNCTION_CALLING` - Tool use and function execution

**Configuration Parameters:**
- `temperature` - Randomness control (0.0-2.0)
- `top_p` - Nucleus sampling threshold
- `max_tokens` - Maximum response length
- `frequency_penalty` - Repetition penalty
- `presence_penalty` - Topic diversity control
- `stop_sequences` - Generation stopping conditions

**Usage Example:**
```java
ModelTextRequest request = ModelTextRequest.builder()
    .prompt("Explain quantum computing in simple terms")
    .temperature(0.7)
    .maxTokens(500)
    .build();

ModelTextResponse response = modelClient.generateText(request);
System.out.println(response.getContent());
```

## Core Services

### Chat Memory Management

#### ChatMemory
Interface for managing conversation history, with methods to add, retrieve, clear, and filter messages by ID or type.
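The operations used elsewhere in this README (`memory.add(...)`, `memory.messages()`, `memory.findByType(...)`) suggest a contract roughly like the following self-contained sketch; the stub types and the `clear()` method are assumptions for illustration, not the framework's actual definitions:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-in so the sketch compiles on its own; the real ChatItem differs.
interface ChatItemStub { String getContent(); }

// Rough shape of the ChatMemory contract, inferred from usage in this README.
interface ChatMemorySketch {
    void add(ChatItemStub message);
    List<ChatItemStub> messages();
    void clear();
}

// Trivial list-backed implementation, analogous to an in-memory store
class ListChatMemory implements ChatMemorySketch {
    private final List<ChatItemStub> items = new ArrayList<>();

    @Override
    public void add(ChatItemStub message) { items.add(message); }

    @Override
    public List<ChatItemStub> messages() { return List.copyOf(items); }

    @Override
    public void clear() { items.clear(); }
}
```

Implementations differ mainly in their eviction policy; the token-window variant described next layers capacity control on top of this contract.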
#### TokenWindowChatMemory
Advanced memory management with token-based capacity control:

```java
@Builder
public class TokenWindowChatMemory implements ChatMemory {
    private final int maxTokens;
    private final Tokenizer tokenizer;
    private final ChatMemoryStore store;

    @Override
    public void add(ChatItem message) {
        store.add(message);
        ensureTokenLimit();
    }

    private void ensureTokenLimit() {
        // Evict older messages while preserving system messages
        // and maintaining conversation context
    }
}
```

**Key Features:**
- Token-based capacity management
- Automatic message eviction
- System message preservation
- Context-aware pruning
- Multiple tokenizer support

**Configuration Example:**
```java
ChatMemory memory = TokenWindowChatMemory.builder()
    .maxTokens(4000)
    .tokenizer(new SimpleTokenizer())
    .store(new InMemoryChatMemoryStore())
    .build();

// Add messages - automatic pruning when limit exceeded
memory.add(systemMessage);
memory.add(userMessage);
memory.add(aiResponse);
```

#### InMemoryChatMemoryStore
Simple in-memory implementation for development and testing:

```java
public class InMemoryChatMemoryStore implements ChatMemoryStore {
    private final Map<String, List<ChatItem>> conversations = new HashMap<>();

    @Override
    public void add(String chatId, ChatItem message) {
        conversations.computeIfAbsent(chatId, k -> new ArrayList<>()).add(message);
    }

    @Override
    public List<ChatItem> getMessages(String chatId) {
        return conversations.getOrDefault(chatId, new ArrayList<>());
    }
}
```

## Utility Classes

### Document Processing

#### DocumentSplitter
Intelligent document chunking for RAG applications with configurable chunk sizes and overlap.
The splitter uses a sentence-first splitting strategy to maintain context boundaries.

**Key Features:**
- Sentence-first splitting strategy
- Configurable overlap between chunks
- Token count validation
- Oversized content handling
- Context preservation

**Usage Example:**
```java
DocumentSplitter splitter = DocumentSplitter.builder()
    .maxChunkSize(512)
    .overlapSize(50)
    .tokenizer(new SimpleTokenizer())
    .build();

List<String> chunks = splitter.split(longDocument);
// Each chunk ≤ 512 tokens with 50-token overlap
```

### Text Analysis

#### TextSimilarityUtil
Comprehensive text similarity calculations with multiple algorithms, including Levenshtein distance, Jaccard similarity, cosine similarity, and a weighted combined similarity metric.

**Supported Algorithms:**
- **Levenshtein Distance** - Character-level edit distance
- **Jaccard Similarity** - Set-based similarity using word overlap
- **Cosine Similarity** - Vector-based similarity with TF-IDF
- **Combined Similarity** - Weighted combination of multiple metrics

**Usage Example:**
```java
String text1 = "The quick brown fox jumps over the lazy dog";
String text2 = "A quick brown fox leaps over a lazy dog";

double similarity = TextSimilarityUtil.combinedSimilarity(text1, text2);
// Returns weighted score: 0.4 * levenshtein + 0.3 * jaccard + 0.3 * cosine
```

#### VariableExtractor
Template variable extraction with advanced features for parsing templates. Supports simple variables, conditional blocks, list iterations, and nested properties with dot notation.
**Supported Features:**
- Simple variable extraction: `{{variable}}`
- Conditional blocks: `{{#if condition}}...{{/if}}`
- List iterations: `{{#each items}}...{{/each}}`
- Nested properties: `{{user.profile.name}}`
- Escape sequences: `{{{{literal}}}}`

**Usage Example:**
```java
String template = """
    Hello {{user.name}}!
    {{#if user.isPremium}}
    Welcome to our premium service.
    {{/if}}
    Your recent orders:
    {{#each orders}}
    - {{this.product}} ({{this.price}})
    {{/each}}
    """;

Set<String> variables = VariableExtractor.extractVariables(template);
// Returns: ["user.name", "user.isPremium", "orders", "this.product", "this.price"]
```

### JSON Processing

#### JsonUtils
Robust JSON parsing and repair utilities with support for malformed JSON, relaxed parsing (comments, trailing commas, single quotes), JSON extraction from mixed text, and safe type conversion.

**Key Features:**
- Automatic JSON repair for malformed input
- Relaxed parsing with comments and trailing commas
- JSON extraction from mixed text content
- Safe type conversion with error handling
- Support for single quotes and unquoted keys

**Usage Example:**
```java
String malformedJson = """
    {
        name: 'John Doe', // User's full name
        age: 30,
        "active": true, // Trailing comma
    }
    """;

Optional<JsonNode> parsed = JsonUtils.parseJsonRelaxed(malformedJson);
if (parsed.isPresent()) {
    String name = parsed.get().get("name").asText();
    int age = parsed.get().get("age").asInt();
}
```

### Tokenization

#### SimpleTokenizer
Basic tokenization for estimation:

```java
public class SimpleTokenizer implements Tokenizer {
    private final double tokenCostMultiplier;

    public SimpleTokenizer() {
        this(0.7); // Default multiplier
    }

    public SimpleTokenizer(double tokenCostMultiplier) {
        this.tokenCostMultiplier = tokenCostMultiplier;
    }

    @Override
    public int estimateTokenCount(String text) {
        return (int) (text.length() * tokenCostMultiplier);
    }

    @Override
    public int estimateTokenCount(List<ChatItem> messages) {
        return messages.stream()
            .mapToInt(msg -> estimateTokenCount(msg.getContent()))
            .sum();
    }
}
```

## Configuration

### EtlConfig
Central configuration for the entire framework:

```java
@Data
@Builder
public class EtlConfig {
    private List<VectorStoreConfig> vectorStores;
    private List<EmbeddingServiceConfig> embeddingServices;
    private List<PromptServiceConfig> promptServices;
    private List<VaultConfig> vault;
    private YoutubeProxyConfig youtubeProxy;

    // Nested configuration classes
    @Data
    public static class VectorStoreConfig {
        private String name;
        private String type; // "inmemory", "filebased", "pinecone"
        private String url;
        private String apiKey;
        private String environment;
        private String index;
        private Integer dimension;
        private String metric;
        private String filePath;
    }

    @Data
    public static class EmbeddingServiceConfig {
        private String name;
        private String type; // "openai", "cohere", "local"
        private String apiKey;
        private String model;
        private String url;
        private String modelPath;
        private Integer maxTokens;
        private Double temperature;
    }

    @Data
    public static class VaultConfig {
        private String name;
        private String type; // "openai", "anthropic", "google"
        private String apiKey;
        private String model;
        private String baseUrl;
        private Double temperature;
        private Integer maxTokens;
        private Double topP;
        private Double frequencyPenalty;
        private Double presencePenalty;
        private List<String> stopSequences;
    }
}
```

**Configuration Example (application.yml):**
```yaml
driftkit:
  vault:
    - name: "primary-openai"
      type: "openai"
      apiKey: "${OPENAI_API_KEY}"
      model: "gpt-4"
      temperature: 0.7
      maxTokens: 2000
    - name: "claude"
      type: "claude"
      apiKey: "${CLAUDE_API_KEY}"
      model: "claude-sonnet-4-20250514"
      temperature: 0.7
      maxTokens: 2000

  vectorStores:
    - name: "main-vector-store"
      type: "pinecone"
      apiKey: "${PINECONE_API_KEY}"
      environment: "us-west1-gcp"
      index: "driftkit-vectors"
      dimension: 1536
      metric: "cosine"

  embeddingServices:
    - name: "primary-embedding"
      type: "openai"
      apiKey: "${OPENAI_API_KEY}"
      model: "text-embedding-ada-002"

  promptServices:
    - name: "file-prompts"
      type: "filesystem"
      basePath: "./prompts"

  youtubeProxy:
    proxyUrl: "http://proxy.example.com:8080"
    username: "proxyuser"
    password: "${PROXY_PASSWORD}"
```

## Usage Patterns

### Basic Chat Implementation

```java
@Service
public class ChatService {
    private final ChatMemory memory;
    private final ModelClient modelClient;

    public ChatService() {
        this.memory = TokenWindowChatMemory.builder()
            .maxTokens(4000)
            .tokenizer(new SimpleTokenizer())
            .store(new InMemoryChatMemoryStore())
            .build();

        this.modelClient = new OpenAIModelClient();
    }

    public String processMessage(String chatId, String userMessage) {
        // Add user message to memory
        Message userMsg = Message.builder()
            .id(UUID.randomUUID().toString())
            .chatId(chatId)
            .content(userMessage)
            .messageType(ChatMessageType.USER)
            .build();
        memory.add(userMsg);

        // Generate AI response
        ModelTextRequest request = ModelTextRequest.builder()
            .messages(memory.messages())
            .temperature(0.7)
            .maxTokens(1000)
            .build();

        ModelTextResponse response = modelClient.generateText(request);

        // Add AI response to memory
        Message aiMsg = Message.builder()
            .id(UUID.randomUUID().toString())
            .chatId(chatId)
            .content(response.getContent())
            .messageType(ChatMessageType.AI)
            .build();
        memory.add(aiMsg);

        return response.getContent();
    }
}
```

### Document Processing Pipeline

```java
@Service
public class DocumentProcessor {
    private final DocumentSplitter splitter;

    public DocumentProcessor() {
        this.splitter = DocumentSplitter.builder()
            .maxChunkSize(512)
            .overlapSize(50)
            .tokenizer(new SimpleTokenizer())
            .build();
    }

    public List<String> processDocument(String document) {
        // Split document into chunks
        List<String> chunks = splitter.split(document);

        // Remove duplicate chunks based on similarity
        List<String> uniqueChunks = new ArrayList<>();
        for (String chunk : chunks) {
            boolean isDuplicate = uniqueChunks.stream()
                .anyMatch(existing ->
                    TextSimilarityUtil.combinedSimilarity(chunk, existing) > 0.9);

            if (!isDuplicate) {
                uniqueChunks.add(chunk);
            }
        }

        return uniqueChunks;
    }
}
```

### Template Processing

```java
@Service
public class TemplateProcessor {

    public String processTemplate(String template, Map<String, Object> variables) {
        // Extract required variables
        Set<String> requiredVars = VariableExtractor.extractVariables(template);

        // Validate all variables are provided
        for (String var : requiredVars) {
            if (!variables.containsKey(var)) {
                throw new IllegalArgumentException("Missing variable: " + var);
            }
        }

        // Process template (simplified - use an actual template engine)
        String result = template;
        for (Map.Entry<String, Object> entry : variables.entrySet()) {
            result = result.replace("{{" + entry.getKey() + "}}",
                String.valueOf(entry.getValue()));
        }

        return result;
    }
}
```

## Testing

### Unit Test Examples
```java
@Test
public void testDocumentSplitter() {
    SimpleTokenizer tokenizer = new SimpleTokenizer();
    DocumentSplitter splitter = DocumentSplitter.builder()
        .maxChunkSize(100)
        .overlapSize(20)
        .tokenizer(tokenizer)
        .build();

    String document = "This is a long document that needs to be split...";
    List<String> chunks = splitter.split(document);

    assertThat(chunks).isNotEmpty();
    // maxChunkSize is measured in tokens, so validate with the tokenizer
    assertThat(tokenizer.estimateTokenCount(chunks.get(0))).isLessThanOrEqualTo(100);
}

@Test
public void testTextSimilarity() {
    String text1 = "The quick brown fox";
    String text2 = "A quick brown fox";

    double similarity = TextSimilarityUtil.combinedSimilarity(text1, text2);
    assertThat(similarity).isBetween(0.7, 1.0);
}

@Test
public void testVariableExtraction() {
    String template = "Hello {{name}}! You have {{count}} messages.";
    Set<String> variables = VariableExtractor.extractVariables(template);

    assertThat(variables).containsExactlyInAnyOrder("name", "count");
}
```

## Performance Considerations

### Memory Management
- Use `TokenWindowChatMemory` in production to prevent unbounded memory growth
- Configure appropriate token limits based on model context windows
- Consider persistent storage for long-term conversation history

### Text Processing
- `TextSimilarityUtil` operations are O(n²) for large texts
- Use caching for repeated similarity calculations
- Consider approximate algorithms for very large document sets

### JSON Processing
- `JsonUtils.repairJson()` has overhead - use it sparingly
- Cache parsed JSON for repeated access
- Validate JSON structure before processing

## Integration with Other Modules

### With driftkit-clients
```java
// Use common domain objects with model clients
ModelTextRequest request = ModelTextRequest.builder()
    .messages(memory.messages()) // ChatItem list
    .temperature(0.7)
    .build();
```

### With driftkit-workflows
```java
// AITask integrates with workflow context
AITask task = AITask.builder()
    .workflowId("rag-workflow")
    .workflowContext(workflowContext)
    .variables(templateVariables)
    .build();
```

### With driftkit-vector
```java
// Document processing for vector storage
List<String> chunks = documentSplitter.split(document);
// Chunks can be embedded and stored in vector databases
```

## Error Handling

### Common Exceptions
- `IllegalArgumentException` - Invalid configuration or parameters
- `JsonProcessingException` - JSON parsing failures
- `ValidationException` - Bean validation failures
- `TokenLimitExceededException` - Memory capacity exceeded

### Best Practices
```java
// Always validate inputs
ValidationUtils.requireNonNull(chatId, "Chat ID cannot be null");

// Handle JSON parsing gracefully
Optional<JsonNode> json = JsonUtils.parseJson(inputJson);
if (json.isEmpty()) {
    log.warn("Failed to parse JSON: {}", inputJson);
    return defaultResponse;
}

// Use try-with-resources for cleanup
try (var resource = acquireResource()) {
    // Process resource
} catch (Exception e) {
    log.error("Processing failed", e);
    throw new ProcessingException("Failed to process request", e);
}
```

## Migration Guide

### From Version 1.x to 2.x
1. Update imports from `javax.annotation` to `jakarta.annotation`
2. Replace deprecated `@Getter`/`@Setter` with `@Data` where appropriate
3. Update `EtlConfig` usage to use Spring Boot configuration properties
4. Replace manual JSON parsing with `JsonUtils` methods

### Configuration Changes
```yaml
# Old format
etl:
  openai:
    apiKey: "sk-..."

# New format
driftkit:
  vault:
    - name: "primary"
      type: "openai"
      apiKey: "sk-..."
```

## Real-World Demo Examples

### 1. Building a Customer Support Chatbot

This example demonstrates building a complete customer support chatbot using the common module's chat management features.

```java
@Service
public class CustomerSupportBot {
    private final ChatMemory memory;
    private final ModelClient modelClient;

    public CustomerSupportBot(ModelClient modelClient) {
        this.modelClient = modelClient;
        // Initialize with a 4K-token window for GPT-4
        this.memory = TokenWindowChatMemory.builder()
            .maxTokens(4000)
            .tokenizer(new SimpleTokenizer())
            .store(new InMemoryChatMemoryStore())
            .build();
    }

    public String handleCustomerQuery(String chatId, String query) {
        // Add system message if it's a new conversation
        if (memory.messages().isEmpty()) {
            Message systemMsg = Message.builder()
                .id(UUID.randomUUID().toString())
                .chatId(chatId)
                .content("You are a helpful customer support agent for an e-commerce platform. Be professional, empathetic, and solution-oriented.")
                .messageType(ChatMessageType.SYSTEM)
                .build();
            memory.add(systemMsg);
        }

        // Check for similar previous queries
        String similarResponse = findSimilarResponse(query);
        if (similarResponse != null) {
            return similarResponse;
        }

        // Add user message
        Message userMsg = Message.builder()
            .id(UUID.randomUUID().toString())
            .chatId(chatId)
            .content(query)
            .messageType(ChatMessageType.USER)
            .requestInitTime(LocalDateTime.now())
            .build();
        memory.add(userMsg);

        // Generate response
        ModelTextRequest request = ModelTextRequest.builder()
            .messages(memory.messages())
            .temperature(0.7)
            .maxTokens(500)
            .build();

        ModelTextResponse response = modelClient.textToText(request);
        String aiResponse = response.getChoices().get(0).getMessage().getContent();

        // Add AI response to memory
        Message aiMsg = Message.builder()
            .id(UUID.randomUUID().toString())
            .chatId(chatId)
            .content(aiResponse)
            .messageType(ChatMessageType.AI)
            .responseTime(LocalDateTime.now())
            .build();
        memory.add(aiMsg);

        return aiResponse;
    }

    private String findSimilarResponse(String query) {
        // Look for similar queries in past conversations
        List<ChatItem> userMessages = memory.findByType(ChatMessageType.USER);

        for (ChatItem msg : userMessages) {
            double sim = TextSimilarityUtil.combinedSimilarity(query, msg.getContent());
            if (sim > 0.85) {
                // Find the AI response that followed this message
                // Return it as a cached response
                return "Based on a similar query...";
            }
        }
        return null;
    }
}
```

### 2. Document Intelligence System for Legal Contracts

This example shows how to build a document analysis system for legal contracts using document splitting and AI processing.

```java
@Service
public class LegalContractAnalyzer {
    private final DocumentSplitter splitter;
    private final ModelClient modelClient;
    private final Map<String, List<String>> contractClauses = new HashMap<>();

    public LegalContractAnalyzer(ModelClient modelClient) {
        this.modelClient = modelClient;
        this.splitter = DocumentSplitter.builder()
            .maxChunkSize(1024) // Larger chunks for better context
            .overlapSize(100)   // Overlap to maintain clause continuity
            .tokenizer(new SimpleTokenizer())
            .build();
    }

    public ContractAnalysis analyzeContract(String contractId, String contractText) {
        // Split contract into analyzable chunks
        List<String> chunks = splitter.split(contractText);
        contractClauses.put(contractId, chunks);

        ContractAnalysis analysis = new ContractAnalysis();
        analysis.setContractId(contractId);
        analysis.setTotalClauses(chunks.size());

        // Analyze each chunk for specific legal elements
        for (int i = 0; i < chunks.size(); i++) {
            String chunk = chunks.get(i);

            // Extract key information from each chunk
            ClauseAnalysis clauseAnalysis = analyzeClause(chunk, i);
            analysis.addClause(clauseAnalysis);

            // Check for risk factors
            if (containsRiskIndicators(chunk)) {
                analysis.addRiskFlag(new RiskFlag(i, chunk, assessRiskLevel(chunk)));
            }
        }

        // Generate executive summary
        analysis.setSummary(generateExecutiveSummary(analysis));

        return analysis;
    }

    private ClauseAnalysis analyzeClause(String clause, int index) {
        // Use AI to categorize and extract key information
        String prompt = """
            Analyze this legal clause and provide:
            1. Clause type (e.g., payment terms, liability, termination)
            2. Key obligations
            3. Important dates or deadlines
            4. Parties involved

            Clause: """ + clause;

        ModelTextRequest request = ModelTextRequest.builder()
            .prompt(prompt)
            .temperature(0.1) // Low temperature for factual analysis
            .maxTokens(300)
            .jsonResponse(true)
            .build();

        ModelTextResponse response = modelClient.textToText(request);

        // Parse the structured response
        return parseClauseAnalysis(response.getContent());
    }

    private boolean containsRiskIndicators(String text) {
        String[] riskKeywords = {
            "unlimited liability", "indemnification", "penalty",
            "liquidated damages", "non-compete", "exclusivity"
        };

        String normalizedText = text.toLowerCase();
        return Arrays.stream(riskKeywords)
            .anyMatch(normalizedText::contains);
    }

    public List<String> findSimilarClauses(String contractId, String searchClause) {
        List<String> clauses = contractClauses.get(contractId);
        if (clauses == null) return Collections.emptyList();

        return clauses.stream()
            .filter(clause -> TextSimilarityUtil.combinedSimilarity(searchClause, clause) > 0.7)
            .collect(Collectors.toList());
    }
}

@Data
class ContractAnalysis {
    private String contractId;
    private int totalClauses;
    private List<ClauseAnalysis> clauses = new ArrayList<>();
    private List<RiskFlag> riskFlags = new ArrayList<>();
    private String summary;

    public void addClause(ClauseAnalysis clause) {
        clauses.add(clause);
    }

    public void addRiskFlag(RiskFlag flag) {
        riskFlags.add(flag);
    }
}
```

### 3. Multi-Language Content Processing Pipeline

This example demonstrates handling multi-language content with automatic translation and cultural adaptation.

```java
@Service
public class MultiLanguageContentProcessor {
    private final ModelClient modelClient;
    private final Map<Language, Map<String, String>> translations = new HashMap<>();

    public MultiLanguageContentProcessor(ModelClient modelClient) {
        this.modelClient = modelClient;
    }

    public ProcessedContent processContent(String template, Language targetLanguage, Map<String, Object> data) {
        // Extract all variables from the template
        Set<String> variables = VariableExtractor.extractVariables(template);
        Set<String> conditionals = VariableExtractor.extractConditionalVariables(template);

        // Validate all required variables are present
        for (String var : variables) {
            if (!data.containsKey(var) && !conditionals.contains(var)) {
                throw new IllegalArgumentException("Missing required variable: " + var);
            }
        }

        // Process template with language-specific adaptations
        String processed = processTemplate(template, targetLanguage, data);

        // Apply cultural adaptations
        processed = applyCulturalAdaptations(processed, targetLanguage);

        // Generate language-specific metadata
        ProcessedContent content = new ProcessedContent();
        content.setContent(processed);
        content.setLanguage(targetLanguage);
        content.setVariablesUsed(variables);
        content.setProcessingTime(LocalDateTime.now());

        return content;
    }

    private String processTemplate(String template, Language language, Map<String, Object> data) {
        // First, handle language-specific number and date formatting
        Map<String, Object> localizedData = localizeData(data, language);

        // Process the template
        String processed = TemplateEngine.renderTemplate(template, localizedData);

        // Translate if needed
        if (language != Language.ENGLISH) {
            processed = translateContent(processed, language);
        }

        return processed;
    }

    private Map<String, Object> localizeData(Map<String, Object> data, Language language) {
        Map<String, Object> localized = new HashMap<>(data);

        // Format numbers and dates based on locale
        data.forEach((key, value) -> {
            if (value instanceof Number) {
                localized.put(key, formatNumber((Number) value, language));
            } else if (value instanceof LocalDateTime) {
                localized.put(key, formatDate((LocalDateTime) value, language));
            }
        });

        return localized;
    }

    private String translateContent(String content, Language targetLanguage) {
        // Use AI for context-aware translation
        String prompt = String.format(
            "Translate the following content to %s. Maintain the tone and context:\n\n%s",
            targetLanguage.name(), content
        );

        ModelTextRequest request = ModelTextRequest.builder()
            .prompt(prompt)
            .temperature(0.3)
            .maxTokens(1000)
            .build();

        ModelTextResponse response = modelClient.textToText(request);
        return response.getContent();
    }

    public void preloadTranslations(String key, Map<Language, String> preloaded) {
        preloaded.forEach((lang, translation) ->
            this.translations.computeIfAbsent(lang, k -> new HashMap<>())
                .put(key, translation));
    }
}
```

### 4. Intelligent Task Routing System

This example shows how to build an intelligent task routing system using AITask and workflow context.
1060 1061 ```java 1062 @Service 1063 public class IntelligentTaskRouter { 1064 private final Map<String, WorkflowHandler> handlers = new HashMap<>(); 1065 private final ModelClient modelClient; 1066 1067 public TaskResult routeTask(AITask task) { 1068 // Analyze task to determine best workflow 1069 String workflowId = determineWorkflow(task); 1070 task.setWorkflowId(workflowId); 1071 1072 // Get appropriate handler 1073 WorkflowHandler handler = handlers.get(workflowId); 1074 if (handler == null) { 1075 task.setErrorMessage("No handler found for workflow: " + workflowId); 1076 task.setCompletedAt(LocalDateTime.now()); 1077 return TaskResult.failure(task); 1078 } 1079 1080 try { 1081 // Execute task with context preservation 1082 TaskResult result = handler.execute(task); 1083 1084 // Update task with results 1085 task.setCompletedAt(LocalDateTime.now()); 1086 task.setGrade(evaluateResult(result)); 1087 1088 return result; 1089 } catch (Exception e) { 1090 task.setErrorMessage("Task execution failed: " + e.getMessage()); 1091 task.setCompletedAt(LocalDateTime.now()); 1092 task.setGrade(Grade.POOR); 1093 return TaskResult.failure(task); 1094 } 1095 } 1096 1097 private String determineWorkflow(AITask task) { 1098 // Use AI to classify the task 1099 String classificationPrompt = buildClassificationPrompt(task); 1100 1101 ModelTextRequest request = ModelTextRequest.builder() 1102 .prompt(classificationPrompt) 1103 .temperature(0.1) 1104 .maxTokens(50) 1105 .jsonResponse(true) 1106 .build(); 1107 1108 ModelTextResponse response = modelClient.textToText(request); 1109 1110 // Parse workflow ID from response 1111 JsonNode result = JsonUtils.parseJson(response.getContent()).orElse(null); 1112 return result != null ? 
            result.get("workflow").asText() : "default";
    }

    private String buildClassificationPrompt(AITask task) {
        return String.format("""
            Classify this task into one of the following workflows:
            - customer-service: Customer inquiries and support
            - content-generation: Creating marketing or educational content
            - data-analysis: Analyzing data and generating reports
            - document-processing: Processing and extracting information from documents

            Task details:
            Prompt: %s
            Has Images: %s
            Has Audio: %s
            Variables: %s

            Respond with JSON: {"workflow": "workflow-id", "confidence": 0.0-1.0}
            """,
            task.getPrompt(),
            task.getImageUrls() != null && !task.getImageUrls().isEmpty(),
            task.getAudioUrl() != null,
            task.getVariables()
        );
    }

    private Grade evaluateResult(TaskResult result) {
        if (!result.isSuccess()) return Grade.POOR;

        double score = result.getConfidenceScore();
        if (score >= 0.9) return Grade.EXCELLENT;
        if (score >= 0.7) return Grade.GOOD;
        if (score >= 0.5) return Grade.AVERAGE;
        return Grade.POOR;
    }

    public void registerHandler(String workflowId, WorkflowHandler handler) {
        handlers.put(workflowId, handler);
    }
}

interface WorkflowHandler {
    TaskResult execute(AITask task);
}

@Data
class TaskResult {
    private boolean success;
    private String output;
    private double confidenceScore;
    private Map<String, Object> metadata;

    public static TaskResult failure(AITask task) {
        TaskResult result = new TaskResult();
        result.setSuccess(false);
        result.setOutput(task.getErrorMessage());
        result.setConfidenceScore(0.0);
        return result;
    }
}
```

### 5. Intelligent Chat Memory Optimization

This example shows advanced memory management for long-running conversations.
```java
@Service
@RequiredArgsConstructor // Lombok generates the constructor for the final modelClient field
public class OptimizedChatMemoryService {
    private final Map<String, TokenWindowChatMemory> userMemories = new ConcurrentHashMap<>();
    private final ModelClient modelClient;

    public String processUserMessage(String userId, String message) {
        TokenWindowChatMemory memory = getUserMemory(userId);

        // Add user message
        Message userMsg = Message.builder()
            .id(UUID.randomUUID().toString())
            .chatId(userId)
            .content(message)
            .messageType(ChatMessageType.USER)
            .requestInitTime(LocalDateTime.now())
            .build();
        memory.add(userMsg);

        // Check if we need to summarize old messages
        if (shouldSummarize(memory)) {
            summarizeOldMessages(memory);
        }

        // Generate response with optimized context
        String response = generateResponse(memory);

        // Add AI response
        Message aiMsg = Message.builder()
            .id(UUID.randomUUID().toString())
            .chatId(userId)
            .content(response)
            .messageType(ChatMessageType.AI)
            .responseTime(LocalDateTime.now())
            .build();
        memory.add(aiMsg);

        return response;
    }

    private TokenWindowChatMemory getUserMemory(String userId) {
        return userMemories.computeIfAbsent(userId, k ->
            TokenWindowChatMemory.builder()
                .maxTokens(3500) // Leave room for the response
                .tokenizer(new SimpleTokenizer())
                .store(new InMemoryChatMemoryStore())
                .build()
        );
    }

    private boolean shouldSummarize(TokenWindowChatMemory memory) {
        // Summarize once we exceed 10 message pairs (20 non-system messages)
        long messageCount = memory.messages().stream()
            .filter(m -> m.getMessageType() != ChatMessageType.SYSTEM)
            .count();
        return messageCount > 20;
    }

    private void summarizeOldMessages(TokenWindowChatMemory memory) {
        List<ChatItem> messages = memory.messages();

        // Keep the system message and the 10 most recent messages
        List<ChatItem>
            toSummarize = messages.stream()
            .filter(m -> m.getMessageType() != ChatMessageType.SYSTEM)
            .limit(Math.max(0, messages.size() - 10)) // Keep the last 10 messages; guard against a negative limit
            .collect(Collectors.toList());

        if (toSummarize.isEmpty()) return;

        // Generate summary
        String summaryPrompt = buildSummaryPrompt(toSummarize);
        ModelTextRequest request = ModelTextRequest.builder()
            .prompt(summaryPrompt)
            .temperature(0.3)
            .maxTokens(500)
            .build();

        ModelTextResponse response = modelClient.textToText(request);
        String summary = response.getContent();

        // Create summary message
        Message summaryMsg = Message.builder()
            .id(UUID.randomUUID().toString())
            .chatId(memory.messages().get(0).getChatId())
            .content("Previous conversation summary: " + summary)
            .messageType(ChatMessageType.SYSTEM)
            .createdTime(System.currentTimeMillis())
            .build();

        // Clear old messages and add summary
        memory.clear();

        // Re-add the original system message, if present
        messages.stream()
            .filter(m -> m.getMessageType() == ChatMessageType.SYSTEM)
            .findFirst()
            .ifPresent(memory::add);

        // Add summary
        memory.add(summaryMsg);

        // Re-add the 10 most recent messages
        messages.stream()
            .skip(Math.max(0, messages.size() - 10))
            .forEach(memory::add);
    }

    private String buildSummaryPrompt(List<ChatItem> messages) {
        StringBuilder prompt = new StringBuilder();
        prompt.append("Summarize the following conversation, keeping key points and context:\n\n");

        for (ChatItem msg : messages) {
            String role = msg.getMessageType() == ChatMessageType.USER ?
                "User" : "Assistant";
            prompt.append(role).append(": ").append(msg.getContent()).append("\n\n");
        }

        prompt.append("Provide a concise summary that preserves important information and context.");
        return prompt.toString();
    }
}
```

This documentation provides a complete reference for the `driftkit-common` module, covering its major components, usage patterns, and integration points. The module serves as the foundation for building sophisticated AI applications with robust chat management, document processing, and text analysis capabilities.
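As a closing illustration of example 5's memory window: the summarization step boils down to partitioning the conversation into older messages (which get compressed into one summary) and the most recent ones (which are kept verbatim). The arithmetic can be exercised in isolation; this sketch uses plain strings in place of `ChatItem` and all names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;

public class WindowPartitionDemo {
    /** Split a conversation into (older messages to summarize, recent messages to keep). */
    static List<List<String>> partition(List<String> messages, int keep) {
        // Same guard as summarizeOldMessages: never take a negative slice
        int cut = Math.max(0, messages.size() - keep);
        return List.of(
            new ArrayList<>(messages.subList(0, cut)),
            new ArrayList<>(messages.subList(cut, messages.size()))
        );
    }

    public static void main(String[] args) {
        List<String> msgs = new ArrayList<>();
        for (int i = 1; i <= 25; i++) msgs.add("msg-" + i);

        List<List<String>> parts = partition(msgs, 10);
        System.out.println(parts.get(0).size()); // 15 older messages get summarized
        System.out.println(parts.get(1).get(0)); // msg-16 is the first message kept verbatim
    }
}
```

With 25 messages and a keep-window of 10, one AI call replaces 15 messages with a single summary, so the token budget spent on history stays roughly constant no matter how long the conversation runs.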