/ rfcs / 0015-phase7-distribution-and-production-hardening.md
0015-phase7-distribution-and-production-hardening.md
  1  # RFC-0015 — Phase 7 Distribution and Production Hardening
  2  
  3  Status: Accepted
  4  
  5  ## Summary
  6  
  7  Implement the Phase 7 production-hardening baseline:
  8  
  9  - supervision tree guidance model
 10  - metrics and tracing primitives
 11  - rate limiting hooks
 12  - CSRF/session protection hooks
 13  - deterministic load testing harness
 14  - security review documentation
 15  - deployment guide documentation
 16  
 17  ## Motivation
 18  
 19  Runtime behavior is stable through Phase 6, but production operation requires explicit controls and documentation for failure isolation, observability, abuse control, and rollout safety.
 20  
 21  ## Design
 22  
 23  ### Supervision guidance
 24  
 25  Add `lightspeed/ops/supervision` with:
 26  
 27  - typed child spec guidance
 28  - restart/strategy labels
 29  - default plan builder
 30  - validation for required children and duplicate IDs
 31  
 32  ### Metrics and tracing
 33  
 34  Add `lightspeed/ops/telemetry` with:
 35  
 36  - mapping from session telemetry events to counters/gauges
 37  - receive-span model and status labels
 38  - log-friendly metric line serializer
 39  
 40  ### Rate limiting and protection hooks
 41  
 42  Extend transport contract with:
 43  
 44  - `ProtectionHook` for CSRF/session binding and related policy
 45  - `RateLimitHook` for event throttling policy
 46  - explicit adapter errors for protection and rate-limit failures
 47  
 48  Extend websocket adapter with:
 49  
 50  - `connect_with_hooks` / `reconnect_with_hooks`
 51  - `receive_with_hooks`
 52  - default wrappers preserving earlier function shape
 53  
 54  ### Load testing harness
 55  
 56  Add `lightspeed/ops/load_harness` with deterministic scenarios covering:
 57  
 58  - reconnect behavior
 59  - crash/restart behavior
 60  - slow-client ack behavior
 61  - repeatability checks
 62  
 63  ### Documentation
 64  
 65  Add:
 66  
 67  - `docs/supervision_tree.md`
 68  - `docs/security_review.md`
 69  - `docs/deployment_guide.md`
 70  
 71  Update top-level docs to reference the new Phase 7 assets.
 72  
 73  ## API impact
 74  
 75  New public modules:
 76  
 77  - `lightspeed/ops/supervision`
 78  - `lightspeed/ops/telemetry`
 79  - `lightspeed/ops/load_harness`
 80  
 81  Transport contract public API expands with protection/rate hook types and helpers.
 82  
 83  ## Protocol impact
 84  
 85  No frame shape changes.
 86  
 87  ## ISA impact
 88  
 89  No instruction set changes.
 90  
 91  ## Security impact
 92  
 93  - explicit CSRF/session protection policy surface
 94  - explicit rate-limit policy surface
 95  - deterministic rejection paths for blocked client traffic
 96  
 97  ## Alternatives
 98  
 99  - defer all hardening to adapter/application layer only
100  - add hardcoded rate-limiter/CSRF policy instead of hooks
101  - prioritize deployment docs without typed guidance modules
102  
103  ## Unresolved questions
104  
105  - standardized distributed limiter backend API
106  - optional shared replay-nonce guard in core
107  - future structured export formats for metrics/traces