EVALUATION-REPORT.adoc
= Cerro Torre Evaluation Report
:toc: left
:toclevels: 2

This document records periodic, evidence-backed evaluations of completion against the repo's checklists.
It is intentionally *veridical*: every claim must link to a command output, file path, or CI run.

== How to use this
* Create a new section per evaluation date.
* For each domain, choose a rating and provide evidence.
* Link to the exact checklist file and (where relevant) the exact checklist item IDs (e.g., "A1", "C2").

== Rating scale
* *Fullest* — Delight + polish; minimal friction; interop + migrations are boring and reliable.
* *Full* — Paved road end-to-end; strong diagnostics; stable surfaces; CI gates enforce the contract.
* *Strong* — Very usable; small sharp edges remain; most seams covered by conformance and interop.
* *Moderate* — Works for insiders; occasional setup/debug friction; partial conformance/interop.
* *Basic* — Core functions exist but are fragile/manual; little automation or diagnostics.
* *Not implemented* — Missing or not verifiable.

== Evidence rules
A rating without evidence is invalid.
Accepted evidence types:
* CI run URL(s) (preferred)
* exact command + output excerpt (redacted if needed) committed under `evidence/`
* file paths + commit hashes

== Change control
* Any claim of *Full* or above MUST reference passing required CI checks.
* If a rating drops, record the regression cause and link to the issue/PR.
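
The evidence and change-control rules above are mechanical enough to check with a script. The following is a minimal sketch under stated assumptions, not tooling that exists in this repo: the `DomainRating` structure, its field names, and `check_rating` are hypothetical. It only checks that references are present; it cannot confirm that a linked CI run actually passed.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Rating scale from this report, ordered from lowest to highest.
RATINGS = ["Not implemented", "Basic", "Moderate", "Strong", "Full", "Fullest"]


@dataclass
class DomainRating:
    """One row of a domain-ratings table (hypothetical structure)."""

    domain: str
    rating: str
    # File paths, commit hashes, or command-output files under evidence/.
    evidence: list[str] = field(default_factory=list)
    # URLs of CI runs for the required checks.
    ci_runs: list[str] = field(default_factory=list)


def check_rating(entry: DomainRating) -> list[str]:
    """Return a list of rule violations for one rating (empty if none)."""
    if entry.rating not in RATINGS:
        return [f"{entry.domain}: unknown rating {entry.rating!r}"]

    problems = []
    # Evidence rule: a rating without evidence is invalid.
    if not entry.evidence and not entry.ci_runs:
        problems.append(f"{entry.domain}: rating without evidence is invalid")
    # Change control: Full or above must reference required CI checks.
    if RATINGS.index(entry.rating) >= RATINGS.index("Full") and not entry.ci_runs:
        problems.append(
            f"{entry.domain}: {entry.rating} requires a reference to passing CI checks"
        )
    return problems


if __name__ == "__main__":
    rows = [
        DomainRating("A. Onboarding + doctor", "Basic",
                     evidence=["evidence/2025-12-30/cli-scaffold.txt"]),
        # Hypothetical row, included only to show a violation.
        DomainRating("X. Example domain", "Full", evidence=["some/file/path"]),
    ]
    for row in rows:
        for problem in check_rating(row):
            print(problem)
```

A compound value such as `Basic-Moderate` is not on the rating scale, so this sketch would report it as unknown; an evaluator using a check like this would need to pick a single rating per domain.
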

---

=== 2025-12-30 (Evaluator: claude-code)

Checklist: link:QOL-AUDIT.adoc[QOL-AUDIT.adoc] @ commit `d4e378d7cfb688488795718847514aff530a33f5`

Scope: Branch `claude/cerro-torre-mvp-plan-tKJdw`, pre-v0.1

Platforms covered: linux x86_64 (specification only, no runtime tests)

==== Overall rating

*Rating:* Basic

*Justification:*

- All core CLI commands scaffolded with help text but no functional implementation
- Comprehensive specifications exist (canonicalization, policy, index, crypto suites)
- No tests directory, no conformance vectors, no CI job separation
- Mental model doc missing; onboarding story incomplete

==== Domain ratings (with evidence)

[cols="1,1,4",options="header"]
|===
| Domain | Rating | Evidence (links / commands)

| A. Onboarding + doctor
| Basic
| * `ct doctor` scaffolded but returns "Not yet implemented"
* Evidence: `evidence/2025-12-30/cli-scaffold.txt`
* CLI source: `src/cli/cerro_cli.adb:357-392`
* Missing: actual crypto backend check, config validation

| B. CLI ergonomics + surfaces
| Basic-Moderate
| * 17+ commands scaffolded in `src/cli/cerro_cli.ads`
* Exit codes defined: `src/cli/ct_errors.ads` (0-12)
* Spec: `spec/cli-ergonomics.adoc` (comprehensive)
* Evidence: `evidence/2025-12-30/cli-scaffold.txt`
* Missing: implementation, --json output

| C. Happy-path E2E (README is executable)
| Not implemented
| * `ct pack` returns "Not yet implemented"
* `ct verify` returns "Not yet implemented"
* No tests/ directory exists
* Evidence: `evidence/2025-12-30/tests-dir.txt`
* No golden tests, no determinism check

| D. Config/state/cleanup
| Basic
| * Key storage location documented: `~/.config/cerro/keys/`
* Keystore policy spec: `spec/keystore-policy.json`
* Argon2id policy documented in spec
* Evidence: `evidence/2025-12-30/schema-files.txt`
* Missing: implementation, config.toml spec for mirrors

| E. Release/reproducibility
| Moderate
| * Canonicalization spec: `spec/manifest-canonicalization.adoc` (comprehensive)
* Manifest format spec: `spec/manifest-format.md`
* Bundle format: `spec/ctp-bundle-format.adoc`
* Missing: conformance vectors, reproducible build evidence

| F. Seam + surface checks (conformance/interop)
| Not implemented
| * Svalinn integration spec exists: `spec/svalinn-integration.adoc`
* No conformance test harness
* No interop CI job
* CI not separated: single `ada-spark-ci.yml`
* Evidence: `.github/workflows/ada-spark-ci.yml` (40 lines, minimal)

| G. Smoothing docs + usability
| Not implemented
| * No `docs/mental-model.adoc`
* No `.github/PULL_REQUEST_TEMPLATE.md`
* Troubleshooting: `ct doctor` scaffolded only
* Example policies exist: `evidence/2025-12-30/example-policies.txt`
|===

==== Evidence files committed

[source]
----
evidence/2025-12-30/
├── cli-scaffold.txt        # ls -la src/cli/*.ad[sb]
├── example-policies.txt    # ls -la examples/*.json
├── schema-files.txt        # ls -la spec/*.json
└── tests-dir.txt           # ls tests/ (shows non-existent)
----

==== Checklist item status (cross-reference to QOL-AUDIT.adoc)

[cols="1,1,3",options="header"]
|===
| Item | Status | Notes

| A1. `ct pack` deterministic
| Basic
| CLI scaffold exists, not implemented

| A2. `ct verify` structured errors
| Basic
| Exit codes defined in `ct_errors.ads`, not implemented

| A3. `ct explain` narrative
| Basic
| CLI scaffold exists, not implemented

| B1.
Key lifecycle
| Basic
| All subcommands scaffolded, none implemented

| B2. Policy UX
| Moderate
| Schema complete (`policy-schema.json`), examples exist

| B3. Rotation/multi-signer
| Moderate
| Schema supports threshold/deny, `ct re-sign` scaffolded

| C1. Offline export/import
| Basic
| Scaffolded, not implemented

| C2. Mirror resolution
| Not impl
| No mirror config section, no --offline flag

| D1. Canonicalization rules
| Strong
| Spec complete, ATS shadow exists, vectors missing

| D2. Turing-incomplete
| Strong
| TOML inherently bounded, spec documents limits

| E1. Proof gates separated
| Moderate
| Single CI workflow, SPARK proof optional

| E2. Developer mode
| Not impl
| No --dev flag, no dev markers

| F1. Spec version pin
| Basic
| `ct version` prints 0.1.0-dev, no spec version

| F2. Conformance harness
| Not impl
| No harness, no vectors imported

| F3. Svalinn interop
| Not impl
| Spec exists, no CI job

| G1. `ct doctor`
| Basic
| Scaffolded with check list, not implemented

| G2. `ct diff`
| Basic
| Scaffolded with sample output, not implemented

| G3. Mental model doc
| Not impl
| Does not exist
|===

==== Regressions since last evaluation

* N/A — First evaluation

==== Top 5 next actions (ranked)

. [ ] *Implement core crypto* — SHA-256 + Ed25519 via libsodium bindings. Required for any verification.
. [ ] *Create tests/ directory with conformance vectors* — `tests/canon/valid/`, `tests/canon/invalid/`, `tests/errors/`. Enables CI gates.
. [ ] *Implement `ct pack` and `ct verify`* — Core loop. Without this, nothing is usable.
. [ ] *Create `docs/mental-model.adoc`* — 2-page user guide. Critical for onboarding.
. [ ] *Separate CI into job matrix* — `build`, `unit-test`, `spark-proof`, `lint`. Enables proper gates.

==== Notes

* Schema completeness is high — `policy-schema.json`, `index-schema.json`, `crypto-suites.json`, `keystore-policy.schema.json` all exist and are comprehensive.
* CLI scaffold is thorough with detailed help text for all 17+ commands.
* The gap between "specified" and "implemented" is the primary issue.
* ATS2 shadow verifier exists (`tools/ats-shadow/`) but is non-authoritative.
* Next evaluation should be after implementing pack/verify and creating test vectors.

---

== Evaluation Template (copy for next evaluation)

=== YYYY-MM-DD (Evaluator: NAME/HANDLE)

Checklist: link:QOL-AUDIT.adoc[QOL-AUDIT.adoc] @ commit `COMMIT_HASH`

Scope: (release tag, branch, or commit range)

Platforms covered: (e.g., linux x86_64 rootless, linux aarch64 rootful, etc.)

==== Overall rating

*Rating:* (Fullest | Full | Strong | Moderate | Basic | Not implemented)

*Justification (1–3 sentences):*

- ...

==== Domain ratings (with evidence)

[cols="1,1,4",options="header"]
|===
| Domain | Rating | Evidence (links / commands)

| A. Onboarding + doctor
| (..)
| * CI: ...
* Evidence file: `evidence/YYYY-MM-DD/doctor.txt`
* Notes: ...

| B. CLI ergonomics + surfaces
| (..)
| * CI: ...
* Schema: `spec/*.json` @ commit ...
* Notes: ...

| C. Happy-path E2E (README is executable)
| (..)
| * CI: ...
* Logs: ...
* Notes: ...

| D. Config/state/cleanup
| (..)
| * Docs: `docs/...`
* CI: ...
* Notes: ...

| E. Release/reproducibility
| (..)
| * Release assets: ...
* Attestations: ...
* Notes: ...

| F. Seam + surface checks (conformance/interop)
| (..)
| * Conformance: ...
* Interop matrix: ...
* Notes: ...

| G.
Smoothing docs + usability
| (..)
| * Troubleshooting: ...
* Mental model: ...
* Notes: ...
|===

==== Regressions since last evaluation

* None / list issues
  - (issue/PR link) — what regressed, impact, mitigation

==== Top 5 next actions (ranked)

. [ ] (action) — link to issue/PR, owner, target milestone
. [ ] ...
. [ ] ...
. [ ] ...
. [ ] ...

==== Notes

* Anything learned during evaluation that should become a checklist item:
  - ...