credential-rotation-playbook.md
---
title: Credential Rotation Playbook
category: security
last_verified: 2026-03-18
related_files:
  - .env.secrets.example
  - .env
  - docs/plans/distributed-agent-system.md
  - docs/plans/ironclaw-setup.md
tags: [security, credentials, rotation, ops]
status: active
---

# Credential Rotation Playbook

Solo-operator playbook for rotating all credentials used by the 333Method pipeline. Designed for a NixOS host + Docker (Claude Code) environment with SQLite and Cloudflare Workers.

## Secrets Storage Rules

- **Never commit secrets to git.** `.env.secrets` is gitignored. If you suspect a secret was committed, treat it as compromised and rotate immediately.
- **Production secrets** are managed via SOPS (`333Method-infra/secrets/production.yaml`). After rotating any credential, update SOPS as well.
- **Back up secrets** in a password manager (Bitwarden, 1Password, KeePassXC). Store each credential with: service name, creation date, rotation date, and the dashboard URL for regeneration.
- **Shared secrets** (used by both pipeline and web infra): `AUDITANDFIX_WORKER_SECRET`, `UNSUBSCRIBE_SECRET`, `RESEND_WEBHOOK_SECRET`. These require coordinated updates across `.env.secrets` AND Cloudflare Worker env vars / Hostinger `.htaccess`.
- **2Step shares secrets** via `~/code/2Step/src/utils/load-env.js`, which loads from 333Method's env files. Rotating Twilio or Resend keys affects 2Step too.
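
A small pre-flight check can catch the local half of an unfinished coordinated update. The sketch below is illustrative only: `parseEnv` and `missingSharedSecrets` are hypothetical helpers, not part of the pipeline, and it assumes `.env.secrets` uses plain `KEY=VALUE` lines with no multi-line values. It only inspects the local file; the Cloudflare and Hostinger sides still have to be checked separately.

```javascript
// Names taken from the "Shared secrets" rule above.
const SHARED_SECRETS = [
  'AUDITANDFIX_WORKER_SECRET',
  'UNSUBSCRIBE_SECRET',
  'RESEND_WEBHOOK_SECRET',
];

// Parse dotenv-style text into an object.
// Assumption: one KEY=VALUE per line; blank lines and # comments are skipped.
function parseEnv(text) {
  const env = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === '' || line.startsWith('#')) continue;
    const eq = line.indexOf('=');
    if (eq <= 0) continue; // no key before "=", or no "=" at all
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    // Strip one pair of matching surrounding quotes, if present.
    const first = value[0];
    if (value.length >= 2 && (first === '"' || first === "'") && value[value.length - 1] === first) {
      value = value.slice(1, -1);
    }
    env[key] = value;
  }
  return env;
}

// Return the shared secrets that are absent or empty in the given env text.
function missingSharedSecrets(envText) {
  const env = parseEnv(envText);
  return SHARED_SECRETS.filter((name) => !env[name]);
}
```

Feeding it the contents of `.env.secrets` (for example via `fs.readFileSync`) returns an empty array when all three shared secrets are set.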

---

## Rotation Schedule

| Credential                  | Interval          | Rationale / trigger                  |
| --------------------------- | ----------------- | ------------------------------------ |
| `OPENROUTER_API_KEY`        | Quarterly         | Billing cycle or suspected leak      |
| `ZENROWS_API_KEY`           | Annually          | Low risk (read-only scraping)        |
| `TWILIO_AUTH_TOKEN`         | Quarterly         | Financial risk (SMS costs money)     |
| `RESEND_API_KEY`            | Quarterly         | Reputation risk (email sending)      |
| `RESEND_WEBHOOK_SECRET`     | Annually          | Low risk (inbound verification only) |
| `PAYPAL_CLIENT_SECRET`      | Quarterly         | Financial risk (payment processing)  |
| `GOOGLE_SHEETS_PRIVATE_KEY` | Annually          | Low risk (internal reporting)        |
| `AUDITANDFIX_WORKER_SECRET` | Semi-annually     | Shared secret, coordinated rotation  |
| `UNSUBSCRIBE_SECRET`        | Annually          | Low risk (HMAC signing)              |
| `DATAFORSEO_PASSWORD`       | Annually          | Low risk                             |
| `ZEROBOUNCE_API_KEY`        | Annually          | Low risk                             |
| `FIXER_API_KEY`             | Annually          | Low risk (free tier)                 |
| `PEXELS_API_KEY`            | Annually          | Low risk (2Step, image search)       |
| SSH keys                    | Annually          | Or on any suspected host compromise  |
| `NOPECHA_API_KEY`           | Never (free tier) | Only on compromise                   |

**Calendar reminder:** Set a quarterly reminder (1st of Jan/Apr/Jul/Oct) to run through the quarterly rotations. Rehearse the full restore path at least once per quarter (see Testing the Restore Path).

---

## Per-Service Rotation Steps

### OPENROUTER_API_KEY

1. **Generate:** [openrouter.ai/keys](https://openrouter.ai/keys) -- create the new key before revoking the old one.
2. **Update:** `~/.../333Method/.env.secrets` -- replace `OPENROUTER_API_KEY=...`
3. **Verify:** `node -e "require('./src/utils/load-env')(); const k=process.env.OPENROUTER_API_KEY; console.log('Key starts with:', k?.slice(0,8));"` then run a single scoring pass: `SCORE_SITES_BATCH=1 node src/score-sites.js`
4. **Revoke old key:** Back in the OpenRouter dashboard, delete the previous key.
5. **Downtime risk:** None if you create-then-revoke. Pipeline stages will fail with 401 if the key is invalid.
6. **SOPS:** Update `333Method-infra/secrets/production.yaml`.

### ZENROWS_API_KEY

1. **Generate:** [app.zenrows.com](https://app.zenrows.com/) -- API Keys section.
2. **Update:** `.env.secrets`.
3. **Verify:** `SERP_BATCH=1 node src/serp-scraper.js` -- confirm one SERP fetch succeeds.
4. **Revoke old key** in ZenRows dashboard.
5. **Downtime risk:** None. SERP scraping can pause briefly.

### TWILIO_AUTH_TOKEN (and SID)

1. **Generate:** [console.twilio.com](https://console.twilio.com/) -- Account > API keys & tokens > Auth tokens. You can create a secondary auth token before revoking the primary.
2. **Update:** `.env.secrets` -- `TWILIO_ACCOUNT_SID` (rarely changes) and `TWILIO_AUTH_TOKEN`.
3. **Also update:** Twilio test credentials if rotating those too.
4. **Verify:** `node -e "require('./src/utils/load-env')(); const twilio = require('twilio'); const c = twilio(process.env.TWILIO_ACCOUNT_SID, process.env.TWILIO_AUTH_TOKEN); c.api.accounts(process.env.TWILIO_ACCOUNT_SID).fetch().then(a => console.log('OK:', a.friendlyName)).catch(e => console.error('FAIL:', e.message));"`
5. **Revoke:** Promote the new token to primary, then revoke the old one.
6. **Downtime risk:** Minimal. Twilio supports secondary tokens for zero-downtime rotation. Inbound SMS webhook verification will fail if the token is wrong -- check webhook logs.

### RESEND_API_KEY

1. **Generate:** [resend.com/api-keys](https://resend.com/api-keys) -- create new key.
2. **Update:** `.env.secrets` -- `RESEND_API_KEY`.
3. **Verify:** Check that the key loads: `OUTREACH_DRY_RUN=true node -e "require('./src/utils/load-env')(); /* check key loads */"` (the comment is a placeholder for your own check), or trigger one real email send to your own address for an end-to-end test.
4. **Revoke old key** in Resend dashboard.
5. **Downtime risk:** Email sends will fail with 401 if the old key is revoked before the pipeline has picked up the new one. Keep that gap short.
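
The `/* check key loads */` placeholder in the verify step above could be filled with something like the following. This is a minimal sketch, not the pipeline's own code: `describeSecret` is a hypothetical helper that reports whether a variable is set while revealing only its length and a short prefix, so the output is safe to paste into a terminal log.

```javascript
// Summarise one credential without printing it in full.
// At most `visibleChars` characters are shown, and never more than a
// quarter of the value, so very short secrets are not leaked.
function describeSecret(name, env, visibleChars = 4) {
  const value = env[name];
  if (typeof value !== 'string' || value.trim() === '') {
    return `${name}: MISSING`;
  }
  const shown = Math.min(visibleChars, Math.floor(value.length / 4));
  return `${name}: loaded (${value.length} chars, starts with "${value.slice(0, shown)}")`;
}
```

After calling the playbook's `require('./src/utils/load-env')()`, a line such as `console.log(describeSecret('RESEND_API_KEY', process.env))` would confirm the new value is the one being picked up.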

### RESEND_WEBHOOK_SECRET

1. **Generate:** [resend.com/webhooks](https://resend.com/webhooks) -- edit webhook, regenerate signing secret.
2. **Update:** `.env.secrets` -- `RESEND_WEBHOOK_SECRET=whsec_...`
3. **Also update:** If using a Cloudflare Worker for webhook relay (`EMAIL_EVENTS_WORKER_URL`), update the worker's environment variable too: `cd workers/resend-webhook && wrangler secret put RESEND_WEBHOOK_SECRET`.
4. **Verify:** Trigger a test event from Resend dashboard; check webhook logs for successful signature validation.
5. **Downtime risk:** Inbound email events (bounces, complaints) will be rejected until both sides match. No impact on outbound sending.

### PAYPAL_CLIENT_ID / PAYPAL_CLIENT_SECRET

1. **Generate:** [developer.paypal.com/dashboard/applications](https://developer.paypal.com/dashboard/applications/) -- create new REST API app (or regenerate secret on existing app).
2. **Update:** `.env.secrets` -- both `PAYPAL_CLIENT_ID` and `PAYPAL_CLIENT_SECRET`.
3. **Also update:** Cloudflare Worker env vars if the worker uses these directly: `cd workers/paypal-webhook && wrangler secret put PAYPAL_CLIENT_SECRET`.
4. **Verify:** `node -e "require('./src/utils/load-env')(); const { getAccessToken } = require('./src/payment/paypal-client'); getAccessToken().then(t => console.log('OK: token length', t.length)).catch(e => console.error('FAIL:', e.message));"`
5. **Downtime risk:** Payment processing will fail during rotation. Do this during off-hours (Sydney late night). Webhook verification may also break if the worker secret is out of sync.

### GOOGLE_SHEETS_PRIVATE_KEY / CLIENT_EMAIL

1. **Generate:** [console.cloud.google.com](https://console.cloud.google.com/) -- IAM & Admin > Service Accounts > your account > Keys > Add Key > JSON.
2. **Update:** `.env.secrets` -- extract `client_email` and `private_key` from the downloaded JSON. Preserve `\n` characters in the private key value.
3. **Verify:** `node -e "require('./src/utils/load-env')(); const { getAuthClient } = require('./src/utils/google-sheets'); getAuthClient().then(() => console.log('OK')).catch(e => console.error('FAIL:', e.message));"`
4. **Revoke:** Delete old key in GCP console.
5. **Downtime risk:** None. Sheets reporting is non-critical.

### AUDITANDFIX_WORKER_SECRET

This is a shared secret between the pipeline (`reply-processor.js` POSTs to `api.php`) and the Hostinger PHP backend. Both sides must match.

1. **Generate:** `openssl rand -hex 32`
2. **Update ALL locations:**
   - `.env.secrets` -- `AUDITANDFIX_WORKER_SECRET=<new value>`
   - Hostinger `.htaccess` -- `SetEnv AUDITANDFIX_WORKER_SECRET <new value>` (edit via Hostinger File Manager or SSH)
   - Any Cloudflare Workers that use this secret: `wrangler secret put AUDITANDFIX_WORKER_SECRET`
3. **Verify:** Trigger a prefill store from the pipeline and confirm the PHP endpoint accepts it (check HTTP response code).
4. **Downtime risk:** Prefill short URLs (`/o/{site_id}`) will return 403 if the secrets are mismatched. Update Hostinger first, then pipeline, to minimize the gap.

### UNSUBSCRIBE_SECRET

1. **Generate:** `openssl rand -hex 32`
2. **Update:** `.env.secrets` and the Cloudflare unsubscribe worker: `cd workers/unsubscribe && wrangler secret put UNSUBSCRIBE_SECRET`.
3. **Verify:** Generate an unsubscribe link and confirm it resolves correctly.
4. **Downtime risk:** Existing unsubscribe links in already-sent emails will break if the HMAC key changes. Consider this carefully -- you may want to support both old and new keys for a grace period, or accept that old links will fail.

### DATAFORSEO_LOGIN / DATAFORSEO_PASSWORD

1. **Generate:** [app.dataforseo.com/](https://app.dataforseo.com/) -- Account Settings > API credentials.
2. **Update:** `.env.secrets`.
3. **Verify:** Run keyword validation: `node -e "require('./src/utils/load-env')(); /* test DataForSEO call */"`
4. **Downtime risk:** None. Keyword research can pause.

### SSH Keys

1. **Generate:** `ssh-keygen -t ed25519 -C "jason@nixos-$(date +%Y%m%d)"` on the host.
2. **Update:** Add new public key to `~/.ssh/authorized_keys` on remote hosts and GitHub deploy keys.
3. **Verify:** Confirm that `ssh -i ~/.ssh/new_key host` works.
4. **Revoke:** Remove old public key from `authorized_keys` and GitHub.
5. **Downtime risk:** You can lock yourself out if you remove the old key before verifying the new one. Always test first.

---

## Emergency Rotation (Suspected Compromise)

If you suspect any credential has leaked (committed to git, visible in logs, unauthorized API usage), follow this order. Rotate highest blast-radius credentials first.

### Priority order:

1. **PAYPAL_CLIENT_SECRET** -- financial loss. Revoke immediately in PayPal dashboard, then regenerate.
2. **TWILIO_AUTH_TOKEN** -- financial loss (SMS charges). Revoke in Twilio console.
3. **SSH keys** -- full host access. Remove compromised public key from all `authorized_keys` files.
4. **OPENROUTER_API_KEY** -- API billing. Revoke in OpenRouter dashboard.
5. **RESEND_API_KEY** -- domain reputation damage (spam). Revoke in Resend dashboard.
6. **AUDITANDFIX_WORKER_SECRET** -- could allow unauthorized prefill injection. Rotate on Hostinger first.
7. **Everything else** -- rotate in any order.
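
When several credentials leak at once (for example, a committed env file), the priority list above can be applied mechanically. The following is a sketch under that reading: `emergencyOrder` is a hypothetical helper, and the ranking array simply mirrors the list above, with anything unlisted falling into the "everything else" bucket.

```javascript
// Ranking copied from the priority order above (highest blast radius first).
const EMERGENCY_PRIORITY = [
  'PAYPAL_CLIENT_SECRET',
  'TWILIO_AUTH_TOKEN',
  'SSH keys',
  'OPENROUTER_API_KEY',
  'RESEND_API_KEY',
  'AUDITANDFIX_WORKER_SECRET',
];

// Sort exposed credential names into rotation order.
// Duplicates are dropped; unlisted names go last and keep their input order
// (Array.prototype.sort is stable in current Node.js versions).
function emergencyOrder(exposed) {
  const rank = (name) => {
    const index = EMERGENCY_PRIORITY.indexOf(name);
    return index === -1 ? EMERGENCY_PRIORITY.length : index;
  };
  return [...new Set(exposed)].sort((a, b) => rank(a) - rank(b));
}
```

Passing the names found in a leaked file yields the order in which to revoke them.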

### Emergency checklist:

- [ ] Identify which credential(s) were exposed and how
- [ ] Revoke the exposed credential immediately (don't wait to generate the replacement)
- [ ] Check service dashboards for unauthorized usage (API call logs, billing spikes)
- [ ] Generate replacement credential
- [ ] Update `.env.secrets` on the pipeline host
- [ ] Update SOPS (`333Method-infra/secrets/production.yaml`)
- [ ] Update any Cloudflare Workers that use the credential (`wrangler secret put ...`)
- [ ] Update Hostinger if the credential is a shared secret
- [ ] Restart the pipeline service: `systemctl --user restart 333method-pipeline`
- [ ] Verify pipeline is healthy: `bash scripts/monitoring-checks.sh`
- [ ] If the leak was a git commit: rewrite history with `git filter-repo` or `BFG`, force-push, and treat every credential in that file as compromised
- [ ] Document the incident: what leaked, when, what was rotated, any evidence of misuse

---

## Testing the Restore Path

Run this quarterly to verify you can rotate credentials without breaking things. Do this during a maintenance window (no active outreach sends).

### Quick verification (15 min)

For each credential you rotated, run its verify step from the per-service section above. At minimum, test these critical paths:

- [ ] LLM calls work: `SCORE_SITES_BATCH=1 node src/score-sites.js` (uses OpenRouter or Anthropic)
- [ ] Email works: send a test email to yourself via Resend
- [ ] SMS works: send a test SMS to yourself via Twilio
- [ ] Sheets work: `node -e "require('./src/utils/load-env')(); require('./src/utils/google-sheets').getAuthClient().then(() => console.log('OK'))"`
- [ ] Pipeline overall: `bash scripts/monitoring-checks.sh` -- no errors

### Canary approach

If you want to test a new key before cutting over:

1. Set the new key in a separate env var (e.g., `OPENROUTER_API_KEY_NEW=sk-or-...`)
2. Manually test with: `OPENROUTER_API_KEY=$OPENROUTER_API_KEY_NEW node -e "..."`
3. Once verified, swap into the real variable and restart the pipeline.

### Rollback

If a new credential breaks something:

1. Restore the old key from your password manager backup.
2. Update `.env.secrets`.
3. Restart: `systemctl --user restart 333method-pipeline`
4. Investigate why the new key failed before trying again.

There is no staging environment -- this is a solo-operator stack. The canary approach above is the closest equivalent. Keep the old key in your password manager for at least 24 hours after rotation.

---

## IRONCLAW Isolation (TODO)

Per the distributed-agent-system plan, the following isolation measure is planned but not yet implemented:

- **Create `IRONCLAW_OPENROUTER_API_KEY`** with a dedicated OpenRouter sub-account.
- Purpose: if the IronClaw agent framework is compromised, the attacker gets a key with its own rate limit and billing -- not the pipeline's main key.
- Implementation: add the key to `.env.secrets`, configure IronClaw to read `IRONCLAW_OPENROUTER_API_KEY` instead of `OPENROUTER_API_KEY`.
- Set a spending cap on the sub-account in OpenRouter dashboard.
- Add to the rotation schedule as a quarterly credential (same as main OpenRouter key).

This is a blast-radius reduction measure. Until implemented, a single `OPENROUTER_API_KEY` is shared across all consumers.

---

## Automation Potential

About **30–40% of the rotation workflow is automatable** -- primarily detection and verification, not key generation itself (most service dashboards don't expose a rotation API).
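
To illustrate the detection side, here is a minimal sketch of an overdue check. It assumes metadata entries shaped like `{service, last_rotated, interval_days}`, the fields this playbook proposes for `credentials-metadata.json`, with `last_rotated` as an ISO date string. The function name `overdueCredentials` is hypothetical, and loading the file or raising alerts is deliberately left out.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Return credentials that are at least one full day past their rotation date,
// most overdue first.
// Note: an entry with an unparseable date or missing interval produces NaN
// and is dropped by the filter; a real script should probably flag those.
function overdueCredentials(entries, now = new Date()) {
  return entries
    .map((entry) => {
      const dueAt = Date.parse(entry.last_rotated) + entry.interval_days * MS_PER_DAY;
      const daysOverdue = Math.floor((now.getTime() - dueAt) / MS_PER_DAY);
      return { service: entry.service, daysOverdue };
    })
    .filter((entry) => entry.daysOverdue > 0)
    .sort((a, b) => b.daysOverdue - a.daysOverdue);
}
```

Passing `now` explicitly keeps the function easy to test; in a cron job the default of the current time would be used.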

### What can be automated now

**Rotation reminder cron** -- a `scripts/check-rotation-schedule.js` that reads a `credentials-metadata.json` tracking `{service, last_rotated, interval_days}` and creates a human-review queue entry (or a Telegram alert via IronClaw) when any credential is overdue. The only manual step is keeping the metadata file updated after each rotation.

**Post-rotation verification** -- a `scripts/verify-credentials.js` that makes a lightweight test call for each active API key and reports pass/fail. Run this after rotation to confirm nothing is broken before restarting the pipeline. Could also run on a weekly schedule to catch silently-expired keys.

**Twilio key swap** -- Twilio's API supports creating a secondary auth token, promoting it to primary, and revoking the old one programmatically. Full zero-downtime rotation is scriptable in ~30 lines of Node.js.

**SOPS re-encryption** -- once you've updated `.env.secrets` manually, re-encrypting SOPS is a single command (`sops -e -i secrets/production.yaml`). Scriptable as part of a post-rotation hook.

**Cloudflare Worker secret updates** -- `wrangler secret put <KEY>` is CLI-driven, fully scriptable once you have the new value.

### What stays manual

- **Key generation** for most services (OpenRouter, Anthropic, Resend, PayPal, Google Sheets, ZenRows) -- no rotation API, requires dashboard interaction
- **Hostinger `.htaccess`** -- no API; requires File Manager or SSH
- **SSH key rotation** -- pushing new pubkeys to authorized hosts requires human verification
- **UNSUBSCRIBE_SECRET** -- rotation breaks existing unsubscribe links in sent emails; human judgment required on timing

### Recommended first scripts to build

When the orchestrator is stable, add these as monthly cron batch types:

1. `scripts/check-rotation-schedule.js` -- reads `credentials-metadata.json`, creates human-review items for overdue credentials
2. `scripts/verify-credentials.js` -- test-calls each API, reports failures to human-review queue

These give automated _detection_ while rotation stays manual. Each is ~50 lines.

---

## Post-Rotation Restart Checklist

After any credential rotation:

- [ ] `.env.secrets` updated with new value
- [ ] SOPS updated (`333Method-infra/secrets/production.yaml`)
- [ ] Cloudflare Workers updated (if applicable): `wrangler secret put <KEY_NAME>`
- [ ] Hostinger updated (if applicable): `.htaccess` SetEnv
- [ ] Password manager updated with new value + rotation date
- [ ] Pipeline service restarted: `systemctl --user restart 333method-pipeline`
- [ ] `bash scripts/monitoring-checks.sh` passes
- [ ] Old key revoked in service dashboard (not before verifying new key works)
- [ ] 2Step checked (if rotating Twilio/Resend -- it loads from 333Method's env files)
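
The "(if applicable)" items in the checklist above depend on which credential was rotated. The sketch below shows one way to resolve that per credential. It is an assumption-laden illustration: `postRotationSteps` is a hypothetical helper, and the two sets are inferred from the per-service sections of this playbook (where Worker usage is itself described as conditional), so treat the Cloudflare entries as "check whether a Worker uses this" rather than a definitive mapping.

```javascript
// Credentials whose per-service steps mention a Cloudflare Worker secret.
const CLOUDFLARE_WORKER_SECRETS = new Set([
  'RESEND_WEBHOOK_SECRET',
  'PAYPAL_CLIENT_SECRET',
  'AUDITANDFIX_WORKER_SECRET',
  'UNSUBSCRIBE_SECRET',
]);

// Credentials that are also set in the Hostinger .htaccess.
const HOSTINGER_SECRETS = new Set(['AUDITANDFIX_WORKER_SECRET']);

// 2Step loads Twilio and Resend credentials from 333Method's env files.
const TWO_STEP_PREFIXES = ['TWILIO_', 'RESEND_'];

// Build the checklist that applies to one credential name.
function postRotationSteps(name) {
  const steps = [
    'Update .env.secrets',
    'Update SOPS (333Method-infra/secrets/production.yaml)',
  ];
  if (CLOUDFLARE_WORKER_SECRETS.has(name)) {
    steps.push(`Update Cloudflare Workers: wrangler secret put ${name}`);
  }
  if (HOSTINGER_SECRETS.has(name)) {
    steps.push('Update Hostinger .htaccess SetEnv');
  }
  steps.push(
    'Update password manager (new value + rotation date)',
    'Restart: systemctl --user restart 333method-pipeline',
    'Run: bash scripts/monitoring-checks.sh',
    'Revoke old key in service dashboard',
  );
  if (TWO_STEP_PREFIXES.some((prefix) => name.startsWith(prefix))) {
    steps.push('Check 2Step');
  }
  return steps;
}
```

For example, `postRotationSteps('AUDITANDFIX_WORKER_SECRET').forEach((step) => console.log('- [ ] ' + step))` prints a checklist that includes both the Worker and Hostinger updates.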