# CLAUDE.md: Agent Coding Guidelines

Python 3.11+ · Ruff · mypy --strict · pytest · uv

---

## Architecture

```
cli/           → entry points only (arg parsing, logging setup)
pipeline/      → orchestration: calls services in sequence, no business logic
services/      → domain logic (coaching, scoring, outreach strategy)
integrations/  → one file per external API (brightdata.py, notion.py, anthropic.py)
models/        → Pydantic schemas only, no logic
prompts/       → LLM prompt templates (.txt / .jinja2), not inline f-strings
config.py      → pydantic-settings backed by .env
exceptions.py  → custom exception hierarchy
```

**Layer rules:**
- Services depend on Protocols, never on integration classes directly.
- No API/Notion logic inside scoring or coaching logic.
- No business logic in CLI files.
- One class = one responsibility. >5 unrelated public methods → split it.

**Pre-push hook location:** `.githooks/pre-push` (not `.git/hooks/`).
Activated via `git config core.hooksPath .githooks`.

**Bandit path maintenance:** When adding a new top-level module, add it to
the bandit path list in `.githooks/pre-push`. Bandit does not support
`targets` in `pyproject.toml`, so paths are specified as CLI arguments.

**Vulture whitelist:** When adding Pydantic models with `model_config`,
`@field_validator`/`@model_validator`, or new Protocol methods, run
`uv run vulture`. If it flags false positives, add entries to
`vulture_whitelist.py`.

---

## Configuration & Secrets

- All config lives in `config.py` using `pydantic-settings` + `.env`.
- No hardcoded URLs, model names, API keys, sleep durations, thresholds, or magic numbers.
- Magic numbers → named constants with a comment explaining the value.
- Validate all env vars at startup. Missing required var → raise `ConfigurationError` with the var name.
- All required env vars are documented in `.env.example`.
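Constructing `Settings()` with a required field unset raises pydantic's `ValidationError`, not a domain exception, so the startup path needs a translation step to satisfy the `ConfigurationError` rule. A minimal sketch, assuming pydantic v2 error dicts (the shape returned by `ValidationError.errors()`, with `"type": "missing"` for absent fields) and no `env_prefix`, so a field such as `anthropic_api_key` is read from `ANTHROPIC_API_KEY`. The helper names are hypothetical; in the project, `ConfigurationError` would live in `exceptions.py`.

```python
from collections.abc import Iterable, Mapping
from typing import Any


class ConfigurationError(Exception):
    """Required configuration is missing (would live in exceptions.py)."""

    def __init__(self, var_names: list[str]) -> None:
        self.var_names = var_names
        super().__init__(
            "Missing required environment variable(s): " + ", ".join(var_names)
        )


def missing_env_vars(errors: Iterable[Mapping[str, Any]]) -> list[str]:
    """Map pydantic 'missing' errors to environment variable names.

    Assumes pydantic v2 error dicts and no env_prefix, so the env var name
    is the upper-cased field name.
    """
    return [
        str(err["loc"][0]).upper()
        for err in errors
        if err.get("type") == "missing" and err.get("loc")
    ]


def raise_for_missing(errors: Iterable[Mapping[str, Any]]) -> None:
    """Raise ConfigurationError naming every missing variable, if any."""
    names = missing_env_vars(errors)
    if names:
        raise ConfigurationError(names)
```

At startup, `config.py` could wrap construction as `try: settings = Settings()` / `except ValidationError as exc: raise_for_missing(exc.errors()); raise`. A missing key then surfaces as `ConfigurationError` with its name, while other validation failures (for example a non-integer `POLLING_INTERVAL_SECONDS`) still propagate unchanged.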

```python
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    # pydantic-settings v2 style; a nested `class Config` is the deprecated v1 form.
    model_config = SettingsConfigDict(env_file=".env")

    anthropic_api_key: str
    brightdata_token: str
    notion_token: str | None = None
    llm_model: str = "claude-sonnet-4-6"
    polling_interval_seconds: int = 15


settings = Settings()
```

---

## Error Handling & Exceptions

- Catch specific exceptions only; no bare `except:`.
- No silent failures. If you can't recover: log at ERROR and raise a domain exception.
- Custom exceptions live in `exceptions.py` (e.g., `APIError`, `ConfigurationError`, `ParseError`).
- Validate external data (API responses, JSON) before accessing keys: use `.get()` or parse via Pydantic.
- No `sys.exit()` in library code; only in CLI entry points.

---

## Pydantic Models

- One canonical definition per model in `models/`, split by domain (`models/job.py`, `models/profile.py`).
- Never redefine a model in a second place. Never use bare `dict` when a model exists.
- Use `model_validator` / `field_validator` for invariants.
- Use `ConfigDict(frozen=True)` for value objects.
- Use `@dataclass` for internal data containers that don't cross I/O boundaries.

---

## External API Clients

- Each API gets its own class in `integrations/`, implementing a Protocol for testability.
- Explicit timeouts on all HTTP calls; never open-ended.
- Retry transient failures (5xx, timeout, and 429 rate limiting) with `tenacity` exponential backoff. Other 4xx = permanent → don't retry.
- Log request method + URL at DEBUG; status code at INFO.
- Never log request/response bodies at INFO+ (may contain secrets or PII).

---

## LLM Provider

All providers implement a common Protocol:

```python
from typing import Protocol


class LLMProvider(Protocol):
    def complete(
        self,
        system: str,
        user: str,
        *,
        temperature: float = 0.0,
        seed: int | None = None,
    ) -> str: ...
```

- Model name, temperature, and max tokens come from `settings`, never hardcoded.
- Wrap calls in a retry decorator for transient errors.
- Prompt templates live in `prompts/`, not as inline strings in business logic.

---

## Project-Specific Conventions

- **Language**: English only in all code, comments, docstrings. No French.
- **No emojis** in code, comments, or logs.
- **No commented-out code**: delete it, use git history.
- **Docstrings**: Google style. Include Args, Returns, Raises for non-trivial functions.
- **`__all__`**: Define in every public module.
- **Async**: Pick one model per pipeline (full async or thread-pool). Don't mix.
- **Cache directories**: Version them (`cache/v1/`) so schema changes don't corrupt old data.
- **Dependencies**: `pyproject.toml` + `uv`. Dev tools in `[dependency-groups]`.

---

## Tests

- **Test names must convey intent**: Name describes *what behaviour is verified and why it matters*, not just the method called. Prefer `test_error_entries_excluded_for_rescoring` over `test_error_entries`.
- **Rationale comments on every non-obvious test**: A one-line `#` comment above the test body explaining *why* this test exists: what real-world scenario or edge case it guards against. Trivial happy-path tests (e.g., "returns expected value") may omit the comment if the name is self-explanatory.
- **Class and module docstrings must reflect full scope**: If a test class covers both happy path and error cases, the name and docstring must say so, not just "error handling".
- **No network, no API keys**: Unit tests must run without secrets. Required env vars are set in `tests/conftest.py` with dummy values.

---

## Agent Workflow Rules

These are mandatory behavioral rules for AI agents working on this codebase.

### Spec Required
Do not implement any feature without an approved spec in `specs/active/`. Read the spec first, break the work into atomic steps, then implement.
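A pre-flight check along these lines could list what is in `specs/active/` before any implementation starts, and flag filenames that break the `YYYY_MM_DD-NN_name.md` pattern from the Spec Naming Convention rule. This is a minimal sketch with hypothetical helper names; it assumes the `name` part is lowercase snake_case (as in the documented example `2026_03_12-01_score_jobs.md`) and it cannot tell whether a spec is *approved*, which remains a human decision.

```python
import re
from datetime import date
from pathlib import Path

# YYYY_MM_DD-NN_name.md. Assumption: `name` is lowercase snake_case, as in
# the documented example 2026_03_12-01_score_jobs.md.
SPEC_NAME_RE = re.compile(
    r"(?P<year>\d{4})_(?P<month>\d{2})_(?P<day>\d{2})"
    r"-(?P<seq>\d{2})_(?P<name>[a-z0-9]+(?:_[a-z0-9]+)*)\.md"
)


def is_valid_spec_filename(filename: str) -> bool:
    """Return True if filename follows the spec naming convention."""
    match = SPEC_NAME_RE.fullmatch(filename)
    if match is None:
        return False
    try:
        # Rejects names that match the pattern but are not real dates.
        date(int(match["year"]), int(match["month"]), int(match["day"]))
    except ValueError:
        return False
    return int(match["seq"]) >= 1  # the -NN suffix starts at 01


def active_specs(repo_root: Path) -> list[Path]:
    """List files in specs/active/; for valid names, sort order is chronological."""
    active_dir = repo_root / "specs" / "active"
    if not active_dir.is_dir():
        return []
    return sorted(p for p in active_dir.glob("*.md") if p.is_file())


def misnamed_specs(repo_root: Path) -> list[str]:
    """Return names of active specs that break the naming convention."""
    return [p.name for p in active_specs(repo_root) if not is_valid_spec_filename(p.name)]
```

If `active_specs` returns nothing relevant to the task, the agent stops and asks for a spec. Anything in `misnamed_specs` is reported rather than renamed, which keeps the check consistent with the Permission Gate and Minimal Diff rules.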

### Check Learnings
Before starting any task, read `LEARNINGS.md` for known pitfalls and past mistakes.

### Pattern Sweep
When a review finds one instance of a defect class (bare except, hardcoded value, missing annotation), search **all files in scope** for other occurrences before proposing any fix.

### List Before Fixing
Report all discovered occurrences with file:line references **before** writing code:
```
Found 4 occurrences of bare except:
- services/coach.py:91
- services/coach.py:112
- integrations/brightdata.py:205
- cli/main.py:368
```

### Permission Gate
After listing occurrences, **ask for explicit approval** before applying grouped fixes. No silent batch changes.

### Minimal Diff
Fix only what was asked. Don't refactor surrounding code, rename variables, add docstrings, or clean up style in unrelated lines. Smallest possible diff.

### No Speculative Changes
Don't add error handling, logging, validation, or abstractions for scenarios not present in the code or explicitly requested.

### Spec Naming Convention
Spec files follow the pattern `YYYY_MM_DD-NN_name.md` (e.g., `2026_03_12-01_score_jobs.md`).
The `-NN` suffix (01, 02, …) handles multiple specs created on the same day.

### Archive Spec on Merge
The last commit on every feature branch must move the spec from `specs/active/` to `specs/archived/`. No spec should remain in `specs/active/` after its PR is merged.

---

## Journal Guidelines

* **Location:** Use a single rolling file at `docs/journal.md`.
* **Session Index:** Add a one-line summary with the date (YYYY-MM-DD).
* **Session Entries:** Append new sessions at the end using the template fields: Goal, Steps, Results, Metrics, Decisions, Issues, Next, Artifacts.
* **Artifacts:** Link with repo-relative paths only.
* **Brevity:** Keep entries concise and factual.
* **Process Exception:** Updating `docs/journal.md` does not require an approved spec (e.g., brainstorming or session notes). All other code/config changes still require an approved spec.
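The journal rules can be sketched as a small renderer that emits the one-line index entry and a session entry containing every template field in order, rejecting artifact links that are not repo-relative. The helper names, the `### YYYY-MM-DD` heading, and the `- YYYY-MM-DD: summary` index format are assumptions; the guidelines fix only the field names and the date format.

```python
from datetime import date

# Template fields, in the order the Journal Guidelines list them.
JOURNAL_FIELDS = (
    "Goal", "Steps", "Results", "Metrics",
    "Decisions", "Issues", "Next", "Artifacts",
)


def index_line(day: date, summary: str) -> str:
    """One-line session index entry (hypothetical format)."""
    if "\n" in summary:
        raise ValueError("Index summary must fit on one line")
    return f"- {day.isoformat()}: {summary}"


def is_repo_relative(link: str) -> bool:
    """Reject URLs and absolute paths; artifacts must be repo-relative."""
    return "://" not in link and not link.startswith("/")


def render_entry(day: date, values: dict[str, str], artifacts: list[str]) -> str:
    """Render a session entry with every template field; empty fields get '-'."""
    text_fields = JOURNAL_FIELDS[:-1]  # Artifacts is rendered from `artifacts`
    unknown = set(values) - set(text_fields)
    if unknown:
        raise ValueError(f"Unknown journal fields: {sorted(unknown)}")
    bad = [a for a in artifacts if not is_repo_relative(a)]
    if bad:
        raise ValueError(f"Artifacts must use repo-relative paths: {bad}")
    lines = [f"### {day.isoformat()}", ""]
    for name in text_fields:
        lines.append(f"**{name}:** {values.get(name, '-')}")
    lines.append("**Artifacts:** " + (", ".join(artifacts) or "-"))
    return "\n".join(lines) + "\n"
```

Rendering every field, even when empty, keeps entries uniform and easy to scan, which suits the brevity rule better than free-form prose.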