# Ubiquitous Language

Canonical terminology for the WriterAIScore project (browser extension surfacing
author-trust signals on book marketplaces). Use these terms consistently in
code, copy, docs, and commit messages.

## Actors

| Term                 | Definition                                                                                          | Aliases to avoid                       |
|----------------------|-----------------------------------------------------------------------------------------------------|----------------------------------------|
| **Buyer**            | A person at the point of purchase decision on a Marketplace. The primary user of the extension.     | consumer, shopper, user                |
| **Reader**           | A person who has already consumed a purchased work. Distinct from Buyer.                            | consumer, end-user                     |
| **Author**           | The person or identity named on a book's byline. May be an Original Author or a Synthetic Author.   | writer, creator                        |
| **Original Author**  | A real human whose published work has been appropriated into Laundered Content.                     | victim, source author                  |
| **Synthetic Author** | A pseudonymous identity with no verifiable human behind it, used to publish AI-generated books.     | fake author, bot, AI persona, impostor |
| **Fraudster**        | The unknown actor or organization operating one or more Synthetic Authors.                          | bad actor, scammer, unethical actor    |
| **Publisher**        | An organization that vouches for and distributes authored content.                                  | press, imprint                         |
| **Marketplace**      | A retail platform that lists books for sale (Amazon, Goodreads, Barnes & Noble, Apple Books, Kobo). | platform, store, site                  |

## The fraud

| Term                     | Definition                                                                                                           | Aliases to avoid                                     |
|--------------------------|----------------------------------------------------------------------------------------------------------------------|------------------------------------------------------|
| **Laundered Content**    | Published material algorithmically reshuffled from one or more Original Authors' work to evade plagiarism detection. | recycled content, rewritten content, scraped content |
| **Pseudonymous Listing** | A Marketplace listing attributed to a Synthetic Author.                                                              | fake listing, bot listing                            |
| **Attribution Loss**     | The condition where an Original Author receives no credit or royalty for content derived from their work.            | plagiarism (too narrow), theft (too broad)           |

## Trust machinery

| Term                     | Definition                                                                                                                       | Aliases to avoid                         |
|--------------------------|----------------------------------------------------------------------------------------------------------------------------------|------------------------------------------|
| **Trust Signal**         | A single, observable, quantifiable input that informs a Buyer's judgement about authorship authenticity.                         | indicator, hint, flag, metric            |
| **Trust Score**          | A weighted composite of multiple Trust Signals. Distinct from a single Signal.                                                   | rating, grade, confidence                |
| **Publication Velocity** | Number of books attributed to an Author over a fixed time window (e.g. last 12 months). The primary Trust Signal.                | publishing rate, output rate, throughput |
| **Baseline**             | The typical Publication Velocity for a human Author in a given Genre, used as a comparator.                                      | norm, benchmark, average                 |
| **Genre**                | A book-subject category that scopes the Baseline (e.g. romance, cozy mystery, technical non-fiction). Sourced from Marketplace metadata. | category, subject, vertical              |
| **Burst**                | A cluster of an Author's publications concentrated in an abnormally short window (e.g. ≥5 books in 30 days). A Trust Signal derived from Publication Velocity. | spike, flood                             |
| **Topic-Spread Entropy** | A Trust Signal measuring how unrelated an Author's book subjects are.                                                            | topic diversity, subject spread          |
| **Provenance**           | Cryptographic attestation of a work's origin and edit history, carried by a C2PA Manifest.                                       | source, origin                           |
| **C2PA Manifest**        | The concrete Provenance record attached to a work, per the C2PA specification (Coalition for Content Provenance and Authenticity). | manifest, attestation, C2PA record       |

## Verification strategies

| Term                   | Definition                                                                                                                 | Aliases to avoid    |
|------------------------|----------------------------------------------------------------------------------------------------------------------------|---------------------|
| **Verify the Human**   | Strategy to confirm an Author is a real person with verifiable identity (ORCID, government ID, institutional affiliation). | identity check, KYC |
| **Verify the Content** | Strategy to confirm a work's origin and authenticity (C2PA, similarity scans, disclosure labels).                          | content auth        |
| **Verify the Signal**  | Strategy to confirm that surface trust signals (reviews, badges, listings) are themselves trustworthy.                     | signal auth         |

## Extension anatomy

| Term               | Definition                                                                                                     | Aliases to avoid               |
|--------------------|----------------------------------------------------------------------------------------------------------------|--------------------------------|
| **Info Card**      | The UI element the extension injects into a Marketplace page to display Trust Signals to the Buyer. Placement is defined per-Marketplace by the Adapter; the target is adjacent to the Author byline, not overlaying the purchase controls. | widget, tooltip, popup, banner |
| **Content Script** | The browser-extension component that runs on Marketplace pages, detects the Author, and renders the Info Card. | injection, page script         |
| **Lookup**         | A query to a book-metadata API (Google Books, OpenLibrary) resolving an Author name to their book list.        | fetch, query, API call         |
| **Author Cache**   | Local per-browser storage of prior Lookup results, keyed by Author name, with a TTL.                           | local store, cache             |
| **Adapter**        | A per-Marketplace module that knows how to extract the Author from that site's DOM.                            | scraper, parser, selector      |
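
As a rough illustration of the **Author Cache** entry, the sketch below keeps prior Lookup results with a TTL. It is a minimal sketch under stated assumptions, not the project's implementation: the class name `AuthorCache`, the injectable clock, and the in-memory `Map` (standing in for whatever browser storage the extension actually uses) are all hypothetical, and the key is treated as an opaque string.

```typescript
// Hypothetical sketch of an Author Cache: prior Lookup results with a TTL.
// None of these names come from the project's code.

interface CacheEntry<T> {
  value: T;
  storedAtMs: number; // clock reading when the Lookup result was cached
}

class AuthorCache<T> {
  // Assumption: an in-memory Map stands in for real per-browser storage.
  private readonly entries = new Map<string, CacheEntry<T>>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, mainly to make expiry testable
  }

  set(key: string, value: T): void {
    this.entries.set(key, { value, storedAtMs: this.now() });
  }

  // Returns the cached value, or undefined when missing or expired.
  get(key: string): T | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() - entry.storedAtMs >= this.ttlMs) {
      this.entries.delete(key); // expired: drop so the next Lookup refreshes it
      return undefined;
    }
    return entry.value;
  }
}

// Seven days, matching the TTL mentioned in the example dialogue.
const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000;
const catalogCache = new AuthorCache<string[]>(SEVEN_DAYS_MS);
```

An entry counts as expired once its age reaches the TTL, so the next request falls through to a fresh Lookup. Whether the key is an Author name or a Marketplace-specific author ID is left open here.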

## Relationships

- A **Marketplace** hosts **Pseudonymous Listings** attributed to **Synthetic
  Authors**.
- A **Fraudster** operates one or more **Synthetic Authors**.
- A **Synthetic Author** produces **Laundered Content** derived from one or more
  **Original Authors**, causing **Attribution Loss**.
- A **Trust Score** aggregates multiple **Trust Signals**; **Publication
  Velocity** is the primary **Trust Signal**.
- A **Burst** is a derived Trust Signal computed from Publication Velocity over
  short windows.
- The **Content Script** uses an **Adapter** to extract an **Author**, issues a
  **Lookup**, caches the result in the **Author Cache**, and renders the **Info
  Card**.
- The extension's three verification strategies, **Verify the Human**, **Verify
  the Content**, and **Verify the Signal**, are phased in across releases.
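
The Content Script relationship above can be sketched as a small pipeline. This is an illustration only: the `Adapter`, `CatalogCache`, `Lookup`, and `InfoCardModel` shapes and the `runContentScript` function are hypothetical names, the page is passed as a plain string instead of a DOM, and the Lookup is modeled as synchronous to keep the sketch short (a real Lookup would be an asynchronous network call).

```typescript
// Hypothetical shapes; none of these names come from the project's code.
interface Adapter {
  // Extracts the Author from a Marketplace page, or null if none is found.
  extractAuthor(pageHtml: string): string | null;
}

interface CatalogCache {
  get(author: string): string[] | undefined;
  set(author: string, titles: string[]): void;
}

// Resolves an Author to their book list. Synchronous here for brevity only.
type Lookup = (author: string) => string[];

interface InfoCardModel {
  author: string;
  titles: string[];
  fromCache: boolean; // illustrative flag, not a documented field
}

function runContentScript(
  pageHtml: string,
  adapter: Adapter,
  cache: CatalogCache,
  lookup: Lookup,
  render: (card: InfoCardModel) => void,
): void {
  const author = adapter.extractAuthor(pageHtml);
  if (author === null) return; // no Author detected: render nothing

  // Consult the Author Cache before issuing a Lookup.
  const cached = cache.get(author);
  if (cached !== undefined) {
    render({ author, titles: cached, fromCache: true });
    return;
  }

  // Cache miss: issue the Lookup, store the result, then render the Info Card.
  const titles = lookup(author);
  cache.set(author, titles);
  render({ author, titles, fromCache: false });
}
```

Passing the Adapter, cache, Lookup, and renderer in as parameters keeps each term from the glossary a separate, swappable piece, which is one way to support a different Adapter per Marketplace.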

## Example dialogue

> **Dev:** "When the Content Script lands on an Amazon `/dp/` page, do we
> always fetch the `/author/` page, or do we hit the Author Cache first?"

> **Domain expert:** "Cache first, seven-day TTL, keyed by Amazon author
> ID. The catalog for a given Author doesn't move minute-to-minute, and
> fetching aggressively would look like automated access."

> **Dev:** "Got it. And if the parsed catalog shows forty books in the last
> year — we label the Author a Synthetic Author in the Info Card?"

> **Domain expert:** "No. We show the count, attribute it *per Amazon's
> catalog*, and let the Buyer decide. Labelling an Author a Synthetic
> Author without proof is a legal risk and defeats the neutral-Signal
> framing."

> **Dev:** "Do we show a Baseline alongside for comparison?"

> **Domain expert:** "Not in v0.1. 2026-04-17 decision — raw counts only,
> rolling 12mo and calendar year, side by side. A Baseline is contestable
> (what counts as 'typical'?), and the bet is that two raw counts with a
> visible *see full catalog* link are enough for the Buyer to judge.
> Revisit after observing usage."

> **Dev:** "So the Info Card is Verify the Signal, not Verify the Human?"

> **Domain expert:** "For v0.1, yes. Verify the Human arrives in v0.4
> when we pull from Wikipedia, Goodreads, and ORCID. Keep the MVP
> strictly to raw Publication Velocity from Amazon's own catalog."
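
The two raw counts discussed in the dialogue (rolling 12 months and calendar year) could be computed along these lines. This is a sketch under assumptions the document does not settle: dates are compared in UTC, "calendar year" is read as the year containing the reference date, publication dates after the reference date are skipped, and the names `VelocityCounts` and `publicationVelocity` are hypothetical.

```typescript
interface VelocityCounts {
  rolling12Months: number; // books published in the 12 months up to asOf
  calendarYear: number;    // books published in asOf's calendar year (UTC)
}

// Counts raw Publication Velocity for one Author from a list of publication dates.
function publicationVelocity(publicationDates: Date[], asOf: Date): VelocityCounts {
  // Start of the rolling window: the same instant one year earlier (exclusive).
  const windowStart = new Date(asOf.getTime());
  windowStart.setUTCFullYear(asOf.getUTCFullYear() - 1);

  const year = asOf.getUTCFullYear();
  let rolling12Months = 0;
  let calendarYear = 0;

  for (const date of publicationDates) {
    const t = date.getTime();
    if (t > asOf.getTime()) continue; // assumption: not-yet-published titles are skipped
    if (t > windowStart.getTime()) rolling12Months += 1;
    if (date.getUTCFullYear() === year) calendarYear += 1;
  }
  return { rolling12Months, calendarYear };
}
```

The function returns counts only. It computes no Baseline, threshold, or label, which keeps it within the v0.1 scope described in the dialogue.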

## Versioned scope

The glossary is the full domain vocabulary, but only a subset is active in any
given release. Terms outside the active version remain canonical but
unimplemented — do not write code, UI copy, or docs that assume them.

| Version | Surface(s)                                  | Active Trust Signals (raw, no composite)              | Active Strategy           | Notable inactive terms                                                                             |
|---------|---------------------------------------------|-------------------------------------------------------|---------------------------|----------------------------------------------------------------------------------------------------|
| v0.1    | Amazon `/dp/` product page                  | Publication Velocity (rolling 12mo + calendar year)   | Verify the Signal         | Baseline, Burst, Topic-Spread Entropy, Genre, Trust Score, Provenance, Verify the Human            |
| v0.2    | + Amazon `/author/` page with timeline      | (same Signals; timeline is visualization, not a new Signal) | Verify the Signal   | (as v0.1)                                                                                          |
| v0.3    | + Amazon search results                     | (same Signals)                                        | Verify the Signal         | (as v0.1)                                                                                          |
| v0.4    | (same surfaces)                             | + Wikipedia / Goodreads / ORCID presence              | Verify the Signal + Human | Trust Score (still not composed), C2PA Manifest, Baseline (possibly never shipped)                 |
| Later   | Multi-Marketplace                           | Composed Trust Score, C2PA Manifest checks            | All three strategies      | —                                                                                                  |

*On Baseline:* the 2026-04-17 grilling decided to ship v0.1 without a
Baseline comparator. The Info Card shows raw counts; the Buyer brings
the context. Whether a Baseline (editorial or computed) is ever added
is an open question to revisit after observing v0.1 usage. Until then,
Baseline remains canonically defined here but inactive in every shipped
version.

*On the roadmap shape:* v0.2–v0.4 are organized around *surface and
data-source growth*, not new Trust Signals. Identity verification
(ORCID, Wikipedia) arrives in v0.4, once the backend that can
cross-reference those sources lands, not in v0.3 as earlier plans
suggested.

Three standing rules that follow from this:

- Do not introduce UI copy referencing a **Trust Score** until the composite
  exists. v0.1 ships Signals, not a Score — despite the repository name.
- Do not label an Author a **Synthetic Author** in the Info Card. The Buyer
  draws that conclusion from the Signal; the extension does not.
- Do not introduce a **Baseline** comparator in v0.1 — no "typical author"
  number, no percentile, no colour coding, no threshold highlight. The
  Info Card shows raw counts, attributed *per Amazon*. Revisit after v0.1
  usage.
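
To make the standing rules concrete, here is a hedged sketch of how v0.1 Info Card copy might be assembled and checked. The wording of the strings, the `RawCounts` shape, and both function names are hypothetical, not the project's actual UI copy, and the forbidden-term list is only a rough reading of the three rules.

```typescript
interface RawCounts {
  rolling12Months: number;
  calendarYear: number;
  year: number; // the calendar year the second count refers to
}

function pluralize(n: number): string {
  return n === 1 ? "book" : "books";
}

// Builds two raw-count lines, each attributed to Amazon's catalog.
// No Score, no label on the Author, no Baseline or comparison.
function infoCardLines(counts: RawCounts): string[] {
  return [
    `${counts.rolling12Months} ${pluralize(counts.rolling12Months)} in the last 12 months, per Amazon's catalog`,
    `${counts.calendarYear} ${pluralize(counts.calendarYear)} in ${counts.year}, per Amazon's catalog`,
  ];
}

// Rough guard: phrases the standing rules keep out of v0.1 Info Card copy.
const FORBIDDEN_IN_V01_COPY = ["trust score", "synthetic author", "typical", "percentile"];

// Returns the forbidden phrases found in a piece of copy (empty if none).
function violatesStandingRules(copy: string): string[] {
  const lower = copy.toLowerCase();
  return FORBIDDEN_IN_V01_COPY.filter((term) => lower.indexOf(term) !== -1);
}
```

A check like `violatesStandingRules` is a plain substring match, so it can only flag wording. It cannot catch visual comparators such as colour coding or threshold highlights, which still need review.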

## Flagged ambiguities

- **"buyer" vs "reader" vs "consumer"** — the extension targets the **Buyer** at
  point of purchase. "Reader" is post-purchase. "Consumer" is too vague and
  should be avoided entirely in product copy.
- **"author" overloaded** — in the conversation, "author" meant both real humans
  and AI-manufactured identities. Always qualify as **Original Author** or
  **Synthetic Author** when the distinction matters; plain **Author** is
  acceptable only when the status is unknown or irrelevant.
- **"fake" vs "synthetic" vs "AI-generated"** — use **Synthetic Author** in docs
  and UI copy when naming the concept, never as a label on a specific Author
  (see the standing rules under Versioned scope). "Fake" is legally risky
  without proof (defamation exposure). "AI-generated" presumes proof of AI
  involvement that the extension cannot establish.
- **"score" vs "signal"** — a **Trust Signal** is a single input; a **Trust
  Score** is a composite. The conversation used them loosely; tighten this in
  all code and copy. MVP ships Signals only, no Score.
- **"platform" vs "marketplace"** — canonical is **Marketplace**. "Platform" is
  ambiguous (Amazon, KDP, AWS all qualify).
- **"bot"** — informal and imprecise. Use **Synthetic Author** when referring to
  the pseudonymous identity, **Fraudster** when referring to the operator
  behind it.
- **"score" as product name, resolved** — the repository is `WriterAIScore`
  (commit-history continuity). The public-facing product name is
  *WhoWroteThis*. Code identifiers and internal docs may use *WriterAIScore*;
  user-visible copy, the manifest `name` field, and the Chrome Web Store
  listing use *WhoWroteThis*. Decision date: 2026-04-17. See `grill.org`.
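
The "Aliases to avoid" columns lend themselves to a mechanical check on copy, docs, and commit messages. The sketch below is one possible shape for such a check, not an existing project tool: `ALIAS_PAIRS` holds a handful of alias and canonical pairs copied from the tables above (it is deliberately not exhaustive, and it leaves out aliases such as "consumer" and "bot" that map to more than one canonical term), and `findAliases` is a hypothetical name.

```typescript
// A few [alias, canonical term] pairs taken from the glossary tables.
const ALIAS_PAIRS: Array<[string, string]> = [
  ["shopper", "Buyer"],
  ["scammer", "Fraudster"],
  ["platform", "Marketplace"],
  ["widget", "Info Card"],
  ["scraper", "Adapter"],
  ["fake author", "Synthetic Author"],
];

interface AliasFinding {
  alias: string;     // the discouraged term that was matched
  canonical: string; // the glossary term to use instead
  index: number;     // character offset of the match in the text
}

// Finds whole-word, case-insensitive uses of the aliases (with optional plural "s").
function findAliases(text: string): AliasFinding[] {
  const findings: AliasFinding[] = [];
  for (const [alias, canonical] of ALIAS_PAIRS) {
    // Aliases contain only letters and spaces, so no regex escaping is needed.
    const pattern = new RegExp("\\b" + alias + "s?\\b", "gi");
    let match: RegExpExecArray | null = pattern.exec(text);
    while (match !== null) {
      findings.push({ alias, canonical, index: match.index });
      match = pattern.exec(text);
    }
  }
  // Report findings in the order they appear in the text.
  return findings.sort((a, b) => a.index - b.index);
}
```

Word-boundary matching keeps "shows" from being flagged as "shopper", but a check like this still produces false positives (for example, "platform" in a legitimate reference to AWS), so its output is a prompt for review, not a verdict.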