Move connection/transaction management and the generic _put/_get/_all row access
plus evidence-read helpers into a mixin; CopyrighterStore now inherits it. Methods
rely on the host class's self.db_path / self._write_lock. Behavior-preserving.
sqlite_store.py 3368 -> 3072 lines.
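The host-attribute contract described above can be illustrated with a minimal sketch. RowAccessMixin, DemoStore, and the items table are hypothetical names chosen for this example; only db_path and _write_lock come from the commit text, and the real mixin covers more methods.

```python
import os
import sqlite3
import tempfile
import threading


class RowAccessMixin:
    """Hypothetical mixin: the host class must set db_path and _write_lock."""

    def _connect(self):
        return sqlite3.connect(self.db_path)

    def _put(self, table, key, payload):
        # Table names are code-supplied constants in this sketch, never user input.
        with self._write_lock:
            conn = self._connect()
            try:
                with conn:  # commit on success, roll back on error
                    conn.execute(
                        f"INSERT OR REPLACE INTO {table} (id, payload) VALUES (?, ?)",
                        (key, payload),
                    )
            finally:
                conn.close()

    def _get(self, table, key):
        conn = self._connect()
        try:
            row = conn.execute(
                f"SELECT payload FROM {table} WHERE id = ?", (key,)
            ).fetchone()
        finally:
            conn.close()
        return None if row is None else row[0]


class DemoStore(RowAccessMixin):
    """Stand-in for the host class; it owns the state the mixin relies on."""

    def __init__(self, db_path):
        self.db_path = db_path
        self._write_lock = threading.Lock()
        conn = self._connect()
        try:
            with conn:
                conn.execute(
                    "CREATE TABLE IF NOT EXISTS items (id TEXT PRIMARY KEY, payload TEXT)"
                )
        finally:
            conn.close()


demo = DemoStore(os.path.join(tempfile.mkdtemp(), "demo.sqlite3"))
demo._put("items", "a", "hello")
```

The mixin never defines db_path or _write_lock itself, which is why the relocation can stay behavior-preserving as long as the host class keeps setting both.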
Move shared constants + _bounded_int_env into store_constants (a leaf module),
and the remaining module-level domain helpers (validation, query signatures,
search-hint evidence, watchlist selection, knowledge type/provenance) into
store_serialization. sqlite_store.py is now the CopyrighterStore class plus thin
imports: 3613 -> 3368 lines (5333 -> 3368 overall, -37%). All behavior-preserving.
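For readers unfamiliar with the helper, here is a rough sketch of what a bounded integer env reader could look like. The signature, the clamping behavior, and the DEMO_LIMIT variable are assumptions for illustration; the actual _bounded_int_env in store_constants may differ.

```python
import os


def bounded_int_env_sketch(name, default, minimum, maximum):
    """Read an integer env var; fall back to default when unset or malformed."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    try:
        value = int(raw.strip())
    except ValueError:
        # A malformed value must not crash startup.
        return default
    # Clamp into the allowed range (an assumption of this sketch).
    return max(minimum, min(maximum, value))


os.environ["DEMO_LIMIT"] = "not-a-number"
fallback = bounded_int_env_sketch("DEMO_LIMIT", 50, 1, 100)
```

Keeping such a helper in a leaf module means it can be imported anywhere without creating an import cycle back into the store.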
Move submission/evidence payload builders, provider-state derivation, UI<->domain
evidence mapping, weak-label handling, and id/label/image helpers into
store_serialization (depends only on stdlib + domain + url/text helpers, no store
coupling). Behavior-preserving; imported back into sqlite_store. 3992 -> 3613 lines.
Move the pure text helpers (_text_list, _unique_texts) into store_text and the
~950-line page/CSS/JSON/srcset image-URL extraction (the _PageImageParser and
its helpers) into store_page_scrape. Both behavior-preserving; store_page_scrape
depends only on stdlib + url/text helpers + domain Evidence (no store coupling).
sqlite_store.py 4955 -> 3992 lines.
Move the pure network layer (image/page/stylesheet fetchers, SSRF guard,
redirect-validating opener) into store_remote_fetch, and the DDL/typed-column/
constraint-migration helpers into store_schema. Both are behavior-preserving
relocations imported back into sqlite_store; tests repoint their fetch
monkeypatches to store_remote_fetch. sqlite_store.py 5333 -> 4948 lines.
Move the SQLite store's 5 URL helpers (_decoded_nested_url, _is_http_url,
_url_path_has_image_suffix, _url_has_image_format_hint, _url_looks_like_image)
into a focused module and import them back. Pure relocation, no behavior
change. First step of splitting the 5300-line sqlite_store god file.
Address commit security review: the same-origin branch of safeUrl accepted
//host and /\host, which browsers normalize to an external host (open
redirect). Allow only true same-origin paths.
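The rule above can be sketched in Python for illustration. The real safeUrl is front-end code that is not shown in this log, so the function names and the exact set of accepted schemes here are assumptions.

```python
from urllib.parse import urlsplit


def is_same_origin_path(value):
    """True only for paths like /media/1.png, not //host or /\\host."""
    if not value.startswith("/"):
        return False
    # Browsers treat both "//host" and "/\host" as a jump to another host.
    if len(value) > 1 and value[1] in "/\\":
        return False
    # Tabs and newlines are stripped by browsers and could hide a second slash.
    if any(ch in value for ch in "\t\r\n"):
        return False
    return True


def safe_url_sketch(value):
    """Return the URL if it is safe for href/src, else None."""
    value = value.strip()
    if is_same_origin_path(value):
        return value
    try:
        parts = urlsplit(value)
    except ValueError:
        return None
    if parts.scheme in ("http", "https") and parts.netloc:
        return value
    return None  # javascript:, data:, protocol-relative, and anything else
```

An allowlist of schemes plus a strict single-slash path check avoids having to enumerate every dangerous prefix.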
Add safeUrl() to gate external search-result URLs into href/src (blocks
javascript:/data:), parse the response body before the ok check in apiJson
so non-JSON error bodies surface the real status, and hide broken evidence
preview images via onerror.
Governance: purge biometric face crops past a retention window (env
COPYRIGHTER_FACE_CROP_RETENTION_DAYS, default 90d) with an audit trail, run
at startup and reload; audit personal-image transmission to external Vision.
Concurrency: a process write lock + atomic provider-usage delta stop lost
counter updates; candidate promotion is idempotent (deterministic id + status
guard); seeding is serialized. Correctness: skip LLM summarize when a summary
already exists; constraint migration cleans orphan temp tables on failure.
Add provider-readiness startup log. Tests pin all of the above plus risk-band
boundaries (29/30/69/70, 100 cap) and media path-traversal guards.
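The lost-update fix can be pictured with a small sketch: a process-wide lock plus an in-SQL delta, so the counter is never read into Python and written back. The provider_usage table, its columns, and the function names are hypothetical.

```python
import os
import sqlite3
import tempfile
import threading

_write_lock = threading.Lock()  # one lock per process, shared by all writers


def init_usage(db_path):
    conn = sqlite3.connect(db_path)
    try:
        with conn:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS provider_usage "
                "(provider TEXT PRIMARY KEY, used INTEGER NOT NULL)"
            )
    finally:
        conn.close()


def add_usage(db_path, provider, delta):
    """Apply the delta inside SQL instead of read-modify-write in Python."""
    with _write_lock:
        conn = sqlite3.connect(db_path, timeout=5.0)
        try:
            with conn:
                conn.execute(
                    "INSERT OR IGNORE INTO provider_usage (provider, used) VALUES (?, 0)",
                    (provider,),
                )
                conn.execute(
                    "UPDATE provider_usage SET used = used + ? WHERE provider = ?",
                    (delta, provider),
                )
        finally:
            conn.close()


def run_demo(threads=8, per_thread=50):
    path = os.path.join(tempfile.mkdtemp(), "usage.sqlite3")
    init_usage(path)

    def worker():
        for _ in range(per_thread):
            add_usage(path, "naver", 1)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    conn = sqlite3.connect(path)
    try:
        return conn.execute(
            "SELECT used FROM provider_usage WHERE provider = 'naver'"
        ).fetchall()[0][0]
    finally:
        conn.close()
```

With the delta applied in a single UPDATE, concurrent callers cannot overwrite each other's increments even if the lock were later relaxed to per-row granularity.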
Address commit security review: replace the ?token= query fallback (which
leaked the token into logs/referrers) with an HttpOnly, SameSite=Strict
session cookie minted on the first header-authenticated request, so <img>
media loads authenticate without a URL token. Use hmac.compare_digest for
constant-time comparison and add Cache-Control: no-store + Referrer-Policy:
no-referrer on untrusted biometric media. Also cover upload/import boundary
validation (400) at the HTTP layer.
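A minimal sketch of the pieces named above, using only the standard library. The cookie name, the helper names, and how the session id is later validated are assumptions; only hmac.compare_digest and the header values come from the commit text.

```python
import hmac
import secrets
from http.cookies import SimpleCookie

# Headers the commit describes for untrusted biometric media responses.
MEDIA_HEADERS = {
    "Cache-Control": "no-store",
    "Referrer-Policy": "no-referrer",
}


def token_matches(presented, expected):
    """Constant-time comparison; bytes avoid the ASCII-only limit on str."""
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))


def mint_session_cookie():
    """Return (session_id, Set-Cookie value) for a header-authenticated request."""
    session_id = secrets.token_urlsafe(32)
    cookie = SimpleCookie()
    cookie["session"] = session_id  # hypothetical cookie name
    cookie["session"]["httponly"] = True
    cookie["session"]["samesite"] = "Strict"
    cookie["session"]["path"] = "/"
    return session_id, cookie["session"].OutputString()


session_id, set_cookie_value = mint_session_cookie()
```

Because the browser attaches the cookie to same-site <img> requests automatically, the token never needs to appear in a URL.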
Resolve each candidate image/page/stylesheet URL and refuse loopback,
RFC1918, link-local (cloud-metadata), reserved, multicast, and unspecified
targets before fetching; re-validate on every redirect hop via a custom
opener. URLs originate from external search-result content, so this stops
the operator server from being steered into fetching internal services (SSRF).
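The guard described above might look roughly like the following sketch. The function and class names are hypothetical, and the resolver parameter exists only so the check can be exercised without network access; the project's store_remote_fetch code may be organized differently.

```python
import ipaddress
import socket
import urllib.request
from urllib.parse import urlsplit


def is_forbidden_address(ip_text):
    """True for loopback, RFC1918, link-local, reserved, multicast, unspecified."""
    ip = ipaddress.ip_address(ip_text.split("%", 1)[0])  # drop any IPv6 zone id
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped  # judge ::ffff:a.b.c.d by its IPv4 address
    return (
        ip.is_loopback
        or ip.is_private
        or ip.is_link_local
        or ip.is_reserved
        or ip.is_multicast
        or ip.is_unspecified
    )


def assert_public_url(url, resolver=socket.getaddrinfo):
    """Raise ValueError unless every resolved address is publicly routable."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"unsupported URL: {url!r}")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    for info in resolver(parts.hostname, port, type=socket.SOCK_STREAM):
        address = info[4][0]
        if is_forbidden_address(address):
            raise ValueError(f"refusing internal target {address} for {url!r}")


class ValidatingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Re-run the check on every redirect hop before following it."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        assert_public_url(newurl)
        return super().redirect_request(req, fp, code, msg, headers, newurl)


def build_guarded_opener():
    return urllib.request.build_opener(ValidatingRedirectHandler())
```

A caller would run assert_public_url(url) before build_guarded_opener().open(url, timeout=...). Note that this sketch resolves the name separately from the connection, so it does not by itself defend against DNS rebinding between the check and the fetch.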
Add requirements.txt (numpy/opencv-python-headless/pillow — the only
third-party runtime imports) and requirements-dev.txt, plus an offline
install runbook. Ignore .coverage and wheelhouse/.
Enable WAL + busy_timeout in _connect (ThreadingHTTPServer concurrent
operators no longer hit 'database is locked'), add a _transaction helper and
thread an optional conn through _put/_get/add_audit_event so record_decision
commits its status change, watchlist entry, and audit event atomically.
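As a sketch of the pattern described above: connections enable WAL and a busy timeout, a transaction helper commits or rolls back as a unit, and helpers accept an optional conn so they can join the caller's transaction. Table names, the 5000 ms timeout, and the fail_midway flag are assumptions made for the demo.

```python
import os
import sqlite3
import tempfile
from contextlib import contextmanager


def connect(db_path):
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode=WAL")   # readers no longer block the writer
    conn.execute("PRAGMA busy_timeout=5000")  # wait up to 5 s instead of failing
    return conn


@contextmanager
def transaction(db_path):
    conn = connect(db_path)
    try:
        with conn:  # commit on success, roll back on any exception
            yield conn
    finally:
        conn.close()


def add_audit_event(db_path, message, conn=None):
    if conn is None:
        with transaction(db_path) as own:
            own.execute("INSERT INTO audit (message) VALUES (?)", (message,))
    else:
        # Join the caller's transaction so both writes commit together.
        conn.execute("INSERT INTO audit (message) VALUES (?)", (message,))


def record_decision(db_path, submission_id, status, fail_midway=False):
    with transaction(db_path) as conn:
        conn.execute(
            "UPDATE submissions SET status = ? WHERE id = ?", (status, submission_id)
        )
        if fail_midway:
            raise RuntimeError("simulated failure between the two writes")
        add_audit_event(db_path, f"{submission_id} -> {status}", conn=conn)


def snapshot(db_path):
    conn = connect(db_path)
    try:
        status = conn.execute("SELECT status FROM submissions").fetchall()[0][0]
        events = conn.execute("SELECT COUNT(*) FROM audit").fetchall()[0][0]
    finally:
        conn.close()
    return status, events


def demo():
    path = os.path.join(tempfile.mkdtemp(), "demo.sqlite3")
    with transaction(path) as conn:
        conn.execute("CREATE TABLE submissions (id TEXT PRIMARY KEY, status TEXT)")
        conn.execute("CREATE TABLE audit (message TEXT)")
        conn.execute("INSERT INTO submissions VALUES ('s1', 'pending')")
    try:
        record_decision(path, "s1", "approved", fail_midway=True)
    except RuntimeError:
        pass
    after_failure = snapshot(path)
    record_decision(path, "s1", "approved")
    return after_failure, snapshot(path)
```

In the demo, the simulated failure leaves both the status and the audit table untouched, which is the atomicity property the commit is after.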
Remove wildcard CORS (prevents cross-origin reads of biometric/case data
from localhost), add optional shared-token auth gate on data routes
(COPYRIGHTER_AUTH_TOKEN; GUI shell + /health stay open), cap request body
size (413), and map malformed JSON to 400 and SQLite lock contention to 503.
Correctness:
- Make the local-artifact audit test skip on fresh clones (data/ is
  gitignored), so the suite passes outside this workstation
- Drop the transform from the viewRise entrance animation: an animated
  transform made .view.active a containing block for 320ms and threw
  the fixed decision panel off-screen on every workbench entry
- Collapse the queue toolbar at 1380px instead of 1180px; 1280x800
  laptops no longer get a horizontal scrollbar (verified live)
- Serve .woff2 as font/woff2 with an immutable cache header so the
  2MB bundled font is fetched once, not per page load (with test)
- Clip overflow on top-bar status chips (long apiError strings spilled
  over neighbors at 981-1180px)
- Give queue-row selection a selector that outranks the even-row
  zebra stripe (selection background was parity-dependent)
Cleanup:
- Replace the stale old-palette focus ring and ::selection literals
  with color-mix over var(--teal)
- Delete dead tokens: unused back-compat aliases (the comment claiming
  they were referenced was false), --rail-bot, --ochre-deep, and
  --font-stamp (identical to --font-ui since the Pretendard switch)
- Tokenize scattered raw colors: rail ink scale, soft tint levels,
  inset-well and bevel shadows, naver/internal source-chip triplets
- Remove the asset-preload div and three orphan SVGs nothing renders;
  tests now reject reintroducing them
Verified: 359 tests pass; Playwright audit at 1440/1280/390 shows zero
horizontal overflow on all views, Pretendard active, decision panel
fixed at the viewport corner mid-animation.
- Bundle Pretendard Variable woff2 locally (air-gapped safe, no CDN)
  and switch UI/stamp font stacks to it; preload in index.html
- Replace the forensic-dossier paper theme with a flat neutral cool
  palette: single teal accent, white cards, no noise texture, and
  zero linear/radial gradients (per design contract)
- Restore the product-purpose top-bar block and its CSS, drop the
  unused global search form, and strip the stray UTF-8 BOM
- Re-skin queue hover/selection, eyebrows, nav rail, chips, and
  empty states to the neutral palette; tabular numerals for numbers
- Regenerate ui-overhaul final audit artifacts: zero horizontal
  overflow across 8 views at 1440x900 and 390x844, Pretendard active
Design spec: docs/superpowers/specs/2026-06-11-operator-console-clean-review-ui-design.md
Plan: docs/plans/2026-06-11-001-feat-operator-console-clean-review-ui-plan.md
Tests: 358 passed (full suite incl. browser smoke)
Image rights / copyright detection system: SQLite store, HTTP app,
search integrations (Naver, Google Custom Search, Google Cloud Vision
web detection), image analysis (fingerprints, face/person detection,
evidence enrichment, risk scoring), an admin/review layer, governance
and retention policies, batch jobs, and a browser-based operator GUI.
This baseline incorporates a full code-review remediation pass
(46 fixes; 358 tests passing). Highlights:
CRITICAL
- Prevent evidence cascade-delete during the schema-constraint
  migration by disabling FK enforcement around the table rebuild.
Security
- Sandbox served media (neutralize stored XSS from uploaded/collected
  SVGs) via CSP + nosniff on the untrusted media routes.
- Strip embedded EXIF/GPS from external image derivatives before they
  are sent to third-party APIs.
- Return a clean 404 (not an uncaught StopIteration) for PATCH on an
  unknown provider.
Correctness
- LLM-summary failures no longer add +30 to the risk score.
- Decode only explicit JS escapes so Korean image URLs are not mangled.
- Consume search quota only after a successful request.
- Naver/Google adapters map responses inside the failure boundary, so a
  malformed response degrades to evidence instead of crashing enrichment.
- Domain-aware provider attribution; face-box IoU de-duplication; count
  searches (not result items); per-box crop isolation; clamp evidence
  confidence and Google CSE num; real submittedEpoch; and more.
Robustness
- Offline LLM connect fast-fails (short connect timeout) so seed/reload
  requests are not stalled; full read timeout preserved for generation.
- Malformed numeric env vars fall back to defaults instead of crashing
  startup.
Performance
- Per-submission evidence reads (no full-table scan per rescore),
  audit-log LIMIT, lazy active-store lookup, hoisted timestamps.
Tests
- ~24 regression tests added pinning the above fixes.
Runtime data (data/, outputs/, *.sqlite3, *.log), secrets (.env), and
node_modules are gitignored.
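The CRITICAL fix listed in this baseline (disabling FK enforcement around the table rebuild) can be sketched as follows. With foreign keys enabled, DROP TABLE on a parent performs an implicit DELETE, which fires ON DELETE CASCADE on child rows. The table names and the added column default are hypothetical; the project's migration in store_schema is not shown in this log.

```python
import sqlite3


def rebuild_parent_table(conn):
    """Rebuild submissions with a new constraint without cascading into evidence."""
    conn.commit()  # PRAGMA foreign_keys is a no-op inside an open transaction
    conn.execute("PRAGMA foreign_keys=OFF")
    try:
        with conn:
            conn.execute(
                "CREATE TABLE submissions_new "
                "(id TEXT PRIMARY KEY, status TEXT NOT NULL DEFAULT 'pending')"
            )
            conn.execute(
                "INSERT INTO submissions_new (id, status) "
                "SELECT id, status FROM submissions"
            )
            conn.execute("DROP TABLE submissions")  # no implicit cascade now
            conn.execute("ALTER TABLE submissions_new RENAME TO submissions")
    finally:
        conn.execute("PRAGMA foreign_keys=ON")


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys=ON")
    conn.execute("CREATE TABLE submissions (id TEXT PRIMARY KEY, status TEXT)")
    conn.execute(
        "CREATE TABLE evidence (id TEXT PRIMARY KEY, submission_id TEXT "
        "REFERENCES submissions(id) ON DELETE CASCADE)"
    )
    conn.execute("INSERT INTO submissions VALUES ('s1', 'pending')")
    conn.execute("INSERT INTO evidence VALUES ('e1', 's1')")
    conn.commit()
    rebuild_parent_table(conn)
    evidence_rows = conn.execute("SELECT COUNT(*) FROM evidence").fetchall()[0][0]
    fk_enabled = conn.execute("PRAGMA foreign_keys").fetchall()[0][0]
    conn.close()
    return evidence_rows, fk_enabled
```

Re-enabling enforcement in a finally block keeps the connection from silently running without foreign keys if the rebuild fails partway.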