Commit graph

33 commits

Author SHA1 Message Date
유창욱
2eb7bd3b8b docs: mark god-file split complete in remediation plan 2026-06-20 22:36:02 +09:00
유창욱
6184d0f464 refactor: extract submission-queue management into StoreQueueMixin
Move queue id derivation, active-queue selection, legacy-submission migration,
and queue bootstrap/ensure helpers into a mixin; CopyrighterStore inherits it.
Drop now-unused hashlib import. sqlite_store.py 874 -> 724 lines (5119 -> 724,
-86%); now under the 800-line guideline.
2026-06-20 22:35:02 +09:00
유창욱
b575d2ee06 refactor: extract operator operations into StoreOperationsMixin
Move knowledge-entry lifecycle, rerun/auto/manual search drivers, LLM summary
management, manual knowledge registration, and keyword-candidate collection/
promotion into a mixin; CopyrighterStore inherits it. Drop now-unused imports;
point the rollback test at store_serialization._stable_id. sqlite_store.py 1598
-> 874 lines (5333 -> 874, -84%).
2026-06-20 22:28:30 +09:00
유창욱
3bc07d94c3 refactor: extract enrichment internals into StoreEnrichmentMixin
Move provider-state sync, local face evidence/crop sync + retention purge,
internal/Google re-analysis, face-crop web detection, and the auto Google/Naver
search drivers into a mixin; CopyrighterStore inherits it. Drop now-unused
imports. Tests now also patch HeuristicFacePersonDetector on store_enrichment
(face detection moved there). sqlite_store.py 2358 -> 1598 lines (-70% overall).
2026-06-20 22:18:38 +09:00
유창욱
8e0a8c307d refactor: extract search-result similarity and candidate storage into mixin
Move the search-result image similarity, candidate-image storage, in-memory
knowledge repository, and rescoring methods into StoreSearchCandidatesMixin;
CopyrighterStore inherits it. Drop now-unused imports. sqlite_store.py 3072 ->
2358 lines (5333 -> 2358 overall, -56%). Behavior-preserving.
2026-06-20 21:58:10 +09:00
유창욱
40501e13f1 refactor: extract SQLite persistence primitives into StorePersistenceMixin
Move connection/transaction management and the generic _put/_get/_all row access
plus evidence-read helpers into a mixin; CopyrighterStore now inherits it. Methods
rely on the host class's self.db_path / self._write_lock. Behavior-preserving.
sqlite_store.py 3368 -> 3072 lines.
2026-06-20 21:46:01 +09:00
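As a rough illustration of the mixin arrangement described in 40501e13f1, the sketch below shows a persistence mixin whose methods depend on attributes (`db_path`, `_write_lock`) that only the host class sets. The `records` table, the `kind` argument, and the method bodies are assumptions made for the example, not the project's actual schema or code.

```python
import json
import sqlite3
import threading


class StorePersistenceMixin:
    """Generic row access. Expects the host class to provide
    self.db_path and self._write_lock."""

    def _connect(self) -> sqlite3.Connection:
        conn = sqlite3.connect(self.db_path)
        # Hypothetical single-table layout, for illustration only.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS records ("
            "kind TEXT NOT NULL, id TEXT NOT NULL, payload TEXT NOT NULL, "
            "PRIMARY KEY (kind, id))"
        )
        return conn

    def _put(self, kind: str, key: str, payload: dict) -> None:
        # Writes are serialized through the host's lock.
        with self._write_lock:
            conn = self._connect()
            try:
                with conn:  # commits on success, rolls back on error
                    conn.execute(
                        "INSERT OR REPLACE INTO records (kind, id, payload) "
                        "VALUES (?, ?, ?)",
                        (kind, key, json.dumps(payload)),
                    )
            finally:
                conn.close()

    def _get(self, kind: str, key: str):
        conn = self._connect()
        try:
            row = conn.execute(
                "SELECT payload FROM records WHERE kind = ? AND id = ?",
                (kind, key),
            ).fetchone()
        finally:
            conn.close()
        return json.loads(row[0]) if row else None


class CopyrighterStore(StorePersistenceMixin):
    """Host class: owns the state the mixin relies on."""

    def __init__(self, db_path: str) -> None:
        self.db_path = str(db_path)
        self._write_lock = threading.Lock()
```

The mixin never defines `db_path` or `_write_lock` itself, so it is only usable when combined with a host that does. That is the trade-off of this style of split: the file gets smaller, but the coupling to the host's attributes stays implicit.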
유창욱
3be7b016ce refactor: extract store constants and remaining domain helpers
Move shared constants + _bounded_int_env into store_constants (a leaf module),
and the remaining module-level domain helpers (validation, query signatures,
search-hint evidence, watchlist selection, knowledge type/provenance) into
store_serialization. sqlite_store.py is now the CopyrighterStore class plus thin
imports: 3613 -> 3368 lines (5333 -> 3368 overall, -37%). All behavior-preserving.
2026-06-20 21:38:03 +09:00
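The log names `_bounded_int_env` but not its signature. A minimal sketch of what such a helper might look like follows, assuming it takes a default plus lower and upper bounds; the parameter names and the clamping behavior are assumptions. Only the fall-back-to-default behavior for malformed values is stated elsewhere in this log.

```python
import os


def bounded_int_env(name: str, default: int, minimum: int, maximum: int) -> int:
    """Read an integer setting from the environment.

    Missing or malformed values fall back to `default`; parsed values are
    clamped into [minimum, maximum] (the clamping is an assumption).
    """
    raw = os.environ.get(name)
    if raw is None:
        return default
    try:
        value = int(raw.strip())
    except ValueError:
        # A typo in the environment should not crash startup.
        return default
    return max(minimum, min(maximum, value))
```

For example, `bounded_int_env("COPYRIGHTER_FACE_CROP_RETENTION_DAYS", 90, 1, 3650)` would return 90 when the variable is unset or set to something like `ninety`; the bounds 1 and 3650 are hypothetical.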
유창욱
8e53139029 refactor: extract payload serialization helpers into store_serialization
Move submission/evidence payload builders, provider-state derivation, UI<->domain
evidence mapping, weak-label handling, and id/label/image helpers into
store_serialization (depends only on stdlib + domain + url/text helpers, no store
coupling). Behavior-preserving; imported back into sqlite_store. 3992 -> 3613 lines.
2026-06-20 21:24:58 +09:00
유창욱
e3bc99e6b9 refactor: extract text helpers and HTML/CSS image-scraping from sqlite_store
Move the pure text helpers (_text_list, _unique_texts) into store_text and the
~950-line page/CSS/JSON/srcset image-URL extraction (the _PageImageParser and
its helpers) into store_page_scrape. Both behavior-preserving; store_page_scrape
depends only on stdlib + url/text helpers + domain Evidence (no store coupling).
sqlite_store.py 4955 -> 3992 lines.
2026-06-20 21:10:22 +09:00
유창욱
bd35cf6f3f docs: record remediation implementation status in plan 2026-06-20 20:57:02 +09:00
유창욱
da917755dd refactor: extract remote-fetch and schema modules from sqlite_store
Move the pure network layer (image/page/stylesheet fetchers, SSRF guard,
redirect-validating opener) into store_remote_fetch, and the DDL/typed-column/
constraint-migration helpers into store_schema. Both are behavior-preserving
relocations imported back into sqlite_store; tests repoint their fetch
monkeypatches to store_remote_fetch. sqlite_store.py 5333 -> 4948 lines.
2026-06-20 20:50:29 +09:00
유창욱
e66f9d5001 refactor: extract URL helpers into store_url_utils
Move the SQLite store's 5 URL helpers (_decoded_nested_url, _is_http_url,
_url_path_has_image_suffix, _url_has_image_format_hint, _url_looks_like_image)
into a focused module and import them back. Pure relocation, no behavior
change. First step of splitting the 5300-line sqlite_store god file.
2026-06-20 20:35:21 +09:00
유창욱
7317bfb2b3 fix: reject protocol-relative and backslash URLs in safeUrl
Address commit security review: the same-origin branch of safeUrl accepted
//host and /\host, which browsers normalize to an external host (open
redirect). Allow only true same-origin paths.
2026-06-20 18:47:13 +09:00
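The `safeUrl` helper lives in the frontend, so the following is only a Python rendering of the rule described in 7317bfb2b3 and f8aa10f91b, not the project's code. It assumes the allowlist is `http`/`https` plus same-origin paths; the function name and return convention are hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}  # assumed allowlist


def safe_url(raw: str) -> Optional[str]:
    """Return the URL if it is safe to place in href/src, else None."""
    candidate = raw.strip()
    if candidate.startswith("/"):
        # Same-origin path. Reject //host and /\host, which browsers
        # normalize to an external host.
        if len(candidate) > 1 and candidate[1] in "/\\":
            return None
        return candidate
    try:
        parts = urlsplit(candidate)
    except ValueError:
        return None
    if parts.scheme.lower() in ALLOWED_SCHEMES and parts.netloc:
        return candidate
    # Everything else (javascript:, data:, relative junk) is refused.
    return None
```

Checking the second character of a path-shaped input is what separates a true same-origin path such as `/media/1` from a protocol-relative URL such as `//evil.example`.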
유창욱
f8aa10f91b fix: frontend URL scheme allowlist, fetch ok-check, image onerror
Add safeUrl() to gate external search-result URLs into href/src (blocks
javascript:/data:), parse the response body before the ok check in apiJson
so non-JSON error bodies surface the real status, and hide broken evidence
preview images via onerror.
2026-06-20 18:44:20 +09:00
유창욱
7f5799e5e1 fix: PII retention, write-race serialization, and correctness fixes
Governance: purge biometric face crops past a retention window (env
COPYRIGHTER_FACE_CROP_RETENTION_DAYS, default 90d) with an audit trail, run
at startup and reload; audit personal-image transmission to external Vision.
Concurrency: a process write lock + atomic provider-usage delta stop lost
counter updates; candidate promotion is idempotent (deterministic id + status
guard); seeding is serialized. Correctness: skip LLM summarize when a summary
already exists; constraint migration cleans orphan temp tables on failure.
Add provider-readiness startup log. Tests pin all of the above plus risk-band
boundaries (29/30/69/70, 100 cap) and media path-traversal guards.
2026-06-20 18:44:08 +09:00
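The boundary values pinned by the tests in 7f5799e5e1 (29/30/69/70 and a cap of 100) suggest a three-band risk scale. The sketch below is one reading of those numbers: the band names `low`, `medium`, and `high`, the function names, and the floor at 0 are all assumptions.

```python
def cap_score(score: int) -> int:
    """Clamp a raw risk score into 0..100 (the 100 cap is from the log;
    the floor at 0 is an assumption)."""
    return max(0, min(100, score))


def risk_band(score: int) -> str:
    """Map a score to a band, assuming the cutoffs sit at 30 and 70."""
    capped = cap_score(score)
    if capped >= 70:
        return "high"
    if capped >= 30:
        return "medium"
    return "low"
```

Under these assumptions 29 and 30 fall in different bands, as do 69 and 70, which is why tests pinning those four values guard against an off-by-one in the comparisons.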
유창욱
1abb1107a2 fix: cookie-based operator auth keeps token out of URLs
Address commit security review: replace the ?token= query fallback (which
leaked the token into logs/referrers) with an HttpOnly, SameSite=Strict
session cookie minted on the first header-authenticated request, so <img>
media loads authenticate without a URL token. Use hmac.compare_digest for
constant-time comparison and add Cache-Control: no-store + Referrer-Policy:
no-referrer on untrusted biometric media. Also cover upload/import boundary
validation (400) at the HTTP layer.
2026-06-20 18:43:53 +09:00
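A minimal sketch of the pieces named in 1abb1107a2, assuming a shared operator token and a random session value: constant-time comparison with `hmac.compare_digest`, a `Set-Cookie` value carrying `HttpOnly` and `SameSite=Strict`, and the two response headers for media. The cookie name and helper names are hypothetical.

```python
import hmac
import secrets
from typing import Tuple

# Headers the commit describes for untrusted biometric media responses.
MEDIA_HEADERS = {
    "Cache-Control": "no-store",
    "Referrer-Policy": "no-referrer",
}


def token_matches(presented: str, expected: str) -> bool:
    """Compare secrets in constant time to avoid timing side channels."""
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))


def mint_session_cookie(name: str = "operator_session") -> Tuple[str, str]:
    """Return (session_value, Set-Cookie header value).

    The cookie name is an assumption. HttpOnly keeps the value away from
    page scripts; SameSite=Strict keeps it off cross-site requests.
    """
    value = secrets.token_urlsafe(32)
    header = f"{name}={value}; Path=/; HttpOnly; SameSite=Strict"
    return value, header
```

Because the browser attaches the cookie to `<img>` requests automatically, media URLs no longer need to carry the token, so it cannot leak through access logs or the Referer header.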
유창욱
62e2d183f8 fix: block SSRF to internal addresses in remote fetchers
Resolve each candidate image/page/stylesheet URL and refuse loopback,
RFC1918, link-local (cloud-metadata), reserved, multicast, and unspecified
targets before fetching; re-validate on every redirect hop via a custom
opener. URLs originate from external search-result content, so this stops
the operator server from being induced to fetch internal services.
2026-06-20 18:22:10 +09:00
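The guard described in 62e2d183f8 could look roughly like the sketch below, built only on the standard library (`ipaddress`, `socket`, `urllib.request`). The function and class names are hypothetical, and the project's real `store_remote_fetch` module may be structured differently.

```python
import ipaddress
import socket
import urllib.error
import urllib.request
from urllib.parse import urlsplit


def is_forbidden_ip(address: str) -> bool:
    """True for loopback, private, link-local, reserved, multicast,
    and unspecified addresses."""
    ip = ipaddress.ip_address(address.split("%", 1)[0])  # drop IPv6 scope id
    return (
        ip.is_loopback or ip.is_private or ip.is_link_local
        or ip.is_reserved or ip.is_multicast or ip.is_unspecified
    )


def url_targets_internal_host(url: str) -> bool:
    """Resolve the URL's host and report whether any address is forbidden.
    Unparseable or unresolvable URLs are treated as forbidden."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return True
    try:
        infos = socket.getaddrinfo(parts.hostname, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return True
    return any(is_forbidden_ip(info[4][0]) for info in infos)


class ValidatingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Re-run the guard on every redirect hop."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        if url_targets_internal_host(newurl):
            raise urllib.error.URLError(f"redirect to blocked target: {newurl}")
        return super().redirect_request(req, fp, code, msg, headers, newurl)


def build_guarded_opener() -> urllib.request.OpenerDirector:
    # Passing the subclass instance replaces the default redirect handler.
    return urllib.request.build_opener(ValidatingRedirectHandler())
```

One limitation of a resolve-then-fetch check like this sketch is that the address seen at validation time can differ from the one used at connect time (DNS rebinding); whether the project addresses that is not stated in the log.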
유창욱
8958dd1b83 chore: pin runtime dependencies for offline air-gapped install
Add requirements.txt (numpy/opencv-python-headless/pillow — the only
third-party runtime imports) and requirements-dev.txt, plus an offline
install runbook. Ignore .coverage and wheelhouse/.
2026-06-20 18:19:08 +09:00
유창욱
20a6f55408 fix: SQLite concurrency safety and atomic decision writes
Enable WAL + busy_timeout in _connect (concurrent operators under
ThreadingHTTPServer no longer hit 'database is locked'), add a _transaction helper and
thread an optional conn through _put/_get/add_audit_event so record_decision
commits its status change, watchlist entry, and audit event atomically.
2026-06-20 18:19:08 +09:00
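To make the pattern in 20a6f55408 concrete, here is a small sketch of a connection helper that enables WAL and a busy timeout, plus a transaction helper that commits several writes as one unit. The table layout, the 5000 ms timeout, and the `record_decision` signature are assumptions for illustration.

```python
import sqlite3
from contextlib import contextmanager


def connect(db_path: str, busy_timeout_ms: int = 5000) -> sqlite3.Connection:
    conn = sqlite3.connect(db_path)
    # WAL lets readers proceed while a writer is active.
    conn.execute("PRAGMA journal_mode=WAL")
    # Wait for a lock instead of failing immediately with "database is locked".
    conn.execute(f"PRAGMA busy_timeout={int(busy_timeout_ms)}")
    return conn


@contextmanager
def transaction(conn: sqlite3.Connection):
    """Commit everything done inside the block, or roll all of it back."""
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise


def init_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical minimal schema for the demo.
    conn.execute("CREATE TABLE IF NOT EXISTS submissions (id TEXT PRIMARY KEY, status TEXT NOT NULL)")
    conn.execute("CREATE TABLE IF NOT EXISTS audit_events (submission_id TEXT NOT NULL, actor TEXT NOT NULL, action TEXT NOT NULL)")
    conn.execute("INSERT OR IGNORE INTO submissions (id, status) VALUES ('s1', 'pending')")
    conn.commit()


def record_decision(conn, submission_id: str, status: str, actor) -> None:
    """Status change and audit event land together or not at all."""
    with transaction(conn) as tx:
        tx.execute("UPDATE submissions SET status = ? WHERE id = ?", (status, submission_id))
        tx.execute(
            "INSERT INTO audit_events (submission_id, actor, action) VALUES (?, ?, ?)",
            (submission_id, actor, f"decision:{status}"),
        )
```

If the audit insert fails (for instance because `actor` is `None` and the column is `NOT NULL`), the earlier status update is rolled back as well, so the store never shows a decision without its audit trail.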
유창욱
e9a15e8110 fix: harden operator HTTP server
Remove wildcard CORS (prevents cross-origin reads of biometric/case data
from localhost), add optional shared-token auth gate on data routes
(COPYRIGHTER_AUTH_TOKEN; GUI shell + /health stay open), cap request body
size (413), and map malformed JSON to 400 and SQLite lock contention to 503.
2026-06-20 18:18:54 +09:00
유창욱
62c13faafa docs: implementation plan for project-review remediation 2026-06-20 18:18:54 +09:00
유창욱
37294dc140 fix: resolve multi-agent review findings for workbench efficiency round 2026-06-12 18:44:35 +09:00
유창욱
4d98582ed3 feat: rerun enrichment evidence diff with score delta and new-evidence badges 2026-06-12 18:00:43 +09:00
유창욱
1e0f4f8690 feat: persist and display detected face crop thumbnails in workbench 2026-06-12 17:56:09 +09:00
유창욱
646b871b76 feat: knowledge base search/filter, inline edit, and server-backed lifecycle actions 2026-06-12 17:51:36 +09:00
유창욱
cd9d69dddb feat: knowledge entry update/deactivate/reactivate endpoints with audit events 2026-06-12 17:48:26 +09:00
유창욱
cf342425c5 feat: expose google_search as operator manual text-query provider 2026-06-12 17:46:45 +09:00
유창욱
4abb837aaa feat: one-click and batch execution for suggested evidence queries 2026-06-12 17:44:48 +09:00
유창욱
63bbf0d755 docs: implementation plan for operator workbench efficiency (F1-F5) 2026-06-12 17:40:34 +09:00
유창욱
b4b8f4b5d8 docs: operator workbench efficiency design (F1-F5) 2026-06-11 16:03:15 +09:00
유창욱
7cac0b3835 fix: resolve code-review findings from the clean-review restyle
Correctness:
- Make the local-artifact audit test skip on fresh clones (data/ is
  gitignored), so the suite passes outside this workstation
- Drop the transform from the viewRise entrance animation: an animated
  transform made .view.active a containing block for 320ms and threw
  the fixed decision panel off-screen on every workbench entry
- Collapse the queue toolbar at 1380px instead of 1180px; 1280x800
  laptops no longer get a horizontal scrollbar (verified live)
- Serve .woff2 as font/woff2 with an immutable cache header so the
  2MB bundled font is fetched once, not per page load (with test)
- Clip overflow on top-bar status chips (long apiError strings spilled
  over neighbors at 981-1180px)
- Give queue-row selection a selector that outranks the even-row
  zebra stripe (selection background was parity-dependent)

Cleanup:
- Replace the stale old-palette focus ring and ::selection literals
  with color-mix over var(--teal)
- Delete dead tokens: unused back-compat aliases (the comment claiming
  they were referenced was false), --rail-bot, --ochre-deep, and
  --font-stamp (identical to --font-ui since the Pretendard switch)
- Tokenize scattered raw colors: rail ink scale, soft tint levels,
  inset-well and bevel shadows, naver/internal source-chip triplets
- Remove the asset-preload div and three orphan SVGs nothing renders;
  tests now reject reintroducing them

Verified: 359 tests pass; Playwright audit at 1440/1280/390 shows zero
horizontal overflow on all views, Pretendard active, decision panel
fixed at the viewport corner mid-animation.
2026-06-11 11:13:46 +09:00
유창욱
ed701bd436 feat: clean review-instrument restyle with bundled Pretendard font
- Bundle Pretendard Variable woff2 locally (air-gapped safe, no CDN)
  and switch UI/stamp font stacks to it; preload in index.html
- Replace the forensic-dossier paper theme with a flat neutral cool
  palette: single teal accent, white cards, no noise texture, and
  zero linear/radial gradients (per design contract)
- Restore the product-purpose top-bar block and its CSS, drop the
  unused global search form, and strip the stray UTF-8 BOM
- Re-skin queue hover/selection, eyebrows, nav rail, chips, and
  empty states to the neutral palette; tabular numerals for numbers
- Regenerate ui-overhaul final audit artifacts: zero horizontal
  overflow across 8 views at 1440x900 and 390x844, Pretendard active

Design spec: docs/superpowers/specs/2026-06-11-operator-console-clean-review-ui-design.md
Plan: docs/plans/2026-06-11-001-feat-operator-console-clean-review-ui-plan.md
Tests: 358 passed (full suite incl. browser smoke)
2026-06-11 10:31:16 +09:00
유창욱
3f7b3a9cf2 chore: initial commit of copyrighter (rights_filter)
Image rights / copyright detection system: SQLite store, HTTP app,
search integrations (Naver, Google Custom Search, Google Cloud Vision
web detection), image analysis (fingerprints, face/person detection,
evidence enrichment, risk scoring), an admin/review layer, governance
and retention policies, batch jobs, and a browser-based operator GUI.

This baseline incorporates a full code-review remediation pass
(46 fixes; 358 tests passing). Highlights:

CRITICAL
- Prevent evidence cascade-delete during the schema-constraint
  migration by disabling FK enforcement around the table rebuild.

Security
- Sandbox served media (neutralize stored XSS from uploaded/collected
  SVGs) via CSP + nosniff on the untrusted media routes.
- Strip embedded EXIF/GPS from external image derivatives before they
  are sent to third-party APIs.
- Return a clean 404 (not an uncaught StopIteration) for PATCH on an
  unknown provider.

Correctness
- LLM-summary failures no longer add +30 to the risk score.
- Decode only explicit JS escapes so Korean image URLs are not mangled.
- Consume search quota only after a successful request.
- Naver/Google adapters map responses inside the failure boundary, so a
  malformed response degrades to evidence instead of crashing enrichment.
- Domain-aware provider attribution; face-box IoU de-duplication; count
  searches (not result items); per-box crop isolation; clamp evidence
  confidence and Google CSE num; real submittedEpoch; and more.

Robustness
- Offline LLM connect fast-fails (short connect timeout) so seed/reload
  requests are not stalled; full read timeout preserved for generation.
- Malformed numeric env vars fall back to defaults instead of crashing
  startup.

Performance
- Per-submission evidence reads (no full-table scan per rescore),
  audit-log LIMIT, lazy active-store lookup, hoisted timestamps.

Tests
- ~24 regression tests added pinning the above fixes.

Runtime data (data/, outputs/, *.sqlite3, *.log), secrets (.env), and
node_modules are gitignored.
2026-06-09 09:50:31 +09:00
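The CRITICAL item in the initial commit can be illustrated with a small sketch. In SQLite, `DROP TABLE` performs an implicit delete when foreign keys are enforced, which fires `ON DELETE CASCADE` on child rows; turning enforcement off around the rebuild avoids that. The table names, columns, and the `CHECK` constraint below are hypothetical, not the project's schema.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    # isolation_level=None: autocommit mode, transactions are explicit.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("PRAGMA foreign_keys=ON")
    conn.execute("CREATE TABLE submissions (id TEXT PRIMARY KEY, status TEXT)")
    conn.execute(
        "CREATE TABLE evidence (id TEXT PRIMARY KEY, submission_id TEXT "
        "REFERENCES submissions(id) ON DELETE CASCADE)"
    )
    conn.execute("INSERT INTO submissions VALUES ('s1', 'open')")
    conn.execute("INSERT INTO evidence VALUES ('e1', 's1')")
    return conn


def rebuild_submissions(conn: sqlite3.Connection) -> None:
    """Rebuild the parent table with a new constraint without cascading."""
    # The pragma is a no-op inside a transaction, so set it before BEGIN.
    conn.execute("PRAGMA foreign_keys=OFF")
    try:
        conn.execute("BEGIN")
        conn.execute(
            "CREATE TABLE submissions_new (id TEXT PRIMARY KEY, "
            "status TEXT NOT NULL CHECK (status IN ('open', 'closed')))"
        )
        conn.execute("INSERT INTO submissions_new SELECT id, status FROM submissions")
        conn.execute("DROP TABLE submissions")  # no cascade while FKs are off
        conn.execute("ALTER TABLE submissions_new RENAME TO submissions")
        if conn.execute("PRAGMA foreign_key_check").fetchall():
            raise sqlite3.IntegrityError("rebuild left dangling references")
        conn.execute("COMMIT")
    except Exception:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise
    finally:
        conn.execute("PRAGMA foreign_keys=ON")
```

After `rebuild_submissions(make_demo_db())` the evidence row is still present. Running the same steps with enforcement left on would delete it when the old parent table is dropped.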