Commit graph

15 commits

Author SHA1 Message Date
유창욱
da917755dd refactor: extract remote-fetch and schema modules from sqlite_store
Move the pure network layer (image/page/stylesheet fetchers, SSRF guard,
redirect-validating opener) into store_remote_fetch, and the DDL/typed-column/
constraint-migration helpers into store_schema. Both are behavior-preserving
relocations imported back into sqlite_store; tests repoint their fetch
monkeypatches to store_remote_fetch. sqlite_store.py 5333 -> 4948 lines.
2026-06-20 20:50:29 +09:00
유창욱
7f5799e5e1 fix: PII retention, write-race serialization, and correctness fixes
Governance: purge biometric face crops past a retention window (env
COPYRIGHTER_FACE_CROP_RETENTION_DAYS, default 90d) with an audit trail, run
at startup and reload; audit personal-image transmission to external Vision.
Concurrency: a process write lock + atomic provider-usage delta stop lost
counter updates; candidate promotion is idempotent (deterministic id + status
guard); seeding is serialized. Correctness: skip LLM summarize when a summary
already exists; constraint migration cleans orphan temp tables on failure.
Add provider-readiness startup log. Tests pin all of the above plus risk-band
boundaries (29/30/69/70, 100 cap) and media path-traversal guards.
2026-06-20 18:44:08 +09:00
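The lost-update fix described above can be illustrated with a minimal sketch. It assumes a hypothetical provider_usage table and add_usage helper (neither name is taken from the repository): the counter is changed with a relative UPDATE under a process-wide lock, so two writers never overwrite each other's read-modify-write result.

```python
import sqlite3
import threading

# Hypothetical process-wide write lock; the real lock name is not shown in the commit.
_write_lock = threading.Lock()


def add_usage(conn: sqlite3.Connection, provider: str, delta: int) -> None:
    """Apply a relative counter change instead of writing back a stale total."""
    with _write_lock, conn:  # `with conn` commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO provider_usage (provider, used) VALUES (?, 0)",
            (provider,),
        )
        conn.execute(
            "UPDATE provider_usage SET used = used + ? WHERE provider = ?",
            (delta, provider),
        )
```

Because the increment happens inside SQLite, the caller never needs to read the current value first.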
유창욱
1abb1107a2 fix: cookie-based operator auth keeps token out of URLs
Address commit security review: replace the ?token= query fallback (which
leaked the token into logs/referrers) with an HttpOnly, SameSite=Strict
session cookie minted on the first header-authenticated request, so <img>
media loads authenticate without a URL token. Use hmac.compare_digest for
constant-time comparison and add Cache-Control: no-store + Referrer-Policy:
no-referrer on untrusted biometric media. Also cover upload/import boundary
validation (400) at the HTTP layer.
2026-06-20 18:43:53 +09:00
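A minimal sketch of the two mechanisms named above, using only the standard library. The cookie name operator_session and both helper names are assumptions for illustration, not the project's own identifiers.

```python
import hmac
from http.cookies import SimpleCookie


def token_matches(presented: str, expected: str) -> bool:
    """Compare tokens in constant time to avoid leaking a prefix match via timing."""
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))


def build_session_cookie(session_value: str, name: str = "operator_session") -> str:
    """Return a Set-Cookie header value that scripts cannot read (HttpOnly)
    and that browsers do not attach to cross-site requests (SameSite=Strict)."""
    cookie = SimpleCookie()
    cookie[name] = session_value
    cookie[name]["httponly"] = True
    cookie[name]["samesite"] = "Strict"
    cookie[name]["path"] = "/"
    return cookie[name].OutputString()
```

A handler would send the returned string in a Set-Cookie header after the first request whose header token passes token_matches; later <img> requests then carry the cookie automatically.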
유창욱
62e2d183f8 fix: block SSRF to internal addresses in remote fetchers
Resolve each candidate image/page/stylesheet URL and refuse loopback,
RFC1918, link-local (cloud-metadata), reserved, multicast, and unspecified
targets before fetching; re-validate on every redirect hop via a custom
opener. URLs originate from external search-result content, so this stops
the operator server from being steered into fetching internal services.
2026-06-20 18:22:10 +09:00
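The guard described above could look roughly like the following sketch, built on the standard ipaddress and urllib modules. The names is_blocked_address, assert_public_url, and ValidatingRedirectHandler are hypothetical; the repository's own helpers in the remote fetchers may differ.

```python
import ipaddress
import socket
import urllib.request
from urllib.parse import urlsplit


def is_blocked_address(ip_text: str) -> bool:
    """Return True for addresses a remote fetcher should refuse."""
    ip = ipaddress.ip_address(ip_text)
    return (
        ip.is_loopback
        or ip.is_private       # covers the RFC 1918 ranges
        or ip.is_link_local    # covers 169.254.169.254 (cloud metadata)
        or ip.is_reserved
        or ip.is_multicast
        or ip.is_unspecified
    )


def assert_public_url(url: str) -> None:
    """Resolve the host and raise ValueError if any resolved address is internal."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"unsupported URL: {url!r}")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    for info in socket.getaddrinfo(parts.hostname, port, type=socket.SOCK_STREAM):
        address = info[4][0]
        if is_blocked_address(address):
            raise ValueError(f"refusing internal address {address} for {url!r}")


class ValidatingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Re-run the address check on every redirect hop before following it."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        assert_public_url(newurl)
        return super().redirect_request(req, fp, code, msg, headers, newurl)


# Opener whose redirects are validated; callers still check the first URL themselves.
opener = urllib.request.build_opener(ValidatingRedirectHandler)
```

A caller would run assert_public_url(url) and then opener.open(url, timeout=...). Note that this sketch resolves the name separately from the actual connection, so it does not by itself rule out a DNS answer changing between the check and the fetch.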
유창욱
20a6f55408 fix: SQLite concurrency safety and atomic decision writes
Enable WAL + busy_timeout in _connect (concurrent operators served by
ThreadingHTTPServer no longer hit 'database is locked'), add a _transaction
helper, and thread an optional conn through _put/_get/add_audit_event so
record_decision commits its status change, watchlist entry, and audit event
atomically.
2026-06-20 18:19:08 +09:00
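As an illustration of the pattern described above, here is a minimal sketch of a connection factory and a transaction context manager. The function names connect and transaction and the 5000 ms timeout are assumptions; the commit's _connect and _transaction helpers are not reproduced here.

```python
import sqlite3
from contextlib import contextmanager


def connect(path: str) -> sqlite3.Connection:
    """Open a connection with WAL journaling and a busy timeout."""
    # isolation_level=None puts the module in autocommit mode, so transactions
    # are controlled only by the explicit BEGIN/COMMIT below.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute("PRAGMA journal_mode=WAL")    # readers no longer block the writer
    conn.execute("PRAGMA busy_timeout=5000")   # wait up to 5 s instead of failing at once
    return conn


@contextmanager
def transaction(conn: sqlite3.Connection):
    """Group several writes so they commit together or not at all."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock up front
    try:
        yield conn
    except BaseException:
        conn.execute("ROLLBACK")
        raise
    else:
        conn.execute("COMMIT")
```

Helpers that accept an optional conn can then be called inside one `with transaction(conn):` block, so a status change, a watchlist entry, and an audit event are written as one unit.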
유창욱
e9a15e8110 fix: harden operator HTTP server
Remove wildcard CORS (it allowed cross-origin reads of biometric/case data
from localhost), add an optional shared-token auth gate on data routes
(COPYRIGHTER_AUTH_TOKEN; GUI shell + /health stay open), cap request body
size (413), and map malformed JSON to 400 and SQLite lock contention to 503.
2026-06-20 18:18:54 +09:00
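The body-size cap and JSON error mapping above can be sketched as a small pure function. The name read_json_body and the 1 MB limit are assumptions chosen for illustration; the actual limit is not stated in the commit.

```python
import json

MAX_BODY_BYTES = 1_000_000  # assumed cap, not the project's real value


def read_json_body(headers, rfile, max_bytes: int = MAX_BODY_BYTES):
    """Return (status, payload): 200 with parsed JSON, or 400/413 with None."""
    try:
        length = int(headers.get("Content-Length", "0"))
    except ValueError:
        return 400, None
    if length < 0:
        return 400, None
    if length > max_bytes:
        return 413, None  # reject before reading the oversized body
    raw = rfile.read(length)
    if not raw:
        return 200, {}
    try:
        payload = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return 400, None  # malformed JSON is a client error, not a crash
    return 200, payload
```

In a BaseHTTPRequestHandler this would be called as read_json_body(self.headers, self.rfile), with the handler sending the returned status when it is not 200.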
유창욱
37294dc140 fix: resolve multi-agent review findings for workbench efficiency round
2026-06-12 18:44:35 +09:00
유창욱
4d98582ed3 feat: rerun enrichment evidence diff with score delta and new-evidence badges
2026-06-12 18:00:43 +09:00
유창욱
1e0f4f8690 feat: persist and display detected face crop thumbnails in workbench
2026-06-12 17:56:09 +09:00
유창욱
646b871b76 feat: knowledge base search/filter, inline edit, and server-backed lifecycle actions
2026-06-12 17:51:36 +09:00
유창욱
cd9d69dddb feat: knowledge entry update/deactivate/reactivate endpoints with audit events
2026-06-12 17:48:26 +09:00
유창욱
cf342425c5 feat: expose google_search as operator manual text-query provider
2026-06-12 17:46:45 +09:00
유창욱
4abb837aaa feat: one-click and batch execution for suggested evidence queries
2026-06-12 17:44:48 +09:00
유창욱
7cac0b3835 fix: resolve code-review findings from the clean-review restyle
Correctness:
- Make the local-artifact audit test skip on fresh clones (data/ is
  gitignored), so the suite passes outside this workstation
- Drop the transform from the viewRise entrance animation: an animated
  transform made .view.active a containing block for 320ms and threw
  the fixed decision panel off-screen on every workbench entry
- Collapse the queue toolbar at 1380px instead of 1180px; 1280x800
  laptops no longer get a horizontal scrollbar (verified live)
- Serve .woff2 as font/woff2 with an immutable cache header so the
  2MB bundled font is fetched once, not per page load (with test)
- Clip overflow on top-bar status chips (long apiError strings spilled
  over neighbors at 981-1180px)
- Give queue-row selection a selector that outranks the even-row
  zebra stripe (selection background was parity-dependent)

Cleanup:
- Replace the stale old-palette focus ring and ::selection literals
  with color-mix over var(--teal)
- Delete dead tokens: unused back-compat aliases (the comment claiming
  they were referenced was false), --rail-bot, --ochre-deep, and
  --font-stamp (identical to --font-ui since the Pretendard switch)
- Tokenize scattered raw colors: rail ink scale, soft tint levels,
  inset-well and bevel shadows, naver/internal source-chip triplets
- Remove the asset-preload div and three orphan SVGs nothing renders;
  tests now reject reintroducing them

Verified: 359 tests pass; Playwright audit at 1440/1280/390 shows zero
horizontal overflow on all views, Pretendard active, decision panel
fixed at the viewport corner mid-animation.
2026-06-11 11:13:46 +09:00
유창욱
3f7b3a9cf2 chore: initial commit of copyrighter (rights_filter)
Image rights / copyright detection system: SQLite store, HTTP app,
search integrations (Naver, Google Custom Search, Google Cloud Vision
web detection), image analysis (fingerprints, face/person detection,
evidence enrichment, risk scoring), an admin/review layer, governance
and retention policies, batch jobs, and a browser-based operator GUI.

This baseline incorporates a full code-review remediation pass
(46 fixes; 358 tests passing). Highlights:

CRITICAL
- Prevent evidence cascade-delete during the schema-constraint
  migration by disabling FK enforcement around the table rebuild.

Security
- Sandbox served media (neutralize stored XSS from uploaded/collected
  SVGs) via CSP + nosniff on the untrusted media routes.
- Strip embedded EXIF/GPS from external image derivatives before they
  are sent to third-party APIs.
- Return a clean 404 (not an uncaught StopIteration) for PATCH on an
  unknown provider.

Correctness
- LLM-summary failures no longer add +30 to the risk score.
- Decode only explicit JS escapes so Korean image URLs are not mangled.
- Consume search quota only after a successful request.
- Naver/Google adapters map responses inside the failure boundary, so a
  malformed response degrades to evidence instead of crashing enrichment.
- Domain-aware provider attribution; face-box IoU de-duplication; count
  searches (not result items); per-box crop isolation; clamp evidence
  confidence and Google CSE num; real submittedEpoch; and more.

Robustness
- Offline LLM connect fast-fails (short connect timeout) so seed/reload
  requests are not stalled; full read timeout preserved for generation.
- Malformed numeric env vars fall back to defaults instead of crashing
  startup.

Performance
- Per-submission evidence reads (no full-table scan per rescore),
  audit-log LIMIT, lazy active-store lookup, hoisted timestamps.

Tests
- ~24 regression tests added pinning the above fixes.

Runtime data (data/, outputs/, *.sqlite3, *.log), secrets (.env), and
node_modules are gitignored.
2026-06-09 09:50:31 +09:00
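The CRITICAL item in the baseline commit above (disabling FK enforcement around a table rebuild) can be illustrated with a minimal sketch. The table names submissions and evidence, the CHECK constraint, and the function name are hypothetical; only the technique is taken from the commit message. With foreign keys enabled, DROP TABLE performs an implicit DELETE on the parent, which would trigger ON DELETE CASCADE in child tables.

```python
import sqlite3


def rebuild_submissions(conn: sqlite3.Connection) -> None:
    """Rebuild the parent table without cascading deletes into child rows."""
    conn.commit()  # PRAGMA foreign_keys is a no-op inside an open transaction
    conn.execute("PRAGMA foreign_keys=OFF")
    try:
        with conn:  # commits the open transaction on success, rolls it back on error
            conn.execute("BEGIN")
            conn.execute(
                "CREATE TABLE submissions_new ("
                "id TEXT PRIMARY KEY, "
                "status TEXT NOT NULL CHECK (status IN ('open', 'closed')))"
            )
            conn.execute(
                "INSERT INTO submissions_new SELECT id, status FROM submissions"
            )
            conn.execute("DROP TABLE submissions")  # no implicit cascade while FKs are off
            conn.execute("ALTER TABLE submissions_new RENAME TO submissions")
    finally:
        conn.execute("PRAGMA foreign_keys=ON")
```

Running PRAGMA foreign_key_check after the rebuild is a cheap way to confirm that every child row still points at an existing parent.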