POSA_Copyrighter/docs/plans/2026-05-26-003-feat-evidence-quality-watchlist-plan.md
유창욱 3f7b3a9cf2 chore: initial commit of copyrighter (rights_filter)
Image rights / copyright detection system: SQLite store, HTTP app,
search integrations (Naver, Google Custom Search, Google Cloud Vision
web detection), image analysis (fingerprints, face/person detection,
evidence enrichment, risk scoring), an admin/review layer, governance
and retention policies, batch jobs, and a browser-based operator GUI.

This baseline incorporates a full code-review remediation pass
(46 fixes; 358 tests passing). Highlights:

CRITICAL
- Prevent evidence cascade-delete during the schema-constraint
  migration by disabling FK enforcement around the table rebuild.

Security
- Sandbox served media (neutralize stored XSS from uploaded/collected
  SVGs) via CSP + nosniff on the untrusted media routes.
- Strip embedded EXIF/GPS from external image derivatives before they
  are sent to third-party APIs.
- Return a clean 404 (not an uncaught StopIteration) for PATCH on an
  unknown provider.

Correctness
- LLM-summary failures no longer add +30 to the risk score.
- Decode only explicit JS escapes so Korean image URLs are not mangled.
- Consume search quota only after a successful request.
- Naver/Google adapters map responses inside the failure boundary, so a
  malformed response degrades to evidence instead of crashing enrichment.
- Domain-aware provider attribution; face-box IoU de-duplication; count
  searches (not result items); per-box crop isolation; clamp evidence
  confidence and Google CSE num; real submittedEpoch; and more.

Robustness
- Offline LLM connect fast-fails (short connect timeout) so seed/reload
  requests are not stalled; full read timeout preserved for generation.
- Malformed numeric env vars fall back to defaults instead of crashing
  startup.

Performance
- Per-submission evidence reads (no full-table scan per rescore),
  audit-log LIMIT, lazy active-store lookup, hoisted timestamps.

Tests
- ~24 regression tests added pinning the above fixes.

Runtime data (data/, outputs/, *.sqlite3, *.log), secrets (.env), and
node_modules are gitignored.
2026-06-09 09:50:31 +09:00


title: feat: Add Evidence Quality And Watchlist Growth
type: feat
status: implemented
date: 2026-05-26
origin: docs/brainstorms/2026-05-26-evidence-quality-watchlist-requirements.md

feat: Add Evidence Quality And Watchlist Growth

Summary

Add a decision-first feedback loop around evidence status, watchlist candidate generation, strong watchlist matching, and candidate management. The implementation will keep the existing SQLite JSON-payload pattern and extend the current operator console instead of introducing a new persistence layer.


Problem Frame

The current console can collect many internal, Google, Naver, and face-area web evidence items, but it does not yet let operators mark which evidence actually informed a case decision. It also turns rejected cases directly into rejected-reference knowledge entries with no intermediate review step, and it creates no watchlist signal at all from held cases.


Requirements

  • R1. Evidence items support operator status: used for judgment, irrelevant, false positive, and pending. (Origin R1-R3, F1, AE1-AE2)
  • R2. Evidence status never creates a DB candidate by itself; candidate creation happens only after case decision. (Origin R2, R4, AE2)
  • R3. Held and rejected case decisions automatically create persistent watchlist candidates. (Origin R5-R6, F2, AE1)
  • R4. Approved decisions do not create automatic candidates. (Origin R7, AE3)
  • R5. Watchlist candidates strongly affect future risk scoring, at roughly the same strength as confirmed DB image matches. (Origin R8, F3, AE4)
  • R6. Watchlist candidate matches are visually distinct from confirmed DB matches. (Origin R9, AE4)
  • R7. Watchlist signals never change case status automatically. (Origin R10)
  • R8. Operators can promote watchlist candidates to confirmed DB entries or exclude them as false positives. (Origin R11-R13, F4, AE5-AE6)
  • R9. Confirmed DB, watchlist candidates, and excluded candidates retain status, source decision, source evidence, and contribution counts. (Origin R14-R17)

Origin actors: A1 operator, A2 rights risk filter, A3 DB administrator

Origin flows: F1 evidence status marking, F2 decision-driven watchlist creation, F3 watchlist-based rediscovery, F4 candidate promotion and exclusion

Origin acceptance examples: AE1, AE2, AE3, AE4, AE5, AE6


Scope Boundaries

  • No automatic approval, hold, or rejection.
  • No face embeddings, face similarity database, biometric template storage, or identity recognition.
  • No Google Image Search, Google Lens, Naver web UI automation, or scraping.
  • No applicant-facing exposure of evidence statuses, watchlist candidates, scoring rules, or internal reasons.
  • No new relational migration framework; this iteration keeps the existing JSON payload tables.

Deferred to Follow-Up Work

  • Bulk watchlist cleanup, analytics dashboards, and advanced merge suggestions can follow after the core loop is proven.
  • Domain-wide false-positive suppression is deferred because it can hide valid evidence from large sites.

Context & Research

Relevant Code and Patterns

  • src/rights_filter/server/sqlite_store.py is the persistence and orchestration boundary. It stores JSON payloads in submissions, evidence, knowledge_entries, collection_candidates, corrections, and audit_events.
  • CopyrighterStore.record_decision currently updates the case decision and creates a rejected-reference knowledge entry only for rejected cases.
  • CopyrighterStore._knowledge_repository rebuilds an in-memory repository from active knowledge_entries and feeds InternalAnalyzer.
  • src/rights_filter/analysis/internal_analyzer.py emits fingerprint evidence for knowledge-base image similarity.
  • src/rights_filter/analysis/risk_scoring.py already gives high weight to strong fingerprint matches and ignores non-contributing/queued evidence.
  • web/operator-gui/app.js, index.html, and styles.css implement the current static console, evidence grouping, decision actions, candidate collection, and knowledge DB management.
  • tests/rights_filter/server/test_sqlite_store.py is the main integration test surface for persistence behavior.
  • tests/operator_gui/test_static_workbench.py protects the UI contract without browser runtime dependencies.

Institutional Learnings

  • No docs/solutions/ directory exists in this workspace.
  • No STRATEGY.md exists; the active product strategy is captured in the brainstorm requirements documents.

External References

  • No new external APIs are introduced. Existing Google/Naver/Ollama boundaries and no-scraping policy remain unchanged.

Key Technical Decisions

  • Use knowledge_entries for persistent watchlist state: watchlist candidates are persistent risk references, not transient keyword collection results, so they should not live in collection_candidates, which is cleared on each keyword search.
  • Add status fields instead of new tables: JSON payload storage lets us add entryStatus, originDecisionStatus, sourceSubmissionId, sourceEvidenceIds, and contributionCount without schema migration complexity.
  • Generate candidates from the local submission image when available: the decision API passes the local image store into record_decision, which stores a perceptual sample fingerprint for held/rejected watchlist entries. If the image store is unavailable, the candidate is still recorded but cannot participate in image similarity until a sample fingerprint is added.
  • Strong watchlist scoring: watchlist similarity should use the same high-risk path as rejected-image similarity, but with separate reason text and UI group so operators can see it is not confirmed DB evidence.
  • False-positive suppression scope: start with exact evidence identity, URL/image URL/title, and candidate fingerprint. Do not suppress an entire provider domain from one false-positive action.
  • Decision-driven default evidence set: use evidence marked used_for_judgment when available; if none are marked, generate the watchlist candidate from the case fingerprint and top contributing evidence so held/rejected decisions still strengthen future detection.

Open Questions

Resolved During Planning

  • Watchlist score strength: use the same high-confidence fingerprint match behavior as confirmed/rejected DB references, with separate UI labeling.
  • UI distinction: add a dedicated watchlist evidence group (UI label 주의 후보, "watchlist candidate") and badges rather than mixing it into confirmed internal DB evidence.
  • False-positive propagation: suppress exact evidence/candidate patterns first, not whole domains.
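The exact-pattern rule can be sketched as a set of (field, value) keys taken from one false-positive evidence item. The field names evidenceId, url, imageUrl, title, and fingerprint, and both function names, are hypothetical; the point of the sketch is only that a second URL on the same domain does not match.

```python
def suppression_keys(evidence):
    """Collect exact-match keys for one false-positive evidence item."""
    keys = set()
    for field in ("evidenceId", "url", "imageUrl", "title", "fingerprint"):
        value = evidence.get(field)
        if value:
            keys.add((field, value))
    return keys


def is_suppressed(evidence, suppressed):
    """True only when new evidence repeats an exact suppressed pattern."""
    return bool(suppression_keys(evidence) & suppressed)


false_positive = {"evidenceId": "ev-9", "url": "https://example.com/a.jpg", "title": "Photo A"}
suppressed = suppression_keys(false_positive)

same_url = {"evidenceId": "ev-10", "url": "https://example.com/a.jpg", "title": "Other"}
same_domain = {"evidenceId": "ev-11", "url": "https://example.com/b.jpg", "title": "Photo B"}
```

Here same_url is suppressed because its URL is identical, while same_domain is not, since no key is built from the host name alone.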

Deferred to Implementation

  • Exact Korean microcopy can be adjusted while fitting existing console labels.
  • Exact CSS treatment should follow the existing evidence group and chip styles after visual verification.

High-Level Technical Design

This illustrates the intended approach and is directional guidance for review, not an implementation specification.

flowchart TB
  Evidence[Collected evidence] --> Mark[Operator marks evidence status]
  Mark --> Decision[Operator decides approve / hold / reject]
  Decision -->|approved| NoCandidate[No automatic candidate]
  Decision -->|held or rejected| Watchlist[Create watchlist candidate]
  Watchlist --> Analyze[Future internal analysis]
  Confirmed[Confirmed DB entries] --> Analyze
  Analyze --> Score[Risk scoring]
  Score --> UI[Separate confirmed vs watchlist evidence groups]
  Watchlist --> Promote[Promote to confirmed DB]
  Watchlist --> Exclude[Exclude as false positive]
  Exclude --> Score

Implementation Units

flowchart TB
  U1[U1 Evidence status API and payload]
  U2[U2 Decision-driven watchlist creation]
  U3[U3 Watchlist matching and scoring]
  U4[U4 Candidate promotion and exclusion]
  U5[U5 Operator UI controls]
  U6[U6 Docs and verification]

  U1 --> U2
  U2 --> U3
  U3 --> U4
  U1 --> U5
  U3 --> U5
  U4 --> U5
  U5 --> U6

U1. Evidence Status API And Payload

Goal: Let operators mark evidence as used for judgment, irrelevant, false positive, or pending without changing case decision or DB state.

Requirements: R1, R2

Dependencies: None

Files:

  • Modify: src/rights_filter/server/sqlite_store.py
  • Modify: src/rights_filter/server/http_app.py
  • Modify: web/operator-gui/app.js
  • Modify: web/operator-gui/index.html
  • Modify: web/operator-gui/styles.css
  • Test: tests/rights_filter/server/test_sqlite_store.py
  • Test: tests/rights_filter/server/test_http_app.py
  • Test: tests/operator_gui/test_static_workbench.py

Approach:

  • Add a store method that updates an existing evidence payload with an operator evidence status and optional note.
  • Add an HTTP route for evidence status updates.
  • Keep evidence status inside each evidence payload so existing bootstrap/review responses include it automatically.
  • Treat false-positive and irrelevant evidence as non-contributing during rescore.
  • Keep pending evidence visible but non-final.
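The approach above could look roughly like the following store-level sketch, run here against an in-memory SQLite table. The table layout, the payload keys operatorStatus, operatorNote, and operatorExcluded, the stored status strings other than used_for_judgment, and the function name set_evidence_status are all assumptions, not the real CopyrighterStore API.

```python
import json
import sqlite3

# Assumed stored values for the four statuses named in R1.
EVIDENCE_STATUSES = {"used_for_judgment", "irrelevant", "false_positive", "pending"}
NON_CONTRIBUTING = {"irrelevant", "false_positive"}


def set_evidence_status(conn, evidence_id, status, note=None):
    """Update one evidence payload in place; knowledge entries are never touched."""
    if status not in EVIDENCE_STATUSES:
        raise ValueError(f"unsupported evidence status: {status}")
    row = conn.execute("SELECT payload FROM evidence WHERE id = ?", (evidence_id,)).fetchone()
    if row is None:
        raise KeyError(evidence_id)
    payload = json.loads(row[0])
    payload["operatorStatus"] = status
    if note is not None:
        payload["operatorNote"] = note
    # A rescore would skip evidence flagged here as operator-excluded.
    payload["operatorExcluded"] = status in NON_CONTRIBUTING
    conn.execute("UPDATE evidence SET payload = ? WHERE id = ?", (json.dumps(payload), evidence_id))
    return payload


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE evidence (id TEXT PRIMARY KEY, payload TEXT NOT NULL)")
conn.execute("INSERT INTO evidence VALUES (?, ?)", ("ev-1", json.dumps({"provider": "google", "points": 20})))
updated = set_evidence_status(conn, "ev-1", "irrelevant", note="different person")
```

Keeping the status inside the payload means any response that already returns evidence payloads picks it up without a separate join.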

Execution note: Test-first. Start with store-level tests proving status changes do not create candidates and do affect rescore contribution.

Patterns to follow:

  • Existing record_decision, _put, _evidence_by_submission, and HTTP body parsing patterns in src/rights_filter/server/sqlite_store.py and src/rights_filter/server/http_app.py.

Test scenarios:

  • Happy path: marking a Google evidence item as used for judgment persists in review() and bootstrap().
  • Happy path: marking evidence as irrelevant sets it non-contributing and rescore omits its points.
  • Edge case: marking a missing evidence ID returns a not-found error.
  • Edge case: unsupported evidence status returns a validation error.
  • Integration: HTTP evidence status route updates the review payload.

Verification:

  • Evidence status is visible in the API payload and does not create any knowledge entry or watchlist candidate by itself.

U2. Decision-Driven Watchlist Creation

Goal: Create persistent watchlist candidates automatically after held or rejected decisions, using case fingerprint evidence and judgment-used evidence.

Requirements: R2, R3, R4, R9

Dependencies: U1

Files:

  • Modify: src/rights_filter/server/sqlite_store.py
  • Test: tests/rights_filter/server/test_sqlite_store.py

Approach:

  • Extend record_decision so held and rejected decisions create or update a watchlist entry.
  • Stop treating rejected decisions as immediately confirmed DB entries; rejected decisions should create watchlist entries first, then operators can promote them.
  • Populate watchlist payloads with source submission, origin decision status, source evidence IDs, sample fingerprints, memo, active/excluded state, and contribution count.
  • Use the case's generated fingerprint evidence as the primary sample fingerprint source.
  • Prefer evidence marked used for judgment; if none is marked, fall back to the top contributing evidence plus the case fingerprint so that held and rejected decisions still strengthen future detection.
  • Ensure repeated decisions update the existing source-submission watchlist entry instead of creating duplicates.
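A hedged sketch of the create-or-update behavior described above, using a plain dict in place of the knowledge_entries table. The deterministic key format, the points and operatorStatus evidence fields, the top_n cutoff, and both function names are illustrative assumptions.

```python
def select_source_evidence(evidence_items, top_n=3):
    """Prefer evidence marked used for judgment; otherwise take top contributors."""
    used = [e for e in evidence_items if e.get("operatorStatus") == "used_for_judgment"]
    if used:
        return used
    ranked = sorted(evidence_items, key=lambda e: e.get("points", 0), reverse=True)
    return ranked[:top_n]


def upsert_watchlist_entry(entries, submission_id, decision_status, evidence_items):
    """Create or update the single watchlist entry tied to one submission."""
    if decision_status not in ("held", "rejected"):
        return None  # approved decisions create no automatic candidate
    entry_id = f"watchlist:{submission_id}"  # hypothetical deterministic key
    entry = entries.setdefault(
        entry_id,
        {"entryStatus": "watchlist", "sourceSubmissionId": submission_id, "contributionCount": 0},
    )
    entry["originDecisionStatus"] = decision_status
    entry["sourceEvidenceIds"] = [e["id"] for e in select_source_evidence(evidence_items)]
    return entry


entries = {}
evidence = [
    {"id": "ev-1", "points": 10},
    {"id": "ev-2", "points": 40, "operatorStatus": "used_for_judgment"},
]
upsert_watchlist_entry(entries, "sub-1", "held", evidence)
upsert_watchlist_entry(entries, "sub-1", "rejected", evidence)  # same entry, updated
```

Deriving the key from the submission ID is one simple way to make repeated decisions idempotent; the real store may identify the existing entry differently.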

Execution note: Test-first around decision outcomes before changing the existing rejected-entry behavior.

Patterns to follow:

  • Existing automatic rejected-reference creation in record_decision.
  • Existing knowledge-entry payload shape from register_manual_knowledge_entry and candidate promotion methods.

Test scenarios:

  • Happy path: held decision creates one active watchlist entry with source submission and fingerprint.
  • Happy path: rejected decision creates one active watchlist entry with source evidence IDs.
  • Happy path: approved decision creates no watchlist entry.
  • Edge case: repeating held/rejected decision for the same submission updates one candidate, not duplicates.
  • Edge case: when no evidence is marked used for judgment, the decision still creates a watchlist entry (possibly incomplete) from the available fingerprint evidence.

Verification:

  • Held and rejected decisions create persistent watchlist entries, approval does not, and candidate provenance is visible in knowledgeEntries.

U3. Watchlist Matching And Scoring

Goal: Make watchlist candidates strongly affect future risk while remaining distinguishable from confirmed DB entries.

Requirements: R5, R6, R7, R9

Dependencies: U2

Files:

  • Modify: src/rights_filter/domain/records.py
  • Modify: src/rights_filter/analysis/internal_analyzer.py
  • Modify: src/rights_filter/analysis/risk_scoring.py
  • Modify: src/rights_filter/server/sqlite_store.py
  • Test: tests/rights_filter/analysis/test_internal_analyzer.py
  • Test: tests/rights_filter/analysis/test_risk_scoring.py
  • Test: tests/rights_filter/server/test_sqlite_store.py

Approach:

  • Carry knowledge entry status into internal fingerprint evidence so matches can be labeled as watchlist or confirmed.
  • Keep watchlist entries active for matching unless excluded.
  • Score watchlist image similarity at the same high-risk level as confirmed rejected-image similarity when similarity is high.
  • Use distinct evidence reason/data for watchlist matches so UI grouping can separate them.
  • Increment contribution count when a watchlist entry contributes to a rescore or analysis result.
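The labeling and scoring idea can be illustrated as follows. The evidence kind strings, the 0.9 similarity threshold, the 60-point weight, and the function names are placeholders chosen for this sketch; the actual values live in InternalAnalyzer and RiskScorer and are not stated in this plan.

```python
WATCHLIST_KIND = "watchlist_similarity"   # assumed evidence kinds
CONFIRMED_KIND = "confirmed_similarity"
HIGH_SIMILARITY = 0.9                     # assumed threshold
HIGH_RISK_POINTS = 60                     # assumed weight


def fingerprint_evidence(entry, similarity):
    """Label a match by the status of the knowledge entry that produced it."""
    status = entry.get("entryStatus", "confirmed")  # legacy entries count as confirmed
    if status == "excluded":
        return None  # excluded entries never produce evidence
    is_watchlist = status == "watchlist"
    return {
        "kind": WATCHLIST_KIND if is_watchlist else CONFIRMED_KIND,
        "similarity": similarity,
        "reason": "similar to watchlist candidate" if is_watchlist else "similar to confirmed reference",
    }


def score(evidence_items):
    """Both kinds share the same high-risk weight; only the label differs."""
    total = 0
    for item in evidence_items:
        if item is not None and item["similarity"] >= HIGH_SIMILARITY:
            total += HIGH_RISK_POINTS
    return min(total, 100)


watch = fingerprint_evidence({"entryStatus": "watchlist"}, 0.95)
confirmed = fingerprint_evidence({"entryStatus": "confirmed"}, 0.95)
```

Note that score returns a number only: nothing in this path writes a decision status, which matches R7.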

Execution note: Test scoring and reason text before wiring UI labels.

Patterns to follow:

  • InternalAnalyzer knowledge-base similarity loop.
  • RiskScorer fingerprint evidence handling and non-contributing evidence checks.

Test scenarios:

  • Happy path: image similar to a watchlist entry emits watchlist similarity evidence.
  • Happy path: watchlist similarity at or above threshold produces high-risk score.
  • Happy path: matched watchlist evidence does not change decisionStatus.
  • Edge case: excluded watchlist entry is not included in repository matching.
  • Integration: contribution count increases only when watchlist evidence contributes to the case score.

Verification:

  • Watchlist matches raise risk strongly while remaining labeled as watchlist-derived evidence.

U4. Candidate Promotion And False-Positive Exclusion

Goal: Let operators promote watchlist candidates to confirmed DB entries or exclude them so future matching is suppressed.

Requirements: R8, R9

Dependencies: U2, U3

Files:

  • Modify: src/rights_filter/server/sqlite_store.py
  • Modify: src/rights_filter/server/http_app.py
  • Test: tests/rights_filter/server/test_sqlite_store.py
  • Test: tests/rights_filter/server/test_http_app.py

Approach:

  • Add store methods and HTTP routes for promoting a watchlist entry and excluding a watchlist entry.
  • Promotion changes the entry status to confirmed while preserving source decision and evidence history.
  • Exclusion changes the entry status to excluded, disables matching, and stores an exclusion reason.
  • Apply false-positive evidence status to exact evidence/candidate patterns, image fingerprint, URL/image URL, and title where available.
  • Add audit events for promotion and exclusion.
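One way to picture the status transitions, written as a sketch over plain dicts. The InvalidTransition exception, the audit action names, and the active and exclusionReason keys are assumptions for illustration; the real methods would sit on the store and write to audit_events.

```python
class InvalidTransition(ValueError):
    """Raised when an entry cannot move to the requested status."""


def promote(entry, audit_log):
    """Move watchlist -> confirmed; provenance fields are left untouched."""
    if entry["entryStatus"] == "excluded":
        raise InvalidTransition("excluded entries must be un-excluded before promotion")
    entry["entryStatus"] = "confirmed"
    audit_log.append({"action": "watchlist.promote", "entryId": entry["id"]})
    return entry


def exclude(entry, reason, audit_log):
    """Mark an entry as a false positive and take it out of matching."""
    entry["entryStatus"] = "excluded"
    entry["active"] = False
    entry["exclusionReason"] = reason
    audit_log.append({"action": "watchlist.exclude", "entryId": entry["id"]})
    return entry


audit_log = []
candidate = {
    "id": "kb-1",
    "entryStatus": "watchlist",
    "active": True,
    "sourceSubmissionId": "sub-1",
    "sampleFingerprints": ["a1b2c3"],
}
promote(candidate, audit_log)
```

Both functions mutate only status-related keys, so sourceSubmissionId and sampleFingerprints survive every transition.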

Execution note: Characterize existing manual/collection promotion behavior first, then add watchlist-specific paths.

Patterns to follow:

  • Existing promote_collection_candidate, promote_collection_candidates, and knowledge entry active/deactivation patterns.

Test scenarios:

  • Happy path: promoting a watchlist entry makes it confirmed and keeps sample fingerprints.
  • Happy path: excluding a watchlist entry prevents future similarity evidence from that entry.
  • Edge case: promoting an excluded entry requires an explicit un-exclude step first; otherwise it returns a validation error.
  • Edge case: missing candidate ID returns not found.
  • Integration: audit log records promote/exclude actions.

Verification:

  • Operators can move candidates between watchlist, confirmed, and excluded states without losing provenance.

U5. Operator UI Controls And Evidence Grouping

Goal: Make evidence status, watchlist matches, and candidate actions clear in the operator console.

Requirements: R1, R3, R6, R8, R9

Dependencies: U1, U3, U4

Files:

  • Modify: web/operator-gui/index.html
  • Modify: web/operator-gui/app.js
  • Modify: web/operator-gui/styles.css
  • Test: tests/operator_gui/test_static_workbench.py

Approach:

  • Add evidence-row controls for 판단에 사용 (used for judgment), 무관 (irrelevant), 오탐 (false positive), 보류 (pending).
  • Hide or de-emphasize irrelevant and false-positive evidence by default while preserving a details view.
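  • Add a dedicated 주의 후보 근거 (watchlist candidate evidence) group for watchlist matches.
  • Add watchlist status chips in the knowledge DB list: 주의 후보 (watchlist candidate), 확정 기준 (confirmed reference), 오탐 제외 (excluded as false positive).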
  • Add promote/exclude actions for watchlist rows.
  • Keep controls dense and consistent with the existing operator dashboard; avoid introducing a separate landing or wizard.

Execution note: Follow the frontend design checks after implementation: load the local page served on port 9500 with Playwright and check for console errors and obvious layout breakage.

Patterns to follow:

  • Existing evidence group rendering, details overflow, candidate cards, and knowledge rows in web/operator-gui/app.js.
  • Existing compact panel and row styles in web/operator-gui/styles.css.

Test scenarios:

  • Static contract: UI exposes evidence status action handlers and API paths.
  • Static contract: watchlist group label and knowledge status chips are present.
  • Static contract: irrelevant/false-positive evidence handling is represented in rendering functions.
  • Browser check: page loads on desktop viewport without console errors after server restart.
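The static-contract scenarios above can be checked without a browser by scanning the JavaScript source for required names. In this sketch the APP_JS snippet, the handler name setEvidenceStatus, the route path, and the helper missing_contract_items are all invented for illustration; a real test would presumably read web/operator-gui/app.js from disk instead of an inline string.

```python
# Stand-in for the contents of app.js; the handler name and route are hypothetical.
APP_JS = '''
async function setEvidenceStatus(evidenceId, status) {
  return fetch(`/api/evidence/${evidenceId}/status`, {
    method: "POST",
    body: JSON.stringify({ status }),
  });
}
const WATCHLIST_GROUP_LABEL = "주의 후보 근거";
'''


def missing_contract_items(source, required):
    """Return the required names or labels that do not appear in the source."""
    return [item for item in required if item not in source]


missing = missing_contract_items(APP_JS, ["setEvidenceStatus", "주의 후보 근거"])
```

A substring check like this is deliberately coarse: it catches a removed handler or label, not a behavioral regression, which is why the browser check remains a separate scenario.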

Verification:

  • Operators can mark evidence status, see watchlist evidence separately, and manage watchlist entries without confusing them with confirmed DB entries.

U6. Documentation And Regression Verification

Goal: Update operations guidance and verify the feature end to end.

Requirements: R1-R9

Dependencies: U1-U5

Files:

  • Modify: docs/operations/copyrighter-operation-worklist.md
  • Test: tests/rights_filter/server/test_sqlite_store.py
  • Test: tests/rights_filter/server/test_http_app.py
  • Test: tests/operator_gui/test_static_workbench.py

Approach:

  • Document the operator flow: mark evidence, decide case, watchlist creation, promotion, exclusion.
  • State that watchlist matching is strong but not automatic case disposition.
  • Run full test suite.
  • Restart the 9500 server and verify /health, provider state, and browser load.

Execution note: Preserve the active .env and existing local data. Do not reset DB unless the user explicitly asks.

Patterns to follow:

  • Existing operations doc format and local server verification pattern.

Test scenarios:

  • Integration: full pytest passes.
  • Browser: 9500 page loads without console errors.
  • Operational: /health returns ok after restart.

Verification:

  • Feature is documented, tests pass, and the local server is running with the updated code.

System-Wide Impact

  • Interaction graph: Case decisions now trigger watchlist updates; evidence status affects scoring contribution; internal analysis reads active confirmed/watchlist entries.
  • Error propagation: Invalid evidence status, missing evidence, missing candidate, or invalid promotion/exclusion should return clear API errors without corrupting stored payloads.
  • State lifecycle risks: Repeated held/rejected decisions must be idempotent per submission. Promotion and exclusion must not lose source decision provenance.
  • API surface parity: Bootstrap, review, knowledge list, and evidence rows all need the new fields so the static UI stays in sync with server state.
  • Integration coverage: Store tests must cover decision-to-watchlist-to-analysis; UI static tests must cover controls and grouping.
  • Unchanged invariants: No automatic final case disposition, no applicant exposure, no biometric face storage, no scraping.

Risks & Dependencies

| Risk | Mitigation |
| --- | --- |
| Watchlist candidates over-amplify false positives | Keep watchlist visually distinct, add exclusion flow, and do not apply domain-wide suppression. |
| Rejected-entry behavior changes existing expectations | Update tests to make watchlist the automatic intermediate state and promotion the explicit confirmed state. |
| JSON payload fields drift across old records | Use default values when fields are absent and normalize in rendering/scoring paths. |
| UI becomes crowded | Use compact segmented evidence actions and keep weak/irrelevant evidence collapsed. |
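For the payload-drift mitigation, a small normalization helper applied on read is one option. The default of "confirmed" for entries that predate the watchlist, the ENTRY_DEFAULTS table, and the name normalize_entry are assumptions of this sketch.

```python
import copy

# Assumed defaults for records written before the new fields existed.
ENTRY_DEFAULTS = {
    "entryStatus": "confirmed",  # assumption: legacy entries were confirmed references
    "originDecisionStatus": None,
    "sourceSubmissionId": None,
    "sourceEvidenceIds": [],
    "contributionCount": 0,
}


def normalize_entry(payload):
    """Return a copy of the payload with missing fields filled from defaults."""
    normalized = dict(payload)
    for key, default in ENTRY_DEFAULTS.items():
        # deepcopy keeps the shared default list from being mutated by callers
        normalized.setdefault(key, copy.deepcopy(default))
    return normalized


legacy = normalize_entry({"label": "old rejected reference"})
```

Normalizing at the read boundary keeps rendering and scoring code free of per-field existence checks.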

Documentation / Operational Notes

  • Update the operations doc with the decision-first flow and the difference between 주의 후보 (watchlist candidate) and 확정 기준 (confirmed reference) DB entries.
  • Keep the current .env behavior unchanged.
  • Restart the 9500 server after implementation so the operator console uses the updated route handlers and JS.

Sources & References