POSA_Copyrighter/docs/plans/2026-05-26-003-feat-evidence-quality-watchlist-plan.md
유창욱 3f7b3a9cf2 chore: initial commit of copyrighter (rights_filter)
2026-06-09 09:50:31 +09:00


---
title: "feat: Add Evidence Quality And Watchlist Growth"
type: feat
status: implemented
date: 2026-05-26
origin: docs/brainstorms/2026-05-26-evidence-quality-watchlist-requirements.md
---
# feat: Add Evidence Quality And Watchlist Growth
## Summary
Add a decision-first feedback loop around evidence status, watchlist candidate generation, strong watchlist matching, and candidate management. The implementation will keep the existing SQLite JSON-payload pattern and extend the current operator console instead of introducing a new persistence layer.
---
## Problem Frame
The current console can collect many internal, Google, Naver, and face-area web evidence items, but it does not yet let operators mark which evidence actually informed a case decision. It also promotes rejected cases straight to confirmed DB references, which is too blunt, and it creates no watchlist signals from held cases.
---
## Requirements
- R1. Evidence items support operator status: used for judgment, irrelevant, false positive, and pending. (Origin R1-R3, F1, AE1-AE2)
- R2. Evidence status never creates a DB candidate by itself; candidate creation happens only after case decision. (Origin R2, R4, AE2)
- R3. Held and rejected case decisions automatically create persistent watchlist candidates. (Origin R5-R6, F2, AE1)
- R4. Approved decisions do not create automatic candidates. (Origin R7, AE3)
- R5. Watchlist candidates strongly affect future risk scoring, at roughly the same strength as confirmed DB image matches. (Origin R8, F3, AE4)
- R6. Watchlist candidate matches are visually distinct from confirmed DB matches. (Origin R9, AE4)
- R7. Watchlist signals never change case status automatically. (Origin R10)
- R8. Operators can promote watchlist candidates to confirmed DB entries or exclude them as false positives. (Origin R11-R13, F4, AE5-AE6)
- R9. Confirmed DB, watchlist candidates, and excluded candidates retain status, source decision, source evidence, and contribution counts. (Origin R14-R17)
**Origin actors:** A1 operator, A2 rights risk filter, A3 DB administrator
**Origin flows:** F1 evidence status marking, F2 decision-driven watchlist creation, F3 watchlist-based rediscovery, F4 candidate promotion and exclusion
**Origin acceptance examples:** AE1, AE2, AE3, AE4, AE5, AE6
---
## Scope Boundaries
- No automatic approval, hold, or rejection.
- No face embeddings, face similarity database, biometric template storage, or identity recognition.
- No Google Image Search, Google Lens, Naver web UI automation, or scraping.
- No applicant-facing exposure of evidence statuses, watchlist candidates, scoring rules, or internal reasons.
- No new relational migration framework; this iteration keeps the existing JSON payload tables.
### Deferred to Follow-Up Work
- Bulk watchlist cleanup, analytics dashboards, and advanced merge suggestions can follow after the core loop is proven.
- Domain-wide false-positive suppression is deferred because it can hide valid evidence from large sites.
---
## Context & Research
### Relevant Code and Patterns
- `src/rights_filter/server/sqlite_store.py` is the persistence and orchestration boundary. It stores JSON payloads in `submissions`, `evidence`, `knowledge_entries`, `collection_candidates`, `corrections`, and `audit_events`.
- `CopyrighterStore.record_decision` currently updates case decision and creates a rejected-reference knowledge entry only for rejected cases.
- `CopyrighterStore._knowledge_repository` rebuilds an in-memory repository from active `knowledge_entries` and feeds `InternalAnalyzer`.
- `src/rights_filter/analysis/internal_analyzer.py` emits fingerprint evidence for knowledge-base image similarity.
- `src/rights_filter/analysis/risk_scoring.py` already gives high weight to strong fingerprint matches and ignores non-contributing/queued evidence.
- `web/operator-gui/app.js`, `index.html`, and `styles.css` implement the current static console, evidence grouping, decision actions, candidate collection, and knowledge DB management.
- `tests/rights_filter/server/test_sqlite_store.py` is the main integration test surface for persistence behavior.
- `tests/operator_gui/test_static_workbench.py` protects the UI contract without browser runtime dependencies.
### Institutional Learnings
- No `docs/solutions/` directory exists in this workspace.
- No `STRATEGY.md` exists; the active product strategy is captured in the brainstorm requirements documents.
### External References
- No new external APIs are introduced. Existing Google/Naver/Ollama boundaries and no-scraping policy remain unchanged.
---
## Key Technical Decisions
- Use `knowledge_entries` for persistent watchlist state: watchlist candidates are persistent risk references, not transient keyword collection results, so they should not live in `collection_candidates`, which is cleared on each keyword search.
- Add status fields instead of new tables: JSON payload storage lets us add `entryStatus`, `originDecisionStatus`, `sourceSubmissionId`, `sourceEvidenceIds`, and `contributionCount` without schema migration complexity.
- Generate candidates from the local submission image when available: the decision API passes the local image store into `record_decision`, which stores a perceptual sample fingerprint for held/rejected watchlist entries. If the image store is unavailable, the candidate is still recorded but cannot participate in image similarity until a sample fingerprint is added.
- Strong watchlist scoring: watchlist similarity should use the same high-risk path as rejected-image similarity, but with separate reason text and UI group so operators can see it is not confirmed DB evidence.
- False-positive suppression scope: start with exact evidence identity, URL/image URL/title, and candidate fingerprint. Do not suppress an entire provider domain from one false-positive action.
- Decision-driven default evidence set: use evidence marked `used_for_judgment` when available; if none are marked, generate the watchlist candidate from the case fingerprint and top contributing evidence so held/rejected decisions still strengthen future detection.
---
## Open Questions
### Resolved During Planning
- Watchlist score strength: use the same high-confidence fingerprint match behavior as confirmed/rejected DB references, with separate UI labeling.
- UI distinction: add a dedicated watchlist evidence group, labeled 주의 후보 ("watchlist candidate") in the console, with its own badges rather than mixing it into confirmed internal DB evidence.
- False-positive propagation: suppress exact evidence/candidate patterns first, not whole domains.
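
Exact-pattern suppression, as opposed to domain-wide suppression, might look like the sketch below. The helper names and the `id`, `url`, `imageUrl`, `title`, and `fingerprint` payload keys are assumptions made for illustration.

```python
from typing import Any

# Fields compared exactly. The provider domain is deliberately not one of them.
EXACT_FIELDS = ("id", "url", "imageUrl", "title", "fingerprint")


def suppression_keys(item: dict[str, Any]) -> set[tuple[str, str]]:
    """Build (field, value) keys from one evidence or candidate payload."""
    return {(f, item[f]) for f in EXACT_FIELDS if item.get(f)}


def is_suppressed(item: dict[str, Any], suppressed: set[tuple[str, str]]) -> bool:
    """True when the item shares at least one exact key with a false positive."""
    return bool(suppression_keys(item) & suppressed)
```

Because keys are compared as whole values, a second page on the same site is not suppressed unless its URL, image URL, title, or fingerprint matches exactly.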
### Deferred to Implementation
- Exact Korean microcopy can be adjusted while fitting existing console labels.
- Exact CSS treatment should follow the existing evidence group and chip styles after visual verification.
---
## High-Level Technical Design
> *This illustrates the intended approach and is directional guidance for review, not an implementation specification.*
```mermaid
flowchart TB
Evidence[Collected evidence] --> Mark[Operator marks evidence status]
Mark --> Decision[Operator decides approve / hold / reject]
Decision -->|approved| NoCandidate[No automatic candidate]
Decision -->|held or rejected| Watchlist[Create watchlist candidate]
Watchlist --> Analyze[Future internal analysis]
Confirmed[Confirmed DB entries] --> Analyze
Analyze --> Score[Risk scoring]
Score --> UI[Separate confirmed vs watchlist evidence groups]
Watchlist --> Promote[Promote to confirmed DB]
Watchlist --> Exclude[Exclude as false positive]
Exclude --> Score
```
---
## Implementation Units
```mermaid
flowchart TB
U1[U1 Evidence status API and payload]
U2[U2 Decision-driven watchlist creation]
U3[U3 Watchlist matching and scoring]
U4[U4 Candidate promotion and exclusion]
U5[U5 Operator UI controls]
U6[U6 Docs and verification]
U1 --> U2
U2 --> U3
U3 --> U4
U1 --> U5
U3 --> U5
U4 --> U5
U5 --> U6
```
### U1. Evidence Status API And Payload
**Goal:** Let operators mark evidence as used for judgment, irrelevant, false positive, or pending without changing case decision or DB state.
**Requirements:** R1, R2
**Dependencies:** None
**Files:**
- Modify: `src/rights_filter/server/sqlite_store.py`
- Modify: `src/rights_filter/server/http_app.py`
- Modify: `web/operator-gui/app.js`
- Modify: `web/operator-gui/index.html`
- Modify: `web/operator-gui/styles.css`
- Test: `tests/rights_filter/server/test_sqlite_store.py`
- Test: `tests/rights_filter/server/test_http_app.py`
- Test: `tests/operator_gui/test_static_workbench.py`
**Approach:**
- Add a store method that updates an existing evidence payload with an operator evidence status and optional note.
- Add an HTTP route for evidence status updates.
- Keep evidence status inside each evidence payload so existing bootstrap/review responses include it automatically.
- Treat false-positive and irrelevant evidence as non-contributing during rescore.
- Keep pending evidence visible but non-final.
**Execution note:** Test-first. Start with store-level tests proving status changes do not create candidates and do affect rescore contribution.
**Patterns to follow:**
- Existing `record_decision`, `_put`, `_evidence_by_submission`, and HTTP body parsing patterns in `src/rights_filter/server/sqlite_store.py` and `src/rights_filter/server/http_app.py`.
**Test scenarios:**
- Happy path: marking a Google evidence item as used for judgment persists in `review()` and `bootstrap()`.
- Happy path: marking evidence as irrelevant sets it non-contributing and rescore omits its points.
- Edge case: marking a missing evidence ID returns a not-found error.
- Edge case: unsupported evidence status returns a validation error.
- Integration: HTTP evidence status route updates the review payload.
**Verification:**
- Evidence status is visible in the API payload and does not create any knowledge entry or watchlist candidate by itself.
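
A minimal sketch of the U1 store method, assuming an `evidence(id, payload)` table with a JSON text payload. The function names, the `operatorStatus`/`operatorNote`/`contributing` keys, and every status string except `used_for_judgment` are hypothetical. The update touches only the evidence row, so it cannot create a knowledge entry or watchlist candidate.

```python
import json
import sqlite3
from typing import Any, Optional

ALLOWED_STATUSES = {"used_for_judgment", "irrelevant", "false_positive", "pending"}
NON_CONTRIBUTING = {"irrelevant", "false_positive"}


def set_evidence_status(
    conn: sqlite3.Connection,
    evidence_id: str,
    status: str,
    note: Optional[str] = None,
) -> dict[str, Any]:
    """Persist an operator status on one evidence payload."""
    if status not in ALLOWED_STATUSES:
        raise ValueError(f"unsupported evidence status: {status!r}")
    row = conn.execute(
        "SELECT payload FROM evidence WHERE id = ?", (evidence_id,)
    ).fetchone()
    if row is None:
        raise KeyError(evidence_id)  # the HTTP layer would map this to not-found

    payload = json.loads(row[0])
    payload["operatorStatus"] = status
    if note is not None:
        payload["operatorNote"] = note
    conn.execute(
        "UPDATE evidence SET payload = ? WHERE id = ?",
        (json.dumps(payload, ensure_ascii=False), evidence_id),
    )
    conn.commit()
    return payload


def contributes(payload: dict[str, Any]) -> bool:
    """Dismissed evidence never adds points during rescore."""
    return bool(
        payload.get("contributing", True)
        and payload.get("operatorStatus") not in NON_CONTRIBUTING
    )
```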
---
### U2. Decision-Driven Watchlist Creation
**Goal:** Create persistent watchlist candidates automatically after held or rejected decisions, using case fingerprint evidence and judgment-used evidence.
**Requirements:** R2, R3, R4, R9
**Dependencies:** U1
**Files:**
- Modify: `src/rights_filter/server/sqlite_store.py`
- Test: `tests/rights_filter/server/test_sqlite_store.py`
**Approach:**
- Extend `record_decision` so `held` and `rejected` decisions create or update a watchlist entry.
- Stop treating rejected decisions as immediately confirmed DB entries; rejected decisions should create watchlist entries first, then operators can promote them.
- Populate watchlist payloads with source submission, origin decision status, source evidence IDs, sample fingerprints, memo, active/excluded state, and contribution count.
- Use the case's generated fingerprint evidence as the primary sample fingerprint source.
- Prefer evidence marked used for judgment; if none is marked, fall back to the top contributing evidence plus the case fingerprint so strict detection still grows.
- Ensure repeated decisions update the existing source-submission watchlist entry instead of creating duplicates.
**Execution note:** Test-first around decision outcomes before changing the existing rejected-entry behavior.
**Patterns to follow:**
- Existing automatic rejected-reference creation in `record_decision`.
- Existing knowledge-entry payload shape from `register_manual_knowledge_entry` and candidate promotion methods.
**Test scenarios:**
- Happy path: held decision creates one active watchlist entry with source submission and fingerprint.
- Happy path: rejected decision creates one active watchlist entry with source evidence IDs.
- Happy path: approved decision creates no watchlist entry.
- Edge case: repeating held/rejected decision for the same submission updates one candidate, not duplicates.
- Edge case: when no evidence is marked used for judgment, the decision still creates a watchlist entry from the available fingerprint evidence, even if the entry is incomplete.
**Verification:**
- Held and rejected decisions create persistent watchlist entries, approval does not, and candidate provenance is visible in `knowledgeEntries`.
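
The idempotency requirement can be illustrated with an in-memory sketch that keys the entry on the source submission. `apply_decision`, the `watchlist:<submission>` ID scheme, the `sampleFingerprints` key, and the `watchlist` status string are assumptions; the other field names follow the Key Technical Decisions section. An entry that was already promoted or excluded keeps its status when the decision is repeated, because `entryStatus` is only set on creation.

```python
from typing import Any, Optional


def apply_decision(
    entries: dict[str, dict[str, Any]],
    submission_id: str,
    decision: str,
    evidence_ids: list[str],
    fingerprint: Optional[str],
) -> Optional[dict[str, Any]]:
    """Create or update the watchlist entry for one case decision."""
    if decision not in {"held", "rejected"}:
        return None  # approved decisions create no automatic candidate

    # A deterministic ID per submission keeps repeated decisions idempotent.
    entry_id = f"watchlist:{submission_id}"
    entry = entries.get(entry_id)
    if entry is None:
        entry = {
            "id": entry_id,
            "entryStatus": "watchlist",
            "sourceSubmissionId": submission_id,
            "contributionCount": 0,
            "sampleFingerprints": [],
        }
        entries[entry_id] = entry

    entry["originDecisionStatus"] = decision
    entry["sourceEvidenceIds"] = sorted(set(evidence_ids))
    # Without a fingerprint the entry is recorded but cannot match images yet.
    if fingerprint and fingerprint not in entry["sampleFingerprints"]:
        entry["sampleFingerprints"].append(fingerprint)
    return entry
```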
---
### U3. Watchlist Matching And Scoring
**Goal:** Make watchlist candidates strongly affect future risk while remaining distinguishable from confirmed DB entries.
**Requirements:** R5, R6, R7, R9
**Dependencies:** U2
**Files:**
- Modify: `src/rights_filter/domain/records.py`
- Modify: `src/rights_filter/analysis/internal_analyzer.py`
- Modify: `src/rights_filter/analysis/risk_scoring.py`
- Modify: `src/rights_filter/server/sqlite_store.py`
- Test: `tests/rights_filter/analysis/test_internal_analyzer.py`
- Test: `tests/rights_filter/analysis/test_risk_scoring.py`
- Test: `tests/rights_filter/server/test_sqlite_store.py`
**Approach:**
- Carry knowledge entry status into internal fingerprint evidence so matches can be labeled as watchlist or confirmed.
- Keep watchlist entries active for matching unless excluded.
- Score watchlist image similarity at the same high-risk level as confirmed rejected-image similarity when similarity is high.
- Use distinct evidence reason/data for watchlist matches so UI grouping can separate them.
- Increment contribution count when a watchlist entry contributes to a rescore or analysis result.
**Execution note:** Test scoring and reason text before wiring UI labels.
**Patterns to follow:**
- `InternalAnalyzer` knowledge-base similarity loop.
- `RiskScorer` fingerprint evidence handling and non-contributing evidence checks.
**Test scenarios:**
- Happy path: image similar to a watchlist entry emits watchlist similarity evidence.
- Happy path: watchlist similarity at or above threshold produces high-risk score.
- Happy path: matched watchlist evidence does not change `decisionStatus`.
- Edge case: excluded watchlist entry is not included in repository matching.
- Integration: contribution count increases only when watchlist evidence contributes to the case score.
**Verification:**
- Watchlist matches raise risk strongly while remaining labeled as watchlist-derived evidence.
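
As a rough sketch of the scoring rule, a watchlist match can earn the same points as a confirmed match while carrying a distinct evidence kind and reason. The threshold, point value, `kind` strings, and function names below are illustrative assumptions, as is treating entries without `entryStatus` as confirmed. Note that `score_case` never writes `decisionStatus`.

```python
from typing import Any, Optional

HIGH_SIMILARITY = 0.90    # assumed threshold, not the project's value
STRONG_MATCH_POINTS = 60  # assumed weight shared by both match kinds


def match_evidence(similarity: float, entry: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Emit fingerprint evidence for one knowledge entry, or None."""
    status = entry.get("entryStatus", "confirmed")
    if status == "excluded" or similarity < HIGH_SIMILARITY:
        return None
    watchlist = status == "watchlist"
    return {
        "kind": "watchlist_similarity" if watchlist else "confirmed_similarity",
        "entryId": entry["id"],
        "points": STRONG_MATCH_POINTS,
        "reason": "similar to watchlist candidate" if watchlist
        else "similar to confirmed DB image",
    }


def score_case(
    case: dict[str, Any], matches: list[tuple[float, dict[str, Any]]]
) -> dict[str, Any]:
    """Rescore a case; the operator-owned decisionStatus is left untouched."""
    evidence = []
    for similarity, entry in matches:
        item = match_evidence(similarity, entry)
        if item is None:
            continue
        # Count only entries that actually contributed to this score.
        entry["contributionCount"] = entry.get("contributionCount", 0) + 1
        evidence.append(item)
    score = min(100, sum(e["points"] for e in evidence))
    return {**case, "riskScore": score, "evidence": evidence}
```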
---
### U4. Candidate Promotion And False-Positive Exclusion
**Goal:** Let operators promote watchlist candidates to confirmed DB entries or exclude them so future matching is suppressed.
**Requirements:** R8, R9
**Dependencies:** U2, U3
**Files:**
- Modify: `src/rights_filter/server/sqlite_store.py`
- Modify: `src/rights_filter/server/http_app.py`
- Test: `tests/rights_filter/server/test_sqlite_store.py`
- Test: `tests/rights_filter/server/test_http_app.py`
**Approach:**
- Add store methods and HTTP routes for promoting a watchlist entry and excluding a watchlist entry.
- Promotion changes the entry status to confirmed while preserving source decision and evidence history.
- Exclusion changes the entry status to excluded, disables matching, and stores an exclusion reason.
- Apply false-positive evidence status to exact evidence/candidate patterns, image fingerprint, URL/image URL, and title where available.
- Add audit events for promotion and exclusion.
**Execution note:** Characterize existing manual/collection promotion behavior first, then add watchlist-specific paths.
**Patterns to follow:**
- Existing `promote_collection_candidate`, `promote_collection_candidates`, and knowledge entry active/deactivation patterns.
**Test scenarios:**
- Happy path: promoting a watchlist entry makes it confirmed and keeps sample fingerprints.
- Happy path: excluding a watchlist entry prevents future similarity evidence from that entry.
- Edge case: promoting an excluded entry requires an explicit un-exclude step first; otherwise it returns a validation error.
- Edge case: missing candidate ID returns not found.
- Integration: audit log records promote/exclude actions.
**Verification:**
- Operators can move candidates between watchlist, confirmed, and excluded states without losing provenance.
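
One way to keep promotion and exclusion from losing provenance is a small transition table that changes only the status field. The table itself is an assumption (in particular the un-exclude path back to watchlist), as are the `transition` function, the `exclusionReason` key, and the audit event shape.

```python
from typing import Any, Optional

# Assumed allowed moves: (current status, target status).
ALLOWED_TRANSITIONS = {
    ("watchlist", "confirmed"),  # promote
    ("watchlist", "excluded"),   # exclude as false positive
    ("excluded", "watchlist"),   # explicit un-exclude
}


def transition(
    entry: dict[str, Any],
    target: str,
    audit: list[dict[str, Any]],
    reason: Optional[str] = None,
) -> dict[str, Any]:
    """Move an entry to a new status and record an audit event."""
    current = entry.get("entryStatus", "confirmed")
    if (current, target) not in ALLOWED_TRANSITIONS:
        raise ValueError(f"cannot move entry from {current!r} to {target!r}")
    if target == "excluded":
        if not reason:
            raise ValueError("exclusion requires a reason")
        entry["exclusionReason"] = reason

    # Only the status changes; source decision, evidence IDs, and
    # sample fingerprints stay on the payload.
    entry["entryStatus"] = target
    audit.append({"action": f"watchlist.{target}", "entryId": entry["id"], "from": current})
    return entry
```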
---
### U5. Operator UI Controls And Evidence Grouping
**Goal:** Make evidence status, watchlist matches, and candidate actions clear in the operator console.
**Requirements:** R1, R3, R6, R8, R9
**Dependencies:** U1, U3, U4
**Files:**
- Modify: `web/operator-gui/index.html`
- Modify: `web/operator-gui/app.js`
- Modify: `web/operator-gui/styles.css`
- Test: `tests/operator_gui/test_static_workbench.py`
**Approach:**
- Add evidence-row controls for 판단에 사용 (used for judgment), 무관 (irrelevant), 오탐 (false positive), and 보류 (pending).
- Hide or de-emphasize irrelevant and false-positive evidence by default while preserving a details view.
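- Add a dedicated 주의 후보 근거 (watchlist candidate evidence) group for watchlist matches.
- Add watchlist status chips in the knowledge DB list: 주의 후보 (watchlist candidate), 확정 기준 (confirmed reference), 오탐 제외 (excluded as false positive).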
- Add promote/exclude actions for watchlist rows.
- Keep controls dense and consistent with the existing operator dashboard; avoid introducing a separate landing or wizard.
**Execution note:** Follow frontend design checks after implementation: load the page served on local port 9500 with Playwright and check for console errors and obvious layout breakage.
**Patterns to follow:**
- Existing evidence group rendering, details overflow, candidate cards, and knowledge rows in `web/operator-gui/app.js`.
- Existing compact panel and row styles in `web/operator-gui/styles.css`.
**Test scenarios:**
- Static contract: UI exposes evidence status action handlers and API paths.
- Static contract: watchlist group label and knowledge status chips are present.
- Static contract: irrelevant/false-positive evidence handling is represented in rendering functions.
- Browser check: page loads on desktop viewport without console errors after server restart.
**Verification:**
- Operators can mark evidence status, see watchlist evidence separately, and manage watchlist entries without confusing them with confirmed DB entries.
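
A static contract check of the kind listed above needs no browser: it only asserts that the expected labels appear in the UI source. The sketch below assumes a hypothetical `missing_markers` helper; the three chip labels come from this plan. A real test would pass in the text of `web/operator-gui/app.js`, read with `pathlib.Path.read_text(encoding="utf-8")`.

```python
# Labels the knowledge DB list is expected to expose as status chips.
REQUIRED_MARKERS = ("주의 후보", "확정 기준", "오탐 제외")


def missing_markers(source: str, markers: tuple[str, ...] = REQUIRED_MARKERS) -> list[str]:
    """Return the markers that do not appear anywhere in the UI source text."""
    return [marker for marker in markers if marker not in source]
```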
---
### U6. Documentation And Regression Verification
**Goal:** Update operations guidance and verify the feature end to end.
**Requirements:** R1-R9
**Dependencies:** U1-U5
**Files:**
- Modify: `docs/operations/copyrighter-operation-worklist.md`
- Test: `tests/rights_filter/server/test_sqlite_store.py`
- Test: `tests/rights_filter/server/test_http_app.py`
- Test: `tests/operator_gui/test_static_workbench.py`
**Approach:**
- Document the operator flow: mark evidence, decide case, watchlist creation, promotion, exclusion.
- State that watchlist matching is strong but not automatic case disposition.
- Run full test suite.
- Restart the 9500 server and verify `/health`, provider state, and browser load.
**Execution note:** Preserve the active `.env` and existing local data. Do not reset DB unless the user explicitly asks.
**Patterns to follow:**
- Existing operations doc format and local server verification pattern.
**Test scenarios:**
- Integration: full `pytest` passes.
- Browser: 9500 page loads without console errors.
- Operational: `/health` returns ok after restart.
**Verification:**
- Feature is documented, tests pass, and the local server is running with the updated code.
---
## System-Wide Impact
- **Interaction graph:** Case decisions now trigger watchlist updates; evidence status affects scoring contribution; internal analysis reads active confirmed/watchlist entries.
- **Error propagation:** Invalid evidence status, missing evidence, missing candidate, or invalid promotion/exclusion should return clear API errors without corrupting stored payloads.
- **State lifecycle risks:** Repeated held/rejected decisions must be idempotent per submission. Promotion and exclusion must not lose source decision provenance.
- **API surface parity:** Bootstrap, review, knowledge list, and evidence rows all need the new fields so the static UI stays in sync with server state.
- **Integration coverage:** Store tests must cover decision-to-watchlist-to-analysis; UI static tests must cover controls and grouping.
- **Unchanged invariants:** No automatic final case disposition, no applicant exposure, no biometric face storage, no scraping.
---
## Risks & Dependencies
| Risk | Mitigation |
|------|------------|
| Watchlist candidates over-amplify false positives | Keep watchlist visually distinct, add exclusion flow, and do not apply domain-wide suppression. |
| Rejected-entry behavior changes existing expectations | Update tests to make watchlist the automatic intermediate state and promotion the explicit confirmed state. |
| JSON payload fields drift across old records | Use default values when fields are absent and normalize in rendering/scoring paths. |
| UI becomes crowded | Use compact segmented evidence actions and keep weak/irrelevant evidence collapsed. |
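
The field-drift mitigation can be sketched as a read-time normalizer that fills in defaults for records written before this feature. The default values, especially treating a missing `entryStatus` as confirmed, and the `normalize_entry` name are assumptions; the field names follow the Key Technical Decisions section.

```python
import copy
from typing import Any

# Assumed defaults for payloads stored before the new fields existed.
ENTRY_DEFAULTS: dict[str, Any] = {
    "entryStatus": "confirmed",
    "originDecisionStatus": None,
    "sourceSubmissionId": None,
    "sourceEvidenceIds": [],
    "contributionCount": 0,
}


def normalize_entry(payload: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the payload with every new field present."""
    normalized = dict(payload)
    for key, default in ENTRY_DEFAULTS.items():
        if key not in normalized:
            # Deep-copy so list defaults are never shared between entries.
            normalized[key] = copy.deepcopy(default)
    return normalized
```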
---
## Documentation / Operational Notes
- Update the operations doc with the decision-first flow and the difference between 주의 후보 (watchlist candidate) entries and 확정 기준 (confirmed reference) DB entries.
- Keep the current `.env` behavior unchanged.
- Restart the 9500 server after implementation so the operator console uses the updated route handlers and JS.
---
## Sources & References
- **Origin document:** [docs/brainstorms/2026-05-26-evidence-quality-watchlist-requirements.md](docs/brainstorms/2026-05-26-evidence-quality-watchlist-requirements.md)
- Related plan: [docs/plans/2026-05-25-002-feat-image-rights-review-enrichment-plan.md](docs/plans/2026-05-25-002-feat-image-rights-review-enrichment-plan.md)
- Related code: `src/rights_filter/server/sqlite_store.py`
- Related code: `src/rights_filter/analysis/internal_analyzer.py`
- Related code: `src/rights_filter/analysis/risk_scoring.py`
- Related UI: `web/operator-gui/app.js`