POSA_Copyrighter/docs/plans/2026-05-26-003-feat-evidence-quality-watchlist-plan.md

---
title: "feat: Add Evidence Quality And Watchlist Growth"
type: feat
status: implemented
date: 2026-05-26
origin: docs/brainstorms/2026-05-26-evidence-quality-watchlist-requirements.md
---

# feat: Add Evidence Quality And Watchlist Growth

## Summary

Add a decision-first feedback loop around evidence status, watchlist candidate generation, strong watchlist matching, and candidate management. The implementation will keep the existing SQLite JSON-payload pattern and extend the current operator console instead of introducing a new persistence layer.

---

## Problem Frame

The current console can collect many internal, Google, Naver, and face-area web evidence items, but it does not yet let operators mark which evidence actually informed a case decision. It also turns every rejected case directly into a rejected-reference knowledge entry with no intermediate review step, and it creates no watchlist signal at all from held cases.

---

## Requirements

- R1. Evidence items support operator status: used for judgment, irrelevant, false positive, and pending. (Origin R1-R3, F1, AE1-AE2)
- R2. Evidence status never creates a DB candidate by itself; candidate creation happens only after case decision. (Origin R2, R4, AE2)
- R3. Held and rejected case decisions automatically create persistent watchlist candidates. (Origin R5-R6, F2, AE1)
- R4. Approved decisions do not create automatic candidates. (Origin R7, AE3)
- R5. Watchlist candidates strongly affect future risk scoring, at roughly the same strength as confirmed DB image matches. (Origin R8, F3, AE4)
- R6. Watchlist candidate matches are visually distinct from confirmed DB matches. (Origin R9, AE4)
- R7. Watchlist signals never change case status automatically. (Origin R10)
- R8. Operators can promote watchlist candidates to confirmed DB entries or exclude them as false positives. (Origin R11-R13, F4, AE5-AE6)
- R9. Confirmed DB, watchlist candidates, and excluded candidates retain status, source decision, source evidence, and contribution counts. (Origin R14-R17)

**Origin actors:** A1 operator, A2 rights risk filter, A3 DB administrator

**Origin flows:** F1 evidence status marking, F2 decision-driven watchlist creation, F3 watchlist-based rediscovery, F4 candidate promotion and exclusion

**Origin acceptance examples:** AE1, AE2, AE3, AE4, AE5, AE6

---

## Scope Boundaries

- No automatic approval, hold, or rejection.
- No face embeddings, face similarity database, biometric template storage, or identity recognition.
- No Google Image Search, Google Lens, Naver web UI automation, or scraping.
- No applicant-facing exposure of evidence statuses, watchlist candidates, scoring rules, or internal reasons.
- No new relational migration framework; this iteration keeps the existing JSON payload tables.

### Deferred to Follow-Up Work

- Bulk watchlist cleanup, analytics dashboards, and advanced merge suggestions can follow after the core loop is proven.
- Domain-wide false-positive suppression is deferred because it can hide valid evidence from large sites.

---

## Context & Research

### Relevant Code and Patterns

- `src/rights_filter/server/sqlite_store.py` is the persistence and orchestration boundary. It stores JSON payloads in `submissions`, `evidence`, `knowledge_entries`, `collection_candidates`, `corrections`, and `audit_events`.
- `CopyrighterStore.record_decision` currently updates case decision and creates a rejected-reference knowledge entry only for rejected cases.
- `CopyrighterStore._knowledge_repository` rebuilds an in-memory repository from active `knowledge_entries` and feeds `InternalAnalyzer`.
- `src/rights_filter/analysis/internal_analyzer.py` emits fingerprint evidence for knowledge-base image similarity.
- `src/rights_filter/analysis/risk_scoring.py` already gives high weight to strong fingerprint matches and ignores non-contributing/queued evidence.
- `web/operator-gui/app.js`, `index.html`, and `styles.css` implement the current static console, evidence grouping, decision actions, candidate collection, and knowledge DB management.
- `tests/rights_filter/server/test_sqlite_store.py` is the main integration test surface for persistence behavior.
- `tests/operator_gui/test_static_workbench.py` protects the UI contract without browser runtime dependencies.

### Institutional Learnings

- No `docs/solutions/` directory exists in this workspace.
- No `STRATEGY.md` exists; the active product strategy is captured in the brainstorm requirements documents.

### External References

- No new external APIs are introduced. Existing Google/Naver/Ollama boundaries and no-scraping policy remain unchanged.

---

## Key Technical Decisions

- Use `knowledge_entries` for persistent watchlist state: watchlist candidates are persistent risk references, not transient keyword collection results, so they should not live in `collection_candidates`, which is cleared on each keyword search.
- Add status fields instead of new tables: JSON payload storage lets us add `entryStatus`, `originDecisionStatus`, `sourceSubmissionId`, `sourceEvidenceIds`, and `contributionCount` without schema migration complexity.
- Generate candidates from the local submission image when available: the decision API passes the local image store into `record_decision`, which stores a perceptual sample fingerprint for held/rejected watchlist entries. If the image store is unavailable, the candidate is still recorded but cannot participate in image similarity until a sample fingerprint is added.
- Strong watchlist scoring: watchlist similarity should use the same high-risk path as rejected-image similarity, but with separate reason text and UI group so operators can see it is not confirmed DB evidence.
- False-positive suppression scope: start with exact evidence identity, URL/image URL/title, and candidate fingerprint. Do not suppress an entire provider domain from one false-positive action.
- Decision-driven default evidence set: use evidence marked `used_for_judgment` when available; if none are marked, generate the watchlist candidate from the case fingerprint and top contributing evidence so held/rejected decisions still strengthen future detection.
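
The evidence-set decision above can be sketched as a small selection function. This is a minimal illustration, not the store's actual code: the `operatorStatus` and `points` payload keys, the `irrelevant` and `false_positive` status strings, and the `top_n` cutoff are assumptions made for the example; only `used_for_judgment` comes from this plan.

```python
from typing import Any

# Hypothetical status values that take evidence out of consideration.
EXCLUDED_STATUSES = {"irrelevant", "false_positive"}


def select_source_evidence(evidence: list[dict[str, Any]], top_n: int = 3) -> list[str]:
    """Pick the evidence IDs to record on a watchlist candidate."""
    used = [e["id"] for e in evidence if e.get("operatorStatus") == "used_for_judgment"]
    if used:
        return used
    # Fallback: highest-scoring evidence the operator has not dismissed.
    remaining = [e for e in evidence if e.get("operatorStatus") not in EXCLUDED_STATUSES]
    remaining.sort(key=lambda e: e.get("points", 0), reverse=True)
    return [e["id"] for e in remaining[:top_n]]
```

Operator-marked evidence always wins over the score-based fallback, so a single explicit mark is enough to override the default set.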

---

## Open Questions

### Resolved During Planning

- Watchlist score strength: use the same high-confidence fingerprint match behavior as confirmed/rejected DB references, with separate UI labeling.
- UI distinction: add a dedicated watchlist evidence group (UI label 주의 후보, "caution candidate") with its own badges rather than mixing it into confirmed internal DB evidence.
- False-positive propagation: suppress exact evidence/candidate patterns first, not whole domains.

### Deferred to Implementation

- Exact Korean microcopy can be adjusted while fitting existing console labels.
- Exact CSS treatment should follow the existing evidence group and chip styles after visual verification.

---

## High-Level Technical Design

> *This illustrates the intended approach and is directional guidance for review, not implementation specification.*

```mermaid
flowchart TB
  Evidence[Collected evidence] --> Mark[Operator marks evidence status]
  Mark --> Decision[Operator decides approve / hold / reject]
  Decision -->|approved| NoCandidate[No automatic candidate]
  Decision -->|held or rejected| Watchlist[Create watchlist candidate]
  Watchlist --> Analyze[Future internal analysis]
  Confirmed[Confirmed DB entries] --> Analyze
  Analyze --> Score[Risk scoring]
  Score --> UI[Separate confirmed vs watchlist evidence groups]
  Watchlist --> Promote[Promote to confirmed DB]
  Watchlist --> Exclude[Exclude as false positive]
  Exclude --> Score
```

---

## Implementation Units

```mermaid
flowchart TB
  U1[U1 Evidence status API and payload]
  U2[U2 Decision-driven watchlist creation]
  U3[U3 Watchlist matching and scoring]
  U4[U4 Candidate promotion and exclusion]
  U5[U5 Operator UI controls]
  U6[U6 Docs and verification]

  U1 --> U2
  U2 --> U3
  U3 --> U4
  U1 --> U5
  U3 --> U5
  U4 --> U5
  U5 --> U6
```

### U1. Evidence Status API And Payload

**Goal:** Let operators mark evidence as used for judgment, irrelevant, false positive, or pending without changing case decision or DB state.

**Requirements:** R1, R2

**Dependencies:** None

**Files:**
- Modify: `src/rights_filter/server/sqlite_store.py`
- Modify: `src/rights_filter/server/http_app.py`
- Modify: `web/operator-gui/app.js`
- Modify: `web/operator-gui/index.html`
- Modify: `web/operator-gui/styles.css`
- Test: `tests/rights_filter/server/test_sqlite_store.py`
- Test: `tests/rights_filter/server/test_http_app.py`
- Test: `tests/operator_gui/test_static_workbench.py`

**Approach:**
- Add a store method that updates an existing evidence payload with an operator evidence status and optional note.
- Add an HTTP route for evidence status updates.
- Keep evidence status inside each evidence payload so existing bootstrap/review responses include it automatically.
- Treat false-positive and irrelevant evidence as non-contributing during rescore.
- Keep pending evidence visible but non-final.

**Execution note:** Test-first. Start with store-level tests proving status changes do not create candidates and do affect rescore contribution.

**Patterns to follow:**
- Existing `record_decision`, `_put`, `_evidence_by_submission`, and HTTP body parsing patterns in `src/rights_filter/server/sqlite_store.py` and `src/rights_filter/server/http_app.py`.

**Test scenarios:**
- Happy path: marking a Google evidence item as used for judgment persists in `review()` and `bootstrap()`.
- Happy path: marking evidence as irrelevant sets it non-contributing and rescore omits its points.
- Edge case: marking a missing evidence ID returns a not-found error.
- Edge case: unsupported evidence status returns a validation error.
- Integration: HTTP evidence status route updates the review payload.

**Verification:**
- Evidence status is visible in the API payload and does not create any knowledge entry or watchlist candidate by itself.

---

### U2. Decision-Driven Watchlist Creation

**Goal:** Create persistent watchlist candidates automatically after held or rejected decisions, using case fingerprint evidence and judgment-used evidence.

**Requirements:** R2, R3, R4, R9

**Dependencies:** U1

**Files:**
- Modify: `src/rights_filter/server/sqlite_store.py`
- Test: `tests/rights_filter/server/test_sqlite_store.py`

**Approach:**
- Extend `record_decision` so `held` and `rejected` decisions create or update a watchlist entry.
- Stop treating rejected decisions as immediately confirmed DB entries; rejected decisions should create watchlist entries first, then operators can promote them.
- Populate watchlist payloads with source submission, origin decision status, source evidence IDs, sample fingerprints, memo, active/excluded state, and contribution count.
- Use the case's generated fingerprint evidence as the primary sample fingerprint source.
- Prefer evidence marked used for judgment; if none is marked, fall back to the top contributing evidence plus the case fingerprint so that held and rejected decisions still strengthen future detection.
- Ensure repeated decisions update the existing source-submission watchlist entry instead of creating duplicates.
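
The idempotency requirement above can be illustrated with a deterministic entry ID per submission. This is a sketch under stated assumptions: the `watchlist:<submission>` ID scheme, the `sampleFingerprints` key, and the `"watchlist"` status value are hypothetical; the other field names follow the status fields listed under Key Technical Decisions.

```python
from typing import Any, Optional

WATCHLIST_DECISIONS = {"held", "rejected"}


def upsert_watchlist_entry(
    knowledge_entries: dict[str, dict[str, Any]],
    submission_id: str,
    decision: str,
    fingerprint: Optional[str],
    evidence_ids: list[str],
) -> Optional[dict[str, Any]]:
    """Create or update the single watchlist entry tied to one submission."""
    if decision not in WATCHLIST_DECISIONS:
        return None  # approved decisions never create a candidate
    entry_id = f"watchlist:{submission_id}"  # hypothetical deterministic key
    existing = knowledge_entries.get(entry_id, {})
    entry = {
        "id": entry_id,
        "entryStatus": existing.get("entryStatus", "watchlist"),
        "originDecisionStatus": decision,
        "sourceSubmissionId": submission_id,
        "sourceEvidenceIds": list(evidence_ids),
        # Without a new fingerprint, keep whatever samples were stored before.
        "sampleFingerprints": [fingerprint] if fingerprint else existing.get("sampleFingerprints", []),
        "contributionCount": existing.get("contributionCount", 0),
    }
    knowledge_entries[entry_id] = entry
    return entry
```

Keying on the submission rather than generating a fresh ID is what makes a repeated held or rejected decision an update instead of a duplicate. Whether a repeated decision should preserve an already-promoted status, as this sketch does, is a design choice the plan leaves open.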

**Execution note:** Test-first around decision outcomes before changing the existing rejected-entry behavior.

**Patterns to follow:**
- Existing automatic rejected-reference creation in `record_decision`.
- Existing knowledge-entry payload shape from `register_manual_knowledge_entry` and candidate promotion methods.

**Test scenarios:**
- Happy path: held decision creates one active watchlist entry with source submission and fingerprint.
- Happy path: rejected decision creates one active watchlist entry with source evidence IDs.
- Happy path: approved decision creates no watchlist entry.
- Edge case: repeating held/rejected decision for the same submission updates one candidate, not duplicates.
- Edge case: when no evidence is marked used for judgment, the decision still creates a watchlist entry from the available fingerprint evidence, even if that entry is incomplete.

**Verification:**
- Held and rejected decisions create persistent watchlist entries, approval does not, and candidate provenance is visible in `knowledgeEntries`.

---

### U3. Watchlist Matching And Scoring

**Goal:** Make watchlist candidates strongly affect future risk while remaining distinguishable from confirmed DB entries.

**Requirements:** R5, R6, R7, R9

**Dependencies:** U2

**Files:**
- Modify: `src/rights_filter/domain/records.py`
- Modify: `src/rights_filter/analysis/internal_analyzer.py`
- Modify: `src/rights_filter/analysis/risk_scoring.py`
- Modify: `src/rights_filter/server/sqlite_store.py`
- Test: `tests/rights_filter/analysis/test_internal_analyzer.py`
- Test: `tests/rights_filter/analysis/test_risk_scoring.py`
- Test: `tests/rights_filter/server/test_sqlite_store.py`

**Approach:**
- Carry knowledge entry status into internal fingerprint evidence so matches can be labeled as watchlist or confirmed.
- Keep watchlist entries active for matching unless excluded.
- Score watchlist image similarity at the same high-risk level as confirmed rejected-image similarity when similarity is high.
- Use distinct evidence reason/data for watchlist matches so UI grouping can separate them.
- Increment contribution count when a watchlist entry contributes to a rescore or analysis result.
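
A minimal sketch of the scoring approach above follows. The threshold, the point weight, the group names, and the `FingerprintMatch` type are all hypothetical values chosen for illustration; the actual `RiskScorer` weights are not stated in this plan.

```python
from dataclasses import dataclass
from typing import Optional

HIGH_SIMILARITY = 0.9   # hypothetical similarity threshold
HIGH_RISK_POINTS = 60   # hypothetical weight shared by confirmed and watchlist matches


@dataclass(frozen=True)
class FingerprintMatch:
    entry_id: str
    entry_status: str  # "confirmed", "watchlist", or "excluded"
    similarity: float


def score_match(match: FingerprintMatch) -> tuple[int, Optional[str]]:
    """Return (points, evidence group) for one knowledge-entry match."""
    if match.entry_status == "excluded" or match.similarity < HIGH_SIMILARITY:
        return 0, None
    group = "watchlist_match" if match.entry_status == "watchlist" else "confirmed_match"
    return HIGH_RISK_POINTS, group


def apply_matches(matches: list[FingerprintMatch], contribution_counts: dict[str, int]):
    """Sum points, group entry IDs for the UI, and count contributing watchlist entries."""
    total, groups = 0, {}
    for match in matches:
        points, group = score_match(match)
        if points == 0:
            continue
        total += points
        groups.setdefault(group, []).append(match.entry_id)
        if match.entry_status == "watchlist":
            contribution_counts[match.entry_id] = contribution_counts.get(match.entry_id, 0) + 1
    return total, groups
```

Nothing in this path reads or writes a case decision, which is the point of R7: the score rises, and the operator still decides.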

**Execution note:** Test scoring and reason text before wiring UI labels.

**Patterns to follow:**
- `InternalAnalyzer` knowledge-base similarity loop.
- `RiskScorer` fingerprint evidence handling and non-contributing evidence checks.

**Test scenarios:**
- Happy path: image similar to a watchlist entry emits watchlist similarity evidence.
- Happy path: watchlist similarity at or above threshold produces high-risk score.
- Happy path: matched watchlist evidence does not change `decisionStatus`.
- Edge case: excluded watchlist entry is not included in repository matching.
- Integration: contribution count increases only when watchlist evidence contributes to the case score.

**Verification:**
- Watchlist matches raise risk strongly while remaining labeled as watchlist-derived evidence.

---

### U4. Candidate Promotion And False-Positive Exclusion

**Goal:** Let operators promote watchlist candidates to confirmed DB entries or exclude them so future matching is suppressed.

**Requirements:** R8, R9

**Dependencies:** U2, U3

**Files:**
- Modify: `src/rights_filter/server/sqlite_store.py`
- Modify: `src/rights_filter/server/http_app.py`
- Test: `tests/rights_filter/server/test_sqlite_store.py`
- Test: `tests/rights_filter/server/test_http_app.py`

**Approach:**
- Add store methods and HTTP routes for promoting a watchlist entry and excluding a watchlist entry.
- Promotion changes the entry status to confirmed while preserving source decision and evidence history.
- Exclusion changes the entry status to excluded, disables matching, and stores an exclusion reason.
- Apply false-positive evidence status to exact evidence/candidate patterns, image fingerprint, URL/image URL, and title where available.
- Add audit events for promotion and exclusion.
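
The promotion and exclusion rules above amount to a small state machine. The sketch below assumes the status values `"watchlist"`, `"confirmed"`, and `"excluded"`, an `exclusionReason` key, and a list standing in for `audit_events`; these names are illustrative, not the store's confirmed schema.

```python
from datetime import datetime, timezone
from typing import Any


class InvalidTransition(ValueError):
    """Raised when an entry is not in a state that allows the requested action."""


def _audit(audit_events: list[dict[str, Any]], action: str, entry_id: str) -> None:
    audit_events.append(
        {"action": action, "entryId": entry_id, "at": datetime.now(timezone.utc).isoformat()}
    )


def promote_entry(entry: dict[str, Any], audit_events: list[dict[str, Any]]) -> dict[str, Any]:
    """Watchlist -> confirmed, keeping every provenance field untouched."""
    if entry.get("entryStatus") != "watchlist":
        raise InvalidTransition(f"cannot promote entry in state {entry.get('entryStatus')!r}")
    _audit(audit_events, "promote", entry["id"])
    return {**entry, "entryStatus": "confirmed"}


def exclude_entry(entry: dict[str, Any], reason: str, audit_events: list[dict[str, Any]]) -> dict[str, Any]:
    """Watchlist -> excluded, recording why so the exclusion can be reviewed later."""
    if entry.get("entryStatus") != "watchlist":
        raise InvalidTransition(f"cannot exclude entry in state {entry.get('entryStatus')!r}")
    _audit(audit_events, "exclude", entry["id"])
    return {**entry, "entryStatus": "excluded", "exclusionReason": reason}
```

Both functions return a copy with only the status fields changed, so source decision and source evidence survive every transition. Promoting an excluded entry fails here, matching the validation-error option in the test scenarios.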

**Execution note:** Characterize existing manual/collection promotion behavior first, then add watchlist-specific paths.

**Patterns to follow:**
- Existing `promote_collection_candidate`, `promote_collection_candidates`, and knowledge entry active/deactivation patterns.

**Test scenarios:**
- Happy path: promoting a watchlist entry makes it confirmed and keeps sample fingerprints.
- Happy path: excluding a watchlist entry prevents future similarity evidence from that entry.
- Edge case: promoting an excluded entry either requires an explicit un-exclude step first or returns a validation error.
- Edge case: missing candidate ID returns not found.
- Integration: audit log records promote/exclude actions.

**Verification:**
- Operators can move candidates between watchlist, confirmed, and excluded states without losing provenance.

---

### U5. Operator UI Controls And Evidence Grouping

**Goal:** Make evidence status, watchlist matches, and candidate actions clear in the operator console.

**Requirements:** R1, R3, R6, R8, R9

**Dependencies:** U1, U3, U4

**Files:**
- Modify: `web/operator-gui/index.html`
- Modify: `web/operator-gui/app.js`
- Modify: `web/operator-gui/styles.css`
- Test: `tests/operator_gui/test_static_workbench.py`

**Approach:**
- Add evidence-row controls labeled 판단에 사용 (used for judgment), 무관 (irrelevant), 오탐 (false positive), and 보류 (pending).
- Hide or de-emphasize irrelevant and false-positive evidence by default while preserving a details view.
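- Add a dedicated 주의 후보 근거 (watchlist candidate evidence) group for watchlist matches.
- Add watchlist status chips in the knowledge DB list: 주의 후보 (watchlist candidate), 확정 기준 (confirmed reference), 오탐 제외 (excluded as false positive).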
- Add promote/exclude actions for watchlist rows.
- Keep controls dense and consistent with the existing operator dashboard; avoid introducing a separate landing or wizard.

**Execution note:** Follow frontend design checks after implementation: load the local console page on port 9500 with Playwright and check for console errors and obvious layout breakage.

**Patterns to follow:**
- Existing evidence group rendering, details overflow, candidate cards, and knowledge rows in `web/operator-gui/app.js`.
- Existing compact panel and row styles in `web/operator-gui/styles.css`.

**Test scenarios:**
- Static contract: UI exposes evidence status action handlers and API paths.
- Static contract: watchlist group label and knowledge status chips are present.
- Static contract: irrelevant/false-positive evidence handling is represented in rendering functions.
- Browser check: page loads on desktop viewport without console errors after server restart.
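
The static-contract scenarios above can be checked without a browser by searching the UI source text for required markers. The helper below is a sketch; the `setEvidenceStatus` handler name in the example is hypothetical, while the Korean labels are the ones this unit proposes.

```python
def missing_markers(source: str, markers: list[str]) -> list[str]:
    """Return the markers that do not appear anywhere in the UI source text."""
    return [marker for marker in markers if marker not in source]


# Example with an inline stand-in for the contents of web/operator-gui/app.js.
sample_js = """
function setEvidenceStatus(evidenceId, status) { /* hypothetical handler */ }
const GROUP_LABELS = { watchlist: "주의 후보 근거" };
"""
required = ["setEvidenceStatus", "주의 후보 근거", "오탐 제외"]
absent = missing_markers(sample_js, required)
```

In a real test the source string would come from reading the static files, and the assertion would be that the returned list is empty. Here `absent` holds the one chip label the sample lacks.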

**Verification:**
- Operators can mark evidence status, see watchlist evidence separately, and manage watchlist entries without confusing them with confirmed DB entries.

---

### U6. Documentation And Regression Verification

**Goal:** Update operations guidance and verify the feature end to end.

**Requirements:** R1-R9

**Dependencies:** U1-U5

**Files:**
- Modify: `docs/operations/copyrighter-operation-worklist.md`
- Test: `tests/rights_filter/server/test_sqlite_store.py`
- Test: `tests/rights_filter/server/test_http_app.py`
- Test: `tests/operator_gui/test_static_workbench.py`

**Approach:**
- Document the operator flow: mark evidence, decide case, watchlist creation, promotion, exclusion.
- State that a watchlist match raises risk strongly but never sets a case disposition automatically.
- Run full test suite.
- Restart the server on port 9500 and verify `/health`, provider state, and browser load.

**Execution note:** Preserve the active `.env` and existing local data. Do not reset DB unless the user explicitly asks.

**Patterns to follow:**
- Existing operations doc format and local server verification pattern.

**Test scenarios:**
- Integration: full `pytest` passes.
- Browser: the console page on port 9500 loads without console errors.
- Operational: `/health` returns ok after restart.

**Verification:**
- Feature is documented, tests pass, and the local server is running with the updated code.

---

## System-Wide Impact

- **Interaction graph:** Case decisions now trigger watchlist updates; evidence status affects scoring contribution; internal analysis reads active confirmed/watchlist entries.
- **Error propagation:** Invalid evidence status, missing evidence, missing candidate, or invalid promotion/exclusion should return clear API errors without corrupting stored payloads.
- **State lifecycle risks:** Repeated held/rejected decisions must be idempotent per submission. Promotion and exclusion must not lose source decision provenance.
- **API surface parity:** Bootstrap, review, knowledge list, and evidence rows all need the new fields so the static UI stays in sync with server state.
- **Integration coverage:** Store tests must cover decision-to-watchlist-to-analysis; UI static tests must cover controls and grouping.
- **Unchanged invariants:** No automatic final case disposition, no applicant exposure, no biometric face storage, no scraping.

---

## Risks & Dependencies

| Risk | Mitigation |
|------|------------|
| Watchlist candidates over-amplify false positives | Keep watchlist visually distinct, add exclusion flow, and do not apply domain-wide suppression. |
| Rejected-entry behavior changes existing expectations | Update tests to make watchlist the automatic intermediate state and promotion the explicit confirmed state. |
| JSON payload fields drift across old records | Use default values when fields are absent and normalize in rendering/scoring paths. |
| UI becomes crowded | Use compact segmented evidence actions and keep weak/irrelevant evidence collapsed. |
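
The mitigation for field drift in the table above can be sketched as a normalization step applied wherever a knowledge entry is read. Treating legacy entries as `"confirmed"` is an assumption of this example, based on the earlier behavior where rejected cases became reference entries directly; the default for `originDecisionStatus` and `sourceSubmissionId` is likewise illustrative.

```python
import copy
from typing import Any

# Defaults for fields that records written before this change will not have.
ENTRY_DEFAULTS: dict[str, Any] = {
    "entryStatus": "confirmed",  # assumed default for legacy entries
    "originDecisionStatus": None,
    "sourceSubmissionId": None,
    "sourceEvidenceIds": [],
    "contributionCount": 0,
}


def normalize_entry(payload: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the payload with every new field present."""
    normalized = dict(payload)
    for key, default in ENTRY_DEFAULTS.items():
        # deepcopy keeps the shared default list from being mutated by callers
        normalized.setdefault(key, copy.deepcopy(default))
    return normalized
```

Normalizing on read leaves stored payloads untouched, so no backfill is needed and old rows stay valid if the defaults change later.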

---

## Documentation / Operational Notes

- Update the operations doc with the decision-first flow and the difference between 주의 후보 (watchlist candidate) and 확정 기준 DB (confirmed reference DB).
- Keep the current `.env` behavior unchanged.
- Restart the server on port 9500 after implementation so the operator console uses the updated route handlers and JS.

---

## Sources & References

- **Origin document:** [docs/brainstorms/2026-05-26-evidence-quality-watchlist-requirements.md](docs/brainstorms/2026-05-26-evidence-quality-watchlist-requirements.md)
- Related plan: [docs/plans/2026-05-25-002-feat-image-rights-review-enrichment-plan.md](docs/plans/2026-05-25-002-feat-image-rights-review-enrichment-plan.md)
- Related code: `src/rights_filter/server/sqlite_store.py`
- Related code: `src/rights_filter/analysis/internal_analyzer.py`
- Related code: `src/rights_filter/analysis/risk_scoring.py`
- Related UI: `web/operator-gui/app.js`