cw2007/POSA_Copyrighter

유창욱 3f7b3a9cf2 chore: initial commit of copyrighter (rights_filter)

Image rights / copyright detection system: SQLite store, HTTP app,
search integrations (Naver, Google Custom Search, Google Cloud Vision
web detection), image analysis (fingerprints, face/person detection,
evidence enrichment, risk scoring), an admin/review layer, governance
and retention policies, batch jobs, and a browser-based operator GUI.

This baseline incorporates a full code-review remediation pass
(46 fixes; 358 tests passing). Highlights:

CRITICAL
- Prevent evidence cascade-delete during the schema-constraint
  migration by disabling FK enforcement around the table rebuild.

Security
- Sandbox served media (neutralize stored XSS from uploaded/collected
  SVGs) via CSP + nosniff on the untrusted media routes.
- Strip embedded EXIF/GPS from external image derivatives before they
  are sent to third-party APIs.
- Return a clean 404 (not an uncaught StopIteration) for PATCH on an
  unknown provider.

Correctness
- LLM-summary failures no longer add +30 to the risk score.
- Decode only explicit JS escapes so Korean image URLs are not mangled.
- Consume search quota only after a successful request.
- Naver/Google adapters map responses inside the failure boundary, so a
  malformed response degrades to evidence instead of crashing enrichment.
- Domain-aware provider attribution; face-box IoU de-duplication; count
  searches (not result items); per-box crop isolation; clamp evidence
  confidence and Google CSE num; real submittedEpoch; and more.

Robustness
- Offline LLM connect fast-fails (short connect timeout) so seed/reload
  requests are not stalled; full read timeout preserved for generation.
- Malformed numeric env vars fall back to defaults instead of crashing
  startup.

Performance
- Per-submission evidence reads (no full-table scan per rescore),
  audit-log LIMIT, lazy active-store lookup, hoisted timestamps.

Tests
- ~24 regression tests added pinning the above fixes.

Runtime data (data/, outputs/, *.sqlite3, *.log), secrets (.env), and
node_modules are gitignored.

2026-06-09 09:50:31 +09:00


Image Rights Operator GUI Design

This document defines the user-facing GUI/UX shape for the internal image rights review tool. It does not implement a frontend app yet; the current workspace has no frontend framework or design system.

Goal

Give operators one coherent internal workbench where they can review every submitted image, inspect rights-risk evidence, run or retry enrichment, manage the criteria database, correct bad decisions, and control external providers without exposing automated analysis to applicants.

Product Posture

This is an operational review console, not a marketing site. The interface should be dense, calm, audit-friendly, and built for repeated decisions under risk. The first screen should be useful immediately: a review queue with risk-ranked submissions and enough context to choose the next case.

Visual Thesis

Use a quiet, utilitarian dashboard style: neutral surfaces, high-contrast text, restrained accent colors for risk and action, compact spacing, stable table/list layouts, and persistent context panels. Avoid decorative hero sections, oversized cards, gradients, or illustrative marketing composition.

Core Navigation

Use a persistent left navigation rail with these primary areas:

Review Queue
Case Review
Evidence Search
Knowledge Base
Corrections
Provider Controls
Audit Log

Use a top command bar for global search, current provider mode, queue health, and operator identity. Provider status should be visible but not visually dominant.

Primary Screens

1. Review Queue

Purpose: let operators decide what to review next.

The queue should be a dense table or split list with:

Thumbnail
Submission ID
Risk score and band
Top two risk reasons
Provider state for each of internal, Naver, Google, and LLM, including failed, skipped, and pending
Applicant-visible status
Operator decision status
Age / submitted time
Last analysis time

Required controls:

Risk band filter: high, medium, low, failed, pending
Source filter: Naver hit, Google hit, fingerprint match, face/person, LLM summary, failed provider
Decision filter: unreviewed, held, rejected, approved, corrected
Sort by risk, newest, oldest, analysis failure, provider failure
Bulk selection only for non-decisive actions such as re-run analysis or assign reviewer; no bulk approve/reject in v1

UX rule: the queue should never require opening every item just to understand why it is high risk.
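
The filters, sorts, and the "top two risk reasons" column above can be sketched as a small view function. This is a minimal illustration, not the backend's actual API: `QueueItem`, `queue_view`, and `row_summary` are hypothetical names, and the field set is assumed from the column list in this section.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class QueueItem:
    """Hypothetical queue row; fields are assumed from the column list."""
    submission_id: str
    risk_score: Optional[int]      # None while analysis is pending or failed
    band: str                      # high, medium, low, failed, pending
    top_reasons: tuple[str, ...]
    decision: str                  # unreviewed, held, rejected, approved, corrected
    submitted_epoch: int


def queue_view(items: Iterable[QueueItem], bands=None, decisions=None, sort="risk"):
    """Apply the band and decision filters, then one of the documented sorts."""
    rows = [
        i for i in items
        if (bands is None or i.band in bands)
        and (decisions is None or i.decision in decisions)
    ]
    if sort == "risk":
        # Highest score first; unscored items sink to the bottom.
        rows.sort(key=lambda i: (-(i.risk_score if i.risk_score is not None else -1),
                                 i.submitted_epoch))
    elif sort == "newest":
        rows.sort(key=lambda i: -i.submitted_epoch)
    elif sort == "oldest":
        rows.sort(key=lambda i: i.submitted_epoch)
    else:
        raise ValueError(f"unsupported sort: {sort}")
    return rows


def row_summary(item: QueueItem) -> str:
    """One-line explanation of why an item ranks where it does."""
    reasons = "; ".join(item.top_reasons[:2]) or "no reasons recorded"
    score = "n/a" if item.risk_score is None else str(item.risk_score)
    return f"{item.submission_id} [{item.band} {score}] {reasons}"
```

Keeping the reason summary on the row itself is what lets the queue satisfy the UX rule without a per-item drill-down.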

2. Case Review

Purpose: let one operator make a final decision on one submitted image.

Use a three-pane layout on desktop:

Left pane: image viewer
Center pane: evidence and reasoning
Right pane: decision and case controls

Left image pane:

Original/internal review image preview
Zoom, fit, actual size, rotate
Side-by-side thumbnails for visually similar or matching search images
Basic file facts: dimensions, submitted time, analysis version
Clear indication when only an internal derivative is shown

Center evidence pane:

Risk score, band, and top reasons at the top
Evidence grouped by source:
- Internal fingerprints
- Prior rejection similarity
- Face/person presence
- Naver search results
- Google Web Detection results
- Internal LLM summary
- Failures and skipped providers
Each evidence row should show source, confidence or strength, query if relevant, URL/domain, retrieval time, and whether it contributed to the score
LLM summary must show citations or source chips; source-less claims appear as unverified notes and do not appear as score reasons
Failure states must be visible, not hidden in logs
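
The citation rule for LLM summaries can be illustrated with a short sketch. `LlmClaim`, `split_llm_claims`, and `score_reasons` are hypothetical names chosen for this example; the real evidence model may represent claims differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LlmClaim:
    """Hypothetical shape of one sentence in an LLM summary."""
    text: str
    source_urls: tuple[str, ...]   # empty when the claim has no citation


def split_llm_claims(claims):
    """Separate cited claims from source-less ones."""
    cited = [c for c in claims if c.source_urls]
    unverified = [c for c in claims if not c.source_urls]
    return cited, unverified


def score_reasons(claims):
    """Only cited claims may be surfaced as score reasons."""
    cited, _ = split_llm_claims(claims)
    return [c.text for c in cited]
```

Under this assumption the evidence pane would render the `unverified` list as notes, and nothing in it could reach the reasons shown beside the risk score.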

Right decision pane:

Current recommendation: low, medium, high, needs review, failed/partial
Manual action buttons: approve, hold, reject
Required memo on reject and correction
Optional memo on approve/hold
Rejection outcome preview: whether a rejected-image entry or candidate will be added to the knowledge base
Quick actions:
- Add entity to knowledge base
- Mark evidence irrelevant
- Re-run enrichment
- Disable stale automatic entry
- Open correction flow

UX rule: the final decision controls must be visually and functionally separate from the automated recommendation. The UI must make it clear that the system suggests and the operator decides.
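
The memo rules for the decision pane can be sketched as a form validator. The action names and `validate_decision` are assumptions made for illustration; the point is only that reject and correction block submission without a memo while approve and hold do not.

```python
MEMO_REQUIRED = frozenset({"reject", "correct"})
MEMO_OPTIONAL = frozenset({"approve", "hold"})


def validate_decision(action: str, memo: str = "") -> list[str]:
    """Return validation errors; an empty list means the form may be submitted."""
    if action not in MEMO_REQUIRED | MEMO_OPTIONAL:
        return [f"unknown action: {action}"]
    if action in MEMO_REQUIRED and not memo.strip():
        # Whitespace-only memos do not count as a memo.
        return [f"memo is required for {action}"]
    return []
```

The validator takes no recommendation argument on purpose: the automated suggestion is display-only and never feeds the operator's action.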

3. Evidence Search

Purpose: let operators inspect and reproduce why search evidence appeared.

This screen should show:

Search query history per submission
Naver result list with title, thumbnail, source page, image URL, rank, timestamp
Google Web Detection entities, matching images, pages, and labels
LLM-generated query candidates with execution status
Provider failures, quota skips, disabled-provider states

Required controls:

Manual text query run for Naver, subject to policy and quota
Re-run selected query
Mark result as relevant, irrelevant, duplicate, or unsafe
Create knowledge-base candidate from a result

UX rule: Naver is text-query search only. The UI must not offer an image upload reverse-search interaction for Naver.

4. Knowledge Base

Purpose: let operators build and maintain the criteria database that improves future filtering.

Main objects:

Celebrity / public figure
Group
Work
Character
Webtoon
Game
Rejected image reference
Other policy-relevant entity

Each entity detail should show:

Name
Aliases
Related keywords
Type
Policy memo
Exception conditions
Sample image fingerprints or rejected-image references
Provenance: manual, automatic rejection, search evidence
Active/inactive state
Created from decision or operator entry

Required controls:

Create manual entity
Add alias
Add related keyword
Add sample fingerprint from reviewed case
Deactivate entry
Reactivate entry only with memo
View affected future matches or historical matches

UX rule: automatic entries and manual entries must look different. Operators should not mistake a rejection-derived entry for a verified policy rule.
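
A minimal sketch of an entity record follows, assuming a hypothetical `KnowledgeEntry` class. It shows two rules from this section: reactivation requires a memo, and the label an operator sees depends on provenance.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeEntry:
    """Hypothetical entity record; field names are assumptions."""
    name: str
    entity_type: str
    provenance: str                       # manual, automatic rejection, search evidence
    active: bool = True
    aliases: list = field(default_factory=list)
    memos: list = field(default_factory=list)

    def deactivate(self) -> None:
        self.active = False

    def reactivate(self, memo: str) -> None:
        """Reactivation is only allowed with a non-empty memo."""
        if not memo.strip():
            raise ValueError("reactivation requires a memo")
        self.memos.append(memo.strip())
        self.active = True

    def badge(self) -> str:
        """Label that keeps automatic entries visually distinct from manual ones."""
        if self.provenance == "manual":
            return "manual"
        return f"automatic ({self.provenance}), not a verified policy rule"
```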

5. Corrections

Purpose: prevent false positives from contaminating future review.

This screen should focus on decision lineage:

Corrected decisions
Automatic knowledge entries derived from each decision
Current active/inactive state
Reason for correction
Operator who corrected
Timestamp

Required controls:

Correct prior rejection
Deactivate all derived automatic entries
Keep selected derived entries active with memo
Add correction note
Show before/after risk impact for future similar submissions when available

UX rule: correction should be a first-class workflow, not a hidden admin cleanup task.
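
The correction controls can be sketched as one function over the entries derived from a decision. `DerivedEntry` and `correct_rejection` are hypothetical names; the sketch assumes each automatic entry records the decision it came from.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DerivedEntry:
    """Hypothetical link between a knowledge entry and its source decision."""
    entry_id: str
    decision_id: str
    provenance: str
    active: bool = True
    keep_memo: Optional[str] = None


def correct_rejection(entries, decision_id, keep=None):
    """Deactivate automatic entries derived from a corrected rejection.

    `keep` maps entry IDs to the memo that justifies leaving them active.
    Returns the IDs that were deactivated.
    """
    keep = keep or {}
    # Validate every memo first so a bad request changes nothing.
    for entry_id, memo in keep.items():
        if not memo.strip():
            raise ValueError(f"keeping {entry_id} active requires a memo")
    deactivated = []
    for e in entries:
        if e.decision_id != decision_id or e.provenance != "automatic rejection":
            continue   # manual entries and other decisions are untouched
        if e.entry_id in keep:
            e.keep_memo = keep[e.entry_id].strip()
            continue
        e.active = False
        deactivated.append(e.entry_id)
    return deactivated
```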

6. Provider Controls

Purpose: let admins safely operate external and assisted analysis modes.

Provider cards or rows:

Internal analysis
Naver search
Google Web Detection
Internal LLM

Each provider should show:

Enabled/disabled state
Compliance approval state
Daily quota and usage
Last successful call
Last failure
Data boundary summary
Emergency disable control

Required controls:

Disable provider immediately
Set daily limit
View recent failures
Retry failed enrichments
Export provider usage audit

UX rule: provider controls are admin-only and should not be mixed into normal operator decision controls.
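
The state behind a provider row can be sketched as follows. `ProviderState` and `call_provider` are hypothetical; the sketch assumes quota is consumed only after a successful request, and that disabled or exhausted providers are skipped and reported rather than silently ignored.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProviderState:
    """Hypothetical per-provider control record."""
    name: str
    enabled: bool = True
    daily_limit: int = 0
    used_today: int = 0
    last_failure: Optional[str] = None


def call_provider(state: ProviderState, request: Callable[[], object]) -> str:
    """Run one provider request and return the state the UI should show."""
    if not state.enabled:
        return "skipped:disabled"
    if state.used_today >= state.daily_limit:
        return "skipped:quota"
    try:
        request()
    except Exception as exc:   # a failed call must surface in the UI, not crash
        state.last_failure = str(exc)
        return "failed"
    state.used_today += 1      # count usage only after success
    return "ok"
```

Setting `enabled = False` acts as the emergency disable in this sketch: the next call is skipped before any request leaves the system.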

7. Audit Log

Purpose: make decisions and evidence changes reviewable.

Audit events:

Analysis run created
Provider called/skipped/failed
LLM summary generated
Operator decision created
Rejection-derived entry created
Knowledge entry manually created
Knowledge entry deactivated/reactivated
Correction applied
Provider setting changed

Audit rows should include actor, timestamp, object, event type, before/after where relevant, and linked case.
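
One possible shape for an audit row, assuming a hypothetical `AuditEvent` record serialized as one JSON line per event. The field names mirror the sentence above; the event store itself is listed as missing later in this document, so this is illustrative only.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class AuditEvent:
    """Hypothetical audit row; not an existing backend type."""
    actor: str
    timestamp: str                 # ISO 8601 string supplied by the caller
    object_ref: str                # e.g. a provider, entry, or decision identifier
    event_type: str
    case_id: Optional[str] = None
    before: Optional[dict] = None
    after: Optional[dict] = None


def to_log_line(event: AuditEvent) -> str:
    """Serialize with stable key order so log lines diff cleanly."""
    return json.dumps(asdict(event), sort_keys=True, ensure_ascii=False)
```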

End-To-End Operator Flow

Operator opens Review Queue.
Operator filters to high risk or failed analysis.
Operator opens a case.
Case Review shows image, risk score, top reasons, grouped evidence, and provider state.
Operator opens relevant Naver/Google result links or thumbnails.
Operator reads LLM summary only as a source-linked digest.
Operator approves, holds, or rejects manually.
If rejecting, operator confirms memo and knowledge-base accumulation behavior.
If later wrong, operator uses Corrections to deactivate derived entries.

Information Architecture Principles

Queue optimizes prioritization.
Case Review optimizes judgment.
Evidence Search optimizes traceability.
Knowledge Base optimizes future detection quality.
Corrections optimizes decontamination.
Provider Controls optimize operational safety.
Audit Log optimizes accountability.

States And Empty States

Every major screen must handle:

No analysis yet
Analysis pending
Internal-only mode
External provider disabled
Provider quota reached
Provider failed
Search returned no result
LLM unavailable
LLM summary unverified
Evidence conflict
Existing corrected decision
Knowledge entry inactive

Empty states should be operational, not explanatory marketing copy. Example intent: "No Naver results for this query" with the query and timestamp, not a generic blank panel.

Interaction Design

Recommended controls:

Icon buttons for zoom, rotate, open link, copy URL, retry, disable, history
Segmented controls for risk filters and evidence source filters
Toggle switches for provider enablement
Checkbox selection for bulk queue operations
Menus for secondary actions such as mark irrelevant or create candidate
Textarea with required-state validation for rejection and correction memos
Tabs inside evidence pane only when vertical grouping becomes too long

Do not place cards inside cards. Use panels for major layout regions and compact rows for repeated evidence items.

Accessibility And Safety

All actions must be keyboard reachable.
Focus states must be visible.
Risk cannot be indicated by color alone; include labels such as high, medium, low, failed.
External links open with clear source/domain display.
Destructive or contamination-affecting actions require confirmation and memo.
Applicant-facing surfaces must not be able to render this GUI data.

Responsive Behavior

Primary target is desktop because image comparison and evidence review need space.

On tablet:

Queue remains usable.
Case Review becomes two-pane: image/evidence tabs plus decision panel.

On mobile:

Allow triage and status checks.
Avoid final reject/correction workflows unless the actual target product requires mobile operations.

Data Needed From Current Backend

The existing backend presenter and evidence model already provide most of the needed data:

Submission ID
Image reference
Score and band
Top reasons
Evidence grouped by source
Provider status
LLM summaries
Manual actions
Knowledge-base provenance
Correction/deactivation hooks

Missing for a full GUI integration:

Real user/auth roles
Persistent DB records
Real submission list source
Original image storage and signed URL policy
Actual frontend framework
Admin routing
Audit event store

Design Acceptance Criteria

An operator can identify the next high-risk case from the queue without opening every case.
An operator can make a final decision from one case screen without jumping between unrelated pages.
Naver, Google, internal, and LLM evidence are visually distinct.
LLM text is never presented as authoritative unless linked to source evidence.
Rejection-derived knowledge-base changes are visible before confirmation.
A wrong rejection can be corrected and its derived automatic entries deactivated.
Provider failures and disabled states are visible to operators and admins.
Applicant-facing views cannot access or display automated evidence.

Recommended First GUI MVP

Build these first:

Review Queue
Case Review
Knowledge-base entry creation from a case
Correction flow for rejected decisions
Provider Controls read/write for Naver, Google, and LLM enablement

Defer these:

Advanced analytics
Bulk approve/reject
Mobile full decision workflow
Dedicated brand/logo detector UI
Applicant-facing explanation or appeal flow

Implementation Handoff Notes

The current repo has no frontend application. The next plan should either:

add a small internal web admin app around the existing Python module, or
integrate this GUI into the actual production app once its framework and routes are available.

The second path is preferable if a production admin app already exists elsewhere, because auth, image storage, audit logging, and submission state should follow the real product's conventions.
