POSA_Copyrighter/docs/solutions/best-practices/no-demo-fallbacks-in-production-review-tools-2026-05-27.md
유창욱 3f7b3a9cf2 chore: initial commit of copyrighter (rights_filter)
Image rights / copyright detection system: SQLite store, HTTP app,
search integrations (Naver, Google Custom Search, Google Cloud Vision
web detection), image analysis (fingerprints, face/person detection,
evidence enrichment, risk scoring), an admin/review layer, governance
and retention policies, batch jobs, and a browser-based operator GUI.

This baseline incorporates a full code-review remediation pass
(46 fixes; 358 tests passing). Highlights:

CRITICAL
- Prevent evidence cascade-delete during the schema-constraint
  migration by disabling FK enforcement around the table rebuild.

Security
- Sandbox served media (neutralize stored XSS from uploaded/collected
  SVGs) via CSP + nosniff on the untrusted media routes.
- Strip embedded EXIF/GPS from external image derivatives before they
  are sent to third-party APIs.
- Return a clean 404 (not an uncaught StopIteration) for PATCH on an
  unknown provider.

Correctness
- LLM-summary failures no longer add +30 to the risk score.
- Decode only explicit JS escapes so Korean image URLs are not mangled.
- Consume search quota only after a successful request.
- Naver/Google adapters map responses inside the failure boundary, so a
  malformed response degrades to evidence instead of crashing enrichment.
- Domain-aware provider attribution; face-box IoU de-duplication; count
  searches (not result items); per-box crop isolation; clamp evidence
  confidence and Google CSE num; real submittedEpoch; and more.

Robustness
- Offline LLM connect fast-fails (short connect timeout) so seed/reload
  requests are not stalled; full read timeout preserved for generation.
- Malformed numeric env vars fall back to defaults instead of crashing
  startup.

Performance
- Per-submission evidence reads (no full-table scan per rescore),
  audit-log LIMIT, lazy active-store lookup, hoisted timestamps.

Tests
- ~24 regression tests added pinning the above fixes.

Runtime data (data/, outputs/, *.sqlite3, *.log), secrets (.env), and
node_modules are gitignored.
2026-06-09 09:50:31 +09:00

---
title: No Demo Fallbacks in Production Review Tools
date: 2026-05-27
category: docs/solutions/best-practices/
module: copyrighter operations hardening
problem_type: best_practice
component: development_workflow
severity: high
applies_when:
- "A review or moderation tool is used for operator decisions"
- "Local tests need fixtures that should not affect production behavior"
- "API failure, image decoding failure, or database drift could change operator judgment"
tags: [demo-data, operator-ui, image-analysis, sqlite, evidence-ids]
---
# No Demo Fallbacks in Production Review Tools
## Context
The Copyrighter operator console and analysis engine had several convenience paths that were useful while prototyping but unsafe for rights review work. The frontend could render hardcoded sample cases when the API failed, face detection could infer a face from marker text in image bytes, and pHash could turn non-image bytes into a fuzzy hash. Evidence IDs were built with Python's built-in `hash()`, which is salted per process for strings, so the same evidence could receive a different ID after a restart.
## Guidance
Keep demo behavior out of production runtime paths. If a test needs synthetic evidence, inject it through an explicit test double rather than hiding it in product code.
For UI startup, initialize operational collections as empty and render a clear API failure state:
```js
const submissions = [];
// Surface the failure to the operator instead of rendering sample cases.
function showApiError(message) {
  state.apiError = message;
  const target = document.getElementById("queue-health");
  if (target) target.textContent = message;
}
```
For analysis, failed image decoding should mean "no local signal", not a text-marker signal or fuzzy byte similarity:
```py
def detect(self, image: ImagePayload) -> FacePersonSignal:
    # OpenCV is the only source of signal; there is no marker-text fallback.
    return self._detect_with_opencv(image.content)
```
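The same rule can be applied to fingerprints. The following is a minimal sketch, not the project's actual code: `Fingerprint`, `make_fingerprint`, and `similarity` are hypothetical names, and the perceptual hash is assumed to be a 64-bit integer produced elsewhere (or `None` when decoding failed). Undecodable bytes can still confirm an exact duplicate through a plain digest, but they never produce a fuzzy score.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Fingerprint:
    sha256: str            # exact digest of the raw bytes, always available
    phash: Optional[int]   # perceptual hash; None when the image could not be decoded


def make_fingerprint(content: bytes, phash: Optional[int]) -> Fingerprint:
    # The caller passes phash=None when decoding failed (assumption of this sketch).
    return Fingerprint(hashlib.sha256(content).hexdigest(), phash)


def similarity(a: Fingerprint, b: Fingerprint, bits: int = 64) -> float:
    if a.sha256 == b.sha256:
        return 1.0  # byte-identical inputs are an exact match
    if a.phash is None or b.phash is None:
        return 0.0  # no decoded image on one side: no local signal
    distance = bin(a.phash ^ b.phash).count("1")  # Hamming distance between hashes
    return 1.0 - distance / bits
```

With this shape, two different undecodable payloads score `0.0`, so they cannot raise a risk score by accident.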
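When a fallback value is necessary, make it non-contributing unless it is an exact match: a digest of undecodable bytes may confirm that two files are identical, but it should never feed a fuzzy similarity score. For evidence identity, derive IDs from a stable digest: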
```py
return "ev-" + hashlib.sha256(base.encode("utf-8")).hexdigest()[:24]
```
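A fuller sketch of the same idea, assuming the identity is built from a submission ID, a provider name, and a source URL. Those three fields and the `evidence_id` helper are illustrative assumptions; the real composition of `base` is not shown here.

```python
import hashlib


def evidence_id(submission_id: str, provider: str, source_url: str) -> str:
    # Join the identity fields with a separator that is unlikely to occur in
    # the values, so ("ab", "c") and ("a", "bc") do not collide.
    base = "\x1f".join((submission_id, provider, source_url))
    # SHA-256 is not salted per process, so the ID is the same on every run.
    return "ev-" + hashlib.sha256(base.encode("utf-8")).hexdigest()[:24]
```

Truncating to 24 hex characters keeps 96 bits of the digest, which keeps IDs short while leaving accidental collisions very unlikely.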
For persistence, add enough typed columns and write-time validation that malformed payloads fail before they become silent JSON drift.
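One way this could look with the standard `sqlite3` module is sketched below. The table layout, column names, and the `insert_evidence` helper are assumptions for illustration, not the actual schema in `sqlite_store.py`.

```python
import json
import sqlite3

# Hypothetical schema: typed columns with CHECK constraints, plus a JSON payload.
SCHEMA = """
CREATE TABLE evidence (
    id            TEXT PRIMARY KEY CHECK (id LIKE 'ev-%'),
    submission_id TEXT NOT NULL,
    confidence    REAL NOT NULL CHECK (confidence BETWEEN 0.0 AND 1.0),
    payload_json  TEXT NOT NULL
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def insert_evidence(conn, evidence_id, submission_id, confidence, payload) -> None:
    # Write-time validation: reject malformed values before they reach the table.
    if not isinstance(payload, dict):
        raise TypeError("payload must be a dict")
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise TypeError("confidence must be a number")
    with conn:  # commits on success, rolls back if the INSERT raises
        conn.execute(
            "INSERT INTO evidence (id, submission_id, confidence, payload_json) "
            "VALUES (?, ?, ?, ?)",
            (evidence_id, submission_id, float(confidence),
             json.dumps(payload, sort_keys=True)),
        )
```

In this sketch an out-of-range confidence raises `sqlite3.IntegrityError` from the `CHECK` constraint, and a non-dict payload raises `TypeError` before any SQL runs, so neither can be stored silently.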
## Why This Matters
Operator tools are decision surfaces. If an API failure renders fake cases, or a failed analyzer creates plausible-looking evidence, the operator can make a real decision from non-real inputs. Stable evidence IDs also matter because dedupe, audit trails, and reanalysis history depend on the same evidence retaining the same identity across restarts.
## When to Apply
- Any console that reviewers use to approve, reject, or hold cases.
- Any analyzer fallback that can influence risk score, evidence grouping, or case status.
- Any local fixture or sample data path that could be bundled with the UI.
- Any persistence layer that stores flexible payloads but still feeds operational decisions.
## Examples
Before:
```py
if opencv_signal.present:
    return opencv_signal
return self._detect_marker_text(image.content)
```
After:
```py
return self._detect_with_opencv(image.content)
```
Before:
```js
const submissions = [{ id: "SUB-1007", riskBand: "high" }];
```
After:
```js
const submissions = [];
```
Tests should move the synthetic signal into the test boundary:
```py
class OneFaceDetector:
    def detect(self, image):
        return FacePersonSignal(face_count=1, person_count=1)
```
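A test double like this only helps if the code under test accepts its detector from outside. The sketch below assumes a hypothetical `Analyzer` class that takes the detector as a constructor argument; `FacePersonSignal` is redefined here only so the example is self-contained.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FacePersonSignal:
    face_count: int = 0
    person_count: int = 0


class OneFaceDetector:
    """Test double: always reports one face and one person."""

    def detect(self, image):
        return FacePersonSignal(face_count=1, person_count=1)


class Analyzer:
    """Hypothetical production class; it never constructs a demo detector itself."""

    def __init__(self, detector):
        self._detector = detector

    def analyze(self, image) -> dict:
        signal = self._detector.detect(image)
        return {"face_count": signal.face_count, "person_count": signal.person_count}


def test_analyzer_reports_injected_signal():
    # The synthetic signal lives in the test, not in product code.
    analyzer = Analyzer(detector=OneFaceDetector())
    assert analyzer.analyze(b"irrelevant bytes") == {"face_count": 1, "person_count": 1}
```

Production wiring would pass the real OpenCV-backed detector to the same constructor, so no synthetic path ships with the product.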
## Related
- The review fix touched `src/rights_filter/analysis/face_person_detection.py`, `src/rights_filter/analysis/fingerprints.py`, `src/rights_filter/server/sqlite_store.py`, and `web/operator-gui/app.js`.