---
title: No Demo Fallbacks in Production Review Tools
date: 2026-05-27
category: docs/solutions/best-practices/
module: copyrighter operations hardening
problem_type: best_practice
component: development_workflow
severity: high
applies_when:
  - "A review or moderation tool is used for operator decisions"
  - "Local tests need fixtures that should not affect production behavior"
  - "API failure, image decoding failure, or database drift could change operator judgment"
tags: [demo-data, operator-ui, image-analysis, sqlite, evidence-ids]
---

# No Demo Fallbacks in Production Review Tools

## Context
The Copyrighter operator console and analysis engine had several convenience paths that were useful while prototyping but unsafe for rights review work. The frontend could render hardcoded sample cases when the API failed. Face detection could infer a face from marker text in image bytes. pHash could turn non-image bytes into a fuzzy hash. Evidence IDs came from Python's built-in `hash()`, which is salted per process for strings, so the same evidence could receive a different ID after a restart.

## Guidance
Keep demo behavior out of production runtime paths. If a test needs synthetic evidence, inject it through an explicit test double rather than hiding it in product code.

For UI startup, initialize operational collections as empty and render a clear API failure state instead of sample data:

```js
const submissions = [];

function showApiError(message) {
  // `state` is the console's shared UI state object, assumed to be defined elsewhere in app.js.
  state.apiError = message;
  const target = document.getElementById("queue-health");
  if (target) target.textContent = message;
}
```

For analysis, a failed image decode should mean "no local signal". It should not produce a text-marker signal or a fuzzy byte-similarity score:

```py
def detect(self, image: ImagePayload) -> FacePersonSignal:
    # No marker-text fallback: if OpenCV cannot decode the image, there is no signal.
    return self._detect_with_opencv(image.content)
```
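The following is a minimal, standard-library-only sketch of that rule. Everything in it is hypothetical:

- `FacePersonSignal` is a simplified stand-in.
- The magic-byte check in `decode_image` stands in for a real decoder such as OpenCV.
- `find_people` and `perceptual` are injected callables, not the project's detectors.

It covers both cases described above. Face detection returns an empty signal on a failed decode. The fingerprint fallback contributes only when the bytes match exactly.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Tuple

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"
JPEG_MAGIC = b"\xff\xd8\xff"


@dataclass(frozen=True)
class FacePersonSignal:
    # Simplified stand-in for the project's signal type (assumption).
    face_count: int = 0
    person_count: int = 0
    decoded: bool = False  # False means "no local signal", not "zero faces found"


class UndecodableImage(ValueError):
    """Raised when the bytes cannot be decoded as an image."""


def decode_image(content: bytes) -> bytes:
    # Hypothetical stand-in decoder: accepts only PNG/JPEG magic bytes.
    # A real implementation would call an image library and raise on failure.
    if content.startswith((PNG_MAGIC, JPEG_MAGIC)):
        return content
    raise UndecodableImage("not a decodable image")


def detect(content: bytes, find_people: Callable[[bytes], Tuple[int, int]]) -> FacePersonSignal:
    try:
        pixels = decode_image(content)
    except UndecodableImage:
        # No scan of the raw bytes for marker text: a failed decode yields no signal.
        return FacePersonSignal()
    faces, persons = find_people(pixels)
    return FacePersonSignal(face_count=faces, person_count=persons, decoded=True)


def fingerprint_similarity(
    a: bytes, b: bytes, perceptual: Callable[[bytes, bytes], float]
) -> float:
    try:
        decode_image(a)
        decode_image(b)
    except UndecodableImage:
        # Fallback is non-contributing unless the bytes match exactly;
        # no fuzzy hash is computed over non-image bytes.
        return 1.0 if hashlib.sha256(a).digest() == hashlib.sha256(b).digest() else 0.0
    return perceptual(a, b)
```

Keeping `decoded=False` separate from a zero count lets downstream scoring distinguish two cases. One is "the analyzer could not look"; the other is "the analyzer looked and found nothing".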

When a fallback value is necessary, make it non-contributing unless it is an exact match. For evidence identity, derive IDs from a stable digest rather than the salted built-in `hash()`:

```py
# SHA-256 is unsalted, so the same `base` yields the same ID in every process.
return "ev-" + hashlib.sha256(base.encode("utf-8")).hexdigest()[:24]
```
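Below is a minimal sketch of a restart-stable ID. It assumes the identity is built from a submission ID, a provider name, and a source URL; this field set and `evidence_id` are hypothetical, and the real `base` may differ. Python's `hash()` on a `str` is randomized per interpreter run unless `PYTHONHASHSEED` is fixed. A SHA-256 digest of the same bytes never changes.

```python
import hashlib
import json


def evidence_id(submission_id: str, provider: str, source_url: str) -> str:
    """Derive a restart-stable evidence ID (hypothetical field set)."""
    # Canonical JSON (sorted keys, fixed separators) so the same fields
    # always serialize to the same bytes.
    base = json.dumps(
        {"submission": submission_id, "provider": provider, "url": source_url},
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    # SHA-256 is unsalted, so the digest is identical in every process.
    return "ev-" + hashlib.sha256(base.encode("utf-8")).hexdigest()[:24]
```

Serializing the fields as canonical JSON avoids collisions that plain concatenation can produce. For example, `("a|b", "c")` and `("a", "b|c")` both join to `a|b|c` when a separator character appears inside a field.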

For persistence, give the fields that drive operator decisions their own typed columns and validate payloads at write time. Malformed data is then rejected before it turns into silent drift inside stored JSON.
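A minimal sketch of that split, using the standard-library `sqlite3` module. The `evidence` table, its columns, the allowed `kind` values, and `insert_evidence` are hypothetical, not the project's schema.

```python
import json
import sqlite3

# Hypothetical schema: decision-driving fields get typed, constrained columns;
# only the remaining free-form detail lives in payload_json.
SCHEMA = """
CREATE TABLE IF NOT EXISTS evidence (
    id            TEXT PRIMARY KEY,
    submission_id TEXT NOT NULL,
    kind          TEXT NOT NULL
                  CHECK (kind IN ('search_match', 'fingerprint', 'face_person')),
    confidence    REAL NOT NULL CHECK (confidence BETWEEN 0.0 AND 1.0),
    payload_json  TEXT NOT NULL
)
"""


def insert_evidence(
    conn: sqlite3.Connection,
    evidence_id: str,
    submission_id: str,
    kind: str,
    confidence: float,
    payload: dict,
) -> None:
    """Validate before writing so malformed evidence fails loudly (caller owns the transaction)."""
    if not submission_id:
        raise ValueError("submission_id is required")
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError(f"confidence must be a number, got {confidence!r}")
    if not 0.0 <= confidence <= 1.0:  # also rejects NaN
        raise ValueError(f"confidence out of range: {confidence!r}")
    if not isinstance(payload, dict):
        raise ValueError("payload must be a JSON object")
    payload_json = json.dumps(payload, sort_keys=True)  # TypeError if not serializable
    conn.execute(
        "INSERT INTO evidence (id, submission_id, kind, confidence, payload_json)"
        " VALUES (?, ?, ?, ?, ?)",
        (evidence_id, submission_id, kind, float(confidence), payload_json),
    )
```

The Python checks produce a readable error that names the bad field. The `CHECK` constraints still reject bad rows written through any other code path, which surfaces as `sqlite3.IntegrityError`.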

## Why This Matters
Operator tools are decision surfaces. Suppose an API failure renders fake cases, or a failed analyzer produces plausible-looking evidence. The operator can then make a real decision based on inputs that do not exist. Stable evidence IDs also matter because deduplication, audit trails, and reanalysis history all depend on the same evidence keeping the same identity across restarts.

## When to Apply
- Any console that reviewers use to approve, reject, or hold cases.
- Any analyzer fallback that can influence risk score, evidence grouping, or case status.
- Any local fixture or sample data path that could be bundled with the UI.
- Any persistence layer that stores flexible payloads but still feeds operational decisions.
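For the bundled-fixture case in this list, a release check can fail the build when hardcoded sample case IDs leak into the shipped UI. This is a minimal sketch that assumes sample cases use literal IDs shaped like `SUB-1007`. The pattern, the scanned suffixes, and `find_demo_markers` are hypothetical. Point it at the shipped UI directory (for example `web/operator-gui`), not at test fixtures.

```python
import re
from pathlib import Path
from typing import Iterable, List, Tuple

# Hypothetical pattern: literal case IDs shaped like the demo "SUB-1007".
DEMO_MARKERS = (re.compile(r"\bSUB-\d{3,}\b"),)


def find_demo_markers(
    root: Path, suffixes: Iterable[str] = (".js", ".html", ".json")
) -> List[Tuple[str, int, str]]:
    """Return (relative path, line number, match) for every hardcoded sample ID."""
    wanted = set(suffixes)
    hits: List[Tuple[str, int, str]] = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in wanted:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for pattern in DEMO_MARKERS:
                for match in pattern.finditer(line):
                    hits.append((path.relative_to(root).as_posix(), lineno, match.group(0)))
    return hits
```

In a release test, `assert not find_demo_markers(Path("web/operator-gui"))` would fail as soon as a sample case is reintroduced. Real case IDs arrive from the API at runtime, so they never appear as literals in the bundle.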

## Examples
Before (face detection falls back to scanning the raw bytes for marker text):

```py
if opencv_signal.present:
    return opencv_signal
return self._detect_marker_text(image.content)
```

After (only OpenCV detection can produce a signal):

```py
return self._detect_with_opencv(image.content)
```

Before (the operator GUI ships a hardcoded sample case):

```js
const submissions = [{ id: "SUB-1007", riskBand: "high" }];
```

After:

```js
const submissions = [];
```

Tests should move the synthetic signal into the test boundary:

```py
class OneFaceDetector:
    """Test double: always reports one face and one person."""

    def detect(self, image):
        return FacePersonSignal(face_count=1, person_count=1)
```
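A test double only helps if the analyzer receives its detector explicitly instead of constructing one with a built-in fallback. Here is a minimal sketch of constructor injection with a `Protocol`-typed detector. `Analyzer`, `FaceDetector`, and the simplified `FacePersonSignal` are hypothetical stand-ins, not the project's classes.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class FacePersonSignal:
    # Simplified stand-in for the project's signal type (assumption).
    face_count: int = 0
    person_count: int = 0


class FaceDetector(Protocol):
    def detect(self, image: bytes) -> FacePersonSignal: ...


class Analyzer:
    def __init__(self, detector: FaceDetector) -> None:
        # No default detector that fabricates signals: production wiring
        # passes the real OpenCV-backed detector, tests pass a double.
        self._detector = detector

    def face_signal(self, image: bytes) -> FacePersonSignal:
        return self._detector.detect(image)


class OneFaceDetector:
    """Test double, living in the test suite rather than in product code."""

    def detect(self, image: bytes) -> FacePersonSignal:
        return FacePersonSignal(face_count=1, person_count=1)
```

Because the detector is a required argument, product code cannot silently pick up synthetic behavior. The only way to get a fake signal is to pass a double, which only tests do.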

## Related
- Review fix touched `src/rights_filter/analysis/face_person_detection.py`, `src/rights_filter/analysis/fingerprints.py`, `src/rights_filter/server/sqlite_store.py`, and `web/operator-gui/app.js`.