Image rights / copyright detection system: SQLite store, HTTP app, search integrations (Naver, Google Custom Search, Google Cloud Vision web detection), image analysis (fingerprints, face/person detection, evidence enrichment, risk scoring), an admin/review layer, governance and retention policies, batch jobs, and a browser-based operator GUI.

This baseline incorporates a full code-review remediation pass (46 fixes; 358 tests passing). Highlights:

CRITICAL
- Prevent evidence cascade-delete during the schema-constraint migration by disabling FK enforcement around the table rebuild.

Security
- Sandbox served media (neutralize stored XSS from uploaded/collected SVGs) via CSP + nosniff on the untrusted media routes.
- Strip embedded EXIF/GPS from external image derivatives before they are sent to third-party APIs.
- Return a clean 404 (not an uncaught StopIteration) for PATCH on an unknown provider.

Correctness
- LLM-summary failures no longer add +30 to the risk score.
- Decode only explicit JS escapes so Korean image URLs are not mangled.
- Consume search quota only after a successful request.
- Naver/Google adapters map responses inside the failure boundary, so a malformed response degrades to evidence instead of crashing enrichment.
- Domain-aware provider attribution; face-box IoU de-duplication; count searches (not result items); per-box crop isolation; clamp evidence confidence and Google CSE num; real submittedEpoch; and more.

Robustness
- Offline LLM connect fast-fails (short connect timeout) so seed/reload requests are not stalled; full read timeout preserved for generation.
- Malformed numeric env vars fall back to defaults instead of crashing startup.

Performance
- Per-submission evidence reads (no full-table scan per rescore), audit-log LIMIT, lazy active-store lookup, hoisted timestamps.

Tests
- ~24 regression tests added pinning the above fixes.

Runtime data (data/, outputs/, *.sqlite3, *.log), secrets (.env), and node_modules are gitignored.
Folder-Aware Submission Queue Design (2026-05-29)
Goal
- On server startup, do not auto-import images from the configured default folder.
- Activate an import queue only when the user requests a folder import via `POST /api/submissions/import-folder`.
- Keep queue results in SQLite, keyed by folder path.
- Re-importing the same folder should keep only currently present files (deleted files are removed from that queue).
- Importing a different folder should switch active queue and avoid mixing submissions across queues.
Requirements from user intent
- Startup is central and shared; queue selection is driven by user-imported folders.
- User folder import sets the active queue for the current session.
- Queue state and submission mapping are persisted in DB.
- Same folder re-import returns only remaining submissions.
- Different folder import starts/activates a separate queue.
Data model
- `submission_queues` table:
  - `id`: `queue-{sha1(abs_path)[:16]}`
  - `folder_path`: absolute folder path
  - `label`: queue label (default = folder name)
  - `is_active`: 1 if active queue
  - `created_at`, `created_epoch`
  - `last_imported_at`, `last_imported_epoch`
- `submissions` already stores `queue_id` and defaults to `""`.
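As a rough illustration of the data model above, the sketch below derives a queue id in the `queue-{sha1(abs_path)[:16]}` form and creates a matching table. The column names come from the design; the column types, the UTF-8 encoding of the path, the use of `os.path.abspath` for normalization, and the helper name `queue_id_for` are assumptions made for the example, not the project's actual code.

```python
import hashlib
import os
import sqlite3


def queue_id_for(folder_path: str) -> str:
    """Derive a stable queue id from the absolute folder path (hypothetical helper)."""
    abs_path = os.path.abspath(folder_path)
    # Assumption: the path is hashed as UTF-8 bytes.
    digest = hashlib.sha1(abs_path.encode("utf-8")).hexdigest()
    return f"queue-{digest[:16]}"


# Assumed DDL: column names follow the design, types are guesses.
SCHEMA = """
CREATE TABLE IF NOT EXISTS submission_queues (
    id TEXT PRIMARY KEY,
    folder_path TEXT NOT NULL,
    label TEXT NOT NULL,
    is_active INTEGER NOT NULL DEFAULT 0,
    created_at TEXT,
    created_epoch REAL,
    last_imported_at TEXT,
    last_imported_epoch REAL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

qid = queue_id_for("/data/incoming")
same_qid = queue_id_for("/data/incoming/")   # trailing slash normalizes away
other_qid = queue_id_for("/data/other")
columns = [row[1] for row in conn.execute("PRAGMA table_info(submission_queues)")]
```

Because the id is a pure function of the absolute path, re-importing the same folder maps to the same queue row, while a different folder gets its own row.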
Runtime flow
- Server startup
  - `CopyrighterStore.initialize()` only.
  - No `seed_from_image_store(...)` during startup.
- User-import folder
  - Endpoint: `POST /api/submissions/import-folder`
  - Request body: `{ path: "..." }` (or `{ folder: "..." }`)
  - Handler creates `LocalSubmissionImageStore(path)` and calls `seed_from_image_store()`.
  - `seed_from_image_store()` calls `ensure_queue(path)` first.
  - Queue row is created/selected and marked active.
  - First-time legacy queue-less rows are migrated to the active queue (`queue_id`).
  - Bootstrap payload is returned with `submissionQueue` and queue-filtered `submissions`.
- Same folder re-import
  - `seed_from_image_store()` calls `_prune_missing_submission_files(...)` for that queue.
  - Existing records missing on disk are removed.
  - Only records missing from DB are added.
- Different folder import
  - `ensure_queue(path)` deactivates previous queues and activates the target folder queue.
  - Bootstrap/reload endpoints use active queue only.
- Persistence
  - Active queue metadata is saved in `submission_queues`.
  - Restarting store (`initialize()` + new `CopyrighterStore`) keeps active queue.
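The queue switching and persistence steps in the flow above could look roughly like the following. This is a minimal sketch, not the real `ensure_queue`: the function signature, the trimmed-down table, and the `active_queue_id` helper are assumptions for illustration. It shows that activating one folder deactivates the others, and that the active queue survives closing and reopening the database file.

```python
import hashlib
import os
import sqlite3
import tempfile
import time

# Assumed, trimmed-down version of the submission_queues table.
SCHEMA = """
CREATE TABLE IF NOT EXISTS submission_queues (
    id TEXT PRIMARY KEY,
    folder_path TEXT NOT NULL,
    label TEXT NOT NULL,
    is_active INTEGER NOT NULL DEFAULT 0,
    created_epoch REAL,
    last_imported_epoch REAL
)
"""


def ensure_queue(conn: sqlite3.Connection, folder_path: str) -> str:
    """Create or select the queue for a folder and make it the only active one."""
    abs_path = os.path.abspath(folder_path)
    qid = "queue-" + hashlib.sha1(abs_path.encode("utf-8")).hexdigest()[:16]
    label = os.path.basename(abs_path) or abs_path  # default label = folder name
    now = time.time()
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(
            "INSERT OR IGNORE INTO submission_queues "
            "(id, folder_path, label, is_active, created_epoch, last_imported_epoch) "
            "VALUES (?, ?, ?, 0, ?, ?)",
            (qid, abs_path, label, now, now),
        )
        conn.execute("UPDATE submission_queues SET is_active = 0")
        conn.execute(
            "UPDATE submission_queues SET is_active = 1, last_imported_epoch = ? "
            "WHERE id = ?",
            (now, qid),
        )
    return qid


def active_queue_id(conn: sqlite3.Connection):
    """Return the id of the active queue, or None if no queue is active."""
    row = conn.execute(
        "SELECT id FROM submission_queues WHERE is_active = 1"
    ).fetchone()
    return row[0] if row else None


with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "store.sqlite3")

    conn = sqlite3.connect(db_path)
    conn.execute(SCHEMA)
    first = ensure_queue(conn, "/data/folder-a")
    second = ensure_queue(conn, "/data/folder-b")  # switches the active queue
    conn.close()

    # Simulate a restart: a fresh connection still sees the active queue.
    conn = sqlite3.connect(db_path)
    restored = active_queue_id(conn)
    active_count = conn.execute(
        "SELECT COUNT(*) FROM submission_queues WHERE is_active = 1"
    ).fetchone()[0]
    total_count = conn.execute("SELECT COUNT(*) FROM submission_queues").fetchone()[0]
    conn.close()
```

Doing the deactivate and activate updates inside a single transaction avoids a window in which zero or two queues are marked active.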
API behavior
- `GET /api/bootstrap`
  - Returns `submissionQueue` for the currently active queue.
  - Returns only `submissions` belonging to that queue.
- `POST /api/submissions/reload`
  - Re-syncs only active queue.
- `POST /api/submissions/import-folder`
  - Switches/creates active queue, persists queue metadata, syncs only selected folder.
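To make the re-sync and bootstrap behavior above concrete, here is a small sketch using plain Python data. The function names `plan_resync` and `bootstrap_payload`, and the dictionary shapes, are hypothetical; only the `submissionQueue` and `submissions` keys and the `queue_id` field come from the design.

```python
def plan_resync(files_on_disk, files_in_db):
    """Return (to_remove, to_add) for one queue.

    Records whose files vanished from disk are removed; files not yet
    recorded in the DB are added. Everything else is left untouched.
    """
    on_disk = set(files_on_disk)
    in_db = set(files_in_db)
    to_remove = sorted(in_db - on_disk)
    to_add = sorted(on_disk - in_db)
    return to_remove, to_add


def bootstrap_payload(active_queue, submissions):
    """Build a bootstrap-style payload limited to the active queue."""
    visible = [s for s in submissions if s["queue_id"] == active_queue["id"]]
    return {"submissionQueue": active_queue, "submissions": visible}


# b.jpg was deleted from the folder, c.jpg is new since the last import.
to_remove, to_add = plan_resync(["a.jpg", "c.jpg"], ["a.jpg", "b.jpg"])

active = {"id": "queue-aaaa", "label": "folder-a"}
all_submissions = [
    {"id": 1, "queue_id": "queue-aaaa", "file": "a.jpg"},
    {"id": 2, "queue_id": "queue-bbbb", "file": "x.jpg"},
    {"id": 3, "queue_id": "queue-aaaa", "file": "c.jpg"},
]
payload = bootstrap_payload(active, all_submissions)
```

Here `to_remove` is `["b.jpg"]` and `to_add` is `["c.jpg"]`, and the payload carries only submissions 1 and 3, so rows from the other queue never reach the client.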
Verification coverage
- HTTP `tests/rights_filter/server/test_http_app.py`
  - `test_http_server_does_not_auto_import_on_startup`
  - `test_http_server_reimports_same_folder_only_with_remaining_submissions`
- Storage/DB `tests/rights_filter/server/test_sqlite_store.py`
  - `test_sqlite_store_switches_active_queue_when_importing_from_different_folders`
  - `test_sqlite_store_reload_uses_remaining_files_only_for_the_active_queue`
  - `test_sqlite_store_persists_active_queue_in_database`