POSA_Copyrighter/docs/solutions/folder-aware-submission-queue-design-2026-05-29.md
유창욱 3f7b3a9cf2 chore: initial commit of copyrighter (rights_filter)
2026-06-09 09:50:31 +09:00

# Folder-Aware Submission Queue Design (2026-05-29)
## Goal
- On server startup, do not auto-import images from the configured default folder.
- Activate an import queue only when the user requests a folder import via `POST /api/submissions/import-folder`.
- Keep queue results in SQLite, keyed by folder path.
- Re-importing the same folder should keep only currently present files (deleted files are removed from that queue).
- Importing a different folder should switch the active queue and avoid mixing submissions across queues.
## Requirements from user intent
1. Startup is central and shared; queue selection is driven by user-imported folders.
2. A user folder import sets the active queue for the current session.
3. Queue state and the submission-to-queue mapping are persisted in the DB.
4. Re-importing the same folder returns only the submissions that remain.
5. Importing a different folder starts or activates a separate queue.
## Data model
- `submission_queues` table:
  - `id`: `queue-{sha1(abs_path)[:16]}`
  - `folder_path`: absolute folder path
  - `label`: queue label (default = folder name)
  - `is_active`: 1 if active queue
  - `created_at`, `created_epoch`
  - `last_imported_at`, `last_imported_epoch`
- `submissions` already stores `queue_id` and defaults to `""`.
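
The queue id rule and table layout above can be sketched as follows. This is a minimal illustration, not the project's actual schema: the column types, the `UNIQUE` constraint, and the helper name `queue_id_for` are assumptions. Only the column names and the `queue-{sha1(abs_path)[:16]}` rule come from this document.

```python
import hashlib
import os
import sqlite3


def queue_id_for(folder_path: str) -> str:
    """Derive a stable queue id from the absolute folder path (hypothetical helper)."""
    abs_path = os.path.abspath(folder_path)
    digest = hashlib.sha1(abs_path.encode("utf-8")).hexdigest()
    return f"queue-{digest[:16]}"


# Column names follow the data model above; the types are assumed.
SCHEMA = """
CREATE TABLE IF NOT EXISTS submission_queues (
    id                  TEXT PRIMARY KEY,
    folder_path         TEXT NOT NULL UNIQUE,
    label               TEXT NOT NULL,
    is_active           INTEGER NOT NULL DEFAULT 0,
    created_at          TEXT,
    created_epoch       REAL,
    last_imported_at    TEXT,
    last_imported_epoch REAL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```

Hashing the absolute path means a relative and an absolute spelling of the same folder map to the same queue row.
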
## Runtime flow
1. Server startup
   - `CopyrighterStore.initialize()` only.
   - No `seed_from_image_store(...)` during startup.
2. User folder import
   - Endpoint: `POST /api/submissions/import-folder`
   - Request body: `{ path: "..." }` (or `{ folder: "..." }`)
   - Handler creates `LocalSubmissionImageStore(path)` and calls `seed_from_image_store()`.
   - `seed_from_image_store()` calls `ensure_queue(path)` first.
   - Queue row is created/selected and marked active.
   - Legacy rows with no `queue_id` are migrated to the active queue the first time a folder is imported.
   - Bootstrap payload is returned with `submissionQueue` and queue-filtered `submissions`.
3. Same folder re-import
   - `seed_from_image_store()` calls `_prune_missing_submission_files(...)` for that queue.
   - Existing records whose files are missing on disk are removed.
   - Only files not yet recorded in the DB are added.
4. Different folder import
   - `ensure_queue(path)` deactivates previous queues and activates the target folder queue.
   - Bootstrap/reload endpoints use the active queue only.
5. Persistence
   - Active queue metadata is saved in `submission_queues`.
   - Restarting the store (`initialize()` + new `CopyrighterStore`) keeps the active queue.
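
The activate, prune, and add steps above can be sketched against an in-memory SQLite database. This is a simplified illustration under stated assumptions, not the real `CopyrighterStore` code: `open_store`, `sync_queue`, and `active_submissions` are hypothetical names, the tables are reduced to the columns needed here, and the legacy-row migration is left out.

```python
import hashlib
import os
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create a reduced, in-memory version of the two tables (assumed layout)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE submission_queues (
            id TEXT PRIMARY KEY, folder_path TEXT, label TEXT,
            is_active INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE submissions (
            id INTEGER PRIMARY KEY, file_name TEXT, queue_id TEXT DEFAULT ''
        );
        """
    )
    return conn


def ensure_queue(conn: sqlite3.Connection, folder_path: str) -> str:
    """Create or select the queue for a folder and make it the only active one."""
    abs_path = os.path.abspath(folder_path)
    queue_id = "queue-" + hashlib.sha1(abs_path.encode("utf-8")).hexdigest()[:16]
    with conn:  # commit on success, roll back on error
        conn.execute(
            "INSERT OR IGNORE INTO submission_queues (id, folder_path, label) "
            "VALUES (?, ?, ?)",
            (queue_id, abs_path, os.path.basename(abs_path) or abs_path),
        )
        # Sets is_active to 1 for the target row and 0 for every other row.
        conn.execute("UPDATE submission_queues SET is_active = (id = ?)", (queue_id,))
    return queue_id


def sync_queue(conn: sqlite3.Connection, queue_id: str, files_on_disk) -> None:
    """Prune rows whose files are gone, then add files not yet in the DB."""
    present = set(files_on_disk)
    with conn:
        known = {
            row[0]
            for row in conn.execute(
                "SELECT file_name FROM submissions WHERE queue_id = ?", (queue_id,)
            )
        }
        conn.executemany(
            "DELETE FROM submissions WHERE queue_id = ? AND file_name = ?",
            [(queue_id, name) for name in known - present],
        )
        conn.executemany(
            "INSERT INTO submissions (file_name, queue_id) VALUES (?, ?)",
            [(name, queue_id) for name in sorted(present - known)],
        )


def active_submissions(conn: sqlite3.Connection) -> list:
    """Return file names belonging to the active queue only."""
    rows = conn.execute(
        "SELECT s.file_name FROM submissions s "
        "JOIN submission_queues q ON q.id = s.queue_id "
        "WHERE q.is_active = 1 ORDER BY s.file_name"
    )
    return [row[0] for row in rows]
```

Because the active flag lives in the queue table rather than in process memory, reopening the same database file would find the same queue still marked active, which is the persistence property described in step 5.
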
## API behavior
- `GET /api/bootstrap`
  - Returns `submissionQueue` for the currently active queue.
  - Returns only `submissions` belonging to that queue.
- `POST /api/submissions/reload`
  - Re-syncs only the active queue.
- `POST /api/submissions/import-folder`
  - Switches to or creates the active queue, persists queue metadata, and syncs only the selected folder.
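
A client call to the import endpoint might look like the sketch below. The base URL, the JSON content type, and the function names are assumptions for illustration. Only the route, the `path` body key, and the `submissionQueue` and `submissions` response keys come from this document.

```python
import json
import urllib.request

# Assumption: host and port are not stated in this document.
BASE_URL = "http://127.0.0.1:8000"


def build_import_request(folder: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST request for a folder import (hypothetical helper)."""
    body = json.dumps({"path": folder}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/submissions/import-folder",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def import_folder(folder: str, base_url: str = BASE_URL) -> dict:
    """Send the request and decode the bootstrap payload as JSON (assumed format)."""
    with urllib.request.urlopen(build_import_request(folder, base_url)) as resp:
        return json.load(resp)
```

With a server running, `import_folder("/data/images")` would be expected to return a payload whose `submissions` list contains only entries from the newly active queue.
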
## Verification coverage
- HTTP
  - `tests/rights_filter/server/test_http_app.py`
    - `test_http_server_does_not_auto_import_on_startup`
    - `test_http_server_reimports_same_folder_only_with_remaining_submissions`
- Storage/DB
  - `tests/rights_filter/server/test_sqlite_store.py`
    - `test_sqlite_store_switches_active_queue_when_importing_from_different_folders`
    - `test_sqlite_store_reload_uses_remaining_files_only_for_the_active_queue`
    - `test_sqlite_store_persists_active_queue_in_database`