# Folder-Aware Submission Queue Design (2026-05-29)

## Goal
- On server startup, do not auto-import images from the configured default folder.
- Activate an import queue only when the user requests a folder import via `POST /api/submissions/import-folder`.
- Keep queue results in SQLite, keyed by folder path.
- Re-importing the same folder should keep only the files that are currently present (files deleted from disk are removed from that queue).
- Importing a different folder should switch the active queue and avoid mixing submissions across queues.

## Requirements from user intent
1. Startup is central and shared; queue selection is driven by user-imported folders.
2. User folder import sets the active queue for the current session.
3. Queue state and submission mapping are persisted in DB.
4. Same folder re-import returns only remaining submissions.
5. Different folder import starts/activates a separate queue.

## Data model
- `submission_queues` table:
  - `id`: `queue-{sha1(abs_path)[:16]}`
  - `folder_path`: absolute folder path
  - `label`: queue label (default = folder name)
  - `is_active`: 1 if this is the active queue
  - `created_at`, `created_epoch`
  - `last_imported_at`, `last_imported_epoch`
- `submissions` already has a `queue_id` column, which defaults to `""`.

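The data model above can be sketched as follows. This is a minimal illustration, not the project's actual schema: the column types, the `UNIQUE` constraint, the UTF-8 encoding of the path before hashing, and the helper names `queue_id_for` and `default_label` are all assumptions. Only the column names and the `queue-{sha1(abs_path)[:16]}` id format come from this design.

```python
import hashlib
import os
import sqlite3

# Column names follow the design; types and constraints are assumed.
SCHEMA = """
CREATE TABLE IF NOT EXISTS submission_queues (
    id                  TEXT PRIMARY KEY,
    folder_path         TEXT NOT NULL UNIQUE,
    label               TEXT NOT NULL,
    is_active           INTEGER NOT NULL DEFAULT 0,
    created_at          TEXT,
    created_epoch       REAL,
    last_imported_at    TEXT,
    last_imported_epoch REAL
);
"""


def queue_id_for(folder: str) -> str:
    """Derive a stable queue id from the absolute folder path (hypothetical helper)."""
    abs_path = os.path.abspath(folder)
    # Assumption: the path is hashed as UTF-8 text and the hex digest is truncated.
    digest = hashlib.sha1(abs_path.encode("utf-8")).hexdigest()
    return f"queue-{digest[:16]}"


def default_label(folder: str) -> str:
    """Default queue label: the folder name (hypothetical helper)."""
    return os.path.basename(os.path.abspath(folder))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    print(queue_id_for("photos"), default_label("photos"))
```

Because the id is derived from the absolute path, importing the same folder through a relative path or with a trailing slash resolves to the same queue row.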
## Runtime flow
1. Server startup
   - Calls `CopyrighterStore.initialize()` only.
   - Does not call `seed_from_image_store(...)` during startup.

2. User folder import
   - Endpoint: `POST /api/submissions/import-folder`
   - Request body: `{ path: "..." }` (or `{ folder: "..." }`)
   - The handler creates `LocalSubmissionImageStore(path)` and calls `seed_from_image_store()`.
   - `seed_from_image_store()` calls `ensure_queue(path)` first.
   - The queue row is created or selected, then marked active.
   - On the first import, legacy rows that have no queue are migrated to the active queue (their `queue_id` is set).
   - The bootstrap payload is returned with `submissionQueue` and queue-filtered `submissions`.

3. Same folder re-import
   - `seed_from_image_store()` calls `_prune_missing_submission_files(...)` for that queue.
   - Records whose files are no longer on disk are removed.
   - Only files that do not yet have a record in the DB are added.

4. Different folder import
   - `ensure_queue(path)` deactivates the previous queues and activates the target folder's queue.
   - Bootstrap/reload endpoints use the active queue only.

5. Persistence
   - Active queue metadata is saved in `submission_queues`.
   - Restarting the store (`initialize()` on a new `CopyrighterStore`) keeps the active queue.

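The import, re-import, and queue-switch steps above can be sketched against an in-memory SQLite database. This is a simplified illustration under stated assumptions, not the project's code: `open_store`, `sync_folder`, and `active_submissions` are hypothetical names, the `file_name` column is assumed, the list of present files is passed in rather than read from disk through `LocalSubmissionImageStore`, and the legacy-row migration is left out.

```python
import hashlib
import os
import sqlite3
import time


def open_store() -> sqlite3.Connection:
    """Create a throwaway store with a reduced, assumed schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE submission_queues (
            id TEXT PRIMARY KEY, folder_path TEXT, label TEXT,
            is_active INTEGER DEFAULT 0, last_imported_epoch REAL);
        CREATE TABLE submissions (
            id INTEGER PRIMARY KEY, file_name TEXT, queue_id TEXT DEFAULT '');
    """)
    return conn


def ensure_queue(conn: sqlite3.Connection, folder: str) -> str:
    """Create or select the queue for `folder` and make it the only active one."""
    abs_path = os.path.abspath(folder)
    queue_id = "queue-" + hashlib.sha1(abs_path.encode("utf-8")).hexdigest()[:16]
    conn.execute("UPDATE submission_queues SET is_active = 0")
    conn.execute(
        "INSERT OR IGNORE INTO submission_queues (id, folder_path, label) "
        "VALUES (?, ?, ?)",
        (queue_id, abs_path, os.path.basename(abs_path)),
    )
    conn.execute(
        "UPDATE submission_queues SET is_active = 1, last_imported_epoch = ? "
        "WHERE id = ?",
        (time.time(), queue_id),
    )
    return queue_id


def sync_folder(conn: sqlite3.Connection, folder: str, present_files: list) -> str:
    """Stand-in for the seeding step: prune missing files, add new ones."""
    queue_id = ensure_queue(conn, folder)
    known = {
        row[0]
        for row in conn.execute(
            "SELECT file_name FROM submissions WHERE queue_id = ?", (queue_id,)
        )
    }
    present = set(present_files)
    # Remove records whose files are gone, scoped to this queue only.
    conn.executemany(
        "DELETE FROM submissions WHERE queue_id = ? AND file_name = ?",
        [(queue_id, name) for name in known - present],
    )
    # Add only files that have no record yet.
    conn.executemany(
        "INSERT INTO submissions (file_name, queue_id) VALUES (?, ?)",
        [(name, queue_id) for name in sorted(present - known)],
    )
    conn.commit()
    return queue_id


def active_submissions(conn: sqlite3.Connection) -> list:
    """Return file names that belong to the active queue."""
    rows = conn.execute(
        "SELECT s.file_name FROM submissions s "
        "JOIN submission_queues q ON q.id = s.queue_id "
        "WHERE q.is_active = 1 ORDER BY s.file_name"
    )
    return [row[0] for row in rows]
```

In this sketch, switching folders never deletes the other queue's rows; they simply stop being visible until that folder is imported again.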
## API behavior
- `GET /api/bootstrap`
  - Returns `submissionQueue` for the currently active queue.
  - Returns only `submissions` belonging to that queue.
- `POST /api/submissions/reload`
  - Re-syncs only the active queue.
- `POST /api/submissions/import-folder`
  - Creates or switches to the active queue, persists the queue metadata, and syncs only the selected folder.

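As a rough illustration of the request and response shapes described above, the sketch below parses an import request body and builds a queue-filtered bootstrap payload from plain dictionaries. The function names `folder_from_body` and `build_bootstrap` are hypothetical, and two behaviors are assumptions the design does not specify: `path` taking precedence when both keys are sent, and an empty payload when no queue is active.

```python
from typing import Any, Dict, List


def folder_from_body(body: Dict[str, Any]) -> str:
    """Accept either `path` or `folder` in the request body."""
    # Assumption: `path` wins if both keys are present.
    value = body.get("path") or body.get("folder")
    if not isinstance(value, str) or not value.strip():
        raise ValueError("request body needs a non-empty 'path' or 'folder'")
    return value.strip()


def build_bootstrap(
    queues: List[Dict[str, Any]], submissions: List[Dict[str, Any]]
) -> Dict[str, Any]:
    """Return the active queue and only the submissions that belong to it."""
    active = next((q for q in queues if q.get("is_active") == 1), None)
    if active is None:
        # Assumption: before any folder import there is nothing to show.
        return {"submissionQueue": None, "submissions": []}
    visible = [s for s in submissions if s.get("queue_id") == active["id"]]
    return {"submissionQueue": active, "submissions": visible}


if __name__ == "__main__":
    queues = [
        {"id": "queue-a", "is_active": 0},
        {"id": "queue-b", "is_active": 1},
    ]
    submissions = [
        {"id": 1, "queue_id": "queue-a"},
        {"id": 2, "queue_id": "queue-b"},
    ]
    print(folder_from_body({"folder": "/data/b"}))
    print(build_bootstrap(queues, submissions))
```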
## Verification coverage
- HTTP
  - `tests/rights_filter/server/test_http_app.py`
    - `test_http_server_does_not_auto_import_on_startup`
    - `test_http_server_reimports_same_folder_only_with_remaining_submissions`
- Storage/DB
  - `tests/rights_filter/server/test_sqlite_store.py`
    - `test_sqlite_store_switches_active_queue_when_importing_from_different_folders`
    - `test_sqlite_store_reload_uses_remaining_files_only_for_the_active_queue`
    - `test_sqlite_store_persists_active_queue_in_database`