chore: initial commit of copyrighter (rights_filter)

Image rights / copyright detection system: SQLite store, HTTP app, search
integrations (Naver, Google Custom Search, Google Cloud Vision web
detection), image analysis (fingerprints, face/person detection, evidence
enrichment, risk scoring), an admin/review layer, governance and retention
policies, batch jobs, and a browser-based operator GUI.

This baseline incorporates a full code-review remediation pass (46 fixes;
358 tests passing). Highlights:

CRITICAL
- Prevent evidence cascade-delete during the schema-constraint migration by
  disabling FK enforcement around the table rebuild.

Security
- Sandbox served media (neutralize stored XSS from uploaded/collected SVGs)
  via CSP + nosniff on the untrusted media routes.
- Strip embedded EXIF/GPS from external image derivatives before they are
  sent to third-party APIs.
- Return a clean 404 (not an uncaught StopIteration) for PATCH on an unknown
  provider.

Correctness
- LLM-summary failures no longer add +30 to the risk score.
- Decode only explicit JS escapes so Korean image URLs are not mangled.
- Consume search quota only after a successful request.
- Naver/Google adapters map responses inside the failure boundary, so a
  malformed response degrades to evidence instead of crashing enrichment.
- Domain-aware provider attribution; face-box IoU de-duplication; count
  searches (not result items); per-box crop isolation; clamp evidence
  confidence and Google CSE num; real submittedEpoch; and more.

Robustness
- Offline LLM connect fast-fails (short connect timeout) so seed/reload
  requests are not stalled; full read timeout preserved for generation.
- Malformed numeric env vars fall back to defaults instead of crashing
  startup.

Performance
- Per-submission evidence reads (no full-table scan per rescore), audit-log
  LIMIT, lazy active-store lookup, hoisted timestamps.

Tests
- ~24 regression tests added pinning the above fixes.

Runtime data (data/, outputs/, *.sqlite3, *.log), secrets (.env), and
node_modules are gitignored.
commit 3f7b3a9cf2
123 changed files with 36599 additions and 0 deletions
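
The CRITICAL item in the commit message (disabling FK enforcement around a table rebuild) can be illustrated with a minimal sketch. It is not the project's migration code: the `submissions` table, its `status` CHECK constraint, and the function name are hypothetical, and the connection is assumed to be opened with `isolation_level=None` so that transactions are managed explicitly. With foreign keys enabled, SQLite's `DROP TABLE` performs an implicit delete that can fire `ON DELETE CASCADE` on child rows, which is why enforcement is switched off first.

```python
import sqlite3


def rebuild_submissions(conn: sqlite3.Connection) -> None:
    """Rebuild a parent table with a new constraint without cascading deletes.

    Assumes a hypothetical schema: submissions(id, status), referenced by a
    child table through a foreign key declared with ON DELETE CASCADE.
    """
    # PRAGMA foreign_keys has no effect inside a transaction, so finish any
    # open transaction before toggling it.
    conn.commit()
    conn.execute("PRAGMA foreign_keys = OFF")
    try:
        conn.execute("BEGIN")
        conn.execute(
            "CREATE TABLE submissions_new ("
            " id INTEGER PRIMARY KEY,"
            " status TEXT NOT NULL"
            " CHECK (status IN ('pending', 'approved', 'rejected')))"
        )
        conn.execute(
            "INSERT INTO submissions_new SELECT id, status FROM submissions"
        )
        # With enforcement off, dropping the parent leaves child rows alone.
        conn.execute("DROP TABLE submissions")
        conn.execute("ALTER TABLE submissions_new RENAME TO submissions")
        conn.execute("COMMIT")
    except Exception:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise
    finally:
        # Always restore enforcement, even when the rebuild failed.
        conn.execute("PRAGMA foreign_keys = ON")
```

Running `PRAGMA foreign_key_check` after such a rebuild is a cheap way to confirm that no child row was left pointing at a missing parent.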

.env.example (Normal file, 41 lines)
@@ -0,0 +1,41 @@
NAVER_CLIENT_ID=
NAVER_CLIENT_SECRET=
NAVER_SEARCH_DISPLAY=10
NAVER_SEARCH_PAGES=1
NAVER_SEARCH_SORT=sim
NAVER_BLOG_SEARCH_DISPLAY=3
NAVER_BLOG_SEARCH_PAGES=1
NAVER_BLOG_SEARCH_SORT=sim
NAVER_WEB_SEARCH_DISPLAY=3
NAVER_WEB_SEARCH_PAGES=1

GOOGLE_CLOUD_VISION_API_KEY=
GOOGLE_CLOUD_VISION_PARENT=
COPYRIGHTER_GOOGLE_FACE_CROP_SEARCH=false

GOOGLE_CUSTOM_SEARCH_API_KEY=
GOOGLE_CUSTOM_SEARCH_CX=
GOOGLE_CUSTOM_SEARCH_IMAGE_RESULTS=3
GOOGLE_CUSTOM_SEARCH_IMAGE_PAGES=1
GOOGLE_CUSTOM_SEARCH_WEB_RESULTS=3
GOOGLE_CUSTOM_SEARCH_WEB_PAGES=1

COPYRIGHTER_AUTO_NAVER_QUERY_LIMIT=3
COPYRIGHTER_AUTO_NAVER_BLOG_QUERY_LIMIT=1
COPYRIGHTER_AUTO_NAVER_WEB_QUERY_LIMIT=1
COPYRIGHTER_AUTO_GOOGLE_CUSTOM_QUERY_LIMIT=2
COPYRIGHTER_SEARCH_RESULT_COMPARE_LIMIT=3
COPYRIGHTER_SEARCH_RESULT_PAGE_IMAGE_LIMIT=3
COPYRIGHTER_SEARCH_RESULT_SIMILARITY_THRESHOLD=0.9
COPYRIGHTER_COVERAGE_GOOD_THRESHOLD=70
COPYRIGHTER_COVERAGE_WARN_THRESHOLD=40
COPYRIGHTER_QUERY_COVERAGE_GOOD_THRESHOLD=70
COPYRIGHTER_QUERY_COVERAGE_WARN_THRESHOLD=40

COPYRIGHTER_NAVER_DAILY_LIMIT=100
COPYRIGHTER_GOOGLE_DAILY_LIMIT=100
COPYRIGHTER_GOOGLE_CUSTOM_SEARCH_DAILY_LIMIT=100
COPYRIGHTER_LLM_DAILY_LIMIT=100

OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=qwen2.5:0.5b-instruct

.gitignore (vendored, Normal file, 29 lines)
@@ -0,0 +1,29 @@
# Python
__pycache__/
*.py[cod]
*.egg-info/
.pytest_cache/
.mypy_cache/
.ruff_cache/

# Secrets & local environment (keep .env.example)
.env
.env.local

# Node
node_modules/

# Runtime data, databases, logs, generated artifacts
data/
outputs/
*.sqlite3
*.sqlite3-journal
*.log

# Temp / scratch
tmp_*
tmp_dbg_img/

# OS
.DS_Store
Thumbs.db

docs/ai-ml-visual-explainer.html (Normal file, 773 lines)
@@ -0,0 +1,773 @@
<!doctype html>
<html lang="ko">
<head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <title>Copyrighter AI/ML Usage Guide</title>
  <style>
    :root {
      --bg: #f4f1ea;
      --panel: #ffffff;
      --panel-soft: #fbfaf6;
      --ink: #172124;
      --muted: #5f6b6f;
      --line: #d7ccbd;
      --accent: #24667a;
      --green: #28734f;
      --red: #a13d35;
      --amber: #916300;
      --blue-soft: #edf7fb;
      --green-soft: #ecf8f1;
      --amber-soft: #fff7e2;
      --red-soft: #fff0ed;
      --shadow: 0 18px 42px rgba(23, 33, 36, 0.12);
    }

    * {
      box-sizing: border-box;
    }

    body {
      margin: 0;
      background: var(--bg);
      color: var(--ink);
      font-family: "Segoe UI", "Apple SD Gothic Neo", "Malgun Gothic", sans-serif;
      line-height: 1.58;
    }

    a {
      color: inherit;
    }

    .shell {
      width: min(1180px, calc(100vw - 32px));
      margin: 0 auto;
    }

    .hero {
      display: grid;
      grid-template-columns: minmax(0, 0.95fr) minmax(420px, 1.05fr);
      gap: 30px;
      align-items: center;
      min-height: 84vh;
      padding: 42px 0 26px;
    }

    .eyebrow {
      margin: 0 0 10px;
      color: var(--accent);
      font-size: 12px;
      font-weight: 900;
      letter-spacing: 0.08em;
      text-transform: uppercase;
    }

    h1,
    h2,
    h3 {
      margin: 0;
      line-height: 1.18;
    }

    h1 {
      max-width: 760px;
      font-size: clamp(38px, 5.3vw, 70px);
    }

    h2 {
      font-size: clamp(28px, 3.1vw, 42px);
    }

    h3 {
      font-size: 18px;
    }

    p {
      margin: 10px 0 0;
    }

    .lead {
      max-width: 670px;
      margin-top: 18px;
      color: var(--muted);
      font-size: 18px;
    }

    .hero-card,
    .panel,
    .image-card,
    .claim-card,
    .metric-card {
      border: 1px solid var(--line);
      border-radius: 8px;
      background: var(--panel);
    }

    .hero-card {
      overflow: hidden;
      box-shadow: var(--shadow);
    }

    .hero-card img,
    .image-card img {
      display: block;
      width: 100%;
      height: auto;
    }

    .caption {
      padding: 12px 14px;
      color: var(--muted);
      font-size: 13px;
    }

    .answer-strip {
      display: grid;
      grid-template-columns: 1fr 1fr;
      gap: 14px;
      margin: 0 0 32px;
    }

    .verdict,
    .warning {
      padding: 22px;
      border-radius: 8px;
    }

    .verdict {
      border: 1px solid #9ec8b0;
      background: var(--green-soft);
    }

    .warning {
      border: 1px solid #e5c08a;
      background: var(--amber-soft);
    }

    .verdict strong,
    .warning strong {
      display: block;
      margin-bottom: 6px;
    }

    .verdict strong {
      color: var(--green);
      font-size: 28px;
    }

    .warning strong {
      color: var(--amber);
      font-size: 20px;
    }

    section {
      padding: 46px 0;
    }

    .section-head {
      display: grid;
      grid-template-columns: minmax(0, 0.72fr) minmax(320px, 0.28fr);
      gap: 22px;
      align-items: end;
      margin-bottom: 18px;
    }

    .section-head p,
    .panel p,
    .claim-card p,
    .metric-card p,
    .plain-list li {
      color: var(--muted);
    }

    .claim-grid {
      display: grid;
      grid-template-columns: repeat(4, minmax(0, 1fr));
      gap: 12px;
    }

    .claim-card,
    .metric-card,
    .panel {
      padding: 16px;
    }

    .claim-card strong {
      display: block;
      margin-bottom: 8px;
      font-size: 15px;
    }

    .tag {
      display: inline-flex;
      align-items: center;
      min-height: 24px;
      margin-bottom: 12px;
      padding: 3px 8px;
      border: 1px solid var(--line);
      border-radius: 999px;
      background: var(--panel-soft);
      color: var(--muted);
      font-size: 12px;
      font-weight: 800;
      white-space: nowrap;
    }

    .tag.ai {
      border-color: #b9cfdb;
      background: var(--blue-soft);
      color: var(--accent);
    }

    .tag.ml {
      border-color: #a6d4b7;
      background: var(--green-soft);
      color: var(--green);
    }

    .tag.guard {
      border-color: #e5c08a;
      background: var(--amber-soft);
      color: var(--amber);
    }

    .tag.rule {
      border-color: #c4b6a2;
      background: #f6efe3;
      color: #594b36;
    }

    .two-col {
      display: grid;
      grid-template-columns: minmax(0, 1fr) minmax(0, 1fr);
      gap: 16px;
    }

    .three-col {
      display: grid;
      grid-template-columns: repeat(3, minmax(0, 1fr));
      gap: 12px;
    }

    .metric-card strong {
      display: block;
      color: var(--accent);
      font-size: 28px;
    }

    .mermaid-wrap {
      padding: 16px;
      border: 1px solid var(--line);
      border-radius: 8px;
      background: #ffffff;
      overflow: auto;
    }

    .evidence-stack {
      display: grid;
      gap: 12px;
    }

    .evidence-row {
      display: grid;
      grid-template-columns: 180px minmax(0, 1fr) 170px;
      gap: 12px;
      align-items: start;
      padding: 14px;
      border: 1px solid var(--line);
      border-radius: 8px;
      background: var(--panel);
    }

    .evidence-row strong,
    .evidence-row span {
      min-width: 0;
    }

    .evidence-row span {
      color: var(--muted);
    }

    .score-table {
      width: 100%;
      border-collapse: collapse;
      overflow: hidden;
      border: 1px solid var(--line);
      border-radius: 8px;
      background: var(--panel);
    }

    .score-table th,
    .score-table td {
      padding: 12px;
      border-bottom: 1px solid var(--line);
      text-align: left;
      vertical-align: top;
    }

    .score-table th {
      background: #efe9dd;
      color: #3f4b4d;
      font-size: 13px;
    }

    .score-table td {
      color: var(--muted);
      font-size: 14px;
    }

    .score-table tr:last-child td {
      border-bottom: 0;
    }

    .image-grid,
    .language-box {
      display: grid;
      grid-template-columns: 1fr 1fr;
      gap: 14px;
    }

    .do,
    .dont {
      padding: 16px;
      border-radius: 8px;
    }

    .do {
      border: 1px solid #9ec8b0;
      background: var(--green-soft);
    }

    .dont {
      border: 1px solid #e3aaa4;
      background: var(--red-soft);
    }

    .plain-list {
      margin: 12px 0 0;
      padding-left: 20px;
    }

    .plain-list li + li {
      margin-top: 8px;
    }

    .code-list {
      display: grid;
      gap: 8px;
      margin: 0;
      padding: 0;
      list-style: none;
    }

    .code-list li {
      padding: 10px 12px;
      border: 1px solid var(--line);
      border-radius: 8px;
      background: var(--panel-soft);
      font-family: Consolas, "Cascadia Mono", monospace;
      font-size: 13px;
    }

    footer {
      padding: 34px 0 52px;
      color: var(--muted);
      font-size: 13px;
    }

    @media (max-width: 960px) {
      .hero,
      .answer-strip,
      .section-head,
      .two-col,
      .image-grid,
      .language-box {
        grid-template-columns: 1fr;
      }

      .claim-grid,
      .three-col {
        grid-template-columns: 1fr 1fr;
      }

      .evidence-row {
        grid-template-columns: 150px minmax(0, 1fr);
      }
    }

    @media (max-width: 620px) {
      .claim-grid,
      .three-col,
      .evidence-row {
        grid-template-columns: 1fr;
      }

      section {
        padding: 34px 0;
      }

      .score-table {
        display: block;
        overflow-x: auto;
      }
    }
  </style>
  <script type="module">
    import mermaid from "https://cdn.jsdelivr.net/npm/mermaid@10/dist/mermaid.esm.min.mjs";
    mermaid.initialize({
      startOnLoad: true,
      theme: "base",
      themeVariables: {
        primaryColor: "#edf7fb",
        primaryTextColor: "#172124",
        primaryBorderColor: "#9eb8c4",
        lineColor: "#5f6b6f",
        secondaryColor: "#ecf8f1",
        tertiaryColor: "#fff7e2",
        fontFamily: "Segoe UI, Malgun Gothic, sans-serif"
      }
    });
  </script>
</head>
<body>
  <main class="shell">
    <section class="hero">
      <div>
        <p class="eyebrow">AI/ML usage explainer</p>
        <h1>Copyrighter's AI/ML automates evidence, not verdicts.</h1>
        <p class="lead">
          The system analyzes a submitted image and produces similar images, web sources, person-presence signals, reference-DB similarity, search evidence, and an LLM summary.
          The operator makes the final copyright-risk decision; AI/ML output is review evidence that speeds up that judgment.
        </p>
      </div>
      <figure class="hero-card">
        <img src="../web/operator-gui/pitch-assets/case-review.png" alt="Copyrighter case review screen">
        <figcaption class="caption">The case review screen lays out the risk score, top evidence, search evidence, summary, and operator decision in a single flow.</figcaption>
      </figure>
    </section>

    <div class="answer-strip">
      <div class="verdict">
        <strong>It is fair to call this AI/ML use.</strong>
        <span>Computer-vision ML, local generative AI, face/person detection, and image-similarity computation are all part of the actual processing flow.</span>
      </div>
      <div class="warning">
        <strong>Accurate wording</strong>
        <span>Not “AI determines whether infringement occurred” but “AI/ML generates source-backed evidence and risk triage, and the operator makes the final call.”</span>
      </div>
    </div>

    <section>
      <div class="section-head">
        <div>
          <p class="eyebrow">Definitions</p>
          <h2>In this document, AI, ML, and algorithms play distinct roles.</h2>
        </div>
        <p>To reduce confusion, separate out which technology looks at what, what it outputs, and how that output feeds into the score.</p>
      </div>

      <div class="claim-grid">
        <article class="claim-card">
          <span class="tag ml">ML</span>
          <strong>Google Cloud Vision Web Detection</strong>
          <p>Returns web entities, identical images, partial matches, similar images, and candidate source pages for an image. These are detection results produced by an external ML service.</p>
        </article>
        <article class="claim-card">
          <span class="tag ai">Generative AI</span>
          <strong>Ollama local LLM summary</strong>
          <p>Takes only stored evidence as input and generates a summary for operators. It is constrained so that it does not invent new facts or issue a final decision.</p>
        </article>
        <article class="claim-card">
          <span class="tag ml">Classical CV</span>
          <strong>Face/person presence detection</strong>
          <p>Detects the presence of face boxes with an OpenCV Haar cascade. It does not perform same-person identification, face embedding, or identity inference.</p>
        </article>
        <article class="claim-card">
          <span class="tag rule">Algorithm</span>
          <strong>SHA / pHash image fingerprints</strong>
          <p>Not a trained model, but it fingerprints image content and computes similarity by Hamming distance to compare against the reference DB and search-result images.</p>
        </article>
      </div>
    </section>

    <section>
      <div class="section-head">
        <div>
          <p class="eyebrow">Operating pipeline</p>
          <h2>A single submitted image is turned into a bundle of evidence.</h2>
        </div>
        <p>The risk score is not a value produced directly by the LLM; it is a triage score computed by applying rules to each evidence item's type and confidence.</p>
      </div>

      <div class="mermaid-wrap">
        <pre class="mermaid">
flowchart LR
  A[Submitted image] --> B[Local preprocessing]
  B --> C[SHA / pHash fingerprinting]
  B --> D[Face·person presence detection]
  B --> E[Google Vision Web Detection]
  E --> F[Web entities·identical images·partial matches·source pages]
  F --> G[Naver text-search enrichment]
  F --> H[Legacy Google Custom Search<br/>may be disabled]
  C --> I[Similarity comparison against reference DB and search-result images]
  D --> J[Local person-presence evidence]
  F --> K[Google evidence]
  G --> L[Naver evidence]
  H --> L
  I --> M[Similarity evidence]
  J --> N[Rule-based risk score]
  K --> N
  L --> N
  M --> N
  K --> O[Ollama LLM summary]
  L --> O
  M --> O
  O --> P[Source-linked summary evidence]
  N --> Q[Operator review]
  P --> Q
  Q --> R[Approve / Hold / Reject]

  classDef ai fill:#edf7fb,stroke:#24667a,color:#172124;
  classDef cv fill:#ecf8f1,stroke:#28734f,color:#172124;
  classDef rule fill:#fff7e2,stroke:#916300,color:#172124;
  classDef human fill:#fff0ed,stroke:#a13d35,color:#172124;
  class E,O ai;
  class C,D,I cv;
  class N rule;
  class Q,R human;
        </pre>
      </div>
    </section>

    <section>
      <div class="section-head">
        <div>
          <p class="eyebrow">What comes out</p>
          <h2>Each technology produces a different output, and the UI gathers them into a single line of decision evidence.</h2>
        </div>
        <p>This distinction matters. “Confidence” is not one global AI probability; it is a combination of per-evidence confidence, similarity, match type, and rule score.</p>
      </div>

      <div class="evidence-stack">
        <div class="evidence-row">
          <strong>Google Vision</strong>
          <span>Produces web entities, full/partial/visual image matches, matching pages, and weak labels.</span>
          <span>Confidence comes from the Google score or a fallback confidence.</span>
        </div>
        <div class="evidence-row">
          <strong>Naver search</strong>
          <span>Expands names, page titles, and labels from Google and the reference DB into text queries to collect blog and web-document evidence.</span>
          <span>Only promoted results contribute directly to the risk score.</span>
        </div>
        <div class="evidence-row">
          <strong>pHash similarity</strong>
          <span>Computes a 0.0 to 1.0 similarity from the Hamming distance between 64-bit perceptual hashes.</span>
          <span>A value of 0.9 or higher is treated as a strong identical/similar-image signal.</span>
        </div>
        <div class="evidence-row">
          <strong>Face/person detection</strong>
          <span>Provides a presence-only signal for whether a face or person appears.</span>
          <span>It is a signal that risk review may be needed, not an identity-recognition confidence.</span>
        </div>
        <div class="evidence-row">
          <strong>Ollama LLM summary</strong>
          <span>Generates an internal operator-facing summary from only the source, reason, confidence, and URL of existing evidence.</span>
          <span>It adds nothing directly to the risk score.</span>
        </div>
      </div>
    </section>

    <section>
      <div class="two-col">
        <div class="panel">
          <p class="eyebrow">Score and confidence</p>
          <h2>The risk score is a rule-based triage score, not an AI “probability.”</h2>
          <p>
            Evidence carries a confidence, but the final riskScore is the value `RiskScorer` gets by adding per-type evidence weights and clamping the sum to 0 to 100.
            So a score of 100 does not mean “100% probability of infringement”; it means “very high review priority.”
          </p>
        </div>
        <div class="panel">
          <p class="eyebrow">Band</p>
          <h2>Scores are converted into bands for sorting the operations queue.</h2>
          <ul class="plain-list">
            <li>70 or above: high</li>
            <li>30 or above and below 70: medium</li>
            <li>Below 30: low</li>
            <li>LLM summary evidence is excluded from score addition.</li>
          </ul>
        </div>
      </div>

      <table class="score-table" aria-label="Summary of risk score calculation rules">
        <thead>
          <tr>
            <th>Evidence type</th>
            <th>How it affects the score</th>
            <th>Meaning</th>
          </tr>
        </thead>
        <tbody>
          <tr>
            <td>pHash similarity</td>
            <td>+80 if similarity is 0.9 or higher; other meaningful fingerprint evidence is +30</td>
            <td>Signal that the image is visually very close to a reference-DB or search-result image</td>
          </tr>
          <tr>
            <td>Face/person presence</td>
            <td>+35 if a presence signal exists</td>
            <td>Signal that portrait-rights or person-image review may be needed</td>
          </tr>
          <tr>
            <td>Google full match</td>
            <td>+45 for an identical-image match</td>
            <td>Strong source candidate: the same image exists on the web</td>
          </tr>
          <tr>
            <td>Google partial/page match</td>
            <td>+35 for a partial-image or page match</td>
            <td>Some element or a source page may be related to the submitted image</td>
          </tr>
          <tr>
            <td>Google visual match</td>
            <td>+10 for a visually similar image</td>
            <td>Weak reference signal; not strong decision evidence on its own</td>
          </tr>
          <tr>
            <td>Naver promoted search result</td>
            <td>round(50 * confidence)</td>
            <td>The search result was judged relevant enough to be promoted to a reference candidate</td>
          </tr>
          <tr>
            <td>Ollama LLM summary</td>
            <td>0 points</td>
            <td>Not a decision score; a human-readable, source-linked explanation</td>
          </tr>
        </tbody>
      </table>
    </section>

    <section>
      <div class="section-head">
        <div>
          <p class="eyebrow">Trust boundaries</p>
          <h2>Confidence must be interpreted differently at each stage.</h2>
        </div>
        <p>In explanatory material, avoid lumping everything into a single “AI confidence”; the wording below is more accurate.</p>
      </div>

      <div class="three-col">
        <article class="metric-card">
          <strong>Evidence confidence</strong>
          <p>A value attached to an individual piece of evidence, such as a Google score, fallback confidence, or search-promotion confidence.</p>
        </article>
        <article class="metric-card">
          <strong>Similarity</strong>
          <p>Image similarity derived from pHash distance. It is a fingerprint-distance figure, not an ML probability.</p>
        </article>
        <article class="metric-card">
          <strong>Risk score</strong>
          <p>An operational priority score that sums multiple pieces of evidence by rule. It is not a probability of legal infringement.</p>
        </article>
      </div>
    </section>

    <section>
      <div class="two-col">
        <div class="panel">
          <p class="eyebrow">LLM guardrail</p>
          <h2>The LLM is designed only to summarize evidence, not to produce new conclusions.</h2>
          <p>
            The prompt is limited to “summarize only the provided source evidence,” “do not make a final decision,” and “do not add unsupported claims.”
            Each summary evidence item is linked to a source URL or source evidence id.
          </p>
        </div>
        <div class="mermaid-wrap">
          <pre class="mermaid">
sequenceDiagram
  participant DB as Evidence DB
  participant LLM as Ollama local LLM
  participant UI as Operator console
  participant Human as Operator

  DB->>LLM: Pass fingerprint, face, google, naver evidence
  LLM->>DB: Store source-linked summary evidence
  DB->>UI: Show original evidence + summary evidence
  Human->>UI: Mark evidence as used / not used
  Human->>UI: Final approve/hold/reject decision
          </pre>
        </div>
      </div>
    </section>

    <section>
      <div class="section-head">
        <div>
          <p class="eyebrow">Visual proof</p>
          <h2>The operator screens show how AI/ML results become evidence a human can review.</h2>
        </div>
        <p>In a demo, it is best to show a screen flow that says “AI/ML organized the evidence and the operator decides,” not “AI decided.”</p>
      </div>

      <div class="image-grid">
        <figure class="image-card">
          <img src="../web/operator-gui/pitch-assets/evidence-search.png" alt="Search evidence screen">
          <figcaption class="caption">Search evidence: the query, source URL, image, and match type are kept together, making traceable review evidence.</figcaption>
        </figure>
        <figure class="image-card">
          <img src="../web/operator-gui/pitch-assets/provider-controls.png" alt="External search tool usage screen">
          <figcaption class="caption">External search tool usage: the operator checks the enabled state, failure state, and usage of Google, Naver, and Ollama.</figcaption>
        </figure>
      </div>
    </section>

    <section>
      <div class="section-head">
        <div>
          <p class="eyebrow">Recommended wording</p>
          <h2>Headline wording should convey both the power of the technology and the boundary of decision responsibility.</h2>
        </div>
      </div>

      <div class="language-box">
        <div class="do">
          <h3>Recommended wording</h3>
          <p>Copyrighter combines computer-vision ML, image-fingerprint similarity, search-evidence collection, and local LLM summarization to automatically generate review evidence for image copyright risk.</p>
          <p>The risk score is not a legal conclusion reached by AI; it is an operational triage score based on evidence confidence, match type, and image similarity.</p>
          <p>Final approval, hold, or rejection is performed by the operator, and all evidence is preserved with its source.</p>
        </div>
        <div class="dont">
          <h3>Wording to avoid</h3>
          <p>AI automatically determines whether copyright is infringed.</p>
          <p>The LLM alone confirms the original author, a celebrity, or infringement.</p>
          <p>A risk score of 100 means a 100% probability of infringement.</p>
          <p>Images are uploaded to Naver for reverse search.</p>
        </div>
      </div>
    </section>

    <section>
      <div class="section-head">
        <div>
          <p class="eyebrow">Code evidence</p>
          <h2>Implementation points that back this explainer.</h2>
        </div>
      </div>

      <ul class="code-list">
        <li>src/rights_filter/integrations/cloud_vision_web_detection.py - Google Cloud Vision Web Detection calls and evidence mapping</li>
        <li>src/rights_filter/analysis/llm_assistance.py - Source-linked LLM summary via the Ollama Generate API</li>
        <li>src/rights_filter/analysis/fingerprints.py - SHA and pHash image fingerprints, Hamming-distance similarity</li>
        <li>src/rights_filter/analysis/face_person_detection.py - Face/person presence detection with an OpenCV Haar cascade</li>
        <li>src/rights_filter/analysis/risk_scoring.py - Rule-based risk scoring by evidence type</li>
        <li>src/rights_filter/server/sqlite_store.py - Evidence storage, external search tool usage state, automatic LLM summary generation</li>
        <li>docs/operations/image-rights-risk-filter.md - Operating policy for external APIs, the LLM, and data boundaries</li>
      </ul>
    </section>
  </main>

  <footer class="shell">
    Copyrighter AI/ML usage explainer. AI/ML output is internal review evidence; the final decision is made by the operator.
  </footer>
</body>
</html>
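
The pHash row of the explainer above (64-bit perceptual hash, Hamming distance, 0.0 to 1.0 similarity, 0.9 as the strong-match line) can be sketched as follows. The formula `1 - distance / bits` is an assumption about how the distance is normalized; the explainer states only the range and the threshold. Function names are hypothetical, and computing the hash itself is out of scope here.

```python
def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


def phash_similarity(a: int, b: int, bits: int = 64) -> float:
    """Map Hamming distance to a 0.0-1.0 similarity (assumed normalization)."""
    limit = 1 << bits
    if not (0 <= a < limit and 0 <= b < limit):
        raise ValueError(f"hashes must fit in {bits} bits")
    return 1.0 - hamming_distance(a, b) / bits


def is_strong_match(a: int, b: int, threshold: float = 0.9) -> bool:
    """Apply the 0.9 threshold that the explainer treats as a strong signal."""
    return phash_similarity(a, b) >= threshold
```

Under this normalization, a 64-bit pair may differ in at most 6 bits and still clear the 0.9 threshold (1 - 6/64 = 0.90625), while 7 differing bits fall below it.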
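
The score table and bands in the explainer above can be read as a small rule engine. The sketch below is one possible reading, not the project's `RiskScorer`: the evidence `kind` strings and the `Evidence` dataclass are hypothetical, points are summed per evidence item (the explainer does not say whether repeated items of one type are capped), and any fingerprint evidence below the 0.9 threshold is counted as the "+30" case.

```python
from dataclasses import dataclass
from typing import Iterable

# Fixed weights taken from the score table; the key names are hypothetical.
FIXED_POINTS = {
    "face_presence": 35,
    "google_full_match": 45,
    "google_partial_match": 35,
    "google_page_match": 35,
    "google_visual_match": 10,
    "llm_summary": 0,  # summaries never add to the score
}


@dataclass(frozen=True)
class Evidence:
    kind: str
    confidence: float = 0.0
    similarity: float = 0.0
    promoted: bool = False


def evidence_points(item: Evidence) -> int:
    if item.kind == "fingerprint":
        return 80 if item.similarity >= 0.9 else 30
    if item.kind == "naver_result":
        # Only promoted Naver results contribute, scaled by confidence.
        return round(50 * item.confidence) if item.promoted else 0
    return FIXED_POINTS.get(item.kind, 0)


def risk_score(evidence: Iterable[Evidence]) -> int:
    """Sum per-item points and clamp the total to the 0-100 range."""
    total = sum(evidence_points(item) for item in evidence)
    return max(0, min(100, total))


def risk_band(score: int) -> str:
    if score >= 70:
        return "high"
    if score >= 30:
        return "medium"
    return "low"
```

Because the total is clamped, a strong fingerprint match plus a Google full match saturates at 100, which is why the explainer describes 100 as a review priority rather than a probability.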

@@ -0,0 +1,166 @@
---
date: 2026-05-25
topic: image-rights-review-enrichment
---

# Image Rights Review Enrichment

## Summary

Enhance the existing image rights risk filter so that, when an operator reviews a single application in detail, the original image, risk level, internal analysis, Naver search evidence, Google Web Detection evidence, internal LLM summary, and final decision actions are all visible on one screen. The first round of enhancement centers on the detailed review screen; search enrichment and the LLM are limited to organizing evidence that supports the operator's judgment.
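
---

## Problem Frame

The current filter core can represent image fingerprints, face/person presence signals, external Web Detection, scoring, and an operator summary. But for a real operator to decide, per application, "can this image be commercialized?", the various pieces of evidence must be organized on one screen.

Search evidence matters most for cases such as Korean celebrities, broadcast/webtoon/game characters, fan art, and AI-generated images, and there an empty reference DB alone gives too little detection power early on. Operators need to spend less time coming up with search terms and interpreting results themselves, and they need to see quickly why a search result is a risk signal.

Naver search is useful for finding Korean-language web/image evidence, but the official image search API is text-query search, not reverse search by image upload. Naver is therefore treated not as a reverse-search channel that receives the original or a downscaled image, but as a channel that searches Korean candidate queries produced by internal analysis and the LLM to add evidence URLs and images.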
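
---

## Actors

- A1. Applicant: submits the image to be commercialized but does not see the automated analysis results.
- A2. Operator: checks the risk evidence on the detailed review screen and makes the final choice of approve, hold, or reject.
- A3. Rights risk filter: performs image analysis, search-evidence collection, scoring, and summary generation.
- A4. Internal LLM: generates candidate search queries, structures search results, and writes operator summaries, but does not score or make the final decision.
- A5. Naver Search API: provides image/web search results based on Korean text queries.
- A6. Google Cloud Vision Web Detection: under approved conditions, provides web entities, matching images, similar images, and containing-page evidence for an image.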

---

## Key Flows

- F1. Search-enrichment analysis
  - **Trigger:** Risk analysis runs for an application image.
  - **Actors:** A3, A4, A5, A6
  - **Steps:** The filter produces internal analysis evidence, and the internal LLM generates Korean search queries from image labels/text, reference-DB candidates, and existing evidence. The Naver Search API returns Korean image/web evidence for the text queries. When approved, Google Web Detection also returns web evidence via a downscaled derivative image. All evidence is stored with its source, query, time, confidence, and failure reason.
  - **Outcome:** Search evidence and a summary usable on the operator's detailed review screen are produced.
  - **Covered by:** R1, R2, R3, R4, R5, R6, R7, R12

- F2. Detailed review
  - **Trigger:** The operator opens one application from the review queue.
  - **Actors:** A2, A3, A4
  - **Steps:** The screen shows together the application image, risk score and grade, top risk reasons, internal fingerprint/reference-DB evidence, Naver evidence, Google evidence, internal LLM summary, and analysis failure/skip reasons. The operator checks the evidence links, manually selects one of approve, hold, or reject, and leaves a note.
  - **Outcome:** Automated analysis is used only as a recommendation, and the final status is recorded as the operator's decision.
  - **Covered by:** R8, R9, R10, R11, R13, R14

- F3. Decision-driven reference DB accumulation
  - **Trigger:** The operator rejects an application or manually registers a risk entity.
  - **Actors:** A2, A3
  - **Steps:** The fingerprint and decision reason of a rejected image remain as an auto-accumulated candidate or an automatic reference entry, and celebrities/characters/works/aliases/policy notes that the operator explicitly registers remain as manual reference entries. Automatic and manual entries must keep separate provenance and must be deactivatable on correction.
  - **Outcome:** Even starting from an empty reference DB, the operator's actual decisions raise the quality of later reviews.
  - **Covered by:** R15, R16, R17

- F4. Correction and contamination prevention
  - **Trigger:** The operator judges that an earlier rejection or reference-DB entry was wrong.
  - **Actors:** A2, A3
  - **Steps:** The operator corrects the decision or reference entry, and deactivates the automatic reference entries derived from that decision or sends them back for re-review.
  - **Outcome:** Reduces the contamination in which a wrong rejection keeps raising the risk of future applications.
  - **Covered by:** R16, R17, R18
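
F1 requires every piece of evidence to be stored with its source, query, time, confidence, and failure reason. A minimal sketch of such a record is below. The class name, field names, and the `failed_evidence` helper are hypothetical, not the project's schema; clamping confidence to 0.0-1.0 mirrors the "clamp evidence confidence" fix named in the commit message.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class EvidenceRecord:
    source: str                      # e.g. "naver", "google_web_detection"
    query: Optional[str]             # text query, when one was used
    confidence: float                # clamped to 0.0-1.0 on construction
    url: Optional[str] = None
    failure_reason: Optional[str] = None
    collected_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def __post_init__(self) -> None:
        clamped = min(1.0, max(0.0, float(self.confidence)))
        # The dataclass is frozen, so bypass the generated __setattr__.
        object.__setattr__(self, "confidence", clamped)


def failed_evidence(source: str, query: Optional[str], reason: str) -> EvidenceRecord:
    """Record a failure or skip so it can be shown instead of hidden (R5)."""
    return EvidenceRecord(source=source, query=query, confidence=0.0,
                          failure_reason=reason)
```

Storing failures as ordinary records keeps quota errors and empty results visible on the detailed review screen alongside successful evidence.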
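
---

## Requirements

**Detailed review screen**
- R1. The system must show, on the per-application detailed review screen, the original or internal review image, a 0-100 risk score, the risk grade, and the top risk reasons together.
- R2. The detailed review screen must show evidence separated into internal fingerprint/reference DB, Naver search, Google Web Detection, internal LLM summary, and analysis failure/skip reasons.
- R3. The screen must let the operator check the evidence URL, image URL, thumbnail, source page, search query, and search time.
- R4. Top risk reasons must be given as sentences the operator can understand immediately, and must be linked to the underlying evidence.
- R5. Analysis failures, disabled external APIs, exceeded quotas, and empty search results must not be hidden; they must be shown on the detail screen as operational reasons.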
|
||||||
|
|
||||||
|
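**Naver search enrichment**
- R6. Naver search is limited to collecting evidence through text queries sent to the official Search API.
- R7. Neither the original image nor a downscaled derivative is uploaded to Naver; the system treats Naver as having no officially documented reverse image search by upload.
- R8. Naver search queries may be generated from internal analysis results, OCR/label candidates, reference DB candidates, Google evidence, and operator-registered aliases.
- R9. Naver search results do not determine the risk score directly; they must be stored as evidence candidates together with the title, description, image URL, thumbnail, source link, rank, and query.
- R10. A Naver search result must be promoted to a risk reason only when it is strongly linked to a Korean celebrity, work title, character name, or broadcast/webtoon/game title.
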
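**Internal LLM usage**
- R11. The internal LLM's role is limited to generating candidate queries, structuring search results, reconciling duplicate or conflicting evidence, and writing operator-facing summaries.
- R12. Internal LLM output must not be a direct input to the risk score or a basis for the final verdict; it must be a supporting explanation that makes source-linked evidence easier for a person to read.
- R13. Each claim in an internal LLM summary must be traceable to the evidence URLs and search results it came from, and unsourced speculation must not be used as a risk reason.
- R14. Original images, applicant personal data, and material barred from external transfer are not sent to an external LLM. Within this enrichment scope, the LLM may run only in the internal operating environment or in an environment with a confirmed equivalent no-retention contract.
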
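As an illustration of R12 and R13, the following minimal sketch separates LLM summary claims that point at stored evidence from claims that do not. The names `SummaryClaim` and `partition_claims`, and the idea of carrying evidence IDs and URLs on each claim, are assumptions made for this example and are not taken from the project code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SummaryClaim:
    """One sentence of an internal LLM summary (hypothetical shape)."""
    text: str
    source_evidence_ids: tuple = ()
    source_urls: tuple = ()


def partition_claims(claims, known_evidence_ids):
    """Split claims into (traceable, reference_only).

    A claim is traceable when it cites at least one stored evidence ID
    or carries at least one source URL. Everything else is kept only as
    a reference note and must not become a risk reason.
    """
    traceable, reference_only = [], []
    for claim in claims:
        linked = [i for i in claim.source_evidence_ids if i in known_evidence_ids]
        if linked or claim.source_urls:
            traceable.append(claim)
        else:
            reference_only.append(claim)
    return traceable, reference_only
```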
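**Operator verdicts and the reference DB**
- R15. The approve, hold, and reject actions on the detailed review screen must be separate from automated analysis and must be chosen explicitly by the operator.
- R16. A rejection must be able to create an accumulation candidate or an automatic reference entry in the reference DB for later similar-image detection.
- R17. Auto-accumulated entries, operator-registered entries, and search-result-based candidates must be distinguishable by provenance and trust level.
- R18. Wrong verdicts and wrong reference entries must support deactivation, correction, and re-review.
- R19. The applicant-facing screen does not expose the risk score, search evidence, internal LLM summary, or whether external APIs were used.
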
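**Safety boundaries**
- R20. The Google Image Search, Google Lens, and Naver web UIs are not automated or scraped.
- R21. As in the existing requirements, Google Web Detection is used only when the contract, data usage terms, metadata logging scope, and no-retention conditions have been confirmed.
- R22. Face/person presence signals are used only as portrait-rights review signals and are not extended to face recognition, celebrity identification, or biometric template storage.
- R23. Search failures and LLM failures are not used to lower existing high-risk evidence.
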
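---

## Acceptance Examples

- AE1. **Covers R1, R2, R6, R8, R10, R11.** When an image suspected of showing a Korean celebrity arrives while the reference DB is empty, the internal LLM produces Korean candidate queries, and evidence from Naver results that links a person name, work title, or image link is shown on the detailed review screen.
- AE2. **Covers R6, R7, R9.** Running Naver search enrichment sends neither the original image nor a downscaled image to Naver; only the text query and search result metadata remain as evidence.
- AE3. **Covers R11, R12, R13.** Even if the internal LLM summarizes "appears to be a famous character", that sentence is shown only as a reference summary, not as a risk score reason, when no URL or search result is linked to it.
- AE4. **Covers R5, R21, R23.** Even if the Naver quota is exceeded or Google Web Detection is disabled, the score is not lowered when internal fingerprint/reference DB evidence is high risk, and the failure or skip reason is shown to the operator.
- AE5. **Covers R15, R16, R17.** When the operator selects reject, the submission state changes only through the operator's verdict, and the image fingerprint is marked with auto-accumulated provenance and used for later similar-image detection.
- AE6. **Covers R18.** If a rejection is later corrected as wrong, the automatic reference entries derived from that verdict are deactivated and no longer raise the risk score of later submissions.
- AE7. **Covers R19.** The applicant can see only the state of their own submission and cannot see the risk score, Naver/Google evidence, internal LLM summary, or analysis failure reasons.
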
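---

## Success Criteria

- The operator can quickly grasp the main evidence for a high-risk image from a single submission's detail screen.
- For celebrity, broadcast, webtoon, game, and character cases that need Korean search evidence, the shortage of an initial reference DB is mitigated.
- The internal LLM reduces the operator's search and summarization time, but unsourced speculation and hallucinations do not contaminate scores or verdicts.
- Rejections and corrections gradually improve reference DB quality, and wrong verdicts do not remain as long-term contamination.
- Later planning stages do not need to redefine the evidence scope of the detailed review screen, the LLM's role, the boundaries of Naver usage, or the applicant non-disclosure policy.
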
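---

## Scope Boundaries

- Reverse image search by uploading images to Naver is not included.
- Automation and scraping of the Google Image Search, Google Lens, and Naver search web UIs are not included.
- The internal LLM is not used as a legal decision-maker, a score calculator, or an automatic rejection decider.
- External LLM usage is excluded from the first enrichment scope. If an exception is needed, it is handled as a new requirement after a separate contract and data-processing review.
- Explaining automated analysis results to applicants and providing an appeal UI are not included.
- A full reference DB management screen, bulk registration, role-based workflows, and advanced statistics screens are left for follow-up enrichment.
- Dedicated detectors for brand logos, trademarks, and stock image theft are not the central scope of this enrichment. If strong evidence is found during search, it is shown only as operator evidence.
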
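---

## Key Decisions

- Detailed review screen first: the first enrichment prioritizes a screen where the operator can see evidence and decide, over the detection model itself.
- Search enrichment is for Korean evidence: Naver is used as a channel that supplements image/web search evidence based on Korean text queries.
- The internal LLM is an assistant: the LLM builds queries and organizes results but has no authority over the score or the final verdict.
- Evidence provenance first: every summary and risk reason must be linked, as far as possible, to a source evidence URL, search query, and analysis source.
- Contamination prevention built in: the reference DB grows through accumulated rejections, but automatic accumulation is dangerous without correction and deactivation.
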
---

## Dependencies / Assumptions

- The official Naver image search API documentation describes a REST API that takes the search term and search conditions as a query string, and states a daily call limit of 25,000 for the Search API. Reference: https://developers.naver.com/docs/serviceapi/search/image/image.md
- The Naver Search API product documentation also states that it provides image, web document, blog, and news search results with a processing limit of 25,000 per day. Reference: https://developers.naver.com/products/service-api/search/search.md
- Using the Naver API presupposes compliance with the terms of provision, the allowed call counts, client ID management, and the terms of service. Reference: https://developers.naver.com/products/terms
- The Google Cloud Vision Data Usage FAQ explains that image data for online immediate-response operations is processed in memory and not stored on disk, and that some request metadata may be logged temporarily. Reference: https://docs.cloud.google.com/vision/docs/data-usage
- Google Cloud Vision Web Detection can return web entities, matching images, similar images, pages containing the image, and a best guess label. Reference: https://docs.cloud.google.com/vision/docs/detecting-web
- The actual admin app, DB, and authentication/authorization structure are not present in the current workspace, so the final UI and storage approach of the detailed review screen must be fitted to the target app structure in follow-up planning.

---

## Outstanding Questions

### Deferred to Planning

- [Affects R1-R5][Technical] Decide whether to build the detailed review screen as a real web admin screen or first as an API/CLI simulation fitted to the current core.
- [Affects R6-R10][Needs research] Validate the Naver Search API's actual response quality, result duplication, adult/inappropriate results, and quota policy with a pilot sample.
- [Affects R11-R14][Technical] Finalize the internal LLM's deployment location, log retention, prompt/output auditing, and whether original image input is allowed, based on the operating environment.
- [Affects R16-R18][Technical] Decide, based on actual operational risk, whether a rejection is reflected in the reference DB immediately or creates an automatic candidate that the operator confirms.

@ -0,0 +1,168 @@
---
date: 2026-05-25
topic: image-rights-risk-filter
---

# Image Rights Risk Filter

## Summary

Build an internal rights risk filter that periodically analyzes every submitted image and gives operators a 0-100 risk score with review evidence. The system does not decide approve, hold, or reject automatically; it works as a recommendation tool that supports the operator's judgment.

---

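## Problem Frame

Operators who commercialize images must first screen whether the applicant is the actual rights holder and whether the image contains a celebrity likeness or a copyrighted character or work. If a person checks all of this from scratch, review volume grows, and images without rights can reach commercialization, creating legal and operational risk.

In particular, images of Korean entertainers, idols, actors, sports stars, and Korean broadcast, webtoon, game, and character properties may be submitted by applicants who hold no rights. Conversely, automated detection can be wrong, so if the system replaces legal judgment or the final review, both the cost of false positives and the liability grow.

---
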
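## Actors

- A1. Applicant: submits images for commercialization but does not see automated analysis results.
- A2. Operator: checks the analysis score and evidence and decides the final state as approve, hold, or reject.
- A3. Filtering system: analyzes submitted images in batches and produces risk scores, reasons, and evidence.
- A4. External image analysis API: analyzes a downscaled copy and returns web detection evidence only when the contract satisfies the no-retention condition.
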
---

## Key Flows

- F1. Batch risk analysis
  - **Trigger:** New submitted images accumulate and the scheduled batch runs.
  - **Actors:** A3, A4
  - **Steps:** The filtering system runs internal analysis on every submitted image and sends only EXIF-stripped downscaled copies to external APIs whose conditions are met. It combines internal analysis, external search evidence, the existing reference DB, and past verdict history to produce a risk score and reasons.
  - **Outcome:** Each submission gets a 0-100 risk score, a list of reasons, and operational evidence.
  - **Covered by:** R1, R5, R6, R8, R9, R11, R12, R13, R15, R25, R26, R27

- F2. Operator review
  - **Trigger:** An operator checks the submission list or the review queue.
  - **Actors:** A2, A3
  - **Steps:** The operator checks the risk score, reasons, evidence URLs, detected entities, and similar-image information. The operator takes the system recommendation into account but personally selects approve, hold, or reject.
  - **Outcome:** The final verdict and memo are stored in the submission record.
  - **Covered by:** R12, R13, R16, R17, R18, R19

- F3. Reference DB accumulation
  - **Trigger:** An operator rejects an image or directly registers a risk entity.
  - **Actors:** A2, A3
  - **Steps:** Rejected images are accumulated in the reference DB automatically, and the operator can separately register an entity name, aliases, sample images, policy memos, and exception conditions. Wrongly accumulated entries must be deactivatable or correctable afterwards.
  - **Outcome:** The accumulated knowledge is reflected in the similarity and risk scoring of later submissions.
  - **Covered by:** R20, R21, R22, R23, R24

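---

## Requirements

**Analysis scope**
- R1. The system must analyze every submitted image in bulk batches.
- R2. The first-phase risk types are celebrity/entertainer portrait-rights risk and copyrighted character/work image risk.
- R3. First-phase tuning is Korea-centric and prioritizes Korean entertainers, K-pop, and Korean broadcast, film, webtoon, game, and character properties.
- R4. Global IP, brands, logos, trademarks, and stock image theft are not in the first-phase core scope, but when strong evidence happens to appear in external search results it may be recorded as a reason. Dedicated detectors, tuning, and management screens for these are not included in the first phase.
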
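**Internal analysis**
- R5. The system must generate image hashes and similarity hashes so it can compare a submission against identical images, altered images, previously rejected images, and operator-registered samples.
- R6. The system must detect whether a face or person is present and use that as a portrait-rights review signal.
- R7. Detecting a face or person alone must not trigger high risk; the risk score must rise substantially only when celebrity or rights-protected evidence is also present.
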
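The comparison in R5 can be pictured with a small similarity-hash sketch. This example uses an average hash over an 8x8 grayscale grid as a simplified stand-in; it is not the project's fingerprint implementation, and the function names and the 64-bit hash size are assumptions for illustration only.

```python
def average_hash(pixels):
    """Return a 64-bit integer hash for an 8x8 grid of grayscale values.

    Each bit is 1 when the pixel is brighter than the grid mean.
    """
    flat = [value for row in pixels for value in row]
    if len(flat) != 64:
        raise ValueError("expected an 8x8 grid of grayscale values")
    mean = sum(flat) / 64
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hash_similarity(hash_a, hash_b, bits=64):
    """Similarity in [0.0, 1.0] based on the Hamming distance of two hashes."""
    distance = bin(hash_a ^ hash_b).count("1")
    return 1.0 - distance / bits
```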
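**External search and data restrictions**
- R8. External API usage is allowed only when two conditions are confirmed: that transmitted image content is not stored, used for training, published, or shared with third parties, and that the logging scope and retention period of request metadata are acceptable under legal and privacy standards.
- R9. Only a downscaled copy, with EXIF removed and reduced to at most the maximum resolution allowed by operating policy, is sent to external APIs; the internal original is never sent.
- R10. Automation and scraping of the Google Image Search or Google Lens web UI are not used.
- R11. Google Cloud Vision `WEB_DETECTION` is used as a conditional candidate after the data usage terms are confirmed, and the asynchronous offline batch mode is not used.
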
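**Risk score and evidence**
- R12. Analysis results must be provided as a 0-100 risk score and a list of reasons.
- R13. Risk reasons must be provided in units the operator can understand and must be able to include face/person detection, hash similarity, web entities, matching or similar images, evidence URLs, and past rejection history.
- R14. Fan art and AI-generated images must also be classifiable as high risk when there is strong evidence linking them to a specific celebrity, character, or work.
- R15. Analysis failures and ambiguous cases must not be hidden behind a low risk score; they must be shown with a medium-risk reason. However, a failure reason must not be used to lower high-risk signals already detected or the existing risk score.
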
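R15 describes two rules that are easy to get backwards: a failure lifts a low score to medium, and a failure never lowers a high score. A minimal sketch follows, assuming the score is the maximum of the per-signal scores and that "medium" starts at 40. Both are placeholders, since the real 0-100 formula is deferred to planning.

```python
MEDIUM_RISK_FLOOR = 40  # assumed threshold, not taken from the source


def combine_risk(signal_scores, analysis_failed):
    """Combine per-signal scores into one 0-100 risk score.

    A failed or ambiguous analysis can only raise the result to the
    medium floor; it never reduces a score produced by other signals.
    """
    base = max(signal_scores, default=0)
    base = max(0, min(100, base))
    if analysis_failed:
        return max(base, MEDIUM_RISK_FLOOR)
    return base
```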
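**Operator review**
- R16. The system must not change the submission state automatically; it must only provide a recommendation.
- R17. The operator's final state is limited to approve, hold, or reject.
- R18. Automated analysis results are not exposed to applicants and are used only for the operator's internal review.
- R19. The operator must be able to record the final verdict and a verdict memo.
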
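**Reference DB and operational knowledge**
- R20. The reference DB must be able to start empty and must grow through search results, operator registration, and rejection verdicts.
- R21. When registering a risk entity, the operator must be able to manage the entity name, aliases, type, related keywords, image-level sample fingerprints, policy memos, and exception conditions. Face embeddings and biometric templates for identifying a specific person are not stored as sample fingerprints.
- R22. Rejected images must be accumulated automatically as a basis for later similar-image detection.
- R23. Auto-accumulated entries and manually registered entries must be distinguishable.
- R24. To reduce reference DB contamination caused by misjudgments, accumulated entries must be deactivatable or correctable.
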
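**Evidence retention and operational stability**
- R25. The system must retain, as operational evidence, the risk score, risk reasons, hashes/similarity hashes, detection results, web entities, evidence URLs, analysis time, analysis version, whether external APIs were used, and the operator's verdict and memo.
- R26. Internal original images, downscaled copies, derivative files for analysis, image fingerprints, and operational evidence data must all be subject to access control, retention period, deletion, and correction policies.
- R27. To guard against external API cost and outages, the system must provide re-analysis prevention, usage limits, failure retries, and an operating mode with external APIs disabled.
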
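The usage limit in R27 can be sketched as a small daily quota gate. The commit notes for this baseline mention that search quota is consumed only after a successful request, so the sketch counts a call only when it succeeds. `DailyQuota`, `call_with_quota`, and the returned dictionary shape are hypothetical names chosen for this example.

```python
from datetime import date


class DailyQuota:
    """In-memory daily call counter (hypothetical; a real gate would persist)."""

    def __init__(self, limit):
        self.limit = limit
        self.used = 0
        self.day = None

    def _roll(self, today):
        # Reset the counter when the calendar day changes.
        if self.day != today:
            self.day, self.used = today, 0

    def allows(self, today):
        self._roll(today)
        return self.used < self.limit

    def record_success(self, today):
        self._roll(today)
        self.used += 1


def call_with_quota(quota, request, today=None):
    """Run `request()` under the quota and report ok, failed, or skipped."""
    today = today or date.today()
    if not quota.allows(today):
        return {"status": "skipped", "reason": "quota_exceeded"}
    try:
        result = request()
    except Exception as exc:  # a failure becomes evidence, not a crash
        return {"status": "failed", "reason": str(exc)}
    quota.record_success(today)  # consume quota only after success
    return {"status": "ok", "result": result}
```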
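---

## Acceptance Examples

- AE1. **Covers R5, R12, R13, R22.** When a submitted image resembles a previously rejected image, the batch analysis raises the risk score with a high hash-similarity reason and the evidence is shown on the operator screen.
- AE2. **Covers R6, R7, R12.** When an ordinary photo containing a human face arrives, the face detection reason is recorded, but without celebrity or rights-protected evidence, face detection alone does not make it high risk.
- AE3. **Covers R8, R9, R10, R11.** When the external API conditions are unconfirmed or the API is disabled, neither the original nor the downscaled copy is sent externally, and the score is computed from internal analysis alone.
- AE4. **Covers R14.** When an AI-generated-style image is detected together with strong evidence such as a specific character name, work title, or matching image URL, a high-risk reason is attached even though it is not a copy of an original photo.
- AE5. **Covers R15.** When analysis fails because of image corruption or an API outage, the submission is not treated as low risk; it stays as a medium-risk review target with the analysis failure reason. If another analysis has already found high-risk evidence, the failure reason does not lower that risk.
- AE6. **Covers R16, R17, R18.** Even when an image is scored at risk 90, the system does not change the state to rejected; only the operator can select approve, hold, or reject.
- AE7. **Covers R20, R21, R23, R24.** When an operator manually registers a celebrity or character entity, it is reflected in later score calculations, and a wrongly registered entry can be deactivated so that it no longer affects new analyses.
- AE8. **Covers R21, R26.** When an operator registers a celebrity sample image, the system manages the image-level fingerprint and operational memo but does not store a facial biometric template for identifying a specific person.
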
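---

## Success Criteria

- Operators can review high-risk candidates first without manually investigating every submitted image from scratch.
- The likelihood that celebrity or character images without rights reach the commercialization stage is reduced.
- The basis for each risk score is retained and can be used for operator verdicts, re-review, and policy improvement.
- Even when starting from an empty reference DB, internal criteria strengthen over time through external search evidence and accumulated operator verdicts.
- Because it works as a recommendation tool rather than an automatic decider, the risk of automatic rejection caused by false positives is reduced.
- Follow-up planning does not need to invent the analysis scope, operational flow, data retention, external API restrictions, or non-goals from scratch.
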
---

## Scope Boundaries

- Automatic approve, automatic hold, and automatic reject state changes are not included.
- The risk score, search evidence, and automated analysis reasons are not exposed to applicants.
- Automation and scraping of the Google Image Search or Google Lens web UI are not included.
- Global IP is not the focus of first-phase tuning.
- Dedicated detectors, tuning, and management screens for brand, logo, trademark, and stock image theft are not in the first-phase core scope.
- Recognition of a specific individual's face is not performed internally.
- This feature does not replace legal advice or a final determination of rights.

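---

## Key Decisions

- Review queue triage first: the first-phase goal is to sort and explain the risk candidates an operator should look at, rather than to block automatically.
- Batch analysis of every submission: risk evidence is attached to every submitted image without delaying the applicant's submission flow.
- Korea-centric priority: risks involving Korean entertainers, K-pop, broadcast, webtoon, game, and character properties are caught first.
- Internal original storage separated from external downscaled transfer: originals stay in the internal operational store, and only restricted derivative images go to external APIs.
- Conditional external API usage: external APIs are used only when the official API's data usage terms and the contract have been confirmed.
- Accumulating operational knowledge: even with an empty initial DB, filter quality improves through rejection verdicts and operator registration.
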
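---

## Dependencies / Assumptions

- The official documentation states that Google Cloud Vision processes image data in memory and does not store it on disk in the online immediate-response mode, although some request metadata may be logged temporarily. Before actual use, the contract terms, the metadata logging scope, and the organization's legal and privacy standards must be checked. Reference: https://docs.cloud.google.com/vision/docs/data-usage
- Google Cloud Vision face detection does not support identifying specific individuals, so celebrity judgments must rely on a combination of evidence such as web entities, similar images, operator-registered knowledge, and verdict history. Reference: https://docs.cloud.google.com/vision/docs/detecting-faces
- Google Cloud Vision web detection can return evidence such as web entities, matching images, similar images, and pages containing the image. Reference: https://docs.cloud.google.com/vision/docs/detecting-web
- External API cost and quota can grow with submission volume, so usage limits and a disabled mode are operationally necessary.
- Automatic accumulation of rejected images improves filter performance, but a wrong rejection can contaminate the reference DB, so a correction flow is required.
- Even if retaining internal original images is allowed, originals, downscaled copies, derivative files, and image fingerprints must each have their own retention, access, and deletion policies.
- Sample image fingerprints are limited to values for image similarity and duplicate detection and are not extended into a biometric store for identifying a specific person's face.
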
---

## Outstanding Questions

### Deferred to Planning

- [Affects R5, R12][Technical] Define the accuracy criteria for image hashes and similarity hashes, the score weights, and the re-analysis interval.
- [Affects R6, R7][Technical] Decide how to set up the face/person detection model or API.
- [Affects R8, R11][Needs research] Before using Google Cloud Vision, finalize the contract, DPA, image content retention, request metadata logging, and region configuration conditions.
- [Affects R12, R13][Technical] Define the 0-100 risk formula and the priority order of reasons on the operator screen.
- [Affects R25, R26][Technical] Define retention periods, deletion policies, and access rights for original images, downscaled copies, derivative files, image fingerprints, evidence URLs, and verdict history.

@ -0,0 +1,148 @@
---
date: 2026-05-26
topic: evidence-quality-watchlist
---

# Evidence Quality And Watchlist Growth

## Summary

Tie evidence quality management and reference DB growth into a single operational loop. The operator issues the case verdict first, and the evidence used in held or rejected cases automatically becomes a watchlist candidate that strongly detects similar cases later.

---

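## Problem Frame

The current system can collect Google, Naver, internal fingerprint, and face-region web evidence and show it to the operator. But as search results and weak evidence pile up, the operator has to work out which evidence should actually drive the judgment, and wrong evidence that gets mixed into the reference DB can destabilize later analysis quality.

On the other hand, the operational goal of this project is to catch as many risk candidates as possible and reduce downstream impact. Held and rejected cases should therefore feed actively into the next detection, while the confirmed reference DB and watchlist candidates stay separate so that the operator can see the provenance and confirmation level of each piece of evidence.

---
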
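## Actors

- A1. Operator: reviews evidence and issues the final case verdict as approve, hold, or reject.
- A2. Rights risk filter: collects evidence and updates risk scores and candidates based on evidence status and case verdicts.
- A3. Reference DB administrator: promotes watchlist candidates to the confirmed reference DB or excludes them as false positives. In early operation the operator may also fill this role.
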
---

## Key Flows

- F1. Marking evidence status
  - **Trigger:** An operator reviews the collected evidence on the case detail screen.
  - **Actors:** A1, A2
  - **Steps:** The operator marks each piece of evidence as used in judgment, irrelevant, false positive, or on hold. The mark is a status that supports the judgment of that case and does not immediately mean inclusion in the reference DB.
  - **Outcome:** A record remains of which evidence was used in the judgment within the case.
  - **Covered by:** R1, R2, R3

- F2. Creating watchlist candidates after a case verdict
  - **Trigger:** An operator sets the case to hold or reject.
  - **Actors:** A1, A2
  - **Steps:** The system bundles the evidence used in the judgment with the submitted image fingerprint and creates a watchlist candidate automatically. No automatic candidate is created for approved cases.
  - **Outcome:** Held and rejected cases accumulate as a basis for later detection.
  - **Covered by:** R4, R5, R6, R7

- F3. Re-detection based on watchlist candidates
  - **Trigger:** A new submitted image arrives or an existing case is re-analyzed.
  - **Actors:** A2
  - **Steps:** The system compares the new image against the watchlist candidates' image fingerprints, evidence patterns, and related keywords. When similarity is high, it raises the risk score with almost the same strength as the confirmed reference DB, but shows the hit separately on screen as a watchlist-candidate-based detection.
  - **Outcome:** Unconfirmed signals from held or rejected cases are also detected strongly.
  - **Covered by:** R8, R9, R10

- F4. Candidate promotion and false-positive exclusion
  - **Trigger:** An operator reviews watchlist candidates on the candidate management screen.
  - **Actors:** A1, A2, A3
  - **Steps:** The operator promotes a watchlist candidate to the confirmed reference DB or excludes it as a false positive. Excluded candidates and evidence patterns get lower priority and lower score impact afterwards.
  - **Outcome:** DB contamination is reduced while the catch-many strategy is kept.
  - **Covered by:** R11, R12, R13

---

## Requirements

**Evidence quality status**
- R1. The system must offer four per-evidence statuses: used in judgment, irrelevant, false positive, and on hold.
- R2. A per-evidence status must not trigger immediate inclusion in the reference DB; it must first belong only to the judgment context of the case.
- R3. Evidence marked irrelevant or false positive must move to a lower priority in the default evidence list or be shown collapsed.

**Candidate creation driven by case verdicts**
- R4. Reference DB candidates must be created only after a case verdict.
- R5. A held case must automatically create a watchlist candidate.
- R6. A rejected case must automatically create a watchlist candidate.
- R7. An approved case must not automatically create a watchlist candidate or a confirmed reference DB entry.

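R4 to R7 amount to one small decision rule, sketched below. The status strings, the `Evidence` shape, and the dictionary returned for a candidate are assumptions for illustration; the project's stored schema may differ.

```python
from dataclasses import dataclass

CANDIDATE_DECISIONS = {"hold", "reject"}  # approve never creates a candidate


@dataclass(frozen=True)
class Evidence:
    evidence_id: str
    status: str  # "used", "irrelevant", "false_positive", or "on_hold"


def build_watchlist_candidate(case_id, decision, fingerprint, evidence):
    """Return a watchlist candidate for hold/reject verdicts, else None.

    `decision` is None while the case has no verdict yet, so marking
    evidence as used never creates a candidate on its own.
    """
    if decision not in CANDIDATE_DECISIONS:
        return None
    used_ids = [e.evidence_id for e in evidence if e.status == "used"]
    return {
        "source_case_id": case_id,
        "decision": decision,
        "fingerprint": fingerprint,
        "evidence_ids": used_ids,
        "state": "watch",  # not yet promoted to the confirmed reference DB
    }
```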
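**Score impact of watchlist candidates**
- R8. A new submission similar to a watchlist candidate must affect the risk score with almost the same strength as the confirmed reference DB.
- R9. Watchlist-candidate-based detections must be clearly distinguished in the UI from confirmed reference DB detections.
- R10. Watchlist-candidate-based detections must not lead to automatic rejection or automatic approval and must be used only to raise the operator's review priority.
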
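**Candidate management and contamination prevention**
- R11. The operator must be able to promote a watchlist candidate to the confirmed reference DB.
- R12. The operator must be able to exclude a watchlist candidate or an evidence pattern as a false positive.
- R13. Excluded candidates and evidence patterns must afterwards have lower risk score impact and lower display priority.
- R14. Even when confirmed reference DB entries and watchlist candidates appear in the same list, their status, provenance, and originating case must be distinguishable.
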
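**Operational visibility**
- R15. Every reference DB entry and watchlist candidate must be traceable to the case verdict that created it.
- R16. The operator must be able to see how many times each reference DB entry and watchlist candidate has since contributed to a risk judgment.
- R17. Candidate creation, promotion, false-positive exclusion, and evidence status changes must be written to the audit log.
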
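---

## Acceptance Examples

- AE1. **Covers R1, R2, R4, R5.** When the operator marks 2 of a case's 3 pieces of evidence as used in judgment and puts the case on hold, the system automatically creates a watchlist candidate from those 2 pieces of evidence and the submitted image fingerprint.
- AE2. **Covers R1, R2, R4.** Even if the operator has marked evidence as used in judgment, no reference DB candidate is created while the case verdict has not yet been issued.
- AE3. **Covers R7.** When the operator approves a case, no automatic watchlist candidate is created even if some evidence is marked as used in judgment.
- AE4. **Covers R8, R9, R10.** When a new image is strongly similar to an existing watchlist candidate, the risk score rises sharply and the hit is shown in the evidence group as a watchlist-candidate-based detection, but the case state is not changed automatically.
- AE5. **Covers R11, R14, R15.** When the operator promotes a watchlist candidate to the confirmed reference DB, the entry carries the confirmed status, the original verdict case, the sample image, and the alias/keyword information together.
- AE6. **Covers R12, R13.** When the operator excludes a specific watchlist candidate as a false positive, that candidate is not used as strong risk evidence even if a similar image arrives later, and remains only as a low-priority reference.
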
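---

## Success Criteria

- Even with a lot of evidence, the operator can quickly separate the evidence to use in the judgment from the evidence to discard.
- Held and rejected cases automatically improve the quality of the next detection, so many risk candidates can be caught even when the initial reference DB is thin.
- An aggressive detection strategy is kept, while the confirmed reference DB, watchlist candidates, and false-positive exclusions stay separate so that DB contamination can be traced and reduced.
- Implementation planning does not need to invent the candidate creation timing, the handling of held cases, the score strength of watchlist candidates, or the UI distinction from scratch.
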
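---

## Scope Boundaries

- Evidence status does not replace the case verdict.
- Watchlist candidates are created automatically, but promotion to the confirmed reference DB goes through operator review.
- Even when a watchlist candidate raises the risk score strongly, no automatic reject, automatic hold, or automatic approve state change is made.
- Face embeddings, biometric templates for identifying a specific person, and a face similarity DB are not built.
- Automation and scraping of the Google Image Search, Google Lens, and Naver web UIs are not included.
- Evidence status, watchlist candidates, the risk formula, and internal judgment evidence are not exposed to applicants.
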
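---

## Key Decisions

- Case verdict first: evidence status is an intermediate aid to judgment, and DB candidates are created only after a hold or reject verdict.
- Automatic candidates include holds: to reduce downstream impact, held cases are also accumulated actively as risk signals.
- Strong impact for watchlist candidates: watchlist candidates affect the risk score with almost the same strength as the confirmed DB, while the UI keeps their provenance separate.
- Confirmed and watchlist kept apart: the catch-many strategy is kept, and the operator can tell confirmed criteria from watchlist criteria.
- False positives are a learning signal: excluding a false positive is not just hiding it but operational feedback that lowers later priority and score impact.
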
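---

## Dependencies / Assumptions

- The current system already has image fingerprints and reference DB similarity analysis, so watchlist candidates can start from the same image-level fingerprints.
- Face-related processing stays limited, as in the current policy, to face-region web evidence and person/face presence signals, and is not extended into a biometric recognition store.
- The actual score weights are set during implementation planning after checking the current risk formula and tests.
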
---

## Outstanding Questions

### Deferred to Planning

- [Affects R8, R13][Technical] Decide, together with the current risk formula, whether the strong watchlist candidate score is exactly equal to the confirmed DB score or slightly lower.
- [Affects R3, R9][Technical] Decide which evidence groups and badges separate watchlist-candidate-based detection from confirmed DB detection in the UI.
- [Affects R12, R13][Technical] Decide how far a false-positive exclusion propagates across URL, image fingerprint, title, and domain.

docs/operations/copyrighter-operation-worklist.md
@ -0,0 +1,251 @@
# Copyrighter Operational Integration Status

The default port is `9500`.

## Items Connected in This Round

- Skeleton of the 9500 API server created
- Static operator console served from the same 9500 server
- `web/operator-gui/app.js` changed to refresh its initial data from the `/api/bootstrap` API response
- SQLite store connected
- Local submission image folder connected
- Basic health check added
- External API and local LLM settings loaded from `.env` or process environment variables
- Naver image search API connected: manual search sends text queries only
- Google Cloud Vision Web Detection connected: a derivative image is sent when a new local submission is analyzed
- Internal Ollama connected: re-analysis generates an LLM summary from existing evidence
- Evidence status recording: used in judgment, irrelevant, false positive, and on hold are stored per evidence item
- Watchlist candidates created automatically from hold/reject verdicts
- Watchlist candidate image similarity matching, promotion to the confirmed DB, and false-positive exclusion flows added

## How to Run

Create a `.env` file in the repository root based on `.env.example` and fill in only the keys you need. External API providers without a key are set to disabled automatically. The Ollama LLM is enabled with local defaults.

```text
NAVER_CLIENT_ID=
NAVER_CLIENT_SECRET=
GOOGLE_CLOUD_VISION_API_KEY=
COPYRIGHTER_GOOGLE_FACE_CROP_SEARCH=false
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=qwen2.5:0.5b-instruct
```

```powershell
cd C:\Users\USER\Desktop\complete\copyrighter
python run_copyrighter_server.py
```

Browser:

```text
http://127.0.0.1:9500/
```

Health checks:

```text
http://127.0.0.1:9500/health
http://127.0.0.1:9500/api/providers/health
```

Default storage locations:

```text
data/copyrighter.sqlite3
data/submissions/submissions.json
data/submissions/images/
```

## env Keys and Connected Behavior

| env | Purpose | Connected flow |
| --- | --- | --- |
| `NAVER_CLIENT_ID` | Naver Open API client ID | Used by `POST /api/search/manual` when the provider is `naver` |
| `NAVER_CLIENT_SECRET` | Naver Open API client secret | Sent in the Naver request headers |
| `NAVER_SEARCH_DISPLAY` | Number of search results, default `10` | Naver image search query |
| `NAVER_SEARCH_PAGES` | Number of image search pages, default `1`, max `10` | Fetches subsequent result pages in units of `display`. API calls grow with the page count. |
| `NAVER_SEARCH_SORT` | Sort order, default `sim` | Naver image search query |
| `NAVER_BLOG_SEARCH_DISPLAY` | Number of blog search results, default `3` | Auxiliary search that looks for the representative image of the source page when image search produces no direct match |
| `NAVER_BLOG_SEARCH_PAGES` | Number of blog search pages, default `1`, max `10` | Fetches subsequent result pages of the auxiliary blog search. API calls grow with the page count. |
| `NAVER_BLOG_SEARCH_SORT` | Blog sort order, default `sim` | Naver blog search query |
| `NAVER_WEB_SEARCH_DISPLAY` | Number of web document search results, default `3` | Auxiliary search that looks for the representative image of a general web page when image/blog search produces no matching image |
| `NAVER_WEB_SEARCH_PAGES` | Number of web document search pages, default `1`, max `10` | Fetches subsequent result pages of the auxiliary web document search. API calls grow with the page count. |
| `GOOGLE_CLOUD_VISION_API_KEY` | Cloud Vision REST API key | Web Detection during seed analysis of new submissions |
| `GOOGLE_CLOUD_VISION_PARENT` | Optional project/location parent | Optionally included in the Cloud Vision request body |
| `COPYRIGHTER_GOOGLE_FACE_CROP_SEARCH` | Whether to run Google Web Detection on face regions, default `false` | During re-analysis, only the crop of a detected face region is sent as a separate derivative image to collect web evidence. It is not used for same-person judgments or as a face recognition score. |
| `GOOGLE_CUSTOM_SEARCH_IMAGE_RESULTS` | Results per page for Google image search, default `3` | Google Custom Search image search query |
| `GOOGLE_CUSTOM_SEARCH_IMAGE_PAGES` | Number of Google image search pages, default `1`, max `10` | Fetches subsequent result pages in units of `num`. API calls grow with the page count. |
| `GOOGLE_CUSTOM_SEARCH_WEB_RESULTS` | Results per page for Google web search, default `3` | Looks for the representative image of web search result pages when image search produces no direct match. |
| `GOOGLE_CUSTOM_SEARCH_WEB_PAGES` | Number of Google web search pages, default `1`, max `10` | Fetches subsequent result pages of the web search. API calls grow with the page count. |
| `COPYRIGHTER_AUTO_NAVER_QUERY_LIMIT` | Number of Naver queries generated from Google evidence and run automatically, default `3`, max `10` | Ranks Google page titles and entities by priority and runs several text image searches automatically. |
| `COPYRIGHTER_AUTO_NAVER_BLOG_QUERY_LIMIT` | Number of extra Naver blog queries to run when image search has no match, default `1`, max `10` | Extracts the representative image from blog search result pages and compares its fingerprint with the submitted image. |
| `COPYRIGHTER_AUTO_NAVER_WEB_QUERY_LIMIT` | Number of extra Naver web document queries to run when image/blog search has no match, default `1`, max `10` | Extracts the representative image from web document search result pages and compares its fingerprint with the submitted image. |
| `COPYRIGHTER_SEARCH_RESULT_COMPARE_LIMIT` | Number of search result image URLs to download and fingerprint-compare with the submitted image, default `3`, max `20` | Saves Naver/Google result images to the local store and compares them directly with the submitted image by pHash. |
| `COPYRIGHTER_SEARCH_RESULT_PAGE_IMAGE_LIMIT` | Number of representative images to extract from a search result's source page, default `3`, max `10` | When the result itself has no image URL, downloads only a limited number of `og:image`, `twitter:image`, and `img` candidates and compares them with the submitted image. |
| `COPYRIGHTER_SEARCH_RESULT_SIMILARITY_THRESHOLD` | Similarity filter threshold for search result images, default `0.9`, range `0.0` to `1.0` | Adjusts the pHash-similarity matching threshold for Naver/Google candidate images. |
| `OLLAMA_BASE_URL` | Ollama local server URL, default `http://localhost:11434` | LLM summary in `POST /api/submissions/{id}/rerun-enrichment` |
| `OLLAMA_MODEL` | Ollama model, default `qwen2.5:0.5b-instruct` | `model` field of the Ollama `/api/generate` payload |
| `COPYRIGHTER_NAVER_DAILY_LIMIT` | Naver daily call limit | Local policy gate |
| `COPYRIGHTER_GOOGLE_DAILY_LIMIT` | Google daily call limit | Local policy gate |
| `COPYRIGHTER_LLM_DAILY_LIMIT` | LLM daily call limit | Provider status display only |

Process environment variables take precedence over `.env`. For example, if `$env:OLLAMA_MODEL` is already set in PowerShell, the `.env` value does not overwrite it.

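The precedence rule above, together with the defaults and maxima in the table, can be sketched as follows. This is an illustrative loader rather than the project's actual one; the function names, the minimum of `1`, and the fallback to the default on malformed numbers (which the commit notes describe as intended behavior) are assumptions.

```python
def parse_env_file(text):
    """Parse KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def load_settings(env_text, process_env):
    """Merge `.env` text with process variables; process variables win."""
    settings = parse_env_file(env_text)
    settings.update(process_env)
    return settings


def int_setting(settings, key, default, minimum, maximum):
    """Read an integer setting, falling back to `default` when it is
    missing or malformed, and clamping it to [minimum, maximum]."""
    try:
        value = int(settings[key])
    except (KeyError, TypeError, ValueError):
        return default
    return max(minimum, min(maximum, value))
```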
## Official Documentation References

- Naver image search API: https://developers.naver.com/docs/serviceapi/search/image/image.md
- Google Cloud Vision Web Detection: https://cloud.google.com/vision/docs/detecting-web
- Ollama Generate API: https://docs.ollama.com/api/generate
- qwen2.5 0.5B Instruct model: https://ollama.com/library/qwen2.5:0.5b-instruct

## Current API

```text
GET /health
GET /api/providers/health
GET /api/bootstrap
GET /api/review-queue
GET /api/submissions/{submission_id}/review
POST /api/submissions/reload
POST /api/submissions/{submission_id}/rerun-enrichment
POST /api/submissions/{submission_id}/decision
POST /api/evidence/{evidence_id}/status
POST /api/knowledge/{entry_id}/promote-watchlist
POST /api/knowledge/{entry_id}/exclude-watchlist
POST /api/search/manual
GET /api/providers
PATCH /api/providers/{provider_id}
POST /api/providers/emergency-disable
GET /api/audit-events
GET /media/{image_path}
```

## Local Image Storage

Submitted images can be added in two ways.

- Quick way: copy the image file under `data/submissions/images/` and click `새 제출 불러오기` (Load new submissions) in the console. The submission ID and title are then generated from the file name.
- Explicit way: register the ID, title, size, and submission time directly in `data/submissions/submissions.json`, then click `새 제출 불러오기` in the console.

Example:

```json
[
  {
    "id": "SUB-LOCAL-001",
    "title": "로컬 얼굴 이미지 샘플",
    "file": "images/local-face.svg",
    "width": 1200,
    "height": 900,
    "submitted_at": "2026-05-26 10:00"
  }
]
```

Image files go under `data/submissions/images/`. The server rejects any path that escapes this folder.

No server restart is needed. Clicking `새 제출 불러오기` at the top of the review queue in the operator console imports the changes in `submissions.json` and the newly copied images in the folder into the SQLite DB.

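The folder-escape check mentioned above can be sketched with `pathlib`. This is one common way to do such a check, not necessarily how the server does it; `resolve_media_path` is a hypothetical name.

```python
from pathlib import Path


def resolve_media_path(images_root, relative_path):
    """Resolve `relative_path` under `images_root`, rejecting escapes.

    Raises ValueError for paths such as "../secret" or absolute paths
    that resolve outside the images folder.
    """
    root = Path(images_root).resolve()
    candidate = (root / relative_path).resolve()
    if root not in candidate.parents:
        raise ValueError(f"path escapes the images folder: {relative_path!r}")
    return candidate
```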
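## Operator Verdict Flow

1. Select a submitted image and check `선택 재분석` (Re-analyze selected) at the top or the evidence inside the row.
2. On each evidence row, mark `판단에 사용` (used in judgment), `무관` (irrelevant), `오탐` (false positive), or `보류` (on hold). The mark only organizes the case record and whether the evidence counts toward the score; it does not create a reference DB candidate.
3. Issue the case verdict first. `승인` (approve) creates no automatic candidate. `보류` (hold) and `반려` (reject) bundle the submitted image fingerprint with the selected evidence and create a `주의 후보` (watchlist candidate).
4. When the same image arrives later, it is shown separately in the `주의 후보 근거` (watchlist candidate evidence) group and counted as a high risk signal. The final verdict is still not changed automatically.
5. On the knowledge DB screen, review the watchlist candidate and click `확정 DB 편입` (promote to confirmed DB) or `오탐 제외` (exclude as false positive). Excluded candidates are not used in the next internal similarity analysis.
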
## External API Integration Status

Connected:

- Naver: sends a `GET` request to `https://openapi.naver.com/v1/search/image`. `query`, `display`, `start`, and `sort` go in the query string, and the `X-Naver-Client-Id` and `X-Naver-Client-Secret` headers are used.
- Google: sends a `WEB_DETECTION` request to `https://vision.googleapis.com/v1/images:annotate`. The server base64-encodes the bytes of the internal derivative image and sends them.
- Provider status: an external API is enabled when its key is present; otherwise `/api/providers` shows it as disabled with a missing env reason.
- Provider Controls: the provider enabled/disabled state set in the console is stored in the DB.

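The two request shapes listed above can be sketched with the standard library. The builders only construct the request objects and send nothing; the function names, default values, and `maxResults` choice are assumptions for illustration, while the endpoint, query parameters, headers, and `WEB_DETECTION` feature type come from the list above.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request

NAVER_IMAGE_SEARCH_URL = "https://openapi.naver.com/v1/search/image"


def build_naver_image_request(query, client_id, client_secret,
                              display=10, start=1, sort="sim"):
    """Build (but do not send) a text-only Naver image search request."""
    params = urlencode({"query": query, "display": display,
                        "start": start, "sort": sort})
    return Request(
        f"{NAVER_IMAGE_SEARCH_URL}?{params}",
        headers={
            "X-Naver-Client-Id": client_id,
            "X-Naver-Client-Secret": client_secret,
        },
        method="GET",
    )


def build_web_detection_body(derivative_bytes, max_results=10):
    """Build the JSON body for a Cloud Vision WEB_DETECTION request.

    `derivative_bytes` is expected to be the approved derivative image,
    never the internal original.
    """
    encoded = base64.b64encode(derivative_bytes).decode("ascii")
    return {
        "requests": [
            {
                "image": {"content": encoded},
                "features": [{"type": "WEB_DETECTION", "maxResults": max_results}],
            }
        ]
    }
```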
Still needs operational validation:

- Quality check of sample Naver/Google calls with real production keys
- Per-provider retry/backoff policy
- Administrator alert when the daily limit is exceeded
- How to move API keys into the secret store of a Windows service, Task Scheduler, or deployment environment

Operational boundaries:

- Only text queries are sent to Naver.
- Neither original images nor derivative images are sent to Naver.
- Only approved derivative images are sent to Google.
- Before using Google Cloud Vision, check the contract, DPA, data retention, metadata logging, region, and credential policies.

## LLM Integration Status

Connected:

- Sends a `POST /api/generate` request to the local Ollama API.
- The default URL is `http://localhost:11434` and the default model is `qwen2.5:0.5b-instruct`.
- Ollama requests do not use an API key.
- Response streaming is turned off with `stream: false` so the summary arrives in one response.
- `rerun-enrichment` passes only already stored internal/Naver/Google evidence to the LLM as input.
- LLM summaries are stored only as supporting evidence that carries `source_evidence_ids` or source URLs.
- An LLM failure does not lower existing evidence and is recorded as failure evidence.

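The request described in this list can be sketched as follows. `model`, `prompt`, and `stream` are fields of the Ollama generate API; the function name and the prompt wording are placeholders, not the project's actual prompt.

```python
import json
from urllib.request import Request


def build_ollama_generate_request(base_url, model, evidence_lines):
    """Build (but do not send) a non-streaming /api/generate request.

    Only already stored evidence text goes into the prompt; no image
    bytes and no API key are attached.
    """
    prompt = (
        "Summarize the following evidence for an operator. "
        "Cite only what is listed.\n"
        + "\n".join(f"- {line}" for line in evidence_lines)
    )
    payload = {"model": model, "prompt": prompt, "stream": False}
    return Request(
        base_url.rstrip("/") + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```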
Install command:

```powershell
ollama pull qwen2.5:0.5b-instruct
```

Still needs operational validation:

- Quality check of sample re-analysis calls after running `ollama pull qwen2.5:0.5b-instruct` on the actual production PC
- Decision on whether prompt/response logs are stored, and the masking policy
- Re-verification of the policy that LLM summaries are not exposed to applicants

What the LLM may do:

- Generate candidate search queries
- Summarize evidence
- Reconcile duplicate or conflicting evidence
- Write evidence memos that are easy for operators to read

What the LLM must not do:

- Make the final approve/hold/reject decision
- Compute the score on its own
- Assert a celebrity, work, or IP without a source
- Generate automatic explanations to show to applicants

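## Not Done Yet: Production Authentication and Deployment

The current 9500 server is intended for local runs only. The following are required before production use.

- Login/session or corporate SSO
- Separate permissions for regular operators and administrators
- Administrator-only access to Provider Controls
- Stronger audit logging for operator decisions, corrections, and knowledge DB changes
- A decision on how the 9500 server process is managed
- Backup, recovery, monitoring, and alerting
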
## Google Custom Search Settings

| env | Purpose | Connected flow |
| --- | --- | --- |
| `GOOGLE_CUSTOM_SEARCH_API_KEY` | Google Programmable Search JSON API key | Extends Google Vision text clues into Google image/web search |
| `GOOGLE_CUSTOM_SEARCH_CX` | Programmable Search Engine ID | The `cx` value for the Custom Search JSON API |
| `GOOGLE_CUSTOM_SEARCH_IMAGE_RESULTS` | Number of image search results, default `3` | Downloads the image results of a text query and compares them with the submitted image by pHash |
| `GOOGLE_CUSTOM_SEARCH_WEB_RESULTS` | Number of web search results, default `3` | When image search returns nothing, compares the representative image of each web result page |
| `COPYRIGHTER_AUTO_GOOGLE_CUSTOM_QUERY_LIMIT` | Number of automatic Google Custom Search queries, default `2`, maximum `10` | Sends only text queries; the submitted image is never sent |
| `COPYRIGHTER_GOOGLE_CUSTOM_SEARCH_DAILY_LIMIT` | Daily call limit for Google Custom Search | Local policy gate |

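One way to read these settings so that a malformed value falls back to its default is sketched below. The helper names and the returned dictionary keys are hypothetical. The `1..10` clamp on result counts is an assumption based on the Custom Search JSON API's `num` limit; only the defaults and the maximum of `10` for the query limit come from the table.

```python
import os


def read_int_env(name, default, minimum=0, maximum=None, environ=None):
    """Read an integer setting; malformed or missing values fall back to the default."""
    env = os.environ if environ is None else environ
    try:
        value = int(str(env.get(name, "")).strip())
    except ValueError:
        return default
    value = max(minimum, value)
    if maximum is not None:
        value = min(maximum, value)
    return value


def load_google_custom_search_settings(environ=None):
    """Collect the Google Custom Search settings without exposing secret values."""
    env = os.environ if environ is None else environ
    has_key = bool(env.get("GOOGLE_CUSTOM_SEARCH_API_KEY"))
    has_cx = bool(env.get("GOOGLE_CUSTOM_SEARCH_CX"))
    return {
        "enabled": has_key and has_cx,  # both values are needed
        "image_results": read_int_env("GOOGLE_CUSTOM_SEARCH_IMAGE_RESULTS", 3, 1, 10, env),
        "web_results": read_int_env("GOOGLE_CUSTOM_SEARCH_WEB_RESULTS", 3, 1, 10, env),
        "auto_query_limit": read_int_env(
            "COPYRIGHTER_AUTO_GOOGLE_CUSTOM_QUERY_LIMIT", 2, 0, 10, env
        ),
    }
```

Passing `environ` explicitly keeps the function testable without mutating the process environment.
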
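Sources for automatic search queries:

- Google page titles and entities have the highest priority.
- Even when only a Google best guess label is available, generic words such as `person`, `gentleman`, and `portrait` are discarded, and only specific phrases such as `IU official profile` are used as low-priority queries.
- Google Web Detection on a face region is not used as evidence that two images show the same person. If the page title or entity is specific, however, it is reused only as a low-priority text search query.
- If the page title is an English phrase, `image` is appended to the auxiliary query; if it is a Korean phrase, `이미지` is appended. Garbled Korean templates are never sent to the search engine.
- Google Custom Search and Naver never receive the submitted image. Only text queries are sent; the result images or the representative images of the source pages are then downloaded and compared locally by pHash.
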
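The query-selection rules above might look like this in code. It is a sketch under assumptions: the function names are hypothetical, the generic-word list contains only the three examples from the text, and the Hangul check is one simple way to tell Korean phrases from English ones.

```python
import re

# Example generic labels taken from the text; a real list would be longer.
GENERIC_LABELS = {"person", "gentleman", "portrait"}
HANGUL = re.compile(r"[\uac00-\ud7a3]")


def is_specific(label):
    """A label is specific unless every word in it is a generic term."""
    words = label.lower().split()
    return bool(words) and not all(word in GENERIC_LABELS for word in words)


def with_image_suffix(phrase):
    """Append the Korean suffix to Korean phrases and `image` to everything else."""
    suffix = "이미지" if HANGUL.search(phrase) else "image"
    return f"{phrase} {suffix}"


def build_queries(page_titles, entities, best_guess_labels, limit=2):
    """Rank page titles and entities first, then specific best-guess labels."""
    ranked = [text.strip() for text in [*page_titles, *entities]]
    ranked += [label.strip() for label in best_guess_labels if is_specific(label)]
    seen, queries = set(), []
    for text in ranked:
        key = text.lower()
        if text and key not in seen:  # drop empties and case-insensitive duplicates
            seen.add(key)
            queries.append(text)
    return queries[:limit]
```

The `limit` default mirrors the documented default of `2` for the automatic query limit.
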
Configuration diagnostics:

- `/api/providers/health` and the provider screen show `requiredEnv` and `configuredEnv` without revealing secret values.
- If `google_search` is disabled, first check that both `GOOGLE_CUSTOM_SEARCH_API_KEY` and `GOOGLE_CUSTOM_SEARCH_CX` are present in `.env`.

52
docs/operations/image-rights-risk-filter.md
Normal file
@ -0,0 +1,52 @@
# Image Rights Risk Filter Operations

## Modes

- Internal-only mode: external web detection is disabled. The batch still runs fingerprint, knowledge-base, face/person-presence, and scoring logic.
- External-enriched mode: Google Cloud Vision web detection can run only after contract, DPA, content-retention, request-metadata logging, region, credential, and quota controls are approved.
- Search-enriched mode: official text-query search APIs can supplement operator evidence. Naver search is used for Korean query-based evidence, not image-upload reverse search.
- LLM-assisted mode: an internal LLM can generate search queries, structure search results, and summarize evidence for operators. It must not decide the score or final status.

## Hard Boundaries

- Do not automate Google Image Search or Google Lens web UI.
- Do not automate Naver web search UI or scrape Naver result pages.
- Do not send internal originals to external APIs.
- Do not send original images or analysis derivatives to Naver search; use text queries only.
- Do not store face embeddings, celebrity identity matches from faces, or biometric templates.
- Do not use an LLM output as standalone evidence unless it is tied back to source evidence.
- Do not expose automated scores, evidence, or provider details to applicants.
- Do not let the filter change review status automatically.

## Disable External Calls

Set the external API policy to disabled. Batches should continue internal-only and record external-skipped evidence reasons for affected submissions.

Set the search API policy to disabled to stop Naver enrichment. The review enrichment job should continue with existing internal and Google evidence, then record search-skipped reasons for operators.

Disable the internal LLM assistant if prompt logging, model hosting, or source-citation behavior cannot be audited. LLM-disabled operation should still show raw internal, Naver, and Google evidence.

## Provider Credentials And Quotas

- Keep Naver credentials separate from Google credentials.
- Configure provider-specific daily limits before enabling search-enriched mode.
- Treat quota exhaustion as an operator-visible skipped-provider state, not as low-risk evidence.
- Do not log raw original images, derivatives, or applicant personal data in provider request metadata.

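The disable and quota rules can be illustrated with a small gate around each provider call. This is a sketch, not the shipped policy gate: `DailyQuota`, `run_search`, and the status and reason strings are hypothetical names. Quota is consumed only after a successful call, matching the behavior described in the commit message.

```python
from dataclasses import dataclass


@dataclass
class DailyQuota:
    limit: int
    used: int = 0

    def has_capacity(self):
        return self.used < self.limit

    def record_success(self):
        self.used += 1


def run_search(provider, query, quota, enabled, call):
    """Call a provider behind the policy gate; never report a skip as low risk."""
    if not enabled:
        return {"provider": provider, "status": "skipped", "reason": "provider_disabled"}
    if not quota.has_capacity():
        return {"provider": provider, "status": "skipped", "reason": "quota_exhausted"}
    try:
        results = call(query)
    except Exception as exc:  # a failure becomes evidence instead of crashing the job
        return {"provider": provider, "status": "failed", "reason": type(exc).__name__}
    quota.record_success()  # consume quota only after the request succeeded
    return {"provider": provider, "status": "ok", "results": results}
```

Returning an explicit `skipped` or `failed` record lets the operator screen show why a provider contributed no evidence.
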
## Operator Guidance

Scores are triage signals, not legal conclusions. Operators should review the reasons and evidence, then choose approved, held, or rejected manually.

Evidence status is a judgment aid. Mark evidence as used for judgment, irrelevant, false positive, or pending to keep the case record clear. Evidence status alone must not create a knowledge-base entry.

Held and rejected case decisions create watchlist candidates. Watchlist matches are strong future risk signals, but they are visually separated from confirmed DB matches and never change a future case decision automatically.

Promote a watchlist candidate only after the team accepts it as a reusable confirmed reference. Exclude it when it is a false positive so future internal similarity analysis ignores it.

Naver results are search evidence candidates. Operators should prefer results that connect the submitted image to a named person, work, character, broadcast, webtoon, game, official source, or repeated matching image source.

LLM summaries are reading aids. If a summary mentions a celebrity, character, or copyrighted work without a linked source result, treat that claim as unverified.

## Correction Flow

If a rejection or hold was wrong, correct the operator decision and exclude any watchlist candidates derived from that decision so future submissions are not penalized by stale evidence.
590
docs/plans/2026-05-25-001-feat-image-rights-risk-filter-plan.md
Normal file
@ -0,0 +1,590 @@
---
title: "feat: Add Image Rights Risk Filter"
type: feat
status: active
date: 2026-05-25
origin: docs/brainstorms/2026-05-25-image-rights-risk-filter-requirements.md
---

# feat: Add Image Rights Risk Filter

## Summary

Build a new backend rights-filtering subsystem that analyzes submitted images asynchronously, stores structured evidence, computes a 0-100 risk score, and presents recommendations to operators without changing application review status automatically. Because the current workspace has no application source code, this plan defines a portable module layout and integration contracts to adapt into the target app.

---

## Problem Frame

Operators need a first-pass risk filter before submitted images are commercialized. The origin requirements define the product behavior: batch analysis, operator-only recommendations, Korean celebrity/IP priority, no applicant-facing analysis output, and no automatic approval or rejection.

---

## Requirements

- R1. Analyze every submitted image through a batch pipeline. (Origin R1, F1)
- R2. Prioritize celebrity/publicity and copyrighted character/work risks, tuned first for Korean entities, while recording incidental strong evidence for out-of-core categories without building dedicated v1 detectors. (Origin R2, R3, R4)
- R3. Generate exact and perceptual image fingerprints for duplicate, transformed, prior-rejected, and operator-registered sample matching. (Origin R5, AE1)
- R4. Detect face/person presence without identifying specific individuals. (Origin R6, R7, AE2)
- R5. Use external APIs only when content non-retention, metadata logging, and legal/privacy constraints are approved. (Origin R8, R9, R10, R11, AE3)
- R6. Normalize internal and external evidence into a 0-100 risk score plus operator-readable reasons. (Origin R12, R13, R14, R15, AE4, AE5)
- R7. Preserve operator control: the system recommends only; operators choose approved, held, or rejected. (Origin R16, R17, R18, R19, AE6)
- R8. Start with an empty knowledge base and grow it from search evidence, operator registration, and rejected decisions. (Origin R20, R21, R22, R23, R24, AE7, AE8)
- R9. Store risk evidence, analysis versioning, external API usage, operator decisions, and governance-relevant metadata. (Origin R25)
- R10. Apply access, retention, deletion, and correction policy to original images, derivatives, fingerprints, and evidence. (Origin R26)
- R11. Provide usage limits, retry behavior, idempotency, and an external API disable mode. (Origin R27)

**Origin actors:** A1 Applicant, A2 Operator, A3 Filtering system, A4 External image-analysis API

**Origin flows:** F1 Batch risk analysis, F2 Operator review, F3 Reference DB accumulation

**Origin acceptance examples:** AE1, AE2, AE3, AE4, AE5, AE6, AE7, AE8

---

## Scope Boundaries

- No automatic approval, hold, or rejection status changes.
- No applicant-facing risk score, evidence, or automated reason display.
- No Google Image Search or Google Lens web UI automation or scraping.
- No dedicated v1 detector, tuning, or management surface for global IP, brands, logos, trademarks, or stock-photo theft; incidental strong evidence may still be shown.
- No internal face recognition, celebrity identity matching from face embeddings, or biometric template storage.
- No legal advice or final rights judgment replacement.

### Deferred to Follow-Up Work

- Dedicated brand/logo/trademark/stock-photo detection: separate product iteration after celebrity and character workflows prove useful.
- Advanced model calibration from historical outcomes: future iteration after enough operator decisions are collected.
- Applicant-facing rights-evidence upload workflow: excluded from this internal v1 filter.

---

## Context & Research

### Relevant Code and Patterns

- No application source code, framework files, or tests are present in the current workspace. The only local source artifact is the origin requirements document.
- The paths in this plan define a proposed module layout under `src/rights_filter/` and `tests/rights_filter/`. If the actual application lives elsewhere, adapt these paths to its framework conventions before implementation.

### Institutional Learnings

- No `docs/solutions/` directory or prior local implementation notes were found in this workspace.

### External References

- Google Cloud Vision Data Usage FAQ: online immediate-response operations process image data in memory and do not persist it to disk; asynchronous offline batch operations temporarily persist data. It also notes temporary request metadata logging. https://docs.cloud.google.com/vision/docs/data-usage
- Google Cloud Vision Web Detection: returns web entities, full matching images, visually similar images, pages with matching images, and best guess labels that can become operator evidence. https://docs.cloud.google.com/vision/docs/detecting-web
- Google Cloud Vision Face Detection: detects faces and attributes, but specific individual facial recognition is not supported. https://docs.cloud.google.com/vision/docs/detecting-faces

---

## Key Technical Decisions

- Modular pipeline over inline review logic: keep collection, analysis, scoring, evidence storage, and operator presentation as separate boundaries so the filter can run asynchronously and be disabled safely.
- Fingerprint-first internal analysis: compute exact file hash plus perceptual image fingerprints before external calls so duplicates, retries, and rejected-image matches do not depend on third-party availability.
- Face/person detection is a risk signal, not identity evidence: it can raise review attention but must not become a celebrity-recognition system.
- Cloud Vision is an optional synchronous adapter: only use online `WEB_DETECTION`, never asynchronous offline batch, and only when compliance settings mark it approved.
- Evidence ledger is append-oriented: each analysis run records version, inputs used, evidence sources, and score components so operators can audit why a recommendation changed.
- Failed analysis adds uncertainty, not exculpation: failures become review reasons and must not lower already-detected high-risk evidence.
- Knowledge base entries carry provenance: automatic entries from rejections and manual entries from operators remain distinguishable and reversible.

---

## Open Questions

### Resolved During Planning

- Hashing strategy: use an exact content hash plus at least one perceptual fingerprint; final algorithm and thresholds are implementation choices validated by tests and sample data.
- Face/person strategy: use local or approved internal detection for presence only; do not store face embeddings or identify individuals.
- Google strategy: integrate as a disabled-by-default adapter gated by compliance configuration and usage limits.
- Score strategy: use a configurable weighted model that records component reasons rather than a hidden monolithic score.
- Data governance strategy: make original files, derivatives, fingerprints, and evidence all policy-managed data classes.

### Deferred to Implementation

- Exact application framework integration points: no app code exists in this workspace, so implementers must adapt proposed paths to the real app.
- Final perceptual hash algorithm, similarity thresholds, and score weights: tune against pilot samples and operator feedback.
- Exact face/person detector package or service: choose based on the target stack, runtime environment, and privacy constraints.
- Final retention durations and access roles: confirm with legal/privacy stakeholders before production rollout.
- Cloud Vision contract details, DPA, region, and metadata logging controls: confirm before enabling external API calls.

---

## Output Structure

    src/
      rights_filter/
        domain/
        analysis/
        integrations/
        jobs/
        admin/
        governance/
    tests/
      rights_filter/
    docs/
      operations/

The tree is the expected module shape for a new backend subsystem. The implementer may adjust it to match the actual application framework while preserving the boundaries and test coverage below.

---

## High-Level Technical Design

> *This illustrates the intended approach and is directional guidance for review, not implementation specification. The implementing agent should treat it as context, not code to reproduce.*

```mermaid
flowchart TB
  Submission[Submitted image] --> Batch[Batch analyzer]
  Batch --> Preprocess[Derivative and EXIF cleanup]
  Preprocess --> Internal[Internal fingerprints and face/person signals]
  Preprocess --> ExternalGate{External API allowed?}
  ExternalGate -->|yes| WebDetect[Cloud Vision web detection]
  ExternalGate -->|no| Skipped[External evidence skipped]
  Internal --> Evidence[Evidence ledger]
  WebDetect --> Evidence
  Skipped --> Evidence
  Evidence --> Scoring[Risk scoring]
  Scoring --> Operator[Operator review surface]
  Operator --> Decision[Approved / Held / Rejected]
  Decision --> Knowledge[Knowledge base updates]
  Knowledge --> Internal
```

---

## Implementation Units

```mermaid
flowchart TB
  U1[U1 Persistence and records]
  U2[U2 Image preprocessing]
  U3[U3 Internal analysis]
  U4[U4 Knowledge base]
  U5[U5 External API adapter]
  U6[U6 Evidence and scoring]
  U7[U7 Batch orchestration]
  U8[U8 Operator review integration]
  U9[U9 Governance and operations]

  U1 --> U2
  U1 --> U4
  U2 --> U3
  U2 --> U5
  U3 --> U6
  U4 --> U6
  U5 --> U6
  U6 --> U7
  U6 --> U8
  U8 --> U4
  U1 --> U9
  U7 --> U9
```

### U1. Persistence and Domain Records

**Goal:** Create the persistent model layer for analysis runs, evidence, risk scores, knowledge-base entries, operator decisions, and governance metadata.

**Requirements:** R1, R8, R9, R10; F1, F2, F3

**Dependencies:** None

**Files:**
- Create: `src/rights_filter/domain/records.*`
- Create: `src/rights_filter/domain/repositories.*`
- Create: `src/rights_filter/domain/policies.*`
- Create: `db/migrate/*rights_filter*`
- Test: `tests/rights_filter/domain/test_records.*`
- Test: `tests/rights_filter/domain/test_repositories.*`

**Approach:**
- Model each analysis as an immutable run with a version, source image reference, input classes used, evidence entries, score result, and failure reasons.
- Store evidence as structured records grouped by source: fingerprint, face/person signal, web entity, matching image URL, prior rejection, operator manual entry, and failure.
- Represent knowledge-base entries with type, provenance, active state, policy memo, exception notes, and links back to source decisions or manual registrations.
- Keep original image storage references separate from derived image references and fingerprints so governance policy can treat them independently.

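A minimal sketch of the record shapes this approach implies, assuming Python dataclasses. The class names, field names, and source identifiers are illustrative, not a settled schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional, Tuple

# Hypothetical identifiers for the evidence sources listed in the approach.
EVIDENCE_SOURCES = {
    "fingerprint", "face_person_signal", "web_entity", "matching_image_url",
    "prior_rejection", "operator_manual", "failure",
}


@dataclass(frozen=True)
class Evidence:
    analysis_run_id: str
    source: str
    created_at: datetime
    detail: str = ""

    def __post_init__(self):
        # Reject evidence that lacks a run, a known source, or a timestamp.
        if not self.analysis_run_id:
            raise ValueError("evidence must belong to an analysis run")
        if self.source not in EVIDENCE_SOURCES:
            raise ValueError(f"unknown evidence source: {self.source!r}")
        if self.created_at is None:
            raise ValueError("evidence needs a timestamp")


@dataclass(frozen=True)
class AnalysisRun:
    run_id: str
    image_ref: str  # reference to the original, kept separate from derivatives
    version: str
    evidence: Tuple[Evidence, ...] = ()
    score: Optional[int] = None
    failure_reasons: Tuple[str, ...] = ()
```

Frozen dataclasses make each run immutable, so a re-analysis produces a new `AnalysisRun` instead of overwriting audit data.
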
**Patterns to follow:**
- No local app patterns exist. Follow the target application's existing persistence, migration, repository, and test conventions when integrating.

**Test scenarios:**
- Happy path: creating an analysis run with multiple evidence sources persists all evidence and returns the latest score for the submitted image.
- Happy path: manual and automatically generated knowledge-base entries are distinguishable by provenance.
- Edge case: an inactive knowledge-base entry is retained for audit but excluded from future matching.
- Edge case: a corrected operator decision records new state without deleting prior evidence.
- Error path: invalid evidence without a source, timestamp, or analysis run is rejected.
- Integration: rejected operator decision creates a linkable source for future knowledge-base entries.

**Verification:**
- The data layer can represent every origin evidence type without storing face biometric templates.
- The same submitted image can have multiple versioned analysis runs without overwriting prior audit data.

### U2. Image Preprocessing and Derivative Handling

**Goal:** Produce safe analysis derivatives from internally stored originals: EXIF-stripped resized images for external APIs, and normalized internal inputs for hashing/detection.

**Requirements:** R5, R10; AE3, AE8

**Dependencies:** U1

**Files:**
- Create: `src/rights_filter/analysis/preprocessing.*`
- Create: `src/rights_filter/analysis/derivatives.*`
- Create: `src/rights_filter/governance/data_classes.*`
- Test: `tests/rights_filter/analysis/test_preprocessing.*`
- Test: `tests/rights_filter/governance/test_data_classes.*`

**Approach:**
- Define data classes for original image, analysis derivative, external derivative, fingerprint, web evidence, and operator note.
- Strip EXIF and cap external derivative resolution before any external API call.
- Ensure preprocessing is deterministic enough for idempotent retries but does not mutate the internal original.
- Track derivative lifecycle so generated files can be cleaned up according to retention policy.

**Patterns to follow:**
- Use the target app's existing file storage and background temp-file cleanup patterns if available.

**Test scenarios:**
- Happy path: a high-resolution image produces an EXIF-free external derivative under the configured maximum size.
- Happy path: internal original storage reference remains unchanged after derivative creation.
- Edge case: an image already below the maximum size is still EXIF-stripped before external use.
- Error path: corrupt image input records a preprocessing failure reason and does not produce a misleading low-risk result.
- Integration: generated derivatives are associated with the correct analysis run and data class.

**Verification:**
- No external adapter can receive a file that bypassed the derivative policy.

### U3. Internal Fingerprint and Face/Person Analysis

**Goal:** Implement internal analysis signals: exact duplicate hash, perceptual image fingerprints, and face/person presence detection without identity recognition.

**Requirements:** R3, R4, R6; AE1, AE2, AE8

**Dependencies:** U1, U2

**Files:**
- Create: `src/rights_filter/analysis/fingerprints.*`
- Create: `src/rights_filter/analysis/face_person_detection.*`
- Create: `src/rights_filter/analysis/internal_analyzer.*`
- Test: `tests/rights_filter/analysis/test_fingerprints.*`
- Test: `tests/rights_filter/analysis/test_face_person_detection.*`
- Test: `tests/rights_filter/analysis/test_internal_analyzer.*`

**Approach:**
- Compute exact content hashes for repeat upload and idempotency checks.
- Compute perceptual fingerprints suitable for resize, compression, and modest crop/transform matching.
- Compare fingerprints against prior rejected images and active sample entries in the knowledge base.
- Detect face/person presence as a reason component only; do not identify a person, store face embeddings, or compare faces across images.

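To make the two fingerprint kinds concrete, here is a sketch that pairs a SHA-256 exact hash with an average hash, a simple perceptual fingerprint used as a stand-in because the plan leaves the final algorithm open. It assumes the image has already been decoded, converted to grayscale, and resized to a small grid (for example 8x8) elsewhere; the function names and the `max_distance` threshold are hypothetical.

```python
import hashlib


def exact_hash(data: bytes) -> str:
    """Exact content hash for repeat-upload and idempotency checks."""
    return hashlib.sha256(data).hexdigest()


def average_hash(gray_rows) -> int:
    """Average hash: one bit per pixel, set when the pixel is brighter than the mean."""
    pixels = [p for row in gray_rows for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def is_similar(a: int, b: int, max_distance: int = 8) -> bool:
    """Threshold is an assumption; tune it against pilot samples."""
    return hamming_distance(a, b) <= max_distance
```

A uniformly brightened copy keeps the same average hash because every pixel and the mean shift together, which is the kind of robustness to mild transforms the approach asks for.
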
**Execution note:** Implement behavior test-first around the privacy boundary: face detection can emit count/presence evidence, but cannot emit identity or biometric template artifacts.

**Patterns to follow:**
- Use approved local libraries or internal services that can operate without sending original images to unapproved third parties.

**Test scenarios:**
- Covers AE1. Happy path: a resized or recompressed copy of a prior rejected image produces a high-similarity evidence reason.
- Covers AE2. Happy path: a normal face-containing image records face/person presence but does not become high risk without stronger celebrity/IP evidence.
- Edge case: no-face images produce no face/person risk reason.
- Error path: detector failure records an analysis failure reason and does not lower other high-risk evidence.
- Privacy: face/person analysis output contains no identity label, face embedding, or biometric template.
- Integration: internal evidence feeds U6 score components with source and confidence metadata.

**Verification:**
- Internal analysis can run with external API disabled and still produce useful duplicate, rejection, and face/person evidence.

### U4. Operator Knowledge Base

**Goal:** Build management and matching support for manual risk entities and automatic rejected-image accumulation.

**Requirements:** R8, R10; F3, AE7, AE8

**Dependencies:** U1

**Files:**
- Create: `src/rights_filter/domain/knowledge_base.*`
- Create: `src/rights_filter/admin/knowledge_base_handlers.*`
- Create: `src/rights_filter/admin/knowledge_base_views.*`
- Test: `tests/rights_filter/domain/test_knowledge_base.*`
- Test: `tests/rights_filter/admin/test_knowledge_base_handlers.*`

**Approach:**
- Support entity records for celebrity, group, work, character, webtoon, game, and other policy-relevant types.
- Store names, aliases, related keywords, image-level sample fingerprints, policy memo, exception conditions, active state, and provenance.
- Automatically add rejected image fingerprints as matching references while distinguishing them from operator-created entries.
- Provide correction and deactivation flows so bad rejections or bad samples do not keep poisoning future scores.

**Patterns to follow:**
- Match target app admin patterns for CRUD, audit fields, and permission checks.

**Test scenarios:**
- Happy path: operator creates a Korean celebrity entity with aliases, sample fingerprints, memo, and exception notes.
- Happy path: a rejected decision creates an automatic reference that participates in future similarity matching.
- Edge case: deactivating an automatic reference removes it from matching but preserves audit history.
- Edge case: manual and automatic entries with similar names remain separate provenance classes.
- Error path: attempting to upload a face biometric template or identity embedding is rejected by policy validation.
- Integration: U3 matching can query only active entries while U8 can show both active and inactive history.

**Verification:**
- The knowledge base can start empty and become useful through operator decisions and registrations without requiring seed data.

### U5. External Web Detection Adapter

**Goal:** Integrate Google Cloud Vision `WEB_DETECTION` as a compliance-gated, synchronous, disabled-by-default adapter.

**Requirements:** R5, R6, R11; AE3, AE4

**Dependencies:** U1, U2

**Files:**
- Create: `src/rights_filter/integrations/external_policy.*`
- Create: `src/rights_filter/integrations/cloud_vision_web_detection.*`
- Create: `src/rights_filter/integrations/web_detection_result_mapper.*`
- Test: `tests/rights_filter/integrations/test_external_policy.*`
- Test: `tests/rights_filter/integrations/test_cloud_vision_web_detection.*`
- Test: `tests/rights_filter/integrations/test_web_detection_result_mapper.*`

**Approach:**
- Gate all external calls behind explicit compliance configuration that records approval status, allowed operation type, metadata logging acceptance, and usage limits.
- Send only EXIF-stripped external derivatives, never internal originals.
- Use only synchronous online web detection; do not use asynchronous offline batch operations.
- Normalize web entities, best guess labels, full/partial/visually similar images, and pages with matching images into evidence records.
- Store call outcome, source, failure reason, and quota state without storing returned content beyond operational evidence needed by the origin requirements.

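The compliance gate could be expressed as a pure function that every adapter call must pass first. The following is an illustrative sketch only: `ExternalPolicy`, `check_external_call`, the operation and data-class identifiers, and the reason strings are hypothetical names, and all defaults are chosen so the adapter is disabled unless explicitly approved.

```python
from dataclasses import dataclass

# Hypothetical identifiers for the only permitted operation and data class.
ALLOWED_OPERATION = "online_web_detection"
EXTERNAL_DERIVATIVE = "external_derivative"


@dataclass(frozen=True)
class ExternalPolicy:
    external_enabled: bool = False
    compliance_approved: bool = False
    metadata_logging_accepted: bool = False
    daily_limit: int = 0


def check_external_call(policy, operation, data_class, calls_today):
    """Return (allowed, reason); the reason is stored as evidence when blocked."""
    if not policy.external_enabled:
        return False, "external_disabled"
    if not policy.compliance_approved:
        return False, "compliance_not_approved"
    if not policy.metadata_logging_accepted:
        return False, "metadata_logging_not_accepted"
    if operation != ALLOWED_OPERATION:
        return False, "operation_not_allowed"  # e.g. asynchronous offline batch
    if data_class != EXTERNAL_DERIVATIVE:
        return False, "not_external_derivative"  # originals never leave the system
    if calls_today >= policy.daily_limit:
        return False, "quota_exhausted"
    return True, "allowed"
```

Because the function has no side effects, the disabled-mode and blocked-operation cases can be covered by contract tests before any real credentials exist.
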
**Execution note:** Start with adapter contract tests using a fake client so disabled-mode, failure-mode, and evidence mapping are correct before enabling real credentials.

**Patterns to follow:**
- Follow target app secrets management and outbound HTTP client patterns.

**Test scenarios:**
- Covers AE3. Happy path: when compliance is approved and API is enabled, only the external derivative is sent and mapped evidence is stored.
- Covers AE3. Disabled mode: when compliance is not approved or external mode is disabled, no outbound call occurs and internal scoring continues.
- Happy path: web entities and matching-image URLs become operator-readable evidence reasons.
- Edge case: incidental global IP evidence is stored as evidence but does not activate dedicated brand/global-IP workflows.
- Error path: API timeout, quota exhaustion, or provider error records a failure reason and leaves existing high-risk evidence intact.
- Policy: attempting to call asynchronous offline batch mode is blocked by adapter policy.

**Verification:**
- External API can be switched off without breaking the batch pipeline.
- Provider results are auditable but not treated as final legal decisions.

### U6. Evidence Normalization and Risk Scoring

**Goal:** Combine internal analysis, external web evidence, knowledge-base matches, failures, and prior decisions into a transparent 0-100 risk score with reason components.

**Requirements:** R2, R3, R4, R6; F1, F2, AE1, AE2, AE4, AE5

**Dependencies:** U1, U3, U4, U5

**Files:**
- Create: `src/rights_filter/analysis/evidence_normalizer.*`
- Create: `src/rights_filter/analysis/risk_scoring.*`
- Create: `src/rights_filter/analysis/reason_builder.*`
- Test: `tests/rights_filter/analysis/test_evidence_normalizer.*`
- Test: `tests/rights_filter/analysis/test_risk_scoring.*`
- Test: `tests/rights_filter/analysis/test_reason_builder.*`

**Approach:**
- Define score bands for low, medium, and high risk while keeping the raw 0-100 score visible to operators.
- Use component reasons such as prior rejection similarity, active sample match, web entity strength, matching URL evidence, face/person presence, and analysis failure.
- Treat face/person presence as a modest signal unless combined with stronger celebrity/IP evidence.
- Treat fanart and AI-style images as high risk only when supported by strong entity, work, character, URL, or similarity evidence.
- Ensure failures add uncertainty/review reasons without reducing other risk components.
- Version the scoring model so score changes remain explainable.

**Patterns to follow:**
|
||||||
|
- Prefer a small configurable scoring table over hard-coded branching spread across analyzers.
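
A small scoring table of the kind described above might look like the following sketch. The component names mirror the approach list, but the weights, band thresholds, version label, and the `score` function are placeholder assumptions, not tuned or confirmed values.

```python
from dataclasses import dataclass

SCORING_VERSION = "v1-sketch"  # hypothetical version label

# Hypothetical weights; real values would be tuned on pilot data.
WEIGHTS = {
    "prior_rejection_similarity": 70,
    "active_sample_match": 60,
    "web_entity": 40,
    "matching_url": 35,
    "face_person": 10,       # modest signal on its own
    "analysis_failure": 0,   # failures add a reason, never points
}

# (floor, label) pairs checked from highest to lowest; thresholds are assumptions.
BANDS = ((70, "high"), (40, "medium"), (0, "low"))


@dataclass(frozen=True)
class ScoreResult:
    score: int
    band: str
    reasons: tuple
    version: str


def score(components: dict) -> ScoreResult:
    """components maps a component name to a strength in [0.0, 1.0]."""
    total = 0.0
    reasons = []
    for name, strength in components.items():
        strength = min(max(strength, 0.0), 1.0)
        points = WEIGHTS.get(name, 0) * strength
        total += points
        reasons.append((name, round(points)))
    clamped = min(100, round(total))
    band = next(label for floor, label in BANDS if clamped >= floor)
    reasons.sort(key=lambda item: item[1], reverse=True)
    return ScoreResult(clamped, band, tuple(reasons), SCORING_VERSION)
```

Because a failure carries zero weight, it shows up in `reasons` without raising or lowering the other components, and the version string travels with every result so that later weight changes stay explainable.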
|
||||||
|
|
||||||
|
**Test scenarios:**
|
||||||
|
- Covers AE1. Prior rejected-image similarity raises risk and produces a clear reason.
|
||||||
|
- Covers AE2. Face/person presence alone stays below high risk.
|
||||||
|
- Covers AE4. Character/work web evidence plus matching-image evidence can produce high risk for non-photo fanart or AI-style images.
|
||||||
|
- Covers AE5. External API failure adds a failure reason but does not lower an existing high-risk similarity match.
|
||||||
|
- Edge case: conflicting evidence is preserved in reasons rather than collapsed into an unexplained score.
|
||||||
|
- Regression: changing scoring version does not mutate historical analysis run results.
|
||||||
|
|
||||||
|
**Verification:**
|
||||||
|
- Operators can read the top reasons and understand why the score was low, medium, or high without inspecting raw provider payloads.
|
||||||
|
|
||||||
|
### U7. Batch Orchestration, Idempotency, and Operations
|
||||||
|
|
||||||
|
**Goal:** Run the analysis pipeline across all submitted images with retries, usage limits, idempotency, and partial-failure handling.
|
||||||
|
|
||||||
|
**Requirements:** R1, R5, R6, R9, R11; F1, AE3, AE5
|
||||||
|
|
||||||
|
**Dependencies:** U1, U2, U3, U5, U6
|
||||||
|
|
||||||
|
**Files:**
|
||||||
|
- Create: `src/rights_filter/jobs/batch_analyzer.*`
|
||||||
|
- Create: `src/rights_filter/jobs/analysis_worker.*`
|
||||||
|
- Create: `src/rights_filter/jobs/retry_policy.*`
|
||||||
|
- Create: `src/rights_filter/jobs/usage_limits.*`
|
||||||
|
- Test: `tests/rights_filter/jobs/test_batch_analyzer.*`
|
||||||
|
- Test: `tests/rights_filter/jobs/test_analysis_worker.*`
|
||||||
|
- Test: `tests/rights_filter/jobs/test_retry_policy.*`
|
||||||
|
- Test: `tests/rights_filter/jobs/test_usage_limits.*`
|
||||||
|
|
||||||
|
**Approach:**
|
||||||
|
- Select submitted images that need initial or version-triggered analysis, avoiding duplicate work for the same image and scoring version.
|
||||||
|
- Run internal analysis for every eligible image; run external web detection only when policy and usage limits allow it.
|
||||||
|
- Record partial failures per source so one failed provider does not invalidate the whole run.
|
||||||
|
- Provide retry behavior for transient errors and terminal failure reasons for persistent errors.
|
||||||
|
- Emit operational counters for processed, skipped, failed, externally called, externally skipped, and limit-exhausted items.
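
The selection, gating, and counter steps above can be sketched as one loop. This is a simplified illustration under stated assumptions: `run_batch`, the `(image_id, scoring_version)` idempotency key, the counter names, and the choice to consume external budget only after a successful call are hypothetical, and retries are omitted.

```python
from collections import Counter
from typing import Callable, Iterable


def run_batch(
    image_ids: Iterable[str],
    scoring_version: str,
    completed: set,
    analyze_internal: Callable[[str], None],
    analyze_external: Callable[[str], None],
    external_budget: int,
) -> Counter:
    """Analyze each image once per scoring version and count outcomes."""
    counters = Counter()
    for image_id in image_ids:
        key = (image_id, scoring_version)
        if key in completed:
            counters["skipped"] += 1  # already analyzed for this version
            continue
        try:
            analyze_internal(image_id)
        except Exception:
            counters["failed"] += 1  # left out of `completed` so it can be retried
            continue
        if external_budget > 0:
            try:
                analyze_external(image_id)
                external_budget -= 1  # budget is consumed only on success
                counters["externally_called"] += 1
            except Exception:
                # A failed provider does not invalidate the internal result.
                counters["external_failed"] += 1
        else:
            counters["limit_exhausted"] += 1
        completed.add(key)
        counters["processed"] += 1
    return counters
```

Re-running the function with the same `completed` set and scoring version produces only `skipped` counts, which is the idempotency property the test scenarios ask for.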
|
||||||
|
|
||||||
|
**Patterns to follow:**
|
||||||
|
- Use the target app's background job, scheduler, queue, and observability conventions.
|
||||||
|
|
||||||
|
**Test scenarios:**
|
||||||
|
- Happy path: all pending submissions receive analysis runs and score results.
|
||||||
|
- Happy path: re-running the batch does not duplicate analysis when inputs and scoring version have not changed.
|
||||||
|
- Edge case: external API usage limit is reached; remaining images still receive internal-only analysis and external-skipped reasons.
|
||||||
|
- Error path: corrupt images and transient provider failures produce failure reasons and retry behavior according to policy.
|
||||||
|
- Integration: batch execution creates evidence records, risk score records, and operator-visible summaries for each processed image.
|
||||||
|
|
||||||
|
**Verification:**
|
||||||
|
- The batch can run safely on a schedule and can be paused or degraded to internal-only mode without data loss.
|
||||||
|
|
||||||
|
### U8. Operator Review Integration
|
||||||
|
|
||||||
|
**Goal:** Expose risk scores, reasons, evidence, and final operator decisions to the internal review workflow without showing automated analysis to applicants.
|
||||||
|
|
||||||
|
**Requirements:** R6, R7, R9; F2, F3, AE6, AE7
|
||||||
|
|
||||||
|
**Dependencies:** U1, U4, U6
|
||||||
|
|
||||||
|
**Files:**
|
||||||
|
- Create: `src/rights_filter/admin/review_handlers.*`
|
||||||
|
- Create: `src/rights_filter/admin/review_presenters.*`
|
||||||
|
- Create: `src/rights_filter/admin/decision_feedback.*`
|
||||||
|
- Test: `tests/rights_filter/admin/test_review_handlers.*`
|
||||||
|
- Test: `tests/rights_filter/admin/test_review_presenters.*`
|
||||||
|
- Test: `tests/rights_filter/admin/test_decision_feedback.*`
|
||||||
|
|
||||||
|
**Approach:**
|
||||||
|
- Add operator-facing access to score, band, top reasons, evidence links, external API usage marker, and analysis failures.
|
||||||
|
- Keep the status action separate: operators explicitly choose approved, held, or rejected.
|
||||||
|
- Ensure applicant-facing surfaces cannot read risk score, evidence, automated reasons, or provider details.
|
||||||
|
- On rejection, trigger knowledge-base accumulation of image fingerprints and provenance for future matching.
|
||||||
|
- Allow operator memo and correction flows to update decisions and mark derived knowledge-base entries for deactivation review.
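
One way to keep the operator and applicant surfaces apart, as the approach above requires, is to build them from separate functions so the applicant path never receives analysis data. The sketch below is illustrative only; `AnalysisSummary`, `operator_view`, `applicant_view`, `apply_decision`, and every field name are hypothetical.

```python
from dataclasses import dataclass

OPERATOR_ACTIONS = ("approved", "held", "rejected")


@dataclass(frozen=True)
class AnalysisSummary:
    """Hypothetical operator-only analysis output."""
    score: int
    band: str
    top_reasons: tuple = ()
    failures: tuple = ()
    used_external_api: bool = False


def operator_view(submission_id: str, status: str, summary: AnalysisSummary) -> dict:
    """Everything an operator needs to make a manual decision."""
    return {
        "submission_id": submission_id,
        "status": status,
        "score": summary.score,
        "band": summary.band,
        "top_reasons": list(summary.top_reasons),
        "failures": list(summary.failures),
        "used_external_api": summary.used_external_api,
    }


def applicant_view(submission_id: str, status: str) -> dict:
    # Takes no analysis argument at all, so it cannot leak one.
    return {"submission_id": submission_id, "status": status}


def apply_decision(action: str, operator_id: str) -> dict:
    """Status changes only through an explicit operator action."""
    if action not in OPERATOR_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    if not operator_id:
        raise ValueError("a decision requires an operator")
    return {
        "status": action,
        "decided_by": operator_id,
        # Rejections feed the knowledge base; other actions do not.
        "needs_knowledge_base_entry": action == "rejected",
    }
```

Nothing in this sketch derives a status from the score, which keeps the high-risk band informational.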
|
||||||
|
|
||||||
|
**Patterns to follow:**
|
||||||
|
- Match the target app's admin authorization, audit logging, and review-state transition patterns.
|
||||||
|
|
||||||
|
**Test scenarios:**
|
||||||
|
- Covers AE6. A high-risk score appears to operators but does not change application status automatically.
|
||||||
|
- Covers AE6. Applicant-facing endpoints or views do not expose score, evidence, or automatic reasons.
|
||||||
|
- Covers AE7. Rejection creates an automatic knowledge-base reference that records the source decision as provenance.
|
||||||
|
- Edge case: corrected rejection can mark derived reference entries as inactive or requiring operator review.
|
||||||
|
- Error path: missing or failed analysis is shown as a review reason rather than hiding the submission from the queue.
|
||||||
|
- Integration: operator review surface can display internal-only, external-enriched, and partial-failure analysis runs consistently.
|
||||||
|
|
||||||
|
**Verification:**
|
||||||
|
- Operators can make final decisions from the evidence view while automated analysis remains internal-only and non-authoritative.
|
||||||
|
|
||||||
|
### U9. Governance, Retention, and Runbooks
|
||||||
|
|
||||||
|
**Goal:** Add policy enforcement, operational controls, and documentation for sensitive image data, derivatives, fingerprints, evidence, external API usage, and corrections.
|
||||||
|
|
||||||
|
**Requirements:** R5, R9, R10, R11; AE3, AE8
|
||||||
|
|
||||||
|
**Dependencies:** U1, U2, U5, U7, U8
|
||||||
|
|
||||||
|
**Files:**
|
||||||
|
- Create: `src/rights_filter/governance/retention_policy.*`
|
||||||
|
- Create: `src/rights_filter/governance/access_policy.*`
|
||||||
|
- Create: `src/rights_filter/governance/correction_policy.*`
|
||||||
|
- Create: `docs/operations/image-rights-risk-filter.md`
|
||||||
|
- Test: `tests/rights_filter/governance/test_retention_policy.*`
|
||||||
|
- Test: `tests/rights_filter/governance/test_access_policy.*`
|
||||||
|
- Test: `tests/rights_filter/governance/test_correction_policy.*`
|
||||||
|
|
||||||
|
**Approach:**
|
||||||
|
- Define policy-managed classes for original images, external derivatives, internal derivatives, fingerprints, evidence, provider metadata, and operator notes.
|
||||||
|
- Enforce access boundaries for operator-only analysis output.
|
||||||
|
- Provide retention and deletion hooks that can be wired to the target app's storage cleanup mechanisms.
|
||||||
|
- Document external API enablement prerequisites: contract/DPA review, metadata logging acceptance, region/credential policy, quota limits, and rollback to internal-only mode.
|
||||||
|
- Document what the system is not allowed to do: no scraping, no applicant exposure, no face recognition, no biometric template store, no automatic status change.
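
The policy-managed classes above can be expressed as a lookup table that tests check for completeness. In this sketch the class names follow the approach list, while `DataPolicy`, the role names, and especially the retention numbers are placeholder assumptions rather than approved values.

```python
from dataclasses import dataclass
from enum import Enum


class DataClass(Enum):
    ORIGINAL_IMAGE = "original_image"
    EXTERNAL_DERIVATIVE = "external_derivative"
    INTERNAL_DERIVATIVE = "internal_derivative"
    FINGERPRINT = "fingerprint"
    EVIDENCE = "evidence"
    PROVIDER_METADATA = "provider_metadata"
    OPERATOR_NOTE = "operator_note"


@dataclass(frozen=True)
class DataPolicy:
    retention_days: int   # placeholder numbers, not approved values
    operator_only: bool


POLICIES = {
    DataClass.ORIGINAL_IMAGE: DataPolicy(365, operator_only=False),
    DataClass.EXTERNAL_DERIVATIVE: DataPolicy(7, operator_only=True),
    DataClass.INTERNAL_DERIVATIVE: DataPolicy(30, operator_only=True),
    DataClass.FINGERPRINT: DataPolicy(365, operator_only=True),
    DataClass.EVIDENCE: DataPolicy(365, operator_only=True),
    DataClass.PROVIDER_METADATA: DataPolicy(90, operator_only=True),
    DataClass.OPERATOR_NOTE: DataPolicy(365, operator_only=True),
}


def missing_policies() -> list:
    """Data classes that have no assigned policy yet."""
    return [data_class for data_class in DataClass if data_class not in POLICIES]


def can_read(data_class: DataClass, role: str) -> bool:
    """Operators read everything; other roles read only non-restricted classes."""
    if role == "operator":
        return True
    return not POLICIES[data_class].operator_only
```

A governance test can then assert that `missing_policies()` is empty, so adding a new data class without a policy fails fast.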
|
||||||
|
|
||||||
|
**Patterns to follow:**
|
||||||
|
- Use the target app's policy, permission, and runbook conventions.
|
||||||
|
|
||||||
|
**Test scenarios:**
|
||||||
|
- Happy path: each data class has an assigned policy for access, retention, deletion, and correction handling.
|
||||||
|
- Covers AE3. External API cannot be enabled until required compliance settings are present.
|
||||||
|
- Covers AE8. A sample image entry stores image-level fingerprint evidence but no face biometric template.
|
||||||
|
- Edge case: deleting or correcting a source decision marks dependent automatic knowledge-base entries for deactivation handling.
|
||||||
|
- Error path: unauthorized access to operator-only evidence is denied and audited.
|
||||||
|
- Operational: disabling external API mode causes the batch to continue internal-only and records skipped-external reasons.
|
||||||
|
|
||||||
|
**Verification:**
|
||||||
|
- A deployer has enough runbook detail to enable, disable, audit, and roll back the filter safely.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## System-Wide Impact
|
||||||
|
|
||||||
|
- **Interaction graph:** The subsystem touches image submission storage, background job scheduling, external outbound API configuration, operator review, and decision feedback.
|
||||||
|
- **Error propagation:** Source-specific failures become evidence reasons and operational counters, not silent success or low-risk results.
|
||||||
|
- **State lifecycle risks:** Analysis runs are versioned; rejected decisions can create derived knowledge-base entries; corrected decisions must not leave stale active references behind.
|
||||||
|
- **API surface parity:** Operator-only review surfaces may read scores and evidence; applicant-facing surfaces must not.
|
||||||
|
- **Integration coverage:** Batch-to-evidence-to-score-to-review-to-knowledge-base feedback must be tested as a full flow.
|
||||||
|
- **Unchanged invariants:** Existing submission and operator decision status semantics remain authoritative; the filter never changes status directly.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Risks & Dependencies
|
||||||
|
|
||||||
|
| Risk | Mitigation |
|------|------------|
| External API data terms are misunderstood | Keep external adapter disabled until contract, DPA, content retention, request metadata logging, region, and credential controls are approved. |
| False positives poison future matching | Keep provenance, deactivation, correction, and audit trails for all automatic rejected-image entries. |
| Face/person detection drifts into identity recognition | Validate output shape, forbid face embeddings/biometric templates, and document the boundary in governance policy. |
| Batch costs grow unexpectedly | Add usage limits, idempotency, external disable mode, and processed/skipped/failed counters. |
| Operators over-trust scores | Present reasons and evidence, keep final decision manual, and label provider output as evidence rather than legal judgment. |
| Actual application stack differs from proposed paths | Adapt module paths to the real app before implementation while preserving unit boundaries and tests. |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Alternative Approaches Considered
|
||||||
|
|
||||||
|
- External-search-only filter: rejected because it would create an unnecessary third-party dependency and behave worse when API use is disabled.
|
||||||
|
- Local-only filter: rejected for v1 because the knowledge base starts empty and would miss many celebrity/IP references before operator data accumulates.
|
||||||
|
- Automatic hold/reject workflow: rejected by origin requirements because operator judgment must remain authoritative.
|
||||||
|
- Face-recognition or celebrity-identification model: rejected because the origin explicitly excludes specific individual face recognition and biometric template storage.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Phased Delivery
|
||||||
|
|
||||||
|
### Phase 1: Internal foundation
|
||||||
|
|
||||||
|
- Land U1, U2, U3, and the internal-only path through U6.
|
||||||
|
- Operators can see internal duplicate/rejection/face-person/failure reasons with no external API dependency.
|
||||||
|
|
||||||
|
### Phase 2: Knowledge and review loop
|
||||||
|
|
||||||
|
- Land U4 and U8 so operator decisions create reusable knowledge and correction paths.
|
||||||
|
|
||||||
|
### Phase 3: External enrichment
|
||||||
|
|
||||||
|
- Land U5 after compliance approval, then connect it through U6 and U7 with usage limits.
|
||||||
|
|
||||||
|
### Phase 4: Governance hardening
|
||||||
|
|
||||||
|
- Land U9 runbooks, retention hooks, access tests, and the production enablement checklist before external API rollout.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Documentation / Operational Notes
|
||||||
|
|
||||||
|
- Document how to run the system in internal-only mode and how to disable external calls immediately.
|
||||||
|
- Document the Cloud Vision approval checklist, including online synchronous-only usage and metadata logging review.
|
||||||
|
- Document how operators should interpret low, medium, and high scores without treating them as legal conclusions.
|
||||||
|
- Document how to deactivate or correct automatically generated knowledge-base references after an erroneous rejection.
|
||||||
|
- Document retention classes for originals, derivatives, fingerprints, provider evidence, and operator notes.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Sources & References
|
||||||
|
|
||||||
|
- **Origin document:** [docs/brainstorms/2026-05-25-image-rights-risk-filter-requirements.md](docs/brainstorms/2026-05-25-image-rights-risk-filter-requirements.md)
|
||||||
|
- Google Cloud Vision Data Usage FAQ: https://docs.cloud.google.com/vision/docs/data-usage
|
||||||
|
- Google Cloud Vision Web Detection: https://docs.cloud.google.com/vision/docs/detecting-web
|
||||||
|
- Google Cloud Vision Face Detection: https://docs.cloud.google.com/vision/docs/detecting-faces
|
||||||
|
|
@@ -0,0 +1,570 @@
|
||||||
|
---
title: "feat: Add Image Rights Review Enrichment"
type: feat
status: completed
date: 2026-05-25
origin: docs/brainstorms/2026-05-25-image-rights-review-enrichment-requirements.md
---
|
||||||
|
|
||||||
|
# feat: Add Image Rights Review Enrichment
|
||||||
|
|
||||||
|
## Summary
|
||||||
|
|
||||||
|
Extend the existing portable `rights_filter` core with search-enriched evidence, internal LLM-assisted query and summary boundaries, and a detailed operator review view model. Because this workspace has no real web admin app or database, the plan delivers backend contracts and tests that a target admin UI can render without re-deciding product behavior.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Problem Frame
|
||||||
|
|
||||||
|
The current filter can analyze images, score risk, and expose a basic operator summary. Operators still need a single detailed review surface where internal evidence, Korean search evidence, provider evidence, LLM summaries, failures, and final manual actions are grouped coherently.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Requirements
|
||||||
|
|
||||||
|
- R1. Provide a detailed operator review representation containing image reference, 0-100 score, band, top reasons, evidence groups, provider status, analysis failures, and manual decision actions. (Origin R1-R5, F2, AE7)
|
||||||
|
- R2. Add Naver search evidence as official text-query evidence only; do not upload images or automate/scrape Naver web UI. (Origin R6-R10, R20, AE1, AE2)
|
||||||
|
- R3. Add internal LLM assistance for query generation, search-result structuring, contradiction/deduplication, and operator summaries only. (Origin R11-R14, AE3)
|
||||||
|
- R4. Ensure LLM output is never a standalone scoring source or final decision authority. (Origin R12-R13, AE3)
|
||||||
|
- R5. Integrate enrichment failures and skipped providers as operator-visible reasons without reducing existing high-risk evidence. (Origin R5, R21, R23, AE4)
|
||||||
|
- R6. Keep approval, hold, and rejection as explicit operator actions; automated analysis must not change status. (Origin R15, AE5)
|
||||||
|
- R7. Preserve applicant isolation: applicants cannot see scores, search evidence, LLM summaries, provider details, or analysis failure reasons. (Origin R19, AE7)
|
||||||
|
- R8. Support rejection-derived knowledge accumulation and correction/deactivation paths that prevent bad decisions from poisoning future matching. (Origin R16-R18, F3, F4, AE5, AE6)
|
||||||
|
- R9. Keep existing Google Web Detection compliance gates and no-scraping boundaries intact. (Origin R20-R22)
|
||||||
|
|
||||||
|
**Origin actors:** A1 applicant, A2 operator, A3 rights risk filter, A4 internal LLM, A5 Naver Search API, A6 Google Cloud Vision Web Detection
|
||||||
|
|
||||||
|
**Origin flows:** F1 search-enriched analysis, F2 detailed review, F3 decision-based criteria DB accumulation, F4 correction and contamination prevention
|
||||||
|
|
||||||
|
**Origin acceptance examples:** AE1, AE2, AE3, AE4, AE5, AE6, AE7
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Scope Boundaries
|
||||||
|
|
||||||
|
- No real web admin frontend is built in this workspace. The deliverable is a backend review view model and integration contract that a target app can render.
|
||||||
|
- No Naver image-upload reverse search.
|
||||||
|
- No Google Image Search, Google Lens, Naver web UI automation, or scraping.
|
||||||
|
- No external LLM integration in this iteration.
|
||||||
|
- No LLM-based legal judgment, score calculation, automatic approval, or automatic rejection.
|
||||||
|
- No applicant-facing explanation, rights-evidence upload, or appeal UI.
|
||||||
|
- No dedicated brand/logo/trademark/stock-image detector; strong incidental evidence can still appear as operator evidence.
|
||||||
|
- No face recognition, celebrity identification from faces, face embeddings, or biometric template storage.
|
||||||
|
|
||||||
|
### Deferred to Follow-Up Work
|
||||||
|
|
||||||
|
- Target admin UI implementation: wire the detailed view model into the actual application once the app framework, routes, auth, and database exist.
|
||||||
|
- Full criteria database management screen: keep only the backend hooks needed for review feedback and contamination control here.
|
||||||
|
- Search-quality calibration: run pilot samples and tune query generation, result promotion, and scoring weights after the adapters exist.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Context & Research
|
||||||
|
|
||||||
|
### Relevant Code and Patterns
|
||||||
|
|
||||||
|
- `src/rights_filter/domain/records.py` defines immutable-ish evidence records, analysis runs, knowledge-base entries, review statuses, data classes, and the in-memory repository.
|
||||||
|
- `src/rights_filter/jobs/batch_analyzer.py` orchestrates internal analysis, external derivative creation, Cloud Vision evidence, scoring, and run persistence.
|
||||||
|
- `src/rights_filter/integrations/cloud_vision_web_detection.py` uses a fake client boundary and maps provider results into `Evidence` records.
|
||||||
|
- `src/rights_filter/integrations/external_policy.py` provides a simple policy gate with disabled/compliance/metadata/online/quota checks.
|
||||||
|
- `src/rights_filter/admin/review_handlers.py` already separates operator-visible summaries from applicant summaries and creates automatic rejected-image entries on rejection.
|
||||||
|
- `src/rights_filter/analysis/risk_scoring.py` scores by evidence source and keeps failures from acting as exculpatory evidence.
|
||||||
|
- `tests/rights_filter/test_public_module_layout.py` protects the public module boundary; new planned modules should be added there.
|
||||||
|
|
||||||
|
### Institutional Learnings
|
||||||
|
|
||||||
|
- No `docs/solutions/` directory or prior institutional learning notes are present.
|
||||||
|
- There is no `STRATEGY.md`, `AGENTS.md`, or `CLAUDE.md` file in the workspace root; the active instructions are from the conversation context.
|
||||||
|
|
||||||
|
### External References
|
||||||
|
|
||||||
|
- Naver image search API is a REST API that accepts search terms and conditions as query-string data, not image uploads. It documents image search result fields such as title, link, thumbnail, size, and daily Search API quota. https://developers.naver.com/docs/serviceapi/search/image/image.md
|
||||||
|
- Naver Search API product page lists web, news, blog, image, encyclopedia, and other search surfaces, with a 25,000/day processing limit. https://developers.naver.com/products/service-api/search/search.md
|
||||||
|
- Naver API terms state API use is subject to provided conditions, allowed counts, client ID management, and policy compliance. https://developers.naver.com/products/terms
|
||||||
|
- Google Cloud Vision Data Usage FAQ explains the data-handling distinction that must remain part of the external enablement checklist. https://docs.cloud.google.com/vision/docs/data-usage
|
||||||
|
- Google Cloud Vision Web Detection can return web entities, matching images, visually similar images, pages with matching images, and best guess labels. https://docs.cloud.google.com/vision/docs/detecting-web
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Key Technical Decisions
|
||||||
|
|
||||||
|
- Backend view model first: since no admin app exists locally, implement the detailed review surface as structured presenter output rather than a web page.
|
||||||
|
- Evidence-source expansion over special-case strings: represent Naver search, LLM summaries, and enrichment skipped/failure states as first-class evidence sources so grouping, scoring, and presentation remain explicit.
|
||||||
|
- Text-only Naver adapter: mirror the existing fake-client adapter pattern, but accept only query text and provider options; image payload types must not be part of the Naver adapter interface.
|
||||||
|
- LLM as evidence organizer: use an internal LLM boundary that emits candidate queries and summaries tied to source evidence IDs or source URLs; never let LLM output feed scoring directly.
|
||||||
|
- Search-result promotion is conservative: Naver evidence contributes meaningful risk only when linked to named persons, works, characters, broadcasts, webtoons, games, official pages, or repeated matching-image sources.
|
||||||
|
- Enrichment orchestration is separate from the existing batch analyzer until the flow is proven: keep a focused enrichment job that can be called after or within batch analysis without destabilizing the internal-only path.
|
||||||
|
- Correction is provenance-driven: automatic rejection-derived entries must stay distinguishable from manual entries and deactivatable from the source decision path.
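
The conservative promotion decision above can be illustrated with a small filter that promotes a search result only when its title mentions a known name or its host appears repeatedly. This is a sketch under assumptions: `promote`, the `title`/`link` keys, the `known_names` input, and the `min_repeat` threshold are hypothetical, and real promotion would also consider official pages and matching-image sources.

```python
from collections import Counter
from urllib.parse import urlparse


def promote(results: list, known_names: list, min_repeat: int = 2) -> list:
    """Keep only results that are linked to a known name or a repeated source."""
    hosts = Counter(urlparse(result["link"]).netloc for result in results)
    promoted = []
    for result in results:
        title = result["title"].casefold()
        matched = [name for name in known_names if name.casefold() in title]
        repeated = hosts[urlparse(result["link"]).netloc] >= min_repeat
        if matched or repeated:
            promoted.append(
                {**result, "matched_names": matched, "repeated_source": repeated}
            )
    return promoted
```

Results that match nothing stay in the evidence ledger for operators but contribute no risk, which is the conservative default the decision describes.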
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Open Questions
|
||||||
|
|
||||||
|
### Resolved During Planning
|
||||||
|
|
||||||
|
- Detailed review screen form: implement a backend presenter/view model now; defer a real web admin screen because no app shell exists in this workspace.
|
||||||
|
- Naver role: use official text-query search only, with fake-client tests; do not perform image upload reverse search.
|
||||||
|
- LLM role: query generation, result structuring, and operator summary only; no score or final status authority.
|
||||||
|
- Rejection feedback default: create explicit automatic entries or candidates with provenance, and require correction/deactivation mechanics in the same iteration.
|
||||||
|
|
||||||
|
### Deferred to Implementation
|
||||||
|
|
||||||
|
- Exact Naver request tuning: final display count, sort order, and query variants should be validated with pilot samples and quota behavior.
|
||||||
|
- Internal LLM runtime: choose the actual local/internal model, prompt storage, logging policy, and deployment boundary in the target environment.
|
||||||
|
- Real admin app integration: map the review view model to framework routes, components, auth, and storage once the target app exists.
|
||||||
|
- Final score weights: tune Naver evidence contribution after sample outcomes are collected; start conservative.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Output Structure
|
||||||
|
|
||||||
|
    src/
      rights_filter/
        analysis/
          evidence_enrichment.py
          llm_assistance.py
          search_result_promoter.py
        admin/
          detailed_review_presenter.py
          knowledge_base_handlers.py
          correction_handlers.py
        integrations/
          naver_search.py
          search_policy.py
        jobs/
          review_enrichment_job.py
    tests/
      rights_filter/
        analysis/
        admin/
        integrations/
        jobs/
    docs/
      operations/
|
||||||
|
|
||||||
|
The tree shows proposed additions. The implementer may adjust names to fit the target application, but should preserve the same boundaries.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## High-Level Technical Design
|
||||||
|
|
||||||
|
> *This illustrates the intended approach and is directional guidance for review, not implementation specification. The implementing agent should treat it as context, not code to reproduce.*
|
||||||
|
|
||||||
|
```mermaid
flowchart TB
    Run[Existing analysis run] --> Query[Internal LLM query generation]
    Query --> NaverGate{Naver policy allows?}
    NaverGate -->|yes| Naver[Naver text-query search]
    NaverGate -->|no| SearchSkipped[Search skipped evidence]
    Naver --> Promote[Search result promotion]
    Promote --> Ledger[Evidence ledger]
    SearchSkipped --> Ledger
    Run --> Ledger
    Ledger --> Summary[Internal LLM evidence summary]
    Summary --> Review[Detailed operator review view model]
    Review --> Decision[Manual approve / hold / reject]
    Decision --> Knowledge[Knowledge-base feedback]
    Knowledge --> Correction[Correction / deactivation]
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Implementation Units
|
||||||
|
|
||||||
|
```mermaid
flowchart TB
    U1[U1 Evidence model extensions]
    U2[U2 Naver search adapter]
    U3[U3 Internal LLM assistance]
    U4[U4 Enrichment orchestration]
    U5[U5 Scoring and reason promotion]
    U6[U6 Detailed review presenter]
    U7[U7 Decision feedback and correction]
    U8[U8 Governance docs and module layout]

    U1 --> U2
    U1 --> U3
    U2 --> U4
    U3 --> U4
    U4 --> U5
    U5 --> U6
    U6 --> U7
    U1 --> U8
    U7 --> U8
```
|
||||||
|
|
||||||
|
### U1. Evidence Model Extensions
|
||||||
|
|
||||||
|
**Goal:** Extend the domain records so search evidence, LLM summaries, provider skips, and source-linked summaries can be represented without stringly-typed workarounds.
|
||||||
|
|
||||||
|
**Requirements:** R1, R2, R3, R4, R5, R7
|
||||||
|
|
||||||
|
**Dependencies:** Existing domain model
|
||||||
|
|
||||||
|
**Files:**
|
||||||
|
- Modify: `src/rights_filter/domain/records.py`
|
||||||
|
- Modify: `src/rights_filter/governance/policies.py`
|
||||||
|
- Test: `tests/rights_filter/domain/test_records.py`
|
||||||
|
- Test: `tests/rights_filter/governance/test_policies.py`
|
||||||
|
|
||||||
|
**Approach:**
|
||||||
|
- Add evidence source categories for Naver search, LLM summary, search skipped, and enrichment failure while keeping existing source names stable.
|
||||||
|
- Add data-class coverage for search evidence and LLM summary output so governance policies can distinguish them from provider metadata and operator notes.
|
||||||
|
- Preserve the simple `Evidence` shape, but standardize expected `data` keys by source in tests and presenters.
|
||||||
|
- Keep the repository in-memory for this workspace; do not invent a database layer.
|
||||||
|
|
||||||
|
**Execution note:** Write tests first for the privacy and governance boundary: LLM summary records must point back to source evidence, and Naver records must not contain uploaded-image payloads.
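
The two boundaries named in the execution note can be pinned with a small validator. This sketch does not reproduce the real `Evidence` record in `src/rights_filter/domain/records.py`; the `Evidence` shape, the enum values, the forbidden key names, and the `validate` function shown here are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class EvidenceSource(Enum):
    NAVER_SEARCH = "naver_search"
    LLM_SUMMARY = "llm_summary"
    SEARCH_SKIPPED = "search_skipped"
    ENRICHMENT_FAILURE = "enrichment_failure"


# Hypothetical markers for image payloads that must never reach Naver records.
FORBIDDEN_NAVER_KEYS = frozenset({"image_bytes", "original_image", "derivative_image"})


@dataclass(frozen=True)
class Evidence:
    source: EvidenceSource
    reason: str
    data: dict = field(default_factory=dict)


def validate(evidence: Evidence) -> None:
    """Raise ValueError when a record crosses a privacy or governance boundary."""
    if evidence.source is EvidenceSource.NAVER_SEARCH:
        leaked = FORBIDDEN_NAVER_KEYS.intersection(evidence.data)
        if leaked:
            raise ValueError(f"Naver evidence carries image payload keys: {sorted(leaked)}")
    if evidence.source is EvidenceSource.LLM_SUMMARY:
        has_ids = bool(evidence.data.get("source_evidence_ids"))
        has_urls = bool(evidence.data.get("source_urls"))
        if not (has_ids or has_urls):
            raise ValueError("LLM summary must reference source evidence IDs or URLs")
```

Writing these checks as tests first means later adapters cannot quietly widen the `data` keys they store.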
|
||||||
|
|
||||||
|
**Patterns to follow:**
|
||||||
|
- Follow the enum/dataclass style in `src/rights_filter/domain/records.py`.
|
||||||
|
- Follow policy mapping style in `src/rights_filter/governance/policies.py`.
|
||||||
|
|
||||||
|
**Test scenarios:**
|
||||||
|
- Happy path: a Naver search evidence record stores query, result URL, image URL, thumbnail URL, title, rank, and retrieved timestamp.
|
||||||
|
- Happy path: an LLM summary evidence record references source evidence IDs or source URLs and is classified as operator-only data.
|
||||||
|
- Edge case: Naver evidence data does not accept original-image or derivative-image payload markers.
|
||||||
|
- Error path: governance validation rejects LLM summary data that claims standalone authority without source references.
|
||||||
|
- Regression: existing fingerprint, face/person, web detection, external skipped, and failure evidence remain importable and scoreable.
|
||||||
|
|
||||||
|
**Verification:**
|
||||||
|
- The domain layer can represent every enrichment requirement without exposing applicant-visible fields or storing biometric artifacts.
|
||||||
|
|
||||||
|
### U2. Naver Text-Query Search Adapter
|
||||||
|
|
||||||
|
**Goal:** Add a Naver integration boundary that performs official text-query search through a fake-client contract in tests and records provider outcomes as evidence candidates.
|
||||||
|
|
||||||
|
**Requirements:** R2, R5, R9
|
||||||
|
|
||||||
|
**Dependencies:** U1
|
||||||
|
|
||||||
|
**Files:**
|
||||||
|
- Create: `src/rights_filter/integrations/search_policy.py`
|
||||||
|
- Create: `src/rights_filter/integrations/naver_search.py`
|
||||||
|
- Test: `tests/rights_filter/integrations/test_naver_search.py`
|
||||||
|
- Test: `tests/rights_filter/integrations/test_search_policy.py`
|
||||||
|
|
||||||
|
**Approach:**
|
||||||
|
- Model a search policy with disabled state, compliance approval, daily quota, calls made, and allowed provider names.
|
||||||
|
- Implement a fake-client adapter pattern similar to `CloudVisionWebDetectionAdapter`.
|
||||||
|
- Accept only text query requests and search parameters; keep image payload types out of the public method boundary.
|
||||||
|
- Map Naver result items into evidence records with source, query, rank, result URLs, thumbnail, title/description, and provider status.
|
||||||
|
- Record skipped, quota-exhausted, and provider-error states as operator-visible evidence.
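
The adapter steps above can be sketched with a fake client and a policy gate. This is a minimal illustration and not the project's adapter: `SearchPolicy`, `FakeNaverClient`, `NaverSearchAdapter`, the client's `search(query, display)` signature, and the evidence dictionary shape are hypothetical; only the `title`, `link`, and `thumbnail` item fields come from the Naver image search reference cited earlier.

```python
from dataclasses import dataclass


@dataclass
class SearchPolicy:
    """Hypothetical policy record; field names are assumptions."""
    enabled: bool = False
    compliance_approved: bool = False
    daily_quota: int = 0
    calls_made: int = 0

    def block_reason(self):
        if not self.enabled:
            return "search_disabled"
        if not self.compliance_approved:
            return "compliance_not_approved"
        if self.calls_made >= self.daily_quota:
            return "quota_exhausted"
        return None


class FakeNaverClient:
    """Test double standing in for a real HTTP client."""

    def __init__(self, items=None, error=None):
        self.items = items or []
        self.error = error
        self.calls = 0

    def search(self, query, display):
        self.calls += 1
        if self.error is not None:
            raise self.error
        return self.items[:display]


class NaverSearchAdapter:
    def __init__(self, client, policy):
        self.client = client
        self.policy = policy

    def search(self, query, display=10):
        # Text only: bytes, file handles, and other payloads are rejected up front.
        if not isinstance(query, str):
            raise TypeError("Naver adapter accepts text queries only")
        reason = self.policy.block_reason()
        if reason is not None:
            return [{"source": "search_skipped", "query": query, "reason": reason}]
        try:
            items = self.client.search(query, display)
            # Mapping stays inside the try so a malformed item degrades to evidence.
            evidence = [
                {
                    "source": "naver_search",
                    "query": query,
                    "rank": rank,
                    "title": item["title"],
                    "link": item["link"],
                    "thumbnail": item.get("thumbnail"),
                }
                for rank, item in enumerate(items, start=1)
            ]
        except Exception as exc:
            return [{"source": "enrichment_failure", "query": query,
                     "reason": type(exc).__name__}]
        self.policy.calls_made += 1  # count the search only after it succeeded
        if not evidence:
            return [{"source": "naver_search", "query": query,
                     "reason": "no_results", "confidence": 0.0}]
        return evidence
```

Every path returns at least one record, so a skipped, failed, or empty search still appears in operator review data.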
|
||||||
|
|
||||||
|
**Execution note:** Start with adapter contract tests using a fake Naver client; do not add real credentials or live outbound calls.
|
||||||
|
|
||||||
|
**Patterns to follow:**
|
||||||
|
- `src/rights_filter/integrations/cloud_vision_web_detection.py` fake client and response mapper.
|
||||||
|
- `src/rights_filter/integrations/external_policy.py` simple policy-gate style.
|
||||||
|
|
||||||
|
**Test scenarios:**
|
||||||
|
- Happy path: approved text query returns multiple Naver result evidence records with query and rank preserved.
|
||||||
|
- Edge case: empty result set creates a low-confidence "no results" evidence record instead of disappearing.
|
||||||
|
- Error path: disabled policy, quota exhaustion, or provider exception records a skipped/failure evidence item and makes no client call when policy blocks it.
|
||||||
|
- Policy: passing an image payload-shaped input to the Naver adapter boundary is rejected before any provider call.
|
||||||
|
- Integration: Naver evidence can be stored on an analysis run alongside fingerprint and Google Web Detection evidence.
|
||||||
|
|
||||||
|
**Verification:**
|
||||||
|
- No test path sends original images or derivatives to Naver, and skipped search still appears in operator review data.
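
The policy gate and text-only boundary described in the approach above could be sketched roughly as follows. This is a minimal illustration, not the contents of `search_policy.py` or `naver_search.py`: the names `SearchPolicy`, `block_reason`, `search_text`, the status strings, and the fake client's `search()` method returning dicts with `link` and `title` keys are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class SearchPolicy:
    """Hypothetical policy gate; field names are illustrative only."""

    enabled: bool = False
    compliance_approved: bool = False
    daily_quota: int = 0
    calls_made: int = 0
    allowed_providers: frozenset = field(default_factory=frozenset)

    def block_reason(self, provider):
        """Return None when a call may proceed, else a status string to record."""
        if not self.enabled:
            return "skipped_disabled"
        if not self.compliance_approved:
            return "skipped_not_approved"
        if provider not in self.allowed_providers:
            return "skipped_provider_not_allowed"
        if self.calls_made >= self.daily_quota:
            return "quota_exhausted"
        return None


def search_text(policy, client, query):
    """Run one text query through the gate; every outcome becomes evidence."""
    if not isinstance(query, str) or not query.strip():
        # Image-shaped payloads (bytes, dicts, file objects) are rejected
        # before any provider call is attempted.
        raise TypeError("Naver adapter accepts text queries only")
    reason = policy.block_reason("naver")
    if reason is not None:
        return [{"source": "naver", "query": query, "status": reason}]
    try:
        items = client.search(query)  # assumed fake-client method
    except Exception as exc:
        # Provider failure degrades to an operator-visible evidence item.
        return [{"source": "naver", "query": query,
                 "status": "provider_error", "detail": str(exc)}]
    policy.calls_made += 1  # quota is consumed only after a successful request
    if not items:
        return [{"source": "naver", "query": query,
                 "status": "no_results", "confidence": 0.1}]
    return [
        {"source": "naver", "query": query, "rank": rank, "status": "ok",
         "url": item["link"], "title": item.get("title", "")}
        for rank, item in enumerate(items, start=1)
    ]
```

A blocked policy returns a single skipped record without touching the client, which is what lets the detailed review surface explain why Naver evidence is missing.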

### U3. Internal LLM Query and Summary Assistance

**Goal:** Add an internal LLM boundary that can generate Korean search queries and summarize evidence for operators while remaining source-linked and non-authoritative.

**Requirements:** R3, R4, R5, R7

**Dependencies:** U1

**Files:**
- Create: `src/rights_filter/analysis/llm_assistance.py`
- Create: `src/rights_filter/analysis/search_query_generation.py`
- Test: `tests/rights_filter/analysis/test_llm_assistance.py`
- Test: `tests/rights_filter/analysis/test_search_query_generation.py`

**Approach:**
- Define an internal assistant interface with a deterministic fake implementation for tests.
- Generate query candidates from existing evidence, OCR/label placeholders when present, knowledge-base names/aliases, and Google Web Detection entities.
- Structure assistant summaries as evidence records that cite source evidence IDs or URLs.
- Add guardrails that mark ungrounded assistant claims as notes, not risk reasons.
- Keep all assistant outputs operator-only and exclude them from applicant summaries.

**Execution note:** Test first that LLM output without citations cannot become a score reason.

**Patterns to follow:**
- Existing analysis service classes under `src/rights_filter/analysis/`.
- Existing applicant/operator separation in `src/rights_filter/admin/review_handlers.py`.

**Test scenarios:**
- Happy path: web entity "아이유" and alias "IU" produce Korean query candidates that include person/work context without duplicates.
- Happy path: LLM summary cites Naver and Google evidence URLs and appears in the operator review model.
- Edge case: duplicate or contradictory search results are summarized as uncertainty, not collapsed into one definitive claim.
- Error path: LLM provider failure records an enrichment failure evidence item and does not block existing analysis.
- Policy: source-less LLM claims are excluded from scoring reasons and applicant summaries.

**Verification:**
- Internal LLM assistance reduces operator reading work while remaining auditable and non-authoritative.
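
One way to picture the citation guardrail from the approach above is a small filter that separates grounded claims from notes. This is a sketch under assumptions: the function name `split_assistant_claims`, the claim shape (`text`, `cites`), and the `kind` marker are hypothetical and not taken from `llm_assistance.py`.

```python
def split_assistant_claims(claims, known_evidence_ids):
    """Split assistant claims into source-linked claims and plain notes.

    A claim counts as grounded only if it cites at least one evidence ID
    (or URL) that actually exists on the analysis run.
    """
    grounded, notes = [], []
    for claim in claims:
        cited = [ref for ref in claim.get("cites", []) if ref in known_evidence_ids]
        if cited:
            # Keep only the citations that resolve to stored evidence.
            grounded.append({**claim, "cites": cited, "kind": "summary"})
        else:
            # Ungrounded output stays visible to operators as a note, but it
            # must never be offered to scoring as a risk reason.
            notes.append({**claim, "cites": [], "kind": "note"})
    return grounded, notes
```

Citations that point at unknown IDs are treated the same as missing citations, so a hallucinated reference cannot promote a claim.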

### U4. Review Enrichment Orchestration

**Goal:** Orchestrate query generation, Naver search, result promotion, LLM summary creation, and evidence persistence around existing analysis runs.

**Requirements:** R1, R2, R3, R5

**Dependencies:** U1, U2, U3

**Files:**
- Create: `src/rights_filter/analysis/evidence_enrichment.py`
- Create: `src/rights_filter/jobs/review_enrichment_job.py`
- Modify: `src/rights_filter/jobs/batch_analyzer.py`
- Test: `tests/rights_filter/analysis/test_evidence_enrichment.py`
- Test: `tests/rights_filter/jobs/test_review_enrichment_job.py`
- Test: `tests/rights_filter/jobs/test_batch_analyzer.py`

**Approach:**
- Keep enrichment idempotent by analysis version and provider/query signature.
- Allow the batch analyzer to remain useful in internal-only mode; enrichment should be callable after a run is created or disabled entirely.
- Store each query and provider outcome as evidence so the detailed review surface can explain missing or skipped evidence.
- Preserve partial success: one failed query or provider should not invalidate other evidence.
- Record operational counters for generated queries, executed searches, skipped searches, provider failures, and summary failures.

**Patterns to follow:**
- `src/rights_filter/jobs/batch_analyzer.py` summary counters and idempotency check.
- Evidence append flow through `AnalysisRun.add_evidence`.

**Test scenarios:**
- Happy path: an analysis run with Google entity evidence generates Naver queries, stores Naver evidence, stores an LLM summary, and recomputes a score.
- Happy path: rerunning enrichment with the same analysis version and same query signature does not duplicate evidence.
- Edge case: Naver disabled still records search-skipped evidence and creates an LLM summary from internal/Google evidence when possible.
- Error path: corrupt or missing analysis run returns a failure summary without creating a misleading low-risk result.
- Integration: batch analysis can run without enrichment, and enrichment can run later against stored runs.

**Verification:**
- Search enrichment can be enabled, disabled, retried, and audited independently of the internal-only baseline.
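
The idempotency rule in the approach above (analysis version plus provider/query signature) might look like the following sketch. The helper names `enrichment_signature` and `record_search_once`, the whitespace and case normalization, and the dict used as a store are assumptions for illustration; the real job may key its evidence differently.

```python
import hashlib
import json


def enrichment_signature(analysis_version, provider, query):
    """Build a stable key for one (analysis version, provider, query) search."""
    # Collapse whitespace and case so trivially different spellings of the
    # same query do not produce duplicate evidence.
    normalized = " ".join(query.split()).casefold()
    payload = json.dumps([analysis_version, provider, normalized], ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def record_search_once(store, analysis_version, provider, query, items):
    """Store a search outcome unless the same signature was already recorded.

    `store` is a plain dict standing in for per-run evidence persistence.
    Returns True when the outcome was stored, False when it was a repeat.
    """
    signature = enrichment_signature(analysis_version, provider, query)
    if signature in store:
        return False
    store[signature] = list(items)
    return True
```

Because the analysis version is part of the key, a new analysis version re-runs the same queries instead of silently reusing stale evidence.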

### U5. Search Result Promotion, Scoring, and Reasons

**Goal:** Convert promoted search evidence into conservative score contributions and operator-readable reasons while preventing LLM output from directly affecting the score.

**Requirements:** R1, R2, R3, R4, R5

**Dependencies:** U1, U2, U3, U4

**Files:**
- Create: `src/rights_filter/analysis/search_result_promoter.py`
- Modify: `src/rights_filter/analysis/risk_scoring.py`
- Modify: `src/rights_filter/analysis/reason_builder.py`
- Test: `tests/rights_filter/analysis/test_search_result_promoter.py`
- Test: `tests/rights_filter/analysis/test_risk_scoring.py`
- Test: `tests/rights_filter/analysis/test_reason_builder.py`

**Approach:**
- Promote Naver results only when they connect the query/result to named people, works, characters, broadcasts, webtoons, games, official sources, or repeated matching-image sources.
- Treat low-confidence Naver evidence as operator context, not high-risk proof.
- Ensure LLM summary evidence is visible but contributes zero direct score points.
- Keep failures and skips visible; failures add uncertainty where appropriate but never reduce stronger high-risk evidence.
- Preserve historical score behavior for existing fingerprint, face/person, and Google evidence.

**Execution note:** Add regression tests that face/person evidence alone and LLM-only evidence do not produce high risk.

**Patterns to follow:**
- `src/rights_filter/analysis/risk_scoring.py` source-based scoring and unique reason handling.

**Test scenarios:**
- Happy path: Naver evidence linking a Korean celebrity name and repeated image sources contributes medium or high review risk with clear reasons.
- Happy path: Naver evidence for a known character/work plus Google matching-image evidence reaches high risk.
- Edge case: generic image results with no named person/work/character remain context-only and do not create high risk.
- Error path: LLM summary claiming a celebrity match without source references contributes no score reason.
- Regression: external provider failure does not lower a prior rejected-image similarity score.

**Verification:**
- Operators see why search evidence mattered, and scores remain explainable without trusting LLM prose.
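
The scoring rules in the approach above can be illustrated with a toy scorer. Everything here is hypothetical: the source names, the `promoted` flag, the caller-supplied weights, and the 100-point cap are placeholders, not values from `risk_scoring.py`. The sketch only shows the shape of the invariants: LLM summaries and failures add no points, unpromoted Naver results stay context-only, and no evidence can subtract from a stronger signal.

```python
ZERO_POINT_SOURCES = frozenset({"llm_summary", "provider_failure", "search_skipped"})


def score_run(evidence_items, weights):
    """Sum non-negative points per evidence item and collect unique reasons."""
    total = 0
    reasons = []
    uncertain = False
    for item in evidence_items:
        source = item["source"]
        if source in ZERO_POINT_SOURCES:
            # Failures and skips flag uncertainty; LLM prose does not even do that.
            uncertain = uncertain or source != "llm_summary"
            continue
        if source == "naver" and not item.get("promoted", False):
            continue  # context-only search results never add points
        points = max(0, weights.get(source, 0))  # weights can never go negative
        if points == 0:
            continue
        total += points
        reason = item.get("reason")
        if reason and reason not in reasons:
            reasons.append(reason)
    return {"score": min(total, 100), "reasons": reasons, "uncertain": uncertain}
```

Since every contribution is clamped at zero or above, adding a provider failure to a run can only change the `uncertain` flag, never lower the score.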

### U6. Detailed Operator Review Presenter

**Goal:** Build the backend representation of the detailed review screen, grouping evidence and actions so a future web admin UI can render it directly.

**Requirements:** R1, R6, R7

**Dependencies:** U1, U5

**Files:**
- Create: `src/rights_filter/admin/detailed_review_presenter.py`
- Modify: `src/rights_filter/admin/review_handlers.py`
- Modify: `src/rights_filter/admin/review_presenters.py`
- Test: `tests/rights_filter/admin/test_detailed_review_presenter.py`
- Test: `tests/rights_filter/admin/test_review_handlers.py`

**Approach:**
- Produce a detailed review dictionary or dataclass containing submission ID, image reference, score, band, top reasons, grouped evidence, provider statuses, LLM summaries, failures, and allowed manual actions.
- Group evidence into internal, Naver, Google, LLM, failure/skipped, and knowledge-base sections.
- Keep applicant summaries minimal and unchanged except for explicit regression coverage.
- Surface missing analysis as a review state rather than hiding the submission.
- Keep action affordances separate from automated recommendation data.

**Patterns to follow:**
- Existing `operator_summary_for` and `applicant_summary_for` behavior in `src/rights_filter/admin/review_handlers.py`.
- Existing tests in `tests/rights_filter/admin/test_review_handlers.py`.

**Test scenarios:**
- Happy path: detailed review output includes image reference, score, band, top reasons, Naver evidence group, Google evidence group, and LLM summary group.
- Happy path: high-risk analysis appears to operators but does not change review status automatically.
- Edge case: missing analysis returns a review model with analysis unavailable and manual actions still controlled by the host workflow.
- Error path: failed Naver, failed LLM, or disabled Google evidence appears under provider status/failure groups.
- Security: applicant summary excludes score, reasons, evidence, LLM summaries, provider status, and failure details.

**Verification:**
- A real admin UI can be built from the presenter output without exposing internal evidence to applicants.
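
The grouping and applicant isolation described in the approach above could be sketched as below. The source identifiers, section keys, the fallback for unknown sources, and the two-field applicant summary are assumptions for this example; the actual presenter output and the existing `applicant_summary_for` contract may differ.

```python
SECTIONS = ("internal", "naver", "google", "llm", "failures", "knowledge_base")

# Hypothetical mapping from evidence source type to review section.
SECTION_BY_SOURCE = {
    "fingerprint": "internal",
    "naver": "naver",
    "google_web_detection": "google",
    "llm_summary": "llm",
    "provider_failure": "failures",
    "search_skipped": "failures",
    "knowledge_base": "knowledge_base",
}


def group_evidence(items):
    """Return every section, even when empty, so the UI layout stays stable."""
    groups = {section: [] for section in SECTIONS}
    for item in items:
        # Assumption: unknown sources fall back to the internal section.
        groups[SECTION_BY_SOURCE.get(item["source"], "internal")].append(item)
    return groups


def detailed_review(submission_id, status, analysis):
    """Operator-only view; a missing analysis is surfaced, not hidden."""
    if analysis is None:
        return {"submissionId": submission_id, "status": status,
                "analysisAvailable": False, "evidence": group_evidence([])}
    return {"submissionId": submission_id, "status": status,
            "analysisAvailable": True, "score": analysis["score"],
            "evidence": group_evidence(analysis["evidence"])}


def applicant_summary(submission_id, status):
    """Applicant view carries no score, reasons, evidence, or provider data."""
    return {"submissionId": submission_id, "status": status}
```

Keeping the applicant summary as a separate function with its own narrow inputs makes it hard for operator-only fields to leak by accident.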

### U7. Decision Feedback and Contamination Control

**Goal:** Strengthen rejection-derived knowledge accumulation and correction flows so automatic entries are useful but reversible.

**Requirements:** R6, R8

**Dependencies:** U1, U6

**Files:**
- Create: `src/rights_filter/admin/knowledge_base_handlers.py`
- Create: `src/rights_filter/admin/correction_handlers.py`
- Modify: `src/rights_filter/admin/decision_feedback.py`
- Modify: `src/rights_filter/domain/knowledge_base.py`
- Test: `tests/rights_filter/admin/test_knowledge_base_handlers.py`
- Test: `tests/rights_filter/admin/test_correction_handlers.py`
- Test: `tests/rights_filter/domain/test_knowledge_base.py`

**Approach:**
- Keep automatic rejection-derived entries distinct from manual operator registrations and search-result candidates.
- Add correction handling that deactivates entries derived from a corrected decision while preserving audit history.
- Allow manual entity registration with names, aliases, related keywords, policy memo, exception notes, and sample fingerprints.
- For this workspace, implement repository-level behavior only; defer role-based UI and persistent audit tables to target app integration.

**Execution note:** Test first around stale automatic entries: a corrected rejection must stop influencing future matching.

**Patterns to follow:**
- `InMemoryRightsFilterRepository.create_rejected_image_entry`.
- `KnowledgeBaseEntry` provenance and active/deactivation fields in `src/rights_filter/domain/records.py`.

**Test scenarios:**
- Happy path: rejecting a submission creates an automatic rejected-image entry with source decision provenance.
- Happy path: manually registering a celebrity or character entry creates a manual entry with aliases and policy memo.
- Edge case: automatic and manual entries with similar names remain separate and independently deactivatable.
- Error path: correcting a rejection deactivates derived automatic entries but leaves manual entries untouched.
- Privacy: attempts to register face embeddings or biometric templates are rejected by governance validation.

**Verification:**
- The knowledge base can grow from real operator decisions without making false positives permanent.
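
The correction flow from the approach above might be sketched like this. The `Entry` dataclass is a stand-in invented for the example (it is not `KnowledgeBaseEntry`), and `apply_correction` and `active_entries` are hypothetical helper names. The point is that deactivation flips a flag and records a reason rather than deleting the entry, so audit history survives.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    """Illustrative stand-in for a knowledge-base entry with provenance."""

    entry_id: str
    origin: str  # "automatic" (rejection-derived) or "manual"
    source_decision_id: Optional[str] = None
    active: bool = True
    deactivation_reason: Optional[str] = None


def apply_correction(entries, corrected_decision_id):
    """Deactivate automatic entries derived from a corrected decision."""
    deactivated = []
    for entry in entries:
        derived = (entry.origin == "automatic"
                   and entry.source_decision_id == corrected_decision_id)
        if derived and entry.active:
            entry.active = False  # kept in the list for audit, not removed
            entry.deactivation_reason = f"decision {corrected_decision_id} corrected"
            deactivated.append(entry.entry_id)
    return deactivated


def active_entries(entries):
    """Only active entries should feed future matching."""
    return [entry for entry in entries if entry.active]
```

Manual entries are skipped even if they share a source decision ID, which mirrors the test scenario that corrections leave manual registrations untouched.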

### U8. Governance, Operations, and Public Module Layout

**Goal:** Update operations guidance, public module tests, and governance policies so the new enrichment modes remain discoverable and safe to operate.

**Requirements:** R2, R3, R4, R5, R7, R9

**Dependencies:** U1, U7

**Files:**
- Modify: `docs/operations/image-rights-risk-filter.md`
- Modify: `tests/rights_filter/test_public_module_layout.py`
- Modify: `src/rights_filter/integrations/__init__.py`
- Modify: `src/rights_filter/analysis/__init__.py`
- Modify: `src/rights_filter/admin/__init__.py`
- Test: `tests/rights_filter/governance/test_policies.py`

**Approach:**
- Document search-enriched mode and LLM-assisted mode with explicit disable guidance.
- Add public module imports for planned modules so missing boundaries fail fast.
- Update governance tests for Naver text-only evidence, LLM source-linking, applicant isolation, and no-scraping boundaries.
- Keep external API enablement documentation explicit: Naver credentials and Google credentials are separate provider risks.

**Patterns to follow:**
- Existing operations doc structure in `docs/operations/image-rights-risk-filter.md`.
- Existing public module import test in `tests/rights_filter/test_public_module_layout.py`.

**Test scenarios:**
- Happy path: all new public modules import successfully.
- Policy: operations guidance states Naver uses text queries only and LLM summaries are reading aids.
- Security: governance tests cover applicant non-exposure for Naver evidence and LLM summaries.
- Regression: existing Cloud Vision compliance-gated mode remains documented and tested.

**Verification:**
- A deployer can identify how to enable, disable, audit, and explain search and LLM enrichment without weakening existing safety boundaries.

---

## System-Wide Impact

- **Interaction graph:** The enrichment flow touches analysis runs, evidence source typing, provider adapters, scoring, operator presenters, decision feedback, and governance policies.
- **Error propagation:** Naver, LLM, and Google failures become evidence/failure groups for operators, not silent success and not low-risk proof.
- **State lifecycle risks:** Enrichment must be idempotent by run/provider/query to avoid duplicate evidence; rejection-derived knowledge entries must be deactivatable when decisions are corrected.
- **API surface parity:** Operator-only surfaces gain richer evidence; applicant-facing summaries must stay intentionally sparse.
- **Integration coverage:** End-to-end tests should cover analysis run -> enrichment -> score -> detailed review -> rejection -> knowledge entry -> correction.
- **Unchanged invariants:** The filter never changes review status automatically, never exposes provider evidence to applicants, and never uses face identity recognition.

---

## Risks & Dependencies

| Risk | Mitigation |
|------|------------|
| Naver API is mistaken for reverse image search | Keep adapter text-query only, reject image payload inputs, and document the official API limitation. |
| LLM hallucination pollutes scores | Require source references for summaries and assign zero direct scoring weight to LLM evidence. |
| Search results create false positives | Promote only strongly linked person/work/character evidence, keep context-only results visible but low impact, and preserve operator judgment. |
| External provider cost or quota spikes | Add provider policies, daily limits, skipped evidence, and operational counters. |
| Detailed presenter leaks to applicants | Keep separate operator and applicant presenters with explicit regression tests. |
| Automatic rejection entries poison future matching | Preserve provenance, add correction/deactivation flows, and test stale-entry removal from active matching. |
| Real app integration differs from portable core | Keep this plan focused on backend contracts; defer routes, UI components, auth, and persistence wiring to target app integration. |

---

## Alternative Approaches Considered

- Build the real web admin screen now: rejected because this workspace contains no admin app, frontend stack, routes, auth, or database.
- Search-enrichment-only implementation: rejected because operators need a detailed surface to judge evidence quality.
- LLM-first scoring: rejected because source-less LLM output is not reliable enough for rights-risk decisions and conflicts with the origin safety boundary.
- Naver scraping or browser automation: rejected because official APIs are available for text-query search and UI automation would create unnecessary legal and operational risk.

---

## Phased Delivery

### Phase 1: Evidence and provider foundation

- Land U1 and U2 so Naver search evidence has a safe, typed, text-only path.

### Phase 2: LLM and enrichment pipeline

- Land U3 and U4 so query generation, search execution, summaries, and provider failure handling can run around existing analysis.

### Phase 3: Scoring and operator review

- Land U5 and U6 so promoted evidence affects risk conservatively and operators can inspect grouped evidence in one view model.

### Phase 4: Feedback, correction, and operations

- Land U7 and U8 so decisions improve future matching without permanent false-positive contamination.

---

## Documentation / Operational Notes

- Update the operations doc with Naver credential handling, query-only usage, provider quotas, and emergency disable mode.
- Document that LLM summaries are reading aids and must cite source evidence.
- Document how to interpret search-skipped, no-results, provider-failure, and LLM-failure states.
- Document correction flow for rejection-derived knowledge entries.

---

## Sources & References

- **Origin document:** [docs/brainstorms/2026-05-25-image-rights-review-enrichment-requirements.md](docs/brainstorms/2026-05-25-image-rights-review-enrichment-requirements.md)
- **Base plan:** [docs/plans/2026-05-25-001-feat-image-rights-risk-filter-plan.md](docs/plans/2026-05-25-001-feat-image-rights-risk-filter-plan.md)
- **Operations doc:** [docs/operations/image-rights-risk-filter.md](docs/operations/image-rights-risk-filter.md)
- Related code: [src/rights_filter/domain/records.py](src/rights_filter/domain/records.py)
- Related code: [src/rights_filter/admin/review_handlers.py](src/rights_filter/admin/review_handlers.py)
- Related code: [src/rights_filter/jobs/batch_analyzer.py](src/rights_filter/jobs/batch_analyzer.py)
- Related code: [src/rights_filter/integrations/cloud_vision_web_detection.py](src/rights_filter/integrations/cloud_vision_web_detection.py)
- Related tests: [tests/rights_filter/admin/test_review_handlers.py](tests/rights_filter/admin/test_review_handlers.py)
- Naver Image Search API: https://developers.naver.com/docs/serviceapi/search/image/image.md
- Naver Search API product page: https://developers.naver.com/products/service-api/search/search.md
- Naver API terms: https://developers.naver.com/products/terms
- Google Cloud Vision Data Usage FAQ: https://docs.cloud.google.com/vision/docs/data-usage
- Google Cloud Vision Web Detection: https://docs.cloud.google.com/vision/docs/detecting-web
@ -0,0 +1,420 @@
---
title: "feat: Add Evidence Quality And Watchlist Growth"
type: feat
status: implemented
date: 2026-05-26
origin: docs/brainstorms/2026-05-26-evidence-quality-watchlist-requirements.md
---

# feat: Add Evidence Quality And Watchlist Growth

## Summary

Add a decision-first feedback loop around evidence status, watchlist candidate generation, strong watchlist matching, and candidate management. The implementation will keep the existing SQLite JSON-payload pattern and extend the current operator console instead of introducing a new persistence layer.

---

## Problem Frame

The current console can collect many internal, Google, Naver, and face-area web evidence items, but it does not yet let operators mark which evidence actually informed a case decision. It also promotes rejected cases too bluntly and does not create strong watchlist signals from held cases.

---

## Requirements

- R1. Evidence items support operator status: used for judgment, irrelevant, false positive, and pending. (Origin R1-R3, F1, AE1-AE2)
- R2. Evidence status never creates a DB candidate by itself; candidate creation happens only after case decision. (Origin R2, R4, AE2)
- R3. Held and rejected case decisions automatically create persistent watchlist candidates. (Origin R5-R6, F2, AE1)
- R4. Approved decisions do not create automatic candidates. (Origin R7, AE3)
- R5. Watchlist candidates strongly affect future risk scoring, at roughly the same strength as confirmed DB image matches. (Origin R8, F3, AE4)
- R6. Watchlist candidate matches are visually distinct from confirmed DB matches. (Origin R9, AE4)
- R7. Watchlist signals never change case status automatically. (Origin R10)
- R8. Operators can promote watchlist candidates to confirmed DB entries or exclude them as false positives. (Origin R11-R13, F4, AE5-AE6)
- R9. Confirmed DB, watchlist candidates, and excluded candidates retain status, source decision, source evidence, and contribution counts. (Origin R14-R17)

**Origin actors:** A1 operator, A2 rights risk filter, A3 DB administrator

**Origin flows:** F1 evidence status marking, F2 decision-driven watchlist creation, F3 watchlist-based rediscovery, F4 candidate promotion and exclusion

**Origin acceptance examples:** AE1, AE2, AE3, AE4, AE5, AE6

---

## Scope Boundaries

- No automatic approval, hold, or rejection.
- No face embeddings, face similarity database, biometric template storage, or identity recognition.
- No Google Image Search, Google Lens, Naver web UI automation, or scraping.
- No applicant-facing exposure of evidence statuses, watchlist candidates, scoring rules, or internal reasons.
- No new relational migration framework; this iteration keeps the existing JSON payload tables.

### Deferred to Follow-Up Work

- Bulk watchlist cleanup, analytics dashboards, and advanced merge suggestions can follow after the core loop is proven.
- Domain-wide false-positive suppression is deferred because it can hide valid evidence from large sites.

---

## Context & Research

### Relevant Code and Patterns

- `src/rights_filter/server/sqlite_store.py` is the persistence and orchestration boundary. It stores JSON payloads in `submissions`, `evidence`, `knowledge_entries`, `collection_candidates`, `corrections`, and `audit_events`.
- `CopyrighterStore.record_decision` currently updates case decision and creates a rejected-reference knowledge entry only for rejected cases.
- `CopyrighterStore._knowledge_repository` rebuilds an in-memory repository from active `knowledge_entries` and feeds `InternalAnalyzer`.
- `src/rights_filter/analysis/internal_analyzer.py` emits fingerprint evidence for knowledge-base image similarity.
- `src/rights_filter/analysis/risk_scoring.py` already gives high weight to strong fingerprint matches and ignores non-contributing/queued evidence.
- `web/operator-gui/app.js`, `index.html`, and `styles.css` implement the current static console, evidence grouping, decision actions, candidate collection, and knowledge DB management.
- `tests/rights_filter/server/test_sqlite_store.py` is the main integration test surface for persistence behavior.
- `tests/operator_gui/test_static_workbench.py` protects the UI contract without browser runtime dependencies.

### Institutional Learnings

- No `docs/solutions/` directory exists in this workspace.
- No `STRATEGY.md` exists; the active product strategy is captured in the brainstorm requirements documents.

### External References

- No new external APIs are introduced. Existing Google/Naver/Ollama boundaries and no-scraping policy remain unchanged.

---

## Key Technical Decisions

- Use `knowledge_entries` for persistent watchlist state: watchlist candidates are persistent risk references, not transient keyword collection results, so they should not live in `collection_candidates`, which is cleared on each keyword search.
- Add status fields instead of new tables: JSON payload storage lets us add `entryStatus`, `originDecisionStatus`, `sourceSubmissionId`, `sourceEvidenceIds`, and `contributionCount` without schema migration complexity.
- Generate candidates from the local submission image when available: the decision API passes the local image store into `record_decision`, which stores a perceptual sample fingerprint for held/rejected watchlist entries. If the image store is unavailable, the candidate is still recorded but cannot participate in image similarity until a sample fingerprint is added.
- Strong watchlist scoring: watchlist similarity should use the same high-risk path as rejected-image similarity, but with separate reason text and UI group so operators can see it is not confirmed DB evidence.
- False-positive suppression scope: start with exact evidence identity, URL/image URL/title, and candidate fingerprint. Do not suppress an entire provider domain from one false-positive action.
- Decision-driven default evidence set: use evidence marked `used_for_judgment` when available; if none are marked, generate the watchlist candidate from the case fingerprint and top contributing evidence so held/rejected decisions still strengthen future detection.
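
The decision-driven default evidence set and the added payload fields could be sketched as follows. The payload keys `entryStatus`, `originDecisionStatus`, `sourceSubmissionId`, `sourceEvidenceIds`, and `contributionCount` come from the decisions above; everything else (the function names, the `"watchlist"` value, the `operatorStatus`, `points`, and `sampleFingerprint` keys, and the `top_n` cutoff) is an assumption made for illustration.

```python
EXCLUDED_STATUSES = frozenset({"irrelevant", "false_positive"})


def select_source_evidence(evidence_items, top_n=3):
    """Prefer operator-marked evidence; otherwise fall back to top contributors."""
    marked = [e for e in evidence_items
              if e.get("operatorStatus") == "used_for_judgment"]
    if marked:
        return marked
    contributing = [e for e in evidence_items
                    if e.get("points", 0) > 0
                    and e.get("operatorStatus") not in EXCLUDED_STATUSES]
    return sorted(contributing, key=lambda e: e["points"], reverse=True)[:top_n]


def build_watchlist_entry(submission_id, decision_status, evidence_items,
                          fingerprint=None):
    """Return a JSON-ready watchlist payload, or None for approved decisions."""
    if decision_status not in {"held", "rejected"}:
        return None  # approved decisions never create automatic candidates
    chosen = select_source_evidence(evidence_items)
    return {
        "entryStatus": "watchlist",
        "originDecisionStatus": decision_status,
        "sourceSubmissionId": submission_id,
        "sourceEvidenceIds": [e["id"] for e in chosen],
        "contributionCount": 0,
        # Without a fingerprint the entry is recorded but cannot take part
        # in image similarity until a sample is added.
        "sampleFingerprint": fingerprint,
    }
```

Because the payload is a plain dict, it fits the existing JSON-payload tables without a schema change.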

---

## Open Questions

### Resolved During Planning

- Watchlist score strength: use the same high-confidence fingerprint match behavior as confirmed/rejected DB references, with separate UI labeling.
- UI distinction: add a dedicated watchlist/주의 후보 evidence group and badges rather than mixing it into confirmed internal DB evidence.
- False-positive propagation: suppress exact evidence/candidate patterns first, not whole domains.

### Deferred to Implementation

- Exact Korean microcopy can be adjusted while fitting existing console labels.
- Exact CSS treatment should follow the existing evidence group and chip styles after visual verification.

---

## High-Level Technical Design

> *This illustrates the intended approach and is directional guidance for review, not implementation specification.*

```mermaid
flowchart TB
Evidence[Collected evidence] --> Mark[Operator marks evidence status]
Mark --> Decision[Operator decides approve / hold / reject]
Decision -->|approved| NoCandidate[No automatic candidate]
Decision -->|held or rejected| Watchlist[Create watchlist candidate]
Watchlist --> Analyze[Future internal analysis]
Confirmed[Confirmed DB entries] --> Analyze
Analyze --> Score[Risk scoring]
Score --> UI[Separate confirmed vs watchlist evidence groups]
Watchlist --> Promote[Promote to confirmed DB]
Watchlist --> Exclude[Exclude as false positive]
Exclude --> Score
```

---

## Implementation Units

```mermaid
flowchart TB
U1[U1 Evidence status API and payload]
U2[U2 Decision-driven watchlist creation]
U3[U3 Watchlist matching and scoring]
U4[U4 Candidate promotion and exclusion]
U5[U5 Operator UI controls]
U6[U6 Docs and verification]

U1 --> U2
U2 --> U3
U3 --> U4
U1 --> U5
U3 --> U5
U4 --> U5
U5 --> U6
```

### U1. Evidence Status API And Payload

**Goal:** Let operators mark evidence as used for judgment, irrelevant, false positive, or pending without changing case decision or DB state.

**Requirements:** R1, R2

**Dependencies:** None

**Files:**
- Modify: `src/rights_filter/server/sqlite_store.py`
- Modify: `src/rights_filter/server/http_app.py`
- Modify: `web/operator-gui/app.js`
- Modify: `web/operator-gui/index.html`
- Modify: `web/operator-gui/styles.css`
- Test: `tests/rights_filter/server/test_sqlite_store.py`
- Test: `tests/rights_filter/server/test_http_app.py`
- Test: `tests/operator_gui/test_static_workbench.py`

**Approach:**
- Add a store method that updates an existing evidence payload with an operator evidence status and optional note.
- Add an HTTP route for evidence status updates.
- Keep evidence status inside each evidence payload so existing bootstrap/review responses include it automatically.
- Treat false-positive and irrelevant evidence as non-contributing during rescore.
- Keep pending evidence visible but non-final.

**Execution note:** Test-first. Start with store-level tests proving status changes do not create candidates and do affect rescore contribution.

**Patterns to follow:**
- Existing `record_decision`, `_put`, `_evidence_by_submission`, and HTTP body parsing patterns in `src/rights_filter/server/sqlite_store.py` and `src/rights_filter/server/http_app.py`.

**Test scenarios:**
- Happy path: marking a Google evidence item as used for judgment persists in `review()` and `bootstrap()`.
- Happy path: marking evidence as irrelevant sets it non-contributing and rescore omits its points.
- Edge case: marking a missing evidence ID returns a not-found error.
- Edge case: unsupported evidence status returns a validation error.
- Integration: HTTP evidence status route updates the review payload.

**Verification:**
- Evidence status is visible in the API payload and does not create any knowledge entry or watchlist candidate by itself.
|
||||||
|
|
||||||
|
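The U1 approach can be illustrated with a minimal in-memory sketch. Everything here is an assumption made for illustration: the class `EvidenceStore`, the method `set_evidence_status`, the status strings, and the payload keys `operatorStatus`, `operatorNote`, and `points` are hypothetical and are not taken from `sqlite_store.py`. The sketch also assumes that pending evidence still contributes to the score, which the plan does not state.

```python
from dataclasses import dataclass, field

# Hypothetical status names; the plan only lists the four operator meanings.
ALLOWED_STATUSES = {"used_for_judgment", "irrelevant", "false_positive", "pending"}
NON_CONTRIBUTING = {"irrelevant", "false_positive"}


class EvidenceNotFound(KeyError):
    """Unknown evidence ID; an HTTP layer could map this to a not-found error."""


@dataclass
class EvidenceStore:
    # evidence_id -> payload dict; stands in for the SQLite-backed store.
    evidence: dict = field(default_factory=dict)

    def set_evidence_status(self, evidence_id, status, note=None):
        """Store the operator status inside the evidence payload itself."""
        if status not in ALLOWED_STATUSES:
            raise ValueError(f"unsupported evidence status: {status!r}")
        if evidence_id not in self.evidence:
            raise EvidenceNotFound(evidence_id)
        payload = self.evidence[evidence_id]
        payload["operatorStatus"] = status
        if note is not None:
            payload["operatorNote"] = note
        return payload

    def rescore(self):
        """Sum points, skipping evidence marked irrelevant or false positive."""
        return sum(
            payload.get("points", 0)
            for payload in self.evidence.values()
            if payload.get("operatorStatus", "pending") not in NON_CONTRIBUTING
        )
```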
---

### U2. Decision-Driven Watchlist Creation

**Goal:** Create persistent watchlist candidates automatically after held or rejected decisions, using case fingerprint evidence and judgment-used evidence.

**Requirements:** R2, R3, R4, R9

**Dependencies:** U1

**Files:**
- Modify: `src/rights_filter/server/sqlite_store.py`
- Test: `tests/rights_filter/server/test_sqlite_store.py`

**Approach:**
- Extend `record_decision` so `held` and `rejected` decisions create or update a watchlist entry.
- Stop treating rejected decisions as immediately confirmed DB entries; rejected decisions should create watchlist entries first, then operators can promote them.
- Populate watchlist payloads with source submission, origin decision status, source evidence IDs, sample fingerprints, memo, active/excluded state, and contribution count.
- Use the case's generated fingerprint evidence as the primary sample fingerprint source.
- Prefer evidence marked used for judgment; if none is marked, fall back to the top contributing evidence plus the case fingerprint so strict detection still grows.
- Ensure repeated decisions update the existing source-submission watchlist entry instead of creating duplicates.

**Execution note:** Test-first around decision outcomes before changing the existing rejected-entry behavior.

**Patterns to follow:**
- Existing automatic rejected-reference creation in `record_decision`.
- Existing knowledge-entry payload shape from `register_manual_knowledge_entry` and candidate promotion methods.

**Test scenarios:**
- Happy path: held decision creates one active watchlist entry with source submission and fingerprint.
- Happy path: rejected decision creates one active watchlist entry with source evidence IDs.
- Happy path: approved decision creates no watchlist entry.
- Edge case: repeating a held/rejected decision for the same submission updates one candidate and creates no duplicates.
- Edge case: no used evidence still creates an incomplete watchlist entry from available fingerprint evidence.

**Verification:**
- Held and rejected decisions create persistent watchlist entries, approval does not, and candidate provenance is visible in `knowledgeEntries`.

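The idempotent create-or-update behavior described for U2 might look roughly like the sketch below. It is not the real `record_decision`: the function name `upsert_watchlist_entry`, the in-memory `watchlist` dict keyed by submission ID, and every payload key are assumptions chosen for illustration.

```python
def upsert_watchlist_entry(watchlist, submission_id, status, evidence, fingerprint):
    """Create or update one watchlist entry per submission (hypothetical sketch)."""
    if status not in {"held", "rejected"}:
        return None  # approved decisions create no watchlist entry

    # Prefer evidence the operator marked as used for judgment.
    used = [e["id"] for e in evidence if e.get("operatorStatus") == "used_for_judgment"]
    if not used and evidence:
        # Fallback: take the top contributing evidence item.
        top = max(evidence, key=lambda e: e.get("points", 0))
        used = [top["id"]]

    # Keying by submission keeps repeated decisions from creating duplicates.
    entry = watchlist.setdefault(
        submission_id, {"contributionCount": 0, "state": "active"}
    )
    entry.update(
        sourceSubmission=submission_id,
        originDecisionStatus=status,
        sourceEvidenceIds=used,
        sampleFingerprints=[fingerprint],  # the case fingerprint is always kept
    )
    return entry
```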
---

### U3. Watchlist Matching And Scoring

**Goal:** Make watchlist candidates strongly affect future risk while remaining distinguishable from confirmed DB entries.

**Requirements:** R5, R6, R7, R9

**Dependencies:** U2

**Files:**
- Modify: `src/rights_filter/domain/records.py`
- Modify: `src/rights_filter/analysis/internal_analyzer.py`
- Modify: `src/rights_filter/analysis/risk_scoring.py`
- Modify: `src/rights_filter/server/sqlite_store.py`
- Test: `tests/rights_filter/analysis/test_internal_analyzer.py`
- Test: `tests/rights_filter/analysis/test_risk_scoring.py`
- Test: `tests/rights_filter/server/test_sqlite_store.py`

**Approach:**
- Carry knowledge entry status into internal fingerprint evidence so matches can be labeled as watchlist or confirmed.
- Keep watchlist entries active for matching unless excluded.
- Score watchlist image similarity at the same high-risk level as confirmed rejected-image similarity when similarity is high.
- Use distinct evidence reason/data for watchlist matches so UI grouping can separate them.
- Increment contribution count when a watchlist entry contributes to a rescore or analysis result.

**Execution note:** Test scoring and reason text before wiring UI labels.

**Patterns to follow:**
- `InternalAnalyzer` knowledge-base similarity loop.
- `RiskScorer` fingerprint evidence handling and non-contributing evidence checks.

**Test scenarios:**
- Happy path: image similar to a watchlist entry emits watchlist similarity evidence.
- Happy path: watchlist similarity at or above threshold produces high-risk score.
- Happy path: matched watchlist evidence does not change `decisionStatus`.
- Edge case: excluded watchlist entry is not included in repository matching.
- Integration: contribution count increases only when watchlist evidence contributes to the case score.

**Verification:**
- Watchlist matches raise risk strongly while remaining labeled as watchlist-derived evidence.

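As a rough illustration of the U3 matching rules, the sketch below labels matches by entry status, skips excluded entries, and counts contributions only for watchlist entries that actually match. It does not reproduce `InternalAnalyzer` or `RiskScorer`: the function name, the `0.9` threshold, the evidence `kind` strings, and the dict shapes are all assumptions.

```python
def match_knowledge_entries(similarities, entries, threshold=0.9):
    """Emit similarity evidence labeled by entry status (hypothetical sketch).

    similarities: entry ID -> similarity score already computed elsewhere.
    entries: knowledge entries with a "status" of watchlist/confirmed/excluded.
    """
    evidence = []
    for entry in entries:
        if entry["status"] == "excluded":
            continue  # excluded entries never take part in matching
        score = similarities.get(entry["id"], 0.0)
        if score < threshold:
            continue
        is_watchlist = entry["status"] == "watchlist"
        evidence.append(
            {
                "entryId": entry["id"],
                # Distinct kinds let a UI group watchlist and confirmed matches.
                "kind": "watchlist_similarity" if is_watchlist else "confirmed_similarity",
                "similarity": score,
            }
        )
        if is_watchlist:
            # Count only matches that produced evidence for this case.
            entry["contributionCount"] = entry.get("contributionCount", 0) + 1
    return evidence
```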
---

### U4. Candidate Promotion And False-Positive Exclusion

**Goal:** Let operators promote watchlist candidates to confirmed DB entries or exclude them so future matching is suppressed.

**Requirements:** R8, R9

**Dependencies:** U2, U3

**Files:**
- Modify: `src/rights_filter/server/sqlite_store.py`
- Modify: `src/rights_filter/server/http_app.py`
- Test: `tests/rights_filter/server/test_sqlite_store.py`
- Test: `tests/rights_filter/server/test_http_app.py`

**Approach:**
- Add store methods and HTTP routes for promoting a watchlist entry and excluding a watchlist entry.
- Promotion changes the entry status to confirmed while preserving source decision and evidence history.
- Exclusion changes the entry status to excluded, disables matching, and stores an exclusion reason.
- Apply false-positive evidence status to exact evidence/candidate patterns, image fingerprint, URL/image URL, and title where available.
- Add audit events for promotion and exclusion.

**Execution note:** Characterize existing manual/collection promotion behavior first, then add watchlist-specific paths.

**Patterns to follow:**
- Existing `promote_collection_candidate`, `promote_collection_candidates`, and knowledge entry active/deactivation patterns.

**Test scenarios:**
- Happy path: promoting a watchlist entry makes it confirmed and keeps sample fingerprints.
- Happy path: excluding a watchlist entry prevents future similarity evidence from that entry.
- Edge case: promoting an excluded entry requires an explicit un-exclude or returns a validation error.
- Edge case: missing candidate ID returns not found.
- Integration: audit log records promote/exclude actions.

**Verification:**
- Operators can move candidates between watchlist, confirmed, and excluded states without losing provenance.

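The promotion and exclusion transitions in U4 can be sketched as two small functions over an entry dict. The names `promote`, `exclude`, and `TransitionError`, and the audit record shape, are hypothetical. Of the two options the plan leaves open for promoting an excluded entry, this sketch assumes the validation-error option.

```python
class TransitionError(ValueError):
    """Hypothetical validation error for a disallowed status change."""


def promote(entry: dict, audit: list) -> dict:
    # Assumption: an excluded entry must be un-excluded before promotion.
    if entry["status"] == "excluded":
        raise TransitionError("excluded entry cannot be promoted directly")
    entry["status"] = "confirmed"  # provenance fields are left untouched
    audit.append({"action": "promote", "entryId": entry["id"]})
    return entry


def exclude(entry: dict, reason: str, audit: list) -> dict:
    entry["status"] = "excluded"  # a matcher would skip this status
    entry["exclusionReason"] = reason
    audit.append({"action": "exclude", "entryId": entry["id"], "reason": reason})
    return entry
```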
---

### U5. Operator UI Controls And Evidence Grouping

**Goal:** Make evidence status, watchlist matches, and candidate actions clear in the operator console.

**Requirements:** R1, R3, R6, R8, R9

**Dependencies:** U1, U3, U4

**Files:**
- Modify: `web/operator-gui/index.html`
- Modify: `web/operator-gui/app.js`
- Modify: `web/operator-gui/styles.css`
- Test: `tests/operator_gui/test_static_workbench.py`

**Approach:**
- Add evidence-row controls for 판단에 사용 (used for judgment), 무관 (irrelevant), 오탐 (false positive), and 보류 (pending).
- Hide or de-emphasize irrelevant and false-positive evidence by default while preserving a details view.
- Add a dedicated 주의 후보 근거 (watchlist candidate evidence) group for watchlist matches.
- Add watchlist status chips in the knowledge DB list: 주의 후보 (watchlist candidate), 확정 기준 (confirmed reference), 오탐 제외 (excluded as false positive).
- Add promote/exclude actions for watchlist rows.
- Keep controls dense and consistent with the existing operator dashboard; avoid introducing a separate landing or wizard.

**Execution note:** Follow frontend design checks after implementation: load the local 9500 page with Playwright and check for console errors and obvious layout breakage.

**Patterns to follow:**
- Existing evidence group rendering, details overflow, candidate cards, and knowledge rows in `web/operator-gui/app.js`.
- Existing compact panel and row styles in `web/operator-gui/styles.css`.

**Test scenarios:**
- Static contract: UI exposes evidence status action handlers and API paths.
- Static contract: watchlist group label and knowledge status chips are present.
- Static contract: irrelevant/false-positive evidence handling is represented in rendering functions.
- Browser check: page loads on desktop viewport without console errors after server restart.

**Verification:**
- Operators can mark evidence status, see watchlist evidence separately, and manage watchlist entries without confusing them with confirmed DB entries.

---

### U6. Documentation And Regression Verification

**Goal:** Update operations guidance and verify the feature end to end.

**Requirements:** R1-R9

**Dependencies:** U1-U5

**Files:**
- Modify: `docs/operations/copyrighter-operation-worklist.md`
- Test: `tests/rights_filter/server/test_sqlite_store.py`
- Test: `tests/rights_filter/server/test_http_app.py`
- Test: `tests/operator_gui/test_static_workbench.py`

**Approach:**
- Document the operator flow: mark evidence, decide case, watchlist creation, promotion, exclusion.
- State that watchlist matching is strong but not automatic case disposition.
- Run full test suite.
- Restart the 9500 server and verify `/health`, provider state, and browser load.

**Execution note:** Preserve the active `.env` and existing local data. Do not reset DB unless the user explicitly asks.

**Patterns to follow:**
- Existing operations doc format and local server verification pattern.

**Test scenarios:**
- Integration: full `pytest` passes.
- Browser: 9500 page loads without console errors.
- Operational: `/health` returns ok after restart.

**Verification:**
- Feature is documented, tests pass, and the local server is running with the updated code.

---

## System-Wide Impact

- **Interaction graph:** Case decisions now trigger watchlist updates; evidence status affects scoring contribution; internal analysis reads active confirmed/watchlist entries.
- **Error propagation:** Invalid evidence status, missing evidence, missing candidate, or invalid promotion/exclusion should return clear API errors without corrupting stored payloads.
- **State lifecycle risks:** Repeated held/rejected decisions must be idempotent per submission. Promotion and exclusion must not lose source decision provenance.
- **API surface parity:** Bootstrap, review, knowledge list, and evidence rows all need the new fields so the static UI stays in sync with server state.
- **Integration coverage:** Store tests must cover decision-to-watchlist-to-analysis; UI static tests must cover controls and grouping.
- **Unchanged invariants:** No automatic final case disposition, no applicant exposure, no biometric face storage, no scraping.

---

## Risks & Dependencies

| Risk | Mitigation |
|------|------------|
| Watchlist candidates over-amplify false positives | Keep watchlist visually distinct, add exclusion flow, and do not apply domain-wide suppression. |
| Rejected-entry behavior changes existing expectations | Update tests to make watchlist the automatic intermediate state and promotion the explicit confirmed state. |
| JSON payload fields drift across old records | Use default values when fields are absent and normalize in rendering/scoring paths. |
| UI becomes crowded | Use compact segmented evidence actions and keep weak/irrelevant evidence collapsed. |

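The "default values when fields are absent" mitigation in the table could be sketched as a single normalization helper applied before rendering or scoring. The field names and the choice of `confirmed` as the default status for pre-watchlist records are assumptions, not values taken from the store.

```python
import copy

# Hypothetical defaults for knowledge-entry fields that older records lack.
ENTRY_DEFAULTS = {
    "status": "confirmed",  # assumption: records predating the watchlist were confirmed
    "contributionCount": 0,
    "sourceEvidenceIds": [],
    "sampleFingerprints": [],
}


def normalize_entry(payload: dict) -> dict:
    """Return a copy of the payload with missing fields filled from defaults."""
    normalized = copy.deepcopy(ENTRY_DEFAULTS)  # deep copy so lists are not shared
    normalized.update(payload)
    return normalized
```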
---

## Documentation / Operational Notes

- Update the operations doc with the decision-first flow and the difference between 주의 후보 (watchlist candidate) and 확정 기준 (confirmed reference) DB entries.
- Keep the current `.env` behavior unchanged.
- Restart the 9500 server after implementation so the operator console uses the updated route handlers and JS.

---

## Sources & References

- **Origin document:** [docs/brainstorms/2026-05-26-evidence-quality-watchlist-requirements.md](docs/brainstorms/2026-05-26-evidence-quality-watchlist-requirements.md)
- Related plan: [docs/plans/2026-05-25-002-feat-image-rights-review-enrichment-plan.md](docs/plans/2026-05-25-002-feat-image-rights-review-enrichment-plan.md)
- Related code: `src/rights_filter/server/sqlite_store.py`
- Related code: `src/rights_filter/analysis/internal_analyzer.py`
- Related code: `src/rights_filter/analysis/risk_scoring.py`
- Related UI: `web/operator-gui/app.js`

174
docs/plans/2026-05-28-001-feat-operator-ui-overhaul-plan.md
Normal file
@ -0,0 +1,174 @@
---
status: active
created: 2026-05-28
type: feat
title: Operator UI overhaul
---

# Operator UI Overhaul Plan

## Problem Frame

The current operator console has drifted into a patched layout instead of a coherent review workspace. The top evidence coverage card scrolls inside the header, queue columns create awkward gaps between top evidence and provider state, Knowledge DB badges wrap unevenly, and "Case review" and "Search evidence" are separate top-level tabs even though both depend on the same selected case.

This plan rebuilds the information architecture around the operator's real workflow: filter the queue, select a case, inspect all evidence in one workbench, decide, and maintain reference knowledge.

## Requirements

- Replace the top scrolling evidence coverage panel with a horizontal tab/segmented indicator.
- Remove "Search evidence" as an independent top-level navigation destination.
- Combine case review, evidence, query history, relevant knowledge, and decision controls into one case workbench.
- Rebuild queue rows so top evidence and provider state sit next to each other without unused whitespace.
- Normalize Knowledge DB badge wrapping and action alignment.
- Preserve existing API endpoints and local static deployment.
- Verify with static tests and browser screenshots across desktop and mobile.

## Key Technical Decisions

- Keep the existing no-framework static app (`index.html`, `app.js`, `styles.css`) and avoid adding dependencies.
- Use CSS grid rows for the queue instead of table column sizing for the primary desktop layout.
- Use a compact horizontal `.coverage-tabs` control for provider coverage. It should be filter-capable, not just decorative.
- Treat the workbench as the only selected-case surface. Evidence search becomes a workbench tab.
- Use one shared chip/badge alignment system for Knowledge DB, evidence, providers, and queue summaries.

## Implementation Units

### U1: Header Coverage Tabs And Queue Row Grid

Files:
- `web/operator-gui/index.html`
- `web/operator-gui/app.js`
- `web/operator-gui/styles.css`
- `tests/operator_gui/test_static_workbench.py`

Approach:
- Replace `#search-coverage` card semantics with `#coverage-tabs`.
- Render coverage as one horizontal segmented row: all, evidence coverage, provider query counts, failures.
- Add provider/source filter behavior through `data-coverage-filter`.
- Replace queue table visual styling with `.queue-grid`/`.queue-row` classes while keeping accessible table markup if needed for low-risk migration.
- Reduce queue provider/evidence whitespace by explicitly sizing grid areas.

Test scenarios:
- Static shell exposes `id="coverage-tabs"` and does not expose a scrollable `search-coverage` header panel.
- Script contains `renderCoverageTabs` and `applyCoverageFilter`.
- CSS contains `.coverage-tabs`, `.coverage-tab`, `.queue-grid`, and `.queue-row`.
- CSS does not set `overflow: auto` on the top coverage control.

Verification:
- `python -m pytest tests/operator_gui/test_static_workbench.py`
- Browser screenshot at 1440x1000 and 390x844 with no document horizontal overflow.

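The U1 test scenarios above are static contracts over the three GUI files, so they can be sketched as one checker that takes the file contents as strings. This is an illustrative sketch and not the content of `test_static_workbench.py`; the function names `has_selector` and `check_coverage_contract` are hypothetical, and the CSS check only inspects a simple `.coverage-tabs { ... }` rule.

```python
import re


def has_selector(css: str, selector: str) -> bool:
    """True if the selector appears as a whole class name, not as a prefix."""
    return re.search(re.escape(selector) + r"(?![\w-])", css) is not None


def check_coverage_contract(html: str, css: str, js: str) -> list[str]:
    """Return a list of contract violations; an empty list means the contract holds."""
    problems = []
    if 'id="coverage-tabs"' not in html:
        problems.append("missing #coverage-tabs")
    if 'id="search-coverage"' in html:
        problems.append("legacy #search-coverage panel still present")
    for name in ("renderCoverageTabs", "applyCoverageFilter"):
        if name not in js:
            problems.append(f"missing script function {name}")
    for selector in (".coverage-tabs", ".coverage-tab", ".queue-grid", ".queue-row"):
        if not has_selector(css, selector):
            problems.append(f"missing CSS selector {selector}")
    # The top coverage control must not become a nested scroll area.
    rule = re.search(r"\.coverage-tabs\s*\{([^}]*)\}", css)
    if rule and re.search(r"overflow\s*:\s*auto", rule.group(1)):
        problems.append(".coverage-tabs sets overflow: auto")
    return problems
```

In a pytest file, a test could read the three files from `web/operator-gui/` and assert that `check_coverage_contract(...)` returns an empty list, which keeps the failure message specific to the violated contract.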
### U2: Case Workbench Internal Tabs

Files:
- `web/operator-gui/index.html`
- `web/operator-gui/app.js`
- `web/operator-gui/styles.css`
- `tests/operator_gui/test_static_workbench.py`

Approach:
- Rename the case surface to a workbench.
- Move query history and searchable evidence from the former `evidence-view` into workbench internal tabs.
- Add workbench tabs: summary, evidence, queries, knowledge, decision.
- Keep the selected case state shared, but stop representing evidence as an independent top-level route.

Test scenarios:
- Top-level view set excludes `evidence`.
- Workbench markup contains `data-workbench-tab="evidence"` and `data-workbench-panel="evidence"`.
- Selecting a case routes to workbench and preserves selected submission.
- Search evidence renderer targets the workbench panel.

Verification:
- Static tests.
- Browser click test: select queue row, switch internal tabs, confirm no top-level route mismatch.

### U3: Knowledge DB Badge And Action Alignment

Files:
- `web/operator-gui/app.js`
- `web/operator-gui/styles.css`
- `tests/operator_gui/test_static_workbench.py`

Approach:
- Split Knowledge DB row into media, content, chip rail, and actions.
- Move name/title out of chip row.
- Use a deterministic chip order: status, type, provenance, samples, watchlist meta.
- Align actions to a fixed rail on desktop and full-width row on mobile.

Test scenarios:
- Script renders `.knowledge-chip-row` separate from `.row-title`.
- CSS defines `.knowledge-row` grid areas and `.knowledge-actions` alignment.
- Long aliases/keywords wrap in metadata text, not inside status badges.

Verification:
- Static tests.
- Browser screenshot of Knowledge DB desktop and mobile.

### U4: Navigation Simplification And Labels

Files:
- `web/operator-gui/index.html`
- `web/operator-gui/app.js`
- `web/operator-gui/styles.css`
- `tests/operator_gui/test_static_workbench.py`

Approach:
- Reduce top-level navigation to queue, workbench, knowledge, providers, audit.
- Move correction tools under Knowledge DB or workbench decision context, depending on current behavior preservation.
- Ensure URL hash handling maps old `#case` and `#evidence` gracefully to the new workbench during transition.

Test scenarios:
- Top-level nav does not include `data-view="evidence"`.
- Old hash routes map to workbench instead of blank state.
- Existing correction controls remain reachable.

Verification:
- Static tests.
- Browser hash smoke checks.

### U5: Visual Verification And Cleanup

Files:
- `data/logs/`
- `web/operator-gui/styles.css`
- `tests/operator_gui/test_static_workbench.py`

Approach:
- Capture screenshots for queue, workbench tabs, Knowledge DB, providers.
- Remove obsolete CSS tied to scroll coverage panels and table-only queue spacing.
- Keep only compatibility styles needed during transition.

Test scenarios:
- No document horizontal overflow at 390px mobile.
- Header has no nested scroll control.
- Queue row content is visible and aligned at desktop and mobile.

Verification:
- `python -m pytest tests/operator_gui/test_static_workbench.py`
- Playwright screenshot audit with overflow diagnostics.

## Scope Boundaries

In scope:
- Static operator console UI structure, interaction state, and visual layout.
- Existing local server and current API response shape.
- Tests for UI contracts and visual regression smoke checks.

Out of scope:
- Backend API schema changes.
- New frontend framework migration.
- Authentication, deployment, or production process management.
- Rewriting garbled legacy Korean copy except where labels are touched by the UI structure change.

## Risks

- The existing static test file contains legacy string assertions and one unrelated `NaN` contract failure. Update tests deliberately so failures point to the new UI contract.
- Moving evidence into the workbench can break event listeners if IDs are duplicated or moved without updating render targets.
- Keeping table markup while visually converting to grid is a migration compromise; if it creates accessibility or CSS complexity, switch to semantic list/card rows with explicit labels.

## Verification Plan

- Run static UI contract tests after each implementation unit.
- Use Playwright to capture desktop and mobile screenshots after U1, U2, and U3.
- Record overflow diagnostics: `document.documentElement.scrollWidth === document.documentElement.clientWidth`.
- Inspect console errors and distinguish data 404s from UI regressions.

@ -0,0 +1,161 @@
---
status: active
created: 2026-06-03
type: quality
title: Operator console quality improvement plan
---

# Operator Console Quality Improvement Plan

## Scope

This plan responds to the concrete quality issues raised during review of the current operator console:

- static tests rely too heavily on string presence;
- visible Korean copy is partially corrupted;
- `web/operator-gui/app.js` carries too many responsibilities;
- the case workbench still needs stronger next-action guidance;
- submission upload and folder reload UX needs clearer operator feedback;
- the product purpose is not explicit enough in the UI.

## 1. Strengthen Tests Beyond String Presence

Problem:
The operator GUI tests catch missing selectors and accidental deletion, but they do not prove the main flows work in a browser.

Plan:

- Keep static tests as fast contract checks.
- Add server-backed behavior tests for upload, reload, and manual search.
- Add future browser smoke tests for queue selection, workbench tab switching, and suggested-query fill behavior.

Immediate action:
Completed:

- `tests/operator_gui/test_browser_smoke.py` now verifies the suggested-query browser flow: selecting a recommendation fills the manual query input without running a search.
- `tests/operator_gui/test_browser_smoke.py` now verifies the real browser upload flow through the local HTTP server: file selection, `/api/submissions/upload-image`, saved image file, new queue item, workbench navigation, and selected uploaded submission.
- `tests/rights_filter/server/test_http_app.py::test_http_server_uploads_submission_image_into_active_queue` verifies the upload HTTP contract saves the image into the active queue and returns the uploaded submission in the refreshed bootstrap payload.
- `tests/operator_gui/test_static_workbench.py` retains fast structural checks and now also protects module load order, mojibake prevention, product purpose copy, workflow guidance, and upload UX contracts.

## 2. Repair Corrupted Korean Copy

Problem:
Several operator-facing status strings in `app.js` are mojibake. This weakens trust in a review tool.

Plan:

- Prioritize visible workflow copy over legacy compatibility strings.
- Replace high-use status text first: reload, folder import, image upload, bulk rerun, manual search, and decision memo errors.
- Keep compatibility-only tokens isolated until tests can be rewritten around clean labels.

Immediate action:
Completed:

- Reload, folder import, upload, bulk rerun, manual search, evidence, provider, knowledge DB, candidate, correction, and empty-state copy were normalized to readable Korean.
- A mojibake regression check now scans the operator GUI files for common corrupted UTF-8 fragments.
- The operator GUI JavaScript files pass `node --check`.

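One way a mojibake regression check of this kind could work is sketched below. It is an assumption about the technique, not the check in `test_static_workbench.py`: it looks for the Unicode replacement character and for the three-character pattern that results when UTF-8 encoded Hangul syllables (lead bytes `0xEA` to `0xED`) are wrongly decoded as Latin-1. The names `MOJIBAKE` and `find_mojibake` are hypothetical.

```python
import re

# U+FFFD marks bytes that could not be decoded. The second alternative matches
# a Hangul syllable's three UTF-8 bytes after a wrong Latin-1 decode: a lead
# character in U+00EA..U+00ED followed by two characters in U+0080..U+00BF.
MOJIBAKE = re.compile("\ufffd|[\u00ea-\u00ed][\u0080-\u00bf]{2}")


def find_mojibake(text: str) -> list[int]:
    """Return the character offsets of suspected corrupted fragments."""
    return [match.start() for match in MOJIBAKE.finditer(text)]
```

A static test could read each operator GUI file as UTF-8 and assert that `find_mojibake` returns an empty list. Text corrupted through other code pages (for example CP949 or Windows-1252) would need additional patterns.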
## 3. Split `app.js` Responsibilities

Problem:
`app.js` owns queue rendering, workbench rendering, search, knowledge DB, provider controls, upload, and audit events.

Plan:

- First extract pure helpers: file payload reading, evidence sufficiency, query suggestions, and import status formatting.
- Then split rendering domains into small static modules if the deployment can safely move to module scripts.
- Avoid a broad framework migration.

Immediate action:
Completed:

- `web/operator-gui/operator-labels.js` owns visible label dictionaries.
- `web/operator-gui/submission-import.js` owns submission import/upload file payload helpers and import status message helpers.
- `web/operator-gui/evidence-guidance.js` owns evidence sufficiency checks, follow-up reasons, query seed normalization, and suggested evidence queries.
- `web/operator-gui/operator-search.js` owns query status labels, query strategy labels, and manual search provider normalization.
- `web/operator-gui/app.js` now delegates those concerns to the extracted helpers while retaining rendering and event orchestration.

Remaining:

- Rendering domains are still mostly in `app.js`. Further extraction should focus on evidence rendering and knowledge/candidate rendering only if needed, because the current no-framework static runtime is now split enough to reduce the most immediate risk.

## 4. Improve Next-Action Guidance

Problem:
The workbench shows evidence, but does not always tell the operator what to do when evidence is thin.

Plan:

- Keep the existing "근거 보강 추천" (evidence reinforcement recommendations) panel.
- Expand the sufficiency heuristic with clearer reasons: no direct match, too few searchable results, provider empty state, or duplicate queries.
- Add a browser smoke test to ensure clicking a recommendation fills the manual query form without running a search.

Immediate action:
Completed:

- The "근거 보강 추천" panel now explains concrete insufficiency reasons such as missing direct/page evidence, fewer than two searchable results, empty provider results, and failed provider history.
- Suggested queries fill the manual search input and switch to the query tab without automatically running a search.
- Browser smoke coverage verifies this non-automatic behavior.

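The insufficiency reasons listed above can be sketched as a pure function that returns one reason per failed check. The shipped helper lives in `web/operator-gui/evidence-guidance.js` and is not shown in this plan, so this Python version is only an illustration in the repo's test language. The field names `kind`, `searchable`, and `status`, and the reason strings, are assumptions.

```python
def insufficiency_reasons(evidence: list, query_history: list) -> list:
    """Return human-readable reasons why the evidence for a case looks thin."""
    reasons = []
    # No evidence that points directly at a matching image or page.
    if not any(item.get("kind") in {"direct", "page"} for item in evidence):
        reasons.append("missing direct/page evidence")
    # Fewer than two results an operator could follow up on.
    if sum(1 for item in evidence if item.get("searchable")) < 2:
        reasons.append("fewer than two searchable results")
    # A provider answered but returned nothing.
    if any(query.get("status") == "empty" for query in query_history):
        reasons.append("empty provider results")
    # A provider request failed earlier in the query history.
    if any(query.get("status") == "failed" for query in query_history):
        reasons.append("failed provider history")
    return reasons
```

Returning a list of reasons instead of a single boolean lets the panel explain every gap at once, and an empty list naturally means "sufficient".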
## 5. Clarify Upload And Folder Reload UX

Problem:
Operators should not need to understand active queue internals. After adding a photo or changing folders, the UI should say what happened.

Plan:

- Provide a file picker for adding a photo to the active queue.
- Save uploaded files into the active queue folder and rescan immediately.
- Select the uploaded submission when available.
- Show clear status text for imported count, selected ID, upload progress, and active folder label.

Immediate action:
Completed:

- The upload endpoint and UI are implemented.
- Upload success selects the uploaded submission, resets queue filters, moves to the workbench evidence tab, and tells the operator the new case was selected.
- The queue header now explains that adding a photo creates a new review case and opens it for case review.
- Browser smoke coverage verifies the real upload flow end to end.

## 6. Clarify Product Purpose And Workflow
|
||||||
|
|
||||||
|
Problem:
|
||||||
|
The console exposed many implementation concepts and actions, but the product purpose and operator flow were not explicit enough.
|
||||||
|
|
||||||
|
Plan:
|
||||||
|
|
||||||
|
- State the product purpose in the top bar.
|
||||||
|
- Show the operator's expected flow on the queue screen.
|
||||||
|
- Keep this copy stable with static tests.
|
||||||
|
|
||||||
|
Completed:
|
||||||
|
|
||||||
|
- The top bar now states: "이미지 저작권 위험 심사" and explains that submitted images, external search evidence, and the internal criteria DB are reviewed together.
|
||||||
|
- The queue screen now shows a three-step operating flow: "심사 건 추가", "근거 보강", and "운영 결정".
|
||||||
|
- Static tests protect the purpose copy and workflow copy.
|
||||||
|
|
||||||
|
## Verification
|
||||||
|
|
||||||
|
Run:
|
||||||
|
|
||||||
|
```powershell
python -m pytest tests\operator_gui\test_static_workbench.py
python -m pytest tests\operator_gui\test_browser_smoke.py
python -m pytest tests\rights_filter\server\test_http_app.py::test_http_server_uploads_submission_image_into_active_queue
node --check web\operator-gui\app.js
node --check web\operator-gui\operator-labels.js
node --check web\operator-gui\submission-import.js
node --check web\operator-gui\evidence-guidance.js
node --check web\operator-gui\operator-search.js
```
|
||||||
|
|
||||||
|
Latest local verification:
|
||||||
|
|
||||||
|
- `python -m pytest tests\operator_gui\test_static_workbench.py tests\operator_gui\test_browser_smoke.py` -> 39 passed.
|
||||||
|
- `python -m pytest tests\rights_filter\server\test_http_app.py::test_http_server_uploads_submission_image_into_active_queue` -> 1 passed.
|
||||||
|
- `node --check` passed for `app.js`, `operator-labels.js`, `submission-import.js`, `evidence-guidance.js`, and `operator-search.js`.
|
||||||
|
|
||||||
|
Future verification:
|
||||||
|
|
||||||
|
- If more rendering code is extracted from `app.js`, add a browser smoke check for that affected flow before relying on static checks.
|
||||||
|
- If the visual layout changes again, refresh the `ui-overhaul-final-results.json` overflow audit and screenshots.
|
||||||
256
docs/project-introduction-and-technical-implementation.md
Normal file
|
|
@ -0,0 +1,256 @@
|
||||||
|
# Copyrighter Project Introduction and Technical Implementation
|
||||||
|
|
||||||
|
## 1. Project Overview
|
||||||
|
|
||||||
|
Copyrighter is an operator-facing image rights review system. It helps a review team identify potentially risky image submissions before they are approved for use. The system does not automatically make a legal copyright decision. Instead, it gathers evidence, computes a triage risk score, summarizes source-linked findings, and presents the case to a human operator for final approval, hold, rejection, or correction.
|
||||||
|
|
||||||
|
The core product idea is evidence-led review:
|
||||||
|
|
||||||
|
- collect internal image signals from the submitted file;
|
||||||
|
- enrich the case with approved external search and computer vision sources;
|
||||||
|
- compare images against known reference material and collected candidates;
|
||||||
|
- generate a source-grounded LLM summary for operator readability;
|
||||||
|
- preserve all evidence, actions, and changes in an auditable local database.
|
||||||
|
|
||||||
|
## 2. What Problem It Solves
|
||||||
|
|
||||||
|
Image review teams often need to decide whether a submitted image may be associated with a celebrity, character, broadcast, webtoon, game, brand asset, copied source image, or previously rejected internal reference. A manual-only workflow is slow because the operator has to inspect the image, search the web, remember prior decisions, compare similar images, and document the reason.
|
||||||
|
|
||||||
|
Copyrighter reduces that effort by turning a raw submission into a structured review case:
|
||||||
|
|
||||||
|
- a risk score and risk band for queue prioritization;
|
||||||
|
- top evidence explaining why the case may be risky;
|
||||||
|
- external search tool status for Google, Naver, and local LLM summarization;
|
||||||
|
- source URLs, image candidates, query history, and provider failures;
|
||||||
|
- a knowledge database for confirmed, watchlist, and excluded references;
|
||||||
|
- an audit log for operational traceability.
|
||||||
|
|
||||||
|
## 3. Main User Experience
|
||||||
|
|
||||||
|
The operator console is a local web application with these primary work areas:
|
||||||
|
|
||||||
|
- Review Queue: board-style list of submissions with image thumbnail, risk, top evidence, external search tool status, applicant status, operator decision, and timestamp.
|
||||||
|
- Case Review: detailed case screen where evidence and judgment controls are reviewed in one flow. The operator can mark evidence as used or unused and make the final decision.
|
||||||
|
- Knowledge DB: confirmed references and watchlist candidates used for future internal similarity checks.
|
||||||
|
- External Search Tool Usage: provider status, quota, recent success/failure state, and emergency disable controls.
|
||||||
|
- Audit Log: event history covering provider changes, analysis runs, manual searches, knowledge entry updates, and operator decisions.
|
||||||
|
|
||||||
|
## 4. High-Level Architecture
|
||||||
|
|
||||||
|
Copyrighter is implemented as a local Python backend with a static operator GUI.
|
||||||
|
|
||||||
|
Key components:
|
||||||
|
|
||||||
|
- `src/rights_filter/server/http_app.py`: local HTTP API server.
|
||||||
|
- `src/rights_filter/server/sqlite_store.py`: SQLite-backed application store, evidence persistence, provider state synchronization, enrichment orchestration, and audit events.
|
||||||
|
- `src/rights_filter/server/image_store.py`: local submission image loading.
|
||||||
|
- `web/operator-gui/`: static HTML, CSS, and JavaScript operator console.
|
||||||
|
- `data/copyrighter.sqlite3`: local SQLite database.
|
||||||
|
- `data/submissions/`: local submission image source directory.
|
||||||
|
|
||||||
|
The system can run locally at `http://127.0.0.1:9500/`.
|
||||||
|
|
||||||
|
## 5. End-to-End Processing Flow
|
||||||
|
|
||||||
|
1. A submission image is loaded from the local image store.
2. The backend creates or refreshes a submission record in SQLite.
3. Internal analysis generates local evidence:
   - SHA-256 exact fingerprint;
   - pHash perceptual fingerprint;
   - face/person presence signal;
   - known reference similarity matches.
4. Approved external enrichment may run:
   - Google Cloud Vision Web Detection for web entities, matching images, visually similar images, and pages with matching images;
   - Naver text-query search for Korean image, blog, and web evidence;
   - Google Custom Search, only when configured; it is treated as an optional legacy path that can be disabled.
5. Search result images and page images can be compared against the submitted image using pHash similarity.
6. The local Ollama LLM summarizes only the stored source evidence.
7. `RiskScorer` computes a rule-based risk score and band.
8. The operator reviews the evidence and makes the final decision manually.
|
||||||
|
|
||||||
|
## 6. AI, ML, and Algorithmic Components
|
||||||
|
|
||||||
|
Copyrighter uses several AI/ML or algorithmic techniques. They have different roles and should not be described as one generic "AI score."
|
||||||
|
|
||||||
|
### 6.1 Google Cloud Vision Web Detection
|
||||||
|
|
||||||
|
Google Cloud Vision Web Detection is the strongest external ML-based computer vision component. It analyzes an approved image derivative and returns:
|
||||||
|
|
||||||
|
- web entities;
|
||||||
|
- full matching images;
|
||||||
|
- partial matching images;
|
||||||
|
- visually similar images;
|
||||||
|
- pages with matching images;
|
||||||
|
- best guess labels.
|
||||||
|
|
||||||
|
These results are stored as evidence with source, URL, image URL, page title, match type, provider score, and confidence.
|
||||||
|
|
||||||
|
### 6.2 Local LLM Summarization with Ollama
|
||||||
|
|
||||||
|
The local LLM assistant uses Ollama's Generate API through `src/rights_filter/analysis/llm_assistance.py`.
|
||||||
|
|
||||||
|
Its prompt explicitly restricts the model:
|
||||||
|
|
||||||
|
- summarize only the provided source evidence;
|
||||||
|
- do not make a final decision;
|
||||||
|
- do not add claims that are not grounded in source evidence.
|
||||||
|
|
||||||
|
The LLM output is stored as a source-linked summary evidence item. It helps operators read the case faster, but it does not directly add to the risk score.
|
||||||
|
|
||||||
|
### 6.3 Face and Person Presence Detection
|
||||||
|
|
||||||
|
`src/rights_filter/analysis/face_person_detection.py` uses OpenCV Haar cascades for presence-only face/person detection. It detects whether a face-like/person-like signal exists in the image.
|
||||||
|
|
||||||
|
Important boundary:
|
||||||
|
|
||||||
|
- it does not identify a person;
|
||||||
|
- it does not store face embeddings;
|
||||||
|
- it does not perform biometric matching;
|
||||||
|
- it is used only as a review-priority signal.
|
||||||
|
|
||||||
|
### 6.4 Image Fingerprints and pHash Similarity
|
||||||
|
|
||||||
|
`src/rights_filter/analysis/fingerprints.py` generates:
|
||||||
|
|
||||||
|
- SHA-256 exact file fingerprint;
|
||||||
|
- 64-bit perceptual hash from an 8x8 grayscale thumbnail.
|
||||||
|
|
||||||
|
pHash similarity is computed from Hamming distance. This is not a learned ML model, but it is a key algorithmic image-comparison feature. A similarity score close to `1.0` means the images are visually very similar by this hash method.
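The relationship between Hamming distance and similarity can be sketched as follows. This is a minimal illustration, not the code in `fingerprints.py`: the function names and the `1 - distance / 64` normalization are assumptions chosen so that identical hashes score `1.0`.

```python
def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two 64-bit hashes differ."""
    return bin((a ^ b) & 0xFFFFFFFFFFFFFFFF).count("1")


def phash_similarity(a: int, b: int, bits: int = 64) -> float:
    """Map Hamming distance to a 0.0..1.0 similarity (assumed normalization)."""
    return 1.0 - hamming_distance(a, b) / bits
```

Under this assumed normalization, two hashes that differ in 4 of 64 bits score `0.9375`, which would clear a `>= 0.9` threshold, while 7 differing bits would not.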
|
||||||
|
|
||||||
|
## 7. Risk Score and Confidence Model
|
||||||
|
|
||||||
|
The risk score is not an LLM-generated probability. It is a rule-based triage score implemented in `src/rights_filter/analysis/risk_scoring.py`.
|
||||||
|
|
||||||
|
The scorer adds points based on evidence type:
|
||||||
|
|
||||||
|
- pHash similarity `>= 0.9`: strong image similarity signal.
|
||||||
|
- Face/person presence: additional review signal.
|
||||||
|
- Google full image match: strong external match.
|
||||||
|
- Google partial/page match: medium external match.
|
||||||
|
- Google visual match: weaker supporting signal.
|
||||||
|
- Promoted Naver search result: score based on confidence.
|
||||||
|
- LLM summary: no direct score contribution.
|
||||||
|
|
||||||
|
The final score is capped at 100 and mapped into bands:
|
||||||
|
|
||||||
|
- `high`: 70 or above;
|
||||||
|
- `medium`: 30 to 69;
|
||||||
|
- `low`: below 30.
|
||||||
|
|
||||||
|
Therefore, `riskScore = 100` means "highest review priority under the rule set." It does not mean "100% legally infringing."
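The band thresholds above can be expressed as a small function. This is a sketch of the documented mapping only; the name `risk_band` is hypothetical, and clamping negative input to 0 is an assumption not stated in the text.

```python
def risk_band(score: int) -> str:
    """Map a rule-based triage score to its review-priority band."""
    capped = max(0, min(100, score))  # documented cap at 100; floor at 0 is assumed
    if capped >= 70:
        return "high"
    if capped >= 30:
        return "medium"
    return "low"
```

The band expresses queue priority, so a capped score of 100 and a score of 70 land in the same `high` band.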
|
||||||
|
|
||||||
|
## 8. Evidence Model
|
||||||
|
|
||||||
|
Evidence is the central unit of the system. Each evidence item can include:
|
||||||
|
|
||||||
|
- source, such as fingerprint, face, Google, Naver, LLM, or failure;
|
||||||
|
- reason/title;
|
||||||
|
- confidence;
|
||||||
|
- query and query strategy;
|
||||||
|
- URL, image URL, thumbnail URL, source page URL;
|
||||||
|
- match type and provider score;
|
||||||
|
- source evidence IDs;
|
||||||
|
- contribution status;
|
||||||
|
- operator status.
|
||||||
|
|
||||||
|
Operators can mark evidence as used or unused for judgment. This keeps the final decision explainable without allowing raw automation to become the final decision maker.
|
||||||
|
|
||||||
|
## 9. External Search Tool Usage
|
||||||
|
|
||||||
|
External integrations are provider-managed. The UI exposes them as "External Search Tool Usage" rather than generic "providers."
|
||||||
|
|
||||||
|
Supported provider paths include:
|
||||||
|
|
||||||
|
- Google Cloud Vision for image/web detection;
|
||||||
|
- Naver Search API for text-query based Korean evidence;
|
||||||
|
- Google Custom Search when configured, treated as an optional legacy/disable-capable path;
|
||||||
|
- Ollama local LLM for evidence summarization.
|
||||||
|
|
||||||
|
Provider state is calculated per submission:
|
||||||
|
|
||||||
|
- `covered`: evidence exists;
|
||||||
|
- `empty`: the tool ran but returned no useful result;
|
||||||
|
- `not_run`: no run has happened;
|
||||||
|
- `failed`: the tool attempted execution and failed;
|
||||||
|
- `disabled`: the tool is configured off.
|
||||||
|
|
||||||
|
This distinction avoids misleading queue states such as treating every enabled tool as merely pending.
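One way to derive these five states for a single provider and submission is sketched below. The function name, its parameters, and the precedence order (disabled, then not run, then covered, then failed, then empty) are assumptions for illustration; the store's actual rules may differ.

```python
def provider_state(enabled: bool, run_count: int,
                   evidence_count: int, last_run_failed: bool) -> str:
    """Derive a per-submission provider state (hypothetical precedence)."""
    if not enabled:
        return "disabled"
    if run_count == 0:
        return "not_run"
    if evidence_count > 0:
        return "covered"
    if last_run_failed:
        return "failed"
    return "empty"
```

Keeping `not_run`, `failed`, and `empty` distinct is what lets the queue show whether a tool still needs to run, needs a retry, or genuinely found nothing.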
|
||||||
|
|
||||||
|
## 10. Knowledge DB and Feedback Loop
|
||||||
|
|
||||||
|
Copyrighter includes a knowledge database for reusable review references:
|
||||||
|
|
||||||
|
- confirmed entries: accepted reusable references;
|
||||||
|
- watchlist entries: derived from held/rejected cases but not yet confirmed;
|
||||||
|
- excluded entries: false positives or stale references.
|
||||||
|
|
||||||
|
The knowledge DB helps the internal analyzer detect future similar submissions. It is deliberately operator-controlled: automated evidence can suggest candidates, but operators decide what becomes a reusable reference.
|
||||||
|
|
||||||
|
## 11. Persistence and Auditability
|
||||||
|
|
||||||
|
SQLite is used as the local persistence layer. The store manages:
|
||||||
|
|
||||||
|
- submissions;
|
||||||
|
- evidence;
|
||||||
|
- external search tool records;
|
||||||
|
- knowledge entries;
|
||||||
|
- collection candidates;
|
||||||
|
- corrections;
|
||||||
|
- audit events.
|
||||||
|
|
||||||
|
Audit events are created for important actions such as analysis runs, manual provider calls, provider setting changes, knowledge entry creation, and submission pruning. This makes the review process traceable and easier to inspect after a decision.
|
||||||
|
|
||||||
|
## 12. Operational Boundaries
|
||||||
|
|
||||||
|
The project intentionally keeps several strict boundaries:
|
||||||
|
|
||||||
|
- Do not automate Google Image Search, Google Lens, Naver web UI, or scrape result pages.
|
||||||
|
- Do not send original images to Naver; Naver is used through text queries.
|
||||||
|
- Do not store biometric templates, face embeddings, or celebrity identity matches from faces.
|
||||||
|
- Do not expose internal risk scores or evidence details to applicants.
|
||||||
|
- Do not let automated analysis change final review status.
|
||||||
|
- Do not treat LLM output as standalone evidence unless it links back to source evidence.
|
||||||
|
|
||||||
|
These boundaries make the system more defensible: AI/ML is used to support review, not to replace accountable human judgment.
|
||||||
|
|
||||||
|
## 13. Configuration
|
||||||
|
|
||||||
|
Runtime behavior is configured through environment variables and provider runtime construction in `src/rights_filter/integrations/env_clients.py`.
|
||||||
|
|
||||||
|
Examples of configurable areas:
|
||||||
|
|
||||||
|
- Google Cloud Vision API key and request limits;
|
||||||
|
- Naver client ID/secret and query limits;
|
||||||
|
- Google Custom Search key/CX when used;
|
||||||
|
- Ollama base URL and model;
|
||||||
|
- daily limits and provider-specific policies;
|
||||||
|
- automatic Naver query limits;
|
||||||
|
- search result image comparison thresholds.
|
||||||
|
|
||||||
|
When a required external credential is missing, the corresponding tool can be disabled while the internal workflow continues.
|
||||||
|
|
||||||
|
## 14. Why This Can Be Described as an AI/ML System
|
||||||
|
|
||||||
|
The project can accurately be described as using AI/ML because it includes:
|
||||||
|
|
||||||
|
- ML-based computer vision through Google Cloud Vision Web Detection;
|
||||||
|
- local generative AI summarization through Ollama;
|
||||||
|
- classical computer vision face/person presence detection;
|
||||||
|
- algorithmic image similarity through perceptual hashing;
|
||||||
|
- rule-based evidence scoring for review triage.
|
||||||
|
|
||||||
|
The strongest and most accurate positioning is:
|
||||||
|
|
||||||
|
> Copyrighter is an AI/ML-assisted image rights review platform that automatically collects, compares, and summarizes source-linked evidence, while keeping final rights decisions under human operator control.
|
||||||
|
|
||||||
|
## 15. Current Limitations
|
||||||
|
|
||||||
|
The system is a review-assistance platform, not a legal decision engine. Known limitations include:
|
||||||
|
|
||||||
|
- image similarity does not prove copyright ownership;
|
||||||
|
- search results can be incomplete, duplicated, stale, or misleading;
|
||||||
|
- weak labels and visually similar images are low-confidence signals;
|
||||||
|
- LLM summaries can only be trusted to the extent that the source evidence is complete and correctly linked;
|
||||||
|
- provider failures and quota limits must be visible to operators rather than silently treated as low risk.
|
||||||
|
|
||||||
|
These limitations are handled by surfacing evidence, source links, provider status, and manual operator controls in the UI.
|
||||||
|
|
@ -0,0 +1,96 @@
|
||||||
|
---
title: No Demo Fallbacks in Production Review Tools
date: 2026-05-27
category: docs/solutions/best-practices/
module: copyrighter operations hardening
problem_type: best_practice
component: development_workflow
severity: high
applies_when:
  - "A review or moderation tool is used for operator decisions"
  - "Local tests need fixtures that should not affect production behavior"
  - "API failure, image decoding failure, or database drift could change operator judgment"
tags: [demo-data, operator-ui, image-analysis, sqlite, evidence-ids]
---
|
||||||
|
|
||||||
|
# No Demo Fallbacks in Production Review Tools
|
||||||
|
|
||||||
|
## Context
|
||||||
|
The Copyrighter operator console and analysis engine had several convenience paths that were useful while prototyping but unsafe for rights review work. The frontend could render hardcoded sample cases when the API failed, face detection could infer a face from marker text in image bytes, pHash could turn non-image bytes into a fuzzy hash, and evidence IDs used Python's salted `hash()`.
|
||||||
|
|
||||||
|
## Guidance
|
||||||
|
Keep demo behavior out of production runtime paths. If a test needs synthetic evidence, inject it through an explicit test double rather than hiding it in product code.
|
||||||
|
|
||||||
|
For UI startup, initialize operational collections as empty and render a clear API failure state:
|
||||||
|
|
||||||
|
```js
const submissions = [];

function showApiError(message) {
  state.apiError = message;
  const target = document.getElementById("queue-health");
  if (target) target.textContent = message;
}
```
|
||||||
|
|
||||||
|
For analysis, failed image decoding should mean "no local signal", not a text-marker signal or fuzzy byte similarity:
|
||||||
|
|
||||||
|
```py
def detect(self, image: ImagePayload) -> FacePersonSignal:
    return self._detect_with_opencv(image.content)
```
|
||||||
|
|
||||||
|
When a fallback value is necessary, make it non-contributing unless it is an exact match. For evidence identity, derive IDs from a stable digest:
|
||||||
|
|
||||||
|
```py
return "ev-" + hashlib.sha256(base.encode("utf-8")).hexdigest()[:24]
```
|
||||||
|
|
||||||
|
For persistence, add enough typed columns and write-time validation that malformed payloads fail before they become silent JSON drift.
|
||||||
|
|
||||||
|
## Why This Matters
|
||||||
|
Operator tools are decision surfaces. If an API failure renders fake cases, or a failed analyzer creates plausible-looking evidence, the operator can make a real decision from non-real inputs. Stable evidence IDs also matter because dedupe, audit trails, and reanalysis history depend on the same evidence retaining the same identity across restarts.
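A self-contained version of the stable-ID idea is sketched below. The digest call mirrors the snippet in the Guidance section; the function name and the way `base` is assembled from submission ID, source, and URL are assumptions, since the real composition of `base` is not shown here.

```python
import hashlib


def stable_evidence_id(submission_id: str, source: str, url: str) -> str:
    """Derive an evidence ID that is identical across processes and restarts."""
    # Assumed composition of the digest input; the store may use other fields.
    base = "|".join((submission_id, source, url))
    return "ev-" + hashlib.sha256(base.encode("utf-8")).hexdigest()[:24]
```

Python's built-in `hash()` for strings is randomized per process unless `PYTHONHASHSEED` is fixed, so an ID built from it can change after a restart. A SHA-256 digest of the same input never does.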
|
||||||
|
|
||||||
|
## When to Apply
|
||||||
|
- Any console that reviewers use to approve, reject, or hold cases.
|
||||||
|
- Any analyzer fallback that can influence risk score, evidence grouping, or case status.
|
||||||
|
- Any local fixture or sample data path that could be bundled with the UI.
|
||||||
|
- Any persistence layer that stores flexible payloads but still feeds operational decisions.
|
||||||
|
|
||||||
|
## Examples
|
||||||
|
Before:
|
||||||
|
|
||||||
|
```py
if opencv_signal.present:
    return opencv_signal
return self._detect_marker_text(image.content)
```
|
||||||
|
|
||||||
|
After:
|
||||||
|
|
||||||
|
```py
return self._detect_with_opencv(image.content)
```
|
||||||
|
|
||||||
|
Before:
|
||||||
|
|
||||||
|
```js
const submissions = [{ id: "SUB-1007", riskBand: "high" }];
```
|
||||||
|
|
||||||
|
After:
|
||||||
|
|
||||||
|
```js
const submissions = [];
```
|
||||||
|
|
||||||
|
Tests should move the synthetic signal into the test boundary:
|
||||||
|
|
||||||
|
```py
class OneFaceDetector:
    def detect(self, image):
        return FacePersonSignal(face_count=1, person_count=1)
```
|
||||||
|
|
||||||
|
## Related
|
||||||
|
- Review fix touched `src/rights_filter/analysis/face_person_detection.py`, `src/rights_filter/analysis/fingerprints.py`, `src/rights_filter/server/sqlite_store.py`, and `web/operator-gui/app.js`.
|
||||||
|
|
@ -0,0 +1,72 @@
|
||||||
|
# Folder-Aware Submission Queue Design (2026-05-29)
|
||||||
|
|
||||||
|
## Goal
|
||||||
|
- On server startup, do not auto-import images from the configured default folder.
|
||||||
|
- Activate an import queue only when the user requests a folder import via `POST /api/submissions/import-folder`.
|
||||||
|
- Keep queue results in SQLite, keyed by folder path.
|
||||||
|
- Re-importing the same folder should keep only currently present files (deleted files are removed from that queue).
|
||||||
|
- Importing a different folder should switch active queue and avoid mixing submissions across queues.
|
||||||
|
|
||||||
|
## Requirements from user intent
|
||||||
|
1. Startup is central and shared; queue selection is driven by user-imported folders.
|
||||||
|
2. User folder import sets the active queue for the current session.
|
||||||
|
3. Queue state and submission mapping are persisted in DB.
|
||||||
|
4. Same folder re-import returns only remaining submissions.
|
||||||
|
5. Different folder import starts/activates a separate queue.
|
||||||
|
|
||||||
|
## Data model
|
||||||
|
- `submission_queues` table:
  - `id`: `queue-{sha1(abs_path)[:16]}`
  - `folder_path`: absolute folder path
  - `label`: queue label (default = folder name)
  - `is_active`: 1 if active queue
  - `created_at`, `created_epoch`
  - `last_imported_at`, `last_imported_epoch`
- `submissions` already stores `queue_id` and defaults to `""`.
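The queue ID scheme can be illustrated with a short sketch. The helper name `queue_id_for`, the UTF-8 encoding, and the use of `os.path.abspath` for normalization are assumptions; only the `queue-{sha1(abs_path)[:16]}` shape comes from the design.

```python
import hashlib
import os


def queue_id_for(folder_path: str) -> str:
    """Build a deterministic queue ID from a folder's absolute path."""
    abs_path = os.path.abspath(folder_path)  # assumed normalization step
    digest = hashlib.sha1(abs_path.encode("utf-8")).hexdigest()
    return "queue-" + digest[:16]
```

Because the ID depends only on the normalized path, re-importing the same folder resolves to the same queue row, while a different folder always gets its own queue.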
|
||||||
|
|
||||||
|
## Runtime flow
|
||||||
|
1. Server startup
   - `CopyrighterStore.initialize()` only.
   - No `seed_from_image_store(...)` during startup.

2. User folder import
   - Endpoint: `POST /api/submissions/import-folder`
   - Request body: `{ "path": "..." }` (or `{ "folder": "..." }`)
   - Handler creates `LocalSubmissionImageStore(path)` and calls `seed_from_image_store()`.
   - `seed_from_image_store()` calls `ensure_queue(path)` first.
   - Queue row is created/selected and marked active.
   - On the first import, legacy rows without a `queue_id` are migrated to the active queue.
   - Bootstrap payload is returned with `submissionQueue` and queue-filtered `submissions`.

3. Same folder re-import
   - `seed_from_image_store()` calls `_prune_missing_submission_files(...)` for that queue.
   - Existing records whose files are missing on disk are removed.
   - Only files that are not yet in the DB are added.

4. Different folder import
   - `ensure_queue(path)` deactivates previous queues and activates the target folder queue.
   - Bootstrap/reload endpoints use the active queue only.

5. Persistence
   - Active queue metadata is saved in `submission_queues`.
   - Restarting the store (`initialize()` + new `CopyrighterStore`) keeps the active queue.
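The same-folder re-import rule in step 3 amounts to a set reconciliation between what the DB holds and what is on disk. The sketch below is an illustration under that reading; `reconcile_queue` is a hypothetical name, not the store's `_prune_missing_submission_files` implementation.

```python
def reconcile_queue(db_files: set[str], disk_files: set[str]) -> tuple[set[str], set[str]]:
    """Return (records to remove, files to add) for one queue's folder."""
    to_remove = db_files - disk_files  # recorded in the DB but deleted from disk
    to_add = disk_files - db_files     # present on disk but not yet recorded
    return to_remove, to_add
```

Files present in both sets are left untouched, so existing evidence and decisions for unchanged submissions survive a re-import.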
|
||||||
|
|
||||||
|
## API behavior
|
||||||
|
- `GET /api/bootstrap`
  - Returns `submissionQueue` for the currently active queue.
  - Returns only `submissions` belonging to that queue.
- `POST /api/submissions/reload`
  - Re-syncs only the active queue.
- `POST /api/submissions/import-folder`
  - Switches/creates the active queue, persists queue metadata, and syncs only the selected folder.
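A client call to the import endpoint might be built as sketched below, using only the standard library. The helper name, the `Content-Type` header, and the base URL are assumptions; the path and the `path` body key come from the design above.

```python
import json
from urllib import request


def build_import_folder_request(base_url: str, folder: str) -> request.Request:
    """Prepare (but do not send) a folder-import request."""
    body = json.dumps({"path": folder}).encode("utf-8")
    return request.Request(
        base_url.rstrip("/") + "/api/submissions/import-folder",
        data=body,
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` against a running local server would send it; per the design, the response is the bootstrap payload for the newly active queue.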
|
||||||
|
|
||||||
|
## Verification coverage
|
||||||
|
- HTTP
  - `tests/rights_filter/server/test_http_app.py`
    - `test_http_server_does_not_auto_import_on_startup`
    - `test_http_server_reimports_same_folder_only_with_remaining_submissions`
- Storage/DB
  - `tests/rights_filter/server/test_sqlite_store.py`
    - `test_sqlite_store_switches_active_queue_when_importing_from_different_folders`
    - `test_sqlite_store_reload_uses_remaining_files_only_for_the_active_queue`
    - `test_sqlite_store_persists_active_queue_in_database`
|
||||||
|
|
@ -0,0 +1,113 @@
|
||||||
|
# Multi Candidate Knowledge Promotion Implementation Plan
|
||||||
|
|
||||||
|
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
|
||||||
|
|
||||||
|
**Goal:** Allow an operator to select several collected image-search candidates and promote them into one knowledge-base entry with multiple visual samples.
|
||||||
|
|
||||||
|
**Architecture:** Keep the current Naver candidate collection flow and add a batch-promotion path beside the existing single-candidate path. The SQLite store owns the merge behavior, the HTTP layer exposes one batch endpoint, and the GUI adds selection controls plus one shared promotion form.
|
||||||
|
|
||||||
|
**Tech Stack:** Python standard-library HTTP server, SQLite JSON payload store, pytest, static HTML/CSS/JavaScript operator GUI.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Task 1: Store Contract
|
||||||
|
|
||||||
|
**Files:**
|
||||||
|
- Modify: `tests/rights_filter/server/test_sqlite_store.py`
|
||||||
|
- Modify: `src/rights_filter/server/sqlite_store.py`
|
||||||
|
|
||||||
|
- [ ] **Step 1: Write the failing test**
|
||||||
|
|
||||||
|
Add a test that collects two candidates and calls `promote_collection_candidates({"candidate_ids": [...]})`. Assert that one knowledge entry is created, both candidate fingerprints are preserved, and both candidates point to the same `promotedKnowledgeId`.
|
||||||
|
|
||||||
|
- [ ] **Step 2: Run the test to verify RED**
|
||||||
|
|
||||||
|
Run: `python -m pytest tests/rights_filter/server/test_sqlite_store.py::test_sqlite_store_promotes_multiple_candidates_into_one_knowledge_entry -q`
|
||||||
|
|
||||||
|
Expected: fail because `CopyrighterStore.promote_collection_candidates` does not exist.
|
||||||
|
|
||||||
|
- [ ] **Step 3: Implement the store merge**
|
||||||
|
|
||||||
|
Add `promote_collection_candidates` and let the existing `promote_collection_candidate` delegate to it with a single ID. The merged knowledge payload must include `sourceCandidates`, `sampleFingerprints`, `imageAsset`, `imageAssets`, `imageFacts`, and operator metadata.
|
||||||
|
|
||||||
|
- [ ] **Step 4: Run the store tests**
|
||||||
|
|
||||||
|
Run: `python -m pytest tests/rights_filter/server/test_sqlite_store.py::test_sqlite_store_collects_keyword_candidates_and_promotes_one_to_knowledge tests/rights_filter/server/test_sqlite_store.py::test_sqlite_store_promotes_multiple_candidates_into_one_knowledge_entry -q`
|
||||||
|
|
||||||
|
Expected: pass.
|
||||||
|
|
||||||
|
### Task 2: HTTP Endpoint
|
||||||
|
|
||||||
|
**Files:**
|
||||||
|
- Modify: `tests/rights_filter/server/test_http_app.py`
|
||||||
|
- Modify: `src/rights_filter/server/http_app.py`
|
||||||
|
|
||||||
|
- [ ] **Step 1: Write the failing test**
|
||||||
|
|
||||||
|
Add a test that posts to `POST /api/collections/candidates/promote-batch` with two candidate IDs and asserts that the response contains one merged knowledge entry.
|
||||||
|
|
||||||
|
- [ ] **Step 2: Run the test to verify RED**
|
||||||
|
|
||||||
|
Run: `python -m pytest tests/rights_filter/server/test_http_app.py::test_http_server_promotes_multiple_collection_candidates_into_one_knowledge_entry -q`
|
||||||
|
|
||||||
|
Expected: fail with HTTP 404 or missing route.
|
||||||
|
|
||||||
|
- [ ] **Step 3: Implement the route**
|
||||||
|
|
||||||
|
Route `/api/collections/candidates/promote-batch` to `store.promote_collection_candidates(body)` and keep `/api/collections/candidates/{id}/promote` intact.
|
||||||
|
|
||||||
|
- [ ] **Step 4: Run the HTTP tests**
|
||||||
|
|
||||||
|
Run: `python -m pytest tests/rights_filter/server/test_http_app.py::test_http_server_collects_keyword_candidates_and_promotes_candidate tests/rights_filter/server/test_http_app.py::test_http_server_promotes_multiple_collection_candidates_into_one_knowledge_entry -q`
|
||||||
|
|
||||||
|
Expected: pass.
|
||||||
|
|
||||||
|
### Task 3: Operator GUI
|
||||||
|
|
||||||
|
**Files:**
|
||||||
|
- Modify: `tests/operator_gui/test_static_workbench.py`
|
||||||
|
- Modify: `web/operator-gui/index.html`
|
||||||
|
- Modify: `web/operator-gui/app.js`
|
||||||
|
- Modify: `web/operator-gui/styles.css`
|
||||||
|
|
||||||
|
- [ ] **Step 1: Write the static GUI test**
|
||||||
|
|
||||||
|
Assert that the GUI exposes candidate checkboxes, a shared collection promotion form, and a call to `/api/collections/candidates/promote-batch`.

- [ ] **Step 2: Run the static GUI test to verify RED**

Run: `python -m pytest tests/operator_gui/test_static_workbench.py::test_operator_gui_exposes_keyword_candidate_collection_workflow -q`

Expected: fail because the batch form and handler are not present.

- [ ] **Step 3: Implement the UI**

Add checkboxes to candidate cards, a compact batch promotion form under the candidate list, and a `promoteSelectedCollectionCandidates` handler that posts selected IDs plus name/type/aliases/keywords/memo.

- [ ] **Step 4: Run GUI checks**

Run: `python -m pytest tests/operator_gui/test_static_workbench.py -q`
Run: `node --check web/operator-gui/app.js`

Expected: pass.

### Task 4: End-To-End Verification

**Files:**
- No additional source files.

- [ ] **Step 1: Run full automated verification**

Run: `python -m pytest`

Expected: all tests pass.

- [ ] **Step 2: Restart local server on port 9500**

Run: `python run_copyrighter_server.py --host 127.0.0.1 --port 9500`

Expected: `/health` returns `{"status":"ok","port":9500}`.
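
The health expectation can also be checked from a script instead of a browser. The sketch below assumes the response body is exactly the two-field JSON object shown above; `health_matches` and `check_health` are illustrative helper names, not part of the repository.

```python
import json
from urllib.request import urlopen


def health_matches(body: str, port: int) -> bool:
    """True when the body is the JSON object {"status": "ok", "port": <port>}."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return payload == {"status": "ok", "port": port}


def check_health(host: str = "127.0.0.1", port: int = 9500, timeout: float = 3.0) -> bool:
    """Fetch /health from the local server and compare it with the expected payload."""
    with urlopen(f"http://{host}:{port}/health", timeout=timeout) as response:
        return health_matches(response.read().decode("utf-8"), port)
```

Comparing the parsed object rather than the raw string keeps the check independent of key order and whitespace in the response.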

- [ ] **Step 3: Visual smoke check**

Open `http://127.0.0.1:9500`, switch to the Knowledge Base view, and confirm candidate cards show stable checkboxes and a single batch-promotion control.

@@ -0,0 +1,380 @@
# Image Rights Operator GUI Design

> This document defines the user-facing GUI/UX shape for the internal image rights review tool. It does not implement a frontend app yet; the current workspace has no frontend framework or design system.

## Goal

Give operators one coherent internal workbench where they can review every submitted image, inspect rights-risk evidence, run or retry enrichment, manage the criteria database, correct bad decisions, and control external providers without exposing automated analysis to applicants.

## Product Posture

This is an operational review console, not a marketing site. The interface should be dense, calm, audit-friendly, and built for repeated decisions under risk. The first screen should be useful immediately: a review queue with risk-ranked submissions and enough context to choose the next case.

## Visual Thesis

Use a quiet, utilitarian dashboard style: neutral surfaces, high contrast text, restrained accent colors for risk and action, compact spacing, stable table/list layouts, and persistent context panels. Avoid decorative hero sections, oversized cards, gradients, or illustrative marketing composition.

## Core Navigation

Use a persistent left navigation rail with these primary areas:

- Review Queue
- Case Review
- Evidence Search
- Knowledge Base
- Corrections
- Provider Controls
- Audit Log

Use a top command bar for global search, current provider mode, queue health, and operator identity. Provider status should be visible but not visually dominant.

## Primary Screens

### 1. Review Queue

Purpose: let operators decide what to review next.

The queue should be a dense table or split list with:

- Thumbnail
- Submission ID
- Risk score and band
- Top two risk reasons
- Provider state: internal, Naver, Google, LLM, failed, skipped, pending
- Applicant-visible status
- Operator decision status
- Age / submitted time
- Last analysis time

Required controls:

- Risk band filter: high, medium, low, failed, pending
- Source filter: Naver hit, Google hit, fingerprint match, face/person, LLM summary, failed provider
- Decision filter: unreviewed, held, rejected, approved, corrected
- Sort by risk, newest, oldest, analysis failure, provider failure
- Bulk selection only for non-decisive actions such as re-run analysis or assign reviewer; no bulk approve/reject in v1

UX rule: the queue should never require opening every item just to understand why it is high risk.
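
A rough sketch of the row shape and ordering this implies, assuming a numeric risk score and a string band. `QueueRow` and `rows_for_queue` are hypothetical names used only for illustration; the real queue may carry more columns and sort keys.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class QueueRow:
    submission_id: str
    risk_score: int
    band: str  # e.g. "high", "medium", "low", "failed", "pending"
    reasons: list[str] = field(default_factory=list)

    @property
    def top_reasons(self) -> list[str]:
        # The queue shows at most two reasons so rows stay compact.
        return self.reasons[:2]


def rows_for_queue(rows: list[QueueRow], bands: Optional[set[str]] = None) -> list[QueueRow]:
    """Apply the optional risk band filter, then sort highest risk first."""
    selected = [row for row in rows if bands is None or row.band in bands]
    return sorted(selected, key=lambda row: row.risk_score, reverse=True)
```

Carrying the top reasons on the row itself is what lets the table explain a high score without opening the case.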

### 2. Case Review

Purpose: let one operator make a final decision on one submitted image.

Use a three-pane layout on desktop:

- Left pane: image viewer
- Center pane: evidence and reasoning
- Right pane: decision and case controls

Left image pane:

- Original/internal review image preview
- Zoom, fit, actual size, rotate
- Side-by-side thumbnails for visually similar or matching search images
- Basic file facts: dimensions, submitted time, analysis version
- Clear indication when only an internal derivative is shown

Center evidence pane:

- Risk score, band, and top reasons at the top
- Evidence grouped by source:
  - Internal fingerprints
  - Prior rejection similarity
  - Face/person presence
  - Naver search results
  - Google Web Detection results
  - Internal LLM summary
  - Failures and skipped providers
- Each evidence row should show source, confidence or strength, query if relevant, URL/domain, retrieval time, and whether it contributed to the score
- LLM summary must show citations or source chips; source-less claims appear as unverified notes and do not appear as score reasons
- Failure states must be visible, not hidden in logs
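
The LLM citation rule in the list above can be sketched as a simple partition, assuming each summary claim carries the IDs of the evidence rows it cites. `SummaryClaim` and `split_claims` are hypothetical names, not existing types in the backend.

```python
from dataclasses import dataclass, field


@dataclass
class SummaryClaim:
    text: str
    evidence_ids: list[str] = field(default_factory=list)  # citations / source chips


def split_claims(
    claims: list[SummaryClaim],
) -> tuple[list[SummaryClaim], list[SummaryClaim]]:
    """Return (cited, unverified); only cited claims may be shown as score reasons."""
    cited = [claim for claim in claims if claim.evidence_ids]
    unverified = [claim for claim in claims if not claim.evidence_ids]
    return cited, unverified
```

The evidence pane would render the first list with source chips and the second as unverified notes, so a claim without a source can never reach the score reasons.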

Right decision pane:

- Current recommendation: low, medium, high, needs review, failed/partial
- Manual action buttons: approve, hold, reject
- Required memo on reject and correction
- Optional memo on approve/hold
- Rejection outcome preview: whether a rejected-image entry or candidate will be added to the knowledge base
- Quick actions:
  - Add entity to knowledge base
  - Mark evidence irrelevant
  - Re-run enrichment
  - Disable stale automatic entry
  - Open correction flow

UX rule: the final decision controls must be visually and functionally separate from the automated recommendation. The UI must make it clear that the system suggests and the operator decides.

### 3. Evidence Search

Purpose: let operators inspect and reproduce why search evidence appeared.

This screen should show:

- Search query history per submission
- Naver result list with title, thumbnail, source page, image URL, rank, timestamp
- Google Web Detection entities, matching images, pages, and labels
- LLM-generated query candidates with execution status
- Provider failures, quota skips, disabled-provider states

Required controls:

- Manual text query run for Naver, subject to policy and quota
- Re-run selected query
- Mark result as relevant, irrelevant, duplicate, or unsafe
- Create knowledge-base candidate from a result

UX rule: Naver is text-query search only. The UI must not offer an image upload reverse-search interaction for Naver.

### 4. Knowledge Base

Purpose: let operators build and maintain the criteria database that improves future filtering.

Main objects:

- Celebrity / public figure
- Group
- Work
- Character
- Webtoon
- Game
- Rejected image reference
- Other policy-relevant entity

Each entity detail should show:

- Name
- Aliases
- Related keywords
- Type
- Policy memo
- Exception conditions
- Sample image fingerprints or rejected-image references
- Provenance: manual, automatic rejection, search evidence
- Active/inactive state
- Created from decision or operator entry

Required controls:

- Create manual entity
- Add alias
- Add related keyword
- Add sample fingerprint from reviewed case
- Deactivate entry
- Reactivate entry only with memo
- View affected future matches or historical matches

UX rule: automatic entries and manual entries must look different. Operators should not mistake a rejection-derived entry for a verified policy rule.

### 5. Corrections

Purpose: prevent false positives from contaminating future review.

This screen should focus on decision lineage:

- Corrected decisions
- Automatic knowledge entries derived from each decision
- Current active/inactive state
- Reason for correction
- Operator who corrected
- Timestamp

Required controls:

- Correct prior rejection
- Deactivate all derived automatic entries
- Keep selected derived entries active with memo
- Add correction note
- Show before/after risk impact for future similar submissions when available

UX rule: correction should be a first-class workflow, not a hidden admin cleanup task.

### 6. Provider Controls

Purpose: let admins safely operate external and assisted analysis modes.

Provider cards or rows:

- Internal analysis
- Naver search
- Google Web Detection
- Internal LLM

Each provider should show:

- Enabled/disabled state
- Compliance approval state
- Daily quota and usage
- Last successful call
- Last failure
- Data boundary summary
- Emergency disable control

Required controls:

- Disable provider immediately
- Set daily limit
- View recent failures
- Retry failed enrichments
- Export provider usage audit

UX rule: provider controls are admin-only and should not be mixed into normal operator decision controls.

### 7. Audit Log

Purpose: make decisions and evidence changes reviewable.

Audit events:

- Analysis run created
- Provider called/skipped/failed
- LLM summary generated
- Operator decision created
- Rejection-derived entry created
- Knowledge entry manually created
- Knowledge entry deactivated/reactivated
- Correction applied
- Provider setting changed

Audit rows should include actor, timestamp, object, event type, before/after where relevant, and linked case.
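
One possible shape for such a row, sketched as an immutable record. `AuditRow`, its field names, and the `provider_setting_changed` helper are assumptions for illustration; this document lists an audit event store as still missing, so no such type exists yet.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass(frozen=True)
class AuditRow:
    actor: str
    timestamp: datetime
    object_ref: str          # what was changed, e.g. "provider:naver"
    event_type: str          # one of the audit events listed above
    before: Optional[dict[str, Any]] = None  # only where relevant
    after: Optional[dict[str, Any]] = None
    case_id: Optional[str] = None            # linked case, if any


def provider_setting_changed(
    actor: str, provider: str, before: dict[str, Any], after: dict[str, Any]
) -> AuditRow:
    """Build the row for a 'Provider setting changed' event with a UTC timestamp."""
    return AuditRow(
        actor=actor,
        timestamp=datetime.now(timezone.utc),
        object_ref=f"provider:{provider}",
        event_type="provider_setting_changed",
        before=before,
        after=after,
    )
```

Freezing the dataclass mirrors the intent of an audit trail: rows are appended, never edited in place.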

## End-To-End Operator Flow

1. Operator opens Review Queue.
2. Operator filters to high risk or failed analysis.
3. Operator opens a case.
4. Case Review shows image, risk score, top reasons, grouped evidence, and provider state.
5. Operator opens relevant Naver/Google result links or thumbnails.
6. Operator reads LLM summary only as a source-linked digest.
7. Operator approves, holds, or rejects manually.
8. If rejecting, operator confirms memo and knowledge-base accumulation behavior.
9. If later wrong, operator uses Corrections to deactivate derived entries.

## Information Architecture Principles

- Queue optimizes prioritization.
- Case Review optimizes judgment.
- Evidence Search optimizes traceability.
- Knowledge Base optimizes future detection quality.
- Corrections optimize decontamination.
- Provider Controls optimize operational safety.
- Audit Log optimizes accountability.

## States And Empty States

Every major screen must handle:

- No analysis yet
- Analysis pending
- Internal-only mode
- External provider disabled
- Provider quota reached
- Provider failed
- Search returned no result
- LLM unavailable
- LLM summary unverified
- Evidence conflict
- Existing corrected decision
- Knowledge entry inactive

Empty states should be operational, not explanatory marketing copy. Example intent: "No Naver results for this query" with the query and timestamp, not a generic blank panel.

## Interaction Design

Recommended controls:

- Icon buttons for zoom, rotate, open link, copy URL, retry, disable, history
- Segmented controls for risk filters and evidence source filters
- Toggle switches for provider enablement
- Checkbox selection for bulk queue operations
- Menus for secondary actions such as mark irrelevant or create candidate
- Textarea with required-state validation for rejection and correction memos
- Tabs inside evidence pane only when vertical grouping becomes too long

Do not place cards inside cards. Use panels for major layout regions and compact rows for repeated evidence items.

## Accessibility And Safety

- All actions must be keyboard reachable.
- Focus states must be visible.
- Risk cannot be indicated by color alone; include labels such as high, medium, low, failed.
- External links open with clear source/domain display.
- Destructive or contamination-affecting actions require confirmation and memo.
- Applicant-facing surfaces must not be able to render this GUI data.

## Responsive Behavior

Primary target is desktop because image comparison and evidence review need space.

On tablet:

- Queue remains usable.
- Case Review becomes two-pane: image/evidence tabs plus decision panel.

On mobile:

- Allow triage and status checks.
- Avoid final reject/correction workflows unless the actual target product requires mobile operations.

## Data Needed From Current Backend

The existing backend presenter and evidence model already provide most of the needed data:

- Submission ID
- Image reference
- Score and band
- Top reasons
- Evidence grouped by source
- Provider status
- LLM summaries
- Manual actions
- Knowledge-base provenance
- Correction/deactivation hooks

Missing for a full GUI integration:

- Real user/auth roles
- Persistent DB records
- Real submission list source
- Original image storage and signed URL policy
- Actual frontend framework
- Admin routing
- Audit event store

## Design Acceptance Criteria

- An operator can identify the next high-risk case from the queue without opening every case.
- An operator can make a final decision from one case screen without jumping between unrelated pages.
- Naver, Google, internal, and LLM evidence are visually distinct.
- LLM text is never presented as authoritative unless linked to source evidence.
- Rejection-derived knowledge-base changes are visible before confirmation.
- A wrong rejection can be corrected and its derived automatic entries deactivated.
- Provider failures and disabled states are visible to operators and admins.
- Applicant-facing views cannot access or display automated evidence.

## Recommended First GUI MVP

Build these first:

1. Review Queue
2. Case Review
3. Knowledge-base entry creation from a case
4. Correction flow for rejected decisions
5. Provider Controls read/write for Naver, Google, and LLM enablement

Defer these:

- Advanced analytics
- Bulk approve/reject
- Mobile full decision workflow
- Dedicated brand/logo detector UI
- Applicant-facing explanation or appeal flow

## Implementation Handoff Notes

The current repo has no frontend application. The next plan should either:

- add a small internal web admin app around the existing Python module, or
- integrate this GUI into the actual production app once its framework and routes are available.

The second path is preferable if a production admin app already exists elsewhere, because auth, image storage, audit logging, and submission state should follow the real product's conventions.

@@ -0,0 +1,69 @@
# Insufficient Evidence Query Suggestions

## Problem

Operators can reach a case where external search has run, but the evidence is still too thin to make a confident approval, hold, or rejection decision. Today the console shows raw evidence and query history, but it does not suggest a concrete next action when the evidence is insufficient.

## Goal

Add a lightweight workbench guide that detects insufficient evidence and generates safe follow-up query suggestions. The system must not run external searches automatically. It should only prepare likely useful queries and let the operator decide whether to execute one.

## User Experience

When the selected case has weak or sparse evidence, the evidence workbench shows a "근거 보강 추천" ("Evidence reinforcement suggestions") panel above the evidence groups. The panel explains that the current evidence is insufficient and shows a few suggested Naver query buttons.

Clicking a suggestion:

- switches to the workbench query tab;
- fills the existing manual query input;
- selects the normalized operator search provider;
- leaves execution to the operator through the existing submit button.

If evidence is already sufficient, the panel stays hidden.

## Evidence Sufficiency Rule

The first version uses a conservative client-side heuristic:

- direct image/page matches are strong evidence;
- Naver or Google searchable evidence is supporting evidence;
- a case is insufficient when it has no strong direct match or has fewer than two searchable evidence items;
- query suggestions are only shown when there is at least some indication that search has run, such as query history, provider state, or searchable evidence.

This avoids blocking decisions and avoids adding backend state.
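
The heuristic above can be sketched as follows. The real check is meant to live in the client-side GUI script; Python is used here only to pin down the logic. The sketch reads the "or" in the third rule literally, and `CaseEvidence` with its field names is a hypothetical stand-in for whatever the GUI state actually holds.

```python
from dataclasses import dataclass


@dataclass
class CaseEvidence:
    direct_matches: int                # strong: direct image/page matches
    searchable_items: int              # supporting: Naver or Google searchable evidence
    query_history: int = 0             # number of recorded queries for the case
    provider_state_seen: bool = False  # any provider state recorded for the case


def is_insufficient(case: CaseEvidence) -> bool:
    """No strong direct match, or fewer than two searchable evidence items."""
    return case.direct_matches == 0 or case.searchable_items < 2


def search_has_run(case: CaseEvidence) -> bool:
    """Any sign of a prior search: query history, provider state, or search evidence."""
    return case.query_history > 0 or case.provider_state_seen or case.searchable_items > 0


def should_show_suggestions(case: CaseEvidence) -> bool:
    return is_insufficient(case) and search_has_run(case)
```

Under this reading, a case with three search results but no direct match still shows the panel, while a case that has never been searched shows nothing.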

## Query Generation

Suggestions are generated from the selected submission title and deduplicated against existing query history. The initial templates are:

- title
- title + " 저작권" ("copyright")
- title + " 공식" ("official")
- title + " 이미지 출처" ("image source")

The list is capped at four suggestions.
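
A minimal sketch of the generation step, again in Python although the feature itself is client-side. It assumes queries are compared after lower-casing and collapsing whitespace; that normalization, and the names `normalize` and `suggest_queries`, are assumptions rather than the shipped implementation.

```python
# Templates from the list above: the bare title plus three Korean suffixes.
SUFFIXES = ["", " 저작권", " 공식", " 이미지 출처"]
MAX_SUGGESTIONS = 4


def normalize(query: str) -> str:
    """Lower-case and collapse whitespace so near-identical queries compare equal."""
    return " ".join(query.lower().split())


def suggest_queries(title: str, history: list[str]) -> list[str]:
    """Build template queries from the title, skipping any already in the history."""
    title = " ".join(title.split())
    if not title:
        return []
    seen = {normalize(query) for query in history}
    suggestions: list[str] = []
    for suffix in SUFFIXES:
        candidate = title + suffix
        key = normalize(candidate)
        if key in seen:
            continue
        seen.add(key)
        suggestions.append(candidate)
    return suggestions[:MAX_SUGGESTIONS]
```

For example, a title of `Pikachu` with `pikachu 공식` already in the query history would yield the bare title, the copyright variant, and the image-source variant, skipping the duplicate.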

## Scope

In scope:

- static operator GUI markup, script, and styles;
- client-side insufficient evidence assessment;
- query suggestion rendering;
- click behavior that fills the manual query form;
- static tests for the UI contract.

Out of scope:

- automatic external search execution;
- backend API changes;
- hard decision blocking;
- machine-learned query generation.

## Verification

Run the operator static suite:

```powershell
python -m pytest tests\operator_gui\test_static_workbench.py
```

4
package.json
Normal file
@@ -0,0 +1,4 @@
{
  "private": true,
  "type": "module"
}

15
run_copyrighter_server.py
Normal file
@@ -0,0 +1,15 @@
from pathlib import Path
import sys


ROOT = Path(__file__).resolve().parent
SRC = ROOT / "src"
if str(SRC) not in sys.path:
    sys.path.insert(0, str(SRC))

from rights_filter.server.__main__ import main


if __name__ == "__main__":
    main()

1
src/rights_filter/__init__.py
Normal file
@@ -0,0 +1 @@
"""Rights filtering subsystem for image commercialization review."""

1
src/rights_filter/admin/__init__.py
Normal file
@@ -0,0 +1 @@
"""Operator review helpers."""

14
src/rights_filter/admin/correction_handlers.py
Normal file
@@ -0,0 +1,14 @@
from __future__ import annotations

from rights_filter.domain.records import (
    InMemoryRightsFilterRepository,
    KnowledgeBaseEntry,
)


def correct_rejected_decision(
    repository: InMemoryRightsFilterRepository,
    decision_id: str,
    reason: str,
) -> list[KnowledgeBaseEntry]:
    return repository.deactivate_entries_for_source_decision(decision_id, reason)

4
src/rights_filter/admin/decision_feedback.py
Normal file
@@ -0,0 +1,4 @@
from rights_filter.admin.review_handlers import record_operator_decision
from rights_filter.admin.correction_handlers import correct_rejected_decision

__all__ = ["correct_rejected_decision", "record_operator_decision"]

96
src/rights_filter/admin/detailed_review_presenter.py
Normal file
@@ -0,0 +1,96 @@
from __future__ import annotations

from typing import Any

from rights_filter.domain.records import (
    Evidence,
    EvidenceSource,
    InMemoryRightsFilterRepository,
    ReviewStatus,
)


def detailed_review_for(
    repository: InMemoryRightsFilterRepository,
    submission_id: str,
    image_reference: str | None = None,
) -> dict[str, Any]:
    runs = repository.analysis_runs_for_submission(submission_id)
    if not runs:
        return {
            "submission_id": submission_id,
            "image_reference": image_reference,
            "analysis_available": False,
            "evidence_groups": {},
            "manual_actions": _manual_actions(),
        }

    run = runs[-1]
    score = run.score
    return {
        "submission_id": submission_id,
        "image_reference": image_reference,
        "analysis_available": True,
        "score": score.score if score else None,
        "band": score.band if score else None,
        "top_reasons": score.reasons if score else [],
        "evidence_groups": _group_evidence(run.evidence),
        "provider_status": _provider_status(run.evidence),
        "manual_actions": _manual_actions(),
    }


def _manual_actions() -> list[str]:
    return [
        ReviewStatus.APPROVED.value,
        ReviewStatus.HELD.value,
        ReviewStatus.REJECTED.value,
    ]


def _group_evidence(evidence: list[Evidence]) -> dict[str, list[dict[str, Any]]]:
    groups: dict[str, list[dict[str, Any]]] = {
        "internal": [],
        "naver": [],
        "google": [],
        "llm": [],
        "failures": [],
    }
    for item in evidence:
        groups[_group_name(item)].append(_present_evidence(item))
    return {key: value for key, value in groups.items() if value}


def _group_name(item: Evidence) -> str:
    if item.source in {EvidenceSource.FINGERPRINT, EvidenceSource.FACE_PERSON}:
        return "internal"
    if item.source == EvidenceSource.NAVER_SEARCH:
        return "naver"
    if item.source == EvidenceSource.WEB_DETECTION:
        return "google"
    if item.source == EvidenceSource.LLM_SUMMARY:
        return "llm"
    return "failures"


def _present_evidence(item: Evidence) -> dict[str, Any]:
    return {
        "source": item.source,
        "reason": item.reason,
        "confidence": item.confidence,
        "data": item.data,
    }


def _provider_status(evidence: list[Evidence]) -> dict[str, list[str]]:
    status: dict[str, list[str]] = {}
    for item in evidence:
        if item.source in {
            EvidenceSource.EXTERNAL_SKIPPED,
            EvidenceSource.SEARCH_SKIPPED,
            EvidenceSource.FAILURE,
            EvidenceSource.ENRICHMENT_FAILURE,
        }:
            provider = str(item.data.get("provider", item.source.value))
            status.setdefault(provider, []).append(item.reason)
    return status

36
src/rights_filter/admin/knowledge_base_handlers.py
Normal file
@@ -0,0 +1,36 @@
from __future__ import annotations

from typing import Any

from rights_filter.domain.knowledge_base import create_manual_knowledge_entry
from rights_filter.domain.records import (
    InMemoryRightsFilterRepository,
    KnowledgeBaseEntry,
    KnowledgeEntryType,
)
from rights_filter.governance.policies import assert_no_biometric_template


def register_manual_entry(
    repository: InMemoryRightsFilterRepository,
    entry_type: KnowledgeEntryType,
    name: str,
    aliases: list[str] | None = None,
    related_keywords: list[str] | None = None,
    policy_memo: str = "",
    exception_conditions: str = "",
    sample_fingerprints: list[str] | None = None,
    **payload: Any,
) -> KnowledgeBaseEntry:
    assert_no_biometric_template(payload)
    entry = create_manual_knowledge_entry(
        entry_type=entry_type,
        name=name,
        aliases=aliases,
        related_keywords=related_keywords,
        policy_memo=policy_memo,
        exception_conditions=exception_conditions,
        sample_fingerprints=sample_fingerprints,
    )
    repository.save_knowledge_entry(entry)
    return entry

63
src/rights_filter/admin/review_handlers.py
Normal file
@@ -0,0 +1,63 @@
from __future__ import annotations

from typing import Any

from rights_filter.domain.records import (
    InMemoryRightsFilterRepository,
    OperatorDecision,
    ReviewStatus,
)


def operator_summary_for(
    repository: InMemoryRightsFilterRepository, submission_id: str
) -> dict[str, Any]:
    runs = repository.analysis_runs_for_submission(submission_id)
    if not runs:
        return {"submission_id": submission_id, "analysis": None}
    run = runs[-1]
    return {
        "submission_id": submission_id,
        "score": run.score.score if run.score else None,
        "band": run.score.band if run.score else None,
        "reasons": run.score.reasons if run.score else [],
        "evidence": [
            {
                "source": item.source,
                "reason": item.reason,
                "confidence": item.confidence,
                "data": item.data,
            }
            for item in run.evidence
        ],
    }


def applicant_summary_for(
    repository: InMemoryRightsFilterRepository, submission_id: str
) -> dict[str, Any]:
    return {"submission_id": submission_id}


def record_operator_decision(
    repository: InMemoryRightsFilterRepository,
    submission_id: str,
    status: ReviewStatus,
    memo: str = "",
    fingerprints: list[str] | None = None,
) -> OperatorDecision:
    decision = OperatorDecision.create(
        submission_id=submission_id,
        status=status,
        memo=memo,
    )
    repository.save_operator_decision(decision)

    if status == ReviewStatus.REJECTED:
        repository.create_rejected_image_entry(
            decision_id=decision.id,
            submission_id=submission_id,
            fingerprints=fingerprints or [],
        )

    return decision

4
src/rights_filter/admin/review_presenters.py
Normal file
@@ -0,0 +1,4 @@
from rights_filter.admin.detailed_review_presenter import detailed_review_for
from rights_filter.admin.review_handlers import applicant_summary_for, operator_summary_for

__all__ = ["applicant_summary_for", "detailed_review_for", "operator_summary_for"]

1
src/rights_filter/analysis/__init__.py
Normal file
@@ -0,0 +1 @@
"""Analysis pipeline components."""

11
src/rights_filter/analysis/derivatives.py
Normal file
@@ -0,0 +1,11 @@
from rights_filter.analysis.preprocessing import (
    ImagePayload,
    PreprocessingError,
    build_external_derivative,
)

__all__ = [
    "ImagePayload",
    "PreprocessingError",
    "build_external_derivative",
]

120
src/rights_filter/analysis/evidence_enrichment.py
Normal file
@@ -0,0 +1,120 @@
from __future__ import annotations

from dataclasses import dataclass, field

from rights_filter.analysis.llm_assistance import InternalLlmAssistant
from rights_filter.analysis.risk_scoring import RiskScorer
from rights_filter.analysis.search_query_generation import SearchQueryGenerator
from rights_filter.analysis.search_result_promoter import SearchResultPromoter
from rights_filter.domain.records import (
    Evidence,
    EvidenceSource,
    InMemoryRightsFilterRepository,
)
from rights_filter.integrations.naver_search import NaverSearchAdapter
from rights_filter.integrations.search_policy import SearchApiPolicy


@dataclass
class EnrichmentSummary:
    generated_queries: int = 0
    executed_searches: int = 0
    skipped_searches: int = 0
    provider_failures: int = 0
    summary_failures: int = 0
    failed: int = 0
    failure_reasons: list[str] = field(default_factory=list)


class EvidenceEnricher:
    def __init__(
        self,
        query_generator: SearchQueryGenerator,
        naver_adapter: NaverSearchAdapter,
        search_policy: SearchApiPolicy,
        promoter: SearchResultPromoter,
        llm_assistant: InternalLlmAssistant,
        scorer: RiskScorer,
    ) -> None:
        self.query_generator = query_generator
        self.naver_adapter = naver_adapter
        self.search_policy = search_policy
        self.promoter = promoter
        self.llm_assistant = llm_assistant
        self.scorer = scorer

    def enrich_latest(
        self, repository: InMemoryRightsFilterRepository, submission_id: str
    ) -> EnrichmentSummary:
        runs = repository.analysis_runs_for_submission(submission_id)
        if not runs:
            return EnrichmentSummary(
                failed=1,
                failure_reasons=[f"missing analysis run for {submission_id}"],
            )

        run = runs[-1]
        summary = EnrichmentSummary()
        queries = self.query_generator.generate(
            run.evidence, repository.active_knowledge_entries()
        )
        summary.generated_queries = len(queries)

        new_evidence: list[Evidence] = []
        existing_signatures = _existing_query_signatures(run.evidence)
        for query in queries:
            signature = "naver:" + " ".join(query.lower().split())
            if signature in existing_signatures:
                continue
            found = self.naver_adapter.search(submission_id, query, self.search_policy)
            promoted = self.promoter.promote(found)
            new_evidence.extend(promoted)
            # Each naver_adapter.search() result is homogeneous in source, so
            # classify the single search CALL by its outcome instead of summing
            # per result item (which inflated executed_searches by item count).
            outcome_sources = {item.source for item in promoted}
            if EvidenceSource.SEARCH_SKIPPED in outcome_sources:
                summary.skipped_searches += 1
            elif EvidenceSource.ENRICHMENT_FAILURE in outcome_sources:
                summary.provider_failures += 1
            elif EvidenceSource.NAVER_SEARCH in outcome_sources:
                summary.executed_searches += 1

        source_evidence = [
            item
            for item in [*run.evidence, *new_evidence]
            if item.source
            in {
                EvidenceSource.NAVER_SEARCH,
                EvidenceSource.WEB_DETECTION,
                EvidenceSource.FINGERPRINT,
            }
        ]
        llm_summary = self.llm_assistant.summarize(submission_id, source_evidence)
        if llm_summary.source == EvidenceSource.ENRICHMENT_FAILURE:
            summary.summary_failures += 1
        if not _has_existing_llm_summary(run.evidence):
            new_evidence.append(llm_summary)

        for item in new_evidence:
            run.add_evidence(item)
        run.score = self.scorer.score(run.evidence)
        return summary


def _existing_query_signatures(evidence: list[Evidence]) -> set[str]:
    return {
        str(item.data["query_signature"])
        for item in evidence
        if item.source
        in {
            EvidenceSource.NAVER_SEARCH,
            EvidenceSource.SEARCH_SKIPPED,
            EvidenceSource.ENRICHMENT_FAILURE,
        }
        and item.data.get("query_signature")
    }


def _has_existing_llm_summary(evidence: list[Evidence]) -> bool:
    return any(item.source == EvidenceSource.LLM_SUMMARY for item in evidence)
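Review note on the query de-duplication in `enrich_latest`: a query is skipped when its normalized signature already appears in earlier evidence. The sketch below isolates that normalization so it can be checked by hand. The helper names `query_signature` and `pending_queries` are hypothetical and are not part of this commit; only the `"naver:"` prefix and the lowercase-plus-whitespace-collapse rule are taken from the diff.

```python
def query_signature(provider: str, query: str) -> str:
    """Lowercase the query, collapse whitespace runs, and prefix the provider."""
    return f"{provider}:" + " ".join(query.lower().split())


def pending_queries(queries: list[str], seen: set[str]) -> list[str]:
    """Keep only the queries whose signature is not already recorded in `seen`."""
    return [q for q in queries if query_signature("naver", q) not in seen]


if __name__ == "__main__":
    already_searched = {"naver:pororo official"}
    print(pending_queries(["Pororo  official", "Tayo bus"], already_searched))
```

Because the signature ignores case and spacing, a retried enrichment run should not spend quota on a query that differs from an earlier one only in formatting.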
9
src/rights_filter/analysis/evidence_normalizer.py
Normal file
@@ -0,0 +1,9 @@
from rights_filter.domain.records import Evidence


class EvidenceNormalizer:
    def normalize(self, evidence: list[Evidence]) -> list[Evidence]:
        return list(evidence)


__all__ = ["EvidenceNormalizer"]
106
src/rights_filter/analysis/face_person_detection.py
Normal file
@@ -0,0 +1,106 @@
from __future__ import annotations

from io import BytesIO
from dataclasses import dataclass

from rights_filter.analysis.preprocessing import ImagePayload

FaceBox = tuple[int, int, int, int]


@dataclass(frozen=True)
class FacePersonSignal:
    face_count: int
    person_count: int
    face_boxes: tuple[FaceBox, ...] = ()

    @property
    def present(self) -> bool:
        return self.face_count > 0 or self.person_count > 0


class HeuristicFacePersonDetector:
    """Presence-only detector used by the standalone implementation.

    Target applications should replace this with their approved local detector or
    internal service. It intentionally emits no identity or embedding data.
    """

    def detect(self, image: ImagePayload) -> FacePersonSignal:
        return self._detect_with_opencv(image.content)

    def _detect_with_opencv(self, content: bytes) -> FacePersonSignal:
        try:
            import cv2
            import numpy as np
        except Exception:
            return FacePersonSignal(face_count=0, person_count=0)

        encoded = np.frombuffer(content, dtype=np.uint8)
        decoded = cv2.imdecode(encoded, cv2.IMREAD_COLOR)
        if decoded is None:
            decoded = _decode_with_pillow(content, cv2, np)
        if decoded is None:
            return FacePersonSignal(face_count=0, person_count=0)

        try:
            gray = cv2.cvtColor(decoded, cv2.COLOR_BGR2GRAY)
        except Exception:
            return FacePersonSignal(face_count=0, person_count=0)

        face_boxes: list[FaceBox] = []
        for cascade_name in (
            "haarcascade_frontalface_default.xml",
            "haarcascade_profileface.xml",
        ):
            cascade = cv2.CascadeClassifier(f"{cv2.data.haarcascades}{cascade_name}")
            if hasattr(cascade, "empty") and cascade.empty():
                continue
            rects = cascade.detectMultiScale(
                gray,
                scaleFactor=1.1,
                minNeighbors=4,
                minSize=(30, 30),
            )
            for rect in rects:
                box = tuple(int(value) for value in rect)
                if len(box) == 4 and not any(
                    _boxes_overlap(box, existing) for existing in face_boxes
                ):
                    face_boxes.append(box)

        face_count = len(face_boxes)
        return FacePersonSignal(
            face_count=face_count,
            person_count=face_count,
            face_boxes=tuple(face_boxes),
        )

def _boxes_overlap(a: FaceBox, b: FaceBox, threshold: float = 0.5) -> bool:
    # Boxes are (x, y, w, h). Treat them as the same physical face when their
    # intersection-over-union meets the threshold, so a face detected by BOTH
    # the frontal and profile cascades is counted once rather than twice.
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    left = max(ax, bx)
    top = max(ay, by)
    right = min(ax + aw, bx + bw)
    bottom = min(ay + ah, by + bh)
    intersection = max(0, right - left) * max(0, bottom - top)
    if intersection == 0:
        return False
    union = (aw * ah) + (bw * bh) - intersection
    if union <= 0:
        return False
    return intersection / union >= threshold


def _decode_with_pillow(content: bytes, cv2: object, np: object) -> object | None:
    try:
        from PIL import Image

        with Image.open(BytesIO(content)) as image:
            rgb = image.convert("RGB")
            return cv2.cvtColor(np.array(rgb), cv2.COLOR_RGB2BGR)
    except Exception:
        return None
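Review note on `_boxes_overlap`: the de-duplication rests on intersection-over-union (IoU) of two `(x, y, w, h)` boxes. A worked sketch follows; the function name `iou` is hypothetical and returns the ratio itself rather than the thresholded boolean used in the diff.

```python
Box = tuple[int, int, int, int]  # (x, y, w, h), same layout as FaceBox in the diff


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes; 0.0 when disjoint."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    intersection = inter_w * inter_h
    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    # Two 10x10 boxes shifted by 5 px: intersection 50, union 150, IoU = 1/3.
    print(iou((0, 0, 10, 10), (5, 0, 10, 10)))
```

With the default threshold of 0.5 in the diff, the shifted pair above (IoU of one third) would be kept as two separate faces, while a pair shifted by only 2 px (intersection 80, union 120, IoU of two thirds) would be merged into one.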
61
src/rights_filter/analysis/fingerprints.py
Normal file
@@ -0,0 +1,61 @@
from __future__ import annotations

import hashlib
from io import BytesIO
from dataclasses import dataclass


@dataclass(frozen=True)
class Fingerprints:
    exact: str
    perceptual: str


class FingerprintService:
    def fingerprints_for(self, content: bytes) -> Fingerprints:
        exact_digest = hashlib.sha256(content).hexdigest()
        exact = "exact:" + exact_digest
        perceptual = "phash:" + _perceptual_hash(content, exact_digest)
        return Fingerprints(exact=exact, perceptual=perceptual)

    def similarity(self, left: str, right: str) -> float:
        if left == right:
            return 1.0
        left_value = left.split(":", 1)[-1]
        right_value = right.split(":", 1)[-1]
        if left_value.startswith("unavailable:") or right_value.startswith("unavailable:"):
            return 0.0
        if _looks_like_64bit_hash(left_value) and _looks_like_64bit_hash(right_value):
            distance = (int(left_value, 16) ^ int(right_value, 16)).bit_count()
            return 1 - (distance / 64)
        return 0.0


def _perceptual_hash(content: bytes, exact_digest: str) -> str:
    try:
        from PIL import Image

        with Image.open(BytesIO(content)) as image:
            thumbnail = image.convert("L").resize((8, 8))
            if hasattr(thumbnail, "get_flattened_data"):
                pixels = list(thumbnail.get_flattened_data())
            else:
                pixels = list(thumbnail.getdata())
    except Exception:
        return f"unavailable:{exact_digest}"

    average = sum(pixels) / len(pixels)
    bits = 0
    for pixel in pixels:
        bits = (bits << 1) | int(pixel >= average)
    return f"{bits:016x}"


def _looks_like_64bit_hash(value: str) -> bool:
    if len(value) != 16:
        return False
    try:
        int(value, 16)
    except ValueError:
        return False
    return True
84
src/rights_filter/analysis/internal_analyzer.py
Normal file
@@ -0,0 +1,84 @@
from __future__ import annotations

from rights_filter.analysis.face_person_detection import HeuristicFacePersonDetector
from rights_filter.analysis.fingerprints import FingerprintService
from rights_filter.analysis.preprocessing import ImagePayload
from rights_filter.domain.records import (
    Evidence,
    EvidenceSource,
    InMemoryRightsFilterRepository,
)


class InternalAnalyzer:
    def __init__(
        self,
        repository: InMemoryRightsFilterRepository,
        fingerprint_service: FingerprintService,
        face_person_detector: HeuristicFacePersonDetector,
        similarity_threshold: float = 0.9,
    ) -> None:
        self.repository = repository
        self.fingerprint_service = fingerprint_service
        self.face_person_detector = face_person_detector
        self.similarity_threshold = similarity_threshold

    def analyze(self, submission_id: str, image: ImagePayload) -> list[Evidence]:
        evidence: list[Evidence] = []
        fingerprints = self.fingerprint_service.fingerprints_for(image.content)
        evidence.append(
            Evidence(
                source=EvidenceSource.FINGERPRINT,
                reason="Image fingerprints generated",
                confidence=1.0,
                data={
                    "submission_id": submission_id,
                    "exact": fingerprints.exact,
                    "perceptual": fingerprints.perceptual,
                },
            )
        )

        for entry in self.repository.active_knowledge_entries():
            for sample in entry.sample_fingerprints:
                similarity = self.fingerprint_service.similarity(
                    fingerprints.perceptual, sample
                )
                if similarity >= self.similarity_threshold:
                    entry_status = entry.entry_status or "confirmed"
                    reason = (
                        f"주의 후보 이미지 유사도 {similarity:.2f}"
                        if entry_status == "watchlist"
                        else f"Knowledge base image similarity {similarity:.2f}"
                    )
                    evidence.append(
                        Evidence(
                            source=EvidenceSource.FINGERPRINT,
                            reason=reason,
                            confidence=similarity,
                            data={
                                "knowledge_entry_id": entry.id,
                                "knowledge_name": entry.name,
                                "knowledge_entry_status": entry_status,
                                "source_submission_id": entry.source_submission_id,
                                "similarity": similarity,
                                "provenance": entry.provenance,
                            },
                        )
                    )

        signal = self.face_person_detector.detect(image)
        if signal.present:
            evidence.append(
                Evidence(
                    source=EvidenceSource.FACE_PERSON,
                    reason="Face/person detected",
                    confidence=0.8,
                    data={
                        "face_count": signal.face_count,
                        "person_count": signal.person_count,
                    },
                )
            )

        return evidence
149
src/rights_filter/analysis/llm_assistance.py
Normal file
@@ -0,0 +1,149 @@
from __future__ import annotations

from dataclasses import dataclass
from typing import Any

from rights_filter.domain.records import Evidence, EvidenceSource
from rights_filter.integrations.http_json import UrllibJsonTransport


@dataclass
class FakeInternalLlmClient:
    summary: str = ""
    error: Exception | None = None

    def summarize_evidence(self, evidence: list[Evidence]) -> str:
        if self.error:
            raise self.error
        return self.summary or "Evidence summary unavailable."


class OllamaGenerateLlmClient:
    def __init__(
        self,
        base_url: str = "http://127.0.0.1:11434",
        model: str = "qwen2.5:0.5b-instruct",
        transport: Any | None = None,
        timeout: int = 30,
        connect_timeout: float = 2,
    ) -> None:
        self.base_url = base_url.rstrip("/")
        self.model = model
        # Fast-fail the connect (e.g. a local LLM that isn't running) so seed /
        # reload requests are not stalled for the full read timeout, while real
        # generation still gets the full `timeout`.
        self.transport = transport or UrllibJsonTransport(connect_timeout=connect_timeout)
        self.timeout = timeout

    def summarize_evidence(self, evidence: list[Evidence]) -> str:
        payload = {
            "model": self.model,
            "system": (
                "Summarize only the provided source evidence for an internal image "
                "rights review operator. Do not make a final decision. Do not add "
                "claims that are not grounded in source evidence."
            ),
            "prompt": _evidence_prompt(evidence),
            "stream": False,
            "options": {"temperature": 0.1},
        }
        response = self.transport.request_json(
            "POST",
            self._generate_url(),
            payload=payload,
            timeout=self.timeout,
        )
        return _response_text(response)

    def _generate_url(self) -> str:
        if self.base_url.endswith("/api"):
            return f"{self.base_url}/generate"
        return f"{self.base_url}/api/generate"


class InternalLlmAssistant:
    def __init__(self, client: Any) -> None:
        self.client = client

    def summarize(self, submission_id: str, evidence: list[Evidence]) -> Evidence:
        try:
            summary = self.client.summarize_evidence(evidence)
        except Exception as exc:
            return Evidence(
                source=EvidenceSource.ENRICHMENT_FAILURE,
                reason=f"LLM assistance failed: {exc}",
                confidence=1.0,
                data={"submission_id": submission_id},
            )

        source_urls = _source_urls(evidence)
        source_evidence_ids = _source_evidence_ids(evidence)
        if not source_urls and not source_evidence_ids:
            return Evidence(
                source=EvidenceSource.LLM_SUMMARY,
                reason="LLM summary has no source evidence",
                confidence=0.0,
                data={
                    "submission_id": submission_id,
                    "summary": summary,
                    "source_urls": [],
                    "source_evidence_ids": [],
                    "verified": False,
                },
            )

        return Evidence(
            source=EvidenceSource.LLM_SUMMARY,
            reason="Assistant summarized source-linked evidence",
            confidence=0.0,
            data={
                "submission_id": submission_id,
                "summary": summary,
                "source_urls": source_urls,
                "source_evidence_ids": source_evidence_ids,
                "verified": True,
            },
        )


def _source_urls(evidence: list[Evidence]) -> list[str]:
    urls: list[str] = []
    for item in evidence:
        for key in ("result_url", "url", "image_url"):
            value = item.data.get(key)
            if value and value not in urls:
                urls.append(str(value))
    return urls


def _source_evidence_ids(evidence: list[Evidence]) -> list[str]:
    ids: list[str] = []
    for item in evidence:
        value = item.data.get("evidence_id")
        if value and value not in ids:
            ids.append(str(value))
    return ids


def _evidence_prompt(evidence: list[Evidence]) -> str:
    lines = ["Source evidence:"]
    for index, item in enumerate(evidence, start=1):
        lines.append(
            f"{index}. source={item.source.value}; reason={item.reason}; "
            f"confidence={item.confidence}; data={item.data}"
        )
    return "\n".join(lines)


def _response_text(response: dict[str, Any]) -> str:
    if response.get("response"):
        return str(response["response"])
    if response.get("output_text"):
        return str(response["output_text"])
    chunks: list[str] = []
    for item in response.get("output", []):
        for content in item.get("content", []):
            text = content.get("text")
            if text:
                chunks.append(str(text))
    return "\n".join(chunks) or "Evidence summary unavailable."
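Review note on `OllamaGenerateLlmClient._generate_url`: the client accepts a base URL with or without a trailing slash and with or without an `/api` suffix. A standalone sketch of that normalization is below; the free function `generate_url` is a hypothetical restatement of the method (with the `rstrip("/")` from `__init__` folded in), and the host names are placeholders.

```python
def generate_url(base_url: str) -> str:
    """Build the generate endpoint from a base URL, tolerating '/' and '/api' suffixes."""
    base = base_url.rstrip("/")
    if base.endswith("/api"):
        return f"{base}/generate"
    return f"{base}/api/generate"


if __name__ == "__main__":
    for candidate in (
        "http://127.0.0.1:11434",
        "http://127.0.0.1:11434/",
        "http://llm.internal.example/api",
    ):
        print(candidate, "->", generate_url(candidate))
```

All three inputs resolve to a single `/api/generate` path, so an operator-supplied base URL should not produce a doubled `/api/api/` segment.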
176
src/rights_filter/analysis/preprocessing.py
Normal file
@@ -0,0 +1,176 @@
from __future__ import annotations

from io import BytesIO
from dataclasses import dataclass, field

from rights_filter.domain.records import DataClass

FaceBox = tuple[int, int, int, int]


class PreprocessingError(ValueError):
    pass


@dataclass(frozen=True)
class ImagePayload:
    content: bytes
    width: int
    height: int
    metadata: dict[str, str] = field(default_factory=dict)
    data_class: DataClass = DataClass.ORIGINAL_IMAGE


def build_external_derivative(
    original: ImagePayload, max_side: int = 1600
) -> ImagePayload:
    if not original.content:
        raise PreprocessingError("empty image content")
    if original.width <= 0 or original.height <= 0:
        raise PreprocessingError("image dimensions must be positive")

    longest_side = max(original.width, original.height)
    if longest_side > max_side:
        scale = max_side / longest_side
        width = max(1, round(original.width * scale))
        height = max(1, round(original.height * scale))
    else:
        width = original.width
        height = original.height

    if original.metadata.get("format", "").upper() == "AVIF":
        content = _jpeg_derivative(original.content, width, height)
    else:
        # Re-encode through PIL to drop embedded EXIF/GPS metadata from the image
        # bytes. Clearing only the metadata dict (below) left camera serials and
        # GPS coordinates inside the bytes that get shipped to external APIs.
        content = _reencode_without_metadata(original.content, width, height)
    if content is None:
        content = original.content

    return ImagePayload(
        content=content,
        width=width,
        height=height,
        metadata={},
        data_class=DataClass.EXTERNAL_DERIVATIVE,
    )


def build_face_crop_derivatives(
    original: ImagePayload,
    face_boxes: list[FaceBox] | tuple[FaceBox, ...],
    max_crops: int = 3,
    padding_ratio: float = 0.25,
    max_side: int = 768,
) -> list[ImagePayload]:
    if not original.content or not face_boxes:
        return []

    try:
        from PIL import Image

        with Image.open(BytesIO(original.content)) as image:
            image = image.convert("RGB")
            crops: list[ImagePayload] = []
            image_width, image_height = image.size
            for box in face_boxes[:max_crops]:
                try:
                    bounds = _padded_bounds(box, image_width, image_height, padding_ratio)
                    if bounds is None:
                        continue
                    crop = image.crop(bounds)
                    crop = _resize_to_max_side(crop, max_side)
                    output = BytesIO()
                    crop.save(output, format="JPEG", quality=90, optimize=True)
                    crops.append(
                        ImagePayload(
                            content=output.getvalue(),
                            width=int(crop.width),
                            height=int(crop.height),
                            metadata={},
                            data_class=DataClass.EXTERNAL_DERIVATIVE,
                        )
                    )
                except Exception:
                    # Skip a single bad box instead of discarding all crops.
                    continue
            return crops
    except Exception:
        return []


def _jpeg_derivative(content: bytes, width: int, height: int) -> bytes | None:
    try:
        from PIL import Image

        with Image.open(BytesIO(content)) as image:
            image = image.convert("RGB")
            if image.size != (width, height):
                image = image.resize((width, height))
            output = BytesIO()
            image.save(output, format="JPEG", quality=90, optimize=True)
            return output.getvalue()
    except Exception:
        return None


_REENCODABLE_FORMATS = {"JPEG", "PNG", "WEBP", "GIF", "BMP", "TIFF"}


def _reencode_without_metadata(content: bytes, width: int, height: int) -> bytes | None:
    """Re-save the image (preserving format where possible) so embedded EXIF/GPS
    metadata is dropped. Returns None when PIL is unavailable or the bytes are
    not a decodable image, so the caller can fall back to the original content."""
    try:
        from PIL import Image

        with Image.open(BytesIO(content)) as image:
            source_format = (image.format or "").upper()
            save_format = source_format if source_format in _REENCODABLE_FORMATS else "JPEG"
            if save_format in {"JPEG", "BMP"} and image.mode not in {"RGB", "L"}:
                image = image.convert("RGB")
            elif image.mode == "P":
                image = image.convert("RGBA")
            if image.size != (width, height):
                image = image.resize((width, height))
            # Rebuild from raw pixels so NO metadata (EXIF/GPS/ICC) survives;
            # Pillow can otherwise re-emit image.info["exif"] when re-saving.
            clean = Image.frombytes(image.mode, image.size, image.tobytes())
            output = BytesIO()
            clean.save(output, format=save_format)
            return output.getvalue()
    except Exception:
        return None


def _padded_bounds(
    box: FaceBox,
    image_width: int,
    image_height: int,
    padding_ratio: float,
) -> tuple[int, int, int, int] | None:
    x, y, width, height = (int(value) for value in box)
    if width <= 0 or height <= 0 or image_width <= 0 or image_height <= 0:
        return None

    pad_x = round(width * padding_ratio)
    pad_y = round(height * padding_ratio)
    left = max(0, x - pad_x)
    top = max(0, y - pad_y)
    right = min(image_width, x + width + pad_x)
    bottom = min(image_height, y + height + pad_y)
    if right <= left or bottom <= top:
        return None
    return (left, top, right, bottom)


def _resize_to_max_side(image: object, max_side: int) -> object:
    longest_side = max(int(image.width), int(image.height))
    if longest_side <= max_side:
        return image

    scale = max_side / longest_side
    width = max(1, round(int(image.width) * scale))
    height = max(1, round(int(image.height) * scale))
    return image.resize((width, height))
13
src/rights_filter/analysis/reason_builder.py
Normal file
@@ -0,0 +1,13 @@
from rights_filter.domain.records import Evidence


class ReasonBuilder:
    def reasons_for(self, evidence: list[Evidence]) -> list[str]:
        reasons: list[str] = []
        for item in evidence:
            if item.reason and item.reason not in reasons:
                reasons.append(item.reason)
        return reasons


__all__ = ["ReasonBuilder"]
123
src/rights_filter/analysis/risk_scoring.py
Normal file
123
src/rights_filter/analysis/risk_scoring.py
Normal file
|
|
@ -0,0 +1,123 @@
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
from rights_filter.domain.records import Evidence, EvidenceSource, ScoreResult
|
||||||
|
|
||||||
|
|
||||||
|
class RiskScorer:
|
||||||
|
def score(self, evidence: list[Evidence]) -> ScoreResult:
|
||||||
|
score = 0
|
||||||
|
reasons: list[str] = []
|
||||||
|
counted_web_buckets: set[str] = set()
|
||||||
|
|
||||||
|
for item in evidence:
|
||||||
|
if item.data.get("operator_status") in {"irrelevant", "false_positive"}:
|
||||||
|
continue
|
||||||
|
if item.source == EvidenceSource.FINGERPRINT:
|
||||||
|
if item.data.get("contributed") is False or item.data.get("status") == "queued":
|
||||||
|
continue
|
||||||
|
similarity = _safe_float(item.data.get("similarity", 0))
|
||||||
|
if similarity >= 0.9:
|
||||||
|
score += 80
|
||||||
|
reasons.append(item.reason)
|
||||||
|
elif item.reason != "Image fingerprints generated":
|
||||||
|
score += 30
|
||||||
|
reasons.append(item.reason)
|
||||||
|
elif item.source == EvidenceSource.FACE_PERSON:
|
||||||
|
score += 35
|
||||||
|
reasons.append(item.reason)
|
||||||
|
elif item.source == EvidenceSource.WEB_DETECTION:
|
||||||
|
bucket = _web_detection_bucket(item)
|
||||||
|
if bucket and bucket in counted_web_buckets:
|
||||||
|
continue
|
||||||
|
points = _web_detection_points(item)
|
||||||
|
score += points
|
||||||
|
if points:
|
||||||
|
reasons.append(item.reason)
|
||||||
|
if bucket:
|
||||||
|
counted_web_buckets.add(bucket)
|
||||||
|
elif item.source == EvidenceSource.NAVER_SEARCH:
|
||||||
|
points = _naver_search_points(item)
|
||||||
|
score += points
|
||||||
|
if points:
|
||||||
|
reasons.append(item.reason)
|
||||||
|
elif item.source == EvidenceSource.LLM_SUMMARY:
|
||||||
|
continue
|
||||||
|
elif item.source in {
|
||||||
|
EvidenceSource.FAILURE,
|
||||||
|
EvidenceSource.EXTERNAL_SKIPPED,
|
||||||
|
EvidenceSource.SEARCH_SKIPPED,
|
||||||
|
EvidenceSource.ENRICHMENT_FAILURE,
|
||||||
|
}:
|
||||||
|
# An LLM-summary failure only means narration didn't run; it is
|
||||||
|
# not a search/coverage gap, so it must not add risk or surface
|
||||||
|
# as a reason. Genuine provider failures (search / web-detection,
|
||||||
|
# e.g. "External API failed") still contribute the +30 signal.
|
||||||
|
if item.reason.startswith("LLM assistance failed"):
|
||||||
|
continue
|
||||||
|
score += (
|
||||||
|
30
|
||||||
|
if item.source
|
||||||
|
in {EvidenceSource.FAILURE, EvidenceSource.ENRICHMENT_FAILURE}
|
||||||
|
else 0
|
||||||
|
)
|
||||||
|
reasons.append(item.reason)
|
||||||
|
|
||||||
|
score = min(100, score)
|
||||||
|
if score >= 70:
|
||||||
|
band = "high"
|
||||||
|
elif score >= 30:
|
||||||
|
band = "medium"
|
||||||
|
else:
|
||||||
|
band = "low"
|
||||||
|
|
||||||
|
return ScoreResult(score=score, band=band, reasons=_unique(reasons))
|
||||||
|
|
||||||
|
|
||||||
|
def _web_detection_points(item: Evidence) -> int:
    if item.data.get("weak_hint"):
        return 0
    if item.data.get("match") == "full":
        return 45
    if item.data.get("match") in {"partial", "page"}:
        return 35
    if item.data.get("match") == "visual":
        return 10
    if item.data.get("url"):
        return 45
    if item.data.get("category") in {"character", "celebrity", "work"}:
        return 45
    if item.data.get("entity"):
        return round(40 * item.confidence)
    return 0


def _web_detection_bucket(item: Evidence) -> str:
    match = str(item.data.get("match", ""))
    if match in {"full", "partial", "page", "visual", "entity"}:
        return match
    if item.data.get("entity"):
        return "entity"
    if item.data.get("url"):
        return "url"
    return ""


def _naver_search_points(item: Evidence) -> int:
    if item.data.get("promoted"):
        return round(50 * item.confidence)
    return 0


def _unique(values: list[str]) -> list[str]:
    result: list[str] = []
    for value in values:
        if value and value not in result:
            result.append(value)
    return result


def _safe_float(value: object) -> float:
    try:
        return float(value or 0)
    except (TypeError, ValueError):
        return 0.0
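The banding and point rules at the end of the scoring module can be exercised in isolation. The following is a minimal sketch, assuming only the thresholds visible in the diff (clamp at 100, `high` from 70, `medium` from 30) and the `round(40 * confidence)` rule for entity-only web detection evidence. The names `band_for` and `web_entity_points` are hypothetical and do not exist in the repository.

```python
def band_for(raw_score: int) -> tuple[int, str]:
    """Clamp a raw score to 100 and map it to a band.

    Hypothetical helper: mirrors the thresholds shown in the diff above.
    """
    score = min(100, raw_score)
    if score >= 70:
        band = "high"
    elif score >= 30:
        band = "medium"
    else:
        band = "low"
    return score, band


def web_entity_points(confidence: float) -> int:
    """Entity-only web detection evidence: 40 points scaled by confidence."""
    return round(40 * confidence)


if __name__ == "__main__":
    print(band_for(45))
    print(band_for(130))
    print(web_entity_points(0.5))
```

Because the clamp runs before the band lookup, any number of stacked signals above 100 still lands in the `high` band with a score of exactly 100.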
321
src/rights_filter/analysis/search_query_generation.py
Normal file
@@ -0,0 +1,321 @@
from __future__ import annotations

import html
import re
from dataclasses import dataclass

from rights_filter.domain.records import Evidence, EvidenceSource, KnowledgeBaseEntry


@dataclass(frozen=True)
class GeneratedSearchQuery:
    query: str
    strategy: str
    source: str
    priority: int


class SearchQueryGenerator:
    def plan(
        self,
        evidence: list[Evidence],
        knowledge_entries: list[KnowledgeBaseEntry],
        max_queries: int = 5,
    ) -> list[GeneratedSearchQuery]:
        candidates: list[GeneratedSearchQuery] = []

        for item in evidence:
            if item.data.get("local_query_hint"):
                candidates.extend(_local_metadata_candidates(item))
                continue
            if item.source != EvidenceSource.WEB_DETECTION:
                continue
            if item.data.get("weak_hint"):
                candidates.extend(_weak_hint_candidates(item))
                continue

            page_title = item.data.get("page_title")
            if page_title:
                candidates.extend(_page_title_candidates(str(page_title)))

            entity = item.data.get("entity") or item.data.get("label")
            if entity:
                candidates.extend(_entity_candidates(str(entity), item.data.get("category")))

        for entry in knowledge_entries:
            keyword = entry.related_keywords[0] if entry.related_keywords else "공식 이미지"
            candidates.append(
                _candidate(f"{entry.name} {keyword}", "knowledge_entry", entry.name, 70)
            )
            for alias in entry.aliases:
                if alias != entry.name:
                    candidates.append(
                        _candidate(f"{alias} {keyword}", "knowledge_alias", entry.name, 68)
                    )

        return _unique_sorted(candidates)[: max(0, max_queries)]

    def generate(
        self,
        evidence: list[Evidence],
        knowledge_entries: list[KnowledgeBaseEntry],
        max_queries: int = 5,
    ) -> list[str]:
        return [
            item.query
            for item in self.plan(
                evidence,
                knowledge_entries,
                max_queries=max_queries,
            )
        ]


def _page_title_candidates(title: str) -> list[GeneratedSearchQuery]:
    cleaned = _clean_title(title)
    if not cleaned:
        return []

    candidates = [_candidate(cleaned, "google_page", cleaned, 100)]
    if not _has_image_word(cleaned):
        candidates.append(_candidate(f"{cleaned} {_image_query_word(cleaned)}", "google_page", cleaned, 96))
    return candidates


def _weak_hint_candidates(item: Evidence) -> list[GeneratedSearchQuery]:
    if item.data.get("face_crop_search"):
        return _face_crop_candidates(item)

    candidates: list[GeneratedSearchQuery] = []
    page_title = item.data.get("page_title")
    if page_title:
        candidates.extend(_weak_page_title_candidates(str(page_title)))

    label = item.data.get("label")
    entity = item.data.get("entity") or item.data.get("label")
    if entity and (
        not label
        or _is_generic_best_guess_label(str(label))
    ):
        candidates.extend(_weak_entity_candidates(str(entity)))

    if label:
        candidates.extend(_best_guess_label_candidates(str(label)))

    return candidates


def _weak_page_title_candidates(title: str) -> list[GeneratedSearchQuery]:
    cleaned = _clean_title(title)
    if not cleaned:
        return []

    candidates = [_candidate(cleaned, "google_page", cleaned, 88)]
    if not _has_image_word(cleaned):
        candidates.append(
            _candidate(f"{cleaned} {_image_query_word(cleaned)}", "google_page", cleaned, 84)
        )
    return candidates


def _entity_candidates(entity: str, category: object) -> list[GeneratedSearchQuery]:
    cleaned = _clean_query(entity)
    if not cleaned:
        return []

    if _is_person_category(str(category or "")):
        templates = [
            ("{entity} 공식 프로필", 90),
            ("{entity} 사진", 86),
            ("{entity} 화보", 84),
        ]
    else:
        templates = [
            ("{entity} 공식 이미지", 88),
            ("{entity} 이미지", 82),
        ]
    return [
        _candidate(template.format(entity=cleaned), "google_entity", cleaned, priority)
        for template, priority in templates
    ]


def _best_guess_label_candidates(label: str) -> list[GeneratedSearchQuery]:
    cleaned = _clean_query(label)
    if not cleaned or _is_generic_best_guess_label(cleaned):
        return []

    candidates = [_candidate(cleaned, "google_best_guess", cleaned, 60)]
    if not _has_image_word(cleaned):
        candidates.append(_candidate(f"{cleaned} {_image_query_word(cleaned)}", "google_best_guess", cleaned, 56))
    return candidates


def _weak_entity_candidates(entity: str) -> list[GeneratedSearchQuery]:
    cleaned = _clean_query(entity)
    if not cleaned or _is_generic_best_guess_label(cleaned):
        return []

    candidates: list[GeneratedSearchQuery] = [_candidate(cleaned, "google_entity", cleaned, 60)]
    if not _has_image_word(cleaned):
        candidates.append(_candidate(f"{cleaned} image", "google_entity", cleaned, 62))
    return candidates


def _face_crop_candidates(item: Evidence) -> list[GeneratedSearchQuery]:
    candidates: list[GeneratedSearchQuery] = []
    page_title = item.data.get("page_title")
    if page_title:
        candidates.extend(_face_crop_page_title_candidates(str(page_title)))

    entity = item.data.get("entity") or item.data.get("label")
    if entity:
        candidates.extend(_face_crop_entity_candidates(str(entity)))
    return candidates


def _face_crop_page_title_candidates(title: str) -> list[GeneratedSearchQuery]:
    cleaned = _clean_title(title)
    if not cleaned:
        return []

    candidates = [_candidate(cleaned, "google_face_crop_page", cleaned, 74)]
    if not _has_image_word(cleaned):
        candidates.append(_candidate(f"{cleaned} {_image_query_word(cleaned)}", "google_face_crop_page", cleaned, 70))
    return candidates


def _face_crop_entity_candidates(entity: str) -> list[GeneratedSearchQuery]:
    cleaned = _clean_query(entity)
    if not cleaned or _is_generic_best_guess_label(cleaned):
        return []

    candidates = [_candidate(cleaned, "google_face_crop_entity", cleaned, 66)]
    if not _has_image_word(cleaned):
        candidates.append(_candidate(f"{cleaned} {_image_query_word(cleaned)}", "google_face_crop_entity", cleaned, 62))
    return candidates


def _local_metadata_candidates(item: Evidence) -> list[GeneratedSearchQuery]:
    query = _clean_local_query(str(item.data.get("query", "")))
    if not query or _is_generic_local_query(query):
        return []

    source = str(item.data.get("hint_source", "local")).strip() or "local"
    candidates = [_candidate(query, "local_metadata", source, 54)]
    if not _has_image_word(query):
        candidates.append(_candidate(f"{query} {_image_query_word(query)}", "local_metadata", source, 50))
    return candidates


def _candidate(query: str, strategy: str, source: str, priority: int) -> GeneratedSearchQuery:
    return GeneratedSearchQuery(
        query=_trim_query(_clean_query(query)),
        strategy=strategy,
        source=_trim_query(_clean_query(source)),
        priority=priority,
    )


def _unique_sorted(candidates: list[GeneratedSearchQuery]) -> list[GeneratedSearchQuery]:
    result: list[GeneratedSearchQuery] = []
    seen: set[str] = set()
    for candidate in sorted(candidates, key=lambda item: item.priority, reverse=True):
        if not candidate.query:
            continue
        key = _query_key(candidate.query)
        if key in seen:
            continue
        seen.add(key)
        result.append(candidate)
    return result


def _clean_title(value: str) -> str:
    cleaned = _clean_query(value)
    parts = re.split(r"\s(?:-|[|]|–|—|:)\s", cleaned, maxsplit=1)
    return parts[0].strip()


def _clean_query(value: str) -> str:
    text = html.unescape(str(value))
    text = re.sub(r"<[^>]+>", " ", text)
    text = re.sub(r"\s+", " ", text).strip()
    return text


def _clean_local_query(value: str) -> str:
    text = _clean_query(value)
    text = re.sub(r"\.[a-zA-Z0-9]{2,5}$", "", text)
    text = re.sub(r"[_]+", " ", text)
    text = re.sub(r"\s+", " ", text).strip()
    return text


def _trim_query(value: str, limit: int = 80) -> str:
    if len(value) <= limit:
        return value
    trimmed = value[:limit].rsplit(" ", 1)[0].strip()
    return trimmed or value[:limit].strip()


def _query_key(value: str) -> str:
    return " ".join(value.casefold().split())


def _is_person_category(category: str) -> bool:
    return category.casefold() in {"celebrity", "person", "public_figure", "actor", "artist"}


def _is_generic_best_guess_label(value: str) -> bool:
    text = _query_key(value)
    if len(text) < 2:
        return True
    generic_labels = {
        "person",
        "people",
        "man",
        "woman",
        "gentleman",
        "lady",
        "girl",
        "boy",
        "portrait",
        "face",
        "photo",
        "image",
        "picture",
        "selfie",
        "screenshot",
        "wallpaper",
    }
    return text in generic_labels


def _is_generic_local_query(value: str) -> bool:
    text = _query_key(value)
    if _is_generic_best_guess_label(text):
        return True
    if "sample" in text:
        return True
    if re.fullmatch(r"(img|image|photo|picture|portrait|screenshot|download|copy|scan|file)[\s_-]*\d*", text):
        return True
    if re.fullmatch(r"(sub|db|api|test|local)[\s_-]*\d*", text):
        return True
    if re.fullmatch(r"\d+", text):
        return True
    return False


def _has_image_word(value: str) -> bool:
    text = value.casefold()
    markers = ("image", "photo", "picture", "사진", "이미지", "화보", "프로필")
    return any(marker in text for marker in markers)


def _image_query_word(value: str) -> str:
    return "이미지" if _has_hangul(value) else "image"


def _has_hangul(value: str) -> bool:
    return any("\uac00" <= character <= "\ud7a3" for character in value)
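The ranking step in `_unique_sorted` combines a descending priority sort with de-duplication on a case-folded, whitespace-normalized key. Below is a minimal sketch of that idea, assuming a hypothetical `Candidate` stand-in for `GeneratedSearchQuery` (only `query` and `priority` are kept) and hypothetical helper names `query_key` and `unique_sorted`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical stand-in for GeneratedSearchQuery (two fields only)."""
    query: str
    priority: int


def query_key(value: str) -> str:
    # Case-fold and collapse whitespace so near-identical queries share a key.
    return " ".join(value.casefold().split())


def unique_sorted(candidates: list[Candidate]) -> list[Candidate]:
    result: list[Candidate] = []
    seen: set[str] = set()
    # Highest priority first, so the best-ranked duplicate is the one kept.
    for candidate in sorted(candidates, key=lambda c: c.priority, reverse=True):
        if not candidate.query:
            continue  # empty queries are dropped regardless of priority
        key = query_key(candidate.query)
        if key in seen:
            continue
        seen.add(key)
        result.append(candidate)
    return result


ranked = unique_sorted(
    [
        Candidate("IU  Official Profile", 60),
        Candidate("iu official profile", 90),
        Candidate("", 100),
        Candidate("Some Webtoon image", 82),
    ]
)
```

In this sketch the priority-90 spelling survives and the priority-60 duplicate is discarded, because sorting happens before the `seen` check.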
52
src/rights_filter/analysis/search_result_promoter.py
Normal file
@@ -0,0 +1,52 @@
from __future__ import annotations

import re
from dataclasses import replace

from rights_filter.domain.records import Evidence, EvidenceSource


class SearchResultPromoter:
    def promote(self, evidence: list[Evidence]) -> list[Evidence]:
        return [self._promote_item(item) for item in evidence]

    def _promote_item(self, item: Evidence) -> Evidence:
        if item.source != EvidenceSource.NAVER_SEARCH:
            return item

        if item.reason == "Naver search returned no results":
            return replace(item, confidence=0.0, data={**item.data, "promoted": False})

        text = " ".join(
            str(item.data.get(key, ""))
            for key in ("title", "description", "query", "result_url")
        ).lower()
        promoted = _has_named_rights_signal(text)
        data = {
            **item.data,
            "promoted": promoted,
            "promotion_reason": (
                "named person/work search evidence" if promoted else "context-only search evidence"
            ),
        }
        return replace(item, confidence=0.8 if promoted else 0.2, data=data)


def _has_named_rights_signal(text: str) -> bool:
    markers = {
        "iu",
        "official",
        "album",
        "cover",
        "character",
        "webtoon",
        "game",
        "broadcast",
        "drama",
        "copyright",
        "celebrity",
    }
    # Match whole words only. Plain `in` matching let short markers like "iu"
    # or "cover" hit inside unrelated words/URLs ("premium", "discover"),
    # falsely promoting context-only results.
    return any(re.search(rf"\b{re.escape(marker)}\b", text) for marker in markers)
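The comment in `_has_named_rights_signal` describes why substring matching was replaced with whole-word matching. The difference can be shown side by side in a short sketch, assuming a three-marker subset and two hypothetical helper names (`has_marker_substring`, `has_marker_whole_word`) that are not part of the repository.

```python
import re

# Illustrative subset of the marker set shown in the diff above.
MARKERS = {"iu", "cover", "official"}


def has_marker_substring(text: str) -> bool:
    """Old behaviour per the comment: plain substring containment."""
    return any(marker in text for marker in MARKERS)


def has_marker_whole_word(text: str) -> bool:
    """New behaviour: each marker must be delimited by word boundaries."""
    return any(re.search(rf"\b{re.escape(marker)}\b", text) for marker in MARKERS)


noise = "premium discover deals"
signal = "iu official album cover"

print(has_marker_substring(noise), has_marker_whole_word(noise))
print(has_marker_substring(signal), has_marker_whole_word(signal))
```

One caveat worth keeping in mind: in Python 3 string patterns `\b` is Unicode-aware, so a marker written directly against Hangul with no separator (for example `iu앨범`) has no word boundary after `iu` and would not match.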
1
src/rights_filter/domain/__init__.py
Normal file
@@ -0,0 +1 @@
"""Domain records for the rights filter."""
33
src/rights_filter/domain/knowledge_base.py
Normal file
@@ -0,0 +1,33 @@
from rights_filter.domain.records import (
    KnowledgeBaseEntry,
    KnowledgeEntryType,
    KnowledgeProvenance,
)


def create_manual_knowledge_entry(
    entry_type: KnowledgeEntryType,
    name: str,
    aliases: list[str] | None = None,
    related_keywords: list[str] | None = None,
    policy_memo: str = "",
    exception_conditions: str = "",
    sample_fingerprints: list[str] | None = None,
) -> KnowledgeBaseEntry:
    return KnowledgeBaseEntry.create(
        entry_type=entry_type,
        name=name,
        provenance=KnowledgeProvenance.MANUAL,
        aliases=aliases,
        related_keywords=related_keywords,
        policy_memo=policy_memo,
        exception_conditions=exception_conditions,
        sample_fingerprints=sample_fingerprints,
    )


__all__ = [
    "KnowledgeBaseEntry",
    "KnowledgeEntryType",
    "KnowledgeProvenance",
    "create_manual_knowledge_entry",
]
11
src/rights_filter/domain/policies.py
Normal file
@@ -0,0 +1,11 @@
from rights_filter.governance.policies import (
    DataClassPolicy,
    GovernancePolicyRegistry,
    assert_no_biometric_template,
)

__all__ = [
    "DataClassPolicy",
    "GovernancePolicyRegistry",
    "assert_no_biometric_template",
]
253
src/rights_filter/domain/records.py
Normal file
@@ -0,0 +1,253 @@
from __future__ import annotations

from dataclasses import dataclass, field
from enum import StrEnum
from itertools import count
from typing import Any


_id_counter = count(1)


def _new_id(prefix: str) -> str:
    return f"{prefix}-{next(_id_counter)}"


class EvidenceSource(StrEnum):
    FINGERPRINT = "fingerprint"
    FACE_PERSON = "face_person"
    WEB_DETECTION = "web_detection"
    NAVER_SEARCH = "naver_search"
    LLM_SUMMARY = "llm_summary"
    EXTERNAL_SKIPPED = "external_skipped"
    SEARCH_SKIPPED = "search_skipped"
    FAILURE = "failure"
    ENRICHMENT_FAILURE = "enrichment_failure"


class KnowledgeEntryType(StrEnum):
    CELEBRITY = "celebrity"
    GROUP = "group"
    WORK = "work"
    CHARACTER = "character"
    WEBTOON = "webtoon"
    GAME = "game"
    REJECTED_IMAGE = "rejected_image"
    OTHER = "other"


class KnowledgeProvenance(StrEnum):
    MANUAL = "manual"
    AUTOMATIC_REJECTION = "automatic_rejection"
    EXTERNAL_EVIDENCE = "external_evidence"


class ReviewStatus(StrEnum):
    APPROVED = "approved"
    HELD = "held"
    REJECTED = "rejected"


class DataClass(StrEnum):
    ORIGINAL_IMAGE = "original_image"
    INTERNAL_DERIVATIVE = "internal_derivative"
    EXTERNAL_DERIVATIVE = "external_derivative"
    IMAGE_FINGERPRINT = "image_fingerprint"
    WEB_EVIDENCE = "web_evidence"
    SEARCH_EVIDENCE = "search_evidence"
    LLM_SUMMARY = "llm_summary"
    PROVIDER_METADATA = "provider_metadata"
    OPERATOR_NOTE = "operator_note"


@dataclass(frozen=True)
class Evidence:
    source: EvidenceSource
    reason: str
    confidence: float
    data: dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class ScoreResult:
    score: int
    band: str
    reasons: list[str] = field(default_factory=list)


@dataclass
class AnalysisRun:
    id: str
    submission_id: str
    analysis_version: str
    evidence: list[Evidence] = field(default_factory=list)
    score: ScoreResult | None = None

    @classmethod
    def for_submission(cls, submission_id: str, analysis_version: str) -> "AnalysisRun":
        return cls(
            id=_new_id("analysis"),
            submission_id=submission_id,
            analysis_version=analysis_version,
        )

    def add_evidence(self, evidence: Evidence) -> None:
        self.evidence.append(evidence)


@dataclass
class KnowledgeBaseEntry:
    id: str
    entry_type: KnowledgeEntryType
    name: str
    provenance: KnowledgeProvenance
    aliases: list[str] = field(default_factory=list)
    related_keywords: list[str] = field(default_factory=list)
    policy_memo: str = ""
    exception_conditions: str = ""
    sample_fingerprints: list[str] = field(default_factory=list)
    source_decision_id: str | None = None
    entry_status: str = "confirmed"
    source_submission_id: str = ""
    active: bool = True
    deactivation_reason: str | None = None
    data_classes: set[DataClass] = field(default_factory=set)

    @classmethod
    def create(
        cls,
        entry_type: KnowledgeEntryType,
        name: str,
        provenance: KnowledgeProvenance,
        aliases: list[str] | None = None,
        related_keywords: list[str] | None = None,
        policy_memo: str = "",
        exception_conditions: str = "",
        sample_fingerprints: list[str] | None = None,
        source_decision_id: str | None = None,
        entry_status: str = "confirmed",
        source_submission_id: str = "",
    ) -> "KnowledgeBaseEntry":
        data_classes = {DataClass.OPERATOR_NOTE}
        if sample_fingerprints:
            data_classes.add(DataClass.IMAGE_FINGERPRINT)

        return cls(
            id=_new_id("knowledge"),
            entry_type=entry_type,
            name=name,
            provenance=provenance,
            aliases=aliases or [],
            related_keywords=related_keywords or [],
            policy_memo=policy_memo,
            exception_conditions=exception_conditions,
            sample_fingerprints=sample_fingerprints or [],
            source_decision_id=source_decision_id,
            entry_status=entry_status,
            source_submission_id=source_submission_id,
            data_classes=data_classes,
        )


@dataclass
class OperatorDecision:
    id: str
    submission_id: str
    status: ReviewStatus
    memo: str = ""

    @classmethod
    def create(
        cls, submission_id: str, status: ReviewStatus, memo: str = ""
    ) -> "OperatorDecision":
        return cls(
            id=_new_id("decision"),
            submission_id=submission_id,
            status=status,
            memo=memo,
        )


class InMemoryRightsFilterRepository:
    def __init__(self) -> None:
        self._analysis_runs: list[AnalysisRun] = []
        self._knowledge_entries: dict[str, KnowledgeBaseEntry] = {}
        self._operator_decisions: dict[str, OperatorDecision] = {}

    def save_analysis_run(self, run: AnalysisRun) -> None:
        self._analysis_runs.append(run)

    def analysis_runs_for_submission(self, submission_id: str) -> list[AnalysisRun]:
        return [
            run
            for run in self._analysis_runs
            if run.submission_id == submission_id
        ]

    def has_analysis_run(self, submission_id: str, analysis_version: str) -> bool:
        return any(
            run.submission_id == submission_id
            and run.analysis_version == analysis_version
            for run in self._analysis_runs
        )

    def latest_score_for_submission(self, submission_id: str) -> ScoreResult | None:
        for run in reversed(self._analysis_runs):
            if run.submission_id == submission_id and run.score is not None:
                return run.score
        return None

    def save_knowledge_entry(self, entry: KnowledgeBaseEntry) -> None:
        self._knowledge_entries[entry.id] = entry

    def knowledge_entry(self, entry_id: str) -> KnowledgeBaseEntry:
        return self._knowledge_entries[entry_id]

    def active_knowledge_entries(self) -> list[KnowledgeBaseEntry]:
        return [
            entry
            for entry in self._knowledge_entries.values()
            if entry.active
        ]

    def knowledge_entries_for_source_decision(
        self, decision_id: str
    ) -> list[KnowledgeBaseEntry]:
        return [
            entry
            for entry in self._knowledge_entries.values()
            if entry.source_decision_id == decision_id
        ]

    def deactivate_knowledge_entry(self, entry_id: str, reason: str) -> None:
        entry = self.knowledge_entry(entry_id)
        entry.active = False
        entry.deactivation_reason = reason

    def deactivate_entries_for_source_decision(
        self, decision_id: str, reason: str
    ) -> list[KnowledgeBaseEntry]:
        entries = self.knowledge_entries_for_source_decision(decision_id)
        for entry in entries:
            entry.active = False
            entry.deactivation_reason = reason
        return entries

    def save_operator_decision(self, decision: OperatorDecision) -> None:
        self._operator_decisions[decision.id] = decision

    def operator_decision(self, decision_id: str) -> OperatorDecision:
        return self._operator_decisions[decision_id]

    def create_rejected_image_entry(
        self, decision_id: str, submission_id: str, fingerprints: list[str]
    ) -> KnowledgeBaseEntry:
        entry = KnowledgeBaseEntry.create(
            entry_type=KnowledgeEntryType.REJECTED_IMAGE,
            name=f"rejected:{submission_id}",
            provenance=KnowledgeProvenance.AUTOMATIC_REJECTION,
            source_decision_id=decision_id,
            sample_fingerprints=fingerprints,
        )
        self.save_knowledge_entry(entry)
        return entry
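The repository links automatically created knowledge entries back to the operator decision that produced them, so reversing a decision can deactivate every derived entry in one call. The following is a rough sketch of that flow under simplifying assumptions: `Entry` and `TinyRepository` are hypothetical, trimmed-down stand-ins for `KnowledgeBaseEntry` and `InMemoryRightsFilterRepository`, and the ids are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    """Hypothetical stand-in for KnowledgeBaseEntry (four fields only)."""
    id: str
    source_decision_id: Optional[str] = None
    active: bool = True
    deactivation_reason: Optional[str] = None


class TinyRepository:
    """Hypothetical in-memory store keyed by entry id."""

    def __init__(self) -> None:
        self._entries: dict[str, Entry] = {}

    def save(self, entry: Entry) -> None:
        self._entries[entry.id] = entry

    def active_entries(self) -> list[Entry]:
        return [entry for entry in self._entries.values() if entry.active]

    def deactivate_for_decision(self, decision_id: str, reason: str) -> list[Entry]:
        # Entries are flagged inactive rather than deleted, so the reason
        # for the deactivation stays available for later review.
        entries = [
            entry
            for entry in self._entries.values()
            if entry.source_decision_id == decision_id
        ]
        for entry in entries:
            entry.active = False
            entry.deactivation_reason = reason
        return entries


repo = TinyRepository()
repo.save(Entry("knowledge-1", source_decision_id="decision-1"))
repo.save(Entry("knowledge-2"))
changed = repo.deactivate_for_decision("decision-1", "decision reversed")
```

Manually curated entries (no `source_decision_id`) are untouched by the call, which is why `knowledge-2` stays active in this sketch.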
3
src/rights_filter/domain/repositories.py
Normal file
@@ -0,0 +1,3 @@
from rights_filter.domain.records import InMemoryRightsFilterRepository

__all__ = ["InMemoryRightsFilterRepository"]
1
src/rights_filter/governance/__init__.py
Normal file
@@ -0,0 +1 @@
"""Governance policies for sensitive rights-filter data."""
3
src/rights_filter/governance/access_policy.py
Normal file
@@ -0,0 +1,3 @@
from rights_filter.governance.policies import DataClassPolicy, GovernancePolicyRegistry

__all__ = ["DataClassPolicy", "GovernancePolicyRegistry"]
3
src/rights_filter/governance/correction_policy.py
Normal file
@@ -0,0 +1,3 @@
from rights_filter.governance.policies import DataClassPolicy, GovernancePolicyRegistry

__all__ = ["DataClassPolicy", "GovernancePolicyRegistry"]
3
src/rights_filter/governance/data_classes.py
Normal file
@@ -0,0 +1,3 @@
from rights_filter.domain.records import DataClass

__all__ = ["DataClass"]
114
src/rights_filter/governance/policies.py
Normal file
@@ -0,0 +1,114 @@
from __future__ import annotations

from dataclasses import dataclass
from typing import Any

from rights_filter.domain.records import DataClass


@dataclass(frozen=True)
class DataClassPolicy:
    access_roles: set[str]
    retention: str
    deletion: str
    correction: str


class GovernancePolicyRegistry:
    def __init__(self, policies: dict[DataClass, DataClassPolicy]) -> None:
        self._policies = policies

    @classmethod
    def default(cls) -> "GovernancePolicyRegistry":
        return cls(
            {
                DataClass.ORIGINAL_IMAGE: DataClassPolicy(
                    {"operator", "admin"},
                    "application retention policy",
                    "delete or detach according to submission lifecycle",
                    "correct through source submission workflow",
                ),
                DataClass.INTERNAL_DERIVATIVE: DataClassPolicy(
                    {"system", "admin"},
                    "short-lived analysis retention",
                    "delete after analysis or retention expiry",
                    "regenerate from corrected original",
                ),
                DataClass.EXTERNAL_DERIVATIVE: DataClassPolicy(
                    {"system", "admin"},
                    "short-lived external-call retention",
                    "delete after provider request completes",
                    "regenerate only after compliance-approved rerun",
                ),
                DataClass.IMAGE_FINGERPRINT: DataClassPolicy(
                    {"operator", "admin", "system"},
                    "evidence retention policy",
                    "delete or deactivate with source correction",
                    "deactivate stale automatic references",
                ),
                DataClass.WEB_EVIDENCE: DataClassPolicy(
                    {"operator", "admin"},
                    "evidence retention policy",
                    "delete with analysis evidence policy",
                    "supersede with new analysis run",
                ),
                DataClass.SEARCH_EVIDENCE: DataClassPolicy(
                    {"operator", "admin"},
                    "search evidence retention policy",
                    "delete with analysis evidence policy",
                    "supersede with new search-enriched analysis",
                ),
                DataClass.LLM_SUMMARY: DataClassPolicy(
                    {"operator", "admin"},
                    "operator-only summary retention policy",
                    "delete with analysis evidence policy",
                    "regenerate from corrected source evidence",
                ),
                DataClass.PROVIDER_METADATA: DataClassPolicy(
                    {"admin"},
                    "provider audit retention policy",
                    "delete with provider audit policy",
                    "correct through provider evidence review",
                ),
                DataClass.OPERATOR_NOTE: DataClassPolicy(
                    {"operator", "admin"},
                    "decision audit retention policy",
                    "delete with decision audit policy",
                    "append correction rather than overwriting audit trail",
                ),
            }
        )

    def policy_for(self, data_class: DataClass) -> DataClassPolicy:
        return self._policies[data_class]


def assert_no_biometric_template(payload: dict[str, Any]) -> None:
    forbidden_keys = {"embedding", "face_embedding", "biometric_template"}
    if forbidden_keys.intersection(payload):
        raise ValueError("biometric template storage is not allowed")


def assert_operator_evidence_payload_allowed(
    data_class: DataClass, payload: dict[str, Any]
) -> None:
    assert_no_biometric_template(payload)

    if data_class == DataClass.SEARCH_EVIDENCE:
        forbidden_image_keys = {
            "original_image",
            "internal_derivative",
            "external_derivative",
            "image_payload",
            "image_content",
            "derivative",
            "content",
        }
        if forbidden_image_keys.intersection(payload):
            raise ValueError("image payload cannot be stored for search evidence")

    if data_class == DataClass.LLM_SUMMARY:
        source_urls = payload.get("source_urls") or []
        source_evidence_ids = payload.get("source_evidence_ids") or []
        if not source_urls and not source_evidence_ids:
            raise ValueError("LLM summary must reference source evidence")
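The payload guards above rely on `set.intersection` over a dict, which iterates the dict's keys. A minimal sketch of that check follows, assuming the same three forbidden key names from the diff and a hypothetical wrapper `is_allowed` that turns the exception into a boolean for demonstration.

```python
from typing import Any

# Key names taken from assert_no_biometric_template in the diff above.
FORBIDDEN_KEYS = {"embedding", "face_embedding", "biometric_template"}


def check_payload(payload: dict[str, Any]) -> None:
    # Intersecting a set with a dict compares against the dict's keys only.
    if FORBIDDEN_KEYS.intersection(payload):
        raise ValueError("biometric template storage is not allowed")


def is_allowed(payload: dict[str, Any]) -> bool:
    """Hypothetical convenience wrapper: True when the guard does not raise."""
    try:
        check_payload(payload)
    except ValueError:
        return False
    return True


print(is_allowed({"result_url": "https://example.com/a"}))
print(is_allowed({"face_embedding": [0.1, 0.2]}))
print(is_allowed({"meta": {"embedding": [0.1]}}))
```

Note that a check of this shape inspects top-level keys only: the third payload, where `embedding` is nested one level down, passes in this sketch. Whether nested payloads need a recursive walk depends on how callers build them, which this diff does not show.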
3
src/rights_filter/governance/retention_policy.py
Normal file
@@ -0,0 +1,3 @@
from rights_filter.governance.policies import DataClassPolicy, GovernancePolicyRegistry

__all__ = ["DataClassPolicy", "GovernancePolicyRegistry"]
1
src/rights_filter/integrations/__init__.py
Normal file
@@ -0,0 +1 @@
"""External integration adapters."""
396
src/rights_filter/integrations/cloud_vision_web_detection.py
Normal file
@@ -0,0 +1,396 @@
from __future__ import annotations

import base64
from typing import Any
from urllib.parse import quote_plus

from rights_filter.analysis.preprocessing import ImagePayload
from rights_filter.domain.records import Evidence, EvidenceSource
from rights_filter.integrations.external_policy import ExternalApiPolicy
from rights_filter.integrations.http_json import UrllibJsonTransport


class FakeWebDetectionClient:
    def __init__(self, response: dict[str, Any] | None = None) -> None:
        self.response = response or {}
        self.calls: list[ImagePayload] = []

    def detect_web(self, image: ImagePayload) -> dict[str, Any]:
        self.calls.append(image)
        return self.response


class GoogleVisionRestClient:
    endpoint = "https://vision.googleapis.com/v1/images:annotate"

    def __init__(
        self,
        api_key: str,
        transport: Any | None = None,
        parent: str | None = None,
        max_results: int = 10,
        timeout: int = 15,
    ) -> None:
        self.api_key = api_key
        self.transport = transport or UrllibJsonTransport()
        self.parent = parent
        self.max_results = max_results
        self.timeout = timeout

    def detect_web(self, image: ImagePayload) -> dict[str, Any]:
        request: dict[str, Any] = {
            "image": {
                "content": base64.b64encode(image.content).decode("ascii"),
            },
            "features": [
                {
                    "type": "WEB_DETECTION",
                    "maxResults": self.max_results,
                }
            ],
        }
        payload: dict[str, Any] = {"requests": [request]}
        if self.parent:
            payload["parent"] = self.parent
        response = self.transport.request_json(
            "POST",
            f"{self.endpoint}?key={quote_plus(self.api_key)}",
            payload=payload,
            timeout=self.timeout,
        )
        web_detection = (response.get("responses") or [{}])[0].get("webDetection", {})
        return {
            "web_entities": [
                {
                    "description": item.get("description", ""),
                    "score": item.get("score", 0),
                    "entity_id": item.get("entityId", ""),
                }
                for item in web_detection.get("webEntities", [])
            ],
            "full_matching_images": [
                _web_image_result(item)
                for item in web_detection.get("fullMatchingImages", [])
            ],
            "partial_matching_images": [
                _web_image_result(item)
                for item in web_detection.get("partialMatchingImages", [])
            ],
            "visually_similar_images": [
                _web_image_result(item)
                for item in web_detection.get("visuallySimilarImages", [])
            ],
            "pages_with_matching_images": [
                _web_page_result(item)
                for item in web_detection.get("pagesWithMatchingImages", [])
            ],
            "best_guess_labels": web_detection.get("bestGuessLabels", []),
        }


class CloudVisionWebDetectionAdapter:
    def __init__(self, client: Any) -> None:
        self.client = client

    def detect(
        self, submission_id: str, derivative: ImagePayload, policy: ExternalApiPolicy
    ) -> list[Evidence]:
        allowed, reason = policy.can_call()
        if not allowed:
            return [
                Evidence(
                    source=EvidenceSource.EXTERNAL_SKIPPED,
                    reason=reason or "external API skipped",
                    confidence=1.0,
                    data={"submission_id": submission_id},
                )
            ]

        try:
            response = self.client.detect_web(derivative)
        except Exception as exc:  # pragma: no cover - defensive adapter boundary
            return [
                Evidence(
                    source=EvidenceSource.FAILURE,
                    reason=f"External API failed: {exc}",
                    confidence=1.0,
                    data={"submission_id": submission_id},
                )
            ]

        # Record the quota debit only after the request actually succeeds, so a
        # failed/timed-out call does not permanently consume the daily limit.
        policy.record_call()
        return _map_response(submission_id, response)


def _map_response(submission_id: str, response: dict[str, Any]) -> list[Evidence]:
    evidence: list[Evidence] = []

    for entity in response.get("web_entities", []):
        description = entity.get("description", "")
        # `or 0` guards an explicit JSON null (which float(None) would crash on);
        # clamp to [0, 1] because web-entity relevance scores are unbounded.
        score = max(0.0, min(1.0, float(entity.get("score", 0) or 0)))
        evidence.append(
            Evidence(
                source=EvidenceSource.WEB_DETECTION,
                reason=f"Web entity matched {description}",
                confidence=score,
                data={
                    "submission_id": submission_id,
                    "entity": description,
                    "entity_id": entity.get("entity_id", ""),
                    "score": score,
                    "provider": "google",
                    "match": "entity",
                },
            )
        )

    evidence.extend(
        _image_match_evidence(
            submission_id,
            response.get("full_matching_images", []),
            match="full",
            reason="Google full image match found",
            fallback_confidence=0.9,
        )
    )
    evidence.extend(
        _image_match_evidence(
            submission_id,
            response.get("partial_matching_images", []),
            match="partial",
            reason="Google partial image match found",
            fallback_confidence=0.75,
        )
    )
    evidence.extend(
        _image_match_evidence(
            submission_id,
            response.get("visually_similar_images", []),
            match="visual",
            reason="Google visually similar image found",
            fallback_confidence=0.55,
        )
    )

    for page in response.get("pages_with_matching_images", []):
        url = page.get("url", "")
        score = float(page.get("score", 0) or 0)
        evidence.append(
            Evidence(
                source=EvidenceSource.WEB_DETECTION,
                reason="Google page with matching image found",
                confidence=score or 0.75,
                data={
                    "submission_id": submission_id,
                    "url": url,
                    "result_url": url,
                    "page_title": page.get("page_title", ""),
                    "page_image_urls": _page_image_urls_from_page(page),
                    "score": score,
                    "match": "page",
                    "provider": "google",
                },
            )
        )

    for label in response.get("best_guess_labels", []):
        text = label.get("label", "")
        evidence.append(
            Evidence(
                source=EvidenceSource.WEB_DETECTION,
                reason=f"Google weak label {text}",
                confidence=0.0,
                data={
                    "submission_id": submission_id,
                    "label": text,
                    "weak_hint": True,
                    "provider": "google",
                    "match": "weak_label",
                },
            )
        )

    return evidence


def _web_image_result(item: dict[str, Any]) -> dict[str, Any]:
    result = {
        "url": _first_image_url(
            item,
            (
                "url",
                "imageUrl",
                "image_url",
                "imageurl",
                "contentUrl",
                "content_url",
                "contenturl",
                "src",
                "source",
                "mediaUrl",
                "media_url",
                "mediaurl",
                "thumbnailUrl",
                "thumbnail_url",
                "thumbnailurl",
                "thumbnail",
            ),
        ),
        "score": item.get("score", 0),
    }
    thumbnail_url = _first_image_url(
        item,
        ("thumbnailUrl", "thumbnail_url", "thumbnailurl", "thumbnail"),
    )
    if thumbnail_url:
        result["thumbnail_url"] = thumbnail_url
    return result


def _web_page_result(item: dict[str, Any]) -> dict[str, Any]:
    return {
        "url": item.get("url", ""),
        "score": item.get("score", 0),
        "page_title": item.get("pageTitle", item.get("page_title", "")),
        "page_image_urls": _page_image_urls_from_page(item),
    }


def _page_image_urls_from_page(page: dict[str, Any]) -> list[str]:
    candidates: list[str] = []
    candidates.extend(_image_values(page.get("page_image_urls", [])))
    for key in (
        "fullMatchingImages",
        "full_matching_images",
        "partialMatchingImages",
        "partial_matching_images",
        "visuallySimilarImages",
        "visually_similar_images",
    ):
        candidates.extend(_image_values(page.get(key, [])))
    for key in (
        "contentUrl",
        "content_url",
        "imageUrl",
        "image_url",
        "thumbnail",
        "thumbnailUrl",
        "thumbnail_url",
        "urlToImage",
        "url_to_image",
    ):
        candidates.extend(_image_values(page.get(key, "")))
    return _unique_texts(candidates)


def _image_values(value: Any) -> list[str]:
    if isinstance(value, str):
        return [value]
    if isinstance(value, list):
        values: list[str] = []
        for item in value:
            values.extend(_image_values(item))
        return values
    if isinstance(value, dict):
        values: list[str] = []
        for key in (
            "contentUrl",
            "content_url",
            "contenturl",
            "imageUrl",
            "image_url",
            "imageurl",
            "mediaUrl",
            "media_url",
            "mediaurl",
            "src",
            "source",
            "thumbnailUrl",
            "thumbnail_url",
            "thumbnailurl",
            "thumbnail",
            "url",
        ):
            if value.get(key):
                values.extend(_image_values(value[key]))
        return values
    return []


def _first_image_url(item: dict[str, Any], keys: tuple[str, ...]) -> str:
    for key in keys:
        if item.get(key):
            values = _image_values(item[key])
            if values:
                return values[0]
    return ""


def _unique_texts(values: list[str]) -> list[str]:
    seen: set[str] = set()
    result: list[str] = []
    for value in values:
        text = str(value or "").strip()
        if not text or text in seen:
            continue
        seen.add(text)
        result.append(text)
    return result


def _image_match_evidence(
    submission_id: str,
    images: list[dict[str, Any]],
    match: str,
    reason: str,
    fallback_confidence: float,
) -> list[Evidence]:
    evidence: list[Evidence] = []
    for image in images:
        url = _first_image_url(
            image,
            (
                "url",
                "imageUrl",
                "image_url",
                "imageurl",
                "contentUrl",
                "content_url",
                "contenturl",
                "src",
                "source",
                "mediaUrl",
                "media_url",
                "mediaurl",
                "thumbnailUrl",
                "thumbnail_url",
                "thumbnailurl",
                "thumbnail",
            ),
        )
        thumbnail_url = _first_image_url(
            image,
            ("thumbnailUrl", "thumbnail_url", "thumbnailurl", "thumbnail"),
        )
        score = float(image.get("score", 0) or 0)
        evidence.append(
            Evidence(
                source=EvidenceSource.WEB_DETECTION,
                reason=reason,
                confidence=score or fallback_confidence,
                data={
                    "submission_id": submission_id,
                    "url": url,
                    "image_url": url,
                    "thumbnail_url": thumbnail_url,
                    "score": score,
                    "match": match,
                    "provider": "google",
                },
            )
        )
    return evidence
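The mapper in `cloud_vision_web_detection.py` normalizes provider scores in two ways: web-entity scores are guarded against a JSON null and clamped to [0, 1], while image and page matches fall back to a fixed confidence when the score is missing or zero. A small sketch combining both ideas in one helper follows. `clamp_confidence` and its `fallback` parameter are illustrative names rather than functions from this commit, and the NaN handling is an added assumption.

```python
from typing import Any


def clamp_confidence(raw: Any, fallback: float = 0.0) -> float:
    """Coerce a provider score into [0.0, 1.0], using `fallback` when it is absent."""
    try:
        # `raw or 0` turns None (an explicit JSON null) and "" into 0 before float().
        value = float(raw or 0)
    except (TypeError, ValueError):
        value = 0.0
    if value != value:  # NaN never equals itself; treat it as a missing score
        value = 0.0
    if value == 0.0:
        # A missing or zero score carries no signal, so use the per-match default.
        value = fallback
    return max(0.0, min(1.0, value))
```

For example, `clamp_confidence(1.7)` yields `1.0`, and `clamp_confidence(None, fallback=0.75)` yields `0.75`, which mirrors the `score or 0.75` fallback used for page matches.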
311
src/rights_filter/integrations/env_clients.py
Normal file
@@ -0,0 +1,311 @@
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from typing import Any, Mapping

from rights_filter.analysis.llm_assistance import InternalLlmAssistant, OllamaGenerateLlmClient
from rights_filter.integrations.cloud_vision_web_detection import (
    CloudVisionWebDetectionAdapter,
    GoogleVisionRestClient,
)
from rights_filter.integrations.external_policy import ExternalApiPolicy
from rights_filter.integrations.google_custom_search import (
    GoogleCustomSearchAdapter,
    GoogleCustomSearchClient,
)
from rights_filter.integrations.naver_search import NaverOpenApiSearchClient, NaverSearchAdapter
from rights_filter.integrations.search_policy import SearchApiPolicy


@dataclass
class ProviderRuntime:
    naver_adapter: NaverSearchAdapter | None
    search_policy: SearchApiPolicy
    google_adapter: CloudVisionWebDetectionAdapter | None
    external_policy: ExternalApiPolicy
    google_custom_search_adapter: GoogleCustomSearchAdapter | None
    google_custom_search_policy: SearchApiPolicy
    face_crop_web_detection_enabled: bool
    auto_naver_query_limit: int
    auto_naver_blog_query_limit: int
    auto_naver_web_query_limit: int
    auto_google_custom_query_limit: int
    search_result_compare_limit: int
    search_result_page_image_limit: int
    search_result_similarity_threshold: float
    llm_assistant: InternalLlmAssistant | None
    provider_payloads: dict[str, dict[str, Any]]


def build_provider_runtime(
    env: Mapping[str, str],
    transport: Any | None = None,
) -> ProviderRuntime:
    naver_adapter = _naver_adapter(env, transport)
    google_adapter = _google_adapter(env, transport)
    google_custom_search_adapter = _google_custom_search_adapter(env, transport)
    llm_assistant = _llm_assistant(env, transport)
    search_limit = _optional_int(env.get("COPYRIGHTER_NAVER_DAILY_LIMIT"))
    google_limit = _optional_int(env.get("COPYRIGHTER_GOOGLE_DAILY_LIMIT"))
    google_custom_search_limit = _optional_int(env.get("COPYRIGHTER_GOOGLE_CUSTOM_SEARCH_DAILY_LIMIT"))
    llm_limit = _optional_int(env.get("COPYRIGHTER_LLM_DAILY_LIMIT"))
    face_crop_search_enabled = _truthy(env.get("COPYRIGHTER_GOOGLE_FACE_CROP_SEARCH"))
    auto_naver_query_limit = _bounded_int(
        env.get("COPYRIGHTER_AUTO_NAVER_QUERY_LIMIT"),
        default=3,
        minimum=0,
        maximum=10,
    )
    auto_naver_blog_query_limit = _bounded_int(
        env.get("COPYRIGHTER_AUTO_NAVER_BLOG_QUERY_LIMIT"),
        default=1,
        minimum=0,
        maximum=10,
    )
    auto_naver_web_query_limit = _bounded_int(
        env.get("COPYRIGHTER_AUTO_NAVER_WEB_QUERY_LIMIT"),
        default=1,
        minimum=0,
        maximum=10,
    )
    auto_google_custom_query_limit = _bounded_int(
        env.get("COPYRIGHTER_AUTO_GOOGLE_CUSTOM_QUERY_LIMIT"),
        default=2,
        minimum=0,
        maximum=10,
    )
    search_result_compare_limit = _bounded_int(
        env.get("COPYRIGHTER_SEARCH_RESULT_COMPARE_LIMIT"),
        default=3,
        minimum=0,
        maximum=20,
    )
    search_result_page_image_limit = _bounded_int(
        env.get("COPYRIGHTER_SEARCH_RESULT_PAGE_IMAGE_LIMIT"),
        default=3,
        minimum=0,
        maximum=10,
    )
    search_result_similarity_threshold = _bounded_float(
        env.get("COPYRIGHTER_SEARCH_RESULT_SIMILARITY_THRESHOLD"),
        default=0.9,
        minimum=0.0,
        maximum=1.0,
    )

    return ProviderRuntime(
        naver_adapter=naver_adapter,
        search_policy=SearchApiPolicy(
            disabled=naver_adapter is None,
            compliance_approved=naver_adapter is not None,
            daily_limit=search_limit,
        ),
        google_adapter=google_adapter,
        external_policy=ExternalApiPolicy(
            disabled=google_adapter is None,
            compliance_approved=google_adapter is not None,
            metadata_logging_accepted=google_adapter is not None,
            allow_online_sync=google_adapter is not None,
            daily_limit=google_limit,
        ),
        google_custom_search_adapter=google_custom_search_adapter,
        google_custom_search_policy=SearchApiPolicy(
            disabled=google_custom_search_adapter is None,
            compliance_approved=google_custom_search_adapter is not None,
            allowed_providers={"google_custom_search"},
            daily_limit=google_custom_search_limit,
        ),
        face_crop_web_detection_enabled=face_crop_search_enabled,
        auto_naver_query_limit=auto_naver_query_limit,
        auto_naver_blog_query_limit=auto_naver_blog_query_limit,
        auto_naver_web_query_limit=auto_naver_web_query_limit,
        auto_google_custom_query_limit=auto_google_custom_query_limit,
        search_result_compare_limit=search_result_compare_limit,
        search_result_page_image_limit=search_result_page_image_limit,
        search_result_similarity_threshold=search_result_similarity_threshold,
        llm_assistant=llm_assistant,
        # The Korean literals below are operator-facing GUI strings and are kept
        # verbatim; "없음" means "none" wherever it appears.
        provider_payloads={
            "internal": _provider_payload(
                "internal",
                "Internal analysis",
                True,
                "local only",
                1000,
                "없음",
                # "uses only local fingerprints, the knowledge DB, and face/person presence signals"
                "로컬 지문, 지식 DB, 얼굴/사람 존재 신호만 사용",
                env=env,
            ),
            "naver": _provider_payload(
                "naver",
                "Naver search",
                naver_adapter is not None,
                "text-query API configured" if naver_adapter else "text-query client not configured",
                search_limit,
                "없음" if naver_adapter else "missing NAVER_CLIENT_ID or NAVER_CLIENT_SECRET",
                # "text queries only; sending images is prohibited"
                "텍스트 쿼리만 허용; 이미지 전송 금지",
                required_env=["NAVER_CLIENT_ID", "NAVER_CLIENT_SECRET"],
                env=env,
            ),
            "google": _provider_payload(
                "google",
                "Google Web Detection",
                google_adapter is not None,
                "REST images:annotate configured" if google_adapter else "client not configured",
                google_limit,
                "없음" if google_adapter else "missing GOOGLE_CLOUD_VISION_API_KEY",
                # "only approved derivative images are allowed"
                "승인된 파생 이미지만 허용",
                required_env=["GOOGLE_CLOUD_VISION_API_KEY"],
                env=env,
            ),
            "google_search": _provider_payload(
                "google_search",
                "Google Custom Search",
                google_custom_search_adapter is not None,
                "Programmable Search configured" if google_custom_search_adapter else "client not configured",
                google_custom_search_limit,
                "없음" if google_custom_search_adapter else "missing GOOGLE_CUSTOM_SEARCH_API_KEY or GOOGLE_CUSTOM_SEARCH_CX",
                "text-query web/image search; no submitted image upload",
                required_env=["GOOGLE_CUSTOM_SEARCH_API_KEY", "GOOGLE_CUSTOM_SEARCH_CX"],
                env=env,
            ),
            "llm": _provider_payload(
                "llm",
                "Ollama local LLM",
                llm_assistant is not None,
                f"Ollama local API configured ({env.get('OLLAMA_MODEL', 'qwen2.5:0.5b-instruct')})",
                llm_limit,
                "없음",
                "source-linked summary only",
                required_env=["OLLAMA_BASE_URL", "OLLAMA_MODEL"],
                env=env,
            ),
        },
    )


def _naver_adapter(env: Mapping[str, str], transport: Any | None) -> NaverSearchAdapter | None:
    client_id = env.get("NAVER_CLIENT_ID")
    client_secret = env.get("NAVER_CLIENT_SECRET")
    if not client_id or not client_secret:
        return None
    client = NaverOpenApiSearchClient(
        client_id=client_id,
        client_secret=client_secret,
        transport=transport,
        display=_optional_int(env.get("NAVER_SEARCH_DISPLAY")) or 10,
        sort=env.get("NAVER_SEARCH_SORT", "sim"),
        image_pages=_bounded_int(env.get("NAVER_SEARCH_PAGES"), default=1, minimum=1, maximum=10),
        blog_display=_optional_int(env.get("NAVER_BLOG_SEARCH_DISPLAY")) or 3,
        blog_sort=env.get("NAVER_BLOG_SEARCH_SORT", "sim"),
        blog_pages=_bounded_int(env.get("NAVER_BLOG_SEARCH_PAGES"), default=1, minimum=1, maximum=10),
        web_display=_optional_int(env.get("NAVER_WEB_SEARCH_DISPLAY")) or 3,
        web_pages=_bounded_int(env.get("NAVER_WEB_SEARCH_PAGES"), default=1, minimum=1, maximum=10),
    )
    return NaverSearchAdapter(client)


def _google_adapter(env: Mapping[str, str], transport: Any | None) -> CloudVisionWebDetectionAdapter | None:
    api_key = env.get("GOOGLE_CLOUD_VISION_API_KEY")
    if not api_key:
        return None
    client = GoogleVisionRestClient(
        api_key=api_key,
        transport=transport,
        parent=env.get("GOOGLE_CLOUD_VISION_PARENT"),
    )
    return CloudVisionWebDetectionAdapter(client)


def _google_custom_search_adapter(
    env: Mapping[str, str], transport: Any | None
) -> GoogleCustomSearchAdapter | None:
    api_key = env.get("GOOGLE_CUSTOM_SEARCH_API_KEY")
    cx = env.get("GOOGLE_CUSTOM_SEARCH_CX")
    if not api_key or not cx:
        return None
    client = GoogleCustomSearchClient(
        api_key=api_key,
        cx=cx,
        transport=transport,
        image_num=_bounded_int(env.get("GOOGLE_CUSTOM_SEARCH_IMAGE_RESULTS"), default=3, minimum=1, maximum=10),
        web_num=_bounded_int(env.get("GOOGLE_CUSTOM_SEARCH_WEB_RESULTS"), default=3, minimum=1, maximum=10),
        image_pages=_bounded_int(env.get("GOOGLE_CUSTOM_SEARCH_IMAGE_PAGES"), default=1, minimum=1, maximum=10),
        web_pages=_bounded_int(env.get("GOOGLE_CUSTOM_SEARCH_WEB_PAGES"), default=1, minimum=1, maximum=10),
    )
    return GoogleCustomSearchAdapter(client)


def _llm_assistant(env: Mapping[str, str], transport: Any | None) -> InternalLlmAssistant | None:
    client = OllamaGenerateLlmClient(
        base_url=env.get("OLLAMA_BASE_URL", "http://127.0.0.1:11434"),
        model=env.get("OLLAMA_MODEL", "qwen2.5:0.5b-instruct"),
        transport=transport,
    )
    return InternalLlmAssistant(client)


def _provider_payload(
    provider_id: str,
    name: str,
    enabled: bool,
    compliance: str,
    quota: int | None,
    last_failure: str,
    boundary: str,
    required_env: list[str] | None = None,
    env: Mapping[str, str] | None = None,
) -> dict[str, Any]:
    required_env = required_env or []
    env = env or {}
    return {
        "id": provider_id,
        "name": name,
        "enabled": enabled,
        "compliance": compliance,
        "usage": 0,
        "quota": quota,
        "lastSuccess": "없음",  # operator-facing GUI string, Korean for "none"
        "lastFailure": last_failure,
        "boundary": boundary,
        "adminOnly": True,
        "requiredEnv": required_env,
        "configuredEnv": {key: bool(env.get(key)) for key in required_env},
    }


def _optional_int(value: str | None) -> int | None:
    if not value:
        return None
    try:
        return int(value)
    except (ValueError, TypeError):
        return None


def _bounded_int(value: str | None, default: int, minimum: int, maximum: int) -> int:
    if not value:
        return default
    try:
        parsed = int(value)
    except (ValueError, TypeError):
        return default
    return max(minimum, min(maximum, parsed))


def _bounded_float(value: str | None, default: float, minimum: float, maximum: float) -> float:
    if not value:
        return default
    try:
        parsed = float(value)
    except (ValueError, TypeError):
        return default
    return max(minimum, min(maximum, parsed))


def _truthy(value: str | None) -> bool:
    return str(value or "").strip().lower() in {"1", "true", "yes", "y", "on"}


def _now_label() -> str:
    return datetime.now().strftime("%Y-%m-%d %H:%M:%S")
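The helpers at the end of `env_clients.py` make startup tolerant of bad configuration: an unset or malformed numeric variable falls back to its default, and a parsed value is clamped into its allowed range. A minimal standalone sketch of that behavior follows. `bounded_int` mirrors `_bounded_int` from the diff, while `load_limits` and the shape of the dict it returns are illustrative assumptions.

```python
from typing import Mapping, Optional


def bounded_int(value: Optional[str], default: int, minimum: int, maximum: int) -> int:
    """Parse an integer setting, falling back to `default` and clamping to the range."""
    if not value:
        # Unset or empty variables use the default rather than failing.
        return default
    try:
        parsed = int(value)
    except (TypeError, ValueError):
        # Malformed input (for example "abc") also degrades to the default.
        return default
    return max(minimum, min(maximum, parsed))


def load_limits(env: Mapping[str, str]) -> dict[str, int]:
    # Variable names come from the diff; the returned dict shape is illustrative.
    return {
        "naver_queries": bounded_int(env.get("COPYRIGHTER_AUTO_NAVER_QUERY_LIMIT"), 3, 0, 10),
        "compare": bounded_int(env.get("COPYRIGHTER_SEARCH_RESULT_COMPARE_LIMIT"), 3, 0, 20),
    }
```

Note that the string `"0"` is non-empty, so it parses to `0` rather than the default; with a minimum of 0 that gives operators a way to switch a query type off.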
29
src/rights_filter/integrations/external_policy.py
Normal file
@@ -0,0 +1,29 @@
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ExternalApiPolicy:
    disabled: bool = False
    compliance_approved: bool = False
    metadata_logging_accepted: bool = False
    allow_online_sync: bool = False
    daily_limit: int | None = None
    calls_made: int = 0

    def can_call(self) -> tuple[bool, str | None]:
        if self.disabled:
            return False, "external API disabled"
        if not self.compliance_approved:
            return False, "external API compliance not approved"
        if not self.metadata_logging_accepted:
            return False, "metadata logging not accepted"
        if not self.allow_online_sync:
            return False, "online synchronous mode not allowed"
        if self.daily_limit is not None and self.calls_made >= self.daily_limit:
            return False, "external API usage limit reached"
        return True, None

    def record_call(self) -> None:
        self.calls_made += 1
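`ExternalApiPolicy` separates the permission check (`can_call`) from the quota debit (`record_call`), which lets an adapter charge quota only after a request has succeeded. The sketch below shows that ordering in isolation. `QuotaPolicy` is a hypothetical, reduced form of the policy (quota check only), and `guarded_call` with its status dicts is an assumption for illustration; the real adapters return `Evidence` records instead.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class QuotaPolicy:  # hypothetical, reduced form of ExternalApiPolicy
    daily_limit: Optional[int] = None
    calls_made: int = 0

    def can_call(self) -> tuple[bool, Optional[str]]:
        if self.daily_limit is not None and self.calls_made >= self.daily_limit:
            return False, "external API usage limit reached"
        return True, None

    def record_call(self) -> None:
        self.calls_made += 1


def guarded_call(policy: QuotaPolicy, request: Callable[[], Any]) -> dict[str, Any]:
    """Run `request` behind the policy and debit quota only if it returns."""
    allowed, reason = policy.can_call()
    if not allowed:
        return {"status": "skipped", "reason": reason}
    try:
        response = request()
    except Exception as exc:  # adapter boundary: degrade to a result, do not raise
        return {"status": "failed", "reason": str(exc)}
    # Reached only on success, so a failed or timed-out call costs no quota.
    policy.record_call()
    return {"status": "ok", "response": response}
```

With `daily_limit=1`, a failing request leaves `calls_made` at 0, the next successful request raises it to 1, and any further request is skipped with the limit message.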
748
src/rights_filter/integrations/google_custom_search.py
Normal file
@@ -0,0 +1,748 @@
from __future__ import annotations

import html
import re
from collections.abc import Iterable
from datetime import UTC, datetime
from typing import Any
from urllib.parse import parse_qsl, unquote, urlencode, urljoin, urlparse

from rights_filter.domain.records import DataClass, Evidence, EvidenceSource
from rights_filter.governance.policies import assert_operator_evidence_payload_allowed
from rights_filter.integrations.http_json import UrllibJsonTransport
from rights_filter.integrations.search_policy import SearchApiPolicy


class GoogleCustomSearchClient:
    endpoint = "https://www.googleapis.com/customsearch/v1"

    def __init__(
        self,
        api_key: str,
        cx: str,
        transport: Any | None = None,
        image_num: int = 3,
        web_num: int = 3,
        image_pages: int = 1,
        web_pages: int = 1,
        timeout: int = 10,
    ) -> None:
        self.api_key = api_key
        self.cx = cx
        self.transport = transport or UrllibJsonTransport()
        self.image_num = min(10, max(1, image_num))
        self.web_num = min(10, max(1, web_num))
        self.image_pages = max(1, image_pages)
        self.web_pages = max(1, web_pages)
        self.timeout = timeout

    def search_image(self, query: str) -> dict[str, Any]:
        return self._search_pages(
            query,
            num=self.image_num,
            pages=self.image_pages,
            extra_params={"searchType": "image"},
        )

    def search_web(self, query: str) -> dict[str, Any]:
        return self._search_pages(
            query,
            num=self.web_num,
            pages=self.web_pages,
            extra_params={},
        )

    def _search_pages(
        self,
        query: str,
        num: int,
        pages: int,
        extra_params: dict[str, Any],
    ) -> dict[str, Any]:
        merged: dict[str, Any] = {}
        items: list[Any] = []
        for page_index in range(max(1, pages)):
            params = {
                "key": self.api_key,
                "cx": self.cx,
                "q": query,
                "num": num,
                "start": 1 + (page_index * num),
                **extra_params,
            }
            response = self.transport.request_json(
                "GET",
                f"{self.endpoint}?{urlencode(params)}",
                timeout=self.timeout,
            )
            if not merged:
                merged = dict(response)
            page_items = response.get("items", [])
            if not page_items:
                break
            items.extend(page_items)
        merged["items"] = items
        return merged


|
class GoogleCustomSearchAdapter:
|
||||||
|
provider = "google_custom_search"
|
||||||
|
|
||||||
|
def __init__(self, client: Any) -> None:
|
||||||
|
self.client = client
|
||||||
|
|
||||||
|
def search_images(
|
||||||
|
self, submission_id: str, query: str, policy: SearchApiPolicy
|
||||||
|
) -> list[Evidence]:
|
||||||
|
if not isinstance(query, str):
|
||||||
|
raise ValueError("Google custom search requires a text query")
|
||||||
|
|
||||||
|
requested_calls = _requested_calls(self.client, "image_pages")
|
||||||
|
allowed, reason = policy.can_call(self.provider, requested_calls=requested_calls)
|
||||||
|
if not allowed:
|
||||||
|
return [
|
||||||
|
Evidence(
|
||||||
|
source=EvidenceSource.SEARCH_SKIPPED,
|
||||||
|
reason=reason or "search API skipped",
|
||||||
|
confidence=1.0,
|
||||||
|
data={
|
||||||
|
"submission_id": submission_id,
|
||||||
|
"provider": self.provider,
|
||||||
|
"search_type": "image",
|
||||||
|
"query": query,
|
||||||
|
"query_signature": _image_query_signature(query),
|
||||||
|
},
|
||||||
|
)
|
||||||
|
]
|
||||||
|
|
||||||
|
        try:
            response = self.client.search_image(query)
            # Consume quota only once the request itself has succeeded.
            policy.record_call(requested_calls)
            # Map inside the failure boundary so a malformed response degrades
            # to ENRICHMENT_FAILURE evidence instead of crashing enrichment.
            return _map_image_response(submission_id, query, response)
        except Exception as exc:  # pragma: no cover - defensive adapter boundary
            return [
                Evidence(
                    source=EvidenceSource.ENRICHMENT_FAILURE,
                    reason=f"Google custom image search failed: {exc}",
                    confidence=1.0,
                    data={
                        "submission_id": submission_id,
                        "provider": self.provider,
                        "search_type": "image",
                        "query": query,
                        "query_signature": _image_query_signature(query),
                    },
                )
            ]
|
||||||
|
|
||||||
|
def search_web_pages(
|
||||||
|
self, submission_id: str, query: str, policy: SearchApiPolicy
|
||||||
|
) -> list[Evidence]:
|
||||||
|
if not isinstance(query, str):
|
||||||
|
raise ValueError("Google custom search requires a text query")
|
||||||
|
|
||||||
|
requested_calls = _requested_calls(self.client, "web_pages")
|
||||||
|
allowed, reason = policy.can_call(self.provider, requested_calls=requested_calls)
|
||||||
|
if not allowed:
|
||||||
|
return [
|
||||||
|
Evidence(
|
||||||
|
source=EvidenceSource.SEARCH_SKIPPED,
|
||||||
|
reason=reason or "search API skipped",
|
||||||
|
confidence=1.0,
|
||||||
|
data={
|
||||||
|
"submission_id": submission_id,
|
||||||
|
"provider": self.provider,
|
||||||
|
"search_type": "web",
|
||||||
|
"query": query,
|
||||||
|
"query_signature": _web_query_signature(query),
|
||||||
|
},
|
||||||
|
)
|
||||||
|
]
|
||||||
|
|
||||||
|
        try:
            response = self.client.search_web(query)
            # Consume quota only once the request itself has succeeded.
            policy.record_call(requested_calls)
            # Map inside the failure boundary so a malformed response degrades
            # to ENRICHMENT_FAILURE evidence instead of crashing enrichment.
            return _map_web_response(submission_id, query, response)
        except Exception as exc:  # pragma: no cover - defensive adapter boundary
            return [
                Evidence(
                    source=EvidenceSource.ENRICHMENT_FAILURE,
                    reason=f"Google custom web search failed: {exc}",
                    confidence=1.0,
                    data={
                        "submission_id": submission_id,
                        "provider": self.provider,
                        "search_type": "web",
                        "query": query,
                        "query_signature": _web_query_signature(query),
                    },
                )
            ]
|
||||||
|
|
||||||
|
|
||||||
|
def _map_image_response(
|
||||||
|
submission_id: str, query: str, response: dict[str, Any]
|
||||||
|
) -> list[Evidence]:
|
||||||
|
items = response.get("items", [])
|
||||||
|
if not items:
|
||||||
|
return [
|
||||||
|
Evidence(
|
||||||
|
source=EvidenceSource.WEB_DETECTION,
|
||||||
|
reason="Google custom image search returned no results",
|
||||||
|
confidence=0.0,
|
||||||
|
data={
|
||||||
|
"submission_id": submission_id,
|
||||||
|
"provider": "google_custom_search",
|
||||||
|
"search_type": "image",
|
||||||
|
"query": query,
|
||||||
|
"query_signature": _image_query_signature(query),
|
||||||
|
"retrieved_at": _now_iso(),
|
||||||
|
},
|
||||||
|
)
|
||||||
|
]
|
||||||
|
|
||||||
|
evidence: list[Evidence] = []
|
||||||
|
for rank, item in enumerate(items, start=1):
|
||||||
|
image = item.get("image", {}) or {}
|
||||||
|
image_url = _normalize_image_url(str(item.get("link", "")), "")
|
||||||
|
thumbnail_url = _normalize_image_url(str(image.get("thumbnailLink", "")), "")
|
||||||
|
result_url = _normalize_result_url(image.get("contextLink", "")) or image_url
|
||||||
|
payload = {
|
||||||
|
"submission_id": submission_id,
|
||||||
|
"provider": "google_custom_search",
|
||||||
|
"search_type": "image",
|
||||||
|
"query": query,
|
||||||
|
"query_signature": _image_query_signature(query),
|
||||||
|
"rank": rank,
|
||||||
|
"title": item.get("title", ""),
|
||||||
|
"description": item.get("snippet", ""),
|
||||||
|
"image_url": image_url,
|
||||||
|
"thumbnail_url": thumbnail_url,
|
||||||
|
"result_url": result_url,
|
||||||
|
"domain": item.get("displayLink", ""),
|
||||||
|
"page_image_urls": _page_image_urls_from_pagemap(
|
||||||
|
item.get("pagemap", {}) or {},
|
||||||
|
str(result_url),
|
||||||
|
),
|
||||||
|
"height": image.get("height"),
|
||||||
|
"width": image.get("width"),
|
||||||
|
"match": "search_image",
|
||||||
|
"retrieved_at": _now_iso(),
|
||||||
|
}
|
||||||
|
assert_operator_evidence_payload_allowed(DataClass.SEARCH_EVIDENCE, payload)
|
||||||
|
evidence.append(
|
||||||
|
Evidence(
|
||||||
|
source=EvidenceSource.WEB_DETECTION,
|
||||||
|
reason="Google custom image search result found",
|
||||||
|
confidence=0.45,
|
||||||
|
data=payload,
|
||||||
|
)
|
||||||
|
)
|
||||||
|
return evidence
|
||||||
|
|
||||||
|
|
||||||
|
def _map_web_response(
|
||||||
|
submission_id: str, query: str, response: dict[str, Any]
|
||||||
|
) -> list[Evidence]:
|
||||||
|
items = response.get("items", [])
|
||||||
|
if not items:
|
||||||
|
return [
|
||||||
|
Evidence(
|
||||||
|
source=EvidenceSource.WEB_DETECTION,
|
||||||
|
reason="Google custom web search returned no results",
|
||||||
|
confidence=0.0,
|
||||||
|
data={
|
||||||
|
"submission_id": submission_id,
|
||||||
|
"provider": "google_custom_search",
|
||||||
|
"search_type": "web",
|
||||||
|
"query": query,
|
||||||
|
"query_signature": _web_query_signature(query),
|
||||||
|
"retrieved_at": _now_iso(),
|
||||||
|
},
|
||||||
|
)
|
||||||
|
]
|
||||||
|
|
||||||
|
evidence: list[Evidence] = []
|
||||||
|
for rank, item in enumerate(items, start=1):
|
||||||
|
result_url = _normalize_result_url(item.get("link", ""))
|
||||||
|
payload = {
|
||||||
|
"submission_id": submission_id,
|
||||||
|
"provider": "google_custom_search",
|
||||||
|
"search_type": "web",
|
||||||
|
"query": query,
|
||||||
|
"query_signature": _web_query_signature(query),
|
||||||
|
"rank": rank,
|
||||||
|
"title": item.get("title", ""),
|
||||||
|
"description": item.get("snippet", ""),
|
||||||
|
"result_url": result_url,
|
||||||
|
"domain": item.get("displayLink", ""),
|
||||||
|
"page_image_urls": _page_image_urls_from_pagemap(
|
||||||
|
item.get("pagemap", {}) or {},
|
||||||
|
result_url,
|
||||||
|
),
|
||||||
|
"match": "page",
|
||||||
|
"retrieved_at": _now_iso(),
|
||||||
|
}
|
||||||
|
assert_operator_evidence_payload_allowed(DataClass.SEARCH_EVIDENCE, payload)
|
||||||
|
evidence.append(
|
||||||
|
Evidence(
|
||||||
|
source=EvidenceSource.WEB_DETECTION,
|
||||||
|
reason="Google custom web search result found",
|
||||||
|
confidence=0.35,
|
||||||
|
data=payload,
|
||||||
|
)
|
||||||
|
)
|
||||||
|
return evidence
|
||||||
|
|
||||||
|
|
||||||
|
def _page_image_urls_from_pagemap(pagemap: dict[str, Any], base_url: str = "") -> list[str]:
|
||||||
|
candidates: list[str] = []
|
||||||
|
for item in pagemap.get("cse_image", []) or []:
|
||||||
|
if item.get("src"):
|
||||||
|
candidates.append(str(item["src"]))
|
||||||
|
for item in pagemap.get("cse_thumbnail", []) or []:
|
||||||
|
candidates.extend(_image_values(item, ("src", "url")))
|
||||||
|
for item in pagemap.get("imageobject", []) or []:
|
||||||
|
candidates.extend(
|
||||||
|
_image_values(
|
||||||
|
item,
|
||||||
|
(
|
||||||
|
"url",
|
||||||
|
"contenturl",
|
||||||
|
"contentUrl",
|
||||||
|
"thumbnailurl",
|
||||||
|
"thumbnailUrl",
|
||||||
|
"src",
|
||||||
|
),
|
||||||
|
)
|
||||||
|
)
|
||||||
|
for item in pagemap.get("thumbnail", []) or []:
|
||||||
|
candidates.extend(_image_values(item, ("src", "url")))
|
||||||
|
for item in pagemap.get("metatags", []) or []:
|
||||||
|
for key in (
|
||||||
|
"og:image",
|
||||||
|
"og:image:url",
|
||||||
|
"og:image:secure_url",
|
||||||
|
"twitter:image",
|
||||||
|
"twitter:image:src",
|
||||||
|
"twitter:image:url",
|
||||||
|
):
|
||||||
|
if item.get(key):
|
||||||
|
candidates.append(str(item[key]))
|
||||||
|
candidates.extend(_generic_pagemap_image_values(pagemap))
|
||||||
|
return _unique_texts(
|
||||||
|
_normalize_image_url(candidate, base_url)
|
||||||
|
for candidate in _expanded_image_candidates(candidates)
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def _image_values(item: Any, keys: tuple[str, ...]) -> list[str]:
|
||||||
|
if not isinstance(item, dict):
|
||||||
|
return []
|
||||||
|
return [str(item[key]) for key in keys if item.get(key)]
|
||||||
|
|
||||||
|
|
||||||
|
def _generic_pagemap_image_values(pagemap: dict[str, Any]) -> list[str]:
|
||||||
|
candidates: list[str] = []
|
||||||
|
image_context_keys = {
|
||||||
|
"associatedmedia",
|
||||||
|
"contenturl",
|
||||||
|
"image",
|
||||||
|
"logo",
|
||||||
|
"mediaurl",
|
||||||
|
"photo",
|
||||||
|
"picture",
|
||||||
|
"primaryimageofpage",
|
||||||
|
"thumbnail",
|
||||||
|
"thumbnailurl",
|
||||||
|
}
|
||||||
|
image_value_keys = {
|
||||||
|
"content_url",
|
||||||
|
"contenturl",
|
||||||
|
"content_src",
|
||||||
|
"contentsrc",
|
||||||
|
"download_url",
|
||||||
|
"downloadurl",
|
||||||
|
"file_url",
|
||||||
|
"fileurl",
|
||||||
|
"href",
|
||||||
|
"image_url",
|
||||||
|
"imageurl",
|
||||||
|
"media_url",
|
||||||
|
"mediaurl",
|
||||||
|
"original_url",
|
||||||
|
"originalurl",
|
||||||
|
"public_url",
|
||||||
|
"publicurl",
|
||||||
|
"secure_url",
|
||||||
|
"secureurl",
|
||||||
|
"src_set",
|
||||||
|
"srcset",
|
||||||
|
"src",
|
||||||
|
"thumbnail_url",
|
||||||
|
"thumbnailurl",
|
||||||
|
"url",
|
||||||
|
}
|
||||||
|
direct_image_value_keys = image_context_keys | image_value_keys | {
|
||||||
|
"og_image",
|
||||||
|
"og_image_url",
|
||||||
|
"og_image_secure_url",
|
||||||
|
"twitter_image",
|
||||||
|
"twitter_image_src",
|
||||||
|
"twitter_image_url",
|
||||||
|
}
|
||||||
|
|
||||||
|
def normalized_key(value: Any) -> str:
|
||||||
|
return str(value).lower().replace("-", "_").replace(":", "_")
|
||||||
|
|
||||||
|
def is_image_context(key: Any) -> bool:
|
||||||
|
normalized = normalized_key(key)
|
||||||
|
return normalized in image_context_keys or any(
|
||||||
|
token in normalized
|
||||||
|
for token in ("image", "thumbnail", "photo", "picture", "poster")
|
||||||
|
)
|
||||||
|
|
||||||
|
def is_text_or_text_list(value: Any) -> bool:
|
||||||
|
if isinstance(value, str):
|
||||||
|
return True
|
||||||
|
return isinstance(value, list) and all(isinstance(item, str) for item in value)
|
||||||
|
|
||||||
|
def is_direct_image_value_key(key: Any) -> bool:
|
||||||
|
normalized = normalized_key(key)
|
||||||
|
if normalized in direct_image_value_keys:
|
||||||
|
return True
|
||||||
|
return any(
|
||||||
|
normalized.endswith(f"_{suffix}")
|
||||||
|
for suffix in ("image", "thumbnail", "photo", "picture", "poster", "logo")
|
||||||
|
)
|
||||||
|
|
||||||
|
def visit(value: Any, in_image_context: bool = False, collect_strings: bool = False) -> None:
|
||||||
|
if isinstance(value, str):
|
||||||
|
if collect_strings:
|
||||||
|
candidates.append(value)
|
||||||
|
return
|
||||||
|
if isinstance(value, list):
|
||||||
|
for item in value:
|
||||||
|
visit(item, in_image_context, collect_strings)
|
||||||
|
return
|
||||||
|
if not isinstance(value, dict):
|
||||||
|
return
|
||||||
|
for key, child in value.items():
|
||||||
|
key_text = normalized_key(key)
|
||||||
|
next_context = in_image_context or is_image_context(key)
|
||||||
|
if next_context and (
|
||||||
|
key_text in image_value_keys
|
||||||
|
or (
|
||||||
|
is_image_context(key)
|
||||||
|
and is_direct_image_value_key(key)
|
||||||
|
and is_text_or_text_list(child)
|
||||||
|
)
|
||||||
|
):
|
||||||
|
visit(child, True, True)
|
||||||
|
elif next_context:
|
||||||
|
visit(child, True, False)
|
||||||
|
else:
|
||||||
|
visit(child, False, False)
|
||||||
|
|
||||||
|
visit(pagemap)
|
||||||
|
return candidates
|
||||||
|
|
||||||
|
|
||||||
|
def _expanded_image_candidates(candidates: list[str]) -> list[str]:
|
||||||
|
expanded: list[str] = []
|
||||||
|
for candidate in candidates:
|
||||||
|
text = str(candidate).strip()
|
||||||
|
if _looks_like_srcset(text):
|
||||||
|
expanded.extend(_srcset_image_urls(text))
|
||||||
|
else:
|
||||||
|
expanded.append(text)
|
||||||
|
return expanded
|
||||||
|
|
||||||
|
|
||||||
|
def _looks_like_srcset(value: str) -> bool:
|
||||||
|
if "," not in value:
|
||||||
|
return False
|
||||||
|
return bool(re.search(r"\s+\d+(?:\.\d+)?[wx](?:\s*,|$)", value, flags=re.IGNORECASE))
|
||||||
|
|
||||||
|
|
||||||
|
def _srcset_image_urls(value: str) -> list[str]:
|
||||||
|
ranked: list[tuple[float, int, str]] = []
|
||||||
|
for order, candidate in enumerate(_split_srcset_candidates(value)):
|
||||||
|
parts = candidate.split()
|
||||||
|
if not parts:
|
||||||
|
continue
|
||||||
|
ranked.append((_srcset_descriptor_score(parts[1] if len(parts) > 1 else ""), order, parts[0]))
|
||||||
|
ranked.sort(key=lambda item: (-item[0], item[1]))
|
||||||
|
return [url for _score, _order, url in ranked]
|
||||||
|
|
||||||
|
|
||||||
|
def _split_srcset_candidates(value: str) -> list[str]:
|
||||||
|
candidates: list[str] = []
|
||||||
|
start = 0
|
||||||
|
text = str(value)
|
||||||
|
for index, character in enumerate(text):
|
||||||
|
if character != ",":
|
||||||
|
continue
|
||||||
|
remainder = text[index + 1 :].lstrip()
|
||||||
|
if not _starts_srcset_candidate(remainder):
|
||||||
|
continue
|
||||||
|
candidates.append(text[start:index])
|
||||||
|
start = index + 1
|
||||||
|
candidates.append(text[start:])
|
||||||
|
return candidates
|
||||||
|
|
||||||
|
|
||||||
|
def _starts_srcset_candidate(value: str) -> bool:
|
||||||
|
text = str(value).strip()
|
||||||
|
if not text:
|
||||||
|
return False
|
||||||
|
first_token = text.split(None, 1)[0]
|
||||||
|
return _is_urlish_reference(first_token) or _is_scheme_less_remote_image_url(first_token)
|
||||||
|
|
||||||
|
|
||||||
|
def _is_urlish_reference(value: str) -> bool:
|
||||||
|
text = str(value).strip()
|
||||||
|
return text.startswith(("http://", "https://", "//", "/", "./", "../")) or _url_looks_like_image(text)
|
||||||
|
|
||||||
|
|
||||||
|
def _srcset_descriptor_score(value: str) -> float:
|
||||||
|
descriptor = str(value).strip().lower()
|
||||||
|
if descriptor.endswith("w"):
|
||||||
|
try:
|
||||||
|
return float(descriptor[:-1])
|
||||||
|
except ValueError:
|
||||||
|
return 0.0
|
||||||
|
if descriptor.endswith("x"):
|
||||||
|
try:
|
||||||
|
density = float(descriptor[:-1])
|
||||||
|
except ValueError:
|
||||||
|
return 0.0
|
||||||
|
if density <= 0.0:
|
||||||
|
return 0.0
|
||||||
|
# Map density descriptors into a strictly-below-1 band so any real pixel
|
||||||
|
# width ('w', >= 1) outranks them in a malformed mixed-unit srcset, while
|
||||||
|
# preserving x-vs-x ordering (monotonic in density).
|
||||||
|
return density / (density + 1.0)
|
||||||
|
return 0.0
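The scoring rule described in the comment above can be checked in isolation. This is a minimal sketch, assuming a hypothetical `descriptor_score` function and made-up file names rather than the module's private helpers. It shows that any width descriptor of at least `1w` outranks every density descriptor, while densities keep their relative order.

```python
def descriptor_score(descriptor: str) -> float:
    """Score a srcset descriptor; higher means a larger image variant."""
    text = descriptor.strip().lower()
    if text.endswith("w"):
        try:
            return float(text[:-1])
        except ValueError:
            return 0.0
    if text.endswith("x"):
        try:
            density = float(text[:-1])
        except ValueError:
            return 0.0
        # density / (density + 1) stays strictly below 1 and grows with density.
        return density / (density + 1.0) if density > 0.0 else 0.0
    return 0.0


# Hypothetical (url, descriptor) pairs from a malformed mixed-unit srcset.
candidates = [
    ("small.jpg", "1x"),
    ("wide.jpg", "640w"),
    ("retina.jpg", "3x"),
    ("plain.jpg", ""),
]
# sorted() is stable, so equal scores keep their original order.
ranked = sorted(candidates, key=lambda pair: -descriptor_score(pair[1]))
ranked_urls = [url for url, _descriptor in ranked]
```

With these inputs the width candidate sorts first, followed by `3x`, then `1x`, then the entry with no descriptor.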
|
||||||
|
|
||||||
|
|
||||||
|
def _normalize_image_url(url: str, base_url: str) -> str:
|
||||||
|
text = _decoded_url_reference(_clean_url(url))
|
||||||
|
if not text or text.lower().startswith("data:"):
|
||||||
|
return ""
|
||||||
|
if _is_scheme_less_remote_image_url(text):
|
||||||
|
text = f"https://{text.lstrip('/')}"
|
||||||
|
normalized = urljoin(base_url, text) if base_url else text
|
||||||
|
normalized = _unwrapped_image_url(normalized) or normalized
|
||||||
|
parsed = urlparse(normalized)
|
||||||
|
if parsed.scheme not in {"http", "https"} or not parsed.netloc:
|
||||||
|
return ""
|
||||||
|
return normalized
|
||||||
|
|
||||||
|
|
||||||
|
def _clean_url(value: Any) -> str:
|
||||||
|
return html.unescape(str(value or "").strip())
|
||||||
|
|
||||||
|
|
||||||
|
def _normalize_result_url(value: Any) -> str:
|
||||||
|
text = _decoded_url_reference(_clean_url(value))
|
||||||
|
if not text or text.lower().startswith("data:"):
|
||||||
|
return ""
|
||||||
|
if text.startswith("//"):
|
||||||
|
text = f"https:{text}"
|
||||||
|
normalized = _unwrapped_result_url(text) or text
|
||||||
|
parsed = urlparse(normalized)
|
||||||
|
if parsed.scheme not in {"http", "https"} or not parsed.netloc:
|
||||||
|
return ""
|
||||||
|
return normalized
|
||||||
|
|
||||||
|
|
||||||
|
def _unwrapped_result_url(url: str) -> str:
|
||||||
|
parsed = urlparse(url)
|
||||||
|
if parsed.scheme not in {"http", "https"} or not parsed.netloc:
|
||||||
|
return ""
|
||||||
|
if "google" not in parsed.netloc.lower():
|
||||||
|
return ""
|
||||||
|
|
||||||
|
redirect_keys = {
|
||||||
|
"url",
|
||||||
|
"u",
|
||||||
|
"q",
|
||||||
|
"target",
|
||||||
|
"redirect",
|
||||||
|
"redirect_url",
|
||||||
|
"imgrefurl",
|
||||||
|
"page_url",
|
||||||
|
}
|
||||||
|
for key, value in parse_qsl(parsed.query, keep_blank_values=False):
|
||||||
|
if key.lower().replace("-", "_") not in redirect_keys:
|
||||||
|
continue
|
||||||
|
candidate = _decoded_nested_url(value)
|
||||||
|
if candidate.startswith("//"):
|
||||||
|
candidate = f"https:{candidate}"
|
||||||
|
if _is_http_url(candidate):
|
||||||
|
return candidate
|
||||||
|
return ""
|
||||||
|
|
||||||
|
|
||||||
|
def _unwrapped_image_url(url: str) -> str:
|
||||||
|
parsed = urlparse(url)
|
||||||
|
if parsed.scheme not in {"http", "https"} or not parsed.netloc:
|
||||||
|
return ""
|
||||||
|
|
||||||
|
strong_keys = {
|
||||||
|
"imgurl",
|
||||||
|
"imageurl",
|
||||||
|
"image_url",
|
||||||
|
"mediaurl",
|
||||||
|
"media_url",
|
||||||
|
"contenturl",
|
||||||
|
"content_url",
|
||||||
|
"photo",
|
||||||
|
"photo_url",
|
||||||
|
"src",
|
||||||
|
"source",
|
||||||
|
"image",
|
||||||
|
"img",
|
||||||
|
}
|
||||||
|
weak_keys = {"url", "u", "target", "redirect", "redirect_url"}
|
||||||
|
for key, value in parse_qsl(parsed.query, keep_blank_values=False):
|
||||||
|
key_text = key.lower().replace("-", "_")
|
||||||
|
candidate = _decoded_nested_url(value)
|
||||||
|
if not candidate:
|
||||||
|
continue
|
||||||
|
if not _is_http_url(candidate):
|
||||||
|
if candidate.startswith("//"):
|
||||||
|
candidate = f"https:{candidate}"
|
||||||
|
elif _is_scheme_less_remote_image_url(candidate):
|
||||||
|
candidate = f"https://{candidate.lstrip('/')}"
|
||||||
|
elif candidate.startswith("/") or _url_looks_like_image(candidate):
|
||||||
|
candidate = urljoin(url, candidate)
|
||||||
|
else:
|
||||||
|
continue
|
||||||
|
if key_text in strong_keys and _is_http_url(candidate):
|
||||||
|
return candidate
|
||||||
|
if key_text in weak_keys and _is_http_url(candidate) and _url_looks_like_image(candidate):
|
||||||
|
return candidate
|
||||||
|
return ""
|
||||||
|
|
||||||
|
|
||||||
|
def _decoded_url_reference(value: str) -> str:
|
||||||
|
raw = str(value).strip()
|
||||||
|
decoded = _decoded_nested_url(raw)
|
||||||
|
if decoded == raw:
|
||||||
|
return raw
|
||||||
|
if (
|
||||||
|
_is_http_url(decoded)
|
||||||
|
or decoded.startswith(("/", "//", "./", "../"))
|
||||||
|
or _is_scheme_less_remote_image_url(decoded)
|
||||||
|
or _url_looks_like_image(decoded)
|
||||||
|
):
|
||||||
|
return decoded
|
||||||
|
return raw
|
||||||
|
|
||||||
|
|
||||||
|
def _decoded_nested_url(value: str) -> str:
|
||||||
|
candidate = str(value).strip()
|
||||||
|
for _ in range(3):
|
||||||
|
decoded = unquote(candidate).strip()
|
||||||
|
if decoded == candidate:
|
||||||
|
break
|
||||||
|
candidate = decoded
|
||||||
|
return candidate
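Bounded repeated percent-decoding, as used above, can be illustrated on a doubly encoded URL. This is a minimal sketch, assuming a hypothetical `decode_nested` function and an example URL; only `urllib.parse.unquote` from the standard library is used. The round limit keeps adversarial, deeply nested input from looping indefinitely.

```python
from urllib.parse import unquote


def decode_nested(value: str, max_rounds: int = 3) -> str:
    """Percent-decode repeatedly until the text is stable or rounds run out."""
    candidate = value.strip()
    for _ in range(max_rounds):
        decoded = unquote(candidate).strip()
        if decoded == candidate:
            # Nothing left to decode.
            break
        candidate = decoded
    return candidate


# "%253A" is "%3A" encoded once more, i.e. a doubly encoded ":".
double_encoded = "https%253A%252F%252Fexample.com%252Fa.jpg"
decoded_url = decode_nested(double_encoded)
plain_url = decode_nested("https://example.com/a.jpg")
```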
|
||||||
|
|
||||||
|
|
||||||
|
def _is_http_url(value: str) -> bool:
|
||||||
|
parsed = urlparse(value)
|
||||||
|
return parsed.scheme in {"http", "https"} and bool(parsed.netloc)
|
||||||
|
|
||||||
|
|
||||||
|
def _is_scheme_less_remote_image_url(value: str) -> bool:
|
||||||
|
text = str(value).strip().lstrip("/")
|
||||||
|
if not _url_looks_like_image(text):
|
||||||
|
return False
|
||||||
|
first_segment = text.split("/", 1)[0]
|
||||||
|
if first_segment in {".", ".."} or first_segment.startswith("."):
|
||||||
|
return False
|
||||||
|
return "." in first_segment and " " not in first_segment
|
||||||
|
|
||||||
|
|
||||||
|
def _url_path_has_image_suffix(value: str) -> bool:
|
||||||
|
path = urlparse(value).path.lower()
|
||||||
|
return path.endswith(
|
||||||
|
(
|
||||||
|
".jpg",
|
||||||
|
".jpeg",
|
||||||
|
".jfif",
|
||||||
|
".pjp",
|
||||||
|
".pjpeg",
|
||||||
|
".png",
|
||||||
|
".gif",
|
||||||
|
".webp",
|
||||||
|
".avif",
|
||||||
|
".bmp",
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def _url_has_image_format_hint(value: str) -> bool:
|
||||||
|
image_formats = {
|
||||||
|
"avif",
|
||||||
|
"bmp",
|
||||||
|
"gif",
|
||||||
|
"jpeg",
|
||||||
|
"jfif",
|
||||||
|
"jpg",
|
||||||
|
"pjp",
|
||||||
|
"pjpeg",
|
||||||
|
"png",
|
||||||
|
"webp",
|
||||||
|
}
|
||||||
|
image_format_keys = {"format", "fm", "ext", "extension", "mime", "output", "type"}
|
||||||
|
for key, hint in parse_qsl(urlparse(value).query, keep_blank_values=False):
|
||||||
|
if key.lower().replace("-", "_") not in image_format_keys:
|
||||||
|
continue
|
||||||
|
normalized = hint.lower().split(";", 1)[0].strip().lstrip(".")
|
||||||
|
if normalized.startswith("image/"):
|
||||||
|
normalized = normalized.split("/", 1)[1]
|
||||||
|
normalized = normalized.split("+", 1)[0]
|
||||||
|
if normalized in image_formats:
|
||||||
|
return True
|
||||||
|
return False
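The query-string format hint check above matters for image CDNs whose paths carry no file extension. This is a minimal sketch, assuming a hypothetical `has_image_format_hint` function, shortened key and format sets, and made-up URLs. It shows both a bare format value and a MIME-style value with a parameter being recognized.

```python
from urllib.parse import parse_qsl, urlparse

# Shortened, illustrative sets; the module above lists more entries.
IMAGE_FORMATS = {"avif", "gif", "jpeg", "jpg", "png", "webp"}
FORMAT_KEYS = {"format", "fm", "ext", "mime", "type"}


def has_image_format_hint(url: str) -> bool:
    """Return True when a query parameter names a known image format."""
    for key, hint in parse_qsl(urlparse(url).query):
        if key.lower().replace("-", "_") not in FORMAT_KEYS:
            continue
        # Drop MIME parameters such as ";q=0.9" and a leading dot such as ".png".
        normalized = hint.lower().split(";", 1)[0].strip().lstrip(".")
        if normalized.startswith("image/"):
            normalized = normalized.split("/", 1)[1]
        normalized = normalized.split("+", 1)[0]
        if normalized in IMAGE_FORMATS:
            return True
    return False


bare_hint = has_image_format_hint("https://cdn.example.com/render?id=7&fm=webp")
mime_hint = has_image_format_hint(
    "https://cdn.example.com/render?mime=image%2Fjpeg%3Bq%3D0.9"
)
no_hint = has_image_format_hint("https://cdn.example.com/render?type=video")
```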
|
||||||
|
|
||||||
|
|
||||||
|
def _url_looks_like_image(value: str) -> bool:
|
||||||
|
return _url_path_has_image_suffix(value) or _url_has_image_format_hint(value)
|
||||||
|
|
||||||
|
|
||||||
|
def _image_query_signature(query: str) -> str:
|
||||||
|
return "google-custom-image:" + " ".join(query.lower().split())
|
||||||
|
|
||||||
|
|
||||||
|
def _web_query_signature(query: str) -> str:
|
||||||
|
return "google-custom-web:" + " ".join(query.lower().split())
|
||||||
|
|
||||||
|
|
||||||
|
def _requested_calls(client: Any, attribute: str) -> int:
|
||||||
|
try:
|
||||||
|
return max(1, int(getattr(client, attribute, 1) or 1))
|
||||||
|
except (TypeError, ValueError):
|
||||||
|
return 1
|
||||||
|
|
||||||
|
|
||||||
|
def _unique_texts(values: Iterable[str]) -> list[str]:
|
||||||
|
seen: set[str] = set()
|
||||||
|
result: list[str] = []
|
||||||
|
for value in values:
|
||||||
|
text = str(value).strip()
|
||||||
|
if not text or text in seen:
|
||||||
|
continue
|
||||||
|
seen.add(text)
|
||||||
|
result.append(text)
|
||||||
|
return result
|
||||||
|
|
||||||
|
|
||||||
|
def _now_iso() -> str:
|
||||||
|
return datetime.now(UTC).replace(microsecond=0).isoformat()
|
||||||
63
src/rights_filter/integrations/http_json.py
Normal file
|
|
@ -0,0 +1,63 @@
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import json
|
||||||
|
import socket
|
||||||
|
from typing import Any
|
||||||
|
from urllib.error import HTTPError, URLError
|
||||||
|
from urllib.parse import urlsplit
|
||||||
|
from urllib.request import Request, urlopen
|
||||||
|
|
||||||
|
|
||||||
|
class HttpJsonError(RuntimeError):
|
||||||
|
pass
|
||||||
|
|
||||||
|
|
||||||
|
class UrllibJsonTransport:
|
||||||
|
def __init__(self, connect_timeout: float | None = None) -> None:
|
||||||
|
# When set, probe TCP connectivity with this (short) timeout before the
|
||||||
|
# request so a down host fails fast instead of stalling the caller for
|
||||||
|
# the full read `timeout`. The read still gets the full `timeout`, so a
|
||||||
|
# live-but-slow service (e.g. an LLM generating a response) is unaffected.
|
||||||
|
self.connect_timeout = connect_timeout
|
||||||
|
|
||||||
|
def request_json(
|
||||||
|
self,
|
||||||
|
method: str,
|
||||||
|
url: str,
|
||||||
|
headers: dict[str, str] | None = None,
|
||||||
|
payload: dict[str, Any] | None = None,
|
||||||
|
timeout: int = 10,
|
||||||
|
) -> dict[str, Any]:
|
||||||
|
if self.connect_timeout is not None:
|
||||||
|
self._probe_connect(url)
|
||||||
|
data = None if payload is None else json.dumps(payload).encode("utf-8")
|
||||||
|
request = Request(url, data=data, method=method, headers=headers or {})
|
||||||
|
if payload is not None:
|
||||||
|
request.add_header("Content-Type", "application/json")
|
||||||
|
        try:
            with urlopen(request, timeout=timeout) as response:
                raw = response.read().decode("utf-8")
        except HTTPError as exc:
            detail = exc.read().decode("utf-8", errors="replace")
            raise HttpJsonError(f"HTTP {exc.code}: {detail}") from exc
        except URLError as exc:
            raise HttpJsonError(str(exc.reason)) from exc
        except OSError as exc:
            # A read timeout surfaces from response.read() as TimeoutError (an
            # OSError), not URLError, so wrap it for callers that catch HttpJsonError.
            raise HttpJsonError(f"request failed: {exc}") from exc
|
||||||
|
|
||||||
|
if not raw:
|
||||||
|
return {}
|
||||||
|
try:
|
||||||
|
return json.loads(raw)
|
||||||
|
except json.JSONDecodeError as exc:
|
||||||
|
raise HttpJsonError(f"invalid JSON response: {exc}") from exc
|
||||||
|
|
||||||
|
def _probe_connect(self, url: str) -> None:
|
||||||
|
parts = urlsplit(url)
|
||||||
|
host = parts.hostname
|
||||||
|
if not host:
|
||||||
|
return
|
||||||
|
port = parts.port or (443 if parts.scheme == "https" else 80)
|
||||||
|
try:
|
||||||
|
with socket.create_connection((host, port), timeout=self.connect_timeout):
|
||||||
|
pass
|
||||||
|
except OSError as exc:
|
||||||
|
raise HttpJsonError(f"connection to {host}:{port} failed: {exc}") from exc
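The fast-fail behavior of the connect probe can be observed against a local socket. This is a minimal sketch, assuming a hypothetical `can_connect` helper that returns a boolean instead of raising `HttpJsonError`, and a throwaway listener on the loopback interface. It shows the probe succeeding while something listens and failing promptly once the listener is closed.

```python
import socket


def can_connect(host: str, port: int, connect_timeout: float) -> bool:
    """Return True when a TCP connection opens within `connect_timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=connect_timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


# Bind to port 0 so the operating system picks a free port for the listener.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]

reachable_while_open = can_connect("127.0.0.1", port, connect_timeout=0.5)
listener.close()
reachable_after_close = can_connect("127.0.0.1", port, connect_timeout=0.5)
```

The probe opens and immediately closes a separate connection, so the real request still pays for its own handshake. That is the price of bounding the wait on a host that is down.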
|
||||||
759
src/rights_filter/integrations/naver_search.py
Normal file
|
|
@ -0,0 +1,759 @@
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import html
|
||||||
|
from datetime import UTC, datetime
|
||||||
|
from typing import Any
|
||||||
|
from urllib.parse import parse_qsl, unquote, urlencode, urljoin, urlparse
|
||||||
|
|
||||||
|
from rights_filter.domain.records import DataClass, Evidence, EvidenceSource
|
||||||
|
from rights_filter.governance.policies import assert_operator_evidence_payload_allowed
|
||||||
|
from rights_filter.integrations.http_json import UrllibJsonTransport
|
||||||
|
from rights_filter.integrations.search_policy import SearchApiPolicy
|
||||||
|
|
||||||
|
|
||||||
|
class FakeNaverSearchClient:
|
||||||
|
def __init__(
|
||||||
|
self,
|
||||||
|
response: dict[str, Any] | None = None,
|
||||||
|
blog_response: dict[str, Any] | None = None,
|
||||||
|
web_response: dict[str, Any] | None = None,
|
||||||
|
) -> None:
|
||||||
|
self.response = response or {"items": []}
|
||||||
|
self.blog_response = blog_response or {"items": []}
|
||||||
|
self.web_response = web_response or {"items": []}
|
||||||
|
self.calls: list[str] = []
|
||||||
|
self.blog_calls: list[str] = []
|
||||||
|
self.web_calls: list[str] = []
|
||||||
|
|
||||||
|
def search_image(self, query: str) -> dict[str, Any]:
|
||||||
|
self.calls.append(query)
|
||||||
|
return self.response
|
||||||
|
|
||||||
|
def search_blog(self, query: str) -> dict[str, Any]:
|
||||||
|
self.blog_calls.append(query)
|
||||||
|
return self.blog_response
|
||||||
|
|
||||||
|
def search_web(self, query: str) -> dict[str, Any]:
|
||||||
|
self.web_calls.append(query)
|
||||||
|
return self.web_response
|
||||||
|
|
||||||
|
|
||||||
|
class NaverOpenApiSearchClient:
|
||||||
|
image_endpoint = "https://openapi.naver.com/v1/search/image"
|
||||||
|
blog_endpoint = "https://openapi.naver.com/v1/search/blog"
|
||||||
|
web_endpoint = "https://openapi.naver.com/v1/search/webkr"
|
||||||
|
|
||||||
|
def __init__(
|
||||||
|
self,
|
||||||
|
client_id: str,
|
||||||
|
client_secret: str,
|
||||||
|
transport: Any | None = None,
|
||||||
|
display: int = 10,
|
||||||
|
start: int = 1,
|
||||||
|
sort: str = "sim",
|
||||||
|
image_pages: int = 1,
|
||||||
|
blog_display: int = 3,
|
||||||
|
blog_start: int = 1,
|
||||||
|
blog_sort: str = "sim",
|
||||||
|
blog_pages: int = 1,
|
||||||
|
web_display: int = 3,
|
||||||
|
web_start: int = 1,
|
||||||
|
web_pages: int = 1,
|
||||||
|
timeout: int = 10,
|
||||||
|
) -> None:
|
||||||
|
self.client_id = client_id
|
||||||
|
self.client_secret = client_secret
|
||||||
|
self.transport = transport or UrllibJsonTransport()
|
||||||
|
self.display = display
|
||||||
|
self.start = start
|
||||||
|
self.sort = sort
|
||||||
|
self.image_pages = max(1, image_pages)
|
||||||
|
self.blog_display = blog_display
|
||||||
|
self.blog_start = blog_start
|
||||||
|
self.blog_sort = blog_sort
|
||||||
|
self.blog_pages = max(1, blog_pages)
|
||||||
|
self.web_display = web_display
|
||||||
|
self.web_start = web_start
|
||||||
|
self.web_pages = max(1, web_pages)
|
||||||
|
self.timeout = timeout
|
||||||
|
|
||||||
|
def search_image(self, query: str) -> dict[str, Any]:
|
||||||
|
return self._search_pages(
|
||||||
|
self.image_endpoint,
|
||||||
|
query,
|
||||||
|
display=self.display,
|
||||||
|
start=self.start,
|
||||||
|
pages=self.image_pages,
|
||||||
|
extra_params={"sort": self.sort},
|
||||||
|
)
|
||||||
|
|
||||||
|
def search_blog(self, query: str) -> dict[str, Any]:
|
||||||
|
return self._search_pages(
|
||||||
|
self.blog_endpoint,
|
||||||
|
query,
|
||||||
|
display=self.blog_display,
|
||||||
|
start=self.blog_start,
|
||||||
|
pages=self.blog_pages,
|
||||||
|
extra_params={"sort": self.blog_sort},
|
||||||
|
)
|
||||||
|
|
||||||
|
def search_web(self, query: str) -> dict[str, Any]:
|
||||||
|
return self._search_pages(
|
||||||
|
self.web_endpoint,
|
||||||
|
query,
|
||||||
|
display=self.web_display,
|
||||||
|
start=self.web_start,
|
||||||
|
pages=self.web_pages,
|
||||||
|
extra_params={},
|
||||||
|
)
|
||||||
|
|
||||||
|
def _headers(self) -> dict[str, str]:
|
||||||
|
return {
|
||||||
|
"X-Naver-Client-Id": self.client_id,
|
||||||
|
"X-Naver-Client-Secret": self.client_secret,
|
||||||
|
}
|
||||||
|
|
||||||
|
def _search_pages(
|
||||||
|
self,
|
||||||
|
endpoint: str,
|
||||||
|
query: str,
|
||||||
|
display: int,
|
||||||
|
start: int,
|
||||||
|
pages: int,
|
||||||
|
extra_params: dict[str, Any],
|
||||||
|
) -> dict[str, Any]:
|
||||||
|
merged: dict[str, Any] = {}
|
||||||
|
items: list[Any] = []
|
||||||
|
for page_index in range(max(1, pages)):
|
||||||
|
params = urlencode(
|
||||||
|
{
|
||||||
|
"query": query,
|
||||||
|
"display": display,
|
||||||
|
"start": start + (page_index * display),
|
||||||
|
**extra_params,
|
||||||
|
}
|
||||||
|
)
|
||||||
|
response = self.transport.request_json(
|
||||||
|
"GET",
|
||||||
|
f"{endpoint}?{params}",
|
||||||
|
headers=self._headers(),
|
||||||
|
timeout=self.timeout,
|
||||||
|
)
|
||||||
|
if not merged:
|
||||||
|
merged = dict(response)
|
||||||
|
page_items = response.get("items", [])
|
||||||
|
if not page_items:
|
||||||
|
break
|
||||||
|
items.extend(page_items)
|
||||||
|
merged["items"] = items
|
||||||
|
return merged
|
||||||
|
|
||||||
|
|
||||||
class NaverSearchAdapter:
    provider = "naver"

    def __init__(self, client: Any) -> None:
        self.client = client

    def search(
        self, submission_id: str, query: str, policy: SearchApiPolicy
    ) -> list[Evidence]:
        if not isinstance(query, str):
            raise ValueError("Naver search requires a text query")

        requested_calls = _requested_calls(self.client, "image_pages")
        allowed, reason = policy.can_call(self.provider, requested_calls=requested_calls)
        if not allowed:
            return [
                Evidence(
                    source=EvidenceSource.SEARCH_SKIPPED,
                    reason=reason or "search API skipped",
                    confidence=1.0,
                    data={
                        "submission_id": submission_id,
                        "provider": self.provider,
                        "query": query,
                        "query_signature": _query_signature(query),
                    },
                )
            ]

        try:
            response = self.client.search_image(query)
            policy.record_call(requested_calls)
            return _map_response(submission_id, query, response)
        except Exception as exc:  # pragma: no cover - defensive adapter boundary
            return [
                Evidence(
                    source=EvidenceSource.ENRICHMENT_FAILURE,
                    reason=f"Naver search failed: {exc}",
                    confidence=1.0,
                    data={
                        "submission_id": submission_id,
                        "provider": self.provider,
                        "query": query,
                        "query_signature": _query_signature(query),
                    },
                )
            ]

    def search_pages(
        self, submission_id: str, query: str, policy: SearchApiPolicy
    ) -> list[Evidence]:
        if not isinstance(query, str):
            raise ValueError("Naver search requires a text query")

        requested_calls = _requested_calls(self.client, "blog_pages")
        allowed, reason = policy.can_call(self.provider, requested_calls=requested_calls)
        if not allowed:
            return [
                Evidence(
                    source=EvidenceSource.SEARCH_SKIPPED,
                    reason=reason or "search API skipped",
                    confidence=1.0,
                    data={
                        "submission_id": submission_id,
                        "provider": self.provider,
                        "search_type": "blog",
                        "query": query,
                        "query_signature": _blog_query_signature(query),
                    },
                )
            ]

        try:
            response = self.client.search_blog(query)
            policy.record_call(requested_calls)
            return _map_blog_response(submission_id, query, response)
        except Exception as exc:  # pragma: no cover - defensive adapter boundary
            return [
                Evidence(
                    source=EvidenceSource.ENRICHMENT_FAILURE,
                    reason=f"Naver blog search failed: {exc}",
                    confidence=1.0,
                    data={
                        "submission_id": submission_id,
                        "provider": self.provider,
                        "search_type": "blog",
                        "query": query,
                        "query_signature": _blog_query_signature(query),
                    },
                )
            ]

    def search_web_pages(
        self, submission_id: str, query: str, policy: SearchApiPolicy
    ) -> list[Evidence]:
        if not isinstance(query, str):
            raise ValueError("Naver search requires a text query")

        requested_calls = _requested_calls(self.client, "web_pages")
        allowed, reason = policy.can_call(self.provider, requested_calls=requested_calls)
        if not allowed:
            return [
                Evidence(
                    source=EvidenceSource.SEARCH_SKIPPED,
                    reason=reason or "search API skipped",
                    confidence=1.0,
                    data={
                        "submission_id": submission_id,
                        "provider": self.provider,
                        "search_type": "web",
                        "query": query,
                        "query_signature": _web_query_signature(query),
                    },
                )
            ]

        try:
            response = self.client.search_web(query)
            policy.record_call(requested_calls)
            return _map_web_response(submission_id, query, response)
        except Exception as exc:  # pragma: no cover - defensive adapter boundary
            return [
                Evidence(
                    source=EvidenceSource.ENRICHMENT_FAILURE,
                    reason=f"Naver web search failed: {exc}",
                    confidence=1.0,
                    data={
                        "submission_id": submission_id,
                        "provider": self.provider,
                        "search_type": "web",
                        "query": query,
                        "query_signature": _web_query_signature(query),
                    },
                )
            ]


def _map_response(
    submission_id: str, query: str, response: dict[str, Any]
) -> list[Evidence]:
    items = response.get("items", [])
    retrieved_at = _now_iso()
    if not items:
        return [
            Evidence(
                source=EvidenceSource.NAVER_SEARCH,
                reason="Naver search returned no results",
                confidence=0.0,
                data={
                    "submission_id": submission_id,
                    "provider": "naver",
                    "query": query,
                    "query_signature": _query_signature(query),
                    "retrieved_at": retrieved_at,
                },
            )
        ]

    evidence: list[Evidence] = []
    for rank, item in enumerate(items, start=1):
        image_url = _normalized_image_url(item.get("link", ""))
        thumbnail_url = _normalized_image_url(item.get("thumbnail", ""))
        result_url = _normalized_result_url(item.get("page_url") or item.get("originallink", "")) or image_url
        payload = {
            "submission_id": submission_id,
            "provider": "naver",
            "query": query,
            "query_signature": _query_signature(query),
            "rank": rank,
            "title": item.get("title", ""),
            "description": item.get("description", ""),
            "image_url": image_url,
            "thumbnail_url": thumbnail_url,
            "result_url": result_url,
            "height": item.get("sizeheight"),
            "width": item.get("sizewidth"),
            "retrieved_at": retrieved_at,
        }
        assert_operator_evidence_payload_allowed(DataClass.SEARCH_EVIDENCE, payload)
        evidence.append(
            Evidence(
                source=EvidenceSource.NAVER_SEARCH,
                reason="Naver search result found",
                confidence=0.5,
                data=payload,
            )
        )
    return evidence


def _map_blog_response(
    submission_id: str, query: str, response: dict[str, Any]
) -> list[Evidence]:
    items = response.get("items", [])
    retrieved_at = _now_iso()
    if not items:
        return [
            Evidence(
                source=EvidenceSource.NAVER_SEARCH,
                reason="Naver blog search returned no results",
                confidence=0.0,
                data={
                    "submission_id": submission_id,
                    "provider": "naver",
                    "search_type": "blog",
                    "query": query,
                    "query_signature": _blog_query_signature(query),
                    "retrieved_at": retrieved_at,
                },
            )
        ]

    evidence: list[Evidence] = []
    for rank, item in enumerate(items, start=1):
        result_url = _normalized_result_url(item.get("link", ""))
        page_image_urls = _page_image_urls_from_item(item, result_url)
        payload = {
            "submission_id": submission_id,
            "provider": "naver",
            "search_type": "blog",
            "query": query,
            "query_signature": _blog_query_signature(query),
            "rank": rank,
            "title": item.get("title", ""),
            "description": item.get("description", ""),
            "result_url": result_url,
            "blogger_name": item.get("bloggername", ""),
            "blogger_link": item.get("bloggerlink", ""),
            "postdate": item.get("postdate", ""),
            "match": "page",
            "retrieved_at": retrieved_at,
        }
        if page_image_urls:
            payload["page_image_urls"] = page_image_urls
        assert_operator_evidence_payload_allowed(DataClass.SEARCH_EVIDENCE, payload)
        evidence.append(
            Evidence(
                source=EvidenceSource.NAVER_SEARCH,
                reason="Naver blog search result found",
                confidence=0.35,
                data=payload,
            )
        )
    return evidence


def _map_web_response(
    submission_id: str, query: str, response: dict[str, Any]
) -> list[Evidence]:
    items = response.get("items", [])
    retrieved_at = _now_iso()
    if not items:
        return [
            Evidence(
                source=EvidenceSource.NAVER_SEARCH,
                reason="Naver web search returned no results",
                confidence=0.0,
                data={
                    "submission_id": submission_id,
                    "provider": "naver",
                    "search_type": "web",
                    "query": query,
                    "query_signature": _web_query_signature(query),
                    "retrieved_at": retrieved_at,
                },
            )
        ]

    evidence: list[Evidence] = []
    for rank, item in enumerate(items, start=1):
        result_url = _normalized_result_url(item.get("link", ""))
        page_image_urls = _page_image_urls_from_item(item, result_url)
        payload = {
            "submission_id": submission_id,
            "provider": "naver",
            "search_type": "web",
            "query": query,
            "query_signature": _web_query_signature(query),
            "rank": rank,
            "title": item.get("title", ""),
            "description": item.get("description", ""),
            "result_url": result_url,
            "match": "page",
            "retrieved_at": retrieved_at,
        }
        if page_image_urls:
            payload["page_image_urls"] = page_image_urls
        assert_operator_evidence_payload_allowed(DataClass.SEARCH_EVIDENCE, payload)
        evidence.append(
            Evidence(
                source=EvidenceSource.NAVER_SEARCH,
                reason="Naver web search result found",
                confidence=0.3,
                data=payload,
            )
        )
    return evidence


def _page_image_urls_from_item(item: dict[str, Any], base_url: str) -> list[str]:
    image_keys = {
        "content_url",
        "contenturl",
        "image",
        "image_url",
        "imageurl",
        "img",
        "img_url",
        "imgurl",
        "media_url",
        "mediaurl",
        "og_image",
        "og_image_url",
        "photo",
        "photo_url",
        "photourl",
        "picture",
        "picture_url",
        "poster",
        "poster_url",
        "src",
        "thumbnail",
        "thumbnail_url",
        "thumbnailurl",
        "twitter_image",
        "twitter_image_url",
    }
    candidates: list[str] = []
    for key, value in item.items():
        if _image_hint_key(key) in image_keys:
            candidates.extend(_image_hint_values(value))
    return _unique_texts(
        _normalized_page_image_url(candidate, base_url) for candidate in candidates
    )


def _image_hint_key(key: Any) -> str:
    return (
        str(key)
        .strip()
        .lower()
        .replace("-", "_")
        .replace(":", "_")
        .replace(".", "_")
    )


def _image_hint_values(value: Any) -> list[str]:
    if isinstance(value, str):
        return [value]
    if isinstance(value, list):
        values: list[str] = []
        for item in value:
            values.extend(_image_hint_values(item))
        return values
    if isinstance(value, dict):
        values = []
        for key in (
            "contentUrl",
            "content_url",
            "imageUrl",
            "image_url",
            "src",
            "thumbnailUrl",
            "thumbnail_url",
            "url",
        ):
            if value.get(key):
                values.extend(_image_hint_values(value[key]))
        return values
    return []


def _normalized_page_image_url(value: Any, base_url: str) -> str:
    text = _decoded_url_reference(_clean_url(value))
    if not text or text.lower().startswith("data:"):
        return ""
    if text.startswith("//"):
        text = f"https:{text}"
    elif base_url and text.startswith(("/", "./", "../")):
        text = urljoin(base_url, text)
    normalized = _normalized_image_url(text)
    parsed = urlparse(normalized)
    if parsed.scheme not in {"http", "https"} or not parsed.netloc:
        return ""
    return normalized


def _unique_texts(values: Any) -> list[str]:
    seen: set[str] = set()
    result: list[str] = []
    for value in values:
        text = str(value or "").strip()
        if not text or text in seen:
            continue
        seen.add(text)
        result.append(text)
    return result


def _query_signature(query: str) -> str:
    return "naver:" + " ".join(query.lower().split())


def _blog_query_signature(query: str) -> str:
    return "naver-blog:" + " ".join(query.lower().split())


def _web_query_signature(query: str) -> str:
    return "naver-web:" + " ".join(query.lower().split())


def _requested_calls(client: Any, attribute: str) -> int:
    try:
        return max(1, int(getattr(client, attribute, 1) or 1))
    except (TypeError, ValueError):
        return 1


def _normalized_image_url(value: Any) -> str:
    text = _decoded_url_reference(_clean_url(value))
    if not text or text.lower().startswith("data:"):
        return ""
    if _is_scheme_less_remote_image_url(text):
        text = f"https://{text.lstrip('/')}"
    normalized = _unwrapped_image_url(text) or text
    parsed = urlparse(normalized)
    if parsed.scheme not in {"http", "https"} or not parsed.netloc:
        return ""
    return normalized


def _clean_url(value: Any) -> str:
    return html.unescape(str(value or "").strip())


def _normalized_result_url(value: Any) -> str:
    text = _decoded_url_reference(_clean_url(value))
    if not text or text.lower().startswith("data:"):
        return ""
    if text.startswith("//"):
        text = f"https:{text}"
    normalized = _unwrapped_result_url(text) or text
    parsed = urlparse(normalized)
    if parsed.scheme not in {"http", "https"} or not parsed.netloc:
        return ""
    return normalized


def _unwrapped_result_url(url: str) -> str:
    parsed = urlparse(url)
    if parsed.scheme not in {"http", "https"} or not parsed.netloc:
        return ""
    if "naver" not in parsed.netloc.lower():
        return ""

    redirect_keys = {
        "url",
        "u",
        "q",
        "target",
        "redirect",
        "redirect_url",
        "imgrefurl",
        "page_url",
    }
    for key, raw_value in parse_qsl(parsed.query, keep_blank_values=False):
        if key.lower().replace("-", "_") not in redirect_keys:
            continue
        candidate = _decoded_nested_url(raw_value)
        if candidate.startswith("//"):
            candidate = f"https:{candidate}"
        if _is_http_url(candidate):
            return candidate
    return ""


def _unwrapped_image_url(url: str) -> str:
    parsed = urlparse(url)
    if parsed.scheme not in {"http", "https"} or not parsed.netloc:
        return ""

    strong_keys = {
        "imgurl",
        "imageurl",
        "image_url",
        "mediaurl",
        "media_url",
        "contenturl",
        "content_url",
        "photo",
        "photo_url",
        "src",
        "source",
        "image",
        "img",
    }
    weak_keys = {"url", "u", "target", "redirect", "redirect_url"}
    for key, raw_value in parse_qsl(parsed.query, keep_blank_values=False):
        key_text = key.lower().replace("-", "_")
        candidate = _decoded_nested_url(raw_value)
        if not candidate:
            continue
        if not _is_http_url(candidate):
            if candidate.startswith("//"):
                candidate = f"https:{candidate}"
            elif _is_scheme_less_remote_image_url(candidate):
                candidate = f"https://{candidate.lstrip('/')}"
            elif candidate.startswith("/") or _url_looks_like_image(candidate):
                candidate = urljoin(url, candidate)
            else:
                continue
        if key_text in strong_keys:
            return candidate
        if key_text in weak_keys and _url_looks_like_image(candidate):
            return candidate
    return ""


def _decoded_url_reference(value: str) -> str:
    raw = str(value).strip()
    decoded = _decoded_nested_url(raw)
    if decoded == raw:
        return raw
    if (
        _is_http_url(decoded)
        or decoded.startswith(("/", "//", "./", "../"))
        or _is_scheme_less_remote_image_url(decoded)
        or _url_looks_like_image(decoded)
    ):
        return decoded
    return raw


def _decoded_nested_url(value: str) -> str:
    candidate = str(value).strip()
    for _ in range(3):
        decoded = unquote(candidate).strip()
        if decoded == candidate:
            break
        candidate = decoded
    return candidate


def _is_http_url(value: str) -> bool:
    parsed = urlparse(value)
    return parsed.scheme in {"http", "https"} and bool(parsed.netloc)


def _is_scheme_less_remote_image_url(value: str) -> bool:
    text = str(value).strip().lstrip("/")
    if not _url_looks_like_image(text):
        return False
    first_segment = text.split("/", 1)[0]
    if first_segment in {".", ".."} or first_segment.startswith("."):
        return False
    return "." in first_segment and " " not in first_segment


def _url_path_has_image_suffix(value: str) -> bool:
    return urlparse(value).path.lower().endswith(
        (
            ".jpg",
            ".jpeg",
            ".jfif",
            ".pjp",
            ".pjpeg",
            ".png",
            ".gif",
            ".webp",
            ".avif",
            ".bmp",
        )
    )


def _url_has_image_format_hint(value: str) -> bool:
    image_formats = {
        "avif",
        "bmp",
        "gif",
        "jpeg",
        "jfif",
        "jpg",
        "pjp",
        "pjpeg",
        "png",
        "webp",
    }
    image_format_keys = {"format", "fm", "ext", "extension", "mime", "output", "type"}
    for key, hint in parse_qsl(urlparse(value).query, keep_blank_values=False):
        if key.lower().replace("-", "_") not in image_format_keys:
            continue
        normalized = hint.lower().split(";", 1)[0].strip().lstrip(".")
        if normalized.startswith("image/"):
            normalized = normalized.split("/", 1)[1]
        normalized = normalized.split("+", 1)[0]
        if normalized in image_formats:
            return True
    return False


def _url_looks_like_image(value: str) -> bool:
    return _url_path_has_image_suffix(value) or _url_has_image_format_hint(value)


def _now_iso() -> str:
    return datetime.now(UTC).replace(microsecond=0).isoformat()
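The bounded percent-decoding loop in `_decoded_nested_url` can be illustrated in isolation. This is a minimal sketch rather than the module's code: `decode_nested` and its `max_rounds` parameter are hypothetical names, and only the standard-library `urllib.parse.unquote` is assumed.

```python
from urllib.parse import unquote


def decode_nested(value: str, max_rounds: int = 3) -> str:
    """Undo up to `max_rounds` layers of percent-encoding (hypothetical helper)."""
    candidate = value.strip()
    for _ in range(max_rounds):
        decoded = unquote(candidate).strip()
        if decoded == candidate:
            # Nothing left to decode, so stop early.
            break
        candidate = decoded
    return candidate


# A redirect parameter that was percent-encoded twice.
double_encoded = "https%253A%252F%252Fexample.com%252Fa.jpg"
print(decode_nested(double_encoded))
```

Capping the number of rounds keeps the work bounded for adversarial inputs; a value encoded more than three times is simply left partially encoded.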
27
src/rights_filter/integrations/search_policy.py
Normal file
@@ -0,0 +1,27 @@
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SearchApiPolicy:
    disabled: bool = False
    compliance_approved: bool = False
    allowed_providers: set[str] = field(default_factory=lambda: {"naver"})
    daily_limit: int | None = None
    calls_made: int = 0

    def can_call(self, provider: str, requested_calls: int = 1) -> tuple[bool, str | None]:
        if self.disabled:
            return False, "search API disabled"
        if not self.compliance_approved:
            return False, "search API compliance not approved"
        if provider not in self.allowed_providers:
            return False, "search provider not allowed"
        calls = max(1, int(requested_calls or 1))
        if self.daily_limit is not None and self.calls_made + calls > self.daily_limit:
            return False, "search API usage limit reached"
        return True, None

    def record_call(self, count: int = 1) -> None:
        self.calls_made += max(1, int(count or 1))
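The check-then-record pattern that the adapters build on this policy (quota is consumed only after a successful request) can be sketched as follows. `QuotaPolicy` and `guarded_call` are hypothetical stand-ins written for this example; they mirror only the `daily_limit` / `calls_made` part of `SearchApiPolicy`.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class QuotaPolicy:
    """Hypothetical stand-in for the quota part of SearchApiPolicy."""

    daily_limit: Optional[int] = None
    calls_made: int = 0

    def can_call(self, requested_calls: int = 1) -> tuple:
        calls = max(1, int(requested_calls or 1))
        if self.daily_limit is not None and self.calls_made + calls > self.daily_limit:
            return False, "search API usage limit reached"
        return True, None

    def record_call(self, count: int = 1) -> None:
        self.calls_made += max(1, int(count or 1))


def guarded_call(policy: QuotaPolicy, request: Callable[[], Any], requested_calls: int = 1) -> Any:
    allowed, reason = policy.can_call(requested_calls)
    if not allowed:
        return f"skipped: {reason}"
    result = request()  # may raise; the quota is untouched in that case
    policy.record_call(requested_calls)  # only reached after success
    return result


policy = QuotaPolicy(daily_limit=2)
print(guarded_call(policy, lambda: "ok"), policy.calls_made)
```

Recording after the request means a transport failure does not burn quota, at the cost of not counting requests that the provider may still have billed.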
@@ -0,0 +1,3 @@
from rights_filter.integrations.cloud_vision_web_detection import _map_response as map_response

__all__ = ["map_response"]
1
src/rights_filter/jobs/__init__.py
Normal file
@@ -0,0 +1 @@
"""Batch jobs for rights filtering."""
104
src/rights_filter/jobs/batch_analyzer.py
Normal file
@@ -0,0 +1,104 @@
from __future__ import annotations

from dataclasses import dataclass

from rights_filter.analysis.internal_analyzer import InternalAnalyzer
from rights_filter.analysis.preprocessing import ImagePayload, PreprocessingError, build_external_derivative
from rights_filter.analysis.risk_scoring import RiskScorer
from rights_filter.domain.records import (
    AnalysisRun,
    Evidence,
    EvidenceSource,
    InMemoryRightsFilterRepository,
)
from rights_filter.integrations.cloud_vision_web_detection import CloudVisionWebDetectionAdapter
from rights_filter.integrations.external_policy import ExternalApiPolicy


@dataclass(frozen=True)
class SubmissionImage:
    submission_id: str
    image: ImagePayload


@dataclass(frozen=True)
class BatchSummary:
    processed: int = 0
    skipped_existing: int = 0
    external_skipped: int = 0
    failed: int = 0


class BatchAnalyzer:
    def __init__(
        self,
        repository: InMemoryRightsFilterRepository,
        internal_analyzer: InternalAnalyzer,
        external_adapter: CloudVisionWebDetectionAdapter,
        external_policy: ExternalApiPolicy,
        scorer: RiskScorer,
    ) -> None:
        self.repository = repository
        self.internal_analyzer = internal_analyzer
        self.external_adapter = external_adapter
        self.external_policy = external_policy
        self.scorer = scorer

    def run(
        self,
        submissions: list[SubmissionImage],
        analysis_version: str = "v1",
    ) -> BatchSummary:
        processed = 0
        skipped_existing = 0
        external_skipped = 0
        failed = 0

        for submission in submissions:
            if self.repository.has_analysis_run(
                submission.submission_id, analysis_version
            ):
                skipped_existing += 1
                continue

            run = AnalysisRun.for_submission(
                submission.submission_id, analysis_version
            )
            evidence: list[Evidence] = []
            try:
                evidence.extend(
                    self.internal_analyzer.analyze(
                        submission.submission_id, submission.image
                    )
                )
                derivative = build_external_derivative(submission.image)
                evidence.extend(
                    self.external_adapter.detect(
                        submission.submission_id,
                        derivative,
                        self.external_policy,
                    )
                )
            except PreprocessingError:
                # Count the failure and move on without saving a run. Falling
                # through previously double-counted the submission (failed AND
                # processed) and persisted a partial run whose presence made
                # has_analysis_run() skip the submission permanently on re-runs.
                failed += 1
                continue

            external_skipped += sum(
                1 for item in evidence if item.source == EvidenceSource.EXTERNAL_SKIPPED
            )
            for item in evidence:
                run.add_evidence(item)
            run.score = self.scorer.score(evidence)
            self.repository.save_analysis_run(run)
            processed += 1

        return BatchSummary(
            processed=processed,
            skipped_existing=skipped_existing,
            external_skipped=external_skipped,
            failed=failed,
        )
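The comment in the `except PreprocessingError` branch describes why a failed submission must not be persisted. A minimal sketch of that behavior, assuming a plain dict as the store and `ValueError` as the failure signal (both hypothetical simplifications; `run_batch` is not part of the module):

```python
from typing import Any, Callable, Dict, List, Optional, Tuple


def run_batch(
    items: List[Tuple[str, Optional[int]]],
    already_done: Dict[str, Any],
    analyze: Callable[[Optional[int]], Any],
) -> Tuple[int, int, int]:
    """Return (processed, skipped_existing, failed) for one pass over items."""
    processed = skipped = failed = 0
    for item_id, payload in items:
        if item_id in already_done:
            skipped += 1
            continue
        try:
            result = analyze(payload)
        except ValueError:
            failed += 1
            continue  # nothing is persisted, so a later re-run retries this item
        already_done[item_id] = result
        processed += 1
    return processed, skipped, failed


def double(payload: Optional[int]) -> int:
    if payload is None:
        raise ValueError("unreadable payload")
    return payload * 2


store: Dict[str, Any] = {}
print(run_batch([("a", 1), ("b", None)], store, double))
print(run_batch([("a", 1), ("b", 2)], store, double))
```

Because the failed item never reaches the store, the second pass skips only the item that actually succeeded and retries the other.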
12
src/rights_filter/jobs/retry_policy.py
Normal file
@@ -0,0 +1,12 @@
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryPolicy:
    max_attempts: int = 3

    def should_retry(self, attempt: int) -> bool:
        return attempt < self.max_attempts


__all__ = ["RetryPolicy"]
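One way a caller might drive `should_retry` is sketched below. The commit does not show a caller, so `run_with_retry` is hypothetical, and the sketch assumes `attempt` is the 1-based count of attempts made so far. `RetryPolicy` is repeated here only to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Callable, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class RetryPolicy:
    max_attempts: int = 3

    def should_retry(self, attempt: int) -> bool:
        return attempt < self.max_attempts


def run_with_retry(operation: Callable[[], T], policy: RetryPolicy) -> T:
    """Call `operation` until it succeeds or the policy declines another try."""
    attempt = 1  # assumption: attempts are counted from 1
    while True:
        try:
            return operation()
        except Exception:
            if not policy.should_retry(attempt):
                raise  # attempts exhausted: surface the last error
            attempt += 1


calls = []


def flaky() -> str:
    calls.append(1)
    if len(calls) < 3:
        raise ValueError("transient failure")
    return "done"


print(run_with_retry(flaky, RetryPolicy()), len(calls))
```

Under this reading, `max_attempts=3` allows three calls in total, not three retries after the first call.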
52
src/rights_filter/jobs/review_enrichment_job.py
Normal file
@@ -0,0 +1,52 @@
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any

from rights_filter.analysis.evidence_enrichment import EnrichmentSummary


@dataclass
class ReviewEnrichmentJobSummary:
    processed: int = 0
    generated_queries: int = 0
    executed_searches: int = 0
    skipped_searches: int = 0
    provider_failures: int = 0
    summary_failures: int = 0
    failed: int = 0
    failure_reasons: list[str] = field(default_factory=list)


class ReviewEnrichmentJob:
    def __init__(self, enricher: Any) -> None:
        self.enricher = enricher

    def run(
        self,
        repository: Any,
        submission_ids: list[str],
    ) -> ReviewEnrichmentJobSummary:
        summary = ReviewEnrichmentJobSummary()
        for submission_id in submission_ids:
            try:
                result: EnrichmentSummary = self.enricher.enrich_latest(
                    repository, submission_id
                )
            except Exception as exc:
                # One submission's failure must not abort the whole batch.
                summary.processed += 1
                summary.failed += 1
                summary.failure_reasons.append(
                    "enrichment failed for " + submission_id + ": " + str(exc)
                )
                continue
            summary.processed += 1
            summary.generated_queries += result.generated_queries
            summary.executed_searches += result.executed_searches
            summary.skipped_searches += result.skipped_searches
            summary.provider_failures += result.provider_failures
            summary.summary_failures += result.summary_failures
            summary.failed += result.failed
            summary.failure_reasons.extend(result.failure_reasons)
        return summary
13
src/rights_filter/jobs/usage_limits.py
Normal file
@@ -0,0 +1,13 @@
from rights_filter.integrations.external_policy import ExternalApiPolicy


class UsageLimit:
    def __init__(self, policy: ExternalApiPolicy) -> None:
        self.policy = policy

    def available(self) -> bool:
        allowed, _reason = self.policy.can_call()
        return allowed


__all__ = ["UsageLimit"]
1
src/rights_filter/server/__init__.py
Normal file
@@ -0,0 +1 @@
"""Local API server for the standalone Copyrighter operator console."""
44
src/rights_filter/server/__main__.py
Normal file
@@ -0,0 +1,44 @@
from __future__ import annotations

import argparse
import os
from pathlib import Path

from rights_filter.integrations.env_clients import build_provider_runtime
from rights_filter.server.env_file import load_env_file
from rights_filter.server.http_app import build_server
from rights_filter.server.image_store import LocalSubmissionImageStore
from rights_filter.server.sqlite_store import CopyrighterStore


def main() -> None:
    parser = argparse.ArgumentParser(description="Run the local Copyrighter API server.")
    parser.add_argument("--host", default="127.0.0.1")
    parser.add_argument("--port", type=int, default=9500)
    parser.add_argument("--db", default="data/copyrighter.sqlite3")
    parser.add_argument("--images", default="data/submissions")
    parser.add_argument("--static", default="web/operator-gui")
    parser.add_argument("--env", default=".env")
    args = parser.parse_args()

    load_env_file(Path(args.env), os.environ)
    provider_runtime = build_provider_runtime(os.environ)
    image_store = LocalSubmissionImageStore(Path(args.images))
    store = CopyrighterStore(Path(args.db), provider_runtime=provider_runtime)
    store.initialize()

    server = build_server(
        host=args.host,
        port=args.port,
        store=store,
        image_store=image_store,
        static_dir=Path(args.static),
    )
    print(f"Copyrighter API server listening on http://{args.host}:{args.port}")
    print(f"SQLite DB: {Path(args.db).resolve()}")
    print(f"Submission images: {Path(args.images).resolve()}")
    server.serve_forever()


if __name__ == "__main__":
    main()
31
src/rights_filter/server/env_file.py
Normal file
@@ -0,0 +1,31 @@
from __future__ import annotations

from pathlib import Path
from typing import MutableMapping


def load_env_file(path: Path | str, environ: MutableMapping[str, str]) -> None:
    env_path = Path(path)
    if not env_path.exists():
        return

    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("export "):
            line = line[7:].strip()
        if "=" not in line:
            continue

        key, value = line.split("=", 1)
        key = key.strip()
        if not key or key in environ:
            continue
        environ[key] = _strip_quotes(value.strip())


def _strip_quotes(value: str) -> str:
    if len(value) >= 2 and value[0] == value[-1] and value[0] in {"'", '"'}:
        return value[1:-1]
    return value
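The parsing rules in `load_env_file` (comments and malformed lines skipped, an optional `export ` prefix, matching quotes stripped, existing variables never overwritten) can be exercised without touching the filesystem. The sketch below is a hypothetical string-based variant, `apply_env_text`, written only for illustration; it is not part of the module.

```python
from typing import Dict, MutableMapping


def apply_env_text(text: str, environ: MutableMapping[str, str]) -> None:
    """Apply KEY=VALUE lines from `text` to `environ` (hypothetical helper)."""
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("export "):
            line = line[len("export "):].strip()
        if "=" not in line:
            continue  # malformed line: ignored rather than raising
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip()
        if not key or key in environ:
            continue  # values already present win over the file
        if len(value) >= 2 and value[0] == value[-1] and value[0] in {"'", '"'}:
            value = value[1:-1]
        environ[key] = value


sample = "\n".join(
    ['export API_KEY="abc"', "# comment", "PORT=9500", "BROKEN", "NAME='x y'"]
)
env: Dict[str, str] = {"PORT": "8000"}
apply_env_text(sample, env)
print(env)
```

Letting pre-set variables win means a value exported in the shell overrides the `.env` file, which is the usual precedence for local development.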
293
src/rights_filter/server/http_app.py
Normal file
|
|
@@ -0,0 +1,293 @@
from __future__ import annotations

import base64
import json
import mimetypes
import re
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path
from urllib.parse import unquote, urlparse

from rights_filter.server.image_store import LocalSubmissionImageStore, SUPPORTED_IMAGE_SUFFIXES
from rights_filter.server.sqlite_store import CopyrighterStore


def build_server(
    host: str,
    port: int,
    store: CopyrighterStore,
    image_store: LocalSubmissionImageStore,
    static_dir: Path | str,
) -> ThreadingHTTPServer:
    static_root = Path(static_dir).resolve()

    class CopyrighterHandler(BaseHTTPRequestHandler):
        server_version = "CopyrighterHTTP/0.1"

        def do_GET(self) -> None:
            path = _path(self.path)
            active_store = lambda: store.active_submission_image_store(image_store.root)  # noqa: E731 - lazy: only opens a DB connection on routes that use it
            try:
                if path == "/health":
                    self._json({"status": "ok", "port": self.server.server_port})
                elif path == "/api/providers/health":
                    self._json({"status": "ok", "providers": store.providers()})
                elif path == "/api/bootstrap":
                    self._json(store.bootstrap())
                elif path == "/api/review-queue":
                    self._json(store.bootstrap()["submissions"])
                elif path.startswith("/api/submissions/") and path.endswith("/review"):
                    submission_id = unquote(path.split("/")[3])
                    self._json(store.review(submission_id))
                elif path == "/api/providers":
                    self._json(store.providers())
                elif path == "/api/audit-events":
                    self._json(store.audit_events())
                elif path.startswith("/media/"):
                    self._file(active_store().media_path(unquote(path.removeprefix("/media/"))), untrusted=True)
                elif path.startswith("/knowledge-media/"):
                    self._file(store.knowledge_media_path(unquote(path.removeprefix("/knowledge-media/"))), untrusted=True)
                elif path.startswith("/collected-media/"):
                    self._file(store.collected_media_path(unquote(path.removeprefix("/collected-media/"))), untrusted=True)
                else:
                    self._static(path, static_root)
            except KeyError:
                self._json({"error": "not found"}, HTTPStatus.NOT_FOUND)
            except ValueError as exc:
                self._json({"error": str(exc)}, HTTPStatus.BAD_REQUEST)

        def do_POST(self) -> None:
            path = _path(self.path)
            active_store = lambda: store.active_submission_image_store(image_store.root)  # noqa: E731 - lazy: only opens a DB connection on routes that use it
            try:
                body = self._body()
                if path.startswith("/api/submissions/") and path.endswith("/decision"):
                    submission_id = unquote(path.split("/")[3])
                    self._json(
                        store.record_decision(
                            submission_id,
                            str(body.get("decision", "")),
                            str(body.get("memo", "")),
                            active_store(),
                        )
                    )
                elif path.startswith("/api/submissions/") and path.endswith("/search-auto"):
                    submission_id = unquote(path.split("/")[3])
                    self._json(store.run_auto_search(submission_id, active_store()))
                elif path.startswith("/api/evidence/") and path.endswith("/status"):
                    evidence_id = unquote(path.split("/")[3])
                    self._json(
                        store.mark_evidence_status(
                            str(body.get("submission_id", "")),
                            evidence_id,
                            str(body.get("status", "")),
                            str(body.get("note", "")),
                        )
                    )
                elif path.startswith("/api/submissions/") and path.endswith("/rerun-enrichment"):
                    submission_id = unquote(path.split("/")[3])
                    self._json(store.rerun_enrichment(submission_id, active_store()))
                elif path == "/api/submissions/reload":
                    imported = store.seed_from_image_store(active_store())
                    payload = store.bootstrap()
                    payload["imported"] = imported
                    self._json(payload)
                elif path == "/api/submissions/import-folder":
                    folder = str(body.get("folder", "") or body.get("path", "")).strip()
                    if not folder:
                        raise ValueError("submission folder path is required")
                    folder_path = Path(folder)
                    if not folder_path.exists():
                        raise ValueError("submission folder does not exist")
                    imported = store.seed_from_image_store(LocalSubmissionImageStore(folder_path))
                    payload = store.bootstrap()
                    payload["imported"] = imported
                    self._json(payload)
                elif path == "/api/submissions/upload-image":
                    active = active_store()
                    uploaded_id = _save_submission_upload(active, body.get("image"))
                    imported = store.seed_from_image_store(active)
                    payload = store.bootstrap()
                    payload["imported"] = imported
                    payload["uploadedSubmissionId"] = uploaded_id
                    self._json(payload)
                elif path == "/api/search/manual":
                    self._json(
                        store.manual_search(
                            str(body.get("submission_id", "")),
                            str(body.get("provider", "")),
                            str(body.get("query", "")),
                            active_store(),
                        )
                    )
                elif path == "/api/knowledge/manual":
                    store.register_manual_knowledge_entry(body)
                    self._json(store.bootstrap())
                elif path == "/api/collections/keyword":
                    self._json(
                        store.collect_keyword_candidates(
                            str(body.get("query", "")),
                            str(body.get("provider", "naver")),
                        )
                    )
                elif path == "/api/collections/candidates/promote-batch":
                    self._json(store.promote_collection_candidates(body))
                elif path.startswith("/api/collections/candidates/") and path.endswith("/promote"):
                    candidate_id = unquote(path.split("/")[4])
                    self._json(store.promote_collection_candidate(candidate_id, body))
                elif path.startswith("/api/knowledge/") and path.endswith("/promote-watchlist"):
                    entry_id = unquote(path.split("/")[3])
                    self._json(store.promote_watchlist_entry(entry_id))
                elif path.startswith("/api/knowledge/") and path.endswith("/exclude-watchlist"):
                    entry_id = unquote(path.split("/")[3])
                    self._json(store.exclude_watchlist_entry(entry_id, str(body.get("reason", ""))))
                elif path == "/api/providers/emergency-disable":
                    self._json(store.emergency_disable_external_providers())
                else:
                    self._json({"error": "not found"}, HTTPStatus.NOT_FOUND)
            except KeyError:
                self._json({"error": "not found"}, HTTPStatus.NOT_FOUND)
            except ValueError as exc:
                self._json({"error": str(exc)}, HTTPStatus.BAD_REQUEST)

        def do_PATCH(self) -> None:
            path = _path(self.path)
            try:
                body = self._body()
                if path.startswith("/api/providers/"):
                    provider_id = path.split("/")[3]
                    enabled = body.get("enabled")
                    if enabled is None:
                        current = next(
                            (provider for provider in store.providers() if provider["id"] == provider_id),
                            None,
                        )
                        if current is None:
                            raise KeyError(provider_id)
                        enabled = not current["enabled"]
                    self._json(store.set_provider_enabled(provider_id, bool(enabled)))
                else:
                    self._json({"error": "not found"}, HTTPStatus.NOT_FOUND)
            except KeyError:
                self._json({"error": "not found"}, HTTPStatus.NOT_FOUND)
            except ValueError as exc:
                self._json({"error": str(exc)}, HTTPStatus.BAD_REQUEST)

        def log_message(self, format: str, *args: object) -> None:
            return

        def _body(self) -> dict[str, object]:
            length = int(self.headers.get("Content-Length", "0") or "0")
            if not length:
                return {}
            return json.loads(self.rfile.read(length).decode("utf-8"))

        def _json(self, payload: object, status: HTTPStatus = HTTPStatus.OK) -> None:
            data = json.dumps(payload, ensure_ascii=False).encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json; charset=utf-8")
            self.send_header("Content-Length", str(len(data)))
            self.send_header("Access-Control-Allow-Origin", "*")
            self.end_headers()
            self.wfile.write(data)

        def _static(self, path: str, root: Path) -> None:
            relative = "index.html" if path in {"", "/"} else path.lstrip("/")
            target = (root / relative).resolve()
            if target != root and root not in target.parents:
                self._json({"error": "not found"}, HTTPStatus.NOT_FOUND)
                return
            if not target.exists() or target.is_dir():
                self._json({"error": "not found"}, HTTPStatus.NOT_FOUND)
                return
            self._file(target)

        def _file(self, path: Path, untrusted: bool = False) -> None:
            if not path.exists() or path.is_dir():
                self._json({"error": "not found"}, HTTPStatus.NOT_FOUND)
                return
            data = path.read_bytes()
            content_type = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
            if content_type.startswith("text/") or content_type in {"application/javascript", "application/json"}:
                content_type = f"{content_type}; charset=utf-8"
            self.send_response(HTTPStatus.OK)
            self.send_header("Content-Type", content_type)
            self.send_header("Content-Length", str(len(data)))
            if untrusted:
                # Neutralize stored XSS from operator-uploaded / externally
                # collected media (an SVG can carry an inline <script>). `sandbox`
                # blocks script execution on direct navigation while still
                # allowing <img> rendering (CSP does not apply to images embedded
                # via <img>). Only applied to untrusted media — never to the
                # trusted static app, whose own scripts must run.
                self.send_header("X-Content-Type-Options", "nosniff")
                self.send_header(
                    "Content-Security-Policy",
                    "default-src 'none'; style-src 'unsafe-inline'; img-src 'self' data:; sandbox",
                )
            self.end_headers()
            self.wfile.write(data)

    return ThreadingHTTPServer((host, port), CopyrighterHandler)


def _path(raw_path: str) -> str:
    return urlparse(raw_path).path


def _save_submission_upload(image_store: LocalSubmissionImageStore, raw_image: object) -> str:
    if not isinstance(raw_image, dict):
        raise ValueError("image is required")
    filename = Path(str(raw_image.get("filename", "") or "")).name
    if not filename:
        raise ValueError("image filename is required")
    suffix = Path(filename).suffix.lower()
    if not suffix:
        suffix = _suffix_for_content_type(str(raw_image.get("content_type", "")))
        filename = f"{filename}{suffix}"
    if suffix not in SUPPORTED_IMAGE_SUFFIXES:
        raise ValueError("unsupported submission image type")
    try:
        content = base64.b64decode(str(raw_image.get("data", "")), validate=True)
    except Exception as exc:
        raise ValueError("image data must be base64") from exc
    if not content:
        raise ValueError("image data is empty")

    image_dir = image_store.root / "images"
    image_dir.mkdir(parents=True, exist_ok=True)
    safe_name = _safe_upload_name(filename)
    target = _unique_upload_path(image_dir, safe_name)
    target.write_bytes(content)
    return target.stem


def _safe_upload_name(filename: str) -> str:
    path = Path(filename)
    stem = re.sub(r"[^0-9A-Za-z_.-]+", "-", path.stem).strip(".-") or "submission"
    return f"{stem}{path.suffix.lower()}"


def _unique_upload_path(directory: Path, filename: str) -> Path:
    candidate = directory / filename
    if not candidate.exists():
        return candidate
    path = Path(filename)
    for index in range(2, 1000):
        candidate = directory / f"{path.stem}-{index}{path.suffix}"
        if not candidate.exists():
            return candidate
    raise ValueError("too many files with the same submission image name")


def _suffix_for_content_type(content_type: str) -> str:
    return {
        "image/jpeg": ".jpg",
        "image/png": ".png",
        "image/gif": ".gif",
        "image/webp": ".webp",
        "image/bmp": ".bmp",
        "image/svg+xml": ".svg",
        "image/avif": ".avif",
    }.get(content_type.lower(), "")
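The upload naming helpers in `http_app.py` (`_safe_upload_name` and `_unique_upload_path`) determine the submission id that `_save_submission_upload` returns, since that id is the stem of the written file. The following is a minimal standalone sketch of the same two steps, with the helper bodies copied from the diff and renamed without the leading underscore; the sample filenames and the temporary directory are assumptions made for the illustration.

```python
import re
import tempfile
from pathlib import Path


def safe_upload_name(filename: str) -> str:
    # Collapse every run of characters outside [0-9A-Za-z_.-] into one "-".
    path = Path(filename)
    stem = re.sub(r"[^0-9A-Za-z_.-]+", "-", path.stem).strip(".-") or "submission"
    return f"{stem}{path.suffix.lower()}"


def unique_upload_path(directory: Path, filename: str) -> Path:
    # Append -2, -3, ... until an unused name is found.
    candidate = directory / filename
    if not candidate.exists():
        return candidate
    path = Path(filename)
    for index in range(2, 1000):
        candidate = directory / f"{path.stem}-{index}{path.suffix}"
        if not candidate.exists():
            return candidate
    raise ValueError("too many files with the same submission image name")


def demo() -> list:
    names = []
    with tempfile.TemporaryDirectory() as tmp:
        directory = Path(tmp)
        # Hypothetical uploads: a repeated name and a non-ASCII name.
        for raw in ["smoke upload.svg", "smoke upload.svg", "사진.PNG"]:
            target = unique_upload_path(directory, safe_upload_name(raw))
            target.write_bytes(b"x")
            names.append(target.name)
    return names


if __name__ == "__main__":
    print(demo())
```

One consequence worth noting: a stem made only of non-ASCII characters reduces to the fallback `submission`, so several such uploads are told apart only by the numeric suffix.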
129
src/rights_filter/server/image_store.py
Normal file
@@ -0,0 +1,129 @@
from __future__ import annotations

import json
from pathlib import Path
from typing import Any

from rights_filter.analysis.preprocessing import ImagePayload

SUPPORTED_IMAGE_SUFFIXES = {
    ".avif",
    ".bmp",
    ".gif",
    ".jpeg",
    ".jfif",
    ".jpg",
    ".pjp",
    ".pjpeg",
    ".png",
    ".svg",
    ".webp",
}


class LocalSubmissionImageStore:
    def __init__(self, root: Path | str, public_prefix: str = "/media") -> None:
        self.root = Path(root).resolve()
        self.public_prefix = public_prefix.rstrip("/")

    def submission_records(self) -> list[dict[str, Any]]:
        manifest = self.root / "submissions.json"
        if manifest.exists():
            records = json.loads(manifest.read_text(encoding="utf-8"))
            normalized = [
                self._normalize_manifest_record(record)
                for record in records
                if isinstance(record, dict)
                and record.get("file")
                and self._safe_path(str(record["file"])).exists()
            ]
            return self._with_unlisted_files(normalized)
        return self._scan_records()

    def image_payload(self, submission_id: str) -> ImagePayload:
        record = self._record_for(submission_id)
        path = self._safe_path(record["file"])
        return ImagePayload(
            content=path.read_bytes(),
            width=int(record["width"]),
            height=int(record["height"]),
            metadata={
                "submission_id": submission_id,
                "title": str(record.get("title", submission_id)),
                "file": str(record["file"]),
                "format": str(record.get("format", "")),
            },
        )

    def media_path(self, relative_path: str) -> Path:
        return self._safe_path(relative_path)

    def _record_for(self, submission_id: str) -> dict[str, Any]:
        for record in self.submission_records():
            if record["id"] == submission_id:
                return record
        raise KeyError(submission_id)

    def _normalize_manifest_record(self, record: dict[str, Any]) -> dict[str, Any]:
        path = self._safe_path(str(record["file"]))
        normalized = dict(record)
        normalized["file"] = str(Path(str(record["file"])).as_posix())
        normalized.setdefault("id", path.stem)
        normalized.setdefault("title", normalized["id"])
        normalized.setdefault("width", 1)
        normalized.setdefault("height", 1)
        normalized.setdefault("submitted_at", "")
        normalized["asset"] = f"{self.public_prefix}/{normalized['file']}"
        normalized["format"] = path.suffix.lstrip(".").upper() or "FILE"
        return normalized

    def _scan_records(self) -> list[dict[str, Any]]:
        if not self.root.exists():
            return []
        records: list[dict[str, Any]] = []
        for path in sorted(self.root.rglob("*")):
            if path.name == "submissions.json" or not path.is_file():
                continue
            if path.suffix.lower() not in SUPPORTED_IMAGE_SUFFIXES:
                continue
            relative = path.relative_to(self.root).as_posix()
            width, height = _image_size(path)
            records.append(
                {
                    "id": path.stem,
                    "title": path.stem,
                    "file": relative,
                    "asset": f"{self.public_prefix}/{relative}",
                    "width": width,
                    "height": height,
                    "submitted_at": "",
                    "format": path.suffix.lstrip(".").upper(),
                }
            )
        return records

    def _with_unlisted_files(self, records: list[dict[str, Any]]) -> list[dict[str, Any]]:
        seen_ids = {str(record["id"]) for record in records}
        seen_files = {str(record["file"]) for record in records}
        merged = list(records)
        for record in self._scan_records():
            if record["id"] in seen_ids or record["file"] in seen_files:
                continue
            merged.append(record)
        return merged

    def _safe_path(self, relative_path: str) -> Path:
        path = (self.root / relative_path).resolve()
        if path != self.root and self.root not in path.parents:
            raise ValueError("manifest path points outside image store")
        return path


def _image_size(path: Path) -> tuple[int, int]:
    try:
        from PIL import Image

        with Image.open(path) as image:
            return int(image.width), int(image.height)
    except Exception:
        return 1, 1
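`LocalSubmissionImageStore._safe_path` (and the equivalent check in `_static` in `http_app.py`) guards against path traversal by resolving the joined path and then requiring the store root to be the path itself or one of its parents. A minimal sketch of that containment check as a free function, assuming a temporary directory as the root and a few invented request paths:

```python
import tempfile
from pathlib import Path


def safe_path(root: Path, relative_path: str) -> Path:
    # Resolve first so that ".." segments and absolute inputs are normalized,
    # then accept only the root itself or something beneath it.
    path = (root / relative_path).resolve()
    if path != root and root not in path.parents:
        raise ValueError("manifest path points outside image store")
    return path


def demo() -> list:
    results = []
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp).resolve()
        # Hypothetical inputs: one legitimate path and three escape attempts.
        for candidate in [
            "images/a.png",
            "../outside.png",
            "images/../../etc/passwd",
            "/etc/passwd",
        ]:
            try:
                safe_path(root, candidate)
                results.append(True)
            except ValueError:
                results.append(False)
    return results


if __name__ == "__main__":
    print(demo())
```

The root is resolved once up front (as `__init__` does in the diff), which matters when the configured directory is itself reached through a symlink: comparing a resolved candidate against an unresolved root would reject valid files.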
4933
src/rights_filter/server/sqlite_store.py
Normal file
File diff suppressed because it is too large
191
tests/operator_gui/test_browser_smoke.py
Normal file
@@ -0,0 +1,191 @@
from __future__ import annotations

import json
from pathlib import Path
import sys
from threading import Thread

import pytest


ROOT = Path(__file__).resolve().parents[2]
sys.path.insert(0, str(ROOT / "src"))
APP_DIR = ROOT / "web" / "operator-gui"

from rights_filter.server.http_app import build_server
from rights_filter.server.image_store import LocalSubmissionImageStore
from rights_filter.server.sqlite_store import CopyrighterStore


def _start(server):
    thread = Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return thread


def _browser_or_skip(playwright):
    try:
        return playwright.chromium.launch(headless=True)
    except Exception as exc:
        pytest.skip(f"Playwright Chromium is not available: {exc}")


def _bootstrap_payload():
    return {
        "submissions": [
            {
                "id": "SUB-SMOKE1",
                "title": "Smoke sample",
                "asset": "/assets/case-portrait.svg",
                "riskScore": 42,
                "riskBand": "medium",
                "submittedAt": "2026-06-03 10:00",
                "submittedEpoch": 1780452000,
                "lastAnalysis": "2026-06-03 10:01",
                "applicantStatus": "검토 중",
                "decisionStatus": "unreviewed",
                "reasons": ["Naver search returned no results"],
                "providerState": {"internal": "ok", "naver": "empty", "google": "disabled", "llm": "pending"},
                "fileFacts": {"size": "320 x 240", "format": "SVG", "submitted": "2026-06-03 10:00", "analysis": "v1"},
                "derivativeNote": "브라우저 smoke test submission",
                "recommendation": {"label": "운영자 검토 필요", "detail": "검색 근거가 부족합니다."},
                "derivedPreview": {"automatic": False, "entryName": "Smoke sample", "effect": "보강 검색 필요"},
                "queryHistory": [
                    {
                        "provider": "naver",
                        "query": "Smoke sample official",
                        "status": "empty",
                        "timestamp": "2026-06-03 10:01",
                        "count": 0,
                    }
                ],
                "similar": [{"asset": "/assets/case-portrait.svg", "label": "local submission"}],
                "evidence": [
                    {
                        "id": "ev-smoke-empty",
                        "group": "naver",
                        "source": "naver",
                        "title": "Naver search returned no results",
                        "confidence": 0,
                        "query": "Smoke sample official",
                        "domain": "naver",
                        "url": "",
                        "imageUrl": "",
                        "thumbnailUrl": "",
                        "pageTitle": "",
                        "matchType": "empty",
                        "rank": "",
                        "providerScore": 0,
                        "retrievedAt": "2026-06-03 10:01",
                        "contributed": False,
                        "sourceEvidenceIds": [],
                        "status": "active",
                    }
                ],
            }
        ],
        "submissionQueue": {"label": "smoke", "folderPath": "smoke", "isActive": True},
        "providers": [{"id": "naver", "name": "Naver", "enabled": True, "quota": 100, "usage": 0, "status": "ok"}],
        "knowledgeEntries": [],
        "collectionCandidates": [],
        "corrections": [],
        "auditEvents": [],
        "coverageThresholds": {"coverageGoodRate": 70, "coverageWarnRate": 40, "queryGoodRate": 70, "queryWarnRate": 40},
        "searchCoverage": {
            "submissions": {"total": 1, "coverageSubmissions": 0},
            "queries": {"failed": 0},
            "providers": [{"id": "naver", "name": "Naver", "queryEntries": 1, "evidenceSubmissions": 0}],
        },
    }


def test_browser_smoke_suggested_query_fills_manual_query_without_running_search(tmp_path: Path):
    playwright = pytest.importorskip("playwright.sync_api")
    image_root = tmp_path / "submissions"
    image_root.mkdir()
    store = CopyrighterStore(tmp_path / "copyrighter.sqlite3")
    store.initialize()
    server = build_server(
        host="127.0.0.1",
        port=0,
        store=store,
        image_store=LocalSubmissionImageStore(image_root),
        static_dir=APP_DIR,
    )
    _start(server)
    base = f"http://127.0.0.1:{server.server_port}"

    try:
        with playwright.sync_playwright() as pw:
            browser = _browser_or_skip(pw)
            page = browser.new_page(viewport={"width": 1280, "height": 900})
            browser_errors = []
            page.on("console", lambda message: browser_errors.append(f"console:{message.type}:{message.text}"))
            page.on("pageerror", lambda error: browser_errors.append(f"pageerror:{error}"))
            page.route(
                "**/api/bootstrap",
                lambda route: route.fulfill(
                    status=200,
                    content_type="application/json",
                    body=json.dumps(_bootstrap_payload(), ensure_ascii=False),
                ),
            )
            page.goto(base + "/")
            try:
                page.wait_for_selector('#queue-body [data-select-case="SUB-SMOKE1"]', timeout=5000)
            except Exception as exc:
                pytest.fail(f"{exc}\nerrors={browser_errors}\nbody={page.locator('body').inner_text()[:1000]}")
            page.get_by_role("button", name="SUB-SMOKE1").click()
            page.get_by_role("button", name="Smoke sample 저작권").click()

            assert page.locator('[data-workbench-panel="queries"]').is_visible()
            assert page.locator("#manual-query").input_value() == "Smoke sample 저작권"
            assert page.locator("#manual-query-status").inner_text() == "추천 쿼리를 입력했습니다. 실행 버튼을 눌러 검색하세요.", browser_errors

            browser.close()
    finally:
        server.shutdown()


def test_browser_uploads_image_and_selects_new_submission(tmp_path: Path):
    playwright = pytest.importorskip("playwright.sync_api")
    image_root = tmp_path / "submissions"
    image_root.mkdir()
    upload_file = tmp_path / "smoke upload.svg"
    upload_file.write_text("<svg xmlns='http://www.w3.org/2000/svg' width='80' height='60'></svg>", encoding="utf-8")
    store = CopyrighterStore(tmp_path / "copyrighter.sqlite3")
    store.initialize()
    server = build_server(
        host="127.0.0.1",
        port=0,
        store=store,
        image_store=LocalSubmissionImageStore(image_root),
        static_dir=APP_DIR,
    )
    _start(server)
    base = f"http://127.0.0.1:{server.server_port}"

    try:
        with playwright.sync_playwright() as pw:
            browser = _browser_or_skip(pw)
            page = browser.new_page(viewport={"width": 1280, "height": 900})
            browser_errors = []
            page.on("console", lambda message: browser_errors.append(f"console:{message.type}:{message.text}"))
            page.on("pageerror", lambda error: browser_errors.append(f"pageerror:{error}"))

            page.goto(base + "/")
            page.set_input_files("#submission-image", str(upload_file))
            assert page.locator("#submission-image-name").inner_text() == "smoke upload.svg"

            page.get_by_role("button", name="사진 넣기").click()
            page.wait_for_selector('#queue-body [data-select-case="smoke-upload"]', state="attached", timeout=10000)
            page.wait_for_selector("#workbench-view", state="visible", timeout=10000)

            assert (image_root / "images" / "smoke-upload.svg").exists()
            assert "smoke-upload 사진이 추가되었습니다. 새 심사 건으로 바로 선택했습니다." in page.locator("#submission-import-status").inner_text()
            assert page.locator("#case-title").inner_text() == "smoke-upload · smoke-upload"
            assert not browser_errors

            browser.close()
    finally:
        server.shutdown()
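Both browser tests rely on the same fixture pattern: bind the server to port `0` so the OS picks a free port, run `serve_forever` in a daemon thread, read the real port back from `server.server_port`, and call `shutdown()` in a `finally` block. A minimal standalone sketch of that pattern using only the standard library follows; `HealthHandler` is a hypothetical stand-in for the handler that `build_server` creates, and the request goes through `http.client` instead of a browser.

```python
import json
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from threading import Thread


class HealthHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that mimics only the /health response shape."""

    def do_GET(self) -> None:
        data = json.dumps({"status": "ok", "port": self.server.server_port}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, format: str, *args: object) -> None:
        return  # keep test output quiet, as the handler in the diff does


def demo() -> dict:
    # Port 0 asks the OS for any free port; the chosen one is in server_port.
    server = ThreadingHTTPServer(("127.0.0.1", 0), HealthHandler)
    Thread(target=server.serve_forever, daemon=True).start()
    try:
        connection = HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        connection.request("GET", "/health")
        payload = json.loads(connection.getresponse().read().decode("utf-8"))
        connection.close()
        return payload
    finally:
        server.shutdown()      # stops serve_forever
        server.server_close()  # releases the listening socket


if __name__ == "__main__":
    print(demo())
```

The sketch additionally calls `server_close()` after `shutdown()`; the tests in the diff only call `shutdown()`, which stops the loop but leaves the socket to be reclaimed when the process exits.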
782
tests/operator_gui/test_static_workbench.py
Normal file
@ -0,0 +1,782 @@
|
||||||
|
import json
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
|
||||||
|
ROOT = Path(__file__).resolve().parents[2]
|
||||||
|
APP_DIR = ROOT / "web" / "operator-gui"
|
||||||
|
INDEX = APP_DIR / "index.html"
|
||||||
|
STYLES = APP_DIR / "styles.css"
|
||||||
|
APP_JS = APP_DIR / "app.js"
|
||||||
|
OPERATOR_LABELS_JS = APP_DIR / "operator-labels.js"
|
||||||
|
SUBMISSION_IMPORT_JS = APP_DIR / "submission-import.js"
|
||||||
|
EVIDENCE_GUIDANCE_JS = APP_DIR / "evidence-guidance.js"
|
||||||
|
OPERATOR_SEARCH_JS = APP_DIR / "operator-search.js"
|
||||||
|
PITCH = APP_DIR / "pitch.html"
|
||||||
|
PITCH_STYLES = APP_DIR / "pitch.css"
|
||||||
|
PITCH_JS = APP_DIR / "pitch.js"
|
||||||
|
UI_OVERHAUL_FINAL_RESULTS = ROOT / "data" / "logs" / "ui-overhaul-final-results.json"
|
||||||
|
UI_OVERHAUL_FINAL_SCREENSHOTS = [
|
||||||
|
"ui-overhaul-desktop-final-queue.png",
|
||||||
|
"ui-overhaul-desktop-final-workbench-queries.png",
|
||||||
|
"ui-overhaul-desktop-final-knowledge-db.png",
|
||||||
|
"ui-overhaul-desktop-final-knowledge-corrections.png",
|
||||||
|
"ui-overhaul-mobile-final-queue.png",
|
||||||
|
"ui-overhaul-mobile-final-workbench-queries.png",
|
||||||
|
"ui-overhaul-mobile-final-knowledge-db.png",
|
||||||
|
"ui-overhaul-mobile-final-knowledge-corrections.png",
|
||||||
|
]
|
||||||
|
|
||||||
|
|
||||||
|
def _read(path: Path) -> str:
|
||||||
|
return path.read_text(encoding="utf-8")
|
||||||
|
|
||||||
|
|
||||||
|
def test_operator_workbench_files_exist():
|
||||||
|
assert INDEX.exists()
|
||||||
|
assert STYLES.exists()
|
||||||
|
assert APP_JS.exists()
|
||||||
|
assert OPERATOR_LABELS_JS.exists()
|
||||||
|
assert SUBMISSION_IMPORT_JS.exists()
|
||||||
|
assert EVIDENCE_GUIDANCE_JS.exists()
|
||||||
|
assert OPERATOR_SEARCH_JS.exists()
|
||||||
|
|
||||||
|
|
||||||
|
def test_operator_gui_text_does_not_include_mojibake_fragments():
|
||||||
|
suspect_fragments = [
|
||||||
|
"Â",
|
||||||
|
"Ã",
|
||||||
|
"<EFBFBD>",
|
||||||
|
"ê",
|
||||||
|
"ë",
|
||||||
|
"ì",
|
||||||
|
"í",
|
||||||
|
"?",
|
||||||
|
"?¤",
|
||||||
|
"?",
|
||||||
|
"?´",
|
||||||
|
"?¸",
|
||||||
|
"?",
|
||||||
|
"?¬",
|
||||||
|
"?",
|
||||||
|
"?",
|
||||||
|
]
|
||||||
|
|
||||||
|
for path in [INDEX, APP_JS, OPERATOR_LABELS_JS, SUBMISSION_IMPORT_JS, EVIDENCE_GUIDANCE_JS, OPERATOR_SEARCH_JS]:
|
||||||
|
text = _read(path)
|
||||||
|
for fragment in suspect_fragments:
|
||||||
|
assert fragment not in text, f"{path.name} contains mojibake fragment {fragment!r}"
|
||||||
|
|
||||||
|
|
||||||
|
def test_pitch_page_presents_copyright_review_flow_with_real_captures():
|
||||||
|
html = _read(PITCH)
|
||||||
|
styles = _read(PITCH_STYLES)
|
||||||
|
script = _read(PITCH_JS)
|
||||||
|
|
||||||
|
for path in [PITCH, PITCH_STYLES, PITCH_JS]:
|
||||||
|
assert path.exists()
|
||||||
|
|
||||||
|
for required_text in [
|
||||||
|
"이미지 저작권 심사를",
|
||||||
|
"판별 방식",
|
||||||
|
"운영 화면",
|
||||||
|
"DB 성장",
|
||||||
|
"거버넌스",
|
||||||
|
"Mermaid 원본 보기",
|
||||||
|
"실제 9500 운영 콘솔",
|
||||||
|
"Google",
|
||||||
|
"Naver",
|
||||||
|
"Ollama",
|
||||||
|
]:
|
||||||
|
assert required_text in html
|
||||||
|
|
||||||
|
for asset_name in [
|
||||||
|
"case-review.png",
|
||||||
|
"evidence-search.png",
|
||||||
|
"knowledge-db.png",
|
||||||
|
"provider-controls.png",
|
||||||
|
"risk-pipeline.svg",
|
||||||
|
"decision-loop.svg",
|
||||||
|
]:
|
||||||
|
assert (APP_DIR / "pitch-assets" / asset_name).exists()
|
||||||
|
assert f"pitch-assets/{asset_name}" in html
|
||||||
|
|
||||||
|
assert "hydrateMetrics" in script
|
||||||
|
assert "overflow-x: hidden" in styles
|
||||||
|
assert "@media (max-width: 640px)" in styles
|
||||||
|
|
||||||
|
|
def test_workbench_shell_exposes_all_internal_operator_views():
    html = _read(INDEX)
    styles = _read(STYLES)

    assert "<nav" in html
    assert "<main" in html
    assert 'data-internal-only="true"' in html
    assert 'class="product-purpose"' in html
    assert 'aria-label="제품 목적"' in html
    assert "이미지 저작권 위험 심사" in html
    assert "제출 이미지, 외부 검색 근거, 내부 기준 DB를 한 화면에서 검토합니다." in html
    assert ".product-purpose" in styles
    assert 'class="operator-workflow"' in html
    assert 'aria-label="운영 흐름"' in html
    assert "심사 건 추가" in html
    assert "근거 보강" in html
    assert "운영 결정" in html
    assert "사진 추가 또는 제출 폴더 불러오기" in html
    assert "추천 쿼리로 외부 검색 결과를 보강합니다." in html
    assert "기준 DB에 반영합니다." in html
    assert ".operator-workflow" in styles
    assert "bulk-approve" not in html.lower()
    assert "bulk-reject" not in html.lower()

    assert 'data-view="evidence"' not in html
    assert 'id="evidence-view"' not in html
    assert 'data-view="corrections"' not in html
    assert 'id="corrections-view"' not in html

    for view in [
        "queue",
        "workbench",
        "knowledge",
        "providers",
        "audit",
    ]:
        assert f'data-view="{view}"' in html
        assert f'id="{view}-view"' in html


def test_static_app_models_core_review_operations():
    script = _read(APP_JS)

    for state_name in [
        "submissions",
        "providers",
        "knowledgeEntries",
        "collectionCandidates",
        "corrections",
        "auditEvents",
    ]:
        assert state_name in script

    for renderer in [
        "renderQueue",
        "renderCaseReview",
        "renderEvidenceSearch",
        "renderKnowledgeBase",
        "renderCollectionCandidates",
        "renderCorrections",
        "renderProviderControls",
        "renderAuditLog",
        "renderCoverageTabs",
    ]:
        assert f"function {renderer}" in script

    for operation in [
        "selectCase",
        "setDecision",
        "runManualSearch",
        "rerunEnrichment",
        "addKnowledgeEntry",
        "runCandidateCollection",
        "promoteCollectionCandidate",
        "deactivateCorrectionEntry",
        "toggleProvider",
    ]:
        assert f"function {operation}" in script


def test_operator_gui_extracts_submission_import_helpers_from_main_app():
    html = _read(INDEX)
    script = _read(APP_JS)
    labels = _read(OPERATOR_LABELS_JS)
    helpers = _read(SUBMISSION_IMPORT_JS)
    guidance = _read(EVIDENCE_GUIDANCE_JS)
    search = _read(OPERATOR_SEARCH_JS)

    assert html.index('src="operator-labels.js"') < html.index('src="submission-import.js"')
    assert html.index('src="submission-import.js"') < html.index('src="app.js"')
    assert html.index('src="evidence-guidance.js"') < html.index('src="app.js"')
    assert html.index('src="operator-search.js"') < html.index('src="app.js"')
    assert "OperatorLabels" in labels
    assert "riskLabels" in labels
    assert "providerLabels" in labels
    assert "} = window.OperatorLabels;" in script
    assert "OperatorSubmissionImport" in helpers
    assert "fileToImagePayload" in helpers
    assert "importedFolderStatusMessage" in helpers
    assert "importedSubmissionStatusMessage" in helpers
    assert "window.OperatorSubmissionImport.fileToImagePayload(file)" in script
    assert "window.OperatorSubmissionImport.importedFolderStatusMessage" in script
    assert "window.OperatorSubmissionImport.importedSubmissionStatusMessage" in script
    assert "OperatorEvidenceGuidance" in guidance
    assert "suggestedEvidenceQueries" in guidance
    assert "evidenceFollowupReasons" in guidance
    assert "window.OperatorEvidenceGuidance.suggestedEvidenceQueries" in script
    assert "window.OperatorEvidenceGuidance.evidenceFollowupReasons" in script
    assert "OperatorSearch" in search
    assert "formatQueryStatus" in search
    assert "formatQueryStrategy" in search
    assert "normalizeManualSearchProvider" in search
    assert "window.OperatorSearch.formatQueryStatus" in script
    assert "window.OperatorSearch.formatQueryStrategy" in script
    assert "window.OperatorSearch.normalizeManualSearchProvider" in script


def test_operator_gui_does_not_boot_with_demo_review_data():
    script = _read(APP_JS)

    for collection in [
        "submissions",
        "providers",
        "knowledgeEntries",
        "collectionCandidates",
        "corrections",
        "auditEvents",
    ]:
        assert f"const {collection} = [];" in script

    for demo_token in [
        "SUB-1007",
        "ev-1007",
        "kb-iu",
        "DEC-0992",
    ]:
        assert demo_token not in script

    assert "clearRuntimeData()" in script
    assert "renderNoSelectedCase" in script
    assert "state.apiError" in script


def test_design_contract_has_accessibility_risk_states_and_responsive_layouts():
    styles = _read(STYLES)

    assert ":focus-visible" in styles
    assert "@media (max-width: 980px)" in styles
    assert "@media (max-width: 680px)" in styles
    assert "linear-gradient" not in styles.lower()
    assert "orb" not in styles.lower()

    for class_name in [
        ".risk-high",
        ".risk-medium",
        ".risk-low",
        ".risk-failed",
        ".source-naver",
        ".source-google",
        ".source-llm",
        ".source-internal",
    ]:
        assert class_name in styles


def test_ui_overhaul_final_audit_has_no_overflow_and_required_screenshots():
    results = json.loads(_read(UI_OVERHAUL_FINAL_RESULTS))

    for breakpoint_results in results.values():
        for view_name, view_result in breakpoint_results.items():
            if not isinstance(view_result, dict) or "overflow" not in view_result:
                continue

            assert view_result["docW"] <= view_result["vw"], view_name
            assert view_result["overflow"] == [], view_name

    for screenshot_name in UI_OVERHAUL_FINAL_SCREENSHOTS:
        screenshot_path = UI_OVERHAUL_FINAL_RESULTS.parent / screenshot_name
        assert screenshot_path.exists(), screenshot_name
        assert screenshot_path.stat().st_size > 0, screenshot_name


def test_header_coverage_uses_horizontal_filter_tabs_not_scrolling_panel():
    html = _read(INDEX)
    styles = _read(STYLES)
    script = _read(APP_JS)

    assert 'id="coverage-tabs"' in html
    assert 'id="search-coverage"' not in html
    assert "function renderCoverageTabs" in script
    assert "function applyCoverageFilter" in script
    assert "data-coverage-filter" in script
    assert ".coverage-tabs" in styles
    assert ".coverage-tab" in styles
    assert ".search-coverage" not in styles


def test_queue_uses_grid_row_contract_for_dense_evidence_and_provider_columns():
    html = _read(INDEX)
    styles = _read(STYLES)
    script = _read(APP_JS)

    assert "queue-grid" in html
    assert "queue-row" in script
    assert ".queue-grid" in styles
    assert ".queue-row" in styles
    assert "grid-template-columns: 28px 64px minmax(104px, 0.68fr) 72px minmax(126px, 0.58fr) minmax(360px, 1.5fr) 82px 76px 90px" in styles
    assert "queue-submission-cell" in script
    assert "queue-risk-cell" in script
    assert "queue-time-cell" in script
    assert "queue-title" not in script


def test_visual_overhaul_removes_obsolete_header_cards_and_fixed_table_widths():
    styles = _read(STYLES)

    for obsolete_selector in [
        ".coverage-main",
        ".coverage-badges",
        ".coverage-provider-grid",
        ".coverage-mini-line",
    ]:
        assert obsolete_selector not in styles

    assert ".queue-table th:nth-child" not in styles
    assert ".queue-row td:nth-child(5)" in styles
    assert ".queue-row td:nth-child(6)" in styles


def test_case_workbench_owns_evidence_and_query_history_as_internal_tabs():
    html = _read(INDEX)
    script = _read(APP_JS)
    styles = _read(STYLES)

    assert 'id="workbench-view"' in html
    assert 'data-workbench-tab="summary"' not in html
    assert 'data-workbench-tab="evidence"' in html
    assert 'data-workbench-tab="queries"' in html
    assert 'data-workbench-tab="decision"' not in html
    assert 'data-workbench-panel="evidence"' in html
    assert 'data-workbench-panel="queries"' in html
    assert 'data-workbench-panel="decision"' not in html
    assert "근거 및 판단" in html
    assert 'id="query-history"' in html
    assert 'id="search-results"' in html
    assert "workbenchTab" in script
    assert 'workbenchTab: "evidence"' in script
    assert "function switchWorkbenchTab" in script
    assert 'switchView("workbench")' in script
    assert 'panelName === nextTab ||' not in script
    assert ".workbench-tabs" in styles
    assert ".workbench-panel" in styles


def test_case_workbench_tabs_do_not_share_the_same_review_layout():
    html = _read(INDEX)

    evidence_panel = html.split('data-workbench-panel="evidence"', 1)[1].split('data-workbench-panel="queries"', 1)[0]
    query_panel = html.split('data-workbench-panel="queries"', 1)[1].split('id="knowledge-view"', 1)[0]

    assert 'id="case-image"' in evidence_panel
    assert 'id="evidence-groups"' in evidence_panel
    assert evidence_panel.index('id="recommendation-box"') < evidence_panel.index('id="case-image"')
    assert evidence_panel.index('id="decision-memo"') < evidence_panel.index('id="case-image"')
    assert 'id="query-history"' in query_panel


def test_case_decision_controls_float_while_reviewing_evidence():
    html = _read(INDEX)
    script = _read(APP_JS)
    styles = _read(STYLES)

    assert "floating-decision-panel" in html
    assert 'id="floating-case-score"' in html
    assert "floating-case-score" in script
    assert ".floating-decision-panel" in styles
    assert "position: fixed" in styles
    assert "bottom: 24px" in styles
    assert 'data-workbench-panel="evidence"' in styles
    assert "padding-right: 334px" in styles
    assert ".evidence-layout" in styles
    assert "padding-bottom: 260px" in styles
    assert ".floating-decision-panel .decision-actions" in styles


def test_knowledge_base_rows_separate_title_metadata_chips_and_actions():
    script = _read(APP_JS)
    styles = _read(STYLES)

    assert "knowledge-main" in script
    assert "knowledge-chip-row" in script
    assert "knowledge-detail-line" in script
    assert "memoInline" in script
    assert "memoBlock" in script
    assert "knowledge-meta" in script
    assert ".knowledge-main" in styles
    assert ".knowledge-chip-row" in styles
    assert ".knowledge-detail-line" in styles
    assert ".knowledge-meta" in styles
    assert ".knowledge-actions" in styles
    assert "grid-template-areas" in styles


def test_correction_history_lives_inside_knowledge_database_panel():
    html = _read(INDEX)
    script = _read(APP_JS)
    styles = _read(STYLES)

    assert 'data-knowledge-tab="collect"' in html
    assert 'data-knowledge-tab="registered"' in html
    assert 'data-knowledge-tab="manual"' in html
    assert 'data-knowledge-tab="corrections"' in html
    assert 'data-knowledge-panel="collect"' in html
    assert 'data-knowledge-panel="registered"' in html
    assert 'data-knowledge-panel="manual"' in html
    assert 'data-knowledge-panel="corrections"' in html
    assert 'id="corrections-list"' in html
    assert "knowledgeTab" in script
    assert "function switchKnowledgeTab" in script
    assert 'switchView("knowledge")' in script
    assert 'switchKnowledgeTab("corrections")' in script
    assert ".knowledge-tabs" in styles
    assert ".knowledge-panel" in styles


def test_knowledge_collection_and_registered_entries_are_separate_tabs():
    html = _read(INDEX)

    collect_panel = html.split('data-knowledge-panel="collect"', 1)[1].split('data-knowledge-panel="registered"', 1)[0]
    registered_panel = html.split('data-knowledge-panel="registered"', 1)[1].split('data-knowledge-panel="manual"', 1)[0]
    manual_panel = html.split('data-knowledge-panel="manual"', 1)[1].split('data-knowledge-panel="corrections"', 1)[0]

    assert 'id="candidate-collection-form"' in collect_panel
    assert 'id="collection-candidates"' in collect_panel
    assert 'id="knowledge-list"' not in collect_panel
    assert 'id="knowledge-list"' in registered_panel
    assert 'id="candidate-collection-form"' not in registered_panel
    assert 'id="knowledge-form"' in manual_panel


def test_safety_rules_are_visible_in_ui_contract():
    html = _read(INDEX)
    script = _read(APP_JS)

    assert 'id="decision-memo"' in html
    assert 'id="manual-query-provider"' in html
    assert 'option value="google_search"' not in html
    assert "네이버" in html
    assert "reverse search" not in html.lower()
    assert "sourceEvidenceIds" in script
    assert "requiresMemo" in script
    assert "automatic" in script


def test_provider_quota_zero_does_not_render_nan_meter():
    script = _read(APP_JS)

    assert "provider.quota ? Math.min" in script
    assert "configuredEnv" in script
    assert "requiredEnv" in script


def test_query_history_rerun_preserves_original_search_provider():
    script = _read(APP_JS)
    search = _read(OPERATOR_SEARCH_JS)

    assert "data-rerun-provider" in script
    assert "rerunHistoricalQuery" in script
    assert "normalizeManualSearchProvider" in search
    assert "window.OperatorSearch.normalizeManualSearchProvider" in script
    assert 'providerSelect.value = provider' in script
    assert "NaN" not in script


def test_operator_gui_exposes_submission_reload_action():
    html = _read(INDEX)
    script = _read(APP_JS)

    assert 'id="submission-folder"' in html
    assert 'id="submission-image"' in html
    assert 'id="submission-image-name"' in html
    assert 'id="queue-upload-guidance"' in html
    assert 'id="upload-submission-image"' in html
    assert 'id="reload-submissions"' in html
    assert 'id="submission-import-status"' in html
    assert "사진을 추가하면 현재 큐에 새 심사 건으로 들어가고" in html
    assert "/api/submissions/reload" in script
    assert "/api/submissions/import-folder" in script
    assert "/api/submissions/upload-image" in script
    assert "uploadSubmissionImage" in script
    assert "updateSubmissionImageName" in script
    assert 'switchView("workbench")' in script
    assert 'switchWorkbenchTab("evidence")' in script
    assert "새 심사 건으로 바로 선택했습니다." in script
    assert "reloadSubmissions" in script


def test_operator_gui_uses_clear_korean_status_copy_for_queue_imports():
    script = _read(APP_JS) + _read(SUBMISSION_IMPORT_JS)

    for required_copy in [
        "제출 폴더를 읽는 중입니다.",
        "추가된 제출 없음",
        "건 가져옴",
        "건 추가됨",
        "선택됨",
        "재분석할 제출이 없습니다.",
        "건 재분석을 시작합니다.",
        "재분석 중",
        "건 재분석 완료",
        "사진을 현재 제출 폴더에 넣는 중입니다.",
        "사진이 추가되었습니다.",
    ]:
        assert required_copy in script


def test_operator_gui_bulk_rerun_uses_checked_queue_rows():
    script = _read(APP_JS)

    assert "function selectedBulkSubmissionIds" in script
    assert 'input[data-bulk-id]:checked' in script
    assert "function rerunSelectedEnrichment" in script
    assert 'document.getElementById("bulk-rerun").addEventListener("click", rerunSelectedEnrichment)' in script


def test_operator_gui_manual_knowledge_entry_accepts_reference_image():
    html = _read(INDEX)
    script = _read(APP_JS)

    assert 'id="knowledge-image"' in html
    assert 'id="knowledge-image-name"' in html
    assert 'type="file"' in html
    assert 'accept="image/*"' in html
    assert "updateKnowledgeImageName" in script
    assert "readKnowledgeImage" in script
    assert "/api/knowledge/manual" in script
    assert "imageAsset" in script
    assert "sampleFingerprints" in script


def test_operator_gui_exposes_keyword_candidate_collection_workflow():
    html = _read(INDEX)
    script = _read(APP_JS)
    search = _read(OPERATOR_SEARCH_JS)

    assert 'id="candidate-collection-form"' in html
    assert 'id="collection-query"' in html
    assert 'id="collection-provider"' in html
    assert 'option value="google_search"' not in html
    assert 'id="collection-candidates"' in html
    assert 'id="collection-status"' in html
    assert 'id="select-all-candidates"' in html
    assert 'id="clear-selected-candidates"' in html
    assert "/api/collections/keyword" in script
    assert "/api/collections/candidates/" in script
    assert "/api/collections/candidates/promote-batch" in script
    assert "바로 편입" in script
    assert "선택 후보 묶어서 편입" in html
    assert 'id="collection-promotion-form"' in html
    assert 'id="collection-promotion-name"' in html
    assert 'id="collection-promotion-type"' in html
    assert 'id="collection-promotion-aliases"' in html
    assert 'id="collection-promotion-keywords"' in html
    assert 'id="collection-promotion-memo"' in html
    assert 'id="promote-selected-candidates"' in html
    assert "selectedCollectionCandidateIds" in script
    assert "setAllCollectionCandidateSelection" in script
    assert "promoteSelectedCollectionCandidates" in script
    assert "clearCollectionCandidatesForSearch" in script
    assert "clearCollectionCandidatesAfterAction" in script
    assert "dismissCollectionCandidate" in script
    assert "currentCollectionQuery" in script
    assert "visibleCollectionCandidates" in script
    assert "data-collection-candidate-id" in script
    assert "data-promote-candidate" in script
    assert "data-dismiss-candidate" in script
    assert "candidate-actions" in script
    assert ".candidate-actions" in _read(STYLES)
    assert "candidate-card" in script
    assert "sourceClass(candidate.provider)" in script
    assert "formatCandidateSourceType" in script
    assert "검색 이미지 결과" in script
    assert "페이지 대표 이미지" in script
    assert "formatQueryStrategy" in search
    assert "window.OperatorSearch.formatQueryStrategy" in script
    assert "구글 페이지 제목 기반" in search
    assert "제출 제목/파일명 기반" in search


def test_google_custom_search_is_not_exposed_as_operator_choice():
    html = _read(INDEX)
    script = _read(APP_JS)

    assert "구글 맞춤 검색" not in html
    assert "구글 맞춤 검색" not in script
    assert 'value="google_search"' not in html
    assert "operatorSearchProviders" in script
    assert "visibleProviderControls" in script


def test_audit_target_and_change_columns_have_equal_widths():
    styles = _read(STYLES)

    assert "--audit-object-width: 24%" in styles
    assert ".audit-table th:nth-child(4)" in styles
    assert ".audit-table th:nth-child(5)" in styles
    assert "width: var(--audit-object-width)" in styles


def test_queue_provider_judgments_render_on_one_line_with_narrower_reason_column():
    script = _read(APP_JS)
    styles = _read(STYLES)

    assert "provider-strip" in script
    assert "queue-provider-chip" in script
    assert "formatQueueProviderStatus" in script
    assert "근거 있음" in script
    assert "결과 없음" in script
    assert "미실행" in script
    assert ".queue-provider-strip" in styles
    assert "flex-wrap: nowrap" in styles
    assert ".queue-row td:nth-child(6)" in styles
    assert "white-space: nowrap" in styles
    assert "grid-template-columns: 28px 64px minmax(104px, 0.68fr) 72px minmax(126px, 0.58fr) minmax(360px, 1.5fr) 82px 76px 90px" in styles


def test_evidence_operator_status_actions_are_binary_use_or_ignore():
    script = _read(APP_JS)

    assert '["used_for_judgment", "사용"]' in script
    assert '["ignored", "미사용"]' in script
    assert '["irrelevant",' not in script
    assert '["false_positive",' not in script
    assert '["pending",' not in script
    assert "normalizeEvidenceOperatorStatus" in script


def test_operator_gui_uses_korean_operator_copy_for_visible_labels():
    html = _read(INDEX)
    script = _read(APP_JS)

    for english_label in [
        "Image Rights Operator Console",
        "Review Queue",
        "Case Review",
        "Evidence Search",
        "Knowledge Base",
        "Provider Controls",
        "Audit Log",
        "Automated Recommendation",
        "Fingerprint match",
        "Face/person",
        "LLM summary",
        "Provider failure",
        "Google Web Detection</option>",
    ]:
        assert english_label not in html

    assert "권리 검수 콘솔" in html
    assert "심사 대기열" in html
    assert "얼굴/인물 감지" in html
    assert "공급자" not in html
    assert "외부 검색 tool 활용" in html
    assert "공급자" not in script
    assert "외부 검색 tool" in script
    assert "formatEvidenceTitle" in script
    assert "formatReason" in script
    assert "동일인 판정이 아닙니다" in script
    assert "신뢰도" in script
    assert "근거 ID" in script
    assert "샘플 지문" in script


def test_operator_gui_renders_search_result_links_and_thumbnails():
    script = _read(APP_JS)

    assert "renderEvidenceLink" in script
    assert "thumbnailUrl" in script
    assert "matchType" in script
    assert "evidence-preview" in script
    assert "search_result_image" in script
    assert "search_result_page_image" in script
    assert "Naver blog search result found" in script
    assert "naver_blog" in script
    assert "Naver web search result found" in script
    assert "naver_web" in script
    assert "Google custom image search result found" in script
    assert "Google custom web search result found" in script
    assert "google_search" in script
    assert "google_best_guess" in script
    assert "google_face_crop_page" in script
    assert "google_face_crop_entity" in script
    assert "searchType" in script
    assert "이미지 유사도" in script


def test_operator_gui_prioritizes_evidence_and_hides_overflow_details():
    script = _read(APP_JS)
    styles = _read(STYLES)

    assert "renderEvidenceSummary" in script
    assert "renderEvidenceGroup" in script
    assert "topItems" in script
    assert "자세히 보기" in script
    assert "renderEvidenceGroups({ ...submission, evidence: searchableEvidence })" in script
    assert "evidence-summary-board" in script
    assert ".evidence-card-grid" in styles
    assert ".evidence-details" in styles


def test_operator_gui_suggests_followup_queries_for_insufficient_evidence():
    html = _read(INDEX)
    script = _read(APP_JS)
    guidance = _read(EVIDENCE_GUIDANCE_JS)
    combined_script = script + guidance
    styles = _read(STYLES)

    assert 'id="evidence-next-actions"' in html
    assert "function evidenceNeedsFollowup" in combined_script
    assert "function evidenceFollowupReasons" in combined_script
    assert "function suggestedEvidenceQueries" in combined_script
    assert "function renderEvidenceNextActions" in script
    assert "function applySuggestedQuery" in script
    assert "data-suggested-query" in script
    assert "evidence-followup-reasons" in script
    assert "직접 매칭 또는 원문 페이지 근거가 없습니다." in guidance
    assert "검색 근거가 2건 미만입니다." in guidance
    assert "외부 검색 tool이 빈 결과를 반환했습니다." in guidance
    assert "외부 검색 tool 실패 이력이 있습니다." in guidance
    assert "switchWorkbenchTab(\"queries\")" in script
    assert "/api/search/manual" in script
    assert "추천 쿼리를 입력했습니다" in script
    assert ".evidence-next-action-panel" in styles
    assert ".evidence-followup-reasons" in styles
    assert ".suggested-query-list" in styles


def test_operator_gui_separates_face_crop_web_evidence_from_identity_matching():
    script = _read(APP_JS)

    assert "faceCropSearch" in script
    assert "face_web" in script
    assert "얼굴 영역 웹 근거" in script
    assert "동일인 판정이 아닙니다" in script


def test_operator_gui_exposes_evidence_status_and_watchlist_controls():
    script = _read(APP_JS)
    styles = _read(STYLES)

    assert "/api/evidence/" in script
    assert "data-evidence-status" in script
    assert "판단에 사용" in script
    assert "오탐" in script
    assert "주의 후보 근거" in script
    assert "knowledgeEntryStatus" in script
    assert "/promote-watchlist" in script
    assert "/exclude-watchlist" in script
    assert "data-promote-watchlist" in script
    assert "data-exclude-watchlist" in script
    assert ".watchlist-chip" in styles


def test_visual_assets_are_referenced_for_review_images():
    html = _read(INDEX)
    assets_dir = APP_DIR / "assets"

    for asset_name in [
        "case-portrait.svg",
        "case-character.svg",
        "case-emblem.svg",
        "match-web.svg",
    ]:
        assert (assets_dir / asset_name).exists()
        assert f"assets/{asset_name}" in html or f"assets/{asset_name}" in _read(APP_JS)
35
tests/rights_filter/admin/test_correction_handlers.py
Normal file
@@ -0,0 +1,35 @@
from rights_filter.admin.correction_handlers import correct_rejected_decision
from rights_filter.admin.knowledge_base_handlers import register_manual_entry
from rights_filter.domain.records import (
    InMemoryRightsFilterRepository,
    KnowledgeEntryType,
    ReviewStatus,
)
from rights_filter.admin.review_handlers import record_operator_decision


def test_correcting_rejection_deactivates_only_derived_entries():
    repo = InMemoryRightsFilterRepository()
    decision = record_operator_decision(
        repo,
        "submission-1",
        ReviewStatus.REJECTED,
        fingerprints=["phash:auto"],
    )
    manual = register_manual_entry(
        repo,
        entry_type=KnowledgeEntryType.CELEBRITY,
        name="IU",
        sample_fingerprints=["phash:manual"],
    )

    deactivated = correct_rejected_decision(
        repo,
        decision.id,
        reason="operator corrected false positive",
    )

    assert len(deactivated) == 1
    assert deactivated[0].source_decision_id == decision.id
    assert deactivated[0].active is False
    assert repo.knowledge_entry(manual.id).active is True
98
tests/rights_filter/admin/test_detailed_review_presenter.py
Normal file
@@ -0,0 +1,98 @@
from rights_filter.admin.detailed_review_presenter import detailed_review_for
from rights_filter.admin.review_handlers import applicant_summary_for
from rights_filter.domain.records import (
    AnalysisRun,
    Evidence,
    EvidenceSource,
    InMemoryRightsFilterRepository,
    ScoreResult,
)


def test_detailed_review_groups_operator_evidence_and_actions():
    repo = InMemoryRightsFilterRepository()
    run = AnalysisRun.for_submission("submission-1", "v1")
    run.add_evidence(
        Evidence(
            source=EvidenceSource.NAVER_SEARCH,
            reason="Naver result linked named work",
            confidence=0.8,
            data={"result_url": "https://example.test/page", "query": "IU album"},
        )
    )
    run.add_evidence(
        Evidence(
            source=EvidenceSource.WEB_DETECTION,
            reason="Web entity matched IU",
            confidence=0.9,
            data={"entity": "IU"},
        )
    )
    run.add_evidence(
        Evidence(
            source=EvidenceSource.LLM_SUMMARY,
            reason="Assistant summarized source-linked evidence",
            confidence=0.0,
            data={"summary": "Both sources mention IU.", "source_urls": ["url"]},
        )
    )
    run.add_evidence(
        Evidence(
            source=EvidenceSource.SEARCH_SKIPPED,
            reason="search API disabled",
            confidence=1.0,
            data={"provider": "naver"},
        )
    )
    run.score = ScoreResult(
        score=78,
        band="high",
        reasons=["Naver result linked named work", "Web entity matched IU"],
    )
    repo.save_analysis_run(run)

    review = detailed_review_for(
        repo,
        "submission-1",
        image_reference="internal://submission-1/original",
    )

    assert review["submission_id"] == "submission-1"
    assert review["image_reference"] == "internal://submission-1/original"
    assert review["score"] == 78
    assert review["band"] == "high"
    assert review["manual_actions"] == ["approved", "held", "rejected"]
    assert len(review["evidence_groups"]["naver"]) == 1
    assert len(review["evidence_groups"]["google"]) == 1
    assert len(review["evidence_groups"]["llm"]) == 1
    assert len(review["evidence_groups"]["failures"]) == 1


||||||
|
def test_missing_analysis_returns_unavailable_review_model():
|
||||||
|
review = detailed_review_for(
|
||||||
|
InMemoryRightsFilterRepository(),
|
||||||
|
"missing",
|
||||||
|
image_reference="internal://missing/original",
|
||||||
|
)
|
||||||
|
|
||||||
|
assert review["analysis_available"] is False
|
||||||
|
assert review["evidence_groups"] == {}
|
||||||
|
|
||||||
|
|
||||||
|
def test_applicant_summary_excludes_enrichment_details():
|
||||||
|
repo = InMemoryRightsFilterRepository()
|
||||||
|
run = AnalysisRun.for_submission("submission-1", "v1")
|
||||||
|
run.add_evidence(
|
||||||
|
Evidence(
|
||||||
|
source=EvidenceSource.LLM_SUMMARY,
|
||||||
|
reason="Assistant summarized source-linked evidence",
|
||||||
|
confidence=0.0,
|
||||||
|
data={"summary": "Internal summary"},
|
||||||
|
)
|
||||||
|
)
|
||||||
|
run.score = ScoreResult(score=50, band="medium", reasons=["internal"])
|
||||||
|
repo.save_analysis_run(run)
|
||||||
|
|
||||||
|
applicant = applicant_summary_for(repo, "submission-1")
|
||||||
|
|
||||||
|
assert applicant == {"submission_id": "submission-1"}
|
||||||
40
tests/rights_filter/admin/test_knowledge_base_handlers.py
Normal file
@@ -0,0 +1,40 @@
from rights_filter.admin.knowledge_base_handlers import register_manual_entry
from rights_filter.domain.records import (
    InMemoryRightsFilterRepository,
    KnowledgeEntryType,
    KnowledgeProvenance,
)


def test_operator_can_register_manual_celebrity_entry():
    repo = InMemoryRightsFilterRepository()

    entry = register_manual_entry(
        repo,
        entry_type=KnowledgeEntryType.CELEBRITY,
        name="IU",
        aliases=["Lee Ji-eun"],
        related_keywords=["album cover"],
        policy_memo="Reject unless rights evidence is supplied.",
        sample_fingerprints=["phash:iu"],
    )

    assert entry.provenance == KnowledgeProvenance.MANUAL
    assert entry.aliases == ["Lee Ji-eun"]
    assert repo.active_knowledge_entries() == [entry]


def test_manual_entry_rejects_biometric_template_payload():
    repo = InMemoryRightsFilterRepository()

    try:
        register_manual_entry(
            repo,
            entry_type=KnowledgeEntryType.CELEBRITY,
            name="IU",
            biometric_template=[0.1, 0.2],
        )
    except ValueError as error:
        assert "biometric template" in str(error)
    else:
        raise AssertionError("expected biometric template rejection")
52
tests/rights_filter/admin/test_review_handlers.py
Normal file
@@ -0,0 +1,52 @@
from rights_filter.admin.review_handlers import (
    applicant_summary_for,
    operator_summary_for,
    record_operator_decision,
)
from rights_filter.domain.records import (
    AnalysisRun,
    Evidence,
    EvidenceSource,
    InMemoryRightsFilterRepository,
    ReviewStatus,
    ScoreResult,
)


def test_high_risk_score_is_operator_visible_but_not_applicant_visible():
    repo = InMemoryRightsFilterRepository()
    run = AnalysisRun.for_submission("submission-1", analysis_version="v1")
    run.add_evidence(
        Evidence(
            source=EvidenceSource.WEB_DETECTION,
            reason="Web entity matched 아이유",
            confidence=0.93,
            data={"entity": "아이유"},
        )
    )
    run.score = ScoreResult(score=88, band="high", reasons=["Web entity matched 아이유"])
    repo.save_analysis_run(run)

    operator = operator_summary_for(repo, "submission-1")
    applicant = applicant_summary_for(repo, "submission-1")

    assert operator["score"] == 88
    assert operator["evidence"][0]["reason"] == "Web entity matched 아이유"
    assert "score" not in applicant
    assert "evidence" not in applicant


def test_rejection_creates_automatic_knowledge_entry():
    repo = InMemoryRightsFilterRepository()
    decision = record_operator_decision(
        repo,
        submission_id="submission-2",
        status=ReviewStatus.REJECTED,
        memo="캐릭터 권리 위험",
        fingerprints=["phash:abc"],
    )

    entries = repo.active_knowledge_entries()
    assert decision.status == ReviewStatus.REJECTED
    assert len(entries) == 1
    assert entries[0].source_decision_id == decision.id
112
tests/rights_filter/analysis/test_evidence_enrichment.py
Normal file
@@ -0,0 +1,112 @@
from rights_filter.analysis.evidence_enrichment import EvidenceEnricher
from rights_filter.analysis.llm_assistance import FakeInternalLlmClient, InternalLlmAssistant
from rights_filter.analysis.risk_scoring import RiskScorer
from rights_filter.analysis.search_query_generation import SearchQueryGenerator
from rights_filter.analysis.search_result_promoter import SearchResultPromoter
from rights_filter.domain.records import AnalysisRun, Evidence, EvidenceSource, InMemoryRightsFilterRepository, ScoreResult
from rights_filter.integrations.naver_search import FakeNaverSearchClient, NaverSearchAdapter
from rights_filter.integrations.search_policy import SearchApiPolicy


def _enricher(policy: SearchApiPolicy | None = None) -> EvidenceEnricher:
    return EvidenceEnricher(
        query_generator=SearchQueryGenerator(),
        naver_adapter=NaverSearchAdapter(
            FakeNaverSearchClient(
                response={
                    "items": [
                        {
                            "title": "IU official album cover",
                            "link": "https://example.test/image.jpg",
                            "thumbnail": "https://example.test/thumb.jpg",
                            "page_url": "https://example.test/page",
                        }
                    ]
                }
            )
        ),
        search_policy=policy
        or SearchApiPolicy(compliance_approved=True, allowed_providers={"naver"}),
        promoter=SearchResultPromoter(),
        llm_assistant=InternalLlmAssistant(
            FakeInternalLlmClient(summary="Evidence cites a named work.")
        ),
        scorer=RiskScorer(),
    )


def test_enrichment_adds_naver_evidence_llm_summary_and_score():
    repo = InMemoryRightsFilterRepository()
    run = AnalysisRun.for_submission("submission-1", "v1")
    run.add_evidence(
        Evidence(
            source=EvidenceSource.WEB_DETECTION,
            reason="Web entity matched IU",
            confidence=0.9,
            data={"entity": "IU", "category": "celebrity"},
        )
    )
    run.score = ScoreResult(score=35, band="medium", reasons=["Web entity matched IU"])
    repo.save_analysis_run(run)

    summary = _enricher().enrich_latest(repo, "submission-1")

    latest = repo.analysis_runs_for_submission("submission-1")[-1]
    assert summary.generated_queries == 3
    assert summary.executed_searches == 3
    assert any(item.source == EvidenceSource.NAVER_SEARCH for item in latest.evidence)
    assert any(item.source == EvidenceSource.LLM_SUMMARY for item in latest.evidence)
    assert latest.score.score >= 35


def test_enrichment_is_idempotent_for_same_query_signature():
    repo = InMemoryRightsFilterRepository()
    run = AnalysisRun.for_submission("submission-1", "v1")
    run.add_evidence(
        Evidence(
            source=EvidenceSource.WEB_DETECTION,
            reason="Web entity matched IU",
            confidence=0.9,
            data={"entity": "IU", "category": "celebrity"},
        )
    )
    repo.save_analysis_run(run)
    enricher = _enricher()

    enricher.enrich_latest(repo, "submission-1")
    enricher.enrich_latest(repo, "submission-1")

    latest = repo.analysis_runs_for_submission("submission-1")[-1]
    search_evidence = [
        item for item in latest.evidence if item.source == EvidenceSource.NAVER_SEARCH
    ]
    assert len(search_evidence) == 3


def test_disabled_naver_records_search_skipped_and_still_summarizes():
    repo = InMemoryRightsFilterRepository()
    run = AnalysisRun.for_submission("submission-1", "v1")
    run.add_evidence(
        Evidence(
            source=EvidenceSource.WEB_DETECTION,
            reason="Web entity matched IU",
            confidence=0.9,
            data={"entity": "IU", "category": "celebrity"},
        )
    )
    repo.save_analysis_run(run)

    _enricher(SearchApiPolicy(disabled=True, compliance_approved=True)).enrich_latest(
        repo, "submission-1"
    )

    latest = repo.analysis_runs_for_submission("submission-1")[-1]
    assert any(item.source == EvidenceSource.SEARCH_SKIPPED for item in latest.evidence)
    assert any(item.source == EvidenceSource.LLM_SUMMARY for item in latest.evidence)


def test_missing_analysis_run_returns_failure_summary():
    summary = _enricher().enrich_latest(InMemoryRightsFilterRepository(), "missing")

    assert summary.failed == 1
    assert "missing analysis run" in summary.failure_reasons[0]
225
tests/rights_filter/analysis/test_internal_analyzer.py
Normal file
@@ -0,0 +1,225 @@
import sys
from io import BytesIO
from types import SimpleNamespace

from rights_filter.analysis.face_person_detection import FacePersonSignal, HeuristicFacePersonDetector
from rights_filter.analysis.fingerprints import FingerprintService
from rights_filter.analysis.internal_analyzer import InternalAnalyzer
from rights_filter.analysis.preprocessing import ImagePayload
from rights_filter.domain.records import (
    EvidenceSource,
    InMemoryRightsFilterRepository,
    KnowledgeBaseEntry,
    KnowledgeEntryType,
    KnowledgeProvenance,
)


def _sample_png_bytes(color="white"):
    from PIL import Image, ImageDraw

    image = Image.new("RGB", (64, 64), color)
    draw = ImageDraw.Draw(image)
    draw.rectangle((14, 10, 46, 52), fill="black")
    buffer = BytesIO()
    image.save(buffer, format="PNG")
    return buffer.getvalue()


def test_fingerprint_service_treats_resized_same_image_as_similar():
    from PIL import Image, ImageDraw

    base = Image.new("RGB", (64, 64), "white")
    draw = ImageDraw.Draw(base)
    draw.rectangle((14, 10, 46, 52), fill="black")
    resized = base.resize((128, 128))

    base_buffer = BytesIO()
    resized_buffer = BytesIO()
    base.save(base_buffer, format="PNG")
    resized.save(resized_buffer, format="PNG")
    service = FingerprintService()

    base_fingerprint = service.fingerprints_for(base_buffer.getvalue())
    resized_fingerprint = service.fingerprints_for(resized_buffer.getvalue())

    assert service.similarity(base_fingerprint.perceptual, resized_fingerprint.perceptual) >= 0.9


def test_prior_rejected_similarity_emits_fingerprint_evidence():
    repo = InMemoryRightsFilterRepository()
    service = FingerprintService()
    payload = ImagePayload(content=_sample_png_bytes(), width=64, height=64, metadata={})
    fingerprints = service.fingerprints_for(payload.content)
    repo.save_knowledge_entry(
        KnowledgeBaseEntry.create(
            entry_type=KnowledgeEntryType.REJECTED_IMAGE,
            name="rejected:old",
            provenance=KnowledgeProvenance.AUTOMATIC_REJECTION,
            sample_fingerprints=[fingerprints.perceptual],
        )
    )
    analyzer = InternalAnalyzer(repo, service, HeuristicFacePersonDetector())

    evidence = analyzer.analyze("submission-1", payload)

    fingerprint_reasons = [item for item in evidence if item.source == EvidenceSource.FINGERPRINT]
    assert any("similarity" in item.reason for item in fingerprint_reasons)
    assert fingerprint_reasons[0].confidence >= 0.9


def test_watchlist_similarity_emits_separate_fingerprint_evidence_label():
    repo = InMemoryRightsFilterRepository()
    service = FingerprintService()
    payload = ImagePayload(content=_sample_png_bytes(), width=64, height=64, metadata={})
    fingerprints = service.fingerprints_for(payload.content)
    repo.save_knowledge_entry(
        KnowledgeBaseEntry.create(
            entry_type=KnowledgeEntryType.REJECTED_IMAGE,
            name="watchlist:old",
            provenance=KnowledgeProvenance.AUTOMATIC_REJECTION,
            sample_fingerprints=[fingerprints.perceptual],
            entry_status="watchlist",
            source_submission_id="SUB-OLD",
        )
    )
    analyzer = InternalAnalyzer(repo, service, HeuristicFacePersonDetector())

    evidence = analyzer.analyze("submission-1", payload)
    watchlist_matches = [
        item
        for item in evidence
        if item.source == EvidenceSource.FINGERPRINT
        and item.data.get("knowledge_entry_status") == "watchlist"
    ]

    assert watchlist_matches
    assert "주의 후보 이미지 유사도" in watchlist_matches[0].reason
    assert watchlist_matches[0].data["source_submission_id"] == "SUB-OLD"
    assert "identity" not in watchlist_matches[0].data
    assert "embedding" not in watchlist_matches[0].data


def test_face_presence_evidence_contains_no_identity_or_embedding():
    class OneFaceDetector:
        def detect(self, image):
            return FacePersonSignal(face_count=1, person_count=1)

    repo = InMemoryRightsFilterRepository()
    analyzer = InternalAnalyzer(repo, FingerprintService(), OneFaceDetector())
    payload = ImagePayload(content=b"image bytes", width=100, height=100, metadata={})

    evidence = analyzer.analyze("submission-2", payload)

    face_evidence = [item for item in evidence if item.source == EvidenceSource.FACE_PERSON]
    assert face_evidence
    assert face_evidence[0].data["face_count"] == 1
    assert "identity" not in face_evidence[0].data
    assert "embedding" not in face_evidence[0].data
    assert "face_boxes" not in face_evidence[0].data


def test_face_detector_does_not_treat_marker_text_as_detection():
    signal = HeuristicFacePersonDetector().detect(
        ImagePayload(content=b"portrait FACE PERSON", width=100, height=100, metadata={})
    )

    assert signal.present is False


def test_fingerprint_service_does_not_estimate_similarity_for_non_images():
    service = FingerprintService()

    left = service.fingerprints_for(b"same image content one")
    right = service.fingerprints_for(b"same image content two")

    assert left.perceptual.startswith("phash:unavailable:")
    assert service.similarity(left.perceptual, right.perceptual) == 0.0


def test_face_detector_uses_local_opencv_when_available(monkeypatch):
    class FakeCascade:
        def __init__(self, path):
            self.path = path

        def empty(self):
            return False

        def detectMultiScale(self, gray, scaleFactor, minNeighbors, minSize):
            if "frontalface" in self.path:
                return [(1, 2, 30, 30), (40, 2, 30, 30)]
            return []

    fake_cv2 = SimpleNamespace(
        IMREAD_COLOR=1,
        COLOR_BGR2GRAY=2,
        data=SimpleNamespace(haarcascades="cascades/"),
        imdecode=lambda data, flags: "decoded-image",
        cvtColor=lambda image, conversion: "gray-image",
        CascadeClassifier=FakeCascade,
    )
    fake_numpy = SimpleNamespace(
        uint8="uint8",
        frombuffer=lambda content, dtype: ["bytes"],
    )
    monkeypatch.setitem(sys.modules, "cv2", fake_cv2)
    monkeypatch.setitem(sys.modules, "numpy", fake_numpy)

    signal = HeuristicFacePersonDetector().detect(
        ImagePayload(content=b"\xff\xd8 image bytes", width=120, height=80, metadata={})
    )

    assert signal.face_count == 2
    assert signal.person_count == 2
    assert signal.face_boxes == ((1, 2, 30, 30), (40, 2, 30, 30))


def test_face_detector_falls_back_to_pillow_decode_for_avif(monkeypatch):
    class FakeCascade:
        def __init__(self, path):
            self.path = path

        def empty(self):
            return False

        def detectMultiScale(self, gray, scaleFactor, minNeighbors, minSize):
            return [(1, 2, 30, 30)] if "frontalface" in self.path else []

    class FakeImage:
        def __enter__(self):
            return self

        def __exit__(self, exc_type, exc, traceback):
            return False

        def convert(self, mode):
            return "rgb-image"

    fake_cv2 = SimpleNamespace(
        IMREAD_COLOR=1,
        COLOR_BGR2GRAY=2,
        COLOR_RGB2BGR=3,
        data=SimpleNamespace(haarcascades="cascades/"),
        imdecode=lambda data, flags: None,
        cvtColor=lambda image, conversion: "gray-image",
        CascadeClassifier=FakeCascade,
    )
    fake_numpy = SimpleNamespace(
        uint8="uint8",
        frombuffer=lambda content, dtype: ["bytes"],
        array=lambda image: "array-image",
    )
    fake_pil_image = SimpleNamespace(open=lambda stream: FakeImage())
    fake_pil = SimpleNamespace(Image=fake_pil_image)
    monkeypatch.setitem(sys.modules, "cv2", fake_cv2)
    monkeypatch.setitem(sys.modules, "numpy", fake_numpy)
    monkeypatch.setitem(sys.modules, "PIL", fake_pil)
    monkeypatch.setitem(sys.modules, "PIL.Image", fake_pil_image)

    signal = HeuristicFacePersonDetector().detect(
        ImagePayload(content=b"avif image bytes", width=120, height=80, metadata={"format": "AVIF"})
    )

    assert signal.face_count == 1
    assert signal.person_count == 1
    assert signal.face_boxes == ((1, 2, 30, 30),)
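The fingerprint tests in this file pin two behaviors: a resized copy of the same image scores at least 0.9 similarity, and a `phash:unavailable:` fingerprint never matches anything. A minimal sketch of one way such a comparison could work follows. It assumes fingerprints are `phash:<hex>` strings of equal length; that encoding and the `hamming_similarity` helper are hypothetical and are not taken from `FingerprintService`.

```python
def hamming_similarity(left: str, right: str) -> float:
    """Return 1 minus the normalized Hamming distance of two hex hashes.

    Hypothetical helper: assumes the "phash:<hex>" encoding, which is an
    illustration only and not the project's confirmed fingerprint format.
    """
    prefix = "phash:"
    unavailable = "phash:unavailable:"
    # A fingerprint that could not be computed must never match anything.
    if left.startswith(unavailable) or right.startswith(unavailable):
        return 0.0
    if not (left.startswith(prefix) and right.startswith(prefix)):
        return 0.0
    left_hex, right_hex = left[len(prefix):], right[len(prefix):]
    # Hashes of different (or zero) length are not comparable.
    if not left_hex or len(left_hex) != len(right_hex):
        return 0.0
    try:
        differing_bits = bin(int(left_hex, 16) ^ int(right_hex, 16)).count("1")
    except ValueError:
        # Non-hex payloads are treated as "no similarity" rather than an error.
        return 0.0
    total_bits = len(left_hex) * 4  # each hex digit encodes 4 bits
    return 1.0 - differing_bits / total_bits
```

Returning 0.0 for unavailable or malformed fingerprints mirrors what `test_fingerprint_service_does_not_estimate_similarity_for_non_images` asserts: non-image bytes should not produce an accidental match.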
47
tests/rights_filter/analysis/test_llm_assistance.py
Normal file
@@ -0,0 +1,47 @@
from rights_filter.analysis.llm_assistance import FakeInternalLlmClient, InternalLlmAssistant
from rights_filter.domain.records import Evidence, EvidenceSource


def test_llm_summary_cites_source_urls_and_evidence_ids():
    source_evidence = [
        Evidence(
            source=EvidenceSource.NAVER_SEARCH,
            reason="Naver search result found",
            confidence=0.8,
            data={
                "evidence_id": "evidence-1",
                "result_url": "https://example.test/page",
                "title": "IU official album cover",
            },
        )
    ]
    assistant = InternalLlmAssistant(
        FakeInternalLlmClient(summary="Search evidence cites the named work.")
    )

    evidence = assistant.summarize("submission-1", source_evidence)

    assert evidence.source == EvidenceSource.LLM_SUMMARY
    assert evidence.confidence == 0.0
    assert evidence.data["summary"] == "Search evidence cites the named work."
    assert evidence.data["source_urls"] == ["https://example.test/page"]
    assert evidence.data["source_evidence_ids"] == ["evidence-1"]


def test_llm_summary_without_sources_is_marked_unverified():
    assistant = InternalLlmAssistant(FakeInternalLlmClient(summary="Looks famous."))

    evidence = assistant.summarize("submission-1", [])

    assert evidence.source == EvidenceSource.LLM_SUMMARY
    assert evidence.reason == "LLM summary has no source evidence"
    assert evidence.data["verified"] is False


def test_llm_failure_becomes_enrichment_failure_evidence():
    assistant = InternalLlmAssistant(FakeInternalLlmClient(error=RuntimeError("down")))

    evidence = assistant.summarize("submission-1", [])

    assert evidence.source == EvidenceSource.ENRICHMENT_FAILURE
    assert "LLM assistance failed" in evidence.reason
106
tests/rights_filter/analysis/test_preprocessing.py
Normal file
@@ -0,0 +1,106 @@
from rights_filter.analysis.preprocessing import (
    ImagePayload,
    PreprocessingError,
    build_face_crop_derivatives,
    build_external_derivative,
)
from rights_filter.domain.records import DataClass


def test_external_derivative_removes_exif_and_caps_dimensions():
    original = ImagePayload(
        content=b"image-bytes",
        width=4000,
        height=2000,
        metadata={"EXIF": "camera serial", "author": "user"},
    )

    derivative = build_external_derivative(original, max_side=1600)

    assert derivative.width == 1600
    assert derivative.height == 800
    assert derivative.metadata == {}
    assert derivative.data_class == DataClass.EXTERNAL_DERIVATIVE
    assert derivative.content == b"image-bytes"


def test_small_image_is_still_metadata_stripped():
    original = ImagePayload(
        content=b"small-image",
        width=800,
        height=600,
        metadata={"GPS": "private"},
    )

    derivative = build_external_derivative(original, max_side=1600)

    assert derivative.width == 800
    assert derivative.height == 600
    assert derivative.metadata == {}


def test_empty_image_records_preprocessing_failure():
    original = ImagePayload(content=b"", width=100, height=100, metadata={})

    try:
        build_external_derivative(original)
    except PreprocessingError as error:
        assert "empty image content" in str(error)
    else:
        raise AssertionError("expected preprocessing failure")


def test_avif_external_derivative_is_converted_to_jpeg_when_decodable():
    from io import BytesIO

    from PIL import Image

    source = BytesIO()
    Image.new("RGB", (200, 100), color=(255, 0, 0)).save(source, format="PNG")
    original = ImagePayload(
        content=source.getvalue(),
        width=200,
        height=100,
        metadata={"format": "AVIF", "EXIF": "private"},
    )

    derivative = build_external_derivative(original, max_side=100)

    assert derivative.content.startswith(b"\xff\xd8")
    assert derivative.width == 100
    assert derivative.height == 50
    assert derivative.metadata == {}
    assert derivative.data_class == DataClass.EXTERNAL_DERIVATIVE


def test_face_crop_derivative_uses_only_cropped_pixels_and_strips_metadata():
    from io import BytesIO

    from PIL import Image

    source = BytesIO()
    Image.new("RGB", (200, 100), color=(0, 120, 200)).save(source, format="PNG")
    original = ImagePayload(
        content=source.getvalue(),
        width=200,
        height=100,
        metadata={"EXIF": "camera serial", "GPS": "private"},
    )

    crops = build_face_crop_derivatives(
        original,
        [(50, 20, 60, 40)],
        max_crops=1,
        padding_ratio=0.25,
        max_side=64,
    )

    assert len(crops) == 1
    crop = crops[0]
    assert crop.content.startswith(b"\xff\xd8")
    assert crop.width <= 64
    assert crop.height <= 64
    assert crop.width > 0
    assert crop.height > 0
    assert crop.metadata == {}
    assert crop.data_class == DataClass.EXTERNAL_DERIVATIVE
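The dimension expectations in these tests (4000x2000 capped to 1600x800, 200x100 capped to 100x50, and 800x600 left unchanged under `max_side=1600`) are consistent with an aspect-preserving cap on the longer side. A minimal sketch of that arithmetic follows. The `cap_dimensions` helper is hypothetical, and the real `build_external_derivative` may compute its sizes differently.

```python
def cap_dimensions(width: int, height: int, max_side: int) -> tuple[int, int]:
    """Scale (width, height) down so the longer side is at most max_side.

    Hypothetical helper for illustration; images that are already small
    enough are returned unchanged (never upscaled).
    """
    if width <= 0 or height <= 0 or max_side <= 0:
        raise ValueError("width, height and max_side must be positive")
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    # Apply the same ratio to both sides so the aspect ratio is preserved;
    # clamp to 1 so an extreme aspect ratio never yields a zero-pixel side.
    new_width = max(1, round(width * max_side / longest))
    new_height = max(1, round(height * max_side / longest))
    return new_width, new_height
```

Note that metadata stripping is independent of resizing: `test_small_image_is_still_metadata_stripped` expects an empty `metadata` dict even when no downscale happens.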
179
tests/rights_filter/analysis/test_risk_scoring.py
Normal file
@@ -0,0 +1,179 @@
from rights_filter.analysis.risk_scoring import RiskScorer
from rights_filter.domain.records import Evidence, EvidenceSource, ScoreResult


def test_face_presence_alone_requires_medium_review_without_becoming_high_risk():
    scorer = RiskScorer()
    result = scorer.score(
        [
            Evidence(
                source=EvidenceSource.FACE_PERSON,
                reason="Face/person detected",
                confidence=0.8,
                data={"face_count": 1},
            )
        ]
    )

    assert 30 <= result.score < 70
    assert result.band == "medium"


def test_operator_false_positive_evidence_is_not_scored():
    result = RiskScorer().score(
        [
            Evidence(
                source=EvidenceSource.FACE_PERSON,
                reason="Face/person detected",
                confidence=0.8,
                data={"operator_status": "false_positive"},
            ),
            Evidence(
                source=EvidenceSource.WEB_DETECTION,
                reason="Matching image URL found",
                confidence=0.9,
                data={"url": "https://example.com/official-character", "operator_status": "irrelevant"},
            ),
        ]
    )

    assert result.score == 0
    assert result.band == "low"
    assert result.reasons == []


def test_character_web_evidence_can_create_high_risk_for_ai_or_fanart():
    scorer = RiskScorer()
    result = scorer.score(
        [
            Evidence(
                source=EvidenceSource.WEB_DETECTION,
                reason="Web entity matched character",
                confidence=0.94,
                data={"entity": "유명 웹툰 캐릭터", "category": "character"},
            ),
            Evidence(
                source=EvidenceSource.WEB_DETECTION,
                reason="Matching image URL found",
                confidence=0.9,
                data={"url": "https://example.com/official-character"},
            ),
        ]
    )

    assert result.score >= 70
    assert result.band == "high"
    assert "Web entity matched character" in result.reasons


def test_failure_reason_does_not_lower_existing_high_risk_score():
    scorer = RiskScorer()
    high_without_failure = scorer.score(
        [
            Evidence(
                source=EvidenceSource.FINGERPRINT,
                reason="Prior rejected image similarity 0.96",
                confidence=0.96,
                data={"similarity": 0.96},
            )
        ]
    )
    with_failure = scorer.score(
        [
            Evidence(
                source=EvidenceSource.FINGERPRINT,
                reason="Prior rejected image similarity 0.96",
                confidence=0.96,
                data={"similarity": 0.96},
            ),
            Evidence(
                source=EvidenceSource.FAILURE,
                reason="External API failed",
                confidence=1.0,
                data={},
            ),
        ]
    )

    assert with_failure.score >= high_without_failure.score
    assert "External API failed" in with_failure.reasons
    assert isinstance(with_failure, ScoreResult)


def test_promoted_naver_search_evidence_can_raise_review_risk():
    result = RiskScorer().score(
        [
            Evidence(
                source=EvidenceSource.NAVER_SEARCH,
                reason="Naver result linked named work",
                confidence=0.8,
                data={
                    "promoted": True,
                    "promotion_reason": "named person/work search evidence",
                },
            )
        ]
    )

    assert result.band == "medium"
    assert result.score >= 35
    assert "Naver result linked named work" in result.reasons


def test_llm_summary_does_not_directly_affect_score():
    result = RiskScorer().score(
        [
            Evidence(
                source=EvidenceSource.LLM_SUMMARY,
                reason="Assistant thinks this is famous",
                confidence=1.0,
                data={"verified": False},
            )
        ]
    )

    assert result.score == 0
    assert result.band == "low"
    assert result.reasons == []


def test_google_best_guess_label_alone_does_not_create_risk_reason():
    result = RiskScorer().score(
        [
            Evidence(
                source=EvidenceSource.WEB_DETECTION,
                reason="Best guess label gentleman",
                confidence=0.6,
                data={"label": "gentleman"},
            )
        ]
    )

    assert result.score == 0
    assert result.band == "low"
    assert result.reasons == []


def test_multiple_google_visual_similar_results_do_not_stack_into_high_risk():
    result = RiskScorer().score(
        [
            Evidence(
                source=EvidenceSource.FACE_PERSON,
                reason="Face/person detected",
                confidence=0.8,
                data={"face_count": 1},
            ),
            *[
                Evidence(
                    source=EvidenceSource.WEB_DETECTION,
                    reason="Google visually similar image found",
                    confidence=0.55,
                    data={"match": "visual", "url": f"https://example.test/{index}.jpg"},
                )
                for index in range(10)
            ],
        ]
    )

    assert result.band == "medium"
    assert result.score < 70
271
tests/rights_filter/analysis/test_search_query_generation.py
Normal file
@ -0,0 +1,271 @@
from rights_filter.analysis.search_query_generation import SearchQueryGenerator
from rights_filter.domain.records import (
    Evidence,
    EvidenceSource,
    KnowledgeBaseEntry,
    KnowledgeEntryType,
    KnowledgeProvenance,
)


def test_query_generator_plans_web_entities_and_knowledge_aliases_without_duplicates():
    evidence = [
        Evidence(
            source=EvidenceSource.WEB_DETECTION,
            reason="Web entity matched IU",
            confidence=0.9,
            data={"entity": "IU", "category": "celebrity"},
        )
    ]
    knowledge = [
        KnowledgeBaseEntry.create(
            entry_type=KnowledgeEntryType.CELEBRITY,
            name="IU",
            provenance=KnowledgeProvenance.MANUAL,
            aliases=["Lee Ji-eun", "IU"],
            related_keywords=["album cover"],
        )
    ]

    plan = SearchQueryGenerator().plan(evidence, knowledge)

    assert [item.query for item in plan] == [
        "IU 공식 프로필",
        "IU 사진",
        "IU 화보",
        "IU album cover",
        "Lee Ji-eun album cover",
    ]
    assert plan[0].strategy == "google_entity"
    assert plan[0].source == "IU"


def test_query_generator_prioritizes_google_page_titles_and_skips_weak_generic_labels():
    evidence = [
        Evidence(
            source=EvidenceSource.WEB_DETECTION,
            reason="Google page with matching image found",
            confidence=0.9,
            data={"page_title": "Known actor official profile - Example Site", "match": "page"},
        ),
        Evidence(
            source=EvidenceSource.WEB_DETECTION,
            reason="Google weak label gentleman",
            confidence=0.0,
            data={"label": "gentleman", "weak_hint": True},
        ),
    ]

    queries = SearchQueryGenerator().generate(evidence, [])

    assert queries == [
        "Known actor official profile",
        "Known actor official profile image",
    ]


def test_query_generator_uses_weak_label_and_google_page_title_candidates_together():
    evidence = [
        Evidence(
            source=EvidenceSource.WEB_DETECTION,
            reason="Google weak results for IU image",
            confidence=0.0,
            data={
                "page_title": "IU profile - Example Site",
                "label": "IU official profile",
                "weak_hint": True,
            },
        )
    ]

    plan = SearchQueryGenerator().plan(evidence, [])

    assert [item.query for item in plan] == [
        "IU profile",
        "IU profile image",
        "IU official profile",
        "IU official profile image",
    ]
    assert [item.strategy for item in plan] == [
        "google_page",
        "google_page",
        "google_best_guess",
        "google_best_guess",
    ]


def test_query_generator_uses_weak_label_fallback_entity_with_low_priority():
    evidence = [
        Evidence(
            source=EvidenceSource.WEB_DETECTION,
            reason="Google weak entity hint",
            confidence=0.0,
            data={
                "entity": "IU",
                "category": "celebrity",
                "weak_hint": True,
            },
        )
    ]

    plan = SearchQueryGenerator().plan(evidence, [])

    assert [item.query for item in plan] == [
        "IU image",
        "IU",
    ]
    assert [item.strategy for item in plan] == [
        "google_entity",
        "google_entity",
    ]


def test_query_generator_weak_entity_falls_back_when_label_is_generic():
    evidence = [
        Evidence(
            source=EvidenceSource.WEB_DETECTION,
            reason="Google weak hint with generic label",
            confidence=0.0,
            data={
                "entity": "IU",
                "label": "person",
                "weak_hint": True,
            },
        )
    ]

    plan = SearchQueryGenerator().plan(evidence, [])

    assert [item.query for item in plan] == [
        "IU image",
        "IU",
    ]
    assert [item.strategy for item in plan] == [
        "google_entity",
        "google_entity",
    ]


def test_query_generator_uses_specific_google_best_guess_labels_as_low_priority_queries():
    evidence = [
        Evidence(
            source=EvidenceSource.WEB_DETECTION,
            reason="Google weak label IU official profile",
            confidence=0.0,
            data={"label": "IU official profile", "weak_hint": True},
        ),
        Evidence(
            source=EvidenceSource.WEB_DETECTION,
            reason="Google weak label person",
            confidence=0.0,
            data={"label": "person", "weak_hint": True},
        ),
    ]

    plan = SearchQueryGenerator().plan(evidence, [])

    assert [item.query for item in plan] == [
        "IU official profile",
        "IU official profile image",
    ]
    assert all(item.strategy == "google_best_guess" for item in plan)


def test_query_generator_uses_face_crop_page_titles_without_making_them_scoring_evidence():
    evidence = [
        Evidence(
            source=EvidenceSource.WEB_DETECTION,
            reason="Google face crop web evidence",
            confidence=0.75,
            data={
                "page_title": "Known actor official profile - Example Site",
                "face_crop_search": True,
                "weak_hint": True,
            },
        )
    ]

    plan = SearchQueryGenerator().plan(evidence, [])

    assert [item.query for item in plan] == [
        "Known actor official profile",
        "Known actor official profile image",
    ]
    assert all(item.strategy == "google_face_crop_page" for item in plan)


def test_query_generator_limits_query_count_after_priority_sorting():
    evidence = [
        Evidence(
            source=EvidenceSource.WEB_DETECTION,
            reason="Web entity matched IU",
            confidence=0.9,
            data={"entity": "IU", "category": "celebrity"},
        ),
        Evidence(
            source=EvidenceSource.WEB_DETECTION,
            reason="Google page with matching image found",
            confidence=0.9,
            data={"page_title": "IU official profile", "match": "page"},
        ),
    ]

    queries = SearchQueryGenerator().generate(evidence, [], max_queries=3)

    assert queries == [
        "IU official profile",
        "IU official profile image",
        "IU 공식 프로필",
    ]


def test_query_generator_uses_korean_image_suffix_for_korean_page_titles():
    evidence = [
        Evidence(
            source=EvidenceSource.WEB_DETECTION,
            reason="Google page with matching image found",
            confidence=0.9,
            data={"page_title": "김연아 공식 홈페이지 – Example Site", "match": "page"},
        ),
    ]

    queries = SearchQueryGenerator().generate(evidence, [])

    assert queries == [
        "김연아 공식 홈페이지",
        "김연아 공식 홈페이지 이미지",
    ]


def test_query_generator_uses_specific_local_metadata_hints_but_skips_generic_file_names():
    evidence = [
        Evidence(
            source=EvidenceSource.FINGERPRINT,
            reason="Local submission title",
            confidence=0.0,
            data={
                "local_query_hint": True,
                "query": "IU official profile",
                "hint_source": "title",
            },
        ),
        Evidence(
            source=EvidenceSource.FINGERPRINT,
            reason="Local generic file name",
            confidence=0.0,
            data={
                "local_query_hint": True,
                "query": "IMG_1234",
                "hint_source": "file",
            },
        ),
    ]

    plan = SearchQueryGenerator().plan(evidence, [])

    assert [item.query for item in plan] == [
        "IU official profile",
        "IU official profile image",
    ]
    assert all(item.strategy == "local_metadata" for item in plan)
    assert all(item.source == "title" for item in plan)
39
tests/rights_filter/analysis/test_search_result_promoter.py
Normal file
@ -0,0 +1,39 @@
from rights_filter.analysis.search_result_promoter import SearchResultPromoter
from rights_filter.domain.records import Evidence, EvidenceSource


def test_promotes_named_person_or_work_naver_result():
    evidence = Evidence(
        source=EvidenceSource.NAVER_SEARCH,
        reason="Naver search result found",
        confidence=0.5,
        data={
            "title": "IU official album cover",
            "query": "IU album cover",
            "result_url": "https://official.example.test/iu",
        },
    )

    promoted = SearchResultPromoter().promote([evidence])[0]

    assert promoted.data["promoted"] is True
    assert promoted.data["promotion_reason"] == "named person/work search evidence"
    assert promoted.confidence == 0.8


def test_generic_naver_result_remains_context_only():
    evidence = Evidence(
        source=EvidenceSource.NAVER_SEARCH,
        reason="Naver search result found",
        confidence=0.5,
        data={
            "title": "nice drawing",
            "query": "nice drawing",
            "result_url": "https://example.test/drawing",
        },
    )

    promoted = SearchResultPromoter().promote([evidence])[0]

    assert promoted.data["promoted"] is False
    assert promoted.confidence == 0.2
9
tests/rights_filter/conftest.py
Normal file
@ -0,0 +1,9 @@
import sys
from pathlib import Path


ROOT = Path(__file__).resolve().parents[2]
SRC = ROOT / "src"

if str(SRC) not in sys.path:
    sys.path.insert(0, str(SRC))
44
tests/rights_filter/domain/test_knowledge_base.py
Normal file
@ -0,0 +1,44 @@
from rights_filter.domain.records import (
    InMemoryRightsFilterRepository,
    KnowledgeEntryType,
    KnowledgeProvenance,
)
from rights_filter.domain.knowledge_base import create_manual_knowledge_entry


def test_create_manual_knowledge_entry_stores_aliases_and_policy_memo():
    entry = create_manual_knowledge_entry(
        entry_type=KnowledgeEntryType.CHARACTER,
        name="Sample character",
        aliases=["Sample hero"],
        related_keywords=["webtoon"],
        policy_memo="Reject unless licensed.",
        exception_conditions="licensed collaboration",
        sample_fingerprints=["phash:sample"],
    )

    assert entry.provenance == KnowledgeProvenance.MANUAL
    assert entry.aliases == ["Sample hero"]
    assert entry.related_keywords == ["webtoon"]
    assert entry.policy_memo == "Reject unless licensed."
    assert entry.exception_conditions == "licensed collaboration"


def test_deactivating_derived_entries_preserves_manual_entries():
    repo = InMemoryRightsFilterRepository()
    automatic = repo.create_rejected_image_entry(
        decision_id="decision-1",
        submission_id="submission-1",
        fingerprints=["phash:auto"],
    )
    manual = create_manual_knowledge_entry(
        entry_type=KnowledgeEntryType.CELEBRITY,
        name="IU",
        sample_fingerprints=["phash:manual"],
    )
    repo.save_knowledge_entry(manual)

    repo.deactivate_entries_for_source_decision("decision-1", "corrected")

    assert repo.knowledge_entry(automatic.id).active is False
    assert repo.knowledge_entry(manual.id).active is True
142
tests/rights_filter/domain/test_records.py
Normal file
@ -0,0 +1,142 @@
from rights_filter.domain.records import (
    AnalysisRun,
    DataClass,
    Evidence,
    EvidenceSource,
    InMemoryRightsFilterRepository,
    KnowledgeBaseEntry,
    KnowledgeEntryType,
    KnowledgeProvenance,
    OperatorDecision,
    ReviewStatus,
    ScoreResult,
)


def test_analysis_run_preserves_evidence_and_latest_score():
    repo = InMemoryRightsFilterRepository()
    run = AnalysisRun.for_submission("submission-1", analysis_version="v1")
    run.add_evidence(
        Evidence(
            source=EvidenceSource.FINGERPRINT,
            reason="Prior rejected image similarity 0.94",
            confidence=0.94,
            data={"similarity": 0.94},
        )
    )
    run.score = ScoreResult(score=91, band="high", reasons=["Prior rejected image similarity 0.94"])

    repo.save_analysis_run(run)

    latest = repo.latest_score_for_submission("submission-1")
    assert latest.score == 91
    assert latest.band == "high"
    assert repo.analysis_runs_for_submission("submission-1")[0].evidence[0].source == EvidenceSource.FINGERPRINT


def test_manual_and_automatic_knowledge_entries_keep_provenance_separate():
    repo = InMemoryRightsFilterRepository()
    manual = KnowledgeBaseEntry.create(
        entry_type=KnowledgeEntryType.CELEBRITY,
        name="아이유",
        provenance=KnowledgeProvenance.MANUAL,
        aliases=["IU", "이지은"],
        policy_memo="권리 증빙 없으면 반려 검토",
        sample_fingerprints=["fp-manual"],
    )
    automatic = KnowledgeBaseEntry.create(
        entry_type=KnowledgeEntryType.REJECTED_IMAGE,
        name="rejected:submission-2",
        provenance=KnowledgeProvenance.AUTOMATIC_REJECTION,
        source_decision_id="decision-2",
        sample_fingerprints=["fp-auto"],
    )

    repo.save_knowledge_entry(manual)
    repo.save_knowledge_entry(automatic)

    assert {entry.provenance for entry in repo.active_knowledge_entries()} == {
        KnowledgeProvenance.MANUAL,
        KnowledgeProvenance.AUTOMATIC_REJECTION,
    }


def test_inactive_knowledge_entry_is_audited_but_excluded_from_matching():
    repo = InMemoryRightsFilterRepository()
    entry = KnowledgeBaseEntry.create(
        entry_type=KnowledgeEntryType.CHARACTER,
        name="테스트 캐릭터",
        provenance=KnowledgeProvenance.MANUAL,
        sample_fingerprints=["fp-character"],
    )
    repo.save_knowledge_entry(entry)

    repo.deactivate_knowledge_entry(entry.id, reason="operator correction")

    assert repo.knowledge_entry(entry.id).active is False
    assert repo.active_knowledge_entries() == []


def test_rejected_decision_creates_linkable_knowledge_source():
    repo = InMemoryRightsFilterRepository()
    decision = OperatorDecision.create(
        submission_id="submission-3",
        status=ReviewStatus.REJECTED,
        memo="유명인 이미지로 판단",
    )
    repo.save_operator_decision(decision)

    entry = repo.create_rejected_image_entry(
        decision_id=decision.id,
        submission_id=decision.submission_id,
        fingerprints=["exact:abc", "phash:def"],
    )

    assert entry.provenance == KnowledgeProvenance.AUTOMATIC_REJECTION
    assert entry.source_decision_id == decision.id
    assert entry.data_classes == {DataClass.IMAGE_FINGERPRINT, DataClass.OPERATOR_NOTE}


def test_search_and_llm_evidence_sources_are_first_class():
    naver = Evidence(
        source=EvidenceSource.NAVER_SEARCH,
        reason="Naver result linked named work",
        confidence=0.82,
        data={
            "query": "IU album cover",
            "rank": 1,
            "result_url": "https://example.test/page",
            "image_url": "https://example.test/image.jpg",
            "thumbnail_url": "https://example.test/thumb.jpg",
            "title": "IU album cover",
            "retrieved_at": "2026-05-25T00:00:00Z",
        },
    )
    summary = Evidence(
        source=EvidenceSource.LLM_SUMMARY,
        reason="Assistant summarized source-linked evidence",
        confidence=0.0,
        data={
            "summary": "Search and web evidence mention the same named work.",
            "source_urls": ["https://example.test/page"],
            "source_evidence_ids": ["evidence-1"],
        },
    )

    assert naver.source == EvidenceSource.NAVER_SEARCH
    assert summary.source == EvidenceSource.LLM_SUMMARY
    assert DataClass.SEARCH_EVIDENCE.value == "search_evidence"
    assert DataClass.LLM_SUMMARY.value == "llm_summary"


def test_repository_can_find_entries_derived_from_source_decision():
    repo = InMemoryRightsFilterRepository()
    entry = repo.create_rejected_image_entry(
        decision_id="decision-1",
        submission_id="submission-1",
        fingerprints=["phash:abc"],
    )

    derived = repo.knowledge_entries_for_source_decision("decision-1")

    assert derived == [entry]
58
tests/rights_filter/governance/test_policies.py
Normal file
@ -0,0 +1,58 @@
from rights_filter.domain.records import DataClass
from rights_filter.governance.policies import (
    GovernancePolicyRegistry,
    assert_no_biometric_template,
    assert_operator_evidence_payload_allowed,
)


def test_each_data_class_has_access_retention_delete_and_correction_policy():
    registry = GovernancePolicyRegistry.default()

    for data_class in DataClass:
        policy = registry.policy_for(data_class)
        assert policy.access_roles
        assert policy.retention
        assert policy.deletion
        assert policy.correction


def test_biometric_template_storage_is_rejected():
    try:
        assert_no_biometric_template({"embedding": [0.1, 0.2, 0.3]})
    except ValueError as error:
        assert "biometric template" in str(error)
    else:
        raise AssertionError("expected biometric template rejection")


def test_search_evidence_rejects_image_payloads_sent_to_naver():
    try:
        assert_operator_evidence_payload_allowed(
            DataClass.SEARCH_EVIDENCE,
            {
                "provider": "naver",
                "query": "IU album cover",
                "original_image": b"raw-image-bytes",
            },
        )
    except ValueError as error:
        assert "image payload" in str(error)
    else:
        raise AssertionError("expected Naver image payload rejection")


def test_llm_summary_must_reference_source_evidence():
    try:
        assert_operator_evidence_payload_allowed(
            DataClass.LLM_SUMMARY,
            {
                "summary": "This appears to be a famous character.",
                "source_urls": [],
                "source_evidence_ids": [],
            },
        )
    except ValueError as error:
        assert "source evidence" in str(error)
    else:
        raise AssertionError("expected source evidence rejection")
@ -0,0 +1,271 @@
from rights_filter.analysis.preprocessing import ImagePayload, build_external_derivative
from rights_filter.domain.records import EvidenceSource
from rights_filter.integrations.cloud_vision_web_detection import (
    CloudVisionWebDetectionAdapter,
    FakeWebDetectionClient,
    GoogleVisionRestClient,
)
from rights_filter.integrations.external_policy import ExternalApiPolicy


def test_disabled_or_unapproved_policy_skips_outbound_call():
    client = FakeWebDetectionClient()
    adapter = CloudVisionWebDetectionAdapter(client)
    policy = ExternalApiPolicy(disabled=True, compliance_approved=True)
    derivative = build_external_derivative(ImagePayload(b"image", 100, 100, {}))

    evidence = adapter.detect("submission-1", derivative, policy)

    assert evidence[0].source == EvidenceSource.EXTERNAL_SKIPPED
    assert client.calls == []


def test_approved_policy_sends_only_derivative_and_maps_web_evidence():
    client = FakeWebDetectionClient(
        response={
            "web_entities": [{"description": "아이유", "score": 0.92}],
            "full_matching_images": [{"url": "https://example.com/image.jpg"}],
            "pages_with_matching_images": [{"url": "https://example.com/page"}],
            "best_guess_labels": [{"label": "IU photo"}],
        }
    )
    adapter = CloudVisionWebDetectionAdapter(client)
    policy = ExternalApiPolicy(
        compliance_approved=True,
        metadata_logging_accepted=True,
        allow_online_sync=True,
        daily_limit=10,
    )
    original = ImagePayload(b"image", 3000, 2000, {"EXIF": "secret"})
    derivative = build_external_derivative(original, max_side=1600)

    evidence = adapter.detect("submission-2", derivative, policy)

    assert client.calls == [derivative]
    assert all(item.source == EvidenceSource.WEB_DETECTION for item in evidence)
    assert any(item.data.get("entity") == "아이유" for item in evidence)
    assert any(item.data.get("url") == "https://example.com/image.jpg" for item in evidence)


def test_google_image_search_results_are_mapped_as_operator_evidence():
    client = FakeWebDetectionClient(
        response={
            "partial_matching_images": [{"url": "https://example.com/partial.jpg", "score": 0.77}],
            "visually_similar_images": [{"url": "https://example.com/similar.jpg", "score": 0.64}],
            "pages_with_matching_images": [
                {
                    "url": "https://example.com/article",
                    "score": 0.82,
                    "page_title": "Celebrity photo source",
                }
            ],
        }
    )
    adapter = CloudVisionWebDetectionAdapter(client)
    policy = ExternalApiPolicy(
        compliance_approved=True,
        metadata_logging_accepted=True,
        allow_online_sync=True,
    )

    evidence = adapter.detect(
        "submission-search",
        build_external_derivative(ImagePayload(b"image", 100, 100, {})),
        policy,
    )

    matches = {item.data.get("match"): item for item in evidence}
    assert matches["partial"].data["image_url"] == "https://example.com/partial.jpg"
    assert matches["visual"].reason == "Google visually similar image found"
    assert matches["page"].data["page_title"] == "Celebrity photo source"
    assert all(item.data["provider"] == "google" for item in evidence)


def test_image_match_evidence_preserves_url_variants_as_compare_candidates():
    client = FakeWebDetectionClient(
        response={
            "full_matching_images": [
                {
                    "image_url": "https://cdn.example.com/full.webp",
                    "thumbnail_url": "https://cdn.example.com/full-thumb.webp",
                    "score": 0.88,
                }
            ],
            "partial_matching_images": [
                {"contentUrl": "https://cdn.example.com/partial.png"}
            ],
            "visually_similar_images": [
                {"src": "https://cdn.example.com/visual.jpg"}
            ],
        }
    )
    adapter = CloudVisionWebDetectionAdapter(client)
    policy = ExternalApiPolicy(
        compliance_approved=True,
        metadata_logging_accepted=True,
        allow_online_sync=True,
    )

    evidence = adapter.detect(
        "submission-search",
        build_external_derivative(ImagePayload(b"image", 100, 100, {})),
        policy,
    )

    matches = {item.data.get("match"): item for item in evidence}
    assert matches["full"].data["image_url"] == "https://cdn.example.com/full.webp"
    assert matches["full"].data["thumbnail_url"] == "https://cdn.example.com/full-thumb.webp"
    assert matches["partial"].data["image_url"] == "https://cdn.example.com/partial.png"
    assert matches["visual"].data["image_url"] == "https://cdn.example.com/visual.jpg"


def test_google_rest_client_preserves_image_candidate_url_variants_from_raw_matches():
    class FakeTransport:
        def request_json(self, method, url, headers=None, payload=None, timeout=15):
            return {
                "responses": [
                    {
                        "webDetection": {
                            "fullMatchingImages": [
                                {
                                    "imageUrl": "https://cdn.example.com/full.webp",
                                    "thumbnailUrl": "https://cdn.example.com/full-thumb.webp",
                                }
                            ],
                            "partialMatchingImages": [
                                {"contentUrl": "https://cdn.example.com/partial.png"}
                            ],
                            "visuallySimilarImages": [
                                {"src": "https://cdn.example.com/visual.jpg"}
                            ],
                        }
                    }
                ]
            }

    client = GoogleVisionRestClient("google-key", transport=FakeTransport())

    result = client.detect_web(ImagePayload(b"image", 100, 100, {}))

    assert result["full_matching_images"][0]["url"] == "https://cdn.example.com/full.webp"
    assert result["full_matching_images"][0]["thumbnail_url"] == "https://cdn.example.com/full-thumb.webp"
    assert result["partial_matching_images"][0]["url"] == "https://cdn.example.com/partial.png"
    assert result["visually_similar_images"][0]["url"] == "https://cdn.example.com/visual.jpg"


def test_page_evidence_preserves_nested_image_candidates():
    client = FakeWebDetectionClient(
        response={
            "pages_with_matching_images": [
                {
                    "url": "https://example.com/article",
                    "score": 0.82,
                    "page_title": "Celebrity photo source",
                    "page_image_urls": ["https://cdn.example.com/already-known.jpg"],
                    "full_matching_images": [{"url": "https://cdn.example.com/full.jpg"}],
                    "partial_matching_images": [{"url": "https://cdn.example.com/partial.png"}],
                    "thumbnail_url": "https://cdn.example.com/thumb.webp",
                }
            ],
        }
    )
    adapter = CloudVisionWebDetectionAdapter(client)
    policy = ExternalApiPolicy(
        compliance_approved=True,
        metadata_logging_accepted=True,
        allow_online_sync=True,
    )

    evidence = adapter.detect(
        "submission-search",
        build_external_derivative(ImagePayload(b"image", 100, 100, {})),
        policy,
    )

    page = evidence[0]
    assert page.data["page_image_urls"] == [
        "https://cdn.example.com/already-known.jpg",
        "https://cdn.example.com/full.jpg",
        "https://cdn.example.com/partial.png",
        "https://cdn.example.com/thumb.webp",
    ]


def test_google_rest_client_preserves_page_image_candidates_from_raw_web_pages():
    class FakeTransport:
        def request_json(self, method, url, headers=None, payload=None, timeout=15):
            return {
                "responses": [
                    {
                        "webDetection": {
                            "pagesWithMatchingImages": [
                                {
                                    "url": "https://example.com/article",
                                    "pageTitle": "Celebrity photo source",
                                    "fullMatchingImages": [
                                        {"url": "https://cdn.example.com/full.jpg"}
                                    ],
                                    "partialMatchingImages": [
                                        {"url": "https://cdn.example.com/partial.png"}
                                    ],
                                    "visuallySimilarImages": [
                                        {"url": "https://cdn.example.com/visual.webp"}
                                    ],
                                    "thumbnailUrl": "https://cdn.example.com/thumb.webp",
                                }
                            ]
                        }
                    }
                ]
            }

    client = GoogleVisionRestClient("google-key", transport=FakeTransport())

    result = client.detect_web(ImagePayload(b"image", 100, 100, {}))

    assert result["pages_with_matching_images"][0]["page_image_urls"] == [
        "https://cdn.example.com/full.jpg",
        "https://cdn.example.com/partial.png",
        "https://cdn.example.com/visual.webp",
        "https://cdn.example.com/thumb.webp",
    ]


def test_best_guess_labels_are_mapped_as_weak_non_contributing_hints():
    client = FakeWebDetectionClient(
        response={"best_guess_labels": [{"label": "gentleman"}]}
    )
    adapter = CloudVisionWebDetectionAdapter(client)
    policy = ExternalApiPolicy(
        compliance_approved=True,
        metadata_logging_accepted=True,
        allow_online_sync=True,
    )

    evidence = adapter.detect(
        "submission-weak",
        build_external_derivative(ImagePayload(b"image", 100, 100, {})),
        policy,
    )

    assert evidence[0].reason == "Google weak label gentleman"
    assert evidence[0].confidence == 0.0
    assert evidence[0].data["weak_hint"] is True


def test_quota_exhaustion_records_failure_reason():
    client = FakeWebDetectionClient()
    adapter = CloudVisionWebDetectionAdapter(client)
    policy = ExternalApiPolicy(
        compliance_approved=True,
        metadata_logging_accepted=True,
        allow_online_sync=True,
        daily_limit=0,
    )
    derivative = build_external_derivative(ImagePayload(b"image", 100, 100, {}))

    evidence = adapter.detect("submission-3", derivative, policy)

    assert evidence[0].source == EvidenceSource.EXTERNAL_SKIPPED
    assert "usage limit" in evidence[0].reason
    assert client.calls == []
275
tests/rights_filter/integrations/test_env_clients.py
Normal file
@@ -0,0 +1,275 @@
import base64

from rights_filter.analysis.preprocessing import ImagePayload
from rights_filter.domain.records import Evidence, EvidenceSource
from rights_filter.integrations.cloud_vision_web_detection import GoogleVisionRestClient
from rights_filter.integrations.env_clients import build_provider_runtime
from rights_filter.integrations.google_custom_search import GoogleCustomSearchClient
from rights_filter.integrations.naver_search import NaverOpenApiSearchClient
from rights_filter.analysis.llm_assistance import OllamaGenerateLlmClient


class FakeJsonTransport:
    def __init__(self, response):
        self.response = response
        self.calls = []

    def request_json(self, method, url, headers=None, payload=None, timeout=10):
        self.calls.append(
            {
                "method": method,
                "url": url,
                "headers": headers or {},
                "payload": payload,
                "timeout": timeout,
            }
        )
        return self.response


def test_naver_open_api_client_uses_env_headers_and_text_query():
    transport = FakeJsonTransport({"items": [{"title": "IU", "link": "https://image"}]})
    client = NaverOpenApiSearchClient(
        client_id="naver-id",
        client_secret="naver-secret",
        transport=transport,
    )

    response = client.search_image("IU album cover")

    call = transport.calls[0]
    assert response["items"][0]["title"] == "IU"
    assert call["method"] == "GET"
    assert call["url"].startswith("https://openapi.naver.com/v1/search/image?")
    assert "query=IU+album+cover" in call["url"]
    assert call["headers"]["X-Naver-Client-Id"] == "naver-id"
    assert call["headers"]["X-Naver-Client-Secret"] == "naver-secret"


def test_google_vision_rest_client_sends_web_detection_request_with_api_key():
    transport = FakeJsonTransport(
        {
            "responses": [
                {
                    "webDetection": {
                        "webEntities": [{"description": "IU", "score": 0.91}],
                        "fullMatchingImages": [{"url": "https://image"}],
                        "pagesWithMatchingImages": [{"url": "https://page"}],
                        "bestGuessLabels": [{"label": "IU photo"}],
                    }
                }
            ]
        }
    )
    client = GoogleVisionRestClient(api_key="google-key", transport=transport)

    response = client.detect_web(ImagePayload(b"image-bytes", 100, 100, {}))

    call = transport.calls[0]
    assert call["method"] == "POST"
    assert call["url"] == "https://vision.googleapis.com/v1/images:annotate?key=google-key"
    assert call["payload"]["requests"][0]["features"][0]["type"] == "WEB_DETECTION"
    assert call["payload"]["requests"][0]["image"]["content"] == base64.b64encode(b"image-bytes").decode("ascii")
    assert response["web_entities"][0]["description"] == "IU"
    assert response["full_matching_images"][0]["url"] == "https://image"


def test_google_vision_rest_client_preserves_all_image_search_result_types():
    transport = FakeJsonTransport(
        {
            "responses": [
                {
                    "webDetection": {
                        "partialMatchingImages": [{"url": "https://partial", "score": 0.7}],
                        "visuallySimilarImages": [{"url": "https://similar", "score": 0.5}],
                        "pagesWithMatchingImages": [
                            {
                                "url": "https://page",
                                "score": 0.8,
                                "pageTitle": "Known celebrity page",
                                "fullMatchingImages": [{"url": "https://page/full.jpg"}],
                                "partialMatchingImages": [{"url": "https://page/partial.jpg"}],
                            }
                        ],
                    }
                }
            ]
        }
    )
    client = GoogleVisionRestClient(api_key="google-key", transport=transport)

    response = client.detect_web(ImagePayload(b"image-bytes", 100, 100, {}))

    assert response["partial_matching_images"][0] == {"url": "https://partial", "score": 0.7}
    assert response["visually_similar_images"][0] == {"url": "https://similar", "score": 0.5}
    assert response["pages_with_matching_images"][0]["page_title"] == "Known celebrity page"
    assert response["pages_with_matching_images"][0]["page_image_urls"] == [
        "https://page/full.jpg",
        "https://page/partial.jpg",
    ]


def test_google_custom_search_client_uses_env_key_and_search_engine_id():
    transport = FakeJsonTransport({"items": []})
    client = GoogleCustomSearchClient(
        api_key="google-search-key",
        cx="search-engine-id",
        transport=transport,
    )

    client.search_image("IU official profile")

    call = transport.calls[0]
    assert call["method"] == "GET"
    assert call["url"].startswith("https://www.googleapis.com/customsearch/v1?")
    assert "key=google-search-key" in call["url"]
    assert "cx=search-engine-id" in call["url"]
    assert "q=IU+official+profile" in call["url"]
    assert "searchType=image" in call["url"]


def test_ollama_generate_client_uses_local_endpoint_and_returns_response_text():
    transport = FakeJsonTransport({"response": "Source-linked summary"})
    client = OllamaGenerateLlmClient(
        base_url="http://localhost:11434",
        model="qwen2.5:0.5b-instruct",
        transport=transport,
    )
    evidence = [
        Evidence(
            source=EvidenceSource.NAVER_SEARCH,
            reason="Naver search result found",
            confidence=0.8,
            data={"title": "IU", "result_url": "https://source"},
        )
    ]

    summary = client.summarize_evidence(evidence)

    call = transport.calls[0]
    assert summary == "Source-linked summary"
    assert call["method"] == "POST"
    assert call["url"] == "http://localhost:11434/api/generate"
    assert call["headers"] == {}
    assert call["payload"]["model"] == "qwen2.5:0.5b-instruct"
    assert call["payload"]["stream"] is False
    assert "source evidence" in call["payload"]["prompt"].lower()
    assert "source evidence" in call["payload"]["system"].lower()


def test_provider_runtime_wires_clients_from_environment():
    env = {
        "NAVER_CLIENT_ID": "naver-id",
        "NAVER_CLIENT_SECRET": "naver-secret",
        "GOOGLE_CLOUD_VISION_API_KEY": "google-key",
        "GOOGLE_CUSTOM_SEARCH_API_KEY": "google-search-key",
        "GOOGLE_CUSTOM_SEARCH_CX": "search-engine-id",
        "OLLAMA_BASE_URL": "http://localhost:11434",
        "OLLAMA_MODEL": "qwen2.5:0.5b-instruct",
    }

    runtime = build_provider_runtime(env=env, transport=FakeJsonTransport({}))

    assert runtime.naver_adapter is not None
    assert runtime.google_adapter is not None
    assert runtime.google_custom_search_adapter is not None
    assert runtime.llm_assistant is not None
    assert runtime.provider_payloads["naver"]["enabled"] is True
    assert runtime.provider_payloads["google"]["enabled"] is True
    assert runtime.provider_payloads["google_search"]["enabled"] is True
    assert runtime.provider_payloads["llm"]["enabled"] is True
    assert runtime.provider_payloads["google_search"]["requiredEnv"] == [
        "GOOGLE_CUSTOM_SEARCH_API_KEY",
        "GOOGLE_CUSTOM_SEARCH_CX",
    ]
    assert runtime.provider_payloads["google_search"]["configuredEnv"] == {
        "GOOGLE_CUSTOM_SEARCH_API_KEY": True,
        "GOOGLE_CUSTOM_SEARCH_CX": True,
    }
    assert runtime.auto_naver_query_limit == 3
    assert runtime.auto_naver_blog_query_limit == 1
    assert runtime.auto_naver_web_query_limit == 1
    assert runtime.auto_google_custom_query_limit == 2
    assert runtime.search_result_compare_limit == 3
    assert runtime.search_result_page_image_limit == 3
    assert runtime.search_result_similarity_threshold == 0.9


def test_provider_runtime_wires_search_pagination_limits_from_environment():
    runtime = build_provider_runtime(
        env={
            "NAVER_CLIENT_ID": "naver-id",
            "NAVER_CLIENT_SECRET": "naver-secret",
            "NAVER_SEARCH_PAGES": "2",
            "NAVER_BLOG_SEARCH_PAGES": "3",
            "NAVER_WEB_SEARCH_PAGES": "4",
            "GOOGLE_CUSTOM_SEARCH_API_KEY": "google-search-key",
            "GOOGLE_CUSTOM_SEARCH_CX": "search-engine-id",
            "GOOGLE_CUSTOM_SEARCH_IMAGE_PAGES": "2",
            "GOOGLE_CUSTOM_SEARCH_WEB_PAGES": "3",
        },
        transport=FakeJsonTransport({"items": []}),
    )

    naver_client = runtime.naver_adapter.client
    google_search_client = runtime.google_custom_search_adapter.client

    assert naver_client.image_pages == 2
    assert naver_client.blog_pages == 3
    assert naver_client.web_pages == 4
    assert google_search_client.image_pages == 2
    assert google_search_client.web_pages == 3


def test_provider_runtime_reads_bounded_auto_naver_query_limit():
    runtime = build_provider_runtime(
        env={
            "COPYRIGHTER_AUTO_NAVER_QUERY_LIMIT": "25",
            "COPYRIGHTER_AUTO_NAVER_BLOG_QUERY_LIMIT": "25",
            "COPYRIGHTER_AUTO_NAVER_WEB_QUERY_LIMIT": "25",
            "COPYRIGHTER_AUTO_GOOGLE_CUSTOM_QUERY_LIMIT": "25",
            "COPYRIGHTER_SEARCH_RESULT_COMPARE_LIMIT": "50",
            "COPYRIGHTER_SEARCH_RESULT_PAGE_IMAGE_LIMIT": "25",
            "COPYRIGHTER_SEARCH_RESULT_SIMILARITY_THRESHOLD": "10",
        },
        transport=FakeJsonTransport({}),
    )

    assert runtime.auto_naver_query_limit == 10
    assert runtime.auto_naver_blog_query_limit == 10
    assert runtime.auto_naver_web_query_limit == 10
    assert runtime.auto_google_custom_query_limit == 10
    assert runtime.search_result_compare_limit == 20
    assert runtime.search_result_page_image_limit == 10
    assert runtime.search_result_similarity_threshold == 1.0


def test_provider_runtime_clamps_search_result_similarity_threshold():
    runtime = build_provider_runtime(
        env={"COPYRIGHTER_SEARCH_RESULT_SIMILARITY_THRESHOLD": "-0.2"},
        transport=FakeJsonTransport({}),
    )

    assert runtime.search_result_similarity_threshold == 0.0


def test_provider_runtime_leaves_missing_env_providers_disabled():
    runtime = build_provider_runtime(env={}, transport=FakeJsonTransport({}))

    assert runtime.naver_adapter is None
    assert runtime.google_adapter is None
    assert runtime.google_custom_search_adapter is None
    assert runtime.llm_assistant is not None
    assert runtime.provider_payloads["naver"]["enabled"] is False
    assert runtime.provider_payloads["google"]["lastFailure"] == "missing GOOGLE_CLOUD_VISION_API_KEY"
    assert runtime.provider_payloads["google_search"]["lastFailure"] == "missing GOOGLE_CUSTOM_SEARCH_API_KEY or GOOGLE_CUSTOM_SEARCH_CX"
    assert runtime.provider_payloads["google_search"]["requiredEnv"] == [
        "GOOGLE_CUSTOM_SEARCH_API_KEY",
        "GOOGLE_CUSTOM_SEARCH_CX",
    ]
    assert runtime.provider_payloads["google_search"]["configuredEnv"] == {
        "GOOGLE_CUSTOM_SEARCH_API_KEY": False,
        "GOOGLE_CUSTOM_SEARCH_CX": False,
    }
    assert runtime.provider_payloads["llm"]["enabled"] is True
    assert runtime.provider_payloads["llm"]["compliance"] == "Ollama local API configured (qwen2.5:0.5b-instruct)"
892
tests/rights_filter/integrations/test_google_custom_search.py
Normal file
@@ -0,0 +1,892 @@
from rights_filter.domain.records import EvidenceSource
from rights_filter.integrations.google_custom_search import (
    GoogleCustomSearchAdapter,
    GoogleCustomSearchClient,
)
from rights_filter.integrations.search_policy import SearchApiPolicy


class FakeJsonTransport:
    def __init__(self, response):
        self.response = response
        self.calls = []

    def request_json(self, method, url, headers=None, payload=None, timeout=10):
        self.calls.append(
            {
                "method": method,
                "url": url,
                "headers": headers or {},
                "payload": payload,
                "timeout": timeout,
            }
        )
        return self.response


class FakeGoogleCustomSearchClient:
    def __init__(self, image_response=None, web_response=None):
        self.image_response = image_response or {"items": []}
        self.web_response = web_response or {"items": []}
        self.image_calls = []
        self.web_calls = []

    def search_image(self, query: str):
        self.image_calls.append(query)
        return self.image_response

    def search_web(self, query: str):
        self.web_calls.append(query)
        return self.web_response


def test_google_custom_search_client_calls_image_search_with_key_and_cx():
    transport = FakeJsonTransport({"items": []})
    client = GoogleCustomSearchClient(
        api_key="google-search-key",
        cx="search-engine-id",
        transport=transport,
        image_num=3,
    )

    client.search_image("IU official profile")

    call = transport.calls[0]
    assert call["method"] == "GET"
    assert call["url"].startswith("https://www.googleapis.com/customsearch/v1?")
    assert "key=google-search-key" in call["url"]
    assert "cx=search-engine-id" in call["url"]
    assert "q=IU+official+profile" in call["url"]
    assert "searchType=image" in call["url"]
    assert "num=3" in call["url"]


def test_google_custom_search_client_merges_configured_image_pages():
    class PagingTransport:
        def __init__(self):
            self.calls = []

        def request_json(self, method, url, headers=None, payload=None, timeout=10):
            self.calls.append({"method": method, "url": url})
            if "start=4" in url:
                return {
                    "items": [
                        {
                            "title": "page 2",
                            "link": "https://example.test/page-2.jpg",
                        }
                    ]
                }
            return {
                "items": [
                    {
                        "title": "page 1",
                        "link": "https://example.test/page-1.jpg",
                    }
                ]
            }

    transport = PagingTransport()
    client = GoogleCustomSearchClient(
        api_key="google-search-key",
        cx="search-engine-id",
        transport=transport,
        image_num=3,
        image_pages=2,
    )

    response = client.search_image("IU official profile")

    assert [item["link"] for item in response["items"]] == [
        "https://example.test/page-1.jpg",
        "https://example.test/page-2.jpg",
    ]
    assert "start=1" in transport.calls[0]["url"]
    assert "start=4" in transport.calls[1]["url"]


def test_google_custom_image_results_are_mapped_as_image_evidence():
    adapter = GoogleCustomSearchAdapter(
        FakeGoogleCustomSearchClient(
            image_response={
                "items": [
                    {
                        "title": "IU official profile",
                        "link": "https://example.test/iu.png",
                        "displayLink": "example.test",
                        "snippet": "profile image",
                        "image": {
                            "thumbnailLink": "https://example.test/iu-thumb.png",
                            "contextLink": "https://example.test/profile",
                            "height": 900,
                            "width": 700,
                        },
                    }
                ]
            }
        )
    )
    policy = SearchApiPolicy(
        compliance_approved=True,
        allowed_providers={"google_custom_search"},
    )

    evidence = adapter.search_images("SUB-1", "IU official profile", policy)

    assert evidence[0].source == EvidenceSource.WEB_DETECTION
    assert evidence[0].reason == "Google custom image search result found"
    assert evidence[0].data["provider"] == "google_custom_search"
    assert evidence[0].data["query_signature"] == "google-custom-image:iu official profile"
    assert evidence[0].data["search_type"] == "image"
    assert evidence[0].data["image_url"] == "https://example.test/iu.png"
    assert evidence[0].data["thumbnail_url"] == "https://example.test/iu-thumb.png"
    assert evidence[0].data["result_url"] == "https://example.test/profile"
    assert evidence[0].data["match"] == "search_image"


def test_google_custom_image_results_unwrap_direct_proxy_image_urls():
    adapter = GoogleCustomSearchAdapter(
        FakeGoogleCustomSearchClient(
            image_response={
                "items": [
                    {
                        "title": "IU official profile",
                        "link": (
                            "https://images.example.test/imgres?"
                            "imgurl=cdn.example.test%2Fiu-profile.png"
                            "&imgrefurl=https%3A%2F%2Fexample.test%2Fprofile"
                        ),
                        "displayLink": "example.test",
                        "snippet": "profile image",
                        "image": {
                            "thumbnailLink": (
                                "https://images.example.test/thumb?"
                                "image_url=https%3A%2F%2Fcdn.example.test%2Fiu-thumb.jpg"
                            ),
                            "contextLink": "https://example.test/profile?a=1&b=2",
                            "height": 900,
                            "width": 700,
                        },
                    }
                ]
            }
        )
    )
    policy = SearchApiPolicy(
        compliance_approved=True,
        allowed_providers={"google_custom_search"},
    )

    evidence = adapter.search_images("SUB-1", "IU official profile", policy)

    assert evidence[0].data["image_url"] == "https://cdn.example.test/iu-profile.png"
    assert evidence[0].data["thumbnail_url"] == "https://cdn.example.test/iu-thumb.jpg"
    assert evidence[0].data["result_url"] == "https://example.test/profile?a=1&b=2"


def test_google_custom_image_results_unwrap_weak_proxy_jfif_image_urls():
    adapter = GoogleCustomSearchAdapter(
        FakeGoogleCustomSearchClient(
            image_response={
                "items": [
                    {
                        "title": "IU official profile",
                        "link": (
                            "https://images.example.test/imgres?"
                            "url=https%3A%2F%2Fcdn.example.test%2Fiu-profile.jfif"
                        ),
                        "displayLink": "example.test",
                        "snippet": "profile image",
                        "image": {
                            "thumbnailLink": (
                                "https://images.example.test/thumb?"
                                "u=https%3A%2F%2Fcdn.example.test%2Fiu-thumb.jfif"
                            ),
                        },
                    }
                ]
            }
        )
    )
    policy = SearchApiPolicy(
        compliance_approved=True,
        allowed_providers={"google_custom_search"},
    )

    evidence = adapter.search_images("SUB-1", "IU official profile", policy)

    assert evidence[0].data["image_url"] == "https://cdn.example.test/iu-profile.jfif"
    assert evidence[0].data["thumbnail_url"] == "https://cdn.example.test/iu-thumb.jfif"
    assert evidence[0].data["result_url"] == "https://cdn.example.test/iu-profile.jfif"


def test_google_custom_image_results_unwrap_weak_proxy_query_format_image_urls():
    adapter = GoogleCustomSearchAdapter(
        FakeGoogleCustomSearchClient(
            image_response={
                "items": [
                    {
                        "title": "IU official profile",
                        "link": (
                            "https://images.example.test/imgres?"
                            "url=https%3A%2F%2Fcdn.example.test%2Frender%3Fformat%3Dwebp"
                        ),
                        "displayLink": "example.test",
                        "snippet": "profile image",
                        "image": {
                            "thumbnailLink": (
                                "https://images.example.test/thumb?"
                                "u=https%3A%2F%2Fcdn.example.test%2Fthumb%3Ffm%3Djpg"
                            ),
                        },
                    }
                ]
            }
        )
    )
    policy = SearchApiPolicy(
        compliance_approved=True,
        allowed_providers={"google_custom_search"},
    )

    evidence = adapter.search_images("SUB-1", "IU official profile", policy)

    assert evidence[0].data["image_url"] == "https://cdn.example.test/render?format=webp"
    assert evidence[0].data["thumbnail_url"] == "https://cdn.example.test/thumb?fm=jpg"
    assert evidence[0].data["result_url"] == "https://cdn.example.test/render?format=webp"


def test_google_custom_image_results_unwrap_scheme_less_proxy_query_format_image_urls():
    adapter = GoogleCustomSearchAdapter(
        FakeGoogleCustomSearchClient(
            image_response={
                "items": [
                    {
                        "title": "IU official profile",
                        "link": (
                            "https://images.example.test/imgres?"
                            "url=cdn.example.test%2Frender%3Fformat%3Dwebp"
                        ),
                        "displayLink": "example.test",
                        "snippet": "profile image",
                        "image": {
                            "thumbnailLink": (
                                "https://images.example.test/thumb?"
                                "u=cdn.example.test%2Fthumb%3Ffm%3Djpg"
                            ),
                        },
                    }
                ]
            }
        )
    )
    policy = SearchApiPolicy(
        compliance_approved=True,
        allowed_providers={"google_custom_search"},
    )

    evidence = adapter.search_images("SUB-1", "IU official profile", policy)

    assert evidence[0].data["image_url"] == "https://cdn.example.test/render?format=webp"
    assert evidence[0].data["thumbnail_url"] == "https://cdn.example.test/thumb?fm=jpg"
    assert evidence[0].data["result_url"] == "https://cdn.example.test/render?format=webp"


def test_google_custom_image_results_preserve_pagemap_image_candidates():
    adapter = GoogleCustomSearchAdapter(
        FakeGoogleCustomSearchClient(
            image_response={
                "items": [
                    {
                        "title": "IU official profile",
                        "link": "https://example.test/blocked-original.png",
                        "displayLink": "example.test",
                        "snippet": "profile image",
                        "image": {
                            "thumbnailLink": "https://example.test/blocked-thumb.png",
                            "contextLink": "https://example.test/profile/page",
                        },
                        "pagemap": {
                            "cse_image": [{"src": "/media/profile-from-page.png"}],
                            "metatags": [{"og:image": "https://cdn.example.test/profile-og.png"}],
                        },
                    }
                ]
            }
        )
    )
    policy = SearchApiPolicy(
        compliance_approved=True,
        allowed_providers={"google_custom_search"},
    )

    evidence = adapter.search_images("SUB-1", "IU official profile", policy)

    assert evidence[0].data["page_image_urls"] == [
        "https://example.test/media/profile-from-page.png",
        "https://cdn.example.test/profile-og.png",
    ]


def test_google_custom_web_results_preserve_page_image_candidates():
    adapter = GoogleCustomSearchAdapter(
        FakeGoogleCustomSearchClient(
            web_response={
                "items": [
                    {
                        "title": "IU profile page",
                        "link": "https://example.test/profile",
                        "displayLink": "example.test",
                        "snippet": "profile image",
                        "pagemap": {
                            "cse_image": [{"src": "https://example.test/page-image.png"}],
                            "metatags": [{"og:image": "https://example.test/og-image.png"}],
                        },
                    }
                ]
            }
        )
    )
    policy = SearchApiPolicy(
        compliance_approved=True,
        allowed_providers={"google_custom_search"},
    )

    evidence = adapter.search_web_pages("SUB-1", "IU official profile", policy)

    assert evidence[0].source == EvidenceSource.WEB_DETECTION
    assert evidence[0].reason == "Google custom web search result found"
    assert evidence[0].data["query_signature"] == "google-custom-web:iu official profile"
    assert evidence[0].data["search_type"] == "web"
    assert evidence[0].data["result_url"] == "https://example.test/profile"
    assert evidence[0].data["page_image_urls"] == [
        "https://example.test/page-image.png",
        "https://example.test/og-image.png",
    ]
    assert evidence[0].data["match"] == "page"


def test_google_custom_web_results_unwrap_redirect_result_urls_before_resolving_page_images():
    adapter = GoogleCustomSearchAdapter(
        FakeGoogleCustomSearchClient(
            web_response={
                "items": [
                    {
                        "title": "IU profile page",
                        "link": (
                            "https://www.google.test/url?"
                            "url=https%3A%2F%2Fexample.test%2Fprofiles%2Fiu"
                            "&sa=U"
                        ),
                        "displayLink": "example.test",
                        "snippet": "profile image",
                        "pagemap": {
                            "cse_image": [{"src": "/media/profile.png"}],
                        },
                    }
                ]
            }
        )
    )
    policy = SearchApiPolicy(
        compliance_approved=True,
        allowed_providers={"google_custom_search"},
    )

    evidence = adapter.search_web_pages("SUB-1", "IU official profile", policy)

    assert evidence[0].data["result_url"] == "https://example.test/profiles/iu"
    assert evidence[0].data["page_image_urls"] == [
        "https://example.test/media/profile.png",
    ]


def test_google_custom_web_results_preserve_common_pagemap_image_objects():
    adapter = GoogleCustomSearchAdapter(
        FakeGoogleCustomSearchClient(
            web_response={
                "items": [
                    {
                        "title": "IU profile page",
                        "link": "https://example.test/profile",
                        "displayLink": "example.test",
                        "snippet": "profile image",
                        "pagemap": {
                            "cse_thumbnail": [{"src": "https://example.test/thumb.jpg"}],
                            "imageobject": [
                                {
                                    "url": "https://example.test/image-url.jpg",
                                    "contenturl": "https://example.test/content-url.jpg",
                                    "thumbnailurl": "https://example.test/object-thumb.jpg",
                                }
                            ],
                            "thumbnail": [{"src": "https://example.test/thumb-entry.jpg"}],
                        },
                    }
                ]
            }
        )
    )
    policy = SearchApiPolicy(
        compliance_approved=True,
        allowed_providers={"google_custom_search"},
    )

    evidence = adapter.search_web_pages("SUB-1", "IU official profile", policy)

    assert evidence[0].data["page_image_urls"] == [
        "https://example.test/thumb.jpg",
        "https://example.test/image-url.jpg",
        "https://example.test/content-url.jpg",
        "https://example.test/object-thumb.jpg",
        "https://example.test/thumb-entry.jpg",
    ]


def test_google_custom_web_results_ignore_non_image_pagemap_metadata_inside_image_objects():
|
||||||
|
adapter = GoogleCustomSearchAdapter(
|
||||||
|
FakeGoogleCustomSearchClient(
|
||||||
|
web_response={
|
||||||
|
"items": [
|
||||||
|
{
|
||||||
|
"title": "IU profile page",
|
||||||
|
"link": "https://example.test/profile",
|
||||||
|
"displayLink": "example.test",
|
||||||
|
"snippet": "profile image",
|
||||||
|
"pagemap": {
|
||||||
|
"imageobject": [
|
||||||
|
{
|
||||||
|
"url": "https://example.test/image-url.jpg",
|
||||||
|
"width": "1200",
|
||||||
|
"height": "900",
|
||||||
|
"caption": "IU official profile",
|
||||||
|
}
|
||||||
|
],
|
||||||
|
},
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
)
|
||||||
|
)
|
||||||
|
policy = SearchApiPolicy(
|
||||||
|
compliance_approved=True,
|
||||||
|
allowed_providers={"google_custom_search"},
|
||||||
|
)
|
||||||
|
|
||||||
|
evidence = adapter.search_web_pages("SUB-1", "IU official profile", policy)
|
||||||
|
|
||||||
|
assert evidence[0].data["page_image_urls"] == [
|
||||||
|
"https://example.test/image-url.jpg",
|
||||||
|
]
|
||||||
|
|
||||||
|
|
||||||
|
def test_google_custom_web_results_ignore_non_image_metatag_image_metadata():
    adapter = GoogleCustomSearchAdapter(
        FakeGoogleCustomSearchClient(
            web_response={
                "items": [
                    {
                        "title": "IU profile page",
                        "link": "https://example.test/profile",
                        "displayLink": "example.test",
                        "snippet": "profile image",
                        "pagemap": {
                            "metatags": [
                                {
                                    "og:image": "https://cdn.example.test/profile.jpg",
                                    "og:image:width": "1200",
                                    "og:image:height": "900",
                                    "twitter:image:alt": "IU official profile",
                                }
                            ],
                        },
                    }
                ]
            }
        )
    )
    policy = SearchApiPolicy(
        compliance_approved=True,
        allowed_providers={"google_custom_search"},
    )

    evidence = adapter.search_web_pages("SUB-1", "IU official profile", policy)

    assert evidence[0].data["page_image_urls"] == [
        "https://cdn.example.test/profile.jpg",
    ]


def test_google_custom_web_results_preserve_generic_pagemap_image_fields():
    adapter = GoogleCustomSearchAdapter(
        FakeGoogleCustomSearchClient(
            web_response={
                "items": [
                    {
                        "title": "IU profile page",
                        "link": "https://example.test/profile",
                        "displayLink": "example.test",
                        "snippet": "profile image",
                        "pagemap": {
                            "person": [{"image": "https://cdn.example.test/person-profile.jpg"}],
                            "article": [{"url": "https://example.test/plain-page-url"}],
                            "webpage": [
                                {
                                    "primaryimageofpage": {
                                        "url": "/media/primary-profile.jpg",
                                    }
                                }
                            ],
                            "hcard": [{"photo": "https://cdn.example.test/hcard-photo.jpg"}],
                        },
                    }
                ]
            }
        )
    )
    policy = SearchApiPolicy(
        compliance_approved=True,
        allowed_providers={"google_custom_search"},
    )

    evidence = adapter.search_web_pages("SUB-1", "IU official profile", policy)

    assert evidence[0].data["page_image_urls"] == [
        "https://cdn.example.test/person-profile.jpg",
        "https://example.test/media/primary-profile.jpg",
        "https://cdn.example.test/hcard-photo.jpg",
    ]


def test_google_custom_web_results_preserve_generic_pagemap_public_url_image_fields():
    adapter = GoogleCustomSearchAdapter(
        FakeGoogleCustomSearchClient(
            web_response={
                "items": [
                    {
                        "title": "IU profile page",
                        "link": "https://example.test/profile",
                        "displayLink": "example.test",
                        "snippet": "profile image",
                        "pagemap": {
                            "webpage": [
                                {
                                    "primaryimageofpage": {
                                        "file": {
                                            "publicUrl": "/media/profile-public-url.png",
                                        },
                                        "canonicalUrl": "https://example.test/not-an-image-page",
                                    }
                                }
                            ],
                        },
                    }
                ]
            }
        )
    )
    policy = SearchApiPolicy(
        compliance_approved=True,
        allowed_providers={"google_custom_search"},
    )

    evidence = adapter.search_web_pages("SUB-1", "IU official profile", policy)

    assert evidence[0].data["page_image_urls"] == [
        "https://example.test/media/profile-public-url.png",
    ]


def test_google_custom_web_results_preserve_generic_pagemap_snake_case_url_image_fields():
    adapter = GoogleCustomSearchAdapter(
        FakeGoogleCustomSearchClient(
            web_response={
                "items": [
                    {
                        "title": "IU profile page",
                        "link": "https://example.test/profile",
                        "displayLink": "example.test",
                        "snippet": "profile image",
                        "pagemap": {
                            "webpage": [
                                {
                                    "primaryimageofpage": {
                                        "file": {
                                            "public_url": "/media/profile-snake-public-url.png",
                                        },
                                        "canonical_url": "https://example.test/not-an-image-page",
                                    }
                                }
                            ],
                        },
                    }
                ]
            }
        )
    )
    policy = SearchApiPolicy(
        compliance_approved=True,
        allowed_providers={"google_custom_search"},
    )

    evidence = adapter.search_web_pages("SUB-1", "IU official profile", policy)

    assert evidence[0].data["page_image_urls"] == [
        "https://example.test/media/profile-snake-public-url.png",
    ]


def test_google_custom_web_results_preserve_pagemap_srcset_image_candidates():
    adapter = GoogleCustomSearchAdapter(
        FakeGoogleCustomSearchClient(
            web_response={
                "items": [
                    {
                        "title": "IU profile page",
                        "link": "https://example.test/profile",
                        "displayLink": "example.test",
                        "snippet": "profile image",
                        "pagemap": {
                            "webpage": [
                                {
                                    "primaryimageofpage": {
                                        "srcset": (
                                            "/media/profile-small.jpg 320w, "
                                            "/media/profile-large.jpg 1200w"
                                        ),
                                    }
                                }
                            ],
                        },
                    }
                ]
            }
        )
    )
    policy = SearchApiPolicy(
        compliance_approved=True,
        allowed_providers={"google_custom_search"},
    )

    evidence = adapter.search_web_pages("SUB-1", "IU official profile", policy)

    assert evidence[0].data["page_image_urls"] == [
        "https://example.test/media/profile-large.jpg",
        "https://example.test/media/profile-small.jpg",
    ]


def test_google_custom_web_results_preserve_pagemap_srcset_urls_with_query_commas():
    adapter = GoogleCustomSearchAdapter(
        FakeGoogleCustomSearchClient(
            web_response={
                "items": [
                    {
                        "title": "IU profile page",
                        "link": "https://example.test/profile",
                        "displayLink": "example.test",
                        "snippet": "profile image",
                        "pagemap": {
                            "webpage": [
                                {
                                    "primaryimageofpage": {
                                        "srcset": (
                                            "https://cdn.example.test/render?img=small,format=webp 320w, "
                                            "https://cdn.example.test/render?img=large,format=webp 1200w"
                                        ),
                                    }
                                }
                            ],
                        },
                    }
                ]
            }
        )
    )
    policy = SearchApiPolicy(
        compliance_approved=True,
        allowed_providers={"google_custom_search"},
    )

    evidence = adapter.search_web_pages("SUB-1", "IU official profile", policy)

    assert evidence[0].data["page_image_urls"] == [
        "https://cdn.example.test/render?img=large,format=webp",
        "https://cdn.example.test/render?img=small,format=webp",
    ]


def test_google_custom_web_results_preserve_relative_pagemap_srcset_format_hint_urls():
    adapter = GoogleCustomSearchAdapter(
        FakeGoogleCustomSearchClient(
            web_response={
                "items": [
                    {
                        "title": "IU profile page",
                        "link": "https://example.test/profile",
                        "displayLink": "example.test",
                        "snippet": "profile image",
                        "pagemap": {
                            "webpage": [
                                {
                                    "primaryimageofpage": {
                                        "srcset": (
                                            "render?format=webp&w=320 320w, "
                                            "render?format=webp&w=1200 1200w"
                                        ),
                                    }
                                }
                            ],
                        },
                    }
                ]
            }
        )
    )
    policy = SearchApiPolicy(
        compliance_approved=True,
        allowed_providers={"google_custom_search"},
    )

    evidence = adapter.search_web_pages("SUB-1", "IU official profile", policy)

    assert evidence[0].data["page_image_urls"] == [
        "https://example.test/render?format=webp&w=1200",
        "https://example.test/render?format=webp&w=320",
    ]


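# Illustrative sketch only (hypothetical helper, not the adapter's real code):
# one way the srcset expectations in the tests above could be met. It assumes
# every candidate carries a "w" or "x" descriptor, resolves each URL against
# the result page, and orders candidates from largest descriptor to smallest.
import re
from urllib.parse import urljoin

_SKETCH_SRCSET_CANDIDATE = re.compile(r"(\S+)\s+(\d+(?:\.\d+)?)[wx]\s*(?:,|$)")


def sketch_srcset_image_urls(srcset, page_url):
    # The URL group is "any run of non-whitespace", so commas inside a query
    # string stay part of the URL; only a comma after a descriptor separates
    # candidates.
    candidates = [
        (float(size), urljoin(page_url, url))
        for url, size in _SKETCH_SRCSET_CANDIDATE.findall(srcset)
    ]
    return [url for _, url in sorted(candidates, key=lambda item: item[0], reverse=True)]

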
def test_google_custom_web_results_resolve_relative_pagemap_image_urls_against_result_page():
    adapter = GoogleCustomSearchAdapter(
        FakeGoogleCustomSearchClient(
            web_response={
                "items": [
                    {
                        "title": "IU profile page",
                        "link": "https://example.test/profiles/iu",
                        "displayLink": "example.test",
                        "snippet": "profile image",
                        "pagemap": {
                            "cse_image": [{"src": "/media/profile.png"}],
                            "metatags": [{"og:image": "//cdn.example.test/profile-og.png"}],
                        },
                    }
                ]
            }
        )
    )
    policy = SearchApiPolicy(
        compliance_approved=True,
        allowed_providers={"google_custom_search"},
    )

    evidence = adapter.search_web_pages("SUB-1", "IU official profile", policy)

    assert evidence[0].data["page_image_urls"] == [
        "https://example.test/media/profile.png",
        "https://cdn.example.test/profile-og.png",
    ]


def test_google_custom_web_results_unwrap_proxy_pagemap_image_urls():
    adapter = GoogleCustomSearchAdapter(
        FakeGoogleCustomSearchClient(
            web_response={
                "items": [
                    {
                        "title": "IU profile page",
                        "link": "https://example.test/profiles/iu",
                        "displayLink": "example.test",
                        "snippet": "profile image",
                        "pagemap": {
                            "metatags": [
                                {
                                    "og:image": (
                                        "https://images.example.test/imgres?"
                                        "imgurl=https%3A%2F%2Fcdn.example.test%2Fofficial-profile.png"
                                        "&imgrefurl=https%3A%2F%2Fexample.test%2Farticle"
                                    )
                                }
                            ],
                        },
                    }
                ]
            }
        )
    )
    policy = SearchApiPolicy(
        compliance_approved=True,
        allowed_providers={"google_custom_search"},
    )

    evidence = adapter.search_web_pages("SUB-1", "IU official profile", policy)

    assert evidence[0].data["page_image_urls"] == [
        "https://cdn.example.test/official-profile.png",
    ]


def test_google_custom_web_results_unwrap_scheme_less_proxy_pagemap_image_urls():
    adapter = GoogleCustomSearchAdapter(
        FakeGoogleCustomSearchClient(
            web_response={
                "items": [
                    {
                        "title": "IU profile page",
                        "link": "https://example.test/profiles/iu",
                        "displayLink": "example.test",
                        "snippet": "profile image",
                        "pagemap": {
                            "metatags": [
                                {
                                    "og:image": (
                                        "https://images.example.test/imgres?"
                                        "imgurl=cdn.example.test%2Fofficial-profile.png"
                                    )
                                }
                            ],
                        },
                    }
                ]
            }
        )
    )
    policy = SearchApiPolicy(
        compliance_approved=True,
        allowed_providers={"google_custom_search"},
    )

    evidence = adapter.search_web_pages("SUB-1", "IU official profile", policy)

    assert evidence[0].data["page_image_urls"] == [
        "https://cdn.example.test/official-profile.png",
    ]


def test_blocked_policy_makes_no_google_custom_search_call():
    client = FakeGoogleCustomSearchClient()
    adapter = GoogleCustomSearchAdapter(client)
    policy = SearchApiPolicy(disabled=True, compliance_approved=True)

    evidence = adapter.search_images("SUB-1", "IU official profile", policy)

    assert client.image_calls == []
    assert evidence[0].source == EvidenceSource.SEARCH_SKIPPED
    assert evidence[0].data["query_signature"] == "google-custom-image:iu official profile"


def test_google_custom_adapter_reserves_configured_page_calls_before_search():
    client = FakeGoogleCustomSearchClient()
    client.image_pages = 2
    adapter = GoogleCustomSearchAdapter(client)
    policy = SearchApiPolicy(
        compliance_approved=True,
        allowed_providers={"google_custom_search"},
        daily_limit=1,
    )

    evidence = adapter.search_images("SUB-1", "IU official profile", policy)

    assert client.image_calls == []
    assert evidence[0].source == EvidenceSource.SEARCH_SKIPPED
    assert evidence[0].reason == "search API usage limit reached"

554
tests/rights_filter/integrations/test_naver_search.py
Normal file
@@ -0,0 +1,554 @@
from rights_filter.analysis.preprocessing import ImagePayload
from rights_filter.domain.records import AnalysisRun, EvidenceSource, InMemoryRightsFilterRepository
from rights_filter.integrations.naver_search import (
    FakeNaverSearchClient,
    NaverOpenApiSearchClient,
    NaverSearchAdapter,
)
from rights_filter.integrations.search_policy import SearchApiPolicy


def test_approved_text_query_returns_ranked_naver_evidence():
    client = FakeNaverSearchClient(
        response={
            "items": [
                {
                    "title": "IU official album cover",
                    "link": "https://example.test/image.jpg",
                    "thumbnail": "https://example.test/thumb.jpg",
                    "sizeheight": "900",
                    "sizewidth": "900",
                    "page_url": "https://example.test/page",
                }
            ]
        }
    )
    adapter = NaverSearchAdapter(client)
    policy = SearchApiPolicy(compliance_approved=True, allowed_providers={"naver"})

    evidence = adapter.search("submission-1", "IU album cover", policy)

    assert client.calls == ["IU album cover"]
    assert evidence[0].source == EvidenceSource.NAVER_SEARCH
    assert evidence[0].reason == "Naver search result found"
    assert evidence[0].data["query"] == "IU album cover"
    assert evidence[0].data["rank"] == 1
    assert evidence[0].data["image_url"] == "https://example.test/image.jpg"
    assert evidence[0].data["thumbnail_url"] == "https://example.test/thumb.jpg"
    assert evidence[0].data["result_url"] == "https://example.test/page"


def test_naver_image_results_unwrap_proxy_image_urls():
    client = FakeNaverSearchClient(
        response={
            "items": [
                {
                    "title": "IU official profile",
                    "link": (
                        "https://search.naver.test/proxy?"
                        "imgurl=cdn.example.test%2Fiu-profile.png"
                        "&where=image"
                    ),
                    "thumbnail": (
                        "https://search.naver.test/thumb?"
                        "image_url=https%3A%2F%2Fcdn.example.test%2Fiu-thumb.jpg"
                    ),
                    "sizeheight": "900",
                    "sizewidth": "700",
                }
            ]
        }
    )
    adapter = NaverSearchAdapter(client)
    policy = SearchApiPolicy(compliance_approved=True, allowed_providers={"naver"})

    evidence = adapter.search("submission-1", "IU official profile", policy)

    assert evidence[0].data["image_url"] == "https://cdn.example.test/iu-profile.png"
    assert evidence[0].data["thumbnail_url"] == "https://cdn.example.test/iu-thumb.jpg"
    assert evidence[0].data["result_url"] == "https://cdn.example.test/iu-profile.png"


def test_naver_image_results_unwrap_weak_proxy_jfif_image_urls():
    client = FakeNaverSearchClient(
        response={
            "items": [
                {
                    "title": "IU official profile",
                    "link": (
                        "https://search.naver.test/proxy?"
                        "url=https%3A%2F%2Fcdn.example.test%2Fiu-profile.jfif"
                        "&where=image"
                    ),
                    "thumbnail": (
                        "https://search.naver.test/thumb?"
                        "u=https%3A%2F%2Fcdn.example.test%2Fiu-thumb.jfif"
                    ),
                    "sizeheight": "900",
                    "sizewidth": "700",
                }
            ]
        }
    )
    adapter = NaverSearchAdapter(client)
    policy = SearchApiPolicy(compliance_approved=True, allowed_providers={"naver"})

    evidence = adapter.search("submission-1", "IU official profile", policy)

    assert evidence[0].data["image_url"] == "https://cdn.example.test/iu-profile.jfif"
    assert evidence[0].data["thumbnail_url"] == "https://cdn.example.test/iu-thumb.jfif"
    assert evidence[0].data["result_url"] == "https://cdn.example.test/iu-profile.jfif"


def test_naver_image_results_unwrap_weak_proxy_query_format_image_urls():
    client = FakeNaverSearchClient(
        response={
            "items": [
                {
                    "title": "IU official profile",
                    "link": (
                        "https://search.naver.test/proxy?"
                        "url=https%3A%2F%2Fcdn.example.test%2Frender%3Fformat%3Dwebp"
                        "&where=image"
                    ),
                    "thumbnail": (
                        "https://search.naver.test/thumb?"
                        "u=https%3A%2F%2Fcdn.example.test%2Fthumb%3Ffm%3Djpg"
                    ),
                    "sizeheight": "900",
                    "sizewidth": "700",
                }
            ]
        }
    )
    adapter = NaverSearchAdapter(client)
    policy = SearchApiPolicy(compliance_approved=True, allowed_providers={"naver"})

    evidence = adapter.search("submission-1", "IU official profile", policy)

    assert evidence[0].data["image_url"] == "https://cdn.example.test/render?format=webp"
    assert evidence[0].data["thumbnail_url"] == "https://cdn.example.test/thumb?fm=jpg"
    assert evidence[0].data["result_url"] == "https://cdn.example.test/render?format=webp"


def test_naver_image_results_unwrap_scheme_less_proxy_query_format_image_urls():
    client = FakeNaverSearchClient(
        response={
            "items": [
                {
                    "title": "IU official profile",
                    "link": (
                        "https://search.naver.test/proxy?"
                        "url=cdn.example.test%2Frender%3Fformat%3Dwebp"
                        "&where=image"
                    ),
                    "thumbnail": (
                        "https://search.naver.test/thumb?"
                        "u=cdn.example.test%2Fthumb%3Ffm%3Djpg"
                    ),
                    "sizeheight": "900",
                    "sizewidth": "700",
                }
            ]
        }
    )
    adapter = NaverSearchAdapter(client)
    policy = SearchApiPolicy(compliance_approved=True, allowed_providers={"naver"})

    evidence = adapter.search("submission-1", "IU official profile", policy)

    assert evidence[0].data["image_url"] == "https://cdn.example.test/render?format=webp"
    assert evidence[0].data["thumbnail_url"] == "https://cdn.example.test/thumb?fm=jpg"
    assert evidence[0].data["result_url"] == "https://cdn.example.test/render?format=webp"


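# Illustrative sketch only (hypothetical helper, not the adapter's real code):
# one way to unwrap the proxy URLs used in the fixtures above. The parameter
# names and their priority order are assumptions read off those fixtures, and
# scheme-less targets are assumed to be HTTPS.
from urllib.parse import parse_qs, urlsplit

_SKETCH_TARGET_PARAMS = ("imgurl", "image_url", "url", "u")


def sketch_unwrap_proxy_url(value):
    query = parse_qs(urlsplit(value).query)
    for name in _SKETCH_TARGET_PARAMS:
        targets = query.get(name)
        if not targets:
            continue
        target = targets[0]  # parse_qs has already percent-decoded the value
        if target.startswith("//"):
            return "https:" + target
        if "://" not in target:
            return "https://" + target
        return target
    return value  # no wrapped target found, so keep the URL unchanged

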
def test_naver_image_results_unwrap_redirect_page_urls():
    client = FakeNaverSearchClient(
        response={
            "items": [
                {
                    "title": "IU official profile",
                    "link": "https://cdn.example.test/iu-profile.png",
                    "thumbnail": "https://cdn.example.test/iu-thumb.jpg",
                    "page_url": (
                        "https://search.naver.test/rd?"
                        "url=https%3A%2F%2Fexample.test%2Fprofile-page"
                        "&where=image"
                    ),
                    "sizeheight": "900",
                    "sizewidth": "700",
                }
            ]
        }
    )
    adapter = NaverSearchAdapter(client)
    policy = SearchApiPolicy(compliance_approved=True, allowed_providers={"naver"})

    evidence = adapter.search("submission-1", "IU official profile", policy)

    assert evidence[0].data["result_url"] == "https://example.test/profile-page"


def test_empty_naver_results_are_visible_evidence():
    adapter = NaverSearchAdapter(FakeNaverSearchClient(response={"items": []}))
    policy = SearchApiPolicy(compliance_approved=True, allowed_providers={"naver"})

    evidence = adapter.search("submission-1", "unknown query", policy)

    assert evidence[0].source == EvidenceSource.NAVER_SEARCH
    assert evidence[0].reason == "Naver search returned no results"
    assert evidence[0].confidence == 0.0


def test_blog_search_results_are_mapped_as_page_evidence():
    client = FakeNaverSearchClient(
        response={"items": []},
        blog_response={
            "items": [
                {
                    "title": "IU official profile blog",
                    "description": "profile image",
                    "link": "https://example.test/blog-post",
                    "bloggername": "official blog",
                    "bloggerlink": "https://blog.example.test",
                    "postdate": "20260527",
                }
            ]
        },
    )
    adapter = NaverSearchAdapter(client)
    policy = SearchApiPolicy(compliance_approved=True, allowed_providers={"naver"})

    evidence = adapter.search_pages("submission-1", "IU official profile", policy)

    assert client.blog_calls == ["IU official profile"]
    assert evidence[0].source == EvidenceSource.NAVER_SEARCH
    assert evidence[0].reason == "Naver blog search result found"
    assert evidence[0].data["query_signature"] == "naver-blog:iu official profile"
    assert evidence[0].data["provider"] == "naver"
    assert evidence[0].data["search_type"] == "blog"
    assert evidence[0].data["result_url"] == "https://example.test/blog-post"
    assert evidence[0].data["match"] == "page"


def test_blog_search_results_preserve_provider_image_hints_as_page_candidates():
    client = FakeNaverSearchClient(
        response={"items": []},
        blog_response={
            "items": [
                {
                    "title": "IU official profile blog",
                    "description": "profile image",
                    "link": "https://example.test/blog-post",
                    "thumbnail": "https://postfiles.pstatic.net/profile-thumb.jpg?type=w966",
                    "bloggername": "official blog",
                    "bloggerlink": "https://blog.example.test",
                    "postdate": "20260527",
                }
            ]
        },
    )
    adapter = NaverSearchAdapter(client)
    policy = SearchApiPolicy(compliance_approved=True, allowed_providers={"naver"})

    evidence = adapter.search_pages("submission-1", "IU official profile", policy)

    assert evidence[0].data["page_image_urls"] == [
        "https://postfiles.pstatic.net/profile-thumb.jpg?type=w966"
    ]


def test_blog_search_results_unwrap_redirect_result_urls():
    client = FakeNaverSearchClient(
        response={"items": []},
        blog_response={
            "items": [
                {
                    "title": "IU official profile blog",
                    "description": "profile image",
                    "link": (
                        "https://search.naver.test/rd?"
                        "url=https%3A%2F%2Fexample.test%2Fblog-post"
                        "&where=blog"
                    ),
                    "bloggername": "official blog",
                    "bloggerlink": "https://blog.example.test",
                    "postdate": "20260527",
                }
            ]
        },
    )
    adapter = NaverSearchAdapter(client)
    policy = SearchApiPolicy(compliance_approved=True, allowed_providers={"naver"})

    evidence = adapter.search_pages("submission-1", "IU official profile", policy)

    assert evidence[0].data["result_url"] == "https://example.test/blog-post"


def test_web_search_results_are_mapped_as_page_evidence():
    client = FakeNaverSearchClient(
        response={"items": []},
        web_response={
            "items": [
                {
                    "title": "IU official profile page",
                    "description": "profile image",
                    "link": "https://example.test/web-profile",
                }
            ]
        },
    )
    adapter = NaverSearchAdapter(client)
    policy = SearchApiPolicy(compliance_approved=True, allowed_providers={"naver"})

    evidence = adapter.search_web_pages("submission-1", "IU official profile", policy)

    assert client.web_calls == ["IU official profile"]
    assert evidence[0].source == EvidenceSource.NAVER_SEARCH
    assert evidence[0].reason == "Naver web search result found"
    assert evidence[0].data["query_signature"] == "naver-web:iu official profile"
    assert evidence[0].data["provider"] == "naver"
    assert evidence[0].data["search_type"] == "web"
    assert evidence[0].data["result_url"] == "https://example.test/web-profile"
    assert evidence[0].data["match"] == "page"


def test_web_search_results_preserve_provider_image_hints_as_page_candidates():
    client = FakeNaverSearchClient(
        response={"items": []},
        web_response={
            "items": [
                {
                    "title": "IU official profile page",
                    "description": "profile image",
                    "link": "https://example.test/web-profile",
                    "imageUrl": "//cdn.example.test/profile.webp",
                    "thumbnailUrl": "https://cdn.example.test/profile-thumb?format=webp",
                }
            ]
        },
    )
    adapter = NaverSearchAdapter(client)
    policy = SearchApiPolicy(compliance_approved=True, allowed_providers={"naver"})

    evidence = adapter.search_web_pages("submission-1", "IU official profile", policy)

    assert evidence[0].data["page_image_urls"] == [
        "https://cdn.example.test/profile.webp",
        "https://cdn.example.test/profile-thumb?format=webp",
    ]


def test_web_search_results_unwrap_redirect_result_urls():
    client = FakeNaverSearchClient(
        response={"items": []},
        web_response={
            "items": [
                {
                    "title": "IU official profile page",
                    "description": "profile image",
                    "link": (
                        "https://search.naver.test/rd?"
                        "u=https%3A%2F%2Fexample.test%2Fweb-profile"
                        "&where=web"
                    ),
                }
            ]
        },
    )
    adapter = NaverSearchAdapter(client)
    policy = SearchApiPolicy(compliance_approved=True, allowed_providers={"naver"})

    evidence = adapter.search_web_pages("submission-1", "IU official profile", policy)

    assert evidence[0].data["result_url"] == "https://example.test/web-profile"


def test_blocked_policy_makes_no_naver_client_call():
    client = FakeNaverSearchClient(response={"items": []})
    adapter = NaverSearchAdapter(client)
    policy = SearchApiPolicy(disabled=True, compliance_approved=True)

    evidence = adapter.search("submission-1", "IU album cover", policy)

    assert client.calls == []
    assert evidence[0].source == EvidenceSource.SEARCH_SKIPPED
    assert evidence[0].reason == "search API disabled"


def test_naver_adapter_reserves_configured_page_calls_before_search():
    client = FakeNaverSearchClient(response={"items": []})
    client.image_pages = 2
    adapter = NaverSearchAdapter(client)
    policy = SearchApiPolicy(
        compliance_approved=True,
        allowed_providers={"naver"},
        daily_limit=1,
    )

    evidence = adapter.search("submission-1", "IU album cover", policy)

    assert client.calls == []
    assert evidence[0].source == EvidenceSource.SEARCH_SKIPPED
    assert evidence[0].reason == "search API usage limit reached"


def test_blocked_policy_makes_no_naver_blog_client_call():
    client = FakeNaverSearchClient(blog_response={"items": []})
    adapter = NaverSearchAdapter(client)
    policy = SearchApiPolicy(disabled=True, compliance_approved=True)

    evidence = adapter.search_pages("submission-1", "IU official profile", policy)

    assert client.blog_calls == []
    assert evidence[0].source == EvidenceSource.SEARCH_SKIPPED
    assert evidence[0].data["query_signature"] == "naver-blog:iu official profile"


def test_blocked_policy_makes_no_naver_web_client_call():
    client = FakeNaverSearchClient(web_response={"items": []})
    adapter = NaverSearchAdapter(client)
    policy = SearchApiPolicy(disabled=True, compliance_approved=True)

    evidence = adapter.search_web_pages("submission-1", "IU official profile", policy)

    assert client.web_calls == []
    assert evidence[0].source == EvidenceSource.SEARCH_SKIPPED
    assert evidence[0].data["query_signature"] == "naver-web:iu official profile"


def test_naver_open_api_client_calls_blog_endpoint_with_same_credentials():
    class FakeTransport:
        def __init__(self):
            self.calls = []

        def request_json(self, method, url, headers=None, payload=None, timeout=10):
            self.calls.append({"method": method, "url": url, "headers": headers or {}})
            return {"items": []}

    transport = FakeTransport()
    client = NaverOpenApiSearchClient(
        client_id="naver-id",
        client_secret="naver-secret",
        transport=transport,
        blog_display=3,
    )

    client.search_blog("IU official profile")

    call = transport.calls[0]
    assert call["method"] == "GET"
    assert call["url"].startswith("https://openapi.naver.com/v1/search/blog?")
    assert "query=IU+official+profile" in call["url"]
    assert "display=3" in call["url"]
    assert call["headers"]["X-Naver-Client-Id"] == "naver-id"
    assert call["headers"]["X-Naver-Client-Secret"] == "naver-secret"


def test_naver_open_api_client_merges_configured_image_pages_with_incremented_start():
    class PagingTransport:
        def __init__(self):
            self.calls = []

        def request_json(self, method, url, headers=None, payload=None, timeout=10):
            self.calls.append({"method": method, "url": url, "headers": headers or {}})
            if "start=3" in url:
                return {
                    "items": [
                        {
                            "title": "page 2",
                            "link": "https://example.test/page-2.jpg",
                        }
                    ]
                }
            return {
                "items": [
                    {
                        "title": "page 1",
                        "link": "https://example.test/page-1.jpg",
                    }
                ]
            }

    transport = PagingTransport()
    client = NaverOpenApiSearchClient(
        client_id="naver-id",
        client_secret="naver-secret",
        transport=transport,
        display=2,
        image_pages=2,
    )

    response = client.search_image("IU official profile")

    assert [item["link"] for item in response["items"]] == [
        "https://example.test/page-1.jpg",
        "https://example.test/page-2.jpg",
    ]
    assert "start=1" in transport.calls[0]["url"]
    assert "start=3" in transport.calls[1]["url"]


def test_naver_open_api_client_calls_web_endpoint_with_same_credentials():
|
||||||
|
class FakeTransport:
|
||||||
|
def __init__(self):
|
||||||
|
self.calls = []
|
||||||
|
|
||||||
|
def request_json(self, method, url, headers=None, payload=None, timeout=10):
|
||||||
|
self.calls.append({"method": method, "url": url, "headers": headers or {}})
|
||||||
|
return {"items": []}
|
||||||
|
|
||||||
|
transport = FakeTransport()
|
||||||
|
client = NaverOpenApiSearchClient(
|
||||||
|
client_id="naver-id",
|
||||||
|
client_secret="naver-secret",
|
||||||
|
transport=transport,
|
||||||
|
web_display=3,
|
||||||
|
)
|
||||||
|
|
||||||
|
client.search_web("IU official profile")
|
||||||
|
|
||||||
|
call = transport.calls[0]
|
||||||
|
assert call["method"] == "GET"
|
||||||
|
assert call["url"].startswith("https://openapi.naver.com/v1/search/webkr?")
|
||||||
|
assert "query=IU+official+profile" in call["url"]
|
||||||
|
assert "display=3" in call["url"]
|
||||||
|
assert call["headers"]["X-Naver-Client-Id"] == "naver-id"
|
||||||
|
assert call["headers"]["X-Naver-Client-Secret"] == "naver-secret"
|
||||||
|
|
||||||
|
|
||||||
|
def test_naver_adapter_rejects_image_payload_input():
|
||||||
|
adapter = NaverSearchAdapter(FakeNaverSearchClient())
|
||||||
|
policy = SearchApiPolicy(compliance_approved=True, allowed_providers={"naver"})
|
||||||
|
|
||||||
|
try:
|
||||||
|
adapter.search(
|
||||||
|
"submission-1",
|
||||||
|
ImagePayload(content=b"image", width=10, height=10),
|
||||||
|
policy,
|
||||||
|
)
|
||||||
|
except ValueError as error:
|
||||||
|
assert "text query" in str(error)
|
||||||
|
else:
|
||||||
|
raise AssertionError("expected text-only query rejection")
|
||||||
|
|
||||||
|
|
||||||
|
def test_naver_evidence_can_be_stored_with_existing_analysis_run():
|
||||||
|
repo = InMemoryRightsFilterRepository()
|
||||||
|
run = AnalysisRun.for_submission("submission-1", "v1")
|
||||||
|
evidence = NaverSearchAdapter(
|
||||||
|
FakeNaverSearchClient(response={"items": [{"title": "IU", "link": "url"}]})
|
||||||
|
).search(
|
||||||
|
"submission-1",
|
||||||
|
"IU",
|
||||||
|
SearchApiPolicy(compliance_approved=True, allowed_providers={"naver"}),
|
||||||
|
)
|
||||||
|
|
||||||
|
for item in evidence:
|
||||||
|
run.add_evidence(item)
|
||||||
|
repo.save_analysis_run(run)
|
||||||
|
|
||||||
|
assert repo.analysis_runs_for_submission("submission-1")[0].evidence[0].source == (
|
||||||
|
EvidenceSource.NAVER_SEARCH
|
||||||
|
)
|
||||||
44
tests/rights_filter/integrations/test_search_policy.py
Normal file
@@ -0,0 +1,44 @@
from rights_filter.integrations.search_policy import SearchApiPolicy


def test_search_policy_blocks_disabled_or_unapproved_provider():
    disabled = SearchApiPolicy(disabled=True, compliance_approved=True)
    unapproved = SearchApiPolicy(disabled=False, compliance_approved=False)

    assert disabled.can_call("naver") == (False, "search API disabled")
    assert unapproved.can_call("naver") == (
        False,
        "search API compliance not approved",
    )


def test_search_policy_enforces_allowed_providers_and_daily_limit():
    policy = SearchApiPolicy(
        compliance_approved=True,
        allowed_providers={"naver"},
        daily_limit=1,
    )

    assert policy.can_call("naver") == (True, None)
    policy.record_call()

    assert policy.can_call("google") == (False, "search provider not allowed")
    assert policy.can_call("naver") == (False, "search API usage limit reached")


def test_search_policy_counts_requested_page_calls_against_daily_limit():
    policy = SearchApiPolicy(
        compliance_approved=True,
        allowed_providers={"naver"},
        daily_limit=3,
    )

    assert policy.can_call("naver", requested_calls=2) == (True, None)
    policy.record_call(2)

    assert policy.calls_made == 2
    assert policy.can_call("naver", requested_calls=2) == (
        False,
        "search API usage limit reached",
    )
    assert policy.can_call("naver", requested_calls=1) == (True, None)
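Reviewer note: the three policy tests above pin an ordering of checks (disabled, then compliance, then provider allow-list, then daily limit) and count requested page calls against the limit. The sketch below is one way a policy object could satisfy those assertions. It is not the code in rights_filter/integrations/search_policy.py, which is not shown in this diff; the class name SketchSearchPolicy and the choice of None for "no allow-list" and "no limit" are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass
class SketchSearchPolicy:
    """Hypothetical stand-in for SearchApiPolicy (illustration only)."""

    disabled: bool = False
    compliance_approved: bool = False
    # Assumption: None means no provider allow-list is configured.
    allowed_providers: Optional[Set[str]] = None
    # Assumption: None means no daily limit is configured.
    daily_limit: Optional[int] = None
    calls_made: int = 0

    def can_call(self, provider: str, requested_calls: int = 1) -> Tuple[bool, Optional[str]]:
        # Checks run in the order the tests imply: a provider that is not
        # allowed is reported as such even when the limit is also exhausted.
        if self.disabled:
            return (False, "search API disabled")
        if not self.compliance_approved:
            return (False, "search API compliance not approved")
        if self.allowed_providers is not None and provider not in self.allowed_providers:
            return (False, "search provider not allowed")
        if self.daily_limit is not None and self.calls_made + requested_calls > self.daily_limit:
            return (False, "search API usage limit reached")
        return (True, None)

    def record_call(self, count: int = 1) -> None:
        # Callers record usage after the request, one unit per page fetched.
        self.calls_made += count
```

Keeping can_call free of side effects and recording usage separately fits the commit message's "consume search quota only after a successful request".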
72
tests/rights_filter/jobs/test_batch_analyzer.py
Normal file
@@ -0,0 +1,72 @@
from rights_filter.analysis.face_person_detection import HeuristicFacePersonDetector
from rights_filter.analysis.fingerprints import FingerprintService
from rights_filter.analysis.internal_analyzer import InternalAnalyzer
from rights_filter.analysis.preprocessing import ImagePayload
from rights_filter.analysis.risk_scoring import RiskScorer
from rights_filter.domain.records import InMemoryRightsFilterRepository
from rights_filter.integrations.cloud_vision_web_detection import (
    CloudVisionWebDetectionAdapter,
    FakeWebDetectionClient,
)
from rights_filter.integrations.external_policy import ExternalApiPolicy
from rights_filter.jobs.batch_analyzer import BatchAnalyzer, SubmissionImage


def build_batch(policy=None):
    repo = InMemoryRightsFilterRepository()
    internal = InternalAnalyzer(repo, FingerprintService(), HeuristicFacePersonDetector())
    external = CloudVisionWebDetectionAdapter(FakeWebDetectionClient())
    return BatchAnalyzer(
        repository=repo,
        internal_analyzer=internal,
        external_adapter=external,
        external_policy=policy or ExternalApiPolicy(disabled=True),
        scorer=RiskScorer(),
    )


def test_batch_processes_all_submissions_internal_only_when_external_disabled():
    batch = build_batch()
    submissions = [
        SubmissionImage("s1", ImagePayload(b"image one FACE", 100, 100, {})),
        SubmissionImage("s2", ImagePayload(b"image two", 100, 100, {})),
    ]

    summary = batch.run(submissions)

    assert summary.processed == 2
    assert summary.external_skipped == 2
    assert batch.repository.latest_score_for_submission("s1") is not None
    assert batch.repository.latest_score_for_submission("s2") is not None


def test_batch_is_idempotent_for_same_submission_and_version():
    batch = build_batch()
    submissions = [SubmissionImage("s1", ImagePayload(b"image one", 100, 100, {}))]

    first = batch.run(submissions, analysis_version="v1")
    second = batch.run(submissions, analysis_version="v1")

    assert first.processed == 1
    assert second.skipped_existing == 1
    assert len(batch.repository.analysis_runs_for_submission("s1")) == 1


def test_failed_submission_is_counted_once_and_remains_retryable():
    batch = build_batch()
    # Non-empty bytes (so internal analysis runs) but a non-positive dimension,
    # which makes build_external_derivative raise PreprocessingError.
    submissions = [SubmissionImage("s1", ImagePayload(b"not-an-image", 0, 100, {}))]

    first = batch.run(submissions)

    # Counted as failed only (not also processed), and no partial run persisted.
    assert first.failed == 1
    assert first.processed == 0
    assert batch.repository.latest_score_for_submission("s1") is None
    assert not batch.repository.has_analysis_run("s1", "v1")

    # Because nothing was saved, a later batch re-attempts rather than skipping.
    second = batch.run(submissions)
    assert second.failed == 1
    assert second.skipped_existing == 0
22
tests/rights_filter/jobs/test_review_enrichment_job.py
Normal file
@@ -0,0 +1,22 @@
from rights_filter.analysis.evidence_enrichment import EnrichmentSummary
from rights_filter.jobs.review_enrichment_job import ReviewEnrichmentJob


class RecordingEnricher:
    def __init__(self) -> None:
        self.calls: list[str] = []

    def enrich_latest(self, repository, submission_id: str) -> EnrichmentSummary:
        self.calls.append(submission_id)
        return EnrichmentSummary(generated_queries=1, executed_searches=1)


def test_review_enrichment_job_runs_for_each_submission():
    enricher = RecordingEnricher()
    job = ReviewEnrichmentJob(enricher)

    summary = job.run(repository=object(), submission_ids=["s1", "s2"])

    assert enricher.calls == ["s1", "s2"]
    assert summary.processed == 2
    assert summary.executed_searches == 2
57
tests/rights_filter/server/test_env_file.py
Normal file
@@ -0,0 +1,57 @@
from pathlib import Path

from rights_filter.server.env_file import load_env_file


def test_load_env_file_reads_local_key_values_without_overriding_existing(tmp_path: Path):
    env_path = tmp_path / ".env"
    env_path.write_text(
        """
# Copyrighter API keys
NAVER_CLIENT_ID=from-file
NAVER_CLIENT_SECRET="secret value"
export OLLAMA_MODEL=qwen2.5:0.5b-instruct
EMPTY_VALUE=
""",
        encoding="utf-8",
    )
    environ = {"NAVER_CLIENT_ID": "already-set"}

    load_env_file(env_path, environ)

    assert environ["NAVER_CLIENT_ID"] == "already-set"
    assert environ["NAVER_CLIENT_SECRET"] == "secret value"
    assert environ["OLLAMA_MODEL"] == "qwen2.5:0.5b-instruct"
    assert environ["EMPTY_VALUE"] == ""


def test_load_env_file_ignores_missing_file(tmp_path: Path):
    environ = {}

    load_env_file(tmp_path / ".env", environ)

    assert environ == {}


def test_env_example_documents_all_search_provider_keys():
    example = Path(".env.example").read_text(encoding="utf-8")

    for key in [
        "NAVER_CLIENT_ID",
        "NAVER_CLIENT_SECRET",
        "NAVER_SEARCH_PAGES",
        "NAVER_BLOG_SEARCH_PAGES",
        "NAVER_WEB_SEARCH_PAGES",
        "GOOGLE_CLOUD_VISION_API_KEY",
        "GOOGLE_CUSTOM_SEARCH_API_KEY",
        "GOOGLE_CUSTOM_SEARCH_CX",
        "GOOGLE_CUSTOM_SEARCH_IMAGE_RESULTS",
        "GOOGLE_CUSTOM_SEARCH_WEB_RESULTS",
        "GOOGLE_CUSTOM_SEARCH_IMAGE_PAGES",
        "GOOGLE_CUSTOM_SEARCH_WEB_PAGES",
        "COPYRIGHTER_AUTO_GOOGLE_CUSTOM_QUERY_LIMIT",
        "COPYRIGHTER_SEARCH_RESULT_COMPARE_LIMIT",
        "COPYRIGHTER_SEARCH_RESULT_PAGE_IMAGE_LIMIT",
        "COPYRIGHTER_SEARCH_RESULT_SIMILARITY_THRESHOLD",
    ]:
        assert f"{key}=" in example
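Reviewer note: the first two tests above describe the loader's contract: comments and blank lines are skipped, an optional leading "export " is accepted, surrounding quotes are removed, empty values are kept, values already present in the mapping win, and a missing file is ignored. A minimal sketch of a loader with that behaviour follows. The function name load_env_sketch is hypothetical, and the real rights_filter/server/env_file.py is not part of this diff, so its details may differ.

```python
from pathlib import Path
from typing import MutableMapping


def load_env_sketch(path: Path, environ: MutableMapping[str, str]) -> None:
    """Hypothetical loader mirroring the behaviour the tests describe."""
    if not path.exists():
        return  # a missing file is silently ignored
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        key, separator, value = line.partition("=")
        key = key.strip()
        if not separator or not key:
            continue  # not a KEY=VALUE line
        value = value.strip()
        # Remove one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        # setdefault keeps any value that is already set in the mapping.
        environ.setdefault(key, value)
```

Passing the mapping in explicitly, as the tests do, lets the loader be exercised without touching os.environ.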
1132
tests/rights_filter/server/test_http_app.py
Normal file
File diff suppressed because it is too large
127
tests/rights_filter/server/test_image_store.py
Normal file
@@ -0,0 +1,127 @@
from pathlib import Path

from rights_filter.server.image_store import LocalSubmissionImageStore


def test_local_image_store_reads_manifest_and_payload(tmp_path: Path):
    image_dir = tmp_path / "images"
    image_dir.mkdir()
    image_file = image_dir / "sample.svg"
    image_file.write_text("<svg><!-- FACE PERSON --></svg>", encoding="utf-8")
    (tmp_path / "submissions.json").write_text(
        """
        [
          {
            "id": "SUB-T1",
            "title": "Local sample",
            "file": "images/sample.svg",
            "width": 640,
            "height": 480,
            "submitted_at": "2026-05-26 10:00"
          }
        ]
        """,
        encoding="utf-8",
    )

    store = LocalSubmissionImageStore(tmp_path)

    records = store.submission_records()
    payload = store.image_payload("SUB-T1")

    assert records[0]["id"] == "SUB-T1"
    assert records[0]["asset"] == "/media/images/sample.svg"
    assert payload.content == image_file.read_bytes()
    assert payload.width == 640
    assert payload.height == 480


def test_local_image_store_includes_unlisted_image_files_when_manifest_exists(tmp_path: Path):
    image_dir = tmp_path / "images"
    image_dir.mkdir()
    (image_dir / "listed.svg").write_text("<svg></svg>", encoding="utf-8")
    (image_dir / "copied.png").write_bytes(b"png-bytes")
    (tmp_path / "submissions.json").write_text(
        """
        [
          {
            "id": "SUB-LISTED",
            "title": "Listed sample",
            "file": "images/listed.svg",
            "width": 640,
            "height": 480
          }
        ]
        """,
        encoding="utf-8",
    )

    records = LocalSubmissionImageStore(tmp_path).submission_records()

    assert [record["id"] for record in records] == ["SUB-LISTED", "copied"]
    assert records[1]["title"] == "copied"
    assert records[1]["asset"] == "/media/images/copied.png"


def test_local_image_store_scans_avif_submissions(tmp_path: Path):
    image_dir = tmp_path / "images"
    image_dir.mkdir()
    (image_dir / "portrait.avif").write_bytes(b"avif-bytes")

    records = LocalSubmissionImageStore(tmp_path).submission_records()
    payload = LocalSubmissionImageStore(tmp_path).image_payload("portrait")

    assert records[0]["id"] == "portrait"
    assert records[0]["asset"] == "/media/images/portrait.avif"
    assert records[0]["format"] == "AVIF"
    assert payload.metadata["format"] == "AVIF"


def test_local_image_store_skips_manifest_records_with_missing_files(tmp_path: Path):
    image_dir = tmp_path / "images"
    image_dir.mkdir()
    (image_dir / "copied.jpg").write_bytes(b"jpg-bytes")
    (tmp_path / "submissions.json").write_text(
        """
        [
          {
            "id": "SUB-MISSING",
            "title": "Missing sample",
            "file": "images/missing.svg",
            "width": 640,
            "height": 480
          }
        ]
        """,
        encoding="utf-8",
    )

    records = LocalSubmissionImageStore(tmp_path).submission_records()

    assert [record["id"] for record in records] == ["copied"]


def test_local_image_store_rejects_manifest_paths_outside_root(tmp_path: Path):
    (tmp_path / "submissions.json").write_text(
        """
        [
          {
            "id": "SUB-BAD",
            "title": "Bad sample",
            "file": "../secret.svg",
            "width": 100,
            "height": 100
          }
        ]
        """,
        encoding="utf-8",
    )

    store = LocalSubmissionImageStore(tmp_path)

    try:
        store.submission_records()
    except ValueError as exc:
        assert "outside image store" in str(exc)
    else:
        raise AssertionError("expected unsafe manifest path to be rejected")
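Reviewer note: the last test above expects a manifest entry such as "../secret.svg" to be rejected with a ValueError mentioning "outside image store". One common way to get that behaviour with pathlib is to resolve the candidate path and check that it is still located under the resolved root. The helper below is a sketch under that assumption; resolve_inside_root is a hypothetical name, and the actual check in rights_filter/server/image_store.py is not shown in this diff.

```python
from pathlib import Path


def resolve_inside_root(root: Path, relative: str) -> Path:
    """Hypothetical helper: resolve `relative` under `root` or raise ValueError."""
    root_resolved = root.resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    candidate = (root_resolved / relative).resolve()
    try:
        # relative_to raises ValueError when candidate is not under the root.
        candidate.relative_to(root_resolved)
    except ValueError:
        raise ValueError(f"manifest path is outside image store: {relative}") from None
    return candidate
```

Resolving before comparing matters: a plain string prefix check on the unresolved path would accept "images/../../secret.svg".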
Some files were not shown because too many files have changed in this diff