repo-evals · 2026-05-04 · master@HEAD (Docker image `ghcr.io/usagi-org/ai-goofish:latest`)

ai-goofish-monitor

Usagi-org/ai-goofish-monitor

Score: 🛠 69 / 100 (bands: 🛑 0–29 · ⚠️ 30–49 · 🛠 50–79 · 🏭 80–100)
  • 7 of 8 claims passed (claim-007 with concerns); no critical failures
  • MIT / Apache / etc. license; installable per `deployment.install_methods`
  • `release_pipeline_score=3`, with a push inside the 90-day window
  • `multilingual_readme=true`
  • Static-only eval; live end-to-end run pending


Pipeline: natural-language criteria (entered via the Vue dashboard) → APScheduler (multi-task runs) → Playwright crawler (cookie login → search) → multi-modal LLM (photos + text vs. criteria) → notification dispatch (ntfy / Bark / WeCom / TG / ...) → push to your phone (only true matches)

| Install method | Platform | Difficulty |
|---|---|---|
| `docker compose up -d` | any (Docker) | easy |
| Pre-built ghcr image | any (Docker) | easy |

🚨 Default `WEB_PASSWORD=admin123` with the service listening on `0.0.0.0:8000`: change both before exposing it publicly.

| Requirement | Purpose | Notes |
|---|---|---|
| Xianyu / 闲鱼 (real account) | Cookie-based session for crawling listings | Export the login cookie with the companion Chrome extension |
| OpenAI-compatible LLM (default: modelscope.cn) | Multi-modal product analysis | Defaults to modelscope.cn (CN-friendly); set `OPENAI_BASE_URL` to use your own gateway |
| ntfy / Bark / 企业微信 (WeCom) / Telegram / Gotify / Webhook | Notification delivery | Pick at least one channel; ntfy.sh is free |

Claims: 8 total · 6 passed · 1 partial · 1 untested

Score components: +40 · +14 · +12 · +6 · 0 · -3 (sum: 69)

7 / 8 claims passed, counting claim-007's partial pass (claim-001 through claim-007); claim-008 untested

Atom-level dimensions: `input_contract`, `output_contract`, `determinism`, `idempotence`, `no_skill_callouts`, `failure_mode_clarity`

Molecule-level dimensions: `workflow_correctness`, `declared_call_graph`, `stop_conditions`, `handoff_points`, `atom_evidence`, `error_propagation`, `partial_failure_handling`

  • ceiling 1 · core user-facing layer untested → capped at `usable`
  • ceiling 2 · `evidence_completeness='partial'` (not portable) → capped at `usable`

  • only 1 / 2 critical claims covered (claim-008 untested)

archetype: `api-service` · core layer tested: False · evidence: partial · recommended: `usable` · final: `usable`

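| Claim | Description | Priority | Area | Status |
|---|---|---|---|---|
| claim-001 | A one-line Docker Compose command really brings the service up | critical | install | ● passed |
| claim-002 | Backend is FastAPI with DDD layering (not stitched-together scripts) | high | structure | ● passed |
| claim-003 | The 6 notification channels match the README list | high | notifications | ● passed |
| claim-004 | The multi-stage Dockerfile really is multi-stage | high | image-quality | ● passed |
| claim-005 | The browser extension (XianYu login-state export) actually exists | high | companion-tooling | ● passed |
| claim-006 | `prompts/` holds real prompt templates, not empty shells | high | ai-prompts | ● passed |
| claim-007 | The README should clearly warn about the default password admin/admin123 | high | security | ◐ partial |
| claim-008 | End-to-end happy path: login state + task + notification runs through | critical | end-to-end | ○ untested |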

Run log: `run-static-checks` · 2026-05-04 · tokens in ? / out ?

# ai-goofish-monitor — final verdict (2026-05-04)

## Repo

- **Name:** Usagi-org/ai-goofish-monitor
- **Branch evaluated:** master@HEAD (Docker image
  `ghcr.io/usagi-org/ai-goofish:latest`)
- **Archetype:** api-service (reclassified from default `hybrid-skill`)
- **Layer:** **molecule** — predefined LLM-criteria → scrape →
  LLM-analyze → notify pipeline
- **Eval framework:** repo-evals layer model v1 (f9ed1e9)

## Bucket

**`usable`**: clean static layer, popular and well-engineered. Two soft
concerns (admin-password DX and the `8000:8000` port binding) are
foot-shotgun risks worth disclosing. The compound molecule rule caps the
bucket at `usable` until a live monitoring run is logged.

## What was evaluated

### Atom + molecule level (static, this run)

| Claim | Status | Notes |
|---|---|---|
| 001 docker-compose | passed | Single `app` service, 9 mounts, port 8000:8000 |
| 002 clean DDD layering | passed | `src/{api, core, domain, services, infrastructure}` |
| 003 6 notification channels | passed | All 6 README-claimed channels have section headers in `env.example` |
| 004 multi-stage Dockerfile | passed | 3 `FROM` stages: node frontend-builder + Python venv builder + lean final |
| 005 Chrome extension MV3 | passed | `Xianyu Login State Extractor` v1.1, scoped to `*.goofish.com` only |
| 006 prompt templates | passed | `base_prompt` 47 lines + `macbook_criteria` 46 lines |
| 007 admin password DX | passed_with_concerns | Defaults `admin/admin123` noted but not flagged as risky; combined with default 8000:8000 binding = foot-shotgun on public VPS |
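
Claim 004 comes down to counting `FROM` stages. A static check in that spirit could look like the minimal sketch below; it is not the check this eval actually ran. The regex is a simplification (it ignores line continuations and `ARG`-substituted image names), and the sample Dockerfile, including its image tags and stage names, is illustrative rather than the repo's real file.

```python
import re
from typing import List, Optional, Tuple

# One FROM instruction: optional --flags (e.g. --platform=...), the image, and an optional "AS name".
FROM_RE = re.compile(r"^\s*FROM\s+(?:--\S+\s+)*(\S+)(?:\s+AS\s+(\S+))?\s*$", re.IGNORECASE)


def build_stages(dockerfile_text: str) -> List[Tuple[str, Optional[str]]]:
    """Return (image, stage_name) for each FROM line, in file order."""
    stages = []
    for line in dockerfile_text.splitlines():
        match = FROM_RE.match(line)
        if match:
            stages.append((match.group(1), match.group(2)))
    return stages


def is_multi_stage(dockerfile_text: str) -> bool:
    """A Dockerfile is multi-stage when it has two or more FROM instructions."""
    return len(build_stages(dockerfile_text)) >= 2


# Illustrative only: tags and stage names are made up, not copied from the repo.
SAMPLE_DOCKERFILE = """\
FROM node:20 AS frontend-builder
RUN npm ci && npm run build
FROM python:3.11-slim AS venv-builder
RUN python -m venv /opt/venv
FROM python:3.11-slim
COPY --from=venv-builder /opt/venv /opt/venv
"""
```

On the sample, `build_stages` yields three entries and the last one has no stage name, the same builder, builder, lean-final shape that claim 004 describes. `COPY --from=...` lines are ignored because only lines starting with `FROM` match.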

### Molecule level (deferred)

| Claim | Status | Required |
|---|---|---|
| 008 live monitoring e2e | untested | Real XianYu cookie + LLM key + ntfy URL; create AI task; verify Playwright + LLM + notification chain |
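
Before the full claim-008 chain is attempted, the notification leg can be smoke-tested on its own. The minimal sketch below assumes ntfy is the chosen channel and uses ntfy's plain HTTP publish (POST the message body to the topic URL, with an optional `Title` header). It exercises ntfy delivery only, not the repo's own dispatch code, and the topic URL is a hypothetical placeholder; ntfy.sh topics are public, so an unguessable name is safer.

```python
import urllib.request


def build_ntfy_request(topic_url: str, message: str, title: str = "goofish smoke test") -> urllib.request.Request:
    """Build, without sending, a POST that publishes `message` to an ntfy topic."""
    # Header values are sent latin-1 encoded, so keep the title ASCII.
    return urllib.request.Request(
        topic_url,
        data=message.encode("utf-8"),
        headers={"Title": title},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Hypothetical topic name; replace it with your own before calling send().
REQUEST = build_ntfy_request("https://ntfy.sh/goofish-smoke-test-change-me", "Notification leg OK")
```

Calling `send(REQUEST)` publishes the message. If it reaches your phone, a later claim-008 failure is more likely in the crawl, analysis, or dispatch steps than in ntfy itself.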

## Real findings worth surfacing

1. **Foot-shotgun on public deployment.** `WEB_PASSWORD=admin123` plus
   the `8000:8000` port mapping (which publishes on all host interfaces)
   means a user spinning this up on a VPS gets an internet-reachable
   admin login with the default password. The README does say
   "默认 admin/admin123" ("default admin/admin123") but doesn't strongly
   warn. Two simple upstream fixes: publish as `127.0.0.1:8000:8000` by
   default, and add a `⚠️ change WEB_PASSWORD before exposing` line.

2. **Companion extension is well-scoped.** Unlike
   `xiaohongshu-skills`' XHS Bridge (which uses the `debugger`
   permission), this one requests only `cookies`, `scripting`, `storage`,
   `tabs`, and `webRequest`, scoped to `*.goofish.com`. Lower privilege
   footprint, narrower blast radius.

3. **DDD-style src/ is the real deal.** A lot of "Playwright + AI"
   repos ship as a single 3000-line script. This one has a clean
   `api / core / domain / services / infrastructure` split — easier
   to fork, easier to audit.

4. **OPENAI_BASE_URL defaults to modelscope.cn.** China-friendly
   default, but means user prompts and product images go through a
   3rd-party model gateway out of the box. Worth disclosing in
   `watch_out` (already done).
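
Findings 1 and 4 are both visible from configuration alone, so a pre-deployment check can flag them. This is a minimal sketch, not part of the repo: it assumes you pass the values from your `.env` file as a dict and the short-syntax `ports:` strings of the `app` service as a list, it treats a missing `WEB_PASSWORD` as risky (a conservative assumption), and its port parsing handles IPv4 `[IP:]HOST:CONTAINER` forms only.

```python
from typing import Dict, List
from urllib.parse import urlparse

DEFAULT_PASSWORD = "admin123"  # the default called out in finding 1


def publishes_on_all_interfaces(port_mapping: str) -> bool:
    """True for short-syntax compose ports that are not bound to a specific host IP.

    '8000:8000' or '8000' -> all interfaces; '127.0.0.1:8000:8000' -> loopback only.
    IPv4 only: bracketed IPv6 addresses are not handled.
    """
    parts = port_mapping.split(":")
    if len(parts) < 3:
        return True
    return parts[0] in ("", "0.0.0.0")


def exposure_warnings(env: Dict[str, str], ports: List[str]) -> List[str]:
    """Return human-readable warnings for findings 1 and 4."""
    warnings = []
    # Conservative assumption: a missing or empty password is treated like the default.
    if env.get("WEB_PASSWORD", "") in ("", DEFAULT_PASSWORD):
        warnings.append("WEB_PASSWORD is unset or still the default")
    for mapping in ports:
        if publishes_on_all_interfaces(mapping):
            warnings.append(f"port {mapping} is published on all host interfaces")
    # Finding 4: prompts and product images would go to a third-party gateway.
    host = urlparse(env.get("OPENAI_BASE_URL", "")).hostname or ""
    if host == "modelscope.cn" or host.endswith(".modelscope.cn"):
        warnings.append("OPENAI_BASE_URL points at a modelscope.cn gateway")
    return warnings
```

Publishing the port as `127.0.0.1:8000:8000`, as finding 1 suggests, and setting a real password clears the first two warnings. The third is informational, since pointing at modelscope.cn may be exactly what a CN-based user wants.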

## Why not higher

`usable` because:

- No live monitoring run. The compound molecule rule requires evidence
  that the actual user-value chain works (a notification fires on a real
  listing match).
- The two soft concerns (default password, default port binding)
  meaningfully reduce trust for casual users; promotion past
  `usable` should require either upstream fixes or a documented
  hardening playbook.

## Path to `reusable`

1. Run a live monitoring scenario per the deferred plan.
2. Produce a hardening checklist (change WEB_PASSWORD, change
   OPENAI_BASE_URL if needed, set proxy pool if rate-limited).
3. Run an expired-cookie scenario; verify error visibility.
4. Update claims, re-run verdict_calculator.
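
Step 4's re-run is, at its core, score arithmetic plus a capping rule. The sketch below is a hypothetical re-implementation for illustration, not the framework's `verdict_calculator`. Its score bands come from the dashboard legend at the top of this page, the six score components (+40, +14, +12, +6, 0, -3) are the unlabeled values listed there and sum to the reported 69, and the bucket order is an assumption that covers only the two bucket names this report uses (`usable` below `reusable`).

```python
from typing import Iterable, Sequence

# Score bands from the dashboard legend (inclusive ranges).
BANDS = [(0, 29, "🛑"), (30, 49, "⚠️"), (50, 79, "🛠"), (80, 100, "🏭")]

# Assumed ordering, lowest first; only the buckets named in this report.
BUCKET_ORDER = ("usable", "reusable")


def band(score: int) -> str:
    """Map a 0-100 score to its dashboard band."""
    for low, high, label in BANDS:
        if low <= score <= high:
            return label
    raise ValueError(f"score out of range: {score}")


def apply_ceilings(recommended: str, ceilings: Iterable[str], order: Sequence[str] = BUCKET_ORDER) -> str:
    """Final bucket = the lowest-ranked of the recommendation and every active ceiling."""
    return min([recommended, *ceilings], key=order.index)


score = sum([40, 14, 12, 6, 0, -3])
# E.g. if a later run recommended 'reusable' while both ceilings still applied:
final = apply_ceilings("reusable", ["usable", "usable"])
```

Taking the minimum makes every ceiling a hard cap: a better recommendation alone cannot lift the bucket, so clearing both ceilings (live e2e evidence, portable evidence) is what step 1 and step 3 are for.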

## Recommended

```yaml
current_bucket: usable
status: evaluated
```