repo·evals
· 2026-05-04

MediaCrawler

NanmiCoder/MediaCrawler

🛠75 / 100
Workflow stages: 01 Signal scanning · 02 Content acquisition · 03 Content understanding · 04 Topic curation · 05 Content production · 06 Creative assembly · 07 Distribution & feedback · 08 Learning

Score legend: 🛑 0–29 · ⚠️ 30–49 · 🛠 50–79 · 🏭 80–100

🛠 · 75 / 100
  • 4 claims passed, no critical failures
  • MIT / Apache / etc., installable per deployment.install_methods
  • release_pipeline=1, recently_active=True
  • multilingual_readme=true
  • static-only eval; live e2e pending


CLI args (platform / type / login) → Login atom (QR / phone / cookie) → Per-platform crawler (Playwright) → Signing helper (`libs/*.js` + `xhshow`) → Storage adapter (1 of 8 backends: CSV / JSON / SQLite / Postgres / Mongo / ...)
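The dispatch at the head of this pipeline is a plain factory mapping from platform key to crawler class. A minimal sketch of the pattern, assuming only what the static checks in this report confirm (a `CrawlerFactory.CRAWLERS` dict with 7 entries); the class bodies and the two platform keys shown are illustrative stand-ins, not the repo's code:

```python
from abc import ABC, abstractmethod


class AbstractCrawler(ABC):
    """Common interface every per-platform adapter implements."""

    @abstractmethod
    async def start(self) -> None: ...


class XhsCrawler(AbstractCrawler):   # hypothetical stand-in
    async def start(self) -> None:
        print("xhs: login -> search -> store")


class BiliCrawler(AbstractCrawler):  # hypothetical stand-in
    async def start(self) -> None:
        print("bili: login -> search -> store")


class CrawlerFactory:
    # The real repo maps 7 platform keys; two are shown here for illustration.
    CRAWLERS: dict[str, type[AbstractCrawler]] = {
        "xhs": XhsCrawler,
        "bili": BiliCrawler,
    }

    @staticmethod
    def create_crawler(platform: str) -> AbstractCrawler:
        if platform not in CrawlerFactory.CRAWLERS:
            raise ValueError(f"unsupported platform: {platform}")
        return CrawlerFactory.CRAWLERS[platform]()
```

The value of this shape for the eval: platform coverage is a single dict, so claim-001's 1:1 check reduces to comparing key sets.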

Install: `git clone` + `uv pip install` + `uv run main.py` · OS: any (Python 3.11+) · Effort: moderate

| Dependency | How it is handled | Notes |
|---|---|---|
| 🌐 Real account on each platform | Login cookies to access each platform | QR / phone / cookie login; aggressive scraping risks account limits |
| Per-platform anti-bot ecosystem | Internal — handled via `libs/*.js` + `xhshow` + per-platform `client.py` | Project handles signing internally; breaks when platforms change |
| MediaCrawlerPro (paid) | Removes Playwright dependency, adds resume + multi-account | Commercial upgrade — separate from this OSS repo |
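The "storage adapter (1 of 8 backends)" stage in the architecture above is the same dictionary-dispatch idea applied to output. A hedged sketch with two of the eight backends; the function names and the `store` entry point are this sketch's own, not the repo's API:

```python
import csv
import json
from pathlib import Path
from typing import Any, Callable


def save_csv(rows: list[dict[str, Any]], path: Path) -> None:
    # Column order is taken from the first row's keys.
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


def save_json(rows: list[dict[str, Any]], path: Path) -> None:
    path.write_text(json.dumps(rows, ensure_ascii=False, indent=2), encoding="utf-8")


# Subset of the 8 backends; sqlite / postgres / mongo would slot in the same way.
STORAGE_BACKENDS: dict[str, Callable[[list[dict[str, Any]], Path], None]] = {
    "csv": save_csv,
    "json": save_json,
}


def store(rows: list[dict[str, Any]], backend: str, path: Path) -> None:
    try:
        STORAGE_BACKENDS[backend](rows, path)
    except KeyError:
        raise ValueError(f"unknown save_data_option: {backend}") from None
```

Because the backend is chosen by one CLI flag (`save_data_option`), adding a backend does not touch any crawler code.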
Claims · 7 (3 passed / 1 partial / 3 unknown)

Score components: +40 +14 +12 +9 +0 +0 = 75
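The arithmetic above (40 + 14 + 12 + 9 = 75) lands in the 🛠 50–79 band of the score legend. A small sketch of that mapping; the band boundaries come from the legend, while the function itself is this sketch's own, not the evaluator's calculator:

```python
def bucket(score: int) -> str:
    """Map a 0-100 score to the report's four legend bands."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    if score < 30:
        return "🛑 0–29"
    if score < 50:
        return "⚠️ 30–49"
    if score < 80:
        return "🛠 50–79"
    return "🏭 80–100"


components = [40, 14, 12, 9, 0, 0]
total = sum(components)  # 75
```

One consequence worth noticing: at 75 the repo is 5 points from the 🏭 band, so clearing even one deferred molecule-level claim could change the bucket.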

4 / 7 claims resolved:
- claim-001 · passed
- claim-002 · passed
- claim-003 · passed
- claim-004 · passed
- claim-005 · unknown
- claim-006 · unknown
- claim-007 · unknown

Atom-level checks: input_contract · output_contract · determinism · idempotence · no_skill_callouts · failure_mode_clarity

Molecule-level checks: workflow_correctness · declared_call_graph · stop_conditions · handoff_points · atom_evidence · error_propagation · partial_failure_handling

  • core user-facing layer untested → capped at 'usable'
  • evidence_completeness='partial' (not portable) → capped at 'usable'

  • only 3/4 critical claims covered

archetype: adapter · core_layer_tested: False · evidence: partial · recommended: usable · final: usable

| Claim | Description | Severity | Area | Status |
|---|---|---|---|---|
| claim-001 | Adapters for all 7 platforms match the README list exactly | critical | platform-coverage | ● passed |
| claim-002 | CLI-exposed `type` / `lt` / `save_data_option` match the docs | critical | cli-contract | ● passed |
| claim-003 | Dependencies needed by the install path are genuinely declared in `requirements.txt` | critical | install | ● passed |
| claim-004 | "No encryption reversal needed" promise: users do not have to handle signing themselves | high | dx-promise | ◐ partial |
| claim-005 | End-to-end happy path: each platform's search flow yields real data | critical | end-to-end | · unknown |
| claim-006 | Failure modes: not logged in / keyword with no results / platform rate limiting give clear errors | high | error-propagation | · unknown |
| claim-007 | A single broken adapter does not block imports of the other platform modules | high | structure | · unknown |

run-static-checks · 2026-05-04 · 0.00s · tokens in ? / out ?
# MediaCrawler — final verdict (2026-05-04)

## Repo

- **Name:** NanmiCoder/MediaCrawler
- **Stars:** 48,800+
- **Archetype:** adapter
- **Layer:** **molecule** — predefined per-platform pipelines
  (login → search/detail/creator → store), no LLM-driven routing
- **Eval framework version:** repo-evals layer model v1 (41d9565)

## Bucket

**`usable`** — static layer is clean, but the user-facing value
(actually scraping live data) is molecule-level and not yet logged on
this evaluator's machine.

The repo is a credible install-and-try candidate. It is not yet
`reusable` because:

1. No molecule-level run on any of the 7 platforms has been logged
   here, and platform anti-bot moves fast enough that "the code is
   structurally complete" does not transfer to "it works today".
2. claim-004 ("no encryption reversal needed") is `passed_with_concerns`
   — the user surface is clean, but the framing understates how much
   per-platform plumbing is happening (3 different signing strategies
   live in the repo, all of them break when platforms change).

## What was evaluated

### Atom + molecule level (static, this run)

| Claim | Status | Notes |
|---|---|---|
| 001 platform coverage | passed | `CrawlerFactory.CRAWLERS` has 7 entries, `media_platform/` has 7 dirs, perfect 1:1 with README |
| 002 CLI contract | passed | Code is richer than the English README — `creator` crawl type and `mongodb` / `postgres` storage backends are not surfaced in docs |
| 003 install deps | passed | Python 3.11, Playwright 1.45.0, every storage backend has a real client in `requirements.txt` |
| 004 no encryption reversal | passed_with_concerns | User surface clean, but signing lives in 3 places (`libs/*.js` + `xhshow` pip dep + per-platform `client.py`) — framing oversells simplicity |
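Claim-001's 1:1 check is mechanical: the factory's platform keys, the `media_platform/` directory names, and the README list must be identical sets. A sketch of that comparison; the seven keys shown are what the README is understood to advertise and are illustrative here, not re-verified:

```python
def coverage_diff(factory_keys, platform_dirs, readme_platforms):
    """Return what each source is missing relative to the union.

    An empty dict means a perfect 1:1 across all three sources.
    """
    sets = {
        "factory": set(factory_keys),
        "dirs": set(platform_dirs),
        "readme": set(readme_platforms),
    }
    union = set().union(*sets.values())
    return {name: union - s for name, s in sets.items() if union - s}


# Illustrative platform keys (assumed, not re-checked in this sketch):
PLATFORMS = ["xhs", "dy", "ks", "bili", "wb", "tieba", "zhihu"]
```

Running `coverage_diff(PLATFORMS, PLATFORMS, PLATFORMS)` returns `{}`; any drift between code, directories, and docs shows up as a named gap, which is exactly the failure mode claim-002 already exhibits for `creator` and the extra storage backends.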

### Molecule level (deferred — needs live run)

| Claim | Status | What it takes to clear |
|---|---|---|
| 005 e2e per platform | untested | Real account + Playwright browser + un-broken platform on eval day; record date + commit + row count |
| 006 failure-mode UX | untested | Induce missing-login / empty-keyword / IP-block; verify error visibility and exit codes |
| 007 adapter isolation | untested | Clone repo, break one adapter's `__init__.py`, verify the other six still import |
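Claim-007 needs no live account: import each adapter module independently and record which ones fail, instead of letting one `ImportError` abort the loop. A generic sketch; the module names passed in would be `media_platform.<p>` paths, which are placeholders here:

```python
import importlib


def import_survivors(module_names):
    """Try each import independently; one broken module must not stop the rest."""
    ok, broken = [], {}
    for name in module_names:
        try:
            importlib.import_module(name)
            ok.append(name)
        except Exception as exc:  # ImportError, SyntaxError in module body, ...
            broken[name] = repr(exc)
    return ok, broken
```

Usage against the real repo: deliberately corrupt one adapter's `__init__.py`, run this over all seven `media_platform.*` names, and the claim passes if exactly that one module lands in `broken` while the other six import cleanly.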

## Real findings worth surfacing

1. **DX framing oversells the encryption story.** README says "no need
   to reverse complex encryption algorithms"; the repo ships
   `libs/douyin.js`, `libs/zhihu.js`, `libs/stealth.min.js`, depends on
   the third-party `xhshow` package, and keeps additional signing logic
   inside per-platform `client.py` files. The user does not have to do
   the reversing — but the *project* is doing it on three fronts at
   once, and any platform that bumps signing breaks until one of those
   three places is updated.

2. **Hidden capabilities under-documented.** Code exposes a `creator`
   crawl type and `mongodb` + `postgres` storage backends that the
   English README never mentions. Real capability, lost feature
   visibility.

3. **License is non-commercial custom (NOASSERTION).** README disclaimer
   is explicit but the GitHub-detected license is "Other" rather than
   a recognised SPDX ID, which means downstream tooling (Dependabot,
   SBOM scanners, package managers) treats it as unknown — read the
   LICENSE file before any non-research use.

## Why not higher

`usable` is the right ceiling now because:

- No molecule-level live run has been done. Even one passing run on
  one platform — with date + MediaCrawler commit + row count + storage
  evidence — would justify moving claim-005 to `passed` for that
  platform, not the others.
- Even with a clean static layer, anti-bot fragility means
  "recommendable" requires the same evidence on multiple platforms
  *and* multiple evaluation dates separated by weeks, not one heroic
  green snapshot.

## Path to `reusable`

1. Pick the lowest-risk platform (xhs / bili / weibo are usually the
   most stable) and run `uv run main.py --platform <p> --lt qrcode
   --type search --keywords <real_term>`. Log it under
   `runs/<date>/run-<p>-search/business-notes.md`.
2. Repeat for at least one more platform.
3. Induce two failures (no-login, fake keyword) and capture the error
   output under `runs/<date>/run-failures/`.
4. Re-run the verdict calculator.
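The evidence from step 1 can be captured mechanically so a green run is reproducible later. A hedged sketch of writing the `business-notes.md` stub; the directory layout follows the `runs/<date>/run-<p>-search/` convention named above, while the function name and the idea of passing commit and row count in as arguments (rather than discovering them) are this sketch's own:

```python
from datetime import date
from pathlib import Path


def log_run_evidence(root: Path, platform: str, commit: str, row_count: int) -> Path:
    """Write the minimal evidence a molecule-level pass needs: date, commit, rows."""
    today = date.today().isoformat()
    run_dir = root / "runs" / today / f"run-{platform}-search"
    run_dir.mkdir(parents=True, exist_ok=True)
    notes = run_dir / "business-notes.md"
    notes.write_text(
        f"# {platform} search run\n\n"
        f"- date: {today}\n"
        f"- mediacrawler_commit: {commit}\n"
        f"- rows_stored: {row_count}\n",
        encoding="utf-8",
    )
    return notes
```

With date, commit, and row count pinned per run, claim-005 can be moved to `passed` for exactly the platforms and dates that have a file like this, and nothing more.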

## Recommended bucket

```yaml
current_bucket: usable
status: evaluated
```