repo·evals · 2026-04-13 · main@HEAD

wewrite

oaker-io/wewrite

🛠 66 / 100

01 Signal scanning
02 Content acquisition
03 Content understanding
04 Topic curation
05 Content production
06 Creative assembly
07 Distribution & feedback
08 Learning

🛑 0–29 · ⚠️ 30–49 · 🛠 50–79 · 🏭 80–100

🛠 · 66 / 100
  • 10 claims passed, no critical failures
  • MIT / Apache / etc., installable per deployment.install_methods
  • release_pipeline=1, recently_active=True
  • EN-only or ZH-only README
  • static-only eval; live e2e pending


"写一篇\n公众号文章""写一篇\n公众号文章"Hotspot scrape热搜抓取(Weibo + Toutiao(微博 + 头条+ Baidu)+ 百度)Topic scoring选题打分+ history dedup+ 历史去重Pick 1 of 77 选 1 框架frameworks ++ 素材采集material scrapeWrite w/ persona按人格 + 范文风格+ exemplar style写作 + humanness+ humanness check自检SEO + 9-providerSEO + 9 providerimage gen生图(封面 + 内文)(cover + inline)16 themes +16 主题排版 +WeChat fixes +公众号兼容修复 +draft-box push推送草稿箱

Install: `git clone` + `pip install -r requirements.txt` · Platform: any (Python 3.11+) · Difficulty: easy
| Dependency | Purpose | Cost / access |
|------------|---------|---------------|
| WeChat Official Account API | Push articles to draft box; fetch read-stats | Free; needs verified WeChat Official Account with appid/secret |
| Image-gen providers (9 supported) | Cover + inline image generation | DashScope/Doubao ~¥0.1/img; OpenAI/Gemini priced higher; auto-fallback chain handles outages |
| Hotspot sources (Weibo / Toutiao / Baidu) | Live trending topic scrape | Public endpoints; rate-limited but no signup |
| SEO sources (Baidu / 360) | Search suggestions for keyword scoring | Public endpoints |
Claims: 13 — 10 passed · 1 failed · 2 untested
Score breakdown: +40 · +18 · +5 · +3 · 0 · 0 (sums to 66)

11 / 13 tested

- passed: claim-001 through claim-010
- failed: claim-011
- untested: claim-101, claim-102

Contract checks: input_contract · output_contract · determinism · idempotence · no_skill_callouts · failure_mode_clarity

Workflow checks: workflow_correctness · declared_call_graph · stop_conditions · handoff_points · atom_evidence · error_propagation · partial_failure_handling


  • only 4/6 critical claims covered

archetype: hybrid-skill · core_layer_tested: False · evidence: partial · recommended: usable · final: usable
ceiling 1 · core user-facing layer untested → capped at 'usable'
ceiling 2 · hybrid-repo rule: archetype 'hybrid-skill' requires end-to-end evaluation of the user-facing layer
ceiling 3 · evidence_completeness='partial' (not portable) → capped at 'usable'

| Claim | Description | Priority | Area | Status |
|-------|-------------|----------|------|--------|
| claim-001 | pip install succeeds from requirements.txt | critical | support-install | ● passed |
| claim-002 | 6 CLI commands all respond to --help | critical | support-cli | ● passed |
| claim-003 | Markdown→WeChat HTML conversion works | critical | support-converter | ● passed |
| claim-004 | Hotspot fetching returns live data from 3 sources | critical | support-hotspots | ● passed |
| claim-005 | 16 themes exist with full YAML config + dark mode | high | support-themes | ● passed |
| claim-006 | 9 image generation providers implemented | high | support-image-gen | ● passed |
| claim-007 | 5 writing personas with rich YAML config | high | support-personas | ● passed |
| claim-008 | SEO keyword scoring works with live data | medium | support-seo | ● passed |
| claim-009 | Humanness scoring provides multi-tier analysis | medium | support-quality | ● passed |
| claim-010 | Evals exist for 3 scenarios | medium | support-quality | ● passed |
| claim-011 | Unit test suite exists | high | support-testing | ✕ failed |
| claim-101 | Full 8-step article generation workflow | critical | core-llm | ○ untested |
| claim-102 | Anti-AI detection quality measures | critical | core-llm | ○ untested |

run-smoke · 2026-04-13 · 0% · 0.00s · tokens in ? / out ?

# Final Verdict

## Repo

- Name: oaker-io/wewrite
- Date: 2026-04-13
- Archetype: hybrid-skill
- Final bucket: **usable**
- Confidence: medium

## Why This Bucket

- **Core outcome**: Support layer is impressive — all 6 CLI commands work, converter produces real WeChat HTML, hotspot fetching returns live data, 16 themes + 9 image providers + 5 personas all verified. But the **core LLM workflow (8-step article generation) is untested** — it requires a full Claude Code session with WeChat API credentials.
- **Scenario breadth**: Only tested support layer (deterministic code). Core layer (LLM-driven writing) untested. For a hybrid-skill, this triggers the **hybrid cap**: core layer untested → cannot exceed `usable`.
- **Repeatability**: Converter, hotspots, and CLI commands all work consistently in repeated runs. LLM layer repeatability unknown.
- **Failure transparency**: CLI tools handle missing inputs gracefully. Error messages are actionable.
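The converter finding above hinges on a WeChat constraint worth making concrete: the WeChat editor strips `<style>` blocks and class attributes, so every rule must be inlined. A minimal illustration of that technique (this is NOT wewrite's converter, just a toy for one element type):

```python
# Why Markdown->WeChat conversion must inline CSS: WeChat's editor strips
# <style> blocks and class attributes, so styling survives only as inline
# style="..." attributes. Toy example for paragraphs only; the theme rule
# below is invented, not one of wewrite's 16 themes.

import re

THEME = {"p": "margin:0 0 1em;line-height:1.75;color:#333;"}

def md_paragraphs_to_wechat_html(md: str) -> str:
    """Convert blank-line-separated paragraphs to <p> tags with inline CSS."""
    blocks = [b.strip() for b in re.split(r"\n\s*\n", md) if b.strip()]
    return "\n".join(f'<p style="{THEME["p"]}">{b}</p>' for b in blocks)

html = md_paragraphs_to_wechat_html("first para\n\nsecond para")
```

A real converter also has to handle headings, images, code blocks, footnoted links, and dark-mode attributes, which is exactly why regressions there are worth guarding against.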

## Hybrid-Skill Ceiling Applied

Per hybrid-skill archetype rules: the **core user-facing layer (LLM-driven article generation)** was not tested. The support layer (converter, hotspots, themes, personas, image providers) all pass. But without core layer evidence, verdict is **capped at `usable`**.
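The cap logic can be written down explicitly. The bucket names `usable`, `reusable`, and `recommendable` come from this report; the lower bucket names and the exact rule encoding are my assumptions, not the eval tool's code:

```python
# Sketch of the hybrid-skill verdict cap. Bucket ordering and the two rules
# mirror this report; names below 'usable' are placeholders I invented.

BUCKETS = ["broken", "risky", "usable", "reusable", "recommendable"]  # low -> high

def apply_ceilings(recommended: str, core_layer_tested: bool, evidence: str) -> str:
    ceiling = len(BUCKETS) - 1
    if not core_layer_tested:              # hybrid-repo rule: core untested
        ceiling = min(ceiling, BUCKETS.index("usable"))
    if evidence == "partial":              # evidence not portable
        ceiling = min(ceiling, BUCKETS.index("usable"))
    return BUCKETS[min(BUCKETS.index(recommended), ceiling)]

final = apply_ceilings("usable", core_layer_tested=False, evidence="partial")
```

For this repo both ceilings bind independently, so even a higher recommended bucket would still land on `usable`.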

## Score Summary

| Category | Passed | Failed | Partial | Untested | Total |
|----------|--------|--------|---------|----------|-------|
| Critical (support) | 4 | 0 | 0 | 0 | 4 |
| Critical (core) | 0 | 0 | 0 | 2 | 2 |
| High | 3 | 1 | 0 | 0 | 4 |
| Medium | 3 | 0 | 0 | 0 | 3 |
| **Total** | **10** | **1** | **0** | **2** | **13** |

## What I Would Say In Plain English

**wewrite's support layer is genuinely impressive for a skill repo.** The converter produces real WeChat-compatible HTML (inline CSS, footnoted links, dark mode attributes). Hotspot fetching returns live trends from 3 Chinese platforms. 16 themes, 9 image providers, 5 personas — all verified to exist with correct structure. The eval system (3 structured scenarios) shows maturity.

**But it's a writing skill that I haven't seen write.** The entire 8-step article generation pipeline is LLM-driven and requires WeChat API credentials to test end-to-end. The support layer works, but the core promise — "一句话搞定公众号" ("one sentence and your Official Account post is done") — is unverified.

**The one real gap: zero unit tests.** 2,232 lines of Python toolkit code with no pytest tests at all. The eval specs test agent behavior, not code correctness. A converter regression would go undetected.

## Path to `reusable`

1. **Test the core LLM workflow** — run a full agent session, generate an article, score it against the quality contract and humanness_score.py
2. **Add unit tests** — converter.py (548 lines) especially needs test coverage for WeChat HTML edge cases
3. **Verify at least 2 image providers** with real API keys
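Item 2 asks for converter unit tests; a hedged pytest-style sketch of what the first few would look like. The import path and `convert` signature are assumptions (a runnable stand-in is used here so the sketch executes), so adjust to wewrite's real module:

```python
# Sketch of the converter tests item 2 calls for. In the real repo this would
# be:  from wewrite.converter import convert   (path/name are my guess).
# A trivial stand-in is defined here so the sketch runs on its own.

def convert(md: str) -> str:
    return "<p>" + md.replace("\n\n", "</p><p>") + "</p>"

def test_paragraphs_become_p_tags():
    html = convert("a\n\nb")
    assert html.count("<p>") == 2

def test_no_external_stylesheets():
    # WeChat strips <link>/<style>, so converted HTML must never rely on them
    assert "<link" not in convert("hello")
    assert "<style" not in convert("hello")

test_paragraphs_become_p_tags()
test_no_external_stylesheets()
```

Run under pytest these become automatic regression guards; wiring them into CI is what item 7 under `recommendable` asks for.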

## Path to `recommendable`

Everything in `reusable` plus:
4. **Multiple article generation runs** showing consistency across personas and frameworks
5. **Anti-slop verification** — generated articles scored against banned phrase list
6. **Publish flow verification** — draft-to-WeChat pipeline tested with real credentials
7. **CI for converter tests** — prevent WeChat HTML regressions
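The anti-slop check in item 5 amounts to scoring a draft against a banned phrase list. A toy version, where the phrase list and zero-tolerance threshold are illustrative rather than wewrite's actual quality contract:

```python
# Toy anti-slop check in the spirit of item 5. The banned phrases and the
# threshold are illustrative; wewrite's real list and scoring rules differ.

BANNED = ["delve into", "in today's fast-paced world", "game-changer"]

def slop_hits(text: str) -> list[str]:
    """Return every banned phrase found in the text (case-insensitive)."""
    lower = text.lower()
    return [p for p in BANNED if p in lower]

def passes_anti_slop(text: str, max_hits: int = 0) -> bool:
    return len(slop_hits(text)) <= max_hits

clean = passes_anti_slop("A plain, concrete opening paragraph.")
flagged = passes_anti_slop("Let's delve into this game-changer.")
```

Running every generated article through a check like this, and recording the hit list, would turn "anti-AI detection quality" from an untested claim into measurable evidence.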

## Remaining Risks

- **Core workflow completely untested** — the entire value prop of the skill is unverified
- **No unit tests** — 2,232 lines of Python with zero pytest coverage
- **Image providers cannot be tested without API keys** — 9 providers verified as code, but none tested for actual image generation
- **WeChat API dependency** — publish flow requires real WeChat Official Account credentials
- **camoufox dependency** — browser-based hotspot fetching may break if source sites change layout