# wewrite (oaker-io/wewrite) · Evaluation Report

- Ref: main@HEAD
- Date: 2026-04-13
- Score: 🛠 66 / 100

Score legend: 🛑 0–29 · ⚠️ 30–49 · 🛠 50–79 · 🏭 80–100
- ✓ 10 claims passed, no critical failures
- ✓ MIT / Apache / etc., installable per deployment.install_methods
- ◐ release_pipeline=1, recently_active=True
- ⚪ EN-only or ZH-only README
- ⚪ static-only eval; live e2e pending
## Installation

| Method | Requirements | Difficulty |
|---|---|---|
| git clone + pip install -r requirements.txt | any (Python 3.11+) | easy |
## External Dependencies

| Dependency | Role | Cost / Access |
|---|---|---|
| WeChat Official Account API | Push articles to draft box; fetch read-stats | Free; needs verified WeChat Official Account with appid/secret |
| Image-gen providers (9 supported) | Cover + inline image generation | DashScope/Doubao ~¥0.1/img; OpenAI/Gemini priced higher; auto-fallback chain handles outages |
| Hotspot sources (Weibo / Toutiao / Baidu) | Live trending topic scrape | Public endpoints; rate-limited but no signup |
| SEO sources (Baidu / 360) | Search suggestions for keyword scoring | Public endpoints |
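The dependency table notes that an auto-fallback chain handles image-provider outages. A minimal sketch of that pattern, assuming a common call signature across providers (the provider stubs and URLs below are hypothetical, not wewrite's actual API):

```python
from typing import Callable, Optional

# Hypothetical provider stubs: each returns an image URL, or raises on outage.
def dashscope_gen(prompt: str) -> str:
    raise RuntimeError("provider down")  # simulate an outage

def doubao_gen(prompt: str) -> str:
    return f"https://img.example/doubao/{abs(hash(prompt)) % 10000}.png"

def generate_with_fallback(prompt: str,
                           providers: list[Callable[[str], str]]) -> Optional[str]:
    """Try each provider in order; return the first successful result."""
    for gen in providers:
        try:
            return gen(prompt)
        except Exception:
            continue  # this provider failed; fall through to the next one
    return None  # every provider in the chain failed

url = generate_with_fallback("sunset over mountains", [dashscope_gen, doubao_gen])
```

With nine providers the same loop simply gets a longer list; ordering the list by cost puts the ~¥0.1/img providers first.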
## Claims Score

13 claims total: 10 passed · 1 failed · 2 untested.

| Points | Component |
|---|---|
| +40 | |
| +18 | |
| +5 | |
| +3 | |
| 0 | |
| 0 | |

The six components sum to the overall 66 / 100.
10 / 13 claims passed (11 / 13 tested)

- passed: claim-001 through claim-010
- failed: claim-011
- untested: claim-101, claim-102
| Skill contract criterion | Result |
|---|---|
| input_contract | |
| output_contract | |
| determinism | |
| idempotence | |
| no_skill_callouts | |
| failure_mode_clarity | |

| Workflow criterion | Result |
|---|---|
| workflow_correctness | |
| declared_call_graph | |
| stop_conditions | |
| handoff_points | |
| atom_evidence | |
| error_propagation | |
| partial_failure_handling | |
Ceilings applied:

- Core user-facing layer untested → capped at `usable`.
- Hybrid-repo rule: archetype `hybrid-skill` requires end-to-end evaluation of the user-facing layer.
- evidence_completeness = `partial` (not portable) → capped at `usable`.
- Only 4/6 critical claims covered.

Decision chain: archetype `hybrid-skill` → core_layer_tested = False → evidence = partial → recommended `usable` → final `usable`.
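The decision chain can be sketched as a clamp over an ordered bucket ladder. This is a reading of the report's rules, not the evaluator's actual code; the bucket names `usable`, `reusable`, and `recommendable` appear in this report, while `unusable` and the field names are assumptions:

```python
BUCKETS = ["unusable", "usable", "reusable", "recommendable"]  # ascending quality

def apply_ceilings(recommended: str, archetype: str,
                   core_layer_tested: bool, evidence: str) -> str:
    """Clamp a recommended bucket by the hybrid-skill and evidence ceilings."""
    cap = len(BUCKETS) - 1  # no cap by default
    if archetype == "hybrid-skill" and not core_layer_tested:
        cap = min(cap, BUCKETS.index("usable"))  # core layer untested
    if evidence == "partial":
        cap = min(cap, BUCKETS.index("usable"))  # evidence not portable
    return BUCKETS[min(BUCKETS.index(recommended), cap)]

final = apply_ceilings("usable", "hybrid-skill", False, "partial")  # → "usable"
```

Either ceiling alone would produce the same final bucket here; both fire for wewrite.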
| ID | Claim | Severity | Layer | Result | Notes |
|---|---|---|---|---|---|
| claim-001 | pip install succeeds from requirements.txt | critical | support-install | ● passed | |
| claim-002 | 6 CLI commands all respond to --help | critical | support-cli | ● passed | |
| claim-003 | Markdown→WeChat HTML conversion works | critical | support-converter | ● passed | |
| claim-004 | Hotspot fetching returns live data from 3 sources | critical | support-hotspots | ● passed | |
| claim-005 | 16 themes exist with full YAML config + dark mode | high | support-themes | ● passed | |
| claim-006 | 9 image generation providers implemented | high | support-image-gen | ● passed | |
| claim-007 | 5 writing personas with rich YAML config | high | support-personas | ● passed | |
| claim-008 | SEO keyword scoring works with live data | medium | support-seo | ● passed | |
| claim-009 | Humanness scoring provides multi-tier analysis | medium | support-quality | ● passed | |
| claim-010 | Evals exist for 3 scenarios | medium | support-quality | ● passed | |
| claim-011 | Unit test suite exists | high | support-testing | ✕ failed | |
| claim-101 | Full 8-step article generation workflow | critical | core-llm | ○ untested | |
| claim-102 | Anti-AI detection quality measures | critical | core-llm | ○ untested |
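claim-002 asserts that all six CLI commands respond to `--help`. A smoke check of that shape is a few lines of `subprocess`; the command list below is a placeholder (the Python interpreter itself), since wewrite's actual entry-point names are not given in this report:

```python
import subprocess
import sys

def responds_to_help(cmd: list[str]) -> bool:
    """Return True if `cmd --help` exits 0 and prints some usage text."""
    result = subprocess.run(cmd + ["--help"], capture_output=True, text=True)
    return result.returncode == 0 and len(result.stdout + result.stderr) > 0

# Placeholder target: the interpreter. A real run would loop over the
# repo's six CLI entry points instead of this single command.
commands = [[sys.executable]]
results = {tuple(c): responds_to_help(c) for c in commands}
```

Checks like this are cheap enough to run in CI on every commit, which is one concrete step toward the "CI for converter tests" item later in this report.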
## Runs

| Run | Date | Pass rate | Tokens |
|---|---|---|---|
| run-smoke | 2026-04-13 | 0% | in ? / out ? |
| run-smoke | 2026-04-13 | 0% | in ? / out ? |
# Final Verdict

## Repo

- Name: oaker-io/wewrite
- Date: 2026-04-13
- Archetype: hybrid-skill
- Final bucket: **usable**
- Confidence: medium

## Why This Bucket

- **Core outcome**: The support layer is impressive — all 6 CLI commands work, the converter produces real WeChat HTML, hotspot fetching returns live data, and 16 themes + 9 image providers + 5 personas are all verified. But the **core LLM workflow (8-step article generation) is untested** — it requires a full Claude Code session with WeChat API credentials.
- **Scenario breadth**: Only the support layer (deterministic code) was tested; the core layer (LLM-driven writing) was not. For a hybrid-skill, this triggers the **hybrid cap**: core layer untested → cannot exceed `usable`.
- **Repeatability**: Converter, hotspots, and CLI commands all work consistently in repeated runs. LLM-layer repeatability is unknown.
- **Failure transparency**: CLI tools handle missing inputs gracefully. Error messages are actionable.

## Hybrid-Skill Ceiling Applied

Per hybrid-skill archetype rules: the **core user-facing layer (LLM-driven article generation)** was not tested. The support layer (converter, hotspots, themes, personas, image providers) all passes. But without core-layer evidence, the verdict is **capped at `usable`**.

## Score Summary

| Category | Passed | Failed | Partial | Untested | Total |
|----------|--------|--------|---------|----------|-------|
| Critical (support) | 4 | 0 | 0 | 0 | 4 |
| Critical (core) | 0 | 0 | 0 | 2 | 2 |
| High | 3 | 1 | 0 | 0 | 4 |
| Medium | 3 | 0 | 0 | 0 | 3 |
| **Total** | **10** | **1** | **0** | **2** | **13** |

## What I Would Say In Plain English

**wewrite's support layer is genuinely impressive for a skill repo.** The converter produces real WeChat-compatible HTML (inline CSS, footnoted links, dark-mode attributes). Hotspot fetching returns live trends from 3 Chinese platforms. 16 themes, 9 image providers, 5 personas — all verified to exist with correct structure.

The eval system (3 structured scenarios) shows maturity. **But it's a writing skill that I haven't seen write.** The entire 8-step article generation pipeline is LLM-driven and requires WeChat API credentials to test end-to-end. The support layer works, but the core promise — "一句话搞定公众号" ("handle your Official Account with one sentence") — is unverified.

**The one real gap: zero unit tests.** 2,232 lines of Python toolkit code with no pytest tests at all. The eval specs test agent behavior, not code correctness. A converter regression would go undetected.

## Path to `reusable`

1. **Test the core LLM workflow** — run a full agent session, generate an article, and score it against the quality contract and humanness_score.py.
2. **Add unit tests** — converter.py (548 lines) especially needs coverage for WeChat HTML edge cases.
3. **Verify at least 2 image providers** with real API keys.

## Path to `recommendable`

Everything in `reusable`, plus:

4. **Multiple article-generation runs** showing consistency across personas and frameworks.
5. **Anti-slop verification** — generated articles scored against the banned-phrase list.
6. **Publish-flow verification** — draft-to-WeChat pipeline tested with real credentials.
7. **CI for converter tests** — prevent WeChat HTML regressions.

## Remaining Risks

- **Core workflow completely untested** — the entire value prop of the skill is unverified.
- **No unit tests** — 2,232 lines of Python with zero pytest coverage.
- **Image providers cannot be tested without API keys** — 9 providers verified as code, but none tested for actual image generation.
- **WeChat API dependency** — the publish flow requires real WeChat Official Account credentials.
- **camoufox dependency** — browser-based hotspot fetching may break if source sites change their layout.
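Anti-slop verification can start as small as a banned-phrase scan over generated articles. A minimal sketch of that check — the phrase list below is illustrative, not wewrite's actual banned list:

```python
# Illustrative banned phrases (common AI-writing tells); wewrite's real list differs.
BANNED_PHRASES = ["delve into", "in today's fast-paced world", "game-changer"]

def slop_hits(article: str, banned: list[str] = BANNED_PHRASES) -> list[str]:
    """Return every banned phrase that appears in the article (case-insensitive)."""
    text = article.lower()
    return [p for p in banned if p in text]

def passes_anti_slop(article: str) -> bool:
    return not slop_hits(article)

sample = "Let's delve into why this tool is a game-changer."
hits = slop_hits(sample)  # → ["delve into", "game-changer"]
```

Run over every generated article in an eval batch, this yields a pass rate that could sit alongside the humanness_score.py output as a second quality signal.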