content-toolkit · zinan92/content-toolkit · 2026-04-13 · main@HEAD · 🛠 67 / 100
Layers: ⚛ → ⚗ → 🧬
Score bands: 🛑 0–29 · ⚠️ 30–49 · 🛠 50–79 · 🏭 80–100 · ▼ 67
🛠 67 / 100
- ✓ 12 claims passed, no critical failures
- ⚠ README may claim a license but no LICENSE file exists
- ◐ release_pipeline=1, recently_active=True
- ⚪ EN-only or ZH-only README
- ⚪ compound layer needs a logged scenario run
| Install method | Platform | Difficulty |
|---|---|---|
| npm install -g @zinan92/content-toolkit | any (Node.js 18+, Python 3.11+ on demand) | easy |
| git clone + npm link | any | moderate |
- GitHub (for downstream skill repos): first-time skill auto-install (git clone). Public repos; needs network on first use of each skill.
- Anthropic API (downstream): ctk-analyze + ctk-rewrite use Claude. Required only for the analyze + rewrite verbs.
Claims: 16 total · 12 passed · 1 partial · 3 failed
| +40 | |
| +27 | |
| +5 | |
| 0 | |
| -3 | |
| -2 |
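
The component values above appear to add up to the headline score, and the scale maps that score to a band. The sketch below checks that arithmetic. It assumes the score is a plain sum of the listed components (their labels were not preserved) and identifies each band only by the icon shown on the scale; `totalScore` and `bandFor` are hypothetical helper names, not part of the evaluator.

```javascript
// Score bands as shown on the scale. Bands are identified by icon only;
// no band names are assumed.
const BANDS = [
  { min: 0, max: 29, icon: "🛑" },
  { min: 30, max: 49, icon: "⚠️" },
  { min: 50, max: 79, icon: "🛠" },
  { min: 80, max: 100, icon: "🏭" },
];

// Component values copied from the breakdown (labels unknown).
const components = [40, 27, 5, 0, -3, -2];

// Assumption: the headline score is the plain sum of the components.
function totalScore(parts) {
  return parts.reduce((sum, part) => sum + part, 0);
}

// Return the icon of the band that contains an integer score in 0..100.
function bandFor(score) {
  if (!Number.isInteger(score) || score < 0 || score > 100) {
    throw new RangeError(`score out of range: ${score}`);
  }
  return BANDS.find((band) => score >= band.min && score <= band.max).icon;
}

const score = totalScore(components);
console.log(`${score} / 100 ${bandFor(score)}`); // prints "67 / 100 🛠"
```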
16 / 16
| input_contract | |
|---|---|
| output_contract | |
| determinism | |
| idempotence | |
| no_skill_callouts | |
| failure_mode_clarity | |

| workflow_correctness | |
|---|---|
| declared_call_graph | |
| stop_conditions | |
| handoff_points | |
| atom_evidence | |
| error_propagation | |
| partial_failure_handling | |

| goal_achievement | |
|---|---|
| direction_judgment | |
| quality_judgment | |
| meaningful_autonomy | |
| handoff_timing | |
| observed_call_graph | |
| failure_recovery | |

archetype: orchestrator → core_layer_tested? False → evidence: partial → recommended: usable → final: usable
ceiling 1 · core user-facing layer untested → capped at 'usable'
ceiling 2 · hybrid-repo rule: archetype 'orchestrator' requires end-to-end evaluation of the user-facing layer
ceiling 3 · evidence_completeness='partial' (not portable) → capped at 'usable'
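
The ceiling chain above can be read as a small capping function. The following is a sketch of that reading, not the evaluator's actual code: `applyCeilings` and `capAt` are hypothetical names, and the bucket ordering is an assumption based only on the three bucket names that appear in this report.

```javascript
// Assumed ordering, lowest to highest. Only these three bucket names appear
// in the report; any buckets below "usable" are left out.
const BUCKETS = ["usable", "reusable", "recommendable"];

// Lower `bucket` to `ceiling` when it ranks above the ceiling.
function capAt(bucket, ceiling) {
  return BUCKETS.indexOf(bucket) > BUCKETS.indexOf(ceiling) ? ceiling : bucket;
}

// Apply the three ceilings listed in the report and collect the reasons.
function applyCeilings({ archetype, coreLayerTested, evidence, recommended }) {
  const ceilings = [];
  let final = recommended;

  if (!coreLayerTested) {
    ceilings.push("core user-facing layer untested");
    final = capAt(final, "usable");
    if (archetype === "orchestrator") {
      ceilings.push(
        "hybrid-repo rule: orchestrator requires end-to-end evaluation of the user-facing layer"
      );
    }
  }

  if (evidence === "partial") {
    ceilings.push("evidence_completeness='partial'");
    final = capAt(final, "usable");
  }

  return { final, ceilings };
}

const verdict = applyCeilings({
  archetype: "orchestrator",
  coreLayerTested: false,
  evidence: "partial",
  recommended: "usable",
});
console.log(verdict.final, verdict.ceilings.length); // prints "usable 3"
```

Under these assumptions, a higher recommendation would still be capped at `usable` until the user-facing layer is tested and the evidence is no longer partial.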
| Claim | Description | Severity | Area | Status | |
|---|---|---|---|---|---|
| claim-001 | Unified CLI entry point routes to 7 capabilities | critical | orchestration | ● passed | |
| claim-002 | Bare CLI shows Chinese help with all commands | critical | orchestration | ● passed | |
| claim-003 | Auto-install: capabilities installed on first use | critical | orchestration | ● passed | |
| claim-004 | Bare URL input → suggests content download | critical | smart-routing | ● passed | |
| claim-005 | Bare .mp4 input → suggests videocut subcommand | critical | smart-routing | ● passed | |
| claim-006 | Alias normalization: intelligence→analyze, xhs→xiaohongshu, etc. | high | smart-routing | ● passed | |
| claim-007 | videocut transcribe produces transcript files | high | videocut | ● passed | |
| claim-008 | videocut autocut produces cut.mp4 | high | videocut | ● passed | |
| claim-009 | videocut subtitle produces subtitled video | high | videocut | ✕ failed | |
| claim-010 | health command shows per-capability status | high | orchestration | ● passed | |
| claim-011 | Unknown command shows helpful error | high | error-handling | ● passed | |
| claim-012 | Error propagation: upstream errors are passed through | high | error-handling | ◐ partial | |
| claim-013 | 7 capabilities badge claim | medium | meta | ● passed | |
| claim-014 | CLI test suite passes | high | test-infra | ✕ failed | |
| claim-015 | intelligence/analyze capability works | medium | intelligence | ✕ failed | |
| claim-016 | Zero npm dependencies | medium | meta | ● passed |
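
To make the smart-routing claims in the table concrete (claim-004, claim-005, claim-006), here is a minimal sketch of alias normalization and bare-input hints. It is an illustration under assumptions, not the repo's cli.js: `normalizeCommand` and `suggestForInput` are hypothetical names, the alias map contains only the two aliases named in claim-006, and the URL and `.mp4` detection rules are guesses at what "bare URL" and "bare .mp4" mean.

```javascript
// Only the two aliases named in claim-006; the real list is longer ("etc.").
const ALIASES = new Map([
  ["intelligence", "analyze"],
  ["xhs", "xiaohongshu"],
]);

// Map an alias to its canonical command; unknown names pass through unchanged.
function normalizeCommand(name) {
  const key = name.toLowerCase();
  return ALIASES.get(key) ?? key;
}

// Suggest a command for a bare argument that is not a command name.
// Returns null when there is no suggestion to make.
function suggestForInput(arg) {
  if (/^https?:\/\//i.test(arg)) {
    return { command: "download", reason: "input looks like a URL" };
  }
  if (/\.mp4$/i.test(arg)) {
    return { command: "videocut", reason: "input looks like a video file" };
  }
  return null;
}

console.log(normalizeCommand("xhs")); // prints "xiaohongshu"
console.log(suggestForInput("talk.mp4").command); // prints "videocut"
```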
run-smoke-2026-04-13
2026-04-13
0% — tokens in ? / out ?
# Final Verdict

## Repo

- Name: zinan92/content-toolkit
- Date: 2026-04-13
- Archetype: orchestrator
- Final bucket: **usable**
- Confidence: medium

## Verdict Rationale

### Baseline: usable

Per verdict calculator rules:

- Critical claims **claim-001 through claim-006** all PASSED (routing, help, auto-install, smart hints)
- But critical downstream coverage is partial: test suite is 100% broken (claim-014), subtitle silently fails (claim-009), intelligence capability degraded (claim-015)
- Error propagation is inconsistent (claim-012: partial)

### Ceiling applied: none

The orchestrator archetype has no default ceiling. However, the broken test suite and silent failures effectively self-cap at `usable`: you can't recommend something where automated verification is completely absent and some capabilities fail silently.

## Evaluation Dimensions (Orchestrator-Specific)

| Dimension | Rating | Notes |
|-----------|--------|-------|
| **Routing correctness** | ★★★★☆ | Excellent. All tested routes work. Aliases normalize correctly. Smart input detection is a nice touch. |
| **Error propagation** | ★★☆☆☆ | Inconsistent. `download` passes through errors, but `videocut subtitle` exits 0 with empty output. |
| **Partial failure handling** | ★★★☆☆ | Not tested deeply, but auto-install→degraded is handled well (health reports it). |
| **End-to-end happy path** | ★★★☆☆ | transcribe and autocut work. subtitle fails silently. Pipeline untested in this run but TEST-REPORT.md says it passes. |
| **Per-area coverage** | ★★☆☆☆ | 7 capabilities claimed, only download + videocut (2/7 subs) verified working. Intelligence degraded. publish/xiaohongshu untested (require auth/external services). |
| **Observability** | ★★★★☆ | `content health` is genuinely useful: shows per-capability status, git ref, known issues. |

## Score Summary

| Category | Passed | Failed | Partial | Total |
|----------|--------|--------|---------|-------|
| Critical | 6 | 0 | 0 | 6 |
| High | 4 | 2 | 1 | 7 |
| Medium | 2 | 1 | 0 | 3 |
| **Total** | **12** | **3** | **1** | **16** |

## What I Would Say In Plain English

**content-toolkit's orchestration layer is well-designed: the routing, help, aliases, and smart input detection are genuinely good.** If you already know which commands work, it's a useful tool.

**But it's not reliable enough to recommend.** The test suite is 100% broken (all 80+ tests fail on import), some capabilities silently fail (subtitle exits 0 with empty output), and the intelligence capability auto-installs into a degraded state. The repo's own TEST-REPORT.md honestly documents a 7/20 pass rate from March 31, and nothing has been fixed since.

**The gap is not in design but in execution quality.** The architecture is sound, the skill system is thoughtful, and the health reporting is better than most. What's missing is: fix the test suite, fix silent failures, fix the 13 known issues in TEST-REPORT.md.

## Path to `reusable`

1. **Fix test suite**: export functions from cli.js, prevent help side effect on import. Currently zero automated verification of routing logic.
2. **Fix silent failures**: videocut subtitle (and likely clip, cover, speed per TEST-REPORT.md) must either produce output or surface a clear error. Exit 0 + empty dir is unacceptable.
3. **Fix intelligence capability**: pyproject.toml module path so auto-install produces a working capability, not degraded.
4. **Address TEST-REPORT.md backlog**: at least the 4 MEDIUM bugs (BUG-3/4/5/6) and 2 HIGH UX issues (UX-1/2).

## Path to `recommendable`

Everything in `reusable` plus:

5. **Per-area claim maps**: each downstream capability (download, extract, rewrite, videocut, publish, xiaohongshu) gets its own eval under `areas/<slug>/`
6. **End-to-end workflow verification**: the douyin-to-xhs and pipeline presets tested with real content
7. **Consistent error propagation**: every downstream failure surfaces at the orchestrator boundary
8. **CI integration**: test suite runs on push, catches regressions

## Remaining Risks

- **Silent failure pattern may be systemic.** We only tested 3 of 7 videocut subcommands and 1 already silently fails. TEST-REPORT.md documents similar issues in clip, cover, speed.
- **No CI.** Regressions accumulate silently. The test suite broke and nobody noticed.
- **External capabilities (publish, xiaohongshu) are untested**: they require auth and external services, making them hard to evaluate without credentials.
- **intelligence capability has a packaging bug** in the upstream repo, but content-toolkit claims it as a capability. Users will encounter a broken experience.