2026-04-13 · main@HEAD

content-toolkit

zinan92/content-toolkit

🛠 67 / 100 · Available

Score bands: 🛑 0–29 · ⚠️ 30–49 · 🛠 50–79 · 🏭 80–100

✓ 12 claims passed, no critical failures
⚠ README may claim a license but no LICENSE file exists
◐ release_pipeline=1, recently_active=True
⚪ EN-only or ZH-only README
⚪ compound layer needs a logged scenario run

zinan92/content-downloader · 🛑 44 · molecule


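| Install command | Platform | Difficulty |
|-----------------|----------|------------|
| `npm install -g @zinan92/content-toolkit` | any (Node.js 18+, Python 3.11+ on demand) | easy |
| `git clone + npm link` | any | moderate |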

🌐

| Service | Used for | Notes |
|---------|----------|-------|
| GitHub (for downstream skill repos) | First-time skill auto-install (git clone) | Public repos; needs network on first use of each skill |
| Anthropic API (downstream) | ctk-analyze + ctk-rewrite use Claude | Required only for the analyze + rewrite verbs |

16 claims: 12 passed · 1 partial · 3 failed

Score components: +40, +27, +5, 0, -3, -2 (total 67)
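
As a quick sanity check, the listed score components sum to the headline score, and that score falls in the 🛠 band. A minimal sketch, assuming the band boundaries are exactly the ranges shown in this report (0–29, 30–49, 50–79, 80–100); the names `COMPONENTS`, `BANDS`, and `band_for` are illustrative and not part of the scoring tool:

```python
# Score components as listed in this report (their labels are not shown here).
COMPONENTS = [40, 27, 5, 0, -3, -2]

# Band boundaries as shown in the report's legend: (low, high, icon).
BANDS = [(0, 29, "🛑"), (30, 49, "⚠️"), (50, 79, "🛠"), (80, 100, "🏭")]


def band_for(score: int) -> str:
    """Return the legend icon whose inclusive range contains the score."""
    for low, high, icon in BANDS:
        if low <= score <= high:
            return icon
    raise ValueError(f"score out of range: {score}")


total = sum(COMPONENTS)
print(total, band_for(total))  # 67 🛠
```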

`input_contract`
`output_contract`
`determinism`
`idempotence`
`no_skill_callouts`
`failure_mode_clarity`

`workflow_correctness`
`declared_call_graph`
`stop_conditions`
`handoff_points`
`atom_evidence`
`error_propagation`
`partial_failure_handling`

`goal_achievement`
`direction_judgment`
`quality_judgment`
`meaningful_autonomy`
`handoff_timing`
`observed_call_graph`
`failure_recovery`

archetype: orchestrator → core_layer_tested? False → evidence: partial → recommended: usable → final: usable

ceiling 1 · core user-facing layer untested → capped at 'usable'
ceiling 2 · hybrid-repo rule: archetype 'orchestrator' requires end-to-end evaluation of the user-facing layer
ceiling 3 · evidence_completeness='partial' (not portable) → capped at 'usable'
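
The ceiling logic above can be read as taking the minimum of a baseline bucket and every ceiling that fires. A minimal sketch, assuming a bucket order of `usable` < `reusable` < `recommendable` (inferred from the upgrade paths in this report) and hypothetical names (`BUCKETS`, `apply_ceilings`); the real verdict calculator's rules are not shown here and may differ:

```python
BUCKETS = ["usable", "reusable", "recommendable"]  # assumed order, lowest first


def apply_ceilings(baseline, archetype, core_layer_tested, evidence):
    """Cap the baseline bucket by every ceiling that fires; return (bucket, reasons)."""
    ceilings = []
    if not core_layer_tested:
        ceilings.append(("usable", "core user-facing layer untested"))
        # Assumption: the hybrid-repo rule only fires when the core layer is untested.
        if archetype == "orchestrator":
            ceilings.append(("usable", "hybrid-repo rule: end-to-end evaluation required"))
    if evidence == "partial":
        ceilings.append(("usable", "evidence_completeness='partial'"))
    rank = min([BUCKETS.index(baseline)] + [BUCKETS.index(cap) for cap, _ in ceilings])
    return BUCKETS[rank], [reason for _, reason in ceilings]


final, reasons = apply_ceilings("usable", "orchestrator", False, "partial")
print(final, len(reasons))  # usable 3
```

With these inputs all three ceilings fire, which matches the three ceiling lines listed for this repo.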


| ID | Claim | Severity | Area | Status |
|----|-------|----------|------|--------|
| claim-001 | Unified CLI entry point routes to 7 capabilities | critical | orchestration | ● passed |
| claim-002 | Bare CLI shows Chinese help with all commands | critical | orchestration | ● passed |
| claim-003 | Auto-install: capabilities installed on first use | critical | orchestration | ● passed |
| claim-004 | Bare URL input → suggests content download | critical | smart-routing | ● passed |
| claim-005 | Bare .mp4 input → suggests videocut subcommand | critical | smart-routing | ● passed |
| claim-006 | Alias normalization: intelligence→analyze, xhs→xiaohongshu, etc. | high | smart-routing | ● passed |
| claim-007 | videocut transcribe produces transcript files | high | videocut | ● passed |
| claim-008 | videocut autocut produces cut.mp4 | high | videocut | ● passed |
| claim-009 | videocut subtitle produces subtitled video | high | videocut | ✕ failed |
| claim-010 | health command shows per-capability status | high | orchestration | ● passed |
| claim-011 | Unknown command shows helpful error | high | error-handling | ● passed |
| claim-012 | Error propagation: upstream errors are passed through | high | error-handling | ◐ partial |
| claim-013 | 7 capabilities badge claim | medium | meta | ● passed |
| claim-014 | CLI test suite passes | high | test-infra | ✕ failed |
| claim-015 | intelligence/analyze capability works | medium | intelligence | ✕ failed |
| claim-016 | Zero npm dependencies | medium | meta | ● passed |
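
Claims 004 through 006 describe input-based routing: a bare URL suggests a download, a bare `.mp4` suggests videocut, and aliases are normalized. A rough sketch of that behavior, assuming only the two alias pairs named in claim-006; `ALIASES` and `suggest` are hypothetical names, not the toolkit's actual code (the real CLI is a Node.js program):

```python
ALIASES = {"intelligence": "analyze", "xhs": "xiaohongshu"}  # pairs from claim-006


def suggest(arg: str) -> str:
    """Map a bare CLI argument to a command name (illustrative only)."""
    if arg in ALIASES:
        return ALIASES[arg]
    if arg.startswith(("http://", "https://")):
        return "download"   # claim-004: bare URL suggests content download
    if arg.lower().endswith(".mp4"):
        return "videocut"   # claim-005: bare .mp4 suggests a videocut subcommand
    return arg              # anything else is treated as a command name as-is


print(suggest("xhs"), suggest("https://example.com/v/1"), suggest("clip.mp4"))
# xiaohongshu download videocut
```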

run-smoke-2026-04-13 · 2026-04-13 · 0.00s · 0% · tokens in ? / out ?

# Final Verdict

## Repo

- Name: zinan92/content-toolkit
- Date: 2026-04-13
- Archetype: orchestrator
- Final bucket: **usable**
- Confidence: medium

## Verdict Rationale

### Baseline: usable

Per verdict calculator rules:
- Critical claims **claim-001 through claim-006** all PASSED (routing, help, auto-install, smart hints)
- But critical downstream coverage is partial — test suite is 100% broken (claim-014),
  subtitle silently fails (claim-009), intelligence capability degraded (claim-015)
- Error propagation is inconsistent (claim-012: partial)

### Ceiling applied: none

The orchestrator archetype has no default ceiling. However, the broken test suite
and silent failures effectively cap the repo at `usable`: a tool cannot be recommended
when automated verification is completely absent and some capabilities fail silently.

## Evaluation Dimensions (Orchestrator-Specific)

| Dimension | Rating | Notes |
|-----------|--------|-------|
| **Routing correctness** | ★★★★☆ | Excellent. All tested routes work. Aliases normalize correctly. Smart input detection is a nice touch. |
| **Error propagation** | ★★☆☆☆ | Inconsistent. `download` passes through errors, but `videocut subtitle` exits 0 with empty output. |
| **Partial failure handling** | ★★★☆☆ | Not tested deeply, but auto-install→degraded is handled well (health reports it). |
| **End-to-end happy path** | ★★★☆☆ | transcribe and autocut work. subtitle fails silently. Pipeline untested in this run but TEST-REPORT.md says it passes. |
| **Per-area coverage** | ★★☆☆☆ | 7 capabilities claimed; only download and videocut (2 of 7 subcommands) verified working. Intelligence is degraded. publish and xiaohongshu untested (they require auth and external services). |
| **Observability** | ★★★★☆ | `content health` is genuinely useful — shows per-capability status, git ref, known issues. |

## Score Summary

| Category | Passed | Failed | Partial | Total |
|----------|--------|--------|---------|-------|
| Critical | 6 | 0 | 0 | 6 |
| High | 4 | 2 | 1 | 7 |
| Medium | 2 | 1 | 0 | 3 |
| **Total** | **12** | **3** | **1** | **16** |
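
The Total row can be cross-checked against the per-claim results. A small sketch, with the 16 statuses transcribed from the claim list in this report; `STATUSES` is just an illustrative name:

```python
from collections import Counter

# Status of claim-001 .. claim-016, transcribed from the claim list in this report.
STATUSES = {
    "claim-001": "passed", "claim-002": "passed", "claim-003": "passed",
    "claim-004": "passed", "claim-005": "passed", "claim-006": "passed",
    "claim-007": "passed", "claim-008": "passed", "claim-009": "failed",
    "claim-010": "passed", "claim-011": "passed", "claim-012": "partial",
    "claim-013": "passed", "claim-014": "failed", "claim-015": "failed",
    "claim-016": "passed",
}

counts = Counter(STATUSES.values())
print(counts["passed"], counts["failed"], counts["partial"], len(STATUSES))
# 12 3 1 16
```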

## What I Would Say In Plain English

**content-toolkit's orchestration layer is well-designed — the routing, help, aliases,
and smart input detection are genuinely good.** If you already know which commands work,
it's a useful tool.

**But it's not reliable enough to recommend.** The test suite is 100% broken (all 80+ tests
fail on import), some capabilities silently fail (subtitle exits 0 with empty output),
and the intelligence capability auto-installs into a degraded state. The repo's own
TEST-REPORT.md honestly documents a 7/20 pass rate from March 31 — and nothing has
been fixed since.

**The gap is not in design but in execution quality.** The architecture is sound, the
skill system is thoughtful, and the health reporting is better than most. What's missing
is: fix the test suite, fix silent failures, fix the 13 known issues in TEST-REPORT.md.

## Path to `reusable`

1. **Fix test suite** — export functions from cli.js, prevent help side effect on import.
   Currently zero automated verification of routing logic.
2. **Fix silent failures** — videocut subtitle (and likely clip, cover, speed per TEST-REPORT.md)
   must either produce output or surface a clear error. Exit 0 + empty dir is unacceptable.
3. **Fix intelligence capability** — pyproject.toml module path so auto-install produces
   a working capability, not degraded.
4. **Address TEST-REPORT.md backlog** — at least the 4 MEDIUM bugs (BUG-3/4/5/6) and
   2 HIGH UX issues (UX-1/2).
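
The "exit 0 + empty dir" pattern in item 2 can be caught mechanically by checking the output directory as well as the exit code. A minimal sketch from the evaluator's side, assuming the command writes its results into a known directory; `SilentFailure` and `run_and_verify` are hypothetical names, and the no-op command below is a stand-in rather than a real content-toolkit invocation:

```python
import subprocess
import sys
import tempfile
from pathlib import Path


class SilentFailure(RuntimeError):
    """Raised when a command exits 0 but leaves its output directory empty."""


def run_and_verify(cmd, out_dir):
    """Run cmd, then require both a zero exit code and at least one output file."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        raise RuntimeError(f"exit {proc.returncode}: {proc.stderr.strip()}")
    outputs = sorted(p.name for p in Path(out_dir).iterdir())
    if not outputs:
        raise SilentFailure(f"exit 0 but no files in {out_dir}")
    return outputs


with tempfile.TemporaryDirectory() as out_dir:
    # Stand-in for a subcommand that "succeeds" without writing anything.
    noop = [sys.executable, "-c", "pass"]
    try:
        run_and_verify(noop, out_dir)
    except SilentFailure as err:
        print("caught:", err)
```

A check like this belongs in the test suite once it imports cleanly, so a subcommand that stops producing output fails a test instead of passing quietly.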

## Path to `recommendable`

Everything in `reusable`, plus:

5. **Per-area claim maps**: each downstream capability (download, extract, rewrite,
   videocut, publish, xiaohongshu) gets its own eval under `areas/<slug>/`
6. **End-to-end workflow verification**: the douyin-to-xhs and pipeline presets
   tested with real content
7. **Consistent error propagation**: every downstream failure surfaces at the
   orchestrator boundary
8. **CI integration**: test suite runs on push, catches regressions
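
Item 7 amounts to the orchestrator forwarding each downstream exit code rather than swallowing it. A language-neutral illustration in Python, under the assumption that capabilities run as child processes; `dispatch` is a hypothetical name, and content-toolkit itself is a Node.js CLI whose internals are not shown in this report:

```python
import subprocess
import sys


def dispatch(cmd):
    """Run a downstream command and return its exit code unchanged."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        # Surface the downstream message at the orchestrator boundary.
        print(f"downstream failed ({proc.returncode}): {proc.stderr.strip()}",
              file=sys.stderr)
    return proc.returncode


# Stand-in for a failing downstream capability.
failing = [sys.executable, "-c", "import sys; sys.exit('boom')"]
code = dispatch(failing)
print(code)  # 1
```

An entry point would then end with `sys.exit(dispatch(...))`, so callers and CI see the same status the downstream capability reported.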

## Remaining Risks

- **Silent failure pattern may be systemic.** We only tested 3 of 7 videocut subcommands
  and 1 already silently fails. TEST-REPORT.md documents similar issues in clip, cover, speed.
- **No CI.** Regressions accumulate silently. The test suite broke and nobody noticed.
- **External capabilities (publish, xiaohongshu) are untested** — they require auth
  and external services, making them hard to evaluate without credentials.
- **intelligence capability has a packaging bug** in the upstream repo, but content-toolkit
  claims it as a capability. Users will encounter a broken experience.