repo·evals · 2026-05-05 · main@HEAD (npm v1.4.1)

# qiushi-skill

**HughYau/qiushi-skill** · 🛠 73 / 100

Score bands: 🛑 0–29 · ⚠️ 30–49 · 🛠 50–79 · 🏭 80–100 — this repo: 🛠 73 / 100
  • 6 claims passed, no critical failures
  • MIT / Apache / etc., installable per deployment.install_methods
  • release_pipeline_score=2 + pushed in 90-day window
  • multilingual_readme=true
  • static-only eval; live e2e pending


How it works: hard question (analysis / strategy / debug) → user picks a methodology (investigation / contradiction / ...) → skill fires (counter-questions / framework) → original-text citations (per skill) → structured analysis (not a flat answer).

| Install method | Platforms | Difficulty |
|---|---|---|
| `npx qiushi-skill install <platform>` | Claude Code / Codex / Cursor / Hermes / NanoBot / OpenClaw / OpenCode | easy |
| `git clone` + `cp skills/` into the platform's skills dir | any | moderate |
  • 📡 Dependency: AI agent runtime (Claude Code / Codex / Cursor / Hermes / NanoBot / OpenClaw / OpenCode) — the host that loads the skills. Standard agent-side cost; the skill itself is pure markdown (no extra API calls).
Claims: 7 total — 5 passed · 1 partial · 1 untested. Score deltas: +40, +18, +12 (maintainer), +3 (ecosystem), 0 (layer bonus), 0 (penalties) → 73.

Claims verified: 6 / 7 — claim-001 through claim-006 passed; claim-007 untested.

Static check dimensions:

- `input_contract` · `output_contract` · `determinism` · `idempotence` · `no_skill_callouts` · `failure_mode_clarity`
- `workflow_correctness` · `declared_call_graph` · `stop_conditions` · `handoff_points` · `atom_evidence` · `error_propagation` · `partial_failure_handling`

Rating ceilings — archetype: hybrid-skill · core_layer_tested: false · evidence: partial · recommended: usable · final: usable

- ceiling 1 · core user-facing layer untested → capped at 'usable'
- ceiling 2 · hybrid-repo rule: archetype 'hybrid-skill' requires end-to-end evaluation of the user-facing layer
- ceiling 3 · evidence_completeness='partial' (not portable) → capped at 'usable'
- only 3 of 4 critical claims covered
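The three ceilings amount to a simple cap rule. A minimal sketch in Python, assuming a four-step rating ladder (only 'usable' is actually named in this report) and mirroring the field names shown above (`archetype`, `core_layer_tested`, `evidence_completeness`):

```python
# Hypothetical sketch of the ceiling rules above. The RATINGS ladder is an
# assumed ordering; only "usable" appears in the report itself.
RATINGS = ["avoid", "risky", "usable", "production"]

def apply_ceilings(recommended: str, archetype: str,
                   core_layer_tested: bool, evidence_completeness: str) -> str:
    """Cap the recommended rating at 'usable' whenever a ceiling fires."""
    cap = len(RATINGS) - 1                          # no cap by default
    usable = RATINGS.index("usable")
    if not core_layer_tested:                       # ceiling 1
        cap = min(cap, usable)
    if archetype == "hybrid-skill" and not core_layer_tested:
        cap = min(cap, usable)                      # ceiling 2 (hybrid-repo rule)
    if evidence_completeness == "partial":          # ceiling 3
        cap = min(cap, usable)
    return RATINGS[min(RATINGS.index(recommended), cap)]
```

For this repo all three ceilings fire, so even a 'production' recommendation would land at 'usable'.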

| Claim | Description | Severity | Tag | Status |
|---|---|---|---|---|
| claim-001 | All 10 SKILL.md files actually exist (9 methodologies + 1 overarching principle) | critical | skill-coverage | ● passed |
| claim-002 | Each of the 7 platform install paths has a dedicated config dir at the repo root | critical | cross-platform | ● passed |
| claim-003 | The npm package actually installs and the bin entry is a real JS CLI | critical | install | ● passed |
| claim-004 | Bilingual README + cross-platform validation scripts under tests/ | high | i18n + testing | ● passed |
| claim-005 | Each skill ships an original-texts.md citing the classical source texts | high | depth | ◐ partial |
| claim-006 | LICENSE is MIT and explicit in both Chinese and English | high | licensing | ● passed |
| claim-007 | End-to-end happy path: after install, the agent actually invokes a methodology inside a workflow | critical | end-to-end | ○ untested |
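Claim-001 was verified with simple existence probes (the verdict below notes "HTTP 200"). A sketch of such a check, with the fetcher injected so it runs offline; the `RAW_BASE` URL layout and the skill names used in tests are assumptions, not taken from the repo:

```python
# Hypothetical sketch of the claim-001 probe: does skills/<name>/SKILL.md
# exist? fetch_status is injected so the check is testable without network.
from typing import Callable

RAW_BASE = "https://raw.githubusercontent.com/HughYau/qiushi-skill/main/skills"

def check_skill_coverage(skills: list[str],
                         fetch_status: Callable[[str], int]) -> dict[str, bool]:
    """Map each skill name to whether its SKILL.md URL returned HTTP 200."""
    return {name: fetch_status(f"{RAW_BASE}/{name}/SKILL.md") == 200
            for name in skills}
```

In real use `fetch_status` would wrap `urllib.request.urlopen` and return the response status; in tests, a stub returning 200/404 suffices.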

Runs:

- run-static-checks · 2026-05-05 · 0% tokens (in ? / out ?) · 0.00s

# qiushi-skill — final verdict (2026-05-05)

## Repo

- **Name:** HughYau/qiushi-skill · **Stars:** 3,007
- **Archetype:** hybrid-skill (reclassified from default prompt-skill)
- **Layer:** molecule
- **License:** MIT · **Language:** JavaScript · **Pushed:** 2026-05-01

## What was evaluated

| Claim | Status | Notes |
|---|---|---|
| 001 10 methodology skills | passed | All 10 SKILL.md exist (HTTP 200) |
| 002 7-platform install configs | passed | each platform has dedicated config dir with files |
| 003 npm + bin | passed | 307-line CLI; npm registry has v1.4.1 |
| 004 bilingual + cross-platform tests | passed | EN README + bash + PowerShell validators |
| 005 original-texts depth | passed_with_concerns | 1 of 3 sampled is empty (arming-thought/original-texts.md = 0 bytes) |
| 006 LICENSE | passed | MIT |
| 007 live agent workflow | untested | needs real Claude Code / OpenClaw session |

## Real findings

1. **`arming-thought/original-texts.md` is empty (0 bytes).** The
   other two sampled skills have ~2 KB of classical-text excerpts.
   arming-thought is the *总原则* (overarching principle, "实事求是")
   — the most important skill — and it's missing its references.
   One-line upstream fix.

2. **Genuinely cross-platform install path.** 7 dedicated
   `.<platform>/` config dirs (Claude Code / Codex / Cursor / Hermes
   / NanoBot / OpenClaw / OpenCode). Most personal skill catalogs
   target 1-3; this one cared enough to ship 7.

3. **Cross-platform test discipline.** validate.sh (216 lines) +
   validate.ps1 (212 lines) — Windows install path is actually
   tested, not just "should work".

4. **Methodology granularity is honest.** 10 distinct methodologies
   are genuinely different reflexes (contradiction analysis vs
   investigation-first vs protracted-strategy). User picks 2-3 that
   match a workflow; not 10 pieces of one skill.

5. **Cultural / branding consideration.** Methodology rooted in
   Mao-era dialectical materialism. README explicitly disclaims
   ("this is methodology, not politics"), but corporate adopters
   should think before installing in a public skill catalog.
   Worth surfacing in `watch_out`.

## Why the score lands where it does

Predicted ~70 (🛠 Self-use OK). Drivers:
- 6 of 7 SKILL.md claims passed (mostly +5 each, capped at +30)
- claim-005 passed_with_concerns
- maintainer evidence: release_pipeline=2 (+5) + multilingual (+2) + recently_active (+5) = +12 maintainer
- ecosystem: 3K stars → +3
- layer_bonus: molecule → 0
- penalties: 0 (MIT)
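The maintainer line above is plain arithmetic; a minimal sketch, with point values (+5 / +2 / +5) taken from the driver list and the function shape and thresholds assumed:

```python
# Sketch of the maintainer-evidence sub-score. Point values come from the
# driver list above; the signature and >=2 threshold are assumptions.
def maintainer_score(release_pipeline: int, multilingual: bool,
                     recently_active: bool) -> int:
    score = 0
    if release_pipeline >= 2:
        score += 5   # release pipeline in place
    if multilingual:
        score += 2   # multilingual README
    if recently_active:
        score += 5   # pushed within the 90-day window
    return score
```

With this repo's inputs (`release_pipeline=2`, multilingual, recently active) the sketch reproduces the +12 in the breakdown.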

## Path forward

1. Fill `skills/arming-thought/original-texts.md` (the most important
   skill is missing references).
2. Run a live agent workflow on a complex problem; verify the agent
   actually invokes contradiction-analysis (or another methodology)
   rather than going straight to the answer.
3. Log under `runs/<date>/run-live-agent/`.