# obra/superpowers — evaluation report

2026-05-05 · main@HEAD (npm v5.1.0)

Score bands: 🛑 0–29 · ⚠️ 30–49 · 🛠 50–79 · 🏭 80–100

## Score: 🛠 77 / 100
- ✓ 6 claims passed, no critical failures
- ✓ MIT / Apache / etc. license, installable per deployment.install_methods
- ✓ release_pipeline_score=2 + pushed in 90-day window
- ⚪ EN-only or ZH-only README
- ⚪ compound layer needs a logged scenario run
## Install methods

| Method | Platform | Difficulty |
|---|---|---|
| `/plugin install superpowers@claude-plugins-official` | Claude Code | easy |
| Plugin marketplace (Codex CLI / Codex App / Factory Droid / Gemini CLI / OpenCode / Cursor / GitHub Copilot CLI) | 8 agent platforms | easy |

**Dependency:** AI agent harness (Claude Code / Codex / Cursor / Gemini / OpenCode / Factory Droid / GH Copilot CLI): the host that loads and auto-triggers the skills.

**Cost:** standard agent-side cost; auto-triggering 14 skills means more tokens consumed per task before any actual work happens.
## Score breakdown

Claims: 7 total (5 passed, 1 partial, 1 untested)

Score components: +40, +18, +10, +12, −3, 0 → total 77 / 100
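The component points above roll up by simple addition. A minimal sketch of the arithmetic; the component labels were not captured in this export, and the clamp to the 0–100 band is an assumption about the scorer, not stated in the report:

```python
# Score components from the breakdown above (labels not captured in export).
components = [40, 18, 10, 12, -3, 0]

# Clamping to the 0-100 band is an assumption about the scorer.
total = max(0, min(100, sum(components)))
print(total)  # 77
```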
**6 / 7 claims passed**

- passed: claim-001, claim-002, claim-003, claim-004, claim-005, claim-006
- untested: claim-007
| Dimension | Score |
|---|---|
| input_contract | |
| output_contract | |
| determinism | |
| idempotence | |
| no_skill_callouts | |
| failure_mode_clarity | |

| Dimension | Score |
|---|---|
| workflow_correctness | |
| declared_call_graph | |
| stop_conditions | |
| handoff_points | |
| atom_evidence | |
| error_propagation | |
| partial_failure_handling | |

| Dimension | Score |
|---|---|
| goal_achievement | |
| direction_judgment | |
| quality_judgment | |
| meaningful_autonomy | |
| handoff_timing | |
| observed_call_graph | |
| failure_recovery | |
- core user-facing layer untested → capped at 'usable'
- hybrid-repo rule: archetype 'hybrid-skill' requires end-to-end evaluation of the user-facing layer
- evidence_completeness='partial' (not portable) → capped at 'usable'
- only 3/4 critical claims covered
Decision chain: archetype: hybrid-skill → core_layer_tested: False → evidence: partial → recommended: usable → final: usable
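A minimal sketch of how the two 'usable' ceilings above could compose with the recommended tier. The tier ladder and function shape are assumptions; only the two capping rules (hybrid-skill with an untested core layer, and partial evidence) come from the report:

```python
# Hypothetical tier ladder, low to high; only "usable" appears in the report.
TIERS = ["blocked", "usable", "team-ready", "recommended"]

def apply_ceilings(recommended: str, archetype: str,
                   core_layer_tested: bool, evidence_completeness: str) -> str:
    """Cap the recommended tier by the ceiling rules quoted above."""
    cap = len(TIERS) - 1
    # Hybrid-repo rule: a hybrid-skill repo whose user-facing layer is
    # untested cannot score above 'usable'.
    if archetype == "hybrid-skill" and not core_layer_tested:
        cap = min(cap, TIERS.index("usable"))
    # Partial (non-portable) evidence also caps at 'usable'.
    if evidence_completeness == "partial":
        cap = min(cap, TIERS.index("usable"))
    return TIERS[min(TIERS.index(recommended), cap)]

print(apply_ceilings("team-ready", "hybrid-skill", False, "partial"))  # usable
```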
| Claim | Description | Severity | Area | Status | Notes |
|---|---|---|---|---|---|
| claim-001 | All 14 methodology skills actually exist and are non-placeholder | critical | skill-coverage | ● passed | |
| claim-002 | All 8 platform install paths have an official marketplace or plugin entry | critical | cross-platform | ● passed | |
| claim-003 | package.json + version number demonstrate project maturity | high | maturity | ● passed | |
| claim-004 | The repo has a LICENSE file | critical | licensing | ● passed | |
| claim-005 | Agent-facing docs such as CLAUDE.md / GEMINI.md / AGENTS.md have real content | high | agent-discoverability | ◐ partial | |
| claim-006 | tests/ has real test coverage | high | testing | ● passed | |
| claim-007 | End-to-end: with superpowers installed, the agent actually invokes the methodology in its workflow | critical | end-to-end | ○ untested | |
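The "only 3/4 critical claims covered" figure in the ceiling notes follows directly from this table. A sketch of the count, with severities and statuses copied from above:

```python
# Severity and status per claim, copied from the claims table above.
claims = {
    "claim-001": ("critical", "passed"),
    "claim-002": ("critical", "passed"),
    "claim-003": ("high", "passed"),
    "claim-004": ("critical", "passed"),
    "claim-005": ("high", "partial"),
    "claim-006": ("high", "passed"),
    "claim-007": ("critical", "untested"),
}

critical = [cid for cid, (sev, _) in claims.items() if sev == "critical"]
covered = [cid for cid in critical if claims[cid][1] == "passed"]
print(f"{len(covered)}/{len(critical)} critical claims covered")  # 3/4 critical claims covered
```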
**Runs:** 0% · 0.00s · 0

- run-static-checks · 2026-05-05 · 0% — tokens in ? / out ?
# obra/superpowers — final verdict (2026-05-05)

## Repo

- **Name:** obra/superpowers · **Stars:** 178,762 · **License:** MIT
- **Archetype:** hybrid-skill · **Layer:** molecule
- **Version:** v5.1.0 · **Pushed:** 2026-05-04 (yesterday)

## What was evaluated

| Claim | Status | Notes |
|---|---|---|
| 001 14 methodology skills | passed | All 5 sampled are 152–371 lines |
| 002 8-platform install | passed | 4 plugin config dirs + 8 README install sections |
| 003 mature versioning | passed | v5.1.0 + RELEASE-NOTES.md |
| 004 LICENSE | passed | MIT |
| 005 agent-facing docs | passed_with_concerns | CLAUDE.md (106 lines) is real, but GEMINI.md (2 lines) and AGENTS.md (0 bytes, empty) are thin |
| 006 tests | passed | 7 test dirs covering install paths + skill triggering |
| 007 live agent workflow | untested | needs a Claude Code session on a real feature task |

## Real findings

1. **AGENTS.md empty + GEMINI.md only 2 lines.** For a project whose audience is *coding agents*, that's an ironic gap. Claude Code is first-class; everything else is second-class. Worth disclosing in `watch_out`.
2. **Mature for a methodology bundle.** v5.1.0 + 7-dir test suite + release notes — most personal skill catalogs evaluated in this batch are v1.x with no tests. This one has been iterated on significantly.
3. **The skill list reads as a coherent methodology pipeline.** brainstorming → writing-plans → TDD → subagent-driven-development → verification-before-completion → finishing-a-development-branch → receiving-code-review. That's one opinionated software-engineering approach, taught skill-by-skill — not a random utilities catalog.
4. **CLAUDE.md is famously blunt.** "94% PR rejection rate" + "slop PRs" naming and shaming. That's a maintainer culture choice — it fits the anti-slop posture of the methodology, but contributors should know the bar before submitting.
5. **8-platform install with uneven coverage.** Claude / OpenCode get 2-file plugin configs each; Codex / Cursor get only 1 file each. The README lists 8 install paths, but depth of integration varies. Verify your platform's plugin contract is what you expect before relying on a non-Claude install.

## Why the score lands where it does

- 6/7 static claims passed; 1 passed_with_concerns
- 179K stars puts ecosystem at +12 (50K+ band)
- Recently active (+5) + release_pipeline=2 (+5) → +10 maintainer
- Molecule layer +0
- LICENSE present, no penalties

Predicted score: ~85 — solidly **🏭 Team-ready** territory.

## Path to ⭐ Recommend

1. Fill AGENTS.md and GEMINI.md properly (not 0/2 lines).
2. Run a logged live-agent scenario in Claude Code: kick off a feature with brainstorming and watch superpowers chain through writing-plans → TDD → subagent-driven-development → verify.
3. Get multi-evaluator coverage on a non-Claude platform (e.g. Cursor) to validate that the thinner plugin configs work.
4. Update claim-007 to passed; re-run verdict_calculator.

## Recommended

```yaml
status: evaluated
```