2026-05-05 · main@HEAD (npm v5.1.0)

superpowers

obra/superpowers

🛠 77 / 100

Score bands: 🛑 0–29 · ⚠️ 30–49 · 🛠 50–79 · 🏭 80–100

🛠 · 77 / 100

- ✓ 6 claims passed, no critical failures
- ✓ MIT / Apache / etc., installable per deployment.install_methods
- ✓ release_pipeline_score=2 + pushed in 90-day window
- ⚪ EN-only or ZH-only README
- ⚪ compound layer needs a logged scenario run


| Install method | Platform | Difficulty |
|---|---|---|
| `/plugin install superpowers@claude-plugins-official (Claude Code)` | Claude Code | easy |
| `Plugin marketplace (Codex CLI / Codex App / Factory Droid / Gemini CLI / OpenCode / Cursor / GitHub Copilot CLI)` | 8 agent platforms | easy |

AI agent harness (Claude Code / Codex / Cursor / Gemini / OpenCode / Factory Droid / GH Copilot CLI)

Host that loads + auto-triggers the skills

Standard agent-side cost; auto-triggering 14 skills means more tokens consumed per task before any actual work happens

Claims: 7 (5 passed, 1 partial, 1 untested)

Score components: +40, +18, +10, +12, -3, 0

6 / 7 claims passed

- passed: claim-001, claim-002, claim-003, claim-004, claim-005, claim-006
- untested: claim-007

`input_contract`
`output_contract`
`determinism`
`idempotence`
`no_skill_callouts`
`failure_mode_clarity`

`workflow_correctness`
`declared_call_graph`
`stop_conditions`
`handoff_points`
`atom_evidence`
`error_propagation`
`partial_failure_handling`

`goal_achievement`
`direction_judgment`
`quality_judgment`
`meaningful_autonomy`
`handoff_timing`
`observed_call_graph`
`failure_recovery`

only 3/4 critical claims covered

archetype: hybrid-skill → core_layer_tested? False → evidence: partial → recommended: usable → final: usable

ceiling 1 · core user-facing layer untested → capped at 'usable'

ceiling 2 · hybrid-repo rule: archetype 'hybrid-skill' requires end-to-end evaluation of the user-facing layer

ceiling 3 · evidence_completeness='partial' (not portable) → capped at 'usable'
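
The three ceilings above can be read as a chain of caps applied to the recommended tier. The following is a minimal sketch of that reading, not the evaluator's actual code. The function names, the tier ordering, and the `"portable"` evidence value are assumptions for illustration; only the three ceiling reasons and the `usable` tier come from the report.

```python
# Assumed ordering, lowest to highest. The report names "usable", "Team-ready"
# and "Recommend"; their relative order here is an assumption.
TIERS = ["usable", "team-ready", "recommend"]


def cap(tier, ceiling):
    """Return whichever of the two tiers is lower in TIERS."""
    return min(tier, ceiling, key=TIERS.index)


def apply_ceilings(recommended, archetype, core_layer_tested, evidence):
    """Apply the three ceilings from the report; return (final_tier, reasons)."""
    final = recommended
    reasons = []
    if not core_layer_tested:
        final = cap(final, "usable")
        reasons.append("core user-facing layer untested")
    if archetype == "hybrid-skill" and not core_layer_tested:
        final = cap(final, "usable")
        reasons.append("hybrid-repo rule: user-facing layer needs end-to-end evaluation")
    if evidence == "partial":
        final = cap(final, "usable")
        reasons.append("evidence_completeness='partial' (not portable)")
    return final, reasons


# The situation described in the report: all three ceilings fire.
final, reasons = apply_ceilings("usable", "hybrid-skill", False, "partial")
print(final, len(reasons))

# Hypothetical: a higher recommendation would still be capped today ...
capped, _ = apply_ceilings("team-ready", "hybrid-skill", False, "partial")
# ... and would survive once the core layer is tested and evidence is complete.
lifted, _ = apply_ceilings("team-ready", "hybrid-skill", True, "portable")
print(capped, lifted)
```

Under these assumptions, testing the user-facing layer (claim-007) is what removes the first two ceilings, and completing the evidence removes the third.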


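| ID | Claim | Severity | Category | Status |
|---|---|---|---|---|
| claim-001 | All 14 methodology skills exist and are not placeholders | critical | skill-coverage | ● passed |
| claim-002 | All 8 platform install paths have an official marketplace or plugin entry point | critical | cross-platform | ● passed |
| claim-003 | package.json and version number indicate project maturity | high | maturity | ● passed |
| claim-004 | Repository has a LICENSE file | critical | licensing | ● passed |
| claim-005 | Agent-facing docs (CLAUDE.md / GEMINI.md / AGENTS.md, etc.) have real content | high | agent-discoverability | ◐ partial |
| claim-006 | tests/ contains real test coverage | high | testing | ● passed |
| claim-007 | End-to-end: after installing superpowers, the agent actually invokes the methodology within its workflow | critical | end-to-end | ○ untested |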
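
The counts quoted elsewhere in this report (5 passed, 1 partial, 1 untested, and "3/4 critical claims covered") can be reproduced from the claim table above. This is a small sketch for checking the arithmetic. The helper names are hypothetical, and treating `untested` as "not covered" is an assumption about how the evaluator counts coverage.

```python
from collections import Counter

# (id, severity, status) copied from the claim table above.
CLAIMS = [
    ("claim-001", "critical", "passed"),
    ("claim-002", "critical", "passed"),
    ("claim-003", "high", "passed"),
    ("claim-004", "critical", "passed"),
    ("claim-005", "high", "partial"),
    ("claim-006", "high", "passed"),
    ("claim-007", "critical", "untested"),
]


def tally(claims):
    """Count claims per status."""
    return Counter(status for _, _, status in claims)


def critical_coverage(claims):
    """Return (covered, total) for critical claims.

    Assumption: a claim counts as covered unless its status is 'untested'.
    """
    critical = [c for c in claims if c[1] == "critical"]
    covered = [c for c in critical if c[2] != "untested"]
    return len(covered), len(critical)


counts = tally(CLAIMS)
covered, total = critical_coverage(CLAIMS)
print(dict(counts))
print(f"{covered}/{total} critical claims covered")
```

The one uncovered critical claim is claim-007, which is the same gap that triggers the ceilings.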

run-static-checks · 2026-05-05 · 0.00s · 0% · tokens in ? / out ?

# obra/superpowers — final verdict (2026-05-05)

## Repo

- **Name:** obra/superpowers · **Stars:** 178,762 · **License:** MIT
- **Archetype:** hybrid-skill · **Layer:** molecule
- **Version:** v5.1.0 · **Pushed:** 2026-05-04 (yesterday)

## What was evaluated

| Claim | Status | Notes |
|---|---|---|
| 001 14 methodology skills | passed | All 5 sampled skills run 152–371 lines |
| 002 8-platform install | passed | 4 plugin config dirs + 8 README install sections |
| 003 mature versioning | passed | v5.1.0 + RELEASE-NOTES.md |
| 004 LICENSE | passed | MIT |
| 005 agent-facing docs | passed_with_concerns | CLAUDE.md (106 lines) is real, but GEMINI.md (2 lines) and AGENTS.md (0 bytes) are thin |
| 006 tests | passed | 7 test dirs covering install paths + skill triggering |
| 007 live agent workflow | untested | needs Claude Code session on a real feature task |

## Real findings

1. **AGENTS.md empty + GEMINI.md only 2 lines.** For a project whose
   audience is *coding agents*, that's an ironic gap. Claude Code is
   first-class; everything else is second-class. Worth disclosing in
   `watch_out`.

2. **Mature for a methodology bundle.** v5.1.0 + 7-dir test suite +
   release notes — most personal skill catalogs evaluated in this
   batch are v1.x with no tests. This one has been iterated on
   significantly.

3. **The skill list reads as a coherent methodology pipeline.**
   brainstorming → writing-plans → TDD → subagent-driven-development
   → verification-before-completion → finishing-a-development-branch
   → receiving-code-review. That's one opinionated software-engineering
   approach, taught skill-by-skill — not a random utilities catalog.

4. **CLAUDE.md is famously blunt.** It cites a "94% PR rejection rate" and
   names and shames "slop PRs". That's a maintainer culture choice: it fits
   the anti-slop posture of the methodology, but contributors should
   know the bar before submitting.

5. **8-platform install with uneven coverage.** Claude / OpenCode get
   2-file plugin configs each; Codex / Cursor only 1 file each.
   README lists 8 install paths but the depth-of-integration varies.
   Verify your platform's plugin contract is what you expect before
   relying on a non-Claude install.

## Why the score lands where it does

- 6/7 claims passed (5 outright, 1 passed_with_concerns); claim-007 is untested
- 179K stars puts ecosystem at +12 (50K+ band)
- Recently active (+5) + release_pipeline=2 (+5) → +10 maintainer
- Molecule layer +0
- LICENSE present, no penalties
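
As a sanity check on the arithmetic, the six score components listed in the dashboard (+40, +18, +10, +12, -3, 0) can be summed and mapped onto the score bands. This is a sketch under stated assumptions: the report labels only three components (ecosystem +12, maintainer +10, layer 0), so the other three keep placeholder names, and clamping the total to 0–100 is an assumption rather than documented behavior.

```python
# Component values as shown in the dashboard. Names starting with "unlabeled"
# are placeholders: the report does not say what those components are.
COMPONENTS = [
    ("unlabeled_a", 40),
    ("unlabeled_b", 18),
    ("maintainer", 10),   # recently active (+5) + release_pipeline=2 (+5)
    ("ecosystem", 12),    # 50K+ stars band
    ("unlabeled_c", -3),
    ("layer", 0),         # molecule layer
]

# Score bands from the dashboard legend: (low, high, icon).
BANDS = [(0, 29, "🛑"), (30, 49, "⚠️"), (50, 79, "🛠"), (80, 100, "🏭")]


def total_score(components):
    """Sum the component values, clamped to 0..100 (clamping is an assumption)."""
    return max(0, min(100, sum(value for _, value in components)))


def band(score):
    """Return the icon of the band that contains the score."""
    for low, high, icon in BANDS:
        if low <= score <= high:
            return icon
    raise ValueError(f"score out of range: {score}")


score = total_score(COMPONENTS)
print(score, band(score))
```

With these values the total is 77, which sits in the 50–79 band and matches the headline score.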

Predicted score: ~85, solidly **🏭 Team-ready** territory. The computed score shown at the top of this report is 77 / 100, which falls in the 50–79 band.

## Path to ⭐ Recommend

1. Fill AGENTS.md and GEMINI.md properly (not 0/2 lines).
2. Run a logged live-agent scenario in Claude Code — kick off a
   feature with brainstorming, watch superpowers chain through
   writing-plans → TDD → subagent-driven-development → verify.
3. Multi-evaluator coverage on a non-Claude platform (e.g. Cursor)
   to validate the thinner plugin configs work.
4. Update claim-007 to passed; re-run verdict_calculator.

## Recommended

```yaml
status: evaluated
```