repo·evals · 2026-05-05 · main@HEAD (npm v5.1.0)

superpowers

obra/superpowers

Stages: 01 Research · 02 Plan & design · 03 Code & review · 04 Package · 05 Maintain

Score bands: 🛑 0–29 · ⚠️ 30–49 · 🛠 50–79 · 🏭 80–100

🛠 · 77 / 100
  • 6 claims passed, no critical failures
  • MIT / Apache / etc., installable per deployment.install_methods
  • release_pipeline_score=2 + pushed in 90-day window
  • EN-only or ZH-only README
  • compound layer needs a logged scenario run


Workflow diagram (nodes in the order shown; branch labels: no = unclear / yes = clear, split / single = do not split):

User feature request → Spec clear enough? (LLM decides) → brainstorming → writing-plans → test-driven-development → Big enough to split? (LLM decides) → subagent-driven-development → verification-before-completion → finishing-a-development-branch → receiving-code-review

  • /plugin install superpowers@claude-plugins-official (Claude Code) · easy
  • Plugin marketplace (Codex CLI / Codex App / Factory Droid / Gemini CLI / OpenCode / Cursor / GitHub Copilot CLI) · 8 agent platforms · easy
  • 📡 AI agent harness (Claude Code / Codex / Cursor / Gemini / OpenCode / Factory Droid / GH Copilot CLI): the host that loads and auto-triggers the skills. Standard agent-side cost; auto-triggering 14 skills means more tokens are consumed per task before any actual work happens.
Claims: 7 (5 passed · 1 partial · 1 untested)
Score components: +40, +18, +10, +12, -3, 0

6 / 7
  • passed claim-001
  • passed claim-002
  • passed claim-003
  • passed claim-004
  • passed claim-005
  • passed claim-006
  • untested claim-007

input_contract
output_contract
determinism
idempotence
no_skill_callouts
failure_mode_clarity

workflow_correctness
declared_call_graph
stop_conditions
handoff_points
atom_evidence
error_propagation
partial_failure_handling

goal_achievement
direction_judgment
quality_judgment
meaningful_autonomy
handoff_timing
observed_call_graph
failure_recovery

  • core user-facing layer untested → capped at 'usable'
  • hybrid-repo rule: archetype 'hybrid-skill' requires end-to-end evaluation of the user-facing layer
  • evidence_completeness='partial' (not portable) → capped at 'usable'

  • only 3/4 critical claims covered

archetype: hybrid-skill · core_layer_tested? False · evidence: partial · recommended: usable · final: usable
ceiling 1 · core user-facing layer untested → capped at 'usable'
ceiling 2 · hybrid-repo rule: archetype 'hybrid-skill' requires end-to-end evaluation of the user-facing layer
ceiling 3 · evidence_completeness='partial' (not portable) → capped at 'usable'
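
The three ceilings above can be read as simple capping rules. The sketch below is a hypothetical reconstruction, not the evaluator's actual code: the function name `apply_ceilings`, the `LEVELS` ordering, and the level names other than 'usable' are assumptions made for illustration.

```python
# Hypothetical sketch of the ceiling rules listed above.
# LEVELS and apply_ceilings are illustrative names, not the evaluator's real API.
LEVELS = ["usable", "team-ready", "recommend"]  # assumed ordering, lowest first


def apply_ceilings(archetype, core_layer_tested, evidence, recommended):
    """Return (final_verdict, reasons) after applying the 'usable' ceilings."""
    reasons = []
    if not core_layer_tested:
        reasons.append("core user-facing layer untested")
        if archetype == "hybrid-skill":
            reasons.append("hybrid-repo rule: end-to-end evaluation required")
    if evidence == "partial":
        reasons.append("evidence_completeness='partial'")

    final = recommended
    if reasons:
        # Any triggered ceiling caps the verdict at 'usable'.
        final = min(recommended, "usable", key=LEVELS.index)
    return final, reasons


if __name__ == "__main__":
    final, reasons = apply_ceilings("hybrid-skill", False, "partial", "usable")
    print(final)         # usable
    print(len(reasons))  # 3
```

Under these assumptions, a higher recommendation such as "team-ready" would still come out as 'usable' until the core layer is tested and the evidence is no longer partial.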

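| Claim | Statement | Severity | Category | Status |
|---|---|---|---|---|
| claim-001 | All 14 methodology skills actually exist and are not placeholders | critical | skill-coverage | ● passed |
| claim-002 | Each of the 8 platform install paths has an official marketplace or plugin entry point | critical | cross-platform | ● passed |
| claim-003 | package.json + version number indicate project maturity | high | maturity | ● passed |
| claim-004 | The repository has a LICENSE file | critical | licensing | ● passed |
| claim-005 | Agent-facing docs such as CLAUDE.md / GEMINI.md / AGENTS.md have real content | high | agent-discoverability | ◐ partial |
| claim-006 | tests/ contains real test coverage | high | testing | ● passed |
| claim-007 | End to end: once superpowers is installed, the agent actually invokes the methodology within its workflow | critical | end-to-end | ○ untested |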
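
The "5 passed / 1 partial / 1 untested" split and the "only 3/4 critical claims covered" note both follow from the claim list above. A minimal sketch of that tally, with the claim data transcribed from the list; the tuple layout and the function names are assumptions for illustration, and "covered" is taken to mean "not untested":

```python
from collections import Counter

# (claim id, severity, status), transcribed from the claim list above.
# The tuple layout is illustrative, not the evaluator's real schema.
CLAIMS = [
    ("claim-001", "critical", "passed"),
    ("claim-002", "critical", "passed"),
    ("claim-003", "high", "passed"),
    ("claim-004", "critical", "passed"),
    ("claim-005", "high", "partial"),
    ("claim-006", "high", "passed"),
    ("claim-007", "critical", "untested"),
]


def status_counts(claims):
    """Count claims per status."""
    return Counter(status for _, _, status in claims)


def critical_coverage(claims):
    """Return (covered, total) for critical claims; 'untested' counts as not covered."""
    critical = [c for c in claims if c[1] == "critical"]
    covered = [c for c in critical if c[2] != "untested"]
    return len(covered), len(critical)


if __name__ == "__main__":
    print(dict(status_counts(CLAIMS)))
    covered, total = critical_coverage(CLAIMS)
    print(f"critical claims covered: {covered}/{total}")  # 3/4
```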


run-static-checks

2026-05-05
0% tokens in ? / out ?

# obra/superpowers — final verdict (2026-05-05)

## Repo

- **Name:** obra/superpowers · **Stars:** 178,762 · **License:** MIT
- **Archetype:** hybrid-skill · **Layer:** molecule
- **Version:** v5.1.0 · **Pushed:** 2026-05-04 (yesterday)

## What was evaluated

| Claim | Status | Notes |
|---|---|---|
| 001 14 methodology skills | passed | All 5 sampled skills are 152-371 lines each |
| 002 8-platform install | passed | 4 plugin config dirs + 8 README install sections |
| 003 mature versioning | passed | v5.1.0 + RELEASE-NOTES.md |
| 004 LICENSE | passed | MIT |
| 005 agent-facing docs | passed_with_concerns | CLAUDE.md (106 lines) is real, but GEMINI.md (2 lines) and AGENTS.md (0 bytes) are thin |
| 006 tests | passed | 7 test dirs covering install paths + skill triggering |
| 007 live agent workflow | untested | needs Claude Code session on a real feature task |

## Real findings

1. **AGENTS.md empty + GEMINI.md only 2 lines.** For a project whose
   audience is *coding agents*, that's an ironic gap. Claude Code is
   first-class; everything else is second-class. Worth disclosing in
   `watch_out`.

2. **Mature for a methodology bundle.** v5.1.0 + 7-dir test suite +
   release notes — most personal skill catalogs evaluated in this
   batch are v1.x with no tests. This one has been iterated on
   significantly.

3. **The skill list reads as a coherent methodology pipeline.**
   brainstorming → writing-plans → TDD → subagent-driven-development
   → verification-before-completion → finishing-a-development-branch
   → receiving-code-review. That's one opinionated software-engineering
   approach, taught skill-by-skill — not a random utilities catalog.

4. **CLAUDE.md is famously blunt.** It cites a "94% PR rejection rate"
   and names and shames "slop PRs". That's a maintainer culture choice:
   it fits the anti-slop posture of the methodology, but contributors
   should know the bar before submitting.

5. **8-platform install with uneven coverage.** Claude / OpenCode get
   2-file plugin configs each; Codex / Cursor only 1 file each.
   README lists 8 install paths but the depth-of-integration varies.
   Verify your platform's plugin contract is what you expect before
   relying on a non-Claude install.
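
The gap in finding 1 is easy to check mechanically before relying on a non-Claude install. The sketch below counts lines in the three agent-facing docs named in this report. The `MIN_LINES` threshold and the function names are hypothetical, chosen only for illustration.

```python
from pathlib import Path

AGENT_DOCS = ["CLAUDE.md", "GEMINI.md", "AGENTS.md"]  # file names from this report
MIN_LINES = 10  # hypothetical threshold for "thin"; tune to taste


def doc_line_counts(repo_root, names=AGENT_DOCS):
    """Map each doc name to its line count, or None if the file is missing."""
    counts = {}
    for name in names:
        path = Path(repo_root) / name
        if path.is_file():
            counts[name] = len(path.read_text(encoding="utf-8").splitlines())
        else:
            counts[name] = None
    return counts


def thin_docs(counts, min_lines=MIN_LINES):
    """Return the sorted names of docs that are missing or below min_lines."""
    return sorted(n for n, c in counts.items() if c is None or c < min_lines)


if __name__ == "__main__":
    # Run from the root of a local checkout.
    print(thin_docs(doc_line_counts(".")))
```

With the line counts reported here (106, 2, and 0), this would flag AGENTS.md and GEMINI.md and leave CLAUDE.md alone.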

## Why the score lands where it does

- 6/7 claims cleared static checks (5 passed, 1 passed_with_concerns); claim-007 is untested
- 179K stars puts ecosystem at +12 (50K+ band)
- Recently active (+5) + release_pipeline=2 (+5) → +10 maintainer
- Molecule layer +0
- LICENSE present, no penalties
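
As a cross-check, the six score components shown in the dashboard header (+40, +18, +10, +12, -3, 0) can be summed and mapped onto the score bands listed there. This is a minimal sketch: the component values and band boundaries come from this page, while `COMPONENTS`, `BANDS`, and `band_for` are illustrative names, not the scorer's actual code.

```python
# Score components as shown in the dashboard header of this page.
COMPONENTS = [40, 18, 10, 12, -3, 0]

# Score bands as listed in the dashboard legend: (low, high, label).
BANDS = [
    (0, 29, "🛑"),
    (30, 49, "⚠️"),
    (50, 79, "🛠"),
    (80, 100, "🏭"),
]


def band_for(score):
    """Return the band label whose inclusive range contains the score."""
    for low, high, label in BANDS:
        if low <= score <= high:
            return label
    raise ValueError(f"score out of range: {score}")


if __name__ == "__main__":
    total = sum(COMPONENTS)
    print(total, band_for(total))  # 77 🛠
```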

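Predicted score: ~85, which would be **🏭 Team-ready** territory. The computed score in the dashboard above is 77 / 100 (🛠, 50–79 band), and the verdict is capped at 'usable' while claim-007 stays untested.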

## Path to ⭐ Recommend

1. Fill AGENTS.md and GEMINI.md properly (not 0/2 lines).
2. Run a logged live-agent scenario in Claude Code — kick off a
   feature with brainstorming, watch superpowers chain through
   writing-plans → TDD → subagent-driven-development → verify.
3. Multi-evaluator coverage on a non-Claude platform (e.g. Cursor)
   to validate the thinner plugin configs work.
4. Update claim-007 to passed; re-run verdict_calculator.

## Recommended

```yaml
status: evaluated
```