repo-evals · 2026-05-04 · main@HEAD (skills-index v1.2.0, generated 2026-04-29)

goose-skills (gooseworks-ai/goose-skills) — 🛠 66 / 100
Workflow stage: 01 Research · 02 Plan & design · 03 Code & review · 04 Package · 05 Maintain
Score bands: 🛑 0–29 · ⚠️ 30–49 · 🛠 50–79 · 🏭 80–100

🛠 66 / 100
  • 6 claims passed, no critical failures
  • README may claim a license but no LICENSE file exists
  • release_pipeline_score=2 + pushed in 90-day window
  • EN-only or ZH-only README
  • static-only eval; live e2e pending


[Architecture diagram] Coding agent (Claude / Cursor / Codex) → skills-index.json (catalog router) → 143 Capabilities (atomic skills) → 56 Composites (multi-skill chains) → 5 Playbooks (end-to-end) → GTM artifacts (briefs / posts / leads / battlecards)

Install options:

| Command | Platform | Difficulty |
|---|---|---|
| `npx gooseworks install --claude/--cursor/--codex/--all` | any (npm) | easy |
| `git clone` + copy `skills/` to `~/.claude/skills/` | any | moderate |

Dependencies:

- 📡 Anthropic Claude API (or Cursor / Codex) — the LLM that consumes the skill prompts. Per-skill execution token cost varies; some skills hit external APIs (Reddit, GitHub) too.
- Per-skill external APIs — some composites call Reddit / GitHub / Meta Ad Library / Apollo / Semrush / Ahrefs / Apify. Each skill in the catalog may need its own API keys; read SKILL.md before running. Apollo / Semrush / Ahrefs are paid.
7 claims · 4 passed · 2 partial · 1 untested

Score components: +40 +18 +10 +0 +0 −2 = 66

6 / 7 claims passed:

● claim-001 · ● claim-002 · ● claim-003 · ● claim-004 · ● claim-005 · ● claim-006 · ○ claim-007 (untested)

Atom-level criteria: input_contract · output_contract · determinism · idempotence · no_skill_callouts · failure_mode_clarity

Molecule-level criteria: workflow_correctness · declared_call_graph · stop_conditions · handoff_points · atom_evidence · error_propagation · partial_failure_handling

archetype: prompt-skill · core_layer_tested: false · evidence: partial · recommended: usable · final: usable

- ceiling 1 · core user-facing layer untested → capped at 'usable'
- ceiling 2 · hybrid-repo rule: archetype 'prompt-skill' requires end-to-end evaluation of the user-facing layer
- ceiling 3 · evidence_completeness='partial' (not portable) → capped at 'usable'
- only 4/5 critical claims covered

| Claim | Check | Severity | Area | Status |
|---|---|---|---|---|
| claim-001 | skills-index.json matches the skill count promised in the docs | critical | catalog-coverage | ◐ partial |
| claim-002 | all three categories (capabilities / composites / playbooks) genuinely exist and are non-empty | critical | taxonomy | ● passed |
| claim-003 | every skill follows a uniform metadata contract | high | contract | ● passed |
| claim-004 | the npm package actually installs and the bin entry exists | critical | install | ◐ partial |
| claim-005 | skill packs (e.g. lead-gen-devtools) are genuine multi-skill collections | high | composition | ● passed |
| claim-006 | every skill can be invoked from Claude Code / Cursor / Codex alike | critical | cross-platform | ● passed |
| claim-007 | end-to-end: installing and invoking a skill inside a real agent completes a task | critical | end-to-end | ○ untested |

Run log: run-static-checks · 2026-05-04 · 0% · 0.00s · tokens in ? / out ?
# goose-skills — final verdict (2026-05-04)

## Repo

- **Name:** gooseworks-ai/goose-skills
- **Branch evaluated:** main@HEAD (skills-index 1.2.0, generated 2026-04-29)
- **Archetype:** prompt-skill (catalog of prompt skills)
- **Layer:** **molecule** at the repo level (catalog wired by
  manifest + npm installer); individual skills have their own layer
  (capabilities ≈ atom, composites ≈ molecule, playbooks ≈ compound)
- **Eval framework:** repo-evals layer model v1 (fe256e5)

## Bucket

**`usable`** — strong static layer; capped by the molecule rule
because no live skill execution has been logged on this evaluator's
machine.

## What was evaluated

### Atom + molecule level (static, this run)

| Claim | Status | Notes |
|---|---|---|
| 001 catalog count | passed_with_concerns | 204 skills in manifest vs 108 in README — docs stale |
| 002 three categories | passed | 143 capabilities + 56 composites + 5 playbooks, all non-empty |
| 003 metadata contract | passed | Sampled 3 skills — uniform shape with `installation.{base_command, supports}` |
| 004 npm + bin | passed_with_concerns | bin/goose-skills.js exists (12.5 KB); npm@1.1.0 is ahead of repo@1.0.1 |
| 005 packs | passed | 2 real packs (lead-gen-devtools=7 skills, video-production=5 skills); README's "7-skill" claim matches |
| 006 cross-platform | passed | All 204 skills declare `supports = [claude, cursor, codex]` (100% uniform) |
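
The static layer above is cheap to re-run. A minimal sketch of the manifest checks behind claims 002/003/006 — the field names (`skills`, `category`, `installation.base_command`, `installation.supports`) and the category values are assumptions extrapolated from the sampled shape in claim 003, not the evaluator's actual harness:

```python
import json

PLATFORMS = {"claude", "cursor", "codex"}


def check_manifest(path="skills-index.json"):
    """Static checks mirroring claims 002/003/006 (field names assumed)."""
    with open(path) as f:
        index = json.load(f)

    skills = index.get("skills", [])
    by_category = {}
    problems = []
    for skill in skills:
        by_category.setdefault(skill.get("category", "?"), []).append(skill)
        inst = skill.get("installation", {})
        # claim 003: uniform metadata contract on every entry
        if "base_command" not in inst or "supports" not in inst:
            problems.append(f"{skill.get('name')}: missing installation contract")
        # claim 006: all three platforms declared
        elif set(inst["supports"]) != PLATFORMS:
            problems.append(f"{skill.get('name')}: supports={inst['supports']}")
    # claim 002: all three categories non-empty (category values assumed)
    for category in ("capability", "composite", "playbook"):
        if not by_category.get(category):
            problems.append(f"category '{category}' empty or missing")
    return len(skills), problems
```

An empty `problems` list plus a count of 204 would reproduce the passed rows; any divergence points at the offending entry.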

### Molecule level (deferred)

| Claim | Status | Required |
|---|---|---|
| 007 live skill execution | untested | install via `npx`, run 1 capability + 1 composite + 1 playbook in a real agent session, log token + output evidence |

## Real findings worth surfacing

1. **Cap/Comp/Play taxonomy ≈ atom/molecule/compound.** Goose's
   internal classification (capabilities → composites → playbooks) is
   functionally identical to repo-evals' atom/molecule/compound layer
   model. We didn't invent the insight; we formalized it. This is
   worth surfacing in the meta-reflection on framework neutrality.

2. **README is meaningfully out of date.** "108 skills" is the
   headline, "204" is the reality. Not a false claim, but it
   under-sells the catalog and could send users to the npm package
   thinking the surface is half what it is.

3. **npm is one minor version ahead of repo.** A user reading the
   source on `main` (v1.0.1) sees something different from what
   `npx goose-skills install` ships (v1.1.0 on the registry). Not
   broken, but a maintainer / contributor will be confused.

4. **Pack contract is real, not marketing.** Both packs ship genuine
   `shared_files` (.env.example + requirements.txt + more), so
   "configure once, use whole pack" is structurally enforced, not
   just suggested.
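
Findings 2 and 3 are both drift problems, and the count half is easy to watch mechanically. A sketch, assuming the README headline matches a simple "N skills" pattern (the real wording may differ; the version-skew half would need a registry query and is omitted):

```python
import re


def count_drift(readme_text, manifest):
    """Compare the README's headline skill count against the manifest.

    The "N skills" regex is an assumption about the README's wording;
    adjust it to the actual phrasing before relying on the result.
    """
    match = re.search(r"(\d+)\s+skills", readme_text)
    claimed = int(match.group(1)) if match else None
    actual = len(manifest.get("skills", []))
    return claimed, actual, (claimed is not None and claimed != actual)
```

Against this repo it would flag (108, 204, True) — exactly the stale-docs finding above.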

## Why not higher

`usable` is the right ceiling because:

- No live skill execution evidence on this machine. The catalog could
  have 204 manifest entries and still ship low-signal SKILL.md content
  inside any one of them. Per-skill quality is the trust-determining
  variable, and we sampled only the manifest, not the prompt content.
- Skill quality is heterogeneous by definition (different authors,
  different review depth) — we'd need to sample, not assume.
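
The sampling caveat can be made concrete: a reproducible random draw over the manifest, so a second evaluator reviews the same SKILL.md files. The `skills`/`name` shape is an assumption carried over from the manifest structure described in claim 003:

```python
import random


def sample_skills(manifest, k=10, seed=2026):
    """Reproducibly sample skill names for manual SKILL.md review.

    `manifest["skills"][i]["name"]` is an assumed shape; adapt it to
    the real skills-index.json structure.
    """
    names = sorted(s["name"] for s in manifest.get("skills", []))
    rng = random.Random(seed)
    return sorted(rng.sample(names, min(k, len(names))))
```

Fixing the seed means the sample itself becomes part of the evidence trail rather than an unrepeatable choice.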

## Path to `reusable`

1. `npx gooseworks install --claude` in fresh Claude Code.
2. Pick 1 skill per category (suggested: `brand-voice-extractor` /
   `competitor-intel` / `competitor-monitoring-system`).
3. Run each on a representative input. Capture the agent's
   intermediate plan, final output, and token usage.
4. Log under `runs/<date>/run-live-execution/` with one business-notes
   file per skill.
5. Update claim-007 status. If all three pass with a useful artefact
   and the `lead-gen-devtools` pack also runs end-to-end, candidate
   for `reusable` (still not `recommendable` until 2nd evaluator).
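
Step 4's evidence layout can be scaffolded up front so each run lands in a predictable place. A sketch following the `runs/<date>/run-live-execution/` convention from the list; the per-skill filename and stub fields are assumptions:

```python
import json
from datetime import date
from pathlib import Path


def scaffold_run(skills, root="runs"):
    """Create the evidence layout for a live-execution run (step 4).

    One notes stub per skill, to be filled with the agent's plan,
    final output, and token usage after the run. Filenames beyond the
    runs/<date>/run-live-execution/ convention are assumptions.
    """
    run_dir = Path(root) / date.today().isoformat() / "run-live-execution"
    run_dir.mkdir(parents=True, exist_ok=True)
    for skill in skills:
        stub = {"skill": skill, "plan": None, "output": None,
                "tokens": {"in": None, "out": None}}
        (run_dir / f"{skill}.business-notes.json").write_text(
            json.dumps(stub, indent=2))
    return run_dir
```

Running it for the three suggested skills before the session starts leaves no ambiguity about where claim-007 evidence belongs.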

## Recommended

```yaml
current_bucket: usable
status: evaluated
```