repo·evals · 2026-04-24

skills-manage

iamzhihuix/skills-manage

🛠 65 / 100

Lifecycle: 01 Research · 02 Plan & design · 03 Code & review · 04 Package · 05 Maintain

Score bands: 🛑 0–29 · ⚠️ 30–49 · 🛠 50–79 · 🏭 80–100

Score inputs:
  • 8 claims passed, no critical failures
  • License: MIT / Apache / etc.; installable via deployment.install_methods
  • release_pipeline_score=2, and pushed within the 90-day window
  • multilingual_readme=true
  • evidence_completeness=portable


Workflow: Register / import skill (via Tauri GUI) → Central storage (~/.agents/skills/) → Pick target tools (Claude / Cursor / Codex / ...) → Create symlinks (per-tool dir) → Skill active in every tool (edit once, sync many)
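The workflow reduces to one central copy plus one symlink per target tool. Below is a minimal Python sketch of that fan-out. It is not skills-manage's Rust code: `link_skill`, its parameters, and the directory layout in the demo are hypothetical. The sketch skips links that already point at the right place and refuses to overwrite any other entry.

```python
import os
from pathlib import Path


def link_skill(central_root: Path, skill: str, tool_dirs: list[Path]) -> list[Path]:
    """Expose one centrally stored skill in several tool skill folders.

    Hypothetical helper: each tool folder gets a symlink named after the
    skill that points back at the single central copy (edit once, sync many).
    """
    source = central_root / skill
    if not source.is_dir():
        # Fail loudly instead of creating a dangling link.
        raise FileNotFoundError(f"skill not in central store: {source}")
    links = []
    for tool_dir in tool_dirs:
        tool_dir.mkdir(parents=True, exist_ok=True)
        link = tool_dir / skill
        # Relative target, so the link survives moving the whole tree.
        target = os.path.relpath(source, start=tool_dir)
        if link.is_symlink() and os.readlink(link) == target:
            links.append(link)  # already linked correctly: nothing to do
            continue
        if link.exists() or link.is_symlink():
            # Never clobber a real folder or a link owned by something else.
            raise FileExistsError(f"refusing to overwrite {link}")
        os.symlink(target, link)
        links.append(link)
    return links


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        home = Path(tmp)
        central = home / ".agents" / "skills"
        (central / "demo-skill").mkdir(parents=True)
        (central / "demo-skill" / "SKILL.md").write_text("# demo\n")
        tools = [home / ".claude" / "skills", home / ".cursor" / "skills"]
        for link in link_skill(central, "demo-skill", tools):
            # e.g. .claude/skills/demo-skill -> ../../.agents/skills/demo-skill
            print(link.relative_to(home), "->", os.readlink(link))
```

Running `link_skill` a second time with the same arguments returns the same links and changes nothing on disk.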

Install methods:
  • Download Tauri DMG (unsigned, needs xattr workaround) · platform: macOS · difficulty: moderate
  • Build from source (Tauri / Rust) · platform: any · difficulty: hard
  • 📡 AI coding tools (Claude Code, Cursor, Codex, Gemini CLI, Trae, +24 more): the target tools whose skill folders are managed. skills-manage manages local skill folders and does not talk to upstream APIs itself.
Claims: 9 total · 6 passed · 1 partial · 1 failed · 1 untested


input_contract
output_contract
determinism
idempotence
no_skill_callouts
failure_mode_clarity

workflow_correctness
declared_call_graph
stop_conditions
handoff_points
atom_evidence
error_propagation
partial_failure_handling

Ceiling reasons:
  • core user-facing layer untested → capped at 'usable'
  • evidence_completeness='portable' → capped at 'reusable'

Blocking issues:
  • only 5/6 critical claims covered

archetype: adapter · core_layer_tested? False · evidence: portable · recommended: usable · final: usable

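| Claim | Statement | Priority | Area | Status |
|---|---|---|---|---|
| claim-001 | Every declared platform is registered as an adapter in code | critical | platform-coverage | ◐ partial |
| claim-002 | Every install produces the same shape (symlink → central) | critical | shape-conformance | ● passed |
| claim-003 | Uninstalled or unlisted platforms fail clearly | critical | failure-transparency | ● passed |
| claim-004 | GitHub import has real auth + retry fallback | high | auth | ● passed |
| claim-005 | Installing the same skill repeatedly is idempotent | high | deduplication | ● passed |
| claim-006 | Data stays local + no telemetry | critical | upstream-drift | ✕ failed |
| claim-007 | Prebuilt macOS DMG works right after download | critical | platform-coverage | ● passed |
| claim-008 | Supports bidirectional centralize between Central and each platform | medium | shape-conformance | ● passed |
| claim-009 | End-to-end runtime verification: actually launch, scan, and install | critical | platform-coverage | ○ untested |

claim-009 note: on Wendy's current machine, ~/.claude/skills/ (260+) and ~/.agents/skills/ are in use by this very session, so launching an unsigned (ad-hoc-signed) app to scan and rewrite those directories is too risky. Re-verify in a clean user account or a VM.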
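One reading of "5/6 critical claims covered" is: critical claims whose status is anything other than untested. The sketch below uses that definition and data copied from the claims table. It reproduces both that figure and the status tally. The names `Claim`, `critical_coverage`, and `status_counts` are hypothetical, and verdict_calculator.py is not necessarily implemented this way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    claim_id: str
    priority: str  # "critical", "high", "medium"
    status: str    # "passed", "partial", "failed", "untested"


# Priority and status per claim, copied from the claims table.
CLAIMS = [
    Claim("claim-001", "critical", "partial"),
    Claim("claim-002", "critical", "passed"),
    Claim("claim-003", "critical", "passed"),
    Claim("claim-004", "high", "passed"),
    Claim("claim-005", "high", "passed"),
    Claim("claim-006", "critical", "failed"),
    Claim("claim-007", "critical", "passed"),
    Claim("claim-008", "medium", "passed"),
    Claim("claim-009", "critical", "untested"),
]


def critical_coverage(claims):
    """Return (covered, total) for critical claims; covered = not untested."""
    critical = [c for c in claims if c.priority == "critical"]
    covered = [c for c in critical if c.status != "untested"]
    return len(covered), len(critical)


def status_counts(claims):
    """Tally claims by status, in first-seen order."""
    counts = {}
    for c in claims:
        counts[c.status] = counts.get(c.status, 0) + 1
    return counts


print(critical_coverage(CLAIMS))  # → (5, 6)
print(status_counts(CLAIMS))      # → {'partial': 1, 'passed': 6, 'failed': 1, 'untested': 1}
```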

run-source-and-dmg-integrity · 2026-04-24 · 50% · 3.38 s · tokens in 2903 / out 40
  • claim-001 · partial
  • claim-002 · passed
  • claim-003 · passed
  • claim-004 · passed
  • claim-005 · passed
  • claim-006 · failed
  • claim-007 · passed
  • claim-008 · passed

# Final Verdict — iamzhihuix/skills-manage

## Repo

- **Name**: iamzhihuix/skills-manage
- **Version tested**: v0.9.1 (2026-04-23)
- **Date**: 2026-04-24
- **Archetype**: adapter
- **Final bucket**: `usable`
- **Confidence**: low (per verdict_calculator.py)

## Verdict Calculator Output

```
Recommended bucket: usable
Final bucket:       usable
Confidence:         low

Ceiling reasons:
  - core user-facing layer untested → capped at 'usable'
  - evidence_completeness='portable' → capped at 'reusable'

Blocking issues:
  - only 5/6 critical claims covered
```

Inputs: 8 of 9 claims passed (claim-001 passed_with_concerns, claim-009 untested). 5 of 6 critical claims covered; claim-009 is the uncovered one.
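The ceiling reasons act as upper bounds. The final bucket is the lowest of the recommended bucket and every applicable cap. Below is a minimal sketch of that rule. It assumes the three bucket names used in this report are ordered usable < reusable < recommendable. verdict_calculator.py may define more buckets or use a different mechanism, and `apply_ceilings` is a hypothetical name.

```python
# Assumed ordering, lowest to highest; only these three names appear in the report.
BUCKETS = ["usable", "reusable", "recommendable"]


def apply_ceilings(recommended: str, ceilings: dict[str, str]) -> str:
    """Return the final bucket: the lowest of `recommended` and every cap.

    `ceilings` maps a reason string to the highest bucket that reason allows.
    """
    rank = BUCKETS.index  # position in BUCKETS; lower means weaker
    return min([recommended, *ceilings.values()], key=rank)


ceilings = {
    "core user-facing layer untested": "usable",
    "evidence_completeness='portable'": "reusable",
}
print(apply_ceilings("usable", ceilings))         # → usable
# Hypothetical stronger input: still pulled down to the lowest cap.
print(apply_ceilings("recommendable", ceilings))  # → usable
```

Under this rule, the 'reusable' ceiling is redundant while the 'usable' one applies. Clearing it is why every item under "What Would Move It To `reusable`" targets the untested user-facing layer.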

## Why This Bucket

### Core Outcome — code path exists, end-to-end not proven

Every claim about install / uninstall / symlink / detection / GitHub import / local-first storage has a concrete, reviewable code path. The prebuilt DMG's digest matches the published release asset digest, so the bytes are identical. But no real user workflow was executed through the GUI. Under the adapter archetype, failing to exercise the *actual user-facing layer* caps the verdict at `usable`, and that rule applied here.

### Scenario Breadth — narrow on purpose

Only one scenario was tested: "open the source + download + inspect bundle". There was no per-platform install smoke test, no collection batch-install, and no discover scan against a real project tree. Breadth is 1 scenario, not one per advertised platform (28).

### Repeatability — not tested

No repeat runs. Idempotency was verified at the *schema level* (`ON CONFLICT(skill_id, agent_id) DO UPDATE`) but not by running the same install twice and inspecting on-disk state.
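Schema-level idempotency means the database never gets a duplicate row for the same (skill, agent) pair. The sketch below shows that guarantee with Python's built-in `sqlite3`. The table name and the `link_path` column are assumptions, since only `skill_id`, `agent_id`, and the conflict clause come from the report. SQLite needs version 3.24 or later for `ON CONFLICT ... DO UPDATE`. The row count also shows the limit of this evidence: it says nothing about what happened on disk.

```python
import sqlite3

# Hypothetical table; the report only quotes the conflict clause.
SCHEMA = """
CREATE TABLE IF NOT EXISTS skill_installations (
    skill_id  TEXT NOT NULL,
    agent_id  TEXT NOT NULL,
    link_path TEXT NOT NULL,
    PRIMARY KEY (skill_id, agent_id)
)
"""

# A second install of the same pair updates the row instead of adding one.
UPSERT = """
INSERT INTO skill_installations (skill_id, agent_id, link_path)
VALUES (?, ?, ?)
ON CONFLICT(skill_id, agent_id) DO UPDATE SET link_path = excluded.link_path
"""


def record_install(conn, skill_id, agent_id, link_path):
    conn.execute(UPSERT, (skill_id, agent_id, link_path))
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
for _ in range(2):  # install the same skill twice
    record_install(conn, "demo-skill", "claude-code", "~/.claude/skills/demo-skill")
print(conn.execute("SELECT COUNT(*) FROM skill_installations").fetchone()[0])  # → 1
```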

### Failure Transparency — good signals in code

- `is_agent_detected()` correctly returns false when both the agent's directory and its parent are missing.
- `ensure_centralized()` errors out with explicit messages when the source skill is missing.
- GitHub import falls back through 4 mirrors, so a single network failure won't produce a misleading empty import.
- There are no telemetry libraries, so a failure cannot be silently reported to a remote server.
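The detection and GitHub-import bullets describe two small, testable rules. The sketch below restates them in Python under stated assumptions:

- `is_agent_detected` follows the documented rule: false only when both the directory and its parent are missing.
- `fetch_with_fallback` tries mirrors in order, treats an empty body as a failure, and raises with every collected error rather than returning nothing.

The mirror URLs, the injected `fetch` callable, and `AllMirrorsFailed` are hypothetical. The report says the real import uses 4 mirrors.

```python
from pathlib import Path
from typing import Callable, Sequence


def is_agent_detected(skills_dir: Path) -> bool:
    """False only when both the skills folder and its parent are missing.

    A tool whose config folder exists but has no skills folder yet is
    still reported as detected.
    """
    return skills_dir.exists() or skills_dir.parent.exists()


class AllMirrorsFailed(Exception):
    """Raised when every mirror fails; the message lists each error."""


def fetch_with_fallback(
    path: str, mirrors: Sequence[str], fetch: Callable[[str], bytes]
) -> bytes:
    errors = []
    for base in mirrors:
        url = f"{base.rstrip('/')}/{path.lstrip('/')}"
        try:
            data = fetch(url)
        except Exception as exc:  # record the failure, try the next mirror
            errors.append(f"{url}: {exc}")
            continue
        if data:
            return data
        # An empty body counts as a failure, never as an empty import.
        errors.append(f"{url}: empty response")
    raise AllMirrorsFailed("; ".join(errors))


def flaky_fetch(url: str) -> bytes:
    # Fake transport for the demo: the first mirror always times out.
    if "mirror-a" in url:
        raise ConnectionError("timeout")
    return b"skill bytes"


mirrors = ["https://mirror-a.example", "https://mirror-b.example"]
print(fetch_with_fallback("owner/repo/SKILL.md", mirrors, flaky_fetch))  # → b'skill bytes'
```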

## What I Would Say In Plain English

skills-manage is a well-built young project (910 stars in 11 days is not an accident — the code shows it). The README's claims about "central library + symlink to per-platform" are not marketing: they are literally implemented with `std::os::unix::fs::symlink` and a relative-path computation that makes the links portable. Privacy claims are honest — the database is where they say it is, and there is no analytics dependency anywhere.
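A relative target makes links portable because the link stores a path such as `../../.agents/skills/demo`, which is resolved against the link's own folder. Moving or renaming the whole home tree therefore keeps the link valid. The Python sketch below uses `os.symlink` and `os.path.relpath` in place of the project's Rust `std::os::unix::fs::symlink` call. The helper name and directory names are illustrative.

```python
import os
import shutil
import tempfile
from pathlib import Path


def make_relative_link(source: Path, link: Path) -> str:
    """Create `link` -> `source` using a path relative to the link's folder."""
    target = os.path.relpath(source, start=link.parent)
    os.symlink(target, link)
    return target


with tempfile.TemporaryDirectory() as tmp:
    old_home = Path(tmp) / "old-home"
    skill = old_home / ".agents" / "skills" / "demo"
    skill.mkdir(parents=True)
    (skill / "SKILL.md").write_text("hello\n")
    tool_dir = old_home / ".claude" / "skills"
    tool_dir.mkdir(parents=True)
    print(make_relative_link(skill, tool_dir / "demo"))  # → ../../.agents/skills/demo

    # Move the whole home tree; the relative link still resolves afterwards.
    new_home = Path(tmp) / "new-home"
    shutil.move(str(old_home), str(new_home))
    print((new_home / ".claude" / "skills" / "demo" / "SKILL.md").read_text().strip())  # → hello
```

An absolute target such as `/Users/old-name/.agents/skills/demo` would dangle after the same move.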

But this evaluation did not prove that the app works for a real user; it proved that code exists for each claim. For a pre-1.0 Tauri desktop app that requires `xattr -dr com.apple.quarantine` to launch, and that will scan and modify directories many other tools already manage, "code exists" is not enough to recommend it.

**Use it if** you have a clean macOS account or a VM, and your skills live in one place today.
**Wait if** your `~/.claude/skills/`, `~/.agents/skills/`, or `~/.cursor/skills/` are already managed by another tool (dbskill, lobster lock file, plugin registries) — test in isolation first.

## Remaining Risks

1. **claim-009 (runtime E2E) untested**. Everything downstream of "user clicks install" is inferred, not observed.
2. **EasyClaw V2 listed in README but not seeded in code** (`builtin_agents()` has 27 ids vs README's 28 platforms). If a user specifically needs that platform, the adapter is missing.
3. **API keys stored unencrypted** in `~/.skillsmanage/db.sqlite` (README self-discloses; still a real constraint).
4. **adhoc signing + no notarization**. The `xattr` workaround is a permanent requirement until the maintainer signs the build.
5. **3 failing legacy frontend tests** (CLAUDE.md self-discloses). They are not in the core path but are worth noting.
6. **Schema drift**: README and code disagree about the Hermes category and the React version. Low risk, but a pattern of minor drift is worth watching before 1.0.

## What Would Move It To `reusable`

- A live run on a clean macOS user account: launch app, detect platforms, install one skill to two platforms, verify symlinks on disk, uninstall, verify cleanup — all with screenshots/log evidence.
- A repeat run proving idempotency at the filesystem level.
- An unsupported-input run (e.g., custom agent with a read-only dir) proving the failure is loud.
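The on-disk checks listed above could produce log evidence through a small audit that classifies every symlink in the tool folders. Run it after install and again after uninstall. After a correct install, the skill should appear under `ok` for each target tool. After a clean uninstall, it should appear under neither `ok` nor `dangling`. The `foreign` bucket flags links owned by other managers, which matters for the "Wait if" case. `audit_links` is a hypothetical helper, not part of skills-manage.

```python
from pathlib import Path


def audit_links(central_root: Path, tool_dirs: list[Path]) -> dict[str, list[str]]:
    """Classify every symlink found directly inside the tool folders.

    ok       -> resolves to a path inside central_root
    foreign  -> resolves somewhere else (e.g. managed by another tool)
    dangling -> target no longer exists (e.g. left behind by an uninstall)
    """
    central = central_root.resolve()
    report = {"ok": [], "foreign": [], "dangling": []}
    for tool_dir in tool_dirs:
        if not tool_dir.is_dir():
            continue  # tool not installed: nothing to audit
        for entry in sorted(tool_dir.iterdir()):
            if not entry.is_symlink():
                continue  # plain folders are out of scope for this check
            if not entry.exists():  # exists() follows the link
                report["dangling"].append(str(entry))
            elif entry.resolve().is_relative_to(central):
                report["ok"].append(str(entry))
            else:
                report["foreign"].append(str(entry))
    return report
```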

## What Would Move It To `recommendable`

- Everything above, plus:
- A 1.0 release with notarized macOS build and a Linux build (currently source-only).
- The 3 failing legacy frontend tests fixed.
- README-code consistency pass (EasyClaw V2, React version, Hermes category).

## Related Artifacts

- Claim map: `claims/claim-map.yaml`
- Plan: `plans/2026-04-24-eval-plan.md`
- Run: `runs/2026-04-24/run-source-and-dmg-integrity/`
  - DMG: `artifacts/skills-manage_0.9.1_macos_universal.dmg`
  - DMG integrity log: `logs/dmg-integrity.log`
  - Source inspection log: `logs/source-inspection.log`
  - Business notes: `business-notes.md`
- Verdict calculator input: `verdicts/2026-04-24-verdict-input.yaml`