tradecat (tukuaiai/tradecat)
#1 · 2026-05-07 · develop@e1df620 (package version 0.1.0)
🛠 63 / 100
Score bands: 🛑 0–29 · ⚠️ 30–49 · 🛠 50–79 (▼ 63) · 🏭 80–100
🛠 63 / 100
- ✗ 1 critical claim failed
- ⚠ README may claim a license but no LICENSE file exists
- ✓ release_pipeline_score=2 + pushed in 90-day window
- ⚪ EN-only or ZH-only README
- ⚪ static-only eval; live e2e pending
| Install method | Platform | Difficulty |
|---|---|---|
| `curl \| sh` one-liner installer | macOS / Linux / WSL / Git Bash | easy |
| PowerShell `irm \| iex` | Windows | easy |
| `git clone` + `pip install -e .` | any | moderate |
Google Sheets (public read)
Hosts the 4 datasets read at runtime via 2 workbooks (market_data + alternative_data).
Free, but project owners can rename or delete sheets, so it is a single point of failure.
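Because a renamed or deleted sheet would break every reader at once, a small probe can flag the failure before users see it. The sketch below is illustrative only: it assumes the standard CSV export URL for publicly readable Google Sheets, and `probe_dataset`, `required_columns`, the spreadsheet ID, and the `gid` are hypothetical names that do not come from the TradeCat registry. The `fetch` callable is injected so the check can run without network access.

```python
import csv
import io
from typing import Callable
from urllib.parse import urlencode


def csv_export_url(spreadsheet_id: str, gid: int) -> str:
    """Build the CSV export URL for one tab of a publicly readable sheet."""
    query = urlencode({"format": "csv", "gid": gid})
    return f"https://docs.google.com/spreadsheets/d/{spreadsheet_id}/export?{query}"


def probe_dataset(
    fetch: Callable[[str], str],
    spreadsheet_id: str,
    gid: int,
    required_columns: set[str],
) -> list[str]:
    """Return a list of problems; an empty list means the tab looks healthy."""
    try:
        body = fetch(csv_export_url(spreadsheet_id, gid))
    except OSError as exc:  # urllib.error.URLError is a subclass of OSError
        return [f"fetch failed: {exc}"]

    rows = list(csv.reader(io.StringIO(body)))
    if not rows:
        return ["sheet is empty"]

    # The first row is treated as the header (an assumption about the layout).
    missing = required_columns - set(rows[0])
    problems = [f"missing column: {name}" for name in sorted(missing)]
    if len(rows) < 2:
        problems.append("header present but no data rows")
    return problems
```

In real use, `fetch` could wrap `urllib.request.urlopen`; in tests it can be a plain function returning canned CSV text.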
Claims · 12: 10 passed · 1 untested · 1 failed (11 / 12 tested)
Score components: +40 · +10 · +15 · 0 · 0 · −2 = 63
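The six component values above sum to the headline score, and the score bands from the page header place that total in the 🛠 band. A minimal sketch of that arithmetic follows; the clamping to 0–100 and the function names are assumptions, not the framework's actual implementation.

```python
# Score bands as shown in the page header: (low, high, icon).
SCORE_BANDS = [(0, 29, "🛑"), (30, 49, "⚠️"), (50, 79, "🛠"), (80, 100, "🏭")]


def total_score(components: list[int]) -> int:
    """Sum the components and clamp to 0..100 (clamping is an assumption)."""
    return max(0, min(100, sum(components)))


def band_for(score: int) -> str:
    """Return the icon of the band that contains the score."""
    for low, high, icon in SCORE_BANDS:
        if low <= score <= high:
            return icon
    raise ValueError(f"score out of range: {score}")


components = [40, 10, 15, 0, 0, -2]
score = total_score(components)
print(score, band_for(score))
```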
passed claim-001
passed claim-002
passed claim-003
passed claim-004
passed claim-005
passed claim-006
untested claim-007
passed claim-008
passed claim-009
failed claim-012
input_contract · output_contract · determinism · idempotence · no_skill_callouts · failure_mode_clarity

workflow_correctness · declared_call_graph · stop_conditions · handoff_points · atom_evidence · error_propagation · partial_failure_handling
- core user-facing layer untested → capped at 'usable'
- hybrid-repo rule: archetype 'hybrid-skill' requires end-to-end evaluation of the user-facing layer
- evidence_completeness='partial' (not portable) → capped at 'usable'
- critical claim claim-012 failed
archetype: hybrid-skill → core_layer_tested? False → evidence: partial → recommended: unusable → final: unusable
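One way to read the chain above is that each ceiling can only lower a bucket, never raise it, so a recommendation of `unusable` stays `unusable` even when every ceiling is `usable`. The sketch below illustrates that reading. Only `unusable` and `usable` appear in this report; the higher bucket names and the function name are hypothetical placeholders.

```python
# Buckets from worst to best. Only the first two names appear in the report;
# "reusable" and "recommendable" are hypothetical placeholders.
BUCKET_ORDER = ["unusable", "usable", "reusable", "recommendable"]


def apply_ceilings(recommended: str, ceilings: list[str]) -> str:
    """Return the lowest bucket among the recommendation and all ceilings."""
    return min([recommended, *ceilings], key=BUCKET_ORDER.index)


# Three ceilings of 'usable' cannot lift a recommendation of 'unusable'.
print(apply_ceilings("unusable", ["usable", "usable", "usable"]))
```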
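| ID | Claim | Priority | Area | Status | Notes |
|---|---|---|---|---|---|
| claim-001 | One-line `curl \| sh` install script (path migrated after the restructure) actually completes its self-check | critical | install | ● passed | |
| claim-002 | dataset_registry.json matches the datasets declared in the README (now 2 workbooks) | critical | dataset-coverage | ● passed | |
| claim-003 | Python version requirement is consistent between pyproject.toml and install.sh | high | install-consistency | ● passed | |
| claim-004 | One-shot request script (request.py) can read a dataset without installation | high | cli-surface | ● passed | |
| claim-005 | TUI degrades gracefully on terminals without curses support and does not throw a traceback | high | terminal-ux | ● passed | |
| claim-006 | Default auto-update is throttled and can be disabled via environment variables | high | update-policy | ● passed | |
| claim-007 | End-to-end happy path: a real event stream is visible after one sync | critical | end-to-end | ○ untested | The repo-evals framework forbids installing untrusted tools on the evaluator's local system. This claim requires a `curl \| sh` install that modifies PATH plus a real Google Sheets fetch. Recommendation: the project should record this e2e run as a CI artifact (one sync run + three exit codes + cache size) so external evaluators can verify it without installing anything. |
| claim-008 | Skill shell + project source boundary (root SKILL.md / AGENTS.md / scripts/project/) | high | skill-shell-boundary | ● passed | |
| claim-009 | GitHub Actions CI actually exists (including skill strict + secret scan) | high | ci-pipeline | ● passed | |
| claim-010 | Governance scripts (validate-skill / security-scan / supply-chain-audit) are actually executable | medium | governance-shell | ● passed | |
| claim-011 | Test coverage is thin but reasonably dense (single file, 1622 lines / 81 test functions) | medium | test-coverage | ● passed | |
| claim-012 | Repository still has no LICENSE file (the README MIT badge is not backed by one) | critical | legal | ✕ failed | |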
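The evidence behind claim-012 (a null `license` field on the repository plus a 404 on `/contents/LICENSE`) can be gathered with two GitHub REST API calls. The following is a rough sketch rather than the evaluator's actual tooling: `http_get` and `license_evidence` are names invented here, the check only looks at the literal path `LICENSE`, and unauthenticated requests are subject to GitHub rate limits.

```python
import json
import urllib.error
import urllib.request

API = "https://api.github.com"


def http_get(url: str):
    """Return (status, parsed JSON or None) without raising on HTTP errors."""
    req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status, json.load(resp)
    except urllib.error.HTTPError as err:
        return err.code, None


def license_evidence(repo: str, get=http_get) -> dict:
    """Collect license evidence for an 'owner/name' repository."""
    _, repo_json = get(f"{API}/repos/{repo}")
    file_status, _ = get(f"{API}/repos/{repo}/contents/LICENSE")
    detected = (repo_json or {}).get("license")  # null when GitHub detects none
    return {
        "detected_license": detected["spdx_id"] if detected else None,
        "license_file_status": file_status,
        # A badge in the README is not evidence; both checks must succeed.
        "passes": bool(detected) and file_status == 200,
    }
```

Calling `license_evidence("tukuaiai/tradecat")` would perform live requests; passing a stub as `get` keeps the check deterministic in tests.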
run-static-checks
2026-05-04
0% — tokens in ? / out ?
# TradeCat: final verdict (2026-05-07, full re-eval)

## Repo

- **Name:** tukuaiai/tradecat
- **Branch evaluated:** develop@e1df620 (package version 0.1.0)
- **Archetype:** **hybrid-skill** (changed from `pure-cli`; repo restructured between 2026-05-04 and 2026-05-07)
- **Layer:** **molecule**: Skill shell + 4 dataset readers + sync + probe + TUI wired by predefined orchestration; no LLM-runtime routing
- **Eval framework:** repo-evals layer model v1

## Bucket

**`usable`**, capped by the molecule rule: static layer is unusually clean and even improved since the last eval (CI added, governance shell in place), but actual user value (seeing real market data in a terminal) is still downstream of a live Google Sheets fetch that no static check can validate. The `no-LICENSE-file` defect from the prior eval is still unresolved and now applies to a 935-star repo.

## What was evaluated

### Static layer (this run, all PASS)

| Claim | Status | Notes |
|---|---|---|
| 001 install.sh path migrated to scripts/project/install.sh | passed | 288 lines POSIX shell, 5 env-var overrides + 2 CI skip flags |
| 002 dataset registry coverage (now 2 workbooks) | passed | 4 active datasets across market_data + alternative_data workbooks |
| 003 Python version + entry-points consistency | passed | install.sh 3.12 ↔ pyproject ">=3.12" ↔ 3 entry-points present |
| 004 zero-install request.py | passed | 191 lines, references same dataset_registry.json (raw URL) |
| 005 TUI graceful fallback | passed | TUI_FORCE_CURSES_ENV / TUI_ALLOW_WINDOWS_CURSES_ENV + render_safe_plain_tui present |
| 006 auto-update env vars | passed | install.sh has 8 references to NO_AUTO_UPDATE / FORCE_UPDATE / UPDATE_INTERVAL_SECONDS |
| 008 Skill-shell boundary (NEW) | passed | root SKILL.md (197) + AGENTS.md (98) + scripts/project/AGENTS.md (258); references/ has 8 long docs |
| 009 GitHub Actions CI (NEW) | passed | .github/workflows/ci.yml: validate-skill --strict + secret scan + supply-chain audit |
| 010 Governance scripts (NEW) | passed | 8 root shell scripts, all real bash (~580 lines total) |
| 011 Test coverage density (NEW) | passed | single test_cache_tui.py with 81 test functions / 1622 lines; adequate but brittle to refactor |

### Static layer FAILED

| Claim | Status | Notes |
|---|---|---|
| 012 LICENSE file present | **failed** | gh api license=null + 404 on /contents/LICENSE; README MIT badge does not constitute a license. Unchanged from 2026-05-04. |

### Molecule level (deferred)

| Claim | Status | Required |
|---|---|---|
| 007 e2e live sync | untested (skip) | install + sync + render at least one dataset; framework forbids installing untrusted CLI on evaluator's machine |

## Real findings worth surfacing

1. **Repo restructured to a Skill-shell layout in the last 3 days.** Root holds SKILL.md / AGENTS.md / references/ + thin governance scripts; `scripts/project/` holds the Python package, its own AGENTS.md, install.sh, tests. This is a clean reference layout for "Skill outside, project inside", recommendable to other skill authors who need to bundle a working Python tool.
2. **CI now exists and is non-trivial.** `.github/workflows/ci.yml` runs `validate-skill.sh --strict` (frontmatter + Codex skill alignment), a secret scan over the diff range (push range or PR base..HEAD), and a supply-chain audit. This earns release_pipeline_score 2 (was 1) but is held below 3 because no e2e sync run is captured as a CI artifact.
3. **Honest README: kept this strength.** Still unusually clear about what the tool *doesn't* do: no PostgreSQL writeback, no SQLite, no cloud accounts, no server credentials. That clarity remains a quality signal.
4. **Single-source dataset contract preserved across the move.** `dataset_registry.json` is now under `scripts/project/src/tradecat_terminal/`, but both the installed CLI *and* the zero-install `request.py` (via raw URL) still consume it. Refactor preserved the single-source guarantee.
5. **2-workbook design is new.** Previous registry had 1 workbook; now `market_data` (3 snapshot datasets) + `alternative_data` (event_stream). This is a healthier separation of concerns, since alternative data has a different cadence and ownership profile.
6. **Test concentration is the soft spot.** 81 tests in a single 1622-line file is real coverage but a refactor liability. A reader looking for "where is the cache test" or "where is the TUI test" can't tell from filenames.
7. **License debt grew, not shrank.** README still claims MIT via badge, repo still has no LICENSE file, and the star count grew from 928 to 935 in the 3 days between evals. This is the easiest fix on the list (one file commit) and the highest legal cost of any single missing artifact.

## Score deltas vs. 2026-05-04

- **+** 4 new static claims passed (skill-shell, CI, governance, tests)
- **+** release_pipeline_score 1 → 2 (CI now exists)
- **=** has_license still false (penalty unchanged)
- **=** layer ceiling unchanged (still molecule with deferred e2e)
- **=** archetype changed from pure-cli to hybrid-skill (more accurate, no score effect)

## Next steps to raise the score

1. **Add a LICENSE file** matching the README MIT badge: the biggest single-line win. Removes the 935-star unlicensed penalty and unblocks fork/redistribute.
2. **Capture an e2e sync run as CI artifact** (sync exit code + cache size + dataset row count), which would let claim-007 verify without each user running curl|sh, lifting the molecule ceiling.
3. **Split `test_cache_tui.py`** into focused modules (test_cache.py, test_tui.py, test_sync.py), which improves refactor safety and reads cleaner.
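The second next step (capturing an e2e sync run as a CI artifact) could be approached with a small wrapper that runs a command and writes a JSON summary for upload. This is a sketch under stated assumptions: `capture_run`, the summary field names, and the cache directory are hypothetical, the actual sync command is deliberately left as a parameter, and dataset row counts are omitted because the cache file format is not described in this report.

```python
import json
import subprocess
import sys
from pathlib import Path


def dir_size_bytes(root: Path) -> int:
    """Total size of all regular files under root."""
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


def capture_run(command: list[str], cache_dir: Path, out_path: Path) -> dict:
    """Run a command and write a JSON summary suitable for a CI artifact."""
    result = subprocess.run(command, capture_output=True, text=True)
    summary = {
        "command": command,
        "exit_code": result.returncode,
        # Cache size is measured after the run; 0 if the directory is absent.
        "cache_bytes": dir_size_bytes(cache_dir) if cache_dir.is_dir() else 0,
        # Keep only the tail of stderr so the artifact stays small.
        "stderr_tail": result.stderr[-2000:],
    }
    out_path.write_text(json.dumps(summary, indent=2), encoding="utf-8")
    return summary


if __name__ == "__main__":
    # Placeholder command; a real pipeline would pass the project's sync command.
    demo = capture_run([sys.executable, "-c", "print('ok')"], Path("cache"), Path("summary.json"))
    print(demo["exit_code"])
```

A CI job could then upload the summary file as an artifact, giving external evaluators an exit code and cache size to inspect without installing the tool.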