repo·evals · 2026-05-07 · develop@e1df620 (package version 0.1.0)

tradecat

tukuaiai/tradecat

🛠 63 / 100

Pipeline stages: 01 Market data · 02 Intel collection · 03 Signal synthesis · 04 Methodology routing · 05 Backtest validation · 06 Risk management · 07 Execution engine · 08 Journal & review

Score bands: 🛑 0–29 · ⚠️ 30–49 · 🛠 50–79 · 🏭 80–100
🛠 63 / 100
  • 1 critical claim(s) failed
  • README may claim a license but no LICENSE file exists
  • release_pipeline_score=2 + pushed in 90-day window
  • EN-only or ZH-only README
  • static-only eval; live e2e pending


| Install method | Platform | Difficulty |
|---|---|---|
| `curl\|sh` one-liner installer | macOS / Linux / WSL / Git Bash | easy |
| PowerShell `irm\|iex` | Windows | easy |
| `git clone` + `pip install -e .` | any | moderate |
Google Sheets (public read)
Hosts the 4 datasets read at runtime via 2 workbooks (market_data + alternative_data)
Free; project owners can rename/delete sheets — single point of failure
Claims: 12 (10 passed, 1 untested, 1 failed)
Score components: +40, +10, +15, 0, 0, -2

11 / 12
passed claim-001

passed claim-002

passed claim-003

passed claim-004

passed claim-005

passed claim-006

untested claim-007

passed claim-008

passed claim-009

failed claim-012

input_contract, output_contract, determinism, idempotence, no_skill_callouts, failure_mode_clarity

workflow_correctness, declared_call_graph, stop_conditions, handoff_points, atom_evidence, error_propagation, partial_failure_handling

  • core user-facing layer untested → capped at 'usable'
  • hybrid-repo rule: archetype 'hybrid-skill' requires end-to-end evaluation of the user-facing layer
  • evidence_completeness='partial' (not portable) → capped at 'usable'

  • critical claim claim-012 failed

archetype: hybrid-skill · core_layer_tested? False · evidence: partial · recommended: unusable · final: unusable
ceiling 1 · core user-facing layer untested → capped at 'usable'
ceiling 2 · hybrid-repo rule: archetype 'hybrid-skill' requires end-to-end evaluation of the user-facing layer
ceiling 3 · evidence_completeness='partial' (not portable) → capped at 'usable'
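
The summary line and the three ceilings above can be read as a simple rule: a ceiling can lower a bucket but never raise it. The sketch below is only an illustration of that reading, not the repo-evals implementation. The function name `apply_ceilings` and the bucket ordering are assumptions; only `unusable` and `usable` appear in this report, and `reusable` is a hypothetical higher tier added so the ordering has more than two levels.

```python
# Bucket order from worst to best. Only "unusable" and "usable" appear in the
# report; "reusable" is a hypothetical higher tier added for illustration.
BUCKET_ORDER = ["unusable", "usable", "reusable"]


def apply_ceilings(recommended: str, ceilings: list[str]) -> str:
    """Return the lowest bucket among the recommendation and every ceiling."""
    rank = {name: position for position, name in enumerate(BUCKET_ORDER)}
    return min([recommended, *ceilings], key=rank.__getitem__)


# The three ceilings listed above all cap at "usable"; the recommendation is
# already lower, so the ceilings have no further effect.
final = apply_ceilings("unusable", ["usable", "usable", "usable"])
print(final)
```

Under this reading, the ceilings only matter when the recommended bucket is higher than `usable`; here the failed critical claim already puts the recommendation below them.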

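| Claim | Description | Severity | Category | Status | Notes |
|---|---|---|---|---|---|
| claim-001 | One-line `curl\|sh` install script (path migrated after the restructure) actually completes its self-check | critical | install | ● passed | |
| claim-002 | dataset_registry.json matches the datasets declared in the README (now 2 workbooks) | critical | dataset-coverage | ● passed | |
| claim-003 | pyproject.toml and install.sh agree on the required Python version | high | install-consistency | ● passed | |
| claim-004 | One-shot request script (request.py) can read a dataset without installation | high | cli-surface | ● passed | |
| claim-005 | TUI degrades gracefully on terminals without curses support and does not raise a traceback | high | terminal-ux | ● passed | |
| claim-006 | Default auto-update is throttled and can be disabled via environment variable | high | update-policy | ● passed | |
| claim-007 | End-to-end happy path: a real event stream is visible after one sync | critical | end-to-end | ○ untested | The repo-evals framework forbids installing untrusted tools on the evaluator's local system. This claim requires a `curl\|sh` install that modifies PATH plus a real Google Sheets fetch. Recommendation: the project should record this e2e run as a CI artifact (one sync run + three-stage exit codes + cache size) so external evaluators can verify it without installing anything. |
| claim-008 | Skill shell + project source boundary (root SKILL.md / AGENTS.md / scripts/project/) | high | skill-shell-boundary | ● passed | |
| claim-009 | GitHub Actions CI actually exists (including skill strict validation + secret scan) | high | ci-pipeline | ● passed | |
| claim-010 | Governance scripts (validate-skill / security-scan / supply-chain-audit) are actually executable | medium | governance-shell | ● passed | |
| claim-011 | Test coverage is thin but reasonably dense (single file, 1622 lines / 81 test functions) | medium | test-coverage | ● passed | |
| claim-012 | Repository still has no LICENSE file (the README MIT badge is not backed by one) | critical | legal | ✕ failed | |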

run-static-checks · 2026-05-04

# TradeCat — final verdict (2026-05-07, full re-eval)

## Repo

- **Name:** tukuaiai/tradecat
- **Branch evaluated:** develop@e1df620 (package version 0.1.0)
- **Archetype:** **hybrid-skill** (changed from `pure-cli` — repo restructured between 2026-05-04 and 2026-05-07)
- **Layer:** **molecule** — Skill shell + 4 dataset readers + sync + probe + TUI wired by predefined orchestration; no LLM-runtime routing
- **Eval framework:** repo-evals layer model v1

## Bucket

**`usable`** — capped by the molecule rule: static layer is unusually clean and even improved since the last eval (CI added, governance shell in place), but actual user value (seeing real market data in a terminal) is still downstream of a live Google Sheets fetch that no static check can validate. The `no-LICENSE-file` defect from the prior eval is still unresolved and now applies to a 935-star repo.

## What was evaluated

### Static layer PASSED (this run)

| Claim | Status | Notes |
|---|---|---|
| 001 install.sh path migrated to scripts/project/install.sh | passed | 288 lines POSIX shell, 5 env-var overrides + 2 CI skip flags |
| 002 dataset registry coverage (now 2 workbooks) | passed | 4 active datasets across market_data + alternative_data workbooks |
| 003 Python version + entry-points consistency | passed | install.sh 3.12 ↔ pyproject ">=3.12" ↔ 3 entry-points present |
| 004 zero-install request.py | passed | 191 lines, references same dataset_registry.json (raw URL) |
| 005 TUI graceful fallback | passed | TUI_FORCE_CURSES_ENV / TUI_ALLOW_WINDOWS_CURSES_ENV + render_safe_plain_tui present |
| 006 auto-update env vars | passed | install.sh has 8 references to NO_AUTO_UPDATE / FORCE_UPDATE / UPDATE_INTERVAL_SECONDS |
| 008 Skill-shell boundary (NEW) | passed | root SKILL.md (197) + AGENTS.md (98) + scripts/project/AGENTS.md (258); references/ has 8 long docs |
| 009 GitHub Actions CI (NEW) | passed | .github/workflows/ci.yml: validate-skill --strict + secret scan + supply-chain audit |
| 010 Governance scripts (NEW) | passed | 8 root shell scripts, all real bash (~580 lines total) |
| 011 Test coverage density (NEW) | passed | single test_cache_tui.py with 81 test functions / 1622 lines — adequate but brittle to refactor |

### Static layer FAILED

| Claim | Status | Notes |
|---|---|---|
| 012 LICENSE file present | **failed** | gh api license=null + 404 on /contents/LICENSE; README MIT badge does not constitute a license. Unchanged from 2026-05-04. |

### Molecule level (deferred)

| Claim | Status | Required |
|---|---|---|
| 007 e2e live sync | untested (skip) | install + sync + render at least one dataset; framework forbids installing untrusted CLI on evaluator's machine |

## Real findings worth surfacing

1. **Repo restructured to a Skill-shell layout in the last 3 days.** Root holds SKILL.md / AGENTS.md / references/ + thin governance scripts; `scripts/project/` holds the Python package, its own AGENTS.md, install.sh, tests. This is a clean reference layout for "Skill outside, project inside" — recommendable to other skill authors who need to bundle a working Python tool.

2. **CI now exists and is non-trivial.** `.github/workflows/ci.yml` runs `validate-skill.sh --strict` (frontmatter + Codex skill alignment), a secret scan over the diff range (push range or PR base..HEAD), and a supply-chain audit. This earns release_pipeline_score 2 (was 1) but is held below 3 because no e2e sync run is captured as a CI artifact.

3. **Honest README — kept this strength.** Still unusually clear about what the tool *doesn't* do: no PostgreSQL writeback, no SQLite, no cloud accounts, no server credentials. That clarity remains a quality signal.

4. **Single-source dataset contract preserved across the move.** `dataset_registry.json` is now under `scripts/project/src/tradecat_terminal/`, but both the installed CLI *and* the zero-install `request.py` (via raw URL) still consume it. Refactor preserved the single-source guarantee.

5. **2-workbook design is new.** Previous registry had 1 workbook; now `market_data` (3 snapshot datasets) + `alternative_data` (event_stream). This is a healthier separation of concerns — alternative data has a different cadence and ownership profile.

6. **Test concentration is the soft spot.** 81 tests in a single 1622-line file is real coverage but a refactor liability. A reader looking for "where is the cache test" or "where is the TUI test" can't tell from filenames.

7. **License debt grew, not shrank.** README still claims MIT via badge, repo still has no LICENSE file, and the star count grew from 928 to 935 in the 3 days between evals. This is the easiest fix on the list (one file commit) and the highest legal cost of any single missing artifact.
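
Finding 4 describes one registry file consumed from two places: the installed package reads it locally and the zero-install script reads it over a raw URL. A minimal sketch of that pattern follows. It is not the project's code: `load_registry` and `dataset_names` are hypothetical names, and the registry schema used here (`workbooks` mapping to lists under `datasets`) is an assumption made only for illustration, since the real schema is not shown in this report.

```python
import json
import urllib.request
from pathlib import Path


def load_registry(source: str) -> dict:
    """Load the registry from a local path or an http(s) URL."""
    if source.startswith(("http://", "https://")):
        with urllib.request.urlopen(source, timeout=10) as resp:
            return json.load(resp)
    return json.loads(Path(source).read_text(encoding="utf-8"))


def dataset_names(registry: dict) -> list[str]:
    """List every dataset across all workbooks.

    Assumed (hypothetical) schema:
    {"workbooks": {"<workbook>": {"datasets": ["<dataset>", ...]}}}
    """
    return sorted(
        name
        for workbook in registry["workbooks"].values()
        for name in workbook["datasets"]
    )
```

Because both callers go through the same loader and the same file, a dataset added to the registry shows up in both entry points without a second edit, which is the single-source property the finding credits.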

## Score deltas vs. 2026-05-04

- **+** 4 new static claims passed (skill-shell, CI, governance, tests)
- **+** release_pipeline_score 1 → 2 (CI now exists)
- **=** has_license still false (penalty unchanged)
- **=** layer ceiling unchanged (still molecule with deferred e2e)
- **=** archetype changed from pure-cli to hybrid-skill (more accurate, no score effect)

## Next steps to raise the score

1. **Add a LICENSE file** matching the README MIT badge. This is the biggest win for the least effort (a one-file commit): it removes the 935-star unlicensed penalty and unblocks forking and redistribution.
2. **Capture an e2e sync run as a CI artifact** (sync exit code + cache size + dataset row count). This would let claim-007 be verified without each evaluator running curl|sh, lifting the molecule ceiling.
3. **Split `test_cache_tui.py`** into focused modules (test_cache.py, test_tui.py, test_sync.py) — improves refactor safety and reads cleaner.
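
Step 2 could be implemented with a small wrapper that runs the sync command in CI and writes a JSON record for upload as an artifact. The sketch below is one possible shape, not a prescription. The name `capture_run` and the record fields are hypothetical, the actual sync command and cache location are not stated in this report and must be supplied by the caller, and the dataset row count is left out because the cache format is not known here.

```python
import json
import subprocess
from pathlib import Path


def dir_size_bytes(path: Path) -> int:
    """Total size of all regular files under path."""
    return sum(p.stat().st_size for p in path.rglob("*") if p.is_file())


def capture_run(command: list[str], cache_dir: Path, out_file: Path) -> dict:
    """Run a command, then record its exit code and the cache size as JSON."""
    result = subprocess.run(command, capture_output=True, text=True)
    record = {
        "command": command,
        "exit_code": result.returncode,
        "cache_bytes": dir_size_bytes(cache_dir) if cache_dir.exists() else 0,
        # Keep only the tail of stdout so the artifact stays small.
        "stdout_tail": result.stdout[-2000:],
    }
    out_file.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return record
```

A CI job would call `capture_run` with the project's sync command and cache directory, then upload the resulting JSON file. An external evaluator can then read the exit code and cache size from the artifact instead of installing the tool locally.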