repo·evals · 2026-05-04 · main@HEAD (3.0.3)

QuantDinger

brokermr810/QuantDinger

🛠 76 / 100

  • 01 Market data
  • 02 Intel collection
  • 03 Signal synthesis
  • 04 Methodology routing
  • 05 Backtest validation
  • 06 Risk management
  • 07 Execution engine
  • 08 Journal & review

Score bands:
  • 🛑 0–29
  • ⚠️ 30–49
  • 🛠 50–79
  • 🏭 80–100

🛠 76 / 100
  • 7 claims passed, no critical failures
  • MIT / Apache / etc., installable per deployment.install_methods
  • release_pipeline_score=3 + pushed in 90-day window
  • multilingual_readme=true
  • compound layer needs a logged scenario run


  • Trading idea (e.g., "RSI dip + MA cross")
  • Need new indicator? (LLM decides)
      yes → AI generates indicator code (Python, registered), then backtest
      no → backtest
  • Backtest engine (your data, your params)
  • Promising or iterate? (LLM evaluates results)
      needs work → AI adjusts strategy (re-run backtest)
      ready → Live or paper?
  • Live or paper? (manual-approval gate)
      paper first → Paper account (validate gate)
      approved → Live execution (IBKR / MT5 / crypto)

  • docker compose up -d --build · any (Docker) · moderate
  • AWS Marketplace AMI · AWS EC2 · easy
  • git clone + manual setup · any · hard
  • OpenAI API · LLM for AI strategy / indicator generation · one of three LLM providers; pick at least one
  • DeepSeek API · alternative LLM provider · cheaper alternative; CN-friendly
  • Grok API (x.ai) · alternative LLM provider · third LLM option
  • Interactive Brokers (IBKR) · US stock execution · brokerage account; paper trading available for free
  • MetaTrader 5 · Forex execution (Windows-only) · Windows host required; Linux not supported
  • Crypto exchanges (via ccxt) · crypto execution · ccxt supports many exchanges; trading fees apply
Claims: 8 (7 passed, 1 untested)
Score components: +40, +24, +12, +3, -3, 0 (total 76)

7 / 8 claims passed
  • passed: claim-001, claim-002, claim-003, claim-004, claim-005, claim-006, claim-008
  • untested: claim-007

input_contract
output_contract
determinism
idempotence
no_skill_callouts
failure_mode_clarity

workflow_correctness
declared_call_graph
stop_conditions
handoff_points
atom_evidence
error_propagation
partial_failure_handling

goal_achievement
direction_judgment
quality_judgment
meaningful_autonomy
handoff_timing
observed_call_graph
failure_recovery

  • core user-facing layer untested → capped at 'usable'
  • hybrid-repo rule: archetype 'orchestrator' requires end-to-end evaluation of the user-facing layer
  • evidence_completeness='partial' (not portable) → capped at 'usable'

  • only 4/5 critical claims covered

archetype: orchestrator
core_layer_tested? False
evidence: partial
recommended: usable
final: usable

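| Claim | Statement | Severity | Category | Status |
|---|---|---|---|---|
| claim-001 | "Try in 2 minutes" one-command install really brings up 4 services | critical | install | passed |
| claim-002 | Backend base image matches the declared Python version | high | install-consistency | passed |
| claim-003 | Multi-LLM provider support (OpenAI / DeepSeek / Grok) is genuinely configurable | critical | ai-providers | passed |
| claim-004 | Multi-broker integrations are actually declared in requirements.txt | high | brokers | passed |
| claim-005 | MCP server is a standalone Python package that an AI agent can launch | critical | mcp-integration | passed |
| claim-006 | Default docker-compose does not expose services to the public internet (ports bind to 127.0.0.1) | high | security | passed |
| claim-007 | End-to-end happy path: an MCP agent triggers one backtest and gets a result | critical | end-to-end | untested |
| claim-008 | AI-generated strategies do not automatically place real orders (manual approval) | critical | safety | passed |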

run-static-checks · 2026-05-04

# QuantDinger — final verdict (2026-05-04)

## Repo

- **Name:** brokermr810/QuantDinger
- **Branch evaluated:** main@HEAD (3.0.3)
- **Archetype:** orchestrator
- **Layer:** **compound** — multi-agent AI research, LLM-driven
  strategy and indicator generation, ensemble + reflection
- **Eval framework:** repo-evals layer model v1 (4acbd5d)

## Bucket

**`usable`**: strong static layer. The compound rule caps the bucket at
`usable` until at least one agent-driven scenario has been logged, and a
platform that lets an LLM trade real money needs a verified
manual-approval gate before any higher bucket can be claimed.
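
The capping rule described above can be sketched as a small function. This is an illustration only, not the framework's actual `verdict_calculator`: the bucket ordering, the `Evidence` fields, and the reason strings are assumptions modelled on the ceilings listed in this report. Only `usable` and `reusable` are bucket names that appear here; the others are placeholders.

```python
from dataclasses import dataclass

# Assumed ordering, lowest to highest. Only "usable" and "reusable" come from
# this report; "unusable" and "recommendable" are placeholder names.
BUCKETS = ["unusable", "usable", "reusable", "recommendable"]


@dataclass
class Evidence:
    archetype: str
    core_layer_tested: bool
    evidence_completeness: str  # e.g. "partial" or "portable"


def apply_ceilings(candidate: str, ev: Evidence) -> tuple[str, list[str]]:
    """Return the capped bucket and the reason for every ceiling that fired."""
    reasons = []
    if not ev.core_layer_tested:
        reasons.append("core user-facing layer untested")
    if ev.archetype == "orchestrator" and not ev.core_layer_tested:
        reasons.append("orchestrator requires end-to-end evaluation")
    if ev.evidence_completeness != "portable":
        reasons.append("evidence not portable")
    # Any ceiling caps the result at "usable"; otherwise there is no cap.
    cap = "usable" if reasons else BUCKETS[-1]
    final = min(candidate, cap, key=BUCKETS.index)
    return final, reasons
```

Calling `apply_ceilings("reusable", Evidence("orchestrator", False, "partial"))` under these assumptions yields `usable` with three reasons, which mirrors the three ceilings recorded for this run.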

## What was evaluated

### Atom + molecule level (static, this run)

| Claim | Status | Notes |
|---|---|---|
| 001 4-service compose | passed | postgres + redis + backend + frontend, all healthchecked |
| 002 base-image consistency | passed | python:3.12-slim-bookworm in both Dockerfile and compose |
| 003 multi-LLM | passed | OpenAI / DeepSeek / Grok all with `*_BASE_URL` overrides |
| 004 multi-broker | passed | ccxt + ib_insync + finnhub + yfinance + akshare in requirements; MetaTrader5 conditional + Windows-only note |
| 005 MCP server | passed | quantdinger-mcp 0.1.0 with `mcp>=1.2.0`; supports 5 named agent runtimes |
| 006 default port binding | passed | postgres/redis/backend bind 127.0.0.1; frontend public-by-design |
| 008 live-trading off by default | passed | `AGENT_LIVE_TRADING_ENABLED=false`; env.example references paper-only force-pin |
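
A static check like claim 006 could be approximated as follows. This is a minimal sketch, assuming the compose file uses the short `host_ip:host_port:container_port` syntax; the port numbers in the usage example are illustrative and not taken from the repo, and IPv6 loopback bindings are not handled.

```python
def is_loopback_only(port_mapping: str) -> bool:
    """True if a short-syntax compose port mapping binds to 127.0.0.1."""
    mapping = port_mapping.split("/")[0]  # drop a "/tcp" or "/udp" suffix
    parts = mapping.split(":")
    # Three parts means host_ip:host_port:container_port. Fewer parts means
    # no host IP was given, so the port is published on all interfaces.
    return len(parts) == 3 and parts[0] == "127.0.0.1"


def publicly_bound(services: dict[str, list[str]]) -> list[str]:
    """Names of services with at least one port not pinned to loopback."""
    return sorted(
        name
        for name, ports in services.items()
        if any(not is_loopback_only(p) for p in ports)
    )


# Hypothetical port values, for illustration only.
example = {
    "postgres": ["127.0.0.1:5432:5432"],
    "redis": ["127.0.0.1:6379:6379"],
    "backend": ["127.0.0.1:5000:5000"],
    "frontend": ["8888:80"],
}
```

With this example input, `publicly_bound(example)` returns only the frontend, which matches the "public-by-design" note in the table.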

### Compound level (deferred)

| Claim | Status | Required |
|---|---|---|
| 007 MCP-agent e2e | untested | install + LLM key + Cursor/Claude Code session running a real backtest end-to-end via MCP |
| 008 live-order gating in practice | untested | flip flag in paper-broker test, verify manual approval is enforced (not just UI) |
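
The coverage figures quoted in this report (7 of 8 claims passed, 4 of 5 critical claims covered) follow directly from the claim list. A short sketch of that arithmetic, using the severities and static statuses recorded above; the `coverage` helper is a hypothetical name, not part of the eval framework:

```python
# (severity, status) per claim, as recorded in this report's static run.
CLAIMS = {
    "claim-001": ("critical", "passed"),
    "claim-002": ("high", "passed"),
    "claim-003": ("critical", "passed"),
    "claim-004": ("high", "passed"),
    "claim-005": ("critical", "passed"),
    "claim-006": ("high", "passed"),
    "claim-007": ("critical", "untested"),
    "claim-008": ("critical", "passed"),
}


def coverage(claims, severity=None):
    """Return (covered, total), optionally restricted to one severity."""
    selected = [
        status for sev, status in claims.values() if severity in (None, sev)
    ]
    covered = sum(status != "untested" for status in selected)
    return covered, len(selected)


overall = coverage(CLAIMS)
critical = coverage(CLAIMS, "critical")
```

The one uncovered critical claim is claim-007, which is why the compound layer stays deferred.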

## Real findings worth surfacing

1. **The default safety posture is real.** `AGENT_LIVE_TRADING_ENABLED=false`
   + `paper_only` force-pinned + localhost-only bindings on
   sensitive services together mean the default deploy doesn't
   auto-fire live orders or auto-leak postgres/redis. That's the
   right baseline for a platform where an LLM writes trading code.

2. **MetaTrader5 is structurally Linux-incompatible.** The Python
   package only ships Windows wheels. README mentions "MT5 forex"
   alongside crypto and stocks as a peer; the requirements file is
   honest (Windows-only comment), but a casual reader of the README
   could miss that and pick a Linux server expecting forex to work.
   This belongs in `watch_out`.

3. **OSS / SaaS / Marketplace overlap.** README links to
   ai.quantdinger.com (SaaS) and an AWS Marketplace AMI, and the OSS
   repo itself contains a billing primitive. A user evaluating "is
   this open source?" should check which features are gated and which
   are genuinely free to self-host.

4. **MCP integration is a separately-versioned package.** Not a
   stub or marketing phrase — `mcp_server/` has its own pyproject,
   its own version (0.1.0), its own console_scripts entry. Easier
   to audit than a "we mention MCP somewhere" claim.
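
Finding 2 lends itself to an explicit startup guard, so that a Linux deployment reports the missing forex path instead of failing at import time. The sketch below is an assumption about how such a guard could look, not code from the repo; the broker keys (`"ibkr"`, `"mt5"`, `"ccxt"`) are hypothetical labels.

```python
import sys


def mt5_supported(platform: str = sys.platform) -> bool:
    """The MetaTrader5 package ships Windows wheels only, so gate on platform."""
    return platform.startswith("win")


def enabled_brokers(requested, platform=sys.platform):
    """Split requested brokers into (usable, skipped) for this host."""
    usable, skipped = [], []
    for broker in requested:
        if broker == "mt5" and not mt5_supported(platform):
            skipped.append(broker)
        else:
            usable.append(broker)
    return usable, skipped
```

On a Linux host, `enabled_brokers(["ibkr", "mt5", "ccxt"], "linux")` would keep IBKR and ccxt and report MT5 as skipped, which is the behaviour a README reader should expect.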

## Why not higher

`usable` is the right ceiling because:

- No live agent-driven scenario has been logged. The compound layer
  is exactly the case where static evidence does not translate into
  user-facing trust without a real session.
- The most consequential claim — that an LLM cannot auto-fire live
  orders — is verifiable only with a live test, and is too
  important to assume from one default-off env var.
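
To make the second point concrete: the property a live test would need to confirm is that the env flag and a human approval are both required, not either one alone. The sketch below illustrates that property under stated assumptions. Only the flag name `AGENT_LIVE_TRADING_ENABLED` comes from the repo; the `route_order` function, its return values, and the way the flag is parsed are hypothetical.

```python
import os
from typing import Mapping, Optional


def live_trading_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Read the flag, defaulting to off when it is unset (assumed parsing)."""
    env = os.environ if env is None else env
    value = env.get("AGENT_LIVE_TRADING_ENABLED", "false")
    return value.strip().lower() == "true"


def route_order(order: dict, approved_by: Optional[str],
                env: Optional[Mapping[str, str]] = None) -> str:
    """Decide where an agent-submitted order goes.

    Returns "paper" when live trading is off, "rejected" when it is on but
    nobody approved the order, and "live" only when both conditions hold.
    """
    if not live_trading_enabled(env):
        return "paper"
    if not approved_by:
        return "rejected"
    return "live"
```

A compound safety run would exercise the middle branch: flag on, no approval, and the order must not reach the broker.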

## Path to `reusable`

1. Bring up the stack on a fresh host with `docker compose up -d --build`.
2. Wire MCP into Claude Code (or Cursor) per README Step 2.
3. Ask the agent to run one backtest and capture: tool calls
   actually used, structured artefact returned, token usage.
4. With a paper account: enable live trading, attempt to submit an
   order through the agent, and confirm that a manual-approval step
   intervenes.
5. Log under `runs/<date>/run-{compound-happy,compound-safety}/`.
6. Update claim-007 and the in-practice check for claim-008 to
   `passed` if both work as advertised; re-run verdict_calculator.
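
Steps 3 and 5 above amount to writing one structured record per scenario. A minimal sketch of such a logger follows, assuming a JSON file named `record.json` and the field names shown; both are hypothetical, and only the `runs/<date>/run-<scenario>/` layout comes from this report.

```python
import json
from datetime import date
from pathlib import Path


def log_run(root, scenario, tool_calls, artefact, tokens, run_date=None):
    """Write one scenario record under runs/<date>/run-<scenario>/."""
    run_date = run_date or date.today().isoformat()
    run_dir = Path(root) / "runs" / run_date / f"run-{scenario}"
    run_dir.mkdir(parents=True, exist_ok=True)
    record = {
        "scenario": scenario,
        "date": run_date,
        "tool_calls": tool_calls,  # tool calls the agent actually made
        "artefact": artefact,      # structured result returned by the tool
        "tokens": tokens,          # e.g. {"in": ..., "out": ...}
    }
    path = run_dir / "record.json"
    path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return path
```

Calling it once with `scenario="compound-happy"` and once with `scenario="compound-safety"` would produce the two directories named in step 5.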

## Recommended

```yaml
current_bucket: usable
status: evaluated
```