# OpenMAIC repo-eval scorecard

2026-05-04 · main@HEAD (v0.2.1, JCST'26 paper)
Repo: THU-MAIC/OpenMAIC
Layers: ⚛ atom → ⚗ molecule → 🧬 compound
Score: 🛠 75 / 100 (bands: 🛑 0–29 · ⚠️ 30–49 · 🛠 50–79 · 🏭 80–100)

- ✓ 5 claims passed, no critical failures
- ✓ MIT / Apache / etc. license, installable per deployment.install_methods
- ✓ release_pipeline_score=2, pushed within the 90-day window
- ✓ multilingual_readme=true
- ⚪ compound layer still needs a logged scenario run
## Deployment options

| Method | Platform | Difficulty |
|---|---|---|
| Vercel one-click deploy | Vercel | easy |
| `docker compose up` | any (Docker) | moderate |
| Hosted demo at open.maic.chat | any browser | easy |
## External providers

| Providers | Role | Notes |
|---|---|---|
| OpenAI / Anthropic / Google / DeepSeek / Grok | LLM for classroom generation | Per-classroom token cost can be substantial; pick a model and lock spend before opening to non-technical users |
| OpenAI / Azure / GLM / Qwen / MiniMax TTS | Voice synthesis for AI teachers | Optional; disable TTS for text-only mode. Self-hosted VoxCPM2 is free |
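Claims 002 and 003 below verify that every provider in this table has a real env-var entry point (KEY + BASE_URL + MODELS). A minimal TypeScript sketch of that configuration pattern; the variable names and the `ProviderConfig` shape are assumptions for illustration, not OpenMAIC's actual code:

```typescript
// Hypothetical per-provider env-var pattern (claims 002/003). Names like
// OPENAI_API_KEY are illustrative assumptions, not OpenMAIC's real variables.
interface ProviderConfig {
  name: string;
  apiKey: string;
  baseUrl?: string;   // optional endpoint override
  models?: string[];  // allow-list, comma-separated in the env var
}

function readProvider(prefix: string): ProviderConfig | undefined {
  const apiKey = process.env[`${prefix}_API_KEY`];
  if (!apiKey) return undefined; // a provider is simply absent when no key is set
  return {
    name: prefix,
    apiKey,
    baseUrl: process.env[`${prefix}_BASE_URL`],
    models: process.env[`${prefix}_MODELS`]?.split(",").map((m) => m.trim()),
  };
}

// One candidate entry per LLM provider named in the claims table.
const llmProviders = ["OPENAI", "ANTHROPIC", "GOOGLE", "DEEPSEEK", "GROK"]
  .map(readProvider)
  .filter((p): p is ProviderConfig => p !== undefined);
```

The same loop would cover the TTS row with the five TTS prefixes.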
## Claims summary

7 claims: 5 passed, 2 untested. Score deltas: +40, +14, +15, +9, -3, 0 (total 75).

- claim-001 · passed
- claim-002 · passed
- claim-003 · passed
- claim-004 · passed
- claim-005 · passed
- claim-006 · untested
- claim-007 · untested
## Rubric

**Atom**

| Criterion | Result |
|---|---|
| input_contract | |
| output_contract | |
| determinism | |
| idempotence | |
| no_skill_callouts | |
| failure_mode_clarity | |

**Molecule**

| Criterion | Result |
|---|---|
| workflow_correctness | |
| declared_call_graph | |
| stop_conditions | |
| handoff_points | |
| atom_evidence | |
| error_propagation | |
| partial_failure_handling | |

**Compound**

| Criterion | Result |
|---|---|
| goal_achievement | |
| direction_judgment | |
| quality_judgment | |
| meaningful_autonomy | |
| handoff_timing | |
| observed_call_graph | |
| failure_recovery | |
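For readers who prefer types to tables, a hypothetical TypeScript rendering of the three rubric layers above; the criterion names come straight from the tables, while `RubricResult` and the layer grouping are assumptions about how repo-evals might store them:

```typescript
// Hypothetical record shape for the rubric tables above. Criterion names are
// from the report; RubricResult and the grouping are illustrative assumptions.
type RubricResult = "pass" | "fail" | "untested" | undefined;

type AtomRubric = Record<
  | "input_contract" | "output_contract" | "determinism"
  | "idempotence" | "no_skill_callouts" | "failure_mode_clarity",
  RubricResult
>;

type MoleculeRubric = Record<
  | "workflow_correctness" | "declared_call_graph" | "stop_conditions"
  | "handoff_points" | "atom_evidence" | "error_propagation"
  | "partial_failure_handling",
  RubricResult
>;

type CompoundRubric = Record<
  | "goal_achievement" | "direction_judgment" | "quality_judgment"
  | "meaningful_autonomy" | "handoff_timing" | "observed_call_graph"
  | "failure_recovery",
  RubricResult
>;
```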
## Ceilings

1. Core user-facing layer untested → capped at 'usable'.
2. Hybrid-repo rule: archetype 'orchestrator' requires end-to-end evaluation of the user-facing layer.
3. evidence_completeness='partial' (not portable) → capped at 'usable'.

Only 2/3 critical claims are covered.

Calculator chain: archetype orchestrator → core_layer_tested? false → evidence partial → recommended 'usable' → final 'usable'.
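The calculator chain reads as a fold of ceilings over the recommended bucket. A minimal TypeScript sketch under that reading; the bucket names `usable` and `reusable` appear in this report, everything else (function names, shape) is assumed:

```typescript
// Sketch of the ceiling chain above. Only 'usable' and 'reusable' are named
// in this report; the rest is an illustrative assumption.
type Bucket = "usable" | "reusable";
const ORDER: Bucket[] = ["usable", "reusable"]; // ascending

function cap(b: Bucket, ceiling: Bucket): Bucket {
  return ORDER.indexOf(b) > ORDER.indexOf(ceiling) ? ceiling : b;
}

interface VerdictInput {
  archetype: string;
  coreLayerTested: boolean;        // was the user-facing layer run end-to-end?
  evidence: "full" | "partial";
  recommended: Bucket;             // bucket suggested by the raw score
}

function finalBucket(v: VerdictInput): Bucket {
  let b = v.recommended;
  // Ceilings 1–2: an orchestrator with an untested core layer caps at 'usable'.
  if (v.archetype === "orchestrator" && !v.coreLayerTested) b = cap(b, "usable");
  // Ceiling 3: partial (non-portable) evidence also caps at 'usable'.
  if (v.evidence === "partial") b = cap(b, "usable");
  return b;
}

// This report's inputs reproduce the chain: recommended 'usable' → final 'usable'.
finalBucket({ archetype: "orchestrator", coreLayerTested: false,
              evidence: "partial", recommended: "usable" });
```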
## Claim details

| ID | Claim | Severity | Area | Status |
|---|---|---|---|---|
| claim-001 | Next.js + React 19 + LangGraph 1.1 stack matches the real code | critical | tech-stack | ● passed |
| claim-002 | All 5 LLM providers have real env-var entry points | critical | ai-providers | ● passed |
| claim-003 | All 5 TTS providers have real env-var entry points | high | tts-providers | ● passed |
| claim-004 | Ships its own eval harness (not just talk; it actually self-tests) | high | testing-discipline | ● passed |
| claim-005 | OpenClaw skill really exists (the README integration section is not marketing copy) | high | openclaw-integration | ● passed |
| claim-006 | End-to-end happy path: one sentence in, a real classroom out | critical | end-to-end | ○ untested |
| claim-007 | AGPL-3.0 license plus disclosed multi-provider deployment cost curve | high | economics | ○ untested |
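Claim-004's eval harness is worth picturing concretely. A hypothetical sketch of what a tsx runner like `eval:outline-language` might look like; only the script name comes from this report, and the check itself is illustrative:

```typescript
// Hypothetical tsx eval runner in the spirit of eval:outline-language.
// The real runner's logic is not in this report; this only shows the pattern
// of a small, scriptable, in-repo eval.
interface EvalCase { topic: string; requestedLanguage: "en" | "zh"; }
interface EvalResult { evalCase: EvalCase; pass: boolean; detail: string; }

async function runOutlineLanguageEval(
  generateOutline: (topic: string, language: string) => Promise<string>,
  cases: EvalCase[],
): Promise<EvalResult[]> {
  const results: EvalResult[] = [];
  for (const c of cases) {
    const outline = await generateOutline(c.topic, c.requestedLanguage);
    // Crude check: a zh outline should contain CJK characters, an en one should not.
    const hasCjk = /[\u4e00-\u9fff]/.test(outline);
    const pass = c.requestedLanguage === "zh" ? hasCjk : !hasCjk;
    results.push({ evalCase: c, pass, detail: pass ? "ok" : "language mismatch" });
  }
  const passed = results.filter((r) => r.pass).length;
  console.log(`outline-language: ${passed}/${results.length} passed`);
  return results;
}
```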
## Runs

- run-static-checks · 2026-05-04 · 0.00s · 0% (tokens in ? / out ?)
# OpenMAIC final verdict (2026-05-04)

## Repo

- **Name:** THU-MAIC/OpenMAIC
- **Branch evaluated:** main@HEAD (v0.2.1, JCST'26 paper)
- **Archetype:** orchestrator
- **Layer:** **compound** (LangGraph multi-agent classroom generation)
- **Eval framework:** repo-evals layer model v1 (f9ed1e9)

## Bucket

**`usable`**: strong static layer with rare positive signals (in-repo eval harness, well-disclosed multi-provider env, clean OpenClaw integration). The compound rule caps the bucket at `usable` until at least one live classroom generation is logged.

## What was evaluated

### Atom + molecule level (static, this run)

| Claim | Status | Notes |
|---|---|---|
| 001 tech stack | passed | next 16.1.2 / react 19.2.3 / langgraph ^1.1.1 / tailwind ^4; matches README badges |
| 002 5 LLM providers | passed | OpenAI/Anthropic/Google/DeepSeek/Grok, all with KEY+BASE_URL+MODELS |
| 003 5 TTS providers | passed | OpenAI/Azure/GLM/Qwen/MiniMax, all with KEY+BASE_URL; MiniMax has a default endpoint |
| 004 eval harness | passed | 2 named eval scripts (eval:whiteboard + eval:outline-language) reference real tsx runners |
| 005 OpenClaw skill | passed | skills/openmaic/SKILL.md (102 lines) with a user-invocable, confirmation-heavy SOP |

### Compound level (deferred)

| Claim | Status | Required |
|---|---|---|
| 006 live classroom generation | untested | open.maic.chat or self-hosted; verify slides + quiz + sim + whiteboard + TTS |
| 007 cost transparency | untested | README to add a per-classroom token + TTS cost estimate |

## Real findings worth surfacing

1. **In-repo eval harness is rare and disciplined.** Most "AI demo" repos don't ship `eval/`. OpenMAIC has two named evals (whiteboard-layout, outline-language) with their own runners and a `shared/` directory for common code. That's a strong testing-intent signal.
2. **The OpenClaw SOP is safety-conscious.** The skill explicitly says "Run one phase at a time and ask for confirmation before each state-changing step". This is the right posture for a multi-step AI orchestrator that may write files, clone repos, or start services on the user's machine.
3. **The TTS surface is unusually broad.** Five commercial providers plus a self-hosted VoxCPM2 (added in v0.2.1) mean the classroom doesn't degrade silently when one provider has issues; the operator can fail over (see the sketch at the end of this verdict).
4. **Active development cadence.** Four minor releases in the six weeks leading up to this eval (v0.1.0 through v0.2.1). Healthy for an academic-affiliated open-source project.

## Why not higher

`usable` because:

- No live classroom generation has been logged on this evaluator's machine. The compound layer's user value is the multi-agent dance, and static evidence cannot validate that the agents actually teach meaningfully.
- Cost transparency is genuinely missing; non-technical users would benefit from a "30-min classroom on a typical topic costs roughly $X with the default config" line in the README.

## Path to `reusable`

1. Run a live classroom on open.maic.chat with a real LLM key.
2. Self-host a fork; verify the Vercel one-click deploy works.
3. Try one PDF-upload classroom; verify the OpenClaw skill SOP end-to-end.
4. Trigger an LLM-provider failure (revoked key) and verify the classroom degrades gracefully.
5. Update claim-006 → `passed`. If the README later adds a cost estimate, claim-007 → `passed`. Re-run verdict_calculator.

## Recommended

```yaml
current_bucket: usable
status: evaluated
```
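Finding 3 above implies a straightforward failover pattern across the five TTS providers. A minimal TypeScript sketch; the `TtsProvider` interface and the priority ordering are assumptions, and only the behavior (no silent degradation, operator-chosen order) comes from the report:

```typescript
// Minimal TTS failover sketch for finding 3. The interface is an illustrative
// assumption; only the provider-failover behavior is described in the report.
interface TtsProvider {
  name: string;
  synthesize(text: string): Promise<ArrayBuffer>;
}

async function speakWithFailover(
  providers: TtsProvider[],   // operator-chosen priority order
  text: string,
): Promise<ArrayBuffer | null> {
  for (const p of providers) {
    try {
      return await p.synthesize(text); // first healthy provider wins
    } catch (err) {
      console.warn(`TTS provider ${p.name} failed, trying next`, err);
    }
  }
  return null; // all providers down → caller falls back to text-only mode
}
```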