repo·evals · 2026-05-04 · main@HEAD (v0.2.1, JCST'26 paper)

# OpenMAIC (THU-MAIC/OpenMAIC)

🛠 75 / 100
Score buckets: 🛑 0–29 · ⚠️ 30–49 · 🛠 50–79 · 🏭 80–100
  • 5 claims passed, no critical failures
  • MIT / Apache / etc., installable per deployment.install_methods
  • release_pipeline_score=2 + pushed in 90-day window
  • multilingual_readme=true
  • compound layer needs a logged scenario run


Classroom-generation flow:

1. **Topic / document** in (paper / chapter / brief).
2. **Outline strategy?** The LLM picks depth-first or breadth-first:
   - depth-first outline (1 hard concept, deep);
   - breadth-first outline (5 concepts, light).
3. **Whiteboard or simulation?** The LLM decides per topic:
   - whiteboard-agent (math / diagram);
   - simulation-agent (interactive process).
4. **teacher-agent + peer-agent** deliver the lesson (with TTS streaming).
5. **Quiz or continue?** The LLM reads engagement; quiz-agent runs a checkpoint when warranted.
6. **Classroom delivered** (you actually learned).
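That branching can be sketched as plain TypeScript. This is a hypothetical model of the decision points only; every type and function name below is an assumption, not code from the OpenMAIC repo (which uses LangGraph):

```typescript
// Hypothetical model of the classroom-generation routing.
// None of these names come from the OpenMAIC codebase.
type OutlineStrategy = "depth-first" | "breadth-first";
type LessonAgent = "whiteboard-agent" | "simulation-agent";

interface ClassroomPlan {
  strategy: OutlineStrategy;
  conceptCount: number; // 1 deep concept vs 5 light ones
  agent: LessonAgent;
  quizCheckpoint: boolean;
}

// The three callbacks stand in for the LLM decisions at each branch.
function planClassroom(
  pickStrategy: () => OutlineStrategy,
  pickAgent: (concept: string) => LessonAgent,
  engagementLow: () => boolean,
): ClassroomPlan {
  const strategy = pickStrategy();
  return {
    strategy,
    conceptCount: strategy === "depth-first" ? 1 : 5,
    agent: pickAgent("current concept"),
    quizCheckpoint: engagementLow(), // quiz-agent runs as a checkpoint
  };
}
```

The point of the sketch is that every arrow in the diagram is an LLM judgment call, which is exactly why the compound layer cannot be validated statically.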

| Deploy method | Target | Difficulty |
|---|---|---|
| Vercel one-click deploy | Vercel | easy |
| `docker compose up` | any (Docker) | moderate |
| Hosted demo at open.maic.chat | any browser | easy |
| Dependency | Role | Notes |
|---|---|---|
| OpenAI / Anthropic / Google / DeepSeek / Grok | LLM for classroom generation | Per-classroom token cost can be substantial — pick a model and lock spend before opening to non-tech users |
| OpenAI / Azure / GLM / Qwen / MiniMax TTS | Voice synthesis for AI teachers | Optional — disable TTS for text-only mode; self-hosted VoxCPM2 is free |
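The claims table verified a KEY + BASE_URL + MODELS entry point per provider; in a `.env` that pattern might look like the following. The variable names and values are guesses for illustration, not the repo's actual keys:

```shell
# Hypothetical .env sketch — actual variable names may differ.
OPENAI_API_KEY=sk-...
OPENAI_BASE_URL=https://api.openai.com/v1
OPENAI_MODELS=gpt-4o-mini

DEEPSEEK_API_KEY=...
DEEPSEEK_BASE_URL=https://api.deepseek.com
DEEPSEEK_MODELS=deepseek-chat
```

Failover between TTS providers then reduces to pointing the TTS variables at a different backend.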
Claims: 7 total (5 passed, 2 untested). Score deltas: +40 +14 +15 +9 -3 +0 (sum 75).

5 / 7 claims passed (claim-001 through claim-005); claim-006 and claim-007 untested.

Rubric dimensions by layer:

- **Atom:** input_contract, output_contract, determinism, idempotence, no_skill_callouts, failure_mode_clarity
- **Molecule:** workflow_correctness, declared_call_graph, stop_conditions, handoff_points, atom_evidence, error_propagation, partial_failure_handling
- **Compound:** goal_achievement, direction_judgment, quality_judgment, meaningful_autonomy, handoff_timing, observed_call_graph, failure_recovery

  • core user-facing layer untested → capped at 'usable'
  • hybrid-repo rule: archetype 'orchestrator' requires end-to-end evaluation of the user-facing layer
  • evidence_completeness='partial' (not portable) → capped at 'usable'

  • only 2/3 critical claims covered

archetype: orchestrator · core_layer_tested: false · evidence: partial · recommended: usable · final: usable
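The ceiling rules read as: compute a recommended bucket, then apply each applicable cap and never raise. A minimal sketch of that logic, assuming the bucket ordering (only `usable` and `reusable` are named in this report; the other bucket names and all identifiers are assumptions, not the real verdict_calculator):

```typescript
// Hypothetical sketch of ceiling capping — not the actual verdict_calculator.
type Bucket = "unusable" | "risky" | "usable" | "reusable";
const ORDER: Bucket[] = ["unusable", "risky", "usable", "reusable"];

interface EvalContext {
  coreLayerTested: boolean;
  evidence: "partial" | "complete";
}

interface Ceiling {
  reason: string;
  cap: Bucket;
  applies: (ctx: EvalContext) => boolean;
}

const CEILINGS: Ceiling[] = [
  {
    reason: "core user-facing layer untested",
    cap: "usable",
    applies: (ctx) => !ctx.coreLayerTested,
  },
  {
    reason: "evidence_completeness='partial' (not portable)",
    cap: "usable",
    applies: (ctx) => ctx.evidence === "partial",
  },
];

function finalBucket(recommended: Bucket, ctx: EvalContext): Bucket {
  let result = recommended;
  for (const c of CEILINGS) {
    // A ceiling can only lower the bucket, never raise it.
    if (c.applies(ctx) && ORDER.indexOf(result) > ORDER.indexOf(c.cap)) {
      result = c.cap;
    }
  }
  return result;
}
```

With `coreLayerTested: false` and `evidence: "partial"`, even a `reusable` recommendation caps to `usable`, which matches the verdict chain above.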

| Claim | Description | Priority | Area | Status |
|---|---|---|---|---|
| claim-001 | Next.js + React 19 + LangGraph 1.1 stack is real and consistent | critical | tech-stack | ● passed |
| claim-002 | All 5 LLM providers have real env-var entry points | critical | ai-providers | ● passed |
| claim-003 | All 5 TTS providers have real env-var entry points | high | tts-providers | ● passed |
| claim-004 | Ships its own eval harness (not just talk; real self-tests) | high | testing-discipline | ● passed |
| claim-005 | OpenClaw skill actually exists (README integration section is not marketing copy) | high | openclaw-integration | ● passed |
| claim-006 | End-to-end happy path: one sentence → a real classroom | critical | end-to-end | ○ untested |
| claim-007 | AGPL-3.0 + multi-provider deployment cost curve disclosed | high | economics | ○ untested |

Run log: run-static-checks · 2026-05-04 · 0.00s · 0% tokens (in ?, out ?)

# OpenMAIC — final verdict (2026-05-04)

## Repo

- **Name:** THU-MAIC/OpenMAIC
- **Branch evaluated:** main@HEAD (v0.2.1, JCST'26 paper)
- **Archetype:** orchestrator
- **Layer:** **compound** — LangGraph multi-agent classroom
  generation
- **Eval framework:** repo-evals layer model v1 (f9ed1e9)

## Bucket

**`usable`** — strong static layer with rare positive signals
(in-repo eval harness, well-disclosed multi-provider env, clean
OpenClaw integration). The compound rule caps the bucket at `usable`
until at least one live classroom generation is logged.

## What was evaluated

### Atom + molecule level (static, this run)

| Claim | Status | Notes |
|---|---|---|
| 001 tech stack | passed | next 16.1.2 / react 19.2.3 / langgraph ^1.1.1 / tailwind ^4 — matches README badges |
| 002 5 LLM providers | passed | OpenAI/Anthropic/Google/DeepSeek/Grok all with KEY+BASE_URL+MODELS |
| 003 5 TTS providers | passed | OpenAI/Azure/GLM/Qwen/MiniMax all with KEY+BASE_URL; MiniMax has default endpoint |
| 004 eval harness | passed | 2 named eval scripts (eval:whiteboard + eval:outline-language) reference real tsx runners |
| 005 OpenClaw skill | passed | skills/openmaic/SKILL.md (102 lines) with user-invocable, confirmation-heavy SOP |

### Compound level (deferred)

| Claim | Status | Required |
|---|---|---|
| 006 live classroom generation | untested | open.maic.chat or self-hosted; verify slides + quiz + sim + whiteboard + TTS |
| 007 cost transparency | untested | README to add per-classroom token + TTS cost estimate |

## Real findings worth surfacing

1. **In-repo eval harness is rare and disciplined.** Most "AI demo"
   repos don't ship `eval/`. OpenMAIC has two named evals
   (whiteboard-layout, outline-language) with their own runners and
   a `shared/` for common code. That's a strong testing-intent
   signal.

2. **OpenClaw SOP is safety-conscious.** The skill explicitly says
   "Run one phase at a time and ask for confirmation before each
   state-changing step". This is the right posture for a multi-step
   AI orchestrator that might write files / clone repos / start
   services on the user's machine.

3. **TTS surface is unusually broad.** 5 commercial providers + a
   self-hosted VoxCPM2 (added in v0.2.1) means the classroom doesn't
   degrade silently if one provider has issues — the operator can
   fail over.

4. **Active development cadence.** 4 minor releases in the 6 weeks
   leading up to eval (v0.1.0 through v0.2.1). Healthy for an
   academic-affiliated open-source project.
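For finding 1, the harness wiring is presumably plain npm scripts. A guessed sketch of the relevant `package.json` fragment — only the two script names come from the eval; the paths and runner layout are assumptions:

```json
{
  "scripts": {
    "eval:whiteboard": "tsx eval/whiteboard-layout/run.ts",
    "eval:outline-language": "tsx eval/outline-language/run.ts"
  }
}
```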

## Why not higher

`usable` because:

- No live classroom generation logged on this evaluator's machine.
  Compound layer's user value is the multi-agent dance — static
  evidence cannot validate that the agents actually teach
  meaningfully.
- Cost transparency is genuinely missing; non-technical users would
  benefit from a line like "A 30-min classroom on a typical topic
  costs roughly $X with default config."
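That cost line is simple arithmetic once per-classroom volumes are known. A sketch with made-up numbers — every figure and identifier below is an assumption for illustration, not a measurement of OpenMAIC:

```typescript
// Hypothetical per-classroom cost estimate — every number here is assumed.
interface CostModel {
  llmTokensIn: number;       // prompt tokens per classroom
  llmTokensOut: number;      // completion tokens per classroom
  pricePerMTokIn: number;    // USD per 1M input tokens
  pricePerMTokOut: number;   // USD per 1M output tokens
  ttsChars: number;          // characters synthesized
  pricePerMTtsChars: number; // USD per 1M TTS characters
}

function classroomCostUSD(m: CostModel): number {
  const llm =
    (m.llmTokensIn / 1e6) * m.pricePerMTokIn +
    (m.llmTokensOut / 1e6) * m.pricePerMTokOut;
  const tts = (m.ttsChars / 1e6) * m.pricePerMTtsChars;
  return llm + tts;
}

// e.g. a 30-min classroom with assumed volumes and list prices:
const estimate = classroomCostUSD({
  llmTokensIn: 200_000,
  llmTokensOut: 60_000,
  pricePerMTokIn: 2.5,
  pricePerMTokOut: 10,
  ttsChars: 40_000,
  pricePerMTtsChars: 15,
});
// 0.5 + 0.6 + 0.6 ≈ 1.7 USD per classroom under these assumptions
```

Publishing even a rough table like this for each supported provider would close claim-007.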

## Path to `reusable`

1. Run a live classroom on open.maic.chat with a real LLM key.
2. Self-host a fork; verify Vercel one-click deploy works.
3. Try one PDF-upload classroom; verify the OpenClaw skill SOP
   end-to-end.
4. Trigger an LLM-provider failure (revoked key) and verify the
   classroom degrades gracefully.
5. Update claim-006 → `passed`. If the README later adds a cost
   estimate, claim-007 → `passed`. Re-run verdict_calculator.

## Recommended

```yaml
current_bucket: usable
status: evaluated
```