repo·evals
2026-05-08 · v0.5.3 (npm) / main@2026-05-08

HyperFrames

heygen-com/hyperframes

🏭 89 / 100
01 Signal scanning → 02 Content acquisition → 03 Content understanding → 04 Topic curation → 05 Content production → 06 Creative assembly → 07 Distribution & feedback → 08 Learning

Score legend: 🛑 0–29 · ⚠️ 30–49 · 🛠 50–79 · 🏭 80–100

🏭 · 89 / 100
  • 8 claims passed, no critical failures
  • MIT / Apache / etc., installable per deployment.install_methods
  • release_pipeline_score=3 + pushed in 90-day window
  • EN-only or ZH-only README
  • compound layer needs a logged scenario run


User prompt → Agent picks skill → HTML composition → Frame Adapter → @hyperframes/engine (Puppeteer) → @hyperframes/producer (FFmpeg) → MP4
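The final two stages of the pipeline above (Puppeteer frames handed to FFmpeg, encoded via image2pipe into an MP4) can be sketched. This is a hedged illustration, not the hyperframes API: `buildFfmpegArgs` is a hypothetical helper, and only the FFmpeg flags themselves are standard.

```typescript
// Hypothetical sketch of the producer step: building the FFmpeg argv for an
// image2pipe -> MP4 encode. buildFfmpegArgs is illustrative, not part of
// @hyperframes/producer; the flags are standard FFmpeg options.
function buildFfmpegArgs(fps: number, outFile: string): string[] {
  return [
    "-y",                  // overwrite output without asking
    "-f", "image2pipe",    // read a stream of images from stdin
    "-framerate", String(fps),
    "-i", "-",             // "-" means stdin is the input
    "-c:v", "libx264",     // encode H.264 video
    "-pix_fmt", "yuv420p", // broadly compatible pixel format
    outFile,
  ];
}

// Usage sketch (requires ffmpeg on PATH; each pngBuffer would come from a
// headless Chromium screenshot):
//   import { spawn } from "node:child_process";
//   const ff = spawn("ffmpeg", buildFfmpegArgs(30, "out.mp4"));
//   ff.stdin.write(pngBuffer);
//   ff.stdin.end();

console.log(buildFfmpegArgs(30, "out.mp4").join(" "));
```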

| Install method | Environment | Difficulty |
|---|---|---|
| `npx hyperframes init <name>` | any (Node ≥ 22 + FFmpeg) | easy |
| `npx skills add heygen-com/hyperframes` | Claude Code / Cursor / Codex / Gemini CLI | easy |
| `npm install hyperframes` | any (Node ≥ 22) | easy |
| Dependency | Role | Cost / availability |
|---|---|---|
| FFmpeg | Video encoding (image2pipe → MP4) | Free; must be on PATH |
| Node.js (≥ 22) | Runtime | Free |
| Chrome / Chromium (via Puppeteer) | Headless rendering | Free; auto-downloaded by Puppeteer |
| Kokoro TTS (optional) | Voiceover synthesis (via hyperframes-media) | Local model, free |
| Whisper (optional) | Audio transcription / caption alignment (via hyperframes-media) | Local model, free |
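The two hard prerequisites in the table (Node ≥ 22 and FFmpeg on PATH) can be preflight-checked with a small script. This is a hypothetical helper, not shipped with hyperframes; it uses only the Node standard library.

```typescript
// Hypothetical preflight check for the hard prerequisites: Node >= 22 and
// ffmpeg reachable on PATH. Not part of hyperframes; illustrative only.
import { execFileSync } from "node:child_process";

function meetsNodeRequirement(version: string, minMajor = 22): boolean {
  // process.version looks like "v22.1.0"; compare the major component.
  const major = Number(version.replace(/^v/, "").split(".")[0]);
  return Number.isFinite(major) && major >= minMajor;
}

function ffmpegOnPath(): boolean {
  try {
    // "ffmpeg -version" exits 0 when the binary is found and runnable.
    execFileSync("ffmpeg", ["-version"], { stdio: "ignore" });
    return true;
  } catch {
    return false;
  }
}

console.log(`node ok: ${meetsNodeRequirement(process.version)}`);
console.log(`ffmpeg ok: ${ffmpegOnPath()}`);
```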
Claims: 10 total · 8 passed · 2 deferred
Score deltas: +40 · +28 · +15 · +9 · -3 · 0

8 / 10
passed: claim-001 · claim-002 · claim-003 · claim-004 · claim-005 · claim-006 · claim-007 · claim-008
deferred: claim-009 · claim-010

Atom-layer criteria: input_contract · output_contract · determinism · idempotence · no_skill_callouts · failure_mode_clarity

Molecule-layer criteria: workflow_correctness · declared_call_graph · stop_conditions · handoff_points · atom_evidence · error_propagation · partial_failure_handling

Compound-layer criteria: goal_achievement · direction_judgment · quality_judgment · meaningful_autonomy · handoff_timing · observed_call_graph · failure_recovery

  • core user-facing layer untested → capped at 'usable'
  • hybrid-repo rule: archetype 'hybrid-skill' requires end-to-end evaluation of the user-facing layer
  • evidence_completeness='partial' (not portable) → capped at 'usable'

  • only 4/5 critical claims covered

archetype: hybrid-skill · core_layer_tested: false · evidence: partial · recommended: usable · final: usable

| ID | Claim | Severity | Evidence | Status | Notes |
|---|---|---|---|---|---|
| claim-001 | npm package published and recently updated | critical | support-distribution | ● passed | |
| claim-002 | Monorepo structure matches the README | critical | support-structure | ● passed | |
| claim-003 | skills/ directory matches the README table | critical | support-skills | ● passed | |
| claim-004 | registry/blocks count ≥ 50 | high | support-catalog | ● passed | |
| claim-005 | Apache-2.0 license verified | critical | support-license | ● passed | |
| claim-006 | All three IDE plugin manifests present | high | support-plugins | ● passed | |
| claim-007 | 11 active CI workflows, including regression and cross-platform checks | high | support-eval-discipline | ● passed | |
| claim-008 | High-frequency maintenance: 100+ commits in the last 30 days | high | support-activity | ● passed | |
| claim-009 | End-to-end agent → MP4 run completes | critical | core-llm-output | · deferred | Static eval; no live agent run executed. Requires Node 22 + FFmpeg + Puppeteer + an agent runtime; intentionally out of scope for the first-pass dossier. See repo.yaml.next_step for the upgrade path. |
| claim-010 | Frame Adapter frame-exact seek consistency | high | core-rendering-determinism | · deferred | Static eval; no render run executed. Requires running the producer pipeline locally and frame-extracting both preview and MP4. |
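Deferred claim-010 is a determinism property: frame i at a fixed fps must map to the same timestamp in both the preview path and the MP4 path, so the two renders can be compared frame by frame. A hedged sketch of that mapping (the function name is hypothetical, not the Frame Adapter API):

```typescript
// Frame-exact seek property behind claim-010: a pure, deterministic mapping
// from frame index to timestamp. frameTimestampMs is illustrative only and
// is not the hyperframes Frame Adapter API.
function frameTimestampMs(frameIndex: number, fps: number): number {
  if (!Number.isInteger(frameIndex) || frameIndex < 0 || fps <= 0) {
    throw new RangeError("frameIndex must be a non-negative integer, fps > 0");
  }
  // Rounded integer milliseconds keep the mapping stable across runs.
  return Math.round((frameIndex * 1000) / fps);
}

// At 30 fps: frame 0 -> 0 ms, frame 1 -> 33 ms, frame 30 -> 1000 ms.
console.log([0, 1, 30].map((i) => frameTimestampMs(i, 30)));
```

Verifying the claim would then mean extracting the frame at `frameTimestampMs(i, fps)` from both the preview and the MP4 and diffing the images.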


# Final Verdict

## Repo

- **Name**:
- **Version tested**:
- **Date**:
- **Archetype**:
- **Layer**: (atom | molecule | compound)
- **Score**:  /100  (from `verdict_calculator.py`, not judgement)
- **Category**:  (🏭 Production-ready / 🛠 Available / ⚠️ Risky / 🛑 Don't use)
- **Tier**: (recommend ≥90 / team ≥80 / self ≥65 / try ≥50 / risky ≥30 / broken <30)
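The category and tier fields above are plain threshold lookups over the final score. A hypothetical sketch mirroring the stated bands (category: 🛑 0–29, ⚠️ 30–49, 🛠 50–79, 🏭 80–100; tier: recommend ≥90 / team ≥80 / self ≥65 / try ≥50 / risky ≥30 / broken <30):

```typescript
// Threshold lookups mirroring the dossier's bands. Function names are
// illustrative; the bands themselves come from the legend and tier ladder.
function categoryFor(score: number): string {
  if (score >= 80) return "🏭 Production-ready";
  if (score >= 50) return "🛠 Available";
  if (score >= 30) return "⚠️ Risky";
  return "🛑 Don't use";
}

function tierFor(score: number): string {
  if (score >= 90) return "recommend";
  if (score >= 80) return "team";
  if (score >= 65) return "self";
  if (score >= 50) return "try";
  if (score >= 30) return "risky";
  return "broken";
}

// This dossier's score of 89 lands in Production-ready / team tier.
console.log(categoryFor(89), tierFor(89));
```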

## Plain English

Two sentences max. What does the user get if they adopt this repo today, and what would make them regret it?

- Outcome if adopted:
- Regret scenario:

## Why This Score

State the user-visible outcome first, mechanism second. Lead with what the repo *does* for the user, then the evidence.

### Top 3 score drivers

What earned or cost the most points. Reference `breakdown` from the calculator output.

- +/- :
- +/- :
- +/- :

### Core outcome
What observably works end-to-end? What observably does not?

### Scenario breadth
How many real inputs has it been tested against? Which dimensions vary (platform, data shape, scale)?

### Repeatability
Same input twice → same result? Filesystem-level or only log-level?
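Filesystem-level repeatability can be checked by hashing the produced artifacts from two runs with Node's built-in crypto. A minimal sketch, assuming byte-identical output is the bar (strict encoder settings may or may not meet it, which is itself a useful finding):

```typescript
// Compare two run outputs at the filesystem level via SHA-256 digests.
// Byte-identical artifacts are the strongest repeatability signal; comparing
// logs alone can miss nondeterministic encoder output.
import { createHash } from "node:crypto";

function sha256Hex(data: Buffer | string): string {
  return createHash("sha256").update(data).digest("hex");
}

function sameArtifact(runA: Buffer, runB: Buffer): boolean {
  return sha256Hex(runA) === sha256Hex(runB);
}

// In a real check, the Buffers would be fs.readFileSync of each run's MP4.
const a = Buffer.from("frame-data");
console.log(sameArtifact(a, Buffer.from("frame-data"))); // true
console.log(sameArtifact(a, Buffer.from("frame-DATA"))); // false
```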

### Failure transparency
When it fails, do you learn something actionable, or does it swallow the error?

## What Would Move The Score Up

Concrete, testable next actions in score-impact order. Not "be better" — "add X test against Y fixture showing Z (lifts ~+N)".

1. (~+N)
2. (~+N)
3. (~+N)

## Remaining Risks

Ranked. Each risk with severity + impact + mitigation if known.

| Risk | Severity | Impact | Mitigation |
|---|---|---|---|

## Related Artifacts

- Claim map:
- Plan:
- Runs:
- Verdict calculator input:
- Rendered HTML dossier: