#1
·
2026-05-04
·v1.11.0 (release page) / main@HEAD
RedBox
Jamailar/RedBox
🛠58 / 100
🗺
📍xiaohongshu
⚛
→
⚗
→
🧬
🛑
0–29
⚠️
30–49
🛠
50–79
🏭
80–100
▼
58
🛠· 58 / 100
- ✓3 claims passed, no critical failures
- ✓MIT / Apache / etc., installable per deployment.install_methods
- ✓release_pipeline_score=3 + pushed in 90-day window
- ✓multilingual_readme=true
- ⚪compound layer needs a logged scenario run
#2
#3
#4
Download signed DMG / EXE / DEB | macOS / Windows / Linux × aarch64+amd64+x86 | easy |
OpenAI / Anthropic / Google (any compatible)
LLM for content generation + analysis
Configure endpoint + key + model in Settings; Vercel AI SDK v6 supports openai-compatible too
Xiaohongshu (real account)
RedClaw automation drives user's session
Use companion browser extension
Image / video generation provider (e.g. GPT-image-2)
Cover image + short-video generation in the creation page
Pay-per-generation; depends on which provider you point at
· 7
2 1 4
| +40 | |
| +9 | |
| +12 | |
| 0 | |
| -3 | |
| 0 |
3 / 7
passed claim-001
passed claim-002
passed claim-003
untested claim-004
untested claim-005
untested claim-006
untested claim-007
input_contract | |
|---|---|
output_contract | |
determinism | |
idempotence | |
no_skill_callouts | |
failure_mode_clarity |
workflow_correctness | |
|---|---|
declared_call_graph | |
stop_conditions | |
handoff_points | |
atom_evidence | |
error_propagation | |
partial_failure_handling |
goal_achievement | |
|---|---|
direction_judgment | |
quality_judgment | |
meaningful_autonomy | |
handoff_timing | |
observed_call_graph | |
failure_recovery |
- core user-facing layer untested → capped at 'usable'
- hybrid-repo rule: archetype 'orchestrator' requires end-to-end evaluation of the user-facing layer
- evidence_completeness='partial' (not portable) → capped at 'usable'
- only 3/5 critical claims covered
archetype: orchestrator→core_layer_tested? False→evidence: partial→recommended: usable→final: usable
ceiling 1 · core user-facing layer untested → capped at 'usable'
ceiling 2 · hybrid-repo rule: archetype 'orchestrator' requires end-to-end evaluation of the user-facing layer
ceiling 3 · evidence_completeness='partial' (not portable) → capped at 'usable'
| claim-001 | 跨平台安装包真实存在且版本一致 | critical | distribution | ● passed | |
| claim-002 | 浏览器插件 manifest 与 README 抓取范围声明一致 | critical | browser-capture | ◐ partial | |
| claim-003 | 桌面端走 Vercel AI SDK,支持自定义 endpoint/key/model | critical | ai-providers | ● passed | |
| claim-004 | 端到端创作流程:捕获 → 知识库 → 编辑器 → 配图 | critical | end-to-end | ○ untested | |
| claim-005 | RedClaw 自动化能在单 session 内独立完成任务 | critical | redclaw-automation | ○ untested | |
| claim-006 | 后台调度任务确实持续运行 | high | scheduling | ○ untested | |
| claim-007 | 失败模式对用户友好(API key 缺失 / 模型不可达 / 抓取站点改版) | high | error-propagation | ○ untested |
0%
0.00s
0
run-static-checks
2026-05-04
0% — tokens in ? / out ?
run-static-checks
2026-05-04
0% — tokens in ? / out ?
# RedBox — final verdict (2026-05-04) ## Repo - **Name:** Jamailar/RedBox - **Release evaluated:** v1.11.0 (browser-extension v1.9.7) - **Archetype:** orchestrator - **Layer:** **compound** — RedClaw automation console runs LLM-driven multi-step tasks; background scheduler keeps long-running work alive - **Eval framework version:** repo-evals layer model v1 (cee2351) ## Bucket **`usable`** — capped by the compound-layer ceiling rule. The static layer is in good shape and the distribution / provider / extension foundations all check out. But the user-facing value proposition (creation flow, RedClaw automation, background scheduling, failure-mode UX) is compound-level and has zero logged scenarios on this evaluator's machine. Per `docs/LAYERS.md`, compound cannot exceed `usable` without ≥1 logged scenario, and cannot exceed `reusable` without ≥3. ## What was evaluated ### Atom + molecule level (static, this run) | Claim | Status | Notes | |---|---|---| | 001 distribution | passed | All 7 assets resolve, sizes 14–24 MB (small for Electron — heavy assets likely deferred per build script) | | 002 capture coverage | passed_with_concerns | 9/10 platforms covered; **YouTube missing from `host_permissions` despite manifest description listing it** | | 003 ai providers | passed | Vercel `ai` v6 + Anthropic + OpenAI + openai-compatible + Google; Electron 39.6.0 | ### Compound level (deferred) | Claim | Status | Required for promotion | |---|---|---| | 004 end-to-end creation flow | untested | install + provider key + run a real article through workspace | | 005 RedClaw single-session autonomy | untested | live RedClaw session with multi-step task | | 006 background scheduling | untested | scheduled task that survives window close | | 007 user-friendly failure modes | untested | deliberately broken inputs at three layers | ## Real bugs / mismatches surfaced 1. **YouTube capture promised but unimplemented.** The browser-extension manifest's own `description` field lists YouTube alongside the other capture sources, but `host_permissions` has no `*.youtube.com` entries. A user attempting to capture from YouTube will silently fail to inject content scripts. Either add the host permission or remove YouTube from the description. 2. **Desktop package version lags release tag (cosmetic).** `desktop/package.json` is at `1.9.0` while the release tag is `v1.11.0`. Not user-visible during install, but a sign the release pipeline is not bumping the package version automatically. ## Why not higher `usable` is the right ceiling now because: - The framework's compound rule explicitly caps at `usable` until ≥1 scenario passes, and at `reusable` until ≥3 — same logic that caps hybrid-skill repos with untested LLM layers. - Even ignoring layers, claim-002 has a real defect (YouTube capture) that should not be papered over by averaging. - Single-evaluator, single-OS, single-day pass — even a clean compound scenario would not justify `recommendable` until repeated by other operators on other OSes. ## Path to `reusable` Run the four compound experiments rendered on the dashboard (`dashboard/repos/Jamailar--RedBox.html`). Each is a system prompt + a "watch for" list. Log the result in `repos/Jamailar--RedBox/runs/<date>/run-<scenario>/business-notes.md` and update the matching claim's `status` in `claims/claim-map.yaml`. After three pass with full evidence, re-run `verdict_calculator.py` and the bucket can move to `reusable`. ## Recommended bucket ```yaml current_bucket: usable status: evaluated ```