2026-05-04 · v1.11.0 (release page) / main@HEAD

RedBox

Jamailar/RedBox

🛠 58 / 100


📍xiaohongshu

⚛ atom → ⚗ molecule → 🧬 compound

🛑 0–29 · ⚠️ 30–49 · 🛠 50–79 · 🏭 80–100


✓ 3 claims passed, no critical failures
✓ MIT / Apache / etc., installable per deployment.install_methods
✓ release_pipeline_score=3 + pushed in 90-day window
✓ multilingual_readme=true
⚪ compound layer needs a logged scenario run


autoclaw-cc/xiaohongshu-skills · 🛠 59 · molecule

dreammis/social-auto-upload · ⚠️ 42 · molecule


`Download signed DMG / EXE / DEB`	macOS / Windows / Linux × aarch64+amd64+x86	easy

| Service | Used for | Setup |
|---|---|---|
| OpenAI / Anthropic / Google (any compatible) | LLM for content generation + analysis | Configure endpoint + key + model in Settings; Vercel AI SDK v6 supports openai-compatible too |
| Xiaohongshu (real account) | RedClaw automation drives user's session | Use companion browser extension |
| Image / video generation provider (e.g. GPT-image-2) | Cover image + short-video generation in the creation page | Pay-per-generation; depends on which provider you point at |

· 7

2 1 4

Score components: +40, +9, +12, 0, -3, 0 (sum 58)

3 / 7

passed claim-001

passed claim-002

passed claim-003

untested claim-004

untested claim-005

untested claim-006

untested claim-007

`input_contract`
`output_contract`
`determinism`
`idempotence`
`no_skill_callouts`
`failure_mode_clarity`

`workflow_correctness`
`declared_call_graph`
`stop_conditions`
`handoff_points`
`atom_evidence`
`error_propagation`
`partial_failure_handling`

`goal_achievement`
`direction_judgment`
`quality_judgment`
`meaningful_autonomy`
`handoff_timing`
`observed_call_graph`
`failure_recovery`

core user-facing layer untested → capped at 'usable'
hybrid-repo rule: archetype 'orchestrator' requires end-to-end evaluation of the user-facing layer
evidence_completeness='partial' (not portable) → capped at 'usable'

only 3/5 critical claims covered

archetype: orchestrator → core_layer_tested? False → evidence: partial → recommended: usable → final: usable



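| Claim | Statement | Priority | Area | Status |
|---|---|---|---|---|
| claim-001 | Cross-platform installers actually exist and their versions are consistent | critical | distribution | ● passed |
| claim-002 | Browser-extension manifest matches the capture scope declared in the README | critical | browser-capture | ◐ partial |
| claim-003 | Desktop app uses the Vercel AI SDK and supports a custom endpoint/key/model | critical | ai-providers | ● passed |
| claim-004 | End-to-end creation flow: capture → knowledge base → editor → image generation | critical | end-to-end | ○ untested |
| claim-005 | RedClaw automation can complete a task on its own within a single session | critical | redclaw-automation | ○ untested |
| claim-006 | Background scheduled tasks actually keep running | high | scheduling | ○ untested |
| claim-007 | Failure modes are user-friendly (missing API key / unreachable model / captured site redesigned) | high | error-propagation | ○ untested |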

0.00s

run-static-checks

2026-05-04

0% — tokens in ? / out ?


# RedBox — final verdict (2026-05-04)

## Repo

- **Name:** Jamailar/RedBox
- **Release evaluated:** v1.11.0 (browser-extension v1.9.7)
- **Archetype:** orchestrator
- **Layer:** **compound** — RedClaw automation console runs LLM-driven
  multi-step tasks; background scheduler keeps long-running work alive
- **Eval framework version:** repo-evals layer model v1 (cee2351)

## Bucket

**`usable`** — capped by the compound-layer ceiling rule.

The static layer is in good shape and the distribution / provider /
extension foundations all check out. But the user-facing value
proposition (creation flow, RedClaw automation, background scheduling,
failure-mode UX) is compound-level and has zero logged scenarios on
this evaluator's machine. Per `docs/LAYERS.md`, compound cannot exceed
`usable` without ≥1 logged scenario, and cannot exceed `reusable`
without ≥3.
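
The ceiling rule reads naturally as a clamp on the recommended bucket. A minimal sketch, assuming the buckets form an ordered ladder and that `recommendable` sits directly above `reusable`. The names `BUCKETS`, `compound_ceiling`, and `apply_ceiling` are illustrative and are not taken from `verdict_calculator.py`:

```python
# Ordered from lowest to highest. Only these three buckets are named in this
# verdict; the real framework may define more (assumption).
BUCKETS = ["usable", "reusable", "recommendable"]


def compound_ceiling(logged_scenarios: int) -> str:
    """Highest bucket a compound-layer repo may reach for a scenario count."""
    if logged_scenarios < 1:
        return "usable"      # no logged scenario: cannot exceed `usable`
    if logged_scenarios < 3:
        return "reusable"    # 1-2 logged scenarios: cannot exceed `reusable`
    return BUCKETS[-1]       # 3 or more: this rule imposes no further cap


def apply_ceiling(recommended: str, logged_scenarios: int) -> str:
    """Clamp the recommended bucket to the compound-layer ceiling."""
    ceiling = compound_ceiling(logged_scenarios)
    return min(recommended, ceiling, key=BUCKETS.index)


# This run logged zero compound scenarios, so any recommendation is clamped.
print(apply_ceiling("recommendable", 0))  # prints "usable"
```

Other ceilings (archetype, evidence completeness) would be further clamps applied the same way, with the lowest one winning.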

## What was evaluated

### Atom + molecule level (static, this run)

| Claim | Status | Notes |
|---|---|---|
| 001 distribution | passed | All 7 assets resolve, sizes 14–24 MB (small for Electron — heavy assets likely deferred per build script) |
| 002 capture coverage | passed_with_concerns | 9/10 platforms covered; **YouTube missing from `host_permissions` despite manifest description listing it** |
| 003 ai providers | passed | Vercel `ai` v6 + Anthropic + OpenAI + openai-compatible + Google; Electron 39.6.0 |

### Compound level (deferred)

| Claim | Status | Required for promotion |
|---|---|---|
| 004 end-to-end creation flow | untested | install + provider key + run a real article through workspace |
| 005 RedClaw single-session autonomy | untested | live RedClaw session with multi-step task |
| 006 background scheduling | untested | scheduled task that survives window close |
| 007 user-friendly failure modes | untested | deliberately broken inputs at three layers |

## Real bugs / mismatches surfaced

1. **YouTube capture promised but unimplemented.** The browser-extension
   manifest's own `description` field lists YouTube alongside the other
   capture sources, but `host_permissions` has no `*.youtube.com`
   entries. Without a matching host permission, content scripts will not
   be injected on YouTube pages, so a capture attempt there fails
   silently. Either add the host permission or remove YouTube from the
   description.

2. **Desktop package version lags release tag (cosmetic).**
   `desktop/package.json` is at `1.9.0` while the release tag is
   `v1.11.0`. Not user-visible during install, but a sign the release
   pipeline is not bumping the package version automatically.
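
Both mismatches above could be caught by a mechanical pre-release check. The sketch below is one way to do it, not the project's tooling. The helper names are hypothetical, the manifest fragment is made up for illustration, and the host matching is a simplified reading of extension match patterns (scheme and path are ignored):

```python
def hosts_covering(manifest: dict, domain: str) -> list[str]:
    """Return the host_permissions patterns whose host part covers `domain`."""
    matches = []
    for pattern in manifest.get("host_permissions", []):
        if pattern == "<all_urls>":
            matches.append(pattern)
            continue
        # Match patterns look like <scheme>://<host>/<path>; isolate the host.
        host = pattern.split("://", 1)[-1].split("/", 1)[0]
        if host == "*" or host == domain:
            matches.append(pattern)
        elif host.startswith("*.") and (
            domain == host[2:] or domain.endswith("." + host[2:])
        ):
            matches.append(pattern)
    return matches


def version_tuple(version: str) -> tuple[int, ...]:
    """Turn '1.9.0' or 'v1.11.0' into a tuple of ints for numeric comparison."""
    return tuple(int(part) for part in version.removeprefix("v").split("."))


def package_lags_tag(package_version: str, release_tag: str) -> bool:
    """True when the package version is older than the release tag."""
    return version_tuple(package_version) < version_tuple(release_tag)


# Made-up manifest fragment for illustration; not the extension's real file.
sample_manifest = {
    "description": "Capture from Xiaohongshu, YouTube, ...",
    "host_permissions": ["https://*.xiaohongshu.com/*"],
}

print(hosts_covering(sample_manifest, "www.youtube.com"))  # prints []
print(package_lags_tag("1.9.0", "v1.11.0"))                # prints True
```

Comparing integer tuples matters here: as plain strings, `"1.9.0"` sorts after `"1.11.0"`, which would hide the lag.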

## Why not higher

`usable` is the right ceiling now because:

- The framework's compound rule explicitly caps at `usable` until ≥1
  scenario passes, and at `reusable` until ≥3 — same logic that caps
  hybrid-skill repos with untested LLM layers.
- Even ignoring layers, claim-002 has a real defect (YouTube capture)
  that should not be papered over by averaging.
- Single-evaluator, single-OS, single-day pass — even a clean compound
  scenario would not justify `recommendable` until repeated by other
  operators on other OSes.

## Path to `reusable`

Run the four compound experiments rendered on the dashboard
(`dashboard/repos/Jamailar--RedBox.html`). Each is a system prompt + a
"watch for" list. Log the result in
`repos/Jamailar--RedBox/runs/<date>/run-<scenario>/business-notes.md`
and update the matching claim's `status` in `claims/claim-map.yaml`.
After three scenarios pass with full evidence, re-run `verdict_calculator.py`
and the bucket can move to `reusable`.
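
For operators scripting the logging step, the notes path can be built from its parts. This is a small sketch under two assumptions: `<date>` is an ISO date (matching the `2026-05-04` run date shown on the dashboard), and the scenario name `creation-flow` is a placeholder rather than a name defined by the framework:

```python
from datetime import date
from pathlib import PurePosixPath


def run_notes_path(repo_slug: str, run_date: date, scenario: str) -> PurePosixPath:
    """Build repos/<slug>/runs/<date>/run-<scenario>/business-notes.md."""
    return (
        PurePosixPath("repos")
        / repo_slug
        / "runs"
        / run_date.isoformat()      # assumed ISO format, e.g. 2026-05-04
        / f"run-{scenario}"
        / "business-notes.md"
    )


# "creation-flow" is a hypothetical scenario name used only for illustration.
print(run_notes_path("Jamailar--RedBox", date(2026, 5, 4), "creation-flow"))
# prints repos/Jamailar--RedBox/runs/2026-05-04/run-creation-flow/business-notes.md
```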

## Recommended bucket

```yaml
current_bucket: usable
status: evaluated
```