v1.0.2

Ralph Loops

qlifebot-coder

Runs autonomous iterative AI loops for requirements, planning, or building phases using structured prompts and fresh context per iteration.

Downloads: 1.7k, Stars: 0, Versions: 3, Updated: 2026-02-24

Install

    npx clawhub@latest install ralph-loops

Documentation

Ralph Loops Skill

> First time? Read [SETUP.md](./SETUP.md) first to install dependencies and verify your setup.

Autonomous AI agent loops for iterative development. Based on Geoffrey Huntley's Ralph Wiggum technique, as documented by Clayton Farr.

- Script: skills/ralph-loops/scripts/ralph-loop.mjs
- Dashboard: skills/ralph-loops/dashboard/ (run with node server.mjs)
- Templates: skills/ralph-loops/templates/
- Archive: ~/clawd/logs/ralph-archive/

---

⚠️ Known Issues

Claude Code Version Compatibility

Claude Code 2.1.29 has a critical bug that spawns orphaned sub-agents consuming 99% CPU. Iterations fail with "exit code null" on first run.

Fix: Downgrade to 2.1.25:

    npm install -g @anthropic-ai/claude-code@2.1.25

Verify:

    claude --version  # Should show 2.1.25

This was discovered on 2026-02-01. Before upgrading, check whether a newer version fixes the issue.

---

⚠️ Don't Block the Conversation!

When running a Ralph loop, don't monitor it synchronously. The loop runs as a separate Claude CLI process — you can keep chatting.

❌ Wrong (blocks conversation):

    Start loop → sleep 60 → poll → sleep 60 → poll → ... (6 minutes of silence)

✅ Right (stays responsive):

    Start loop → "It's running, I'll check periodically" → keep chatting → check on heartbeats

How to monitor without blocking:

1. Start the loop with node ralph-loop.mjs ... (runs in background)

2. Tell human: "Loop running. I'll check progress periodically or you can ask."

3. Check via process poll <sessionId> when asked or during heartbeats

4. Use the dashboard at http://localhost:3939 for real-time visibility

The loop is autonomous — that's the whole point. Don't babysit it at the cost of ignoring your human.
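
A minimal sketch of the non-blocking pattern in plain shell, for cases where no session tooling is available. The helper names `start_loop_bg` and `loop_status`, and the PID and log file paths, are hypothetical and not part of the skill.

```shell
#!/bin/sh
# Hypothetical helpers: start a long-running command in the background,
# record its PID, and check on it later without sleeping or polling.

start_loop_bg() {
  # usage: start_loop_bg <pid_file> <log_file> <command> [args...]
  pid_file=$1; log_file=$2; shift 2
  nohup "$@" >"$log_file" 2>&1 &
  echo $! >"$pid_file"
}

loop_status() {
  # usage: loop_status <pid_file>  -> prints "running" or "finished"
  if kill -0 "$(cat "$1")" 2>/dev/null; then
    echo "running"
  else
    echo "finished"
  fi
}
```

For example, `start_loop_bg /tmp/ralph.pid /tmp/ralph.log node skills/ralph-loops/scripts/ralph-loop.mjs --prompt ./PROMPT_build.md --max 20` returns immediately, and `loop_status /tmp/ralph.pid` can be called when the human asks or on a heartbeat.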

---

Trigger Phrases

When human says:

| Phrase | Action |
|--------|--------|
| "Interview me about system X" | Start Phase 1 requirements interview |
| "Start planning system X" | Run ./loop.sh plan (needs specs first) |
| "Start building system X" | Run ./loop.sh build (needs plan first) |
| "Ralph loop over X" | ASK which phase (see below) |

When Human Says "Ralph Loop" — Clarify the Phase!

Don't assume which phase. Ask:

> "Which type of Ralph loop are we doing?

>

> 1️⃣ Interview — I'll ask you questions to build specs (Phase 1)

> 2️⃣ Planning — I'll iterate on an implementation plan (Phase 2)

> 3️⃣ Building — I'll implement from a plan, one task per iteration (Phase 3)

> 4️⃣ Generic — Simple iterative refinement on a single topic"

Then proceed based on their answer:

| Choice | Action |
|--------|--------|
| Interview | Use templates/requirements-interview.md protocol |
| Planning | Need specs first → run planning loop with PROMPT_plan.md |
| Building | Need plan first → run build loop with PROMPT_build.md |
| Generic | Create prompt file, run ralph-loop.mjs directly |

Generic Ralph Loop Flow (Option 4)

For simple iterative refinement (not full system builds):

1. Clarify the task — What exactly should be improved/refined?

2. Create a prompt file — Save to /tmp/ralph-prompt-<task>.md

3. Set completion criteria — What signals "done"?

4. Run the loop:

       node skills/ralph-loops/scripts/ralph-loop.mjs \
         --prompt "/tmp/ralph-prompt-<task>.md" \
         --model opus \
         --max 10 \
         --done "RALPH_DONE"

5. Or spawn as sub-agent for long-running tasks
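
Steps 2 and 3 can be sketched as two small shell helpers: one writes a prompt file that ends with an explicit completion instruction, the other checks agent output for the marker on a line of its own. Both `write_prompt` and `is_done` are hypothetical names, and how ralph-loop.mjs itself evaluates `--done` is not shown in this document, so treat this only as an illustration of the idea.

```shell
#!/bin/sh
# Hypothetical helpers for steps 2-3: write a prompt file that ends with an
# explicit completion instruction, and test agent output for the marker.

write_prompt() {
  # usage: write_prompt <file> <task description> <done marker>
  cat >"$1" <<EOF
# Task
$2

# Completion
When nothing is left to improve, print the single line $3 and stop.
EOF
}

is_done() {
  # usage: is_done <output_file> <done marker>
  # Exit status 0 only when some line consists of exactly the marker.
  grep -qxF -- "$2" "$1"
}
```

Matching the marker as a whole line avoids a false positive when the agent merely quotes the instruction back.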

---

Core Philosophy

> "Human roles shift from 'telling the agent what to do' to 'engineering conditions where good outcomes emerge naturally through iteration.'"

> — Clayton Farr

Three principles drive everything:

1. Context is scarce — With ~176K usable tokens from a 200K window, keep each iteration lean

2. Plans are disposable — A drifting plan is cheaper to regenerate than salvage

3. Backpressure beats direction — Engineer environments where wrong outputs get rejected automatically

---

Three-Phase Workflow

    ┌─────────────────────────────────────────────────────────────────────┐
    │  Phase 1: REQUIREMENTS                                              │
    │  Human + LLM conversation → JTBD → Topics → specs/*.md              │
    ├─────────────────────────────────────────────────────────────────────┤
    │  Phase 2: PLANNING                                                  │
    │  Gap analysis (specs vs code) → IMPLEMENTATION_PLAN.md              │
    ├─────────────────────────────────────────────────────────────────────┤
    │  Phase 3: BUILDING                                                  │
    │  One task per iteration → fresh context → backpressure → commit     │
    └─────────────────────────────────────────────────────────────────────┘

Phase 1: Requirements (Talk to Human)

Goal: Understand what to build BEFORE building it.

This is the most important phase. Use structured conversation to:

1. Identify Jobs to Be Done (JTBD)
   - What user need or outcome are we solving?
   - Not features — outcomes

2. Break JTBD into Topics of Concern
   - Each topic = one distinct aspect/component
   - Use the "one sentence without 'and'" test
     - ✓ "The color extraction system analyzes images to identify dominant colors"
     - ✗ "The user system handles authentication, profiles, and billing" → 3 topics

3. Create Specs for Each Topic
   - One markdown file per topic in specs/
   - Capture requirements, acceptance criteria, edge cases

Template: templates/requirements-interview.md

Phase 2: Planning (Gap Analysis)

Goal: Create a prioritized task list without implementing anything.

Uses PROMPT_plan.md in the loop:

- Study all specs
- Study existing codebase
- Compare specs vs code (gap analysis)
- Generate IMPLEMENTATION_PLAN.md with prioritized tasks
- NO implementation — planning only

Usually completes in 1-2 iterations.

Phase 3: Building (One Task Per Iteration)

Goal: Implement tasks one at a time with fresh context.

Uses PROMPT_build.md in the loop:

1. Read IMPLEMENTATION_PLAN.md

2. Pick the most important task

3. Investigate codebase (don't assume not implemented)

4. Implement

5. Run validation (backpressure)

6. Update plan, commit

7. Exit → fresh context → next iteration

Key insight: One task per iteration keeps context lean. The agent stays in the "smart zone" instead of accumulating cruft.

Why fresh context matters:

- No accumulated mistakes — Each iteration starts clean; previous errors don't compound
- Full context budget — The whole window (~176K usable of 200K tokens) goes to THIS task, not shared with finished work
- Reduced hallucination — Shorter contexts = more grounded responses
- Natural checkpoints — Each commit is a save point; easy to revert single iterations
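
The build loop described above can be sketched as a small shell function. This is not the skill's loop.sh or ralph-loop.mjs; `ralph_loop` is a hypothetical name, and the sketch assumes the agent command reads its prompt from stdin and prints its result to stdout.

```shell
#!/bin/sh
# Sketch of the outer loop: each iteration is a separate process, so nothing
# carries over between iterations except files on disk (plan, code, git).

ralph_loop() {
  # usage: ralph_loop <prompt_file> <max_iterations> <done_marker> <agent command...>
  prompt=$1; max=$2; marker=$3; shift 3
  i=1
  while [ "$i" -le "$max" ]; do
    echo "iteration $i"
    # A fresh process per iteration: the agent reads the prompt on stdin.
    out=$("$@" <"$prompt") || return 1
    printf '%s\n' "$out"
    # Stop early when the agent prints the marker on a line of its own.
    if printf '%s\n' "$out" | grep -qxF -- "$marker"; then
      return 0
    fi
    i=$((i + 1))
  done
  return 2  # iteration budget exhausted without seeing the marker
}
```

A call might look like `ralph_loop PROMPT_build.md 20 RALPH_DONE claude -p --dangerously-skip-permissions`, assuming the Claude CLI's print mode accepts the prompt on stdin in your installed version. Because every iteration is a new process, the only memory between iterations is what the agent wrote to IMPLEMENTATION_PLAN.md and committed.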

---

File Structure

    project/
    ├── loop.sh                  # Ralph loop script
    ├── PROMPT_plan.md           # Planning mode instructions
    ├── PROMPT_build.md          # Building mode instructions
    ├── AGENTS.md                # Operational guide (~60 lines max)
    ├── IMPLEMENTATION_PLAN.md   # Prioritized task list (generated)
    └── specs/                   # Requirement specs
        ├── topic-a.md
        ├── topic-b.md
        └── ...

File Purposes

| File | Purpose | Who Creates |
|------|---------|-------------|
| specs/*.md | Source of truth for requirements | Human + Phase 1 |
| PROMPT_plan.md | Instructions for planning mode | Copy from template |
| PROMPT_build.md | Instructions for building mode | Copy from template |
| AGENTS.md | Build/test/lint commands | Human + Ralph |
| IMPLEMENTATION_PLAN.md | Task list with priorities | Ralph (Phase 2) |

Project Organization (Systems)

For Clawdbot systems, each Ralph project lives in <workspace>/systems/<name>/:

    systems/
    ├── health-tracker/              # Example system
    │   ├── specs/
    │   │   ├── daily-tracking.md
    │   │   └── test-scheduling.md
    │   ├── PROMPT_plan.md
    │   ├── PROMPT_build.md
    │   ├── AGENTS.md
    │   ├── IMPLEMENTATION_PLAN.md   # ← exists = past Phase 1
    │   └── src/
    └── activity-planner/
        ├── specs/                   # ← empty = still in Phase 1
        └── ...

Phase Detection (Auto)

Detect current phase by checking what files exist:

| What Exists | Current Phase | Next Action |
|-------------|---------------|-------------|
| Nothing / empty specs/ | Phase 1: Requirements | Run requirements interview |
| specs/*.md but no IMPLEMENTATION_PLAN.md | Ready for Phase 2 | Run ./loop.sh plan |
| specs/*.md + IMPLEMENTATION_PLAN.md | Phase 2 or 3 | Review plan, run ./loop.sh build |
| Plan shows all tasks complete | Done | Archive or iterate |

Quick check:

    # What phase are we in?
    [ -z "$(ls specs/ 2>/dev/null)" ] && echo "Phase 1: Need specs" && exit
    [ ! -f IMPLEMENTATION_PLAN.md ] && echo "Phase 2: Need plan" && exit
    echo "Phase 3: Ready to build (or done)"
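
The quick check cannot tell "ready to build" from "done". A slightly fuller sketch as a function, covering the last row of the table, is shown here. It assumes open tasks in IMPLEMENTATION_PLAN.md are written as unchecked Markdown checkboxes (`- [ ]`), which this document does not specify; `detect_phase` is a hypothetical name.

```shell
#!/bin/sh
# Sketch: phase detection as a function, including the "done" row of the table.
# Assumption: open tasks in IMPLEMENTATION_PLAN.md are unchecked Markdown
# checkboxes ("- [ ] ..."); adjust the pattern to your plan's format.

detect_phase() {
  # usage: detect_phase [project_dir]
  dir=${1:-.}
  if ! ls "$dir"/specs/*.md >/dev/null 2>&1; then
    echo "Phase 1: Need specs"
  elif [ ! -f "$dir/IMPLEMENTATION_PLAN.md" ]; then
    echo "Phase 2: Need plan"
  elif grep -qF -- '- [ ]' "$dir/IMPLEMENTATION_PLAN.md"; then
    echo "Phase 3: Ready to build"
  else
    echo "Done: no open tasks in plan"
  fi
}
```

Run it as `detect_phase systems/health-tracker` before choosing between `./loop.sh plan` and `./loop.sh build`.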

---

JTBD Breakdown

The hierarchy matters:

    JTBD (Job to Be Done)
    └── Topic of Concern (1 per spec file)
        └── Tasks (many per topic, in IMPLEMENTATION_PLAN.md)

Example:

- JTBD: "Help designers create mood boards"
- Topics:
  - Image collection → specs/image-collection.md
  - Color extraction → specs/color-extraction.md
  - Layout system → specs/layout-system.md
  - Sharing → specs/sharing.md
- Tasks: Each spec generates multiple implementation tasks

Topic Scope Test

> Can you describe the topic in one sentence without "and"?

If you need "and" or "also", it's probably multiple topics. Split it.

When to split:
- Multiple verbs describing unrelated operations → separate topics
- Different user personas involved → separate topics
- Could be implemented by different teams → separate topics
- Has its own failure modes → probably its own topic

Example split:

❌ "User management handles registration, authentication, profiles, and permissions"

✅ Split into:

- "Registration creates new user accounts from email/password"
- "Authentication verifies user identity via login flow"
- "Profiles let users view and edit their information"
- "Permissions control what actions users can perform"

Counter-example (don't split):

✅ Keep together:

"Color extraction analyzes images and returns dominant color palettes"

Why: "analyzes" and "returns" are steps in one operation, not separate concerns.
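
The scope test can be approximated mechanically as a first pass. The sketch below is a heuristic only: it flags sentences containing "and", "also", or a comma for human review, and it will also flag legitimate single-operation sentences such as the counter-example above. `topic_scope_check` is a hypothetical name.

```shell
#!/bin/sh
# Heuristic only: flags topic sentences that contain "and"/"also" or a comma
# list so a human can decide whether to split. It cannot tell a multi-step
# single operation from several separate concerns.

topic_scope_check() {
  # usage: topic_scope_check "<one-sentence topic description>"
  if printf '%s\n' "$1" | grep -qiwE 'and|also' ||
     printf '%s\n' "$1" | grep -q ','; then
    echo "review"
  else
    echo "ok"
  fi
}
```

A "review" result means "look again", not "must split"; the final call stays with the human during the Phase 1 interview.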

---

Backpressure Mechanisms

Autonomous loops converge when wrong outputs get rejected. Three layers:

1. Downstream Gates (Hard)

Tests, type-checking, linting, build validation. Deterministic.

In AGENTS.md:

    Validation
    - Tests: npm test
    - Typecheck: npm run typecheck
    - Lint: npm run lint

2. Upstream Steering (Soft)

Existing code patterns guide the agent. It discovers conventions through exploration.

3. LLM-as-Judge (Subjective)

For subjective criteria (tone, UX, aesthetics), use another LLM call with binary pass/fail.

> Start with hard gates. Add LLM-as-judge for subjective criteria only after mechanical backpressure works.
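
A minimal sketch of the hard-gate layer: run each validation command in order and reject at the first failure, so a commit can only follow a fully green run. `run_gates` is a hypothetical helper, not something the templates ship.

```shell
#!/bin/sh
# Sketch of hard backpressure: run each gate in order and stop at the first
# failure, so a commit only happens when every gate passes.

run_gates() {
  # usage: run_gates "<cmd1>" "<cmd2>" ...   (each argument is one shell command)
  for gate in "$@"; do
    echo "gate: $gate"
    if ! sh -c "$gate"; then
      echo "REJECTED by: $gate"
      return 1
    fi
  done
  echo "all gates passed"
}
```

With the AGENTS.md commands above this would be `run_gates "npm test" "npm run typecheck" "npm run lint" && git commit -am "..."`. The non-zero exit status is the backpressure: the agent sees the rejection and has to fix the cause before anything is committed.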

---

Prompt Structure

Geoffrey's prompts follow a numbered pattern:

| Section | Purpose |
|---------|---------|
| 0a-0d | Orient: Study specs, source, current plan |
| 1-4 | Main instructions: What to do this iteration |
| 999+ | Guardrails: Invariants (higher number = more critical) |

The Numbered Guardrails Pattern

Guardrails use escalating numbers (99999, 999999, 9999999...) to signal priority:

    99999. Important: Capture the why in documentation.
    999999. Important: Single sources of truth, no migrations.
    9999999. Create git tags after successful builds.
    99999999. Add logging if needed to debug.
    999999999. Keep IMPLEMENTATION_PLAN.md current.

Why this works:

1. Visual prominence — Large numbers stand out, harder to skip

2. Implicit priority — More 9s = more critical (like DEFCON levels in reverse)

3. No collisions — Sparse numbering lets you insert new rules without renumbering

4. Mnemonic — Claude treats these as invariants, not suggestions

The "Important:" prefix is deliberate — it triggers Claude's attention.
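
As a small illustration of the numbering scheme, the sketch below prefixes a list of rules with escalating runs of 9s, starting at 99999 as in the example. `number_guardrails` is a hypothetical helper; the real prompts are written by hand.

```shell
#!/bin/sh
# Sketch: number a list of guardrails with escalating runs of 9s, starting at
# 99999 as in the example. Later lines get more 9s (higher priority).

number_guardrails() {
  # usage: number_guardrails < rules.txt   (one guardrail per line)
  nines=99999
  while IFS= read -r rule; do
    printf '%s. %s\n' "$nines" "$rule"
    nines="${nines}9"
  done
}
```

Order the input from least to most critical, since each following line receives one more 9.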

Key Language Patterns

Use Geoffrey's specific phrasing — it matters:

- "study" (not "read" or "look at")
- "don't assume not implemented" (critical!)
- "using parallel subagents" / "up to N subagents"
- "only 1 subagent for build/tests" (backpressure control)
- "Ultrathink" (deep reasoning trigger)
- "capture the why"
- "keep it up to date"
- "resolve them or document them"

---

Quick Start

1. Set Up Project Structure

    mkdir -p myproject/specs
    cd myproject
    git init  # Ralph expects git for commits

    # Copy templates
    cp .//templates/PROMPT_plan.md .
    cp .//templates/PROMPT_build.md .
    cp .//templates/AGENTS.md .
    cp .//templates/loop.sh .
    chmod +x loop.sh

2. Customize Templates (Required!)

PROMPT_plan.md — Replace [PROJECT_GOAL] with your actual goal:

Before:

ULTIMATE GOAL: We want to achieve [PROJECT_GOAL].

After:

ULTIMATE GOAL: We want to achieve a fully functional mood board app with image upload and color extraction.

PROMPT_build.md — Adjust source paths if not using src/:

Before:

0c. For reference, the application source code is in src/*.

After:

0c. For reference, the application source code is in lib/*.

AGENTS.md — Update build/test/lint commands for your stack.

3. Phase 1: Requirements Gathering (Don't Skip!)

This phase happens WITH the human. Use the interview template:

    cat .//templates/requirements-interview.md

The workflow:

1. Discuss the JTBD (Job to Be Done) — outcomes, not features

2. Break into Topics of Concern (each passes the "one sentence" test)

3. Write a spec file for each topic: specs/topic-name.md

4. Human reviews and approves specs

Example output:

    specs/
    ├── image-collection.md
    ├── color-extraction.md
    ├── layout-system.md
    └── sharing.md

4. Phase 2: Planning

./loop.sh plan

Wait for IMPLEMENTATION_PLAN.md to be generated (usually 1-2 iterations). Review it — this is your task list.

5. Phase 3: Building

./loop.sh build 20  # Max 20 iterations

Watch it work. Add backpressure (tests, lints) as patterns emerge. Check commits for progress.

---

Loop Script Options

    ./loop.sh          # Build mode, unlimited
    ./loop.sh 20       # Build mode, max 20 iterations
    ./loop.sh plan     # Plan mode, unlimited
    ./loop.sh plan 5   # Plan mode, max 5 iterations

Or use the Node.js wrapper for more control:

    node skills/ralph-loops/scripts/ralph-loop.mjs \
      --prompt "./PROMPT_build.md" \
      --model opus \
      --max 20 \
      --done "RALPH_DONE"
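
The loop.sh invocations in this section imply a simple argument convention: an optional mode word followed by an optional iteration cap. The sketch below shows one way to parse that, also accepting the explicit `build` word used in the Quick Start. The actual templates/loop.sh may differ; `parse_loop_args`, `MODE`, and `MAX` are hypothetical names, and `MAX=0` stands for "unlimited".

```shell
#!/bin/sh
# Sketch of argument handling matching the loop.sh invocations in this section.
# The real templates/loop.sh may differ. MAX=0 stands for "unlimited".

parse_loop_args() {
  MODE=build
  MAX=0
  case "${1:-}" in
    plan|build) MODE=$1; shift ;;
  esac
  case "${1:-}" in
    '') ;;  # no iteration limit given
    *[!0-9]*) echo "usage: loop.sh [plan|build] [max_iterations]" >&2; return 1 ;;
    *) MAX=$1 ;;
  esac
}
```

Called as `parse_loop_args "$@"` at the top of a script, it leaves the mode and cap in `MODE` and `MAX` for the loop that follows.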

---

When to Regenerate the Plan

Plans drift. Regenerate when:

- Ralph is going off track (implementing wrong things)
- Plan feels stale or doesn't match current state
- Too much clutter from completed items
- You've made significant spec changes
- You're confused about what's actually done

Just switch back to planning mode:

./loop.sh plan

Regeneration cost is one Planning loop. Cheap compared to Ralph going in circles.

---

Safety

Ralph requires --dangerously-skip-permissions to run autonomously. This bypasses Claude's permission system entirely.

Philosophy: "It's not if it gets popped, it's when. And what is the blast radius?"

Protections:

- Run in isolated environments (Docker, VM)
- Provide only the API keys needed for the task
- No access to private data beyond requirements
- Restrict network connectivity where possible
- Escape hatches: Ctrl+C stops the loop; git reset --hard discards uncommitted changes to tracked files (untracked files need git clean -fd)

---

Cost Expectations

| Task Type | Model | Iterations | Est. Cost |
|-----------|-------|------------|-----------|
| Generate plan | Opus | 1-2 | $0.50-1.00 |
| Implement simple feature | Opus | 3-5 | $1.00-2.00 |
| Implement complex feature | Opus | 10-20 | $3.00-8.00 |
| Full project buildout | Opus | 50+ | $15-50+ |

Tip: Use Sonnet for simpler tasks where plan is clear. Use Opus for planning and complex reasoning.

---

Real-World Results

From Geoffrey Huntley:

- 6 repos generated overnight at a YC hackathon
- $50k contract completed for $297 in API costs
- Created an entire programming language over 3 months

---

Advanced: Running as Sub-Agent

For long loops, spawn as sub-agent so main session stays responsive:

    sessions_spawn({
      task: `cd /path/to/project && ./loop.sh build 20
    Summarize what was implemented when done.`,
      label: "ralph-build",
      model: "opus"
    })

Check progress:

    sessions_list({ kinds: ["spawn"] })
    sessions_history({ label: "ralph-build", limit: 5 })

---

Troubleshooting

Ralph keeps implementing the same thing

- Plan is stale → regenerate with ./loop.sh plan
- Backpressure missing → add tests that catch duplicates

Ralph goes in circles

- Add more specific guardrails to prompts
- Check if specs are ambiguous
- Regenerate plan

Context getting bloated

- Ensure one task per iteration (check prompt)
- Keep AGENTS.md under 60 lines
- Move status/progress to IMPLEMENTATION_PLAN.md, not AGENTS.md
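
The 60-line budget for AGENTS.md can itself be turned into a mechanical check. A minimal sketch, with `check_agents_size` as a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: fail when AGENTS.md grows past a line budget (60 by default),
# so bloat is caught mechanically instead of by review.

check_agents_size() {
  # usage: check_agents_size [file] [max_lines]
  file=${1:-AGENTS.md}
  max=${2:-60}
  [ -f "$file" ] || { echo "$file not found" >&2; return 2; }
  lines=$(wc -l <"$file")
  lines=$((lines + 0))  # normalize any padding some wc implementations add
  if [ "$lines" -gt "$max" ]; then
    echo "$file has $lines lines (budget: $max)"
    return 1
  fi
  echo "$file is within budget"
}
```

Running it next to the other validation commands makes an oversized AGENTS.md fail the iteration like any other broken gate.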

Tests not running

- Check AGENTS.md has correct validation commands
- Ensure backpressure section in prompt references AGENTS.md

---

Edge Cases

Projects Without Git

The loop script expects git for commits and pushes. For projects without version control:

Option 1: Initialize git anyway (recommended)

    git init
    git add -A
    git commit -m "Initial commit before Ralph"

Option 2: Modify the prompts

- Remove git-related guardrails from PROMPT_build.md
- Remove the git push section from loop.sh
- Use file backups instead: add cp -r src/ backups/iteration-$ITERATION/ to loop.sh

Option 3: Use tarball snapshots

Add to loop.sh before each iteration:

    mkdir -p snapshots  # tar fails if the target directory does not exist
    tar -czf "snapshots/pre-iteration-$ITERATION.tar.gz" src/

Very Large Codebases

For codebases with 100K+ lines:

- Reduce subagent parallelism: Change "up to 500 parallel Sonnet subagents" to "up to 50" in prompts
- Scope narrowly: Use focused specs that target specific directories
- Add path restrictions: In AGENTS.md, note which directories are in-scope
- Consider workspace splitting: Treat large modules as separate Ralph projects

When Claude CLI Isn't Available

The methodology works with any Claude interface:

Claude API directly:

    # Replace loop.sh with API calls using curl or a script
    curl https://api.anthropic.com/v1/messages \
      -H "x-api-key: $ANTHROPIC_API_KEY" \
      -H "anthropic-version: 2023-06-01" \
      -H "content-type: application/json" \
      -d '{"model": "claude-sonnet-4-20250514", "max_tokens": 8192, "messages": [...]}'

Alternative agents:

- Aider: aider --opus --auto-commits
- Continue.dev: Use with Claude API key
- Cursor: Composer mode with PROMPT files as context

The key principles (one task per iteration, fresh context, backpressure) apply regardless of tooling.

Non-Node.js Projects

Adapt AGENTS.md for your stack:

| Stack | Build | Test | Lint |
|-------|-------|------|------|
| Python | pip install -e . | pytest | ruff check . |
| Go | go build ./... | go test ./... | golangci-lint run |
| Rust | cargo build | cargo test | cargo clippy |
| Ruby | bundle install | rspec | rubocop |

Also update path references in prompts (src/* → your source directory).

---

Learn More

- Geoffrey Huntley: https://ghuntley.com/ralph/
- Clayton Farr's Playbook: https://github.com/ClaytonFarr/ralph-playbook
- Geoffrey's Fork: https://github.com/ghuntley/how-to-ralph-wiggum

---

Credits

Built by Johnathan & Q — a human-AI dyad.

- Twitter: [@spacepixel](https://x.com/spacepixel)
- ClawdHub: [clawhub.ai/skills/ralph-loops](https://www.clawhub.ai/skills/ralph-loops)
