
Loop Engineering

  • 3 hours ago
  • 10 min read

Stop prompting your AI coding agent. Start designing the system that prompts it for you. Here's the new meta — what it is, how it works, and why it changes the job.

Diagram of the loop engineering cycle: five stages — discover, assign, verify, persist, and decide next — arranged in a circle around a central "The Loop," running on a schedule.
The Loop

What is loop engineering?

Loop engineering is the practice of replacing yourself as the person who prompts the agent — and designing the system that does the prompting instead.

For about two years, getting value out of a coding agent meant one thing: you wrote a good prompt, gave it enough context, read what came back, and typed the next thing. The agent was a tool and you were holding it the entire time, one turn after another. That era is ending.


In its place you build a small system that finds the work, hands it out, checks the result, writes down what's done, and decides the next thing — and you let that system poke the agents instead of you. As Addy Osmani puts it, “a loop here can be thought of as a recursive goal where you define a purpose and the AI iterates until complete.”


“You shouldn't be prompting coding agents anymore. You should be designing loops that prompt your agents.” (Peter Steinberger, creator of OpenClaw)
“I don't prompt Claude anymore. I have loops running that prompt Claude and figuring out what to do. My job is to write loops.” (Boris Cherny, head of Claude Code at Anthropic)

This isn't just rhetoric from two practitioners. The same primitives have shipped inside the major coding products at almost the same time, which means the “shape” of a loop is becoming tool-agnostic. Once you notice the shape is the same in Codex, Claude Code, and Grok, you stop arguing about which tool and just design a loop that works in any of them.


The shift: from a chain you drive to a loop that drives itself

The clearest way to understand loop engineering is to compare it with what came before. A chain is linear and you are the engine: step A leads to step B leads to step C, and you supply each next instruction. A loop is dynamic and self-driving: the system acts, observes the result, reasons about it, and decides the next move — repeating until a goal is genuinely met.


Comparison diagram. Top: the old way, a linear chain where you prompt, the agent replies, you read, you prompt again — you are the loop. Bottom: the new way, a self-driving loop of find work, act, observe, verify, decide next, repeating until the goal is met.
A chain is predictable and easy to trace, but you supply every step. A loop is flexible and autonomous — built for tasks where the right path isn't known upfront, like debugging or feature work across many files.

This matters more for coding than almost any other task, because coding is naturally iterative. Even expert engineers don't write perfect code on the first try — they run it, read the error, fix it, run it again. An agent that generates code once and stops is fundamentally limited: it can't catch runtime errors, adapt to environment quirks, or confirm its output actually works. The loop is what closes that feedback gap.
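The run, read, fix, run-again cycle can be sketched in a few lines of Python. This is a minimal illustration, not any tool's actual implementation: `run_loop`, `toy_agent`, and `toy_check` are hypothetical names, and the "agent" is a stand-in function so the example runs without a model.

```python
from typing import Callable, Optional, Tuple


def run_loop(
    propose: Callable[[Optional[str]], str],
    check: Callable[[str], Tuple[bool, str]],
    max_iterations: int = 5,
) -> Tuple[bool, Optional[str], int]:
    """Act, observe, decide: repeat until the check passes or the budget runs out."""
    feedback: Optional[str] = None
    candidate: Optional[str] = None
    for iteration in range(1, max_iterations + 1):
        candidate = propose(feedback)    # act: ask the (hypothetical) agent for an attempt
        ok, feedback = check(candidate)  # observe: run the attempt and capture the result
        if ok:                           # decide: stop only once the goal is verified
            return True, candidate, iteration
    return False, candidate, max_iterations


# Toy stand-ins (assumptions for illustration only).
def toy_agent(feedback: Optional[str]) -> str:
    # The first attempt has a bug; once feedback arrives, the agent returns a fix.
    return "a + b" if feedback else "a - b"


def toy_check(expr: str) -> Tuple[bool, str]:
    result = eval(expr, {"a": 2, "b": 3})
    return result == 5, f"expected 5, got {result}"


solved, final_expr, turns = run_loop(toy_agent, toy_check)
```

The single-shot agent would have stopped at `a - b`. The loop gets a second turn only because the check's output is fed back in.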

Where it sits: three layers of the agent stack

Loop engineering didn't appear in a vacuum. The AI vocabulary has moved fast — first context engineering, then harness engineering, and now loop engineering. They stack, each solving a different problem.

Stacked-layer diagram of the AI agent stack: context engineering (one turn) at the base, harness engineering (one run) in the middle, and loop engineering (the full system) on top.
The harness equips a single agent run; the loop sits one floor above it — it keeps poking agents on a schedule, spawns sub-agents, and feeds itself.

The key insight: a year ago, building a loop meant writing a pile of bash scripts you'd maintain forever. Today the pieces ship inside the products themselves. That's why loop engineering is suddenly accessible — you assemble primitives rather than build infrastructure from scratch.


The anatomy of a well-engineered loop

Not all loops are equal. A poorly designed one wastes tokens, runs forever, or hallucinates progress. A well-designed one is efficient, terminates correctly, and produces reliable output. A solid loop has five internal ingredients:


Goal

A clear definition of "done"

Specific enough to evaluate, broken into testable sub-tasks. “Make all unit tests pass” works; “make the app better” produces infinite loops.

Tools

A toolset to act & observe

Code execution, file access, a shell, test runners, docs lookup. If the agent can't run its own code, the loop is just guessing.

Context

Context management

Summarize past iterations into compact memory, log attempts and outcomes, prune the irrelevant — or hit token limits fast.

Termination

Termination logic

Success conditions, failure conditions (max iterations, repeated errors), and escalation paths. A first-class design requirement, not an afterthought.

Recovery

Error handling that adapts

Distinguish recoverable errors from hard blockers, and vary the next attempt. A loop that retries the same failed action isn't learning — it's spinning.

The honest test: the hard part of loop engineering isn't getting it to work when everything goes right. It's making it fail gracefully when things go wrong. Test your loops on ambiguous tasks, broken tools, and genuinely unsolvable problems — not just the happy path.
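One way to treat termination as a first-class piece is to give it its own small object. The sketch below assumes two failure conditions from the list above (an iteration cap and the same error repeating) and a single escalation outcome. The class name, the thresholds, and the `Verdict` values are illustrative choices, not part of any product.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Verdict(Enum):
    CONTINUE = "continue"
    SUCCESS = "success"
    ESCALATE = "escalate"  # hand off to a human


@dataclass
class TerminationPolicy:
    max_iterations: int = 10       # hypothetical default
    max_repeated_errors: int = 3   # hypothetical default
    iterations: int = 0
    errors: List[str] = field(default_factory=list)

    def record(self, passed: bool, error: Optional[str] = None) -> Verdict:
        """Record one iteration's outcome and say what the loop should do next."""
        self.iterations += 1
        if passed:
            return Verdict.SUCCESS
        self.errors.append(error or "")
        recent = self.errors[-self.max_repeated_errors:]
        # The same error N times in a row means the loop is spinning, not learning.
        if len(recent) == self.max_repeated_errors and len(set(recent)) == 1:
            return Verdict.ESCALATE
        if self.iterations >= self.max_iterations:
            return Verdict.ESCALATE
        return Verdict.CONTINUE


policy = TerminationPolicy()
verdicts = [policy.record(False, "TypeError") for _ in range(3)]
```

Here the third identical `TypeError` triggers escalation long before the iteration cap, which is the behaviour you want to confirm on unsolvable tasks.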

The five building blocks (+ memory)

A loop that runs unattended is not one long prompt. It's a small system with six parts. Five are capabilities; the sixth is the spine that holds state between runs.

1. Automations — the heartbeat

Automations are what make a loop an actual loop and not just one run you did once. You pick a project, a prompt, a cadence, and an environment; the runs that find something land in a triage inbox, and the runs that find nothing archive themselves. This turns “I should check CI every morning” into something that happens whether or not you open a terminal. The heartbeat doesn't need to be clever — it needs to be reliable.

A close cousin is the run-until-done primitive (/goal): it keeps working across turns until a verifiable stopping condition holds, and after each turn a separate small model checks whether you're actually done — so the agent that wrote the code isn't the one grading it.

2. Worktrees — so parallel doesn't become chaos

The moment you run more than one agent, files start colliding. Two agents writing the same file is the same headache as two engineers committing to the same lines without talking. A git worktree — a separate working directory on its own branch, sharing the same repo history — means one agent's edits literally cannot touch another's checkout. The mechanical collision disappears, but your review bandwidth is still the ceiling on how many you can actually run.
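If you script this yourself rather than rely on a tool's built-in support, a rough sketch might look like the following. It uses the standard `git worktree add -b <branch> <path>` command. The `agent/<task_id>` branch naming and the function names are assumptions made for illustration.

```python
import subprocess
from pathlib import Path
from typing import List


def worktree_command(worktrees_dir: Path, task_id: str) -> List[str]:
    """Build the git command that creates a worktree on its own new branch."""
    branch = f"agent/{task_id}"    # hypothetical naming convention
    path = worktrees_dir / task_id
    # `git worktree add -b <branch> <path>` creates <branch> and checks it out at <path>.
    return ["git", "worktree", "add", "-b", branch, str(path)]


def create_worktree(repo: Path, worktrees_dir: Path, task_id: str) -> Path:
    """Run the command inside `repo`; raises CalledProcessError if git fails."""
    subprocess.run(worktree_command(worktrees_dir, task_id), cwd=repo, check=True)
    return worktrees_dir / task_id


command = worktree_command(Path("worktrees"), "fix-123")
```

`create_worktree` is defined but not called here, since it needs git on the PATH and an existing repository. Each agent would then be started with its own returned directory as the working directory.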

3. Skills — so you stop re-explaining your project

Every session, the agent starts cold. Conventions, build commands, review standards, the incident that taught you “we don't do it that way” — all of it gets re-derived from scratch unless you write it down. A skill (a SKILL.md file plus optional scripts and references) is that intent written down on the outside, where the agent reads it every run. Without skills, every loop run is day one. With them, knowledge compounds.

4. Plugins & connectors — reaching real tools

A loop that can only see the filesystem is a tiny loop. Connectors (built on MCP, the Model Context Protocol) let the agent read your issue tracker, query a database, hit a staging API, or post to Slack. This is the difference between an agent that says “here's the fix” and a loop that opens the PR, links the ticket, and pings the channel once CI is green. Connectors turn the loop from a commentator into an operator. Plugins bundle skills and connectors so a teammate installs your whole setup in one go.

5. Sub-agents — keep the maker away from the checker

The single most useful structural move in a loop is splitting the agent that writes from the agent that checks. The model that wrote the code is far too generous grading its own homework — that's a structural limitation, not a model one. A second agent, with different instructions and sometimes a stronger model, catches what the first one talked itself into. In an unattended loop, a verifier you trust is the only reason you can walk away.
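The maker/checker split can be sketched as two separate callables, where the checker sees only the task and the resulting patch, never the maker's reasoning. Everything here is a hypothetical stand-in: in a real loop `maker` and `checker` would be sub-agent calls with different instructions, and the toy functions exist only so the example runs.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Review:
    approved: bool
    notes: str


Maker = Callable[[str, str], str]       # (task, reviewer notes) -> patch
Checker = Callable[[str, str], Review]  # (task, patch) -> review


def make_then_check(
    task: str, maker: Maker, checker: Checker, max_rounds: int = 3
) -> Tuple[Optional[str], List[Review]]:
    """Alternate maker and checker until the checker approves or rounds run out."""
    notes = ""
    reviews: List[Review] = []
    for _ in range(max_rounds):
        patch = maker(task, notes)
        review = checker(task, patch)  # the checker judges the artifact, not the intent
        reviews.append(review)
        if review.approved:
            return patch, reviews
        notes = review.notes           # the only thing that flows back to the maker
    return None, reviews               # no approval: the caller should escalate


# Toy stand-ins (assumptions for illustration only).
def toy_maker(task: str, notes: str) -> str:
    return "return a + b" if notes else "return a - b"


def toy_checker(task: str, patch: str) -> Review:
    ok = "+" in patch
    return Review(ok, "" if ok else "uses subtraction; the task asks for a sum")


patch, reviews = make_then_check("add two numbers", toy_maker, toy_checker)
```

Returning `None` when no round is approved keeps "the maker says it's done" from ever being treated as success.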

6. Memory — the durable spine

None of the above survives a session boundary on its own. The loop must read from and write to something external — a STATE.md, a LOOP-STATE.json, a Linear board column, a GitHub Project view. Good state answers three questions: What are we working on right now? What did we try last time, and what happened? What's waiting for a human? For multi-day loops this is non-negotiable; the state file is often the most important artifact the loop produces. The agent forgets — the repo doesn't.
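A minimal sketch of such a state file, assuming a JSON layout with one key per question. The key names (`in_progress`, `attempts`, `needs_human`) and helper functions are illustrative, not a standard schema.

```python
import copy
import json
from pathlib import Path
from typing import Any, Dict

# One key per question: what's active, what was tried, what's waiting for a human.
EMPTY_STATE: Dict[str, Any] = {"in_progress": [], "attempts": [], "needs_human": []}


def load_state(path: Path) -> Dict[str, Any]:
    """Read state at the start of a run; start empty if the file doesn't exist yet."""
    if not path.exists():
        return copy.deepcopy(EMPTY_STATE)
    return json.loads(path.read_text(encoding="utf-8"))


def save_state(path: Path, state: Dict[str, Any]) -> None:
    """Write to a temp file, then rename, so a crash can't leave half-written state."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(state, indent=2), encoding="utf-8")
    tmp.replace(path)


def record_attempt(state: Dict[str, Any], task: str, outcome: str) -> None:
    state["attempts"].append({"task": task, "outcome": outcome})
```

A run would call `load_state` first, `record_attempt` after each try, and `save_state` before exiting, so the next scheduled run starts from what this one learned.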

Horizontal bar chart titled "Which blocks make a loop trustworthy unattended," ranking six loop building blocks by illustrative weighting: memory 95, sub-agents 90, automations 85, skills 75, connectors 70, worktrees 60.
How much each building block contributes to whether a loop can run unattended (illustrative, based on the emphasis practitioners place on each piece). Memory and verification carry the most weight for trustworthy autonomy.

The same shape in every tool

What makes this a meta rather than a feature is convergence: Codex, Claude Code, and Grok have all landed on the same six primitives, with only the names differing. Design the loop once and it ports.


| Primitive | Job in the loop | Codex app | Claude Code |
| --- | --- | --- | --- |
| Automations | Discovery + triage on a schedule | Automations tab; results land in a Triage inbox; /goal for run-until-done | Scheduled tasks & cron, /loop, /goal, hooks, GitHub Actions |
| Worktrees | Isolate parallel features | Built-in worktree per thread | git worktree, --worktree, isolation: worktree on a subagent |
| Skills | Codify project knowledge | Agent Skills (SKILL.md), invoked with $name or implicitly | Agent Skills (SKILL.md) |
| Plugins / connectors | Connect your tools | Connectors (MCP) plus plugins for distribution | MCP servers plus plugins |
| Sub-agents | Ideate and verify | Subagents as TOML in .codex/agents/ | Task subagents in .claude/agents/, agent teams |
| State | Track what's done | Markdown or Linear via a connector | Markdown (AGENTS.md, progress files) or Linear via MCP |


Because MCP has become the common substrate, a connector written for one tool often ports straight to another. Once the shape is shared, the tool you happen to be sitting in matters far less than the loop you designed.


Four loop patterns and when to use each

Loops aren't one architecture. A handful of standard shapes suit different task types — knowing which to reach for is half the craft.

Pattern 01

Retry loop

Use for: short, atomic tasks with clear pass/fail — a function that passes a test, a query that returns valid data.

Watch out: infinite retries without changing strategy. If the same approach keeps failing, vary the next attempt.

Pattern 02

Plan–execute–verify

Use for: multi-step tasks where order matters and early mistakes compound — refactors, new features with several components.

Watch out: over-committing to a bad plan. If step 2 reveals the plan was wrong, revise it — don't push through.

Pattern 03

Explore–narrow

Use for: debugging unknown errors, unfamiliar APIs, performance work — when you don't know the right approach upfront.

Watch out: context explosion. Running paths in parallel is expensive; prune early and often.

Pattern 04

Human-in-the-loop

Use for: tasks that can't be fully specified upfront, or production changes a human should approve before execution.

Watch out: interrupting too often. If it asks about every small decision, it isn't saving you time.
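As one concrete instance, the retry loop's "vary the next attempt" rule can be sketched as a retry that rotates through strategies instead of repeating one. The strategy strings and the `attempt` callable are hypothetical; in practice `attempt` would run the agent with that strategy and return whether the pass/fail check succeeded.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def retry_with_strategies(
    strategies: Sequence[str],
    attempt: Callable[[str], bool],
    tries_per_strategy: int = 2,
) -> Tuple[Optional[str], List[Tuple[str, bool]]]:
    """Give each strategy a small, fixed number of tries, then move to the next one."""
    log: List[Tuple[str, bool]] = []
    for strategy in strategies:
        for _ in range(tries_per_strategy):
            ok = attempt(strategy)
            log.append((strategy, ok))
            if ok:
                return strategy, log
    return None, log  # every strategy exhausted: a failure signal, not a reason to loop


# Toy stand-in: only the second strategy can pass the check.
strategies = ["patch in place", "rewrite query with explicit join"]
winner, log = retry_with_strategies(strategies, lambda s: s == strategies[1])
```

The total number of attempts is bounded by `len(strategies) * tries_per_strategy`, so the loop cannot retry forever by construction.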

What one real loop looks like

Stick the pieces together and a single thread turns into a little control panel. Here's a shape that works in any of the major tools:

Flowchart of an automated morning triage loop: a scheduled automation triggers a triage skill that reads the repo and writes findings to STATE.md, opens an isolated worktree per finding, sends a maker sub-agent and a checker sub-agent, then uses connectors to open a PR, update the ticket, and ping Slack.
One morning-triage loop. You designed it once — you didn't prompt any individual step. The state file is the spine: tomorrow's run picks up exactly where today's stopped.

Look at what you actually did there: you designed the system one time and then stopped touching the steps. That is Steinberger's whole point made concrete — and it's the same loop whether you run it in Codex or Claude Code, because the pieces are the same pieces.
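Reduced to its skeleton, the triage loop above might look like the sketch below. Each stage is passed in as a callable so the shape stays tool-agnostic. The function names, the `done` and `needs_human` state keys, and the toy stand-ins are assumptions for illustration, and the worktree and connector steps are collapsed into `make_fix` and `publish`.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple


def morning_triage(
    find_issues: Callable[[], Iterable[str]],  # triage skill: scan the repo
    make_fix: Callable[[str], str],            # maker sub-agent (in its own worktree)
    check_fix: Callable[[str, str], bool],     # checker sub-agent
    publish: Callable[[str, str], None],       # connectors: open PR, update ticket, ping chat
    state: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """One scheduled run: skip known work, fix what verifies, park the rest for a human."""
    for finding in find_issues():
        if finding in state["done"]:
            continue                           # yesterday's run already handled it
        patch = make_fix(finding)
        if check_fix(finding, patch):
            publish(finding, patch)
            state["done"].append(finding)
        else:
            state["needs_human"].append(finding)
    return state


# Toy stand-ins so the sketch runs end to end.
published: List[Tuple[str, str]] = []
state: Dict[str, Any] = {"done": ["flaky-test"], "needs_human": []}
state = morning_triage(
    find_issues=lambda: ["flaky-test", "lint-error", "broken-build"],
    make_fix=lambda finding: f"patch for {finding}",
    check_fix=lambda finding, patch: finding != "broken-build",
    publish=lambda finding, patch: published.append((finding, patch)),
    state=state,
)
```

Nothing in the function body names a specific tool, which is the sense in which the loop ports between products.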

How to engineer better loops in practice


If you're building or customizing an agentic system, these are the highest-leverage habits:

  • Define termination conditions before you write any loop logic. “All tests pass and no lint errors” is a condition; “the code looks good” is not. Do the same for failure: “after 10 iterations with no progress, escalate to a human” gives your loop a floor.

  • Give the agent structured feedback, not raw dumps. Pre-process errors: include the code that caused them, the intent behind the action, and a flag for repeated vs. new errors. Each iteration gets more efficient.

  • Log everything, summarize often. Compress the running log into compact working memory before each iteration — “tried A (failed: TypeError), tried B (same), tried C (error resolved, tests still failing on line 47)” beats a full transcript.

  • Set strict tool-call budgets. Unlimited calls bloat runs and burn tokens. If an agent exhausts its budget without progress, treat that as a failure signal and switch strategy.

  • Test on failure cases, not just happy paths. Feed it ambiguous tasks, broken tools, and unsolvable problems to confirm the exit conditions actually fire.
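The "log everything, summarize often" habit can be sketched as a small function that compresses an attempt log into a one-line working memory and flags repeated errors along the way. The `Attempt` record and its field names are hypothetical; a real loop would fill them from tool output.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Attempt:
    label: str
    error: str      # empty string means the attempt raised no error
    note: str = ""  # free-text outcome when there was no error


def summarize(attempts: List[Attempt]) -> str:
    """Compress the attempt log; an error matching the previous one becomes 'same'."""
    parts: List[str] = []
    previous_error: Optional[str] = None
    for a in attempts:
        if not a.error:
            detail = a.note or "ok"
        elif a.error == previous_error:
            detail = "same"  # repeated error: worth flagging to the agent
        else:
            detail = f"failed: {a.error}"
        parts.append(f"tried {a.label} ({detail})")
        previous_error = a.error or None
    return ", ".join(parts)


memory = summarize([
    Attempt("A", "TypeError"),
    Attempt("B", "TypeError"),
    Attempt("C", "", "error resolved, tests still failing on line 47"),
])
```

The summary, rather than the full transcript, is what would be placed in the next iteration's prompt.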


On token economics: sub-agents and frequent cadences multiply costs fast. A 5-minute loop that spawns an implementer and a verifier on every run can burn through a limited plan before breakfast. Keep triage cheap; spawn sub-agents only when state says something is actually actionable.
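The arithmetic behind that warning is easy to check. The sketch below counts agent invocations only, not tokens or prices, and the 5% "actionable" rate is a made-up figure for illustration.

```python
def runs_per_day(cadence_minutes: int) -> int:
    """Number of scheduled runs in 24 hours at the given cadence."""
    return (24 * 60) // cadence_minutes


def daily_agent_calls(
    cadence_minutes: int,
    subagents_per_run: int,
    actionable_fraction: float = 1.0,
) -> int:
    """One triage call per run, plus sub-agents only on runs that find actionable work."""
    runs = runs_per_day(cadence_minutes)
    actionable_runs = round(runs * actionable_fraction)
    return runs + actionable_runs * subagents_per_run


always_spawn = daily_agent_calls(5, subagents_per_run=2)  # implementer + verifier every run
gated = daily_agent_calls(5, subagents_per_run=2, actionable_fraction=0.05)
```

At a 5-minute cadence there are 288 runs a day. Spawning both sub-agents on every run gives 864 agent calls, while gating them on an actionable finding (under the assumed 5% rate) brings that down to 316.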

What the loop still won't do for you


The loop changes the work; it doesn't delete you from it. Three problems get sharper, not easier, as the loop gets better.

Verification is still on you

A loop running unattended is also a loop making mistakes unattended. Splitting the verifier from the maker is what makes the loop's “it's done” mean something — but even then, “done” is a claim, not a proof. Your job is to ship code you confirmed works.

Your understanding rots if you let it

The faster the loop ships code you didn't write, the wider the gap between what exists and what you actually grasp. That's comprehension debt, and a smooth loop makes it grow faster unless you read what the loop made.

The comfortable posture is the dangerous one

When the loop runs itself, it's tempting to stop having an opinion and just take whatever it returns — cognitive surrender. The same loop design accelerates someone who stays the engineer and lets someone else abdicate judgment entirely. Same action, opposite results. The loop doesn't know the difference. You do.

Two people can build the exact same loop and get completely opposite outcomes. One uses it to move faster on work they understand deeply. The other uses it to avoid understanding the work at all. That's what makes loop design harder than prompt engineering, not easier.

Key takeaways

Describing AI building blocks, layers, patterns, and one-time design.

  • Loop engineering replaces single-shot prompting with iterative, self-driving cycles: act, observe, reason, repeat until a real goal is met.

  • The quality difference between coding agents usually comes down to loop design, not the base model.

  • A solid loop needs a clear goal, real tools, context management, termination logic, and genuine error recovery.

  • Codex, Claude Code, and Grok have converged on the same six primitives, so a well-designed loop is largely tool-agnostic.

  • Loops are still early. Token economics, verification, and comprehension debt are real costs — build the loop like someone who intends to stay the engineer, not just the person who presses go.


Prompting directly is still powerful and often the right tool. But the leverage point has moved. The job isn't a bigger prompt — it's a system that discovers, assigns, verifies, persists, and knows when to hand off to you.

