Loop Engineering
Stop prompting your AI coding agent. Start designing the system that prompts it for you. Here's the new meta — what it is, how it works, and why it changes the job.

What is loop engineering?
Loop engineering is the practice of replacing yourself as the person who prompts the agent — and designing the system that does the prompting instead.
For about two years, getting value out of a coding agent meant one thing: you wrote a good prompt, gave it enough context, read what came back, and typed the next thing. The agent was a tool and you were holding it the entire time, one turn after another. That era is ending.
In its place you build a small system that finds the work, hands it out, checks the result, writes down what's done, and decides the next thing — and you let that system poke the agents instead of you. As Addy Osmani puts it, “a loop here can be thought of as a recursive goal where you define a purpose and the AI iterates until complete.”
“You shouldn't be prompting coding agents anymore. You should be designing loops that prompt your agents.” (Peter Steinberger, creator of OpenClaw)
“I don't prompt Claude anymore. I have loops running that prompt Claude and figuring out what to do. My job is to write loops.” (Boris Cherny, head of Claude Code at Anthropic)
This isn't just rhetoric from two practitioners. The same primitives have shipped inside the major coding products at almost the same time, which means the “shape” of a loop is becoming tool-agnostic. Once you notice the shape is the same in Codex, Claude Code, and Grok, you stop arguing about which tool and just design a loop that works in any of them.
The shift: from a chain you drive to a loop that drives itself
The clearest way to understand loop engineering is to compare it with what came before. A chain is linear and you are the engine: step A leads to step B leads to step C, and you supply each next instruction. A loop is dynamic and self-driving: the system acts, observes the result, reasons about it, and decides the next move — repeating until a goal is genuinely met.

This matters more for coding than almost any other task, because coding is naturally iterative. Even expert engineers don't write perfect code on the first try — they run it, read the error, fix it, run it again. An agent that generates code once and stops is fundamentally limited: it can't catch runtime errors, adapt to environment quirks, or confirm its output actually works. The loop is what closes that feedback gap.
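The act, observe, decide cycle is easier to see in code. The sketch below is a minimal illustration, not any product's implementation: `run_loop`, `fake_agent`, and `run_check` are hypothetical names, and the "agent" is a stub that only changes its answer once it has seen an error.

```python
from typing import Callable, Optional


def run_loop(
    act: Callable[[Optional[str]], str],
    observe: Callable[[str], Optional[str]],
    max_turns: int = 5,
) -> tuple[bool, int]:
    """Repeat act -> observe until observe reports success or turns run out.

    act receives the previous error (None on the first turn) and returns a candidate.
    observe runs the candidate and returns None on success, else an error message.
    Returns (succeeded, turns_used).
    """
    error: Optional[str] = None
    for turn in range(1, max_turns + 1):
        candidate = act(error)      # the agent proposes code
        error = observe(candidate)  # the environment answers back
        if error is None:
            return True, turn
    return False, max_turns


# Stand-ins for a real agent and a real test runner (both hypothetical).
def fake_agent(previous_error: Optional[str]) -> str:
    return "a + b" if previous_error else "a - b"


def run_check(expression: str) -> Optional[str]:
    result = eval(expression, {"a": 2, "b": 3})
    return None if result == 5 else f"expected 5, got {result}"


print(run_loop(fake_agent, run_check))  # (True, 2)
```

A generate-once agent is the same function with `max_turns=1`: it would have stopped at the wrong answer.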
Where it sits: three layers of the agent stack
Loop engineering didn't appear in a vacuum. The AI vocabulary has moved fast — first context engineering, then harness engineering, and now loop engineering. They stack, each solving a different problem.

The key insight: a year ago, building a loop meant writing a pile of bash scripts you'd maintain forever. Today the pieces ship inside the products themselves. That's why loop engineering is suddenly accessible — you assemble primitives rather than build infrastructure from scratch.
The anatomy of a well-engineered loop
Not all loops are equal. A poorly designed one wastes tokens, runs forever, or hallucinates progress. A well-designed one is efficient, terminates correctly, and produces reliable output. A solid loop has five internal ingredients:
Goal
A clear definition of "done"
Specific enough to evaluate, broken into testable sub-tasks. “Make all unit tests pass” works; “make the app better” produces infinite loops.
Tools
A toolset to act & observe
Code execution, file access, a shell, test runners, docs lookup. If the agent can't run its own code, the loop is just guessing.
Context
Context management
Summarize past iterations into compact memory, log attempts and outcomes, prune the irrelevant — or hit token limits fast.
Termination
Termination logic
Success conditions, failure conditions (max iterations, repeated errors), and escalation paths. A first-class design requirement, not an afterthought.
Recovery
Error handling that adapts
Distinguish recoverable errors from hard blockers, and vary the next attempt. A loop that retries the same failed action isn't learning — it's spinning.
The honest test: the hard part of loop engineering isn't getting it to work when everything goes right. It's making it fail gracefully when things go wrong. Test your loops on ambiguous tasks, broken tools, and genuinely unsolvable problems — not just the happy path.
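Termination logic is the ingredient that is easiest to specify precisely. As a rough sketch, assuming the loop reports a pass/fail flag and an error string after each iteration, a small guard can cover the success condition and both failure conditions named above. The class name, field names, and thresholds are illustrative assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Stop(Enum):
    SUCCESS = "success"
    REPEATED_ERROR = "repeated_error"  # escalate: the loop is spinning
    MAX_ITERATIONS = "max_iterations"  # escalate: out of attempts


@dataclass
class Terminator:
    max_iterations: int = 10
    max_repeats: int = 3  # same error this many times in a row
    iterations: int = 0
    last_error: Optional[str] = None
    repeats: int = 0

    def check(self, passed: bool, error: Optional[str] = None) -> Optional[Stop]:
        """Return a Stop reason when the loop should end, or None to continue."""
        self.iterations += 1
        if passed:
            return Stop.SUCCESS
        if error is not None and error == self.last_error:
            self.repeats += 1
        else:
            self.repeats = 1
        self.last_error = error
        if self.repeats >= self.max_repeats:
            return Stop.REPEATED_ERROR
        if self.iterations >= self.max_iterations:
            return Stop.MAX_ITERATIONS
        return None


guard = Terminator()
for _ in range(3):
    reason = guard.check(passed=False, error="TypeError")
print(reason)  # Stop.REPEATED_ERROR
```

Both non-success reasons would map to an escalation path in a real loop; keeping them distinct lets the handoff say why the loop gave up.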
The five building blocks (+ memory)
A loop that runs unattended is not one long prompt. It's a small system with six parts. Five are capabilities; the sixth is the spine that holds state between runs.
1. Automations — the heartbeat
Automations are what make a loop an actual loop and not just one run you did once. You pick a project, a prompt, a cadence, and an environment; the runs that find something land in a triage inbox, and the runs that find nothing archive themselves. This turns “I should check CI every morning” into something that happens whether or not you open a terminal. The heartbeat doesn't need to be clever — it needs to be reliable.
A close cousin is the run-until-done primitive (/goal): it keeps working across turns until a verifiable stopping condition holds, and after each turn a separate small model checks whether you're actually done — so the agent that wrote the code isn't the one grading it.
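The heartbeat's routing rule (findings go to an inbox, empty runs archive themselves) can be sketched in a few lines. This is an illustration of the idea only, not how any product's automations are implemented; `Heartbeat`, `tick`, and the stubbed `check` are hypothetical, and a real setup would have a scheduler such as cron call `tick`.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Heartbeat:
    check: Callable[[], list[str]]  # returns findings; empty list means nothing to do
    inbox: list[list[str]] = field(default_factory=list)
    archived: int = 0

    def tick(self) -> None:
        """Run one scheduled check and route the result."""
        findings = self.check()
        if findings:
            self.inbox.append(findings)  # a human or a later loop stage triages these
        else:
            self.archived += 1           # quiet runs leave no work behind


# Three simulated mornings: CI is only red on the second.
results = iter([[], ["CI red on main"], []])
heartbeat = Heartbeat(check=lambda: next(results))
for _ in range(3):
    heartbeat.tick()

print(heartbeat.inbox)     # [['CI red on main']]
print(heartbeat.archived)  # 2
```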
2. Worktrees — so parallel doesn't become chaos
The moment you run more than one agent, files start colliding. Two agents writing the same file is the same headache as two engineers committing to the same lines without talking. A git worktree (a separate working directory on its own branch, sharing the same repo history) means one agent's edits cannot touch another's checkout. The mechanical collision disappears, but your review bandwidth is still the ceiling on how many agents you can actually run.
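If you are wiring this up by hand rather than using a tool's built-in support, one worktree per task is a single `git worktree add` call. The sketch below assumes git is installed and the repo has a `main` branch. The `agent/<task>` branch naming and the sibling-directory layout are illustrative choices, not a convention from any tool.

```python
import subprocess
from pathlib import Path


def worktree_plan(repo: Path, task_id: str, base: str = "main") -> tuple[Path, list[str]]:
    """Build the checkout path and git command for an isolated per-task worktree."""
    branch = f"agent/{task_id}"
    checkout = repo.parent / f"{repo.name}-{task_id}"
    command = [
        "git", "-C", str(repo),
        "worktree", "add",
        "-b", branch,    # create a new branch for this agent
        str(checkout),   # separate working directory
        base,            # start from this commit-ish
    ]
    return checkout, command


def create_worktree(repo: Path, task_id: str, base: str = "main") -> Path:
    """Create the worktree; raises CalledProcessError if git fails."""
    checkout, command = worktree_plan(repo, task_id, base)
    subprocess.run(command, check=True)
    return checkout


checkout, command = worktree_plan(Path("/src/app"), "fix-123")
print(" ".join(command))
```

Each agent is then started with its own `checkout` as the working directory, so two agents never share files on disk.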
3. Skills — so you stop re-explaining your project
Every session, the agent starts cold. Conventions, build commands, review standards, the incident that taught you “we don't do it that way” — all of it gets re-derived from scratch unless you write it down. A skill (a SKILL.md file plus optional scripts and references) is that intent written down on the outside, where the agent reads it every run. Without skills, every loop run is day one. With them, knowledge compounds.
4. Plugins & connectors — reaching real tools
A loop that can only see the filesystem is a tiny loop. Connectors (built on MCP, the Model Context Protocol) let the agent read your issue tracker, query a database, hit a staging API, or post to Slack. This is the difference between an agent that says “here's the fix” and a loop that opens the PR, links the ticket, and pings the channel once CI is green. Connectors turn the loop from a commentator into an operator. Plugins bundle skills and connectors so a teammate installs your whole setup in one go.
5. Sub-agents — keep the maker away from the checker
The single most useful structural move in a loop is splitting the agent that writes from the agent that checks. The model that wrote the code is far too generous grading its own homework — that's a structural limitation, not a model one. A second agent, with different instructions and sometimes a stronger model, catches what the first one talked itself into. In an unattended loop, a verifier you trust is the only reason you can walk away.
6. Memory — the durable spine
None of the above survives a session boundary on its own. The loop must read from and write to something external — a STATE.md, a LOOP-STATE.json, a Linear board column, a GitHub Project view. Good state answers three questions: What are we working on right now? What did we try last time, and what happened? What's waiting for a human? For multi-day loops this is non-negotiable; the state file is often the most important artifact the loop produces. The agent forgets — the repo doesn't.
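A state file does not need to be elaborate to answer those three questions. Here is a minimal sketch using a JSON file such as LOOP-STATE.json. The field names (`current`, `attempts`, `needs_human`) are assumptions chosen to mirror the three questions, not a standard schema.

```python
import copy
import json
from pathlib import Path

# One field per question: what now, what was tried, what is waiting on a person.
EMPTY_STATE = {"current": None, "attempts": [], "needs_human": []}


def load_state(path: Path) -> dict:
    """Read state from disk, or start fresh if the file does not exist yet."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return copy.deepcopy(EMPTY_STATE)


def record_attempt(state: dict, action: str, outcome: str) -> None:
    """Append what was tried and what happened."""
    state["attempts"].append({"action": action, "outcome": outcome})


def save_state(path: Path, state: dict) -> None:
    """Write to a temp file first, then swap it in, so a crash cannot leave half a file."""
    tmp = path.parent / (path.name + ".tmp")
    tmp.write_text(json.dumps(state, indent=2), encoding="utf-8")
    tmp.replace(path)
```

A run would call `load_state` first, `record_attempt` after each try, and `save_state` before exiting, so the next session starts from the last outcome instead of from zero.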

The same shape in every tool
What makes this a meta rather than a feature is convergence: Codex, Claude Code, and Grok have all landed on the same six primitives, with only the names differing. Design the loop once and it ports.
| Primitive | Job in the loop | Codex app | Claude Code |
| --- | --- | --- | --- |
| Automations | Discovery + triage on a schedule | Automations tab; results land in a Triage inbox; /goal for run-until-done | Scheduled tasks & cron, /loop, /goal, hooks, GitHub Actions |
| Worktrees | Isolate parallel features | Built-in worktree per thread | git worktree, --worktree, isolation: worktree on a subagent |
| Skills | Codify project knowledge | Agent Skills (SKILL.md), invoked with $name or implicitly | Agent Skills (SKILL.md) |
| Plugins / connectors | Connect your tools | Connectors (MCP) plus plugins for distribution | MCP servers plus plugins |
| Sub-agents | Ideate and verify | Subagents as TOML in .codex/agents/ | Task subagents in .claude/agents/, agent teams |
| State | Track what's done | Markdown or Linear via a connector | Markdown (AGENTS.md, progress files) or Linear via MCP |
Because MCP has become the common substrate, a connector written for one tool often ports straight to another. Once the shape is shared, the tool you happen to be sitting in matters far less than the loop you designed.
Four loop patterns and when to use each
Loops aren't one architecture. A handful of standard shapes suit different task types — knowing which to reach for is half the craft.
Pattern 01
Retry loop
Use for: short, atomic tasks with clear pass/fail — a function that passes a test, a query that returns valid data.
Watch out: infinite retries without changing strategy. If the same approach keeps failing, vary the next attempt.
Pattern 02
Plan–execute–verify
Use for: multi-step tasks where order matters and early mistakes compound — refactors, new features with several components.
Watch out: over-committing to a bad plan. If step 2 reveals the plan was wrong, revise it — don't push through.
Pattern 03
Explore–narrow
Use for: debugging unknown errors, unfamiliar APIs, performance work — when you don't know the right approach upfront.
Watch out: context explosion. Running paths in parallel is expensive; prune early and often.
Pattern 04
Human-in-the-loop
Use for: tasks that can't be fully specified upfront, or production changes a human should approve before execution.
Watch out: interrupting too often. If it asks about every small decision, it isn't saving you time.
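Of the four patterns, plan–execute–verify has the most structure to get wrong, so here is one possible sketch of it. The point to notice is the `break`: a failed verification stops execution and sends a note back to the planner instead of pushing through. All function names and the stubbed planner are hypothetical, and a real version would also need to avoid redoing steps that already succeeded.

```python
from typing import Callable


def plan_execute_verify(
    goal: str,
    plan: Callable[[str, list[str]], list[str]],
    execute: Callable[[str], str],
    verify: Callable[[str, str], bool],
    max_replans: int = 2,
) -> tuple[bool, list[str]]:
    """Run planned steps in order; on a failed check, replan with notes on what broke."""
    notes: list[str] = []
    for _ in range(max_replans + 1):
        steps = plan(goal, notes)
        for step in steps:
            result = execute(step)
            if not verify(step, result):
                notes.append(f"step {step!r} failed: {result}")
                break  # do not push through a plan that just proved wrong
        else:
            return True, notes  # every step verified
    return False, notes


# Stubs: the first plan contains a bad step; the planner fixes it once it sees a note.
def stub_plan(goal: str, notes: list[str]) -> list[str]:
    return ["rename", "good-step"] if notes else ["rename", "bad-step"]


def stub_execute(step: str) -> str:
    return "error" if step == "bad-step" else "ok"


def stub_verify(step: str, result: str) -> bool:
    return result == "ok"


print(plan_execute_verify("refactor module", stub_plan, stub_execute, stub_verify))
# (True, ["step 'bad-step' failed: error"])
```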
What one real loop looks like
Stick the pieces together and a single thread turns into a little control panel. Here's a shape that works in any of the major tools:
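One way to sketch such a shape in code is a single pass that finds work, hands it to a maker, has a separate checker review it, records the outcome, and decides what happens next. This is an illustrative assumption about how the pieces fit, with every name hypothetical; `discover`, `implement`, and `verify` stand in for an automation, an implementer sub-agent, and a verifier sub-agent.

```python
from typing import Callable


def run_once(
    state: dict,
    discover: Callable[[], list[str]],
    implement: Callable[[str], str],
    verify: Callable[[str, str], bool],
) -> str:
    """One pass of the loop: find work, hand it out, check it, record it, decide."""
    tasks = discover()                  # automation: find the work
    if not tasks:
        return "nothing-to-do"          # quiet run, nothing to triage
    task = tasks[0]
    state["current"] = task
    patch = implement(task)             # maker (ideally in its own worktree)
    approved = verify(task, patch)      # checker, separate from the maker
    state["attempts"].append({"task": task, "approved": approved})
    state["current"] = None
    if approved:
        state["ready_for_review"].append(task)
        return "ready-for-review"
    state["needs_human"].append(task)   # escalate instead of retrying blindly
    return "needs-human"


state = {"current": None, "attempts": [], "ready_for_review": [], "needs_human": []}
outcome = run_once(
    state,
    discover=lambda: ["fix failing login test"],
    implement=lambda task: f"patch for {task}",
    verify=lambda task, patch: patch.startswith("patch"),
)
print(outcome)  # ready-for-review
```

A scheduler would call `run_once` on a cadence and persist `state` between calls; the human only reads `ready_for_review` and `needs_human`.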

Look at what you actually did there: you designed the system one time and then stopped touching the steps. That is Steinberger's whole point made concrete — and it's the same loop whether you run it in Codex or Claude Code, because the pieces are the same pieces.
How to engineer better loops in practice
If you're building or customizing an agentic system, these are the highest-leverage habits:
Define termination conditions before you write any loop logic. “All tests pass and no lint errors” is a condition; “the code looks good” is not. Do the same for failure: “after 10 iterations with no progress, escalate to a human” gives your loop a floor.
Give the agent structured feedback, not raw dumps. Pre-process errors: include the code that caused them, the intent behind the action, and a flag for repeated vs. new errors. Each iteration gets more efficient.
Log everything, summarize often. Compress the running log into compact working memory before each iteration — “tried A (failed: TypeError), tried B (same), tried C (error resolved, tests still failing on line 47)” beats a full transcript.
Set strict tool-call budgets. Unlimited calls bloat runs and burn tokens. If an agent exhausts its budget without progress, treat that as a failure signal and switch strategy.
Test on failure cases, not just happy paths. Feed it ambiguous tasks, broken tools, and unsolvable problems to confirm the exit conditions actually fire.
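The feedback and logging habits above come down to one small transformation: turn a list of attempts into a compact line that flags repeated errors. A minimal sketch, assuming each attempt records a label, an optional error, and a free-text note (the `Attempt` shape is hypothetical):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Attempt:
    label: str
    error: Optional[str] = None  # error raised by this attempt, if any
    note: str = ""               # what happened when no error was raised


def summarize(attempts: list[Attempt]) -> str:
    """Compress the attempt log into one line, marking repeats of the previous error."""
    parts: list[str] = []
    previous_error: Optional[str] = None
    for attempt in attempts:
        if attempt.error is None:
            detail = attempt.note or "ok"
        elif attempt.error == previous_error:
            detail = "same"  # repeated error: a signal to change strategy
        else:
            detail = f"failed: {attempt.error}"
        parts.append(f"tried {attempt.label} ({detail})")
        previous_error = attempt.error
    return ", ".join(parts)


log = [
    Attempt("A", error="TypeError"),
    Attempt("B", error="TypeError"),
    Attempt("C", note="error resolved, tests still failing on line 47"),
]
print(summarize(log))
# tried A (failed: TypeError), tried B (same), tried C (error resolved, tests still failing on line 47)
```

The summary goes into the next prompt in place of the transcript; the full log can still be kept on disk for debugging.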
On token economics: sub-agents and frequent cadences multiply costs fast. A 5-minute loop that spawns an implementer and a verifier on every run can burn through a limited plan before breakfast. Keep triage cheap; spawn sub-agents only when state says something is actually actionable.
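Both the tool-call budget and the cadence arithmetic are simple enough to make concrete. The sketch below is an assumption-laden illustration: `ToolBudget` is a hypothetical wrapper, and `actionable_fraction` stands for the share of runs where state says there is real work, which you would have to measure for your own loop.

```python
from typing import Any, Callable


class BudgetExceeded(RuntimeError):
    """Raised when a run asks for more tool calls than it was allotted."""


class ToolBudget:
    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.used = 0

    def call(self, tool: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        """Invoke a tool if budget remains; otherwise signal failure to the loop."""
        if self.used >= self.limit:
            raise BudgetExceeded(f"tool-call budget of {self.limit} exhausted")
        self.used += 1
        return tool(*args, **kwargs)


def runs_per_day(interval_minutes: int) -> int:
    return (24 * 60) // interval_minutes


def subagent_spawns_per_day(
    interval_minutes: int, subagents_per_run: int, actionable_fraction: float = 1.0
) -> int:
    """Estimate daily sub-agent spawns when only some runs find actionable work."""
    return round(runs_per_day(interval_minutes) * subagents_per_run * actionable_fraction)


print(runs_per_day(5))                                          # 288
print(subagent_spawns_per_day(5, 2))                            # 576
print(subagent_spawns_per_day(5, 2, actionable_fraction=0.05))  # 29
```

Spawning an implementer and a verifier on every 5-minute run means 576 sub-agent sessions a day; gating on actionable state cuts that by whatever fraction of runs turn out to be quiet.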
What the loop still won't do for you
The loop changes the work; it doesn't delete you from it. Three problems actually get sharper as the loop gets better, not easier.
Verification is still on you
A loop running unattended is also a loop making mistakes unattended. Splitting the verifier from the maker is what makes the loop's “it's done” mean something — but even then, “done” is a claim, not a proof. Your job is to ship code you confirmed works.
Your understanding rots if you let it
The faster the loop ships code you didn't write, the wider the gap between what exists and what you actually grasp. That's comprehension debt, and a smooth loop makes it grow faster unless you read what the loop made.
The comfortable posture is the dangerous one
When the loop runs itself, it's tempting to stop having an opinion and just take whatever it returns — cognitive surrender. The same loop design accelerates someone who stays the engineer and lets someone else abdicate judgment entirely. Same action, opposite results. The loop doesn't know the difference. You do.
Two people can build the exact same loop and get completely opposite outcomes. One uses it to move faster on work they understand deeply. The other uses it to avoid understanding the work at all. That's what makes loop design harder than prompt engineering, not easier.
Key takeaways

Loop engineering replaces single-shot prompting with iterative, self-driving cycles: act, observe, reason, repeat until a real goal is met.
The quality difference between coding agents usually comes down to loop design, not the base model.
A solid loop needs a clear goal, real tools, context management, termination logic, and genuine error recovery.
The same six primitives have converged across Codex, Claude Code, and Grok — so a well-designed loop is largely tool-agnostic.
Loops are still early. Token economics, verification, and comprehension debt are real costs — build the loop like someone who intends to stay the engineer, not just the person who presses go.
Prompting directly is still powerful and often the right tool. But the leverage point has moved. The job isn't a bigger prompt — it's a system that discovers, assigns, verifies, persists, and knows when to hand off to you.