
The Loop I Was Already Running

Loops are the AI idea of the moment. The part that makes one work... is the part nobody posts about, and I almost missed it myself.

An AI loop is a system you set running once that does the work, checks its own output against a standard you set, and repeats until it clears the bar, instead of you prompting it by hand on every pass. The one thing that decides whether a loop pays off, instead of quietly burning money, is verification: a real, defensible way to check the work.

Jim Haney | June 17, 2026 | 10 min read

The idea is everywhere right now. What almost nobody says out loud is the one thing that decides whether a loop pays off or just quietly burns money. Here it is in plain language, plus a tool I use that you can copy and run today.

From my desk, June 17.

A few weeks back I heard Boris Cherny, who leads Claude Code at Anthropic, describe how he works now, in an on-stage interview with the hosts of the Acquired Podcast. The line stuck with me. He said he does not really prompt the AI anymore. He has loops running a lot of that for him, and his actual job is to write effective loops.

I filed it away. Then it kept showing up. A handful of posts on X, all circling the same topic: loops, loops, loops. Then it landed again in an AI newsletter I subscribe to. When the same idea finds me three different ways in two weeks, I pay attention. So I started pulling the thread.

Here is what caught me off guard. I had been doing this already. Not casually, either. One of the most useful tools in my stack is a loop. I just had not been thinking of it that way, or leaning into it the way I could be.

What a loop is

Start with a thermostat. You set the temperature you want. It reads the room, runs the heat, checks the room again, and keeps adjusting until the room matches the number, all without you standing there touching the dial.

A loop is that, pointed at work. You set the standard. The AI does the work, checks it against the standard, fixes what falls short, and runs again until it clears the bar. You stopped doing each pass by hand. You set the bar, and you let it work.

That is the non-technical version, and it is enough to get the point.

For the technical reader, here is the same thing one level down. A loop is a harness. It figures out what work exists, hands a piece to the model, checks what came back, and repeats. The unit of work moved up a rung. A year ago it was the keystroke. Then it was the prompt. Now, for a lot of us, it is the loop. You are not writing the answer. You are writing the thing that gets the answer and knows when it is good enough to stop. The work did not vanish. It moved up a level, from grinding out each pass to designing the loop and setting the bar it has to clear. That is more demanding, not less.
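A minimal sketch of that harness in Python, for readers who want to see the shape. Everything named here is an assumption for illustration: `run_loop`, `toy_worker`, and `toy_check` are hypothetical, and the worker is a stand-in for a real model call.

```python
from typing import Callable, Iterable


def run_loop(
    work_items: Iterable[str],
    do_work: Callable[[str, str], str],
    check: Callable[[str], tuple[bool, str]],
    max_passes: int = 5,
) -> dict[str, tuple[str, bool]]:
    """For each item: produce, check, feed the failure back in, repeat."""
    results = {}
    for item in work_items:
        output, feedback, passed = "", "", False
        for _ in range(max_passes):
            output = do_work(item, feedback)   # hand a piece to the "model"
            passed, feedback = check(output)   # check what came back
            if passed:
                break                          # good enough: stop
        results[item] = (output, passed)
    return results


def toy_worker(item: str, feedback: str) -> str:
    # Hypothetical stand-in for a model call; it only reacts to one complaint.
    draft = item.capitalize()
    if "period" in feedback:
        draft += "."
    return draft


def toy_check(output: str) -> tuple[bool, str]:
    # The bar: the sentence must end with a period.
    if output.endswith("."):
        return True, ""
    return False, "missing final period"


results = run_loop(["ship the report"], toy_worker, toy_check)
print(results)
```

The worker fails the first pass, reads the feedback, and clears the bar on the second. The interesting design work is all in `check`, not in the loop body.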

The part the hype skips

Now the part nobody puts in a punchy post.

A loop is only as good as its check. If the AI grades its own homework against no real standard, you do not get quality. You get a confident machine telling you it nailed it while it spins in circles and burns tokens (aka real money). That amounts to an expensive form of vibe coding, and the skeptics are right about it.

The difference between a loop that compounds and a loop that wastes your budget comes down to one thing. A real way to verify the work. Run the test. Open the page. Check the claim against a source. Grade it against a standard you can defend. That is the whole game, and it is the least glamorous part, which is exactly why it gets left out of the highlight reel.
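To make "run the test" concrete, here is one hedged sketch of a check that returns evidence instead of an opinion. The helper name `run_check` is hypothetical; it relies only on Python's standard `subprocess` module, and the two commands are toy tests invented for the example.

```python
import subprocess
import sys


def run_check(command: list[str], timeout: float = 60.0) -> tuple[bool, str]:
    """Run a real command and report pass/fail plus what it printed."""
    try:
        proc = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout}s"
    evidence = (proc.stdout + proc.stderr).strip()
    # Exit code 0 is the verdict; the captured output is the evidence.
    return proc.returncode == 0, evidence


# Toy tests, assumed for illustration: one that holds and one that does not.
good = run_check([sys.executable, "-c", "assert 2 + 2 == 4; print('ok')"])
bad = run_check([sys.executable, "-c", "assert 2 + 2 == 5"])
print(good)
print(bad[0])
```

The point of returning the output alongside the verdict is that a failed check tells the next pass what to fix, and a passed check leaves a record you can defend.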

Get the verification right and the leverage is real. Skip it and you have bought an expensive way to "feel productive."

The loop I'd been running

Which brings me to the tool I mentioned, the one I had not been calling a loop.

I call it the Gauntlet. When I have something that matters, a piece of code, a proposal, a plan, a page of pricing, I run it through the Gauntlet before I trust it. The process is simple to describe. First it checks the work against reality, running it or testing its claims against the evidence. Then it builds the strongest case for the work as it stands. Then it switches sides and challenges it, hunting for every weak point, edge case, and reason it could fail. Then it grades the work across a handful of categories and gives it a single score.

Verify → Defend → Challenge → Grade
Score under the bar? Loop back to Verify and run it again.

I set a target, say a nine out of ten. If the work comes in under the target, the Gauntlet does not shrug and move on. It researches, fixes the weak areas the challenge exposed, and runs the whole thing again. It keeps going until the work clears the bar I set. I have run code, a client proposal, website builds, pricing logic, and a board deck through it. What comes back is not a first draft. It is something that has already survived its own cross-examination.

Here is the honest distinction, because the word loop is carrying a lot of weight. As with a thermostat, you set the target. A thermostat runs all day; this one I point at a specific job when I have one. But once I do, it works through every pass on its own (defend, challenge, grade, fix) until the work clears my bar, with no hand-holding from me in between. That is the loop, and it is the part that earns the name. Making it fully hands-off, so it runs on a schedule or after every change, is one wrapper away, and I will point you at that.

The name is not an accident. Run the gauntlet, the old meaning, and you walk between two lines and take hits from both sides. Defense, then challenge. Whatever comes out the other end... earned it.

Why I just rebuilt it as V2

The first version had a flaw, and it is the same flaw hiding in most homemade loops. The same AI made the thing and graded the thing. Tell a model the loop ends at a nine, and it learns the fastest way out is to hand you a nine. The score drifts away from the work.

So I rebuilt it, and the fixes are the same discipline I just described. The grader is now separated from the maker, and it never sees the maker's reasoning or the target score, so it cannot grade toward the exit. Scores come from explicit checks backed by evidence, not a gut number. For code, it runs and tests the work before it argues about it. The loop has guards so it cannot spin forever, which also keeps the cost in check: a handful of passes, then it stops, whether it cleared the bar or not. And it always ends by listing the risks that survived.
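The guards described above can be sketched in a few lines of Python. This is an illustration of the ideas, not the Gauntlet's actual implementation: every name (`Finding`, `Report`, `grade`, `gauntlet`, `contains`, `toy_revise`) and the toy checks are assumptions made up for the example.

```python
from dataclasses import dataclass, field
from typing import Callable

Check = Callable[[str], tuple[bool, str]]


@dataclass
class Finding:
    check: str       # name of the explicit check
    passed: bool
    evidence: str    # what was actually observed


@dataclass
class Report:
    score: float
    cleared: bool
    passes: int
    surviving_risks: list[str] = field(default_factory=list)


def grade(work: str, checks: dict[str, Check]) -> list[Finding]:
    # The grader receives only the work: no maker reasoning, no target score.
    return [Finding(name, *fn(work)) for name, fn in checks.items()]


def gauntlet(work, revise, checks, target=9.0, max_passes=3):
    for n in range(1, max_passes + 1):
        findings = grade(work, checks)
        score = 10.0 * sum(f.passed for f in findings) / len(findings)
        failed = [f for f in findings if not f.passed]
        if score >= target or n == max_passes:
            break                      # cleared the bar, or hit the pass cap
        work = revise(work, failed)    # the maker sees findings, not the target
    risks = [f"{f.check}: {f.evidence}" for f in failed]
    return work, Report(round(score, 2), score >= target, n, risks)


def contains(token: str) -> Check:
    # Toy evidence-backed check: is a required token present in the text?
    def check(work: str) -> tuple[bool, str]:
        found = token in work
        return found, f"{token!r} {'found' if found else 'missing'}"
    return check


FIXES = {"price": " Cost: $500.", "deadline": " Due Friday."}


def toy_revise(work: str, failed: list[Finding]) -> str:
    # Hypothetical maker: it knows how to fix some failures but not all.
    return work + "".join(FIXES.get(f.check, "") for f in failed)


checks = {"price": contains("$"), "deadline": contains("Friday"),
          "owner": contains("Dana")}
final, report = gauntlet("Proposal.", toy_revise, checks)
print(final)
print(report)
```

In this toy run the maker can never satisfy the "owner" check, so the loop stops at the pass cap without clearing the bar and reports that check as a surviving risk, rather than drifting toward a passing score.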

That last part matters. The Gauntlet does not certify perfect. It certifies vetted, with the risks named. That is the more honest claim, and it is the more useful one.

Run it yourself

Here is the part I care most about. I am giving you the whole thing.

A few honest words about what this is. I built the Gauntlet, and I run it in my own stack most days. I am sharing it with everyone because I want to contribute something that helps all of us grow and keep learning, rather than keep it to myself. So there is no gate: no form to fill out, no email to hand over, and no catch. And what you get is the real thing, the same version I rely on, not a stripped-down demo version. Take it the way you would take a tool off a colleague's workbench, then tweak it, rename it, gut it, rebuild it. Make it yours. If it becomes the launch pad for something better, that is a win, and I would love to see what you build. Share your thoughts, your ideas, and what you make of it. Let's build on it together. It goes out under a Creative Commons license, so the one thing I ask is credit if you build on it in public. Past that, it is yours to use and build from.

One note first: getting it is free, but running it needs Claude Code or the Cowork desktop app on a paid Claude plan, with code execution switched on. If you are already working in Claude, you are most of the way there. Installing takes about a minute, and it works a little differently in each app.

In Claude Code

  1. Get the file from the repo at github.com/haneystrategy/gauntlet: clone it, or copy SKILL.md straight from there.
  2. Save it as SKILL.md inside a folder named gauntlet in your Claude skills directory, at ~/.claude/skills/gauntlet/.
  3. Type /gauntlet, or just say "run the gauntlet on this," and point it at a file or whatever you are working on.

In Claude Cowork (the desktop app)

  1. Download gauntlet.zip from the repo at github.com/haneystrategy/gauntlet.
  2. In Cowork, open Customize, then Skills. You can reach Customize from Settings, then Capabilities, where you also switch on Code execution, which skills require.
  3. Click the plus, then Create skill, then Upload a skill, and choose gauntlet.zip.
  4. Run it the same way, with /gauntlet or "run the gauntlet on this." You will know it really ran when it hands back a structured report, with scores by category and a surviving-risk list, not just a confident write-up.

To tune it, set your target, set a per-category floor so one weak area cannot hide behind a strong average, or reweight the categories for what you are vetting. Code leans on soundness and resilience. Writing and strategy lean on value and how well the thing holds up over time.
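Here is a rough sketch of how a target, a per-category floor, and category weights can interact. The function name `verdict`, the four category names, and all the numbers are assumptions chosen for illustration; they are not the skill's actual categories or defaults.

```python
def verdict(
    scores: dict[str, float],
    weights: dict[str, float],
    target: float = 9.0,
    floor: float = 7.0,
) -> tuple[float, bool, list[str]]:
    """Weighted average, plus a floor that no single category may fall under."""
    total_weight = sum(weights[c] for c in scores)
    overall = sum(scores[c] * weights[c] for c in scores) / total_weight
    below_floor = sorted(c for c, s in scores.items() if s < floor)
    cleared = overall >= target and not below_floor
    return round(overall, 2), cleared, below_floor


# Hypothetical category names and weights.
equal = {"soundness": 1, "resilience": 1, "value": 1, "durability": 1}
code_weights = {"soundness": 2, "resilience": 2, "value": 1, "durability": 1}

# A strong average cannot hide one weak category.
hidden = verdict(
    {"soundness": 10, "resilience": 10, "value": 10, "durability": 6}, equal
)
print(hidden)

# The same scores can pass or fail depending on what you weight.
scores = {"soundness": 10, "resilience": 10, "value": 8, "durability": 7}
print(verdict(scores, equal))
print(verdict(scores, code_weights))
```

In the first case the average lands exactly on the target, but the floor still fails the work and names the weak category. In the second, weighting toward soundness and resilience lifts the same scores over the bar.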

It runs at full power in Claude Code or Cowork, where it can spin up a separate grader and run your tests. In a plain chat window it still works, with one honest caveat: a single model is playing every role, which leans back toward the first-version problem of grading its own homework. It pushes against that by re-deriving each judgment from scratch and checking facts with web search, so it is still genuinely useful. Just trust a borderline pass less than one from the full setup.

The whole thing, in one place

The complete skill, the install steps for both apps, and the license all live in the repo: github.com/haneystrategy/gauntlet. Copy it, clone it, fork it, make it yours.

If you take one thing from this, it is not "go build loops." It is narrower than that. Find the work you already repeat, the work that has a clear way to tell good from bad, and wrap a loop around that one. Start there. The leverage is real. The discipline is the verification.

I had the loop the whole time. I just needed to lean in.

All signal. No noise.

Frequently asked questions

What is an AI loop?

An AI loop is a setup where you give the model a goal and a way to check its own work, then let it run, review, fix, and repeat on its own until the result meets the standard you set. Instead of prompting by hand on every pass, you define the bar once and the loop works toward it. Think of a thermostat working toward a temperature, pointed at a task instead of a room.

What is the difference between prompting and writing loops?

Prompting is you typing an instruction and reading the answer, one exchange at a time. Writing a loop is building the thing that does the prompting for you: it decides what to work on, checks what comes back, and keeps going until the work clears a bar. The unit of work moves up a level, from the single prompt to the system that runs many prompts toward a goal.

Do I need to be a developer to use the Gauntlet?

No. It runs inside Claude Code or Claude Cowork, where you add one file and invoke it by name. It works on writing, plans, and strategy, not only code. It even runs in a plain chat window, with a lighter version of the same process. If you can save a text file and type a command, you can run it.

What actually makes an AI loop work?

Verification. A loop with no real way to check its output will happily tell you it succeeded while producing nothing useful and spending money doing it. The loops that pay off all share one trait: a defensible way to grade the work, whether that is running a test, checking a claim against a source, or scoring against an explicit standard. Get that right and the rest follows.

What is a rubric?

A rubric is a scorecard: the specific qualities you are grading for, named before you start. Rather than asking "is this any good?" and trusting a gut answer, you decide what matters, such as whether the claims are accurate, whether it covers everything it should, and whether it holds up under pressure, and then you check the work against each one. That is what turns "looks fine to me" into a score you can actually defend. The Gauntlet grades against six qualities by default, and you can change them to fit whatever you are checking.

What is the Gauntlet skill?

The Gauntlet is a free skill that pressure-tests any piece of work before you trust it. It builds the strongest case for the work, challenges it for weak points, grades it against a rubric, and iterates on the weakest areas until it clears a target score. Version two separates the grader from the maker so the score stays honest, and it always reports the risks that survived. The full file is linked in this article.

