One Command to Generate, Test, Rank, and Merge an Agent Fan-Out

This is the last post in a little series, and it's the one where the pieces snap together. Over three
posts I built a best-of-N agent pipeline a part at a time:

Git worktrees (the isolation substrate): one repo, many working directories, so agents don't clobber each other.
fan-out.sh (generate): run the same task in N worktrees in parallel and collect the diffs.
judge.sh (select): run the tests in each worktree to eliminate the broken attempts, then judge the survivors on quality.

Each was a deliberate single step you could run by hand. That's good for understanding, but in daily use
you don't want to babysit three scripts and copy a timestamp between them. So here's the capstone: one
command that does the whole loop — generate, test, rank, present, merge — with the human kept exactly
where they belong, at the final gate.

The orchestrator

You give it three things: how many attempts, the agent command, and the test command. Everything else is
automatic until the merge decision, which is yours.

#!/usr/bin/env bash
# orchestrate.sh — fan a task out across N worktrees, eliminate failures with a
# test command, rank the survivors, and offer to merge the winner.
#   ./orchestrate.sh <n> '<task-command>' '<test-command>'
set -uo pipefail

N="${1:?usage: orchestrate.sh <n> '<task-cmd>' '<test-cmd>'}"
TASK="${2:?need a task command (string)}"
TEST="${3:?need a test command (string)}"

repo_root="$(git rev-parse --show-toplevel)"
base="$(git rev-parse --abbrev-ref HEAD)"
stamp="$(date +%Y%m%d-%H%M%S)"
runs_dir="$(dirname "$repo_root")/$(basename "$repo_root")-fanout-$stamp"
mkdir -p "$runs_dir"
say() { printf '\n== %s ==\n' "$*"; }

# 1. fan out (parallel, isolated)
say "1/4  fan out $N runs of: $TASK"
pids=()
for i in $(seq 1 "$N"); do
  dir="$runs_dir/run-$i"
  git worktree add -q -b "fanout/$stamp/$i" "$dir" "$base"
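  # Run the task in the worktree. Bail out of the subshell if cd fails, so the
  # commit below can never land in whatever directory we started from.
  ( cd "$dir" || exit 1
    FANOUT_RUN="$i" bash -c "$TASK" >task.log 2>&1
    git add -A && git commit -q -m "fanout run $i" --allow-empty ) &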
  pids+=($!)
done
for pid in "${pids[@]}"; do wait "$pid"; done

# 2. test = eliminate
say "2/4  test each run: $TEST"
green=()
for i in $(seq 1 "$N"); do
  if ( cd "$runs_dir/run-$i" && bash -c "$TEST" >test.log 2>&1 ); then
    echo "  run $i  PASS"; green+=("$i")
  else
    echo "  run $i  FAIL  ($(cd "$runs_dir/run-$i" && tail -1 test.log))"
  fi
done

# 3. rank survivors (cheap proxy: smallest diff first — swap for an LLM judge)
say "3/4  rank ${#green[@]} survivor(s)"
if [ "${#green[@]}" -eq 0 ]; then
  echo "  no run passed — nothing to merge. branches: fanout/$stamp/*"; exit 1
fi
ranked=$(for i in "${green[@]}"; do
  churn=$(git -C "$repo_root" diff --numstat "$base..fanout/$stamp/$i" | awk '{a+=$1+$2} END{print a+0}')
  echo "$churn $i"
done | sort -n)
echo "$ranked" | while read -r c i; do echo "  run $i  (+/- $c lines)"; done
winner=$(echo "$ranked" | head -1 | awk '{print $2}')

# 4. present + gate the merge
say "4/4  winner: run $winner  [fanout/$stamp/$winner]"
git -C "$repo_root" diff "$base..fanout/$stamp/$winner"
echo
read -r -p "merge run $winner into $base? [y/N] " ans
if [ "$ans" = y ] || [ "$ans" = Y ]; then
  git -C "$repo_root" merge --no-edit "fanout/$stamp/$winner" && echo "merged."
else
  echo "left unmerged. branches kept under fanout/$stamp/*"
fi

# cleanup worktrees (branches kept until you prune them)
for i in $(seq 1 "$N"); do git worktree remove --force "$runs_dir/run-$i" 2>/dev/null; done
git worktree prune
echo "worktrees removed. surviving branches: fanout/$stamp/*"

It's the previous two scripts welded together, plus a ranking step and a merge gate. The design choices
that matter:

One base, one timestamp. Every run branches from the same base, and a single stamp namespaces all
the branches and directories for this invocation, so a run never collides with yesterday's.

Test is the eliminator; ranking is a separate, swappable thing. Failing the tests removes you from
contention, full stop. Among the survivors, I rank by a deliberately dumb proxy: smallest diff first,
on the theory that the simplest change that passes is usually the one you want. That block is the seam:
swap it for the judge.sh quality pass or an LLM-as-judge call when "fewest lines" isn't good enough.

The human gate is non-negotiable. Everything up to the merge is automated; the merge itself waits on
read -p. Automate the tedium (spinning up, testing, eliminating, ranking); keep a person on the one
irreversible step.

Disposable by default. Worktrees are torn down at the end, with one exception: if no run passes, the
script exits before cleanup and leaves them in place for inspection. The branches are kept (they're
your audit trail and your escape hatch if you picked wrong) until you git branch -D them.
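
To make the "seam" concrete, here is a minimal sketch of the ranking step with the scorer pulled out into a parameter. The names rank_by and fake_churn are hypothetical, not part of orchestrate.sh; fake_churn just returns canned numbers in place of the real git diff --numstat count, so the sketch runs without a repo.

```shell
#!/usr/bin/env bash
# rank_by <scorer> <run-id>...
# Prints "<score> <run-id>" lines, lowest score first. <scorer> is the name of
# any function or command that prints one number for a given run id.
rank_by() {
  local scorer="$1"; shift
  local id
  for id in "$@"; do
    printf '%s %s\n' "$("$scorer" "$id")" "$id"
  done | sort -n
}

# Hypothetical scorer standing in for the churn count. A judge.sh call or an
# LLM-as-judge wrapper would slot in here instead.
fake_churn() {
  case "$1" in
    1) echo 2 ;;
    3) echo 6 ;;
    *) echo 99 ;;
  esac
}

ranked="$(rank_by fake_churn 3 1)"
echo "$ranked"
winner="$(echo "$ranked" | head -1 | awk '{print $2}')"
echo "winner: run $winner"
```

With this shape, changing the ranking policy means writing a different scorer function and leaving the rest alone. If the scorer returns "higher is better" (as a quality judge probably would), sort -rn is the likely adjustment.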

Running the whole thing

I ran it end to end against a throwaway repo. The stand-in "agent" produces a different add() per run —
run 1 clean and correct, run 2 buggy (it subtracts), run 3 correct but more elaborate — and the test
command checks add 2 3 == 5. One command, the full loop:

$ echo y | ./orchestrate.sh 3 '/tmp/fake-agent.sh' 'bash test.sh'

== 1/4  fan out 3 runs of: /tmp/fake-agent.sh ==

== 2/4  test each run: bash test.sh ==
  run 1  PASS
  run 2  FAIL  (FAIL: add 2 3 = -1, want 5)
  run 3  PASS

== 3/4  rank 2 survivor(s) ==
  run 1  (+/- 2 lines)
  run 3  (+/- 6 lines)

== 4/4  winner: run 1  [fanout/20260618-010947/1] ==
diff --git a/solution.sh b/solution.sh
-add() { :; }   # TODO
+add() { echo $(($1 + $2)); }

merged.
worktrees removed. surviving branches: fanout/20260618-010947/*

And main afterward holds exactly the winning attempt:

$ git show main:solution.sh
add() { echo $(($1 + $2)); }

That's the entire best-of-N loop in one command. The buggy run was eliminated without my attention, the
two working runs were ranked, the simplest one was surfaced with its diff, and it reached main only
because I answered y. Generate, test, rank, gate, merge — done.
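
The stand-in agent and test script are described above but not shown. As a rough guess at their shape (these are hypothetical reconstructions, not the actual /tmp/fake-agent.sh and test.sh, and the diff sizes will not necessarily match the run above), something like the following would produce the same pass/fail pattern. It runs in throwaway directories with no git involved, and relies only on the FANOUT_RUN variable that the orchestrator exports to each task.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins: the post's real fake-agent.sh and test.sh are not shown.

# fake_agent: write a different add() into ./solution.sh depending on FANOUT_RUN.
fake_agent() {
  case "${FANOUT_RUN:-1}" in
    1) printf '%s\n' 'add() { echo $(($1 + $2)); }' ;;   # clean and correct
    2) printf '%s\n' 'add() { echo $(($1 - $2)); }' ;;   # buggy: subtracts
    *) printf '%s\n' 'add() {' '  local a="$1" b="$2"' '  echo $((a + b))' '}' ;;  # correct, longer
  esac > solution.sh
}

# run_test: load ./solution.sh in a subshell and check that add 2 3 prints 5.
run_test() (
  . ./solution.sh
  got="$(add 2 3)"
  if [ "$got" = 5 ]; then
    echo "PASS"
  else
    echo "FAIL: add 2 3 = $got, want 5"
    exit 1
  fi
)

# Drive three runs the way the orchestrator's test step does.
demo_root="$(mktemp -d)"
results=""
for run in 1 2 3; do
  mkdir -p "$demo_root/run-$run"
  if ( cd "$demo_root/run-$run" && export FANOUT_RUN="$run" && fake_agent && run_test >test.log 2>&1 ); then
    results+="run $run PASS"$'\n'
  else
    results+="run $run FAIL ($(tail -1 "$demo_root/run-$run/test.log"))"$'\n'
  fi
done
printf '%s' "$results"
rm -rf "$demo_root"
```

Runs 1 and 3 pass and run 2 fails with "add 2 3 = -1", which is the elimination behaviour the transcript shows.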

The rough edges (because there always are some)

I'm pasting the real output, which means I have to be honest about what it shows. Two things to fix
before you point this at a real project:

It commits its own logs. Look closely at the merge and you'll see task.log rode along into the
commit: the orchestrator does git add -A, which scoops up the task.log it wrote a moment earlier.
(test.log is written only after the commit, so it stays out.) The fix is a .gitignore entry for each
log file (task.log and test.log), or committing a curated set instead of -A. Harmless in the demo;
annoying in a real repo.

"Smallest diff" is a proxy for simplicity, not a measure of quality. It happened to pick the right
run here because the clean solution genuinely was the smallest. It will happily mis-rank a tight,
clever-but-wrong-shaped change above a slightly longer, clearer one. For anything that matters, replace
the ranking block with a real judge: the test-then-quality pass from the last post, or an LLM-as-judge
scoring the survivors against an explicit rubric.
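
Besides .gitignore, another way to keep logs out of the commits is to never write them inside the worktree at all. The sketch below is an assumption about how that could look, not something orchestrate.sh does today: run_logged is a hypothetical helper, and the logs/ directory sitting beside the run-N directories is an invented layout.

```shell
#!/usr/bin/env bash
# run_logged <workdir> <logfile> <command-string>
# Runs the command inside <workdir> but sends its output to <logfile>, which is
# expected to live outside the worktree so that git add -A never sees it.
run_logged() {
  local workdir="$1" logfile="$2" cmd="$3"
  mkdir -p "$(dirname "$logfile")"
  ( cd "$workdir" && bash -c "$cmd" ) >"$logfile" 2>&1
}

# Demo with throwaway directories (hypothetical layout: logs/ beside run-N/).
demo_root="$(mktemp -d)"
mkdir -p "$demo_root/run-1"
run_logged "$demo_root/run-1" "$demo_root/logs/run-1.task.log" 'echo "hello from the agent"'

leftovers="$(ls -A "$demo_root/run-1")"           # empty: nothing was written in the worktree
logged="$(cat "$demo_root/logs/run-1.task.log")"  # the captured output
echo "worktree files: '${leftovers}'"
echo "log contents:   '${logged}'"
```

If you adopt this, the tail -1 test.log in the FAIL message would need to point at the new log path, and the logs directory will outlive the worktree cleanup, which is arguably what you want from a log.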

And the same standing caveats from the whole series still apply: each worktree needs its dependencies,
so fold the install into your task or test command ('npm ci && npm test', not bare 'npm test'); green
tests prove only the absence of the bugs you tested for; and this is a single-machine, synchronous toy
— fine for a handful of runs, but the moment you want a queue, retries, or a fleet, you've outgrown bash.

Where the series lands

Four posts, one idea built up in layers: isolation (worktrees) makes parallel generation
(fan-out.sh) safe, parallel generation makes selection (judge.sh) meaningful, and selection only
pays off when it ends in a gated merge (orchestrate.sh). Fifty-odd lines of bash, and you can run
best-of-N on your own machine this afternoon.

All three scripts, plus a self-contained demo you can run with no agent wired up, are on GitHub:
github.com/egarim/agent-fanout (MIT). Clone it, drop the scripts into a repo, and run
./examples/demo.sh to watch the whole loop end to end.

The real point was never the scripts. It's that this exact shape — fan out into isolated worktrees, verify
each attempt, rank, and merge the winner behind a human gate — is what the serious agent harnesses do for
you now. Claude Code, for one, works along these lines: isolated worktrees per attempt, parallel execution,
verification before anything lands. Building the hand-cranked version is worth it not because you'll keep
using it, but because you'll never again treat that orchestration as a black box. You'll know it's
git worktree add in a loop, a wait, a test, a sort, and a merge you chose to allow — and knowing that
is exactly what lets you hand the real thing bigger, riskier work with your eyes open.