Multi-Agent Coordination Without an LLM
A deterministic coordinator pattern for parallel AI agents — goals, budgets, feedback loops, and redirect signals without LLM judgment in the control plane.
You have three AI agents running in parallel, each generating candidates for the same goal. They need to know what’s already been tried, when to change strategy, and when to stop. The obvious move: put an LLM in the middle to coordinate. Read each agent’s output, decide who should pivot, tell them what to do next.
This is the wrong move.
The Problem with LLM Coordinators
An LLM coordinator introduces three failure modes:
Subjective stopping. An LLM reads an agent’s output and decides “this direction looks exhausted.” But the LLM doesn’t have ground truth — it’s guessing based on vibes. An agent that found nothing in 50 tries might find gold on try 51 if the search space is large enough. Only objective metrics (hit rate, budget remaining, target reached) should trigger stops.
State drift. The coordinator needs to track what’s been submitted, what’s been checked, what’s duplicated. An LLM tracking this in its context window will lose items, double-count, and hallucinate state. Context windows are not databases.
Latency and cost. Every coordination decision requires an LLM call. If you have five agents each checking in every 30 seconds, that’s 10 coordinator calls per minute — each burning tokens to re-read state that a SQLite query could answer in microseconds.
The Pattern: Deterministic Coordinator
Separate the creative work from the coordination work. Agents (LLMs) do the creative part — generating candidates, exploring strategies, adapting to feedback. The coordinator is a plain program — no LLM, no inference, no judgment calls. It owns:
- Goals and stop conditions. Each goal has a target (e.g., “find 50 results meeting criteria X”) and objective completion rules.
- Worker registration and budgets. Each agent gets a workspace, a strategy assignment, and a budget (how many items to process before stopping).
- Candidate dedup. A global set of everything already submitted. No agent wastes effort on items another agent already tried.
- Result recording. Every submission is tracked — accepted, rejected, duplicate, error. The coordinator is the single source of truth.
- Feedback generation. Deterministic signals derived from observed data, not LLM interpretation.
The coordinator is a CLI backed by a local database. Agents interact with it through commands, not conversation.
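A minimal sketch of what that local database might look like, assuming SQLite. The table and column names here are illustrative, not a prescribed layout:

```python
import sqlite3

# Illustrative schema for the coordinator's database.
# All table and column names are assumptions, not a spec.
SCHEMA = """
CREATE TABLE IF NOT EXISTS goals (
    id      INTEGER PRIMARY KEY,
    topic   TEXT NOT NULL,
    target  INTEGER NOT NULL,                      -- e.g. 50 accepted results
    state   TEXT NOT NULL DEFAULT 'continue'       -- 'continue' | 'complete'
);
CREATE TABLE IF NOT EXISTS workers (
    id       TEXT PRIMARY KEY,
    goal_id  INTEGER NOT NULL REFERENCES goals(id),
    strategy TEXT NOT NULL,
    budget   INTEGER NOT NULL                      -- items left before this worker stops
);
CREATE TABLE IF NOT EXISTS candidates (
    value     TEXT NOT NULL,
    goal_id   INTEGER NOT NULL REFERENCES goals(id),
    worker_id TEXT NOT NULL REFERENCES workers(id),
    outcome   TEXT NOT NULL,      -- 'accepted' | 'rejected' | 'duplicate' | 'error'
    PRIMARY KEY (goal_id, value)  -- global dedup: one row per candidate per goal
);
"""

def open_db(path: str = "coordinator.db") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

The primary key on `(goal_id, value)` is what makes dedup a database guarantee rather than a bookkeeping habit.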
Goal Lifecycle
The lifecycle has five steps:
1. Create a goal with constraints — topic, strategy hints, target count, quality thresholds. The goal defines what “done” looks like in measurable terms.
2. Register workers. Each agent gets an ID, a workspace directory, a strategy assignment, and per-worker limits. Strategies should be disjoint — if one agent is exploring short names and another is exploring compound words, their search spaces overlap minimally.
3. Submit and check. Agents generate candidates and submit them to the coordinator. The coordinator deduplicates against the global checked set, processes accepted candidates, and records results — all in one atomic operation.
4. Read feedback. After each submission round, agents read their worker feedback and the goal-level feedback. This is where they learn what’s working and what isn’t.
5. Stop on objective conditions. The goal is complete when the target count is reached, the budget is exhausted, or the operator manually stops it. Not when an agent “feels done.”
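Step 3's atomic submit-and-check might look like the sketch below, assuming illustrative `candidates` and `workers` tables and an objective `accept` predicate supplied by the goal. Every name here is hypothetical:

```python
import sqlite3

def submit_and_check(conn: sqlite3.Connection, goal_id: int,
                     worker_id: str, value: str, accept) -> str:
    """One atomic submission: dedup, process, record.

    `accept` is the goal's objective acceptance check — a plain
    predicate over the candidate, never an LLM call.
    """
    with conn:  # one transaction: dedup and recording happen together
        dup = conn.execute(
            "SELECT 1 FROM candidates WHERE goal_id = ? AND value = ?",
            (goal_id, value)).fetchone()
        if dup:
            outcome = "duplicate"
        else:
            outcome = "accepted" if accept(value) else "rejected"
        # INSERT OR IGNORE keeps the first record if the row already exists.
        conn.execute(
            "INSERT OR IGNORE INTO candidates (value, goal_id, worker_id, outcome) "
            "VALUES (?, ?, ?, ?)", (value, goal_id, worker_id, outcome))
        # Every submission counts against the worker's budget, duplicates included.
        conn.execute("UPDATE workers SET budget = budget - 1 WHERE id = ?",
                     (worker_id,))
    return outcome
```

Counting duplicates against the budget is a deliberate choice in this sketch: wasted effort is still effort, and a worker burning its budget on duplicates will see that in its feedback.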
The Feedback Loop
This is what makes the pattern work. Feedback is deterministic — computed from observed data, not generated by an LLM reading summaries.
Worker-level feedback
Each agent gets a report specific to its own performance:
| Signal | What it tells the agent |
|---|---|
| Budget remaining | How many more items it can process |
| Target remaining | How many more hits the worker needs |
| Duplicate rate | How often it’s submitting items another agent already tried |
| Hit rate | What fraction of its submissions are succeeding |
| Recent successes | Its last accepted results (reinforcement) |
High duplicate rate means the agent’s strategy is converging with another agent’s. Time to diversify. Low hit rate means the current approach isn’t working — but that’s the agent’s problem to solve creatively, not the coordinator’s.
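Every signal in the table above is a fold over recorded outcomes. A sketch, assuming the worker's history arrives as `(value, outcome)` pairs and that field names are illustrative:

```python
from dataclasses import dataclass

@dataclass
class WorkerFeedback:
    budget_remaining: int
    target_remaining: int
    duplicate_rate: float
    hit_rate: float
    recent_successes: list

def worker_feedback(outcomes: list, budget: int, target: int) -> WorkerFeedback:
    """Deterministic worker-level feedback computed from this worker's
    recorded submission history — no LLM reads anything."""
    total = len(outcomes)
    dups = sum(1 for _, o in outcomes if o == "duplicate")
    hits = [v for v, o in outcomes if o == "accepted"]
    return WorkerFeedback(
        budget_remaining=budget - total,
        target_remaining=max(0, target - len(hits)),
        duplicate_rate=dups / total if total else 0.0,
        hit_rate=len(hits) / total if total else 0.0,
        recent_successes=hits[-5:],  # last accepted results, as reinforcement
    )
```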
Goal-level feedback
A broader view across all workers:
| Signal | What it tells the agent |
|---|---|
| Total progress | Checked count, target remaining, queue depth |
| Goal state | continue or complete |
| Global hit rate | How productive the entire team is |
| Per-strategy performance | Which strategies are producing results |
| Duplicate pressure | How much redundant work is happening across all agents |
Per-strategy performance is powerful. If strategy A has a 15% hit rate and strategy B has 2%, agents assigned to B can see this and pivot — without being told to by a coordinator. The data speaks for itself.
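The per-strategy signal is likewise just an aggregation over observed results. A sketch, assuming records arrive as `(strategy, outcome)` pairs:

```python
from collections import defaultdict

def per_strategy_hit_rates(results: list) -> dict:
    """Fold (strategy, outcome) records into per-strategy hit rates.
    Pure counting over recorded data — nothing is interpreted."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for strategy, outcome in results:
        totals[strategy] += 1
        if outcome == "accepted":
            hits[strategy] += 1
    return {s: hits[s] / totals[s] for s in totals}
```

An agent on a losing strategy doesn't need to be told to pivot; it reads this dict in the goal-level feedback and draws its own conclusion.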
Redirect Signals
The coordinator emits redirect messages — but they’re deterministic observations, not instructions.
```
redirect: "hit rate below threshold — consider narrowing constraints"
redirect: "high duplicate pressure from strategy X — try a different direction"
redirect: "shortest successful results are 5-6 characters — prioritize that range"
```

These are generated by rules: if hit rate drops below a configured threshold, emit the message. If duplicate submissions from one strategy exceed a percentage, emit the message. No LLM interprets anything. The coordinator just reports what the numbers say.
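Those rules reduce to a handful of threshold checks. A sketch, where the thresholds are configuration values (the defaults below are illustrative, not from the original):

```python
def redirect_signals(hit_rate: float, dup_rate_by_strategy: dict,
                     hit_threshold: float = 0.05,
                     dup_threshold: float = 0.30) -> list:
    """Rule-based redirect messages. Thresholds are configuration,
    not judgment; no model reads any agent output."""
    signals = []
    if hit_rate < hit_threshold:
        signals.append("hit rate below threshold — consider narrowing constraints")
    for strategy, rate in dup_rate_by_strategy.items():
        if rate > dup_threshold:
            signals.append(
                f"high duplicate pressure from strategy {strategy} "
                "— try a different direction")
    return signals
```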
The critical design choice: redirect messages are hints, not commands. They don’t grant permission to stop. An agent reads “hit rate below threshold” and might decide to change its approach — but it keeps going until an objective stop condition is met (budget exhausted, target reached, goal complete).
This prevents the biggest failure mode of LLM coordination: an agent that gives up too early because the coordinator (or the agent itself) decided the situation “looks hopeless.” In large search spaces, persistence past apparent exhaustion is often where the best results come from.
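The objective stop conditions collapse to a single boolean over coordinator state, which is exactly why no LLM is needed to decide them. A sketch with hypothetical parameter names:

```python
def goal_complete(accepted: int, target: int,
                  budget_remaining: int, manually_stopped: bool) -> bool:
    """Objective stop check: target reached, budget exhausted, or an
    operator stop. No output-reading, no 'looks exhausted' judgment."""
    return accepted >= target or budget_remaining <= 0 or manually_stopped
```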
Why This Works
The pattern works because it separates two fundamentally different kinds of work:
Creative work (what LLMs are good at): generating novel candidates, adapting strategies, exploring unexpected directions, interpreting qualitative feedback.
Bookkeeping (what databases are good at): tracking what’s been tried, computing hit rates, enforcing budgets, detecting duplicates, determining if a goal is complete.
Putting an LLM in the bookkeeping role wastes its strengths and amplifies its weaknesses. An LLM coordinator is slower, less accurate, more expensive, and less reliable than a deterministic program doing the same job.
The database is the source of truth. The feedback loop is the communication channel. The agents are creative workers who read objective data and make their own decisions about how to proceed.
The Architecture in Summary
```
┌─────────────────────────────────────────────┐
│  Agent 1       Agent 2       Agent 3        │ ← LLMs (creative work)
│  Strategy A    Strategy B    Strategy C     │
└─────┬──────────────┬──────────────┬─────────┘
      │ submit       │ submit       │ submit
      ▼              ▼              ▼
┌─────────────────────────────────────────────┐
│  Coordinator (deterministic CLI)            │ ← No LLM
│  • Dedup against global checked set         │
│  • Record results                           │
│  • Compute feedback (hit rate, budgets)     │
│  • Emit redirect signals (rule-based)       │
│  • Evaluate stop conditions                 │
├─────────────────────────────────────────────┤
│  Local Database (SQLite)                    │ ← Source of truth
│  • Goals, workers, budgets                  │
│  • Candidate pool (deduped)                 │
│  • Results, events, strategies              │
└─────────────────────────────────────────────┘
```

Agents submit candidates, read feedback, adapt. The coordinator tracks everything, computes signals, decides nothing. Creative decisions stay with the LLMs. State management stays with the database.
No LLM judgment in the control plane. Just data in, signals out.