Parallel AI Research Pipelines
Three systems for orchestrating parallel AI agents — from JSONL work items to declarative workspaces to phased research pipelines. The patterns that actually work.
I needed protocol documentation for 19 top-level domains — DNS behavior, WHOIS formats, RDAP endpoints, registration rules, rate limits, raw captures. Each TLD is its own research unit with its own servers, formats, and quirks. Doing them sequentially would take days.
So I wrote a prompt that launched 19 parallel subagents, each researching one TLD in its own isolated directory, then ran a review pass to find gaps, then launched a second research wave, then a documentation pass. The whole thing ran in one session.
This article is about the pattern that emerged — not the TLD research itself, but the structure for running parallel AI research at scale.
The Problem with Naive Parallel Agents
The obvious approach: “research these 19 things in parallel.” Give each agent a topic and let it go. This fails in predictable ways:
- Agents overwrite each other. Two agents writing to the same summary file. Merge conflicts in shared state. Lost work.
- No consistency. Agent 1 captures WHOIS response time. Agent 7 doesn’t. Agent 12 uses a different JSON schema. You can’t compare findings across units.
- No refinement. First-pass research always has gaps. Without a review step, gaps stay gaps.
- No machine-readable output. Agents default to markdown prose. Prose is hard to aggregate, diff, or feed into code.
The Three-Phase Pattern
The structure that works:
```
Phase 1: Explore (parallel)   → raw findings per unit
Phase 2: Review & Refine      → cross-unit analysis → v2 template → second pass
Phase 3: Document (parallel)  → uniform deliverables
```

Each phase has different parallelism characteristics. Phases 1 and 3 are embarrassingly parallel (one agent per unit, no coordination). Phase 2 is sequential — a single review agent reads everything and produces the refined template.
The folder structure
```
research_root/
├── 1_explore/{unit_a, unit_b, ...}/   # Phase 1 workspaces
├── 2_research/{unit_a, unit_b, ...}/  # Phase 2 workspaces
├── 3_writing/{unit_a, unit_b, ...}/   # Phase 3 workspaces
├── {unit}_documentation/              # Final deliverables
├── prompts/                           # Templates (v1, v2)
├── templates/                         # Schemas, response formats
├── summaries/                         # Cross-unit analysis
├── analysis/                          # Review outputs
└── tools/                             # Shared scripts, configs
```

The key insight: each phase gets its own directory tree. Phase 2 agents don't touch Phase 1 directories. This makes the workspace append-only at the directory level — you can always go back and see exactly what each agent produced at each stage.
Isolation: Three Approaches
The single most important rule across all three systems: agents must not interfere with each other. There are different ways to enforce this:
Directory isolation (research pipeline) — each agent writes only in its assigned directory:
```
Agent for unit "net" in Phase 1:
  CAN write:  1_explore/net/*
  CAN read:   tools/*, prompts/*, templates/*
  CANNOT:     1_explore/org/*, 2_research/*, anything else
```

Git worktree isolation (work system) — each agent gets a separate copy of the repository on disk:
```shell
# Each task runs in its own worktree
claude --worktree work-W001 "Fix the port mismatch..."
# Creates branch worktree-work-W001, separate working directory
# Other agents on other worktrees can't see uncommitted changes
```

Pane isolation (workspace manager) — each agent runs in its own terminal pane, sharing the repo but partitioned by prompt:
```yaml
# workspace manager: declarative layout, agents share the repo but work on different dirs
panes:
  - name: agent-01
    closing: "Work ONLY in src/parser/. Commit when done."
  - name: agent-02
    closing: "Work ONLY in src/extraction/. Commit when done."
```

Directory isolation is simplest — no git machinery needed. Worktrees are strongest — agents literally can't see each other's uncommitted work. Pane isolation is fastest to set up — just a YAML file — but relies on the agent obeying its prompt.
For research, directory isolation is sufficient. For code changes, worktrees are safer.
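Directory isolation is also easy to enforce mechanically rather than by prompt alone. A minimal sketch — the function name and check are my own, not from any of the three systems — that treats `phase/unit` as the only writable root:

```python
from pathlib import Path

def write_allowed(unit: str, phase_dir: str, target: str) -> bool:
    """An agent may write only inside its own <phase>/<unit>/ workspace.

    Sketch only: real enforcement should resolve() paths first to
    reject '..' traversal and symlink escapes.
    """
    allowed_root = Path(phase_dir) / unit
    try:
        # relative_to raises ValueError when target is outside allowed_root
        Path(target).relative_to(allowed_root)
        return True
    except ValueError:
        return False
```

A pre-write hook calling a check like this turns "please stay in your directory" from a polite request into a hard boundary.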
Machine-Readable First
The second critical rule: JSON is authoritative, markdown is derived.
Each agent produces two outputs per phase:
- `findings.json` — structured data with a defined schema, every field sourced
- `notes.md` — human-readable summary, explicitly non-authoritative
Why not just markdown? Because the review agent needs to aggregate across all units. Reading 19 markdown files and extracting comparable data is fragile. Reading 19 JSON files with the same schema is trivial.
```json
{
  "unit": "net",
  "registry_operator": "Verisign",
  "lookup_server": "whois.example-registry.com",
  "whois_available_pattern": "No match for \"DOMAIN.NET\".",
  "rdap_base": "https://rdap.verisign.com/net/v1",
  "rdap_available_status": 404,
  "min_label_length": 3,
  "rate_limiting": {"whois": "undocumented", "rdap": "429 + Retry-After"},
  "sources": ["https://www.verisign.com/...", "live probe 2026-03-28"]
}
```

Every field has a `sources` array. If the review agent questions a finding, it can trace back to the original source. No "trust me, I researched it."
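This is why the shared schema matters: cross-unit aggregation becomes a loop, not a parsing project. A sketch of the review agent's first step, assuming the directory layout described earlier (one `findings.json` per unit workspace):

```python
import json
from pathlib import Path

def aggregate_findings(explore_root):
    """Merge every unit's findings.json into one dict keyed by unit."""
    merged = {}
    for path in sorted(Path(explore_root).glob("*/findings.json")):
        data = json.loads(path.read_text())
        merged[data["unit"]] = data
    return merged
```

With 19 units this yields a single in-memory dict the review agent can group, diff, and gap-check.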
The Two-Pass Refinement
This is what makes the pattern actually work, not just “parallel agents doing things.”
Phase 1 uses a generic template. Agents do their best, but they don’t know what they don’t know. Some agents capture edge cases others miss. Some discover dimensions the template didn’t anticipate.
The review step reads all Phase 1 outputs and produces:
- A global findings file (unified spec across all units)
- A taxonomy of categories discovered (not just the ones you predicted)
- A gap analysis (what each unit is missing)
- A v2 template incorporating everything Phase 1 revealed
Phase 2 uses the v2 template. Now every agent knows to look for the edge cases that only some agents found in Phase 1. The quality floor rises dramatically.
What the review agent actually produces
For the TLD research, the review agent read 19 findings.json files and produced:
- Implementation tiers — grouping TLDs by complexity (trivial: .net is identical to .com; custom: .uk needs a unique parser; special: .ch blocks WHOIS entirely)
- Parser families — TLDs sharing the same backend/format (Identity Digital runs .org, .io, .ai with identical WHOIS patterns)
- Gap analysis — “.fr agent didn’t capture rate limit behavior” / “.se agent missed zone file AXFR access”
- v2 template — now includes: "check for AXFR zone file access" (only discovered by the .se agent), "capture WHOIS connection terminator behavior" (only .de closes the connection instead of using `<<<`)
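The gap analysis itself is mechanical once all findings share a schema. A minimal sketch (function and field names are illustrative, not the pipeline's actual code):

```python
def gap_analysis(findings_by_unit, required_fields):
    """Report, per unit, which required fields are missing or empty."""
    gaps = {}
    for unit, findings in findings_by_unit.items():
        missing = sorted(f for f in required_fields
                         if findings.get(f) in (None, "", [], {}))
        if missing:
            gaps[unit] = missing
    return gaps
```

The output feeds directly into the v2 template: every gap becomes an explicit instruction for the second pass.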
Live Captures as Ground Truth
Agents shouldn’t just search the web. They should probe live systems and capture raw responses.
```shell
# Shared probe tool available to all agents (read-only)
probe.py --target net --type whois --domain google.net
```

Raw captures serve two purposes:
- Truth. Web search results can be outdated. RFC text can be ambiguous. A raw WHOIS response is unambiguous.
- Parser guidance. When you later implement a parser, the raw captures are your test fixtures. You don’t need to re-query live servers.
Captures are immutable — written once, never edited. If a second probe gives different results, you capture both. Contradictions are data.
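A write-once capture store takes only a few lines. A sketch, assuming a per-unit captures directory; the naming scheme (timestamp plus content hash) is my own convention, not the original pipeline's:

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path

def store_capture(capture_dir, kind, raw: bytes) -> Path:
    """Write a raw probe response once; a differing re-probe gets a new file."""
    digest = hashlib.sha256(raw).hexdigest()[:12]
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    path = Path(capture_dir) / f"{kind}_{stamp}_{digest}.raw"
    if not path.exists():  # immutable: never rewrite an existing capture
        path.write_bytes(raw)
    return path
```

Because the filename includes a content hash, contradictory probes land side by side instead of overwriting each other — the contradiction is preserved as data.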
From Pattern to Tool
A template describes a pattern. A task file makes it executable. A CLI makes it repeatable. Each layer reduces how much the operator needs to get right.
Layer 1: Natural language prompt (one-shot)
The TLD research started as a single message:
“Research all remaining TLDs… use one subagent per TLD, give each its own directory… ensure agents never overwrite each other’s work.”
This works once. It’s not repeatable — the next researcher writes a different prompt, gets different structure, produces incomparable output.
Layer 2: Template with variables (repeatable)
Extract the pattern into a template with {{VARIABLES}}:
```
Phase 1: Explore (parallel — one agent per {{UNIT}})
  - Each agent works ONLY in 1_explore/{{UNIT_ID}}/
  - Live probe: {{PROBE_TARGETS}} via proxied connections
  - Persist: findings.json + notes.md
```

Now anyone can fill in the variables and get the same structure. But it's still manual — you read the template, fill it in mentally, write the prompt.
Layer 3: Task file (executable)
Make the filled-in template machine-readable — a JSONL record per unit:
```json
{
  "id": "C01",
  "slug": "bot-detection-2026",
  "title": "How Websites Detect Bots in 2026",
  "kind": "article",
  "status": "planned",
  "source_map": "analysis/01-bot-detection-2026.md",
  "sources": ["docs/research/03-anti-bot-landscape-2026.md", "..."],
  "parallel": true
}
```

This is the same pattern as a tasks.jsonl in any work system — each line is one unit of work with enough context to build a prompt and launch an agent.
Layer 4: CLI tool (trackable)
A script reads the task file, builds the prompt, and launches the agent:
```shell
# Development tasks (sequential work system)
work run T001             # read JSONL → build prompt → claude --worktree

# Parallel agents (declarative workspace manager)
workspace start team.yml  # read YAML → split panes → launch agents

# Content pipeline (same pattern)
scripts/content draft C01 # read JSONL → read source map → claude
```

All three do the same thing: read structured task data, assemble a prompt with the right context, launch Claude. The data model and orchestration differ, but the core loop is identical.
The three-layer prompt sandwich
workspace manager introduces a useful pattern for prompt assembly — the three-layer sandwich:
```
Layer 1: Universal rules       (TESTING-RULES.md — same across all agents)
Layer 2: Task-specific prompt  (01-parser-accuracy.md — unique per agent)
Layer 3: Closing block         (verification + tracking + commit sequence)
```

Layers 1 and 3 stay constant. Layer 2 is the variable. This ensures every agent follows the same verification and state-update protocol, regardless of what task it's working on.
The research pipeline has the same structure implicitly: the template is Layer 1 + 3, the unit-specific assignment is Layer 2. Making it explicit (like workspace manager does) is cleaner.
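Made explicit, the sandwich is a trivial function — and that triviality is the point: only the middle layer ever varies. A sketch using the file names from the example above:

```python
from pathlib import Path

def build_prompt(rules_path, task_path, closing: str) -> str:
    """Three-layer sandwich: constant rules, per-agent task, constant closing."""
    return "\n\n".join([
        Path(rules_path).read_text().strip(),  # Layer 1: universal rules
        Path(task_path).read_text().strip(),   # Layer 2: task-specific
        closing.strip(),                       # Layer 3: verify/track/commit
    ])
```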
What This Produced
For the TLD research specifically:
| Metric | Value |
|---|---|
| TLDs researched | 19 |
| Phases | 3 (explore, research, documentation) |
| Total agents launched | ~60 (19 per phase + review agents) |
| Raw captures | WHOIS + DNS + RDAP per TLD, both registered and available |
| Final output | 19 implementation guides with raw captures |
| Implementation tiers identified | 5 (trivial → special) |
| Parser families identified | 14 |
The deliverables were dense enough that implementing a new TLD in the scanner required reading one README and copying one set of raw captures as test fixtures. No additional research needed.
Three Systems, One Pattern
I’ve now built three systems that all solve the same problem — coordinating parallel AI agents with shared state — in different domains:
| Aspect | Work System | workspace manager | Research Pipeline |
|---|---|---|---|
| Domain | Development tasks | Any parallel agents | Research/writing |
| Task data | JSONL (tasks.jsonl) | YAML (workspace config) | JSONL (items.jsonl) |
| Isolation | Git worktrees | Terminal panes + prompt rules | Directory per unit |
| Launch | run T001 | workspace manager start config.yml | Subagent per unit |
| Parallelism | run-all --max 3 | All panes start at once | Per-phase parallel |
| Review | review T001 (diff + build + test) | RUNBOOK totals + workspace manager read | Review agent reads all findings |
| State tracking | JSONL (append-only) | JSONL + RUNBOOK.md | JSON (findings per unit) |
| Prompt assembly | Script builds from item fields | 3-layer sandwich (YAML) | Template + source map |
The shared principles:
- JSONL for everything. Append-only, git-trackable, human-readable, no database server. Every system uses it for state.
- Isolation by default. Whether worktrees, directories, or prompt boundaries — agents don’t share mutable state.
- Structured launch. Read task data → build prompt → launch agent. Never hand-write the prompt.
- Review as verification. Automated checks (build, test, schema validation) before human review. Persist the verdict.
- The ratchet. Each agent reads current state, does work, updates state. Progress only moves forward.
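The ratchet falls out of append-only JSONL naturally: state is never edited, only appended, and "current state" is a replay where the last record per id wins. A minimal sketch (function names are my own):

```python
import json

def current_state(log_lines):
    """Replay an append-only JSONL log; the last record per id wins."""
    state = {}
    for line in log_lines:
        rec = json.loads(line)
        state[rec["id"]] = rec
    return state

def advance(log_lines, rec_id, **updates):
    """The ratchet step: append an updated record, never rewrite history."""
    rec = dict(current_state(log_lines).get(rec_id, {"id": rec_id}))
    rec.update(updates)
    return log_lines + [json.dumps(rec)]
```

Because history is never rewritten, any agent (or human) can audit exactly how a task moved from planned to done.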
When to Use Which
Work system (a task runner script) — when tasks are code changes that need build/test verification. Each task gets a worktree, a prompt, and an auto-review. Best for: bug fixes, refactors, feature additions.
workspace manager — when you want N agents working simultaneously with visual monitoring. Declarative YAML, all agents start at once, workspace manager read to check progress. Best for: parallel reviews, round-based enrichment, any task where you want to watch agents work.
Research pipeline — when you’re researching N items across the same dimensions and need two-pass refinement. Directory isolation, phased execution, machine-readable findings. Best for: protocol documentation, competitive analysis, API surveys.
All three are overkill for single tasks. Use a plain prompt for that.
The Original Prompt
For reference, here’s the prompt that kicked off the TLD research. One message, natural language, no template:
Please websearch for all remaining TLDs — same info as we have for .com and .de: basic infos and special stuff, allowed characters and domain rules, price, how to get / availability of domain lists, ways for domain check — DNS, DNS auth, WHOIS, other niche special options — and for all of those the full possible metadata it could provide. Then run for real (use proxies) and capture and store full raw responses as truth and for potential parser/implementation guidance. Use one subagent per TLD and give him his own dir where he can download, code, write etc (persist findings in machine-readable way with sources). Then run review over everything creating a global specs/findings file (with all niches and categories etc). Use that to create v2 template/research task. Then launch second pass of agents (one per TLD, same procedure). Then again review and create a compressed, information-dense documentation for each TLD with everything needed (including raw/real captures in a uniform clean format). Ensure agents never overwrite each other’s work / step on each other’s toes.
The template is the reusable pattern extracted from this. The task file is the machine-readable instance. The CLI is the executor. Each layer makes the pattern more reproducible and less dependent on the operator getting the prompt right.