Table of contents

  1. The Problem
  2. The Method: Two-Category Research
  3. Step 1: Write Your Questions First
  4. Step 2: One Analysis File Per Repo
  5. The Agent Prompt Pattern
  6. Step 3: Synthesize Across Repos
  7. Step 4: Cross-Reference Categories
  8. When to Use This Method
  9. The Agent’s Role
  10. Sources

Series context

Agentic Coding

How to work effectively with AI coding agents — patterns, context management, and real workflows.

  1. Agentic Coding: Getting Started
  2. Prompt Patterns
  3. Context Management for AI Coding Agents
  4. Researching Codebases with AI Agents
  5. Setting Up Claude Code for a New Project
  6. Claude Code vs Codex Plugins — Native Agent Packages

Entry facts

Kind: guide
Maturity: budding
Confidence: high
Origin: ai-drafted (AI-drafted, human-reviewed)
Author: Agent
Directed by: krow
Published:
Modified:
Words: 993 (5 min read)
Series: agentic-coding #4
Tags: agentic-coding, patterns, reference
Full corpus: /llms-full.txt
Readable corpus: /source/full-corpus/

Graph links

Prerequisites: agentic-coding-context-management

Related: parallel-ai-research-pipelines, agentic-coding-prompt-patterns, claude-md-patterns


Researching Codebases with AI Agents

A systematic methodology for analyzing open-source repos with AI agents — two-category research, structured questions, and synthesis.


Most developers skim a README and guess. With an AI agent, you can systematically analyze an entire codebase in minutes — extracting architecture, patterns, and implementation details that would take days to find manually.

This guide pairs with Getting Started with Agentic Coding and the CLAUDE.md Patterns guide. To parallelize the research stage across multiple agents, see Parallel AI Research Pipelines.

This guide documents a real methodology used to analyze 11 reference codebases for the parallel research pipelines.

The Problem

You’re building something and want to learn how the best implementations actually work. But:

  • Reading source code is slow — large repos have 50K+ lines
  • You don’t know where to look — the important patterns are buried
  • You forget findings — analysis without structure evaporates
  • You mix up insights from different repos

The Method: Two-Category Research

The core insight: divide your references into categories, then ask different questions per category.

For WebTerminal, the categories were:

| Category | What it answers | Example repos |
|---|---|---|
| A: Terminal Emulators | How does the window work? (input, rendering, buffer, resize) | xterm.js, Alacritty, Ghostty, Kitty |
| B: AI Agent TUIs | How do programs inside terminals work? (layout, streaming, tool display) | Codex CLI, Claude Code, OpenCode, OpenClaw |

The same approach works for anything. Building a code editor? Category A = editor cores (Monaco, CodeMirror), Category B = IDE shells (VS Code, Zed). Building a chat app? Category A = messaging protocols, Category B = chat UIs.

Step 1: Write Your Questions First

Before touching any source code, write specific questions per category. Not vague questions — precise ones with expected output types.

Category A questions (terminal emulators):

  1. Input Architecture — Where does the prompt live? How is user input captured? When can the user type?
  2. Buffer Model — Character grid or free-flowing text? How does scrollback work?
  3. Resize Behavior — Fixed size or grows with content? What happens on window resize?
  4. Command Lifecycle — What happens on Enter? When does the new prompt appear?
  5. Output During Execution — Where does command output go? Can user type while streaming?

Category B questions (AI agent TUIs):

  1. Layout — Alternate screen or inline? How is the screen divided?
  2. Input Area — Single-line or multi-line? Keybindings?
  3. Streaming — Character-by-character or line-batched? Auto-scroll behavior?
  4. Tool Execution Display — Inline, collapsed, expandable? Spinners?
  5. Session Management — Scrollable history? Long output handling? Memory limits?

Key Insight

The questions are the actual deliverable. Good questions force you to extract comparable data across repos. Bad questions (“how does xterm.js work?”) produce unfocused essays.
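
One way to keep the questions comparable across repos is to store them as data and render the template from that data. The sketch below is illustrative only: the `Question` class, the `CATEGORY_A` list, and `render_index` are hypothetical names, and the rendered layout is an assumption rather than the format the original project used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    qid: str        # stable identifier, e.g. "Q1"
    topic: str      # short label reused as a heading in every analysis file
    prompts: tuple  # the precise sub-questions the agent must answer


# Two of the Category A questions from the list above, encoded as data.
CATEGORY_A = [
    Question("Q1", "Input Architecture", (
        "Where does the prompt live?",
        "How is user input captured?",
        "When can the user type?",
    )),
    Question("Q2", "Buffer Model", (
        "Character grid or free-flowing text?",
        "How does scrollback work?",
    )),
]


def render_index(category: str, questions: list) -> str:
    """Render the question template that every per-repo analysis follows."""
    lines = [f"Category {category} questions", ""]
    for q in questions:
        lines.append(f"{q.qid}. {q.topic}")
        lines.extend(f"  - {p}" for p in q.prompts)
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_index("A", CATEGORY_A))
```

Because every repo is asked the same identifiers in the same order, answers to Q1 in one analysis file line up with answers to Q1 in every other.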

Step 2: One Analysis File Per Repo

Each repo gets its own analysis document. Never mix findings. The agent reads the source and answers your questions with file paths and line numbers.

analysis/
  INDEX.md       ← your questions (the template)
  xterm-js.md    ← xterm.js findings
  alacritty.md   ← Alacritty findings
  codex-cli.md   ← Codex CLI findings
  SYNTHESIS.md   ← combined patterns (written last)
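
The layout can be created up front so each agent run has exactly one file to write to. This is a minimal sketch, assuming the file names shown above; the `scaffold` function is a hypothetical helper, not part of any tool mentioned in this guide.

```python
from pathlib import Path


def scaffold(root: Path, repos: list) -> list:
    """Create INDEX.md, one file per repo, and SYNTHESIS.md under root.

    Existing files are left untouched so earlier findings are never lost.
    Returns the names of the files that were newly created.
    """
    root.mkdir(parents=True, exist_ok=True)
    names = ["INDEX.md"] + [f"{repo}.md" for repo in repos] + ["SYNTHESIS.md"]
    created = []
    for name in names:
        path = root / name
        if not path.exists():
            path.touch()
            created.append(name)
    return created


if __name__ == "__main__":
    print(scaffold(Path("analysis"), ["xterm-js", "alacritty", "codex-cli"]))
```

Running it a second time creates nothing new, which makes it safe to call again when a repo is added to the list.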

The Agent Prompt Pattern

For each repo, the prompt follows this structure:

Read the source code in reference-sources/{repo}/.
Answer questions Q1-Q5 from analysis/INDEX.md.
For each answer, cite the specific file path and line numbers.
Write findings to analysis/{repo}.md.
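
When there are many repos, the prompt can be generated rather than retyped. The sketch below fills in the four-line structure above; `PROMPT_TEMPLATE` and `build_prompt` are hypothetical names, and the default of five questions is an assumption taken from the Q1-Q5 range in the example.

```python
PROMPT_TEMPLATE = """\
Read the source code in reference-sources/{repo}/.
Answer questions Q1-Q{last} from analysis/INDEX.md.
For each answer, cite the specific file path and line numbers.
Write findings to analysis/{repo}.md."""


def build_prompt(repo: str, question_count: int = 5) -> str:
    """Fill in the per-repo research prompt for a single reference repo."""
    if question_count < 1:
        raise ValueError("question_count must be at least 1")
    return PROMPT_TEMPLATE.format(repo=repo, last=question_count)


if __name__ == "__main__":
    for repo in ["xterm-js", "alacritty", "codex-cli"]:
        print(build_prompt(repo))
        print()
```

Keeping the wording identical for every repo means differences in the findings come from the code being studied, not from how the question was phrased.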

Tip

Lock reference repos as read-only (for example, chmod -R a-w reference-sources/) so the agent cannot modify them by accident. The agent should read and analyze the source, never edit it.
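
If the lock needs to run from a setup script rather than a shell, the same effect can be approximated in Python. This is a sketch under assumptions: it targets POSIX permission bits, `lock_read_only` is a hypothetical helper, and it skips symlinks, which differs slightly from how a recursive chmod may behave on some systems.

```python
import stat
from pathlib import Path

# Write permission for owner, group, and others (the "a-w" in chmod terms).
WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def lock_read_only(root: Path) -> int:
    """Clear all write bits under root and return how many paths changed."""
    # Collect paths first so the walk finishes before any mode changes.
    paths = [root, *root.rglob("*")]
    changed = 0
    for path in paths:
        if path.is_symlink():
            continue  # skip symlinks so targets outside root are untouched
        mode = stat.S_IMODE(path.stat().st_mode)
        if mode & WRITE_BITS:
            path.chmod(mode & ~WRITE_BITS)
            changed += 1
    return changed


if __name__ == "__main__":
    print(lock_read_only(Path("reference-sources")))
```

A return value of 0 on a later run is a quick check that nothing under the reference directory has become writable again.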

Step 3: Synthesize Across Repos

After analyzing all repos, write a synthesis that answers: what do all implementations agree on?

Universal agreements are the patterns you should follow. Disagreements are where you have design freedom.

For WebTerminal, the synthesis revealed five universal patterns across ALL real terminals:

| Pattern | Every terminal does this |
|---|---|
| No input bar | Prompt is regular text in the buffer, not a separate widget |
| Scrollable line buffer | Fixed viewport, content scrolls within it |
| Hidden textarea | Offscreen element captures keystrokes |
| Fixed dimensions | Terminal never grows because content was added |
| Dirty-row rendering | Only changed rows are re-rendered |

These findings directly drove the WebTerminal refactor plan — killing the separate prompt bar, fixing the viewport size, and moving to inline cursor rendering.
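
The split between universal agreements and disagreements can be sketched mechanically once each repo's answers are reduced to short, normalized values. The example below assumes that normalization has already been done by hand; `split_consensus`, the repo names, and the sample answers are all hypothetical placeholders, not findings from the actual analysis.

```python
from collections import defaultdict


def split_consensus(findings: dict) -> tuple:
    """Split answers into universal agreements and disagreements.

    findings maps repo name -> {question: normalized answer}.
    Returns (agreements, disagreements).
    """
    answers = defaultdict(dict)
    for repo, by_question in findings.items():
        for question, answer in by_question.items():
            answers[question][repo] = answer

    agreements, disagreements = {}, {}
    for question, by_repo in answers.items():
        distinct = set(by_repo.values())
        # Universal only if every repo answered and all answers match.
        if len(by_repo) == len(findings) and len(distinct) == 1:
            agreements[question] = distinct.pop()
        else:
            disagreements[question] = by_repo
    return agreements, disagreements


if __name__ == "__main__":
    # Placeholder data for illustration only.
    sample = {
        "repo-a": {"input_bar": "none", "renderer": "option-1"},
        "repo-b": {"input_bar": "none", "renderer": "option-2"},
        "repo-c": {"input_bar": "none", "renderer": "option-1"},
    }
    agreed, open_questions = split_consensus(sample)
    print("follow:", agreed)
    print("design freedom:", open_questions)
```

A question that one repo never answered lands in the disagreements, which is a useful prompt to go back and fill the gap before treating anything as consensus.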

Step 4: Cross-Reference Categories

The final step is mapping findings from different categories to implementation decisions:

| Insight from Category B | Source | Implementation |
|---|---|---|
| All agents hide input during AI generation | Codex, OpenCode, OpenClaw, Claude Code | No prompt line exists while command runs |
| Line-batched streaming prevents flicker | Codex CLI (newline-gated rendering) | Buffer until \n, render complete lines |
| Tool output should be collapsible | All 4 agents truncate to 10-12 lines | box({ maxLines: 12, collapsible: true }) |
| Component pruning prevents memory bloat | OpenClaw (max 180 components) | Cap body children at ~200, prune oldest |
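
Two of the implementation decisions in the table, buffering until a newline and capping the number of retained components, are small enough to sketch. The code below is an illustration of those two ideas in isolation, not the WebTerminal implementation or any of the studied agents' code; `LineBatcher` and `PrunedHistory` are hypothetical names, and the cap of 200 comes from the table.

```python
from collections import deque


class LineBatcher:
    """Hold streamed chunks until a newline arrives, then release whole lines."""

    def __init__(self) -> None:
        self._pending = ""

    def feed(self, chunk: str) -> list:
        """Add a chunk and return any lines that are now complete."""
        self._pending += chunk
        # Everything before the last "\n" is complete; the tail stays buffered.
        *complete, self._pending = self._pending.split("\n")
        return complete

    def flush(self) -> list:
        """Release a trailing partial line once the stream ends."""
        rest, self._pending = self._pending, ""
        return [rest] if rest else []


class PrunedHistory:
    """Keep at most cap components, discarding the oldest first."""

    def __init__(self, cap: int = 200) -> None:
        self._items = deque(maxlen=cap)  # a full deque drops its oldest item

    def append(self, component) -> None:
        self._items.append(component)

    def items(self) -> list:
        return list(self._items)


if __name__ == "__main__":
    batcher = LineBatcher()
    history = PrunedHistory(cap=200)
    for chunk in ["hel", "lo\nwor", "ld\n"]:
        for line in batcher.feed(chunk):
            history.append(line)  # only complete lines are ever rendered
    print(history.items())
```

Rendering only what `feed` returns means a partially received line is never drawn and then redrawn, which is the flicker the newline gate is meant to avoid.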

When to Use This Method

This is heavy artillery. Use it when:

  • You’re building something non-trivial and implementations exist to study
  • You need to make architecture decisions, not just copy code
  • The problem space has competing approaches (and you need to find consensus)
  • You’re going to invest significant time building — the upfront research pays back

Don’t use it for small utilities, well-documented APIs, or problems with a single obvious solution.

The Agent’s Role

The agent does the tedious part — reading thousands of lines of unfamiliar source code, finding the relevant sections, and extracting answers to your specific questions. You do the hard part — writing the right questions and synthesizing the findings into architecture decisions.

This is the “research-first” prompt pattern applied at scale: understand the problem space before writing a single line of code.

Sources
