Researching Codebases with AI Agents
A systematic methodology for analyzing open-source repos with AI agents — two-category research, structured questions, and synthesis.
Most developers skim a README and guess. With an AI agent, you can systematically analyze an entire codebase in minutes — extracting architecture, patterns, and implementation details that would take days to find manually.
This pairs with Getting Started with Agentic Coding and the CLAUDE.md Patterns guide. For parallelizing the research stage across multiple agents, see Parallel AI Research Pipelines.
This guide documents a real methodology used to analyze 11 reference codebases for the parallel research pipelines.
The Problem
You’re building something and want to learn how the best implementations actually work. But:
- Reading source code is slow — large repos have 50K+ lines
- You don’t know where to look — the important patterns are buried
- You forget findings — analysis without structure evaporates
- You mix up insights from different repos
The Method: Two-Category Research
The core insight: divide your references into categories, then ask different questions per category.
For WebTerminal, the categories were:
| Category | What it answers | Example repos |
|---|---|---|
| A: Terminal Emulators | How does the window work? (input, rendering, buffer, resize) | xterm.js, Alacritty, Ghostty, Kitty |
| B: AI Agent TUIs | How do programs inside terminals work? (layout, streaming, tool display) | Codex CLI, Claude Code, OpenCode, OpenClaw |
The same approach works for anything. Building a code editor? Category A = editor cores (Monaco, CodeMirror), Category B = IDE shells (VS Code, Zed). Building a chat app? Category A = messaging protocols, Category B = chat UIs.
Step 1: Write Your Questions First
Before touching any source code, write specific questions per category. Not vague questions — precise ones with expected output types.
Category A questions (terminal emulators):
- Input Architecture — Where does the prompt live? How is user input captured? When can the user type?
- Buffer Model — Character grid or free-flowing text? How does scrollback work?
- Resize Behavior — Fixed size or grows with content? What happens on window resize?
- Command Lifecycle — What happens on Enter? When does the new prompt appear?
- Output During Execution — Where does command output go? Can user type while streaming?
Category B questions (AI agent TUIs):
- Layout — Alternate screen or inline? How is the screen divided?
- Input Area — Single-line or multi-line? Keybindings?
- Streaming — Character-by-character or line-batched? Auto-scroll behavior?
- Tool Execution Display — Inline, collapsed, expandable? Spinners?
- Session Management — Scrollable history? Long output handling? Memory limits?
The questions are the actual deliverable. Good questions force you to extract comparable data across repos. Bad questions (“how does xterm.js work?”) produce unfocused essays.
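One way to keep questions precise is to store each one alongside the answer shape you expect back. The sketch below is a minimal illustration, not the tooling used for WebTerminal: the `Question` class, its `expected` field, and `render_index` are hypothetical names, and the expected-output strings are made-up examples. The question text is taken from the Category A list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    qid: str       # short identifier, e.g. "Q1"
    topic: str     # label used as the heading in every repo's analysis file
    prompt: str    # the precise question text
    expected: str  # hypothetical field: the answer shape you want back


# Two of the Category A questions; the "expected" values are illustrative.
CATEGORY_A = [
    Question(
        "Q1",
        "Input Architecture",
        "Where does the prompt live? How is user input captured?",
        "file path, line range, one-paragraph summary",
    ),
    Question(
        "Q2",
        "Buffer Model",
        "Character grid or free-flowing text? How does scrollback work?",
        "one of: grid | free-flowing, plus file path",
    ),
]


def render_index(category: str, questions: list[Question]) -> str:
    """Render one category's questions as a plain-text template."""
    lines = [f"Category {category}", ""]
    for q in questions:
        lines.append(f"{q.qid}. {q.topic}: {q.prompt}")
        lines.append(f"    Expected output: {q.expected}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_index("A: Terminal Emulators", CATEGORY_A))
```

Because every repo is asked the same questions with the same expected shape, the answers line up column by column when you compare them later.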
Step 2: One Analysis File Per Repo
Each repo gets its own analysis document. Never mix findings. The agent reads the source and answers your questions with file paths and line numbers.
    analysis/
      INDEX.md        ← your questions (the template)
      xterm-js.md     ← xterm.js findings
      alacritty.md    ← Alacritty findings
      codex-cli.md    ← Codex CLI findings
      SYNTHESIS.md    ← combined patterns (written last)

The Agent Prompt Pattern
For each repo, the prompt follows this structure:
    Read the source code in reference-sources/{repo}/.
    Answer questions Q1-Q5 from analysis/INDEX.md.
    For each answer, cite the specific file path and line numbers.
    Write findings to analysis/{repo}.md.

Lock reference repos as read-only (chmod -R a-w) to prevent the agent from accidentally modifying them. The agent should read and analyze, never touch the source.
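The per-repo prompt and the read-only lock can both be scripted. The following is a rough sketch under stated assumptions: `build_prompt` and `lock_read_only` are hypothetical helper names, the prompt text is copied from the pattern above, and the lock approximates `chmod -R a-w` on a POSIX filesystem by clearing write bits with the standard library.

```python
import os
import stat
from pathlib import Path

# Prompt text taken from the pattern above; {repo} is filled in per repository.
PROMPT_TEMPLATE = (
    "Read the source code in reference-sources/{repo}/.\n"
    "Answer questions Q1-Q5 from analysis/INDEX.md.\n"
    "For each answer, cite the specific file path and line numbers.\n"
    "Write findings to analysis/{repo}.md.\n"
)

WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def build_prompt(repo: str) -> str:
    """Fill the shared template for one repository."""
    return PROMPT_TEMPLATE.format(repo=repo)


def lock_read_only(root: Path) -> None:
    """Clear write bits under root, roughly what `chmod -R a-w root` does."""
    # Walk bottom-up so each directory's contents are handled before it is.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames + dirnames:
            path = Path(dirpath) / name
            if path.is_symlink():
                continue  # leave symlinks alone rather than chmod their targets
            path.chmod(path.stat().st_mode & ~WRITE_BITS)
    root.chmod(root.stat().st_mode & ~WRITE_BITS)


if __name__ == "__main__":
    # Repo names match the analysis file names used in this guide.
    for repo in ["xterm-js", "alacritty", "codex-cli"]:
        print(build_prompt(repo))
```

Generating every prompt from one template keeps the instructions identical across repos, so differences in the findings reflect the code rather than the wording of the request.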
Step 3: Synthesize Across Repos
After analyzing all repos, write a synthesis that answers: what do all implementations agree on?
Universal agreements are the patterns you should follow. Disagreements are where you have design freedom.
For WebTerminal, the synthesis revealed five universal patterns across ALL real terminals:
| Pattern | Every terminal does this |
|---|---|
| No input bar | Prompt is regular text in the buffer, not a separate widget |
| Scrollable line buffer | Fixed viewport, content scrolls within it |
| Hidden textarea | Offscreen element captures keystrokes |
| Fixed dimensions | Terminal never grows because content was added |
| Dirty-row rendering | Only changed rows are re-rendered |
These findings directly drove the WebTerminal refactor plan — killing the separate prompt bar, fixing the viewport size, and moving to inline cursor rendering.
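If each repo's answers are first reduced to short, comparable values, the split between agreements and disagreements becomes mechanical. The sketch below assumes exactly that normalization step has already happened. The `synthesize` function is a hypothetical helper, and the repo names and answers in `FINDINGS` are placeholders for illustration, not results from the actual analysis.

```python
from collections import defaultdict

# Placeholder data: repo -> {topic -> normalized short answer}.
FINDINGS = {
    "repo-a": {"viewport": "fixed", "input capture": "hidden textarea", "renderer": "canvas"},
    "repo-b": {"viewport": "fixed", "input capture": "hidden textarea", "renderer": "gpu"},
    "repo-c": {"viewport": "fixed", "input capture": "hidden textarea", "renderer": "dom"},
}


def synthesize(findings: dict[str, dict[str, str]]):
    """Split topics into universal agreements and open disagreements."""
    by_topic: dict[str, dict[str, str]] = defaultdict(dict)
    for repo, answers in findings.items():
        for topic, answer in answers.items():
            by_topic[topic][repo] = answer

    agreements: dict[str, str] = {}
    disagreements: dict[str, dict[str, str]] = {}
    for topic, per_repo in by_topic.items():
        distinct = set(per_repo.values())
        answered_by_all = len(per_repo) == len(findings)
        if answered_by_all and len(distinct) == 1:
            agreements[topic] = distinct.pop()  # pattern to follow
        else:
            disagreements[topic] = dict(per_repo)  # design freedom, or a gap
    return agreements, disagreements


if __name__ == "__main__":
    agreed, open_questions = synthesize(FINDINGS)
    print("Follow:", agreed)
    print("Decide:", sorted(open_questions))
```

A topic that some repo never answered is treated as a disagreement here, on the assumption that a missing answer should send you back to the analysis rather than count as consensus.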
Step 4: Cross-Reference Categories
The final step is mapping findings from different categories to implementation decisions:
| Insight from Category B | Source | Implementation |
|---|---|---|
| All agents hide input during AI generation | Codex, OpenCode, OpenClaw, Claude Code | No prompt line exists while command runs |
| Line-batched streaming prevents flicker | Codex CLI (newline-gated rendering) | Buffer until \n, render complete lines |
| Tool output should be collapsible | All 4 agents truncate to 10-12 lines | box({ maxLines: 12, collapsible: true }) |
| Component pruning prevents memory bloat | OpenClaw (max 180 components) | Cap body children at ~200, prune oldest |
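Three of the implementation decisions in the table are small enough to sketch directly. This is an illustrative Python stand-in, not WebTerminal's code: the `box(...)` call in the table is the project's own API, while `LineBatcher`, `Transcript`, and `collapse` below are hypothetical names, with the limits of 12 lines and 200 children taken from the table.

```python
from collections import deque


class LineBatcher:
    """Buffer streamed chunks and release only complete lines."""

    def __init__(self) -> None:
        self._pending = ""

    def feed(self, chunk: str) -> list[str]:
        """Add a chunk; return the lines that were completed by a newline."""
        self._pending += chunk
        # Everything before the last "\n" is complete; the remainder waits.
        *complete, self._pending = self._pending.split("\n")
        return complete

    def flush(self) -> list[str]:
        """Release any trailing partial line when the stream ends."""
        rest, self._pending = self._pending, ""
        return [rest] if rest else []


class Transcript:
    """Keep at most max_children rendered items, dropping the oldest first."""

    def __init__(self, max_children: int = 200) -> None:
        self.children: deque[str] = deque(maxlen=max_children)

    def append(self, item: str) -> None:
        self.children.append(item)


def collapse(lines: list[str], max_lines: int = 12) -> list[str]:
    """Truncate tool output to max_lines and note how much was hidden."""
    if len(lines) <= max_lines:
        return list(lines)
    hidden = len(lines) - max_lines
    return list(lines[:max_lines]) + [f"... ({hidden} more lines hidden)"]


if __name__ == "__main__":
    batcher = LineBatcher()
    transcript = Transcript()
    for chunk in ["hel", "lo\nwor", "ld\n"]:
        for line in batcher.feed(chunk):
            transcript.append(line)
    print(list(transcript.children))
```

Using `deque(maxlen=...)` gives oldest-first pruning without extra bookkeeping. A real UI would likely also need to release whatever resources each pruned component holds.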
When to Use This Method
This is heavy artillery. Use it when:
- You’re building something non-trivial and implementations exist to study
- You need to make architecture decisions, not just copy code
- The problem space has competing approaches (and you need to find consensus)
- You’re going to invest significant time building — the upfront research pays back
Don’t use it for small utilities, well-documented APIs, or problems with a single obvious solution.
The Agent’s Role
The agent does the tedious part — reading thousands of lines of unfamiliar source code, finding the relevant sections, and extracting answers to your specific questions. You do the hard part — writing the right questions and synthesizing the findings into architecture decisions.
This is the “research-first” prompt pattern applied at scale: understand the problem space before writing a single line of code.