Agent context packet

Structured metadata, source alternates, graph links, headings, series position, and diagram inventory for crawlers and agent readers.

Table of contents

The Problem
The Method: Two-Category Research
Step 1: Write Your Questions First
Step 2: One Analysis File Per Repo
The Agent Prompt Pattern
Step 3: Synthesize Across Repos
Step 4: Cross-Reference Categories
When to Use This Method
The Agent’s Role
Sources

Series context

Agentic Coding

How to work effectively with AI coding agents — patterns, context management, and real workflows.

Entry facts

Canonical: /guide/researching-codebases-with-agents/
Kind: guide
Maturity: budding
Confidence: high
Origin: ai-drafted (AI-drafted, human-reviewed)
Author: Agent
Directed by: krow
Published: March 20, 2026
Modified: May 29, 2026
Words: 993 (5 min read)
Series: agentic-coding #4
Tags: agentic-coding, patterns, reference
Prerequisites: agentic-coding-context-management
Related: parallel-ai-research-pipelines, agentic-coding-prompt-patterns, claude-md-patterns
Markdown: /guide/researching-codebases-with-agents.md
Schema JSON: /guide/researching-codebases-with-agents.schema.json
Readable Markdown: /source/guide/researching-codebases-with-agents/
Full corpus: /llms-full.txt
Readable corpus: /source/full-corpus/

Researching Codebases with AI Agents

A systematic methodology for analyzing open-source repos with AI agents — two-category research, structured questions, and synthesis.

Agent / directed by krow / March 20, 2026 / 5 min read / updated May 29, 2026

Most developers skim a README and guess. With an AI agent, you can systematically analyze an entire codebase in minutes — extracting architecture, patterns, and implementation details that would take days to find manually.

This guide pairs with Getting Started with Agentic Coding and the CLAUDE.md Patterns guide. For parallelizing the research stage across multiple agents, see Parallel AI Research Pipelines.

This guide documents a real methodology used to analyze 11 reference codebases for the parallel research pipelines.

The Problem

You’re building something and want to learn how the best implementations actually work. But:

Reading source code is slow — large repos have 50K+ lines
You don’t know where to look — the important patterns are buried
You forget findings — analysis without structure evaporates
You mix up insights from different repos

The Method: Two-Category Research

The core insight: divide your references into categories, then ask different questions per category.

For WebTerminal, the categories were:

Category	What it answers	Example repos
A: Terminal Emulators	How does the window work? (input, rendering, buffer, resize)	xterm.js, Alacritty, Ghostty, Kitty
B: AI Agent TUIs	How do programs inside terminals work? (layout, streaming, tool display)	Codex CLI, Claude Code, OpenCode, OpenClaw

The same approach works for anything. Building a code editor? Category A = editor cores (Monaco, CodeMirror), Category B = IDE shells (VS Code, Zed). Building a chat app? Category A = messaging protocols, Category B = chat UIs.

Step 1: Write Your Questions First

Before touching any source code, write specific questions per category. Not vague questions — precise ones with expected output types.

Category A questions (terminal emulators):

Input Architecture — Where does the prompt live? How is user input captured? When can the user type?
Buffer Model — Character grid or free-flowing text? How does scrollback work?
Resize Behavior — Fixed size or grows with content? What happens on window resize?
Command Lifecycle — What happens on Enter? When does the new prompt appear?
Output During Execution — Where does command output go? Can user type while streaming?

Category B questions (AI agent TUIs):

Layout — Alternate screen or inline? How is the screen divided?
Input Area — Single-line or multi-line? Keybindings?
Streaming — Character-by-character or line-batched? Auto-scroll behavior?
Tool Execution Display — Inline, collapsed, expandable? Spinners?
Session Management — Scrollable history? Long output handling? Memory limits?

Key Insight

The questions are the actual deliverable. Good questions force you to extract comparable data across repos. Bad questions (“how does xterm.js work?”) produce unfocused essays.
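
One way to keep questions precise is to store each one with its expected output type and generate the template from that data. The sketch below is illustrative only: the `Question` class, the `A1`/`A2` ID scheme, and the wording of the expected outputs are assumptions, not part of the original workflow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    qid: str                   # e.g. "A1"; the ID scheme is an assumption
    topic: str                 # short label taken from the question list
    prompts: tuple[str, ...]   # the concrete sub-questions
    expected: str              # expected output type (hypothetical wording)


CATEGORY_A = [
    Question("A1", "Input Architecture",
             ("Where does the prompt live?", "How is user input captured?"),
             "file path and line range of the input handler"),
    Question("A2", "Buffer Model",
             ("Character grid or free-flowing text?", "How does scrollback work?"),
             "one of: grid | free-flowing, plus the buffer type name"),
]


def render_index(title: str, questions: list[Question]) -> str:
    """Render a question list as a plain-text template for INDEX.md."""
    lines = [title, ""]
    for q in questions:
        lines.append(f"{q.qid}. {q.topic}: {' '.join(q.prompts)}")
        lines.append(f"    Expected output: {q.expected}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_index("Category A questions (terminal emulators)", CATEGORY_A))
```

Stating the expected output next to each question makes it easier to spot an answer that drifted into an essay.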

Step 2: One Analysis File Per Repo

Each repo gets its own analysis document. Never mix findings. The agent reads the source and answers your questions with file paths and line numbers.

analysis/
  INDEX.md          ← your questions (the template)
  xterm-js.md       ← xterm.js findings
  alacritty.md      ← Alacritty findings
  codex-cli.md      ← Codex CLI findings
  SYNTHESIS.md      ← combined patterns (written last)
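
Because every answer is supposed to carry a file path and line numbers, a small check can flag answers that arrive without one. This is a minimal sketch under two assumptions that are not in the original: each answer starts on a line beginning with `Q<number>`, and citations look like `path/to/file.ext:START` or `path/to/file.ext:START-END`. The function name `uncited_answers` is hypothetical.

```python
import re

# Assumed citation shape: path/to/file.ext:START or path/to/file.ext:START-END
CITATION = re.compile(r"[\w./-]+\.\w+:\d+(?:-\d+)?")
# Assumed heading shape: each answer starts on a line beginning with "Q<number>"
HEADING = re.compile(r"^Q(\d+)\b", re.MULTILINE)


def uncited_answers(analysis_text: str) -> list[str]:
    """Return the question IDs whose answer section contains no citation."""
    matches = list(HEADING.finditer(analysis_text))
    missing = []
    for i, m in enumerate(matches):
        # An answer section runs from its heading to the next heading (or EOF).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(analysis_text)
        section = analysis_text[m.end():end]
        if not CITATION.search(section):
            missing.append(f"Q{m.group(1)}")
    return missing
```

Running it over each `analysis/{repo}.md` before synthesis gives a quick list of answers to send back to the agent.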

The Agent Prompt Pattern

For each repo, the prompt follows this structure:

Read the source code in reference-sources/{repo}/.
Answer questions Q1-Q5 from analysis/INDEX.md.
For each answer, cite the specific file path and line numbers.
Write findings to analysis/{repo}.md.
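
Since the prompt differs only by repo name, it can be generated rather than retyped. The sketch below fills the structure above for a list of repos; `build_prompts` and its parameters are hypothetical names, and the repo names are the ones from the directory listing.

```python
PROMPT_TEMPLATE = """\
Read the source code in reference-sources/{repo}/.
Answer questions {first}-{last} from analysis/INDEX.md.
For each answer, cite the specific file path and line numbers.
Write findings to analysis/{repo}.md.
"""


def build_prompts(repos: list[str], first: str = "Q1", last: str = "Q5") -> dict[str, str]:
    """Return one prompt per repo, keyed by repo directory name."""
    return {
        repo: PROMPT_TEMPLATE.format(repo=repo, first=first, last=last)
        for repo in repos
    }


if __name__ == "__main__":
    for repo, prompt in build_prompts(["xterm-js", "alacritty", "codex-cli"]).items():
        print(f"--- {repo} ---\n{prompt}")
```

Keeping the wording identical across repos is part of what makes the resulting analysis files comparable.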

Tip

Lock reference repos as read-only (chmod -R a-w) to prevent the agent from accidentally modifying them. The agent should read and analyze, never touch the source.
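
If the lock needs to happen from a setup script instead of the shell, roughly the same effect can be had in Python. This is a sketch assuming a POSIX filesystem; `lock_read_only` is a hypothetical helper, not something the original workflow ships.

```python
import os
import stat
from pathlib import Path

WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def lock_read_only(root: Path) -> int:
    """Clear all write bits under root (like chmod -R a-w); return paths changed."""
    changed = 0
    # Collect the full listing first, then change modes.
    for path in [root, *root.rglob("*")]:
        if path.is_symlink():
            continue  # skip symlinks; targets inside root are visited on their own
        mode = stat.S_IMODE(path.stat().st_mode)
        if mode & WRITE_BITS:
            os.chmod(path, mode & ~WRITE_BITS)
            changed += 1
    return changed
```

A second call returning 0 is a cheap way to confirm the tree is still locked before starting an agent run.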

Step 3: Synthesize Across Repos

After analyzing all repos, write a synthesis that answers: what do all implementations agree on?

Universal agreements are the patterns you should follow. Disagreements are where you have design freedom.
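
The split between agreements and disagreements can be computed mechanically once each answer is reduced to a short normalized label. The sketch below assumes that normalization has already been done by hand; `split_consensus` and the `Findings` shape are hypothetical, and a question that some repo left unanswered is deliberately treated as not yet agreed.

```python
from collections import defaultdict

# repo -> question ID -> normalized short answer (shape is an assumption)
Findings = dict[str, dict[str, str]]


def split_consensus(findings: Findings) -> tuple[dict[str, str], dict[str, dict[str, list[str]]]]:
    """Separate questions every repo answers identically from the rest."""
    by_question: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    for repo, answers in findings.items():
        for qid, answer in answers.items():
            by_question[qid][answer].append(repo)

    agreed, disputed = {}, {}
    for qid, groups in by_question.items():
        answered = sum(len(repos) for repos in groups.values())
        # Universal only if there is a single answer AND every repo gave it.
        if len(groups) == 1 and answered == len(findings):
            agreed[qid] = next(iter(groups))
        else:
            disputed[qid] = {a: sorted(r) for a, r in groups.items()}
    return agreed, disputed
```

The `disputed` half lists which repos took which side, which is the raw material for deciding where you have design freedom.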

For WebTerminal, the synthesis revealed five universal patterns across ALL real terminals:

Pattern	Every terminal does this
No input bar	Prompt is regular text in the buffer, not a separate widget
Scrollable line buffer	Fixed viewport, content scrolls within it
Hidden textarea	Offscreen element captures keystrokes
Fixed dimensions	Terminal never grows because content was added
Dirty-row rendering	Only changed rows are re-rendered

These findings directly drove the WebTerminal refactor plan — killing the separate prompt bar, fixing the viewport size, and moving to inline cursor rendering.
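
To make two of those patterns concrete, here is a minimal sketch of a fixed-dimension grid with dirty-row tracking. It is not WebTerminal's code or any terminal's actual implementation; the `Screen` class and its methods are hypothetical and only illustrate the idea that rows are re-rendered when, and only when, their content changes.

```python
class Screen:
    """Fixed-size character grid that tracks which rows changed since last render."""

    def __init__(self, rows: int, cols: int) -> None:
        self.rows, self.cols = rows, cols
        self.grid = [" " * cols for _ in range(rows)]
        self.dirty: set[int] = set(range(rows))  # first render paints everything

    def write_row(self, row: int, text: str) -> None:
        """Replace one row, padded or clipped to the fixed width."""
        new = text[: self.cols].ljust(self.cols)
        if new != self.grid[row]:
            self.grid[row] = new
            self.dirty.add(row)

    def render(self) -> list[tuple[int, str]]:
        """Return only the changed rows, then mark the screen clean."""
        out = [(r, self.grid[r]) for r in sorted(self.dirty)]
        self.dirty.clear()
        return out
```

Writing the same text to a row twice produces no render work the second time, and overlong text is clipped instead of growing the grid.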

Step 4: Cross-Reference Categories

The final step is mapping findings from different categories to implementation decisions:

Insight from Category B	Source	Implementation
All agents hide input during AI generation	Codex, OpenCode, OpenClaw, Claude Code	No prompt line exists while command runs
Line-batched streaming prevents flicker	Codex CLI (newline-gated rendering)	Buffer until `\n`, render complete lines
Tool output should be collapsible	All 4 agents truncate to 10-12 lines	`box({ maxLines: 12, collapsible: true })`
Component pruning prevents memory bloat	OpenClaw (max 180 components)	Cap body children at ~200, prune oldest
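
Two rows of that table translate into very little code. The sketch below shows newline-gated buffering and a capped child list; `LineBatcher`, `MAX_CHILDREN`, and `body` are hypothetical names, and none of this is taken from Codex CLI or OpenClaw source.

```python
from collections import deque


class LineBatcher:
    """Buffer streamed chunks and release only complete lines."""

    def __init__(self) -> None:
        self._pending = ""

    def feed(self, chunk: str) -> list[str]:
        """Add a chunk; return any lines it completed (without the newline)."""
        self._pending += chunk
        # Everything before the last "\n" is complete; the remainder stays buffered.
        *complete, self._pending = self._pending.split("\n")
        return complete

    def flush(self) -> list[str]:
        """Release a trailing partial line when the stream ends."""
        rest, self._pending = self._pending, ""
        return [rest] if rest else []


MAX_CHILDREN = 200  # the table's "~200" cap; the exact value is a tuning choice

# deque(maxlen=N) drops the oldest item automatically when a new one is appended
body: deque[str] = deque(maxlen=MAX_CHILDREN)
```

Feeding `"hel"` and then `"lo\nwor"` yields nothing on the first call and `["hello"]` on the second, so a half-written line never reaches the renderer.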

When to Use This Method

This is heavy artillery. Use it when:

You’re building something non-trivial and implementations exist to study
You need to make architecture decisions, not just copy code
The problem space has competing approaches (and you need to find consensus)
You’re going to invest significant time building — the upfront research pays back

Don’t use it for small utilities, well-documented APIs, or problems with a single obvious solution.

The Agent’s Role

The agent does the tedious part — reading thousands of lines of unfamiliar source code, finding the relevant sections, and extracting answers to your specific questions. You do the hard part — writing the right questions and synthesizing the findings into architecture decisions.

This is the “research-first” prompt pattern applied at scale: understand the problem space before writing a single line of code.

Sources

Anthropic, Claude Code: Common workflows
GitHub, Code search
Aider, Repository map
