Agentic Coding: Getting Started
What agentic coding is, the plan–act–verify loop it runs on, and a first-task workflow that actually ships — a practical, experience-grounded guide.
Agentic coding is the practice of giving an AI code assistant a goal instead of a keystroke: you describe what you want, and the agent plans the work, reads your repository, edits files, runs commands, checks its own output, and reports back. Tools like Claude Code, Codex CLI, and Cursor all work this way. Treat them as fast junior developers: genuinely useful for bounded tasks, dangerous when they run without constraints.
Quick Reference
| Question | Practical answer |
|---|---|
| What is agentic coding? | Giving an AI assistant a goal so it plans, edits files, runs commands, and verifies results — not just autocompletes. |
| How is it different from autocomplete? | Autocomplete finishes your line; an agent executes a whole task and self-corrects in a loop. |
| Best first task | A small bug fix, utility, or UI change you already know how to review. |
| Required context | Stack, repo rules, test command, “done” criteria, and what not to touch. |
| Where your time goes | Mostly planning and reviewing — not typing code. |
| Main risk | The agent improves past the useful stopping point, or invents unsupported facts confidently. |
| Review rule | Read the diff, run the tests, verify claims against source files or docs. |
What Makes It “Agentic”?
The shift from traditional AI-assisted coding is a shift in who drives:
| | Traditional AI Assist | Agentic Coding |
|---|---|---|
| Scope | Single lines / functions | Entire features across files |
| Interaction | You type, it autocompletes | You describe intent, it plans and executes |
| Context | Current file only | Reads your codebase, project rules, docs |
| Memory | None between prompts | Session context, CLAUDE.md, memory files |
| Decision-making | You drive everything | Agent makes decisions, you review |
| Tool use | Suggestions only | Reads files, runs commands, creates PRs |
The mental model shifts from “smarter autocomplete” to “a junior developer that works fast, reads everything, and needs code review.”
The Loop Agents Actually Run
What makes a tool agentic is the feedback loop. Instead of producing one answer, it cycles:
- Plan — break the goal into steps and decide what to read and change.
- Act — edit files, run commands, install dependencies.
- Verify — run the tests, the linter, the build, or the program itself.
- Observe — read the output: a failing test, a type error, a stack trace.
- Correct — adjust and repeat until the verification passes.
That self-checking loop is the whole value proposition — and also the whole risk. An agent with a real verification signal (a failing test, a compiler error) converges on working code. An agent with no signal, or a misleading one, will confidently loop toward something wrong. Your leverage is the quality of the signal you give it: a good test command, a strict linter, and clear done-criteria turn the loop from a liability into the feature.
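The loop above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular tool is implemented: `run_agent_loop`, `VerifyResult`, and the `max_iterations` cap are hypothetical names introduced here, planning is assumed to have happened before the loop starts, and `act` and `verify` are stand-ins for whatever edits files and runs the tests.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class VerifyResult:
    passed: bool
    output: str  # e.g. a failing test, a type error, a stack trace


def run_agent_loop(
    act: Callable[[str], None],
    verify: Callable[[], VerifyResult],
    max_iterations: int = 5,
) -> List[VerifyResult]:
    """Cycle act -> verify -> observe -> correct until verification passes.

    `act` receives the output of the previous verification (empty on the
    first pass). The iteration cap is an assumption of this sketch: it
    stands in for the stopping constraint an unattended loop needs.
    """
    history: List[VerifyResult] = []
    feedback = ""
    for _ in range(max_iterations):
        act(feedback)             # Act: edit files, run commands
        result = verify()         # Verify: tests, linter, build
        history.append(result)    # Observe: keep the output
        if result.passed:
            break
        feedback = result.output  # Correct: feed the failure back in
    return history


# Toy stand-ins: the "fix" only lands on the third attempt.
attempts = {"count": 0}


def fake_act(feedback: str) -> None:
    attempts["count"] += 1


def fake_verify() -> VerifyResult:
    ok = attempts["count"] >= 3
    return VerifyResult(passed=ok, output="" if ok else "1 test failed")


history = run_agent_loop(fake_act, fake_verify)
print([r.passed for r in history])
```

The only information that steers each retry is `result.output`. If `verify` always reports success, or reports something unrelated to the goal, the loop has nothing true to converge on.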
Where Your Time Goes
The biggest adjustment is not technical; it’s where your hours land. In traditional development, most of your time goes to writing code. In agentic coding that inverts: you spend almost no time typing implementation and most of your time thinking before (framing the task, the constraints, the context) and reviewing after (reading the diff, checking the claims).
If you find yourself watching the agent type and feeling productive, you’re probably under-investing in the two parts that actually determine the outcome. The keyboard time you saved moves to judgment — it doesn’t disappear.
What Agents Are Good At
Based on real experience using agents to build this site and a terminal-emulator project:
- Reading large codebases fast — an agent analyzed 11 terminal-emulator source repos in hours, extracting architecture patterns that would take a person weeks
- Consistent formatting and boilerplate — schema definitions, test scaffolds, CSS custom properties
- Cross-file refactors — renaming a concept across 15 files, updating imports, fixing references
- Research synthesis — reading docs, comparing approaches, summarizing trade-offs (see Parallel AI Research Pipelines for how this scales)
- Mechanical work you understand — “add breadcrumbs to every entry page” when you already know exactly what the result should be
What Agents Struggle With
- Taste and judgment — they over-engineer, add unnecessary abstractions, and optimize things that don’t need it
- Knowing when to stop — without constraints they keep “improving” code until it’s unrecognizable
- Your project’s history — they see what the code looks like now, not why a decision was made
- Novel architecture — they recombine patterns from training data; they don’t invent genuinely new approaches
- Subtle bugs — they’re confident, not careful. The code works on the happy path and misses edge cases
Confidence without correctness is the throughline of every failure mode above — which is why the review step is non-negotiable.
How to Start: Your First Task
Your first agentic task should be small, well-defined, and reviewable:
- Pick a task you already know how to do — so you can judge the output. A bug fix, a utility function, a styling change. Don’t learn the tool and the problem at the same time.
- Write an implementation brief, not a command. Describe the what and why, plus acceptance criteria and scope limits — “Add a 404 page matching the site design, with links back to home and explore; don’t touch the layout components” beats “create src/pages/404.astro with an h1 and two anchor tags.” You’re specifying the destination, not the route.
- Make it ask for a plan first. In plan mode, or by asking for an approach before code, you catch bad ideas while they’re still cheap to reject.
- Give it a verification signal. Point it at the test command and let it run the tests. The loop above only works if the agent can check itself; without that, you become the only feedback mechanism.
- Review the output like a code review. Read every changed line. Agents commit to an approach even when it’s wrong; your job is to catch the ~10% that’s subtly incorrect. See Reviewing AI-Generated Code for a systematic pass.
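One way to keep a brief honest is to treat it as a small structure with a slot for each part named above, so an empty slot is visible before the agent starts. The sketch below is illustrative only: the `Brief` class and its field names are hypothetical, the "why" text is invented for the example, and `npm test` is a placeholder for whatever your project's real test command is.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Brief:
    goal: str                                              # the what
    why: str                                               # the motivation
    acceptance: List[str] = field(default_factory=list)    # "done" criteria
    out_of_scope: List[str] = field(default_factory=list)  # what not to touch
    test_command: str = ""                                 # the verification signal

    def render(self) -> str:
        """Turn the brief into plain text to paste into the agent."""
        lines = [f"Goal: {self.goal}", f"Why: {self.why}"]
        if self.acceptance:
            lines.append("Done when:")
            lines += [f"- {item}" for item in self.acceptance]
        if self.out_of_scope:
            lines.append("Do not touch:")
            lines += [f"- {item}" for item in self.out_of_scope]
        if self.test_command:
            lines.append(f"Verify with: {self.test_command}")
        lines.append("Propose a plan before writing any code.")
        return "\n".join(lines)


brief = Brief(
    goal="Add a 404 page matching the site design",
    why="Visitors who hit a dead URL need a way back",  # invented for illustration
    acceptance=["Links back to home and explore"],
    out_of_scope=["layout components"],
    test_command="npm test",  # placeholder; use your project's command
)
text = brief.render()
print(text)
```

The structure says nothing about which file to create or which tags to use. Those decisions stay with the agent; the brief only fixes the destination, the limits, and how success is checked.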
Your second task should add a CLAUDE.md. Even ten lines of stack and conventions context measurably improve output. See Writing an Effective CLAUDE.md.
What Determines Output Quality
After enough sessions, three levers explain almost all of the variance — and none of them is “a better prompt in the moment”:
- Context — what the agent knows before it starts: the stack, the rules, the test command, the constraints. This is the rate-limiting factor; the time you spend structuring the request up front is the time best spent. A persistent CLAUDE.md is how you stop re-explaining it.
- Constraints — what you tell it not to do, and when to stop. Scope limits and done-criteria are what keep the loop from over-running.
- Review — the discipline of reading every diff and verifying every claim against source. See Reviewing AI-Generated Code.
Get those three right and the model almost stops mattering. Get them wrong and the best model still ships you confident nonsense.
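Constraints are easier to enforce during review if part of the check is mechanical. As a rough sketch, assuming you have the list of paths a change touched (for example from `git diff --name-only`), the function below flags any that match a protected glob pattern. The function name `out_of_scope_changes`, the protected patterns, and every path except `src/pages/404.astro` are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def out_of_scope_changes(
    changed_files: Iterable[str],
    protected_patterns: Iterable[str],
) -> List[str]:
    """Return the changed paths that match any protected glob pattern."""
    patterns = list(protected_patterns)
    return [
        path
        for path in changed_files
        if any(fnmatchcase(path, pattern) for pattern in patterns)
    ]


# Hypothetical change set and scope limits for the 404-page task.
changed = ["src/pages/404.astro", "src/layouts/Base.astro", "package.json"]
protected = ["src/layouts/*", "package.json"]

violations = out_of_scope_changes(changed, protected)
print(violations)
```

A check like this only catches edits to files that were declared off limits. It does not replace reading the diff, because an in-scope file can still contain a wrong change.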
Sources
- Anthropic, Claude Code overview — describes Claude Code as an agentic tool that understands and modifies a codebase.
- Anthropic, Common workflows — practical workflows: planning, editing, testing, GitHub integration.
- OpenAI, Codex cloud tasks — Codex as a coding agent for repository tasks and reviewable changes.