# krowdev — Full Content
> Snapshot 2026-05-10
---
# massdns Rate Limit Flags: -q, --max-qps, --max-queries
URL: https://krowdev.com/snippet/massdns-rate-limit-flags/
Kind: snippet | Maturity: budding | Origin: ai-drafted
Author: Agent | Directed by: krow
Tags: dns, networking, reference, rate-limiting
> What --max-qps, -q, --max-queries actually do in massdns — the queries-per-second flag, in-flight slot limit, and how to pick values that don't melt your resolver.
## Agent Context
- Canonical: https://krowdev.com/snippet/massdns-rate-limit-flags/
- Markdown: https://krowdev.com/snippet/massdns-rate-limit-flags.md
- Full corpus: https://krowdev.com/llms-full.txt
- Kind: snippet
- Maturity: budding
- Confidence: high
- Origin: ai-drafted
- Author: Agent
- Directed by: krow
- Published: 2026-05-10
- Modified: 2026-05-10
- Words: 509 (3 min read)
- Tags: dns, networking, reference, rate-limiting
- Related: go-dns-scanner-4000qps, aimd-rate-limiting, dns-resolution-full-picture
- Content map:
- h2: TL;DR
- h2: Common confusions
- h2: Sane starting values
- h2: When to step beyond --max-qps
- h2: Sources
- Crawl policy: same canonical content is exposed through HTML, Markdown, and llms-full; no crawler-specific content gate.
[massdns](https://github.com/blechschmidt/massdns) is a high-performance DNS stub resolver that can reach 350,000+ queries per second. Two flags control rate, and they are easy to confuse.
## TL;DR
| Flag | Long form | Default | What it limits |
|---|---|---|---|
| `-q` | none | off | Quiet mode (no progress output). **Not** queries-per-second. |
| none | `--max-qps N` | 0 (no limit) | Maximum **queries per second** sent across all resolvers. |
| `-s N` | `--hashmap-size N` | 10000 | Maximum number of **in-flight queries** (the slot pool / hashmap size). |
| `-i N` | `--interval N` | 500 | Resend interval in **milliseconds** for unanswered queries. |
| `-r FILE` | `--resolvers FILE` | required | Path to resolver list. Throughput scales with resolver count, not just `--max-qps`. |
There is **no** `--qps`, `--max-queries`, or `--maximum-queries-per-second` flag. The single rate knob is `--max-qps`.
## Common confusions
**`-q` is not "queries per second".** It's the quiet flag — it suppresses the per-second progress output. The mistake is easy to make because the long form of the rate limit is `--max-qps`, which you might shorten in your head to `-q`.
**Maximum queries per second != maximum in-flight queries.** `--max-qps 1000` sends 1000 new queries per wall-clock second. `-s 10000` allows 10,000 unanswered queries to be outstanding at any moment. With slow upstream resolvers and a tight `--max-qps`, you can saturate the slot pool long before you hit the QPS ceiling — bumping `-s` is what unblocks throughput in that case.
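A quick sanity check ties the two knobs together. By Little's law, in-flight ≈ QPS × average response time: at `--max-qps 1000` and 300 ms average latency you hold roughly 300 slots, comfortably inside the default `-s 10000`. Let responses stretch to 3 s (slow resolvers, retries) and the same 1000 qps pins ~3000 slots, which is exactly the regime where raising `-s` unblocks throughput.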
**Resolver count caps real-world QPS.** Each resolver in your list gets queries round-robin. If `--max-qps` is 4000 but you only have 20 healthy resolvers in `-r resolvers.txt`, every resolver eats 200 qps — most public resolvers will rate-limit you well below that. Either lower `--max-qps` or use a longer resolver list (the [Public DNS Server List](https://public-dns.info/) is a common starting point, though most entries are unstable).
## Sane starting values
```bash
massdns \
-r resolvers.txt \
-t A \
-o S \
--max-qps 1000 \
-s 10000 \
-i 500 \
domains.txt > results.txt
```
- `--max-qps 1000` — conservative; raise once you're confident in your resolver list.
- `-s 10000` (default) — fine for most workloads; raise to 50000+ for slow resolvers / WAN-heavy lookups.
- `-i 500` — 500 ms retry interval; lower if you're using fast local resolvers, raise (1000–2000) if you're hammering public infrastructure.
## When to step beyond `--max-qps`
`--max-qps` is a fixed ceiling. If you want **adaptive** rate control that backs off on errors and probes upward on success, that's the [AIMD rate limiting](/note/aimd-rate-limiting/) pattern — TCP congestion control applied to a DNS scanner. Useful when you don't know the ceiling in advance.
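To make the contrast concrete, the AIMD core is tiny. A minimal Go sketch (field names and the halving factor are illustrative, not taken from the linked note):

```go
package ratelimit

import "math"

// aimdLimiter: additive increase on success, multiplicative decrease on error.
type aimdLimiter struct {
    qps  float64 // current send rate
    min  float64 // floor, so errors never stall the scan completely
    max  float64 // hard ceiling, analogous to --max-qps
    step float64 // additive probe size per success window
}

func (a *aimdLimiter) onSuccess() { a.qps = math.Min(a.qps+a.step, a.max) } // probe upward
func (a *aimdLimiter) onError()   { a.qps = math.Max(a.qps/2, a.min) }     // back off hard
```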
For the architectural side — **why** Go can match massdns's per-thread efficiency by going single-process-multi-goroutine — see [Building a High-Throughput DNS Scanner in Go](/article/go-dns-scanner-4000qps/).
## Sources
- [`man massdns`](https://github.com/blechschmidt/massdns#usage) — official flag reference (B. Blechschmidt)
- [`zdns`](https://github.com/zmap/zdns) — Go-based alternative from ZMap; flags are different but solves the same problem
- [DNS Resolution: The Full Picture](/guide/dns-resolution-full-picture/) — what's actually happening behind each query
---
# Bare Element Selectors vs Library HTML
URL: https://krowdev.com/snippet/bare-selectors-vs-library-html/
Kind: snippet | Maturity: budding | Origin: ai-drafted
Author: Agent | Directed by: krow
Tags: css, astro, patterns
> How bare tag selectors in a global stylesheet collide with third-party library HTML — the box-model stacking trap and a specificity ladder for fixes.
## Agent Context
- Canonical: https://krowdev.com/snippet/bare-selectors-vs-library-html/
- Markdown: https://krowdev.com/snippet/bare-selectors-vs-library-html.md
- Full corpus: https://krowdev.com/llms-full.txt
- Kind: snippet
- Maturity: budding
- Confidence: high
- Origin: ai-drafted
- Author: Agent
- Directed by: krow
- Published: 2026-04-18
- Modified: 2026-04-21
- Words: 656 (3 min read)
- Tags: css, astro, patterns
- Related: css-collision-visualized, astro-mental-model
- Content map:
- h2: Rule
- h2: Mechanism
- h2: Common collision families
- h2: Specificity ladder for fixes
- h2: Diagnostic
- h2: Related
- h2: Sources
- Crawl policy: same canonical content is exposed through HTML, Markdown, and llms-full; no crawler-specific content gate.
## Rule
Bare semantic element selectors in global CSS apply to every matching element in the document, including HTML emitted by third-party libraries. Box-model properties (`padding`, `border`, `margin`) on different boxes **stack** rather than override.
## Mechanism
- A bare `tag {}` rule has specificity `(0,0,1)` and matches any element of that type, regardless of ancestry.
- Library CSS typically sets only the properties it cares about; untouched properties cascade through from the global rule.
- When both the outer library element *and* an inner library element carry padding or border, the two box models add — the user sees doubled spacing or a visible double line.
## Common collision families
| Bare rule | Likely library producer | Typical effect |
|---|---|---|
| `pre { padding, border, background }` | Syntax-highlighter frames (Expressive Code, Shiki, Prism) | Padding stacks against the inner code line; border draws under the frame's titlebar border; background diverges from the highlighter's theme. |
| `:not(pre) > code { background, border }` | Highlighter frame captions, markdown inline code emitted inside library wrappers | Chip styling applied twice when a library wraps the code in an extra element. |
| `p { max-width: 68ch }` | Search result cards, highlighter captions, callout bodies | Prose measure applied to compact UI cards; text clips short of the container it lives in. |
| `ul, ol { padding-left, max-width }` | Header nav, mobile menu, footer, sidebar ToC, search facets | Structural lists inherit prose indent and width cap; every component has to override. |
| `blockquote { border-left, padding }` | Markdown `>` quotes vs callout directives | Neutral quote renders identical to a styled callout, defeating the semantic distinction. |
| `table, th, td { padding, border }` | Embedded or library-rendered tables | Prose padding on tables that were meant to be dense UI. |
| `a { color, text-decoration }` | Nav, footer, ToC, breadcrumbs, pagination | Every structural link becomes prose-styled; every component needs an override. |
| `img { max-width: 100% }` | Logos, icons, fixed-size component images | Intrinsic-size images get constrained to their container. |
None of these are bugs in the libraries — they're bugs in the global stylesheet's assumption that every matching element is prose.
## Specificity ladder for fixes
1. **Delete.** If no legitimate prose consumer exists (e.g. every `<pre>` on the site is highlighter output with zero raw consumers), the rule has no job. The library owns inner styling via its own config.
2. **Scope to a content wrapper.** Move rules into a layout-scoped `<style>` with `:global()` targeting a wrapper class on your rendered markdown region. Astro-native, cheap, reversible.
3. **`:not()` exclusion.** Per-library band-aid. Works; doesn't scale — each new library adds another exception.
Choose the highest rung on the ladder that the rule permits. Exclusions beat scoping only when you can't change the wrapper.
## Diagnostic
```bash
# 1. List bare element selectors in the global stylesheet
grep -E '^[a-z]+[, {]' path/to/global.css
# 2. Count legitimate prose consumers vs library-emitted for each tag
grep -r '<pre' src/ content/ # raw authored <pre>?
grep -rE '<ul>|<ol>' src/ content/ # raw authored lists in prose?
# 3. In DevTools, inspect a library-rendered element.
# Count how many overriding rules the component had to write
# to cancel your global. Each override = a collision you paid for.
```
Zero legitimate consumers → delete. Many consumers → scope to a content wrapper. A lone outlier → narrow selector or `:not()`.
If you want to see the exact highlighter and tabbed-code producers this warning is about, [Interactive Features Showcase](/snippet/interactive-features-showcase/) exercises them on a live page.
## Related
- [CSS Collision Visualized](/snippet/css-collision-visualized/) — interactive demo of the `<pre>` vs highlighter-frame case, plus cards for the other common producers.
- [Astro Mental Model](/guide/astro-mental-model/) — where scoped `<style>` and `:global()` fit in Astro's component model.
## Sources
- MDN, [Specificity](https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_cascade/Specificity)
- MDN, [The box model](https://developer.mozilla.org/docs/Learn_web_development/Core/Styling_basics/Box_model)
- Astro Docs, [Styles and CSS](https://docs.astro.build/en/guides/styling/)
---
# CSS Collision Visualized
URL: https://krowdev.com/snippet/css-collision-visualized/
Kind: snippet | Maturity: budding | Origin: ai-drafted
Author: Agent | Directed by: krow
Tags: css, astro, patterns
> Interactive demo of bare element selectors colliding with library HTML — three defects from one rule, shown against common library producers.
## Agent Context
- Canonical: https://krowdev.com/snippet/css-collision-visualized/
- Markdown: https://krowdev.com/snippet/css-collision-visualized.md
- Full corpus: https://krowdev.com/llms-full.txt
- Kind: snippet
- Maturity: budding
- Confidence: high
- Origin: ai-drafted
- Author: Agent
- Directed by: krow
- Published: 2026-04-18
- Modified: 2026-04-21
- Words: 420 (2 min read)
- Tags: css, astro, patterns
- Related: bare-selectors-vs-library-html, astro-mental-model
- Content map:
- h2: Worked example: pre vs a syntax-highlighter frame
- h2: Same cascade, other common producers
- h2: Fix ladder
- h2: Sources
- Crawl policy: same canonical content is exposed through HTML, Markdown, and llms-full; no crawler-specific content gate.
Bare element selectors in a global stylesheet cascade into HTML emitted by third-party libraries. Box-model properties on different boxes stack rather than override, so a single well-meaning `pre {}` or `a {}` rule can produce visibly doubled padding, doubled borders, mismatched backgrounds, or structural elements that inherit prose styling. See [Bare Element Selectors vs Library HTML](/snippet/bare-selectors-vs-library-html/) for the full reference table and diagnostic.
The same syntax-highlighter and tabbed-code wrappers appear on [Interactive Features Showcase](/snippet/interactive-features-showcase/), which makes this collision easier to reproduce in a real article context.
## Worked example: `pre` vs a syntax-highlighter frame
Syntax highlighters (Expressive Code, Shiki frames, Prism plugins, etc.) wrap code in an outer `<pre>` and an inner container with its own padding, border, and background. A bare `pre { padding, border, background }` rule in the global stylesheet lands on the outer element — and produces three defects at once:
1. **Padding stacks.** The outer `<pre>` gets the global padding; the inner code line already had the highlighter's padding.
2. **Border doubles.** The frame titlebar already draws a bottom border; the global rule draws a top border right beneath it.
3. **Background mismatches.** The frame chrome uses the highlighter's theme background; the global rule paints the outer `<pre>` with a slightly different token.
*(Interactive demo: `CSSCollisionDemo` toggles the bare `pre` rule on and off against a live highlighter frame.)*
**Fix.** If no hand-authored `<pre>` appears anywhere on the site (i.e. every `<pre>` is highlighter output), delete the bare rule. The library owns inner styling via its own config. Keep one declaration for outer block rhythm:
```css
.expressive-code {
margin: 1.5rem 0;
}
```
If hand-authored `<pre>` *does* appear, scope the rule to a content wrapper instead:
```css
.content :global(pre:not([class])) {
/* your prose styling */
}
```
## Same cascade, other common producers
The `pre` case is visible because three box-model properties stack at once. The same cascade shape applies elsewhere — one property at a time, so the collision is quieter but just as real. Each card below toggles between the broken state (global rule applied) and the fixed state (rule scoped to prose or removed).
*(Interactive cards: `CollisionCases` toggles each producer between its broken and fixed state.)*
## Fix ladder
1. **Delete** when no legitimate prose consumer exists.
2. **Scope to a content wrapper class** — move the rule into a layout-scoped `<style>` with `:global()` targeting your rendered-markdown region. Astro-native.
3. **`:not()` exclusion** — per-library, doesn't scale.
See also: [Bare Element Selectors vs Library HTML](/snippet/bare-selectors-vs-library-html/) for the full inventory and diagnostic.
## Sources
- MDN, [Specificity](https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_cascade/Specificity)
- MDN, [The box model](https://developer.mozilla.org/docs/Learn_web_development/Core/Styling_basics/Box_model)
- Astro Docs, [Styles and CSS](https://docs.astro.build/en/guides/styling/)
---
# Pipeline Stage Communication
URL: https://krowdev.com/snippet/pipeline-stage-communication/
Kind: snippet | Maturity: seedling | Origin: ai-drafted
Author: Agent | Directed by: krow
Tags: architecture, patterns
> Patterns for connecting independent pipeline stages via message queues — decoupled producers and consumers with batch collection and backpressure.
## Agent Context
- Canonical: https://krowdev.com/snippet/pipeline-stage-communication/
- Markdown: https://krowdev.com/snippet/pipeline-stage-communication.md
- Full corpus: https://krowdev.com/llms-full.txt
- Kind: snippet
- Maturity: seedling
- Confidence: medium
- Origin: ai-drafted
- Author: Agent
- Directed by: krow
- Published: 2026-04-07
- Modified: 2026-04-21
- Words: 490 (3 min read)
- Tags: architecture, patterns
- Related: worker-pool-isolation, parallel-ai-research-pipelines, aimd-rate-limiting
- Content map:
- h2: The Shape
- h2: Producer Side: Send and Move On
- h2: Consumer Side: Batch Collection
- h2: Throughput Matching
- h2: Key Details
- h2: Sources
- Crawl policy: same canonical content is exposed through HTML, Markdown, and llms-full; no crawler-specific content gate.
Connect pipeline stages through message queues so each stage runs as an independent service. Stages don't call each other directly — they produce to and consume from queues. Combine with [worker pool isolation](/snippet/worker-pool-isolation/) per stage and [AIMD rate limiting](/note/aimd-rate-limiting/) on external calls for a resilient pipeline.
## The Shape
```
Stage A → [queue] → Stage B → [queue] → Stage C
producer  buffer    consumer/ buffer    consumer
                    producer
```
Each stage owns its own runtime, scaling, and failure domain. The queue is the contract between them. Stage A doesn't know or care whether Stage B is written in a different language, runs on different hardware, or processes items one at a time or in batches.
## Producer Side: Send and Move On
The producer pushes results to a channel or queue and immediately returns to its own work. No waiting for the consumer.
```rust
use std::sync::mpsc::{Receiver, Sender};

fn run(jobs: &Receiver<Input>, results: &Sender<Output>) {
    while let Ok(item) = jobs.recv() {
        match process(item) {
            Ok(output) => { results.send(output).ok(); } // ignore a closed downstream
            Err(e) => { handle_error(e); }
        }
    }
}
```
The `.ok()` on send is intentional — if the downstream queue is gone, this stage drops the result (log it if you need visibility) and keeps working rather than panicking.
## Consumer Side: Batch Collection
Some stages work more efficiently in batches. Collect items up to a batch size, with a timeout so partial batches don't stall forever.
```python
import asyncio

async def collect_batch(queue: asyncio.Queue, batch_size: int = 50) -> list:
    items = []
    while len(items) < batch_size:
        try:
            item = await asyncio.wait_for(queue.get(), timeout=5.0)
            items.append(item)
        except asyncio.TimeoutError:
            break  # flush partial batch
    return items
```
The timeout is critical. Without it, a batch that's 49/50 full waits indefinitely if the upstream slows down.
## Throughput Matching
Stages rarely have identical throughput. The queue absorbs bursts and smooths mismatches.
| Pattern | When to use |
|---------|------------|
| 1:1 queue | Stages have similar throughput |
| Fan-out (1:N) | Consumer is slower — parallelize it |
| Batching | Consumer has high per-call overhead, amortize it |
| Bounded queue + backpressure | Prevent memory growth when consumer falls behind |
If Stage B is 3x slower than Stage A, run 3 instances of Stage B consuming from the same queue. The queue is the load balancer.
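A minimal Go sketch of the fan-out row (the `process` function and the string item type are assumptions): N consumers range over one shared channel, and the channel itself does the load balancing.

```go
package pipeline

import "sync"

// fanOut starts `consumers` goroutines that all pull from the same input
// channel. A slow item delays only the consumer holding it.
func fanOut(in <-chan string, out chan<- string, consumers int, process func(string) string) {
    var wg sync.WaitGroup
    for i := 0; i < consumers; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for item := range in { // the shared queue is the load balancer
                out <- process(item)
            }
        }()
    }
    go func() {
        wg.Wait()  // every consumer has drained and exited...
        close(out) // ...so the output can be closed safely
    }()
}
```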
## Key Details
**Bounded queues.** Unbounded queues hide backpressure until memory runs out. Set a hard cap and let the queue push back on producers when full.
**Per-stage monitoring.** Track queue depth between each pair of stages. Growing depth means the consumer can't keep up — scale it or investigate before the queue hits its limit.
**Graceful drain.** On shutdown, stop accepting new items, flush in-progress work, then close the output queue. Stages shut down in order from the head of the pipeline.
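In Go channel terms, most of that drain order falls out of close semantics. A sketch under the same assumptions as the fan-out example:

```go
// stage closes its output only after its input is closed *and* drained,
// so a close at the head of the pipeline propagates to the tail in order.
func stage[T any](in <-chan T, out chan<- T, work func(T) T) {
    defer close(out)       // signal downstream once fully drained
    for item := range in { // range exits only when `in` is closed and empty
        out <- work(item)  // in-flight items are flushed, not dropped
    }
}
```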
At the workflow level, [Parallel AI Research Pipelines](/article/parallel-ai-research-pipelines/) uses the same separation: each phase talks through persisted artifacts instead of direct agent-to-agent coupling.
## Sources
- Python, [asyncio queues](https://docs.python.org/3/library/asyncio-queue.html)
- Rust, [std::sync::mpsc](https://doc.rust-lang.org/std/sync/mpsc/)
- Go, [Concurrency patterns: pipelines and cancellation](https://go.dev/blog/pipelines)
---
# Worker Pool Isolation Pattern
URL: https://krowdev.com/snippet/worker-pool-isolation/
Kind: snippet | Maturity: seedling | Origin: ai-drafted
Author: Agent | Directed by: krow
Tags: architecture, concurrency
> Separate worker pools per task type so a slow or failing dependency can't starve unrelated work — the bulkhead pattern applied to concurrent processing.
## Agent Context
- Canonical: https://krowdev.com/snippet/worker-pool-isolation/
- Markdown: https://krowdev.com/snippet/worker-pool-isolation.md
- Full corpus: https://krowdev.com/llms-full.txt
- Kind: snippet
- Maturity: seedling
- Confidence: medium
- Origin: ai-drafted
- Author: Agent
- Directed by: krow
- Published: 2026-04-07
- Modified: 2026-04-21
- Words: 470 (3 min read)
- Tags: architecture, concurrency
- Related: pipeline-stage-communication, go-dns-scanner-4000qps, aimd-rate-limiting
- Content map:
- h2: The Problem
- h2: The Fix: One Pool Per Concern
- h2: Sizing
- h2: Key Details
- h2: When to Use This
- h2: Sources
- Crawl policy: same canonical content is exposed through HTML, Markdown, and llms-full; no crawler-specific content gate.
Run different categories of work in separate, bounded pools. A spike in one category can't starve the others. This pairs naturally with [pipeline stage communication](/snippet/pipeline-stage-communication/) — each stage gets its own pool. For rate-sensitive pools, add [AIMD rate limiting](/note/aimd-rate-limiting/).
## The Problem
A single shared worker pool handles API calls, file processing, and database writes. The API starts responding slowly. Workers pile up waiting on API responses. File processing and database writes — which are fine — queue behind them and stall. One slow dependency takes down everything.
This is the same failure mode the [Go DNS scanner](/article/go-dns-scanner-4000qps/) had to avoid when network-bound probes and local parsing shared the same concurrency budget.
## The Fix: One Pool Per Concern
```go
type Job func()

type WorkerPool struct {
    name    string
    workers int
    queue   chan Job
    sem     chan struct{} // bounds concurrency
}

func NewPool(name string, workers, queueSize int) *WorkerPool {
    p := &WorkerPool{
        name:    name,
        workers: workers,
        queue:   make(chan Job, queueSize),
        sem:     make(chan struct{}, workers),
    }
    go p.run()
    return p
}

// run drains the queue; sem caps how many jobs execute at once.
func (p *WorkerPool) run() {
    for job := range p.queue {
        p.sem <- struct{}{} // acquire a slot
        go func(j Job) {
            defer func() { <-p.sem }() // release the slot
            j()
        }(job)
    }
}

var pools = map[string]*WorkerPool{
    "api":   NewPool("api", 10, 100),
    "files": NewPool("files", 4, 50),
    "db":    NewPool("db", 8, 200),
}
```
The API pool fills up? The file and database pools keep moving. Each pool has its own concurrency limit and backpressure via its own queue.
## Sizing
| Pool | Size by | Watch for |
|------|---------|-----------|
| I/O-bound (API calls, network) | Number of connections you can sustain | Queue depth growing = upstream is slow |
| CPU-bound (parsing, transforms) | Number of cores | CPU saturation = pool is too large |
| External writes (DB, storage) | Connection pool limit of the backend | Timeouts = reduce pool or batch writes |
Start small, measure, increase. A pool that's too large creates more contention than it solves.
## Key Details
**Bounded queues, not unbounded.** An unbounded queue hides backpressure — memory grows silently until the process crashes. Use a buffered channel or ring buffer with a hard cap. When the queue is full, reject or apply backpressure to the caller.
**Per-pool timeouts.** API calls might need a 30-second timeout. File operations might need 5 seconds. A shared timeout is wrong for both. Set deadlines per pool based on the expected latency profile of that work type.
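A sketch of per-pool deadlines on the pool above (method and parameter names are hypothetical; imports of `context` and `time` elided): derive a context per job from the pool's own latency budget.

```go
// executeWithDeadline runs one job under this pool's timeout.
// The job must honor ctx.Done() for the deadline to mean anything.
func (p *WorkerPool) executeWithDeadline(timeout time.Duration, job func(context.Context) error) error {
    ctx, cancel := context.WithTimeout(context.Background(), timeout)
    defer cancel() // release the timer even on early return
    return job(ctx)
}
```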
**Monitor each pool independently.** Track queue depth, active workers, completion rate, and error rate per pool. A healthy aggregate hides a sick pool.
## When to Use This
- Multiple dependency types with different latency profiles
- Any system where one slow path shouldn't block unrelated fast paths
- Worker counts that need independent tuning per workload
This is the bulkhead pattern from ship design — compartments that prevent a hull breach from flooding the entire vessel. Same idea, applied to goroutines.
## Sources
- Microsoft Learn, [Bulkhead pattern](https://learn.microsoft.com/en-us/azure/architecture/patterns/bulkhead)
- Go, [Concurrency patterns: pipelines and cancellation](https://go.dev/blog/pipelines)
---
# Domain Registration: From ICANN to Your Browser
URL: https://krowdev.com/note/domain-registration-icann-to-browser/
Kind: note | Maturity: budding | Origin: ai-drafted
Author: Agent | Directed by: krow
Tags: dns, networking, fundamentals
Series: domain-infrastructure (#4)
> How domains move from ICANN through registries and registrars — the three-tier model, EPP, lifecycle states, and what happens when a domain drops.
## Agent Context
- Canonical: https://krowdev.com/note/domain-registration-icann-to-browser/
- Markdown: https://krowdev.com/note/domain-registration-icann-to-browser.md
- Full corpus: https://krowdev.com/llms-full.txt
- Kind: note
- Maturity: budding
- Confidence: medium
- Origin: ai-drafted
- Author: Agent
- Directed by: krow
- Published: 2026-03-29
- Modified: 2026-03-29
- Words: 1815 (9 min read)
- Tags: dns, networking, fundamentals
- Series: domain-infrastructure (#4)
- Prerequisites: dns-resolution-full-picture
- Related: dns-resolution-full-picture, whois-dead-long-live-rdap
- Content map:
- h2: The Three-Tier Model
- h2: Registry vs. Registrar — Why It Matters
- h3: Thick vs. Thin Registries
- h2: EPP: How Registrars Talk to Registries
- h2: Domain Lifecycle
- h2: Status Flags
- h3: The Hold Trap
- h2: Drop Catching: The Five-Day Race
- h2: The Full Picture
- h2: Sources
- Diagrams: Mermaid fences are paired with adjacent ASCII companions in this document (3 Mermaid, 3 ASCII); HTML figures expose rendered SVG plus copyable Mermaid/ASCII source tabs.
- Crawl policy: same canonical content is exposed through HTML, Markdown, and llms-full; no crawler-specific content gate.
[DNS resolution](/guide/dns-resolution-full-picture/) explains how a name becomes an IP address. Domain registration explains how the name gets into the system in the first place. Every domain you've ever typed into a browser exists because of a chain of contracts, protocols, and databases stretching from a non-profit in Los Angeles to a zone file regenerated every few minutes.
## The Three-Tier Model
Domain registration is a hierarchy with clear separation of concerns:
```mermaid
graph TD
ICANN["<b>ICANN</b><br/>Policy maker. Accredits registrars,<br/>manages root zone, negotiates<br/>registry agreements."]
Registry["<b>Registry</b> (e.g. Verisign for .com)<br/>Operates one TLD. Maintains the<br/>authoritative database. Publishes<br/>the zone file. Runs TLD<br/>nameservers and RDAP/WHOIS."]
Registrar["<b>Registrar</b> (GoDaddy, Namecheap, Cloudflare)<br/>Customer-facing. Sells domains.<br/>Sends EPP commands to the registry<br/>to create, renew, transfer, delete."]
Registrant["<b>Registrant</b><br/>You. The person or entity that<br/>registered the domain."]
ICANN -->|contracts + accreditation| Registry
Registry -->|EPP protocol| Registrar
Registrar -->|web interface / API| Registrant
```
```ascii
┌──────────────┐
│ ICANN │ Policy maker. Accredits registrars, manages root zone,
│ │ negotiates registry agreements.
└──────┬───────┘
│ contracts + accreditation
┌──────┴───────┐
│ Registry │ Operates one TLD. Maintains the authoritative database
│ (e.g., │ of all domains under that TLD. Publishes the zone file.
│ Verisign │ Runs TLD nameservers and RDAP/WHOIS.
│ for .com) │
└──────┬───────┘
│ EPP protocol
┌──────┴───────┐
│ Registrar │ Customer-facing. Sells domains. Sends EPP commands to
│ (GoDaddy, │ the registry to create, renew, transfer, and delete
│ Namecheap, │ registrations.
│ Cloudflare) │
└──────┬───────┘
│ web interface / API
┌──────┴───────┐
│ Registrant │ You. The person or entity that registered the domain.
└──────────────┘
```
**ICANN** (Internet Corporation for Assigned Names and Numbers) is the non-profit that coordinates the namespace. It decides what TLDs exist, contracts with each TLD's registry operator, accredits registrars, and manages the IANA functions (IP allocation, AS numbers, protocol parameters, root zone KSK). ICANN doesn't sell domains. It sets the rules.
**Registries** operate the infrastructure for a TLD. Verisign runs `.com` and `.net`. Donuts operates many newer gTLDs. Each registry is a monopoly for its namespace — there's exactly one operator per TLD. The registry maintains the canonical database, generates zone files, runs TLD nameservers, and exposes RDAP endpoints.
**Registrars** are the competitive layer. Dozens of registrars can sell `.com` domains, all talking to the same Verisign registry behind the scenes. They handle billing, customer support, DNS hosting, and the web interfaces you actually interact with.
## Registry vs. Registrar — Why It Matters
The distinction isn't academic. It determines where data lives and who controls what.
| | Registry | Registrar |
|---|---|---|
| **Relationship to TLD** | One per TLD (monopoly) | Many per TLD (competitive) |
| **Sells to end users** | No (usually) | Yes |
| **Canonical data** | Domain existence, status, nameservers, dates | Registrant contacts, billing |
| **RDAP/WHOIS** | Authoritative for domain status | Additional registrant details |
| **Protocol** | Receives EPP commands | Sends EPP commands |
When you query a registry's [RDAP](/note/whois-dead-long-live-rdap/) server, you get authoritative data about whether a domain exists, its status flags, its nameservers, and its dates. When you query a registrar's RDAP server, you get richer registrant and contact information.
### Thick vs. Thin Registries
Historically, `.com` was a "thin" registry — Verisign stored only the domain name, registrar, nameservers, status, and dates. All detailed registrant information lived at the registrar. If you wanted contact data, you had to query the registrar's WHOIS separately.
In February 2020, `.com` transitioned to a "thick" registry. Verisign now stores full registrant contact data. In practice, this is somewhat academic because GDPR redaction means most registrant contacts show only the registrar's identity. But the data is technically there, centralized at the registry level.
Most newer gTLDs were thick from the start.
## EPP: How Registrars Talk to Registries
The Extensible Provisioning Protocol (EPP, RFC 5730-5734) is the XML-over-TCP protocol that every registrar uses to communicate with every registry. When you click "Register" on your registrar's website, the sequence is:
```mermaid
sequenceDiagram
participant U as You
participant R as Registrar
participant V as Verisign (registry)
participant Z as .com zone file
participant DNS as DNS / RDAP
U->>R: Click "Register example.com"
R->>V: EPP <create>
V->>V: Create domain in registry DB
V->>Z: Add NS records (next generation cycle)
Z-->>DNS: Domain resolvable in DNS
V-->>DNS: Domain appears in RDAP queries
```
```ascii
You click "Register example.com"
→ Registrar sends EPP <create> to Verisign
→ Verisign creates domain in registry database
→ Verisign adds NS records to the .com zone file (next generation cycle)
→ Domain becomes resolvable in DNS
→ Domain appears in RDAP queries
```
You never touch EPP directly — it's a registrar-to-registry protocol. But understanding it explains the latency you sometimes see. The gap between an EPP `<create>` command and the domain actually resolving in DNS is typically seconds to minutes, depending on when the registry next regenerates its zone file. Verisign regenerates the `.com` zone multiple times per day.
EPP also handles renewals, transfers between registrars, status changes, and deletions. Every lifecycle transition described below is ultimately an EPP command.
## Domain Lifecycle
A domain passes through well-defined states from registration to deletion. The timelines are set by ICANN policy and registry rules.
```mermaid
stateDiagram-v2
direction TB
[*] --> REGISTERED
REGISTERED: <b>REGISTERED</b> (active)<br/>Resolves. Auto-renew or manual.<br/>Duration 1–10 years
EXPIRED: <b>EXPIRED</b> (auto-renew grace)<br/>May still resolve. Registrar can renew at normal price.<br/>0–45 days (registrar policy)
RGP: <b>REDEMPTION GRACE PERIOD</b> (RGP)<br/>Removed from zone — stops resolving.<br/>Recoverable at penalty fee ($80–$200+).<br/>30 days (ICANN mandated for gTLDs)
PENDING: <b>PENDING DELETE</b><br/>Queued for deletion.<br/>Cannot be renewed or recovered by anyone.<br/>5 days
AVAILABLE: <b>AVAILABLE</b> (dropped)<br/>First-come-first-served
REGISTERED --> EXPIRED: registration expires, not renewed
EXPIRED --> RGP: grace period ends, not renewed
RGP --> PENDING: RGP ends, not redeemed
PENDING --> AVAILABLE: deletion completes
AVAILABLE --> [*]
```
```ascii
┌──────────────────┐
│ REGISTERED │ Normal state. Domain resolves. Auto-renew or manual.
│ (active) │ Duration: 1-10 years
└────────┬─────────┘
│ registration expires, not renewed
┌────────┴─────────┐
│ EXPIRED │ Grace period. Domain may still resolve.
│ (auto-renew │ Registrar can renew at normal price.
│ grace period) │ Duration: 0-45 days (registrar policy, typically 30-40)
└────────┬─────────┘
│ grace period ends, not renewed
┌────────┴─────────┐
│ REDEMPTION │ Domain removed from zone — stops resolving.
│ GRACE PERIOD │ Registrant can still recover, but at a penalty fee
│ (RGP) │ ($80-200+ depending on registrar).
│ │ Duration: 30 days (ICANN mandated for gTLDs)
└────────┬─────────┘
│ RGP ends, not redeemed
┌────────┴─────────┐
│ PENDING DELETE │ Queued for deletion by the registry.
│ │ Cannot be renewed or recovered by anyone.
│ │ Duration: 5 days
└────────┬─────────┘
│ deletion completes
┌────────┴─────────┐
│ AVAILABLE │ Domain dropped. First-come-first-served.
│ (dropped) │
└───────────────────┘
```
The critical details:
- **Auto-renew grace** varies wildly by registrar. Some give 40 days. Some give none. Check your registrar's policy before assuming you can lapse and recover cheaply.
- **Redemption** is expensive by design — the fee discourages speculative letting-expire-and-reregistering. $80 is the low end; some registrars charge $200+.
- **Pending delete** is the point of no return. Once a domain enters this state, the current registrant cannot recover it. It will be deleted in exactly 5 days.
- **Available** doesn't mean you'll get it. More on that below.
## Status Flags
Domains carry EPP status codes that control what operations are permitted and whether the domain appears in the DNS zone. These flags show up in RDAP responses.
| Status | What it means |
|---|---|
| `active` | Normal registered domain, resolving in DNS |
| `clientTransferProhibited` | Registrar locked — cannot be transferred |
| `serverTransferProhibited` | Registry locked — cannot be transferred |
| `clientDeleteProhibited` | Registrar locked — cannot be deleted |
| `serverDeleteProhibited` | Registry locked — cannot be deleted |
| `clientUpdateProhibited` | Registrar locked — no changes to NS, contacts |
| `clientHold` | Registrar suspended — removed from zone file |
| `serverHold` | Registry suspended — removed from zone file |
| `pendingDelete` | Queued for deletion (5 days) |
| `pendingTransfer` | Transfer in progress between registrars |
| `redemptionPeriod` | In RGP — recoverable at penalty cost |
| `autoRenewPeriod` | Just auto-renewed, cancellation still possible |
| `addPeriod` | Just registered, within add grace period |
The `client*` flags are set by the registrar (at the registrant's request or as default policy). The `server*` flags are set by the registry (usually for legal or policy reasons). Most registrars set `clientTransferProhibited` by default on new domains to prevent unauthorized transfers.
### The Hold Trap
`clientHold` and `serverHold` are the flags that trip people up. A domain on hold is still registered — it has an owner and an expiry date — but it's removed from the zone file. It returns NXDOMAIN to DNS queries, which looks identical to "this domain doesn't exist."
If you're checking domain availability by [DNS lookup](/guide/dns-resolution-full-picture/) alone, held domains look available. They're not. The only way to know the difference is to check the registry's [RDAP](/note/whois-dead-long-live-rdap/), where the domain will show up as registered with a hold status.
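A hedged Go sketch of that check against Verisign's public RDAP endpoint for `.com` (base URL assumed current; other TLDs have their own endpoints). Note that RDAP reports status values in RFC 8056 form, e.g. `client hold` rather than EPP's `clientHold`:

```go
import (
    "encoding/json"
    "net/http"
)

// rdapStatus distinguishes "never registered" (404) from "registered but
// on hold" (200 with a hold status), which a DNS lookup alone cannot do.
func rdapStatus(domain string) (registered bool, statuses []string, err error) {
    resp, err := http.Get("https://rdap.verisign.com/com/v1/domain/" + domain)
    if err != nil {
        return false, nil, err
    }
    defer resp.Body.Close()
    if resp.StatusCode == http.StatusNotFound {
        return false, nil, nil // genuinely unregistered
    }
    var body struct {
        Status []string `json:"status"` // e.g. ["client hold"]
    }
    if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
        return true, nil, err
    }
    return true, body.Status, nil
}
```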
## Drop Catching: The Five-Day Race
When a domain enters `pendingDelete`, the clock starts. In 5 days, the registry will delete it and the domain becomes available for fresh registration. This window creates an industry.
Professional drop-catching services monitor `pendingDelete` domains and queue automated registrations timed to fire the instant the registry deletes the domain. Multiple services compete for the same domain, sending EPP `<create>` commands within milliseconds of deletion.
For commodity domains (long names, obscure TLDs, no traffic), you might register one normally after it drops. For premium expired domains — short names, dictionary words, domains with existing traffic or backlinks — drop catchers win nearly every time. Services like SnapNames, DropCatch, and NameJet run auctions for high-value expiring domains, and the winner's registration fires automatically.
What this means in practice: knowing a domain is in `pendingDelete` gives you roughly 5 days of lead time. Whether you can actually register it when it drops depends entirely on whether anyone else wants it. The infrastructure exists to catch desirable domains within milliseconds.
## The Full Picture
The path from "I want a domain" to "it resolves in browsers worldwide" crosses every layer:
1. **ICANN** sets the rules and accredits your registrar
2. **Your registrar** sends an EPP `<create>` to the registry
3. **The registry** adds the domain to its database and generates NS records in the next zone file
4. **TLD nameservers** pick up the new zone file and start responding to queries for your domain
5. **Recursive resolvers** [walk the hierarchy](/guide/dns-resolution-full-picture/), find your domain's nameservers, and cache the result
6. **Your browser** gets an IP address and opens a connection
The entire chain — from EPP command to resolvable domain — typically completes in minutes. The governance structure behind it took decades to build.
## Sources
- [RFC 5730 — Extensible Provisioning Protocol (EPP)](https://datatracker.ietf.org/doc/html/rfc5730)
- [RFC 5731 — EPP Domain Name Mapping](https://datatracker.ietf.org/doc/html/rfc5731)
- [ICANN Registrar Accreditation](https://www.icann.org/resources/pages/accreditation-2012-02-25-en)
- [Verisign Domain Name Industry Brief](https://www.verisign.com/en_US/domain-names/dnib/index.xhtml)
---
# Building a High-Throughput DNS Scanner in Go
URL: https://krowdev.com/article/go-dns-scanner-4000qps/
Kind: article | Maturity: budding | Origin: ai-drafted
Author: Agent | Directed by: krow
Tags: go, dns, architecture, performance
> From 160 qps to 4000+ by moving the hot path into Go — eliminating shared state, per-goroutine connections, and lessons from massdns and zdns.
## Agent Context
- Canonical: https://krowdev.com/article/go-dns-scanner-4000qps/
- Markdown: https://krowdev.com/article/go-dns-scanner-4000qps.md
- Full corpus: https://krowdev.com/llms-full.txt
- Kind: article
- Maturity: budding
- Confidence: medium
- Origin: ai-drafted
- Author: Agent
- Directed by: krow
- Published: 2026-03-29
- Modified: 2026-03-29
- Words: 2088 (10 min read)
- Tags: go, dns, architecture, performance
- Prerequisites: dns-resolution-full-picture
- Related: dns-resolution-full-picture, aimd-rate-limiting
- Content map:
- h2: The Bottleneck: Serialization, Not Network
- h2: What massdns Teaches: One Thread Beats 500
- h2: What zdns Teaches: Shared-Nothing Goroutines
- h2: The Architecture: Go Owns the Hot Path
- h2: Worker-per-Goroutine: Zero Locks in the Query Path
- h2: Why miekg/dns
- h2: The Protocol: Newline-Delimited Streaming
- h2: The Evolution: Bash to Python to Go
- h2: What Made the Difference
- h2: Sources
- Crawl policy: same canonical content is exposed through HTML, Markdown, and llms-full; no crawler-specific content gate.
A DNS scanner project needed to check thousands of domains per second against TLD nameservers. The first version managed 160 queries/second. The bottleneck wasn't the network, wasn't DNS resolution time, wasn't the proxies. It was the pipe between the orchestrator and the resolver. This article covers what went wrong, what two reference implementations taught us about doing it right, and the architecture that got throughput past 4000 queries/second.
## The Bottleneck: Serialization, Not Network
The original architecture used a Python orchestrator with a Go sidecar for DNS resolution. Python sent one query at a time through stdin as JSON, Go processed it, and returned one result through stdout. A classic RPC bridge pattern.
Five serialization points killed throughput:
1. **Proxy selection lock** -- Python picks one proxy per query under a mutex
2. **Stdin write + flush** -- one JSON line per query, blocks on pipe buffer
3. **Per-connection lock in Go** -- one query at a time per connection
4. **Stdout write lock** -- serializes all output through a single mutex
5. **Single reader thread** -- Python processes one response at a time
Each serialization point is fast individually. Together they form a pipeline where every stage waits for the previous one. The wall-clock throughput was 160 queries/second -- roughly what you'd expect from five synchronous bottlenecks each adding a few milliseconds.
The [DNS round-trip](/guide/dns-resolution-full-picture/) through a SOCKS5 proxy to a public resolver is 50-200ms. With 500 concurrent connections, you'd expect 2500-10000 queries/second if the only bottleneck were network latency. The pipe was leaving 95% of the available parallelism on the table.
## What massdns Teaches: One Thread Beats 500
massdns achieves 350,000 queries/second using C, a single thread, and no locks. It's worth understanding how.
The design is a pre-allocated slot pool. At startup, massdns allocates N slots for in-flight queries (default 10,000). Each slot holds the query state: domain name, query type, timestamp, retry count. A hash map correlates incoming responses back to their slot using `(domain, type)` as the key.
The event loop is built on `epoll`:
- When the socket is writable, pull the next domain from input and send a UDP query
- When the socket is readable, parse the response and match it back to a slot via the hash map
- A timed ring buffer handles timeouts -- slots are inserted at their deadline position and swept lazily
No threads. No locks. No goroutines. No channels. One thread owns all state and alternates between sending and receiving based on socket readiness. The CPU never blocks on I/O and never contends on shared data.
**The insight**: one thread with async I/O beats 500 threads with locks. The coordination overhead of mutex contention, context switching, and cache invalidation across threads can easily dominate the actual work when the work (sending a 40-byte UDP packet) is near-zero.
massdns is a ceiling reference -- it shows what's possible when DNS scanning is the only thing happening in the process. A practical scanner that needs proxy support, TCP connections, and integration with a larger pipeline won't match 350K/s, but it should aim for the same principle: don't serialize the hot path.
## What zdns Teaches: Shared-Nothing Goroutines
zdns is a Go DNS scanner from the ZMap project. It achieves 1000+ queries/second with a cleaner model than raw epoll: goroutines with no shared mutable state.
The architecture is a four-channel pipeline:
```
stdin → input channel → worker goroutines → output channel → stdout
```
Each worker goroutine owns its own resolver. No shared connection pool. No shared socket. No mutex in the query path. Workers pull domains from the input channel, resolve them on their own connection, and push results to the output channel. Concurrency equals the worker count, and that's the only knob.
**The insight**: eliminate ALL shared mutable state in the hot path. If no goroutine reads or writes data that another goroutine touches, you don't need locks, you don't need atomics, and you don't need to think about memory ordering. Channels handle the handoff at the boundary.
zdns doesn't even do rate limiting in the traditional sense. The number of workers *is* the rate limit. Each worker processes queries sequentially on its own connection, so the maximum throughput is `workers * (1 / avg_query_time)`. Want more throughput? Add more workers.
## The Architecture: Go Owns the Hot Path
Combining these lessons, the redesigned scanner splits responsibilities by speed:
```
Python (orchestrator)                    Go (scanner daemon)
+-----------------------+                +------------------------------+
| Load proxies          |--config JSON-->| Store proxy pool             |
| Generate domains      |--domain stream>| Assign proxy per worker      |
| Read results          |<-result stream-| Resolve via SOCKS5 + DNS     |
| WHOIS/RDAP on misses  |                | Manage connections           |
| Store results         |                | Handle timeouts/retries      |
+-----------------------+                +------------------------------+
        slow path                                  fast path
 (~20/s, 5% of domains)                      (1000+/s, all domains)
```
The principle: Go owns everything in the hot path. Python's job is to feed domains and consume results. No per-query decisions cross the process boundary.
Python sends the proxy list once at startup. Go assigns proxies to workers internally using static round-robin. No per-query proxy selection in Python. No per-query JSON encoding for a request object. No future-matching on the response. Domains go in as bare strings, results come out as compact JSON.
The slow path -- [WHOIS/RDAP](/note/whois-dead-long-live-rdap/) confirmation for domains that return NXDOMAIN -- stays in Python. It runs at ~20 queries/second, only hits ~5% of domains, and involves HTTP requests with TLS fingerprinting. There's no reason to rewrite it.
## Worker-per-Goroutine: Zero Locks in the Query Path
The internal Go architecture follows the zdns model directly:
```
stdin reader (1 goroutine)
          |
          v
domain channel (buffered, 10K)
          |
          +--> worker 0 --> own SOCKS5 conn + dns.Conn --> result channel
          +--> worker 1 --> own SOCKS5 conn + dns.Conn --> result channel
          +--> worker 2 --> ...                             ...
          ...
          +--> worker N --> own SOCKS5 conn + dns.Conn --> result channel
                                                                 |
                                                                 v
                                                    stdout writer (1 goroutine)
```
Each worker goroutine owns:
- **One SOCKS5 connection** (persistent TCP, reconnect on error)
- **One proxy** from the pool (assigned at startup, never rotated)
- **One `dns.Conn`** for DNS queries over that SOCKS5 tunnel
- **Zero shared locks** in the query path
Proxy assignment is static: with N workers and M proxies, worker `i` uses proxy `i % M`. Workers reuse their connection across queries. If a connection dies, the worker reconnects to the same proxy. No connection pool. No shared connections.
**Why this works**: with 500 workers and 50 proxies, each proxy gets ~10 workers. Workers process queries sequentially on their own connection at ~2-5 queries/second each (limited by SOCKS5 round-trip time). 500 workers x 3 queries/second average = 1500/second total. Scale workers to 1500 and you're past 4000/second. No locks, no contention, linear scaling until you saturate proxy bandwidth. An [AIMD rate limiter](/note/aimd-rate-limiting/) can dynamically adjust worker count based on error signals, but the static model already demonstrates the scaling principle.
The worker lifecycle is minimal:
```go
func (w *Worker) Run(domains <-chan string, results chan<- Result) {
defer w.Close()
for domain := range domains {
result := w.resolve(domain)
results <- result
}
}
func (w *Worker) resolve(domain string) Result {
if w.conn == nil {
w.connect() // SOCKS5 dial -> DNS resolver
}
msg := new(dns.Msg)
msg.SetQuestion(dns.Fqdn(domain), dns.TypeNS)
resp, rtt, err := w.client.ExchangeWithConn(msg, w.conn)
if err != nil {
w.conn.Close()
w.conn = nil // reconnect on next query
return Result{Domain: domain, Status: "error", Retries: w.retries}
}
return Result{
Domain: domain,
Status: rcodeToStatus(resp.Rcode),
RTT: rtt,
Resolved: resp.Rcode == dns.RcodeSuccess,
}
}
```
No connection pool abstraction. No retry middleware. No circuit breaker. Each worker is an independent unit. If one worker's proxy goes down, that one worker reconnects. The other 499 are unaffected.
## Why miekg/dns
The `miekg/dns` library is the de facto standard for DNS in Go. It's battle-tested by zdns, dnsx, CoreDNS, and most other serious Go DNS tooling. Using it instead of hand-rolling DNS packets gives you:
```go
client := &dns.Client{
Net: "tcp",
Timeout: 2 * time.Second,
Dialer: socks5Dialer(proxyAddr), // custom net.Dialer for SOCKS5
}
msg := new(dns.Msg)
msg.SetQuestion(dns.Fqdn(domain), dns.TypeNS)
resp, rtt, err := client.ExchangeWithConn(msg, conn)
```
The critical piece is the custom `Dialer`. By injecting a SOCKS5 dialer into the DNS client, every DNS query goes through the proxy tunnel transparently. The DNS library doesn't know or care about the proxy layer -- it just sees a `net.Conn`. This is the same pattern dnsx uses for proxy support.
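The article doesn't show `socks5Dialer`, so here is one assumed shape using `golang.org/x/net/proxy`: dial the resolver through the tunnel yourself and hand the resulting `net.Conn` to the DNS client.

```go
import (
    "github.com/miekg/dns"
    "golang.org/x/net/proxy"
)

// dialViaSocks5 opens a TCP connection to the resolver through a SOCKS5
// proxy and wraps it for use with client.ExchangeWithConn.
func dialViaSocks5(proxyAddr, resolverAddr string) (*dns.Conn, error) {
    d, err := proxy.SOCKS5("tcp", proxyAddr, nil, proxy.Direct)
    if err != nil {
        return nil, err
    }
    raw, err := d.Dial("tcp", resolverAddr) // e.g. "1.1.1.1:53"
    if err != nil {
        return nil, err
    }
    return &dns.Conn{Conn: raw}, nil // miekg/dns just sees a net.Conn
}
```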
Benefits over building DNS packets manually:
- Correct message construction (no off-by-one in the 2-byte TCP length prefix)
- Proper FQDN handling (trailing dot normalization)
- Response parsing with type-safe access to answer records
- Future extensibility: DoH, DoT, EDNS0, DNSSEC validation are all supported
## The Protocol: Newline-Delimited Streaming
The communication between Python and Go uses the simplest possible wire format: newline-delimited text.
**Phase 1 -- Configuration** (one JSON object, first line):
```json
{"proxies":["socks5h://..."],"resolver":"1.1.1.1","timeout_ms":2000,"workers":500}
```
Python sends the full config as a single JSON line. Go parses it, initializes workers, connects to proxies, and starts listening for domains.
**Phase 2 -- Domain streaming** (one domain per line):
```
aaaa.com
aaab.com
aaac.com
```
Bare domain names, no JSON wrapping. Closing stdin signals end-of-input.
**Phase 3 -- Results** (JSONL, unordered):
```json
{"d":"aaab.com","s":"taken","r":0,"ms":45,"re":true}
{"d":"aaac.com","s":"nxdomain","r":3,"ms":62,"re":false}
{"d":"aaaa.com","s":"taken","r":0,"ms":38,"re":true}
```
Short field names minimize JSON overhead: `d` for domain, `s` for status, `r` for RCODE, `ms` for round-trip time, `re` for whether the domain resolved. Results arrive unordered -- whichever worker finishes first writes first.
The protocol is deliberately simple. No request IDs, no correlation, no framing beyond newlines. Python doesn't need to match responses to requests because it processes results as a stream. The two-phase pipeline (DNS scan, then WHOIS/RDAP on the NXDOMAIN subset) doesn't require per-domain tracking.
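On the Go side the whole protocol is a scanner loop. A minimal sketch (the `Config` struct, `domains` channel, and `startWorkers` helper are assumptions standing in for the architecture above):

```go
scanner := bufio.NewScanner(os.Stdin)

// Phase 1: the first line is the JSON config.
if !scanner.Scan() {
    log.Fatal("missing config line")
}
var cfg Config
if err := json.Unmarshal(scanner.Bytes(), &cfg); err != nil {
    log.Fatalf("bad config: %v", err)
}
startWorkers(cfg, domains, results)

// Phase 2: every following line is a bare domain.
for scanner.Scan() {
    domains <- scanner.Text()
}
close(domains) // stdin closed: workers drain and exit
```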
## The Evolution: Bash to Python to Go
The project went through three generations in about ten days:
**Day 1: Bash script.** A `whois` command in a loop with `sleep 0.3`. Sequential, no proxies, no structured storage. Roughly 3 queries/second when you account for WHOIS server latency.
**Day 1 (later): Python monolith.** A 774-line single-file rewrite with SQLite storage, proxy support, and parallel connections. This got the architecture right conceptually -- proxy rotation, structured results, deduplication -- but hit a ceiling around 20-30 queries/second for the RDAP/WHOIS path.
**Day 8: Go sidecar (v1).** Added a Go process for DNS resolution to bypass Python's I/O limitations. The RPC bridge pattern got throughput to 160/second -- better, but the serialization pipeline left 95% of capacity unused.
**Day 10: Go scanner (v2).** The architecture described in this article. Bulk streaming, worker-per-goroutine, no shared state. Throughput past 4000 queries/second.
The progression illustrates a pattern: the right language for the hot path matters less than the right architecture for the hot path. The v1 Go sidecar was Go code running at Python speeds because the bottleneck was the interface between them. The v2 architecture got fast by moving the entire hot path -- proxy selection, connection management, query dispatch, result collection -- into a single process with no cross-boundary serialization per query.
## What Made the Difference
Three changes account for nearly all the throughput improvement:
**No per-query serialization across the process boundary.** v1 serialized a JSON request and response for every single query. v2 sends bare domain names in and compact results out. The protocol overhead per query dropped from ~500 bytes of JSON round-trip to ~20 bytes in and ~60 bytes out.
**No shared mutable state in the query path.** v1 had five lock acquisition points per query. v2 has zero. Each worker is an independent goroutine with its own connection, its own proxy, and its own DNS client. The only synchronization is channel sends, which are lock-free at the goroutine level.
**Bulk proxy assignment instead of per-query rotation.** v1 called into a proxy manager for every query, acquiring a lock and running selection logic. v2 assigns proxies to workers once at startup. Worker `i` uses proxy `i % M` for its entire lifetime. No rotation, no scoring, no per-query decision.
The underlying principle is the same one massdns demonstrates at the extreme end: DNS queries are tiny and fast. The work of sending a 40-byte packet and reading a 100-byte response takes microseconds. Anything you do *around* that work -- locking, serializing, routing, selecting -- easily becomes the bottleneck. The architecture that wins is the one that does the least work per query in the hot path.
## Sources
- [massdns — High-performance DNS stub resolver](https://github.com/blechschmidt/massdns)
- [zdns — Fast CLI DNS lookup tool](https://github.com/zmap/zdns)
- [miekg/dns — DNS library in Go](https://github.com/miekg/dns)
---
# Multi-Agent Coordination Without an LLM
URL: https://krowdev.com/note/multi-agent-coordination-without-llm/
Kind: note | Maturity: budding | Origin: ai-drafted
Author: Agent | Directed by: krow
Tags: agentic-coding, architecture
> A deterministic coordinator for parallel AI agents — goals, budgets, feedback loops, and redirect signals without LLM judgment in the control plane.
## Agent Context
- Canonical: https://krowdev.com/note/multi-agent-coordination-without-llm/
- Markdown: https://krowdev.com/note/multi-agent-coordination-without-llm.md
- Full corpus: https://krowdev.com/llms-full.txt
- Kind: note
- Maturity: budding
- Confidence: medium
- Origin: ai-drafted
- Author: Agent
- Directed by: krow
- Published: 2026-03-29
- Modified: 2026-04-21
- Words: 1377 (7 min read)
- Tags: agentic-coding, architecture
- Related: parallel-ai-research-pipelines, agentic-coding-getting-started
- Content map:
- h2: The Problem with LLM Coordinators
- h2: The Pattern: Deterministic Coordinator
- h2: Goal Lifecycle
- h2: The Feedback Loop
- h3: Worker-level feedback
- h3: Goal-level feedback
- h2: Redirect Signals
- h2: Why This Works
- h2: The Architecture in Summary
- h2: Sources
- Diagrams: Mermaid fences are paired with adjacent ASCII companions in this document (1 Mermaid, 1 ASCII); HTML figures expose rendered SVG plus copyable Mermaid/ASCII source tabs.
- Crawl policy: same canonical content is exposed through HTML, Markdown, and llms-full; no crawler-specific content gate.
You have three AI agents [running in parallel](/article/parallel-ai-research-pipelines/), each generating candidates for the same goal. They need to know what's already been tried, when to change strategy, and when to stop. The obvious move: put an LLM in the middle to coordinate. Read each agent's output, decide who should pivot, tell them what to do next.
This is the wrong move.
## The Problem with LLM Coordinators
An LLM coordinator introduces three failure modes:
**Subjective stopping.** An LLM reads an agent's output and decides "this direction looks exhausted." But the LLM doesn't have ground truth — it's guessing based on vibes. An agent that found nothing in 50 tries might find gold on try 51 if the search space is large enough. Only objective metrics (hit rate, budget remaining, target reached) should trigger stops.
**State drift.** The coordinator needs to track what's been submitted, what's been checked, what's duplicated. An LLM tracking this in its context window will lose items, double-count, and hallucinate state. Context windows are not databases.
**Latency and cost.** Every coordination decision requires an LLM call. If you have five agents each checking in every 30 seconds, that's 10 coordinator calls per minute — each burning tokens to re-read state that a SQLite query could answer in microseconds.
## The Pattern: Deterministic Coordinator
Separate the creative work from the coordination work. Agents (LLMs) do the creative part — generating candidates, exploring strategies, adapting to feedback. The coordinator is a plain program — no LLM, no inference, no judgment calls. It owns:
- **Goals and stop conditions.** Each goal has a target (e.g., "find 50 results meeting criteria X") and objective completion rules.
- **Worker registration and budgets.** Each agent gets a workspace, a strategy assignment, and a budget (how many items to process before stopping).
- **Candidate dedup.** A global set of everything already submitted. No agent wastes effort on items another agent already tried.
- **Result recording.** Every submission is tracked — accepted, rejected, duplicate, error. The coordinator is the single source of truth.
- **Feedback generation.** Deterministic signals derived from observed data, not LLM interpretation.
The coordinator is a CLI backed by a local database. Agents interact with it through commands, not conversation.
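A sketch of the submit-and-check core in Go over SQLite (schema and names are hypothetical; any `database/sql` driver works). The global dedup set is a UNIQUE constraint, so "check and record" is a single atomic statement:

```go
// submit returns false when another worker already tried this candidate.
// UNIQUE(goal_id, candidate) on the table *is* the global checked set.
func submit(db *sql.DB, goalID int64, worker, candidate string) (bool, error) {
    res, err := db.Exec(
        `INSERT OR IGNORE INTO checked (goal_id, worker, candidate) VALUES (?, ?, ?)`,
        goalID, worker, candidate,
    )
    if err != nil {
        return false, err
    }
    n, err := res.RowsAffected()
    return n == 1, err // 0 rows affected = duplicate
}
```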
## Goal Lifecycle
The lifecycle has five steps:
**1. Create a goal** with constraints — topic, strategy hints, target count, quality thresholds. The goal defines what "done" looks like in measurable terms.
**2. Register workers.** Each agent gets an ID, a workspace directory, a strategy assignment, and per-worker limits. Strategies should be disjoint — if one agent is exploring short names and another is exploring compound words, their search spaces overlap minimally.
**3. Submit and check.** Agents generate candidates and submit them to the coordinator. The coordinator deduplicates against the global checked set, processes accepted candidates, and records results — all in one atomic operation.
**4. Read feedback.** After each submission round, agents read their worker feedback and the goal-level feedback. This is where they learn what's working and what isn't.
**5. Stop on objective conditions.** The goal is complete when the target count is reached, the budget is exhausted, or the operator manually stops it. Not when an agent "feels done."
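Step 5 is worth making concrete: completion reduces to a pure function over coordinator state. A sketch with illustrative field names:

```python
from dataclasses import dataclass

@dataclass
class GoalState:
    hits: int       # accepted results so far
    target: int     # e.g., "find 50 results meeting criteria X"
    spent: int      # items processed across all workers
    budget: int     # total items the goal may consume
    stopped: bool   # operator kill switch

def goal_complete(g: GoalState) -> bool:
    """Objective stop conditions only. No 'feels done'."""
    return g.stopped or g.hits >= g.target or g.spent >= g.budget
```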
## The Feedback Loop
This is what makes the pattern work. Feedback is deterministic — computed from observed data, not generated by an LLM reading summaries.
### Worker-level feedback
Each agent gets a report specific to its own performance:
| Signal | What it tells the agent |
|--------|------------------------|
| Budget remaining | How many more items it can process |
| Target remaining | How many more hits the worker needs |
| Duplicate rate | How often it's submitting items another agent already tried |
| Hit rate | What fraction of its submissions are succeeding |
| Recent successes | Its last accepted results (reinforcement) |
High duplicate rate means the agent's strategy is converging with another agent's. Time to diversify. Low hit rate means the current approach isn't working — but that's the agent's problem to solve creatively, not the coordinator's.
### Goal-level feedback
A broader view across all workers:
| Signal | What it tells the agent |
|--------|------------------------|
| Total progress | Checked count, target remaining, queue depth |
| Goal state | `continue` or `complete` |
| Global hit rate | How productive the entire team is |
| Per-strategy performance | Which strategies are producing results |
| Duplicate pressure | How much redundant work is happening across all agents |
Per-strategy performance is powerful. If strategy A has a 15% hit rate and strategy B has 2%, agents assigned to B can see this and pivot — without being told to by a coordinator. The data speaks for itself.
## Redirect Signals
The coordinator emits redirect messages — but they're deterministic observations, not instructions.
```
redirect: "hit rate below threshold — consider narrowing constraints"
redirect: "high duplicate pressure from strategy X — try a different direction"
redirect: "shortest successful results are 5-6 characters — prioritize that range"
```
These are generated by rules: if hit rate drops below a configured threshold, emit the message. If duplicate submissions from one strategy exceed a percentage, emit the message. No LLM interprets anything. The coordinator just reports what the numbers say.
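The whole rule engine can be a few comparisons over observed rates. A sketch (thresholds are illustrative defaults, not canonical values):

```python
def redirect_signals(hit_rate: float, dup_rate: float,
                     min_hit_rate: float = 0.05,
                     max_dup_rate: float = 0.30) -> list[str]:
    """Deterministic observations derived from counts, never from judgment."""
    messages = []
    if hit_rate < min_hit_rate:
        messages.append("hit rate below threshold — consider narrowing constraints")
    if dup_rate > max_dup_rate:
        messages.append("high duplicate pressure — try a different direction")
    return messages
```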
The critical design choice: **redirect messages are hints, not commands.** They don't grant permission to stop. An agent reads "hit rate below threshold" and might decide to change its approach — but it keeps going until an objective stop condition is met (budget exhausted, target reached, goal complete).
This prevents the biggest failure mode of LLM coordination: an agent that gives up too early because the coordinator (or the agent itself) decided the situation "looks hopeless." In large search spaces, persistence past apparent exhaustion is often where the best results come from.
## Why This Works
The pattern works because it separates two fundamentally different kinds of work:
**Creative work** (what LLMs are good at): generating novel candidates, adapting strategies, exploring unexpected directions, interpreting qualitative feedback. This is the part where [agentic coding](/guide/agentic-coding-getting-started/) shines — letting the LLM do what it's best at.
**Bookkeeping** (what databases are good at): tracking what's been tried, computing hit rates, enforcing budgets, detecting duplicates, determining if a goal is complete.
Putting an LLM in the bookkeeping role wastes its strengths and amplifies its weaknesses. An LLM coordinator is slower, less accurate, more expensive, and less reliable than a deterministic program doing the same job.
The database is the source of truth. The feedback loop is the communication channel. The agents are creative workers who read objective data and make their own decisions about how to proceed.
## The Architecture in Summary
```mermaid
graph TD
subgraph Agents["LLMs — creative work"]
A1["Agent 1<br/>Strategy A"]
A2["Agent 2<br/>Strategy B"]
A3["Agent 3<br/>Strategy C"]
end
Coord["<b>Coordinator</b> (deterministic CLI) — no LLM<br/>• Dedup against global checked set<br/>• Record results<br/>• Compute feedback (hit rate, budgets)<br/>• Emit redirect signals (rule-based)<br/>• Evaluate stop conditions"]
DB[("<b>Local DB (SQLite)</b> — source of truth<br/>Goals, workers, budgets<br/>Candidate pool (deduped)<br/>Results, events, strategies")]
A1 -->|submit| Coord
A2 -->|submit| Coord
A3 -->|submit| Coord
Coord <--> DB
Coord -. feedback .-> A1
Coord -. feedback .-> A2
Coord -. feedback .-> A3
```
```ascii
┌─────────────────────────────────────────────┐
│ Agent 1 Agent 2 Agent 3 │ ← LLMs (creative work)
│ Strategy A Strategy B Strategy C│
└─────┬──────────────┬──────────────┬─────────┘
│ submit │ submit │ submit
▼ ▼ ▼
┌─────────────────────────────────────────────┐
│ Coordinator (deterministic CLI) │ ← No LLM
│ • Dedup against global checked set │
│ • Record results │
│ • Compute feedback (hit rate, budgets) │
│ • Emit redirect signals (rule-based) │
│ • Evaluate stop conditions │
├─────────────────────────────────────────────┤
│ Local Database (SQLite) │ ← Source of truth
│ • Goals, workers, budgets │
│ • Candidate pool (deduped) │
│ • Results, events, strategies │
└─────────────────────────────────────────────┘
```
Agents submit candidates, read feedback, adapt. The coordinator tracks everything, computes signals, decides nothing. Creative decisions stay with the LLMs. State management stays with the database.
No LLM judgment in the control plane. Just data in, signals out.
For the repo-level version of the same discipline, [Writing an Effective CLAUDE.md](/guide/claude-md-patterns/) covers how to encode boundaries before agents start stepping on each other.
## Sources
- Anthropic, [Subagents](https://docs.anthropic.com/en/docs/claude-code/sub-agents)
- Anthropic, [Common workflows](https://docs.anthropic.com/en/docs/claude-code/tutorials)
- Git, [git-worktree documentation](https://git-scm.com/docs/git-worktree)
---
# TLS Fingerprinting with curl_cffi
URL: https://krowdev.com/note/tls-fingerprinting-curl-cffi/
Kind: note | Maturity: budding | Origin: ai-drafted
Author: Agent | Directed by: krow
Tags: python, security, fingerprinting
> How curl_cffi impersonates browser TLS and HTTP/2 fingerprints in Python — what it handles automatically and the one header you still need to set.
## Agent Context
- Canonical: https://krowdev.com/note/tls-fingerprinting-curl-cffi/
- Markdown: https://krowdev.com/note/tls-fingerprinting-curl-cffi.md
- Full corpus: https://krowdev.com/llms-full.txt
- Kind: note
- Maturity: budding
- Confidence: medium
- Origin: ai-drafted
- Author: Agent
- Directed by: krow
- Published: 2026-03-29
- Modified: 2026-04-21
- Words: 1519 (7 min read)
- Tags: python, security, fingerprinting
- Prerequisites: bot-detection-2026
- Related: bot-detection-2026
- Content map:
- h2: The Problem: Python's TLS Signature
- h2: How curl_cffi Works
- h2: Supported Browser Targets
- h2: Live Capture Results
- h3: JA3 and JA4 Fingerprints
- h3: HTTP/2 Fingerprints (Akamai Format)
- h3: TLS Cipher Suite Counts
- h2: What curl_cffi Handles Automatically
- h2: What You Must Set Yourself
- h2: Advanced: Customizing the Fingerprint
- h2: Known Limitations
- h2: Sources
- Crawl policy: same canonical content is exposed through HTML, Markdown, and llms-full; no crawler-specific content gate.
:::note
This content is for **security research, testing your own infrastructure, and understanding detection systems**. Bypassing bot detection on sites you do not own or operate may violate their terms of service. Always obtain proper authorization before testing against third-party systems.
:::
## The Problem: Python's TLS Signature
Every TLS connection starts with a ClientHello message. The cipher suites, extensions, extension order, supported groups, and signature algorithms in that message form a fingerprint. Different TLS libraries produce different fingerprints.
Python's `requests` library uses urllib3 backed by OpenSSL. Its JA3 hash — `8d9f7747675e24454cd9b7ed35c58707` — is one of the most widely recognized automation fingerprints on the internet. [Anti-bot systems](/article/bot-detection-2026/) at Cloudflare, Akamai, and DataDome check JA3/JA4 fingerprints before a single byte of your request body is read. If your TLS handshake says "I'm a Python script," no amount of header spoofing will help.
The same applies to Go's `net/http`, Node's `https`, and default curl. Each has a known fingerprint that matches zero real browsers.
## How curl_cffi Works
[curl_cffi](https://github.com/lexiforest/curl_cffi) wraps `curl-impersonate`, which links against **BoringSSL** — Google's OpenSSL fork and the same TLS library Chrome actually uses. This isn't recording your local Chrome. It replays pre-captured browser sessions:
1. **Pre-captured TLS profiles**: Each browser target has stored ClientHello parameters (cipher suites, extensions, extension order, supported groups, signature algorithms) captured from real browser sessions. These are replayed byte-for-byte.
2. **HTTP/2 SETTINGS**: Each profile stores the exact SETTINGS frame, WINDOW_UPDATE, and pseudo-header order matching the target browser. Default curl sends pseudo-headers in `mpsa` order, which matches no browser at all. curl_cffi sends Chrome's `masp`, Firefox's `mpas`, or Safari's `mspa`.
3. **Default headers**: With `default_headers=True` (the default), curl_cffi auto-generates correct `Sec-Ch-Ua`, `User-Agent`, `Sec-Fetch-*`, `Accept`, and other headers — all matched to the impersonated version, in the correct order.
The result: ~99.8% JA3 match rates against real Chrome.
```python
from curl_cffi import requests
session = requests.Session(impersonate="chrome136")
response = session.get("https://example.com")
```
That single `impersonate` parameter handles TLS fingerprint, HTTP/2 settings, pseudo-header order, header values, and header ordering.
## Supported Browser Targets
As of curl_cffi 0.14 (stable):
| Browser | Targets | Count |
|---------|---------|-------|
| Chrome Desktop | chrome99 through chrome142 | 15 |
| Chrome Android | chrome99_android, chrome131_android | 2 |
| Safari Desktop | safari153 through safari260 | 6 |
| Safari iOS | safari172_ios through safari260_ios | 4 |
| Firefox | firefox133 through firefox144 | 4 |
| Edge | edge99, edge101 | 2 |
| Tor | tor145 | 1 |
Generic aliases (`chrome`, `safari`, `firefox`) always point to the latest target.
**Version gaps are intentional.** Browser versions are only added when their fingerprints actually change. There's no chrome102 because Chrome 102's fingerprint was identical to Chrome 101's.
A few caveats worth knowing:
- **Chrome targets are the most accurate.** BoringSSL *is* Chrome's TLS library, so the replay is authentic.
- **Firefox targets are approximations.** Real Firefox uses NSS (Mozilla's TLS library), not BoringSSL. curl_cffi gets the JA3 hash right but can't replicate NSS-specific extensions like `delegated_credentials` (ext 34) or `record_size_limit` (ext 28). Also, the firefox144 target has a known bug: it reports `rv:135.0` in the User-Agent while claiming Firefox 144. A consistency check catches this.
- **Edge targets are obsolete.** Only edge99 and edge101 exist. Since Edge is Chromium-based, use a recent Chrome target instead.
## Live Capture Results
Empirical captures against a TLS fingerprint service, using curl_cffi 0.14:
### JA3 and JA4 Fingerprints
Chrome 110+ randomizes TLS extension order, so JA3 changes on every connection. JA4 sorts before hashing, producing stable fingerprints.
| Target | JA3 Hash | JA4 |
|--------|----------|-----|
| chrome120 | `9cc9e346...` | `t13d1516h2_8daaf6152771_02713d6af862` |
| chrome124 | `351d0eae...` | `t13d1516h2_8daaf6152771_02713d6af862` |
| chrome131 | `cdbf6205...` | `t13d1516h2_8daaf6152771_02713d6af862` |
| chrome133a | `a6d135b0...` | `t13d1516h2_8daaf6152771_d8a2da3f94cd` |
| chrome136 | `2d04cd75...` | `t13d1516h2_8daaf6152771_d8a2da3f94cd` |
| chrome142 | `5da544c8...` | `t13d1516h2_8daaf6152771_d8a2da3f94cd` |
JA4 part C changed between chrome131 and chrome133a — Chrome updated its signature algorithms. This means Chrome 133+ has a different JA4 than Chrome 120-131. Both are valid; they represent different real Chrome versions.
The `t13d1516h2` prefix decodes as: TLS 1.3, 15 cipher suites, 16 extensions, HTTP/2 ALPN.
### HTTP/2 Fingerprints (Akamai Format)
All Chrome targets produce the same HTTP/2 fingerprint:
```
1:65536;2:0;4:6291456;6:262144|15663105|0|m,a,s,p
```
Browsers are completely distinct at this layer:
| Browser | Akamai HTTP/2 Fingerprint | Pseudo-Header Order |
|---------|--------------------------|---------------------|
| Chrome | `1:65536;2:0;4:6291456;6:262144\|15663105\|0\|m,a,s,p` | `:method` `:authority` `:scheme` `:path` |
| Firefox | `1:65536;2:0;4:131072;5:16384\|12517377\|0\|m,p,a,s` | `:method` `:path` `:authority` `:scheme` |
| Safari | `2:0;3:100;4:2097152;9:1\|10420225\|0\|m,s,a,p` | `:method` `:scheme` `:authority` `:path` |
Chrome's INITIAL_WINDOW_SIZE is 6,291,456. Firefox's is 131,072 — 48x smaller. Safari uses entirely different SETTINGS IDs. These differences alone are enough to identify which browser (or non-browser) is connecting — one of the [many layers bot detection stacks](/article/bot-detection-2026/) inspect before your request reaches the origin server.
### TLS Cipher Suite Counts
| Browser | Cipher Suites | Extensions |
|---------|--------------|------------|
| Chrome | 16 | 18 (15 + 3 GREASE) |
| Firefox | 17 | 16-17 |
| Safari | 20 | 14 |
## What curl_cffi Handles Automatically
With a Chrome impersonation target and `default_headers=True`, curl_cffi sets all of the following correctly:
- **User-Agent** — matched to the impersonated Chrome version
- **Sec-Ch-Ua** — correct brand list with version-appropriate GREASE (e.g., `"Not.A/Brand";v="99"` for Chrome 136)
- **Sec-Ch-Ua-Mobile** — `?0` for desktop targets
- **Sec-Ch-Ua-Platform** — `"macOS"` (all targets default to macOS)
- **Sec-Fetch-Site**, **Sec-Fetch-Mode**, **Sec-Fetch-User**, **Sec-Fetch-Dest** — correct values for a navigation request
- **Accept** — browser-appropriate value including `image/avif,image/webp` for Chrome
- **Accept-Encoding** — `gzip, deflate, br, zstd`
- **Priority** — the HTTP priority header Chrome sends
- **Header ordering** — all headers in Chrome's characteristic sequence
**Do not override these headers.** Setting them manually risks getting the values wrong, the order wrong, or both. curl_cffi's auto-generated values are captured from real browsers. Your manual overrides almost certainly aren't.
A common mistake is using header-generation libraries that set `Sec-Fetch-*` values. These libraries frequently produce wrong values (e.g., `sec-fetch-user: ?0` instead of `?1` for navigation requests). Let curl_cffi handle it.
## What You Must Set Yourself
One header: **`Accept-Language`**.
curl_cffi does not set `Accept-Language` by default. This header should be plausible for the geographic region of your IP address. A request from a German IP with `Accept-Language: en-US,en;q=0.9` is suspicious. A request from a German IP with `Accept-Language: de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7` is normal.
```python
from curl_cffi import requests
session = requests.Session(impersonate="chrome136")
response = session.get(
"https://example.com",
headers={"Accept-Language": "de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7"}
)
```
When you pass headers to a request, curl_cffi merges them with its auto-generated defaults. Your `Accept-Language` is inserted at the correct position in the header order. The auto-generated headers you didn't override remain untouched.
That's the complete list of manual overrides. Everything else — TLS fingerprint, HTTP/2 settings, pseudo-header order, Sec-Ch-Ua, Sec-Fetch-*, Accept, User-Agent — is handled by the impersonation target.
## Advanced: Customizing the Fingerprint
Most use cases need nothing beyond `impersonate="chrome136"`. But curl_cffi exposes three levels of customization for cases where you need to tweak the fingerprint while keeping a base profile:
**JA3 string override** — replace the TLS ClientHello parameters directly:
```python
session = requests.Session(
ja3="771,4865-4866-4867-49195-49196,0-11-10-35-16-5,29-23-24,0"
)
```
**Akamai HTTP/2 fingerprint override** — replace the HTTP/2 SETTINGS, WINDOW_UPDATE, and pseudo-header order:
```python
session = requests.Session(
akamai="1:65536;4:131072;5:16384|12517377|3:0:0:201|m,p,a,s"
)
```
**extra_fp dictionary** — fine-grained control over individual TLS and HTTP/2 parameters:
```python
extra_fp = {
"tls_signature_algorithms": [...],
"tls_grease": False,
"tls_permute_extensions": False,
"tls_cert_compression": "brotli",
"http2_stream_weight": 256,
}
session = requests.Session(impersonate="chrome136", extra_fp=extra_fp)
```
These levels compose. You can start with a Chrome profile and override specific TLS or HTTP/2 parameters without losing the rest of the profile's settings.
## Known Limitations
1. **All targets default to macOS User-Agents.** If you need Windows or Linux UAs, you must override both `User-Agent` and `Sec-Ch-Ua-Platform` together — they must agree. Since most proxy infrastructure runs Linux (TTL=64, matching macOS), sticking with the default macOS identity avoids TCP/IP layer inconsistencies.
2. **No JA4 customization.** curl_cffi lets you override JA3 strings and Akamai HTTP/2 fingerprints directly, but there's no JA4 override API. The Chrome profiles produce correct JA4 hashes because they replay real handshakes — you just can't tweak JA4 independently.
3. **No Encrypted Client Hello (ECH).** RFC 9849 was finalized in March 2026. Neither curl-impersonate nor curl_cffi support ECH yet. This doesn't affect fingerprinting today, but ECH adoption will eventually change how ClientHello fingerprinting works.
4. **Quarterly update lag.** Chrome ships new versions every 4 weeks. curl_cffi updates fingerprints roughly quarterly. There's always a window where the latest Chrome version isn't available as a target. Using the most recent available target (e.g., chrome142 when Chrome 145 is current) is fine — anti-bot systems expect a distribution of Chrome versions, not just the latest.
5. **Firefox accuracy.** BoringSSL approximations of NSS behavior have known gaps. If Firefox impersonation is critical, verify against a fingerprint service like tls.peet.ws before relying on it in production.
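A quick way to run that verification before production use (a sketch; the JSON shape of tls.peet.ws's `/api/all` endpoint is assumed here, so inspect the raw body if the keys differ):

```python
from curl_cffi import requests

# Compare what a fingerprint service reports for two impersonation targets.
for target in ("chrome136", "firefox144"):
    r = requests.get("https://tls.peet.ws/api/all", impersonate=target)
    data = r.json()
    print(target, data.get("tls", {}).get("ja4"))
```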
If you want the rest of the request path around this handshake, [DNS Resolution: The Full Picture](/guide/dns-resolution-full-picture/) is the companion walkthrough from resolver to HTTP request.
## Sources
- [curl_cffi documentation — Impersonation](https://curl-cffi.readthedocs.io/en/latest/impersonate/)
- [curl_cffi GitHub repository](https://github.com/lexiforest/curl_cffi)
- [curl-impersonate — A special build of curl that impersonates browsers](https://github.com/lwthiker/curl-impersonate)
- [RFC 9849 — TLS Encrypted Client Hello (ECH)](https://datatracker.ietf.org/doc/html/rfc9849)
---
# AIMD Rate Limiting for API Clients
URL: https://krowdev.com/note/aimd-rate-limiting/
Kind: note | Maturity: budding | Origin: ai-assisted
Author: Agent | Directed by: krow
Tags: networking, rate-limiting, algorithms
> TCP congestion control applied to API rate limiting — additive increase on success, multiplicative decrease on errors. Finds the limit automatically.
## Agent Context
- Canonical: https://krowdev.com/note/aimd-rate-limiting/
- Markdown: https://krowdev.com/note/aimd-rate-limiting.md
- Full corpus: https://krowdev.com/llms-full.txt
- Kind: note
- Maturity: budding
- Confidence: medium
- Origin: ai-assisted
- Author: Agent
- Directed by: krow
- Published: 2026-03-28
- Modified: 2026-04-21
- Words: 700 (4 min read)
- Tags: networking, rate-limiting, algorithms
- Related: go-dns-scanner-4000qps
- Content map:
- h2: The Problem
- h2: The Idea: Borrow from TCP
- h2: The Control Loop
- h2: Parameters
- h2: In Practice
- h2: Why Not Just Use a Fixed Rate?
- h2: Persistence Across Runs
- h2: Sources
- Crawl policy: same canonical content is exposed through HTML, Markdown, and llms-full; no crawler-specific content gate.
## The Problem
You're hitting an API with unknown rate limits. Too fast and you get [429s](/snippet/http-status-codes/) or bans. Too slow and you waste time. The rate limit isn't documented, varies by endpoint, and changes under load.
Fixed delays are wrong in both directions — too conservative when the API is healthy, too aggressive when it's stressed.
## The Idea: Borrow from TCP
TCP solved this in 1988. The network doesn't tell you its capacity — you probe for it. AIMD (Additive Increase / Multiplicative Decrease) is the control loop:
- **Additive increase**: when things are going well, speed up by a small fixed step
- **Multiplicative decrease**: when you hit an error, slow down by a multiplier
The asymmetry is the key insight. You probe upward cautiously (linear) but retreat quickly (exponential). This converges to the maximum safe rate without oscillating wildly.
## The Control Loop
```
every check_period (e.g., 5 seconds):
err_pct = recent_errors / recent_total
if err_pct > target_err_pct: # too fast
interval = interval * backoff_mul # multiplicative decrease
elif err_pct < recover_pct: # headroom available
interval = interval - speedup_step # additive increase (shorter = faster)
interval = clamp(interval, min_interval, max_interval)
```
That's it. The interval between requests goes up (slower) when errors spike, and ticks down (faster) when the path is clear.
## Parameters
| Parameter | What it does | Example |
|-----------|-------------|---------|
| `check_period` | How often the controller evaluates | 5s |
| `target_err_pct` | Error rate that triggers slowdown | 10% |
| `recover_pct` | Error rate below which speedup begins | 5% |
| `backoff_mul` | How much to slow down (multiplicative) | 1.5x |
| `speedup_step` | How much to speed up (additive) | 200ms |
| `seed_interval` | Starting point | 3000ms |
| `min_interval` | Speed-up floor | 1500ms |
| `max_interval` | Slow-down ceiling | 6000ms |
The dead zone between `recover_pct` and `target_err_pct` is intentional — if error rate is between 5% and 10%, hold steady. This prevents jitter.
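A runnable sketch of the controller, with names matching the table (minimal on purpose; a production limiter would track a sliding window rather than resetting counters each period):

```python
from dataclasses import dataclass

@dataclass
class AIMDController:
    target_err_pct: float = 0.10  # slow down above this error rate
    recover_pct: float = 0.05     # speed up below this error rate
    backoff_mul: float = 1.5      # multiplicative decrease (interval grows)
    speedup_step: int = 200       # additive increase, in ms (interval shrinks)
    min_interval: int = 1500      # ms, speed-up floor
    max_interval: int = 6000      # ms, slow-down ceiling
    interval: int = 3000          # ms, seed_interval
    errors: int = 0
    total: int = 0

    def record(self, success: bool) -> None:
        self.total += 1
        self.errors += 0 if success else 1

    def tick(self) -> int:
        """Call once per check_period. Returns the new interval in ms."""
        if self.total:
            err_pct = self.errors / self.total
            if err_pct > self.target_err_pct:
                self.interval = int(self.interval * self.backoff_mul)
            elif err_pct < self.recover_pct:
                self.interval -= self.speedup_step
        self.interval = max(self.min_interval, min(self.interval, self.max_interval))
        self.errors = self.total = 0  # fresh window for the next period
        return self.interval
```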
## In Practice
A real implementation from the [Go DNS scanner](/article/go-dns-scanner-4000qps/) managing a pool of upstream connections:
```go
// Per-connection health tracking
const (
speedUpAfter = 20 // consecutive successes to try faster
slowDownFactor = 0.80 // interval *= 1/0.80 = 1.25x slower
speedUpFactor = 1.05 // interval *= 1/1.05 = ~5% faster
minInterval = 1500 * time.Millisecond
maxInterval = 6 * time.Second
)
func updateHealth(node *connNode, success bool, err string) {
if success {
node.consecSuccesses++
if node.consecSuccesses >= speedUpAfter {
cur := node.limiter.Interval()
next := time.Duration(float64(cur) / speedUpFactor)
next = max(next, minInterval)
node.limiter.SetInterval(next)
node.consecSuccesses = 0
}
	} else if err == "rate_limited" {
		node.consecSuccesses = 0 // a failure resets the consecutive-success streak
cur := node.limiter.Interval()
next := time.Duration(float64(cur) / slowDownFactor)
next = min(next, maxInterval)
node.limiter.SetInterval(next)
}
}
```
Each connection in the pool has its own rate limiter and health state. 20 consecutive successes triggers a ~5% speedup. A rate-limit error triggers a 25% slowdown. The interval is clamped between 1.5s and 6s.
The result: the system finds the maximum safe rate per connection and holds there, automatically adapting if conditions change.
## Why Not Just Use a Fixed Rate?
- Different endpoints have different throughput limits
- The server's rate limit may change under load
- After an outage, you want to ramp back up gradually — not hammer at full speed
- New connections need to discover their limits
AIMD handles all of these automatically. The physicist's version: it's a first-order feedback loop with asymmetric gain. Fast negative feedback prevents damage; slow positive feedback finds the equilibrium.
## Persistence Across Runs
The interval converges over the first few hundred queries. If you restart, you lose that convergence and burst through rate limits while re-learning.
Solution: persist the current interval to disk between runs.
```go
// Save at end of run
os.WriteFile("rate_interval_state", []byte(strconv.Itoa(ms)), 0644)
// Restore at start
data, _ := os.ReadFile("rate_interval_state")
ms, _ := strconv.Atoi(string(data))
```
This is the warm-start trick — the next run begins where the last one left off instead of probing from scratch.
For the concurrency side of the same problem, pair this with [worker pool isolation](/snippet/worker-pool-isolation/) and [pipeline stage communication](/snippet/pipeline-stage-communication/).
## Sources
- IETF, [RFC 5681: TCP Congestion Control](https://datatracker.ietf.org/doc/html/rfc5681)
- MDN, [429 Too Many Requests](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/429)
---
# How Websites Detect Bots in 2026 — JA4 / JA4T Fingerprints, TLS, and HTTP/2
URL: https://krowdev.com/article/bot-detection-2026/
Kind: article | Maturity: evergreen | Origin: ai-drafted
Author: Agent | Directed by: krow
Tags: security, networking, fingerprinting, anti-detection, ja4, tls
Series: domain-infrastructure (#3)
> Cloudflare, Akamai, and DataDome bot detection explained: JA4 TLS fingerprints (t13d1516h2_…), JA4T TCP, HTTP/2 SETTINGS, and HTTP/2 frame order. Based on empirical packet captures.
## Agent Context
- Canonical: https://krowdev.com/article/bot-detection-2026/
- Markdown: https://krowdev.com/article/bot-detection-2026.md
- Full corpus: https://krowdev.com/llms-full.txt
- Kind: article
- Maturity: evergreen
- Confidence: high
- Origin: ai-drafted
- Author: Agent
- Directed by: krow
- Published: 2026-03-28
- Modified: 2026-05-10
- Words: 3063 (14 min read)
- Tags: security, networking, fingerprinting, anti-detection, ja4, tls
- Series: domain-infrastructure (#3)
- Related: dns-resolution-full-picture, whois-dead-long-live-rdap, aimd-rate-limiting, tls-fingerprinting-curl-cffi
- Content map:
- h2: JA4 / JA4T Fingerprint Quick Reference
- h2: The Detection Hierarchy
- h2: Layer 0: TCP/IP Fingerprinting
- h3: The proxy problem
- h2: Layer 1: TLS ClientHello
- h3: JA3: the original, now largely obsolete
- h3: JA4: the current standard
- h3: The JA4+ family
- h3: Browser TLS characteristics
- h2: Layer 2: HTTP/2 SETTINGS
- h3: Akamai fingerprint format
- h3: Pseudo-header order: the silent identifier
- h2: Layer 3: HTTP Headers
- h3: Chrome 136 header sequence
- h3: Firefox 144 header sequence
- h3: Safari 260 header sequence
- h3: Cross-header consistency
- h3: GREASE in Sec-Ch-Ua
- h2: The Players
- h3: Cloudflare Bot Management v3
- h3: Akamai Bot Manager
- h3: DataDome
- h2: What Changed in 2025-2026
- h2: What Actually Matters vs. What's Theater
- h3: What matters
- h3: What's theater (for non-JS requests)
- h2: Sources
- Crawl policy: same canonical content is exposed through HTML, Markdown, and llms-full; no crawler-specific content gate.
## JA4 / JA4T Fingerprint Quick Reference
If you landed here searching for a specific fingerprint string, here's the lookup table. Full breakdown in [§ JA4: the current standard](#ja4-the-current-standard) below.
| Fingerprint | Decodes to | What it identifies |
|---|---|---|
| `t13d1516h2` | **t13** = TLS 1.3, **d** = SNI present, **15** ciphers, **16** extensions, **h2** = HTTP/2 ALPN | The JA4 *part A* prefix shared by all modern Chrome / Edge / Brave on TLS 1.3. Safari uses `t13d1517h2`, Firefox `t13d1714h2`. |
| `t13d1516h2_8daaf6152771_02713d6af862` | Chrome 120 – 131 (full JA4) | Stable across version bumps because part B/C are SHA256 of **sorted** ciphers + extensions. |
| `t13d1516h2_8daaf6152771_d8a2da3f94cd` | Chrome 133 – 136+ | Part C changed only because Chrome updated its `signature_algorithms` list between 131 and 133. |
| `t13d1516h2_8daaf6152771` | Truncated form | The 12-character `_<hashB>` short form some logs and threat-intel feeds emit. Same browser family as the full hash. |
Tooling: [`curl_cffi`](/note/tls-fingerprinting-curl-cffi/) reproduces these from Python; native `curl` and Go's `net/http` cannot.
If you're looking for the inverse — *given a fingerprint, what's the browser?* — search [ja4db.com](https://ja4db.com/) or the [FoxIO-LLC/ja4](https://github.com/FoxIO-LLC/ja4) repo. This article explains the **why** behind the format so the table makes sense.
## The Detection Hierarchy
Bot detection is a layered system. Each layer fires at a different point in the connection lifecycle, and each one can reject you before the next layer even runs. Here's the order, from earliest to latest:
1. **TCP/IP fingerprint** — before encryption, before HTTP, before anything
2. **TLS ClientHello** — during the handshake, before any application data
3. **HTTP/2 SETTINGS** — the first application frame after TLS completes
4. **HTTP header order and values** — the actual request
5. **Client Hints coherence** — cross-header consistency checks
6. **IP reputation / ASN classification** — datacenter IP = suspicion
7. **Behavioral signals** — timing, navigation patterns, mouse movement
The critical insight: layers 1-4 are checked before a single byte of your "page content" loads. No JavaScript runs. No CAPTCHA renders. The server already knows if your connection looks like a browser or a script.
Modern anti-bot systems look for **cross-layer consistency** — what they call "stack drift." A perfect TLS fingerprint paired with wrong HTTP/2 settings is more suspicious than getting both slightly wrong. Every layer must tell the same story.
If you want the broader network path those layers sit on — resolver, recursive DNS, authoritative DNS, TLS, then HTTP — [DNS Resolution: The Full Picture](/guide/dns-resolution-full-picture/) is the right companion piece.
## Layer 0: TCP/IP Fingerprinting
The TCP SYN packet — the very first packet of any connection — reveals the operating system. This happens before encryption, before TLS, before HTTP. The server (or its CDN) sees raw TCP parameters that differ by OS:
| Parameter | Linux | Windows | macOS |
|-----------|-------|---------|-------|
| Initial TTL | 64 | 128 | 64 |
| TCP Window Size | 29,200 (kernel 3.x) / 64,240 (5.x+) | 65,535 | 65,535 |
| Window Scale | 7 | 8 | varies |
| TCP Options Order | MSS, SACK_PERM, TIMESTAMP, NOP, WSCALE | MSS, NOP, WSCALE, NOP, NOP, SACK_PERM (no TIMESTAMP) | MSS, NOP, WSCALE, NOP, NOP, TIMESTAMP, SACK_PERM |
Windows is the outlier: TTL of 128, no TIMESTAMP option. Linux and macOS share TTL 64 but differ in TCP options order. Tools like p0f and Zardaxt (used by DataDome in production) classify OS from these values.
The **JA4T** fingerprint formalizes this: `Window_Size, Options, MSS, Window_Scale` (initial TTL is read separately, p0f-style). It's compact enough to index and fast enough to check on every connection.
### The proxy problem
When traffic routes through a proxy (SOCKS5, CONNECT), the target server sees the **proxy's** TCP stack, not yours. If the proxy runs Linux (TTL=64, Linux TCP options) but your User-Agent claims Windows (TTL=128, Windows TCP options), that's a detectable mismatch.
In practice, most proxy servers run Linux. This means:
- **macOS User-Agents**: Safe. macOS and Linux both use TTL=64, so the TCP layer is consistent.
- **Windows User-Agents**: Risky. TTL=64 from the Linux proxy contradicts the expected TTL=128 from a Windows machine.
This is the kind of cross-layer inconsistency that modern systems catch — the TCP layer and the HTTP layer are telling different stories about the operating system.
## Layer 1: TLS ClientHello
The TLS handshake happens before any HTTP data crosses the wire. The ClientHello message contains a rich set of signals:
- **Cipher suites**: count, order, values (including GREASE tokens)
- **TLS extensions**: count, order, values (including BoringSSL-specific ones)
- **Supported groups** (elliptic curves)
- **Signature algorithms**
- **ALPN values** (h2, http/1.1)
- **Key share groups**
Each browser has a distinct combination. Chrome uses BoringSSL, Firefox uses NSS, Safari uses Apple's SecureTransport. The crypto libraries produce fundamentally different ClientHello messages — different cipher suites, different extension sets, different ordering.
### JA3: the original, now largely obsolete
JA3 hashes TLS version + cipher suites + extensions + elliptic curves + EC point formats into an MD5 fingerprint. It worked well until Chrome 110 (January 2023) introduced TLS extension order randomization — a deliberate anti-fingerprinting measure. Now every Chrome connection produces a different JA3 hash:
| Impersonation Target | JA3 Hash |
|---------------------|----------|
| Chrome 120 | `9cc9e346...` |
| Chrome 124 | `351d0eae...` |
| Chrome 131 | `cdbf6205...` |
| Chrome 133 | `a6d135b0...` |
| Chrome 136 | `2d04cd75...` |
Different hash every time, same browser. JA3 is still useful for detecting non-browser clients (Python `requests`, Go's `net/http`, raw curl) which don't randomize — but it's useless for distinguishing Chrome versions.
### JA4: the current standard
[JA4](https://github.com/FoxIO-LLC/ja4), adopted by Cloudflare, AWS WAF, VirusTotal, and Akamai as of 2026, fixes this with a three-part fingerprint: `a_b_c`.
- **Part A** (human-readable): protocol type, TLS version, SNI presence, cipher count, extension count, first ALPN
- **Part B**: SHA256 of **sorted** cipher suites — immune to randomization
- **Part C**: SHA256 of **sorted** extensions + signature algorithms
Sorting before hashing is the key innovation. Chrome can randomize extension order all it wants — the sorted hash is stable.
Empirical captures confirm this. All Chrome 120-131 targets produce the same JA4 parts A and B, with part C changing only when Chrome updated its signature algorithms between versions 131 and 133:
| Chrome Version Range | JA4 |
|---------------------|-----|
| 120 - 131 | `t13d1516h2_8daaf6152771_02713d6af862` |
| 133 - 136+ | `t13d1516h2_8daaf6152771_d8a2da3f94cd` |
The `t13d1516h2` prefix decodes to: TLS 1.3, 15 cipher suites (after deduplication/GREASE removal), 16 extensions, HTTP/2 ALPN. Cloudflare sees 15 million unique JA4 fingerprints daily across 500 million+ user agents. A Python script using the `requests` library has a JA4 that matches exactly zero of those 15 million real browser fingerprints.
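The sorted-hash idea is easy to demonstrate. A sketch of part B's construction (simplified: the real spec also strips GREASE values and fixes the hex formatting):

```python
import hashlib

def ja4_part_b(cipher_hex_codes: list[str]) -> str:
    """Sort the cipher list, hash it, keep the first 12 hex chars."""
    joined = ",".join(sorted(cipher_hex_codes))
    return hashlib.sha256(joined.encode()).hexdigest()[:12]

# The same cipher set in two different wire orders yields one fingerprint,
# which is why Chrome's per-connection shuffling doesn't move JA4.
a = ja4_part_b(["1301", "1302", "1303", "c02b"])
b = ja4_part_b(["c02b", "1303", "1301", "1302"])
assert a == b
```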
### The JA4+ family
JA4 spawned a family of fingerprints covering the full stack:
- **JA4S**: Server Hello fingerprint
- **JA4H**: HTTP client fingerprint (header names, values, cookies)
- **JA4X**: X.509 certificate fingerprint
- **JA4T**: TCP fingerprint (Layer 0 above)
- **JA4SSH**: SSH fingerprint
These are composable. A detection system can check JA4 (TLS) + JA4T (TCP) + JA4H (HTTP) for cross-layer consistency in a single lookup.
### Browser TLS characteristics
Each browser family has a distinct cipher suite profile:
| Browser | Cipher Suites | Extensions |
|---------|--------------|------------|
| Chrome | 16 | 18 (15 + 3 GREASE) |
| Firefox | 17 | 16-17 |
| Safari | 20 | 14 |
Safari has the most cipher suites but fewest extensions. Firefox sits in the middle. These counts alone narrow the field before you even look at values.
## Layer 2: HTTP/2 SETTINGS
Immediately after TLS, the HTTP/2 connection opens with a SETTINGS frame. Each browser sends different parameters — and this alone is enough to distinguish Chrome, Firefox, and Safari.
### Akamai fingerprint format
The industry-standard format is: `SETTINGS|WINDOW_UPDATE|PRIORITY|PSEUDO_HEADER_ORDER`
Empirical captures from each browser:
| Browser | Akamai HTTP/2 Fingerprint |
|---------|--------------------------|
| Chrome | `1:65536;2:0;4:6291456;6:262144\|15663105\|0\|m,a,s,p` |
| Firefox | `1:65536;2:0;4:131072;5:16384\|12517377\|0\|m,p,a,s` |
| Safari | `2:0;3:100;4:2097152;9:1\|10420225\|0\|m,s,a,p` |
These are completely distinct. Chrome uses INITIAL_WINDOW_SIZE of 6,291,456. Firefox uses 131,072 — 48x smaller. Safari uses entirely different SETTINGS IDs (3=MAX_CONCURRENT_STREAMS, 9=SETTINGS_ENABLE_CONNECT_PROTOCOL) that Chrome doesn't even send.
The WINDOW_UPDATE values differ too: Chrome sends 15,663,105; Firefox 12,517,377; Safari 10,420,225.
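The format is simple enough to parse mechanically. A sketch (field names are mine; the format is Akamai's):

```python
def parse_akamai_h2(fp: str) -> dict:
    """Split an Akamai-format HTTP/2 fingerprint:
    SETTINGS|WINDOW_UPDATE|PRIORITY|PSEUDO_HEADER_ORDER."""
    settings, window_update, priority, order = fp.split("|")
    return {
        "settings": {int(k): int(v) for k, v in
                     (pair.split(":") for pair in settings.split(";"))},
        "window_update": int(window_update),
        "priority": priority,  # "0" when the client sends no PRIORITY frames
        "pseudo_header_order": order.split(","),
    }

chrome = parse_akamai_h2("1:65536;2:0;4:6291456;6:262144|15663105|0|m,a,s,p")
assert chrome["settings"][4] == 6291456  # INITIAL_WINDOW_SIZE
assert chrome["pseudo_header_order"] == ["m", "a", "s", "p"]
```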
### Pseudo-header order: the silent identifier
HTTP/2 requires four pseudo-headers (`:method`, `:authority`, `:scheme`, `:path`) before any regular headers. The order is technically arbitrary, but each browser has a fixed convention:
| Browser | Pseudo-Header Order |
|---------|-------------------|
| Chrome | `:method`, `:authority`, `:scheme`, `:path` (`masp`) |
| Firefox | `:method`, `:path`, `:authority`, `:scheme` (`mpas`) |
| Safari | `:method`, `:scheme`, `:path`, `:authority` (`mspa`) |
| curl (default) | `:method`, `:path`, `:scheme`, `:authority` (`mpsa`) |
Note that default curl matches **no browser at all**. This single signal — four headers in the wrong order — is enough to flag a connection as automated. An HTTP client that gets TLS right but sends pseudo-headers in curl's default order is trivially detected.
This fingerprint is stable across versions. All Chrome targets from version 120 through 142 produce the identical HTTP/2 SETTINGS and pseudo-header order. The HTTP/2 implementation changes far less frequently than TLS parameters.
## Layer 3: HTTP Headers
Header order, presence, and values are all signals. Each browser sends headers in a fixed, characteristic sequence, and anti-bot systems compare the observed order against known-good patterns.
### Chrome 136 header sequence
```
:method, :authority, :scheme, :path
sec-ch-ua, sec-ch-ua-mobile, sec-ch-ua-platform
upgrade-insecure-requests, user-agent, accept
sec-fetch-site, sec-fetch-mode, sec-fetch-user, sec-fetch-dest
accept-encoding, accept-language, priority
```
### Firefox 144 header sequence
```
:method, :path, :authority, :scheme
user-agent
accept, accept-language, accept-encoding
upgrade-insecure-requests
sec-fetch-dest, sec-fetch-mode, sec-fetch-site, sec-fetch-user
priority, te: trailers
```
### Safari 260 header sequence
```
:method, :scheme, :authority, :path
sec-fetch-dest
user-agent, accept
sec-fetch-site, sec-fetch-mode
accept-language, priority, accept-encoding
```
The differences are striking:
- **Client Hints** (`sec-ch-ua`, `sec-ch-ua-mobile`, `sec-ch-ua-platform`): Chrome-only. Firefox and Safari never send them. If your request claims to be Firefox but includes `sec-ch-ua` headers, it's instantly flagged.
- **`te: trailers`**: Firefox-only. No other browser sends it.
- **`sec-fetch-dest` position**: Chrome sends it after `sec-fetch-mode`. Safari sends it first among regular headers. Firefox sends it first among the `sec-fetch` group.
- **`accept-encoding` position**: Chrome sends it near the end. Safari sends it last. Firefox sends it after `accept-language`.
- **`user-agent` position**: Chrome sends it in the middle (after `upgrade-insecure-requests`). Firefox sends it first among regular headers. Safari sends it after `sec-fetch-dest`.
### Cross-header consistency
Headers must agree with each other:
| Signal A | Must Match | Signal B |
|----------|-----------|----------|
| `sec-ch-ua-platform` | ↔ | User-Agent OS string |
| `sec-ch-ua` browser version | ↔ | TLS JA4 fingerprint |
| `accept-language` | ↔ | Proxy IP geolocation |
| HTTP/2 pseudo-header order | ↔ | TLS fingerprint (browser identity) |
| `sec-fetch-*` values | ↔ | Request context (navigation vs. API call) |
A request with `sec-ch-ua-platform: "Windows"` and `User-Agent: ...Macintosh; Intel Mac OS X...` is an instant fail. DataDome's own documentation states: "Using a Windows Chrome User Agent and a Linux platform header may result in blocking."
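The first row of that table is a one-line check. A sketch (the marker strings and helper name are hypothetical; real systems check many more pairs):

```python
# Map the Client Hints platform token to the substring a matching
# User-Agent must contain.
UA_OS_MARKERS = {'"Windows"': "Windows NT", '"macOS"': "Mac OS X", '"Linux"': "Linux"}

def platform_consistent(headers: dict[str, str]) -> bool:
    marker = UA_OS_MARKERS.get(headers.get("sec-ch-ua-platform", ""))
    return marker is not None and marker in headers.get("user-agent", "")

# The instant-fail example from above: Windows hint, macOS User-Agent.
assert not platform_consistent({
    "sec-ch-ua-platform": '"Windows"',
    "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) ...",
})
```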
### GREASE in Sec-Ch-Ua
Chrome rotates the "Not A Brand" GREASE string per version:
- Chrome 136: `"Not.A/Brand";v="99"`
- Chrome 138: `"Not)A;Brand";v="8"`
The GREASE brand in `sec-ch-ua` must match the Chrome version claimed by the TLS fingerprint. A stale GREASE string is a version mismatch signal.
## The Players
### Cloudflare Bot Management v3
Scale: 46 million HTTP requests per second across its network.
Cloudflare runs a multi-engine detection system:
- **ML model (v8)**: Three feature categories — global (inter-request aggregates), high-cardinality (per-IP patterns), and single-request signals. Claims 95% accuracy against distributed residential proxy attacks.
- **Heuristics engine**: 50+ rules built on HTTP/2 fingerprints and ClientHello extensions.
- **JS Detection (JSD)**: Identifies headless browsers via `navigator.webdriver`, missing APIs, and other DOM-level signals.
- **Per-customer ML** (introduced 2025): Custom models trained on each site's specific traffic baseline. What looks normal for a SaaS dashboard is anomalous for an e-commerce storefront.
Key technical detail: Cloudflare sees 15 million unique JA4 fingerprints daily. Their system correlates JA4 against User-Agent — 500 million+ user agent strings mapped to expected JA4 values. A mismatch between claimed browser and observed TLS behavior is one of their primary signals.
Cloudflare also detects Chrome DevTools Protocol (CDP) usage, which they estimate covers "99% of bots." CDP leaves detectable artifacts even when `navigator.webdriver` is patched out.
**Turnstile** (Cloudflare's CAPTCHA replacement): Independent testing found it catches only about 33% of bot traffic, compared to reCAPTCHA's 69%. For HTTP-level automation that doesn't execute JavaScript, Turnstile is irrelevant — it requires a browser context to even load.
### Akamai Bot Manager
Akamai's approach mirrors Cloudflare's in principle but differs in emphasis:
- JA3 fingerprints compared against a known-good database (JA4 adopted commercially in 2026)
- HTTP/2 fingerprinting with their own format (the `SETTINGS|WINDOW_UPDATE|PRIORITY|PSEUDO_HEADER_ORDER` format described above originated from Akamai's Black Hat EU 2017 research)
- IP reputation: datacenter ASNs (AWS, OVH, Hetzner) are immediately flagged
- Behavioral analysis: identical scrolling patterns, perfectly timed clicks, predictable navigation sequences
- Active challenges for browser authenticity confirmation
### DataDome
DataDome is the most aggressive of the three, analyzing 1,000+ signals on 100% of requests with sub-2ms response time (5 trillion signals per day):
**Server-side signals** (heavier weight in their scoring):
- Request header analysis (order matters)
- HTTP version detection
- TLS/JA3/JA4 fingerprinting
- IP reputation scoring
- **TCP/IP OS fingerprinting via Zardaxt** — they're one of the few vendors openly using Layer 0
**Client-side signals** (35+ behavioral):
- Mouse movement, scroll velocity, typing cadence, click coordinates
- GPU rendering capabilities, font availability, JS engine specifics
- Per-customer ML models (85,000+ customer-specific models as of 2025)
- LLM crawler traffic detection (added 2025)
A critical insight from DataDome's own research: server-side signals carry more weight than client-side JavaScript fingerprinting. They've found that "JS fingerprinting is prone to false positives and not as heavily weighted." For requests that never execute JavaScript, this means the TLS + HTTP/2 + header layers are what matter.
## What Changed in 2025-2026
The bot detection landscape shifted significantly:
**JA4 replaced JA3 as the industry standard.** Chrome's TLS extension randomization (since Chrome 110, January 2023) made JA3 unreliable for browser identification. JA4's sorted-before-hashing approach solved this. By 2026, Cloudflare, AWS WAF, VirusTotal, and Akamai all use JA4 as a primary signal.
**Detection moved upstream.** The trend is toward catching bots earlier in the connection lifecycle. TLS handshake checks happen before the page loads, before JavaScript runs, before any CAPTCHA renders. If your ClientHello looks wrong, the connection may be terminated or routed to a honeypot before HTTP even begins.
**Per-customer ML models arrived.** Cloudflare's per-customer models (2025) train on each site's specific traffic patterns. A request that looks normal globally can be anomalous for a specific site. This makes generic evasion harder — you need to look normal for the specific site you're accessing, not just for the internet in general.
**Residential proxy detection improved.** Cloudflare's v8 ML model claims per-request detection of residential proxy abuse without IP blocking. The signals include request timing, header patterns, and behavioral fingerprints that distinguish real residential users from proxy traffic, even when the IP itself is classified as residential.
**CDP detection became standard.** Chrome DevTools Protocol detection is now a primary signal. CDP leaves artifacts in the browser environment that persist even when common patches (like removing `navigator.webdriver`) are applied. Cloudflare estimates 99% of browser-based bots use CDP.
**Browser attestation appeared.** Google's browser attestation APIs allow servers to verify that the connecting client is an unmodified, vendor-signed browser binary. Modified Chromium builds fail integrity checks. This is currently limited in deployment but represents the direction: hardware-rooted trust for browser identity.
**Fingerprint inconsistency detection formalized.** An IMC 2025 paper introduced data-driven rules for detecting both spatial inconsistencies (cross-attribute contradictions in a single request) and temporal inconsistencies (attribute changes across requests from the same session). The approach reduced bot evasion success by 45-48%.
## What Actually Matters vs. What's Theater
For HTTP-level requests that don't execute JavaScript (API calls, data fetching, scraping), the detection stack collapses to a smaller set of signals that actually matter:
### What matters
1. **TLS fingerprint (JA4)**: The single most important signal. A Python `requests` library has a JA4 that matches zero real browsers. Using a TLS library that replays a real browser's ClientHello is table stakes — see [TLS Fingerprinting with curl_cffi](/note/tls-fingerprinting-curl-cffi/) for how this works in practice.
2. **HTTP/2 SETTINGS + pseudo-header order**: The second gate. Default curl sends pseudo-headers in `mpsa` order, matching no browser. Chrome uses `masp`, Firefox `mpas`, Safari `mspa`. Wrong SETTINGS values or wrong pseudo-header order flags the connection before the first header is read.
3. **Header order and presence**: Chrome, Firefox, and Safari each send headers in a fixed, characteristic sequence. Missing `sec-fetch-*` headers when claiming to be Chrome is an automation signal. Including `sec-ch-ua` when claiming to be Firefox is equally bad.
4. **Cross-layer consistency**: Every signal must agree. TLS says Chrome 136, headers must say Chrome 136, `sec-ch-ua-platform` must match the User-Agent OS, and `accept-language` should be plausible for the IP's geolocation.
5. **IP reputation**: Datacenter ASNs are flagged by default. Residential IPs get more trust but are increasingly fingerprinted themselves. When building high-throughput systems that hit these detection layers, [adaptive rate limiting](/note/aimd-rate-limiting/) becomes essential to avoid triggering behavioral signals.
### What's theater (for non-JS requests)
- **JavaScript fingerprinting**: Irrelevant if you never execute JS. Canvas fingerprinting, WebGL rendering, `navigator` property checks — none of these fire for a simple HTTP request.
- **Behavioral signals**: Mouse movement, scroll patterns, typing cadence — these require a browser context. For API-style requests, behavioral analysis is limited to request timing and navigation patterns.
- **CAPTCHAs and Turnstile**: These require a browser to render. They're a gate for browser traffic, not for HTTP clients.
The practical implication: for simple HTTP requests, the TLS + HTTP/2 + header stack is the entire battle. Get those three layers right and consistent, and most anti-bot systems will pass you through. Get any one of them wrong, and everything downstream is irrelevant — you're already flagged before your request body is read.
## Sources
- [JA4+ Network Fingerprinting](https://github.com/FoxIO-LLC/ja4) — JA4 TLS fingerprint specification and tooling (FoxIO)
- [Cloudflare Bot Management](https://developers.cloudflare.com/bots/) — Cloudflare's bot detection documentation
- [Akamai Bot Manager](https://www.akamai.com/products/bot-manager) — Akamai's bot classification system
- [DataDome Bot Protection](https://datadome.co/bot-management-protection/) — DataDome's detection methodology
- [p0f](https://lcamtuf.coredump.cx/p0f3/) — passive TCP/IP fingerprinting tool (Michal Zalewski)
- [HTTP/2 RFC 9113](https://datatracker.ietf.org/doc/html/rfc9113) — HTTP/2 specification including SETTINGS frames
- [Client Hints Infrastructure](https://wicg.github.io/client-hints-infrastructure/) — W3C specification for `sec-ch-ua` headers
---
# DNS Resolution: The Full Picture
URL: https://krowdev.com/guide/dns-resolution-full-picture/
Kind: guide | Maturity: budding | Origin: ai-drafted
Author: Agent | Directed by: krow
Tags: dns, networking, fundamentals
Series: domain-infrastructure (#1)
> How DNS resolution works — root servers, TLD nameservers, record types, response codes, zone files, and why queries are nearly invisible.
## Agent Context
- Canonical: https://krowdev.com/guide/dns-resolution-full-picture/
- Markdown: https://krowdev.com/guide/dns-resolution-full-picture.md
- Full corpus: https://krowdev.com/llms-full.txt
- Kind: guide
- Maturity: budding
- Confidence: high
- Origin: ai-drafted
- Author: Agent
- Directed by: krow
- Published: 2026-03-28
- Modified: 2026-03-28
- Words: 3171 (15 min read)
- Tags: dns, networking, fundamentals
- Series: domain-infrastructure (#1)
- Related: whois-dead-long-live-rdap, bot-detection-2026, aimd-rate-limiting
- Content map:
- h2: The Hierarchy
- h3: Root servers
- h3: TLD nameservers
- h3: Authoritative nameservers
- h2: Resolution Walk-Through
- h3: Why it feels instant
- h3: What dig shows you
- h2: Record Types That Matter
- h3: A and AAAA
- h3: NS
- h3: SOA
- h3: MX
- h3: CNAME
- h3: TXT
- h2: Response Codes
- h2: DNS over HTTPS (DoH)
- h3: Cloudflare DoH
- h3: Google DoH
- h3: Why DoH matters
- h2: DNS Query Anatomy
- h3: The fingerprint surface comparison
- h2: Zone Files
- h3: BIND format
- h3: Scale
- h3: What zone files don't contain
- h3: Zone file diffing
- h2: Putting It Together
- h2: Sources
- Diagrams: Mermaid fences are paired with adjacent ASCII companions in this document (3 Mermaid, 3 ASCII); HTML figures expose rendered SVG plus copyable Mermaid/ASCII source tabs.
- Crawl policy: same canonical content is exposed through HTML, Markdown, and llms-full; no crawler-specific content gate.
DNS is a hierarchical, distributed database that turns names into numbers. You type `example.com`, and somewhere between your keyboard and a TCP connection opening, a chain of servers conspires to produce `93.184.216.34`. This guide covers every layer of that process — the hierarchy, the resolution walk, the record types, the response codes, the wire format, and the zone files that make it all work.
## The Hierarchy
Every domain name resolution walks down a tree. The tree has exactly three levels that matter:
```mermaid
graph TD
root["<b>.</b> (root)"]
com[com]
net[net]
org[org]
ex1[example]
ex2[example]
ex3[example]
www[www]
root --> com
root --> net
root --> org
com --> ex1
net --> ex2
org --> ex3
ex1 --> www
classDef tld fill:transparent,stroke-dasharray:0;
class com,net,org tld;
```
```ascii
. (root)
|
┌───────────┼───────────┐
com net org ← TLD (Top-Level Domain)
| | |
example example example ← Second-Level Domain (SLD)
|
www ← Subdomain / host
```
*Levels: root → TLD (`com`, `net`, `org`) → second-level domain → subdomain.*
Each node in the tree has its own set of authoritative nameservers — servers that are the final source of truth for records at that level. The root knows about TLDs. TLD servers know about second-level domains. Authoritative nameservers know about everything under their domain. The [domain registration process](/note/domain-registration-icann-to-browser/) determines which nameservers appear at each level.
### Root servers
There are [13 root server identities](https://www.iana.org/domains/root/servers), named `a.root-servers.net` through `m.root-servers.net`, operated by 12 organizations (Verisign operates two). Through anycast routing, those 13 identities map to [over 1,700 physical instances](https://root-servers.org/) worldwide. A query to `a.root-servers.net` from Tokyo hits a different machine than the same query from London, but both return the same answer.
Root servers answer exactly one question: *"Who is authoritative for this TLD?"* They don't know about `example.com`. They know that `.com` is handled by Verisign's `a.gtld-servers.net` through `m.gtld-servers.net`.
### TLD nameservers
Each TLD has its own set of nameservers operated by the registry. For `.com`, that's Verisign. For `.org`, that's Public Interest Registry. For `.de`, that's DENIC.
TLD nameservers answer: *"Who is authoritative for this second-level domain?"* They store NS (nameserver) records pointing to each registered domain's authoritative nameservers. This is essentially the content of the zone file.
### Authoritative nameservers
The final stop. These are the nameservers set by the domain owner (or their hosting provider). They hold the actual records — A records, MX records, CNAME records, everything. When Cloudflare or AWS Route 53 "hosts your DNS," they're running your authoritative nameservers.
## Resolution Walk-Through
When you look up `example.com`, your resolver performs an iterative walk down the hierarchy. Here's the full sequence:
```mermaid
sequenceDiagram
autonumber
participant C as Client
participant R as Recursive resolver
participant Root as Root server
participant TLD as .com TLD server
participant Auth as Authoritative NS
C->>R: What is example.com?
R->>Root: Who handles .com?
Root-->>R: Ask a.gtld-servers.net (Verisign)
R->>TLD: Who handles example.com?
TLD-->>R: Ask ns1.example.com at 93.184.216.34
R->>Auth: What is example.com?
Auth-->>R: A record 93.184.216.34
R-->>C: 93.184.216.34
```
```ascii
1. Client → Recursive resolver: "What is example.com?"
2. Resolver → Root server: "Who handles .com?"
Root → Resolver: "Ask a.gtld-servers.net (Verisign)"
3. Resolver → TLD server: "Who handles example.com?"
TLD → Resolver: "Ask ns1.example.com at 93.184.216.34"
4. Resolver → Authoritative NS: "What is example.com?"
Auth NS → Resolver: "A record: 93.184.216.34"
5. Resolver → Client: "93.184.216.34"
```
Three actors make this work:
**Recursive resolver** — does the full walk for you. This is 1.1.1.1 (Cloudflare), 8.8.8.8 (Google), or whatever your ISP provides. When you configure DNS settings on your machine, you're choosing your recursive resolver. It caches aggressively — after the first lookup, it knows who handles `.com` for hours.
**TLD nameserver** — operated by the registry (e.g., Verisign for `.com`). Contains NS records for every registered domain under that TLD. The `.com` TLD servers know about all ~160 million `.com` domains.
**Authoritative nameserver** — the final source of truth for a domain's records. Set by the domain owner or their hosting provider. This is where your A record, MX record, and everything else actually lives.
### Why it feels instant
In practice, steps 2 and 3 are almost always cached. Your recursive resolver already knows who handles `.com` (the TTL on root-to-TLD delegations is 48 hours). It probably has the NS records for popular domains cached too. The typical cold lookup takes 50-100ms. A warm lookup (fully cached) takes under 5ms.
### What `dig` shows you
You can watch this process with `dig`:
```bash
# Full recursive trace — shows each step
dig +trace example.com
# Ask a specific resolver
dig @1.1.1.1 example.com A
# Ask an authoritative server directly
dig @ns1.example.com example.com A +norecurse
# Query for NS records (who handles this domain?)
dig example.com NS
# Get the SOA record (zone metadata)
dig example.com SOA
```
The `+trace` flag is particularly instructive. It starts at the root and follows each delegation, showing you exactly which server returned what. The output reads like a conversation between your machine and the hierarchy.
## Record Types That Matter
DNS has dozens of record types. Seven of them cover nearly everything you'll encounter:
| Type | Purpose | Example | TTL range |
|------|---------|---------|-----------|
| **A** | IPv4 address | `example.com → 93.184.216.34` | 60s – 86400s |
| **AAAA** | IPv6 address | `example.com → 2606:2800:220:1:248:1893:25c8:1946` | 60s – 86400s |
| **NS** | Nameserver delegation | `example.com → ns1.example.com` | 3600s – 172800s |
| **SOA** | Start of Authority (zone metadata) | Serial number, refresh intervals | 3600s – 86400s |
| **MX** | Mail server (with priority) | `10 mail.example.com` | 3600s – 86400s |
| **CNAME** | Alias to another name | `www.example.com → example.com` | 60s – 86400s |
| **TXT** | Arbitrary text data | SPF records, domain verification tokens | 300s – 86400s |
### A and AAAA
The workhorses. An A record maps a name to an IPv4 address. AAAA does the same for IPv6. A domain can have multiple A records (round-robin load balancing) and both A and AAAA records simultaneously (dual-stack).
### NS
Nameserver records define delegation. The NS records in the `.com` zone for `example.com` point to `ns1.example.com` and `ns2.example.com`. These are what the TLD server returns in step 3 of the resolution walk. Without NS records, a domain can be registered but won't resolve — it's not in the zone.
### SOA
Every zone has exactly one SOA (Start of Authority) record. It contains the primary nameserver name, the responsible party's email (encoded as a DNS name), a serial number that increments on changes, and timing parameters for zone transfers. The serial number is how secondary nameservers know when to pull updates.
### MX
Mail exchange records have a priority value (lower = preferred) and a target hostname. When sending email to `user@example.com`, the sender's mail server queries the MX records for `example.com` and connects to the lowest-priority server that responds. Multiple MX records with different priorities provide failover.
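To see that ordering in code: a minimal sketch using the third-party dnspython library (an assumption, not something this guide otherwise depends on), sorting MX records by preference to get the failover order.

```python
import dns.resolver  # third-party: pip install dnspython

answers = dns.resolver.resolve("example.com", "MX")
# Lower preference wins; sorting gives the failover order senders follow
for preference, exchange in sorted((r.preference, str(r.exchange)) for r in answers):
    print(preference, exchange)
```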
### CNAME
A canonical name record is an alias. `www.example.com CNAME example.com` means "look up `example.com` instead." CNAMEs can't coexist with other record types at the same name — a name is either a CNAME or it has direct records, never both. This constraint trips people up regularly (you can't put a CNAME at the zone apex because SOA and NS records must exist there).
### TXT
Free-form text records. Originally meant for human-readable notes, now heavily used for machine-readable data: SPF records for email authentication (`v=spf1 include:_spf.google.com ~all`), DKIM public keys, domain verification tokens for Google/Microsoft/Cloudflare, and DMARC policies. A single domain commonly has 5-10 TXT records.
## Response Codes
Every DNS response includes a 4-bit response code (RCODE) in the header. Four codes matter in practice:
| Code | Name | Meaning | What it tells you |
|------|------|---------|-------------------|
| 0 | **NOERROR** | Name exists, records returned | The domain resolves. Records are in the answer section. |
| 3 | **NXDOMAIN** | Name does not exist | The authoritative server for this zone has no record of this name. For a query against a TLD server, this means no registrar has placed the domain in the zone. |
| 2 | **SERVFAIL** | Server failure | The resolver couldn't complete the query. Could be a timeout talking to an upstream server, a DNSSEC validation failure, or a misconfigured zone. Retry with a different resolver. |
| 5 | **REFUSED** | Policy refusal | The server declined to answer. Usually means you're querying a server that doesn't serve that zone, or an authoritative server that doesn't allow recursive queries. Try a different server. |
**NXDOMAIN** is the most interesting signal. When a TLD nameserver returns NXDOMAIN for `example.com`, it means no delegation exists — no registrar has placed NS records for that name in the zone. This is the fastest possible way to determine that a domain doesn't resolve (a single UDP round-trip to the TLD server).
**The caveat**: A domain can be registered but absent from the zone. Domains in `serverHold` or `clientHold` status, domains with no nameservers configured, and domains in `redemptionPeriod` are all registered but return NXDOMAIN. An NXDOMAIN response tells you the domain isn't in the zone — not that it's unregistered.
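A sketch of that single round-trip, assuming dnspython and the published address of `a.gtld-servers.net` (192.5.6.30); it asks the TLD server directly instead of going through a recursive resolver:

```python
import dns.message
import dns.query
import dns.rcode

# One UDP datagram to the .com TLD server, no recursion involved
query = dns.message.make_query("example.com", "NS")
response = dns.query.udp(query, "192.5.6.30", timeout=2)  # a.gtld-servers.net

if response.rcode() == dns.rcode.NXDOMAIN:
    print("not in the .com zone (possibly still registered, though)")
else:
    print("delegated:", dns.rcode.to_text(response.rcode()))
```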
## DNS over HTTPS (DoH)
Traditional DNS uses UDP on port 53, unencrypted. Every query and response travels in plaintext. Your ISP can see every domain you look up. Any network middlebox can intercept and modify responses.
DNS over HTTPS wraps DNS queries inside standard HTTPS requests. From a network perspective, DoH traffic is indistinguishable from normal web browsing — it's TLS-encrypted traffic on port 443.
### Cloudflare DoH
```bash
curl -s "https://cloudflare-dns.com/dns-query?name=example.com&type=A" \
-H "Accept: application/dns-json" | jq .
```
Response:
```json
{
"Status": 0,
"TC": false,
"RD": true,
"RA": true,
"AD": true,
"CD": false,
"Question": [
{ "name": "example.com", "type": 1 }
],
"Answer": [
{
"name": "example.com",
"type": 1,
"TTL": 3600,
"data": "93.184.216.34"
}
]
}
```
The fields in the response map directly to DNS header flags: `Status` is the RCODE (0 = NOERROR), `TC` is Truncated, `RD` is Recursion Desired, `RA` is Recursion Available, `AD` is Authenticated Data (DNSSEC validated), and `CD` is Checking Disabled.
For a non-existent domain, `Status` would be `3` (NXDOMAIN) and the `Answer` array would be empty.
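The JSON endpoint is trivial to use from code. A sketch with the third-party `requests` library (an assumption; any HTTP client works), where the `Session` reuses the TLS connection, which matters for the latency tradeoff discussed below:

```python
import requests  # third-party; any HTTP client works

session = requests.Session()  # connection reuse avoids repeated TLS handshakes

def doh(name: str, rtype: str = "A") -> dict:
    r = session.get(
        "https://cloudflare-dns.com/dns-query",
        params={"name": name, "type": rtype},
        headers={"Accept": "application/dns-json"},
        timeout=10,
    )
    r.raise_for_status()
    return r.json()

result = doh("example.com")
if result["Status"] == 0:    # NOERROR
    print([answer["data"] for answer in result["Answer"]])
elif result["Status"] == 3:  # NXDOMAIN
    print("name does not exist")
```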
### Google DoH
```bash
curl -s "https://dns.google/resolve?name=example.com&type=A" | jq .
```
Same JSON structure with minor field name differences. Both services support `type=A`, `type=AAAA`, `type=NS`, `type=MX`, `type=TXT`, and any other valid record type.
### Why DoH matters
Three reasons, in order of practical importance:
1. **Privacy**: Your DNS queries are encrypted. Your ISP (and anyone else on the network path) can't see which domains you're looking up. They can see you're talking to `cloudflare-dns.com`, but not what you're asking.
2. **Integrity**: HTTPS provides authenticity. A network middlebox can't inject forged DNS responses (a technique used by some ISPs to redirect failed lookups to ad pages and by some governments for censorship).
3. **Convenience**: JSON responses are trivial to parse in any language. No DNS library needed — any HTTP client works. This makes DNS lookups accessible from environments where raw UDP isn't available (browsers, serverless functions, restricted networks).
The tradeoff is latency. A raw UDP DNS query to 1.1.1.1 completes in ~10ms. A DoH query adds TLS handshake overhead on the first request (~50ms total), though subsequent queries reuse the connection.
## DNS Query Anatomy
A DNS query is remarkably small. When `dig` sends a query, it constructs a single UDP datagram:
```mermaid
flowchart TB
Header["<b>Header</b> (12 bytes)<br/>Transaction ID: random 16-bit<br/>Flags: RD=1 (recursion desired)<br/>Question count: 1"]
Question["<b>Question section</b><br/>Name: example.com (variable length)<br/>Type: A, AAAA, NS, MX, ...<br/>Class: IN (Internet)"]
EDNS["<b>EDNS(0) OPT record</b> (optional, ~11 bytes)<br/>UDP payload size: 4096<br/>DNSSEC OK (DO) flag: 0 or 1<br/>DNS Cookie (optional)"]
Header --> Question --> EDNS
```
```ascii
┌──────────────────────────────────────────┐
│ Header (12 bytes) │
│ - Transaction ID: random 16-bit │
│ - Flags: RD=1 (recursion desired) │
│ - Question count: 1 │
├──────────────────────────────────────────┤
│ Question section │
│ - Name: example.com (variable length) │
│ - Type: A (or AAAA, NS, MX, etc.) │
│ - Class: IN (Internet) │
├──────────────────────────────────────────┤
│ EDNS(0) OPT record (optional, ~11 bytes) │
│ - UDP payload size: 4096 │
│ - DNSSEC OK (DO) flag: 0 or 1 │
│ - DNS Cookie (optional) │
└──────────────────────────────────────────┘
Total: ~40–80 bytes. Single UDP datagram.
```
That's the entire request. No TLS handshake. No HTTP framing. No headers. No User-Agent. No Accept-Language. No cookies. The response is similarly compact — a single UDP datagram back.
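You can build that datagram by hand in a few lines. A standard-library sketch that constructs the 12-byte header and the question section for an A query (EDNS omitted):

```python
import secrets
import struct

def build_query(name: str) -> bytes:
    # Header: random ID, flags with RD=1 (0x0100), QDCOUNT=1, other counts 0
    header = struct.pack(">HHHHHH", secrets.randbits(16), 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(bytes([len(label)]) + label.encode() for label in name.split("."))
    # QTYPE=1 (A), QCLASS=1 (IN)
    return header + qname + b"\x00" + struct.pack(">HH", 1, 1)

packet = build_query("example.com")
print(len(packet), "bytes")  # 29 bytes: the entire request
```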
### The fingerprint surface comparison
This matters if you care about privacy or anonymity. Compare what a DNS query reveals about the sender versus what an HTTPS request reveals:
**What a DNS query exposes:**
| Element | Typical values | Identifiability |
|---------|----------------|-----------------|
| Source IP | Your IP or proxy IP | Primary identifier |
| EDNS buffer size | 4096 (dig), 1232 (newer default), 512 (legacy) | Minor signal |
| DNSSEC OK flag | 0 or 1 | Negligible |
| DNS Cookie | Present or absent | Negligible |
| Transaction ID | Random 16-bit | Expected to vary |
| RD (Recursion Desired) | Almost always 1 | Universal — not distinguishing |
**What an HTTPS request exposes:**
TLS version, 15+ cipher suites in a specific order, 20+ TLS extensions with GREASE values, HTTP/2 SETTINGS frame, WINDOW_UPDATE size, HEADERS frame priority, 12+ HTTP headers in a specific order, User-Agent string, Accept-Language, Sec-Ch-Ua (browser brand/version), Sec-Ch-Ua-Platform, Sec-Fetch-Site, Sec-Fetch-Mode, cookie jar, and more.
DNS queries are effectively anonymous except for the source IP. The protocol simply doesn't carry enough metadata to fingerprint the client. This is a fundamental property of UDP-based protocols with fixed, minimal headers — there's no room for the kind of feature negotiation that makes [TLS and HTTP so fingerprintable](/article/bot-detection-2026/).
## Zone Files
A zone file is a text file that describes a DNS zone — the complete set of records that an authoritative nameserver serves. For a TLD like `.com`, the zone file is the master list: every registered domain that has nameserver delegations.
### BIND format
Zone files use BIND format (named after the Berkeley Internet Name Daemon, the most widely deployed DNS server software). Here's a simplified view of what the `.com` zone file looks like:
```bind
; .com zone file (simplified)
$ORIGIN com.
$TTL 172800
; SOA record — zone metadata
com. IN SOA a.gtld-servers.net. nstld.verisign-grs.com. (
1710000000 ; serial (increments on each update)
1800 ; refresh (30 min)
900 ; retry (15 min)
604800 ; expire (7 days)
86400 ) ; minimum TTL (1 day)
; TLD nameservers
com. IN NS a.gtld-servers.net.
com. IN NS b.gtld-servers.net.
; ... 11 more
; Domain delegations — one block per registered domain
example.com. IN NS ns1.example.com.
example.com. IN NS ns2.example.com.
; Glue records (needed when NS is under the delegated domain)
ns1.example.com. IN A 93.184.216.34
ns2.example.com. IN A 93.184.216.34
google.com. IN NS ns1.google.com.
google.com. IN NS ns2.google.com.
; ... ~160 million more entries
```
The `$ORIGIN` directive sets the default suffix. The `$TTL` directive sets the default time-to-live for records. Lines starting with `;` are comments. The SOA record's serial number is the version — secondary nameservers compare their serial to the primary's and pull a zone transfer (AXFR or IXFR) when it's behind.
Glue records deserve a note. When `example.com` delegates to `ns1.example.com`, there's a circular dependency — you need to resolve `ns1.example.com` to find the nameserver for `example.com`, but `ns1.example.com` is under `example.com`. Glue records break the cycle by embedding the A record for the nameserver directly in the parent zone.
### Scale
The `.com` zone file contains approximately 160 million domain delegations. Compressed with gzip, it's roughly 4-5 GB. Uncompressed: 15-20 GB. Verisign regenerates it multiple times per day.
ICANN's Centralized Zone Data Service (CZDS) provides access to TLD zone files for approved purposes. You apply at czds.icann.org, each TLD registry reviews independently, and once approved you get API access for programmatic daily downloads.
### What zone files don't contain
Zone files are a subset of the registry database. They contain only domains that are active and have nameserver delegations. Missing from the zone:
- Domains in `serverHold` or `clientHold` status (suspended by registry or registrar)
- Domains in `pendingDelete` status (queued for deletion)
- Domains in `redemptionPeriod` (expired, recoverable at penalty cost)
- Domains registered but with no nameservers configured
- Registrant or contact information
- Registration and expiry dates
This distinction matters. A domain that returns NXDOMAIN in DNS could be registered but held, suspended, or in a grace period. The zone file reflects what resolves, not what's registered.
### Zone file diffing
Because zone files are regenerated daily, comparing consecutive snapshots reveals domains entering and leaving the zone:
```
Day 1 zone: {example.com, test.com, mydomain.com, ...}
Day 2 zone: {example.com, test.com, ...}
Diff: mydomain.com disappeared from the zone
```
A domain disappearing from the zone could mean:
1. It entered `pendingDelete` — will be fully deleted in 5 days
2. It was placed on `serverHold` or `clientHold` — still registered, just suspended
3. Its nameservers were removed — still registered, just not delegated
4. It entered `redemptionPeriod` — might become available in 30-35 days
The zone file tells you *what changed*. To understand *why*, you need to query the registry's [RDAP service](/note/whois-dead-long-live-rdap/), where the domain's status flags reveal its actual state.
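The diff itself is a set operation. A sketch, assuming hypothetical one-domain-per-line snapshot files already extracted from the raw zone data:

```python
# Hypothetical filenames; real zone files need parsing down to unique names first
day1 = set(open("com-zone-2026-05-09.txt").read().split())
day2 = set(open("com-zone-2026-05-10.txt").read().split())

dropped = day1 - day2  # left the zone: pendingDelete, hold, or de-delegated
added = day2 - day1    # new registrations or restored delegations
print(f"{len(added)} added, {len(dropped)} dropped")
```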
## Putting It Together
DNS resolution is elegant because each layer knows only what it needs to. Root servers know TLDs. TLD servers know second-level domains. Authoritative servers know records. No single server has to know everything, and the caching at every level means the system handles billions of queries per day with response times measured in milliseconds.
The key things to remember:
- **The hierarchy is strict**: root, TLD, authoritative. Three levels. Always.
- **Recursive resolvers do the work**: your machine asks once; the resolver walks the tree.
- **Caching makes it fast**: TTL values control how long each answer stays cached.
- **NXDOMAIN means "not in the zone"**: it doesn't always mean "not registered."
- **DNS queries are tiny**: 40-80 bytes, single UDP datagram, near-zero fingerprint surface.
- **Zone files are the ground truth**: what's in the zone is what resolves. Everything else is metadata stored elsewhere.
The protocol is nearly 40 years old ([RFC 1035](https://datatracker.ietf.org/doc/html/rfc1035), November 1987) and still handles the internet's naming layer with minimal changes to its core design. The extensions — DNSSEC, DoH, DoT, EDNS — are layers on top, not replacements. The hierarchy and the delegation model are the same ones Paul Mockapetris designed in 1983.
## Sources
- [RFC 1034](https://datatracker.ietf.org/doc/html/rfc1034) — Domain Names: Concepts and Facilities (the design document)
- [RFC 1035](https://datatracker.ietf.org/doc/html/rfc1035) — Domain Names: Implementation and Specification (the wire format)
- [RFC 8499](https://datatracker.ietf.org/doc/html/rfc8499) — DNS Terminology (canonical definitions of resolver, authoritative, etc.)
- [RFC 8484](https://datatracker.ietf.org/doc/html/rfc8484) — DNS Queries over HTTPS (DoH)
- [IANA Root Servers](https://www.iana.org/domains/root/servers) — the 13 root server identities and their operators
- [Root Server Technical Operations](https://root-servers.org/) — real-time root server instance map
- [Verisign Domain Name Industry Brief](https://www.verisign.com/en_US/domain-names/dnib/index.xhtml) — TLD registration statistics
---
# Parallel AI Research Pipelines
URL: https://krowdev.com/article/parallel-ai-research-pipelines/
Kind: article | Maturity: budding | Origin: ai-assisted
Author: Agent | Directed by: krow
Tags: agentic-coding, patterns
> Three systems for orchestrating parallel AI agents — JSONL work items, declarative workspaces, and phased research pipelines.
## Agent Context
- Canonical: https://krowdev.com/article/parallel-ai-research-pipelines/
- Markdown: https://krowdev.com/article/parallel-ai-research-pipelines.md
- Full corpus: https://krowdev.com/llms-full.txt
- Kind: article
- Maturity: budding
- Confidence: high
- Origin: ai-assisted
- Author: Agent
- Directed by: krow
- Published: 2026-03-28
- Modified: 2026-04-21
- Words: 2498 (12 min read)
- Tags: agentic-coding, patterns
- Prerequisites: agentic-coding-getting-started
- Related: claude-md-patterns, building-krowdev-with-agents, aimd-rate-limiting
- Content map:
- h2: The Problem with Naive Parallel Agents
- h2: The Three-Phase Pattern
- h3: The folder structure
- h2: Isolation: Three Approaches
- h2: Machine-Readable First
- h2: The Two-Pass Refinement
- h3: What the review agent actually produces
- h2: Live Captures as Ground Truth
- h2: From Pattern to Tool
- h3: Layer 1: Natural language prompt (one-shot)
- h3: Layer 2: Template with variables (repeatable)
- h3: Layer 3: Task file (executable)
- h3: Layer 4: CLI tool (trackable)
- h3: The three-layer prompt sandwich
- h2: What This Produced
- h2: Three Systems, One Pattern
- h2: When to Use Which
- h2: The Original Prompt
- h2: Sources
- Diagrams: Mermaid fences are paired with adjacent ASCII companions in this document (2 Mermaid, 2 ASCII); HTML figures expose rendered SVG plus copyable Mermaid/ASCII source tabs.
- Crawl policy: same canonical content is exposed through HTML, Markdown, and llms-full; no crawler-specific content gate.
I needed protocol documentation for 19 top-level domains — DNS behavior, WHOIS formats, RDAP endpoints, registration rules, [rate limits](/note/aimd-rate-limiting/), raw captures. Each TLD is its own research unit with its own servers, formats, and quirks. Doing them sequentially would take days.
So I wrote a prompt that launched 19 parallel subagents, each researching one TLD in its own isolated directory, then ran a review pass to find gaps, then launched a second research wave, then a documentation pass. The whole thing ran in one session.
This article is about the pattern that emerged — not the TLD research itself, but the structure for running parallel AI research at scale.
## The Problem with Naive Parallel Agents
The obvious approach: "research these 19 things in parallel." Give each agent a topic and let it go. This fails in predictable ways:
- **Agents overwrite each other.** Two agents writing to the same summary file. Merge conflicts in shared state. Lost work.
- **No consistency.** Agent 1 captures WHOIS response time. Agent 7 doesn't. Agent 12 uses a different JSON schema. You can't compare findings across units.
- **No refinement.** First-pass research always has gaps. Without a review step, gaps stay gaps.
- **No machine-readable output.** Agents default to markdown prose. Prose is hard to aggregate, diff, or feed into code.
If you're coordinating parallel agents inside a product repo instead of a research tree, the same "state first, prose second" habit also shows up in [What I Learned Building krowdev with AI Agents](/article/building-krowdev-with-agents/) and [Writing an Effective CLAUDE.md](/guide/claude-md-patterns/).
## The Three-Phase Pattern
The structure that works:
```mermaid
flowchart LR
P1["<b>Phase 1: Explore</b><br/>(parallel)"] --> P1out[/"raw findings<br/>per unit"/]
P1out --> P2["<b>Phase 2: Review & Refine</b><br/>(sequential)"]
P2 --> P2out[/"cross-unit analysis<br/>v2 template<br/>second research pass"/]
P2out --> P3["<b>Phase 3: Document</b><br/>(parallel)"]
P3 --> P3out[/"uniform deliverables"/]
```
```ascii
Phase 1: Explore (parallel) → raw findings per unit
Phase 2: Review & Refine → cross-unit analysis → v2 template → second pass
Phase 3: Document (parallel) → uniform deliverables
```
Each phase has different parallelism characteristics. Phases 1 and 3 are embarrassingly parallel (one agent per unit, no coordination). Phase 2 is sequential — a single review agent reads everything and produces the refined template.
### The folder structure
```tree
research_root/
├── 1_explore/{unit_a, unit_b, ...}/ # Phase 1 workspaces
├── 2_research/{unit_a, unit_b, ...}/ # Phase 2 workspaces
├── 3_writing/{unit_a, unit_b, ...}/ # Phase 3 workspaces
├── {unit}_documentation/ # Final deliverables
├── prompts/ # Templates (v1, v2)
├── templates/ # Schemas, response formats
├── summaries/ # Cross-unit analysis
├── analysis/ # Review outputs
└── tools/ # Shared scripts, configs
```
The key insight: **each phase gets its own directory tree.** Phase 2 agents don't touch Phase 1 directories. This makes the workspace append-only at the directory level — you can always go back and see exactly what each agent produced at each stage.
## Isolation: Three Approaches
The single most important rule across all three systems: **agents must not interfere with each other.** There are different ways to enforce this:
**Directory isolation** (research pipeline) — each agent writes only in its assigned directory:
```
Agent for unit "net" in Phase 1:
CAN write: 1_explore/net/*
CAN read: tools/*, prompts/*, templates/*
CANNOT: 1_explore/org/*, 2_research/*, anything else
```
**Git worktree isolation** (work system) — each agent gets a separate copy of the repository on disk:
```bash
# Each task runs in its own worktree
claude --worktree work-W001 "Fix the port mismatch..."
# Creates branch worktree-work-W001, separate working directory
# Other agents on other worktrees can't see uncommitted changes
```
**Pane isolation** (workspace manager) — each agent runs in its own terminal pane, sharing the repo but partitioned by prompt (the isolation rules resemble [CLAUDE.md boundary patterns](/guide/claude-md-patterns/)):
```yaml
# workspace manager: declarative layout, agents share the repo but work on different dirs
panes:
- name: agent-01
closing: "Work ONLY in src/parser/. Commit when done."
- name: agent-02
closing: "Work ONLY in src/extraction/. Commit when done."
```
Directory isolation is simplest — no git machinery needed. Worktrees are strongest — agents literally can't see each other's uncommitted work. Pane isolation is fastest to set up — just a YAML file — but relies on the agent obeying its prompt.
For research, directory isolation is sufficient. For code changes, worktrees are safer.
## Machine-Readable First
The second critical rule: **JSON is authoritative, markdown is derived.**
Each agent produces two outputs per phase:
1. `findings.json` — structured data with a defined schema, every field sourced
2. `notes.md` — human-readable summary, explicitly non-authoritative
Why not just markdown? Because the review agent needs to aggregate across all units. Reading 19 markdown files and extracting comparable data is fragile. Reading 19 JSON files with the same schema is trivial.
```json
{
"unit": "net",
"registry_operator": "Verisign",
"lookup_server": "whois.example-registry.com",
"whois_available_pattern": "No match for \"DOMAIN.NET\".",
"rdap_base": "https://rdap.verisign.com/net/v1",
"rdap_available_status": 404,
"min_label_length": 3,
"rate_limiting": {"whois": "undocumented", "rdap": "429 + Retry-After"},
"sources": ["https://www.verisign.com/...", "live probe 2026-03-28"]
}
```
Every field has a `sources` array. If the review agent questions a finding, it can trace back to the original source. No "trust me, I researched it."
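That aggregation step really is trivial. A sketch, assuming the folder layout shown earlier and the `findings.json` schema above:

```python
import json
import pathlib

# Phase 1 layout: 1_explore/<unit>/findings.json, one file per unit
findings = [
    json.loads(p.read_text())
    for p in sorted(pathlib.Path("1_explore").glob("*/findings.json"))
]

# A shared schema makes cross-unit grouping a few lines
by_operator: dict[str, list[str]] = {}
for f in findings:
    by_operator.setdefault(f["registry_operator"], []).append(f["unit"])
print(by_operator)  # e.g. {"Verisign": ["com", "net"], ...}
```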
## The Two-Pass Refinement
This is what makes the pattern actually work, not just "parallel agents doing things."
**Phase 1** uses a generic template. Agents do their best, but they don't know what they don't know. Some agents capture edge cases others miss. Some discover dimensions the template didn't anticipate.
**The review step** reads all Phase 1 outputs and produces:
- A global findings file (unified spec across all units)
- A taxonomy of categories discovered (not just the ones you predicted)
- A gap analysis (what each unit is missing)
- A **v2 template** incorporating everything Phase 1 revealed
**Phase 2** uses the v2 template. Now every agent knows to look for the edge cases that only some agents found in Phase 1. The quality floor rises dramatically.
The review step isn't quality control — it's knowledge transfer. Phase 1 agents collectively discover what matters. The v2 template broadcasts that knowledge to Phase 2 agents. Each agent in Phase 2 is smarter than any agent in Phase 1 because it has the template that Phase 1 collectively produced.
### What the review agent actually produces
For the TLD research, the review agent read 19 `findings.json` files and produced:
- **Implementation tiers** — grouping TLDs by complexity (trivial: .net is identical to .com; custom: .uk needs a unique parser; special: .ch blocks WHOIS entirely)
- **Parser families** — TLDs sharing the same backend/format (Identity Digital runs .org, .io, .ai with identical WHOIS patterns)
- **Gap analysis** — ".fr agent didn't capture rate limit behavior" / ".se agent missed zone file AXFR access"
- **v2 template** — now includes: "check for AXFR zone file access" (only discovered by the .se agent), "capture WHOIS connection terminator behavior" (only .de closes connection instead of using `<<<`)
## Live Captures as Ground Truth
Agents shouldn't just search the web. They should **probe live systems** and capture raw responses.
```python
# Shared probe tool available to all agents (read-only)
# probe.py --target net --type whois --domain google.net
```
Raw captures serve two purposes:
1. **Truth.** Web search results can be outdated. RFC text can be ambiguous. A raw WHOIS response is unambiguous.
2. **Parser guidance.** When you later implement a parser, the raw captures are your test fixtures. You don't need to re-query live servers.
Captures are immutable — written once, never edited. If a second probe gives different results, you capture both. Contradictions are data.
## From Pattern to Tool
A template describes a pattern. A task file makes it executable. A CLI makes it repeatable. Each layer reduces how much the operator needs to get right.
### Layer 1: Natural language prompt (one-shot)
The TLD research started as a single message:
> "Research all remaining TLDs... use one subagent per TLD, give each its own directory... ensure agents never overwrite each other's work."
This works once. It's not repeatable — the next researcher writes a different prompt, gets different structure, produces incomparable output.
### Layer 2: Template with variables (repeatable)
Extract the pattern into a template with `{{VARIABLES}}`:
```
Phase 1: Explore (parallel — one agent per {{UNIT}})
- Each agent works ONLY in 1_explore/{{UNIT_ID}}/
- Live probe: {{PROBE_TARGETS}} via proxied connections
- Persist: findings.json + notes.md
```
Now anyone can fill in the variables and get the same structure. But it's still manual — you read the template, fill it in mentally, write the prompt.
### Layer 3: Task file (executable)
Make the filled-in template machine-readable — a JSONL record per unit:
```json
{
"id": "C01",
"slug": "bot-detection-2026",
"title": "How Websites Detect Bots in 2026",
"kind": "article",
"status": "planned",
"source_map": "analysis/01-bot-detection-2026.md",
"sources": ["docs/research/03-anti-bot-landscape-2026.md", "..."],
"parallel": true
}
```
This is the same pattern as a `tasks.jsonl` in any work system — each line is one unit of work with enough context to build a prompt and launch an agent.
### Layer 4: CLI tool (trackable)
A script reads the task file, builds the prompt, and launches the agent:
```bash
# Development tasks (sequential work system)
work run T001 # read JSONL → build prompt → claude --worktree
# Parallel agents (declarative workspace manager)
workspace start team.yml # read YAML → split panes → launch agents
# Content pipeline (same pattern)
scripts/content draft C01 # read JSONL → read source map → claude
```
All three do the same thing: read structured task data, assemble a prompt with the right context, launch Claude. The data model and orchestration differ, but the core loop is identical.
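A minimal sketch of that core loop, assuming the JSONL fields shown above and the `claude --worktree` invocation from earlier; the real tools wrap verification and state updates around it:

```python
import json
import subprocess

def run_task(task_id: str, path: str = "tasks.jsonl") -> None:
    # Read structured task data: one JSON object per line
    tasks = [json.loads(line) for line in open(path) if line.strip()]
    task = next(t for t in tasks if t["id"] == task_id)
    # Assemble the prompt from item fields, never by hand
    prompt = f"Task {task['id']}: {task['title']}\nSources: {', '.join(task['sources'])}"
    # Launch the agent in its own worktree for isolation
    subprocess.run(["claude", "--worktree", f"work-{task_id}", prompt], check=True)
```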
### The three-layer prompt sandwich
workspace manager introduces a useful pattern for prompt assembly — the **three-layer sandwich**:
```mermaid
flowchart TB
L1["<b>Layer 1: Universal rules</b><br/>TESTING-RULES.md — same across all agents"]
L2["<b>Layer 2: Task-specific prompt</b><br/>01-parser-accuracy.md — unique per agent"]
L3["<b>Layer 3: Closing block</b><br/>verification + tracking + commit sequence"]
L1 --> L2 --> L3
```
```ascii
Layer 1: Universal rules (TESTING-RULES.md — same across all agents)
Layer 2: Task-specific prompt (01-parser-accuracy.md — unique per agent)
Layer 3: Closing block (verification + tracking + commit sequence)
```
Layer 1 and 3 stay constant. Layer 2 is the variable. This ensures every agent follows the same verification and state-update protocol, regardless of what task it's working on.
The research pipeline has the same structure implicitly: the template is Layer 1 + 3, the unit-specific assignment is Layer 2. Making it explicit (like workspace manager does) is cleaner.
## What This Produced
For the TLD research specifically:
| Metric | Value |
|--------|-------|
| TLDs researched | 19 |
| Phases | 3 (explore, research, documentation) |
| Total agents launched | ~60 (19 per phase + review agents) |
| Raw captures | WHOIS + DNS + RDAP per TLD, both registered and available |
| Final output | 19 implementation guides with raw captures |
| Implementation tiers identified | 5 (trivial → special) |
| Parser families identified | 14 |
The deliverables were dense enough that implementing a new TLD in the scanner required reading one README and copying one set of raw captures as test fixtures. No additional research needed.
## Three Systems, One Pattern
I've now built three systems that all solve the same problem — [coordinating parallel AI agents](/note/multi-agent-coordination-without-llm/) with shared state — in different domains. (For the full retrospective on building krowdev this way, see [What I Learned Building krowdev with AI Agents](/article/building-krowdev-with-agents/).)
| Aspect | Work System | workspace manager | Research Pipeline |
|--------|------------|-----|-------------------|
| Domain | Development tasks | Any parallel agents | Research/writing |
| Task data | JSONL (`tasks.jsonl`) | YAML (workspace config) | JSONL (`items.jsonl`) |
| Isolation | Git worktrees | Terminal panes + prompt rules | Directory per unit |
| Launch | `run T001` | `workspace start config.yml` | Subagent per unit |
| Parallelism | `run-all --max 3` | All panes start at once | Per-phase parallel |
| Review | `review T001` (diff + build + test) | RUNBOOK totals + `workspace read` | Review agent reads all findings |
| State tracking | JSONL (append-only) | JSONL + RUNBOOK.md | JSON (findings per unit) |
| Prompt assembly | Script builds from item fields | 3-layer sandwich (YAML) | Template + source map |
The shared principles:
1. **JSONL for everything.** Append-only, git-trackable, human-readable, no database server. Every system uses it for state.
2. **Isolation by default.** Whether worktrees, directories, or prompt boundaries — agents don't share mutable state.
3. **Structured launch.** Read task data → build prompt → launch agent. Never hand-write the prompt.
4. **Review as verification.** Automated checks (build, test, schema validation) before human review. Persist the verdict.
5. **The ratchet.** Each agent reads current state, does work, updates state. Progress only moves forward.
## When to Use Which
**Work system** (a task-runner script) — when tasks are code changes that need build/test verification. Each task gets a worktree, a prompt, and an auto-review. Best for: bug fixes, refactors, feature additions.
**workspace manager** — when you want N agents working simultaneously with visual monitoring. Declarative YAML, all agents start at once, `workspace read` to check progress. Best for: parallel reviews, round-based enrichment, any task where you want to watch agents work.
**Research pipeline** — when you're researching N items across the same dimensions and need two-pass refinement. Directory isolation, phased execution, machine-readable findings. Best for: protocol documentation, competitive analysis, API surveys.
All three are overkill for single tasks. Use a plain prompt for that.
## The Original Prompt
For reference, here's the prompt that kicked off the TLD research. One message, natural language, no template:
> Please websearch for all remaining TLDs — same info as we have for .com and .de: basic infos and special stuff, allowed characters and domain rules, price, how to get / availability of domain lists, ways for domain check — DNS, DNS auth, WHOIS, other niche special options — and for all of those the full possible metadata it could provide. Then run for real (use proxies) and capture and store full raw responses as truth and for potential parser/implementation guidance. Use one subagent per TLD and give him his own dir where he can download, code, write etc (persist findings in machine-readable way with sources). Then run review over everything creating a global specs/findings file (with all niches and categories etc). Use that to create v2 template/research task. Then launch second pass of agents (one per TLD, same procedure). Then again review and create a compressed, information-dense documentation for each TLD with everything needed (including raw/real captures in a uniform clean format). Ensure agents never overwrite each other's work / step on each other's toes.
The template is the reusable pattern extracted from this. The task file is the machine-readable instance. The CLI is the executor. Each layer makes the pattern more reproducible and less dependent on the operator getting the prompt right.
## Sources
- Anthropic, [Common workflows](https://code.claude.com/docs/en/common-workflows)
- Anthropic, [Create custom subagents](https://code.claude.com/docs/en/sub-agents)
- OpenAI, [Codex web](https://developers.openai.com/codex/cloud)
- Git, [`git-worktree` documentation](https://git-scm.com/docs/git-worktree)
---
# WHOIS Is Dead, Long Live RDAP
URL: https://krowdev.com/note/whois-dead-long-live-rdap/
Kind: note | Maturity: budding | Origin: ai-drafted
Author: Agent | Directed by: krow
Tags: dns, networking
Series: domain-infrastructure (#2)
> ICANN sunsetted WHOIS in January 2025. RDAP replaced it — structured JSON over HTTPS instead of plaintext over TCP. Here's what changed and why it matters.
## Agent Context
- Canonical: https://krowdev.com/note/whois-dead-long-live-rdap/
- Markdown: https://krowdev.com/note/whois-dead-long-live-rdap.md
- Full corpus: https://krowdev.com/llms-full.txt
- Kind: note
- Maturity: budding
- Confidence: high
- Origin: ai-drafted
- Author: Agent
- Directed by: krow
- Published: 2026-03-28
- Modified: 2026-03-28
- Words: 1303 (6 min read)
- Tags: dns, networking
- Series: domain-infrastructure (#2)
- Prerequisites: dns-resolution-full-picture
- Related: dns-resolution-full-picture, bot-detection-2026, aimd-rate-limiting
- Content map:
- h2: WHOIS: What It Was
- h2: Why It Died
- h2: RDAP: What Replaced It
- h2: The Comparison
- h2: Bootstrap: Finding the Right Server
- h2: Quick Examples
- h2: What This Means Going Forward
- h2: Sources
- Crawl policy: same canonical content is exposed through HTML, Markdown, and llms-full; no crawler-specific content gate.
In January 2025, ICANN officially sunsetted WHOIS for all gTLD registries and registrars. The protocol that powered [domain registration](/note/domain-registration-icann-to-browser/) lookups since 1982 is now deprecated. Its replacement — RDAP — has been mandatory since that date.
If you've ever parsed a WHOIS response, you know why it had to go. If you haven't, consider yourself lucky.
## WHOIS: What It Was
WHOIS was a plain-text protocol running on TCP port 43, formalized in RFC 3912. The entire exchange looked like this:
```
Client opens TCP connection to port 43
Client sends: "example.com\r\n"
Server sends: plain text response (no standard format)
Server closes connection
```
That's the whole protocol. No encryption. No authentication. No structured format. The server dumps a blob of text and hangs up.
The response was human-readable but machine-hostile. Every registry and registrar formatted theirs differently. Parsing required per-server regex patterns — "No match for" vs "NOT FOUND" vs "Domain not found" vs an empty response all meant the same thing. There was no standard way to know.
The referral system made it worse. Thin registries (like Verisign for `.com`) only returned basic data and a pointer to the registrar's own WHOIS server. Getting full registration details required two separate TCP connections: one to the registry, one to the registrar.
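That exchange translates directly to code. A standard-library sketch against Verisign's legacy server (no referral-chasing, no retries):

```python
import socket

def whois(domain: str, server: str = "whois.verisign-grs.com") -> str:
    # The whole protocol: connect to port 43, send the name, read until close
    with socket.create_connection((server, 43), timeout=10) as sock:
        sock.sendall(f"{domain}\r\n".encode())
        chunks = []
        while chunk := sock.recv(4096):
            chunks.append(chunk)
    return b"".join(chunks).decode(errors="replace")

print(whois("example.com"))  # a blob of unstructured text
```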
## Why It Died
Six problems converged:
1. **No standard format** — every server formatted differently, requiring per-server parsing logic
2. **No standard errors** — "not found" expressed a dozen different ways
3. **Plaintext TCP** — anyone on the network path could see what domains you were looking up
4. **No authentication** — IP-based rate limiting was the only protection
5. **Referral chains** — two TCP connections for one lookup on thin registries
6. **GDPR** — the kill shot. WHOIS was designed to publish registrant names, addresses, phone numbers, and emails. The EU's General Data Protection Regulation (2018) made that illegal for EU data subjects. The response was mass redaction — inconsistent, messy, and legally uncertain.
ICANN had been pushing RDAP since the RFCs landed in 2015. GDPR forced the timeline. By January 2025, all gTLD operators were required to support RDAP. WHOIS port 43 isn't turned off yet — Verisign still runs `whois.verisign-grs.com` — but it's legacy. The ~189 country-code TLDs (`.uk`, `.de`, `.jp`) still rely on WHOIS because ICANN's mandate doesn't apply to them. That's the last holdout.
## RDAP: What Replaced It
Registration Data Access Protocol. HTTPS transport, JSON responses. Defined across five core RFCs:
| RFC | What it covers |
|-----|---------------|
| 7480 | HTTP usage in RDAP |
| 7481 | Security services |
| 7482 | Query format (obsoleted by 9082) |
| 9083 | JSON response format (current) |
| 9224 | Bootstrap — finding the right server |
A query is a single HTTPS GET:
```
GET https://rdap.verisign.com/com/v1/domain/example.com
Accept: application/rdap+json
```
The response is structured JSON — dates in ISO 8601, status codes from a defined vocabulary, entities with roles, nameservers as objects. HTTP 404 means the domain isn't in the registry database (available). HTTP 200 means it's taken. HTTP 429 means you're rate-limited. No guessing.
RDAP was designed with GDPR in mind. Privacy is built into the spec through role-based redaction — registries can omit personal data by default and only reveal it to authenticated, authorized parties via OAuth 2.0 or API keys.
## The Comparison
This is the table that matters. Three ways to look up domain information, each with different tradeoffs:
| Aspect | RDAP | WHOIS (port 43) | DNS / DoH |
|--------|------|-----------------|-----------|
| **Format** | Structured JSON (RFC 9083) | Unstructured text, varies by server | Wire format / JSON (DoH) |
| **Transport** | HTTPS (port 443, encrypted) | Plaintext TCP (port 43) | HTTPS (DoH) or UDP (port 53) |
| **Coverage** | ~77% of TLDs (growing) | Being sunsetted, still ~100% | All resolvable domains |
| **Rate limits** | Per-registry, undocumented, strict | Per-server, undocumented, IP-based | Very generous (Cloudflare/Google) |
| **Data richness** | Full: registrar, dates, status, NS, DNSSEC, entities | Same data, harder to parse | Existence only ([NXDOMAIN / NOERROR](/guide/dns-resolution-full-picture/)) |
| **Authentication** | Optional (OAuth 2.0, API keys, certs) | None | None |
| **Privacy** | GDPR-compliant by design (role-based redaction) | Pre-GDPR design, inconsistent redaction | N/A (no personal data) |
| **Error handling** | HTTP 404 = not found, 429 = rate limited | Varies: "No match", empty, connection refused | RCODE 3 = NXDOMAIN |
| **Parsing** | `json.loads()` | Per-server regex | Standard format |
| **Speed (typical)** | 500ms – 3s | 200ms – 2s | 30ms – 100ms |
| **Best for** | Authoritative confirmation, rich data | ccTLD fallback (~189 without RDAP) | Fast pre-screening |
The takeaway: [DNS resolution](/guide/dns-resolution-full-picture/) is fastest but only tells you if something resolves. WHOIS has the widest coverage but is a parsing nightmare. RDAP gives you everything WHOIS did in a format you can actually use — at the cost of being slower and not yet universal.
## Bootstrap: Finding the Right Server
RDAP has no single server. Each TLD has its own RDAP endpoint. To find the right one, you consult the IANA bootstrap file:
```
https://data.iana.org/rdap/dns.json
```
This JSON file maps TLDs to RDAP server URLs. It currently contains 447 service entries. Inside the file's `services` key, each entry is a `[tld_list, url_list]` pair:
```json
[
[["com", "net"], ["https://rdap.verisign.com/com/v1/"]],
[["uk"], ["https://rdap.nominet.uk/uk/"]],
[["google", "youtube"], ["https://pubapi.registry.google/rdap/"]]
]
```
Match your TLD against the entries (longest label-wise match wins), then construct the query URL: `{base_url}domain/{domain_name}`. Cache the file locally with a 24-hour TTL.
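A standard-library sketch of the bootstrap lookup, with the caching and error handling left out:

```python
import json
import urllib.request

BOOTSTRAP_URL = "https://data.iana.org/rdap/dns.json"

def rdap_base(tld: str) -> str | None:
    # In production, cache this file locally with a ~24-hour TTL
    with urllib.request.urlopen(BOOTSTRAP_URL) as resp:
        services = json.load(resp)["services"]
    for tlds, urls in services:
        if tld in tlds:
            return urls[0]
    return None

base = rdap_base("com")
print(f"{base}domain/example.com")  # https://rdap.verisign.com/com/v1/domain/example.com
```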
If you don't want to maintain the bootstrap logic, `rdap.org` acts as a redirect service — query `https://rdap.org/domain/{name}` and it 302-redirects to the authoritative server. But it has strict rate limits (10 requests per 10 seconds) and adds a round-trip, so direct bootstrap is better for anything beyond casual lookups.
## Quick Examples
**WHOIS** (legacy, but still works):
```bash
# Raw WHOIS query over TCP
whois example.com
# Or with netcat, to see exactly what happens
echo "example.com" | nc whois.verisign-grs.com 43
```
**RDAP** (the modern way):
```bash
# Direct query to Verisign's RDAP server
curl -s https://rdap.verisign.com/com/v1/domain/example.com \
| python3 -m json.tool
# Via rdap.org redirect (follows 302 automatically)
curl -sL https://rdap.org/domain/example.com \
| python3 -m json.tool
# Check if a domain is available (HTTP 404 = available)
curl -s -o /dev/null -w "%{http_code}" \
https://rdap.verisign.com/com/v1/domain/thisdomainprobablydoesnotexist.com
# Returns: 404
```
The difference is stark. WHOIS gives you a wall of text you need to regex apart. RDAP gives you `json.loads()` and you're done.
## What This Means Going Forward
WHOIS isn't fully dead yet — the servers are still running and ~189 ccTLDs depend on it exclusively. But for `.com`, `.net`, `.org`, and every other gTLD, RDAP is the authoritative source. New tooling should target RDAP first with WHOIS as a ccTLD fallback. If you're checking availability at scale, a [DNS pre-screen](/guide/dns-resolution-full-picture/) filters out the obvious "taken" domains before you hit RDAP rate limits.
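Put together, the tiered check looks something like this sketch (third-party dnspython and requests assumed; retry and rate-limit handling omitted):

```python
import dns.resolver  # third-party: dnspython
import requests      # third-party

def availability(domain: str, rdap_base: str) -> str:
    # Tier 1: DNS pre-screen; a delegation means the domain is taken
    try:
        dns.resolver.resolve(domain, "NS")
        return "taken (delegated in DNS)"
    except dns.resolver.NXDOMAIN:
        pass  # not in the zone; confirm against the registry
    # Tier 2: RDAP is authoritative for registry state
    r = requests.get(f"{rdap_base}domain/{domain}", timeout=10)
    if r.status_code == 404:
        return "available"
    if r.status_code == 200:
        return "registered but not delegated (hold, or no nameservers)"
    return f"inconclusive (HTTP {r.status_code})"

print(availability("thisdomainprobablydoesnotexist.com",
                   "https://rdap.verisign.com/com/v1/"))
```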
The structured format also opens doors that WHOIS never could: proper error handling, authenticated access for legitimate use cases, and machine-readable responses that don't break when a registry changes their text formatting. At scale, RDAP queries are HTTPS requests — subject to the same [TLS fingerprinting](/note/tls-fingerprinting-curl-cffi/) and rate limiting that any HTTP client faces. Forty years of "just dump some text on port 43" is finally over.
## Sources
- [RFC 3912 — WHOIS Protocol Specification](https://datatracker.ietf.org/doc/html/rfc3912)
- [RFC 7480 — HTTP Usage in the Registration Data Access Protocol (RDAP)](https://datatracker.ietf.org/doc/html/rfc7480)
- [RFC 7481 — Security Services for the Registration Data Access Protocol](https://datatracker.ietf.org/doc/html/rfc7481)
- [RFC 7482 — Registration Data Access Protocol (RDAP) Query Format](https://datatracker.ietf.org/doc/html/rfc7482)
- [RFC 9083 — JSON Responses for the Registration Data Access Protocol](https://datatracker.ietf.org/doc/html/rfc9083)
- [RFC 9224 — Finding the Authoritative Registration Data Access Protocol (RDAP) Service](https://datatracker.ietf.org/doc/html/rfc9224)
- [IANA RDAP Bootstrap File (dns.json)](https://data.iana.org/rdap/dns.json)
---
# Writing an Effective CLAUDE.md
URL: https://krowdev.com/guide/claude-md-patterns/
Kind: guide | Maturity: budding | Origin: ai-drafted
Author: Agent | Directed by: krow
Tags: agentic-coding, patterns
> Patterns for project instruction files that actually change how AI agents behave — boundaries, conventions, and the rules that matter.
## Agent Context
- Canonical: https://krowdev.com/guide/claude-md-patterns/
- Markdown: https://krowdev.com/guide/claude-md-patterns.md
- Full corpus: https://krowdev.com/llms-full.txt
- Kind: guide
- Maturity: budding
- Confidence: medium
- Origin: ai-drafted
- Author: Agent
- Directed by: krow
- Published: 2026-03-21
- Modified: 2026-04-21
- Words: 726 (4 min read)
- Tags: agentic-coding, patterns
- Prerequisites: agentic-coding-getting-started
- Related: reviewing-ai-generated-code, agentic-coding-getting-started, building-krowdev-with-agents
- Content map:
- h2: What CLAUDE.md Actually Does
- h2: Structure That Works
- h3: 1. Stack Declaration
- h2: Stack
- h3: 2. Key Paths
- h2: Key Paths
- h3: 3. Build & Test Commands
- h2: Build & Test
- h3: 4. Boundaries
- h2: Boundaries
- h2: Anti-Patterns
- h3: Too Long
- h3: Too Vague
- h3: Duplicating the Codebase
- h3: Static Rules
- h2: The Compound Effect
- h2: Testing Your CLAUDE.md
- h2: Sources
- Crawl policy: same canonical content is exposed through HTML, Markdown, and llms-full; no crawler-specific content gate.
CLAUDE.md is the single highest-leverage file in an agentic coding project. It's not documentation — it's a constraint system. Every rule exists because the agent got something wrong at least once.
## What CLAUDE.md Actually Does
When Claude Code starts a session, it reads CLAUDE.md before doing anything else. The file sets:
- **Boundaries** — what the agent must not touch
- **Conventions** — how code should look in this project
- **Build commands** — how to verify changes work
- **File pointers** — where to find key parts of the codebase
Without CLAUDE.md, the agent falls back on training data. Training data is generic. Your project is specific. The gap between those two is where bugs live. If you're new to working with coding agents, the [getting started guide](/guide/agentic-coding-getting-started/) covers the fundamentals before diving into project rules.
## Structure That Works
A well-structured CLAUDE.md has four sections, in this order:
### 1. Stack Declaration
State the framework, language, and key libraries. Be specific about versions when it matters.
```markdown
## Stack
- Astro 6 (NOT Astro 4 — the content collections API changed)
- Svelte 5 for interactive islands (NOT React)
- TypeScript
- Cloudflare Pages hosting
```
The "NOT X" pattern is critical. Agents are trained on millions of React examples and far fewer Svelte ones. Without explicit exclusions, the agent will suggest what it's seen most.
### 2. Key Paths
Point to the files the agent will need most. Don't list every file — list the ones that matter for decision-making.
```markdown
## Key Paths
- `src/content.config.ts` — content schema (Zod)
- `src/layouts/KBEntry.astro` — entry layout
- `src/components/` — Svelte islands + Astro components
- `tests/kb.spec.ts` — e2e tests
```
### 3. Build & Test Commands
Give the agent the exact commands to verify its work. Include the working directory if it's not the repo root.
```markdown
## Build & Test
npm run build # astro build + pagefind
npx playwright test # e2e tests
```
### 4. Boundaries
The most important section. List what the agent must not modify, must not install, and must not change.
```markdown
## Boundaries
- `research/corpus/` is READ-ONLY — never modify
- `notes/` is private — never reference in published content
- No new npm dependencies without discussion
- Don't change the build command in package.json
```
Every boundary rule traces to an incident. The agent installed React once — now "Svelte only" is a boundary. The agent modified reference material — now "READ-ONLY" is a boundary. When these rules fail and bad code lands, having a systematic [review process](/guide/reviewing-ai-generated-code/) catches what the constraint system missed.
## Anti-Patterns
### Too Long
CLAUDE.md over 200 lines loses effectiveness. The agent processes it, but signal-to-noise drops. If your CLAUDE.md is getting long, move reference material into separate files and point to them.
### Too Vague
"Write good code" tells the agent nothing. "Use Svelte 5 runes syntax ($state, $derived) not the legacy let-based reactivity" tells it exactly what to do.
### Duplicating the Codebase
Don't paste large code blocks into CLAUDE.md. Point to the file instead: "See `src/content.config.ts` for the content schema." The agent can read the file — it doesn't need a copy.
### Static Rules
CLAUDE.md should evolve. After every session where the agent makes a mistake, add a rule. After every session where a rule prevented a mistake, keep it. Rules that never fire can be pruned.
## The Compound Effect
A good CLAUDE.md doesn't just fix one session. It fixes every future session. Every developer (or agent) who opens the project reads the same constraints. The cost of writing ten lines of rules is repaid across hundreds of sessions. The [krowdev retrospective](/article/building-krowdev-with-agents/) covers how this compound effect played out across a real project built entirely with agents.
:::tip
Start CLAUDE.md on day one. Even five lines — stack name, build command, one boundary — is better than nothing. Then add one rule per mistake. Within a week, the file will be comprehensive and earned.
:::
## Testing Your CLAUDE.md
The simplest test: start a fresh agent session and ask it to make a change. If it violates a convention you care about, you're missing a rule. If it follows conventions you never stated, your rules are working.
## Sources
- Anthropic, [Claude Code overview](https://code.claude.com/docs/en/overview)
- Anthropic, [How Claude remembers your project](https://code.claude.com/docs/en/memory)
- Anthropic, [Common workflows](https://code.claude.com/docs/en/common-workflows)
---
# Reviewing AI-Generated Code
URL: https://krowdev.com/guide/reviewing-ai-generated-code/
Kind: guide | Maturity: budding | Origin: ai-drafted
Author: Agent | Directed by: krow
Tags: agentic-coding, patterns
> A checklist and mental model for reviewing code you didn't write — what to look for when your coding agent hands back a diff.
## Agent Context
- Canonical: https://krowdev.com/guide/reviewing-ai-generated-code/
- Markdown: https://krowdev.com/guide/reviewing-ai-generated-code.md
- Full corpus: https://krowdev.com/llms-full.txt
- Kind: guide
- Maturity: budding
- Confidence: medium
- Origin: ai-drafted
- Author: Agent
- Directed by: krow
- Published: 2026-03-21
- Modified: 2026-04-21
- Words: 723 (4 min read)
- Tags: agentic-coding, patterns
- Prerequisites: agentic-coding-getting-started
- Related: building-krowdev-with-agents, claude-md-patterns, agentic-coding-getting-started
- Content map:
- h2: The Trust Gradient
- h2: What Agents Get Wrong
- h3: Wrong Framework Version
- h3: Dependency Creep
- h3: Over-Engineering
- h3: Inconsistent Patterns
- h3: Silent Assumptions
- h2: The Review Checklist
- h2: The "Read the Diff" Habit
- h2: After the Review
- h2: Sources
- Crawl policy: same canonical content is exposed through HTML, Markdown, and llms-full; no crawler-specific content gate.
The agent writes the code. You own it. That means every line it produces is your responsibility — and you need a systematic way to review it.
## The Trust Gradient
Not all agent output deserves the same scrutiny. Calibrate review depth by risk:
| Risk Level | Examples | Review Approach |
|---|---|---|
| **Low** | CSS tweaks, adding a test, formatting | Scan the diff, verify it builds |
| **Medium** | New component, refactoring, API changes | Read every line, test manually |
| **High** | Auth logic, data mutations, build config | Read every line, trace the control flow, verify edge cases |
The mistake is treating everything as low-risk. The agent will happily modify your build pipeline with the same confidence it uses to fix a typo. The [krowdev retrospective](/article/building-krowdev-with-agents/) has concrete examples of this in a real project.
## What Agents Get Wrong
These failure modes appear consistently across projects and models:
### Wrong Framework Version
The agent's training data includes multiple versions of every framework. It will confidently generate Astro 4 patterns when you need Astro 6, or React class components when you use hooks. Check imports and API calls against your actual framework version.
### Dependency Creep
Ask for one feature, get three new npm packages. Agents default to installing libraries for things the standard library already handles. Before accepting a new dependency: check if the feature exists natively, check the package size, and check when it was last maintained.
### Over-Engineering
A request for "a breadcrumb component" returns a recursive navigation framework with configuration objects and abstract base classes. The agent optimizes for generality; your project needs specificity. If the solution is more complex than the problem, push back.
### Inconsistent Patterns
The agent doesn't remember your conventions between sessions unless [CLAUDE.md](/guide/claude-md-patterns/) tells it. It might use `camelCase` in one file and `snake_case` in another, or mix async patterns within the same module. Check for consistency with existing code.
### Silent Assumptions
The agent makes decisions without flagging them. It might choose a specific caching strategy, pick a default timeout value, or assume a particular database schema. These assumptions are embedded in the code without comments. Read for implicit decisions, not just explicit logic.
## The Review Checklist
Run through this for every non-trivial diff:
**Does it build?**
```bash
npm run build # or your equivalent
```
Never merge agent output you haven't built locally. "It looks right" is not verification.
**Does it match the request?**
Compare what you asked for against what you got. Agents frequently add features you didn't request, refactor code you didn't mention, or "improve" things that worked fine.
**Does it follow project conventions?**
- Correct framework/library versions
- Consistent naming patterns
- Same file organization as existing code
- No new dependencies without justification
**Is it the right complexity?**
Count the files changed. If you asked for a simple feature and the diff touches 12 files, something went wrong. The right solution is usually the smallest one that works.
**Are there security concerns?**
- User input sanitized?
- No hardcoded secrets?
- No eval() or equivalent?
- API endpoints validated?
**Does it handle the edge cases that matter?**
The agent often adds error handling for impossible states while missing realistic edge cases. Focus on: what happens with empty data, null values, network failures, and concurrent access.
## The "Read the Diff" Habit
The most important practice: read every diff before accepting it. Not skim — read. (See [Git Commands I Actually Use](/snippet/git-commands-i-use/) for the full reference card.)
```bash
git diff --staged # what you're about to commit
git diff HEAD~1 # what just landed
```
This sounds obvious. In practice, after hours of productive agent sessions, the temptation to "just accept and move on" is strong. That's exactly when bugs slip through.
:::warning
The agent's confidence is not correlated with correctness. It will present broken code with the same certainty as working code. Your review is the only quality gate.
:::
## After the Review
If you find a problem the agent should have avoided, add a rule to [CLAUDE.md](/guide/claude-md-patterns/). This is how the constraint system grows — through real failures, not hypothetical ones. Every bug that makes it past review is a missing rule.
## Sources
- Anthropic, [Code Review](https://code.claude.com/docs/en/code-review)
- Anthropic, [Common workflows](https://code.claude.com/docs/en/common-workflows)
- Git, [`git-diff` documentation](https://git-scm.com/docs/git-diff)
---
# What I Learned Building krowdev with AI Agents
URL: https://krowdev.com/article/building-krowdev-with-agents/
Kind: article | Maturity: budding | Origin: ai-drafted
Author: Agent | Directed by: krow
Tags: agentic-coding, meta
> Honest retrospective on building a developer knowledge base entirely with AI coding agents — what worked, what didn't, and what I'd do differently.
## Agent Context
- Canonical: https://krowdev.com/article/building-krowdev-with-agents/
- Markdown: https://krowdev.com/article/building-krowdev-with-agents.md
- Full corpus: https://krowdev.com/llms-full.txt
- Kind: article
- Maturity: budding
- Confidence: high
- Origin: ai-drafted
- Author: Agent
- Directed by: krow
- Published: 2026-03-20
- Modified: 2026-04-21
- Words: 1240 (6 min read)
- Tags: agentic-coding, meta
- Related: agentic-coding-getting-started, reviewing-ai-generated-code, claude-md-patterns
- Content map:
- h2: The Numbers
- h2: What Worked
- h3: Research-First Prompting
- h3: CLAUDE.md Evolution
- h3: Parallel Agent Workflows
- h3: Agent-Browser for Visual Review
- h2: What Didn't Work
- h3: Agent Drift
- h3: Stale Training Data
- h3: Over-Engineering Risk
- h2: What Surprised Me
- h3: Infrastructure Is Fast, Content Is Slow
- h3: How Much CLAUDE.md Matters
- h3: The Memory System
- h2: What I'd Do Differently
- h2: The Honest Summary
- h2: Sources
- Crawl policy: same canonical content is exposed through HTML, Markdown, and llms-full; no crawler-specific content gate.
I built this entire site — search, mobile nav, breadcrumbs, reader mode, a live WebTerminal island, and a custom Catppuccin theme — in about a week. I had never built a website before. My background is physics and scientific computing: Python, Fortran, Makefiles, conda environments. HTML was something I avoided.
This is what I learned.
## The Numbers
| Metric | Value |
|---|---|
| **Calendar time** | ~7 days, zero to deployed |
| **Content entries** | 8 published (guides, snippets, articles, showcases) |
| **Features shipped** | Search (Pagefind), breadcrumbs, mobile hamburger menu, reader mode with .md endpoints, JSON-LD, theme toggle, series sidebar, WebTerminal island |
| **Framework experience going in** | None. Zero. I knew what HTML stood for. |
Those numbers are real. They're also misleading — I'll explain why below.
## What Worked
### Research-First Prompting
The single highest-value pattern was making the agent research before building. Before writing any Astro code, I had Claude analyze 11 terminal emulator source repos, read the Astro 6 docs, and compare static site generators. This research-first pattern is the closest thing to a cheat code I've found.
:::tip
Don't start with "build me X." Start with "read the codebase and propose an approach for X." The five minutes of research prevents hours of rework.
:::
### CLAUDE.md Evolution
The [project rules file](/guide/claude-md-patterns/) started as 10 lines — stack name, file paths, "use Svelte not React." By day three it had boundary rules, build commands, test instructions, and file pointers. By day five it included the agent-browser review workflow.
Every addition came from a mistake. The agent installed a React dependency — I added "Svelte for all interactive islands." It modified reference material — I added "research/corpus/ is READ-ONLY." It broke the build command — I added "don't change the build command in package.json."
CLAUDE.md is not documentation you write up front. It's a growing record of constraints discovered through failure. Each rule exists because the agent got it wrong at least once.
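In practice, each addition was a one-line append made the moment the failure happened. A minimal sketch of that loop (the rule text mirrors the failures above; adjust paths to your project):
```bash
# Agent installed a React dependency: encode the correction immediately.
cat >> CLAUDE.md <<'EOF'

## Boundaries
- Svelte 5 for all interactive islands. NOT React.
- research/corpus/ is READ-ONLY. Never modify files there.
- Do not change the build command in package.json.
EOF
```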
### Parallel Agent Workflows
Once the project rules were solid, I could run multiple agent sessions in parallel — one writing content, one building components, one writing tests. They all read the same CLAUDE.md and produced consistent output. This is where the speed came from. Not from any single session being fast, but from multiple sessions being independent.
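A sketch of that setup, assuming one git worktree per session (paths and branch names are illustrative):
```bash
# One worktree per agent session: separate working directories,
# one shared repository, the same CLAUDE.md at every checkout root.
git worktree add -b agent/content    ../krowdev-content
git worktree add -b agent/components ../krowdev-components
git worktree add -b agent/tests      ../krowdev-tests
# Run one agent session in each directory; merge branches after review.
```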
### Agent-Browser for Visual Review
[Reviewing generated code](/guide/reviewing-ai-generated-code/) in a terminal is guessing. Reviewing it in a real browser through agent-browser is verification. I caught layout bugs, missing mobile styles, and broken breadcrumbs that looked fine in the code. Set this up on day one, not day five like I did.
## What Didn't Work
### Agent Drift
Despite "Svelte 5 for interactive islands (NOT React)" in CLAUDE.md, the agent suggested React components at least three times. Despite "Astro 6," it generated Astro 4 patterns (the content collections API changed significantly between versions). The agent's training data is a strong prior — your project rules are fighting against it.
:::warning
Agents are trained on millions of React examples and far fewer Svelte ones. If your stack is less common, expect to correct more often. Explicit "NOT X" rules in CLAUDE.md help but don't eliminate the problem.
:::
### Stale Training Data
Astro 6 changed its content collections API from Astro 4. The agent confidently generated the old API. Every. Single. Time. Until I put the correct import syntax directly in CLAUDE.md. The agent doesn't know what it doesn't know — and it won't tell you it's using outdated patterns.
### Over-Engineering Risk
Left unconstrained, the agent will add error handling for impossible states, create abstractions for things used once, and "improve" code you didn't ask about. I lost at least half a day to a session where I asked for "a simple breadcrumb component" and got a full navigation framework with recursive route resolution.
The constraint pattern — "do exactly this, nothing more" — exists because of this.
## What Surprised Me
### Infrastructure Is Fast, Content Is Slow
The entire site infrastructure — layout system, routing, search, mobile nav, theme toggle, CI/CD — took maybe two days. The content took the rest. Agents are excellent at mechanical, well-specified tasks (build a layout component that does X). They're mediocre at writing content that sounds like a real person with real opinions.
It's like the difference between setting up a lab and running experiments. The lab setup is procedural — follow the manual, connect the equipment, run calibration. The experiments require judgment, interpretation, and knowing what's interesting. Agents are lab technicians, not principal investigators.
### How Much CLAUDE.md Matters
The difference between a session with no project rules and a session with good ones is not incremental. It's categorical. Without CLAUDE.md, the agent produces generic, tutorial-quality code. With it, the agent produces code that fits your project. Ten minutes of writing rules saves hours per session, compounding across every future session.
### The Memory System
Claude's memory files — the ones that persist across conversations — turned out to be surprisingly valuable. Not for code details, but for project context: "the user has a physics background, explain via analogies" and "the blog uses Diataxis framework for content organization." This kind of meta-context is invisible in the codebase but dramatically affects output quality.
## What I'd Do Differently
**Start with content, not infrastructure.** I built the layout system, search, mobile nav, and theme toggle before writing a single real article. That's backwards. Content-first forces you to discover what the infrastructure actually needs, instead of guessing. I built three grid layout modes before I knew which ones my content would actually use.
**[Write CLAUDE.md from line one.](/guide/claude-md-patterns/)** Not "I'll add rules as I go" — write the initial stack, conventions, and boundaries before the first agent session. The cost of the first few unconstrained sessions was higher than the cost of writing 20 lines of rules.
**Set up agent-browser immediately.** I reviewed the first dozen generated pages by reading HTML in the terminal and imagining what they looked like. That's not review, that's hope. Visual review caught real bugs — and it's the only way to verify responsive layout, theme switching, and interactive components.
**Be more aggressive about saying no.** Agents propose scope expansion constantly. "While I'm here, I could also..." is the start of every derailed session. The answer should be "no, just do what I asked" far more often than I said it.
:::tip
The best prompt for preventing scope creep: "Do exactly this. Nothing more." Agents respect explicit boundaries better than implicit ones.
:::
## The Honest Summary
Building krowdev with AI agents was genuinely faster than building it without them would have been — probably by an order of magnitude, given my zero web development experience. But "faster" doesn't mean "easy." The skill isn't prompting. The skill is reviewing, constraining, and knowing when the agent is confidently wrong.
The site works. The tests pass. The content is growing. And I understand what every piece does, because I reviewed every line the agent wrote.
That last part is the thing nobody tells you about agentic coding: **you still have to understand all of it.** The agent writes the code, but you own it.
## Sources
- Anthropic, [Claude Code overview](https://code.claude.com/docs/en/overview)
- Anthropic, [How Claude remembers your project](https://code.claude.com/docs/en/memory)
- Anthropic, [Common workflows](https://code.claude.com/docs/en/common-workflows)
- OpenAI, [Codex web](https://developers.openai.com/codex/cloud)
---
# Git Commands I Actually Use
URL: https://krowdev.com/snippet/git-commands-i-use/
Kind: snippet | Maturity: budding | Origin: ai-drafted
Author: Agent | Directed by: krow
Tags: git, reference
> The 20% of git that covers 95% of daily work — no theory, just commands.
## Agent Context
- Canonical: https://krowdev.com/snippet/git-commands-i-use/
- Markdown: https://krowdev.com/snippet/git-commands-i-use.md
- Full corpus: https://krowdev.com/llms-full.txt
- Kind: snippet
- Maturity: budding
- Confidence: high
- Origin: ai-drafted
- Author: Agent
- Directed by: krow
- Published: 2026-03-20
- Modified: 2026-04-21
- Words: 444 (3 min read)
- Tags: git, reference
- Related: building-krowdev-with-agents, agentic-coding-getting-started
- Content map:
- h2: Daily
- h2: Branching
- h2: Checking History
- h2: Undoing Things
- h2: Working with Remotes
- h2: Stashing
- h2: Useful Aliases
- h2: Sources
- Crawl policy: same canonical content is exposed through HTML, Markdown, and llms-full; no crawler-specific content gate.
## Daily
```bash
# What changed?
git status
git diff # unstaged changes
git diff --staged # staged changes (about to commit)
# Commit
git add -p # stage hunks interactively — review what you're committing
git commit -m "message"
# Sync
git pull --rebase # pull without merge commits
git push
```
## Branching
```bash
# Create and switch
git checkout -b feature/thing
git switch -c feature/thing # modern equivalent
# Switch back
git checkout main
git switch main
# Delete after merge
git branch -d feature/thing # safe — refuses if unmerged
git branch -D feature/thing # force — deletes regardless
```
Branching becomes especially important with [agentic coding workflows](/guide/agentic-coding-getting-started/) — each AI agent runs in its own branch or worktree so they can't step on each other's work.
The review habit in [Reviewing AI-Generated Code](/guide/reviewing-ai-generated-code/) starts with `git diff`, and the larger project workflow in [What I Learned Building krowdev with AI Agents](/article/building-krowdev-with-agents/) depends on these commands staying boring and reliable.
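Worktrees are what make that isolation cheap: each branch gets its own directory, with no stashing and no second clone. Paths here are illustrative:
```bash
# One extra working directory per branch, backed by the same repo
git worktree add ../project-feature feature/thing  # check out an existing branch
git worktree add -b fix/urgent ../project-fix      # create branch and worktree together
git worktree list
git worktree remove ../project-fix
```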
## Checking History
```bash
# Recent commits
git log --oneline -10
git log --oneline --graph --all # visual branch topology
# What changed in a commit?
git show abc1234
git show abc1234 --stat # files only, no diff
# Who changed this line?
git blame src/lib/auth.ts
# Search commit messages
git log --grep="fix auth"
# Find when a string was added/removed
git log -S "functionName" --oneline
```
## Undoing Things
```bash
# Unstage a file (keep changes)
git restore --staged file.ts
# Discard local changes to a file
git restore file.ts
# Amend the last commit (message or content)
git add forgotten-file.ts
git commit --amend
# Undo last commit but keep changes staged
git reset --soft HEAD~1
# Undo last commit, unstage changes
git reset HEAD~1
```
## Working with Remotes
```bash
# See what's out there
git remote -v
git fetch --all
# Check divergence from upstream
git rev-list --left-right --count main...upstream/main
# Output: "18 29" means 18 ahead, 29 behind
# Rebase onto upstream
git fetch upstream
git rebase upstream/main
```
## Stashing
```bash
# Save work in progress
git stash
git stash push -m "wip: auth refactor"
# Get it back
git stash pop # apply and remove from stash
git stash apply # apply but keep in stash
# List stashes
git stash list
```
## Useful Aliases
Add to `~/.gitconfig`:
```ini
[alias]
s = status --short
l = log --oneline -20
d = diff
ds = diff --staged
co = checkout
cb = checkout -b
amend = commit --amend --no-edit
last = log -1 --stat
branches = branch -a --sort=-committerdate
```
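With those in place, the daily loop shortens:
```bash
git s         # short status
git ds        # review staged changes before committing
git amend     # fold a fix into the last commit, keep the message
git branches  # all branches, most recently active first
```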
## Sources
- Git, [git-diff documentation](https://git-scm.com/docs/git-diff)
- Git, [git-restore documentation](https://git-scm.com/docs/git-restore)
- Git, [git-worktree documentation](https://git-scm.com/docs/git-worktree)
---
# HTTP Status Codes That Actually Matter
URL: https://krowdev.com/snippet/http-status-codes/
Kind: snippet | Maturity: budding | Origin: ai-drafted
Author: Agent | Directed by: krow
Tags: networking, reference
> The 15 HTTP status codes you'll actually encounter in web development, with one-sentence real-world explanations.
## Agent Context
- Canonical: https://krowdev.com/snippet/http-status-codes/
- Markdown: https://krowdev.com/snippet/http-status-codes.md
- Full corpus: https://krowdev.com/llms-full.txt
- Kind: snippet
- Maturity: budding
- Confidence: high
- Origin: ai-drafted
- Author: Agent
- Directed by: krow
- Published: 2026-03-20
- Modified: 2026-04-21
- Words: 637 (3 min read)
- Tags: networking, reference
- Related: dns-resolution-full-picture, bot-detection-2026
- Content map:
- h2: 2xx — It Worked
- h2: 3xx — Go Somewhere Else
- h2: 4xx — You Messed Up
- h2: 5xx — The Server Messed Up
- h2: Quick Decision Guide
- h2: Sources
- Crawl policy: same canonical content is exposed through HTML, Markdown, and llms-full; no crawler-specific content gate.
There are ~75 official HTTP status codes. You'll encounter about 15 of them regularly. Here they are, grouped by what they mean in practice.
## 2xx — It Worked
- **`200 OK`** — The request succeeded and here's your data. The one you want to see.
- **`201 Created`** — The thing you asked to create (user, post, resource) now exists. Typical response to a successful `POST`.
- **`204 No Content`** — It worked, but there's nothing to send back. Common for `DELETE` requests — the thing is gone, what would I return?
## 3xx — Go Somewhere Else
- **`301 Moved Permanently`** — This URL has moved forever. Browsers cache the redirect and search engines update their index to the new URL. Use this when you rename a route and never want the old one back.
- **`302 Found`** — Temporary redirect. The resource is at a different URL right now, but clients should keep requesting the original URL. Login flows use this constantly.
- **`304 Not Modified`** — You already have the latest version (your cache is fine). The server checked your `If-Modified-Since` header and said "nothing changed, use what you have."
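You can watch a `304` happen with a conditional request. The URL is illustrative; any server that sends `Last-Modified` or `ETag` will do:
```bash
# First request: note the Last-Modified and ETag response headers.
curl -sI https://example.com/style.css
# Replay with a validator; if nothing changed, the server answers 304
# with an empty body instead of resending the resource.
curl -sI -H 'If-Modified-Since: Tue, 01 Apr 2026 00:00:00 GMT' \
  https://example.com/style.css
```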
## 4xx — You Messed Up
- **`400 Bad Request`** — The server can't understand what you sent. Malformed JSON, missing required fields, invalid query parameters. Check your request body.
- **`401 Unauthorized`** — You're not logged in (or your token expired). Despite the name, this is about *authentication*, not authorization — the server doesn't know who you are.
- **`403 Forbidden`** — The server knows who you are and you're not allowed. Unlike `401`, logging in again won't help — you don't have permission.
- **`404 Not Found`** — Nothing exists at this URL. Either the route is wrong, the resource was deleted, or you have a typo. The most famous status code for a reason. (In [DNS resolution](/guide/dns-resolution-full-picture/), the equivalent is `NXDOMAIN` — "this domain doesn't exist.")
- **`405 Method Not Allowed`** — The URL exists, but not for that HTTP method. You sent a `DELETE` to an endpoint that only accepts `GET`. Check your method.
- **`422 Unprocessable Entity`** — Renamed "Unprocessable Content" in RFC 9110, but the old name stuck. The JSON is valid, but the data doesn't make sense. Your email field contains "not-an-email" or the date is in the wrong format. Many APIs use this instead of `400` for validation errors.
- **`429 Too Many Requests`** — You're being rate-limited. Slow down. Check the `Retry-After` header to know when you can try again. (This is the signal [bot detection systems](/article/bot-detection-2026/) use to push back on aggressive clients, and the one [AIMD rate limiting](/note/aimd-rate-limiting/) is designed to react to automatically.)
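A sketch of a client that honors `Retry-After` when present and falls back to exponential backoff when it isn't. The endpoint is illustrative, and the sketch assumes the delta-seconds form of the header rather than the HTTP-date form:
```bash
url='https://api.example.com/resource'   # illustrative endpoint
delay=1
for attempt in 1 2 3 4 5; do
  status=$(curl -s -o /dev/null -D headers.txt -w '%{http_code}' "$url")
  [ "$status" != "429" ] && break
  # Prefer the server's own Retry-After value (in seconds) when present.
  wait=$(awk 'tolower($1) == "retry-after:" {print $2 + 0}' headers.txt)
  sleep "${wait:-$delay}"
  delay=$((delay * 2))
done
echo "final status: $status"
```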
## 5xx — The Server Messed Up
- **`500 Internal Server Error`** — Something crashed on the server. An unhandled exception, a null pointer, a database query that blew up. Not your fault as the client — check server logs.
- **`503 Service Unavailable`** — The server is down or overloaded. Deployments, maintenance windows, and traffic spikes all produce this. Usually temporary — try again in a minute.
## Quick Decision Guide
| You see... | First thing to check |
|---|---|
| `401` | Is your auth token present and not expired? |
| `403` | Does this user/role have permission for this action? |
| `404` | Is the URL correct? Is the resource ID valid? |
| `422` vs `400` | `400` = bad syntax, `422` = bad semantics. Check API docs for which one they use. |
| `429` | Read the `Retry-After` header. Add exponential backoff. |
| `500` | Not a client problem. Check server logs, not your request. |
| `503` | Wait and retry. If persistent, check if the service is deploying or down. |
## Sources
- IETF, [RFC 9110: HTTP Semantics](https://www.rfc-editor.org/rfc/rfc9110)
- IETF, [RFC 6585: Additional HTTP Status Codes](https://www.rfc-editor.org/rfc/rfc6585)
- MDN, [HTTP response status codes](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status)
---
# Interactive Features Showcase
URL: https://krowdev.com/snippet/interactive-features-showcase/
Kind: snippet | Maturity: evergreen | Origin: ai-assisted
Author: Agent | Directed by: krow
Tags: astro, reference
> All the interactive components, code features, and eye candy available in krowdev articles.
## Agent Context
- Canonical: https://krowdev.com/snippet/interactive-features-showcase/
- Markdown: https://krowdev.com/snippet/interactive-features-showcase.md
- Full corpus: https://krowdev.com/llms-full.txt
- Kind: snippet
- Maturity: evergreen
- Confidence: high
- Origin: ai-assisted
- Author: Agent
- Directed by: krow
- Published: 2026-03-17
- Modified: 2026-04-21
- Words: 744 (4 min read)
- Tags: astro, reference
- Related: astro-mental-model
- Content map:
- h2: Code Blocks
- h2: Line Highlighting
- h2: Diff Notation
- h2: Editor and Terminal Frames
- h2: Collapsible Sections
- h2: Callout Boxes
- h2: Challenge Blocks
- h2: Code View Tabs
- h2: Sources
- Crawl policy: same canonical content is exposed through HTML, Markdown, and llms-full; no crawler-specific content gate.
This page demonstrates every interactive feature available when writing krowdev content. Use it as a reference when creating new articles. If you're new to how Astro renders these components, start with [the mental model](/guide/astro-mental-model/) — everything here is compiled to static HTML at build time. If the styling here ever looks odd, [Bare Element Selectors vs Library HTML](/snippet/bare-selectors-vs-library-html/) and [CSS Collision Visualized](/snippet/css-collision-visualized/) explain the most common global-CSS collisions.
## Code Blocks
Every fenced code block automatically gets a **language badge**, a **copy button** (hover to reveal), and proper Catppuccin syntax highlighting.
```python
def fibonacci(n):
"""Generate the first n Fibonacci numbers."""
a, b = 0, 1
for _ in range(n):
yield a
a, b = b, a + b
for num in fibonacci(10):
print(num)
```
```bash
# Install dependencies and build
npm install
npm run build
npm run preview
```
```sql
SELECT users.name, COUNT(posts.id) AS post_count
FROM users
LEFT JOIN posts ON users.id = posts.author_id
GROUP BY users.name
HAVING COUNT(posts.id) > 5
ORDER BY post_count DESC;
```
## Line Highlighting
Use `mark={lines}` in the code fence meta to highlight specific lines:
```javascript mark={2-3}
function greet(name) {
const greeting = `Hello, ${name}!`;
console.log(greeting);
return greeting;
}
```
## Diff Notation
Use `ins={lines}` and `del={lines}` to show additions and removals:
```javascript del={5} ins={6-7}
function createUser(name, email) {
return {
name,
email,
role: 'viewer',
role: 'editor',
createdAt: Date.now(),
};
}
```
## Editor and Terminal Frames
Code blocks auto-detect their frame type. Use `title="filename"` for editor tabs:
```js title="src/utils/helper.js"
export function greet(name) {
return `Hello, ${name}!`;
}
```
Shell languages get terminal frames automatically:
```bash title="Installing dependencies"
npm install astro-expressive-code
```
## Collapsible Sections
Use `collapse={lines}` to collapse less-important lines:
```typescript collapse={1-4}
interface BlogPost {
title: string;
date: Date;
tags: string[];
content: string;
draft: boolean;
}
function publishPost(post: BlogPost): void {
if (post.draft) {
throw new Error('Cannot publish a draft');
}
// ... publish logic
}
```
## Callout Boxes
Eight types, each with a Catppuccin accent color:
:::note
**Notes** provide additional context. They use the blue accent.
:::
:::tip
**Tips** suggest best practices. They use the green accent.
:::
:::info
**Info** blocks share background details. They use the sapphire accent.
:::
:::warning
**Warnings** flag common mistakes. They use the yellow accent.
:::
:::danger
**Danger** blocks mark breaking or destructive actions. They use the red accent.
:::
:::caution
**Caution** blocks advise careful consideration. They use the peach accent.
:::
:::analogy
**Analogies** map to familiar concepts. Think of Astro components like Python functions — they take arguments (props) and return a result (HTML).
:::
:::key
**Key insights** highlight the most important takeaway. This is what you'd underline in a textbook.
:::
## Challenge Blocks
Interactive exercises that expand on click:
**Challenge: Build a greeting component**
Create a `Greeting.astro` component that:
1. Accepts a `name` prop (string)
2. Renders `<h2>Hello, {name}!</h2>`
3. Uses a scoped style to color the text with `var(--accent)`
```astro
---
interface Props {
name: string;
}
const { name } = Astro.props;
---
<h2>Hello, {name}!</h2>
<style>
h2 { color: var(--accent); }
</style>
```
**What happens if you forget the Props interface?**
TypeScript won't catch incorrect prop usage at build time. You'll get `undefined` instead of a type error. Always define the interface — it's your safety net.
## Code View Tabs
The Source / Compiled / Rendered pattern for showing how Astro transforms code:
<p class="cv-label">src/components/Badge.astro</p>
<nav class="cv-tabs" role="tablist">
<button class="cv-tab active" role="tab" data-tab="source" aria-selected="true">Source</button>
<button class="cv-tab" role="tab" data-tab="compiled" aria-selected="false">Compiled</button>
<button class="cv-tab" role="tab" data-tab="rendered" aria-selected="false">Rendered</button>
</nav>
<div class="cv-panel active" data-panel="source" role="tabpanel">
```astro
---
interface Props { label: string; color?: string; }
const { label, color = 'var(--accent)' } = Astro.props;
---
<span class="badge" style={`--badge-color: ${color}`}>
{label}
</span>
<style>
.badge {
display: inline-flex;
padding: 0.15rem 0.6rem;
border-radius: 999px;
font-size: 0.75rem;
font-weight: 600;
color: var(--badge-color);
border: 1px solid var(--badge-color);
background: color-mix(in srgb, var(--badge-color) 10%, transparent);
}
</style>
```
<div class="cv-panel" data-panel="compiled" role="tabpanel">
```html
<span class="badge" style="--badge-color: var(--accent)"
data-astro-cid-x7q2k1>
beginner
</span>
<style>
.badge[data-astro-cid-x7q2k1] {
display: inline-flex;
padding: 0.15rem 0.6rem;
/* ... scoped to this component only */
}
</style>
```
<div class="cv-panel" data-panel="rendered" role="tabpanel">
The compiled output shows Astro's scoped CSS in action. The `data-astro-cid-x7q2k1` attribute uniquely identifies this component instance, ensuring styles never leak to other elements.
</div>
## Sources
- Astro Docs, [MDX integration](https://docs.astro.build/en/guides/integrations-guide/mdx/)
- Astro Docs, [Styles and CSS](https://docs.astro.build/en/guides/styling/)
- Expressive Code, [Installing Expressive Code](https://expressive-code.com/installation/)
- Expressive Code, [Collapsible Sections](https://expressive-code.com/plugins/collapsible-sections/)
---
# Getting Started with Agentic Coding
URL: https://krowdev.com/guide/agentic-coding-getting-started/
Kind: guide | Maturity: budding | Origin: ai-drafted
Author: Agent | Directed by: krow
Tags: agentic-coding, fundamentals
> What agentic coding is, why it matters, and how to start using AI coding agents effectively.
## Agent Context
- Canonical: https://krowdev.com/guide/agentic-coding-getting-started/
- Markdown: https://krowdev.com/guide/agentic-coding-getting-started.md
- Full corpus: https://krowdev.com/llms-full.txt
- Kind: guide
- Maturity: budding
- Confidence: high
- Origin: ai-drafted
- Author: Agent
- Directed by: krow
- Published: 2026-03-15
- Modified: 2026-04-21
- Words: 640 (3 min read)
- Tags: agentic-coding, fundamentals
- Related: claude-md-patterns, reviewing-ai-generated-code, astro-mental-model, building-krowdev-with-agents
- Content map:
- h2: What Makes It "Agentic"?
- h2: What Agents Are Good At
- h2: What Agents Struggle With
- h2: Core Patterns
- h2: Start Here
- h2: Sources
- Crawl policy: same canonical content is exposed through HTML, Markdown, and llms-full; no crawler-specific content gate.
Agentic coding is the practice of using AI agents — like [Claude Code](https://docs.anthropic.com/en/docs/claude-code), [Codex CLI](https://github.com/openai/codex), or [Cursor](https://cursor.com) — as active collaborators in the development process, rather than just autocomplete tools. This guide covers what it is, how it differs from traditional AI coding, and how to start effectively.
## What Makes It "Agentic"?
The key difference from traditional AI-assisted coding:
| | Traditional AI Assist | Agentic Coding |
|---|---|---|
| **Scope** | Single lines / functions | Entire features across files |
| **Interaction** | You type, it autocompletes | You describe intent, it plans and executes |
| **Context** | Current file only | Reads your codebase, project rules, docs |
| **Memory** | None between prompts | Session context, CLAUDE.md, memory files |
| **Decision-making** | You drive everything | Agent makes decisions, you review |
| **Tool use** | Suggestions only | Reads files, runs commands, creates PRs |
The shift is from "smarter autocomplete" to "junior developer that works fast, reads everything, and needs code review."
## What Agents Are Good At
Based on real experience [building krowdev](/article/building-krowdev-with-agents/) and WebTerminal:
- **Reading large codebases fast** — an agent analyzed 11 terminal emulator source repos in hours, extracting architecture patterns that would take weeks manually
- **Consistent formatting and boilerplate** — schema definitions, test scaffolds, CSS custom properties
- **Cross-file refactors** — renaming a concept across 15 files, updating imports, fixing references
- **Research synthesis** — reading docs, comparing approaches, summarizing trade-offs (see [Parallel AI Research Pipelines](/article/parallel-ai-research-pipelines/) for how this scales)
- **Mechanical work you understand** — "add breadcrumbs to every entry page" when you know exactly what breadcrumbs should look like
## What Agents Struggle With
- **Taste and judgment** — they'll over-engineer, add unnecessary abstractions, and optimize things that don't need optimizing
- **Knowing when to stop** — without constraints, they'll keep "improving" code until it's unrecognizable
- **Your project's history** — they don't know why a decision was made, only what the code looks like now
- **Novel architecture** — they recombine patterns from training data, they don't invent genuinely new approaches
- **Subtle bugs** — they're confident, not careful. Their code works on the happy path but may miss edge cases
## Core Patterns
This knowledge base documents the patterns that make agentic coding work:
- **Prompt Patterns** — getting better results from each interaction
- **Context Management** — feeding agents the right information (see [Writing an Effective CLAUDE.md](/guide/claude-md-patterns/))
- **Code Review** — systematic review of agent output (see [Reviewing AI-Generated Code](/guide/reviewing-ai-generated-code/))
## Start Here
**Your first agentic task should be small, well-defined, and reviewable:**
1. **Pick a task you already know how to do** — so you can evaluate the agent's output. A bug fix, a utility function, a styling change.
2. **Write a clear prompt** describing the *what* and *why*, not the *how*. "Add a 404 page that matches the site design with links back to the homepage and explore page" is better than "create src/pages/404.astro with an h1 and two anchor tags."
3. **Let the agent propose before it builds.** If you're using plan mode or asking for an approach first, you catch bad ideas before they become bad code.
4. **Review the output like a code review.** Read every changed line. Agents are confident — they'll commit to an approach even when it's wrong. Your job is to catch the 10% that's subtly incorrect.
5. **Document what you learn.** The prompt that worked, the constraint that prevented over-engineering, the anti-pattern that wasted an hour. That's what this knowledge base is for.
**Your second task should use CLAUDE.md.** Create a project rules file before starting. Even 10 lines of stack + conventions context dramatically improves output quality. See [Writing an Effective CLAUDE.md](/guide/claude-md-patterns/) for patterns.
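As a sketch, a starter file for a project like this one might be no more than (contents illustrative; yours will differ):
```bash
cat > CLAUDE.md <<'EOF'
# Project rules
- Stack: Astro 6, Svelte 5 for interactive islands (NOT React), TypeScript.
- Content: markdown entries under src/content/ with schema validation.
- Build: npm run build. Do not change the build command in package.json.
- research/corpus/ is READ-ONLY. Never modify files there.
- Prefer the smallest change that works.
- No new dependencies without justification.
EOF
```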
## Sources
- Anthropic, [Claude Code overview](https://code.claude.com/docs/en/overview)
- Anthropic, [Common workflows](https://code.claude.com/docs/en/common-workflows)
- OpenAI, [Codex web](https://developers.openai.com/codex/cloud)
---
# The Mental Model
URL: https://krowdev.com/guide/astro-mental-model/
Kind: guide | Maturity: evergreen | Origin: ai-drafted
Author: Agent | Directed by: krow
Tags: astro, fundamentals
> What Astro actually is — a compiler, not a server. The single concept that makes everything else click.
## Agent Context
- Canonical: https://krowdev.com/guide/astro-mental-model/
- Markdown: https://krowdev.com/guide/astro-mental-model.md
- Full corpus: https://krowdev.com/llms-full.txt
- Kind: guide
- Maturity: evergreen
- Confidence: high
- Origin: ai-drafted
- Author: Agent
- Directed by: krow
- Published: 2026-03-15
- Modified: 2026-04-21
- Words: 589 (3 min read)
- Tags: astro, fundamentals
- Related: agentic-coding-getting-started, interactive-features-showcase
- Content map:
- h2: Astro Is a Compiler
- h2: What "Static Site Generator" Means
- h2: The Two Phases
- h2: Build Time Is Your Superpower
- h2: Sources
- Crawl policy: same canonical content is exposed through HTML, Markdown, and llms-full; no crawler-specific content gate.
## Astro Is a Compiler
Astro is **not** a web server. It's a compiler — like LaTeX or a C compiler. You feed it source files, it outputs finished HTML.
:::analogy
**LaTeX:** `.tex` files → `pdflatex` → `.pdf` files you distribute
**Astro:** `.astro` + `.md` files → `astro build` → `.html` + `.css` files you upload
:::
The output (`dist/` folder) is a pile of static files. No Python process running, no database, no server-side logic. Cloudflare (or any host) just serves files — like putting PDFs on a file server.
## What "Static Site Generator" Means
The term is literal:
1. **Static** — the output is fixed HTML files, not dynamically generated per request
2. **Site** — a collection of web pages
3. **Generator** — a program that produces them from source templates
When someone visits `krowdev.pages.dev/article/welcome-to-krowdev/`, Cloudflare finds `dist/article/welcome-to-krowdev/index.html` and sends it. No code runs. It's as fast as file serving can be.
## The Two Phases
Everything in Astro happens in one of two phases:
| Phase | When | Where | What Runs |
|---|---|---|---|
| **Build time** | When you run `npm run build` | Your machine or CI | All your Astro/TS code, markdown processing, image optimization |
| **Runtime** | When someone visits the site | User's browser | Only explicit `<script>` tags (if any) |
:::key
By default, Astro ships **zero JavaScript** to the browser. Your code runs once at build time and produces pure HTML. Any JS on the page — theme toggle, search, interactive islands — is explicitly opted into. (See the [interactive features showcase](/snippet/interactive-features-showcase/) for every component available on this site.)
:::
This is the opposite of React/Next.js, where a JavaScript application runs in the browser. With Astro, the browser just renders HTML — like opening a `.html` file from your desktop.
## Build Time Is Your Superpower
Because code runs at build time, you can do expensive things for free:
- Query every markdown file and sort them by date? **Free** — happens once during build
- Validate every article's frontmatter against a schema? **Free** — happens at build time
- Optimize images and compress fonts? **Free** — done during build
- Generate a search index for every page? **Free** — Pagefind runs post-build
None of this costs anything at runtime. Your visitors get pre-computed results.
:::analogy
Think of it like precomputing a lookup table vs. calculating on the fly. Astro precomputes everything into HTML so there's nothing left to compute when someone visits.
:::
**Challenge: Verify the mental model**
Open the krowdev project and run:
```bash
cd site
npm run build
```
Now look at the output:
```bash
ls dist/
ls dist/guide/agentic-coding-getting-started/
head -5 dist/guide/agentic-coding-getting-started/index.html
```
**Confirm:** the output is just `.html` files. No `.js` bundles (except the tiny theme toggle), no server code. This is what gets uploaded to Cloudflare. (This entire site was [built with AI agents](/guide/agentic-coding-getting-started/) and the implementation details are covered in [What I Learned Building krowdev with AI Agents](/article/building-krowdev-with-agents/).)
**What about dynamic features like search?**
Pagefind builds a search index at build time — a compressed data file. The search UI loads this index in the browser and searches client-side. No server needed. This is the "precompute everything" pattern taken to its logical end.
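You can see that precomputation directly, assuming Pagefind's default output layout:
```bash
# Build the static site, then build the search index over the output.
npm run build
npx pagefind --site dist
# The "index" is just more static files, served like everything else.
ls dist/pagefind/
```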
## Sources
- Astro docs, [Why Astro?](https://docs.astro.build/en/concepts/why-astro/)
- Astro docs, [Pages](https://docs.astro.build/en/basics/astro-pages/)
- Astro docs, [Routing](https://docs.astro.build/en/guides/routing/)
- Astro docs, [Islands architecture](https://docs.astro.build/en/concepts/islands/)