Kind: article
Maturity: budding
Confidence: high
Origin: ai-drafted
Series: domain-infrastructure #3
Tags: security, web, fingerprinting, anti-detection

How Websites Detect Bots in 2026

Layer-by-layer breakdown of how Cloudflare, Akamai, and DataDome detect automation — from TCP SYN packets through TLS handshakes to behavioral signals. Based on empirical captures, not speculation.

The Detection Hierarchy

Bot detection is a layered system. Each layer fires at a different point in the connection lifecycle, and each one can reject you before the next layer even runs. Here’s the order, from earliest to latest:

  1. TCP/IP fingerprint — before encryption, before HTTP, before anything
  2. TLS ClientHello — during the handshake, before any application data
  3. HTTP/2 SETTINGS — the first application frame after TLS completes
  4. HTTP header order and values — the actual request
  5. Client Hints coherence — cross-header consistency checks
  6. IP reputation / ASN classification — datacenter IP = suspicion
  7. Behavioral signals — timing, navigation patterns, mouse movement

The critical insight: the first four layers (TCP/IP, TLS, HTTP/2 SETTINGS, and HTTP headers, covered below as Layers 0 through 3) are checked before a single byte of your “page content” loads. No JavaScript runs. No CAPTCHA renders. The server already knows if your connection looks like a browser or a script.

Modern anti-bot systems look for cross-layer consistency — what they call “stack drift.” A perfect TLS fingerprint paired with wrong HTTP/2 settings is more suspicious than getting both slightly wrong. Every layer must tell the same story.

Layer 0: TCP/IP Fingerprinting

The TCP SYN packet — the very first packet of any connection — reveals the operating system. This happens before encryption, before TLS, before HTTP. The server (or its CDN) sees raw TCP parameters that differ by OS:

| Parameter | Linux | Windows | macOS |
|---|---|---|---|
| Initial TTL | 64 | 128 | 64 |
| TCP Window Size | 29,200 (kernel 3.x) / 64,240 (5.x+) | 65,535 | 65,535 |
| Window Scale | 7 | 8 | varies |
| TCP Options Order | MSS, SACK_PERM, TIMESTAMP, NOP, WSCALE | MSS, NOP, WSCALE, NOP, NOP, SACK_PERM (no TIMESTAMP) | MSS, NOP, WSCALE, NOP, NOP, TIMESTAMP, SACK_PERM |

Windows is the outlier: TTL of 128, no TIMESTAMP option. Linux and macOS share TTL 64 but differ in TCP options order. Tools like p0f and Zardaxt (used by DataDome in production) classify OS from these values.
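A minimal sketch of how such a classifier could work, using only the values from the table above. The function names (`initial_ttl`, `guess_os`) are hypothetical, and real tools such as p0f rely on much larger signature databases than this three-row lookup.

```python
from typing import Sequence

# Values transcribed from the table above. Illustrative only: production
# classifiers match against far more signatures than these three.
OPTION_ORDERS = {
    "Linux": ("MSS", "SACK_PERM", "TIMESTAMP", "NOP", "WSCALE"),
    "Windows": ("MSS", "NOP", "WSCALE", "NOP", "NOP", "SACK_PERM"),
    "macOS": ("MSS", "NOP", "WSCALE", "NOP", "NOP", "TIMESTAMP", "SACK_PERM"),
}
INITIAL_TTL = {"Linux": 64, "Windows": 128, "macOS": 64}


def initial_ttl(observed_ttl: int) -> int:
    """Round an observed TTL up to the nearest common initial value.

    Every router hop decrements the TTL by one, so a packet that left the
    client with TTL 64 usually arrives with a somewhat smaller value.
    """
    for candidate in (64, 128, 255):
        if observed_ttl <= candidate:
            return candidate
    raise ValueError(f"TTL out of range: {observed_ttl}")


def guess_os(observed_ttl: int, options: Sequence[str]) -> str:
    """Return the OS whose initial TTL and option order both match."""
    ttl = initial_ttl(observed_ttl)
    for os_name, order in OPTION_ORDERS.items():
        if INITIAL_TTL[os_name] == ttl and tuple(options) == order:
            return os_name
    return "unknown"
```

Rounding the TTL up before comparing is the important design choice: the server never sees the initial value directly, only what is left after the packet has crossed the network.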

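The JA4T fingerprint formalizes this as four underscore-separated fields: window size, TCP options (in the order sent), MSS, and window scale. TTL is not part of the JA4T string and is evaluated alongside it. The fingerprint is compact enough to index and fast enough to check on every connection.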

The proxy problem

When traffic routes through a proxy (SOCKS5, CONNECT), the target server sees the proxy’s TCP stack, not yours. If the proxy runs Linux (TTL=64, Linux TCP options) but your User-Agent claims Windows (TTL=128, Windows TCP options), that’s a detectable mismatch.

In practice, most proxy servers run Linux. This means:

  • macOS User-Agents: Lower risk. macOS and Linux both use TTL=64, so the TTL check passes, although the TCP options order still differs between the two (see the table above).
  • Windows User-Agents: Risky. TTL=64 from the Linux proxy contradicts the expected TTL=128 from a Windows machine.

This is the kind of cross-layer inconsistency that modern systems catch — the TCP layer and the HTTP layer are telling different stories about the operating system.
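The proxy mismatch can be sketched as a check between the OS claimed in the User-Agent and the TTL observed on the wire. This is an illustration under simple assumptions, not any vendor's implementation: the helper names are hypothetical and the User-Agent parsing is deliberately crude.

```python
from typing import Optional

# Expected initial TTL per claimed OS, taken from the Layer 0 table.
EXPECTED_INITIAL_TTL = {"Windows": 128, "macOS": 64, "Linux": 64}


def claimed_os(user_agent: str) -> Optional[str]:
    """Rough OS extraction from a User-Agent string (substring match only)."""
    if "Windows NT" in user_agent:
        return "Windows"
    if "Mac OS X" in user_agent:
        return "macOS"
    if "Linux" in user_agent:
        return "Linux"
    return None


def ttl_contradicts_user_agent(user_agent: str, observed_ttl: int) -> bool:
    """True when the observed TTL cannot have come from the claimed OS."""
    expected = EXPECTED_INITIAL_TTL.get(claimed_os(user_agent))
    if expected is None:
        return False  # no OS claim to contradict
    # Hops only ever decrement TTL, so round the observed value up.
    if observed_ttl <= 64:
        observed_initial = 64
    elif observed_ttl <= 128:
        observed_initial = 128
    else:
        observed_initial = 255
    return observed_initial != expected
```

With a Linux proxy in the path, a Windows User-Agent arriving with TTL 55 returns `True`, while a macOS User-Agent with the same TTL returns `False`.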

Layer 1: TLS ClientHello

The TLS handshake happens before any HTTP data crosses the wire. The ClientHello message contains a rich set of signals:

  • Cipher suites: count, order, values (including GREASE tokens)
  • TLS extensions: count, order, values (including BoringSSL-specific ones)
  • Supported groups (elliptic curves)
  • Signature algorithms
  • ALPN values (h2, http/1.1)
  • Key share groups

Each browser has a distinct combination. Chrome uses BoringSSL, Firefox uses NSS, and Safari uses Apple’s own TLS stack (historically Secure Transport). These crypto libraries produce fundamentally different ClientHello messages: different cipher suites, different extension sets, different ordering.

JA3: the original, now largely obsolete

JA3 hashes TLS version + cipher suites + extensions + elliptic curves + EC point formats into an MD5 fingerprint. It worked well until Chrome 110 (January 2023) introduced TLS extension order randomization — a deliberate anti-fingerprinting measure. Now every Chrome connection produces a different JA3 hash:

| Impersonation Target | JA3 Hash |
|---|---|
| Chrome 120 | 9cc9e346... |
| Chrome 124 | 351d0eae... |
| Chrome 131 | cdbf6205... |
| Chrome 133 | a6d135b0... |
| Chrome 136 | 2d04cd75... |

Different hash every time, same browser. JA3 is still useful for detecting non-browser clients (Python requests, Go’s net/http, raw curl) which don’t randomize — but it’s useless for distinguishing Chrome versions.
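To see why randomization breaks JA3, here is a short sketch of the hash construction: the five fields are written as decimal values, joined with dashes inside a field and commas between fields, then hashed with MD5. The numeric values in the usage note are illustrative rather than a real capture, and GREASE values are assumed to have been stripped by the caller.

```python
import hashlib
from typing import Sequence


def ja3_string(version: int, ciphers: Sequence[int], extensions: Sequence[int],
               curves: Sequence[int], point_formats: Sequence[int]) -> str:
    """Build the comma-separated JA3 string; order inside each field is preserved."""
    groups = (ciphers, extensions, curves, point_formats)
    fields = [str(version)] + ["-".join(str(v) for v in group) for group in groups]
    return ",".join(fields)


def ja3_hash(version: int, ciphers: Sequence[int], extensions: Sequence[int],
             curves: Sequence[int], point_formats: Sequence[int]) -> str:
    """MD5 of the JA3 string, as a 32-character hex digest."""
    text = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(text.encode("ascii")).hexdigest()
```

Calling `ja3_hash(771, [4865, 4866, 4867], [0, 23, 65281, 10, 11], [29, 23, 24], [0])` and then again with the extension list reversed gives two different digests, even though the set of extensions is identical. Because order is preserved in the hashed string, any shuffle produces a new fingerprint.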

JA4: the current standard

JA4, adopted by Cloudflare, AWS WAF, VirusTotal, and Akamai as of 2026, fixes this with a three-part fingerprint: a_b_c.

  • Part A (human-readable): protocol type, TLS version, SNI presence, cipher count, extension count, first ALPN
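  • Part B: SHA256 of the sorted cipher suites, truncated to 12 hex characters, so it is immune to order randomization
  • Part C: SHA256 of the sorted extensions plus the signature algorithms (kept in their original order), also truncated to 12 hex characters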

Sorting before hashing is the key innovation. Chrome can randomize extension order all it wants — the sorted hash is stable.
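A simplified sketch of parts B and C shows the sort-then-hash idea. It follows the published JA4 description as best understood here (four-digit hex codes, comma-joined, SHA256 truncated to 12 characters, SNI and ALPN left out of part C), but it is not a reference implementation and the function names are hypothetical.

```python
import hashlib
from typing import Iterable, Sequence


def is_grease(value: int) -> bool:
    """GREASE values look like 0x0A0A, 0x1A1A, ... 0xFAFA."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def _truncated_sha256(text: str) -> str:
    return hashlib.sha256(text.encode("ascii")).hexdigest()[:12]


def ja4_b(ciphers: Iterable[int]) -> str:
    """Hash of the cipher suites, sorted so wire order does not matter."""
    codes = sorted(f"{c:04x}" for c in ciphers if not is_grease(c))
    return _truncated_sha256(",".join(codes))


def ja4_c(extensions: Iterable[int], signature_algorithms: Sequence[int]) -> str:
    """Hash of sorted extensions plus signature algorithms in original order."""
    skipped = {0x0000, 0x0010}  # SNI and ALPN are already recorded in part A
    exts = sorted(f"{e:04x}" for e in extensions
                  if e not in skipped and not is_grease(e))
    sigs = [f"{s:04x}" for s in signature_algorithms]
    return _truncated_sha256(",".join(exts) + "_" + ",".join(sigs))
```

Here `ja4_b([0x1301, 0x1302, 0x1303])` and `ja4_b([0x1303, 0x1301, 0x1302])` return the same 12-character value, which is the stability property that JA3 lacks.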

Empirical captures confirm this. All Chrome 120-131 targets produce the same JA4 parts A and B, with part C changing only when Chrome updated its signature algorithms between versions 131 and 133:

| Chrome Version Range | JA4 |
|---|---|
| 120 - 131 | t13d1516h2_8daaf6152771_02713d6af862 |
| 133 - 136+ | t13d1516h2_8daaf6152771_d8a2da3f94cd |

The t13d1516h2 prefix decodes to: t = TLS over TCP, 13 = TLS 1.3, d = SNI present (a domain name), 15 cipher suites and 16 extensions (both counted after GREASE removal), and h2 = HTTP/2 as the first ALPN value. Cloudflare sees 15 million unique JA4 fingerprints daily across 500 million+ user agents. A Python script using the requests library has a JA4 that matches exactly zero of those 15 million real browser fingerprints.
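The fixed-width layout of part A makes it easy to decode by position. The sketch below assumes the ten-character layout used by the fingerprints in this article (transport, two-digit TLS version, SNI flag, two-digit cipher count, two-digit extension count, two-character ALPN code); `decode_ja4_a` is a hypothetical helper name.

```python
TRANSPORTS = {"t": "TCP", "q": "QUIC"}
TLS_VERSIONS = {"13": "TLS 1.3", "12": "TLS 1.2", "11": "TLS 1.1", "10": "TLS 1.0"}
SNI_FLAGS = {"d": "SNI present (domain)", "i": "no SNI"}


def decode_ja4_a(part_a: str) -> dict:
    """Decode the human-readable first segment of a JA4 fingerprint."""
    if len(part_a) != 10:
        raise ValueError(f"expected 10 characters, got {len(part_a)}")
    return {
        "transport": TRANSPORTS.get(part_a[0], part_a[0]),
        "tls_version": TLS_VERSIONS.get(part_a[1:3], part_a[1:3]),
        "sni": SNI_FLAGS.get(part_a[3], part_a[3]),
        "cipher_count": int(part_a[4:6]),
        "extension_count": int(part_a[6:8]),
        "alpn": part_a[8:10],
    }


def split_ja4(fingerprint: str) -> tuple:
    """Split a full JA4 string into its (a, b, c) segments."""
    a, b, c = fingerprint.split("_")
    return a, b, c
```

For example, `decode_ja4_a(split_ja4("t13d1516h2_8daaf6152771_02713d6af862")[0])` yields a cipher count of 15, an extension count of 16, and the ALPN code `h2`.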

The JA4+ family

JA4 spawned a family of fingerprints covering the full stack:

  • JA4S: Server Hello fingerprint
  • JA4H: HTTP client fingerprint (header names, values, cookies)
  • JA4X: X.509 certificate fingerprint
  • JA4T: TCP fingerprint (Layer 0 above)
  • JA4SSH: SSH fingerprint

These are composable. A detection system can check JA4 (TLS) + JA4T (TCP) + JA4H (HTTP) for cross-layer consistency in a single lookup.

Browser TLS characteristics

Each browser family has a distinct cipher suite profile:

| Browser | Cipher Suites | Extensions |
|---|---|---|
| Chrome | 16 | 18 (15 + 3 GREASE) |
| Firefox | 17 | 16-17 |
| Safari | 20 | 14 |

Safari has the most cipher suites but fewest extensions. Firefox sits in the middle. These counts alone narrow the field before you even look at values.

Layer 2: HTTP/2 SETTINGS

Immediately after TLS, the HTTP/2 connection opens with a SETTINGS frame. Each browser sends different parameters — and this alone is enough to distinguish Chrome, Firefox, and Safari.

Akamai fingerprint format

The industry-standard format is: SETTINGS|WINDOW_UPDATE|PRIORITY|PSEUDO_HEADER_ORDER

Empirical captures from each browser:

Browser   Akamai HTTP/2 Fingerprint
Chrome    1:65536;2:0;4:6291456;6:262144|15663105|0|m,a,s,p
Firefox   1:65536;2:0;4:131072;5:16384|12517377|0|m,p,a,s
Safari    2:0;3:100;4:2097152;9:1|10420225|0|m,s,p,a

These are clearly distinct. Chrome uses INITIAL_WINDOW_SIZE of 6,291,456. Firefox uses 131,072, which is 48x smaller. Safari omits HEADER_TABLE_SIZE (ID 1) and sends two SETTINGS IDs that Chrome doesn’t send at all: 3 (MAX_CONCURRENT_STREAMS) and 9 (SETTINGS_NO_RFC7540_PRIORITIES, defined in RFC 9218).

The WINDOW_UPDATE values differ too: Chrome sends 15,663,105; Firefox 12,517,377; Safari 10,420,225.
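Because the Akamai format is a plain delimited string, comparing two clients comes down to parsing and diffing. The following is a minimal parsing sketch; `H2Fingerprint` and `parse_akamai` are hypothetical names, and only the six SETTINGS identifiers defined in the core HTTP/2 specification are given names here.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Identifiers 1-6 from the core HTTP/2 specification; others stay numeric.
SETTING_NAMES = {
    1: "HEADER_TABLE_SIZE",
    2: "ENABLE_PUSH",
    3: "MAX_CONCURRENT_STREAMS",
    4: "INITIAL_WINDOW_SIZE",
    5: "MAX_FRAME_SIZE",
    6: "MAX_HEADER_LIST_SIZE",
}


@dataclass(frozen=True)
class H2Fingerprint:
    settings: Dict[int, int]
    window_update: int
    priority: str
    pseudo_header_order: Tuple[str, ...]


def parse_akamai(fingerprint: str) -> H2Fingerprint:
    """Parse SETTINGS|WINDOW_UPDATE|PRIORITY|PSEUDO_HEADER_ORDER."""
    settings_part, window_update, priority, pseudo = fingerprint.split("|")
    settings = {}
    for pair in settings_part.split(";"):
        ident, value = pair.split(":")
        settings[int(ident)] = int(value)
    return H2Fingerprint(settings, int(window_update), priority,
                         tuple(pseudo.split(",")))


def describe(fp: H2Fingerprint) -> Dict[str, int]:
    """Map setting IDs to readable names where known."""
    return {SETTING_NAMES.get(k, f"SETTING_{k}"): v for k, v in fp.settings.items()}
```

Parsing the Chrome and Firefox strings from the table and dividing their `settings[4]` values reproduces the 48x difference in initial window size.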

Pseudo-header order: the silent identifier

HTTP/2 requires four pseudo-headers (:method, :authority, :scheme, :path) before any regular headers. The order is technically arbitrary, but each browser has a fixed convention:

| Browser | Pseudo-Header Order |
|---|---|
| Chrome | :method, :authority, :scheme, :path (masp) |
| Firefox | :method, :path, :authority, :scheme (mpas) |
| Safari | :method, :scheme, :path, :authority (mspa) |
| curl (default) | :method, :path, :scheme, :authority (mpsa) |

Note that default curl matches no browser at all. This single signal — four headers in the wrong order — is enough to flag a connection as automated. An HTTP client that gets TLS right but sends pseudo-headers in curl’s default order is trivially detected.
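The lookup implied by the table is small enough to sketch directly. The mapping below contains only the four orders listed above, and the function names are hypothetical.

```python
from typing import Sequence

# Orders taken from the pseudo-header table above.
PSEUDO_ORDER_TO_CLIENT = {
    "masp": "Chrome",
    "mpas": "Firefox",
    "mspa": "Safari",
    "mpsa": "curl (default)",
}


def pseudo_order_key(header_names: Sequence[str]) -> str:
    """Collapse leading pseudo-headers into a key such as 'masp'."""
    key = ""
    for name in header_names:
        if not name.startswith(":"):
            break  # pseudo-headers always come before regular headers
        key += name[1]  # first letter after the colon
    return key


def classify_by_pseudo_order(header_names: Sequence[str]) -> str:
    return PSEUDO_ORDER_TO_CLIENT.get(pseudo_order_key(header_names), "unknown")
```

A request that begins `:method, :path, :scheme, :authority` is classified as `curl (default)` no matter what its User-Agent header claims.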

This fingerprint is stable across versions. All Chrome targets from version 120 through 142 produce the identical HTTP/2 SETTINGS and pseudo-header order. The HTTP/2 implementation changes far less frequently than TLS parameters.

Layer 3: HTTP Headers

Header order, presence, and values are all signals. Each browser sends headers in a fixed, characteristic sequence, and anti-bot systems compare the observed order against known-good patterns.

Chrome 136 header sequence

:method, :authority, :scheme, :path
sec-ch-ua, sec-ch-ua-mobile, sec-ch-ua-platform
upgrade-insecure-requests, user-agent, accept
sec-fetch-site, sec-fetch-mode, sec-fetch-user, sec-fetch-dest
accept-encoding, accept-language, priority

Firefox 144 header sequence

:method, :path, :authority, :scheme
user-agent
accept, accept-language, accept-encoding
upgrade-insecure-requests
sec-fetch-dest, sec-fetch-mode, sec-fetch-site, sec-fetch-user
priority, te: trailers

Safari 26.0 header sequence

:method, :scheme, :path, :authority
sec-fetch-dest
user-agent, accept
sec-fetch-site, sec-fetch-mode
accept-language, priority, accept-encoding

The differences are striking:

  • Client Hints (sec-ch-ua, sec-ch-ua-mobile, sec-ch-ua-platform): Chrome-only. Firefox and Safari never send them. If your request claims to be Firefox but includes sec-ch-ua headers, it’s instantly flagged.
  • te: trailers: Firefox-only. No other browser sends it.
  • sec-fetch-dest position: Chrome sends it last in the sec-fetch group (after sec-fetch-user). Safari sends it first among regular headers. Firefox sends it first among the sec-fetch group.
  • accept-encoding position: Chrome sends it near the end. Safari sends it last. Firefox sends it after accept-language.
  • user-agent position: Chrome sends it in the middle (after upgrade-insecure-requests). Firefox sends it first among regular headers. Safari sends it after sec-fetch-dest.
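One way to picture an order check is to compare the observed header names against the expected sequence for the claimed browser. The sketch below uses the Chrome 136 and Firefox 144 sequences listed above. The function name is hypothetical, and a real check would also need an allowance for context-dependent headers such as cookie or referer, which this version reports as unexpected.

```python
from typing import List, Sequence

# Regular-header sequences copied from the listings above.
CHROME_136 = [
    "sec-ch-ua", "sec-ch-ua-mobile", "sec-ch-ua-platform",
    "upgrade-insecure-requests", "user-agent", "accept",
    "sec-fetch-site", "sec-fetch-mode", "sec-fetch-user", "sec-fetch-dest",
    "accept-encoding", "accept-language", "priority",
]
FIREFOX_144 = [
    "user-agent", "accept", "accept-language", "accept-encoding",
    "upgrade-insecure-requests",
    "sec-fetch-dest", "sec-fetch-mode", "sec-fetch-site", "sec-fetch-user",
    "priority", "te",
]


def order_violations(observed: Sequence[str], expected: Sequence[str]) -> List[str]:
    """List headers that are unknown to, or out of order for, the claimed browser."""
    rank = {name: index for index, name in enumerate(expected)}
    problems = []
    last_rank = -1
    for raw_name in observed:
        name = raw_name.lower()
        if name not in rank:
            problems.append(f"unexpected header: {name}")
        elif rank[name] < last_rank:
            problems.append(f"out of order: {name}")
        else:
            last_rank = rank[name]
    return problems
```

A request claiming Firefox but carrying `sec-ch-ua` comes back with `unexpected header: sec-ch-ua`, and a Chrome claim that sends `accept` before `user-agent` comes back with `out of order: user-agent`.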

Cross-header consistency

Headers must agree with each other:

| Signal A | Must match Signal B |
|---|---|
| sec-ch-ua-platform | User-Agent OS string |
| sec-ch-ua browser version | TLS JA4 fingerprint |
| accept-language | Proxy IP geolocation |
| HTTP/2 pseudo-header order | TLS fingerprint (browser identity) |
| sec-fetch-* values | Request context (navigation vs. API call) |

A request with sec-ch-ua-platform: "Windows" and User-Agent: ...Macintosh; Intel Mac OS X... is an instant fail. DataDome’s own documentation states: “Using a Windows Chrome User Agent and a Linux platform header may result in blocking.”
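The first row of that table, platform hint versus User-Agent, can be sketched as a substring check. The token mapping is an assumption based on common User-Agent conventions, the function name is hypothetical, and nothing here reflects how DataDome actually implements the rule.

```python
# Substrings that typically appear in the User-Agent for each platform hint.
UA_TOKENS = {
    "Windows": "Windows NT",
    "macOS": "Mac OS X",
    "Android": "Android",
    "Linux": "Linux",
}


def platform_matches_user_agent(sec_ch_ua_platform: str, user_agent: str) -> bool:
    """True when the sec-ch-ua-platform value agrees with the User-Agent OS."""
    platform = sec_ch_ua_platform.strip().strip('"')
    token = UA_TOKENS.get(platform)
    if token is None:
        return False  # unrecognized platform value: treat as a mismatch
    if platform == "Linux" and "Android" in user_agent:
        return False  # Android User-Agents also contain the word "Linux"
    return token in user_agent
```

With `sec-ch-ua-platform: "Windows"` and a User-Agent containing `Macintosh; Intel Mac OS X`, the function returns `False`, which is the instant-fail case described above.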

GREASE in Sec-Ch-Ua

Chrome rotates the “Not A Brand” GREASE string per version:

  • Chrome 136: "Not.A/Brand";v="99"
  • Chrome 138: "Not)A;Brand";v="8"

The GREASE brand in sec-ch-ua must match the Chrome version claimed by the TLS fingerprint. A stale GREASE string is a version mismatch signal.
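A stale-GREASE check can be sketched as parsing the brand list and comparing the GREASE entry against what the claimed version should send. The lookup table holds only the two examples given above, so any other version is treated as unknown. The function names are hypothetical.

```python
import re
from typing import Dict

# Only the two versions documented above; a real table would cover every release.
EXPECTED_GREASE = {
    136: ("Not.A/Brand", "99"),
    138: ("Not)A;Brand", "8"),
}

BRAND_PATTERN = re.compile(r'"([^"]*)";v="([^"]*)"')


def parse_sec_ch_ua(value: str) -> Dict[str, str]:
    """Turn a sec-ch-ua header value into a {brand: version} mapping."""
    return dict(BRAND_PATTERN.findall(value))


def grease_is_stale(sec_ch_ua: str, claimed_major: int) -> bool:
    """True when the GREASE brand does not fit the claimed Chrome version."""
    expected = EXPECTED_GREASE.get(claimed_major)
    if expected is None:
        return False  # no reference data for this version
    brand, version = expected
    return parse_sec_ch_ua(sec_ch_ua).get(brand) != version
```

A header carrying `"Not.A/Brand";v="99"` passes when the TLS layer says Chrome 136 and is flagged as stale when it says Chrome 138.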

The Players

Cloudflare Bot Management v3

Scale: 46 million HTTP requests per second across its network.

Cloudflare runs a multi-engine detection system:

  • ML model (v8): Three feature categories — global (inter-request aggregates), high-cardinality (per-IP patterns), and single-request signals. Claims 95% accuracy against distributed residential proxy attacks.
  • Heuristics engine: 50+ rules built on HTTP/2 fingerprints and ClientHello extensions.
  • JS Detection (JSD): Identifies headless browsers via navigator.webdriver, missing APIs, and other DOM-level signals.
  • Per-customer ML (introduced 2025): Custom models trained on each site’s specific traffic baseline. What looks normal for a SaaS dashboard is anomalous for an e-commerce storefront.

Key technical detail: Cloudflare sees 15 million unique JA4 fingerprints daily. Their system correlates JA4 against User-Agent — 500 million+ user agent strings mapped to expected JA4 values. A mismatch between claimed browser and observed TLS behavior is one of their primary signals.

Cloudflare also detects Chrome DevTools Protocol (CDP) usage, which they estimate covers “99% of bots.” CDP leaves detectable artifacts even when navigator.webdriver is patched out.

Turnstile (Cloudflare’s CAPTCHA replacement): Independent testing found it catches only about 33% of bot traffic, compared to reCAPTCHA’s 69%. For HTTP-level automation that doesn’t execute JavaScript, Turnstile is irrelevant — it requires a browser context to even load.

Akamai Bot Manager

Akamai’s approach mirrors Cloudflare’s in principle but differs in emphasis:

  • JA3 fingerprints compared against a known-good database (JA4 adopted commercially in 2026)
  • HTTP/2 fingerprinting with their own format (the SETTINGS|WINDOW_UPDATE|PRIORITY|PSEUDO_HEADER_ORDER format described above originated from Akamai’s Black Hat EU 2017 research)
  • IP reputation: datacenter ASNs (AWS, OVH, Hetzner) are immediately flagged
  • Behavioral analysis: identical scrolling patterns, perfectly timed clicks, predictable navigation sequences
  • Active challenges for browser authenticity confirmation

DataDome

DataDome is the most aggressive of the three, analyzing 1,000+ signals on 100% of requests with sub-2ms response time (5 trillion signals per day):

Server-side signals (heavier weight in their scoring):

  • Request header analysis (order matters)
  • HTTP version detection
  • TLS/JA3/JA4 fingerprinting
  • IP reputation scoring
  • TCP/IP OS fingerprinting via Zardaxt — they’re one of the few vendors openly using Layer 0

Client-side signals (35+ behavioral):

  • Mouse movement, scroll velocity, typing cadence, click coordinates
  • GPU rendering capabilities, font availability, JS engine specifics

Platform-level capabilities:

  • Per-customer ML models (85,000+ customer-specific models as of 2025)
  • LLM crawler traffic detection (added 2025)

A critical insight from DataDome’s own research: server-side signals carry more weight than client-side JavaScript fingerprinting. They’ve found that “JS fingerprinting is prone to false positives and not as heavily weighted.” For requests that never execute JavaScript, this means the TLS + HTTP/2 + header layers are what matter.

What Changed in 2025-2026

The bot detection landscape shifted significantly:

JA4 replaced JA3 as the industry standard. Chrome’s TLS extension randomization (since Chrome 110, January 2023) made JA3 unreliable for browser identification. JA4’s sorted-before-hashing approach solved this. By 2026, Cloudflare, AWS WAF, VirusTotal, and Akamai all use JA4 as a primary signal.

Detection moved upstream. The trend is toward catching bots earlier in the connection lifecycle. TLS handshake checks happen before the page loads, before JavaScript runs, before any CAPTCHA renders. If your ClientHello looks wrong, the connection may be terminated or routed to a honeypot before HTTP even begins.

Per-customer ML models arrived. Cloudflare’s per-customer models (2025) train on each site’s specific traffic patterns. A request that looks normal globally can be anomalous for a specific site. This makes generic evasion harder — you need to look normal for the specific site you’re accessing, not just for the internet in general.

Residential proxy detection improved. Cloudflare’s v8 ML model claims per-request detection of residential proxy abuse without IP blocking. The signals include request timing, header patterns, and behavioral fingerprints that distinguish real residential users from proxy traffic, even when the IP itself is classified as residential.

CDP detection became standard. Chrome DevTools Protocol detection is now a primary signal. CDP leaves artifacts in the browser environment that persist even when common patches (like removing navigator.webdriver) are applied. Cloudflare estimates 99% of browser-based bots use CDP.

Browser attestation appeared. Google’s browser attestation APIs allow servers to verify that the connecting client is an unmodified, vendor-signed browser binary. Modified Chromium builds fail integrity checks. This is currently limited in deployment but represents the direction: hardware-rooted trust for browser identity.

Fingerprint inconsistency detection formalized. An IMC 2025 paper introduced data-driven rules for detecting both spatial inconsistencies (cross-attribute contradictions in a single request) and temporal inconsistencies (attribute changes across requests from the same session). The approach reduced bot evasion success by 45-48%.
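The temporal half of that idea can be illustrated with a few lines of code: attributes that should stay fixed within a session are collected across its requests, and any that take more than one value are reported. This is a toy illustration of the concept, not the paper's method. The attribute names and the function name are hypothetical.

```python
from typing import Dict, Iterable, List, Mapping, Set


def temporal_inconsistencies(
    session_requests: List[Mapping[str, str]],
    stable_attributes: Iterable[str],
) -> Dict[str, Set[str]]:
    """Return attributes that changed within one session, with the values seen."""
    changed = {}
    for attribute in stable_attributes:
        values = {request.get(attribute) for request in session_requests}
        if len(values) > 1:
            changed[attribute] = values
    return changed
```

Given two requests from one session that share a JA4 value but carry different User-Agent strings, checking `["ja4", "user_agent"]` reports only `user_agent`. Attributes that legitimately vary, such as the request path, are simply left out of the stable list.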

What Actually Matters vs. What’s Theater

For HTTP-level requests that don’t execute JavaScript (API calls, data fetching, scraping), the detection stack collapses to a smaller set of signals that actually matter:

What matters

  1. TLS fingerprint (JA4): The single most important signal. Python’s requests library has a JA4 that matches zero real browsers. Using a TLS library that replays a real browser’s ClientHello is table stakes.

  2. HTTP/2 SETTINGS + pseudo-header order: The second gate. Default curl sends pseudo-headers in mpsa order, matching no browser. Chrome uses masp, Firefox mpas, Safari mspa. Wrong SETTINGS values or wrong pseudo-header order flags the connection before the first header is read.

  3. Header order and presence: Chrome, Firefox, and Safari each send headers in a fixed, characteristic sequence. Missing sec-fetch-* headers when claiming to be Chrome is an automation signal. Including sec-ch-ua when claiming to be Firefox is equally bad.

  4. Cross-layer consistency: Every signal must agree. TLS says Chrome 136, headers must say Chrome 136, sec-ch-ua-platform must match the User-Agent OS, and accept-language should be plausible for the IP’s geolocation.

  5. IP reputation: Datacenter ASNs are flagged by default. Residential IPs get more trust but are increasingly fingerprinted themselves.

What’s theater (for non-JS requests)

  • JavaScript fingerprinting: Irrelevant if you never execute JS. Canvas fingerprinting, WebGL rendering, navigator property checks — none of these fire for a simple HTTP request.
  • Behavioral signals: Mouse movement, scroll patterns, typing cadence — these require a browser context. For API-style requests, behavioral analysis is limited to request timing and navigation patterns.
  • CAPTCHAs and Turnstile: These require a browser to render. They’re a gate for browser traffic, not for HTTP clients.

The practical implication: for simple HTTP requests, the TLS + HTTP/2 + header stack is most of the battle, with IP reputation as the remaining variable. Get those three layers right and consistent with each other, and most anti-bot systems will pass you through. Get any one of them wrong, and everything downstream is irrelevant: you’re already flagged before your request body is read.