How Websites Detect Bots in 2026
Layer-by-layer breakdown of how Cloudflare, Akamai, and DataDome detect automation — from TCP SYN packets through TLS handshakes to behavioral signals. Based on empirical captures, not speculation.
The Detection Hierarchy
Bot detection is a layered system. Each layer fires at a different point in the connection lifecycle, and each one can reject you before the next layer even runs. Here’s the order, from earliest to latest:
- TCP/IP fingerprint — before encryption, before HTTP, before anything
- TLS ClientHello — during the handshake, before any application data
- HTTP/2 SETTINGS — the first application frame after TLS completes
- HTTP header order and values — the actual request
- Client Hints coherence — cross-header consistency checks
- IP reputation / ASN classification — datacenter IP = suspicion
- Behavioral signals — timing, navigation patterns, mouse movement
The critical insight: the first four layers in this list (Layer 0 through Layer 3 in the sections below) are checked before a single byte of your “page content” loads. No JavaScript runs. No CAPTCHA renders. The server already knows if your connection looks like a browser or a script.
Modern anti-bot systems look for cross-layer consistency — what they call “stack drift.” A perfect TLS fingerprint paired with wrong HTTP/2 settings is more suspicious than getting both slightly wrong. Every layer must tell the same story.
Layer 0: TCP/IP Fingerprinting
The TCP SYN packet — the very first packet of any connection — reveals the operating system. This happens before encryption, before TLS, before HTTP. The server (or its CDN) sees raw TCP parameters that differ by OS:
| Parameter | Linux | Windows | macOS |
|---|---|---|---|
| Initial TTL | 64 | 128 | 64 |
| TCP Window Size | 29,200 (kernel 3.x) / 64,240 (5.x+) | 65,535 | 65,535 |
| Window Scale | 7 | 8 | varies |
| TCP Options Order | MSS, SACK_PERM, TIMESTAMP, NOP, WSCALE | MSS, NOP, WSCALE, NOP, NOP, SACK_PERM (no TIMESTAMP) | MSS, NOP, WSCALE, NOP, NOP, TIMESTAMP, SACK_PERM |
Windows is the outlier: TTL of 128, no TIMESTAMP option. Linux and macOS share TTL 64 but differ in TCP options order. Tools like p0f and Zardaxt (used by DataDome in production) classify OS from these values.
The JA4T fingerprint formalizes this as four fields: TCP window size, TCP options (in order), MSS, and window scale. It’s compact enough to index and fast enough to check on every connection.
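The per-OS table above can be turned into a toy classifier. This is a minimal sketch, assuming the TTL and option-order values from that table are the only signatures; the names `SIGNATURES`, `initial_ttl`, and `guess_os` are hypothetical, and real tools such as p0f rely on much larger signature databases.

```python
from typing import Optional, Sequence

# TTLs and option orders copied from the table above (illustrative only).
SIGNATURES = {
    "Linux": (64, ("MSS", "SACK_PERM", "TIMESTAMP", "NOP", "WSCALE")),
    "Windows": (128, ("MSS", "NOP", "WSCALE", "NOP", "NOP", "SACK_PERM")),
    "macOS": (64, ("MSS", "NOP", "WSCALE", "NOP", "NOP", "TIMESTAMP", "SACK_PERM")),
}


def initial_ttl(observed_ttl: int) -> int:
    """Round an observed TTL up to the nearest common initial value.

    Every router hop decrements the TTL by one, so a packet that arrives
    with TTL 52 most likely left its sender with TTL 64.
    """
    for candidate in (64, 128, 255):
        if observed_ttl <= candidate:
            return candidate
    raise ValueError(f"TTL out of range: {observed_ttl}")


def guess_os(observed_ttl: int, options: Sequence[str]) -> Optional[str]:
    """Return the OS whose TTL and option order both match, else None."""
    ttl = initial_ttl(observed_ttl)
    for name, (sig_ttl, sig_options) in SIGNATURES.items():
        if ttl == sig_ttl and tuple(options) == sig_options:
            return name
    return None
```

A packet with TTL 52 and the Linux option order classifies as Linux, while a Windows option order arriving with a TTL below 64 matches nothing in the table and returns `None`.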
The proxy problem
When traffic routes through a proxy (SOCKS5, CONNECT), the target server sees the proxy’s TCP stack, not yours. If the proxy runs Linux (TTL=64, Linux TCP options) but your User-Agent claims Windows (TTL=128, Windows TCP options), that’s a detectable mismatch.
In practice, most proxy servers run Linux. This means:
- macOS User-Agents: Safer. macOS and Linux both use TTL=64, so the TTL check is consistent, although the TCP options order still differs between the two (see the table above).
- Windows User-Agents: Risky. TTL=64 from the Linux proxy contradicts the expected TTL=128 from a Windows machine.
This is the kind of cross-layer inconsistency that modern systems catch — the TCP layer and the HTTP layer are telling different stories about the operating system.
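The mismatch described above could be checked roughly as follows. This is a sketch under stated assumptions: the User-Agent parsing is deliberately crude, the expected TTLs come from the table earlier in this section, and `claimed_os` and `ttl_contradicts_user_agent` are hypothetical names rather than any vendor's API.

```python
# Expected initial TTL per claimed OS, taken from the table above.
EXPECTED_INITIAL_TTL = {"Windows": 128, "macOS": 64, "Linux": 64}


def claimed_os(user_agent: str) -> str:
    """Very rough OS extraction from a User-Agent string."""
    if "Windows NT" in user_agent:
        return "Windows"
    if "Mac OS X" in user_agent:
        return "macOS"
    if "Linux" in user_agent:  # note: Android User-Agents also contain "Linux"
        return "Linux"
    return "unknown"


def estimated_initial_ttl(observed_ttl: int) -> int:
    """Round the observed TTL up to the nearest common initial value."""
    return next(t for t in (64, 128, 255) if observed_ttl <= t)


def ttl_contradicts_user_agent(observed_ttl: int, user_agent: str) -> bool:
    """True when the TCP layer and the User-Agent imply different TTLs."""
    expected = EXPECTED_INITIAL_TTL.get(claimed_os(user_agent))
    if expected is None:
        return False  # unknown OS: no basis for a verdict
    return estimated_initial_ttl(observed_ttl) != expected
```

A Windows User-Agent arriving with TTL 57 (a Linux proxy a few hops away) is flagged, while a macOS User-Agent with the same TTL passes this particular check.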
Layer 1: TLS ClientHello
The TLS handshake happens before any HTTP data crosses the wire. The ClientHello message contains a rich set of signals:
- Cipher suites: count, order, values (including GREASE tokens)
- TLS extensions: count, order, values (including BoringSSL-specific ones)
- Supported groups (elliptic curves)
- Signature algorithms
- ALPN values (h2, http/1.1)
- Key share groups
Each browser has a distinct combination. Chrome uses BoringSSL, Firefox uses NSS, Safari uses Apple’s SecureTransport. The crypto libraries produce fundamentally different ClientHello messages — different cipher suites, different extension sets, different ordering.
JA3: the original, now largely obsolete
JA3 hashes TLS version + cipher suites + extensions + elliptic curves + EC point formats into an MD5 fingerprint. It worked well until Chrome 110 (January 2023) introduced TLS extension order randomization — a deliberate anti-fingerprinting measure. Now every Chrome connection produces a different JA3 hash:
| Impersonation Target | JA3 Hash |
|---|---|
| Chrome 120 | 9cc9e346... |
| Chrome 124 | 351d0eae... |
| Chrome 131 | cdbf6205... |
| Chrome 133 | a6d135b0... |
| Chrome 136 | 2d04cd75... |
Different hash every time, same browser. JA3 is still useful for detecting non-browser clients (Python requests, Go’s net/http, raw curl) which don’t randomize — but it’s useless for distinguishing Chrome versions.
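To see why order randomization breaks JA3, here is a minimal sketch of the hash itself: the five fields are joined as decimal values and hashed with MD5. The field values below are illustrative rather than a real Chrome capture, and `ja3_hash` is a hypothetical helper name.

```python
import hashlib
from typing import Sequence


def ja3_hash(version: int, ciphers: Sequence[int], extensions: Sequence[int],
             curves: Sequence[int], point_formats: Sequence[int]) -> str:
    """MD5 over 'version,ciphers,extensions,curves,point_formats'.

    Values inside each field are joined with '-' in the order they appear
    on the wire, which is why reordering extensions changes the hash.
    (The JA3 description also drops GREASE values; omitted here for brevity.)
    """
    fields = [str(version)]
    for values in (ciphers, extensions, curves, point_formats):
        fields.append("-".join(str(v) for v in values))
    ja3_string = ",".join(fields)
    return hashlib.md5(ja3_string.encode("ascii")).hexdigest()


# Illustrative values only.
extensions = [0, 23, 65281, 10, 11, 35, 16, 5, 13]
original = ja3_hash(771, [4865, 4866, 4867], extensions, [29, 23, 24], [0])
reordered = ja3_hash(771, [4865, 4866, 4867], list(reversed(extensions)), [29, 23, 24], [0])
print(original != reordered)  # same extension set, different order
```

The same set of extensions in a different order yields a different input string and therefore a different MD5, which is the effect Chrome's randomization exploits.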
JA4: the current standard
JA4, adopted by Cloudflare, AWS WAF, VirusTotal, and Akamai as of 2026, fixes this with a three-part fingerprint of the form a_b_c.
- Part A (human-readable): protocol type, TLS version, SNI presence, cipher count, extension count, first ALPN
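- Part B: truncated SHA256 (first 12 hex characters) of the sorted cipher suites, which makes it immune to order randomization
- Part C: truncated SHA256 of the sorted extensions followed by the signature algorithms in their original order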
Sorting before hashing is the key innovation. Chrome can randomize extension order all it wants — the sorted hash is stable.
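The sort-then-hash idea can be sketched in a few lines. This follows the public JA4 description as I understand it (GREASE values dropped, SNI and ALPN excluded from part C, hashes truncated to 12 hex characters); the input values are illustrative, and `ja4_b` and `ja4_c` are hypothetical helper names, not a reference implementation.

```python
import hashlib
from typing import Sequence

# GREASE code points: 0a0a, 1a1a, ..., fafa.
GREASE = {f"{n:x}a{n:x}a" for n in range(16)}
# SNI (0000) and ALPN (0010) are excluded from part C (assumption from the JA4 description).
EXCLUDED_FROM_C = {"0000", "0010"}


def _truncated_sha256(text: str) -> str:
    return hashlib.sha256(text.encode("ascii")).hexdigest()[:12]


def ja4_b(ciphers: Sequence[str]) -> str:
    """Hash of the cipher suites (4-digit hex strings) after sorting."""
    kept = sorted(c for c in ciphers if c not in GREASE)
    return _truncated_sha256(",".join(kept))


def ja4_c(extensions: Sequence[str], signature_algorithms: Sequence[str]) -> str:
    """Hash of sorted extensions plus signature algorithms in wire order."""
    kept = sorted(e for e in extensions
                  if e not in GREASE and e not in EXCLUDED_FROM_C)
    return _truncated_sha256(",".join(kept) + "_" + ",".join(signature_algorithms))
```

Because both functions sort their inputs first, shuffling the cipher or extension order leaves the output unchanged, while a change to the signature algorithm list alters part C.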
Empirical captures confirm this. All Chrome 120-136 targets share the same JA4 parts A and B; part C changes only once, when Chrome updated its signature algorithms between versions 131 and 133:
| Chrome Version Range | JA4 |
|---|---|
| 120 - 131 | t13d1516h2_8daaf6152771_02713d6af862 |
| 133 - 136+ | t13d1516h2_8daaf6152771_d8a2da3f94cd |
The t13d1516h2 prefix decodes to: TLS over TCP (t), TLS 1.3 (13), SNI present (d), 15 cipher suites (after GREASE removal), 16 extensions, and HTTP/2 ALPN (h2). Cloudflare sees 15 million unique JA4 fingerprints daily across 500 million+ user agents. A Python script using the requests library produces a JA4 that matches no real browser.
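Part A is positional, so decoding it is a matter of slicing. The sketch below assumes the ten-character layout described above (transport, TLS version, SNI flag, two-digit cipher count, two-digit extension count, two-character ALPN code); `Ja4PartA` and `decode_ja4_part_a` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ja4PartA:
    transport: str
    tls_version: str
    sni_present: bool
    cipher_count: int
    extension_count: int
    alpn: str


def decode_ja4_part_a(fingerprint: str) -> Ja4PartA:
    """Decode the human-readable first section of a JA4 string.

    Accepts either the full 'a_b_c' fingerprint or part A alone.
    Only TLS 1.x version codes such as '13' are handled in this sketch.
    """
    part_a = fingerprint.split("_")[0]
    if len(part_a) != 10:
        raise ValueError(f"unexpected part A length: {part_a!r}")
    return Ja4PartA(
        transport={"t": "TCP", "q": "QUIC"}.get(part_a[0], part_a[0]),
        tls_version=f"{part_a[1]}.{part_a[2]}",
        sni_present=part_a[3] == "d",
        cipher_count=int(part_a[4:6]),
        extension_count=int(part_a[6:8]),
        alpn=part_a[8:10],
    )


decoded = decode_ja4_part_a("t13d1516h2_8daaf6152771_02713d6af862")
print(decoded)
```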
The JA4+ family
JA4 spawned a family of fingerprints covering the full stack:
- JA4S: Server Hello fingerprint
- JA4H: HTTP client fingerprint (header names, values, cookies)
- JA4X: X.509 certificate fingerprint
- JA4T: TCP fingerprint (Layer 0 above)
- JA4SSH: SSH fingerprint
These are composable. A detection system can check JA4 (TLS) + JA4T (TCP) + JA4H (HTTP) for cross-layer consistency in a single lookup.
Browser TLS characteristics
Each browser family has a distinct cipher suite profile:
| Browser | Cipher Suites | Extensions |
|---|---|---|
| Chrome | 16 | 18 (15 + 3 GREASE) |
| Firefox | 17 | 16-17 |
| Safari | 20 | 14 |
Safari has the most cipher suites but fewest extensions. Firefox sits in the middle. These counts alone narrow the field before you even look at values.
Layer 2: HTTP/2 SETTINGS
Immediately after TLS, the HTTP/2 connection opens with a SETTINGS frame. Each browser sends different parameters — and this alone is enough to distinguish Chrome, Firefox, and Safari.
Akamai fingerprint format
The industry-standard format is: SETTINGS|WINDOW_UPDATE|PRIORITY|PSEUDO_HEADER_ORDER
Empirical captures from each browser:
| Browser | Akamai HTTP/2 Fingerprint |
|---|---|
| Chrome | 1:65536;2:0;4:6291456;6:262144\|15663105\|0\|m,a,s,p |
| Firefox | 1:65536;2:0;4:131072;5:16384\|12517377\|0\|m,p,a,s |
| Safari | 2:0;3:100;4:2097152;9:1\|10420225\|0\|m,s,a,p |
These are completely distinct. Chrome uses an INITIAL_WINDOW_SIZE of 6,291,456. Firefox uses 131,072, which is 48x smaller. Safari sends SETTINGS IDs that Chrome doesn’t send at all: 3 (MAX_CONCURRENT_STREAMS) and 9 (SETTINGS_NO_RFC7540_PRIORITIES).
The WINDOW_UPDATE values differ too: Chrome sends 15,663,105; Firefox 12,517,377; Safari 10,420,225.
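The fingerprint strings above are easy to take apart programmatically. This is a minimal parsing sketch, assuming the four pipe-separated sections always appear in the order given by the format line; `H2Fingerprint` and `parse_akamai_fingerprint` are hypothetical names, and the setting names cover only IDs 1 through 6 from the HTTP/2 specification.

```python
from typing import Dict, NamedTuple

# Standard HTTP/2 SETTINGS identifiers 1-6.
SETTING_NAMES = {
    1: "HEADER_TABLE_SIZE",
    2: "ENABLE_PUSH",
    3: "MAX_CONCURRENT_STREAMS",
    4: "INITIAL_WINDOW_SIZE",
    5: "MAX_FRAME_SIZE",
    6: "MAX_HEADER_LIST_SIZE",
}


class H2Fingerprint(NamedTuple):
    settings: Dict[int, int]
    window_update: int
    priority: str
    pseudo_header_order: str


def parse_akamai_fingerprint(fingerprint: str) -> H2Fingerprint:
    """Split 'SETTINGS|WINDOW_UPDATE|PRIORITY|PSEUDO_HEADER_ORDER' into fields."""
    settings_part, window_update, priority, pseudo = fingerprint.split("|")
    settings = {}
    for pair in settings_part.split(";"):
        key, value = pair.split(":")
        settings[int(key)] = int(value)
    return H2Fingerprint(settings, int(window_update), priority, pseudo.replace(",", ""))


chrome = parse_akamai_fingerprint("1:65536;2:0;4:6291456;6:262144|15663105|0|m,a,s,p")
firefox = parse_akamai_fingerprint("1:65536;2:0;4:131072;5:16384|12517377|0|m,p,a,s")
print(chrome.settings[4] // firefox.settings[4])  # ratio of INITIAL_WINDOW_SIZE values
```

Once parsed, comparing two clients reduces to comparing dictionaries and integers, which is part of why this check is cheap enough to run on every connection.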
Pseudo-header order: the silent identifier
HTTP/2 requires four pseudo-headers (:method, :authority, :scheme, :path) before any regular headers. The order is technically arbitrary, but each browser has a fixed convention:
| Browser | Pseudo-Header Order |
|---|---|
| Chrome | :method, :authority, :scheme, :path (masp) |
| Firefox | :method, :path, :authority, :scheme (mpas) |
| Safari | :method, :scheme, :authority, :path (msap) |
| curl (default) | :method, :path, :scheme, :authority (mpsa) |
Note that default curl matches no browser at all. This single signal — four headers in the wrong order — is enough to flag a connection as automated. An HTTP client that gets TLS right but sends pseudo-headers in curl’s default order is trivially detected.
This fingerprint is stable across versions. All Chrome targets from version 120 through 142 produce the identical HTTP/2 SETTINGS and pseudo-header order. The HTTP/2 implementation changes far less frequently than TLS parameters.
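Extracting the pseudo-header order from a request is a small string exercise. The sketch below is illustrative: `pseudo_header_order` is a hypothetical helper, and `KNOWN_ORDERS` contains only three of the rows from the table above rather than a complete database.

```python
from typing import Iterable

# A subset of the table above; a real system would hold many more entries.
KNOWN_ORDERS = {"masp": "Chrome", "mpas": "Firefox", "mpsa": "curl (default)"}

_LETTER = {":method": "m", ":authority": "a", ":scheme": "s", ":path": "p"}


def pseudo_header_order(header_names: Iterable[str]) -> str:
    """Collapse the leading pseudo-headers into a short code such as 'masp'."""
    letters = []
    for name in header_names:
        if not name.startswith(":"):
            break  # pseudo-headers always precede regular headers
        letters.append(_LETTER[name])
    return "".join(letters)


request = [":method", ":path", ":scheme", ":authority", "user-agent", "accept"]
order = pseudo_header_order(request)
print(order, "->", KNOWN_ORDERS.get(order, "unknown client"))
```

A client that sends a Chrome User-Agent but produces `mpsa` here contradicts itself before any regular header has been examined.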
Layer 3: HTTP Headers
Header order, presence, and values are all signals. Each browser sends headers in a fixed, characteristic sequence, and anti-bot systems compare the observed order against known-good patterns.
Chrome 136 header sequence
:method, :authority, :scheme, :path
sec-ch-ua, sec-ch-ua-mobile, sec-ch-ua-platform
upgrade-insecure-requests, user-agent, accept
sec-fetch-site, sec-fetch-mode, sec-fetch-user, sec-fetch-dest
accept-encoding, accept-language, priority
Firefox 144 header sequence
:method, :path, :authority, :scheme
user-agent
accept, accept-language, accept-encoding
upgrade-insecure-requests
sec-fetch-dest, sec-fetch-mode, sec-fetch-site, sec-fetch-user
priority, te: trailers
Safari 260 header sequence
:method, :scheme, :authority, :path
sec-fetch-dest
user-agent, accept
sec-fetch-site, sec-fetch-mode
accept-language, priority, accept-encoding
The differences are striking:
- Client Hints (sec-ch-ua, sec-ch-ua-mobile, sec-ch-ua-platform): Chrome-only. Firefox and Safari never send them. If your request claims to be Firefox but includes sec-ch-ua headers, it’s instantly flagged.
- te: trailers: Firefox-only. No other browser sends it.
- sec-fetch-dest position: Chrome sends it after sec-fetch-mode. Safari sends it first among regular headers. Firefox sends it first among the sec-fetch group.
- accept-encoding position: Chrome sends it near the end. Safari sends it last. Firefox sends it after accept-language.
- user-agent position: Chrome sends it in the middle (after upgrade-insecure-requests). Firefox sends it first among regular headers. Safari sends it after sec-fetch-dest.
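A header-order check along these lines might look like the following. It is a sketch, not a description of any vendor's rules: the expected sequence is the Chrome 136 capture from this section, and `order_violations` and `claim_violations` are hypothetical helpers.

```python
from typing import List, Sequence, Tuple

# Regular-header order from the Chrome 136 capture above.
CHROME_136_ORDER = [
    "sec-ch-ua", "sec-ch-ua-mobile", "sec-ch-ua-platform",
    "upgrade-insecure-requests", "user-agent", "accept",
    "sec-fetch-site", "sec-fetch-mode", "sec-fetch-user", "sec-fetch-dest",
    "accept-encoding", "accept-language", "priority",
]


def order_violations(observed: Sequence[str],
                     expected: Sequence[str]) -> List[Tuple[str, str]]:
    """Return adjacent header pairs that appear in the wrong relative order.

    Headers missing from the expected list are ignored.
    """
    rank = {name: i for i, name in enumerate(expected)}
    known = [h.lower() for h in observed if h.lower() in rank]
    return [(a, b) for a, b in zip(known, known[1:]) if rank[a] > rank[b]]


def claim_violations(claimed_browser: str, observed: Sequence[str]) -> List[str]:
    """Flag headers that the claimed browser never sends."""
    names = {h.lower() for h in observed}
    problems = []
    if claimed_browser != "chrome" and any(n.startswith("sec-ch-ua") for n in names):
        problems.append("client hints sent by a non-Chrome client")
    if claimed_browser != "firefox" and "te" in names:
        problems.append("te header sent by a non-Firefox client")
    return problems
```

For example, a client that sends `user-agent, accept-encoding, accept` produces one violation against the Chrome sequence, because Chrome sends `accept` before `accept-encoding`.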
Cross-header consistency
Headers must agree with each other:
| Signal A | Must Match | Signal B |
|---|---|---|
| sec-ch-ua-platform | ↔ | User-Agent OS string |
| sec-ch-ua browser version | ↔ | TLS JA4 fingerprint |
| accept-language | ↔ | Proxy IP geolocation |
| HTTP/2 pseudo-header order | ↔ | TLS fingerprint (browser identity) |
| sec-fetch-* values | ↔ | Request context (navigation vs. API call) |
A request with sec-ch-ua-platform: "Windows" and User-Agent: ...Macintosh; Intel Mac OS X... is an instant fail. DataDome’s own documentation states: “Using a Windows Chrome User Agent and a Linux platform header may result in blocking.”
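The first row of the table can be expressed as a small check. This is a minimal sketch assuming simple substring matching on the User-Agent; `ua_platform` and `platform_matches` are hypothetical names, and production systems would parse both headers far more carefully.

```python
from typing import Optional


def ua_platform(user_agent: str) -> Optional[str]:
    """Map a User-Agent string to the platform name used by sec-ch-ua-platform."""
    if "Windows NT" in user_agent:
        return "Windows"
    if "Macintosh" in user_agent:
        return "macOS"
    if "Android" in user_agent:  # checked before Linux: Android UAs contain "Linux"
        return "Android"
    if "Linux" in user_agent:
        return "Linux"
    return None


def platform_matches(sec_ch_ua_platform: str, user_agent: str) -> bool:
    """True when the client hint and the User-Agent name the same OS."""
    hinted = sec_ch_ua_platform.strip().strip('"')
    claimed = ua_platform(user_agent)
    return claimed is not None and hinted == claimed


mac_ua = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36"
print(platform_matches('"Windows"', mac_ua))  # the failing case described above
```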
GREASE in Sec-Ch-Ua
Chrome rotates the “Not A Brand” GREASE string per version:
- Chrome 136: "Not.A/Brand";v="99"
- Chrome 138: "Not)A;Brand";v="8"
The GREASE brand in sec-ch-ua must match the Chrome version claimed by the TLS fingerprint. A stale GREASE string is a version mismatch signal.
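A stale GREASE brand can be spotted with a lookup table. The sketch below uses only the two version-to-brand pairs listed above; `EXPECTED_GREASE`, `parse_sec_ch_ua`, and `grease_matches` are hypothetical names, and the regular expression is a simplification of the structured-header syntax.

```python
import re
from typing import Dict, Optional

# Version -> (GREASE brand, GREASE version), from the two examples above.
EXPECTED_GREASE = {
    136: ("Not.A/Brand", "99"),
    138: ("Not)A;Brand", "8"),
}

_BRAND = re.compile(r'"([^"]*)";\s*v="([^"]*)"')


def parse_sec_ch_ua(value: str) -> Dict[str, str]:
    """Parse a sec-ch-ua header into {brand: version}."""
    return dict(_BRAND.findall(value))


def grease_matches(sec_ch_ua: str, chrome_major: int) -> Optional[bool]:
    """True/False when the version is known, None when it is not in the table."""
    expected = EXPECTED_GREASE.get(chrome_major)
    if expected is None:
        return None
    brand, version = expected
    return parse_sec_ch_ua(sec_ch_ua).get(brand) == version


header = '"Chromium";v="136", "Google Chrome";v="136", "Not.A/Brand";v="99"'
print(grease_matches(header, 136), grease_matches(header, 138))
```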
The Players
Cloudflare Bot Management v3
Scale: 46 million HTTP requests per second across its network.
Cloudflare runs a multi-engine detection system:
- ML model (v8): Three feature categories — global (inter-request aggregates), high-cardinality (per-IP patterns), and single-request signals. Claims 95% accuracy against distributed residential proxy attacks.
- Heuristics engine: 50+ rules built on HTTP/2 fingerprints and ClientHello extensions.
- JS Detection (JSD): Identifies headless browsers via navigator.webdriver, missing APIs, and other DOM-level signals.
- Per-customer ML (introduced 2025): Custom models trained on each site’s specific traffic baseline. What looks normal for a SaaS dashboard is anomalous for an e-commerce storefront.
Key technical detail: Cloudflare sees 15 million unique JA4 fingerprints daily. Their system correlates JA4 against User-Agent — 500 million+ user agent strings mapped to expected JA4 values. A mismatch between claimed browser and observed TLS behavior is one of their primary signals.
Cloudflare also detects Chrome DevTools Protocol (CDP) usage, which they estimate covers “99% of bots.” CDP leaves detectable artifacts even when navigator.webdriver is patched out.
Turnstile (Cloudflare’s CAPTCHA replacement): Independent testing found it catches only about 33% of bot traffic, compared to reCAPTCHA’s 69%. For HTTP-level automation that doesn’t execute JavaScript, Turnstile is irrelevant — it requires a browser context to even load.
Akamai Bot Manager
Akamai’s approach mirrors Cloudflare’s in principle but differs in emphasis:
- JA3 fingerprints compared against a known-good database (JA4 adopted commercially in 2026)
- HTTP/2 fingerprinting with their own format (the SETTINGS|WINDOW_UPDATE|PRIORITY|PSEUDO_HEADER_ORDER format described above originated from Akamai’s Black Hat EU 2017 research)
- IP reputation: datacenter ASNs (AWS, OVH, Hetzner) are immediately flagged
- Behavioral analysis: identical scrolling patterns, perfectly timed clicks, predictable navigation sequences
- Active challenges for browser authenticity confirmation
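The “perfectly timed clicks” signal in the list above can be approximated by measuring how regular the gaps between events are. This is a toy sketch, not Akamai's method: the coefficient-of-variation heuristic and the `min_cv=0.1` threshold are assumptions chosen for illustration, and `interval_cv` and `looks_scripted` are hypothetical names.

```python
from statistics import mean, pstdev
from typing import Sequence


def interval_cv(timestamps: Sequence[float]) -> float:
    """Coefficient of variation of the gaps between consecutive events."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three timestamps")
    average = mean(gaps)
    if average == 0:
        raise ValueError("all events share one timestamp")
    return pstdev(gaps) / average


def looks_scripted(timestamps: Sequence[float], min_cv: float = 0.1) -> bool:
    """Flag event streams whose timing is suspiciously uniform.

    min_cv is an illustrative threshold, not a value from any vendor.
    """
    return interval_cv(timestamps) < min_cv


print(looks_scripted([0.0, 1.0, 2.0, 3.0, 4.0]))   # evenly spaced clicks
print(looks_scripted([0.0, 0.4, 2.1, 2.6, 5.9]))   # irregular, human-like gaps
```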
DataDome
DataDome is the most aggressive of the three, analyzing 1,000+ signals on 100% of requests with sub-2ms response time (5 trillion signals per day):
Server-side signals (heavier weight in their scoring):
- Request header analysis (order matters)
- HTTP version detection
- TLS/JA3/JA4 fingerprinting
- IP reputation scoring
- TCP/IP OS fingerprinting via Zardaxt — they’re one of the few vendors openly using Layer 0
Client-side signals (35+ behavioral):
- Mouse movement, scroll velocity, typing cadence, click coordinates
- GPU rendering capabilities, font availability, JS engine specifics
- Per-customer ML models (85,000+ customer-specific models as of 2025)
- LLM crawler traffic detection (added 2025)
A critical insight from DataDome’s own research: server-side signals carry more weight than client-side JavaScript fingerprinting. They’ve found that “JS fingerprinting is prone to false positives and not as heavily weighted.” For requests that never execute JavaScript, this means the TLS + HTTP/2 + header layers are what matter.
What Changed in 2025-2026
The bot detection landscape shifted significantly:
JA4 replaced JA3 as the industry standard. Chrome’s TLS extension randomization (since Chrome 110, January 2023) made JA3 unreliable for browser identification. JA4’s sorted-before-hashing approach solved this. By 2026, Cloudflare, AWS WAF, VirusTotal, and Akamai all use JA4 as a primary signal.
Detection moved upstream. The trend is toward catching bots earlier in the connection lifecycle. TLS handshake checks happen before the page loads, before JavaScript runs, before any CAPTCHA renders. If your ClientHello looks wrong, the connection may be terminated or routed to a honeypot before HTTP even begins.
Per-customer ML models arrived. Cloudflare’s per-customer models (2025) train on each site’s specific traffic patterns. A request that looks normal globally can be anomalous for a specific site. This makes generic evasion harder — you need to look normal for the specific site you’re accessing, not just for the internet in general.
Residential proxy detection improved. Cloudflare’s v8 ML model claims per-request detection of residential proxy abuse without IP blocking. The signals include request timing, header patterns, and behavioral fingerprints that distinguish real residential users from proxy traffic, even when the IP itself is classified as residential.
CDP detection became standard. Chrome DevTools Protocol detection is now a primary signal. CDP leaves artifacts in the browser environment that persist even when common patches (like removing navigator.webdriver) are applied. Cloudflare estimates 99% of browser-based bots use CDP.
Browser attestation appeared. Google’s browser attestation APIs allow servers to verify that the connecting client is an unmodified, vendor-signed browser binary. Modified Chromium builds fail integrity checks. This is currently limited in deployment but represents the direction: hardware-rooted trust for browser identity.
Fingerprint inconsistency detection formalized. An IMC 2025 paper introduced data-driven rules for detecting both spatial inconsistencies (cross-attribute contradictions in a single request) and temporal inconsistencies (attribute changes across requests from the same session). The approach reduced bot evasion success by 45-48%.
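The two kinds of inconsistency can be illustrated with a pair of small functions. This is a hand-written sketch of the general idea, not the data-driven rules from the paper; the attribute names, the example rule, and the function names are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Request = Dict[str, str]

# Attributes assumed to stay fixed within one session (illustrative choice).
STABLE_ATTRIBUTES = ("user_agent", "ja4", "accept_language")


def spatial_inconsistencies(request: Request,
                            rules: Dict[str, Callable[[Request], bool]]) -> List[str]:
    """Names of cross-attribute rules that a single request violates."""
    return [name for name, rule in rules.items() if not rule(request)]


def temporal_inconsistencies(session: Sequence[Request]) -> List[str]:
    """Stable attributes that take more than one value across a session."""
    return [attr for attr in STABLE_ATTRIBUTES
            if len({r.get(attr) for r in session}) > 1]


# Example rule: the platform hint must appear in the User-Agent string.
RULES = {"platform_vs_user_agent": lambda r: r["platform"] in r["user_agent"]}

first = {"user_agent": "Mozilla/5.0 (Windows NT 10.0)", "platform": "Windows",
         "ja4": "t13d1516h2_8daaf6152771_02713d6af862", "accept_language": "en-US"}
second = dict(first, ja4="t13d1516h2_8daaf6152771_d8a2da3f94cd")
print(spatial_inconsistencies(first, RULES), temporal_inconsistencies([first, second]))
```

The first request is internally consistent, but the JA4 value changing mid-session shows up as a temporal inconsistency.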
What Actually Matters vs. What’s Theater
For HTTP-level requests that don’t execute JavaScript (API calls, data fetching, scraping), the detection stack collapses to a smaller set of signals that actually matter:
What matters
- TLS fingerprint (JA4): The single most important signal. Python’s requests library produces a JA4 that matches zero real browsers. Using a TLS library that replays a real browser’s ClientHello is table stakes.
- HTTP/2 SETTINGS + pseudo-header order: The second gate. Default curl sends pseudo-headers in mpsa order, matching no browser. Chrome uses masp, Firefox mpas, Safari msap. Wrong SETTINGS values or wrong pseudo-header order flags the connection before the first header is read.
- Header order and presence: Chrome, Firefox, and Safari each send headers in a fixed, characteristic sequence. Missing sec-fetch-* headers when claiming to be Chrome is an automation signal. Including sec-ch-ua when claiming to be Firefox is equally bad.
- Cross-layer consistency: Every signal must agree. If TLS says Chrome 136, the headers must say Chrome 136, sec-ch-ua-platform must match the User-Agent OS, and accept-language should be plausible for the IP’s geolocation.
- IP reputation: Datacenter ASNs are flagged by default. Residential IPs get more trust but are increasingly fingerprinted themselves.
What’s theater (for non-JS requests)
- JavaScript fingerprinting: Irrelevant if you never execute JS. Canvas fingerprinting, WebGL rendering, navigator property checks: none of these fire for a simple HTTP request.
- Behavioral signals: Mouse movement, scroll patterns, and typing cadence all require a browser context. For API-style requests, behavioral analysis is limited to request timing and navigation patterns.
- CAPTCHAs and Turnstile: These require a browser to render. They’re a gate for browser traffic, not for HTTP clients.
The practical implication: for simple HTTP requests, the TLS + HTTP/2 + header stack is the entire battle. Get those three layers right and consistent, and most anti-bot systems will pass you through. Get any one of them wrong, and everything downstream is irrelevant — you’re already flagged before your request body is read.