What is Web Bot Auth, and what does it mean for publishers?

Every existing mechanism for managing AI crawler access - robots.txt, Content Signals, Really Simple Licensing - is an honour system. A bot that chooses to ignore your declared preferences suffers no technical consequence. Web Bot Auth changes that baseline by attaching a cryptographic signature to every outbound bot request, so a publisher can verify whether a request from "Google-Agent" was signed with Google's private key or was spoofed by someone who typed that string into a header. That verification gap - between a crawler claiming an identity and a publisher being able to prove it - is the problem Web Bot Auth was built to close.

How does Web Bot Auth actually work?

Web Bot Auth is an application of RFC 9421 (HTTP Message Signatures), an IETF Proposed Standard published in February 2024, adapted specifically for automated crawler traffic. The mechanism is straightforward in principle. A bot operator generates an Ed25519 keypair, publishes the public key as a JSON Web Key Set at a /.well-known/http-message-signatures-directory URL on a domain they control, and signs every outbound HTTP request. Three request headers carry the relevant data: Signature-Agent, which the Web Bot Auth draft adds, points to the key directory; Signature-Input and Signature both come from RFC 9421, with Signature-Input describing what is signed and including a tag="web-bot-auth" marker, and Signature carrying the cryptographic bytes. A publisher's server - or edge network - fetches the public key directory, caches it per the response's Cache-Control header, and verifies the signature on each signed request in sub-millisecond time.
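The header handling described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes Signature-Input carries a single signature label and Signature-Agent is a quoted https origin, it uses a simplified parser rather than the full structured-field grammar, and it stops short of the Ed25519 check itself. The function names and the header values in the usage note are invented for the example.

```python
import re
from urllib.parse import urlsplit

WELL_KNOWN_PATH = "/.well-known/http-message-signatures-directory"

# Simplified patterns; the real structured-field grammar is stricter.
_MEMBER = re.compile(r'\s*([a-z*][a-z0-9_.*-]*)=\(([^)]*)\)(.*)')
_PARAM = re.compile(r';\s*([a-z*][a-z0-9_.*-]*)=(?:"([^"]*)"|([^;"\s]+))')


def parse_signature_input(value: str) -> dict:
    """Split one 'label=(components);param=value' member into its parts."""
    match = _MEMBER.fullmatch(value)
    if match is None:
        raise ValueError("unrecognised Signature-Input value")
    label, components, raw_params = match.groups()
    params = {}
    for name, quoted, bare in _PARAM.findall(raw_params):
        # Exactly one of quoted/bare is non-empty for a well-formed parameter.
        params[name] = quoted if bare == "" else bare
    return {
        "label": label,
        "components": re.findall(r'"([^"]*)"', components),
        "params": params,
    }


def is_candidate(parsed: dict, now: int) -> bool:
    """True if the request claims Web Bot Auth and has not expired.

    This only screens the metadata; the signature must still be verified
    against the operator's published public key.
    """
    params = parsed["params"]
    if params.get("tag") != "web-bot-auth" or "keyid" not in params:
        return False
    expires = params.get("expires")
    if expires is None:
        return True
    return expires.isdigit() and int(expires) > now


def directory_url(signature_agent: str) -> str:
    """Build the key directory URL from a Signature-Agent header value."""
    parts = urlsplit(signature_agent.strip().strip('"'))
    if parts.scheme != "https" or not parts.netloc:
        raise ValueError("Signature-Agent must be an https origin")
    return f"https://{parts.netloc}{WELL_KNOWN_PATH}"
```

With a made-up header such as sig1=("@authority" "signature-agent");created=1700000000;expires=1700000300;keyid="abc123";tag="web-bot-auth", parse_signature_input returns the label, the signed components, and the parameters, and directory_url('"https://agent.bot.goog"') yields the directory location to fetch and cache.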

The IETF draft formalising the architecture, draft-meunier-web-bot-auth-architecture, is co-authored by Thibault Meunier of Cloudflare and Sandor Major of Google and is currently at version 05 (March 2026). A chartered IETF working group is targeting standards-track specifications and a Best Current Practice document by August 2026.

Why wasn't user-agent verification enough?

The short answer is that user-agent strings are text, and anyone can set them to anything. Johannes Ullrich at the SANS Internet Storm Center noted in September 2025 that users had long since worked out that setting a user agent to Googlebot bypasses some paywalls - a single-line header edit that the entire allow-by-UA model trusts implicitly. Reverse DNS works for crawlers with stable pointer records, like Googlebot, but is awkward for newer agent crawlers routed through general-purpose cloud infrastructure. IP allowlists are brittle: cloud egress ranges shift without warning and are frequently shared with privacy proxies and VPNs. Cloudflare's research team summarised the problem directly in their July 2025 announcement of Web Bot Auth integration: "connections from the crawling service might be shared by multiple users, such as in the case of privacy proxies and VPNs, and these ranges, often maintained by cloud providers, change over time." Cryptographic verification is the one verification method that survives all three of those failure modes simultaneously.
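For comparison, the legacy reverse-DNS check mentioned above looks roughly like this. It is a sketch under stated assumptions: the hostname suffixes follow Google's public guidance for Googlebot and would differ for other operators, the function name is invented, and the standard-library lookups used here cover IPv4 only. The resolver functions are injectable so the logic can be exercised without network access.

```python
import socket

# Assumption: suffixes taken from Google's public guidance for Googlebot.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com")


def forward_confirmed(ip, suffixes=TRUSTED_SUFFIXES,
                      reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS: PTR lookup, suffix check, then A lookup."""
    try:
        hostname = reverse(ip)[0].lower().rstrip(".")
        if not hostname.endswith(suffixes):
            return False
        # The forward lookup must map the hostname back to the same address.
        return ip in forward(hostname)[2]
    except OSError:
        # Covers socket.herror and socket.gaierror: no usable record.
        return False
```

The check depends entirely on the operator keeping stable pointer records, which is the limitation the paragraph above describes for agents running on general-purpose cloud infrastructure.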

Which crawlers are already signing requests?

As of mid-2026, two major AI crawlers sign requests using Web Bot Auth. Google's Google-Agent - the AI-browsing agent that powers Google AI Mode and related features - publishes its public keys at agent.bot.goog and signs outbound requests. Google states explicitly in its developer documentation that not all Google user agents use Web Bot Auth; the traditional indexing crawler, Googlebot proper, still authenticates via reverse DNS and IP ranges. OpenAI's ChatGPT agent also signs requests. Cloudflare integrated HTTP Message Signatures into its Verified Bots Programme in July 2025, and AWS WAF, Vercel, Shopify, and Akamai have all added implementation support. The signed share of total Google-claiming traffic is still a minority as of mid-2026 - plan bot-management rules to run a cryptographic verification path in parallel with the legacy reverse-DNS path rather than replacing one with the other.

What can publishers do with verified bot identity?

Cryptographically verified identity is a prerequisite for any meaningful access policy. Today a publisher blocking GPTBot by user-agent is genuinely blocking OpenAI's crawler, but a publisher allowing Googlebot by user-agent is trusting a self-declared claim that any actor can replicate. Web Bot Auth closes that gap for signed crawlers, which in turn makes several downstream policies viable that were not reliable before.

First, selective access. A publisher might allow a verified Google-Agent to browse certain sections while restricting it from premium archives, in the knowledge that the policy is applied to a proven identity rather than a trusted claim. Second, cleaner analytics and segmentation. Verified identity means reliable separation between search indexers, AI retrieval bots, training crawlers, and user-triggered agents - distinctions that matter when you are trying to understand which type of AI traffic drives referral value and which does not. Third, a more robust foundation for commercial access models. Pay-per-crawl and direct licensing arrangements are only as trustworthy as the identity layer beneath them: if a crawler can fake who it is, it can avoid paying for what it reads. Cloudflare reported in late 2025 that over one billion HTTP 402 responses were being sent daily from pay-per-crawl-enrolled sites; the economic credibility of that model depends on knowing that the crawler receiving the payment demand is the crawler it claims to be.

One important boundary: Web Bot Auth is identity, not authorisation. A verified, signed Google-Agent is still subject to whatever access policy the publisher applies to that identity - robots.txt, Content Signals directives, paywalls. Signing does not grant additional access; it just makes the identity claim testable.
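The separation between identity and authorisation can be made concrete with a small sketch. Everything here is hypothetical: the operator label, the restricted path prefixes, and the decision names are placeholders for whatever a publisher's own policy engine uses. The point is only that verification produces an identity, and a separate table decides what that identity may read.

```python
from typing import Optional

# Hypothetical policy table: verified operator -> path prefixes it may not read.
RESTRICTED = {
    "google-agent": ("/premium/", "/archive/"),
}


def decide(verified_operator: Optional[str], path: str) -> str:
    """Apply publisher policy to an identity that has already been verified.

    verified_operator is None when the request carried no valid signature.
    """
    if verified_operator is None:
        # No proven identity: fall back to reverse DNS, IP, and user-agent checks.
        return "legacy-checks"
    if verified_operator not in RESTRICTED:
        # Signed, but no specific rule: signing alone grants nothing extra.
        return "default-policy"
    return "deny" if path.startswith(RESTRICTED[verified_operator]) else "allow"
```

Under these placeholder rules, decide("google-agent", "/news/today") returns "allow" while decide("google-agent", "/premium/report") returns "deny", even though both requests carry the same valid signature.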

How does Web Bot Auth relate to the broader access-control toolkit?

Web Bot Auth sits in the identity layer of a three-part stack that publishers increasingly need to manage. The declaration layer - what you want - is where robots.txt, Content Signals Policy, and Really Simple Licensing sit; these tell crawlers your preferences, but compliance is voluntary. The identity layer - who is actually asking - is where Web Bot Auth operates; cryptographic signatures replace self-declared user agents. The monetisation and enforcement layer - what happens to the read - is where pay-per-crawl (Cloudflare, TollBit, Akamai/Skyfire) and response-level monetisation operate; this layer depends on the identity layer working correctly to avoid granting free access to spoofed traffic.

For publishers focused on the uncompensated open-web read - where a Live Search Agent retrieves content server-side, constructs an AI answer, and returns the response without the user ever visiting the site - verified identity matters because it is the first step in knowing which party is monetising your content and whether any commercial arrangement is in place. blankspace operates at the CDN edge to identify that Live Search Agent traffic and attach revenue to it; Web Bot Auth strengthens the identity signal that makes that identification reliable.

How does a publisher implement Web Bot Auth verification?

For publishers already on Cloudflare, the work is minimal. Cloudflare validates signatures at the edge and exposes the result via cf.verified_bot_category in WAF Custom Rules and Transform Rules, so existing rules can branch on a verified identity without the publisher managing cryptographic code. Cloudflare manages the key directory cache, including key rotation, on the publisher's behalf.

For publishers not on a verifying CDN, verification is handled at the origin by a small middleware layer ahead of the web server. Cloudflare's research team has open-sourced verification tooling at cloudflareresearch/web-bot-auth, including a Rust crate, a TypeScript npm package, a Caddy plugin, and Cloudflare Worker examples. SeatGeek's engineering team reported a verification overhead of roughly 0.6 to 0.9 milliseconds per request against a warm cache - around 3% of average gateway processing time. The most important operational detail is cache management: the public key directory must be refreshed per its Cache-Control header, not on a fixed internal interval, because bot operators rotate keys and a stale cached copy will cause every request signed with the new key to fail verification until the cache expires.
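A minimal sketch of the cache rule, in Python, might look like the following. It assumes the directory response exposes a standard Cache-Control header with a max-age directive; the class name, the function name, and the 300-second fallback for responses that carry no max-age are all illustrative choices rather than anything the draft prescribes.

```python
import re
import time

FALLBACK_TTL = 300  # assumption: seconds to keep a directory that sends no max-age


def directory_ttl(cache_control: str) -> int:
    """Derive a cache lifetime in seconds from a Cache-Control header value."""
    directives = [d.strip().lower() for d in cache_control.split(",")]
    if "no-store" in directives or "no-cache" in directives:
        return 0  # never reuse without refetching
    for directive in directives:
        match = re.fullmatch(r"max-age=(\d+)", directive)
        if match:
            return int(match.group(1))
    return FALLBACK_TTL


class CachedDirectory:
    """A fetched key set plus the moment it stops being trustworthy."""

    def __init__(self, keys, cache_control, fetched_at=None):
        fetched = time.time() if fetched_at is None else fetched_at
        self.keys = keys
        self.expires_at = fetched + directory_ttl(cache_control)

    def is_fresh(self, now=None):
        current = time.time() if now is None else now
        return current < self.expires_at
```

The design choice is that the expiry comes from the operator's response rather than from a constant in the publisher's configuration, so a shorter max-age published ahead of a key rotation is picked up automatically.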

Frequently asked questions

Is Web Bot Auth a replacement for robots.txt?

No. The two answer completely different questions. Web Bot Auth attests "this request was produced by a specific cryptographic keypair registered to this operator" - it is an identity claim. Robots.txt declares "this URL is or is not available for crawling" - it is an access declaration. Both apply independently. A verified, signed crawler that ignores a robots.txt disallow directive is still in violation of that directive; the signature does not override the policy.

Does Web Bot Auth mean Googlebot is now cryptographically verified?

Not yet. As of mid-2026, only Google's Google-Agent AI-browsing crawler signs requests using Web Bot Auth. The traditional Googlebot indexing crawler, which generates the majority of Google-driven organic traffic, still authenticates via reverse DNS and documented IP ranges. Google's own developer documentation says publishers should continue using IP addresses, reverse DNS, and user agents alongside Web Bot Auth. Treat the two verification paths as parallel systems, not replacements for each other.

What happens if the bot operator rotates their signing key?

This is the most common production failure point. If a publisher's system caches the public key directory on a fixed long-interval schedule rather than honouring the directory's Cache-Control header, a key rotation will cause all newly signed requests to fail against the stale cached key until the publisher's cache expires. The correct approach is to treat Cache-Control as authoritative, refresh the directory on expiry, and reconcile the key set on each refresh so rotated-out keys are removed. On a cache miss where the directory is unreachable, the safe fallback is to route the request to the legacy verification path rather than blocking outright.
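The refresh-and-reconcile step can be sketched as follows. This is an illustration under assumptions: keys are held as a plain mapping from key ID to public key, fetch is a hypothetical callable that returns the freshly downloaded mapping and raises OSError when the directory cannot be reached, and the path names "signature" and "legacy" are placeholders.

```python
def reconcile(cached: dict, fetched: dict):
    """Adopt the fetched key set and report what changed.

    Keys missing from the fetched directory are dropped, so a rotated-out
    key can no longer validate requests.
    """
    added = sorted(set(fetched) - set(cached))
    removed = sorted(set(cached) - set(fetched))
    return dict(fetched), added, removed


def refresh_or_fallback(cached: dict, fetch):
    """Refresh an expired directory, or fall back to the legacy path.

    Returns (keys, path). On failure the expired keys are discarded rather
    than trusted, and the request is routed to legacy verification instead
    of being blocked.
    """
    try:
        fetched = fetch()
    except OSError:
        return {}, "legacy"
    keys, _added, _removed = reconcile(cached, fetched)
    return keys, "signature"
```

Logging the added and removed key IDs on each refresh gives a simple audit trail for diagnosing verification failures around a rotation.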

Can Web Bot Auth be used to charge crawlers for access?

Web Bot Auth is identity infrastructure, not a payment mechanism. It makes the identity claim behind a pay-per-crawl or licensing arrangement trustworthy, but it does not itself initiate payment. The commercial layer - HTTP 402 responses, TollBit, Cloudflare pay-per-crawl, Skyfire's Know Your Agent model - operates on top of the identity layer. A verified identity is a necessary condition for a credible pay-per-crawl arrangement; Web Bot Auth is the part of the stack that provides it.

Which AI companies have committed to implementing Web Bot Auth?

Google (for Google-Agent) and OpenAI (for the ChatGPT agent) are the two major AI companies with confirmed implementations as of mid-2026. The IETF Web Bot Auth working group, co-anchored by Cloudflare and Google, has milestones targeting standards-track specifications by August 2026. Cloudflare has integrated the protocol into its Verified Bots Programme and open-sourced its verification tooling. Adoption among AI crawler operators is expected to grow as the IETF standard matures - though the pace of rollout, and when Googlebot proper begins signing, is Google's decision to make.