What is the crawl-to-refer ratio, and why does it matter for publishers?

For two decades the bargain between publishers and search engines was simple: let the crawler in, and it sends readers back. The crawl-to-refer ratio is the metric that exposes how completely generative AI has rewritten that bargain. It divides the number of HTML pages a platform's bots request from a site by the number of visitors that platform refers back, then normalises the result to a single referral. A ratio of 5:1 means five pages crawled for every visitor returned. A ratio of 38,000:1 means a platform read 38,000 pages and sent one human in exchange. The higher the number, the more a platform consumes relative to what it gives, and in 2025 and 2026 the numbers for the largest AI operators sat in territory no search crawler ever occupied.

How is the crawl-to-refer ratio calculated?

The ratio is a straightforward division. The numerator is the total number of requests from the crawlers and user agents associated with a platform where the response was HTML content. The denominator is the total number of requests for HTML content where the visitor arrived carrying a Referer header pointing back to that same platform. Cloudflare, which publishes the metric live on the AI Insights page of Cloudflare Radar, aggregates every user agent a company runs into one platform figure, so a vendor's training crawler, search crawler and live user-fetch bot are all counted together on the crawl side.
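
The same definition can be written as a formula. C_p and F_p are illustrative labels introduced here for platform p, not Cloudflare's own notation:

```latex
R_p = \frac{C_p}{F_p}, \qquad
C_p = \text{HTML responses served to platform } p\text{'s crawlers and user agents}, \qquad
F_p = \text{HTML requests whose Referer points to } p
```

R_p is then reported as R_p:1. Normalising to one referral is just this division: 76,000 crawls against 2 referrals and 38,000 crawls against 1 referral both report as 38,000:1.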

The result is always expressed against one referral. Legacy search crawlers sat near parity or modestly above it, because the whole point of indexing was to drive a click. Generative AI crawlers break that link: they read content to answer the question inside the assistant, so the user rarely needs to leave. That structural difference is exactly what the ratio is built to measure.

What counts as a high or low ratio?

The spread across platforms is enormous, which is what makes the metric so revealing. Cloudflare's own analysis of January to July 2025 data ranked the major operators by their average ratio. Anthropic was the most crawl-heavy by a wide margin, crawling roughly 38,000 pages for every referral by July 2025, even after an 87% improvement over that period from a January figure of around 287,000:1. OpenAI averaged close to 1,400:1 over the same window, finishing July near 1,091:1. Perplexity moved the other way, rising 257% to about 195 crawls per referral as it crawled more heavily relative to the traffic it returned. Microsoft held steady around 40:1, and traditional Google search sat in single digits to low double digits, averaging under 12:1.
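
Because a lower ratio is better, an "improvement" here is a percentage fall. A quick check of the Anthropic figures quoted above, using only those rounded numbers; the helper name is illustrative:

```python
def percent_change(old: float, new: float) -> float:
    """Percentage change from old to new; negative means the ratio fell (improved)."""
    return (new - old) / old * 100


# Rounded Anthropic figures quoted above: ~287,000:1 in January, ~38,000:1 in July 2025.
change = percent_change(287_000, 38_000)
print(f"{change:.1f}%")  # prints -86.8%, i.e. a fall of roughly 87%
```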

By 2026 the heaviest ratios had eased further but remained structurally severe. Figures reported from Cloudflare Radar for late May and early June 2026 put Anthropic around 11,000:1, OpenAI around 850:1, and Google around 5:1. The direction of travel matters more than any single week: even after large percentage improvements, the gap between an AI platform and a legacy search engine is still measured in orders of magnitude.
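
"Orders of magnitude" can be read literally as powers of ten. A minimal sketch using the approximate 2026 figures quoted above; the function name is illustrative:

```python
import math


def magnitude_gap(ratio: float, baseline: float) -> float:
    """How many powers of ten separate two crawl-to-refer ratios."""
    return math.log10(ratio / baseline)


# Approximate late-May/early-June 2026 figures quoted above.
ratios = {"Anthropic": 11_000, "OpenAI": 850}
google = 5
for name, ratio in ratios.items():
    print(f"{name}: {magnitude_gap(ratio, google):.1f} orders of magnitude above Google")
```

On these numbers Anthropic sits about 3.3 orders of magnitude above Google and OpenAI about 2.2.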

Why has the ratio collapsed for AI platforms?

Two forces drive it. The first is the purpose of the crawl. By Cloudflare's classification, training accounted for roughly 80% of AI crawling over a twelve-month period to mid-2025, with search at around 15 to 18% and live user actions at just 2 to 3%. Training crawls return nothing to the publisher by definition: the content is absorbed into a model rather than surfaced with a link. So the more crawling skews toward training, the worse the ratio gets.
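
To see the same split in your own traffic, requests can be bucketed by the purpose implied by their User-Agent token. This is a minimal sketch, not Cloudflare's classifier: the token-to-purpose table is an assumption based on crawler names vendors have published, and it should be checked against each vendor's current documentation before use. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

# Illustrative token-to-purpose table (an assumption, not Cloudflare's classifier).
# Verify tokens against each vendor's current crawler documentation.
PURPOSE_BY_TOKEN = {
    "GPTBot": "training",
    "ClaudeBot": "training",
    "OAI-SearchBot": "search",
    "Claude-SearchBot": "search",
    "PerplexityBot": "search",
    "ChatGPT-User": "user_action",
    "Claude-User": "user_action",
    "Perplexity-User": "user_action",
}


def classify(user_agent: str) -> Optional[str]:
    """Return the crawl purpose for a User-Agent string, or None if unrecognised."""
    ua = user_agent.lower()
    for token, purpose in PURPOSE_BY_TOKEN.items():
        if token.lower() in ua:
            return purpose
    return None


def purpose_shares(user_agents: Iterable[str]) -> Dict[str, float]:
    """Share of recognised AI crawler requests falling into each purpose."""
    counts = Counter(p for p in map(classify, user_agents) if p is not None)
    total = sum(counts.values())
    return {purpose: n / total for purpose, n in counts.items()} if total else {}
```

Unrecognised User-Agents are ignored rather than guessed at, so the shares describe only the AI traffic the table can identify. User-Agent strings can also be spoofed, so in practice this should run on requests already verified as genuine crawlers.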

The second force is the collapse of the click on the answer side. As AI Overviews, AI Mode and assistant answers resolve more queries inside the results surface, fewer users click through to the source even when it is cited. Cloudflare recorded Google referrals to news sites falling through 2025, with one month down around 15% against January. Fewer outbound clicks shrink the denominator, which pushes the ratio up regardless of how much crawling changes. The ratio is the product of both trends compounding: more reading, less returning.
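
The denominator effect is easy to quantify. Because the ratio is crawls divided by referrals, a fall of d in referrals and a change of c in crawling move the ratio by (1 + c) / (1 - d) - 1. This minimal sketch treats the roughly 15% referral fall cited above as illustrative input; the 10% rise in crawling in the second call is purely hypothetical.

```python
def ratio_change(referral_drop: float, crawl_change: float = 0.0) -> float:
    """Fractional change in crawls-per-referral when referrals fall by
    `referral_drop` and crawling changes by `crawl_change` (fractions, e.g. 0.15)."""
    if not 0 <= referral_drop < 1:
        raise ValueError("referral_drop must be in [0, 1)")
    return (1 + crawl_change) / (1 - referral_drop) - 1


# Referrals down 15% with crawling flat: the ratio still rises.
print(f"{ratio_change(0.15):.1%}")        # 17.6%
# Hypothetical: the same referral fall plus 10% more crawling.
print(f"{ratio_change(0.15, 0.10):.1%}")  # 29.4%
```

A 15% fall in referrals alone lifts the ratio by about 17.6%, more than the fall itself, because the change sits in the denominator.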

What the ratio does not capture

The crawl-to-refer ratio is a directional indicator, not a precise audit, and it has a known blind spot. Much AI referral traffic arrives with no Referer header at all (notably from native mobile apps and from links that strip the referrer), so the visitors a platform genuinely sends back are undercounted. Cloudflare itself cautions that because referral counts capture mainly web-based tools, the published ratios may overstate the imbalance, though by an unknown amount. The honest reading is that the absolute numbers are estimates while the pattern they describe is real and consistent.
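
The direction of that bias can be made concrete. If a share f of genuine referrals arrives without a Referer, the observed referral count is (1 - f) times the true count, so the true ratio is the observed ratio multiplied by (1 - f). Since f is unknown, which is exactly Cloudflare's caveat, this sketch applies only hypothetical values of f to the July 2025 Anthropic figure. The function name is illustrative.

```python
def adjusted_ratio(observed_ratio: float, missing_share: float) -> float:
    """Crawl-to-refer ratio after restoring referrals that lacked a Referer header.

    observed referrals = true referrals * (1 - missing_share), so
    true ratio = observed ratio * (1 - missing_share).
    """
    if not 0 <= missing_share < 1:
        raise ValueError("missing_share must be in [0, 1)")
    return observed_ratio * (1 - missing_share)


# Hypothetical missing shares; the real value is unknown.
for share in (0.0, 0.25, 0.5):
    print(f"{share:.0%} unattributed -> {adjusted_ratio(38_000, share):,.0f}:1")
```

Even if half of all genuine referrals were invisible, the ratio would only halve, to 19,000:1. That is why the absolute figures are uncertain while the orders-of-magnitude pattern holds.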

The ratio also says nothing about the value of the reads it counts. A thousand training crawls and a thousand live answer-generation reads register identically on the crawl side, yet they mean very different things for a publisher deciding whether to allow, block or charge for access. The metric tells you the scale of the imbalance, not what each read is worth.

What a high crawl-to-refer ratio means for publisher revenue

For a publisher, the ratio is the clearest evidence that the traffic-for-content exchange has stopped functioning. The traditional model monetised the click: a reader arrived from search, loaded a page, and generated an ad impression or a subscription prompt. When a platform crawls thousands of pages and returns a single visitor, almost all of the value of that content is being realised inside the AI product and almost none is flowing back to the source. The content still does the work of answering the question. It simply does it somewhere the publisher cannot monetise.

This is why the metric has become a fixture of the publisher monetisation debate. It quantifies, in one number, the argument that AI consumption is decoupled from AI compensation. A ratio in the thousands is not a traffic problem to be optimised away with better headlines or faster pages. It is a structural signal that monetisation has to move to where the content is actually consumed.

What publishers can do about it

There are three broad responses, and most publishers will combine them. The first is to block: identify the heaviest crawlers and refuse them at the edge, accepting the loss of any citations they might have produced. The second is to charge for access, through pay-per-crawl mechanisms and emerging licensing standards that put a price on the read itself. The third is to monetise the read rather than the click: to treat the AI retrieval as the commercial event, since the human visit it would once have produced is no longer coming.

This last response is the layer blankspace operates in. Because the crawl-to-refer ratio is calculated where requests actually hit the site, the imbalance it measures is visible at the CDN edge before any analytics tag or paywall fires. blankspace detects AI and Live Search Agent reads at that edge and monetises the retrieval in place, which is the part of the value chain a high ratio shows is leaking. The ratio defines the problem; edge monetisation is one way to close the gap between what AI reads and what the publisher earns.

Frequently asked questions

What is a good crawl-to-refer ratio?

There is no universal threshold, but lower is better, and context is everything. Traditional search engines have historically operated in single digits, crawling a handful of pages for each visitor they send. Any ratio in the hundreds or thousands indicates a platform consuming far more content than it returns. Judge a platform by its trend as much as its absolute number: a ratio that is falling means more referrals relative to crawling, even if the headline figure is still high.

How is the crawl-to-refer ratio different from crawl budget?

Crawl budget is an SEO concept describing how many pages a search engine is willing to crawl on your site within a given period, and it concerns indexing efficiency. The crawl-to-refer ratio is an economic measure: it compares crawling against the traffic returned, regardless of indexing. A site can have a healthy crawl budget and a catastrophic crawl-to-refer ratio at the same time, because the two metrics answer different questions: one about coverage, the other about reciprocity.

Where can I see the crawl-to-refer ratio for my own site?

The aggregate, network-wide ratios by platform are published on Cloudflare Radar's AI Insights page, with a time-series view in its Data Explorer and an API endpoint. Those figures reflect Cloudflare's whole network rather than any single domain. To understand the ratio for your own site specifically you need server-side or edge logs that separate verified AI crawler requests from referred human visits, because client-side analytics tools cannot see most AI crawler activity at all.
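
One way to do that separation is sketched below, assuming your CDN or origin can export request logs as JSON lines. The field names (`user_agent`, `referer`, `content_type`, `bot_verified`), the function names and the token and domain tables are hypothetical illustrations, not any vendor's log schema. Crawler verification (for example, matching published IP ranges or reverse DNS) is assumed to have happened upstream, because a User-Agent string alone can be spoofed. The sketch also inherits the referrer blind spot: visits that arrive without a Referer header are invisible to it.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, Optional
from urllib.parse import urlsplit

# Illustrative mappings only: check each vendor's published documentation
# for current crawler user-agent tokens and referring domains.
CRAWLER_TOKENS = {
    "OpenAI": ("GPTBot", "OAI-SearchBot", "ChatGPT-User"),
    "Anthropic": ("ClaudeBot", "Claude-SearchBot", "Claude-User"),
    "Perplexity": ("PerplexityBot", "Perplexity-User"),
}
REFERRER_HOSTS = {
    "OpenAI": ("chatgpt.com",),
    "Anthropic": ("claude.ai",),
    "Perplexity": ("perplexity.ai",),
}


def crawler_platform(user_agent: str) -> Optional[str]:
    """Platform whose crawler token appears in the User-Agent, if any."""
    ua = user_agent.lower()
    for platform, tokens in CRAWLER_TOKENS.items():
        if any(token.lower() in ua for token in tokens):
            return platform
    return None


def referrer_platform(referer: str) -> Optional[str]:
    """Platform whose domain (or a subdomain of it) is the Referer host, if any."""
    host = (urlsplit(referer).hostname or "").lower()
    for platform, hosts in REFERRER_HOSTS.items():
        if any(host == h or host.endswith("." + h) for h in hosts):
            return platform
    return None


def site_ratios(lines: Iterable[str]) -> Dict[str, float]:
    """Crawl-to-refer ratio per platform from JSON-lines edge logs (hypothetical schema)."""
    crawls: Dict[str, int] = defaultdict(int)
    referrals: Dict[str, int] = defaultdict(int)
    for line in lines:
        rec = json.loads(line)
        if not (rec.get("content_type") or "").startswith("text/html"):
            continue  # only HTML responses count on either side
        platform = crawler_platform(rec.get("user_agent") or "")
        if platform is not None:
            if rec.get("bot_verified"):
                crawls[platform] += 1
            continue  # unverified crawler claims are treated as likely spoofs
        ref = referrer_platform(rec.get("referer") or "")
        if ref is not None:
            referrals[ref] += 1
    # Platforms that crawled but sent no referrals get an unbounded ratio.
    return {
        p: crawls[p] / referrals[p] if referrals[p] else float("inf")
        for p in crawls
    }


if __name__ == "__main__":
    sample = [
        json.dumps({"user_agent": "Mozilla/5.0 (compatible; GPTBot/1.1)",
                    "bot_verified": True, "content_type": "text/html"}),
        json.dumps({"user_agent": "Mozilla/5.0 (Macintosh)",
                    "referer": "https://chatgpt.com/",
                    "content_type": "text/html; charset=utf-8"}),
    ]
    print(site_ratios(sample))  # {'OpenAI': 1.0}
```

Reporting only platforms that actually crawled keeps the output focused on the imbalance the ratio is meant to expose. A platform that sends visitors but never crawls simply does not appear.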

Does a high crawl-to-refer ratio mean I should block AI crawlers?

Not automatically. A high ratio tells you a platform is taking far more than it returns, but blocking forfeits any citations or referrals that platform does produce, and the visitors it sends may be undercounted by the metric's referrer blind spot. Blocking, charging and monetising the read are all legitimate responses, and the right mix depends on how much referral value a platform genuinely sends and how you intend to capture value from the reads you allow.

Why do AI platforms crawl so many pages per referral?

Because most AI crawling exists to train models or to answer questions inside the assistant, neither of which produces an outbound click. Training crawls absorb content into a model and return nothing by design, and they make up the large majority of AI crawling. Answer-generation reads resolve the user's query in place, so even a cited source is often not clicked. The result is heavy, repeated consumption with very little traffic flowing back to the publisher.