What the domain audit actually shows you (and what it doesn't)

We launched a free domain audit this week. Before you request one, here's exactly what the report shows: what's real data, what's modelled, and why we're being transparent about the difference.

This week we launched a free domain audit at blankspace.so/site-audit.

Give us your domain.

We send back a report showing what AI systems are doing with your content:

  • which brands are benefiting most from your content
  • what prompts AI has been using your pages to answer
  • which AI audiences are being influenced by your content

A few people have already asked the same question: is this real data? It's a fair question, because the GEO space is full of synthetic data.

The honest answer is some of it, but not in the way you're thinking.

Here's exactly how it works.

Why this report exists

Google search referrals to publishers fell 34% year-on-year in 2025 (Chartbeat, March 2026). Organic traffic dropped up to 85% in some cases.

For the online publishing industry, this is an extinction-level shift in how users consume content.

What's clear though is that the audience didn't disappear.

We've just seen a massive displacement of traffic that most publishers can't see.

Most AI agents don't execute JavaScript.

They make HTTP requests, read HTML, and leave.

Your Google Analytics doesn't fire. Your ad server doesn't log the visit. The audience is there. Your infrastructure can't see it.

AI-driven traffic increased 187% from January to December 2025, while human traffic grew just 3.1% over the same period (HUMAN Security, March 2026).

To see AI agent traffic properly, you need to be at the CDN layer.

Before any of the JavaScript,

before the page renders.

That's where blankspace operates.

But asking a publisher to integrate at CDN level before they've seen a single data point about their own domain is a significant ask. So we built the audit. It shows publishers what's happening (and what could be happening) before they commit to anything.
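
To make the visibility gap concrete: agents that skip JavaScript still announce themselves in the HTTP request, so they can be picked out of server or CDN logs by their User-Agent header. The sketch below is illustrative only and is not blankspace's implementation. The token list is a small, incomplete sample of publicly documented agent names, and the provider and category labels are a simplification made for this example.

```python
from collections import Counter

# Illustrative, incomplete sample of publicly documented AI agent tokens.
# The (provider, kind) labels are assumptions made for this example.
KNOWN_AI_AGENTS = {
    "GPTBot": ("OpenAI", "training crawler"),
    "OAI-SearchBot": ("OpenAI", "live search agent"),
    "ChatGPT-User": ("OpenAI", "live search agent"),
    "ClaudeBot": ("Anthropic", "training crawler"),
    "PerplexityBot": ("Perplexity", "live search agent"),
    "Perplexity-User": ("Perplexity", "live search agent"),
    "CCBot": ("Common Crawl", "training crawler"),
}


def classify_user_agent(user_agent):
    """Return (provider, kind) for a known AI agent, or None otherwise."""
    ua = user_agent.lower()
    for token, label in KNOWN_AI_AGENTS.items():
        if token.lower() in ua:
            return label
    return None


def summarize(user_agents):
    """Count requests per (provider, kind), ignoring unrecognised agents."""
    counts = Counter()
    for ua in user_agents:
        label = classify_user_agent(ua)
        if label is not None:
            counts[label] += 1
    return counts
```

Feeding `summarize` the User-Agent column of a raw access log gives a rough per-provider breakdown. User-Agent strings can be spoofed, so a production system would need additional verification; this sketch does not attempt it.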

The three data sources

The report draws on three distinct sources. They are not the same, and we want to be clear about which is which.

1. High traffic pages (real data)

We pull the pages most likely to drive significant AI traffic.

We use signals from a number of third-party sources, including SimilarWeb, Wayback Machine archive frequency, homepage links, and Common Crawl data, to estimate which URLs attract the most traffic.

From those pages, we identify which brands are mentioned and how often, producing a brand share of voice figure: which brands are most present in the content AI agents are most likely to retrieve.

This is your actual content.

We read it and counted what's in it.

The page selection, however, is an estimate. Without CDN access we can't give a completely accurate picture of which URLs AI agents actually retrieve.
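
As a rough illustration of what a brand share of voice figure means, the sketch below counts whole-word mentions of each brand across a set of page texts and normalises the counts into shares. It is a minimal example under stated assumptions, not the method used in the report: the function name is hypothetical, the brand names are placeholders, and real brand detection would need to handle aliases and ambiguous names.

```python
import re
from collections import Counter


def brand_share_of_voice(pages, brands):
    """Return each brand's share of all brand mentions across the pages.

    pages:  list of page texts (plain strings)
    brands: list of brand names to look for (case-insensitive, whole word)
    """
    counts = Counter()
    for text in pages:
        for brand in brands:
            pattern = r"\b" + re.escape(brand) + r"\b"
            counts[brand] += len(re.findall(pattern, text, flags=re.IGNORECASE))

    total = sum(counts.values())
    if total == 0:
        # No brand was mentioned anywhere: every share is zero.
        return {brand: 0.0 for brand in brands}
    return {brand: counts[brand] / total for brand in brands}


# Placeholder content and brand names, for illustration only.
pages = [
    "Acme launches a new phone. Acme also cut prices.",
    "Globex responds to Acme.",
]
shares = brand_share_of_voice(pages, ["Acme", "Globex"])
```

Here "Acme" is mentioned three times and "Globex" once, so the shares come out at 0.75 and 0.25.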

2. A third-party citation database (real data)

We cross-reference your article URLs against a database of real user prompts, real AI-generated answers, and the real source URLs those AI systems cited when constructing those answers.

This isn't synthetic LLM data. This is real user data collected by our partners.

If your pages have appeared as sources in actual AI answers, we can see it.

We can show you the grounded queries: the core structure of the real prompts that triggered your content being used.

Where multiple pages from your domain were retrieved together in the same AI session, we can start mapping co-retrieval patterns: the audience intent clusters that define who's actually reaching your content through AI, and what they were trying to decide.

This is documented. Real prompts. Real answers. Real citations.
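
One simple way to picture co-retrieval mapping is to count how often each pair of URLs appears in the same AI session. The sketch below does only that. It assumes a hypothetical input format, a list of sessions where each session is a list of cited URLs, and it is not a description of how the citation database is actually structured or queried.

```python
from collections import Counter
from itertools import combinations


def co_retrieval_counts(sessions):
    """Count how many sessions each pair of URLs was retrieved in together.

    sessions: list of sessions, each a list of URLs cited in that session
    Returns a Counter keyed by (url_a, url_b) with url_a < url_b.
    """
    pairs = Counter()
    for urls in sessions:
        # De-duplicate within a session and sort so each pair has one key.
        unique = sorted(set(urls))
        for a, b in combinations(unique, 2):
            pairs[(a, b)] += 1
    return pairs


# Placeholder paths, for illustration only.
sessions = [
    ["/reviews/laptops", "/guides/buying-a-laptop"],
    ["/guides/buying-a-laptop", "/reviews/laptops", "/deals"],
    ["/deals"],
]
pairs = co_retrieval_counts(sessions)
```

Pairs with high counts are candidates for an intent cluster: in this toy data the laptop review and the buying guide are retrieved together in two sessions, which hints at a purchase-research audience.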

3. A synthetic simulation (illustrated, clearly labelled)

Here's the part that isn't real.

Because we can't see actual bot traffic to your site without CDN integration, we run simulated bot visits to illustrate what the analytics platform would show once you're integrated.

That includes the LLM breakdown by provider (how much of your AI traffic comes from ChatGPT vs Perplexity vs Claude) and the split between live search agents and training crawlers.

This is modelled in the report, not measured.

This layer is intentionally blurred. It's the data that becomes yours once you integrate. Showing it in outline makes the value of integration concrete without misrepresenting what we currently see.

What this means in practice

The grounded queries and brand share of voice (SOV) figures are real.

They come from actual citations and your actual content. They're the most immediately surprising part of the report for most publishers because most have never had any visibility into which questions AI has been using their pages to answer, and which brands are now benefiting most from their content.

With AI bots now a primary audience for your content, it's more important than ever to understand the value your site offers the answer economy.

Start by understanding the potential goldmine you're sitting on.

blankspace.so/site-audit: free, no integration required, delivered in minutes.

If you want to talk through what the findings mean for your domain, drop us a message at alexander@blankspace.so.