Can robots.txt stop AI crawlers?

The short answer is no, not on its own. robots.txt is a request, not a lock: a crawler reads it and decides for itself whether to comply, and a rising share of AI bots simply do not. The major, reputable AI companies still honour it, which makes it a worthwhile way to state your intent, but anything that depends on a bot choosing to obey is a preference rather than a control. Real prevention, and any prospect of being paid for AI access, happens lower down, at the network or CDN layer, where a request can be verified and then blocked or charged rather than politely asked to leave.

What robots.txt actually is

robots.txt is a plain text file at the root of your domain that tells automated clients which parts of your site they may or may not access. It works on the honour system. A crawler is expected to fetch the file, find the rules that apply to its user-agent name, and comply. It has governed search-engine crawling for decades and works well for that, because the major search engines choose to obey it in exchange for being allowed to index the web. It is a convention between publishers and crawlers, not an access control.
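As a rough illustration of the lookup a compliant crawler performs, the sketch below uses Python's standard urllib.robotparser module. The rules, the paths, and the crawler names ExampleBot and OtherBot are made up for this example; nothing here forces a crawler to run such a check, which is the point of the honour system.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; the bot name and paths are assumptions.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /private/

User-agent: *
Disallow: /admin/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text the way a well-behaved crawler would before fetching."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# ExampleBot has its own group, so only that group's rules apply to it.
examplebot_private = parser.can_fetch("ExampleBot", "https://example.com/private/report.html")
examplebot_admin = parser.can_fetch("ExampleBot", "https://example.com/admin/")

# A crawler with no group of its own falls back to the "*" group.
other_admin = parser.can_fetch("OtherBot", "https://example.com/admin/")

print(examplebot_private, examplebot_admin, other_admin)
```

The check runs entirely on the crawler's side. The site only publishes the file; whether can_fetch is ever called, and whether its answer is respected, is up to the client.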

Why it does not reliably stop AI crawlers

Three weaknesses matter for AI traffic. First, compliance is voluntary, so a crawler that chooses to ignore robots.txt simply does, and an increasing share of AI bots do. Second, every rule is keyed to a user-agent name, so it only works if the crawler honestly declares who it is; a user-agent string can be set to anything, which lets a bot present itself as a browser and slip past rules, or server-side filters, aimed at its real name. Third, robots.txt only addresses access; even where it is obeyed, it gives you no way to monetise the traffic, only to turn it away. For the major, reputable AI companies it remains meaningful, because they honour it and publish the names to use. For everyone else it is advisory at best.
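The second weakness can be made concrete with a small sketch. It assumes a naive server-side filter that blocks by user-agent name; the block list, the ExampleScraper name, and the browser string are hypothetical. The request is only constructed, never sent.

```python
from urllib.request import Request

# Hypothetical block list keyed on crawler names (an assumption for this sketch).
BLOCKED_AGENT_TOKENS = ("gptbot", "examplescraper")


def is_allowed_by_name(user_agent: str) -> bool:
    """Naive filter: allow the request unless the user-agent contains a blocked name."""
    lowered = user_agent.lower()
    return not any(token in lowered for token in BLOCKED_AGENT_TOKENS)


HONEST_AGENT = "ExampleScraper/1.0"
SPOOFED_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)

# The client chooses this header freely; building the request does not send it.
request = Request("https://example.com/articles/", headers={"User-Agent": SPOOFED_AGENT})
sent_agent = request.get_header("User-agent")

print(is_allowed_by_name(HONEST_AGENT))  # the honest scraper is turned away
print(is_allowed_by_name(sent_agent))    # the same scraper, disguised, gets through
```

Any control that trusts the declared name inherits this problem, which is why verification has to rest on something the client cannot simply type in.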

What you can and cannot control with robots.txt

You can use robots.txt to express clear intent to the crawlers that respect it. The major AI companies separate their crawlers by purpose and document the names, so you can, for example, disallow a training crawler while allowing a search or live-retrieval agent, keeping your content out of model training while staying eligible to be cited in AI answers. What you cannot do is guarantee any of it. robots.txt will not stop a non-compliant scraper, it will not catch a spoofed agent, and it will never earn you anything from the traffic it allows.
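A minimal sketch of that split, assuming OpenAI's crawler names as documented at the time of writing (GPTBot for training, OAI-SearchBot for search); check each operator's current documentation before relying on any name. The article URL is a placeholder, and the result only describes what a compliant crawler would conclude.

```python
from urllib.robotparser import RobotFileParser

# Assumed crawler names; verify them against the operator's own documentation.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

ARTICLE_URL = "https://example.com/articles/some-post"  # placeholder URL

# What a compliant crawler would conclude for each name.
training_allowed = parser.can_fetch("GPTBot", ARTICLE_URL)
search_allowed = parser.can_fetch("OAI-SearchBot", ARTICLE_URL)

print(f"training crawler allowed: {training_allowed}")
print(f"search crawler allowed:   {search_allowed}")
```

A crawler that never reads the file, or reads it under another name, is unaffected by either group.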

Enforcement lives at the network layer

To actually prevent or monetise AI access, you need to act at the server or CDN edge, where a request can be verified and a decision applied, rather than in a text file. At the edge you can check a crawler against its owner's published IP ranges and cryptographic signatures rather than trusting the user-agent string, so a spoofed bot is caught. You can then enforce a real decision: block it, charge it for access, or allow it and monetise the read. blankspace operates here. It classifies and verifies AI agents at the CDN edge, which is the layer where a policy can be enforced rather than merely requested, and where a Live Search Agent retrieval can be turned into revenue instead of simply being allowed through for free.
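The IP-range half of that check can be sketched in a few lines. This is an illustration under stated assumptions, not how blankspace or any particular CDN implements it: the crawler names, the per-crawler policy, and the CIDR ranges (taken from blocks reserved for documentation) are all hypothetical, and signature verification is left out.

```python
import ipaddress
from typing import Optional

# Hypothetical crawlers and ranges; the CIDR blocks are documentation-only addresses.
PUBLISHED_RANGES = {
    "exampletrainingbot": [ipaddress.ip_network("192.0.2.0/24")],
    "examplesearchbot": [ipaddress.ip_network("198.51.100.0/24")],
}

# Hypothetical site policy for crawlers whose identity has been verified.
POLICY = {"exampletrainingbot": "block", "examplesearchbot": "charge"}


def claimed_crawler(user_agent: str) -> Optional[str]:
    """Return the known crawler name the user-agent claims, if any."""
    lowered = user_agent.lower()
    for name in PUBLISHED_RANGES:
        if name in lowered:
            return name
    return None


def decide(user_agent: str, remote_ip: str) -> str:
    """Return 'allow', 'block', or 'charge' for a single request."""
    name = claimed_crawler(user_agent)
    if name is None:
        # No crawler claim: treated as ordinary traffic in this sketch.
        return "allow"
    address = ipaddress.ip_address(remote_ip)
    verified = any(address in network for network in PUBLISHED_RANGES[name])
    if not verified:
        # Claims a known name from an address its owner never published: spoofed.
        return "block"
    return POLICY[name]


print(decide("ExampleSearchBot/1.0", "198.51.100.7"))  # verified, policy applies
print(decide("ExampleSearchBot/1.0", "203.0.113.9"))   # spoofed name, wrong network
```

This catches a bot borrowing a reputable crawler's name. A bot posing as an ordinary browser makes no claim to check, so it passes through here; spotting it takes additional classification signals that this sketch does not cover.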

How to use robots.txt sensibly

Treat robots.txt as the polite first layer, not the whole strategy. Keep it accurate and use the documented crawler names to signal your intent to the companies that honour it, separating training from retrieval if that reflects your wishes. Then put enforcement at the edge for everything that matters: verify identity properly, block bad actors, and decide whether to charge or monetise legitimate AI traffic. robots.txt tells crawlers what you would like. The edge decides what actually happens.

Frequently asked questions

Do AI companies obey robots.txt?

The major reputable AI companies generally honour robots.txt and publish the crawler names to use. Compliance across the wider field is inconsistent, and the share of AI bots ignoring voluntary restrictions has been increasing.

Can I block GPTBot or other AI crawlers with robots.txt?

You can add disallow rules for named crawlers, and compliant ones will obey. But the rule depends on the bot identifying itself honestly and choosing to comply, so it is not a guarantee and does nothing against spoofed or non-compliant bots.

What is the difference between blocking training and blocking retrieval?

The major AI companies use separate identifiers for crawlers that gather training data and those that retrieve content for search or live answers. You can disallow one and allow the other, for example keeping content out of training while staying eligible to be cited.

If robots.txt does not work, what does?

Enforcement at the network or CDN edge, where requests can be verified against owner IP ranges and signatures and then blocked, charged, or monetised. That is the layer where a decision is enforced rather than requested.

Can robots.txt help me make money from AI traffic?

No. robots.txt governs access only: a crawler is either allowed or disallowed. Monetising AI traffic requires acting on the request at the edge, through access charges or content-layer advertising, and robots.txt can do neither.