Proposed by Answer.AI co-founder Jeremy Howard in September 2024, llms.txt sits at the root of a domain - at /llms.txt - and offers a language model a flattened, link-rich map of the pages worth reading. The idea is simple: strip out the navigation, ad tags and tracking scripts that bloat raw HTML, and give the model the signal without the noise. Whether that is worth a publisher's time in 2026 depends entirely on what you want from it, because the file solves a comprehension problem, not a compensation one.
What is llms.txt?
llms.txt is a Markdown file placed at the root of a website (yoursite.com/llms.txt) that gives large language models a curated, machine-readable summary of the site's most important content. Rather than asking a model to parse a full HTML page - with menus, scripts, cookie banners and ad slots all competing for space in a limited context window - the file lists the pages that matter, each as a clean link with a short description, sometimes alongside the underlying text in plain Markdown.
The format was published by Jeremy Howard on llmstxt.org and answer.ai on 3 September 2024. It was designed for a specific pain point: context windows are finite, and converting messy HTML to usable plain text is lossy and expensive. A well-built llms.txt hands the model exactly the text it needs, already flattened. In its fullest form a site also publishes companion files such as llms-full.txt, which inlines the complete content of key pages so an agent can ingest them in a single fetch.
How is llms.txt different from robots.txt?
robots.txt and llms.txt look similar - both are plain text files at the root of a domain - but they do opposite jobs. robots.txt is an access-control file: it tells crawlers which paths they may and may not request. llms.txt is a comprehension file: it assumes a model is already reading and tells it what is worth reading and how to interpret it.
Search Engine Land put it well in calling llms.txt "a treasure map for AI" rather than a gate. robots.txt manages whether a bot gets in; llms.txt manages what it understands once it is inside. The two are complementary, not substitutes. A publisher can block training crawlers in robots.txt and still publish an llms.txt for the agents it does want to serve, or vice versa. Critically, neither file is a paywall and neither file carries any payment mechanism. They are signposts, not turnstiles, and certainly not tills.
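To make the division of labour concrete, here is a minimal sketch using the robots.txt parser in Python's standard library. The rules, the example.com URL and the agent name SomeDocsAgent are illustrative assumptions, not a recommended policy. It shows that access is decided entirely by robots.txt: publishing an llms.txt does not grant a blocked crawler anything.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one training crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt text lets user_agent request url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The llms.txt file is just another path as far as robots.txt is concerned.
    for agent in ("GPTBot", "SomeDocsAgent"):
        print(agent, may_fetch(agent, "https://example.com/llms.txt"))
```

Under these assumed rules the blocked crawler is refused /llms.txt like any other path, while an agent the publisher has not blocked may fetch it. Whether a given bot honours robots.txt at all is a separate question the parser cannot answer.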
Do AI models actually read llms.txt?
For the most part, not yet - and this is the single most important fact for a publisher weighing the effort. The major AI crawlers, including GPTBot, OAI-SearchBot, ClaudeBot and PerplexityBot, overwhelmingly skip /llms.txt and crawl HTML directly. (Google-Extended, often listed alongside them, is a robots.txt control token rather than a crawler in its own right.) Google has stated publicly that its systems do not currently use llms.txt, and an SE Ranking analysis of nearly 300,000 domains found no measurable lift in citation rates for sites that had published one.
Where llms.txt does get used is inside products rather than at the open-web crawl layer. Documentation-heavy tools and in-product assistants fetch it routinely: Mintlify-hosted docs, IDE agents such as Cursor, GitHub Copilot and Windsurf, MCP servers, and retrieval workflows inside assistants like Claude. So the honest answer is that llms.txt works best when an AI tool is deliberately pointed at your site for a known task - developer documentation is the standout use case - and works barely at all for getting a publisher's articles surfaced in a general ChatGPT, Gemini or Perplexity answer.
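A publisher can test this against its own traffic rather than take it on trust. The sketch below counts which user agents requested /llms.txt in a web server access log. It assumes the common "combined" log format, and the sample lines and agent names are invented placeholders, not observations of any real bot.

```python
import re
from collections import Counter

# Assumed combined log format:
# ip - - [time] "METHOD path HTTP/x" status size "referer" "user-agent"
LOG_RE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def llms_txt_fetchers(log_lines, path="/llms.txt"):
    """Count requests for `path` per user-agent string."""
    counts = Counter()
    for line in log_lines:
        match = LOG_RE.search(line)
        # Ignore any query string when comparing the requested path.
        if match and match.group("path").split("?", 1)[0] == path:
            counts[match.group("agent")] += 1
    return counts


# Invented sample lines, for illustration only.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Mar/2026:10:00:00 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" "ExampleDocsAgent/1.0"',
    '203.0.113.9 - - [01/Mar/2026:10:00:05 +0000] "GET /news/story HTTP/1.1" 200 20480 "-" "ExampleSearchBot/2.0"',
    '203.0.113.5 - - [01/Mar/2026:10:05:00 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" "ExampleDocsAgent/1.0"',
]

if __name__ == "__main__":
    for agent, hits in llms_txt_fetchers(SAMPLE_LOG).most_common():
        print(hits, agent)
```

User-agent strings can be spoofed, so a count like this is a rough indication of who is fetching the file rather than proof.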
What should a publisher put in an llms.txt file?
The spec is deliberately light. At minimum, a valid file is a single H1 with the site or project name, an optional blockquote summary, and one or more sections of Markdown links to the pages you most want a model to read, each with a short description. The convention is to lead with the highest-value, most stable, most factual pages - product pages, key explainers, an about page, documentation - and to leave out the ephemeral and the duplicative.
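As a rough illustration of that structure, the sketch below renders a minimal file from a title, an optional summary and a set of link sections. The function name, the example.com URLs and the descriptions are all hypothetical. The H2 section headings and the "- [name](url): description" list form follow one reading of the format published at llmstxt.org and should be checked against the current spec.

```python
def build_llms_txt(title, summary, sections):
    """Render a minimal llms.txt as a string.

    `sections` maps a section heading to a list of
    (name, url, description) tuples.
    """
    lines = [f"# {title}", ""]
    if summary:
        # The summary is optional and rendered as a blockquote.
        lines += [f"> {summary}", ""]
    for heading, links in sections.items():
        lines += [f"## {heading}", ""]
        for name, url, description in links:
            lines.append(f"- [{name}]({url}): {description}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    # Hypothetical site content, for illustration only.
    print(build_llms_txt(
        "Example Site",
        "Independent reporting and explainers on web publishing.",
        {
            "Explainers": [
                ("What is llms.txt?", "https://example.com/llms-txt",
                 "Plain-language guide to the format"),
            ],
            "About": [
                ("About", "https://example.com/about", "Who we are"),
            ],
        },
    ))
```

Generating the file from a small, reviewed list like this keeps the curation decision explicit: a page appears only because someone chose to put it there.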
The discipline of building one is arguably more useful than the file itself. Deciding which twenty pages actually represent your site, writing a clean one-line description for each, and keeping the list current forces a clarity that benefits human readers and retrieval systems alike. But it is ongoing maintenance: a stale llms.txt that points at dead URLs or omits your best new work is worse than none, and the format itself offers no automated reconciliation between your sitemap and your llms.txt.
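A publisher can close part of that gap with a small check of its own. The sketch below is one possible approach, not part of the llms.txt format: it extracts the links from an llms.txt and flags any that are absent from the sitemap. The sample file contents and example.com URLs are hypothetical, and presence in a sitemap is only a proxy for a page being live.

```python
import re
import xml.etree.ElementTree as ET

# Matches Markdown links of the form [text](http...) and captures the URL.
LINK_RE = re.compile(r"\[[^\]]*\]\((https?://[^)\s]+)\)")
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def llms_links(llms_txt):
    """Return the set of URLs linked from an llms.txt string."""
    return set(LINK_RE.findall(llms_txt))


def sitemap_urls(sitemap_xml):
    """Return the set of <loc> URLs in a sitemap XML string."""
    root = ET.fromstring(sitemap_xml)
    return {
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    }


def stale_links(llms_txt, sitemap_xml):
    """URLs listed in llms.txt that the sitemap no longer contains."""
    return sorted(llms_links(llms_txt) - sitemap_urls(sitemap_xml))


# Hypothetical inputs, for illustration only.
LLMS_TXT = """\
# Example Site

## Explainers

- [What we do](https://example.com/about): Company overview
- [Old guide](https://example.com/guides/2019): Retired guide
"""

SITEMAP_XML = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/about</loc></url>
  <url><loc>https://example.com/guides/2026</loc></url>
</urlset>
"""

if __name__ == "__main__":
    for url in stale_links(LLMS_TXT, SITEMAP_XML):
        print("not in sitemap:", url)
```

The check deliberately runs in one direction only. Pages that are in the sitemap but not in llms.txt are expected, because the file is meant to be a curated subset rather than a mirror.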
Should publishers add an llms.txt file?
For most publishers in mid-2026, llms.txt is a low-cost, low-return housekeeping task rather than a strategy. Adoption remains thin - an SE Ranking study put it at roughly 10% of domains sampled, and adoption across the Majestic Million was a fraction of a percent at the start of 2025 - which shows that few sites bother and is consistent with the major AI platforms not having made it a ranking or retrieval signal. If your site has substantial documentation or you want specific AI tools to use your content for defined tasks, publishing one is worthwhile and cheap. If your goal is to appear more often in AI-generated answers, the file will not move the needle on its own.
The deeper point is that llms.txt optimises for being understood, not for being valued. It makes your content easier and cheaper for a model to consume, which is a double-edged outcome for a publisher whose business depends on that content having a price. Being read efficiently by an agent that never sends a click, never shows an ad and never pays a licence fee is not obviously a win. That is the gap the file leaves open.
Where llms.txt stops, monetisation begins
llms.txt addresses comprehension. It does nothing about compensation. A publisher can hand an AI agent a perfectly flattened summary of its best work and still earn nothing when that agent uses it to answer a user. The file has no notion of impressions, attribution or payment, because it was never designed to.
This is precisely the layer that blankspace operates in. Rather than relying on a voluntary file that crawlers may ignore, blankspace works at the CDN edge to detect Live Search Agent and LLM traffic as it actually arrives, and to inject contextual brand facts into the AI responses those agents generate - turning an otherwise uncompensated read into a monetisable, measurable event. llms.txt can help a model understand your site; it cannot help you get paid when it does. For publishers, the two questions are separate, and only one of them affects revenue.
Frequently asked questions
Is llms.txt an official web standard?
No. llms.txt is a community proposal published by Answer.AI in September 2024, not a standard ratified by the IETF, the W3C or any AI platform. It has a published specification at llmstxt.org and a directory of sites using it, but no AI company is contractually obliged to read or honour it, and several have declined to commit to supporting it.
Will adding llms.txt help my pages appear in ChatGPT or Perplexity answers?
There is no reliable evidence that it does. The major AI search crawlers generally fetch HTML directly and skip /llms.txt, and an SE Ranking analysis of nearly 300,000 domains found no measurable improvement in citation rates for sites that published one. Treat any claim that llms.txt boosts AI visibility with caution and ask to see the data behind it.
Does llms.txt replace robots.txt?
No. They do different jobs and can coexist. robots.txt controls which paths a crawler may access; llms.txt suggests which content a model should read and how to interpret it. Publishing one does not affect the other, and neither file provides any form of payment or licensing for the content it describes.
Who actually uses llms.txt today?
Adoption is concentrated in developer documentation and in-product AI tooling. Documentation platforms like Mintlify, IDE assistants such as Cursor, Copilot and Windsurf, MCP servers and some in-product retrieval workflows fetch it for defined tasks. Open-web AI search crawlers that build general answers largely do not, which is why its practical impact for news and content publishers remains limited.
If llms.txt does not pay publishers, what does?
llms.txt is a comprehension file, not a revenue mechanism, so monetisation has to happen elsewhere - through licensing deals, pay-per-crawl arrangements, or edge-level approaches that detect agent traffic and inject paid, contextual brand mentions into AI answers. blankspace sits in that last category, monetising Live Search Agent traffic at the CDN edge rather than relying on a file that crawlers can choose to ignore.
