Alex Taylor

June 25, 2026

What is NLWeb, and what does it mean for publishers?

NLWeb is an open Microsoft project that turns any website into a conversational AI app and a Model Context Protocol server, so people and agents can query your content in plain language. It runs on data publishers already produce, namely Schema.org markup and RSS feeds, and lets sites join the agentic web on their own terms. What it does not settle is how a publisher gets paid when the agent takes the answer and the reader never lands on the page.

Picture a reader, or an AI agent acting for one, asking your site a question and getting a direct answer assembled from your own content, without a search engine sitting in the middle and without a single page view. That is the experience NLWeb is built to deliver. Short for Natural Language Web, it is an open project from Microsoft that adds a natural-language query layer to any website, drawing on the structured data the site already publishes. Microsoft frames it as something close to what HTML was for the early web: a simple, shared way to make a site legible to the new clients that matter, except this time the clients are assistants and agents rather than browsers. For publishers, that makes NLWeb a genuine opportunity to be present and queryable in the agentic web. It also sharpens the question that already defines the era, because being read by an agent is not the same as being paid by one.

What is NLWeb?

NLWeb is an open-source project, released by Microsoft in May 2025, that lets a website expose its content through a conversational, natural-language interface. Instead of a visitor scanning a results page or navigating a menu, they (or an agent) ask a question and receive a direct answer built from that site's own data.

It was conceived by R.V. Guha, who joined Microsoft as a corporate vice president and technical fellow and who previously created RSS, RDF and Schema.org. That lineage is the point: NLWeb is designed to sit on top of the open web standards publishers have used for two decades rather than to replace them. The code, connectors and tooling are available in the public microsoft/NLWeb repository, and the project is deliberately technology agnostic, supporting the major operating systems, language models and vector databases so a publisher can run it on the stack they already have.

How does NLWeb work?

NLWeb takes the semi-structured data a site already produces, principally Schema.org markup and RSS feeds, and combines it with a language model and a vector database to answer questions about that content. The site's data is loaded into the chosen vector store, and the NLWeb service handles incoming queries against it.
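The flow described above (structured items go into an index, and queries are matched against it) can be sketched in a few lines. This is a toy illustration, not NLWeb's code: the records are made up, and a bag-of-words count vector stands in for the embedding model and vector database a real deployment would use. The names `embed`, `build_index` and `ask` are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical Schema.org-style records standing in for a site's real markup.
ITEMS = [
    {"@type": "Recipe", "name": "Weeknight lentil soup",
     "description": "A quick vegetarian soup with lentils and carrots.",
     "url": "https://example.com/lentil-soup"},
    {"@type": "Recipe", "name": "Slow roast lamb shoulder",
     "description": "Lamb roasted for five hours with garlic and rosemary.",
     "url": "https://example.com/roast-lamb"},
    {"@type": "Article", "name": "How to sharpen a kitchen knife",
     "description": "A guide to whetstones, honing steels and blade angles.",
     "url": "https://example.com/knife-sharpening"},
]

def embed(text):
    """Toy embedding: word counts. A real setup would call an embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_index(items):
    """Load items into an in-memory list that plays the role of the vector store."""
    return [(embed(f"{item['name']} {item['description']}"), item) for item in items]

def ask(index, query, top_k=1):
    """Return the top_k items most similar to the query, as Schema.org-style dicts."""
    q = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[0]), reverse=True)
    return [item for _, item in ranked[:top_k]]

index = build_index(ITEMS)
best = ask(index, "vegetarian soup with lentils")[0]
print(best["@type"], best["name"], best["url"])
```

The point of the sketch is the shape of the pipeline: the answer is one of the site's own records, so it can only say what the site published.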

Each instance exposes two things. The first is an ask endpoint, the natural-language interface a publisher can put on their own site so visitors and agents can query it directly. The second, and the critical one for agents, is an mcp endpoint: every NLWeb instance is also a Model Context Protocol (MCP) server, so external AI agents can access the same content in a structured, machine-readable way. The system keeps session chat history, so questions can build on one another, and it rewrites context-dependent follow-ups into standalone queries to improve retrieval. Because answers are grounded in the publisher's own indexed data and returned in Schema.org form, the results stay tied to what the site actually published rather than to whatever a general model happens to remember.
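To make the ask interface concrete, here is a minimal sketch of how a client might build a request and read a response. Everything specific in it is an assumption for illustration: the `/ask` path, the `query` and `prev` parameter names, the JSON encoding of earlier questions, and the response fields. The microsoft/NLWeb repository is the place to check the actual interface. No network call is made.

```python
import json
from urllib.parse import urlencode, urlparse, parse_qs

def build_ask_url(base_url, query, previous_queries=()):
    """Build a GET URL for a site's ask endpoint.

    Assumed for illustration: the "/ask" path, the "query" parameter, and a
    "prev" parameter carrying earlier questions as a JSON list.
    """
    params = {"query": query}
    if previous_queries:
        params["prev"] = json.dumps(list(previous_queries))
    return f"{base_url.rstrip('/')}/ask?{urlencode(params)}"

# Hypothetical response body: a list of results, each wrapping a Schema.org object.
SAMPLE_RESPONSE = json.dumps({
    "query_id": "demo-1",
    "results": [
        {"url": "https://example.com/lentil-soup",
         "name": "Weeknight lentil soup",
         "score": 0.91,
         "schema_object": {"@type": "Recipe", "name": "Weeknight lentil soup"}},
    ],
})

def summarize_results(raw):
    """Reduce a response to (type, name, url) tuples for display."""
    payload = json.loads(raw)
    return [(r["schema_object"]["@type"], r["name"], r["url"])
            for r in payload["results"]]

# A follow-up question travels with the earlier one, so the server has the
# context it needs to rewrite "is it vegetarian?" into a standalone query.
url = build_ask_url("https://example.com/", "is it vegetarian?",
                    previous_queries=["lentil soup recipes"])
print(parse_qs(urlparse(url).query))
print(summarize_results(SAMPLE_RESPONSE))
```

Sending the earlier question alongside the follow-up is what lets the service turn a context-dependent query into a standalone one before retrieval.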

Why is NLWeb significant for the agentic web?

The strategic claim Microsoft makes for NLWeb is that it could play the role HTML played for the document web: a low-effort, open standard that any publisher can adopt to become a first-class participant in a new layer of the internet. In the agentic web, software agents increasingly do the browsing, buying and researching on a person's behalf, and they need a reliable way to query a site's content and act on it. An NLWeb endpoint gives them one.

Because each instance is an MCP server, an NLWeb site becomes discoverable and callable inside the wider MCP ecosystem that assistants such as Copilot and other agent frameworks already use. The framing matters for publishers: NLWeb is presented as a way to participate on your own terms, exposing your content to agents if you choose and keeping it served from your own infrastructure rather than only being scraped into someone else's model. That is a meaningfully different posture from the default of the past two years, in which crawlers took content with no invitation and no control.

Who is using NLWeb?

Microsoft launched NLWeb with a cohort of early adopters that spans publishing, commerce and infrastructure: Chicago Public Media, Common Sense Media, Dotdash Meredith (DDM, the owner of Allrecipes and Serious Eats), Eventbrite, Hearst's Delish, Inception Labs, Milvus, O'Reilly Media, Qdrant, Shopify, Snowflake and Tripadvisor. Tripadvisor has used it for conversational travel planning, and O'Reilly built conversational search across roughly 59,000 books using its existing Schema.org metadata, with no web crawling required.

Adoption is also being made easier by the infrastructure layer. In early 2026 Cloudflare added native NLWeb support through its AutoRAG service, turning what was a manual integration into close to a one-click deployment, and said it was working with Microsoft to extend the standard. With Build 2026 now the first major Microsoft conference at which NLWeb can be judged on real deployments rather than promise, the practical question for publishers has shifted from whether the protocol exists to whether it earns its place in the stack.

What does NLWeb mean for publishers?

For a publisher, NLWeb offers three concrete things. It adds a modern conversational search experience to your own site, built from your own content and grounded in your own facts. It makes your content available to agents in a structured, permissioned way through the MCP endpoint, so you can choose to be present where agents look rather than absent. And it does this using standards, Schema.org and RSS, that many publishers already maintain, which keeps the cost of entry low.

There is a real upside in discoverability and control. O'Reilly, one of the pioneers, framed its support precisely around the problem of large, centralised gatekeepers absorbing publisher content into their models and returning ever less traffic to the sites that produced it. NLWeb is an attempt to give publishers a seat at that table on better terms. But the upside is about presence and permission, not payment, and that distinction is where publishers need to be clear-eyed.

Does NLWeb solve the publisher monetisation problem?

No. NLWeb governs how your content is discovered, queried and served to people and agents. It does not govern how you are compensated when that content is consumed. An agent that reaches your mcp endpoint, or a reader who gets a complete answer from your ask interface, has had their need met without loading the pages where your advertising, subscriptions and analytics live. The retrieval is cleaner and more on your terms than an uninvited crawl, which is a genuine improvement, but it is still a read that does not produce a click.

This is the same structural shift that AI Overviews, AI Mode and the assistants created, now extended to the publisher's own front door. Click-through rates from AI answer surfaces already sit far below those of traditional search results, and an architecture designed to give agents direct answers will tend to reduce onward visits, not increase them. NLWeb makes you legible and reachable in the agentic web. Being legible is necessary, but on its own it converts the value of your content into someone else's product experience, not your revenue line.

Comprehension and discovery are not compensation

NLWeb is a strong, open answer to a real problem: publishers need a way to be present, queryable and in control as agents take over more of the browsing. Adopting it is reasonable, and doing it on open standards you already maintain is sensible. The limit is that NLWeb addresses the read at the level of access and format, not at the level of money. It helps an agent understand and reach your content. It does nothing to capture value when the agent uses it.

That gap is the layer blankspace works on. When an AI assistant or Live Search Agent retrieves your content to build an answer, whether through a crawl, a citation or a structured endpoint like NLWeb, that retrieval is a read of your material, and increasingly it is the only interaction that happens, because the human never lands on the page. blankspace detects that agent and Live Search Agent traffic at the CDN edge and turns those reads into revenue, independently of whether the answer ever sends a click back. NLWeb decides who can ask your content and how it is served. Capturing value from the read is a separate decision, and it is the one that determines whether agentic traffic shows up as a cost or as income.

Frequently asked questions

What is NLWeb in simple terms?

NLWeb is an open project from Microsoft that lets any website answer natural-language questions about its own content, both for human visitors and for AI agents. It uses data the site already publishes, such as Schema.org markup and RSS feeds, combined with a language model and a vector database, and every NLWeb instance also runs as a Model Context Protocol server so agents can access the content in a structured way.

Is NLWeb the same as MCP?

Not quite, though they are closely linked. The Model Context Protocol is the open standard for how AI agents connect to external tools and data sources. NLWeb is a higher-level project for building conversational interfaces on websites, and each NLWeb instance also functions as an MCP server. So NLWeb uses MCP to make a site's content available to agents, but it adds the query interface, the data ingestion tooling and the conversational layer on top.
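As a rough illustration of the MCP side, the sketch below builds the JSON-RPC 2.0 messages an agent would send to list a server's tools and then call one. The `tools/list` and `tools/call` methods come from the MCP specification. The tool name `ask` and its `query` argument are assumptions about what an NLWeb instance exposes, and a real session would begin with MCP's initialization handshake, which is omitted here.

```python
import itertools
import json

_ids = itertools.count(1)  # JSON-RPC requests carry a unique id each

def jsonrpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request object, the message format MCP uses."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message

# Step 1: ask the server which tools it offers.
list_tools = jsonrpc_request("tools/list")

# Step 2: call a tool. The name "ask" and the "query" argument are assumed.
call_ask = jsonrpc_request("tools/call", {
    "name": "ask",
    "arguments": {"query": "vegetarian soup with lentils"},
})

print(json.dumps(list_tools))
print(json.dumps(call_ask, indent=2))
```

The division of labour shows in the messages: MCP defines the envelope and the methods, while NLWeb supplies the tool behind them.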

How is NLWeb different from llms.txt or Schema.org?

They operate at different levels. Schema.org is a vocabulary for labelling what a page contains, and llms.txt is a file that helps AI systems find and read a site's key content. NLWeb goes a step further by standing up a live, queryable service: rather than just describing or listing content for a crawler, it answers natural-language questions against that content and exposes an endpoint that agents can call. In practice NLWeb consumes Schema.org data as one of its inputs.
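To show what "consumes Schema.org data" can look like in practice, here is a small sketch that pulls JSON-LD blocks out of a page using only Python's standard library. The page and the `JsonLdExtractor` class are hypothetical, and this is not how NLWeb's own ingestion tooling is implemented. It only illustrates that the markup a site already embeds is machine-readable input.

```python
import json
from html.parser import HTMLParser

class JsonLdExtractor(HTMLParser):
    """Collect the contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.items.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False

# Hypothetical page carrying one Schema.org Article description.
PAGE = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Article",
 "headline": "How to sharpen a kitchen knife",
 "datePublished": "2026-01-15"}
</script>
</head><body><h1>How to sharpen a kitchen knife</h1></body></html>"""

parser = JsonLdExtractor()
parser.feed(PAGE)
parser.close()
for item in parser.items:
    print(item["@type"], item["headline"])
```

Schema.org markup like this only describes the page. A service such as NLWeb is what indexes it and answers questions against it.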

Do publishers have to pay to use NLWeb?

The NLWeb project itself is open source and free to deploy from the public repository, so there is no licence fee for the standard. Publishers do bear the running costs of the underlying components, the language model, the vector database and the hosting, and managed routes such as Cloudflare's AutoRAG integration carry their own service pricing. The protocol is free; operating an instance is not.

Does adding NLWeb increase or reduce traffic to my site?

It can do either, and publishers should not assume it sends more visitors. NLWeb improves how discoverable and queryable your content is for agents, which can surface your material where it would otherwise be absent. But because it is designed to deliver direct answers, it can also satisfy a query without producing a page visit, in line with the broader fall in click-throughs from AI answer surfaces. The discovery benefit is real, but it does not by itself replace the page views, ad impressions or subscriptions that a click would have generated.