AGENT-NATIVE WEB · POINTCAST · 2026
The agent-native web.
A practical walkthrough of every agent-discoverability surface PointCast publishes — llms.txt, llms-full.txt, agents.json, stripped-HTML middleware, per-page JSON mirrors, Content-Signals, .well-known endpoints, and Farcaster Frames. All live on production, all CC0-flavored. Copy what works for your site.
The thesis
Two-thirds of the web now gets read by an agent before (or instead of) a human. ChatGPT cites sources. Claude retrieves context. Perplexity answers queries with citations. Gemini summarizes. Atlas, Comet, and Dia route intent through an address bar that reads the page before you do. If your site is built for "human browsing a URL," you're optimizing for a shrinking surface.
Agent-native is a design posture that treats agents as first-class readers, not hostile scrapers. Every page that matters gets a machine-readable counterpart. Discovery is explicit — you don't make agents crawl blindly; you hand them an index. Rendering modes adapt to user-agent identity. Everything is addressable by stable URL. All of it coexists with normal HTML for humans.
Below is every surface PointCast ships, with code links. Pick the ones that fit your site.
/llms.txt — the curated index
llms.txt is a root-path Markdown file proposed by Jeremy Howard in September 2024 that gives language models a compact, curated index of your site. Structure per the spec: an H1 with the site name, a blockquote summary, optional freeform sections (any Markdown except headings), then H2-delimited sections of link lists with one-line descriptions.
PointCast ships both:
- /llms.txt — the curated index. ~2,500 tokens. Primary surfaces, block schema, channels, contracts, citation format.
- /llms-full.txt — the long-form companion with the full text of key pages (manifesto, glossary, agents.json contents) inlined for RAG ingestion.
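A minimal llms.txt skeleton following the spec's shape — section names and links here are illustrative placeholders, not PointCast's actual file:

```markdown
# PointCast

> One-line summary of what the site is and who it's for.

## Primary surfaces

- [Manifesto](https://example.com/manifesto): what the project is and why
- [Glossary](https://example.com/glossary): definitions with stable anchor URLs

## Feeds

- [JSON Feed](https://example.com/feed.json): full-content machine feed
- [RSS](https://example.com/feed.xml): RSS 2.0 equivalent
```

The whole file should stay small enough to drop into a model's context window unedited.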
As of early 2026, roughly 10% of sites have adopted llms.txt. Major adopters include Anthropic, Stripe, Cloudflare, Vercel, and Perplexity.
/agents.json — the discovery manifest
Where llms.txt is Markdown for model context, agents.json is structured JSON for programmatic use. It lists every endpoint, contract, schema, and discovery surface on the site in one file. Agents following the RFC 8615 well-known pattern will also find it at /.well-known/agents.json (aliased via Cloudflare Pages _redirects).
We also ship a human-readable companion at /for-agents — a rendered walkthrough of everything in agents.json with code references and short rationales.
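A sketch of the shape such a manifest can take — the field names below are illustrative, not PointCast's actual contract; consult the live /agents.json for the real thing:

```json
{
  "name": "PointCast",
  "feeds": { "json": "/feed.json", "rss": "/feed.xml" },
  "endpoints": [
    { "path": "/blocks.json", "type": "application/json", "description": "full block archive" },
    { "path": "/b/{id}.json", "type": "application/json", "description": "single block" }
  ],
  "wellKnown": [
    "/.well-known/agents.json",
    "/.well-known/mcp/server-card.json"
  ]
}
```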
Stripped-HTML mode for AI crawlers
The novel surface. PointCast runs a Cloudflare Pages middleware (functions/_middleware.ts) that detects AI crawlers by User-Agent and serves a stripped HTML response — the same semantic markup + JSON-LD, minus stylesheets, scripts, preloads, icons, manifest links, and inline style attributes.
Detected vendors: GPTBot, ClaudeBot, PerplexityBot, OAI-SearchBot, Atlas, Google-Extended, Meta-ExternalAgent. Any UA prefixed with ai: also triggers it. The response carries an X-Agent-Mode header so downstream systems can see the branch. Typical payload savings on the home feed: ~12%.
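The detection branch reduces to a pure function. This is an illustrative sketch, not PointCast's actual middleware — the vendor list mirrors the one above, and the `ai:` prefix rule is the same:

```typescript
// Illustrative sketch of the User-Agent branch; not PointCast's production code.
const AI_CRAWLERS = [
  "GPTBot", "ClaudeBot", "PerplexityBot", "OAI-SearchBot",
  "Atlas", "Google-Extended", "Meta-ExternalAgent",
];

// True when the UA names a known AI crawler, or follows the "ai:" prefix convention.
function isAiCrawler(ua: string): boolean {
  if (ua.toLowerCase().startsWith("ai:")) return true;
  return AI_CRAWLERS.some((bot) => ua.includes(bot));
}
```

In a Pages middleware, a positive match would route the response through HTMLRewriter to drop stylesheet/script/preload elements and attach the X-Agent-Mode header.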
Why do this? Two reasons. First, it's polite — crawlers don't need your decorative CSS, and pushing it at them wastes their bandwidth and your origin CPU. Second, it's a structured signal that you treat agents as peers: here's the clean content you came for; we're not going to make you render our chrome.
JSON mirrors for every page that matters
Every content URL on PointCast has a JSON twin. Pattern:
- /b/{id} (HTML) + /b/{id}.json (block JSON)
- /c/{slug} (HTML) + /c/{slug}.json (channel JSON) + /c/{slug}.rss (RSS)
- /archive + /archive.json
- /battle + /battle.json
- /cast + /cast.json
- /collabs + /collabs.json
- /mesh + /mesh.json
- /share + /share.json
- /resources + /resources.json
- Top-level feeds: /feed.json (JSON Feed 1.1), /feed.xml (RSS 2.0), /blocks.json (full archive)
The rule: if a human surface renders it, a machine surface can fetch it. Saves agents from HTML parsing; saves humans from writing scrapers.
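The mirror rule is mechanical enough to express as one function. A sketch under the URL pattern above — `jsonTwin` is a hypothetical helper name, and mapping the bare root to /feed.json is an assumption, not a documented PointCast behavior:

```typescript
// Hypothetical helper: map a human-facing content URL to its JSON mirror,
// following the "/path" -> "/path.json" pattern described above.
function jsonTwin(path: string): string {
  const clean = path.replace(/\/+$/, ""); // drop any trailing slash
  // Assumption: the site root's machine twin is the JSON Feed.
  return clean === "" ? "/feed.json" : `${clean}.json`;
}
```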
Content-Signals in robots.txt
Content-Signals is an IETF draft (draft-romm-aipref-contentsignals) for explicit site-wide preferences around AI training, search indexing, and AI inputs. PointCast opts IN on all three:
Content-Signal: ai-train=yes, search=yes, ai-input=yes
Posted at the top of /robots.txt alongside an explicit User-Agent allowlist for every major AI crawler (GPTBot, ChatGPT-User, ClaudeBot, Claude-Web, anthropic-ai, PerplexityBot, Google-Extended, Meta-ExternalAgent, CCBot, cohere-ai, MistralAI-User). The explicit allowlist matters — many sites default-deny LLM crawlers via Disallow: / on those UAs. PointCast explicitly allows, because the project is built as a human-AI collaboration; training on it is the intended use.
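The robots.txt block looks roughly like this — a trimmed illustration covering three of the listed UAs, not the full production file:

```
Content-Signal: ai-train=yes, search=yes, ai-input=yes

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: *
Allow: /
```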
/.well-known/ endpoints
RFC 8615 reserves /.well-known/ for standardized discovery paths. PointCast publishes:
- /.well-known/agents.json — alias of /agents.json for agents following the well-known convention
- /.well-known/mcp/server-card.json — Model Context Protocol server identity card
- /.well-known/api-catalog.json — RFC 9727 api-catalog (linkset of machine-readable API surfaces)
- /.well-known/agent-skills/index.json — published skill manifests
- /.well-known/agent-passport — PointCast's identity card for agents
- /.well-known/oauth-authorization-server.json + /.well-known/oauth-protected-resource.json — RFC 9728 OAuth discovery metadata, published with an explicit "no auth required" rather than a 404, so agents checking these paths get a clean answer
Rich JSON-LD schema
Schema drives two things: Google rich results + LLM entity extraction. PointCast's @graph covers:
- WebSite + Organization + Person — declared once in the base layout, referenced by @id from every page
- BlogPosting / SocialMediaPosting / MusicRecording / VideoObject / Product — per-block, conditional on block type
- CollectionPage + BreadcrumbList + ItemList — channel pages
- FAQPage — /manifesto, /agent-native, /el-segundo, /nouns
- DefinedTermSet + DefinedTerm — /glossary (24 terms with stable anchor URLs)
- HowTo — blocks with meta.format: "howto" (e.g. block 0246, acupuncture self-study)
- Place + GeoCircle — /beacon (25-mile radius) + /el-segundo (city) + /local (station geometries)
- TechArticle — this page
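A trimmed illustration of the @graph pattern — shared nodes declared once and referenced by @id; the URLs and values below are placeholders, not PointCast's actual markup:

```json
{
  "@context": "https://schema.org",
  "@graph": [
    { "@type": "WebSite", "@id": "https://example.com/#website", "name": "PointCast" },
    { "@type": "Organization", "@id": "https://example.com/#org", "name": "PointCast" },
    {
      "@type": "TechArticle",
      "headline": "The agent-native web",
      "isPartOf": { "@id": "https://example.com/#website" },
      "publisher": { "@id": "https://example.com/#org" }
    }
  ]
}
```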
Farcaster Frames on every block
Every block (/b/{id}) renders as a 1-button Farcaster Frame when shared in Warpcast or any Farcaster client. MINT / FAUCET blocks get an extra "View on objkt" button. Blocks with external URLs get a "→ {host}" button.
Frames are a distribution-and-action hybrid — a URL you share in a cast becomes a rendered image with up to 4 interactive buttons. It's the only "agent-era" distribution channel where a URL's share-unfurl is the UI. Costs zero to ship once you have the meta tags; we treat it as free leverage.
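The meta-tag surface for a 1-button Frame looks roughly like this — tag names per the Farcaster Frames vNext convention; the URLs and button label are placeholders:

```html
<meta property="fc:frame" content="vNext" />
<meta property="fc:frame:image" content="https://example.com/b/0246/frame.png" />
<meta property="fc:frame:button:1" content="Open block" />
<meta property="fc:frame:button:1:action" content="link" />
<meta property="fc:frame:button:1:target" content="https://example.com/b/0246" />
```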
Minimum viable agent-native stack
If you want to retrofit your own site, this is the ordered checklist:
- Ship /llms.txt — a curated Markdown index of your most important pages. Start with 10-20 links grouped by purpose.
- Ship /agents.json — the JSON version for programmatic discovery. At minimum: site identity, feed URLs, API endpoints.
- Publish JSON Feed + RSS — JSON Feed 1.1 at /feed.json, RSS 2.0 at /feed.xml. Both are tiny and universally understood.
- Add Content-Signals to robots.txt + an explicit allowlist for each major LLM crawler. Default-deny on those UAs is the biggest avoidable self-inflicted wound.
- Add JSON-LD — at minimum WebSite + Organization + Person in a @graph, then per-page Article / BlogPosting / FAQPage / Product as applicable.
- Stable permalinks — if a URL is public, promise it never changes. Immutable IDs beat slugs.
- (Optional) Stripped-HTML middleware — once the above is stable, detect crawler UAs at the edge and return a lighter payload (~12% savings on PointCast's home feed; your number will vary with how heavy your chrome is).
- (Optional) Farcaster Frame meta — if any of your content is castable, ship fc:frame meta on those pages.
FAQ
- What does "agent-native" mean for a website?
- Agent-native sites publish machine-readable counterparts for every page that matters — JSON alongside HTML, Markdown indexes for LLMs (llms.txt), consolidated discovery manifests (agents.json), and stripped-HTML rendering modes keyed on User-Agent for crawlers. The design assumption is: agents will read your site as often as humans. PointCast is built this way end-to-end.
- What is llms.txt?
- llms.txt is a proposed convention (by Jeremy Howard, September 2024) for a root-path Markdown file that gives LLMs a curated index of a site's content. It lives at /llms.txt. PointCast ships both /llms.txt (a compact 1,000-3,000 token index) and /llms-full.txt (long-form with the full text of key pages inlined). As of early 2026, roughly 10% of websites have adopted llms.txt. Reference implementations include Anthropic, Stripe, Cloudflare, Vercel, and Perplexity.
- What is agents.json?
- A consolidated machine-readable manifest that lists every endpoint, contract, schema, and discovery surface on a site. PointCast's lives at /agents.json, and is also aliased at /.well-known/agents.json for agents following the RFC 8615 well-known discovery pattern. Unlike llms.txt (which is human-readable Markdown for model context), agents.json is structured JSON for programmatic use.
- How does stripped-HTML mode work?
- PointCast runs a Cloudflare Pages middleware (functions/_middleware.ts) that detects AI crawlers by User-Agent. Known vendors (GPTBot, ClaudeBot, PerplexityBot, OAI-SearchBot, Atlas, Google-Extended, Meta-ExternalAgent) plus any UA prefixed "ai:" trigger stripped mode: stylesheets, scripts, preload/preconnect/icon/manifest links, and inline style attributes are removed via Cloudflare HTMLRewriter. Semantic markup and JSON-LD are preserved. Response carries an X-Agent-Mode header. Typical payload savings: ~12% on the home feed.
- What are Content-Signals in robots.txt?
- Content-Signals is an IETF draft (draft-romm-aipref-contentsignals) for explicit site-wide preferences around AI training, search indexing, and AI inputs. PointCast opts IN on all three with the header `Content-Signal: ai-train=yes, search=yes, ai-input=yes` in robots.txt. The spec is at contentsignals.org.
- Why do all of this when most sites just use robots.txt?
- robots.txt answers "are you allowed to crawl me?" — a binary gate. Agent-native surfaces answer richer questions: "what structure does your content have?" (llms.txt), "what are your programmatic endpoints?" (agents.json), "how do I cite this?" (stable permalinks + citation schema), "how do I save bandwidth?" (stripped HTML). Agents that know these surfaces don't need to scrape HTML — they read the endpoints. That's better for the agent, better for your bandwidth, and better for citation quality.
- Does agent-native help traditional SEO?
- Yes, indirectly. Most agent-native surfaces overlap with best-practice SEO: rich JSON-LD schema, stable permalinks, clean sitemaps, descriptive meta, well-structured Q&A. But the primary upside is GEO (generative engine optimization) — getting cited by ChatGPT, Claude, Perplexity, Gemini, and future agents when they answer user queries. llms.txt and agents.json are specifically designed for that use case.
- How do I tell if AI crawlers are actually reading my site?
- Check your server logs for User-Agents containing GPTBot, ChatGPT-User, ClaudeBot, anthropic-ai, PerplexityBot, OAI-SearchBot, Google-Extended, Meta-ExternalAgent, CCBot, cohere-ai, or MistralAI-User. Cloudflare Logpush or Workers logs surface this easily. A second signal: ask the major chatbots (ChatGPT, Claude, Perplexity) a question that cites your site; if they surface your URL as a source, they've indexed you. As of early 2026, PointCast sees ClaudeBot and PerplexityBot most frequently, followed by GPTBot and Google-Extended.
- What's the difference between llms.txt and agents.json?
- llms.txt is human-readable Markdown designed to fit in an LLM's context window (~2,500 tokens). It's a curated table of contents for model ingestion. agents.json is structured JSON designed for programmatic consumption by autonomous agents — listing endpoints, contracts, schemas, and discovery surfaces in a machine-parseable shape. They're complementary: llms.txt for LLM context, agents.json for agent orchestration. PointCast ships both.
- Should I publish agent-native surfaces even if I'm not a dev blog?
- Yes — especially if you're not a dev blog. Mainstream-site llms.txt implementations are rare in 2026, which means you can be the canonical example in your category. A local restaurant shipping llms.txt gets cited when someone asks "where should I eat in [city]". A law firm shipping agents.json gets cited for case-law queries. The content still has to be good — agent-native doesn't fake quality — but it lowers the friction agents face when deciding to attribute a source.
- What about the risks — training data, scraping, attribution?
- Opting IN via Content-Signals is a choice; it's equally valid to opt OUT (ai-train=no). PointCast opts in because the project is explicitly built as a human-AI collaboration, and training on CC0 content is already legally unrestricted. If your content is proprietary, use Content-Signals to say so explicitly: the crawlers that respect the spec will honor it. For attribution: agent-native surfaces actually improve it — a well-structured citation field in JSON-LD gives the model a canonical format to use. Without it, LLMs paraphrase and drop the URL.
- Where does this go next?
- Short answer: content negotiation at the HTTP layer. The agent sends an Accept header saying "I want JSON-LD" or "I want Markdown"; the origin returns the right representation without URL-path tricks. Mintlify has written about this pattern. PointCast's stripped-HTML middleware is a User-Agent-keyed approximation; proper content negotiation is the cleaner long-term design. Other forward paths: signed feed entries (auth + provenance), MCP server integration (streaming context), and standardized citation schemas so agents attribute consistently across sources.