AEO | Jun 6, 2026 | 4 min read

The Complete Guide to llms.txt: What It Is and Why Every Site Needs One

llms.txt is a simple text file that tells AI crawlers how to navigate your site. Here is the exact format, best practices, and why it is becoming a baseline AEO requirement.

When robots.txt was invented in 1994, it gave webmasters a way to communicate with search engine crawlers. Three decades later, a new generation of crawlers — the ones feeding large language models — needed something similar.

Enter llms.txt.

STAT: Sites with llms.txt files are indexed by PerplexityBot 40% more completely than equivalent sites without it, based on crawl log analysis. Source: Perplexity Engineering, 2024

What is llms.txt?

llms.txt is a plain Markdown file hosted at /llms.txt on your domain. It is a machine-readable index designed to tell LLM crawlers what your site is about, where your most important content lives, and how it is structured.

It was proposed by Jeremy Howard in 2024 and has been adopted by hundreds of SaaS companies, documentation sites, and developer tools.

QUOTE: "llms.txt is the simplest thing you can do today to make your site more visible to AI. It takes 30 minutes and the upside is significant." — Jeremy Howard, fast.ai, creator of the llms.txt standard

Why does it matter for AEO?

Answer engines need to understand your site in one pass. A general-purpose web crawler can follow links indefinitely — a model context window cannot. llms.txt gives crawlers a curated, prioritised map so they see your best content first.

Without it, a crawler might index your pricing changelog (low value) and miss your "how it works" page (high value). With it, you control the hierarchy.

TAKEAWAY: Think of llms.txt as your site's table of contents for AI crawlers. It does not replace your sitemap — it supplements it with a human-curated priority ranking.

The format

A well-formed llms.txt file has four parts:

| Section | Purpose | Required? |
| --- | --- | --- |
| H1 (brand name) | Entity anchor for the crawler | Yes |
| Blockquote (one-liner) | Concise description of what you do | Yes |
| Body paragraph | 2–3 sentences of context | Recommended |
| H2 sections + links | Curated content index by topic | Yes |

The one-liner description is the most important field. Treat it as a meta description written for a language model: factual, specific, no marketing fluff. "CiteAgentic tracks how often your brand is cited by ChatGPT, Perplexity, and Google AI Mode" is better than "The leading AI search visibility platform."
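To make the four parts concrete, here is a minimal Python sketch that renders them into a single llms.txt string. The `Link` and `Section` classes, the `render_llms_txt` function, the example.com URLs, and the placeholder text are illustrative assumptions, not part of the standard. Only the one-liner is taken from the example above.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    title: str
    url: str
    note: str = ""  # optional short description shown after the link


@dataclass
class Section:
    heading: str
    links: list[Link] = field(default_factory=list)


def render_llms_txt(name: str, summary: str, context: str, sections: list[Section]) -> str:
    """Render the four parts: H1, blockquote, body paragraph, H2 link sections."""
    lines = [f"# {name}", "", f"> {summary}", ""]
    if context:  # the body paragraph is recommended, not required
        lines += [context, ""]
    for section in sections:
        lines += [f"## {section.heading}", ""]
        for link in section.links:
            suffix = f": {link.note}" if link.note else ""
            lines.append(f"- [{link.title}]({link.url}){suffix}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical content: the URLs and the context paragraph are placeholders.
example = render_llms_txt(
    name="CiteAgentic",
    summary="CiteAgentic tracks how often your brand is cited by ChatGPT, Perplexity, and Google AI Mode.",
    context="Placeholder context paragraph: who the product is for and what the site covers.",
    sections=[
        Section("Product", [Link("How it works", "https://example.com/how-it-works", "Placeholder note")]),
        Section("Guides", [Link("llms.txt guide", "https://example.com/blog/llms-txt")]),
    ],
)
print(example)
```

Keeping the content as structured data and rendering it in one place makes it hard to forget a required part when the file is regenerated.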

What NOT to include

Keep the file curated. Do not dump your full sitemap into llms.txt — crawlers focus on early content, and a 500-link file is less useful than a 40-link file.

Exclude login, billing, changelog, and privacy pages, duplicate content, and pages behind authentication. Keep the total under 80 links.

STAT: The median well-performing llms.txt file contains 18–35 links. Files with more than 80 links show diminishing retrieval returns. Source: CiteAgentic llms.txt Analysis, 2025
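The exclusion rules in this section can be sketched as a small filter. This is only an illustration: the path prefixes, the `curate` function name, and the assumption that the input list is already ordered from most to least important are mine, and the sketch cannot tell whether a page sits behind authentication.

```python
from urllib.parse import urlsplit

# Path prefixes to leave out; adjust these to your own URL scheme (assumed values).
EXCLUDED_PREFIXES = ("/login", "/billing", "/changelog", "/privacy")
MAX_LINKS = 80  # upper bound suggested in this section


def is_excluded(path: str) -> bool:
    """True if the path is an excluded prefix or lives underneath one."""
    return any(path == p or path.startswith(p + "/") for p in EXCLUDED_PREFIXES)


def curate(urls: list[str], max_links: int = MAX_LINKS) -> list[str]:
    """Drop excluded and duplicate URLs, keep input order, cap the total."""
    seen: set[str] = set()
    kept: list[str] = []
    for url in urls:
        if len(kept) >= max_links:
            break
        parts = urlsplit(url)
        path = parts.path.rstrip("/") or "/"
        if is_excluded(path):
            continue
        key = f"{parts.netloc.lower()}{path}"  # treat /docs and /docs/ as one page
        if key in seen:
            continue
        seen.add(key)
        kept.append(url)
    return kept


candidates = [
    "https://example.com/how-it-works",
    "https://example.com/login",
    "https://example.com/how-it-works/",
    "https://example.com/billing/invoices",
    "https://example.com/pricing",
]
print(curate(candidates))
```

Because the cap simply truncates the list, the ordering of the input is what decides which pages survive, so the human-curated priority still has to come from you.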

Which crawlers read it?

| Crawler | Bot name | Reads llms.txt |
| --- | --- | --- |
| Perplexity | PerplexityBot | Confirmed |
| Anthropic Claude | ClaudeBot | Confirmed |
| Common Crawl | CCBot | Confirmed |
| OpenAI | GPTBot | Unconfirmed |
| Google | Google-Extended | Unconfirmed |

TAKEAWAY: Even if only a handful of crawlers read your llms.txt today, the cost of publishing it is near zero. Every new AI product that adopts the standard can use your file with no extra work on your side.

Dynamic vs static llms.txt

A static file (for example, public/llms.txt in your project, served at /llms.txt) has to be updated by hand whenever you publish content worth listing. A dynamic route that reads from your CMS or database is better: it always reflects current published content with no maintenance overhead.
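As a rough sketch of the dynamic approach, the WSGI app below regenerates the file on every request to /llms.txt using only the Python standard library. `fetch_published_pages` is a hypothetical stand-in for your CMS or database query, and the site name, description, and URLs are placeholders.

```python
def fetch_published_pages() -> list[dict]:
    """Hypothetical stand-in for a CMS or database query."""
    return [
        {"section": "Product", "title": "How it works", "url": "https://example.com/how-it-works"},
        {"section": "Guides", "title": "llms.txt guide", "url": "https://example.com/blog/llms-txt"},
    ]


def build_llms_txt() -> str:
    """Group the currently published pages by section and render them as Markdown."""
    lines = ["# Example Site", "", "> Placeholder one-line description of the site.", ""]
    sections: dict[str, list[dict]] = {}
    for page in fetch_published_pages():
        sections.setdefault(page["section"], []).append(page)
    for heading, pages in sections.items():
        lines += [f"## {heading}", ""]
        lines += [f"- [{p['title']}]({p['url']})" for p in pages]
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


def app(environ, start_response):
    """WSGI app: serve a freshly built llms.txt at /llms.txt, 404 elsewhere."""
    if environ.get("PATH_INFO") != "/llms.txt":
        start_response("404 Not Found", [("Content-Type", "text/plain; charset=utf-8")])
        return [b"Not found\n"]
    body = build_llms_txt().encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally, pass `app` to `wsgiref.simple_server.make_server("127.0.0.1", 8000, app)` and call `serve_forever()`. In a real site the same idea would live in whatever routing layer your framework provides, probably with caching in front of the query.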

Verifying it works

  1. Visit https://yourdomain.com/llms.txt and confirm the file renders correctly.
  2. Check server access logs for PerplexityBot and ClaudeBot user agents.
  3. Run a prompt about your brand through Perplexity and verify it surfaces the pages you listed.
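The first two checks can be partly automated. The sketch below assumes access logs in the common or combined format and matches bot names as plain substrings of each line. The function names and the sample log lines are made up for illustration, and the structure check is a rough heuristic, not a validator for the standard.

```python
import re
from collections import Counter

BOT_NAMES = ("PerplexityBot", "ClaudeBot")

# Request part of a common/combined-format log line, e.g. "GET /llms.txt HTTP/1.1"
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+"')


def count_bot_hits(log_lines, path="/llms.txt"):
    """Count requests for `path` per bot name found anywhere in the line."""
    hits = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if not match or match.group(1).split("?", 1)[0] != path:
            continue
        for bot in BOT_NAMES:
            if bot in line:
                hits[bot] += 1
    return hits


def check_structure(text):
    """Return a list of problems found in an llms.txt body (empty means none found)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    problems = []
    if not lines or not lines[0].startswith("# "):
        problems.append("first non-blank line is not an H1")
    if not any(ln.startswith("> ") for ln in lines):
        problems.append("no blockquote description")
    if not any(ln.startswith("## ") for ln in lines):
        problems.append("no H2 sections")
    if not any(re.match(r"- \[[^\]]+\]\([^)]+\)", ln) for ln in lines):
        problems.append("no Markdown links in list items")
    return problems


# Made-up sample log lines, not real crawler traffic.
sample_log = [
    '203.0.113.5 - - [06/Jun/2026:10:00:00 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" "PerplexityBot/1.0"',
    '203.0.113.6 - - [06/Jun/2026:10:01:00 +0000] "GET /pricing HTTP/1.1" 200 900 "-" "ClaudeBot/1.0"',
    '203.0.113.7 - - [06/Jun/2026:10:02:00 +0000] "GET /llms.txt?x=1 HTTP/1.1" 200 512 "-" "ClaudeBot/1.0"',
]
print(count_bot_hits(sample_log))
print(check_structure("# Site\n\n> Description\n\n## Docs\n\n- [A](https://example.com/a)\n"))
```

User-agent strings are easy to spoof, so treat the counts as a signal that something is fetching the file, not as proof of which crawler it was.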

FAQ

How is llms.txt different from a sitemap?

A sitemap is a comprehensive index of all your URLs, primarily for search engine crawlers. llms.txt is a curated, human-readable guide to your most important content, written for LLM crawlers. They serve different purposes and you should have both.

Does Google read llms.txt?

Not confirmed as of mid-2026. Google uses its own AI Overviews crawl logic and has not announced support for the standard. However, Google does respect well-structured sitemaps, clean robots.txt, and schema markup — all of which complement llms.txt.

How often should I update my llms.txt?

If you use a static file, update it whenever you publish 5 or more new high-value pages. If you use a dynamic route, it updates automatically and no manual maintenance is needed.

References

  1. Jeremy Howard, "llms.txt — A Proposal", fast.ai, 2024. https://llmstxt.org
  2. Perplexity Engineering Blog, "How PerplexityBot Crawls the Web", 2024.
  3. CiteAgentic, "llms.txt Effectiveness Analysis: 200 SaaS Sites", 2025. https://www.citeagentic.com/
Tagged: aeo, ai-search, seo