llms.txt Explained: How to Guide AI Crawlers to Your Best Content
Typical LLM context windows force models to prioritize — llms.txt tells them what to read first.
A focused markdown file skips nav, scripts, and ads so models spend tokens on real content.
Like robots.txt, llms.txt lives at your domain root: one file, sitewide impact.
AI assistants and search tools like ChatGPT, Perplexity, Claude, and Google's AI Overviews now read your website to answer questions and cite sources. But large language models have a constraint traditional crawlers don't: a limited context window. They can't read your entire site, and your real content is often buried under navigation, scripts, cookie banners, and ads. llms.txt addresses this: it's a single markdown file that hands AI systems a clean, curated map of your most important pages.
This guide explains what llms.txt is, the exact format it uses, how to write one that AI crawlers actually use, and how it differs from robots.txt and sitemap.xml.
What Is llms.txt?
llms.txt is a proposed standard (published at llmstxt.org) for a markdown file you host at the root of your domain — https://yoursite.com/llms.txt. It gives large language models a concise, structured summary of your site and links to the pages that matter most, in clean markdown they can parse without wading through HTML clutter.
Think of it as a curated reading list for AI. Instead of guessing which of your thousands of URLs are important, a model reads your llms.txt and goes straight to your documentation, guides, and key product pages.
Why llms.txt Matters for AI Search (GEO)
Generative Engine Optimization (GEO) — getting cited and recommended by AI answer engines — is becoming as important as ranking in classic search. llms.txt helps in three ways:
- Context efficiency. Models have limited context windows. A focused llms.txt lets them ingest your best content without burning tokens on boilerplate.
- Accurate citations. When you point AI to authoritative, up-to-date pages, you reduce the chance it cites wrong or outdated content about your brand.
- Discoverability. It signals which sections (docs, guides, API reference) deserve attention, increasing the odds your content shows up in AI answers.
The llms.txt Format Explained
The spec is intentionally simple — it's just markdown, parsed in a fixed order:
| Element | Markdown | Purpose |
|---|---|---|
| Title | # H1 (exactly one) | Your site or project name |
| Summary | > blockquote | A one-line description of the site |
| Details | Free-form paragraphs | Optional background context (no headings) |
| Sections | ## H2 + bullet links | Named lists of links, e.g. Docs, Guides |
| Optional | ## Optional section | Links models may skip when context is tight |
Each link follows the pattern - [Title](https://url): optional description. The description is a short note explaining what the page covers.
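To make the layout concrete, here is a minimal Python sketch that renders a file in the order the table describes. The `Link` and `Section` classes, the `render_llms_txt` function, and every name and URL in the sample data are hypothetical and exist only for illustration; they are not part of the llms.txt proposal.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    title: str
    url: str
    description: str = ""  # optional note that follows the colon


@dataclass
class Section:
    name: str
    links: list[Link] = field(default_factory=list)


def render_llms_txt(title: str, summary: str, details: str, sections: list[Section]) -> str:
    """Render title, summary, details, then sections, in that order."""
    lines = [f"# {title}", "", f"> {summary}", ""]
    if details:
        lines += [details, ""]
    for section in sections:
        lines += [f"## {section.name}", ""]
        for link in section.links:
            entry = f"- [{link.title}]({link.url})"
            if link.description:
                entry += f": {link.description}"
            lines.append(entry)
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Placeholder content: the project, pages, and URLs are invented.
example = render_llms_txt(
    title="Example Project",
    summary="Example Project is a hypothetical API for sending invoices.",
    details="All URLs below are placeholders.",
    sections=[
        Section("Docs", [
            Link("Quickstart", "https://example.com/docs/quickstart",
                 "Install and send a first invoice"),
        ]),
        Section("Optional", [
            Link("Changelog", "https://example.com/changelog"),
        ]),
    ],
)
print(example)
```

Keeping the content in a small data structure and rendering it on each deploy is one way to stop the file from drifting out of date, though hand-writing the markdown works just as well for a small site.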
How to Write a Good llms.txt File
1. Lead with a sharp summary
The blockquote is the first thing a model reads. Write one clear sentence that states what your site does and who it's for. Avoid marketing fluff — be specific and factual.
2. Curate, don't dump
Resist the urge to list every page. Include only the pages you'd want an AI to learn from and cite: documentation, foundational guides, your core product or service pages, and key reference material.
3. Group links into clear sections
Use H2 sections like Docs, Guides, API, and About so models understand the role of each link. Add a short description to each link explaining its value.
4. Use the Optional section wisely
Put secondary links — changelogs, legal pages, archived posts — under a ## Optional heading. Models that are short on context are allowed to skip this section entirely.
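As a rough illustration of what "skipping" could look like on the consuming side, the sketch below removes the Optional section from a file before it is handed to a model. The `drop_optional` function and the sample content are hypothetical, and the case-insensitive heading match is an assumption rather than something the proposal requires.

```python
def drop_optional(llms_txt: str) -> str:
    """Return the file without its '## Optional' section."""
    kept = []
    skipping = False
    for line in llms_txt.splitlines():
        if line.startswith("## "):
            # Each H2 starts a new section; skip it only if it is "Optional".
            skipping = line[3:].strip().lower() == "optional"
        if not skipping:
            kept.append(line)
    return "\n".join(kept).rstrip() + "\n"


# Placeholder file: the site and URLs are invented for this example.
sample = """# Example Project

> A hypothetical site used only for illustration.

## Docs

- [Quickstart](https://example.com/docs/quickstart): First steps

## Optional

- [Changelog](https://example.com/changelog)
"""

trimmed = drop_optional(sample)
print(trimmed)
```

Because a consumer may discard everything under that heading, anything a model must see belongs in a regular section instead.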
5. Keep URLs absolute and current
Use full https:// URLs (or let a generator resolve relative paths against your base URL) so links work regardless of where the file is read. Review the file whenever your site structure changes.
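If your pages are stored as relative paths, a generator can resolve them with Python's standard `urllib.parse.urljoin`. This is a minimal sketch: `BASE_URL`, `to_absolute`, and the sample paths are placeholders, and rejecting non-HTTP schemes is a design choice made here, not a rule from the proposal.

```python
from urllib.parse import urljoin, urlparse

BASE_URL = "https://example.com/"  # placeholder; use your own domain


def to_absolute(href: str, base_url: str = BASE_URL) -> str:
    """Resolve a possibly relative href against the site's base URL."""
    absolute = urljoin(base_url, href)
    if urlparse(absolute).scheme not in ("http", "https"):
        raise ValueError(f"Not a web URL: {href!r}")
    return absolute


paths = ["/docs/quickstart", "guides/setup", "https://example.com/api"]
resolved = [to_absolute(path) for path in paths]
for url in resolved:
    print(url)
# https://example.com/docs/quickstart
# https://example.com/guides/setup
# https://example.com/api
```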
llms.txt vs robots.txt vs sitemap.xml
These three files serve different audiences and shouldn't be confused:
- robots.txt tells crawlers what they are and aren't allowed to access. It's about permission.
- sitemap.xml lists every indexable URL for traditional search engines. It's about completeness.
- llms.txt curates your best content for AI models in human-readable markdown. It's about clarity and prioritization.
You should keep all three. They complement each other rather than replace one another.
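One practical consequence of keeping both files is that they can contradict each other: a page you recommend in llms.txt may be disallowed in robots.txt. The sketch below uses Python's standard `urllib.robotparser` to flag that case. The rules, the `ExampleBot` user agent, and the URLs are all invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, inlined so no network access is needed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())

# Hypothetical links taken from an llms.txt file.
CURATED_LINKS = [
    "https://example.com/docs/quickstart",
    "https://example.com/private/roadmap",
]

# robots.txt answers "may this bot fetch the URL?"; llms.txt only recommends it.
blocked = [url for url in CURATED_LINKS if not robots.can_fetch("ExampleBot", url)]
print(blocked)  # ['https://example.com/private/roadmap']
```

Any URL that ends up in `blocked` is worth fixing in one file or the other, since a crawler that respects robots.txt cannot fetch a page just because llms.txt recommends it.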
Where to Put llms.txt and What Comes Next
Host the file at your domain root so it resolves at https://yoursite.com/llms.txt. Many teams also publish an llms-full.txt with the full text of key pages inlined, for models that want everything in one fetch. Start with a clean, well-curated llms.txt, validate that every link works, and update it as your site grows.
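Link validation is easy to automate. The following is a minimal sketch using only the Python standard library; `extract_links`, `http_status`, and `find_broken_links` are hypothetical helper names, and the regular expression assumes links follow the simple `[Title](https://url)` pattern. The demonstration at the end uses a fake status lookup so it runs without network access.

```python
import re
import urllib.error
import urllib.request
from typing import Callable

# Matches [Title](http(s)://url) and captures the title and the URL.
LINK_PATTERN = re.compile(r"\[([^\]]+)\]\((https?://[^)\s]+)\)")


def extract_links(llms_txt: str) -> list[tuple[str, str]]:
    """Return (title, url) pairs for every markdown link in the file."""
    return LINK_PATTERN.findall(llms_txt)


def http_status(url: str, timeout: float = 10.0) -> int:
    """Send a HEAD request; return the status code, or 0 on network failure."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code
    except OSError:  # covers URLError and timeouts
        return 0


def find_broken_links(
    llms_txt: str, get_status: Callable[[str], int] = http_status
) -> list[str]:
    """Return URLs whose status is outside the 200-399 range."""
    return [
        url for _, url in extract_links(llms_txt)
        if not 200 <= get_status(url) < 400
    ]


# Demo with invented URLs and a fake status table instead of real requests.
sample = (
    "- [Quickstart](https://example.com/docs/quickstart): First steps\n"
    "- [Old page](https://example.com/old)\n"
)
fake_statuses = {
    "https://example.com/docs/quickstart": 200,
    "https://example.com/old": 404,
}
broken = find_broken_links(sample, get_status=lambda url: fake_statuses[url])
print(broken)  # ['https://example.com/old']
```

In real use you would call `find_broken_links(text)` with the default `http_status`. Some servers reject HEAD requests, so a failure reported here is a prompt to check the page by hand rather than proof that it is gone.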
Expert Tips
Curate, never dump
List only the pages you want an AI to learn from and cite — docs, foundational guides, and core product pages. A short, sharp file beats an exhaustive one.
Write the summary like a snippet
The blockquote is the first thing a model reads. Make it one factual sentence stating what your site does and who it serves — no marketing fluff.
Frequently Asked Questions
What is an llms.txt file?
An llms.txt file is a markdown document hosted at your domain root that gives AI systems such as ChatGPT and Perplexity a curated, clutter-free summary of your site and links to your most important pages, so they can read and cite your content efficiently.
Where should I put my llms.txt file?
Place it at the root of your domain so it is accessible at https://yoursite.com/llms.txt. That is the location the proposal specifies, mirroring where robots.txt lives.
Is llms.txt the same as robots.txt?
No. robots.txt controls crawler access (what bots may fetch), while llms.txt curates content for AI models in readable markdown (what they should read first). They work together — keep both.
Do AI crawlers actually use llms.txt?
Support varies. llms.txt is still a proposed standard, so not every model or crawler reads it automatically. Even so, a well-structured llms.txt is low-cost, prepares your site for AI search as adoption develops, and helps any tool or agent you point at your content.