Generate a Robots.txt File in Seconds

Build a valid robots.txt with user-agent groups, Allow/Disallow rules, crawl-delay and your sitemap — then copy or download robots.txt in one click.

Example output: 2 groups, 7 rules

What this does

Lets all crawlers fetch the site except 5 disallowed paths, with 1 crawler-specific override (GPTBot is blocked entirely), and points crawlers to 1 sitemap.

robots.txt

# robots.txt generated by SemlyPro — https://www.semlypro.com/seo-ai-tools/robots-txt-generator
# Place this file at the root of your domain: https://example.com/robots.txt

User-agent: *
Allow: /wp-admin/admin-ajax.php
Disallow: /cart/
Disallow: /checkout/
Disallow: /wp-admin/
Disallow: /*?*sort=
Disallow: /search/

User-agent: GPTBot
Disallow: /

Sitemap: https://example.com/sitemap.xml

Save this as robots.txt in the root of your domain so it is served at https://yoursite.com/robots.txt.

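Validation

"Disallow: /" blocks GPTBot from the whole site. This only affects GPTBot and does not remove your pages from search. The same rule under User-agent: * would block every crawler, so make sure no such leftover rule ships at launch.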

Directive cheat-sheet

  • User-agent — which crawler the rules below apply to (* = all).
  • Disallow / Allow — block or permit a URL path; the most specific match wins.
  • Crawl-delay — seconds between requests (Bing/Yandex; Google ignores it).
  • Sitemap — absolute URL of your XML sitemap so crawlers find every page.
The Complete Guide

How to Write a Robots.txt File That Helps (Not Hurts) Your SEO

6 MIN READ

#1
First file crawlers read

robots.txt is typically the first request a crawler makes on your domain, before any page.

0%
Crawl-delay support

Google ignores Crawl-delay entirely; Bing and Yandex honour it.

/robots.txt
Required location

The file is only read at the domain root; a copy in any subfolder is ignored entirely.

Your robots.txt file is the first thing most search-engine crawlers read when they arrive at your site. It is a plain-text file that lives at the root of your domain — https://example.com/robots.txt — and it tells crawlers which parts of your site they may request and which they should leave alone. Get it right and you steer crawl budget toward the pages that matter; get it wrong and you can accidentally hide your whole site from Google.

This guide explains exactly what robots.txt does, the directives it supports, how to write one that follows the Robots Exclusion Protocol (now formalised as RFC 9309), and the mistakes that quietly tank rankings. Use the generator above to build a valid file in seconds, then keep this as your reference.

What Is a Robots.txt File?

Robots.txt is a set of instructions for automated crawlers, grouped by user-agent (the name a crawler identifies itself by, such as Googlebot or Bingbot). Within each group you list Allow and Disallow rules that match URL paths. Crawlers that respect the protocol read the file before fetching pages and skip anything you have disallowed.

One thing it is not: a security or privacy tool. Disallowing a path stops compliant crawlers from requesting it, but the URL can still be discovered and the page can still be indexed if other sites link to it. To keep a page out of search results entirely, use a noindex meta tag or HTTP header — not robots.txt. And never rely on robots.txt to hide sensitive data; anyone can read the file.
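As a rough illustration of the header-based alternative, the sketch below uses Python's standard wsgiref helpers to attach X-Robots-Tag: noindex to a few paths. The path set, the app and the response_headers helper are hypothetical names made up for this example, not part of any particular framework.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical set of paths that should stay out of search results.
NOINDEX_PATHS = {"/thank-you/", "/internal-report/"}


def app(environ, start_response):
    """Tiny WSGI app that adds X-Robots-Tag: noindex to selected paths."""
    path = environ.get("PATH_INFO", "/")
    headers = [("Content-Type", "text/html; charset=utf-8")]
    if path in NOINDEX_PATHS:
        # The crawler must be able to fetch the page to see this header,
        # so these paths should not also be disallowed in robots.txt.
        headers.append(("X-Robots-Tag", "noindex"))
    start_response("200 OK", headers)
    return [b"<html><body>ok</body></html>"]


def response_headers(path):
    """Call the app once and return the headers it sent, as a dict."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured.update(headers)

    app(environ, start_response)
    return captured
```

Calling response_headers("/thank-you/") returns a dict that includes the X-Robots-Tag entry, while any path outside the set gets only the content type.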

The Core Directives

A handful of directives cover almost every real-world need:

  • User-agent — names the crawler the following rules apply to. * matches every crawler that doesn't have its own group.
  • Disallow — blocks a URL path. Disallow: /admin/ blocks that folder; an empty Disallow: blocks nothing (i.e. allow all).
  • Allow — carves an exception out of a broader Disallow, e.g. allowing one file inside an otherwise-blocked folder.
  • Crawl-delay — asks a crawler to wait N seconds between requests. Bing and Yandex honour it; Google ignores it and retired the Search Console crawl-rate setting in January 2024, so to slow Googlebot, temporarily return 503 or 429 responses instead.
  • Sitemap — points crawlers to your XML sitemap. It must be an absolute URL and can appear anywhere in the file.
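The directives above can be assembled programmatically. The following is a minimal sketch, not the generator's actual implementation: build_robots_txt and its dict layout ("agents", "allow", "disallow", "crawl_delay") are assumptions chosen for illustration.

```python
def build_robots_txt(groups, sitemaps=()):
    """Render robots.txt text from a list of group dicts.

    Each group has "agents" (list of user-agent tokens) and optional
    "allow" / "disallow" (lists of paths) and "crawl_delay" (seconds).
    """
    lines = []
    for group in groups:
        for agent in group["agents"]:
            lines.append(f"User-agent: {agent}")
        for path in group.get("allow", []):
            lines.append(f"Allow: {path}")
        for path in group.get("disallow", []):
            lines.append(f"Disallow: {path}")
        if group.get("crawl_delay") is not None:
            lines.append(f"Crawl-delay: {group['crawl_delay']}")
        lines.append("")  # blank line between groups
    for url in sitemaps:
        # Sitemap lines must carry an absolute URL.
        if not url.startswith(("http://", "https://")):
            raise ValueError(f"Sitemap must be an absolute URL: {url}")
        lines.append(f"Sitemap: {url}")
    return "\n".join(lines).rstrip("\n") + "\n"


text = build_robots_txt(
    [
        {
            "agents": ["*"],
            "allow": ["/wp-admin/admin-ajax.php"],
            "disallow": ["/wp-admin/", "/cart/"],
        },
        {"agents": ["GPTBot"], "disallow": ["/"]},
    ],
    sitemaps=["https://example.com/sitemap.xml"],
)
print(text)
```

Rejecting a relative sitemap URL at build time is a design choice of this sketch; a real tool might prefer to warn instead of raising.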

Google supports two pattern characters: * matches any sequence of characters, and $ anchors a rule to the end of the URL. So Disallow: /*.pdf$ blocks every URL that ends in .pdf.
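One way to picture how these two characters behave is to translate a rule into a regular expression. This is a simplified sketch under the assumption that rules match from the start of the path; pattern_to_regex and rule_matches are illustrative helper names, and real crawlers also handle details such as percent-encoding that are skipped here.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters and a trailing '$' anchors the end
    of the URL. Everything else is literal, and a rule without '$' is a
    prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for '*'.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def rule_matches(pattern, url_path):
    """Return True if the rule pattern applies to the given path."""
    return pattern_to_regex(pattern).match(url_path) is not None


print(rule_matches("/*.pdf$", "/files/guide.pdf"))
print(rule_matches("/*.pdf$", "/files/guide.pdf?download=1"))
print(rule_matches("/*?*sort=", "/shoes?color=red&sort=price"))
```

The second call shows why the $ anchor matters: with a query string appended, the URL no longer ends in .pdf, so the rule does not apply.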

How Crawlers Choose Which Rule Wins

This trips up even experienced SEOs. Crawlers do not read rules top to bottom; for Google and most modern crawlers, the most specific (longest matching) rule wins. If you have Disallow: /blog/ and Allow: /blog/launch-post, the longer Allow rule lets that one post through while the rest of the folder stays blocked. When two rules are equally specific, Allow wins. Knowing this means you can write tight, intentional rules instead of fighting line order.
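The precedence rule can be sketched in a few lines of Python. This is an illustration of longest-match selection only: is_allowed is a hypothetical helper, it treats rules as plain prefixes, and it leaves out wildcard handling.

```python
def is_allowed(rules, url_path):
    """Decide whether url_path may be crawled under longest-match precedence.

    rules is a list of (directive, path) tuples, e.g. ("Disallow", "/blog/").
    Only plain prefix matching is handled; wildcards are left out.
    """
    best_len = -1
    allowed = True  # no matching rule means the URL is allowed
    for directive, path in rules:
        if not path or not url_path.startswith(path):
            continue  # empty or non-matching rules never apply
        is_allow = directive.lower() == "allow"
        # A longer path is more specific; on a tie, Allow wins.
        if len(path) > best_len or (len(path) == best_len and is_allow):
            best_len = len(path)
            allowed = is_allow
    return allowed


rules = [("Disallow", "/blog/"), ("Allow", "/blog/launch-post")]
print(is_allowed(rules, "/blog/launch-post"))
print(is_allowed(rules, "/blog/another-post"))
```

Reversing the order of the two rules gives the same answers, which is the point: specificity decides, not position in the file.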

How to Write a Robots.txt File, Step by Step

1. Decide what to block — and why

Most sites should let crawlers fetch nearly everything. Reserve Disallow for low-value or duplicate URLs: internal search results, faceted-navigation parameters, cart and checkout pages, and admin areas. Blocking these focuses crawl budget on the pages you actually want ranked.

2. Create your user-agent groups

Start with a single User-agent: * group that covers all crawlers. Add a dedicated group only when a specific bot needs different treatment — for example, blocking AI training crawlers like GPTBot or CCBot while still allowing Googlebot.
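A quick way to sanity-check group behaviour is Python's standard urllib.robotparser. The sketch below assumes a two-group file like the one described; can_crawl is just a thin illustrative wrapper. Note that this parser applies rules in file order and does not implement wildcard matching, so for overlapping rules it can disagree with Google's longest-match behaviour.

```python
from urllib.robotparser import RobotFileParser

# Example file: one catch-all group plus a dedicated GPTBot group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/

User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def can_crawl(agent, url):
    """Return True if the given user-agent may fetch the URL."""
    return parser.can_fetch(agent, url)


for agent in ("Googlebot", "GPTBot"):
    for url in ("https://example.com/blog/", "https://example.com/cart/"):
        print(agent, url, can_crawl(agent, url))
```

A crawler with its own group follows only that group, so GPTBot is blocked everywhere while Googlebot falls back to the * group and is only kept out of /cart/.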

3. Add Allow exceptions where needed

If a blocked folder contains a resource crawlers need (a common WordPress example is /wp-admin/admin-ajax.php), add an Allow rule so rendering isn't broken.

4. Declare your sitemap

Add a Sitemap: line with the full (absolute) URL of your XML sitemap. It hands crawlers a direct list of your canonical URLs, so they do not have to discover every page through links alone.

5. Validate and deploy

Check for the classic mistakes (below), then upload the file to your domain root so it serves at /robots.txt. Re-test it in Google Search Console's robots.txt report after going live.
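A pre-deploy check for a few of the classic mistakes might look like the sketch below. It is a rough linter written for illustration, not the validation logic of this tool or of Search Console; lint_robots_txt and its warning strings are assumptions.

```python
def lint_robots_txt(text):
    """Return a list of warnings for a few classic robots.txt mistakes."""
    warnings = []
    agents = []        # user-agents of the group currently being read
    in_rules = False   # True once the current group has a rule line
    has_sitemap = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a rule line ended the previous group
                agents, in_rules = [], False
            agents.append(value)
        elif field in ("allow", "disallow", "crawl-delay"):
            in_rules = True
            if field == "disallow" and value == "/" and "*" in agents:
                warnings.append(
                    "Disallow: / under User-agent: * blocks the whole site"
                )
        elif field == "sitemap":
            has_sitemap = True
            if not value.startswith(("http://", "https://")):
                warnings.append(f"Sitemap is not an absolute URL: {value}")
    if not has_sitemap:
        warnings.append("No Sitemap line found")
    return warnings


staging_leftover = "User-agent: *\nDisallow: /\n"
for warning in lint_robots_txt(staging_leftover):
    print(warning)
```

A check like this could run in a deploy pipeline so that a staging file never reaches production unnoticed.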

Common Robots.txt Mistakes

Mistake | Why it hurts
Leaving Disallow: / from staging | Blocks the entire site; the #1 cause of sudden de-indexing after launch.
Using robots.txt to "hide" a page | Blocked URLs can still be indexed via external links; use noindex instead.
Blocking CSS or JS | Stops Google from rendering the page, which can hurt rankings.
Wrong file location or name | It must be lowercase robots.txt at the domain root to be read.
Relying on Crawl-delay for Google | Google ignores it; to slow Googlebot, temporarily return 503 or 429 responses.

Robots.txt Best Practices

  • Keep it minimal — block only what genuinely shouldn't be crawled.
  • Always include a Sitemap: line pointing to your live XML sitemap.
  • Never block resources (CSS, JS, images) Google needs to render pages.
  • Use noindex, not Disallow, to remove a page from search results.
  • Re-check the file after every redesign or migration.

Expert Tips

Block low-value URLs, not content

Reserve Disallow for internal search, faceted-navigation parameters, cart and admin paths. Focusing crawl budget on your real content beats blocking pages you actually want ranked.

Test before — and after — you launch

A leftover "Disallow: /" from staging is the most common cause of sudden de-indexing. Validate the file here, then re-check it in Search Console once it is live.

Frequently Asked Questions

Where does the robots.txt file go?

It must sit at the root of your domain and be served at https://yoursite.com/robots.txt. A robots.txt in a subfolder is ignored. Each subdomain needs its own file, and the filename must be all lowercase.
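To make the location rule concrete, the sketch below derives the robots.txt URL that applies to any page URL using Python's standard urllib.parse. The helper name robots_txt_url is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt URL that governs the given page URL."""
    parts = urlsplit(page_url)
    # Keep scheme and host (including any subdomain and port); the path is
    # always /robots.txt, lowercase, with no query string or fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://example.com/blog/post?x=1"))
print(robots_txt_url("https://shop.example.com/cart/"))
```

The two calls resolve to different files because the subdomain is part of the host, which is why each subdomain needs its own robots.txt.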

Does robots.txt stop a page from being indexed?

No. Disallow stops compliant crawlers from fetching the page, but the URL can still appear in search results if other pages link to it. To guarantee a page is excluded from search, use a noindex meta tag or X-Robots-Tag header — and make sure that page is not blocked in robots.txt, or the crawler can't see the noindex.

Does Google respect the Crawl-delay directive?

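No. Google ignores Crawl-delay entirely. Bing, Yandex and several other crawlers honour it. Google retired the crawl-rate setting in Search Console in January 2024; to slow Googlebot for a short period, return 500, 503 or 429 responses, which Google treats as a signal to reduce its crawl rate.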

Can I block AI crawlers like GPTBot in robots.txt?

Yes. Compliant AI crawlers publish their user-agent tokens — for example GPTBot, CCBot, ClaudeBot and Google-Extended. Add a dedicated group with Disallow: / for each token you want to keep out, while leaving your search-engine groups untouched.
