
Free Robots.txt Generator — Build robots.txt with Rules & Sitemaps | SemlyPro


# robots.txt generated by SemlyPro — https://www.semlypro.com/seo-ai-tools/robots-txt-generator
# Place this file at the root of your domain: https://example.com/robots.txt
User-agent: *
Allow: /wp-admin/admin-ajax.php
Disallow: /cart/
Disallow: /checkout/
Disallow: /wp-admin/
Disallow: /*?*sort=
Disallow: /search/

User-agent: GPTBot
Disallow: /

Sitemap: https://example.com/sitemap.xml

Your robots.txt file is the first thing most search-engine crawlers read when they arrive at your site. It is a plain-text file that lives at the root of your domain — https://example.com/robots.txt — and it tells crawlers which parts of your site they may request and which they should leave alone. Get it right and you steer crawl budget toward the pages that matter; get it wrong and you can accidentally hide your whole site from Google.

This guide explains exactly what robots.txt does, the directives it supports, how to write one that follows the Robots Exclusion Protocol (now formalised as RFC 9309), and the mistakes that quietly tank rankings. Use the generator above to build a valid file in seconds, then keep this as your reference.

What Is a Robots.txt File?

Robots.txt is a set of instructions for automated crawlers, grouped by user-agent (the name a crawler identifies itself by, such as Googlebot or Bingbot). Within each group you list Allow and Disallow rules that match URL paths. Crawlers that respect the protocol read the file before fetching pages and skip anything you have disallowed.
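To make the grouping concrete, here is a minimal Python sketch of how a file splits into user-agent groups. The `Group` class and `parse_robots` function are hypothetical names for illustration, and the parser is deliberately simplified: it keeps only Allow, Disallow and Sitemap lines and skips everything else. It is not any search engine's actual parser.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Group:
    """One user-agent group: the agents it names and its rules in file order."""
    agents: list[str] = field(default_factory=list)
    rules: list[tuple[str, str]] = field(default_factory=list)  # (directive, path)


def parse_robots(text: str) -> tuple[list[Group], list[str]]:
    """Split robots.txt text into user-agent groups plus sitemap URLs."""
    groups: list[Group] = []
    sitemaps: list[str] = []
    current: Group | None = None
    last_was_agent = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if ":" not in line:
            continue  # blank or malformed line
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()  # field names are case-insensitive
        if key == "user-agent":
            # Consecutive User-agent lines share a single group.
            if current is None or not last_was_agent:
                current = Group()
                groups.append(current)
            current.agents.append(value)
            last_was_agent = True
        elif key in ("allow", "disallow"):
            if current is not None:  # rules before any User-agent are ignored
                current.rules.append((key, value))
            last_was_agent = False
        elif key == "sitemap":
            sitemaps.append(value)  # Sitemap is not tied to any group
            last_was_agent = False
        else:
            last_was_agent = False  # e.g. Crawl-delay, skipped in this sketch
    return groups, sitemaps
```

Note that two User-agent lines in a row (for example GPTBot followed by CCBot) form one group that shares the rules beneath them.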

One thing it is not: a security or privacy tool. Disallowing a path stops compliant crawlers from requesting it, but the URL can still be discovered and the page can still be indexed if other sites link to it. To keep a page out of search results entirely, use a noindex meta tag or HTTP header — not robots.txt. And never rely on robots.txt to hide sensitive data; anyone can read the file.

The Core Directives

A handful of directives cover almost every real-world need:

User-agent — names the crawler the following rules apply to. * matches every crawler that doesn't have its own group.
Disallow — blocks a URL path. Disallow: /admin/ blocks that folder; an empty Disallow: blocks nothing (i.e. allow all).
Allow — carves an exception out of a broader Disallow, e.g. allowing one file inside an otherwise-blocked folder.
Crawl-delay — asks a crawler to wait N seconds between requests. Bing honours it; Google ignores it, so it cannot be used to slow Googlebot.
Sitemap — points crawlers to your XML sitemap. It must be an absolute URL and can appear anywhere in the file.
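One quick way to see these directives read back is Python's standard-library `urllib.robotparser`. The file below is an illustrative example, not a recommendation. Be aware that this parser applies the first matching rule in file order and treats paths as plain prefixes, so it does not follow Google's longest-match precedence or expand `*` wildcards; it is only a convenient check for simple files.

```python
from urllib.robotparser import RobotFileParser

# Illustrative file: one catch-all group, one Bingbot group with a delay.
ROBOTS_TXT = """\
User-agent: *
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/
Disallow: /cart/

User-agent: Bingbot
Crawl-delay: 5
Disallow: /search/

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("Googlebot", "https://example.com/cart/"))                    # False
print(parser.can_fetch("Googlebot", "https://example.com/wp-admin/admin-ajax.php"))  # True
print(parser.crawl_delay("Bingbot"))    # 5
print(parser.crawl_delay("Googlebot"))  # None (the * group sets no delay)
print(parser.site_maps())               # ['https://example.com/sitemap.xml']
```

The Allow line for admin-ajax.php works here only because it comes before the broader Disallow; Google would reach the same answer regardless of order.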

Google supports two pattern characters: * matches any sequence of characters, and $ anchors a rule to the end of the URL. So Disallow: /*.pdf$ blocks every URL that ends in .pdf.
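The two pattern characters translate directly into regular expressions. This is a minimal sketch with hypothetical helper names (`rule_to_regex`, `rule_matches`); it assumes the rule is matched against the URL path plus any query string, starting from the first character.

```python
from __future__ import annotations

import re


def rule_to_regex(path: str) -> re.Pattern[str]:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the rule
    to the end of the URL. Everything else is matched literally, as a prefix.
    """
    anchored = path.endswith("$")
    if anchored:
        path = path[:-1]
    # Escape regex metacharacters, then join the literal pieces with '.*'.
    body = ".*".join(re.escape(part) for part in path.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def rule_matches(path_pattern: str, url_path: str) -> bool:
    # re.match anchors at the start of the string, giving prefix semantics.
    return rule_to_regex(path_pattern).match(url_path) is not None


print(rule_matches("/*.pdf$", "/files/report.pdf"))             # True
print(rule_matches("/*.pdf$", "/files/report.pdf?download=1"))  # False
print(rule_matches("/*?*sort=", "/shoes?color=red&sort=price")) # True
```

The second call shows why `$` matters: a query string after `.pdf` means the URL no longer ends in `.pdf`, so the rule does not apply.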

How Crawlers Choose Which Rule Wins

This trips up even experienced SEOs. Crawlers do not read rules top to bottom; for Google and most modern crawlers, the most specific (longest matching) rule wins. If you have Disallow: /blog/ and Allow: /blog/launch-post, the longer Allow rule lets that one post through while the rest of the folder stays blocked. When two rules are equally specific, Allow wins. Knowing this means you can write tight, intentional rules instead of fighting line order.
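The precedence logic fits in a few lines. This sketch measures specificity by the length of the rule's pattern, which is how Google's documentation describes it, and lets Allow win ties; `is_allowed` and `_compile` are hypothetical names, and the rules are assumed to come from the single group that applies to the crawler.

```python
from __future__ import annotations

import re


def _compile(pattern: str) -> re.Pattern[str]:
    # '*' = any characters, trailing '$' = end of URL; otherwise a prefix match.
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    body = ".*".join(re.escape(piece) for piece in core.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules: list[tuple[str, str]], url_path: str) -> bool:
    """Longest matching pattern wins; on a tie, Allow wins.

    rules: (directive, pattern) pairs such as ("disallow", "/blog/").
    If no rule matches at all, the URL is allowed.
    """
    best_len = -1
    best_allow = True
    for directive, pattern in rules:
        if not pattern:  # an empty rule matches nothing
            continue
        if _compile(pattern).match(url_path) is None:
            continue
        allow = directive.lower() == "allow"
        length = len(pattern)
        if length > best_len or (length == best_len and allow):
            best_len, best_allow = length, allow
    return best_allow


rules = [("disallow", "/blog/"), ("allow", "/blog/launch-post")]
print(is_allowed(rules, "/blog/launch-post"))  # True: the longer Allow wins
print(is_allowed(rules, "/blog/other-post"))   # False: only Disallow matches
```

Because the loop compares lengths rather than positions, swapping the order of the two rules gives the same result.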

How to Write a Robots.txt File, Step by Step

1. Decide what to block — and why

Most sites should let crawlers index nearly everything. Reserve Disallow for low-value or duplicate URLs: internal search results, faceted-navigation parameters, cart and checkout pages, and admin areas. Blocking these focuses crawl budget on pages you actually want ranked.

2. Create your user-agent groups

Start with a single User-agent: * group that covers all crawlers. Add a dedicated group only when a specific bot needs different treatment — for example, blocking AI training crawlers like GPTBot or CCBot while still allowing Googlebot.
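Once a bot has its own group, it stops reading the `*` group entirely, so any rules you still want it to follow must be repeated inside its dedicated group. The sketch below shows that selection under a simplifying assumption: a group applies when one of its User-agent values equals the crawler's token, ignoring case. All matching groups are merged, and the `*` groups are used only when nothing else matches, as RFC 9309 describes. `rules_for` is a hypothetical helper name.

```python
from __future__ import annotations

Rule = tuple[str, str]  # (directive, path), e.g. ("disallow", "/cart/")


def rules_for(groups: list[tuple[list[str], list[Rule]]], token: str) -> list[Rule]:
    """Return the rules a crawler with this product token should obey."""
    token = token.lower()
    # Groups that name this crawler explicitly (case-insensitive).
    named = [rules for agents, rules in groups
             if any(agent.lower() == token for agent in agents)]
    if not named:
        # No dedicated group: fall back to the catch-all '*' group(s).
        named = [rules for agents, rules in groups if "*" in agents]
    return [rule for rules in named for rule in rules]


groups = [
    (["*"], [("disallow", "/cart/"), ("disallow", "/checkout/")]),
    (["GPTBot"], [("disallow", "/")]),
]
print(rules_for(groups, "Googlebot"))  # the '*' rules
print(rules_for(groups, "GPTBot"))     # [('disallow', '/')]
```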

3. Add Allow exceptions where needed

If a blocked folder contains a resource crawlers need (a common WordPress example is /wp-admin/admin-ajax.php), add an Allow rule so rendering isn't broken.

4. Declare your sitemap

Add a Sitemap: line with the full, absolute URL of your XML sitemap, such as https://example.com/sitemap.xml. It hands crawlers a direct list of your canonical URLs, which speeds up discovery.
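Because a relative path such as /sitemap.xml is not valid here, a small check can confirm the line carries an absolute URL. This is a sketch using only `urllib.parse`; `is_valid_sitemap_line` is a hypothetical name, and it assumes http or https sitemaps.

```python
from urllib.parse import urlsplit


def is_valid_sitemap_line(line: str) -> bool:
    """Check that a 'Sitemap:' line carries an absolute http(s) URL."""
    key, sep, value = line.partition(":")  # split at the first colon only
    if not sep or key.strip().lower() != "sitemap":
        return False
    parts = urlsplit(value.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


print(is_valid_sitemap_line("Sitemap: https://example.com/sitemap.xml"))  # True
print(is_valid_sitemap_line("Sitemap: /sitemap.xml"))                     # False
```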

5. Validate and deploy

Check for the classic mistakes (below), then upload the file to your domain root so it serves at /robots.txt. Re-test it in Google Search Console's robots.txt report after going live.

Common Robots.txt Mistakes

Mistake	Why it hurts
Leaving Disallow: / from staging	Blocks the entire site — the #1 cause of sudden de-indexing after launch.
Using robots.txt to "hide" a page	Blocked URLs can still be indexed via external links; use noindex instead.
Blocking CSS or JS	Stops Google from rendering the page, which can hurt rankings.
Wrong file location or name	It must be lowercase robots.txt at the domain root to be read.
Relying on Crawl-delay for Google	Google ignores it; to slow Googlebot briefly, return 503 or 429 responses instead.
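Several rows of this table can be caught automatically before you upload. The linter below is a heuristic sketch, not a full validator: `lint_robots` is a hypothetical name, it only recognises CSS/JS blocks written as patterns ending in .css or .js, and it flags Crawl-delay only in groups that name Googlebot.

```python
from __future__ import annotations


def lint_robots(text: str) -> list[str]:
    """Flag a few common robots.txt mistakes (heuristics only)."""
    warnings: list[str] = []
    agents: list[str] = []
    last_was_agent = False
    has_sitemap = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # ignore comments
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if not last_was_agent:
                agents = []  # a new group starts here
            agents.append(value.lower())
            last_was_agent = True
            continue
        last_was_agent = False
        if key == "sitemap":
            has_sitemap = True
        elif key == "disallow" and value == "/" and "*" in agents:
            warnings.append(f"line {number}: 'Disallow: /' blocks the whole site for every crawler")
        elif key == "disallow" and value.rstrip("$").endswith((".css", ".js")):
            warnings.append(f"line {number}: blocking CSS/JS can stop Google rendering pages")
        elif key == "crawl-delay" and "googlebot" in agents:
            warnings.append(f"line {number}: Google ignores Crawl-delay")
    if not has_sitemap:
        warnings.append("no Sitemap: line found")
    return warnings


staging_leftover = "User-agent: *\nDisallow: /\n"
for warning in lint_robots(staging_leftover):
    print(warning)
```

Running it on a leftover staging file prints the whole-site warning plus the missing-sitemap warning.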

Robots.txt Best Practices

Keep it minimal — block only what genuinely shouldn't be crawled.
Always include a Sitemap: line pointing to your live XML sitemap.
Never block resources (CSS, JS, images) Google needs to render pages.
Use noindex, not Disallow, to remove a page from search results.
Re-check the file after every redesign or migration.

Expert Tips

Block low-value URLs, not content

Reserve Disallow for internal search, faceted-navigation parameters, cart and admin paths, so crawl budget goes to your real content. Never block pages you actually want ranked.

Test before — and after — you launch

A leftover "Disallow: /" from staging is the most common cause of sudden de-indexing. Validate the file here, then re-check it in Search Console once it is live.

Frequently Asked Questions

Where does the robots.txt file go?

It must sit at the root of your domain and be served at https://yoursite.com/robots.txt. A robots.txt in a subfolder is ignored. Each subdomain needs its own file, and the filename must be all lowercase.
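Since crawlers only look in one place, the robots.txt that governs any page can be derived from the page's own URL. This sketch (`robots_url_for` is a hypothetical name) keeps the scheme, host and port, which is why each subdomain gets its own file, and always uses /robots.txt as the path.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url_for(page_url: str) -> str:
    """Return the robots.txt location that applies to this page URL."""
    parts = urlsplit(page_url)
    # Keep scheme and host (including subdomain and port); the path is fixed.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url_for("https://example.com/blog/post?x=1"))  # https://example.com/robots.txt
print(robots_url_for("https://shop.example.com/cart/"))     # https://shop.example.com/robots.txt
```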

Does robots.txt stop a page from being indexed?

No. Disallow stops compliant crawlers from fetching the page, but the URL can still appear in search results if other pages link to it. To guarantee a page is excluded from search, use a noindex meta tag or X-Robots-Tag header — and make sure that page is not blocked in robots.txt, or the crawler can't see the noindex.
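The conflict described here (a noindex that can never be seen) is easy to check for. This sketch assumes you already know which URLs send noindex, via the meta tag or an X-Robots-Tag header, and uses the standard-library `urllib.robotparser`, whose first-match, prefix-only matching is adequate for plain folder rules but not for wildcard patterns. `hidden_noindex` is a hypothetical name.

```python
from __future__ import annotations

from urllib.robotparser import RobotFileParser


def hidden_noindex(robots_txt: str, pages: dict[str, bool],
                   agent: str = "Googlebot") -> list[str]:
    """List URLs that send noindex but are blocked from crawling.

    pages maps each URL to True if it sends noindex. A blocked URL is
    never fetched, so its noindex cannot take effect.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url, noindex in pages.items()
            if noindex and not parser.can_fetch(agent, url)]


robots = "User-agent: *\nDisallow: /private/\n"
pages = {
    "https://example.com/private/old.html": True,    # noindex, but blocked
    "https://example.com/thanks.html": True,         # noindex, crawlable
    "https://example.com/private/data.html": False,  # blocked, no noindex
}
print(hidden_noindex(robots, pages))  # ['https://example.com/private/old.html']
```

Any URL it returns needs either the Disallow removed or the noindex dropped, depending on which outcome you want.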

Does Google respect the Crawl-delay directive?

No. Google ignores Crawl-delay entirely, while Bing and several other crawlers honour it. Google has also retired the crawl-rate setting in Search Console; to slow Googlebot temporarily, its documentation recommends returning 500, 503 or 429 HTTP status codes.

Can I block AI crawlers like GPTBot in robots.txt?

Yes. Compliant AI crawlers publish their user-agent tokens — for example GPTBot, CCBot, ClaudeBot and Google-Extended. Add a dedicated group with Disallow: / for each token you want to keep out, while leaving your search-engine groups untouched.
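Adding those groups can be scripted so existing search-engine groups stay untouched. This is a sketch under stated assumptions: `add_ai_blocks` is a hypothetical name, the token list is whatever you choose to pass in, and a token counts as already handled if any User-agent line names it (case-insensitively).

```python
from __future__ import annotations


def add_ai_blocks(robots_txt: str, tokens: list[str]) -> str:
    """Append a 'Disallow: /' group for each token not already named.

    Existing groups (for example User-agent: * or Googlebot) are left
    as they are, so search-engine crawling is unaffected.
    """
    named = {
        line.split(":", 1)[1].strip().lower()
        for line in robots_txt.splitlines()
        if line.strip().lower().startswith("user-agent:")
    }
    new_groups = [
        f"User-agent: {token}\nDisallow: /"
        for token in tokens
        if token.lower() not in named
    ]
    if not new_groups:
        return robots_txt  # nothing to add
    return robots_txt.rstrip("\n") + "\n\n" + "\n\n".join(new_groups) + "\n"


base = "User-agent: *\nDisallow: /cart/\n"
print(add_ai_blocks(base, ["GPTBot", "CCBot", "ClaudeBot", "Google-Extended"]))
```

Running it a second time with the same tokens returns the file unchanged, so it is safe to re-run after edits.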
