Sitemap Best Practices: The Complete Guide

15 MIN READ
Last updated: June 6, 2026

Your sitemap is one of the quietest parts of your website. No visitor ever sees it, but search engine crawlers? They use it constantly. Get it wrong, and pages that should rank won't even get indexed. Get it right, and you give Googlebot a clear, efficient map of everything worth crawling on your site.

This guide covers sitemap best practices for 2026, walks you through how to create a sitemap from scratch, and shows you how to avoid the mistakes that silently hurt your organic rankings.

What Is a Sitemap and Why Does It Matter in 2026

A sitemap is a file that lists the URLs on your website so search engines can find, crawl, and index your content faster. Think of it as a table of contents for your entire site. You're telling Google, Bing, and other crawlers exactly what exists, where it lives, and when it was last updated.

In 2026, with AI-driven search results and answer engines pulling content at scale, getting your pages indexed quickly matters more than ever. A properly built sitemap helps you do that.

XML vs. HTML Sitemaps

There are two main types, and they serve very different purposes.

XML sitemaps are built for search engines. They're machine-readable files that usually sit at a URL like yourdomain.com/sitemap.xml. Search engine bots read them directly. You don't write them by hand (well, you could, but you probably shouldn't).

HTML sitemaps are built for people. They're a page on your site that lists your main content in a readable format, organized by category. They help users find what they need and give search engines one more page to crawl.

Both have value, but when most SEOs talk about sitemaps, they mean the XML version. That's what we'll focus on here.

Why Search Engines Still Rely on Sitemaps

Google's crawler can technically find most of your pages without a sitemap by following links. But "technically can" and "actually does quickly" are very different things.

New pages, orphaned content, and deep URLs that don't get many internal links all benefit enormously from being listed in a sitemap. Without one, Google might take weeks to discover a new blog post you published today.

With a properly maintained sitemap? It can often be crawled within hours.

That speed matters for news sites, e-commerce stores with seasonal products, and any site publishing content at volume.

Sitemap Best Practices Every SEO Professional Should Know

There's a lot of bad advice floating around about sitemaps. Some of it's outdated. Some of it was never accurate. Here are the practices that actually hold up in 2026.

Keep Your Sitemap Clean and Crawlable

Your sitemap should only include URLs you want indexed. That sounds obvious, but a surprising number of sites include redirect chains, broken links, and pages with noindex tags in their sitemaps.

That's a waste of Google's crawl budget, and it sends mixed signals. You're essentially saying "please index this page" with your sitemap while also saying "please don't index this page" with your noindex tag. Don't do that.

Rules for a clean sitemap:

  • Only include URLs that return a 200 HTTP status code
  • Never include redirect URLs (301s or 302s)
  • Never include pages with a noindex directive
  • Never include duplicate content URLs
  • Only include canonical URLs

Run a quick crawl of your sitemap at least once a month. Tools like Screaming Frog or your SEO platform can flag these issues fast.
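To make those rules concrete, here's a minimal Python sketch of the per-URL check an audit would run. It assumes your crawler has already recorded each sitemap URL's status code (without following redirects), its canonical URL, and whether it's noindexed; `UrlReport` and `audit_url` are illustrative names, not part of Screaming Frog or any other tool.

```python
from dataclasses import dataclass
from typing import List, Optional

REDIRECT_CODES = {301, 302, 303, 307, 308}


@dataclass
class UrlReport:
    """Crawl data for one sitemap URL (hypothetical structure)."""
    url: str
    status: int               # HTTP status code, recorded without following redirects
    canonical: Optional[str]  # href of the page's rel="canonical" link, if any
    noindex: bool             # True if meta robots or X-Robots-Tag contains noindex


def audit_url(report: UrlReport) -> List[str]:
    """Return the clean-sitemap rules this URL breaks; an empty list means keep it."""
    problems = []
    if report.status in REDIRECT_CODES:
        problems.append("redirect")
    elif report.status != 200:
        problems.append(f"status {report.status}")
    if report.noindex:
        problems.append("noindex")
    if report.canonical is not None and report.canonical != report.url:
        problems.append("not canonical")
    return problems


if __name__ == "__main__":
    ok = UrlReport("https://www.example.com/a/", 200, "https://www.example.com/a/", False)
    moved = UrlReport("https://www.example.com/old", 301, None, False)
    print(audit_url(ok))     # []
    print(audit_url(moved))  # ['redirect']
```

Anything that comes back with a non-empty list should either be fixed or dropped from the sitemap.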

Set Priority and Change Frequency Correctly

The XML sitemap format lets you set two optional values for each URL: <priority> and <changefreq>.

Honestly? Google has said publicly that it ignores both values, but that doesn't mean you have to skip them entirely. Some other search engines may still read them, and setting them thoughtfully costs you almost nothing. The field Google does use is <lastmod>, as long as your dates are consistently accurate.

Here's a sensible approach:

  • Set your homepage priority to 1.0
  • Set main category pages and key landing pages to 0.8
  • Set blog posts and standard pages to 0.5 or 0.6
  • Set rarely-updated archive pages to 0.3

For changefreq, be honest. Don't set everything to "daily" to seem more active. If a page changes once a month, say "monthly." If it rarely changes, say "yearly."
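If you generate your sitemap yourself, the priority guidance above can live in one lookup table. This Python sketch uses the standard library's `xml.etree.ElementTree`; the `PAGE_DEFAULTS` table and `url_entry` helper are hypothetical, and the changefreq values are placeholders you should match to how often each page type really changes.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Hypothetical page-type defaults: (priority, changefreq).
# Priorities follow the guidance above; changefreq values are placeholders.
PAGE_DEFAULTS = {
    "home": ("1.0", "weekly"),
    "category": ("0.8", "weekly"),
    "post": ("0.5", "monthly"),
    "archive": ("0.3", "yearly"),
}


def url_entry(loc: str, page_type: str, lastmod: str) -> ET.Element:
    """Build one <url> element; child order follows the sitemap protocol."""
    priority, changefreq = PAGE_DEFAULTS[page_type]
    url = ET.Element(f"{{{SITEMAP_NS}}}url")
    ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
    ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod  # W3C date, e.g. 2026-06-06
    ET.SubElement(url, f"{{{SITEMAP_NS}}}changefreq").text = changefreq
    ET.SubElement(url, f"{{{SITEMAP_NS}}}priority").text = priority
    return url


if __name__ == "__main__":
    ET.register_namespace("", SITEMAP_NS)  # serialize without an ns0: prefix
    entry = url_entry("https://www.yourdomain.com/", "home", "2026-06-06")
    print(ET.tostring(entry, encoding="unicode"))
```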

Stay Within Sitemap Size Limits

Google's sitemap size limits are firm. Know them.

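Limit type | Maximum value
URLs per sitemap file | 50,000
File size (uncompressed) | 50 MB (52,428,800 bytes)
File size (gzip-compressed) | No separate limit; the file must still be 50 MB or less once uncompressed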

If your site exceeds these limits, you need a sitemap index file. We'll cover that shortly.

Most small to mid-size sites never hit these limits, but e-commerce platforms with tens of thousands of product pages and large publishing sites can reach them quickly.

Always Use Canonical URLs

Every URL in your sitemap should match its canonical version exactly. If your canonical tag points to https://www.yourdomain.com/page/, that's what goes in the sitemap. Not http://yourdomain.com/page. Not https://yourdomain.com/page.

Trailing slashes, HTTP vs. HTTPS, www vs. non-www: these all count. Inconsistency here creates duplicate content issues and confuses crawlers about which version of a page to index.

Check your sitemap URLs against your canonical tags regularly. It's one of those things that slips through the cracks when developers make site changes.
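A quick way to catch these mismatches is to compare each sitemap URL with the canonical URL your crawler found on the page, part by part. This Python sketch uses `urllib.parse.urlsplit` from the standard library; `canonical_mismatches` is a hypothetical helper name.

```python
from typing import List
from urllib.parse import urlsplit


def canonical_mismatches(sitemap_url: str, canonical_url: str) -> List[str]:
    """Describe how a sitemap URL differs from the page's canonical URL."""
    a, b = urlsplit(sitemap_url), urlsplit(canonical_url)
    issues = []
    if a.scheme != b.scheme:
        issues.append(f"scheme: {a.scheme} vs {b.scheme}")
    if a.netloc != b.netloc:
        issues.append(f"host: {a.netloc} vs {b.netloc}")
    if a.path != b.path:
        # Same path apart from a trailing slash is the most common slip.
        if a.path.rstrip("/") == b.path.rstrip("/"):
            issues.append("trailing slash")
        else:
            issues.append(f"path: {a.path} vs {b.path}")
    if a.query != b.query:
        issues.append("query string")
    return issues


if __name__ == "__main__":
    print(canonical_mismatches("http://yourdomain.com/page",
                               "https://www.yourdomain.com/page/"))
    # ['scheme: http vs https', 'host: yourdomain.com vs www.yourdomain.com', 'trailing slash']
```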

How to Create a Sitemap Step by Step

Ready to build one? Here's how to create a sitemap that actually works.

Choose the Right Sitemap Format

Before you generate anything, decide what type of sitemap you need. For most sites, a standard XML sitemap works fine, but depending on your content, you might need:

  • XML sitemap for standard web pages
  • Image sitemap for sites with lots of visual content
  • Video sitemap for sites hosting video content
  • News sitemap for Google News publishers

You can include image and video data inside your main XML sitemap using Google's extensions, or create separate sitemap files for each content type.

Generate Your Sitemap File

You don't need to write XML by hand. Several solid tools will generate your sitemap automatically.

If you're on WordPress, the Yoast SEO or Rank Math plugins generate and update your sitemap automatically. Easy.

If you're on a custom-built site or a platform like Webflow or Shopify, check whether your CMS has a built-in sitemap generator. Most do. For Shopify sites, yourdomain.com/sitemap.xml is generated automatically.

For static sites or custom builds, tools like these work well:

  • Screaming Frog SEO Spider (crawl-based generation)
  • XML-sitemaps.com (free for sites under 500 pages)
  • Sitemap Generator by Semrush
  • Your hosting platform's built-in tools

Pro tip: Whatever tool you use, make sure it respects your noindex pages and doesn't include them in the output.
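For a custom build, a bare-bones generator that honors that pro tip can be this small. It's a Python sketch using only the standard library; the `Page` record and `build_sitemap` function are hypothetical stand-ins for whatever your CMS or crawler exports.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
ET.register_namespace("", SITEMAP_NS)  # emit xmlns="..." instead of an ns0: prefix


@dataclass
class Page:
    """One page exported from your CMS or crawler (hypothetical record)."""
    url: str       # absolute, canonical URL
    lastmod: str   # W3C date, e.g. "2026-06-06"
    noindex: bool


def build_sitemap(pages: List[Page]) -> str:
    """Return sitemap XML listing only indexable pages; text is XML-escaped automatically."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page in pages:
        if page.noindex:
            continue  # never list a page you've asked search engines not to index
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page.url
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = page.lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap([
        Page("https://www.yourdomain.com/", "2026-06-06", noindex=False),
        Page("https://www.yourdomain.com/thank-you/", "2026-05-01", noindex=True),
    ]))
```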

Validate and Test Your Sitemap

Before you submit anything, validate the file. A single formatting error in your XML can cause the whole sitemap to fail silently.

Here's how to check it:

  1. Open your sitemap URL in a browser. It should display formatted XML, not a download prompt.
  2. Use Google Search Console's sitemap report to check for errors.
  3. Run your sitemap through an online XML validator to catch formatting issues.
  4. Check that every URL in the sitemap loads with a 200 status code.
  5. Confirm that no noindex pages have slipped in.
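Steps 1 and 3 can be partly automated. This Python sketch checks that a regular sitemap file is well-formed XML, uses the sitemap namespace, stays within 50,000 URLs, and gives every entry an absolute <loc>. It's a hypothetical `check_sitemap` helper, not a full schema validation, so still lean on Search Console and a proper XML validator.

```python
import xml.etree.ElementTree as ET
from typing import List

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NS = {"sm": SITEMAP_NS}
MAX_URLS = 50_000


def check_sitemap(xml_text: str) -> List[str]:
    """Run basic structural checks; returns a list of problems (empty means none found)."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        return [f"unexpected root element {root.tag}"]
    problems = []
    urls = root.findall("sm:url", NS)
    if len(urls) > MAX_URLS:
        problems.append(f"{len(urls)} URLs exceeds the 50,000 limit")
    for i, url in enumerate(urls, start=1):
        loc = url.find("sm:loc", NS)
        text = (loc.text or "").strip() if loc is not None else ""
        if not text:
            problems.append(f"<url> #{i} has no <loc>")
        elif not text.startswith(("http://", "https://")):
            problems.append(f"<url> #{i} <loc> is not an absolute URL")
    return problems
```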

Don't skip this step. A broken sitemap is worse than no sitemap in some ways because it wastes crawl budget trying to process a malformed file.

Submit Your Sitemap to Search Engines

Once your sitemap is live and validated, tell search engines about it. You've got two options.

Option 1: Submit through Search Console. Go to Google Search Console, open Sitemaps under the Indexing section, and paste in your sitemap URL. For Bing, use Bing Webmaster Tools and do the same.

Option 2: Reference it in robots.txt. Add a line like this to your robots.txt file:

Sitemap: https://www.yourdomain.com/sitemap.xml

This tells any crawler that visits your robots.txt about your sitemap, not just Google and Bing. It's a good habit even if you've already submitted through Search Console.
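If you want to confirm which sitemaps a robots.txt file actually declares, Python's standard library can parse it. This sketch relies on `urllib.robotparser.RobotFileParser.site_maps()`, available since Python 3.8; `sitemaps_in_robots` is an illustrative wrapper name.

```python
from typing import List
from urllib.robotparser import RobotFileParser


def sitemaps_in_robots(robots_txt: str) -> List[str]:
    """Return the Sitemap: URLs declared in a robots.txt body (Python 3.8+)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []  # site_maps() returns None when none are declared


if __name__ == "__main__":
    robots = (
        "User-agent: *\n"
        "Disallow: /cart/\n"
        "\n"
        "Sitemap: https://www.yourdomain.com/sitemap.xml\n"
    )
    print(sitemaps_in_robots(robots))  # ['https://www.yourdomain.com/sitemap.xml']
```

To check a live file instead, call `set_url()` with your robots.txt URL and then `read()` before `site_maps()`.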

Do both. There's no downside.

Sitemap Index Files and Large Websites

If your site has more than 50,000 URLs or your sitemap file exceeds 50 MB, you need to split things up. That's where sitemap index files come in.

When to Use a Sitemap Index

A sitemap index is essentially a sitemap of sitemaps. Instead of one giant XML file, you create multiple individual sitemap files and then reference all of them in a single index file.

A typical setup looks like this:

  • sitemap-index.xml (the master file)
  • sitemap-pages.xml (your static pages)
  • sitemap-posts.xml (your blog content)
  • sitemap-products.xml (your product pages)

You submit just the index file to Google Search Console. Google then follows the references and crawls each individual sitemap file.

Most small businesses never need this. If you're running a large e-commerce store, a content-heavy media site, or any platform with tens of thousands of pages, this is the structure you want.
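Generating the index file itself is mechanical. Here's a Python sketch of a hypothetical `build_sitemap_index` helper that writes one <sitemap> entry, with its own <lastmod>, per child file; the example URLs reuse the file names above and are placeholders.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
ET.register_namespace("", SITEMAP_NS)  # emit xmlns="..." instead of an ns0: prefix


def build_sitemap_index(children: List[Tuple[str, str]]) -> str:
    """Return a sitemap index; children are (sitemap URL, lastmod date) pairs."""
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for loc, lastmod in children:
        entry = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = loc
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    body = ET.tostring(index, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap_index([
        ("https://www.yourdomain.com/sitemap-pages.xml", "2026-05-20"),
        ("https://www.yourdomain.com/sitemap-posts.xml", "2026-06-06"),
        ("https://www.yourdomain.com/sitemap-products.xml", "2026-06-05"),
    ]))
```

You'd save the output as sitemap-index.xml and submit only that URL.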

Organizing Sitemaps by Content Type

Even if you don't hit the size limits, splitting your sitemap by content type can be useful. Why? Because Google Search Console reports on each sitemap file separately, and you can filter the Page indexing report by sitemap. So if there's an indexing problem with your product pages, you'll see it immediately rather than having to dig through a single massive file.

A smart split for a typical content site:

  • Core pages (homepage, about, contact, services)
  • Blog posts and articles
  • Category and tag pages (only if they're indexable)
  • Image-heavy pages (if you want image sitemap data)

This structure makes debugging much faster. It also lets you prioritize which sitemaps to resubmit when you make changes in a specific area of the site.
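Here's one way to do that split in Python, routing each URL to a file by its path prefix. The prefixes and file names in `SECTION_RULES` are assumptions about a typical URL structure, so swap in your own; `split_by_section` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List
from urllib.parse import urlsplit

# Hypothetical routing rules: (path prefix, sitemap file). Adjust to your URL structure.
# Only route category/tag URLs here if those archives are indexable.
SECTION_RULES = [
    ("/blog/", "sitemap-posts.xml"),
    ("/category/", "sitemap-taxonomies.xml"),
    ("/tag/", "sitemap-taxonomies.xml"),
]
DEFAULT_FILE = "sitemap-pages.xml"  # core pages: homepage, about, contact, services


def split_by_section(urls: List[str]) -> Dict[str, List[str]]:
    """Group URLs into per-content-type sitemap files by path prefix."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for url in urls:
        path = urlsplit(url).path
        target = next(
            (name for prefix, name in SECTION_RULES if path.startswith(prefix)),
            DEFAULT_FILE,
        )
        groups[target].append(url)
    return dict(groups)
```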

Common Sitemap Mistakes and How to Fix Them

Most sitemap errors aren't dramatic. They're quiet. They just silently reduce your crawl efficiency month after month until someone runs an audit and figures out what's wrong.

Here are the most common ones to watch out for.

Including Noindex Pages

This is probably the most common sitemap mistake, and it's completely avoidable.

If a page has a noindex tag, you're telling Google not to index it, but if it's also in your sitemap, you're telling Google to please come look at it. Those two signals contradict each other.

Google won't penalize you for it, but it wastes crawl budget and sends Google mixed signals about which pages actually matter.

Fix: Audit your sitemap quarterly. Filter out any URL where the page response includes a noindex directive.
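To run that filter yourself, you need to check both places a noindex can live: the robots meta tag in the HTML and the X-Robots-Tag HTTP header. This Python sketch uses the standard library's `html.parser`; `is_noindex` is an illustrative name, and it only inspects the HTML and header value you pass in, so fetching the pages is left to your crawler. It also treats the "none" directive as noindex, since "none" means noindex, nofollow.

```python
import re
from html.parser import HTMLParser
from typing import List, Optional

NOINDEX_RE = re.compile(r"\b(noindex|none)\b")


class RobotsMetaParser(HTMLParser):
    """Collects the content of <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; attribute values may be None.
        if tag != "meta":
            return
        values = {name: (value or "") for name, value in attrs}
        if values.get("name", "").lower() in ("robots", "googlebot"):
            self.directives.append(values.get("content", "").lower())


def is_noindex(html: str, x_robots_tag: Optional[str] = None) -> bool:
    """True if the X-Robots-Tag header or a robots meta tag says noindex (or none)."""
    if x_robots_tag and NOINDEX_RE.search(x_robots_tag.lower()):
        return True
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return any(NOINDEX_RE.search(d) for d in parser.directives)
```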

Forgetting to Update After Site Changes

You redesign your site. You move content around. You delete old pages and create new ones, and your sitemap still points to URLs that redirect or return 404 errors.

Real talk: this happens on almost every site that's been around for more than two years. It's not laziness. It's just that sitemaps are easy to forget.

Fix: Make sitemap maintenance part of any site change workflow. If you're launching new content, check that it's in the sitemap. If you're retiring pages, remove them.

Set a recurring reminder to run a sitemap audit every 90 days minimum.
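The core of that audit is a set comparison between what the sitemap lists and what should be indexed right now. Here's a minimal Python sketch, assuming you can export the current list of indexable, canonical URLs from your CMS or a crawl; `sitemap_diff` is a hypothetical helper.

```python
from typing import Dict, Iterable, Set


def sitemap_diff(sitemap_urls: Iterable[str], live_urls: Iterable[str]) -> Dict[str, Set[str]]:
    """Compare the URLs a sitemap lists with the URLs that should be indexed now."""
    listed, live = set(sitemap_urls), set(live_urls)
    return {
        "add": live - listed,     # new pages the sitemap is missing
        "remove": listed - live,  # retired, redirected, or 404 URLs still listed
    }


if __name__ == "__main__":
    changes = sitemap_diff(
        ["https://www.yourdomain.com/", "https://www.yourdomain.com/old-page/"],
        ["https://www.yourdomain.com/", "https://www.yourdomain.com/new-page/"],
    )
    print(sorted(changes["add"]))     # ['https://www.yourdomain.com/new-page/']
    print(sorted(changes["remove"]))  # ['https://www.yourdomain.com/old-page/']
```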

Missing Image and Video Sitemaps

Standard XML sitemaps don't tell Google about your images or videos. You need to use Google's sitemap extensions for that, or create separate sitemap files.

If your site relies heavily on images (photography portfolios, recipe sites, product catalogs), missing image sitemap data can mean Google Images finds your visuals more slowly or misses some of them. That's traffic you're leaving on the table.

Same goes for video content. If you host video on your own site, a video sitemap helps Google index it for video search results.

Fix: Add image sitemap data to your existing sitemap using Google's image namespace extensions. For video, use the video namespace or create a dedicated video sitemap. Most WordPress SEO plugins handle this automatically if you enable the feature.
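If your plugin doesn't handle it, here's roughly what image sitemap data looks like when generated in Python. The sketch uses Google's image namespace (http://www.google.com/schemas/sitemap-image/1.1) with <image:image> and <image:loc> tags; the `build_image_sitemap` function and the example URLs are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("image", IMAGE_NS)  # serialize image tags as <image:...>


def build_image_sitemap(pages: Dict[str, List[str]]) -> str:
    """pages maps each page URL to the image URLs that appear on it."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in pages.items():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        for src in image_urls:
            image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = src
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_image_sitemap({
        "https://www.yourdomain.com/recipes/pancakes/": [
            "https://www.yourdomain.com/images/pancakes-stack.jpg",
            "https://www.yourdomain.com/images/pancakes-batter.jpg",
        ],
    }))
```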

Semly Pro: Sitemap and SEO Content Management in 2026

Getting your sitemap right is part of a bigger picture. Crawlability means nothing if the content Google finds isn't strong enough to rank. That's where Semly Pro comes in.

How Semly Pro Fits Into Your SEO Workflow

Semly Pro is an AI-powered SEO content platform built for teams that need to publish at scale without sacrificing quality. While your sitemap handles the technical side of helping Google discover your content, Semly Pro handles the content itself.

Here's what you get on the Pro plan at €139/month:

  • 40 long-form SEO articles per month
  • 25 AI tracking prompts per month
  • 1 project with 1 team seat
  • Publishing to 12 CMS platforms
  • AI visibility score and competitor detection
  • Email support

Scale up to the Business Pro plan at €229/month and you get 100 articles per month, 3 projects, advanced AI metrics, LLMs.txt generation, data export in CSV and JSON, roles and permissions, and priority support with a 24-hour response time.

If you'd rather have your entire SEO operation managed for you, the Managed SEO plan at €469/month puts a dedicated strategist on your account. They run the content, the AI visibility tracking, the schema optimization, and the performance reviews.

The point is simple: your sitemap gets pages discovered. Semly Pro makes sure those pages are worth discovering.

Semly Pro vs. Other SEO Tools

Here's a feature comparison of Semly Pro against other tools you might be considering.

Feature | Semly Pro | Semrush | Ahrefs | Surfer SEO | Jasper | Frase | Writesonic | SE Ranking | Nightwatch
Long-form SEO article generation | Yes (40-100/mo) | Limited | No | Yes | Yes | Yes | Yes | Limited | No
AI visibility score | Yes | No | No | No | No | No | No | No | No
LLMs.txt generation | Yes | No | No | No | No | No | No | No | No
CMS publishing (12 platforms) | Yes | No | No | No | Varies | Limited | Varies | No | No
AI competitor detection | Yes | Yes | Yes | No | No | No | No | Yes | No
Managed SEO option | Yes (€469/mo) | No | No | No | No | No | No | No | No
Custom brand voice | Yes | No | No | Limited | Yes | Limited | Yes | No | No
Schema optimization | Yes (Managed) | No | No | No | No | No | No | No | No

Bottom line: most tools do one thing well. Semly Pro ties together content creation, AI search visibility, and publishing in a single platform. That's what makes it worth considering if you're serious about growing organic traffic in 2026.

You can start with a 7-day free trial, no commitment needed.

Frequently Asked Questions

What is a sitemap in SEO?

A sitemap is a file, usually in XML format, that lists the URLs on your website. It helps search engines like Google discover and index your pages faster. It doesn't guarantee indexing, but it does speed up the discovery process, especially for new or deep pages that don't get many internal links.

How often should I update my sitemap?

You should update your sitemap every time you add, remove, or significantly change pages on your site. If you're using a CMS like WordPress with an SEO plugin, your sitemap likely updates automatically. If not, set a routine to check and refresh it at least once a month. Large sites that publish content daily should have an automated process that updates the sitemap in real time.

Does Google require a sitemap?

No, Google doesn't require one, but Google recommends having one, especially if your site is large, new, or has lots of pages that aren't linked from other pages. Without a sitemap, Google relies entirely on link discovery, which can miss content or delay indexing by weeks.

What's the difference between an XML sitemap and an HTML sitemap?

An XML sitemap is for search engine bots. It's a machine-readable file that lists your URLs in a structured format. An HTML sitemap is a regular web page that lists your site's content in a format human visitors can read and use. Both are useful, but XML sitemaps are the primary focus for technical SEO.

How do I know if my sitemap is working?

Check Google Search Console. After you submit your sitemap, the Sitemaps report shows whether Google could read it and how many URLs it discovered. To compare submitted URLs against indexed ones, filter the Page indexing report by that sitemap. If there's a big gap, some URLs aren't being indexed and you need to investigate why. The Sitemaps report also flags specific errors in the sitemap file itself.

Can a sitemap hurt my SEO?

A badly built sitemap can. If your sitemap includes pages with noindex tags, broken URLs, or redirect chains, it wastes Google's crawl budget and sends conflicting signals. Keep your sitemap clean by only including indexable, canonical URLs that return 200 status codes. A well-maintained sitemap won't hurt you. A messy one can slow down your indexing over time.

How many URLs can a sitemap hold?

Each individual sitemap file can hold a maximum of 50,000 URLs and must be no larger than 50 MB uncompressed. If your site exceeds either limit, you need to create a sitemap index file that references multiple individual sitemap files. You can then submit just the index file to Google Search Console.

Should I include noindex pages in my sitemap?

No. Never include noindex pages in your sitemap. If a page has a noindex directive, it means you don't want Google to index it. Including it in your sitemap contradicts that signal. While Google won't penalize you directly, it wastes crawl budget and can be a sign of poor site hygiene that affects how efficiently Google crawls your site overall.

What tools can I use to create a sitemap?

There are several solid options depending on your setup. WordPress users can rely on Yoast SEO or Rank Math, which generate and maintain sitemaps automatically. Shopify generates a sitemap automatically at /sitemap.xml. For custom builds, tools like Screaming Frog, XML-sitemaps.com, or the sitemap tools inside Semrush work well. Always validate your sitemap after generating it to check for errors before submitting.

Does Semly Pro generate sitemaps?

Semly Pro doesn't generate sitemaps directly, but it plays a key supporting role in your overall SEO setup. It helps you produce the long-form content that your sitemap points search engines toward, tracks your AI visibility score so you know how well your pages appear in AI-powered search results, and handles schema and LLMs.txt optimization on the Managed SEO plan. Essentially, Semly Pro makes sure the content your sitemap surfaces is strong enough to rank and get cited by AI search tools in 2026.