Your guide to sitemaps: best practices for crawling and indexing your site

2024-11-29 22:00:04

When building a website, you want to ensure search engines can easily navigate and index its pages. The same goes when you create a new page or make changes to your site’s structure, such as adding a blog. That’s where a sitemap comes in handy.

This file serves as a roadmap for search engine crawlers to support your SEO efforts. In some cases, it can also help users discover more of your content and improve their experience. Not every website needs a sitemap, but having one doesn’t hurt. Here’s what you should know about it.

What is a sitemap?

A sitemap is a file that describes your website’s structure. It lists the pages and files on a domain and how they relate to one another. 

Google and other search engines use this data to better understand your content, such as which pages are most important or which version of a page you want them to index and rank.

You can also build a sitemap for human visitors to help them navigate your site and get the information they need more easily. 

To keep it simple, sitemaps allow you to organize your website content. Depending on their type and purpose, they may contain the following:

  • A list of page URLs
  • The hierarchy of pages
  • The date a URL was last modified
  • Page change frequency (e.g., weekly or monthly)
  • The language of each page
  • Image/video entries
  • News entries

Why sitemaps matter for SEO

Neither users nor search engines need sitemaps to discover your content. For example, search engine crawlers can follow internal links to locate and index web pages. 

But a sitemap makes it easier for search engines like Google and Bing to find your key pages, including those that are several clicks away from the homepage. It also helps them discover orphan pages, which don’t have links from other pages on the same domain. 

For your website content to appear in search engine results pages (SERPs), it must be accessible to search engine crawlers (e.g., Googlebot Smartphone, Googlebot Desktop, and Bingbot). Sitemaps give you an additional way to surface your website URLs directly to crawlers.

The result? Better discoverability, faster indexing, and, potentially, higher traffic. 

Sitemaps also provide insights into your site, such as the frequency of updates or the type of content on a given page. 

This data allows search engines to crawl your site more effectively, which can boost its visibility. 

Daniel Waisberg, a Search Advocate at Google, recommends creating a sitemap if:

  • You have a large website (e.g., an online shop with hundreds of products)
  • Your site is new or undergoing frequent changes
  • Your pages are isolated (aka “orphan” pages)

In these circumstances, you’re more likely to benefit from a sitemap than, say, someone running a one-page website. 

Types of sitemaps: XML vs. HTML

There are two types of sitemap files: XML and HTML. You can use one or both, depending on your goals. 

What is an XML sitemap? 

If your focus is on SEO, start with an XML sitemap. This text file is targeted to search engines, helping with crawlability and indexing. 

It looks like this: 

XML sitemaps contain links to a website’s pages and information related to their content. They usually reside in the domain’s root directory (e.g., https://www.yourwebsite.com/sitemap.xml), as shown above.
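
For readers who prefer to see the markup itself, here is a minimal Python sketch that builds a sitemap following the Sitemap protocol. The function name build_sitemap and the example URLs are illustrative assumptions, not part of any particular tool.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemap protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
ET.register_namespace("", SITEMAP_NS)


def build_sitemap(pages):
    """Return sitemap XML as UTF-8 bytes.

    pages is a list of (url, last_modified) pairs; last_modified may be None.
    build_sitemap is a hypothetical helper written for this example.
    """
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        if lastmod:
            ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    xml_bytes = build_sitemap([
        ("https://www.yourwebsite.com/", "2024-11-29"),
        ("https://www.yourwebsite.com/blog/", None),
    ])
    print(xml_bytes.decode("utf-8"))
```

Each page becomes one url element with a required loc child and an optional lastmod child, all wrapped in a single urlset root.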

You can hide your sitemap from visitors or competitors by placing it in a subfolder. Or simply change the file name. 

Types of XML sitemaps

Use specialized XML sitemaps for media files to ensure Google can find and index them. You can also create a news sitemap for time-sensitive posts to get them indexed faster. 

That said, let’s take a quick look at sitemap types and why they matter:

  • Image sitemaps: These XML files help search engines locate the images on your site. They’re particularly helpful for websites with lots of pictures, such as photography blogs or online stores. Use them to boost your chances of appearing in Google Image Search. 
  • Video sitemaps: These provide additional information about your website’s video content. For example, you can include the video title, category, length, and age ratings. Create a video sitemap to ensure web crawlers will index your videos.
  • News sitemaps: Google recommends using a news sitemap for articles published over the past two days. Use this tactic to get your content indexed promptly and improve your odds of showing up in Google News. Basically, you’ll create the sitemap and then update it with fresh content. Also, remove any URLs older than two days from your sitemap or delete their <news:news> tag. 

Alternatively, add image, video, or news entries to your main sitemap using the corresponding extension tags: <image:image>, <video:video>, or <news:news>.
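
As a rough illustration of embedding image entries in a regular sitemap, the sketch below nests image:image elements under each page URL. It assumes Google's image sitemap namespace; build_image_sitemap and the sample URLs are made up for the example.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Namespace used by Google's image sitemap extension.
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"

ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("image", IMAGE_NS)


def build_image_sitemap(pages):
    """pages maps each page URL to the list of image URLs shown on it.

    build_image_sitemap is a hypothetical helper for this example.
    """
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in pages.items():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        for image_url in image_urls:
            image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_image_sitemap({
        "https://www.yourwebsite.com/shoes": [
            "https://www.yourwebsite.com/img/shoe-front.jpg",
            "https://www.yourwebsite.com/img/shoe-side.jpg",
        ],
    }))
```

Video and news entries follow the same pattern with their own namespaces and child tags.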

A standard XML sitemap cannot exceed 50,000 URLs or 50 MB (uncompressed). So, if you have many images, videos, or news articles, it makes sense to create specialized sitemaps. You can find the core tags and protocol details on the Sitemaps.org site, and the image, video, and news extensions in Google’s Search Central documentation. The best approach depends on your website’s size.

What is an HTML sitemap?

HTML sitemaps serve as navigational tools for visitors. These files look like regular web pages with links to the about page, contact page, product pages, and so on. 

Here’s an example of an HTML sitemap from Nike:

In 2022, Google Search Analyst John Mueller said HTML sitemaps “should never be needed.” When a website has good navigation, visitors don’t need a sitemap to find their way around. 
Some people may check this file to find specific resources, such as archived content. But they could use your website’s search function instead. 

Whether or not you should have an HTML sitemap is subject to debate within SEO circles. Building one takes minutes, and it won’t hurt your SEO or the user experience. 

Adding sitemaps for accessibility

Web Content Accessibility Guidelines (WCAG) are the widely accepted standards for making digital content accessible to everyone, with special consideration for people with disabilities. WCAG lists sitemaps as one way to meet its navigability guideline, which covers helping users find content, including when they rely on assistive technologies. Screen readers and similar tools read page content aloud to vision-impaired users, which can be a frustrating experience on sites that place lots of images and other items between navigable elements.

Specifically, sitemaps help satisfy WCAG Success Criterion 2.4.5 (Multiple Ways), which reads:

More than one way is available to locate a Web page within a set of Web pages except where the Web Page is the result of, or a step in, a process. (Level AA)

Meeting WCAG guidelines is not a direct ranking factor. But when you prioritize accessibility, you are enhancing your user experience and reaching a broader audience.

How to create a sitemap

First, decide whether to build a sitemap manually or automate the whole process. Manually built sitemaps work best for small websites with fewer than 100 pages. Larger websites may want to automate the process to save time. 

How to manually create an XML sitemap

To manually create an XML sitemap, enter your URLs along with relevant metadata, such as the “lastmod” tag (last modified date), in a text editor. For example, you could use TextEdit or Windows Notepad. Follow the Sitemap protocol to make sure everything looks right. 

How to build an HTML sitemap

To build an HTML sitemap, list the pages on your site. Organize them in a logical hierarchy, starting with the most important ones. Then, ask a developer to create an HTML page based on your list and upload it to your website. 
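
If you would rather generate the page yourself, the following sketch turns a simple hierarchy into nested HTML lists. The render_html_sitemap function and the sample sections are assumptions made for illustration; a real page would also need your site's template around it.

```python
from html import escape


def render_html_sitemap(sections):
    """Render nested <ul> markup for an HTML sitemap.

    sections is a list of (heading, [(title, url), ...]) in display order.
    render_html_sitemap is a hypothetical helper for this example.
    """
    parts = ["<ul>"]
    for heading, links in sections:
        parts.append(f"  <li>{escape(heading)}")
        parts.append("    <ul>")
        for title, url in links:
            # escape() also escapes quotes, so the URL is safe in an attribute.
            parts.append(f'      <li><a href="{escape(url)}">{escape(title)}</a></li>')
        parts.append("    </ul>")
        parts.append("  </li>")
    parts.append("</ul>")
    return "\n".join(parts)


if __name__ == "__main__":
    print(render_html_sitemap([
        ("Company", [("About us", "/about"), ("Contact", "/contact")]),
        ("Products", [("Running shoes", "/products/running-shoes")]),
    ]))
```

Listing the most important sections first keeps the output aligned with the hierarchy you planned.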

You can also use plugins or other online tools to automatically generate a sitemap and save time. This option is suitable for websites of any size and requires little or no coding. 

Here’s how to do it. 

Determine which pages should appear in your sitemap

The main purpose of an XML sitemap is to tell search engines which pages you want indexed and where they are located. 

With that in mind, make sure your sitemap only includes the URLs that should appear in search results. 

For instance, it doesn’t make sense to include pages that are blocked from crawling via the robots.txt file. That’s because Google won’t be able to access them. 

Also, exclude password-protected, admin, and redirect pages—just to name a few. 
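
One way to apply these exclusions automatically is to filter your URL list before building the sitemap. The sketch below uses Python's standard robots.txt parser; the EXCLUDED_PREFIXES values and the indexable_urls function are hypothetical and would need to match your own site.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

# Hypothetical path prefixes for admin and login pages; adjust for your site.
EXCLUDED_PREFIXES = ("/wp-admin/", "/login", "/account")


def indexable_urls(urls, robots_txt, user_agent="Googlebot"):
    """Keep only URLs that are not excluded and not blocked by robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    keep = []
    for url in urls:
        path = urlparse(url).path
        if path.startswith(EXCLUDED_PREFIXES):
            continue  # admin, login, or other private area
        if not parser.can_fetch(user_agent, url):
            continue  # blocked from crawling in robots.txt
        keep.append(url)
    return keep


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /private/\n"
    print(indexable_urls(
        [
            "https://www.yourwebsite.com/",
            "https://www.yourwebsite.com/private/report",
            "https://www.yourwebsite.com/wp-admin/options.php",
        ],
        robots,
    ))
```

This sketch does not detect redirects or password protection; those checks would need an HTTP request per URL.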

For HTML sitemaps, only list URLs relevant to your website’s visitors. If, say, you have a couple of pages featuring past events or products you no longer sell, skip them. 

Decide how many sitemaps you need

Next, determine whether you need one or more XML sitemaps. Your website may require multiple sitemaps if:

  • It has more than 50,000 URLs you want indexed
  • It features lots of images and/or videos 
  • Its content is news-oriented
  • The sitemap file size is over 50MB

Say you run a large ecommerce site with tens of thousands of products. Chances are your sitemap file will exceed 50,000 URLs. 

If you don’t split it up into multiple sitemaps, Google may not crawl and index all your URLs. 
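
Splitting a long URL list is simple to script. This is a minimal sketch that only enforces the 50,000-URL limit; it does not measure file size, and split_into_sitemaps is a name chosen for this example.

```python
MAX_URLS_PER_SITEMAP = 50_000  # URL limit from the Sitemap protocol


def split_into_sitemaps(urls, max_urls=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive chunks of at most max_urls."""
    return [urls[i:i + max_urls] for i in range(0, len(urls), max_urls)]


if __name__ == "__main__":
    urls = [f"https://www.yourwebsite.com/product/{i}" for i in range(120_001)]
    for number, chunk in enumerate(split_into_sitemaps(urls), start=1):
        print(f"sitemap-{number}.xml would hold {len(chunk)} URLs")
```

Each chunk can then be written to its own sitemap file.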

Now, let’s assume you have a business site with a blog section featuring news posts. Or time-sensitive content, such as product update announcements. 

Even if your sitemap doesn’t exceed the size limits, you can create an additional news sitemap. 

This can help your blog content get indexed quickly while making it easier to track its performance with Google Search Console (GSC). 

Use a sitemap generator 

Once you have a plan, choose the right tools to build your sitemap. 

If you’re a WordPress user, look for sitemap plugins like RankMath, Yoast SEO, or Simple Sitemap. 

For example, Yoast SEO can automatically generate XML sitemaps and update them as you create new pages or modify a URL. 

All you need to do is install the plugin and enable this feature from “Settings.”

Your sitemap will look like this:

If your website doesn’t run on WordPress, use a free or premium sitemap generator. Consider these options:

  • XML-Sitemaps.com
  • Dyno Mapper
  • WriteMaps
  • Screaming Frog
  • Octopus.do 

Some tools, such as Dyno Mapper, let you create visual sitemaps. Their drag-and-drop functionality makes it easy to add and organize your URLs—even if you have little or no technical knowledge. 

Best practices for sitemap optimization

Automatically generated sitemaps are often more SEO-friendly than those created manually. However, you should still review your website sitemap and take extra steps to optimize it for search engines. 

Here are some best practices to consider: 

Keep your URLs clean

Remove tracking parameters, session IDs, and special characters from the URLs in your sitemap. These elements can make your URLs harder to read, causing crawlability problems. 
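
A cleanup step like this can be scripted. The sketch below strips query parameters that look like tracking or session data and drops the fragment; the TRACKING_PARAMS set is an assumption and should be adapted to the parameters your site actually uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed examples of tracking and session parameters; extend as needed.
TRACKING_PARAMS = {"gclid", "fbclid", "sessionid", "sid"}


def clean_url(url):
    """Remove tracking/session query parameters and the fragment from a URL."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith("utm_") and key.lower() not in TRACKING_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


if __name__ == "__main__":
    print(clean_url(
        "https://www.yourwebsite.com/shoes?utm_source=news&color=red&sessionid=abc#top"
    ))
```

Parameters that change the page content, such as color in the example, are left in place.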

Stick to canonical URLs

If your website has similar or duplicate pages, you should only use the canonical (primary) version of the URL in your sitemap. To signal which version that is, add a “link rel=canonical” tag to the HTML head of each variant, pointing to the URL you want indexed. 

Exclude “noindex” URLs

There’s no point in adding “noindex” URLs to an XML sitemap. The “noindex” tag tells search engines not to index a page, so why would you want them to crawl it?
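
To catch such pages before they reach the sitemap, you could scan each page's HTML for a robots meta tag. This is a simplified sketch: it only inspects meta tags (not the X-Robots-Tag HTTP header), and the class and function names are illustrative.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Records whether a page carries a robots meta tag containing noindex."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = (attributes.get("content") or "").lower()
        if name in ("robots", "googlebot") and "noindex" in content:
            self.noindex = True


def has_noindex(html_text):
    """Return True if the HTML contains a noindex robots meta tag."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return parser.noindex


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
    print(has_noindex(page))
```

Pages for which has_noindex returns True would be left out of the sitemap.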

Use UTF-8 encoding

Your XML sitemap file must be UTF-8 encoded to ensure web crawlers can process URLs with non-ASCII characters and special symbols. Sitemap generators use the UTF-8 character encoding by default. 

Add language attributes as needed

When your content is available in multiple languages, you can create a multilingual sitemap or separate sitemaps for each language. If you choose the first option, use “hreflang” annotations to tell Google which page version to show in search results based on the user’s language and region.
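
In a multilingual sitemap, each URL entry typically lists every language version of the page, including itself, as xhtml:link elements. The sketch below shows that structure; build_multilingual_sitemap and the example URLs are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"

ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("xhtml", XHTML_NS)


def build_multilingual_sitemap(page_groups):
    """page_groups is a list of dicts mapping an hreflang code to a URL.

    Each dict describes one page in all of its language versions.
    build_multilingual_sitemap is a hypothetical helper for this example.
    """
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for versions in page_groups:
        for page_url in versions.values():
            url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
            ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
            # Every version lists all alternates, itself included.
            for lang, href in versions.items():
                ET.SubElement(
                    url,
                    f"{{{XHTML_NS}}}link",
                    {"rel": "alternate", "hreflang": lang, "href": href},
                )
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_multilingual_sitemap([
        {
            "en": "https://www.yourwebsite.com/en/pricing",
            "de": "https://www.yourwebsite.com/de/preise",
        },
    ]))
```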

Create a sitemap index file

Use a sitemap index file to list and organize multiple sitemaps for the same website. This file can contain up to 50,000 sitemaps and should go into the root directory. It’s a good choice for large websites, allowing you to submit all sitemaps to search engines simultaneously. 
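
A sitemap index uses the same namespace as a regular sitemap but wraps sitemap entries in a sitemapindex root. Here is a minimal sketch; build_sitemap_index and the file names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
ET.register_namespace("", SITEMAP_NS)


def build_sitemap_index(sitemap_urls, lastmod=None):
    """Return sitemap index XML listing the given sitemap file URLs.

    lastmod, if given, is applied to every entry (a simplification).
    build_sitemap_index is a hypothetical helper for this example.
    """
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for loc in sitemap_urls:
        sitemap = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(sitemap, f"{{{SITEMAP_NS}}}loc").text = loc
        if lastmod:
            ET.SubElement(sitemap, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return ET.tostring(index, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap_index(
        [
            "https://www.yourwebsite.com/sitemap-pages.xml",
            "https://www.yourwebsite.com/sitemap-products.xml",
        ],
        lastmod="2024-11-29",
    ))
```

You would then submit only the index file, and search engines can discover the individual sitemaps from it.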

Consider using dynamic XML sitemaps

Large and complex websites can benefit from dynamic sitemaps. These are automatically generated and updated in real-time as you add, modify, or delete web pages. Compared to static sitemaps, they’re easier to maintain and less prone to errors.

Apart from that, review your sitemap regularly to ensure it’s accurate. Check it for broken links, outdated pages, and other issues. This applies to both XML and HTML sitemaps. 

There are no set rules on how often you should audit a sitemap, but, generally, it’s best to do it at least once a month. 

If you have a static XML sitemap, review and update it whenever you add, remove, or modify your website’s content. Better yet, use plugins or other tools (more on that later) to enable automatic updates. 

How to submit a sitemap to search engines 

Submit your XML sitemap to Google, Yahoo! Search, Bing, or other major search engines so they can crawl and index your pages. The exact steps will differ from one search engine to the next. 

Let’s assume you want to submit a sitemap to Google. One solution is to add this file through Google Search Console. 

First, log in to your GSC account and select your website from the left sidebar. Make sure Google has verified your domain. 

Next, go to the “Indexing” section and click “Sitemaps.”

On this page, you can enter your sitemap URL, such as “sitemap.xml.”

When you’re done, hit “Submit.” 

Provided your sitemap file is correctly formatted, it will appear under “Submitted sitemaps” on the same page. You’ll also see a “Success” notice under “Status.”

Similarly, you can submit your sitemap to Bing and Yahoo via Bing Webmaster Tools. Yandex, Baidu, and Naver accept sitemap submissions, too. 

Monitor and maintain your sitemap

Use Google Search Console to monitor your sitemap for potential errors. If you spot issues, you’ll see the message “Error” or “Couldn’t fetch” in the “Status” column. 

Click the sitemap URL for additional insights. GSC will display a detailed message describing the cause of the error.

Based on this information, decide what you need to do next, such as:

  • Removing and resubmitting the sitemap
  • Renaming the sitemap file
  • Fixing URL errors

You can also check your sitemap with Semrush’s Site Audit tool. Click “Issues,” and then type “sitemap” in the search bar to spot potential errors. 

The tool will tell you more about the issues you’re dealing with and how to fix them.

Keep an eye on your sitemap status with Semrush’s Site Audit and the webmaster tools provided by Google, Bing, or other major search engines. 

Also, regularly update the sitemap. Consider using Yoast, JSitemap, Inspyder Sitemap Creator, or other tools that support automatic sitemap updates. 

The goal is to make sure search engines crawl and index the most current version of your pages.

Common sitemap issues and how to fix them 

The most common sitemap-related issues fall into two categories: URL errors and format errors.

For instance, Semrush’s Site Audit tool may warn you that your sitemap isn’t referenced in the robots.txt file. This can affect crawling efficiency and your SEO efforts. 

The solution? 

Include your sitemap location in the robots.txt file. Or add your sitemap index file URL if you have multiple sitemaps. Like this:

Sitemap: https://www.yourwebsite.com/sitemap_index.xml
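
To confirm that a robots.txt file actually references a sitemap, you can parse it with Python's standard library (the site_maps method requires Python 3.8 or later). The sitemaps_in_robots function name is an assumption for this sketch.

```python
from urllib.robotparser import RobotFileParser


def sitemaps_in_robots(robots_txt):
    """Return the sitemap URLs declared in the given robots.txt content."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


if __name__ == "__main__":
    robots = (
        "User-agent: *\n"
        "Disallow: /private/\n"
        "Sitemap: https://www.yourwebsite.com/sitemap_index.xml\n"
    )
    print(sitemaps_in_robots(robots))
```

An empty list means the file has no Sitemap line and one should be added.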

Now let’s see other sitemap errors you may encounter: 

Invalid date

This error indicates that your sitemap’s <lastmod> dates are invalid or improperly formatted, meaning they don’t use W3C Datetime encoding (for example, YYYY-MM-DD). One option is to correct them by manually updating the sitemap. Or configure your sitemap generator to use the proper date format. 
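
A quick script can flag bad dates before you submit the file. The sketch below accepts the complete-date form (YYYY-MM-DD) and the date-plus-time forms with a time zone; it deliberately ignores the year-only and year-month variants and does not range-check the time fields. The is_valid_lastmod name is an assumption.

```python
import re
from datetime import date

# Complete date, optionally followed by a time and a time zone designator.
W3C_DATETIME = re.compile(
    r"^(\d{4})-(\d{2})-(\d{2})"
    r"(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?$"
)


def is_valid_lastmod(value):
    """Return True if value looks like a W3C Datetime with a real calendar date."""
    match = W3C_DATETIME.match(value)
    if not match:
        return False
    try:
        # Rejects impossible dates such as 2024-02-30.
        date(int(match[1]), int(match[2]), int(match[3]))
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    for candidate in ("2024-11-29", "2024-11-29T22:00:04+00:00", "29/11/2024"):
        print(candidate, is_valid_lastmod(candidate))
```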

Sitemap file size error

If your sitemap is over 50MB or has over 50,000 URLs, break it into multiple sitemaps. Then, list them on a sitemap index file. 

Compression error

This issue can prevent Google from accessing and crawling the URLs on your sitemap. Use a data compression app like GZIP or 7-Zip to recompress the sitemap. Then, upload it to your site (after removing the previous version) and submit it to search engines. 
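
If you prefer to script the recompression, Python's gzip module can write the file and verify that it decompresses cleanly. This is a sketch under the assumption that your sitemap is already available as bytes; the function names are illustrative.

```python
import gzip
import zlib


def compress_sitemap(xml_bytes):
    """Return the sitemap compressed in gzip format."""
    return gzip.compress(xml_bytes)


def is_valid_gzip(data):
    """Return True if data decompresses as gzip without errors."""
    try:
        gzip.decompress(data)
    except (OSError, EOFError, zlib.error):
        return False
    return True


if __name__ == "__main__":
    original = b'<?xml version="1.0" encoding="UTF-8"?><urlset></urlset>'
    compressed = compress_sitemap(original)
    print(is_valid_gzip(compressed), is_valid_gzip(original))
```

Running is_valid_gzip on the uploaded file is a quick way to spot a truncated or corrupted archive.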

400-level HTTP status code

When you see this error, it means Google can’t access your sitemap or certain URLs listed in it. Say your sitemap contains a URL that no longer exists (error 404). Remove that URL from the sitemap and update the file. Then, resubmit it to Google. 
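
The cleanup can be sketched as two steps: read the URLs out of the sitemap, then drop those that return a 4xx status. To keep the example self-contained, the status lookup is passed in as get_status, a hypothetical callable; in practice it might wrap an HTTP HEAD request.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def extract_locs(xml_text):
    """Return every <loc> value found in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [element.text.strip() for element in root.iter(f"{{{SITEMAP_NS}}}loc")]


def drop_client_errors(urls, get_status):
    """Keep URLs whose status code, as reported by get_status, is not 4xx.

    get_status is a hypothetical callable: URL in, HTTP status code out.
    """
    return [url for url in urls if not 400 <= get_status(url) < 500]


if __name__ == "__main__":
    sitemap = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://www.yourwebsite.com/</loc></url>"
        "<url><loc>https://www.yourwebsite.com/old-page</loc></url>"
        "</urlset>"
    )
    # Stand-in status data for the example; not real crawl results.
    statuses = {"https://www.yourwebsite.com/old-page": 404}
    print(drop_client_errors(extract_locs(sitemap), lambda u: statuses.get(u, 200)))
```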

Unsupported format

The sitemap you submit to search engines should be in XML format and follow the Sitemap protocol. Otherwise, web crawlers can’t read it. 

Other errors are specific to image, video, or news sitemaps. 

For example, a news sitemap can contain no more than 1,000 URLs. Exceeding this limit will trigger an error message and cause crawl inefficiencies. 
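
Both news sitemap constraints mentioned in this guide, the two-day window and the 1,000-URL cap, can be enforced when you assemble the file. The sketch below assumes each article comes with a timezone-aware publication time; select_news_entries is a name chosen for this example.

```python
from datetime import datetime, timedelta, timezone

MAX_NEWS_URLS = 1_000         # per-file limit for news sitemaps
MAX_AGE = timedelta(days=2)   # only recent articles belong in the file


def select_news_entries(articles, now=None):
    """Return the newest articles that fit in a news sitemap.

    articles is a list of (url, published_at) pairs with timezone-aware
    datetimes. Entries older than two days are dropped, and the result is
    capped at MAX_NEWS_URLS, newest first.
    """
    now = now or datetime.now(timezone.utc)
    recent = [(url, ts) for url, ts in articles if now - ts <= MAX_AGE]
    recent.sort(key=lambda item: item[1], reverse=True)
    return recent[:MAX_NEWS_URLS]


if __name__ == "__main__":
    now = datetime(2024, 11, 29, 22, 0, tzinfo=timezone.utc)
    articles = [
        ("https://www.yourwebsite.com/news/today", now - timedelta(hours=1)),
        ("https://www.yourwebsite.com/news/last-week", now - timedelta(days=7)),
    ]
    for url, published in select_news_entries(articles, now=now):
        print(url, published.isoformat())
```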

One of the most common image sitemap errors is mislabeling alt text or omitting the text (or the map) entirely. Google has published concise image sitemap guidelines to help provide support.

When issues arise with video sitemaps, they usually have to do with the crawlability of the video files. Make sure your robots.txt doesn’t block crawlers from your videos, that your videos don’t require sign-in, and that they’re associated with a supported non-streaming protocol (HTTP/FTP).

Tools to generate and monitor your sitemaps

While it’s possible to manually create and monitor a sitemap, this process can be tedious and prone to errors. A better option is to use tools like:

  • Yoast SEO: Install this WordPress plugin to automatically generate and update XML sitemaps  
  • PowerMapper: Generate HTML, XML, or visual sitemaps with just a few clicks
  • Screaming Frog: Use this SEO tool to generate XML sitemaps and monitor them for potential issues
  • Semrush Site Audit: Conduct a comprehensive website audit to uncover XML sitemap errors
  • Slickplan: Create, edit, and manage sitemaps via a user-friendly drag-and-drop interface
  • Sitebulb: Audit your XML sitemap to ensure it’s properly formatted and SEO-friendly
  • MonSpark: Monitor the URLs in your sitemap for errors, status code changes, and other potential issues in real-time

You don’t need all of these tools to manage your sitemap. Choose a sitemap generator that offers automatic updates. Meanwhile, look for a reliable sitemap monitoring tool, such as Semrush’s Site Audit. 

Consider your business needs, too. For example, Slickplan is a visual sitemap generator and editor with limited SEO features. It’s suitable for creating user-friendly sitemaps to share with your team and clients. 

Yoast, on the other hand, has a strong focus on SEO. Plus, it’s specially designed for WordPress users, offering powerful content optimization tools. 

For best results, use Semrush’s Site Audit to monitor and optimize your sitemap. The tool can pinpoint common but often overlooked issues, such as orphan pages or missing XML tags. 

Opinions expressed in this article are those of the sponsor. Search Engine Land neither confirms nor disputes any of the conclusions presented above.