Part of the SEO audit
Make sure your sitemap helps search engines find the right pages
A clean XML sitemap gives crawlers a reliable list of URLs to discover and revisit. SiteCurl checks whether the sitemap exists and whether robots.txt points to it.
No signup required. Results in under 60 seconds.
What this check does
SiteCurl looks for /sitemap.xml and records whether it is reachable. It also checks whether the sitemap is referenced in robots.txt, which makes discovery easier for crawlers.
The check verifies that the sitemap returns a valid HTTP response. If the sitemap URL returns a 404, a 500 error, or redirects to an unexpected location, SiteCurl flags the issue. A sitemap that exists but is not listed in robots.txt is also flagged, because crawlers may not find it automatically.
SiteCurl does not parse every URL inside the sitemap, but it confirms the file is present and accessible. This is the foundation: if the sitemap itself is broken, nothing it lists can be discovered.
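The logic of such a reachability check can be sketched as a small classification function. This is a hypothetical illustration of the idea, not SiteCurl's actual implementation; it assumes redirects have already been followed and you know the final URL.

```python
# Hypothetical sketch of a sitemap reachability check (not SiteCurl's
# actual code). Assumes redirects were followed and final_url is known.

def classify_sitemap_response(status: int, requested_url: str, final_url: str) -> str:
    """Classify an HTTP response for a sitemap URL."""
    if status == 404:
        return "missing"               # sitemap not found
    if status >= 500:
        return "server-error"          # sitemap unreachable
    if status == 200 and final_url != requested_url:
        return "unexpected-redirect"   # redirected somewhere else
    if status == 200:
        return "ok"
    return "other"

url = "https://example.com/sitemap.xml"
print(classify_sitemap_response(200, url, url))  # ok
```

A real checker would also confirm the body is valid XML, but the status and redirect handling above are the foundation.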
How this shows up in the real world
An XML sitemap is a structured list of URLs that you want search engines to crawl. It is not a ranking signal by itself, but it is a discovery mechanism. For new sites, large sites, and sites with sections that are not well-linked internally, the sitemap is often the first way a search engine learns about a page.
Sitemaps also carry metadata: the lastmod date tells crawlers when the page was last updated, and the priority value suggests relative importance (though most search engines ignore priority). The lastmod date is more useful in practice, because it helps crawlers decide which pages to re-crawl after a content update.
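A single entry in a sitemap carries this metadata alongside the URL. The domain and values below are placeholders:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/pricing</loc>
    <lastmod>2024-05-01</lastmod>
    <priority>0.8</priority>
  </url>
</urlset>
```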
The relationship between the sitemap and robots.txt is important. Adding a Sitemap: directive to robots.txt is the standard way to tell crawlers where to find your sitemap. Without this directive, crawlers rely on Search Console submissions or common URL guesses like /sitemap.xml. Adding the directive removes that guesswork.
Sites with more than 50,000 URLs (or a sitemap file larger than 50 MB uncompressed) need a sitemap index file that references multiple child sitemaps. Each child sitemap can contain up to 50,000 URLs. SiteCurl checks the root sitemap URL, so if your site uses a sitemap index, make sure the index file itself is accessible and listed in robots.txt.
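A sitemap index file follows the same protocol but lists child sitemaps instead of pages. The paths below are placeholders:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemaps/posts.xml</loc>
    <lastmod>2024-05-01</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemaps/products.xml</loc>
  </sitemap>
</sitemapindex>
```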
Why it matters
Sitemaps do not replace good internal linking, but they improve discovery and reinforce which URLs you want crawled. They are especially helpful on larger sites, new sites, and sections that do not earn many internal links yet.
For new sites with few external links, the sitemap may be the only way search engines discover your pages in the first days after launch. Without a sitemap, you are waiting for crawlers to find your pages through links, which can take weeks or months.
Sitemaps also serve as a reconciliation tool. By comparing the URLs in your sitemap against the pages Google has actually indexed (visible in Search Console), you can identify gaps: pages you want indexed but are not, and pages that are indexed but should not be. This comparison is one of the most useful ongoing SEO maintenance tasks.
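The reconciliation itself is just a set comparison. A minimal sketch, assuming you have exported the indexed URLs from Search Console into a plain list:

```python
# Sketch of sitemap-vs-index reconciliation. In practice indexed_urls
# would come from a Search Console export; here both sides are sets.

def reconcile(sitemap_urls: set[str], indexed_urls: set[str]) -> dict[str, set[str]]:
    return {
        "in_sitemap_not_indexed": sitemap_urls - indexed_urls,  # want indexed, but aren't
        "indexed_not_in_sitemap": indexed_urls - sitemap_urls,  # indexed, but unmanaged
    }

gaps = reconcile(
    {"https://example.com/", "https://example.com/pricing"},
    {"https://example.com/", "https://example.com/old-page"},
)
print(gaps["in_sitemap_not_indexed"])  # {'https://example.com/pricing'}
```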
Who this impacts most
New sites benefit the most from a working sitemap. Without an established link profile, the sitemap is the primary discovery channel. A new SaaS product that launches with 20 pages and no sitemap may wait weeks before all pages appear in search.
Large content sites with thousands of articles need sitemaps to manage the scale. Individual articles deep in the archive may never be reached by a crawler following links from the homepage. The sitemap ensures every article is at least submitted for crawling.
E-commerce stores with frequent inventory changes depend on lastmod dates in the sitemap to signal which product pages have been updated. Without a sitemap, price changes and new product launches may not be re-crawled for days or weeks.
How to fix it
Step 1: Generate or expose a live sitemap at a stable URL. Most CMS platforms and web frameworks have sitemap plugins or gems. If you are using a static site generator, add a build step that generates the sitemap. The standard location is /sitemap.xml at the root of your domain.
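If no plugin fits your stack, generating a minimal sitemap takes only a few lines. A sketch using just the Python standard library; the URLs are placeholders, and a real build step would feed in your site's actual page list:

```python
# Minimal sitemap generation sketch (standard library only), for sites
# without a CMS plugin. URLs here are placeholders.
from xml.sax.saxutils import escape

def build_sitemap(urls: list[str]) -> str:
    entries = "\n".join(f"  <url><loc>{escape(u)}</loc></url>" for u in urls)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</urlset>"
    )

xml = build_sitemap(["https://example.com/", "https://example.com/pricing"])
```

Write the result to /sitemap.xml as part of your deploy so it never drifts out of date.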
Step 2: Include only canonical, indexable pages. Every URL in the sitemap should return a 200 status, should not have a noindex directive, and should match the canonical URL declared on the page. Remove redirects, 404s, and non-canonical variants from the sitemap.
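The inclusion rules in this step can be expressed as a single filter. A sketch over hypothetical crawl records; the field names are assumptions, not a real crawler's schema:

```python
# Sketch of Step 2's rules: keep only canonical, indexable, 200-status
# pages. The page dicts are hypothetical crawl records.

def sitemap_eligible(page: dict) -> bool:
    return (
        page["status"] == 200                                    # no redirects or errors
        and not page.get("noindex", False)                       # no noindex directive
        and page.get("canonical", page["url"]) == page["url"]    # self-canonical only
    )

pages = [
    {"url": "https://example.com/a", "status": 200},
    {"url": "https://example.com/b", "status": 301},             # redirect: exclude
    {"url": "https://example.com/c?ref=x", "status": 200,
     "canonical": "https://example.com/c"},                      # non-canonical: exclude
]
keep = [p["url"] for p in pages if sitemap_eligible(p)]
print(keep)  # ['https://example.com/a']
```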
Step 3: Add the sitemap URL to robots.txt. Add a line like Sitemap: https://yourdomain.com/sitemap.xml at the bottom of your robots.txt file. This is the standard way to advertise your sitemap to all crawlers.
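A minimal robots.txt with the directive in place looks like this (replace the domain with your own):

```text
User-agent: *
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml
```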
Step 4: Submit the sitemap in Search Console. After major structure changes (new sections, migrations, large content additions), resubmit the sitemap in Google Search Console and Bing Webmaster Tools. This prompts the crawlers to re-process the file sooner than they would on their own schedule.
Common mistakes when fixing this
Listing non-canonical pages. If the sitemap includes URLs that redirect elsewhere, or parameterized URLs that should canonicalize to a clean version, it contradicts your canonical signals and leaves crawlers guessing which URL you actually want indexed.
Leaving old or redirected URLs in the sitemap. Search engines waste crawl resources on dead paths. Clean up the sitemap after every migration, URL change, or content removal.
Assuming the sitemap alone fixes discoverability. A sitemap tells crawlers about your pages, but internal links still carry authority and context that sitemaps do not. You need both: a sitemap for discovery and internal links for authority.
Never updating the lastmod dates. If every URL in the sitemap has the same lastmod value (or none at all), crawlers cannot prioritize which pages to re-crawl. Update lastmod only when the page content actually changes.
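One way to honor "update lastmod only on real changes" is to derive the date from a hash of the page content rather than from the build timestamp. A sketch with hypothetical in-memory storage; a real build would persist the hash and date maps between runs:

```python
# Sketch: set lastmod from a content hash so it only moves when the page
# actually changes. The two dicts stand in for persisted build state.
import hashlib
from datetime import date

def lastmod_for(url: str, content: str,
                prev_hashes: dict[str, str],
                prev_lastmod: dict[str, str]) -> str:
    digest = hashlib.sha256(content.encode()).hexdigest()
    if prev_hashes.get(url) != digest:          # content changed (or is new)
        prev_hashes[url] = digest
        prev_lastmod[url] = date.today().isoformat()
    return prev_lastmod[url]                    # unchanged pages keep old date
```

Rebuilding the site then leaves unchanged pages' lastmod values untouched, which keeps the signal meaningful to crawlers.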
How to verify the fix
Run another SiteCurl scan and confirm sitemap warnings are gone. Open the sitemap directly in the browser to check that it loads, then review a few listed URLs to confirm they are real, canonical pages.
In Google Search Console, navigate to the Sitemaps section and check the submission status. It should show 'Success' with the number of discovered URLs. If the count is lower than expected, compare the sitemap URLs against the Coverage report to find which pages are missing and why.
Example findings from a scan
XML sitemap not found at /sitemap.xml
robots.txt does not list a sitemap URL
Sitemap contains pages that are no longer linked internally
Related checks
Frequently asked questions
Do small sites need a sitemap?
Yes. Small sites can often be crawled without one, but a sitemap is still a low-effort signal that helps with discovery and maintenance.
Should every indexable page be in the sitemap?
Generally yes, especially the pages you want crawled and kept indexed.
Can a sitemap fix poor internal linking?
No. It helps discovery, but internal links are still important for crawl paths and page importance.
Check your XML sitemap now