Step-by-Step: Create an XML Sitemap with a Sitemap Generator
An XML sitemap is a roadmap that helps search engines discover and index the pages on your website. While you can create one manually, using a sitemap generator saves time, reduces errors, and often includes features like priority settings, change frequency hints, and automatic updates. This guide walks you through creating an XML sitemap with a sitemap generator, from preparation to submission and maintenance.
Why an XML sitemap matters
An XML sitemap:
- Helps search engines find pages — especially deep, new, or poorly linked pages.
- Communicates metadata — such as last modification date, change frequency, and priority.
- Improves indexing for complex sites — large, dynamic, or media-heavy sites benefit most.
- Supports canonicalization — when used correctly, sitemaps reinforce canonical URLs.
Before you start: preparations
- Audit your site structure
  - List major sections, dynamic pages, and important assets (images, videos).
  - Note pages you don’t want indexed (e.g., internal tools, staging, admin pages).
- Decide which URLs to include
  - Include canonical, publicly accessible pages you want indexed.
  - Exclude duplicate, low-value, or blocked pages (e.g., paths disallowed in robots.txt).
- Gather access details
  - To upload the generated sitemap, have FTP/SFTP or hosting control panel access ready.
  - For CMS plugins, ensure you have admin access.
- Choose a sitemap generator
  - Options include desktop tools, online services, and CMS plugins. Pick one that supports the XML format, handles the size of your site, and, if needed, supports image/video sitemaps.
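If you want to double-check the exclusion step yourself, Python's standard `urllib.robotparser` can evaluate robots.txt rules against a candidate URL list. A minimal sketch, assuming the robots.txt text and URLs below are hypothetical stand-ins for your own:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; in practice, load your site's real file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /staging/
"""

def filter_allowed(urls, robots_txt, user_agent="*"):
    """Keep only the URLs that robots.txt allows `user_agent` to crawl."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]

candidates = [
    "https://www.example.com/",
    "https://www.example.com/blog/post-1",
    "https://www.example.com/admin/settings",
    "https://www.example.com/staging/new-home",
]
print(filter_allowed(candidates, ROBOTS_TXT))
```

Being allowed by robots.txt does not make a page worth listing; this only removes URLs that crawlers are told not to fetch.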
Step 1 — Select the right sitemap generator
Consider:
- Site size (some free tools limit URL count).
- Dynamic content (support for crawling JavaScript-rendered pages).
- Automation (scheduled regeneration, auto-submission).
- Extra features (image/video sitemaps, hreflang support, changefreq/priority settings).
Examples of generator types:
- CMS plugins (e.g., for WordPress or Drupal) — easiest for site owners.
- Desktop crawlers (Screaming Frog, Integrity) — powerful for larger sites.
- Online generators — convenient for small sites.
- Command-line tools — for advanced automation and integration in CI/CD.
Step 2 — Configure crawl settings
Important settings to set before crawling:
- Crawl depth — how many levels from the homepage to follow.
- Include/exclude patterns — to skip private directories, query strings, or certain file types.
- Follow internal links only vs. follow external links — usually limit to internal.
- Maximum URLs — for very large sites, set a sensible cap or use a tool that supports large sitemaps and sitemap index files.
If your site uses JavaScript to build links, choose a generator that can render JS or configure headless browser crawling.
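To see how these settings interact, here is a small breadth-first crawl sketch. It is not how any particular generator is implemented: it assumes a hypothetical in-memory `LINKS` dictionary in place of real HTTP fetches, does not render JavaScript, and treats the exclude patterns and limits as illustrative defaults.

```python
import re
from collections import deque
from urllib.parse import urljoin, urlparse

# Hypothetical link graph standing in for real HTTP fetches: page URL -> hrefs on that page.
LINKS = {
    "https://www.example.com/": ["/blog/", "/admin/", "https://other.example.org/"],
    "https://www.example.com/blog/": ["/blog/post-1", "/blog/?sort=new"],
    "https://www.example.com/blog/post-1": ["/blog/post-1/comments"],
}

def crawl(start, max_depth=2, exclude=(r"^/admin/", r"\?"), max_urls=50_000):
    """Breadth-first crawl that stays on start's host and honors depth, exclude patterns, and a URL cap."""
    host = urlparse(start).netloc
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue and len(found) < max_urls:
        url, depth = queue.popleft()
        found.append(url)
        if depth >= max_depth:
            continue  # record this page but do not follow its links
        for href in LINKS.get(url, []):
            absolute = urljoin(url, href)
            parts = urlparse(absolute)
            if parts.netloc != host:
                continue  # internal links only
            target = parts.path + ("?" + parts.query if parts.query else "")
            if any(re.search(pattern, target) for pattern in exclude):
                continue  # private directory or query-string URL
            if absolute not in seen:
                seen.add(absolute)
                queue.append((absolute, depth + 1))
    return found

print(crawl("https://www.example.com/", max_depth=2))
```

Breadth-first order means the depth cap and the URL cap both cut off the deepest pages first, which is usually what you want when a site is too large to crawl fully.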
Step 3 — Run the crawl and inspect results
- Start the crawl and monitor progress.
- Review found URLs for obvious omissions or unwanted pages.
- Look for errors like 404s, redirects, or blocked resources.
- Many tools will show response codes, canonical tags, and rel=prev/next — use these to refine which URLs to include.
Example checks:
- Are important pages present?
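- Are paginated pages handled intentionally (e.g., each page in a series has a self-referencing canonical)? Google no longer uses rel=next/prev as an indexing signal, though other search engines may still read it.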
- Are parameterized URLs being deduplicated?
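Parameter deduplication usually comes down to normalizing each URL to one form before comparing. The sketch below assumes a hypothetical `IGNORED_PARAMS` set of parameters that never change page content on your site; which parameters are safe to drop is a per-site decision, and your canonical tags should agree with the result.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these parameters never change page content on this site; adjust per site.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid", "sort"}

def normalize(url):
    """Lowercase scheme and host, drop ignored params and the fragment, sort remaining params."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in IGNORED_PARAMS
    )
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", urlencode(kept), "")
    )

def dedupe(urls):
    """Return one normalized URL per group of equivalent URLs, in first-seen order."""
    seen = set()
    result = []
    for url in urls:
        key = normalize(url)
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result

print(dedupe([
    "https://WWW.example.com/shoes?utm_source=news&color=red",
    "https://www.example.com/shoes?color=red#reviews",
    "https://www.example.com/shoes?color=blue",
]))
```

Sorting the remaining parameters means `?a=1&b=2` and `?b=2&a=1` collapse to the same entry.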
Step 4 — Configure sitemap rules and metadata
Once URLs are gathered, configure sitemap-specific metadata:
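- lastmod: the date the page's content last changed meaningfully. Take it from CMS data or file timestamps; if you have no reliable date, omit the tag rather than filling in the crawl date.
- changefreq: options are always, hourly, daily, weekly, monthly, yearly, and never. Use it sparingly: Google has stated that it ignores this value, though other search engines may treat it as a hint.
- priority: a number from 0.0 to 1.0 indicating a page's importance relative to other pages on your site (the protocol default is 0.5). Google ignores this value as well; if you set it, apply a consistent scheme (e.g., homepage 1.0, category pages 0.8, articles 0.5).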
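The protocol expects `lastmod` in W3C Datetime format, either a date such as 2025-08-28 or a full timestamp with a time zone. A minimal sketch of formatting dates and applying the priority scheme above, assuming hypothetical URL rules in which category pages live under `/category/`:

```python
from datetime import date, datetime, timezone

def w3c_lastmod(value):
    """Format a date or datetime in W3C Datetime format for <lastmod>."""
    # datetime is a subclass of date, so test for it first.
    if isinstance(value, datetime):
        if value.tzinfo is None:
            value = value.replace(tzinfo=timezone.utc)  # assumption: naive times are UTC
        return value.isoformat(timespec="seconds")
    return value.isoformat()  # a plain date becomes YYYY-MM-DD

def priority_for(path):
    """Hypothetical URL rules implementing the homepage/category/article scheme."""
    if path == "/":
        return "1.0"  # homepage
    if path.startswith("/category/"):
        return "0.8"  # category pages
    return "0.5"  # articles and everything else (also the protocol default)

print(w3c_lastmod(date(2025, 8, 28)), priority_for("/category/shoes/"))
```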
For images/videos or multilingual sites:
- Add image/video sitemap tags per protocol.
- Include hreflang entries or use separate sitemaps per language/region if needed.
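For multilingual sites, Google documents a sitemap form in which each `<url>` lists every language version, itself included, as `xhtml:link` elements with `rel="alternate"`. A sketch using Python's `xml.etree.ElementTree`, assuming a hypothetical page with English and German versions:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"
ET.register_namespace("", SITEMAP_NS)       # sitemap namespace as the default
ET.register_namespace("xhtml", XHTML_NS)    # serialize links as xhtml:link

def hreflang_urlset(alternates):
    """Build a <urlset> where every language version lists all versions, itself included."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc in alternates.values():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        for lang, href in alternates.items():
            ET.SubElement(url, f"{{{XHTML_NS}}}link",
                          rel="alternate", hreflang=lang, href=href)
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical English and German versions of one page.
sitemap_xml = hreflang_urlset({
    "en": "https://www.example.com/en/page",
    "de": "https://www.example.com/de/page",
})
print(sitemap_xml)
```

Listing the full set on every entry keeps the annotations reciprocal, which is what search engines check for.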
A single sitemap file may contain at most 50,000 URLs and be at most 50MB uncompressed. If your site exceeds either limit, split the URLs across several sitemap files and use a sitemap index file that references them.
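Splitting is mechanical: chunk the URL list, write one sitemap per chunk, then list the chunk files in a `<sitemapindex>`. A sketch that enforces only the 50,000-URL limit (check file size separately), assuming hypothetical product URLs and `sitemap-N.xml` file names:

```python
import xml.etree.ElementTree as ET

MAX_URLS_PER_SITEMAP = 50_000  # protocol limit per sitemap file
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def chunk(urls, size=MAX_URLS_PER_SITEMAP):
    """Split the URL list into consecutive groups of at most `size` URLs."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]

def sitemap_index(sitemap_locs):
    """Build a <sitemapindex> document listing each child sitemap's URL."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc in sitemap_locs:
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = loc
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(root, encoding="unicode")

urls = [f"https://www.example.com/product/{n}" for n in range(120_000)]
groups = chunk(urls)  # each group becomes one sitemap file
index = sitemap_index(
    f"https://www.example.com/sitemap-{i + 1}.xml" for i in range(len(groups))
)
print(index)
```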
Step 5 — Export the XML sitemap
Most generators offer export options:
- Single XML file (sitemap.xml).
- Compressed XML (sitemap.xml.gz) for large files.
- Sitemap index (sitemap_index.xml) referencing multiple sitemaps.
Ensure the exported XML follows the sitemap protocol: UTF-8 encoding, the sitemaps.org namespace, correct tags, and entity-escaped URLs (for example, & written as &amp;). A minimal example:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2025-08-28</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
</urlset>
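If you ever need to produce this file outside a generator, for example in a build script, Python's `xml.etree.ElementTree` handles the namespace and entity escaping, and `gzip` produces the compressed variant. A minimal sketch, assuming entries arrive as hypothetical (loc, lastmod, priority) tuples; `changefreq` is left out:

```python
import gzip
import xml.etree.ElementTree as ET

def build_urlset(entries):
    """Serialize (loc, lastmod, priority) tuples into sitemap XML as UTF-8 bytes."""
    root = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for loc, lastmod, priority in entries:
        url = ET.SubElement(root, "url")
        ET.SubElement(url, "loc").text = loc  # ElementTree escapes "&" as "&amp;"
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
        if priority:
            ET.SubElement(url, "priority").text = priority
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)

entries = [
    ("https://www.example.com/", "2025-08-28", "1.0"),
    ("https://www.example.com/catalog?color=red&size=10", None, None),  # hypothetical URL with "&"
]
xml_bytes = build_urlset(entries)    # contents for sitemap.xml
gz_bytes = gzip.compress(xml_bytes)  # contents for sitemap.xml.gz
```

Write `xml_bytes` to sitemap.xml and `gz_bytes` to sitemap.xml.gz in your web root; the 50MB limit applies to the uncompressed bytes.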
Step 6 — Place the sitemap on your site
- Upload sitemap.xml (and sitemap index or compressed files) to your site’s root: https://www.example.com/sitemap.xml.
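- If you place the sitemap in a subdirectory, reference that exact path when submitting it. Under the sitemap protocol, a sitemap can only list URLs in its own directory and below, which is one reason root placement is standard and recommended.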
Add the sitemap location to robots.txt for discoverability. The Sitemap line is independent of User-agent groups and can appear anywhere in the file. Example robots.txt:
User-agent: *
Sitemap: https://www.example.com/sitemap.xml
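To confirm that crawlers will see the declaration, you can parse robots.txt with Python's standard `urllib.robotparser`, whose `site_maps()` method (Python 3.8+) returns the declared Sitemap URLs. A minimal sketch, assuming the robots.txt text is already loaded into a string:

```python
from urllib.robotparser import RobotFileParser

def declared_sitemaps(robots_txt):
    """Return the Sitemap URLs declared in robots.txt text (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []  # site_maps() returns None when there are none

robots_txt = """\
User-agent: *
Sitemap: https://www.example.com/sitemap.xml
"""
print(declared_sitemaps(robots_txt))
```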
Step 7 — Submit to search engines
Google:
- Use Google Search Console → Sitemaps → enter the sitemap URL → Submit.
- Monitor indexing status and any errors reported (parsing issues, unreachable URLs).
Bing:
- Use Bing Webmaster Tools → Sitemaps → Submit sitemap URL.
- Monitor the sitemap and crawl reports for errors.
You don’t need to submit to every engine; including the sitemap in robots.txt and submitting to major consoles covers most crawlers.
Step 8 — Monitor and iterate
- Check Search Console reports for coverage errors, excluded URLs, and indexing trends.
- Re-run the generator after major site updates or schedule automated regeneration.
- Watch for common issues: blocked by robots.txt, noindex tags, canonical pointing elsewhere, or frequent redirects.
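Two of these issues, noindex and canonicals pointing elsewhere, can be checked per URL before it goes into the sitemap. A sketch using Python's standard `html.parser`, assuming you already have each page's HTML and response headers (fetching them is left out) and that canonical URLs are absolute; `sitemap_problems` is a hypothetical helper name:

```python
from html.parser import HTMLParser

class IndexabilityParser(HTMLParser):
    """Collect <meta name="robots"> content and the <link rel="canonical"> href."""

    def __init__(self):
        super().__init__()
        self.robots = ""
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            self.robots += attrs.get("content", "").lower() + ","
        elif tag == "link" and "canonical" in attrs.get("rel", "").lower().split():
            self.canonical = attrs.get("href")

def sitemap_problems(url, html, headers):
    """Return reasons `url` should not be listed in the sitemap (empty list if none)."""
    parser = IndexabilityParser()
    parser.feed(html)
    parser.close()
    problems = []
    # Assumption: headers is a plain dict with the header name spelled as below.
    x_robots = headers.get("X-Robots-Tag", "").lower()
    if "noindex" in parser.robots or "noindex" in x_robots:
        problems.append("noindex")
    if parser.canonical and parser.canonical != url:
        problems.append(f"canonical points to {parser.canonical}")
    return problems
```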
For large/dynamic sites:
- Automate sitemap generation via CMS plugins, CI/CD scripts, or server-side routines.
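- Google retired its sitemap ping endpoint (https://www.google.com/ping?sitemap=https://www.example.com/sitemap.xml) in 2023, so calling it no longer has any effect. After updates, keep lastmod accurate and rely on robots.txt and Search Console; for Bing and other participating engines, the IndexNow protocol can notify them of changed URLs.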
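For automated regeneration, a CI/CD step usually only needs to replace the sitemap when the output actually differs, and should never leave a half-written file in place. A sketch with a hypothetical `write_if_changed` helper; it only pays off if lastmod comes from real content changes, since a generation timestamp would make every run look different:

```python
import os
import tempfile
from pathlib import Path

def write_if_changed(path, content):
    """Atomically replace `path` with `content` (bytes); return True only if the file changed."""
    path = Path(path)
    if path.exists() and path.read_bytes() == content:
        return False  # identical output: skip the write and any follow-up steps
    # Write to a temporary file in the same directory, then rename it over the
    # target, so crawlers never fetch a half-written sitemap.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(content)
        os.chmod(tmp_name, 0o644)  # mkstemp creates 0600 files; the web server must read it
        os.replace(tmp_name, path)
    except BaseException:
        os.unlink(tmp_name)
        raise
    return True
```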
Best practices and tips
- Keep URLs canonical and consistent (trailing slash, scheme).
- Prefer absolute URLs in the sitemap.
- Don’t include noindex pages.
- Use separate sitemaps for different content types (images, videos) or languages.
- Compress large sitemaps (.gz) to reduce bandwidth.
- Review and remove obsolete URLs periodically to avoid wasted crawl budget.
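A quick lint pass can catch violations of the consistency rules above before you upload. A sketch with a hypothetical `lint_locs` helper, assuming the expected scheme and host are https and www.example.com:

```python
from urllib.parse import urlsplit

def lint_locs(locs, scheme="https", host="www.example.com"):
    """Flag <loc> values that are relative, off-scheme/off-host, duplicated, or slash variants."""
    issues = []
    seen = set()
    by_stripped = {}  # URL without trailing slash -> variants seen, in order
    for loc in locs:
        parts = urlsplit(loc)
        if not parts.scheme or not parts.netloc:
            issues.append((loc, "not an absolute URL"))
            continue
        if (parts.scheme, parts.netloc) != (scheme, host):
            issues.append((loc, f"expected {scheme}://{host}"))
        if loc in seen:
            issues.append((loc, "duplicate"))
            continue
        seen.add(loc)
        by_stripped.setdefault(loc.rstrip("/"), []).append(loc)
    # Mixed trailing-slash styles for the same path usually mean duplicate content.
    for variants in by_stripped.values():
        if len(variants) > 1:
            issues.append((variants[0], "trailing-slash variants: " + ", ".join(variants)))
    return issues
```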
Troubleshooting common problems
- Pages in the sitemap not being indexed: check robots.txt, make sure the sitemap itself is reachable, look for noindex tags, and confirm that canonical tags point to the listed URLs.
- Too many URLs: split into multiple sitemaps and use an index file.
- Incorrect lastmod dates: pull dates from your CMS or another record of real content changes, and avoid substituting the crawl or generation date, which makes every page look freshly updated.
- Crawl errors reported in Search Console: fix server errors (5xx), broken links (404), and long redirect chains.
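Before resubmitting a fixed sitemap, you can check it against the protocol limits locally. A sketch that validates an uncompressed `<urlset>` file (a sitemap index would need a separate check), using the protocol's 50,000-URL and 50MB (52,428,800-byte) limits; `check_sitemap` is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 52,428,800 bytes, uncompressed

def check_sitemap(data):
    """Return a list of protocol problems found in raw (uncompressed) sitemap bytes."""
    problems = []
    if len(data) > MAX_BYTES:
        problems.append(f"file is {len(data)} bytes; limit is {MAX_BYTES}")
    try:
        root = ET.fromstring(data)
    except ET.ParseError as exc:
        return problems + [f"XML parse error: {exc}"]
    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        problems.append(f"unexpected root element {root.tag}")  # wrong tag or missing namespace
    urls = root.findall(f"{{{SITEMAP_NS}}}url")
    if len(urls) > MAX_URLS:
        problems.append(f"{len(urls)} URLs; limit is {MAX_URLS} (split and use an index)")
    for i, url in enumerate(urls):
        loc = url.find(f"{{{SITEMAP_NS}}}loc")
        if loc is None or not (loc.text or "").strip():
            problems.append(f"<url> #{i + 1} has no <loc>")
    return problems
```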
Quick checklist
- Choose generator appropriate for site size and technology.
- Configure crawl rules and render JS if necessary.
- Review crawl output and set lastmod/changefreq/priority.
- Export and upload sitemap(s) to site root.
- Add Sitemap line to robots.txt.
- Submit to Google Search Console and Bing Webmaster Tools.
- Monitor coverage reports and automate regeneration.
An XML sitemap is a small file with an outsized impact on indexing efficiency. Using a sitemap generator makes creating, maintaining, and scaling sitemaps practical — especially for evolving sites.