sitemapkit
Install: `npx machina-cli add skill aiskillstore/marketplace/sitemapkit --openclaw`
Use the SitemapKit MCP tools to discover and extract URLs from any website's sitemaps.
Tools available
- discover_sitemaps — finds all sitemap files for a domain (checks robots.txt, common paths, sitemap indexes). Use this first when you just want to know what sitemaps exist.
- extract_sitemap — fetches all URLs from a specific sitemap URL. Use when the user gives you a direct sitemap URL.
- full_crawl — discovers all sitemaps for a domain and returns every URL across all of them in one call. Use this when the user wants the complete list of pages on a site.
When to use which tool
| User says | Use |
|---|---|
| "find sitemaps for X" / "does X have a sitemap?" | discover_sitemaps |
| "extract URLs from X/sitemap.xml" | extract_sitemap |
| "get all pages on X" / "crawl X" / "list all URLs on X" | full_crawl |
Usage guidelines
- Always pass a full URL including protocol: `https://example.com`
- `full_crawl` and `discover_sitemaps` only use the domain — paths are ignored
- `extract_sitemap` needs the exact sitemap URL, e.g. `https://example.com/sitemap.xml`
- Default `max_urls` is 1000. If the user wants more, pass a higher value (up to the plan limit)
- If `truncated: true` appears in the result, tell the user there are more URLs and suggest increasing `max_urls`
- Check `meta.quota.remaining` in the response — if it's low, warn the user proactively
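The guidelines above can be sketched as a small post-processing step. The field names (`totalUrls`, `truncated`, `meta.quota.remaining`) come from this document; the sample response dict and the `summarize` helper are fabricated for illustration:

```python
def summarize(result: dict, low_quota: int = 100) -> list[str]:
    """Derive user-facing notes from a SitemapKit response."""
    notes = [f"Found {result['totalUrls']} URLs."]
    if result.get("truncated"):
        # More URLs exist beyond max_urls; suggest raising the limit.
        notes.append("More URLs exist; consider increasing max_urls.")
    remaining = result.get("meta", {}).get("quota", {}).get("remaining")
    if remaining is not None and remaining < low_quota:
        # Warn proactively before the quota runs out mid-task.
        notes.append(f"Warning: only {remaining} quota units remaining.")
    return notes

# Fabricated sample response for demonstration.
sample = {"totalUrls": 1000, "truncated": True,
          "meta": {"quota": {"remaining": 42}}}
for note in summarize(sample):
    print(note)
```

The `low_quota` threshold is an arbitrary choice; pick whatever margin suits the user's plan.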
Error handling
| Error | What to tell the user |
|---|---|
| Unauthorized | API key is missing or invalid. Get one at https://app.sitemapkit.com/settings/api |
| Monthly quota exceeded | Plan limit reached. Upgrade at https://sitemapkit.com/pricing |
| Rate limit exceeded | Too many requests per minute. Wait and retry — the response includes a retryAfter timestamp |
Example interactions
"What pages does stripe.com have?"
→ Call full_crawl with url: "https://stripe.com", present the URL list.
"Find all sitemaps for shopify.com"
→ Call discover_sitemaps with url: "https://shopify.com", list the sitemap URLs found and which sources they came from (robots.txt, common paths, etc.).
"Extract https://example.com/sitemap-posts.xml"
→ Call extract_sitemap with url: "https://example.com/sitemap-posts.xml", present the URLs with lastmod dates if available.
"How many pages does vercel.com have?"
→ Call full_crawl with url: "https://vercel.com", report totalUrls and whether the result was truncated.
Source
https://github.com/aiskillstore/marketplace/blob/main/skills/0nl1n1n/sitemapkit/SKILL.md
Overview
SitemapKit provides focused tools to discover sitemap files for a domain, extract URLs from a specific sitemap, or crawl an entire site’s URL set across all sitemaps. You use it to audit site structure, perform URL discovery, and validate coverage for SEO or crawling purposes. A valid SITEMAPKIT_API_KEY on your MCP server is required.
How This Skill Works
You call one of three tools against the SitemapKit MCP server: discover_sitemaps to identify available sitemaps for a domain, extract_sitemap to fetch URLs from a given sitemap URL, or full_crawl to aggregate all URLs across all discovered sitemaps. For domain-based tools, only the domain is used (paths are ignored). Results include the URLs and metadata such as totalUrls, truncated status, and quota information.
When to Use It
- Find sitemaps for a domain (e.g., "does X have a sitemap?")
- Extract URLs from a specific sitemap URL you already have
- Get a complete list of pages on a site by crawling all sitemaps
- Audit site structure to understand coverage and page availability
- Perform URL discovery for SEO, indexing, or migration planning
Quick Start
- Ensure your SitemapKit MCP server has a valid API key configured.
- Choose the appropriate tool based on your goal: discover_sitemaps for domain-wide sitemap discovery, extract_sitemap for a specific sitemap URL, or full_crawl for all URLs across all sitemaps.
- Provide input with full URLs (including https://). For domain-based tools, pass the domain (e.g., https://example.com) and let the tool fetch relevant sitemaps.
- Review the response for totalUrls, truncated, and quota.remaining; if truncated or quota is low, adjust max_urls or retry after quota refresh.
Best Practices
- Always pass a full URL including protocol (https://).
- Prefer discover_sitemaps first to map available sitemaps before targeted extraction or full crawl.
- Use max_urls to limit results when you don’t need every URL, especially on large domains.
- If you see truncated: true, increase max_urls or run a subsequent call to retrieve more URLs.
- Monitor meta.quota.remaining and plan requests to avoid interruptions during critical crawls.
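The first best practice above can be enforced with a small normalizer before any tool call. This uses only the Python standard library; the helper name is our own:

```python
from urllib.parse import urlparse

def normalize_url(raw: str) -> str:
    """Ensure the URL has a protocol, as the tools require.

    The path is kept as-is: domain-based tools ignore it anyway,
    and extract_sitemap needs it.
    """
    if "://" not in raw:
        raw = "https://" + raw  # default to https when no scheme given
    parts = urlparse(raw)
    if not parts.netloc:
        raise ValueError(f"Cannot extract a host from {raw!r}")
    return raw
```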
Example Use Cases
- What pages does stripe.com have? → Use full_crawl with url: https://stripe.com to obtain a complete URL list across all sitemaps.
- Find all sitemaps for shopify.com → Use discover_sitemaps with url: https://shopify.com and review found sitemap URLs and their sources (robots.txt, common paths, etc.).
- Extract https://example.com/sitemap-posts.xml → Use extract_sitemap with url: https://example.com/sitemap-posts.xml and present the URLs (with lastmod if available).
- How many pages does vercel.com have? → Use full_crawl with url: https://vercel.com and report totalUrls and whether the result was truncated.
- Audit a client domain’s site structure → Run full_crawl to collect all URLs, then analyze sitemap coverage, duplicate paths, and crawl priority.