
sitemapkit

npx machina-cli add skill aiskillstore/marketplace/sitemapkit --openclaw

SitemapKit

Use the SitemapKit MCP tools to discover and extract URLs from any website's sitemaps.

Tools available

  • discover_sitemaps — finds all sitemap files for a domain (checks robots.txt, common paths, sitemap indexes). Use this first when you just want to know what sitemaps exist.
  • extract_sitemap — fetches all URLs from a specific sitemap URL. Use when the user gives you a direct sitemap URL.
  • full_crawl — discovers all sitemaps for a domain and returns every URL across all of them in one call. Use this when the user wants the complete list of pages on a site.
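To make the argument shapes concrete, here is a sketch of the inputs each tool expects. The field names (url, max_urls) come from the usage guidelines in this skill; any other options the server may accept are not shown.

```python
# Hypothetical argument dicts for each SitemapKit tool call.
# Only `url` and `max_urls` are documented in this skill.
discover_args = {"url": "https://example.com"}                 # domain only; paths ignored
extract_args = {"url": "https://example.com/sitemap.xml"}      # exact sitemap URL required
crawl_args = {"url": "https://example.com", "max_urls": 5000}  # raise the 1000 default
```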

When to use which tool

  • "find sitemaps for X" / "does X have a sitemap?" → discover_sitemaps
  • "extract URLs from X/sitemap.xml" → extract_sitemap
  • "get all pages on X" / "crawl X" / "list all URLs on X" → full_crawl

Usage guidelines

  • Always pass a full URL including protocol: https://example.com
  • full_crawl and discover_sitemaps only use the domain — paths are ignored
  • extract_sitemap needs the exact sitemap URL, e.g. https://example.com/sitemap.xml
  • Default max_urls is 1000. If the user wants more, pass a higher value (up to plan limit)
  • If truncated: true appears in the result, tell the user there are more URLs and suggest increasing max_urls
  • Check meta.quota.remaining in the response — if it's low, warn the user proactively
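The last two guidelines can be sketched as a post-processing step. This assumes the response shape described above (a top-level truncated flag and meta.quota.remaining); the "low quota" threshold of 100 is an arbitrary choice for illustration.

```python
def summarize_result(result: dict) -> str:
    """Turn a SitemapKit-style result into user-facing warnings."""
    notes = []
    if result.get("truncated"):
        notes.append("Result truncated: more URLs exist; increase max_urls.")
    remaining = result.get("meta", {}).get("quota", {}).get("remaining")
    if remaining is not None and remaining < 100:  # threshold is a guess
        notes.append(f"Quota warning: only {remaining} requests remaining.")
    return " ".join(notes) or "All URLs returned; quota healthy."
```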

Error handling

  • Unauthorized → API key is missing or invalid. Get one at https://app.sitemapkit.com/settings/api
  • Monthly quota exceeded → Plan limit reached. Upgrade at https://sitemapkit.com/pricing
  • Rate limit exceeded → Too many requests per minute. Wait and retry; the response includes a retryAfter timestamp

Example interactions

"What pages does stripe.com have?" → Call full_crawl with url: "https://stripe.com", present the URL list.

"Find all sitemaps for shopify.com" → Call discover_sitemaps with url: "https://shopify.com", list the sitemap URLs found and which sources they came from (robots.txt, common paths, etc.).

"Extract https://example.com/sitemap-posts.xml" → Call extract_sitemap with url: "https://example.com/sitemap-posts.xml", present the URLs with lastmod dates if available.

"How many pages does vercel.com have?" → Call full_crawl, report totalUrls and whether the result was truncated.

Source

SKILL.md source: https://github.com/aiskillstore/marketplace/blob/main/skills/0nl1n1n/sitemapkit/SKILL.md

Overview

SitemapKit provides focused tools to discover sitemap files for a domain, extract URLs from a specific sitemap, or crawl an entire site’s URL set across all sitemaps. You use it to audit site structure, perform URL discovery, and validate coverage for SEO or crawling purposes. A valid SITEMAPKIT_API_KEY on your MCP server is required.

How This Skill Works

You call one of three tools against the SitemapKit MCP server: discover_sitemaps to identify available sitemaps for a domain, extract_sitemap to fetch URLs from a given sitemap URL, or full_crawl to aggregate all URLs across all discovered sitemaps. For domain-based tools, only the domain is used (paths are ignored). Results include the URLs and metadata such as totalUrls, truncated status, and quota information.

When to Use It

  • Find sitemaps for a domain (e.g., "does X have a sitemap?")
  • Extract URLs from a specific sitemap URL you already have
  • Get a complete list of pages on a site by crawling all sitemaps
  • Audit site structure to understand coverage and page availability
  • Perform URL discovery for SEO, indexing, or migration planning

Quick Start

  1. Ensure your SitemapKit MCP server has a valid API key configured.
  2. Choose the appropriate tool based on your goal: discover_sitemaps for domain-wide sitemap discovery, extract_sitemap for a specific sitemap URL, or full_crawl for all URLs across all sitemaps.
  3. Provide input with full URLs (including https://). For domain-based tools, pass the domain (e.g., https://example.com) and let the tool fetch relevant sitemaps.
  4. Review the response for totalUrls, truncated, and quota.remaining; if truncated or quota is low, adjust max_urls or retry after quota refresh.

Best Practices

  • Always pass a full URL including protocol (https://).
  • Prefer discover_sitemaps first to map available sitemaps before targeted extraction or full crawl.
  • Use max_urls to limit results when you don’t need every URL, especially on large domains.
  • If you see truncated: true, increase max_urls or run a subsequent call to retrieve more URLs.
  • Monitor meta.quota.remaining and plan requests to avoid interruptions during critical crawls.

Example Use Cases

  • What pages does stripe.com have? → Use full_crawl with url: https://stripe.com to obtain a complete URL list across all sitemaps.
  • Find all sitemaps for shopify.com → Use discover_sitemaps with url: https://shopify.com and review found sitemap URLs and their sources (robots.txt, common paths, etc.).
  • Extract https://example.com/sitemap-posts.xml → Use extract_sitemap with url: https://example.com/sitemap-posts.xml and present the URLs (with lastmod if available).
  • How many pages does vercel.com have? → Use full_crawl with url: https://vercel.com and report totalUrls and whether the result was truncated.
  • Audit a client domain’s site structure → Run full_crawl to collect all URLs, then analyze sitemap coverage, duplicate paths, and crawl priority.
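For the audit use case, a crawled URL list can be grouped by first path segment to sketch site structure. This is plain post-processing of any URL list (such as one returned by full_crawl); nothing here depends on the SitemapKit API.

```python
from collections import Counter
from urllib.parse import urlparse

def top_sections(urls: list[str], n: int = 3) -> list[tuple[str, int]]:
    """Count URLs per top-level path segment, largest sections first."""
    sections = Counter()
    for u in urls:
        path = urlparse(u).path.strip("/")
        sections[path.split("/")[0] if path else "(root)"] += 1
    return sections.most_common(n)
```

Feeding it a full_crawl result immediately shows which sections (blog, docs, product pages) dominate the sitemap coverage.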

