site-audit
npx machina-cli add skill sivang/sivan-claude-plugins/site-audit --openclaw

Site Audit
Role: Thorough website quality auditor.
Objective: Crawl the provided URL and all reachable same-domain pages using Chrome browser navigation, checking for broken links, spelling/grammar errors, console errors, and visual issues. Produce a structured report of all findings.
Input
The user provides a target URL as an argument to /site-audit. Example: /site-audit https://example.com
If no URL is provided, ask the user for one. Validate that the input is a valid URL (starts with http:// or https://).
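The validation rule above can be sketched as a small helper. The function name `isValidAuditUrl` is hypothetical (the skill performs this check inline); it accepts only absolute http(s) URLs that actually parse:

```javascript
// Hypothetical helper sketching the /site-audit input check:
// accept only absolute http:// or https:// URLs that parse cleanly.
function isValidAuditUrl(input) {
  if (typeof input !== "string") return false;
  if (!/^https?:\/\//i.test(input)) return false; // must start with http:// or https://
  try {
    new URL(input); // reject strings that are not parseable URLs
    return true;
  } catch {
    return false;
  }
}
```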
Execution
Phase 1: URL Validation & Browser Setup
- Extract the URL from the user's command
- Validate it starts with http:// or https://
- Set up Chrome browser tab:
  - Call `tabs_context_mcp` to get the existing tab group (with `createIfEmpty: true`)
  - Call `tabs_create_mcp` to create one dedicated tab for the entire audit
  - Store the returned tab ID for all subsequent operations
- Navigate to the seed URL to verify it loads
- Confirm the target domain with the user
Phase 2: Chrome-Based Crawl
After URL validation, begin the breadth-first crawl using Chrome navigation.
Initialization:
- Parse the seed URL to extract protocol, domain (hostname), and path
- Initialize crawl state:
  - queue: List containing only the seed URL (normalized)
  - visited: Empty set for tracking crawled URLs
  - page_count: 0
  - max_pages: 20 (crawl limit — keeps audit focused)
  - broken_links: Empty list for recording dead links with source info
  - seed_hostname: Extracted hostname from seed URL (for same-domain checks)
  - link_sources: Map of {target_url: source_page} for dead link tracking
  - broken_links_count: 0
  - spelling_issues_count: 0
- Create the `.audit-data/` directory and initialize the JSONL files:

  ```shell
  mkdir -p .audit-data
  touch .audit-data/findings-broken-links.jsonl
  touch .audit-data/findings-spelling.jsonl
  ```

- Reference @references/URL_RULES.md for URL normalization rules
- Reference @references/CHECKS.md for content analysis and broken link detection rules
- Display initialization summary to user:
- Seed URL (normalized)
- Domain to crawl
- Max pages limit
- Chrome tab ID being used
- Findings will be written to the `.audit-data/` directory
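The "seed URL (normalized)" step above can be sketched as follows. This is an assumed, typical normalization (lowercase hostname, fragment dropped, trailing slash stripped on non-root paths); the authoritative rules live in @references/URL_RULES.md:

```javascript
// Sketch of URL normalization under assumed typical rules — see
// references/URL_RULES.md for the rules the skill actually applies.
function normalizeUrl(raw) {
  const u = new URL(raw);
  u.hash = "";                      // #fragments never identify a different page
  u.hostname = u.hostname.toLowerCase();
  if (u.pathname !== "/" && u.pathname.endsWith("/")) {
    u.pathname = u.pathname.slice(0, -1); // strip trailing slash on non-root paths
  }
  return u.toString();
}
```

Normalizing before enqueueing keeps the visited set free of duplicates that differ only by fragment or trailing slash.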
Crawl Loop:
Execute the following steps in order, looping until termination:
1. Termination check — If queue is empty OR page_count >= max_pages, exit loop and proceed to Phase 3
2. Dequeue — Remove the first URL from the queue front (FIFO / breadth-first)
3. Duplicate check — If the URL is already in the visited set, skip to step 1 (continue loop)
4. Mark visited — Add the current URL to the visited set and increment page_count
5. Navigate in Chrome — Navigate the Chrome tab to the current URL:
   - Use the `navigate` tool with the stored tab ID
   - Wait 3 seconds for page load
6. Check for 404/error page — Reference @references/CHECKS.md for detection rules:
   - Read the page title using tabs context (check the tab title in the tool response)
   - Use `get_page_text` to get the page content
   - A page is broken if ANY of these are true:
     - Tab title contains "404", "not found" (case-insensitive), or starts with "undefined"
     - Page text content starts with "404" or contains "page you're looking for wasn't found" or similar 404 patterns
   - If the page is broken:
     - Get the source page from the `link_sources` map (use the current URL as key)
     - If no source is found, use "(seed)" as the source
     - Record the finding:

       ```shell
       echo '{"type":"broken_link","page":"[source_page]","target":"[current_url]","error":"404","timestamp":"[iso8601_timestamp]"}' >> .audit-data/findings-broken-links.jsonl
       ```

     - Increment `broken_links_count`
     - Skip to step 11 (loop back) — no links to extract from a 404 page
7. Extract links from DOM — Execute JavaScript on the Chrome tab to get all actual link hrefs:

   ```javascript
   (function() {
     const links = Array.from(document.querySelectorAll('a[href]'))
       .map(a => ({ href: a.href, text: a.textContent.trim().substring(0, 50) }))
       .filter(l => l.href &&
         !l.href.startsWith('javascript:') &&
         !l.href.startsWith('mailto:') &&
         !l.href.startsWith('tel:'));
     return JSON.stringify(links);
   })()
   ```

   This extracts the actual resolved href values from the DOM, not guessed URLs.
8. Classify and route — For each extracted link:
   - Reference @references/URL_RULES.md for normalization and classification rules
   - Skip check: If the URL matches any skip rule (mailto:, .pdf, .jpg, etc.), discard it
   - Same-domain check: Extract the hostname and compare it with seed_hostname using strict matching (www. is significant)
   - If same-domain AND not already in the visited set: Add to the queue and record in the `link_sources` map: `link_sources[normalized_url] = current_page_url`
   - If external: Skip (external link verification is out of scope)
9. Analyze content — Reference @references/CHECKS.md for detailed rules:
   - Use the page text already extracted in step 6 (from `get_page_text`)
   - Spelling/Grammar Check:
     - Apply filtering rules to exclude non-prose content
     - Identify HIGH confidence spelling errors only
     - For each error found, write the finding to JSONL:

       ```shell
       echo '{"type":"spelling","page":"[current_url]","word":"[misspelled]","suggestion":"[correction]","context":"[snippet]","timestamp":"[iso8601]"}' >> .audit-data/findings-spelling.jsonl
       ```

     - Increment `spelling_issues_count`
10. State update — After processing each page, output:

    Page [page_count]/[max_pages]: [current_url] - [issues_found_this_page] issues
    Queue: [queue_length] URLs remaining

11. Loop back — Return to step 1 (termination check)
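The eleven steps above reduce to a compact breadth-first loop. This is a minimal sketch, not the skill's literal tool calls: browser navigation, the 404 checks, and DOM link extraction are abstracted behind a caller-supplied `visit(url)` that returns `{ broken, links }` for a page:

```javascript
// Sketch of the Phase 2 crawl loop. `visit(url)` stands in for Chrome
// navigation plus the 404 check and DOM link extraction.
function crawl(seedUrl, visit, maxPages = 20) {
  const seedHost = new URL(seedUrl).hostname; // strict match: www. is significant
  const queue = [seedUrl];
  const visited = new Set();
  const linkSources = new Map();              // target URL -> page it was found on
  const brokenLinks = [];

  while (queue.length > 0 && visited.size < maxPages) { // termination check
    const url = queue.shift();                // FIFO dequeue = breadth-first
    if (visited.has(url)) continue;           // duplicate check
    visited.add(url);                         // mark visited (page_count = visited.size)

    const { broken, links } = visit(url);     // navigate + inspect the page
    if (broken) {
      brokenLinks.push({ page: linkSources.get(url) || "(seed)", target: url });
      continue;                               // no links to extract from a 404 page
    }
    for (const href of links) {               // classify and route
      if (new URL(href).hostname !== seedHost) continue; // external: skip
      if (!visited.has(href) && !linkSources.has(href)) {
        queue.push(href);
        linkSources.set(href, url);           // remember the source for dead-link reporting
      }
    }
  }
  return { visited: [...visited], brokenLinks };
}
```

A fake link graph makes the loop easy to check: a page whose `visit` result is `{ broken: true }` is reported against the page that linked to it.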
After crawl completion:
- Report total pages crawled
- Report queue status (exhausted or hit page limit)
- Report total findings: broken links and spelling issues
- Proceed to Phase 3
Notes:
- ALL link discovery uses JavaScript on the actual DOM — never construct or guess URLs
- Broken link detection happens by navigating to the URL in Chrome and checking the result
- The `a.href` property in JavaScript returns the fully resolved absolute URL (the browser handles relative URL resolution)
- Use the same Chrome tab for the entire crawl (navigate between pages)
- If JavaScript execution is blocked on a page, fall back to `get_page_text` and extract visible link text, but do NOT attempt to construct URLs from link text
Phase 3: External Link Verification (to be implemented)
- Placeholder: Report that external link verification is not yet implemented
Phase 4: UI Checks
After crawl completion, perform Chrome-based runtime checks on all visited pages to detect console errors, broken resources, and visual issues.
Initialization:
- Reference @references/UI_CHECKS.md for console filtering and resource classification rules
- Get the visited pages list from the Phase 2 crawl context (the `visited` set)
- Reuse the same Chrome tab from Phase 2 (no need to create a new one)
- Initialize the JSONL findings files:

  ```shell
  touch .audit-data/findings-console-errors.jsonl
  touch .audit-data/findings-broken-resources.jsonl
  touch .audit-data/findings-visual-issues.jsonl
  ```

- Initialize counters:
  - page_check_count: 0
  - console_errors_count: 0
  - broken_resources_count: 0
  - visual_issues_count: 0
  - total_pages: length of visited set
UI Check Loop:
Execute the following steps in order for each URL in visited pages:
1. Progress check — If page_check_count >= total_pages, exit loop and proceed to completion
2. Navigate — Navigate the Chrome tab to the current URL
3. Wait — Wait 3 seconds for page load (scripts, resources, async content)
4. Console error check — Reference @references/UI_CHECKS.md for filtering rules:
   - Call `read_console_messages` with `{"tabId": tab_id, "onlyErrors": true, "clear": true}`
   - Filter out messages from `chrome-extension://` URLs
   - Filter out favicon 404 messages
   - For each remaining error message:
     - Write the finding to JSONL:

       ```shell
       echo '{"type":"console_error","page":"[url]","level":"[level]","message":"[msg]","timestamp":"[iso8601]"}' >> .audit-data/findings-console-errors.jsonl
       ```

     - Increment `console_errors_count`
5. Broken resource check — Reference @references/UI_CHECKS.md for classification rules:
   - Call `read_network_requests` with `{"tabId": tab_id, "clear": true}`
   - Filter for failed requests: status >= 400 or status == 0
   - Filter out `chrome-extension://` requests
   - For each broken resource:
     - Write the finding to JSONL
     - Increment `broken_resources_count`
6. Visual layout check — Reference @references/UI_CHECKS.md for JavaScript snippets:
   - Execute the combined layout check JavaScript
   - Write each finding to JSONL
   - Increment `visual_issues_count`
7. State update — After processing each page, increment page_check_count and output progress
8. Loop back — Return to step 1 (progress check)
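The filtering rules in steps 4 and 5 can be sketched as two pure functions. The message and request shapes below are assumptions based on the fields this document mentions (`url`, `message`, `status`); the authoritative filters live in @references/UI_CHECKS.md:

```javascript
// Sketch of the Phase 4 noise filters, assuming {url, message} console
// entries and {url, status} network entries.
function filterConsoleErrors(messages) {
  return messages.filter(m =>
    !(m.url || "").startsWith("chrome-extension://") && // extension noise, not the site
    !/favicon\.ico/.test(m.message || "")               // favicon 404s are ignored
  );
}

function filterBrokenResources(requests) {
  return requests.filter(r =>
    (r.status >= 400 || r.status === 0) &&              // HTTP error or outright network failure
    !r.url.startsWith("chrome-extension://")
  );
}
```

Keeping these filters as pure functions over the tool output makes them easy to tune without touching the navigation loop.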
After UI check completion:
- Report total pages checked
- Report total findings
- Proceed to Phase 5
Notes:
- Single tab reuse: same tab used throughout Phase 2 and Phase 4
- Clear console and network state between pages using `clear: true`
- Reference @references/UI_CHECKS.md for all filtering, classification, and visual check details
Phase 5: Report
After UI checks complete, generate the structured markdown report.
Report Generation:
1. Reference rules — Reference @references/REPORT.md for the format specification
2. Generate filename — `audit-{domain}-{date}.md` (domain with dots replaced by hyphens)
3. Read all JSONL findings — Read each findings file, parse the JSON, and group by type
4. Build report — Following the @references/REPORT.md format:
   - Run metadata header (seed URL, pages crawled, duration, timestamp)
   - Summary counts table
   - Conditional TOC (if 50+ findings)
   - Finding type sections with markdown tables
   - Page index sorted by total findings
5. Write report file — Use the Write tool to save to the working directory root
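Steps 2 and 3 above can be sketched as follows. The filename rule comes from this document; the grouping pass assumes only that each JSONL line is one JSON object with a `type` field, as in the `echo` examples earlier (the full report layout is specified in @references/REPORT.md):

```javascript
// Sketch of report scaffolding: filename generation and grouping parsed
// JSONL findings by their "type" field.
function reportFilename(seedUrl, date = new Date()) {
  const domain = new URL(seedUrl).hostname.replace(/\./g, "-"); // dots -> hyphens
  const day = date.toISOString().slice(0, 10);                  // YYYY-MM-DD
  return `audit-${domain}-${day}.md`;
}

function groupFindings(jsonlText) {
  const groups = {};
  for (const line of jsonlText.split("\n")) {
    if (!line.trim()) continue;                 // skip blank lines
    const finding = JSON.parse(line);           // one JSON object per line
    (groups[finding.type] ??= []).push(finding);
  }
  return groups;
}
```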
After report generation:
- Report file path to user
- Report total findings count and breakdown by type
- Audit is complete
Current Status
All phases are implemented. After validating the URL, the skill will:
- Set up a Chrome browser tab
- Navigate to pages using Chrome (real browser navigation)
- Extract links from the actual DOM using JavaScript (real `href` attributes)
- Detect broken links by navigating to them and checking for 404 page indicators
- Analyze page text for spelling errors using `get_page_text`
- Check each page for console errors, broken resources, and visual issues
- Generate structured markdown report
External link verification (Phase 3) is coming in future updates.
Source
View on GitHub: https://github.com/sivang/sivan-claude-plugins/blob/main/site-audit/skills/site-audit/SKILL.md

Overview
Site Audit crawls the provided URL and all reachable same-domain pages using a Chrome-based navigator to identify issues. It reports broken links, spelling/grammar errors, console problems, and visual issues, storing findings in the .audit-data directory for review.
How This Skill Works
It validates the URL, opens a dedicated Chrome tab, and performs a breadth-first crawl up to a max_pages limit (20). For each page, it checks for 404-like signals in the title and content, records broken links to findings-broken-links.jsonl, and collects spelling, console, and visual issues in their respective files.
When to Use It
- Identify broken internal links on a homepage and navigation
- Crawl a small site to surface 404s and missing pages
- Detect spelling/grammar issues across site content
- Capture console errors and rendering problems during navigation
- Prepare an SEO health report with link integrity and content quality
Quick Start
- Step 1: Run /site-audit with a target URL, e.g. /site-audit https://example.com
- Step 2: Confirm the seed URL loads and the domain matches; the tool will start a Chrome-based BFS crawl with a max_pages of 20
- Step 3: Open .audit-data/ to review findings and export results
Best Practices
- Validate the target URL starts with http:// or https:// before starting
- Limit max_pages to keep audits fast and focused
- Review .audit-data/findings-broken-links.jsonl and .audit-data/findings-spelling.jsonl after the crawl
- Use staging URLs for testing to avoid impacting production
- Cross-check flagged items with URL_RULES.md and CHECKS.md referenced in the skill
Example Use Cases
- Audit https://shop.example.com to surface broken product links
- Run a site-wide check for spelling errors in blog posts
- Identify 404s across category pages for SEO readiness
- Capture and analyze console errors on client-side rendered pages
- Generate a compact audit for QA before site launch