audit-site
npx machina-cli add skill Crawlio-app/crawlio-plugin/audit-site --openclawaudit-site
Perform a comprehensive site audit: crawl, capture enrichment, analyze observations, and produce a findings report.
When to Use
Use this skill when the user wants a thorough analysis of a website — not just downloading it, but understanding its technology stack, identifying issues, and producing actionable findings.
Full Audit Workflow
Phase 1: Configure for the Target
First, understand the site and configure appropriately:
update_settings({
settings: {
maxConcurrent: 4,
crawlDelay: 0.5,
stripTrackingParams: true
},
policy: {
scopeMode: "sameDomain",
maxDepth: 5,
respectRobotsTxt: true,
includeSupportingFiles: true,
downloadCrossDomainAssets: true,
autoUpgradeHTTP: true
}
})
Adjust based on site size:
- Small site (<100 pages):
maxDepth: 10,maxConcurrent: 8 - Medium site (100-1000):
maxDepth: 5,maxConcurrent: 4 - Large site (1000+):
maxDepth: 3,maxPagesPerCrawl: 500
Phase 2: Crawl
start_crawl({ url: "https://example.com" })
Monitor until complete:
get_crawl_status()
// Repeat with `since` parameter for efficient polling
Phase 3: Capture Enrichment (if crawlio-agent available)
If the Chrome extension is running, enrichment data (framework detection, network requests, console logs) is captured automatically during the crawl and appended to the observation log.
Phase 4: Analyze Observations
Query the observation timeline for patterns:
// Get all observations for the target
get_observations({ host: "example.com", limit: 200 })
// Check framework detection
get_observations({ host: "example.com", source: "extension", limit: 50 })
// Review crawl results
get_observations({ op: "crawl_done" })
// Check for errors
get_errors()
get_failed_urls()
Phase 5: Create Findings
For each issue or insight identified, create a finding with evidence:
create_finding({
title: "...",
url: "https://example.com",
evidence: ["obs_xxx", "obs_yyy"],
synthesis: "..."
})
Phase 6: Generate Report
Compile findings into a structured report:
get_findings({ host: "example.com" })
Present as a summary with:
- Technology Stack: Framework, rendering mode, CDN
- Issues Found: Grouped by category (security, performance, SEO, errors)
- Site Structure: Tree overview, orphaned pages, broken links
- Recommendations: Prioritized action items
Audit Checklist
Use this checklist to ensure thorough coverage:
Technology
- Framework detected (React, Vue, Next.js, etc.)
- Rendering mode identified (SSR, SSG, CSR, ISR)
- CDN/hosting identified
- Third-party services cataloged
Performance
- Page count and crawl duration noted
- Large files identified (images, videos, scripts)
- External resource dependencies mapped
Security
- HTTPS enforcement checked
- Mixed content identified
- Security headers reviewed (if captured in network data)
Content
- Failed URLs reviewed and categorized
- Redirect chains identified
- Missing resources noted
Structure
- Site tree reviewed for organization
- Depth distribution analyzed
- Cross-domain assets cataloged
Example Report Structure
# Site Audit: example.com
## Summary
- **Pages crawled**: 142
- **Failed**: 4 (3 auth-required API routes, 1 timeout)
- **Framework**: Next.js (App Router, ISR)
- **Total size**: 23.4 MB
## Findings (5)
### Security
1. **Mixed content on 3 pages** — HTTP images loaded on HTTPS pages
Evidence: obs_a3f7b2c1, obs_b4e8c3d2
### Performance
2. **Unoptimized images** — 8 images over 500KB without modern formats
Evidence: obs_c5f9d4e3
### SEO
3. **Missing meta descriptions** — 12 pages have no meta description
4. **Duplicate titles** — 3 pairs of pages share identical titles
### Errors
5. **Console errors on product pages** — TypeError in cart widget
Evidence: obs_d6g0e5f4
## Recommendations
1. Convert images to WebP/AVIF format
2. Fix mixed content by updating CDN URLs to HTTPS
3. Add unique meta descriptions to all pages
4. Fix TypeError in cart widget JavaScript
Tips
- Save the project after crawling so you can revisit:
save_project({ name: "example.com audit" }) - For recurring audits, load the project and recrawl to compare:
load_project({ id: "..." }) - Use
export_site({ format: "folder" })to archive the downloaded content - Combine with crawlio-agent browser interaction tools for authenticated areas
Source
git clone https://github.com/Crawlio-app/crawlio-plugin/blob/main/skills/audit-site/SKILL.mdView on GitHub Overview
audit-site performs a complete website audit by crawling, enriching, and analyzing observations to produce a findings report. It surfaces technology stack details, detects issues, and delivers prioritized recommendations.
How This Skill Works
First, configure the crawl and policy settings. Then run the crawl, capture enrichment if available, analyze observations, create findings with evidence, and generate the final report.
When to Use It
- You want a thorough analysis of a website beyond basic crawling (tech stack, issues, and actionable findings).
- You need a governance-friendly audit covering performance, security, SEO, and structure.
- You want an evidence-backed findings report with prioritized recommendations.
- You're auditing a site to map architecture, dependencies, and cross-domain assets.
- You're preparing a comprehensive site health check for stakeholders.
Quick Start
- Step 1: Configure crawl settings with update_settings({ settings: ..., policy: ... })
- Step 2: Start the crawl with start_crawl({ url: "https://example.com" }) and monitor with get_crawl_status()
- Step 3: Generate findings and review the report with get_findings({ host: "example.com" }) and review the structured output
Best Practices
- Define a clear scope and set appropriate maxDepth and maxConcurrent based on site size.
- Enable robots.txt respect and proper policy settings to avoid over-crawling.
- Enable enrichment capture by using the crawlio-agent Chrome extension when available.
- Capture observations and errors systematically; attach evidence to each finding.
- Present findings in a structured report with tech stack, issues, site structure, and actionable recommendations.
Example Use Cases
- Audit example.com to map the technology stack and identify missing security headers.
- Review a medium-sized e-commerce site for broken links and crawl coverage.
- Assess a large corporate site to document site structure and orphaned pages.
- Detect unoptimized assets and large image files affecting performance.
- Produce a findings report that prioritizes remediation work for SEO and security.