mcp-server-webcrawl
An MCP server for connecting web crawler data and archives to LLMs
claude mcp add --transport stdio pragmar-mcp-server-webcrawl python -m mcp_server_webcrawl
How to use
mcp-server-webcrawl provides an advanced, search-enabled interface for working with data crawled from the web. It exposes a fulltext search capability with boolean operators, and supports resource filtering by type, HTTP status, and other attributes. The server is designed to be driven by an LLM, giving it a ready-made prompt toolkit and routines for tasks such as SEO analysis, 404 auditing, performance reviews, and data extraction from multiple crawler backends. This makes it suitable for building knowledge bases from crawled content, running curated prompts, and performing guided queries against diverse crawl datasets.
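As an illustration, an LLM-issued search tool call against such a server might look like the following. This is a sketch only: the tool name (webcrawl_search) and argument names shown here are assumptions for illustration, not the server's documented schema; consult the project docs for the actual tool definitions.

```json
{
  "tool": "webcrawl_search",
  "arguments": {
    "query": "pricing AND (checkout OR cart) NOT staging",
    "type": "html",
    "status": 404,
    "limit": 20
  }
}
```

The boolean query string is combined with field filters (resource type, HTTP status), mirroring the filtering capabilities described above.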
To use it, install the package via pip and start the MCP server entry point. Once running, you can query the indexed crawl data using the built-in boolean search syntax, field-based filters, and content queries. The server supports prompts and routines (for example, SEO audits or performance analyses) that can be used directly within your LLM workflow. It is compatible with a variety of crawlers and formats, allowing you to filter results by type, status, and other metadata while performing complex searches across crawled content.
How to install
Prerequisites:
- Python 3.10 or newer
- pip (Python package manager)
- Internet access to install dependencies
Installation steps:
- Create and activate a Python virtual environment (recommended):
  python -m venv venv
  source venv/bin/activate    # macOS/Linux
  venv\Scripts\activate       # Windows
- Install the MCP server package from PyPI:
  pip install mcp-server-webcrawl
- Confirm the installation and available entry point:
  python -m mcp_server_webcrawl --help
- Start the MCP server (as configured in mcp_config):
  python -m mcp_server_webcrawl
- Optional: integrate with your orchestration tooling or compose with other MCP servers as needed.
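Once installed, the server can be registered with an MCP client over stdio. A minimal Claude Desktop configuration sketch is shown below; the server key name ("webcrawl") is arbitrary, and your setup may require additional arguments (such as a crawler type or data source path), so check the project docs for the exact flags.

```json
{
  "mcpServers": {
    "webcrawl": {
      "command": "python",
      "args": ["-m", "mcp_server_webcrawl"]
    }
  }
}
```

On macOS this file typically lives at ~/Library/Application Support/Claude/claude_desktop_config.json; restart the client after editing it so the server is picked up.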
Additional notes
Notes and tips:
- The server requires Python 3.10+ and exposes MCP tools and resources suitable for integration with your MCP client or orchestration tooling.
- If you plan to run multiple crawlers or data sources, ensure your environment variables or configuration reflect the specific backends you intend to index.
- Common environment variables may include paths or credentials for crawled data sources; consult the project docs for crawler-specific setup guides.
- When building prompts or routines for the LLM, take advantage of the provided audit and analysis prompts (SEO Audit, 404 Audit, Performance Audit, etc.) to derive structured outputs from raw crawl data.
- Monitor resource usage for large crawl indices, as fulltext search and field filtering can be memory-intensive depending on the dataset size.
Related MCP Servers
mcp-rest-api
A TypeScript-based MCP server that enables testing of REST APIs through Cline. This tool allows you to test and interact with any REST API endpoints directly from your development environment.
awesome-mcp-servers
A curated list of excellent Model Context Protocol (MCP) servers.
pfsense
pfSense MCP Server enables security administrators to manage their pfSense firewalls using natural language through AI assistants like Claude Desktop. Simply ask "Show me blocked IPs" or "Run a PCI compliance check" instead of navigating complex interfaces. Supports REST/XML-RPC/SSH connections, and includes built-in compliance checks.
mcp-sysoperator
MCP server for Ansible, Terraform, LocalStack, and other IaC tools. Create and iterate on IaC.
fegis
Define AI tools in YAML with natural language schemas. All tool usage is automatically stored in Qdrant vector database, enabling semantic search, filtering, and memory retrieval across sessions.
vector_mcp
A server implementation for the Model Context Protocol (MCP) in Ruby.