ocr-captcha
Advanced OCR and CAPTCHA recognition MCP server that gives AI agents image recognition capabilities
claude mcp add --transport stdio ymeng98-ocr-captcha-mcp-server npx -y @smithery/cli run your-deployment-url
How to use
This OCR and CAPTCHA MCP server gives MCP agents image recognition capabilities. It exposes tools that perform optical character recognition, detect text regions, preprocess images to improve OCR accuracy, and solve slider CAPTCHAs through template matching. Core tools include ocr_recognize for extracting text from images, text_detection for locating text regions, image_preprocessing for image enhancement, and slide_captcha_match for solving common slider CAPTCHA puzzles. To call a tool, send a properly structured MCP request with the required parameters (for example, a Base64-encoded image plus any optional language or configuration parameters). The server is designed to run in environments like Smithery as well as locally in development mode, and it emphasizes non-root operation and safe handling of user images.
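As a rough illustration of the request shape described above, the sketch below builds a JSON-RPC 2.0 `tools/call` message, which is how MCP clients invoke a tool. The argument keys `image` and `language`, and the helpers `isBase64` and `buildOcrRequest`, are assumptions made for illustration; the schema the server reports through `tools/list` is the authority on the real parameter names.

```typescript
// Shape of an MCP tool invocation: a JSON-RPC 2.0 request with method "tools/call".
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Returns true when the string looks like padded standard Base64 (not a data URL).
function isBase64(s: string): boolean {
  return s.length > 0 && s.length % 4 === 0 && /^[A-Za-z0-9+/]+={0,2}$/.test(s);
}

// Builds a request for the ocr_recognize tool.
// NOTE: the argument keys "image" and "language" are hypothetical.
function buildOcrRequest(id: number, imageBase64: string, language?: string): ToolCallRequest {
  if (!isBase64(imageBase64)) {
    throw new Error("image must be a non-empty Base64 string");
  }
  const args: Record<string, unknown> = { image: imageBase64 };
  if (language !== undefined) {
    args.language = language;
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "ocr_recognize", arguments: args },
  };
}

// "aGVsbG8=" is Base64 for the bytes of "hello", standing in for real image data.
// "eng" is the Tesseract language code for English.
const request = buildOcrRequest(1, "aGVsbG8=", "eng");
console.log(JSON.stringify(request, null, 2));
```

Rejecting malformed Base64 on the client side keeps obviously bad payloads from reaching the server at all.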
How to install
Prerequisites:
- Node.js 18+ and npm installed
- Access to a terminal/command prompt
- Optional: Docker for containerized deployment
Local development and deployment steps:
- Clone the repository:
    git clone https://github.com/ymeng98/ocr-captcha-mcp-server.git
    cd ocr-captcha-mcp-server
- Install dependencies:
    npm install
- Build the project (if applicable):
    npm run build
- Run in development mode:
    npm run dev
- Run tests (optional):
    npm test
- Docker deployment (optional):
    docker build -t ocr-captcha-mcp .
    docker run -p 8080:8080 ocr-captcha-mcp
Note: If you’re deploying via Smithery, follow Smithery-specific deployment steps and configure the deployment URL in the MCP config (see the Smithery-related example in this repository).
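For reference, the `claude mcp add` command near the top of this page can be expressed as a client config entry. The sketch below generates one, assuming a client that reads the common `mcpServers` JSON layout; the helper name `smitheryEntry` is hypothetical, and `your-deployment-url` remains a placeholder for your own deployment.

```typescript
// One stdio server entry: the command to launch plus its arguments.
interface StdioServerEntry {
  command: string;
  args: string[];
}

// Mirrors `npx -y @smithery/cli run <deployment-url>` from the install command.
// NOTE: smitheryEntry is a hypothetical helper, not part of this project.
function smitheryEntry(deploymentUrl: string): StdioServerEntry {
  return { command: "npx", args: ["-y", "@smithery/cli", "run", deploymentUrl] };
}

// ASSUMPTION: the client reads servers from an "mcpServers" object keyed by name.
const config = {
  mcpServers: {
    "ymeng98-ocr-captcha-mcp-server": smitheryEntry("your-deployment-url"),
  },
};

console.log(JSON.stringify(config, null, 2));
```

Paste the printed JSON into your client's MCP config file after replacing the placeholder URL.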
Additional notes
Tips and common considerations:
- Use Node.js 18 or later; the project requires it.
- The server runs as a non-root user in Docker for security; do not rely on persistent local storage for user images.
- Monitor memory usage; image processing can be memory-intensive, depending on image size and the language data Tesseract.js loads.
- When using the ocr_recognize tool, provide the image as Base64 data and optionally specify language and whitelist parameters to improve accuracy.
- The slide_captcha_match tool expects the background and piece images as Base64 and uses a threshold to decide whether a candidate position counts as a match; tune the threshold if you encounter false positives.
- If deploying via Smithery, ensure the deployment URL is correctly referenced in the MCP config and that the environment has access to any needed assets or models.
- Validate input parameters strictly to prevent errors and ensure robust error handling in MCP clients.
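To make the threshold tip for slide_captcha_match more concrete, here is a minimal sketch of template matching on tiny grayscale grids. It is not the server's implementation: the scoring rule (1 minus the mean absolute difference, scaled to 0..1) and the names `Gray`, `scoreAt`, and `findPiece` are assumptions chosen for illustration.

```typescript
type Gray = number[][]; // rows of 0..255 intensities

interface Match {
  x: number;
  y: number;
  score: number;
}

// Similarity in [0, 1] between the piece and the background window at (x, y).
// 1 means every pixel is identical.
function scoreAt(bg: Gray, piece: Gray, x: number, y: number): number {
  let diff = 0;
  for (let r = 0; r < piece.length; r++) {
    for (let c = 0; c < piece[0].length; c++) {
      diff += Math.abs(bg[y + r][x + c] - piece[r][c]);
    }
  }
  return 1 - diff / (255 * piece.length * piece[0].length);
}

// Slides the piece over every position and returns the best one,
// or null when even the best score falls below the threshold.
function findPiece(bg: Gray, piece: Gray, threshold: number): Match | null {
  let best: Match | null = null;
  for (let y = 0; y + piece.length <= bg.length; y++) {
    for (let x = 0; x + piece[0].length <= bg[0].length; x++) {
      const score = scoreAt(bg, piece, x, y);
      if (best === null || score > best.score) {
        best = { x, y, score };
      }
    }
  }
  return best !== null && best.score >= threshold ? best : null;
}

const background: Gray = [
  [10, 10, 10, 10, 10],
  [10, 10, 200, 220, 10],
  [10, 10, 210, 230, 10],
];
const piece: Gray = [
  [200, 220],
  [210, 230],
];

console.log(findPiece(background, piece, 0.9)); // exact match at x=2, y=1
```

With a similarity score like this one, raising the threshold rejects weaker candidates, which trades false positives for more "no match" results. A tool that uses a distance-style score would behave the opposite way.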
Related MCP Servers
zen
Self-hosted notes app. Single Go binary, notes stored as Markdown within SQLite, full-text search, very low resource usage.
MCP -Deepseek_R1
A Model Context Protocol (MCP) server implementation connecting Claude Desktop with DeepSeek's language models (R1/V3)
mcp-fhir
A Model Context Protocol implementation for FHIR
mcp
Inkdrop Model Context Protocol Server
mcp-appium-gestures
This is a Model Context Protocol (MCP) server providing resources and tools for Appium mobile gestures using the Actions API.
dubco -npm
The (Unofficial) dubco-mcp-server enables AI assistants to manage Dub.co short links via the Model Context Protocol. It provides three MCP tools: create_link for generating new short URLs, update_link for modifying existing links, and delete_link for removing short links.