OmniMCP
OmniMCP uses Microsoft OmniParser and Model Context Protocol (MCP) to provide AI models with rich UI context and powerful interaction capabilities.
claude mcp add --transport stdio openadaptai-omnimcp python cli.py \
  --env LOG_LEVEL="INFO" \
  --env OMNIPARSER_URL="Optional: URL to OmniParser server" \
  --env PYTHONWARNINGS="ignore" \
  --env ANTHROPIC_API_KEY="YOUR_ANTHROPIC_KEY" \
  --env AWS_ACCESS_KEY_ID="YOUR_ACCESS_KEY" \
  --env AWS_SECRET_ACCESS_KEY="YOUR_SECRET_KEY"
How to use
OmniMCP provides a robust MCP-enabled interface that connects a large language model (LLM) with a visual UI understanding loop. It captures the current screen, parses UI elements with OmniParser, plans actions using an LLM, and then executes those actions via mouse/keyboard controls. The CLI entry point runs a perceive-plan-act loop, enabling tasks such as performing calculator operations or interacting with synthetic UI scenarios. You can run a default goal (e.g., a calculator-like task) or supply a custom goal to steer the agent’s behavior. The system also supports an experimental MCP server implementation for advanced integration and experimentation. Using OmniMCP, you can observe how perception, planning, and action execution come together to autonomously accomplish UI-oriented goals.
How to install
Prerequisites:
- Python 3.10 to 3.12
- An active graphical desktop session for UI interaction (X11 or Wayland on Linux)
- pynput and other system dependencies (handled by install.sh)
- Optional: AWS credentials if you enable OmniParser deployment features
Installation steps:
- Clone the repository:
  git clone https://github.com/OpenAdaptAI/OmniMCP.git
  cd OmniMCP
- Run the installer to create a virtual environment and install dependencies:
  ./install.sh
- Copy the example environment configuration:
  cp .env.example .env
  Then edit .env with your keys (AWS, ANTHROPIC_API_KEY, OMNIPARSER_URL, etc.)
- Activate the virtual environment:
  Linux/macOS: source .venv/bin/activate
  Windows (PowerShell): .\.venv\Scripts\Activate.ps1
- Run the OmniMCP CLI to execute tasks:
  python cli.py
- Optional: if you plan to use AWS deployment features, ensure AWS credentials are configured in .env; you can then use the auto-deploy capabilities described in the docs.
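A filled-in .env might look like the following. The variable names are taken from the MCP registration command above; all values are placeholders you must replace with your own, and the OMNIPARSER_URL value is a hypothetical example (omit it to let OmniMCP handle the OmniParser server itself):

```shell
# Logging and warnings
LOG_LEVEL="INFO"
PYTHONWARNINGS="ignore"
# LLM planning (Anthropic)
ANTHROPIC_API_KEY="YOUR_ANTHROPIC_KEY"
# Optional: point at an already-running OmniParser server (placeholder URL)
OMNIPARSER_URL="http://localhost:8000"
# Optional: AWS credentials for OmniParser auto-deployment
AWS_ACCESS_KEY_ID="YOUR_ACCESS_KEY"
AWS_SECRET_ACCESS_KEY="YOUR_SECRET_KEY"
```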
Additional notes
- The project includes an experimental MCP server implementation (OmniMCP) that sits alongside the standard CLI/AgentExecutor workflow. It is intended for experimentation and higher-level API usage.
- Debug output is saved under runs/<timestamp>/ with per-step visuals and logs stored under logs/ for easier troubleshooting.
- Ensure a functioning graphical session is available for UI perception and input control (pynput relies on system libraries such as libx11-dev on Linux).
- When using AWS-driven auto deployment, be mindful of potential costs and clean up resources with the provided stop command: python -m omnimcp.omniparser.server stop.
- If you modify or extend the UI perception or action space, consider updating the environment and installation steps to reflect new dependencies and any required permissions.