
OmniMCP

OmniMCP uses Microsoft OmniParser and Model Context Protocol (MCP) to provide AI models with rich UI context and powerful interaction capabilities.

Installation
Run this command in your terminal to add the MCP server to Claude Code.
claude mcp add --transport stdio \
  --env LOG_LEVEL="INFO" \
  --env OMNIPARSER_URL="Optional: URL to OmniParser server" \
  --env PYTHONWARNINGS="ignore" \
  --env ANTHROPIC_API_KEY="YOUR_ANTHROPIC_KEY" \
  --env AWS_ACCESS_KEY_ID="YOUR_ACCESS_KEY" \
  --env AWS_SECRET_ACCESS_KEY="YOUR_SECRET_KEY" \
  openadaptai-omnimcp -- python cli.py

How to use

OmniMCP connects a large language model (LLM) to a visual UI understanding loop through MCP. It captures the current screen, parses UI elements with OmniParser, plans the next action with an LLM, and then executes that action via mouse and keyboard controls. The CLI entry point runs this perceive-plan-act loop on tasks such as calculator operations or synthetic UI scenarios. You can run the default goal (a calculator-style task) or supply a custom goal to steer the agent's behavior. The system also includes an experimental MCP server implementation for advanced integration and experimentation.
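The perceive-plan-act loop described above can be sketched roughly as follows. This is a minimal illustration only, not OmniMCP's actual code: the names UIElement, Action, and run_loop are hypothetical, and a fake screen and a scripted planner stand in for OmniParser and the LLM.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class UIElement:
    """A parsed on-screen element (hypothetical shape, not OmniMCP's own type)."""
    label: str
    x: int
    y: int


@dataclass
class Action:
    """A planned input action; kind is 'click' or 'done' in this sketch."""
    kind: str
    target: Optional[UIElement] = None


def run_loop(
    goal: str,
    perceive: Callable[[], List[UIElement]],
    plan: Callable[[str, List[UIElement]], Action],
    act: Callable[[Action], None],
    max_steps: int = 10,
) -> int:
    """Run perceive -> plan -> act until the planner reports 'done'.

    Returns the number of actions executed.
    """
    executed = 0
    for _ in range(max_steps):
        elements = perceive()          # capture and parse the screen
        action = plan(goal, elements)  # ask the planner for the next step
        if action.kind == "done":
            break
        act(action)                    # drive mouse/keyboard
        executed += 1
    return executed


# Stub wiring (assumption): a fixed element list replaces screen parsing,
# and a scripted sequence replaces the LLM planner.
screen = [UIElement("2", 10, 10), UIElement("+", 20, 10), UIElement("=", 30, 10)]
script = ["2", "+", "2", "="]
clicked: List[str] = []


def fake_perceive() -> List[UIElement]:
    return list(screen)


def fake_plan(goal: str, elements: List[UIElement]) -> Action:
    if len(clicked) == len(script):
        return Action("done")
    wanted = script[len(clicked)]
    target = next(e for e in elements if e.label == wanted)
    return Action("click", target)


def fake_act(action: Action) -> None:
    # A real executor would move the mouse; here we only record the click.
    clicked.append(action.target.label)


steps = run_loop("compute 2 + 2", fake_perceive, fake_plan, fake_act)
```

Passing the three stages in as callables keeps the loop itself independent of any particular parser, model, or input backend, which is convenient for testing without a graphical session.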

How to install

Prerequisites:

  • Python 3.10 to 3.12
  • A Linux desktop session with graphical support (X11/Wayland) for UI interaction
  • pynput and other system dependencies (handled by install.sh)
  • Optional: AWS credentials if you enable OmniParser deployment features
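Before running the installer, it can help to confirm that the interpreter falls inside the supported range. The following is a small sketch of such a check; the function name is_supported is hypothetical, and the bounds are taken directly from the prerequisite list above.

```python
import sys
from typing import Tuple

# Supported range from the prerequisites: Python 3.10 to 3.12 inclusive.
MIN_VERSION = (3, 10)
MAX_VERSION = (3, 12)


def is_supported(version: Tuple[int, int]) -> bool:
    """Return True if (major, minor) lies within the supported range."""
    return MIN_VERSION <= version <= MAX_VERSION


if __name__ == "__main__":
    current = sys.version_info[:2]
    status = "supported" if is_supported(current) else "outside the supported range"
    print(f"Python {current[0]}.{current[1]} is {status}")
```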

Installation steps:

  1. Clone the repository and change into it: git clone https://github.com/OpenAdaptAI/OmniMCP.git && cd OmniMCP

  2. Run the installer to create a virtual environment and install dependencies: ./install.sh

  3. Copy example environment configuration and edit keys: cp .env.example .env

    Edit .env with your keys (AWS, ANTHROPIC_API_KEY, OMNIPARSER_URL, etc.)

  4. Activate the virtual environment:

    Linux/macOS

    source .venv/bin/activate

    Windows (PowerShell)

    .\.venv\Scripts\Activate.ps1

  5. Run the OmniMCP CLI to execute tasks: python cli.py

  6. Optional: To use the AWS deployment features, make sure AWS credentials are configured in .env. The auto-deploy capabilities are described in the project docs.
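After editing .env in step 3, a quick check for missing keys can catch configuration problems before the first run. The sketch below uses only the standard library and a simplified KEY=VALUE parser; the helper names parse_env and missing_keys are hypothetical, and which keys are treated as required is an assumption made for illustration (the project may load its configuration differently).

```python
from typing import Dict, Iterable, List


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes, if any.
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def missing_keys(env: Dict[str, str], required: Iterable[str]) -> List[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not env.get(key)]


# Example input with placeholder values only.
sample = """
# example values only
ANTHROPIC_API_KEY="YOUR_ANTHROPIC_KEY"
OMNIPARSER_URL=
"""

env = parse_env(sample)
# Assumption: treat these two keys as required for this illustration.
missing = missing_keys(env, ["ANTHROPIC_API_KEY", "AWS_ACCESS_KEY_ID"])
```

To check a real file, read it with pathlib.Path(".env").read_text() and pass the result to parse_env.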

Additional notes

Notes and tips:

  • The project includes an experimental MCP server implementation (OmniMCP) that sits alongside the standard CLI/AgentExecutor workflow. It is intended for experimentation and higher-level API usage.
  • Debug output is saved under runs/<timestamp>/ with per-step visuals and logs stored under logs/ for easier troubleshooting.
  • Ensure a functioning graphical session is available for UI perception and input control (pynput relies on system libraries such as libx11-dev on Linux).
  • When using AWS-driven auto deployment, be mindful of potential costs and clean up resources with the provided stop command: python -m omnimcp.omniparser.server stop.
  • If you modify or extend the UI perception or action space, consider updating the environment and installation steps to reflect new dependencies and any required permissions.
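When troubleshooting, the most recent debug directory under runs/ is usually the one of interest. The sketch below locates it, assuming the timestamp directory names sort chronologically when sorted as strings (the exact timestamp format is not stated here); latest_run is a hypothetical helper, not part of OmniMCP.

```python
from pathlib import Path
from typing import Optional


def latest_run(runs_root: Path) -> Optional[Path]:
    """Return the last run directory by name, or None if there is none.

    Assumes directory names are timestamps that sort chronologically.
    """
    if not runs_root.is_dir():
        return None
    candidates = sorted(p for p in runs_root.iterdir() if p.is_dir())
    return candidates[-1] if candidates else None


if __name__ == "__main__":
    run_dir = latest_run(Path("runs"))
    print(run_dir if run_dir is not None else "no runs found")
```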
