bytebot
Bytebot is a self-hosted AI desktop agent that automates computer tasks through natural language commands, operating within a containerized Linux desktop environment.
claude mcp add --transport stdio bytebot-ai-bytebot docker compose -f docker/docker-compose.yml up -d \
  --env GEMINI_API_KEY="your-google-gemini-api-key" \
  --env OPENAI_API_KEY="your-openai-api-key" \
  --env ANTHROPIC_API_KEY="your-anthropic-api-key"
How to use
Bytebot ships with its own virtual desktop and an integrated task management stack. It exposes a REST API and a web UI for creating and monitoring tasks, and provides programmatic controls for the desktop itself, such as taking screenshots or performing automated UI actions. It is designed to work with multiple AI backends (Anthropic, OpenAI, Gemini) and can authenticate to websites and services via installed password managers.

To use Bytebot, deploy the Docker setup described under "How to install", add your AI provider keys to the docker/.env file, and then open the UI or call the APIs that Bytebot exposes. You can create tasks in natural language, upload documents for processing, and watch Bytebot perform multi-step workflows across desktop applications and web portals.

For automation, submit tasks via the API (POST /tasks) and issue desktop actions (e.g., screenshot or click_mouse) through the computer-use endpoint to drive interactions on the virtual desktop.
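As a rough illustration of the API usage described above, the following Python sketch builds both kinds of request with the standard library only. Several details are assumptions that this page does not state: which of the API ports serves which endpoint (9991 for tasks and 9990 for desktop actions are guesses), the `description` field name, and the `action` plus extra-parameters payload shape. The helper names are hypothetical. Check the Bytebot API documentation before relying on any of them.

```python
import json
import urllib.request

# Assumed base URLs: the notes on this page list 9990/9991 as API ports but do
# not say which endpoint lives on which port. Adjust to match your deployment.
TASKS_BASE_URL = "http://localhost:9991"
DESKTOP_BASE_URL = "http://localhost:9990"


def build_json_post(url: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_task_request(description: str) -> urllib.request.Request:
    # "description" is an assumed field name for the natural-language task.
    return build_json_post(f"{TASKS_BASE_URL}/tasks", {"description": description})


def build_desktop_action_request(action: str, **params) -> urllib.request.Request:
    # An "action" name plus extra parameters is an assumed payload shape.
    payload = {"action": action, **params}
    return build_json_post(f"{DESKTOP_BASE_URL}/computer-use", payload)


def send(request: urllib.request.Request, timeout: float = 30.0) -> bytes:
    """Send a prepared request and return the raw response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

With Bytebot running, `send(build_task_request("Summarize the uploaded report"))` would submit a task, and `send(build_desktop_action_request("screenshot"))` would request a screenshot. Building the request separately from sending it makes it easy to inspect the URL and payload first.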
How to install
Prerequisites:
- Docker and Docker Compose installed on your machine
- Git installed
- Optional: API keys for Anthropic, OpenAI, or Google Gemini if you plan to enable external AI backends
Installation steps:
- Clone the repository and enter it:
  git clone https://github.com/bytebot-ai/bytebot.git
  cd bytebot
- Configure AI provider keys:
  - Create a file docker/.env and add one or more of the following keys, depending on your provider:
    ANTHROPIC_API_KEY=sk-...
    OPENAI_API_KEY=sk-...
    GEMINI_API_KEY=...
  - Example: echo "OPENAI_API_KEY=sk-..." > docker/.env (note that > overwrites an existing file; use >> to append a second key)
- Start Bytebot with Docker Compose: docker-compose -f docker/docker-compose.yml up -d
- Open the Bytebot UI (desktop view) in your browser: http://localhost:9992
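The key-configuration step can also be scripted. The sketch below renders the contents of docker/.env in one pass from whichever provider keys you supply, so that adding a second provider does not clobber the first. Only the three variable names come from the steps above; the helper names `render_env_file` and `write_env_file` and the sample values are hypothetical.

```python
from pathlib import Path

# Variable names taken from the installation steps above.
PROVIDER_KEYS = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY", "GEMINI_API_KEY")


def render_env_file(keys: dict) -> str:
    """Return .env text with one KEY=value line per supplied provider key."""
    unknown = set(keys) - set(PROVIDER_KEYS)
    if unknown:
        raise ValueError(f"unexpected variable(s): {sorted(unknown)}")
    # Emit keys in a fixed order and skip providers with no value.
    return "".join(
        f"{name}={keys[name]}\n" for name in PROVIDER_KEYS if keys.get(name)
    )


def write_env_file(keys: dict, path: str = "docker/.env") -> Path:
    """Write the rendered text to path (run from the repository root)."""
    target = Path(path)
    target.write_text(render_env_file(keys), encoding="utf-8")
    return target
```

For example, `write_env_file({"OPENAI_API_KEY": "sk-...", "GEMINI_API_KEY": "..."})` would replace docker/.env with exactly those two lines. Rejecting unknown names is a design choice that catches typos such as `OPENAI_KEY` before the containers start.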
Notes:
- You can stop Bytebot with: docker-compose -f docker/docker-compose.yml down
- The docker-compose.yml in the docker directory wires up the virtual desktop, agent, and UI components and expects the environment keys from docker/.env.
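If you prefer to drive the start and stop commands from a script, a thin wrapper such as the following may help. It is a sketch that assumes the `docker-compose` binary is on your PATH and that you run it from the repository root; the function names are illustrative and are not part of Bytebot.

```python
import subprocess

COMPOSE_FILE = "docker/docker-compose.yml"  # path used in the steps above


def compose_command(*args: str) -> list:
    """Build the docker-compose argument list for the Bytebot compose file."""
    return ["docker-compose", "-f", COMPOSE_FILE, *args]


def start() -> None:
    # Equivalent to: docker-compose -f docker/docker-compose.yml up -d
    subprocess.run(compose_command("up", "-d"), check=True)


def stop() -> None:
    # Equivalent to: docker-compose -f docker/docker-compose.yml down
    subprocess.run(compose_command("down"), check=True)


def restart() -> None:
    """Stop and start the containers, e.g. after editing docker/.env."""
    stop()
    start()
```

Passing `check=True` makes a failed compose command raise `subprocess.CalledProcessError` instead of failing silently.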
Additional notes
Tips and common issues:
- Ensure Docker has enough resources allocated (CPU, RAM) for a smooth virtual desktop experience.
- If you change provider keys, restart the containers to pick up new credentials.
- The API endpoints include:
  - POST /tasks to create tasks (with optional file uploads)
  - POST /computer-use to perform desktop actions like taking a screenshot or simulating mouse clicks
- If you encounter port conflicts, verify that ports 9992 (UI) and 9990/9991 (API) are free or adjust docker-compose.yml accordingly.
- For persistent environments, Bytebot installs programs inside its virtual desktop so they remain available across tasks.
- Refer to the Full deployment guide for alternative deployment options (Railway, etc.).
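To check for the port conflicts mentioned above before starting the containers, a small sketch like the following tries to connect to each port on localhost and reports those that already have a listener. The port numbers come from the note above; the function names are illustrative.

```python
import socket

# Ports named in the notes above: 9992 (UI) and 9990/9991 (API).
BYTEBOT_PORTS = (9990, 9991, 9992)


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


def find_conflicts(ports=BYTEBOT_PORTS) -> list:
    """Return the subset of ports that already have a listener."""
    return [port for port in ports if port_in_use(port)]
```

Calling `find_conflicts()` before `docker-compose ... up -d` returns an empty list when all three ports are free. Once Bytebot is running, the same call is expected to report its ports as in use.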
Related MCP Servers
cursor-talk-to-figma
TalkToFigma: MCP integration between AI Agent (Cursor, Claude Code) and Figma, allowing Agentic AI to communicate with Figma for reading designs and modifying them programmatically.
DeepMCPAgent
Model-agnostic plug-n-play LangChain/LangGraph agents powered entirely by MCP tools over HTTP/SSE.
argo
ARGO is an open-source AI Agent platform that brings Local Manus to your desktop. With one-click model downloads, seamless closed LLM integration, and offline-first RAG knowledge bases, ARGO becomes a DeepResearch powerhouse for autonomous thinking and task planning, while 100% of your data stays local. Supports Win/Mac/Docker.
mcp
This MCP server provides documentation about Strands Agents to your GenAI tools, so you can use your favorite AI coding assistant to vibe-code Strands Agents.
memov
Give git-like, traceable memory to OpenClaw and any coding agent. By https://memov.ai/, aka Entire CLI for every coding agent, via MCP.
mcpcat-typescript-sdk
MCPcat is an analytics platform for MCP server owners 🐱.