byte-vision
A Model Context Protocol (MCP) server written in Go that provides text completion capabilities using local LLama.cpp models. The server exposes a single MCP tool that accepts text prompts and returns AI-generated completions from locally hosted language models.
To register the server with Claude Code over stdio:
claude mcp add --transport stdio kbrisso-byte-vision-mcp ./byte-vision-mcp
How to use
Byte Vision MCP is a local MCP server that exposes a single tool for text generation using locally hosted LLama.cpp models. It acts as a bridge between MCP-compatible clients and your private language models, letting you generate AI completions without sending data to external services. The server runs locally and can leverage GPU acceleration via CUDA, ROCm, or Metal, with prompt caching and extensive logging to aid debugging and performance tuning.

The available MCP tool is generate_completion, which accepts both basic prompts and advanced generation parameters that fine-tune how the model generates text. To use it, point your MCP client at the server endpoint (default http://localhost:8080/mcp-completion) and send a JSON payload containing at least a prompt. You can also supply advanced generation controls such as temperature, top_k, top_p, predict (tokens to generate), ctx_size, and GPU-related options to optimize for your hardware and use case.

The server reads its configuration from an environment file (byte-vision-cfg.env), where you can adjust the paths to your LLama.cpp binary, your GGUF model file, and logging options. This setup emphasizes privacy by keeping all model inference on your machine.
How to install
Prerequisites:
- Go 1.23+ installed on your system
- LLama.cpp binaries installed (see llamacpp/ for guidance)
- GGUF-format models downloaded and placed in your models/ directory
Step-by-step installation:
- Clone the repository
git clone <repository-url>
cd byte-vision-mcp
- Install Go dependencies and build the server
go mod tidy
go build -o byte-vision-mcp
- Set up LLama.cpp and models
- Follow the LLama.cpp installation guide in llamacpp/README.md to install binaries suitable for your OS and GPU
- Place your GGUF model file(s) in the models/ directory as described in models/README.md
- Configure the server
- Copy the example environment configuration and customize it
cp example-byte-vision-cfg.env byte-vision-cfg.env
- Edit byte-vision-cfg.env to point to your LLama.cpp binary (LLamaCliPath), your GGUF model path (ModelFullPathVal), and log/output locations (AppLogPath, AppLogFileName, etc.)
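For reference, a minimal byte-vision-cfg.env might look like the following. The variable names come from this README; the paths and values are illustrative placeholders, not defaults shipped with the project:

```shell
# Path to the LLama.cpp CLI binary (must be executable)
LLamaCliPath=./llamacpp/llama-cli
# Full path to the GGUF model to load
ModelFullPathVal=./models/your-model.gguf
# Logging locations
AppLogPath=./logs
AppLogFileName=byte-vision.log
```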
- Run the server
./byte-vision-mcp
The server will start and listen on http://localhost:8080/mcp-completion by default.
Additional notes
Tips and common issues:
- Ensure your LLama.cpp binary has executable permissions and is accessible from the configured path in LLamaCliPath.
- Verify that your GGUF model is compatible with your LLama.cpp version and that the file path in ModelFullPathVal is correct.
- If the server fails to start, check AppLogPath for detailed logs. Enable verbose logging in the environment config if needed.
- GPU acceleration requires proper drivers and runtime (CUDA for Nvidia, ROCm for AMD, or Metal for Apple silicon). Ensure your hardware is supported and the correct flags are set (gpu_layers, ctx_size, etc.) to optimize performance.
- The MCP endpoint is /mcp-completion; ensure your MCP client is configured to target this path.
- Prompt caching can improve performance for repeated prompts; verify PromptCache settings in the environment config if you rely on repeated usage.