byte-vision
A Model Context Protocol (MCP) server written in Go that provides text completion capabilities using local LLama.cpp models. The server exposes a single MCP tool that accepts text prompts and returns AI-generated completions from locally hosted language models.
To register the server with Claude Code over stdio:
claude mcp add --transport stdio kbrisso-byte-vision-mcp ./byte-vision-mcp
How to use
Byte Vision MCP is a local MCP server that exposes a single tool for text generation using locally hosted LLama.cpp models. It acts as a bridge between MCP-compatible clients and your private language models, letting you generate AI completions without sending data to external services. The server runs locally and can leverage GPU acceleration via CUDA, ROCm, or Metal, with prompt caching and extensive logging to aid debugging and performance tuning.

The available MCP tool is generate_completion, which accepts both basic prompts and advanced generation parameters that fine-tune how the model generates text. To use it, point your MCP client at the server endpoint (default http://localhost:8080/mcp-completion) and send a JSON payload containing at least a prompt. You can also supply advanced generation controls such as temperature, top_k, top_p, predict (tokens to generate), ctx_size, and GPU-related options to optimize for your hardware and use case.

The server reads its configuration from an environment file (byte-vision-cfg.env), where you can adjust the paths to your LLama.cpp binary, your GGUF model file, and logging options. This setup emphasizes privacy by keeping all model inference on your machine.
How to install
Prerequisites:
- Go 1.23+ installed on your system
- LLama.cpp binaries installed (see llamacpp/ for guidance)
- GGUF-format models downloaded and placed in your models/ directory
Step-by-step installation:
- Clone the repository
git clone <repository-url>
cd byte-vision-mcp
- Install Go dependencies and build the server
go mod tidy
go build -o byte-vision-mcp
- Set up LLama.cpp and models
- Follow the LLama.cpp installation guide in llamacpp/README.md to install binaries suitable for your OS and GPU
- Place your GGUF model file(s) in the models/ directory as described in models/README.md
- Configure the server
- Copy the example environment configuration and customize it
cp example-byte-vision-cfg.env byte-vision-cfg.env
- Edit byte-vision-cfg.env to point to your LLama.cpp binary (LLamaCliPath), your GGUF model path (ModelFullPathVal), and log/output locations (AppLogPath, AppLogFileName, etc.)
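For reference, a minimal byte-vision-cfg.env might look like the following. The variable names come from this README; the paths and values are illustrative placeholders, not defaults shipped with the project:

```shell
# Path to the LLama.cpp CLI binary (must be executable)
LLamaCliPath=./llamacpp/llama-cli
# Full path to the GGUF model to load
ModelFullPathVal=./models/your-model.gguf
# Logging locations
AppLogPath=./logs
AppLogFileName=byte-vision.log
```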
- Run the server
./byte-vision-mcp
The server will start and listen on http://localhost:8080/mcp-completion by default.
Additional notes
Tips and common issues:
- Ensure your LLama.cpp binary has executable permissions and is accessible from the configured path in LLamaCliPath.
- Verify that your GGUF model is compatible with your LLama.cpp version and that the file path in ModelFullPathVal is correct.
- If the server fails to start, check AppLogPath for detailed logs. Enable verbose logging in the environment config if needed.
- GPU acceleration requires proper drivers and runtime (CUDA for Nvidia, ROCm for AMD, or Metal for Apple silicon). Ensure your hardware is supported and the correct flags are set (gpu_layers, ctx_size, etc.) to optimize performance.
- The MCP endpoint is /mcp-completion; ensure your MCP client is configured to target this path.
- Prompt caching can improve performance for repeated prompts; verify PromptCache settings in the environment config if you rely on repeated usage.