FastAPI-BitNet

Running Microsoft's BitNet inference framework via FastAPI, Uvicorn and Docker.

Installation
Run this command in your terminal to add the MCP server to Claude Code:

claude mcp add --transport stdio grctest-fastapi-bitnet docker run -d --name ai_container -p 8080:8080 fastapi_bitnet

How to use

FastAPI-BitNet provides a REST API for managing and interacting with llama.cpp-based BitNet model instances. The server exposes endpoints to start, stop, and monitor multiple persistent BitNet sessions, run batch operations, and send interactive prompts to running models. It also supports model benchmarking, resource-capacity estimation, and integration with VS Code Copilot via the MCP protocol.

Once the server is up, you can explore and test the API through the auto-generated docs at /docs (Swagger UI) and /redoc (ReDoc). The API is designed to let you programmatically launch and control BitNet instances, issue prompts, collect cleaned responses, and orchestrate multiple sessions in parallel for benchmarking or automated testing workflows.
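As a rough illustration of programmatic use, here is a minimal Python sketch using only the standard library. The /prompt route and the session_id/prompt payload fields are assumptions for illustration, not confirmed routes; check the Swagger UI at /docs for the actual endpoints and schemas.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # adjust if you mapped a different host port


def build_prompt_request(session_id: str, prompt: str) -> urllib.request.Request:
    """Build a JSON POST request for a hypothetical /prompt endpoint.

    The path and payload fields here are placeholders; consult /docs
    for the real route names and request schemas.
    """
    body = json.dumps({"session_id": session_id, "prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

With a running server you would pass the built request to send() and inspect the returned dictionary; looping over several session IDs is one way to drive parallel sessions for benchmarking.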

How to install

Prerequisites:

  • Docker Desktop installed and running
  • Optional Python environment for local development (Python 3.10+)
  • Git to clone the repository (if you are starting from source)

Option A: Run with Docker (recommended)

  1. Build the Docker image (if you have a Dockerfile in the repo):
docker build -t fastapi_bitnet .
  2. Run the container (maps port 8080 from the container to the host):
docker run -d --name ai_container -p 8080:8080 fastapi_bitnet
  3. Verify the server is running by visiting http://localhost:8080/docs or http://localhost:8080/redoc
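If you prefer to verify from a script rather than the browser, a small Python check like the following works with only the standard library; it assumes the default port mapping shown above.

```python
import urllib.error
import urllib.request


def docs_url(host: str = "localhost", port: int = 8080) -> str:
    """URL of the auto-generated Swagger UI for a given host/port mapping."""
    return f"http://{host}:{port}/docs"


def server_is_up(host: str = "localhost", port: int = 8080, timeout: float = 3.0) -> bool:
    """Return True if the server answers on /docs, False otherwise."""
    try:
        with urllib.request.urlopen(docs_url(host, port), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

If you remapped the host port (for example -p 8081:8080), pass port=8081 to both helpers.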

Option B: Run locally with Uvicorn (for development only)

  1. Create and activate a Python environment (optional but recommended):
conda create -n bitnet python=3.11
conda activate bitnet
  2. Install dependencies (adjust to your project setup):
pip install fastapi uvicorn pydantic
  3. Run the app directly (adjust the module path as needed):
uvicorn app.main:app --host 0.0.0.0 --port 8080 --reload
  4. Access the API docs at http://127.0.0.1:8080/docs

Note: The repository may provide a Dockerfile or a Python package entrypoint. If your setup uses a different entrypoint or startup script, adapt the commands accordingly.

Additional notes

Tips and caveats:

  • The API exposes multiple endpoints to manage BitNet sessions (start, stop, status) and to perform batch operations. Use the API docs to discover exact routes and payload schemas.
  • When running in Docker, ensure that the container has access to the model files and any required GPU or CPU resources as configured in your host environment.
  • If you modify the model directory or need to point to a specific BitNet model, adjust the server configuration or environment variables as documented in the repository.
  • Typical environment variables you might encounter include PORT, MODEL_PATH, and paths to any CLI tools required by llama.cpp/llama-cli. If they are not required, you can omit them or set placeholders until you configure the actual paths.
  • For VS Code Copilot integration, ensure the MCP endpoint is reachable at http://<host>:8080/mcp and that the server is exposing the appropriate API surface.
  • If you run into port conflicts, change the host port mapping (e.g., -p 8081:8080) and access the UI via http://localhost:8081/docs.
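One way to handle the environment variables mentioned above is to read them once at startup with sensible fallbacks. The variable names PORT and MODEL_PATH come from the notes above; the default values in this sketch are placeholders, not values shipped by the project.

```python
import os


def load_config() -> dict:
    """Read server configuration from the environment.

    PORT and MODEL_PATH are the variable names mentioned in the notes;
    the defaults below are illustrative placeholders only.
    """
    return {
        "port": int(os.environ.get("PORT", "8080")),
        "model_path": os.environ.get("MODEL_PATH", "models/bitnet.gguf"),  # placeholder path
    }
```

When running in Docker, the same variables can be supplied with -e flags, e.g. docker run -e PORT=8080 -e MODEL_PATH=/models/... (substituting your real model path).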
