token-compressor

Installation
Run this command in your terminal to add the MCP server to Claude Code.
claude mcp add --transport stdio base76-research-lab-token-compressor uvx token-compressor-mcp

How to use

This MCP server exposes a prompt-compression tool that reduces the length of prompts sent to an LLM while preserving meaning. The core tool, compress_prompt, takes input text and returns a compressed version alongside a stats footer. Clients in your workflow can call this MCP endpoint to automatically abbreviate prompts before they reach the LLM, cutting token usage and costs. The server works with Claude Code and other MCP-compatible clients, so you can wire it into existing prompt pipelines or Claude Code hooks to compress user prompts before submission.

To use it, install the MCP server (see installation steps) and run it so that MCP clients can reach it. From there, invoke the compress_prompt tool through the MCP client (the server itself is registered under the name token-compressor-mcp), passing the input text as the required parameter. The result contains compressed_text and a footer with the mode, cosine similarity, and token counts. Integrate this into your workflow so that the compressed output is what actually reaches the LLM, falling back to the original text if validation fails or the pipeline decides not to compress.
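The compress-or-fall-back behavior described above can be sketched as a small wrapper. The compress callable and the validation policy (requiring a minimum size reduction) are illustrative assumptions, not part of the token-compressor-mcp API; in practice the callable would invoke the compress_prompt tool through your MCP client.

```python
# Sketch of the fallback logic: send the compressed text onward only when
# compression succeeds and passes a basic validation check, otherwise fall
# back to the original prompt. The thresholds here are assumed, not part of
# the token-compressor-mcp API.
from typing import Callable, Optional


def compress_with_fallback(
    prompt: str,
    compress: Callable[[str], Optional[str]],
    min_savings: float = 0.1,  # require at least 10% shorter (assumed policy)
) -> str:
    """Return the compressed prompt, or the original if compression fails."""
    try:
        compressed = compress(prompt)
    except Exception:
        return prompt  # any transport/tool error -> use the original text
    if not compressed:
        return prompt
    # Simple validation: only accept output that is meaningfully shorter.
    if len(compressed) > len(prompt) * (1.0 - min_savings):
        return prompt
    return compressed


# Usage with a stand-in compressor (a real client would call compress_prompt):
shorten = lambda text: " ".join(text.split()[:5])
print(compress_with_fallback("one two three four five six seven eight", shorten))
```

The wrapper never raises: any error in the compression path silently degrades to sending the original prompt, which matches the fail-open behavior you generally want in a prompt pipeline.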

How to install

Prerequisites:

  • Python 3.10+
  • Access to install Python packages (pip)
  • Optional: a local Ollama setup if you want to replicate the full pipeline (not required for the MCP server itself)

Install the MCP server package:

pip install token-compressor-mcp

Run the MCP server via UVX (as shown in the README):

uvx token-compressor-mcp

Alternatively, to run directly from source with Python, install the dependencies and run the module (shown for reference; the README's UVX approach is recommended):

pip install -r requirements.txt
python3 -m token_compressor_mcp

Test the integration by invoking the tool from an MCP client configured to call the token-compressor-mcp server, passing a sample prompt and verifying that compressed output is returned.
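Such a test could be scripted with the official Python MCP SDK (the `mcp` package), which spawns the server over stdio and calls the tool. The tool's parameter name ("text") is an assumption here; inspect the server's tool schema via session.list_tools() for the actual name.

```python
# Sketch of an integration test using the Python MCP SDK ("mcp" package).
# The "text" parameter name is assumed; check the tool schema for the real one.
import asyncio


async def call_compress(prompt: str) -> str:
    # Imported lazily so this sketch can be loaded without the SDK installed.
    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client

    # Launch the server the same way the README does (uvx token-compressor-mcp).
    params = StdioServerParameters(command="uvx", args=["token-compressor-mcp"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool("compress_prompt", {"text": prompt})
            # Tool results carry a list of content parts; take the first text part.
            return result.content[0].text


if __name__ == "__main__":
    print(asyncio.run(call_compress("Summarize the following document ...")))
```

Running the script should print the compressed text plus the stats footer described above; if it errors instead, confirm that uvx is on your PATH and the package resolves.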

Additional notes

  • The MCP server exposes the compress_prompt tool. Input should be a string of text. The output includes the compressed text plus a stats footer (mode, similarity, token counts).
  • Ensure Python 3.10+ and that token-compressor-mcp is installed in the same environment where you plan to run the MCP server.
  • If using Claude Code integration, you can configure the MCP server in ~/.claude/settings.json under mcpServers with command uvx and arguments ["token-compressor-mcp"].
  • You can adjust the compression behavior via the pipeline options in token-compressor-mcp (e.g., threshold, min_tokens, compress_model, embed_model) if you run the Python module directly.
  • The LLM-based compression step requires Ollama if you replicate the end-to-end setup; ensure Ollama is installed and the required models (e.g., llama3.2:1b) are pulled before running the full pipeline locally.
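The Claude Code configuration mentioned above might look like the following sketch (the server key mirrors the name used in the installation command; verify the exact settings.json schema against the Claude Code documentation):

```json
{
  "mcpServers": {
    "base76-research-lab-token-compressor": {
      "command": "uvx",
      "args": ["token-compressor-mcp"]
    }
  }
}
```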
