Clawd Throttle
npx machina-cli add skill @liekzejaws/clawd-throttle --openclaw
Route every LLM request to the cheapest model that can handle it. Stop paying Opus prices for "hello" and "summarize this."
Supports 8 providers and 25+ models: Anthropic (Claude), Google (Gemini), OpenAI (GPT / o-series), xAI (Grok), DeepSeek, Moonshot (Kimi), Mistral, and Ollama (local).
How It Works
- Your prompt arrives
- The classifier scores it on 8 dimensions (token count, code presence, reasoning markers, simplicity indicators, multi-step patterns, question count, system prompt complexity, conversation depth) in under 1 millisecond
- The router maps the resulting tier (simple / standard / complex) to a model based on your active mode and configured providers
- The request is proxied to the correct API
- The routing decision and cost are logged to a local JSONL file
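The classification step above can be sketched as a fast heuristic scorer. This is an illustrative assumption, not clawd-throttle's actual classifier: the real dimensions and weights are internal, so the regexes and thresholds below are made up to show the shape of sub-millisecond tiering.

```typescript
type Tier = "simple" | "standard" | "complex";

// Hypothetical heuristic scorer covering a few of the dimensions named
// above (length as a token-count proxy, code presence, reasoning markers,
// question count, simplicity indicators). Weights are invented.
function classify(prompt: string): Tier {
  let score = 0;
  if (prompt.length > 400) score += 2;                                  // token count proxy
  if (/```|function |class |def /.test(prompt)) score += 2;             // code presence
  if (/\bstep by step\b|\btherefore\b|\bprove\b/i.test(prompt)) score += 2; // reasoning markers
  score += Math.min((prompt.match(/\?/g) ?? []).length, 2);             // question count, capped
  if (/^(hi|hello|thanks?)\b/i.test(prompt.trim())) score -= 2;         // simplicity indicators
  if (score >= 4) return "complex";
  if (score >= 2) return "standard";
  return "simple";
}
```

Because everything is plain string inspection with no LLM call, a pass like this easily fits the under-1 ms budget.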
Routing Modes
| Mode | Simple | Standard | Complex |
|---|---|---|---|
| eco | Grok 4.1 Fast | Gemini Flash | Haiku |
| standard | Grok 4.1 Fast | Haiku | Sonnet |
| gigachad | Haiku | Sonnet | Opus 4.6 |
Each cell shows the first-choice model. The router tries a preference list and falls through to the next available provider if the first is not configured.
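The fall-through behavior can be sketched as a per-mode preference list. Only the first entry of each list comes from the table above; the fallback entries, names, and function signature are illustrative assumptions, not clawd-throttle's actual internals.

```typescript
type Tier = "simple" | "standard" | "complex";

// First entry per list matches the table's first-choice model; the
// remaining entries are assumed fallbacks for illustration only.
const preferences: Record<string, Record<Tier, string[]>> = {
  eco:      { simple: ["grok-4.1-fast", "gemini-flash"], standard: ["gemini-flash", "haiku"], complex: ["haiku", "sonnet"] },
  standard: { simple: ["grok-4.1-fast", "haiku"],        standard: ["haiku", "sonnet"],       complex: ["sonnet", "opus-4.6"] },
  gigachad: { simple: ["haiku", "sonnet"],               standard: ["sonnet", "opus-4.6"],    complex: ["opus-4.6"] },
};

// Walk the preference list and take the first model whose provider is configured.
function pickModel(mode: string, tier: Tier, configured: Set<string>): string | undefined {
  return preferences[mode]?.[tier]?.find((m) => configured.has(m));
}
```

For example, in eco mode with only a Google key configured, a simple prompt would fall through past Grok to Gemini Flash.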
Available Commands
| Command | What It Does |
|---|---|
| route_request | Send a prompt and get a response from the cheapest capable model |
| classify_prompt | Analyze prompt complexity without making an LLM call |
| get_routing_stats | View cost savings and model distribution stats |
| get_config | View current configuration (keys redacted) |
| set_mode | Change routing mode at runtime |
| get_recent_routing_log | Inspect recent routing decisions |
Overrides
- Heartbeats and summaries always route to the cheapest model
- Type /opus, /sonnet, /haiku, /flash, or /grok-fast to force a specific model
- Sub-agent calls automatically step down one tier from their parent
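A slash override like the ones above can be peeled off the front of the prompt before classification. This is a hedged sketch; the actual parsing rules and return shape in clawd-throttle are assumptions here.

```typescript
// The five override commands listed above. Parsing details are assumed.
const OVERRIDES = ["/opus", "/sonnet", "/haiku", "/flash", "/grok-fast"];

// If the prompt starts with an override, return the forced model name
// (without the slash) and the remaining prompt text.
function parseOverride(prompt: string): { model?: string; rest: string } {
  const hit = OVERRIDES.find((o) => prompt === o || prompt.startsWith(o + " "));
  if (!hit) return { rest: prompt };
  return { model: hit.slice(1), rest: prompt.slice(hit.length).trim() };
}
```

When an override is present, the router would skip classification entirely and send the request straight to the forced model.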
Setup
- Get at least one API key (Anthropic or Google required; others optional):
- Anthropic: https://console.anthropic.com/settings/keys
- Google AI: https://aistudio.google.com/app/apikey
- xAI: https://console.x.ai
- OpenAI: https://platform.openai.com/api-keys
- DeepSeek: https://platform.deepseek.com
- Moonshot: https://platform.moonshot.cn
- Mistral: https://console.mistral.ai
- Run the setup script: npm run setup
- Choose your routing mode (eco / standard / gigachad)
Privacy
- Prompt content is never stored. Only a SHA-256 hash is logged.
- All data stays local in ~/.config/clawd-throttle/
- API keys are stored in your local config file
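The hash-only logging described above can be sketched as follows. The field names (ts, promptSha256, model, costUsd) are assumptions for illustration; clawd-throttle's actual log schema may differ.

```typescript
import { createHash } from "node:crypto";
import { appendFileSync } from "node:fs";

// Append one routing decision to a JSONL log. Only a SHA-256 hash of the
// prompt is recorded, never the prompt content itself.
function logDecision(logPath: string, prompt: string, model: string, costUsd: number) {
  const record = {
    ts: new Date().toISOString(),
    promptSha256: createHash("sha256").update(prompt).digest("hex"), // hash only
    model,
    costUsd,
  };
  appendFileSync(logPath, JSON.stringify(record) + "\n"); // one JSON object per line (JSONL)
  return record;
}
```

The hash still lets you correlate repeated prompts across log entries (identical prompts produce identical hashes) without ever persisting the text.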
Overview
Clawd Throttle routes every LLM request to the cheapest model that can handle it, spanning eight providers and 25+ models. It scores prompts on eight dimensions in under 1ms, supports eco, standard, and gigachad modes, and logs routing decisions for cost tracking.
How This Skill Works
When a prompt arrives, the classifier scores it on eight dimensions (token count, code presence, reasoning markers, simplicity indicators, multi-step patterns, question count, system prompt complexity, conversation depth) in under 1 ms. The router then maps the resulting tier (simple/standard/complex) to a model based on the active mode and configured providers, proxies the request to the chosen API, and logs the routing decision and cost to a local JSONL file.
When to Use It
- You want to minimize spend by routing to the cheapest model that can handle the prompt across eight providers.
- Prompt complexity varies and you want automatic tiering to simple, standard, or complex within eco/standard/gigachad modes.
- You need cost awareness and auditing by logging routing decisions locally.
- You prefer local or private deployments (e.g., Ollama) to avoid sending prompts to external dashboards.
- You want to switch modes on the fly (eco, standard, gigachad) to balance cost and performance.
Quick Start
- Step 1: Install clawd-throttle and gather API keys for at least one provider (Anthropic or Google required).
- Step 2: Run npm run setup and choose your routing mode (eco, standard, or gigachad).
- Step 3: Route prompts with route_request or classify_prompt and review the locally stored logs at ~/.config/clawd-throttle/.
Best Practices
- Define and routinely update the provider/model roster to reflect current pricing.
- Tune the 8-dimension classifier to align with your typical prompts (token count, code, reasoning, etc.).
- Enable local JSONL logging and monitor cost savings with get_routing_stats.
- Test simple vs. complex prompts across modes to validate routing decisions.
- Secure API keys in local config and rotate them periodically.
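Monitoring savings from the local log, as suggested above, amounts to a simple pass over the JSONL records. This is a sketch in the spirit of get_routing_stats, assuming the hypothetical model/costUsd fields; the real record shape may differ.

```typescript
// Aggregate total spend and per-model request counts from JSONL log lines.
function summarize(lines: string[]): { totalCostUsd: number; byModel: Record<string, number> } {
  const byModel: Record<string, number> = {};
  let totalCostUsd = 0;
  for (const line of lines) {
    if (!line.trim()) continue; // skip blank lines
    const rec = JSON.parse(line) as { model: string; costUsd: number };
    totalCostUsd += rec.costUsd;
    byModel[rec.model] = (byModel[rec.model] ?? 0) + 1;
  }
  return { totalCostUsd, byModel };
}
```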
Example Use Cases
- A customer-support chatbot that routes common inquiries to low-cost models while preserving accuracy.
- A research project that tracks cost savings across prompts to optimize prompt design.
- An enterprise chat tool using eco mode for routine queries and gigachad for high-complexity tasks.
- A privacy-focused deployment using Ollama/local models for sensitive conversations.
- An analytics dashboard that surfaces routing stats and cost per provider for governance.