Token Saver 75+
Use Caution
@mariovallereyes
npx machina-cli add skill @mariovallereyes/token-saver-75plus --openclaw
Token Saver 75+ with Model Routing
Core Principle
Understand fully, execute cheaply. The orchestrator must fully understand the task before routing. Never sacrifice comprehension for speed.
Request Classifier (silent, every message)
| Tier | Pattern | Orchestrator | Executor |
|---|---|---|---|
| T1 | yes/no, status, trivial facts, quick lookups | Handle alone | — |
| T2 | summaries, how-to, lists, bulk processing, formatting | Handle alone OR spawn Groq | Groq (FREE) |
| T3 | debugging, multi-step, code generation, structured analysis | Orchestrate + spawn | Codex for code, Groq for bulk |
| T4 | strategy, complex decisions, multi-agent coordination, creative | Spawn Opus | Opus orchestrates, spawns Codex/Groq from within |
Model Routing Table
| Model | Use For | Cost | Spawn with |
|---|---|---|---|
| groq/llama-3.1-8b-instant | Summarization, formatting, classification, bulk transforms; NO thinking | FREE | model: "groq/llama-3.1-8b-instant" |
| openai/gpt-5.3-codex | ALL code generation, code review, refactoring | $$$ | model: "openai/gpt-5.3-codex" |
| openai/gpt-5.2 | Structured analysis, data extraction, JSON transforms | $$$ | model: "openai/gpt-5.2" |
| anthropic/claude-opus-4-6 | Strategy, complex orchestration, failure recovery (T4 only) | $$$$ | model: "anthropic/claude-opus-4-6" |
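The routing table above can be read as a plain lookup from kind of work to model ID. The following is a minimal sketch, not part of the skill itself: the model IDs are copied from the table, while the category keys and the `model_for` helper are hypothetical names chosen for illustration.

```python
# Model IDs are copied from the routing table. The category keys are
# hypothetical labels invented for this sketch.
ROUTES = {
    "summarization": "groq/llama-3.1-8b-instant",
    "formatting": "groq/llama-3.1-8b-instant",
    "classification": "groq/llama-3.1-8b-instant",
    "bulk_transform": "groq/llama-3.1-8b-instant",
    "code_generation": "openai/gpt-5.3-codex",
    "code_review": "openai/gpt-5.3-codex",
    "refactoring": "openai/gpt-5.3-codex",
    "structured_analysis": "openai/gpt-5.2",
    "data_extraction": "openai/gpt-5.2",
    "json_transform": "openai/gpt-5.2",
    "strategy": "anthropic/claude-opus-4-6",
    "complex_orchestration": "anthropic/claude-opus-4-6",
    "failure_recovery": "anthropic/claude-opus-4-6",
}


def model_for(category: str) -> str:
    """Return the model ID for a work category, or raise if it is unknown."""
    try:
        return ROUTES[category]
    except KeyError:
        raise ValueError(f"no route for category: {category!r}") from None
```

Raising on an unknown category, rather than silently defaulting to an expensive model, keeps routing mistakes visible.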
Routing via sessions_spawn
When to spawn (MANDATORY)
- Code generation of any kind → spawn Codex
- Bulk text processing (>3 items) → spawn Groq
- Complex multi-step tasks → spawn Opus (T4)
- Simple formatting/rewriting → spawn Groq
When NOT to spawn
- T1 questions (yes/no, time, status) — handle directly
- Single tool calls (calendar, web search) — handle directly
- Short responses that need no processing — handle directly
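The spawn and no-spawn rules above amount to a small decision function. This sketch assumes the request has already been labeled with a `kind`; the label strings and the `spawn_target` name are hypothetical, and only the model IDs and the ">3 items" threshold come from the text.

```python
from typing import Optional

GROQ = "groq/llama-3.1-8b-instant"
CODEX = "openai/gpt-5.3-codex"
OPUS = "anthropic/claude-opus-4-6"


def spawn_target(kind: str, item_count: int = 1) -> Optional[str]:
    """Return the model to spawn, or None to handle the request directly.

    The `kind` labels are hypothetical names used only in this sketch.
    """
    if kind == "code":
        return CODEX  # code generation of any kind
    if kind == "complex_multistep":
        return OPUS  # T4 work
    if kind == "formatting":
        return GROQ  # simple formatting or rewriting
    if kind == "bulk_text" and item_count > 3:
        return GROQ  # bulk text processing, more than 3 items
    # T1 questions, single tool calls, and short replies stay with the orchestrator.
    return None
```

For example, `spawn_target("bulk_text", 5)` selects Groq, while a T1 status question falls through to `None` and is answered directly.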
Spawn patterns
Groq (free bulk work):
sessions_spawn(
task: "<clear instruction with all context included>",
model: "groq/llama-3.1-8b-instant"
)
Codex (all code):
sessions_spawn(
task: "Write <language> code that <detailed spec>. Include comments. Output the complete file.",
model: "openai/gpt-5.3-codex"
)
Opus (T4 strategy):
sessions_spawn(
task: "<full context + goal>. You have full tool access. Use sessions_spawn with Codex for code and Groq for bulk subtasks.",
model: "anthropic/claude-opus-4-6"
)
Critical spawn rules
- Include ALL context in the task string — spawned agents have no conversation history
- Be specific — vague tasks waste tokens on clarification
- One task per spawn — don't bundle unrelated work
- For code: always use Codex — never write code yourself
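Because spawned agents see no conversation history, the task string has to carry everything they need. One way to enforce that is to assemble the string from an instruction plus explicit context fields. The `build_task` helper below is a hypothetical illustration, not part of `sessions_spawn`.

```python
from typing import Dict


def build_task(instruction: str, context: Dict[str, str]) -> str:
    """Assemble one self-contained task string for a single spawn.

    Hypothetical helper: it only concatenates text and does not call any tool.
    """
    if not instruction.strip():
        raise ValueError("instruction must not be empty")
    lines = [instruction.strip()]
    if context:
        lines.append("Context:")
        # One line per context item so nothing depends on prior conversation.
        lines.extend(f"- {key}: {value}" for key, value in context.items())
    return "\n".join(lines)
```

The result would be passed as the `task` argument of a single spawn, with unrelated work going into a separate call.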
Output Compression (applies to ALL tiers, ALL models)
Templates
- STATUS: OK/WARN/FAIL one-liner
- CHOICE: A vs B → Recommend: X (1 line why)
- CAUSE→FIX→VERIFY: 3 bullets max
- RESULT: data/output directly, no wrap-up
Rules
- No filler. No restating the question. Lead with the answer.
- Bullets/tables/code > prose.
- Do not narrate routine tool calls.
- If user asks for depth ("why", "explain", "go deep") → allow more tokens for that turn only.
Budget by tier
| Tier | Max output |
|---|---|
| T1 | 1-3 lines |
| T2 | 5-15 bullets |
| T3 | Structured sections, <400 words |
| T4 | Longer allowed, still dense |
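The tier budgets above can be checked mechanically before a reply is sent. The sketch below is one possible reading of the table: it counts non-empty lines for T1, bullet lines for T2, and whitespace-separated words for T3. The `within_budget` name and the bullet-detection rule are assumptions.

```python
def within_budget(tier: str, text: str) -> bool:
    """Check a draft reply against the per-tier output budget."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if tier == "T1":
        return len(lines) <= 3
    if tier == "T2":
        # Assumption: a bullet is any line starting with "-" or "*".
        bullets = [ln for ln in lines if ln.lstrip().startswith(("-", "*"))]
        return len(bullets) <= 15
    if tier == "T3":
        return len(text.split()) < 400
    if tier == "T4":
        return True  # the table sets no hard cap for T4
    raise ValueError(f"unknown tier: {tier!r}")
```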
Tool Gating (before ANY tool call)
- Already known? → No tool.
- Batchable? → Parallelize.
- Can a spawned Groq handle it? → Spawn instead of doing it yourself.
- Cheapest path? → memory_search > partial read > full read > web.
- Needed? → Do not fetch "just in case."
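The gating checklist above can be sketched as a short function: skip the tool when the answer is already known or not needed, otherwise pick the cheapest source that applies. Only `memory_search` is a name from the text; the other source labels and both function names are hypothetical.

```python
from typing import Iterable, Optional

# Cheapest first, following the order in the gating list.
SOURCE_ORDER = ["memory_search", "partial_read", "full_read", "web"]


def cheapest_source(candidates: Iterable[str]) -> Optional[str]:
    """Return the cheapest applicable source, or None if none applies."""
    available = set(candidates)
    for source in SOURCE_ORDER:
        if source in available:
            return source
    return None


def gate(already_known: bool, needed: bool, candidates: Iterable[str]) -> Optional[str]:
    """Return the source to use, or None when no tool call is warranted."""
    if already_known or not needed:
        return None
    return cheapest_source(candidates)
```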
Failure Protocol
- If Groq spawn fails → retry with GPT-5.2
- If Codex spawn fails → retry with GPT-5.2
- If orchestrator can't handle T3 → spawn Opus (escalate to T4)
- Never retry the same model. Escalate.
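The escalation rules above can be sketched as a retry loop that never repeats a model. This sketch assumes a caller-supplied `spawn` callable that raises an exception on failure; that callable and `run_with_escalation` are hypothetical. The text does not say what follows a GPT-5.2 failure, so the chain simply stops there.

```python
from typing import Callable, List

GROQ = "groq/llama-3.1-8b-instant"
CODEX = "openai/gpt-5.3-codex"
GPT52 = "openai/gpt-5.2"

# Escalation targets taken from the failure protocol.
ESCALATION = {GROQ: GPT52, CODEX: GPT52}


def run_with_escalation(task: str, model: str, spawn: Callable[[str, str], str]) -> str:
    """Try `model`, then escalate once per failure; never retry the same model."""
    tried: List[str] = []
    current = model
    while current is not None:
        tried.append(current)
        try:
            return spawn(task, current)
        except Exception:
            # Assumption: any exception from `spawn` counts as a failed spawn.
            current = ESCALATION.get(current)
    raise RuntimeError(f"all routes failed: {tried}")
```

Because `ESCALATION` has no cycles, the loop always terminates, either with a result or with an error listing every model tried.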
Measurement (when asked or during testing)
Append: [~X tokens | Tier: Tn | Route: model(s) used]
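The footer format above is simple enough to generate with a one-line formatter. In this sketch the `measurement_footer` name and the `"direct"` label for replies that spawned no model are assumptions.

```python
from typing import List


def measurement_footer(tokens: int, tier: str, models: List[str]) -> str:
    """Format the measurement footer appended to a reply."""
    # Assumption: "direct" marks replies handled without spawning any model.
    route = ", ".join(models) if models else "direct"
    return f"[~{tokens} tokens | Tier: {tier} | Route: {route}]"
```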
Overview
Token Saver 75+ automatically classifies requests into T1–T4, routes execution to the cheapest capable model via sessions_spawn, and applies maximum output compression to target 75%+ token savings. It combines silent classification, model routing, and compression templates to optimize cost without sacrificing understanding.
How This Skill Works
A silent Request Classifier assigns each message to T1–T4. Based on the tier, the orchestrator either handles directly or spawns the appropriate model using sessions_spawn (Groq for bulk or simple tasks, Codex for code, Opus for T4 strategy). Output is compressed with the defined templates to deliver concise results while maintaining task integrity.
When to Use It
- T1 questions (yes/no, status, trivial facts) to be answered directly without spawning.
- Summaries, how-to guides, lists, or bulk processing tasks that Groq can handle efficiently.
- Debugging, multi-step tasks, or code generation needing orchestration (spawn Codex or Groq as needed).
- Complex strategy, multi-agent coordination, or long-form planning requiring Opus (T4).
- Bulk text processing aimed at achieving 75%+ token savings through maximal output compression.
Quick Start
- Step 1: Classify the incoming request into T1–T4 using the Request Classifier and decide if spawning is needed.
- Step 2: Spawn the appropriate model via sessions_spawn based on the Routing Table (Groq for bulk/simple, Codex for code, Opus for T4).
- Step 3: Apply Output Compression templates to deliver a concise, direct result.
Best Practices
- Include ALL context in the task string when spawning, because spawned agents have no conversation history.
- Be specific and keep to one task per spawn to avoid wasting tokens.
- Use Groq for bulk/simple tasks and Codex for code-related work; reserve Opus for T4 orchestration.
- Apply the Output Compression templates (STATUS, CHOICE, CAUSE→FIX→VERIFY, RESULT) for concise results.
- Leverage the budget guidance by tier to keep exchanges compact and cost-effective.
Example Use Cases
- A user asks a yes/no question about system status; respond directly without spawning.
- Summarize five product briefs into a structured bulleted recap with formatting handled by Groq.
- Generate a Python function that parses JSON and extracts fields by spawning Codex.
- Plan a multi-step data-cleaning workflow with strategy and failure recovery via Opus.
- Process a list of customer requests into a concise action list, using Groq for bulk transforms.