prompt-compression
npx machina-cli add skill a5c-ai/babysitter/prompt-compression --openclaw
Prompt Compression Skill
Capabilities
- Implement token-efficient prompt compression
- Design context pruning strategies
- Configure selective context inclusion
- Implement LLMLingua-style compression
- Design summary-based compression
- Create compression quality metrics
Target Processes
- cost-optimization-llm
- agent-performance-optimization
Implementation Details
Compression Techniques
- LLMLingua: Token-level compression
- Summary Compression: LLM-based summarization
- Selective Context: Relevant section extraction
- Token Pruning: Remove low-importance tokens
- Document Filtering: Pre-retrieval filtering
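As a concrete example of the first technique above, here is a minimal sketch of LLMLingua-style compression, assuming the optional llmlingua dependency and its PromptCompressor interface; the target token count and example strings are illustrative:

```python
# Minimal LLMLingua-style compression sketch. Requires the optional
# `llmlingua` package; targets and inputs below are illustrative.
from llmlingua import PromptCompressor

compressor = PromptCompressor()  # loads a default LM to score token importance (sizable download)

context_sections = ["...section 1 of a long document...", "...section 2..."]

result = compressor.compress_prompt(
    context_sections,
    instruction="Answer using the context below.",
    question="What caused the outage?",
    target_token=300,  # token budget for the compressed context
)

print(result["compressed_prompt"])
print(result["origin_tokens"], "->", result["compressed_tokens"])
```

The returned token counts feed directly into the cost-savings tracking described under Best Practices.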
Configuration Options
- Compression ratio targets
- Quality threshold settings
- Token budget constraints
- Compression model selection
- Evaluation metrics
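These options might be captured in a small config object; the field names below are hypothetical, chosen only to illustrate the knobs listed above, not the skill's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class CompressionConfig:
    # Hypothetical schema: field names illustrate the options above,
    # not the skill's actual configuration format.
    target_ratio: float = 0.5             # aim to keep ~50% of original tokens
    quality_threshold: float = 0.85       # reject compressions scoring below this
    token_budget: int = 4096              # hard cap on compressed prompt size
    compression_model: str = "llmlingua"  # or "summary" / "selective"
    eval_metrics: list = field(default_factory=lambda: ["ratio", "answer_similarity"])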
Best Practices
- Monitor quality vs compression tradeoff
- Test with representative prompts
- Set appropriate compression ratios
- Validate compressed prompt quality
- Track cost savings
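To make "validate compressed prompt quality" concrete, one hedged approach is to compare model answers on the original and compressed prompts; `ask_llm` below is a hypothetical stand-in for whatever completion call you use:

```python
import difflib

def answer_agreement(original_prompt: str, compressed_prompt: str, ask_llm) -> float:
    """Rough quality check: how similar are answers from both prompt variants?

    `ask_llm` is a hypothetical callable (prompt -> answer string); swap in
    your actual client. The character-level ratio used here is crude; an
    embedding- or rubric-based score would be more robust.
    """
    a = ask_llm(original_prompt)
    b = ask_llm(compressed_prompt)
    return difflib.SequenceMatcher(None, a, b).ratio()

# Example policy: fall back to the original prompt when answers drift too far.
# if answer_agreement(prompt, compressed, ask_llm) < 0.85: use the original.
```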
Dependencies
- llmlingua (optional)
- tiktoken
- transformers
Source
https://github.com/a5c-ai/babysitter/blob/main/plugins/babysitter/skills/babysit/process/specializations/ai-agents-conversational/skills/prompt-compression/SKILL.md
Overview
Prompt compression reduces prompt size to save tokens and cost while preserving essential meaning. It combines techniques such as LLMLingua-style token-level compression, summary-based compression, and selective context inclusion to trim verbosity without sacrificing performance.
How This Skill Works
The skill applies several techniques in combination: LLMLingua-style token-level compression, LLM-based summarization, selective extraction of relevant context sections, token pruning, and pre-retrieval document filtering. Configuration options such as compression ratio targets and token budgets let you balance quality against cost, while evaluation metrics help you validate the compressed prompts.
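For the token-budget side, a small sketch using tiktoken (one of the listed dependencies) to measure a prompt against a budget; the budget value and encoding choice are assumptions:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by many OpenAI models

def tokens_over_budget(prompt: str, budget: int = 4096) -> int:
    """Return how many tokens the prompt exceeds the budget by (0 if it fits)."""
    return max(0, len(enc.encode(prompt)) - budget)
```

A nonzero result signals that one of the compression techniques above should be applied before sending the prompt.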
When to Use It
- You must operate under strict token budgets to reduce costs.
- You need faster responses with smaller prompts.
- Your data sources are long documents and reports.
- You want to test compression models or adjust quality thresholds.
- You want to filter documents before retrieval to cut the volume of retrieved data.
Quick Start
- Step 1: Assess the prompt to identify high-token sections.
- Step 2: Choose a compression technique (LLMLingua, summary, selective context) and set targets.
- Step 3: Apply compression, run tests with representative prompts, and measure cost impact.
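Putting the three steps together, a sketch using the listed tiktoken and transformers dependencies; the summarization model and length targets are assumptions, not the skill's defaults:

```python
import tiktoken
from transformers import pipeline

enc = tiktoken.get_encoding("cl100k_base")
# Assumed model choice; any seq2seq summarization checkpoint works here.
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

def compress_by_summary(section: str, max_len: int = 120) -> str:
    """Step 2: summary-based compression of one high-token section."""
    return summarizer(section, max_length=max_len, min_length=30, do_sample=False)[0]["summary_text"]

report = "... long incident report text ..."       # placeholder input
before = len(enc.encode(report))                   # Step 1: assess token count
compressed = compress_by_summary(report)
after = len(enc.encode(compressed))
print(f"tokens: {before} -> {after} ({1 - after / before:.0%} saved)")  # Step 3: measure impact
```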
Example Use Cases
- Apply LLMLingua-style compression to a multi-turn customer-support prompt to cut tokens.
- Use summary-based compression to condense long incident reports into short briefs for LLMs.
- Implement selective context extraction to feed only relevant sections from large policy docs.
- Perform token pruning to remove low-importance tokens in recurring prompts.
- Enable document filtering pre-retrieval to reduce retrieved data and costs.
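For the selective-context use case above, a dependency-free sketch that scores sections by keyword overlap with the query and keeps only the top few; a production implementation would more likely use embedding similarity:

```python
import re

def select_relevant_sections(query: str, sections: list[str], top_k: int = 3) -> list[str]:
    """Keep only the sections sharing the most words with the query.

    Crude word-overlap scoring for illustration; embedding-based similarity
    is the more robust choice in practice.
    """
    q_words = set(re.findall(r"\w+", query.lower()))
    scored = sorted(
        sections,
        key=lambda s: len(q_words & set(re.findall(r"\w+", s.lower()))),
        reverse=True,
    )
    return scored[:top_k]

policy_sections = ["Refunds: ...", "Shipping: ...", "Privacy: ..."]
context = select_relevant_sections("When can a customer get a refund?", policy_sections)
```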