context-optimizer
Install via machina-cli:
npx machina-cli add skill pablodiegoo/Data-Pro-Skill/context-optimizer --openclaw

Context Optimizer
This skill transforms large, monolithic documents into a modular .agent/ folder structure optimized for AI agent context consumption. The goal is to minimize context window usage while maximizing information accessibility.
Quick Reference
| Content Type | Destination | Naming Convention | When to Use |
|---|---|---|---|
| Core Rules/Facts | memory/ | project_facts.md, conventions.md | Immutable truths, constraints, standards |
| Processes/How-To | workflows/ | deploy.md, review.md | Step-by-step procedures (turbo-enabled) |
| Tasks/Plans | tasks/ | backlog.md, sprint.md | Active work items, implementation plans |
| Reference Docs | references/ | api_docs.md, schema.md | Large docs loaded on-demand |
| Skills | skills/ | <skill-name>/SKILL.md | Reusable capabilities with scripts |
Workflow
Phase 1: Analyze Source Document
Before splitting, understand the document's structure:
# Preview structure without splitting
head -100 <input_file> | grep -E "^#{1,3} "
Identify:
- Hierarchical depth: How many heading levels exist?
- Content density: Are sections long enough to justify separate files?
- Semantic groupings: Which sections belong together?
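The same structural survey can be done in a few lines of Python; the `outline` helper below is illustrative, not part of the bundled scripts:

```python
import re

def outline(text, max_level=3):
    """Summarize heading structure as (level, title, lines_until_next_heading)."""
    lines = text.splitlines()
    sections = []  # (level, title, start_line)
    for i, line in enumerate(lines):
        m = re.match(r"^(#{1,%d})\s+(.*)" % max_level, line)
        if m:
            sections.append((len(m.group(1)), m.group(2), i))
    result = []
    for j, (level, title, start) in enumerate(sections):
        # Section length runs from this heading to the next one (or EOF)
        end = sections[j + 1][2] if j + 1 < len(sections) else len(lines)
        result.append((level, title, end - start))
    return result

doc = "# Spec\nintro\n\n## Setup\nstep 1\nstep 2\n\n## Deploy\nship it\n"
for level, title, length in outline(doc):
    print(f"{'  ' * (level - 1)}{title}: {length} lines")
```

Short sections flagged here are candidates for merging rather than splitting into their own files.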
Phase 2: Decompose with Script
Use the bundled script to split the document:
python3 .agent/skills/context-optimizer/scripts/decompose.py <input_file> -o <output_dir> [options]
Arguments
| Argument | Description | Default |
|---|---|---|
| input_file | Large markdown/text file to split | Required |
| -o, --output | Output directory for chunks | <input>_split/ |
| -l, --level | Header level to split by (1=#, 2=##) | 2 |
| -r, --regex | Custom regex pattern (group 1 = title) | Markdown headers |
| --min | Minimum lines per section | 3 |
Examples
# Split by ## (default)
python3 decompose.py project_spec.md -o .agent/temp_split
# Split by # (top-level only)
python3 decompose.py large_doc.md -o chunks -l 1
# Custom pattern (e.g., numbered sections)
python3 decompose.py report.md -r "^(\d+\.\s+.+)$" -o sections
Output: Creates numbered files (01_section_name.md, 02_...) plus 00_INDEX.md and 00_preamble.md.
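Conceptually, the split is a single pass that opens a new chunk at each matching header. A minimal sketch of that idea (not the bundled decompose.py itself; the `split_by_header` helper and its drop-short-sections behavior are assumptions):

```python
import re

def split_by_header(text, level=2, min_lines=3):
    """Split markdown text into (title, body) chunks at the given header level."""
    pattern = re.compile(r"^#{%d}\s+(.+)$" % level)
    chunks, title, buf = [], "preamble", []
    for line in text.splitlines():
        m = pattern.match(line)
        if m:
            # Close the previous chunk; sections below min_lines are dropped here
            # (the real script may handle undersized sections differently)
            if len(buf) >= min_lines:
                chunks.append((title, "\n".join(buf)))
            title, buf = m.group(1), [line]
        else:
            buf.append(line)
    if len(buf) >= min_lines:
        chunks.append((title, "\n".join(buf)))
    return chunks

doc = "intro a\nintro b\nintro c\n## One\nx\ny\n## Two\nz"
chunks = split_by_header(doc)
```

Everything before the first matching header lands in the preamble chunk, which is why the script emits a separate 00_preamble.md.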
Phase 3: Organize into .agent Structure
After decomposition, manually categorize each chunk:
.agent/
├── memory/ # Persistent context (always loaded)
│ ├── user_global.md # User preferences, patterns
│ ├── project_facts.md # Tech stack, constraints, conventions
│ └── decisions.md # ADRs, architectural decisions
│
├── workflows/ # Step-by-step procedures
│ ├── deploy.md # Deployment process
│ ├── review.md # Code review checklist
│ └── testing.md # Testing procedures
│
├── tasks/ # Active work items
│ ├── backlog.md # Feature backlog
│ ├── current_sprint.md # Active sprint items
│ └── implementation_plan.md # Current implementation plan
│
├── references/ # On-demand documentation
│ ├── api_docs.md # API specifications
│ ├── schema.md # Database/data schemas
│ └── external_libs.md # Third-party library docs
│
└── skills/ # Reusable capabilities
└── <skill-name>/
└── SKILL.md
Phase 4: Optimize Each File
For each categorized file, apply these optimizations:
Memory Files (High Priority)
- Maximum size: ~500 lines (always loaded)
- Format: Bullet points, tables, concise rules
- Avoid: Long explanations, examples (move to references)
Workflow Files
- Format: Numbered steps with clear actions
- Include: // turbo annotations for auto-runnable steps
- Structure: Prerequisites → Steps → Verification
Task Files
- Format: Checkbox lists ([ ], [/], [x])
- Include: Priority, deadlines, dependencies
- Update: Mark items as in-progress/done during work
Reference Files
- Maximum size: Unlimited (loaded on-demand)
- Include: Table of contents for files > 100 lines
- Add: Grep patterns in SKILL.md for large files
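The TOC for large reference files can be generated mechanically. One possible sketch, assuming GitHub-style lowercase-hyphen anchors (the `make_toc` helper is illustrative, not part of the skill):

```python
import re

def make_toc(text):
    """Build a markdown table of contents from ## headings."""
    entries = []
    for line in text.splitlines():
        m = re.match(r"^##\s+(.+)$", line)
        if m:
            title = m.group(1)
            # Approximate GitHub's anchor slugs: lowercase, strip punctuation,
            # spaces become hyphens (exact slug rules vary by renderer)
            anchor = re.sub(r"[^a-z0-9 -]", "", title.lower()).replace(" ", "-")
            entries.append(f"- [{title}](#{anchor})")
    return "\n".join(entries)

toc = make_toc("## API Keys\nbody\n## Rate Limits\nbody")
```

Prepend the result to the reference file so an agent can jump to a section without reading the whole document.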
Phase 5: Cleanup
Remove temporary files and validate structure:
# Remove decomposition output
rm -rf .agent/temp_split
# Validate structure (optional)
find .agent -name "*.md" -exec wc -l {} \; | sort -n
Decision Matrix
Use this matrix to decide where content belongs:
┌─────────────────────────────────────────────────────────────────┐
│ Is it a PROCESS/HOW-TO? │
│ │ │
│ ┌───────────────┴───────────────┐ │
│ ▼ YES ▼ NO │
│ ┌────────────────┐ ┌────────────────┐ │
│ │ workflows/ │ │ Is it ACTIVE │ │
│ │ │ │ work to track? │ │
│ └────────────────┘ └───────┬────────┘ │
│ ┌───────┴───────┐ │
│ ▼ YES ▼ NO │
│ ┌──────────┐ ┌──────────────┐ │
│ │ tasks/ │ │ Is it a RULE │ │
│ │ │ │ or FACT? │ │
│ └──────────┘ └──────┬───────┘ │
│ ┌──────┴──────┐ │
│ ▼ YES ▼ NO │
│ ┌──────────┐ ┌──────────┐ │
│ │ memory/ │ │references/│ │
│ └──────────┘ └──────────┘ │
└─────────────────────────────────────────────────────────────────┘
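The same routing logic, expressed as a small reference function (the flag names mirror the questions in the diagram; the function itself is illustrative):

```python
def categorize(is_process: bool, is_active_work: bool, is_rule_or_fact: bool) -> str:
    """Route a chunk to a folder, mirroring the decision matrix above."""
    if is_process:
        return "workflows/"      # step-by-step procedures
    if is_active_work:
        return "tasks/"          # work items to track
    if is_rule_or_fact:
        return "memory/"         # immutable truths and conventions
    return "references/"         # everything else, loaded on demand
```

Note the order matters: a how-to that also states rules still belongs in workflows/, because the first question wins.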
Best Practices
- Memory files are expensive — Keep them under 500 lines total
- Use references for large docs — They're loaded only when needed
- One concept per file — Easier to update and search
- Add TOC to large files — For files > 100 lines, include a table of contents
- Use consistent naming — snake_case.md for all files
- Delete empty directories — Don't keep placeholder folders
Phase 3 (Alternative): Automated Semantic Grouping
Automatically categorize your chunks into .agent/ folders:
python3 .agent/skills/context-optimizer/scripts/group_sections.py <split_dir> --move
This script analyzes each chunk for keywords and structural markers to suggest whether it belongs in memory/, workflows/, tasks/, or references/.
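The kind of keyword scoring such a script might apply can be sketched as follows (the keyword lists and tie-breaking below are assumptions, not group_sections.py's actual heuristics):

```python
# Hypothetical keyword lists per destination folder
KEYWORDS = {
    "workflows": ["step", "run", "deploy", "procedure", "install"],
    "tasks": ["todo", "backlog", "sprint", "[ ]", "deadline"],
    "memory": ["convention", "rule", "constraint", "must", "never"],
}

def suggest_folder(text):
    """Suggest a destination folder by counting keyword hits; default references/."""
    lowered = text.lower()
    scores = {
        folder: sum(lowered.count(word) for word in words)
        for folder, words in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    # No keyword hits at all -> treat as on-demand reference material
    return best if scores[best] > 0 else "references"
```

Treat the output as a suggestion to review, not a final placement — ambiguous chunks still need a human (or agent) decision.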
Resources
| Resource | Purpose |
|---|---|
| scripts/decompose.py | Split markdown by headers or custom regex |
| scripts/group_sections.py | Automatically categorize chunks by semantic analysis |
| references/examples.md | Real-world categorization examples and patterns |
Related Skills
- skill-creator: For creating new skills from decomposed content
- documentation-mastery: For formatting the resulting markdown files
Source
https://github.com/pablodiegoo/Data-Pro-Skill/blob/main/src/datapro/data/skills/context-optimizer/SKILL.md

Overview
Transforms large Markdown docs into a modular .agent folder, optimizing context usage by separating Core Rules/Facts, Processes, Tasks, References, and Skills. This structure makes AI agents faster to reason with and easier to maintain.
How This Skill Works
The skill analyzes the source document to understand its structure and density, then uses the decompose.py script to split it into chunks. After decomposition, you manually organize the chunks into memory/, workflows/, tasks/, references/, and skills/, and add navigation files like 00_INDEX.md and 00_preamble.md for quick access.
When to Use It
- Starting a new project with a large requirements document
- Migrating legacy docs to .agent structure
- Refactoring existing context files for better organization
- Converting PDFs or long READMEs into agent-friendly files
- Optimizing context window usage by splitting monolithic docs into Tasks, Memories, Workflows, and References
Quick Start
- Step 1: Analyze the input document to assess heading depth and content density
- Step 2: Run the decompose script, e.g. python3 .agent/skills/context-optimizer/scripts/decompose.py input.md -o output_dir -l 2
- Step 3: Organize the resulting chunks into .agent/memory, .agent/workflows, .agent/tasks, .agent/references, and .agent/skills; add 00_INDEX.md and 00_preamble.md
Best Practices
- Preview the source to assess hierarchical depth and content density before splitting
- Use Phase 2 decompose with appropriate header level (-l) and optional regex (-r) patterns
- Organize chunks into the canonical .agent folders: memory/, workflows/, tasks/, references/, skills/
- Place immutable truths and conventions into memory/ (e.g., project_facts.md, conventions.md)
- Create 00_INDEX.md and 00_preamble.md to improve navigation and load order
Example Use Cases
- Split a large project spec into memory/project_facts.md, workflows/deploy.md, and tasks/backlog.md for an agent-ready context
- Refactor a sprawling monolithic context into modular files across memory/, workflows/, and references/ for easier maintenance
- Convert a multi-page PDF into agent-friendly files such as memory/conventions.md and references/api_docs.md
- Break down a long README into Tasks, Memories, Workflows, and References to optimize context window usage
- Migrate legacy docs to the .agent structure to support a new agent with clean separation of concerns