# rlm

Install via:

```shell
npx machina-cli add skill zircote/rlm-rs-plugin/rlm --openclaw
```

## RLM (Recursive Language Model) Workflow

Orchestrate processing of documents that exceed context window limits using the rlm-rs CLI tool. This skill implements the RLM pattern from arXiv:2512.24601, enabling analysis of content up to 100x larger than typical context windows.
## Architecture Mapping

| RLM Concept | Implementation |
|---|---|
| Root LLM | Main Claude Code conversation (Opus/Sonnet) |
| Sub-LLM (llm_query) | rlm-subcall agent (Haiku) |
| External Environment | rlm-rs CLI with SQLite storage |
## Prerequisites

Verify rlm-rs is installed and available:

```shell
command -v rlm-rs >/dev/null 2>&1 || echo "INSTALL REQUIRED: cargo install rlm-rs"
```

Installation options:

```shell
# Via Cargo (recommended)
cargo install rlm-rs

# Via Homebrew
brew install zircote/tap/rlm-rs
```
## Workflow Steps

### Step 1: Initialize Database

Create or verify the RLM database:

```shell
rlm-rs init
rlm-rs status
```

If already initialized, `status` shows current buffers and state.
### Step 2: Load Context File

Load the large document into a buffer with appropriate chunking:

```shell
# Semantic chunking (recommended for structured content)
rlm-rs load <file_path> --name <buffer_name> --chunker semantic

# Fixed chunking (for unstructured text)
rlm-rs load <file_path> --name <buffer_name> --chunker fixed --chunk-size 6000

# With overlap for continuity
rlm-rs load <file_path> --name <buffer_name> --chunker fixed --chunk-size 6000 --overlap 1000
```
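The choice of flags can be scripted. Below is a minimal sketch of a hypothetical helper (not part of rlm-rs) that maps common file extensions to the flag combinations shown above, following the chunking-strategy table later in this skill:

```shell
#!/bin/sh
# Hypothetical helper: choose rlm-rs load flags from the file extension.
# The extension mapping is an illustrative assumption, not rlm-rs behavior.
pick_chunker() {
  case "$1" in
    *.md|*.rs|*.py|*.json|*.xml) echo "--chunker semantic" ;;
    *.log)                       echo "--chunker fixed --chunk-size 6000 --overlap 1000" ;;
    *)                           echo "--chunker fixed --chunk-size 6000" ;;
  esac
}

pick_chunker server.log
# --chunker fixed --chunk-size 6000 --overlap 1000
```

It could then be used as `rlm-rs load server.log --name logs $(pick_chunker server.log)`.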
### Step 3: Scout the Content

Examine the beginning and end to understand structure:

```shell
# View first 3000 characters
rlm-rs peek <buffer_name> --start 0 --end 3000

# View last 3000 characters
rlm-rs peek <buffer_name> --start -3000
```

Search for relevant sections:

```shell
rlm-rs grep <buffer_name> "<pattern>" --max-matches 20 --window 150
```
### Step 4: Search for Relevant Chunks

Use hybrid semantic + BM25 search to find chunks matching your query:

```shell
# Hybrid search (semantic + BM25 with rank fusion)
rlm-rs search "your query" --buffer <buffer_name> --top-k 100

# JSON output for programmatic use
rlm-rs --format json search "your query" --top-k 100
```
Output includes chunk IDs with relevance scores and document position (index):

```json
{
  "count": 2,
  "mode": "hybrid",
  "query": "your query",
  "results": [
    {"chunk_id": 42, "buffer_id": 1, "index": 5, "score": 0.0328, "semantic_score": 0.0499, "bm25_score": 1.6e-6},
    {"chunk_id": 17, "buffer_id": 1, "index": 2, "score": 0.0323, "semantic_score": 0.0457, "bm25_score": 1.2e-6}
  ]
}
```

- `index`: Sequential position within the document (0-based); use for temporal ordering
- `buffer_id`: Which buffer/document this chunk belongs to

Extract chunk IDs sorted by document position:

```shell
jq -r '.results | sort_by(.index) | .[].chunk_id'
```
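If jq is unavailable, the same ordering can be done with POSIX tools. A sketch, assuming the search results have already been reduced to `chunk_id index` pairs, one per line:

```shell
#!/bin/sh
# Sketch: sort "chunk_id index" pairs by document index (column 2) and
# emit a comma-separated chunk_id list, mirroring the jq pipeline above.
sort_ids_by_index() {
  sort -k2,2n | awk '{ids = ids (ids ? "," : "") $1} END {print ids}'
}

# Pairs taken from the example search output above
# (chunk 42 at index 5, chunk 17 at index 2):
printf '%s\n' "42 5" "17 2" | sort_ids_by_index
# 17,42
```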
### Step 5: Retrieve Chunks by ID

Get specific chunk content via pass-by-reference:

```shell
# Get chunk content
rlm-rs chunk get 42

# With metadata
rlm-rs --format json chunk get 42 --metadata
```
### Step 6: Subcall Loop (Batched, Parallel)

Only process chunks returned by search. Batch chunk IDs to reduce agent calls:

- Search returns chunk IDs with relevance scores and document indices
- Sort all chunk IDs by `index` (document position) to preserve temporal context
- Group sorted chunk IDs into batches (default 10, configurable via the `batch_size` argument)
- Invoke the rlm-subcall agent once per batch using only the two required arguments
- Launch batches in parallel via multiple Task calls in one response
- The agent handles retrieval internally via `rlm-rs chunk get <id>` (NO buffer ID needed)
- Collect structured JSON findings from all batches

IMPORTANT: Sort chunks by `index` before batching to preserve document flow. Each subagent should receive chunks in document order (e.g., 3,7,12,15,22, not 22,3,15,7,12). This ensures temporal context is maintained: definitions appear before usages, causes before effects.
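The sort-then-batch step can be sketched in shell. The batch size is shortened from the default 10 to 3 here so the split is visible:

```shell
#!/bin/sh
# Sketch: split an index-sorted, comma-separated chunk ID list into
# batches of a given size, one batch per output line. Each line is then
# suitable as the chunk_ids argument of one rlm-subcall invocation.
batch_ids() {
  printf '%s\n' "$1" | tr ',' '\n' | awk -v n="$2" '
    { b = b (b ? "," : "") $0; if (NR % n == 0) { print b; b = "" } }
    END { if (b) print b }'
}

batch_ids "3,7,12,15,22" 3
# 3,7,12
# 15,22
```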
CORRECT Task invocation: pass ONLY `query` and `chunk_ids` arguments (sorted by index):

```
Task subagent_type="rlm-rs:rlm-subcall" prompt="query='What errors occurred?' chunk_ids='3,7,12,15,22'"
Task subagent_type="rlm-rs:rlm-subcall" prompt="query='What errors occurred?' chunk_ids='28,31,45'"
```

CRITICAL - DO NOT:

- Write narrative prompts; the agent already knows what to do
- Include buffer ID or buffer NAME anywhere in the prompt
- Mention the buffer at all; chunk IDs are globally unique across all buffers

WRONG (causes exit code 2):

```
prompt="Analyze chunks from buffer 1..."             # NO - has buffer ID
prompt="Analyze chunks from buffer 'myfile.txt'..."  # NO - has buffer name
prompt="Use rlm-rs chunk 1 <id>..."                  # NO - buffer ID in command
prompt="Use rlm-rs chunk get <id> --buffer x..."     # NO - --buffer flag doesn't exist
```

RIGHT:

```
prompt="query='the user question' chunk_ids='5,105,2,3,74'"  # YES - just args!
```
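To keep prompts in the exact two-argument shape above, the string can be assembled mechanically. A hypothetical helper, shown only to make the format concrete:

```shell
#!/bin/sh
# Hypothetical helper: build an rlm-subcall prompt from a query and an
# index-sorted chunk ID list -- only the two required arguments, with no
# buffer ID or buffer name anywhere in the string.
subcall_prompt() {
  printf "query='%s' chunk_ids='%s'" "$1" "$2"
}

subcall_prompt "What errors occurred?" "3,7,12,15,22"
# query='What errors occurred?' chunk_ids='3,7,12,15,22'
```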
### Step 7: Synthesis

Once all chunks are processed:

- Collect all JSON findings from subcall agents
- Pass findings directly to the rlm-synthesizer agent (no intermediate files)
- Present the final synthesized response to the user

Example Task tool invocation:

```
Task agent=rlm-synthesizer query="What errors occurred?" findings='[...]' chunk_ids="42,17,23"
```
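Merging the per-batch findings into the single array the synthesizer expects can be sketched without intermediate files. This sketch assumes each batch returned a well-formed JSON array and does plain string splicing (jq would be the robust alternative):

```shell
#!/bin/sh
# Sketch: splice several JSON array strings (one per subcall batch) into
# one array for the rlm-synthesizer findings argument. Pure string
# handling -- assumes each input is already a valid JSON array.
merge_findings() {
  out=""
  for arr in "$@"; do
    inner=${arr#\[}       # drop leading [
    inner=${inner%\]}     # drop trailing ]
    [ -n "$inner" ] && out="${out:+$out,}$inner"
  done
  printf '[%s]' "$out"
}

merge_findings '[{"chunk_id":3}]' '[{"chunk_id":7},{"chunk_id":12}]'
# [{"chunk_id":3},{"chunk_id":7},{"chunk_id":12}]
```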
## Guardrails

- Never paste large chunks into main context: use peek/grep to extract only relevant excerpts
- Keep subagent outputs compact: request JSON format with short evidence fields
- Orchestration stays in the main conversation: subagents cannot spawn other subagents
- State persists in SQLite: all buffers survive across sessions via `.rlm/rlm-state.db`
- No file I/O for chunk passing: use pass-by-reference with chunk IDs
## Chunking Strategy Selection
| Content Type | Recommended Strategy |
|---|---|
| Markdown docs | semantic |
| Source code | semantic |
| JSON/XML | semantic |
| Plain logs | fixed with overlap |
| Unstructured text | fixed |
For detailed chunking guidance, refer to the rlm-chunking skill.
## CLI Command Reference

| Command | Purpose |
|---|---|
| init | Initialize database |
| status | Show state summary |
| load | Load file into buffer |
| list | List all buffers |
| show | Show buffer details |
| peek | View buffer content slice |
| grep | Search with regex |
| search | Hybrid semantic + BM25 search |
| chunk get | Retrieve chunk by ID |
| chunk list | List buffer chunks |
| chunk embed | Generate embeddings |
| chunk status | Show embedding status |
| write-chunks | Export chunks to files (legacy) |
| add-buffer | Store intermediate results |
| export-buffers | Export all buffers |
| var | Get/set context variables |
| reset | Clear all state |
## Example Session

```shell
# 1. Initialize
rlm-rs init

# 2. Load a large log file
rlm-rs load server.log --name logs --chunker fixed --chunk-size 6000 --overlap 500

# 3. Search for relevant chunks
rlm-rs --format json search "database connection errors" --buffer logs --top-k 100

# 4. For each relevant chunk ID, invoke rlm-subcall agent
# 5. Collect JSON findings
# 6. Pass findings to rlm-synthesizer agent
# 7. Present final answer
```
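For review before running, the session above can be emitted as a dry-run plan. A hypothetical wrapper that only prints the commands; the flag spellings mirror the examples in this skill:

```shell
#!/bin/sh
# Hypothetical dry-run wrapper: print the rlm-rs commands for steps 1-3
# of the example session without executing them.
plan_session() {
  file=$1; buf=$2; query=$3
  printf '%s\n' \
    "rlm-rs init" \
    "rlm-rs load $file --name $buf --chunker fixed --chunk-size 6000 --overlap 500" \
    "rlm-rs --format json search \"$query\" --buffer $buf --top-k 100"
}

plan_session server.log logs "database connection errors"
```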
## Additional Resources

### Reference Files

- `references/cli-reference.md` - Complete CLI documentation

### Related Components

- rlm-subcall agent - Chunk-level analysis (Haiku)
- rlm-synthesizer agent - Result aggregation (Sonnet)
- rlm-chunking skill - Chunking strategy selection

### Source

```shell
git clone https://github.com/zircote/rlm-rs-plugin
```

## Overview
This skill orchestrates the RLM loop to process documents that exceed context window limits using the rlm-rs CLI. It implements the recursive language model (RLM) pattern from arXiv:2512.24601 to analyze content up to 100x larger than typical contexts through chunking, searching, and batched subcalls.
## How This Skill Works
It initializes a local RLM database, loads the large file with semantic or fixed chunking (with optional overlap), and uses hybrid semantic + BM25 search to locate relevant chunks. It then retrieves chunk IDs ordered by document position, batches them (default 10) and invokes the rlm-subcall agent once per batch, often in parallel, to drive the recursive LLM loop.
## When to Use It
- When you need to analyze a document that exceeds the model's context window
- When chunking a large file enables systematic analysis and cross-referencing
- When applying the RLM pattern to extend reasoning beyond a single pass
- When you want hybrid semantic + BM25 search to surface relevant chunks
- When you need batched, parallel subcalls to maintain performance on big data
## Quick Start

- Step 1: Install rlm-rs and verify: `cargo install rlm-rs`; `command -v rlm-rs`
- Step 2: Initialize the DB and load a file: `rlm-rs init`; `rlm-rs load <path> --name <buf> --chunker semantic|fixed --chunk-size <n> [--overlap <m>]`
- Step 3: Run a search, get chunks by ID, and start the subcall loop: `rlm-rs search "query" --buffer <buf> --top-k 100`; batch the IDs; invoke rlm-subcall once per batch (in parallel)
## Best Practices
- Use semantic chunking for structured content; prefer fixed chunking for unstructured text
- Enable overlap between chunks to preserve continuity across boundaries
- Initialize and verify the RLM database before loading large files
- Combine semantic and BM25 searches for robust chunk retrieval
- Tune `batch_size` to balance latency and throughput during subcall loops
## Example Use Cases
- Analyze a 100+ page report by chunking and querying across sections
- Extract insights from a lengthy contract or legal document larger than a single context
- Audit a large codebase or documentation repository with cross-referenced chunks
- Process a long research paper and generate section-wise summaries
- Build an RLM-powered QA over a large dataset using batched subcalls