
rlm

npx machina-cli add skill zircote/rlm-rs-plugin/rlm --openclaw

RLM (Recursive Language Model) Workflow

Orchestrate processing of documents that exceed context window limits using the rlm-rs CLI tool. This skill implements the RLM pattern from arXiv:2512.24601, enabling analysis of content up to 100x larger than typical context windows.

Architecture Mapping

RLM Concept           | Implementation
----------------------|---------------------------------------------
Root LLM              | Main Claude Code conversation (Opus/Sonnet)
Sub-LLM (llm_query)   | rlm-subcall agent (Haiku)
External Environment  | rlm-rs CLI with SQLite storage

Prerequisites

Verify rlm-rs is installed and available:

command -v rlm-rs >/dev/null 2>&1 || echo "INSTALL REQUIRED: cargo install rlm-rs"

Installation options:

# Via Cargo (recommended)
cargo install rlm-rs

# Via Homebrew
brew install zircote/tap/rlm-rs

Workflow Steps

Step 1: Initialize Database

Create or verify the RLM database:

rlm-rs init
rlm-rs status

If already initialized, status shows current buffers and state.

Step 2: Load Context File

Load the large document into a buffer with appropriate chunking:

# Semantic chunking (recommended for structured content)
rlm-rs load <file_path> --name <buffer_name> --chunker semantic

# Fixed chunking (for unstructured text)
rlm-rs load <file_path> --name <buffer_name> --chunker fixed --chunk-size 6000

# With overlap for continuity
rlm-rs load <file_path> --name <buffer_name> --chunker fixed --chunk-size 6000 --overlap 1000
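
As a rough sanity check before loading, you can estimate how many chunks fixed chunking will produce for a given file. This is an approximation only; the exact boundaries depend on rlm-rs internals.

```python
import math

def estimate_chunk_count(doc_len: int, chunk_size: int, overlap: int = 0) -> int:
    """Approximate chunk count for fixed chunking with overlap.

    Each chunk after the first advances by (chunk_size - overlap)
    characters. An estimate only; rlm-rs may split differently at edges.
    """
    if doc_len <= chunk_size:
        return 1
    step = chunk_size - overlap
    return 1 + math.ceil((doc_len - chunk_size) / step)

# A 100 KB log with --chunk-size 6000 --overlap 1000:
print(estimate_chunk_count(100_000, 6000, 1000))  # 20
```

Larger overlap increases chunk count (and embedding cost) but reduces the chance of splitting a relevant passage across a boundary.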

Step 3: Scout the Content

Examine the beginning and end to understand structure:

# View first 3000 characters
rlm-rs peek <buffer_name> --start 0 --end 3000

# View last 3000 characters
rlm-rs peek <buffer_name> --start -3000

Search for relevant sections:

rlm-rs grep <buffer_name> "<pattern>" --max-matches 20 --window 150

Step 4: Search for Relevant Chunks

Use hybrid semantic + BM25 search to find chunks matching your query:

# Hybrid search (semantic + BM25 with rank fusion)
rlm-rs search "your query" --buffer <buffer_name> --top-k 100

# JSON output for programmatic use
rlm-rs --format json search "your query" --top-k 100

Output includes chunk IDs with relevance scores and document position (index):

{
  "count": 2,
  "mode": "hybrid",
  "query": "your query",
  "results": [
    {"chunk_id": 42, "buffer_id": 1, "index": 5, "score": 0.0328, "semantic_score": 0.0499, "bm25_score": 1.6e-6},
    {"chunk_id": 17, "buffer_id": 1, "index": 2, "score": 0.0323, "semantic_score": 0.0457, "bm25_score": 1.2e-6}
  ]
}
  • index: Sequential position within the document (0-based) - use for temporal ordering
  • buffer_id: Which buffer/document this chunk belongs to

Extract chunk IDs sorted by document position: jq -r '.results | sort_by(.index) | .[].chunk_id'
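
If jq is not available, the same extraction can be sketched in Python over the JSON output shown above (field names taken from the example; the payload here is abridged):

```python
import json

# Abridged `rlm-rs --format json search ...` output, as shown above
raw = '''{
  "count": 2,
  "results": [
    {"chunk_id": 42, "buffer_id": 1, "index": 5, "score": 0.0328},
    {"chunk_id": 17, "buffer_id": 1, "index": 2, "score": 0.0323}
  ]
}'''

data = json.loads(raw)
# Sort by document position (index), then extract chunk IDs
ids = [r["chunk_id"] for r in sorted(data["results"], key=lambda r: r["index"])]
print(ids)  # [17, 42]
```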

Step 5: Retrieve Chunks by ID

Get specific chunk content via pass-by-reference:

# Get chunk content
rlm-rs chunk get 42

# With metadata
rlm-rs --format json chunk get 42 --metadata

Step 6: Subcall Loop (Batched, Parallel)

Only process chunks returned by search. Batch chunk IDs to reduce agent calls:

  1. Search returns chunk IDs with relevance scores and document indices
  2. Sort all chunk IDs by index (document position) to preserve temporal context
  3. Group sorted chunk IDs into batches (default 10, configurable via batch_size argument)
  4. Invoke rlm-subcall agent once per batch using only the two required arguments
  5. Launch batches in parallel via multiple Task calls in one response
  6. Agent handles retrieval internally via rlm-rs chunk get <id> (NO buffer ID needed)
  7. Collect structured JSON findings from all batches

IMPORTANT: Sort chunks by index before batching to preserve document flow. Each subagent should receive chunks in document order (e.g., 3,7,12,15,22 not 22,3,15,7,12). This ensures temporal context is maintained - definitions appear before usages, causes before effects.
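
The sort-then-batch step above can be sketched in Python; the chunk IDs and indices below are illustrative, and batch_size mirrors the default of 10:

```python
def batch_chunk_ids(results, batch_size=10):
    """Sort search results by document position (index), then group
    chunk IDs into batches, one rlm-subcall invocation per batch."""
    ordered = [r["chunk_id"] for r in sorted(results, key=lambda r: r["index"])]
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]

results = [
    {"chunk_id": 22, "index": 9},
    {"chunk_id": 3,  "index": 1},
    {"chunk_id": 15, "index": 7},
    {"chunk_id": 7,  "index": 3},
    {"chunk_id": 12, "index": 5},
]

for batch in batch_chunk_ids(results, batch_size=5):
    # Each batch supplies the chunk_ids argument of one Task prompt
    print(f"chunk_ids='{','.join(map(str, batch))}'")  # chunk_ids='3,7,12,15,22'
```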

CORRECT Task invocation - pass ONLY query and chunk_ids arguments (sorted by index):

Task subagent_type="rlm-rs:rlm-subcall" prompt="query='What errors occurred?' chunk_ids='3,7,12,15,22'"
Task subagent_type="rlm-rs:rlm-subcall" prompt="query='What errors occurred?' chunk_ids='28,31,45'"

CRITICAL - DO NOT:

  • Write narrative prompts - the agent already knows what to do
  • Include buffer ID or buffer NAME anywhere in the prompt
  • Mention the buffer at all - chunk IDs are globally unique across all buffers

WRONG (causes exit code 2):

prompt="Analyze chunks from buffer 1..."  # NO - has buffer ID
prompt="Analyze chunks from buffer 'myfile.txt'..."  # NO - has buffer name
prompt="Use rlm-rs chunk 1 <id>..."  # NO - buffer ID in command
prompt="Use rlm-rs chunk get <id> --buffer x..."  # NO - --buffer flag doesn't exist

RIGHT:

prompt="query='the user question' chunk_ids='5,105,2,3,74'"  # YES - just args!

Step 7: Synthesis

Once all chunks are processed:

  1. Collect all JSON findings from subcall agents
  2. Pass findings directly to rlm-synthesizer agent (no intermediate files)
  3. Present the final synthesized response to the user
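
The collection step can be sketched as flattening per-batch outputs into one findings list. The finding schema here is hypothetical; the actual structure is whatever the rlm-subcall agent emits.

```python
import json

# Hypothetical JSON outputs, one string per rlm-subcall batch
batch_outputs = [
    '[{"chunk_id": 3, "evidence": "connection timeout at 02:14"}]',
    '[{"chunk_id": 22, "evidence": "retry storm after failover"}]',
]

# Flatten all batches into a single findings list for the synthesizer
findings = [f for out in batch_outputs for f in json.loads(out)]
print(json.dumps(findings))
```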

Example Task tool invocation:

Task agent=rlm-synthesizer query="What errors occurred?" findings='[...]' chunk_ids="42,17,23"

Guardrails

  • Never paste large chunks into main context - Use peek/grep to extract only relevant excerpts
  • Keep subagent outputs compact - Request JSON format with short evidence fields
  • Orchestration stays in main conversation - Subagents cannot spawn other subagents
  • State persists in SQLite - All buffers survive across sessions via .rlm/rlm-state.db
  • No file I/O for chunk passing - Use pass-by-reference with chunk IDs

Chunking Strategy Selection

Content Type       | Recommended Strategy
-------------------|----------------------
Markdown docs      | semantic
Source code        | semantic
JSON/XML           | semantic
Plain logs         | fixed with overlap
Unstructured text  | fixed

For detailed chunking guidance, refer to the rlm-chunking skill.

CLI Command Reference

Command        | Purpose
---------------|------------------------------
init           | Initialize database
status         | Show state summary
load           | Load file into buffer
list           | List all buffers
show           | Show buffer details
peek           | View buffer content slice
grep           | Search with regex
search         | Hybrid semantic + BM25 search
chunk get      | Retrieve chunk by ID
chunk list     | List buffer chunks
chunk embed    | Generate embeddings
chunk status   | Show embedding status
write-chunks   | Export chunks to files (legacy)
add-buffer     | Store intermediate results
export-buffers | Export all buffers
var            | Get/set context variables
reset          | Clear all state

Example Session

# 1. Initialize
rlm-rs init

# 2. Load a large log file
rlm-rs load server.log --name logs --chunker fixed --chunk-size 6000 --overlap 500

# 3. Search for relevant chunks
rlm-rs --format json search "database connection errors" --buffer logs --top-k 100

# 4. For each relevant chunk ID, invoke rlm-subcall agent
# 5. Collect JSON findings
# 6. Pass findings to rlm-synthesizer agent
# 7. Present final answer

Additional Resources

Reference Files

  • references/cli-reference.md - Complete CLI documentation

Related Components

  • rlm-subcall agent - Chunk-level analysis (Haiku)
  • rlm-synthesizer agent - Result aggregation (Sonnet)
  • rlm-chunking skill - Chunking strategy selection

Source

git clone https://github.com/zircote/rlm-rs-plugin

The skill definition lives at skills/rlm/SKILL.md in that repository.

Overview

This skill orchestrates the RLM loop to process documents that exceed context window limits using the rlm-rs CLI. It implements the recursive language model pattern (RLM) from arXiv:2512.24601 to analyze content up to 100x larger than typical contexts by chunking, searching, and batched subcalls.

How This Skill Works

It initializes a local RLM database, loads the large file with semantic or fixed chunking (with optional overlap), and uses hybrid semantic + BM25 search to locate relevant chunks. It then retrieves chunk IDs ordered by document position, batches them (default 10) and invokes the rlm-subcall agent once per batch, often in parallel, to drive the recursive LLM loop.

When to Use It

  • When you need to analyze a document that exceeds the model's context window
  • When chunking a large file enables systematic analysis and cross-referencing
  • When applying the RLM pattern to extend reasoning beyond a single pass
  • When you want hybrid semantic + BM25 search to surface relevant chunks
  • When you need batched, parallel subcalls to maintain performance on big data

Quick Start

  1. Install rlm-rs and verify it is on PATH: cargo install rlm-rs; command -v rlm-rs
  2. Initialize the DB and load a file: rlm-rs init; rlm-rs load <path> --name <buf> --chunker semantic|fixed --chunk-size <n> [--overlap <m>]
  3. Search, retrieve chunks by ID, and start the subcall loop: rlm-rs search "query" --buffer <buf> --top-k 100; batch the IDs; invoke rlm-subcall once per batch (in parallel)

Best Practices

  • Use semantic chunking for structured content; prefer fixed chunking for unstructured text
  • Enable overlap between chunks to preserve continuity across boundaries
  • Initialize and verify the RLM database before loading large files
  • Combine semantic and BM25 searches for robust chunk retrieval
  • Tune batch_size to balance latency and throughput during subcall loops

Example Use Cases

  • Analyze a 100+ page report by chunking and querying across sections
  • Extract insights from a lengthy contract or legal document larger than a single context
  • Audit a large codebase or documentation repository with cross-referenced chunks
  • Process a long research paper and generate section-wise summaries
  • Build an RLM-powered QA over a large dataset using batched subcalls
