# rlm

Install via:

```shell
npx machina-cli add skill zircote/rlm-rs-plugin/rlm --openclaw
```

## RLM (Recursive Language Model) Workflow

Orchestrate processing of documents that exceed context window limits using the rlm-rs CLI tool. This skill implements the RLM pattern from arXiv:2512.24601, enabling analysis of content up to 100x larger than typical context windows.
## Architecture Mapping

| RLM Concept | Implementation |
|---|---|
| Root LLM | Main Claude Code conversation (Opus/Sonnet) |
| Sub-LLM (llm_query) | rlm-subcall agent (Haiku) |
| External Environment | rlm-rs CLI with SQLite storage |
## Prerequisites

Verify rlm-rs is installed and available:

```shell
command -v rlm-rs >/dev/null 2>&1 || echo "INSTALL REQUIRED: cargo install rlm-rs"
```

Installation options:

```shell
# Via Cargo (recommended)
cargo install rlm-rs

# Via Homebrew
brew install zircote/tap/rlm-rs
```
## Workflow Steps

### Step 1: Initialize Database

Create or verify the RLM database:

```shell
rlm-rs init
rlm-rs status
```

If already initialized, `status` shows current buffers and state.
### Step 2: Load Context File

Load the large document into a buffer with appropriate chunking:

```shell
# Semantic chunking (recommended for structured content)
rlm-rs load <file_path> --name <buffer_name> --chunker semantic

# Fixed chunking (for unstructured text)
rlm-rs load <file_path> --name <buffer_name> --chunker fixed --chunk-size 6000

# With overlap for continuity
rlm-rs load <file_path> --name <buffer_name> --chunker fixed --chunk-size 6000 --overlap 1000
```
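The choice of flags can be scripted. Below is a minimal sketch of a hypothetical helper (not part of rlm-rs) that maps common file extensions to the flag combinations shown above, following the chunking-strategy table later in this skill:

```shell
#!/bin/sh
# Hypothetical helper: choose rlm-rs load flags from the file extension.
# The extension mapping is an illustrative assumption, not rlm-rs behavior.
pick_chunker() {
  case "$1" in
    *.md|*.rs|*.py|*.json|*.xml) echo "--chunker semantic" ;;
    *.log)                       echo "--chunker fixed --chunk-size 6000 --overlap 1000" ;;
    *)                           echo "--chunker fixed --chunk-size 6000" ;;
  esac
}

pick_chunker server.log
# --chunker fixed --chunk-size 6000 --overlap 1000
```

It could then be used as `rlm-rs load server.log --name logs $(pick_chunker server.log)`.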
### Step 3: Scout the Content

Examine the beginning and end to understand structure:

```shell
# View first 3000 characters
rlm-rs peek <buffer_name> --start 0 --end 3000

# View last 3000 characters
rlm-rs peek <buffer_name> --start -3000
```

Search for relevant sections:

```shell
rlm-rs grep <buffer_name> "<pattern>" --max-matches 20 --window 150
```
### Step 4: Search for Relevant Chunks

Use hybrid semantic + BM25 search to find chunks matching your query:

```shell
# Hybrid search (semantic + BM25 with rank fusion)
rlm-rs search "your query" --buffer <buffer_name> --top-k 100

# JSON output for programmatic use
rlm-rs --format json search "your query" --top-k 100
```
Output includes chunk IDs with relevance scores and document position (index):

```json
{
  "count": 2,
  "mode": "hybrid",
  "query": "your query",
  "results": [
    {"chunk_id": 42, "buffer_id": 1, "index": 5, "score": 0.0328, "semantic_score": 0.0499, "bm25_score": 1.6e-6},
    {"chunk_id": 17, "buffer_id": 1, "index": 2, "score": 0.0323, "semantic_score": 0.0457, "bm25_score": 1.2e-6}
  ]
}
```

- `index`: Sequential position within the document (0-based); use for temporal ordering
- `buffer_id`: Which buffer/document this chunk belongs to

Extract chunk IDs sorted by document position:

```shell
jq -r '.results | sort_by(.index) | .[].chunk_id'
```
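If jq is unavailable, the same ordering can be done with POSIX tools. A sketch, assuming the search results have already been reduced to `chunk_id index` pairs, one per line:

```shell
#!/bin/sh
# Sketch: sort "chunk_id index" pairs by document index (column 2) and
# emit a comma-separated chunk_id list, mirroring the jq pipeline above.
sort_ids_by_index() {
  sort -k2,2n | awk '{ids = ids (ids ? "," : "") $1} END {print ids}'
}

# Pairs taken from the example search output above
# (chunk 42 at index 5, chunk 17 at index 2):
printf '%s\n' "42 5" "17 2" | sort_ids_by_index
# 17,42
```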
### Step 5: Retrieve Chunks by ID

Get specific chunk content via pass-by-reference:

```shell
# Get chunk content
rlm-rs chunk get 42

# With metadata
rlm-rs --format json chunk get 42 --metadata
```
### Step 6: Subcall Loop (Batched, Parallel)

Only process chunks returned by search. Batch chunk IDs to reduce agent calls:

- Search returns chunk IDs with relevance scores and document indices
- Sort all chunk IDs by `index` (document position) to preserve temporal context
- Group sorted chunk IDs into batches (default 10, configurable via the `batch_size` argument)
- Invoke the rlm-subcall agent once per batch using only the two required arguments
- Launch batches in parallel via multiple Task calls in one response
- The agent handles retrieval internally via `rlm-rs chunk get <id>` (NO buffer ID needed)
- Collect structured JSON findings from all batches

IMPORTANT: Sort chunks by `index` before batching to preserve document flow. Each subagent should receive chunks in document order (e.g., 3,7,12,15,22, not 22,3,15,7,12). This ensures temporal context is maintained: definitions appear before usages, causes before effects.
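The sort-then-batch step can be sketched in shell. The batch size is shortened from the default 10 to 3 here so the split is visible:

```shell
#!/bin/sh
# Sketch: split an index-sorted, comma-separated chunk ID list into
# batches of a given size, one batch per output line. Each line is then
# suitable as the chunk_ids argument of one rlm-subcall invocation.
batch_ids() {
  printf '%s\n' "$1" | tr ',' '\n' | awk -v n="$2" '
    { b = b (b ? "," : "") $0; if (NR % n == 0) { print b; b = "" } }
    END { if (b) print b }'
}

batch_ids "3,7,12,15,22" 3
# 3,7,12
# 15,22
```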
CORRECT Task invocation: pass ONLY `query` and `chunk_ids` arguments (sorted by index):

```
Task subagent_type="rlm-rs:rlm-subcall" prompt="query='What errors occurred?' chunk_ids='3,7,12,15,22'"
Task subagent_type="rlm-rs:rlm-subcall" prompt="query='What errors occurred?' chunk_ids='28,31,45'"
```

CRITICAL - DO NOT:

- Write narrative prompts; the agent already knows what to do
- Include buffer ID or buffer NAME anywhere in the prompt
- Mention the buffer at all; chunk IDs are globally unique across all buffers

WRONG (causes exit code 2):

```
prompt="Analyze chunks from buffer 1..."             # NO - has buffer ID
prompt="Analyze chunks from buffer 'myfile.txt'..."  # NO - has buffer name
prompt="Use rlm-rs chunk 1 <id>..."                  # NO - buffer ID in command
prompt="Use rlm-rs chunk get <id> --buffer x..."     # NO - --buffer flag doesn't exist
```

RIGHT:

```
prompt="query='the user question' chunk_ids='5,105,2,3,74'"  # YES - just args!
```
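To keep prompts in the exact two-argument shape above, the string can be assembled mechanically. A hypothetical helper, shown only to make the format concrete:

```shell
#!/bin/sh
# Hypothetical helper: build an rlm-subcall prompt from a query and an
# index-sorted chunk ID list -- only the two required arguments, with no
# buffer ID or buffer name anywhere in the string.
subcall_prompt() {
  printf "query='%s' chunk_ids='%s'" "$1" "$2"
}

subcall_prompt "What errors occurred?" "3,7,12,15,22"
# query='What errors occurred?' chunk_ids='3,7,12,15,22'
```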
### Step 7: Synthesis

Once all chunks are processed:

- Collect all JSON findings from subcall agents
- Pass findings directly to the rlm-synthesizer agent (no intermediate files)
- Present the final synthesized response to the user

Example Task tool invocation:

```
Task agent=rlm-synthesizer query="What errors occurred?" findings='[...]' chunk_ids="42,17,23"
```
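Merging the per-batch findings into the single array the synthesizer expects can be sketched without intermediate files. This sketch assumes each batch returned a well-formed JSON array and does plain string splicing (jq would be the robust alternative):

```shell
#!/bin/sh
# Sketch: splice several JSON array strings (one per subcall batch) into
# one array for the rlm-synthesizer findings argument. Pure string
# handling -- assumes each input is already a valid JSON array.
merge_findings() {
  out=""
  for arr in "$@"; do
    inner=${arr#\[}       # drop leading [
    inner=${inner%\]}     # drop trailing ]
    [ -n "$inner" ] && out="${out:+$out,}$inner"
  done
  printf '[%s]' "$out"
}

merge_findings '[{"chunk_id":3}]' '[{"chunk_id":7},{"chunk_id":12}]'
# [{"chunk_id":3},{"chunk_id":7},{"chunk_id":12}]
```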
## Guardrails

- Never paste large chunks into main context: use peek/grep to extract only relevant excerpts
- Keep subagent outputs compact: request JSON format with short evidence fields
- Orchestration stays in the main conversation: subagents cannot spawn other subagents
- State persists in SQLite: all buffers survive across sessions via `.rlm/rlm-state.db`
- No file I/O for chunk passing: use pass-by-reference with chunk IDs
## Chunking Strategy Selection
| Content Type | Recommended Strategy |
|---|---|
| Markdown docs | semantic |
| Source code | semantic |
| JSON/XML | semantic |
| Plain logs | fixed with overlap |
| Unstructured text | fixed |
For detailed chunking guidance, refer to the rlm-chunking skill.
## CLI Command Reference

| Command | Purpose |
|---|---|
| init | Initialize database |
| status | Show state summary |
| load | Load file into buffer |
| list | List all buffers |
| show | Show buffer details |
| peek | View buffer content slice |
| grep | Search with regex |
| search | Hybrid semantic + BM25 search |
| chunk get | Retrieve chunk by ID |
| chunk list | List buffer chunks |
| chunk embed | Generate embeddings |
| chunk status | Show embedding status |
| write-chunks | Export chunks to files (legacy) |
| add-buffer | Store intermediate results |
| export-buffers | Export all buffers |
| var | Get/set context variables |
| reset | Clear all state |
## Example Session

```shell
# 1. Initialize
rlm-rs init

# 2. Load a large log file
rlm-rs load server.log --name logs --chunker fixed --chunk-size 6000 --overlap 500

# 3. Search for relevant chunks
rlm-rs --format json search "database connection errors" --buffer logs --top-k 100

# 4. For each relevant chunk ID, invoke rlm-subcall agent
# 5. Collect JSON findings
# 6. Pass findings to rlm-synthesizer agent
# 7. Present final answer
```
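For review before running, the session above can be emitted as a dry-run plan. A hypothetical wrapper that only prints the commands; the flag spellings mirror the examples in this skill:

```shell
#!/bin/sh
# Hypothetical dry-run wrapper: print the rlm-rs commands for steps 1-3
# of the example session without executing them.
plan_session() {
  file=$1; buf=$2; query=$3
  printf '%s\n' \
    "rlm-rs init" \
    "rlm-rs load $file --name $buf --chunker fixed --chunk-size 6000 --overlap 500" \
    "rlm-rs --format json search \"$query\" --buffer $buf --top-k 100"
}

plan_session server.log logs "database connection errors"
```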
## Additional Resources

### Reference Files

- `references/cli-reference.md` - Complete CLI documentation

### Related Components

- rlm-subcall agent - Chunk-level analysis (Haiku)
- rlm-synthesizer agent - Result aggregation (Sonnet)
- rlm-chunking skill - Chunking strategy selection

### Source

```shell
git clone https://github.com/zircote/rlm-rs-plugin
```

## Overview
This skill orchestrates the RLM loop to process documents that exceed context window limits using the rlm-rs CLI. It implements the recursive language model (RLM) pattern from arXiv:2512.24601 to analyze content up to 100x larger than typical contexts through chunking, searching, and batched subcalls.
## How This Skill Works
It initializes a local RLM database, loads the large file with semantic or fixed chunking (with optional overlap), and uses hybrid semantic + BM25 search to locate relevant chunks. It then retrieves chunk IDs ordered by document position, batches them (default 10) and invokes the rlm-subcall agent once per batch, often in parallel, to drive the recursive LLM loop.
## When to Use It
- When you need to analyze a document that exceeds the model's context window
- When chunking a large file enables systematic analysis and cross-referencing
- When applying the RLM pattern to extend reasoning beyond a single pass
- When you want hybrid semantic + BM25 search to surface relevant chunks
- When you need batched, parallel subcalls to maintain performance on big data
## Quick Start

- Step 1: Install rlm-rs and verify: `cargo install rlm-rs`; `command -v rlm-rs`
- Step 2: Initialize the DB and load a file: `rlm-rs init`; `rlm-rs load <path> --name <buf> --chunker semantic|fixed --chunk-size <n> [--overlap <m>]`
- Step 3: Run a search, get chunks by ID, and start the subcall loop: `rlm-rs search "query" --buffer <buf> --top-k 100`; batch the IDs; invoke rlm-subcall once per batch (in parallel)
## Best Practices
- Use semantic chunking for structured content; prefer fixed chunking for unstructured text
- Enable overlap between chunks to preserve continuity across boundaries
- Initialize and verify the RLM database before loading large files
- Combine semantic and BM25 searches for robust chunk retrieval
- Tune `batch_size` to balance latency and throughput during subcall loops
## Example Use Cases
- Analyze a 100+ page report by chunking and querying across sections
- Extract insights from a lengthy contract or legal document larger than a single context
- Audit a large codebase or documentation repository with cross-referenced chunks
- Process a long research paper and generate section-wise summaries
- Build an RLM-powered QA over a large dataset using batched subcalls