What storage and indexing does Research Library use?

Research Library stores documents in a local SQLite database (3.45+) and indexes them with an FTS5 virtual table, which gives fast full-text search and works fully offline.
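As a rough illustration of FTS5-backed indexing, the sketch below builds an in-memory FTS5 table with Python's standard sqlite3 module. The table name `docs_fts` and its columns are assumptions made for this example, not Research Library's actual schema, and the code needs a Python build whose SQLite includes FTS5.

```python
import sqlite3


def build_index(rows):
    """Create an in-memory FTS5 index from (title, body) pairs."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: docs_fts(title, body) is illustrative only.
    conn.execute("CREATE VIRTUAL TABLE docs_fts USING fts5(title, body)")
    conn.executemany("INSERT INTO docs_fts (title, body) VALUES (?, ?)", rows)
    return conn


def search_titles(conn, query):
    """Return titles of matching documents, best match first."""
    cur = conn.execute(
        "SELECT title FROM docs_fts WHERE docs_fts MATCH ? ORDER BY rank",
        (query,),
    )
    return [title for (title,) in cur]


conn = build_index([
    ("servo.py", "PID loop for servo tuning on an Arduino"),
    ("spindle.pdf", "CNC spindle speed chart"),
])
print(search_titles(conn, "servo"))
```

Ordering by the built-in `rank` column asks FTS5 to sort by its default relevance score, so no ranking logic lives in the application code.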

How does material-type weighting affect search results?

Material weighting assigns a higher relevance weight to reference material (1.0) than to research items (0.5), so your own work ranks above external research.
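One plausible way to apply these weights is to multiply a base text-match score by the weight for the item's material type. The sketch below assumes exactly that; the `Hit` class, the `relevance` field, and the multiplication rule are hypothetical, and only the 1.0 and 0.5 values come from the text.

```python
from dataclasses import dataclass

# Weights quoted in the text; how they combine with relevance is an assumption.
MATERIAL_WEIGHTS = {"reference": 1.0, "research": 0.5}


@dataclass
class Hit:
    title: str
    material_type: str
    relevance: float  # hypothetical base text-match score, higher is better


def weighted_score(hit):
    """Scale the base relevance by the weight for the hit's material type."""
    return hit.relevance * MATERIAL_WEIGHTS[hit.material_type]


def rank(hits):
    """Sort hits by weighted score, highest first."""
    return sorted(hits, key=weighted_score, reverse=True)


hits = [
    Hit("vendor app note", "research", 0.9),
    Hit("my servo notes", "reference", 0.6),
]
print([h.title for h in rank(hits)])
```

Under this rule a research item needs roughly twice the base relevance of a reference item to outrank it, which matches the stated goal of favoring your own material.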

What is project isolation?

Searches are scoped to a single project, so documents from one project (for example, Arduino) do not appear in results for another (for example, CNC).
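A scoped search can be sketched as a full-text match combined with an ordinary filter on a project column. The example below assumes a hypothetical FTS5 table `docs` with an unindexed `project` column; it is not Research Library's actual schema, and it needs a SQLite build with FTS5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: 'project' is stored but excluded from the text index.
conn.execute(
    "CREATE VIRTUAL TABLE docs USING fts5(title, body, project UNINDEXED)"
)
conn.executemany(
    "INSERT INTO docs (title, body, project) VALUES (?, ?, ?)",
    [
        ("servo.py", "servo tuning sketch", "arduino"),
        ("gantry.md", "servo tuning for the gantry", "cnc"),
    ],
)


def search_in_project(conn, query, project):
    """Full-text search restricted to one project's documents."""
    cur = conn.execute(
        "SELECT title FROM docs WHERE docs MATCH ? AND project = ? ORDER BY rank",
        (query, project),
    )
    return [title for (title,) in cur]


print(search_in_project(conn, "servo tuning", "arduino"))
```

Both rows match the text query, but only the row whose project equals the requested one is returned.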

Research Library

@Jonbuckles

npx machina-cli add skill @Jonbuckles/research-library --openclaw


Research Library Skill

A local-first multimedia research library for capturing, organizing, and searching hardware project knowledge.

What It Does

Store documents — Code, PDFs, CAD files, images, schematics
Extract automatically — Text from PDFs, EXIF from images, functions from code
Search intelligently — Full-text with material-type weighting (your work ranks higher than external research)
Project isolation — Arduino separate from CNC; no contamination
Cross-reference — Link knowledge: "this servo tuning applies to that project"
Async extraction — Searches never block while OCR runs
Backup daily — 30-day rolling snapshots

Installation

clawhub install research-library
# OR
pip install /path/to/research-library

Quick Start

# Initialize database
reslib status

# Add a document to a project
reslib add ~/projects/arduino/servo.py --project arduino --material-type reference

# Search
reslib search "servo tuning"

# Link knowledge
reslib link 5 12 --type applies_to

Features

CLI Commands

reslib add — Import documents (auto-detect + extract)
reslib search — Full-text search with filters
reslib get — View document details
reslib archive / reslib unarchive — Manage documents
reslib export — Export as JSON/Markdown
reslib link — Create document relationships
reslib projects — Manage projects
reslib tags — Manage tags
reslib status — System overview
reslib backup / reslib restore — Snapshots
reslib/smoke_test.sh — Quick validation script (run with bash)

Technical

Storage: SQLite 3.45+ with FTS5 virtual table
Extraction: PDF (pdfplumber + OCR), images (EXIF + OCR), code (AST + regex)
Confidence Scoring: 0.0-1.0 based on quality + source
Material Weighting: Reference (1.0) vs Research (0.5)
Project Isolation: Scoped searches, no contamination
Async Workers: 2-4 configurable extraction workers
Catalog Separation: real_world vs openclaw projects
Backup: Daily snapshots, 30-day retention
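The 30-day retention rule can be pictured as pruning any snapshot older than a cutoff date. The sketch below is a minimal version of that idea; the `library-YYYY-MM-DD.db` naming scheme and the `snapshots_to_prune` helper are assumptions for illustration, since the source does not describe how snapshots are named or removed.

```python
from datetime import date, timedelta


def snapshots_to_prune(names, today, retention_days=30):
    """Return snapshot names dated earlier than today minus retention_days."""
    cutoff = today - timedelta(days=retention_days)
    stale = []
    for name in names:
        # Hypothetical naming scheme: library-YYYY-MM-DD.db
        stamp = name.removeprefix("library-").removesuffix(".db")
        if date.fromisoformat(stamp) < cutoff:
            stale.append(name)
    return stale


names = [
    "library-2024-02-29.db",
    "library-2024-03-01.db",
    "library-2024-03-30.db",
]
print(snapshots_to_prune(names, today=date(2024, 3, 31)))
```

Keeping the function pure (names in, names out) makes the retention rule easy to test before wiring it to real file deletion. It uses `str.removeprefix`, so it needs Python 3.9 or later.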

Configuration

Copy reslib/config.json and customize:

{
  "db_path": "~/.openclaw/research/library.db",
  "num_workers": 2,
  "worker_timeout_sec": 300,
  "max_retries": 3,
  "backup_retention_days": 30,
  "backup_dir": "~/.openclaw/research/backups",
  "file_size_limit_mb": 200,
  "project_size_limit_gb": 2
}
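For scripts that need to read the same file, a loader along the following lines may be useful. It is a sketch only: the `load_config` function is hypothetical, and whether reslib itself merges defaults or expands `~` in paths is an assumption. The default values are the ones shown in the configuration above.

```python
import json
import os
import tempfile

# Defaults copied from the documented config.json.
DEFAULTS = {
    "db_path": "~/.openclaw/research/library.db",
    "num_workers": 2,
    "worker_timeout_sec": 300,
    "max_retries": 3,
    "backup_retention_days": 30,
    "backup_dir": "~/.openclaw/research/backups",
    "file_size_limit_mb": 200,
    "project_size_limit_gb": 2,
}


def load_config(path):
    """Merge a user config file over the defaults and expand '~' in paths."""
    config = dict(DEFAULTS)
    with open(path, encoding="utf-8") as fh:
        config.update(json.load(fh))
    for key in ("db_path", "backup_dir"):
        config[key] = os.path.expanduser(config[key])
    return config


# Usage: a partial config that only raises the worker count.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump({"num_workers": 4}, tmp)

cfg = load_config(tmp.name)
os.unlink(tmp.name)
print(cfg["num_workers"], cfg["backup_retention_days"])
```

Merging over defaults means a customized file only has to list the keys that differ.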

Integration with War Room

Use the RL1 protocol in War Room DNA:

from reslib import ResearchDatabase, ResearchSearch

db = ResearchDatabase()
search = ResearchSearch(db)

# Before researching, check existing knowledge
prior = search.search("servo tuning", project="rc-quadcopter")
if prior:
    print(f"Found {len(prior)} prior items")
else:
    # New research needed; remaining add_research arguments are elided in the original
    db.add_research(title="...", content="...")

Performance

All targets exceeded:

Operation	Target	Actual
PDF extraction	<100ms	20.6ms
Search (50 docs)	<100ms	0.33ms
Worker throughput	>6/sec	414.69/sec

Testing

# Run all tests
pytest tests/

# Quick smoke test
bash reslib/smoke_test.sh

# Performance tests
pytest tests/test_integration.py -v -k stress

Known Limitations (Phase 2)

OCR quality varies on hand-drawn sketches
The FTS5 setup is designed for fewer than 10K documents (a PostgreSQL upgrade path is planned for larger libraries)
No automatic web research gathering (manual only)
Vector embeddings ready but inactive
CAD file parsing is metadata-only

Documentation

See /docs/:

CLI-REFERENCE.md — All commands + examples
EXTRACTION-GUIDE.md — How extraction works
SEARCH-GUIDE.md — Ranking + weighting
WORKER-GUIDE.md — Async queue details
INTEGRATION.md — War room RL1 protocol

Phase 2 Roadmap

Real-world PDF calibration
FTS5 scaling tests (10K docs)
Auto-detection (reference vs research)
Web research enrichment
Vector embeddings (semantic search)
PostgreSQL upgrade path

Building From Source

cd research-library
pip install -e .
pytest tests/
python -m reslib status

Support

Issues? See TECHNICAL-NOTES.md for troubleshooting.

Production-ready MVP. 214 tests passing. 15K lines. Ready to use.

Source

git clone https://clawhub.ai/Jonbuckles/research-library

Overview

A local-first multimedia knowledge base for hardware projects that captures code, CAD, PDFs, and images. It automatically extracts text, EXIF data, and code structures, and offers smart search with material-type weighting. Projects stay isolated but can be cross-referenced, with asynchronous processing and daily backups.

How This Skill Works

Data is stored in SQLite with FTS5 for fast full-text search. An extraction pipeline handles PDFs (text and OCR), images (EXIF and OCR), and code (AST and regex). Searches apply material-type weighting and are scoped to a project; extraction runs asynchronously on a configurable number of workers, and daily backup snapshots preserve history.
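The non-blocking extraction described above can be approximated with a small worker pool. The sketch below uses Python's `concurrent.futures.ThreadPoolExecutor` as a stand-in; Research Library's real queue may work differently, and `extract_text` is a hypothetical placeholder that only sleeps to simulate slow OCR.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def extract_text(name):
    """Stand-in for slow extraction such as OCR; hypothetical, just sleeps."""
    time.sleep(0.05)
    return name, f"text extracted from {name}"


def extract_all(names, num_workers=2):
    """Run extraction on a pool of workers and collect the results by name."""
    index = {}
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [pool.submit(extract_text, n) for n in names]
        # submit() returns immediately, so a caller could keep serving
        # searches here while the workers run in the background.
        for fut in futures:
            name, text = fut.result()
            index[name] = text
    return index


result = extract_all(["a.pdf", "b.png", "c.py"], num_workers=2)
print(sorted(result))
```

The `num_workers` parameter mirrors the config key of the same name; raising it trades memory and CPU for extraction throughput.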

When to Use It

When organizing hardware project assets (code, CAD, PDFs, images) across multiple projects
When you need cross-referenced knowledge that links related items
When you require non-blocking extraction so searches stay responsive
When you manage backups with daily snapshots and 30-day retention
When you want material-type weighting to prioritize your own references over external research

Quick Start

Step 1: Initialize database — reslib status
Step 2: Add a document to a project — reslib add ~/projects/arduino/servo.py --project arduino --material-type reference
Step 3: Search and link — reslib search "servo tuning"; reslib link 5 12 --type applies_to

Best Practices

Tag imported items with material-type and project identifiers
Keep projects self-contained to maximize isolation
Use reslib link to create explicit relationships between documents
Run regular backups and check that backup_retention_days matches how far back you need to restore
Tune num_workers and file_size_limit_mb in config to balance speed and resources

Example Use Cases

Organize an Arduino servo project with code, CAD, and manuals
Store CNC project sheets and CAD drawings with cross-referenced tuning notes
Archive reference materials for a quadcopter build and relate servo tuning to other projects
Capture firmware code and manuals, with OCR for scanned or handwritten notes
Backup and restore with daily snapshots during a hardware prototyping sprint
