
Research Library

@Jonbuckles

npx machina-cli add skill @Jonbuckles/research-library --openclaw

Research Library Skill

A local-first multimedia research library for capturing, organizing, and searching hardware project knowledge.

What It Does

  • Store documents — Code, PDFs, CAD files, images, schematics
  • Extract automatically — Text from PDFs, EXIF from images, functions from code
  • Search intelligently — Full-text with material-type weighting (your work ranks higher than external research)
  • Project isolation — Arduino separate from CNC; no contamination
  • Cross-reference — Link knowledge: "this servo tuning applies to that project"
  • Async extraction — Searches never block while OCR runs
  • Backup daily — 30-day rolling snapshots

Installation

clawhub install research-library
# OR
pip install /path/to/research-library

Quick Start

# Initialize database
reslib status

# Add a document to a project
reslib add ~/projects/arduino/servo.py --project arduino --material-type reference

# Search
reslib search "servo tuning"

# Link knowledge
reslib link 5 12 --type applies_to
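
To illustrate what a typed link such as `reslib link 5 12 --type applies_to` might record, here is a minimal sketch using Python's built-in sqlite3 module. The `documents` and `links` tables, their columns, and the helper functions are hypothetical and are not taken from reslib's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per document, one row per typed link.
conn.executescript("""
CREATE TABLE documents (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    project TEXT NOT NULL
);
CREATE TABLE links (
    source_id INTEGER NOT NULL REFERENCES documents(id),
    target_id INTEGER NOT NULL REFERENCES documents(id),
    link_type TEXT NOT NULL,
    PRIMARY KEY (source_id, target_id, link_type)
);
""")
conn.executemany(
    "INSERT INTO documents (id, title, project) VALUES (?, ?, ?)",
    [(5, "Servo tuning notes", "arduino"), (12, "Gimbal build log", "rc-quadcopter")],
)

def link(conn, source_id, target_id, link_type):
    """Record a typed relationship; repeating the same link is a no-op."""
    conn.execute(
        "INSERT OR IGNORE INTO links (source_id, target_id, link_type) VALUES (?, ?, ?)",
        (source_id, target_id, link_type),
    )

def linked_from(conn, source_id):
    """Return (title, project, link_type) for every document linked from source_id."""
    return conn.execute(
        "SELECT d.title, d.project, l.link_type "
        "FROM links AS l JOIN documents AS d ON d.id = l.target_id "
        "WHERE l.source_id = ? ORDER BY d.id",
        (source_id,),
    ).fetchall()

link(conn, 5, 12, "applies_to")
link(conn, 5, 12, "applies_to")  # duplicate, ignored by the primary key
related = linked_from(conn, 5)
print(related)
```

Keeping links in their own table lets documents from different projects reference each other without merging the projects themselves.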

Features

CLI Commands

  • reslib add — Import documents (auto-detect + extract)
  • reslib search — Full-text search with filters
  • reslib get — View document details
  • reslib archive / reslib unarchive — Manage documents
  • reslib export — Export as JSON/Markdown
  • reslib link — Create document relationships
  • reslib projects — Manage projects
  • reslib tags — Manage tags
  • reslib status — System overview
  • reslib backup / reslib restore — Snapshots
  • bash reslib/smoke_test.sh — Quick validation

Technical

  • Storage: SQLite 3.45+ with FTS5 virtual table
  • Extraction: PDF (pdfplumber + OCR), images (EXIF + OCR), code (AST + regex)
  • Confidence Scoring: 0.0-1.0 based on quality + source
  • Material Weighting: Reference (1.0) vs Research (0.5)
  • Project Isolation: Scoped searches, no contamination
  • Async Workers: 2-4 configurable extraction workers
  • Catalog Separation: real_world vs openclaw projects
  • Backup: Daily snapshots, 30-day retention
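
As a rough illustration of how material weighting and project scoping could sit on top of an FTS5 table, the sketch below uses Python's sqlite3 module (it assumes the bundled SQLite was built with FTS5, which is common but not guaranteed). The table layout, the `weighted_search` helper, and the way the weight is multiplied into the BM25 score are assumptions for illustration; reslib's real ranking may differ.

```python
import sqlite3

# Weights mirror the Reference (1.0) vs Research (0.5) split listed above.
MATERIAL_WEIGHTS = {"reference": 1.0, "research": 0.5}

def build_demo_db():
    """Create an in-memory FTS5 index holding a few illustrative documents."""
    conn = sqlite3.connect(":memory:")
    # project and material_type are stored but not tokenized (UNINDEXED).
    conn.execute(
        "CREATE VIRTUAL TABLE docs USING fts5("
        "title, body, project UNINDEXED, material_type UNINDEXED)"
    )
    conn.executemany(
        "INSERT INTO docs (title, body, project, material_type) VALUES (?, ?, ?, ?)",
        [
            ("Bench notes", "servo tuning gains for the pan axis", "arduino", "reference"),
            ("Vendor app note", "servo tuning gains for the pan axis", "arduino", "research"),
            ("Spindle log", "servo tuning on the CNC spindle", "cnc", "reference"),
        ],
    )
    return conn

def weighted_search(conn, query, project):
    """Return (title, score) pairs for a single project, best match first."""
    rows = conn.execute(
        "SELECT title, material_type, bm25(docs) FROM docs "
        "WHERE docs MATCH ? AND project = ?",
        (query, project),
    ).fetchall()
    # bm25() is lower-is-better (negative for matches), so negate before weighting.
    scored = [
        (title, -rank * MATERIAL_WEIGHTS.get(material_type, 0.5))
        for title, material_type, rank in rows
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

conn = build_demo_db()
results = weighted_search(conn, "servo tuning", "arduino")
for title, score in results:
    print(title, score)
```

The two Arduino documents have the same body text, so the reference copy outranks the research copy mainly because of its weight, and the CNC document never appears because the query is scoped to one project.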

Configuration

Copy reslib/config.json and customize:

{
  "db_path": "~/.openclaw/research/library.db",
  "num_workers": 2,
  "worker_timeout_sec": 300,
  "max_retries": 3,
  "backup_retention_days": 30,
  "backup_dir": "~/.openclaw/research/backups",
  "file_size_limit_mb": 200,
  "project_size_limit_gb": 2
}
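
A minimal sketch of how such a file could be merged over defaults in Python follows. The `load_config` helper is hypothetical and not part of reslib; rejecting unknown keys and treating the 2-4 worker range as a hard limit are assumptions made here for illustration.

```python
import json
import os
import tempfile
from pathlib import Path

# Defaults copied from the sample config above.
DEFAULTS = {
    "db_path": "~/.openclaw/research/library.db",
    "num_workers": 2,
    "worker_timeout_sec": 300,
    "max_retries": 3,
    "backup_retention_days": 30,
    "backup_dir": "~/.openclaw/research/backups",
    "file_size_limit_mb": 200,
    "project_size_limit_gb": 2,
}

def load_config(path):
    """Merge a JSON config file over the defaults and expand '~' in paths."""
    config = dict(DEFAULTS)
    config_file = Path(path)
    if config_file.exists():
        overrides = json.loads(config_file.read_text())
        unknown = set(overrides) - set(DEFAULTS)
        if unknown:
            # Assumption: a misspelled key is more likely a typo than intent.
            raise ValueError(f"Unknown config keys: {sorted(unknown)}")
        config.update(overrides)
    # Assumption: the documented 2-4 worker range is enforced as a hard limit.
    if not 2 <= config["num_workers"] <= 4:
        raise ValueError("num_workers must be between 2 and 4")
    for key in ("db_path", "backup_dir"):
        config[key] = os.path.expanduser(config[key])
    return config

# Usage: override one key and keep the rest at their defaults.
with tempfile.TemporaryDirectory() as tmp:
    custom = Path(tmp) / "config.json"
    custom.write_text(json.dumps({"num_workers": 4}))
    cfg = load_config(custom)

print(cfg["num_workers"], cfg["backup_retention_days"])
```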

Integration with War Room

Use the RL1 protocol in war room DNA:

from reslib import ResearchDatabase, ResearchSearch

db = ResearchDatabase()
search = ResearchSearch(db)

# Before researching, check existing knowledge
prior = search.search("servo tuning", project="rc-quadcopter")
if prior:
    print(f"Found {len(prior)} prior items")
else:
    # New research needed; record it so later searches find it
    db.add_research(title="...", content="...")  # plus any further fields

Performance

All targets exceeded:

Operation           Target    Actual
PDF extraction      <100ms    20.6ms
Search (50 docs)    <100ms    0.33ms
Worker throughput   >6/sec    414.69/sec

Testing

# Run all tests
pytest tests/

# Quick smoke test
bash reslib/smoke_test.sh

# Performance tests
pytest tests/test_integration.py -v -k stress

Known Limitations (Phase 2)

  • OCR quality varies on hand-drawn sketches
  • The FTS5 index is sized for fewer than 10K documents (PostgreSQL is the planned path for larger libraries)
  • No automatic web research gathering (manual only)
  • Vector embeddings ready but inactive
  • CAD file parsing is metadata-only

Documentation

See /docs/:

  • CLI-REFERENCE.md — All commands + examples
  • EXTRACTION-GUIDE.md — How extraction works
  • SEARCH-GUIDE.md — Ranking + weighting
  • WORKER-GUIDE.md — Async queue details
  • INTEGRATION.md — War room RL1 protocol

Phase 2 Roadmap

  • Real-world PDF calibration
  • FTS5 scaling tests (10K docs)
  • Auto-detection (reference vs research)
  • Web research enrichment
  • Vector embeddings (semantic search)
  • PostgreSQL upgrade path

Building From Source

cd research-library
pip install -e .
pytest tests/
python -m reslib status

Support

Issues? See TECHNICAL-NOTES.md for troubleshooting.


Production-ready MVP. 214 tests passing. 15K lines. Ready to use.

Source

git clone https://clawhub.ai/Jonbuckles/research-library

Overview

A local-first multimedia knowledge base for hardware projects that captures code, CAD, PDFs, and images. It automatically extracts text, EXIF data, and code structures, and offers smart search with material-type weighting. Projects stay isolated but can be cross-referenced, with asynchronous processing and daily backups.

How This Skill Works

Data is stored in SQLite with FTS5 for fast full-text search. An extraction pipeline handles PDFs (text and OCR), images (EXIF and OCR), and code (AST and regex). Searches use material-weighting and scoped project isolation; extraction runs asynchronously with configurable workers, and daily backup snapshots preserve history.
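
The asynchronous extraction described above can be pictured as a small worker pool fed by a queue. The sketch below uses only the Python standard library; `extract_text`, `worker`, and `run_extraction` are hypothetical stand-ins and not reslib's worker code, and the retry handling is an assumption based on the `max_retries` config key.

```python
import queue
import threading

def extract_text(path):
    """Stand-in for a slow extractor (PDF parsing, OCR, and so on)."""
    if path.endswith(".bad"):
        raise ValueError("unreadable file")
    return f"text from {path}"

def worker(jobs, results, max_retries):
    """Pull paths off the queue until a None sentinel arrives."""
    while True:
        path = jobs.get()
        if path is None:
            jobs.task_done()
            return
        for attempt in range(1, max_retries + 1):
            try:
                results[path] = ("done", extract_text(path))
                break
            except ValueError as exc:
                # Record the failure; the loop retries up to max_retries times.
                results[path] = ("failed", f"{exc} (attempt {attempt})")
        jobs.task_done()

def run_extraction(paths, num_workers=2, max_retries=3):
    """Queue every path, let the workers drain the queue, and return the results."""
    jobs = queue.Queue()
    results = {}
    threads = [
        threading.Thread(target=worker, args=(jobs, results, max_retries), daemon=True)
        for _ in range(num_workers)
    ]
    for thread in threads:
        thread.start()
    for path in paths:
        jobs.put(path)  # returns immediately, so the caller is not blocked
    for _ in threads:
        jobs.put(None)  # one shutdown sentinel per worker
    jobs.join()  # the demo waits here only so the output is deterministic
    return results

results = run_extraction(["servo.pdf", "scan.bad", "notes.png"])
print(results)
```

In a long-running service the final `jobs.join()` would be dropped, so searches keep running against whatever has already been extracted while the workers continue in the background.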

When to Use It

  • When organizing hardware project assets (code, CAD, PDFs, images) across multiple projects
  • When you need cross-referenced knowledge that links related items
  • When you require non-blocking extraction so searches stay responsive
  • When you manage backups with daily snapshots and 30-day retention
  • When you want material-type weighting to prioritize your own references over external research

Quick Start

  1. Initialize the database: reslib status
  2. Add a document to a project: reslib add ~/projects/arduino/servo.py --project arduino --material-type reference
  3. Search and link: reslib search "servo tuning"; reslib link 5 12 --type applies_to

Best Practices

  • Tag imported items with material-type and project identifiers
  • Keep projects self-contained to maximize isolation
  • Use reslib link to create explicit relationships between documents
  • Run regular backups and keep backup_retention_days matched to how far back you need to restore
  • Tune num_workers and file_size_limit_mb in config to balance speed and resources
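
To make the retention setting concrete, here is a small sketch of how a rolling window of daily snapshots could be pruned. The `library-YYYY-MM-DD.db` file naming and the `prune_snapshots` helper are assumptions for illustration; reslib's actual backup layout is not documented here.

```python
import tempfile
from datetime import date, timedelta
from pathlib import Path

PREFIX = "library-"  # hypothetical snapshot naming: library-YYYY-MM-DD.db

def prune_snapshots(backup_dir, today, retention_days=30):
    """Delete snapshots older than the retention window; return the removed names."""
    cutoff = today - timedelta(days=retention_days)
    removed = []
    for snapshot in sorted(Path(backup_dir).glob(f"{PREFIX}*.db")):
        stamp = snapshot.stem[len(PREFIX):]
        try:
            taken = date.fromisoformat(stamp)
        except ValueError:
            continue  # not a dated snapshot, so leave it alone
        if taken < cutoff:
            snapshot.unlink()
            removed.append(snapshot.name)
    return removed

# Usage: create snapshots of various ages, then prune with a 30-day window.
with tempfile.TemporaryDirectory() as tmp:
    today = date(2024, 3, 31)
    for age in (0, 29, 30, 31, 45):
        day = today - timedelta(days=age)
        (Path(tmp) / f"{PREFIX}{day.isoformat()}.db").touch()
    removed = prune_snapshots(tmp, today)
    kept = sorted(p.name for p in Path(tmp).iterdir())

print(removed)
print(kept)
```

Passing `today` in explicitly, instead of calling `date.today()` inside the function, keeps the pruning logic easy to test.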

Example Use Cases

  • Organize an Arduino servo project with code, CAD, and manuals
  • Store CNC project sheets and CAD drawings with cross-referenced tuning notes
  • Archive reference materials for a quadcopter build and relate servo tuning to other projects
  • Capture firmware code and manuals, and run OCR on handwritten notes
  • Backup and restore with daily snapshots during a hardware prototyping sprint
