Research Library
@Jonbuckles

```bash
npx machina-cli add skill @Jonbuckles/research-library --openclaw
```

Research Library Skill
A local-first multimedia research library for capturing, organizing, and searching hardware project knowledge.
What It Does
- Store documents — Code, PDFs, CAD files, images, schematics
- Extract automatically — Text from PDFs, EXIF from images, functions from code
- Search intelligently — Full-text with material-type weighting (your work ranks higher than external research)
- Project isolation — Arduino separate from CNC; no contamination
- Cross-reference — Link knowledge: "this servo tuning applies to that project"
- Async extraction — Searches never block while OCR runs
- Back up daily — 30-day rolling snapshots
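Code extraction is described as AST + regex. Here is a minimal sketch of the AST half. It assumes Python source files and uses the standard-library `ast` module to list every function definition with its line number. `extract_functions` is a hypothetical helper, not part of reslib's documented API. Other languages would need their own parsers.

```python
import ast


def extract_functions(source: str) -> list:
    """Return (name, line number) pairs for every function in Python source.

    Hypothetical helper for illustration; reslib's real extractor may differ.
    """
    tree = ast.parse(source)
    found = []
    for node in ast.walk(tree):
        # Count both regular and async function definitions, including methods.
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            found.append((node.name, node.lineno))
    # ast.walk is breadth-first, so sort by line number for a stable order.
    return sorted(found, key=lambda item: item[1])


if __name__ == "__main__":
    sample = '''
def set_angle(deg):
    return deg

class Servo:
    async def sweep(self):
        pass
'''
    print(extract_functions(sample))
```

The walker also finds methods nested inside classes, which matters for hardware drivers that are usually written as classes.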
Installation
```bash
clawhub install research-library
# OR
pip install /path/to/research-library
```
Quick Start
```bash
# Initialize database
reslib status
# Add a document to a project
reslib add ~/projects/arduino/servo.py --project arduino --material-type reference
# Search
reslib search "servo tuning"
# Link knowledge
reslib link 5 12 --type applies_to
```
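The `reslib link 5 12 --type applies_to` step records a typed relationship between two documents, presumably identified by ID. One way to store such links in SQLite is a table keyed on (source, target, type). The schema and the `link` / `linked_titles` helpers below are hypothetical illustrations, not reslib's actual schema.

```python
import sqlite3
from typing import List


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: documents plus typed, directed links between them.
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS documents (
            id INTEGER PRIMARY KEY,
            title TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS links (
            source_id INTEGER NOT NULL REFERENCES documents(id),
            target_id INTEGER NOT NULL REFERENCES documents(id),
            link_type TEXT NOT NULL,
            PRIMARY KEY (source_id, target_id, link_type)
        );
        """
    )


def link(conn: sqlite3.Connection, source_id: int, target_id: int, link_type: str) -> None:
    # INSERT OR IGNORE turns a repeated identical link into a no-op.
    conn.execute(
        "INSERT OR IGNORE INTO links (source_id, target_id, link_type) VALUES (?, ?, ?)",
        (source_id, target_id, link_type),
    )


def linked_titles(conn: sqlite3.Connection, source_id: int, link_type: str) -> List[str]:
    # Follow outgoing links of one type and return the target documents' titles.
    rows = conn.execute(
        """
        SELECT d.title FROM links AS l
        JOIN documents AS d ON d.id = l.target_id
        WHERE l.source_id = ? AND l.link_type = ?
        ORDER BY d.id
        """,
        (source_id, link_type),
    )
    return [title for (title,) in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    conn.executemany(
        "INSERT INTO documents (id, title) VALUES (?, ?)",
        [(5, "servo tuning notes"), (12, "rc-quadcopter build log")],
    )
    link(conn, 5, 12, "applies_to")
    print(linked_titles(conn, 5, "applies_to"))
```

The composite primary key allows the same pair of documents to carry several relationship types while still rejecting exact duplicates.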
Features
CLI Commands
- reslib add — Import documents (auto-detect + extract)
- reslib search — Full-text search with filters
- reslib get — View document details
- reslib archive / reslib unarchive — Archive and unarchive documents
- reslib export — Export as JSON/Markdown
- reslib link — Create document relationships
- reslib projects — Manage projects
- reslib tags — Manage tags
- reslib status — System overview
- reslib backup / reslib restore — Snapshots
- reslib/smoke_test.sh — Quick validation script (run with bash; not a reslib subcommand)
Technical
- Storage: SQLite 3.45+ with FTS5 virtual table
- Extraction: PDF (pdfplumber + OCR), images (EXIF + OCR), code (AST + regex)
- Confidence Scoring: 0.0-1.0 based on quality + source
- Material Weighting: Reference (1.0) vs Research (0.5)
- Project Isolation: Scoped searches, no contamination
- Async Workers: 2-4 configurable extraction workers
- Catalog Separation: real_world vs openclaw projects
- Backup: Daily snapshots, 30-day retention
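Scoped search with material weighting can be sketched directly on SQLite FTS5. The sketch assumes that Python's `sqlite3` module is linked against an SQLite build with FTS5 enabled, which is true of most builds. It multiplies FTS5's built-in `bm25()` relevance by the weights listed above (reference 1.0, research 0.5). The table name, the column layout, and the multiplicative scoring are all assumptions; reslib's real ranking may combine these signals differently.

```python
import sqlite3
from typing import List, Tuple

# Weights from the Technical section; unknown material types are treated
# as research here (an assumption).
MATERIAL_WEIGHTS = {"reference": 1.0, "research": 0.5}


def build_index(conn: sqlite3.Connection) -> None:
    # project and material_type are stored with each row but not tokenized.
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS docs_fts USING fts5("
        "title, content, project UNINDEXED, material_type UNINDEXED)"
    )


def add_doc(conn: sqlite3.Connection, title: str, content: str,
            project: str, material_type: str) -> None:
    conn.execute(
        "INSERT INTO docs_fts (title, content, project, material_type) VALUES (?, ?, ?, ?)",
        (title, content, project, material_type),
    )


def search(conn: sqlite3.Connection, query: str, project: str) -> List[Tuple[str, float]]:
    # bm25() is lower-is-better (negative for matches), so negate it first.
    rows = conn.execute(
        "SELECT title, material_type, -bm25(docs_fts) FROM docs_fts "
        "WHERE docs_fts MATCH ? AND project = ?",
        (query, project),
    ).fetchall()
    scored = [
        (title, relevance * MATERIAL_WEIGHTS.get(material_type, 0.5))
        for title, material_type, relevance in rows
    ]
    # Highest weighted relevance first.
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_index(conn)
    add_doc(conn, "Servo notes", "servo tuning", "arduino", "reference")
    add_doc(conn, "Servo article", "servo tuning", "arduino", "research")
    add_doc(conn, "CNC servo", "servo tuning", "cnc", "reference")
    print(search(conn, "servo", "arduino"))
```

The project filter runs on matched rows, so CNC notes never appear in Arduino results. However, `bm25()` computes term statistics over the whole table, which means other projects still influence how rare a term looks. Stricter isolation could use one table per project.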
Configuration
Copy reslib/config.json and customize:
```json
{
  "db_path": "~/.openclaw/research/library.db",
  "num_workers": 2,
  "worker_timeout_sec": 300,
  "max_retries": 3,
  "backup_retention_days": 30,
  "backup_dir": "~/.openclaw/research/backups",
  "file_size_limit_mb": 200,
  "project_size_limit_gb": 2
}
```
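A customized config.json probably only needs to contain the keys being changed. Below is a minimal loader sketch. It assumes that missing keys fall back to the sample values above and that a leading `~` in a path means the home directory. `load_config` and its 2-4 worker range check (taken from "2-4 configurable extraction workers" in the Technical section) are illustrative assumptions, not reslib's actual loader.

```python
import json
import os
from pathlib import Path
from typing import Any, Dict, Optional

# Fallback values copied from the sample config.json above.
DEFAULTS: Dict[str, Any] = {
    "db_path": "~/.openclaw/research/library.db",
    "num_workers": 2,
    "worker_timeout_sec": 300,
    "max_retries": 3,
    "backup_retention_days": 30,
    "backup_dir": "~/.openclaw/research/backups",
    "file_size_limit_mb": 200,
    "project_size_limit_gb": 2,
}


def load_config(path: Optional[str] = None) -> Dict[str, Any]:
    """Merge a user config file over DEFAULTS (hypothetical loader)."""
    config = dict(DEFAULTS)
    if path is not None and Path(path).is_file():
        with open(path, encoding="utf-8") as fh:
            config.update(json.load(fh))
    # JSON has no notion of "~", so expand home-relative paths explicitly.
    for key in ("db_path", "backup_dir"):
        config[key] = os.path.expanduser(config[key])
    # Assumed valid range, based on "2-4 configurable extraction workers".
    if not 2 <= config["num_workers"] <= 4:
        raise ValueError(f"num_workers must be 2-4, got {config['num_workers']}")
    return config
```

With this design, a one-line override file such as `{"num_workers": 4}` is enough to change a single setting while keeping every other default.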
Integration with War Room
Use the RL1 protocol in your war room DNA to check existing knowledge before starting new research:
```python
from reslib import ResearchDatabase, ResearchSearch

db = ResearchDatabase()
search = ResearchSearch(db)

# Before researching, check existing knowledge
prior = search.search("servo tuning", project="rc-quadcopter")
if prior:
    print(f"Found {len(prior)} prior items")
else:
    # New research needed...
    db.add_research(title="...", content="...")  # other arguments omitted
```
Performance
All targets exceeded:
| Operation | Target | Actual |
|---|---|---|
| PDF extraction | <100ms | 20.6ms |
| Search (50 docs) | <100ms | 0.33ms |
| Worker throughput | >6/sec | 414.69/sec |
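The README does not say how these figures were measured. One way to reproduce a latency row such as "Search (50 docs)" is to time repeated calls with `time.perf_counter` and report the median. `median_latency_ms` is a hypothetical helper, and the FTS5 table in the `__main__` section is a stand-in, not reslib's schema.

```python
import statistics
import time
from typing import Callable


def median_latency_ms(operation: Callable[[], object], runs: int = 100, warmup: int = 5) -> float:
    """Return the median wall-clock duration of operation(), in milliseconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    for _ in range(warmup):
        operation()  # untimed calls, so one-off setup costs don't skew the result
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        samples.append((time.perf_counter() - start) * 1000.0)
    # The median is less sensitive than the mean to occasional scheduler hiccups.
    return statistics.median(samples)


if __name__ == "__main__":
    import sqlite3

    # Stand-in corpus: 50 short documents in an FTS5 table (requires FTS5 support).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE docs USING fts5(content)")
    conn.executemany(
        "INSERT INTO docs (content) VALUES (?)",
        [(f"note {i} about servo tuning",) for i in range(50)],
    )

    def query():
        return conn.execute("SELECT rowid FROM docs WHERE docs MATCH 'servo'").fetchall()

    print(f"search (50 docs): {median_latency_ms(query):.3f} ms")
```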
Testing
```bash
# Run all tests
pytest tests/
# Quick smoke test
bash reslib/smoke_test.sh
# Performance tests
pytest tests/test_integration.py -v -k stress
```
Known Limitations (Phase 2)
- OCR quality varies on hand-drawn sketches
- The FTS5 index is sized for fewer than 10K documents (a PostgreSQL path is planned for larger libraries)
- No automatic web research gathering (manual only)
- Vector embeddings ready but inactive
- CAD file parsing is metadata-only
Documentation
See /docs/:
- CLI-REFERENCE.md — All commands + examples
- EXTRACTION-GUIDE.md — How extraction works
- SEARCH-GUIDE.md — Ranking + weighting
- WORKER-GUIDE.md — Async queue details
- INTEGRATION.md — War room RL1 protocol
Phase 2 Roadmap
- Real-world PDF calibration
- FTS5 scaling tests (10K docs)
- Auto-detection (reference vs research)
- Web research enrichment
- Vector embeddings (semantic search)
- PostgreSQL upgrade path
Building From Source
```bash
cd research-library
pip install -e .
pytest tests/
python -m reslib status
```
Support
Issues? See TECHNICAL-NOTES.md for troubleshooting.
Production-ready MVP. 214 tests passing. 15K lines. Ready to use.
Overview
A local-first multimedia knowledge base for hardware projects that captures code, CAD, PDFs, and images. It automatically extracts text, EXIF data, and code structures, and offers smart search with material-type weighting. Projects stay isolated but can be cross-referenced, with asynchronous processing and daily backups.
How This Skill Works
Data is stored in SQLite with FTS5 for fast full-text search. An extraction pipeline handles PDFs (text and OCR), images (EXIF and OCR), and code (AST and regex). Searches apply material-type weighting and are scoped to a single project. Extraction runs asynchronously on a configurable number of workers, and daily backup snapshots preserve history.
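The asynchronous extraction described above can be sketched as a queue drained by background worker threads. Because `submit` returns immediately, a search running on the main thread never waits on OCR. This is a minimal illustration, not reslib's worker implementation. It assumes `max_retries` means retries after the first attempt. It also leaves `worker_timeout_sec` unmodeled, because Python threads cannot be forcibly stopped; a real timeout would need worker processes.

```python
import queue
import threading
from typing import Callable, Dict


class ExtractionQueue:
    """Hypothetical background extraction queue with bounded retries."""

    def __init__(self, extract: Callable[[str], str], num_workers: int = 2,
                 max_retries: int = 3) -> None:
        self._extract = extract
        self._max_retries = max_retries
        self._jobs = queue.Queue()
        self._lock = threading.Lock()
        self.results: Dict[str, str] = {}  # path -> extracted text
        self.failed: Dict[str, str] = {}   # path -> last error seen
        self._threads = [
            threading.Thread(target=self._work, daemon=True) for _ in range(num_workers)
        ]
        for thread in self._threads:
            thread.start()

    def submit(self, path: str) -> None:
        # Enqueue and return at once; the caller never waits for extraction.
        self._jobs.put(path)

    def _work(self) -> None:
        while True:
            path = self._jobs.get()
            if path is None:  # sentinel from close(): stop this worker
                self._jobs.task_done()
                return
            last_error = ""
            # One initial attempt plus up to max_retries retries.
            for _ in range(1 + self._max_retries):
                try:
                    text = self._extract(path)
                except Exception as exc:  # any extraction failure is retried
                    last_error = repr(exc)
                    continue
                with self._lock:
                    self.results[path] = text
                break
            else:  # every attempt raised
                with self._lock:
                    self.failed[path] = last_error
            self._jobs.task_done()

    def close(self) -> None:
        # Wait for queued jobs, then send one stop sentinel per worker.
        self._jobs.join()
        for _ in self._threads:
            self._jobs.put(None)
        for thread in self._threads:
            thread.join()
```

Threads suit this sketch because OCR and PDF parsing libraries typically spend much of their time outside Python bytecode. A process pool would be the usual swap if extraction turned out to be CPU-bound Python.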
When to Use It
- When organizing hardware project assets (code, CAD, PDFs, images) across multiple projects
- When you need cross-referenced knowledge that links related items
- When you require non-blocking extraction so searches stay responsive
- When you want daily backup snapshots with 30-day retention
- When you want material-type weighting to prioritize your own references over external research
Quick Start
- Step 1: Initialize database — reslib status
- Step 2: Add a document to a project — reslib add ~/projects/arduino/servo.py --project arduino --material-type reference
- Step 3: Search and link — reslib search "servo tuning"; reslib link 5 12 --type applies_to
Best Practices
- Give every imported item a project and a material type (--project, --material-type)
- Keep projects self-contained to maximize isolation
- Use reslib link to create explicit relationships between documents
- Run regular backups and check that backup_retention_days fits your needs
- Tune num_workers and file_size_limit_mb in config to balance speed and resources
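Daily snapshots with 30-day retention can be sketched with the standard library's `sqlite3.Connection.backup`, which copies a live database without taking it offline. The `library-YYYY-MM-DD.db` naming scheme and the `snapshot` / `prune` helpers are hypothetical. Pruning here keeps exactly `backup_retention_days` daily snapshots (ages 0 to 29 days for the default of 30).

```python
import sqlite3
from datetime import date
from pathlib import Path
from typing import List, Optional

PREFIX = "library-"  # hypothetical naming scheme: library-YYYY-MM-DD.db


def snapshot(db_path: str, backup_dir: str, day: Optional[date] = None) -> Path:
    """Copy the live database into a dated snapshot file."""
    day = day or date.today()
    out_dir = Path(backup_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / f"{PREFIX}{day.isoformat()}.db"
    src = sqlite3.connect(db_path)
    dst = sqlite3.connect(target)
    try:
        # SQLite's online backup API copies the database without taking it offline.
        src.backup(dst)
    finally:
        dst.close()
        src.close()
    return target


def prune(backup_dir: str, retention_days: int = 30,
          today: Optional[date] = None) -> List[str]:
    """Delete snapshots that are retention_days old or older; return their names."""
    today = today or date.today()
    removed = []
    for path in sorted(Path(backup_dir).glob(f"{PREFIX}*.db")):
        try:
            taken = date.fromisoformat(path.stem[len(PREFIX):])
        except ValueError:
            continue  # not a snapshot written by snapshot()
        if (today - taken).days >= retention_days:
            path.unlink()
            removed.append(path.name)
    return removed
```

Parsing the date from the file name, rather than reading file modification times, keeps pruning correct even after backups have been copied between machines.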
Example Use Cases
- Organize an Arduino servo project with code, CAD, and manuals
- Store CNC project sheets and CAD drawings with cross-referenced tuning notes
- Archive reference materials for a quadcopter build and relate servo tuning to other projects
- Capture firmware code and manuals, plus OCR text from handwritten notes
- Backup and restore with daily snapshots during a hardware prototyping sprint