knowledge-base
npx machina-cli add skill SageMindAI/instar/knowledge-base --openclaw
knowledge-base -- Searchable Knowledge Base for Instar Agents
Build a searchable knowledge base from external sources -- URLs, documents, transcripts, PDFs. Uses the existing MemoryIndex (FTS5) for search, so no new dependencies.
How It Works
The knowledge base is a set of markdown files in .instar/knowledge/ that MemoryIndex indexes alongside your other memory files. Each file has YAML frontmatter for metadata and is tracked in a catalog for browsing.
.instar/knowledge/
catalog.json # Registry of all ingested sources
articles/ # Ingested web articles
transcripts/ # Video/audio transcripts
docs/ # Curated reference documentation
Ingesting Content
Via CLI
# Ingest text content directly
instar knowledge ingest "Article content here..." --title "My Article" --tags "AI,agents"
# Ingest from a URL (fetch first, then ingest)
# Step 1: Fetch the content
python3 .claude/scripts/smart-fetch.py "https://example.com/article" --auto > /tmp/fetched.md
# Step 2: Ingest it
instar knowledge ingest "$(cat /tmp/fetched.md)" --title "Article Title" --url "https://example.com/article" --tags "topic1,topic2"
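The two-step fetch-then-ingest flow above can also be scripted. The following is a minimal sketch, assuming the CLI accepts exactly the flags shown above (--title, --url, --tags); the function names build_ingest_argv and fetch_and_ingest are hypothetical helpers, not part of instar.

```python
import subprocess
from pathlib import Path


def build_ingest_argv(content, title, url=None, tags=()):
    """Build the argv list for `instar knowledge ingest` (hypothetical helper).

    Only the flags shown in the CLI examples are used.
    """
    argv = ["instar", "knowledge", "ingest", content, "--title", title]
    if url:
        argv += ["--url", url]
    if tags:
        argv += ["--tags", ",".join(tags)]
    return argv


def fetch_and_ingest(url, title, tags=(), out_path="/tmp/fetched.md"):
    """Step 1: fetch the page with smart-fetch. Step 2: ingest the result."""
    fetched = subprocess.run(
        ["python3", ".claude/scripts/smart-fetch.py", url, "--auto"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    # Keep a copy on disk, mirroring the shell example.
    Path(out_path).write_text(fetched, encoding="utf-8")
    subprocess.run(build_ingest_argv(fetched, title, url, tags), check=True)
```

Passing an argv list avoids shell quoting problems when the fetched text contains quotes or dollar signs. Very large pages may still exceed the operating system's argument length limit, as they can with the shell version.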
Via API
curl -X POST http://localhost:4040/knowledge/ingest \
-H "Content-Type: application/json" \
-d '{
"content": "The article content...",
"title": "Article Title",
"url": "https://example.com/article",
"type": "article",
"tags": ["AI", "infrastructure"],
"summary": "Brief description"
}'
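The same request can be built from Python with the standard library. This is a sketch under the assumption that the endpoint accepts the JSON fields shown in the curl example; the response format is not documented here, so the helper returns the raw body. The names build_ingest_request and ingest are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:4040"  # host and port taken from the curl example


def build_ingest_request(content, title, url=None, type_="article", tags=(), summary=None):
    """Build (but do not send) a POST request for /knowledge/ingest."""
    payload = {"content": content, "title": title, "type": type_, "tags": list(tags)}
    if url:
        payload["url"] = url
    if summary:
        payload["summary"] = summary
    return urllib.request.Request(
        f"{BASE_URL}/knowledge/ingest",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ingest(**fields):
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(build_ingest_request(**fields)) as resp:
        return resp.read().decode("utf-8")
```

Separating request construction from sending makes the payload easy to inspect before anything reaches the server.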
Via Agent Workflow
When the agent wants to ingest content during a session:
- Fetch the content (WebFetch, smart-fetch, transcript tools, or Read for local files)
- Clean it (strip navigation, ads, boilerplate)
- Call the ingest API or write the file manually:
# Write the markdown file with frontmatter
cat > .instar/knowledge/articles/2026-02-25-my-article.md << 'EOF'
---
title: "My Article"
source: "https://example.com/article"
ingested: "2026-02-25"
tags: ["AI", "infrastructure"]
---
# My Article
[Cleaned article content here]
EOF
# Sync the index to pick up the new file
instar memory sync
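The manual file-writing step can be sketched in Python as follows. The frontmatter fields match the example above; the date-plus-slug filename pattern is an assumption inferred from that single example filename, and slugify and render_knowledge_file are hypothetical helper names.

```python
import json
import re
from datetime import date


def slugify(title):
    """Lowercase the title and join its alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))


def render_knowledge_file(title, source, tags, body, ingested):
    """Return (relative_path, markdown_text) for an article file.

    json.dumps is used for quoting because JSON strings and lists are
    also valid YAML flow values.
    """
    day = ingested.isoformat()
    front = [
        "---",
        f"title: {json.dumps(title)}",
        f"source: {json.dumps(source)}",
        f"ingested: {json.dumps(day)}",
        f"tags: {json.dumps(list(tags))}",
        "---",
    ]
    text = "\n".join(front) + f"\n# {title}\n{body}\n"
    # ASSUMPTION: filename is <date>-<slug>.md, as in the example above.
    path = f".instar/knowledge/articles/{day}-{slugify(title)}.md"
    return path, text
```

After writing the returned text to the returned path, run instar memory sync so the new file is indexed.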
Searching Knowledge
CLI
# Search within knowledge base only
instar knowledge search "notification batching"
# Search all memory (including knowledge)
instar memory search "notification batching"
API
# Knowledge-scoped search
curl "http://localhost:4040/memory/search?q=notification+batching&source=knowledge/&limit=5"
# Browse the catalog
curl "http://localhost:4040/knowledge/catalog"
curl "http://localhost:4040/knowledge/catalog?tag=AI"
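When calling these endpoints from a script, the query strings are easier to build than to hand-encode. A small sketch, assuming only the parameters shown in the curl examples (q, source, limit, tag); the function names are hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:4040"  # host and port taken from the curl examples


def knowledge_search_url(query, limit=5):
    """Build a knowledge-scoped search URL for /memory/search."""
    params = {"q": query, "source": "knowledge/", "limit": limit}
    # safe="/" keeps the slash in "knowledge/" unescaped, as in the curl example.
    return f"{BASE_URL}/memory/search?" + urlencode(params, safe="/")


def catalog_url(tag=None):
    """Build the catalog URL, optionally filtered by a single tag."""
    url = f"{BASE_URL}/knowledge/catalog"
    return f"{url}?{urlencode({'tag': tag})}" if tag else url
```

For example, knowledge_search_url("notification batching") reproduces the search URL used in the first curl command above.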
Managing Sources
List all sources
instar knowledge list
instar knowledge list --tag AI
Remove a source
# Find the source ID from the list
instar knowledge list
# Remove it
instar knowledge remove kb_20260225123456_abc123
# Re-sync the index
instar memory sync
Via API
# Remove
curl -X DELETE "http://localhost:4040/knowledge/kb_20260225123456_abc123"
MemoryIndex Configuration
To enable knowledge base indexing, add these sources to your .instar/config.json memory section:
{
"memory": {
"enabled": true,
"sources": [
{ "path": "AGENT.md", "type": "markdown", "evergreen": true },
{ "path": "USER.md", "type": "markdown", "evergreen": true },
{ "path": "knowledge/articles/", "type": "markdown", "evergreen": false },
{ "path": "knowledge/transcripts/", "type": "markdown", "evergreen": false },
{ "path": "knowledge/docs/", "type": "markdown", "evergreen": true }
]
}
}
Source behavior:
- articles/ and transcripts/ use evergreen: false -- recent content ranks higher (30-day temporal decay)
- docs/ uses evergreen: true -- reference documentation doesn't decay
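The source entries above can also be added by a script rather than by hand. The sketch below merges the three knowledge paths into an already-loaded config dictionary without duplicating entries; with_knowledge_sources is a hypothetical helper, and reading and writing .instar/config.json is left to the caller.

```python
import json

KNOWLEDGE_SOURCES = [
    {"path": "knowledge/articles/", "type": "markdown", "evergreen": False},
    {"path": "knowledge/transcripts/", "type": "markdown", "evergreen": False},
    {"path": "knowledge/docs/", "type": "markdown", "evergreen": True},
]


def with_knowledge_sources(config):
    """Return a copy of config whose memory.sources includes the knowledge paths."""
    merged = json.loads(json.dumps(config))  # deep copy of JSON-compatible data
    memory = merged.setdefault("memory", {})
    memory.setdefault("enabled", True)  # leave an explicit existing value untouched
    sources = memory.setdefault("sources", [])
    present = {entry.get("path") for entry in sources}
    # Append only the paths that are not already configured.
    sources.extend(dict(s) for s in KNOWLEDGE_SOURCES if s["path"] not in present)
    return merged
```

Running the function twice yields the same result, so it is safe to call from a setup script that may be re-run.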
Content Types
| Type | Directory | Temporal Decay | Best For |
|---|---|---|---|
| article | articles/ | Yes (30-day) | Web articles, blog posts, news |
| transcript | transcripts/ | Yes (30-day) | YouTube videos, podcasts, meetings |
| doc | docs/ | No (evergreen) | API docs, manuals, reference material |
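To build intuition for how the table affects ranking, here is an illustrative sketch. The type-to-directory mapping comes from the table; the decay curve does not. The exact formula MemoryIndex uses is not stated in this document, so the exponential form with a 30-day half-life below is purely an assumption, and recency_weight is a hypothetical function.

```python
CONTENT_TYPES = {
    "article": {"directory": "articles/", "evergreen": False},
    "transcript": {"directory": "transcripts/", "evergreen": False},
    "doc": {"directory": "docs/", "evergreen": True},
}

DECAY_DAYS = 30  # from the table; how MemoryIndex applies it is not documented here


def recency_weight(content_type, age_days):
    """Illustrative ranking multiplier for a source of the given age.

    ASSUMPTION: exponential decay with a 30-day half-life. This is a guess
    used for illustration, not the confirmed MemoryIndex behavior.
    """
    if CONTENT_TYPES[content_type]["evergreen"]:
        return 1.0  # evergreen content never decays
    return 0.5 ** (age_days / DECAY_DAYS)
```

Under this assumed curve, a 30-day-old article would carry half the weight of a fresh one, while a doc of any age keeps full weight.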
Tips
- Always sync after ingesting: instar memory sync updates the FTS5 index
- Use tags consistently: Tags enable filtered browsing via instar knowledge list --tag X
- Include source URLs: Helps trace back to original content
- Clean before ingesting: Strip navigation, ads, cookie banners for better search results
- Use smart-fetch for URLs: python3 .claude/scripts/smart-fetch.py URL --auto gets clean markdown
Source
https://github.com/SageMindAI/instar/blob/main/skills/knowledge-base/SKILL.md
Overview
knowledge-base creates a searchable store from URLs, documents, and transcripts. It uses MemoryIndex (FTS5) to index content alongside other memories, and stores everything under .instar/knowledge with a catalog for browsing.
How This Skill Works
The knowledge base is a collection of markdown files located in .instar/knowledge/ with YAML frontmatter for metadata. A catalog.json tracks ingested sources and MemoryIndex provides full-text search so you can query via CLI or API.
When to Use It
- Ingest a new research article from a URL or document and add metadata such as title and tags.
- Search across past research and curated docs using full-text search.
- Query transcripts and videos through the knowledge base for quick references.
- Browse sources by tag or path via the catalog.
- Sync the index after ingesting content to keep results up to date.
Quick Start
- Step 1: Use the CLI to ingest content: instar knowledge ingest "Article content here..." --title "My Article" --tags "AI,agents"
- Step 2: Ingest from URL by fetching the page then ingest: python3 .claude/scripts/smart-fetch.py "https://example.com/article" --auto > /tmp/fetched.md; instar knowledge ingest "$(cat /tmp/fetched.md)" --title "Article Title" --url "https://example.com/article" --tags "topic1,topic2"
- Step 3: Sync the index so changes are visible: instar memory sync
Best Practices
- Use rich YAML frontmatter with title, source, ingested date, and tags.
- Pre-clean content to remove navigation, ads, and boilerplate.
- Organize ingested material under knowledge/articles, knowledge/transcripts, and knowledge/docs.
- Run instar memory sync after ingest to refresh the index.
- Prefer API ingestion for programmatic workflows and consistent metadata.
Example Use Cases
- Ingest a vendor article from example.com into knowledge/articles.
- Ingest a product manual PDF into knowledge/docs.
- Ingest a conference transcript into knowledge/transcripts.
- Ingest internal wiki pages into knowledge/articles for team lookup.
- Ingest API docs from a vendor into knowledge/docs.