How do I ingest content?

Use the CLI (instar knowledge ingest ...) or the API to post content. For URLs, fetch the page first (or use the provided fetch script) and then ingest with title, url, and tags. You can also create a markdown file with YAML frontmatter under .instar/knowledge and run instar memory sync to index it.

How do I search knowledge?

To search only the knowledge base, run instar knowledge search 'term', or query the API with a knowledge-scoped source (source=knowledge/) to limit results. For a broader search across all memories, run instar memory search.

How do I remove a source?

List sources with instar knowledge list to find the source ID, remove it with instar knowledge remove <source-id>, and then run instar memory sync to refresh the index.

knowledge-base

npx machina-cli add skill SageMindAI/instar/knowledge-base --openclaw

knowledge-base -- Searchable Knowledge Base for Instar Agents

Build a searchable knowledge base from external sources -- URLs, documents, transcripts, PDFs. Search runs on the existing MemoryIndex (FTS5), so the skill adds no new dependencies.

How It Works

The knowledge base is a set of markdown files in .instar/knowledge/ that MemoryIndex indexes alongside your other memory files. Each file has YAML frontmatter for metadata and is tracked in a catalog for browsing.

.instar/knowledge/
  catalog.json            # Registry of all ingested sources
  articles/               # Ingested web articles
  transcripts/            # Video/audio transcripts
  docs/                   # Curated reference documentation

Ingesting Content

Via CLI

# Ingest text content directly
instar knowledge ingest "Article content here..." --title "My Article" --tags "AI,agents"

# Ingest from a URL (fetch first, then ingest)
# Step 1: Fetch the content
python3 .claude/scripts/smart-fetch.py "https://example.com/article" --auto > /tmp/fetched.md
# Step 2: Ingest it
instar knowledge ingest "$(cat /tmp/fetched.md)" --title "Article Title" --url "https://example.com/article" --tags "topic1,topic2"

Via API

curl -X POST http://localhost:4040/knowledge/ingest \
  -H "Content-Type: application/json" \
  -d '{
    "content": "The article content...",
    "title": "Article Title",
    "url": "https://example.com/article",
    "type": "article",
    "tags": ["AI", "infrastructure"],
    "summary": "Brief description"
  }'
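
The same request can be built from a script. The following is a minimal Python sketch, assuming the server listens on localhost:4040 as in the curl example. build_ingest_request is a hypothetical helper name, not part of instar. The source does not say which fields are required, so treating url and summary as optional is an assumption. The response format is not described here, so the sketch stops at building the request.

```python
import json
import urllib.request

# Endpoint taken from the curl example above; adjust host/port for your setup.
INGEST_URL = "http://localhost:4040/knowledge/ingest"


def build_ingest_request(content, title, url=None, type_="article",
                         tags=None, summary=None):
    """Build (but do not send) a POST request for the ingest endpoint.

    Hypothetical helper: treating url and summary as optional is an assumption.
    """
    payload = {
        "content": content,
        "title": title,
        "type": type_,
        "tags": list(tags or []),
    }
    # Optional fields are only included when a value was provided.
    if url is not None:
        payload["url"] = url
    if summary is not None:
        payload["summary"] = summary
    return urllib.request.Request(
        INGEST_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_ingest_request(
    "The article content...",
    "Article Title",
    url="https://example.com/article",
    tags=["AI", "infrastructure"],
    summary="Brief description",
)
# To send it (requires a running server): urllib.request.urlopen(req)
```

Building the request separately from sending it makes the payload easy to inspect or log before anything reaches the server.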

Via Agent Workflow

When the agent wants to ingest content during a session:

Fetch the content (WebFetch, smart-fetch, transcript tools, or Read for local files)
Clean it (strip navigation, ads, boilerplate)
Call the ingest API or write the file manually:

# Write the markdown file with frontmatter
cat > .instar/knowledge/articles/2026-02-25-my-article.md << 'EOF'
---
title: "My Article"
source: "https://example.com/article"
ingested: "2026-02-25"
tags: ["AI", "infrastructure"]
---

# My Article

[Cleaned article content here]
EOF

# Sync the index to pick up the new file
instar memory sync
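
The manual route above can also be scripted. This is a rough Python sketch, not an instar API: write_article and slugify are hypothetical helpers, and the date-plus-slug filename simply mirrors the example filename above rather than a documented requirement.

```python
import datetime
import json
import pathlib
import re


def slugify(title):
    """Lowercase the title and join its alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))


def write_article(root, title, source, tags, body, ingested=None):
    """Write a knowledge article with YAML frontmatter and return its path.

    Hypothetical helper; `root` is the .instar directory.
    """
    ingested = ingested or datetime.date.today().isoformat()
    articles = pathlib.Path(root) / "knowledge" / "articles"
    articles.mkdir(parents=True, exist_ok=True)
    path = articles / f"{ingested}-{slugify(title)}.md"
    # json.dumps produces double-quoted strings, which YAML also accepts.
    tag_list = ", ".join(json.dumps(t) for t in tags)
    frontmatter = "\n".join([
        "---",
        f"title: {json.dumps(title)}",
        f"source: {json.dumps(source)}",
        f"ingested: {json.dumps(ingested)}",
        f"tags: [{tag_list}]",
        "---",
    ])
    path.write_text(f"{frontmatter}\n\n# {title}\n\n{body}\n", encoding="utf-8")
    return path
```

Calling write_article(".instar", "My Article", "https://example.com/article", ["AI", "infrastructure"], text, ingested="2026-02-25") would produce the same file as the heredoc above. The new file still needs instar memory sync before it shows up in search.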

Searching Knowledge

CLI

# Search within knowledge base only
instar knowledge search "notification batching"

# Search all memory (including knowledge)
instar memory search "notification batching"

API

# Knowledge-scoped search
curl "http://localhost:4040/memory/search?q=notification+batching&source=knowledge/&limit=5"

# Browse the catalog
curl "http://localhost:4040/knowledge/catalog"
curl "http://localhost:4040/knowledge/catalog?tag=AI"
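
When these URLs are assembled in code, the query string needs proper escaping. A small Python sketch follows, assuming the same host and port as the curl examples. search_url and catalog_url are hypothetical helper names, and only the endpoints and parameters shown above are used.

```python
import urllib.parse

BASE_URL = "http://localhost:4040"  # host/port from the curl examples


def search_url(query, source="knowledge/", limit=5):
    """Return the memory search URL, scoped to a source prefix."""
    params = {"q": query, "source": source, "limit": limit}
    # safe="/" keeps the trailing slash in "knowledge/" unescaped.
    return f"{BASE_URL}/memory/search?" + urllib.parse.urlencode(params, safe="/")


def catalog_url(tag=None):
    """Return the catalog URL, optionally filtered by a single tag."""
    url = f"{BASE_URL}/knowledge/catalog"
    if tag is not None:
        url += "?" + urllib.parse.urlencode({"tag": tag})
    return url


print(search_url("notification batching"))
print(catalog_url("AI"))
```

With the default arguments, search_url reproduces the knowledge-scoped curl URL above, including the + that stands in for the space.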

Managing Sources

List all sources

instar knowledge list
instar knowledge list --tag AI

Remove a source

# Find the source ID from the list
instar knowledge list

# Remove it
instar knowledge remove kb_20260225123456_abc123

# Re-sync the index
instar memory sync

Via API

# Remove
curl -X DELETE "http://localhost:4040/knowledge/kb_20260225123456_abc123"

MemoryIndex Configuration

To enable knowledge base indexing, add these sources to your .instar/config.json memory section:

{
  "memory": {
    "enabled": true,
    "sources": [
      { "path": "AGENT.md", "type": "markdown", "evergreen": true },
      { "path": "USER.md", "type": "markdown", "evergreen": true },
      { "path": "knowledge/articles/", "type": "markdown", "evergreen": false },
      { "path": "knowledge/transcripts/", "type": "markdown", "evergreen": false },
      { "path": "knowledge/docs/", "type": "markdown", "evergreen": true }
    ]
  }
}
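
If the config file already exists, the knowledge entries can be merged in rather than pasted by hand. This is a minimal Python sketch under the assumption that the config has the shape shown above. add_knowledge_sources is a hypothetical helper, and it works on an in-memory dict so reading and writing .instar/config.json is left to the caller.

```python
import json

# The three knowledge entries from the config example above.
KNOWLEDGE_SOURCES = [
    {"path": "knowledge/articles/", "type": "markdown", "evergreen": False},
    {"path": "knowledge/transcripts/", "type": "markdown", "evergreen": False},
    {"path": "knowledge/docs/", "type": "markdown", "evergreen": True},
]


def add_knowledge_sources(config):
    """Return a copy of config with each knowledge source present once."""
    merged = json.loads(json.dumps(config))  # deep copy via JSON round-trip
    memory = merged.setdefault("memory", {})
    memory.setdefault("enabled", True)  # an existing value is left untouched
    sources = memory.setdefault("sources", [])
    existing = {s.get("path") for s in sources}
    for src in KNOWLEDGE_SOURCES:
        if src["path"] not in existing:
            sources.append(dict(src))
    return merged


config = {"memory": {"enabled": True, "sources": [
    {"path": "AGENT.md", "type": "markdown", "evergreen": True},
]}}
print(json.dumps(add_knowledge_sources(config), indent=2))
```

Matching on path keeps the merge idempotent, so running it twice does not duplicate entries.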

Source behavior:

articles/ and transcripts/ use evergreen: false -- recent content ranks higher (30-day temporal decay)
docs/ uses evergreen: true -- reference documentation doesn't decay
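
To build intuition for what temporal decay does to ranking, here is an illustrative Python sketch. The source does not give the formula MemoryIndex uses, so this assumes exponential decay with a 30-day half-life. Both the formula and the decayed_score name are assumptions for illustration only.

```python
HALF_LIFE_DAYS = 30.0  # assumption: "30-day" is read here as a half-life


def decayed_score(base_score, age_days, evergreen):
    """Scale a relevance score by age; evergreen sources are never scaled."""
    if evergreen:
        return base_score
    # The score halves for every HALF_LIFE_DAYS of age (assumed model).
    return base_score * 0.5 ** (max(age_days, 0) / HALF_LIFE_DAYS)


print(decayed_score(1.0, 0, evergreen=False))    # fresh article
print(decayed_score(1.0, 30, evergreen=False))   # 30-day-old article
print(decayed_score(1.0, 300, evergreen=True))   # reference doc
```

Under this assumed model, an article and a doc that match a query equally well would tie on day zero, and the doc would pull ahead as the article ages.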

Content Types

Type	Directory	Temporal Decay	Best For
`article`	`articles/`	Yes (30-day)	Web articles, blog posts, news
`transcript`	`transcripts/`	Yes (30-day)	YouTube videos, podcasts, meetings
`doc`	`docs/`	No (evergreen)	API docs, manuals, reference material

Tips

Always sync after ingesting: instar memory sync updates the FTS5 index
Use tags consistently: Tags enable filtered browsing via instar knowledge list --tag X
Include source URLs: Helps trace back to original content
Clean before ingesting: Strip navigation, ads, cookie banners for better search results
Use smart-fetch for URLs: python3 .claude/scripts/smart-fetch.py URL --auto gets clean markdown

Source

View on GitHub: https://github.com/SageMindAI/instar/blob/main/skills/knowledge-base/SKILL.md

Overview

knowledge-base creates a searchable store from URLs, documents, and transcripts. It uses MemoryIndex (FTS5) to index content alongside other memories, and stores everything under .instar/knowledge with a catalog for browsing.

How This Skill Works

The knowledge base is a collection of markdown files located in .instar/knowledge/ with YAML frontmatter for metadata. A catalog.json tracks ingested sources and MemoryIndex provides full-text search so you can query via CLI or API.

When to Use It

Ingest a new research article from a URL or document and add metadata such as title and tags.
Search across past research and curated docs using full-text search.
Query transcripts of videos, podcasts, and meetings for quick reference.
Browse sources by tag via the catalog.
Sync the index after ingesting content to keep results up to date.

Quick Start

Step 1: Use the CLI to ingest content: instar knowledge ingest "Article content here..." --title "My Article" --tags "AI,agents"
Step 2: To ingest from a URL, fetch the page first, then ingest the result: python3 .claude/scripts/smart-fetch.py "https://example.com/article" --auto > /tmp/fetched.md; instar knowledge ingest "$(cat /tmp/fetched.md)" --title "Article Title" --url "https://example.com/article" --tags "topic1,topic2"
Step 3: Sync the index so changes are visible: instar memory sync

Best Practices

Use rich YAML frontmatter with title, source, ingested date, and tags.
Pre-clean content to remove navigation, ads, and boilerplate.
Organize ingested material under knowledge/articles, knowledge/transcripts, and knowledge/docs.
Run instar memory sync after ingest to refresh the index.
Prefer API ingestion for programmatic workflows and consistent metadata.

Example Use Cases

Ingest a vendor article from example.com into knowledge/articles.
Ingest a product manual PDF into knowledge/docs.
Ingest a conference transcript into knowledge/transcripts.
Ingest internal wiki pages into knowledge/articles for team lookup.
Ingest API docs from a vendor into knowledge/docs.
