How do I combine multiple repos into one index?

Pass multiple repo arguments to the github_index script (e.g., owner/repo1 owner/repo2) and specify an output file like combined.md.

What should I do if a repo has poor or missing descriptions?

Use --skip-fetch to generate a skeleton (skeleton.md) and manually enhance descriptions based on domain knowledge.

building-github-index

npx machina-cli add skill oaustegard/claude-skills/building-github-index-v2 --openclaw

Building GitHub Index

Create Markdown indexes of GitHub repositories optimized for Claude project knowledge. Each index pairs file paths, retrievable via the GitHub API, with semantic descriptions that support effective matching.

Quick Start

# Documentation repos (markdown/notebooks)
python scripts/github_index.py owner/repo -o index.md

# Code repos (extract symbols via tree-sitter)
python scripts/github_index.py owner/repo --code-symbols -o index.md

# Multiple repos combined
python scripts/github_index.py owner/repo1 owner/repo2 -o combined.md

Script Options

| Flag | Description |
|------|-------------|
| `-o, --output` | Output file (default: `github_index.md`) |
| `--token` | GitHub PAT; also reads `GITHUB_TOKEN` env |
| `--include-patterns` | Only index matching globs: `"docs/" "src/"` |
| `--exclude-patterns` | Skip matching globs: `"test/**"` |
| `--max-files` | Cap files per repo (default: 200) |
| `--skip-fetch` | Tree only, no content fetch (fast, filename-only descriptions) |
| `--code-symbols` | Include code files, extract function/class names via tree-sitter |
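
To make the interplay of `--include-patterns`, `--exclude-patterns`, and `--max-files` concrete, here is a minimal sketch of how such filtering could work. It is not the script's actual implementation: the function name `select_paths` and its semantics (fnmatch-style globs, include applied before exclude, cap applied last) are assumptions for illustration.

```python
from fnmatch import fnmatch


def select_paths(paths, include=None, exclude=None, max_files=200):
    """Hypothetical filter: keep paths matching any include glob,
    drop paths matching any exclude glob, then cap the result.

    Note: fnmatch's `*` also matches `/`, which is a simplification
    compared with shell-style `**` handling.
    """
    selected = []
    for path in paths:
        if include and not any(fnmatch(path, pat) for pat in include):
            continue
        if exclude and any(fnmatch(path, pat) for pat in exclude):
            continue
        selected.append(path)
    return selected[:max_files]


paths = [
    "docs/intro.md",
    "docs/api/auth.md",
    "src/main.py",
    "test/unit/test_main.py",
]
print(select_paths(paths, exclude=["test/**"]))
```

Applying the cap last means `--max-files` limits what ends up in the index rather than what is scanned, which seems the more useful reading of "cap files per repo", though the real script may differ.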

Description Extraction Priority

1. YAML frontmatter - title: and description: fields
2. Markdown headings - First h1/h2 as title, subsequent as topics
3. Notebook cells - First markdown cell heading
4. Code symbols - Public function/class names (with --code-symbols)
5. Path-derived - Convert filename to words (fallback)
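
The priority order above is essentially a fallback chain. The sketch below illustrates the idea for three of the sources (frontmatter, headings, path); notebook cells and code symbols are left out for brevity. The function name `describe`, the regexes, and the choice to prefer `description:` over `title:` are assumptions, not the script's actual code.

```python
import re
from pathlib import PurePosixPath


def describe(path, text=""):
    """Return a description via a fallback chain (illustrative only)."""
    # 1. YAML frontmatter: use description: if present, else title:
    frontmatter = re.match(r"^---\n(.*?)\n---", text, re.DOTALL)
    if frontmatter:
        for key in ("description", "title"):
            field = re.search(
                rf"^{key}:[ \t]*(.+)$", frontmatter.group(1), re.MULTILINE
            )
            if field:
                return field.group(1).strip().strip("\"'")
    # 2. First h1/h2 Markdown heading
    heading = re.search(r"^#{1,2}\s+(.+)$", text, re.MULTILINE)
    if heading:
        return heading.group(1).strip()
    # 3. Fallback: convert the filename to words
    stem = PurePosixPath(path).stem
    return re.sub(r"[-_]+", " ", stem).strip().capitalize()


print(describe("docs/getting_started-guide.md"))
```

With `--skip-fetch` there is no file content at all, so only the last step would apply, which matches the "filename-only descriptions" note in the options table.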

When Descriptions Fail

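Some repos contain only stub files (links to external docs, empty READMEs), so there is little content to extract descriptions from. In these cases, manual curation is recommended: generate the tree output and write the descriptions from domain knowledge: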

# Get tree structure only (fast)
python scripts/github_index.py owner/repo --skip-fetch -o skeleton.md
# Then manually enhance descriptions based on domain knowledge

For code-heavy repos with embedded apps:

Directory names encode purpose: acc_wav_gen → "ACC waveform generation"
Peripheral acronyms map to functions: AFEC=ADC, MCAN=CAN, TWIHS=I2C
Operation modes: blocking, interrupt, dma, polled
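
The naming hints above can be turned into a small helper that drafts descriptions from directory names before manual review. This is a sketch only: the peripheral map and mode list come from the examples above, while `ABBREVIATIONS`, the function name `describe_dir`, and the output wording are hypothetical.

```python
# Peripheral acronyms from the examples above; "acc" is kept as-is,
# mirroring the acc_wav_gen example.
PERIPHERALS = {"afec": "ADC", "mcan": "CAN", "twihs": "I2C", "acc": "ACC"}
# Hypothetical abbreviation expansions, for illustration only.
ABBREVIATIONS = {"wav": "waveform", "gen": "generation"}
# Operation modes listed above.
MODES = {"blocking", "interrupt", "dma", "polled"}


def describe_dir(name):
    """Draft a human-readable description from an underscore-separated name."""
    words, modes = [], []
    for token in name.lower().split("_"):
        if token in MODES:
            modes.append(token)
        else:
            # Unknown tokens pass through unchanged for manual cleanup.
            words.append(PERIPHERALS.get(token) or ABBREVIATIONS.get(token, token))
    text = " ".join(words)
    return f"{text} ({', '.join(modes)} mode)" if modes else text


print(describe_dir("acc_wav_gen"))
print(describe_dir("twihs_eeprom_dma"))
```

The output is meant as a starting point; tokens the maps do not cover are left untouched so they are easy to spot during curation.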

Output Format

# {Repo} - Content Index

**Repository:** {url}
**Branch:** `{branch}`

## Retrieval Method
{API curl commands}

---

## {Category}

| Description | Path |
|-------------|------|
| {What this covers} | `{path/file.md}` |

The Description column comes first because it is what relevance matching reads; the Path column follows as the retrieval key.
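
As a rough illustration of the template, the following sketch renders the header and category tables from in-memory data. It omits the Retrieval Method section, and `render_index`, its parameters, and the sample repository URL are assumptions rather than the script's real interface.

```python
def render_index(repo, url, branch, categories):
    """Render a simplified index: header plus one table per category.

    `categories` maps a category name to a list of (description, path) pairs.
    """
    lines = [
        f"# {repo} - Content Index",
        "",
        f"**Repository:** {url}",
        f"**Branch:** `{branch}`",
        "",
        "---",
        "",
    ]
    for category, entries in categories.items():
        lines += [f"## {category}", "", "| Description | Path |", "|-------------|------|"]
        for description, path in entries:
            # Escape pipes so a description cannot break the table row.
            safe = description.replace("|", "\\|")
            lines.append(f"| {safe} | `{path}` |")
        lines.append("")
    return "\n".join(lines)


index = render_index(
    "demo",
    "https://github.com/owner/demo",  # placeholder URL
    "main",
    {"Guides": [("How tokens work", "docs/auth.md")]},
)
print(index)
```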

API Access

Enumerate files:

curl -sL "https://api.github.com/repos/OWNER/REPO/git/trees/BRANCH?recursive=1"
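
The tree endpoint returns JSON with a `tree` array whose entries carry a `path` and a `type` (`blob` for files, `tree` for directories), plus a `truncated` flag. A minimal sketch of pulling indexable paths out of that response follows; the function name `blob_paths` and the default suffix list are assumptions.

```python
def blob_paths(tree_response, suffixes=(".md", ".ipynb")):
    """List file paths from a git/trees response, keeping only blobs
    whose names end with one of the given suffixes."""
    if tree_response.get("truncated"):
        # GitHub sets this flag when the recursive listing was cut off.
        print("warning: tree listing is truncated")
    return [
        entry["path"]
        for entry in tree_response.get("tree", [])
        if entry.get("type") == "blob" and entry["path"].endswith(tuple(suffixes))
    ]


# Hand-written sample shaped like the API response, not real data.
sample = {
    "truncated": False,
    "tree": [
        {"path": "docs", "type": "tree"},
        {"path": "docs/intro.md", "type": "blob"},
        {"path": "notebooks/demo.ipynb", "type": "blob"},
        {"path": "src/main.py", "type": "blob"},
    ],
}
print(blob_paths(sample))
```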

Fetch content:

curl -s "https://api.github.com/repos/OWNER/REPO/contents/PATH?ref=BRANCH" \
  -H "Accept: application/vnd.github+json" | \
  python3 -c "import sys,json,base64; print(base64.b64decode(json.load(sys.stdin)['content']).decode())"
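
For callers who prefer Python over a curl pipeline, the same fetch can be sketched with the standard library. The function names `decode_contents` and `fetch_file`, and the `main` default branch, are assumptions; the endpoint and `Accept` header mirror the curl command above.

```python
import base64
import json
import urllib.parse
import urllib.request


def decode_contents(payload):
    """Decode the base64 `content` field of a contents-API response.

    The API inserts newlines into the base64 text; b64decode ignores them.
    """
    return base64.b64decode(payload["content"]).decode("utf-8")


def fetch_file(owner, repo, path, branch="main", token=None):
    """Fetch one file through the contents endpoint (performs a network call)."""
    url = (
        f"https://api.github.com/repos/{owner}/{repo}/contents/"
        f"{urllib.parse.quote(path)}?ref={urllib.parse.quote(branch)}"
    )
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response:
        return decode_contents(json.load(response))


# Offline check of the decoding step with a hand-made payload.
print(decode_contents({"content": "IyBIZW\nxsbwo=\n"}))
```

Keeping the decode step separate from the network call makes it easy to test without hitting the API or spending rate limit.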

Network

Allowlist: api.github.com, raw.githubusercontent.com

Related Skills

accessing-github-repos - Private repos, PAT setup, tarball download
mapping-codebases - Detailed code structure (methods, imports, line numbers)

Condensed Format (pk_index.py)

For token-constrained project knowledge, use the condensed script:

python scripts/pk_index.py owner/repo -o repo_pk.md

Produces ~80% smaller output:

Single line per file: path — description
Symbols only (no signatures)
15 files max per category
No retrieval instructions section

Ideal when adding multiple repo indexes to project knowledge.
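
The condensed layout can be pictured with a short sketch: one line per file and a per-category cap of 15. This is not `pk_index.py` itself; the function name `condense`, the heading style, and the input shape are assumptions.

```python
SEPARATOR = " \u2014 "  # the dash separator from the one-line format above


def condense(categories, max_per_category=15):
    """Render one line per file, capped per category.

    `categories` maps a category name to a list of (path, description) pairs.
    """
    lines = []
    for category, entries in categories.items():
        lines.append(f"## {category}")
        for path, description in entries[:max_per_category]:
            lines.append(f"{path}{SEPARATOR}{description}")
    return "\n".join(lines)


docs = {"Docs": [(f"docs/page{i}.md", f"Page {i}") for i in range(20)]}
print(condense(docs))
```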

Source

https://github.com/oaustegard/claude-skills/blob/main/building-github-index-v2/SKILL.md

Overview

This skill creates Markdown indexes of GitHub repositories optimized for Claude project knowledge. The indexes are designed for retrieval via the GitHub API, with semantic descriptions that support effective matching across documentation repos, knowledge bases, and multi-repo collections.

How This Skill Works

The tool analyzes one or more GitHub repos and outputs a Markdown index (github_index.md by default, or the file given with -o) that includes repository metadata, a retrieval section, and categorized descriptions. It draws descriptions, in priority order, from YAML frontmatter, Markdown headings, notebook headings, and optional code symbols (via tree-sitter); when none of these yields a description, it falls back to one derived from the file path. Multiple repos can be combined into a single index for consolidated project knowledge.

When to Use It

Setting up projects that reference external documentation
Creating searchable indexes of technical blogs or knowledge bases
Combining multiple GitHub repos into a single index
When a user mentions terms like 'index', 'github repo', 'project knowledge', or 'documentation reference'
Preparing a fast skeleton or manual curation workflow for code-heavy repos when descriptions are missing

Quick Start

Documentation repo: python scripts/github_index.py owner/repo -o index.md
Code repo with symbols: python scripts/github_index.py owner/repo --code-symbols -o index.md
Multiple repos combined: python scripts/github_index.py owner/repo1 owner/repo2 -o combined.md

Best Practices

Prioritize YAML frontmatter titles and descriptions, then use Markdown headings for topics
Enable --code-symbols to surface public function/class names for code-heavy repos
Use --include-patterns and --exclude-patterns to focus on relevant docs and code paths
Limit scope with --max-files to keep indexes performant and maintainable
If descriptions are missing, generate a skeleton with --skip-fetch and refine manually with domain knowledge

Example Use Cases

Index a documentation repo: python scripts/github_index.py owner/repo -o index.md
Index a code repo with symbols: python scripts/github_index.py owner/repo --code-symbols -o index.md
Combine multiple repos: python scripts/github_index.py owner/repo1 owner/repo2 -o combined.md
Create a fast skeleton for manual curation: python scripts/github_index.py owner/repo --skip-fetch -o skeleton.md
Create a condensed index for token-constrained project knowledge: python scripts/pk_index.py owner/repo -o repo_pk.md
