What sources does this skill analyze?

It analyzes code comments, commit messages, and documentation (markdown, wikis, READMEs, API docs) to extract tribal knowledge.

What is the output structure?

A JSON object with an extractionId, a timestamp, and a knowledge object that holds comments, commits, documentation, businessRules, and glossary.

How does it support migration projects?

It preserves institutional memory by surfacing rationale, patterns, and business rules, so decisions stay traceable during legacy code migrations and documentation migrations.

knowledge-extractor

npx machina-cli add skill a5c-ai/babysitter/knowledge-extractor --openclaw

Knowledge Extractor Skill

Extracts tribal knowledge from code comments, commit messages, documentation, and other sources to preserve institutional memory during migration.

Purpose

Enable knowledge preservation through:

Comment analysis and extraction
Commit message mining
Documentation parsing
Pattern recognition
Business rule discovery

Capabilities

1. Comment Analysis

Extract TODO/FIXME comments
Parse documentation comments
Identify explanatory notes
Find warning comments
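
The comment analysis listed above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the function name `extract_comments`, the marker-to-type mapping, and the regular expression are all assumptions, and the record fields simply follow the Output Schema. A line-based regex like this one also matches markers that appear inside string literals, so a real extractor would likely need a language-aware tokenizer.

```python
import re
from typing import Iterator

# Hypothetical mapping from comment markers to the schema's "type" values.
MARKER_TYPES = {"TODO": "todo", "FIXME": "fixme", "NOTE": "note", "WARNING": "warning"}

# Matches "#" or "//" comments that begin with one of the markers above.
COMMENT_RE = re.compile(
    r"(?:#|//)\s*(TODO|FIXME|NOTE|WARNING)\b[:\s]*(.*)", re.IGNORECASE
)


def extract_comments(path: str, source: str) -> Iterator[dict]:
    """Yield one schema-shaped record per marker comment found in `source`."""
    for line_no, line in enumerate(source.splitlines(), start=1):
        match = COMMENT_RE.search(line)
        if match:
            yield {
                "type": MARKER_TYPES[match.group(1).upper()],
                "file": path,
                "line": line_no,
                "content": match.group(2).strip(),
                "context": line.strip(),  # the full source line, for reviewers
            }


sample = (
    "total = price * qty  # TODO: apply regional tax\n"
    "// FIXME rounding differs from legacy\n"
)
records = list(extract_comments("billing.py", sample))
```

Keeping the whole source line in `context` lets a reviewer judge the comment without opening the file.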

2. Commit Message Mining

Extract rationale from commits
Identify bug fix context
Find feature explanations
Track decision history
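
One possible way to mine commit messages is to read `git log` with a machine-friendly format and split each message into a subject and a body. The sketch below assumes that approach; the helper names `parse_git_log` and `read_git_log` are hypothetical, and mapping the subject to `message` and the body to `context` is only one interpretation of the Output Schema. `relatedFiles` is left empty here.

```python
import subprocess

FIELD_SEP = "\x1f"   # ASCII unit separator between fields
RECORD_SEP = "\x1e"  # ASCII record separator between commits
# %H = full hash, %an = author name, %B = raw message, %xNN = literal byte
GIT_FORMAT = "%H%x1f%an%x1f%B%x1e"


def parse_git_log(raw: str) -> list[dict]:
    """Turn raw `git log` output (in GIT_FORMAT) into schema-shaped records."""
    commits = []
    for record in raw.split(RECORD_SEP):
        record = record.strip("\n")  # only newlines: the separators count as whitespace
        if not record:
            continue
        commit_hash, author, message = record.split(FIELD_SEP, 2)
        subject, _, body = message.strip().partition("\n")
        commits.append({
            "hash": commit_hash,
            "message": subject,
            "author": author,
            "context": body.strip(),  # the body often carries the rationale
            "relatedFiles": [],
        })
    return commits


def read_git_log(repo_path: str, limit: int = 100) -> list[dict]:
    """Run `git log` in `repo_path` and parse the most recent `limit` commits."""
    raw = subprocess.run(
        ["git", "-C", repo_path, "log", "-n", str(limit), f"--format={GIT_FORMAT}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_git_log(raw)


sample = (
    "abc123\x1fDana\x1fFix rounding in invoice totals\n\n"
    "Legacy system truncated cents; finance requires half-up.\x1e\n"
)
commits = parse_git_log(sample)
```

Separating parsing from the subprocess call keeps the parser testable without a repository on disk.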

3. Documentation Parsing

Parse markdown documentation
Extract from wikis
Process README files
Catalog API docs

4. Pattern Recognition

Identify coding patterns
Recognize idioms
Detect conventions
Map architectural patterns

5. Business Rule Extraction

Find business logic comments
Extract validation rules
Identify calculation explanations
Document edge cases
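
Business rule extraction is hard to automate fully. A plausible first pass is a keyword heuristic that flags comments for human review. The sketch below assumes comment records with `file`, `line`, and `content` fields as in the Output Schema; the hint words, the function name `find_rule_candidates`, and the `needs-review` status are illustrative choices, not part of the skill.

```python
import re

# Words that often signal a constraint; a hypothetical, deliberately small list.
RULE_HINTS = re.compile(
    r"\b(must|never|always|only if|at least|at most|required)\b", re.IGNORECASE
)


def find_rule_candidates(comments: list[dict]) -> list[dict]:
    """Return comments whose text contains constraint-like wording."""
    candidates = []
    for comment in comments:
        hints = sorted({m.group(1).lower() for m in RULE_HINTS.finditer(comment["content"])})
        if hints:
            candidates.append({
                "file": comment["file"],
                "line": comment["line"],
                "text": comment["content"],
                "hints": hints,
                "status": "needs-review",  # a person confirms before it becomes a rule
            })
    return candidates


comments = [
    {"file": "orders.py", "line": 42,
     "content": "Discount must never exceed 30% for wholesale accounts"},
    {"file": "orders.py", "line": 57, "content": "loop over line items"},
]
candidates = find_rule_candidates(comments)
```

Marking every hit as a candidate rather than a confirmed rule leaves the final judgment to engineers who know the domain.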

6. Glossary Generation

Build domain vocabulary
Define abbreviations
Map term usage
Create terminology guide
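
One simple way to seed a glossary is to look for abbreviations defined inline, in the form "Long Form (ABBR)". The sketch below assumes that convention and stores the result as an abbreviation-to-definition mapping, which is one possible reading of the `glossary` object in the Output Schema. The function name `build_glossary` is hypothetical.

```python
import re

# Two to six capitalized words followed by a parenthesized uppercase abbreviation.
DEFINITION_RE = re.compile(
    r"((?:[A-Z][a-z]+\s+){1,5}[A-Z][a-z]+)\s+\(([A-Z]{2,6})\)"
)


def build_glossary(texts: list[str]) -> dict[str, str]:
    """Map each abbreviation to its long form when the initials line up."""
    glossary: dict[str, str] = {}
    for text in texts:
        for long_form, abbr in DEFINITION_RE.findall(text):
            # Keep only the trailing words so a leading "The" is not included.
            words = long_form.split()[-len(abbr):]
            initials = "".join(word[0] for word in words)
            if initials == abbr:
                glossary.setdefault(abbr, " ".join(words))
    return glossary


texts = [
    "Posts to the General Ledger (GL) nightly.",
    "The Purchase Order (PO) must be approved first.",
]
glossary = build_glossary(texts)
```

Checking the initials against the abbreviation filters out parenthesized tokens that are not definitions, at the cost of missing abbreviations that skip words or use lowercase letters.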

Tool Integrations

Tool	Purpose	Integration Method
Sourcegraph	Code search	API
GitHub API	Commit history	API
grep/ripgrep	Pattern search	CLI
Custom NLP	Text analysis	Library
Confluence API	Wiki extraction	API

Output Schema

{
  "extractionId": "string",
  "timestamp": "ISO8601",
  "knowledge": {
    "comments": [
      {
        "type": "todo|fixme|note|warning|explanation",
        "file": "string",
        "line": "number",
        "content": "string",
        "context": "string"
      }
    ],
    "commits": [
      {
        "hash": "string",
        "message": "string",
        "author": "string",
        "context": "string",
        "relatedFiles": []
      }
    ],
    "documentation": [],
    "businessRules": [],
    "glossary": {}
  }
}
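
To make the schema concrete, the following sketch assembles a result object with the same keys and serializes it to JSON. The function name `build_extraction`, the use of a UUID for `extractionId`, and the sample comment record are assumptions made for illustration; the schema itself only requires a string ID and an ISO 8601 timestamp.

```python
import json
import uuid
from datetime import datetime, timezone


def build_extraction(comments, commits, documentation=None,
                     business_rules=None, glossary=None) -> dict:
    """Assemble an extraction result shaped like the Output Schema."""
    return {
        "extractionId": str(uuid.uuid4()),  # assumption: any unique string works
        "timestamp": datetime.now(timezone.utc).isoformat(),  # ISO 8601, UTC
        "knowledge": {
            "comments": comments,
            "commits": commits,
            "documentation": documentation or [],
            "businessRules": business_rules or [],
            "glossary": glossary or {},
        },
    }


result = build_extraction(
    comments=[{
        "type": "warning",
        "file": "billing.py",
        "line": 12,
        "content": "do not change rounding mode",
        "context": "# WARNING: do not change rounding mode",
    }],
    commits=[],
)
print(json.dumps(result, indent=2))
```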

Integration with Migration Processes

legacy-codebase-assessment: Knowledge discovery
documentation-migration: Source material

Related Skills

legacy-code-interpreter: Code understanding
documentation-generator: Doc creation

Related Agents

legacy-system-archaeologist: Uses this skill for excavation
documentation-migration-agent: Uses this skill for doc creation

Source

git clone https://github.com/a5c-ai/babysitter

Skill file: plugins/babysitter/skills/babysit/process/specializations/code-migration-modernization/skills/knowledge-extractor/SKILL.md

Overview

The Knowledge Extractor harvests insights from code comments, commit messages, and documentation to preserve institutional memory during migrations. It surfaces rationale, patterns, rules, and glossary terms for future reference.

How This Skill Works

It analyzes sources such as comments, commit messages, and documentation, using parsing and lightweight NLP to extract TODOs, rationales, patterns, and rules. The results are organized into the Output Schema with sections for comments, commits, documentation, businessRules, and glossary.

When to Use It

During legacy codebase assessment to uncover undocumented decisions
When migrating documentation and wikis to preserve context
During code migration or modernization to retain decision history
For pattern recognition and architectural convention mapping in refactors
To extract business rules and edge-case explanations for validation

Quick Start

Step 1: Point the extractor at the codebase, documentation, and commit history
Step 2: Run extraction to populate the Output Schema with comments, commits, documentation, business rules, and glossary data
Step 3: Review results, resolve ambiguities, and store findings in the migration backlog or knowledge base

Best Practices

Define the source scope (files, docs, commits) before extraction
Normalize glossary terms and ensure consistent term usage in outputs
Capture provenance (file, line, hash, author) for traceability
Validate findings with engineers to avoid misinterpretation
Run incremental extractions aligned with migration milestones

Example Use Cases

Extract TODO, FIXME, and explanatory notes from a legacy codebase
Mine commit messages for rationale and decision history during migration
Parse READMEs, wikis, and API docs to catalog surface area and usage notes
Identify coding patterns, idioms, and architectural conventions
Document business rules, validation logic, and edge-case explanations found in comments
