
knowledge-extractor

npx machina-cli add skill a5c-ai/babysitter/knowledge-extractor --openclaw

Knowledge Extractor Skill

Extracts tribal knowledge from code comments, commit messages, documentation, and other sources to preserve institutional memory during migration.

Purpose

Enable knowledge preservation for:

  • Comment analysis and extraction
  • Commit message mining
  • Documentation parsing
  • Pattern recognition
  • Business rule discovery

Capabilities

1. Comment Analysis

  • Extract TODO/FIXME comments
  • Parse documentation comments
  • Identify explanatory notes
  • Find warning comments
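As a rough illustration of comment analysis, the following Python sketch scans one file's text for TODO, FIXME, NOTE, and WARNING markers and emits entries shaped like the `comments` items of the Output Schema. The marker list, the comment-prefix regex, the function name, and the one-line-either-side `context` window are assumptions, not the skill's actual implementation. It does not attempt the schema's `explanation` type, which needs more than a keyword match.

```python
import re

# Assumed marker set and comment prefixes (#, //, /*, *); real codebases may use others.
MARKER_RE = re.compile(
    r"(?:#|//|/\*|\*)\s*(TODO|FIXME|NOTE|WARNING)\b[:\s]*(.*)",
    re.IGNORECASE,
)


def extract_marker_comments(path, text):
    """Return dicts shaped like the Output Schema's `comments` entries (hypothetical helper)."""
    entries = []
    lines = text.splitlines()
    for i, line in enumerate(lines):
        match = MARKER_RE.search(line)
        if not match:
            continue
        entries.append({
            "type": match.group(1).lower(),  # todo | fixme | note | warning
            "file": path,
            "line": i + 1,  # 1-based line number for provenance
            # Drop a trailing "*/" left over from block comments.
            "content": match.group(2).strip().rstrip("*/").strip(),
            # One line before and after gives reviewers some surrounding context.
            "context": "\n".join(lines[max(0, i - 1): i + 2]),
        })
    return entries
```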

2. Commit Message Mining

  • Extract rationale from commits
  • Identify bug fix context
  • Find feature explanations
  • Track decision history
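Commit message mining can start from plain `git log` output. This sketch asks git for the hash (`%H`), author name (`%an`), raw message (`%B`), and changed files (`--name-only`), using ASCII separator bytes (`%x1e`, `%x1f`) so multi-line messages parse safely. Treating everything after the subject line as the commit's `context` is an assumption based on the common convention that commit bodies explain why a change was made; `read_git_log` and `parse_git_log` are hypothetical names.

```python
import subprocess

FIELD_SEP = "\x1f"   # ASCII unit separator between fields
RECORD_SEP = "\x1e"  # ASCII record separator before each commit

# %x1e/%x1f emit the separator bytes; %H = hash, %an = author name, %B = raw message.
GIT_FORMAT = "%x1e%H%x1f%an%x1f%B%x1f"


def read_git_log(repo_dir):
    """Return raw `git log` output for repo_dir (requires git on PATH)."""
    return subprocess.run(
        ["git", "-C", repo_dir, "log", f"--pretty=format:{GIT_FORMAT}", "--name-only"],
        capture_output=True, text=True, check=True,
    ).stdout


def parse_git_log(raw):
    """Turn raw output into dicts shaped like the schema's `commits` entries."""
    commits = []
    for record in raw.split(RECORD_SEP):
        if not record.strip():
            continue
        commit_hash, author, message, files_blob = record.split(FIELD_SEP, 3)
        message = message.strip()
        _subject, _, body = message.partition("\n")
        commits.append({
            "hash": commit_hash.strip(),
            "message": message,
            "author": author,
            # Assumption: the body (everything after the subject) carries the rationale.
            "context": body.strip(),
            # --name-only lists changed paths after the formatted message.
            "relatedFiles": [f.strip() for f in files_blob.splitlines() if f.strip()],
        })
    return commits
```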

3. Documentation Parsing

  • Parse markdown documentation
  • Extract knowledge from wikis
  • Process README files
  • Catalog API docs
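For documentation parsing, one simple approach is to split Markdown files into heading-delimited sections so each section can be cataloged with its source. The sketch below handles ATX headings (`#` through `######`) and ignores `#` lines inside backtick-fenced code blocks; Setext headings, tilde fences, and wiki markup are out of scope. The section dictionary layout and the function name are assumptions, since the schema leaves `documentation` untyped.

```python
import re

# ATX heading: 1-6 '#', at least one space, text, optional closing '#' run.
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*?)\s*#*\s*$")


def split_markdown_sections(source, text):
    """Split Markdown text into sections keyed by their heading (hypothetical helper)."""
    sections = []
    heading, level, body = None, 0, []
    in_fence = False

    def flush():
        content = "\n".join(body).strip()
        # Keep every headed section, and the preamble only if it has text.
        if heading is not None or content:
            sections.append({"source": source, "heading": heading,
                             "level": level, "content": content})

    for line in text.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # '#' inside code fences is not a heading
        match = None if in_fence else HEADING_RE.match(line)
        if match:
            flush()
            heading, level, body = match.group(2), len(match.group(1)), []
        else:
            body.append(line)
    flush()
    return sections
```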

4. Pattern Recognition

  • Identify coding patterns
  • Recognize idioms
  • Detect conventions
  • Map architectural patterns
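Convention detection can be approximated by tallying identifier shapes. This hypothetical helper counts snake_case, camelCase, PascalCase, and UPPER_SNAKE identifiers in a source string. Single-word names are ignored as ambiguous, and because it uses a regex rather than a parser, identifiers inside strings and comments are counted too. Treat the result as a hint about the dominant naming convention, not a definitive style analysis.

```python
import re
from collections import Counter

IDENT_RE = re.compile(r"\b[A-Za-z_][A-Za-z0-9_]*\b")


def classify(name):
    """Return a naming-style label, or None when the shape is ambiguous."""
    if re.fullmatch(r"[a-z]+(?:_[a-z0-9]+)+", name):
        return "snake_case"
    if re.fullmatch(r"[a-z]+(?:[A-Z][a-z0-9]*)+", name):
        return "camelCase"
    if re.fullmatch(r"(?:[A-Z][a-z0-9]+){2,}", name):
        return "PascalCase"
    if re.fullmatch(r"[A-Z]+(?:_[A-Z0-9]+)+", name):
        return "UPPER_SNAKE"
    return None  # single words like "apply" fit several conventions


def naming_conventions(code):
    """Count identifiers per naming style across a chunk of source code."""
    counts = Counter()
    for name in IDENT_RE.findall(code):
        style = classify(name)
        if style:
            counts[style] += 1
    return counts
```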

5. Business Rule Extraction

  • Find business logic comments
  • Extract validation rules
  • Identify calculation explanations
  • Document edge cases
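Business-rule comments often use normative wording. The sketch below flags `#` and `//` comments that contain words such as "must", "never", or "at least", and records any numbers (thresholds, percentages) they mention. The keyword list, the function name, and the output fields (`statement`, `trigger`, `numbers`) are assumptions, since the schema leaves `businessRules` untyped. The comment regex is deliberately crude (a URL such as `http://` would also match), so every hit still needs the engineer validation recommended under Best Practices.

```python
import re

# Assumed normative keywords that often signal a business rule.
RULE_HINT_RE = re.compile(
    r"\b(must(?: not)?|never|always|required|only if|at least|at most|cannot)\b",
    re.IGNORECASE,
)
COMMENT_RE = re.compile(r"(?:#|//)\s*(.+)$")  # line comments only
NUMBER_RE = re.compile(r"\d+(?:\.\d+)?%?")


def find_rule_comments(path, text):
    """Return candidate business-rule comments with provenance (hypothetical helper)."""
    rules = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        comment = COMMENT_RE.search(line)
        if not comment:
            continue
        statement = comment.group(1).strip()
        hint = RULE_HINT_RE.search(statement)
        if hint:
            rules.append({
                "file": path,
                "line": lineno,
                "statement": statement,
                "trigger": hint.group(1).lower(),
                "numbers": NUMBER_RE.findall(statement),
            })
    return rules
```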

6. Glossary Generation

  • Build domain vocabulary
  • Define abbreviations
  • Map term usage
  • Create terminology guide
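For glossary generation, a common heuristic is to look for inline definitions of the form "Long Form (LF)" and keep a pair only when the initials of the preceding words spell the abbreviation. This Python sketch implements that heuristic and returns a dict suitable for the schema's `glossary` object; the function name and the eight-word window are assumptions. It will miss abbreviations that skip words or use non-initial letters, for example "Extensible Markup Language (XML)".

```python
import re

# Up to eight words followed by a parenthesized abbreviation such as "(GL)".
DEF_RE = re.compile(r"((?:[A-Za-z][\w-]*\s+){1,8})\(([A-Z][A-Za-z]{1,9})\)")


def find_abbreviations(text):
    """Map abbreviations to expansions whose word initials spell them (hypothetical helper)."""
    glossary = {}
    for match in DEF_RE.finditer(text):
        words = match.group(1).split()
        abbr = match.group(2)
        # Take as many trailing words as the abbreviation has letters.
        candidate = words[-len(abbr):]
        initials = "".join(word[0] for word in candidate).upper()
        if len(candidate) == len(abbr) and initials == abbr.upper():
            glossary.setdefault(abbr, " ".join(candidate))  # first definition wins
    return glossary
```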

Tool Integrations

| Tool | Purpose | Integration Method |
| --- | --- | --- |
| Sourcegraph | Code search | API |
| GitHub API | Commit history | API |
| grep/ripgrep | Pattern search | CLI |
| Custom NLP | Text analysis | Library |
| Confluence API | Wiki extraction | API |
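The grep/ripgrep integration can be as thin as a subprocess call. This sketch runs `rg` with `--line-number`, `--no-heading`, and `--with-filename` so every hit prints as `path:line:text`, then parses those lines. It assumes `rg` is on `PATH` and that paths contain no colons (Windows drive letters would break the split); ripgrep's `--json` output is the sturdier choice when that matters. Both function names are hypothetical.

```python
import subprocess


def rg_search(pattern, root="."):
    """Run ripgrep and return its raw 'path:line:text' output."""
    result = subprocess.run(
        ["rg", "--line-number", "--no-heading", "--with-filename", "-e", pattern, root],
        capture_output=True, text=True,
    )
    # ripgrep exits 0 on matches, 1 on no matches, and 2 on errors.
    if result.returncode not in (0, 1):
        raise RuntimeError(result.stderr)
    return result.stdout


def parse_rg_output(raw):
    """Split each output line into file, line number, and matched text."""
    hits = []
    for line in raw.splitlines():
        path, lineno, text = line.split(":", 2)  # text itself may contain colons
        hits.append({"file": path, "line": int(lineno), "content": text.strip()})
    return hits
```

For example, `parse_rg_output(rg_search("TODO|FIXME", "src"))` would list marker comments under a `src` directory.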

Output Schema

{
  "extractionId": "string",
  "timestamp": "ISO8601",
  "knowledge": {
    "comments": [
      {
        "type": "todo|fixme|note|warning|explanation",
        "file": "string",
        "line": "number",
        "content": "string",
        "context": "string"
      }
    ],
    "commits": [
      {
        "hash": "string",
        "message": "string",
        "author": "string",
        "context": "string",
        "relatedFiles": []
      }
    ],
    "documentation": [],
    "businessRules": [],
    "glossary": {}
  }
}
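To make the Output Schema concrete, here is one possible Python mapping using dataclasses. The field names follow the schema; generating `extractionId` with `uuid4`, stamping `timestamp` in UTC, storing `line` as an integer, and rejecting unknown comment types are choices made for this sketch, not requirements stated by the skill.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone

COMMENT_TYPES = {"todo", "fixme", "note", "warning", "explanation"}


@dataclass
class CommentEntry:
    type: str  # one of COMMENT_TYPES
    file: str
    line: int
    content: str
    context: str = ""


@dataclass
class CommitEntry:
    hash: str
    message: str
    author: str
    context: str = ""
    relatedFiles: list = field(default_factory=list)


@dataclass
class Knowledge:
    comments: list = field(default_factory=list)
    commits: list = field(default_factory=list)
    documentation: list = field(default_factory=list)
    businessRules: list = field(default_factory=list)
    glossary: dict = field(default_factory=dict)


def build_report(knowledge):
    """Wrap extracted knowledge in the top-level schema fields."""
    for comment in knowledge.comments:
        if comment.type not in COMMENT_TYPES:
            raise ValueError(f"unknown comment type: {comment.type}")
    return {
        "extractionId": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),  # ISO 8601 with offset
        "knowledge": asdict(knowledge),  # recursively converts nested dataclasses
    }
```

`json.dumps(build_report(...), indent=2)` then yields a document in the schema's shape.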

Integration with Migration Processes

  • legacy-codebase-assessment: Knowledge discovery
  • documentation-migration: Source material

Related Skills

  • legacy-code-interpreter: Code understanding
  • documentation-generator: Doc creation

Related Agents

  • legacy-system-archaeologist: Uses this skill for knowledge excavation
  • documentation-migration-agent: Uses this skill as input for documentation creation

Source

https://github.com/a5c-ai/babysitter/blob/main/plugins/babysitter/skills/babysit/process/specializations/code-migration-modernization/skills/knowledge-extractor/SKILL.md

Overview

The Knowledge Extractor harvests insights from code comments, commit messages, and documentation to preserve institutional memory during migrations. It surfaces rationale, patterns, rules, and glossary terms for future reference.

How This Skill Works

It analyzes sources such as comments, commit messages, and documentation, using parsing and lightweight NLP to extract TODOs, rationales, patterns, and rules. The results are organized into the Output Schema with sections for comments, commits, documentation, businessRules, and glossary.

When to Use It

  • During legacy codebase assessment to uncover undocumented decisions
  • When migrating documentation and wikis to preserve context
  • During code migration or modernization to retain decision history
  • For pattern recognition and architectural convention mapping in refactors
  • To extract business rules and edge-case explanations for validation

Quick Start

  1. Point the extractor at the codebase, documentation, and commit history.
  2. Run the extraction to populate the Output Schema with comments, commits, and glossary data.
  3. Review the results, resolve ambiguities, and store the findings in the migration backlog or knowledge base.
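The Quick Start steps can be sketched end to end: define the scope, scan it, and write a report in the Output Schema's shape. The suffix and directory lists, the coarse TODO/FIXME regex (which also matches inside string literals), and the `run_extraction` function are hypothetical; the commit, documentation, rule, and glossary sections are left empty here.

```python
import json
import re
import uuid
from datetime import datetime, timezone
from pathlib import Path

# Scope assumptions: adjust to the codebase being assessed.
INCLUDE_SUFFIXES = {".py", ".js", ".java", ".cbl"}
EXCLUDE_DIRS = {".git", "node_modules", "vendor", "build"}
MARKER_RE = re.compile(r"\b(TODO|FIXME)\b[:\s]*(.*)")


def files_in_scope(root):
    """Yield source files under root, skipping excluded directories."""
    root = Path(root)
    for path in sorted(root.rglob("*")):
        rel_parts = set(path.relative_to(root).parts)
        if path.is_file() and path.suffix in INCLUDE_SUFFIXES and not EXCLUDE_DIRS & rel_parts:
            yield path


def run_extraction(root, out_file):
    """Scan the scoped files and write a schema-shaped JSON report."""
    root = Path(root)
    comments = []
    for path in files_in_scope(root):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            match = MARKER_RE.search(line)
            if match:
                comments.append({
                    "type": match.group(1).lower(),
                    "file": path.relative_to(root).as_posix(),
                    "line": lineno,
                    "content": match.group(2).strip(),
                    "context": line.strip(),
                })
    report = {
        "extractionId": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "knowledge": {"comments": comments, "commits": [], "documentation": [],
                      "businessRules": [], "glossary": {}},
    }
    Path(out_file).write_text(json.dumps(report, indent=2), encoding="utf-8")
    return report
```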

Best Practices

  • Define the source scope (files, docs, commits) before extraction
  • Normalize glossary terms and ensure consistent term usage in outputs
  • Capture provenance (file, line, hash, author) for traceability
  • Validate findings with engineers to avoid misinterpretation
  • Run incremental extractions aligned with migration milestones
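For incremental runs, one option is to key each finding by its provenance (file, line, and content for comments; hash for commits) and report only findings not seen in the previous run. These are hypothetical helpers, not part of the skill. Keying on the line number means a comment that merely moves shows up as new; dropping `line` from the key avoids that but merges identical notes in the same file. For commits, limiting the log to a range such as `git log <last-hash>..HEAD` avoids re-reading history at all.

```python
def provenance_key(item):
    """Identify a finding by where it came from (assumes Output Schema field names)."""
    if "hash" in item:
        return ("commit", item["hash"])
    return ("comment", item["file"], item["line"], item["content"])


def new_findings(previous, current):
    """Return findings in `current` whose provenance was not in `previous`."""
    seen = {provenance_key(item) for item in previous}
    fresh = []
    for item in current:
        key = provenance_key(item)
        if key not in seen:
            seen.add(key)  # also drops duplicates within the current run
            fresh.append(item)
    return fresh
```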

Example Use Cases

  • Extract TODO, FIXME, and explanatory notes from a legacy codebase
  • Mine commit messages for rationale and decision history during migration
  • Parse READMEs, wikis, and API docs to catalog surface area and usage notes
  • Identify coding patterns, idioms, and architectural conventions
  • Document business rules, validation logic, and edge-case explanations found in comments
