knowledge-extractor
npx machina-cli add skill a5c-ai/babysitter/knowledge-extractor --openclaw
Files (1)
SKILL.md
2.6 KB
Knowledge Extractor Skill
Extracts tribal knowledge from code comments, commit messages, documentation, and other sources to preserve institutional memory during migration.
Purpose
Preserve knowledge through:
- Comment analysis and extraction
- Commit message mining
- Documentation parsing
- Pattern recognition
- Business rule discovery
Capabilities
1. Comment Analysis
- Extract TODO/FIXME comments
- Parse documentation comments
- Identify explanatory notes
- Find warning comments
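The comment analysis described above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the marker list, the regular expression, and the `extract_comments` helper are all assumptions, and only `#` and `//` line comments are handled. The record fields mirror the `comments` entries of the Output Schema.

```python
import re
from typing import Dict, List

# Hypothetical marker-to-type mapping; type names follow the Output Schema.
MARKER_TYPES = {
    "TODO": "todo",
    "FIXME": "fixme",
    "NOTE": "note",
    "WARNING": "warning",
}

# Matches a '#' or '//' comment that begins with one of the markers above.
COMMENT_RE = re.compile(
    r"(?:#|//)\s*(TODO|FIXME|NOTE|WARNING)\b[:\s]*(.*)", re.IGNORECASE
)


def extract_comments(path: str, source: str) -> List[Dict[str, object]]:
    """Return one record per marker comment found in `source`."""
    records: List[Dict[str, object]] = []
    for number, line in enumerate(source.splitlines(), start=1):
        match = COMMENT_RE.search(line)
        if match is None:
            continue
        records.append({
            "type": MARKER_TYPES[match.group(1).upper()],
            "file": path,
            "line": number,
            "content": match.group(2).strip(),
            # Code preceding the comment on the same line is kept as context.
            "context": line[: match.start()].strip(),
        })
    return records


sample = """\
total = price * qty  # TODO: apply regional tax
// FIXME rounding differs from the legacy system
x = 1  # plain comment
"""
found = extract_comments("billing.py", sample)
for record in found:
    print(record)
```

A regex pass like this will also match markers inside string literals, so a real extractor would likely rely on a language-aware tokenizer.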
2. Commit Message Mining
- Extract rationale from commits
- Identify bug fix context
- Find feature explanations
- Track decision history
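One possible way to mine commit messages is to read `git log` with explicit field separators and map each commit onto the `commits` entries of the Output Schema. The sketch below is an assumption about how this could be done: the helper names, the separator choice, and the keyword-based `classify` labels are hypothetical, and `relatedFiles` is left empty.

```python
import re
import subprocess
from typing import Dict, List

FIELD_SEP = "\x1f"   # unit separator between fields
RECORD_SEP = "\x1e"  # record separator between commits
# %H = full hash, %an = author name, %B = raw message (subject and body)
LOG_FORMAT = "%H%x1f%an%x1f%B%x1e"


def read_git_log(repo_path: str) -> str:
    """Run `git log` in `repo_path` and return its raw output."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", f"--pretty=format:{LOG_FORMAT}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def classify(message: str) -> str:
    """Very rough keyword heuristic; the labels are illustrative only."""
    if re.search(r"\b(fix|bug)", message, re.IGNORECASE):
        return "bug fix"
    if re.search(r"\b(add|feat)", message, re.IGNORECASE):
        return "feature"
    return "other"


def parse_git_log(raw: str) -> List[Dict[str, object]]:
    """Split raw log output into schema-shaped commit records."""
    commits: List[Dict[str, object]] = []
    for record in raw.split(RECORD_SEP):
        record = record.strip()
        if not record:
            continue
        commit_hash, author, message = record.split(FIELD_SEP, 2)
        commits.append({
            "hash": commit_hash,
            "message": message.strip(),
            "author": author,
            "context": classify(message),
            "relatedFiles": [],
        })
    return commits


# Hand-written sample in the same format, so the parser runs without a repository.
raw_sample = (
    "abc123\x1fAda\x1fFix rounding in invoice totals\n\n"
    "Legacy system truncated.\x1e\n"
    "def456\x1fLin\x1fAdd export endpoint\x1e"
)
commits = parse_git_log(raw_sample)
```

Against a real repository, `parse_git_log(read_git_log("."))` would produce the same record shape.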
3. Documentation Parsing
- Parse markdown documentation
- Extract from wikis
- Process README files
- Catalog API docs
4. Pattern Recognition
- Identify coding patterns
- Recognize idioms
- Detect conventions
- Map architectural patterns
5. Business Rule Extraction
- Find business logic comments
- Extract validation rules
- Identify calculation explanations
- Document edge cases
6. Glossary Generation
- Build domain vocabulary
- Define abbreviations
- Map term usage
- Create terminology guide
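As an illustration of the "define abbreviations" step, the sketch below looks for the common "Long Form (ABBR)" pattern and keeps a pair only when the initials of the preceding words match the abbreviation. The `build_glossary` helper and the matching rule are assumptions made for this example, not a description of how the skill builds its glossary.

```python
import re
from typing import Dict


def build_glossary(text: str) -> Dict[str, str]:
    """Map abbreviations to their long forms using 'Long Form (ABBR)' mentions."""
    glossary: Dict[str, str] = {}
    for match in re.finditer(r"\(([A-Z]{2,6})\)", text):
        abbr = match.group(1)
        # Words that appear before the parenthesised abbreviation.
        words = re.findall(r"[A-Za-z]+", text[: match.start()])
        candidate = words[-len(abbr):]
        initials = "".join(word[0].upper() for word in candidate)
        # Keep the pair only if the initials spell the abbreviation.
        if initials == abbr:
            glossary.setdefault(abbr, " ".join(candidate))
    return glossary


doc = (
    "Each Stock Keeping Unit (SKU) maps to one row. "
    "Orders use a return merchandise authorization (RMA). "
    "See also (TBD)."
)
glossary = build_glossary(doc)
print(glossary)
```

Abbreviations whose long form contains stop words ("Department of Defense (DoD)") would need a looser matching rule than this one.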
Tool Integrations
| Tool | Purpose | Integration Method |
|---|---|---|
| Sourcegraph | Code search | API |
| GitHub API | Commit history | API |
| grep/ripgrep | Pattern search | CLI |
| Custom NLP | Text analysis | Library |
| Confluence API | Wiki extraction | API |
Output Schema
{
  "extractionId": "string",
  "timestamp": "ISO8601",
  "knowledge": {
    "comments": [
      {
        "type": "todo|fixme|note|warning|explanation",
        "file": "string",
        "line": "number",
        "content": "string",
        "context": "string"
      }
    ],
    "commits": [
      {
        "hash": "string",
        "message": "string",
        "author": "string",
        "context": "string",
        "relatedFiles": []
      }
    ],
    "documentation": [],
    "businessRules": [],
    "glossary": {}
  }
}
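A consumer of this schema might model it with dataclasses so that results serialize to the same key layout. The sketch below is one such arrangement under stated assumptions: the class names are hypothetical, the ID is a random UUID (the schema only requires a string), and the timestamp is the current UTC time in ISO 8601 form.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List


@dataclass
class Knowledge:
    """Container mirroring the `knowledge` object of the schema."""
    comments: List[Dict[str, Any]] = field(default_factory=list)
    commits: List[Dict[str, Any]] = field(default_factory=list)
    documentation: List[Any] = field(default_factory=list)
    businessRules: List[Any] = field(default_factory=list)
    glossary: Dict[str, Any] = field(default_factory=dict)


@dataclass
class ExtractionResult:
    """Top-level record; ID and timestamp formats are assumptions."""
    extractionId: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
    knowledge: Knowledge = field(default_factory=Knowledge)


result = ExtractionResult()
result.knowledge.comments.append({
    "type": "todo",
    "file": "billing.py",
    "line": 1,
    "content": "apply regional tax",
    "context": "total = price * qty",
})

# asdict() walks nested dataclasses, so the output matches the schema layout.
payload = json.dumps(asdict(result), indent=2)
print(payload)
```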
Integration with Migration Processes
- legacy-codebase-assessment: Knowledge discovery
- documentation-migration: Source material
Related Skills
- legacy-code-interpreter: Code understanding
- documentation-generator: Doc creation
Related Agents
- legacy-system-archaeologist: Uses for excavation
- documentation-migration-agent: Uses for doc creation
Source
git clone https://github.com/a5c-ai/babysitter/blob/main/plugins/babysitter/skills/babysit/process/specializations/code-migration-modernization/skills/knowledge-extractor/SKILL.md
Overview
The Knowledge Extractor harvests insights from code comments, commit messages, and documentation to preserve institutional memory during migrations. It surfaces rationale, patterns, rules, and glossary terms for future reference.
How This Skill Works
It analyzes sources such as comments, commit messages, and documentation, using parsing and lightweight NLP to extract TODOs, rationales, patterns, and rules. The results are organized into the Output Schema with sections for comments, commits, documentation, businessRules, and glossary.
When to Use It
- During legacy codebase assessment to uncover undocumented decisions
- When migrating documentation and wikis to preserve context
- During code migration or modernization to retain decision history
- For pattern recognition and architectural convention mapping in refactors
- To extract business rules and edge-case explanations for validation
Quick Start
- Step 1: Point the extractor at the codebase, documentation, and commit history
- Step 2: Run extraction to populate the Output Schema with comments, commits, and glossary data
- Step 3: Review results, resolve ambiguities, and store findings in the migration backlog or knowledge base
Best Practices
- Define the source scope (files, docs, commits) before extraction
- Normalize glossary terms and ensure consistent term usage in outputs
- Capture provenance (file, line, hash, author) for traceability
- Validate findings with engineers to avoid misinterpretation
- Run incremental extractions aligned with migration milestones
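The provenance practice above can be checked mechanically before results are stored. The sketch below assumes records shaped like the Output Schema entries; the `missing_provenance` helper and the required-field tuples are hypothetical names chosen for illustration.

```python
from typing import Any, Dict, Iterable, List, Sequence

# Provenance fields per record kind, taken from the Output Schema.
COMMENT_PROVENANCE = ("file", "line")
COMMIT_PROVENANCE = ("hash", "author")


def missing_provenance(
    records: Iterable[Dict[str, Any]], required: Sequence[str]
) -> List[str]:
    """Describe every record that lacks one of the required fields."""
    problems: List[str] = []
    for index, record in enumerate(records):
        # A field counts as absent when it is missing, None, or an empty string.
        absent = [key for key in required if record.get(key) in (None, "")]
        if absent:
            problems.append(f"record {index}: missing {', '.join(absent)}")
    return problems


comments = [
    {"file": "billing.py", "line": 3, "content": "apply regional tax"},
    {"file": "", "content": "rounding differs from the legacy system"},
]
problems = missing_provenance(comments, COMMENT_PROVENANCE)
print(problems)
```

Records flagged this way are candidates for the engineer review mentioned in the list, since a finding without a source location is hard to verify.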
Example Use Cases
- Extract TODO, FIXME, and explanatory notes from a legacy codebase
- Mine commit messages for rationale and decision history during migration
- Parse READMEs, wikis, and API docs to catalog surface area and usage notes
- Identify coding patterns, idioms, and architectural conventions
- Document business rules, validation logic, and edge-case explanations found in comments