chm-converter
chm to markdown and vectorDB
claude mcp add --transport stdio dtducas-chm-converter --env PYTHONUNBUFFERED=1 -- python chm_to_markdown.py
How to use
CHM to Markdown Converter is a Python-based MCP server that transforms CHM (Compiled HTML Help) files into clean Markdown documentation, tailored for Revit API content. It processes multiple versions of the documentation, extracts the HTML from each CHM file using 7-Zip, converts that HTML to Markdown while preserving its structure, and generates core index files for AI integration and search. The tool supports asynchronous processing, batch conversion, and configurable performance parameters, which makes it suitable for building a searchable, versioned documentation repository.
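The 7-Zip extraction step can be sketched as follows. This is a minimal illustration, not the script's actual code: it assumes the `7z` executable is on PATH and uses 7-Zip's standard `x` command (extract with full paths) with the `-o` (output directory) and `-y` (assume yes) switches. The function names `build_7z_command` and `extract_chm` are hypothetical.

```python
import subprocess
from pathlib import Path


def build_7z_command(chm_path, out_dir, seven_zip="7z"):
    """Return the argument list for extracting a CHM archive with 7-Zip.

    `x` extracts with full paths, `-o<dir>` sets the output directory
    (no space after -o), and `-y` answers yes to any overwrite prompts.
    """
    return [seven_zip, "x", str(chm_path), f"-o{out_dir}", "-y"]


def extract_chm(chm_path, out_dir, seven_zip="7z"):
    """Hypothetical helper: extract the pages of one CHM file into out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if 7-Zip exits with a non-zero code
    subprocess.run(
        build_7z_command(chm_path, out_dir, seven_zip),
        check=True,
        capture_output=True,
    )
    # Collect the extracted HTML pages (CHM content is usually .htm or .html)
    return sorted(
        p for p in out_dir.rglob("*") if p.suffix.lower() in {".htm", ".html"}
    )
```

For example, `extract_chm("resources/RevitAPI.chm", "temp/html")` would return the extracted pages, ready for HTML-to-Markdown conversion; both paths here are placeholders.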
To use the server, make sure the prerequisites are installed, place your CHM files in the resources/ folder, and run the provided Python script (chm_to_markdown.py). The converter writes its output organized by version (e.g., 2022/, 2023/), each with data/ and core/ subfolders. Options let you keep the intermediate HTML for debugging, control the number of worker threads, and adjust batch sizes for performance tuning.
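A per-version output layout like this could be created along the following lines. The `data/` and `core/` folder names and `file_index.json` come from this README; the output root, the function names, and the JSON schema (a plain mapping of Markdown file names to page titles) are assumptions made for illustration.

```python
import json
from pathlib import Path


def prepare_version_dirs(output_root, version):
    """Create <output_root>/<version>/data and .../core and return both paths."""
    version_dir = Path(output_root) / str(version)
    data_dir = version_dir / "data"   # generated Markdown pages
    core_dir = version_dir / "core"   # index and lookup files
    data_dir.mkdir(parents=True, exist_ok=True)
    core_dir.mkdir(parents=True, exist_ok=True)
    return data_dir, core_dir


def write_file_index(core_dir, titles):
    """Write a hypothetical file_index.json mapping Markdown file names to titles."""
    index_path = Path(core_dir) / "file_index.json"
    # ensure_ascii=False keeps non-ASCII titles readable in the JSON file
    index_path.write_text(
        json.dumps(titles, indent=2, ensure_ascii=False, sort_keys=True),
        encoding="utf-8",
    )
    return index_path
```

Keeping each version in its own folder means several API releases can sit side by side without their file names colliding.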
How to install
Prerequisites:
- Python 3.7+
- 7-Zip, installed in the default location and accessible via PATH (or adjust the script if your install path differs)
- Internet access for installing Python dependencies
- Clone or download the repository:
git clone https://github.com/your-org/dtducas-chm-converter.git
cd dtducas-chm-converter
- Create and activate a Python virtual environment (recommended):
python -m venv venv
# Windows
venv\Scripts\activate.bat
# macOS/Linux
source venv/bin/activate
- Install required Python packages:
pip install -r requirements.txt
If you prefer installing directly without a requirements file:
pip install beautifulsoup4 html2text aiofiles
- Ensure 7-Zip is installed and its folder is on PATH. On Windows, the executable is typically at C:\Program Files\7-Zip\7z.exe, so add C:\Program Files\7-Zip to PATH.
- Prepare resources:
- Place your CHM files in the resources/ folder.
- Run the converter:
python chm_to_markdown.py
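Before running the converter, a quick prerequisite check like the one below may save a failed run. It is a hypothetical helper script, not part of the repository: it looks for `7z` on PATH, falls back to the default Windows location mentioned above, and checks that the three packages from the pip command can be imported (beautifulsoup4 is imported as `bs4`).

```python
import importlib.util
import os
import shutil

# Import names of the packages installed by pip above
REQUIRED_MODULES = ["bs4", "html2text", "aiofiles"]

# Typical 7-Zip install location on Windows; adjust if yours differs
DEFAULT_WINDOWS_7Z = r"C:\Program Files\7-Zip\7z.exe"


def find_7zip():
    """Return the path to 7z/7z.exe, or None if it cannot be found."""
    on_path = shutil.which("7z")
    if on_path:
        return on_path
    if os.path.isfile(DEFAULT_WINDOWS_7Z):
        return DEFAULT_WINDOWS_7Z
    return None


def missing_modules(names=REQUIRED_MODULES):
    """Return the subset of module names that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    seven_zip = find_7zip()
    print("7-Zip:", seven_zip or "NOT FOUND (install it or add it to PATH)")
    missing = missing_modules()
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

Run it from the activated virtual environment so it checks the same interpreter the converter will use.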
Additional notes
Tips and considerations:
- The output is organized by version (e.g., 2022/, 2023/), with core/ containing files like file_index.json, id_lookup.json, and index.md, and data/ containing the generated Markdown docs.
- You can tune performance with command-line options such as --workers, --batch-size, and --semaphore. Example: python chm_to_markdown.py --all --workers 4 --batch-size 25 --semaphore 10
- If you encounter encoding issues or missing modules, make sure your Python environment (e.g., the activated virtual environment) is correctly configured and all dependencies are installed.
- For debugging, use --keep-html to keep the intermediate HTML so you can inspect the transformation results.
- The script maps code-snippet languages (e.g., csharp, vb, cpp) so that syntax highlighting is preserved, and it updates internal links for AI integration and search.
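The performance and debugging flags listed above (`--all`, `--workers`, `--batch-size`, `--semaphore`, `--keep-html`) suggest a pattern like the following. This is a sketch under stated assumptions, not the script's implementation: it assumes `--workers` sizes a thread pool for the blocking HTML-to-Markdown work, `--batch-size` is the number of files scheduled per batch, and `--semaphore` caps how many conversions are in flight at once. The defaults, the help texts, `convert_file`, and `cleanup_html` are hypothetical.

```python
import argparse
import asyncio
import shutil
from concurrent.futures import ThreadPoolExecutor


def parse_args(argv=None):
    """Parse the README's flags; the defaults here are illustrative guesses."""
    parser = argparse.ArgumentParser(description="CHM to Markdown converter (sketch)")
    parser.add_argument("--all", action="store_true", help="process all CHM files (assumed meaning)")
    parser.add_argument("--workers", type=int, default=4, help="thread pool size")
    parser.add_argument("--batch-size", type=int, default=25, help="files scheduled per batch")
    parser.add_argument("--semaphore", type=int, default=10, help="max conversions in flight")
    parser.add_argument("--keep-html", action="store_true", help="keep extracted HTML")
    return parser.parse_args(argv)


async def convert_all(paths, convert_file, workers, batch_size, limit):
    """Run the blocking convert_file(path) on every path, batch by batch."""
    loop = asyncio.get_running_loop()
    semaphore = asyncio.Semaphore(limit)
    results = []

    async def run_one(pool, path):
        # The semaphore caps how many conversions are submitted at once
        async with semaphore:
            # Offload the blocking work (e.g. HTML parsing) to the thread pool
            return await loop.run_in_executor(pool, convert_file, path)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for start in range(0, len(paths), batch_size):
            batch = paths[start:start + batch_size]
            # gather returns results in the same order as the batch
            results.extend(await asyncio.gather(*(run_one(pool, p) for p in batch)))
    return results


def cleanup_html(html_dir, keep_html):
    """Delete the intermediate HTML folder unless --keep-html was given."""
    if not keep_html:
        shutil.rmtree(html_dir, ignore_errors=True)
```

A caller would do `args = parse_args()` and then `asyncio.run(convert_all(paths, convert_file, args.workers, args.batch_size, args.semaphore))`, where `paths` and `convert_file` stand in for the script's own file list and converter. In this design the thread pool bounds how many conversions actually run in parallel, the semaphore bounds how many are queued at once, and batching bounds how much work is scheduled per round.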
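Code-snippet language mapping and internal-link updating could look roughly like this. Only the target fence names (`csharp`, `vb`, `cpp`) come from the notes above; the source labels in `LANGUAGE_MAP`, the rule that rewrites relative `.htm`/`.html` links to `.md`, and the function names are assumptions about how such a conversion might work.

```python
import re

FENCE = "`" * 3  # Markdown code fence marker

# Hypothetical labels as they might appear in the CHM pages, mapped to
# fence languages that Markdown syntax highlighters recognize.
LANGUAGE_MAP = {
    "c#": "csharp",
    "csharp": "csharp",
    "visual basic": "vb",
    "vb": "vb",
    "vb.net": "vb",
    "c++": "cpp",
    "visual c++": "cpp",
}

# Markdown link targets ending in .htm/.html (optional #anchor),
# skipping absolute URLs such as https://...
HTML_LINK = re.compile(
    r"\]\((?![a-z][a-z0-9+.-]*:)([^)\s#]+)\.html?(#[^)\s]*)?\)",
    re.IGNORECASE,
)


def fence_code(code, label):
    """Wrap a snippet in a fenced block tagged with the mapped language name."""
    lang = LANGUAGE_MAP.get(label.strip().lower(), "")
    return f"{FENCE}{lang}\n{code.rstrip()}\n{FENCE}"


def rewrite_links(markdown):
    """Point relative links at the converted .md files instead of .htm pages."""
    return HTML_LINK.sub(lambda m: f"]({m.group(1)}.md{m.group(2) or ''})", markdown)
```

Unknown labels fall back to an untagged fence, so a snippet is never dropped just because its language is not in the map.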