mcp_server_knowledge_engine
A basic knowledge-base MCP server that converts PDF files to Markdown and makes them searchable (keyword-based, non-semantic search).
claude mcp add --transport stdio lhstorm-mcp_server_knowledge_engine python server.py \
  --env PDF_FOLDER="./your-pdfs" \
  --env SERVER_NAME="your-server-name" \
  --env DOMAIN_KEYWORDS="comma,separated,keywords" \
  --env MARKDOWN_FOLDER="./your-pdfs/markdown"
How to use
This MCP server provides a Python-based knowledge engine that ingests a collection of PDFs, converts them to a searchable Markdown-backed format, and exposes a Claude Desktop-compatible MCP interface. It builds a TF-IDF inverted index with proximity matching to deliver relevant excerpts and supports domain-specific keyword tuning.
You can add PDFs to the configured folder, process them to generate the search index, and then generate an MCP configuration to connect Claude Desktop or other MCP clients.
Tools exposed by the server include:
- a Search tool that returns relevant passages with context
- a List tool that enumerates available documents and metadata
- a Content tool that retrieves full document content (with optional page-level access)
These tools are configurable and renameable via the provided configuration flow, allowing you to tailor the experience to your domain.
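To make the TF-IDF inverted index with proximity matching described above more concrete, here is a minimal sketch in Python. It is not the server's actual code: the function names (`build_index`, `min_span`, `search`), the IDF formula, and the `proximity_weight` bonus are all assumptions chosen for illustration.

```python
import math
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an inverted index: term -> {doc_id: [token positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def min_span(position_lists):
    """Width of the smallest token window holding one position from each list."""
    tagged = sorted((p, i) for i, ps in enumerate(position_lists) for p in ps)
    need = len(position_lists)
    counts = defaultdict(int)
    covered, left, best = 0, 0, None
    for pos, term_idx in tagged:
        counts[term_idx] += 1
        if counts[term_idx] == 1:
            covered += 1
        # Shrink the window from the left while it still covers every term.
        while covered == need:
            width = pos - tagged[left][0]
            if best is None or width < best:
                best = width
            left_idx = tagged[left][1]
            counts[left_idx] -= 1
            if counts[left_idx] == 0:
                covered -= 1
            left += 1
    return best


def search(index, n_docs, query, proximity_weight=1.0):
    """Rank documents by summed TF-IDF plus a bonus when query terms sit close together."""
    terms = [t for t in dict.fromkeys(tokenize(query)) if t in index]
    scores = defaultdict(float)
    for term in terms:
        postings = index[term]
        idf = math.log(1 + n_docs / len(postings))  # assumed smoothing
        for doc_id, positions in postings.items():
            scores[doc_id] += len(positions) * idf
    if len(terms) > 1:
        for doc_id in list(scores):
            if all(doc_id in index[t] for t in terms):
                span = min_span([index[t][doc_id] for t in terms])
                scores[doc_id] += proximity_weight / (1 + span)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    docs = {
        "a": "pump pressure valve settings",
        "b": "pump maintenance schedule and later notes about the valve",
        "c": "unrelated text",
    }
    idx = build_index(docs)
    for doc_id, score in search(idx, len(docs), "pump valve"):
        print(doc_id, round(score, 3))
```

In this sketch, documents "a" and "b" both contain the two query terms once, so their TF-IDF scores tie; the proximity bonus is what ranks "a" first, because its terms are closer together.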
How to install
Prerequisites:
- Python 3.8 or higher
- pip
- Git
Steps:
- Clone the repository
git clone https://github.com/lhstorm/mcp_server_knowledge_engine.git
cd mcp_server_knowledge_engine
- Create and activate a virtual environment (recommended)
python -m venv venv
# macOS/Linux
source venv/bin/activate
# Windows
venv\Scripts\activate
- Install dependencies
pip install -r requirements.txt
- Configure the server
  - Create a copy of server_config.json and adjust your settings (server name, display name, PDF folder, domain keywords, etc.).
  - Place PDFs in the configured folder.
- Run the server
python server.py
- Generate MCP config for Claude Desktop
python generate_mcp_config.py
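The output of generate_mcp_config.py is not shown here, so the following is only a sketch of what a generated entry might look like. It assumes the common `mcpServers` layout used by Claude Desktop's configuration file and reuses the environment variables from the install command above; `build_mcp_entry` is a hypothetical helper, not a function from this repository.

```python
import json


def build_mcp_entry(server_name, pdf_folder, domain_keywords, markdown_folder):
    """Return a Claude Desktop style config dict for one stdio server (assumed layout)."""
    return {
        "mcpServers": {
            server_name: {
                "command": "python",
                "args": ["server.py"],
                "env": {
                    # Same variable names as in the `claude mcp add` command.
                    "PDF_FOLDER": pdf_folder,
                    "SERVER_NAME": server_name,
                    "DOMAIN_KEYWORDS": ",".join(domain_keywords),
                    "MARKDOWN_FOLDER": markdown_folder,
                },
            }
        }
    }


if __name__ == "__main__":
    entry = build_mcp_entry(
        server_name="your-server-name",
        pdf_folder="./your-pdfs",
        domain_keywords=["comma", "separated", "keywords"],
        markdown_folder="./your-pdfs/markdown",
    )
    print(json.dumps(entry, indent=2))
```

If you write such an entry by hand, consider using absolute paths for server.py and the PDF folder, since the client may not launch the server from the repository directory.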
Optional steps:
- Use manage_server.py for CLI tasks such as create-config, add-pdf, process-pdfs, etc.
- Use the interactive setup to customize server name, display name, and domain keywords.
Additional notes
Tips and troubleshooting:
- Ensure PDFs are accessible in the configured pdf_folder and that the process-pdfs step has been run to generate the searchable index.
- If the index seems stale after adding new PDFs, re-run process-pdfs and re-run generate_mcp_config to reflect changes in the MCP config.
- The domain_keywords setting helps tailor search relevance; consider domain-specific terms that users would query.
- If Claude Desktop does not show the server, restart Claude Desktop after generating the MCP config.
- Environment variables can be used to override paths and metadata without changing code; keep them in sync with your deployment environment.
- For large PDF collections, enable parallel_processing under processing to speed up indexing, and keep cache_enabled turned on to benefit from the MD5-based change detection.
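The MD5-based change detection mentioned in the last tip can be pictured roughly as follows. This is a sketch under assumptions, not the project's implementation: the cache file format (a JSON map of file name to digest) and the names `file_md5` and `find_changed_pdfs` are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Hash a file in chunks so large PDFs are not loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_changed_pdfs(pdf_folder, cache_file):
    """Return names of PDFs that are new or modified since the last run, then update the cache."""
    cache_path = Path(cache_file)
    old = json.loads(cache_path.read_text()) if cache_path.exists() else {}
    new, changed = {}, []
    for pdf in sorted(Path(pdf_folder).glob("*.pdf")):
        digest = file_md5(pdf)
        new[pdf.name] = digest
        if old.get(pdf.name) != digest:
            changed.append(pdf.name)
    # Persist the fresh digests so the next run only reports real changes.
    cache_path.write_text(json.dumps(new, indent=2))
    return changed
```

With a scheme like this, only the files returned by `find_changed_pdfs` would need to be converted and re-indexed; unchanged PDFs can be skipped.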