mcp_server_knowledge_engine

A basic knowledge-base MCP server that converts PDF files to Markdown and makes them searchable using non-semantic (keyword-based) search.

Installation

Run this command in your terminal to add the MCP server to Claude Code.

claude mcp add --transport stdio lhstorm-mcp_server_knowledge_engine python server.py \
  --env PDF_FOLDER="./your-pdfs" \
  --env SERVER_NAME="your-server-name" \
  --env DOMAIN_KEYWORDS="comma,separated,keywords" \
  --env MARKDOWN_FOLDER="./your-pdfs/markdown"
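
The command above hands four environment variables to the server process. As a rough illustration of how a Python server could pick them up, the sketch below reads them with fallbacks. The function name `load_settings` and every default value are hypothetical and not taken from the repository; the actual server may parse these variables differently.

```python
import os
from pathlib import Path


def load_settings(environ=None):
    """Read the four variables from the command above, with placeholder defaults."""
    env = os.environ if environ is None else environ
    pdf_folder = Path(env.get("PDF_FOLDER", "./pdfs"))  # default is an assumption
    # Assumed default: a "markdown" subfolder next to the PDFs, mirroring the example values.
    markdown_folder = Path(env.get("MARKDOWN_FOLDER", str(pdf_folder / "markdown")))
    # DOMAIN_KEYWORDS is comma-separated; drop blanks and surrounding whitespace.
    keywords = [k.strip() for k in env.get("DOMAIN_KEYWORDS", "").split(",") if k.strip()]
    return {
        "server_name": env.get("SERVER_NAME", "knowledge-engine"),  # hypothetical default
        "pdf_folder": pdf_folder,
        "markdown_folder": markdown_folder,
        "domain_keywords": keywords,
    }


# Passing a plain dict instead of os.environ keeps the example self-contained.
settings = load_settings({
    "PDF_FOLDER": "./your-pdfs",
    "DOMAIN_KEYWORDS": "comma, separated,,keywords",
})
```

Accepting the environment as an argument makes the parsing easy to check without touching the real process environment.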

How to use

This MCP server provides a Python-based knowledge engine that ingests a collection of PDFs, converts them to a searchable Markdown-backed format, and exposes an MCP interface compatible with Claude Desktop. It builds a TF-IDF inverted index with proximity matching to return relevant excerpts, and it supports domain-specific keyword tuning.

To use it, add PDFs to the configured folder, process them to build the search index, and then generate an MCP configuration to connect Claude Desktop or another MCP client.

The server exposes three tools:

Search: returns relevant passages with context
List: enumerates the available documents and their metadata
Content: retrieves the full content of a document, with optional page-level access

The tools can be configured and renamed through the provided configuration flow, so you can tailor them to your domain.
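
To make the idea of a TF-IDF inverted index with proximity matching more concrete, here is a minimal sketch of the general technique. It is not the repository's implementation: the function names, the smoothed IDF formula, the `window` parameter, and the size of the proximity bonus are all assumptions chosen for illustration.

```python
import math
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to {doc_id: [token positions]} (a positional inverted index)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def search(index, n_docs, query, window=5):
    """Rank documents by summed TF-IDF, boosted when query terms sit close together."""
    terms = tokenize(query)
    scores = defaultdict(float)
    for term in terms:
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log((1 + n_docs) / (1 + len(postings))) + 1.0  # smoothed IDF
        for doc_id, positions in postings.items():
            scores[doc_id] += (1 + math.log(len(positions))) * idf
    # Proximity bonus (an assumed heuristic): reward documents where two
    # consecutive query terms occur within `window` tokens of each other.
    for first, second in zip(terms, terms[1:]):
        for doc_id in set(index.get(first, {})) & set(index.get(second, {})):
            gap = min(abs(a - b) for a in index[first][doc_id] for b in index[second][doc_id])
            if gap <= window:
                scores[doc_id] *= 1.0 + 1.0 / max(gap, 1)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


docs = {
    "a.md": "The inverted index maps each term to the documents that contain it.",
    "b.md": "An index of terms appears at the back; the chapter on inverted pendulums is earlier.",
    "c.md": "PDF files are converted to Markdown before indexing.",
}
index = build_index(docs)
results = search(index, len(docs), "inverted index")
```

In this toy corpus both `a.md` and `b.md` contain the two query terms, but only `a.md` has them adjacent, so the proximity bonus ranks it first. Because no stemming is applied, `c.md` ("indexing") does not match at all, which is typical of non-semantic search.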

How to install

Prerequisites:

Python 3.8 or higher
pip
Git

Clone the repository

git clone https://github.com/lhstorm/mcp_server_knowledge_engine.git
cd mcp_server_knowledge_engine

Create and activate a virtual environment (recommended)

python -m venv venv
# macOS/Linux
source venv/bin/activate
# Windows
venv\Scripts\activate

Install dependencies

pip install -r requirements.txt

Configure the server

Create a copy of server_config.json and adjust your settings (server name, display name, PDF folder, domain keywords, etc.).
Place PDFs in the configured folder.
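
The exact schema of server_config.json is not shown here, so the following is only a sketch of what a customized copy might contain, written as a Python dictionary and serialized to JSON. The names `pdf_folder`, `domain_keywords`, `parallel_processing`, and `cache_enabled` appear elsewhere in this document; `server_name`, `display_name`, the nesting under `processing`, and the `validate` helper are assumptions. Check the repository's own server_config.json for the real keys.

```python
import json

config = {
    "server_name": "your-server-name",      # hypothetical key name
    "display_name": "Your Knowledge Base",  # hypothetical key name
    "pdf_folder": "./your-pdfs",
    "domain_keywords": ["comma", "separated", "keywords"],
    "processing": {
        "parallel_processing": True,
        "cache_enabled": True,  # placement under "processing" is a guess
    },
}


def validate(cfg):
    """Return a list of problems; an empty list means the sketch config looks usable."""
    problems = []
    for key in ("server_name", "pdf_folder"):
        if not isinstance(cfg.get(key), str) or not cfg[key].strip():
            problems.append(f"{key} must be a non-empty string")
    if not isinstance(cfg.get("domain_keywords", []), list):
        problems.append("domain_keywords must be a list")
    return problems


# Serialize the settings; this text could be saved as your copy of the config file.
text = json.dumps(config, indent=2)
```

A small validation pass like this catches empty names or paths before the server starts.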

Run the server

python server.py

Generate MCP config for Claude Desktop

python generate_mcp_config.py
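
Claude Desktop reads its servers from an `mcpServers` object in its configuration file, where each entry gives a `command`, its `args`, and an optional `env` map. What generate_mcp_config.py actually writes is not documented here, so the sketch below only shows the general shape of such an entry; the helper `make_entry` and the example values are hypothetical.

```python
import json
import sys
from pathlib import Path


def make_entry(server_name, repo_dir, env):
    """Build one `mcpServers` entry that launches server.py over stdio."""
    repo = Path(repo_dir).resolve()
    return {
        server_name: {
            "command": sys.executable,  # interpreter of the active virtual environment
            "args": [str(repo / "server.py")],  # absolute path, since the client sets the working directory
            "env": dict(env),
        }
    }


entry = make_entry(
    "your-server-name",
    ".",
    {"PDF_FOLDER": "./your-pdfs", "DOMAIN_KEYWORDS": "comma,separated,keywords"},
)
snippet = json.dumps({"mcpServers": entry}, indent=2)
print(snippet)
```

Using absolute paths for the interpreter and script avoids depending on whichever working directory the client starts the server in.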

Optional steps:

Use manage_server.py for CLI tasks such as create-config, add-pdf, and process-pdfs.
Use the interactive setup to customize server name, display name, and domain keywords.

Additional notes

Tips and troubleshooting:

Ensure PDFs are accessible in the configured pdf_folder and that the process-pdfs step has been run to generate the searchable index.
If the index seems stale after adding new PDFs, re-run process-pdfs, then re-run generate_mcp_config.py so that the MCP config reflects the changes.
The domain_keywords setting helps tailor search relevance; choose domain-specific terms that users are likely to search for.
If Claude Desktop does not show the server, restart Claude Desktop after generating the MCP config.
Environment variables can be used to override paths and metadata without changing code; keep them in sync with your deployment environment.
For large PDF collections, enable parallel_processing in the processing settings to improve indexing speed, and keep cache_enabled turned on so that the MD5-based change detection can take effect.
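
The MD5-based change detection mentioned in the last tip can be pictured as follows: hash each PDF, compare the hash with a stored value, and reprocess only the files whose hash differs. This is a sketch of that general idea, not the repository's code; the function names and the JSON cache file format are assumptions.

```python
import hashlib
import json
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large PDFs are not loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_pdfs(pdf_folder, cache_file):
    """Return PDFs whose MD5 differs from the cached value, and update the cache."""
    cache_path = Path(cache_file)
    cache = json.loads(cache_path.read_text()) if cache_path.exists() else {}
    changed = []
    for pdf in sorted(Path(pdf_folder).glob("*.pdf")):
        digest = md5_of(pdf)
        if cache.get(pdf.name) != digest:
            changed.append(pdf)
            cache[pdf.name] = digest
    cache_path.write_text(json.dumps(cache, indent=2))
    return changed
```

On the first run every PDF counts as changed; on later runs only new or modified files are returned, which is where the time savings for large collections come from.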
