pdf-mcp

Active · ★ 45 · MIT · Updated 2026-05-29

★ Trending · ★ Code Assistant · ★ RAG / Knowledge Base

MCP server that lets Claude Code and other AI agents read large PDFs without hitting context limits. Chunked reading, hybrid search, OCR, table and image extraction, SQLite cache.
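Here is a small sketch of chunked reading. It assumes per-page text has already been extracted (pdf-mcp uses PyMuPDF for that step). It then groups consecutive pages into chunks that fit a character budget, so an agent can request one chunk at a time instead of the whole file. The function name `paginate_pages` and the `max_chars` default are hypothetical and not part of pdf-mcp's API.

```python
from typing import Iterator, List, Tuple


def paginate_pages(pages: List[str], max_chars: int = 4000) -> Iterator[Tuple[int, int, str]]:
    """Hypothetical helper (not pdf-mcp's API): yield (first_page, last_page, text).

    Page numbers are 1-based. Each chunk's text is at most max_chars long,
    except when a single page alone exceeds the budget; such a page is
    emitted on its own rather than split mid-page.
    """
    buf: List[str] = []
    size = 0   # length of "\n".join(buf)
    start = 0  # 0-based index of the first page in buf
    for i, text in enumerate(pages):
        # +1 accounts for the "\n" separator placed between joined pages.
        if buf and size + 1 + len(text) > max_chars:
            yield start + 1, i, "\n".join(buf)
            buf, size, start = [], 0, i
        size += len(text) + (1 if buf else 0)
        buf.append(text)
    if buf:
        yield start + 1, len(pages), "\n".join(buf)


# Example with made-up page texts.
pages = ["a" * 3000, "b" * 3000, "c" * 500]
for first, last, text in paginate_pages(pages):
    print(first, last, len(text))
# 1 1 3000
# 2 3 3501
```

Packing whole pages keeps page numbers meaningful, so an agent can cite or re-request a specific range.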

pdf-mcp is a Model Context Protocol (MCP) server that enables AI agents to read, search, and extract content from PDF files. It uses PyMuPDF for PDF parsing and SQLite for persistent caching. It also supports hybrid search (BM25 keyword matching combined with semantic embeddings), OCR for scanned documents, and structured extraction of tables and images.
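The sketch below shows one way a persistent cache with automatic invalidation could work, using only Python's standard `sqlite3` module. The `PageCache` class, its schema, and its invalidation rule are assumptions for illustration: each row stores the file's modification time and size, and a mismatch on lookup discards the stale rows. pdf-mcp's actual schema and rule may differ.

```python
import os
import sqlite3
from typing import Optional, Tuple


class PageCache:
    """Hypothetical page-text cache keyed by (path, page); not pdf-mcp's schema."""

    def __init__(self, db_path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            """CREATE TABLE IF NOT EXISTS pages (
                   path     TEXT    NOT NULL,
                   page     INTEGER NOT NULL,
                   mtime_ns INTEGER NOT NULL,
                   size     INTEGER NOT NULL,
                   text     TEXT    NOT NULL,
                   PRIMARY KEY (path, page)
               )"""
        )

    @staticmethod
    def _stamp(path: str) -> Tuple[int, int]:
        # Assumed freshness check: modification time (ns) plus file size.
        st = os.stat(path)
        return st.st_mtime_ns, st.st_size

    def get(self, path: str, page: int) -> Optional[str]:
        row = self.conn.execute(
            "SELECT mtime_ns, size, text FROM pages WHERE path = ? AND page = ?",
            (path, page),
        ).fetchone()
        if row is None:
            return None
        if (row[0], row[1]) != self._stamp(path):
            # The file changed since it was cached: drop all of its stale pages.
            with self.conn:
                self.conn.execute("DELETE FROM pages WHERE path = ?", (path,))
            return None
        return row[2]

    def put(self, path: str, page: int, text: str) -> None:
        mtime_ns, size = self._stamp(path)
        with self.conn:  # commits on success
            self.conn.execute(
                "INSERT OR REPLACE INTO pages VALUES (?, ?, ?, ?, ?)",
                (path, page, mtime_ns, size, text),
            )
```

Passing a file path instead of `":memory:"` makes the cache persist across server restarts. Invalidation happens lazily, on the first lookup after the PDF changes.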

#agentic-ai #ai #claude #codex-cli #copilot #document-processing #llm #mcp

Features

01 Hybrid search (BM25 keyword + semantic embeddings) with Reciprocal Rank Fusion

02 Paginated reading to avoid context overflow

03 OCR support for scanned and image-based PDFs via Tesseract

04 Structured extraction of tables, images, and table of contents

05 Persistent SQLite cache with automatic invalidation
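Reciprocal Rank Fusion merges the BM25 and embedding result lists using each document's rank instead of its raw score, so the two scoring scales never need to be normalized against each other. Each document scores the sum of 1 / (k + rank) over the lists it appears in. The minimal sketch below uses `k = 60`, the constant from the original RRF paper. The listing does not say whether pdf-mcp uses that value. The function name and chunk IDs are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several best-first rankings of document IDs into one list.

    Each document scores sum(1 / (k + rank)) over the rankings it appears
    in, with rank starting at 1. Ties are broken by ID for determinism.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical chunk IDs: one list from BM25, one from embedding similarity.
bm25 = ["p12", "p3", "p40"]
semantic = ["p3", "p7", "p12"]
print(reciprocal_rank_fusion([bm25, semantic]))
# ['p3', 'p12', 'p7', 'p40']
```

`p3` ranks well in both lists, so it beats `p12`, which is first for BM25 but last in the semantic list.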

Compatibility

Claude Code

Verified via docs

Claude Desktop

Verified via docs

Visual Studio Code

Verified via docs

Codex CLI

Verified via docs

Kiro

Verified via docs

Quick start

$ pip install pdf-mcp

Use cases

↳ Efficiently read and analyze large PDF documents without exceeding context limits

↳ Search for specific content or concepts within PDFs using natural language

↳ Extract structured data such as tables and images from PDFs

Alternatives

ragflow · ★ 81.5k

RAGFlow is an open-source Retrieval-Augmented Generation (RAG) engine that combines RAG with agent capabilities to provide a context layer for LLMs.
