ocr-mcp

Active · ★ 14 · MIT · Updated 2026-05-28
★ Trending · ★ Workflow Automation · ★ RAG / Knowledge Base

FastMCP server providing advanced OCR capabilities with current state-of-the-art models (DeepSeek-OCR, Florence-2, DOTS.OCR, PP-OCRv5, Qwen-Image-Layered decomposition), WIA scanner control, and multi-format document processing for PDFs, CBZ comics, and images.

OCR-MCP combines an AI OCR web app with an MCP server. The web interface handles drag-and-drop OCR, scanning, and batch processing; the FastMCP server serves agentic clients and IDEs such as Claude, Cursor, and Windsurf. It supports 13 OCR engines, WIA scanner control, preprocessing, and workflow pipelines.

#agentic-workflow #fastmcp #mcp #mcp-server #mcp-servers #ocr
$ Install
$ git clone https://github.com/sandraschi/ocr-mcp.git && cd ocr-mcp && uv sync
01

Features

01 13 OCR backends (including PaddleOCR and Mistral OCR)
02 Auto backend selection
03 Preprocessing (deskew, enhance, crop)
04 Layout and table extraction
05 Batch and pipeline processing
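
The listing does not say how auto backend selection works. One common way to implement the idea is a priority-ordered fallback: try each engine in turn and pick the first one that is installed and can do what the request needs. The sketch below illustrates that pattern only; the `Backend` class, the `select_backend` function, and the engine names are hypothetical and are not taken from ocr-mcp's code.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Backend:
    """Hypothetical description of one OCR engine (not ocr-mcp's real API)."""
    name: str
    is_available: Callable[[], bool]  # e.g. checks that the model or package is installed
    handles_layout: bool              # whether the engine can extract layout and tables


def select_backend(backends: Sequence[Backend], need_layout: bool = False) -> Optional[Backend]:
    """Return the first available backend, in priority order, that meets the request."""
    for backend in backends:
        if need_layout and not backend.handles_layout:
            continue  # skip engines that cannot satisfy a layout request
        if backend.is_available():
            return backend
    return None  # nothing usable was found


# Priority order, names, and availability are invented for this example.
BACKENDS = [
    Backend("engine-a", is_available=lambda: False, handles_layout=True),
    Backend("engine-b", is_available=lambda: True, handles_layout=False),
    Backend("engine-c", is_available=lambda: True, handles_layout=True),
]

chosen = select_backend(BACKENDS, need_layout=True)
print(chosen.name if chosen else "no backend available")  # prints "engine-c"
```

Returning `None` instead of raising leaves the caller free to decide whether a missing backend is an error or a reason to ask the user to install one.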
02

Compatibility

Windows · Verified via docs
Linux · Verified via docs
macOS · Verified via docs
03

Quick start

1. $ git clone https://github.com/sandraschi/ocr-mcp.git
2. $ cd ocr-mcp
3. $ uv sync
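
The steps above need `git` and `uv` on the PATH. A small pre-flight check, sketched here with only the Python standard library, can report which of the two is missing before you start. The `missing_tools` helper is an illustrative name and is not part of ocr-mcp.

```python
import shutil

# Command-line tools used by the quick start steps.
REQUIRED_TOOLS = ("git", "uv")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Install these before continuing: " + ", ".join(missing))
else:
    print("git and uv were found; the quick start commands should be runnable.")
```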
04

Use cases

↳ Integrate OCR capabilities into AI agents and clients such as Claude and Cursor
↳ Scan documents with a WIA scanner and run OCR on the results
↳ Batch process documents and convert them to text, PDF, or JSON
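
To make the batch use case concrete, here is a minimal sketch of a batch loop that collects per-file results as JSON. It assumes the OCR step is a plain function passed in by the caller; `batch_ocr`, `fake_recognize`, and the file names are hypothetical stand-ins, and the real ocr-mcp tools and their output schema may look different.

```python
import json
from pathlib import Path
from typing import Callable, Iterable

# File types treated as images in this sketch (an assumption, not ocr-mcp's list).
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def batch_ocr(paths: Iterable[Path], recognize: Callable[[Path], str]) -> list[dict]:
    """Run `recognize` on every image path and record text or error per file."""
    results = []
    for path in sorted(paths):
        if path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # ignore non-image files
        try:
            results.append({"file": path.name, "text": recognize(path), "error": None})
        except Exception as exc:
            # One failing file should not abort the whole batch.
            results.append({"file": path.name, "text": None, "error": str(exc)})
    return results


def fake_recognize(path: Path) -> str:
    """Placeholder OCR function; a real one would call an OCR engine."""
    return f"text from {path.stem}"


records = batch_ocr([Path("scan2.png"), Path("notes.txt"), Path("scan1.jpg")], fake_recognize)
print(json.dumps(records, indent=2))
```

Recording errors per file keeps a single unreadable scan from discarding the rest of the batch, which matters once batches grow beyond a handful of pages.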
05

Alternatives

ragflow (★ 81.5k)
RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs.

n8n (★ 190.2k)
Fair-code workflow automation platform with native AI capabilities. Combine visual building with custom code, self-host or cloud, 400+ integrations.

fastmcp (★ 25.4k)
🚀 The fast, Pythonic way to build MCP servers and clients.

FunASR (★ 16.6k)
Industrial-grade speech recognition toolkit: 170x realtime, 50+ languages, speaker diarization, emotion detection, streaming, and OpenAI-compatible API.

nuclear (★ 17.7k)
Streaming music player that finds free music for you.

semble (★ 4.5k)
Fast and Accurate Code Search for Agents.

csharp-sdk (★ 4.3k)
The official C# SDK for Model Context Protocol servers and clients. Maintained in collaboration with Microsoft.

fast-agent (★ 3.8k)
Code, Build and Evaluate agents, with excellent Model and Skills/MCP/ACP Support.

Comments

  • Spencer Brown, May 12, 2026

    Current OCR models handle handwriting and complex layouts better than older tools.

  • Quinn Kim, May 8, 2026

    State-of-the-art OCR capabilities via FastMCP server, with quality that matches commercial tools.

  • Spencer Nguyen, Apr 25, 2026

    Works with diverse document formats through a consistent MCP interface.

  • Oaklyn Johnson, Mar 5, 2026

    Good for AI workflows that need to extract text from images or scanned documents.

Stats
GitHub Stars: ★ 14
Last commit: 2d ago
Status: Active
License: MIT
Category: Vision / Multimodal
Trend (30d): +0.5 (↑ 0.7%)

© 2026 AgentIndex.app|Built by a 10-year iOS Developer.
Not affiliated with Anthropic, OpenAI or Microsoft.