AgentIndex
kreuzberg

Active · ★ 8.4k · MIT · Updated 2026-05-29
★ Trending · ★ Essential

A polyglot document intelligence framework with a Rust core. Extract text, metadata, and structured information from PDFs, Office documents, images, and 50+ formats. Available for Rust, Python, Ruby, Java, Go, PHP, Elixir, C#, and TypeScript (Node/Bun/Wasm/Deno), or usable via CLI, REST API, or MCP server.

Kreuzberg is a high-performance, polyglot library that extracts text and metadata from 57+ file formats and includes OCR support. Its Rust core provides native-speed processing and memory efficiency, and it can generate embeddings without requiring a GPU.

#Document Processing · #Data Extraction · #OCR · #Multi-language · #Embeddings · #Coding · #Data Analysis · #Image Generation
01 Features

01 Extensible architecture with a plugin system for custom backends and processors.
02 Polyglot support with native bindings for 10+ programming languages.
03 Comprehensive support for 57+ file formats across 8 categories, including Office, PDF, and images.
04 Advanced OCR capabilities with multiple backends and intelligent table detection.
05 High performance due to a Rust core, SIMD optimizations, and full parallelism.
02 Compatibility

Rust · Core Library · Verified via docs
Python · Language Binding · Verified via docs
Elixir · Language Binding · Verified via docs
Node.js · Language Binding · Verified via docs
WASM · WebAssembly Support · Verified via docs
Java · Language Binding · Verified via docs
03 Use cases

↳ Automated extraction of text, metadata, and structured data from diverse document types.
↳ Building intelligent document processing pipelines for data ingestion and analysis.
↳ Enabling efficient search and retrieval systems for unstructured and semi-structured content.
04 Alternatives

ragflow · ★ 81.5k
RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs.

n8n · ★ 190.2k
Fair-code workflow automation platform with native AI capabilities. Combine visual building with custom code, self-host or cloud, 400+ integrations.

Context7 · ★ 56.4k
MCP Server that provides up-to-date code documentation for LLMs and AI code editors.

mindsdb · ★ 39.2k
Federated Query Engine for AI - The only MCP Server you'll ever need

GitHub MCP Server · ★ 30.3k
GitHub's official MCP Server. Allows AI agents to interact directly with your GitHub repositories (read files, search code, issues).

Microsoft AutoGen · ★ 58.5k
A framework that enables the development of LLM applications using multiple agents that can converse with each other to solve tasks.

CrewAI · ★ 52.4k
Framework for orchestrating role-playing, autonomous AI agents. By working together, your Crew can tackle complex tasks.

MetaGPT · ★ 68.4k
🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming

Stats
GitHub Stars: ★ 8.4k
Last commit: 1d ago
Status: Active
License: MIT
Category: Vision / Multimodal
Trend (30d): +0.3k ↑ 4.3%