Archive-Agent
Active · ★ 59 · GPL-3.0 · Updated 2026-05-27
★ Trending · ★ RAG / Knowledge Base · ★ Data Processing
Find your files with natural language and ask questions.
Archive Agent is an intelligent file indexer that brings Retrieval-Augmented Generation (RAG) to your command line, so you can search and query local documents in natural language. It features on-device processing with automatic OCR and semantic chunking, and it supports several AI providers for flexible deployment.
#RAG · #Semantic Search · #Document Indexing · #OCR · #Vector Database · #Local AI · #File Management · #CLI
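To make the retrieval half of RAG concrete, here is a minimal sketch of chunking documents and ranking the chunks against a query. It is not Archive Agent's implementation: the function names (`chunk_text`, `embed`, `cosine`, `search`) are hypothetical, the chunker splits on paragraphs and word counts rather than semantically, and `embed` is a toy word-count vector standing in for a real embedding model and vector database.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 40) -> list[str]:
    """Split text into paragraph-based chunks of at most max_words words."""
    chunks = []
    for paragraph in re.split(r"\n\s*\n", text):
        words = paragraph.split()
        for start in range(0, len(words), max_words):
            chunks.append(" ".join(words[start:start + max_words]))
    return chunks


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model (assumption): lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, documents: dict[str, str], top_k: int = 3) -> list[tuple[float, str, str]]:
    """Return up to top_k (score, filename, chunk) tuples, best match first."""
    query_vec = embed(query)
    scored = [
        (cosine(query_vec, embed(chunk)), name, chunk)
        for name, text in documents.items()
        for chunk in chunk_text(text)
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


# Hypothetical in-memory "archive"; a real indexer would read files from disk.
docs = {
    "invoice.txt": "Invoice for the laptop repair.\n\nTotal due: 120 EUR by March.",
    "notes.md": "Meeting notes about the garden project and the new fence.",
}
for score, name, chunk in search("laptop repair invoice", docs, top_k=1):
    print(f"{score:.2f} {name}: {chunk}")
```

In a full RAG pipeline, the top-ranked chunks would then be passed to a language model as context for answering the question. Word counts only match literal terms, so a real system would swap `embed` for a learned embedding to get semantic matches.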
01
Features
01 Intelligent semantic AI search and natural language query of local documents.
02 Natively indexes various document types including PDFs, images, Markdown, and plaintext, with automatic OCR and entity extraction.
03 Supports multiple AI providers (OpenAI, OpenRouter, Ollama, LM Studio) for flexible model choice and privacy.
04 Fully resumable parallel processing for scalable performance, leveraging AI cache and robust retry logic.
05 Integrates with workflows via a built-in Model Context Protocol (MCP) server for tool access.
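The combination of resumable parallel processing, a cache, and retry logic from feature 04 can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: `cache_key`, `with_retry`, and `process_all` are hypothetical names, `analyze` stands for any slow or flaky AI call, and the cache is assumed to be a directory of JSON files keyed by content hash.

```python
import hashlib
import json
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def cache_key(content: bytes) -> str:
    """Content hash, so unchanged files map to the same cache entry."""
    return hashlib.sha256(content).hexdigest()


def with_retry(func, attempts: int = 3, base_delay: float = 0.1):
    """Call func(), retrying with exponential backoff on any exception."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** attempt)


def process_all(items: dict[str, bytes], analyze, cache_dir: Path, workers: int = 4) -> dict[str, str]:
    """Analyze every item in parallel, skipping work already in the cache."""
    cache_dir.mkdir(parents=True, exist_ok=True)

    def process_one(name: str) -> tuple[str, str]:
        entry = cache_dir / f"{cache_key(items[name])}.json"
        if entry.exists():  # finished in an earlier run: reuse the result
            return name, json.loads(entry.read_text())["result"]
        result = with_retry(lambda: analyze(items[name]))
        # Write to a per-thread temp file, then rename, so an interrupted
        # run never leaves a half-written cache entry behind.
        tmp = entry.with_name(f"{entry.name}.{threading.get_ident()}.tmp")
        tmp.write_text(json.dumps({"result": result}))
        tmp.replace(entry)
        return name, result

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(process_one, items))
```

Calling `process_all` a second time with the same `cache_dir` returns the stored results without invoking `analyze` again, which is what makes an interrupted run cheap to resume. Keying the cache on content rather than filename means renamed files are not reprocessed.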
02
Compatibility
Python · Runtime · Verified via docs
Docker · Database · Verified via docs
Qdrant · Vector DB · Verified via docs
Ubuntu 24.04 · OS · Verified via docs
Ubuntu 22.04 · OS · Verified via docs
Linux Mint · OS · Verified via docs
03
Quick start
$ git clone https://github.com/shredEngineer/Archive-Agent
$ cd Archive-Agent
$ chmod +x install.sh
$ ./install.sh
04
Use cases
↳ Quickly find information within a large collection of personal or work documents using natural language.
↳ Digitize scanned documents and images and make them searchable through automatic OCR.
↳ Create a local, private RAG system for document analysis that can avoid sending data to external services when paired with a local provider such as Ollama or LM Studio.
↳ Automate document processing and querying within an AI-powered workflow via the MCP interface.
↳ Efficiently manage and search files across various directories and file types on your local machine.
05
Alternatives
ragflow · ★ 81.5k
RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs.
n8n · ★ 190.2k
Fair-code workflow automation platform with native AI capabilities. Combine visual building with custom code, self-host or cloud, 400+ integrations.
MaxKB · ★ 21.1k
An open-source platform for building enterprise-grade agents. Powerful and easy to use.
Gemini CLI · ★ 104.7k
An open-source AI agent that brings the power of Gemini directly into your terminal. Supports native MCP.
letta · ★ 23.0k
Letta is the platform for building stateful agents: open AI with advanced memory that can learn and self-improve over time.
agents-towards-production · ★ 20.5k
This repository delivers end-to-end, code-first tutorials covering every layer of production-grade GenAI agents, guiding you from spark to scale with proven patterns and reusable blueprints for real-world launches.
Comments
- Sasha White · May 22, 2026
The question-answering over local files brings RAG to personal file archives
- Reese Johnson · May 9, 2026
Used for finding documents in large personal archives, accuracy is consistently high
- Sage Davis · May 7, 2026
Natural language file search that actually understands content rather than just filenames