paperbanana

Active · ★ 1.8k · MIT · Updated 2026-05-21
★ Hidden Gem · ★ Vision / Multimodal · ★ Multi-Agent

Open-source implementation and extension of Google Research’s PaperBanana for automated academic figures, diagrams, and research visuals, extended to new domains such as slide generation.

PaperBanana is an agentic framework that automates the generation of publication-quality academic diagrams and statistical plots for AI scientists. It uses a two-phase multi-agent pipeline, with Google Gemini handling VLM planning and image generation, and supports multiple input formats and iterative refinement.


#AI Agents #Diagram Generation #Plotting #Large Language Models #Visual Language Models #Multi-agent Systems #Academic Tools #Python
01 Features

01 Two-phase multi-agent pipeline with iterative refinement.
02 Gemini-based VLM planning and image generation.
03 CLI, Python API, and MCP server for IDE integration.
04 Generation of methodology diagrams and statistical plots.
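
The listing does not describe how the two phases are wired together, so the following is only a minimal sketch of a generic plan-then-refine loop, not paperbanana's actual API. Every name here (`Draft`, `run_pipeline`, the `demo_*` stubs) is hypothetical, and the string-based stubs stand in for the VLM calls a real pipeline would make.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Draft:
    """A candidate figure plus the critiques applied to it so far."""
    content: str
    history: List[str] = field(default_factory=list)


def run_pipeline(
    description: str,
    plan: Callable[[str], str],
    render: Callable[[str], str],
    critique: Callable[[str], str],
    max_rounds: int = 3,
) -> Draft:
    """Plan once (phase 1), then render and refine (phase 2)."""
    spec = plan(description)             # phase 1: text description -> figure plan
    draft = Draft(content=render(spec))  # phase 2: first rendering
    for _ in range(max_rounds):
        feedback = critique(draft.content)
        if not feedback:                 # empty feedback means the critic accepts
            break
        draft.history.append(feedback)
        spec = f"{spec}\nREVISION: {feedback}"
        draft.content = render(spec)     # re-render with the critique folded in
    return draft


# Hypothetical stand-ins for the planner, renderer, and critic agents.
def demo_plan(text: str) -> str:
    return f"PLAN[{text}]"


def demo_render(spec: str) -> str:
    return f"FIGURE<{spec}>"


def demo_critique(figure: str) -> str:
    return "" if "REVISION" in figure else "add axis labels"


result = run_pipeline("encoder-decoder overview", demo_plan, demo_render, demo_critique)
print(result.history)
```

Capping the loop with `max_rounds` keeps a critic that never accepts from running indefinitely; how paperbanana itself bounds refinement is not stated in the listing.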
02 Compatibility

Python · Runtime · Verified via docs
Google Gemini · AI Backend · Verified via docs
03 Quick start

$ pip install paperbanana
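
After installing, one way to confirm the package landed in the active environment is to query its metadata with the standard library. This sketch relies only on `importlib.metadata`; the helper name `installed_version` is hypothetical, and the snippet deliberately avoids importing paperbanana, whose module layout is not described here.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("paperbanana")
    print(version or "paperbanana is not installed in this environment")
```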
04 Use cases

↳ Automate the creation of methodology diagrams from text descriptions for research papers.
↳ Generate statistical plots from structured data (CSV/JSON) for data analysis and visualization.
↳ Evaluate the quality of generated diagrams against human references using VLM-as-a-Judge.
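
For the structured-data use case, the listing does not say how paperbanana ingests CSV or JSON. As a rough illustration only, the sketch below normalizes both formats into the same list of records with the standard library before any plotting step. The function `load_records` and the sample values are hypothetical placeholders.

```python
import csv
import io
import json
from typing import Dict, List


def load_records(text: str, fmt: str) -> List[Dict[str, str]]:
    """Parse CSV or JSON text into a uniform list of string-valued records."""
    if fmt == "csv":
        return [dict(row) for row in csv.DictReader(io.StringIO(text))]
    if fmt == "json":
        data = json.loads(text)
        if not isinstance(data, list):
            raise ValueError("expected a JSON array of objects")
        # CSV values are always strings, so stringify JSON values to match.
        return [{key: str(value) for key, value in item.items()} for item in data]
    raise ValueError(f"unsupported format: {fmt}")


# Placeholder data, made up purely to exercise the function.
csv_text = "label,value\na,1\nb,2\n"
json_text = '[{"label": "a", "value": 1}, {"label": "b", "value": 2}]'

print(load_records(csv_text, "csv") == load_records(json_text, "json"))
```

Normalizing to one record shape means downstream plotting code needs a single code path regardless of the input format.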
05 Alternatives

ragflow ★ 81.5k
RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs.
n8n ★ 190.2k
Fair-code workflow automation platform with native AI capabilities. Combine visual building with custom code, self-host or cloud, 400+ integrations.
mindsdb ★ 39.2k


Comments

  • Remy Jackson · Apr 13, 2026

    Research visual automation saves hours of manual figure creation for paper submissions.

  • Parker Clark · Mar 31, 2026

    The extension to slide generation is the meaningful addition over the original implementation.

  • Avery Anderson · Mar 16, 2026

    Automated academic figure generation extended beyond diagrams to slides is a useful addition to the original.

  • Rowan Martinez · Mar 9, 2026

    Open-source extension of Google Research's PaperBanana with broader domain support.

mindsdb: Federated Query Engine for AI - The only MCP Server you'll ever need
GitHub MCP Server ★ 30.3k
GitHub's official MCP Server. Allows AI agents to interact directly with your GitHub repositories (read files, search code, issues).
Brave Search MCP ★ 86.5k
Allow your AI Agent to search the real-time internet using Brave Search API. Essential for getting up-to-date information.
Microsoft AutoGen ★ 58.5k
A framework that enables the development of LLM applications using multiple agents that can converse with each other to solve tasks.
CrewAI ★ 52.4k
Framework for orchestrating role-playing, autonomous AI agents. By working together, your Crew can tackle complex tasks.
MetaGPT ★ 68.4k
The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming

Stats
GitHub Stars: ★ 1.8k
Last commit: 1w ago
Status: Active
License: MIT
Category: Vision / Multimodal
Trend (30d): +0k ↑ 1.7%
