AgentIndex

claude-video-vision

Active · ★ 700 · MIT · Updated 2026-05-18
★ Vision / Multimodal · ★ Dev Tooling

Give Claude the ability to watch and understand videos — Claude Code plugin with frame extraction and multimodal audio analysis

A Claude Code plugin that gives Claude the ability to watch and understand videos. It extracts frames via ffmpeg and processes audio through one of several backends (Gemini, local Whisper, or OpenAI). The plugin acts as a perception layer: Claude receives the frames as images and the audio as timestamped transcriptions.
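To make the frame-extraction step concrete, here is a minimal sketch of how an ffmpeg command for sampling still frames could be assembled. It is not the plugin's actual implementation: the function name `build_frame_command`, its parameters, and the output file pattern are all hypothetical. Only the ffmpeg options themselves (`-ss`, `-t`, `-i`, `-vf` with the `fps` and `scale` filters) are standard.

```python
from typing import List, Optional


def build_frame_command(
    video_path: str,
    out_pattern: str = "frame_%04d.jpg",  # hypothetical output naming
    fps: float = 1.0,
    start: Optional[float] = None,
    duration: Optional[float] = None,
    width: Optional[int] = None,
) -> List[str]:
    """Build an ffmpeg argument list that samples still frames from a video."""
    cmd = ["ffmpeg", "-hide_banner"]
    if start is not None:
        # Placed before -i, -ss seeks in the input before decoding starts.
        cmd += ["-ss", f"{start:g}"]
    if duration is not None:
        # Placed before -i, -t limits how many seconds of input are read.
        cmd += ["-t", f"{duration:g}"]
    cmd += ["-i", video_path]

    # fps=N keeps N frames per second of video.
    filters = [f"fps={fps:g}"]
    if width is not None:
        # -2 lets ffmpeg pick an even height that preserves the aspect ratio.
        filters.append(f"scale={width}:-2")
    cmd += ["-vf", ",".join(filters), out_pattern]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where ffmpeg is installed. Building the command as a list rather than a shell string avoids quoting problems with paths that contain spaces.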

#claude-code #claude-code-plugin #ffmpeg #gemini #mcp #mcp-server #mcp-servers #multimodal
Install
$ /plugin marketplace add https://github.com/jordanrendric/claude-video-vision
$ /plugin install claude-video-vision
01 Features

01 Multimodal perception: Claude sees video frames directly and reads audio transcriptions with timestamps
02 Flexible backends: Choose between cloud APIs or fully local processing
03 Adaptive extraction: Claude adjusts fps, time range, and resolution based on your question
04 Auto-installation: Whisper models download automatically on first use
05 Interactive setup wizard: /setup-video-vision walks you through configuration
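The listing does not say how adaptive extraction picks its parameters. One plausible approach, sketched here purely as an illustration, is to cap the number of frames sent to the model and derive the sampling rate from the clip length. The function `choose_fps` and its defaults (`max_frames=60`, `max_fps=2.0`) are assumptions and do not come from the plugin.

```python
def choose_fps(duration_s: float, max_frames: int = 60, max_fps: float = 2.0) -> float:
    """Pick a sampling rate so a clip yields at most about max_frames frames.

    Short clips are sampled at max_fps; longer clips are sampled more
    sparsely so the total frame count stays within the budget.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    if max_frames <= 0:
        raise ValueError("max_frames must be positive")
    return min(max_fps, max_frames / duration_s)
```

Under these assumed defaults, a 10-second clip would be sampled at the 2 fps ceiling, while a 10-minute lecture would drop to 0.1 fps (one frame every 10 seconds). Narrowing the time range to the part of the video a question is about raises the usable rate for the same budget.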
02 Compatibility

macOS: Native (verified via docs)
Linux: Supported (verified via docs)
Windows: Requires extra setup (verified via docs)
03 Quick start

1. $ /plugin marketplace add https://github.com/jordanrendric/claude-video-vision
2. $ /plugin install claude-video-vision
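Because frame extraction depends on ffmpeg, it may be worth confirming that the binary is on your PATH before the first run. The following is a small preflight sketch, not part of the plugin; the helper name `missing_tools` is hypothetical, and it uses only the standard library's `shutil.which`.

```python
import shutil
from typing import Iterable, List


def missing_tools(tools: Iterable[str] = ("ffmpeg",)) -> List[str]:
    """Return the names of the given executables that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("ffmpeg found on PATH")
```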
04 Use cases

↳ Analyze a video file by providing its path and optionally asking a specific question
↳ Extract frames and audio from specific time ranges for detailed inspection
↳ Summarize long lectures or demos with adaptive frame extraction
05 Alternatives

OpenClaw (★ 375.5k): Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
MaxKB (★ 21.1k): An open-source platform for building enterprise-grade agents. Powerful and easy to use.
fastmcp (★ 25.4k): 🚀 The fast, Pythonic way to build MCP servers and clients.
nuclear (★ 17.7k): Streaming music player that finds free music for you
context-mode (★ 16.0k): Context window optimization for AI coding agents. Sandboxes tool output, 98% reduction. 12 platforms
Auto-claude-code-research-in-sleep (★ 11.0k): ARIS ⚔️ (Auto-Research-In-Sleep), Claude Code skills for autonomous ML research: cross-model review loops, idea discovery, and experiment automation via Codex MCP
agents-best-practices (★ 1.1k): Provider-neutral Agent Skill for Codex, Claude Code, and agentic harness design.
semble (★ 4.5k): Fast and Accurate Code Search for Agents


Comments

River White, May 24, 2026
The multimodal audio integration works, but processing longer videos can take some time.

Emerson Patel, May 6, 2026
This completely changes how I debug UI tests. Claude can actually see where the selector failed.

Justice Thompson, Apr 20, 2026
Perfect for feeding Claude visual context from UI recordings to debug frontend glitch behaviors.

Parker Davis, Apr 19, 2026
Make sure you have ffmpeg installed globally, otherwise the frame extraction will fail silently.

Stats
GitHub Stars: ★ 700
Last commit: 1w ago
Status: Active
License: MIT
Category: Voice / Speech
Trend (30d): +28, ↑ 0.7%
© 2026 AgentIndex.app|Built by a 10-year iOS Developer.
Not affiliated with Anthropic, OpenAI or Microsoft.