AgentIndex

UI-TARS-desktop

Active · ★ 35.7k · Apache-2.0 · Updated 2026-05-18
★ Most Popular · ★ Trending · ★ Essential

The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra

UI-TARS Desktop is the desktop application component of the TARS multimodal AI agent stack. It provides a native GUI agent that understands and interacts with your computer's user interface by seeing the screen, running shell commands, and using browser tools. It is powered by multimodal LLMs and supports MCP integration for extending the agent's capabilities.

#GUI Agent #Desktop App #Multimodal AI #Automation #Cross-Platform
01 Features

01 Native GUI agent that sees the screen and interacts with desktop applications
02 Multimodal LLM-powered visual understanding of any UI
03 Browser automation and shell command execution built in
04 MCP integration for extending agent capabilities with custom tools
05 Cross-platform desktop app with web UI option
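
The features above describe an observe, decide, act cycle with pluggable tools. The following is a minimal Python sketch of that general pattern, not UI-TARS-desktop's actual implementation or API. Every name in it (Action, Agent, register, run, observe, decide) is hypothetical, the "screen" is a plain string standing in for a screenshot, and the decide function stands in for a multimodal model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Action:
    """One step chosen by the (hypothetical) model: a tool name plus its argument."""
    tool: str
    argument: str


@dataclass
class Agent:
    # Maps a tool name to a plain callable; a stand-in for GUI, shell, or MCP-provided tools.
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)
    history: List[str] = field(default_factory=list)

    def register(self, name: str, fn: Callable[[str], str]) -> None:
        """Add a custom tool, which is how extra capabilities would be plugged in."""
        self.tools[name] = fn

    def run(
        self,
        observe: Callable[[], str],
        decide: Callable[[str, List[str]], Optional[Action]],
        max_steps: int = 5,
    ) -> List[str]:
        """Repeat observe -> decide -> act until decide returns None or max_steps is hit."""
        for _ in range(max_steps):
            screen = observe()                      # assumption: a text description of the screen
            action = decide(screen, self.history)   # assumption: stands in for the model call
            if action is None:
                break
            result = self.tools[action.tool](action.argument)
            self.history.append(f"{action.tool}({action.argument}) -> {result}")
        return self.history


def demo() -> List[str]:
    """Drive the loop against a fake two-state 'UI' so the sketch runs without a real desktop."""
    state = {"screen": "login form"}

    def observe() -> str:
        return state["screen"]

    def click(target: str) -> str:
        state["screen"] = "dashboard"  # pretend the click navigated to a new screen
        return "clicked"

    def decide(screen: str, history: List[str]) -> Optional[Action]:
        return Action("click", "Sign in") if screen == "login form" else None

    agent = Agent()
    agent.register("click", click)
    return agent.run(observe, decide)


if __name__ == "__main__":
    print(demo())
```

Keeping tools in a name-to-callable registry means the loop itself does not change when a new capability is added, which is the general idea behind extending an agent with custom tools.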
02 Compatibility

macOS: Supported (verified via docs)
Windows: Supported (verified via docs)
Linux: Supported (verified via docs)
03 Quick start

$ npx @agent-tars/cli@latest
04 Use cases

↳ Automating desktop GUI workflows that don't have APIs by seeing and clicking the UI
↳ Building multimodal agents that combine screen understanding with web and file operations
↳ Running AI-assisted computer tasks through natural language instructions on desktop
05 Alternatives

ragflow (★ 81.5k)
RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs.

n8n (★ 190.2k)
Fair-code workflow automation platform with native AI capabilities. Combine visual building with custom code, self-host or cloud, 400+ integrations.

GitHub MCP Server (★ 30.3k)
GitHub's official MCP Server. Allows AI agents to interact directly with your GitHub repositories (read files, search code, issues).

Gemini CLI (★ 104.7k)
An open-source AI agent that brings the power of Gemini directly into your terminal. Supports native MCP.

awesome-n8n-templates (★ 22.6k)
Supercharge your workflow automation with this curated collection of n8n templates! Instantly connect your favorite apps (like Gmail, Telegram, Google Drive, Slack, and more) with ready-to-use, AI-powered automations. Save time, boost productivity, and unlock the true potential of n8n in just a few clicks.

keep (★ 11.9k)
The open-source AIOps and alert management platform.

claude-code-guide (★ 4.2k)
Master Claude Code with this guide! Includes setup, SKILL.md files, agents, commands, workflows, and tricks that make Claude's potential skyrocket.

open-claude-cowork (★ 4.2k)
Open-source version of Claude Cowork with 500+ SaaS app integrations.

Comments

  • Remy Rivera, May 25, 2026

    Good for building desktop AI assistants that can see and interact with any application.

  • Jordan Martinez, Apr 23, 2026

    Handles the model integration complexity so you focus on task definition.

  • Parker Thompson, Mar 28, 2026

    Open-source foundation means you're not locked into proprietary automation platforms.

  • Logan Anderson, Mar 21, 2026

    Multimodal AI agent stack that connects cutting-edge models to desktop automation.

Stats
GitHub Stars: ★ 35.7k
Last commit: 1w ago
Status: Active
License: Apache-2.0
Category: Vision / Multimodal
Trend (30d): +1.4k (↑ 4.8%)
© 2026 AgentIndex.app | Built by a 10-year iOS Developer.
Not affiliated with Anthropic, OpenAI or Microsoft.