AgentIndex

UI-TARS-desktop

Active·★ 36.4k·Apache-2.0·Updated 2026-05-18
★ Most Popular · ★ Trending · ★ Essential

UI-TARS Desktop is the desktop application component of the TARS multimodal AI agent stack. It provides a native GUI agent that can understand and interact with your computer's user interface by seeing the screen, running shell commands, and using browser tools. It is powered by multimodal LLMs and supports MCP integration for extending agent capabilities.

UI-TARS-desktop is listed under Vision / Multimodal, so it is best evaluated by workflow fit rather than by isolated features. Based on the available data, its two main strengths are acting as a native GUI agent that sees the screen and interacts with desktop applications, and automating desktop GUI workflows that lack APIs by seeing and clicking the UI. It is licensed under Apache-2.0, which matters when adoption constraints apply, and it has 36.4k GitHub stars.

#GUI Agent #Desktop App #Multimodal AI #Automation #Cross-Platform
$ Install
$ npx @agent-tars/cli@latest
01

Features

01 Native GUI agent that sees the screen and interacts with desktop applications
02 Multimodal LLM-powered visual understanding of any UI
03 Browser automation and shell command execution built in
04 MCP integration for extending agent capabilities with custom tools
05 Cross-platform desktop app with web UI option
02

Why choose it

+Native GUI agent that sees the screen and interacts with desktop applications
+Automating desktop GUI workflows that don't have APIs by seeing and clicking the UI
+Supports 3 platforms (macOS, Windows, and Linux), which helps with broader deployment needs.
+Ships with a public repository and an Apache-2.0 license, which makes adoption and review easier.
03

Trade-offs

!There are at least 8 related tools in the same category, so the best choice is easier to make after side-by-side comparison.
04

Compatibility

macOS: Supported (verified via docs)
Windows: Supported (verified via docs)
Linux: Supported (verified via docs)
05

Quick start

$ npx @agent-tars/cli@latest
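
The quick-start command relies on `npx`, which is distributed with npm and therefore with a standard Node.js install. The following is a minimal preflight sketch, not part of the project's documented setup: the helper names `have_cmd` and `preflight` are hypothetical, and this page does not state a minimum Node.js version, so none is checked.

```shell
#!/bin/sh
# Hypothetical preflight helpers (illustrative names, not from the project).
# Assumption: npx is provided by the local Node.js/npm installation.

# have_cmd NAME: succeeds (exit 0) if NAME is found on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# preflight: reports whether npx is available; returns 1 if it is missing.
preflight() {
  if have_cmd npx; then
    echo "npx found: ready to run the quick-start command"
    return 0
  fi
  echo "npx not found: install Node.js (which bundles npm and npx) first" >&2
  return 1
}
```

With these functions loaded in the current shell, `preflight && npx @agent-tars/cli@latest` would only launch the CLI when `npx` is actually present.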
06

Use cases

↳Automating desktop GUI workflows that don't have APIs by seeing and clicking the UI
↳Building multimodal agents that combine screen understanding with web and file operations
↳Running AI-assisted computer tasks through natural language instructions on desktop
07

How it compares

≈UI-TARS-desktop sits in the Vision / Multimodal category, so it makes more sense to evaluate it alongside tools like ragflow instead of in isolation.
≈If your main need is closer to "Automating desktop GUI workflows that don't have APIs by seeing and clicking the UI", that use case is a better lens for comparison than broad feature checklists alone.
≈UI-TARS-desktop's Apache-2.0 license and community traction are both easier to judge in category context.
08

Alternatives

ragflow ★ 82.7k
RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs.

n8n ★ 192.5k
Fair-code workflow automation platform with native AI capabilities. Combine visual building with custom code, self-host or cloud, 400+ integrations.

GitHub MCP Server ★ 30.7k
GitHub's official MCP Server. Allows AI agents to interact directly with your GitHub repositories (read files, search code, issues).

Gemini CLI ★ 105.3k
An open-source AI agent that brings the power of Gemini directly into your terminal. Supports native MCP.

awesome-n8n-templates ★ 23.0k
Supercharge your workflow automation with this curated collection of n8n templates! Instantly connect your favorite apps, like Gmail, Telegram, Google Drive, Slack, and more, with ready-to-use, AI-powered automations. Save time, boost productivity, and unlock the true potential of n8n in just a few clicks.

keep ★ 11.9k
The open-source AIOps and alert management platform.

claude-code-guide ★ 4.3k
Master Claude Code with this Guide! Includes: Setup, SKILL.md files, Agents, Commands, workflows and tricks making Claude's potential skyrocket!

open-claude-cowork ★ 4.2k
Open Source version of Claude Cowork with 500+ SaaS app integrations.

Comments

  • Remy Rivera, May 25, 2026
    Good for building desktop AI assistants that can see and interact with any application.

  • Jordan Martinez, Apr 23, 2026
    Handles the model integration complexity so you focus on task definition.

  • Parker Thompson, Mar 28, 2026
    Open-source foundation means you're not locked into proprietary automation platforms.

  • Logan Anderson, Mar 21, 2026
    Multimodal AI agent stack that connects cutting-edge models to desktop automation.

Stats
GitHub Stars: ★ 36.4k
Last commit: 4w ago
Status: Active
License: Apache-2.0
Category: Vision / Multimodal
Trend (30d): +1.4k (↑ 4.8%)
© 2026 AgentIndex.app|Built by a 10-year iOS Developer.
Not affiliated with Anthropic, OpenAI or Microsoft.