Screenhand

Active · ★ 8 · AGPL-3.0 · Updated 2026-04-02
★ Trending · ★ Browser Automation · ★ Workflow Automation

Give AI eyes and hands on your desktop. Open-source MCP server for desktop automation — screenshots, UI control, browser automation, OCR. Works with Claude, Cursor, and any MCP client. macOS + Windows.

ScreenHand is an open-source MCP server providing native desktop control for AI agents on macOS and Windows. It integrates Accessibility APIs, UI Automation, OCR, and Chrome DevTools Protocol to enable fast, robust interaction with applications and browsers, including multi-agent coordination and background job processing.
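
As a rough illustration of how an MCP client talks to a server like this one, the sketch below builds a JSON-RPC 2.0 request for the MCP `tools/call` method. The tool name `screenshot` and its `window` argument are hypothetical placeholders; this listing does not name any of ScreenHand's actual tools.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids must be unique per session


def make_tool_call(tool_name, arguments):
    """Build a JSON-RPC 2.0 request for the MCP `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# "screenshot" and "window" are assumed names for illustration only.
request = make_tool_call("screenshot", {"window": "frontmost"})
print(json.dumps(request, indent=2))
```

An MCP client such as Claude Desktop or Cursor normally builds and sends these messages itself; the sketch only shows the shape of the request.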

#Desktop Automation, #AI Integration, #UI Automation, #Accessibility, #OCR, #Chrome DevTools Protocol, #Multi-agent, #Task Automation
$ Install
$ git clone https://github.com/manushi4/screenhand.git && cd screenhand && npm install && npm run build:native
01 Features

01 Native UI control for macOS and Windows via Accessibility/UI Automation.
02 Background Chrome browser control via DevTools Protocol.
03 Extensive toolkit (82 tools) for cross-app automation and data extraction.
04 Multi-agent coordination with session leases and stall detection.
05 Persistent learning memory and background job processing daemon.
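
The listing does not describe how session leases and stall detection are implemented. One common pattern, sketched here under that caveat, is a lease table where each window is held by at most one agent, leases expire unless renewed, and a holder that stops sending heartbeats is flagged as stalled. Every name below (`LeaseTable`, `acquire`, `heartbeat`, `stalled`) is hypothetical and not taken from ScreenHand.

```python
import time
from dataclasses import dataclass


@dataclass
class Lease:
    agent_id: str
    expires_at: float
    last_heartbeat: float


class LeaseTable:
    """Hypothetical lease registry: one agent per window at a time."""

    def __init__(self, ttl=30.0, stall_after=10.0, clock=time.monotonic):
        self.ttl = ttl                  # seconds a lease stays valid without renewal
        self.stall_after = stall_after  # seconds of silence before a holder counts as stalled
        self.clock = clock
        self.leases = {}                # window id -> Lease

    def acquire(self, window_id, agent_id):
        """Grant the lease unless another agent holds an unexpired one."""
        now = self.clock()
        held = self.leases.get(window_id)
        if held and held.expires_at > now and held.agent_id != agent_id:
            return False
        self.leases[window_id] = Lease(agent_id, now + self.ttl, now)
        return True

    def heartbeat(self, window_id, agent_id):
        """Renew a live lease; returns False if the caller no longer holds it."""
        now = self.clock()
        held = self.leases.get(window_id)
        if held is None or held.agent_id != agent_id or held.expires_at <= now:
            return False
        held.last_heartbeat = now
        held.expires_at = now + self.ttl
        return True

    def stalled(self):
        """List windows whose holder has a live lease but has gone quiet."""
        now = self.clock()
        return [w for w, held in self.leases.items()
                if held.expires_at > now and now - held.last_heartbeat > self.stall_after]


# Demo with a fake clock so the timeline is deterministic.
now = [0.0]
table = LeaseTable(clock=lambda: now[0])
first = table.acquire("editor", "agent-a")     # granted
second = table.acquire("editor", "agent-b")    # refused: agent-a holds the window
now[0] = 15.0
quiet = table.stalled()                        # agent-a has sent no heartbeat for 15 s
now[0] = 31.0
takeover = table.acquire("editor", "agent-b")  # granted: the old lease expired at t=30
print(first, second, quiet, takeover)
```

Injecting the clock keeps the expiry and stall logic testable without sleeping. A real coordinator would also need to persist leases and handle concurrent access.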
02 Compatibility

macOS · Platform · Verified via docs
Windows · Platform · Verified via docs
Node.js · Runtime · Verified via docs
MCP Clients · Integration · Verified via docs
Claude Desktop · AI Client · Verified via docs
Cursor · AI Client · Verified via docs
03 Quick start

1. $ git clone https://github.com/manushi4/screenhand.git
2. $ cd screenhand
3. $ npm install
4. $ npm run build:native
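
The steps above stop at the build. Registering the server with an MCP client is not covered in this listing, so the sketch below only shows the general shape of an `mcpServers` entry as Claude Desktop reads it from its JSON config. The `node` command, the repository path, and `ENTRY_POINT.js` are all assumed placeholders; take the real start command from the project's documentation.

```python
import json
from pathlib import PurePosixPath


def mcp_server_entry(name, repo_dir, entry_point):
    """Build an `mcpServers` block in the shape Claude Desktop reads from its config."""
    script = PurePosixPath(repo_dir) / entry_point
    # "node" is an assumption based on the listed Node.js runtime.
    return {"mcpServers": {name: {"command": "node", "args": [str(script)]}}}


# Both paths are placeholders, not real ScreenHand file names.
config = mcp_server_entry("screenhand", "/path/to/screenhand", "ENTRY_POINT.js")
print(json.dumps(config, indent=2))
```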
04 Use cases

↳ AI-driven debugging and inspection of desktop application UIs.
↳ Automating browser tasks like form filling, data scraping, and JavaScript execution via Chrome CDP.
↳ Creating cross-application workflows by chaining actions across different desktop apps.
↳ Coordinating multiple AI agents to control different application windows simultaneously without conflicts.
↳ Processing multi-step automation jobs continuously in the background using a worker daemon.
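
To make the Chrome CDP use case concrete, here is a minimal sketch of the messages involved. CDP commands are JSON objects with an `id`, a `method`, and `params`, sent over the browser's debugging WebSocket. `Page.navigate` and `Runtime.evaluate` are standard CDP methods; the helper `cdp_command` is hypothetical, and how ScreenHand wraps these calls is not stated here.

```python
import json


def cdp_command(msg_id, method, params=None):
    """Serialize one Chrome DevTools Protocol command as a JSON string."""
    return json.dumps({"id": msg_id, "method": method, "params": params or {}})


# Open a page, then run JavaScript in it and ask for the result by value.
navigate = cdp_command(1, "Page.navigate", {"url": "https://example.com"})
evaluate = cdp_command(
    2,
    "Runtime.evaluate",
    {"expression": "document.title", "returnByValue": True},
)
print(navigate)
print(evaluate)
```

The browser replies with a message carrying the same `id`, which is how a client matches responses to the commands it sent.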
05 Alternatives

ragflow · ★ 81.5k
RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs.

n8n · ★ 190.2k
Fair-code workflow automation platform with native AI capabilities. Combine visual building with custom code, self-host or cloud, 400+ integrations.

cua · ★ 17.3k
Open-source infrastructure for Computer-Use Agents. Sandboxes, SDKs, and benchmarks to train and evaluate AI agents that can control full desktops (macOS, Linux, Windows).

mcp-context-forge · ★ 3.8k
A Model Context Protocol (MCP) Gateway & Registry. Serves as a central management point for tools, resources, and prompts that can be accessed by MCP-compatible LLM applications. Converts REST API endpoints to MCP, composes virtual MCP servers with added security and observability, and converts between protocols (stdio, SSE, Streamable HTTP).

MCP-Chinese-Getting-Started-Guide · ★ 3.5k
A quick-start programming guide to the Model Context Protocol (MCP).

cursor-talk-to-figma-mcp · ★ 6.8k
TalkToFigma: MCP integration between Cursor and Figma, allowing Cursor Agentic AI to communicate with Figma for reading designs and modifying them programmatically.

XcodeBuildMCP · ★ 5.8k
A Model Context Protocol (MCP) server that provides Xcode-related tools for integration with AI assistants and other MCP clients.

google_workspace_mcp · ★ 2.5k
Control Gmail, Google Calendar, Docs, Sheets, Slides, Chat, Forms, Tasks, Search & Drive with AI. Comprehensive Google Workspace / G Suite MCP Server.


Comments

  • Jamie Martinez, May 22, 2026

    More reliable than screenshot-based approaches for detecting UI state changes

  • Jamie Zhang, Apr 2, 2026

    Used for UI automation testing workflows that require actual screen interaction

  • Rowan Kim, Mar 22, 2026

    Open eyes and hands for desktop automation via MCP is powerful for automation use cases

  • Dylan Wilson, Mar 15, 2026

    The open-source approach means you can audit exactly what the AI agent is doing on screen

Stats
GitHub Stars: ★ 8
Last commit: 1mo ago
Status: Active
License: AGPL-3.0
Category: Vision / Multimodal
Trend (30d): +0.3 ↑ 0.5%

© 2026 AgentIndex.app

Not affiliated with Anthropic, OpenAI or Microsoft.