gemini-skill

Active · ★ 822 · Updated 2026-05-29
★ Vision / Multimodal · ★ Browser Automation

A Gemini drawing MCP server and skill that works through the browser. It can be used in OpenClaw or any agent that supports MCP.

Gemini Skill automates interactions with Google Gemini's web interface using the Chrome DevTools Protocol (CDP). It supports AI image generation, multi-turn conversations, image upload and extraction, and session management, and it includes an MCP server for integration with AI clients. A daemon process manages the browser so that it can be reused across requests.
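
As background for the CDP-based automation described above: CDP commands are JSON messages with an `id`, a `method`, and optional `params`, sent over the browser's WebSocket debugging endpoint. The sketch below only builds such messages. `CdpCommandBuilder` is a hypothetical helper written for illustration, not part of gemini-skill, and the target URL is an assumption; how gemini-skill actually issues commands is not documented on this page.

```typescript
// Shape of a CDP command as it travels over the WebSocket connection.
interface CdpCommand {
  id: number; // chosen by the caller; the browser echoes it in the response
  method: string; // "Domain.method", e.g. "Page.navigate"
  params?: Record<string, unknown>;
}

// Hypothetical helper (not from gemini-skill): assigns increasing ids and
// serializes each command to the JSON text that would be sent to the browser.
class CdpCommandBuilder {
  private nextId = 1;

  build(method: string, params?: Record<string, unknown>): string {
    const command: CdpCommand = { id: this.nextId++, method };
    if (params !== undefined) {
      command.params = params;
    }
    return JSON.stringify(command);
  }
}

const builder = new CdpCommandBuilder();

// Page.navigate and Runtime.evaluate are standard CDP methods.
// The URL is an assumption about which page the tool opens.
const navigate = builder.build("Page.navigate", {
  url: "https://gemini.google.com/",
});
const evaluate = builder.build("Runtime.evaluate", {
  expression: "document.title",
});

console.log(navigate);
console.log(evaluate);
```

In a real client, each string would be written to the WebSocket, and the response carrying the same `id` would be matched back to the pending command.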

#automation #drawing #gemini #mcp #mcp-client #mcp-server #mcp-servers #openclaw
$ Install
$ git clone https://github.com/WJZ-P/gemini-skill.git && cd gemini-skill && npm install
01

Features

01 AI image generation with prompt and full-size download
02 Multi-turn text dialogue with Gemini
03 Image upload for reference-based generation
04 Image extraction from conversations (base64 and CDP full-size)
05 Session management (new, temp, model switch, navigate history)
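
To illustrate the base64 extraction path listed above, here is a minimal sketch that decodes a base64 image data URL into raw bytes. It assumes the extracted image arrives as a `data:image/...;base64,...` string, which this page does not confirm; `decodeImageDataUrl` is a hypothetical helper, not a gemini-skill function.

```typescript
interface DecodedImage {
  mimeType: string;
  bytes: Uint8Array;
}

// Hypothetical helper: parses a base64 image data URL and returns its bytes.
function decodeImageDataUrl(dataUrl: string): DecodedImage {
  const match = /^data:(image\/[a-z0-9.+-]+);base64,([A-Za-z0-9+/]*={0,2})$/i.exec(dataUrl);
  if (match === null) {
    throw new Error("not a base64 image data URL");
  }
  // atob is a global in browsers and in Node 16 and later.
  const binary = atob(match[2]);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return { mimeType: match[1], bytes };
}

// "iVBORw0KGgo=" is the 8-byte PNG file signature encoded as base64.
const sample = decodeImageDataUrl("data:image/png;base64,iVBORw0KGgo=");
console.log(sample.mimeType, sample.bytes.length);
```

The returned bytes could then be written to disk or passed on to another tool. A full-size download over CDP would follow a different path that is not sketched here.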
02

Compatibility

Windows: verified via docs
macOS: verified via docs
Linux: verified via docs
03

Quick start

$ git clone https://github.com/WJZ-P/gemini-skill.git
$ cd gemini-skill
$ npm install
04

Use cases

↳ Automatically generate game-style emojis through AI dialogue
↳ Conduct multi-turn conversations with Gemini for information retrieval
↳ Upload a reference image to generate a new variant using Gemini
05

Alternatives

CopilotKit · ★ 31.8k
React UI + elegant infrastructure for AI Copilots, AI chatbots, and in-app AI agents. The Agentic Frontend.

mcp-chrome · ★ 11.8k
Chrome MCP Server is a Chrome extension-based Model Context Protocol (MCP) server that exposes your Chrome browser functionality to AI assistants like Claude, enabling complex browser automation, content analysis, and semantic search.

budibase · ★ 28.0k
Create business apps and automate workflows in minutes. Supports PostgreSQL, MySQL, MariaDB, MSSQL, MongoDB, Rest API, Docker, K8s, and more 🚀 No code / Low code platform.

fastmcp · ★ 25.4k
🚀 The fast, Pythonic way to build MCP servers and clients.

FunASR · ★ 16.6k
Industrial-grade speech recognition toolkit: 170x realtime, 50+ languages, speaker diarization, emotion detection, streaming, and OpenAI-compatible API.

nuclear · ★ 17.7k
Streaming music player that finds free music for you

semble · ★ 4.5k
Fast and Accurate Code Search for Agents

csharp-sdk · ★ 4.3k
The official C# SDK for Model Context Protocol servers and clients. Maintained in collaboration with Microsoft.

Comments

  • Jesse Chen, May 23, 2026

    Works surprisingly well on Node 18+ setups. The browser automation side of it is remarkably stable.

  • Oakley Zhang, Apr 9, 2026

    Used this to let an LLM agent draw mockups directly in a headless browser while iterating on UI feedback.

  • Justice Garcia, Mar 13, 2026

    Is there support for rendering SVG outputs directly, or does it always go through the canvas element?

Stats
GitHub Stars: ★ 822
Last commit: 1d ago
Status: Active
License: not listed
Category: Browser Automation
Trend (30d): +32.8 ↑ 0.9%