FunASR: FunASR is a fundamental end-to-end speech recognition toolkit. It offers industrial-grade speech recognition that runs 170x faster than Whisper, supports over 50 languages, and integrates features such as speaker diarization, emotion detection, and streaming; gemini-skill: Gemini Skill automates interactions with Google Gemini's web interface using the Chrome DevTools Protocol (CDP). It supports AI image generation, multi-turn conversations, image upload and extraction, session management, and an MCP server for integration with AI clients. The system uses a daemon architecture to manage browser processes efficiently.
Meeting transcription with speaker labels, timestamps, and punctuation
Automatically generate game-style emojis through AI dialogue