ChatTTS: A generative speech model designed for everyday dialogue scenarios such as LLM assistants. It produces natural, expressive speech and gives fine-grained control over prosodic features such as laughter, pauses, and interjections; FunASR: A fundamental end-to-end speech recognition toolkit. It offers industrial-grade speech recognition that is reported to be 170x faster than Whisper, supports over 50 languages, and integrates speaker diarization, emotion detection, and streaming.
Providing voice output for LLM assistants in dialogue scenarios
Meeting transcription with speaker labels, timestamps, and punctuation
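
As a rough illustration of the meeting-transcription use case, the sketch below turns diarized, punctuated segments into a readable transcript. It does not call FunASR: the `Segment` fields (`speaker`, `start_ms`, `end_ms`, `text`) and the helper names are hypothetical stand-ins for whatever a recognizer actually returns, so the mapping from real toolkit output to this structure is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical diarized segment; NOT FunASR's actual output schema."""
    speaker: int   # zero-based speaker index assigned by diarization
    start_ms: int  # segment start, in milliseconds
    end_ms: int    # segment end, in milliseconds
    text: str      # recognized text, already punctuated


def format_timestamp(ms: int) -> str:
    """Render milliseconds as HH:MM:SS, dropping the sub-second part."""
    total_seconds = ms // 1000
    minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def render_transcript(segments: list[Segment]) -> str:
    """Sort segments by start time, merge consecutive turns by the same
    speaker, and emit one labeled, timestamped line per turn."""
    merged: list[Segment] = []
    for seg in sorted(segments, key=lambda s: s.start_ms):
        if merged and merged[-1].speaker == seg.speaker:
            prev = merged[-1]
            merged[-1] = Segment(
                prev.speaker, prev.start_ms, seg.end_ms, f"{prev.text} {seg.text}"
            )
        else:
            merged.append(seg)
    return "\n".join(
        f"[{format_timestamp(s.start_ms)} - {format_timestamp(s.end_ms)}] "
        f"Speaker {s.speaker + 1}: {s.text}"
        for s in merged
    )
```

Merging consecutive segments from the same speaker keeps the transcript organized by turn rather than by recognizer chunk, which is usually what meeting notes need.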