AI Audio

Speech synthesis, voice cloning, music generation and audio production tools from Chinese AI teams.

Alibaba Cloud

Qwen Audio / CosyVoice

4.2

Qwen Cloud has enough official audio evidence to warrant a separate audio-category profile.

Best fit · Teams evaluating Chinese speech synthesis, voice cloning, ASR and realtime speech APIs through an English platform.

Coverage · 100/100

Globally available · Full English UI · Trusted

Zhipu AI

GLM Audio

3.9

Audio is now a documented GLM capability family and should be visible in the audio category.

Best fit · Developers evaluating Chinese speech, voice cloning, ASR and realtime multimodal APIs.

Coverage · 100/100

Partially available · Partial English UI · Trusted

MiniMax

MiniMax Audio / Speech

4.3

MiniMax Audio deserves a separate profile because the official API docs cover a mature speech product line beyond general model chat.

Best fit · Teams evaluating Chinese speech synthesis, voice cloning and multilingual audio generation APIs.

Coverage · 100/100

Globally available · Full English UI · Trusted

Meituan LongCat

LongCat-AudioDiT

4.2

LongCat-AudioDiT belongs in AI Audio because it is a direct text-to-speech and voice-cloning model with released code and weights, not a generic research paper.

Best fit · Researchers and speech teams evaluating open-source TTS, waveform-latent diffusion and zero-shot voice cloning.

Coverage · 100/100 · backfill: freshness

Globally available · Full English UI · Trusted

MiniMax

MiniMax Music

4.2

MiniMax Music is a distinct international product line in the official docs and should not be hidden inside a generic API profile.

Best fit · Creators and developers evaluating Chinese music generation APIs for songs, covers and app soundtracks.

Coverage · 100/100

Globally available · Full English UI · Trusted

ByteDance / Volcano Engine

Seeduplex

3.9

Seeduplex gives ByteDance Seed a distinct voice-interaction profile beyond text, image and video models.

Best fit · Teams tracking Chinese full-duplex speech models, realtime voice agents and multimodal interaction research.

Coverage · 100/100 · backfill: pricing

Partially available · Full English UI · Trusted

StepFun

StepAudio

4.1

StepAudio is a distinct capability line and should be visible in the AI Audio category, not hidden under the generic StepFun profile.

Best fit · Teams evaluating Chinese speech APIs for expressive TTS, voice cloning, dubbing, customer service, NPC dialogue and transcription.

Coverage · 100/100 · backfill: freshness

Partially available · Full English UI · Trusted

Xiaomi MiMo

MiMo Speech Models

4.0

MiMo now has enough English-facing speech signals to deserve a separate audio profile.

Best fit · Teams watching Xiaomi's speech stack for ASR, TTS and voice-agent experiments.

Coverage · 100/100 · backfill: pricing

Partially available · Full English UI · Trusted