AI Audio

Speech synthesis, voice cloning, music generation and audio production tools from Chinese AI teams.

Alibaba Cloud

Qwen Audio / CosyVoice

4.2

Qwen Cloud's official documentation covers enough audio capabilities to warrant a separate audio-category profile.

Best fit · Teams evaluating Chinese speech synthesis, voice cloning, ASR and realtime speech APIs through an English-language platform.

Coverage · 100/100

Globally available · Full English UI · Trusted · Public API · Freemium

Payment: Qwen Cloud billing / Token Plan where supported
Checked: May 17
Sources: High confidence
From: Free tier and pay-as-you-go speech API billing vary by model


Zhipu AI

GLM Audio

3.9

Audio is now a documented GLM capability family and should be visible in the audio category.

Best fit · Developers evaluating Chinese speech, voice cloning, ASR and realtime multimodal APIs.

Coverage · 100/100

Partially available · Partial English UI · Trusted · Public API · Freemium

Payment: Free models where available / Platform billing
Checked: May 17
Sources: High confidence
From: Usage-based audio API pricing varies by model


MiniMax

MiniMax Audio / Speech

4.3

MiniMax Audio deserves a separate profile because the official API docs cover a mature speech product line that goes beyond general-purpose chat models.

Best fit · Teams evaluating Chinese speech synthesis, voice cloning and multilingual audio generation APIs.

Coverage · 100/100

Globally available · Full English UI · Trusted · Public API · Freemium

Payment: Audio Subscription / Token Plan
Checked: May 17
Sources: High confidence
From: Audio Subscription, Token Plan quotas, Credits and pay-as-you-go billing vary by model


Meituan LongCat

LongCat-AudioDiT

4.2

LongCat-AudioDiT belongs in AI Audio because it is a direct text-to-speech and voice-cloning model with released code and weights, not just a research paper.

Best fit · Researchers and speech teams evaluating open-source TTS, waveform-latent diffusion and zero-shot voice cloning.

Coverage · 100/100 · backfill: freshness

Globally available · Full English UI · Trusted · Limited API · Free

Payment: GitHub repository / Model weights download
Checked: May 18
Sources: High confidence
From: Open-source, MIT-licensed repository and released model weights; inference runs locally or through a Hugging Face-compatible workflow


MiniMax

MiniMax Music

4.2

MiniMax Music is a distinct international product line in the official docs and should not be hidden inside a generic API profile.

Best fit · Creators and developers evaluating Chinese music generation APIs for songs, covers and app soundtracks.

Coverage · 100/100

Globally available · Full English UI · Trusted · Public API · Freemium

Payment: Token Plan / Credits
Checked: May 17
Sources: High confidence
From: Token Plan music quotas, Credits and pay-as-you-go billing vary by model


ByteDance / Volcano Engine

Seeduplex

3.9

Seeduplex gives ByteDance Seed a distinct voice-interaction profile beyond text, image and video models.

Best fit · Teams tracking Chinese full-duplex speech models, realtime voice agents and multimodal interaction research.

Coverage · 100/100 · backfill: pricing

Partially available · Full English UI · Trusted · Limited API · Pricing unknown

Payment: BytePlus billing / Volcano Engine billing
Checked: May 17
Sources: High confidence
From: Voice model access and pricing should be verified through BytePlus or Volcano Engine


StepFun

StepAudio

4.1

StepAudio is a distinct capability line and should be visible in the AI Audio category, not hidden under the generic StepFun profile.

Best fit · Teams evaluating Chinese speech APIs for expressive TTS, voice cloning, dubbing, customer service, NPC dialogue and transcription.

Coverage · 100/100 · backfill: freshness

Partially available · Full English UI · Trusted · Public API · Paid

Payment: Open Platform balance / Step Plan quota for supported audio models
Checked: May 17
Sources: High confidence
From: stepaudio-2.5-tts $0.85 / 10,000 characters; step-tts-2 $0.40 / 10,000 characters; ASR $0.022 / hour; voice cloning $1.50 / voice
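As a rough worked example of the StepAudio list prices above, the sketch below turns a workload into an estimated bill. The rates are copied from this listing and may change; the `estimate_stepaudio_cost` helper and the usage volumes are hypothetical, and the sketch assumes strictly linear per-unit billing with no minimums, tiers or Step Plan quota offsets.

```python
# Hypothetical cost estimator built on the StepAudio list prices quoted above.
# Assumption: strictly linear per-unit billing (no minimums, tiers or Step Plan quotas).
RATES_USD = {
    "stepaudio-2.5-tts": 0.85 / 10_000,  # USD per character
    "step-tts-2": 0.40 / 10_000,         # USD per character
    "asr": 0.022,                        # USD per hour of transcribed audio
    "voice_clone": 1.50,                 # USD per cloned voice
}


def estimate_stepaudio_cost(tts_model: str, tts_chars: int,
                            asr_hours: float, cloned_voices: int) -> float:
    """Return an estimated total in USD, rounded to cents."""
    tts = RATES_USD[tts_model] * tts_chars
    asr = RATES_USD["asr"] * asr_hours
    clone = RATES_USD["voice_clone"] * cloned_voices
    return round(tts + asr + clone, 2)


if __name__ == "__main__":
    # Example workload: 2 million TTS characters, 500 hours of ASR, 3 cloned voices.
    print(estimate_stepaudio_cost("stepaudio-2.5-tts", 2_000_000, 500, 3))  # prints 185.5
```

Under the same assumptions, switching the workload to `step-tts-2` brings the estimate down to $95.50, since TTS characters dominate the bill. Confirm current rates on StepFun's Open Platform before budgeting.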


Xiaomi MiMo

MiMo Speech Models

4.0

MiMo now has enough English-language material on its speech models to deserve a separate audio profile.

Best fit · Teams watching Xiaomi's speech stack for ASR, TTS and voice-agent experiments.

Coverage · 100/100 · backfill: pricing

Partially available · Full English UI · Trusted · Limited API · Pricing unknown

Payment: API Platform billing / AI Studio
Checked: May 17
Sources: High confidence
From: Speech-model pricing is not publicly visible on the English homepage; verify it inside the MiMo API Platform
