Speech synthesis, voice cloning, music generation and audio production tools from Chinese AI teams.
Alibaba Cloud
Qwen Audio / CosyVoice
Qwen Cloud has enough official audio documentation to warrant a separate audio-category profile.
Best fit · Teams evaluating Chinese speech synthesis, voice cloning, ASR and realtime speech APIs through an English-language platform.
Coverage · 100/100
Globally available · Full English UI · Trusted · Public API · Freemium
Payment · Qwen Cloud billing / Token Plan where supported
From · Free tier and pay-as-you-go speech API billing vary by model
Audio is now a documented GLM capability family and should be visible in the audio category.
Best fit · Developers evaluating Chinese speech, voice cloning, ASR and realtime multimodal APIs.
Coverage · 100/100
Partially available · Partial English UI · Trusted · Public API · Freemium
Payment · Free models where available / Platform billing
From · Usage-based audio API pricing varies by model
MiniMax
MiniMax Audio / Speech
MiniMax Audio deserves a separate profile because the official API docs cover a mature speech product line that goes beyond general-purpose chat models.
Best fit · Teams evaluating Chinese speech synthesis, voice cloning and multilingual audio generation APIs.
Coverage · 100/100
Globally available · Full English UI · Trusted · Public API · Freemium
Payment · Audio Subscription / Token Plan
From · Audio Subscription, Token Plan quotas, Credits and pay-as-you-go billing vary by model
Meituan LongCat
LongCat-AudioDiT
LongCat-AudioDiT belongs in AI Audio because it is a direct text-to-speech and voice-cloning model with released code and weights, not a generic research paper.
Best fit · Researchers and speech teams evaluating open-source TTS, waveform-latent diffusion and zero-shot voice cloning.
Coverage · 100/100 · backfill: freshness
Globally available · Full English UI · Trusted · Limited API · Free
Payment · GitHub repository / Model weights download
From · MIT-licensed open-source repository and released model weights; inference runs locally or through a Hugging Face-compatible workflow
MiniMax Music is a distinct international product line in the official docs and should not be hidden inside a generic API profile.
Best fit · Creators and developers evaluating Chinese music generation APIs for songs, covers and app soundtracks.
Coverage · 100/100
Globally available · Full English UI · Trusted · Public API · Freemium
Payment · Token Plan / Credits
From · Token Plan music quotas, Credits and pay-as-you-go billing vary by model
ByteDance / Volcano Engine
Seeduplex
Seeduplex gives ByteDance Seed a distinct voice-interaction profile beyond text, image and video models.
Best fit · Teams tracking Chinese full-duplex speech models, realtime voice agents and multimodal interaction research.
Coverage · 100/100 · backfill: pricing
Partially available · Full English UI · Trusted · Limited API · Unknown
Payment · BytePlus billing / Volcano Engine billing
From · Voice model access and pricing should be verified through BytePlus or Volcano Engine
StepAudio is a distinct capability line and should be visible in the AI Audio category, not hidden under the generic StepFun profile.
Best fit · Teams evaluating Chinese speech APIs for expressive TTS, voice cloning, dubbing, customer service, NPC dialogue and transcription.
Coverage · 100/100 · backfill: freshness
Partially available · Full English UI · Trusted · Public API · Paid
Payment · Open Platform balance / Step Plan quota for supported audio models
From · stepaudio-2.5-tts $0.85 / 10,000 characters; step-tts-2 $0.40 / 10,000 characters; ASR $0.022 / hour; voice cloning $1.50 / voice
Xiaomi MiMo
MiMo Speech Models
MiMo now has enough English-facing speech signals to deserve a separate audio profile.
Best fit · Teams watching Xiaomi's speech stack for ASR, TTS and voice-agent experiments.
Coverage · 100/100 · backfill: pricing
Partially available · Full English UI · Trusted · Limited API · Unknown
Payment · API Platform billing / AI Studio
From · Speech-model pricing is not publicly listed on the English homepage; verify it inside the MiMo API Platform
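The StepAudio entry is the only one in this list with concrete per-unit rates. The following minimal Python sketch turns those rates into a rough budget. It assumes billing is strictly linear, with no minimums, tiers, rounding or Step Plan quota offsets. `RATES`, `Workload` and `estimate_cost` are hypothetical helpers, not part of any StepFun SDK. The rates are copied from the listing, so re-check them on the Open Platform before relying on the numbers.

```python
from dataclasses import dataclass

# Hypothetical rate table: USD prices copied from the StepAudio listing entry.
# Assumption: prices are current and billed linearly per unit.
RATES = {
    "stepaudio-2.5-tts_per_10k_chars": 0.85,
    "step-tts-2_per_10k_chars": 0.40,
    "asr_per_hour": 0.022,
    "voice_clone_per_voice": 1.50,
}


@dataclass
class Workload:
    """Hypothetical description of a monthly audio workload."""
    tts_chars: int      # characters sent to the TTS model
    tts_model: str      # "stepaudio-2.5-tts" or "step-tts-2"
    asr_hours: float    # hours of audio transcribed
    cloned_voices: int  # number of voices cloned


def estimate_cost(w: Workload) -> float:
    """Rough USD estimate; ignores minimums, rounding and plan quotas."""
    key = f"{w.tts_model}_per_10k_chars"
    if key not in RATES:
        raise ValueError(f"unknown TTS model: {w.tts_model}")
    tts = w.tts_chars / 10_000 * RATES[key]
    asr = w.asr_hours * RATES["asr_per_hour"]
    clone = w.cloned_voices * RATES["voice_clone_per_voice"]
    # Round to hide floating-point noise in the summed per-unit charges.
    return round(tts + asr + clone, 4)


if __name__ == "__main__":
    job = Workload(tts_chars=1_000_000, tts_model="stepaudio-2.5-tts",
                   asr_hours=100, cloned_voices=2)
    print(estimate_cost(job))
```

For example, 1,000,000 characters of stepaudio-2.5-tts, 100 hours of ASR and two cloned voices come to about $90.20 under these assumptions (100 × $0.85 + 100 × $0.022 + 2 × $1.50).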