Alibaba Cloud
Qwen Audio / CosyVoice
Qwen Cloud lists CosyVoice-v3-plus for high-quality speech synthesis and voice cloning and Qwen3-ASR-Flash-Realtime for multilingual real-time speech recognition; its docs also list Qwen3.5-Omni-Flash-Realtime for real-time multimodal speech conversation.
At a glance
- Overview: Qwen Cloud's speech stack for text-to-speech, voice cloning, speech recognition and real-time multimodal speech.
- Best fit: Teams evaluating Chinese speech synthesis, voice cloning, ASR and realtime speech APIs through an English platform.
- Trust: 2/2 sources verified, recently checked · 2026-05-17
- Coverage: 100/100
Editorial verdict
Best for
Teams evaluating Chinese speech synthesis, voice cloning, ASR and realtime speech APIs through an English platform.
Avoid if
You plan to use cloned voices in production without explicit rights and retention checks.
Why it matters
Qwen Cloud's official sources document enough audio models and categories to warrant a separate audio-category profile.
Pricing
Free tier and pay-as-you-go speech API billing vary by model
Payment
Qwen Cloud billing, Token Plan where supported, Pay-as-you-go API billing
Commercial use
Commercial use should follow the current product, API, model license and billing terms.
Privacy
Review prompt, file, media upload, retention and training-use terms before sensitive workloads.
Use-case fit
Text-to-speech and voice cloning
Strong: Use CosyVoice for professional speech synthesis and custom voice generation.
Realtime multilingual ASR
Strong: Use Qwen3-ASR-Flash-Realtime for multilingual speech recognition tests.
Global user checklist
Model names, quotas, release status, regional access and commercial terms can change quickly; recheck official sources before procurement or production use.
Pros
- TTS, voice cloning, ASR and realtime speech categories are documented
- CosyVoice and Qwen3-ASR are visible in the English marketplace
Cons
- Voice cloning requires consent and data-handling review
Decision paths
minimax-audio
zhipu-glm-audio
sparkdesk
Sources
official · en · verified 2026-05-17
Lists CosyVoice-v3-plus and Qwen3-ASR-Flash-Realtime.
docs · en · verified 2026-05-17
Lists text-to-speech, speech-to-text and speech-to-speech categories.