Alibaba Cloud

Qwen Audio / CosyVoice

Qwen Cloud's marketplace lists CosyVoice-v3-plus for high-quality speech synthesis and voice cloning and Qwen3-ASR-Flash-Realtime for multilingual real-time speech recognition; its docs also list Qwen3.5-Omni-Flash-Realtime for real-time multimodal speech conversation.

Globally available · Full English UI · Public API · Freemium · Trusted

Quick answers

At a glance

Overview
Qwen Cloud's speech stack for text-to-speech, voice cloning, speech recognition and real-time multimodal speech.
Best fit
Teams evaluating Chinese speech synthesis, voice cloning, ASR and realtime speech APIs through an English platform.
Trust
2/2 sources verified, recently checked · 2026-05-17
Coverage
100/100

Editorial verdict

Best for

Teams evaluating Chinese speech synthesis, voice cloning, ASR and realtime speech APIs through an English platform.

Avoid if

You plan to use cloned voices in production without explicit rights and retention checks.

Why it matters

Qwen Cloud's official marketplace and docs cover enough audio models to warrant a separate audio-category profile.

Pricing

Free tier and pay-as-you-go speech API billing vary by model

Payment

Qwen Cloud billing, Token Plan where supported, Pay-as-you-go API billing

Commercial use

Commercial use should follow the current product, API, model license and billing terms.

Privacy

Review the prompt, file and media upload, retention and training-use terms before running sensitive workloads.

Use-case fit

Text-to-speech and voice cloning

Strong

Use CosyVoice for professional speech synthesis and custom voice generation.

Realtime multilingual ASR

Strong

Use Qwen3-ASR-Flash-Realtime for multilingual speech recognition tests.

Global user checklist

Registration · Confirmed · Audio entries are listed in the Qwen Cloud English marketplace and docs.
English UI · Confirmed · Marketplace and docs are English-facing.
API and docs · Confirmed · Docs include TTS, ASR and speech-to-speech model categories.
Commercial use · Review · Voice cloning, synthetic speech and recordings need consent and policy review.
Coverage · 100/100

Model names, quotas, release status, regional access and commercial terms can change quickly; recheck official sources before procurement or production use.

Pros

  • TTS, voice cloning, ASR and realtime speech categories are documented
  • CosyVoice and Qwen3-ASR are visible in the English marketplace

Cons

  • Voice cloning requires consent and data-handling review

Decision paths

minimax-audio

zhipu-glm-audio

sparkdesk

Sources

Qwen Cloud model marketplace

official · en · verified 2026-05-17

Lists CosyVoice-v3-plus and Qwen3-ASR-Flash-Realtime.

Qwen Cloud model selection

docs · en · verified 2026-05-17

Lists text-to-speech, speech-to-text and speech-to-speech categories.

Reviews