MiniMax
MiniMax Audio / Speech
This profile tracks MiniMax Audio across its Speech 2.8, Speech 2.6 and Speech-02 models, 40-language speech synthesis, synchronous HTTP and WebSocket TTS, asynchronous long-form TTS, voice cloning and official voice-management APIs.
Quick answers
At a glance
- Overview: MiniMax's international speech stack for text-to-speech, long-form audio, voice cloning, voice design and voice management.
- Best fit: Teams evaluating Chinese speech synthesis, voice cloning and multilingual audio generation APIs.
- Trust: 3/3 sources verified, recently checked · 2026-05-17
- Coverage: 100/100
Editorial verdict
Best for
Teams evaluating Chinese speech synthesis, voice cloning and multilingual audio generation APIs.
Avoid if
You cannot complete a clear consent, data and commercial-use review before putting cloned voices into production.
Why it matters
MiniMax Audio deserves a separate profile because the official API docs cover a mature speech product line that goes beyond general-purpose chat models.
Pricing
Audio Subscription, Token Plan quotas, Credits and pay-as-you-go billing vary by model
Payment
Audio Subscription, Token Plan, Credits, Pay-as-you-go API billing
Commercial use
Voice cloning, synthetic voice and generated-audio use should be reviewed against current consent and product terms.
Privacy
Review uploaded voice samples, cloned voice retention and generated audio storage before using real voices.
Use-case fit
Multilingual text-to-speech
Strong: Use Speech 2.8 or 2.6 for multilingual TTS, voice chat and online social interaction scenarios.
Long-form audio generation
Strong: Async TTS supports long-form audio tasks such as books or long documents.
Voice cloning and custom voices
Medium: Use voice cloning and voice design only after legal and consent checks.
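For the long-form case above, a common way to consume an asynchronous TTS job is to submit the text once and then poll until the job finishes. The sketch below illustrates that pattern only, and it assumes the workflow is poll-based. `submit`, `get_status`, the `state` values and the `audio_url` field are hypothetical stand-ins, not MiniMax API names; map them to whatever the current official docs specify.

```python
import time
from typing import Callable


def wait_for_audio(
    submit: Callable[[str], str],
    get_status: Callable[[str], dict],
    text: str,
    poll_seconds: float = 2.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Submit long text, then poll until the job is done, fails or times out.

    `submit` and `get_status` are caller-supplied wrappers around the real
    HTTP calls; their names and the status dictionary shape are assumptions.
    """
    task_id = submit(text)
    for _ in range(max_polls):
        status = get_status(task_id)
        if status["state"] == "done":
            return status
        if status["state"] == "failed":
            raise RuntimeError(f"task {task_id} failed")
        sleep(poll_seconds)  # wait before asking again
    raise TimeoutError(f"task {task_id} not finished after {max_polls} polls")
```

Passing the two calls in as functions keeps the loop independent of any particular endpoint, so it can be exercised with fakes before a real key or quota is involved.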
Global user checklist
Recheck model list, supported languages, voice IDs and subscription quotas before production.
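The checklist above can be turned into a small pre-deployment check. This is a minimal sketch under stated assumptions: the lists of models, languages and voice IDs are fetched separately from the current official docs or APIs, quota is measured in characters, and every name here (`TtsConfig`, `preflight`, `chars_needed`) is hypothetical rather than part of any MiniMax SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TtsConfig:
    """Settings the application intends to use in production (hypothetical)."""
    model: str
    language: str
    voice_id: str
    chars_needed: int  # assumed quota unit: characters


def preflight(config, models, languages, voice_ids, chars_remaining):
    """Return a list of problems; an empty list means the config passed."""
    problems = []
    if config.model not in models:
        problems.append(f"model not listed: {config.model}")
    if config.language not in languages:
        problems.append(f"language not supported: {config.language}")
    if config.voice_id not in voice_ids:
        problems.append(f"voice ID not found: {config.voice_id}")
    if config.chars_needed > chars_remaining:
        problems.append(
            f"quota too low: need {config.chars_needed}, have {chars_remaining}"
        )
    return problems


# Placeholder values only; real identifiers come from the current docs.
config = TtsConfig("model-a", "en", "voice-1", 500)
issues = preflight(config, {"model-a"}, {"en", "zh"}, {"voice-1"}, 1000)
```

Returning every problem at once, instead of stopping at the first, lets a release check report all stale settings in a single run.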
Pros
- Speech 2.8 and 2.6 are current documented models
- Supports HTTP and WebSocket TTS plus async long-text generation
- Voice cloning and voice design APIs are documented
Cons
- Voice rights and consent requirements need explicit review
Decision paths
The API Platform profile covers billing, keys and cross-modal integration.
SparkDesk is a Chinese speech and vertical-scenario reference.
Sources
docs · en · verified 2026-05-17
Lists Speech 2.8, Speech 2.6 and Speech-02 model families.