MiniMax

MiniMax Audio / Speech

This profile tracks MiniMax Audio's Speech 2.8, Speech 2.6 and Speech-02 models, along with 40-language speech synthesis, synchronous TTS over HTTP and WebSocket, async long-form TTS, voice cloning and the official voice-management APIs.

Globally available · Full English UI · Public API · Freemium · Trusted

At a glance

Overview: MiniMax's international speech stack for text-to-speech, long-form audio, voice cloning, voice design and voice management.
Best fit: Teams evaluating Chinese speech synthesis, voice cloning and multilingual audio generation APIs.
Trust: 3/3 sources verified, recently checked · 2026-05-17
Coverage: 100/100

Editorial verdict

Best for

Teams evaluating Chinese speech synthesis, voice cloning and multilingual audio generation APIs.

Avoid if

You plan to use cloned voices in production without a clear consent, data-handling and commercial-use review.

Why it matters

MiniMax Audio deserves a separate profile because the official API docs cover a mature speech product line that goes beyond general-purpose model chat.

Pricing

Audio Subscription, Token Plan quotas, Credits and pay-as-you-go billing vary by model

Payment

Audio Subscription, Token Plan, Credits, Pay-as-you-go API billing

Commercial use

Voice cloning, synthetic voice and generated-audio use should be reviewed against current consent and product terms.

Privacy

Review uploaded voice samples, cloned voice retention and generated audio storage before using real voices.

Use-case fit

Multilingual text-to-speech

Strong

Use Speech 2.8 or 2.6 for multilingual TTS, voice chat and online social interaction scenarios.

Long-form audio generation

Strong

Async TTS supports long-form audio tasks such as books or long documents.
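
Async jobs of this kind are usually consumed with a submit-then-poll loop. The sketch below shows only the polling half, in a generic form. It is not MiniMax's documented API: the `fetch_status` callable, the `state` field and the `"success"` / `"failed"` values are hypothetical placeholders, so check the official async TTS docs for the real endpoint and response shape before adapting it.

```python
import time
from typing import Callable, Dict


def wait_for_task(
    fetch_status: Callable[[], Dict],
    interval_s: float = 2.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict:
    """Poll a long-running job until it finishes, fails or times out.

    `fetch_status` is a hypothetical caller-supplied function that returns
    the latest status as a dict. The "state" key and its values are
    assumptions made for illustration, not a real provider's schema.
    """
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        state = status.get("state")
        if state == "success":
            return status  # caller reads the audio reference from here
        if state == "failed":
            raise RuntimeError(f"task failed: {status}")
        if clock() >= deadline:
            raise TimeoutError("task did not finish before the timeout")
        sleep(interval_s)  # back off before asking again
```

In real use, `fetch_status` would wrap whatever status request the provider documents. Passing `sleep` and `clock` as parameters keeps the loop testable without waiting or touching the network.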

Voice cloning and custom voices

Medium

Use voice cloning and voice design only after legal and consent checks.

Global user checklist

Registration · Confirmed · Available through the MiniMax international API platform.
English UI · Confirmed · Speech docs and product pages are English-facing.
API and docs · Confirmed · Official docs cover TTS, async TTS, voice cloning, voice design and voice management.
Commercial use · Review · Voice rights, consent and generated-audio usage need explicit review.
Coverage · 100/100

Recheck model list, supported languages, voice IDs and subscription quotas before production.

Pros

  • Speech 2.8 and 2.6 are current documented models
  • Supports HTTP and WebSocket TTS plus async long-text generation
  • Voice cloning and voice design APIs are documented

Cons

  • Voice rights and consent requirements need explicit review

Decision paths

Use the API Platform for full multimodal access

The API Platform profile covers billing, keys and cross-modal integration.

Compare with SparkDesk for China speech workflows

SparkDesk is a Chinese speech and vertical-scenario reference.

Sources

MiniMax models overview

docs · en · verified 2026-05-17

Lists Speech 2.8, Speech 2.6 and Speech-02 model families.

MiniMax speech guide

docs · en · verified 2026-05-17

Documents synchronous TTS and streaming usage.

MiniMax voice clone guide

docs · en · verified 2026-05-17

Documents voice cloning capabilities.
