Z.ai's audio and realtime multimodal API family, including GLM-TTS, GLM-TTS-Clone, GLM-ASR, GLM-Realtime and GLM-4-Voice.

GLM Audio English UI and API

English UI: partial · API: available

Zhipu AI

GLM Audio

Name: GLM Audio
Price: Usage-based audio API pricing varies by model
Availability: Limited
Rating: 3.9 (0 reviews)

BigModel's model overview lists GLM-TTS, GLM-TTS-Clone, GLM-ASR, GLM-Realtime and GLM-4-Voice under audio/video models, covering speech synthesis, voice cloning, speech recognition and realtime audio-video interaction.

Partially available · Partial English UI · Public API · Freemium · Trusted

Quick answers

At a glance

Overview: Z.ai's audio and realtime multimodal API family, including GLM-TTS, GLM-TTS-Clone, GLM-ASR, GLM-Realtime and GLM-4-Voice.
Best fit: Developers evaluating Chinese speech, voice cloning, ASR and realtime multimodal APIs.
Trust: 2/2 sources verified, recently checked · 2026-05-17
Coverage: 100/100

Editorial verdict

Best for

Developers evaluating Chinese speech, voice cloning, ASR and realtime multimodal APIs.

Avoid if

You need production voice cloning but have not completed a consent and data-retention review.

Why it matters

Audio is now a documented GLM capability family and should be visible in the audio category.

Pricing

Usage-based audio API pricing varies by model

Payment

Free model where available, platform billing

Commercial use

Commercial use should follow the current product, API, model license and billing terms.

Privacy

Review the terms covering prompts, files, media uploads, retention and training use before running sensitive workloads.

Use-case fit

Speech and realtime API evaluation

Strong

Use it to test TTS, voice cloning, ASR and realtime audio-video calls.

Global user checklist

Registration · Partial · Access depends on BigModel account and region.

English UI · Partial · Detailed audio docs are Chinese-facing.

API and docs · Confirmed · Docs index lists text-to-speech, voice clone, ASR and realtime APIs.

Commercial use · Review · Voice rights and consent are required for real-person voice use.

Coverage · 100/100

Model names, quotas, release status, regional access and commercial terms can change quickly; recheck official sources before procurement or production use.

Pros

- Speech synthesis, voice cloning, ASR and realtime models are all documented
- Useful complement to GLM chat and vision APIs

Cons

- Voice consent and regional access need explicit checks

Decision paths

minimax-audio

sparkdesk

zhipu-glm

Sources

BigModel model overview

docs · zh · verified 2026-05-17

Lists GLM-TTS, GLM-TTS-Clone, GLM-ASR, GLM-Realtime and GLM-4-Voice.

BigModel docs index