Synthesize Speech
Headers
Request
The text to be synthesized, e.g. 'Hello there [pause 1s] friend'.
Use the GET /v1/speech/voices endpoint to find supported voiceIds. You can pass either the full voiceId (e.g. en-US-natalie) or just the voice actor's name (e.g. natalie); see the example request at the end of this section.
Specifies the desired duration (in seconds) of the generated audio. A value of 0 is ignored. Only available for the Gen2 model.
Channel configuration of the generated audio. Valid values: STEREO, MONO
Format of the generated audio file. Valid values: MP3, WAV, FLAC, ALAW, ULAW, PCM, OGG
Valid values: GEN2. Audio will be generated using the GEN2 model, whose output sounds more natural and is of higher quality than that of earlier models.
Specifies the language for the generated audio, enabling a voice to speak in multiple languages natively. Only available in the Gen2 model. Valid values: "en-US", "en-UK", "es-ES", etc. Use the GET /v1/speech/voices endpoint to retrieve the list of available voices and languages.
An object used to define custom pronunciations.
Example 1: {"live": {"type": "IPA", "pronunciation": "laɪv"}}
Example 2: {"2022": {"type": "SAY_AS", "pronunciation": "twenty twenty two"}}
If set to true, word durations in the response are returned against the words as they appear in the original input text. (English only)
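For orientation, here is a minimal sketch of how these parameters might be combined into a request. It is not taken from this reference: the base URL, the api-key header, the POST /v1/speech/synthesize path, and the field names (text, voiceId, format, channelType, modelVersion, multiNativeLocale, pronunciationDictionary) are all illustrative assumptions inferred from the descriptions above; only GET /v1/speech/voices and the example values appear on this page.

```python
# Minimal sketch of a synthesize-speech call. The endpoint path, auth header,
# and field names are assumptions; check the API reference for the exact ones.
import requests

BASE_URL = "https://api.example.com"       # placeholder base URL
HEADERS = {"api-key": "YOUR_API_KEY"}      # hypothetical auth header

# 1. Look up supported voices and languages (endpoint documented above).
voices = requests.get(f"{BASE_URL}/v1/speech/voices", headers=HEADERS).json()

# 2. Build the request body from the parameters described above.
body = {
    "text": "Hello there [pause 1s] friend",
    "voiceId": "en-US-natalie",            # or just the actor's name, e.g. "natalie"
    "format": "MP3",                       # MP3, WAV, FLAC, ALAW, ULAW, PCM, OGG
    "channelType": "MONO",                 # STEREO or MONO
    "modelVersion": "GEN2",
    "multiNativeLocale": "en-US",          # Gen2 only
    "pronunciationDictionary": {
        "live": {"type": "IPA", "pronunciation": "laɪv"},
        "2022": {"type": "SAY_AS", "pronunciation": "twenty twenty two"},
    },
}

# 3. Send the request; "/v1/speech/synthesize" is a placeholder path.
resp = requests.post(f"{BASE_URL}/v1/speech/synthesize", json=body, headers=HEADERS)
resp.raise_for_status()
print(resp.json())
```

Note that the duration and multiNativeLocale parameters only take effect with the Gen2 model, as stated in their descriptions.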