Voice Changer
Returns a URL to the generated audio file, along with other associated properties.
Headers
Request
Duration, in seconds, of the generated audio. A value of 0 means this parameter is ignored. Only available for the Gen2 model.
Valid values: STEREO, MONO
Set to true to receive the audio in the response as a Base64-encoded string, in addition to a URL.
The file to upload
Format of the generated audio file. Valid values: MP3, WAV, FLAC, ALAW, ULAW
Specifies the language for the generated audio, enabling a voice to speak in multiple languages natively. Only available for the Gen2 model. Valid values: "en-US", "en-UK", "es-ES", etc.
Use the GET /v1/speech/voices endpoint to retrieve the list of available voices and languages.
Pitch of the voiceover
A JSON string that defines custom pronunciations for specific words or phrases. Each key is a word or phrase, and its value is an object with type and pronunciation.
Example 1: '{"live": {"type": "IPA", "pronunciation": "laɪv"}}'
Example 2: '{"2022": {"type": "SAY_AS", "pronunciation": "twenty twenty two"}}'
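Because the pronunciation dictionary is passed as a JSON string rather than a nested object, it is easy to get the quoting wrong by hand. The following is a minimal sketch of building that string programmatically. The helper name build_pronunciation_dictionary is hypothetical, and restricting the type to IPA and SAY_AS is an assumption based only on the two examples above; other types may exist.

```python
import json

# Assumption: only the two types that appear in the examples above.
ALLOWED_TYPES = {"IPA", "SAY_AS"}


def build_pronunciation_dictionary(entries):
    """Serialize {phrase: (type, pronunciation)} into a JSON string.

    Hypothetical helper for illustration; it is not part of any SDK.
    """
    payload = {}
    for phrase, (kind, pronunciation) in entries.items():
        if kind not in ALLOWED_TYPES:
            raise ValueError(f"unsupported pronunciation type: {kind!r}")
        payload[phrase] = {"type": kind, "pronunciation": pronunciation}
    # ensure_ascii=False keeps IPA symbols readable instead of \u-escaping them.
    return json.dumps(payload, ensure_ascii=False)


pronunciation_json = build_pronunciation_dictionary({
    "live": ("IPA", "laɪv"),
    "2022": ("SAY_AS", "twenty twenty two"),
})
```

The resulting string can then be supplied as the value of the pronunciation parameter in the request.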
Speed of the voiceover
Set to true to retain the original accent of the speaker during voice generation.
Indicates whether to retain the original prosody (intonation, rhythm, and stress) of the input voice in the generated output.
Set to true to include a textual transcription of the generated audio in the response.
Valid values are 8000, 24000, 44100, 48000
The voice style to be used for voiceover generation.
A transcription of the input audio clip, which is then used as input for the voice changer.
Higher values add more variation in pause, pitch, and speed to the voice. Only available for the Gen2 model.
Use the GET /v1/speech/voices endpoint to find supported voiceIds.
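Several of the request parameters above accept only a fixed set of values, so checking them before sending can save a round trip. The sketch below does that under stated assumptions: the function and argument names are hypothetical, and reading "STEREO, MONO" as a channel setting and "8000, 24000, 44100, 48000" as a sample rate is an inference from the listed values, since the parameter names are not given here.

```python
# Value sets copied from the parameter descriptions above.
VALID_FORMATS = {"MP3", "WAV", "FLAC", "ALAW", "ULAW"}
VALID_CHANNEL_TYPES = {"STEREO", "MONO"}            # assumed to be a channel setting
VALID_SAMPLE_RATES = {8000, 24000, 44100, 48000}    # assumed to be a sample rate


def check_audio_options(audio_format, channel_type, sample_rate):
    """Return a list of human-readable problems; an empty list means all values are allowed.

    Hypothetical helper: the argument names are illustrative, not the API's field names.
    """
    problems = []
    if audio_format not in VALID_FORMATS:
        problems.append(f"format {audio_format!r} is not one of {sorted(VALID_FORMATS)}")
    if channel_type not in VALID_CHANNEL_TYPES:
        problems.append(f"channel type {channel_type!r} is not one of {sorted(VALID_CHANNEL_TYPES)}")
    if sample_rate not in VALID_SAMPLE_RATES:
        problems.append(f"sample rate {sample_rate!r} is not one of {sorted(VALID_SAMPLE_RATES)}")
    return problems


issues = check_audio_options("WAV", "MONO", 44100)
```

Returning a list rather than raising on the first failure lets a caller report every invalid value at once.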
Response
OK
The URL or path of the generated audio file.
Length of the generated audio in seconds.
Remaining number of characters available for synthesis in the current billing cycle.
Base64-encoded string of the generated audio. Used when audio is returned directly in the response.
Transcript of the generated audio, if transcription was requested.
Any warning or informational message related to the audio generation process.
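When the audio is returned inline, the Base64 string has to be decoded back into bytes before it can be written to disk or played. A minimal sketch using only the Python standard library follows. The function name decode_audio is hypothetical, and the sample payload is made-up placeholder bytes, not real API output.

```python
import base64


def decode_audio(encoded_audio):
    """Decode a Base64 string into raw audio bytes.

    validate=True makes the decoder raise on characters outside the
    Base64 alphabet instead of silently discarding them.
    """
    return base64.b64decode(encoded_audio, validate=True)


# Placeholder payload standing in for the Base64 field of a response.
sample_encoded = base64.b64encode(b"fake-audio-bytes").decode("ascii")
audio_bytes = decode_audio(sample_encoded)
```

The decoded bytes can then be written to a file opened in binary mode, using an extension that matches the requested output format.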