Voice Changer

Returns a URL to the generated audio file, along with other associated properties.

Headers

api-key (string, optional)

Request

This endpoint expects a multipart form containing an optional file.

audio_duration (double, optional, >= 0)

Specifies the duration (in seconds) of the generated audio. If the value is 0, this parameter is ignored. Only available for the Gen2 model.

channel_type (string, optional, defaults to MONO)

Valid values: STEREO, MONO

encode_output_as_base64 (boolean, optional)

Set to true to receive the audio in the response as a Base64-encoded string, in addition to the URL.

file (file, optional)

The audio file to upload.
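Because the request is a multipart form, the text parameters and the uploaded file travel as separate parts of one body. The sketch below shows one way to encode such a request using only the Python standard library. It assumes the api-key header shown above and the file form field; the endpoint URL and the helper names (build_multipart_body, build_request) are hypothetical placeholders, since this page does not state the request path. The request is built but not sent.

```python
import urllib.request
import uuid


def build_multipart_body(fields: dict[str, str], file_name: str, file_bytes: bytes,
                         file_content_type: str = "audio/wav") -> tuple[bytes, str]:
    """Encode text fields plus one file part as multipart/form-data.

    Returns the body bytes and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    parts: list[bytes] = []
    for name, value in fields.items():
        # One part per text parameter (format, pitch, voice_id, ...).
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode("utf-8")
        )
    # The uploaded audio goes in the "file" form field described above.
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        f"Content-Type: {file_content_type}\r\n\r\n".encode("utf-8")
        + file_bytes
        + b"\r\n"
    )
    # Closing boundary marks the end of the form.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


# Hypothetical placeholder: this page does not give the endpoint path.
VOICE_CHANGER_URL = "https://api.example.com/voice-changer"


def build_request(api_key: str, fields: dict[str, str], file_name: str,
                  file_bytes: bytes) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the form and the api-key header."""
    body, content_type = build_multipart_body(fields, file_name, file_bytes)
    return urllib.request.Request(
        VOICE_CHANGER_URL,
        data=body,
        headers={"api-key": api_key, "Content-Type": content_type},
        method="POST",
    )
```

Numeric and boolean parameters are sent as their string form (for example "5" or "true"), which is how multipart text fields are normally transmitted. When the audio is referenced by file_url instead of uploaded, the file part would simply be omitted.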

file_url (string, optional)

format (string, optional, defaults to WAV)

Format of the generated audio file. Valid values: MP3, WAV, FLAC, ALAW, ULAW

multi_native_locale (string, optional)

Specifies the language for the generated audio, enabling a voice to speak in multiple languages natively. Only available in the Gen2 model. Valid values: “en-US”, “en-UK”, “es-ES”, etc.

Use the GET /v1/speech/voices endpoint to retrieve the list of available voices and languages.

pitch (integer, optional, >= -50, <= 50)

Pitch of the voiceover

pronunciation_dictionary (string, optional)

A JSON string that defines custom pronunciations for specific words or phrases. Each key is a word or phrase, and its value is an object with type and pronunciation.

Example 1: '{"live": {"type": "IPA", "pronunciation": "laɪv"}}'

Example 2: '{"2022": {"type": "SAY_AS", "pronunciation": "twenty twenty two"}}'
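Since pronunciation_dictionary is a JSON string, building it with a JSON serializer avoids the quoting mistakes that creep in when the string is typed by hand (for example, typographic quotes, which are not valid JSON). The following is a minimal sketch; the helper name build_pronunciation_dictionary and its (type, pronunciation) tuple input are illustrative assumptions, not part of the API.

```python
import json


def build_pronunciation_dictionary(entries: dict[str, tuple[str, str]]) -> str:
    """Serialize {word: (type, pronunciation)} into the JSON string form shown above.

    Hypothetical helper: type is e.g. "IPA" or "SAY_AS", as in the examples.
    """
    payload = {
        word: {"type": kind, "pronunciation": pronunciation}
        for word, (kind, pronunciation) in entries.items()
    }
    # ensure_ascii=False keeps IPA symbols such as "ɪ" as-is instead of \u escapes.
    return json.dumps(payload, ensure_ascii=False)
```

For example, build_pronunciation_dictionary({"live": ("IPA", "laɪv")}) produces the same string as Example 1, ready to be sent as the pronunciation_dictionary form field.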

rate (integer, optional, >= -50, <= 50)

Speed of the voiceover

retain_accent (boolean, optional, defaults to true)

Set to true to retain the original accent of the speaker during voice generation.

retain_prosody (boolean, optional, defaults to true)

Indicates whether to retain the original prosody (intonation, rhythm, and stress) of the input voice in the generated output.

return_transcription (boolean, optional, defaults to false)

Set to true to include a textual transcription of the generated audio in the response.

sample_rate (double, optional, defaults to 44100)

Sample rate of the generated audio in Hz. Valid values: 8000, 24000, 44100, 48000

style (string, optional)

The voice style to be used for voiceover generation.

transcription (string, optional)

Specifies a transcription of the audio clip, which the voice changer then uses as input.

variation (integer, optional, >= 0, <= 5, defaults to 1)

Higher values add more variation in pauses, pitch, and speed to the voice. Only available for the Gen2 model.

voice_id (string, optional)

Use the GET /v1/speech/voices endpoint to find supported voice IDs.
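Several parameters above have documented ranges or fixed value sets, so a client can reject obviously invalid input before uploading any audio. The sketch below copies only the limits stated on this page; the function name validate_params is hypothetical, and the server remains the authority on what it accepts (for example, Gen2-only restrictions are not checked here).

```python
# Limits copied from the parameter list above. This is a client-side
# pre-check only; the server may enforce additional rules.
VALID_FORMATS = {"MP3", "WAV", "FLAC", "ALAW", "ULAW"}
VALID_CHANNEL_TYPES = {"STEREO", "MONO"}
VALID_SAMPLE_RATES = {8000, 24000, 44100, 48000}
RANGES = {"pitch": (-50, 50), "rate": (-50, 50), "variation": (0, 5)}


def validate_params(params: dict) -> list[str]:
    """Return human-readable problems; an empty list means no documented limit is broken."""
    errors: list[str] = []
    # Inclusive numeric ranges for pitch, rate, and variation.
    for name, (low, high) in RANGES.items():
        if name in params and not low <= params[name] <= high:
            errors.append(f"{name} must be between {low} and {high}")
    # audio_duration may be 0 (meaning "ignored") but never negative.
    if params.get("audio_duration", 0) < 0:
        errors.append("audio_duration must be >= 0")
    # Enumerated parameters must match one of the listed values exactly.
    choices = {
        "format": VALID_FORMATS,
        "channel_type": VALID_CHANNEL_TYPES,
        "sample_rate": VALID_SAMPLE_RATES,
    }
    for name, allowed in choices.items():
        if name in params and params[name] not in allowed:
            errors.append(f"{name} must be one of {sorted(allowed)}")
    return errors
```

Returning a list rather than raising on the first problem lets a caller report every invalid parameter at once.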

Response

Ok

audio_file (string)

The URL or path of the generated audio file.

audio_length_in_seconds (double)

Length of the generated audio in seconds.

remaining_character_count (long)

Remaining number of characters available for synthesis in the current billing cycle.

encoded_audio (string, optional)

Base64-encoded string of the generated audio. Returned when encode_output_as_base64 is set to true in the request.
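When encoded_audio is present, the audio bytes can be recovered with a standard Base64 decode instead of downloading audio_file. A minimal sketch, assuming the response body has already been parsed into a dict and that encoded_audio holds plain Base64 (no data-URI prefix); the helper names are hypothetical.

```python
import base64
from pathlib import Path


def decode_encoded_audio(response: dict) -> bytes:
    """Return the raw audio bytes carried in the encoded_audio field."""
    encoded = response.get("encoded_audio")
    if encoded is None:
        # The field is only returned when encode_output_as_base64 was true.
        raise KeyError("encoded_audio missing; request with encode_output_as_base64=true")
    return base64.b64decode(encoded)


def save_encoded_audio(response: dict, path: str) -> int:
    """Write the decoded audio to path and return the number of bytes written."""
    audio = decode_encoded_audio(response)
    Path(path).write_bytes(audio)
    return len(audio)
```

The file extension passed to save_encoded_audio should match the format requested (for example .mp3 for MP3).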

transcription (string, optional)

Transcription of the generated audio. Returned when return_transcription is set to true in the request.

warning (string, optional)

Any warning or informational message related to the audio generation process.
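Taken together, the response fields map naturally onto a small typed record, with the three optional fields defaulting to None when absent. The sketch below assumes the response body is a JSON object whose keys match the field names listed above; the class name VoiceChangerResponse is hypothetical, and unknown keys are ignored.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoiceChangerResponse:
    """Hypothetical typed view of the response fields documented above."""
    audio_file: str
    audio_length_in_seconds: float
    remaining_character_count: int
    encoded_audio: Optional[str] = None   # only with encode_output_as_base64=true
    transcription: Optional[str] = None   # only with return_transcription=true
    warning: Optional[str] = None

    @classmethod
    def from_json(cls, text: str) -> "VoiceChangerResponse":
        data = json.loads(text)
        return cls(
            audio_file=data["audio_file"],
            audio_length_in_seconds=float(data["audio_length_in_seconds"]),
            remaining_character_count=int(data["remaining_character_count"]),
            # .get() leaves optional fields as None when the server omits them.
            encoded_audio=data.get("encoded_audio"),
            transcription=data.get("transcription"),
            warning=data.get("warning"),
        )
```

Checking warning after each call is a cheap way to surface non-fatal issues that would otherwise go unnoticed.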

Errors