Synthesize Speech

POST

Returns a URL to the generated audio file, along with its associated properties.

Headers

api-key: string, Optional

Request

This endpoint expects an object.
text: string, Required

The text to synthesize, e.g. ‘Hello there [pause 1s] friend’.

voiceId: string, Required

audioDuration: double, Optional, >=0

Target duration of the generated audio, in seconds. A value of 0 is ignored. Only available for the Gen2 model.

channelType: string, Optional, Defaults to MONO

Valid values: STEREO, MONO

encodeAsBase64: boolean, Optional

Set to true to receive the audio in the response as a Base64-encoded string instead of a URL.

format: string, Optional, Defaults to WAV

Format of the generated audio file. Valid values: MP3, WAV, FLAC, ALAW, ULAW

modelVersion: enum, Optional, Defaults to GEN2

Allowed values: GEN1, GEN2. Use GEN2 to generate audio with the newer, more advanced model. Gen2 output sounds better than the older model's, but also different.

multiNativeLocale: string, Optional

Specifies the language of the generated audio, enabling a voice to speak multiple languages natively. Only available for the Gen2 model. Valid values: “en-US”, “en-UK”, “es-ES”, etc. Use the GET /v1/speech/voices endpoint to retrieve the list of available voices and languages.

pitch: integer, Optional, >=-50, <=50

Pitch of the voiceover

pronunciationDictionary: map from strings to objects, Optional

An object used to define custom pronunciations.

Example 1: {"live": {"type": "IPA", "pronunciation": "laɪv"}}

Example 2: {"2022": {"type": "SAY_AS", "pronunciation": "twenty twenty two"}}
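The two examples above can be combined into a single dictionary. The following is a minimal Python sketch of building one and serializing it as JSON; the helper names `ipa` and `say_as` are hypothetical conveniences and not part of the API.

```python
import json


def ipa(pronunciation: str) -> dict:
    """Entry that spells out a pronunciation in IPA (hypothetical helper)."""
    return {"type": "IPA", "pronunciation": pronunciation}


def say_as(pronunciation: str) -> dict:
    """Entry that replaces a word with other spoken text (hypothetical helper)."""
    return {"type": "SAY_AS", "pronunciation": pronunciation}


# Keys are the words as they appear in `text`; values describe how to say them.
pronunciation_dictionary = {
    "live": ipa("laɪv"),
    "2022": say_as("twenty twenty two"),
}

# ensure_ascii=False keeps the IPA characters readable in the serialized JSON.
payload_fragment = json.dumps(
    {"pronunciationDictionary": pronunciation_dictionary},
    ensure_ascii=False,
)
print(payload_fragment)
```

Note that JSON requires straight double quotes; typographic quotes copied from rendered documentation will not parse.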

rate: integer, Optional, >=-50, <=50

Speed of the voiceover

sampleRate: double, Optional, Defaults to 24000

Valid values are 8000, 24000, 44100, 48000

style: string, Optional

The voice style to be used for voiceover generation.

variation: integer, Optional, >=0, <=5, Defaults to 1

Higher values add more variation in pause, pitch, and speed to the voice. Only available for the Gen2 model.
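As an illustration of the request parameters above, here is a minimal Python sketch that checks the documented ranges and allowed values on the client side, then prepares (but does not send) a POST request. Several things here are assumptions rather than documented behavior: the URL `https://api.example.com/synthesize` and the voice ID `example-voice-id` are placeholders, the `Content-Type: application/json` header is inferred from "expects an object", and rejecting Gen2-only options under GEN1 is a client-side choice, since this page does not say how the server treats them.

```python
import json
import urllib.request

VALID_FORMATS = {"MP3", "WAV", "FLAC", "ALAW", "ULAW"}
VALID_CHANNEL_TYPES = {"STEREO", "MONO"}
VALID_SAMPLE_RATES = {8000, 24000, 44100, 48000}
GEN2_ONLY = ("audioDuration", "multiNativeLocale", "variation")


def build_payload(text, voice_id, **options):
    """Return a JSON-ready dict; raise ValueError for documented violations."""
    if not text or not voice_id:
        raise ValueError("text and voiceId are required")
    payload = {"text": text, "voiceId": voice_id, **options}

    # Numeric ranges taken from the parameter list above.
    for name in ("pitch", "rate"):
        if name in payload and not -50 <= payload[name] <= 50:
            raise ValueError(f"{name} must be between -50 and 50")
    if "variation" in payload and not 0 <= payload["variation"] <= 5:
        raise ValueError("variation must be between 0 and 5")
    if payload.get("audioDuration", 0) < 0:
        raise ValueError("audioDuration must be >= 0")

    # Enumerated values taken from the parameter list above.
    allowed = {
        "format": VALID_FORMATS,
        "channelType": VALID_CHANNEL_TYPES,
        "sampleRate": VALID_SAMPLE_RATES,
        "modelVersion": {"GEN1", "GEN2"},
    }
    for name, values in allowed.items():
        if name in payload and payload[name] not in values:
            raise ValueError(f"{name} must be one of {sorted(values)}")

    # Assumption: fail early instead of relying on unspecified server behavior.
    if payload.get("modelVersion", "GEN2") == "GEN1":
        used = [name for name in GEN2_ONLY if name in payload]
        if used:
            raise ValueError(f"{used} are only available for GEN2")
    return payload


def build_request(url, api_key, payload):
    """Prepare a POST request object; nothing is sent until it is opened."""
    body = json.dumps(payload).encode("utf-8")
    headers = {"api-key": api_key, "Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


payload = build_payload(
    "Hello there [pause 1s] friend", "example-voice-id", format="MP3", pitch=10
)
# Placeholder URL: substitute the real endpoint address before sending.
request = build_request("https://api.example.com/synthesize", "YOUR_API_KEY", payload)
```

Passing `request` to `urllib.request.urlopen` would send it. Validating locally first gives a clearer error message than a failed round trip.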

Response

Ok

audioFile: string, Optional

audioLengthInSeconds: double, Optional

consumedCharacterCount: long, Optional

Number of characters consumed so far in the current billing cycle.

encodedAudio: string, Optional

remainingCharacterCount: long, Optional

Remaining number of characters available for synthesis in the current billing cycle.

warning: string, Optional

wordDurations: list of objects, Optional
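Because every response field is optional, client code should check which ones are present. The sketch below is a minimal Python example of doing so. It assumes, based on the field names and the description of `encodeAsBase64`, that `audioFile` carries the URL and `encodedAudio` carries the Base64 string; the function name `extract_audio` and the sample URL are hypothetical.

```python
import base64


def extract_audio(response: dict):
    """Return ("bytes", data) for a Base64 response, else ("url", link)."""
    encoded = response.get("encodedAudio")
    if encoded:
        # Assumption: encodedAudio is standard Base64 of the audio file bytes.
        return "bytes", base64.b64decode(encoded)
    url = response.get("audioFile")
    if url:
        return "url", url
    raise ValueError("response contains neither encodedAudio nor audioFile")


# Hand-written sample responses for illustration, not real API output.
url_response = {
    "audioFile": "https://example.com/audio.wav",
    "audioLengthInSeconds": 2.5,
    "remainingCharacterCount": 9000,
}
base64_response = {"encodedAudio": base64.b64encode(b"RIFF").decode("ascii")}

for response in (url_response, base64_response):
    kind, value = extract_audio(response)
    print(kind, value)
    if response.get("warning"):
        print("warning:", response["warning"])
```

Checking `encodedAudio` first means a response that carries both fields yields the bytes directly, with no second download.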

Errors
