
Advanced Features

While generating lifelike voiceovers is exciting on its own, Murf API lets you customize your API calls so that the generated voiceovers are tailored to your use cases.

Pronunciations

Custom pronunciations can be used to improve the pronunciation of certain words in your script to better suit the context or accent. The pronunciationDictionary key in the Synthesize Speech operation's request body is used to specify custom pronunciations.

Users can specify a custom pronunciation either as an IPA transcription or as an alternate word. IPA (the International Phonetic Alphabet) is an internationally recognized set of phonetic symbols based on the principle of a strict one-to-one correspondence between sounds and symbols.

Pronunciations can be specified in a key-value pair format, where the key is the word that needs to be changed, and the value is an object that specifies the pronunciation type and value.

Example
{
  "pronunciationDictionary": {
    "live": { "type": "IPA", "pronunciation": "laɪv" },
    "2022": { "type": "SAY_AS", "pronunciation": "twenty twenty two" }
  }
}
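
For context, here is a minimal sketch of a complete Synthesize Speech call that includes a pronunciation dictionary, using Python's requests library. The endpoint URL is an assumption for illustration; use the URL from the Synthesize Speech API reference and replace TOKEN with your API key.

import requests

# Assumed endpoint for the Synthesize Speech operation; confirm against the API reference.
url = "https://api.murf.ai/v1/speech/generate"

headers = {
    "Content-Type": "application/json",
    "Accept": "application/json",
    "token": "TOKEN",  # your Murf API key
}

payload = {
    "voiceId": "en-US-julie",
    "text": "The live concert was recorded in 2022.",
    "pronunciationDictionary": {
        "live": {"type": "IPA", "pronunciation": "laɪv"},
        "2022": {"type": "SAY_AS", "pronunciation": "twenty twenty two"},
    },
}

response = requests.post(url, json=payload, headers=headers)
response.raise_for_status()
print(response.json())  # contains the generated audio (URL or base64, depending on your request)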

Pauses

In the Synthesize Speech operation, the text key of the request body holds the text to be synthesized. This text key can be tweaked to add a pause between words in your script. This is done using Murf's pause syntax: [pause <duration>].

Specify how long you want the pause to be in seconds by replacing the <duration> part of the syntax, and you'll get silence for that duration in the generated voiceover. The pause duration can be between 0.1s and 5s.

Example
{
  "text": "The answer to the problem was [pause 2s] patience."
}

Variation

This feature adjusts the variation in the generated voiceover using three primary parameters: pause, pitch, and speed. A higher variation value results in a more dynamic voice output, incorporating changes in speech delivery, pitch shifts, and pauses to make the audio sound more natural and less robotic.

  • Valid values: Integer between 0 and 5.
  • Default value: 1.
  • Higher values: Increasing the value will add more variation in voice style, with noticeable shifts in pause, pitch, and speed.
  • Availability: Only available for the Gen2 model.
Variation Example
{
  "variation": 3
}

Audio Duration

The audioDuration key in the Synthesize Speech operation's request body lets developers specify the desired length of the generated audio (in seconds), and the system adjusts the speech to fit this duration. This can be useful for matching voiceovers with specific audio lengths or other time constraints. The system will try to match the duration of the generated audio to audioDuration as closely as possible. If there's a significant difference between the requested and actual duration, consider adjusting the text length or audioDuration value for better alignment.

  • Valid values: A double value representing the time in seconds.
  • Guideline: As a rule of thumb, ~150 words of text generates around 60 seconds of audio.
  • Availability: Only available for the Gen2 model.
audioDuration Example
{
  "audioDuration": 5.0
}
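
Based on the rule of thumb above, you can sanity-check whether your script length roughly fits the audioDuration you are requesting before the system has to stretch or compress the speech. A purely illustrative helper:

def estimate_natural_duration(text: str) -> float:
    """Rough natural length of a script, using ~150 words per 60 seconds."""
    word_count = len(text.split())
    return word_count * 60.0 / 150.0

script = "The answer to the problem was patience."
print(round(estimate_natural_duration(script), 1))  # about 2.8 seconds for this 7-word script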

Multi-Native Locale

The multiNativeLocale feature enables a single voice to fluently speak in multiple languages, delivering speech that sounds native to each language. By default, the value will be set to the base language of the selected voice. In the response of the GET /v1/speech/voices endpoint, the locale field will indicate the base language of the voice, while the supportedLocales field will list all the languages the voice supports. The response also provides details such as the voiceId, display name, gender, supported languages, and styles.

Example
{
  "multiNativeLocale": "fr-FR"
}
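
Before setting multiNativeLocale, you can check which locales a voice actually supports by calling the GET /v1/speech/voices endpoint mentioned above and inspecting its supportedLocales field. A minimal sketch, assuming the base URL https://api.murf.ai and that the endpoint returns a JSON array of voice objects (check the API reference for the exact response shape):

import requests

headers = {"Accept": "application/json", "token": "TOKEN"}  # your Murf API key

# GET /v1/speech/voices lists each voice's voiceId, display name, gender, locale,
# supportedLocales, and styles.
resp = requests.get("https://api.murf.ai/v1/speech/voices", headers=headers)
resp.raise_for_status()

for voice in resp.json():
    if "fr-FR" in (voice.get("supportedLocales") or []):
        print(voice.get("voiceId"), voice.get("locale"))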

Base64 Encoding

You can set the encodeAsBase64 key in the Synthesize Speech request body to true if you want your generated audio returned as a base64 encoded string instead of an audio URL.

Base64 Example
{
  "voiceId": "en-US-julie",
  "text": "This audio will be returned as a base64 string",
  "encodeAsBase64": true
}
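
The response then carries the audio as a base64 string that you decode yourself. Below is a sketch of decoding it to a local file; the response field name (encodedAudio), the endpoint URL, and the output file extension are assumptions, so check the Synthesize Speech response schema for the actual key and format.

import base64
import requests

url = "https://api.murf.ai/v1/speech/generate"  # assumed endpoint; see the API reference
headers = {
    "Content-Type": "application/json",
    "Accept": "application/json",
    "token": "TOKEN",  # your Murf API key
}
payload = {
    "voiceId": "en-US-julie",
    "text": "This audio will be returned as a base64 string",
    "encodeAsBase64": True,
}

resp = requests.post(url, json=payload, headers=headers)
resp.raise_for_status()

# "encodedAudio" is a placeholder field name; use the key from the response schema.
audio_bytes = base64.b64decode(resp.json()["encodedAudio"])
with open("voiceover.wav", "wb") as f:
    f.write(audio_bytes)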

gzip Support

Responses from Murf API can be gzipped by including "gzip" in the accept-encoding header of your requests. This is especially beneficial if you choose to return the audio response as a Base64 encoded string.

Example headers for gzipped response
{
  "headers": {
    "Content-Type": "application/json",
    "Accept": "application/json",
    "accept-encoding": "gzip",
    "token": "TOKEN"
  }
}
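
Most HTTP clients negotiate and decompress gzip for you. With Python's requests, for example, the sketch below sends the accept-encoding header and parses the JSON as usual, because the library transparently decompresses the body. The endpoint URL is an assumption; see the Synthesize Speech API reference.

import requests

url = "https://api.murf.ai/v1/speech/generate"  # assumed endpoint; see the API reference

headers = {
    "Content-Type": "application/json",
    "Accept": "application/json",
    "accept-encoding": "gzip",  # ask for a gzipped response body
    "token": "TOKEN",
}
payload = {
    "voiceId": "en-US-julie",
    "text": "This audio will be returned as a base64 string",
    "encodeAsBase64": True,  # large base64 payloads benefit the most from gzip
}

resp = requests.post(url, json=payload, headers=headers)
resp.raise_for_status()
print(resp.headers.get("Content-Encoding"))  # "gzip" when the server compressed the body
print(resp.json())  # already decompressed by requests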