Advanced Features
While generating lifelike voiceovers is exciting on its own, the Murf API lets you customize your API calls so that the generated voiceovers are tailored to your use case.
Pronunciations
Custom pronunciations can be used to improve the pronunciation of certain words in your script to better suit the context or accent. The pronunciationDictionary key in the request body of the Synthesize Speech operation is used to specify custom pronunciations.
You can specify a custom pronunciation either in IPA (the International Phonetic Alphabet) or as an alternate word or phrase (type SAY_AS). IPA is an internationally recognized set of phonetic symbols based on the principle of strict one-to-one correspondence between sounds and symbols.
Pronunciations are specified as key-value pairs, where the key is the word whose pronunciation should change and the value is an object that gives the pronunciation type and the pronunciation itself.
{
"pronunciationDictionary": {
"live": { "type": "IPA", "pronunciation": "laɪv" },
"2022": { "type": "SAY_AS", "pronunciation": "twenty twenty two" }
}
}
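As a rough illustration, the dictionary above could be assembled in Python as follows. The helper names are hypothetical, and the sketch assumes IPA and SAY_AS are the only pronunciation types, since those are the only two shown here.

```python
import json

# Assumption: only the two types shown in this section are accepted.
ALLOWED_TYPES = ("IPA", "SAY_AS")


def pronunciation_entry(kind: str, pronunciation: str) -> dict:
    """Build the value object for one word in the dictionary."""
    if kind not in ALLOWED_TYPES:
        raise ValueError(f"unsupported pronunciation type: {kind}")
    return {"type": kind, "pronunciation": pronunciation}


def build_pronunciation_dictionary(entries: dict) -> dict:
    """Map each word to its {type, pronunciation} object."""
    return {
        word: pronunciation_entry(kind, value)
        for word, (kind, value) in entries.items()
    }


body = {
    "pronunciationDictionary": build_pronunciation_dictionary({
        "live": ("IPA", "laɪv"),
        "2022": ("SAY_AS", "twenty twenty two"),
    })
}

# Serialized fragment to merge into the Synthesize Speech request body.
payload = json.dumps(body, ensure_ascii=False)
```

Validating the type locally is a design choice for catching typos early; the API itself remains the authority on which values are accepted.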
Pauses
In the Synthesize Speech operation, the text key of the request body holds the text to be synthesized. To add a pause between words in your script, insert Murf's pause syntax, [pause <duration>], into this text.
Replace <duration> with the length of the pause in seconds (for example, 2s), and the generated voiceover will contain silence for that duration. The pause duration can be between 0.1s and 5s.
{
"text": "The answer to the problem was [pause 2s] patience."
}
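If the script is assembled programmatically, a small helper can build the pause tag and enforce the documented range before the request is sent. This is only a sketch; the function names are hypothetical.

```python
def pause_tag(seconds: float) -> str:
    """Return Murf's pause syntax for the given duration in seconds."""
    if not 0.1 <= seconds <= 5:
        raise ValueError("pause duration must be between 0.1s and 5s")
    # The :g format drops a trailing ".0", so 2.0 becomes "2".
    return f"[pause {seconds:g}s]"


def join_with_pause(parts: list, seconds: float) -> str:
    """Join text fragments with the same pause between each pair."""
    return f" {pause_tag(seconds)} ".join(parts)


body = {
    "text": join_with_pause(["The answer to the problem was", "patience."], 2)
}
```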
Variation
The variation key controls how much the delivery of the generated voiceover varies across three primary parameters: pause, pitch, and speed. A higher value produces a more dynamic voice output, with changes in speech delivery, pitch shifts, and pauses that make the audio sound more natural and less robotic.
- Valid values: Integer between 0 and 5.
- Default value: 1.
- Higher values: Increasing the value will add more variation in voice style, with noticeable shifts in pause, pitch, and speed.
- Availability: Only available for the Gen2 model.
{
"variation": 3
}
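The constraints listed above (an integer from 0 to 5, defaulting to 1) can be checked client-side before sending a request. The helper below is a hypothetical sketch and does not check which model is in use.

```python
def with_variation(body: dict, variation: int = 1) -> dict:
    """Return a copy of body with a validated variation value (default 1)."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(variation, bool) or not isinstance(variation, int):
        raise TypeError("variation must be an integer")
    if not 0 <= variation <= 5:
        raise ValueError("variation must be between 0 and 5")
    return {**body, "variation": variation}


body = with_variation({"text": "Hello there."}, 3)
```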
Audio Duration
The audioDuration key in the request body of the Synthesize Speech operation lets you specify the desired length of the generated audio in seconds. The system adjusts the speech to match audioDuration as closely as possible, which is useful for fitting voiceovers to a fixed audio length or other time constraints. If there’s a significant difference between the requested and actual duration, consider adjusting the text length or the audioDuration value for better alignment.
- Valid values: A double value representing the time in seconds.
- Guideline: As a rule of thumb, ~150 words of text generate around 60 seconds of audio.
- Availability: Only available for the Gen2 model.
{
"audioDuration": 5.0
}
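The rule of thumb above gives a quick way to sanity-check a requested duration against the script length. The sketch below applies it with a simple whitespace word count; the function names are hypothetical and the estimate is only approximate.

```python
WORDS_PER_MINUTE = 150  # rule of thumb: ~150 words is about 60 seconds


def estimate_duration_seconds(text: str) -> float:
    """Estimate natural speech length from a whitespace word count."""
    return len(text.split()) * 60 / WORDS_PER_MINUTE


def max_words_for(duration_seconds: float) -> int:
    """Roughly how many words fit into the requested duration."""
    return int(duration_seconds * WORDS_PER_MINUTE / 60)


script = "The answer to the problem was patience."
estimate = estimate_duration_seconds(script)  # 7 words, about 2.8 seconds
budget = max_words_for(5.0)                   # about 12 words for 5 seconds
```

If the estimate is far from the audioDuration you plan to request, that is a hint to shorten or lengthen the text first.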
Multi-Native Locale
The multiNativeLocale key enables a single voice to speak fluently in multiple languages, delivering speech that sounds native to each language. By default, its value is the base language of the selected voice. In the response of the GET /v1/speech/voices endpoint, the locale field indicates the base language of the voice, while the supportedLocales field lists all the languages the voice supports. The response also provides details such as the voiceId, display name, gender, and styles.
{
"multiNativeLocale": "fr-FR"
}
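One way to pick a voice for a target language is to filter the voices list on the locale and supportedLocales fields. The sketch below works on made-up sample data rather than a real API response, and it assumes supportedLocales is a collection of locale codes (a list, or an object keyed by locale code); the voice IDs are hypothetical.

```python
def voices_supporting(voices: list, locale: str) -> list:
    """Return the voiceId of every voice that can speak the given locale."""
    matches = []
    for voice in voices:
        supported = voice.get("supportedLocales") or []
        # Membership works for a list of codes and for a dict keyed by code.
        if voice.get("locale") == locale or locale in supported:
            matches.append(voice["voiceId"])
    return matches


# Hypothetical sample data, not real voices from the API.
sample_voices = [
    {"voiceId": "voice-a", "locale": "en-US", "supportedLocales": ["en-US", "fr-FR"]},
    {"voiceId": "voice-b", "locale": "en-UK", "supportedLocales": ["en-UK"]},
]

french_capable = voices_supporting(sample_voices, "fr-FR")
body = {
    "voiceId": french_capable[0],
    "text": "Bonjour tout le monde.",
    "multiNativeLocale": "fr-FR",
}
```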
Base64 Encoding
Set the encodeAsBase64 key of the Synthesize Speech request body to true if you want the generated audio returned as a Base64-encoded string instead of an audio URL.
{
"voiceId": "en-US-julie",
"text": "This audio will be returned as a base64 string",
"encodeAsBase64": true
}
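Once the Base64 string is in hand, it can be turned back into raw audio bytes with Python's standard library. This section does not name the response field that carries the string, so the sketch takes the string as an argument; the function name is hypothetical.

```python
import base64
import binascii


def decode_audio(encoded: str) -> bytes:
    """Decode a Base64 string from the response into raw audio bytes."""
    try:
        return base64.b64decode(encoded)
    except binascii.Error as exc:
        raise ValueError("response did not contain valid Base64 audio") from exc


# Round trip with stand-in bytes; real input would come from the API response.
stand_in = base64.b64encode(b"not real audio").decode("ascii")
audio_bytes = decode_audio(stand_in)
```

The resulting bytes can be written to a file opened in binary mode or passed to an audio library.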
gzip Support
Responses from the Murf API can be gzipped by including "gzip" in the accept-encoding header of your requests. This is especially beneficial if you choose to have the audio returned as a Base64-encoded string.
{
"headers": {
"Content-Type": "application/json",
"Accept": "application/json",
"accept-encoding": "gzip",
"token": "TOKEN"
}
}
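Many HTTP clients decompress gzipped responses transparently. If yours hands back the raw bytes instead, the body can be unpacked by checking the standard Content-Encoding response header, as in this sketch. The function name and the "payload" key in the simulated response are hypothetical.

```python
import gzip
import json


def decode_json_body(raw: bytes, content_encoding: str = "") -> dict:
    """Decompress the body if the server gzipped it, then parse the JSON."""
    if content_encoding.strip().lower() == "gzip":
        raw = gzip.decompress(raw)
    return json.loads(raw.decode("utf-8"))


# Simulated gzipped response body; a real one would come from the API.
original = {"payload": "QUJD" * 100}
compressed = gzip.compress(json.dumps(original).encode("utf-8"))
parsed = decode_json_body(compressed, "gzip")
```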