Speech Customization
Discover how to adjust voice settings to create a distinctive and expressive voice for your application.
Murf’s AI models generate natural-sounding speech quickly and also give you customization controls to shape the output with precision and personality, so you can fine-tune each detail to match your creative vision.
Styles
Murf Styles let developers adapt voice output to different contexts. Each voice supports multiple predefined styles that change its tone, emotional inflection, and delivery patterns. By passing the style parameter, you can programmatically shift a neutral voice to a style such as promotional, newscast, conversational, or inspirational to meet your application’s delivery requirements.
Use the style key to select which style to use for your audio generation.
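As a sketch of how the style key fits into a request, the snippet below only builds and prints a JSON request body; it does not call the API. The text and style keys come from this page, while the voiceId key, the voice ID "en-US-natalie", and the style name "Promo" are assumptions for illustration. Check the Voice Library and API reference for the real voice IDs and the exact style names each voice supports.

```python
import json


def build_style_request(text: str, voice_id: str, style: str) -> dict:
    """Return a Synthesize Speech request body that selects a voice style."""
    return {
        "text": text,         # text to synthesize (documented key)
        "voiceId": voice_id,  # assumed key name; verify in the API reference
        "style": style,       # documented key; must be a style this voice supports
    }


body = build_style_request(
    "Big savings start today!",
    "en-US-natalie",  # hypothetical voice ID
    "Promo",          # hypothetical style name
)
print(json.dumps(body, indent=2))
```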
Pronunciations
While our models are capable of handling complex pronunciations of heteronyms, acronyms, numbers, and proper nouns, you might sometimes need a specific pronunciation for certain words. Our custom pronunciation feature lets you adjust how words are spoken to match your context or accent preferences.
The pronunciationDictionary key in the Synthesize Speech operation’s request body is used to specify custom pronunciations.
You can specify a custom pronunciation either as IPA symbols or as an alternate word. IPA (the International Phonetic Alphabet) is an internationally recognized set of phonetic symbols based on the principle of a strict one-to-one correspondence between sounds and symbols.
Pronunciations are specified in a key-value pair format, where the key is the word that needs to be changed, and the value is an object that specifies the pronunciation type and value.
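The key-value format described above can be sketched as a plain dictionary. This is a minimal illustration that only builds the request body: the field names type and pronunciation, and the type values "IPA" and "SAY_AS" (for an alternate word), are assumptions to verify against the Synthesize Speech API reference, as is the hypothetical voice ID.

```python
import json


def ipa(symbols: str) -> dict:
    # Pronunciation given as IPA symbols (type value "IPA" is assumed)
    return {"type": "IPA", "pronunciation": symbols}


def say_as(word: str) -> dict:
    # Pronunciation given as an alternate word (type value "SAY_AS" is assumed)
    return {"type": "SAY_AS", "pronunciation": word}


body = {
    "text": "We go live in 2025.",
    "voiceId": "en-US-natalie",  # hypothetical voice ID
    "pronunciationDictionary": {
        # key: the word as it appears in the text; value: how it should be spoken
        "live": ipa("laɪv"),
        "2025": say_as("twenty twenty-five"),
    },
}
# ensure_ascii=False keeps the IPA symbols readable in the printed JSON
print(json.dumps(body, indent=2, ensure_ascii=False))
```

Keeping one small helper per pronunciation type makes it harder to mix up the two forms when the dictionary grows.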
MultiNative
MultiNative voices produce speech that sounds authentically native across multiple languages. A single voice can speak several languages while preserving the pronunciation patterns specific to each one, avoiding the “foreign accent” effect common in conventional multilingual TTS systems.
Use the multiNativeLocale key to select which locale to use for your audio generation.
Make sure the locale that you send in multiNativeLocale is supported by your chosen voice. You can see the list of supported locales for each voice in the Voice Library.
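One way to follow the advice above is to check the locale on the client before sending a request. In this sketch, SUPPORTED_LOCALES is a hypothetical set you would copy from the voice's Voice Library entry, and the locale codes, the voiceId key, and the voice ID are assumptions; only the multiNativeLocale and text keys come from this page.

```python
import json

# Hypothetical: locales copied by hand from the voice's Voice Library entry
SUPPORTED_LOCALES = {"en-US", "es-ES", "fr-FR"}


def build_multinative_request(text, voice_id, locale, supported=SUPPORTED_LOCALES):
    """Return a request body, failing early if the voice lacks the locale."""
    if locale not in supported:
        raise ValueError(f"{voice_id} does not support locale {locale!r}")
    return {
        "text": text,
        "voiceId": voice_id,          # assumed key name
        "multiNativeLocale": locale,  # documented key
    }


body = build_multinative_request("Bonjour tout le monde.", "en-US-natalie", "fr-FR")
print(json.dumps(body, indent=2))
```

Failing locally gives a clearer error than discovering the mismatch from an API response.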
Pauses
Our models are capable of adding natural pauses based on the text and context. In some cases, you may want to adjust the pause duration between two words to achieve the desired effect in your speech.
In the Synthesize Speech operation, the text key of the request body holds the text to be synthesized. To add a pause between words in your script, insert Murf’s pause syntax, [pause <duration>], into this text.
Replace the <duration> part of the syntax with the length of the pause in seconds, and the generated voiceover will contain silence for that duration. The pause duration can be between 0.1s and 5s.
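A small helper can build pause tags and enforce the 0.1s to 5s range stated above. This sketch assumes the duration is written as a number followed by "s" (for example [pause 1.5s]), matching how the range is written on this page; confirm the exact form in the API reference.

```python
def pause(seconds: float) -> str:
    """Return a pause tag such as '[pause 1.5s]' for use inside the text key."""
    if not 0.1 <= seconds <= 5:
        raise ValueError("pause duration must be between 0.1s and 5s")
    # The :g format drops trailing zeros, so 2.0 becomes "2" and 0.5 stays "0.5"
    return f"[pause {seconds:g}s]"


text = f"Welcome to the show. {pause(1.5)} Let's get started."
print(text)  # Welcome to the show. [pause 1.5s] Let's get started.
```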
Audio Duration
The audioDuration key in the Synthesize Speech operation’s request body lets you specify the desired length of the generated audio (in seconds), and the system adjusts the speech to fit this duration.
This can be useful for matching voiceovers with specific audio lengths or other time constraints. The system will try to match the duration of the generated audio to audioDuration as closely as possible.
If there’s a significant difference between the requested and actual duration, consider changing the text length or the audioDuration value for better alignment.
- Valid values: A double value representing the time in seconds.
- Guideline: As a rule of thumb, about 150 words (roughly 1,000 characters) of text generate around 60 seconds of audio.
- Availability: Only available for the Gen2 model.
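The 150-words-per-minute guideline above lends itself to a quick sanity check before you set audioDuration. In this sketch, the word count is a simple whitespace split, the 25% tolerance is a hypothetical threshold (not a documented limit), and the voice ID is an assumption; how the request selects the Gen2 model is not shown here.

```python
import warnings

WORDS_PER_MINUTE = 150  # rule of thumb: ~150 words give about 60 s of audio


def estimate_seconds(text: str) -> float:
    """Rough spoken length of text in seconds, using the 150 words/minute guideline."""
    return len(text.split()) * 60 / WORDS_PER_MINUTE


def build_duration_request(text: str, target_seconds: float, tolerance: float = 0.25) -> dict:
    """Build a request body with audioDuration, warning if the target looks far off."""
    estimate = estimate_seconds(text)
    # tolerance is a hypothetical threshold; skip the check for empty text
    if estimate and abs(target_seconds - estimate) / estimate > tolerance:
        warnings.warn(
            f"target {target_seconds:.1f}s vs ~{estimate:.1f}s estimated; "
            "consider adjusting the text or audioDuration"
        )
    return {
        "text": text,
        "voiceId": "en-US-natalie",  # hypothetical voice ID
        "audioDuration": float(target_seconds),  # documented key, in seconds
    }


script = " ".join(["word"] * 75)  # 75 words, about 30 s by the guideline
body = build_duration_request(script, 30)
```

A warning rather than an exception leaves the final decision to you, since the guideline is only an approximation.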
Speed & Pitch
The rate and pitch keys in the Synthesize Speech operation’s request body let you adjust the speed and pitch of the generated voiceover. These parameters can be used to fine-tune the voice output to better suit your application’s needs.
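As a sketch, the snippet below sets rate and pitch in a request body. The key names come from this page, but their valid ranges are not stated here. The example assumes integer values where 0 means the voice's default, positive rate means faster, and negative pitch means lower; check these assumptions and the ranges in the API reference. The voice ID is hypothetical.

```python
import json


def build_prosody_request(text: str, rate: int = 0, pitch: int = 0) -> dict:
    """Return a request body with speed (rate) and pitch adjustments."""
    return {
        "text": text,
        "voiceId": "en-US-natalie",  # hypothetical voice ID
        "rate": rate,    # documented key; 0 assumed to mean "no change"
        "pitch": pitch,  # documented key; 0 assumed to mean "no change"
    }


# Illustrative values: slightly faster and lower than the default delivery (assumed semantics)
body = build_prosody_request("Breaking news from the city council.", rate=10, pitch=-5)
print(json.dumps(body))
```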
Variations
Variations let you add natural variation to a voiceover along three dimensions: pause, pitch, and speed. A higher variation value produces a more dynamic voice output, with changes in delivery, pitch shifts, and pauses that make the audio sound more natural and less robotic.
Increasing the value adds more variation in voice style, with noticeable shifts in pause, pitch, and speed.
- Valid values: An integer between 0 and 5
- Default value: 1
- Availability: Only available for the Gen2 model
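The limits listed above (an integer from 0 to 5, default 1) can be enforced on the client. This is a minimal sketch: the request key name variation and the voice ID are assumptions, since this page does not name the key, and selecting the Gen2 model is not shown.

```python
import json

DEFAULT_VARIATION = 1  # documented default


def build_variation_request(text: str, variation: int = DEFAULT_VARIATION) -> dict:
    """Return a request body with a variation level between 0 and 5."""
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(variation, bool) or not isinstance(variation, int) or not 0 <= variation <= 5:
        raise ValueError("variation must be an integer between 0 and 5")
    return {
        "text": text,
        "voiceId": "en-US-natalie",  # hypothetical voice ID
        "variation": variation,      # assumed key name for the Variations setting
    }


print(json.dumps(build_variation_request("Welcome back to the podcast.", 5)))
```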