Speech Customization | Murf API

Murf’s AI models generate natural-sounding speech quickly and give you customization controls to shape the output with precision and personality. You can adjust style, pronunciation, locale, pauses, duration, speed, pitch, and variation.

Styles

Murf Styles let developers fine-tune voice output for different contexts. Each voice supports multiple predefined styles that modify tone, emotional inflection, and delivery patterns. By passing the style parameter, you can programmatically shift a neutral voice to a delivery such as promotional, newscast, conversational, or inspirational.

Here are some examples of different styles available in the Murf API:

Sad

Angry

Use the style key to select which style to use for your audio generation.

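from murf import Murf

# Assumes the API key is already configured (for example, via the MURF_API_KEY environment variable).
client = Murf()
res = client.text_to_speech.generate(
    text="Oh! I'll have to do this all over again.",
    voice_id="en-US-ken",
    style="Angry",  # must be a style supported by the chosen voice
)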

Pronunciations

While our models are capable of handling complex pronunciations of heteronyms, acronyms, numbers, and proper nouns, you might sometimes need a specific pronunciation for certain words. The custom pronunciation feature lets you adjust how words are spoken to match your context or accent preferences.

Here are a few examples of words and how they sound before and after adding custom pronunciations:

wound (wuːnd vs waʊnd)

2010 (twenty ten vs two thousand and ten)

The pronunciationDictionary key in the Synthesize Speech operation’s request body is used to specify custom pronunciations.

You can specify a custom pronunciation either in IPA (the International Phonetic Alphabet) or as an alternate word or phrase. IPA is an internationally recognized set of phonetic symbols designed around a one-to-one correspondence between sounds and symbols.

Pronunciations are specified in a key-value pair format, where the key is the word that needs to be changed, and the value is an object that specifies the pronunciation type and value.

from murf import Murf

client = Murf()
res = client.text_to_speech.generate(
    text="The 2010 World Cup was held in South Africa.",
    voice_id="en-US-natalie",
    # Each key is the word to change; each value gives the pronunciation type and value.
    pronunciation_dictionary={
        "live": {"type": "IPA", "pronunciation": "laɪv"},
        "2010": {"type": "SAY_AS", "pronunciation": "two thousand and ten"},
    },
)
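The key-value format described above can also be assembled programmatically. The sketch below is only an illustration, not part of the Murf SDK: `pronunciation_entry` and `build_pronunciation_dictionary` are hypothetical helper names, and the only type strings it accepts are the two that appear in the example above, `IPA` and `SAY_AS`.

```python
# Hypothetical helpers (not part of the Murf SDK) for assembling a
# pronunciation dictionary in the key-value format described above.
ALLOWED_TYPES = {"IPA", "SAY_AS"}  # the two types used in the example above


def pronunciation_entry(kind, pronunciation):
    """Return one dictionary value: the pronunciation type and its value."""
    if kind not in ALLOWED_TYPES:
        raise ValueError(f"unsupported pronunciation type: {kind!r}")
    if not pronunciation:
        raise ValueError("pronunciation must not be empty")
    return {"type": kind, "pronunciation": pronunciation}


def build_pronunciation_dictionary(overrides):
    """Map each word to its entry; `overrides` is {word: (type, pronunciation)}."""
    return {
        word: pronunciation_entry(kind, pronunciation)
        for word, (kind, pronunciation) in overrides.items()
    }


pronunciations = build_pronunciation_dictionary({
    "wound": ("IPA", "wuːnd"),
    "2010": ("SAY_AS", "two thousand and ten"),
})
```

The resulting dictionary could then be passed as `pronunciation_dictionary=pronunciations` in a `generate` call like the one above.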

MultiNative

MultiNative voices produce speech that sounds authentically native across multiple languages. A single voice can speak several languages while preserving the natural pronunciation patterns of each, which avoids the “foreign accent” effect common in conventional multilingual TTS systems.

For example, compare how “Croissant” sounds in English and in French:

Without MultiNative Locale

With MultiNative Locale

Use the multiNativeLocale key to select which locale to use for your audio generation.

from murf import Murf

client = Murf()
res = client.text_to_speech.generate(
    text="Croissant",
    voice_id="en-US-natalie",
    multi_native_locale="fr-FR",
)

Make sure the locale that you send in multiNativeLocale is supported by your chosen voice. You can see the list of supported locales for each voice in the Voice Library.

Pauses

Our models are capable of adding natural pauses based on the text and context. In some cases, you may want to adjust the pause duration between two words to achieve the desired effect in your speech.

In the Synthesize Speech operation, the text key of the request body holds the text to be synthesized. To add a pause between words, insert Murf’s pause syntax into this text: [pause <duration>].

Replace <duration> with the pause length in seconds, and the generated voiceover will contain silence for that duration. The pause duration must be between 0.1s and 5s.

from murf import Murf

client = Murf()
res = client.text_to_speech.generate(
    text="The answer to the problem was [pause 1s] patience.",
    voice_id="en-US-terrell",
)
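When pause lengths are computed rather than typed by hand, it can help to check them against the documented range before building the text. The sketch below uses a hypothetical helper named `pause`, which is not part of the Murf SDK. It also assumes that fractional durations are written in the same form as the `1s` example (such as `0.5s`), which the text above implies but does not show.

```python
# Hypothetical helper (not part of the Murf SDK) that formats Murf's
# pause syntax and enforces the documented 0.1s to 5s range.
MIN_PAUSE_S = 0.1
MAX_PAUSE_S = 5.0


def pause(seconds):
    """Return a pause marker such as "[pause 1s]" for the given duration."""
    if not MIN_PAUSE_S <= seconds <= MAX_PAUSE_S:
        raise ValueError(
            f"pause must be between {MIN_PAUSE_S}s and {MAX_PAUSE_S}s, got {seconds}"
        )
    # The "g" format drops a trailing ".0", so 1.0 renders as "1"
    # while 0.5 stays "0.5" (assumed to be an accepted form).
    return f"[pause {seconds:g}s]"


text = f"The answer to the problem was {pause(1)} patience."
```

The `text` string built here matches the one used in the example above and could be passed to `generate` unchanged.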

Audio Duration

The audioDuration key in the Synthesize Speech operation’s request body lets you specify the desired length of the generated audio in seconds; the system adjusts the speech to fit this duration.

Here is an example of how audio duration helps in generating voiceovers of specific lengths:

Default (7s)

Faster (6s)

Slower (8s)

This can be useful for matching voiceovers with specific audio lengths or other time constraints. The system will try to match the duration of the generated audio to audioDuration as closely as possible.

If there’s a significant difference between the requested and actual duration, consider changing the text length or audioDuration value for better alignment.

Valid values: A double value representing the time in seconds.
Guideline: As a rule of thumb, ~150 words (about 1000 characters) of text generate around 60 seconds of audio.
Availability: Only available for the Gen2 model.

from murf import Murf

client = Murf()
res = client.text_to_speech.generate(
    text="The team is down by three points. Ten seconds left on the clock! The next play could decide the game.",
    voice_id="en-US-miles",
    audio_duration=8.0,  # desired length in seconds (Gen2 model only)
)
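The rule of thumb above can be turned into a quick sanity check before choosing an audioDuration value. The sketch below is a rough estimate only: `estimate_duration_seconds` is a hypothetical helper, it counts whitespace-separated words, and the actual length will vary with the voice and the text.

```python
# Rough estimate based on the rule of thumb above:
# about 150 words of text produce around 60 seconds of audio.
WORDS_PER_MINUTE = 150


def estimate_duration_seconds(text):
    """Estimate the natural spoken length of `text` in seconds."""
    word_count = len(text.split())
    return word_count * 60 / WORDS_PER_MINUTE


script = (
    "The team is down by three points. Ten seconds left on the clock! "
    "The next play could decide the game"
)
estimated = estimate_duration_seconds(script)
```

If the estimate is far from the duration you plan to request, that is a hint to adjust the text length or the audioDuration value, as suggested above.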

Speed & Pitch

The rate and pitch keys in the Synthesize Speech operation’s request body let you adjust the speed and pitch of the generated voiceover. These parameters can be used to fine-tune the voice output to better suit your application’s needs.

Default

Low Pitch & High Speed

	Speed (rate)	Pitch (pitch)
Valid values	-50 to 50 (inclusive)	-50 to 50 (inclusive)
Default value	0	0

from murf import Murf

client = Murf()
res = client.text_to_speech.generate(
    text="I can't believe it! Is that really you, captain?",
    voice_id="en-US-ken",
    rate=10,    # faster than the default of 0
    pitch=-10,  # lower than the default of 0
)
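If rate and pitch come from user input or configuration, a small range check can catch out-of-range values before a request is made. The sketch below is an assumption-laden illustration: `check_adjustment` and `speed_pitch_options` are hypothetical names, not part of the Murf SDK, and the only facts taken from the documentation are the -50 to 50 range and the default of 0.

```python
# Hypothetical helpers (not part of the Murf SDK) that check rate and pitch
# against the documented range of -50 to 50 before a request is sent.
MIN_ADJUST = -50
MAX_ADJUST = 50


def check_adjustment(name, value):
    """Return `value` unchanged if it lies in [-50, 50], else raise ValueError."""
    if not MIN_ADJUST <= value <= MAX_ADJUST:
        raise ValueError(
            f"{name} must be between {MIN_ADJUST} and {MAX_ADJUST}, got {value}"
        )
    return value


def speed_pitch_options(rate=0, pitch=0):
    """Build keyword arguments for rate and pitch; both default to 0."""
    return {
        "rate": check_adjustment("rate", rate),
        "pitch": check_adjustment("pitch", pitch),
    }


options = speed_pitch_options(rate=10, pitch=-10)
```

The resulting dictionary could be unpacked into the call shown above, for example `client.text_to_speech.generate(text=..., voice_id=..., **options)`.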

Variations

The variation key controls how much the delivery varies across three primary parameters: pause, pitch, and speed. A higher value produces more dynamic output, with changes in delivery, pitch shifts, and pauses that make the audio sound more natural and less robotic.

Variation 1

Variation 5

Increasing the value adds more variation in voice style, with noticeable shifts in pause, pitch, and speed.

Valid values: An integer between 0 and 5
Default value: 1
Availability: Only available for the Gen2 model

from murf import Murf

client = Murf()
res = client.text_to_speech.generate(
    text="And off they went. Gently walking into the sunset, with not a single care in the world.",
    voice_id="en-US-julia",
    variation=5,  # integer from 0 to 5; default is 1 (Gen2 model only)
)
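To compare variation levels side by side, as in the Variation 1 and Variation 5 samples above, one option is to prepare one set of request arguments per level. The sketch below only builds the arguments and does not call the API. `variation_requests` is a hypothetical helper, not part of the Murf SDK, and it assumes the parameter names used in the example above.

```python
# Hypothetical helper (not part of the Murf SDK) that prepares one set of
# keyword arguments per variation level, within the documented 0 to 5 range.
MIN_VARIATION = 0
MAX_VARIATION = 5


def variation_requests(text, voice_id, levels=None):
    """Return a list of keyword-argument dicts, one per variation level."""
    if levels is None:
        levels = range(MIN_VARIATION, MAX_VARIATION + 1)
    requests = []
    for level in levels:
        # bool is a subclass of int, so it is rejected explicitly.
        if isinstance(level, bool) or not isinstance(level, int):
            raise TypeError("variation must be an integer")
        if not MIN_VARIATION <= level <= MAX_VARIATION:
            raise ValueError(
                f"variation must be between {MIN_VARIATION} and {MAX_VARIATION}"
            )
        requests.append({"text": text, "voice_id": voice_id, "variation": level})
    return requests


batch = variation_requests("And off they went.", "en-US-julia", levels=[1, 5])
```

Each dictionary in `batch` could then be unpacked into a separate call, for example `client.text_to_speech.generate(**kwargs)`.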