Speech Customization | Murf API

Murf’s AI models not only generate natural-sounding speech quickly but also give you powerful customization controls to shape the output with precision and personality. Through intuitive controls, you can fine-tune every detail to bring your creative vision to life.

Voices

Murf offers a diverse collection of 150+ AI voices across different accents, genders, and speaking styles, designed to suit a wide range of use cases from narration and marketing to training and conversation. The voiceId key is a required parameter in the Synthesize Speech operation’s request body; it specifies which voice is used to generate the audio output. Each voice has its own tonal profile and supports different features, such as styles and multi-native locales.

from murf import Murf
client = Murf()
res = client.text_to_speech.generate(
    text="What color is the sky?",
    # voice_id is required: it selects the voice used for the audio.
    voice_id="en-US-ariana",
)
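
The request-body keys named in this guide are camelCase (voiceId, multiNativeLocale), while the Python SDK examples use snake_case arguments (voice_id, multi_native_locale). The sketch below illustrates that correspondence. `to_request_key` is a hypothetical helper written for this illustration, not part of the Murf SDK, and it assumes the mapping is the plain snake_case-to-camelCase rule that the names in this guide happen to follow.

```python
def to_request_key(sdk_argument: str) -> str:
    """Convert a snake_case SDK argument name to a camelCase request-body key."""
    first, *rest = sdk_argument.split("_")
    return first + "".join(part.capitalize() for part in rest)


# Argument names taken from the SDK examples in this guide.
for name in ("voice_id", "pronunciation_dictionary", "multi_native_locale", "audio_duration"):
    print(f"{name} -> {to_request_key(name)}")
```

This is only a reading aid for moving between the prose and the code samples; the SDK handles the actual request construction.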

Find your perfect voice: explore, preview, and select from 150+ voices in 20+ expressive styles.

Styles

Murf Styles enable developers to fine-tune voice output for different contexts. Each voice supports multiple predefined styles that modify tone, emotional inflection, and delivery patterns. By passing the style parameter, you can programmatically shift a voice’s delivery to match a specific context, such as promotional, newscast, conversational, or inspirational.

Examples of styles available in the Murf API include Sad and Angry.

Use the style key to select which style to use for your audio generation.

from murf import Murf
client = Murf()
res = client.text_to_speech.generate(
    text="Oh! I'll have to do this all over again.",
    voice_id="en-US-ken",
    # style selects one of the voice's predefined delivery styles.
    style="Angry"
)

You can explore all supported styles and hear audio samples in our Voice Library.

Pronunciations

While our models are capable of handling complex pronunciations of heteronyms, acronyms, numbers, and proper nouns, you might sometimes need a specific pronunciation for certain words. Our custom pronunciation feature lets you adjust how words are spoken to match your context or accent preferences.

Here are a few examples of words and how they sound before and after adding custom pronunciations:

wound (wuːnd vs waʊnd)

2010 (twenty ten vs two thousand and ten)

The pronunciationDictionary key in the Synthesize Speech operation’s request body is used to specify custom pronunciations.

You can specify a custom pronunciation either in IPA or as an alternate word. IPA (the International Phonetic Alphabet) is an internationally recognized set of phonetic symbols based on the principle of strict one-to-one correspondence between sounds and symbols.

Pronunciations are specified in a key-value pair format, where the key is the word that needs to be changed, and the value is an object that specifies the pronunciation type and value.

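from murf import Murf
client = Murf()
res = client.text_to_speech.generate(
    text="The 2010 world cup was held in South Africa",
    voice_id="en-US-natalie",
    # Keys are the words to change; values give the pronunciation type and value.
    pronunciation_dictionary={
        # "live" does not occur in this text; the entry only illustrates the IPA type.
        "live": { "type": "IPA", "pronunciation": "laɪv" },
        # SAY_AS replaces the word with an alternate spoken form.
        "2010": { "type": "SAY_AS", "pronunciation": "two thousand and ten" }
    }
)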
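
When a script needs many overrides, it can help to build the dictionary with a small function instead of writing each nested object by hand. The following is a minimal sketch: `pronunciation_entry` and `ALLOWED_TYPES` are hypothetical names introduced here, and the set of types is limited to the two that appear in this guide (IPA and SAY_AS); whether the API accepts other types is not covered here.

```python
import json

# The two pronunciation types that appear in this guide's example.
ALLOWED_TYPES = {"IPA", "SAY_AS"}


def pronunciation_entry(kind: str, pronunciation: str) -> dict:
    """Build one dictionary value: the pronunciation type plus its value."""
    if kind not in ALLOWED_TYPES:
        raise ValueError(f"unknown pronunciation type: {kind!r}")
    return {"type": kind, "pronunciation": pronunciation}


# Keys are the words to change; values describe how to say them.
pronunciation_dictionary = {
    "wound": pronunciation_entry("IPA", "waʊnd"),
    "2010": pronunciation_entry("SAY_AS", "two thousand and ten"),
}

print(json.dumps(pronunciation_dictionary, ensure_ascii=False, indent=2))
```

The resulting dictionary has the same shape as the pronunciation_dictionary argument in the SDK example and could be passed in its place.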

Multilingual

Multilingual voices enable text-to-speech synthesis that sounds authentically native across multiple languages. A single voice can speak several languages while preserving the natural pronunciation patterns of each, which avoids the “foreign accent” effect common in conventional multilingual TTS systems.

For example, compare “Croissant” in English and French.

Audio samples: Without Multilingual Locale, With Multilingual Locale.

Use the multiNativeLocale key to select which locale to use for your audio generation.

from murf import Murf
client = Murf()
res = client.text_to_speech.generate(
    text="Croissant",
    voice_id="en-US-natalie",
    # The locale must be supported by the chosen voice.
    multi_native_locale="fr-FR"
)

Make sure the locale that you send in multiNativeLocale is supported by your chosen voice. You can see the list of supported locales for each voice in the Voice Library.
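
One way to follow this advice in code is to check the locale against a list of supported locales before building the request. The sketch below is illustrative only: `SUPPORTED_LOCALES` is hand-written, hypothetical data (not taken from the Voice Library), and `check_locale` is a hypothetical helper rather than an SDK function.

```python
# Hypothetical, hand-written data for illustration only; consult the
# Voice Library for the locales each voice really supports.
SUPPORTED_LOCALES = {
    "en-US-natalie": {"en-US", "fr-FR"},
}


def check_locale(voice_id: str, locale: str) -> str:
    """Return the locale if the voice is listed as supporting it, else raise ValueError."""
    supported = SUPPORTED_LOCALES.get(voice_id, set())
    if locale not in supported:
        raise ValueError(f"{voice_id} is not listed as supporting {locale}")
    return locale


# Keyword arguments in the shape used by the SDK example.
request_args = {
    "text": "Croissant",
    "voice_id": "en-US-natalie",
    "multi_native_locale": check_locale("en-US-natalie", "fr-FR"),
}
print(request_args)
```

Failing early with a clear message is usually easier to debug than an error returned by the API after the request is sent.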

Pauses

Our models are capable of adding natural pauses based on the text and context. In some cases, you may want to adjust the pause duration between two words to achieve the desired effect in your speech.

In the Synthesize Speech operation, the text key of the request body holds the text to be synthesized. You can insert a pause between words in this text using Murf’s pause syntax: [pause <duration>].

Replace the <duration> part of the syntax with the pause length in seconds, and the generated voiceover will contain silence for that duration. The pause duration can be between 0.1s and 5s.

from murf import Murf
client = Murf()
res = client.text_to_speech.generate(
    # [pause 1s] inserts one second of silence before "patience".
    text="The answer to the problem was [pause 1s] patience.",
    voice_id="en-US-terrell"
)

The [pause <duration>] tag is currently supported only in the Synthesize Speech operation. The Stream Speech operation does not support custom pause tags; it automatically adds natural pauses based on the text and context.
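
If pause tags are generated programmatically, a small formatter can enforce the documented 0.1s to 5s range before the text is sent. This is a minimal sketch; `pause` is a hypothetical helper introduced here, not part of the Murf SDK.

```python
def pause(seconds: float) -> str:
    """Return a pause tag such as "[pause 1s]" for a duration in seconds."""
    if not 0.1 <= seconds <= 5:
        raise ValueError("pause duration must be between 0.1 and 5 seconds")
    # ":g" drops a trailing ".0", so 1.0 becomes "1" and 0.5 stays "0.5".
    return f"[pause {seconds:g}s]"


text = f"The answer to the problem was {pause(1)} patience."
print(text)
```

The resulting string can be passed as the text argument of a Synthesize Speech request.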

Audio Duration

The audioDuration key in the Synthesize Speech operation’s request body lets you specify the desired length of the generated audio (in seconds), and the system adjusts the speech to fit this duration.

Here is an example of how audio duration helps in generating voiceovers of specific lengths: Default (7s), Faster (6s), Slower (8s).

This can be useful for matching voiceovers with specific audio lengths or other time constraints. The system will try to match the duration of the generated audio to audioDuration as closely as possible.

If there’s a significant difference between the requested and actual duration, consider changing the text length or audioDuration value for better alignment.

Valid values: A double value representing the time in seconds.
Guideline: As a rule of thumb, about 150 words (or about 1,000 characters) of text generate around 60 seconds of audio.
Availability: Supported only in the Synthesize Speech operation for the Gen2 model.

from murf import Murf
client = Murf()
res = client.text_to_speech.generate(
    text="The team is down by three points. Ten seconds left on the clock! The next play could decide the game",
    voice_id="en-US-miles",
    # Target length in seconds (Gen2 model only).
    audio_duration=8.0
)
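
The rule of thumb above (about 150 words per 60 seconds) can be turned into a quick estimate to sanity-check a requested duration against the script length. The sketch below is a rough heuristic only; `estimate_seconds` is a hypothetical helper, and real durations depend on the voice, style, and punctuation.

```python
WORDS_PER_MINUTE = 150  # rule of thumb from the guideline above


def estimate_seconds(text: str) -> float:
    """Estimate spoken length in seconds from the word count."""
    return len(text.split()) * 60.0 / WORDS_PER_MINUTE


script = (
    "The team is down by three points. Ten seconds left on the clock! "
    "The next play could decide the game"
)
natural = estimate_seconds(script)
requested = 8.0
print(f"estimated {natural:.1f}s, requested {requested:.1f}s")
```

If the estimate and the requested value are far apart, that is a hint to shorten or lengthen the text, or to pick a different audioDuration.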

Speed

The rate key in the Synthesize Speech operation’s request body controls the speed at which the voice speaks. Adjusting this parameter lets you make the voice output faster or slower.

Higher values mean higher speed, and lower values slow down the speech.

Valid values: Any integer between -50 and 50
Default value: 0

Audio samples: Default, High Speed.

from murf import Murf
client = Murf()
res = client.text_to_speech.generate(
    text="I can't believe it! Is that really you captain?",
    voice_id="en-US-ken",
    # Range -50 to 50; positive values speed the voice up.
    rate=10
)

Pitch

The pitch key controls the tone or frequency of the generated voice. Increasing the pitch makes the voice sound higher (more treble), while decreasing it results in a deeper (more bass) voice.

Valid values: Any integer between -50 and 50
Default value: 0

Audio samples: Default, Low Pitch.

from murf import Murf
client = Murf()
res = client.text_to_speech.generate(
    text="I can't believe it! Is that really you captain?",
    voice_id="en-US-ken",
    # Range -50 to 50; negative values deepen the voice.
    pitch=-10
)

Variations

Variations lets you vary a voiceover along three primary parameters: pause, pitch, and speed. A higher variation value produces more dynamic output, with changes in delivery, pitch shifts, and pauses that make the audio sound more natural and less robotic.

Audio samples: Variation 1, Variation 5.

Increasing the value adds more variation in voice style, with noticeable shifts in pause, pitch, and speed.

Valid values: An integer between 0 and 5
Default value: 1
Availability: Only available for the Gen2 model

from murf import Murf
client = Murf()
res = client.text_to_speech.generate(
    text="And off they went. Gently walking into the sunset, with not a single care in the world",
    voice_id="en-US-julia",
    # Range 0 to 5 (Gen2 model only); higher values add more variation.
    variation=5
)
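
The Speed, Pitch, and Variations sections each document an integer range (rate and pitch from -50 to 50, variation from 0 to 5). A client-side check along the following lines can catch out-of-range values before a request is made. This is a sketch under those documented ranges; `validate_settings` and `INT_RANGES` are hypothetical names, not part of the Murf SDK.

```python
# Ranges as documented in the Speed, Pitch, and Variations sections.
INT_RANGES = {
    "rate": (-50, 50),
    "pitch": (-50, 50),
    "variation": (0, 5),
}


def validate_settings(**settings: int) -> dict:
    """Return the settings unchanged if each is an integer in its documented range."""
    for name, value in settings.items():
        low, high = INT_RANGES[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"{name} must be an integer, got {value!r}")
        if not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}, got {value}")
    return settings


settings = validate_settings(rate=10, pitch=-10, variation=5)
print(settings)
```

The validated dictionary can then be unpacked into the SDK call alongside text and voice_id, for example with `**settings`. Names outside the three listed keys raise a KeyError in this sketch.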