Overview

Convert text into high-quality speech using Murf’s real-time Streaming API or the Synthesize Speech API.

Murf provides a powerful Text to Speech API that allows you to generate high-quality, natural-sounding speech from text input. The API supports over 35 languages and 20 speaking styles across 150+ voices to suit your application’s needs.

Quickstart

Murf offers two ways to generate speech:

  • Streaming API: Real-time, low-latency speech generation for conversational AI and real-time voice agents. It delivers natural speech with time-to-first-audio under 130ms.
  • Synthesize Speech API: Designed for studio-quality speech synthesis. Ideal for multimedia applications requiring rich, expressive voiceovers.

You can generate your API key from the Murf API Dashboard and optionally set it as the MURF_API_KEY environment variable.
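For example, on macOS or Linux you can export the key in your shell (the key value below is a placeholder):

```shell
# Set the Murf API key for the current shell session
export MURF_API_KEY="your-api-key-here"

# To make it persistent, append the export to your shell profile, e.g. ~/.bashrc
echo 'export MURF_API_KEY="your-api-key-here"' >> ~/.bashrc
```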

Install the SDK

If you’re using Python, you can install Murf’s Python SDK using the following command:

$ pip install murf

Using the Streaming API

import pyaudio
from murf import Murf, MurfRegion

client = Murf(
    api_key="YOUR_API_KEY",  # Not required if you have set the MURF_API_KEY environment variable
    region=MurfRegion.GLOBAL
)

# For lower latency, specify a region closer to your users
# client = Murf(region=MurfRegion.IN)  # Example: India region

# Audio format settings (must match your API output)
SAMPLE_RATE = 24000
CHANNELS = 1
FORMAT = pyaudio.paInt16

def play_streaming_audio():
    # Get the streaming audio generator
    audio_stream = client.text_to_speech.stream(
        text="Hi, How are you doing today?",
        voice_id="Matthew",
        model="FALCON",
        multi_native_locale="en-US",
        sample_rate=SAMPLE_RATE,
        format="PCM"
    )

    # Set up an audio stream for playback
    pa = pyaudio.PyAudio()
    stream = pa.open(format=FORMAT, channels=CHANNELS, rate=SAMPLE_RATE, output=True)

    try:
        print("Starting audio playback...")
        for chunk in audio_stream:
            if chunk:  # Skip empty chunks
                stream.write(chunk)
    except Exception as e:
        print(f"Error during streaming: {e}")
    finally:
        stream.stop_stream()
        stream.close()
        pa.terminate()
        print("Audio streaming and playback complete!")

if __name__ == "__main__":
    play_streaming_audio()
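If you'd rather save the stream than play it back, the raw PCM chunks can be written into a standard WAV container. The helper below is a minimal sketch that accepts any iterable of 16-bit mono PCM byte chunks, such as the audio_stream generator above; the save_pcm_to_wav name and the "output.wav" path are illustrative, not part of the Murf SDK:

```python
import wave

SAMPLE_RATE = 24000
CHANNELS = 1
SAMPLE_WIDTH = 2  # 16-bit PCM -> 2 bytes per sample

def save_pcm_to_wav(chunks, wav_path, sample_rate=SAMPLE_RATE, channels=CHANNELS):
    """Write an iterable of raw PCM byte chunks into a playable WAV file."""
    with wave.open(wav_path, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(SAMPLE_WIDTH)
        wf.setframerate(sample_rate)
        for chunk in chunks:
            if chunk:  # skip empty chunks
                wf.writeframes(chunk)

# Usage with the streaming example above:
# audio_stream = client.text_to_speech.stream(..., format="PCM")
# save_pcm_to_wav(audio_stream, "output.wav")
```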

Using the Non-Streaming API

from murf import Murf

client = Murf(
    api_key="YOUR_API_KEY"  # Not required if you have set the MURF_API_KEY environment variable
)

res = client.text_to_speech.generate(
    text="There is much to be said",
    voice_id="en-US-terrell",
)

print(res.audio_file)

A link to the audio file will be returned in the response. You can use this link to download the audio file and use it wherever you need it. The audio file will be available for download for 72 hours after generation.
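As a sketch, the returned link can be fetched and saved with Python's standard library; the download_audio helper and the "speech.wav" path are illustrative names, not part of the Murf SDK:

```python
import urllib.request

def download_audio(url, dest_path):
    """Download the generated audio file to a local path."""
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        out.write(response.read())

# Usage with the example above:
# download_audio(res.audio_file, "speech.wav")
```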

Supported Output Formats

The API supports multiple output formats for the generated audio; the default is WAV. You can choose from the following formats:

  • WAV: Uncompressed audio format, useful for low-latency applications as it eliminates the need for decoding.
  • MP3: Compressed audio format, widely supported and suitable for applications where file size is a concern.
  • FLAC: Lossless compressed audio format, ideal for applications requiring high audio fidelity without the large file size of uncompressed formats.
  • ALAW: Compressed audio format commonly used in telephony, providing a good balance between audio quality and bandwidth usage.
  • ULAW: Another compressed audio format used in telephony, similar to ALAW but with slightly different compression characteristics.
  • OGG: Efficient compressed format offering better quality at similar bitrates; ideal for web playback and streaming.
  • PCM: Raw, uncompressed audio data; useful for telephony, DSP pipelines, and systems requiring raw waveform access.

You can specify the output format using the format parameter in the request payload.

Furthermore, you can use the channelType and sampleRate keys to specify the channel type and sample rate for the generated audio. The API supports stereo and mono channels, and sample rates of 8000, 24000, 44100, and 48000 Hz.

from murf import Murf

client = Murf()

res = client.text_to_speech.generate(
    text="Hi, How are you doing today?",
    voice_id="en-US-julia",
    format="MP3",
    channel_type="STEREO",
    sample_rate=44100
)

ULAW and ALAW formats only support mono channel type and a sample rate of 8000 Hz. If you specify a different channel type or sample rate, the API will default to the supported values.

Base64 Encoding

Note: Not available for Streaming API.

You can choose to receive the audio file in Base64 encoded format by setting the encodeAsBase64 parameter to true in the request payload. This can be useful when you need to embed the audio file directly into your application or store it in a database. This will also enable zero retention of audio data on Murf’s servers.

from murf import Murf

client = Murf()

res = client.text_to_speech.generate(
    text="Hi, How are you doing today?",
    voice_id="en-US-julia",
    encode_as_base_64=True
)

The response will include the audio file encoded in Base64 format, which you can decode and use as needed.
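A minimal sketch of decoding and saving the payload with Python's standard library; the save_base64_audio name is illustrative, and the res.encoded_audio attribute name is an assumption based on the SDK's snake_case convention for the encodedAudio field, so verify it against the response object you receive:

```python
import base64

def save_base64_audio(encoded_audio, dest_path):
    """Decode a Base64 audio payload and write the raw bytes to disk."""
    audio_bytes = base64.b64decode(encoded_audio)
    with open(dest_path, "wb") as f:
        f.write(audio_bytes)

# Usage (attribute name is an assumption; check your response object):
# save_base64_audio(res.encoded_audio, "speech.wav")
```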

Response
{
  ...,
  "encodedAudio": "U29tZSB0ZXh0IHNob3cgd2l0aCB0aGF0Lg==...",
  ...
}

gzip Support

Note: Not supported for Streaming API.

Responses from the Murf API can be gzipped by including "gzip" in the Accept-Encoding header of your requests. This is especially beneficial if you choose to return the audio response as a Base64 encoded string.

from murf import Murf

client = Murf()

client.text_to_speech.generate(
    text="Hi, How are you doing today?",
    voice_id="en-US-natalie",
    encode_as_base_64=True,
    request_options={
        "additional_headers": {
            "accept-encoding": "gzip"
        }
    }
)

FAQ
Which audio format should I choose?

Audio formats define how sound data is stored and compressed. Choose MP3 for web streaming due to its small size; OGG as an open, efficient option for streaming with better quality at similar bitrates; WAV for the highest-quality uncompressed recording and editing; FLAC for lossless compression at a reduced size; ALAW/ULAW for telephony systems; and PCM for raw, uncompressed audio when you need maximum compatibility or low-level processing (note: large files).

What are audio channels, and when should I use mono vs. stereo?

Audio channels define the number of sound signals in a recording.

  • Mono (1 channel): Best for voice calls, podcasts, and telephony—ensuring clarity.
  • Stereo (2 channels): Preferred for music, films, and immersive experiences where directional sound matters.

Which sample rate should I use?

The sample rate (measured in Hz) determines audio detail:

  • 8000 Hz: Telephony & VoIP (mandatory for ALAW/ULAW).
  • 24000 Hz: Balanced for podcasts and e-learning.
  • 44100 Hz: CD-quality audio.
  • 48000 Hz: Industry standard for film and professional audio.

Higher sample rates improve quality but increase file size; choose based on your needs.

What is Base64 encoding, and when is it useful?

Base64 encodes audio as text, making it useful for embedding in APIs, JSON, XML, or data transfers where binary formats aren’t supported. Base64 is useful for transmitting audio files in web-based applications. Since Base64 increases file size compared to its original format, it’s best used for compatibility rather than storage efficiency.
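The size overhead is easy to quantify: Base64 maps every 3 bytes of binary data to 4 ASCII characters, roughly a 33% increase. A quick sketch with the standard library:

```python
import base64

raw = bytes(3000)  # 3,000 bytes standing in for binary audio data
encoded = base64.b64encode(raw)

print(len(raw))      # 3000
print(len(encoded))  # 4000 -> 4/3 of the original size
```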