WebSockets

Murf TTS API supports WebSocket streaming, enabling low-latency, bidirectional communication over a persistent connection. It’s designed for building responsive voice experiences like interactive voice agents, live conversations, and other real-time applications.

New (Beta): Pass model = FALCON to use our Falcon model in text-to-speech streaming endpoints, designed for ultra-low latency (~130 ms).

With a single WebSocket connection, you can stream text input and receive synthesized audio continuously, without the overhead of repeated HTTP requests. This makes it ideal for use cases where your application sends or receives text in chunks and needs real-time audio to deliver a smooth, conversational experience.

Simple WebSocket Connection

Quickstart

This guide walks you through setting up and making your first WebSocket streaming request.

1. Getting Started

Generate an API key here. Store the key in a secure location, as you’ll need it to authenticate your requests. You can optionally save the key as an environment variable in your terminal.

# Export an environment variable on macOS or Linux systems
$ export MURF_API_KEY="your_api_key_here"
2. Install required packages

This guide uses the websockets and pyaudio Python packages. The websockets package is essential for the core functionality.

Note: pyaudio is used in this quickstart guide to demonstrate playing the audio received from the WebSocket. However, it is not required to use Murf WebSockets if you have a different method for handling or playing the audio stream.
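For example, if you would rather save the audio than play it live, a minimal sketch using only the standard-library wave module can replace pyaudio entirely (assuming 16-bit mono PCM at 24 kHz, matching the quickstart's settings):

```python
import wave

def write_wav(path, pcm, rate=24000, channels=1, sample_width=2):
    """Write raw PCM bytes to a WAV file (16-bit mono at 24 kHz by default)."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)  # 2 bytes = 16-bit samples
        wav_file.setframerate(rate)
        wav_file.writeframes(pcm)

# Example: half a second of silence at 24 kHz, 16-bit mono
write_wav("output.wav", b"\x00\x00" * 12000)
```

You would accumulate the decoded audio chunks into a single bytes object during the session and call `write_wav` once at the end.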

PyAudio depends on PortAudio, a cross-platform audio I/O library. You may need to install PortAudio separately if it’s not already on your system.

$ brew install portaudio

Once you have installed PortAudio, you can install the required Python packages using the following command:

Install Python packages
$ pip install websockets pyaudio
3. Streaming Text and Playing Synthesized Audio

import asyncio
import websockets
import json
import base64
import pyaudio
# import os

API_KEY = "YOUR_API_KEY"  # Or use os.getenv("MURF_API_KEY") if you have set the API key as an environment variable
WS_URL = "wss://global.api.murf.ai/v1/speech/stream-input"
PARAGRAPH = "With a single WebSocket connection, you can stream text input and receive synthesized audio continuously, without the overhead of repeated HTTP requests. This makes it ideal for use cases where your application sends or receives text in chunks and needs real-time audio to deliver a smooth, conversational experience."

# Audio format settings (must match your API output)
SAMPLE_RATE = 24000
CHANNELS = 1
FORMAT = pyaudio.paInt16

async def tts_stream():
    async with websockets.connect(
        f"{WS_URL}?api-key={API_KEY}&model=FALCON&sample_rate=24000&channel_type=MONO&format=WAV"
    ) as ws:
        # Send voice config first (optional)
        voice_config_msg = {
            "voice_config": {
                "voiceId": "Matthew",
                "multiNativeLocale": "en-US",
                "style": "Conversation",
                "rate": 0,
                "pitch": 0,
                "variation": 1
            }
        }
        print(f"Sending payload: {voice_config_msg}")
        await ws.send(json.dumps(voice_config_msg))

        # Send the text in one go (or in chunks if your text arrives incrementally)
        text_msg = {
            "text": PARAGRAPH,
            "end": True  # Closes the context so the connection can be reused for a new request
        }
        print(f"Sending payload: {text_msg}")
        await ws.send(json.dumps(text_msg))

        # Set up audio playback
        pa = pyaudio.PyAudio()
        stream = pa.open(format=FORMAT, channels=CHANNELS, rate=SAMPLE_RATE, output=True)

        first_chunk = True
        try:
            while True:
                response = await ws.recv()
                data = json.loads(response)
                print(f"Received data: {data}")
                if "audio" in data:
                    audio_bytes = base64.b64decode(data["audio"])
                    # Skip the 44-byte WAV header, but only on the first chunk
                    if first_chunk and len(audio_bytes) > 44:
                        audio_bytes = audio_bytes[44:]
                        first_chunk = False
                    stream.write(audio_bytes)
                if data.get("final"):
                    break
        finally:
            stream.stop_stream()
            stream.close()
            pa.terminate()

if __name__ == "__main__":
    asyncio.run(tts_stream())
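The script above sends the whole paragraph in one message. If your text arrives incrementally (for example, from an LLM), you can instead send several text messages and set "end" only on the last one to close the context. A minimal sketch of building such a payload sequence (the build_text_messages helper is illustrative, not part of the API):

```python
import json

def build_text_messages(chunks):
    """Turn a list of text chunks into JSON payloads for the stream-input endpoint.

    Each message carries one piece of text; only the final message sets
    "end": True, which closes the synthesis context.
    """
    messages = []
    for i, chunk in enumerate(chunks):
        msg = {"text": chunk}
        if i == len(chunks) - 1:
            msg["end"] = True
        messages.append(msg)
    return [json.dumps(m) for m in messages]

payloads = build_text_messages(["Hello there, ", "and welcome ", "to real-time speech."])
# Each payload would be sent in order with `await ws.send(payload)`.
```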

Falcon Supported Voices

| Voice ID | Supported Locales | Voice Styles |
| --- | --- | --- |
| Matthew | en-US (English - US & Canada) | Conversation |
| Zion | en-US (English - US & Canada) | Conversational |
| Ken | en-US (English - US & Canada) | Conversation |
| River | en-US (English - US & Canada) | Conversation |
| Emily | en-US (English - US & Canada) | Narration |
| Anisha | en-IN (English - India) | Conversation |
| Namrita | hi-IN (Hindi - India) | Conversation |
| Amara | bn-IN (Bengali - India) | Conversation |

Available Regions

Use the region closest to your users for the lowest latency.

| Region (City/Area) | Endpoint |
| --- | --- |
| Global (Routes to the nearest server) | https://global.api.murf.ai/v1/speech/stream |
| US-East | https://us-east.api.murf.ai/v1/speech/stream |
| US-West | https://us-west.api.murf.ai/v1/speech/stream |
| India | https://in.api.murf.ai/v1/speech/stream |
| Canada | https://ca.api.murf.ai/v1/speech/stream |
| South Korea | https://kr.api.murf.ai/v1/speech/stream |
| UAE | https://me.api.murf.ai/v1/speech/stream |
| Japan | https://jp.api.murf.ai/v1/speech/stream |
| Australia | https://au.api.murf.ai/v1/speech/stream |
| EU (Central) | https://eu-central.api.murf.ai/v1/speech/stream |
| UK | https://uk.api.murf.ai/v1/speech/stream |
| South America (São Paulo) | https://sa-east.api.murf.ai/v1/speech/stream |

The Global Router automatically routes requests to the nearest region. The concurrency limit is 15 for the US-East region and 2 for all other regions. For higher concurrency, use the US-East endpoint directly, or contact us to increase the limits for regional endpoints.
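The table above lists the HTTP streaming endpoints, while the quickstart connects to wss://global.api.murf.ai/v1/speech/stream-input. Assuming the regional hosts follow the same naming pattern for WebSocket streaming (an assumption worth verifying against the API reference), a small helper for building a region-specific WebSocket URL could look like:

```python
def ws_endpoint(region="global"):
    """Build a WebSocket stream-input URL for a regional host.

    Assumes regional WebSocket hosts mirror the HTTP hosts in the table above,
    e.g. "us-east", "eu-central", "in". Verify against the API reference.
    """
    return f"wss://{region}.api.murf.ai/v1/speech/stream-input"

print(ws_endpoint("us-east"))
# wss://us-east.api.murf.ai/v1/speech/stream-input
```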

Best Practices

Following are some best practices for using the WebSocket streaming API:

  • Once connected, the session remains active as long as it is in use and will automatically close after 3 minutes of inactivity.
  • You can maintain up to 10X your streaming concurrency limit in WebSocket connections, as per your plan’s rate limits.
  • For the lowest latency, prefer Falcon voices by setting model = FALCON. If you need multilingual support or the widest voice/style coverage, use the default streaming model.

FAQs

How is WebSocket streaming different from HTTP streaming?

WebSocket allows you to stream input text and receive audio over the same persistent connection, making it truly bidirectional. In contrast, HTTP streaming is one-way: you send the full text once and receive audio while it is being generated. WebSocket is better for real-time, interactive use cases where text arrives in parts.

How is the audio returned?

The audio is streamed as a sequence of base64-encoded strings, with each message containing a chunk of the overall audio.
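Concretely, reassembling those chunks means base64-decoding each message's audio field and concatenating the bytes, remembering that with format=WAV the first chunk carries a 44-byte header, as in the quickstart. A minimal sketch:

```python
import base64

def assemble_pcm(audio_chunks, strip_wav_header=True):
    """Decode base64 audio chunks and concatenate them into raw PCM bytes."""
    decoded = [base64.b64decode(c) for c in audio_chunks]
    if strip_wav_header and decoded and len(decoded[0]) > 44:
        decoded[0] = decoded[0][44:]  # drop the WAV header from the first chunk
    return b"".join(decoded)
```

You would collect each message's "audio" field into a list during the session and call `assemble_pcm` once the final message arrives.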

When does the connection close?

The WebSocket connection will automatically close after 3 minutes of inactivity.

What voice settings can I control?

You can control style, speed, pitch, and pauses.

How do I use the Falcon model?

Add model = FALCON to your WebSocket connection query (or request parameters). Falcon is optimized for ultra-low latency (~130 ms) and is ideal for interactive agents, live support, gaming, tutoring, and other real-time experiences where fast turn-taking matters.