WebSockets

The Murf TTS API supports WebSocket streaming for low-latency, bidirectional communication over a persistent connection. It is designed for responsive voice experiences such as interactive voice agents, live conversations, and other real-time applications.

With a single WebSocket connection, you can stream text input and receive synthesized audio continuously, without the overhead of repeated HTTP requests. This makes it ideal for use cases where your application sends or receives text in chunks and needs real-time audio to deliver a smooth, conversational experience.

Simple WebSocket Connection

Quickstart

This guide walks you through setting up and making your first WebSocket streaming request.

1

Getting Started

Generate an API key here. Store the key in a secure location, as you’ll need it to authenticate your requests. You can optionally save the key as an environment variable in your terminal.

# Export an environment variable on macOS or Linux systems
export MURF_API_KEY="your_api_key_here"

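If you exported the key as shown above, a Python script can read it at startup instead of hardcoding it. The following is a minimal sketch; `load_api_key` is an illustrative helper name, not part of any Murf SDK.

```python
import os


def load_api_key(env=None, name="MURF_API_KEY"):
    """Return the API key from an environment mapping (hypothetical helper)."""
    # Default to the real process environment; a dict can be passed for testing.
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the script")
    return key
```

Failing fast with a clear error is usually easier to debug than a rejected connection later on.
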
2

Install required packages

This guide uses the websockets and pyaudio Python packages. The websockets package handles the connection itself and is required.

Note: pyaudio is used in this quickstart only to play the audio received from the WebSocket. It is not required if you handle or play the audio stream another way.

PyAudio depends on PortAudio, a cross-platform audio I/O library. You may need to install PortAudio separately if it’s not already on your system.

# macOS (Homebrew)
brew install portaudio

Once you have installed PortAudio, you can install the required Python packages using the following command:

pip install websockets pyaudio

3

Streaming Text and Playing Synthesized Audio

import asyncio
import base64
import json
import os

import pyaudio
import websockets

# Read the key exported in step 1; otherwise edit the placeholder below.
API_KEY = os.getenv("MURF_API_KEY", "YOUR_API_KEY")
WS_URL = "wss://api.murf.ai/v1/speech/stream-input"
PARAGRAPH = "With a single WebSocket connection, you can stream text input and receive synthesized audio continuously, without the overhead of repeated HTTP requests. This makes it ideal for use cases where your application sends or receives text in chunks and needs real-time audio to deliver a smooth, conversational experience."

# Audio format settings (must match the query parameters sent to the API)
SAMPLE_RATE = 44100
CHANNELS = 1
FORMAT = pyaudio.paInt16
WAV_HEADER_BYTES = 44


async def tts_stream():
    async with websockets.connect(
        f"{WS_URL}?api-key={API_KEY}&sample_rate={SAMPLE_RATE}&channel_type=MONO&format=WAV"
    ) as ws:
        # Send voice config first (optional)
        voice_config_msg = {
            "voice_config": {
                "voiceId": "en-US-amara",
                "style": "Conversational",
                "rate": 0,
                "pitch": 0,
                "variation": 1,
            }
        }
        print(f"Sending payload: {voice_config_msg}")
        await ws.send(json.dumps(voice_config_msg))

        # Send the text in one message (or split it into chunks to stream it)
        text_msg = {
            "text": PARAGRAPH,
            "end": True,  # Closes the context so its concurrency slot is freed for the next run
        }
        print(f"Sending payload: {text_msg}")
        await ws.send(json.dumps(text_msg))

        # Set up the audio output stream
        pa = pyaudio.PyAudio()
        stream = pa.open(format=FORMAT, channels=CHANNELS, rate=SAMPLE_RATE, output=True)

        first_chunk = True
        try:
            while True:
                response = await ws.recv()
                data = json.loads(response)
                print(f"Received data: {data}")
                if "audio" in data:
                    audio_bytes = base64.b64decode(data["audio"])
                    # Skip the 44-byte WAV header, which is present only in the first chunk
                    if first_chunk and len(audio_bytes) > WAV_HEADER_BYTES:
                        audio_bytes = audio_bytes[WAV_HEADER_BYTES:]
                    first_chunk = False
                    stream.write(audio_bytes)
                if data.get("isFinalAudio"):
                    break
        finally:
            # Release audio resources even if the connection fails mid-stream
            stream.stop_stream()
            stream.close()
            pa.terminate()


if __name__ == "__main__":
    asyncio.run(tts_stream())
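
The quickstart sends the whole paragraph in one message. When text arrives in parts, each part can be sent as its own message on the same connection. The sketch below assumes, based on the `text` and `end` fields used above, that `end` only needs to be set on the last message. `chunk_text` and `build_text_messages` are illustrative helpers, the 80-character limit is arbitrary, and whether the service expects separating whitespace between chunks is not covered here.

```python
import json


def chunk_text(text, max_chars=80):
    """Split text on spaces into pieces of roughly max_chars.

    A single word longer than max_chars is kept whole.
    """
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) > max_chars and current:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def build_text_messages(chunks):
    """Serialize chunks as JSON messages, setting end=True only on the last one."""
    messages = []
    for i, chunk in enumerate(chunks):
        payload = {"text": chunk}
        if i == len(chunks) - 1:
            payload["end"] = True  # assumed to close the context, as in the quickstart
        messages.append(json.dumps(payload))
    return messages
```

Each resulting string can then be passed to `ws.send(...)` in order, in place of the single text message in the quickstart.
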

Best Practices

Keep the following in mind when using the WebSocket streaming API:

  • Once connected, the session remains active as long as it is in use and will automatically close after 3 minutes of inactivity.
  • You can keep up to 10x your streaming concurrency limit open as WebSocket connections, subject to your plan’s rate limits.
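
Because an idle session is closed after 3 minutes, a long-lived client may want to track when it last used the connection and reconnect before sending if that window has likely passed. The following is a minimal sketch; `IdleTracker` and the 10-second safety margin are illustrative choices, not part of the API.

```python
import time

IDLE_TIMEOUT_SECONDS = 3 * 60  # inactivity window stated above


class IdleTracker:
    """Tracks time since the connection was last used (hypothetical helper)."""

    def __init__(self, timeout=IDLE_TIMEOUT_SECONDS, margin=10, clock=time.monotonic):
        self._timeout = timeout
        self._margin = margin
        self._clock = clock  # injectable so the logic can be tested without waiting
        self._last_used = clock()

    def touch(self):
        """Call after every message sent or received."""
        self._last_used = self._clock()

    def should_reconnect(self):
        """True once the idle window, minus the safety margin, has elapsed."""
        return self._clock() - self._last_used >= self._timeout - self._margin
```

A client would call `touch()` around each `send` and `recv`, and open a fresh connection whenever `should_reconnect()` returns true before sending new text.
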

FAQs

WebSocket lets you stream input text and receive audio over the same persistent connection, making it truly bidirectional. HTTP streaming, in contrast, is one-way: you send the full text once and receive audio while it is being generated. WebSocket is the better choice for real-time, interactive use cases where text arrives in parts.

The audio is streamed as a sequence of base64-encoded strings, with each message containing one chunk of the overall audio.
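
To save the stream instead of playing it, the decoded chunks can be collected and written out as a single file. The sketch below assumes the quickstart's settings (WAV output, 44.1 kHz, mono, 16-bit samples) and, as in the quickstart, that only the first chunk carries a 44-byte WAV header. `chunks_to_wav` is an illustrative helper.

```python
import base64
import io
import wave

WAV_HEADER_BYTES = 44  # header size assumed in the quickstart


def chunks_to_wav(b64_chunks, sample_rate=44100, channels=1, sample_width=2):
    """Decode base64 audio chunks and return the bytes of one WAV file."""
    pcm = bytearray()
    for i, chunk in enumerate(b64_chunks):
        audio = base64.b64decode(chunk)
        # Drop the header from the first chunk so only raw samples remain.
        if i == 0 and len(audio) > WAV_HEADER_BYTES:
            audio = audio[WAV_HEADER_BYTES:]
        pcm.extend(audio)

    # Write a fresh header that reflects the full length of the audio.
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(pcm))
    return buffer.getvalue()
```

The returned bytes can be written to disk with `open("output.wav", "wb")`, where the file name is only an example.
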

The WebSocket connection will automatically close after 3 minutes of inactivity.

You can control the voice's style, speed, pitch, and pauses.
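
In the quickstart, style, rate, and pitch are set through the `voice_config` message. The sketch below builds that message from the same field names (`voiceId`, `style`, `rate`, `pitch`), assuming `rate` is the speed setting. `build_voice_config` is an illustrative helper; valid value ranges and pause control are not covered here.

```python
import json


def build_voice_config(voice_id, style=None, rate=None, pitch=None):
    """Return a JSON voice_config message, omitting settings left as None."""
    config = {"voiceId": voice_id}
    optional = {"style": style, "rate": rate, "pitch": pitch}
    # Keep explicit zeros (0 is a valid value in the quickstart); skip only None.
    config.update({k: v for k, v in optional.items() if v is not None})
    return json.dumps({"voice_config": config})
```

For example, `build_voice_config("en-US-amara", style="Conversational", rate=0)` produces a message that can be sent with `ws.send(...)` before any text.
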