WebSockets | Murf API | Documentation

The Murf TTS API supports WebSocket streaming, enabling low-latency, bidirectional communication over a persistent connection. It’s designed for building responsive voice experiences such as interactive voice agents, live conversations, and other real-time applications.

With a single WebSocket connection, you can stream text input and receive synthesized audio continuously, without the overhead of repeated HTTP requests. This makes it ideal for use cases where your application sends or receives text in chunks and needs real-time audio to deliver a smooth, conversational experience.

Simple WebSocket Connection

Quickstart

This guide walks you through setting up and making your first WebSocket streaming request.

Getting Started

Generate an API key here. Store the key in a secure location, as you’ll need it to authenticate your requests. You can optionally save the key as an environment variable in your terminal.

# Export an environment variable on macOS or Linux systems
export MURF_API_KEY="your_api_key_here"
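A minimal sketch of reading the key back in Python, assuming it was exported as MURF_API_KEY as shown above. The helper name `load_api_key` is hypothetical and not part of any Murf SDK; it only fails early with a clear message when the variable is unset.

```python
import os


def load_api_key(env=None, name="MURF_API_KEY"):
    """Return the API key from the environment, or raise if it is missing.

    `load_api_key` is an illustrative helper, not a Murf API.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the script")
    return key
```

Calling `load_api_key()` at the top of a script keeps the key out of source control and surfaces a missing variable before any connection is attempted.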

Install required packages

This guide uses the websockets and pyaudio Python packages. The websockets package is essential for the core functionality.

Note: pyaudio is used in this quickstart guide to demonstrate playing the audio received from the WebSocket. However, it is not required to use Murf WebSockets if you have a different method for handling or playing the audio stream.

Installing PortAudio (for PyAudio)

PyAudio depends on PortAudio, a cross-platform audio I/O library. You may need to install PortAudio separately if it’s not already on your system.

macOS

brew install portaudio

Linux (Debian/Ubuntu) and Windows

Install PortAudio with your platform's package manager or installer.

Once you have installed PortAudio, you can install the required Python packages using the following command:

Install Python packages

pip install websockets pyaudio

Streaming Text and Playing Synthesized Audio

import asyncio
import base64
import json
import os

import pyaudio
import websockets

# Read the key from the MURF_API_KEY environment variable, or replace the fallback string
API_KEY = os.getenv("MURF_API_KEY", "YOUR_API_KEY")
WS_URL = "wss://api.murf.ai/v1/speech/stream-input"
PARAGRAPH = "With a single WebSocket connection, you can stream text input and receive synthesized audio continuously, without the overhead of repeated HTTP requests. This makes it ideal for use cases where your application sends or receives text in chunks and needs real-time audio to deliver a smooth, conversational experience."

# Audio format settings (must match the output requested in the connection URL)
SAMPLE_RATE = 44100
CHANNELS = 1
FORMAT = pyaudio.paInt16
WAV_HEADER_BYTES = 44


async def tts_stream():
    url = f"{WS_URL}?api-key={API_KEY}&sample_rate={SAMPLE_RATE}&channel_type=MONO&format=WAV"
    async with websockets.connect(url) as ws:
        # Send voice config first (optional)
        voice_config_msg = {
            "voice_config": {
                "voiceId": "en-US-amara",
                "style": "Conversational",
                "rate": 0,
                "pitch": 0,
                "variation": 1,
            }
        }
        print(f"Sending payload: {voice_config_msg}")
        await ws.send(json.dumps(voice_config_msg))

        # Send the text in one message (or split it into chunks to stream the input)
        text_msg = {
            "text": PARAGRAPH,
            "end": True,  # Closes the context, so the script can be re-run and concurrency is freed
        }
        print(f"Sending payload: {text_msg}")
        await ws.send(json.dumps(text_msg))

        # Set up the audio output stream
        pa = pyaudio.PyAudio()
        stream = pa.open(format=FORMAT, channels=CHANNELS, rate=SAMPLE_RATE, output=True)

        first_chunk = True
        try:
            while True:
                response = await ws.recv()
                data = json.loads(response)
                print(f"Received data: {data}")
                if "audio" in data:
                    audio_bytes = base64.b64decode(data["audio"])
                    # Skip the 44-byte WAV header, which appears only in the first chunk
                    if first_chunk and len(audio_bytes) > WAV_HEADER_BYTES:
                        audio_bytes = audio_bytes[WAV_HEADER_BYTES:]
                        first_chunk = False
                    stream.write(audio_bytes)
                if data.get("isFinalAudio"):
                    break
        finally:
            # Release the audio device even if the connection fails mid-stream
            stream.stop_stream()
            stream.close()
            pa.terminate()


if __name__ == "__main__":
    asyncio.run(tts_stream())
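The quickstart builds the connection URL by string interpolation, which breaks if a value ever contains a reserved character such as `&`. The sketch below assembles the same four query parameters (`api-key`, `sample_rate`, `channel_type`, `format`) with the standard library instead. The helper name `build_ws_url` and its default values are assumptions taken from the example above, not an official client API.

```python
from urllib.parse import urlencode

WS_URL = "wss://api.murf.ai/v1/speech/stream-input"


def build_ws_url(api_key, sample_rate=44100, channel_type="MONO", audio_format="WAV"):
    """Build the connection URL with properly escaped query parameters.

    Hypothetical helper; parameter names mirror the quickstart URL.
    """
    query = urlencode({
        "api-key": api_key,
        "sample_rate": sample_rate,
        "channel_type": channel_type,
        "format": audio_format,
    })
    return f"{WS_URL}?{query}"
```

The result can be passed straight to `websockets.connect(...)` in place of the hand-written f-string.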

Best Practices

Keep the following in mind when using the WebSocket streaming API:

Once connected, the session remains active as long as it is in use and closes automatically after 3 minutes of inactivity.
You can keep up to 10x your streaming concurrency limit open as WebSocket connections, in line with your plan’s rate limits.
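One way to work with the 3-minute inactivity limit is to track idle time on the client and reconnect shortly before the server would close the session. The sketch below is an illustration under that assumption: the `IdleTracker` class, its `margin` parameter, and the injectable `clock` are hypothetical names, not part of the Murf API.

```python
import time

IDLE_TIMEOUT_SECONDS = 3 * 60  # inactivity limit stated in this guide


class IdleTracker:
    """Track time since the last send/receive to anticipate an idle close."""

    def __init__(self, timeout=IDLE_TIMEOUT_SECONDS, margin=10.0, clock=time.monotonic):
        self.timeout = timeout
        self.margin = margin  # safety margin in seconds (an arbitrary choice)
        self.clock = clock
        self.last_activity = clock()

    def touch(self):
        """Record activity; call this after every message sent or received."""
        self.last_activity = self.clock()

    def should_reconnect(self):
        """Return True once idle time is within `margin` seconds of the limit."""
        return self.clock() - self.last_activity >= self.timeout - self.margin
```

Before sending on a long-lived connection, an application could check `should_reconnect()` and open a fresh connection if it returns True.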

Next Steps

Context ID

Use a unique identifier to track a specific TTS request, ensuring continuity in the conversation.
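As a rough sketch of the idea, a client could generate one identifier per TTS request and attach it to every message belonging to that request. The field name `context_id` is an assumption here, as are the helper names; check the Context ID guide for the exact key the API expects.

```python
import json
import uuid


def new_context_id():
    """Generate a unique identifier for one TTS request."""
    return uuid.uuid4().hex


def tag_with_context(payload, context_id, field="context_id"):
    """Return a JSON message carrying the payload plus the context identifier.

    `field` defaults to "context_id", which is an assumed key name.
    """
    return json.dumps({**payload, field: context_id})
```

Each message built this way could then be sent with `await ws.send(...)` as in the quickstart.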

Advanced WebSockets

Fine-tune text buffering to balance audio quality and Time to First Byte (TTFB).

FAQs

How is WebSocket streaming different from HTTP streaming in the Murf TTS API?

WebSocket allows you to stream input text and receive audio over the same persistent connection, making it truly bidirectional. In contrast, HTTP streaming is one-way: you send the full text once and receive audio while it is being generated. WebSocket is better suited to real-time, interactive use cases where text arrives in parts.
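When text arrives in parts, each part can be sent as its own message on the open connection. The sketch below builds those messages using the `text` and `end` fields from the quickstart, setting `end` only on the last chunk. That intermediate messages may omit `end` is an assumption, and `build_text_messages` is a hypothetical helper name.

```python
import json


def build_text_messages(chunks):
    """Turn text chunks into JSON payloads, flagging only the last one with "end"."""
    chunks = [c for c in chunks if c]  # drop empty chunks
    messages = []
    for index, chunk in enumerate(chunks):
        payload = {"text": chunk}
        if index == len(chunks) - 1:
            payload["end"] = True  # closes the context, as in the quickstart
        messages.append(json.dumps(payload))
    return messages
```

Inside the quickstart's `async with` block, the messages could be sent with `for msg in build_text_messages(parts): await ws.send(msg)`.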

What format is the audio received over WebSocket?

The audio is streamed as a sequence of base64-encoded strings, with each message containing a chunk of the overall audio.
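If the audio should be saved rather than played, the chunks can be decoded and joined into one file. The sketch below assumes what the quickstart assumes: each message carries its chunk in an `audio` field, the output is 16-bit mono WAV at 44100 Hz, and only the first chunk starts with a 44-byte WAV header. The helper names `collect_pcm` and `pcm_to_wav` are hypothetical.

```python
import base64
import io
import wave

WAV_HEADER_BYTES = 44  # header size used in the quickstart (assumed PCM WAV)


def collect_pcm(messages, header_bytes=WAV_HEADER_BYTES):
    """Decode the "audio" field of each message and return the joined PCM bytes."""
    pcm = bytearray()
    first = True
    for message in messages:
        if "audio" not in message:
            continue
        chunk = base64.b64decode(message["audio"])
        if first:
            chunk = chunk[header_bytes:]  # drop the WAV header on the first chunk
            first = False
        pcm.extend(chunk)
    return bytes(pcm)


def pcm_to_wav(pcm, sample_rate=44100, channels=1, sample_width=2):
    """Wrap raw PCM in a single WAV container and return the file bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)  # 2 bytes per sample = 16-bit audio
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    return buffer.getvalue()
```

Here `messages` is a list of the parsed JSON dictionaries received from the socket; the bytes returned by `pcm_to_wav` can be written to a `.wav` file.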

After how long will the WebSocket connection close due to inactivity?

The WebSocket connection will automatically close after 3 minutes of inactivity.

What features can I use with the WebSocket Streaming API?

You can control style, speed, pitch and pauses.