Streaming

The Murf TTS API supports real-time streaming, allowing developers to generate and play text-to-speech (TTS) audio as it is produced, reducing time-to-first-byte. This minimizes latency, making it ideal for conversational AI, real-time applications, and voice-enabled assistants.

New (Beta): Pass model = FALCON to use our Falcon model in text-to-speech streaming endpoints, designed for ultra-low latency (~130 ms).

In addition to the HTTP streaming endpoint, Murf TTS supports WebSocket streaming, which enables bidirectional streaming for real-time audio generation.

Quickstart

The streaming endpoint returns raw audio bytes (e.g., MP3 data) directly over HTTP using chunked transfer encoding, so clients can process or play audio incrementally as it is generated. This section covers how streaming works for requests made to the Text to Speech API.

1. Getting Started

Generate an API key here. Store the key in a secure location, as you’ll need it to authenticate your requests. You can optionally save the key as an environment variable in your terminal.
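If you saved the key as an environment variable, you can read it at runtime instead of hard-coding it. A minimal sketch, assuming the variable is named MURF_API_KEY (any name works, as long as it matches your shell export):

```python
import os

# Read the key saved earlier, e.g. via:  export MURF_API_KEY="your-key-here"
# The variable name MURF_API_KEY is an assumption, not a Murf requirement.
api_key = os.environ.get("MURF_API_KEY", "")
if not api_key:
    print("Warning: MURF_API_KEY is not set; requests will fail to authenticate.")

headers = {
    "api-key": api_key,
    "Content-Type": "application/json",
}
```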

2. Initiating a Streaming Request

import requests

url = "https://global.api.murf.ai/v1/speech/stream"  # global endpoint
# url = "https://in.api.murf.ai/v1/speech/stream"    # regional endpoint
headers = {
    "api-key": "YOUR_API_KEY",
    "Content-Type": "application/json"
}
data = {
    "text": "Hello, this is a test message.",
    "voiceId": "Matthew",
    "multiNativeLocale": "en-US",
    "format": "MP3",
    "sampleRate": 24000,
    "rate": 0,
    "channelType": "MONO",
    "model": "FALCON"
}

response = requests.post(url, headers=headers, json=data, stream=True)
response.raise_for_status()  # surface HTTP errors before reading the stream
with open("audio.mp3", "wb") as f:
    for chunk in response.iter_content(chunk_size=1024):
        f.write(chunk)

In the response, you will receive a stream of audio data. You can save this data to a file or play it directly using an audio library.
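The chunk loop above can be factored into a small helper that writes to any binary sink (a file, a socket, or an in-memory buffer), which makes incremental handling easy to test. stream_to is a hypothetical helper, not part of any Murf SDK; it is demonstrated here with in-memory data in place of a live response:

```python
import io

def stream_to(chunks, sink):
    """Write an iterable of byte chunks to a writable binary sink.

    Returns the total number of bytes written. Empty keep-alive
    chunks (which requests may yield) are skipped.
    """
    total = 0
    for chunk in chunks:
        if chunk:
            sink.write(chunk)
            total += len(chunk)
    return total

# With a live request it would be:
#   stream_to(response.iter_content(chunk_size=1024), open("audio.mp3", "wb"))
# Demonstrated here with stand-in chunks:
buf = io.BytesIO()
written = stream_to([b"ID3", b"\x00" * 5], buf)
```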

Falcon Supported Voices

Voice ID | Supported Locales             | Voice Styles
Matthew  | en-US (English - US & Canada) | Conversation
Zion     | en-US (English - US & Canada) | Conversational
Ken      | en-US (English - US & Canada) | Conversation
River    | en-US (English - US & Canada) | Conversation
Emily    | en-US (English - US & Canada) | Narration
Anisha   | en-IN (English - India)       | Conversation
Namrita  | hi-IN (Hindi - India)         | Conversation
Amara    | bn-IN (Bengali - India)       | Conversation

Endpoint & Concurrency Overview

Endpoint base                                                 | Concurrency cap
https://global.api.murf.ai/v1/speech/stream                   | 15 (if nearest server is US-East) / 2 (all other regions)
https://<region>.api.murf.ai/v1/speech/stream (regions below) | 15 for US-East / 2 for all other regions

The Global Router automatically picks the nearest region. The concurrency limit is 15 for the US-East region and 2 for all other regions. For higher concurrency, use the US-East endpoint directly, or contact us to raise the limits on regional endpoints.
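One way to stay under the cap client-side is to bound the number of in-flight requests. A minimal sketch using a thread pool sized to the cap; synthesize is a placeholder standing in for the streaming POST from the quickstart, not a real Murf function:

```python
from concurrent.futures import ThreadPoolExecutor

CONCURRENCY_CAP = 2  # regional default; 15 if you target US-East

def synthesize(text):
    # Placeholder for the streaming POST shown in the quickstart.
    # Here it just returns the text length to keep the sketch self-contained.
    return len(text)

texts = ["Hello", "World", "Streaming"]

# max_workers bounds the number of simultaneous requests to the endpoint's cap.
with ThreadPoolExecutor(max_workers=CONCURRENCY_CAP) as pool:
    results = list(pool.map(synthesize, texts))
```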

Available Regions

Use the region closest to your users for the lowest latency.

Region (City/Area)        | Endpoint
US-East                   | https://us-east.api.murf.ai/speech/stream
US-West                   | https://us-west.api.murf.ai/speech/stream
India                     | https://in.api.murf.ai/speech/stream
Canada                    | https://ca.api.murf.ai/speech/stream
South Korea               | https://kr.api.murf.ai/speech/stream
UAE                       | https://me.api.murf.ai/speech/stream
Japan                     | https://jp.api.murf.ai/speech/stream
Australia                 | https://au.api.murf.ai/speech/stream
EU (Central)              | https://eu-central.api.murf.ai/speech/stream
UK                        | https://uk.api.murf.ai/speech/stream
South America (São Paulo) | https://sa-east.api.murf.ai/speech/stream
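If you select a region at runtime, a small helper can build the endpoint URL from the region codes in the table above. stream_endpoint is a hypothetical convenience function, not part of any Murf SDK, and it assumes the URL shapes exactly as shown in this document:

```python
# Region codes taken from the hostnames in the table above.
REGION_CODES = {
    "us-east", "us-west", "in", "ca", "kr", "me",
    "jp", "au", "eu-central", "uk", "sa-east",
}

def stream_endpoint(region=None):
    """Return the streaming endpoint for a region code, or the
    Global Router URL when no region is given."""
    if region is None:
        return "https://global.api.murf.ai/v1/speech/stream"
    if region not in REGION_CODES:
        raise ValueError(f"Unknown region code: {region}")
    return f"https://{region}.api.murf.ai/speech/stream"
```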

FAQs

What is Falcon?

Falcon is our fastest streaming model (~130 ms latency), optimized for real-time interactions.

When should I use Falcon?

Use Falcon when your top priority is ultra-low latency. Typical fits include:

  • Conversational agents & live support where snappy turn-taking matters.
  • Real-time apps (IVR, gaming, tutoring, assistive tech) that stream audio as users speak.
  • Interruptible/barge-in experiences and interactive demos or prototyping.

How do I enable Falcon?

Include model = FALCON in your request (HTTP or WebSocket). If omitted, the default streaming model is used.

Can I control voice styles, pitch, and pauses?

Yes, we support tags to control voice styles, pitch, and pauses.

Which voices and languages are available for streaming?

All the voices and languages supported in TTS are available via streaming. A full list is available in our docs.

Which audio formats are supported?

  • We support MP3, FLAC, WAV, ALAW, ULAW, OGG, and PCM.
  • If you need to transmit audio as text, you can Base64-encode any of these.
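Base64 encoding round-trips the raw bytes losslessly, at the cost of roughly 33% size overhead. A minimal sketch using Python's standard library; the sample bytes are illustrative, not a real audio file:

```python
import base64

audio_bytes = b"\xff\xfb\x90\x00"  # illustrative stand-in for streamed audio data

# Encode to an ASCII string that is safe to embed in JSON or send as text.
encoded = base64.b64encode(audio_bytes).decode("ascii")

# Decoding recovers the original bytes exactly.
decoded = base64.b64decode(encoded)
```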