Latency Optimization
Introduction
Voice agents and real-time applications rely on consistently low latency. This page explains how to use streaming output, how to correctly measure latency, and what you can do to reliably achieve ~130 ms first-byte response times with Murf Falcon.
How to Use Streaming Output
With streaming output, you don’t need to wait for the entire audio response or write it to a file before playback. Instead, process and play each audio chunk as soon as the server delivers it.
Below are example code snippets showing how to test and play streaming audio:
HTTP Streaming
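A minimal Python sketch of HTTP streaming playback, assuming a POST streaming endpoint with a JSON body (the URL, header name, and payload fields below are illustrative; check the Falcon API reference for the exact values):

```python
import requests

# Illustrative values; substitute the real endpoint, header, and payload
# fields from the Falcon API reference.
URL = "https://api.murf.ai/v1/speech/stream"
HEADERS = {"api-key": "YOUR_API_KEY", "Content-Type": "application/json"}
PAYLOAD = {"text": "Hello! How can I help you today?", "voice_id": "YOUR_VOICE_ID"}

def play(chunk: bytes) -> None:
    # Placeholder playback sink: swap in your audio pipeline
    # (e.g., a pyaudio stream.write call or your telephony media track).
    pass

with requests.post(URL, headers=HEADERS, json=PAYLOAD, stream=True) as resp:
    resp.raise_for_status()
    # iter_content yields audio chunks as the server sends them,
    # so playback can start before the full response has arrived.
    for chunk in resp.iter_content(chunk_size=4096):
        if chunk:
            play(chunk)
```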
WebSockets
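And a WebSocket sketch using the websockets library. The URL and message schema here are assumptions for illustration only; the actual field names come from the Falcon WebSocket reference:

```python
import asyncio
import base64
import json

import websockets  # pip install websockets

# Hypothetical URL and message fields; check the Falcon WebSocket reference.
WS_URL = "wss://api.murf.ai/v1/speech/stream?api-key=YOUR_API_KEY"

async def speak(text: str) -> None:
    async with websockets.connect(WS_URL) as ws:
        await ws.send(json.dumps({"text": text, "end": True}))
        # Messages arrive incrementally; hand each decoded chunk to your
        # player immediately instead of waiting for the final one.
        async for message in ws:
            event = json.loads(message)
            if audio := event.get("audio"):
                chunk = base64.b64decode(audio)
                print(f"received {len(chunk)} bytes")  # replace with playback
            if event.get("final"):
                break

asyncio.run(speak("Hello! How can I help you today?"))
```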
We also recommend integrating Falcon directly into your agent infrastructure when testing streaming output. For the fastest and simplest setup, use the Murf plugins, which are optimized for low latency.
How to Measure Latency
Streaming & WebSocket Endpoints
Both Streaming and WebSocket endpoints deliver audio incrementally. To measure latency correctly, the key metric to track is Time to First Audio Byte. This is the only metric that reflects true real-time performance.
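As a quick illustration, you can measure Time to First Audio Byte from Python by starting a timer before the request and stopping it when the first chunk arrives (the endpoint and payload below are placeholders):

```python
import time

import requests

start = time.perf_counter()
with requests.post(
    "https://api.murf.ai/v1/speech/stream",  # placeholder endpoint
    headers={"api-key": "YOUR_API_KEY"},
    json={"text": "Hello!"},
    stream=True,
) as resp:
    resp.raise_for_status()
    for chunk in resp.iter_content(chunk_size=1024):
        if chunk:
            # Stop the clock at the first audio byte, not the full response.
            ttfb_ms = (time.perf_counter() - start) * 1000
            print(f"Time to first audio byte: {ttfb_ms:.0f} ms")
            break
```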
Understanding Latency Components
- time_namelookup: This is the time spent resolving DNS. It can be reduced through OS-level DNS caching or by using long-lived processes that avoid repeated lookups.
- time_connect: This represents the TCP handshake. You can reduce it by reusing connections and enabling keep-alive agents.
- time_appconnect: This is the TLS handshake time. It can be improved by using TLS 1.3 and leveraging persistent TLS sessions.
- time_starttransfer: This measures how long it takes for the server to send the very first byte of audio. This is the most important indicator for streaming performance. You can lower it with proper streaming usage and by sending shorter text chunks.
- time_total: This is the full time required to receive the entire response. For real-time applications, this metric is not critical. Focus instead on first-byte latency.
Measuring Latency with curl
You can inspect all timing components using the curl -w formatter:
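For example (the endpoint and payload are placeholders; substitute your actual request):

```bash
# Discard the body (-o /dev/null) and print only the timing breakdown.
curl -s -o /dev/null \
  -X POST "https://api.murf.ai/v1/speech/stream" \
  -H "api-key: $MURF_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello!"}' \
  -w "namelookup:    %{time_namelookup}s\nconnect:       %{time_connect}s\nappconnect:    %{time_appconnect}s\nstarttransfer: %{time_starttransfer}s (first byte)\ntotal:         %{time_total}s\n"
```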
Measuring Latency with Postman
If you prefer a UI-based method, Postman also provides a simple way to measure latency. After you send a request, Postman shows the request duration directly in the response window. This value represents how long it took for Postman to receive the first byte of the response.

Best Practices to Achieve the Lowest Latency
1. Client Location
Murf provides Falcon deployment in 11 global regions. For best performance, you should:
- Choose the region closest to your infrastructure, or
- Use the Global Endpoint, which automatically routes your request to the nearest available Murf region.
This ensures your request travels the shortest possible distance, giving you the fastest achievable first-byte latency.
For example, if your agent is hosted in AWS us-east-2, choosing the same Murf region minimizes RTT. If you’re testing from Europe or India, the global endpoint will route you to the nearest region automatically.
Recommendation:
Use the global endpoint to route each request to the nearest region automatically. Below are the regional URLs for the Falcon streaming endpoint.
2. Client Environment
Latency depends heavily on where your request originates. Your client (browser or server) must send a request to Murf’s servers and wait for the first audio byte to come back. The physical distance and network path directly affect round-trip time (RTT).
Browser vs. Server
When you test from a browser, your local machine becomes the client. This means latency will include:
- Your home/office network quality
- Your ISP routing
- Distance from your physical location to Murf’s servers
- Wi-Fi instability, VPNs, or proxies
As a result, browser tests usually show higher and more variable latency. In production, however, your voice agent will typically be hosted on a server in a cloud such as AWS or GCP. Servers in major cloud regions have:
- Much lower network hop counts
- Direct peering to hyperscalers
- Highly stable routing paths
This results in significantly lower and more consistent latency.
Recommendation:
To understand real-world performance of your agent, always measure latency from the same environment where the agent will run, ideally a cloud server, not your local laptop.
3. Language & Script Considerations
Some languages require additional preprocessing or have inherently higher synthesis complexity, which can add 10–20 milliseconds to first-byte latency. You can minimize this by using the right script and optimizing input text.
Best Practices:
Use the correct script for each language
For Hindi, always prefer Devanagari (देवनागरी) instead of Latin transliteration. This reduces preprocessing and results in lower latency and more accurate pronunciation. If you send Hindi written in English characters (“aap kaise ho”), the system currently performs transliteration, which adds 5–10 ms of overhead.
Examples:
- Hindi — Devanagari (देवनागरी)
  - Recommended: “आप कैसे हैं?”
  - Not recommended: “aap kaise hain?”
- Tamil — Tamil script (தமிழ்)
  - Recommended: “நீங்கள் எப்படி இருக்கிறீர்கள்?”
  - Not recommended: “neenga eppadi irukeenga?”
- Telugu — Telugu script (తెలుగు)
  - Recommended: “మీరు ఎలా ఉన్నారు?”
  - Not recommended: “meeru ela unnaru?”
- Kannada — Kannada script (ಕನ್ನಡ)
  - Recommended: “ನೀವು ಹೇಗಿದ್ದೀರಿ?”
  - Not recommended: “neevu hegiddiri?”
- Bengali — Bengali script (বাংলা)
  - Recommended: “আপনি কেমন আছেন?”
  - Not recommended: “apni kemon achhen?”
- Marathi — Devanagari (देवनागरी)
  - Recommended: “तू कसा आहेस?”
  - Not recommended: “tu kasa ahes?”
- Gujarati — Gujarati script (ગુજરાતી)
  - Recommended: “તમે કેમ છો?”
  - Not recommended: “tame kem cho?”
Keep your text concise
Very long input blocks delay the tokenization stage before streaming can start. Shorter, well-structured sentences allow a faster first-byte response.
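For instance, here is a simple sketch that splits long text into sentence-sized chunks before synthesis (the 200-character limit is an arbitrary example, not a Falcon requirement):

```python
import re

def chunk_text(text: str, max_chars: int = 200) -> list[str]:
    """Split text at sentence boundaries into chunks under max_chars."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks

# Send each chunk as its own request (or WebSocket message) so the
# first audio bytes arrive before the full text has been processed.
for chunk in chunk_text("Your long agent response goes here. " * 20):
    print(len(chunk), chunk[:40])
```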
4. Use Connection Pooling (Persistent Connections)
Reusing connections avoids repeated TCP and TLS handshakes, which significantly reduces first-byte latency.
- Python: use httpx.AsyncClient() or requests.Session() to keep connections warm.
- WebSockets: keep the socket open and stream multiple generations over the same connection.
Persistent connections help you avoid 30–80 ms of extra overhead per request, especially in real-time voice workloads.
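A minimal sketch of connection reuse with requests.Session (the endpoint and payload are placeholders):

```python
import requests

# One long-lived Session per process keeps the TCP/TLS connection warm,
# so repeat requests skip the DNS, TCP, and TLS handshakes.
session = requests.Session()

def synthesize(text: str) -> bytes:
    resp = session.post(
        "https://api.murf.ai/v1/speech/stream",  # placeholder endpoint
        headers={"api-key": "YOUR_API_KEY"},
        json={"text": text},
        stream=True,
    )
    resp.raise_for_status()
    return b"".join(resp.iter_content(chunk_size=4096))

# The second call reuses the pooled connection and avoids handshake latency.
synthesize("Warm-up request.")
synthesize("This one skips the TCP and TLS handshakes.")
```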