Voice agents and real-time applications rely on consistently low latency. This page explains how to use streaming output, how to correctly measure latency, and what you can do to reliably achieve ~130 ms first-byte response times with Murf Falcon.
With streaming output, you don’t need to wait for the entire audio response or write it to a file before playback. Instead, process and play each audio chunk as soon as the server delivers it.
Below are example code snippets showing how to test and play streaming audio:
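As one illustration, here is a minimal Python sketch of chunk-by-chunk consumption that uses only the standard library. The `url`, `payload`, and `headers` arguments are placeholders and not Murf's documented request format; take the real values from the API reference. `play` stands for whatever audio sink your application uses.

```python
import json
import urllib.request
from typing import Callable, Dict, Iterable, Iterator


def iter_audio_chunks(url: str, payload: Dict, headers: Dict[str, str],
                      chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield audio bytes as the server sends them.

    url, payload and headers are placeholders supplied by the caller.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", **headers},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        while True:
            # read1() returns as soon as some data is available instead of
            # waiting until a full chunk_size buffer has been filled.
            chunk = response.read1(chunk_size)
            if not chunk:
                break
            yield chunk


def play_as_received(chunks: Iterable[bytes], play: Callable[[bytes], None]) -> int:
    """Pass every non-empty chunk to `play` immediately; return total bytes handled."""
    total = 0
    for chunk in chunks:
        if chunk:
            play(chunk)
            total += len(chunk)
    return total
```

Nothing is written to disk and nothing waits for the end of the response: each chunk goes to the sink as soon as it is read.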
We also recommend integrating Falcon directly into your agent infrastructure when testing streaming output. For the fastest and simplest setup, use the Murf plugins, which are optimized for low latency.
Both Streaming and WebSocket endpoints deliver audio incrementally. To measure latency correctly, the key metric to track is Time to First Audio Byte. This is the only metric that reflects true real-time performance.
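The difference between time to first audio byte and total response time can be sketched in Python as follows. `measure_first_byte` is a hypothetical helper, not part of any Murf SDK; `open_stream` is assumed to be any callable that sends the request and returns an iterator of byte chunks.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def measure_first_byte(
    open_stream: Callable[[], Iterable[bytes]],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[float], float]:
    """Return (time_to_first_audio_byte_ms, total_ms).

    The clock starts before open_stream() is called, so connection setup
    and request time are included in both values.
    """
    start = clock()
    first_ms: Optional[float] = None
    for chunk in open_stream():
        if chunk and first_ms is None:
            # First non-empty chunk: this is the moment playback could begin.
            first_ms = (clock() - start) * 1000.0
    total_ms = (clock() - start) * 1000.0
    return first_ms, total_ms
```

For a streaming endpoint the first value is the one to track; the second grows with the length of the audio and says little about responsiveness.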
You can inspect all timing components using the curl -w formatter:
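curl reports its timers as cumulative seconds since the start of the transfer, so per-phase durations have to be derived by subtraction. The Python sketch below shows one way to do that. The `%{time_*}` names are standard curl write-out variables; the `key=value` layout and the function names are assumptions made for this example.

```python
from typing import Dict

# Format string to pass to `curl -w`. Each %{...} is a standard curl
# write-out variable, reported in seconds since the transfer started.
CURL_WRITE_OUT = (
    "time_namelookup=%{time_namelookup}\n"
    "time_connect=%{time_connect}\n"
    "time_appconnect=%{time_appconnect}\n"
    "time_pretransfer=%{time_pretransfer}\n"
    "time_starttransfer=%{time_starttransfer}\n"
    "time_total=%{time_total}\n"
)


def parse_write_out(output: str) -> Dict[str, float]:
    """Parse the key=value lines produced by CURL_WRITE_OUT."""
    timings: Dict[str, float] = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.startswith("time_"):
            timings[key.strip()] = float(value)
    return timings


def phase_breakdown_ms(t: Dict[str, float]) -> Dict[str, float]:
    """Convert curl's cumulative timers into per-phase durations in milliseconds."""
    def ms(seconds: float) -> float:
        return round(seconds * 1000.0, 3)

    # time_appconnect stays 0 for plain HTTP, where there is no TLS handshake.
    tls = t["time_appconnect"] - t["time_connect"] if t["time_appconnect"] > 0 else 0.0
    return {
        "dns_lookup": ms(t["time_namelookup"]),
        "tcp_connect": ms(t["time_connect"] - t["time_namelookup"]),
        "tls_handshake": ms(tls),
        "server_wait": ms(t["time_starttransfer"] - t["time_pretransfer"]),
        "first_byte": ms(t["time_starttransfer"]),  # time to first byte
        "body_transfer": ms(t["time_total"] - t["time_starttransfer"]),
    }
```

`first_byte` corresponds to curl's `time_starttransfer`; the other entries show how much of it was spent on DNS, TCP, and TLS setup rather than on the server.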
If you prefer a UI-based method, Postman also provides a simple way to measure latency. After you send a request, Postman shows the request duration directly in the response window. This value represents how long it took for Postman to receive the first byte of the response.

Murf provides Falcon deployment in 11 global regions. For best performance, you should:
This ensures your request travels the shortest possible distance, giving you the fastest achievable first-byte latency.
For example, if your agent is hosted in AWS us-east-2, choosing the same Murf region minimizes RTT. If you’re testing from Europe or India, the global endpoint routes you to the nearest region automatically.
Recommendation:
Use the global endpoint to route each request to the nearest region automatically. Below are the regional URLs for the Falcon streaming endpoint.
Latency depends heavily on where your request originates. Your client (browser or server) must send a request to Murf’s servers and wait for the first audio byte to come back. The physical distance and network path directly affect round-trip time (RTT).
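A rough way to see how origin affects RTT is to time a TCP connection from the machine that will run your agent. The Python sketch below does that with the standard library. The helper names are hypothetical and no hostnames are included; pass in the endpoint hosts you actually use. TCP connect time only approximates one network round trip and does not include TLS or synthesis time.

```python
import socket
import statistics
import time
from typing import Dict


def tcp_connect_ms(host: str, port: int = 443, timeout: float = 3.0) -> float:
    """Time one TCP connection to host:port, in milliseconds.

    The measurement includes DNS resolution, so the first sample for a
    host may be slower than later ones.
    """
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000.0


def median_connect_ms(host: str, samples: int = 5) -> float:
    """Median of several connection timings, to smooth out one-off spikes."""
    return statistics.median(tcp_connect_ms(host) for _ in range(samples))


def closest_endpoint(medians_ms: Dict[str, float]) -> str:
    """Return the key (for example a hostname) with the lowest median time."""
    return min(medians_ms, key=medians_ms.get)
```

Running the same probe from a laptop and from a cloud server usually makes the effect of network path visible before any audio is requested.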
When you test from a browser, your local machine becomes the client. This means latency will include:
As a result, browser tests usually show higher and more variable latency. In production, however, your voice agent will typically be hosted on a server, such as AWS or GCP. Servers in major cloud regions have:
This results in significantly lower and more consistent latency.
To understand real-world performance of your agent, always measure latency from the same environment where the agent will run, ideally a cloud server, not your local laptop.
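Because a single request can be an outlier, it helps to collect many first-byte samples from the target environment and look at their spread. The sketch below is one way to summarize them in Python; `summarize_latency` is a hypothetical helper and the choice of statistics is an assumption, not a Murf recommendation.

```python
import statistics
from typing import Dict, Sequence


def summarize_latency(samples_ms: Sequence[float]) -> Dict[str, float]:
    """Summarize first-byte latency samples (milliseconds)."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    ordered = sorted(samples_ms)
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    p95 = statistics.quantiles(ordered, n=20, method="inclusive")[18]
    return {
        "min": ordered[0],
        "median": statistics.median(ordered),
        "p95": p95,
        "max": ordered[-1],
        "stdev": statistics.stdev(ordered),
    }
```

Comparing the median and p95 from a laptop with the same figures from a cloud server shows both the level and the variability of latency in each environment.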
Some languages require additional preprocessing or have inherently higher synthesis complexity, which can add 10-20 milliseconds to first-byte latency. You can minimize this by using the right script and optimizing input text.
Use the correct script for each language
For Hindi, always prefer Devanagari (देवनागरी) instead of Latin transliteration. This reduces preprocessing and results in lower latency and more accurate pronunciation. If you send Hindi written in English characters (“aap kaise ho”), the system currently performs transliteration, which adds 5–10 ms of overhead.
Examples:
Hindi — Devanagari (देवनागरी)
Tamil — Tamil script (தமிழ்)
Telugu — Telugu script (తెలుగు)
Kannada — Kannada script (ಕನ್ನಡ)
Bengali — Bengali script (বাংলা)
Recommended: “আপনি কেমন আছেন?”
Not recommended: “apni kemon achhen?”
Marathi — Devanagari (देवनागरी)
Recommended: “तू कसा आहेस?”
Not recommended: “tu kasa ahes?”
Gujarati — Gujarati script (ગુજરાતી)
Recommended: “તમે કેમ છો?”
Not recommended: “tame kem cho?”
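A client-side check can flag transliterated input before it is sent. The Python sketch below compares the number of characters in the language's Unicode block with the number of ASCII letters. The language keys are labels chosen for this example and not Murf locale codes, and the 0.5 threshold is an arbitrary assumption.

```python
from typing import Dict, Tuple

# Unicode block ranges for the scripts listed above.
SCRIPT_RANGES: Dict[str, Tuple[int, int]] = {
    "hindi": (0x0900, 0x097F),     # Devanagari
    "marathi": (0x0900, 0x097F),   # Devanagari
    "bengali": (0x0980, 0x09FF),
    "gujarati": (0x0A80, 0x0AFF),
    "tamil": (0x0B80, 0x0BFF),
    "telugu": (0x0C00, 0x0C7F),
    "kannada": (0x0C80, 0x0CFF),
}


def uses_native_script(text: str, language: str, threshold: float = 0.5) -> bool:
    """Return True if the text is written mostly in the language's own script.

    Counts characters inside the script's Unicode block against ASCII
    letters; digits, spaces and punctuation are ignored.
    """
    low, high = SCRIPT_RANGES[language]
    native = sum(1 for ch in text if low <= ord(ch) <= high)
    latin = sum(1 for ch in text if ch.isascii() and ch.isalpha())
    if native + latin == 0:
        return True  # nothing that would need transliteration
    return native / (native + latin) >= threshold
```

For example, `uses_native_script("apni kemon achhen?", "bengali")` is false, so an application could warn or log before sending that text.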
Very long input blocks delay the tokenization stage before streaming can start. Shorter, well-structured sentences allow a faster first-byte response.
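One way to keep inputs short is to split long text at sentence boundaries before sending it. The following Python sketch illustrates the idea. The 200-character limit and the helper name are assumptions for this example, not documented Murf limits, and whether to send the pieces as separate requests depends on your application.

```python
import re
from typing import List

# Split after ., !, ? or the danda (।) when followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?।])\s+")


def split_for_streaming(text: str, max_chars: int = 200) -> List[str]:
    """Split text at sentence boundaries into pieces of at most max_chars.

    Consecutive sentences are packed together while they fit. A single
    sentence longer than max_chars is kept whole rather than cut mid-sentence.
    """
    sentences = [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Splitting only at sentence ends keeps each piece self-contained, which tends to matter for natural-sounding speech.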
Reusing connections avoids repeated TCP and TLS handshakes, which significantly reduces first-byte latency.
Use httpx.AsyncClient() or requests.Session() to keep connections warm. Persistent connections help you avoid 30-80 ms of extra overhead per request, especially in real-time voice workloads.
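The same idea can be sketched with only the Python standard library, which makes the reuse explicit. `KeepAliveClient` is a hypothetical helper written for this example; the host, path, and headers are supplied by the caller and are not taken from Murf's API reference. In practice, `requests.Session()` or `httpx.AsyncClient()` gives you this behavior through connection pooling.

```python
import http.client
from typing import Callable, Dict, Optional, Tuple


class KeepAliveClient:
    """Send several POST requests over one persistent connection."""

    def __init__(self, host: str, port: int = 443,
                 connection_factory: Callable = http.client.HTTPSConnection,
                 timeout: float = 10.0) -> None:
        self._host, self._port, self._timeout = host, port, timeout
        self._factory = connection_factory
        self._conn = None
        self.connections_opened = 0  # how many TCP/TLS handshakes were paid for

    def _connection(self):
        if self._conn is None:
            self._conn = self._factory(self._host, self._port, timeout=self._timeout)
            self.connections_opened += 1
        return self._conn

    def post(self, path: str, body: bytes,
             headers: Optional[Dict[str, str]] = None) -> Tuple[int, bytes]:
        last_error: Optional[Exception] = None
        for _ in range(2):  # second attempt covers a connection closed while idle
            conn = self._connection()
            try:
                conn.request("POST", path, body=body, headers=headers or {})
                response = conn.getresponse()
                # The body must be fully read before the connection can be
                # reused; a streaming client would read it in chunks instead.
                return response.status, response.read()
            except (http.client.HTTPException, OSError) as error:
                self.close()
                last_error = error
        raise last_error

    def close(self) -> None:
        if self._conn is not None:
            self._conn.close()
            self._conn = None
```

Create one client when the agent starts and reuse it for every request, so that only the first request pays for the TCP and TLS handshakes.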