Murf’s WebSocket implementation is built around the Context ID (context_id), a unique identifier used to track a specific TTS request. The Context ID preserves continuity in the conversation, especially when the input text is generated in real time (e.g., via an LLM), arrives in parts, or when the interaction is interrupted. It serves as a proxy for a single turn in the interaction between a user and the agent.


- Use a unique context_id for each turn in the conversation.
- The end parameter should be set to true at the end of each turn. This clears the context and allows the next turn to start.
- Audio responses are tagged with the context_id, allowing you to match responses with their corresponding inputs.
- The final flag will indicate that all audio for that context has been sent.
- Use a new context_id for the next turn.
- Multiple context_ids can run independently over the same connection.
- To cancel a context, send the clear parameter. If synthesis hasn’t started, the request is cancelled. If synthesis has started, the audio will still play to completion.

Let’s say you’re building a voice agent that helps users book flights.
You open a WebSocket connection when the conversation starts. This connection stays open and allows real-time audio streaming between user and agent.
User says: “I want to book a flight to Paris.”
Your backend processes this and generates an agent response:
Agent text: “Sure, when would you like to travel?”
You send this text to Murf’s WebSocket API with context_id “turn_1” and end set to true.
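A minimal sketch of what that message could look like, written as a small Python helper. The context_id and end keys come from the description above; the "text" key name and the helper `build_text_message` are assumptions made for illustration, so check the API reference for the exact message schema.

```python
import json


def build_text_message(context_id: str, text: str, end: bool = False) -> str:
    """Serialize one text message for a single turn.

    "text" is an assumed field name; context_id and end are the
    parameters described in this guide.
    """
    return json.dumps({"context_id": context_id, "text": text, "end": end})


# First agent turn: the whole reply is sent at once, so end is true.
message = build_text_message(
    "turn_1", "Sure, when would you like to travel?", end=True
)
# With an open connection object `ws` (hypothetical), this string would be
# passed to something like ws.send(message).
```

Keeping message construction separate from the socket call makes the payload easy to inspect and test without a live connection.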
Murf returns audio with context_id: “turn_1” so you can play it back to the user.
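On the receiving side, the context_id tag lets a client group audio by turn, and the final flag tells it when a turn's audio is complete. The sketch below assumes each incoming message is JSON with a base64-encoded "audio" field; that field name and the `ContextAudioBuffer` class are illustrative assumptions, not the documented response format.

```python
import base64
import json
from collections import defaultdict


class ContextAudioBuffer:
    """Collects audio chunks per context_id and tracks completed contexts."""

    def __init__(self) -> None:
        self.chunks = defaultdict(list)
        self.finished = set()

    def handle(self, raw_message: str) -> None:
        msg = json.loads(raw_message)
        context_id = msg["context_id"]
        audio_b64 = msg.get("audio")  # assumed field name for the audio data
        if audio_b64:
            self.chunks[context_id].append(base64.b64decode(audio_b64))
        if msg.get("final"):
            # final means all audio for this context has been sent.
            self.finished.add(context_id)

    def audio_for(self, context_id: str) -> bytes:
        return b"".join(self.chunks.get(context_id, []))


# Simulated incoming messages for turn_1 (placeholder bytes, not real audio).
buffer = ContextAudioBuffer()
buffer.handle(json.dumps(
    {"context_id": "turn_1", "audio": base64.b64encode(b"abc").decode()}
))
buffer.handle(json.dumps({"context_id": "turn_1", "final": True}))
```

Because chunks are keyed by context_id, the same buffer can serve several contexts running over one connection.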
User says: “Next Friday.”
Agent response: “Got it. Do you prefer morning or evening flights?”
You send this text to Murf’s WebSocket API with a new context_id, “turn_2”.
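When the agent text arrives in parts (for example, streamed from an LLM), each part can be sent under the same context_id, with end set to true only on the last one. The following sketch assumes that end may travel in the same message as the final text part and that the text field is named "text"; both are assumptions, and `messages_for_turn` is a hypothetical helper.

```python
import json
from typing import Iterable, Iterator


def messages_for_turn(context_id: str, parts: Iterable[str]) -> Iterator[str]:
    """Yield one JSON message per text part; only the last has end=True.

    Uses a one-item lookahead so it also works with a generator whose
    length is not known in advance.
    """
    iterator = iter(parts)
    try:
        current = next(iterator)
    except StopIteration:
        return  # no text, nothing to send
    for upcoming in iterator:
        yield json.dumps(
            {"context_id": context_id, "text": current, "end": False}
        )
        current = upcoming
    yield json.dumps({"context_id": context_id, "text": current, "end": True})


# Second agent turn, split into parts as an LLM might emit it.
turn_2 = list(messages_for_turn(
    "turn_2", ["Got it. ", "Do you prefer morning ", "or evening flights?"]
))
```

The lookahead delays each part by one step, which is the price of knowing which part is last without buffering the whole reply.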
Murf returns audio tagged with context_id: “turn_2”.
If the agent is mid-response and the user interrupts (e.g., says “Wait, make that Saturday”):
The clear: true flag cancels any pending or incomplete responses tied to earlier contexts.
You then send the new agent response with a new context_id.
Murf returns audio tagged with context_id: “turn_3”.
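The interruption flow above can be sketched as two outgoing messages: one that clears the interrupted context and one that starts the next turn. The "text" key, the helper `interrupt_and_restart`, and the sample agent reply are assumptions for illustration; only context_id, clear, and end come from this guide.

```python
import json
from typing import List


def interrupt_and_restart(
    old_context_id: str, new_context_id: str, new_text: str
) -> List[str]:
    """Build the messages to cancel one context and begin the next turn."""
    clear_message = json.dumps({"context_id": old_context_id, "clear": True})
    next_message = json.dumps(
        {"context_id": new_context_id, "text": new_text, "end": True}
    )
    return [clear_message, next_message]


# The user interrupts turn_2; the reply text here is a made-up placeholder.
outgoing = interrupt_and_restart(
    "turn_2", "turn_3", "No problem, Saturday it is."
)
```

Since audio whose synthesis has already started may still arrive, a client may also want to stop local playback and discard any incoming audio tagged with the cleared context_id.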