Context ID

Murf’s WebSocket implementation is built around the Context ID, a unique identifier used to track a specific TTS request. The Context ID preserves continuity in the conversation, especially when the input text is generated in real time (e.g., by an LLM), arrives in parts, or when the interaction is interrupted. It serves as a proxy for a single turn in the interaction between a user and the agent.

Why Use a Context ID?

  • Maintains Conversational Flow: Links input and output across split or partial messages.
  • Handles Interruptions Gracefully: If a user interrupts the agent, the Context ID ensures that the current request can be cancelled or skipped.
  • Supports Multi-Turn Interactions: Essential for structured flows like bookings, troubleshooting, or guided forms.
  • Simplifies Handoff and Debugging: If support is needed, the Context ID allows you to trace exactly what happened in a specific interaction turn.

Simple WebSocket Connection

Working with Context IDs

Input Stream

  • Assign a new context_id for each turn in the conversation.
  • Ensure input text includes proper punctuation for better prosody and consistent audio output.
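As a rough sketch of the input side, the helper below creates one context_id per turn and wraps each text chunk (for example, partial LLM output) in a message that carries that same ID. The function names and the ID format are illustrative assumptions; only the context_id and text fields come from this page.

```python
import json
import uuid
from typing import Iterable, List


def new_context_id(prefix: str = "turn") -> str:
    """Return a fresh identifier for one agent turn (the format is an assumption)."""
    return f"{prefix}_{uuid.uuid4().hex[:8]}"


def build_text_messages(context_id: str, chunks: Iterable[str]) -> List[str]:
    """Serialize each non-empty text chunk as a JSON message sharing one context_id."""
    messages = []
    for chunk in chunks:
        if not chunk.strip():
            continue  # skip empty chunks, which carry nothing to synthesize
        messages.append(json.dumps({"context_id": context_id, "text": chunk}))
    return messages
```

Each returned string would then be sent over the open WebSocket in order, so that all parts of the turn stay linked to the same context.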

Output Audio

  • The response will include the same context_id, allowing you to match responses with their corresponding inputs.
  • An is_final flag will indicate that all audio for that context has been sent.
  • Output is streamed in the same order that input text was received.
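One way to consume the output stream is to group incoming audio by context_id and note which contexts have reported is_final. The sketch below works on already-parsed messages; the "audio" field name and its base64 encoding are assumptions made for illustration, while context_id and is_final are the fields described above.

```python
import base64
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def collect_audio(messages: Iterable[dict]) -> Tuple[Dict[str, bytes], Set[str]]:
    """Group audio bytes by context_id and record which contexts are complete."""
    chunks: Dict[str, List[bytes]] = defaultdict(list)
    finished: Set[str] = set()
    for msg in messages:
        context_id = msg["context_id"]
        audio_b64 = msg.get("audio")  # assumed field name for the audio payload
        if audio_b64:
            chunks[context_id].append(base64.b64decode(audio_b64))
        if msg.get("is_final"):
            finished.add(context_id)  # all audio for this context has been sent
    return {cid: b"".join(parts) for cid, parts in chunks.items()}, finished
```

In a live agent you would more likely play each chunk as it arrives than buffer the whole turn; the grouping logic stays the same.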

Handling Interruptions

  • If the user interrupts the agent mid-response, start a new context_id for the next turn.
  • Murf’s WebSocket supports multiplexing, so multiple context_ids can run independently over the same connection.
  • To cancel a pending or in-progress turn, use the clear parameter. If synthesis hasn’t started, the request is cancelled. If synthesis has started, the audio will still play to completion.
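The interruption steps above can be sketched as a pair of messages: a clear request for the interrupted context, followed by the next turn under a new context_id. The helper name is hypothetical, and sending the two as separate WebSocket messages is an assumption; the context_id, clear, and text fields are the ones this page describes.

```python
import json
from typing import List


def interruption_messages(interrupted_id: str, new_id: str, new_text: str) -> List[str]:
    """Return the clear request followed by the replacement turn, as JSON strings."""
    clear_msg = {"context_id": interrupted_id, "clear": True}
    next_msg = {"context_id": new_id, "text": new_text}
    return [json.dumps(clear_msg), json.dumps(next_msg)]
```

Keeping the clear request first means the cancelled turn is addressed before any new text is queued.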

Example: Using a Context ID in a Voice Agent

Let’s say you’re building a voice agent that helps users book flights.

1. Establish WebSocket Connection

You open a WebSocket connection when the conversation starts. This connection stays open and allows real-time audio streaming between user and agent.

2. Use a Context ID for Each Agent Response (Turn)

Turn 1

User says: “I want to book a flight to Paris.”

Your backend processes this and generates an agent response:

Agent text: “Sure, when would you like to travel?”

You send this text to Murf’s WebSocket API:

{
  "context_id": "turn_1",
  "text": "Sure, when would you like to travel?"
}

Murf returns audio with context_id: “turn_1” so you can play it back to the user.

Turn 2

User says: “Next Friday.”

Agent response: “Got it. Do you prefer morning or evening flights?”

You send this text to Murf’s WebSocket API:

{
  "context_id": "turn_2",
  "text": "Got it. Do you prefer morning or evening flights?"
}

Murf returns audio tagged with context_id: “turn_2”.

3. Handle Interruptions

If the agent is mid-response and the user interrupts (e.g., says “Wait, make that Saturday”):

  • Stop playback of the current audio. The clear: true flag cancels any pending or incomplete responses tied to earlier contexts.
  • Send the updated agent reply with a new context_id:

// Clear the previous context
{
  "context_id": "turn_2",
  "clear": true
}
// Send the updated agent reply with a new context_id
{
  "context_id": "turn_3",
  "text": "Saturday works. Do you want to fly direct or with a stopover?"
}

Murf returns audio tagged with context_id: “turn_3”.
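Because synthesis that has already started still runs to completion, audio for an interrupted context may continue to arrive after the user has moved on. A client can guard against playing it by tracking the active turn, as in this minimal sketch. The PlaybackGate class is hypothetical and not part of Murf's API.

```python
from typing import Optional


class PlaybackGate:
    """Tracks the active turn so audio from interrupted contexts is not played."""

    def __init__(self) -> None:
        self.active_context: Optional[str] = None

    def start_turn(self, context_id: str) -> None:
        """Mark context_id as the only turn whose audio should be played."""
        self.active_context = context_id

    def should_play(self, message: dict) -> bool:
        """Return True only when the message belongs to the active turn."""
        return message.get("context_id") == self.active_context
```

In the flight-booking example, calling start_turn("turn_3") after the interruption would cause any late audio tagged turn_2 to be skipped.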

Why This Matters

  • Each Context ID represents one agent turn in the conversation.
  • You keep clean tracking of each response, even when the user interrupts.
  • A single WebSocket connection handles all turns, supporting real-time, fluid interaction.