Designing for the Ear: How Prompting for Voice Agents Differs from Regular LLM Prompts

Picture this: You’re booking a table online, and you see a familiar block of text - “Please provide your full name, preferred reservation time, and the number of guests in the fields below to complete your booking.” It works perfectly on a form.
Now flip the script. Imagine calling a restaurant, and the voice agent rattles off that entire sentence in one go. You’d probably tune out, or worse, hang up.
A good voice agent breaks things down, asks one simple question at a time, and sounds more like a helpful host than a form on autopilot:
“Hi there! Can I get your name?”
“Great, and when would you like to come in?”
“Perfect. How many people should I reserve the table for?”
Writing for the screen is not the same as writing for the ear. This is what prompt writers must keep in mind when designing voice agents. Yet most prompt-writing advice assumes your output is meant to be read - a blog post, a poem, or a clever answer. Voice agents are different. You’re not crafting a one-shot response or a tidy paragraph. You’re orchestrating a conversation: something dynamic, spoken, and multi-turn.
Hence, a good prompt for a voice agent needs to simulate a live conversation, complete with timing, turn-taking, tone, guardrails, and graceful failure modes. It must handle interruptions, clarify ambiguity, and adapt responses, all while sounding like a human who knows what they’re doing.
At Murf, after extensive iteration and hands-on experience building voice agents, we developed an intuitive, modular framework for prompt design that anyone on the team can use. We tried coming up with a cool name for the framework, but decided to focus on what we know best - sharing our learnings.
So here is the five-step process.

Let’s delve into each element of the framework, building a prompt for a restaurant front-desk voice agent as we go.
1. Goal: Define the agent’s purpose, tightly.
This sets the agent’s reason for existence.
Example:
"You are a professional, helpful front-desk agent at an Italian restaurant. Your goal is to help customers make a table reservation over the phone."
This sounds simple, but it does a lot:
- Informs the LLM of what is expected of it
- Narrows down the domain
- Touches upon the boundaries of the conversation
Hacks:
- Make the goal specific to the use case: refund processing, booking reservations, resetting passwords
- Frame it as a role-playing scenario
Watch-outs:
- Don’t set singular metric goals like “achieve 90% satisfaction.” LLMs don’t interpret incentives like humans do and may prioritize customer satisfaction even when the customer is at fault, which can be harmful to the business.
2. Tonality: Set the voice’s personality
LLMs are surprisingly good at adjusting tone, if you ask nicely!
Example:
"You will speak in a friendly, efficient tone. Use clear, concise language that helps busy guests book a table quickly and easily."
Tone is crucial for trust. The same instruction with a different tone yields wildly different outputs:
- Professional tone: “I’d be happy to assist with your reservation.”
- Playful tone: “Let’s get that table locked in for you!”
Hacks:
- Use specific adjectives: “empathetic,” “assertive,” “friendly”
- Describe the audience: “You’re speaking to users unfamiliar with tech”
- Give explicit instructions for tone variation in complex scenarios - for example, a loan collection agent whose tone should shift as the conversation progresses (see the sketch below)
Watch-outs:
- Avoid conflicting tone instructions (e.g., “be both formal and casual”)
- Overly complex or nuanced tone requests may confuse the model and lead to inconsistent results
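To make tone variation concrete, here is a minimal sketch of how tone instructions could shift by call stage for a loan collection agent. It assumes the system prompt is assembled in Python; the stage names, wording, and the `tonality_block` helper are illustrative, not a prescribed format.

```python
# Hypothetical sketch: tone instructions that vary by call stage for a
# loan collection agent. Stage names and wording are illustrative only.
TONE_BY_STAGE = {
    "first_reminder": "Warm and understanding. Assume the missed payment is an oversight.",
    "second_reminder": "Polite but firm. State the overdue amount and due date clearly.",
    "final_notice": "Calm, formal, and assertive. Explain next steps without threats.",
}

def tonality_block(stage: str) -> str:
    """Return the tonality section of the system prompt for a given call stage."""
    return (
        "Tonality: "
        + TONE_BY_STAGE[stage]
        + " Always remain respectful and never use aggressive language."
    )

print(tonality_block("second_reminder"))
```

One unambiguous tone per stage is far easier for the model to follow than a single instruction that tries to cover every situation.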
3. Context: Anchor the agent in the real world.
LLMs live in a word soup unless you define the setting.
Example:
“You are speaking with a guest on a phone call who is booking a table at a restaurant. Keep responses under 20 words. Use no formatting or bullet points. Speak clearly, use simple grammar, and be prepared for background noise or interruptions. If you don’t understand something, politely ask the guest to repeat. Understand the user’s spoken requests, even if the speech-to-text transcription contains errors. Your responses will be converted to speech using a text-to-speech system. Therefore, your output must be plain, unformatted text.”
Hacks:
- Add environmental context: “Assume the guest may be in a noisy place.”
- Handle common misrecognitions: “If you’re unsure what the guest said, politely confirm or repeat the info back.”
- Look at both the macro and micro context. The macro context is a guest calling on the phone; the micro context is interfacing with the speech-to-text and text-to-speech layers.
Watch-outs:
- Don’t overload with details; too much context can confuse the model.
- Use Speech Synthesis Markup Language (SSML) with your TTS layer to handle things like pauses. For example, instead of outputting “pause briefly” as text, prompt the LLM to generate an SSML tag, as sketched below.
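Here is a minimal sketch of that last point, assuming your TTS layer accepts SSML. The instruction wording is illustrative; the `<break>` tag it asks for is standard SSML.

```python
# Minimal sketch, assuming the TTS layer accepts SSML. The instruction tells the
# model to emit a standard SSML <break> tag instead of describing the pause in words.
SSML_INSTRUCTION = (
    "When a short pause would sound natural, insert an SSML break tag such as "
    '<break time="500ms"/> instead of writing words like "pause briefly".'
)

# The kind of output we want the model to produce:
desired_output = 'Let me check availability. <break time="500ms"/> Yes, we have a table at 7 PM.'
```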
4. Structure: Outline the expected conversation flow.
A clear structure brings reliability and predictability to the conversation. This is where you customize the conversation flow. For a restaurant reservation, do you want to check seating preferences (indoor vs. outdoor)? Should you ask about allergies upfront? Do you offer a tasting menu and take bookings for it in advance? You can make it highly personalized to your use case.
Example structure:
“Greet the guest and ask how you can help with their reservation. Collect their name, preferred time, and seating preference (indoor/outdoor). Ask about allergies upfront and whether they’d like to book a tasting menu. Confirm all details and end by letting them know their booking is confirmed. If the guest’s response is unclear, politely ask them to repeat.”
Hacks:
- Define each step as a discrete turn, mirroring the logical flow of conversation design (see the sketch below).
- Use explicit instructions like “confirm details” to prevent ambiguity.
- Embed basic recovery logic, for example, “If the guest’s response is unclear, ask them to repeat.”
Watch-outs:
- Avoid long, multi-part prompts that blur steps and increase the risk of errors.
- Don’t assume one turn fits all; allow for corrections or unexpected user behavior (e.g., changing details mid-conversation).
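As an illustration of the first hack above, here is a minimal sketch that encodes the reservation flow as numbered, discrete turns before dropping it into the system prompt. The step wording mirrors the example above; the variable names are purely illustrative.

```python
# Sketch: encode the conversation flow as numbered, discrete turns.
# Asking the model to complete one step per turn keeps the flow predictable
# and makes it easier to test turn by turn.
RESERVATION_STEPS = [
    "Greet the guest and ask how you can help with their reservation.",
    "Ask for the guest's name.",
    "Ask for their preferred date and time.",
    "Ask whether they prefer indoor or outdoor seating.",
    "Ask about allergies and whether they'd like to book the tasting menu.",
    "Read back all details and confirm the booking.",
]

STRUCTURE_BLOCK = (
    "Follow these steps, one per turn:\n"
    + "\n".join(f"{i}. {step}" for i, step in enumerate(RESERVATION_STEPS, start=1))
    + "\nIf a response is unclear, politely ask the guest to repeat before moving on."
)
```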
5. Guardrails: Define what the agent shouldn’t say or do.
Voice agents operate in unpredictable environments. Guardrails ensure safe, reliable, and professional interactions.
Example:
“If a guest asks about anything outside reservations, reply, ‘I’m here to help with bookings only.’ If inappropriate language is used, respond calmly with, ‘Let’s keep the conversation respectful.’”
Hacks:
- Frame rules as positive behaviors (e.g., “Focus on booking requests”) rather than a list of prohibitions.
- Use explicit fallback responses for out-of-scope queries or inappropriate input.
- Regularly reinforce boundaries in prompt design to avoid the agent drifting off-topic.
Watch-outs:
- Avoid ambiguous or generic fallback responses that may frustrate users (e.g., “I can’t help with that.”).
- Don’t overload with too many restrictions; keep the scope focused but the tone friendly and human.
- Regularly review and update guardrails based on real user interactions and edge cases.
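Putting the five modules together, the full system prompt can be assembled one block per step. The sketch below assumes a Python-based setup; the block names and exact wording are illustrative, not a required format.

```python
# Sketch: assemble the full system prompt from the five framework modules.
# Each block is just a string; the names are illustrative, not a required format.
GOAL = (
    "You are a professional, helpful front-desk agent at an Italian restaurant. "
    "Your goal is to help customers make a table reservation over the phone."
)
TONALITY = (
    "Speak in a friendly, efficient tone. Use clear, concise language that helps "
    "busy guests book a table quickly and easily."
)
CONTEXT = (
    "You are on a phone call. Keep responses under 20 words, use plain unformatted "
    "text, and expect speech-to-text errors; politely confirm anything unclear."
)
STRUCTURE = (
    "Collect the guest's name, preferred time, seating preference, and allergies, "
    "one question per turn, then confirm all details before ending the call."
)
GUARDRAILS = (
    "If a request is outside reservations, say: 'I'm here to help with bookings only.' "
    "If inappropriate language is used, say: 'Let's keep the conversation respectful.'"
)

SYSTEM_PROMPT = "\n\n".join([GOAL, TONALITY, CONTEXT, STRUCTURE, GUARDRAILS])
```

Keeping each module separate also makes it easy to iterate on one element (say, tonality) without touching the rest.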
This framework offers a solid starting point for designing conversations through prompts. But to truly optimize your prompts, there are a few broader principles to keep in mind.
- Prompt Length: Balance clarity with cost efficiency. Every turn consumes input tokens, and unnecessary verbosity drives up token-based billing.
- Evaluation: Rigorous testing is essential. Define success criteria (e.g., clarity, relevance, consistency), cover edge cases, and evaluate with subject-matter experts. When you change any part of the prompt, retest everything.
- Knowledge: For real-time data like menu details, it’s better to use function-calling or retrieval mechanisms instead of embedding dynamic info directly in prompts. This keeps the prompt lean and adaptable (see the sketch below).
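As a sketch of that last point, here is one way to expose menu data through function-calling rather than pasting it into the prompt. It uses an OpenAI-style tool schema as an assumed format; the `get_menu_items` name, fields, and return values are hypothetical, and the same idea applies to whichever LLM API your voice stack uses.

```python
# Sketch: expose real-time menu data through a tool definition instead of
# embedding it in the prompt. OpenAI-style schema; names and fields are hypothetical.
MENU_TOOL = {
    "type": "function",
    "function": {
        "name": "get_menu_items",
        "description": "Look up current menu items, prices, and availability.",
        "parameters": {
            "type": "object",
            "properties": {
                "category": {
                    "type": "string",
                    "description": "For example 'tasting menu', 'mains', or 'desserts'.",
                },
            },
            "required": ["category"],
        },
    },
}

def get_menu_items(category: str) -> list[dict]:
    """Hypothetical backend lookup; in production this would query the restaurant's menu system."""
    return [{"name": "Chef's Tasting Menu", "price": 95, "available": True}]
```

The prompt only needs to say when to call the tool; the data itself stays out of the prompt and can change without a prompt update.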
When building voice agents, your prompt is not just a set of instructions - it’s a script, a strategy, and a safeguard all rolled into one. And unlike blog writing, your user can interrupt at any moment. So write like a director.
