Powerful, Feature Rich Alternative to Amazon Polly Text to Speech

Murf text to speech has 4 dream features for every content creator that Amazon Polly misses out on. Free Trial. Natural Voices. Simple UI. Customization.


This Amazon Polly alternative has 4 dream features for every content creator

Amazon Polly is a cloud-based service that converts text into lifelike speech. It produces natural sounding speech using advanced deep learning technologies. Over the last couple of years, text to speech (TTS) has found mainstream acceptance in entertainment (animation, game videos), marketing (product demos, marketing videos), contact centres (IVR, chat bot voices), assistive apps and devices, and personal voice assistants like Siri and Hey Google. Services like Amazon Polly make high quality voices through speech synthesis accessible and affordable, and they also offer real time speech generation. TTS has created entirely new categories of human-AI interactions through read-aloud applications, live translation and speech synchronized facial animation.

In this article we cover how Amazon Polly works, its key features, what it’s great and not so great for, and 4 features every content creator needs for the perfect voiceover.

How Amazon Polly Text to speech works

Amazon Polly text to speech works in 3 steps.

Input text

Polly synthesizes the text you enter into an audio stream. You can provide the input as plain text or in Speech Synthesis Markup Language (SSML) format. SSML tags help you control speech output properties like volume, pitch, and speaking rate. For custom pronunciations, Amazon Polly supports lexicons.
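As a rough illustration of SSML input, the sketch below wraps plain text in `<speak>` and `<prosody>` tags to set volume, pitch and rate. The helper name `build_ssml` and the sample sentence are made up for this example, and it uses only the Python standard library, so it builds the markup without calling Polly.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, volume="medium", pitch="medium", rate="medium"):
    """Wrap plain text in <speak> and <prosody> tags (hypothetical helper)."""
    settings = (("volume", volume), ("pitch", pitch), ("rate", rate))
    # quoteattr adds the surrounding quotes and escapes the attribute value.
    attrs = " ".join(f"{name}={quoteattr(value)}" for name, value in settings)
    # escape() protects characters such as & and < inside the spoken text.
    return f"<speak><prosody {attrs}>{escape(text)}</prosody></speak>"


ssml = build_ssml("Sales are up 20% & rising.", volume="loud", rate="slow")
print(ssml)
```

The resulting string would be sent as the input text with the text type set to SSML. Note that not every voice supports every prosody attribute, so check the voice you pick.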

Choose the ideal voice

Amazon Polly voices cover a range of lifelike voices, male and female, across multiple languages. Once you have entered your text, select the voice of your choice by specifying the voice ID. Amazon Polly will use this voice to generate the speech.

Create speech files

The audio files generated are available in multiple formats. The MP3 and Ogg Vorbis formats are intended for web and mobile applications, and the PCM output format is intended for AWS IoT devices and telephony solutions. You can also download the metadata stream alongside the audio file using speech marks.
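Speech marks are typically delivered as one JSON object per line, each describing a word, sentence or similar unit and its time offset in the audio. The sketch below parses that layout with the standard library. The function name `parse_speech_marks` is hypothetical, and the sample data is hand-written for illustration rather than real Polly output.

```python
import json


def parse_speech_marks(raw):
    """Parse line-delimited JSON speech marks into a list of dicts."""
    return [json.loads(line) for line in raw.splitlines() if line.strip()]


# Illustrative sample only; real marks come from a speech marks request.
sample = "\n".join([
    '{"time": 0, "type": "word", "start": 0, "end": 5, "value": "Hello"}',
    '{"time": 420, "type": "word", "start": 6, "end": 11, "value": "world"}',
])

marks = parse_speech_marks(sample)
for mark in marks:
    # "time" is the offset in milliseconds from the start of the audio.
    print(f'{mark["time"]:>5} ms  {mark["value"]}')
```

Timing data like this is what makes features such as word highlighting or lip-sync animation possible.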

Amazon Polly voices can be accessed via the Polly API (and various language-specific SDKs), AWS Management Console, and the AWS command-line interface (CLI).
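The three steps above can be sketched in Python. This is a minimal example, assuming the `boto3` SDK is installed and AWS credentials are configured. The helper name `synthesize_to_file`, the file name and the choice of the "Joanna" voice are illustrative. The function accepts any client object that exposes `synthesize_speech`, so the block itself runs without network access.

```python
from pathlib import Path


def synthesize_to_file(client, text, voice_id, path, output_format="mp3"):
    """Send text to Polly and save the returned audio stream (hypothetical helper)."""
    response = client.synthesize_speech(
        Text=text,                   # step 1: input text
        VoiceId=voice_id,            # step 2: chosen voice
        OutputFormat=output_format,  # step 3: audio format
    )
    audio = response["AudioStream"].read()
    Path(path).write_bytes(audio)
    return len(audio)  # number of bytes written


# Usage with boto3 (needs AWS credentials and network access):
#   import boto3
#   polly = boto3.client("polly")
#   synthesize_to_file(polly, "Hello from Polly.", "Joanna", "hello.mp3")
```

Passing the client in as an argument keeps the helper easy to test with a stand-in object before pointing it at a real AWS account.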

Key Benefits of Amazon Text to Speech

Voice and language options

Amazon launched Polly in November 2016 with 47 natural sounding voices across 24 languages. Today you can synthesize speech using any of the 68 male and female voices across 24 languages and accents.

There are two types of voices - Standard TTS voices and Neural TTS voices.

Standard TTS voices use concatenative synthesis, which involves stringing together the phonemes of recorded speech.

Neural TTS (NTTS) voices are generated by a two-part system that emphasizes frequency characteristics that are unique to human speech. NTTS also has a newscaster style for narration based use cases.

Fast response time

Amazon Polly uses the power of the cloud and the agility of its APIs to deliver near instant speech generation. Its low latency is particularly useful for real time applications like dialog systems and assisted communication. Polly also allows you to optimise the streamed audio through various sampling rates.
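To make the sampling rate option concrete, the sketch below checks a requested rate against the output format before building the request parameters. The table of allowed rates reflects the AWS documentation as best recalled at the time of writing and should be treated as an assumption. The helper name `build_request` and the sample text are hypothetical.

```python
# Assumed allowed sample rates per output format; verify against current AWS docs.
VALID_SAMPLE_RATES = {
    "mp3": {"8000", "16000", "22050", "24000"},
    "ogg_vorbis": {"8000", "16000", "22050", "24000"},
    "pcm": {"8000", "16000"},
}


def build_request(text, voice_id, output_format, sample_rate):
    """Return keyword arguments for a synthesis call, validating the sample rate."""
    allowed = VALID_SAMPLE_RATES.get(output_format)
    if allowed is None:
        raise ValueError(f"unknown output format: {output_format}")
    rate = str(sample_rate)  # the API expects the rate as a string
    if rate not in allowed:
        raise ValueError(f"{rate} Hz is not valid for {output_format}")
    return {
        "Text": text,
        "VoiceId": voice_id,
        "OutputFormat": output_format,
        "SampleRate": rate,
    }


# A lower rate such as 8000 Hz suits telephony-style audio.
request = build_request("Your call is important to us.", "Joanna", "pcm", 8000)
print(request)
```

Lower sampling rates produce smaller streams, which can matter more than fidelity for phone lines or constrained devices.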

Custom brand voice

Brands can work with the Amazon Polly team to build an exclusive Neural Text-to-Speech (NTTS) voice. A brand voice is another touchpoint to reinforce a brand’s identity. To learn more, read our article on brand voice.

What Amazon Polly text to speech is great for

Reliable TTS services

According to G2 reviews, users find Amazon Polly easy to access and use. The most popular use cases are chat bot audio, help desk queries and interactive voice response (IVR). Polly also offers one bilingual voice that can speak both English and a foreign language in the same sentence, and newscaster voices designed to deliver high quality voice output in news form.

Simple API operations that generate lifelike speech

Developer teams can leverage Amazon Polly's API through SDKs or the CLI to build speech enabled applications. AWS users can also use the WordPress and Medium plugins to create audio content for their blogs, pages and websites. The API returns the audio to your application as a stream so you can play the voices immediately.

Unbeatable price for AWS customers

AWS free tier users get five million characters free every month for the first year. This is an irresistible offer for existing AWS users looking for TTS services to access high quality voices. With Polly, Amazon’s strength in cloud management combines with on-demand audio streaming to download, store and redistribute speech. Amazon Polly has a pay-as-you-go model that charges only for text synthesized, which users find cheaper and more efficient.
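The pay-as-you-go arithmetic can be sketched as below. The five million character free allowance comes from the paragraph above. The per-million price of 4.00 is a placeholder chosen for illustration, not a quoted AWS rate, and the function name `estimate_monthly_cost` is hypothetical.

```python
def estimate_monthly_cost(characters, price_per_million, free_characters=5_000_000):
    """Estimate a monthly bill: only characters beyond the free allowance are charged."""
    billable = max(0, characters - free_characters)
    return billable * price_per_million / 1_000_000


# Placeholder rate of 4.00 per million characters; check current AWS pricing.
cost = estimate_monthly_cost(7_500_000, price_per_million=4.00)
print(f"Estimated cost: {cost:.2f}")
```

With these assumed numbers, 7.5 million characters leave 2.5 million billable characters, so the estimate is 10.00. After the first year the free allowance would be set to zero.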

What you won't find in Amazon Polly

Non-text input and non-audio output files

Amazon Polly synthesizes text into high quality voices through a series of specific commands that need to be manually entered. The input text needs to be in one of the 24 recognised languages, and the output will be audio in the voice selected. While this is great for standard text to speech actions, you cannot add a voiceover to your video, for example, or edit an existing audio file in Amazon Polly.

Speech recognition services

Speech recognition services include dictation, voice typing and transcription. This is called speech to text and is available as a separate application called Amazon Transcribe.

An intuitive user interface

The Amazon Polly interface is a single page with a box for text and a couple of drop downs for the output format. Refinements like speaking style, speech rate and pitch are made with SSML tags, and custom pronunciations are done through lexicons. In other applications, Polly’s output can be coded through an SDK or an API. Given these requirements, the process of generating speech with specifications can be intimidating to non-developers.

What you need to create the best voiceovers

Voice’s role in mainstream content has amped up in tune with the spectacular growth of online video. Podcasts and audiobooks have further increased the demand for high quality audio. Multimedia content continues to be effective and persuasive, and has spurred growth in complementary media strategies like audio blogs, animated avatars and even karaoke style word highlighting. Studies have shown that the combined effect of video and audio in digital marketing, product demos, reviews and other content can boost engagement and purchase metrics.

Consumer content needs to work on multiple parameters to engage its audience. To create the perfect voiceover, here are 4 features that every content creator should look for in a TTS tool:

A no-strings-attached free trial

Amazon’s TTS service might make obvious sense to some existing AWS customers, but others need to jump through multiple hoops to try Polly’s voices, notably an AWS account and a credit card.

A free trial helps creators evaluate if a text to speech tool meets their requirements. With just an email signup, Murf offers every new user 10 minutes of voice generation time. No credit card is required, and users can access Murf Studio till the minutes are exhausted. Murf also offers usage-led pricing plans, including a one time pack that’s an excellent stepping stone to a subscription.

Emotive AI voices

Even in today’s screen-dictated world, voice remains the prevalent method of communication. As conversational chatbots and remote contact centers proliferate, the human voice is an enduring example of the emotive power of sound. It is capable of an incredibly complex range of tone, intonation and pitch. No voice is like any other.

This social power of voice lies in its non-verbal ability to convey emotional cues. Each of the 120+ natural sounding voices in Murf Studio is a personality in itself, in that each one was created with a specific tone and use case in mind. The same words, said in a different voice, can have entirely new meaning. Murf voices are high quality voices that go beyond words and sounds by adding character to content through depth of feeling.

Comprehensive audio editing options

Amazon Polly is great for building speech enabled products and applications, where the generated speech has pre-coded requirements. However, content creation use cases like audio books, elearning modules, product demos and video ads are essentially marketing assets with a specific audience in mind.

To stand out in the crowd, consumer content needs to be relevant and engaging. Ensuring that the voice and the content fit together perfectly is a critical component to this success.

Murf offers high quality voices to suit every possible use case as well as comprehensive audio editing options. You can edit the audio for emphasis, pitch, speed and most importantly, pronunciation. You can then sync the edited audio with video or images in Murf itself.

Murf also has an add-on for Google Slides, which lets you add realistic voice overs to your presentations.

All-in-one voice maker

Beyond a basic web interface to input the text and get an audio output, Polly relies on APIs for its usage. Murf, on the other hand, is not just another text to speech tool. It is the ultimate toolkit to make voice over videos. You can upload images and videos, add the voices of your choice and sync them together in the feature-rich Studio interface. You can also adjust timings, fade in and fade out music, and move around the audio and video content to your satisfaction.


Amazon Polly text to speech (TTS) is a cloud-based service that converts text into lifelike speech. TTS is driven by a technology that evolves as it handles more data, a process called Deep Learning.

Amazon Polly uses high quality voices tuned for a global audience. For particular sentences, words and sounds, pronunciation can be customized through lexicons, and metadata about the synthesized audio stream is available through speech marks. Audio stream applications use Amazon Polly's fluid pronunciation in content creation, audio-only assets as well as individual services like real time translation.

However, Polly has some drawbacks. It cannot handle non-standard speech generation requests, it works best through APIs, and its speech to text services are only available through a separate application.

Every content creator understands that consumers want relevant, engaging content with a clearly stated benefit. Murf offers 4 features designed to help build perfect voiceovers for every need:

  1. A free trial on email signup.
  2. 120+ studio-quality AI voices across age, gender and accents that can also be filtered by use case.
  3. User-friendly editing interfaces to fine-tune pitch, emphasis, speed and pronunciation of the audio.
  4. A complete voice over maker to create everything from audio books and caller scripts to video ads and elearning modules.

With a powerful, feature-rich web-based platform like Murf, creating global content need not feel out of reach.