Voice generation time (VGT) is the total length of the speech generated across all of your text blocks. It is consumed every time you render a newly created text block or modify the text in an existing text block.
Changing the Voice actor, Style, Pitch, Speed, Pause, Emphasis, Pronunciation, Punctuation, or Volume of the generated speech for the same text does not consume any voice generation time.
The duration of the generated audio file, in seconds, is deducted from your available balance, which you can track at the top of your screen in the Studio. The deduction applies whether or not you download the audio. To keep your usage low while testing voices, we recommend that you test text to voice with shorter paragraphs.
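As a rough illustration of the rules above, the deduction can be modeled as a small ledger. This is only a sketch under stated assumptions: the class `VgtLedger`, the `render` method, the block ids, and the sample durations are all hypothetical and are not part of the Studio or any official API. It assumes a render is charged only when the block's text is new or has changed since the last charged render.

```python
from dataclasses import dataclass, field


@dataclass
class VgtLedger:
    """Toy model of a voice generation time balance, in seconds (hypothetical)."""
    balance_seconds: float
    # Text last charged for each block, keyed by a made-up block id.
    charged_text: dict = field(default_factory=dict)

    def render(self, block_id: str, text: str, audio_seconds: float) -> float:
        """Record a render and return the seconds deducted for it."""
        if self.charged_text.get(block_id) == text:
            # Same text, only voice settings changed: nothing is deducted.
            return 0.0
        # New block or edited text: deduct the generated audio duration.
        self.charged_text[block_id] = text
        self.balance_seconds -= audio_seconds
        return audio_seconds


ledger = VgtLedger(balance_seconds=600.0)
ledger.render("block-1", "Welcome to our product tour.", 4.0)  # new block: charged
ledger.render("block-1", "Welcome to our product tour.", 3.5)  # speed change only: free
ledger.render("block-1", "Welcome to the product tour.", 4.0)  # text edited: charged
print(ledger.balance_seconds)  # prints 592.0
```

In this example only the first and third renders reduce the balance, so 8 seconds are deducted from the starting 600. Downloading the audio plays no part in the calculation.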