When cloning a voice, it’s important to consider what the AI has been trained on: which languages and what type of dataset. In this case, the following are available ...
With the introduction of the newer models, we also added a style exaggeration setting. This setting attempts to amplify the style of the original speaker. It does consume addi ...
Try changing your voice settings; you'll find them in the "Voice Settings" tab. Each attempt to generate a voice will bring a different result (especially visible at low stability) ...
We only deliver audio in MP3 format: 44.1 kHz, 16-bit, at 96 kbps.
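As a rough illustration of what the 96 kbps figure means in practice, the sketch below estimates the size of a delivered file from its duration. It assumes a constant bitrate and ignores headers and metadata, so treat the result as an approximation; the function name is hypothetical.

```python
def estimate_mp3_bytes(duration_seconds, bitrate_kbps=96):
    """Approximate MP3 size in bytes, assuming a constant bitrate.

    Assumption: 1 kbps = 1000 bits per second, and headers/metadata
    are ignored, so real files will differ slightly.
    """
    if duration_seconds < 0:
        raise ValueError("duration must not be negative")
    bits = duration_seconds * bitrate_kbps * 1000
    return bits / 8


# One minute of audio at 96 kbps is roughly 720 kB.
one_minute = estimate_mp3_bytes(60)
```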
The similarity slider dictates how closely the AI should adhere to the original voice when attempting to replicate it. If the original audio is of poor quality and the similar ...
As we are always proud to say, our platform is easy for everyone to use! Here is a quick tutorial on how to create a voice -
We are working on features that will enable pauses in text more easily and reliably. There are a few ways to introduce pauses into the generated speech for now. The most r ...
We know that the voices tend to degrade or start whispering during longer audio generations, and our team is working hard to develop the technology to improve this. This issue is m ...
Audio outputs and their corresponding text prompts. In this part, we’re highlighting what the text-to-speech AI can do, particularly in expressing a variety of emotions. ...
Multilingual v2: This model has good stability, great language diversity, and fantastic accuracy in cloning voices and accents. Its speed is rather remarkable consider ...
A guide on using the stability and similarity sliders for tailored voice performances in Voiceover Air. Learn how to strike a balance between emotive and consistent audio outputs. Our u ...
Numbers, acronyms, and foreign words sometimes default to English when prompted in a different language. For instance, the number "11" or the word "radio", typed in a Spanish promp ...
Based on varying user feedback and test results, it’s been theorised that using a single long sample for voice cloning has brought more success for some, compared to u ...
We do not have any integrated solution to force a certain pronunciation. However, we are developing a proper solution and the tools to force and fine-tune pronunciations. But, at t ...
All created voices are expected to maintain most of their original speech characteristics across all languages, including their original accent.
The stability slider determines how stable the voice is and the randomness between each generation. Lowering this slider introduces a broader emotional range for the voice. A ...
There are a few ways to introduce a pause or break and influence the rhythm and cadence of the speaker. The most consistent way is programmatically using the syntax < break ...
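A minimal sketch of adding a pause programmatically. The exact syntax is cut off in the text above, so the tag used here is an assumption: it follows the standard SSML form `<break time="1.5s" />`, and the helper name is hypothetical. Check the platform's own documentation before relying on it.

```python
def with_pause(before, after, seconds):
    """Join two text fragments with an SSML-style break tag between them.

    Assumption: the platform accepts the SSML form <break time="Ns" />.
    """
    if seconds <= 0:
        raise ValueError("pause length must be positive")
    return f'{before} <break time="{seconds:.1f}s" /> {after}'


prompt = with_pause("Hold on, let me think.", "Alright, I have it.", 1.5)
```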
A guide on how to generate voiceovers using your voice in Voiceover Air. Now that you have your voice, it’s time to generate some voiceovers! To convert text to  ...
Non-text characters and punctuation such as {, }, <, >, [, ] will usually result in low-quality speech generated by the model.
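One way to catch these characters before generating audio is a quick check over the prompt. This is only a sketch: the character set is taken from the examples listed above and is not an official or complete list, and the function name is hypothetical.

```python
# Characters named above as likely to degrade output (not an exhaustive list).
RISKY_CHARACTERS = set("{}<>[]")


def find_risky_characters(text):
    """Return (index, character) pairs for each risky character in text."""
    return [(i, ch) for i, ch in enumerate(text) if ch in RISKY_CHARACTERS]


issues = find_risky_characters("Hello [world]")
```

An empty result means none of the listed characters were found; it does not guarantee the prompt will render well.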
We are working on features that will allow for speed optimization.
Unfortunately, we don’t have any such list of symbols. While the model responds to changes in pronunciation, there isn’t a predefined list of symbols that coul ...
We plan on introducing features that allow emotions such as laughter later in the year.
Effective techniques to guide Voiceover Air AI in adding pauses, conveying emotions, and pacing the speech.
These options are inconsistent and might not always work. We recommend using the syntax above for consistency. One trick that seems to provide the most consistent output - s ...
The AI has been trained on a vast amount of audio. The type of audio varies, but the most prominent is audiobooks. This is the context it understands the best, and it provi ...
When the voice drops in volume, whispers, or distorts, this is most likely a stability issue. How prevalent this is also depends on the voice used and how wide the dynamic range ...
This is another setting that was introduced in the new models. The setting itself is quite self-explanatory – it boosts the similarity to the original speaker. However, ...
You have a maximum of 15,000 characters per production, and you can create multiple voices and voice segments with a maximum of 1,500 characters each. The 1,500 limit is per voice segment, as anything mor ...
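To stay within these limits, a longer script can be split into segments before it is pasted in. The sketch below is one possible approach, not a feature of the platform: it breaks on whitespace, counts every character including spaces (an assumption about how the limit is measured), and uses hypothetical names throughout.

```python
MAX_SEGMENT_CHARS = 1500      # per voice segment, from the limit stated above
MAX_PRODUCTION_CHARS = 15000  # per production, from the limit stated above


def split_into_segments(text, max_chars=MAX_SEGMENT_CHARS):
    """Split text into chunks of at most max_chars, breaking on whitespace."""
    if len(text) > MAX_PRODUCTION_CHARS:
        raise ValueError("text exceeds the per-production limit")
    segments = []
    current = ""
    for word in text.split():
        # A single word longer than the limit is hard-split.
        while len(word) > max_chars:
            if current:
                segments.append(current)
                current = ""
            segments.append(word[:max_chars])
            word = word[max_chars:]
        candidate = word if not current else current + " " + word
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # The word does not fit, so close this segment and start a new one.
            segments.append(current)
            current = word
    if current:
        segments.append(current)
    return segments
```

Splitting on sentence boundaries instead of arbitrary whitespace would likely sound more natural, since each segment is generated separately.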