A magic eraser can work wonders on an iron, especially if your iron is spotted with hard water. Get your magic eraser wet, and then rub your cool iron along the sponge until the stains come off. Re-wet the sponge as needed.
Voicebox is a non-autoregressive flow-matching model trained to infill speech given audio context and text. We train an English-only Voicebox on 60K hours of data and a multilingual version on 50K hours of data covering six languages (English, French, German, Spanish, Polish, and Portuguese).
Voicebox can perform tasks it was not explicitly trained on through in-context learning. It is more flexible than autoregressive models because it can condition on future context as well as past context. We show that Voicebox can be used for monolingual and cross-lingual zero-shot text-to-speech synthesis, style conversion, transient noise removal, content editing, and diverse sample generation.
Interrupted by a doorbell or a barking dog while recording speech? There is no need to re-record. Voicebox can be used like a magic eraser to remove transient noise by regenerating the noise-corrupted speech.
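As a rough illustration of the infilling setup behind noise removal, the sketch below marks which frames of a recording would be regenerated and which would be kept as context. It is not Voicebox code: the function name `build_infill_mask`, the 100 frames-per-second rate, and the span format are all assumptions made for this example, and the model call that would consume the mask is left out.

```python
import math
from typing import List, Tuple


def build_infill_mask(
    num_frames: int,
    noisy_spans_sec: List[Tuple[float, float]],
    frames_per_sec: float = 100.0,  # assumed frame rate, not taken from the source
) -> List[bool]:
    """Return a per-frame mask: True = regenerate this frame, False = keep as context."""
    mask = [False] * num_frames
    for start_sec, end_sec in noisy_spans_sec:
        # Round outward so the whole noisy interval is covered, then clamp to the clip.
        start = max(0, math.floor(start_sec * frames_per_sec))
        end = min(num_frames, math.ceil(end_sec * frames_per_sec))
        for i in range(start, end):
            mask[i] = True
    return mask


# Hypothetical usage: a 5-second clip with a noise burst between 1.0 s and 1.5 s.
mask = build_infill_mask(num_frames=500, noisy_spans_sec=[(1.0, 1.5)])
print(sum(mask), "of", len(mask), "frames would be regenerated")
```

Frames on both sides of the masked span stay available as context, which is the property that lets an infilling model condition on future as well as past audio.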
Through in-context learning, Voicebox can synthesize speech in any audio style by taking as input a reference audio of the desired style and the text to synthesize. It produces speech that is coherent with the reference in every aspect, including voice, background noise, and speaking style.