Microsoft’s new zero-shot text-to-speech model can clone anyone’s voice from a three-second sample.
This article covers Microsoft’s recent research in text-to-speech, which has produced a voice counterpart to DALL-E.
Re...