Voice emulation is the software-generated reproduction of an individual's speech. The software applies deep learning, a machine learning approach built on neural networks, to speech synthesis, making it possible to imitate the voices of specific people.
Lyrebird, a Canadian AI startup based in Montreal, released software that can produce an imitation of anyone’s speech from a single minute of audio. Lyrebird’s algorithms can take a 60-second recording of a person’s speech as input and generate up to a thousand sentences within a half-second. The software can change the intonation to match a desired emotion, so the output speech sounds excited, for example, or angry or stressed out.
Adobe is working on a similar technology. The company’s Project VoCo system requires 20 minutes of input but then allows the user to edit recorded speech by editing its text transcript, much as Adobe Photoshop makes it possible to alter images.
The technology is not yet sophisticated enough to be completely convincing, but potential applications of voice emulation are promising. Lyrebird’s software could make it possible to have your favorite actor read you a book, or to “read” a book to your child while you are away from home. The software could also enable speech prostheses for people with disabilities, reproducing the user’s actual voice.
Other applications of voice emulation are less benign. An attacker could use the technology to masquerade as an authorized user in a voice recognition system, for example, or to imitate someone's voice saying something they hadn't actually said. Such statements could be used to damage a target's reputation or to spread false or weaponized information.
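To illustrate why an imitation can fool a voice-based login, here is a minimal sketch of a threshold check of the kind a speaker verification system might perform. Everything in it is an assumption made for illustration: the four-number "voiceprints" are invented, the function names and the 0.85 threshold are hypothetical, and real systems derive much larger feature vectors from audio and usually add further safeguards.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def is_accepted(enrolled, sample, threshold=0.85):
    """Accept the sample if it is similar enough to the enrolled voiceprint.

    The threshold value is a hypothetical choice, not taken from any product.
    """
    return cosine_similarity(enrolled, sample) >= threshold


# Invented toy voiceprints; a real system would compute these from audio.
enrolled_user = [0.9, 0.1, 0.4, 0.2]
emulated_voice = [0.85, 0.15, 0.42, 0.18]   # close imitation of the user
different_speaker = [0.1, 0.9, 0.2, 0.7]    # unrelated voice

print(is_accepted(enrolled_user, emulated_voice))     # True
print(is_accepted(enrolled_user, different_speaker))  # False
```

The check only measures how close a sample is to the enrolled voice. It has no way of telling whether the sample came from the real person or from software, which is why a sufficiently close emulation is accepted.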