A few weeks ago at Adobe MAX, the tech giant may have killed voice acting. Adobe Voco could be a big problem.
Adobe MAX is the Photoshop firm’s annual conference. It’s their equivalent of Apple WWDC and Google I/O. At the 4-day event, Adobe holds panels, workshops, and classes. They give keynote speeches, and they show all the crazy new tech they’ve been working on.
And some of it is way out there.
What Is Adobe Voco?
In the middle of the conference, Adobe held a “MAX Sneaks” session. Executives demoed 11 experimental technologies that are in the pipeline. Comedy legend Jordan Peele (Key & Peele) hosted the event along with Adobe Community Engagement Manager Kim Chambers, ensuring that there were plenty of laughs to be had along the way.
The company brought out the big guns right off the bat: the first technology shown was Adobe Voco.
They’re billing it as “Photoshop for Voices”. During the demo, engineer Zeyu Jin demonstrated how the software works. Taking a piece of audio of Peele saying “I kissed my dogs and my wife”, Jin first copied and pasted the word “wife”, transforming the audio to “I kissed my wife and my wife.”
No big deal; that’s nothing that can’t be done with Audacity. Next, Jin brought up an interface showing the sound bite transcribed into text. Rather than pasting the audio snippet of “dogs” back into the clip, he simply typed it in.
In the sound bite, recorded Jordan obediently said “I kissed my wife and my dogs.”
And it worked. It was a neat piece of UI wizardry, and would certainly make editing easier.
Then the real shock. Jin erased “wife” and typed “Jordan.” Upon pressing “Play”, Voco did its thing.
“I kissed Jordan and my dogs.”
Peele hadn’t said his own name anywhere in the audio file onscreen. The software created it.
What Adobe Voco Means for Voice Acting
Thankfully, we’re not in much danger of voice actors being replaced by computer software. At least not entirely.
Adobe Voco isn’t able to create voices out of whole cloth. In other words, a user can’t simply type in “I want a bloodthirsty rodent and a dopey tree” and cast Guardians of the Galaxy 3 without signing new contracts with Bradley Cooper and Vin Diesel.
The software is meant for small edits and corrections to existing recorded audio. If a few words need to be added after a voice actor has left the booth, Voco can handle it.
From the side of the stage, the real Jordan Peele immediately asked how much audio is required by the software to make edits. Peele has done his fair share of voice acting work, and he must have seen the potential right away.
Jin answered that Adobe Voco can’t just take any piece of audio and edit it. The software needs about 20 minutes of solid audio, and has limitations on how far afield the edits can go.
Presumably, it’s hunting down phonemes within the recording and then rearranging them to form new words. Phonemes are basically the molecules that make up words. When we make a “T” sound, for example, that sound can be used in any number of words.
If there aren’t enough phonemes to sample from, then Voco doesn’t have anything to work with.
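To make that guess concrete, here is a toy sketch in Python. It is not Adobe Voco's actual method, which Adobe has not detailed here. The pronunciation table is hand-written for illustration, and the function names are hypothetical. It only checks whether the sounds needed for a new word already occur somewhere in the recording.

```python
# Toy illustration only: the pronunciations below are hand-written
# assumptions, and this is NOT how Adobe Voco is known to work.
PRONUNCIATIONS = {
    "i": ["AY"],
    "kissed": ["K", "IH", "S", "T"],
    "my": ["M", "AY"],
    "dogs": ["D", "AO", "G", "Z"],
    "and": ["AE", "N", "D"],
    "wife": ["W", "AY", "F"],
    "mind": ["M", "AY", "N", "D"],
    "jordan": ["JH", "AO", "R", "D", "AH", "N"],
}


def phoneme_inventory(recorded_words):
    """Collect every phoneme that occurs in the recorded words."""
    inventory = set()
    for word in recorded_words:
        inventory.update(PRONUNCIATIONS[word.lower()])
    return inventory


def missing_phonemes(target_word, inventory):
    """Return the phonemes of target_word that the recording lacks, in order."""
    return [p for p in PRONUNCIATIONS[target_word.lower()] if p not in inventory]


# The one-sentence clip from the demo, treated as the only recording available.
recorded = ["I", "kissed", "my", "dogs", "and", "my", "wife"]
inventory = phoneme_inventory(recorded)

print(missing_phonemes("mind", inventory))    # every sound is already in the clip
print(missing_phonemes("jordan", inventory))  # some sounds never occur in the clip
```

Under these assumed pronunciations, a word like “mind” could be pieced together from the single sentence, while “Jordan” could not. That is one plausible reason the software would need a much longer sample than the clip shown onscreen.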
Voice Actors Need Legal Protections
This does mean that voice actors could potentially be employed a great deal less in future, though. If the software can take a large sample of audio and then rearrange it infinitely, then why would studios pay an actor for each individual project?
According to a leading trademark attorney, voices cannot be trademarked. In other words, an actor cannot legally protect their voice so that companies are prevented from copying it. Specific recordings can fall under copyright, but not the voice itself.
What does this mean for voice acting? Potentially a great deal.
There’s nothing currently stopping a studio from hiring a voice actor to record 20-30 minutes of audio, and then using software like Adobe Voco to create the voiceover for other projects. Imagine if a studio hired an actor to record the pilot episode of a new cartoon. The cartoon is picked up by the network, but the voice actor never hears back. Their lines in the rest of the series are computer-generated in their own voice.
This kind of thing can be covered in contract negotiations, but those are never a sure thing. We predict a court case soon to establish whether a voice can fall under trademark.
We’ll keep a close eye out in the meantime.