Can You Hear Me Now? Voice Computing is Here, and It's a Big Deal.

Mike Edelhart

October 4, 2017

Social Starts

When I was a lad and Scotty on Star Trek would bray into his computer to demand maximum warp, it was clever sci-fi, a nifty, but unrealistic, glimpse into the fantasy future.

Now, though, the age of vocal computing has leapt into reality with startling suddenness. How many of you have had the experience I have over the past couple of months: You are at a dinner party and the host suddenly calls into the air: “Alexa, when does the Warriors season open?” Or “Okay Google, who was the fat guy at the bar in Cheers?”

I notice that not a single person at these gatherings is surprised to experience a human declaiming to the air and expecting an answer. And all of them, I realize, believe the answer when it is intoned.

Interestingly, traditional computers themselves are barely part of this phenomenon. Rather, the current vocal era is being driven by the interplay of smartphones, new devices like Alexa and Google Home, and the worldwide network. (Keep in mind, the device Scotty talked to was handheld, as well.)

Culturally, we have been conditioned to be prepared for voice-centric computing.

We’re already comfortable with the experience of humans walking down the street, wearing earbuds, apparently talking to themselves, but actually on conference calls or Skype sessions. Increasingly, people will be asking their phones and wearables for information or to undertake tasks on their behalf, fully expecting that they will get an answer and that this answer will be true.

Several tech trends have driven the swift emergence of voice interfaces.

Ubiquitous computing. Most people today have computing power and network access with them all the time. Their phones are always at hand. Their Fitbits are always snapped on their wrists. It has become broadly accepted that digital experiences aren’t something one goes somewhere to get; they are something that comes to the individual directly, wherever, whenever. When computing shifted from desktop to mobile, the digital experience shifted quite fundamentally. Now, as mobile shifts to ubiquitous, the way we interact with the worldwide network will shift again.

Deep consumer analytics. Ubiquity doesn’t just provide universal access; perhaps more importantly, our always-present devices harvest a breadth and depth of analytics of sci-fi proportions. Our digital devices know where we go and how we sleep; they track our heartbeats and our breathing. This means that digital experiences can not only come to us wherever, whenever, but can also be based on a vast depth of understanding about our actions, motivations, wants, and needs.

Stronger AIs. Ubiquity + analytical depth provides ideal conditions for nurturing strong artificial intelligences. This means that our phones and the new in-home voice-driven platforms can not only talk back to us, but can do so with startling clarity and appropriateness. The AIs beneath the surface of these devices can base the choices they make on our behalf on the rich store of data our always-connected lifestyles provide.

So, voice-driven computing is here, but in a form that may still seem like a bit of a parlor trick. Long term, however, the impact will be profound.

Voice is the natural interface for people on the go. Once we accept that the network is always at hand, voice leaps to the fore as the natural interface. Typing while walking (as so many of us have learned with our fat fingers on tiny keyboards) is hard; talking while walking is natural. Talking while busy at home or while doing work tasks is also much more natural than stopping to type or mouse for a result. In VR, voice fits right in, while any other form of interaction would feel inappropriate. Voice will quickly become the dominant way we interact with our digital devices, and through them the worldwide network.

Voice is the essential interface for IoT and nano. IoT devices by and large have no keyboards and few other non-essential resources. Maintaining an on-device database of keystrokes or menu items stretches their capabilities, increases their complexity and cost, and generally holds back innovation. Voice, however, can be passed directly through to the cloud and parsed by AIs, with only the necessary response coming back to the device. That lets these devices be simpler and yet more capable than they otherwise could be. As nanoparticles and nano devices begin literally entering our bodies and becoming part of the fabric of the world around us, the need for voice as the interface for essentially invisible systems will grow even more pressing.

Voice shifts the power centers for apps. Control of the screen (think of Microsoft’s advantage in pitching its apps by controlling Windows, or Apple’s edge from having a leg up on iOS screens) has been core to the success of today’s Big 5 tech giants. But in a world of voice-driven computing, screens won’t matter. The power in computing will shift, heavily, toward the most prevalent voice platforms, and particularly toward the AIs that drive them. Ask Alexa what happened at the ballgame, and Alexa's AI decides whose news story to read; ask about baking bread, and Google Home chooses which recipe comes first. This edge in sorting and delivering information is what has pushed Google to the fore in search, Amazon in products, and Apple in apps. Now, in voice, those advantages need to be reacquired by all these leaders, or they will be taken from them by new voice-centric players.

In just a few years, it will feel completely natural to handle the bulk of our computing interactions via voice. And the generation born now will grow up thinking that typing in words to obtain value from the worldwide network is as antiquated as using flints to make fire.