By Ross Rubin
AI that creates essays, art, and even code, rivaling human output, could obviate many time-honored professions. But it has also threatened the jobs of some nonhuman workers: specifically, voice assistants.
In the mid-2010s, as Siri, Alexa, and the Google Assistant proliferated, the initial buzz rivaled the excitement over ChatGPT today. They were hailed as introducing a new era of natural interaction with our devices, one that might go on to replace the graphical user interface that had dominated computing for a generation. In the short term, they shifted the industry’s competitive dynamics, as Alexa built a device ecosystem that countered Apple’s and Google’s smartphone platform duopoly.
All the major players tried to expand their assistants’ purviews. Apple started a domain-by-domain approach, teaching Siri about sports and entertainment, which was useful in navigating Apple TV. Google and Amazon showed how their assistants could respond to follow-up requests without having to repeat a signature invocation such as “okay Google . . .” or “Alexa . . .” every time. Several board games took advantage of Alexa, and Amazon later showed how the assistant might chime in to a conversation between humans.
The assistants also expanded beyond voice. Amazon and Google created smart display products that could show graphically enhanced responses in addition to speaking them. Google has made smart display functionality a key differentiator of its recently released Pixel Tablet, its first Android tablet in a decade. And Amazon integrated such a display into its Astro robot in a bid to define a category.
But even as the voice assistants stretched their capabilities, the core set of applications remained the same: providing common info such as weather and stock prices, responding to home automation commands, and—particularly on smart speakers—playing music. Stretched beyond their bailiwick, assistants would simply indicate that they didn’t understand the question or couldn’t provide an answer. Or they would cite info from a website, shattering the illusion of omniscience.
In contrast, ChatGPT can exhibit seeming expertise in almost any subject, even if it must make up facts to do so. The perceived breadth of its understanding and depth of its output have led some to see generative AI as having blindsided stagnating assistants. A New York Times article contrasting the AI approaches cited a Saturday Night Live sketch promoting “Alexa Silver,” a parody product pretending to be from Amazon and the AARP. In the skit, Alexa plays the straight man as users exhibit senior-citizen stereotypes such as forgetfulness, rambling, suspicion, difficulty hearing, and feeling cold.
The article also cited a Financial Times interview with Microsoft CEO Satya Nadella, in which the rarely disparaging executive called assistants “dumb as a rock.” Microsoft, which has integrated generative AI throughout its product line with astounding urgency after investing $10 billion in ChatGPT creator OpenAI, began retreating from the voice assistant race not long after it entered in 2014 with Cortana.
Old and new together
The competitive tension between multimodal assistants and generative AI, however, is a false construct. Much as Microsoft began its generative AI embrace as a complementary approach by adding a ChatGPT-style component alongside Bing’s traditional search box, tech companies can bring together the old and the new. From Siri’s mobile roots, assistants were optimized for intuitive input and quick responses. In contrast, ChatGPT will synthesize and revise paragraphs that vary so widely that entire courses have been developed around crafting prompts. And while ChatGPT is available from any web browser and recently landed in app stores, it’s still not as easy to access as a voice assistant.
We have already seen how big tech companies can deploy generative AI to attack or defend generationally dominant offerings from competitors. Soon after Microsoft added generative AI to Bing, Google responded by adding similar technology to its incumbent search offering. And with the roles reversed, Google has added AI-based features to its Workspace productivity suite, challenging Microsoft’s entrenched Office apps, which Microsoft is also imbuing with AI-driven capabilities.
And with its Assistant deployed on billions of phones, Google is heavily incentivized to integrate the two AI directions. For now, however, its long-touted example of AI investment has taken a back seat as the company rushed to counter the wonder instilled by OpenAI’s work. Last May, at its Google I/O developer event—once a showcase for advancements in Google Assistant and Android—the focus was on a new generation of AI technologies. In contrast, the Google Assistant was a footnote that merely offered extra functionality on the Pixel Fold phone.
Both multimodal assistants and generative AI interactions have much to gain by tapping into each other’s strengths. One LinkedIn user, for example, has shown how Alexa shrugs its disembodied shoulders when asked to use the power formula to calculate the energy consumption of a 1200-watt toaster in 30 minutes, but can explain and apply the formula when it’s using a skill called MyGPT.
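For readers curious about the arithmetic in that toaster question, here is a minimal sketch. It assumes the "power formula" means energy equals power multiplied by time, which the example does not spell out, and the function name energy_kwh is purely illustrative rather than anything drawn from Alexa or MyGPT.

```python
def energy_kwh(power_watts: float, minutes: float) -> float:
    """Energy in kilowatt-hours, assuming E = P * t (hypothetical helper)."""
    hours = minutes / 60               # convert minutes to hours
    return power_watts * hours / 1000  # watt-hours to kilowatt-hours


# The toaster from the example: 1200 watts running for 30 minutes.
toaster_kwh = energy_kwh(1200, 30)
print(f"{toaster_kwh} kWh")  # prints "0.6 kWh"
```

Under that assumption, the toaster uses 0.6 kilowatt-hours, the kind of short, checkable answer a voice assistant could reasonably be expected to speak aloud.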
The companies that offer assistants recognize that they must meet the growing expectations for resourcefulness that generative AI has established, but must address some key issues to do so. The most significant of these for the long term is economic viability. Fortunately for the assistant teams, generative AI has the potential to greatly expand not only the general utility of assistants, but their revenue potential, especially for Amazon and Google.
For instance, Microsoft has already shown how the new Bing can help plan travel itineraries, which could lead to the kind of booking referral revenue that Google currently pursues through its maps and flights services. Amazon could similarly leverage ChatGPT-like discussions to recommend products from its retail website, groceries from Whole Foods, and audiobooks from Audible. One day it could even summon a self-driving taxi from its Zoox subsidiary. Even now, in fact, Amazon says that consumer use of Alexa shopping features has grown 40% in the past year.
Generative AI could also add significant value to simple tasks such as playing a song, providing far more details about artists’ influences and generating a playlist of their songs on command. Then there’s home automation, where Amazon and Google have both played up the push toward automated routines either created by the user or initiated by AI. That would seem to be a divergent path from chatty interactions with ChatGPT. But in a recent interview, the CEO of luxury-home voice control system Josh.ai, which has partnered with Amazon, praised the improvements generative AI integration has made to his company’s signature assistant.
And that leads to the final challenge: trustworthiness. As impressive as generative AI is overall, its overconfidence and hallucinations are a step in the wrong direction compared to the more conservative but authoritatively sourced answers given by today’s prevalent assistants. Much as nobody should take ChatGPT’s answers at face value to prepare a legal argument, we must be able to count on a home AI’s ability to credibly report if there is an intruder in our home. In the context of a web-based text conversation, there is ample opportunity to plaster caveats explaining the experimental nature of generative AI; that may not be the case in a brief voice exchange. However, new options to enforce guardrails are materializing.
Generative AI is on a course to merge with the multimodal assistants that have provided a useful, if limited, experience. As the former seeps into the latter, though, consumers will continue to access intelligence through the most convenient avenue. What have been the ears and voice of our devices will become more capable and knowledgeable conversationalists as their brains evolve.