As voice assistants become a normal part of our everyday lives, Qiraat Attar explores the trend and where it’s headed
You wake up, and it’s seven AM. Alexa tells you the time. She also reminds you of your meetings today: a scrum meeting with the team, a brainstorming session with your clients, a scheduled video call with your college friends. She even confirms the names of invitees on request.
Alexa isn’t your roommate or your partner. It’s not a ‘she’. You have a stranger in your home who orders you pizza and plays sad music when you’re feeling blue. This is the life of 110 million users (in the USA alone) and counting, who rely on voice assistants daily.
Alexa represents the voice assistants of today, the most popular of the ‘voice sisters’, which also comprises Siri, Cortana, and Google Duplex. With the turn of the decade, they’ve grown in strength, speed, and capability. So, who is Alexa really, and how did ‘she’ go from an unknown to the voice in your house?
ALEXA, WHO ARE YOU?
The cardinal rule of technology is evolution: more convenience, less manual intervention. Examples are everywhere. Phones gradually lost their clunky buttons, cars advanced from gear sticks to automatic transmissions, and every device in existence has become sleeker and faster, a veritable ‘glow up’ compared to its ancestors. Voice assistance is no different.
You might expect the history to be brief, but you could not be more wrong. While voice assistants are recent, the functions that comprise them have been brewing quietly for a while. These are the ears and brains of the operation. After all, before Alexa could say a word, she had to learn to understand yours.
Voice assistants use speech recognition, natural language processing (NLP), and speech synthesis, not to mention the machinery of AI, to act on their users’ behalf. Each of these technologies evolved along its own journey before converging into this conversational technology.
Voice recognition technology has been around since the 1960s, long predating the virtual assistants popularized in the 2010s. NLP originated from the idea of machine translation, coming into existence during the Second World War. Electrical speech synthesizers have existed since 1939, when Homer Dudley debuted the VODER (Voice Operating Demonstrator) at the New York World’s Fair.
Just as the magic cave in ‘Alibaba and the Forty Thieves’ opened only on hearing a secret phrase, a voice assistant activates only on hearing its designated command. Consider Apple’s disarming voice assistant. The words ‘Hey Siri!’ are the wake words that rouse the device. Suppose you want to know the score of the football match between UAE and Qatar. The request is recorded and travels, via the internet, to servers where it is broken down into its essentials. A massive database compares your statement against the information it holds, weighing recency and other cues to formulate an appropriate response. That answer is relayed back to you, and you have your score a moment later.
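The round trip described above can be sketched in a few lines of code. This is a toy illustration, not Apple’s actual pipeline: the wake-word check, the intent keywords, and the canned score are all hypothetical stand-ins for the trained speech and language models a real assistant uses.

```python
from datetime import datetime

# Hypothetical wake words; a real device uses an on-device acoustic model.
WAKE_WORD = "hey siri"

def handle_score(query: str) -> str:
    # A real assistant would query a live sports database here.
    return "UAE 1 - 2 Qatar (full time)"

def handle_time(query: str) -> str:
    return datetime.now().strftime("It's %I:%M %p")

# Hypothetical keyword-to-handler intent table; real systems use trained NLP.
INTENTS = {
    "score": handle_score,
    "time": handle_time,
}

def respond(utterance: str) -> str:
    """Return a spoken-style response, or stay silent without the wake word."""
    text = utterance.lower().strip()
    if not text.startswith(WAKE_WORD):
        return ""  # stay asleep
    query = text[len(WAKE_WORD):].lstrip(" ,!")
    for keyword, handler in INTENTS.items():
        if keyword in query:
            return handler(query)
    return "Sorry, I didn't catch that."

print(respond("Hey Siri! What's the score of the UAE vs Qatar match?"))
```

The essential shape is the same as the real thing: nothing happens until the wake phrase is heard, the remainder of the utterance is matched to an intent, and a handler produces the answer that gets spoken back.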
Well, doesn’t it get things wrong? All the time! But the beauty of AI-powered technology is that it learns from its mistakes, determined to turn every misstep into a lesson. Every time you ask it to play a ‘Little Mix’ song and it starts suggesting mixers instead, the exchange is logged as an erroneous response to a request. These devices are gradually beefing up their intelligence through their mistakes, getting better and faster at providing instant answers to any question you might have.
ALEXA, PLAY ‘GIMME MORE’
Siri became the first voice assistant for a vast number of people, setting expectations for how voice assistants should operate ever since its 2010 debut. Soon after, Google rolled out the little ‘microphone’ icon in its Chrome browser, offering voice search years before Google Assistant arrived. The ball got rolling as Nuance introduced ‘Nina’, Microsoft announced ‘Cortana’, and Amazon put a big bow on it with ‘Alexa’.
Initially, adoption was slow, even for the personal use cases that were heavily advertised. People made jokes at the expense of voice assistants, and repeating a command several times as the AI failed to understand words or names became a running gag in countless YouTube videos. This faltering is a thing of the past. From dedicated devices such as the Amazon Echo, now a fixture on bedside tables and in dining rooms, to the assistants we carry in our pockets, we have forgiven their little mistakes and are thrilled at how far the technology has come and how far we might go with it. Voice assistants are now as much a standard feature of our devices as they are a category worth considering in their own right.
The corporate world, too, has slowly begun to find uses for AI-powered voice assistants. Initially, they handled only secretarial or peripheral duties, such as task tracking and time management, mostly centered on smoother administration. Once adopted, employees began using the same capabilities to improve their professional conduct and appear more conscientious about their commitments. For instance, Siri can offer proactive suggestions, like texting someone that you’re running late for a meeting, and its returned results are individualized.
Soon enough, international businesses adopted the technology. Its diverse capabilities made it the right fit for multilingual workplaces and the language barriers that come with them. Alexa, equipped to understand several languages, including Italian, Spanish, Portuguese, Japanese, and Hindi, aids easy translation, so that things don’t get lost to miscommunication.
For those in research-intensive professions, voice AI can be a big help. Cortana syncs seamlessly with the search engine Bing and answers questions using its database and web results. It will alert you to breaking news and keep you updated as programmed.
Despite increased assimilation, voice assistants were still kept away from critical or confidential information, which isolated them from the heart of corporate business activity. One likely reason is the data and privacy concerns that still cloud their adoption. Even so, organizations agree that using them for merely mundane tasks is arguably a poor utilization of the technology. Slowly, then, voice assistants like Cortana and Siri are being used to analyze data. You can build analytical queries over large data stores, slicing data by date ranges or applying functions, simply by using your voice. Voice assistants not only help interpret the data but can also provide visualizations. One can imagine this diversification allowing voice AI to become an integral part of the work ecosystem.
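As a concrete illustration of the kind of query-building described above, here is a minimal sketch in Python with pandas. The command pattern, the column name, and the sample data are all hypothetical; a real voice-analytics feature relies on full speech recognition and trained NLP, not a single regular expression.

```python
import re
import pandas as pd

# Hypothetical sales records standing in for a large data store.
sales = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-05", "2021-01-20", "2021-02-03", "2021-02-18"]),
    "amount": [100, 150, 200, 50],
})

def run_voice_query(command: str) -> pd.Series:
    """Map one spoken pattern, 'total <column> by month', to a groupby query."""
    match = re.search(r"total (\w+) by month", command.lower())
    if not match:
        raise ValueError("Command not understood")
    column = match.group(1)
    # Bucket rows by calendar month and sum the requested column.
    return sales.groupby(sales["date"].dt.to_period("M"))[column].sum()

print(run_voice_query("Alexa, total amount by month"))
```

The point of the sketch is the mapping itself: once an utterance has been transcribed, turning it into a date-bucketed aggregation is an ordinary query-building problem.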
ALEXA, CAN YOU BE HUMAN?
At the 2018 I/O keynote, Sundar Pichai stood before a teeming crowd to demonstrate the first fully functional interactive voice assistant. His claim: a bot voice that sounds so naturally human as to be indistinguishable. He has the crowd, but he will lose more than their attention if his words prove empty.
The task is simple: Google Assistant has to book a haircut appointment over the phone, on your behalf, live at the event. The crowd, hushed and waiting, lets out astonished cheers. In the next minute, Google Duplex starts a conversation with the receptionist at a hairstyling salon. The robot sounds uncannily human, with ‘umms’ and ‘ahs’ in the right places, modifying the request when the desired 12 PM slot is unavailable. A booking is made for 10 AM. The woman knows nothing of what just happened: she just made a haircut appointment with a robot.
Interactive voice assistance is like a sequel to the one-sided saga you conduct with your assistant, where you ask them to carry out tasks, but their speech responses are limited to the confirmation of your task. But with this stellar upgrade, the purpose changes. Your voice assistant will be able to do things you don’t have time for – haggling with your cable company on the phone or negotiating the price of stocks. We are predictably filled with cautious excitement as this technology carves into new territory.
ALEXA, DO YOU KNOW EVERYTHING?
If you type ‘Is Siri…’ on Google, one suggestion is ‘Is Siri your enemy?’. One can’t help but wonder. When a technology manifests, attempts to mimic being human, and learns every time it stumbles, it’s a bit of a ‘Planet of the Apes’ moment for us mortals. I mean, what if it rises and takes over, using commands to crush us instead, à la ‘Portal’?
Alright, flipping the world order is too far-fetched. But rising concerns about privacy are reasonable. The keyword trigger that ‘activates’ voice assistants is not foolproof, which means people can accidentally engage the assistant and inadvertently record large amounts of sensitive data. Add voice assistants that sync with your other devices, such as your phone or your car, and you have built a compound machine, à la ‘Transformers’, that knows too much to be considered harmless.
There is also the question of overdependence. Life already looks more sedentary than it did a hundred years ago, thanks to the advent of convenient technology. And the expectation that we would live hyper-productive lives once freed of daily chores has proven incorrect; instead, many people admit they are chronically addicted to their devices. Siri and Alexa might be able to do your every bidding, but we must stop and wonder: should they have to? The human experience is made richer by simple encounters, and it seems immoderate to bring in an expensive, high-powered device for such tasks. Mundane errands might be time-consuming or boring, but while we work at them, the mind is free to reflect on ideas and enjoy some blessed moments of quiet. It seems unwise to trade our downtime for what is effectively the capitalist demand of hyper-productivity, with no care for human limits.
One must also spare a thought for basic-skill workers. Millions of jobs in call centers and troubleshooting domains are handled by people with basic communication and technical training, and not much else. These integral vocations are the lifeblood of the middle and lower-middle classes, allowing workers to lead dignified lives. Caught in the allure and purported ease of AI, there is talk of replacing such jobs, en masse, with artificial intelligence. Such sweeping changes, made without consideration for those who bear the fallout, would be humanitarian and administrative disasters.
We live in exciting times. Our devices are alive, animated. Much like the characters in ‘Beauty and the Beast’, they talk, they beep in complaint, they make their presence felt. They speak to us, apologize for errors, and wish us a happy birthday. It’s easy to think of this as mere play and embrace them wholeheartedly, but there is a flip side. We must consider whether the ease and convenience are worth the constant intrusiveness, the near-total lack of privacy, and what some consider the most blatant and constant data collection of this century. And as public concerns about the fallout are brushed aside by tech moguls with futuristic vision but no perspective, one has to ask: is anyone really listening to our voice?
STATS ON VOICE ASSISTANT USAGE
- In 2020, there were 4.2 billion digital voice assistants being used in devices around the world. Forecasts suggest that by 2024, the number of digital voice assistants will reach 8.4 billion units – a number higher than the world’s population.
- One-third of enterprises will use conversational speech technology for customer engagement by 2022.
- 65% of smart speaker owners say they are comfortable making purchases with a smart speaker.
- 41% of people who own a voice-activated speaker say it feels like talking to a friend or another person, as per Google.
- In 2021, 68% of respondents said that their company currently has a voice strategy, up from 18% in 2020. Of those that don’t, 60% said it is under consideration sometime within the next 5 years.
- 85% of Amazon customers select the recommended Amazon product when voice shopping.
- While the Apple HomePod understands 99.4% of all queries, it answers only 52.3% correctly, putting it behind the Amazon Echo (64%), Google Home (81%), and Harman Kardon Invoke (57%) in that regard, as per Loup Ventures.
- The voice recognition market is expected to grow at a yearly rate of 17.2% and to reach $26.8 billion by 2025.