Speech recognition: the flop that’s about to be fab

How’s your ‘paperless office’? Where did you park your flying car? Will you be taking your next vacation on the moon? Perhaps you could call your robot housemaid and ask it to grab a drink from the fridge. Or maybe you could ask Siri to make dinner reservations for you at your favourite restaurant.

Technology has provided us with many lifestyle benefits, but when it fails to deliver on its promises it fails spectacularly, and we can be an unforgiving lot, with very long memories.

One of the most obvious letdowns is one most of us will have experienced at some point, whether through the voice dialling feature on our smartphones or the voice prompts used on many telephone customer service lines. It's how we expected to be communicating with technology by 2013, but how close are we to achieving speech recognition technology that actually works?

Engineers and scientists working in the field of speech recognition cautiously predict that the technology will be line ball with humans in terms of understanding multi-language speech by the magic date of 2020.

Maybe. And even if we hit that date, it’s hard not to feel that speech recognition technology is the application that has, so far, most seriously disappointed the consumer.

Speech recognition is everywhere. Even in your tablets.

The role of artificial intelligence

Although we often speak (or swear) at our computers, many people may find the thought of them talking back somewhat disturbing. The ultimate example of a computer with artificial intelligence is HAL, the onboard computer in the science fiction classic 2001: A Space Odyssey.

HAL could not only understand and respond to human speech, but also decide what was best for the future of the human race, even if that meant losing a few people along the way!

While today's computers do not have HAL's capabilities, a new era of computers is using artificial intelligence to enable speech recognition. Such voice technology systems are no longer merely an emerging technology; they are already being used by companies such as BMW, Dell, LG and Fisher & Paykel.

Automobile maker Ford has recently demonstrated an advanced voice technology system that lets you communicate fairly naturally with your car. Its vehicles can be equipped with a 'conversational' speech interface, and the system's text-to-speech technology makes it sound like you are talking to another person, not a robot.

So what can you talk about with your car?

Ford's AppLink could pave the way for talking to your car.

Want to play music? The system asks what type of music you'd like and then lists the artists available.

Need to make a call? Tell your car to call John Smith, and if more than one John Smith is listed, the car will ask you which one to call. The car's navigation system, climate control and retractable roof are also controlled by voice. Ford's conversational speech technology has a vocabulary of over 50,000 words.

Electronics manufacturers in particular are exploring the exciting potential of voice recognition technology across their broad product portfolios.

At present, LG's G2 doesn't always listen to everything you say, but that capability could arrive in a later Android update and in future phones.

At the IFA 2013 Expo in Berlin, LG showcased voice recognition technology via a range of products, including:

An LG Android smartphone (due in 2014) with always-on voice commands and the ability to distinguish between general conversation and requests or commands specifically directed at it.

A fridge that can provide spoken recommendations on dishes to cook based on which ingredients are available in the refrigerator.

LG’s robotic RoboKing Square vacuum cleaner that can be controlled with a smartphone using either onscreen controls or voice commands.

Talk to a robotic vacuum cleaner? Sure. Why not.

Speech recognition v Natural Language detection

Speech recognition technology has been around for decades, but it has always been awkward and far less accurate than typing on a keyboard.

Plus, users have always had to 'think' like a computer rather than converse naturally with the technology, the way two people would in a normal chat.

Achieving this kind of 'conversational understanding' has been the Holy Grail of speech recognition, and the technology has taken major leaps forward in the last few years.

Often referred to as ‘Natural Language’ detection, the technology is now appearing in products as diverse as televisions, smartphones, appliances and high-end motor cars.

These products don't just respond to simple verbal commands; you can ask them to search the web, find an airline arrival time, or recommend a movie featuring a particular actor.

Nuance's Dragon Dictate lets people speak and see their words appear on screen.

This kind of interaction is here now, and is set to reinvent how technology is used.

Vlad Sejnoha, chief technology officer of Nuance Communications, one of the leading developers of voice recognition technology and the company whose speech recognition engine powers Apple's digital assistant, Siri, says: "We're at a transition point where voice and natural language understanding are suddenly at the forefront.

"I think voice recognition is really going to upend the current user interface, not just in computers but in a broad variety of devices – it's really the 'next big thing.'"