Why speech recognition is a novelty.

I have an iPhone 4S, which means I have Siri, Apple's voice-activated assistant. Sadly, like almost all speech recognition software, Siri is still largely a novelty, and I've all but given up using it. I suspect most owners have. There are only two things I can rely on it getting mostly right - setting a countdown timer and sending the most basic "leaving work now" text to my wife. The problem is that it's not bombproof, and until speech recognition can be relied upon 100%, to the point where you don't have to read what it interpreted and double-check it, it will remain firmly in the novelty category.
Last week, for example, I needed to dictate a simple text while sitting at some traffic lights. I said "I'll suck up the leaves when I get home", referring to a conversation about leaf blowers and clearing away the autumn debris. What Siri put in my text was "I'll fuck up the girls with my get boned". Had I just sent it, that would have been a problem, but Siri has trained me to double- and triple-check everything it interprets.
The same is true for Xbox Kinect. It fails to reliably recognise even the most basic commands, and all it's trying to do is play games.
My car can't understand the word "dial" when talking to the phone and consistently thinks I'm saying "cancel".
My friend's Ford SYNC system can't get any of the names in his phonebook right, much less understand street addresses for the onboard GPS.
Automated airline flight information lines are a nightmare. They can't get the flight numbers, dates or airports correct, so when you try to get today's arrival times for Amsterdam, you'll be presented with tomorrow's departure times for Hamburg.
Throw in an accent and the already sketchy detection rates can drop to almost zero. Even I have trouble, and I have a relatively flat, unaccented British voice.
You know things in speech recognition have gone horribly wrong when you see people having shouting matches and arguments with their electronic devices.
And that's the point - speech recognition systems cannot be relied upon, and as such they take more time to use than conventional techniques. Typing this blog entry, for example, I'm error-checking at a basic level as I type. If I were speaking this, I'd have to speak a sentence, wait for the interpretation, then go in and hand-correct all the mistakes, which ultimately takes more time than just typing it in the first place.
