Voice recognition: are we there yet?

A few days ago, I was having an instant message session with a friend of mine who was complaining that she was suffering from repetitive strain injury. Coincidentally, earlier that day, I had received an e-mail with an advertisement for Dragon NaturallySpeaking 10 Preferred from Nuance software. Taking that as a sign from God, I decided it was time once again for me to check out whether voice recognition software was ready for prime time.

I have to say that on the basis of spending half an hour or so with this new software, my answer may just be yes. It’s been several years since I actually played around with voice recognition software, and my memory of it was that the transcription was filled with errors and that it took a ridiculously long period of time for the computer to figure out what I was saying. But even with the most basic of training, and using the simple microphone that was bundled with the software, I have found Dragon NaturallySpeaking 10 amazingly accurate and responsive.

The programmers of this software have clearly gone well beyond simply trying to recognize what you’re saying; they are trying to figure out what you mean. For example, in the previous paragraph I said the words "Dragon naturally speaking." The software must have realized that I was talking about it, because it combined the words "naturally" and "speaking" into a single word with the S in "speaking" capitalized. And yet two sentences ago, when I said "naturally speaking," it realized that I was not talking about the product.

I’ve barely scratched the surface of NaturallySpeaking’s capabilities. I’ve been using it to dictate e-mail and, of course, this blog entry. What I haven’t done yet is figure out how to control Windows by voice: opening and closing programs, switching between them, and selecting items from menus within a program.

It’s immediately clear that anyone who has to transcribe a large amount of text would benefit from using this program. Let’s say you had to transcribe the Gettysburg Address. It’s about 270 words, and if you type 50 words per minute, that’s going to take you about 5 1/2 minutes. I’m going to try it now:

four score and seven years ago our forefathers brought forth on this continent a new nation conceived in liberty and dedicated to the proposition that all men are created equal now we are engaged in a great civil are testing whether that nation or any nation so conceived and so dedicated can long endure we are met on a great battlefield of that war we have come to dedicate a portion of that field as a final resting place for those who here gave their lives that that nation might live it is altogether fitting and proper that we should do this but in a larger sense we cannot dedicate we cannot consecrate we cannot hallow this ground the brave man living and dead who struggled here have consecrated it far above our poor power to aggregate tract will little note nor long remember what we say here but it can never forget what they did here is for us the living rather to be dedicated here to the unfinished work which they who fought here have thus far so nobly advanced it is rather for us to be here dedicated to the great task remaining before us that from these honored dead we take increased devotion to that cause for which they gave the last full measure of devotion that we here highly resolve that these dead shall not have died in vain that this nation under God shall have a new birth of freedom and that government of the people by the people for the people shall not perish from the earth

That took one minute and 45 seconds, and only a few words (in red) were misunderstood. Note, however, that there is no punctuation. My understanding is that the program can figure out from your speech patterns where to insert commas and periods, but I haven’t gotten that far yet. Now we’ll try it with punctuation marks – in other words, I’m going to speak aloud the words “comma” and “period”:

four score and seven years ago our forefathers brought forth on this continent a new nation, conceived in liberty, and dedicated to the proposition that all men are created equal. Now we are engaged in a great civil war, testing whether that nation, or any nation, so conceived and so dedicated, can long endure. We are met on a great battlefield of that war. We have come to dedicate a portion of that field, as a final resting place for those who here gave their lives that that nation might live. It is altogether fitting and proper that we should do this. But, in a larger sense, we cannot dedicate-we cannot consecrate-we cannot hallow-this ground. The brave men, living and dead, who struggled here, have consecrated it, far above our poor power to add or detract. The world will little note, nor long remember what we say here, but it can never forget what they did here. It is for us to living, rather, to be dedicated here to the unfinished work which they who for here have thus far so nobly advanced. It is rather for us to be here dedicated to the great task remaining before us-that from these honored dead we take increased devotion to that cause for which they gave the last full measure of devotion-that we here highly resolve that these dead shall not have died in vain-that this nation, under God, shall have a new birth of freedom-and that government of the people, by the people, for the people, shall not perish from the earth.

That took two minutes and 10 seconds and had even fewer mistakes. I can’t imagine that proofing and editing what I just spoke would take much longer than if I had typed it and then had to fix my typos.
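For anyone who wants to check the arithmetic, here is a rough Python sketch that turns the numbers above into words per minute. It only uses figures from this post (about 270 words, a 50-words-per-minute typist, and the two dictation times); the function names are my own, and since the word count is approximate, so are the results.

```python
# Rough timing comparison using only the figures quoted in this post.
# WORD_COUNT is approximate ("about 270 words"), so treat the output as estimates.
WORD_COUNT = 270
TYPING_WPM = 50


def words_per_minute(words, seconds):
    """Effective speed for a passage of `words` words finished in `seconds`."""
    return words / (seconds / 60)


def typing_seconds(words, wpm):
    """Time in seconds to type `words` words at a steady `wpm`."""
    return words / wpm * 60


# Dictation times measured above: 1:45 without punctuation, 2:10 with it.
dictation_runs = {
    "dictation, no punctuation": 1 * 60 + 45,
    "dictation, spoken punctuation": 2 * 60 + 10,
}

baseline = typing_seconds(WORD_COUNT, TYPING_WPM)
print(f"typing at {TYPING_WPM} wpm: {baseline / 60:.1f} minutes")

for label, seconds in dictation_runs.items():
    speed = words_per_minute(WORD_COUNT, seconds)
    ratio = baseline / seconds
    print(f"{label}: {speed:.0f} wpm, {ratio:.1f}x faster than typing")
```

Neither figure includes proofreading time on either side, so this compares raw entry speed only.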

I’m eager to keep experimenting with the program to see exactly what it can do. I’m thankful that I work alone, because I can imagine that the initial thrill experienced by my coworkers upon hearing me talk to my computer would quickly wear off.

Posted in All, Software, Technology on Oct 2nd, 2008, 6:31 pm by David Schrag   

2 Responses

  1. October 3rd, 2008 | 2:27 am

    Thanks David for the post. I suffer from RSI and what has saved my career is this product from Natural Point, called SmartNav. You can read more about it on my blog. I think I may eventually have to try this voice recognition software.

    http://www.gnetsolutions.info/category/technology-reviews/

  2. Michael Pahre
    October 3rd, 2008 | 10:04 am

    Hey, sounds like voice recognition is now pretty fast and accurate. I’m a quick typist, and it took me 3’20” to type it in and a little longer (total 4′) to spell-check/proofread a similar number of mistakes to what your software produced. Maybe we should all switch over to VRS soon.