My recent post about using Amazon Mechanical Turk for the transcription of digital audio (a practice which may, or may not, be ethical) has left me thinking about other options for getting audio of spoken words transcribed into written words.
There are many reasons why you might not want to use the keyboard for composing text. You might suffer from carpal tunnel syndrome. You might want to add something to your to-do list but not happen to be next to your computer. You might have a conversation recorded already and want it to be available as text. And finally, you might be a geek like me who likes to see what’s possible with new hardware and software tools.
If you’re looking for ways to have humans do the transcription, then uploading your jobs to Amazon Mechanical Turk is not the only route: you could go with online services like CastingWords (which, it turns out, uses AMT), Purple Shark, or Kedrowski Transcription. These services are not cheap — much of this kind of work is driven by the fields of law and medicine, where clients can afford higher rates — but they promise accuracy and convenience.
There are also machine-based solutions, where software does the work of transcription. In this post, I cover a few of those options. I wish I could say that a wide variety of companies are working on speech recognition. However, it appears that one company, Nuance, has dominated the market with two desktop applications called Dragon NaturallySpeaking and MacSpeech Dictate, the iPhone app Dragon Dictation, and last year’s acquisition of the web service Jott. All of the products from Nuance make use of the same speech recognition engine, which can be run on a server (to which you connect via your mobile device through Dragon Dictation or Jott) or on a desktop computer (using Dragon NaturallySpeaking or MacSpeech Dictate).
Google Voice
First, I’ll continue the ProfHacker love affair with Google before moving on to the offerings by Nuance. Google Voice is free, which is great, but the accuracy of the transcriptions leaves something to be desired. I’d expect the accuracy to improve as Google continues to tweak the service.
With a Google Voice account, you’ll find that any voicemail left for you is transcribed and then sent to your Gmail inbox. Need to write a paragraph or two? Call your Google Voice number and leave yourself a message. Need to transcribe a long audio clip? This is probably not the way to do it.
(Here’s the company’s 00:44 YouTube video explaining transcription.)
Jott
Like Google Voice, Jott allows you to call a number and speak what you want transcribed. Unlike Google Voice, Jott is not free. Subscriptions are available in different plans. For $3.95 a month, subscribers get unlimited transcriptions of 15-second audio recordings. For $12.95 a month, you get unlimited transcriptions of 30-second audio clips. Finally, the pay-as-you-go option allows you to get a total of 5 minutes of audio transcribed (in 30-second clips) for $6.95.
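If you like to see the numbers worked out, here is a rough Python sketch of how the pay-as-you-go option compares with the $12.95 plan. The prices are the ones quoted above; the function names are my own, and the break-even figure assumes you could simply buy more pay-as-you-go audio at the same per-minute rate, which is a guess on my part rather than anything Jott states.

```python
# Jott prices as quoted in this post; helper names are illustrative only.
PAYG_PRICE = 6.95          # dollars for the pay-as-you-go bundle
PAYG_MINUTES = 5           # total minutes of audio in that bundle
CLIP_SECONDS = 30          # maximum clip length on pay-as-you-go
MONTHLY_30S_PLAN = 12.95   # dollars per month, unlimited 30-second clips


def payg_clips():
    """Number of full-length clips the pay-as-you-go bundle covers."""
    return PAYG_MINUTES * 60 // CLIP_SECONDS


def payg_cost_per_minute():
    """Effective price per minute of transcribed audio on pay-as-you-go."""
    return PAYG_PRICE / PAYG_MINUTES


def breakeven_minutes_per_month():
    """Minutes per month at which the $12.95 plan costs the same.

    Assumes (hypothetically) that extra pay-as-you-go audio is priced
    at the same per-minute rate as the first bundle.
    """
    return MONTHLY_30S_PLAN / payg_cost_per_minute()


if __name__ == "__main__":
    print(f"Clips per bundle: {payg_clips()}")
    print(f"Cost per minute: ${payg_cost_per_minute():.2f}")
    print(f"Break-even: {breakeven_minutes_per_month():.1f} minutes per month")
```

Under those assumptions, pay-as-you-go works out to about $1.39 a minute, so the monthly plan becomes the better deal once you dictate a little more than nine minutes a month.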
Jott will connect to a wide variety of web services, so if you want to add to your Remember the Milk list, update your status on a social media site, or create an appointment on your Google Calendar, you can do so through a call to Jott. If what you’re looking for is something to allow you to dictate a long (or long-ish) message, though, then this is not the solution for you.
Dragon Dictation
With this free iPhone app from Nuance, you can speak into your iPhone or iPod Touch (for what appears to be a maximum of about 30 seconds) and your audio is sent to the company’s servers, where it’s processed and sent back to your device as written text. The time that it takes to do this is, in my limited experience, negligible. This is a good solution for composing a brief email, text message, or social media status update.
If you’re concerned about privacy, Dragon Dictation might make you a little nervous. Do they store your audio on their servers? What do they do with your list of contacts, which is uploaded to their servers in order to improve the accuracy of transcription?
Mel Martin reports that the company has assured him users’ data is safe, but it would be nice if the company were a little more explicit in what they tell users about this issue.
(Here’s the company’s 01:02 YouTube video about Dragon Dictation.)
MacSpeech Dictate
This desktop application for the Mac environment works surprisingly well. I’ve used it occasionally and been impressed by the accuracy of its speech recognition. However, I’ve found it awkward to speak out loud the punctuation and paragraph breaks necessary for proper formatting with such a tool. The awkwardness would probably diminish with continued use, but I haven’t gotten there yet.
MacSpeech Dictate is not cheap: it retails for $199 with a headset microphone, but you can buy it from Amazon for a bit less (and probably from other online vendors, too).
Dragon NaturallySpeaking
Because I’m a Mac user, I’ve never tried this desktop application for the Windows environment, but my understanding is that it works in essentially the same way as MacSpeech Dictate.
The standard edition of Dragon NaturallySpeaking is more affordable than MacSpeech Dictate: without a headset, it retails for $99 but is available on Amazon for a lot less; with a headset, it retails for $199 but you’ll pay less if you make your purchase from Amazon or other online vendors.
What about you?
Do you use speech recognition tools? What’s been your experience?
[Creative Commons licensed photo by Flickr user Duchamp]
Comments
1. Chad - March 04, 2010 at 09:44 am
I believe Nuance just bought the company that makes MacSpeech Dictate, so it is likely that eventually Dictate will be upgraded to share the same advanced features as its PC relative.
2. George H. Williams - March 04, 2010 at 09:50 am
Thanks, Chad. Yes, that acquisition happened just a couple of weeks ago.
3. jra - March 04, 2010 at 09:57 am
Adobe Soundbooth includes a built-in voice-to-text transcription tool. Adobe Soundbooth is included in a number of the Adobe CS suites as well as the eLearning Suite, so many people may already have a solid transcription program without even knowing it.
4. OPIEWeb - March 04, 2010 at 10:26 am
There is also Speech Recognition built into Windows 7, which can be used for controlling your PC or for dictation. I hear it's also available in Vista, but like most Vista features, it was vastly improved for Win7. It has similar accuracy to Dragon Naturally Speaking 10, but lacks many of the more advanced features.
5. Billie Hara - March 04, 2010 at 10:59 am
I'm a PC user, and I am a huge fan of Dragon Naturally Speaking. It was a slow process, I think, to get it trained to my voice, but I've found that the more I use it the more accurate it becomes. I have also found that with DNS (or other voice to text programs), the better microphone you have, the better the transcription.
6. OPIEWeb - March 04, 2010 at 11:40 am
Your last point cannot be stressed enough. There is no software solution for bad hardware. Ever*. The difference between a barely adequate microphone and an above-average one is less than $50.
*This applies to anything you want to do with a computer.
7. joanna - March 04, 2010 at 11:40 am
I've just ordered the Educator Pro version of Dragon Naturally Speaking, and Academic Superstore gives educators a discount, so I'll be paying about 70 dollars for it.
8. Drew - March 04, 2010 at 06:01 pm
I use the stock software that comes with Vista on one of my computers. The speech to text is not so good. What makes it useful to me is that it will open windows, or navigate backwards or forwards on a webpage. Kind of interesting. Not very practical for me.