The Reliability, Efficiency, and Affordability of Amazon’s Mechanical Turk

Recently I had reason to get some audio transcribed using Amazon’s Mechanical Turk service. The results have been impressive, and in this post I describe my experience and consider ways besides transcription that this service might be useful.

I’m currently participating in a project that involves collecting oral histories and posting them online in video, audio, and text format. When I asked around for suggestions of online transcription services, several people mentioned such sites as Casting Words, Purple Shark, and Kedrowski Transcription. These services charge between $0.75 and $2.30 per minute to transcribe audio, meaning that transcribing a single 30-minute interview would cost between $22.50 and $69.00. However, a few people suggested instead looking into Amazon’s Mechanical Turk. So I did. (As an eighteenth-century studies scholar, I’ve always liked the name of this service.) The result? Each transcription of a roughly 30-minute interview cost about $12, which is way below even the lowest transcription rate I could find otherwise. As for reliability, the results are not as polished as what I probably would have gotten by paying more to a service devoted strictly to transcription, but the difference in quality is nowhere near enough to justify the difference in price between Amazon’s service and one of the more expensive options.
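The arithmetic behind these figures is easy to check. Here is a quick sketch, assuming a 30-minute interview, the per-minute rates quoted above, and my Turk setup of 5-minute clips at $2 each. It counts only what workers are paid and leaves out the commission Amazon charges requesters on top of that.

```python
from decimal import Decimal

INTERVIEW_MINUTES = 30

# Per-minute rates quoted by dedicated transcription services (low and high end).
SERVICE_RATES = {"lowest": Decimal("0.75"), "highest": Decimal("2.30")}

# Mechanical Turk setup: 5-minute clips, $2 offered per transcribed clip.
CLIP_MINUTES = 5
PAY_PER_CLIP = Decimal("2.00")


def service_cost(rate_per_minute, minutes=INTERVIEW_MINUTES):
    """Cost of a per-minute transcription service for one interview."""
    return rate_per_minute * minutes


def turk_cost(minutes=INTERVIEW_MINUTES, clip_minutes=CLIP_MINUTES, pay=PAY_PER_CLIP):
    """Worker pay on Mechanical Turk: one HIT per clip (a leftover partial clip counts as one)."""
    clips = -(-minutes // clip_minutes)  # ceiling division
    return clips * pay


for label, rate in SERVICE_RATES.items():
    print(f"{label} service rate: ${service_cost(rate):.2f}")
print(f"Mechanical Turk: ${turk_cost():.2f}")
```

Using `Decimal` rather than floats keeps the dollar amounts exact.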

How is this possible? Allow me to explain.

The basic framework

  • There are two ways anyone can make use of this service, as a “Requester” or as a “Worker.”
    • Requesters create “human intelligence tasks” (HITs) and offer to pay for their completion.
    • Workers complete HITs in exchange for the offered payment.
  • A HIT can be as simple as looking at an image and coming up with 3 keywords to describe it.
  • Payment can be as low as a few cents per HIT.

In concept, that’s all there is. In practice, publishing HITs and getting them completed is more complicated.

My specific process

For audio transcription, I followed the instructions in Andy Baio’s step-by-step blog post entitled “Cheap, Easy Audio Transcription with Mechanical Turk.” You can follow these steps, too,

  • if you’re comfortable writing or editing basic HTML,
  • if you can do some basic editing of sound files with an audio application like Audacity,
  • if you have somewhere online to post your audio clips,
  • if you can use a spreadsheet program or text editor to create a CSV file, and
  • if you’re patient with trial-and-error experimentation.
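The CSV step is where a little scripting can save some typing. When you publish a batch of HITs through the Requester website, each column header in the CSV file becomes a placeholder (such as `${audio_url}`) in your HIT template, and each row becomes one HIT. The sketch below writes such a file. The `audio_url` column name, the example base URL, and the clip-naming scheme are my own hypothetical choices, not part of Andy’s instructions.

```python
import csv
import io

# Hypothetical location where the audio clips have been posted online.
BASE_URL = "https://example.org/oral-histories/"


def clip_rows(interview_id, clip_count, base_url=BASE_URL):
    """Yield one row per clip; each row becomes one HIT in the batch."""
    for n in range(1, clip_count + 1):
        # Hypothetical naming scheme: interview01-01.mp3, interview01-02.mp3, ...
        yield {"audio_url": f"{base_url}{interview_id}-{n:02d}.mp3"}


def write_batch_csv(rows, out):
    """Write the rows as CSV; the 'audio_url' header matches ${audio_url} in the template."""
    writer = csv.DictWriter(out, fieldnames=["audio_url"])
    writer.writeheader()
    writer.writerows(rows)


buffer = io.StringIO()
write_batch_csv(clip_rows("interview01", 6), buffer)
print(buffer.getvalue())
```

To produce an actual file for upload, pass `open("batch.csv", "w", newline="")` instead of the in-memory buffer.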

Now those are a lot of conditions, I know. Luckily, I meet them all.

The results

Using Andy’s method, I broke each interview up into roughly 5-minute clips and offered to pay any Mechanical Turk worker $2 for each clip they transcribed. Remarkably, I usually had results within just a few hours of posting my HITs. While the quality varied from worker to worker, the resulting transcriptions were good enough that I needed to do only some light editing before considering them ready for more attentive proofreading, perhaps by an undergraduate student assistant using the original interview video as a guide.
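Splitting a recording into 5-minute pieces is mostly a matter of working out start and end times, which you can then use when selecting and exporting each region in Audacity. Here is a minimal sketch of that calculation. The optional overlap is my own hypothetical addition, not something from Andy’s post: with a few seconds of overlap, a word that falls on a cut point is heard in full in at least one clip, at the cost of an extra (paid) clip per interview.

```python
def clip_boundaries(total_seconds, clip_seconds=300, overlap_seconds=0):
    """Return (start, end) times in whole seconds for consecutive clips.

    With overlap_seconds > 0, each clip after the first starts that many
    seconds before the previous clip ended.
    """
    if clip_seconds <= overlap_seconds:
        raise ValueError("clip_seconds must be longer than overlap_seconds")
    clips = []
    start = 0
    while start < total_seconds:
        end = min(start + clip_seconds, total_seconds)
        clips.append((start, end))
        if end == total_seconds:
            break
        start = end - overlap_seconds
    return clips


def mmss(seconds):
    """Format whole seconds as MM:SS for reading off an audio editor's timeline."""
    return f"{seconds // 60:02d}:{seconds % 60:02d}"


# A 30-minute interview in 5-minute clips, without overlap.
for start, end in clip_boundaries(30 * 60):
    print(mmss(start), "-", mmss(end))
```

With no overlap, a 30-minute interview yields six clips, which at $2 a clip matches the roughly $12 per interview mentioned above. Adding a 10-second overlap pushes the same interview to seven clips.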

See for yourself by comparing the transcription below with this video clip from an interview with Clay Jeffcoat, who has used braille pretty much his entire life and now works as the Access Technology Specialist in the South Carolina School for the Deaf and Blind (SCSDB) Vision Outreach Program.

CLAY: “The hardest thing for me about learning braille would probably be…as far as I can recall, I didn’t have an extreme amount of problems learning the alphabet, the contractions. Braille is, in its most basic form, a literal spelling of a word, common words like ‘with.’ You can spell them out letter by letter, W-I-T-H. However, many words have a single symbol or a combination of symbols that can represent either a whole word or a combination of letters. For example, in the English language, sh‘es, ch, en, in… letter combinations like that are commonly used. So, there are symbols to represent those letter combinations. So, you have your uncontracted braille, which is letter by letter, and you have your contracted braille, which can have contractions for certain letters. Those contractions can get confusing. Once you get the alphabet down, you think you’ve got it made. Then, you start with what’s commonly referred to back in those days as ‘grade 2’ versus ‘grade 1.’ Now, it’s ‘contracted’ and ‘uncontracted.’”

Other possible applications

Of course, audio transcription is just one of many possible applications. Amazon provides examples of several “use cases” that include

  • “Catalog and Data Management,”
  • “Search Optimization,”
  • “Database Creation,” and
  • “Content Management.”

For more concrete (and perhaps unambitious) examples, consider these scenarios:

  • A collection of scanned images of handwritten notes needs to be turned into text files.
  • Each digital photo in an online archive needs some basic metadata. (Is this black and white? How many people are in the photo?)
  • A rough digital edition of a nineteenth-century novel has been created by scanning the pages and using optical character recognition software, and now the resulting text needs to be proofread against the scanned images.

There are probably more advanced possibilities that I’m not imagining at the moment. As the Turk FAQ explains, anyone with the requisite software development skills can use what they know about APIs to “build applications that help people … use the Amazon Mechanical Turk web service.” (See Julie’s awesome ProfHacker series on Working With APIs, by the way.)
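For a sense of what working with the API looks like, here is a rough sketch that uses Amazon’s boto3 library for Python, whose `mturk` client offers a `create_hit` call. Treat it as an untested outline under several assumptions: the title, description, $2.00 reward, time limits, and the `hit_params` and `create_sandbox_hit` helpers are hypothetical; it targets the requester sandbox, where no one is actually paid; and the bare form relies on Amazon’s `crowd-html-elements` script to handle submission, so it should be tried in the sandbox before any real use.

```python
import html

# Requester sandbox endpoint; HITs posted here are for testing and pay no one.
SANDBOX_ENDPOINT = "https://mturk-requester-sandbox.us-east-1.amazonaws.com"

# HTMLQuestion wrapper around a deliberately minimal transcription form.
HTML_QUESTION = """<HTMLQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2011-11-11/HTMLQuestion.xsd">
  <HTMLContent><![CDATA[
<!DOCTYPE html>
<script src="https://assets.crowd.aws/crowd-html-elements.js"></script>
<crowd-form>
  <p>Listen to the clip and type exactly what you hear.</p>
  <audio controls src="{audio_url}"></audio>
  <crowd-text-area name="transcript" rows="20"></crowd-text-area>
</crowd-form>
]]></HTMLContent>
  <FrameHeight>600</FrameHeight>
</HTMLQuestion>"""


def hit_params(audio_url, reward="2.00"):
    """Build keyword arguments for create_hit (all values here are hypothetical)."""
    return {
        "Title": "Transcribe a 5-minute audio clip",
        "Description": "Listen to a short interview clip and type what you hear.",
        "Keywords": "transcription, audio, interview",
        "Reward": reward,  # US dollars, passed as a string
        "MaxAssignments": 1,  # one worker per clip
        "LifetimeInSeconds": 3 * 24 * 60 * 60,  # HIT stays listed for three days
        "AssignmentDurationInSeconds": 2 * 60 * 60,  # a worker gets two hours per clip
        "Question": HTML_QUESTION.format(audio_url=html.escape(audio_url, quote=True)),
    }


def create_sandbox_hit(audio_url):
    """Post one HIT to the sandbox; needs boto3 installed and AWS credentials configured."""
    import boto3  # imported here so hit_params() works without boto3 installed

    client = boto3.client("mturk", region_name="us-east-1", endpoint_url=SANDBOX_ENDPOINT)
    response = client.create_hit(**hit_params(audio_url))
    return response["HIT"]["HITId"]
```

Keeping the parameter-building separate from the network call makes it easy to inspect exactly what would be sent before posting anything.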

Can we trust the results of such applications? Well, in this undated interview, Amazon talks with Rion Snow, from Stanford University’s Computer Science Department, about using this service “to explore whether nonexpert annotators could give you the same level of quality as expert annotators.” The results of the study are interesting, to say the least:

What we found was, in general if you just asked a single annotator on Turk to label all of the samples, the quality wasn’t going to be nearly as good as if you went and asked a linguist or graduate student to do the same thing. However, if you were able to break the work up among a large group of Turkers and ask them to perform multiple independent annotations per question you can actually do quite a bit better than experts.
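The idea of asking for “multiple independent annotations per question” can be illustrated with the simplest way of combining them, a majority vote. This is a generic sketch of that technique, not a reproduction of the method used in the study; the photo IDs and answers are made up.

```python
from collections import Counter


def majority_labels(annotations):
    """Combine several workers' labels per item by majority vote.

    annotations maps an item ID to the labels that independent workers gave it.
    Ties go to whichever tied label appears first in that item's list.
    """
    combined = {}
    for item, labels in annotations.items():
        counts = Counter(labels)
        best = max(counts.values())
        combined[item] = next(label for label in labels if counts[label] == best)
    return combined


# Hypothetical answers to "Is this photo black and white?" from five workers each.
votes = {
    "photo-001": ["yes", "yes", "no", "yes", "yes"],
    "photo-002": ["no", "yes", "no", "no", "yes"],
}
print(majority_labels(votes))  # {'photo-001': 'yes', 'photo-002': 'no'}
```

Asking five workers instead of one multiplies the cost per item by five, which is the trade-off behind the quoted finding.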

But is it ethical?

Most of the Turk workers who completed my HITs ended up earning about $2 to $3 per hour. Does this mean the system is exploiting people? Good question, and I’m not going to pretend that there’s an easy answer. I’ll return to this issue in a post next week.

What about you?

Have you ever used Amazon’s Mechanical Turk? If not, can you imagine how you might find it useful?

[Image created using a public domain file available at Wikimedia Commons]
