Text-Mining the ‘Times’

August 1, 2006, 1:53 pm

Researchers at the University of California at Irvine are trumpeting an advance in the field of "text mining": They’ve managed to get computers to analyze the topics of some 330,000 New York Times stories in just hours.

Text-mining efforts have typically been inefficient because computers require a lot of guidance to categorize text. But the Irvine scientists were able to expedite the process by using a technique called topic modeling, which teaches computers to search for patterns of words that tend to occur together in articles on specific subjects. It’s a pretty safe bet, for example, that articles that include the words "rider," "bike," "race," "Lance Armstrong" and "Jan Ullrich" are about the Tour de France. —Brock Read
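
For readers curious what "patterns of words that tend to occur together" might look like in practice, here is a minimal sketch in Python. It is not the Irvine team's method, which this post does not describe in detail; it only counts how often pairs of words show up in the same article, which is a much cruder idea than a real topic model. The tiny corpus and the function names `cooccurrence_counts` and `related_words` are made up for illustration.

```python
from collections import Counter
from itertools import combinations

# Toy corpus, invented for illustration; this is not the Times data.
ARTICLES = [
    "rider bike race stage mountain",
    "bike race rider sprint stage",
    "senate vote bill tax budget",
    "tax bill senate budget vote",
    "race rider bike budget",
]


def cooccurrence_counts(articles):
    """Count how many articles each unordered word pair appears in together."""
    counts = Counter()
    for text in articles:
        # Deduplicate and sort so each pair is stored once, in a fixed order.
        words = sorted(set(text.split()))
        counts.update(combinations(words, 2))
    return counts


def related_words(counts, seed, min_count=2):
    """Return words that share at least min_count articles with the seed word."""
    related = set()
    for (a, b), n in counts.items():
        if n >= min_count:
            if a == seed:
                related.add(b)
            elif b == seed:
                related.add(a)
    return sorted(related)


counts = cooccurrence_counts(ARTICLES)
print(related_words(counts, "bike"))    # ['race', 'rider', 'stage']
print(related_words(counts, "senate"))  # ['bill', 'budget', 'tax', 'vote']
```

Even this simple count separates the cycling words from the politics words without anyone labeling the articles by hand. A full topic model goes further by treating each article as a mix of several topics, but the underlying signal is the same: words that keep turning up together.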
