Researchers at the University of California at Irvine are trumpeting an advance in the field of "text mining": They've managed to get computers to analyze the topics of some 330,000 New York Times stories in just hours.
Text-mining efforts have typically been inefficient because computers require a lot of guidance to categorize text. But the Irvine scientists were able to expedite the process by using a technique called topic modeling, which teaches computers to search for patterns of words that tend to occur together in articles on specific subjects. It’s a pretty safe bet, for example, that articles that include the words "rider," "bike," "race," "Lance Armstrong" and "Jan Ullrich" are about the Tour de France. —Brock Read
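To make the idea of "words that tend to occur together" concrete, here is a minimal Python sketch. It is not the Irvine researchers' method: real topic modeling fits a statistical model, while this toy version only counts how many documents each pair of words shares. The function name `cooccurrence_counts` and the four sample "articles" are hypothetical and invented purely for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs):
    """Count, for each unordered word pair, how many documents contain both words."""
    counts = Counter()
    for doc in docs:
        # Deduplicate and sort so each pair is counted once per document
        # and always appears in the same (alphabetical) order.
        words = sorted(set(doc.lower().split()))
        counts.update(combinations(words, 2))
    return counts


# Hypothetical toy "articles", not real New York Times text.
docs = [
    "rider bike race mountain stage",
    "bike race rider sprint finish",
    "senate vote bill budget",
    "budget bill senate debate",
]

pairs = cooccurrence_counts(docs)

# Pairs that recur across documents hint at a shared subject.
for pair, n in pairs.most_common():
    if n > 1:
        print(pair, n)
```

In this toy data, pairs such as "bike" and "race" show up together in more than one document, while "bike" and "senate" never do. A topic model builds on that kind of signal to group words into subjects without being told the subjects in advance.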



