Scholars Seek Better Metrics for Assessing Research Productivity

December 16, 2009

Evaluating scholars simply by tallying their citations is "like saying Britney Spears is the most important artist who ever existed because she's sold 50 million records," said Johan Bollen, an associate professor of informatics and computing at Indiana University at Bloomington, as he introduced a daylong workshop on academic metrics here on Wednesday.

"That's not how we do things in the real world," Mr. Bollen continued. "And I don't think we should do things like that in scholarly assessments either. … We need to find ways to take trust, prestige, and influence into account."

But while it is easy to criticize simplistic measures of citations and "impact factors," it is not so easy to find agreement about how to improve those metrics. Faculty members sometimes suggest that evaluators should de-emphasize numbers and instead look qualitatively at how research projects affect the public good. A version of that proposal might be put into practice in Britain, however, and the idea is now causing anger and anxiety among scholars there.

At Wednesday's workshop, roughly a dozen researchers, publishers, and agency officials described a wide range of new assessment schemes, most of which are still in the formative stages.

The workshop was organized by Mr. Bollen and supported by a grant from the National Science Foundation. It was held in conjunction with a meeting of the Coalition for Networked Information, a group of libraries and universities concerned with scholarly communication.

A Measure of 'Usage'

Mr. Bollen began the day by describing Mesur, a research project that has compiled data about hundreds of millions of "usage events"—that is, page views and downloads—from several major scholarly publishers and databases.

By using network analysis, Mr. Bollen said, he and his colleagues have been able to estimate which scholarly articles and journals are truly central to the flow of information. They have also experimented with dozens of different measures of scientific impact that might be derived from their data.

In addition, the team has mined its database to create a "map of science" that visually describes the flow of information among various disciplines.

The usage data harvested by Mesur and similar projects can be a powerful tool, but that information needs to be interpreted intelligently, said Michael J. Kurtz, an astronomer at the Harvard-Smithsonian Center for Astrophysics.

Roughly half of the online page views of research articles in astronomy are clearly generated by scholars, Mr. Kurtz said, because the readers arrive via Harvard's Astrophysics Data System, an academic search portal. But the other half or so are generated by Google searches, and the people who use that tool often seem to be nonscholars who arrive at the articles more or less randomly.

"There is virtually no correlation between the behavior of the two groups of readers," Mr. Kurtz said. So if researchers are going to be rewarded because 50,000 people read one of their scholarly articles, should the evaluators worry about how many of those readers seem to be scientists? Mr. Kurtz did not offer an opinion, but said that question would need to be debated as open-access databases take root.

Analyses of Journals' Strengths

Jevin D. West, a doctoral candidate in biology at the University of Washington at Seattle, described Eigenfactor, another project that seeks to use network analysis to assess the strengths of various journals. (The project's Web site features an array of elaborate graphics and top-25 lists.)

"There is no reason at this point for us simply to be counting citations. We have the computational resources and the conceptual advances in network science" to do much better analyses, Mr. West said.

He added that he and other biologists are interested in analyzing scholarly information flows because they might provide a model for studying other huge, complex, dynamic systems, including ecological systems and genomic databases.
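
To make the contrast with simple citation counting concrete, here is a minimal sketch in Python of one way network analysis can weight a citation by the standing of the journal it comes from. It is a generic PageRank-style iteration, not the Eigenfactor project's own method; the function name, the damping value, and the five toy journals are assumptions made up for illustration.

```python
def rank_by_citation_flow(citations, damping=0.85, iterations=100):
    """Score journals by how much citation 'flow' reaches them.

    citations[src][dst] is the number of times journal src cites journal dst.
    Scores always sum to 1 (up to floating-point error).
    """
    # Collect every journal that cites or is cited.
    journals = sorted(
        set(citations) | {dst for targets in citations.values() for dst in targets}
    )
    n = len(journals)
    score = {j: 1.0 / n for j in journals}

    for _ in range(iterations):
        # Every journal gets a small baseline share.
        new_score = {j: (1.0 - damping) / n for j in journals}
        for src in journals:
            targets = citations.get(src, {})
            total = sum(targets.values())
            if total == 0:
                # A journal that cites nothing spreads its score evenly.
                for j in journals:
                    new_score[j] += damping * score[src] / n
            else:
                # Otherwise it passes its score along in proportion to its citations.
                for dst, count in targets.items():
                    new_score[dst] += damping * score[src] * count / total
        score = new_score
    return score


# Hypothetical network: D and E each receive exactly one citation,
# but D's comes from the heavily cited journal C and E's from peripheral A.
toy_network = {
    "A": {"C": 1, "E": 1},
    "B": {"C": 1},
    "C": {"D": 1},
    "D": {"C": 1},
}
scores = rank_by_citation_flow(toy_network)
for journal in sorted(scores, key=scores.get, reverse=True):
    print(journal, round(scores[journal], 3))
```

In this toy network a raw tally would treat D and E as equals, with one citation apiece. The flow-based score ranks D well above E, because D's single citation comes from the journal at the center of the network.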

A much simpler project was described by Jorge E. Hirsch, a professor of physics at the University of California at San Diego. Four years ago, Mr. Hirsch proposed what he called an "h index" to replace traditional measures of citations and research productivity.

The idea quickly caught fire, but it has also been widely criticized, in part because of the difficulty of treating papers with multiple authors. (Last month Mr. Hirsch proposed an alternative measure, hbar, that he says would deal with that problem.)
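
For readers unfamiliar with it, the h index as commonly defined is the largest number h such that a scholar has h papers with at least h citations each. A short Python sketch of that standard definition follows; the function name and the sample citation counts are invented for illustration, and the sketch does not attempt the multiple-author adjustment behind hbar.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # most-cited paper first
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Hypothetical citation counts for one scholar's five papers.
sample_counts = [10, 8, 5, 4, 3]
print(h_index(sample_counts))  # prints 4
```

The sample scholar has four papers with at least four citations each but not five with at least five, so the index is 4. A single blockbuster paper barely moves the number, which is part of the measure's appeal and also one of its limits.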

"One flaw in all of these measures," Mr. Hirsch said, "is that no bibliometric measure will do a good job picking up very novel, nonmainstream research until it becomes mainstream." But there is probably no good way to fix that problem, he added; it is just something that scholars will have to live with.

Peter Binfield, the managing editor of PLoS ONE, a major online open-access journal, described the steps that his journal has taken to measure the impact of its work. Every article in a PLoS journal features a "metrics" tab that reveals how often the article has been viewed, how often it has been cited, how many readers have tagged the article on social-bookmarking sites, and how many blogs have discussed the article.

Each of those measures is imperfect, Mr. Binfield said, but there is no reason for publishers not to experiment with them.

Mr. Binfield said it is too soon to say whether those measures will be embraced. "Will the public understand and trust these numbers?" he asked. "Will the scholarly community adopt them? Will promotion-and-tenure committees start to look at them? Will people quote these figures on their CV's?"

Those questions probably apply to all of the projects discussed here Wednesday.