• September 5, 2015

Scholars Seek Better Metrics for Assessing Research Productivity

Evaluating scholars simply by tallying their citations is "like saying Britney Spears is the most important artist who ever existed because she's sold 50 million records," said Johan Bollen, an associate professor of informatics and computing at Indiana University at Bloomington, as he introduced a daylong workshop on academic metrics here on Wednesday.

"That's not how we do things in the real world," Mr. Bollen continued. "And I don't think we should do things like that in scholarly assessments either. … We need to find ways to take trust, prestige, and influence into account."

But while it is easy to criticize simplistic measures of citations and "impact factors," it is not so easy to find agreement about how to improve those metrics. Faculty members sometimes suggest that evaluators should de-emphasize numbers and instead look qualitatively at how research projects affect the public good. But a version of that proposal might be put into practice in Britain, and the idea is now causing anger and anxiety among scholars there.

At Wednesday's workshop, roughly a dozen researchers, publishers, and agency officials described a wide range of new assessment schemes, most of which are still in the formative stages.

The workshop was organized by Mr. Bollen and supported by a grant from the National Science Foundation. It was held in conjunction with a meeting of the Coalition for Networked Information, a group of libraries and universities concerned with scholarly communication.

A Measure of 'Usage'

Mr. Bollen began the day by describing Mesur, a research project that has compiled data about hundreds of millions of "usage events"—that is, page views and downloads—from several major scholarly publishers and databases.

By using network analysis, Mr. Bollen said, he and his colleagues have been able to estimate which scholarly articles and journals are truly central to the flow of information. They have also experimented with dozens of different measures of scientific impact that might be derived from their data.

They have also mined their database to create a "map of science" that visually describes the flow of information among various disciplines.
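The article does not describe Mesur's algorithms in detail, but network centrality of this kind is commonly computed with a PageRank-style iteration over a weighted graph of usage or citation links. A minimal sketch under that assumption, with invented journal names and link counts:

```python
# Hypothetical sketch: ranking journals by PageRank-style centrality on a
# usage network, in the spirit of Mesur's network analysis. The journal
# names and link weights below are invented for illustration.

def centrality(links, iterations=100, damping=0.85):
    """PageRank-style centrality over a weighted directed graph.

    links: dict mapping source -> {target: weight}, e.g. click or
    citation counts between journals. (Dangling nodes with no
    outgoing links are ignored here for brevity.)
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for src, targets in links.items():
            total = sum(targets.values())
            for tgt, weight in targets.items():
                new[tgt] += damping * rank[src] * weight / total
        rank = new
    return rank

usage = {
    "Journal A": {"Journal B": 120, "Journal C": 30},
    "Journal B": {"Journal A": 80, "Journal C": 60},
    "Journal C": {"Journal A": 10},
}
for name, score in sorted(centrality(usage).items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```

A journal's score rises when heavily used journals link to it, which is how such measures try to capture "prestige" rather than raw counts.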

The usage data harvested by Mesur and similar projects can be a powerful tool—but that information needs to be interpreted intelligently, said Michael J. Kurtz, an astronomer at the Harvard-Smithsonian Center for Astrophysics.

Roughly half of the online page views of research articles in astronomy are clearly generated by scholars, Mr. Kurtz said, because those readers arrive via Harvard's Astrophysics Data System, an academic search portal. The other half are generated by Google searches, and those readers often seem to be nonscholars who arrived at the articles more or less randomly.

"There is virtually no correlation between the behavior of the two groups of readers," Mr. Kurtz said. So if researchers are going to be rewarded because 50,000 people read one of their scholarly articles, should the evaluators worry about how many of those readers seem to be scientists? Mr. Kurtz did not offer an opinion, but said that question would need to be debated as open-access databases take root.

Analyses of Journals' Strengths

Jevin D. West, a doctoral candidate in biology at the University of Washington at Seattle, described Eigenfactor, another project that seeks to use network analysis to assess the strengths of various journals. (The project's Web site features an array of elaborate graphics and top-25 lists.)

"There is no reason at this point for us simply to be counting citations. We have the computational resources and the conceptual advances in network science" to do much better analyses, Mr. West said.

He added that he and other biologists are interested in analyzing scholarly information flows because they might provide a model for studying other huge, complex, dynamic systems, including ecological systems and genomic databases.

A much simpler project was described by Jorge E. Hirsch, a professor of physics at the University of California at San Diego. Four years ago, Mr. Hirsch proposed what he called an "h index" to replace traditional measures of citations and research productivity.

The idea quickly caught fire, but it has also been widely criticized, in part because of the difficulty of treating papers with multiple authors. (Last month Mr. Hirsch proposed an alternative measure, hbar, that he says would deal with that problem.)

"One flaw in all of these measures," Mr. Hirsch said, "is that no bibliometric measure will do a good job picking up very novel, nonmainstream research until it becomes mainstream." But there is probably no good way to fix that problem, he added; it is just something that scholars will have to live with.
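The h-index itself has a simple definition — the largest h such that h of a scholar's papers have each been cited at least h times — and is easy to compute. A short sketch:

```python
def h_index(citations):
    """Largest h such that h of the papers have at least h citations each."""
    cites = sorted(citations, reverse=True)
    h = 0
    while h < len(cites) and cites[h] >= h + 1:
        h += 1
    return h

# A scholar with papers cited [10, 8, 5, 4, 3] times has an h-index of 4:
# four papers have at least 4 citations, but not five with at least 5.
print(h_index([10, 8, 5, 4, 3]))  # -> 4
```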

Peter Binfield, the managing editor of PLoS ONE, a major online open-access journal, described the steps that his journal has taken to measure the impact of its work. Every article in a PLoS journal features a "metrics" tab that reveals how often the article has been viewed, how often it has been cited, how many readers have tagged the article on social-bookmarking sites, and how many blogs have discussed the article.

Each of those measures is imperfect, Mr. Binfield said, but there is no reason for publishers not to experiment with them.

Mr. Binfield said it is too soon to say whether those measures will be embraced. "Will the public understand and trust these numbers?" he asked. "Will the scholarly community adopt them? Will promotion-and-tenure committees start to look at them? Will people quote these figures on their CV's?"

Those questions probably apply to all of the projects discussed here Wednesday.


1. richardtaborgreene - December 17, 2009 at 05:51 am

I suggest that we measure how dead our civilization is, then which journal put the last nail in its coffin---that at least will be a practical definition. It will tell us who to blame after all is devastated by our penchant for generating lots of knowledge no one uses (except via citation).

2. triumphus - December 17, 2009 at 06:38 am

So much to measure; so little time.

3. educationfrontlines - December 17, 2009 at 09:09 am

In taxonomy, a number of individual new species may be described by various researchers over a long period of time. When enough species have been described, a researcher may write a monograph that revises the genus and clarifies the biogeography and evolutionary relationships. If this is done well, it may quiet the field for a long time and become the authoritative reference work in the lab, but result in very few citations. If the researcher gets it wrong, it can result in many citations as colleagues have to correct it.

Subdisciplines vary in the nature of their research and the validity of ham-handed beancounting, no matter how manipulated. There is no substitute for expertise in the field to evaluate research, which means that cross-discipline research should not be compared. Contrived indices are part of the undervaluing of botany, zoology, systematics and many other fields.

John Richard Schrock

4. davi2665 - December 17, 2009 at 10:40 am

Most promotion and tenure committees will continue to count publications, with little attention to the journals in which they appeared. Of course, one of the problems with some of the "high impact" journals is that they are controlled by the good ol' boy network that provides space for their own cronies to publish, but little for the innovative new kids on the block.

Citation evaluations can help, but must be placed in the context of the purpose of the paper. Some of the most heavily cited papers are methodological and are cited because they describe the latest technique du jour. Research creativity and real intellectual impact may be very difficult to reduce to numbers for a committee to evaluate.

One of the most respected and revered scientists I have ever known had a total number of peer-reviewed manuscripts in the 20s. They were all classics, comprehensive in scope, and brilliantly carried out and analyzed; this professor's impact on the field, on the growth of the discipline, and on the lives of his students and colleagues was significant and obvious. This type of productivity is far more valuable to a field than the "least publishable units" of endless mundane trivia that litter the CVs of many so-called scholars whose focus is on "my CV is bigger than your CV."

I still consider the best way to evaluate research productivity and impact is to have leading figures in the field, not selected by the candidate being evaluated, provide frank evaluation and due diligence on the key publications of that candidate.

5. browng8 - December 17, 2009 at 11:49 am

It is interesting to compare this with the piece on assessing learning outcomes ( http://chronicle.com/article/Learning-to-Hate-Learning/49399/#lastComment ), which we tend to resist, versus research metrics, which, in general, we have learned to accept.

6. cwinton - December 17, 2009 at 01:47 pm

Create a system and people will figure out how to manipulate it. The only true measure of worth is to wait 50 or 100 years to see if the work had any lasting positive impact. That of course would probably eliminate about 99% of what is published today from consideration, but then it's obviously not a practical measure. Assuming most publications have little if any future value and that many are counterproductive, perhaps we should be measuring worth based on a scale from -10 to 10, to be normally distributed around 0 with a standard deviation of say 2, and where negative values correspond to research pollution.

7. wepstein - December 17, 2009 at 11:03 pm

Schrock had it right: "There is no substitute for expertise in the field to evaluate research." But in many universities faculty do not have the required expertise. The life of the mind is often the life of the ego.

Further, would someone out there please translate what hbar is. Its author is cryptic and David Glenn's article strategically avoided a description with a web link.

William M. Epstein

8. dgle6511 - December 18, 2009 at 11:49 am

"Strategically avoided a description with a web link"? The article does link to Hirsch's hbar paper. Click on "hbar" in the text above to see Hirsch's abstract -- and from there, you can download the full paper in various formats.

Here's the basic definition of Hirsch's hbar index, from the first page of his paper:

"A scientist has index [hbar] if [hbar] of his/her papers belong
to his/her [hbar] core. A paper belongs to the [hbar] core of a
scientist if it has greater than or equal to [hbar] citations and in addition belongs to the h-core of each of the coauthors of the paper."

Perfectly obvious, right? Actually, it does become reasonably clear if you wade through Hirsch's paper. But I'll try to translate here:

Suppose that I'm a junior scholar, and that I've published 7 papers, 5 of which have each been cited at least 5 times. So my h-index -- that's Hirsch's original measure -- is 5.

But suppose that I had a co-author on 2 of those 5 papers.

In one case, my co-author was a senior scholar with dozens of publications. Let's say that the senior scholar's h-index is 34. And let's say that the paper has been cited only 11 times, so it does NOT count toward increasing the senior scholar's h-index. Under Hirsch's hbar index, this paper should not count toward my hbar index, either.

In the second case, my co-author was another junior scholar with relatively few publications and an h-index of 9. Let's say that this paper has been cited 10 times, so it DOES count toward building my co-author's h-index. In this case, under Hirsch's hbar index, the paper counts toward my hbar score (and also my co-author's hbar score).

So my hbar score is 4.
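The worked example above can be checked in code. This is a simplified sketch, not Hirsch's full definition (which is recursive); it assumes a paper belongs to a coauthor's h-core whenever its citation count is at least that coauthor's h-index, and the citation counts are invented to match the example:

```python
# Hedged sketch of Hirsch's hbar index using the worked example above.
# Assumption: a paper is in a coauthor's h-core when its citation count
# is at least that coauthor's h-index (a simplification of the paper).

def h_index(citations):
    """Largest h such that h of the papers have at least h citations each."""
    cites = sorted(citations, reverse=True)
    h = 0
    while h < len(cites) and cites[h] >= h + 1:
        h += 1
    return h

def hbar_index(papers):
    """papers: list of (citations, [coauthor h-indices]) pairs."""
    eligible = [c for c, coauthor_hs in papers
                if all(c >= h for h in coauthor_hs)]
    return h_index(eligible)

# Seven papers, five cited at least 5 times, as in the example.
papers = [
    (11, [34]),  # co-authored with a senior scholar (h = 34): excluded
    (10, [9]),   # co-authored with a junior scholar (h = 9): included
    (7, []), (6, []), (5, []), (2, []), (1, []),
]
print(h_index([c for c, _ in papers]))  # h-index: 5
print(hbar_index(papers))               # hbar: 4
```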

David Glenn

9. wepstein - December 19, 2009 at 08:06 pm

Thanks for the attempt to clarify. I can understand why the description was not in the article. Regards.

10. 11113567 - December 19, 2009 at 09:10 pm

Being cited in an academic, peer reviewed paper is not the same as Britney selling product. Racking up page views is. So if the problem is raw quantification without measurement of quality, how is counting eyeballs going to solve it?

Counting citations that trash a paper is arguably a problem, but since anyone who trashes a scholar considers them worth responding to, it is still a measure of one's impact to be cited, however negatively.

Counting citations is still the best way to measure impact. Compare your own number of citations to Albert Einstein's if you don't believe me.
