• April 21, 2014

Counterpoint: Doctoral-Program Rankings—the NRC Responds

When A Data-Based Assessment of Research-Doctorate Programs in the United States was released by the National Research Council on September 28, we expected many comments, including criticisms. We would like to offer our perspective on some of these comments and look forward to hearing the views of others on the type of online database that should be maintained in the future to help strengthen doctoral programs.

Stephen Stigler, a professor from the University of Chicago, offered a thoughtful critique, saying that the project "was doomed from the start" when reputation was downplayed. Rankings based on reputation were provided in the 1982 and 1995 NRC reports, but they were excluded from the charge given to the committee for the 2010 report. We agree that reputationally based rankings contain important information, which was discussed in the report, but they also contain serious weaknesses.

The decision not to include rankings based purely on reputation arose from three considerations. First, pure reputational rankings can contain "halo effects," meaning a program's ranking may be skewed by the university's overall reputation, or it may lag behind because of its past reputation. Second, reputationally based rankings were not supported by many universities whose participation was needed to collect the data. And third, this study was intended to provide a comprehensive, updatable collection of data on characteristics associated with perceived program quality that would allow faculty, students, administrators, and other stakeholders to assess programs based on their own values, and thereby become a transparent tool to strengthen doctoral programs. The reputationally based rankings that received the most attention in previous reports did not provide a means to achieve this last important objective.

Professor Stigler also argues that the rankings have "little credence." We want to stress that the rankings provided in the NRC report are really intended to be "illustrative." They are not intended to be definitive; they are not endorsed by the NRC above other alternatives that might be constructed. Instead, they are examples of two ways of deriving faculty values for determining weights for making rankings, which illustrate how stakeholders can apply their own weights.

Professor Stigler criticizes the fact that not all programs were surveyed in determining the weights for the R (regression-based) rankings. Even if this had been done, the R and S (survey-based) rankings would still be called "illustrative." An important insight from comparing the differences between the S and R rankings is that faculty members generally do not assign great importance to program size (compared with scholarly output per faculty member) when assigning weights directly to characteristics—but when they rank programs, size appears to implicitly carry large weight. So it is not so clear which weights most accurately represent faculty values.

The committee also wanted to avoid the defect of "spurious precision" associated with rankings. The ranges provided with the R and S rankings represent an estimate of statistical uncertainties, but they do not represent all of the uncertainties associated with the challenging task of trying to develop a methodology for assessing programs based solely on quantitative measures of program characteristics. The committee presented both R and S rankings with large statistical ranges and called them "illustrative" to indicate that models based solely on quantitative variables and implicit and explicit faculty views also have pitfalls, as noted in the report.

A valid criticism made by some observers relates to potential errors in the huge database. The NRC took many precautions during data collection to ensure the accuracy of the information. In spite of these efforts, errors may persist. The NRC seeks information by November 1, 2010, regarding possible mistakes, and will work with each institution to identify the source of each error and to see if it can be remedied. We will record what universities tell us on a publicly available list. Unfortunately, at least one institution submitted faculty lists for some departments that were not correct because numerous adjunct faculty members were included.

A number of other universities corrected similar errors during the validation process. Some errors may have arisen from complex guidance provided by the NRC in an effort to obtain lists of only those faculty involved in doctoral education and research. Other errors may have resulted from mistakes by some universities during efforts to collect and submit data. At this point, it would be difficult to make corrections on the 2005-6 data for characteristics such as publications and citations, which depend on faculty lists.

The fields of computer science and communication are examples where the illustrative rankings may be more problematic. In the case of computer science, the publications of faculty members were re-counted when it became clear that certain peer-reviewed conference proceedings were highly valued in the field. While the publications per faculty member were updated based on this re-count, citations per faculty member had to be dropped, because it was not possible to include peer-reviewed conference proceedings in the data on citations. In the field of communication, the information collected for the database may not include all types of scholarly work that are most important for assessing programs in this discipline.

We hope that departments will post updated data, perhaps for the 2009-10 academic year, on their program Web sites. We also hope that each discipline will discuss whether the program characteristics and methods of data collection encompass what is most important for assessing their programs, so that appropriate changes may be incorporated in the future. Lastly, we hope that all of those interested in graduate education in the United States will work together to maintain a regularly updated, online database that can be an important tool for helping to strengthen graduate programs through continuous improvement. The National Research Council is certainly willing to help with that task.

E. William Colglazier is executive officer of the National Academy of Sciences and chief operating officer of the National Research Council. Jeremiah P. Ostriker is chair of the Committee on an Assessment of Research-Doctorate Programs and provost emeritus at Princeton University.


1. soc_sci_anon - October 18, 2010 at 02:03 pm

This doesn't address the seemingly incomprehensible decisions to, for example, exclude books as part of research productivity in the social sciences, to fail to account for the quality of the journals in which articles appear (using, e.g., impact factors), or to only count math GRE scores. Did the committee talk only to economists and assume that they are representative of the social sciences?

The measures are deeply flawed, even if they were accurately entered (which, as the NRC admits, they often weren't). Yes, they can just be ignored, but the gloss of objectivity in measuring, e.g., faculty productivity, lends legitimacy to what is a massively flawed database. Garbage in, garbage out, but in this case the outgoing garbage is dressed in confidence intervals.

It's also ironic that the (current) NRC disparages reputation-based measures. After all, if any organization except the NRC had produced this effort, it would be ignored, and deservedly so. It's only the NRC name that gives the rankings as much air-time as they currently have.

2. sabdale - October 19, 2010 at 03:39 am

This flawed means of gathering information lends credence to what Paulo Freire talks about in Pedagogy of the Oppressed: simply oppressive methods of disparaging institutions that do not comply with their microscopic version of rating colleges and universities. How about using a fair system not based on standardized test data?

3. marka - October 26, 2010 at 01:38 am

Whoa - wow - Hey - if you don't like someone's metrics, please provide your own. Way too easy to take potshots at efforts to give illustrative rankings - probably way too hard for those criticizing to come up with appropriate measures.

In my experience, folks who criticize metrics, when pressed, can't come up with anything even remotely comparable to be used as a way of judging relative merit. Instead, because measuring merit may be difficult, they simply throw up their hands and claim it is impossible. Of course, many then turn around & use all sorts of 'metrics' to make everyday judgments in their respective lives: grades, test scores, Consumer Reports evaluations, reference letters, recommendations from friends & family, peer review, brand status, etc.

So ... what would be 'fair' that is not based on standardized data? To me, that is oxymoronic - can't be 'fair' if it isn't standardized (one teacher's 'A' is not really comparable to another's, unless one 'standardizes' how one issues an 'A').
