April 17, 2014

A Critic Sees Deep Problems in the Doctoral Rankings

Kris Snibbe, Harvard News Office

Stephen Stigler

One scholar who is not impressed by the National Research Council's doctoral study is Stephen M. Stigler, a professor of statistics at the University of Chicago.

Mr. Stigler was invited by the NRC to review early drafts of the methodology guide, and he began to circulate criticisms privately among colleagues in the summer of 2009. This week he posted a public critique of the NRC study on his university's Web site. That statement's bottom line: "Little credence should be given" to the NRC's ranges of rankings.

"Their measures don't distinguish among programs—or at least they don't distinguish in the way that people expect for a study like this," says Mr. Stigler, who is the author of Statistics on the Table: The History of Statistical Concepts and Methods (Harvard University Press, 1999). "Are we No. 10 or No. 15 or No. 20? There's not very much real information about quality in the simple measures they've got."

One of Mr. Stigler's chief concerns is the way the NRC gathered data for its R-rankings. (For a detailed explanation of the NRC's S-rankings and R-rankings, see a list of Frequently Asked Questions.)

To construct its R-rankings, the NRC surveyed faculty members for their opinions (on a scale of 1 to 6) of a sample of programs in each field. But those samples were generally not very large. In many cases, fewer than half of the programs in a field were included. In psychology, only 50 out of the 237 doctoral programs were included.

The NRC project's directors say that those small samples are not a problem, because the reputational scores were not converted directly into program assessments. Instead, the scores were used to develop a profile of the kinds of traits that faculty members value in doctoral programs in their field. That profile was then used to assess all programs in the field.

So if the reputational scores implied, for example, that faculty members in sociology admired large, ethnically diverse programs with high GRE scores, then all sociology programs that had those traits tended to do well in the NRC's R-rankings.
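The two-step procedure described above can be sketched with entirely hypothetical data: regress the sampled reputational scores on program traits to recover a weight profile, then apply those weights to every program in the field, surveyed or not. The trait names, sample sizes, and numbers below are illustrative only, not the NRC's actual data or code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical field: 237 programs with three standardized traits
# (say size, diversity, mean GRE), but reputational scores gathered
# for only a 50-program sample -- mirroring the psychology example.
n_all, n_sample = 237, 50
traits = rng.normal(size=(n_all, 3))
true_w = np.array([0.5, 0.3, 0.8])   # the profile the survey tries to recover
reputation = traits @ true_w + rng.normal(scale=0.5, size=n_all)

sample = rng.choice(n_all, size=n_sample, replace=False)

# Step 1: estimate trait weights from the sampled reputational scores.
w, *_ = np.linalg.lstsq(traits[sample], reputation[sample], rcond=None)

# Step 2: score and rank ALL programs with those weights,
# including the 187 that were never in the survey.
scores = traits @ w
r_rank = np.argsort(np.argsort(-scores)) + 1   # rank 1 = best
```

The key design choice, and the target of Mr. Stigler's criticism, is that the 187 unsurveyed programs are ranked with weights estimated entirely from the other 50.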

That system is fine in theory, Mr. Stigler says. But he strongly believes that all programs should have been included in the underlying reputational surveys.

For one thing, Mr. Stigler says, the relationships between programs' reputations and the various program traits are probably not simple and linear. Take GRE scores, for example. If students' average GRE scores fall below a certain level, that might be associated with steep drops in programs' reputations. Or, at the other end of the quality scale, reputations might spike sharply in cases where faculty members' publication-citation rates are above some threshold. In other words, if these correlations between reputation and citations were plotted on a graph, the most accurate representation would be a curved line, not a straight line. (The curve would occur at the tipping point where high citation levels make reputations go sky-high.)

But it is impossible to have an accurate picture of those nonlinear relationships, Mr. Stigler says, if only a small minority of programs were included in the reputational survey.
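The nonlinearity argument can be made concrete with a toy example. Suppose, hypothetically, that reputation jumps once citation rates pass a tipping point; a straight-line fit smooths the jump away and systematically underpredicts the elite programs. The threshold and numbers below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical threshold effect: reputation rises gently with citations,
# then jumps by 2 points once citation rates exceed 8 -- a curved
# relationship, not a straight line.
citations = rng.uniform(0, 10, size=200)
reputation = 3.0 + 0.2 * citations + 2.0 * (citations > 8)

# Fit a straight line using only a 30-program sample.
sample = rng.choice(200, size=30, replace=False)
slope, intercept = np.polyfit(citations[sample], reputation[sample], 1)

# The linear fit averages the jump into the slope, so programs above
# the tipping point are systematically underpredicted.
elite = citations > 8
pred = slope * citations + intercept
gap = (reputation[elite] - pred[elite]).mean()   # positive on average
```

A model family that cannot bend cannot reveal the tipping point, and a small sample makes the curvature even harder to detect.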

And that means in turn, according to Mr. Stigler, that the accuracy of each program's R-ranking depends on whether it was actually included in the reputational survey. If a program was not in the survey, then its R-ranking is based on weights that have "potentially much greater errors than those for other programs," he wrote in a privately circulated critique last year.

The NRC should disclose which programs were and were not included in those reputational surveys, Mr. Stigler says. But there are no plans to do so, the project's committee chairman, Jeremiah P. Ostriker, said in a recent interview.

A 'Paradox'

Mr. Stigler is not much happier with the S-rankings. Those rankings are based on surveys where faculty members were asked directly about which traits are most important to the quality of doctoral programs in their fields.

Many doctoral programs have S-ranking ranges that are very wide. For example, a program's S-ranking range might be 3-18, meaning that the NRC is 90 percent confident that its "true" S-rank is between 3 and 18. That breadth is based partly on variations in how faculty members weighted the traits on the surveys.

But in an analysis done this week, Mr. Stigler noticed that in most fields, there actually wasn't much variation in how faculty members weighted the various traits. For most traits in most fields, the range of faculty weights is very small.

That presents a paradox, Mr. Stigler says. Tiny differences in faculty weights lead to huge swings in programs' S-rankings. How can that be?

The answer, Mr. Stigler argues, is that the variables in the NRC's study actually aren't very good at making distinctions among programs, especially among programs that are clustered in the middle of the quality spectrum.
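This resolution of the paradox can be demonstrated with simulated data: when composite scores are tightly clustered, even a tiny change in the weights reshuffles the rank order substantially. The program counts and perturbation size below are arbitrary choices for illustration, not the NRC's.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical field: 100 programs whose composite scores end up
# tightly clustered, as in the middle of the quality spectrum.
traits = rng.normal(size=(100, 5))
weights = np.full(5, 0.2)

def rank(w):
    """Rank programs by weighted composite score (rank 1 = best)."""
    s = traits @ w
    return np.argsort(np.argsort(-s)) + 1

base = rank(weights)
# A tiny perturbation of the weights...
jiggled = rank(weights + rng.normal(scale=0.01, size=5))
# ...still reorders clustered programs, often by several places.
max_shift = np.abs(base - jiggled).max()
```

Because the score gaps between adjacent mid-pack programs are smaller than the effect of even a minute weight change, the rankings swing without the weights carrying much real information about quality.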

"Their measures, for most of the body of programs, are unable to distinguish between programs," Mr. Stigler says. "They can roughly distinguish, I suppose, between what is in the top half and the lower half of the nation, which is not a major feat."

Mr. Stigler says that it was a mistake for the NRC to so thoroughly abandon the reputational measures it used in its previous doctoral studies, in 1982 and 1995. Reputational surveys are widely criticized, he says, but they do provide a check on certain kinds of qualitative measures. When the new NRC counts faculty publication rates, it does not offer any information about whether scholars in the field believe those publications are any good. (That's especially true in humanities fields, where the NRC report does not include citation counts.)

"Everybody involved in this was trying hard, and with good intentions and high integrity," Mr. Stigler says. "But once they decided to rule out reputation, they cut off what I consider to be the most useful measure from all past surveys."

In an e-mail message to The Chronicle this week, Mr. Ostriker declined to reply to Mr. Stigler's specific statistical criticisms. But he pointed out that the National Academies explicitly instructed his committee not to use reputational measures.

"Many other groups have collected reputationally based ratings and rankings in the past and continue to do so," Mr. Ostriker said. "I can see the virtue in such efforts, but it was not our task to do this."


1. 11221722 - September 30, 2010 at 05:02 pm

"Tiny differences in faculty weights lead to huge swings in programs' S-rankings. How can that be?"

Well, the NRC is also perturbing the data between iterations, so the S-rankings end up being driven by that. In fact, they are probably driven by the perturbations in your own program's data, since the mean and standard deviation (the other factors building the standardized score) should not vary that much between iterations.

The R weights go nuts between iterations, so that tends to drive the ranking range there. At least with the S rankings you know what is an important measure ratings wise...

Compare those .05 and .95 info sheets (also as compared to your variables sheet). The weights jump for the R rankings and the standardized variables jump for the S rankings.

2. dnewton137 - September 30, 2010 at 05:32 pm

With all due regard for Professor Stigler's professional and academic distinction, I asked my favorite statistician about the Committee's methods, considering all the circumstances of the Committee's challenges. She thought the Committee's work was pretty good. (She's a PhD in mathematics, and a practicing biostatistician in what amounts to a School of Public Health. She's also my wife.)

As for the importance of reputational rankings: Back in the seventies I was essentially a graduate dean. The NRC did a pilot study of three disciplines in twenty-five institutions, one of them mine. I received a massive data print-out, from which I learned that my institution's clinical psychology program was judged by the faculties of the other institutions in the study to be one of the twenty best in the country. I found that interesting, because my institution had no clinical psychology program.

I think that's comparable with the common opinion that Princeton University's Law School is one of the nation's best!

3. trterry - September 30, 2010 at 09:49 pm

I have a friend who, when faced with choosing a medical school, had only two schools he could realistically attend, due to some family obligations. The one with the better reputation was farther away and would put a greater strain on his family. I asked how he chose. He said he made sure that if he graduated from either one his diploma would have "MD" after his name, and he chose the closer school.

If it grants the degree it should be in the study.

T R Terry, Jr.

4. raymond_j_ritchie - October 01, 2010 at 04:19 am

Ranking things that are inherently unrankable is a particular form of American silliness. I am sure that soon enough we in Australia will have to put up with millions of bucks being wasted on a similar effort in Australia. We are already suffering journal rankings and scoring people on how many papers they publish in A+, A, B, C etc class journals based on citation rate. Predictably enough, academics who publish in biomedical journals get big scores. Palaeontologists do not fare well.

The suggestion Glenn makes that there are deep problems in the doctoral rankings schemes is wrong. The problems are not deep at all. The whole exercise is wrongheaded. The world is full of Oxford/Cambridge DPhils and Ivy League PhDs who have never had an idea of their own in their lives and would not know what to do with one if they stepped on one in the street. Look at the number of PhDs who never publish anything out of their PhD.

It is the merit of the job candidate themselves and perhaps the reputation of their supervisor, not the place they came from that is important.

This ranking scheme will be used for mischief. I am sure it will be used in screening job applicants and as an excuse for denying tenure to some poor bastard who for some reason they want to get rid of.

5. dank48 - October 01, 2010 at 08:31 am

"Lies, damned lies, and statistics."--Mark Twain

6. ksledge - October 01, 2010 at 08:36 am

"They can roughly distinguish, I suppose, between what is in the top half and the lower half of the nation, which is not a major feat."

Yeah, but given how personalized the PhD education is (it depends on your specific sub-sub-field, your graduate advisor, etc), I think that's about all people can hope for. I tell students all of the time that they should not look at rankings except to give them a list of schools to consider looking into applying to.

7. tridaddy - October 01, 2010 at 09:07 am

Most applicants to graduate school will not be admitted to the "prestigious" schools, and most know which schools they should absolutely stay away from. So what is left are all those decent-quality schools in the middle. I did not attend a top-notch graduate school, but I certainly got a good education that has served me well for 30 years while working at an R1 (top 100 NSF ranking). I've always told my graduate students that it's not necessarily the school you attend but the sophistication of the dissertation research you do. Who would have thought that students using a fish as a model would end up at NIH or St. Jude's or other prestigious research institutions, but my students have. Rankings in general do nothing and say nothing to any real degree about the "real" quality of the education received by students attending graduate school at one of the schools that make up 60-70% of all graduate schools.

8. steveterrell - October 01, 2010 at 09:27 am

Dank48 - "lies, damned lies, and statistics" is often attributed to Mark Twain. It was originally said by Benjamin Disraeli. I've been teaching stats for 20 years and just found this out myself recently!

9. uiipbir - October 01, 2010 at 10:01 am

for some additional pithy quotations about statistics, and confirmation regarding the Disraeli/Twain attribution, see

10. frankschmidt - October 01, 2010 at 10:04 am

What galls some people, it appears, is that the new rankings don't allow them to say "We're number One (Or six, or 14, or 82)." The difficulty, however, is that the previous NRC rankings, like those of US News and World Report, have a considerable amount of uncertainty to them, which the simple numerical ranking does not admit to. I for one am much happier with confidence intervals as are presented in the report.

Cheer up folks, you have nothing to lose except inappropriate significant figures.

11. innocentpasserby - October 01, 2010 at 11:27 am

I can't agree, however, that this is a form of particularly American silliness, unless the Times Higher Education Supplement and Shanghai Jiao Tong University are now to be considered American. I have taught overseas for many years and find the interest in rankings to be obsessive everywhere I've been.

12. soc_sci_anon - October 01, 2010 at 12:25 pm

The new NRC rankings are a classic case of garbage in, garbage out. The measures of faculty research, for example, are just asinine. Books (in the social sciences) aren't counted at all, either in the measure of per-faculty productivity or citations. Articles in crappy third-tier journals are given the same weight as articles in the flagship journals. There's no attempt to adjust for different publication and citation norms across subfields, so, for example, sociology departments that have a lot of medical sociologists (a subfield that's not at all core to the discipline, and fairly low-status) automatically rank higher simply because medical sociologists tend to write a lot of small articles and cite each other extensively.

In other cases, the raw data are simply wrong, whether because of poor NRC instructions or creative interpretation by universities (e.g., "external fellowships" interpreted as external to the department, not external to the university).

Not to mention the decision to have 5 of the 20 (humanities) or 21 (everything else) indicators be measures of diversity. I'm all for holding departments accountable for their diversity efforts, but find it asinine and borderline offensive to say that the percentage of Asian students or minority faculty is a net indicator of quality.

It's an epic fail. Too bad Deans and Provosts don't dig deep enough in the data to realize it.

13. dank48 - October 01, 2010 at 03:18 pm

SteveTerrell, it apparently isn't that simple. According to Wikipedia, from whom no secrets are hid, Twain attributed the phrase to Disraeli, but it's not found in Disraeli's writings. Other possibilities are considered. Nobody seems to know.

But I did learn about the variant: "Liars, damned liars, and experts." I've always been fond of someone's definition of that occupation: "An expert is just a damn fool with a briefcase and a collection of jargon five miles from home."

14. gharbisonne - October 04, 2010 at 04:32 pm

There are other, far more serious methodological problems. For example, my department is listed as having had 54 faculty in 2006, when in fact we had 21. If you're ranking programs based on papers/faculty and funding/faculty, the denominator is as important as the numerator.

We have no idea where NRC got their numbers. But we're not the only ones victimized by ridiculous numbers. U Utah Chemistry, with 32 actual faculty, are listed as having 112! U Michigan Ann Arbor Chemistry, with 57, are listed as having 174.
