It was the early 1980s, and the nation’s health-care system was in crisis. Costs were soaring, and hospitals and other providers were being blamed for the increase. Policy makers were questioning the quality of the nation’s health care, and the medical profession had lost some of its political clout.
In an effort to rein in costs and improve outcomes, the federal government began compiling data on hospitals’ mortality rates and sharing them with contractors that were reviewing the medical necessity and quality of care provided to Medicare beneficiaries.
In 1986, a New York Times reporter filed a Freedom of Information Act request seeking those data, and the government reluctantly released them. The report identified hospitals with higher and lower patient-mortality rates and summarized each institution’s overall performance.
By the early 1990s, the annual report had become a 55-volume publication. But the reports had also become a political and professional embarrassment, criticized by industry executives and scholars alike for their statistical shortcomings. After just eight years, the first federal rating of hospitals was abandoned. Washington didn’t rejoin the ratings game for another five years, until the Centers for Medicare and Medicaid Services began publishing Nursing Home Compare, the first of its six health-care-comparison sites.
Fast-forward 15 years, and the nation’s colleges are in crisis. Costs are skyrocketing, and lawmakers are blaming the colleges. Policy makers are raising doubts about the quality of the nation’s higher-education system, and President Obama is crafting a controversial ratings system that aims to capture the value of a college degree. For Darrell G. Kirch, president of the Association of American Medical Colleges, there’s a sense of “massive déjà vu.”
“You have these two very important public goods, and we all want them to be of the best quality,” he says.
But rating large institutions isn’t easy, and Dr. Kirch and academic researchers who have studied the health-care ratings say the administration should be mindful of the pitfalls. Here are the top five lessons they say the Education Department can learn from the experiences of the Centers for Medicare and Medicaid Services, or CMS.
1. What You Measure Matters
These days, CMS offers six health-care-comparison websites, covering hospitals, nursing homes, doctors, home health-care services, dialysis facilities, and health plans. Much of the data they provide are pulled from claims that providers submit to the federal government for Medicare reimbursement, or are culled from existing databases. The website Nursing Home Compare, for example, draws from the agency’s health-inspections database and its “minimum data set,” which includes assessments of the health of all residents at federally supported nursing homes.
While the reliance on existing data alleviates the collection burden for providers, it has some drawbacks. Claims provide only limited information on patient conditions, making it harder to adjust the ratings for risk. Quality measures are often self-reported, and they represent only a fraction of the conditions and outcomes that matter to prospective patients.
In part, that’s because in health care, as in education, not all outcomes are easily measured, says Helen Burstin, senior vice president for performance measures at the National Quality Forum, a nonprofit group that convenes panels to evaluate health-care measures for use by states, the federal government, and private organizations. To capture the intangibles, the sector is increasingly turning to “patient-reported outcomes,” the industry equivalent of alumni-satisfaction surveys, Dr. Burstin says.
“Sometimes we look at things that are quantitative because they’re easier to measure,” she says. “But they may not measure what really matters.”
At the same time, there’s often a tension between “waiting for what’s perfect and starting to measure something,” she says. Those trade-offs sometimes lead to disagreements between scientists and patients on the forum’s panels over whether to endorse a measure, she says.
Even so, there are fewer gaps in federal health-care data than in federal education data, says Dana B. Mukamel, a professor of medicine at the University of California at Irvine, who testified at the Education Department’s recent ratings forum. She was surprised to learn, at the forum, that federal law prohibits the department from creating a “unit-record” database to track individual students.
“This is really going to frustrate the objectives of quality reporting,” she warns. “Hopefully cooler heads will prevail.”
2. Risk Adjustment Isn’t Easy—but It Is Essential
Most health-care comparisons adjust for risk, weighing patient-related factors when evaluating providers. The amount of adjustment varies by site, but the majority account only for the severity of illness and coexisting conditions, not for patient demographics.
That approach is consistent with the policies of the National Quality Forum, which advises against adjusting for demographic disparities. The concern is that doing so could hide differences in outcomes and make weaker providers complacent. Instead, the forum recommends that outcomes be “stratified,” or calculated separately by sociodemographic factors, such as income, race, and education level.
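The distinction the forum draws can be made concrete with a small sketch: blending all patients into one number can hide differences between groups, while stratifying reports each group’s outcome separately. The data and group labels below are invented for illustration, not drawn from any CMS dataset.

```python
# Hypothetical patient outcomes for one provider: (income_group, survived)
outcomes = [
    ("low", 1), ("low", 0), ("low", 1), ("low", 0),
    ("high", 1), ("high", 1), ("high", 1), ("high", 0),
]

def overall_rate(records):
    """One blended survival rate -- group differences are hidden."""
    return sum(survived for _, survived in records) / len(records)

def stratified_rates(records):
    """Survival rate reported separately per group, as the forum recommends."""
    groups = {}
    for group, survived in records:
        groups.setdefault(group, []).append(survived)
    return {g: sum(vals) / len(vals) for g, vals in groups.items()}

print(overall_rate(outcomes))      # 0.625
print(stratified_rates(outcomes))  # {'low': 0.5, 'high': 0.75}
```

The single blended figure, 0.625, looks respectable; only the stratified view reveals that low-income patients fare markedly worse at this hypothetical provider.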
But some patients, providers, and policy makers say the current protocol is unfair to hospitals that serve disadvantaged patients, and that it will exacerbate disparities in care. They warn that cutting funds to poorly rated “safety net” providers will leave them with fewer resources to treat disadvantaged patients or force the providers to abandon their mission altogether. That warning resonates among colleges that serve disadvantaged students.
Responding to those concerns, the National Quality Forum issued a draft report last month recommending that measures that are “influenced by factors other than the quality of care provided” be adjusted to account for sociodemographic factors when used for accountability purposes.
The change was driven by two new federal accountability measures that penalize providers with high rates of readmission and high levels of spending per Medicare beneficiary. In general, hospitals that serve large numbers of disadvantaged patients tend to perform worse on both measures.
The question, says Andrew M. Ryan, an associate professor of health-care policy and research at Weill Cornell Medical College, a Cornell University affiliate, is: “Is it the hospital’s fault?”
As an alternative, Mr. Ryan suggests grouping like providers together, then distributing awards and penalties within the peer group. That approach has been endorsed by MedPAC, an independent agency that advises Congress on Medicare issues. Mr. Obama has promised to take a similar approach to his college ratings.
The challenge for both health care and higher education will be deciding who is assigned to each group, Mr. Ryan says.
“There’s no way you can come up with a grouping that will satisfy everyone,” he says.
3. There Are Trade-offs Between Simplicity and Completeness
When Nursing Home Compare debuted, in 1998, it compared facilities using a series of discrete measures. A decade later, in an effort to make the website more user-friendly, the Centers for Medicare and Medicaid Services added a five-star rating system.
In the five years since the star system took effect, the proportion of facilities receiving four or five stars has risen, suggesting that nursing homes are responding to the ratings. Patients say they prefer the new system, and there’s some evidence that they are more likely to use it to pick a nursing home, says Rachel Werner, an associate professor of medicine at the University of Pennsylvania who is studying the shift.
The problem with star ratings and other “composite” scores, researchers say, is that quality is multidimensional, and the dimensions don’t often correlate, or “scale.” So a nursing home that excels in dealing with depression may be less effective in treating pressure sores, just as a university may have high graduation rates but poor job-placement rates. Composite scores mask strengths and weaknesses in given areas in favor of a global, “big picture” assessment. The result is simpler, but also less nuanced.
“The main drawback is that you lose a lot of detail,” says Dr. Werner. “If a consumer is looking for a nursing home that is good at one thing, it’s much harder to find that information.”
Another drawback of composite scores is that they rely on the value judgments of experts. Nursing Home Compare, for example, places the greatest weight on health inspections, then adds or subtracts a single star each when the staffing and quality-measures ratings are very high or very low. The index includes only half of the quality measures the site reports, and it averages outcomes across long-term and acute-care facilities.
Ms. Mukamel says the solution lies in customized ratings, in which patients (or prospective students) can assign their own weights to the measures, based on personal preferences. She is now testing that idea in a randomized controlled trial at Irvine’s hospital.
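The mechanics of such a customized rating are simple to sketch: the same facility scores produce different rankings depending on whose weights are applied. The measure names, scores, and weights below are hypothetical, not the actual CMS methodology.

```python
# Illustrative consumer-weighted composite rating (all values hypothetical).

def composite_score(scores, weights):
    """Weighted average of quality measures on a 0-100 scale."""
    total_weight = sum(weights.values())
    return sum(scores[m] * weights[m] for m in scores) / total_weight

# Two facilities with different strengths.
facility_a = {"depression_care": 90, "pressure_sores": 60}
facility_b = {"depression_care": 65, "pressure_sores": 85}

# An expert panel's one-size-fits-all weights...
expert = {"depression_care": 0.5, "pressure_sores": 0.5}

# ...versus a patient who cares most about depression care.
patient = {"depression_care": 0.8, "pressure_sores": 0.2}

print(composite_score(facility_a, expert))   # 75.0
print(composite_score(facility_b, expert))   # 75.0 -- a dead heat
print(composite_score(facility_a, patient))  # 84.0
print(composite_score(facility_b, patient))  # 69.0
```

Under the expert weights the two facilities are indistinguishable; under the patient’s own weights, a clear favorite emerges. That is the case for letting consumers set the weights.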
4. Ratings Work—in Intended and Unintended Ways
The goal of all ratings systems, regardless of the sector, is to motivate providers to improve and to help consumers make informed decisions.
In health care, there is some evidence that ratings have guided at least some patients to higher-rated providers. Dr. Werner found, for example, that higher-risk patients were more likely to choose high-quality nursing homes after consulting Nursing Home Compare. And Ms. Mukamel found that patients seeking a cardiac surgeon were half as likely to rely on “implicit measures” of quality, such as cost and experience, after New York State began publishing its surgeons’ mortality rates.
But Ms. Mukamel says there is just as much evidence that ratings don’t work, and Dr. Werner says Hospital Compare, which provides comprehensive quality data on hospitals, is “rarely used by consumers, as far as we can tell.” Dr. Kirch says that when he asked 100 college presidents at the American Council on Education annual meeting last month if any of them had used Hospital Compare to choose a facility, only one hand went up.
He says that despite the popularity of ratings in general, most patients still turn to “trusted sources,” such as their primary-care physicians, for help with health-care decisions. He worries that the multitude of ratings has confused patients and created “more noise than signal” for the providers themselves.
When it comes to quality improvement, however, it may not really matter if consumers are using the ratings or not. If providers believe that consumers are paying attention, they will respond, “functionally or dysfunctionally,” says David L. Weimer, a professor of political economy at the University of Wisconsin at Madison. His research, and others’, has found evidence of both positive and negative outcomes, with some providers improving their care and others simply shifting resources around.
In the case of Nursing Home Compare, some facilities appear to be “teaching to the test,” moving money and resources from unmeasured areas to measured ones. There’s less evidence that providers are engaging in “cream skimming,” dropping riskier patients, though some studies suggest that is happening with other rating systems.
Mr. Weimer urges the Education Department to consider ways it can maximize the “functional” responses its rating system elicits, while mitigating the “dysfunctional” ones.
“You have to anticipate how this can be gamed,” he says.
5. Pay for Performance Doesn’t Pay (at Least Not Yet)
The most controversial piece of President Obama’s college-ratings plan is his proposal to tie a portion of student aid to the ratings. Under his plan, which Congress must approve, students attending higher-rated institutions could obtain larger Pell Grants and more-affordable loans.
The federal government has been testing the idea of performance-based pay in health care for several years, offering Medicare bonuses to high-performing providers. Results from the early experiments have been mixed, with some studies suggesting that the money changed provider behavior and others suggesting that it didn’t.
But it was not until Congress passed the president’s Affordable Care Act, in 2010, that performance-based pay went mainstream, says Mr. Ryan, of Cornell. Under the sweeping health-care-reform law, all Medicare payments to hospitals and physicians will eventually depend, in part, on metrics of quality and efficiency.
Conceptually, the idea of paying providers on the basis of value, rather than volume, is compelling, Mr. Ryan says. But research shows it’s very hard to get right. If the reward or penalty is too small, providers may not respond. But make it too big, and it could send struggling providers into a death spiral.
In response to this concern, lawmakers structured the hospital payment system to reward both high performance and improvement. Under the system, some hospitals gain money, and some lose it.
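One way to reward both, loosely inspired by how Medicare’s hospital value-based purchasing program handles the problem, is to score each provider on the better of its absolute achievement against national thresholds or its improvement over its own baseline. The sketch below uses invented thresholds and numbers, not the program’s actual formula.

```python
def measure_score(current, baseline, benchmark, floor):
    """Score a provider on the better of achievement (current performance
    vs. a national floor and benchmark) and improvement (current performance
    vs. the provider's own prior baseline), each clamped to a 0-1 scale."""
    span = benchmark - floor
    achievement = max(0.0, min(1.0, (current - floor) / span))
    improvement = max(0.0, min(1.0, (current - baseline) / span))
    return max(achievement, improvement)

# A struggling hospital that improved sharply vs. a strong but flat performer.
# All figures are hypothetical.
struggling = measure_score(current=0.70, baseline=0.50, benchmark=0.95, floor=0.60)
strong_flat = measure_score(current=0.90, baseline=0.90, benchmark=0.95, floor=0.60)
print(struggling, strong_flat)
```

Taking the maximum of the two components means a low-performing hospital that improves rapidly still earns credit, while a hospital already near the benchmark is not penalized for having little room left to improve.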
So far, the incentives don’t seem to be having much of an effect. Mr. Ryan studied the program in its first year and found that hospitals serving large numbers of poorer patients did indeed receive smaller bonuses, as feared. But because the incentives were small, he concluded that they were unlikely to deepen disparities in the short term. However, he worries that as the incentives grow larger, the quality of care could deteriorate at some hospitals.
Mr. Ryan suggests that the government revise its payment criteria to give greater weight to improvement over achievement. He acknowledges that will be “a tough task.”
Policy makers, he says, still don’t know how best to structure performance-based pay systems. So they rely on trial and error, “gradually developing an evidence base” as they go.
“We’re rolling out these national programs without a clear idea that this is the best way to do it,” he says.
How a Health-Care Rating System Might Translate
The federal government is no newcomer to the ratings game; its Centers for Medicare and Medicaid Services has been rating health-care providers online for more than 15 years. Nursing Home Compare, the oldest of its six sites, assigns facilities one to five stars based on three separate measures. Here’s what the model might look like if it were applied to higher education.
| Nursing-home measures | Higher-education equivalent |
| --- | --- |
| Health-inspections rating: based on the three most recent annual inspections, and on inspections prompted by complaints in the past three years, with greater emphasis given to recent inspections. | Compliance rating: based on the findings of federal program reviews, accreditor visits, and state and federal audits and investigations. |
| Quality-measures rating: combines values on nine quality measures, such as the percentage of long-stay residents experiencing falls, urinary-tract infections, pressure ulcers, and pain. | Quality-measures rating: based on retention, graduation, and job-placement rates; or, negatively constructed, on dropout and unemployment rates. |
| Staffing rating: based on nurse/patient ratios and total hours of care per day. | Staffing rating: based on student/faculty ratios and staffing levels. |