## What Sports Can Teach About the ‘Cautionary Side’ of Big Data

Steve Hirdt

New York — When college officials talk about using “Big Data” to improve higher education—the focus of a SUNY conference here this week—they often draw an analogy to Moneyball. The movie recounts how Billy Beane, the Oakland A’s general manager, revived his ailing baseball team by analyzing data in new ways.

So what might sports teach higher education about data mining? In academe the stakes are higher than in baseball, but progress toward making good use of data has been uneven. Nonetheless, colleges are busy mining students’ data trails to build software that does things like suggest what mathematics problems they should work on or even what classes they should take.

During a panel on Wednesday about the “cautionary side” of Big Data, colleges got some insight from Steve Hirdt, a 45-year sports-data veteran who is executive vice president at the Elias Sports Bureau, the official statistician to the major North American professional sports leagues. Elias records game statistics—hits in baseball, yards gained in football, points scored in basketball, etc.—and supplies data to teams and news-media clients. When you watch Monday Night Football, Mr. Hirdt is the guy off camera feeding the announcer facts like “Seattle 135 yards: fewest for a winning team in the NFL in the last three years.”

Mr. Hirdt drew on his football and baseball data experience to give colleges two main warnings:

First off, what you initially find in a given data set may turn out to be flat-out wrong upon closer scrutiny. In professional football, for example, a lot of early analysis looked at the role of running, Mr. Hirdt explained in an interview with The Chronicle. The statistics sheets of winning teams would show that they had run the ball, say, 40 times, and passed it 25 times. Aha! Running is the key! “That simple principle—you have to run to win—was so ingrained in a generation of football coaches based on an early look at the data,” Mr. Hirdt said.

The reality was different. In football, Mr. Hirdt noted, if you run the ball, the game clock generally keeps going. If you pass the ball, the clock stops for every incomplete throw. Teams that get ahead, Mr. Hirdt said, run the ball toward the end of games in order to use up the remaining time. If you compare stats from the first half of games, before the time remaining becomes so important, the result is entirely different from the old running-equals-victory dogma, Mr. Hirdt said. “You can see then that the teams are achieving their lead through passing,” he said, “and they’re just accumulating more running plays at the end, when they’re just protecting their lead.”
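The trap Mr. Hirdt describes can be sketched in a few lines of Python. The per-half play counts below are invented for illustration and are not Elias data; they are chosen only so that the full-game totals match the 40 runs and 25 passes in the example above. The names `plays` and `run_share` are likewise hypothetical.

```python
# Hypothetical play counts for one winning team, split by half.
# These numbers are invented for illustration; they are not real game data.
plays = {
    "first_half": {"run": 12, "pass": 20},   # building the lead
    "second_half": {"run": 28, "pass": 5},   # protecting the lead, running out the clock
}


def run_share(counts):
    """Return the fraction of plays that were runs."""
    total = counts["run"] + counts["pass"]
    return counts["run"] / total


# Full-game totals are what a cursory look at the stat sheet would show.
full_game = {
    "run": sum(half["run"] for half in plays.values()),
    "pass": sum(half["pass"] for half in plays.values()),
}

first_half_share = run_share(plays["first_half"])
full_game_share = run_share(full_game)

print(f"Run share, full game:  {full_game_share:.1%}")
print(f"Run share, first half: {first_half_share:.1%}")
```

With these made-up counts, runs account for roughly 62 percent of the full-game plays but only 37.5 percent of first-half plays. The same team looks run-heavy or pass-heavy depending on whether the late, clock-killing plays are included.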

“A wrong conclusion from a cursory look—to me that’s the real cautionary side of Big Data,” Mr. Hirdt said. “If Big Data is going to amplify the possibilities for misapplication, as well as the possibilities for application, we might be in for a little bit of a rocky road.”

Mr. Hirdt’s second warning: Beware of basing decisions on averages.

He illustrated that point with a story from baseball. In 2006 the New York Mets and the St. Louis Cardinals were down to one playoff game that would determine which team would go to the World Series. The Cardinals had a two-run lead going into the bottom of the ninth inning, with the dregs of the Mets’ lineup on deck. The first two Mets batters got hits. Now there were runners on first and second with nobody out. When computers first came into baseball, this kind of scenario was one of the first questions tackled: Is the batter better off sacrifice-bunting? Or swinging away? Data showed that, in general, you’re better off hitting away.

But a slew of factors made this situation different from the average case. It was the ninth inning of the last game of the playoffs. If the Mets didn’t score two runs, they were toast. One reason to swing away, in a typical situation, is the chance to get a big inning with a lot of runs. But that wasn’t even a possibility in the bottom of the ninth inning because three runs would end the game with a walk-off win. In an average game, moreover, the guys who got the two hits would have been two of the team’s better players. Here, the hits came from the seventh- and eighth-place hitters. So whatever the ninth batter did, the good hitters at the top of the lineup would soon come up. What’s more, the Mets had an adept bunter available to pinch-hit in the ninth slot, Tom Glavine.
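One way to see why the average can mislead here is to compare two different yardsticks: the average number of runs a strategy produces, and the chance that it produces at least the two runs the Mets needed. The sketch below is a minimal illustration in Python. The probability tables are entirely hypothetical, invented to show how the two yardsticks can disagree; they are not real baseball figures, and they ignore the lineup factors described above.

```python
# Hypothetical distributions of runs scored in the rest of the inning.
# Keys are runs scored, values are probabilities. All numbers are invented
# for illustration and do not come from real baseball data.
swing_away = {0: 0.40, 1: 0.20, 2: 0.15, 3: 0.15, 4: 0.10}
bunt = {0: 0.35, 1: 0.20, 2: 0.35, 3: 0.10}


def expected_runs(dist):
    """Average runs scored: the yardstick used for the 'average' situation."""
    return sum(runs * p for runs, p in dist.items())


def prob_at_least(dist, needed):
    """Chance of scoring at least `needed` runs: what matters when trailing late."""
    return sum(p for runs, p in dist.items() if runs >= needed)


for name, dist in (("swing away", swing_away), ("bunt", bunt)):
    print(
        f"{name:>10}: average runs = {expected_runs(dist):.2f}, "
        f"chance of 2+ runs = {prob_at_least(dist, 2):.2f}"
    )
```

With these invented numbers, swinging away averages 1.35 runs to the bunt's 1.20, yet the bunt yields at least two runs 45 percent of the time against 40 percent for swinging away. The strategy that wins on average is not necessarily the one that wins when only a specific threshold matters.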

But the Mets followed the conventional strategy. The batter swung away. And they lost.

Mr. Hirdt’s point: Nobody faces an average situation. Yes, knowing the average is a useful guidepost. But people must deal with specific situations, with immediate circumstances that must be brought to bear on decisions.

“It always stuck with me, that the specific sometimes can overwhelm the overall average,” he said. “But are people predisposed to think in terms of, Well, I’ll cover myself by staying with the average?”

With the Red Sox and Cardinals set to play Game 6 of the World Series on Wednesday night, the question that struck one audience member was what outcome Mr. Hirdt predicted: “Sox in six or Sox in seven?”

“I think it’ll be seven,” he said.