How Not to Measure What Matters

When KPIs Bite The Hand That Feeds

Dec 02, 2020

“When a measure becomes a metric, it ceases to be a good measure”.

I’d argue that the most fundamental insight into use of applied statistics in management, whether in the public or private sector, is the quip above. It’s generally called “Goodhart’s Law” [Wikipedia] after economist Charles Goodhart’s rather less pithy and far more British-y original formulation: “Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.”

It’s one of my favorite ideas, because it so neatly captures the paradox of trying to use data and statistics to manage organizations or complex processes. It’s never just about picking the right metric, but about thinking (at least) two steps down the road about how the organization or process you’re trying to manage will adapt to your choice. And it shows up everywhere, from your daily life to the nation’s response to Covid-19.

The Long Paw of the Law

You’ll pick up on this idea pretty easily if you’ve ever tried training a dog. As it so happens, the subject is on my mind quite a bit because we recently got a dog. Her name is Clio and she’s perfect [Citation Needed].

We took her to an obedience class, where the trainer Luis had a pithy saying of his own to rival Mr. Goodhart: “dogs are gamblers”. Clio is extremely food-motivated, and will take any actions that she thinks materially increases her chances of getting a treat. This is the fundamental mechanism for training a dog: show them that desirable actions have a chance of paying out via food, affection, or even just attention and they’ll do more of it. Sit when we say sit, she gets paid. Lie down when we say down, she gets paid. And we pay well (she takes direct deposit of chunks of dehydrated liver).

This same behavior has some extremely unintuitive and perverse consequences. Comfort a dog when they whine, or give them a treat to calm down, and they draw the same conclusion: do X, get paid. You can easily and unintentionally train them to create more whining, barking, or bad behavior of any stripe. This training and reactivity is a marvelous little furry demonstration of Goodhart’s Law: one may easily observe the correlation of treat-giving leading to less whining, and yet applying more and more treats can easily create more whining. When you try to use that treat => calming relationship for “control purposes”, the observed statistical regularity collapses.

Congratulations, you now know more about fighting Covid-19 than the President.

Cracking the Cases

The President has, dozens or perhaps even hundreds of times, made some comment along the lines of this:

“Cases are up because we have the best testing in the world and we have the most testing.” [Fox News]

You may not have realized the President is a sophisticated thinker on metrics and KPIs, but he is in fact trying to teach us a profound lesson in using metrics for management. One may easily observe that the availability of Covid tests predict more confirmed Covid cases, and confirmed Covid cases predict Covid deaths. And so the obvious natural inference is that if testing is decreased, then so too will Covid cases and deaths.

This obvious, natural inference appears to be why the White House moved to discourage testing of potential Covid exposures [Vox]. This is a textbook Goodhart’s Law error: the White House took an easily observed predictor (positive Covid tests) of an outcome (Covid deaths) and tried to move the outcome by manipulating the predictor. As the currently-exploding Covid pandemic makes clear, it doesn’t work like that.

Once you start looking for Goodhart’s Law, you realize it is pervasive in outcome-driven organizations of all sorts. Everywhere from infectious disease management to business to education, once people start targeting an easily-measurable metric they figure out ways to game it rather than drive the outcome it’s attempting to measure. And in doing so, the metric that was once a good measurement comes to diverge further and further from what it used to predict.

One followup from my recent entry about the seductive myth of “productivity scores” [Standard Errors]: I missed that Microsoft just announced that this feature will now be part of Microsoft Office [Microsoft]. Diligent readers may be able to guess my reaction, but to make it explicit: I’m not a fan. For reasons alluded to in my Goodhart’s Law discussion above, I think managers may be successful at driving charts up and to the right without achieving the productivity nirvana of which they dream.

Standard Errors

How Not to Measure What Matters

When KPIs Bite The Hand That Feeds

Discussion about this post