Food, Bias, and Justice: a Case for Statistical Prediction Rules

We’re remarkably bad at making good decisions. Even when we know what goal we’re pursuing, we make mistakes predicting which actions will achieve it. Are there strategies we can use to make better policy decisions? Yes – we can gain insight by looking at cognitive science.

On the surface, all we need to do is experience the world and figure out what does and doesn’t work at achieving our goals (the focus of instrumental rationality). That’s why we tend to respect expert opinion: experts have far more experience with an issue and have considered and evaluated different approaches.

Let’s take the example of deciding whether or not to grant prisoners parole. If the goal is to reduce repeat offenses, we tend to trust a panel of expert judges who evaluate the case and use their subjective opinion. They’ll do a good job, or at least as good a job as anyone else, right? Well… that’s the problem: everyone does a pretty bad job. Quite frankly, even experts’ decision-making is influenced by factors that are unrelated to the matter at hand. Ed Yong calls attention to a fascinating study which finds that a prisoner’s chance of being granted parole is strongly influenced by when their case is heard in relation to the judges’ snack breaks:

The graph is dramatic. It shows that the odds that prisoners will be successfully paroled start off fairly high at around 65% and quickly plummet to nothing over a few hours (although, see footnote). After the judges have returned from their breaks, the odds abruptly climb back up to 65%, before resuming their downward slide. A prisoner’s fate could hinge upon the point in the day when their case is heard.

Curse our fleshy bodies and their need for “Food” and “breaks”! It’s obviously a problem that human judgment is influenced by irrelevant, quasi-random factors. How can we counteract those effects?

Statistical Prediction Rules do better

Fortunately, we have science and statistics to help. We can objectively record evidential cues, look at the resulting target property, and find correlations. Over time, we can build an objective model, with our meat-brain limitations out of the way.
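
To make that concrete, here is a minimal sketch of what building such a model might look like, using ordinary logistic regression. The cues, numbers, and cases below are invented for illustration; a real SPR would be fit to a large set of recorded case histories and validated on data it hasn’t seen.

```python
# A minimal sketch, on invented data, of building a prediction rule:
# record a few objective cues for past cases, record the target property
# (here, whether parole was violated), and fit a simple model.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row is one past case; the cue columns (all hypothetical) are
# [age at release, number of prior offenses, disciplinary infractions].
cues = np.array([
    [22, 4, 3],
    [35, 1, 0],
    [19, 6, 5],
    [41, 0, 1],
    [28, 2, 2],
    [30, 3, 0],
])
violated_parole = np.array([1, 0, 1, 0, 1, 0])  # target property, recorded after the fact

spr = LogisticRegression().fit(cues, violated_parole)

# For a new case the rule returns a probability in milliseconds, and the
# answer doesn't depend on how long it's been since the judge's last snack.
new_case = [[25, 3, 1]]
print(spr.predict_proba(new_case)[0, 1])
```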

This was the advice of Bishop and Trout in “Epistemology and the Psychology of Human Judgment,” an excellent book recommended by Luke Muehlhauser of Common Sense Atheism (and a frequent contributor to Less Wrong).

Bishop and Trout argued that we should use such Statistical Prediction Rules (SPRs) far more often than we do. Not only are they faster, it turns out they’re also more trustworthy: using the same amount of information (or often less), a simple mathematical model consistently outperforms expert opinion.

They point out that when Grove and Meehl surveyed 136 different studies comparing an SPR to expert opinion, they found that “64 clearly favored the SPR, 64 showed approximately equivalent accuracy, and 8 clearly favored the clinician.” The target properties the studies were predicting ranged from medical diagnoses to academic performance to – yup – parole violation and violence.
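
It helps to see how simple the “simple mathematical model” typically is: often it is nothing more than a weighted sum of a few cues, compared against a cutoff. Here is a toy sketch; the cue names, weights, and cutoff are invented, not taken from any of the actual studies.

```python
# A toy example of the kind of rule at issue: a weighted sum of a few cues.
# The cue names, weights, and cutoff are invented for illustration only.

def parole_risk_score(age_at_release, prior_offenses, infractions):
    """Higher score = higher predicted risk of violating parole."""
    return -0.05 * age_at_release + 1.0 * prior_offenses + 0.5 * infractions

score = parole_risk_score(age_at_release=25, prior_offenses=3, infractions=1)
recommendation = "flag for closer review" if score > 2.5 else "low predicted risk"
print(score, "->", recommendation)
```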

So based on some cues, a Statistical Prediction Rule would probably give a better prediction than the judges on whether a prisoner will break parole or commit a crime. And they’d do it very quickly – just by putting the numbers into an equation! So all we need to do is show the judges the SPRs and they’ll save time and do a better job, right? Well, not so much.

“More like OKStupid, amirite?”

That was the subject line of an email my friend James sent me yesterday. His email contained a link to this post on OKCupid’s blog, where the OKCupid team sifts through their massive trove of user data to find interesting facts about people’s dating habits.

This latest post is called “The Mathematics of Beauty,” and it purports to reveal a startling finding: women whose looks inspire a lot of disagreement among men (i.e., some men rating them hot and others rating them ugly) get more messages. And the number of messages you receive is positively correlated with the number of men rating you a “5 out of 5,” but negatively correlated with the number of men rating you a “4 out of 5.” OKCupid says, “This is a pretty crazy result, but every time we ran the numbers—changing the constraints, trying different data samples, and so on—it came back to stare us in the face.”
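
To be clear about what “correlated” means here, the analysis is (roughly) relating, across women, the number of messages each receives to how many men gave her each rating. A sketch of that shape of analysis, on invented placeholder numbers; the real post used OKCupid’s internal Quickmatch votes and message logs:

```python
# A sketch of the shape of the analysis OKCupid describes: across women,
# correlate messages received with the count of each rating value.
# All numbers below are invented placeholders.
import numpy as np

# One row per woman: [# of 1-votes, # of 2s, # of 3s, # of 4s, # of 5s, messages received]
data = np.array([
    [30, 40, 60, 50, 20, 12],
    [10, 20, 30, 80, 60, 25],
    [60, 50, 40, 20, 30, 18],
    [20, 30, 50, 70, 10,  8],
])
messages = data[:, 5]

for rating in range(5):
    r = np.corrcoef(data[:, rating], messages)[0, 1]
    print(f"men rating her {rating + 1}: correlation with messages = {r:+.2f}")
```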

To explain these odd results, the OKCupid bloggers came up with two game-theoretic stories. First, men who see a woman and think “She’s a 4” will also think, “That’s cute enough for plenty of other men to be into her, so I’ll have lots of competition… but it’s not hot enough to be worth trying anyway.” And second, if men think, “She’s really hot to me, but I bet other men will disagree,” they’ll be more likely to message her, because they expect less competition. So a woman with a polarizing look will turn off some men, but the men who are turned on will be even more likely to message her, knowing that other men are turned off.

Based on these stories, OKCupid offers the following advice to its female users who want to get more messages from men:

“We now have mathematical evidence that minimizing your “flaws” is the opposite of what you should do. If you’re a little chubby, play it up. If you have a big nose, play it up. If you have a weird snaggletooth, play it up: statistically, the guys who don’t like it can only help you, and the ones who do like it will be all the more excited.”

Oh my. That sounds like really bad advice. Before people start enthusiastically pointing the camera at their fat rolls, maybe we should check and make sure this analysis is sound. Because my opinion is that OKCupid’s crazy results can easily be explained by much less counterintuitive stories than the ones they concoct.

First of all, the “attractiveness” ratings they’re using aren’t really attractiveness ratings. They come from a feature on the site called Quickmatch, which presents you with the profile pictures of a succession of people for you to rate from 1 to 5. But you’re free to click through to each person’s full profile. And if you like the way they present themselves through the written part of the profile, you might well rate them highly on Quickmatch; conversely, if you don’t like their written profiles, you might well rate them poorly. Treating those scores as pure “attractiveness” ratings is way off the mark.

Second of all, the way Quickmatch works is that if you rate someone a 4 or 5 and they similarly rate you a 4 or 5, then you both receive emails informing you of each other’s interest. So this data is even more tainted, because people are not simply thinking “How attractive is this person?” — they’re thinking “Do I want this person to contact me?” If you think someone’s not that attractive but you’d still want to date her, you might well rate her a 4 just in case she’s also interested in you.

In fact, I strongly suspect there are a lot of guys who just rate every single girl a 4 or 5, giving 5’s to the girls they think are good-looking and 4’s to everyone else. It’s a carpet-bombing strategy — why rule anyone out right off the bat? (My suspicion is grounded in some results from a speed-dating study I worked on in college with a psychology professor at Columbia; I got to look at the rating sheets after each speed-dating session, and there were plenty of guys who just circled the entire row of “YES” rather than circling YES or NO for each girl individually.)

And as you can imagine, if a lot of guys are using “4” to mean “anyone who’s not a 5,” then of course 4’s are going to be negatively correlated with the number of messages a girl gets, because many or most of those 4’s actually indicate 1’s, 2’s, and 3’s.
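
To check that this isn’t just hand-waving, here’s a small, hedged simulation in which every parameter is made up: messages are driven purely by how many men genuinely find a woman attractive, and half the men rate every non-5 a 4. Nothing about “polarizing looks” is going on, yet with these invented numbers the count of 5s correlates positively with messages and the count of 4s comes out negative, which is the same pattern OKCupid interpreted as evidence that you should play up your flaws.

```python
# A hedged simulation of the confound described above. Every parameter
# here is made up; nothing is taken from OKCupid's actual data.
import numpy as np

rng = np.random.default_rng(42)
n_women, n_men = 2000, 200
carpet_bomber_frac = 0.5            # fraction of men who only ever vote 4 or 5

attractiveness = rng.normal(size=n_women)     # latent appeal, one value per woman
taste = rng.normal(scale=0.5, size=n_men)     # each man's threshold for a "5" varies a bit
is_bomber = rng.random(n_men) < carpet_bomber_frac

fours = np.zeros(n_women)
fives = np.zeros(n_women)
for m in range(n_men):
    loves = attractiveness > 1.0 + taste[m]   # roughly his top tier gets a 5
    fives += loves
    if is_bomber[m]:
        fours += ~loves                       # "4" just means "anyone who's not a 5"
    else:
        # honest raters: a 4 means genuinely attractive, just short of a 5
        fours += (~loves) & (attractiveness > 0.3 + taste[m])

# Messages come only from men who are genuinely interested (the 5-voters).
messages = rng.poisson(0.1 * fives)

print("corr(# of 5s, messages):", round(np.corrcoef(fives, messages)[0, 1], 2))
print("corr(# of 4s, messages):", round(np.corrcoef(fours, messages)[0, 1], 2))
```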

What I think the OKCupid blog post illustrates is how easy it is to come up with a story to explain any result, whether or not the result is real. To paraphrase my friend James for a minute: if you find yourself saying “I know this is crazy, but numbers don’t lie,” you should really calm down and check to see if you’ve made a mistake, because chances are, you have.
