
Calibrating our Confidence


It’s one thing to know how confident we are in our beliefs; it’s another to know how confident we should be. Sure, the de Finetti’s Game thought experiment gives us a way to put a number on our confidence – quantifying how likely we feel we are to be right. But we still need to check that felt confidence against our actual results. Are we appropriately confident?

Taken at face value, if we express 90% confidence 100 times, we expect to be proven wrong an average of 10 times. But very few people take the time to see whether that’s the case. We can’t trust our memories on this, as we’re probably more likely to remember our accurate predictions and forget all the offhand predictions that fell flat. If we want to get an accurate sense of how well we’ve calibrated our confidence, we need a better way to track it.

Well, here’s a way: PredictionBook.com. While working on my last post, I stumbled on this nifty project. Its homepage features the words “How Sure Are You?” and “Find out just how sure you should be, and get better at being only as sure as the facts justify.” Sounds perfect, right?

It allows you to enter your prediction, how confident you are, and when the answer will be known.  When the time comes, you record whether or not you were right and it tracks your aggregate stats.  Your predictions can be private or public – if they’re public, other people can weigh in with their own confidence levels and see how accurate you’ve been.
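A prediction log like that doesn’t need to be anything fancy. Here’s a minimal sketch in Python of the idea – my own illustration, not anything PredictionBook actually exposes, and the example entries are invented:

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

@dataclass
class Prediction:
    statement: str                   # what you predicted
    confidence: float                # how sure you were, e.g. 0.9 for 90%
    due: date                        # when the answer will be known
    correct: Optional[bool] = None   # filled in once the outcome is known

log: List[Prediction] = [
    Prediction("I will like sauteed squash", 0.6, date(2011, 9, 5)),
    Prediction("The Ravens will win their opener", 0.8, date(2011, 9, 11)),
]

# When the time comes, record whether you were right:
log[0].correct = True
```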

(This site isn’t new to rationalists: Eliezer and the LessWrong community noticed it a couple of years ago, and LessWronger Gwern has been using it to – among other things – track Intrade predictions.)

Since I don’t know who’s using the site and how, I don’t know how seriously to take the following numbers. So take this chart with a heaping dose of salt. But I’m not surprised that the confidences entered are higher than the likelihood of being right:

Predicted certainty:  50%   60%   70%   80%   90%   100%   Total
Actual accuracy:      37%   52%   58%   70%   79%    81%
Sample size:          350   544   561   558   709    219    2941
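Numbers like those come from a simple aggregation: group every resolved prediction by the confidence that was stated, then check how often each group actually came out right. A rough sketch of that calculation (again, my own illustration rather than the site’s code), reusing the Prediction records from the earlier snippet:

```python
from collections import defaultdict
from typing import Iterable

def calibration_table(predictions: Iterable[Prediction],
                      buckets=(0.5, 0.6, 0.7, 0.8, 0.9, 1.0)) -> None:
    """Group resolved predictions by stated confidence and report actual accuracy."""
    grouped = defaultdict(list)
    for p in predictions:
        if p.correct is None:
            continue  # not resolved yet
        # Put each prediction in the nearest listed confidence bucket
        bucket = min(buckets, key=lambda b: abs(b - p.confidence))
        grouped[bucket].append(p.correct)

    for bucket in buckets:
        outcomes = grouped[bucket]
        if outcomes:
            accuracy = sum(outcomes) / len(outcomes)
            print(f"stated {bucket:.0%} -> actually right {accuracy:.0%} (n={len(outcomes)})")

calibration_table(log)
```

Being well calibrated just means the stated and actual percentages line up; the table above shows how far the site’s users, as a group, are from that.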

Sometimes the miscalibration matters more than others. In Mistakes Were Made (but not by me), Tavris and Aronson describe the overconfidence police interrogators feel about their ability to discern honest denials from false ones. In one study, researchers selected videos of police officers interviewing suspects who were denying a crime – some innocent and some guilty.

Kassin and Fong asked forty-four professional detectives in Florida and Ontario, Canada, to watch the tapes. These professionals averaged nearly fourteen years of experience each, and two-thirds had had special training, many in the Reid Technique. Like the students [in a similar study], they did no better than chance, yet they were convinced that their accuracy rate was close to 100 percent. Their experience and training did not improve their performance. Their experience and training simply increased their belief that it did.

As a result, more people are falsely imprisoned as prosecutors steadfastly pursue convictions for people they’re sure are guilty. This is a case in which poor calibration does real harm.

Of course, it’s often a more benign issue. Since finding PredictionBook, I see everything as a prediction to be measured. A coworker and I were just discussing plans to have a group dinner, and had the following conversation (almost word for word):

Her: “How do you feel about squash?”
Me: “I’m uncertain about squash…”
Her: “What about sauteed in butter and garlic?”
Me: “That has potential. My estimation of liking it just went up slightly.”
*Runs off to enter prediction*

I’ve already started making predictions in hopes that tracking my calibration errors will help me correct them. I wish PredictionBook had tags – it would be fascinating (and helpful!) to know that I’m particularly prone to misjudge whether I’ll like foods or that I’m especially well-calibrated at predicting the winners of sports games.

And yes, I will be using PredictionBook on football this season. Every week I’ll try to predict the winners and losers, and see whether my confidence is well-placed. Honestly, I expect to see some homer-bias and have too much confidence in the Ravens.  Isn’t exposing irrationality fun?
