Calibrating our Confidence

It’s one thing to know how confident we are in our beliefs; it’s another to know how confident we should be. Sure, the de Finetti’s Game thought experiment gives us a way to put a number on our confidence – quantifying how likely we feel we are to be right. But we still need to learn to calibrate that sense of confidence against actual results. Are we appropriately confident?

Taken at face value, if we express 90% confidence 100 times, we should expect to be proven wrong about 10 times. But very few people take the time to check whether that’s the case. We can’t trust our memories here: we’re probably more likely to remember our accurate predictions and forget all the offhand predictions that fell flat. If we want an accurate sense of how well-calibrated our confidence is, we need a better way to track it.
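One way to do that bookkeeping yourself: log each prediction as a (stated confidence, outcome) pair, then bucket by stated confidence and compare it against the observed hit rate. A minimal sketch in Python – the `calibration_report` function and the sample log are hypothetical, not anything from PredictionBook:

```python
from collections import defaultdict

def calibration_report(predictions):
    """Group (confidence, outcome) pairs by stated confidence and
    compare each stated level with the observed hit rate."""
    buckets = defaultdict(list)
    for confidence, correct in predictions:
        buckets[confidence].append(correct)
    report = {}
    for confidence, outcomes in sorted(buckets.items()):
        hit_rate = sum(outcomes) / len(outcomes)
        report[confidence] = (hit_rate, len(outcomes))
    return report

# Hypothetical track record: (stated confidence, was the prediction right?)
log = ([(0.9, True)] * 8 + [(0.9, False)] * 2 +
       [(0.6, True)] * 3 + [(0.6, False)] * 2)

for conf, (rate, n) in calibration_report(log).items():
    print(f"said {conf:.0%}, right {rate:.0%} of the time (n={n})")
```

With this made-up log, the report would show 60% claims coming true 60% of the time and 90% claims coming true 80% of the time – the latter being the kind of gap that suggests overconfidence.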

Well, here’s a way: While working on my last post, I stumbled on this nifty project. Its homepage features the words “How Sure Are You?” and “Find out just how sure you should be, and get better at being only as sure as the facts justify.” Sounds perfect, right?

It allows you to enter your prediction, how confident you are, and when the answer will be known.  When the time comes, you record whether or not you were right and it tracks your aggregate stats.  Your predictions can be private or public – if they’re public, other people can weigh in with their own confidence levels and see how accurate you’ve been.

(This site isn’t new to rationalists: Eliezer and the LessWrong community noticed it a couple of years ago, and LessWronger Gwern has been using it to – among other things – track Intrade predictions.)

Since I don’t know who’s using the site and how, I don’t know how seriously to take the following numbers. So take this chart with a heaping dose of salt. But I’m not surprised that the confidences entered are higher than the likelihood of being right:

Predicted Certainty:   50%   60%   70%   80%   90%   100%   Total
Actual Certainty:      37%   52%   58%   70%   79%    81%
Sample Size:           350   544   561   558   709    219    2941
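A quick sanity check on gaps like these is whether they exceed ordinary sampling noise. Here’s a back-of-the-envelope sketch using the figures read straight from the chart above and a normal approximation to the binomial – my own arithmetic, not the site’s methodology:

```python
import math

# (predicted confidence, observed hit rate, sample size), from the chart
rows = [(0.50, 0.37, 350), (0.60, 0.52, 544), (0.70, 0.58, 561),
        (0.80, 0.70, 558), (0.90, 0.79, 709), (1.00, 0.81, 219)]

for predicted, actual, n in rows:
    # Standard error of a proportion if the stated confidence were accurate
    se = math.sqrt(predicted * (1 - predicted) / n) if predicted < 1 else 0.0
    gap = predicted - actual
    z = gap / se if se else float("inf")
    print(f"{predicted:.0%}: overconfident by {gap:+.0%} (~{z:.1f} standard errors)")
```

Even the smallest gaps here run several standard errors beyond what chance would allow, so the overconfidence in the chart isn’t just noise (whatever selection effects lurk in who uses the site).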

Sometimes the miscalibration matters more than others. In Mistakes Were Made (but not by me), Tavris and Aronson describe the overconfidence police interrogators feel about their ability to discern honest denials from false ones. In one study, researchers selected videos of police officers interviewing suspects who were denying a crime – some innocent and some guilty.

Kassin and Fong asked forty-four professional detectives in Florida and Ontario, Canada, to watch the tapes. These professionals averaged nearly fourteen years of experience each, and two-thirds had had special training, many in the Reid Technique. Like the students [in a similar study], they did no better than chance, yet they were convinced that their accuracy rate was close to 100 percent. Their experience and training did not improve their performance. Their experience and training simply increased their belief that it did.

As a result, more people are falsely imprisoned as prosecutors steadfastly pursue convictions for people they’re sure are guilty. This is a case in which poor calibration does real harm.

Of course, it’s often a more benign issue. Since finding PredictionBook, I see everything as a prediction to be measured. A coworker and I were just discussing plans to have a group dinner, and had the following conversation (almost word for word):

Her: “How do you feel about squash?”
Me: “I’m uncertain about squash…”
Her: “What about sauteed in butter and garlic?”
Me: “That has potential. My estimation of liking it just went up slightly.”
*Runs off to enter prediction*

I’ve already started making predictions in hopes that tracking my calibration errors will help me correct them. I wish PredictionBook had tags – it would be fascinating (and helpful!) to know that I’m particularly prone to misjudge whether I’ll like foods, or that I’m especially well-calibrated at predicting the winners of sports games.

And yes, I will be using PredictionBook on football this season. Every week I’ll try to predict the winners and losers, and see whether my confidence is well-placed. Honestly, I expect to see some homer-bias and have too much confidence in the Ravens.  Isn’t exposing irrationality fun?

15 Responses to Calibrating our Confidence

  1. I’d be a little hesitant with the predictions about liking food. When you try the food you will know what you predicted on the website and will want your predictions there to be good, and that knowledge might cause your prediction to become more accurate. The football case seems like a better test of your predictions, since you have no way of influencing who wins a football game.

    • Jesse Galef says:

      You know, I thought of that and it’s a valid concern. I’m not sure what to do about it, but it’s still important to me to know whether or not (or to what extent) my expectations about food are accurate.

      Can you think of good ways to check on predictions about things under my control? It seems important and useful to check my expectations of “I’m sure I can finish this project in an hour” or “I’m sure I won’t enjoy going to the folk festival” in case my confidence is misplaced.

I already think to myself “I bet I wouldn’t like this vegetable,” so I wanted to start recording them. Making the prediction slightly more formal could affect things, but I’m keeping it private, not putting money or reputation on the line, and trying to embrace any evidence that I’m irrational (rather than working to “appear right”).

      Can you think of any other ways around the problem?

  2. binary says:

    u remind me of a computer

  3. I predict you will give up rather quickly on keeping track of your predictions. I further predict you will soon agree with me. Now go mark that down.

    • Jesse Galef says:

I predict that it’s very possible. Unfortunately, if it happens, I won’t go back to PredictionBook to see that I stopped, and we’ll never know!

  4. Spencer says:

    Nice post!

  5. Ellen says:

    Makes me wonder how confident they are in their results… because if they’re 90% sure they’re correct in their findings here, they’re likely only 81% sure. 😉

  6. Max says:

    When scoring predictions, is it better to have actual certainty higher than predicted certainty rather than lower? Like, if my actual certainty is 100% regardless of my predicted certainty, is that good or bad?

  7. Max says:

    Is the 100% predicted uncertainty in the table exactly 100% or 95-100%? Notice there’s only a 2% difference in actual certainty between 90% predicted certainty and 100%. You’d think that 100% certainty would be a sure thing like the chance that an asteroid won’t destroy the Earth tomorrow.

  8. Max says:

    Interesting that 50-60% predicted certainty that something will happen means it probably won’t actually happen.

