Pulling levers, killing monsters: the lure of unpredictable rewards

Julia Galef

15 years ago

In an interesting recent TED Talk, video game designer Tom Chatfield explained how his field has honed the art of dispensing rewards at precisely the right rate to keep people hooked on the game. Chatfield used a simple quest as a case study: you kill a monster, you earn a pie, and your goal is to collect 15 pies. Simple, silly, but this is more or less the template used by a huge number of video games, if you boil them down to their essentials.

As a game designer, you don’t want to give your player a pie every time he kills a monster. That would quickly get boring and the player would lose interest. But you also don’t want to give him pies so rarely that he gives up out of frustration. So what’s the ideal reward-rate? From looking at their data on the behavior of hundreds of millions of players, game designers have figured out that giving people a reward about 25 percent of the time will keep them playing the longest.

(As an interesting side note, Chatfield explains that the ideal reward-rate fluctuates depending on how close the player is to meeting his quota of pies: “When people get to about 13 out of 15 pies, their perception shifts, they start to get a bit bored, a bit testy. They’re not rational about probability. They think this game is unfair. It’s not giving me my last two pies. I’m going to give up.” So to compensate for players’ skewed perception of probabilities, which kicks in around 13-out-of-15 pies, game developers often change the reward rate at that point, increasing the chance of getting a pie after killing a monster from 25 percent to 75 percent.)
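To make the arithmetic concrete, here is a minimal Python sketch of the pie quest. The 15-pie quota and the 25 and 75 percent rates come from Chatfield's talk; the function names, and the assumption that the boost applies from the moment the player holds exactly 13 pies, are mine and only illustrative. It is a toy model, not how any real game is implemented.

```python
import random

QUOTA = 15            # pies needed to finish the quest (from the talk)
BASE_RATE = 0.25      # chance of a pie per kill (from the talk)
BOOSTED_RATE = 0.75   # chance of a pie per kill near the end (from the talk)
BOOST_AT = 13         # assumption: boost starts once 13 pies are held


def drop_rate(pies, boosted=True):
    """Chance that the next kill yields a pie, given pies already held."""
    if boosted and pies >= BOOST_AT:
        return BOOSTED_RATE
    return BASE_RATE


def expected_kills(boosted=True):
    """Average kills to reach the quota: each pie takes 1/p kills on average."""
    return sum(1 / drop_rate(pies, boosted) for pies in range(QUOTA))


def simulate_quest(rng, boosted=True):
    """Play one quest and return how many monsters had to be killed."""
    pies = kills = 0
    while pies < QUOTA:
        kills += 1
        if rng.random() < drop_rate(pies, boosted):
            pies += 1
    return kills


if __name__ == "__main__":
    rng = random.Random(0)
    print(f"flat 25%: {expected_kills(boosted=False):.1f} kills on average")
    print(f"boosted:  {expected_kills(boosted=True):.1f} kills on average")
    print("one simulated quest:", simulate_quest(rng), "kills")
```

Under these assumptions a flat 25 percent rate takes 60 kills on average, while the boosted version takes about 54.7. The whole difference comes from the last two pies, which drop from 4 kills each on average to about 1.3.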

You’ve probably heard of this phenomenon in the context of casinos, where the unpredictability of rewards is what keeps people sitting in front of slot machines, feeding in their chips and pulling a lever for hours on end. But it was originally rats, not humans, whose lever-pulling addictions revealed the power of what psychologists and animal trainers now call “variable-ratio schedule reinforcement.” Psychologist B. F. Skinner, in his experiments on rats in the 1940s and 50s, found that rats that were given a food pellet after pulling a lever a fixed number of times would quickly abandon the behavior if the pellets stopped coming on schedule. But rats whose food pellets appeared after a random number of pulls were hooked: they would continue pulling the lever even if it had been ages since they last received a pellet, always hoping that this time, they’d get their reward.
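The difference between the two schedules is easy to see in a short Python sketch. This is an illustration of the idea only, not Skinner's actual protocol: the function names are hypothetical, and drawing the gap between rewards uniformly from 1 to 2n-1 pulls (so that it averages n) is just one convenient assumption.

```python
import random


def fixed_ratio(n):
    """Yield True on every n-th lever pull, False otherwise."""
    count = 0
    while True:
        count += 1
        if count == n:
            count = 0
            yield True
        else:
            yield False


def variable_ratio(mean_n, rng):
    """Yield True after a random number of pulls that averages mean_n."""
    while True:
        # Assumption: gaps are uniform on 1 .. 2*mean_n - 1 (mean is mean_n).
        gap = rng.randint(1, 2 * mean_n - 1)
        for pull in range(1, gap + 1):
            yield pull == gap


def gaps_between_rewards(schedule, pulls):
    """Pull the lever `pulls` times; return the number of pulls per reward."""
    gaps, since_last = [], 0
    for _, rewarded in zip(range(pulls), schedule):
        since_last += 1
        if rewarded:
            gaps.append(since_last)
            since_last = 0
    return gaps


if __name__ == "__main__":
    rng = random.Random(0)
    print("fixed:   ", gaps_between_rewards(fixed_ratio(5), 30))
    print("variable:", gaps_between_rewards(variable_ratio(5, rng), 30))
```

On the fixed schedule every gap is exactly 5 pulls, so a missing pellet is obvious after the fifth pull. On the variable schedule a long dry spell looks just like ordinary bad luck, which fits the observation that those rats kept pulling.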

“A variable schedule is far more effective in maintaining behavior than a constant, predictable schedule of reinforcement,” wrote animal trainer Karen Pryor in Don’t Shoot the Dog: The New Art of Teaching and Training. In fact, a variable schedule can actually motivate the animal to perform their task not just longer, but better:

“If I were to give a dolphin a fish every time it jumped, very quickly the jump would become as minimal and perfunctory as the animal could get away with. If I then stopped giving fish, the dolphin would quickly stop jumping. However, once the animal had learned to jump for fish, if I were to reinforce now the first jump, then the third, and so on at random, the behavior would be much more strongly maintained; the unrewarded animal would actually jump more and more often, hoping to hit the lucky number, as it were, and the jumps might even increase in vigor.”

I’ve personally noticed the motivating appeal of unpredictable rewards when dieting. If you step on the scale every day, assuming you’re sticking to your diet, you’ll see an overall downward trend over time. But on any particular day, your weight might not be lower than the previous day’s reading — it might even be slightly higher. That’s because your weight fluctuates naturally from day to day, within a pound or two, depending on whether you’ve eaten recently, whether you’re retaining water, and so on. Even though the expected value of the decline per day is fixed as long as you stick to your diet, on any given day there’s nevertheless some noise, some uncertainty about whether you actually are going to see a lower number than you did the day before — and if you do, you get a little hit of endorphins.

Why is this better than your weight going down by a predictable amount every day? Because if you know for sure what number you’re going to see on the scale tomorrow morning, you start to take that fact for granted, and by the time tomorrow rolls around there’s no jolt of endorphins when you actually do step on the scale. Of course, there would always be the incentive that if you don’t stick to your diet you won’t see that pleasurable decrease on the scale. But in general, the promise of reward is a bigger motivator than the threat of punishment – especially because, if you know for sure you aren’t going to see a lower number on the scale, you may just avoid the scale altogether.

The motivational power of variable rewards may also be the key to explaining a more modern mystery: why I can’t seem to stop checking my damn email every few minutes. Dan Ariely, author of Predictably Irrational, explains:

“If you think about it, e-mail is very much like trying to get the pellet rewards. Most of it is junk and the equivalent to pulling the lever and getting nothing in return, but every so often we receive a message that we really want. Maybe it contains good news about a job, a bit of gossip, a note from someone we haven’t heard from in a long time, or some important piece of information. We are so happy to receive the unexpected e-mail (pellet) that we become addicted to checking, hoping for more such surprises. We just keep pressing that lever, over and over again, until we get our reward.”
