Nine Things Nerds Can Appreciate about Football’s Game Design

The Super Bowl is this weekend, and you know what that means: Superb Owl! and Sportsball! memes galore. But I noticed that many of my friends who mock the sport also love strategy games like Magic the Gathering, Starcraft, or 7 Wonders — and thus I know they can appreciate a well-designed game.


There are plenty of reasons to criticize the NFL: a hypocritical position on brutal hits, stinginess in paying the refs, the exploitation of unpaid college athletes, and of course the (somewhat improving) issue of concussions and player safety.

But for my fellow nerds, here are some elements of the National Football League that you can look at on a strategic game-design level and say “Ok, that’s actually really cool.”

Nine? Elements of Football that Reflect Good Strategic Game Design

Resource Management:

Like most good games, a key component is making smart decisions about where to invest your resources.


In the NFL, the collective bargaining agreement between owners and the players’ union establishes a salary cap: a maximum (and minimum) that teams can spend each year.

Would you want to pay a premium for a star or spread that salary around and upgrade more positions? It might depend on synergies with the rest of your roster. (Or you can get lucky with a player like Tom Brady who takes less money than his market value so his team can surround him with more talent.)

Rookies, as unproven quantities, are typically signed on a very affordable contract for their first few years, whereas established veterans can be much pricier. Teams are faced with the question of where they can afford to have uncertainty on their roster — contenders rarely want to risk an unproven player at quarterback, while teams that are a bit further away can take that gamble.

Time / (Plus Good Catch-up Mechanics):

In any game, there’s incentive for the person losing to resort to riskier higher-variance strategies. In football, that’s passing plays, which have a chance of gaining a lot of yards or not being caught at all. A neat aspect of football is that incomplete passes can stop the clock.

When a team is behind, time is one of the most precious resources it has, so this means the game is structured to give them every opportunity to claw their way back. Deciding when to burn time (and time-outs) is crucial, and something even the best coaches struggle to do well.

Energy Level in Game:


This is a minor point, but one I found really neat: Playing defense tends to be more exhausting than playing offense. If an offense can run many plays in rapid succession without huddling up or substituting — and thus prevent the defense from substituting in fresh players as well — they can gain an edge over an increasingly-tired defense.

Synergies Between Players and Schemes

Building a roster

Of course, the downside of this kind of “Hurry up offense” is that the offense is also stuck with the same players on the field and doesn’t have as much time for coaches to call the next play. Teams that want to use more hurry-up offense may need to hire versatile players who can be counted on to run many types of play, and an experienced quarterback who can make smart changes on the fly.

Since players have unique skill sets, teams need to build around who’s coming out of college and who they can afford to sign. Just like putting together a Magic the Gathering or Netrunner deck to find and build on synergies, Football has fantastic interplay between skills, style, and strategy.

Finding a generational talent like Detroit’s ultra-elusive running back Barry Sanders opens up a host of new options for a team. The Detroit Lions never focused on their offensive linemen — who would try to clear the way for him — because, frankly, Barry could succeed without them. (Until he got fed up and retired early, which is a story of its own.)

If a team has a quarterback with pinpoint accuracy but a weak arm, paying for blazing fast wide receivers to race down the field and catch deep passes doesn’t make as much sense as signing shifty and precise route-runners. However, to make good use of a quarterback with a cannon-arm, fast receivers aren’t enough — he needs strong offensive linemen to protect him and give him time to throw.

There are countless interactions between strategic decisions and I love it.

Strategic Structural Asymmetries

Stadiums and Cities Matter

Even the stadium and climate in a city factor into a football team’s strategy.

Because it’s considered more difficult to pass the ball in cold and windy weather, teams in the frigid Northern divisions are more likely to focus on building a running attack and a defense that can stop the other team from running successfully. As winter approaches, they can count on the environment to limit the passing game.


However, roughly a third of teams play in indoor stadiums, eliminating the weather and the wind to make passing strategies more effective. Since half a team’s games are played at home, the home stadium has a disproportionate impact on how they want to build a roster. A few stadiums even have retractable roofs, which let them decide whether to allow the elements to impact the game!

Leveraged Division Structure

An NFL division is comprised of four teams who play each other twice every season. The team in each division with the best record is guaranteed a spot in the playoffs. (Even if NONE of the teams were very good – the Washington Football Team made the playoffs this year despite losing more than half their games!)

This makes it even more crucial to plan how to counter division opponents’ schemes, and those rivalry games are usually close and exciting.

If an opposing team in your division has exceptionally tall, strong receivers you can’t afford to be caught assigning small cornerbacks to guard them. You’d need to keep that in mind when building a roster.

Well Designed Scoring System

Intermittent Scoring to Build Excitement Throughout

In the NBA, a jaw-dropping feat of athleticism… gets 2 or 3 points. Since teams average over 100 points a game, there’s a limit to how impactful any one play can be until the last minutes of a game.

On the other end of the spectrum, each score in soccer is rare and thus hugely important. Unfortunately, goals are so infrequent that historically, over 30% of English league games end with neither team scoring more than once. Those rare goals are dramatic, but personally I find it difficult to get excited about the intervening dribbling and passing when I know it’s very unlikely to shape the final result.

It’s a tradeoff, and football’s rules situate it nicely between these extremes. There are typically around 8 scoring plays each NFL game, and even the non-scoring plays are impactful (see below).

Building Progress / Tension

Getting a player on base in baseball or softball, winning a game or a set in tennis’ Game/Set/Match structure, or defeating a video game monster and getting to heal at a save point — these smaller discrete goals build toward the larger one.


Football is designed to have two key ways to make non-scoring plays significant. First, teams are only given four chances (downs) to score before they turn the ball over to the other team — but the count is reset every time they gain another 10 yards. Every third-down and fourth-down opportunity has more importance because it’s approaching that impactful mini-goal, breathing more life into a drive.

But even without this ratcheting structure, non-scoring progress matters because field position is a ‘stateful’ element which persists from one play to the next. Where one play ends, the next one begins. Every yard one team moves forward is an extra yard the opponents will need to win back. It’s a tug-of-war with scoring opportunities on the line at any time.
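The down-and-distance ratchet and the persistent field position described above can be sketched as a tiny state machine. This is a simplified illustration rather than the full rulebook: the names `DriveState` and `run_play` are made up for this sketch, the starting yard line of 25 is an assumption, and punts, penalties, safeties, and turnovers other than on downs are ignored.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class DriveState:
    yard_line: int = 25  # yards from the offense's own goal line (assumed start)
    down: int = 1        # 1 through 4
    to_go: int = 10      # yards still needed for a fresh set of downs


def run_play(state: DriveState, yards_gained: int) -> Tuple[DriveState, str]:
    """Apply one play and return the new state plus a short outcome label."""
    new_line = state.yard_line + yards_gained
    if new_line >= 100:
        return DriveState(100, state.down, 0), "touchdown"
    if yards_gained >= state.to_go:
        # The mini-goal is reached: the down count resets.
        # Near the goal line there may be fewer than 10 yards left to gain.
        return DriveState(new_line, 1, min(10, 100 - new_line)), "first down"
    remaining = state.to_go - yards_gained
    if state.down == 4:
        return DriveState(new_line, 4, remaining), "turnover on downs"
    # Field position persists: the next play starts where this one ended.
    return DriveState(new_line, state.down + 1, remaining), "next down"


if __name__ == "__main__":
    state = DriveState()
    for gain in (4, 7, 0, 2, 3):
        state, outcome = run_play(state, gain)
        print(gain, outcome, state)
```

Note that `yard_line` changes on every play, even when the down counter does not reset, which is the tug-of-war element.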

“Legacy” Narratives

Legacy games are all the rage, and for good reason. (Our country even decided to LARP a game of Pandemic Legacy!)

The fact that players, teams, and coaches have storylines, rivalries, and arcs is a big part of the human element to football. We’re watching some of the legends of their craft face off each week, with history between them and human motivations.

On Sunday, we get to watch completely different quarterbacks compete: young phenom Patrick Mahomes against the unaging Tom Brady. Mahomes, who won last year, is widely considered one of the best in the league despite being only 25. In contrast, Brady is 43 and amazingly this is his tenth Super Bowl appearance. He’s been playing at an elite level for decades, and he won his first Super Bowl in 2002 — when Mahomes was 6.

Most of Brady’s 20 years were spent under coach Bill Belichick, known for being brilliant and famously taciturn. (Unless, of course, you ask him about the history and minutiae of football kicking rules, which makes him light up and talk for ages.) However, last year Brady decided to leave Belichick and the New England Patriots, so people were wondering whether he could succeed with a different coach. He’s answered those questions in dramatic fashion.

When storylines carry over from campaign to campaign or season to season, it’s a great way to build long narratives of meaning and importance.

Whether or not you enjoy watching the sport, there’s a lot it does well from a design perspective and I recommend anyone who enjoys strategy to try playing some of the Madden video games — how I initially got excited about the game.

Forget 3-D Chess; Here’s My 1-D Chess Rules

Chess is sometimes held up as the embodiment of strategy and brilliance: if you’re playing chess while your opponent is playing checkers, you’re out-thinking them. Those even smarter can play chess in higher dimensions, with 3-D chess often used as a metaphor for politics. (There’s even a 5-D chess game on Steam which looks mind-bending.)

But going the other direction, the existence of 5-D, 3-D, and 2-D chess made me wonder: is there a way 1-D chess could work? And be fun, that is.

I’m not the first to have this thought; many people have tried their hand at designing one-dimensional chess including the late great Martin Gardner. His approach was for each side to have a single King, Knight, and Rook at the ends of an eight-tile long board. With so few pieces and spaces it’s fairly easy to “solve” the game, mapping out every possible move the same way we can solve tic-tac-toe.

I set out to create 1-D Chess which kept the spirit of the game as much as possible. It was initially inspired by conversations with Brienne years ago about designing mobius chess (which is topologically identical to playing on a loop, but is *obviously* cooler.)

Values to Preserve

  1. Low complexity – Piece moves are simple, there are few rules
  2. High depth – Many games are possible, with a mix of strategy and tactics
  3. Full information – No fog of war, no hidden cards, no randomness
  4. Personalized openings – Different opening play/counter-play options to match your aesthetics and strengths.

The last one is contentious — I know many people bemoan the amount of memorization required to learn the various chess openings. Bobby Fischer even famously proposed Fischer Random Chess which randomized the back row each game, thus stripping the game down to a player’s ability to understand the situation and respond.

However, I happen to enjoy the way you can study various opening strategies and say “I prefer to use the Alapin Variation to counter the Accelerated Dragon Sicilian Defense — I hate ceding the middle of the board.” Being able to steer the game toward your preferred style before getting into tactical elements of the game is a key part of what makes a game feel like *chess* to me.

So, after a lot of brainstorming and a lot of rejected ideas — see the last section — I whittled it down to a few core concepts. Pictures are worth a thousand words (although I’m sure there are opportunities for arbitrage somewhere…) so here’s a screenshot of the game I started building in Tabletop Simulator:

My Proposal for 1D Chess


  1. Ring Board – 28 squares; the outside of a standard chess board
  2. 12 Pieces per side – 4 fewer pawns, but otherwise the same pieces
  3. Placement Control – Players take turns placing non-pawns in their region to set up

Ring Board

Look, nobody said it had to be a line segment. Since each square has exactly two neighbors and the entire board is connected, it counts as 1-D.  Put it into polar coordinates if you have to.

Using a 28-square ring allows us to keep the standard chess board, but it also allows much more depth of play without adding complexity to the rules. Like in 2-D chess, you can focus your attack on one side or the other, and you have the ability to try interrupting your opponent’s plans by striking and causing havoc on the other side of the fight.

12 Pieces, Simple Moves

Similarly, I stuck with the original pieces and kept their movement as close in spirit as I could:

  • Pawn: Move forward one or capture two spaces ahead, ignoring the square in front. Cannot turn around.
  • Bishop: Moves up to 6 spaces, 2 at a time (hopping over every other square).
  • Rook: Moves up to 3 spaces forward, 1 at a time. [EDIT: Because the Rooks slide instead of hop, they get stuck easily. My current solution is that they can move *through* the King.]
  • Knight: Jumps either 3 or 5 squares
  • Queen: Can move like the Rook or Bishop
  • King: Moves one square.

This move set creates parallels to the 2-D version: Bishops stay on their color, pawns can get locked together, and Knights have a unique move (5 squares) that not even the Queen has.
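The move list above can be sketched as a small move generator on the 28-square ring. This is one reading of the rules, not a reference implementation: the board representation (a dict from square index to a `(color, kind)` pair), the function names, and the choice to let Rooks, Bishops, Queens, and Kings move in either direction around the ring are all assumptions, and the Rook-through-King exception from the EDIT is left out.

```python
RING = 28  # squares around the outside of an 8x8 board: 4 * 8 - 4


def slide(board, start, color, step, max_steps):
    """Squares reached by moving `step` squares at a time, up to `max_steps`
    times, in either direction. Only landing squares can block the move."""
    targets = []
    for direction in (+1, -1):
        for n in range(1, max_steps + 1):
            sq = (start + direction * step * n) % RING
            occupant = board.get(sq)
            if occupant is None:
                targets.append(sq)
                continue
            if occupant[0] != color:
                targets.append(sq)  # capture the first enemy piece reached
            break
    return targets


def jumps(board, start, color, distances):
    """Knight-style hops that ignore everything in between."""
    targets = []
    for direction in (+1, -1):
        for d in distances:
            sq = (start + direction * d) % RING
            occupant = board.get(sq)
            if occupant is None or occupant[0] != color:
                targets.append(sq)
    return targets


def pawn_moves(board, start, color, forward):
    """Step one square forward if empty, or capture two squares ahead."""
    targets = []
    ahead = (start + forward) % RING
    if ahead not in board:
        targets.append(ahead)
    capture = (start + 2 * forward) % RING
    occupant = board.get(capture)
    if occupant is not None and occupant[0] != color:
        targets.append(capture)
    return targets


def moves(board, start, forward=+1):
    """Sorted destination squares for the piece on `start`.
    `forward` (+1 or -1) only matters for pawns."""
    color, kind = board[start]
    if kind == "P":
        found = pawn_moves(board, start, color, forward)
    elif kind == "B":
        found = slide(board, start, color, step=2, max_steps=3)
    elif kind == "R":
        found = slide(board, start, color, step=1, max_steps=3)
    elif kind == "N":
        found = jumps(board, start, color, (3, 5))
    elif kind == "Q":
        found = (slide(board, start, color, step=1, max_steps=3)
                 + slide(board, start, color, step=2, max_steps=3))
    elif kind == "K":
        found = slide(board, start, color, step=1, max_steps=1)
    else:
        raise ValueError(f"unknown piece kind: {kind}")
    return sorted(set(found))


if __name__ == "__main__":
    print(moves({0: ("W", "Q")}, 0))  # the Queen never lands 5 squares away
    print(moves({0: ("W", "N")}, 0))  # the Knight does
```

Because the Bishop only ever moves an even number of squares on a 28-square ring, it stays on its starting color under this reading.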

The moves themselves stay fairly simple, but allow the kind of interplay that I like in 2-D chess with pieces defending each other and getting in each other’s way.

Opening Placement

Each player has 12 opposite squares to start, with 2 on each end filled by pawns. The remaining 8 squares are up to the players to arrange.

Starting with White, the players take turns placing one of their pieces on an empty square between their pawns.

It’s up to you: You can choose to create an unbalanced attack with both Knights on one side, ready to jump over the pawns and storm the enemy. You can choose to put your Bishops on the inside, where they have an easier time of getting out, or on the outside so that the Rooks are the last line of defense to mop up any attacks. You can leave the King with the Queen — your strongest piece — or between two Rooks…

There are lots of possibilities which rely on how you enjoy playing and how your opponent seems to be setting up. While the complexity of this rule is low, it adds immense depth to the game and prevents it from being quite so easily “solved”.

By requiring the pawns to take up the outermost two spaces, initial move choices are limited to advancing a pawn or using a Knight to hop over them. Moving one pawn can give your Bishops or Queen a way to move through them and enter the fray.  This is all just like in the 2-D version in a way I find aesthetically very pleasing.

If you prefer to just focus on the tactical side of things, you can use the normal ordering or give both players mirrored random arrangements.

Ideas that I considered but didn’t use:

Here are some snippets of ideas that I had but rejected because the complexity/depth tradeoff wasn’t good enough, or the game strayed too far and stopped being recognizable as “Chess”.

  • Making pieces face a direction, limiting them to moving forward
    • Allowed to turn around if the square immediately in front of them is filled
    • Might allow rules that make it easier to capture pieces from the back
  • Pieces can only capture certain types of pieces (in either a rock-paper-scissors style or Stratego style)
  • Ranged attacks without moving
  • Allow pieces to swap with each other
    • Either upon landing on your own, or as a type of movement
  • Pieces that push or pull rather than capture
  • Pieces that move differently when next to certain others
    • Rooks launch pawns, for example
    • The Queen could move in the pattern of any piece in a contiguous chain with her
  • Different terrain
    • Mud tiles which must be stopped on
    • Rocky terrain which prevents knights from landing on it
  • Pieces spawn new pieces next to them as an action

What do you think? Ideas and opinions are welcome!


Which Cognitive Bias is Making NFL Coaches Predictable?

In football, it pays to be unpredictable (although the “wrong way touchdown” might be taking it a bit far.) If the other team picks up on an unintended pattern in your play calling, they can take advantage of it and adjust their strategy to counter yours. Coaches and their staff of coordinators are paid millions of dollars to call plays that maximize their team’s talent and exploit their opponent’s weaknesses.

That’s why it surprised Brian Burke, formerly of (and now hired by ESPN) to see a peculiar trend: football teams seem to rush a remarkably high percent on 2nd and 10 compared to 2nd and 9 or 11.

What’s causing that?

His insight was that 2nd and 10 disproportionately followed an incomplete pass. This generated two hypotheses:

  1. Coaches (like all humans) are bad at generating random sequences, and have a tendency to alternate too much when they’re trying to be genuinely random. Since 2nd and 10 is most likely the result of a 1st down pass, alternating would produce a high percent of 2nd down rushes.
  2. Coaches are suffering from the ‘small sample fallacy’ and ‘recency bias’, overreacting to the result of the previous play. Since 2nd and 10 not only likely follows a pass, but a failed pass, coaches have an impulse to try the alternative without realizing they’re being predictable.

These explanations made sense to me, and I wrote about the phenomenon a few years ago. But now that I’ve been learning data science, I can dive deeper into the analysis and add a hypothesis of my own.

The following work is based on the play-by-play data for every NFL game from 2002 through 2012, which Brian kindly posted. I spent some time processing it to create variables like Previous Season Rushing %, Yards per Pass, Yards Allowed per Pass by Defense, and QB Completion percent. The Python notebooks are available on my GitHub, although the data files were too large to host easily.

Irrationality? Or Confounding Variables?

Since this is an observational study rather than a randomized control trial, there are bound to be confounding variables. In our case, we’re comparing coaches’ play calling on 2nd down after getting no yards on their team’s 1st down rush or pass. But those scenarios don’t come from the same distribution of game situations.

A number of variables could be in play, some exaggerating the trend and others minimizing it. For example, teams that passed for no gain on 1st down (resulting in 2nd and 10) have a disproportionate number of inaccurate quarterbacks (the left graph). These teams with inaccurate quarterbacks are more likely to call rushing plays on 2nd down (the right graph). Combine those factors, and we don’t know whether any difference in play calling is caused by the 1st down play type or the quality of quarterback.


The classic technique is to train a regression model to predict the next play call, and judge a variable’s impact by the coefficient the model gives that variable.  Unfortunately, models that give interpretable coefficients tend to treat each variable as either positively or negatively correlated with the target – so time remaining can’t be positively correlated with a coach calling running plays when the team is losing and negatively correlated when the team is winning. Since the relationships in the data are more complicated, we needed a model that can handle them.

I saw my chance to try a technique I learned at the Boston Data Festival last year: Inverse Probability of Treatment Weighting.

In essence, the goal is to create artificial balance between your ‘treatment’ and ‘control’ groups — in our case, 2nd and 10 situations following 1st down passes vs. following 1st down rushes. We want to take plays with under-represented characteristics and ‘inflate’ them by pretending they happened more often, and – ahem – ‘deflate’ the plays with over-represented features.

To get a single metric of how over- or under-represented a play is, we train a model (one that handles non-linear relationships better) that takes each 2nd down play’s confounding variables as input – score, field position, QB quality, etc. – and tries to predict whether the 1st down play was a rush or pass. If, based on the confounding variables, the model predicts the play was 90% likely to be after a 1st down pass – and it was – we decide the play probably has over-represented features and we give it less weight in our analysis. However, if the play actually followed a 1st down rush, it must have under-represented features for the model to get it so wrong. Accordingly, we decide to give it more weight.

After assigning each play a new weight to compensate for its confounding features (using K-fold cross-validation to avoid training the model on the very plays it’s trying to score), the two groups *should* be balanced. It’s as though we were running a scientific study, noticed that our control group had half as many men as the treatment group, and went out to recruit more men. However, since that isn’t an option, we just decided to count the men twice.
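As a rough sketch of the weighting step, here is the standard textbook form of inverse probability of treatment weights: a play gets weight 1/p if it really did follow a pass and 1/(1-p) if it followed a rush, where p is the model’s predicted probability of “followed a pass”. Whether the original notebooks used exactly this form (rather than, say, stabilized weights) is an assumption, and the function names are made up for illustration. The propensity scores are assumed to come from out-of-fold predictions.

```python
def iptw_weights(followed_pass, propensity):
    """Inverse probability of treatment weights.

    followed_pass: booleans, True if the play followed a 1st down pass.
    propensity:    model-predicted probability that it followed a pass.
    """
    weights = []
    for treated, p in zip(followed_pass, propensity):
        if not 0.0 < p < 1.0:
            raise ValueError("propensity scores must be strictly between 0 and 1")
        # Plays the model found "expected" get small weights;
        # plays it found surprising get large ones.
        weights.append(1.0 / p if treated else 1.0 / (1.0 - p))
    return weights


def weighted_rate(outcomes, weights):
    """Weighted share of plays with outcome 1 (for example, a rush call)."""
    return sum(o * w for o, w in zip(outcomes, weights)) / sum(weights)


if __name__ == "__main__":
    # The example from the text: the model says 90% likely to follow a pass.
    print(iptw_weights([True, False], [0.9, 0.9]))  # roughly 1.11 and 10
```

In the 90% example, the play that did follow a pass counts for about 1.1 plays, while the one that followed a rush counts for about 10.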

Testing our Balance

Before processing, teams that rushed on 1st down for no gain were disproportionately likely to be teams with the lead. After the re-weighting process, the distributions are far more similar:


Much better! They’re not all this dramatic, but lead was the strongest confounding factor and the model paid extra attention to adjust for it.

It’s great that the distributions look more similar, but that’s qualitative. To do a quantitative diagnostic, we can take the standardized difference in means, recommended as a best practice in a 2015 paper by Peter C. Austin and Elizabeth A. Stuart titled “Moving towards best practice when using inverse probability of treatment weighting (IPTW) using the propensity score to estimate causal treatment effects in observational studies”.

For each potential confounding variable, we take the difference in means between plays following 1st down passes and 1st down rushes and adjust for their combined variance. A high standardized difference in means indicates that our two groups are dissimilar, and in need of balancing. The standardized differences had a max of around 47% and median of 7.5% before applying IPT-weighting, which reduced the differences to 9% and 3.1%, respectively.
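For a continuous confounder, the diagnostic can be sketched as below: the difference in (weighted) means divided by the square root of the average of the two groups’ (weighted) variances. The weighted variance formula is the one I understand the Austin and Stuart paper to use; it reduces to the ordinary sample variance when all weights are equal. Treat the exact form, and the function names, as assumptions rather than a copy of the original analysis.

```python
import math


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_variance(values, weights):
    """Weighted sample variance; equals the usual n-1 variance for unit weights."""
    total = sum(weights)
    mean = weighted_mean(values, weights)
    factor = total / (total ** 2 - sum(w ** 2 for w in weights))
    return factor * sum(w * (v - mean) ** 2 for v, w in zip(values, weights))


def standardized_difference(x_pass, w_pass, x_rush, w_rush):
    """Difference in weighted means, scaled by the pooled standard deviation."""
    diff = weighted_mean(x_pass, w_pass) - weighted_mean(x_rush, w_rush)
    pooled_sd = math.sqrt(
        (weighted_variance(x_pass, w_pass) + weighted_variance(x_rush, w_rush)) / 2
    )
    return diff / pooled_sd


if __name__ == "__main__":
    # Unweighted toy data: the group means differ by one standard deviation.
    print(standardized_difference([1, 2, 3], [1, 1, 1], [0, 1, 2], [1, 1, 1]))
```

Running it once with unit weights and once with the IPT weights gives the before and after numbers for each variable.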


Actually Answering Our Question

So, now that we’ve done what we can to balance the groups, do coaches still call rushing plays on 2nd and 10 more often after 1st down passes than after rushes? In a word, yes.


In fact, the pattern is even stronger after controlling for game situation. It turns out that the biggest factor was the score (especially when time was running out.) A losing team needs to be passing the ball more often to try to come back, so their 2nd and 10 situations are more likely to follow passes on 1st down. If those teams are *still* calling rushing plays often, it’s even more evidence that something strange is going on.

Ok, so controlling for game situation doesn’t explain away the spike in rushing percent at 2nd and 10. Is it due to coaches’ impulse to alternate their play calling?

Maybe, but that can’t be the whole story. If it were, I would expect to see the trend consistent across different 2nd down scenarios. But when we look at all 2nd-down distances, not just 2nd and 10, we see something else:


If their teams don’t get very far on 1st down, coaches are inclined to change their play call on 2nd down. But as a team gains more yards on 1st down, coaches are less and less inclined to switch. If the team got six yards, coaches rush about 57% of the time on 2nd down regardless of whether they ran or passed last play. And it actually reverses if you go beyond that – if the team gained more than six yards on 1st down, coaches have a tendency to repeat whatever just succeeded.

It sure looks like coaches are reacting to the previous play in a predictable Win-Stay Lose-Shift pattern.

Following a hunch, I did one more comparison: passes completed for no gain vs. incomplete passes. If incomplete passes feel more like a failure, the recency bias would influence coaches to call more rushing plays after an incompletion than after a pass that was caught for no gain.

Before the re-weighting process, there’s almost no difference in play calling between the two groups – 43.3% vs. 43.6% (p=.88). However, after adjusting for the game situation – especially quarterback accuracy – the trend reemerges: in similar game scenarios, teams rush 44.4% of the time after an incomplete pass and only 41.5% after passes completed for no gain. It might sound small, but with 20,000 data points it’s a pretty big difference (p < 0.00005).

All signs point to the recency bias being the primary culprit.

Reasons to Doubt:

1) There are a lot of variables I didn’t control for, including fatigue, player substitutions, temperature, and whether the game clock was stopped in between plays. Any or all of these could impact the play calling.

2) Brian Burke’s (and my) initial premise was that if teams are irrationally rushing more often after incomplete passes, defenses should be able to prepare for this and exploit the pattern. Conversely, going against the trend should be more likely to catch the defense off-guard.

I really expected to find plays gaining more yards if they bucked the trends, but it’s not as clear as I would like.  I got excited when I discovered that rushing plays on 2nd and 10 did worse if the previous play was a pass – when defenses should expect it more. However, when I looked at other distances, there just wasn’t a strong connection between predictability and yards gained.

One possibility is that I needed to control for more variables. But another possibility is that while defenses *should* be able to exploit a coach’s predictability, they can’t or don’t. To give Brian the last words:

But regardless of the reasons, coaches are predictable, at least to some degree. Fortunately for offensive coordinators, it seems that most defensive coordinators are not aware of this tendency. If they were, you’d think they would tip off their own offensive counterparts, and we’d see this effect disappear.

An Atheist’s Defense of Rituals: Ceremonies as Traffic Lights

The idea of a coming-of-age ceremony has always been a bit strange to me as an atheist. Sure, I attended more than my fair share of Bat and Bar Mitzvahs in middle school. But it always struck me as odd for us to pretend that someone “became an adult” on a particular day, rather than acknowledging it was a gradual process of maturation over time. Why can’t we just all treat people as their maturity level deserves?

The same goes with weddings – does a couple’s relationship really change in a significant way marked by a ceremony? Or do two people gradually fall in love and grow committed to each other over time? Moving in with each other marks a discrete change, but what does “married” change about the relationship?

But my thinking has been evolving since reading this fantastic post about rituals by Brett and Kate McKay at The Art of Manliness. Not only do the rituals acknowledge a change, they use psychological and social reinforcement to help the individuals make the transition more fully:

One of the primary functions of ritual is to redefine personal and social identity and move individuals from one status to another: boy to man, single to married, childless to parent, life to death, and so on.

Left to follow their natural course, transitions often become murky, awkward, and protracted. Many life transitions come with certain privileges and responsibilities, but without a ritual that clearly bestows a new status, you feel unsure of when to assume the new role. When you simply slide from one stage of your life into another, you can end up feeling between worlds – not quite one thing but not quite another. This fuzzy state creates a kind of limbo often marked by a lack of motivation and direction; since you don’t know where you are on the map, you don’t know which way to start heading.

Just thinking your way to a new status isn’t very effective: “Okay, now I’m a man.” The thought just pings around inside your head and feels inherently unreal. Rituals provide an outward manifestation of an inner change, and in so doing help make life’s transitions and transformations more tangible and psychologically resonant.

Brett and Kate McKay cover a range of aspects of rituals, but I was particularly struck by the game theory implications of these ceremonies. By coordinating society’s expectations in a very public manner, transition rituals act like traffic lights to make people feel comfortable and confident in their course of action.

The Value of Traffic Lights

Traffic lights are a common example in game theory. Imagine that you’re driving toward an unmarked intersection and see another car approaching from the right. You’re faced with a decision: do you keep going, or brake to a stop?

If you assume they’re going to keep driving, you want to stop and let them pass. If you’re wrong, you both lose time and there’s an awkward pause while you signal to each other to go.

If you assume they’re going to stop, you get to keep going and maintain your speed. Of course, if you’re wrong and they keep barreling forward, you risk a deadly accident.

Things go much more smoothly when there are clear street signs or, better yet, a traffic light coordinating everyone’s expectations.
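To make the intersection game concrete, here is a small sketch that searches a 2x2 payoff table for pure-strategy Nash equilibria. The payoff numbers are invented purely for illustration (a crash is very bad, waiting costs a little, a mutual stop costs a bit more); only their ordering matters.

```python
from itertools import product

ACTIONS = ("go", "stop")

# (my payoff, their payoff) for each pair of choices. Illustrative values only.
PAYOFFS = {
    ("go", "go"): (-100, -100),   # deadly accident
    ("go", "stop"): (0, -1),      # I keep my speed, they wait briefly
    ("stop", "go"): (-1, 0),      # they keep their speed, I wait briefly
    ("stop", "stop"): (-2, -2),   # awkward pause while we wave each other on
}


def pure_nash_equilibria(payoffs, actions):
    """Return the action pairs where neither driver gains by switching alone."""
    equilibria = []
    for mine, theirs in product(actions, repeat=2):
        my_payoff, their_payoff = payoffs[(mine, theirs)]
        i_cannot_improve = all(
            payoffs[(alt, theirs)][0] <= my_payoff for alt in actions
        )
        they_cannot_improve = all(
            payoffs[(mine, alt)][1] <= their_payoff for alt in actions
        )
        if i_cannot_improve and they_cannot_improve:
            equilibria.append((mine, theirs))
    return equilibria


if __name__ == "__main__":
    print(pure_nash_equilibria(PAYOFFS, ACTIONS))
```

With these payoffs there are two stable outcomes, one where I go and one where they go, and nothing inside the game says which to pick. A traffic light does exactly that job: it tells both drivers which of the two equilibria is in force right now.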

Ceremonies as Traffic Lights

Now, misjudging a teenager’s maturity is unlikely to result in a deadly accident. But, with reduced stakes, the model still applies.

As a teen gets older, members of society don’t always know how to treat him – as a kid or adult. Each type of misaligned expectations is a different failure mode: If you treat him as a kid when he expected to be treated as an adult, he might feel resentful of the “overbearing adult”. If you treat him as an adult when he was expecting to be treated as a kid, he might not take responsibility for himself.

A coming-of-age ritual acts like the traffic light to minimize those failure modes. At a Bar or Bat Mitzvah, members of society gather with the teenager and essentially publicly signal “Ok everyone, we’re switching our expectations… wait for it… Now!”

It’s important that the information is known by all to be known to all – what Steven Pinker calls common or mutual knowledge:

“In common knowledge, not only does A know x and B know x, but A knows that B knows x, and B knows that A knows x, and A knows that B knows that A knows x, ad infinitum.”

If you weren’t sure that the oncoming car could see their traffic light, it would be almost as bad as if there were no light at all. You couldn’t trust your green light because they might not stop. Not only do you need to know your role, but you need to know that everyone knows their role and trusts that you know yours… etc.

Public ceremonies gather everyone to one place, creating that common knowledge. The teenager knows that everyone expects him to act as an adult, society knows that he expects them to treat him as one, and everyone knows that those expectations are shared. Equipped with this knowledge, the teen can count on consistent social reinforcement to minimize awkwardness and help him adopt his new identity.

Obviously, these rituals are imperfect – along with the socially-defined parts of identity, there are internal factors that make someone more or less ready to be an adult. Quite frankly, setting 13 as the age of adulthood is probably too young.

But that just means we should tweak the rituals to better fit our modern world. After all, we have precise engineering to set traffic light schedules, and it still doesn’t seem perfect (this XKCD comes to mind).

That’s what makes society and civilization powerful. We’re social creatures, and feel better when we feel comfortable in our identity – either as a child or adult, as single or married, as grieving or ready to move on. Transition rituals serve an important and powerful role in coordinating those identities.

We shouldn’t necessarily respect them blindly, but I definitely respect society’s rituals more after thinking this through.

To take an excerpt from a poem by Bruce Hawkins:

Three in the morning, Dad, good citizen
stopped, waited, looked left, right.
He had been driving nine hundred miles,
had nearly a hundred more to go,
but if there was any impatience
it was only the steady growl of the engine
which could just as easily be called a purr.

I chided him for stopping;
he told me our civilization is founded
on people stopping for lights at three in the morning.

Why Blocking Roads Can Speed Up Traffic

It’s so counter-intuitive that it’s called Braess’ Paradox: How can closing a road actually make everyone’s commute shorter? You would think that blocking a route would be an inconvenience, but under some circumstances it’s actually for the best.

Doesn’t sound right, does it?  Here’s the situation: Assume drivers are rational and intelligent.  I know, that’s a stretch – I grew up around DC.  But bear with me.  If there are multiple paths that people can take, they should in theory find an equilibrium between them.  If one path has less traffic and takes less time, more people will switch to it until it loses its advantage.  If one path starts longer than the others, nobody will use it until the other paths get congested enough to make it worth it.

So how can an extra path actually make the average commute time longer?  Shouldn’t an extra path just give people more options to choose from, and ultimately find the best equilibrium?

The Situation:

It turns out that when some roads are more prone to traffic than others, it can create Braess’ Paradox.  Imagine that some roads aren’t as affected by traffic – I picture these as the local roads with traffic lights. They add a fixed amount of time to your commute, say 45 minutes. The other roads are heavily dependent on traffic – these highways can either be wonderfully fast or a mess of stop-and-go congestion, depending on how many other people are on them. The average time it takes to drive on one of them, in minutes, is the number of cars on it divided by 100.

(Image modified from Wikipedia)

Let’s say there are 4000 cars driving from the start to finish. Without the connector (dotted in the diagram), an equilibrium forms where half the drivers (2000 cars) take the top route through A, and half take the bottom route through B.  The highway takes 2000/100 = 20 minutes, and the local road takes 45 minutes. So half the population spends 45 minutes on a local street, followed by 20 minutes on a highway, and the other half of the drivers spend 20 minutes on a highway, followed by 45 minutes on a local street. Everyone gets to their destination in 65 minutes. Nobody has any incentive to switch.

But what if a new connector is opened between A and B, allowing people to go straight from one highway to the other? Now everyone thinks to themselves, “Hey, why spend 45 minutes on a local street when I could spend 20 minutes on the highway? I’m going to take the route Start –> A –> B –> Finish, and shave 25 minutes off of my commute time!”

Of course, if everyone thinks that way, there are now twice as many cars on each highway as there were before, and it’s half as fast: now each highway takes 40 minutes, not 20 minutes. That’s still 5 minutes less than the 45 minutes it takes to drive on the local street, though, so everyone still has an incentive to take the highway.

So in the end, how has the connector affected people’s commutes? Everyone’s commute used to be 65 minutes; now, everyone’s commute is 80 minutes. And to make it stranger, there’s no better path to take – anyone considering switching to their original route would be looking at an 85 minute drive.
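The arithmetic above is easy to check in a few lines. This is only a sketch of the example as described: the function names are mine, and it assumes the connector itself takes no time to drive.

```python
# Travel times in minutes, taken from the example above.
LOCAL_ROAD_MINUTES = 45   # fixed, regardless of traffic
CARS = 4000               # total drivers going from Start to Finish


def highway_minutes(cars_on_highway):
    """Highway time grows with traffic: number of cars divided by 100."""
    return cars_on_highway / 100


def commute_without_connector(cars=CARS):
    # Drivers split evenly, so each highway carries half of the cars.
    return highway_minutes(cars / 2) + LOCAL_ROAD_MINUTES


def commute_with_connector(cars=CARS):
    # Everyone drives Start -> A -> B -> Finish, so both highways carry all cars.
    # Assumption: the connector itself adds no travel time.
    return highway_minutes(cars) + highway_minutes(cars)


def commute_if_one_driver_switches_back(cars=CARS):
    # A lone driver returning to an original route still shares one
    # fully loaded highway, then spends 45 minutes on a local road.
    return highway_minutes(cars) + LOCAL_ROAD_MINUTES


print(commute_without_connector())            # 65.0
print(commute_with_connector())               # 80.0
print(commute_if_one_driver_switches_back())  # 85.0
```

The last number is what keeps everyone stuck: once the highways are clogged, going back to the old route is worse for any single driver, even though everyone going back together would be better.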

How does this happen?

How can opening a new, super-fast connector make commutes worse? It comes down to the price of anarchy and people’s selfish motivations.  With the connector open, each set of cars has the option to clog up the other half’s highways – saving themselves 5 minutes but adding 20 minutes to the other guys’ commute.

It’s like the prisoner’s dilemma: Each driver has the motivation to take the highways, even though it damages the overall system. Without the connector, nobody is allowed to “defect” for personal gain. In the traditional prisoner’s dilemma, it would be like a mafia boss keeping all his criminals anonymous. Without the option to rat each other out, criminals would avoid the selfish temptation and the entire system is better off.

Braess’ Paradox isn’t purely hypothetical – it has real-world implications in city planning. According to this New York Times article titled What if They Closed 42d Street and Nobody Noticed?, “When a network is not congested, adding a new street will indeed make things better. But in the case of congested networks, adding a new street probably makes things worse at least half the time, mathematicians say.”  That’s shocking. My intuitions about how traffic works were way off.

Lastly, via Presh Talwalkar’s fantastic game theory blog, Mind Your Decisions (which brought Braess’ paradox to my attention), there’s a great video of the paradox physically in action with springs. Check it out:

Coach Smith’s Gutsy Call

Coach Mike Smith was facing a tough decision. His Falcons were in overtime against the division-rival Saints. His team had been stopped on their own 29 yard-line and were facing fourth down and inches. Should he tell his players to punt, or go for it? A punt would be safe. Trying to get the first down would be the high-risk, high-reward play. Success would mean a good chance to win, failure would practically guarantee a loss. What play call would give his team the best chance to win?

He decided to be aggressive. He called for star running back Michael Turner to try pounding up the middle of the field.

It failed. The Saints were given the ball in easy range to score, and quickly did so. The media and fans criticized Smith for his stupid decision.

But is the criticism fair? If the play call had worked, I bet he would have been praised for his guts and brilliance. I think my favorite reaction came from ESPN writer Pat Yasinskas:

When Mike Smith first decided to go for it on fourth-and-inches in overtime, I liked the call. I thought it was gutsy and ambitious. After watching Michael Turner get stuffed, I changed my mind. Smith should have punted and taken his chances with his defense.

What a perfect, unabashed example of Outcome Bias! We have a tendency to judge a past decision solely based on the result, not on the quality of the choice given the information available at the time.

Did Coach Smith know that the play would fail? No, of course not. He took a risk, which could go well or poorly. The quality of his decision lies in the chances of success and the expected values for each call.

Fortunately, some other people at ESPN did the real analysis, using 10 years of historical data of teams’ chances to win based on factors like field position, score, time remaining, and so on:

Choice No. 1: Go for the first down

…Since 2001, the average conversion percentage for NFL teams that go for it on fourth-and-1 is 66 percent. Using this number, we can find the expected win probability for Atlanta if it chooses this option.

* Atlanta win probability if it converts (first-and-10 from own 30-yard line): 67.1 percent
* Atlanta win probability if it does not convert (Saints first-and-10 from Falcons’ 29-yard line): 18 percent.
* Expected win probability of going for the first down: 0.660*(.671) + (1-.660)*(.180) = 50.4%

Choice No. 2: Punt

* For this choice, we will assume the Falcons’ net punt average of 36 yards for this season. This means the expected field position of the Saints after the punt is their own 35-yard line. This situation (Saints with first-and-10 from their 35, in OT, etc.) would give the Falcons a win probability of 41.4%.

So by choosing to go for it on fourth down, the Falcons increased their win probability by 9 percentage points.

That’s a much better way to evaluate a coach’s decision! Based on a simple model and league averages (there are problems with both of those, but they’re better than simply trusting the outcome!), the punt was not the best option. Smith made the right decision.
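For anyone who wants to play with the numbers, here is a small sketch of the same expected-value calculation. The function names are my own, and the break-even calculation is my addition, not part of ESPN’s analysis; the inputs are the figures quoted above.

```python
def expected_win_probability(p_convert, win_if_convert, win_if_fail):
    """Weighted average of the two outcomes of going for it."""
    return p_convert * win_if_convert + (1 - p_convert) * win_if_fail


def break_even_conversion_rate(win_if_convert, win_if_fail, win_if_punt):
    """Conversion rate at which going for it is exactly as good as punting."""
    return (win_if_punt - win_if_fail) / (win_if_convert - win_if_fail)


# Figures quoted in the ESPN analysis above.
WIN_IF_CONVERT = 0.671
WIN_IF_FAIL = 0.18
WIN_IF_PUNT = 0.414
LEAGUE_CONVERSION_RATE = 0.66

go_for_it = expected_win_probability(LEAGUE_CONVERSION_RATE, WIN_IF_CONVERT, WIN_IF_FAIL)

print(round(go_for_it, 3))                # 0.504
print(round(go_for_it - WIN_IF_PUNT, 3))  # 0.09
print(round(break_even_conversion_rate(WIN_IF_CONVERT, WIN_IF_FAIL, WIN_IF_PUNT), 3))  # 0.477
```

Under this simple model, going for it beats punting as long as the team converts more than about 48% of the time, which is well below the 66% league average.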

Well, sort of. There are different ways to go for the fourth-down conversion, and according to Brian Burke at AdvancedNFLStats, Smith chose the wrong one:

Conversion success rates on 1-yd to go runs (%)

Position    3rd Down    4th Down
FB          77          70
QB          87          82
RB          68          66
Total       72          72

In these situations, quarterback sneaks have proven much more effective than having your running back take the ball. In a perfect game-theory world, defenses would realize their weakness and focus more effort on stopping it. But for now, it remains something more offenses can exploit. According to the numbers, the Falcons probably could have made a better decision.

And, of course, it was OBVIOUS to me at the time that they should have called a quarterback sneak. </hindsight bias>

Game Theory and Football: How Irrationality Affects Play Calling

Coaches and coordinators in professional football get paid a lot of money to call the right plays – not just the best plays for particular situations, but also unpredictable plays that will catch the other team off guard. It’s a perfect setup for game theory analysis!

As in other game theory situations, the best play depends in part on what your opponent does. Your running play is much more likely to succeed against a pass-prevent defense, but would be in trouble against a run-stuffing formation. If the defense can guess what you’re going to call, they can adjust accordingly and have an advantage. Even on 3rd down and long – a common passing situation – there’s value in calling some percentage of running plays, because the defense is less likely to be geared toward stopping that. But as you do it more, the chance of catching the defense off guard gets smaller. There’s some optimal balance where the expected success of a surprising run is equal to the expected success of a more sensible (but anticipated) pass.

The goal is to stay unpredictable and exploit patterns where your opponent is using a sub-optimal combination. If a team notices that passing plays are working better, they’ll be more likely to call them. As the defense notices, they’ll shift away from their run-defense and focus more on defending passes. In theory, the two teams reach an equilibrium.
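To make the idea of an equilibrium mix concrete, here is a toy sketch. The success probabilities below are made up purely for illustration (they are not real NFL numbers), and the function name is hypothetical. It treats play calling as a two-by-two zero-sum game and finds the mix at which neither side can gain by changing its call frequencies.

```python
def mixed_equilibrium_2x2(a, b, c, d):
    """Solve a 2x2 zero-sum game with payoffs to the offense:

                        run defense    pass defense
        offense runs         a              b
        offense passes       c              d

    Returns (P(offense runs), P(defense plays run defense), game value).
    Assumes there is no pure-strategy equilibrium, so the denominator
    is non-zero and both probabilities fall between 0 and 1.
    """
    denom = a - b - c + d
    # Offense mixes so the defense is indifferent between its two calls.
    run_prob = (d - c) / denom
    # Defense mixes so the offense is indifferent between run and pass.
    run_defense_prob = (d - b) / denom
    value = run_prob * a + (1 - run_prob) * c
    return run_prob, run_defense_prob, value


# Hypothetical chances that the play succeeds (illustration only).
run_prob, run_defense_prob, value = mixed_equilibrium_2x2(
    a=0.2,  # run into a run-stuffing defense
    b=0.6,  # run against a pass-prevent defense
    c=0.5,  # pass against a run-stuffing defense
    d=0.3,  # pass into a pass-prevent defense
)

print(round(run_prob, 3))          # 0.333
print(round(run_defense_prob, 3))  # 0.5
print(round(value, 3))             # 0.4
```

With these made-up numbers the offense should run a third of the time. If it runs more often than that, the defense can profit by leaning toward run defense, which is the kind of exploitable pattern described above.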

In practice, it doesn’t quite work that perfectly – human beings are making the decisions, and humans are both vulnerable to cognitive biases and notoriously bad at mimicking true unpredictability. Brian Burke, a fellow fan of combining sports with statistics, was poring over the play-calling data for second downs and noticed something odd:

There’s a strange spike in percent of running plays called at 2nd and 10! Tactically, 2nd and 10 isn’t all that different from 2nd and 9 or 11, so it’s strange to see such a difference. Why would they call so many more running plays in that particular situation?

The key is to realize that there are two ways a team tends to find itself facing a 2nd and 10 situation – runs that happen to go nowhere or any incomplete pass. Of those, incomplete passes are far more common. So in cases of 2nd and 10, it’s most often because the team just failed a passing play. That suggests two reasons coaches might be irrationally switching to running plays, even at the cost of sacrificing unpredictability:

(1) The hasty generalization bias (also called the small sample bias) and the recency effect are cognitive biases in which people overgeneralize from a small amount of data, especially recent data. Failed passes are very common (about 40% fail), so there’s no good reason for a coach to treat any single failed pass as evidence that they’d be better off switching to a running play. But the urge to overreact to the failed pass that just happened is strong, thanks to these two biases.

(2) People are terrible at generating unpredictability — when asked to make up a “seemingly-random” sequence of coin flips, we tend to use far more alternation between Heads and Tails than would actually occur in a real sequence of coin flips. So even if coaches weren’t overreacting to a failed pass, and they were simply trying to be unpredictable, they would still tend to switch to a running play after a passing play more often than random chance would dictate.
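The coin-flip claim is simple to test. The sketch below measures how often a sequence switches between Heads and Tails; the "human-like" sequence is one I typed by hand as an illustration, not data from any study.

```python
import random


def alternation_rate(flips):
    """Fraction of consecutive pairs in which the outcome switches."""
    switches = sum(1 for prev, cur in zip(flips, flips[1:]) if prev != cur)
    return switches / (len(flips) - 1)


# A long run of simulated fair flips; seeded so the result is repeatable.
rng = random.Random(0)
fair_flips = [rng.choice("HT") for _ in range(100_000)]

# A hand-typed sequence that "feels" random (hypothetical example).
human_like_flips = list("HTHTTHTHHTHTHTTHTHTH")

print(alternation_rate(fair_flips))        # close to 0.5
print(alternation_rate(human_like_flips))  # noticeably above 0.5
```

A fair coin switches only about half the time, so streaks are common. A play caller who alternates run and pass much more often than that is easier to predict, not harder.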

Indeed, when Brian separated the data by previous play, the alternation trend is clear — passes are more likely after runs, and runs are more likely after passes:

(My favorite team, the Baltimore Ravens, was pretty bad about this under the previous regime, Coach Billick)

Brian concludes:

Coaches and coordinators are apparently not immune to the small sample fallacy. In addition to the inability to simulate true randomness, I think this helps explain the tendency to alternate. I also think this why the tendency is so easy to spot on the 2nd and 10 situation. It’s the situation that nearly always follows a failure. The impulse to try the alternative, even knowing that a single recent bad outcome is not necessarily representative of overall performance, is very strong.

So recency bias may be playing a role. More recent outcomes loom disproportionately large in our minds than past outcomes. When coaches are weighing how successful various play types have been, they might be subconsciously over-weighting the most recent information—the last play. But regardless of the reasons, coaches are predictable, at least to some degree.

Coaches are letting irrational biases influence their play calling, pulling them away from the optimal mix. The result, according to Pro Football Reference stats, is less success on those plays. I wonder how well a computer could call plays using a Statistical Prediction Rule.

Game theory and basketball

Ben Morris is a friend-of-a-friend of mine who recently competed in a contest sponsored by ESPN called “Stat Geek Smackdown,” in which the goal was to correctly predict as many of the NBA playoff games as possible. For each correct guess, a contestant received 5 points.

Heading into the final game between Miami and Dallas, Ben was in second place, trailing just 4 points behind a veteran stat geek named Ilardi. By most estimates, Miami had about a 63% chance of beating Dallas. But Ben realized that if he and Ilardi both chose Miami, then even if Miami won the game, Ilardi would still win the competition, because he and Ben would each get 5 points and the gap between their scores would remain unchanged. In order for Ben to win the competition, he would have to pick the winning team and Ilardi would have to pick the losing team.

So that created an interesting game theory problem: If Ben predicted that Ilardi would pick Miami, since they were more likely to win, then Ben should pick Dallas. But if Ilardi predicted that Ben would be reasoning that way, then Ilardi might pick Dallas, knowing that all he needs to do to win the competition is to pick the same team as Ben. But of course if Ben predicts that Ilardi will be thinking that way, maybe Ben should pick Miami…

What would you do if you were Ben? You can read about Ben’s reasoning on his excellent blog, Skeptical Sports, but here’s my summary. Ben essentially had two options:

(1) His first option was to play his Nash equilibrium strategy, which is a concept you might recall if you ever took game theory (or if you saw the movie “A Beautiful Mind,” although the movie botched the explanation). That’s the set of strategies (Ben’s and Ilardi’s) which gives each of them no incentive to switch to a new strategy as long as the other guy doesn’t. The Nash equilibrium strategy is especially appealing if you’re risk averse because it’s “unexploitable,” meaning that it gives you predictable, fixed odds of winning the game, no matter what strategy your opponent uses.

In this case — and you can read Ben’s blog for the proof — the Nash equilibrium is for Ben to pick Miami with exactly the same probability as Miami has of losing (0.37) and for Ilardi to pick Miami with exactly the same probability as Miami has of winning (0.63). (You might wonder how you should pick a team “with X probability,” but it’s pretty easy: just roll a 100-sided die, and pick the team if the die comes up X or lower.)

If you do the calculation, you’ll find that playing this strategy — i.e., rolling a hundred-sided die and picking Miami only if the die came up 37 or lower — would give Ben a 23.3% chance of beating Ilardi, no matter how Ilardi decided to play. Not terrible odds, especially given that this approach doesn’t require Ben to make any predictions about Ilardi’s strategy. But perhaps Ben could do better if he were able to make a reasonable guess about what Ilardi would do.

(2) That leads us to option two: Ben could abandon his Nash equilibrium strategy, if he felt that he could predict Ilardi’s action with sufficient confidence. To be precise, if Ben thinks that Ilardi is more than 63% likely to pick Miami, then Ben should pick Dallas.

Here’s a rough proof. Call “p” the likelihood that Ilardi picks Miami, and “q” the likelihood that Ben picks Miami. Then we can assign probabilities to each of the outcomes in which Ben wins:

Since the two outcomes are mutually exclusive, we can add up their probabilities to get the total probability that Ben wins, as a function of p and q:

Probability Ben wins = .37p + .63q – pq
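For reference, the two winning outcomes can be written out explicitly. This is my reconstruction from the definitions of p and q above, assuming the two picks are made independently of each other and of the game’s result, with Miami at a 63% chance to win:

```latex
\begin{align*}
P(\text{Miami wins, Ben picks Miami, Ilardi picks Dallas}) &= 0.63 \cdot q \cdot (1 - p) \\
P(\text{Dallas wins, Ben picks Dallas, Ilardi picks Miami}) &= 0.37 \cdot (1 - q) \cdot p \\
P(\text{Ben wins}) &= 0.63\,q\,(1 - p) + 0.37\,(1 - q)\,p \\
                   &= 0.37\,p + 0.63\,q - p\,q
\end{align*}
```

The two cross terms, 0.63pq and 0.37pq, combine into the single pq term in the final line.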

Just to illustrate how Ben’s chance of winning changes depending on p, I plugged in three different values of p to create three different lines: For the black line, p=0.63. For the red line, p < 0.63 (to be precise, I plugged in p=0.62, but any value of p<0.63 will create an upward sloping line). For the blue line, p > 0.63 (to be precise, I plugged in p=0.64, but any value of p>0.63 will create a downward sloping line).

If p = .63, that renders Ben’s chance of winning constant (.233) for all values of q. In other words, if Ilardi seems to be about 63% likely to pick Miami, then it doesn’t matter how Ben picks, he’ll have the same chance of winning (23.3%) as he would if he played his Nash equilibrium strategy.

If p > .63, Ben’s chance of winning decreases as q (his probability of choosing Miami) increases. In other words, if Ben thinks there’s a greater than 63% chance that Ilardi will pick Miami, then Ben should pick Miami with as low a probability as possible (i.e., he should pick Dallas).

If p < .63, Ben’s chance of winning increases as q (his probability of choosing Miami) increases. In other words, if Ben thinks there’s a lower than 63% chance that Ilardi will pick Miami, then Ben should pick Miami with as high a probability as possible (i.e., he should pick Miami).
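The three cases can be checked numerically with a short sketch. The function name is mine, and it assumes, as above, that Miami wins with probability 0.63 and that the two picks are independent.

```python
P_MIAMI_WINS = 0.63


def ben_win_probability(p, q, miami=P_MIAMI_WINS):
    """Chance that Ben picks the winner while Ilardi picks the loser.

    p: probability that Ilardi picks Miami
    q: probability that Ben picks Miami
    """
    ben_right_on_miami = miami * q * (1 - p)
    ben_right_on_dallas = (1 - miami) * (1 - q) * p
    return ben_right_on_miami + ben_right_on_dallas


# If Ilardi picks Miami 63% of the time, Ben's choice of q makes no difference.
for q in (0.0, 0.37, 1.0):
    print(round(ben_win_probability(0.63, q), 4))  # 0.2331 each time

# If Ilardi leans further toward Miami, picking Dallas (q = 0) is better.
print(round(ben_win_probability(0.64, 0.0), 4))  # 0.2368
print(round(ben_win_probability(0.64, 1.0), 4))  # 0.2268

# If Ilardi is certain to pick Miami, picking Dallas wins 37% of the time.
print(round(ben_win_probability(1.0, 0.0), 4))   # 0.37
```

The same function also shows why the equilibrium strategy is unexploitable: with q = 0.37, the result is 0.2331 for every value of p.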

So what happened? Ben estimated that Ilardi would pick Miami with greater than 63% probability. That’s mainly because most people aren’t comfortable playing probabilistic strategies that require them to roll a die —  people will simply “round up” in their mind and pick the team that would give them a win more often than not. And Ben knew that if he was right about Ilardi picking Miami, then Ben would end up with a 37% chance of winning, rather than the 23.3% chance he would have had if he stuck to his equilibrium strategy.

So Ben picked Dallas. As he’d predicted, Ilardi picked Miami, and lucky for Ben, Dallas won. This one case study doesn’t prove that Ilardi reasoned as Ben expected, of course. Ben summed up the takeaway on his blog:

Of course, we shouldn’t read too much into this: it’s only a single result, and doesn’t prove that either one of us had an advantage.  On the other hand, I did make that pick in part because I felt that Ilardi was unlikely to “outlevel” me.  To be clear, this was not based on any specific assessment about Ilardi personally, but based my general beliefs about people’s tendencies in that kind of situation.

Was I right? The outcome and reasoning given in the final “picking game” has given me no reason to believe otherwise, though I think that the reciprocal lack of information this time around was a major part of that advantage.  If Ilardi and I find ourselves in a similar spot in the future (perhaps in next year’s Smackdown), I’d guess the considerations on both sides would be quite different.

The Game Theory of Story Endings

Do happy endings really make you as happy if you see them coming a mile away? When we watch a trashy action flick or a fluffy romantic comedy, aren’t the conflicts less interesting because we know it’ll all end happily ever after? Someone has to bite the bullet and write a sad ending to give plausibility to the threat of unhappiness. It’s disincentivized because sad endings are more challenging and risk upsetting the audience, but someone has to do it.

Steven E. Landsburg muses about this in The Armchair Economist:

I am intrigued by the market for movie endings. Movie-goers want two things in an ending: They want it to be happy and they want it to be unpredictable. There is some optimal frequency of sad endings that maintains the right level of suspense. Yet the market might fail to provide enough sad endings.

An individual director who films a sad ending risks short-term losses, as word gets around that the movie is “unsatisfying.” It is true that there are long-term gains, as viewers are kept off their guard for future movies. Unfortunately, most of those gains may be captured by other directors, because movie-goers remember only that the murderer does sometimes catch up with the heroine in the basement, and do not remember that it happens only in movies with particular directors. Under these circumstances, no individual director may be willing to incur costs for his rivals’ benefit.

A solution is for directors to display their names prominently, so that viewers know when a movie was made by someone unpredictable. Viewers, however, may find it in their interests to retaliate by covering their eyes when the director’s name is shown.

If you can be associated more strongly with unpredictability, you reap more benefits. You’re also more strongly associated with the unhappy ending, which might turn audiences away.

One way to ease the blow of an unexpected sad ending is to make deaths triumphant, defiant, or heroic. Think of how Spock died in The Wrath of Khan (No, I’m not going to give a spoiler alert for a 30-year-old movie). Sure, people die in Star Trek all the time – when Kirk, Spock, and fresh-faced, red-shirted Ensign Jimmy beam down to explore a planet for life, we all know one of them isn’t going to make it back. But to kill a main character is more significant. And it was done in a touching way. They got the unpredictability without upsetting their audience.

I genuinely respect Joss Whedon for his willingness to throw curve balls like this in his story lines. He’s developed a reputation for having sympathetic characters die, leave, or change sides – often without warning. Rather than watching Buffy, Firefly and Serenity thinking “So, how is it all going to work out this time?” we’re forced to think “Is it going to work out this time?”

TV Tropes has a name for all this – Anyone Can Die:

This is where no one is exempt from being killed, including the main characters (maybe even the hero). The Sacrificial Lamb is often used to establish the writer’s Anyone Can Die cred early on. However, if the Lamb’s death is a one-off with no follow-up, it’s just Killed Off for Real. To really be Anyone Can Die, the work must include multiple deaths, happening at different points in the story. Bonus points if the death is unnecessary and devoid of Heroic Sacrifice.

In game theory situations, reputation plays a large role. TV Tropes mentions building ‘Anyone Can Die’ cred, which can be achieved through repeated interactions. In a TV series or multiple films by the same director, you get a feel for whether the good guys always prevail. But even within a single story, early and repeated signaling can make the remainder of the plot more intense. When a major character is killed off without it being a Heroic Sacrifice, that’s a powerful signal that anything can happen. The musical Into the Woods will always have a special place in my heart for mastering this dynamic.

But there’s another route. Historical dramas can increase society’s perception of “sadness plausibility” without anyone taking a hit for being a downer. Nobody’s going to feel unsatisfied that Titanic, The Great Escape, or Butch Cassidy and the Sundance Kid have sad endings. (Or if they do, they can take it up with reality for writing a depressing script.) It’s not easy to keep those separate in our brains; we just get the overall sense that sometimes stories have sad endings. And that perception helps us enjoy all the other movies we watch.

How Smiles Might Beat Poker Faces

Is putting on a poker face the most effective strategy in cards? People talk about spotting players’ ‘tells’, involuntary behavior that gives away their confidence level. To prevent people from picking up on your tells, you’re supposed to work on a poker face – a blank look that gives nothing away.

But instead of shutting down and sending no signals, can we send misleading signals?

Well, via NCBI ROFL, a recent study suggests that a blank face isn’t the best – a trustworthy face is:

Participants made risky choices in a simplified poker task while being presented opponents whose faces differentially correlated with subjective impressions of trust… [P]eople took significantly longer and made more mistakes against emotionally positive opponents… According to these results, the best “poker face” for bluffing may not be a neutral face, but rather a face that contains emotional correlates of trustworthiness. Moreover, it suggests that rapid impressions of an opponent play an important role in competitive games, especially when people have little or no experience with an opponent.

[Quick note: As I read through the procedure, I wasn’t sure that I agreed with their assessment of ‘mistake’ in this simplified version of poker. I’ll look into it, but it’s still interesting to note that people folded more against trustworthy-looking faces]

It’s a natural habit to develop – if a person looks confident and trustworthy, we’re more likely to believe them if they say they have an advantage (by, say, betting). They provide an image which I think lays it out well:

We have access to the information in grey – our own cards, our opponent’s bet, and our opponent’s facial expression. We’re trying to make a decision based on how our cards compare to theirs – which is unknown. Their style is also unknown, which is a problem because it’s a hidden factor influencing their bet.

Since we can only work with the information we have, we go backward: given the face we see, what is their style/attitude? Given their style and bet, what cards do they have? Of course, the random static computerized faces weren’t ‘inadvertently’ giving cues, but we’ve unconsciously learned to treat facial cues as a vital part of figuring out how the bet correlates to cards.

So is a trustworthy smile better than a blank poker face? As the authors point out, the study holds for first impressions, “when people have little or no experience with an opponent.” I would love to see a study of how long the effect holds up. After repeated interaction with the trustworthy-looking computer opponent, do people adapt and re-calibrate the assumed face-to-attitude correlation? I expect there to be some difference even over time – the unconscious connection is tough to shake. But its impact would diminish.

Perhaps we can smile and look trustworthy whenever we want opponents to fold? Sending misleading information is a powerful tool in game theory. But I suspect decent card players would adapt quickly. Unless you’re damn confident that you know how to exploit their assumptions, you’ll be giving them an edge. A poker face gives up the chance to send intentional false information, but it also cuts out the unintentional cues we’re not even aware of.

I can’t pass up this perfect opportunity to quote Harry Potter and the Methods of Rationality after Harry’s attempt to bluff Snape:

Professor Quirrell had remarked over their lunch that Harry really needed to conceal his state of mind better than putting on a blank face when someone discussed a dangerous topic, and had explained about one-level deceptions, two-level deceptions, and so on. So either Severus was in fact modeling Harry as a one-level player, which made Severus himself two-level, and Harry’s three-level move had been successful; or Severus was a four-level player and wanted Harry to think the deception had been successful. Harry, smiling, had asked Professor Quirrell what level he played at, and Professor Quirrell, also smiling, had responded, One level higher than you.

Against true amateurs in a little house game, I’ll try some one-level deceptions. But when I’m in a new place with good players, I tend to play it safe and focus my energy on staying blank rather than on sending false signals.