Nine Things Nerds Can Appreciate about Football’s Game Design

The Super Bowl is this weekend, and you know what that means: Superb Owl! and Sportsball! memes galore. But I noticed that many of my friends who mock the sport also love strategy games like Magic the Gathering, Starcraft, or 7 Wonders — and thus I know they can appreciate a well-designed game.

There are plenty of reasons to criticize the NFL: a hypocritical position on brutal hits, stinginess in paying the refs, the exploitation of unpaid college athletes, and of course the (somewhat improving) issue of concussions and player safety.

But for my fellow nerds, here are some elements of the National Football League that you can look at on a strategic game-design level and say “Ok, that’s actually really cool.”

Nine? Elements of Football that Reflect Good Strategic Game Design

Resource Management:

Like most good games, a key component is making smart decisions about where to invest your resources.


In the NFL, the collective bargaining agreement between owners and the players’ union sets a salary cap to set a maximum (and minimum) that teams can spend each year.

Would you rather pay a premium for a star, or spread that salary around and upgrade more positions? It might depend on synergies with the rest of your roster. (Or you can get lucky with a player like Tom Brady who takes less money than his market value so his team can surround him with more talent.)

Rookies, as unproven quantities, are typically signed on a very affordable contract for their first few years, whereas established veterans can be much pricier. Teams are faced with the question of where they can afford to have uncertainty on their roster — contenders rarely want to risk an unproven player at quarterback, while teams that are a bit further away can take that gamble.

Time / (Plus Good Catch-up Mechanics):

In any game, there’s incentive for the person losing to resort to riskier, higher-variance strategies. In football, that’s passing plays, which have a chance of gaining a lot of yards or not being caught at all. A neat aspect of football is that incomplete passes stop the clock.

When a team is behind, time is one of the most precious resources it has, so this means the game is structured to give them every opportunity to claw their way back. Deciding when to burn time (and time-outs) is crucial, and something even the best coaches struggle to do well.

Energy Level in Game:

This is a minor point, but one I found really neat: Playing defense tends to be more exhausting than playing offense. If an offense can run many plays in rapid succession without huddling up or substituting — and thus prevent the defense from substituting in fresh players as well — they can gain an edge over an increasingly-tired defense.

Synergies Between Players and Schemes

Building a roster

Of course, the downside of this kind of “Hurry up offense” is that the offense is also stuck with the same players on the field and doesn’t have as much time for coaches to call the next play. Teams that want to use more hurry-up offense may need to hire versatile players who can be counted on to run many types of play, and an experienced quarterback who can make smart changes on the fly.

Since players have unique skill sets, teams need to build around who’s coming out of college and who they can afford to sign. Just like putting together a Magic the Gathering or Netrunner deck to find and build on synergies, Football has fantastic interplay between skills, style, and strategy.

Finding a generational talent like Detroit’s ultra-elusive running back Barry Sanders opens up a host of new options for a team. The Detroit Lions never focused on their offensive linemen — who would try to clear the way for him — because, frankly, Barry could succeed without them. (Until he got fed up and retired early, which is a story of its own.)

If a team has a quarterback with pinpoint accuracy but a weak arm, paying for blazing fast wide receivers to race down the field and catch deep passes doesn’t make as much sense as signing shifty and precise route-runners. However, to make good use of a quarterback with a cannon-arm, fast receivers aren’t enough — he needs strong offensive linemen to protect him and give him time to throw.

There are countless interactions between strategic decisions and I love it.

Strategic Structural Asymmetries

Stadiums and Cities Matter

Even the stadium and climate in a city factor into a football team’s strategy.

Because it’s considered more difficult to pass the ball in cold and windy weather, teams in the frigid Northern divisions are more likely to focus on building a running attack and a defense that can stop the other team from running successfully. As winter approaches, they can count on the environment to limit the passing game.

However, roughly a third of teams play in indoor stadiums, which eliminate the weather and the wind and make passing strategies more effective. Since half a team’s games are played at home, the home stadium has a disproportionate impact on how the team wants to build its roster. A few stadiums even have retractable roofs, which let teams decide whether to allow the elements to impact the game!

Leveraged Division Structure

An NFL division consists of four teams that play each other twice every season. The team in each division with the best record is guaranteed a spot in the playoffs. (Even if NONE of the teams were very good – the Washington Football Team made the playoffs this year despite losing more than half their games!)

This makes it even more crucial to plan how to counter division opponents’ schemes, and those rivalry games are usually close and exciting.

If an opposing team in your division has exceptionally tall, strong receivers you can’t afford to be caught assigning small cornerbacks to guard them. You’d need to keep that in mind when building a roster.

Well Designed Scoring System

Intermittent Scoring to Build Excitement Throughout

In the NBA, a jaw-dropping feat of athleticism… gets 2 or 3 points. Since teams average over 100 points a game, there’s a limit to how impactful any one play can be until the last minutes of a game.

On the other end of the spectrum, each score in soccer is rare and thus hugely important. Unfortunately, goals are so infrequent that historically, over 30% of English league games end with neither team scoring more than once. Those rare goals are dramatic, but personally I find it difficult to get excited about the intervening dribbling and passing when I know it’s very unlikely to shape the final result.

It’s a tradeoff, and football’s rules situate it nicely between these extremes. There are typically around 8 scoring plays each NFL game, and even the non-scoring plays are impactful (see below).

Building Progress / Tension

Getting a player on base in baseball or softball, winning a game or a set in tennis’ Game/Set/Match structure, or defeating a video game monster and getting to heal at a save point — these smaller discrete goals build toward the larger one.

Football is designed to have two key ways to make non-scoring plays significant. First, teams are only given four chances (downs) to score before they turn the ball over to the other team — but the count is reset every time they gain another 10 yards. Every third-down and fourth-down opportunity has more importance because it’s approaching that impactful mini-goal, breathing more life into a drive.

But even without this ratcheting structure, non-scoring progress matters because field position is a ‘stateful’ element which persists from one play to the next. Where one play ends, the next one begins. Every yard one team moves forward is an extra yard the opponents will need to win back. It’s a tug-of-war with scoring opportunities on the line at any time.
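For the programmers in the audience, here is a rough sketch of that down-and-distance state as a tiny Python state machine. It is a deliberately simplified model, not the real rulebook: the names `DriveState` and `run_play` are invented for illustration, and it ignores penalties, turnovers, punts, field goals, and safeties.

```python
from dataclasses import dataclass


@dataclass
class DriveState:
    """State that persists from one play to the next (simplified)."""
    yard_line: int    # yards from the offense's own goal line, 0 to 100
    down: int = 1     # 1 through 4
    to_go: int = 10   # yards still needed to earn a fresh set of downs


def run_play(state, yards_gained):
    """Return (next_state, outcome) after a single play.

    next_state is None when the drive ends, either with a touchdown
    or with the ball going over to the other team on downs.
    """
    spot = state.yard_line + yards_gained
    if spot >= 100:
        return None, "touchdown"
    if yards_gained >= state.to_go:
        # The ratchet: the count of downs resets. Close to the goal
        # line there may be fewer than 10 yards left to gain.
        return DriveState(spot, 1, min(10, 100 - spot)), "first down"
    if state.down == 4:
        return None, "turnover on downs"
    # Field position carries over, and so does the remaining distance.
    return DriveState(spot, state.down + 1, state.to_go - yards_gained), "next down"


if __name__ == "__main__":
    state = DriveState(yard_line=25)
    for gain in (3, 0, 8):
        state, outcome = run_play(state, gain)
        print(outcome, state)
```

Every call returns a new state built from the old one, which is the "stateful" part: a three-yard gain on 1st down changes what 2nd and 3rd down look like, even though nobody scored.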

“Legacy” Narratives

Legacy games are all the rage, and for good reason. (Our country even decided to LARP a game of Pandemic Legacy!)

The fact that players, teams, and coaches have storylines, rivalries, and arcs is a big part of the human element to football. We’re watching some of the legends of their craft face off each week, with history between them and human motivations.

On Sunday, we get to watch completely different quarterbacks compete: young phenom Patrick Mahomes against the unaging Tom Brady. Mahomes, who won last year, is widely considered one of the best in the league despite being only 25. In contrast, Brady is 43 and amazingly this is his tenth Super Bowl appearance. He’s been playing at an elite level for decades, and he won his first Super Bowl in 2002 — when Mahomes was 6.

Most of Brady’s 20 years were spent under coach Bill Belichick, known for being brilliant and famously taciturn. (Unless, of course, you ask him about the history and minutiae of football kicking rules, which makes him light up and talk for ages.) However, last year Brady decided to leave Belichick and the New England Patriots, so people were wondering whether he could succeed with a different coach. He’s answered those questions in dramatic fashion.

When storylines carry over from campaign to campaign or season to season, it’s a great way to build long narratives of meaning and importance.

Whether or not you enjoy watching the sport, there’s a lot it does well from a design perspective and I recommend anyone who enjoys strategy to try playing some of the Madden video games — how I initially got excited about the game.

Which Cognitive Bias is Making NFL Coaches Predictable?

In football, it pays to be unpredictable (although the “wrong way touchdown” might be taking it a bit far.) If the other team picks up on an unintended pattern in your play calling, they can take advantage of it and adjust their strategy to counter yours. Coaches and their staff of coordinators are paid millions of dollars to call plays that maximize their team’s talent and exploit their opponent’s weaknesses.

That’s why it surprised Brian Burke, formerly of (and now hired by ESPN), to see a peculiar trend: football teams seem to rush a remarkably high percentage of the time on 2nd and 10 compared to 2nd and 9 or 11.

What’s causing that?

His insight was that 2nd and 10 disproportionately followed an incomplete pass. This generated two hypotheses:

  1. Coaches (like all humans) are bad at generating random sequences, and have a tendency to alternate too much when they’re trying to be genuinely random. Since 2nd and 10 is most likely the result of a 1st down pass, alternating would produce a high percent of 2nd down rushes.
  2. Coaches are suffering from the ‘small sample fallacy’ and ‘recency bias’, overreacting to the result of the previous play. Since 2nd and 10 not only likely follows a pass, but a failed pass, coaches have an impulse to try the alternative without realizing they’re being predictable.

These explanations made sense to me, and I wrote about the phenomenon a few years ago. But now that I’ve been learning data science, I can dive deeper into the analysis and add a hypothesis of my own.

The following work is based on the play-by-play data for every NFL game from 2002 through 2012, which Brian kindly posted. I spent some time processing it to create variables like Previous Season Rushing %, Yards per Pass, Yards Allowed per Pass by Defense, and QB Completion percent. The Python notebooks are available on my GitHub, although the data files were too large to host easily.

Irrationality? Or Confounding Variables?

Since this is an observational study rather than a randomized controlled trial, there are bound to be confounding variables. In our case, we’re comparing coaches’ play calling on 2nd down after getting no yards on their team’s 1st down rush or pass. But those scenarios don’t come from the same distribution of game situations.

A number of variables could be in play, some exaggerating the trend and others minimizing it. For example, teams that passed for no gain on 1st down (resulting in 2nd and 10) have a disproportionate number of inaccurate quarterbacks (the left graph). These teams with inaccurate quarterbacks are more likely to call rushing plays on 2nd down (the right graph). Combine those factors, and we don’t know whether any difference in play calling is caused by the 1st down play type or the quality of quarterback.


The classic technique is to train a regression model to predict the next play call, and judge a variable’s impact by the coefficient the model gives that variable. Unfortunately, models that give interpretable coefficients tend to treat each variable as either positively or negatively correlated with the target – so time remaining can’t be positively correlated with a coach calling running plays when the team is losing and negatively correlated when the team is winning. Since the relationships in the data are more complicated, we needed a model that can handle them.

I saw my chance to try a technique I learned at the Boston Data Festival last year: Inverse Probability of Treatment Weighting.

In essence, the goal is to create artificial balance between your ‘treatment’ and ‘control’ groups — in our case, 2nd and 10 situations following 1st down passes vs. following 1st down rushes. We want to take plays with under-represented characteristics and ‘inflate’ them by pretending they happened more often, and – ahem – ‘deflate’ the plays with over-represented features.

To get a single metric of how over- or under-represented a play is, we train a model (one that can handle non-linear relationships better) that takes each 2nd down play’s confounding variables as input – score, field position, QB quality, etc. – and tries to predict whether the 1st down play was a rush or pass. If, based on the confounding variables, the model predicts the play was 90% likely to be after a 1st down pass – and it was – we decide the play probably has over-represented features and we give it less weight in our analysis. However, if the play actually followed a 1st down rush, it must have under-represented features for the model to get it so wrong. Accordingly, we decide to give it more weight.

After assigning each play a new weight to compensate for its confounding features (using K-fold cross-validation to avoid training the model on the very plays it’s trying to score), the two groups *should* be balanced. It’s as though we were running a scientific study, noticed that our control group had half as many men as the treatment group, and went out to recruit more men. However, since that isn’t an option, we just decided to count the men twice.
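To make the weighting concrete, here is a minimal sketch of the textbook inverse-probability weight. This is an assumption on my part about the exact formula (the plain, unstabilized version); the function names `iptw_weight` and `weighted_rush_rate` are made up for illustration, and the propensity score `p_pass` is assumed to come from whatever out-of-fold model you trained.

```python
def iptw_weight(followed_pass, p_pass):
    """Inverse-probability-of-treatment weight for one 2nd down play.

    followed_pass: True if the 1st down play really was a pass.
    p_pass: the model's predicted probability that it was a pass,
            given only the confounding variables.
    """
    if not 0.0 < p_pass < 1.0:
        raise ValueError("propensity score must be strictly between 0 and 1")
    # Weight by 1 / P(the group the play actually landed in).
    return 1.0 / p_pass if followed_pass else 1.0 / (1.0 - p_pass)


def weighted_rush_rate(was_rush, weights):
    """Share of 2nd down rushes after re-weighting each play."""
    total = sum(weights)
    return sum(w for rushed, w in zip(was_rush, weights) if rushed) / total


if __name__ == "__main__":
    # The 90% example from the text: an unsurprising play is shrunk,
    # a surprising one is inflated.
    print(iptw_weight(True, 0.9))   # followed a pass, as predicted
    print(iptw_weight(False, 0.9))  # followed a rush, against the odds
```

With a 90% prediction, the play that really did follow a pass gets a weight of about 1.1, while the play that followed a rush gets a weight of about 10. That is the "count the men twice" idea, just with a dial instead of a doubling.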

Testing our Balance

Before processing, teams that rushed on 1st down for no gain were disproportionately likely to be teams with the lead. After the re-weighting process, the distributions are far more similar:


Much better! They’re not all this dramatic, but lead was the strongest confounding factor and the model paid extra attention to adjust for it.

It’s great that the distributions look more similar, but that’s qualitative. To do a quantitative diagnostic, we can take the standardized difference in means, recommended as a best practice in a 2015 paper by Peter C. Austin and Elizabeth A. Stuart titled “Moving towards best practice when using inverse probability of treatment weighting (IPTW) using the propensity score to estimate causal treatment effects in observational studies”.

For each potential confounding variable, we take the difference in means between plays following 1st down passes and 1st down rushes and adjust for their combined variance. A high standardized difference in means indicates that our two groups are dissimilar, and in need of balancing. The standardized differences had a max of around 47% and median of 7.5% before applying IPT-weighting, which reduced the differences to 9% and 3.1%, respectively.
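As a sketch of how that diagnostic can be computed for one continuous variable, here is a small stdlib-only version. The weighted variance uses the form I recall from the Austin and Stuart paper; treat that, and the helper names, as my assumptions rather than a copy of the original notebook. Passing weights of 1.0 everywhere gives the ordinary, pre-weighting number.

```python
import math


def weighted_mean(values, weights):
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


def weighted_variance(values, weights):
    """Weighted sample variance; with unit weights it reduces to the
    usual variance with an (n - 1) denominator."""
    total = sum(weights)
    mean = weighted_mean(values, weights)
    scale = total / (total ** 2 - sum(w * w for w in weights))
    return scale * sum(w * (x - mean) ** 2 for x, w in zip(values, weights))


def standardized_difference(x_pass, w_pass, x_rush, w_rush):
    """Difference in group means over the pooled standard deviation,
    expressed as a percentage."""
    diff = weighted_mean(x_pass, w_pass) - weighted_mean(x_rush, w_rush)
    pooled_sd = math.sqrt(
        (weighted_variance(x_pass, w_pass) + weighted_variance(x_rush, w_rush)) / 2
    )
    return 100.0 * diff / pooled_sd


if __name__ == "__main__":
    # Toy numbers only, standing in for something like "current lead".
    after_pass = [1.0, 2.0, 3.0]
    after_rush = [2.0, 3.0, 4.0]
    ones = [1.0, 1.0, 1.0]
    print(standardized_difference(after_pass, ones, after_rush, ones))
```

Running the same function once with unit weights and once with the IPT weights, for every confounder, gives a before-and-after pair like the ones quoted above.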


Actually Answering Our Question

So, now that we’ve done what we can to balance the groups, do coaches still call rushing plays on 2nd and 10 more often after 1st down passes than after rushes? In a word, yes.


In fact, the pattern is even stronger after controlling for game situation. It turns out that the biggest factor was the score (especially when time was running out.) A losing team needs to be passing the ball more often to try to come back, so their 2nd and 10 situations are more likely to follow passes on 1st down. If those teams are *still* calling rushing plays often, it’s even more evidence that something strange is going on.

Ok, so controlling for game situation doesn’t explain away the spike in rushing percent at 2nd and 10. Is it due to coaches’ impulse to alternate their play calling?

Maybe, but that can’t be the whole story. If it were, I would expect to see the trend consistent across different 2nd down scenarios. But when we look at all 2nd-down distances, not just 2nd and 10, we see something else:


If their teams don’t get very far on 1st down, coaches are inclined to change their play call on 2nd down. But as a team gains more yards on 1st down, coaches are less and less inclined to switch. If the team got six yards, coaches rush about 57% of the time on 2nd down regardless of whether they ran or passed last play. And it actually reverses if you go beyond that – if the team gained more than six yards on 1st down, coaches have a tendency to repeat whatever just succeeded.

It sure looks like coaches are reacting to the previous play in a predictable Win-Stay Lose-Shift pattern.
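If you want to look for the same pattern in your own play-by-play table, the tally behind that comparison is simple. This is an unweighted sketch with a hypothetical record layout (1st down call, yards gained, 2nd down call); the function name and the sample rows are made up purely to exercise the code and are not real NFL data.

```python
from collections import defaultdict


def rush_rate_by_gain(plays):
    """plays: iterable of (first_down_call, yards_gained, second_down_call),
    where each call is the string "rush" or "pass".

    Returns {(first_down_call, yards_gained): share of 2nd down rushes}.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [rushes, total plays]
    for first_call, yards, second_call in plays:
        tally = counts[(first_call, yards)]
        tally[0] += second_call == "rush"
        tally[1] += 1
    return {key: rushes / total for key, (rushes, total) in counts.items()}


if __name__ == "__main__":
    # Invented rows: three 1st down passes and three 1st down rushes,
    # all for no gain.
    sample = [
        ("pass", 0, "rush"), ("pass", 0, "rush"), ("pass", 0, "pass"),
        ("rush", 0, "pass"), ("rush", 0, "rush"), ("rush", 0, "pass"),
    ]
    for key, rate in sorted(rush_rate_by_gain(sample).items()):
        print(key, round(rate, 2))
```

A Win-Stay Lose-Shift coach shows up as a gap between the "pass" and "rush" rows at low gains that closes and then flips sign as the gain grows.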

Following a hunch, I did one more comparison: passes completed for no gain vs. incomplete passes. If incomplete passes feel more like a failure, the recency bias would influence coaches to call more rushing plays after an incompletion than after a pass that was caught for no gain.

Before the re-weighting process, there’s almost no difference in play calling between the two groups – 43.3% vs. 43.6% (p=.88). However, after adjusting for the game situation – especially quarterback accuracy – the trend reemerges: in similar game scenarios, teams rush 44.4% of the time after an incompletion and only 41.5% after passes completed for no gain. It might sound small, but with 20,000 data points it’s a pretty big difference (p < 0.00005).
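To see why a three-point gap can be such strong evidence at this sample size, here is a back-of-the-envelope two-proportion z-test. Two loud assumptions: I am splitting the 20,000 plays evenly between the groups, and I am treating the re-weighted percentages as if they were plain counts. Neither is stated above, and the real weighted analysis would not be this simple, so read it as an order-of-magnitude check only.

```python
import math


def two_proportion_z_test(p1, n1, p2, n2):
    """Two-sided z-test for a difference between two proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / std_err
    # Two-sided tail area of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # 44.4% vs. 41.5% rushing, with an ASSUMED 10,000 plays per group.
    z, p_value = two_proportion_z_test(0.444, 10_000, 0.415, 10_000)
    print(z, p_value)
```

Under those assumptions the z-score comes out a little above 4, which puts the p-value below the 0.00005 threshold quoted above. The same gap with a few hundred plays per group would be nowhere near significant.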

All signs point to the recency bias being the primary culprit.

Reasons to Doubt:

1) There are a lot of variables I didn’t control for, including fatigue, player substitutions, temperature, and whether the game clock was stopped in between plays. Any or all of these could impact the play calling.

2) Brian Burke’s (and my) initial premise was that if teams are irrationally rushing more often after incomplete passes, defenses should be able to prepare for this and exploit the pattern. Conversely, going against the trend should be more likely to catch the defense off-guard.

I really expected to find plays gaining more yards if they bucked the trends, but it’s not as clear as I would like.  I got excited when I discovered that rushing plays on 2nd and 10 did worse if the previous play was a pass – when defenses should expect it more. However, when I looked at other distances, there just wasn’t a strong connection between predictability and yards gained.

One possibility is that I needed to control for more variables. But another possibility is that while defenses *should* be able to exploit a coach’s predictability, they can’t or don’t. To give Brian the last words:

“But regardless of the reasons, coaches are predictable, at least to some degree. Fortunately for offensive coordinators, it seems that most defensive coordinators are not aware of this tendency. If they were, you’d think they would tip off their own offensive counterparts, and we’d see this effect disappear.”

Basking in Reflected Glory: Football, Self Esteem, and Pronoun Choice

Sports fan? This might describe you. Not a sports fan? This will help you make fun of the sports fans! Everyone else who just doesn’t care either way, here’s a neat psychology study for you.

You’ve probably noticed that when a team wins, their fans are more likely to wear their jerseys around. Since the New York Giants beat the New England Patriots 21-17 in the Super Bowl last night, I’ve seen a bunch of proud Giants fans gloating on Facebook. But it’s not just the bragging, it’s the way they brag.

It turns out that sports fans will actually change the words they use based on whether their favorite team won or lost. Once again, I turn to the impeccable Mitchell and Webb to illustrate the tendency:

I love that retort: “Remember when we were chasing the Nazis in Raiders of the Lost Ark?” Movies don’t inspire the same tribal attitudes that sports do, but Mitchell’s rant does highlight the absurdity of using the word “we” in this context.

It’s not just anecdotal. Mitchell and Webb are describing an actual social phenomenon: Even if they have nothing to do with the results, fans are more likely to use “we” pronouns when their favorite team is doing well.

Robert Cialdini called it Basking in Reflected Glory. In an attempt to gain social standing, we try to associate ourselves with success.

Basking in Reflected Glory

In a creative study, Cialdini and his researchers called college students and asked them how their school’s team had done in a particular game. When describing victories, 32% of the students referred to the team as “we” – “We won,” “We beat them,” etc. In contrast, only 18% used the word “we” when talking about their school’s team losing.

Makes sense, right? People wanted to be seen as part of a winning group. But it gets better.

Cialdini added a twist to his study: before asking about the football game, he asked the students six quick, factual questions. Regardless of their answers, they were either told that they’d done well (gotten five correct) or poorly (gotten only one out of six correct). He hypothesized that the students who were told they’d failed would be more likely to grasp at straws to regain social status.

When the numbers were separated out, the tendency was clear: Almost all the increase in “we” pronouns was from the students who lost prestige by being told they’d failed.

Likelihood of using “we” pronoun (%)

                   “Succeeded” on Test   “Failed” on Test   Mean
Describing Win     24% (11/45)           40% (16/40)        32% (27/85)
Describing Loss    22% (9/41)            14% (6/42)         18% (15/83)

Students who were given a dose of self-esteem didn’t change their language based on whether their team won or lost.

But students who felt embarrassed? They were much more likely to latch onto a winning team and distance themselves from a losing team.
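If you like to check the arithmetic yourself, the win-versus-loss gap for each group falls straight out of the counts in the table. The dictionary layout and function names below are just my own scaffolding for the calculation.

```python
# (students who said "we", students asked), copied from the table.
WE_COUNTS = {
    ("succeeded", "win"): (11, 45),
    ("succeeded", "loss"): (9, 41),
    ("failed", "win"): (16, 40),
    ("failed", "loss"): (6, 42),
}


def we_rate(feedback, outcome):
    """Share of students in one cell who used "we"."""
    used, asked = WE_COUNTS[(feedback, outcome)]
    return used / asked


def win_loss_gap(feedback):
    """How much more often "we" shows up for wins than for losses."""
    return we_rate(feedback, "win") - we_rate(feedback, "loss")


if __name__ == "__main__":
    for feedback in ("succeeded", "failed"):
        print(feedback, round(100 * win_loss_gap(feedback), 1), "points")
```

The gap is about 2 percentage points for students told they had done well and about 26 points for students told they had failed, which is where "almost all the increase" comes from.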

So you know all those Giants fans posting status updates on Facebook saying “We won!” or “We’re number one”? Ask them why their self-esteem is so low that they need to Bask in Reflected Glory.

That’ll show ’em.

[Title changed after posting from “How Football Scores Actually Change The Way We Talk”]
