Dataset: Ten Years of NFL Plays Analyzed, Visualized, Quizzified (Downloadable) - Statwing Blog

URL:http://blog.statwing.com/dataset-ten-years-of-nfl-plays-analyzed-visualized-quizzified-downloadable/


Statwing is an easy-to-use data analysis tool, available for individual use or embedded into other products.

It’s third-and-3 and you desperately need a first down. What do you do, run or pass?

We’ve structured ten years of NFL play-by-play data (raw data courtesy of Advanced NFL Stats), then uploaded it into Statwing for analysis. Now you can test your coaching instincts against the data.

Early in the game, the score is tied. You have fourth-and-goal at the 2-yard line. What should you do?

  • Go for it
  • Kick a field goal
  • If you do the math, it works out similarly either way

Wrong. You should go for it.

When teams go for it on fourth-and-goal from the 2, they get a touchdown 45% of the time. So on average teams get 3.1 points when they go for it—roughly the same amount they’d expect if they kicked a field goal, since only 2% of field goals are missed at that range.
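The arithmetic behind that comparison can be sketched in a few lines of Python. This is an illustrative sketch, not the calculation Statwing ran: it assumes a touchdown is worth 7 points (that is, the extra point is always made), which the post does not state, and it ignores field position and safeties.

```python
# Expected points on fourth-and-goal from the 2, using the rates quoted above.
# Assumption (not from the post): a touchdown counts as 7 points,
# i.e. the extra point is treated as automatic.

TD_POINTS = 7          # assumed value of a touchdown plus extra point
FG_POINTS = 3

p_touchdown = 0.45     # touchdown rate when going for it (from the post)
p_fg_made = 1 - 0.02   # only 2% of field goals are missed at this range


def expected_points(probability, points):
    """Expected value of a single all-or-nothing scoring attempt."""
    return probability * points


go_for_it = expected_points(p_touchdown, TD_POINTS)
kick = expected_points(p_fg_made, FG_POINTS)

print(f"Go for it:  {go_for_it:.2f} expected points")
print(f"Field goal: {kick:.2f} expected points")
```

Under these assumptions both options land close to 3 points, which is the point of the comparison: the scoring value is about even before field position is considered.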

Click the image to explore this analysis. If you want your analyses to save, though, you’ll need to use the link at the top of the page to play with the dataset.

But that’s not all. If you’re stopped on fourth-and-goal, the opponent starts with terrible field position. You’ll even get a safety about 5% of the time. By comparison, you can expect the opponent to start from the 23-yard line after a kickoff following a made field goal, and they will even have a 0.5% chance of returning the kickoff for a touchdown.

Click the image to see the full NFL 2012 regular season dataset in Statwing. It contains all the analyses cited above.

Going for it and kicking a field goal both yield about 3 points on average, and the field position is much better if you go for it. Despite this, coaches usually kick a field goal on fourth-and-goal. During the last ten seasons coaches went for the touchdown about 20% of the time in this situation.

It’s third-and-1. Which type of run is most likely to result in a first down?

  • Run it up the gut (between the linemen)
  • Run off tackle (outside the linemen)

Wrong. Side story: your author’s mom always yelled at the Chiefs not to go up the middle on third-and-short. It saddens your author to find out that she was most likely leading the Chiefs astray.

Going up the gut just barely beats running around the end.

Click the image to see an even more detailed breakdown of running plays (e.g., off-center versus off-guard).

On third-and-3, are you more likely to pick up a first down by running the ball or passing it?

  • Running
  • Passing
  • Running and passing are roughly equally likely to work

Sort of. Good enough.

Running was not statistically significantly more likely to yield a first down, but it did trend slightly above passing (51% vs 49%), so we’ll give it to you.

Wrong

Running actually trends towards being more effective than passing, at 51% vs 49% (though the difference isn’t statistically significant, so the best answer is “equally likely to work”).

Correct

Running was not statistically significantly more likely to yield a first down (though it did trend slightly above passing, 51% vs 49%).

In case you’re curious, here are the odds of picking up a first down on third-and-x, split by running versus passing:

Runs are statistically significantly more effective with 1 yard to go, and passes are more effective with 4+ yards to go.
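To make "statistically significant" concrete, here is a minimal sketch of a two-proportion z-test in Python. The post does not say which test Statwing uses, and it does not give play counts, so both the choice of test and the counts below (1,000 runs and 4,000 passes) are assumptions made up for illustration. Only the 51% and 49% rates come from the post.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Return (z, two-sided p-value) for the difference between two proportions."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled success rate under the null hypothesis that both rates are equal.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# HYPOTHETICAL counts, chosen only to match the 51% vs 49% rates in the post:
# 510 first downs on 1,000 runs, 1,960 first downs on 4,000 passes.
z, p_value = two_proportion_z_test(510, 1000, 1960, 4000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

With these made-up counts the p-value comes out well above the usual 0.05 threshold, so a 51% vs 49% gap would not count as significant. With much larger samples the same gap could.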

As an aside, coaches tend to pass on third with more than a yard to go.

Coaches very rarely run on third down with three or more yards to go.

You need a two-point conversion. What kind of play should you call?

  • Running
  • Passing
  • Running and passing are equally likely to work.

During the last ten years running has succeeded 62% of the time, versus 46% for passing.

This seems odd because we just found out that running was only microscopically better than passing on third-and-2. But a two-point conversion is different from a typical third-and-2; the defense isn’t spread out, so it’s hard for receivers to find gaps in the coverage.
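In points terms the gap between running and passing on a two-point try is easy to work out. A back-of-the-envelope sketch, using only the success rates quoted above and the fact that a conversion is worth two points:

```python
# Expected points per two-point attempt, from the success rates quoted above.
CONVERSION_POINTS = 2

success_rate = {"run": 0.62, "pass": 0.46}

expected = {play: rate * CONVERSION_POINTS for play, rate in success_rate.items()}

for play, points in expected.items():
    print(f"{play}: {points:.2f} expected points per attempt")
```

This treats every attempt as an average one. It says nothing about how defenses would adjust if coaches ran more often.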

Click the image to see statistical data, explore this analysis, and play with the rest of the dataset in Statwing.

This suggests that coaches should run more often than they currently do.

Click the image to see the confidence intervals and play with the rest of the dataset in Statwing.

Would you like to see pretty data visualizations about punting?

Yes No

Correct. You would, it seems.

Sorry, that’s incorrect, you would love to see pretty data visualizations about punting. Surprised you didn’t know that.

Punts have gotten roughly half of a yard longer per season over the ten-year period.

Click the image to see statistical data, explore this analysis, and play with the rest of the dataset in Statwing.

But does that mean punters are increasingly outpunting their coverage? After all, longer punts beget longer returns.

Like the other binned scatterplot, this visualization was made automatically in Statwing with 3 clicks.

Nope! We looked into it, and while returns have gotten longer, they only got longer by about 0.15 yards per season, and a pretty similar number of punts aren’t returned at all.
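A per-season trend like "half a yard per season" is the slope of a straight-line fit of punt length against season. The sketch below shows that calculation with ordinary least squares. The post does not say how Statwing computes the trend, and the seasons and averages here are made-up placeholder numbers, not values from the dataset.

```python
def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


# PLACEHOLDER data (not from the dataset): ten seasons whose average punt
# length grows by exactly 0.5 yards each year.
seasons = list(range(2003, 2013))
avg_punt_yards = [42.0 + 0.5 * i for i in range(len(seasons))]

slope = least_squares_slope(seasons, avg_punt_yards)
print(f"Punt length trend: {slope:.2f} yards per season")
```

Running the same function on average return length per season would give the second trend to compare against.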


Thanks for playing

When you try this quiz with a sorry quiz-taker like you, that’s the result you’re going to get. [That's a joke, we think you did fine :)]

You’re the best quiz-taker in the game.

Discussion on Hacker News

A special thanks to David Laughlin, who wrote most of the copy for this post. David is available for freelance work at davidclaughlin@gmail.com.


See notes below to download the data.

Update: burntsushi from Hacker News created some tools that make it easy to query this kind of data. We haven’t looked at them in depth but they look much more efficient than the datasets we link to below.

Notes

The original data from Advanced NFL Stats is mostly free-text play descriptions, which we interpreted into structured data using Excel. The original data does have a few errors here and there, not all of which could be cleaned up. Some plays are missing, and some plays have inaccurate data, maybe 0.5%.

We’re very confident that you’ll have a much easier time exploring the data in Statwing than in Excel or another tool. So we encourage you to try analyzing it in Statwing first. To save your analyses, use this link, not the ones linked to in the images above.

But, if you don’t believe us or you want to modify the data, you can download the raw CSV of our version of the data.

In 2003, 2004, and 2005, the data doesn’t distinguish between a QB scramble and a run. So for many analyses (like many of the above), you’ll want to filter out those years.
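If you work from the downloaded CSV, dropping those seasons might look like the sketch below. The column names (`season`, `play_type`, and so on) and the sample rows are hypothetical stand-ins, since the post does not describe the file's layout. Substitute the real header names.

```python
import csv
import io

# HYPOTHETICAL sample standing in for the downloaded CSV.
# The column names are assumptions, not the real file's headers.
SAMPLE_CSV = """season,down,yards_to_go,play_type
2004,3,1,run
2005,3,3,pass
2008,3,1,run
2012,4,2,pass
"""

# Seasons where the data does not separate QB scrambles from runs.
AMBIGUOUS_SEASONS = {2003, 2004, 2005}


def plays_with_reliable_run_labels(csv_file):
    """Yield rows whose season is not in AMBIGUOUS_SEASONS."""
    for row in csv.DictReader(csv_file):
        if int(row["season"]) not in AMBIGUOUS_SEASONS:
            yield row


kept = list(plays_with_reliable_run_labels(io.StringIO(SAMPLE_CSV)))
print([row["season"] for row in kept])
```

For a real file, pass an open file handle (for example `open(path, newline="")`) in place of the `io.StringIO` object.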

We make the following assumptions throughout:

  • You are coaching the “average” team. Individual teams would vary, but the data we use is the average of all teams’ behaviors in whatever situation we are analyzing.
  • There are more than five minutes left in the half or game.
  • Unless otherwise noted, you’re between your own 10-yard line and your opponent’s 20-yard line.
  • Yardage from penalties committed during a play is included in the outcome.
  • The hypothetical coach does not always call the same type of play in the same situation. That is, the coach randomizes play calling enough to be unpredictable while still favoring the more advantageous plays. The very awesome Brian Burke at Advanced NFL Stats does a great job of describing randomization and game theory.

In the last ten years, there have only been a few instances where teams went for it on fourth-and-goal at the 2-yard line. It turns out, though, that going for the end zone on third-and-short is pretty similarly successful to going for it on fourth-and-short, so we used both types of plays in this analysis. For example, the 45% figure was calculated using both third- and fourth-and-2. Here, as with the rest of this fourth-and-2 analysis, we were inspired by a 2002 paper by David Romer. Romer is a notable Berkeley economist, and his wife chaired the White House Council of Economic Advisers in 2009 and 2010.

If objectives other than just picking up a first down are considered, there is evidence that running is better. Brian at Advanced NFL Stats does a great job of diving into that question, though you might want to learn about the concept of expected points before reading Brian’s analysis.

