A Quarterback Wins Discussion

Rule #1 on the Dynasty, in Theory blog: any time you can write an article title that doubles as a garden path sentence, you must do so.

Ryan Riddle asks on Twitter:

Real question: Can someone who is against “QB wins” and for other QB stats please explain what they’re protesting exactly?

— Ryan Riddle (@Ryan_Riddle) December 2, 2015

Because I firmly believe that the only limits placed on character should all be self-imposed, I have taken to not-Twitter to respond.

First of all, I can handily parry the claim that wins are not a statistic. Pro Football Reference’s player queries allow you to search by quarterback wins. If it exists on PFR, it’s a statistic. QED.

A quarterback’s wins do, indeed, tell us information about that quarterback. Good quarterbacks win more than bad quarterbacks. Everyone knows this. If you show me the fifty winningest quarterbacks of all time and the fifty losingest quarterbacks of all time, I feel quite confident that the former group is substantially better, on the whole, than the latter.

So “quarterback wins” does indeed exist as a thing we can use to roughly estimate a quarterback’s quality. It is not technically useless. My main problem with it lies in the fact that I hope for more from statistics used to estimate player quality than “not technically useless”.

Fatal Flaw #1-
The easiest way to demonstrate a flaw in “quarterback wins”, in my opinion, is to ask why we specifically use QUARTERBACK wins. Why does no one track wide receiver wins and hold it against Calvin Johnson? Why are J.J. Watt’s defensive end wins not an indictment of him?

I would wager that if you took the 50 winningest receivers of all time and compared them to the 50 losingest receivers, the former group would be substantially better, as a whole. And likewise for defensive ends, and tight ends, and offensive linemen, and linebackers, and defensive backs, and even kickers or punters.

This is because a player’s “quality” is convenient shorthand for how much he helps his team win football games. Good players help their teams win more than bad players, so if we knew nothing else, we should expect a good player’s team to win more than a bad player’s.

Imagine that after this season, Josh Norman and Brandon Browner were going to be taken off of their current franchises and randomly assigned to new teams, and that those teams would be required to start them for all 16 games and play them for at least 60% of all defensive snaps. If Vegas were taking bets, which of them would you bet on to win more games in 2016? If you said “Josh Norman”, you are a believer in cornerback wins, too.

The reason given for using quarterback wins but not cornerback wins is that quarterbacks are more responsible for whether a team wins or loses than cornerbacks. And this is true! But this explanation presupposes that there’s a certain level of “responsibility” above which wins are a meaningful stat and below which they are not. Given that any such threshold is assigned entirely arbitrarily, I question that very much. We should either say “wins are a bad measure of player quality for everyone, but they’re less bad for quarterbacks”, or we should say “wins are a perfectly fine measure of player quality for everyone, including non-quarterbacks”.

Unless and until I see someone adopting either of those positions, I will continue to question the privileged place “quarterback wins”, in particular, occupy.

Fatal Flaw #2-
Let’s go back to my Norman/Browner hypothetical. If we knew nothing else, we would expect Norman to win more games in 2016.

Let us suppose, though, that we did know something else. Suppose we knew Browner would be returning to New England, while Norman would be playing for the Cleveland Browns. In this hypothetical, I’d bet on Browner winning more games. Because while good players impact wins, each individual player is just a small part of the whole.

Football statistics, more than basketball or baseball statistics, are hopelessly entangled. Unlike in the other popular sports, each individual’s performance is heavily dependent on his supporting cast. Over the last two years, Antonio Brown has averaged 115 yards in games Ben Roethlisberger plays and 58 yards in games Ben Roethlisberger misses. Since 2011, Tom Brady has averaged 7.9 yards per attempt when Rob Gronkowski starts and 6.7 yards per attempt when Rob Gronkowski sits.

Any attempt to evaluate players based on their statistics must grapple with this reality. But while every statistic is entangled, some statistics are far more entangled than others.

Let’s assume for a second that yards per attempt is 50% a result of the quarterback and 50% a result of the quality of his supporting cast. Let’s assume at the same time that quarterbacks are so important that they account for 25% of their team’s total contributions towards winning games.

Imagine extracting meaning from these statistics to be like extracting juice from oranges. Under those assumptions, each orange of wins holds half as much quarterback juice as each orange of YPA. Given enough oranges, you can certainly squeeze a full glass out of wins, but using YPA to estimate player quality will get you the same amount of juice with half as many oranges. (Or, alternately, twice as much juice with the same number of oranges.)
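To put rough numbers on the orange analogy, here is a minimal Python sketch. It assumes, purely for illustration, that each observation yields "juice" in direct proportion to the quarterback's share of the stat, using the made-up 50% and 25% figures from above. The function name and the linear model are my own simplifications, not an established statistical method.

```python
def oranges_needed(glass_size, qb_share):
    """Observations needed to fill the glass if each one yields qb_share of juice.

    Assumes a purely linear model, which is an illustrative simplification.
    """
    return glass_size / qb_share


# Hypothetical shares taken from the thought experiment, not measured values.
QB_SHARE_OF_YPA = 0.50   # half of yards per attempt is the quarterback
QB_SHARE_OF_WINS = 0.25  # a quarter of a win is the quarterback

ypa_oranges = oranges_needed(1.0, QB_SHARE_OF_YPA)
win_oranges = oranges_needed(1.0, QB_SHARE_OF_WINS)

print("Oranges needed with YPA: ", ypa_oranges)
print("Oranges needed with wins:", win_oranges)
print("Wins need this many times more oranges:", win_oranges / ypa_oranges)
```

A proper statistical treatment would have to model the noise explicitly and could come out differently; this only restates the analogy's arithmetic.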

Because wins are more hopelessly entangled than many other stats, they are inferior to those other stats as measures of player quality, unless there is a dramatic difference in sample sizes. Again, it’s not so much that wins are useless, it’s that I aim higher than “not useless” when selecting which statistics I’m going to devote much attention to.

Fatal Flaw #3-
I just mentioned sample sizes. They’re a bitch. QB wins are discrete, binary, and indivisible. They accumulate at a rate of one per game, meaning it takes a lot of games to get something usable. A full 16-game season is not a large enough sample size to draw meaningful inferences, and forget about using partial seasons (such as noting Indianapolis’s record with Andrew Luck vs. Matt Hasselbeck as if it tells us anything at all about the relative quality of Andrew Luck and Matt Hasselbeck). Those sample sizes are just unusable.

If you don’t think small sample sizes are an issue, take it to its logical extreme. If we are using a sample size of one, we must conclude that Brock Osweiler is probably a better quarterback than Tom Brady right now. Does anyone out there fancy themselves a modern Gene Forrester and feel like joining me on that limb? For the sake of my leg, I hope not.
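To get a feel for how noisy a single season is, here is a small Python sketch. It assumes a hypothetical, perfectly average team whose games are independent coin flips with a fixed 50% win probability, which real teams are not. The function name is mine, and the calculation is just the standard binomial formula.

```python
from math import comb


def prob_at_least(games, win_prob, min_wins):
    """Exact binomial probability of winning at least min_wins of games."""
    return sum(
        comb(games, k) * win_prob**k * (1 - win_prob) ** (games - k)
        for k in range(min_wins, games + 1)
    )


# Hypothetical: an exactly average team, every game an independent coin flip.
p_ten_plus = prob_at_least(16, 0.5, 10)
p_six_or_fewer = 1 - prob_at_least(16, 0.5, 7)

print(f"Chance of 10 or more wins: {p_ten_plus:.3f}")
print(f"Chance of 6 or fewer wins: {p_six_or_fewer:.3f}")
```

Under those assumptions, an exactly average team finishes 10-6 or better a little under a quarter of the time, and 6-10 or worse about as often, with no difference in quarterback quality at all.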

Now, given a large enough sample size, we start to get some really workable data. John Elway played 234 games. Vinny Testaverde played 233. Elway’s career record was 148-82-1. Testaverde’s was 90-123-1. John Elway was probably a better quarterback than Vinny Testaverde. (Though if you think a losing career record means Testaverde was a bad quarterback, I challenge you to explain how he started more than 200 games in the first place.)

But, as with fatal flaw #2, we could have much more easily gotten this information elsewhere. It’s not like we really needed to resort to wins to explain why Elway was a better quarterback than Testaverde. He completed a higher percentage of his passes for more touchdowns and fewer interceptions, averaging more yards per game and per attempt. And it’s not like we can just use quarterback wins as-is without also considering the difference between the Denver Broncos and the rotating cast of teams Testaverde found himself on.

But most people deploying “quarterback wins” don’t seem interested in waiting long enough to acquire a somewhat usable sample size. Instead, we see stats like “Dallas is 3-0 (now 3-1) with Tony Romo and 0-7 without him”. Interesting trivia, but not especially useful with regard to evaluating how good Tony Romo really is as a player.

Fatal Flaw #4-
Wins are ultimately a descriptive stat that gets dressed up like a predictive stat. Descriptive stats explain what happened; predictive stats tell us what will happen next. Wins are not a very “sticky” stat; they fluctuate wildly from sample to sample.

ESPN’s QBR originally included extra weight for performance in “high leverage” situations- the fourth quarter of a one-score game, say. This extra weight essentially incorporated wins into the statistic, since players who performed well in high leverage won more games. This made QBR a much better descriptive stat- it better explained what happened before. But it limited the effectiveness of QBR as a predictive stat, one that would tell us what was going to happen next, so it was ultimately discarded.

This is not a shot at descriptive stats. They are incredibly valuable. I have been called various things in my time analyzing football- a historian, a “stats guy”, an idiot. I consider myself a storyteller. And part of the reason I’m so drawn to football is its suitability for the telling of a story.

Indeed, storytelling is woven into the very fabric of the game, immortalized for generations to come. And for the telling of stories, there is nothing more useful than wins. It is the descriptive statistic nonpareil. My complaints should be understood as only applying when it crosses the Rubicon and is deployed– indeed, often weaponized– against poor, individual players. “Tony Romo has choked in the past” quickly becomes “Tony Romo is a choker”, and suddenly there is no longer any redeeming quarterback wins.

Fatal Flaw #5-
Earlier, I mentioned the binary, indivisible nature of wins and losses. That’s bad. It turns wins into an extremely blunt hammer.

Consider: we often hear about a quarterback’s “4,000 yard passing seasons”. 4k seasons are also binary and indivisible. In general, a quarterback with four 4k seasons in four years will be better than a quarterback with two 4k seasons in four years.

But imagine the first quarterback had exactly 4,000 yards in all four years, while the second threw for 3,999, 3,999, 4,999, and 4,999. The second quarterback threw for nearly 2,000 more yards over that span, averaging nearly 500 more yards per year! And yet the arbitrary and indivisible nature of “4k seasons” obscured all of that detail.
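The arithmetic in that hypothetical is easy to check with a few lines of Python. The yardage figures are the invented ones from the paragraph above; the variable and function names are just labels I made up for the sketch.

```python
THRESHOLD = 4000  # the "4k season" cutoff

# Invented season totals from the hypothetical above.
qb_one = [4000, 4000, 4000, 4000]
qb_two = [3999, 3999, 4999, 4999]


def threshold_seasons(seasons, threshold=THRESHOLD):
    """Count the seasons that reach the yardage threshold."""
    return sum(1 for yards in seasons if yards >= threshold)


for name, seasons in (("QB one", qb_one), ("QB two", qb_two)):
    print(name, "4k seasons:", threshold_seasons(seasons), "total yards:", sum(seasons))

gap = sum(qb_two) - sum(qb_one)
print("Extra yards for QB two:", gap)
print("Extra yards per season:", gap / len(qb_two))
```

The threshold count favors the first quarterback four to two, while the raw totals favor the second by 1,996 yards, or 499 per season.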

I’m not hating on the idea of 4,000-yard seasons or 300-yard passing games (or the RB/WR equivalent of 1,000-yard seasons and 100-yard games). These are interesting, quick little shorthands. At the broadest level, they’re pretty good indicators of player quality. A guy with eight 4k seasons is probably better than a guy with three.

But again, they obscure more detail than they reveal. There’s a reason why fantasy football has moved towards decimal scoring over the last two decades- because a 99-yard game has more in common with a 100-yard game than it does with a 90-yard game.

Wins, like arbitrary yardage thresholds, take quantitative data and turn it into qualitative data. That is sometimes useful for communication, as language is far more qualitative. But from the perspective of analysis, it’s a clear step backwards.

A Potential Solution-
I’m not saying that winning is irrelevant. I’m just saying that wins and losses are a pretty crappy indicator of player quality, all things considered. As far as outcomes, everything done in the NFL is supposed to be done in the pursuit of wins. That’s the be-all, end-all measure of success.

So I do think we should consider wins. Or, more specifically, I think we should consider what a player is doing that is contributing to wins. And there is a class of statistics out there that does exactly this. They typically even include the word “wins” right in the title. Win shares. Win Probability Added (or WPA).

These stats solve many of the issues with wins themselves. For starters, the biggest improvement is that they attempt to tease out individual contributions to wins. J.J. Watt is no longer penalized for playing with Brian Hoyer and not Tom Brady.

The next biggest improvement is that they are continuous instead of discrete, divisible instead of binary. A player cannot earn a fraction of a win, but he can earn a fraction of a win share. And if he plays better, he can earn an even larger fraction. And if he plays worse, he can earn a smaller fraction. This divisibility means we more quickly get to a point of usable sample sizes, which means we don’t have to wait a decade before drawing any meaningful conclusions.

WPA, Win Shares, and similar metrics are not perfect. There’s no such thing as perfect stats in football; as I mentioned, there is far too much entanglement. But they are RADICALLY better than just plain old wins and losses, and I’m not one to let the absence of perfection dissuade me from the pursuit of improvement.

I get to choose which stats I devote mindspace to. The fact that wins are not technically useless doesn’t mean they’re worthwhile. If I want to consider how a player is impacting his team’s bottom line– and I do!– there are better ways for me to go about it.

Which, in a(n awfully large) nutshell, is why I’m not a fan of “quarterback wins”.
