When Does Quarterback Performance Get Real?
We have a pretty good idea about who a QB is after 9-10 games.
Rookie QB and 2024 number 2 overall pick Jayden Daniels had a tremendous game Monday night against the Cincinnati Bengals. He completed 21/23 passes for 254 yards and 2 touchdowns while adding 39 yards and a touchdown over 12 carries on the ground. This puts him in a great light in comparison to his fellow rookies Caleb Williams and Bo Nix, who have collectively thrown 2 touchdowns and 8 interceptions in 3 games. So how confident can we be that he will be better than Nix or Williams at this point in time? Or, to get at the core issue, when will we know his performance is “real”? After going through the careers of every QB who has entered the league since 1999, the data suggests that a given QB’s performance will stabilize after roughly 272 dropbacks or 9-10 games.
When we talk about the “real”-ness of any rookie QB’s early season performance, we want to know when we can say their results are reflective of their talent and not just the random noise of a couple of games. We can arrive at a more precise measure of when statistics begin to stabilize by employing the “padding approach” commonly used by Kostya Medvedovsky (creator of DARKO), Justin Kubatko, and Tangotiger.
The logic behind the padding approach is that the best projection for a player or team’s future performance given little information is that they will be average. But as we get more and more information about the player and team, we rely more on the actual performance. We do this by “padding” their actual results with some number of league average results, called the stabilization point. This is what the formula looks like for EPA per dropback (X represents the stabilization point):
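Written out in the standard padded form (a reconstruction from the description above; the symbol names are chosen here for illustration and are not necessarily the author's notation), with EPA the QB's total expected points added, DB their dropbacks, and the barred term the league-average EPA per dropback:

```latex
\widehat{\text{EPA/DB}}
  = \frac{\text{EPA} + X \cdot \overline{\text{EPA/DB}}_{\text{league}}}{\text{DB} + X}
  = w \cdot \frac{\text{EPA}}{\text{DB}} + (1 - w) \cdot \overline{\text{EPA/DB}}_{\text{league}},
\qquad w = \frac{\text{DB}}{\text{DB} + X}
```

The second form shows the same estimate as a weighted average: the weight w on the QB's own results is exactly one half when DB equals X, and it approaches one as DB grows.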
This function tells us that our best guess for a QB’s future performance is mostly reliant on their past performance after they pass the stabilization point. One analogy is that passing the stabilization point is the statistical equivalent of having enough film to evaluate a QB. From that point they can still improve, regress, change their tendencies, get really into conspiracy theories or whatever, but we can reliably begin to make predictions based on their past performance from that point forward.
NOTE: any of the data scientists in the audience get a gold star for noticing that the function is an empirical Bayes method, and that it just converges to EPA/DB over a large sample
But how do we figure out what the right stabilization point is anyways? Simple: we decide which level we want to project EPA/DB at (i.e. game, season, career) with what sample and then we find a clever way to try a bunch of different X values to see what works best. I wanted to project career EPA/DB using all QBs who have entered the league since 1999 with at least 500 dropbacks and my clever way of trying a bunch of numbers was to copy what Medvedovsky did (differential evolution). Differential evolution is a random process so I ended up with a distribution of fitted potential stabilization points.
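To make the fitting step concrete, here is a minimal sketch of the idea, not the author's actual code. Everything in it is an assumption for illustration: the QBs are synthetic, the talent and noise spreads are made up, the error measure is mean squared error, and the optimizer is a small hand-rolled differential evolution loop using only the standard library (in practice a library routine such as SciPy's `differential_evolution` would be the usual choice).

```python
import random

def padded(total_epa, dropbacks, league_avg, x):
    """Padded EPA/DB: actual results plus x dropbacks of league-average play."""
    return (total_epa + x * league_avg) / (dropbacks + x)

def make_synthetic_qbs(n_qbs, rng):
    """Hypothetical data: (early_epa, early_dropbacks, later_epa_per_db) per QB."""
    qbs = []
    for _ in range(n_qbs):
        talent = rng.gauss(0.0, 0.10)       # assumed spread of true EPA/DB
        early_db = rng.randint(30, 400)     # size of the early sample
        # Assumed per-dropback noise of 1.4, so a sample mean has sd 1.4 / sqrt(n).
        early_rate = rng.gauss(talent, 1.4 / early_db ** 0.5)
        later_rate = rng.gauss(talent, 1.4 / 2000 ** 0.5)
        qbs.append((early_rate * early_db, early_db, later_rate))
    return qbs

def projection_error(x, qbs, league_avg=0.0):
    """Mean squared error of padded early results against later EPA/DB."""
    errs = [(padded(epa, db, league_avg, x) - later) ** 2 for epa, db, later in qbs]
    return sum(errs) / len(errs)

def differential_evolution_1d(loss, low, high, rng, pop_size=20, generations=40, f=0.7):
    """Minimal DE/rand/1 for a single parameter (crossover is trivial in 1-D)."""
    pop = [rng.uniform(low, high) for _ in range(pop_size)]
    scores = [loss(p) for p in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three other population members at random.
            a, b, c = rng.sample([p for j, p in enumerate(pop) if j != i], 3)
            trial = min(high, max(low, a + f * (b - c)))  # mutate, then clip to bounds
            trial_score = loss(trial)
            if trial_score <= scores[i]:                  # greedy selection
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=lambda i: scores[i])
    return pop[best]

qbs = make_synthetic_qbs(300, random.Random(0))
# Each run starts from a different random population.
fits = [
    differential_evolution_1d(lambda x: projection_error(x, qbs), 1.0, 1000.0,
                              random.Random(seed))
    for seed in range(5)
]
print([round(v, 1) for v in fits])
```

Collecting the fitted value from each run is what produces a set of candidate stabilization points rather than a single number.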

The distribution of potential stabilization points ended up as a nice bell curve between 250 and 300 dropbacks. The average of 272.57 dropbacks minimizes the error in our projections, and since I have no reason to pick anything else, that is our optimal stabilization point! If we assume that QBs average 30 dropbacks a game, this means that our projections of their future performance become mostly reliant on their past performance after 9ish games. So what does this tell us about the current crop of rookies?
Through three games, the broad-brush ranking of the three QBs is the same after padding the results. Daniels is performing the best while Williams and Nix are fighting each other to be less subpar. The differences between the three are just much less pronounced than their raw outputs suggest. The padding approach allows us to create more realistic projections on a small sample, while giving us the knowledge that their past performance will not be particularly predictive for another two months.
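To put rough numbers on "mostly reliant," the short sketch below computes how much weight a padded projection gives to a QB's own results at different points in a season. It uses the 272.57-dropback stabilization point and the 30-dropbacks-per-game assumption from above; the function name and the choice of game counts are illustrative only.

```python
STABILIZATION_POINT = 272.57  # fitted value reported in the article
DROPBACKS_PER_GAME = 30       # the article's working assumption

def weight_on_actual(dropbacks, x=STABILIZATION_POINT):
    """Share of a padded projection that comes from the QB's own results."""
    return dropbacks / (dropbacks + x)

# Games needed to reach the stabilization point at 30 dropbacks a game.
games_to_stabilize = STABILIZATION_POINT / DROPBACKS_PER_GAME
print(f"games to stabilize: {games_to_stabilize:.1f}")

for games in (3, 9, 10, 17):
    dropbacks = games * DROPBACKS_PER_GAME
    print(f"{games:>2} games, {dropbacks:>3} dropbacks: "
          f"weight on actual = {weight_on_actual(dropbacks):.3f}")
```

The weight is below one half at nine games (270 dropbacks) and above it at ten (300 dropbacks), which is where the 9-10 game figure comes from. After three games, roughly a quarter of the projection comes from the rookie's own play and the rest from the league average.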
This is typical of statistical analysis. It's fantastic because it reinforces what you already knew, but that also makes it somewhat boring. Not to call this post boring or anything. My experience in academia has trained me to interpret 'your analysis reinforced what I already knew' as a compliment.
I like this analysis because it aligns almost perfectly with the generally accepted NFL wisdom that team stats (teams don't have careers, they restart every time) crystallise after week 11, which with byes would be the ten game mark. However, just because I could've told you the answer before I started reading does not mean it does not need to be proven. Thank you for doing the proving.
It's still mind-boggling that the best QB in the NFL is a rookie. There's nothing fluky about it either. His CPOE is almost 12 (!). It's unbelievable. There's no possible way that this performance can continue, because if it does then Jayden Daniels has come into the league as the best QB in NFL history, which is just not possible. From here, we're all just predicting how steep the falloff will be, and I don't think any of us have any idea what the answer to that question is.
My mind leans toward the side that the falloff will be severe, because rookie QBs just tend to stink, but there's been nothing fluky about Jayden's start. Nothing whatsoever, so it's so hard to tell.