2026-06-18 · 9f68c4a

04 · Variable Catalog: What Actually Predicts NFL Games

TL;DR. Ranked by out-of-sample predictive value (what matters for forecasting future games, not describing past ones): **EPA/play (pass-weighted) > QB identity > ANY/A > opponent-adjusted point differential / Pythagorean > DVOA + success rate > pace/PPD/field position > wind > rest/HFA > turnovers / red-zone / fumble luck (regress hard) > special teams / O-line / coaching (mostly captured indirectly). The recurring theme: offense is more stable than defense, and the noisy stuff (turnovers, RZ TD%, fumble recoveries, close-game records) must be regressed toward the mean; use it to flag regression, not to project.**

This is the menu of inputs. For each: what it is, how predictive, and the caveats that govern how much to trust it.


Tier 1: Highest predictive value

1. EPA per play (offense & defense, pass/rush split)

The most-cited single predictor. Passing EPA carries far more weight than rushing EPA; it is the backbone of modern power ratings.

  • Offense is more stable than defense. Year-to-year correlation: off EPA/play r ≈ 0.377 vs def EPA/play r ≈ 0.322. Every defensive correlation is lower than its offensive counterpart → weight offense more, regress defense more.
  • Caveat: it's a team/unit metric (can't cleanly isolate a player); needs sample to stabilize; the EP baseline drifts yearly with scoring environment.
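The "regress defense more" guidance can be illustrated with a minimal sketch. It assumes the year-to-year correlation is used directly as the shrinkage factor toward the league mean, which is a common simplification rather than anything this catalog prescribes. The function name, the example team, and the league mean of 0.0 are hypothetical.

```python
# Year-to-year correlations quoted in the text above.
OFF_EPA_R = 0.377
DEF_EPA_R = 0.322


def regress_to_mean(observed: float, league_mean: float, r: float) -> float:
    """Shrink an observed value toward the league mean.

    r is the share of the deviation from the mean that is kept
    (assumption: the year-to-year correlation is used as that share).
    """
    return league_mean + r * (observed - league_mean)


# Hypothetical team: +0.10 EPA/play on offense, -0.10 EPA/play allowed on
# defense, with an assumed league mean of 0.0 for both units.
off_proj = regress_to_mean(0.10, 0.0, OFF_EPA_R)
def_proj = regress_to_mean(-0.10, 0.0, DEF_EPA_R)

# The defensive edge shrinks more than the offensive edge of the same size.
print(round(off_proj, 4), round(def_proj, 4))
```

Equal-sized offensive and defensive edges end up unequal after shrinkage, which is the practical meaning of weighting offense more.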

2. QB identity / QB injury

The largest discrete swing variable. Elite QB ≈ 5–7 pts on the spread; non-elite starter 3–4; backup downgrade swings 3–7 pts depending on drop-off.

  • Caveat: once announced, the market prices it in within seconds, so the obvious move carries no residual edge. The edge is in anticipating it and correctly valuing the backup.

3. ANY/A (Adjusted Net Yards per Attempt)

Best simple passing/QB metric. (yds + 20·TD − 45·INT − sack_yds) / (att + sacks). Correlation with team wins ≈ 0.67; the higher-ANY/A team scored more in ~87% of games one season. Caveat: descriptive correlation, partly an EPA proxy.
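As a minimal sketch, the formula above translates directly into a function. The function name, parameter names, and the stat line in the example are hypothetical, not taken from our engine.

```python
def any_a(pass_yds: float, pass_td: int, interceptions: int,
          sack_yds: float, attempts: int, sacks: int) -> float:
    """Adjusted Net Yards per Attempt, per the formula in the text."""
    dropbacks = attempts + sacks
    if dropbacks == 0:
        raise ValueError("ANY/A is undefined with zero attempts and sacks")
    adjusted_yards = pass_yds + 20 * pass_td - 45 * interceptions - sack_yds
    return adjusted_yards / dropbacks


# Hypothetical season line: 4000 yds, 30 TD, 10 INT, 200 sack yds,
# 550 attempts, 30 sacks -> (4000 + 600 - 450 - 200) / 580
print(round(any_a(4000, 30, 10, 200, 550, 30), 2))  # 6.81
```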

4. Pythagorean expectation

Point differential predicts future wins better than W-L record. NFL exponent 2.37: PF^2.37 / (PF^2.37 + PA^2.37). Teams whose record outran their differential regress down ~2 wins; underperformers gain ~1.2. Caveat: adjust point differential for opponent and garbage time first.
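A minimal sketch of the Pythagorean calculation with the 2.37 exponent quoted above. The function names are hypothetical, the 17-game default is an assumption about season length, and the inputs are expected to be point totals already adjusted for opponent and garbage time, as the caveat requires.

```python
def pythagorean_win_pct(points_for: float, points_against: float,
                        exponent: float = 2.37) -> float:
    """Expected win fraction: PF^e / (PF^e + PA^e)."""
    if points_for < 0 or points_against < 0:
        raise ValueError("point totals must be non-negative")
    if points_for == 0 and points_against == 0:
        raise ValueError("undefined when both point totals are zero")
    pf_e = points_for ** exponent
    pa_e = points_against ** exponent
    return pf_e / (pf_e + pa_e)


def wins_over_expectation(actual_wins: float, points_for: float,
                          points_against: float, games: int = 17) -> float:
    """Positive values flag teams whose record outran their differential."""
    expected = pythagorean_win_pct(points_for, points_against) * games
    return actual_wins - expected


# Hypothetical team: 12 wins on 450 points for, 350 against.
print(round(wins_over_expectation(12, 450, 350), 2))
```

A clearly positive result marks a candidate to regress down the following season; a clearly negative one marks a candidate to improve.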


Tier 2: Strong, with structure/adjustment required

5. DVOA (opponent-adjusted efficiency)

Per-play value vs. a baseline for the same down, distance, field position, then opponent-adjusted; RZ and late-close plays weighted more.

  • Success baselines: 1st down ≥45% of needed yds, 2nd ≥60%, 3rd/4th = convert.
  • Companions: DAVE (blends a preseason prior with in-season DVOA early in the year), Weighted DVOA (recency-weighted).
  • Caveat: opponent adjustments are unreliable until ~Week 4–6; lean on the prior before then. (We already ingest DVOA into the prediction engine.)
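The success baselines listed above can be sketched as a simple per-play classifier. This covers only the binary success test, not DVOA's baseline comparison or opponent adjustment. The function names and the play tuples are hypothetical, and yardage is assumed to be whole yards.

```python
def is_successful_play(down: int, yards_to_go: int, yards_gained: int) -> bool:
    """Apply the success baselines: 45% on 1st, 60% on 2nd, convert on 3rd/4th."""
    if down not in (1, 2, 3, 4):
        raise ValueError("down must be 1-4")
    if yards_to_go <= 0:
        raise ValueError("yards_to_go must be positive")
    if down == 1:
        # Integer arithmetic avoids floating-point edge cases at the threshold.
        return 100 * yards_gained >= 45 * yards_to_go
    if down == 2:
        return 100 * yards_gained >= 60 * yards_to_go
    return yards_gained >= yards_to_go


def success_rate(plays: list[tuple[int, int, int]]) -> float:
    """Share of successful plays; each play is (down, yards_to_go, yards_gained)."""
    if not plays:
        raise ValueError("need at least one play")
    return sum(is_successful_play(*p) for p in plays) / len(plays)


# Hypothetical drive: two successes out of four plays.
drive = [(1, 10, 5), (2, 5, 1), (3, 4, 6), (1, 10, 2)]
print(success_rate(drive))  # 0.5
```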

6. Success rate

Binary "stayed on schedule" per play; lower variance than EPA → stabilizes faster → good early-season signal. Best paired with EPA (frequency vs. magnitude).

7. Strength of schedule / opponent adjustment

Not predictive alone, but a necessary correction to every raw efficiency stat. Unadjusted EPA/DVOA/point-diff overrate teams that played weak slates.

8. Rest / travel / situational

Small and mostly priced. Bye edge collapsed post-2011: +2.21 → +0.31 pts. Short-week (<6 days) road after Week 6: 43.9% win / 47.4% ATS. West→East time-zone shifts affect cognition (hard to quantify). (Full table in Doc 2 §8.)

9. Home-field advantage

Shrunk: ~2.5 long-run, ~1.5 in recent seasons; home win rate fell from ~57–60% to ~52–53% since 2019. Use a current, decaying, ideally venue-specific constant, not the historical 3. (Doc 2 §6.)


Tier 3: Real but high-variance / regress heavily / second-order

10. Turnovers / turnover margin: the classic trap

Important for past outcomes, nearly useless for prediction.

  • Year-to-year turnover-margin correlation ≈ 0.10 (R² ≈ 0.01). A +20 team projects to just +2.2 next year.
  • ~46% skill / 54% luck; fumble recovery is ~50/50 with ~zero carryover skill.
  • Rule: regress turnover margin hard toward zero. Use it to identify regression candidates, not to project.
  • Our engine: a turnovers_scale of 0.2, cap 0.8, which is already modest and appropriate. Don't increase it. Confidence also leans on turnover differential; consider down-weighting given its noise.

11. Red-zone efficiency

Strong descriptively, regresses ~11–12%/yr for top teams. Use as a finishing-drives adjustment with heavy regression, not a raw input. (Our redzone_scale is 1.0; make sure the underlying input is regressed.)

12. Special teams / field position

Small "hidden" yardage: ~0.03 EP/yard; +1 net punt yard β‰ˆ +7.3 pts/season. Biggest single-drive predictor is starting field position. Non-trivial in close games; small share of total team value.

13. O-line / non-QB skill / individual defenders

Modest spread impact: skill players ~0.5–2.5 pts, defenders ~0.5–1 pt. O-line matters mainly via QB efficiency / sack rate, not as a standalone input, which is the right way to model it (we have line-play metrics).

14. Coaching / situational tendencies

Pace philosophy, pass-rate-over-expectation, 4th-down aggressiveness, blitz rate. Hard to quantify directly; best captured indirectly through the efficiency and pace metrics they produce. (We have a coaching weight at 0.6; keep it small.)


Cross-cutting: recency weighting & priors

  • Recency weighting is standard. Recent games carry more weight (injuries, scheme, personnel). Methods: exponential decay, weight = 1/(weeks_ago + 0.4), or linear (7,6,5,…). One optimization found the most recent ~5 weeks of spreads minimized MSE.
  • Early-season priors. Blend current results with a long-run prior (prior 3 seasons' ratings + market-implied odds), à la DAVE; the first ~4 weeks are too noisy to trust raw.
  • Regression to the mean governs trust per stat: strongest for turnovers, fumble recoveries, RZ TD%, close-game records (regress aggressively); weakest (most "real") for offensive pass EPA and ANY/A (trust more).
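The three recency-weighting schemes named above can be sketched as follows. Several details are assumptions rather than anything the text specifies: the most recent game is treated as weeks_ago = 0, the exponential decay rate of 0.9 is a placeholder, and all function names are hypothetical.

```python
def reciprocal_weights(n_games: int, offset: float = 0.4) -> list[float]:
    """weight = 1 / (weeks_ago + offset), most recent game first."""
    return [1.0 / (weeks_ago + offset) for weeks_ago in range(n_games)]


def exponential_weights(n_games: int, decay: float = 0.9) -> list[float]:
    """weight = decay ** weeks_ago; the 0.9 default is a placeholder."""
    return [decay ** weeks_ago for weeks_ago in range(n_games)]


def linear_weights(n_games: int, top: int = 7) -> list[int]:
    """Linear scheme (7, 6, 5, ...), floored at zero for older games."""
    return [max(top - weeks_ago, 0) for weeks_ago in range(n_games)]


def weighted_mean(values: list[float], weights: list[float]) -> float:
    """Weighted average of per-game values, both ordered most recent first."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total


# Hypothetical point margins, most recent game first.
margins = [10.0, -3.0, 7.0]
recent_form = weighted_mean(margins, reciprocal_weights(len(margins)))
print(round(recent_form, 2))
```

All three schemes produce weights that fall with age; they differ only in how fast older games fade, which is the parameter worth tuning against out-of-sample error.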

The ranked summary (out-of-sample predictive value)

| Rank | Variable | Use it for | Regress? |
|---|---|---|---|
| 1 | EPA/play, pass-weighted (off > def) | core team strength | mild; regress def more |
| 2 | QB identity / starter-backup delta | spread (3–7 pts) | n/a (it's an event) |
| 3 | ANY/A | simple passing strength | mild |
| 4 | Opp-adj point diff / Pythagorean (exp 2.37) | record-independent strength | adjust opp + garbage time |
| 5 | DVOA + success rate | efficiency + stability | lean on prior pre–Wk 6 |
| 6 | Pace / PPD / field position | totals core | mild |
| 7 | Wind ≥15–20 mph | totals (−2.7 at 20+) | n/a (forecast) |
| 8 | Rest / HFA | small spread nudges | use current values |
| 9 | Turnovers / RZ TD% / fumble luck | regression flags only | hard |
| 10 | ST / O-line / non-QB skill / coaching | second-order, via efficiency | - |

Net guidance for our weights file. Our biggest missing inputs are #2 (QB delta) and a proper #1/#4 opponent-adjusted rating spine. Our existing situational weights (turnovers, rest, divisional, coaching) are appropriately small and should stay small. The mistake to avoid is over-trusting Tier-3 noise.

Sources: EPA stability, ANY/A, Pythagorean, DVOA methods, turnover randomness, turnover margin, red zone regression, field position / ST, QB/player values, recency/priors, bye-week academic.

→ Continue to Doc 5: Modeling Methods.