
04 — Variable Catalog: What Actually Predicts NFL Games

TL;DR. Ranked by out-of-sample predictive value (what matters for forecasting future games, not describing past ones): **EPA/play (pass-weighted) > QB identity > ANY/A > opponent-adjusted point differential / Pythagorean > DVOA + success rate > pace/PPD/field position > wind > rest/HFA > turnovers / red-zone / fumble luck (regress hard) > special teams / O-line / coaching (mostly captured indirectly). The recurring theme: offense is more stable than defense, and the noisy stuff (turnovers, RZ TD%, fumble recoveries, close-game records) must be regressed toward the mean; use it to flag regression, not to project.**

This is the menu of inputs. For each: what it is, how predictive, and the caveats that govern how much to trust it.

Tier 1 — Highest predictive value

1. EPA per play (offense & defense, pass/rush split)

The most-cited single predictor. Passing EPA carries far more weight than rushing EPA; it is the backbone of modern power ratings.

- Offense is more stable than defense. Year-to-year correlation: off EPA/play r ≈ 0.377 vs def EPA/play r ≈ 0.322. Every defensive correlation is lower than its offensive counterpart → weight offense more, regress defense more.
- Caveat: it's a team/unit metric (can't cleanly isolate a player); needs sample to stabilize; the EP baseline drifts yearly with scoring environment.
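One way to act on "weight offense more, regress defense more" is to shrink each unit's EPA/play toward the league mean in proportion to its year-to-year correlation. The sketch below is illustrative only: using the quoted correlations directly as shrinkage factors is an assumption, and `regress_to_mean` and the sample team values are hypothetical, not part of our engine.

```python
# Year-to-year correlations quoted above. Using them directly as shrinkage
# factors is an assumption of this sketch, not a documented engine rule.
R_OFF_EPA = 0.377
R_DEF_EPA = 0.322


def regress_to_mean(observed: float, league_mean: float, r: float) -> float:
    """Shrink an observed per-play rate toward the league mean by factor r."""
    return league_mean + r * (observed - league_mean)


# Hypothetical team: +0.10 EPA/play on offense, -0.10 EPA/play allowed on
# defense, with the league mean taken as 0.0 for illustration.
off_proj = regress_to_mean(0.10, 0.0, R_OFF_EPA)
def_proj = regress_to_mean(-0.10, 0.0, R_DEF_EPA)

# The defensive figure keeps less of its observed edge than the offensive one.
print(f"offense: {off_proj:+.4f}, defense: {def_proj:+.4f}")
```

Because the defensive correlation is lower, the same observed edge carries forward less on defense than on offense.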

2. QB identity / QB injury

The largest discrete swing variable. Elite QB ≈ 5–7 pts on the spread; non-elite starter 3–4; backup downgrade swings 3–7 pts depending on drop-off.

- Caveat: once announced, the market prices it in seconds, so the obvious move leaves no residual edge. The edge is in anticipating it and correctly valuing the backup.

3. ANY/A (Adjusted Net Yards per Attempt)

Best simple passing/QB metric. (yds + 20·TD − 45·INT − sack_yds) / (att + sacks). Correlation with team wins ≈ 0.67; the higher-ANY/A team scored more in ~87% of games one season. Caveat: descriptive correlation, partly an EPA proxy.
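The formula above translates directly into a small helper. This is a minimal sketch; the function name `any_a` and the sample stat line are hypothetical and chosen only for illustration.

```python
def any_a(pass_yds: float, pass_td: int, ints: int,
          sack_yds: float, attempts: int, sacks: int) -> float:
    """Adjusted Net Yards per Attempt, using the formula given above."""
    dropbacks = attempts + sacks
    if dropbacks == 0:
        raise ValueError("ANY/A is undefined with zero attempts and sacks")
    return (pass_yds + 20 * pass_td - 45 * ints - sack_yds) / dropbacks


# Hypothetical season line: 4000 yds, 30 TD, 10 INT, 200 sack yds,
# 550 attempts, 30 sacks -> 3950 adjusted net yards over 580 dropbacks.
example = any_a(4000, 30, 10, 200, 550, 30)
print(round(example, 2))
```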

4. Pythagorean expectation

Point differential predicts future wins better than W-L record. NFL exponent 2.37: PF^2.37 / (PF^2.37 + PA^2.37). Teams whose record outran their differential regress down ~2 wins; underperformers gain ~1.2. Caveat: adjust point differential for opponent and garbage time first.
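As a rough sketch, the Pythagorean formula with the 2.37 exponent can be computed as below. The function names, the 17-game default, and the sample point totals are assumptions for illustration; per the caveat above, the inputs should already be opponent- and garbage-time-adjusted.

```python
def pythagorean_win_pct(points_for: float, points_against: float,
                        exponent: float = 2.37) -> float:
    """Expected win percentage: PF^e / (PF^e + PA^e)."""
    if points_for < 0 or points_against < 0:
        raise ValueError("point totals must be non-negative")
    pf_e = points_for ** exponent
    pa_e = points_against ** exponent
    if pf_e + pa_e == 0:
        return 0.5  # no points either way: treat as a coin flip
    return pf_e / (pf_e + pa_e)


def expected_wins(points_for: float, points_against: float,
                  games: int = 17, exponent: float = 2.37) -> float:
    """Scale the expected win percentage to a season of `games` games."""
    return games * pythagorean_win_pct(points_for, points_against, exponent)


# Hypothetical team: 450 points scored, 350 allowed.
pct = pythagorean_win_pct(450, 350)
print(f"{pct:.3f} win pct, {expected_wins(450, 350):.1f} expected wins")
```

Comparing `expected_wins` with the actual win total is one simple way to flag teams whose record has outrun their point differential.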

Tier 2 — Strong, with structure/adjustment required

5. DVOA (opponent-adjusted efficiency)

Per-play value vs. a baseline for the same down, distance, field position, then opponent-adjusted; RZ and late-close plays weighted more.

- Success baselines: 1st down ≥45% of needed yds, 2nd ≥60%, 3rd/4th = convert.
- Companions: DAVE (blends a preseason prior with in-season DVOA early in the year), Weighted DVOA (recency-weighted).
- Caveat: opponent adjustments are unreliable until ~Week 4–6; lean on the prior before then. (We already ingest DVOA into the prediction engine.)

6. Success rate

Binary "stayed on schedule" per play; lower variance than EPA → stabilizes faster → good early-season signal. Best paired with EPA (frequency vs. magnitude).
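A minimal sketch of the binary success flag, using the down-by-down baselines listed under DVOA above (45% of needed yards on 1st down, 60% on 2nd, conversion on 3rd/4th). The function names and the tuple layout `(down, yards_to_go, yards_gained)` are hypothetical, and touchdowns and penalties are ignored for simplicity.

```python
from typing import Iterable, Tuple

Play = Tuple[int, float, float]  # (down, yards_to_go, yards_gained)


def is_success(down: int, yards_to_go: float, yards_gained: float) -> bool:
    """True if the play 'stayed on schedule' under the baselines above."""
    if down == 1:
        return yards_gained >= 0.45 * yards_to_go
    if down == 2:
        return yards_gained >= 0.60 * yards_to_go
    if down in (3, 4):
        return yards_gained >= yards_to_go
    raise ValueError(f"invalid down: {down}")


def success_rate(plays: Iterable[Play]) -> float:
    """Share of plays that were successful."""
    plays = list(plays)
    if not plays:
        raise ValueError("no plays supplied")
    return sum(is_success(*play) for play in plays) / len(plays)


# Hypothetical drive: two successes (1st-and-10 for 5, 3rd-and-3 for 3)
# and two failures (2nd-and-5 for 2, 1st-and-10 for 4).
sample = [(1, 10, 5), (2, 5, 2), (3, 3, 3), (1, 10, 4)]
rate = success_rate(sample)
```

Since each play contributes only 0 or 1, one long gain cannot dominate the average the way it can with EPA, which is why the rate settles with fewer plays.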

7. Strength of schedule / opponent adjustment

Not predictive alone, but a necessary correction to every raw efficiency stat. Unadjusted EPA/DVOA/point-diff overrate teams that played weak slates.

8. Rest / travel / situational

Small and mostly priced. Bye edge collapsed post-2011: +2.21 → +0.31 pts. Short-week (<6 days) road after Week 6: 43.9% win / 47.4% ATS. West→East time-zone shifts affect cognition (hard to quantify). (Full table in Doc 2 §8.)

9. Home-field advantage

Shrunk: ~2.5 pts long-run, ~1.5 pts in recent seasons; home win rate fell from ~57–60% to ~52–53% since 2019. Use a current, decaying, ideally venue-specific constant rather than the historical 3 pts. (Doc 2 §6.)

Tier 3 — Real but high-variance / regress heavily / second-order

10. Turnovers / turnover margin — the classic trap

Important for past outcomes, nearly useless for prediction.

- Year-to-year turnover-margin correlation ≈ 0.10 (R² ≈ 0.01). A +20 team projects to just +2.2 next year.
- ~46% skill / 54% luck; fumble recovery is ~50/50 with ~zero carryover skill.
- Rule: regress turnover margin hard toward zero. Use it to identify regression candidates, not to project.
- Our engine: a turnovers_scale of 0.2, cap 0.8, which is already modest and appropriate. Don't increase it. Confidence also leans on turnover differential; consider down-weighting given its noise.
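The "regress hard toward zero" rule amounts to multiplying last season's margin by a small carryover coefficient. In the sketch below, `project_turnover_margin` and its `carryover` parameter are hypothetical names. Note that the +2.2 projection quoted for a +20 team corresponds to a coefficient of about 0.11, slightly above the rounded 0.10 correlation, so the coefficient is left as a parameter.

```python
def project_turnover_margin(observed_margin: float,
                            carryover: float = 0.11) -> float:
    """Project next season's turnover margin by shrinking toward zero.

    `carryover` is the fraction of the observed margin expected to persist.
    The 0.11 default is an assumption, backed out from the +20 -> +2.2
    example above.
    """
    if not 0.0 <= carryover <= 1.0:
        raise ValueError("carryover must be between 0 and 1")
    return carryover * observed_margin


# A +20 team keeps only about two turnovers of its edge.
print(round(project_turnover_margin(20), 1))        # carryover 0.11
print(round(project_turnover_margin(20, 0.10), 1))  # carryover 0.10
```

Either coefficient leads to the same conclusion: almost none of an extreme margin should be carried into a projection.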

11. Red-zone efficiency

Strong descriptively, regresses ~11–12%/yr for top teams. Use as a finishing-drives adjustment with heavy regression, not a raw input. (Our redzone_scale is 1.0 — make sure the underlying input is regressed.)

12. Special teams / field position

Small "hidden" yardage: ~0.03 EP/yard; +1 net punt yard ≈ +7.3 pts/season. Biggest single-drive predictor is starting field position. Non-trivial in close games; small share of total team value.

13. O-line / non-QB skill / individual defenders

Modest spread impact: skill players ~0.5–2.5 pts, defenders ~0.5–1 pt. O-line matters mainly via QB efficiency / sack rate, not as a standalone input — which is the right way to model it (we have line-play metrics).

14. Coaching / situational tendencies

Pace philosophy, pass-rate-over-expectation, 4th-down aggressiveness, blitz rate. Hard to quantify directly; best captured indirectly through the efficiency and pace metrics they produce. (We have a coaching weight at 0.6 — keep it small.)

Cross-cutting: recency weighting & priors

Recency weighting is standard. Recent games carry more weight (injuries, scheme, personnel). Methods: exponential decay, inverse decay such as weight = 1/(weeks_ago + 0.4), or linear (7, 6, 5, …). One optimization found the most recent ~5 weeks of spreads minimized MSE.
Early-season priors. Blend current results with a long-run prior (prior 3 seasons' ratings + market-implied odds), à la DAVE — the first ~4 weeks are too noisy to trust raw.
Regression to the mean governs trust per stat: strongest for turnovers, fumble recoveries, RZ TD%, close-game records (regress aggressively); weakest (most "real") for offensive pass EPA and ANY/A (trust more).
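The weighting schemes and the prior blend described above can be sketched as follows. Only the 1/(weeks_ago + 0.4) form and the 7, 6, 5, … linear sequence come from the text; the `half_life` and `prior_games` parameters, their defaults, and all function names are assumptions for illustration. Here `weeks_ago = 0` means the most recent game.

```python
def exponential_weight(weeks_ago: int, half_life: float = 4.0) -> float:
    """Exponential decay: the weight halves every `half_life` weeks (assumed)."""
    return 0.5 ** (weeks_ago / half_life)


def inverse_weight(weeks_ago: int, offset: float = 0.4) -> float:
    """Inverse decay: weight = 1 / (weeks_ago + 0.4)."""
    return 1.0 / (weeks_ago + offset)


def linear_weight(weeks_ago: int, start: int = 7) -> int:
    """Linear decay: 7, 6, 5, ... floored at zero."""
    return max(start - weeks_ago, 0)


def weighted_average(values, weights) -> float:
    """Weighted mean of per-game values."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total


def blend_with_prior(prior: float, observed: float, games_played: int,
                     prior_games: float = 4.0) -> float:
    """Early-season blend: the prior counts as `prior_games` games (assumed)."""
    w = games_played / (games_played + prior_games)
    return w * observed + (1 - w) * prior


# Hypothetical per-game ratings, most recent first.
ratings = [3.0, -1.0, 2.0]
weights = [linear_weight(k) for k in range(len(ratings))]  # [7, 6, 5]
recent_form = weighted_average(ratings, weights)
rating = blend_with_prior(prior=0.0, observed=recent_form, games_played=3)
```

Counting the prior as a fixed number of pseudo-games makes its influence fade as games accumulate, in the spirit of the DAVE-style blend mentioned above.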

The ranked summary (out-of-sample predictive value)

Rank	Variable	Use it for	Regress / adjust
1	EPA/play, pass-weighted (off > def)	core team strength	mild; regress def more
2	QB identity / starter-backup delta	spread (3–7 pts)	n/a (it's an event)
3	ANY/A	simple passing strength	mild
4	Opp-adj point diff / Pythagorean (exp 2.37)	record-independent strength	adjust opp + garbage time
5	DVOA + success rate	efficiency + stability	lean on prior pre–Wk 6
6	Pace / PPD / field position	totals core	mild
7	Wind ≥15–20 mph	totals (−2.7 pts at 20+ mph)	n/a (forecast)
8	Rest / HFA	small spread nudges	use current values
9	Turnovers / RZ TD% / fumble luck	regression flags only	hard
10	ST / O-line / non-QB skill / coaching	second-order, via efficiency	—

Net guidance for our weights file. Our biggest missing inputs are #2 (QB delta) and a proper #1/#4 opponent-adjusted rating spine. Our existing situational weights (turnovers, rest, divisional, coaching) are appropriately small and should stay small. The mistake to avoid is over-trusting Tier-3 noise.


→ Continue to Doc 5 — Modeling Methods.