02: Predicting Spreads (Margin of Victory)
TL;DR. A point spread is just `(Rating_home − Rating_away) + HFA`. The entire art is (a) producing good opponent-adjusted power ratings, (b) using a modern, venue-specific home-field number (~1.5–2.5, not 3), (c) layering small situational adjustments (QB is by far the biggest, ~3–7 pts; rest/bye are now ≤ ~0.5 pts each), and (d) converting the predicted margin into win/cover probabilities with a normal model (σ ≈ 13.5) that respects the key-number spikes at 3 and 7.
1. The rating → spread identity
Projected margin (home) = (Rating_home − Rating_away) + HFA
If ratings are "points vs. a league-average team," a +6 team hosting a −1 team
with HFA = 2 projects to win by (6 − (−1)) + 2 = 9 → fair spread −9. Every
public system differs only in how it derives the ratings:
| System | Basis | Notes |
|---|---|---|
| Sagarin PREDICTOR | points-based, optimized to forecast margins | the betting-relevant flavor (ELO_CHESS ignores margin) |
| Massey | least-squares/MLE over all results, points + yards, SoS-adjusted | carries prior-season info; outputs off/def/overall in points |
| 538 Elo | iterative game-by-game Elo | full spec in §3; ~25 Elo ≈ 1 point |
| ESPN FPI | EPA-based off/def/ST efficiency in net points | adds HFA, rest, travel, altitude, QB health; simulates season 10k× |
| Football Outsiders DVOA | per-play efficiency vs. situational baseline, opponent-adjusted | efficiency %, not points; must be regressed into a points rating |
Our engine doesn't have this identity. It adds points onto a scored/allowed baseline plus a "power rank slot" nudge. That conflates rating with recent scoring and never solves for opponent-adjusted strength league-wide. Doc 5/7 propose a ridge-regression rating as the new spine.
Sources: DVOA explained, Massey, ESPN FPI explainer.
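As a minimal sketch of the identity in this section (the function and argument names are illustrative and not taken from our engine), the projection and the resulting fair spread can be written as:

```python
def projected_home_margin(rating_home: float, rating_away: float, hfa: float) -> float:
    """Projected home margin of victory from two power ratings plus HFA."""
    return (rating_home - rating_away) + hfa


def fair_home_spread(rating_home: float, rating_away: float, hfa: float) -> float:
    """Fair spread quoted from the home side: a projected 9-point win is a -9 line."""
    return -projected_home_margin(rating_home, rating_away, hfa)


# Worked example from the text: a +6 team hosts a -1 team with HFA = 2.
margin = projected_home_margin(6.0, -1.0, 2.0)
spread = fair_home_spread(6.0, -1.0, 2.0)
print(margin, spread)
```

Everything downstream (QB deltas, rest, probabilities) is an adjustment to, or a transformation of, this one number.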
2. Two ways ratings get built (preview of Doc 5)
- Simultaneous least-squares (ridge) power ratings. Encode every game as one row: `+1` in the home team's column, `−1` in the away team's column, `1` in the HFA column; response = home margin of victory, so the fitted coefficients match the identity in §1. One regression recovers all 32 ratings + HFA at once. Ridge (L2) shrinkage is essential with only ~272 games/season. This style calls games right ~66% of the time and recovers HFA ≈ 2.5–3 historically.
- Iterative Elo (next section): cheap, online, self-correcting, with a margin-of-victory multiplier and preseason mean-reversion.
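A minimal sketch of the simultaneous ridge fit follows. It assumes the sign convention +1 home / −1 away with the home margin as the response, penalizes only the team columns (leaving HFA unshrunk is an assumption, not something the sources specify), and uses a small pure-Python solver so the example is self-contained; in practice a linear-algebra library would do this step. All function names are illustrative.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def ridge_power_ratings(games, lam=1.0):
    """games: iterable of (home_team, away_team, home_margin).

    Returns (ratings_by_team, hfa). Each game is a design-matrix row with
    +1 for the home team, -1 for the away team, and 1 in the HFA column.
    """
    games = list(games)
    teams = sorted({t for home, away, _ in games for t in (home, away)})
    col = {team: i for i, team in enumerate(teams)}
    n = len(teams) + 1                    # last column is the shared HFA term
    A = [[0.0] * n for _ in range(n)]     # accumulates X'X + lam * D
    rhs = [0.0] * n                       # accumulates X'y
    for home, away, margin in games:
        row = {col[home]: 1.0, col[away]: -1.0, n - 1: 1.0}
        for i, xi in row.items():
            rhs[i] += xi * margin
            for j, xj in row.items():
                A[i][j] += xi * xj
    for i in range(n - 1):
        A[i][i] += lam                    # assumption: shrink team ratings, not HFA
    beta = solve_linear(A, rhs)
    return {team: beta[col[team]] for team in teams}, beta[-1]


# Toy league: true ratings A=+4, B=0, C=-4, HFA=2, noiseless home-and-away schedule.
toy_games = [
    ("A", "B", 6), ("B", "A", -2),
    ("A", "C", 10), ("C", "A", -6),
    ("B", "C", 6), ("C", "B", -2),
]
ratings, hfa = ridge_power_ratings(toy_games, lam=1e-6)
print(ratings, hfa)
```

Without the penalty the team columns are only identified up to a constant (adding 1 to every rating changes no prediction), so the ridge term is also what makes the system solvable. On real data `lam` would be tuned by cross-validation.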
3. The 538 NFL Elo spec (fully documented, copyable)
The most completely published public spread engine. Constants are taken from
538's open-source forecast.py.
Win expectation (logistic, base-10, 400 divisor):
E_home = 1 / (1 + 10^(−Δ / 400)),  Δ = Elo_home + HFA_elo − Elo_away
Rating update:
Elo_new = Elo_old + K · MOV_mult · (S − E)
- K = 20
- HFA = 65 Elo points (≈ 2.6 on-field points at ~25 Elo ≈ 1 point). Note: some analysts argue ~50 Elo is closer to the modern reality; calibrate to data.
- `S` = actual result (1 win / 0 loss / 0.5 tie)
Margin-of-victory multiplier (with autocorrelation control):
MOV_mult = ln(|PD| + 1) · ( 2.2 / (2.2 + 0.001 · Δ_winner) )
- `ln(|PD| + 1)`: diminishing credit for blowouts (a 35-pt win is not 5× a 7-pt win).
- `2.2 / (2.2 + 0.001 · Δ_winner)`: autocorrelation adjustment, where Δ_winner is the winner's pre-game Elo edge. It shrinks the bonus when a big favorite wins big and inflates it when an underdog wins. Without it, strong teams' ratings spiral upward (rating and margin are correlated through team strength).
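The expectation, update, and margin-of-victory pieces above can be combined into one per-game update. This is a sketch of the formulas as written in this doc, not a copy of 538's code. Two assumptions are marked in comments: Δ_winner is taken to include HFA, and a tie gets a zero multiplier because ln(0 + 1) = 0 (the published code may handle ties differently).

```python
import math

K = 20.0              # update speed, per the spec above
HFA_ELO = 65.0        # home-field advantage in Elo points
ELO_PER_POINT = 25.0  # ~25 Elo per point of margin


def expected_home(elo_home, elo_away, hfa=HFA_ELO):
    """Home win expectation: logistic curve, base 10, 400 divisor."""
    delta = elo_home + hfa - elo_away
    return 1.0 / (1.0 + 10.0 ** (-delta / 400.0))


def mov_multiplier(point_diff, winner_elo_edge):
    """ln(|PD| + 1) scaled by the autocorrelation term."""
    return math.log(abs(point_diff) + 1.0) * (2.2 / (2.2 + 0.001 * winner_elo_edge))


def update_elo(elo_home, elo_away, home_pts, away_pts, hfa=HFA_ELO):
    """Return (new_home_elo, new_away_elo) after one game."""
    delta = elo_home + hfa - elo_away      # home edge; including HFA is an assumption
    point_diff = home_pts - away_pts
    if point_diff > 0:
        s_home, winner_edge = 1.0, delta
    elif point_diff < 0:
        s_home, winner_edge = 0.0, -delta
    else:
        s_home, winner_edge = 0.5, 0.0     # tie: multiplier is ln(1) = 0, so no change
    surprise = s_home - expected_home(elo_home, elo_away, hfa)
    shift = K * mov_multiplier(point_diff, winner_edge) * surprise
    return elo_home + shift, elo_away - shift  # zero-sum: the loser gives up what the winner gains


new_home, new_away = update_elo(1500.0, 1500.0, home_pts=24, away_pts=17)
print(round(new_home, 1), round(new_away, 1))
# Implied home spread in points for two equal teams, from the Elo edge alone:
print(round((1500.0 + HFA_ELO - 1500.0) / ELO_PER_POINT, 2))
```

The update is symmetric, so total Elo in the league is conserved within a season; only the preseason reversion moves the mean.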
Preseason reversion: `Elo_new = 1505 + (2/3) · (Elo_old − 1505)`. Keep 2/3 of
last year's rating and revert 1/3 toward the mean of 1505. (`REVERT = 1/3`.)
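A short sketch of the preseason reversion, using the constants quoted above; the team labels and ratings in the example are made up for illustration.

```python
ELO_MEAN = 1505.0
REVERT = 1.0 / 3.0


def preseason_revert(elo_old, mean=ELO_MEAN, revert=REVERT):
    """Keep (1 - revert) of last season's distance from the mean."""
    return mean + (1.0 - revert) * (elo_old - mean)


# Hypothetical end-of-season ratings.
last_season = {"strong": 1650.0, "average": 1505.0, "weak": 1400.0}
carried_over = {team: preseason_revert(elo) for team, elo in last_season.items()}
print(carried_over)
```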
QB adjustment (538's QB-adjusted Elo, also how ESPN FPI works): each QB
carries a rolling value rating; the team's effective Elo is shifted by
starter_rating − replacement_baseline. A QB injury moves the projected spread
by the difference between the two QBs' ratings; see §7.
Sources: 538 forecast.py, 538 methodology mirror, autocorrelation.
4. Key numbers: the most important thing about NFL margins
NFL scoring is built from 3 (FG) and 7 (TD+XP), so victory margins clump on those numbers and their combinations. The margin distribution is roughly normal in shape but has sharp spikes ("key numbers").
Approximate margin frequencies (modern era, ~last 25 seasons):
| Margin | ~% of games | Note |
|---|---|---|
| 3 | ~15% | by far the most common; over 1/7 of all games |
| 7 | ~9% | second |
| 6 | ~6–7% | third |
| 10 | ~6% | 3 + 7 |
| 14 | ~5% | two TDs |
| 4 | ~4% | declined after the 2015 XP-distance rule change |
| 1, 2, 5, 8 | ~3–4% each | |
| 11, 13, 17, 21 | secondary spikes | combinations of 3/7 |
| 15+ | ~28–29% | the long blowout tail |
3 + 6 + 7 alone ≈ 30% of all games.
Why this dominates pricing and conversion
- Pushes: at a spread of exactly 3, ~15–16% of games land on the number. Crossing a key number is worth far more than crossing a dead one: moving −3 → −2.5 (or +3 → +3.5) captures the entire 3-point bucket. A half-point right at 3 is worth roughly 3.8%. This is why books price −2.5/−3/−3.5 so differently and why bettors "buy off 3."
- Discreteness breaks the smooth normal model. A predicted margin of 2.9 vs. 3.1 has very different cover implications because of the spike at 3. Serious models either (a) use the empirical discrete margin distribution directly, or (b) use a normal approximation plus key-number corrections layered on top (nfelo's approach).
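One way to sketch option (b) is to discretize the normal model onto integer margins, reweight the key numbers, and renormalize. The boost factors below are hypothetical placeholders chosen only to show the mechanics; they are not nfelo's values and would need fitting to historical margins. Reweighting also nudges the mean slightly, which a production version would correct for.

```python
import math

SIGMA = 13.5
# Hypothetical multipliers on |margin|; 0 is damped because ties are rare.
KEY_BOOST = {0: 0.1, 3: 2.5, 6: 1.2, 7: 1.6, 10: 1.2, 14: 1.1}


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def margin_pmf(mu, sigma=SIGMA, lo=-60, hi=60):
    """Probability of each integer home margin, given projected margin mu."""
    weights = {}
    for m in range(lo, hi + 1):
        # Normal mass in the unit-wide bin centered on m.
        base = normal_cdf((m + 0.5 - mu) / sigma) - normal_cdf((m - 0.5 - mu) / sigma)
        weights[m] = base * KEY_BOOST.get(abs(m), 1.0)
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}


def cover_push_lose(mu, line):
    """line = points the home team must win by, e.g. 3 for home -3, 2.5 for home -2.5."""
    pmf = margin_pmf(mu)
    cover = sum(p for m, p in pmf.items() if m > line)
    push = sum(p for m, p in pmf.items() if m == line)
    return cover, push, 1.0 - cover - push


for line in (2.5, 3, 3.5):
    print(line, [round(x, 3) for x in cover_push_lose(3.0, line)])
```

Comparing the three lines shows the point of the bullets above: the whole mass sitting on exactly 3 moves between "cover", "push", and "lose" as the line crosses the key number.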
Our engine ignores key numbers entirely. It outputs a margin and clamps scores; it never asks "how likely is exactly 3?" This is the single biggest probability-layer gap. Doc 7 Phase 3 adds a margin-distribution module.
Sources: WalterFootball margins, BetMGM common margins, SportsInsights key numbers, nfelo margin probabilities.
5. Converting spread → win probability → moneyline
Model the final margin as Normal(mean = spread, σ ≈ 13.5) (published estimates: 13.45–13.86; use ~13.5 as the working value).
Win probability of a team favored by p points:
P(win) = Φ(p / σ),  σ ≈ 13.5,  Φ = standard-normal CDF
Example: favored by 7 → z = 7/13.5 = 0.519 → Φ(0.519) ≈ 69.8% (~70%).
ATS / cover probability when your model predicts margin μ vs. market spread m (both expressed as expected margin for the same team, so a −3 favorite has m = 3):
P(cover) = Φ((μ − m) / σ)
Over probability when you project total T̂ vs. market total L (σ_total ≈ 10):
P(over) = Φ((T̂ − L) / 10)
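The three normal-model formulas above translate directly into code. A minimal sketch, assuming both margins passed to `cover_prob` are expressed for the same team (a −3 favorite has a market margin of 3); the function names are illustrative.

```python
from statistics import NormalDist

SIGMA_MARGIN = 13.5   # working value for final-margin spread
SIGMA_TOTAL = 10.0    # working value for totals, as quoted above
_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def win_prob(points_favored, sigma=SIGMA_MARGIN):
    """P(win) for a team favored by `points_favored` (negative for an underdog)."""
    return _STD_NORMAL.cdf(points_favored / sigma)


def cover_prob(model_margin, market_margin, sigma=SIGMA_MARGIN):
    """P(cover) when the model projects `model_margin` against a market margin."""
    return _STD_NORMAL.cdf((model_margin - market_margin) / sigma)


def over_prob(projected_total, market_total, sigma=SIGMA_TOTAL):
    """P(over) when the model projects `projected_total` against a market total."""
    return _STD_NORMAL.cdf((projected_total - market_total) / sigma)


print(round(win_prob(7.0), 3))          # the favored-by-7 example from the text
print(round(cover_prob(5.0, 3.0), 3))   # model says 5, market says 3
print(round(over_prob(47.0, 44.0), 3))  # model total 47 vs. market 44
```

These are smooth-normal probabilities only; they do not account for pushes or the key-number spikes from §4.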
Probability → fair moneyline:
p ≥ 0.5 (favorite): ML = −100 · p / (1 − p)
p < 0.5 (underdog): ML = +100 · (1 − p) / p
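A small sketch of the probability-to-moneyline conversion and its inverse. These are fair (no-vig) prices; the function names are illustrative.

```python
def prob_to_moneyline(p):
    """Fair American moneyline for win probability p (0 < p < 1)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if p >= 0.5:
        return -100.0 * p / (1.0 - p)   # favorite: negative price
    return 100.0 * (1.0 - p) / p        # underdog: positive price


def moneyline_to_prob(ml):
    """Inverse: implied win probability from a fair American moneyline."""
    if ml < 0:
        return -ml / (-ml + 100.0)
    return 100.0 / (ml + 100.0)


for p in (0.594, 0.752, 0.248):
    print(p, round(prob_to_moneyline(p)))
```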
Empirical reference table (Boyd's Bets, all games since 1980; note these run hotter than the pure σ = 13.5 model, e.g. 75.2% vs. ~69.8% at a spread of 7, because they reflect realized outcomes and historical rates):
| Spread | Fav win % | Fair fav ML | Dog win % | Fair dog ML |
|---|---|---|---|---|
| 1 | 51.3% | −105 | 48.8% | +105 |
| 3 | 59.4% | −146 | 40.6% | +146 |
| 7 | 75.2% | −303 | 24.8% | +303 |
| 10 | 83.6% | −510 | 16.4% | +510 |
| 14 | 92.4% | −1216 | 7.6% | +1216 |
Decide one source of truth. The pure normal model and the empirical table disagree, by several percentage points at larger spreads; pick one and use it consistently for calibration. Refinement: σ actually varies by spread size, so a per-spread σ table is better than one constant.
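One way to see the disagreement, and a possible starting point for a per-spread σ table, is to back out the σ that each row of the empirical table would imply under P(win) = Φ(p / σ). This is a sketch of that inversion only; whether implied σ is the right way to build the table is an assumption to validate against data.

```python
from statistics import NormalDist

# Favorite win rates copied from the empirical table above.
EMPIRICAL_FAV_WIN = {1: 0.513, 3: 0.594, 7: 0.752, 10: 0.836, 14: 0.924}


def implied_sigma(spread, fav_win_prob):
    """Solve fav_win_prob = Phi(spread / sigma) for sigma."""
    z = NormalDist().inv_cdf(fav_win_prob)
    return spread / z


sigma_by_spread = {s: implied_sigma(s, p) for s, p in EMPIRICAL_FAV_WIN.items()}
for spread, sigma in sigma_by_spread.items():
    print(spread, round(sigma, 1))
```

Small spreads are unstable here because the win rate sits close to 50% and z is near zero, so a real table would need smoothing or pooling across neighboring spreads.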
Sources: Boyd's spreadβML, arXiv 2212.08116, PFR win prob.
6. Home-field advantage (HFA)
It has shrunk and it varies by venue. Do not hardcode a flat 3.
- Historical: ~3 points; home teams won ~57%.
- Modern: markets price ~1.5, applied roughly 1.5–2.5 by venue. Home win rate has fallen to ~52–53%. 2024 home teams went 127-125-1 (barely .500).
- 2020 no-fans natural experiment: HFA fell to ~0.1 points with empty stadiums; within-2020 splits (≈ 54% home wins with fans vs. ~47% without) are strong evidence that the mechanism is crowd noise → false starts / penalties / communication + reduced ref bias, not travel/familiarity.
- Venue variation: KC (Arrowhead) went 56-15 (78.9%) at home 2018–24; loudest venues (PIT, KC, PHI, GB, CLE) cost visiting offenses ~3.8% completion / ~18–24 passing yards per game. Denver's altitude is historically the single biggest venue edge.
Our weight: a `home_advantage` value of 1.0 × per-team home advantage, capped at 2.5. The cap is at the high end of modern reality. Action: lower the default toward ~1.5–2.0 and let the optimizer tune; ideally make HFA venue-specific rather than one cap.
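A venue-specific HFA could be as simple as a lookup with a default and a clamp. The sketch below is illustrative only: the table values are placeholders rather than estimates, and `hfa_for` is a hypothetical helper, not an existing function in our engine.

```python
DEFAULT_HFA = 1.75               # midpoint of the ~1.5-2.0 range suggested above
HFA_FLOOR, HFA_CAP = 0.0, 2.5    # cap mirrors the current 2.5 limit

# Placeholder values for illustration; real numbers would come from the optimizer.
VENUE_HFA = {"KC": 2.4, "DEN": 2.5, "NEUTRAL": 0.0}


def hfa_for(venue, table=VENUE_HFA, default=DEFAULT_HFA):
    """Look up a venue's HFA, fall back to the default, and clamp to [floor, cap]."""
    raw = table.get(venue, default)
    return max(HFA_FLOOR, min(HFA_CAP, raw))


print(hfa_for("KC"), hfa_for("SOMEWHERE_ELSE"), hfa_for("NEUTRAL"))
```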
Sources: NFL Ops (2020 HFA), arXiv 2104.11595, Covers HFA, Sharp Football toughest stadiums.
7. Quarterback impact (the biggest single-player lever)
The QB is the only position that reliably moves an NFL spread on its own.
- Elite QBs ≈ 7 points. Recent oddsmaker surveys: Josh Allen ~6.98, Mahomes ~6.94 (an earlier survey had Mahomes ~7.5).
- General starter loss: ~3–7 points; the value depends as much on the backup drop-off as the starter's quality. A great starter with a competent backup can move the line less than a good starter with a replacement-level backup.
- Concrete: Andrew Luck's retirement moved IND vs. LAC from +3 to +7 (a ~4 pt shift to Brissett).
This is exactly the lever 538 QB-Elo and ESPN FPI pull: carry a QB rating, adjust
team strength by starter − replacement, re-derive the spread.
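In points terms, the lever reduces to a delta between the QB who is baked into the team rating and the QB who actually plays. A minimal sketch, with hypothetical QB values (points over replacement) chosen only so the example reproduces the ~4-point Luck-to-Brissett move described above:

```python
def qb_adjusted_margin(base_margin, usual_qb_value, playing_qb_value):
    """Shift a team's projected margin when someone other than the usual starter plays.

    Both QB values are in points over a replacement-level baseline, so only
    their difference matters.
    """
    return base_margin + (playing_qb_value - usual_qb_value)


# Hypothetical values: starter worth 5.0, backup worth 1.0.
# The team was a 3-point underdog (margin -3) with the starter.
print(qb_adjusted_margin(-3.0, usual_qb_value=5.0, playing_qb_value=1.0))
```

Because only the difference enters, a great starter with a competent backup can move the line less than a good starter with a replacement-level backup.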
Our engine has no QB model. It captures rookies, coaching, and "FG team impact" but not a starter-vs-backup QB delta. This is likely our highest-ROI missing feature. Doc 7 Phase 2.
Sources: theScore QB values, Yahoo oddsmaker QB survey, Boyd's player values.
8. Schedule & situational adjustments (small, and smaller than they used to be)
Modern rule changes (2011 CBA practice limits, 2020 extensions) have shrunk almost every rest/schedule edge to ≤ ~0.5 points. The old "+3 off a bye" is obsolete.
| Factor | Modern value | Source |
|---|---|---|
| Off a bye | ~+0.31 pts (was +2.2 pre-2011), not significant | Frontiers 2024 |
| Mini-bye (post-Thursday, 10 days) | ~+0.48, not significant | Frontiers 2024 |
| Facing a team off MNF | ~+0.37 (market), ~+0.18 (actual) | Frontiers 2024 |
| Short-week (<6 days) road, late season | won ~43.9%, covered ~47.4% | TheLines |
| Extra-rest road team | won ~46.9%, covered ~53.3% | TheLines |
| Divisional games | margins compress; road dogs "live" (well under 1 pt) | n/a |
| Letdown / lookahead spots | favorite ~0.5–1.5 pts worse in the spot | n/a |
Our weights are roughly consistent already: `rest_per_day` 0.07 capped at 0.7, `divisional_compression` 0.0 (Optuna-decided). Good; keep these small. Don't inflate them. The win is QB + power ratings + the probability layer, not bigger situational nudges.
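For reference, a capped per-day rest weight behaves as sketched below. The sketch assumes the weight is applied to the home-minus-away rest-day differential with a symmetric cap; how our engine actually applies `rest_per_day` may differ.

```python
REST_PER_DAY = 0.07   # points per day of rest differential
REST_CAP = 0.7        # maximum absolute adjustment in points


def rest_adjustment(home_rest_days, away_rest_days, per_day=REST_PER_DAY, cap=REST_CAP):
    """Points added to the home margin for a rest edge, clamped to [-cap, +cap]."""
    raw = per_day * (home_rest_days - away_rest_days)
    return max(-cap, min(cap, raw))


print(round(rest_adjustment(14, 7), 2))   # home team off a bye vs. normal week
print(round(rest_adjustment(6, 20), 2))   # extreme differential, hits the cap
```

A standard bye-versus-normal-week edge of 7 days gives about half a point under these weights, in line with the modern values in the table above.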
Sources: Frontiers ("Bye-Bye, Bye Advantage"), TheLines rest advantage.
9. Implications for our engine
- Replace the scored/allowed baseline with a proper opponent-adjusted power rating (ridge regression, Doc 5). The current baseline is not opponent-adjusted and double-counts with the power-rank nudge.
- Add a QB starter-vs-backup delta (~3–7 pts), driven by depth-chart data we already scrape via the roster-data pipeline. Biggest gap.
- Lower/regionalize HFA. Default toward ~1.7; ideally a per-venue table.
- Add the probability layer: `Φ((μ − m)/σ)` for cover/win and a key-number-aware margin distribution, not just a clamped score.
- Keep situational adjustments small: the research validates our existing modest caps; resist inflating them.
→ Continue to Doc 3: Predicting Totals.