
02 — Predicting Spreads (Margin of Victory)

TL;DR. A point spread is just (Rating_home − Rating_away) + HFA. The entire art is (a) producing good opponent-adjusted power ratings, (b) using a modern, venue-specific home-field number (~1.5–2.5, not 3), (c) layering small situational adjustments (QB is by far the biggest, ~3–7 pts; rest/bye are now ≤~0.5 pts each), and (d) converting the predicted margin into win/cover probabilities with a normal model (σ ≈ 13.5) that respects the key-number spikes at 3 and 7.

1. The rating → spread identity

Projected margin (home) = (Rating_home − Rating_away) + HFA

If ratings are "points vs. a league-average team," a +6 team hosting a −1 team with HFA = 2 projects to win by (6 − (−1)) + 2 = 9 → fair spread −9. Every public system differs only in how it derives the ratings:

System	Basis	Notes
Sagarin PREDICTOR	points-based, optimized to forecast margins	the betting-relevant flavor (ELO_CHESS ignores margin)
Massey	least-squares/MLE over all results, points + yards, SoS-adjusted	carries prior-season info; outputs off/def/overall in points
538 Elo	iterative game-by-game Elo	full spec in §3; ~25 Elo ≈ 1 point
ESPN FPI	EPA-based off/def/ST efficiency in net points	adds HFA, rest, travel, altitude, QB health; simulates season 10k×
Football Outsiders DVOA	per-play efficiency vs. situational baseline, opponent-adjusted	efficiency %, not points — must be regressed into a points rating
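A minimal sketch of the identity at the top of this section, assuming ratings are expressed in points vs. a league-average team. The function names are illustrative, not engine code.

```python
def projected_home_margin(rating_home: float, rating_away: float, hfa: float) -> float:
    """Projected home margin in points: (Rating_home - Rating_away) + HFA."""
    return (rating_home - rating_away) + hfa


def fair_home_spread(rating_home: float, rating_away: float, hfa: float) -> float:
    """Fair spread quoted from the home side: negative when the home team is favored."""
    return -projected_home_margin(rating_home, rating_away, hfa)


# The worked example from this section: a +6 team hosting a -1 team with HFA = 2.
margin = projected_home_margin(6.0, -1.0, hfa=2.0)   # 9.0
spread = fair_home_spread(6.0, -1.0, hfa=2.0)        # -9.0
```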

Our engine does not implement this identity. It adds points onto a scored/allowed baseline plus a "power rank slot" nudge, which conflates rating with recent scoring and never solves for opponent-adjusted strength league-wide. Doc 5/7 propose a ridge-regression rating as the new spine.

Sources: DVOA explained, Massey, ESPN FPI explainer.

2. Two ways ratings get built (preview of Doc 5)

Simultaneous least-squares (ridge) power ratings. Encode every game as one row: +1 in the home team's column, −1 in the away team's column, 1 in the HFA column; response = home margin of victory, so the fit matches the identity in §1. One regression recovers all 32 ratings + HFA at once. Ridge (L2) shrinkage is essential with only ~272 games/season. This style calls games right ~66% of the time and recovers HFA ≈ 2.5–3 historically.
Iterative Elo (next section): cheap, online, self-correcting, with a margin-of-victory multiplier and preseason mean-reversion.
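The simultaneous least-squares approach can be sketched with a closed-form ridge solve in NumPy. This is an illustration under stated assumptions, not the Doc 5 design: the function name, the `lam` penalty value, and the choice to leave the HFA coefficient unpenalized are all assumptions of this sketch. The response is the home margin, with +1 for the home team, −1 for the away team, and 1 for HFA.

```python
import numpy as np


def fit_ridge_ratings(games, teams, lam=1.0):
    """Fit team ratings and HFA by ridge regression on home margins.

    games: list of (home_team, away_team, home_margin) tuples.
    lam:   L2 penalty applied to team ratings only (HFA is not shrunk here).
    """
    idx = {team: i for i, team in enumerate(teams)}
    n = len(teams)
    X = np.zeros((len(games), n + 1))
    y = np.zeros(len(games))
    for row, (home, away, margin) in enumerate(games):
        X[row, idx[home]] = 1.0    # home team column
        X[row, idx[away]] = -1.0   # away team column
        X[row, n] = 1.0            # HFA column
        y[row] = margin
    penalty = lam * np.eye(n + 1)
    penalty[n, n] = 0.0            # assumption: do not penalize HFA
    beta = np.linalg.solve(X.T @ X + penalty, X.T @ y)
    return dict(zip(teams, beta[:n])), beta[n]


# Synthetic, noise-free double round-robin with known ratings and HFA = 2.
teams = ["A", "B", "C", "D"]
true = {"A": 4.0, "B": 1.0, "C": -2.0, "D": -3.0}
games = [(h, a, true[h] - true[a] + 2.0) for h in teams for a in teams if h != a]
ratings, hfa = fit_ridge_ratings(games, teams, lam=1e-6)
```

With a tiny penalty and noise-free data the fit returns the true ratings and HFA almost exactly. On real results `lam` would need tuning (for example by cross-validation), and the penalty is also what pins the otherwise arbitrary rating offset to a zero league average.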

3. The 538 NFL Elo spec (fully documented, copyable)

The most completely published public spread engine. Constants are taken from 538's open-source forecast.py.

Win expectation (logistic, base-10, 400 divisor):

E_home = 1 / (1 + 10^(-(Δ) / 400)),   Δ = Elo_home + HFA_elo − Elo_away

Rating update:

Elo_new = Elo_old + K · MOV_mult · (S − E)

K = 20
HFA = 65 Elo points (≈ 2.6 on-field points at 25 Elo ≈ 1 point). Note: some analysts argue ~50 Elo is closer to the modern reality, so calibrate to data.
S = actual result (1 / 0 / 0.5)

Margin-of-victory multiplier (with autocorrelation control):

MOV_mult = ln(|PD| + 1) · ( 2.2 / (2.2 + 0.001 · Δ_winner) )

ln(|PD|+1) — diminishing credit for blowouts (a 35-pt win is not 5× a 7-pt win).
2.2 / (2.2 + 0.001·Δ_winner) — autocorrelation adjustment: shrinks the bonus when a big favorite wins big, inflates it when an underdog wins. Without it, strong teams' ratings spiral upward (rating and margin are correlated through team strength).
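The win expectation, update rule, and margin-of-victory multiplier above combine into a single per-game update. The sketch below follows the formulas as written in this section; it is not a copy of forecast.py. Two assumptions are made here: Δ_winner includes the HFA term, and a tie is scored from a zero Elo difference. Under the formula as written, a tie gives ln(1) = 0 and therefore no rating change.

```python
import math

K = 20.0          # update speed, from the spec above
HFA_ELO = 65.0    # home-field advantage in Elo points, from the spec above


def home_win_expectation(elo_home, elo_away, hfa_elo=HFA_ELO):
    """E_home = 1 / (1 + 10^(-delta / 400)), delta = Elo_home + HFA - Elo_away."""
    delta = elo_home + hfa_elo - elo_away
    return 1.0 / (1.0 + 10.0 ** (-delta / 400.0))


def mov_multiplier(point_diff, winner_elo_diff):
    """ln(|PD| + 1) scaled by the autocorrelation term 2.2 / (2.2 + 0.001 * diff)."""
    return math.log(abs(point_diff) + 1.0) * (2.2 / (2.2 + 0.001 * winner_elo_diff))


def update_elo(elo_home, elo_away, home_pts, away_pts, hfa_elo=HFA_ELO, k=K):
    """Return (new_home_elo, new_away_elo) after one game."""
    delta_home = elo_home + hfa_elo - elo_away
    expected_home = home_win_expectation(elo_home, elo_away, hfa_elo)
    point_diff = home_pts - away_pts
    if point_diff > 0:
        result_home, delta_winner = 1.0, delta_home
    elif point_diff < 0:
        result_home, delta_winner = 0.0, -delta_home
    else:
        result_home, delta_winner = 0.5, 0.0
    shift = k * mov_multiplier(point_diff, delta_winner) * (result_home - expected_home)
    # The update is zero-sum: what the home team gains, the away team loses.
    return elo_home + shift, elo_away - shift


new_home, new_away = update_elo(1500.0, 1500.0, home_pts=27, away_pts=20)
```

For two 1500-rated teams and a 27-20 home win, the home side gains roughly 16 Elo points and the visitor loses the same amount.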

Preseason reversion: Elo_new = 1505 + (2/3)·(Elo_old − 1505) — keep 2/3 of last year, revert 1/3 to the mean. (REVERT = 1/3.)
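The reversion step and the Elo-to-points rule of thumb are small enough to show directly. The constant and function names are illustrative; the values come from this section.

```python
MEAN_ELO = 1505.0
REVERT = 1.0 / 3.0          # share of the gap to the mean that is given back
ELO_PER_POINT = 25.0        # ~25 Elo per on-field point, per this doc


def preseason_elo(elo_old: float) -> float:
    """Keep 2/3 of last season's distance from the mean."""
    return MEAN_ELO + (1.0 - REVERT) * (elo_old - MEAN_ELO)


def elo_diff_to_points(elo_diff: float) -> float:
    """Convert an Elo difference (HFA already included) into a point margin."""
    return elo_diff / ELO_PER_POINT


carried = preseason_elo(1655.0)          # 150 above the mean reverts to about 100 above
margin = elo_diff_to_points(100.0)       # 4.0 points
```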

QB adjustment (538's QB-adjusted Elo, also how ESPN FPI works): each QB carries a rolling value rating; the team's effective Elo is shifted by starter_rating − replacement_baseline. A QB injury moves the projected spread by the difference between the two QBs' ratings — see §7.

Sources: 538 forecast.py, 538 methodology mirror, autocorrelation.

4. Key numbers — the most important thing about NFL margins

NFL scoring is built from 3 (FG) and 7 (TD+XP), so victory margins clump on those numbers and their combinations. The margin distribution is roughly normal in shape but has sharp spikes ("key numbers").

Approximate margin frequencies (modern era, ~last 25 seasons):

Margin	~% of games	Note
3	~15%	by far the most common — over 1/7 of all games
7	~9%	second
6	~6–7%	third
10	~6%	3 + 7
14	~5%	two TDs
4	~4%	declined after the 2015 XP-distance rule change
1, 2, 5, 8	~3–4% each
11, 13, 17, 21	secondary spikes	combinations of 3/7
15+	~28–29%	the long blowout tail

3 + 6 + 7 alone ≈ 30% of all games.

Why this dominates pricing and conversion

Pushes: at a spread of exactly 3, ~15–16% of games land on the number. Crossing a key number is worth far more than crossing a dead one: moving −3 → −2.5 (or +3 → +3.5) captures the entire 3-point bucket. A half-point right at 3 is worth roughly 3.8%. This is why books price −2.5/−3/−3.5 so differently and why bettors "buy off 3."
Discreteness breaks the smooth normal model. A predicted margin of 2.9 vs. 3.1 has very different cover implications because of the spike at 3. Serious models either (a) use the empirical discrete margin distribution directly, or (b) use a normal approximation plus key-number corrections layered on top (nfelo's approach).
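A sketch of option (b): discretize the normal model onto integer margins, reweight the key numbers, and renormalize. It illustrates the general idea only and is not nfelo's actual method. The `KEY_BOOST` multipliers are hypothetical placeholders and would need to be fit to historical margin frequencies. `line` is the margin the team must beat, so a −3 favorite has `line = 3`.

```python
from statistics import NormalDist

# HYPOTHETICAL multipliers for illustration only; fit these to historical data.
KEY_BOOST = {0: 0.1, 3: 2.5, 6: 1.3, 7: 1.7, 10: 1.3, 14: 1.2}


def margin_pmf(mu, sigma=13.5, max_margin=70):
    """Discrete distribution over integer margins with key-number reweighting."""
    base = NormalDist(mu, sigma)
    weights = {}
    for k in range(-max_margin, max_margin + 1):
        mass = base.cdf(k + 0.5) - base.cdf(k - 0.5)   # normal mass near integer k
        weights[k] = mass * KEY_BOOST.get(abs(k), 1.0)
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


def cover_push_probs(mu, line, sigma=13.5):
    """Return (P(margin > line), P(margin == line)) for projected margin mu."""
    pmf = margin_pmf(mu, sigma)
    p_cover = sum(p for k, p in pmf.items() if k > line)
    p_push = pmf.get(line, 0.0)    # zero for half-point lines
    return p_cover, p_push


cover_at_3, push_at_3 = cover_push_probs(mu=4.0, line=3)
cover_at_2_5, _ = cover_push_probs(mu=4.0, line=2.5)
```

Moving the line from 3 to 2.5 converts the whole push bucket into covers, so `cover_at_2_5` equals `cover_at_3 + push_at_3`. Reweighting shifts the distribution's mean slightly, so a production version would need to re-center or fit the weights jointly with μ.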

Our engine ignores key numbers entirely. It outputs a margin and clamps scores; it never asks "how likely is exactly 3?" This is the single biggest probability-layer gap. Doc 7 Phase 3 adds a margin-distribution module.

Sources: WalterFootball margins, BetMGM common margins, SportsInsights key numbers, nfelo margin probabilities.

5. Converting spread ↔ win probability ↔ moneyline

Model the final margin as Normal(mean = spread, σ ≈ 13.5) (published estimates: 13.45–13.86; use ~13.5 as the working value).

Win probability of a team favored by p points:

P(win) = Φ(p / σ)        σ ≈ 13.5,  Φ = standard-normal CDF

Example: favored by 7 → z = 7/13.5 = 0.519 → Φ(0.519) ≈ 69.8% (~70%).

ATS / cover probability when your model predicts margin μ for a team and the market spread implies margin m for that same team (a −3 favorite has m = +3):

P(cover) = Φ((μ − m) / σ)

Over probability when you project total T̂ vs. market total L (σ_total ≈ 10):

P(over) = Φ((T̂ − L) / 10)
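The three normal-model conversions above, as a minimal sketch using only the standard library. The function names are illustrative; the σ values are the working values quoted in this section.

```python
from statistics import NormalDist

PHI = NormalDist().cdf      # standard-normal CDF
SIGMA_MARGIN = 13.5
SIGMA_TOTAL = 10.0


def win_prob(points_favored, sigma=SIGMA_MARGIN):
    """P(win) for a team favored by `points_favored` (negative for an underdog)."""
    return PHI(points_favored / sigma)


def cover_prob(model_margin, market_margin, sigma=SIGMA_MARGIN):
    """P(cover): both margins are from the same team's point of view."""
    return PHI((model_margin - market_margin) / sigma)


def over_prob(projected_total, market_total, sigma=SIGMA_TOTAL):
    """P(over) given a projected total and the market total."""
    return PHI((projected_total - market_total) / sigma)


print(round(win_prob(7), 3))   # 0.698, matching the example above
```

These are smooth approximations: none of them accounts for pushes or the key-number spikes described in §4.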

Probability → fair moneyline:

p ≥ 0.5 (favorite):  ML = −100 · p / (1 − p)
p < 0.5 (underdog):  ML = +100 · (1 − p) / p
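The moneyline formulas above, plus their inverse, sketched as two small functions. The names are illustrative, and the output is a fair (no-vig) price.

```python
def fair_moneyline(p: float) -> float:
    """Fair American moneyline for win probability p, with 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if p >= 0.5:
        return -100.0 * p / (1.0 - p)      # favorite: negative price
    return 100.0 * (1.0 - p) / p           # underdog: positive price


def implied_prob(moneyline: float) -> float:
    """Inverse mapping: no-vig win probability implied by an American moneyline."""
    if moneyline < 0:
        return -moneyline / (-moneyline + 100.0)
    return 100.0 / (moneyline + 100.0)


fav = round(fair_moneyline(0.594))    # -146
dog = round(fair_moneyline(0.406))    # 146
```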

Empirical reference table (Boyd's Bets, all games since 1980). Note: from a spread of 3 upward these run hotter than the pure σ = 13.5 model, which gives ≈ 58.8% at 3, ≈ 69.8% at 7, and ≈ 85.0% at 14, because they reflect realized outcomes and historical rates:

Spread	Fav win %	Fair fav ML	Dog win %	Fair dog ML
1	51.3%	−105	48.8%	+105
3	59.4%	−146	40.6%	+146
7	75.2%	−303	24.8%	+303
10	83.6%	−510	16.4%	+510
14	92.4%	−1216	7.6%	+1216

Decide one source of truth. The pure normal model and the empirical table disagree: by under 1 percentage point at a spread of 3, but by roughly 5–7 points at spreads of 7–14. Pick one and use it consistently for calibration. Refinement: σ actually varies by spread size, so a per-spread σ table is better than one constant.
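One way to see how σ varies with spread size is to back out the σ that each empirical row implies, by inverting P(win) = Φ(p / σ). This is a sketch using the favorite win rates from the table above; the function name is illustrative.

```python
from statistics import NormalDist

# Favorite win rates by spread, copied from the empirical table above.
EMPIRICAL_FAV_WIN = {1: 0.513, 3: 0.594, 7: 0.752, 10: 0.836, 14: 0.924}


def implied_sigma(spread: float, fav_win_prob: float) -> float:
    """Sigma that makes Phi(spread / sigma) equal the observed win rate.

    Undefined at fav_win_prob == 0.5, where the inverse CDF is 0.
    """
    return spread / NormalDist().inv_cdf(fav_win_prob)


sigma_by_spread = {s: implied_sigma(s, p) for s, p in EMPIRICAL_FAV_WIN.items()}
for s, sig in sigma_by_spread.items():
    print(f"spread {s:>2}: implied sigma = {sig:.1f}")
```

With these inputs the implied σ is roughly 10 for spreads of 7 and up, noticeably below 13.5. The estimate at a spread of 1 is unstable because the win rate sits so close to 50%.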

Sources: Boyd's spread→ML, arXiv 2212.08116, PFR win prob.

6. Home-field advantage (HFA)

It has shrunk and it varies by venue. Do not hardcode a flat 3.

Historical: ~3 points; home teams won ~57%.
Modern: markets price ~1.5, applied roughly 1.5–2.5 by venue. Home win rate has fallen to ~52–53%. 2024 home teams went 127-125-1 (barely .500).
2020 no-fans natural experiment: HFA fell to ~0.1 points with empty stadiums; within-2020 splits ≈ 54% home wins with fans vs. ~47% without — strong evidence the mechanism is crowd noise → false starts / penalties / communication + reduced ref bias, not travel/familiarity.
Venue variation: KC (Arrowhead) went 56-15 (78.9%) at home 2018–24; loudest venues (PIT, KC, PHI, GB, CLE) cost visiting offenses ~3.8% completion / ~18–24 passing yards per game. Denver's altitude is historically the single biggest venue edge.

Our weight: home_advantage = 1.0, applied as a multiplier on the per-team home advantage, with a cap of 2.5 points. The cap is at the high end of modern reality. Action: lower the default toward ~1.5–2.0 and let the optimizer tune; ideally make HFA venue-specific rather than one cap.
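A venue-specific HFA could be as simple as a lookup with a league-wide fallback. In this sketch the per-venue numbers are placeholders, not estimates. The names `VENUE_HFA`, `DEFAULT_HFA`, and `home_field_points` are hypothetical, and applying the cap after the weight is an assumption about how the existing cap behaves.

```python
DEFAULT_HFA = 1.7    # league-wide fallback in points, within the range suggested above
HFA_CAP = 2.5        # existing cap, in points

# PLACEHOLDER values keyed by home team; populate from a fitted per-venue table.
VENUE_HFA = {"DEN": 2.4, "KC": 2.3}


def home_field_points(home_team: str, weight: float = 1.0) -> float:
    """Weighted venue HFA in points, falling back to the default and capped."""
    raw = VENUE_HFA.get(home_team, DEFAULT_HFA)
    return min(weight * raw, HFA_CAP)


known = home_field_points("DEN")                  # 2.4 (placeholder value)
fallback = home_field_points("NYJ")               # 1.7 (no entry, uses the default)
capped = home_field_points("DEN", weight=2.0)     # 2.5 (cap applies)
```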

Sources: NFL Ops — 2020 HFA, arXiv 2104.11595, Covers HFA, Sharp Football toughest stadiums.

7. Quarterback impact (the biggest single-player lever)

The QB is the only position that reliably moves an NFL spread on its own.

Elite QBs ≈ 7 points. Recent oddsmaker surveys: Josh Allen ~6.98, Mahomes ~6.94 (an earlier survey had Mahomes ~7.5).
General starter loss: ~3–7 points; the value depends as much on the backup drop-off as the starter's quality. A great starter with a competent backup can move the line less than a good starter with a replacement-level backup.
Concrete: Andrew Luck's retirement moved IND vs. LAC from +3 to +7 (a ~4 pt shift to Brissett).

This is exactly the lever 538 QB-Elo and ESPN FPI pull: carry a QB rating, adjust team strength by starter − replacement, re-derive the spread.
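The starter-minus-replacement idea reduces to a subtraction once each QB has a value in points. The sketch below assumes such values exist on a common "points above replacement" scale; the example numbers are hypothetical and chosen only to illustrate the backup drop-off point made above.

```python
def qb_spread_shift(starter_value: float, backup_value: float) -> float:
    """Points the team's projected margin drops when the backup starts."""
    return starter_value - backup_value


def adjusted_margin(base_margin: float, starter_value: float, active_qb_value: float) -> float:
    """Re-derive the projected margin for whichever QB is actually playing."""
    return base_margin - qb_spread_shift(starter_value, active_qb_value)


# HYPOTHETICAL values: an elite starter with a competent backup moves the line
# less than a good starter with a replacement-level backup.
elite_with_good_backup = qb_spread_shift(7.0, 3.5)     # 3.5
good_with_weak_backup = qb_spread_shift(5.0, 0.0)      # 5.0
new_margin = adjusted_margin(3.0, 5.0, 0.0)            # -2.0
```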

Our engine has no QB model. It captures rookies, coaching, and "FG team impact" but not a starter-vs-backup QB delta. This is likely our highest-ROI missing feature. Doc 7 Phase 2.

Sources: theScore QB values, Yahoo oddsmaker QB survey, Boyd's player values.

8. Schedule & situational adjustments (small, and smaller than they used to be)

Modern rule changes (2011 CBA practice limits, 2020 extensions) have shrunk almost every rest/schedule edge to ≤ ~0.5 point. The old "+3 off a bye" is obsolete.

Factor	Modern value	Source
Off a bye	~+0.31 pts (was +2.2 pre-2011), not significant	Frontiers 2024
Mini-bye (post-Thursday, 10 days)	~+0.48, not significant	Frontiers 2024
Facing a team off MNF	~+0.37 (market), ~+0.18 (actual)	Frontiers 2024
Short-week (<6 days) road, late season	won ~43.9%, covered ~47.4%	TheLines
Extra-rest road team	won ~46.9%, covered ~53.3%	TheLines
Divisional games	margins compress; road dogs "live" — well under 1 pt	—
Letdown / lookahead spots	favorite ~0.5–1.5 pts worse in the spot	—

Our weights are roughly consistent already: rest_per_day 0.07 capped at 0.7, divisional_compression 0.0 (Optuna-decided). Good — keep these small. Don't inflate them. The win is QB + power ratings + the probability layer, not bigger situational nudges.
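For reference, a capped per-day rest weight behaves as sketched below. The sketch assumes `rest_per_day` multiplies the rest-day differential (home minus away) and that the cap is symmetric. The engine's actual definition may differ.

```python
REST_PER_DAY = 0.07   # points per day of rest differential
REST_CAP = 0.7        # maximum absolute adjustment in points


def rest_adjustment(home_rest_days: int, away_rest_days: int,
                    per_day: float = REST_PER_DAY, cap: float = REST_CAP) -> float:
    """Points added to the home margin for a rest edge, clamped to [-cap, +cap]."""
    raw = per_day * (home_rest_days - away_rest_days)
    return max(-cap, min(cap, raw))


bye_edge = rest_adjustment(14, 7)       # about 0.49 points for a full extra week
capped_edge = rest_adjustment(21, 4)    # 0.7, the cap
```

Under these assumptions a full bye week is worth about half a point, in line with the ≤ ~0.5 point values in the table above.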

Sources: Frontiers — Bye-Bye, Bye Advantage, TheLines rest advantage.

9. Implications for our engine

Replace the scored/allowed baseline with a proper opponent-adjusted power rating (ridge regression, Doc 5). The current baseline is not opponent-adjusted and double-counts with the power-rank nudge.
Add a QB starter-vs-backup delta (~3–7 pts), driven by depth-chart data we already scrape via the roster-data pipeline. Biggest gap.
Lower/regionalize HFA. Default toward ~1.7; ideally a per-venue table.
Add the probability layer — Φ((μ − m)/σ) for cover/win and a key-number-aware margin distribution, not just a clamped score.
Keep situational adjustments small — the research validates our existing modest caps; resist inflating them.

→ Continue to Doc 3 — Predicting Totals.