How To Be the Book: NFL Spread & Total Handicapping Reference
A research-backed series on how Las Vegas oddsmakers and professional quant bettors actually price and predict NFL games — written specifically so we can upgrade this repo's prediction engine to perform like the people who do it for a living.
Why this series exists
Our current prediction engine is an additive points-adjustment model: an opponent-aware, defense-weighted "envelope" baseline plus a rush/pass scheme matchup and ~15 small capped nudges (home edge, coaching, rookies, rest, DVOA, power rank, line play, etc.), with a score-dispersion knob and per-defense clamps. That is a solid handicapper's worksheet — and recent work already moved it toward this series (opponent-aware, defense-weighted, variance-calibrated) — but it is still not how the market is built, and it lacks the three things that separate real operators from hobbyists:
Reading this after the recent rework? Doc 7's "Where we are today" and phase statuses are reconciled with it (Phase 4 is now partially done). Start there for the current state and the still-open priorities (QB lever, the probability layer, de-vig/CLV).
- A power-rating → spread identity with opponent adjustment and a proper home-field constant.
- A probability layer — converting a predicted margin into win / cover / over probabilities using the empirical distribution of NFL margins (σ ≈ 13.5) and the key-number spikes at 3 and 7.
- A market-aware evaluation loop — measuring ourselves against the closing line (CLV), not against the scoreboard, and only betting when we disagree with the no-vig market by enough to clear the vig.
The market is one of the most efficient prediction systems in the world. The goal is not to "beat Vegas at predicting scores" — it's to (a) reproduce the market's accuracy as a baseline, then (b) find the small, specific spots where we have incremental information the closing line hasn't absorbed yet.
The documents
| # | Doc | What it covers |
|---|---|---|
| 1 | 01-bookmaking-fundamentals.md |
How books originate and move lines, the vig, balanced-book myth, sharp vs. square money, Closing Line Value (CLV), market efficiency |
| 2 | 02-predicting-spreads.md |
Power ratings, the rating→spread identity, Elo (538 spec), home-field advantage, key numbers, spread↔probability conversion, schedule/QB adjustments |
| 3 | 03-predicting-totals.md |
Drives × points-per-drive, pace, efficiency, weather (wind is king), game script, scoring-era shifts |
| 4 | 04-variable-catalog.md |
Ranked catalog of every predictive input (EPA, DVOA, ANY/A, Pythagorean, turnovers, rest, HFA…) with predictive value and regression caveats |
| 5 | 05-modeling-methods.md |
Ridge power ratings, Elo/Glicko, Bayesian state-space, ML pitfalls, walk-forward validation, calibration, market-aware blending |
| 6 | 06-betting-strategy-bankroll.md |
Edge identification, de-vigging, Kelly & fractional Kelly, CLV tracking, why accuracy ≠ profit |
| 7 | 07-engine-improvement-roadmap.md |
Concrete, phased plan mapping all of the above to our actual files, weights, and backtester |
| 8 | 08-clv-and-odds-data.md |
Closing Line Value: where to get opening/closing odds (free + paid), and the capture / import / CLV-backtest tools |
The five numbers to memorize
| Quantity | Value | Why it matters |
|---|---|---|
| Break-even win rate at −110 | 52.38% | Below this you lose money long-term; this is the bar |
| SD of NFL game margin | σ ≈ 13.5 (13.45–13.86) | Converts a predicted margin into win/cover probability |
| SD of NFL game total | ≈ 10 | Converts a predicted total into over/under probability |
| Most common margin | 3 pts (~15%), then 7 (~9%) | Key numbers; half-points across 3 & 7 are worth ~3–4% each |
| Modern home-field edge | ~1.5–2.5 pts (was ~3) | Our weight cap of 2.5 is at the high end; market prices ~1.5 |
How to read this as an engineer
Each doc ends with a "Implications for our engine" section that translates theory into specific code changes. Doc 7 consolidates those into a build sequence. Start with the README and Doc 7 if you want the action plan; read 1–6 for the why behind each change.
Scope / disclaimer. This is a modeling and analytics reference for our own prediction engine and educational use. It documents how legal, regulated sportsbooks price markets and how public research evaluates those markets. It is not gambling advice.
Source quality note
The strongest claims (margin distributions, σ ≈ 13.5, the 52.38% bar, bye-week edge collapse, EPA/turnover stability) come from academic papers (arXiv, JASA, Frontiers) and established analytics outlets. Mechanics (vig, sharp/square, CLV, market-maker structure) are corroborated across multiple industry sources and the canonical books. Each doc cites its sources inline. Vendor "58% ATS" type claims are flagged as marketing, not audited results.
Canonical books referenced throughout
- Ed Miller & Matthew Davidow — The Logic of Sports Betting (2019) — how books actually work, price discovery, CLV.
- Stanford Wong — Sharp Sports Betting (2001) — point spreads, totals, key numbers, half-point values.
- Wayne Winston — Mathletics — margin ≈ Normal(line, 13.86), least-squares power ratings.
- King Yao — Weighing the Odds in Sports Betting (2007) — line movement, scalping/middling, hedging.
- Joseph Buchdahl — Squares & Sharps, Suckers & Sharks (2016) — efficiency, luck vs. skill, wisdom of crowds.
Foundational papers
- Glickman & Stern (1998), A State-Space Model for NFL Scores, JASA — Bayesian dynamic ratings; origin of σ ≈ 13.86.
- Dixon & Coles (1997), Modelling Association Football Scores…, JRSS-C — score modeling + market inefficiency template.
- Levitt (2004), Why are gambling markets organised so differently…, Economic Journal — books shade lines to public bias.
- Benter (1994), Computer Based Horse Race Handicapping… — market-aware blending + Kelly; the blueprint.
- Kelly (1956), A New Interpretation of Information Rate — the Kelly criterion.