How Our NFL Prediction Engine Works

A one-page tour of the data, models, and methods behind every spread, total, and bet on this site. For the deeper "why," see the handicapping reference series.

We don't try to out-guess the final score by gut. We build an opponent-aware, defense-weighted scoring model, convert its output into calibrated probabilities, and tune it against years of real results — measuring ourselves the way a sportsbook would, not by bravado.

1. The data we start from

Source	What it gives us
Final scores, 1999–present	Every game's result, closing spread, total, and favorite
Play-by-play, recent seasons	Per-play EPA (expected points added) and success rate, split by rush vs. pass — offense and defense
Online schedule + odds + rosters	The current slate, posted spreads/totals/moneylines, depth charts, injuries
DVOA, power ratings, defensive narratives	Independent efficiency and strength signals

From these we build a per-team profile via the defensive model: each team's scored/allowed averages, the realistic envelope of points a defense allows (15th–85th percentile), and rush/pass defensive EPA ranks — all recency-weighted so the latest season counts most.
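A minimal sketch of the recency-weighted defensive profile described above. The 15th–85th percentile envelope comes from the text. The decay scheme (each older season counts half as much as the one after it) and the names `season_weight`, `weighted_percentile`, `build_defense_profile`, and `DefenseProfile` are hypothetical illustrations, not the engine's actual code.

```python
from dataclasses import dataclass

@dataclass
class DefenseProfile:
    allowed_avg: float   # recency-weighted mean points allowed
    envelope_lo: float   # weighted 15th percentile of points allowed
    envelope_hi: float   # weighted 85th percentile of points allowed

def season_weight(season: int, latest: int, decay: float = 0.5) -> float:
    """Hypothetical exponential decay: latest season weighs 1, each older one decay**age."""
    return decay ** (latest - season)

def weighted_percentile(values, weights, q):
    """Weighted percentile (q in [0, 1]) from the cumulative-weight step function."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cum = 0.0
    for v, w in pairs:
        cum += w
        if cum >= q * total:
            return v
    return pairs[-1][0]

def build_defense_profile(games, latest_season, decay=0.5):
    """games: list of (season, points_allowed) tuples for one defense."""
    weights = [season_weight(s, latest_season, decay) for s, _ in games]
    points = [p for _, p in games]
    avg = sum(w * p for w, p in zip(weights, points)) / sum(weights)
    return DefenseProfile(
        allowed_avg=avg,
        envelope_lo=weighted_percentile(points, weights, 0.15),
        envelope_hi=weighted_percentile(points, weights, 0.85),
    )
```

For example, `build_defense_profile([(2023, 10), (2023, 20), (2024, 14), (2024, 30)], 2024)` counts the two 2024 games twice as heavily as the 2023 games when computing both the average and the envelope.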

2. The scoring engine

For every matchup the prediction engine projects each team's points:

Opponent-aware envelope baseline. We blend two views, both of which weight the defense more heavily than the offense (a great defense should move the number more than a great offense): an additive matchup of each team's deviation from league average, and "envelope framing," which places an offense inside the band of points that the specific defense it faces has historically allowed. This is what lets elite defenses hold opponents under 13 points and leaky ones surrender 35+, instead of every projection drifting toward a bland ~23.
Rush/pass scheme matchup. We nudge each offense according to the kind of defense it meets (run-stuffing front vs. shutdown secondary), using defensive EPA ranks and the offense's pass rate, and emit the plain-language "why" bullets you see on the pick cards.
~15 situational adjustments, each small and capped: home field, coaching, rookies, travel, rest, momentum, turnovers, divisional familiarity, DVOA, power rank, defensive-narrative rankings, offensive/defensive line play, red-zone and third-down efficiency.
Dispersion + realism clamp. A deterministic model's outputs are naturally more compressed than real scores, so we stretch projections away from the league mean to reproduce real blowouts and duds, then clamp each team's score to the realistic range its specific opponent allows.
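The four steps above can be sketched as a single projection function. Every number here is an assumption for illustration: the league average, the offense/defense weights (defense deliberately larger), the 50/50 blend of the two views, the per-adjustment cap, and the stretch factor. `strength_pct` is a hypothetical 0-to-1 offense rating used to place the offense inside the defense's envelope, and `floor`/`ceiling` stand in for whatever realistic range the engine actually clamps to.

```python
from dataclasses import dataclass

LEAGUE_AVG = 22.5  # hypothetical league-average points per team-game

@dataclass
class Offense:
    scored_avg: float    # recency-weighted points scored per game
    strength_pct: float  # hypothetical scale: 0.0 = worst offense, 1.0 = best

@dataclass
class Defense:
    allowed_avg: float   # recency-weighted points allowed per game
    envelope_lo: float   # 15th percentile of points allowed
    envelope_hi: float   # 85th percentile of points allowed
    floor: float         # hypothetical lowest realistic score vs. this defense
    ceiling: float       # hypothetical highest realistic score vs. this defense

def clamp(x, lo, hi):
    return max(lo, min(hi, x))

def project_points(off, dfn, adjustments, w_off=0.8, w_def=1.2,
                   blend=0.5, adj_cap=1.5, stretch=1.15):
    # View 1: additive matchup of deviations from league average (defense weighted more).
    additive = (LEAGUE_AVG + w_off * (off.scored_avg - LEAGUE_AVG)
                + w_def * (dfn.allowed_avg - LEAGUE_AVG))
    # View 2: envelope framing, placing the offense inside this defense's 15th-85th band.
    framed = dfn.envelope_lo + off.strength_pct * (dfn.envelope_hi - dfn.envelope_lo)
    base = blend * additive + (1 - blend) * framed
    # Situational adjustments: each one capped so no single factor dominates.
    base += sum(clamp(a, -adj_cap, adj_cap) for a in adjustments)
    # Dispersion: stretch away from the league mean to restore real-world spread.
    stretched = LEAGUE_AVG + stretch * (base - LEAGUE_AVG)
    # Realism clamp: keep the score inside what this defense realistically allows.
    return clamp(stretched, dfn.floor, dfn.ceiling)
```

The order matters in this sketch: stretching happens before the clamp, so the clamp is what stops an already-extreme matchup from being pushed past a plausible score.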

3. The probability layer

A point estimate isn't a bet. The probability model turns projections into prices the way the market does:

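Win, cover, and over probabilities. We convert the projected margin into win/cover probabilities using the empirical dispersion of NFL outcomes around a projection (margin SD ≈ 13.5 points), and the projected total into an over probability (total SD ≈ 10 points).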
De-vigging. We strip the bookmaker's commission from the posted line to get the market's fair probability, and compare our model to that — never to the raw, juice-inflated number.
Key numbers. NFL margins cluster on 3 and 7; we treat those specially rather than assuming a smooth bell curve.
Edge + staking. We flag a bet only when our number beats the fair line by a meaningful margin, and size it with fractional Kelly (never bet-the-house).
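The probability layer can be sketched with little more than a normal CDF. This assumes margins and totals are normally distributed around the projection with the SDs quoted above, so it does not reproduce the special handling of the key numbers 3 and 7. De-vigging here uses proportional normalization of the two implied probabilities, one common method among several; the 3-percentage-point edge threshold and quarter-Kelly fraction are hypothetical defaults, and all function names are illustrative.

```python
import math

MARGIN_SD = 13.5  # from the text
TOTAL_SD = 10.0   # from the text

def normal_cdf(x, mu=0.0, sd=1.0):
    return 0.5 * (1.0 + math.erf((x - mu) / (sd * math.sqrt(2.0))))

def win_prob(proj_margin):
    return 1.0 - normal_cdf(0.0, proj_margin, MARGIN_SD)

def cover_prob(proj_margin, spread):
    """P(team covers); spread is the team's line, e.g. -3.5 means it must win by 4+."""
    # The team covers when margin + spread > 0, i.e. margin > -spread.
    return 1.0 - normal_cdf(-spread, proj_margin, MARGIN_SD)

def over_prob(proj_total, line):
    return 1.0 - normal_cdf(line, proj_total, TOTAL_SD)

def implied_prob(american):
    return -american / (-american + 100) if american < 0 else 100 / (american + 100)

def devig_two_way(odds_a, odds_b):
    """Fair probabilities by proportionally removing the bookmaker's overround."""
    pa, pb = implied_prob(odds_a), implied_prob(odds_b)
    return pa / (pa + pb), pb / (pa + pb)

def fractional_kelly(p, american, fraction=0.25):
    b = 100 / -american if american < 0 else american / 100  # net profit per unit staked
    full = (b * p - (1 - p)) / b
    return max(0.0, fraction * full)

def evaluate_bet(p_model, odds_side, odds_other, min_edge=0.03, fraction=0.25):
    """Compare the model to the de-vigged fair line; stake only if the edge clears the bar."""
    fair, _ = devig_two_way(odds_side, odds_other)
    edge = p_model - fair
    stake = fractional_kelly(p_model, odds_side, fraction) if edge >= min_edge else 0.0
    return edge, stake
```

With both sides at −110, each raw implied probability is about 52.4%, but the fair probability is 50%. A model at 55% therefore shows a 5-point edge, and quarter-Kelly stakes about 1.4% of bankroll.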

4. How we tune it

The engine has dozens of knobs. We search 24 of them at once with Bayesian optimization (Optuna TPE) over real historical games — not by hand. Crucially:

Walk-forward, held-out validation. We tune on earlier seasons and score on a later season the optimizer never saw, so we measure generalization, not memorization.
Accuracy-first objective. We optimize point accuracy (MAE) and calibration, which generalize, rather than chasing noisy bet ROI, which overfits. (We learned this the hard way — an ROI-tuned config looked great in-sample and got worse on the holdout; the accuracy-tuned one improved across the board.)
A realism gate. Any tuned config that produces unrealistic scores (e.g. pinned-at-the-ceiling blowouts) is rejected before it ships.
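A minimal sketch of a walk-forward, accuracy-first objective with a realism gate. It assumes each season maps to a list of game records with an `actual` points field, and that `fit` and `predict` are caller-supplied; all names and the gate thresholds are hypothetical. An optimizer such as Optuna's TPE sampler would call a function like `holdout_mae` once per trial with that trial's sampled knob values.

```python
from statistics import mean

def walk_forward_splits(seasons, min_train=3):
    """Yield (train_seasons, test_season): train on every season before the test season."""
    seasons = sorted(seasons)
    for i in range(min_train, len(seasons)):
        yield seasons[:i], seasons[i]

def passes_realism_gate(preds, ceiling=45.0, max_share=0.01):
    """Hypothetical gate: reject configs that pin too many team scores at the ceiling."""
    pinned = sum(1 for p in preds if p >= ceiling)
    return pinned / len(preds) <= max_share

def holdout_mae(config, games_by_season, fit, predict, min_train=3):
    """Average out-of-sample MAE across walk-forward folds; inf if the realism gate fails."""
    fold_maes = []
    for train, test in walk_forward_splits(games_by_season, min_train):
        model = fit(config, [g for s in train for g in games_by_season[s]])
        test_games = games_by_season[test]
        preds = [predict(model, g) for g in test_games]
        if not passes_realism_gate(preds):
            return float("inf")  # rejected before it can ship
        fold_maes.append(mean(abs(p - g["actual"]) for p, g in zip(preds, test_games)))
    return mean(fold_maes)
```

Returning infinity for a gated config means a minimizing optimizer simply never prefers it, which keeps the realism check inside the search rather than bolted on afterward.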

5. How we grade ourselves

We score the model the way the literature says you should — against the market, not the scoreboard:

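Against-the-spread (ATS) and over/under win rates, with the 52.4% break-even bar at standard −110 pricing front and center: risking $110 to win $100 means you must win 110/210 ≈ 52.4% of decided bets just to break even.
Brier score & log loss: are our probabilities actually calibrated, or just confident? (A model can hit 60% of its picks while its probability estimates are no more informative than a coin flip.)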
MAE / RMSE of the predicted margin, and ROI at −110.
Closing Line Value (CLV): did we get a better number than the line at kickoff? This is the gold-standard skill metric; it is wired in and will populate as we capture closing lines.
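The grading metrics above are simple enough to write out directly. This sketch assumes binary outcomes (1 = covered or hit, 0 = not) with pushes excluded, and American odds. `clv_points` measures spread CLV as the difference between the line we bet and the closing line on the same side, a simplified definition chosen for illustration; all function names are hypothetical.

```python
import math

def break_even(american=-110):
    """Win rate needed to break even at the given American odds."""
    risk = -american if american < 0 else 100
    win = 100 if american < 0 else american
    return risk / (risk + win)

def roi_at_odds(wins, losses, american=-110):
    """Profit per unit risked, pushes excluded. At -110 a win pays 100/110 of the stake."""
    payout = 100 / -american if american < 0 else american / 100
    return (wins * payout - losses) / (wins + losses)

def brier(probs, outcomes):
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

def log_loss(probs, outcomes, eps=1e-15):
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)  # avoid log(0) on overconfident predictions
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)

def mae_rmse(pred_margins, actual_margins):
    errs = [p - a for p, a in zip(pred_margins, actual_margins)]
    mae = sum(abs(e) for e in errs) / len(errs)
    rmse = math.sqrt(sum(e * e for e in errs) / len(errs))
    return mae, rmse

def clv_points(bet_line, closing_line):
    """Positive when we got the better number, e.g. bet -3 and it closed -4.5 -> +1.5."""
    return bet_line - closing_line
```

Going 11–10 at −110 is a 52.4% win rate and yields an ROI of essentially zero, which is exactly the break-even bar described above.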

6. What we honestly don't claim

NFL betting markets are among the most efficient in the world; the closing line is a brutally good predictor. Realistic skilled performance is ~53–55% ATS, not 65%. We treat the market as a strong prior, hunt for the specific spots where we have an edge, and report our results — including the misses — rather than overselling. The model is a decision aid, not a money printer.

Go deeper: the full reference series covers how books set lines, predicting spreads and totals, the predictive-variable catalog, modeling methods, bankroll & edge, and our engineering roadmap.