How Our Predictions Work
Overview
Every prediction on SharpBetz is generated by a machine learning model — not by human opinion. The model ingests real-time data from ESPN, processes it through a feature engineering pipeline, and outputs projected spreads, totals, and win probabilities. Here's how each step works.
Step 1: Data Collection
We pull game data from ESPN's public API for every Division I college basketball and NBA game. This includes:
- Team records, scoring averages, and defensive statistics
- Box score data (field goal percentage, three-point shooting, turnovers, rebounds)
- Team rankings (AP Poll, Coaches Poll)
- Venue information and home/away splits
- Injury reports with player impact assessments
- Market odds from major sportsbooks
Historical data spanning multiple seasons is stored in a local database and updated daily, giving the model a rich dataset to learn from.
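The daily refresh can be sketched as an idempotent upsert. This assumes a local SQLite file and a hypothetical `games` table keyed on the provider's game ID; the page does not describe the actual database engine or schema. Re-running a day's load overwrites existing rows (for example, once final scores are posted) instead of creating duplicates.

```python
import sqlite3

# Hypothetical schema: one row per game, keyed on the data provider's game ID.
SCHEMA = """
CREATE TABLE IF NOT EXISTS games (
    game_id    TEXT PRIMARY KEY,
    game_date  TEXT NOT NULL,
    home_team  TEXT NOT NULL,
    away_team  TEXT NOT NULL,
    home_score INTEGER,
    away_score INTEGER
)
"""


def upsert_games(conn: sqlite3.Connection, games: list[dict]) -> None:
    """Insert new games; a game_id that already exists has its row replaced."""
    conn.execute(SCHEMA)
    conn.executemany(
        """
        INSERT OR REPLACE INTO games
            (game_id, game_date, home_team, away_team, home_score, away_score)
        VALUES
            (:game_id, :game_date, :home_team, :away_team, :home_score, :away_score)
        """,
        games,
    )
    conn.commit()
```

Because the key is the game ID, a scheduled game loaded with empty scores is simply replaced by its final-score row on the next daily run.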
Step 2: Feature Engineering
Raw statistics don't tell the full story. Our feature engineering pipeline transforms raw data into 27+ predictive features organized into groups:
- Base features — points per game differentials, scoring margins, win percentages, home court advantage
- Situational features — rest days, back-to-back games, win/loss streaks, conference matchups
- Efficiency metrics — offensive and defensive efficiency ratings, pace adjustments
- Strength of schedule — opponent quality adjustments based on average opponent win percentage
- Market signals — implied probabilities from opening lines and market movement
- Tempo proxies — pace differentials and combined tempo estimates
Wherever possible, a feature is computed as a differential between the two teams, capturing how they compare with each other rather than their absolute values.
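A minimal sketch of differential features follows. The `TeamStats` fields, the feature names, and the home-minus-away sign convention are illustrative assumptions, not the production feature set. Strength of schedule is approximated, as described in the list above, by the average win percentage of a team's opponents.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TeamStats:
    # Hypothetical per-team season aggregates.
    points_per_game: float
    points_allowed_per_game: float
    wins: int
    losses: int
    opponent_win_pcts: list[float]  # win % of each opponent faced so far

    @property
    def win_pct(self) -> float:
        games = self.wins + self.losses
        return self.wins / games if games else 0.0

    @property
    def strength_of_schedule(self) -> float:
        # Average opponent win percentage; 0.5 (neutral) before any games are played.
        return mean(self.opponent_win_pcts) if self.opponent_win_pcts else 0.5


def matchup_features(home: TeamStats, away: TeamStats) -> dict[str, float]:
    """Express each feature as home minus away, so the model sees the relative matchup."""
    home_margin = home.points_per_game - home.points_allowed_per_game
    away_margin = away.points_per_game - away.points_allowed_per_game
    return {
        "ppg_diff": home.points_per_game - away.points_per_game,
        "margin_diff": home_margin - away_margin,
        "win_pct_diff": home.win_pct - away.win_pct,
        "sos_diff": home.strength_of_schedule - away.strength_of_schedule,
    }
```

Swapping the two teams flips the sign of every value, which is why one model can score both sides of a matchup.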
Step 3: XGBoost Model
We use XGBoost (eXtreme Gradient Boosting), a gradient-boosted decision tree library widely used in competitive data science and financial modeling. The model:
- Trains on thousands of historical games with known outcomes
- Uses Optuna for Bayesian hyperparameter optimization to find the best model configuration
- Outputs a predicted margin (spread) and predicted total for each game
- Calculates win probabilities based on the predicted margin and historical model error
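One common way to turn a predicted margin into a win probability, sketched here as an assumption rather than the confirmed method, is to treat the model's historical prediction errors as roughly normal with standard deviation σ. The win probability is then Φ(margin / σ), where Φ is the standard normal CDF. The σ in the usage note is a placeholder, not a measured SharpBetz figure.

```python
import math


def win_probability(predicted_margin: float, error_sd: float) -> float:
    """P(actual margin > 0), assuming actual = predicted + Normal(0, error_sd**2) noise."""
    if error_sd <= 0:
        raise ValueError("error_sd must be positive")
    z = predicted_margin / error_sd
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

With a placeholder σ of 11 points, a projected 5-point margin maps to roughly a 67.5% win probability, and a projected margin of 0 maps to exactly 50%.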
The model is versioned — each iteration is tracked with its training data, features, hyperparameters, and performance metrics so we can measure improvement over time.
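A version record of this kind might look like the sketch below. The field names and the comparison rule (prefer the version with the higher held-out ATS win rate) are hypothetical; the page does not specify how versions are stored or ranked. `max_depth` and `learning_rate` are real XGBoost parameter names, but the values shown are illustrative.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ModelVersion:
    # Hypothetical registry entry for one trained model iteration.
    version: str
    training_window: tuple[str, str]  # first and last game date used for training
    features: list[str]
    hyperparameters: dict[str, float]
    metrics: dict[str, float] = field(default_factory=dict)

    def to_json(self) -> str:
        # Stable key order makes stored records easy to diff between versions.
        return json.dumps(asdict(self), sort_keys=True)


def is_improvement(candidate: ModelVersion, baseline: ModelVersion,
                   metric: str = "ats_win_rate") -> bool:
    """Compare two versions on one held-out metric where higher is better."""
    return candidate.metrics[metric] > baseline.metrics[metric]


v1 = ModelVersion("v1", ("2019-11-01", "2023-04-01"), ["ppg_diff", "sos_diff"],
                  {"max_depth": 4.0, "learning_rate": 0.05}, {"ats_win_rate": 0.53})
v2 = ModelVersion("v2", ("2019-11-01", "2024-04-01"), ["ppg_diff", "sos_diff", "pace_diff"],
                  {"max_depth": 5.0, "learning_rate": 0.03}, {"ats_win_rate": 0.545})
```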
Step 4: Edge Detection
A prediction alone isn't a pick. We compare the model's projected line against the market line to find edges — situations where our model disagrees with the sportsbooks by a statistically significant margin.
The edge is expressed in standard deviations (a z-score): the difference between the model's line and the market line, divided by the standard deviation of the model's historical prediction errors. Larger edges indicate higher confidence that the market has mispriced the game.
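A minimal sketch of the z-score calculation, assuming spreads are quoted from the home team's perspective (negative when the home team is favored) and that σ is the standard deviation of the model's past errors. The sign convention and example numbers are illustrative assumptions.

```python
def spread_edge_z(predicted_margin: float, market_spread_home: float,
                  error_sd: float) -> float:
    """Z-score of the model/market disagreement on the spread.

    predicted_margin:   model's projected home-minus-away margin (e.g. +7.0)
    market_spread_home: sportsbook spread for the home team (-3.5 = home favored by 3.5)
    error_sd:           std. dev. of the model's historical margin errors
    A positive result means the model favors the home side against the spread.
    """
    if error_sd <= 0:
        raise ValueError("error_sd must be positive")
    market_implied_margin = -market_spread_home
    return (predicted_margin - market_implied_margin) / error_sd


def total_edge_z(predicted_total: float, market_total: float, error_sd: float) -> float:
    """Positive result leans over; negative leans under."""
    if error_sd <= 0:
        raise ValueError("error_sd must be positive")
    return (predicted_total - market_total) / error_sd
```

With σ = 10, a projected 7-point home win against a home -3.5 line gives z = 0.35; a projection that matches the market exactly gives z = 0.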
Step 5: Confidence Tiers
Every pick is assigned a confidence tier based on the z-score of the edge:
- 1-unit (lean) — small edge detected, lower confidence
- 2-unit (moderate) — meaningful edge with reasonable confidence
- 3-unit (strong) — significant edge, historically profitable tier
- 4-unit (max) — largest edges, reserved for the strongest signals
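The tier lookup can be sketched with a sorted list of cutoffs on the absolute z-score. The cutoff values below are hypothetical placeholders that illustrate the mechanism only; the actual tier boundaries are not published on this page.

```python
from bisect import bisect_right

# Hypothetical |z| cutoffs for the 1-, 2-, 3-, and 4-unit tiers.
TIER_CUTOFFS = [0.5, 0.75, 1.0, 1.5]
TIER_LABELS = {0: "no pick", 1: "lean", 2: "moderate", 3: "strong", 4: "max"}


def assign_tier(edge_z: float) -> tuple[int, str]:
    """Map an edge z-score to (units, label); the side to bet comes from the sign of z."""
    # bisect_right counts how many cutoffs |z| meets or exceeds.
    units = bisect_right(TIER_CUTOFFS, abs(edge_z))
    return units, TIER_LABELS[units]
```

Keeping the cutoffs in one sorted list means tier boundaries can be retuned without touching the lookup logic.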
Higher-confidence tiers have historically outperformed lower tiers, which indicates that the model's confidence levels are well calibrated.
Step 6: Backtesting & Validation
Before any model version goes live, it undergoes rigorous backtesting:
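- Temporal validation — the model is tested only on games played after its training window, which it has never seen, simulating real-time prediction
- Performance gates — a new version must meet minimum thresholds for ATS accuracy (52.4%+, the break-even rate at standard -110 odds), positive ROI, and a small overfitting gap (the difference between training and held-out performance)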
- Tier analysis — we verify that higher-confidence tiers actually perform better than lower ones
- Comparison — every new model is compared side-by-side against previous versions
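The temporal split and performance gates can be sketched as follows, under stated assumptions: each pick carries an ISO-8601 date string, every bet risks 1 unit, and the 5-percentage-point cap on the train/test gap is a hypothetical placeholder rather than the site's actual threshold.

```python
from dataclasses import dataclass

# At standard -110 pricing a bettor risks 110 to win 100: break-even = 110 / 210 ≈ 0.524.
BREAK_EVEN_AT_MINUS_110 = 110 / 210


@dataclass
class Bet:
    date: str           # ISO-8601 game date, e.g. "2024-01-15"
    result: str         # "win", "loss", or "push" against the spread
    american_odds: int  # price taken, e.g. -110


def temporal_split(bets: list[Bet], cutoff: str) -> tuple[list[Bet], list[Bet]]:
    """Earlier picks are in-sample; only picks on or after the cutoff are evaluated."""
    train = [b for b in bets if b.date < cutoff]
    test = [b for b in bets if b.date >= cutoff]
    return train, test


def bet_profit(bet: Bet) -> float:
    """Profit in units on a 1-unit stake; a push returns the stake."""
    if bet.result == "push":
        return 0.0
    if bet.result == "loss":
        return -1.0
    odds = bet.american_odds
    return 100 / -odds if odds < 0 else odds / 100


def ats_win_rate(bets: list[Bet]) -> float:
    """Win rate against the spread, ignoring pushes."""
    decided = [b for b in bets if b.result != "push"]
    return sum(b.result == "win" for b in decided) / len(decided) if decided else 0.0


def roi(bets: list[Bet]) -> float:
    """Net profit per unit staked."""
    return sum(bet_profit(b) for b in bets) / len(bets) if bets else 0.0


def passes_gates(train: list[Bet], test: list[Bet], max_gap: float = 0.05) -> bool:
    """Hypothetical promotion check: held-out ATS, positive ROI, small overfitting gap."""
    test_ats = ats_win_rate(test)
    overfit_gap = ats_win_rate(train) - test_ats
    return test_ats >= BREAK_EVEN_AT_MINUS_110 and roi(test) > 0 and overfit_gap <= max_gap
```

For example, 55 wins and 45 losses at -110 on held-out picks gives a 55% ATS rate and an ROI of about +5% per unit staked, which passes; 52 wins and 48 losses at the same price falls below break-even and fails.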
Track Record
We publish every pick — wins and losses — on our results page. This is the ultimate accountability measure. You can verify our claimed performance against the actual record at any time.
What Our Model Is Not
Transparency means being honest about limitations:
- No model wins every game — a 55-60% ATS win rate is elite in sports betting
- Past performance doesn't guarantee future results
- The model can't account for every variable (locker room dynamics, motivation, game-day weather)
- We recommend responsible bankroll management regardless of confidence level
For more about responsible betting practices, see our responsible gambling page.