Every year, millions of college basketball fans attempt to predict the outcome of the NCAA Men’s Basketball Tournament, known as “March Madness.” The tournament consists of 68 teams and seven rounds of single-elimination basketball. Although nobody has ever correctly predicted all 67 games in a tournament, more accurate machine learning techniques, together with incentives such as bracket pools and Kaggle’s machine learning bracket competition, have improved the accuracy of algorithmic predictions built on historical data.
This project has five goals:
- Build a model that predicts game outcomes and the probability of each outcome
- Train the model on historical data from past tournaments
- Validate the model and evaluate its performance by predicting games from years withheld from the training set
- Use the model to predict the outcomes of games in the 2017 tournament
- Evaluate the model’s performance based on prediction accuracy and log-loss on the games that actually occurred in the 2017 NCAA tournament
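The two evaluation metrics named in the last goal can be illustrated with a minimal sketch. It assumes each game is encoded as 1 if the first-listed team won and 0 otherwise, and that the model outputs the probability that the first-listed team wins. The function names, the 0.5 decision threshold, the clipping constant `eps`, and the sample values are all hypothetical and are not taken from the project itself.

```python
import math


def log_loss(outcomes, probabilities, eps=1e-15):
    """Mean binary log-loss over a set of games.

    outcomes: 1 if the first-listed team won, else 0 (assumed encoding).
    probabilities: predicted chance that the first-listed team wins.
    """
    total = 0.0
    for y, p in zip(outcomes, probabilities):
        # Clip so that a confident wrong prediction does not give log(0).
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(outcomes)


def accuracy(outcomes, probabilities, threshold=0.5):
    """Fraction of games where the predicted winner matches the result."""
    correct = sum(
        (p >= threshold) == bool(y) for y, p in zip(outcomes, probabilities)
    )
    return correct / len(outcomes)


# Made-up results and predictions for four games, for illustration only.
sample_outcomes = [1, 0, 1, 1]
sample_probabilities = [0.9, 0.2, 0.6, 0.4]

print("accuracy:", accuracy(sample_outcomes, sample_probabilities))
print("log-loss:", log_loss(sample_outcomes, sample_probabilities))
```

Accuracy only checks which side of the threshold a prediction falls on, while log-loss also penalizes overconfidence: a wrong prediction made at 0.99 costs far more than one made at 0.55. That is why the two metrics are reported together.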