# An Exploratory Study on Methods for Interpolating and Extrapolating Baseball Win-Loss Percentage

Spring 2023

Thesis

## Degree Name

Master of Science

## Department

Mathematics

J. Christopher Tweddle, Ph.D.

Heng Li, Ph.D.

Anne Morlet, Ph.D.

## Abstract

The 2020 Major League Baseball season was shortened by a few months due to travel restrictions implemented in response to the coronavirus pandemic. When baseball began in mid-July, the schedule had been changed so that teams mostly played division opponents. This left some fans wondering how different the 2020 season would have looked had the original 162-game schedule been played. A website called Strat-O-Matic used the Monte Carlo method to simulate a full 2020 season. This paper proposes and explores other possible models for predicting the outcome of the season. The first model we present is a Fourier series model fitted to each team’s win-loss percentage (W-L%). We hypothesize that a team’s W-L% can be modeled by a sum of cosine and sine curves, since every team has winning seasons and losing seasons. The second model is a cubic spline model, which connects each pair of adjacent W-L% data points with a cubic function. The last model is a seasonal autoregressive integrated moving average (SARIMA) model built following the Box-Jenkins method. Using W-L% data from 1998 to 2021, we produce Fourier series, cubic spline, and SARIMA models for each team with the help of Python. The Fourier series and cubic spline models were used to predict the 2020 W-L%. We analyzed their goodness of fit by computing the sum of the absolute residuals, the sum of the squared residuals, and the maximum absolute residual, and we compared these two models to Strat-O-Matic’s model. The SARIMA model was used to forecast the 2022 monthly W-L%. We evaluated it by calculating the mean absolute error, root mean square error, and mean absolute percentage error. We conclude that a Fourier series or cubic spline model is appropriate and satisfactory for predicting a team’s 2020 W-L%, and that a SARIMA model is appropriate and satisfactory for forecasting 2022 monthly W-L%.
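
The evaluation measures named in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the thesis's own code: the function names `fit_metrics` and `forecast_errors` are hypothetical, and the sample W-L% values are made up for demonstration and are not real team data.

```python
import math


def fit_metrics(actual, predicted):
    """Goodness-of-fit measures: sum of absolute residuals,
    sum of squared residuals, and maximum absolute residual."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    abs_residuals = [abs(r) for r in residuals]
    return {
        "sum_abs": sum(abs_residuals),
        "sum_sq": sum(r * r for r in residuals),
        "max_abs": max(abs_residuals),
    }


def forecast_errors(actual, predicted):
    """Forecast error measures: MAE, RMSE, and MAPE (in percent).
    Assumes no actual value is zero, since MAPE divides by it."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mape = 100.0 * sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n
    return {"mae": mae, "rmse": rmse, "mape": mape}


if __name__ == "__main__":
    # Hypothetical W-L% values, for illustration only.
    observed = [0.50, 0.40, 0.60]
    modeled = [0.45, 0.50, 0.60]
    print(fit_metrics(observed, modeled))
    print(forecast_errors(observed, modeled))
```

The first function corresponds to the measures used for the Fourier series and cubic spline fits, the second to those used for the SARIMA forecasts. Both take any two equal-length sequences of observed and modeled W-L% values.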
