Document Type
Thesis
First Faculty Advisor
Suhong Li
Second Faculty Advisor
Richard Glass
Keywords
machine learning; NFL; modeling
Publisher
Bryant University
Rights Management
CC-BY-NC
Abstract
This study hypothesizes that injury-causing factors can be identified by training machine learning models on NFL injury data. The machine learning process entailed web scraping, pre-processing, cleaning, modeling, and analyzing NFL injury data to identify these factors. The features used to model injuries were games played, games started, weight, height, age, year, years of experience, starting position, and team. Four models were trained: Logistic Regression, Decision Trees, Random Forests, and Gradient Boosted Trees. The Gradient Boosted Trees model performed best, with an F1 score of 0.508. It also ranked team, games played, and starting position as the most influential factors in determining the probability that a player is injured during the season. The most notable takeaway from this study is that the left guard position is the most injury-prone, followed by defensive secondary positions such as safety and cornerback.
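For readers unfamiliar with the metric reported above, the following is a minimal sketch of how an F1 score is computed for a binary "injured / not injured" label. It is not the study's code: the function name and the label lists are made up for illustration, and the example assumes the positive class (1) means a player was injured during the season.

```python
def f1_score(y_true, y_pred):
    """F1 for the positive class (1 = injured), computed from raw counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # injured, predicted injured
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # healthy, predicted injured
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # injured, predicted healthy
    if tp == 0:
        # No correct positive predictions: F1 is taken as 0 by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for eight player-seasons (not data from the study).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 0, 0]
score = f1_score(y_true, y_pred)
print(round(score, 3))
```

Because F1 balances precision against recall, it is a common choice when one class is less frequent than the other and plain accuracy could look high for a model that rarely predicts the positive class.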