Document Type

Thesis

First Faculty Advisor

Suhong Li

Second Faculty Advisor

Richard Glass

Keywords

machine learning; NFL; modeling

Publisher

Bryant University

Rights Management

CC-BY-NC

Abstract

This study hypothesizes that injury-causing factors can be identified by training machine learning models on NFL injury data. The process involved web scraping, pre-processing, cleaning, modeling, and analyzing NFL injury data to identify these factors. The features used to model injuries were games played, games started, weight, height, age, year, years of experience, starting position, and team. Four models were compared: Logistic Regression, Decision Trees, Random Forests, and Gradient Boosted Trees. The Gradient Boosted Trees model performed best, with an F1 score of 0.508. This model also identified team, games played, and starting position as the most influential factors in predicting the probability that a player is injured during the season. The most notable finding is that left guard is the most injury-prone position, followed by defensive secondary positions such as safety and cornerback.
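The abstract reports model quality as an F1 score, the harmonic mean of precision and recall, or equivalently 2TP / (2TP + FP + FN). The minimal sketch below computes F1 for a binary injured / not-injured label in plain Python. The label encoding (1 = injured during the season) and the toy labels are assumptions for illustration only. They are not the study's data or code.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1, assuming the encoding 1 = injured during the season, 0 = not injured."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Count true positives, false positives, and false negatives for the injured class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No injuries correctly flagged: precision or recall is 0 (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels for illustration; not data from the study.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 1]
print(round(f1_score(y_true, y_pred), 3))  # prints 0.5 (TP = 2, FP = 2, FN = 2)
```

Because F1 combines precision and recall into a single number, a score near 0.5 on its own does not indicate whether a model misses injuries or raises false alarms. Reporting precision and recall alongside F1 makes that trade-off visible.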

Included in

Data Science Commons, Other Computer Sciences Commons

Honors Projects in Data Science

Identifying Factors that Lead to Injury in the NFL