Document Type

Thesis

First Faculty Advisor

Son Nguyen

Second Faculty Advisor

Rick Gorvett

Keywords

bankruptcy; imbalance; sampling; machine learning

Publisher

Bryant University

Rights Management

CC BY-NC-SA 4.0

Abstract

Bankruptcy prediction is a widely researched topic. However, few studies address the class imbalance inherent in bankruptcy data. This paper proposes a new technique that applies a bagging undersampling procedure to balance the data. It compares that technique to random undersampling and to five oversampling techniques. Each balancing technique is evaluated by training a random forest on the balanced data and measuring the forest's balanced accuracy, sensitivity, and specificity. The results show that models trained on oversampled data are prone to overfitting. The model trained after applying the proposed method achieved the highest balanced accuracy without overfitting.
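The three evaluation metrics named in the abstract can be sketched as follows. This is a minimal illustration, not the thesis's evaluation code. It assumes binary labels in which bankrupt firms are coded as the positive class `1`; that coding and the function names are hypothetical. Sensitivity is the share of actual positives predicted positive. Specificity is the share of actual negatives predicted negative. Balanced accuracy is their mean.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task.

    Assumption: `positive` marks the minority (e.g. bankrupt) class.
    """
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    return tp, fn, tn, fp


def imbalance_metrics(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity, balanced_accuracy)."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred, positive)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must appear in y_true")
    sensitivity = tp / (tp + fn)   # recall on the positive class
    specificity = tn / (tn + fp)   # recall on the negative class
    balanced_accuracy = (sensitivity + specificity) / 2
    return sensitivity, specificity, balanced_accuracy


# Hypothetical data: 5 bankrupt firms among 100.
# A model that always predicts "not bankrupt" has 95% plain accuracy,
# but its balanced accuracy is only 0.5.
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100
print(imbalance_metrics(y_true, y_pred))  # (0.0, 1.0, 0.5)
```

The example shows why balanced accuracy suits imbalanced data. Plain accuracy rewards a classifier that ignores the rare class entirely, while balanced accuracy weights both classes equally.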