"The Resampled Data in Imbalanced Classification" by Matthew Bonas, Son Nguyen et al.

Information Systems and Analytics Department Faculty Book Publications

Title

The Resampled Data in Imbalanced Classification

Authors

Matthew Bonas
Son Nguyen, Bryant University
Alan Olinsky, Bryant University
John T. Quinn, Bryant University
Phyllis Schumacher, Bryant University

Document Type

Book Chapter

Identifier Data

978-1-64802-143-5

Publisher

Information Age Publishing Inc.

Publication Source

Contemporary Perspectives in Data Mining, Volume 4

Abstract

Classical positive-negative classification models often fail to detect positive observations in data with a very low positive rate. This problem is common in many domains, such as finance (fraud detection and bankruptcy detection), business (product organization), and healthcare (rare diagnosis). A popular solution is to balance the data, either by random undersampling (RUS), which randomly removes a number of negative observations, or by random oversampling (ROS), which randomly reuses a number of positive observations. In this study, we discuss a generalization of RUS and ROS in which the dataset becomes balanced, so that the number of positive observations matches the number of negative observations. We also propose a data-driven method to determine the size of the resampled data that most improves classification models.
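
The resampling idea described in the abstract can be illustrated with a minimal sketch. This is not the chapter's implementation: the function name `balanced_resample`, the parameter `n_per_class`, and the 0/1 label encoding are assumptions made for illustration. It draws the same number of observations from each class, so RUS (shrink the negatives to the positive count) and ROS (grow the positives to the negative count) appear as two special cases of one balanced scheme. The data-driven rule for choosing the resampled size is not given in the abstract, so the size is left as a plain parameter here.

```python
import random


def balanced_resample(labels, n_per_class, seed=0):
    """Return row indices of a balanced sample with n_per_class rows per class.

    Hypothetical helper for illustration only. Labels are assumed to be
    1 (positive) or 0 (negative). A class with at least n_per_class rows is
    undersampled without replacement; a smaller class keeps all of its rows
    and is topped up with rows redrawn at random (oversampling).
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")

    def draw(idx):
        if len(idx) >= n_per_class:
            # Undersampling: keep a random subset of the rows.
            return rng.sample(idx, n_per_class)
        # Oversampling: keep every row, then reuse random rows.
        extra = [rng.choice(idx) for _ in range(n_per_class - len(idx))]
        return idx + extra

    return draw(pos) + draw(neg)


# Toy data: 5 positives and 95 negatives (a 5% positive rate).
labels = [1] * 5 + [0] * 95

rus = balanced_resample(labels, n_per_class=5)    # RUS-like: negatives cut to 5
ros = balanced_resample(labels, n_per_class=95)   # ROS-like: positives grown to 95
mid = balanced_resample(labels, n_per_class=30)   # in between: both classes resampled
```

In this sketch, `n_per_class` is the quantity one would tune. A natural way to pick it, offered here only as an assumption, is to compare model performance on held-out data across several candidate sizes.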
