The Effects of Sampling Methods on Machine Learning Models for Predicting Long-term Length of Stay: A Case Study of Rhode Island Hospitals

IGI Global


The ability to predict which patients will have a long-term length of stay (LOS) can aid a hospital's admission management, maintain effective resource utilization, and support high-quality inpatient care. Hospital discharge data from the Rhode Island Department of Health for 2010 to 2013 reveal that inpatients with long-term stays, i.e., two weeks or more, cost about six times more than those with short stays, while accounting for only 4.7% of inpatients. Because long-stay patients are far outnumbered by short-stay patients, predicting long-term LOS becomes an imbalanced classification problem. Sampling methods, which balance the data before it is fitted to a traditional classification model, offer a simple approach to this problem. In this work, the authors propose a new resampling method called RUBIES, which provides superior predictive ability compared with other commonly used sampling techniques.
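The abstract does not describe how RUBIES works, and it does not name the baseline techniques it was compared against. The sketch below therefore shows only two generic, widely used resampling ideas, random undersampling and random oversampling, to illustrate what "balancing the data before fitting a classifier" means. It is not the authors' method. The function names, the toy data, and the 14-day labeling threshold (taken from "two weeks or more") are assumptions for illustration only.

```python
import random
from collections import Counter


def group_by_class(labels):
    """Map each class label to the list of row indices carrying that label."""
    by_class = {}
    for i, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(i)
    return by_class


def random_undersample(rows, labels, seed=0):
    """Hypothetical helper: randomly drop rows from larger classes until every
    class has as many rows as the smallest class. Generic random
    undersampling, NOT the RUBIES method."""
    rng = random.Random(seed)
    target = min(Counter(labels).values())
    keep = []
    for idx in group_by_class(labels).values():
        keep.extend(rng.sample(idx, target))  # sample without replacement
    keep.sort()  # keep surviving rows in their original order
    return [rows[i] for i in keep], [labels[i] for i in keep]


def random_oversample(rows, labels, seed=0):
    """Hypothetical helper: randomly duplicate rows from smaller classes
    (sampling with replacement) until every class matches the largest class."""
    rng = random.Random(seed)
    target = max(Counter(labels).values())
    out_rows, out_labels = list(rows), list(labels)
    for lab, idx in group_by_class(labels).items():
        for i in rng.choices(idx, k=target - len(idx)):
            out_rows.append(rows[i])
            out_labels.append(lab)
    return out_rows, out_labels


# Hypothetical toy data: label 1 = long stay (LOS >= 14 days), 0 = short stay.
# The 4.7% minority share mirrors the abstract; the records themselves are synthetic.
los_days = [20] * 47 + [3] * 953
labels = [1 if d >= 14 else 0 for d in los_days]
rows = [[i, d] for i, d in enumerate(los_days)]  # placeholder feature vectors

Xu, yu = random_undersample(rows, labels)
Xo, yo = random_oversample(rows, labels)
print(len(yu), sum(yu))  # 94 47
print(len(yo), sum(yo))  # 1906 953
```

Undersampling discards most short-stay records, while oversampling repeats long-stay records. Either balanced set could then be passed to a standard classifier. The trade-off between losing information and overfitting to duplicated minority rows is what motivates more refined resampling schemes.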
