Predicting Hospital LOS Using Machine Learning

I wanted to share some exciting updates about a project I've been working on with a couple of my classmates. We've been diving deep into the world of healthcare and machine learning, specifically focusing on predicting the length of stay (LOS) for hospital patients. We recently presented some of our preliminary findings, and I'd like to give you a glimpse of what we’ve been up to.

The Motivation Behind Our Project

Hospital bed occupancy is a critical issue. With occupancy levels in England reaching peaks of over 90%, efficient resource allocation is more important than ever. Predicting the length of stay can help hospitals manage their resources better, reduce overcrowding, and ultimately improve patient care.

Our Approach

To tackle this problem, we're using the MIMIC-III Clinical Database, which contains de-identified health-related data from over 40,000 critical care patients, covering more than 50,000 hospital admissions. This rich dataset allows us to explore a range of machine learning models for predicting LOS.

Methods and Models

We're employing several machine learning techniques, including:

Decision Tree-Based Models
Artificial Neural Networks (ANNs)
Clustering Methods (K-Means Clustering)
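To make the first two model families and the R² metric concrete, here is a minimal sketch comparing a gradient-boosted tree regressor with a small feed-forward neural network, using scikit-learn's `GradientBoostingRegressor` and `MLPRegressor`. Everything in it is an assumption for illustration: the data are synthetic and right-skewed rather than MIMIC-III records, and the hyperparameters are arbitrary, so the scores say nothing about our actual results. K-Means is left out because it is unsupervised: it groups patients rather than predicting LOS directly.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 1000
# Synthetic stand-in for patient features (assumption: NOT MIMIC-III data)
X = rng.normal(size=(n, 5))
# Synthetic right-skewed "LOS": log-LOS is linear in two features plus noise
y = np.exp(0.5 * X[:, 0] - 0.3 * X[:, 1] + 0.2 * rng.normal(size=n))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    # Tree ensembles are insensitive to feature scaling
    "gradient_boosted_trees": GradientBoostingRegressor(random_state=0),
    # Neural networks train better on standardised inputs
    "ann": make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0),
    ),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # R^2 = 1 - (residual sum of squares / total sum of squares) on held-out data
    scores[name] = r2_score(y_test, model.predict(X_test))
    print(f"{name}: R^2 = {scores[name]:.3f}")
```

Scoring on a held-out split matters here: R² measured on the training data would overstate how well either model generalises to new admissions.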

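Our workflow covers descriptive and correlation analysis, feature engineering, and model training. We're paying special attention to the skewness of the LOS data. LOS is strongly right-skewed: a small number of very long stays pull the mean well above a typical patient's stay. That skew is why we chose to work with medians, which are far less sensitive to those outliers, for more accurate predictions.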
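Why does skew push us toward medians? The minimal sketch below uses synthetic log-normal "LOS" values (an assumption chosen only because log-normal data are right-skewed; these are not MIMIC-III numbers). It shows that the mean sits above the median, and that always predicting the median gives a lower mean absolute error than always predicting the mean, since the median is the value that minimises absolute error.

```python
import numpy as np

def sample_skewness(x: np.ndarray) -> float:
    """Fisher-Pearson (biased) sample skewness: m3 / m2**1.5."""
    centered = x - x.mean()
    m2 = np.mean(centered ** 2)
    m3 = np.mean(centered ** 3)
    return float(m3 / m2 ** 1.5)

rng = np.random.default_rng(42)
# Synthetic LOS in days from a log-normal distribution: illustrative only
los_days = rng.lognormal(mean=1.0, sigma=0.9, size=10_000)

mean_los = los_days.mean()
median_los = np.median(los_days)
print(f"skewness = {sample_skewness(los_days):.2f}")
print(f"mean = {mean_los:.2f} days, median = {median_los:.2f} days")

# The median minimises mean absolute error; the mean minimises squared error
mae_using_mean = np.mean(np.abs(los_days - mean_los))
mae_using_median = np.mean(np.abs(los_days - median_los))
print(f"MAE if we always predict the mean:   {mae_using_mean:.2f}")
print(f"MAE if we always predict the median: {mae_using_median:.2f}")
```

A positive skewness well above zero is the numerical signature of the long right tail described above.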

Preliminary Results

Our initial results are promising. Using all available data from a patient's hospital stay, one of our ANN models achieved an R² of 0.85. That is partly unsurprising, since a model given the whole stay sees information gathered right up to discharge. Predicting LOS from data recorded only in the first 12 hours after admission, however, yielded a lower R² of 0.45. That gap leaves room for improvement, and it suggests building separate models: one for the time of admission, and another for predicting imminent discharges later in a patient's stay.
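A 12-hour model is only honest if it never sees anything recorded after hour 12; otherwise later information leaks into the features. A minimal pandas sketch of that cut-off follows. The column names `HADM_ID`, `ADMITTIME` and `CHARTTIME` loosely follow MIMIC-III's ADMISSIONS and CHARTEVENTS conventions, but the toy rows, the `VALUE` column and the helper `first_hours_only` are hypothetical and made up for illustration.

```python
import pandas as pd

# Hypothetical toy tables (values invented; MIMIC-III shifts dates into the future)
admissions = pd.DataFrame({
    "HADM_ID": [1, 2],
    "ADMITTIME": pd.to_datetime(["2130-01-01 08:00", "2130-02-10 22:00"]),
})
events = pd.DataFrame({
    "HADM_ID": [1, 1, 1, 2, 2],
    "CHARTTIME": pd.to_datetime([
        "2130-01-01 09:00",  # 1 h after admission -> kept
        "2130-01-01 19:30",  # 11.5 h -> kept
        "2130-01-02 10:00",  # 26 h -> dropped
        "2130-02-11 09:59",  # 11 h 59 min -> kept
        "2130-02-11 10:01",  # 12 h 1 min -> dropped
    ]),
    "VALUE": [80, 95, 110, 70, 72],
})

def first_hours_only(events, admissions, hours=12):
    """Hypothetical helper: keep events charted within `hours` of ADMITTIME."""
    merged = events.merge(admissions[["HADM_ID", "ADMITTIME"]], on="HADM_ID", how="inner")
    elapsed = merged["CHARTTIME"] - merged["ADMITTIME"]
    in_window = (elapsed >= pd.Timedelta(0)) & (elapsed <= pd.Timedelta(hours=hours))
    return merged.loc[in_window].drop(columns="ADMITTIME")

early = first_hours_only(events, admissions)
# Aggregate to one row per admission so each admission becomes one model input
features = early.groupby("HADM_ID")["VALUE"].agg(["mean", "count"])
print(features)
```

Admission 1 keeps its two readings from within 12 hours (mean 87.5), while admission 2 keeps only the reading charted just before the cut-off.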

I'm personally getting my best results from gradient-boosted decision trees, with R² scores of over 0.7. I'm excited to build models with this method at three points: the time of admission, the time of transfer out of the ICU, and perhaps every day spent in hospital, predicting each day the probability of discharge within the next 24 hours. I also plan to build nested models that combine several approaches, and I'm curious how much improvement that brings.
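The daily "discharge within 24 hours" idea turns one admission into several training rows, one per elapsed day, each with a binary label. The minimal sketch below shows one possible way to build those labels; it is an assumption, not our finalised design. `ADMITTIME` and `DISCHTIME` follow MIMIC-III's ADMISSIONS naming, but the rows and the helper `daily_discharge_labels` are hypothetical.

```python
import pandas as pd

# Hypothetical toy admissions (values invented for illustration)
admissions = pd.DataFrame({
    "HADM_ID": [1, 2],
    "ADMITTIME": pd.to_datetime(["2130-01-01 08:00", "2130-02-10 22:00"]),
    "DISCHTIME": pd.to_datetime(["2130-01-03 12:00", "2130-02-11 06:00"]),
})

def daily_discharge_labels(admissions: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one row per admission per elapsed day,
    labelled 1 if discharge happens within the next 24 hours."""
    rows = []
    for adm in admissions.itertuples(index=False):
        day = 0
        snapshot = adm.ADMITTIME
        # Take a snapshot every 24 h after admission until the patient leaves
        while snapshot < adm.DISCHTIME:
            remaining = adm.DISCHTIME - snapshot
            rows.append({
                "HADM_ID": adm.HADM_ID,
                "day": day,
                "snapshot_time": snapshot,
                "discharged_within_24h": int(remaining <= pd.Timedelta(hours=24)),
            })
            day += 1
            snapshot = adm.ADMITTIME + pd.Timedelta(days=day)
    return pd.DataFrame(rows)

labels = daily_discharge_labels(admissions)
print(labels)
```

Admission 1 (a 52-hour stay) yields labels 0, 0, 1 for days 0 to 2; admission 2 (8 hours) yields a single positive row. Features for each row would need to be built only from data charted before its `snapshot_time`, after which a classifier such as gradient-boosted trees could be trained on the labels.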

Next Steps

We're excited to continue refining our models and explore:

The key factors that determine LOS
How prediction accuracy differs across the stages of a patient's stay
Identifying the best-fit model for accurate predictions

Predicting the length of stay in hospitals can be a game-changer for healthcare efficiency. Our work is a step towards achieving this goal, and we’re eager to see how our models can be applied in real-world settings.

Thank you for reading, and stay tuned for more updates on our journey!

What I've Been Up To