Data Analysis with Azure Machine Learning: Healthcare Data Demo


Data is the lifeblood driving innovation in many industries, including healthcare. Machine learning algorithms are being used to predict patient outcomes, optimize treatment plans, and more. Azure Machine Learning is a popular platform for machine learning. Let’s take a look at some of Azure's tools. I’ll guide you through a hands-on demonstration of how to harness machine learning to analyze some synthetic healthcare data.
Obtain and Prepare Source Data
On your development machine, create a new directory called healthcare_azure
.
Go to the Kaggle Healthcare Dataset by Prasad Patil. It is a synthetic healthcare dataset designed to mimic real-world healthcare data.
Click the Download button, then Download dataset as zip. You may need to create an account or sign in to Kaggle (don’t worry, it’s free).
When the file is done downloading, unzip it. Then, move the file healthcare_dataset.csv
into the healthcare_azure
directory you created earlier.
Feel free to explore this file to get a sense of what data it contains. You will notice it has columns for things like age, gender, medical condition, test results, and more.
Also in the healthcare_azure
directory, create a file called MLTable
(no extension) with the following content:
# MLTable definition file for Healthcare Dataset
type: mltable # Specifies the type of asset definition
paths:
- file: ./healthcare_dataset.csv # Path to data file, relative to this MLTable file
# How to read the data file(s)
transformations:
- read_delimited:
delimiter: ","
encoding: "utf8"
header: all_files_same_headers # First row contains column names
This is a kind of definition file that describes the data in the directory. For this project, it’s simple because there’s only one CSV file.
Launch Azure Machine Learning Studio
Go to the Microsoft Azure Portal. You may need to create or sign in to your Microsoft account. If this is your first time using Azure services, you’ll be set up with a 1-month free trial account and $200 in credits.
Once you’re in, under Azure services, click on More Services.
You’ll see many sections. Find the AI + machine learning section and click on Azure Machine Learning under it.
You’ll need to create a workspace. Workspaces are dedicated spaces where you build, train, and deploy your models. Click on Create, then New workspace.
Now you’ll need to assign the workspace to a resource group, which in our case means creating a new one. Resource groups are used for management, billing, and lifecycle control. Under Resource group, click Create new.
Enter a name for the resource group, like default
and click OK.
It’s time to name the workspace. Put something like healthcare_demo
in the Name box. The rest of the default on this screen are fine.
Click on Review + create, then Create.
Once the deployment is complete, click on Go to resource.
Finally, click on Launch studio. We’re ready to work on some machine learning tasks.
Uploading Your Data
Data is at the heart of any machine learning project. In this step, you’ll get your data into Azure Studio.
From the sidebar on the left, under Assets, click Data.
Under Select data, click Create.
Under Name, enter kaggle_healthcare_dataset
.
Under Type, make sure Table (mltable) is selected.
Click Next.
Select From local files and click Next.
Under Datastore type, make sure Azure Blob Storage is selected.
Make sure workspaceblobstore is selected.
Click Next.
Click Upload folder.
Choose the healthcare_azure directory on your local machine and click Upload.
Click Next.
Click Create.
Author an ML Job
From the sidebar on the left, under Authoring, click Automated ML.
Click New Automated ML job.
You can use the default Job name and Experiment name, or customize them with your own name.
Click Next.
Under Select task type, select Classification.
Under Select data, check the box next to kaggle_healthcare_dataset.
Click Next.
Under Target column, select Test Results (String).
If you have a paid subscription, you can check the box for Enable deep learning. This featurizes text data and provides higher accuracy. But it requires a GPU, which none of the compute options on the free trial provide.
Under Validation type, select Train-validation split.
Under Percentage validation of data, enter 10
.
Click Next
Under Select compute type, select Serverless.
Under Virtual machine type, select CPU.
Under Virtual machine tier, select Dedicated.
Under Virtual machine size, choose an option that works for you. The more expensive the option, the faster the job will run.
Under Number of instances, enter a number that works for you. The more instances, the faster the job will run.
Note that choosing a cheap model with too few instances may cause the job to hit the default timeout limit of 6 hours.
Click Next.
Click Submit training job.
The job could take minutes, hours, or days depending on the source data, configuration, and compute resources requisitioned.
Auto ML will train and validate several models on the input data. When it’s complete, you’ll be able to evaluate all models.
Exploring the Results
It might take a while, but once the job is complete, you can view the performance of the models Auto ML created.
On the sidebar on the left, under Authoring, click on Automated ML.
Click the Display name of the recently completed job.
Click the Models + child jobs tab.
Look at all the algorithms Auto ML created and validated. In this table, you can see the AUC weighted for each one. AUC weighted measures the model’s ability to distinguish between classes. Higher is better.
Click the name of the algorithm with the highest AUC weighted (the one at the top of the table). That’s the best model.
Click Metrics.
Scroll down and view the the confusion_matrix.
This visualization shows how well the model predicted each category (Abnormal, Normal, Inconclusive). You can see exactly where the model go confused. Meaning, for example, how many Normal cases were incorrectly predicted as Abnormal.
Also take a look at all of the other metrics. Accuracy, Precision, Recall, and F1-Score are particularly important.
Next Steps
Azure Machine Learning gives you the ability to deploy a real-time endpoint, which enables immediate predictions. You deploy the model as a web service that's always running. You send it one or more data records via an API call, and it returns the predicted Test Result ('Abnormal', 'Normal', or 'Inconclusive') almost instantly. This is good for interactive applications.
However, you can also deploy a batch endpoint, which is what you want if you have a large dataset you want to score periodically. You provide the input dataset, then the endpoint processes it in the background, and saves the predictions to storage.
Deployment is outside the scope of this article, but perhaps I’ll write about it in the future!
Hopefully you’ve learned a bit about how to prepare, upload, and analyze synthetic healthcare data to gain valuable insights. Azure’s capabilities, including Automated ML, allow for efficient model training and evaluation, helping to identify the best-performing algorithms. As I continue to explore the tools, I’m finding many opportunities to enhance data-driven decision-making. Hopefully you will, too.
Subscribe to my newsletter
Read articles from Travis Horn directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by

Travis Horn
Travis Horn
I have a passion for discovering and working with cutting-edge technology. I am a constant and quick learner. I enjoy collaborating with others to solve problems. I believe helping people achieve their goals helps me achieve mine.