Do you know there are 3 types of missing values?
Introduction: My experience
I hope you are doing great. In the previous blog we discussed what missing values are, how to identify them, what harm they do if we leave them as they are, and approaches to deal with them. Today we will discuss an important but rarely discussed topic: the types of missing values.
Most of my friends practicing machine learning and deep learning know what missing values are and how to deal with them, but only a few of them are aware of the types of missing values. Because of this, I have personally seen them apply some technique at random instead of choosing one based on the type of missing value.
Types of missing values
There are basically 3 different types of missing values (MAR, MCAR and MNAR). In the next couple of minutes we will discuss each type with an example and look at which techniques are best suited to each one. So without any further delay, let's get started.
MAR (Missing at random)
MAR stands for missing at random, and it is one of the most common types of missing values. In MAR, the missingness in one feature (variable) is related to the values of other observed features in the dataset and, once those features are taken into account, not to the missing value itself.
To better understand this, let us assume that we conducted a survey in which we asked individuals about their income and level of education. Suppose we observed that individuals with a higher level of education more often left the income question blank. In this scenario the missingness in one feature (income) is explained by the values of another observed feature, which in our case is education level.
In such a scenario we can impute the missing values by understanding the pattern between that feature and the other features in our dataset. In short, for MAR we can do multivariate imputation: some individuals will have mentioned their income, so we can use those data points to analyze the pattern and impute the missing values based on it.
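Here is a minimal sketch of that idea in plain Python. The survey data, the three education levels and the skip probabilities are all made up for illustration, and filling with the mean of the same education level is just one simple form of multivariate imputation, not the only one.

```python
import random
from statistics import mean

random.seed(0)

# Hypothetical survey: education level (0, 1, 2) is always observed,
# and income depends on education plus some noise.
rows = []
for _ in range(3000):
    edu = random.choice([0, 1, 2])
    income = 30000 + 20000 * edu + random.gauss(0, 5000)
    rows.append({"edu": edu, "income": income})

true_mean = mean(r["income"] for r in rows)

# MAR: the chance of skipping the income question depends only on education.
skip_prob = {0: 0.1, 1: 0.3, 2: 0.7}  # assumed values
observed = [
    None if random.random() < skip_prob[r["edu"]] else r["income"]
    for r in rows
]

# Univariate baseline: fill every gap with the overall observed mean.
overall_mean = mean(v for v in observed if v is not None)
filled_global = [overall_mean if v is None else v for v in observed]

# Multivariate: fill each gap with the observed mean of the same education level.
group_mean = {}
for level in (0, 1, 2):
    group_mean[level] = mean(
        v for r, v in zip(rows, observed)
        if v is not None and r["edu"] == level
    )
filled_group = [
    group_mean[r["edu"]] if v is None else v
    for r, v in zip(rows, observed)
]

global_error = abs(mean(filled_global) - true_mean)
group_error = abs(mean(filled_group) - true_mean)
print(f"error of mean income, overall-mean fill: {global_error:.0f}")
print(f"error of mean income, per-education fill: {group_error:.0f}")
```

In this simulated data the highly educated (and higher earning) respondents skip the question more often, so the overall-mean fill pulls the average income down, while the per-education fill stays much closer to the true average.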
MCAR (Missing completely at random)
MCAR stands for missing completely at random. In this type, the missingness in a feature is completely random: it is not caused by any other feature in our dataset or by the missing values themselves. Because of this, we can't find any pattern between the missingness and the other features present in our dataset.
To better understand MCAR, let us assume that we conducted a survey to capture participants' age and their color preference, but unfortunately, due to some error during data collection, certain participants' favorite color data went missing. In this case the missingness is completely random, so instead of multivariate imputation, a simple univariate imputation technique (for example, filling with the mean, median or most frequent value) would be more appropriate.
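As a rough sketch of univariate imputation for this example, the snippet below fills the missing favorite colors with the most frequent observed color (the mode). The survey data, the color list and the 20% loss rate are assumptions made only for illustration.

```python
import random
from collections import Counter

random.seed(1)

# Hypothetical survey: age and favorite color for 1000 participants.
colors = ["blue", "green", "red"]
weights = [0.5, 0.3, 0.2]  # assumed popularity of each color
rows = [
    {"age": random.randint(18, 70),
     "color": random.choices(colors, weights)[0]}
    for _ in range(1000)
]

# MCAR: every answer has the same 20% chance of being lost,
# regardless of the participant's age or the color they chose.
for r in rows:
    if random.random() < 0.2:
        r["color"] = None

n_missing = sum(r["color"] is None for r in rows)

# Univariate imputation: use only the color column itself and
# fill the gaps with its most frequent observed value.
observed = [r["color"] for r in rows if r["color"] is not None]
mode_color = Counter(observed).most_common(1)[0][0]
for r in rows:
    if r["color"] is None:
        r["color"] = mode_color

print(f"filled {n_missing} missing answers with '{mode_color}'")
```

For a numeric feature such as age, the same approach would use the mean or median of the observed values instead of the mode.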
MNAR (Missing not at random)
Missing Not at Random (MNAR) refers to a type of missingness where the missing values are not randomly occurring, and the missingness is related to the unobserved data or factors that are not captured in the observed variables.
Let's say you're conducting a study on the effectiveness of a new medication for chronic pain management. The study involves collecting data on participants' pain levels before and after taking the medication. However, some participants who experience severe pain might be less likely to complete the post-medication pain assessment.
In this scenario, the missingness of the post-medication pain assessment data is related to the unobserved pain severity. Participants with higher levels of pain may be more likely to drop out of the study or not provide complete information about their post-medication pain levels. The missingness mechanism is directly influenced by the unobserved variable (pain severity), indicating that it is not random. In this MNAR situation, the missingness is driven by the values of the missing variable itself (post-medication pain) and is not dependent solely on observed variables. Ignoring or improperly handling the MNAR missingness could lead to biased conclusions about the medication's effectiveness.
To address MNAR, specialized methods can be employed. For example, selection models can be used to model both the missingness mechanism and the relationship between the missing variable and other observed variables. These models estimate the missing values while accounting for the missingness pattern.
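The snippet below is not a selection model. It is only a small simulation, under assumed numbers, of why the pain study above gets biased if the MNAR missingness is ignored. The pain scale, the score distribution and the `dropout_prob` function are all hypothetical.

```python
import random
from statistics import mean

random.seed(2)

# Hypothetical post-medication pain scores on a 0-10 scale.
true_scores = [
    min(10.0, max(0.0, random.gauss(5, 2)))
    for _ in range(5000)
]

def dropout_prob(score):
    # MNAR: the chance of skipping the assessment grows with the
    # pain score itself (assumed: 0% at score 0, 80% at score 10).
    return 0.08 * score

# Only participants who did not drop out report their score.
observed = [s for s in true_scores if random.random() >= dropout_prob(s)]

true_mean = mean(true_scores)
observed_mean = mean(observed)
print(f"true mean pain:     {true_mean:.2f}")
print(f"observed mean pain: {observed_mean:.2f}")
print(f"responses kept:     {len(observed)} of {len(true_scores)}")
```

Because the participants in the most pain are the ones who go missing, the observed average pain comes out lower than the true average, which would make the medication look more effective than it really is.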
Short note
I hope you now have a good understanding of the types of missing values and which technique to apply for each type. If you liked this blog or have any suggestions, kindly like it or leave a comment below. It would mean a lot to me.
Written by
Yuvraj Singh
With hands-on experience from my internships at Samsung R&D and Wictronix, where I worked on innovative algorithms and AI solutions, as well as my role as a Microsoft Learn Student Ambassador teaching over 250 students globally, I bring a wealth of practical knowledge to my Hashnode blog. As a three-time award-winning blogger with over 2400 unique readers, my content spans data science, machine learning, and AI, offering detailed tutorials, practical insights, and the latest research. My goal is to share valuable knowledge, drive innovation, and enhance the understanding of complex technical concepts within the data science community.