Missing in Action

By
Roberto Garrido
February 11, 2014

Most data is not created perfect. One imperfection familiar to most data analysts is missing values. Records that have missing values in one or more variables are called incomplete cases. In SAS, most procedures that analyse data ignore records with missing values. Only records with a value for every variable, known as complete cases, are analysed.
Removing incomplete cases is a simple approach to the missing values problem. When a significant share of the values is missing in a systematic way, however, removing incomplete records can yield incorrect results. The remaining “good” data are no longer an unbiased representation of the population, and inferences drawn from this sample may not hold true for the population.
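To make the bias concrete, here is a minimal Python sketch rather than SAS code. The records, the `income` variable and the rule that drops values above 70 are all hypothetical, chosen only so that the missingness is clearly systematic.

```python
from statistics import mean

# Hypothetical "true" incomes; in practice these would not all be known.
true_incomes = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

# Assumed systematic missingness: incomes above 70 are not reported (None).
records = [
    {"id": i, "income": (x if x <= 70 else None)}
    for i, x in enumerate(true_incomes)
]

# Complete cases: records with no missing value in any variable.
complete_cases = [
    r for r in records if all(v is not None for v in r.values())
]

full_mean = mean(true_incomes)
complete_case_mean = mean(r["income"] for r in complete_cases)

print(len(complete_cases), full_mean, complete_case_mean)
```

Seven of the ten records survive, and their mean income (40) is well below the mean of the full data (55), so an analysis restricted to complete cases would understate income in this made-up population.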
There are simple ways of addressing incomplete data. Populating the missing values of a variable with the mean of its observed values is a common remedy. This method allows a record's other non-missing values to be used in an analysis. In this single imputation approach, however, the uncertainty about the predicted missing values is not taken into account.
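As an illustration, mean imputation can be sketched in a few lines of Python. The `values` list is invented for the example, and `None` stands in for a missing value.

```python
from statistics import mean, variance

# Hypothetical variable with two missing values (None).
values = [4.0, None, 6.0, 8.0, None, 10.0]

observed = [v for v in values if v is not None]
fill = mean(observed)  # mean of the observed values

# Single imputation: every missing value gets the same fill value.
imputed = [fill if v is None else v for v in values]

print(mean(imputed))                         # same mean as the observed values
print(variance(observed), variance(imputed)) # sample variance before and after
```

The mean is preserved, but the sample variance of the filled-in data is smaller than that of the observed values. The imputed points sit exactly on the mean, so the data look more certain than they really are.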
SAS provides an alternative way of populating missing values that is more robust but rather more complex. The procedure is called MI and performs multiple imputation of missing data. Rather than using a single value as a proxy, the methodology draws on a set of plausible values and their underlying distribution. Because each missing value is represented by a random sample of probable values, standard statistical methods, including confidence intervals, can reflect the uncertainty about that value.
The multiple imputation approach can be summarised in three steps:
1. The missing data are filled in n times to generate n complete data sets.
2. Each of the n complete data sets is analysed using standard statistical analyses.
3. The results of the n analyses are combined to perform statistical inferences.
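The three steps above can be sketched in Python under some loud assumptions. The imputation model here, a normal distribution fitted to the observed values, is a deliberate simplification and does not attempt to reproduce the methods PROC MI implements. The pooling formula is the commonly used Rubin's rules, which the steps above do not name. The function names, the `n_imputations` parameter and the data are all hypothetical.

```python
import random
from statistics import mean, stdev, variance

def impute_once(values, rng):
    """Fill each missing value (None) with a random draw from a normal
    distribution fitted to the observed values (simplified model)."""
    observed = [v for v in values if v is not None]
    mu, sigma = mean(observed), stdev(observed)
    return [rng.gauss(mu, sigma) if v is None else v for v in values]

def multiply_impute_mean(values, n_imputations=5, seed=0):
    """Estimate the mean of `values` using n_imputations completed data sets."""
    rng = random.Random(seed)
    estimates, variances = [], []
    for _ in range(n_imputations):
        completed = impute_once(values, rng)       # step 1: fill in
        estimates.append(mean(completed))          # step 2: analyse
        # Squared standard error of the mean for this completed data set.
        variances.append(variance(completed) / len(completed))
    # Step 3: combine (Rubin's rules, assumed here).
    pooled = mean(estimates)
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # variance between the n estimates
    total = within + (1 + 1 / n_imputations) * between
    return pooled, total, within, between

values = [4.0, None, 6.0, 8.0, None, 10.0]
pooled, total, within, between = multiply_impute_mean(values)
print(pooled, total, within, between)
```

The between-imputation term is what single imputation lacks: it grows when the filled-in values disagree across the completed data sets, and it widens the total variance accordingly.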
This approach improves on single imputation because it takes into account the uncertainty associated with each missing value.
While computation takes longer with this methodology, that is no longer a significant deterrent, since modern computers have enough processing speed to handle such tasks.
Roberto

Copyright © 2019 OptimalBI LTD.