Necessary Packages

# Uncomment the install lines if the packages are not already installed.
# install.packages("naniar")
library(naniar)
# install.packages("Hmisc")
library(Hmisc)

Missing data is an advanced topic, and this document does not attempt a comprehensive treatment of how to handle it in statistical modeling. It covers just two tools that I’m fond of: the naniar and Hmisc R packages. All examples use the built-in airquality dataset. Before you proceed, make sure both packages are installed.

Missing data visualization

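The naniar package provides a range of tools for understanding the nature of missingness in a dataset. The vis_miss() function draws the whole data frame as a grid, marks each cell as present or missing, and labels each variable with its percentage of missing values.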

data("airquality")
vis_miss(airquality)
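
If you want the numbers behind the plot, a quick base R check works as a complement. This is just a minimal sketch; the names na_counts and na_percent are my own, and naniar has its own summary helpers (such as miss_var_summary()) that may suit you better.

```r
# Base R only, so this runs even without naniar installed.
data("airquality")

# Number of missing values in each column.
na_counts <- colSums(is.na(airquality))

# Share of missing values in each column, as a percentage.
na_percent <- round(100 * colMeans(is.na(airquality)), 1)

print(na_counts)
print(na_percent)
```

Only Ozone and Solar.R should show nonzero counts, which matches what the plot displays.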

naniar also provides an UpSet plot, gg_miss_upset(), for exploring patterns of missingness, that is, which variables tend to be missing together. For airquality, the plot shows that only two observations are missing both Ozone and Solar.R.

gg_miss_upset(airquality)
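
The count read off the upset plot can be double-checked with a few lines of base R. Again, this is a sketch rather than anything naniar-specific, and the variable names (ozone_na, solar_na, both_missing, pattern_table) are my own.

```r
data("airquality")

# Logical indicators of missingness for the two affected variables.
ozone_na <- is.na(airquality$Ozone)
solar_na <- is.na(airquality$Solar.R)

# Number of rows where both variables are missing.
both_missing <- sum(ozone_na & solar_na)

# Cross-tabulate the two indicators to see every combination.
pattern_table <- table(Ozone_missing = ozone_na, Solar.R_missing = solar_na)

print(both_missing)
print(pattern_table)
```

The cross-tabulation carries the same information as the bars in the upset plot: rows missing only Ozone, only Solar.R, both, or neither.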