mnusrat786

Muhammad Osama Nusrat

Recently Published

Applied statistics: Missing data handling tutorial
Missing data is not a trivial problem when analyzing a dataset, and handling it is rarely straightforward. If the amount of missing data is very small relative to the size of the dataset, leaving out the few samples with missing values may be a good solution that does not bias the analysis. However, leaving out available data (even a few samples) may discard useful information, and depending on the situation you face, you may want to look for other fixes before dropping potentially useful data from your dataset. While quick solutions such as _mean imputation_ may work in some cases, such simple approaches usually introduce bias into the data. For instance, _mean imputation_ leaves the mean unchanged (which is desirable) but decreases the variance, which may be undesirable. On the other hand, the _mice_ method imputes missing values with plausible data values. These plausible values are drawn from a distribution specifically designed for each missing data point, which can be a good solution. In this tutorial, we use R's built-in 'airquality' dataset as a sample dataset: 1. We identify the missing data. 2. We visualize the data. 3. We handle the missing data: imputation, mice, etc.
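
The claims above about mean imputation can be checked directly on 'airquality'. The following is a minimal sketch, not the tutorial's own code: the helper `impute_mean` and all variable names are illustrative assumptions, and the _mice_ step assumes the `mice` package is installed (it is skipped otherwise). The choice of predictive mean matching (`method = "pmm"`), `m = 5`, and the seed are likewise assumptions made for the example.

```r
# `airquality` ships with base R in the datasets package.
data(airquality)

# 1. Identify missing data: NA count per column and number of complete rows.
na_counts  <- colSums(is.na(airquality))
n_complete <- sum(complete.cases(airquality))
print(na_counts)
print(n_complete)

# 2. Mean imputation (illustrative helper, not part of the tutorial):
#    replace every NA in a numeric vector with the mean of the observed values.
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

ozone_obs <- airquality$Ozone
ozone_imp <- impute_mean(ozone_obs)

# The mean is preserved, but the variance shrinks because the imputed
# values add no spread while increasing the sample size.
mean_before <- mean(ozone_obs, na.rm = TRUE)
mean_after  <- mean(ozone_imp)
var_before  <- var(ozone_obs, na.rm = TRUE)
var_after   <- var(ozone_imp)
print(c(mean_before = mean_before, mean_after = mean_after))
print(c(var_before = var_before, var_after = var_after))

# 3. Multiple imputation with mice, only if the package is available.
#    m, method and seed are assumptions chosen for this sketch.
if (requireNamespace("mice", quietly = TRUE)) {
  imp <- mice::mice(airquality, m = 5, method = "pmm",
                    seed = 123, printFlag = FALSE)
  completed <- mice::complete(imp, 1)  # first of the m completed datasets
  print(colSums(is.na(completed)))
}
```

Comparing `var_before` with `var_after` shows the variance reduction that makes mean imputation a risky default, while `mice::complete()` returns one of several plausible completed datasets rather than a single deterministic fill.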