Just wanna share my experience


Missing or incomplete data are common in many studies. An observation is considered an incomplete case if the value of any of its variables is missing. Even with the best design and monitoring, observations can be incomplete, usually for one of the following reasons: missingness by design, censoring and drop-out, or non-response. Most statistical packages exclude incomplete cases from the analysis by default. This approach is easy to implement but has serious problems. First, discarding the information in incomplete cases can reduce the efficiency of the study. Second, it may lead to substantial bias in the analysis. Missing data therefore deserve careful consideration in any analysis.

In statistical terminology, the missingness mechanism is usually classified into three types: 1) missing completely at random (MCAR); 2) missing at random (MAR); and 3) missing not at random (MNAR).
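
The three mechanisms can also be written compactly as statements about conditional probabilities. The notation here is an assumption added for illustration and does not come from the original post: R is the vector of missingness indicators, Y_obs the observed part of the data and Y_mis the unobserved part.

```latex
\begin{align*}
\text{MCAR:}\quad & P(R \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}) = P(R) \\
\text{MAR:}\quad  & P(R \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}) = P(R \mid Y_{\mathrm{obs}}) \\
\text{MNAR:}\quad & P(R \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}) \text{ still depends on } Y_{\mathrm{mis}}
\end{align*}
```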

Missing Completely at Random (MCAR)

A non-response process is said to be missing completely at random (MCAR) if the missingness is independent of both the unobserved and the observed data. Under MCAR, the observed data can be analyzed as though the pattern of missing values were predetermined. Whatever analysis procedure is used, the process generating the missing values can be ignored.
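
A small simulation can make this concrete. The sketch below is only an illustration under assumed settings (normally distributed values, a 30% missingness rate, an arbitrary seed); none of these numbers come from a real study. Each value is deleted with the same probability regardless of the data, so the mean of the remaining values should stay close to the full-data mean.

```python
import random
import statistics


def simulate_mcar(n=20000, p_missing=0.3, seed=1):
    """Delete values with a fixed probability that ignores the data."""
    rng = random.Random(seed)
    y = [rng.gauss(50.0, 10.0) for _ in range(n)]
    # MCAR: every value has the same chance of being missing
    observed = [v for v in y if rng.random() >= p_missing]
    return statistics.mean(y), statistics.mean(observed), len(observed)


full_mean, cc_mean, n_obs = simulate_mcar()
print(f"full-data mean:     {full_mean:.2f}")
print(f"complete-case mean: {cc_mean:.2f} (n = {n_obs})")
```

The price of dropping the incomplete cases here is a smaller sample, not a biased estimate.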

Missing at Random (MAR)

A non-response process is said to be missing at random (MAR) if, conditional on the observed data, the missingness is independent of the unobserved measurements. According to Molenberghs and Verbeke, the MAR assumption is particularly convenient in that it leads to considerable simplification of the issues surrounding the analysis of incomplete longitudinal data. However, it is rare in practice for an investigator to be able to justify its adoption, and so in such situations the final class of missing-value mechanisms, MNAR, cannot be ruled out.
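
As a rough illustration of MAR, the sketch below uses a hypothetical fully observed covariate x and an outcome y that is correlated with x. The chance that y is missing depends only on x. The distributions, the missingness rates and the two-stratum adjustment are all assumptions chosen to keep the example short, not a recommended analysis. The plain complete-case mean is pulled away from the truth, but an estimate that conditions on the observed x recovers it.

```python
import random
import statistics


def simulate_mar(n=20000, seed=2):
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)                    # fully observed covariate
        y = 50.0 + 5.0 * x + rng.gauss(0.0, 5.0)   # outcome, correlated with x
        # MAR: the chance that y is missing depends only on the observed x
        p_missing = 0.6 if x > 0 else 0.1
        y_obs = None if rng.random() < p_missing else y
        rows.append((x, y, y_obs))

    full_mean = statistics.mean(r[1] for r in rows)
    cc_mean = statistics.mean(r[2] for r in rows if r[2] is not None)

    # Adjust for x: average the observed y within each stratum of x, then
    # weight by the stratum sizes, which are known because x is never missing
    adjusted = 0.0
    for upper in (True, False):
        stratum = [r for r in rows if (r[0] > 0) == upper]
        seen = [r[2] for r in stratum if r[2] is not None]
        adjusted += len(stratum) / n * statistics.mean(seen)
    return full_mean, cc_mean, adjusted


full_mean, cc_mean, adj_mean = simulate_mar()
print(f"full-data mean:        {full_mean:.2f}")
print(f"complete-case mean:    {cc_mean:.2f}")
print(f"mean adjusted for x:   {adj_mean:.2f}")
```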

Missing Not at Random (MNAR)

A process that is neither MCAR nor MAR is termed missing not at random (MNAR). Under MNAR, the probability of a measurement being missing depends on the unobserved data. Inference can then only be made by making further assumptions, about which the observed data alone carry no information.
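
The sketch below illustrates MNAR with assumed, made-up settings: a value is more likely to be missing when the value itself is high. Because the missingness depends on exactly the numbers that are not observed, the complete-case mean is biased, and nothing in the observed values alone reveals by how much.

```python
import random
import statistics


def simulate_mnar(n=20000, seed=3):
    rng = random.Random(seed)
    y = [rng.gauss(50.0, 10.0) for _ in range(n)]
    observed = []
    for v in y:
        # MNAR: the chance of being missing depends on the value itself
        p_missing = 0.6 if v > 50.0 else 0.1
        if rng.random() >= p_missing:
            observed.append(v)
    return statistics.mean(y), statistics.mean(observed)


full_mean, cc_mean = simulate_mnar()
print(f"full-data mean:     {full_mean:.2f}")
print(f"complete-case mean: {cc_mean:.2f}")
```

In a real study the full-data mean is of course unknown, which is why MNAR analyses have to rest on extra assumptions.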

Missingness frequently complicates the analysis of longitudinal data. In many clinical trials and other settings, the standard methodology used to analyze incomplete longitudinal data is based on methods such as complete case analysis (CC), last observation carried forward (LOCF), or simple forms of imputation (unconditional or conditional mean imputation). This is often done without questioning the possible influence of the assumptions behind these methods on the final results.
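
To see how much the choice of method can matter, here is a minimal sketch of the three simple approaches applied to a tiny, entirely hypothetical data set (four subjects, three visits, scores invented for illustration). The function names are my own and the mean imputation shown is the unconditional version, using the mean of the observed values at each visit.

```python
from statistics import mean

# Hypothetical scores at three visits; None marks a missing value
data = {
    "s1": [24, 26, 27],
    "s2": [20, None, None],   # dropped out after visit 1
    "s3": [28, 27, None],     # dropped out after visit 2
    "s4": [22, 23, 25],
}


def complete_cases(data):
    """CC: keep only subjects with no missing value."""
    return {k: v for k, v in data.items() if None not in v}


def locf(values):
    """LOCF: replace each missing value with the last observed one."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def mean_impute(data):
    """Unconditional mean imputation: fill with the observed visit mean."""
    n_visits = len(next(iter(data.values())))
    visit_means = [
        mean(v[t] for v in data.values() if v[t] is not None)
        for t in range(n_visits)
    ]
    return {
        k: [visit_means[t] if v[t] is None else v[t] for t in range(n_visits)]
        for k, v in data.items()
    }


cc_final = mean(v[-1] for v in complete_cases(data).values())
locf_final = mean(locf(v)[-1] for v in data.values())
imp_final = mean(v[-1] for v in mean_impute(data).values())

print(f"final-visit mean, CC:              {cc_final:.2f}")
print(f"final-visit mean, LOCF:            {locf_final:.2f}")
print(f"final-visit mean, mean imputation: {imp_final:.2f}")
```

Even on four subjects the methods disagree about the final-visit mean, and each answer is only as good as the assumption behind it.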


Fitzmaurice, G. M., Laird, N. M. and Ware, J. H. (2004). Applied Longitudinal Analysis. Wiley Series in Statistics. Wiley-IEEE.

Folstein, M. F., Folstein, S. and McHugh, P. R. (1975). “Mini-mental state”: a practical method for grading the cognitive state of patients for the clinician. Journal of Psychiatric Research, 12, pp. 189–198.

Jansen, I., Beunckens, C., Molenberghs, G., Verbeke, G. and Mallinckrodt, C. (2006). Analyzing incomplete discrete longitudinal clinical trial data. Statistical Science, 21(1), pp. 52–69.

Molenberghs, G. and Verbeke, G. (2005). Models for Discrete Longitudinal Data. New York: Springer-Verlag.

Molenberghs, G. and Verbeke, G. (2007). Longitudinal Data Analysis. Censtat, Universiteit Hasselt.


Posted May 12, 2008 | Statistics