Classification using L1-Penalized Logistic Regression
Many classification algorithms have been developed in different fields. Several are commonly used in genomics, such as linear discriminant analysis (LDA), the nearest neighbor classifier, and logistic regression. Authors such as Gohlmann and Talloen (2009) and Lee et al. (2005) have comprehensively reviewed and compared these algorithms.
Logistic regression is a supervised method for binary or multi-class classification (Hosmer and Lemeshow, 1989). Because it is a simple, flexible model that is easy to extend, extensions of logistic regression have been widely used in genomics research (e.g., Liao and Chin, 2007; Sun and Wang, 2012).
In high-dimensional datasets, such as microarray data, where there are usually more variables than observations and the variables are correlated (multicollinearity), classical logistic regression performs badly and gives inaccurate estimates. It can fit the training data perfectly, with little or no bias but high variance, which leads to poor prediction on new data (overfitting). To prevent this problem, a penalty for model complexity should be introduced.
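As a sketch of what such a penalty looks like, the L1 (lasso) penalty of Tibshirani (1996) adds the sum of absolute coefficient values to the negative log-likelihood. The notation here is assumed for illustration and does not come from the text above: $y_i \in \{0, 1\}$ is the class label of sample $i$, $x_i$ is its vector of $p$ variables, $\beta_0$ is an unpenalized intercept, and $\lambda \ge 0$ is a tuning parameter.

```latex
(\hat{\beta}_0, \hat{\beta}) =
  \arg\min_{\beta_0,\, \beta}
  \left\{
    -\frac{1}{n} \sum_{i=1}^{n}
      \left[
        y_i \left( \beta_0 + x_i^{\top} \beta \right)
        - \log\left( 1 + e^{\beta_0 + x_i^{\top} \beta} \right)
      \right]
    + \lambda \sum_{j=1}^{p} \left| \beta_j \right|
  \right\}
```

With $\lambda = 0$ this reduces to classical logistic regression. Larger values of $\lambda$ shrink the coefficients and set some of them exactly to zero, so the penalty also performs variable selection.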
The presentation, which can be viewed here, gives a short overview of L1-penalized logistic regression. An example application of this method in genomics is identifying candidate classifier genes that distinguish two groups, e.g., a cancer and a non-cancer group.
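The gene-selection use case can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the method from the presentation: the data are synthetic (40 samples, 50 "genes", of which only genes 0 and 1 differ between groups), the fit uses a simple proximal gradient scheme with soft-thresholding, and the names `fit_l1_logistic`, `soft_threshold`, `lam`, and `step` are hypothetical. In practice a dedicated package with cross-validated choice of the penalty would normally be used.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    # Proximal operator of t * |v|: shrink v toward zero by t, clipping at zero.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_l1_logistic(X, y, lam, step=0.1, n_iter=500):
    # Proximal gradient descent on the mean negative log-likelihood
    # plus lam * sum(|beta_j|). The intercept beta0 is not penalized.
    n, p = len(X), len(X[0])
    beta0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Residuals: predicted probability minus observed label.
        resid = [
            sigmoid(beta0 + sum(b * x for b, x in zip(beta, row))) - yi
            for row, yi in zip(X, y)
        ]
        grad0 = sum(resid) / n
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        beta0 -= step * grad0
        # Gradient step followed by soft-thresholding gives exact zeros.
        beta = [soft_threshold(b - step * g, step * lam) for b, g in zip(beta, grad)]
    return beta0, beta


# Synthetic example (assumed data): labels alternate between the two groups,
# and only genes 0 and 1 have a group-dependent mean shift.
random.seed(0)
n, p = 40, 50
y = [i % 2 for i in range(n)]
X = [
    [random.gauss(0.0, 1.0) + (2.0 * yi - 1.0 if j < 2 else 0.0) for j in range(p)]
    for yi in y
]

beta0, beta = fit_l1_logistic(X, y, lam=0.2)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("candidate classifier genes:", selected)
```

Genes with nonzero coefficients form the candidate classifier set. Raising `lam` gives a sparser set, and a sufficiently large value removes every gene from the model.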
• Lee, J.W., et al. 2005. An extensive comparison of recent classification tools applied to microarray data. Computational Statistics & Data Analysis. 48:869-885.
• Hosmer, D.W., Lemeshow, S., 1989. Applied Logistic Regression. Wiley Series in Probability and Mathematical Statistics. Wiley, New York, NY.
• Sun, H., and Wang, S. 2012. Penalized logistic regression for high-dimensional DNA methylation data with case-control studies. Bioinformatics. 28(10):1368-1375.
• Tibshirani, R. 1996. Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society Series B (Methodological). 58:267-288.
• Goeman, J.J. 2010. L1 Penalized Estimation in the Cox Proportional Hazards Model. Biometrical Journal. 52(1):70-84.
• Gohlmann, H., and Talloen, W. 2009. Gene Expression Studies Using Affymetrix Microarrays. Chapman & Hall/CRC.
• Liao, J.G., and Chin, K.V. 2007. Logistic regression for disease classification using microarray data: model