Just wanna share my experience

Principal Component Analysis

Wikipedia gives a simple definition of principal component analysis (PCA): a technique for simplifying a dataset by reducing multidimensional data to fewer dimensions for analysis.

Principal component analysis is a variable reduction procedure. It is useful when we have a number of (possibly) correlated variables and believe there is some redundancy among them, so that the number of variables can be reduced without losing much information.

PCA transforms the original variables into ‘new’ uncorrelated variables, called principal components (PCs), and ranks them from the PC with the highest variance to the one with the lowest. Variance serves as the measure of information here: the higher a PC’s variance, the more of the information in the data it retains. PCA creates as many PCs as there are original variables, but most of the information is carried by the PCs with the highest variance.
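To make the description above concrete, here is a minimal sketch in Python with NumPy. It assumes PCA is computed from the eigendecomposition of the sample covariance matrix; the function name `pca` and the small synthetic dataset are made up for illustration and are not from any particular source.

```python
import numpy as np

def pca(X):
    """Return (eigenvalues, eigenvectors, scores) for data X (rows = observations)."""
    Xc = X - X.mean(axis=0)                 # center each variable
    cov = np.cov(Xc, rowvar=False)          # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh handles symmetric matrices, ascending order
    order = np.argsort(eigvals)[::-1]       # reorder from highest to lowest variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs                   # the data expressed in the new PC variables
    return eigvals, eigvecs, scores

# Hypothetical data: 3 correlated variables built from 2 underlying factors plus noise
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
X = np.column_stack([
    latent[:, 0],
    latent[:, 0] + 0.1 * latent[:, 1],
    latent[:, 1],
])
X = X + 0.05 * rng.normal(size=X.shape)

eigvals, eigvecs, scores = pca(X)
explained = eigvals / eigvals.sum()

print("variance of each PC:", eigvals)
print("share of total variance:", explained)
print("covariance of the PC scores (off-diagonal entries are ~0):")
print(np.cov(scores, rowvar=False).round(6))
```

There are three PCs for three original variables, their variances come out in decreasing order, and the covariance matrix of the scores is diagonal, which is what "uncorrelated" means here.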

In PCA, the general rules of thumb for deciding how many principal components to retain are:

1 Retain the first few principal components that together account for approximately 80 to 90% of the total variation, so that only an acceptable amount of information is lost.

2 Retain the principal components whose variance is greater than the average variance of all components, i.e. those that account for a larger-than-average share of the sample variation. However, the choice depends on the situation.

3 Use a scree plot, a visual aid for determining an appropriate number of principal components. The number of components is taken to be the point at which the remaining eigenvalues are relatively small and all about the same size (Johnson and Wichern, 2002).

However, there is no definitive rule for determining the number of principal components to retain.
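The three rules of thumb above can be sketched in plain Python. This is only an illustration under stated assumptions: the eigenvalues are made-up numbers, the function names are hypothetical, and the text "scree plot" is a crude stand-in for a real chart.

```python
def n_by_cumulative_variance(eigenvalues, threshold=0.80):
    """Rule 1: smallest number of leading PCs whose cumulative share reaches threshold."""
    total = sum(eigenvalues)
    cumulative = 0.0
    for k, value in enumerate(eigenvalues, start=1):
        cumulative += value / total
        if cumulative >= threshold:
            return k
    return len(eigenvalues)

def n_above_average(eigenvalues):
    """Rule 2: number of PCs whose variance exceeds the average variance."""
    mean = sum(eigenvalues) / len(eigenvalues)
    return sum(1 for value in eigenvalues if value > mean)

def print_scree(eigenvalues, width=40):
    """Rule 3 aid: a text scree plot, one bar per PC, scaled to the largest eigenvalue."""
    largest = max(eigenvalues)
    for k, value in enumerate(eigenvalues, start=1):
        bar = "#" * round(width * value / largest)
        print(f"PC{k}: {value:5.2f} {bar}")

# Made-up eigenvalues (PC variances), already sorted from highest to lowest
eigenvalues = [4.0, 2.5, 0.8, 0.4, 0.3]

print("rule 1, 80% threshold:", n_by_cumulative_variance(eigenvalues, 0.80))
print("rule 1, 90% threshold:", n_by_cumulative_variance(eigenvalues, 0.90))
print("rule 2, above average:", n_above_average(eigenvalues))
print_scree(eigenvalues)
```

With these numbers, the 80% threshold and the above-average rule both keep two components, while the 90% threshold keeps three, which shows how the rules can disagree and why the final choice is a judgment call.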


May 11, 2008 | Statistics