## Canonical Correlation

Canonical correlation analysis describes the association between two sets of variables. The variables in either set may or may not be categorical. The main goal of the analysis is to form linear composites of the variables in each set (*canonical variables*), deriving a weight for each variable so that the correlation between the composite of one set and the composite of the other is as large as possible. Each such pair of composites defines a canonical function (relationship), and the relative contribution of each variable to these canonical functions explains the nature of the relationships between the sets of response and predictor variables.
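The construction of canonical variables can be illustrated with a minimal NumPy sketch. It assumes both data matrices have full column rank and uses a QR-plus-SVD route, which is one common way to compute the solution and not necessarily the one used in the cited texts. The function name `canonical_correlations` and the simulated data are illustrative only.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations and raw weights for X (n x p) and Y (n x q).

    Hypothetical helper: assumes both centered matrices have full column rank.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Reduced QR gives orthonormal bases for the column spaces of each set.
    Qx, Rx = np.linalg.qr(Xc)
    Qy, Ry = np.linalg.qr(Yc)
    # The singular values of Qx^T Qy are the canonical correlations.
    U, rho, Vt = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    # Map back to weights on the original variables, scaled so that
    # every canonical variable has unit sample variance.
    A = np.linalg.solve(Rx, U) * np.sqrt(n - 1)
    B = np.linalg.solve(Ry, Vt.T) * np.sqrt(n - 1)
    return rho, A, B

# Simulated data: one shared latent signal z links the two sets.
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)
X = np.column_stack([z + rng.normal(size=n), rng.normal(size=n), rng.normal(size=n)])
Y = np.column_stack([z + rng.normal(size=n), rng.normal(size=n)])

rho, A, B = canonical_correlations(X, Y)
U_scores = (X - X.mean(axis=0)) @ A   # canonical variables of the first set
V_scores = (Y - Y.mean(axis=0)) @ B   # canonical variables of the second set

print("canonical correlations:", rho)
print("correlation of first pair:", np.corrcoef(U_scores[:, 0], V_scores[:, 0])[0, 1])
```

The number of canonical functions equals the number of variables in the smaller set (two here), and the correlation of the first pair of canonical variables matches the first canonical correlation.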

The result of applying canonical correlation is a measure of the strength of the relationship between the two sets of variables, expressed as a canonical correlation coefficient (r) between the two sets. For interpretation, standardized canonical coefficients (rather than raw canonical coefficients) are used, because they remove the effect of the units and scales of the original variables. The correlation between an original variable and a canonical variable of its own set is known as the *intra-set structure correlation*. The *intra-set structure correlation* is more stable than the raw or standardized canonical weights (Gittins, 1985), but only in a univariate context: it describes the univariate relation between a variable and its canonical variable without considering the other variables, and therefore provides no information about the multivariate contribution of the variable to its canonical variable. To quantify the contribution of each variable to the canonical variables in a multivariate context, Rencher (1998) and Johnson and Wichern (2002) recommend using the standardized coefficients instead of the *intra-set structure correlations*.
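The difference between the two interpretive measures can be seen in a small simulated sketch. Everything here is an assumption made for illustration: the helper `first_canonical_pair`, the data, and the choice to obtain standardized coefficients by multiplying each raw coefficient by the sample standard deviation of its variable. The variable `x2` is built to be strongly correlated with `x1` while carrying no additional information about the second set.

```python
import numpy as np

def first_canonical_pair(X, Y):
    """First canonical correlation, raw weights, and scores (hypothetical helper)."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, Rx = np.linalg.qr(Xc)
    Qy, Ry = np.linalg.qr(Yc)
    U, rho, Vt = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    # Raw weights, scaled so the canonical variables have unit sample variance.
    a = np.linalg.solve(Rx, U[:, 0]) * np.sqrt(n - 1)
    b = np.linalg.solve(Ry, Vt[0]) * np.sqrt(n - 1)
    return rho[0], a, b, Xc @ a, Yc @ b

rng = np.random.default_rng(1)
n = 2000
z = rng.normal(size=n)
x1 = z + 0.5 * rng.normal(size=n)
x2 = x1 + 0.5 * rng.normal(size=n)    # correlated with x1, adds nothing about z
x3 = rng.normal(size=n)               # pure noise
X = np.column_stack([x1, x2, x3])
Y = np.column_stack([z + 0.5 * rng.normal(size=n), rng.normal(size=n)])

r, a, b, u, v = first_canonical_pair(X, Y)

# Standardized coefficient = raw coefficient * standard deviation of the variable.
std_coef_x = a * X.std(axis=0, ddof=1)
# Intra-set structure correlation = correlation of each variable with its own canonical variable.
structure_x = np.array([np.corrcoef(X[:, j], u)[0, 1] for j in range(X.shape[1])])

print("first canonical correlation:", r)
print("standardized coefficients  :", std_coef_x)
print("structure correlations     :", structure_x)
```

In this setup `x2` has a large structure correlation, because it is correlated with `x1`, but a standardized coefficient near zero, because it contributes little once `x1` is in the composite. The overall sign of the weights is arbitrary, so only magnitudes and relative signs are meaningful.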

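Canonical correlation requires a large number of observations relative to the number of variables. It is also sensitive to collinearity among the variables within a set, particularly the independent variables, and the usual significance tests assume multivariate normal data. The canonical correlations themselves do not change when the variables are standardized; only the coefficients are rescaled. If, in addition, the variables within each set are mutually uncorrelated, so that each within-set correlation matrix is the identity, maximizing the correlation between the composites is the same as maximizing their covariance.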
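The sample-size requirement can be made concrete with a short simulation, offered as a sketch under stated assumptions. The two sets below are generated independently, so the population canonical correlations are all zero. With n observations, p variables in one set, and q in the other, the first sample canonical correlation is forced to 1 whenever n - 1 < p + q, because the two centered column spaces must then intersect. The helper name `first_canonical_correlation` is illustrative.

```python
import numpy as np

def first_canonical_correlation(X, Y):
    """Largest sample canonical correlation between X and Y (hypothetical helper)."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # Singular values are returned in descending order; take the largest.
    return np.linalg.svd(Qx.T @ Qy, compute_uv=False)[0]

rng = np.random.default_rng(2)
p, q = 5, 5

# Independent sets, very few observations: n - 1 = 9 < p + q = 10.
rho_small = first_canonical_correlation(rng.normal(size=(10, p)),
                                        rng.normal(size=(10, q)))

# Independent sets, many observations relative to the number of variables.
rho_large = first_canonical_correlation(rng.normal(size=(5000, p)),
                                        rng.normal(size=(5000, q)))

print("n = 10  :", rho_small)
print("n = 5000:", rho_large)
```

A canonical correlation close to 1 from a small sample is therefore not evidence of a relationship on its own. With the larger sample, the estimate moves toward the true value of zero.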

Reference:

Johnson, R.A., and Wichern, D.W. (2002). *Applied Multivariate Statistical Analysis.* 5th edition. Pearson Education: Prentice Hall.