My two cents on the between-group PCA issue

Some weeks ago F. Bookstein announced on morphmet his latest paper, in which he warned about an artifactual separation between groups when we use a between-group PCA. Many other messages followed, some related to authorship issues (which I won’t comment on) and others to technical ones.

So first, what’s a between-group PCA? A between-group PCA is a kind of discriminant analysis that seeks to maximize the separation among pre-defined groups. To do that, we first run a PCA on the group means and then project the individuals onto the axes of that group-means PCA. Because it’s based on a rotation of the original space (PCA) and not on the estimation of variance and a subsequent deformation of the space (e.g. CVA), it’s considered robust against data with a high ratio of dimensionality to sample size (here you may be interested in reading my first post on PCA and dimensionality).

Imagine one population (black), where you run a PCA so you get a visualization like this (PC1 & PC2). If you make two groups out of your population (dashed circles) and you run a PCA on the means, you will get just one axis (bgPC1; there are N-1 axes, where N is the number of groups) that connects the centres of the two circles. Finally, each individual is projected onto this new axis to get the between-group PCA scores.
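The two steps described above (PCA on the group means, then projection of the individuals) can be sketched in a few lines of NumPy. This is only a minimal illustration, not a reference implementation: the function name bgpca is made up for this post, the data are random toy numbers, and centring on the unweighted mean of the group means is an assumption (some implementations weight by group size).

```python
import numpy as np

def bgpca(X, groups):
    """Sketch of a between-group PCA (hypothetical helper, not a library function).

    X      : (n_individuals, n_variables) data matrix
    groups : (n_individuals,) group label for each row of X
    Returns the individual scores and the bgPC axes (one row per axis).
    """
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)

    # One row per group: the group mean in the original variable space.
    means = np.vstack([X[groups == g].mean(axis=0) for g in labels])

    # Assumption: centre on the unweighted mean of the group means.
    centre = means.mean(axis=0)

    # PCA of the group means via SVD; rows of vt are the principal axes.
    _, _, vt = np.linalg.svd(means - centre, full_matrices=False)

    # N groups give at most N-1 axes with non-zero variance.
    axes = vt[: len(labels) - 1]

    # Project every individual onto the group-means axes.
    scores = (X - centre) @ axes.T
    return scores, axes

# Toy usage: 20 individuals, 5 variables, two arbitrary groups of 10.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
groups = np.repeat([0, 1], 10)
scores, axes = bgpca(X, groups)
print(scores.shape, axes.shape)
```

With two groups the single axis returned is, up to sign, the unit vector joining the two group means, which is the bgPC1 of the figure.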

So the problem people have noticed is that when the dimensionality exceeds the sample size, the between-group PCA strongly differentiates the groups we ask it to discriminate, no matter how arbitrary they are. It makes sense: when we run a PCA on the group means we drastically reduce the dimensionality to N-1 dimensions (where N is the number of groups). Then, the projection of each individual onto these few dimensions reduces the variance within groups (as in my beloved Albrecht 1980 figure, but reading it from the bivariate plot to the two univariate plots):

Imagine the same situation as in figure 1 but with some overlap between the groups (blue area). This overlap is larger in two dimensions (blue area) than in one dimension (red line within the blue area). As we increase the dimensionality, the overlap in one dimension becomes smaller relative to the original overlap (the volume shared by two spheres in 3D, the hypervolume shared by two hyperspheres in more than 3 dimensions). Separation among groups will look larger and larger.
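A quick way to see the effect is to simulate it. The sketch below is an illustration under strong assumptions, not a reproduction of anyone's analysis: the data are pure isotropic Gaussian noise, the two groups are completely arbitrary, and the helper name bgpc1_separation and the sample sizes are made up. With two groups, bgPC1 is simply the unit vector joining the two group means, so it is computed directly.

```python
import numpy as np

def bgpc1_separation(n_per_group, n_vars, rng):
    """Standardized distance between two arbitrary groups along bgPC1.

    Hypothetical helper: the data are pure noise, so any separation
    found here is an artefact of the method, not a real group difference.
    """
    X = rng.normal(size=(2 * n_per_group, n_vars))
    groups = np.repeat([0, 1], n_per_group)

    m0 = X[groups == 0].mean(axis=0)
    m1 = X[groups == 1].mean(axis=0)

    # With two groups, bgPC1 is the unit vector joining the group means.
    axis = (m1 - m0) / np.linalg.norm(m1 - m0)

    # Project individuals onto bgPC1.
    scores = X @ axis
    s0, s1 = scores[groups == 0], scores[groups == 1]

    # Distance between group means in units of pooled within-group SD.
    pooled_sd = np.sqrt((s0.var(ddof=1) + s1.var(ddof=1)) / 2)
    return (s1.mean() - s0.mean()) / pooled_sd

rng = np.random.default_rng(1)
for n_vars in (2, 20, 200, 2000):
    sep = bgpc1_separation(n_per_group=20, n_vars=n_vars, rng=rng)
    print(f"{n_vars:5d} variables: separation = {sep:.2f}")
```

The sample size stays at 40 individuals throughout; only the number of variables changes, so the printed values show how the apparent separation behaves as dimensionality grows past the sample size.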

Now, Philip Mitteroecker also pointed out an interesting fact: even when we use a high number of landmarks, we find covariation (integration) among variables, and therefore the effective dimensionality is never going to be really high. In his words, ‘many of the problems described by Cardini et al. and Fred can be avoided by variable reduction (ordinary PCA) prior to bgPCA and related techniques’.

That is only true, and pay attention here, if the pattern of covariation within groups is similar to the pattern of covariation among groups. When both patterns are equal, the PCA on the raw sample will be equivalent to the between-group PCA and therefore no artefacts show up. When within-group variation is perpendicular to among-group variation, we’ll find a collapse of the within-group variance. In fact, at the extreme, this second situation would be analogous to running a PCA on the group means and then placing each individual on its group centroid:

Two examples of between-group PCA with anisotropic variation. In the first case, the one I think Philip Mitteroecker had in mind and probably the most likely when the two groups are evolutionarily close, the major axis of within-group variation is the same as the axis of variation among groups. At the extreme, within-group variation could be two straight lines overlapping with the bgPC1 (red line): in that situation the original arrangement of the individuals would be equivalent to their distribution after the between-group PCA.
In the second case, however, within-group variation is perpendicular to among-group variation. At the extreme, within-group variation would be represented by two lines perpendicular to the bgPC1 (red line). In that situation the projection of the individuals onto the new axis would completely remove within-group variation, and it would also reduce (but comparatively much less) the variation among individuals from the two groups.
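The two cases in the figure can be mimicked with a toy two-dimensional simulation. Everything here is an assumption made for illustration: the helper names make_groups and within_variance_kept, the standard deviations, the distance between groups and the sample sizes are invented. The sketch measures what fraction of the within-group variation survives the projection onto bgPC1.

```python
import numpy as np

def make_groups(sd_along, sd_across, n=200, shift=4.0, seed=0):
    """Two 2D groups separated along the first variable (hypothetical toy data).

    sd_along  : within-group SD parallel to the among-group axis
    sd_across : within-group SD perpendicular to the among-group axis
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(2 * n, 2)) * [sd_along, sd_across]
    groups = np.repeat([0, 1], n)
    X[groups == 1, 0] += shift  # move the second group along variable 1
    return X, groups

def within_variance_kept(X, groups):
    """Fraction of within-group variation retained after projecting on bgPC1."""
    m0 = X[groups == 0].mean(axis=0)
    m1 = X[groups == 1].mean(axis=0)

    # With two groups, bgPC1 is the unit vector joining the group means.
    axis = (m1 - m0) / np.linalg.norm(m1 - m0)

    # Subtract each individual's own group mean: only within-group
    # variation is left in the residuals.
    resid = X - np.where((groups == 1)[:, None], m1, m0)

    total = (resid ** 2).sum()
    kept = ((resid @ axis) ** 2).sum()
    return kept / total

# Case 1: within-group variation parallel to among-group variation.
X_par, g_par = make_groups(sd_along=1.0, sd_across=0.05)
# Case 2: within-group variation perpendicular to among-group variation.
X_perp, g_perp = make_groups(sd_along=0.05, sd_across=1.0)

print("parallel:     ", round(within_variance_kept(X_par, g_par), 3))
print("perpendicular:", round(within_variance_kept(X_perp, g_perp), 3))
```

In the parallel case almost all within-group variation is kept on bgPC1, while in the perpendicular case almost all of it is lost, which is the collapse onto the group centroids described above.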

While it’s certainly true that in the first case anisotropic variation reduces the problem of spurious separation of groups (because population PCA and bgPCA would be equivalent), the second case would make it much worse. When the grouping is based on, for example, evolutionary relatedness, it is unlikely that within-group variation will be perpendicular to among-group variation. However, for other kinds of groups (e.g. response to one environmental factor) that might happen.

Here’s my two cents on the between-group PCA issue.

CVG