Some weeks ago F. Bookstein announced his latest paper on morphmet, in which he warned about an artifactual separation between groups when we use a between-group PCA. Many other messages followed, some about authorship issues (which I won’t comment on) and others about technical ones.
So first, what’s a between-group PCA? A between-group PCA (bgPCA) is a kind of discriminant analysis in which you maximize the separation among pre-defined groups. To do that, we first run a PCA on the group means and then project the individuals onto the axes of that group-means PCA. Because it’s based on a rotation of the original space (PCA) and not on the estimation of variance and a subsequent deformation of the space (e.g. CVA), it’s considered robust for data with a high ratio of dimensionality to sample size (here you may be interested in reading my first post on PCA and dimensionality).
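The two steps just described (PCA on the group means, then projection of the individuals) can be sketched in a few lines of NumPy. This is only a minimal illustration, not a reference implementation: the function name `bg_pca` is hypothetical, and centering on the unweighted mean of the group means is an assumption (some implementations weight the means by group size).

```python
import numpy as np

def bg_pca(X, groups):
    """Minimal between-group PCA sketch (hypothetical helper).

    X: (n_individuals, n_variables) data matrix.
    groups: length-n array of group labels.
    Returns individual scores and the bgPCA axes (as rows).
    """
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)

    # Step 1: PCA on the group means.
    means = np.vstack([X[groups == g].mean(axis=0) for g in labels])
    center = means.mean(axis=0)  # assumption: unweighted mean of group means
    _, _, Vt = np.linalg.svd(means - center, full_matrices=False)
    axes = Vt[: len(labels) - 1]  # N groups give at most N-1 axes

    # Step 2: project every individual onto those axes (a pure rotation,
    # with no rescaling by within-group variance as in CVA).
    scores = (X - center) @ axes.T
    return scores, axes

# Toy data: 3 groups of 10 individuals, 5 variables.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(30, 5))
groups_demo = np.repeat([0, 1, 2], 10)
scores_demo, axes_demo = bg_pca(X_demo, groups_demo)
print(scores_demo.shape)
```

With three groups the individuals end up described by only two scores each, whatever the number of original variables.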

So the problem people have noticed is that, as the dimensionality exceeds the sample size, the between-group PCA strongly differentiates the groups we ask it to discriminate, no matter how arbitrary they are. It makes sense: when we run a PCA on the group means we drastically reduce the dimensionality to N-1 dimensions (where N is the number of groups). Then the projection of each individual onto these few dimensions reduces the variance within groups (as in my beloved Albrecht 1980 figure, but reading it from the bivariate plot to the two univariate plots).
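A small simulation makes the effect easy to see. The sketch below, under assumptions of my own choosing (pure Gaussian noise, 3 random groups of 10 individuals, a hypothetical `separation_ratio` helper), assigns group labels at random and measures which share of the variance of the bgPCA scores lies between groups. Since the groups are arbitrary, any large share is an artefact.

```python
import numpy as np

def separation_ratio(n_dims, n_per_group=10, n_groups=3, seed=0):
    """Share of bgPCA score variance that lies between random groups."""
    rng = np.random.default_rng(seed)
    n = n_per_group * n_groups
    X = rng.normal(size=(n, n_dims))  # pure noise: no real group structure
    groups = rng.permutation(np.repeat(np.arange(n_groups), n_per_group))

    # bgPCA: PCA on the group means, then projection of the individuals.
    means = np.vstack([X[groups == g].mean(axis=0) for g in range(n_groups)])
    center = means.mean(axis=0)
    _, _, Vt = np.linalg.svd(means - center, full_matrices=False)
    scores = (X - center) @ Vt[: n_groups - 1].T

    # Between-group and total sums of squares of the scores.
    grand = scores.mean(axis=0)
    score_means = np.vstack(
        [scores[groups == g].mean(axis=0) for g in range(n_groups)]
    )
    between = n_per_group * ((score_means - grand) ** 2).sum()
    total = ((scores - grand) ** 2).sum()
    return between / total

ratio_low_dim = separation_ratio(n_dims=5)      # fewer variables than individuals
ratio_high_dim = separation_ratio(n_dims=3000)  # far more variables than individuals
print(ratio_low_dim, ratio_high_dim)
```

In the high-dimensional run nearly all of the score variance sits between the random groups, which is the spurious separation people have been describing.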

Now, Philip Mitteroecker also pointed out an interesting fact: even when we use a high number of landmarks, we find covariation (integration) among variables and therefore effective dimensionality is never going to be really high. Therefore, ‘many of the problems described by Cardini et al. and Fred can be avoided by variable reduction (ordinary PCA) prior to bgPCA and related techniques’.
That is only true if, pay attention, the pattern of covariation within groups is similar to the pattern of covariation among groups. When both patterns are equal, the PCA on the raw sample will be equivalent to the between-group PCA and therefore no artefacts show up. When within-group variation is perpendicular to among-group variation, we’ll find a collapse of the within-group variance. Actually, at the extreme, this second situation would be analogous to running a PCA on the group means and then placing the individuals on their group centroid.


In the second case, however, within-group variation is perpendicular to among-group variation. At the extreme, within-group variation would be represented by two lines perpendicular to the bgPC1 (red line). In that situation the projection of the individuals onto the new axis would completely remove within-group variation, and it would also reduce (but comparatively much less) the variation among individuals from the two groups.
While it’s certainly true that in the first case anisotropic variation reduces the problem of spurious separation of groups (because population PCA and bgPCA would be equivalent), the second case would make it much worse. When the grouping is based on, for example, evolutionary relatedness, it is unlikely that within-group variation will be perpendicular to among-group variation. However, for other kinds of groups (e.g. response to one environmental factor) that might happen.
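The two cases can be compared numerically with a toy two-group example in two dimensions. This is just a sketch under assumed values (group means 4 units apart along x, 200 individuals per group, standard deviations of 1 and 0.05, a hypothetical `within_variance_kept` helper); with two groups, bgPC1 is simply the direction joining the two group means.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200  # individuals per group (assumed value)

def within_variance_kept(within_sd):
    """Fraction of within-group variance that survives projection on bgPC1.

    within_sd: (sd along x, sd along y). Group means differ along x only.
    """
    sd = np.asarray(within_sd, dtype=float)
    a = rng.normal(size=(n, 2)) * sd + np.array([-2.0, 0.0])
    b = rng.normal(size=(n, 2)) * sd + np.array([2.0, 0.0])

    # With two groups, bgPC1 is the unit vector joining the group means.
    axis = b.mean(axis=0) - a.mean(axis=0)
    axis /= np.linalg.norm(axis)

    within_total = a.var(axis=0).sum() + b.var(axis=0).sum()
    within_projected = (a @ axis).var() + (b @ axis).var()
    return within_projected / within_total

# Case 1: within-group variation parallel to among-group variation.
kept_parallel = within_variance_kept((1.0, 0.05))
# Case 2: within-group variation perpendicular to among-group variation.
kept_perpendicular = within_variance_kept((0.05, 1.0))
print(kept_parallel, kept_perpendicular)
```

In the parallel case almost all within-group variance is kept after projection, while in the perpendicular case almost none is, so the groups look far tighter and better separated than they are in the full space.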
Here’s my two cents on the between-group PCA issue.
CVG