2.10 Small sample sizes

There is one necessary restriction on the use of PERMDISP: the number of replicate samples per group must exceed n = 2. The reason is that, if there are only two replicates, the two distances to the centroid are, by definition, equal to one another. Consider a single variable and a group with two samples having values of 4 and 6. The centroid (average) of this group in Euclidean space is therefore 5. The distance from sample 1 to the centroid is 1, and the distance from sample 2 to the centroid is also 1. These two values of z are necessarily equal. The same holds for any other group having only two replicate samples, so when n = 2 for all groups, the within-group variance of the z’s is zero. With a within-group variance of zero, the F statistic is infinite (or undefined, if the among-group variation is also zero), so the test loses all meaning. Clearly, the test is also meaningless for a group with n = 1, which has only a single z value of zero. Thus, if the sample size for any of the groups is n ≤ 2, the PERMDISP routine will issue a warning. Although test results are meaningless in such cases, the individual deviations (the z’s) can still be examined and their values compared across the different groups, if desired.

More generally, the issue here is the degree of correlation among the values of z within a group, which increases as the sample size decreases. Levene (1960) showed that this correlation is of order n⁻², which, he suggested, will probably not have a serious effect on the distribution of the F statistic. We suggest that formal tests using PERMDISP with within-group sample sizes less than n = 10 should be viewed with some caution, and that those with sample sizes less than n = 5 should probably be avoided, although (as elsewhere) further simulation studies for realistic multivariate cases would help to refine such rules of thumb.
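Both points can be illustrated with a short Python sketch. It assumes raw variables, Euclidean distance and the arithmetic mean as the centroid (as in the 4-and-6 example above); the function names are hypothetical and are not part of PERMDISP. The sketch computes the z’s for groups of two replicates in several dimensions and confirms that their pooled within-group sum of squares, the ingredient of the denominator of the F ratio, is zero apart from floating-point rounding. It then evaluates the exact correlation between two z’s within a group for a single normally distributed variable (normality is an assumption made here for tractability), using the facts that two deviations from a sample mean have correlation −1/(n − 1) and that, for standard bivariate normal variables with correlation r, E|X||Y| = (2/π)(√(1 − r²) + r·arcsin r).

```python
import math
import random


def distances_to_centroid(group):
    """Euclidean distance (z) from each sample to its group centroid.

    Hypothetical helper: assumes raw variables with the arithmetic mean as
    centroid; each sample is a tuple of variable values.
    """
    n = len(group)
    p = len(group[0])
    centroid = [sum(s[k] for s in group) / n for k in range(p)]
    return [math.dist(s, centroid) for s in group]


def within_group_ss(groups_z):
    """Pooled within-group sum of squares of the z's across all groups."""
    ss = 0.0
    for z in groups_z:
        mean_z = sum(z) / len(z)
        ss += sum((v - mean_z) ** 2 for v in z)
    return ss


def abs_deviation_correlation(n):
    """Exact corr(|x1 - xbar|, |x2 - xbar|) for n >= 2 i.i.d. normal values.

    Assumes a single normally distributed variable (the univariate,
    Euclidean case). Two deviations from the sample mean have correlation
    r = -1/(n - 1); for standard bivariate normals with correlation r,
    E|X||Y| = (2/pi) * (sqrt(1 - r^2) + r * asin(r)), E|X| = sqrt(2/pi)
    and Var|X| = 1 - 2/pi. Correlation is scale-free, so standardising is fine.
    """
    r = -1.0 / (n - 1)
    e_abs_prod = (2 / math.pi) * (math.sqrt(max(0.0, 1 - r * r)) + r * math.asin(r))
    return (e_abs_prod - 2 / math.pi) / (1 - 2 / math.pi)


# The example from the text: one variable, two samples with values 4 and 6.
print(distances_to_centroid([(4.0,), (6.0,)]))  # [1.0, 1.0]

# Four groups of two replicates in three dimensions (simulated data).
rng = random.Random(1)
groups = [[tuple(rng.gauss(0, 1) for _ in range(3)) for _ in range(2)]
          for _ in range(4)]
zs = [distances_to_centroid(g) for g in groups]
print(within_group_ss(zs))  # zero up to floating-point rounding

# The correlation is 1 at n = 2 and falls off roughly like n**-2:
# (n - 1)**2 * correlation approaches 1 / (pi - 2).
for n in (2, 3, 5, 10, 20, 100):
    c = abs_deviation_correlation(n)
    print(n, round(c, 4), round((n - 1) ** 2 * c, 4))
```

With n = 2 the two z’s in a group coincide in any number of dimensions, because each equals half the distance between the two samples. For larger n the exact correlation in this normal, single-variable setting shrinks in proportion to (n − 1)⁻², which is consistent with Levene’s order-n⁻² result; multivariate or non-Euclidean cases would need simulation.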