6.5 Solution to the multivariate BFP
Overview
Anderson et al. (2017) described a general dissimilarity-based solution to the multivariate Behrens-Fisher problem. Their solution uses a statistic (called '$F_2$' therein) that is a modification of the original PERMANOVA pseudo $F$ statistic (called '$F_1$' therein). This modification is a direct dissimilarity-based multivariate analogue of the solution to the univariate BFP proposed by Brown & Forsythe (1974). The PERMANOVA routine in PRIMER is the only known software implementation of this method that correctly accounts for heterogeneity in multivariate dispersions, not only in the construction of the test statistic itself, but also in the permutation algorithm used to estimate the p-value.
Described below are the details of the one-way case. For more complex ANOVA designs with multiple factors, the end-user must specify where the heterogeneity lies that needs to be accounted for, and constructing the correct $F$ statistic and associated permutation algorithm required for rigorous inference in the context of the full study design is not trivial. We reiterate: the PERMANOVA routine in PRIMER is the only software we know of that will do all of this correctly.
PERMANOVA in a nutshell
For a description of PERMANOVA, please see the original articles (Anderson (2001), McArdle & Anderson (2001)) and also the rather more recent and more thorough encyclopedia entry provided by Anderson (2017). We shall describe the original PERMANOVA test-statistic here, in brief, following Anderson et al. (2017), with notation that facilitates the description of the modified test.
Suppose $\bf Y$ is an $N \times p$ matrix of multivariate row vectors ${\bf y}_ {ij}$ corresponding to sampling units, each of length $p$ and each belonging to one of $i = 1, \ldots, a$ groups, with $j = 1, \ldots, n_{i}$ sampling units (rows) in the $i$th group and $N = \sum_{i=1}^a n_i$. Let $\bf D$ be an $N \times N$ symmetric matrix of dissimilarities $\lbrace d_{ij,i'j'} \rbrace$ calculated between every pair of sampling units.
Next, as in Gower (1966) , let matrix $\bf A$ be comprised of elements $\lbrace a_{ij,i'j'}\rbrace$ = $\lbrace -0.5 \times d_{ij,i'j'}^2 \rbrace$, then define matrix $\bf G$ (a centred version of matrix $\bf A$) as: $$ {\bf G} = ( {\bf I} - (1/N) {\bf J}_ {\scriptscriptstyle N} ) {\bf A} ( {\bf I} - (1/N) {\bf J}_ {\scriptscriptstyle N} ) $$ where ${\bf J}_ {\scriptscriptstyle N}$ denotes an $N \times N$ matrix of $1$s and $\bf I$ denotes an $N \times N$ identity matrix.
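For concreteness, the construction of $\bf G$ from $\bf D$ can be sketched in a few lines of NumPy. This is purely illustrative (the function name `gower_centred` is ours, and this is not the PRIMER implementation):

```python
import numpy as np

def gower_centred(D):
    """Gower's centred matrix G from an N x N dissimilarity matrix D."""
    N = D.shape[0]
    A = -0.5 * D**2                       # a_{ij,i'j'} = -0.5 * d_{ij,i'j'}^2
    C = np.eye(N) - np.ones((N, N)) / N   # centring matrix I - (1/N) J_N
    return C @ A @ C                      # G = C A C
```

By construction, the rows and columns of $\bf G$ sum to zero, and when $\bf D$ contains Euclidean distances, $\text{tr}(\bf G)$ equals the total sum of squares of the (centred) data.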
For the one-way case, let $\bf X$ be an $N \times r$ matrix of full rank $r = (a-1)$ containing orthogonal contrasts among the groups. We can construct a linear projection matrix for the design as: $$ {\bf H} = {\bf X} [ {\bf X}'{\bf X} ]^{-1} {\bf X}' $$ Then, the PERMANOVA pseudo $F$ statistic for comparing the centroids among the $a$ groups is: $$ F_1 = \frac { \text{tr} ({\bf HG})/(a-1) } { \text{tr} [ {\bf (I-H) G} ] /(N-a) } $$
where '$ \text{tr}(\cdot)$' denotes the trace (sum of diagonal elements) of a matrix. Note that if $p = 1$ and $\bf D$ is calculated using Euclidean distances, then $F_1 = F_0$, the classical univariate $F$ ratio.
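Putting these pieces together, the pseudo $F_1$ statistic can be sketched as follows (again illustrative, not the PRIMER implementation; for simplicity we use indicator coding of the groups rather than orthogonal contrasts, which gives the same value of $\text{tr}({\bf HG})$ because the double centring in $\bf G$ removes the intercept direction):

```python
import numpy as np

def pseudo_F1(D, groups):
    """PERMANOVA pseudo F_1 from an N x N dissimilarity matrix D and group labels."""
    groups = np.asarray(groups)
    N = D.shape[0]
    C = np.eye(N) - np.ones((N, N)) / N
    G = C @ (-0.5 * D**2) @ C                      # Gower's centred matrix
    labels = np.unique(groups)
    a = len(labels)
    # Indicator (dummy) coding of the groups; combined with the double
    # centring of G, tr(HG) matches the orthogonal-contrast construction.
    X = (groups[:, None] == labels[None, :]).astype(float)
    H = X @ np.linalg.inv(X.T @ X) @ X.T           # H = X (X'X)^{-1} X'
    num = np.trace(H @ G) / (a - 1)
    den = np.trace((np.eye(N) - H) @ G) / (N - a)
    return num / den
```

For a single variable with Euclidean distances, this reproduces the classical univariate $F$ ratio, as noted above.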
Test by permutation
We may calculate a p-value to test the null hypothesis of equality of centroids in the space of the chosen dissimilarity measure under the sole assumption that the sampling units (rows) are exchangeable among the $a$ groups. First, we calculate an observed value of the test statistic, $F_1$, with the rows of the data (sampling units) in their original order. Then, we randomly permute (re-order) the $1,\ldots,N$ rows of matrix $\bf Y$ to obtain a matrix of permuted data ${\bf Y}^\pi$, yet leaving the grouping structure fixed (i.e., the original ordering is retained in matrix $\bf X$ and hence also in the projection matrix $\bf H$). In other words, under a true null hypothesis, any ordering of the sampling units across the groups is equally likely via exchangeability. Indeed, exchangeability is the only assumption of the PERMANOVA test, which is distribution-free.
Re-calculation of $F_1$ replacing $\bf Y$ with ${\bf Y}^\pi$ yields $F_1^\pi$, a value of the test-statistic under permutation. Repeating this random permutation and re-calculation a large number of times (say, $n_\text{perm} = 9999$), yields a distribution of values of $F_1^\pi$ from which we can empirically calculate a p-value as $P = \text{Pr}(F_1^\pi \ge F_1)$. Specifically, the p-value is calculated directly as:
$$ P = \frac{( \text{no. of } F_1^{\pi} \ge F_1 ) + 1}{ (n_\text{perm} + 1) } $$
This mirrors what we saw for the test by permutation for univariate ANOVA (and so many other permutation tests offered in PRIMER); namely, that the $+1$ in each of the numerator and denominator are there simply to acknowledge the inclusion of the original (genuine) ordering of the data as one of the possible 'random' outcomes we could have obtained - they would not be needed in the equation if all possible orderings were to be done systematically and exhaustively.
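The full permutation test can be sketched as follows (illustrative only, not the PRIMER algorithm). Note that permuting the rows of $\bf Y$ while holding $\bf X$ fixed is equivalent to jointly permuting the rows and columns of $\bf D$:

```python
import numpy as np

def permanova_pvalue(D, groups, n_perm=9999, rng=None):
    """Permutation p-value for the one-way PERMANOVA test based on F_1."""
    rng = np.random.default_rng(rng)
    groups = np.asarray(groups)
    N = D.shape[0]
    labels = np.unique(groups)
    a = len(labels)
    C = np.eye(N) - np.ones((N, N)) / N
    X = (groups[:, None] == labels[None, :]).astype(float)
    H = X @ np.linalg.inv(X.T @ X) @ X.T     # held fixed under permutation

    def F1(Dmat):
        G = C @ (-0.5 * Dmat**2) @ C
        return (np.trace(H @ G) / (a - 1)) / \
               (np.trace((np.eye(N) - H) @ G) / (N - a))

    F_obs = F1(D)
    n_ge = 0
    for _ in range(n_perm):
        pi = rng.permutation(N)                # re-order the sampling units
        if F1(D[np.ix_(pi, pi)]) >= F_obs:     # permute rows AND columns of D
            n_ge += 1
    return (n_ge + 1) / (n_perm + 1)           # original ordering counted once
```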
Modified PERMANOVA to account for heterogeneity
Anderson et al. (2017) suggested a modification to the PERMANOVA pseudo $F$ statistic to account for heterogeneity, following directly from the univariate solution to the BFP proposed by Brown & Forsythe (1974). Specifically, instead of $F_1$, we can use:
$$ F_2 = \frac { \text{tr} ({\bf HG}) } { \sum_{i=1}^a (1-n_i/N) \cdot V_i } $$
where $V_i$ is the within-group dispersion for group $i$, defined as:
$$ V_i = \sum_{j=1}^{(n_i-1)} \sum_{j'=(j+1)}^{n_i} d_{ij,ij'}^2 / [ n_i(n_i-1) ] $$
Some important things to note about this modified test-statistic are:
- The null hypothesis for $F_2$ is equality of centroids in the space of the chosen dissimilarity measure given potential differences in dispersions among the groups.
- The potential for heterogeneity is explicitly acknowledged in the modified test-statistic, $F_2$, by the calculation of separate individual dispersions ($V_i$) for each group.
- $F_2$ is equivalent to $F_1$ if sample sizes are equal across all groups. This is sensible, as PERMANOVA (like ANOVA) is very robust to heterogeneity of dispersions when the design is balanced.
- $F_2$, like $F_1$, is carefully constructed so that the numerator and denominator have the same expectation when the null hypothesis is true.
- If we are dealing with univariate data, so $p = 1$ response variable, and the entries in ${\bf D}$ are Euclidean distances, then $V_i$ is the usual classical univariate unbiased measure of the sample variance ($s^2$) for group $i$.
- If PERMANOVA is run using $F_2$, then the degrees of freedom are also modified to reflect the Satterthwaite approximation given by Brown & Forsythe (1974). This ensures compatibility between the PERMANOVA implementation and the Brown & Forsythe (1974) solution for univariate cases, but in practice the p-value itself is always calculated in PERMANOVA using permutation algorithms (see below), so this change in the degrees of freedom (which remain constant under permutation) has no direct consequence for the level of significance in the outcome.
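To make the construction of $F_2$ concrete, here is an illustrative NumPy sketch (ours, not the PRIMER implementation). Each within-group dispersion $V_i$ is obtained by summing the squared dissimilarities in the full symmetric within-group sub-matrix, in which every pair appears twice:

```python
import numpy as np

def pseudo_F2(D, groups):
    """Modified pseudo F_2 of Anderson et al. (2017) from dissimilarity matrix D."""
    groups = np.asarray(groups)
    N = D.shape[0]
    C = np.eye(N) - np.ones((N, N)) / N
    G = C @ (-0.5 * D**2) @ C                      # Gower's centred matrix
    labels = np.unique(groups)
    X = (groups[:, None] == labels[None, :]).astype(float)
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    den = 0.0
    for g in labels:
        idx = np.where(groups == g)[0]
        n_i = len(idx)
        Dg = D[np.ix_(idx, idx)]
        # V_i: sum over unordered within-group pairs of d^2, over n_i(n_i - 1);
        # the symmetric sub-matrix counts each pair twice, hence the / 2.
        V_i = (Dg**2).sum() / 2 / (n_i * (n_i - 1))
        den += (1 - n_i / N) * V_i
    return np.trace(H @ G) / den
```

With one variable and Euclidean distances, each $V_i$ is the sample variance $s_i^2$ and $F_2$ reduces to the Brown & Forsythe (1974) statistic; with equal sample sizes across groups, it coincides with $F_1$, as noted above.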
Obtaining a p-value for the modified test
On the face of it, we would not expect to be able to do a permutation test for $F_2$, because how can we view the sampling units as exchangeable among the groups if we know already that the groups have different dispersions? Some alternative method, such as a separate-sample bootstrap, either with or without some kind of bias-adjustment (e.g., Efron & Tibshirani (1993), Manly (2006)), would seem to be more appropriate here, at least conceptually. Simulation work by Anderson et al. (2017) demonstrated, however, that the modified test based on $F_2$ had better statistical behaviour when permutations, rather than bootstraps, were used to obtain the p-value. Specifically, the Type I error was closer to the nominated significance level, and the distribution of p-values under a true null hypothesis was more uniform (as is desirable), when tests were done using permutations. In contrast, the results obtained using bootstrapping methods were always much more conservative, and the degree of conservatism increased with increasing dimensionality, increasing degree of heterogeneity, and increasing differences in sample sizes among groups. In addition, under all simulation scenarios, tests by permutation using $F_2$ had greater power than the bootstrap alternatives, matching or exceeding the power of $F_1$.
Thus, when PERMANOVA in PRIMER is run using $F_2$ to account for heterogeneity, p-values are calculated using permutation algorithms rather than bootstrapping methods.