1.2 Partitioning

We shall begin by considering the balanced one-way (single factor) ANOVA experimental design. A factor is defined as a categorical variable that identifies several groups, treatments or levels which we wish to compare. Imagine that we have one factor with a groups (or levels) and n observations (samples⁵) per group for a total of N = a × n samples. For each sample, we have recorded the values for each of p different variables. Recall that in univariate analysis of variance, the total sum of squares ($SS _ T$, the sum of squared deviations of observations from the overall mean) is partitioned into two parts that are meaningful for testing hypotheses about group differences: the within-group (or residual) sum of squares ($SS _ {Res}$, the sum of squared deviations of observations from their own group mean) and the among-group sum of squares ($SS _ A$, the sum of squared deviations of group means from the overall mean). A directly analogous partitioning is done in multivariate space by PERMANOVA.

PERMANOVA may be thought of as a method that takes a geometrical approach to MANOVA (Edgington 1995). Let each of the p variables be a dimension, and let each of the N samples be represented by a point in the p-dimensional space according to the values it takes for each variable along each dimension. The simplest of all multivariate systems has only p = 2 variables. It is good to consider this situation, because it is easy to draw just 2 dimensions! Imagine that there are n = 10 replicate samples in each of a = 3 groups, as shown in Fig. 1.1.

Fig. 1.1. Plot of a hypothetical data set with p = 2 variables (dimensions) and n = 10 replicate samples in each of a = 3 groups. The three groups are identified by different symbols in the plot.

The whole set of N = 30 samples, taken together, creates a data cloud that is centred on a point called the overall centroid. For a Euclidean system, this is obtained as the arithmetic average for each of the (in this case 2) variables. Similarly, each of the groups has its own group centroid, located at the centre of the cloud of points belonging to that group.
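The centroids described above can be sketched in a few lines of Python with NumPy. The data here are simulated purely for illustration (they are not the data behind Fig. 1.1), and the names `X`, `groups` and `group_shift` are hypothetical.

```python
import numpy as np

# Simulated data for illustration only: a = 3 groups, n = 10 samples each,
# p = 2 variables. These are NOT the data plotted in Fig. 1.1.
rng = np.random.default_rng(0)
a, n, p = 3, 10, 2
group_shift = np.array([[0.0, 0.0], [3.0, 1.0], [1.0, 4.0]])  # hypothetical group locations
X = rng.normal(size=(a * n, p)) + np.repeat(group_shift, n, axis=0)
groups = np.repeat(np.arange(a), n)  # group label (0, 1 or 2) for each sample

# Overall centroid: the arithmetic average of each variable over all N samples.
overall_centroid = X.mean(axis=0)

# Group centroids: the arithmetic average of each variable within each group.
group_centroids = np.array([X[groups == g].mean(axis=0) for g in range(a)])

print("overall centroid:", overall_centroid)
print("group centroids:\n", group_centroids)
```

Because the design is balanced, the average of the three group centroids coincides with the overall centroid in this sketch.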

Fig. 1.2. Plots of the hypothetical data set from Fig. 1.1 showing the geometric partitioning.

Just as in univariate ANOVA, we can consider the distance of any given point (sample) from the overall centroid in this space as being made up of two parts: the distance from the point to its group centroid (Fig. 1.2a) plus the distance from the group centroid to the overall centroid (Fig. 1.2b). This is the essence of the geometric approach to MANOVA in Euclidean space. We can calculate the sums of squares as:

$SS _ T$ = the sum of squared distances from the samples to the overall centroid,

$SS _ {Res}$ = the sum of squared distances from the samples to their own group centroid, and

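$SS _ A$ = the sum of squared distances from the group centroids to the overall centroid, with each group centroid counted once for every sample in its group (that is, multiplied by n in a balanced design).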
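These three definitions can be written compactly as follows. The notation is an assumption introduced here for illustration: $\mathbf{y} _ {ij}$ denotes the point for sample j in group i, $\bar{\mathbf{y}} _ i$ the centroid of group i, $\bar{\mathbf{y}}$ the overall centroid, and $\lVert \cdot \rVert$ the Euclidean distance.

$$
SS _ T = \sum _ {i=1}^{a} \sum _ {j=1}^{n} \lVert \mathbf{y} _ {ij} - \bar{\mathbf{y}} \rVert ^ 2, \qquad
SS _ {Res} = \sum _ {i=1}^{a} \sum _ {j=1}^{n} \lVert \mathbf{y} _ {ij} - \bar{\mathbf{y}} _ i \rVert ^ 2, \qquad
SS _ A = n \sum _ {i=1}^{a} \lVert \bar{\mathbf{y}} _ i - \bar{\mathbf{y}} \rVert ^ 2
$$

The factor n in $SS _ A$ arises because the squared distance from a group centroid to the overall centroid is counted once for each of the n samples in that group.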

The well-known univariate ANOVA identity, $SS _ T = SS _ {Res} + SS _ A$, also holds for this geometric conception of MANOVA in Euclidean space. Verdonschot & ter Braak (1994) and Legendre & Anderson (1999) also remark that, if Euclidean distance is used, each of these sums of squares is equal to the sum of the corresponding univariate sums of squares calculated separately for each variable.
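As a numerical check of the identity and of the variable-by-variable additivity, the following is a minimal sketch in Python with NumPy. It uses simulated data and a hypothetical helper, `partition_ss`; it illustrates the Euclidean partitioning only and is not the routine used by the PERMANOVA+ software.

```python
import numpy as np

def partition_ss(X, groups):
    """Return (SS_T, SS_Res, SS_A) for points X (N x p) using Euclidean distance."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    overall = X.mean(axis=0)                      # overall centroid
    ss_t = ((X - overall) ** 2).sum()             # samples to overall centroid
    ss_res = 0.0
    ss_a = 0.0
    for g in np.unique(groups):
        Xg = X[groups == g]
        centroid = Xg.mean(axis=0)                # group centroid
        ss_res += ((Xg - centroid) ** 2).sum()    # samples to own group centroid
        # group centroid to overall centroid, counted once per sample in the group
        ss_a += len(Xg) * ((centroid - overall) ** 2).sum()
    return ss_t, ss_res, ss_a

# Simulated data for illustration only (a = 3, n = 10, p = 2).
rng = np.random.default_rng(1)
a, n, p = 3, 10, 2
shift = np.array([[0.0, 0.0], [3.0, 1.0], [1.0, 4.0]])  # hypothetical group locations
X = rng.normal(size=(a * n, p)) + np.repeat(shift, n, axis=0)
groups = np.repeat(np.arange(a), n)

ss_t, ss_res, ss_a = partition_ss(X, groups)
print("SS_T == SS_Res + SS_A:", np.isclose(ss_t, ss_res + ss_a))

# Univariate partitioning of each variable separately, then summed over variables.
per_var = np.array([partition_ss(X[:, [j]], groups) for j in range(p)])
print("sum of univariate SS matches:", np.allclose(per_var.sum(axis=0), [ss_t, ss_res, ss_a]))
```

Both checks should print `True` for any data set, since the equalities are algebraic identities under Euclidean distance rather than properties of the particular simulated values.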

⁵ The word sample will be used throughout this manual in the manner that ecologists, and not statisticians, have come to understand the word. A sample shall mean a single unit used for sampling, such as a core, a transect, or a quadrat. This is consistent with the use of this word in PRIMER.
