10.1 Ordinations for multi-factor designs

Rationale

When considering the response of a whole set of variables (such as the abundances of species or taxa) simultaneously to a suite of several factors (e.g., arising from a multi-factor experiment or sampling design), it can be difficult to visualise salient structures and patterns in the data. One common problem is that multi-factor designs, when appropriately replicated, can yield a large total number of sampling units. A non-metric (or metric) multi-dimensional scaling (MDS) ordination done on a large number of samples can be very difficult to interpret. First, the 2D (and even 3D) stress might be quite high (> 0.20), precluding a reliable interpretation of the plot. Second, the residual variation (i.e., variation among the sampling units within each cell of the study design) is often quite large, and can mask essential patterns across the main factors of interest.

In univariate analyses of groups of samples (e.g., as in an ANOVA), one would commonly examine plots of means, rather than plots of raw sample values, to visualise patterns. In a similar way, for multivariate analyses, it is very useful to be able to visualise distances among centroids in the space of a chosen resemblance measure. We can usefully construct ordinations from matrices of distances among centroids computed from:

  • levels of factors that are the main effects (main effects plots);
  • combinations of levels of factors that are crossed with one another (interaction plots).

Ordination plots of distances among centroids were described by Anderson (2017). In PRIMER 7, it was possible to calculate distances among centroids, based on any given grouping factor (using PERMANOVA+ > Distance Among Centroids...). One can use this tool to generate interaction plots of interest by first creating factors that consist of combinations of levels of some chosen factors (e.g., using Edit > Factors... > Combine...).

In PRIMER 8, one can generate resemblance matrices and ordination plots of either (i) main-effect centroids or (ii) interaction centroids, automatically, by reference to a specific Design file. We briefly outline below how resemblance matrices among centroids are constructed, and then demonstrate these new practical tools and their utility for visualising and interpreting salient patterns in a multi-factor design by way of an example.

Distances among centroids

Let ${\bf Y}$ be a matrix of $N$ rows (sampling units) by $p$ columns (variables). Let ${\bf D} = \lbrace d_{ij} \rbrace$, $i = 1, ..., N$; $j = 1, ..., N$ be the distances or dissimilarities between every pair $(i,j)$ of sampling units. If ${\bf D}$ contains Euclidean distances, then the distances among centroids are equivalent to Euclidean distances among the arithmetic averages calculated separately for each variable. This equivalence does not hold, however, for non-Euclidean dissimilarities. Distances among centroids based on some other chosen dissimilarity measure (such as Bray-Curtis) are calculated as follows:

Step 1 - Calculate Gower's ${\bf G}$ matrix from ${\bf D}$
As in Gower (1966) , obtain $(N \times N)$ matrix $\bf G$ by first defining matrix ${\bf A} = \lbrace a_{ij} \rbrace = \lbrace - 0.5 \cdot d_{ij}^2 \rbrace$, then centring the elements of this matrix by its row-means, $\bar{a}_ {i\cdot}$, its column-means, $\bar{a}_ {\cdot j}$, and its overall mean, $\bar{a}_ {\cdot \cdot}$, to yield ${\bf G}$, i.e.,

${\bf G} = \lbrace g_{ij} \rbrace = \lbrace a_{ij} - \bar{a}_ {i\cdot} - \bar{a}_ {\cdot j} + \bar{a}_ {\cdot \cdot} \rbrace$
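The double centring in Step 1 can be written compactly using the standard centring matrix ${\bf J} = {\bf I} - \frac{1}{N}{\bf 1}{\bf 1}'$, since ${\bf J}{\bf A}{\bf J}$ subtracts the row and column means of ${\bf A}$ and adds back its overall mean. A minimal numpy sketch (the matrix ${\bf D}$ here is made up purely for illustration):

```python
import numpy as np

# Toy symmetric dissimilarity matrix D with zero diagonal (values are
# illustrative only; any measure, e.g. Bray-Curtis, could have produced them)
D = np.array([
    [0.0, 0.4, 0.7, 0.8],
    [0.4, 0.0, 0.5, 0.9],
    [0.7, 0.5, 0.0, 0.3],
    [0.8, 0.9, 0.3, 0.0],
])

A = -0.5 * D**2                        # A = {-0.5 * d_ij^2}
N = A.shape[0]
J = np.eye(N) - np.ones((N, N)) / N    # centring matrix (I - 11'/N)
G = J @ A @ J                          # g_ij = a_ij - rowmean_i - colmean_j + overallmean
```

By construction, ${\bf G}$ is symmetric and its rows and columns each sum to zero.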

Step 2 - Obtain the full set of principal coordinate (PCO) axes from matrix ${\bf G}$
This is done by performing an eigenvalue decomposition of matrix ${\bf G}$. Each resulting eigenvector is scaled by the square root of the absolute value of its corresponding eigenvalue to yield the PCO axes. At this step, it is important to keep track of those eigenvectors that are associated with positive eigenvalues, and those that are associated with negative eigenvalues (if any), as two separate sets.

Step 3 - Calculate centroids as averages along PCO axes
Suppose there are $\ell = 1, ..., c$ cells (or specified groups of sampling units), and we require a centroid for each of these. The centroids are obtained as the arithmetic averages of the sampling units belonging to each cell (or group), calculated separately along each PCO axis.

Step 4 - Calculate distances among centroids
For every pair of centroids $(\ell, \ell')$, with $\ell = 1, ..., c$ and $\ell' = 1, ..., c$, calculate Euclidean distances separately in each of two sets: one based on PCO axes corresponding to positive eigenvalues $(d_{\ell \ell'}^+)$ and one based on those corresponding to negative eigenvalues $(d_{\ell \ell'}^-)$, if any. The $(c \times c)$ matrix of distances among centroids in the space of the chosen dissimilarity measure is then:

${\bf D}^{\left[ C \right]} = \lbrace d_{\ell \ell'}^{\left[ C \right]} \rbrace$, where $d_{\ell \ell'}^{\left[ C \right]} = \sqrt{ \left| (d_{\ell \ell'}^+)^2 - (d_{\ell \ell'}^-)^2 \right| }$
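Putting Steps 1-4 together, the whole calculation can be sketched in a few lines of numpy. This is an illustration under the definitions above, not PRIMER's own implementation; the function name and the eigenvalue tolerance are our own choices:

```python
import numpy as np

def centroid_distances(D, groups):
    """Distances among group centroids in the space of the dissimilarity
    measure underlying D, following Steps 1-4 above. `groups` gives one
    label per sampling unit. Illustrative sketch only."""
    D = np.asarray(D, dtype=float)
    N = D.shape[0]
    A = -0.5 * D**2                       # Step 1: Gower's G matrix
    J = np.eye(N) - np.ones((N, N)) / N
    G = J @ A @ J
    evals, evecs = np.linalg.eigh(G)      # Step 2: eigen-decomposition of G
    Q = evecs * np.sqrt(np.abs(evals))    # PCO axes: scale by sqrt(|eigenvalue|)
    pos = evals > 1e-10                   # axes from positive eigenvalues
    neg = evals < -1e-10                  # axes from negative eigenvalues (if any)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    # Step 3: centroids as averages of sampling units along each PCO axis
    C = np.array([Q[groups == g].mean(axis=0) for g in labels])
    # Step 4: squared Euclidean distances in each set, combined as sqrt(|d+^2 - d-^2|)
    def sq_d(M):
        diff = M[:, None, :] - M[None, :, :]
        return (diff ** 2).sum(axis=-1)
    d2 = sq_d(C[:, pos]) - sq_d(C[:, neg])
    return labels, np.sqrt(np.abs(d2))
```

A useful sanity check follows from the equivalence noted earlier: if ${\bf D}$ contains Euclidean distances, the result must match Euclidean distances among the per-variable arithmetic means of each group.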

It is worth noting here that the distances among centroids can also be calculated directly from matrix ${\bf G}$, without using PCO axes. See section 5.1 in Anderson (2017) for details.
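One way this direct route can be sketched is via block averages of ${\bf G}$: the signed squared distance between centroids $\ell$ and $\ell'$ is the mean of the $(\ell, \ell)$ block, plus the mean of the $(\ell', \ell')$ block, minus twice the mean of the $(\ell, \ell')$ block, with the square root of the absolute value taken at the end as in Step 4. The numpy sketch below is our own illustration of this idea (the function name is ours); see Anderson (2017), section 5.1, for the formal treatment:

```python
import numpy as np

def centroid_distances_from_G(D, groups):
    """Centroid distances computed directly from Gower's G matrix via
    block averages, skipping the PCO step. Illustrative sketch only."""
    D = np.asarray(D, dtype=float)
    N = D.shape[0]
    J = np.eye(N) - np.ones((N, N)) / N
    G = J @ (-0.5 * D**2) @ J             # Step 1 as before
    groups = np.asarray(groups)
    labels = np.unique(groups)
    # gbar[k, l] = average of the G-block for rows in group k, columns in group l
    gbar = np.array([[G[np.ix_(groups == gk, groups == gl)].mean()
                      for gl in labels] for gk in labels])
    # Signed squared distance between centroids k and l:
    #   gbar[k, k] + gbar[l, l] - 2 * gbar[k, l]
    d2 = np.diag(gbar)[:, None] + np.diag(gbar)[None, :] - 2 * gbar
    return labels, np.sqrt(np.abs(d2))
```

As with the PCO-based route, a Euclidean ${\bf D}$ provides a check: the result must equal Euclidean distances among the per-variable group means.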