5.12 Canonical correlation: multiple X’s

In some cases, interest lies in finding axes through the cloud of points so as to maximise correlation, not with a single X variable, but with linear combinations of multiple X variables simultaneously. In such cases, neither of the two sets of variables (i.e. the PCO’s arising from the resemblance matrix, on the one hand, and the X variables, on the other) has a specific role in the analysis; neither set is considered to be either predictors or responses. Rather, when there are several X variables, CAP can be conceptually described as sphericising both sets of variables and then rotating them simultaneously against one another in order to find the axes with maximum inter-correlations between the two sets.
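The "sphericise, then rotate" description can be sketched numerically. The following is a minimal illustration in Python with NumPy, not the routine used by the PERMANOVA+ software: each set is centred and replaced by an orthonormal basis for its columns, and the singular values of the cross-product of the two bases are the canonical correlations. The function names `sphericise` and `canonical_correlations`, and the simulated data, are assumptions made for this example only.

```python
import numpy as np

def sphericise(A):
    """Centre the columns of A and return an orthonormal basis for them.

    After this step every direction in the cloud has the same spread,
    so only the relative orientation of the two sets remains.
    """
    A = A - A.mean(axis=0)
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    keep = s > s.max() * 1e-10          # drop numerically null directions
    return U[:, keep]

def canonical_correlations(Q, X):
    """Canonical correlations between two sets of variables (rows = samples)."""
    Uq = sphericise(Q)
    Ux = sphericise(X)
    # Rotating the two sphericised sets against one another: the singular
    # values of Uq' Ux are the canonical correlations, largest first.
    return np.linalg.svd(Uq.T @ Ux, compute_uv=False)

# Hypothetical data: 30 samples, 4 "PCO axes" and 3 X variables,
# with the first X variable built to track the first axis closely.
rng = np.random.default_rng(0)
N = 30
Q = rng.normal(size=(N, 4))
X = rng.normal(size=(N, 3))
X[:, 0] = Q[:, 0] + 0.1 * rng.normal(size=N)

r = canonical_correlations(Q, X)
print(np.round(r, 3))
```

Swapping the arguments, `canonical_correlations(X, Q)`, gives the same values, which reflects the fact that neither set plays the role of predictor or response.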

As the agenda here is neither to explain nor to predict one set using the other, canonical correlation analysis with multiple X variables is simply a method for exploring relationships between two sets of variables. As such, its utility is perhaps rather limited for ecological applications, but it can certainly be useful for generating hypotheses. CAP does canonical correlation between the PCO axes Q$_m$ (N × m) and X (N × q), where m is generally chosen so as to minimise the leave-one-out residual sum of squares (see the section on Diagnostics), and the number of canonical axes generated will be min(m, q, (N – 1)). If p, the number of variables in the original data matrix, is less than (N – 1) and the measure being used is Euclidean embeddable (e.g., see the section on Negative eigenvalues in chapter 3 herein and Gower & Legendre (1986) regarding the geometric properties of dissimilarity measures), then it makes sense to set m = p manually in the CAP dialog a priori, as the dimensionality of the PCO’s in such cases is known to be equal to p.
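As a small check on the last point, the sketch below (Python with NumPy) builds PCO axes from a Euclidean distance matrix on p = 3 simulated variables and counts the axes with non-zero eigenvalues. Euclidean distance is an assumption chosen here because its embedding dimension is easy to verify; the helper name `pco_axes` and the values of N, p and q are hypothetical, and the leave-one-out choice of m is not shown.

```python
import numpy as np

def pco_axes(D):
    """Principal coordinates from a distance matrix D (Gower centring).

    Returns the axes with positive eigenvalues, scaled by the square
    root of their eigenvalues.
    """
    N = D.shape[0]
    A = -0.5 * D ** 2
    J = np.eye(N) - np.ones((N, N)) / N      # centring matrix
    G = J @ A @ J
    evals, evecs = np.linalg.eigh(G)
    order = np.argsort(evals)[::-1]          # largest eigenvalue first
    evals, evecs = evals[order], evecs[:, order]
    keep = evals > evals[0] * 1e-8           # discard numerically zero axes
    return evecs[:, keep] * np.sqrt(evals[keep])

rng = np.random.default_rng(1)
N, p, q = 12, 3, 5                           # p < N - 1
Y = rng.normal(size=(N, p))                  # simulated original variables
D = np.sqrt(((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2))

Q = pco_axes(D)
m = Q.shape[1]                               # dimensionality of the PCO's
n_canonical = min(m, q, N - 1)               # number of canonical axes
print(m, n_canonical)
```

Under these assumptions the number of retained PCO axes equals p, so setting m = p in advance loses nothing, and the number of canonical axes is limited by the smaller of the two sets.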

Note also that, as CAP will search for linear combinations of the X variables that are maximally correlated with the PCO’s, it makes sense to spend a little time examining the distributions of the X variables first (just as in dbRDA). Check that they have reasonably symmetric distributions and even scatter (using a draftsman plot, for example), transform them if necessary, and consider eliminating redundant (very highly correlated) variables. The two sets of variables are treated symmetrically here, so CAP also simultaneously searches for linear combinations of the PCO’s that are maximally correlated with the X variables. Thus, one might also consider examining the distributions of the PCO’s (in scatterplot ordinations or a draftsman plot) to ensure that they, too, have fairly even scatter (although certainly no formal assumptions in this regard are brought to bear on the analysis).
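A numerical complement to the draftsman plot can be sketched as follows (Python with NumPy). It reports the skewness of each X variable before and after a log transformation and lists pairs of variables whose correlation exceeds a cut-off. The helper names, the cut-off of 0.95 and the constructed data are assumptions for illustration only; no particular threshold is prescribed by the method.

```python
import numpy as np

def skewness(X):
    """Moment-based skewness of each column (0 for a symmetric variable)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    return (Z ** 3).mean(axis=0)

def redundant_pairs(X, threshold=0.95):
    """Pairs of columns whose absolute Pearson correlation exceeds threshold."""
    R = np.corrcoef(X, rowvar=False)
    q = R.shape[0]
    return [(i, j, R[i, j])
            for i in range(q) for j in range(i + 1, q)
            if abs(R[i, j]) > threshold]

# Hypothetical X variables: one right-skewed, one symmetric,
# and one that nearly duplicates the second.
rng = np.random.default_rng(2)
N = 40
x0 = np.exp(np.linspace(0.0, 3.0, N))        # right-skewed by construction
x1 = rng.normal(size=N)
x2 = x1 + 0.01 * rng.normal(size=N)          # near-duplicate of x1
X = np.column_stack([x0, x1, x2])

skew_raw = skewness(X)
skew_log = skewness(np.column_stack([np.log(x0), x1, x2]))
pairs = redundant_pairs(X)

print(np.round(skew_raw, 2), np.round(skew_log, 2))
print([(i, j) for i, j, _ in pairs])
```

Here the log transformation removes the skew in the first variable, and the third variable is flagged as a candidate for removal. The same two functions could be applied to the PCO axes, given that the two sets are treated symmetrically.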
