4.4 Simple linear regression (Clyde macrofauna)

In our first example of DISTLM, we will examine the relationship between the Shannon diversity (H′) of macrofauna and log copper concentration from benthic sampling at 12 sites along the Garroch Head dumpground in the Firth of Clyde, using simple linear regression. How much of the variation in macrofaunal diversity (Y) is explained by variation in the concentration of copper in the sediment (X)? In this case, p = q = 1, so both Y and X are actually just single vectors containing one variable (rather than being whole matrices). We shall do a DISTLM analysis on the basis of Euclidean distances, which is equivalent to univariate simple linear regression, but where the P-value for the test is obtained using permutations, rather than by using traditional tables.
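The equivalence described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the PERMANOVA+ implementation: the data are synthetic (they are not the Clyde measurements), the function names are hypothetical, and $R^2$ is used as the test statistic, which for a single predictor orders the permutations in the same way as pseudo-F. The sketch also checks that the total sum of squares obtained from Euclidean distances matches the usual univariate total sum of squares.

```python
import numpy as np


def ss_total_from_distances(y):
    """Total SS from the Euclidean distance matrix: the sum of squared
    inter-point distances (each pair counted once), divided by N."""
    d = np.abs(y[:, None] - y[None, :])
    return np.sum(np.triu(d, k=1) ** 2) / len(y)


def r_squared(x, y):
    """Proportion of variation in y explained by a straight-line fit on x."""
    xc = x - x.mean()
    yc = y - y.mean()
    slope = np.dot(xc, yc) / np.dot(xc, xc)
    ss_regression = slope ** 2 * np.dot(xc, xc)
    ss_total = np.dot(yc, yc)
    return ss_regression / ss_total


def permutation_p_value(x, y, n_perm=9999, seed=0):
    """Permutation P-value for R^2: shuffle y relative to x and count how
    often the permuted statistic is at least as large as the observed one."""
    rng = np.random.default_rng(seed)
    observed = r_squared(x, y)
    count = 1  # the observed arrangement is included as one permutation
    for _ in range(n_perm):
        if r_squared(x, rng.permutation(y)) >= observed:
            count += 1
    return observed, count / (n_perm + 1)


# Synthetic, purely illustrative data for 12 hypothetical sites.
noise_rng = np.random.default_rng(1)
x = np.linspace(1.0, 6.0, 12)                         # stand-in for ln Cu
y = 3.0 - 0.4 * x + noise_rng.normal(0.0, 0.2, 12)    # stand-in for H'

r2, p = permutation_p_value(x, y)
print(f"R^2 = {r2:.3f}, permutation P = {p:.4f}")
```

With 9999 permutations, the smallest P-value this scheme can return is 1/10000 = 0.0001.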

In PRIMER, open up the environmental data file clev.pri and also the macrofaunal data file clma.pri, which are both located in the ‘Clydemac’ folder of the ‘Examples v6’ directory. Here, we will focus first on only a single predictor variable: copper. Highlight the variable ‘Cu’ and choose Select > Highlighted. Next, transform the variable (for consistency, we shall use the log-transformation suggested by Clarke & Gorley (2006)) by choosing Tools > Transform(individual) > Expression: log(0.1+V). Rename the transformed variable ‘ln Cu’, and also rename the data sheet containing this variable ln Cu. Next, go to the sheet containing the macrofaunal data clma and choose Analyse > DIVERSE. In the resulting dialog, remove the tick marks from all of the measures (shown under the ‘Other’ tab or the ‘Simpson’ tab), so that the only diversity measure that will be calculated is Shannon’s index H′ (shown under the ‘Shannon’ tab) using Log base $\checkmark$e. Place a tick mark next to $\checkmark$Results to worksheet, then click OK. Rename the resulting data sheet H′. Calculate Euclidean distances among samples on the basis of H′ alone. That is, from the H′ data sheet, choose Analyse > Resemblance > (Analyse between •Samples) & (Measure •Euclidean distance).
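For readers who want to see what these preparation steps compute, here is a hedged Python sketch. It assumes that the expression log(0.1+V) denotes the natural logarithm (as the name ‘ln Cu’ suggests) and uses the standard definition of Shannon's index with log base e, $H′ = -\sum_i p_i \ln p_i$. The abundance table and function names are hypothetical and are not taken from clma.pri.

```python
import numpy as np


def log_transform(v, constant=0.1):
    """Natural-log transform log(constant + V), applied element-wise."""
    return np.log(constant + np.asarray(v, dtype=float))


def shannon_h(counts):
    """Shannon diversity H' (log base e) for one sample of abundances."""
    counts = np.asarray(counts, dtype=float)
    counts = counts[counts > 0]          # absent taxa contribute nothing
    p = counts / counts.sum()            # proportional abundances
    return -np.sum(p * np.log(p))


def euclidean_distances(values):
    """Euclidean distances among samples for a single variable; with one
    variable this is just the absolute difference for each pair."""
    v = np.asarray(values, dtype=float)
    return np.abs(v[:, None] - v[None, :])


# Hypothetical abundance table: rows are samples, columns are taxa.
abundances = np.array([
    [10, 10, 10, 10],   # perfectly even: H' = ln(4)
    [37, 1, 1, 1],      # dominated by one taxon: low H'
    [5, 0, 0, 0],       # a single taxon: H' = 0
])

h = np.array([shannon_h(row) for row in abundances])
dist = euclidean_distances(h)
print(h)
print(dist)
```

The resulting `dist` array plays the role of the resemblance matrix among samples that is passed on to the analysis.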

Fig. 4.3. DISTLM procedure for simple linear regression of Clyde macrofaunal diversity (H′) versus log copper concentration (ln Cu).

With the predictor (X) variable(s) in one sheet and the resemblance matrix (arising from the response variable(s) Y) in another, we are now ready to proceed with the analysis. DISTLM, like all of the methods in the PERMANOVA+ add-on, begins from the resemblance matrix. Note that the number and names of the samples in the resemblance matrix must match precisely those listed in the worksheet of predictor variables. The samples do not have to be in the same order, but they must have exactly the same names, so that they can be matched and related to one another directly in the analysis. Choose PERMANOVA+ > DISTLM > (Predictor variables worksheet: ln Cu) & (Selection procedure •All specified) & (Selection criterion •$R^2$) & (Num. permutations: 9999) (Fig. 4.3).

The results file (Fig. 4.4) shows that the proportion of the variation in Shannon diversity explained by log copper concentrations is quite large ($R^2 = 0.815$, as seen in the column headed ‘Prop.’) and, not surprisingly, statistically significant by permutation (P = 0.0001). Thus, 81.5% of the variation in macrofaunal diversity among these 12 sites (as measured by H′ alone) is explained by variation in log copper concentration. Also shown in the output is the explicit quantitative partitioning: $SS_\text{Total} = 7.63$ and $SS_\text{Regression} = 6.22$. As there is only one predictor variable here, the information given under the heading ‘Marginal tests’ (Fig. 4.4) is all that is really needed from the output file. To see a scatterplot for this simple regression case, go to the ln Cu worksheet and choose Tools > Merge > Second worksheet: H′ > OK. From this merged data worksheet (called ‘Data1’ by default), choose Analyse > Draftsman Plot > $\checkmark$Correlations to worksheet > OK. The plot shows that H′ decreases strongly with increasing ln Cu and the Pearson linear correlation (r) is $-0.903$ (Fig. 4.4). Indeed, this checks out with what was given in DISTLM for $R^2$, as $(-0.903)^2 = 0.815$.
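The reported proportion can also be recovered directly from the partitioning. The following worked check uses only the rounded values quoted above, so agreement is to three decimal places; $SS_\text{Residual}$ is obtained here by subtraction rather than read from the output.

```latex
\begin{aligned}
R^2 &= \frac{SS_\text{Regression}}{SS_\text{Total}} = \frac{6.22}{7.63} \approx 0.815, \\
SS_\text{Residual} &= SS_\text{Total} - SS_\text{Regression} = 7.63 - 6.22 = 1.41 .
\end{aligned}
```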

Fig. 4.4. DISTLM results for the regression of diversity of Clyde macrofauna (H′) versus log copper concentration (ln Cu).
