7.5 Broader implications for detecting impact

Comparison of results treating 'Locations' as random

Conventional wisdom for such a design would have been to treat 'Locations' as a random factor (Underwood (1992), Glasby (1997)). It is instructive to consider what the results of this analysis would have been had we done so, instead of treating the 'Locations' factor as finite. Below are the two output files:

treating 'Loc' as a finite factor with a sampling fraction of 2/8 (= 1/4) for the controls ('PERMANOVA1')

treating 'Loc' as a random factor ('PERMANOVA2').

Several things differ between these two outputs (Table 7.2). Look specifically at the expected mean square (EMS) for the 'IvC' term and, hence, at the construction of the pseudo $F$ ratio from mean squares and the associated degrees of freedom for that test. These, in turn, affect the observed value of the pseudo $F$ statistic for the test of 'IvC' and its associated $P$ value.

Table 7.2. Key differences in the PERMANOVA output when we treat 'Locations' as a finite population with 8 levels versus treating it as a random factor.

Point of difference	Treat 'Loc' as finite (8 levels)	Treat 'Loc' as random
Number of levels in the population for the factor of 'Location' (among Controls)	$B_1 = 8$	$B_1 = \infty$
Coefficient, $K$, on the 'Loc' variance component (Loc(IvC)) in the EMS for 'IvC'	$K = 6.75$	$K = 27$
Denominator in the construction of the pseudo $F$ test statistic to test 'IvC'	$0.75 \cdot \text{MS}_{\text{Sites}} + 0.25 \cdot \text{MS}_{\text{Loc}}$	$\text{MS}_{\text{Loc}}$
Denominator degrees of freedom for the test of 'IvC'	$\text{df}_{\text{denom}} = 5.46$	$\text{df}_{\text{denom}} = 1$
Pseudo $F$ test statistic for 'IvC'	$F = 3.7936$	$F = 2.8886$
$P$ value for the test of 'IvC'	$P = 0.021$	$P = 0.331$^¶

Note that, when we treat 'Loc' as finite, the denominator of the pseudo $F$ test statistic for the test of 'IvC' has to be constructed as a linear combination of mean squares. This is also why the denominator degrees of freedom for that test take a non-integer value in the finite-factor case (i.e., $\text{df}_{\text{denom}} = 5.46$).
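The non-integer degrees of freedom can be illustrated with a Satterthwaite-type approximation for a linear combination of mean squares. What follows is a minimal sketch, not the software's own implementation. It assumes (i) that a Satterthwaite approximation is what produces the reported df, (ii) that the numerator mean square for 'IvC' is the same whichever denominator is used, and (iii) that the pseudo $F$ value is 4.235 on 6 df when the denominator is $\text{MS}_{\text{Sites}}$ alone (the bottom line of Table 7.3). Mean squares are expressed in units of the 'IvC' mean square, so each is simply the reciprocal of the corresponding pseudo $F$. All function and variable names are illustrative.

```python
def satterthwaite_df(coefs, mean_squares, dfs):
    """Approximate df for sum(c_i * MS_i) using Satterthwaite's formula."""
    terms = [c * ms for c, ms in zip(coefs, mean_squares)]
    return sum(terms) ** 2 / sum(t ** 2 / d for t, d in zip(terms, dfs))


# Pseudo-F values reported in the text. Assumption: MS_IvC is the common
# numerator, so setting MS_IvC = 1 gives each denominator MS as 1 / F.
F_SITES_ONLY = 4.235   # denominator MS_Sites, 6 df (Table 7.3, bottom line)
F_LOC_ONLY = 2.8886    # denominator MS_Loc, 1 df (Table 7.2, 'Loc' random)
ms_sites = 1.0 / F_SITES_ONLY
ms_loc = 1.0 / F_LOC_ONLY

# Coefficients for the finite case with a sampling fraction of 1/4 (Table 7.2).
coefs = (0.75, 0.25)
denominator = coefs[0] * ms_sites + coefs[1] * ms_loc

pseudo_f = 1.0 / denominator  # MS_IvC / denominator, with MS_IvC = 1
df_denom = satterthwaite_df(coefs, (ms_sites, ms_loc), (6, 1))

print(f"pseudo-F = {pseudo_f:.3f}, denominator df = {df_denom:.2f}")
```

Under these assumptions the sketch returns values close to the reported $F = 3.7936$ and $\text{df}_{\text{denom}} = 5.46$; small differences are to be expected because the inputs are rounded published values.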

The implications of all of this are far-reaching, because the test of the 'IvC' term is the most important test for the researcher in this study design. Specifying the 'Loc' factor as finite has clearly provided more power for this key test of environmental impact in the present case. In general, we can expect power to increase whenever the denominator degrees of freedom increase (all else being equal).

Changes in the size of the inference space

To demonstrate the effect of changing the size of the inference space on the analysis, we can posit what the results for this study would look like, specifically for the test of the 'IvC' term in the model, if the number of 'Control' locations in the population (i.e., $B_1$) were different (Table 7.3). We have already seen what would happen if we consider that this population is infinite (i.e., if we treat 'Locations' as a random factor). The table below looks at a progression of values for $B_1$, corresponding to a gradation in the sampling fraction ($f = b_1/B_1$) from fixed ($f = 1$) to random (where $B_1 = \infty$, so $f$ is effectively zero), showing the concomitant change in the results.

Table 7.3. Effect of a change in the size of the inference space (sampling fraction) on construction of the pseudo $F$ statistic and associated denominator degrees of freedom in a PERMANOVA test for the factor 'IvC'.

$B_1$	Sampling fraction ($f$)	Denominator for the test of 'IvC'	$\text{df}_{\text{denom}}$	$F_{\text{IvC}}$
$\infty$ (random)	$\underset{B_1 \to \infty}{\lim} f = 0$	$\text{MS}_{\text{Loc}}$	$1$	$2.889$
$100$	$1/50$	$0.67 \cdot \text{MS}_{\text{Sites}} + 0.33 \cdot \text{MS}_{\text{Loc}}$	$4.35$	$3.676$
$30$	$1/15$	$0.69 \cdot \text{MS}_{\text{Sites}} + 0.31 \cdot \text{MS}_{\text{Loc}}$	$4.57$	$3.699$
$20$	$1/10$	$0.70 \cdot \text{MS}_{\text{Sites}} + 0.30 \cdot \text{MS}_{\text{Loc}}$	$4.72$	$3.716$
$10$	$1/5$	$0.73 \cdot \text{MS}_{\text{Sites}} + 0.27 \cdot \text{MS}_{\text{Loc}}$	$5.21$	$3.767$
$8$	$1/4$	$0.75 \cdot \text{MS}_{\text{Sites}} + 0.25 \cdot \text{MS}_{\text{Loc}}$	$5.46$	$3.794$
$4$	$1/2$	$0.83 \cdot \text{MS}_{\text{Sites}} + 0.17 \cdot \text{MS}_{\text{Loc}}$	$6.62$	$3.930$
$2$	$1$	$\text{MS}_{\text{Sites}}$	$6$	$4.235$

Note that nothing changes here about the number of levels actually sampled: every line in Table 7.3 has the same $b_1 = 2$ sampled control locations.
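The sampling fractions listed in Table 7.3 follow directly from $f = b_1/B_1$ with $b_1 = 2$. As a quick check, the short sketch below recomputes them as exact fractions for the finite values of $B_1$ in the table; the function and variable names are illustrative only.

```python
from fractions import Fraction

B1_SAMPLED = 2  # b_1: number of control locations actually sampled
POPULATION_SIZES = [100, 30, 20, 10, 8, 4, 2]  # finite B_1 values in Table 7.3


def sampling_fraction(b_sampled, b_population):
    """Return f = b_1 / B_1 as an exact fraction."""
    if b_population < b_sampled:
        raise ValueError("population cannot be smaller than the sample")
    return Fraction(b_sampled, b_population)


fractions_by_size = {
    size: sampling_fraction(B1_SAMPLED, size) for size in POPULATION_SIZES
}

for size, f in fractions_by_size.items():
    print(f"B_1 = {size:>3}   f = {f}   ({float(f):.3f})")
```

The random case is not included because it corresponds to the limit $B_1 \to \infty$, where $f$ tends to zero rather than being a ratio of two finite counts.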

A word of caution

We are not typically at liberty simply to choose whatever inference space we want. Ethical as well as scientific considerations may well come into play here. It has to be recognised that the pseudo $F$ ratio in each line of Table 7.3 corresponds to a test of a different null hypothesis. Each line presents a test with a different breadth of inference: the top line has the broadest scope, while the bottom line has the narrowest. Indeed, treating the control locations as fixed (bottom line in Table 7.3) might well give us more power, but it also severely limits our statistical inferences to just those particular locations in our study and to no others. The inferences from the true study design, where we sampled 2 out of 8 control locations, apply to the whole of the Italian Mediterranean coastline that is spanned by those 8 potential locations, no less and no further. Hence, this excellent new tool permitting specification of finite factors, although it affords us greater flexibility and (potentially) power to test the terms of greatest interest, also comes with a special dose of responsibility. We need, as researchers, to articulate carefully the broader population, and hence the scale and extent of the inferences that we shall draw from any study.

^¶Note that the $P$ value here is limited by the fact that there are only 3 possible permutations of the 3 locations across the 2 groups 'I' and 'C'. We do have the option to obtain a Monte Carlo approximation to the $P$ value when we run the PERMANOVA, by ticking the box in the PERMANOVA run dialog: ($\checkmark$Do Monte Carlo tests). Assuming that each of the principal coordinate (PCO) axes representing the sample points in the chosen resemblance space (Bray-Curtis in this example) is asymptotically normal, the PERMANOVA pseudo $F$ statistic is distributed as a ratio of two linear forms in chi-square under a true null hypothesis, from which we can take a random Monte Carlo draw (the linear forms are supplied by the eigenvalues of the PCO). The Monte Carlo approximate $P$ value for the test of the 'IvC' term for this example, treating 'Locations' as a random and not a finite factor, is $P \approx 0.03$.
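The limit on the permutation $P$ value can be seen by enumerating the possible allocations directly. The sketch below assumes one 'I' location and two 'C' locations (three in total, as stated above) and uses hypothetical location labels. It lists the distinct ways of assigning locations to the two groups; the smallest $P$ value attainable from an exhaustive set of permutations is one over that count.

```python
from itertools import combinations

locations = ["Loc1", "Loc2", "Loc3"]  # hypothetical labels for the 3 locations
N_IMPACT = 1                          # assumption: a single 'I' location

# Each allocation picks which location(s) play the role of 'I';
# the remaining locations are the controls, 'C'.
allocations = [
    {"I": set(impact), "C": set(locations) - set(impact)}
    for impact in combinations(locations, N_IMPACT)
]

min_p_value = 1 / len(allocations)

for allocation in allocations:
    print(sorted(allocation["I"]), sorted(allocation["C"]))
print(f"distinct allocations: {len(allocations)}, smallest P = {min_p_value:.3f}")
```

With only three distinct allocations, no permutation test on these units can return a $P$ value much below one third, which is consistent with the value of about 0.33 reported in Table 7.2 when 'Loc' is treated as random.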
