7.5 Broader implications for detecting impact
Comparison of results treating 'Locations' as random
Conventional wisdom for such a design would have treated 'Locations' as a random factor (Underwood (1992), Glasby (1997)). It is instructive to consider what the results of this analysis would have been had we done so, instead of treating the 'Locations' factor as finite. Below are the two output files:
- treating 'Loc' as a finite factor with a sampling fraction of 2/8 (= 1/4) for the controls ('PERMANOVA1')
- treating 'Loc' as a random factor ('PERMANOVA2').
Several things differ between these two outputs (Table 7.2). Look specifically at the EMS for the 'IvC' term and, hence, at the construction of the pseudo $F$ ratio from mean squares and the associated degrees of freedom for that test. These, in turn, affect the observed value of the pseudo $F$ statistic for the test of 'IvC' and its associated $P$ value.
Table 7.2. Key differences in the PERMANOVA output when we treat 'Locations' as a finite population with 8 levels versus treating it as a random factor.
| Point of difference | Treat 'Loc' as finite (8 levels) | Treat 'Loc' as random |
|---|---|---|
| Number of levels in the population for the factor of 'Location' (among Controls) | $B_1 = 8$ | $B_1 = \infty$ |
| Coefficient, $K$, on the 'Loc' variance component (Loc(IvC)) in the EMS for 'IvC' | $K = 6.75$ | $K = 27$ |
| Denominator in the construction of the pseudo $F$ test statistic to test 'IvC' | $0.75 \cdot \text{MS}_ {\text{Sites}} + 0.25 \cdot \text{MS}_ {\text{Loc}}$ | $\text{MS}_ {\text{Loc}}$ |
| Denominator degrees of freedom for the test of 'IvC' | $\text{df}_ {\text{denom}} = 5.46$ | $\text{df}_ {\text{denom}} = 1$ |
| pseudo $F$ test-statistic for 'IvC' | $F = 3.7936$ | $F = 2.8886$ |
| $P$ value for the test of 'IvC' | $P = 0.021$ | $P = 0.331$¶ |
Note that, when we treat 'Loc' as finite, the denominator of the pseudo $F$ test statistic for the test of 'IvC' has to be constructed as a linear combination of mean squares. This is also why the denominator degrees of freedom for the test of 'IvC' in the finite-factor case take a non-integer value (i.e., $\text{df}_ {\text{denom}} = 5.46$).
The implications of all of this are far-reaching, because the test of the 'IvC' term is definitely the most important test for the researcher in this particular study design. The specification of the 'Loc' factor as 'finite' has clearly provided more power for this key test of environmental impact in the present case. In general, we can expect that the power will increase whenever there is an increase in the denominator degrees of freedom (all else being equal).
Changes in the size of the inference space
To demonstrate the effect of changing the size of the inference space on the analysis, we can posit what the results for this study would look like, specifically for the test of the 'IvC' term in the model, if the number of 'Control' locations in the population (i.e., $B_1$) were larger (Table 7.3). We have already seen what would happen if we consider that this population is infinite (i.e., if we treat 'Locations' as a random factor). The table below looks at a progression of values for $B_1$, corresponding to a gradation in the sampling fraction ($f = b_1/B_1$) from fixed ($f=1$) to random (where $B_1 = \infty$ so $f$ is effectively zero), showing the concomitant change in the results.
Table 7.3. Effect of a change in the size of the inference space (sampling fraction) on the construction of the pseudo $F$ statistic and the associated denominator degrees of freedom in a PERMANOVA test for the factor 'IvC'.
| $B_1$ | Sampling fraction ($f$) | Denominator for the test of 'IvC' | $\text{df}_ {\text{denom}}$ | $F_{\text{IvC}}$ |
|---|---|---|---|---|
| $\infty$ (random) | $\underset{B_1 \to \infty}{\lim} f = 0$ | $\text{MS}_ {\text{Loc}}$ | $1$ | $2.889$ |
| $100$ | $1/50$ | $0.67 \cdot \text{MS}_ {\text{Sites}} + 0.33 \cdot \text{MS}_ {\text{Loc}}$ | $4.35$ | $3.676$ |
| $30$ | $1/15$ | $0.69 \cdot \text{MS}_ {\text{Sites}} + 0.31 \cdot \text{MS}_ {\text{Loc}}$ | $4.57$ | $3.699$ |
| $20$ | $1/10$ | $0.70 \cdot \text{MS}_ {\text{Sites}} + 0.30 \cdot \text{MS}_ {\text{Loc}}$ | $4.72$ | $3.716$ |
| $10$ | $1/5$ | $0.73 \cdot \text{MS}_ {\text{Sites}} + 0.27 \cdot \text{MS}_ {\text{Loc}}$ | $5.21$ | $3.767$ |
| $8$ | $1/4$ | $0.75 \cdot \text{MS}_ {\text{Sites}} + 0.25 \cdot \text{MS}_ {\text{Loc}}$ | $5.46$ | $3.794$ |
| $4$ | $1/2$ | $0.83 \cdot \text{MS}_ {\text{Sites}} + 0.17 \cdot \text{MS}_ {\text{Loc}}$ | $6.62$ | $3.930$ |
| $2$ | $1$ | $\text{MS}_ {\text{Sites}}$ | $6$ | $4.235$ |
Note that we are not changing the number of levels actually sampled here: for every row of Table 7.3 above, we have the same $b_1 = 2$ sampled control locations.
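The finite rows of Table 7.3 can be reproduced, approximately, from numbers given in Tables 7.2 and 7.3 themselves. The sketch below is illustrative only and rests on three assumptions that are not stated in the text: (i) the weight on $\text{MS}_ {\text{Loc}}$ follows the pattern $(1 - f)/3$, which is simply read off the tabulated coefficients for this particular design; (ii) the denominator degrees of freedom come from a Satterthwaite-type approximation for a linear combination of mean squares; and (iii) $\text{MS}_ {\text{Sites}}$ and $\text{MS}_ {\text{Loc}}$ carry 6 and 1 degrees of freedom, respectively, as implied by the fixed and random rows. The function name `ivc_test` and all variable names are hypothetical. The random row does not follow pattern (i) and is left out.

```python
# Inputs read from Tables 7.2 and 7.3 (rounded values, so results are approximate).
F_FIXED = 4.235     # F for 'IvC' when the denominator is MS_Sites alone (f = 1)
F_RANDOM = 2.8886   # F for 'IvC' when the denominator is MS_Loc alone (random)
DF_SITES = 6        # denominator df in the fixed row (assumed df of MS_Sites)
DF_LOC = 1          # denominator df in the random row (assumed df of MS_Loc)
B_SAMPLED = 2       # number of control locations actually sampled (b_1)

# Both F ratios share the numerator MS_IvC, so their ratio is MS_Loc / MS_Sites.
MS_RATIO = F_FIXED / F_RANDOM


def ivc_test(B1):
    """Return (weight on MS_Loc, approximate denominator df, F) for population size B1."""
    f = B_SAMPLED / B1                  # sampling fraction
    # ASSUMPTION: pattern inferred from the coefficients printed in Table 7.3.
    w_loc = (1 - f) / 3
    w_sites = 1 - w_loc
    # Work in units of MS_Sites, so MS_Sites = 1 and MS_Loc = MS_RATIO.
    term_sites = w_sites * 1.0
    term_loc = w_loc * MS_RATIO
    denom = term_sites + term_loc
    # ASSUMPTION: Satterthwaite-type df for a linear combination of mean squares.
    df = denom**2 / (term_sites**2 / DF_SITES + term_loc**2 / DF_LOC)
    F = F_FIXED / denom                 # MS_IvC / denominator, in the same units
    return w_loc, df, F


for B1 in (100, 30, 20, 10, 8, 4, 2):
    w_loc, df, F = ivc_test(B1)
    print(f"B1={B1:>3}  w_Loc={w_loc:.2f}  df_denom={df:.2f}  F={F:.3f}")
```

Under these assumptions, the printed weights, degrees of freedom and $F$ values agree with the finite rows of Table 7.3 to within the rounding of the inputs. This illustrates how a weighted denominator leads naturally to non-integer denominator degrees of freedom.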
A word of caution
We are not typically at liberty simply to choose whatever inference space we want. Ethical as well as scientific considerations may well come into play here. It has to be recognised that the specific pseudo $F$ ratio in every line of Table 7.3 actually corresponds to a test of a different null hypothesis. Each line presents a test with a different breadth of inference: the top line has the broadest scope, while the bottom line has the narrowest. Indeed, treating the control locations as fixed (bottom line in Table 7.3) might well give us more power, but it also severely limits our statistical inferences to just those particular locations in our study and to no others. It is clear that the inferences from the true study design, where we sampled 2 out of 8 control locations, apply to the whole of the Italian Mediterranean coastline that is spanned by those 8 potential locations, no less and no further. Hence, this excellent new tool permitting specification of finite factors, although it affords us greater flexibility and (potentially) power to test the terms of greatest interest, also comes with a special dose of responsibility. We need, as researchers, to articulate carefully the broader population, and hence the scale and extent of the inferences that we shall draw from any study.
¶Note that the $P$ value here is limited by the fact that there are only 3 possible permutations of the 3 locations across the 2 groups 'I' and 'C'. We do have the option to obtain a Monte Carlo approximation to the $P$ value when we run the PERMANOVA, by ticking the box in the PERMANOVA run dialog: ($\checkmark$ Do Monte Carlo tests). Assuming that each of the principal coordinate (PCO) axes representing the sample points in the chosen resemblance space (Bray-Curtis in this example) is asymptotically normal, the PERMANOVA pseudo $F$ statistic is distributed as a ratio of two linear forms in chi-square under a true null hypothesis, from which we can take random Monte Carlo draws (the linear forms are supplied by the eigenvalues of the PCO). The Monte Carlo approximate $P$ value for the test of the 'IvC' term for this example, treating 'Locations' as a random rather than a finite factor, is $P \approx 0.03$.
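The count of 3 possible permutations mentioned in this note can be checked by direct enumeration. The following is a minimal sketch, assuming (as the design implies) one 'I' location and two 'C' locations; the location labels are hypothetical placeholders, and the code is not part of the PERMANOVA software.

```python
from itertools import combinations

# Hypothetical labels for the 3 sampled locations (1 impact, 2 controls).
locations = ["Loc A", "Loc B", "Loc C"]

# Each distinct allocation picks which single location is labelled 'I';
# the remaining two are labelled 'C'.
allocations = [
    {loc: ("I" if loc in chosen else "C") for loc in locations}
    for chosen in combinations(locations, 1)
]

n_allocations = len(allocations)
# The observed allocation is itself one of these, so a permutation P value
# cannot fall below 1 / n_allocations.
min_p = 1 / n_allocations

for allocation in allocations:
    print(allocation)
print(f"{n_allocations} allocations, smallest achievable P = {min_p:.3f}")
```

With so few distinct allocations, the permutation $P$ value cannot drop much below one third, whatever the size of the observed pseudo $F$, which is close to the $P = 0.331$ reported in Table 7.2 and is why the Monte Carlo approximation is useful here.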
![11a._P+output_again[i].png](https://learninghub.primer-e.com/uploads/images/gallery/2025-12/scaled-1680-/11a-p-output-again-i.png)
![11b._P+output_Loc_random[ii].png](https://learninghub.primer-e.com/uploads/images/gallery/2025-12/scaled-1680-/11b-p-output-loc-random-ii.png)