1.33 Unbalanced designs

Virtually all of the examples thus far have involved the analysis of what are known as balanced experimental designs. In a balanced design, there is equal replication within each level of a factor (or within each cell). Even designs that lack replication are balanced, because every cell contains exactly n = 1 observation. However, despite a researcher’s best efforts, a design sometimes ends up being unbalanced, with unequal numbers of replicate samples within each factor level (or within each cell).

For the one-way case (such as the Ekofisk oil-field data seen in the section One-way example), an unbalanced design poses no real problem. We can perform PERMANOVA in the usual way, with the usual partitioning of sums of squared dissimilarities. Only two consequences of unequal replication are apparent for one-way designs. First, the multiplier on the EMS for the factor of interest is no longer necessarily a whole number, as it is in the balanced case. By scrolling down the PERMANOVA results window produced in the analysis of the Ekofisk oil-field data (shown only partially in Fig. 1.10 above), we can see that the multiplier for the ‘S(Di)’ component of variation in the EMS for Distance is 9.57, whereas for balanced designs the multipliers on every component in every EMS are whole numbers (e.g., see the EMS in Figs. 1.15, 1.23 or 1.29, all of which are balanced designs). For one-way designs, however, this does not change the use of the residual MS as the denominator of the pseudo-F ratio for the test of “No differences among the groups”.
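As a rough illustration of where a non-integer multiplier such as 9.57 can come from, the sketch below uses the standard univariate one-way formula for unequal group sizes, $n_0 = (N - \sum n_i^2 / N)/(a - 1)$. Two things here are assumptions rather than statements from this manual: that this formula is the one behind the reported multiplier, and the group sizes (6, 10, 12 and 11), which are used for illustration only. The function name `ems_multiplier` is hypothetical.

```python
def ems_multiplier(group_sizes):
    """Multiplier on the factor's component of variation in a one-way EMS.

    Uses the classical unbalanced one-way ANOVA formula (an assumption here):
    n0 = (N - sum(n_i^2) / N) / (a - 1)
    """
    a = len(group_sizes)   # number of groups (levels of the factor)
    N = sum(group_sizes)   # total number of observation units
    return (N - sum(n * n for n in group_sizes) / N) / (a - 1)


# Balanced design: the multiplier is simply the common group size.
print(ems_multiplier([10, 10, 10, 10]))           # 10.0

# Unbalanced design with assumed, illustrative group sizes.
print(round(ems_multiplier([6, 10, 12, 11]), 2))  # 9.57
```

With equal group sizes the formula collapses to the common n, which is why the multipliers are whole numbers in balanced designs.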

The second consequence of an unbalanced design is apparent when we consider the permutations. We still randomly allocate observation units across the groups (levels of the factor), while keeping the existing (unequal) group sizes fixed. Each individual unit no longer has an equal chance of falling into any particular group, but instead has a greater chance of falling into a group that has a larger sample size. However, we can still proceed easily on the basis that all possible re-arrangements of the samples by reference to the existing (albeit unbalanced) experimental design are equally likely.
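The permutation scheme just described can be sketched in a few lines. This is a minimal illustration, not the PERMANOVA+ implementation: the labels, the group sizes (2 and 6) and the helper name `permute_labels` are all hypothetical. Shuffling the label vector keeps every group at its original size, and a given unit lands in the larger group in roughly a proportion $n_B / N$ of the permutations.

```python
import random
from collections import Counter


def permute_labels(labels, rng):
    """Return a random re-allocation of units to groups, keeping group sizes fixed."""
    shuffled = list(labels)
    rng.shuffle(shuffled)
    return shuffled


# Hypothetical unbalanced one-way design: group A has 2 units, group B has 6.
labels = ["A"] * 2 + ["B"] * 6
rng = random.Random(1)  # seeded only so that the sketch is repeatable

n_perm = 10000
hits = 0
for _ in range(n_perm):
    perm = permute_labels(labels, rng)
    # Group sizes never change under permutation of the labels.
    assert Counter(perm) == Counter(labels)
    # Count how often the first unit is allocated to the larger group.
    hits += perm[0] == "B"

print(hits / n_perm)  # close to 6/8 = 0.75, not 0.5
```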

The more important issues facing experimenters with unbalanced designs occur when there is more than one factor in the design. In that case, the consequences are: (i) the multipliers on individual components of variation in the EMS are not necessarily whole numbers, and these multipliers can differ for the same component when it appears in the EMS of different terms in the model; (ii) the main effects of factors and the interaction terms are no longer independent of one another. The latter is perhaps the most important conceptual issue in the analysis of unbalanced, as opposed to balanced, designs. It means that, as in multiple regression (see chapter 4), the order in which we choose to fit the terms matters.

Fig. 1.40. Venn diagrams showing the difference between (a) a balanced and (b) an unbalanced two-way crossed design.

Begin by considering a two-way crossed ANOVA design, with factors A, B and their interaction A×B. If the design is balanced, then the individual amounts of variation in the response data cloud explained by each of the terms in the model are completely independent of one another. This can be visualised using a Venn diagram (Fig. 1.40), where the total variation in the system ($SS _T$) is represented by a large circle and the residual variation, $SS _ {Res}$, is the area left over after removing all of the portions explained by the model. For a balanced design, the individual terms in the model explain separate independent portions of the total variation (Fig. 1.40a), whereas for an unbalanced design, there will be some overlap among the terms regarding the individual portions of variation that they explain (Fig. 1.40b).
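The overlap among terms can be seen numerically in a small univariate sketch (ordinary least squares on a single made-up response, not PERMANOVA itself). Everything in it is an assumption for illustration: the two factors each have two levels coded 0/1, only main effects are fitted (no A×B term), the response values are invented, and the helper names are hypothetical. The sketch compares the sum of squares explained by A when fitted alone with the sum of squares explained by A when fitted after B.

```python
def centre(v):
    """Subtract the mean from every element of v."""
    m = sum(v) / len(v)
    return [x - m for x in v]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def ss_alone(x, y):
    """SS explained by a two-level factor x (coded 0/1) fitted on its own."""
    xc, yc = centre(x), centre(y)
    return dot(xc, yc) ** 2 / dot(xc, xc)


def ss_after(x, z, y):
    """SS explained by factor x once factor z has already been fitted."""
    xc, zc, yc = centre(x), centre(z), centre(y)
    coef = dot(xc, zc) / dot(zc, zc)
    # Part of x that is not already accounted for by z.
    xr = [xi - coef * zi for xi, zi in zip(xc, zc)]
    return dot(xr, yc) ** 2 / dot(xr, xr)


# Balanced 2 x 2 design: two replicates in every cell.
a_bal = [0, 0, 0, 0, 1, 1, 1, 1]
b_bal = [0, 0, 1, 1, 0, 0, 1, 1]
# Unbalanced 2 x 2 design: cell sizes 3, 1, 1, 3.
a_unb = [0, 0, 0, 0, 1, 1, 1, 1]
b_unb = [0, 0, 0, 1, 0, 1, 1, 1]


def response(a, b):
    """Made-up additive response: effect of 2 for A and 3 for B."""
    return [2 * ai + 3 * bi for ai, bi in zip(a, b)]


y_bal = response(a_bal, b_bal)
y_unb = response(a_unb, b_unb)

print(ss_alone(a_bal, y_bal), ss_after(a_bal, b_bal, y_bal))  # 8.0 8.0
print(ss_alone(a_unb, y_unb), ss_after(a_unb, b_unb, y_unb))  # 24.5 6.0
```

In the balanced case the two values agree, so the order of fitting is irrelevant. In the unbalanced case, A fitted first absorbs part of the variation that B could also explain, which corresponds to the overlapping regions in Fig. 1.40b.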
