1.34 Types of sums of squares (Birds from Borneo)

When the design is unbalanced, there are a number of different ways to partition the sums of squares. The choice among them depends to some extent on our hypotheses and on how we wish to treat the potential overlap among the terms. These different ways of doing the partitioning are called “Types” of sums of squares. More particularly, there are (at least) four types, known (perhaps unhelpfully) as Type I, II, III and IV. This terminology was initially coined by the developers of the SAS computer program (e.g., SAS Institute 1999) and is now in common usage. All of these types of SS produce identical results for balanced designs. Furthermore, Types II, III and IV will be identical for models with no interactions, and Types III and IV will be identical if all cells are filled (i.e. if all cells have n ≥ 1). Searle (1987) provides an excellent text regarding models and hypotheses for unbalanced designs, including a comparison of the types of SS. The conceptual differences among the types are shown schematically in Fig. 1.41.

Fig1.41.png

Fig. 1.41. Schematic Venn diagrams demonstrating the conceptual differences in Types of SS for a two-way crossed unbalanced design.
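To make the partitions in Fig. 1.41 concrete, here is a minimal univariate sketch in Python (NumPy), assuming a single response variable and ordinary least squares rather than PERMANOVA's distance-based, permutation framework. Each SS is computed as the increase in residual SS when a term is dropped from a larger model. The helper names (`sum_to_zero`, `interaction`, `rss`, `sums_of_squares`) and the simulated data are hypothetical. Type III SS are computed with sum-to-zero (effect) coding, which is assumed here so that the main effects can be tested meaningfully in a model that also contains the interaction columns.

```python
import numpy as np

def sum_to_zero(levels):
    """Sum-to-zero (effect) coding: one column per level except the last,
    and the last level is coded -1 in every column."""
    levels = np.asarray(levels)
    cats = np.unique(levels)
    X = np.zeros((len(levels), len(cats) - 1))
    for j, c in enumerate(cats[:-1]):
        X[levels == c, j] = 1.0
    X[levels == cats[-1], :] = -1.0
    return X

def interaction(XA, XB):
    """A x B interaction columns: every pairwise product of A and B columns."""
    return np.column_stack([XA[:, i] * XB[:, j]
                            for i in range(XA.shape[1])
                            for j in range(XB.shape[1])])

def rss(y, *blocks):
    """Residual SS after a least-squares fit of an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y)), *blocks])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

def sums_of_squares(y, a, b):
    """Type I (A fitted first), Type II and Type III SS for a two-way crossed design."""
    XA, XB = sum_to_zero(a), sum_to_zero(b)
    XAB = interaction(XA, XB)
    r_full = rss(y, XA, XB, XAB)
    r_main = rss(y, XA, XB)
    r_a, r_b, r_null = rss(y, XA), rss(y, XB), rss(y)
    ss_ab = r_main - r_full                       # AB | A, B under every Type
    return {
        "I":   {"A": r_null - r_a, "B": r_a - r_main, "AB": ss_ab},
        "II":  {"A": r_b - r_main, "B": r_a - r_main, "AB": ss_ab},
        "III": {"A": rss(y, XB, XAB) - r_full,    # A | B, AB
                "B": rss(y, XA, XAB) - r_full,    # B | A, AB
                "AB": ss_ab},
    }

# Hypothetical data: a 3 x 3 layout with n = 2 per cell (balanced).
rng = np.random.default_rng(1)
a = np.repeat(["P", "L89", "L93"], 6)
b = np.tile(np.repeat(["L", "M", "U"], 2), 3)
y = rng.normal(size=18)
ss_bal = sums_of_squares(y, a, b)

# Drop three observations to unbalance the layout (every cell still has n >= 1).
keep = np.ones(18, dtype=bool)
keep[[0, 3, 11]] = False
ss_unbal = sums_of_squares(y[keep], a[keep], b[keep])
```

With the balanced layout all three Types agree term by term. After a few observations are dropped, the Type I and Type II SS for factor A no longer coincide, while the interaction SS remains the same under every Type.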

The PERMANOVA dialog box offers the user the option of using Type I, II or III SS. The default in PERMANOVA is Type III SS. This is primarily because most journal editors (at least those of most ecological journals) now expect Type III SS to be used for unbalanced designs, as these tend to be the most conservative of the three. However, there is no particular reason not to use the other types, especially if one of them is better suited to the particular hypotheses of interest. For example, as already noted, a sequential analysis (Type I SS) would be quite sensible for a hierarchical nested design. Indeed, many statisticians would consider Type I SS to be the most sensible general approach, as no components of variation are left out (i.e. there are no “bits missing”). Fitting the terms in different orders using Type I SS also clarifies the relative sizes of the overlapping regions. In most cases, provided the degree of imbalance in the design is modest (due, for example, to just a few missing observations here and there), the overall conclusions of the study will be little affected by the choice of Type of SS.
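The “no bits missing” property of Type I SS, and its dependence on fitting order, can be checked with a similar sketch. This is again a hypothetical univariate illustration using NumPy and simulated data, not PERMANOVA itself; the helper names (`dummies`, `rss`, `type_one`) are assumptions for illustration. Whichever factor is fitted first, the sequential SS add up to the SS explained by the full model, but the SS attributed to an individual factor changes with the order.

```python
import numpy as np

def dummies(levels):
    """Treatment (indicator) coding: one 0/1 column per level except the first."""
    levels = np.asarray(levels)
    return np.column_stack([(levels == c).astype(float)
                            for c in np.unique(levels)[1:]])

def rss(y, *blocks):
    """Residual SS after a least-squares fit of an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y)), *blocks])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

def type_one(y, terms):
    """Sequential (Type I) SS: each term is adjusted only for the terms fitted before it."""
    out, fitted, prev = {}, [], rss(y)
    for name, cols in terms:
        fitted.append(cols)
        cur = rss(y, *fitted)
        out[name] = prev - cur
        prev = cur
    return out

# Hypothetical unbalanced 3 x 3 layout (every cell still has n >= 1).
rng = np.random.default_rng(7)
a = np.repeat(["P", "L89", "L93"], 6)
b = np.tile(np.repeat(["L", "M", "U"], 2), 3)
keep = np.ones(18, dtype=bool)
keep[[0, 3, 4, 11]] = False
a, b = a[keep], b[keep]
y = rng.normal(size=a.size) + 2.0 * (a == "P")   # simulated response with an A effect

XA, XB = dummies(a), dummies(b)
XAB = np.column_stack([XA[:, i] * XB[:, j] for i in range(2) for j in range(2)])
a_first = type_one(y, [("A", XA), ("B", XB), ("AB", XAB)])
b_first = type_one(y, [("B", XB), ("A", XA), ("AB", XAB)])
model_ss = rss(y) - rss(y, XA, XB, XAB)           # SS explained by the full model
```

Here `a_first["A"]` is SS(A) while `b_first["A"]` is SS(A | B); the difference between them is the overlap between the two factors that arises from the imbalance.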
 

Fig1.42.png

Fig. 1.42. Sample sizes per cell in the unbalanced two-way layout for the Borneo birds example.
 

A case in point is provided by an analysis of bird assemblages from Borneo, Indonesia, sampled according to a two-way unbalanced design, as described by Cleary, Genner, Boyle et al. (2005). A total of N = 37 sites were sampled within the Kayu Mas logging concession, close to Sangai, Central Kalimantan. The sites were cross-classified according to two factors: Logging (fixed with a = 3 levels: unlogged primary forest, forest logged in 1993/94 and forest logged in 1989/90) and Slope (fixed with b = 3 levels: lower, middle and upper). There were different numbers of sites (n) within each of the a × b = 3 × 3 = 9 cells in the design, as shown in Fig. 1.42. Within each site, birds were sampled by spot-mapping (using calls and visual observations) along two parallel 300 m linear transects placed 50 m apart. A total of p = 177 bird species were recorded, and the data are in the file born.pri, in the ‘BorneoBirds’ folder of the ‘Examples add-on’ directory.
 

Fig1.43.png

Fig. 1.43. MDS ordination of bird assemblages from Borneo in all combinations of logging (P = primary forest, L89 = logged in 1989/90, L93 = logged in 1993/94) and slope (L = lower, M = middle and U = upper).

An MDS plot of the bird communities, based on Bray-Curtis similarities calculated from log(x+1)-transformed data (Cleary, Genner, Boyle et al. 2005), shows a clear effect of logging and suggests some effects of slope as well, although these are less clear (Fig. 1.43). For illustration, four different analyses of the data were done using PERMANOVA (Fig. 1.44). First, an analysis using Type I SS was done, fitting the factor “Logging” first. Next, an analysis using Type I SS was done again, but this time fitting the factor “Slope” first. Analyses were then done using Type II and Type III SS, in turn. Note that the Type II SS for either factor corresponds to the SS obtained when that factor is fitted second in a sequential (Type I) analysis. The Type III SS for the main effects differ from all the others; the SS for the interaction term, however, is the same in all four analyses, because in every case it was conditioned upon both main effects. Note also that the multipliers on individual components in each EMS differ among the Types of SS, which in turn means that the estimates of the components of variation also differ (Fig. 1.44).
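PERMANOVA obtains its SS directly from the dissimilarity matrix. The sketch below assumes the usual distance-based formulation: Gower's centred matrix G = C(−D²/2)C, with C = I − 11′/N, is computed from the Bray-Curtis dissimilarities D, and the SS for a term is tr[(H_larger − H_reduced) G], where each H is the projection (“hat”) matrix of a linear model fitted to the design. The community data are simulated (13 sites, 12 species) and are not the Borneo data; the helper names are hypothetical, and the permutation p-values that PERMANOVA also reports are not shown. The Type II SS for Logging is obtained as described above, by fitting Logging after Slope.

```python
import numpy as np

def bray_curtis(X):
    """Bray-Curtis dissimilarity (0-1 scale) between every pair of rows of X."""
    n = X.shape[0]
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            den = (X[i] + X[j]).sum()
            D[i, j] = D[j, i] = np.abs(X[i] - X[j]).sum() / den if den > 0 else 0.0
    return D

def gower_centred(D):
    """Gower's centred matrix G = C(-D**2 / 2)C, with C = I - 11'/n."""
    n = D.shape[0]
    C = np.eye(n) - np.ones((n, n)) / n
    return C @ (-0.5 * D ** 2) @ C

def hat(n, *blocks):
    """Orthogonal projector onto the span of an intercept plus the given columns."""
    X = np.column_stack([np.ones(n), *blocks])
    return X @ np.linalg.pinv(X)

def dummies(levels):
    """Treatment (indicator) coding: one 0/1 column per level except the first."""
    levels = np.asarray(levels)
    return np.column_stack([(levels == c).astype(float) for c in np.unique(levels)[1:]])

def term_ss(G, H_reduced, H_larger):
    """SS for the columns in the larger model only: tr[(H_larger - H_reduced) G]."""
    return float(np.trace((H_larger - H_reduced) @ G))

# Simulated community data (NOT the Borneo data); abundances depend on logging only.
rng = np.random.default_rng(42)
logging = np.repeat(["P", "L89", "L93"], [5, 4, 4])
slope = np.array(["L", "M", "U", "L", "M", "L", "M", "U", "U", "L", "L", "M", "U"])
means = rng.gamma(2.0, 2.0, size=(3, 12))
row = {"P": 0, "L89": 1, "L93": 2}
Y = rng.poisson(means[[row[g] for g in logging]])

D = bray_curtis(np.log1p(Y))                 # log(x+1) transform, then Bray-Curtis
G = gower_centred(D)
n = len(logging)
XA, XB = dummies(logging), dummies(slope)
XAB = np.column_stack([XA[:, i] * XB[:, j] for i in range(2) for j in range(2)])
H0, HA, HB = hat(n), hat(n, XA), hat(n, XB)
H_main, Hfull = hat(n, XA, XB), hat(n, XA, XB, XAB)

ss_total = float(np.trace(G))
ss_res = float(np.trace((np.eye(n) - Hfull) @ G))
ss_log_first = term_ss(G, H0, HA)            # Type I SS, Logging fitted first
ss_log_second = term_ss(G, HB, H_main)       # Logging | Slope = Type II SS for Logging
df_log, df_res = 2, n - 9                    # 9 filled cells -> 9 parameters
pseudo_F_log = (ss_log_second / df_log) / (ss_res / df_res)
```

As a check, tr(G) equals the total SS, i.e. 1/N times the sum of squared dissimilarities over all pairs of sites, and the sequential SS together with the residual SS add back up to this total.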

Fig1.44.png

Fig. 1.44. PERMANOVA analyses of the two-way crossed unbalanced design for the example of Borneo birds using different Types of SS, as indicated.

In summary, the choice of Type of SS in the analysis of an unbalanced design affects: (i) the SS themselves (and thus the values of MS and pseudo-F) for individual terms in the model; (ii) the EMS for each term and the multipliers of individual components of variation; and (iii) the estimated sizes of the components of variation. Despite all of this, for the present example, the same general conclusions would be obtained regardless of which Type of SS we had decided to use (Fig. 1.44). There are significant differences among bird assemblages in forests having different logging histories, and the slope of the site also has a significant effect. These factors did not interact with one another, and logging effects were much larger than slope effects (see Fig. 1.43 and also the estimated components of variation). The extent to which different Types of SS give comparable results depends on how unbalanced the design is: greater imbalance will generally lead to larger overlapping regions and thus potentially greater discrepancies. Perhaps the most important point is to recognise that different choices of Type of SS in unbalanced designs correspond conceptually to different underlying hypotheses (see Table 1.5, Fig. 1.41 and Searle 1987).

ScreenshotTable1.5.png

Table 1.5. Tests done using different types of SS in a two-way crossed unbalanced design (cf. Fig. 1.41). The vertical line is to be read as “given”, thus “A | B” should be read as “factor A given factor B”. A comma should be read as “and”.
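Writing RSS(·) for the residual SS of a model containing an intercept plus the listed terms, the “given” notation can be spelled out as model comparisons. This is a sketch of the standard definitions, with SS(A) = RSS(intercept only) − RSS(A); we assume it matches the entries of Table 1.5 for Types I to III. The Type III main-effect tests presume sum-to-zero constraints on the A × B parameters.

```latex
\begin{aligned}
SS(A \mid B) &= RSS(B) - RSS(A, B) \\[4pt]
\text{Type I } (A \text{ first}):&\quad SS(A),\; SS(B \mid A),\; SS(A{\times}B \mid A, B) \\
\text{Type II}:&\quad SS(A \mid B),\; SS(B \mid A),\; SS(A{\times}B \mid A, B) \\
\text{Type III}:&\quad SS(A \mid B, A{\times}B),\; SS(B \mid A, A{\times}B),\; SS(A{\times}B \mid A, B)
\end{aligned}
```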


Revision #7
Created 9 August 2022 08:56:37 by Arden
Updated 4 September 2022 10:08:54 by Arden