Other BEST applications
Another situation employing rank correlation ($\rho$) between two resemblance matrices is the BEST (Bio-Env) routine of Section 13, where the biological similarity matrix (‘response’) describes the among-sample relationships of the full community and the secondary data sheet (‘explanation’) is of environmental variables. Subsets of the latter variables are taken, among-sample distances are computed for each subset and correlated with the biotic similarities, and the search is for the variable set that maximises $\rho$. However, nothing in the construction of BEST limits its use to species similarities and environmental matrices. Either or both of the two sheets could be from biotic or abiotic samples; the user need only specify a resemblance measure that is relevant to the type of data in the secondary data matrix. A number of possibilities can be envisaged. In what might be termed Env-Bio, subsets of species could be selected which best characterise the environmental gradient defined by a specified set of abiotic variables, or which best match a simple model structure, e.g. the seriation distance matrix for n equally-spaced points on a line, as in the Phuket corals transect example earlier in this section (“which species define the serial gradient along the transect?”). Or, for samples which have an a priori (unordered) group structure, a relevant model matrix of distances was seen to consist simply of 0’s (within groups) and 1’s (among groups). An Env-Bio analysis in that case would search for subsets of species which, in combination, best split the samples into those pre-defined groups: a rather different form of SIMPER analysis (Section 10), acting on all the groups at once rather than on selected pairs. It is equivalent to optimising the ANOSIM R statistic, PRIMER’s preferred measure of group separation in high-d space.
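The Env-Bio idea can be illustrated outside PRIMER with a minimal Python sketch. It is not the BEST routine itself: it assumes a brute-force search over small species subsets, Bray-Curtis dissimilarity on untransformed counts, and Spearman $\rho$ as the matching statistic. The function names (`seriation_model`, `group_model`, `best_species_subset`) and the six-sample data matrix are hypothetical, invented purely for illustration.

```python
from itertools import combinations

import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr


def seriation_model(n):
    """Condensed model distances |i - j| for n equally spaced points on a line."""
    positions = np.arange(n, dtype=float).reshape(-1, 1)
    return pdist(positions, metric="cityblock")


def group_model(labels):
    """Condensed model distances: 0 within a group, 1 between groups."""
    labels = np.asarray(labels)
    i, j = np.triu_indices(len(labels), k=1)  # same pair order as pdist
    return (labels[i] != labels[j]).astype(float)


def best_species_subset(abundance, model, max_size=2):
    """Exhaustive search for the species subset whose Bray-Curtis
    dissimilarities have the highest Spearman rho with `model`."""
    abundance = np.asarray(abundance, dtype=float)
    best_rho, best_cols = -np.inf, ()
    for size in range(1, max_size + 1):
        for cols in combinations(range(abundance.shape[1]), size):
            sub = abundance[:, list(cols)]
            if (sub.sum(axis=1) == 0).any():
                # Skip subsets that leave a sample empty: Bray-Curtis is
                # undefined between two empty samples.
                continue
            rho, _ = spearmanr(pdist(sub, metric="braycurtis"), model)
            if not np.isnan(rho) and rho > best_rho:
                best_rho, best_cols = rho, cols
    return best_rho, best_cols


# Hypothetical transect: 6 samples (rows) x 4 species (columns).
abundance = np.array([
    [12,  2, 5, 3],
    [10,  4, 1, 9],
    [ 8,  6, 6, 2],
    [ 6,  8, 2, 8],
    [ 4, 10, 7, 3],
    [ 2, 12, 1, 9],
])

rho, cols = best_species_subset(abundance, seriation_model(6))
print("best species columns:", cols, "rho:", round(rho, 3))
```

Passing `group_model(labels)` in place of `seriation_model(6)` switches the same search to the unordered a priori groups case. An exhaustive search is only feasible for small numbers of species; with many species a stepwise search would be needed instead.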
[We saw ANOSIM R used in the same way earlier, Sections 6 & 13, in searching for optimal subdivisions of samples in divisive clustering, though there the set of species was fixed and the sample divisions selected, and here the sample groups are fixed and the set of species is being searched over. It should be stressed again that having selected an optimal species set, it is totally invalid to re-test the groups with a simple ANOSIM test! The strong selection bias effect is allowed for, however, in the global BEST test of Section 13, so that when sample groups are fixed a priori the BEST test could be used to justify interpreting the selected optimal species subset as ‘better than chance’.]
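The selection bias point can be made concrete with a small permutation sketch. It assumes that a test of this kind works by randomly permuting the sample labels and repeating the entire subset search for each permutation, so that the null distribution is of the optimised $\rho$ and not of $\rho$ for a fixed species set. This is an illustration of that logic only, not PRIMER's implementation; the function names, the data and the choice of 99 permutations are all hypothetical.

```python
from itertools import combinations

import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr


def group_model(labels):
    """Condensed model distances: 0 within a group, 1 between groups."""
    labels = np.asarray(labels)
    i, j = np.triu_indices(len(labels), k=1)  # same pair order as pdist
    return (labels[i] != labels[j]).astype(float)


def max_rho(abundance, model, max_size=2):
    """Largest Spearman rho over all species subsets up to max_size."""
    best = -np.inf
    for size in range(1, max_size + 1):
        for cols in combinations(range(abundance.shape[1]), size):
            sub = abundance[:, list(cols)]
            if (sub.sum(axis=1) == 0).any():
                continue  # Bray-Curtis undefined between two empty samples
            rho, _ = spearmanr(pdist(sub, metric="braycurtis"), model)
            if not np.isnan(rho):
                best = max(best, rho)
    return best


def global_selection_test(abundance, labels, n_perm=99, max_size=2, seed=0):
    """Permutation test that repeats the whole search for every permutation,
    so the null distribution already includes the selection effect."""
    abundance = np.asarray(abundance, dtype=float)
    rng = np.random.default_rng(seed)
    observed = max_rho(abundance, group_model(labels), max_size)
    null = np.array([
        max_rho(abundance, group_model(rng.permutation(labels)), max_size)
        for _ in range(n_perm)
    ])
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, null, p_value


# Hypothetical data: 8 samples in two a priori groups, 4 species.
labels = ["A"] * 4 + ["B"] * 4
abundance = np.array([
    [5, 0, 3, 2], [5, 0, 1, 7], [5, 0, 4, 1], [5, 0, 1, 8],
    [0, 5, 5, 2], [0, 5, 9, 8], [0, 5, 2, 1], [0, 5, 6, 8],
])

observed, null, p_value = global_selection_test(abundance, labels)
print("optimised rho:", round(observed, 3), "p:", p_value)
```

The contrast with a naive re-test is that the naive approach would fix the selected species and permute labels only for that set, ignoring the fact that the set was chosen precisely because it separates the groups well.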
A further generalisation would allow an ordering of the groups, e.g. for the seriation with replication model matrix described earlier in this section. There the idea would be to select the subset of species which best characterises an ordered group structure of community change, i.e. which leads both to good separation of the groups from each other and to their arrangement in the pre-defined order (e.g. as in the distance groups for the Ekofisk oil-field study). A similar use of variable selection to best match a priori ordered groups was given by Valesini et al (2003, Est Coast Shelf Sci 57: 163-177), under what might be termed an Env-Env scenario, since the variables were beach morphology characteristics and thus required a distance-based resemblance calculation, such as normalised Euclidean. Other natural applications of this type might include the selection of biomarkers to best display a given impact gradient determined by tissue chemistry, the selection of morphometric measurements to best characterise known species or sub-species categories (unordered groups or ordered clines) etc. These would again be supplemented by the global BEST test, to allow for the selection bias when testing overall significance of the ‘explanation’ (but see the important reservations expressed in Chapters 11 and 12 of CiMC on the extent to which correlative-type links of species to environmental variables, biomarkers to tissue contaminants etc, are ever demonstrated to be causal).
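For the ordered groups (Env-Env) case, the two ingredients are a model matrix built from group ranks and a normalised Euclidean distance on the candidate variables. The sketch below shows both under stated assumptions: the model distance between two samples is taken to be the absolute difference in their group ranks (0 within a group), normalisation is taken to mean centring each variable and dividing by its standard deviation, and the function names and data are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr


def ordered_group_model(group_ranks):
    """Condensed model distances |rank_i - rank_j|: 0 within a group,
    increasing with the separation of the groups in the a priori order."""
    ranks = np.asarray(group_ranks, dtype=float).reshape(-1, 1)
    return pdist(ranks, metric="cityblock")


def normalised_euclidean(variables):
    """Euclidean distance after centring each variable and scaling it to
    unit standard deviation (variables must not be constant)."""
    variables = np.asarray(variables, dtype=float)
    z = (variables - variables.mean(axis=0)) / variables.std(axis=0, ddof=1)
    return pdist(z, metric="euclidean")


def matching_rho(variables, group_ranks, cols):
    """Spearman rho between the variable-subset distances and the model."""
    variables = np.asarray(variables, dtype=float)
    distances = normalised_euclidean(variables[:, list(cols)])
    rho, _ = spearmanr(distances, ordered_group_model(group_ranks))
    return rho


# Hypothetical data: 6 samples in 3 ordered groups, 2 measured variables.
group_ranks = [1, 1, 2, 2, 3, 3]
variables = np.array([
    [1.0, 5.0],
    [1.2, 1.0],
    [2.0, 4.0],
    [2.1, 2.0],
    [3.0, 5.0],
    [3.3, 1.0],
])

for cols in [(0,), (1,), (0, 1)]:
    print(cols, round(matching_rho(variables, group_ranks, cols), 3))
```

Here the first variable tracks the group order and the second does not, so the subset containing only the first variable should match the model better. A full analysis would embed `matching_rho` in a search over subsets and assess the optimum with a permutation test that repeats the search.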