Chapter 3: Principal coordinates analysis (PCO)
Key references (Method): Torgerson (1958), Gower (1966)
 3.1 General description
 3.2 Rationale
 3.3 Mechanics of PCO
 3.4 Example: Victorian avifauna
 3.5 Negative eigenvalues
 3.6 Vector overlays
 3.7 PCO versus PCA (Clyde environmental data)
 3.8 Distances among centroids (Okura macrofauna)
 3.9 PCO versus MDS
3.1 General description
PCO is a routine for performing principal coordinates analysis (Gower 1966) on the basis of a (symmetric) resemblance matrix. PCO is also sometimes referred to as classical scaling, with origins in the psychometric literature (Torgerson 1958). PCO places the samples onto Euclidean axes (i.e., so they can be drawn) using only a matrix of inter-point dissimilarities (e.g., Legendre & Legendre 1998). Ordination using non-metric multidimensional scaling (MDS) focuses on preserving only the rank order of dissimilarities for an a priori chosen number of dimensions, whereas ordination using PCO (like PCA) is a projection onto axes, but (unlike PCA) in the space of the dissimilarity measure chosen. The user chooses the number of PCO axes to include in the output, but generally only the first two or three axes are drawn in ordination plots. A further utility provided in the PERMANOVA+ add-on that uses PCO axes is the option to calculate distances among centroids for levels of a chosen factor. This allows visualisation of centroids from an experimental design in the space of the resemblance measure chosen.
3.2 Rationale
It is difficult to visualise patterns in the responses of whole sets of variables simultaneously. Each variable can be considered a dimension, with its own story to tell in terms of its mean, variance, skewness, etc. For most sets of multivariate data, there are also correlations among the variables. Ordination is simply the ordering of samples in Euclidean space (e.g., on a page) in some way, using the information provided by the variables. The primary goal of ordination methods is usually to reduce the dimensionality of the data cloud in order to allow the most salient patterns and structures to be observed.
Different ordination methods have different criteria by which the picture in reduced space is drawn (see chapter 9 of Legendre & Legendre 1998 for a more complete discussion). For example:

Non-metric multidimensional scaling (MDS) preserves the rank order of the inter-point dissimilarities (for whatever resemblance measure has been chosen) as well as possible within the constraints of a small number of dimensions (usually just two or three). The adequacy of the plot is ascertained by virtue of how well the inter-point distances in the reduced-dimension, Euclidean ordination plot reflect the rank orders of the underlying dissimilarities (see, for example, chapter 5 in Clarke & Warwick 2001).

Principal components analysis (PCA) is a projection of the points (perpendicularly) onto axes that minimise residual variation in Euclidean space. The first principal component axis is defined as the straight line drawn through the cloud of points such that the variance of sample points, when projected perpendicularly onto the axis, is maximised (see, for example, chapter 4 of Clarke & Warwick 2001).

Correspondence analysis (CA) is also a projection of the points onto axes that minimise residual variation, but this is done in the space defined by the chi-squared distances among points (ter Braak 1987, Minchin 1987, Legendre & Legendre 1998);

Principal coordinates analysis (PCO) is like MDS in that it is very flexible – it can be based on any (symmetric) resemblance matrix. However, it is also like a PCA, in that it is a projection of the points onto axes that minimise residual variation in the space of the resemblance measure chosen.
PCO performed on a resemblance matrix of Euclidean distances reproduces the pattern and results that would be obtained by a PCA on the original variables. Similarly, PCO performed on a resemblance matrix of chi-squared distances reproduces the pattern that would be obtained by a CA on the original variables. Thus, PCO is a more general procedure than either PCA or CA, yielding a projection in the space indicated by the resemblance measure chosen.
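This equivalence is easy to verify numerically. The following sketch (in Python with NumPy; the small data matrix is hypothetical, chosen purely for illustration) computes PCA scores directly from the centred data and PCO scores from the matrix of Euclidean distances, and checks that the axes agree up to reflection (recall that ordination axes are defined only up to their sign):

```python
import numpy as np

# Hypothetical data matrix: 4 samples x 3 variables (illustrative values only).
X = np.array([[2., 5., 1.],
              [4., 1., 3.],
              [6., 2., 2.],
              [1., 4., 5.]])
n = X.shape[0]

# PCA scores: centre the variables, then project onto the principal axes.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
pca_scores = Xc @ Vt.T

# PCO on Euclidean distances among the same samples.
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
A = -0.5 * D**2
J = np.eye(n) - np.ones((n, n)) / n     # centring matrix
G = J @ A @ J                           # Gower's centred matrix
evals, evecs = np.linalg.eigh(G)
order = np.argsort(evals)[::-1]         # largest eigenvalue first
pco_scores = evecs[:, order] * np.sqrt(np.abs(evals[order]))

# The first min(n-1, p) PCO axes match the PCA scores up to sign.
for k in range(3):
    assert np.allclose(np.abs(pco_scores[:, k]), np.abs(pca_scores[:, k]))
```

The agreement holds because, for Euclidean distances, Gower's centred matrix G equals Xc Xc', whose eigenvectors (scaled by the square roots of the eigenvalues) are exactly the principal component scores.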
Two other features of PCO serve to highlight its affinity with PCA (as a projection). First, the scales of the resulting PCO axes are interpretable in the units of the original resemblance measure. Although the distances between samples in a plot of few dimensions will underestimate the distances in the full-dimensional space, they are, nevertheless, estimated in the same units as the original resemblance measure, but as projected along the PCO axes. Thus, unlike MDS axes, the PCO axes refer to a non-arbitrary quantitative scale defined by the chosen resemblance measure. Second, PCO axes (like PCA axes) are centred at zero and are only defined up to their sign. So, any PCO axis can be reflected (by changing the signs of the sample scores), if convenient^{61}.
For many multivariate applications (especially for species abundance data), MDS is usually the most appropriate ordination method to use for visualising patterns in a small number of dimensions, because it is genuinely the most flexible and robust approach available (Kenkel & Orloci 1986, Minchin 1987). There is a clear relationship between the philosophy underlying the ANOSIM testing procedure and non-metric MDS in PRIMER: both are non-parametric approaches that work on the ranks of resemblance measures alone. However, when using PERMANOVA to perform a partitioning for more complex designs, it is the actual dissimilarities (and not just their ranks) that are of interest and which are being modelled directly (e.g., see the section PERMANOVA versus ANOSIM in chapter 1). Therefore, we may wish to use an ordination procedure that is a little more consistent with this philosophy, and PCO can do this by providing a direct projection of the points in the space defined by the actual dissimilarities themselves. Although MDS will generally provide a better low-dimensional visualisation of what is happening in the multi-dimensional cloud, PCO can in some cases provide additional insights regarding the original dissimilarities that might be lost in non-metric MDS, due to ranking. In addition (as stated in chapter 2), the Euclidean distance between two points in the space defined by the full set of PCO axes (all together) is equivalent to the original dissimilarity between those two points using the chosen resemblance measure on the original variables^{62}. So another main use of PCO axes is to obtain distances among centroids, which can then form the basis of further analyses when dealing with more complex and/or large multi-factorial datasets.
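The centroid idea can be sketched as follows (in Python with NumPy). This is an illustration only, not the PERMANOVA+ implementation: it assumes all eigenvalues are non-negative (i.e., the dissimilarities are Euclidean-embeddable), whereas negative eigenvalues require the separate treatment noted in footnote 62; the function name and the tolerance for dropping zero axes are our own choices.

```python
import numpy as np

def centroid_distances(D, groups):
    """Distances among group centroids in the full PCO space of D.

    Sketch assuming all eigenvalues of Gower's centred matrix are
    non-negative; negative eigenvalues would need separate handling.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    G = J @ (-0.5 * D**2) @ J                  # Gower's centred matrix
    evals, evecs = np.linalg.eigh(G)
    keep = evals > 1e-10                       # drop zero (and tiny) axes
    Q = evecs[:, keep] * np.sqrt(evals[keep])  # PCO scores
    levels = sorted(set(groups))
    cents = np.array([Q[np.array(groups) == g].mean(axis=0) for g in levels])
    dists = np.sqrt(((cents[:, None, :] - cents[None, :, :]) ** 2).sum(-1))
    return levels, dists
```

Because the full set of PCO axes preserves the original (Euclidean-embeddable) dissimilarities, the Euclidean distances among these centroid coordinates are distances among centroids "in the space of the resemblance measure", not in the raw variable space.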
^{61} This is done within PRIMER in the same manner as for either an MDS or PCA plot, by choosing Graph > Flip X or Graph > Flip Y in the resulting configuration.
^{62} With appropriate separate treatment of the axes corresponding to the positive and negative eigenvalues, if any, see McArdle & Anderson (2001) and Anderson (2006) and the section on Negative eigenvalues for details.
3.3 Mechanics of PCO
To construct axes that maximise fitted variation (or minimise residual variation) in the cloud of points defined by the resemblance measure chosen, the calculation of eigenvalues (sometimes called “latent roots”) and their associated eigenvectors is required. It is best to hold on to the conceptual description of what the PCO is doing and what it produces, rather than to get too bogged down in the matrix algebra required for its computation. More complete descriptions are available elsewhere (e.g., Gower 1966, Legendre & Legendre 1998), but in essence, the PCO is produced by doing the following steps (Fig. 3.1):
 From the matrix of dissimilarities, D, calculate matrix A, defined (element-wise) as minus one-half times each squared dissimilarity (or distance)^{63};
 Centre matrix A on its row and column averages, and on the overall average, to obtain Gower’s centred matrix G;
 Eigenvalue decomposition of matrix G yields eigenvalues ($\lambda_i$, i = 1, …, N) and their associated eigenvectors;
 The PCO axes Q (also called “scores”) are obtained by multiplying (scaling) each of the eigenvectors by the square root of their corresponding eigenvalue^{64}.
Fig. 3.1. Schematic diagram of the mechanics of a principal coordinates analysis (PCO).
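The steps above can be sketched compactly in Python with NumPy (a minimal illustration of the algorithm, not the PERMANOVA+ code itself):

```python
import numpy as np

def pco(D):
    """Principal coordinates analysis of a symmetric dissimilarity matrix D.

    Returns the eigenvalues (largest first) and the matrix of PCO scores Q.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    A = -0.5 * D**2                       # step 1: element-wise -1/2 d_ij^2
    J = np.eye(n) - np.ones((n, n)) / n   # centring matrix (I - 11'/n)
    G = J @ A @ J                         # step 2: Gower's centred matrix
    evals, evecs = np.linalg.eigh(G)      # step 3: eigenvalue decomposition
    order = np.argsort(evals)[::-1]       # order axes largest -> smallest
    evals, evecs = evals[order], evecs[:, order]
    # step 4: scores = eigenvectors scaled by sqrt(|eigenvalue|); negative
    # eigenvalues correspond to imaginary axes (see footnote 64)
    Q = evecs * np.sqrt(np.abs(evals))
    return evals, Q
```

For a Euclidean distance matrix D, the inter-point distances among the rows of Q reproduce D exactly (up to rounding), illustrating that the full set of PCO axes preserves the original dissimilarities.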
The eigenvalues associated with each of the PCO axes provide information on how much of the variability inherent in the resemblance matrix is explained by each successive axis (usually expressed as a percentage of the total). The eigenvalues (and their associated axes) are ordered from largest to smallest, and their associated axes are also orthogonal to (i.e., perpendicular to or independent of) one another. Thus, $\lambda_1$ is the largest and the first axis is drawn through the cloud of points in a direction that maximises the total variation along it. The second eigenvalue, $\lambda_2$, is second-largest and its corresponding axis is drawn in a direction that maximises the total variation along it, with the further caveat that it must be perpendicular to (independent of) the first axis, and so on. Although the decomposition will produce N axes if there are N points, there will generally be a maximum of N – 1 non-zero axes. This is because only N – 1 axes are required to place N points into a Euclidean space. (Consider: only 1 axis is needed to describe the distance between two points, only 2 axes are needed to describe the distances among three points, and so on…). If matrix D has Euclidean distances to begin with and the number of variables (p) is less than N, then the maximum number of eigenvalues will be p and the PCO axes will correspond exactly to principal component axes that would be produced using PCA.
The full set of PCO axes when taken all together preserve the original dissimilarities among the points given in matrix D. However, the adequacy of the representation of the points as projected onto a smaller number of dimensions is determined for a PCO by considering how much of the total variation in the system is explained by the first two (or three) axes that are drawn. The two- (or three-) dimensional distances in the ordination will underestimate the true dissimilarities^{65}. The percentage of the variation explained by the ith PCO axis is calculated as ($100 \times \lambda_i / \sum \lambda_i$). If the percentage of the variation explained by the first two axes is low, then distances in the two-dimensional ordination will not necessarily reflect the structures occurring in the full multivariate space terribly well. How much is “enough” for the percentage of the variation explained by the first two (or three) axes in order to obtain meaningful interpretations from a PCO plot is difficult to establish, as it will depend on the goals of the study, the original number of variables and the number of points in the system. We suggest that taking an approach akin to that taken for a PCA is appropriate also for a PCO. For example, a two-dimensional PCO ordination that explains ~70% or more of the multivariate variability inherent in the full resemblance matrix would be expected to provide a reasonable representation of the general overall structure. Keep in mind, however, that it is possible for the percentage to be lower, but for the most important features of the data cloud still to be well represented. Conversely, it is also possible for the percentage to be relatively high, but for considerable distortions of some distances still to occur, due to the projection.
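The per-axis and cumulative percentages follow directly from the eigenvalues. A small worked example (the eigenvalues below are hypothetical, chosen only to illustrate the arithmetic):

```python
import numpy as np

# Hypothetical eigenvalues from a PCO decomposition (illustrative values only).
evals = np.array([12.0, 6.0, 1.5, 0.5])

pct = 100 * evals / evals.sum()   # % variation explained by each axis
cum = np.cumsum(pct)              # cumulative % for the first k axes

print(pct)   # axis percentages: 60, 30, 7.5, 2.5
print(cum)   # the first two axes together explain 90%
```

In this example a two-dimensional plot would explain 90% of the total variation, comfortably above the ~70% guideline suggested above.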
^{63} If a matrix of similarities is available instead, then the PCO routine in PERMANOVA+ will automatically translate these into dissimilarities as an initial step in the analysis.
^{64} If the eigenvalue ($\lambda_i$) is negative, then the square root of its absolute value is used instead, but the resulting vector is an imaginary axis (recall that any real number multiplied by $i = \sqrt{-1}$ is an imaginary number).
^{65} Except in certain rare cases, where the first two or three axes might explain greater than 100% of the variability! See the section on Negative eigenvalues.
3.4 Example: Victorian avifauna
As an example, consider the data on Victorian avifauna at the level of individual surveys, in the file vicsurv.pri of the ‘VictAvi’ folder in the ‘Examples add-on’ directory. For simplicity, we shall focus only on a subset of the samples: Select > Samples > •Factor levels > Factor name: Treatment > Levels… and select only those samples taken from either ‘good’ or ‘poor’ sites. Duplicate this sheet containing only the subset of data and rename it vic.good.poor. Next, choose Analyse > Resemblance > (Analyse between•Samples) & (Measure•Bray-Curtis similarity) & ($\checkmark$Add dummy variable > Value: 1). From this resemblance matrix (calculated using the adjusted Bray-Curtis measure), choose PERMANOVA+ > PCO > (Maximum no of PCOs: 15) & ($\checkmark$Plot results). The default maximum number of PCOs corresponds to N – 1.
The output provided from the analysis includes two parts. First, a text file (Fig. 3.2) with information concerning the eigenvector decomposition of the G matrix, including the percentage of the variation explained by each successive PCO axis. Values taken by individual samples along each PCO axis (called “scores”) are also provided. Like any other text file in PRIMER, one can cut and paste this information into a spreadsheet outside of PRIMER, if desired. Note there is also the option ($\checkmark$Scores to worksheet) in the PCO dialog box (Fig. 3.2), which allows further analyses of one or more of these axes from within PRIMER.