1.5 Environmental data: summary stats

For environmental data, we might choose to calculate different summary statistics from those we would want for biotic data consisting of counts. For count data, quantities like the numbers of zeros, singletons and doubletons, and frequencies of occurrence, might well be of interest. However, these are not typically meaningful for variables recorded as (effectively) continuous values, such as temperature or dissolved oxygen. Instead, when we deal with environmental data (or, more generally, continuous quantitative variables), it may be helpful to know the smallest non-zero value or the degree of skewness. Such statistics can help in choosing an appropriate transformation (e.g., to obtain approximate symmetry).
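To make the contrast concrete, here is a minimal Python sketch of the count-data quantities for one species recorded across several samples. The function name `count_summary` is hypothetical, and the definitions it uses (a singleton is a sample with a count of exactly 1, a doubleton exactly 2, and frequency of occurrence is the proportion of samples with a non-zero count) are assumptions for illustration; PRIMER's own definitions may differ.

```python
def count_summary(counts):
    """Count-data summaries for one species across samples (illustrative sketch).

    Assumed definitions:
      zeros           -> number of samples with a count of 0
      singletons      -> number of samples with a count of exactly 1
      doubletons      -> number of samples with a count of exactly 2
      freq_occurrence -> proportion of samples with a count above 0
    """
    n = len(counts)
    zeros = sum(1 for x in counts if x == 0)
    singletons = sum(1 for x in counts if x == 1)
    doubletons = sum(1 for x in counts if x == 2)
    occurrences = sum(1 for x in counts if x > 0)
    return {
        "zeros": zeros,
        "singletons": singletons,
        "doubletons": doubletons,
        "freq_occurrence": occurrences / n,
    }
```

For counts such as `[0, 1, 2, 0, 5, 1]` this reports 2 zeros, 2 singletons and 1 doubleton. Applied to a continuous variable such as temperature (e.g., `[12.3, 11.8, 12.0]`), every one of these counts is zero and the frequency of occurrence is trivially 1, which is why they say little about such data.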

Let's look at some environmental data from a study of benthic soft-sediment assemblages in the Firth of Clyde, SW Scotland (Pearson & Blackstock 1984). The abundance and biomass of 84 macrofaunal species, together with contaminant data (organic enrichment and concentrations of heavy metals in the sediment), were sampled at a series of 12 sites along a transect that passed through the sewage-sludge disposal ground at Garroch Head. Open the data file named 'Clyde_environment.pri' in PRIMER (found inside the 'Examples_P8 > Clyde_macrofauna' folder).

9.Env_Clyde[i].png

Of course, tools such as histograms and draftsman plots (found under the Plots menu) are very useful for visualising the distributions of values for the individual variables. Summary statistics complement these visual tools by providing additional, more detailed information. Click Tools > Summary Stats..., then choose to output the following statistics (shown in the dialog below): average, minimum, maximum, standard deviation, symmetry (0.05), skewness, kurtosis, and smallest number above a threshold of 0.
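Most of the statistics chosen in the dialog can be sketched in Python for a single variable, as below. This is not PRIMER's code: the function `env_summary` is hypothetical, skewness and kurtosis use the simple moment-based coefficients (g1, and g2 expressed as excess kurtosis), and PRIMER may apply small-sample adjustments, so its reported values could differ slightly. The symmetry (0.05) statistic is left out because its definition is not given here.

```python
import math

def env_summary(values, threshold=0.0):
    """Summary statistics for one continuous variable (a sketch, not PRIMER's code)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    # Central moments about the mean (divisor n)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    # Sample standard deviation uses divisor n - 1
    sd = math.sqrt(m2 * n / (n - 1))
    if m2 > 0:
        skewness = m3 / m2 ** 1.5       # moment coefficient g1 (> 0 means right-skewed)
        kurtosis = m4 / m2 ** 2 - 3.0   # excess kurtosis g2 (< 0 means platykurtic)
    else:
        skewness = kurtosis = float("nan")  # undefined for a constant variable
    # Smallest value strictly above the threshold (None if there is none)
    above = [x for x in values if x > threshold]
    return {
        "average": mean,
        "minimum": min(values),
        "maximum": max(values),
        "sd": sd,
        "skewness": skewness,
        "kurtosis": kurtosis,
        "smallest_above_threshold": min(above) if above else None,
    }
```

For example, `env_summary([0, 0, 0.1, 0.4, 1.5])` gives a minimum of 0, a smallest value above zero of 0.1 and a positive skewness, while the evenly spread values `[1, 2, 3]` give zero skewness and a negative (platykurtic) excess kurtosis of -1.5.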

10.Env_Summary_Stats_dialog[new].png

The resulting output file (called 'Data1' and shown below) indicates that many of the variables are right-skewed, with positive values for skewness and values less than 0.5 for the symmetry statistic, while others (Co, Ni) are apparently left-skewed (negative skewness and symmetry > 0.5). Depth (Dep), however, is a fairly flat (platykurtic) variable, with negative excess kurtosis. The output also shows that values of cadmium concentration (Cd) reach a minimum of zero, and that the smallest value recorded above zero is 0.1. Thus, if we were inclined to transform the values for Cd (e.g., to make its distribution more symmetric), we might consider a log transformation such as $\log(y_i + c)$, where $c = 0.1$; a positive constant $c$ is needed because $\log(0)$ is undefined.
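The shifted log transformation described above can be sketched as follows, assuming all values are non-negative (as for concentrations). The function `log_shift` is hypothetical. It uses the natural logarithm; the choice of base only rescales the transformed values and does not change the shape of their distribution. By default it sets $c$ to the smallest value above zero, matching the choice $c = 0.1$ for Cd.

```python
import math

def log_shift(values, c=None):
    """Return log(y + c) for each value, using the natural log (illustrative sketch).

    Assumes all values are >= 0. If c is not supplied, it defaults to the
    smallest value strictly above zero.
    """
    if c is None:
        positives = [x for x in values if x > 0]
        if not positives:
            raise ValueError("no values above zero from which to choose c")
        c = min(positives)
    return [math.log(x + c) for x in values]
```

With `log_shift([0, 0.1, 0.3])`, the default is $c = 0.1$, so the zero maps to $\log(0.1)$ rather than being undefined. Passing `c` explicitly (e.g., `c=1` for a $\log(y_i + 1)$ transformation) overrides the default.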

11.Summary_Stats_Clyde_results[i].png

(Note: the variable 'Cd' has been highlighted in the above image of 'Data1' by clicking on the heading for that column. Highlighting variables (or samples) can be very useful for selecting subsets of data or for applying a transformation to a subset of variables.)