Binary divisive clustering

Two new clustering methods are introduced towards the end of Chapter 3 of CiMC. The first is still a hierarchical clustering method leading to a tree diagram, but it uses a divisive rather than agglomerative algorithm: all samples start off in a single group, which is split into two groups, each of those is then further sub-divided into two, and so on until some stopping rule is activated. The sub-groups are not constrained to be of comparable sizes; a split of n samples may sometimes produce a group of size n–1 and a singleton. In keeping with the principles embodied by the PRIMER package, the criterion maximised in making each split is the non-parametric ANOSIM R statistic of Section 9, used purely as a measure of group separation for a multivariate set of samples (and not in any way as a test statistic). R is essentially the difference between the average rank dissimilarity between the two groups and the average rank dissimilarity within those groups, scaled so that it takes values up to +1 (perfect rank separation, in which all dissimilarities between the groups are larger than any dissimilarity within either group). After each binary division, the dissimilarities among samples within each new group are re-ranked and used to maximise R in a further binary division. Even for quite modest sample sizes, evaluating R for all possible splits into two groups can be prohibitive, so a search algorithm is required, and the number of random restarts of that search needs to be specified (default 10, but increase this if the routine runs quickly).
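
As an illustration only (this is not PRIMER code), the R criterion for a single binary split can be sketched in Python. The sketch assumes the standard ANOSIM scaling, R = (mean between-group rank − mean within-group rank) / (M/2), where M = n(n–1)/2 is the number of sample pairs, and it gives tied dissimilarities their average rank. The function names `average_ranks`, `anosim_r` and `best_binary_split` are hypothetical. The search shown is a brute-force enumeration of every split, which is only feasible for a handful of samples; it stands in for, and does not reproduce, the restart-based search used by the routine.

```python
from itertools import combinations


def average_ranks(values):
    """Rank values from 1 to M, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def anosim_r(dist, group_a):
    """R for splitting the samples of `dist` into `group_a` and the rest.

    `dist` is a square, symmetric dissimilarity matrix (list of lists) for
    the samples of the group being split, so ranks are recomputed per group.
    """
    n = len(dist)
    in_a = set(group_a)
    pairs = list(combinations(range(n), 2))
    ranks = average_ranks([dist[i][j] for i, j in pairs])
    between = [r for (i, j), r in zip(pairs, ranks) if (i in in_a) != (j in in_a)]
    within = [r for (i, j), r in zip(pairs, ranks) if (i in in_a) == (j in in_a)]
    if not between or not within:
        raise ValueError("R needs both between-group and within-group pairs")
    mean_between = sum(between) / len(between)
    mean_within = sum(within) / len(within)
    return (mean_between - mean_within) / (len(pairs) / 2)


def best_binary_split(dist):
    """Exhaustively find the split with the largest R (small n only)."""
    n = len(dist)
    if n < 3:
        raise ValueError("need at least 3 samples to evaluate R for a split")
    best_group, best_r = None, float("-inf")
    # Keep sample 0 in group A so each split is visited exactly once.
    for k in range(n - 1):
        for rest in combinations(range(1, n), k):
            group_a = (0,) + rest
            r = anosim_r(dist, group_a)
            if r > best_r:
                best_group, best_r = group_a, r
    return best_group, best_r


# Four samples at positions 0, 1, 10, 11 on a line: two tight pairs.
points = [0, 1, 10, 11]
dist = [[abs(a - b) for b in points] for a in points]
group, r = best_binary_split(dist)
```

Here the best split separates the first two samples from the last two, with every between-group dissimilarity larger than every within-group one. To continue the tree, the same function would be applied to the sub-matrix of each new group, which is what re-ranks the dissimilarities within that group.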
A range of stopping rules is allowed, and they can be used in combination: a) a split which would produce a group of size n or less is never made (n specified); b) groups of size <n are never split (n specified); c) a split is not made if the largest R is less than a specified value; d) a group is never split if a SIMPROF test of its samples cannot reject the hypothesis of ‘no structure’ within that group. This last rule is the least arbitrary and most natural of the stopping rules, a natural counterpart to the stopping rule used when interpreting the agglomerative clustering described earlier.
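
How rules a) to d) combine can be sketched as a single check, again in Python and purely as an illustration. The function `may_split` and all of its parameter names are hypothetical, not PRIMER options; a rule left as `None` is treated as switched off. The SIMPROF test itself is not implemented here: its outcome is assumed to be supplied by the caller as a boolean.

```python
def may_split(group_size, child_sizes, best_r,
              child_size_limit=None, parent_size_limit=None,
              min_r=None, simprof_rejects=None):
    """Return True if none of the active stopping rules blocks the split.

    group_size        -- number of samples in the group being considered
    child_sizes       -- sizes of the two groups the best split would produce
    best_r            -- largest R found for a split of this group
    child_size_limit  -- rule a): block splits producing a group this size or less
    parent_size_limit -- rule b): never split groups smaller than this
    min_r             -- rule c): block splits whose largest R is below this
    simprof_rejects   -- rule d): True if SIMPROF rejected 'no structure'
                         (assumed to be computed elsewhere)
    """
    if child_size_limit is not None and min(child_sizes) <= child_size_limit:
        return False  # rule a)
    if parent_size_limit is not None and group_size < parent_size_limit:
        return False  # rule b)
    if min_r is not None and best_r < min_r:
        return False  # rule c)
    if simprof_rejects is not None and not simprof_rejects:
        return False  # rule d)
    return True


# A split of 10 samples into 9 and a singleton, with singletons disallowed.
singleton_blocked = may_split(10, (9, 1), 0.8, child_size_limit=1)
# A 6/4 split with a high R passes rules a) to c).
balanced_allowed = may_split(10, (6, 4), 0.8, child_size_limit=1,
                             parent_size_limit=5, min_r=0.5)
```

Because the rules are combined with a logical AND, a split goes ahead only if every active rule permits it.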

A parallel routine, Analyse>Cluster>LINKTREE (linkage trees), is described in Section 13. It is a constrained divisive clustering in which binary splits of, for example, biotic community samples are made in the same way (by maximising R), but only if an environmental variable can be found that takes a non-overlapping range of values in the two groups produced (and therefore offers a possible ‘explanation’ for that split). In contrast, the routine new to PRIMER 7 produces a completely unconstrained tree and is accessed by Analyse>Cluster>UNCTREE: each group of samples is divided to maximise R, based only on the input resemblance matrix (e.g. the community similarities), without external constraints.
