
This function evaluates internal clustering criteria on one or several partitions. Connectivity can be taken into account to reduce the impact of the contiguity (connectivity) constraint.

Usage

clustering_criterion(
  partitions,
  criterion,
  distances = NULL,
  d = NULL,
  data = NULL,
  standardQuant = FALSE,
  binarQual = FALSE,
  linkage = NULL,
  contiguity = NULL,
  connected = FALSE,
  ...
)

Arguments

partitions

A partition, or a list of partitions of the same set, on which the criterion will be calculated. (vector or list of vectors)
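As an illustration, a partition can be encoded as a vector of cluster labels with one entry per element, and several partitions of the same set as a list of such vectors of equal length. The sketch below assumes this encoding; same_set() is a hypothetical helper for checking it, not a function of the package.

```r
# Hypothetical example: six elements partitioned in two different ways.
# Each partition is a vector of cluster labels, one label per element.
p1 <- c(1, 1, 1, 2, 2, 2)
p2 <- c(1, 1, 2, 2, 3, 3)

# A single partition is a vector; several partitions form a list of vectors.
partitions <- list(p1, p2)

# Hypothetical helper (not part of the package): partitions of the same set
# must all have the same length.
same_set <- function(parts) {
  if (!is.list(parts)) parts <- list(parts)
  length(unique(lengths(parts))) == 1
}

same_set(partitions)                               # TRUE
sapply(partitions, function(p) length(unique(p)))  # number of clusters: 2 3
```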

criterion

One of the available criteria. Use available_criteria() to see which criteria can be applied. (string)

distances

The distance matrix of the problem. It can be omitted if a distance function d and a data context data are provided. If only distances is provided, all pairwise distances must be present. (distance matrix)
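A complete distance matrix can be obtained from a data frame with base R's stats::dist(). The data below and the conversion with as.matrix() are assumptions for illustration; whether clustering_criterion() expects a dist object or a full matrix should be checked against the package.

```r
# Hypothetical data: 5 elements described by two quantitative variables
df <- data.frame(x = c(0, 1, 4, 5, 9), y = c(0, 0, 3, 4, 9))

# stats::dist() stores only the lower triangle; as.matrix() expands it to the
# full symmetric n x n matrix with zeros on the diagonal
D <- as.matrix(dist(df, method = "euclidean"))

dim(D)     # 5 5
D[1, 3]    # sqrt(4^2 + 3^2) = 5
anyNA(D)   # FALSE: every pairwise distance is present
```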

d

Distance function between elements. Some criteria require this value. If present, data must also be specified. Some classical distances are available; for performance reasons, it is recommended to use them rather than a user-defined function:

  • "euclidean": Euclidean distance.

  • "manhattan": Manhattan distance.

  • "minkowski": Minkowski distance. In that case, a value p >= 1 must be specified.

(function or string)
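The three distance names correspond to the classical definitions, which base R's stats::dist() also implements. This sketch uses stats::dist() only to show what each distance computes (it is not the package's own code) and checks that Minkowski with p = 1 and p = 2 reduces to Manhattan and Euclidean. chebyshev() is a hypothetical example of a user-defined distance between two elements; the exact signature expected for d is an assumption.

```r
# Three hypothetical elements with two quantitative variables each
X <- matrix(c(0, 0,
              3, 4,
              1, 2), ncol = 2, byrow = TRUE)

d_euc <- dist(X, method = "euclidean")
d_man <- dist(X, method = "manhattan")
d_p3  <- dist(X, method = "minkowski", p = 3)

# Minkowski with p = 1 is Manhattan, with p = 2 it is Euclidean
# (as.vector() drops the dist attributes, which record different methods)
all.equal(as.vector(dist(X, method = "minkowski", p = 1)), as.vector(d_man))
all.equal(as.vector(dist(X, method = "minkowski", p = 2)), as.vector(d_euc))

# Hypothetical user-defined distance between two elements: Chebyshev
chebyshev <- function(a, b) max(abs(a - b))
chebyshev(X[1, ], X[2, ])  # 4
```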

data

A data.frame in which each row holds the data for one element. It can be omitted if d is omitted, but may be required by some criteria (e.g. Calinski-Harabasz). Variables can be quantitative or qualitative; if qualitative variables are present, some distances and criteria cannot be used. Variables can be standardised, and qualitative variables transformed into binary variables (one-hot encoding), using standardQuant and binarQual. (data.frame)
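The Calinski-Harabasz criterion needs the data themselves, not only the distances, because it compares cluster centroids with the overall centroid. A minimal sketch of the textbook definition, (B / (k - 1)) / (W / (n - k)) with B and W the between- and within-cluster sums of squares, assuming quantitative data; calinski_harabasz() is a hypothetical helper and the package's implementation may differ in details.

```r
# Textbook Calinski-Harabasz index for quantitative data X and labels cl
calinski_harabasz <- function(X, cl) {
  X <- as.matrix(X)
  n <- nrow(X)
  k <- length(unique(cl))
  overall <- colMeans(X)             # centroid of all elements
  W <- 0                             # within-cluster sum of squares
  B <- 0                             # between-cluster sum of squares
  for (g in unique(cl)) {
    Xg <- X[cl == g, , drop = FALSE]
    cg <- colMeans(Xg)               # centroid of cluster g
    W <- W + sum(sweep(Xg, 2, cg)^2)
    B <- B + nrow(Xg) * sum((cg - overall)^2)
  }
  (B / (k - 1)) / (W / (n - k))
}

# Two well-separated clusters: W = 1, B = 100, so CH = (100 / 1) / (1 / 2)
calinski_harabasz(data.frame(x = c(0, 1, 10, 11)), c(1, 1, 2, 2))  # 200
```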

standardQuant

TRUE if the variables in data should be standardised (i.e., centered and scaled), FALSE (default) otherwise. Standardisation is applied after the possible binarization of qualitative variables (see binarQual). (flag)

binarQual

TRUE if qualitative variables should be binarized (one-hot encoding), for example to make the data set compatible with common distances or to allow these variables to be standardised. FALSE (default) otherwise. (flag)
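The order described for standardQuant (binarize first, then standardise) can be sketched in base R as follows. one_hot() is a hypothetical helper that creates one 0/1 column per level of each qualitative variable; whether the package keeps every level or drops a reference level is an assumption here.

```r
# Hypothetical data with one quantitative and one qualitative variable
df <- data.frame(
  height = c(150, 160, 170, 180),
  colour = factor(c("red", "blue", "red", "green"))
)

# Hypothetical helper: one 0/1 column per level of each qualitative variable,
# quantitative variables are kept as they are
one_hot <- function(df) {
  cols <- lapply(names(df), function(v) {
    x <- df[[v]]
    if (is.factor(x) || is.character(x)) {
      x <- factor(x)
      m <- vapply(levels(x), function(l) as.numeric(x == l), numeric(length(x)))
      colnames(m) <- paste(v, levels(x), sep = "_")
      m
    } else {
      matrix(as.numeric(x), ncol = 1, dimnames = list(NULL, v))
    }
  })
  do.call(cbind, cols)
}

B <- one_hot(df)  # 1. binarize qualitative variables
Z <- scale(B)     # 2. centre and scale every column, binary ones included
colnames(Z)       # "height" "colour_blue" "colour_green" "colour_red"
```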

linkage

A linkage, i.e. a way of measuring the distance between two clusters from the distances between their elements. It can be a string (see available_linkages()) or a user-defined function. Used by some criteria (e.g. Dunn).
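For illustration, the three classical linkages (single, complete, average) and a Dunn index built on them can be sketched from a full distance matrix. linkage_dist() and dunn_index() are hypothetical names; the sketch uses the classical Dunn definition (smallest single-linkage separation between clusters divided by the largest cluster diameter) and does not show how the package's linkage argument alters it.

```r
# Distance between clusters a and b (index vectors) under a classical linkage
linkage_dist <- function(D, a, b, linkage = c("single", "complete", "average")) {
  linkage <- match.arg(linkage)
  block <- D[a, b, drop = FALSE]
  switch(linkage,
         single   = min(block),   # closest pair
         complete = max(block),   # farthest pair
         average  = mean(block))  # mean over all pairs
}

# Classical Dunn index: min separation / max diameter (higher is better)
dunn_index <- function(D, cl) {
  groups <- split(seq_along(cl), cl)
  diam <- max(sapply(groups, function(g) max(D[g, g, drop = FALSE])))
  pairs <- combn(length(groups), 2)
  sep <- min(apply(pairs, 2, function(p)
    linkage_dist(D, groups[[p[1]]], groups[[p[2]]], "single")))
  sep / diam
}

D <- as.matrix(dist(c(0, 1, 10, 11)))
linkage_dist(D, 1:2, 3:4, "single")    # 9
linkage_dist(D, 1:2, 3:4, "complete")  # 11
linkage_dist(D, 1:2, 3:4, "average")   # 10
dunn_index(D, c(1, 1, 2, 2))           # 9 / 1 = 9
```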

contiguity

A contiguity matrix or an igraph contiguity graph. If not provided, the problem is considered completely contiguous (all elements are neighbors of each other).
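A contiguity matrix can be seen as a symmetric 0/1 adjacency matrix. The sketch below builds rook contiguity on a small grid (cells sharing an edge are neighbours) and the matrix of the completely contiguous case. The zero diagonal and the helper grid_contiguity() are assumptions for illustration, not the package's conventions.

```r
# Hypothetical helper: rook contiguity for an nr x nc grid of cells,
# cells numbered in column-major order
grid_contiguity <- function(nr, nc) {
  n <- nr * nc
  A <- matrix(0L, n, n)
  idx <- function(i, j) (j - 1) * nr + i
  for (i in seq_len(nr)) for (j in seq_len(nc)) {
    if (i < nr) {  # neighbour below
      A[idx(i, j), idx(i + 1, j)] <- 1L
      A[idx(i + 1, j), idx(i, j)] <- 1L
    }
    if (j < nc) {  # neighbour to the right
      A[idx(i, j), idx(i, j + 1)] <- 1L
      A[idx(i, j + 1), idx(i, j)] <- 1L
    }
  }
  A
}

A <- grid_contiguity(2, 3)
rowSums(A)  # number of neighbours per cell: 2 2 3 3 2 2

# Completely contiguous case: every element is a neighbour of every other one
C <- matrix(1L, 6, 6)
diag(C) <- 0L
```

If an igraph graph is preferred, igraph::graph_from_adjacency_matrix(A, mode = "undirected") converts such a matrix.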

connected

TRUE if the criterion should be calculated in its connected form, i.e. as the mean of its values over the connected components; FALSE (default) otherwise. (flag)
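One reading of the connected form, assumed here, is: split the elements into the connected components of the contiguity graph, evaluate the criterion on each component separately, and average the results. In the sketch, sep_ratio() is a hypothetical criterion (mean between-cluster distance divided by mean within-cluster distance), not one of the package's criteria, and components_bfs() and connected_criterion() are hypothetical helpers. The example shows the intended effect: large distances between elements of different components, which can never share a cluster, no longer inflate the value.

```r
# Connected components of an undirected graph (0/1 adjacency matrix) by BFS
components_bfs <- function(A) {
  n <- nrow(A)
  comp <- integer(n)
  current <- 0L
  for (s in seq_len(n)) {
    if (comp[s] != 0L) next          # already assigned to a component
    current <- current + 1L
    comp[s] <- current
    queue <- s
    while (length(queue) > 0) {
      v <- queue[1]
      queue <- queue[-1]
      nb <- which(A[v, ] != 0 & comp == 0L)  # unvisited neighbours
      comp[nb] <- current
      queue <- c(queue, nb)
    }
  }
  comp
}

# Hypothetical criterion: mean between-cluster / mean within-cluster distance
sep_ratio <- function(D, cl) {
  same <- outer(cl, cl, "==")
  up <- upper.tri(D)
  mean(D[!same & up]) / mean(D[same & up])
}

# Connected form: criterion on each component, then the mean of those values
connected_criterion <- function(crit, D, A, cl) {
  comp <- components_bfs(A)
  vals <- sapply(split(seq_along(cl), comp), function(idx)
    crit(D[idx, idx, drop = FALSE], cl[idx]))
  mean(vals)
}

D <- as.matrix(dist(c(0, 1, 5, 100, 101, 110)))
A <- matrix(0, 6, 6)
e <- rbind(c(1, 2), c(2, 3), c(4, 5), c(5, 6))  # two chains: 1-2-3 and 4-5-6
A[e] <- 1
A[e[, 2:1]] <- 1
cl <- c(1, 1, 2, 3, 3, 4)

sep_ratio(D, cl)                           # about 72.5 on the whole set
connected_criterion(sep_ratio, D, A, cl)   # mean(4.5, 9.5) = 7
```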

...

Additional arguments specific to the criterion.