This function evaluates internal clustering criteria on one or several partitions. Connectivity can be taken into account to reduce the impact of a contiguity (connectivity) constraint.
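As a rough illustration of what evaluating a criterion on several partitions involves, the base-R sketch below scores two partitions of the same four elements. It does not call clustering_criterion; mean_within_distance is a hypothetical criterion written only for this example, not one listed by available_criteria().

```r
# Hypothetical internal criterion (illustration only): the mean distance
# between pairs of elements that share a cluster.
# Lower values indicate more compact clusters.
mean_within_distance <- function(partition, D) {
  D <- as.matrix(D)
  same <- outer(partition, partition, "==")  # TRUE where i and j share a cluster
  pairs <- same & upper.tri(D)               # count each within-cluster pair once
  if (!any(pairs)) return(NA_real_)          # no within-cluster pair to average
  mean(D[pairs])
}

# Four elements on a line and two partitions of this same set.
x <- c(0, 1, 10, 11)
D <- dist(x)
partitions <- list(good = c(1, 1, 2, 2), bad = c(1, 2, 1, 2))

# Evaluate the criterion on every partition of the list.
scores <- sapply(partitions, mean_within_distance, D = D)
print(scores)  # good = 1, bad = 10
```

The partition that groups nearby elements gets the lower (better) value, which is the kind of comparison an internal criterion supports when several partitions are passed at once.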
Usage
clustering_criterion(
  partitions,
  criterion,
  distances = NULL,
  d = NULL,
  data = NULL,
  standardQuant = FALSE,
  binarQual = FALSE,
  linkage = NULL,
  contiguity = NULL,
  connected = FALSE,
  ...
)
Arguments
- partitions
a partition, or a list of partitions of the same set, on which the criterion will be calculated. (vector or list of vectors)
- criterion
one of the available criteria. Use available_criteria() to see which criteria can be applied. (string)
- distances
The distance matrix of the problem. It can be omitted if a distance function d and the data context data are provided. If only distances is provided, all pairwise distances must be present. (distance matrix)
- d
Distance function between elements. Some criteria require this argument. If d is given, data must also be specified. Some classical distances are available; for performance reasons, it is recommended to use them rather than a custom function:
  - "euclidean": Euclidean distance.
  - "manhattan": Manhattan distance.
  - "minkowski": Minkowski distance. In that case, a value for p >= 1 must be specified.
(function or string)
- data
a data.frame where each row holds the data related to one element. It can be omitted if d is omitted, but might be necessary for some criteria (e.g. Calinski-Harabasz). The variables can be quantitative or qualitative. If qualitative variables are present, some distances and criteria may not be usable. Variables can be standardised, and qualitative variables transformed into binary variables (one-hot encoding), using standardQuant and binarQual. (data.frame)
- standardQuant
TRUE if the variables in data should be standardised (i.e., centred and scaled), FALSE (default) otherwise. Standardisation is applied after the possible binarization of qualitative variables (see binarQual). (flag)
- binarQual
TRUE if qualitative variables should be binarized (one-hot encoding), for example to make the data set compatible with common distances or to standardise these variables, FALSE (default) otherwise. (flag)
- linkage
a distance linkage. It can be a string (see available_linkages()) or a user-defined function. Used by some of the criteria (e.g. Dunn).
- contiguity
A contiguity matrix or an igraph contiguity graph. If not provided, the problem is considered completely contiguous (all elements are neighbors of each other).
- connected
a flag equal to TRUE if the criterion should be calculated in its connected form, i.e. as the mean of its values computed on each connected component, FALSE (default) otherwise. (flag)
- ...
Additional arguments specific to the criterion.
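The three built-in distance names coincide with methods of base R's stats::dist(), which can be used to see what each distance computes and to build a complete distance matrix of the kind expected by distances. This is only an illustration: whether clustering_criterion relies on dist() internally, and how it receives p, are not stated here. The two-argument form of my_euclidean is likewise an assumption about what a custom d looks like, not the documented signature.

```r
# Two elements described by two quantitative variables.
elements <- data.frame(x = c(0, 3), y = c(0, 4))

d_euc <- dist(elements, method = "euclidean")         # sqrt(3^2 + 4^2) = 5
d_man <- dist(elements, method = "manhattan")         # |3| + |4| = 7
d_min <- dist(elements, method = "minkowski", p = 3)  # (3^3 + 4^3)^(1/3)

# A full (square) matrix containing every pairwise distance.
full <- as.matrix(d_euc)

# Assumed shape of a user-supplied distance: a function of two elements,
# here two numeric vectors. Check the package documentation for the
# signature d must actually have.
my_euclidean <- function(a, b) sqrt(sum((a - b)^2))
custom <- my_euclidean(unlist(elements[1, ]), unlist(elements[2, ]))
print(custom)  # 5
```

A custom function gives the same Euclidean value here, but it is evaluated pair by pair in R, which is consistent with the recommendation to prefer the built-in names.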
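The order stated for binarQual and standardQuant (binarize qualitative variables first, then standardise all variables) can be sketched in base R. one_hot is a hypothetical helper, and scale() centres each column and divides it by its sample standard deviation; the package's own column naming and scaling conventions are not documented here and may differ.

```r
# Hypothetical helper: one-hot encode every factor/character column,
# keeping one 0/1 indicator column per level; numeric columns are kept as-is.
one_hot <- function(df) {
  out <- list()
  for (name in names(df)) {
    col <- df[[name]]
    if (is.factor(col) || is.character(col)) {
      col <- factor(col)
      for (lev in levels(col)) {
        out[[paste0(name, "_", lev)]] <- as.numeric(col == lev)
      }
    } else {
      out[[name]] <- col
    }
  }
  as.data.frame(out)
}

elements <- data.frame(
  size  = c(1, 2, 3, 4),
  color = c("red", "blue", "red", "green")
)

binarized <- one_hot(elements)    # step 1: binarization (binarQual = TRUE)
standardised <- scale(binarized)  # step 2: centre and scale (standardQuant = TRUE)
print(colnames(standardised))     # "size" "color_blue" "color_green" "color_red"
```

Binarizing first is what makes it possible to standardise qualitative variables at all: once each level is a 0/1 column, it can be centred and scaled like any quantitative variable.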
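To show how a linkage enters a criterion such as Dunn, the sketch below computes a generalised Dunn index: the smallest linkage value between two clusters divided by the largest cluster diameter. The linkage functions, which take the block of distances between two clusters, are hypothetical stand-ins; the strings and the function signature actually accepted are those given by available_linkages() and the package documentation.

```r
# Hypothetical linkage functions: each takes the block of distances
# between the elements of two clusters and returns a single value.
linkages <- list(
  single   = function(block) min(block),
  complete = function(block) max(block),
  average  = function(block) mean(block)
)

# Generalised Dunn index: smallest between-cluster linkage divided by
# the largest cluster diameter (maximum within-cluster distance).
dunn_index <- function(partition, D, linkage = linkages$single) {
  D <- as.matrix(D)
  clusters <- unique(partition)
  diam <- max(sapply(clusters, function(k) {
    idx <- which(partition == k)
    if (length(idx) < 2) 0 else max(D[idx, idx])
  }))
  sep <- Inf
  for (a in seq_along(clusters)) {
    for (b in seq_along(clusters)) {
      if (a < b) {
        ia <- which(partition == clusters[a])
        ib <- which(partition == clusters[b])
        sep <- min(sep, linkage(D[ia, ib, drop = FALSE]))
      }
    }
  }
  sep / diam
}

x <- c(0, 1, 10, 11)
D <- dist(x)
dunn_single   <- dunn_index(c(1, 1, 2, 2), D)                      # 9 / 1
dunn_complete <- dunn_index(c(1, 1, 2, 2), D, linkages$complete)   # 11 / 1
```

Higher values indicate well-separated, compact clusters; the choice of linkage changes how separation is measured, which is why the same partition scores 9 with single linkage and 11 with complete linkage.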
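The connected form can be sketched under one assumed reading: the connected components are those of the contiguity graph, and the criterion is evaluated separately on each component before averaging. The code below labels components with a breadth-first search on a 0/1 contiguity matrix (for an igraph graph, igraph::components() returns a comparable membership vector), uses a hypothetical mean_within_distance criterion, and checks that with the default complete contiguity there is a single component, so the connected form coincides with the plain criterion.

```r
# Label connected components of an undirected graph given as a symmetric
# 0/1 contiguity matrix, using breadth-first search.
connected_components <- function(adj) {
  n <- nrow(adj)
  comp <- rep(NA_integer_, n)
  current <- 0L
  for (start in seq_len(n)) {
    if (!is.na(comp[start])) next
    current <- current + 1L
    comp[start] <- current
    queue <- start
    while (length(queue) > 0) {
      v <- queue[1]
      queue <- queue[-1]
      nbrs <- which(adj[v, ] != 0 & is.na(comp))  # unvisited neighbours of v
      comp[nbrs] <- current
      queue <- c(queue, nbrs)
    }
  }
  comp
}

# Hypothetical criterion: mean within-cluster pairwise distance.
mean_within_distance <- function(partition, D) {
  D <- as.matrix(D)
  pairs <- outer(partition, partition, "==") & upper.tri(D)
  if (!any(pairs)) return(NA_real_)
  mean(D[pairs])
}

# Connected form (assumed reading): evaluate the criterion on each component,
# restricting partition and distances to it, then average the values.
connected_criterion <- function(criterion, partition, D, adj) {
  D <- as.matrix(D)
  comp <- connected_components(adj)
  values <- sapply(unique(comp), function(k) {
    idx <- which(comp == k)
    criterion(partition[idx], D[idx, idx, drop = FALSE])
  })
  mean(values, na.rm = TRUE)  # components without a within-cluster pair are skipped
}

x <- c(0, 1, 10, 11, 20, 22)
D <- dist(x)
partition <- c(1, 1, 1, 1, 2, 2)

# Contiguity: chain 1-2-3-4 and pair 5-6, i.e. two connected components.
adj <- matrix(0, 6, 6)
edges <- rbind(c(1, 2), c(2, 3), c(3, 4), c(5, 6))
adj[edges] <- 1
adj[edges[, 2:1]] <- 1  # make the matrix symmetric

plain <- mean_within_distance(partition, D)                                # 44 / 7
connected <- connected_criterion(mean_within_distance, partition, D, adj)  # (7 + 2) / 2

# Default contiguity (every element neighbours every other): one component.
full_adj <- matrix(1, 6, 6)
connected_full <- connected_criterion(mean_within_distance, partition, D, full_adj)
```

In this example the plain value averages over all seven within-cluster pairs (44/7, about 6.29), whereas the connected form gives each component equal weight ((7 + 2)/2 = 4.5).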