Algorithm that improves, according to one or more chosen criteria, a
feasible solution of a classification problem with connectivity and
size constraints.
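To make "feasible" concrete, the sketch below checks the two kinds of constraints named above in base R. It is only an illustration under stated assumptions: `is_feasible` is a hypothetical helper, not a function of the package, and it assumes that a regionalisation is a vector of cluster labels and that contiguity is a symmetric 0/1 matrix.

```r
# Hypothetical helper, not part of the package: checks the size and
# connectivity constraints of a regionalisation given as a label vector.
is_feasible <- function(regionalisation, contiguity,
                        sizes = rep(1, length(regionalisation)),
                        m = 0, M = Inf) {
  for (k in unique(regionalisation)) {
    members <- which(regionalisation == k)
    # Size constraint: the total size of the cluster must lie in [m, M].
    total <- sum(sizes[members])
    if (total < m || total > M) return(FALSE)
    # Connectivity constraint: breadth-first search restricted to the cluster.
    visited <- members[1]
    frontier <- members[1]
    while (length(frontier) > 0) {
      neighbours <- which(colSums(contiguity[frontier, , drop = FALSE]) > 0)
      frontier <- setdiff(intersect(neighbours, members), visited)
      visited <- c(visited, frontier)
    }
    if (length(visited) < length(members)) return(FALSE)
  }
  TRUE
}

# Four elements on a line: 1 - 2 - 3 - 4.
contiguity <- matrix(0, 4, 4)
contiguity[cbind(1:3, 2:4)] <- 1
contiguity <- contiguity + t(contiguity)

# Two connected clusters of size 2: feasible.
feasible_example <- is_feasible(c(1, 1, 2, 2), contiguity, m = 2, M = 3)
# Clusters {1, 3} and {2, 4} are not connected: infeasible.
split_example <- is_feasible(c(1, 2, 1, 2), contiguity, m = 2, M = 3)
# Cluster {4} is smaller than m: infeasible.
too_small_example <- is_feasible(c(1, 1, 1, 2), contiguity, m = 2, M = 3)
```

An enhancement step may only move from one such feasible solution to another.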
Usage
enhance_feasible(
regionalisation,
distances = NULL,
contiguity = NULL,
sizes = NULL,
d = NULL,
data = NULL,
m = 0,
M = Inf,
standardQuant = FALSE,
binarQual = FALSE,
enhanceCriteria = c("AHC", "Silhouette", "Dunn"),
linkages = "saut max",
evaluationCriteria = enhanceCriteria,
maxIt = Inf,
parallel = TRUE,
nbCores = detectCores() - 1L,
verbose = TRUE
)
Arguments
- regionalisation
feasible regionalisation to optimize.
- distances
The distance matrix of the problem. This can be omitted if a distance function d and a data context data are provided. If only distances is provided, all distances must be present. (distance matrix)
- contiguity
A contiguity matrix or an igraph contiguity graph. If not provided, the problem is considered completely contiguous (all elements are neighbors of each other).
- sizes
Represents the size of each element. By default, it is set to 1 for each element (the size of a cluster then equals its cardinality). All values must be positive or zero. (positive real numeric vector)
- d
Distance function between elements. This can be omitted if distances is already provided. If present, data must also be specified. Some classical distances are available; for optimisation reasons, it is recommended to use them rather than a custom function:
"euclidean": Euclidean distance.
"manhattan": Manhattan distance.
"minkowski": Minkowski distance. In that case a value for p >= 1 must be specified.
(function or string)
- data
A data.frame where each row represents data related to an element. This can be omitted if d is omitted. Variables can be quantitative or qualitative. If qualitative variables are present, some distances may not be usable. Quantitative variables can be standardised with standardQuant, and qualitative variables can be transformed into binary variables (one-hot encoding) with binarQual. (data.frame)
- m
Minimum size constraint. Must be positive or zero and small enough for the problem to be feasible. Default is 0 (no constraint). (positive number)
- M
Maximum size constraint. Must be positive, greater than or equal to m, and large enough for the problem to be feasible. Default is Inf (no constraint). (positive number)
- standardQuant
TRUE if the variables in data should be standardised (i.e., centered and scaled), FALSE (default) otherwise. Standardisation is applied after the possible binarization of qualitative variables (see binarQual). (flag)
- binarQual
TRUE if qualitative variables should be binarized (one-hot encoding), for example to make the data set compatible with common distances or to standardize these variables. FALSE (default) otherwise. (flag)
- enhanceCriteria
A vector of criteria used to enhance the current feasible solution. Currently available choices are those in available_criteria(), plus "AHC" (which depends on the linkages parameter). Unlike the others, AHC does not improve a global criterion but works locally, with the aim of reducing computing time. Under this criterion, a feasible solution obtained by moving a single element from one cluster to another is better if the element is closer to the other cluster than to its current one (according to the chosen linkage).
- linkages
Vector of linkage distances used when a criterion ("Dunn", "AHC") needs it.
- evaluationCriteria
Criteria used for comparison after enhancement. They are evaluated on each feasible solution produced by each criterion used for enhancement. Must be a vector composed of criteria available in c3t. For the Dunn index there will be one criterion per linkage given. See available_criteria().
- maxIt
Maximum number of allowed iterations. Default is Inf. (strictly positive integer)
- parallel
Logical indicating whether to use parallel processing. Default is TRUE.
- nbCores
Number of CPU cores to use for parallel processing (sockets method). Default is one less than the detected number of cores.
- verbose
Logical indicating whether to display progress messages. Default is TRUE.
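The local rule described for the "AHC" criterion can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: `linkage_max` and `ahc_move_improves` are hypothetical helpers, "saut max" is assumed here to denote complete (maximum) linkage, a regionalisation is assumed to be a vector of cluster labels, and the feasibility of the move is not checked.

```r
# Hypothetical helper: distance from element i to a set of elements under
# complete (maximum) linkage, assumed here to be what "saut max" denotes.
linkage_max <- function(distances, i, members) {
  max(distances[i, members])
}

# Hypothetical helper: TRUE if element i is closer to cluster `target` than
# to the rest of its current cluster. Feasibility of the move (connectivity,
# size bounds) is NOT checked in this sketch.
ahc_move_improves <- function(distances, regionalisation, i, target) {
  current <- setdiff(which(regionalisation == regionalisation[i]), i)
  other <- which(regionalisation == target)
  # Assumption of this sketch: a move must never empty a cluster.
  if (length(current) == 0) return(FALSE)
  linkage_max(distances, i, other) < linkage_max(distances, i, current)
}

# Five points on a line; element 3 sits next to the second cluster.
positions <- c(0, 1, 5, 6, 7)
distances <- as.matrix(dist(positions))
regionalisation <- c(1, 1, 1, 2, 2)

# Element 3: max distance 5 to its own cluster, 2 to cluster 2.
move_3 <- ahc_move_improves(distances, regionalisation, i = 3, target = 2)
# Element 2: max distance 4 to its own cluster, 6 to cluster 2.
move_2 <- ahc_move_improves(distances, regionalisation, i = 2, target = 2)
```

Because the test only involves distances from one element to two clusters, it is cheap compared with recomputing a global criterion for every candidate move.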
Value
A tibble with one row per try. Each row contains the following variables:
- criterion: name of the criterion used for improvement.
- linkage: type of linkage distance used (NA if this argument is irrelevant for the current criterion).
- sampleSize: size of the sample used to calculate the criterion (NA if irrelevant).
- statut: state of improvement. Indicates whether an improvement could be made or not.
- iterations: number of improving iterations performed.
- regionalisationOpti: the new regionalisation. Identical to the input argument if no improvement could be made.
- one column per criterion indicated in evaluationCriteria. If some of those criteria use a linkage distance, there will be one column per criterion and per linkage distance given in linkages.
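A possible call, as a hedged sketch rather than an official example: the inputs are built in base R, and the call itself runs only if the c3t package is installed. It assumes that a regionalisation is passed as a vector of cluster labels and that a symmetric 0/1 matrix is accepted for contiguity; neither format is confirmed by the text above.

```r
# Six elements on a line, described by one quantitative variable.
data <- data.frame(x = c(0, 1, 5, 6, 7, 8))
distances <- as.matrix(dist(data, method = "euclidean"))

# Contiguity matrix of the path 1 - 2 - 3 - 4 - 5 - 6.
n <- nrow(data)
contiguity <- matrix(0, n, n)
contiguity[cbind(1:(n - 1), 2:n)] <- 1
contiguity <- contiguity + t(contiguity)

# Assumption: a regionalisation is a vector of cluster labels, one per element.
regionalisation <- c(1, 1, 1, 2, 2, 2)

if (requireNamespace("c3t", quietly = TRUE)) {
  result <- c3t::enhance_feasible(
    regionalisation,
    distances = distances,
    contiguity = contiguity,
    m = 2, M = 4,
    enhanceCriteria = "AHC",
    linkages = "saut max",
    parallel = FALSE,
    verbose = FALSE
  )
  # One row per try; the columns are those listed in the Value section.
  print(result[, c("criterion", "linkage", "statut", "iterations")])
}
```

Setting parallel = FALSE keeps such a small problem in a single process; the default of TRUE is more likely to pay off when several criteria and linkages are tried on larger data.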
