Improves, according to a chosen criterion, a feasible solution of a
classification (clustering) problem with connectivity and size
constraints.
Usage
enhance_feasible(
  regionalisation,
  distances = NULL,
  contiguity = NULL,
  sizes = NULL,
  d = NULL,
  data = NULL,
  m = 0,
  M = Inf,
  standardQuant = FALSE,
  binarQual = FALSE,
  enhanceCriteria = c("AHC", "Silhouette", "Dunn"),
  linkages = "saut max",
  evaluationCriteria = enhanceCriteria,
  maxIt = Inf,
  parallel = TRUE,
  nbCores = detectCores() - 1L,
  verbose = TRUE
)
Arguments
- regionalisation
The feasible regionalisation to optimise.
- distances
The distance matrix of the problem. This can be omitted if a distance function d and the corresponding data (argument data) are provided. If only distances is provided, all distances must be present. (distance matrix)
- contiguity
A contiguity matrix or an igraph contiguity graph. If not provided, the problem is considered completely contiguous (all elements are neighbors of each other).
- sizes
The size of each element. By default, it is set to 1 for each element (the size of a cluster is then its cardinality). All values must be positive or zero. (non-negative real numeric vector)
- d
Distance function between elements. This can be omitted if distances is already provided. If present, data must also be specified. Some classical distances are built in; for performance reasons, it is recommended to use them rather than a custom function:
"euclidean": Euclidean distance.
"manhattan": Manhattan distance.
"minkowski": Minkowski distance. In that case a value for p >= 1 must be specified.
(function or string)
- data
A data.frame where each row represents the data related to an element. This can be omitted if d is omitted. Variables can be quantitative or qualitative. If qualitative variables are present, some distances cannot be used. Variables can be standardised, and qualitative variables transformed into binary variables (one-hot encoding), using standardQuant and binarQual. (data.frame)
- m
Minimum size constraint. Must be positive or zero and small enough for the problem to be feasible. Default is 0 (no constraint). (non-negative number)
- M
Maximum size constraint. Must be positive, greater than or equal to m, and large enough for the problem to be feasible. Default is Inf (no constraint). (positive number)
- standardQuant
TRUE if the variables in data should be standardised (i.e., centered and scaled), FALSE (default) otherwise. Standardisation is applied after the possible binarization of qualitative variables (see binarQual). (flag)
- binarQual
TRUE if qualitative variables should be binarized (one-hot encoding), for example to make the data set compatible with common distances or to standardise these variables, FALSE (default) otherwise. (flag)
- enhanceCriteria
A vector of criteria used to enhance the current feasible solution. Currently available choices are those in available_criteria(), plus "AHC" (which depends on the linkages parameter). Unlike the other criteria, AHC does not improve a global criterion but works locally, with the aim of reducing computing time. Under this criterion, a feasible solution obtained by moving a single element from one cluster to another is better if the element is closer to the destination cluster than to its current one (according to the chosen linkage).
- linkages
Vector of linkage distances used when a criterion ("Dunn", "AHC") needs one.
- evaluationCriteria
Criteria used for comparison after enhancement. They are evaluated on each feasible solution returned by each enhancement criterion. Must be a vector composed of the criteria available in c3t. For the Dunn index, there will be one criterion per linkage given. See available_criteria().
- maxIt
Maximum number of allowed iterations. Default is Inf. (strictly positive integer)
- parallel
Logical indicating whether to use parallel processing. Default is TRUE.
- nbCores
Number of CPU cores to use for parallel processing (sockets method). Default is one less than the detected number of cores.
- verbose
Logical indicating whether to display progress messages. Default is TRUE.
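
The inputs described above can be prepared with base R alone. The following is a minimal sketch on made-up toy data, not taken from the package: it builds a full distance matrix after one-hot encoding and standardisation (mirroring what binarQual and standardQuant are documented to do), a contiguity matrix for a simple chain of elements, and a check of the size constraints m and M. The label-vector format of regionalisation and the 0/1 form of the contiguity matrix are assumptions; the exact formats expected by the package should be checked in its documentation.

```r
# Toy data (made up): 5 elements, one quantitative and one qualitative variable.
data <- data.frame(
  income = c(10, 12, 30, 31, 50),
  type   = factor(c("a", "b", "a", "b", "a"))
)

# One-hot encode the qualitative variable (the idea behind binarQual = TRUE).
# Removing the intercept ("- 1") keeps one indicator column per level.
onehot  <- model.matrix(~ type - 1, data = data)
encoded <- cbind(income = data$income, onehot)

# Centre and scale every column (the idea behind standardQuant = TRUE),
# applied after the encoding, as the documentation describes.
scaled <- scale(encoded)

# Full Euclidean distance matrix between elements.
distances <- as.matrix(dist(scaled, method = "euclidean"))

# Contiguity as a symmetric 0/1 adjacency matrix: here a simple chain 1-2-3-4-5.
# (Assumed format; the package also documents igraph graphs as input.)
n <- nrow(data)
contiguity <- matrix(0, n, n)
for (i in seq_len(n - 1)) {
  contiguity[i, i + 1] <- 1
  contiguity[i + 1, i] <- 1
}

# Element sizes and an initial partition (label vector: an assumed format).
sizes <- c(2, 1, 3, 1, 2)
regionalisation <- c(1, 1, 2, 2, 2)

# Size constraints: every cluster's total size must lie in [m, M].
m <- 2
M <- 6
cluster_sizes <- tapply(sizes, regionalisation, sum)
size_ok <- all(cluster_sizes >= m & cluster_sizes <= M)

# With c3t installed, the call might then look like this (not run here):
# res <- enhance_feasible(regionalisation, distances = distances,
#                         contiguity = contiguity, sizes = sizes, m = m, M = M)
```

Passing a precomputed distances matrix, as here, makes d and data unnecessary, which matches the rule that d can be omitted when distances is provided.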
Value
A tibble with one row per attempt. Each row contains the following variables:
- criterion: name of the criterion used for improvement.
- linkage: type of linkage distance used (NA if this argument is irrelevant for the current criterion).
- sampleSize: size of the sample used to calculate the criterion (NA if irrelevant).
- statut: status of the improvement. Indicates whether an improvement could be made or not.
- iterations: number of improving iterations performed.
- regionalisationOpti: the new regionalisation. Identical to the input argument if no improvement could be made.
- One column per criterion given in evaluationCriteria. If some of those criteria use a linkage distance, there will be one column per criterion and per linkage distance given in linkages.
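
The local rule behind the "AHC" enhancement criterion (an element may move if it is closer to another cluster than to its current one) can be illustrated in base R. This is only a sketch of the idea as described under enhanceCriteria, not the package's implementation: linkage_to_cluster is a hypothetical helper, the data are made up, and the use of the maximum distance (complete linkage) is an assumption about what the default linkage denotes. The sketch also does not verify that the move keeps the solution feasible with respect to contiguity and size constraints, which the actual algorithm must do.

```r
# Hypothetical helper: linkage distance between element i and a set of elements.
# `linkage` is any summary function; max corresponds to complete linkage.
linkage_to_cluster <- function(distances, i, members, linkage = max) {
  members <- setdiff(members, i)          # never compare an element with itself
  if (length(members) == 0L) return(NA_real_)
  linkage(distances[i, members])
}

# Made-up 1-D positions: element 3 lies nearer to elements 4 and 5.
x <- c(0, 1, 4, 5, 6)
distances <- as.matrix(dist(x))
regionalisation <- c(1, 1, 1, 2, 2)

i <- 3
current <- which(regionalisation == regionalisation[i])  # elements 1, 2, 3
other   <- which(regionalisation == 2)                   # elements 4, 5

d_current <- linkage_to_cluster(distances, i, current)   # max(4, 3) = 4
d_other   <- linkage_to_cluster(distances, i, other)     # max(1, 2) = 2

# The move is a candidate improvement when the destination cluster is closer.
move_improves <- d_other < d_current
```

Here moving element 3 to cluster 2 would count as an improvement under this rule, since its complete-linkage distance to cluster 2 (2) is smaller than to its current cluster (4).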