![](../logo.png)
Search of a fusion tree under connectivity and/or size constraints.
Source:R/fusionTree.R
search_fusion_tree.Rd
Search of a fusion tree under connectivity and/or size constraints.
Usage
search_fusion_tree(
distances = NULL,
contiguity = NULL,
sizes = NULL,
d = NULL,
data = NULL,
m = 0L,
M = Inf,
standardQuant = FALSE,
binarQual = FALSE,
search = "DFS",
stopCriterion = "first_valid_branch",
childChoice = "distance",
regionalisation = NULL,
linkage = "complete"
)
AHR_fusion_tree(
distances = NULL,
contiguity = NULL,
sizes = NULL,
d = NULL,
data = NULL,
m = 0L,
M = Inf,
standardQuant = FALSE,
binarQual = FALSE,
regionalisation = NULL,
linkage = "complete"
)
Arguments
- distances
The distance matrix of the problem. This can be omitted if a distance function
d
and data contextdata
are provided. If onlydistances
is provided, all distances must be present. (distance matrix)- contiguity
A contiguity matrix or an
igraph
contiguity graph. If not provided, the problem is considered completely contiguous (all elements are neighbors of each other).- sizes
Represents the size of each element. By default, it is set to
1
for each element (the size of a cluster becomes its cardinal). All data must be positive or zero. (positive real numeric vector)- d
Distance function between elements. This can be omitted if
distances
is already indicated. If present,data
must also be specified. Some classical distances are available, it is recommended to use them rather than a personal function for optimisation reasons :"
euclidean
": Euclidean distance."
manhattan
" : Manhattan distance."
minkowski
" : Minkowski's distance. In that case a value for p >= 1 must be specified.
(function or string)
- data
A data.frame where each row represents data related to an element. This can be omitted if
d
is omitted. Present variables can be quantitative or qualitative. If qualitative variables are present, some distances may not be used. Possibility of standardising variables and transforming qualitative variables into binary variables (one-hot encoding) usingstandardQuant
andbinarQual
. (data.frame)- m
Minimum size constraint. Must be positive or zero and small enough for the problem to be feasible. Default is
0
(no constraint). (positive number)- M
Maximum size constraint. Must be positive, superior or equal to
m
and large enough for the problem to be feasible. Default isInf
(no constraint). (positive number)- standardQuant
TRUE
if the variables indata
should be standardised (i.e., centered and scaled),FALSE
(default) otherwise. Standardisation is applied after the possible binarization of qualitative variables (seebinarQual
). (flag)- binarQual
TRUE
if qualitative variables should be binarized (one-hot encoding), for example, to make the data set compatible with common distances or to standardize these variables.FALSE
(default) otherwise. (flag)- search
Type of traversal to apply on the fusion tree. Three possibilities:
"
BFS
" (Depth First Search)"
DFS
" (Breadth First Search)"
random
" (random)
- stopCriterion
Indicates when the traversal should stop. The possibilities are as follows:
"
full
": The algorithm stops when all possible fusions from the initial partition have been evaluated."
first_valid_node
": Stop when a first feasible solution is found. Useful when a minimum size constraint is present."
first_valid_branch
": Stop as soon as a node without fusion has been selected and a feasible solution has been found.
- childChoice
In the case of breadth-first or depth-first search, the way in which child nodes are ordered (and accessed). Multiple choices:
"
random
": Children are ordered randomly."
first_valid_node
": Initial order is preserved."
distance
": Nodes are ordered by increasing inter-cluster distance.
- regionalisation
regionalisation to start the tree traversal.
- linkage
used when
chilChoice =
"distance"
. Can be a function or a string. This determines the linkage criterion and so the selection of the children nodes. If using custom distance function, it must take the pairwise distance matrix as an argument and return the linkage distance. Default implemented linkages can be seen withavailable_linkages()
.
Details
The triplet
(search
= "DFS
", childChoice
= "distance
",
stopCriterion
= "first_valid_branch
")
corresponds to an Agglomerative Hierarchical Clustering (AHC) with
contiguity and size constraints, not necessarily stopping at the fusion of
two clusters but when a feasible partition is found.
The two methods are equivalent when there is no minimum size constraint.