Skip to contents

Search of a fusion tree under connectivity and/or size constraints.

Usage

search_fusion_tree(
  distances = NULL,
  contiguity = NULL,
  sizes = NULL,
  d = NULL,
  data = NULL,
  m = 0L,
  M = Inf,
  standardQuant = FALSE,
  binarQual = FALSE,
  search = "DFS",
  stopCriterion = "first_valid_branch",
  childChoice = "distance",
  regionalisation = NULL,
  linkage = "complete"
)

AHR_fusion_tree(
  distances = NULL,
  contiguity = NULL,
  sizes = NULL,
  d = NULL,
  data = NULL,
  m = 0L,
  M = Inf,
  standardQuant = FALSE,
  binarQual = FALSE,
  regionalisation = NULL,
  linkage = "complete"
)

Arguments

distances

The distance matrix of the problem. This can be omitted if a distance function d and data context data are provided. If only distances is provided, all distances must be present. (distance matrix)

contiguity

A contiguity matrix or an igraph contiguity graph. If not provided, the problem is considered completely contiguous (all elements are neighbors of each other).

sizes

Represents the size of each element. By default, it is set to 1 for each element (the size of a cluster becomes its cardinal). All data must be positive or zero. (positive real numeric vector)

d

Distance function between elements. This can be omitted if distances is already indicated. If present, data must also be specified. Some classical distances are available, it is recommended to use them rather than a personal function for optimisation reasons :

  • "euclidean": Euclidean distance.

  • "manhattan" : Manhattan distance.

  • "minkowski" : Minkowski's distance. In that case a value for p >= 1 must be specified.

(function or string)

data

A data.frame where each row represents data related to an element. This can be omitted if d is omitted. Present variables can be quantitative or qualitative. If qualitative variables are present, some distances may not be used. Possibility of standardising variables and transforming qualitative variables into binary variables (one-hot encoding) using standardQuant and binarQual. (data.frame)

m

Minimum size constraint. Must be positive or zero and small enough for the problem to be feasible. Default is 0 (no constraint). (positive number)

M

Maximum size constraint. Must be positive, superior or equal to m and large enough for the problem to be feasible. Default is Inf (no constraint). (positive number)

standardQuant

TRUE if the variables in data should be standardised (i.e., centered and scaled), FALSE (default) otherwise. Standardisation is applied after the possible binarization of qualitative variables (see binarQual). (flag)

binarQual

TRUE if qualitative variables should be binarized (one-hot encoding), for example, to make the data set compatible with common distances or to standardize these variables. FALSE (default) otherwise. (flag)

search

Type of traversal to apply on the fusion tree. Three possibilities:

  • "BFS" (Depth First Search)

  • "DFS" (Breadth First Search)

  • "random" (random)

stopCriterion

Indicates when the traversal should stop. The possibilities are as follows:

  • "full": The algorithm stops when all possible fusions from the initial partition have been evaluated.

  • "first_valid_node": Stop when a first feasible solution is found. Useful when a minimum size constraint is present.

  • "first_valid_branch": Stop as soon as a node without fusion has been selected and a feasible solution has been found.

childChoice

In the case of breadth-first or depth-first search, the way in which child nodes are ordered (and accessed). Multiple choices:

  • "random": Children are ordered randomly.

  • "first_valid_node": Initial order is preserved.

  • "distance": Nodes are ordered by increasing inter-cluster distance.

regionalisation

regionalisation to start the tree traversal.

linkage

used when chilChoice = "distance". Can be a function or a string. This determines the linkage criterion and so the selection of the children nodes. If using custom distance function, it must take the pairwise distance matrix as an argument and return the linkage distance. Default implemented linkages can be seen with available_linkages().

Details

The triplet (search = "DFS", childChoice = "distance", stopCriterion = "first_valid_branch") corresponds to an Agglomerative Hierarchical Clustering (AHC) with contiguity and size constraints, not necessarily stopping at the fusion of two clusters but when a feasible partition is found. The two methods are equivalent when there is no minimum size constraint. [Experimental]

Functions

  • AHR_fusion_tree(): Shortcut for search_fusion_tree in the case of an AHC / AHR search.

See also