Evaluates algorithms on internal/external validation indices. Poor performing algorithms can be trimmed from the ensemble. The remaining algorithms can be given weights before use in consensus functions.
Usage
consensus_evaluate(
data,
...,
cons.cl = NULL,
ref.cl = NULL,
k.method = NULL,
plot = FALSE,
trim = FALSE,
reweigh = FALSE,
n = 5,
lower = 0,
upper = 1
)
Arguments
- data
data matrix with rows as samples and columns as variables
- ...
any number of objects returned by consensus_cluster()
- cons.cl
matrix of cluster assignments from consensus functions such as kmodes and majority_voting
- ref.cl
reference class: a vector of class labels, one per sample
- k.method
determines the method used to choose k when no reference class is given. When ref.cl is not NULL, k is the number of distinct classes of ref.cl; otherwise k.method chooses k. The default is to use the PAC (proportion of ambiguous clustering) to choose the best k(s). Specifying an integer as the desired k overrides the best k chosen by PAC. Finally, specifying "all" produces consensus results for all k. The "all" method is implicitly used when only one k is given.
- plot
logical; if TRUE, graph_all is called
- trim
logical; if TRUE, algorithms that score low on internal indices are trimmed out
- reweigh
logical; if TRUE, after poorly performing algorithms are trimmed out, each remaining algorithm is reweighed depending on its internal indices.
- n
an integer specifying the top n algorithms to keep after trimming off the poorly performing ones using Rank Aggregation. If the total number of algorithms is less than n, no trimming is done.
- lower
the lower bound of the consensus index range considered ambiguous when computing the PAC
- upper
the upper bound of the consensus index range considered ambiguous when computing the PAC
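The lower and upper bounds appear to define the ambiguity window behind the PAC used to choose k. As a rough illustration only, the base R sketch below computes a PAC-style score for one consensus matrix: the fraction of sample pairs whose consensus index lies strictly between lower and upper. The helper pac_sketch and the toy matrix are hypothetical and not part of diceR; the computation inside consensus_evaluate may differ in detail.

```r
# Hypothetical helper (not a diceR function): PAC-style score for one
# consensus matrix, i.e. the share of sample pairs that are "ambiguous".
pac_sketch <- function(cm, lower = 0, upper = 1) {
  # The matrix is symmetric, so each pair is counted once via the lower triangle
  vals <- cm[lower.tri(cm)]
  # A pair is ambiguous when its consensus index is strictly inside (lower, upper)
  mean(vals > lower & vals < upper)
}

# Toy 4 x 4 consensus matrix: entry [i, j] is the proportion of runs in
# which samples i and j were assigned to the same cluster
cm <- matrix(c(1.0, 0.9, 0.1, 0.0,
               0.9, 1.0, 0.2, 0.0,
               0.1, 0.2, 1.0, 1.0,
               0.0, 0.0, 1.0, 1.0), nrow = 4, byrow = TRUE)

pac_sketch(cm)                             # 0.5
pac_sketch(cm, lower = 0.1, upper = 0.9)   # 0.1666667
```

With the defaults lower = 0 and upper = 1, any pair that is not always or never co-clustered counts as ambiguous; narrowing the window counts only pairs with more uncertain co-clustering.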
Value
consensus_evaluate returns a list with the following elements:
- k: if ref.cl is not NULL, this is the number of distinct classes in the reference; otherwise the chosen k is determined by the one giving the largest mean PAC across algorithms
- pac: a data frame showing the PAC for each combination of algorithm and cluster size
- ii: a list of data frames for all k showing internal evaluation indices
- ei: a list containing a data frame of external evaluation indices for k (computed only when ref.cl is not NULL)
- trim.obj: a list with 5 elements
  - alg.keep: algorithms kept
  - alg.remove: algorithms removed
  - rank.matrix: a matrix of ranked algorithms for every internal evaluation index
  - top.list: final order of ranked algorithms
  - E.new: a new version of a consensus_cluster data object
Details
This function always returns internal indices. If ref.cl is not NULL,
external indices are also returned. If plot = TRUE, relevant graphical
displays are produced as well. Algorithms are ranked across internal indices
using Rank Aggregation; when trim = TRUE, only the top n algorithms are kept
and the rest are trimmed.
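The ranking-and-trimming step can be approximated, for illustration only, by a simple mean-rank (Borda-style) aggregation in base R. This is a swapped-in simplification, not necessarily the Rank Aggregation method consensus_evaluate uses, so its ordering may differ. The score matrix, the algorithm and index names, and the helper trim_by_mean_rank are all hypothetical, and every index is assumed to be higher-is-better, whereas real internal indices differ in direction.

```r
# Hypothetical internal index scores: rows are algorithms, columns are indices.
# Assumption: larger values are better for every index in this toy example.
scores <- matrix(c(0.30, 0.55, 0.40,
                   0.45, 0.60, 0.35,
                   0.20, 0.50, 0.25),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("AP", "KM", "PAM"),
                                 c("index1", "index2", "index3")))

# Hypothetical helper: rank algorithms per index, aggregate by mean rank,
# and keep the top n (a stand-in for the package's Rank Aggregation step)
trim_by_mean_rank <- function(scores, n) {
  stopifnot(n >= 1)
  # Rank within each index column; rank 1 = best (largest score)
  rank.matrix <- apply(-scores, 2, rank, ties.method = "average")
  # Lower mean rank across indices means a better overall position
  mean.rank <- rowMeans(rank.matrix)
  top.list <- names(sort(mean.rank))
  # If there are fewer than n algorithms, keep them all (no trimming)
  n <- min(n, length(top.list))
  list(alg.keep = top.list[seq_len(n)],
       alg.remove = top.list[-seq_len(n)],
       rank.matrix = rank.matrix,
       top.list = top.list)
}

res <- trim_by_mean_rank(scores, n = 1)
res$top.list    # "KM" "AP" "PAM"
res$alg.keep    # "KM"
res$alg.remove  # "AP" "PAM"
```

The returned element names mirror those listed under trim.obj only to make the correspondence easy to follow; with n larger than the number of algorithms, alg.remove is empty.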
Examples
library(diceR)
# Consensus clustering for multiple algorithms
set.seed(911)
x <- matrix(rnorm(500), ncol = 10)
CC <- consensus_cluster(x, nk = 3:4, reps = 10, algorithms = c("ap", "km"),
progress = FALSE)
# Evaluate algorithms on internal/external indices and trim algorithms:
# remove those ranking low on internal indices
set.seed(1)
ref.cl <- sample(1:4, 50, replace = TRUE)
z <- consensus_evaluate(x, CC, ref.cl = ref.cl, n = 1, trim = TRUE)
str(z, max.level = 2)
#> List of 5
#> $ k : int 4
#> $ pac :'data.frame': 2 obs. of 3 variables:
#> ..$ k : chr [1:2] "3" "4"
#> ..$ AP: num [1:2] 0.505 0.48
#> ..$ KM: num [1:2] 0.514 0.498
#> $ ii :List of 2
#> ..$ 3:'data.frame': 2 obs. of 16 variables:
#> ..$ 4:'data.frame': 2 obs. of 16 variables:
#> $ ei :List of 1
#> ..$ 4:'data.frame': 2 obs. of 19 variables:
#> $ trim.obj:List of 5
#> ..$ alg.keep : chr "KM"
#> ..$ alg.remove : chr "AP"
#> ..$ rank.matrix:List of 1
#> ..$ top.list :List of 1
#> ..$ E.new :List of 1
