Create Centroids — create_centroids • GSgalgoR

This function creates the signature centroids estimated from the `galgo()` output and the expression matrix of the training set.
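The core idea can be sketched in a few lines of base R. This is a minimal sketch under stated assumptions, not GSgalgoR's implementation. It assumes each centroid is the per-feature mean expression over the samples assigned to one subtype, and it takes the subtype labels as given. The package presumably derives those labels by clustering the samples on the signature features with the chosen `distancetype`. The function name `compute_centroids_sketch` and the toy data are hypothetical.

```r
# Hypothetical sketch, not GSgalgoR's internal code.
# expr:     features x samples expression matrix
# features: row names of the signature features
# classes:  subtype label for each column (sample) of expr
compute_centroids_sketch <- function(expr, features, classes) {
  sub <- expr[features, , drop = FALSE]
  labels <- sort(unique(classes))
  # one column per subtype: mean expression of each feature in that subtype
  cent <- sapply(labels, function(k) {
    rowMeans(sub[, classes == k, drop = FALSE])
  })
  # sapply() drops to a vector for a single feature, so rebuild the matrix
  matrix(cent, nrow = length(features),
         dimnames = list(features, paste0("subtype_", labels)))
}

# toy usage: 5 features, 6 samples, 2 subtypes
set.seed(1)
expr <- matrix(rnorm(30), nrow = 5,
               dimnames = list(paste0("gene", 1:5), paste0("s", 1:6)))
classes <- c(1, 1, 2, 2, 2, 1)
cent <- compute_centroids_sketch(expr, c("gene1", "gene3"), classes)
cent  # 2 features (rows) x 2 subtypes (columns)
```

The result has the same orientation as the matrices described under Value: rows are signature features and columns are subtype centroids.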

create_centroids(output, solution_names, trainset,
  distancetype = "pearson")

Arguments

output	An object of class `galgo.Obj`, such as the one returned by `galgo()`
solution_names	A `character` vector with the names of the solutions for which the centroids are to be calculated (for example, the `solution` column returned by `non_dominated_summary()`)
trainset	A `matrix` or `data.frame`. Must be an expression matrix with features in rows and samples in columns
distancetype	A `character` string naming the distance metric: one of `'pearson'` (the default), `'uncentered'`, `'spearman'` or `'euclidean'`
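For reference, the four `distancetype` options correspond to standard sample-to-sample dissimilarities. The sketch below uses common textbook definitions: one minus the correlation for the correlation-based options, where "uncentered" means correlation without mean subtraction (cosine similarity). GSgalgoR's own distance routine may differ in detail, and `sample_distance_sketch` is a hypothetical name.

```r
# Hypothetical sketch of the four distance options between samples (columns).
# x: features x samples numeric matrix
sample_distance_sketch <- function(x, type = c("pearson", "uncentered",
                                               "spearman", "euclidean")) {
  type <- match.arg(type)
  switch(type,
    # 1 - Pearson correlation between samples
    pearson = 1 - cor(x, method = "pearson"),
    # 1 - cosine similarity (correlation without centering)
    uncentered = {
      norms <- sqrt(colSums(x^2))
      1 - crossprod(x) / outer(norms, norms)
    },
    # 1 - Spearman rank correlation between samples
    spearman = 1 - cor(x, method = "spearman"),
    # ordinary Euclidean distance between sample columns
    euclidean = as.matrix(dist(t(x)))
  )
}

x <- cbind(c(1, 2, 3), c(2, 4, 6), c(3, 1, 0))
sample_distance_sketch(x, "pearson")[1, 2]   # ~0: columns 1 and 2 are proportional
sample_distance_sketch(x, "euclidean")[1, 2] # sqrt(14), about 3.742
```

Note that the correlation-based options ignore the scale of each sample, which is why columns 1 and 2 are at distance zero under `"pearson"` and `"uncentered"` but not under `"euclidean"`.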

Value

A list with one centroid matrix for each solution in `solution_names`. In each matrix, every column is the prototypic centroid of a subtype and every row is one of the constituent features of the solution's signature.
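Because each column of a centroid matrix is a subtype prototype over the signature features, new samples can be assigned to the subtype whose centroid they correlate with most. The helper below is a hypothetical sketch of that nearest-centroid rule using Pearson correlation; it is not a GSgalgoR function, and the toy centroids are made up for illustration.

```r
# Hypothetical nearest-centroid rule, not a GSgalgoR function.
# newdata:   features x samples matrix (may contain extra features)
# centroids: signature features x subtypes matrix
assign_to_centroid_sketch <- function(newdata, centroids) {
  # keep only the signature features, in the centroid row order
  newdata <- newdata[rownames(centroids), , drop = FALSE]
  # samples x subtypes matrix of Pearson correlations
  sim <- cor(newdata, centroids, method = "pearson")
  # label of the most correlated centroid for each sample
  colnames(centroids)[max.col(sim, ties.method = "first")]
}

centroids <- cbind(subtype_1 = c(g1 = 1, g2 = -1, g3 = 0.5),
                   subtype_2 = c(g1 = -1, g2 = 1, g3 = -0.5))
newdata <- cbind(a = c(g1 = 2, g2 = -1.5, g3 = 1, g4 = 9),
                 b = c(g1 = -0.8, g2 = 1.2, g3 = -0.2, g4 = 0))
assign_to_centroid_sketch(newdata, centroids)
# [1] "subtype_1" "subtype_2"
```

Subsetting by `rownames(centroids)` matters: the new data must be aligned to the signature features, in the same order, before correlations are computed.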

Examples

# load GSgalgoR and the example dataset
library(GSgalgoR)
library(breastCancerTRANSBIG)
data(transbig)
Train <- transbig
rm(transbig)
#> Warning: object 'transbig' not found

expression <- Biobase::exprs(Train)
clinical <- Biobase::pData(Train)
OS <- survival::Surv(time = clinical$t.rfs, event = clinical$e.rfs)

# We will use a reduced dataset for the example
expression <- expression[sample(1:nrow(expression), 100), ]

# Now we scale the expression matrix
expression <- t(scale(t(expression)))

# Run galgo
output <- GSgalgoR::galgo(generations = 5, population = 15,
    prob_matrix = expression, OS = OS)
#> Using CPU for computing pearson distance
#> Generation 1 Non-dominated solutions:
#>            k                       rnkIndex     CrowD
#> result.2   2 0.098918398 102.34123        1 0.8156654
#> result.6   9 0.013063678 266.00167        1 0.5122638
#> result.7   5 0.040735456 188.60108        1 0.3334821
#> result.9   2 0.147257481  91.05519        1       Inf
#> result.10  4 0.042905991 119.83671        1 0.7310967
#> result.11 10 0.006874307 273.11792        1       Inf
#> result.14  6 0.035951204 192.68150        1 0.4908588
#> Generation 2 Non-dominated solutions:
#>            k                       rnkIndex     CrowD
#> result.9   2 0.147257481  91.05519        1       Inf
#> result.11 10 0.006874307 273.11792        1       Inf
#> result.4   4 0.074174549 222.76597        1 1.2568850
#> result.3   2 0.124728085 128.01790        1 0.9742191
#> result.6   9 0.013063678 266.00167        1 0.6373805
#> Generation 3 Non-dominated solutions:
#>            k                      rnkIndex     CrowD
#> result.11 10 0.006874307 273.1179        1       Inf
#>            2 0.157594761 116.7450        1       Inf
#> result.4   4 0.074174549 222.7660        1 0.8744581
#>            2 0.125659477 173.8552        1 0.6204926
#>            8 0.034788163 252.8030        1 0.5502943
#>            2 0.135854613 162.1494        1 0.4198634
#> result.6   9 0.013063678 266.0017        1 0.2535294
#> Generation 4 Non-dominated solutions:
#>            k                      rnkIndex     CrowD
#> result.11 10 0.006874307 273.1179        1       Inf
#>            2 0.157594761 116.7450        1       Inf
#> result.4   4 0.074174549 222.7660        1 0.8770056
#>            8 0.034788163 252.8030        1 0.5702679
#>            2 0.123626400 179.2014        1 0.5361895
#>            2 0.135854613 162.1494        1 0.4521258
#> result.6   9 0.013063678 266.0017        1 0.2630106
#>            2 0.125659477 173.8552        1 0.1516187
#> Generation 5 Non-dominated solutions:
#>            k                      rnkIndex     CrowD
#> result.11 10 0.006874307 273.1179        1       Inf
#>            2 0.157594761 116.7450        1       Inf
#> result.4   4 0.074174549 222.7660        1 0.8613939
#>            8 0.034788163 252.8030        1 0.5652213
#>            2 0.123626400 179.2014        1 0.5223256
#>            2 0.135854613 162.1494        1 0.3207182
#>            2 0.147664203 126.5714        1 0.3120183
#> result.6   9 0.013063678 266.0017        1 0.2602698
#>            2 0.125659477 173.8552        1 0.1441415
outputDF <- to_dataframe(output)
outputList <- to_list(output)

RESULTS <- non_dominated_summary(
    output = output, OS = OS,
    prob_matrix = expression,
    distancetype = "pearson"
)
#> Using CPU for computing pearson distance
CentroidsList <- create_centroids(output, RESULTS$solution,
    trainset = expression)
#> Using CPU for computing pearson distance