fssg is a small package focused on allowing users to
quickly test multiple parametric survival models in one line of code, by
stream-lining the building of many flexsurv models.
This package was originally inspired some of my other research on
Broad Genomic Profiling, where tens to hundreds of genomic mutations are
tested in one ‘batch’ using a single tissue sample. This type of
approach has many names, but can be colloquially referred to as a
“shotgun” approach, which is where the package gets it’s name:
FlexSurv ShotGun
(fssg).
To get started with fssg, we’ll simply load our package,
which will also grab the main dependency flexsurv.
The main powerhouse behind this package is the parametric survival
modeling package flexsurv. flexsurv provides a
straightforward way to build parametric survival models, including many
of the most commonly used distributions in survival modeling (weibull,
gamma, etc.). This is a very useful tool, but is limited in parametric
distribution options. Thankfully, the authors of flexsurv
included a way for users to declare and use their own custom defined
distribution. This means that as long as we know what we’re doing, we
can create models using whatever distribution we’d like!
Included in this package are some functions for the creation of
custom flexsurv-compatible distributions. As long as we
have distribution functions for the distribution
(i.e. pnorm and dnorm), and understand the
bounds of our parameters, then we can construct a distribution object
using fssg_dist.
What do we need in order to build a custom distribution?
fssg back-end, but technically
isn’t that important if you’re not doing batch testing via
fssg.flexsurv know what
parameters need to be initialized and optimized in the fitting
process.flexsurv calls this the ‘location’
variable, as it originally focused primarily on location-scale
distributions, but non-location parameters may be supplied as well.
Please note, that this can only be one parameter in
flexsurv, but we can allow other parameters to vary by
supplying a list into the “anc” parameter in flexsurvreg.
See ?flexsurvreg for more information on this.flexsurv also
requests for the inverse functions to allow for back transformation to
the original parameter scale. These are supplied to the distribution
object as a list of functions. Most often, when parameters must be
greater than 0, the transforms function is something like
log, while the inverse transformation is
exp.flexsurv actually allows these
to be specified in one of two primary ways: with a density and
distribution function (‘d’, ‘p’), or with a Hazard and Cumulative Hazard
function (‘h’,‘H’). It also helps to include the ‘q’ function when
possible. fssg includes some helper functions to construct
these distribution functions based off a ‘d’ and ‘p’ function (See
?survivify).Let’s try an example. fssg contains some custom
distribution functions for the Gamma-Gompertz distribution
(e.g. dgamgomp, pgamgomp). These distributions
functions are specified in a similar fashion to the native R
distribution functions like pnorm, dnorm.
# 'Density' function
dgamgomp(x=1, b=1, sigma=1, beta=1)
#> [1] 0.3678794
# 'Distribution' function
pgamgomp(q=1, b=1, sigma=1, beta=1)
#> [1] 0.6321206
# Constructing our distribution object
fssg_gamgomp <- fssg_dist(
name='gamgomp', # a simple name
pars=c('b','sigma','beta'), # parameters are scale, shape, shape respectively
location='b', # name of the parameter we want to vary
transforms=c(log,log,log), # transformation functions for each of our parameters (all must be >0)
inv.transforms=c(exp,exp,exp), # back-transformation functions for each of our parameters
inits=function(t){c(1/max(t), 1, 1)}, # function that constructs the initial parameter estimates for optimization
d = dgamgomp, # density function
p = pgamgomp, # distribution function.
q = quantilify(pgamgomp), # quantile function (optional)
h = hazardify(dgamgomp, pgamgomp), # hazard function (optional)
H = cumhazardify(pgamgomp), # cumulative hazard function (optional)
fullname='gamma_gompertz' # secondary name(s) which is used in the back-end of `fssg` to better label outputs
)We can then use this distribution object directly with the normal
flexsurvreg function. It’s recommended that you specify the
dfns parameter explicitly, as otherwise it will attempt to
use globally declared d and p variables based on your distribution
name.
# Sample data set provided in the package
data('pseudo', package='fssg')
flexsurvreg(
formula = Surv(time, death) ~ 1,
data = pseudo,
dist = fssg_gamgomp,
dfns = list(fssg_gamgomp$d, fssg_gamgomp$p)
) -> f1
print(f1)
#> Call:
#> flexsurvreg(formula = Surv(time, death) ~ 1, data = pseudo, dist = fssg_gamgomp,
#> dfns = list(fssg_gamgomp$d, fssg_gamgomp$p))
#>
#> Estimates:
#> est L95% U95% se
#> b 1.57e-01 1.50e-01 1.65e-01 3.75e-03
#> sigma 5.08e-01 4.50e-01 5.74e-01 3.17e-02
#> beta 1.86e+06 1.10e+06 3.13e+06 4.96e+05
#>
#> N = 10000, Events: 5228, Censored: 4772
#> Total time at risk: 930539
#> Log-likelihood = -23777.51, df = 3
#> AIC = 47561.02
plot(f1)To save users some time, we’ve already specified a large number of
custom distributions which can be cleanly accessed using
fssg_dist_list. This function creates a list of each
included distribution object. If you want to grab a specific
distribution object from this list, you can directly grab it using
get_fssg_dist.
get_fssg_dist('gamma_gompertz')
#> $name
#> [1] "gamgomp"
#>
#> $pars
#> [1] "b" "sigma" "beta"
#>
#> $location
#> [1] "b"
#>
#> $transforms
#> $transforms[[1]]
#> function (x, base = exp(1)) .Primitive("log")
#>
#> $transforms[[2]]
#> function (x, base = exp(1)) .Primitive("log")
#>
#> $transforms[[3]]
#> function (x, base = exp(1)) .Primitive("log")
#>
#>
#> $inv.transforms
#> $inv.transforms[[1]]
#> function (x) .Primitive("exp")
#>
#> $inv.transforms[[2]]
#> function (x) .Primitive("exp")
#>
#> $inv.transforms[[3]]
#> function (x) .Primitive("exp")
#>
#>
#> $inits
#> function (t)
#> {
#> c(1/max(t), 1, 1)
#> }
#> <bytecode: 0x00000290f301a730>
#> <environment: 0x00000290fa290438>
#>
#> $d
#> function (x, b, sigma, beta, log = FALSE)
#> {
#> return(dgamgomp_c(x, b, sigma, beta, log))
#> }
#> <bytecode: 0x00000290ef38b648>
#> <environment: namespace:fssg>
#>
#> $p
#> function (q, b, sigma, beta, lower.tail = TRUE, log.p = FALSE)
#> {
#> return(pgamgomp_c(q, b, sigma, beta, lower.tail, log.p))
#> }
#> <bytecode: 0x00000290ef384500>
#> <environment: namespace:fssg>
#>
#> $q
#> function (p, ...)
#> {
#> lower = -1e+06
#> upper = 1e+100
#> lower_init <- if (p_function(0, ...) == 0)
#> 0
#> else lower
#> sapply(p, function(pi) {
#> if (is.na(pi)) {
#> return(NA_real_)
#> }
#> if (pi < 0 || pi > 1) {
#> return(NaN)
#> }
#> if (pi == 0) {
#> return(-Inf)
#> }
#> if (pi == 1) {
#> return(Inf)
#> }
#> stats::uniroot(function(x) p_function(x, ...) - pi, lower = lower_init,
#> upper = upper)$root
#> })
#> }
#> <bytecode: 0x00000290f33c14c0>
#> <environment: 0x00000290fa27aee8>
#>
#> $h
#> function (x, ...)
#> {
#> fx <- d_function(x, ...)
#> Fx <- p_function(q = x, ...)
#> fx/(1 - Fx)
#> }
#> <bytecode: 0x00000290f3405178>
#> <environment: 0x00000290fa27b2b0>
#>
#> $H
#> function (x, ...)
#> {
#> sx <- 1 - p_function(q = x, ...)
#> -log(sx)
#> }
#> <bytecode: 0x00000290f340b808>
#> <environment: 0x00000290fa27b5c0>
#>
#> $fullname
#> [1] "gamma_gompertz"We can also use the fssg function to build an individual
flexsurvreg model by giving it individual model names! More
on this function later.
fssg(
formula = Surv(time, death) ~ 1,
data = pseudo,
model='gamma_gompertz'
) -> model1
#> Beginning...
#> ------------------------------------------------------------------------
#> gamma_gompertz
#> gamma_gompertz: 2.13 sec elapsed
#> ------------------------------------------------------------------------
#> Final Summary!
#> Model with best AIC: gamma_gompertz
#> Model with best BIC: gamma_gompertz
#> Kapow!: 2.24 sec elapsed
model1$models$gamma_gompertz
#> Call:
#> flexsurv::flexsurvreg(formula = formula, data = data, dist = i,
#> dfns = list(d = i$d, p = i$p), method = opt_method)
#>
#> Estimates:
#> est L95% U95% se
#> b 1.57e-01 1.50e-01 1.65e-01 3.75e-03
#> sigma 5.08e-01 4.50e-01 5.74e-01 3.17e-02
#> beta 1.86e+06 1.10e+06 3.13e+06 4.96e+05
#>
#> N = 10000, Events: 5228, Censored: 4772
#> Total time at risk: 930539
#> Log-likelihood = -23777.51, df = 3
#> AIC = 47561.02fssg FunctionThe titular fssg function provides a one-line way to
test the fit of several distributions on your data. Simply provide it
your model formula and data, and watch it run. By default, the function
will attempt to build all available flexsurv and
fssg parametric distribution models on your data. Progress
will be printed by default, and any models that failed will list the
corresponding error. The function always prints a summary of the models
it ran, and by default returns a list with two items: the summary table
as a data.frame, and a list containing each successful model.
fssg(
survival::Surv(time, status) ~ x,
data = survival::aml
) -> aml_model
#> Beginning...
#> ------------------------------------------------------------------------
#> genf
#> genf: 0.05 sec elapsed
#> ------------------------------------------------------------------------
#> genf.orig
#> genf.orig: 0.27 sec elapsed
#> ------------------------------------------------------------------------
#> gengamma
#> gengamma: 0.01 sec elapsed
#> ------------------------------------------------------------------------
#> gengamma.orig
#> gengamma.orig: 0.22 sec elapsed
#> ------------------------------------------------------------------------
#> exp
#> exp: 0 sec elapsed
#> ------------------------------------------------------------------------
#> weibull
#> weibull: 0.02 sec elapsed
#> ------------------------------------------------------------------------
#> weibullPH
#> weibullPH: 0 sec elapsed
#> ------------------------------------------------------------------------
#> lnorm
#> lnorm: 0 sec elapsed
#> ------------------------------------------------------------------------
#> gamma
#> gamma: 0.01 sec elapsed
#> ------------------------------------------------------------------------
#> gompertz
#> gompertz: 0 sec elapsed
#> ------------------------------------------------------------------------
#> llogis
#> llogis: 0.02 sec elapsed
#> ------------------------------------------------------------------------
#> betaprime
#> betaprime: 0.01 sec elapsed
#> ------------------------------------------------------------------------
#> cauchy_scale
#> cauchy_scale: 0.02 sec elapsed
#> ------------------------------------------------------------------------
#> cauchy_location
#> cauchy_location: 0.01 sec elapsed
#> ------------------------------------------------------------------------
#> dagum
#> dagum: 0.07 sec elapsed
#> ------------------------------------------------------------------------
#> exp_log
#> exp_log: 0.01 sec elapsed
#> ------------------------------------------------------------------------
#> gen_extr_val
#> gen_extr_val: 0.03 sec elapsed
#> ------------------------------------------------------------------------
#> gen_extr_val_location
#> gen_extr_val_location: 0.05 sec elapsed
#> ------------------------------------------------------------------------
#> fatigue2_scale
#> fatigue2_scale: 0 sec elapsed
#> ------------------------------------------------------------------------
#> fatigue3_scale
#> fatigue3_scale: 0.06 sec elapsed
#> ------------------------------------------------------------------------
#> fatigue3_location
#> fatigue3_location: 0.07 sec elapsed
#> ------------------------------------------------------------------------
#> fold_norm_scale
#> fold_norm_scale: 0.01 sec elapsed
#> ------------------------------------------------------------------------
#> fold_norm_location
#> fold_norm_location: 0.06 sec elapsed
#> ------------------------------------------------------------------------
#> feller_pareto
#> feller_pareto: 0.11 sec elapsed
#> ------------------------------------------------------------------------
#> frechet
#> frechet: 0.03 sec elapsed
#> ------------------------------------------------------------------------
#> gamma_gompertz
#> Error in gamma_gompertz model : Error in optim(method = "Nelder-Mead", par = c(b = -5.08140436498446, : non-finite finite-difference value [1]
#> gamma_gompertz: 0.1 sec elapsed
#> ------------------------------------------------------------------------
#> gumbel
#> gumbel: 0.01 sec elapsed
#> ------------------------------------------------------------------------
#> gumbel_T2
#> gumbel_T2: 0.02 sec elapsed
#> ------------------------------------------------------------------------
#> hypertab_a
#> hypertab_a: 0.02 sec elapsed
#> ------------------------------------------------------------------------
#> hypertab_b
#> hypertab_b: 0.01 sec elapsed
#> ------------------------------------------------------------------------
#> inv_chisq
#> inv_chisq: 0 sec elapsed
#> ------------------------------------------------------------------------
#> inv_gamma
#> inv_gamma: 0.02 sec elapsed
#> ------------------------------------------------------------------------
#> inv_gaussian_scale
#> inv_gaussian_scale: 0.03 sec elapsed
#> ------------------------------------------------------------------------
#> inv_gaussian_location
#> inv_gaussian_location: 0.03 sec elapsed
#> ------------------------------------------------------------------------
#> inv_lind
#> inv_lind: 0.01 sec elapsed
#> ------------------------------------------------------------------------
#> laplace
#> laplace: 0.02 sec elapsed
#> ------------------------------------------------------------------------
#> levy
#> Error in levy model : Error in optim(method = "Nelder-Mead", par = c(location = 0, scale = 3.13549421592915, : non-finite finite-difference value [1]
#> levy: 0.02 sec elapsed
#> ------------------------------------------------------------------------
#> lindley
#> lindley: 0 sec elapsed
#> ------------------------------------------------------------------------
#> log_cauchy_location
#> log_cauchy_location: 0.01 sec elapsed
#> ------------------------------------------------------------------------
#> lomax_scale
#> lomax_scale: 0.03 sec elapsed
#> ------------------------------------------------------------------------
#> lomax_shape
#> lomax_shape: 0.02 sec elapsed
#> ------------------------------------------------------------------------
#> nakagami
#> nakagami: 0.01 sec elapsed
#> ------------------------------------------------------------------------
#> pareto_1
#> Error in pareto_1 model : Error in optim(method = "Nelder-Mead", par = c(scale = 1.38629436111989, : non-finite finite-difference value [1]
#> pareto_1: 0.04 sec elapsed
#> ------------------------------------------------------------------------
#> pareto_2
#> Error in pareto_2 model : Error in optim(method = "Nelder-Mead", par = c(location = 1.6094379124341, : non-finite finite-difference value [1]
#> pareto_2: 0.04 sec elapsed
#> ------------------------------------------------------------------------
#> pareto_3
#> Error in pareto_3 model : Error in optim(method = "Nelder-Mead", par = c(location = 1.6094379124341, : non-finite finite-difference value [1]
#> pareto_3: 0.06 sec elapsed
#> ------------------------------------------------------------------------
#> pareto_4
#> Error in pareto_4 model : Error in optim(method = "Nelder-Mead", par = c(location = 1.6094379124341, : non-finite finite-difference value [1]
#> pareto_4: 0.08 sec elapsed
#> ------------------------------------------------------------------------
#> rayleigh
#> rayleigh: 0 sec elapsed
#> ------------------------------------------------------------------------
#> rice_scale
#> rice_scale: 0.02 sec elapsed
#> ------------------------------------------------------------------------
#> rice_location
#> rice_location: 0.01 sec elapsed
#> ------------------------------------------------------------------------
#> shift_gomp
#> shift_gomp: 0.02 sec elapsed
#> ------------------------------------------------------------------------
#> sinmad
#> sinmad: 0.03 sec elapsed
#> ------------------------------------------------------------------------
#> scale_inv_chisq
#> scale_inv_chisq: 0.02 sec elapsed
#> ------------------------------------------------------------------------
#> Final Summary!
#> Model with best AIC: inv_lind
#> Model with best BIC: inv_lind
#> Kapow!: 1.93 sec elapsed
# Summary Table
aml_model$summary
#> dist_name dist_source dist_ran aic bic loglik
#> 1 inv_lind fssg TRUE 162.5793 164.8503 -79.28967
#> 2 inv_gaussian_location fssg TRUE 162.8084 166.2149 -78.40420
#> 3 scale_inv_chisq fssg TRUE 163.1561 166.5626 -78.57806
#> 4 inv_gamma fssg TRUE 163.1561 166.5626 -78.57806
#> 5 gumbel_T2 fssg TRUE 163.2167 166.6232 -78.60835
#> 6 fatigue2_scale fssg TRUE 163.2254 166.6319 -78.61270
#> 7 lnorm flexsurv TRUE 163.8552 167.2617 -78.92761
#> 8 hypertab_b fssg TRUE 164.0928 167.4993 -79.04641
#> 9 hypertab_a fssg TRUE 164.6250 168.0315 -79.31252
#> 10 llogis flexsurv TRUE 164.7053 168.1118 -79.35266
#> 11 lindley fssg TRUE 165.0135 167.2845 -80.50675
#> 12 gengamma flexsurv TRUE 165.1502 169.6921 -78.57508
#> 13 betaprime fssg TRUE 165.1612 169.7032 -78.58061
#> 14 frechet fssg TRUE 165.1707 169.7127 -78.58536
#> 15 dagum fssg TRUE 165.2174 169.7593 -78.60868
#> 16 inv_gaussian_scale fssg TRUE 166.0326 169.4391 -80.01629
#> 17 gen_extr_val_location fssg TRUE 166.2521 170.7941 -79.12605
#> 18 gamma flexsurv TRUE 166.2720 169.6785 -80.13601
#> 19 gengamma.orig flexsurv TRUE 166.3323 170.8743 -79.16617
#> 20 exp flexsurv TRUE 166.5746 168.8456 -81.28729
#> 21 sinmad fssg TRUE 166.5899 171.1319 -79.29494
#> 22 cauchy_location fssg TRUE 166.6615 170.0680 -80.33077
#> 23 cauchy_scale fssg TRUE 167.0362 170.4426 -80.51808
#> 24 weibull flexsurv TRUE 167.0433 170.4498 -80.52165
#> 25 weibullPH flexsurv TRUE 167.0433 170.4498 -80.52165
#> 26 laplace fssg TRUE 167.1507 170.5572 -80.57534
#> 27 genf flexsurv TRUE 167.1508 172.8283 -78.57542
#> 28 genf.orig flexsurv TRUE 167.1518 172.8293 -78.57591
#> 29 gumbel fssg TRUE 167.2659 170.6724 -80.63297
#> 30 shift_gomp fssg TRUE 167.6457 171.0522 -80.82285
#> 31 feller_pareto fssg TRUE 168.3251 175.1381 -78.16257
#> 32 gen_extr_val fssg TRUE 168.4282 172.9701 -80.21409
#> 33 nakagami fssg TRUE 168.4514 171.8579 -81.22571
#> 34 gompertz flexsurv TRUE 168.4822 171.8886 -81.24108
#> 35 lomax_shape fssg TRUE 168.5243 171.9308 -81.26216
#> 36 exp_log fssg TRUE 168.5765 171.9830 -81.28824
#> 37 lomax_scale fssg TRUE 168.5771 171.9836 -81.28857
#> 38 log_cauchy_location fssg TRUE 169.8337 173.2402 -81.91684
#> 39 fold_norm_scale fssg TRUE 169.8585 175.5360 -79.92926
#> 40 fatigue3_scale fssg TRUE 170.6104 175.1524 -81.30521
#> 41 rayleigh fssg TRUE 173.4173 175.6883 -84.70864
#> 42 fatigue3_location fssg TRUE 173.7757 178.3177 -82.88787
#> 43 rice_scale fssg TRUE 175.4175 178.8240 -84.70875
#> 44 fold_norm_location fssg TRUE 176.1256 181.8031 -83.06279
#> 45 rice_location fssg TRUE 189.1910 192.5975 -91.59550
#> 46 inv_chisq fssg TRUE 197.1167 199.3877 -96.55836
#> 47 gamma_gompertz fssg FALSE NA NA NA
#> 48 levy fssg FALSE NA NA NA
#> 49 pareto_1 fssg FALSE NA NA NA
#> 50 pareto_2 fssg FALSE NA NA NA
#> 51 pareto_3 fssg FALSE NA NA NA
#> 52 pareto_4 fssg FALSE NA NA NA
# List of Models
head(aml_model$models)
#> $genf
#> Call:
#> flexsurv::flexsurvreg(formula = formula, data = data, dist = current_dist,
#> method = opt_method)
#>
#> Estimates:
#> data mean est L95% U95% se
#> mu NA 3.26e+00 2.37e+00 4.16e+00 4.58e-01
#> sigma NA 8.27e-01 5.47e-01 1.25e+00 1.74e-01
#> Q NA -7.45e-01 -2.59e+00 1.10e+00 9.43e-01
#> P NA 6.19e-04 6.44e-50 5.94e+42 3.34e-02
#> xNonmaintained 5.22e-01 -7.02e-01 -1.39e+00 -1.41e-02 3.51e-01
#> exp(est) L95% U95%
#> mu NA NA NA
#> sigma NA NA NA
#> Q NA NA NA
#> P NA NA NA
#> xNonmaintained 4.96e-01 2.49e-01 9.86e-01
#>
#> N = 23, Events: 18, Censored: 5
#> Total time at risk: 678
#> Log-likelihood = -78.57542, df = 5
#> AIC = 167.1508
#>
#>
#> $genf.orig
#> Call:
#> flexsurv::flexsurvreg(formula = formula, data = data, dist = current_dist,
#> method = opt_method)
#>
#> Estimates:
#> data mean est L95% U95% se
#> mu NA 3.26e+00 2.37e+00 4.16e+00 4.58e-01
#> sigma NA 1.11e+00 7.25e-02 1.70e+01 1.55e+00
#> s1 NA 1.34e+03 3.00e-27 5.95e+32 4.66e+04
#> s2 NA 1.80e+00 1.26e-02 2.59e+02 4.57e+00
#> xNonmaintained 5.22e-01 -7.02e-01 -1.39e+00 -1.40e-02 3.51e-01
#> exp(est) L95% U95%
#> mu NA NA NA
#> sigma NA NA NA
#> s1 NA NA NA
#> s2 NA NA NA
#> xNonmaintained 4.96e-01 2.49e-01 9.86e-01
#>
#> N = 23, Events: 18, Censored: 5
#> Total time at risk: 678
#> Log-likelihood = -78.57591, df = 5
#> AIC = 167.1518
#>
#>
#> $gengamma
#> Call:
#> flexsurv::flexsurvreg(formula = formula, data = data, dist = current_dist,
#> method = opt_method)
#>
#> Estimates:
#> data mean est L95% U95% se exp(est)
#> mu NA 3.2633 2.3662 4.1603 0.4577 NA
#> sigma NA 0.8274 0.5476 1.2501 0.1742 NA
#> Q NA -0.7449 -2.5931 1.1032 0.9430 NA
#> xNonmaintained 0.5217 -0.7021 -1.3902 -0.0141 0.3510 0.4955
#> L95% U95%
#> mu NA NA
#> sigma NA NA
#> Q NA NA
#> xNonmaintained 0.2490 0.9860
#>
#> N = 23, Events: 18, Censored: 5
#> Total time at risk: 678
#> Log-likelihood = -78.57508, df = 4
#> AIC = 165.1502
#>
#>
#> $gengamma.orig
#> Call:
#> flexsurv::flexsurvreg(formula = formula, data = data, dist = current_dist,
#> method = opt_method)
#>
#> Estimates:
#> data mean est L95% U95% se
#> shape NA 2.46e-01 6.59e-02 9.22e-01 1.66e-01
#> scale NA 1.33e-04 1.89e-16 9.35e+07 1.85e-03
#> k NA 2.24e+01 1.60e+00 3.13e+02 3.01e+01
#> xNonmaintained 5.22e-01 -7.55e-01 -1.51e+00 -1.37e-03 3.84e-01
#> exp(est) L95% U95%
#> shape NA NA NA
#> scale NA NA NA
#> k NA NA NA
#> xNonmaintained 4.70e-01 2.21e-01 9.99e-01
#>
#> N = 23, Events: 18, Censored: 5
#> Total time at risk: 678
#> Log-likelihood = -79.16617, df = 4
#> AIC = 166.3323
#>
#>
#> $exp
#> Call:
#> flexsurv::flexsurvreg(formula = formula, data = data, dist = current_dist,
#> method = opt_method)
#>
#> Estimates:
#> data mean est L95% U95% se exp(est)
#> rate NA 0.01655 0.00789 0.03471 0.00625 NA
#> xNonmaintained 0.52174 0.95809 0.01046 1.90572 0.48349 2.60672
#> L95% U95%
#> rate NA NA
#> xNonmaintained 1.01052 6.72428
#>
#> N = 23, Events: 18, Censored: 5
#> Total time at risk: 678
#> Log-likelihood = -81.28729, df = 2
#> AIC = 166.5746
#>
#>
#> $weibull
#> Call:
#> flexsurv::flexsurvreg(formula = formula, data = data, dist = current_dist,
#> method = opt_method)
#>
#> Estimates:
#> data mean est L95% U95% se exp(est)
#> shape NA 1.264 0.892 1.793 0.225 NA
#> scale NA 60.889 33.828 109.599 18.260 NA
#> xNonmaintained 0.522 -0.929 -1.679 -0.180 0.383 0.395
#> L95% U95%
#> shape NA NA
#> scale NA NA
#> xNonmaintained 0.187 0.836
#>
#> N = 23, Events: 18, Censored: 5
#> Total time at risk: 678
#> Log-likelihood = -80.52165, df = 3
#> AIC = 167.0433In many cases, the user may only want to try a subset of the
available models. This can be done by giving fssg only the
names of the desired distributions. These should be strings and
correspond to either the name or fullname
field of an included distribution object.
fssg(
survival::Surv(time, death) ~ gender,
data = pseudo,
models = c('gamma_gompertz','gompertz','gumbel','gumbel_T2')
)$summary
#> Beginning...
#> ------------------------------------------------------------------------
#> gompertz
#> gompertz: 0.23 sec elapsed
#> ------------------------------------------------------------------------
#> gamma_gompertz
#> gamma_gompertz: 3.33 sec elapsed
#> ------------------------------------------------------------------------
#> gumbel
#> gumbel: 0.58 sec elapsed
#> ------------------------------------------------------------------------
#> gumbel_T2
#> gumbel_T2: 1.22 sec elapsed
#> ------------------------------------------------------------------------
#> Final Summary!
#> Model with best AIC: gamma_gompertz
#> Model with best BIC: gamma_gompertz
#> Kapow!: 5.47 sec elapsed
#> dist_name dist_source dist_ran aic bic loglik
#> 1 gamma_gompertz fssg TRUE 47562.28 47591.12 -23777.14
#> 2 gumbel fssg TRUE 47737.97 47759.60 -23865.99
#> 3 gumbel_T2 fssg TRUE 48181.63 48203.26 -24087.81
#> 4 gompertz flexsurv TRUE 48206.60 48228.23 -24100.30Due to the large number of distributions considered in the
fssg function, it is likely that one or more of them will
fail to work with the provided data. This is typically due to one of
three reasons:
optim perturbs a parameters, but the new
perturbed value leads to invalid results such as NaN or
Inf.fssg includes a basic diagnostic function for checking
whether the init function for a distribution will fail or
not based on specific data. This checks if the initial distribution
parameters provided by init will allow the distribution
function to return valid values for each observed time values.
Each model created by fssg can be interacted with in the
same way you would interact with flexsurv models. Here’s a
quick example of how to look into a specific model using the Gumbel
distribution.
fssg(survival::Surv(time, status) ~ age + sex,
data = survival::cancer,
models = c('gamma_gompertz','gompertz','gumbel','gumbel_T2'), progress = F) -> output
#> Beginning...
#> ------------------------------------------------------------------------
#> Final Summary!
#> Model with best AIC: gamma_gompertz
#> Model with best BIC: gompertz
#> Kapow!: 0.5 sec elapsed
output$models$gumbel
#> Call:
#> flexsurv::flexsurvreg(formula = formula, data = data, dist = i,
#> dfns = list(d = i$d, p = i$p), method = opt_method)
#>
#> Estimates:
#> data mean est L95% U95% se exp(est)
#> mu NA 161.71987 88.62247 234.81728 37.29528 NA
#> sigma NA 244.44378 65.47783 912.56476 164.28841 NA
#> age 62.44737 -0.00878 -0.02685 0.00930 0.00922 0.99126
#> sex 1.39474 0.43641 0.11710 0.75573 0.16292 1.54715
#> L95% U95%
#> mu NA NA
#> sigma NA NA
#> age 0.97350 1.00934
#> sex 1.12423 2.12917
#>
#> N = 228, Events: 165, Censored: 63
#> Total time at risk: 69593
#> Log-likelihood = -1148.943, df = 4
#> AIC = 2305.887Sex appears to be a significant variable here, as the confidence interval does not contain zero. Let’s see what this exp(est) of 1.56 amounts to visually.
plot(
output$models$gumbel,
newdata = data.frame(sex=c(1,2), age=70), col=c('blue','red'),
type='survival',
lwd=2,
)
legend(
'topright',
legend = c('Male','Female'),
col =c('blue','red'),
lwd=2
)It looks like males have a considerably higher hazard rate over time according to this model.
On the other hand, age is not significantly associated with survival time.
plot(
output$models$gumbel,
newdata = data.frame(sex=c(1), age=c(65,80)), col=c('blue','red'),
type='survival',
lwd=2
)
legend(
'topright',
legend = c('Age 65','Age 80'),
col =c('blue','red'),
lwd=2
)This makes sense, as the two survival lines appear to be very similar.
fssg also includes some options for building spline
models using flexsurvspline. fssg includes two
arguments which are used to build splines:
spline: By default this is NA, but will create spline
models when supplied with the name(s) of the desired spline-methods.
Include ‘rp’ or ‘wy’ for Royston-Parmar natural cubic spline, or
Wang-Yan alternative natural cubic spline respectively. Note: The
Wang-Yan version requires the package splines2ns.max_knots: An integer specifying the maximum number of
knots to be considered in the spline models. fssg will
attempt to fit splines using every integer of knots from 1 to
max_knots. Knots are placed at equally spaced quantiles
based on the survival time distribution.It’s worth noting that flexsurvspline allows for three
different scales to be used in the spline estimation: ‘hazard’, ‘odds’
and ‘normal’. fssg will attempt to build splines using each
of these scales, and labels to output model accordingly.
fssg(
survival::Surv(time, status) ~ x,
data = survival::aml,
models = c('weibull'),
spline = c('rp','wy'), # include both methods
max_knots = 3 # attempt up to 3 knots
) -> spline_models
#> Beginning...
#> ------------------------------------------------------------------------
#> weibull
#> weibull: 0 sec elapsed
#> Loading required namespace: splines2
#> ------------------------------------------------------------------------
#> Spline
#> Error in spline_rp_hazard_2 model : Error in optim(method = "Nelder-Mead", par = c(gamma0 = -4.78433978320387, : function cannot be evaluated at initial parameters
#> Error in spline_rp_hazard_3 model : Error in optim(method = "Nelder-Mead", par = c(gamma0 = -5.05412785907675, : function cannot be evaluated at initial parameters
#> Error in spline_wy_hazard_2 model : Error in optim(method = "Nelder-Mead", par = c(gamma0 = -2.26513881676761, : function cannot be evaluated at initial parameters
#> Error in spline_wy_hazard_3 model : Error in optim(method = "Nelder-Mead", par = c(gamma0 = -2.27523490522575, : function cannot be evaluated at initial parameters
#> weibull: 1.55 sec elapsed
#> ------------------------------------------------------------------------
#> Final Summary!
#> Model with best AIC: spline_wy_normal_1
#> Model with best BIC: spline_wy_normal_1
#> Kapow!: 1.66 sec elapsed
spline_models$summary
#> dist_name dist_source dist_ran aic bic loglik
#> 1 spline_wy_normal_1 flexsurv TRUE 165.8190 170.3610 -78.90950
#> 2 spline_rp_normal_1 flexsurv TRUE 165.8190 170.3610 -78.90950
#> 3 spline_wy_normal_3 flexsurv TRUE 166.0324 172.8454 -77.01621
#> 4 spline_rp_normal_3 flexsurv TRUE 166.0325 172.8454 -77.01623
#> 5 spline_rp_odds_1 flexsurv TRUE 166.7050 171.2469 -79.35248
#> 6 spline_wy_odds_1 flexsurv TRUE 166.7050 171.2469 -79.35248
#> 7 spline_wy_odds_3 flexsurv TRUE 166.7448 173.5578 -77.37241
#> 8 spline_rp_odds_3 flexsurv TRUE 166.7448 173.5578 -77.37241
#> 9 spline_rp_normal_2 flexsurv TRUE 167.0326 172.7101 -78.51630
#> 10 spline_wy_normal_2 flexsurv TRUE 167.0326 172.7101 -78.51630
#> 11 weibull flexsurv TRUE 167.0433 170.4498 -80.52165
#> 12 spline_wy_odds_2 flexsurv TRUE 167.1662 172.8437 -78.58309
#> 13 spline_rp_odds_2 flexsurv TRUE 167.1662 172.8437 -78.58310
#> 14 spline_rp_hazard_1 flexsurv TRUE 167.5169 172.0588 -79.75843
#> 15 spline_wy_hazard_1 flexsurv TRUE 167.5169 172.0588 -79.75843
# Best model has 1 knot
# plot(spline_models$models$spline_wy_normal_1)
# Model with 3 knots
# plot(spline_models$models$spline_wy_normal_3)By default, fssg provides AIC, BIC, and Log Likelihood
for the models in the summary table.
# Example using our models on the aml dataset
head(aml_model$summary)
#> dist_name dist_source dist_ran aic bic loglik
#> 1 inv_lind fssg TRUE 162.5793 164.8503 -79.28967
#> 2 inv_gaussian_location fssg TRUE 162.8084 166.2149 -78.40420
#> 3 scale_inv_chisq fssg TRUE 163.1561 166.5626 -78.57806
#> 4 inv_gamma fssg TRUE 163.1561 166.5626 -78.57806
#> 5 gumbel_T2 fssg TRUE 163.2167 166.6232 -78.60835
#> 6 fatigue2_scale fssg TRUE 163.2254 166.6319 -78.61270Included in this package is a function that compiles a wide variety of fit statistics for a model, including concordance, AUCs, Mean Absolute Error (MAE), Integrated Absolute Error (IAE), Integrated Square Error (ISE), Brier Scores, and optionally the very slow to calculate Integrated Brier Scores.
get_fit_stats(aml_model$models$inv_lind, ibs=T) -> fitstats
#> Loading required namespace: survAUC
#> Loading required namespace: SurvMetrics
fitstats
#> $Harrel.C.Index
#> [1] 0.6190476
#>
#> $Uno.C.Index
#> [1] 0.6144108
#>
#> $iAUC
#> [1] 0.689442
#>
#> $iAUC.IQR
#> [1] 0.6506163
#>
#> $iAUC.Q10.Q90
#> [1] 0.6942003
#>
#> $iAUC.Full
#> [1] 0.689442
#>
#> $AUC.Median
#> [1] 0.6097697
#>
#> $AUC.Mean
#> [1] 0.584504
#>
#> $C.Index
#> [1] NA
#>
#> $MAE
#> [1] NA
#>
#> $IAE
#> [1] 2.9773
#>
#> $ISE
#> [1] 0.3439
#>
#> $IAE.IQR
#> [1] 3.7387
#>
#> $ISE.IQR
#> [1] 0.5323
#>
#> $IAE.Q10.Q90
#> [1] 3.11
#>
#> $ISE.Q10.Q90
#> [1] 0.3611
#>
#> $IAE.Full
#> [1] 2.9773
#>
#> $ISE.Full
#> [1] 0.3439
#>
#> $Brier.Median
#> [1] 0.405603
#>
#> $Brier.Mean
#> [1] 0.232592
#>
#> $IBS
#> [1] 0.146813
#>
#> $IBS.Full
#> [1] 0.107497More specifics on these metrics can be found in the vignette “Fit_Statistics”.