NEWS.md
- `Lrnr_nnls` to support binary outcomes, including support for convexity of the resultant model fit and warnings on prediction quality.
- `Lrnr_define_interactions`
- `Lrnr_bound` to support more flexible bounding for continuous outcomes (automatically setting a maximum of infinity).
- `Lrnr_cv_selector` to support improved computation of the CV-risk, averaging the risk strictly across validation/holdout sets.
- `Lrnr_earth` (improving formals recognition), `Lrnr_glmnet` (allowing offsets), and `Lrnr_caret` (reformatting of arguments).
- `Lrnr_lstm_keras` and `Lrnr_gru_keras` provide support for a list of callback functions and 2-layer networks. The default callbacks list provides early stopping criteria with respect to ‘Keras’ defaults and a patience of 10 epochs. Also, these two ‘Keras’ learners now call `args_to_list` upon initialization, and set the `verbose` argument according to `options("keras.fit_verbose")` or `options("sl3.verbose")`.
- `Lrnr_xgboost` to support prediction tasks consisting of one observation (e.g., leave-one-out cross-validation).
- `Lrnr_sl` by adding a new private slot `.cv_risk` to store the risk estimates, using this to avoid unnecessary re-computation in the `print` method (the `.cv_risk` slot is populated on the first `print` call, and only ever re-printed thereafter).
- `default_metalearner` to use native markdown tables.
- `Lrnr_screener_importance`’s pairing of (a) covariates returned by the importance function with (b) covariates as they are defined in the task. This issue only arose when discrete covariates were automatically one-hot encoded upon task initiation (i.e., when `colnames(task$X) != task$nodes$covariates`).
- `importance_plot` to plot variables in decreasing order of importance, so the most important variables are placed at the top of the dotchart.
- `sl3` task’s `add_interactions` method to support interactions that involve factors. This method is most commonly used by `Lrnr_define_interactions`, which is intended for use with another learner (e.g., `Lrnr_glmnet` or `Lrnr_glm`) in a `Pipeline`.
- `Lrnr_gam` formula (if not specified by user) to not use `mgcv`’s default `k=10` degrees of freedom for each smooth `s` term when there are fewer than `k=10` degrees of freedom. This bypasses an `mgcv::gam` error, and tends to be relevant only for small n.
- `options(java.parameters = "-Xmx2500m")` and warning message when `Lrnr_bartMachine` is initialized, if this option has not already been set. This option was incorporated since the default RAM of 500MB for a Java virtual machine often leads to memory errors with `Lrnr_bartMachine`.
- `stratify_cv` argument in `Lrnr_glmnet`, which stratifies internal cross-validation folds such that binary outcome prevalence in training and validation folds roughly matches the prevalence in the training task.
- `min_screen` argument in `Lrnr_screener_coefs`, which tries to ensure that at least `min_screen` covariates are selected. If this argument is specified and the `learner` argument in `Lrnr_screener_coefs` is a `Lrnr_glmnet`, then `lambda` is increased until `min_screen` covariates are selected and a warning is produced. If `min_screen` is specified and the `learner` argument in `Lrnr_screener_coefs` is not a `Lrnr_glmnet`, then it will error.
- `Lrnr_hal9001` to work with v0.4.0 of the `hal9001` package.
- `formula` parameter and `process_formula` function to the base learner, `Lrnr_base`, whose methods carry over to all other learners. When a `formula` is supplied as a learner parameter, the `process_formula` function constructs a design matrix by supplying the `formula` to `model.matrix`. This implementation allows `formula` to be supplied to all learners, even those without native `formula` support. The `formula` should be an object of class `"formula"`, or a character string that can be coerced to that class.
- `ROCR` performance measures `custom_ROCR_risk`. Supports cutoff-dependent and scalar `ROCR` performance measures. The risk is defined as 1 - performance, and is transformed back to the performance measure in the `cv_risk` and `importance` functions. This change prompted the renaming of the arguments `loss_fun` and `loss_function` to `eval_fun` and `eval_function`, respectively, since the evaluation of predictions relative to the observations can be either a risk or a loss function. This argument name change impacted the following: `Lrnr_solnp`, `Lrnr_optim`, `Lrnr_cv_selector`, `cv_risk`, `importance`, and `CV_Lrnr_sl`.
- `cv_risk` and `importance` tables now swap “risk” with this name attribute.
- `folds` are not supplied to the `sl3_Task` and the outcome is a discrete (i.e., binary or categorical) variable.
- `importance` method the option to evaluate importance over `covariate_groups`, by removing/permuting all covariates in the same group together.
- `Lrnr_ga` as another metalearner.
- `importance_plot` to summarize variable importance findings.
- `reparameterize` and `retrain` to `Lrnr_base`, which allow modification of the covariate set while training on a conserved task and prediction on a new task using previously trained learners, respectively.
- `Lrnr_hal9001` and `Lrnr_glmnet` to respect observation-level IDs.
- `Remotes` and deprecation of `Lrnr_rfcde` and `Lrnr_condensier`:
  - `Lrnr_rfcde` wrapped https://github.com/tpospisi/RFCDE, a sporadically maintained tool for conditional density estimation (CDE). Support for this has been removed in favor of built-in CDE tools, including, among others, `Lrnr_density_semiparametric`.
  - `Lrnr_condensier` wrapped https://github.com/osofr/condensier, which provided a pooled hazards approach to CDE. This package contained an implementation error (https://github.com/osofr/condensier/issues/15) and was removed from CRAN. Support for this has been removed in favor of `Lrnr_density_semiparametric` and `Lrnr_haldensify`, both of which more reliably provide CDE support.
- `Stack` objects for time series learners.
- `README.Rmd`.
- `Lrnr_nnls`.
- `NA`s.
- `gam` and `caret` packages.
- `gbm`, `earth`, `polspline` packages.
- `xgboost` and `ranger`).
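
The `formula` entry above says that a design matrix is built by passing the formula to `model.matrix`. The following is a minimal base-R sketch of that general idea only; it is not the package's `process_formula` implementation, and the toy data frame and its column names (`x1`, `x2`) are hypothetical.

```r
# Hypothetical toy covariates; names are illustrative only.
dat <- data.frame(
  x1 = c(1.0, 2.0, 3.0, 4.0),
  x2 = factor(c("a", "b", "a", "b"))
)

# A formula may be given as a character string; coerce it to class "formula".
f <- as.formula("~ x1 * x2")

# model.matrix() expands the formula into a numeric design matrix,
# dummy-coding the factor and adding the interaction column.
X <- model.matrix(f, data = dat)

print(colnames(X))
print(dim(X))
```

Because the expansion happens before any model is fit, a learner that only accepts a numeric matrix can still be given interaction or factor terms this way.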