Process data to account for missingness in preparation for TMLE
process_missing(data, node_list, complete_nodes = c("A", "Y"), impute_nodes = NULL, max_p_missing = 0.5)
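| Argument | Description |
|---|---|
| data | The dataset to process. |
| node_list | The list of nodes for the analysis. |
| complete_nodes | Nodes that must be fully observed; rows with missingness in any of them are dropped. Defaults to c("A", "Y"). |
| impute_nodes | Nodes whose missing values are median-imputed, with indicators of missingness generated. Defaults to NULL. |
| max_p_missing | Maximum tolerated proportion of missingness in a covariate; covariates exceeding it are dropped. Defaults to 0.5. |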
A list containing the following elements:
- data: the updated dataset
- node_list: the updated list of nodes
- n_dropped: the number of observations dropped
- dropped_cols: the variables dropped due to excessive missingness
Rows with missingness in any of the complete_nodes are dropped. Missing values in the impute_nodes variables are then median-imputed, and indicator variables of missingness are generated for these nodes.
Covariates are then processed as follows:
- any covariate with more than max_p_missing missingness is dropped
- indicators of missingness are generated
- missing values are median-imputed
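
The steps described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's own implementation: the helper name sketch_process_missing, its complete_cols and covariate_cols arguments, the missing_ prefix for indicator columns, and the coding of indicators as 1 = missing are all hypothetical, and the sketch works on plain column names rather than a node_list.

```r
# Hypothetical helper: a base-R sketch of the documented steps,
# not the function shipped with the package.
sketch_process_missing <- function(data, complete_cols, covariate_cols,
                                   max_p_missing = 0.5) {
  # 1. Drop rows with missingness in any column that must be complete.
  keep <- stats::complete.cases(data[, complete_cols, drop = FALSE])
  n_dropped <- sum(!keep)
  data <- data[keep, , drop = FALSE]

  # 2. Drop covariates with more than max_p_missing missingness.
  p_missing <- vapply(data[covariate_cols],
                      function(x) mean(is.na(x)), numeric(1))
  dropped_cols <- names(p_missing)[p_missing > max_p_missing]
  data <- data[, setdiff(names(data), dropped_cols), drop = FALSE]

  # 3. For the remaining covariates with missing values, record an
  #    indicator of missingness first, then median-impute.
  to_impute <- names(p_missing)[p_missing > 0 & p_missing <= max_p_missing]
  for (col in to_impute) {
    is_missing <- is.na(data[[col]])
    # Assumed naming and coding: missing_<col>, 1 = value was missing.
    data[[paste0("missing_", col)]] <- as.integer(is_missing)
    data[[col]][is_missing] <- stats::median(data[[col]], na.rm = TRUE)
  }

  list(data = data, n_dropped = n_dropped, dropped_cols = dropped_cols)
}

# Toy data: A and Y must be complete; W1 and W2 are covariates.
toy <- data.frame(
  A  = c(1, 0, NA, 1, 0),
  Y  = c(2.1, 3.4, 1.0, NA, 2.2),
  W1 = c(10, NA, 30, 40, 50),
  W2 = c(NA, NA, 3, 4, NA)
)
result <- sketch_process_missing(toy, c("A", "Y"), c("W1", "W2"))
```

In this toy example two rows are dropped because A or Y is missing. Among the three remaining rows W2 is entirely missing, so it exceeds the 0.5 threshold and is dropped, while the single missing value of W1 is flagged in missing_W1 and replaced by the median of the observed values.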