Performs balanced bootstrap (or bootknife) resampling of clustered data and
calculates bootstrap bias, standard errors and confidence intervals.
-- Function File: bootclust (DATA)
-- Function File: bootclust (DATA, NBOOT)
-- Function File: bootclust (DATA, NBOOT, BOOTFUN)
-- Function File: bootclust ({D1, D2, ...}, NBOOT, BOOTFUN)
-- Function File: bootclust (DATA, NBOOT, {BOOTFUN, ...})
-- Function File: bootclust (DATA, NBOOT, BOOTFUN, ALPHA)
-- Function File: bootclust (DATA, NBOOT, BOOTFUN, ALPHA, CLUSTID)
-- Function File: bootclust (DATA, NBOOT, BOOTFUN, ALPHA, CLUSTSZ)
-- Function File: bootclust (DATA, NBOOT, BOOTFUN, ALPHA, CLUSTID, LOO)
-- Function File: bootclust (DATA, NBOOT, BOOTFUN, ALPHA, CLUSTID, LOO, SEED)
-- Function File: STATS = bootclust (...)
-- Function File: [STATS, BOOTSTAT] = bootclust (...)
'bootclust (DATA)' uses nonparametric balanced bootstrap resampling
to generate 1999 resamples from clusters of rows of the DATA (column
vector or matrix). By default, each rows is it's own cluster (i.e. no
clustering). The means of the resamples are then computed and the
following statistics are displayed:
- original: the original estimate(s) calculated by BOOTFUN and the DATA
- bias: bootstrap bias of the estimate(s)
- std_error: bootstrap standard error of the estimate(s)
- CI_lower: lower bound(s) of the 95% bootstrap confidence interval
- CI_upper: upper bound(s) of the 95% bootstrap confidence interval
'bootclust (DATA, NBOOT)' specifies the number of bootstrap resamples,
where NBOOT is a scalar, positive integer corresponding to the number
of bootstrap resamples. THe default value of NBOOT is the scalar: 1999.
'bootclust (DATA, NBOOT, BOOTFUN)' also specifies BOOTFUN: the function
calculated on the original sample and the bootstrap resamples. BOOTFUN
must be either a:
<> function handle or anonymous function,
<> string of function name, or
<> a cell array where the first cell is one of the above function
definitions and the remaining cells are (additional) input arguments
to that function (other than the data arguments).
In all cases BOOTFUN must take DATA for the initial input argument(s).
BOOTFUN can return a scalar or any multidimensional numeric variable,
but the output will be reshaped as a column vector. BOOTFUN must
calculate a statistic representative of the finite data sample; it
should NOT be an estimate of a population parameter (unless they are
one of the same). If BOOTFUN is @mean or 'mean', narrowness bias of
the confidence intervals for single bootstrap are reduced by expanding
the probabilities of the percentiles using Student's t-distribution
[1]. By default, BOOTFUN is @mean.
'bootclust ({D1, D2, ...}, NBOOT, BOOTFUN)' resamples from the clusters
of rows of the data vectors D1, D2 etc and the resamples are passed onto
BOOTFUN as multiple data input arguments. All data vectors and matrices
(D1, D2 etc) must have the same number of rows.
'bootclust (DATA, NBOOT, BOOTFUN, ALPHA)', where ALPHA is numeric
and sets the lower and upper bounds of the confidence interval(s). The
value(s) of ALPHA must be between 0 and 1. ALPHA can either be:
<> scalar: To set the (nominal) central coverage of equal-tailed
percentile confidence intervals to 100*(1-ALPHA)%.
<> vector: A pair of probabilities defining the (nominal) lower and
upper percentiles of the confidence interval(s) as
100*(ALPHA(1))% and 100*(ALPHA(2))% respectively. The
percentiles are bias-corrected and accelerated (BCa) [2].
The default value of ALPHA is the vector: [.025, .975], for a 95%
BCa confidence interval.
'bootclust (DATA, NBOOT, BOOTFUN, ALPHA, CLUSTID)' also sets CLUSTID,
which are identifiers that define the grouping of the DATA rows for
cluster bootstrap case resampling. CLUSTID should be a column vector or
cell array with the same number of rows as the DATA. Rows in DATA with
the same CLUSTID value are treated as clusters of observations that are
resampled together.
'bootclust (DATA, NBOOT, BOOTFUN, ALPHA, CLUSTSZ)' groups consecutive
DATA rows into clusters of length CLUSTSZ. This is equivalent to block
bootstrap resampling. By default, CLUSTSZ is 1.
'bootclust (DATA, NBOOT, BOOTFUN, ALPHA, CLUSTID, LOO)' sets the
resampling method. If LOO is false, the resampling method used is
balanced bootstrap resampling. If LOO is true, the resampling method used
is balanced bootknife resampling [3]. Where N is the number of clusters,
bootknife cluster resampling involves creating leave-one-out jackknife
samples of size N - 1, and then drawing resamples of size N with
replacement from the jackknife samples, thereby incorporating Bessel's
correction into the resampling procedure. LOO must be a scalar logical
value. The default value of LOO is false.
'bootclust (DATA, NBOOT, BOOTFUN, ALPHA, CLUSTID, LOO, SEED)' initialises
the Mersenne Twister random number generator using an integer SEED value
so that bootclust results are reproducible.
'STATS = bootclust (...)' returns a structure with the following fields
(defined above): original, bias, std_error, CI_lower, CI_upper.
'[STATS, BOOTSTAT] = bootclust (...)' returns BOOTSTAT, a vector or matrix
of bootstrap statistics calculated over the bootstrap resamples.
REQUIREMENTS:
The function file boot.m (or better boot.mex) and bootcdf, which are
distributed with the statistics-resampling package.
BIBLIOGRAPHY:
[1] Hesterberg, Tim (2014), What Teachers Should Know about the
Bootstrap: Resampling in the Undergraduate Statistics Curriculum,
http://arxiv.org/abs/1411.5279
[2] Efron, and Tibshirani (1993) An Introduction to the Bootstrap.
New York, NY: Chapman & Hall
[3] Hesterberg T.C. (2004) Unbiasing the Bootstrap—Bootknife Sampling
vs. Smoothing; Proceedings of the Section on Statistics & the
Environment. Alexandria, VA: American Statistical Association.
bootclust (version 2023.09.20)
Author: Andrew Charles Penn
https://www.researchgate.net/profile/Andrew_Penn/
Copyright 2019 Andrew Charles Penn
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see http://www.gnu.org/licenses/
The following code
% Input univariate dataset
data = [48 36 20 29 42 42 20 42 22 41 45 14 6 ...
0 33 28 34 4 32 24 47 41 24 26 30 41].';
% 95% expanded BCa bootstrap confidence intervals for the mean
bootclust (data, 1999, @mean);
Produces the following output
Summary of nonparametric cluster bootstrap estimates of bias and precision ****************************************************************************** Bootstrap settings: Function: mean Resampling method: Balanced, bootstrap cluster resampling Number of resamples: 1999 Confidence interval (CI) type: Expanded bias-corrected and accelerated (BCa) Nominal coverage (and the percentiles used): 95% (1.1%, 97.3%) Bootstrap Statistics: original bias std_error CI_lower CI_upper +29.65 +3.553e-15 +2.514 +23.66 +34.35
The following code
% Input univariate dataset
data = [48 36 20 29 42 42 20 42 22 41 45 14 6 ...
0 33 28 34 4 32 24 47 41 24 26 30 41].';
clustid = {'a';'a';'b';'b';'a';'c';'c';'d';'e';'e';'e';'f';'f'; ...
'g';'g';'g';'h';'h';'i';'i';'j';'j';'k';'l';'m';'m'};
% 95% expanded BCa bootstrap confidence intervals for the mean with
% cluster resampling
bootclust (data, 1999, @mean, [0.025,0.975], clustid);
Produces the following output
Summary of nonparametric cluster bootstrap estimates of bias and precision ****************************************************************************** Bootstrap settings: Function: mean Resampling method: Balanced, bootstrap cluster resampling Number of resamples: 1999 Confidence interval (CI) type: Expanded bias-corrected and accelerated (BCa) Nominal coverage (and the percentiles used): 95% (1.1%, 98.8%) Bootstrap Statistics: original bias std_error CI_lower CI_upper +29.65 -0.05243 +2.947 +22.59 +35.90
The following code
% Input univariate dataset
data = [48 36 20 29 42 42 20 42 22 41 45 14 6 ...
0 33 28 34 4 32 24 47 41 24 26 30 41].';
% 90% equal-tailed percentile bootstrap confidence intervals for
% the variance
bootclust (data, 1999, {@var, 1}, 0.1);
Produces the following output
Summary of nonparametric cluster bootstrap estimates of bias and precision ****************************************************************************** Bootstrap settings: Function: var Resampling method: Balanced, bootstrap cluster resampling Number of resamples: 1999 Confidence interval (CI) type: Percentile (equal-tailed) Nominal coverage (and the percentiles used): 90% (5.0%, 95.0%) Bootstrap Statistics: original bias std_error CI_lower CI_upper +171.5 -6.453 +42.73 +96.47 +237.0
The following code
% Input univariate dataset
data = [48 36 20 29 42 42 20 42 22 41 45 14 6 ...
0 33 28 34 4 32 24 47 41 24 26 30 41].';
clustid = {'a';'a';'b';'b';'a';'c';'c';'d';'e';'e';'e';'f';'f'; ...
'g';'g';'g';'h';'h';'i';'i';'j';'j';'k';'l';'m';'m'};
% 90% equal-tailed percentile bootstrap confidence intervals for
% the variance
bootclust (data, 1999, {@var, 1}, 0.1, clustid);
Produces the following output
Summary of nonparametric cluster bootstrap estimates of bias and precision ****************************************************************************** Bootstrap settings: Function: var Resampling method: Balanced, bootstrap cluster resampling Number of resamples: 1999 Confidence interval (CI) type: Percentile (equal-tailed) Nominal coverage (and the percentiles used): 90% (5.0%, 95.0%) Bootstrap Statistics: original bias std_error CI_lower CI_upper +171.5 -9.464 +33.90 +104.1 +214.8
The following code
% Input univariate dataset
data = [48 36 20 29 42 42 20 42 22 41 45 14 6 ...
0 33 28 34 4 32 24 47 41 24 26 30 41].';
% 90% BCa bootstrap confidence intervals for the variance
bootclust (data, 1999, {@var, 1}, [0.05 0.95]);
Produces the following output
Summary of nonparametric cluster bootstrap estimates of bias and precision ****************************************************************************** Bootstrap settings: Function: var Resampling method: Balanced, bootstrap cluster resampling Number of resamples: 1999 Confidence interval (CI) type: Bias-corrected and accelerated (BCa) Nominal coverage (and the percentiles used): 90% (12.2%, 98.7%) Bootstrap Statistics: original bias std_error CI_lower CI_upper +171.5 -6.800 +41.10 +115.4 +261.1
The following code
% Input univariate dataset
data = [48 36 20 29 42 42 20 42 22 41 45 14 6 ...
0 33 28 34 4 32 24 47 41 24 26 30 41].';
clustid = {'a';'a';'b';'b';'a';'c';'c';'d';'e';'e';'e';'f';'f'; ...
'g';'g';'g';'h';'h';'i';'i';'j';'j';'k';'l';'m';'m'};
% 90% BCa bootstrap confidence intervals for the variance
bootclust (data, 1999, {@var, 1}, [0.05 0.95], clustid);
Produces the following output
Summary of nonparametric cluster bootstrap estimates of bias and precision ****************************************************************************** Bootstrap settings: Function: var Resampling method: Balanced, bootstrap cluster resampling Number of resamples: 1999 Confidence interval (CI) type: Bias-corrected and accelerated (BCa) Nominal coverage (and the percentiles used): 90% (12.9%, 98.7%) Bootstrap Statistics: original bias std_error CI_lower CI_upper +171.5 -9.876 +32.85 +123.2 +228.9
The following code
% Input dataset
y = randn (20,1); x = randn (20,1); X = [ones(20,1), x];
% 90% BCa confidence interval for regression coefficients
bootclust ({y,X}, 1999, @(y,X) X\y, [0.05 0.95]); % Could also use @regress
Produces the following output
Summary of nonparametric cluster bootstrap estimates of bias and precision ****************************************************************************** Bootstrap settings: Function: @(y, X) X \ y Resampling method: Balanced, bootstrap cluster resampling Number of resamples: 1999 Confidence interval (CI) type: Bias-corrected and accelerated (BCa) Nominal coverage: 90% Bootstrap Statistics: original bias std_error CI_lower CI_upper +0.01658 -0.02123 +0.2032 -0.3300 +0.3262 -0.2459 -0.002949 +0.1725 -0.5289 +0.03772
The following code
% Input dataset
y = randn (20,1); x = randn (20,1); X = [ones(20,1), x];
clustid = [1;1;1;1;2;2;2;3;3;3;3;4;4;4;4;4;5;5;5;6];
% 90% BCa confidence interval for regression coefficients
bootclust ({y,X}, 1999, @(y,X) X\y, [0.05 0.95], clustid);
Produces the following output
Summary of nonparametric cluster bootstrap estimates of bias and precision ****************************************************************************** Bootstrap settings: Function: @(y, X) X \ y Resampling method: Balanced, bootstrap cluster resampling Number of resamples: 1999 Confidence interval (CI) type: Bias-corrected and accelerated (BCa) Nominal coverage: 90% Bootstrap Statistics: original bias std_error CI_lower CI_upper -0.07271 -0.01387 +0.2734 -0.5017 +0.3980 +0.1061 +0.09242 +0.1964 -0.2119 +0.3748
The following code
% Input bivariate dataset
x = [576 635 558 578 666 580 555 661 651 605 653 575 545 572 594].';
y = [3.39 3.3 2.81 3.03 3.44 3.07 3 3.43 ...
3.36 3.13 3.12 2.74 2.76 2.88 2.96].';
clustid = [1;1;3;1;1;2;2;2;2;3;1;3;3;3;2];
% 95% BCa bootstrap confidence intervals for the correlation coefficient
bootclust ({x, y}, 1999, @cor, [], clustid);
% Please be patient, the calculations will be completed soon...
Produces the following output
Summary of nonparametric cluster bootstrap estimates of bias and precision ****************************************************************************** Bootstrap settings: Function: cor Resampling method: Balanced, bootstrap cluster resampling Number of resamples: 1999 Confidence interval (CI) type: Bias-corrected and accelerated (BCa) Nominal coverage (and the percentiles used): 95% (1.6%, 96.5%) Bootstrap Statistics: original bias std_error CI_lower CI_upper +0.7764 -0.02425 +0.1437 +0.3909 +0.9902
Package: statistics-resampling