Bootstrap: Resample with replacement to generate new samples and return the
statistic(s) calculated by evaluating the specified function on each resample.
-- Function File: BOOTSTAT = bootstrp (NBOOT, BOOTFUN, D)
-- Function File: BOOTSTAT = bootstrp (NBOOT, BOOTFUN, D1, ..., DN)
-- Function File: BOOTSTAT = bootstrp (..., D1, ..., DN, 'match', MATCH)
-- Function File: BOOTSTAT = bootstrp (..., 'Options', PAROPT)
-- Function File: BOOTSTAT = bootstrp (..., 'Weights', WEIGHTS)
-- Function File: BOOTSTAT = bootstrp (..., 'loo', LOO)
-- Function File: BOOTSTAT = bootstrp (..., 'seed', SEED)
-- Function File: [BOOTSTAT, BOOTSAM] = bootstrp (...)
-- Function File: [BOOTSTAT, BOOTSAM, STATS] = bootstrp (...)
'BOOTSTAT = bootstrp (NBOOT, BOOTFUN, D)' draws NBOOT bootstrap resamples
with replacement from the rows of the data D and returns the statistic
computed by BOOTFUN in BOOTSTAT [1]. BOOTFUN is a function handle (e.g.
specified with @), a string indicating the function name, or a cell
array, where the first cell is one of the above function definitions
and the remaining cells are (additional) input arguments to that function
(after the data argument(s)). The third input argument is the data
(column vector, matrix or cell array), which is supplied to BOOTFUN. This
function is the only function in the statistics-resampling package to also
accept cell arrays for the data arguments. The simulation method used by
default is bootstrap resampling with first order balance [2-3]; see help
for the 'boot' function for more information.
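As an illustration of first-order balance, the sketch below builds balanced
resampling indices using only base functions. It is one textbook way to obtain
the balance property (pool NBOOT copies of every index, then shuffle) and is
not necessarily the algorithm used by 'boot'. The variable names and the
N-by-NBOOT layout of the index matrix are assumptions made for this example.

```octave
% Sketch of first-order balanced resampling (illustrative only)
n = 5;                                   % number of observations (example value)
nboot = 4;                               % number of resamples (example value)
idx = repmat ((1:n)', nboot, 1);         % pool nboot copies of every index
idx = idx(randperm (n * nboot));         % shuffle the pooled indices
bootsam = reshape (idx, n, nboot);       % one resample per column
counts = accumarray (bootsam(:), 1);     % occurrences of each index overall
```

Every observation appears exactly NBOOT times across the full set of
resamples, although any single resample can still contain repeats.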
'BOOTSTAT = bootstrp (NBOOT, BOOTFUN, D1,...,DN)' is as above except
that the third and subsequent input arguments are multiple data objects
(column vectors, matrices or cell arrays), which are used as input for
BOOTFUN.
'BOOTSTAT = bootstrp (..., D1, ..., DN, 'match', MATCH)' controls the
resampling strategy when multiple data arguments are provided. When MATCH
is true, row indices of D1 to DN are the same (i.e. matched) for each
resample. This is the default strategy when D1 to DN all have the same
number of rows. If MATCH is set to false, then row indices are resampled
independently for D1 to DN in each of the resamples. When any of the data
D1 to DN have a different number of rows, this input argument is ignored
and MATCH is forced to false. Note that the MATLAB
bootstrp function only operates in a mode equivalent to MATCH = true.
One application of setting MATCH to false is to perform stratified
bootstrap resampling.
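The difference between the two strategies can be sketched with plain index
vectors. This is a simplified illustration using ordinary (unbalanced)
resampling with randi; the toy data and variable names are assumptions made
for the example.

```octave
% Matched versus independent resampling of two data arguments (sketch)
n = 6;
X = (1:n)';
Y = 10 * (1:n)';                 % toy data: each Y is paired with one X

% MATCH = true: one set of row indices is shared by X and Y
i = randi (n, n, 1);
x_matched = X(i);
y_matched = Y(i);                % pairs (X(k), Y(k)) stay together

% MATCH = false: each data argument gets its own row indices
ix = randi (n, n, 1);
iy = randi (n, n, 1);
x_indep = X(ix);
y_indep = Y(iy);                 % pairing between X and Y is broken
```

With matched indices the relationship between paired rows is preserved in
every resample, which is what statistics such as a correlation or a mean of
paired differences require.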
'BOOTSTAT = bootstrp (..., 'Options', PAROPT)' specifies options that
govern if and how to perform bootstrap iterations using multiple
processors (if the Parallel Computing Toolbox or Octave Parallel package
is available). This argument is a structure with the following recognised
fields:
o 'UseParallel': If true, use parallel processes to accelerate
bootstrap computations on multicore machines.
Default is false for serial computation. In MATLAB,
the default is true if a parallel pool
has already been started.
o 'nproc': nproc sets the number of parallel processes (optional)
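A minimal sketch of building such a structure is shown below. The field names
are those listed above; the value chosen for 'nproc' is arbitrary, and the call
itself is left as a comment because it needs the package (and a parallel
backend) to be installed.

```octave
% Options structure for parallel bootstrap computation (sketch)
paropt = struct ('UseParallel', true, ...   % request parallel processes
                 'nproc', 2);               % number of processes (example value)

% Documented calling form:
% bootstat = bootstrp (500, @mean, data, 'Options', paropt);
```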
'BOOTSTAT = bootstrp (..., D, 'weights', WEIGHTS)' sets the resampling
weights. WEIGHTS must be a column vector with the same number of rows as
the data, D. If WEIGHTS is empty or not provided, the default is a vector
of length N with uniform weighting 1/N.
'BOOTSTAT = bootstrp (..., D1, ..., DN, 'weights', WEIGHTS)' is as above if
MATCH is true. If MATCH is false, a 1-by-N cell array of column vectors
can be provided to specify independent resampling weights for D1 to DN.
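To show what resampling weights mean, the sketch below draws row indices with
probability proportional to a weight vector by inverting the cumulative
weights. It is a generic illustration, not the package's implementation (which
may combine weights with balanced resampling differently); the weights and
variable names are made up for the example.

```octave
% Weighted resampling of row indices (illustrative sketch)
w = [0.1; 0.2; 0.3; 0.4];            % example weights, one per data row
n = numel (w);
edges = cumsum (w) / sum (w);        % cumulative probabilities, last value is 1
u = rand (n, 1);                     % uniform random numbers in (0, 1)
idx = arrayfun (@(v) find (v <= edges, 1), u);   % first bin containing each u
```

Rows with larger weights are selected more often on average; with uniform
weights 1/N this reduces to ordinary resampling with replacement.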
'BOOTSTAT = bootstrp (..., 'loo', LOO)' sets the simulation method. If
LOO is false, the resampling method used is balanced bootstrap resampling.
If LOO is true, the resampling method used is balanced bootknife
resampling [4]. The latter involves creating leave-one-out (jackknife)
samples of size N - 1, and then drawing resamples of size N with
replacement from the jackknife samples, thereby incorporating Bessel's
correction into the resampling procedure. LOO must be a scalar logical
value. The default value of LOO is false.
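The construction of a single bootknife resample can be sketched as follows.
This is the plain (unbalanced) version of the idea, written with base
functions only; the balanced variant used by the package is assumed to differ
in how the omitted observations and indices are allocated across resamples.

```octave
% One bootknife resample of row indices (illustrative sketch)
n = 8;                               % sample size (example value)
omit = randi (n);                    % observation to leave out
keep = setdiff (1:n, omit);          % jackknife sample of size n - 1
idx = keep(randi (n - 1, n, 1));     % draw n indices with replacement
idx = idx(:);                        % return the indices as a column vector
```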
'BOOTSTAT = bootstrp (..., 'seed', SEED)' initialises the Mersenne Twister
random number generator using an integer SEED value so that bootstrp results
are reproducible.
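The effect of seeding can be illustrated with the base random number generator.
The sketch assumes the generator is reset with rand ('twister', SEED); how
bootstrp applies SEED internally is not shown here.

```octave
% Resetting the Mersenne Twister state gives identical random draws (sketch)
rand ('twister', 1);
a = rand (3, 1);
rand ('twister', 1);                 % same seed, so the same stream restarts
b = rand (3, 1);
same = isequal (a, b);               % true when both draws are identical
```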
'[BOOTSTAT, BOOTSAM] = bootstrp (...)' also returns indices used for
bootstrap resampling. If MATCH is true or only one data argument is
provided, BOOTSAM is a matrix. If multiple data arguments are provided
and MATCH is false, BOOTSAM is returned in a 1-by-N cell array of
matrices, where each cell corresponds to the respective data argument
D1 to DN. To get the output samples BOOTSAM without applying a function,
set BOOTFUN to empty (i.e. []).
'[BOOTSTAT, BOOTSAM, STATS] = bootstrp (...)' also calculates and returns
the following basic statistics relating to each column of BOOTSTAT:
- original: the original estimate(s) calculated by applying BOOTFUN to the data
- mean: the mean of the bootstrap distribution(s)
- bias: bootstrap estimate of the bias of the sampling distribution(s)
- bias_corrected: original estimate(s) after subtracting the bias
- var: bootstrap variance of the original estimate(s)
- std_error: bootstrap estimate(s) of the standard error(s)
If BOOTSTAT is not numeric, STATS only returns the 'original' field. If
BOOTFUN is empty, then the value of the 'original' field is also empty.
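The relationship between these fields can be sketched with a simple bootstrap
of the mean. The resampling here uses plain randi rather than the package's
balanced scheme, and the use of var with its default normalisation is an
assumption; only the arithmetic linking the fields is being illustrated.

```octave
% How the STATS fields relate to BOOTSTAT (illustrative sketch)
data = [48 36 20 29 42 42 20 42 22 41 45 14 6 ...
         0 33 28 34  4 32 24 47 41 24 26 30 41]';
n = numel (data);
nboot = 200;
bootsam = randi (n, n, nboot);               % ordinary resampling indices
bootstat = mean (data(bootsam), 1)';         % one statistic per resample

stats.original = mean (data);                % estimate from the original data
stats.mean = mean (bootstat);                % mean of the bootstrap distribution
stats.bias = stats.mean - stats.original;    % bootstrap estimate of bias
stats.bias_corrected = stats.original - stats.bias;
stats.var = var (bootstat);                  % bootstrap variance
stats.std_error = sqrt (stats.var);          % bootstrap standard error
```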
Bibliography:
[1] Efron and Tibshirani (1993) An Introduction to the
Bootstrap. New York, NY: Chapman & Hall
[2] Davison et al. (1986) Efficient Bootstrap Simulation.
Biometrika, 73: 555-66
[3] Booth, Hall and Wood (1993) Balanced Importance Resampling
for the Bootstrap. The Annals of Statistics. 21(1):286-298
[4] Hesterberg T.C. (2004) Unbiasing the Bootstrap—Bootknife Sampling
vs. Smoothing; Proceedings of the Section on Statistics & the
Environment. Alexandria, VA: American Statistical Association.
bootstrp (version 2024.05.24)
Author: Andrew Charles Penn
https://www.researchgate.net/profile/Andrew_Penn/
Copyright 2019 Andrew Charles Penn
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see http://www.gnu.org/licenses/
The following code
% Input univariate dataset
data = [48 36 20 29 42 42 20 42 22 41 45 14 6 ...
0 33 28 34 4 32 24 47 41 24 26 30 41]';
% Compute 500 bootstrap statistics for the mean and calculate the bootstrap
% standard error of the mean
bootstat = bootstrp (500, @mean, data, 'seed', 1);
% Or equivalently
bootstat = bootstrp (500, @mean, data, 'seed', 1, 'loo', false);
std (bootstat)
Produces the following output
ans = 2.5977
The following code
% Input univariate dataset
data = [48 36 20 29 42 42 20 42 22 41 45 14 6 ...
0 33 28 34 4 32 24 47 41 24 26 30 41]';
% Compute 500 bootknife statistics for the mean and calculate the unbiased
% bootstrap standard error of the mean
bootstat = bootstrp (500, @mean, data, 'seed', 1, 'loo', true);
std (bootstat)
Produces the following output
ans = 2.6441
The following code
% Input univariate dataset
data = [48 36 20 29 42 42 20 42 22 41 45 14 6 ...
0 33 28 34 4 32 24 47 41 24 26 30 41]';
% Split data into consecutive blocks of two data observations per cell
data_blocks = mat2cell (data, 2 * (ones (13, 1)), 1);
% Compute 500 bootknife statistics for the mean and calculate the unbiased
% bootstrap standard error of the mean
bootstat = bootstrp (500, @(x) mean (cell2mat (x)), data_blocks, 'seed', 1, ...
'loo', true);
std (bootstat)
Produces the following output
ans = 3.045
The following code
% Input univariate dataset
data = [48 36 20 29 42 42 20 42 22 41 45 14 6 ...
0 33 28 34 4 32 24 47 41 24 26 30 41]';
% Compute 500 bootknife statistics for the variance and calculate the
% unbiased standard error of the variance
bootstat = bootstrp (500, {@var, 1}, data, 'loo', true);
std (bootstat)
Produces the following output
ans = 42.137
The following code
% Input two-sample dataset
X = [212 435 339 251 404 510 377 335 410 335 ...
415 356 339 188 256 296 249 303 266 300]';
Y = [247 461 526 302 636 593 393 409 488 381 ...
474 329 555 282 423 323 256 431 437 240]';
% Compute 500 bootknife statistics for the mean difference between X and Y
% and calculate the unbiased bootstrap standard error of the mean difference
bootstat = bootstrp (500, @(x, y) mean (x - y), X, Y, 'loo', true);
% Or equivalently
bootstat = bootstrp (500, @(x, y) mean (x - y), X, Y, 'loo', true, ...
'match', true);
std (bootstat)
Produces the following output
ans = 18.185
The following code
% Input two-sample dataset
X = [212 435 339 251 404 510 377 335 410 335 ...
415 356 339 188 256 296 249 303 266 300]';
Y = [247 461 526 302 636 593 393 409 488 381 ...
474 329 555 282 423 323 256 431 437 240]';
% Compute 500 bootknife statistics for the difference in mean between
% independent samples X and Y and calculate the unbiased bootstrap
% standard error of the difference in mean
bootstat = bootstrp (500, @(x, y) mean (x) - mean (y), X, Y, 'loo', true, ...
'match', false);
std (bootstat)
Produces the following output
ans = 31.797
The following code
% Input bivariate dataset
X = [212 435 339 251 404 510 377 335 410 335 ...
415 356 339 188 256 296 249 303 266 300]';
Y = [247 461 526 302 636 593 393 409 488 381 ...
474 329 555 282 423 323 256 431 437 240]';
% Compute 500 bootstrap statistics for the correlation coefficient and
% calculate the bootstrap standard error of the correlation coefficient
bootstat = bootstrp (500, @cor, X, Y);
std (bootstat)
Produces the following output
ans = 0.10017
The following code
% Input bivariate dataset
X = [212 435 339 251 404 510 377 335 410 335 ...
415 356 339 188 256 296 249 303 266 300]';
Y = [247 461 526 302 636 593 393 409 488 381 ...
474 329 555 282 423 323 256 431 437 240]';
% Compute 500 bootstrap statistics for the coefficient of determination and
% calculate its bootstrap standard error
bootstat = bootstrp (500, {@cor, 'squared'}, X, Y);
std (bootstat)
Produces the following output
ans = 0.12767
The following code
% Input bivariate dataset
X = [212 435 339 251 404 510 377 335 410 335 ...
415 356 339 188 256 296 249 303 266 300]';
Y = [247 461 526 302 636 593 393 409 488 381 ...
474 329 555 282 423 323 256 431 437 240]';
% Compute 4999 bootstrap statistics for the coefficient of determination and
% calculate 95% percentile confidence intervals
bootstat = bootstrp (4999, {@cor, 'squared'}, X, Y);
bootint (bootstat)
Produces the following output
ans =
0.25642 0.743
The following code
% Input bivariate dataset
X = [212 435 339 251 404 510 377 335 410 335 ...
415 356 339 188 256 296 249 303 266 300]';
Y = [247 461 526 302 636 593 393 409 488 381 ...
474 329 555 282 423 323 256 431 437 240]';
% Compute 500 bootstrap statistics for the slope and intercept of a linear
% regression and calculate their bootstrap standard errors
bootstat = bootstrp (500, @mldivide, cat (2, ones (20, 1), X), Y);
std (bootstat)
Produces the following output
ans =
63.468 0.18955
Package: statistics-resampling