Bootstrap resampling.
-- Function File: BOOTSTAT = bootstrp (NBOOT, BOOTFUN, D)
-- Function File: BOOTSTAT = bootstrp (NBOOT, BOOTFUN, D1, ..., DN)
-- Function File: BOOTSTAT = bootstrp (..., D1, ..., DN, 'match', MATCH)
-- Function File: BOOTSTAT = bootstrp (..., 'Options', PAROPT)
-- Function File: BOOTSTAT = bootstrp (..., 'Weights', WEIGHTS)
-- Function File: BOOTSTAT = bootstrp (..., 'loo', LOO)
-- Function File: BOOTSTAT = bootstrp (..., 'seed', SEED)
-- Function File: [BOOTSTAT, BOOTSAM] = bootstrp (...)
'BOOTSTAT = bootstrp (NBOOT, BOOTFUN, D)' draws NBOOT bootstrap resamples
with replacement from the rows of the data D and returns the statistic
computed by BOOTFUN in BOOTSTAT [1]. BOOTFUN is a function handle (e.g.
specified with @), a string indicating the function name, or a
cell array, where the first cell is one of the above function definitions
and the remaining cells are (additional) input arguments to that function
(after the data argument(s)). The third input argument is the data
(column vector, matrix or cell array), which is supplied to BOOTFUN. The
simulation method used by default is bootstrap resampling with first order
balance [2-3].
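As a rough illustration of first order balance, the sketch below builds an
index matrix in which every row of the data occurs exactly NBOOT times across
all resamples, using the permutation idea described in [2]. It is not the
package's implementation. The variable names, the shortened dataset and the
one-resample-per-column layout are assumptions made for this sketch.

```octave
% Sketch of first order balanced resampling (illustrative only)
data = [48 36 20 29 42 42 20 42 22 41]';
n = numel (data);
nboot = 50;
% Concatenate NBOOT copies of the row indices and shuffle them
idx = repmat ((1:n)', nboot, 1);
idx = idx(randperm (n * nboot));
% Each column of bootsam holds the row indices of one resample
bootsam = reshape (idx, n, nboot);
% Count how often each observation occurs across all resamples
counts = accumarray (bootsam(:), 1);
% Statistic (here the mean) computed for each resample
bootstat = mean (data(bootsam), 1);
```

Because each observation is used equally often, the mean of the bootstrap
means in this sketch equals the sample mean (up to rounding error).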
'BOOTSTAT = bootstrp (NBOOT, BOOTFUN, D1, ..., DN)' is as above except that
the third and subsequent input arguments are multiple data objects used
as input for BOOTFUN.
'BOOTSTAT = bootstrp (..., D1, ..., DN, 'match', MATCH)' controls the
resampling strategy when multiple data arguments are provided. When MATCH
is true, row indices of D1 to DN are the same (i.e. matched) for each
resample. This is the default strategy when D1 to DN all have the same
number of rows. If MATCH is set to false, then row indices are resampled
independently for D1 to DN in each of the resamples. When any of the data
D1 to DN have a different number of rows, this input argument is ignored
and MATCH is forced to false. Note that the MATLAB
bootstrp function only operates in a mode equivalent to MATCH = true.
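The difference between the two strategies can be sketched with plain
(unbalanced) index sampling. The code below is illustrative only and does not
call bootstrp. The variable names and the shortened data vectors are
assumptions made for this sketch.

```octave
% Sketch of matched versus independent row resampling (illustrative only)
x = [212 435 339 251 404]';
y = [247 461 526 302 636]';
n = numel (x);
% MATCH = true: one set of row indices is shared, so (x, y) pairs stay intact
i_shared = randi (n, n, 1);
x_matched = x(i_shared);
y_matched = y(i_shared);
% MATCH = false: row indices are drawn separately for x and for y
i_x = randi (n, n, 1);
i_y = randi (n, n, 1);
x_indep = x(i_x);
y_indep = y(i_y);
```

Matched resampling suits paired or bivariate data, whereas independent
resampling suits independent samples.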
'BOOTSTAT = bootstrp (..., 'Options', PAROPT)' specifies options that
govern whether and how to perform bootstrap iterations using multiple
processors (if the Parallel Computing Toolbox or Octave Parallel package
is available). This argument is a structure with the following recognised
fields:
o 'UseParallel': If true, use parallel processes to accelerate
bootstrap computations on multicore machines.
Default is false for serial computation. In MATLAB,
the default is true if a parallel pool
has already been started.
o 'nproc': nproc sets the number of parallel processes (optional)
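A minimal sketch of such a structure is shown below. The value chosen for
nproc is arbitrary, and the commented call assumes a data vector named data
already exists.

```octave
% Options structure using the two fields documented above
paropt = struct ('UseParallel', true, 'nproc', 2);
% It could then be passed to bootstrp, for example:
% bootstat = bootstrp (50, @mean, data, 'Options', paropt);
```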
'BOOTSTAT = bootstrp (..., D, 'weights', WEIGHTS)' sets the resampling
weights. WEIGHTS must be a column vector with the same number of rows as
the data, D. If WEIGHTS is empty or not provided, the default is a vector
of length N with uniform weighting 1/N.
'BOOTSTAT = bootstrp (..., D1, ..., DN, 'weights', WEIGHTS)' is as above if
MATCH is true. If MATCH is false, a 1-by-N cell array of column vectors
can be provided to specify independent resampling weights for D1 to DN.
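To show what resampling weights mean in practice, the sketch below draws row
indices with probability proportional to a weight vector by inverting the
cumulative weights. This is a generic technique and is not necessarily how
bootstrp applies WEIGHTS internally. The weights and variable names are
hypothetical.

```octave
% Sketch of weighted resampling of row indices (illustrative only)
data = [48 36 20 29 42]';
n = numel (data);
weights = [0.4 0.3 0.1 0.1 0.1]';   % hypothetical resampling weights
w = weights / sum (weights);        % normalise so the weights sum to 1
cw = cumsum (w);
cw(end) = 1;                        % guard against rounding error
u = rand (n, 1);
% Index of the first cumulative weight that is not smaller than each u
idx = sum (bsxfun (@gt, u, cw'), 2) + 1;
resample = data(idx);
```

With uniform weights of 1/N, this reduces to ordinary resampling with
replacement.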
'BOOTSTAT = bootstrp (..., 'loo', LOO)' sets the simulation method. If
LOO is false, the resampling method used is balanced bootstrap resampling.
If LOO is true, the resampling method used is balanced bootknife
resampling [4]. The latter involves creating leave-one-out (jackknife)
samples of size N - 1, and then drawing resamples of size N with
replacement from the jackknife samples, thereby incorporating Bessel's
correction into the resampling procedure. LOO must be a scalar logical
value. The default value of LOO is false.
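The bootknife procedure described above can be sketched as follows. This
sketch uses plain (unbalanced) sampling, so it is not the package's
implementation. The variable names and the shortened dataset are assumptions
made for illustration.

```octave
% Sketch of bootknife resampling (illustrative only; not balanced)
data = [48 36 20 29 42 42 20 42 22 41]';
n = numel (data);
nboot = 50;
bootsam = zeros (n, nboot);
for b = 1:nboot
  % Leave out one randomly chosen observation (jackknife sample of size n - 1)
  omit = randi (n);
  keep = [1:omit-1, omit+1:n]';
  % Draw n indices with replacement from the remaining n - 1 observations
  bootsam(:, b) = keep(randi (n - 1, n, 1));
end
% Statistic (here the mean) computed for each bootknife resample
bootstat = mean (data(bootsam), 1);
```

Each resample therefore contains at most N - 1 distinct observations.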
'BOOTSTAT = bootstrp (..., 'seed', SEED)' initialises the Mersenne Twister
random number generator using an integer SEED value so that bootstrp
results are reproducible.
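The effect of seeding can be sketched directly with the random number
generator. The snippet below assumes the generator state is set with
rand ('twister', SEED). Whether bootstrp uses exactly this call internally is
an assumption.

```octave
% Sketch: re-seeding the Mersenne Twister reproduces the same random draws
rand ('twister', 1);
a = rand (3, 1);
rand ('twister', 1);
b = rand (3, 1);
same = isequal (a, b);   % true when both sequences are identical
```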
'[BOOTSTAT, BOOTSAM] = bootstrp (...)' also returns indices used for
bootstrap resampling. If MATCH is true or only one data argument is
provided, BOOTSAM is a matrix. If multiple data arguments are provided
and MATCH is false, BOOTSAM is returned in a 1-by-N cell array of
matrices, where each cell corresponds to the respective data argument
D1 to DN.
Bibliography:
[1] Efron and Tibshirani (1993) An Introduction to the
Bootstrap. New York, NY: Chapman & Hall
[2] Davison et al. (1986) Efficient Bootstrap Simulation.
Biometrika, 73: 555-66
[3] Booth, Hall and Wood (1993) Balanced Importance Resampling
for the Bootstrap. The Annals of Statistics. 21(1):286-298
[4] Hesterberg T.C. (2004) Unbiasing the Bootstrap—Bootknife Sampling
vs. Smoothing; Proceedings of the Section on Statistics & the
Environment. Alexandria, VA: American Statistical Association.
bootstrp (version 2024.04.23)
Author: Andrew Charles Penn
https://www.researchgate.net/profile/Andrew_Penn/
Copyright 2019 Andrew Charles Penn
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see http://www.gnu.org/licenses/
The following code
% Input univariate dataset
data = [48 36 20 29 42 42 20 42 22 41 45 14 6 ...
0 33 28 34 4 32 24 47 41 24 26 30 41]';
% Compute 50 bootstrap statistics for the mean and calculate the bootstrap
% standard error of the mean
bootstat = bootstrp (50, @mean, data, 'seed', 1);
% Or equivalently
bootstat = bootstrp (50, @mean, data, 'seed', 1, 'loo', false);
std (bootstat)
Produces the following output
ans = 2.7156
The following code
% Input univariate dataset
data = [48 36 20 29 42 42 20 42 22 41 45 14 6 ...
0 33 28 34 4 32 24 47 41 24 26 30 41]';
% Compute 50 bootknife statistics for the mean and calculate the unbiased
% bootstrap standard error of the mean
bootstat = bootstrp (50, @mean, data, 'seed', 1, 'loo', true);
std (bootstat)
Produces the following output
ans = 2.4052
The following code
% Input univariate dataset
data = [48 36 20 29 42 42 20 42 22 41 45 14 6 ...
0 33 28 34 4 32 24 47 41 24 26 30 41]';
% Split data into consecutive blocks of two data observations per cell
data_blocks = mat2cell (data, 2 * (ones (13, 1)), 1);
% Compute 50 bootknife statistics for the mean and calculate the unbiased
% bootstrap standard error of the mean
bootstat = bootstrp (50, @(x) mean (cell2mat (x)), data_blocks, 'seed', 1, ...
'loo', true);
std (bootstat)
Produces the following output
ans = 2.9384
The following code
% Input univariate dataset
data = [48 36 20 29 42 42 20 42 22 41 45 14 6 ...
0 33 28 34 4 32 24 47 41 24 26 30 41]';
% Compute 50 bootknife statistics for the variance and calculate the
% unbiased standard error of the variance
bootstat = bootstrp (50, {@var, 1}, data, 'loo', true);
std (bootstat)
Produces the following output
ans = 39.204
The following code
% Input two-sample dataset
X = [212 435 339 251 404 510 377 335 410 335 ...
415 356 339 188 256 296 249 303 266 300]';
Y = [247 461 526 302 636 593 393 409 488 381 ...
474 329 555 282 423 323 256 431 437 240]';
% Compute 50 bootknife statistics for the mean difference between X and Y
% and calculate the unbiased bootstrap standard error of the mean difference
bootstat = bootstrp (50, @(x, y) mean (x - y), X, Y, 'loo', true);
% Or equivalently
bootstat = bootstrp (50, @(x, y) mean (x - y), X, Y, 'loo', true, ...
'match', true);
std (bootstat)
Produces the following output
ans = 14.614
The following code
% Input two-sample dataset
X = [212 435 339 251 404 510 377 335 410 335 ...
415 356 339 188 256 296 249 303 266 300]';
Y = [247 461 526 302 636 593 393 409 488 381 ...
474 329 555 282 423 323 256 431 437 240]';
% Compute 50 bootknife statistics for the difference in mean between
% independent samples X and Y and calculate the unbiased bootstrap
% standard error of the difference in mean
bootstat = bootstrp (50, @(x, y) mean (x) - mean (y), X, Y, 'loo', true, ...
'match', false);
std (bootstat)
Produces the following output
ans = 32.705
The following code
% Input bivariate dataset
X = [212 435 339 251 404 510 377 335 410 335 ...
415 356 339 188 256 296 249 303 266 300]';
Y = [247 461 526 302 636 593 393 409 488 381 ...
474 329 555 282 423 323 256 431 437 240]';
% Compute 50 bootstrap statistics for the correlation coefficient and
% calculate the bootstrap standard error of the correlation coefficient
bootstat = bootstrp (50, @cor, X, Y);
std (bootstat)
Produces the following output
ans = 0.098974
The following code
% Input bivariate dataset
X = [212 435 339 251 404 510 377 335 410 335 ...
415 356 339 188 256 296 249 303 266 300]';
Y = [247 461 526 302 636 593 393 409 488 381 ...
474 329 555 282 423 323 256 431 437 240]';
% Compute 50 bootstrap statistics for the coefficient of determination and
% calculate its bootstrap standard error
bootstat = bootstrp (50, {@cor,'squared'}, X, Y);
std (bootstat)
Produces the following output
ans = 0.1265
The following code
% Input bivariate dataset
X = [212 435 339 251 404 510 377 335 410 335 ...
415 356 339 188 256 296 249 303 266 300]';
Y = [247 461 526 302 636 593 393 409 488 381 ...
474 329 555 282 423 323 256 431 437 240]';
% Compute 4999 bootstrap statistics for the coefficient of determination and
% calculate 95% percentile confidence intervals
bootstat = bootstrp (4999, {@cor,'squared'}, X, Y);
bootint (bootstat)
Produces the following output
ans =
0.25642 0.743
The following code
% Input bivariate dataset
X = [212 435 339 251 404 510 377 335 410 335 ...
415 356 339 188 256 296 249 303 266 300]';
Y = [247 461 526 302 636 593 393 409 488 381 ...
474 329 555 282 423 323 256 431 437 240]';
% Compute 50 bootstrap statistics for the slope and intercept of a linear
% regression and calculate their bootstrap standard errors
bootstat = bootstrp (50, @mldivide, cat (2, ones (20, 1), X), Y);
std (bootstat)
Produces the following output
ans =
58.311 0.18007
Package: statistics-resampling