Performs single or double bootstrap (or bootknife) resampling and calculates
confidence intervals.
-- Function File: CI = bootci (NBOOT, BOOTFUN, D)
-- Function File: CI = bootci (NBOOT, BOOTFUN, D1,...,DN)
-- Function File: CI = bootci (NBOOT, {BOOTFUN, D}, NAME, VALUE)
-- Function File: CI = bootci (NBOOT, {BOOTFUN, D1, ..., DN}, NAME, VALUE)
-- Function File: CI = bootci (...,'type', TYPE)
-- Function File: CI = bootci (...,'type', 'stud', 'nbootstd', NBOOTSTD)
-- Function File: CI = bootci (...,'type', 'cal', 'nbootcal', NBOOTCAL)
-- Function File: CI = bootci (...,'alpha', ALPHA)
-- Function File: CI = bootci (...,'strata', STRATA)
-- Function File: CI = bootci (...,'loo', LOO)
-- Function File: CI = bootci (...,'seed', SEED)
-- Function File: CI = bootci (...,'Options', PAROPT)
-- Function File: [CI, BOOTSTAT] = bootci (...)
'CI = bootci (NBOOT, BOOTFUN, D)' draws NBOOT bootstrap resamples from
the rows of a data sample D and returns 95% confidence intervals (CI) for
the bootstrap statistics computed by BOOTFUN [1]. BOOTFUN is a function
handle (e.g. specified with @), or a string indicating the function name.
The third input argument, data D (a column vector or a matrix), is used
as input for BOOTFUN. The bootstrap resampling method yields first-order
balance [2-3].
'CI = bootci (NBOOT, BOOTFUN, D1,...,DN)' is as above except that the
third and subsequent numeric input arguments are data vectors that are
used to create inputs for bootfun.
'CI = bootci (NBOOT, {BOOTFUN, D}, NAME, VALUE)' is as above but includes
setting optional parameters using Name-Value pairs.
'CI = bootci (NBOOT, {BOOTFUN, D1, ..., DN}, NAME, VALUE)' is as above but
includes setting optional parameters using NAME-VALUE pairs.
bootci can take a number of optional parameters as NAME-VALUE pairs:
'CI = bootci (..., 'alpha', ALPHA)' where ALPHA sets the lower and upper
bounds of the confidence interval(s). The value of ALPHA must be between
0 and 1. The nominal lower and upper percentiles of the confidence
intervals CI are then 100*(ALPHA/2)% and 100*(1-ALPHA/2)% respectively,
and nominal central coverage of the intervals is 100*(1-ALPHA)%. The
default value of ALPHA is 0.05.
'CI = bootci (..., 'type', TYPE)' computes bootstrap confidence interval
CI using one of the following methods:
<> 'norm' or 'normal': Using bootstrap bias and standard error [4].
<> 'per' or 'percentile': Percentile method [1,4].
<> 'basic': Basic bootstrap method [1,4].
<> 'bca': Bias-corrected and accelerated method [5,6] (Default).
<> 'stud' or 'student': Studentized bootstrap (bootstrap-t) [1,4].
<> 'cal': Calibrated percentile method (by double bootstrap [7]).
Note that when BOOTFUN is the mean, percentile, basic and bca intervals
are automatically expanded using Student's t-distribution in order to
improve coverage for small samples [8]. The bootstrap-t method includes
an additive correction to stabilize the variance when the sample size
is small [9].
'CI = bootci (..., 'type', 'stud', 'nbootstd', NBOOTSTD)' computes the
Studentized bootstrap confidence intervals CI, with the standard errors
of the bootstrap statistics estimated automatically using resampling
methods. NBOOTSTD is a positive integer value > 0 defining the number
of resamples. Standard errors are computed using NBOOTSTD bootstrap
resamples. The default value of NBOOTSTD is 100.
'CI = bootci (..., 'type', 'cal', 'nbootcal', NBOOTCAL)' computes the
calibrated percentile bootstrap confidence intervals CI, with the
calibrated percentiles of the bootstrap statistics estimated from NBOOTCAL
bootstrap data samples. NBOOTCAL is a positive integer value. The default
value of NBOOTCAL is 199.
'CI = bootci (..., 'strata', STRATA)' sets STRATA, which are identifiers
that define the grouping of the DATA rows for stratified bootstrap
resampling. STRATA should be a column vector or cell array with the same
number of rows as the DATA.
'CI = bootci (..., 'loo', LOO)' is a logical scalar that specifies whether
the resamples of size n should be obtained by sampling from the original
data (false) or from Leave-One-Out (LOO) jackknife samples (true) of the
data - otherwise known as bootknife resampling [10]. Default is false.
'CI = bootci (..., 'seed', SEED)' initialises the Mersenne Twister random
number generator using an integer SEED value so that bootci results are
reproducible.
'CI = bootci (..., 'Options', PAROPT)' specifies options that govern if
and how to perform bootstrap iterations using multiple processors (if the
Parallel Computing Toolbox or Octave Parallel package is available). This
argument is a structure with the following recognised fields:
<> 'UseParallel': If true, use parallel processes to accelerate
bootstrap computations on multicore machines,
specifically non-vectorized function evaluations,
double bootstrap resampling and jackknife function
evaluations. Default is false for serial computation.
In MATLAB, the default is true if a parallel pool
has already been started.
<> 'nproc': nproc sets the number of parallel processes
'[CI, BOOTSTAT] = bootci (...)' also returns the bootstrap statistics
used to calculate the confidence intervals CI.
'[CI, BOOTSTAT, BOOTSAM] = bootci (...)' also returns BOOTSAM, a matrix
of indices from the bootstrap. Each column in BOOTSAM corresponds to one
bootstrap sample and contains the row indices of the values drawn from
the nonscalar data argument to create that sample.
Bibliography:
[1] Efron, and Tibshirani (1993) An Introduction to the
Bootstrap. New York, NY: Chapman & Hall
[2] Davison et al. (1986) Efficient Bootstrap Simulation.
Biometrika, 73: 555-66
[3] Booth, Hall and Wood (1993) Balanced Importance Resampling
for the Bootstrap. The Annals of Statistics. 21(1):286-298
[4] Davison and Hinkley (1997) Bootstrap Methods and their Application.
(Cambridge University Press)
[5] Efron (1987) Better Bootstrap Confidence Intervals. JASA,
82(397): 171-185
[6] Efron, and Tibshirani (1993) An Introduction to the
Bootstrap. New York, NY: Chapman & Hall
[7] Hall, Lee and Young (2000) Importance of interpolation when
constructing double-bootstrap confidence intervals. Journal
of the Royal Statistical Society. Series B. 62(3): 479-491
[8] Hesterberg, Tim (2014), What Teachers Should Know about the
Bootstrap: Resampling in the Undergraduate Statistics Curriculum,
http://arxiv.org/abs/1411.5279
[9] Polansky (2000) Stabilizing bootstrap-t confidence intervals
for small samples. Can J Stat. 28(3):501-516
[10] Hesterberg T.C. (2004) Unbiasing the Bootstrap—Bootknife Sampling
vs. Smoothing; Proceedings of the Section on Statistics & the
Environment. Alexandria, VA: American Statistical Association.
bootci (version 2023.07.04)
Author: Andrew Charles Penn
https://www.researchgate.net/profile/Andrew_Penn/
Copyright 2019 Andrew Charles Penn
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see http://www.gnu.org/licenses/
The following code
% Input univariate dataset
data = [48 36 20 29 42 42 20 42 22 41 45 14 6 ...
0 33 28 34 4 32 24 47 41 24 26 30 41].';
% 95% BCa bootstrap confidence intervals for the mean
ci = bootci (1999, @mean, data)
Produces the following output
ci =
23.616
34.358
The following code
% Input univariate dataset
data = [48 36 20 29 42 42 20 42 22 41 45 14 6 ...
0 33 28 34 4 32 24 47 41 24 26 30 41].';
% 95% calibrated percentile bootstrap confidence intervals for the mean
ci = bootci (1999, {@mean, data}, 'type', 'cal','nbootcal',199)
% Please be patient, the calculations will be completed soon...
Produces the following output
ci =
23.975
34.269
The following code
% Input univariate dataset
data = [48 36 20 29 42 42 20 42 22 41 45 14 6 ...
0 33 28 34 4 32 24 47 41 24 26 30 41].';
% 95% calibrated percentile bootstrap confidence intervals for the median
% with smoothing
ci = bootci (1999, {@smoothmedian, data}, 'type', 'cal', 'nbootcal', 199)
% Please be patient, the calculations will be completed soon...
Produces the following output
ci =
25.04
36.477
The following code
% Input univariate dataset
data = [48 36 20 29 42 42 20 42 22 41 45 14 6 ...
0 33 28 34 4 32 24 47 41 24 26 30 41].';
% 90% percentile bootstrap confidence intervals for the variance
ci = bootci (1999, {{@var,1}, data}, 'type', 'per', 'alpha', 0.1)
Produces the following output
ci =
96.629
235.91
The following code
% Input univariate dataset
data = [48 36 20 29 42 42 20 42 22 41 45 14 6 ...
0 33 28 34 4 32 24 47 41 24 26 30 41].';
% 90% BCa bootstrap confidence intervals for the variance
ci = bootci (1999, {{@var,1}, data}, 'type', 'bca', 'alpha', 0.1)
Produces the following output
ci =
117.01
260.73
The following code
% Input univariate dataset
data = [48 36 20 29 42 42 20 42 22 41 45 14 6 ...
0 33 28 34 4 32 24 47 41 24 26 30 41]';
% 90% Studentized bootstrap confidence intervals for the variance
ci = bootci (1999, {{@var,1}, data}, 'type', 'stud', ...
'nbootstd', 50, 'alpha', 0.1)
% Please be patient, the calculations will be completed soon...
Produces the following output
ci =
108.55
297.71
The following code
% Input univariate dataset
data = [48 36 20 29 42 42 20 42 22 41 45 14 6 ...
0 33 28 34 4 32 24 47 41 24 26 30 41].';
% 90% calibrated percentile bootstrap confidence intervals for the variance
ci = bootci (1999, {{@var,1}, data}, 'type', 'cal', 'nbootcal', ...
199, 'alpha', 0.1)
% Please be patient, the calculations will be completed soon...
Produces the following output
ci =
111.53
268.13
The following code
% Input bivariate dataset
x = [2.12,4.35,3.39,2.51,4.04,5.1,3.77,3.35,4.1,3.35, ...
4.15,3.56, 3.39,1.88,2.56,2.96,2.49,3.03,2.66,3].';
y = [2.47,4.61,5.26,3.02,6.36,5.93,3.93,4.09,4.88,3.81, ...
4.74,3.29,5.55,2.82,4.23,3.23,2.56,4.31,4.37,2.4].';
% 95% BCa bootstrap confidence intervals for the correlation coefficient
ci = bootci (1999, @cor, x, y)
% Please be patient, the calculations will be completed soon...
Produces the following output
ci =
0.50501
0.86333
The following code
% Calculating confidence intervals for the coefficients from logistic
% regression using an example with an ordinal response from:
% https://uk.mathworks.com/help/stats/mnrfit.html
%>>>>>>>>> This code block must be run first in Octave only >>>>>>>>>>>>
try
pkg load statistics
load carbig
info = ver;
if ( str2num ({info.Version}{strcmp({info.Name},'statistics')}(1:3)) < 1.5)
error ('statistics package version must be > 1.5')
end
if (~ exist ('mnrfit', 'file'))
% Octave Statistics package does not currently have the mnrfit function,
% so we will use it's logistic_regression function for fitting ordinal
% models instead.
function [B, DEV] = mnrfit (X, Y, varargin)
% Note that if the outcome has more than two levels, the
% logistic_regression function is only suitable when the
% outcome is ordinal, so we would need to use append 'model',
% 'ordinal' as a name-value pair in MATLAB when executing
% it's mnrfit function (see below)
[INTERCEPT, SLOPE, DEV] = logistic_regression (Y - 1, X, false);
B = cat (1, INTERCEPT, SLOPE);
end
end
stats_pkg = true;
catch
stats_pkg = false;
fprintf ('\nSkipping this demo...')
fprintf ('\nRequired features of the statistics package not found.\n\n');
end
%<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
if (stats_pkg)
%>>>>>>>>>>>>>>>>>>> This code block is the demo >>>>>>>>>>>>>>>>>>>>>>
% This demo requires the statistics package in Octave (equivalent to
% the Statistics and Machine Learning Toolbox in Matlab)
% Create the dataset
load carbig
X = [Acceleration Displacement Horsepower Weight];
% The responses 1 - 4 correspond to the following classification:
% 1: 9 - 19 miles per gallon
% 2: 19 - 29 miles per gallon
% 3: 29 - 39 miles per gallon
% 4: 39 - 49 miles per gallon
miles = [1,1,1,1,1,1,1,1,1,1,NaN,NaN,NaN,NaN,NaN,1,1,NaN,1,1,2,2,1,2, ...
2,2,2,2,2,2,2,1,1,1,1,2,2,2,2,NaN,2,1,1,2,1,1,1,1,1,1,1,1,1, ...
2,2,1,2,2,3,3,3,3,2,2,2,2,2,2,2,1,1,1,1,1,1,1,1,1,2,1,1,1,1, ...
1,2,2,2,2,2,2,2,2,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,1,1,1, ...
1,1,2,2,2,1,2,2,2,1,1,3,2,2,2,1,2,2,1,2,2,2,1,3,2,3,2,1,1,1, ...
1,1,1,1,1,3,2,2,3,3,2,2,2,2,2,3,2,1,1,1,1,1,1,1,1,1,1,1,2,2, ...
1,3,2,2,2,2,2,2,1,3,2,2,2,2,2,3,2,2,2,2,2,1,1,1,1,2,2,2,2,3, ...
2,3,3,2,1,1,1,3,3,2,2,2,1,2,2,1,1,1,1,1,3,3,3,2,3,1,1,1,1,1, ...
2,2,1,1,1,1,1,3,2,2,2,3,3,3,3,2,2,2,4,3,3,4,3,2,2,2,2,2,2,2, ...
2,2,2,2,1,1,2,1,1,1,3,2,2,3,2,2,2,2,2,1,2,1,3,3,2,2,2,2,2,1, ...
1,1,1,1,1,2,1,3,3,3,2,2,2,2,2,3,3,3,3,2,2,2,3,4,3,3,3,2,2,2, ...
2,3,3,3,3,3,4,2,4,4,4,3,3,4,4,3,3,3,2,3,2,3,2,2,2,2,3,4,4,3, ...
3,3,3,3,3,3,3,3,3,3,3,3,3,2,NaN,3,2,2,2,2,2,1,2,2,3,3,3,2,2, ...
2,3,3,3,3,3,3,3,3,3,3,3,2,3,2,2,3,3,2,2,4,3,2,3]';
% Model coefficients from logistic regression
B = mnrfit (X, miles, 'model', 'ordinal');
% Bootsrap confidence intervals for each logistic regression coefficient
ci = bootci (1999, @(X, miles) mnrfit (X, miles, 'model', 'ordinal'), ...
X, miles);
[B, ci']
% Where the first 3 rows are the intercept terms, and the last 4 rows
% are the slope coefficients. For each predictor, the slope coefficient
% corresponds to how a unit change in the predictor impacts on the odds,
% which are proportional across the (ordered) catagories, where each
% log-odds in each case is:
%
% ln ( ( P[below] ) / ( P[above] ) )
%
% i.e. in mnrfit, the reference class is the higher of the two classes.
% Therefore, a positive slope value indicates that a unit increase in the
% predictor increases the odds of running at fewer miles per gallon.
% Note that ordinal and multinomial logistic regression (appropriate
% for ordinal and nominal responses respectively) would be equivalent
% for any binary outcome
%<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
end
Produces the following output
ans =
-16.69 -20.555 -12.325
-11.721 -15.104 -7.907
-8.0606 -11.244 -4.3564
0.10476 -0.081023 0.28225
0.010336 -0.0032993 0.022647
0.06452 0.033083 0.095608
0.0016638 0.00017173 0.0031554
The following code
% Spatial Test Data from Table 14.1 of Efron and Tibshirani (1993)
% An Introduction to the Bootstrap in Monographs on Statistics and Applied
% Probability 57 (Springer)
% AIM:
% To construct 90% nonparametric bootstrap confidence intervals for var(A,1)
% var(A,1) = 171.5
% Exact intervals based on Normal theory are [118.4, 305.2].
% Calculations using Matlab's 'Statistics and Machine Learning toolbox'
% (R2020b)
%
% A = [48 36 20 29 42 42 20 42 22 41 45 14 6 ...
% 0 33 28 34 4 32 24 47 41 24 26 30 41].';
% varfun = @(A) var(A, 1);
% rng('default'); % For reproducibility
% rng('default'); ci1 = bootci (19999,{varfun,A},'alpha',0.1,'type','norm');
% rng('default'); ci2 = bootci (19999,{varfun,A},'alpha',0.1,'type','per');
% rng('default'); ci4 = bootci (19999,{varfun,A},'alpha',0.1,'type','bca');
% rng('default'); ci5 = bootci (19999,{varfun,A},'alpha',0.1,'type','stud');
%
% Summary of results from Matlab's 'Statistics and Machine Learning toolbox'
% (R2020b)
%
% method | 0.05 | 0.95 | length | shape |
% -------------------|--------|--------|--------|-------|
% ci1 - normal | 108.9 | 247.4 | 138.5 | 1.21 |
% ci2 - percentile | 97.6 | 235.8 | 138.2 | 0.87 |
% ci4 - BCa | 114.9 | 260.5 | 145.6 | 1.57 |*
% ci5 - bootstrap-t | 46.7 | 232.5 | 185.8 | 0.49 |**
% -------------------|--------|--------|--------|-------|
% parametric - exact | 118.4 | 305.2 | 186.8 | 2.52 |
%
% * Bug in the fx0 subfunction of MathWorks MATLAB bootci function
% ** Bug in the bootstud subfunction of MathWorks MATLAB bootci
% Calculations using the 'boot' and 'bootstrap' packages in R
%
% library (boot) % Functions from Davison and Hinkley (1997)
% A <- c(48,36,20,29,42,42,20,42,22,41,45,14,6, ...
% 0,33,28,34,4,32,24,47,41,24,26,30,41);
% n <- length(A)
% var.fun <- function (d, i) {
% % Function to compute the population variance
% n <- length (d);
% return (var (d[i]) * (n - 1) / n) };
% boot.fun <- function (d, i) {
% % Compute the estimate
% t <- var.fun (d, i);
% % Compute sampling variance of the estimate using Tukey's jackknife
% n <- length (d);
% U <- empinf (data=d[i], statistic=var.fun, type="jack", stype="i");
% var.t <- sum (U^2 / (n * (n - 1)));
% return ( c(t, var.t) ) };
% set.seed(1)
% var.boot <- boot (data=A, statistic=boot.fun, R=19999, sim='balanced')
% ci1 <- boot.ci (var.boot, conf=0.90, type="norm")
% ci2 <- boot.ci (var.boot, conf=0.90, type="perc")
% ci3 <- boot.ci (var.boot, conf=0.90, type="basic")
% ci4 <- boot.ci (var.boot, conf=0.90, type="bca")
% ci5 <- boot.ci (var.boot, conf=0.90, type="stud")
%
% library (bootstrap) % Functions from Efron and Tibshirani (1993)
% set.seed(1);
% ci4a <- bcanon (A, 19999, var.fun, alpha=c(0.05,0.95))
% set.seed(1);
% ci5a <- boott (A, var.fun, nboott=19999, nbootsd=499, perc=c(.05,.95))
%
% Summary of results from 'boot' and 'bootstrap' packages in R
%
% method | 0.05 | 0.95 | length | shape |
% -------------------|--------|--------|--------|-------|
% ci1 - normal | 109.6 | 246.7 | 137.1 | 1.22 |
% ci2 - percentile | 97.9 | 234.8 | 136.9 | 0.86 |
% ci3 - basic | 108.3 | 245.1 | 136.8 | 1.16 |
% ci4 - BCa | 116.0 | 260.7 | 144.7 | 1.60 |
% ci4a - BCa | 115.8 | 260.6 | 144.8 | 1.60 |
% ci5 - bootstrap-t | 112.0 | 291.8 | 179.8 | 2.02 |
% ci5a - bootstrap-t | 116.1 | 290.9 | 174.8 | 2.16 |
% -------------------|--------|--------|--------|-------|
% parametric - exact | 118.4 | 305.2 | 186.8 | 2.52 |
% Calculations using the 'statistics-resampling' package for Octave/Matlab
%
% A = [48 36 20 29 42 42 20 42 22 41 45 14 6 ...
% 0 33 28 34 4 32 24 47 41 24 26 30 41].';
% ci1 = bootci (19999,{{@var,1},A},'alpha',0.1,'type','norm','seed',1);
% ci2 = bootci (19999,{{@var,1},A},'alpha',0.1,'type','per','seed',1);
% ci3 = bootci (19999,{{@var,1},A},'alpha',0.1,'type','basic','seed',1);
% ci4 = bootci (19999,{{@var,1},A},'alpha',0.1,'type','bca','seed',1);
% ci5 = bootci (19999,{{@var,1},A},'alpha',0.1,'type','stud',...
% 'nbootstd',100,'seed',1);
% ci6 = bootci (19999,{{@var,1},A},'alpha',0.1,'type','cal', ...
% 'nbootcal',499,'seed',1);
%
% Summary of results from 'statistics-resampling' package for Octave/Matlab
%
% method | 0.05 | 0.95 | length | shape |
% -------------------|--------|--------|--------|-------|
% ci1 - normal | 110.1 | 246.2 | 136.1 | 1.22 |
% ci2 - percentile | 98.1 | 234.7 | 136.6 | 0.86 |
% ci3 - basic | 108.4 | 245.0 | 136.1 | 1.17 |
% ci4 - BCa | 116.1 | 259.3 | 143.2 | 1.59 |
% ci5 - bootstrap-t | 114.0 | 290.3 | 176.3 | 2.07 |
% ci6 - calibrated | 115.3 | 276.4 | 161.1 | 1.87 |
% -------------------|--------|--------|--------|-------|
% parametric - exact | 118.4 | 305.2 | 186.8 | 2.52 |
%
% Simulation results for constructing 90% confidence intervals for the
% variance of a population N(0,1) from 1000 random samples of size 26
% (analagous to the situation above). Simulation performed using the
% bootsim script with nboot of 1999.
%
% method | coverage | lower | upper | length | shape |
% ---------------------|----------|--------|--------|--------|-------|
% normal | 81.5% | 3.0% | 15.5% | 0.77 | 1.21 |
% percentile | 81.5% | 0.9% | 17.6% | 0.76 | 0.91 |
% basic | 81.1% | 2.5% | 16.4% | 0.78 | 1.09 |
% BCa | 84.2% | 5.4% | 10.4% | 0.86 | 1.82 |
% bootstrap-t | 89.2% | 4.3% | 6.5% | 0.99 | 2.15 |
% calibrated | 87.4% | 4.2% | 8.4% | 0.91 | 2.03 |
% ---------------------|----------|--------|--------|--------|-------|
% parametric - exact | 90.8% | 3.7% | 5.5% | 0.99 | 2.52 |
gives an example of how 'bootci' is used.
Package: statistics-resampling