Performs Bayesian nonparametric bootstrap and calculates posterior statistics
for the mean, or regression coefficients from a linear model.
-- Function File: bootbayes (Y)
-- Function File: bootbayes (Y, X)
-- Function File: bootbayes (Y, X, CLUSTID)
-- Function File: bootbayes (Y, X, BLOCKSZ)
-- Function File: bootbayes (Y, X, ..., NBOOT)
-- Function File: bootbayes (Y, X, ..., NBOOT, PROB)
-- Function File: bootbayes (Y, X, ..., NBOOT, PROB, PRIOR)
-- Function File: bootbayes (Y, X, ..., NBOOT, PROB, PRIOR, SEED)
-- Function File: bootbayes (Y, X, ..., NBOOT, PROB, PRIOR, SEED, L)
-- Function File: STATS = bootbayes (Y, ...)
-- Function File: [STATS, BOOTSTAT] = bootbayes (Y, ...)
'bootbayes (Y)' performs Bayesian nonparametric bootstrap [1] to create
1999 bootstrap statistics, each representing the weighted mean(s) of the
column vector (or column-major matrix), Y, using a vector of weights
randomly generated from a symmetric Dirichlet distribution. The resulting
bootstrap (or posterior [1,2]) distribution(s) is/are summarised with the
following statistics printed to the standard output:
- original: the mean(s) of the data column(s) of Y
- bias: bootstrap bias estimate(s)
- median: the median of the posterior distribution(s)
- stdev: the standard deviation of the posterior distribution(s)
- CI_lower: lower bound(s) of the 95% credible interval
- CI_upper: upper bound(s) of the 95% credible interval
By default, the credible intervals are shortest probability
intervals, which represent a more computationally stable version
of the highest posterior density interval [3].
'bootbayes (Y, X)' also specifies the design matrix (X) for least squares
regression of Y on X. X should be a column vector or matrix the same
number of rows as Y. If the X input argument is empty, the default for X
is a column of ones (i.e. intercept only) and thus the statistic computed
reduces to the mean (as above). The statistics calculated and returned in
the output then relate to the coefficients from the regression of Y on X.
Y must be a column vector (not matrix) for regression.
'bootbayes (Y, X, CLUSTID)' specifies a vector or cell array of numbers
or strings respectively to be used as cluster labels or identifiers.
Rows in Y (and X) with the same CLUSTID value are treated as clusters with
dependent errors. Rows of Y (and X) assigned to a particular cluster
will have identical weights during Bayesian bootstrap. If empty (default),
no clustered resampling is performed and all errors are treated as
independent.
'bootbayes (Y, X, BLOCKSZ)' specifies a scalar, which sets the block size
for bootstrapping when the residuals have serial dependence. Identical
weights are assigned within each (consecutive) block of length BLOCKSZ
during Bayesian bootstrap. Rows of Y (and X) within the same block are
treated as having dependent errors. If empty (default), no block
resampling is performed and all errors are treated as independent.
'bootbayes (Y, X, ..., NBOOT)' specifies the number of bootstrap resamples,
where NBOOT must be a positive integer. If empty, the default value of
NBOOT is 1999.
'bootbayes (Y, X, ..., NBOOT, PROB)' where PROB is numeric and sets the
lower and upper bounds of the credible interval(s). The value(s) of PROB
must be between 0 and 1. PROB can either be:
<> scalar: To set the central mass of shortest probability
intervals (SPI) to 100*(1-PROB)%
<> vector: A pair of probabilities defining the lower and upper
percentiles of the credible interval(s) as 100*(PROB(1))%
and 100*(PROB(2))% respectively.
Credible intervals are not calculated when the value(s) of PROB
is/are NaN. The default value of PROB is 0.95.
'bootbayes (Y, X, ..., NBOOT, PROB, PRIOR)' accepts a positive real
numeric scalar to parametrize the form of the symmetric Dirichlet
distribution. The Dirichlet distribution is the conjugate PRIOR used to
randomly generate weights for linear least squares fitting of the observed
data, and subsequently to estimate the posterior for the regression
coefficients by Bayesian bootstrap.
If PRIOR is not provided or is empty, and the model is not intercept
-only, then the default value of PRIOR is 1, which corresponds to Bayes
rule: a uniform (or flat) Dirichlet distribution (over all points in its
support). Otherwise, the value of PRIOR is set to 'auto'.
The value 'auto' sets a value for PRIOR that effectively incorporates
Bessel's correction a priori. Thus, for a sample size of N and PRIOR set
to 'auto', the variance of the posterior (i.e. BOOTSTAT) becomes an
unbiased estimator of the sampling variance. For example, when the PRIOR
is 1, the prior is flat over the range of the data Y, approximated by the
interval +/- 2 * std (Y, 1), which is 4 * std (Y, 1) wide according to
the range rule of thumb for a normal distribution. Therefore, a PRIOR
set to 'auto' is flat over the approximate interval +/- 2 * std (Y, 0).
The calculation used for 'auto' is as follows:
PRIOR = 1 - 2 / N
For block or cluster bootstrap, N corresponds to the number of blocks
or clusters (i.e. the number of independent sampling units). When N = 2,
the PRIOR is equal to 0, which is the Haldane prior, in which case:
std (BOOTSTAT, 1, 2) ~ std (Y, 1) == std (Y, 0) / sqrt (N)
Note that in this particular case, intervals will be computed using
the standard deviation of the posterior distribution and quantiles
from a standard normal distribution.
'bootbayes (Y, X, ..., NBOOT, PROB, PRIOR, SEED)' initialises the
Mersenne Twister random number generator using an integer SEED value so
that 'bootbayes' results are reproducible.
'bootbayes (Y, X, ..., NBOOT, PROB, PRIOR, SEED, L)' multiplies the
regression coefficients by the hypothesis matrix L. If L is not provided
or is empty, it will assume the default value of 1 (i.e. no change to
the design). Otherwise, L must have the same number of rows as the number
of columns in X.
'STATS = bootbayes (...) returns a structure with the following fields:
original, bias, median, stdev, CI_lower, CI_upper & prior.
'[STATS, BOOTSTAT] = bootbayes (...) also returns the a vector (or
matrix) of bootstrap statistics (BOOTSTAT) calculated over the bootstrap
resamples.
Bibliography:
[1] Rubin (1981) The Bayesian Bootstrap. Ann. Statist. 9(1):130-134
[2] Weng (1989) On a Second-Order Asymptotic property of the Bayesian
Bootstrap Mean. Ann. Statist. 17(2):705-710
[3] Liu, Gelman & Zheng (2015). Simulation-efficient shortest probability
intervals. Statistics and Computing, 25(4), 809–819.
bootbayes (version 2023.07.06)
Author: Andrew Charles Penn
https://www.researchgate.net/profile/Andrew_Penn/
Copyright 2019 Andrew Charles Penn
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see http://www.gnu.org/licenses/
The following code
% Input univariate dataset heights = [183, 192, 182, 183, 177, 185, 188, 188, 182, 185].'; % 95% credible interval for the mean bootbayes (heights); % Please be patient, the calculations will be completed soon...
Produces the following output
Summary of Bayesian bootstrap estimates of bias and precision for linear models ******************************************************************************* Bootstrap settings: Function: sum (w .* Y) Resampling method: Bayesian bootstrap Prior: Symmetric Dirichlet distribution (a = 0.8) Number of resamples: 1999 Credible interval (CI) type: Shortest probability interval Credible interval: 95% Posterior Statistics: original bias median stdev CI_lower CI_upper +184.5 +0.04987 +184.5 1.286 +182.1 +187.1
The following code
% Input bivariate dataset
X = [ones(43,1),...
[01,02,03,04,05,06,07,08,09,10,11,...
12,13,14,15,16,17,18,19,20,21,22,...
23,25,26,27,28,29,30,31,32,33,34,...
35,36,37,38,39,40,41,42,43,44]'];
y = [188.0,170.0,189.0,163.0,183.0,171.0,185.0,168.0,173.0,183.0,173.0,...
173.0,175.0,178.0,183.0,192.4,178.0,173.0,174.0,183.0,188.0,180.0,...
168.0,170.0,178.0,182.0,180.0,183.0,178.0,182.0,188.0,175.0,179.0,...
183.0,192.0,182.0,183.0,177.0,185.0,188.0,188.0,182.0,185.0]';
% 95% credible interval for the regression coefficents
bootbayes (y, X);
% Please be patient, the calculations will be completed soon...
Produces the following output
Summary of Bayesian bootstrap estimates of bias and precision for linear models ******************************************************************************* Bootstrap settings: Function: pinv (X' * W * X) * (X' * W * y) Resampling method: Bayesian bootstrap Prior: Symmetric Dirichlet distribution (a = 1) Number of resamples: 1999 Credible interval (CI) type: Shortest probability interval Credible interval: 95% Posterior Statistics: original bias median stdev CI_lower CI_upper +175.5 -0.1213 +175.3 2.401 +171.0 +180.2 +0.1904 +0.003674 +0.1961 0.07964 +0.03563 +0.3449
Package: statistics-resampling