R/uncertainty_quantification.R
bootstrap_trendfilter.Rd
Generate a bootstrap ensemble of trend filtering estimates in order to quantify the uncertainty in the optimized trend filtering estimate.
bootstrap_trendfilter( obj, algorithm = c("nonparametric", "parametric", "wild"), B = 100L, x_eval = NULL, edf = NULL, mc_cores = parallel::detectCores() - 4, ... )
obj | An object of class ' |
---|---|
algorithm | A string specifying which variation of the bootstrap to use. One of
|
B | The number of bootstrap samples to be drawn to generate the trend filtering
ensemble. Defaults to |
edf | The desired number of effective degrees of freedom in each bootstrap
estimate. When |
mc_cores | Number of cores to utilize for parallel computing. Defaults to the number of cores detected, minus 4. |
... | Additional named arguments. Currently only a few experimental arguments may be passed by experts. |
An object of class 'bootstrap_trendfilter
' and subclass
'trendfilter
'. Generic functions such as predict()
,
fitted
, and residuals()
may also be called on bootstrap_trendfilter
objects, with the same effect as if they were called on the obj
argument
originally passed to bootstrap_trendfilter()
. A bootstrap_trendfilter
object is a list containing the follow elements:
x_eval
Input grid that each bootstrap trend filtering estimate was evaluated on.
ensemble
The full trend filtering bootstrap ensemble as a matrix
with length(x_eval)
rows and B
columns.
algorithm
String specifying which variation of the bootstrap was used to generate the ensemble.
edf_opt
Number of effective degrees of freedom that each bootstrap
trend filtering fit should approximately possess in our fixed-edf bootstrap
procedure. Identical to the value passed to edf
, or its default.
edf_boots
Vector of the estimated number of effective degrees of
freedom of each trend filtering bootstrap estimate. These are unlikely to
all be exactly equal to edf_opt
, but should be relatively close.
n_iter_boots
Vector of the number of iterations taken by the ADMM algorithm before reaching a stopping criterion, for each bootstrap estimate.
lambda_boots
Vector of the hyperparameter values used for each bootstrap fit. In general, these are not all equal because our bootstrap implementation instead seeks to hold the number of effective degrees of freedom constant across all bootstrap estimates.
lambda
Vector of the original grid of candidate hyperparameter
values, inherited from obj
.
edf
Number of effective degrees of freedom in the trend filtering
estimator, for every hyperparameter value in lambda
.
fitted_values
Fitted values of all trend filtering point
estimates with hyperparameter values in lambda
, inherited from obj
.
x
Vector of observed values for the input variable, inherited from
obj
.
y
Vector of observed values for the output variable, inherited from
obj
.
weights
Vector of weights for the observed outputs, inherited from
obj
.
k
Degree of the trend filtering point estimate (and bootstrap
estimates), inherited from obj
.
call
The function call.
scale_xy
For internal use.
One of three possible bootstrap algorithms should be chosen
according to the criteria below. Pointwise variability bands are then
obtained by passing the 'bootstrap_trendfilter
' object to vbands()
,
along with the desired level (e.g. level = 0.95
). Bootstrapping trend
filtering estimators tends to yield more accurate uncertainties when, for
each bootstrap estimate, we fix the number of effective degrees of freedom,
edf
(a reparametrization of the hyperparameter lambda
), instead of fixing
lambda
itself. Thus, bootstrap_trendfilter()
has an edf
argument
instead of lambda
. See the edf
argument description and Examples
section for guidance on how edf
can be chosen.
Our recommendations for when to use each of the possible settings
for the algorithm
argument are shown in the table below. See
Politsch et al. (2020a) for more details.
Scenario | Uncertainty quantification |
x is unevenly sampled | algorithm = "nonparametric" |
x is evenly sampled and measurement variances for y are available | algorithm = "parametric" |
x is evenly sampled and measurement variances for y are not available | algorithm = "wild" |
For our purposes, an evenly sampled data set with some discarded pixels
(either sporadically or in large consecutive chunks) is still considered to
be evenly sampled. When the inputs are evenly sampled on a transformed scale,
we recommend transforming to that scale and carrying out the full trend
filtering analysis on that scale. See Example 2 below for a case when the
inputs are evenly sampled on the log10(x)
scale.
Politsch et al. (2020a). Trend filtering – I. A modern statistical tool for time-domain astronomy and astronomical spectroscopy. MNRAS, 492(3), p. 4005-4018. [Publisher] [arXiv].
Politsch et al. (2020b). Trend Filtering – II. Denoising astronomical signals with varying degrees of smoothness. MNRAS, 492(3), p. 4019-4032. [Publisher] [arXiv].
# Example 1: Phase-folded light curve of an eclipsing binary star system # # The apparent brightness over time of a star system that has two suns # that regularly eclipse one another from our vantage point on Earth. Here, # the time series is stacked according to the orbital period of the binary # system, with the primary eclipse occuring at `phase = 0` and the input # domain ranging from -0.5 to 0.5. data("eclipsing_binary") head(eclipsing_binary)#> # A tibble: 6 × 3 #> phase flux std_err #> <dbl> <dbl> <dbl> #> 1 -0.499 0.938 0.0102 #> 2 -0.498 0.930 0.0102 #> 3 -0.496 0.944 0.0102 #> 4 -0.494 0.963 0.0102 #> 5 -0.494 0.934 0.0102 #> 6 -0.494 0.949 0.0102x <- eclipsing_binary$phase y <- eclipsing_binary$flux weights <- 1 / eclipsing_binary$std_err^2 cv_tf <- cv_trendfilter( x = x, y = y, weights = weights, max_iter = 1e4, obj_tol = 1e-6 ) boot_tf <- bootstrap_trendfilter( obj = cv_tf, algorithm = "nonparametric", edf = cv_tf$edf_min["MAE"] ) # Example 2: The "Lyman-alpha forest" in the spectrum of a distant quasar data("quasar_spectrum") head(quasar_spectrum)#> # A tibble: 6 × 3 #> log10_wavelength flux weights #> <dbl> <dbl> <dbl> #> 1 3.55 0.424 0.0417 #> 2 3.55 -2.11 0.125 #> 3 3.55 -3.78 0.128 #> 4 3.55 3.60 0.121 #> 5 3.55 -4.89 0.134 #> 6 3.55 1.76 0.132x <- quasar_spectrum$log10_wavelength y <- quasar_spectrum$flux weights <- quasar_spectrum$weights sure_tf <- sure_trendfilter(x, y, weights) boot_tf <- bootstrap_trendfilter( obj = sure_tf, algorithm = "parametric", edf = sure_tf$edf_min )