Generate a bootstrap ensemble of trend filtering estimates in order to quantify the uncertainty in the optimized trend filtering estimate.

bootstrap_trendfilter(
  obj,
  algorithm = c("nonparametric", "parametric", "wild"),
  B = 100L,
  x_eval = NULL,
  edf = NULL,
  mc_cores = parallel::detectCores() - 4,
  ...
)

Arguments

obj

An object of class 'cv_trendfilter' or 'sure_trendfilter'.

algorithm

A string specifying which variation of the bootstrap to use. One of c("nonparametric", "parametric", "wild"). See Details section below for guidelines on when each choice should be used.

B

The number of bootstrap samples to be drawn to generate the trend filtering ensemble. Defaults to B = 100L (larger values encouraged if the computational cost is acceptable).

edf

The desired number of effective degrees of freedom in each bootstrap estimate. When obj is of class sure_trendfilter', edf = obj$edf_min and edf = obj$edf_1se are advisible options. When obj is of class cv_trendfilter', any element of the (now vectors) obj$edf_min and obj$edf_1se may be a reasonable choice. Defaults to edf = obj$edf_min["MAE"].

mc_cores

Number of cores to utilize for parallel computing. Defaults to the number of cores detected, minus 4.

...

Additional named arguments. Currently only a few experimental arguments may be passed by experts.

Value

An object of class 'bootstrap_trendfilter' and subclass 'trendfilter'. Generic functions such as predict(), fitted, and residuals() may also be called on bootstrap_trendfilter objects, with the same effect as if they were called on the obj argument originally passed to bootstrap_trendfilter(). A bootstrap_trendfilter object is a list containing the follow elements:

x_eval

Input grid that each bootstrap trend filtering estimate was evaluated on.

ensemble

The full trend filtering bootstrap ensemble as a matrix with length(x_eval) rows and B columns.

algorithm

String specifying which variation of the bootstrap was used to generate the ensemble.

edf_opt

Number of effective degrees of freedom that each bootstrap trend filtering fit should approximately possess in our fixed-edf bootstrap procedure. Identical to the value passed to edf, or its default.

edf_boots

Vector of the estimated number of effective degrees of freedom of each trend filtering bootstrap estimate. These are unlikely to all be exactly equal to edf_opt, but should be relatively close.

n_iter_boots

Vector of the number of iterations taken by the ADMM algorithm before reaching a stopping criterion, for each bootstrap estimate.

lambda_boots

Vector of the hyperparameter values used for each bootstrap fit. In general, these are not all equal because our bootstrap implementation instead seeks to hold the number of effective degrees of freedom constant across all bootstrap estimates.

lambda

Vector of the original grid of candidate hyperparameter values, inherited from obj.

edf

Number of effective degrees of freedom in the trend filtering estimator, for every hyperparameter value in lambda.

fitted_values

Fitted values of all trend filtering point estimates with hyperparameter values in lambda, inherited from obj.

x

Vector of observed values for the input variable, inherited from obj.

y

Vector of observed values for the output variable, inherited from obj.

weights

Vector of weights for the observed outputs, inherited from obj.

k

Degree of the trend filtering point estimate (and bootstrap estimates), inherited from obj.

call

The function call.

scale_xy

For internal use.

Details

One of three possible bootstrap algorithms should be chosen according to the criteria below. Pointwise variability bands are then obtained by passing the 'bootstrap_trendfilter' object to vbands(), along with the desired level (e.g. level = 0.95). Bootstrapping trend filtering estimators tends to yield more accurate uncertainties when, for each bootstrap estimate, we fix the number of effective degrees of freedom, edf (a reparametrization of the hyperparameter lambda), instead of fixing lambda itself. Thus, bootstrap_trendfilter() has an edf argument instead of lambda. See the edf argument description and Examples section for guidance on how edf can be chosen.

Our recommendations for when to use each of the possible settings for the algorithm argument are shown in the table below. See Politsch et al. (2020a) for more details.

ScenarioUncertainty quantification
x is unevenly sampledalgorithm = "nonparametric"
x is evenly sampled and measurement variances for y are availablealgorithm = "parametric"
x is evenly sampled and measurement variances for y are not availablealgorithm = "wild"

For our purposes, an evenly sampled data set with some discarded pixels (either sporadically or in large consecutive chunks) is still considered to be evenly sampled. When the inputs are evenly sampled on a transformed scale, we recommend transforming to that scale and carrying out the full trend filtering analysis on that scale. See Example 2 below for a case when the inputs are evenly sampled on the log10(x) scale.

References

  1. Politsch et al. (2020a). Trend filtering – I. A modern statistical tool for time-domain astronomy and astronomical spectroscopy. MNRAS, 492(3), p. 4005-4018. [Publisher] [arXiv].

  2. Politsch et al. (2020b). Trend Filtering – II. Denoising astronomical signals with varying degrees of smoothness. MNRAS, 492(3), p. 4019-4032. [Publisher] [arXiv].

See also

Examples

# Example 1: Phase-folded light curve of an eclipsing binary star system # # The apparent brightness over time of a star system that has two suns # that regularly eclipse one another from our vantage point on Earth. Here, # the time series is stacked according to the orbital period of the binary # system, with the primary eclipse occuring at `phase = 0` and the input # domain ranging from -0.5 to 0.5. data("eclipsing_binary") head(eclipsing_binary)
#> # A tibble: 6 × 3 #> phase flux std_err #> <dbl> <dbl> <dbl> #> 1 -0.499 0.938 0.0102 #> 2 -0.498 0.930 0.0102 #> 3 -0.496 0.944 0.0102 #> 4 -0.494 0.963 0.0102 #> 5 -0.494 0.934 0.0102 #> 6 -0.494 0.949 0.0102
x <- eclipsing_binary$phase y <- eclipsing_binary$flux weights <- 1 / eclipsing_binary$std_err^2 cv_tf <- cv_trendfilter( x = x, y = y, weights = weights, max_iter = 1e4, obj_tol = 1e-6 ) boot_tf <- bootstrap_trendfilter( obj = cv_tf, algorithm = "nonparametric", edf = cv_tf$edf_min["MAE"] ) # Example 2: The "Lyman-alpha forest" in the spectrum of a distant quasar data("quasar_spectrum") head(quasar_spectrum)
#> # A tibble: 6 × 3 #> log10_wavelength flux weights #> <dbl> <dbl> <dbl> #> 1 3.55 0.424 0.0417 #> 2 3.55 -2.11 0.125 #> 3 3.55 -3.78 0.128 #> 4 3.55 3.60 0.121 #> 5 3.55 -4.89 0.134 #> 6 3.55 1.76 0.132
x <- quasar_spectrum$log10_wavelength y <- quasar_spectrum$flux weights <- quasar_spectrum$weights sure_tf <- sure_trendfilter(x, y, weights) boot_tf <- bootstrap_trendfilter( obj = sure_tf, algorithm = "parametric", edf = sure_tf$edf_min )