Construct pointwise variability bands via a bootstrap

Generate a bootstrap ensemble of trend filtering estimates in order to quantify the uncertainty in the optimized trend filtering estimate.

bootstrap_trendfilter(
  obj,
  algorithm = c("nonparametric", "parametric", "wild"),
  B = 100L,
  x_eval = NULL,
  edf = NULL,
  mc_cores = parallel::detectCores() - 4,
  ...
)

Arguments

obj	An object of class '`cv_trendfilter`' or '`sure_trendfilter`'.
algorithm	A string specifying which variation of the bootstrap to use. One of `c("nonparametric", "parametric", "wild")`. See Details section below for guidelines on when each choice should be used.
B	The number of bootstrap samples to be drawn to generate the trend filtering ensemble. Defaults to `B = 100L` (larger values encouraged if the computational cost is acceptable).
edf	The desired number of effective degrees of freedom in each bootstrap estimate. When `obj` is of class `sure_trendfilter`', `edf = obj$edf_min` and `edf = obj$edf_1se` are advisible options. When `obj` is of class `cv_trendfilter`', any element of the (now vectors) `obj$edf_min` and `obj$edf_1se` may be a reasonable choice. Defaults to `edf = obj$edf_min["MAE"]`.
mc_cores	Number of cores to utilize for parallel computing. Defaults to the number of cores detected, minus 4.
...	Additional named arguments. Currently only a few experimental arguments may be passed by experts.

Value

An object of class 'bootstrap_trendfilter' and subclass 'trendfilter'. Generic functions such as predict(), fitted, and residuals() may also be called on bootstrap_trendfilter objects, with the same effect as if they were called on the obj argument originally passed to bootstrap_trendfilter(). A bootstrap_trendfilter object is a list containing the follow elements:

x_eval: Input grid that each bootstrap trend filtering estimate was evaluated on.
ensemble: The full trend filtering bootstrap ensemble as a matrix with length(x_eval) rows and B columns.
algorithm: String specifying which variation of the bootstrap was used to generate the ensemble.
edf_opt: Number of effective degrees of freedom that each bootstrap trend filtering fit should approximately possess in our fixed-edf bootstrap procedure. Identical to the value passed to edf, or its default.
edf_boots: Vector of the estimated number of effective degrees of freedom of each trend filtering bootstrap estimate. These are unlikely to all be exactly equal to edf_opt, but should be relatively close.
n_iter_boots: Vector of the number of iterations taken by the ADMM algorithm before reaching a stopping criterion, for each bootstrap estimate.
lambda_boots: Vector of the hyperparameter values used for each bootstrap fit. In general, these are not all equal because our bootstrap implementation instead seeks to hold the number of effective degrees of freedom constant across all bootstrap estimates.
lambda: Vector of the original grid of candidate hyperparameter values, inherited from obj.
edf: Number of effective degrees of freedom in the trend filtering estimator, for every hyperparameter value in lambda.
fitted_values: Fitted values of all trend filtering point estimates with hyperparameter values in lambda, inherited from obj.
x: Vector of observed values for the input variable, inherited from obj.
y: Vector of observed values for the output variable, inherited from obj.
weights: Vector of weights for the observed outputs, inherited from obj.
k: Degree of the trend filtering point estimate (and bootstrap estimates), inherited from obj.
call: The function call.
scale_xy: For internal use.

Details

One of three possible bootstrap algorithms should be chosen according to the criteria below. Pointwise variability bands are then obtained by passing the 'bootstrap_trendfilter' object to vbands(), along with the desired level (e.g. level = 0.95). Bootstrapping trend filtering estimators tends to yield more accurate uncertainties when, for each bootstrap estimate, we fix the number of effective degrees of freedom, edf (a reparametrization of the hyperparameter lambda), instead of fixing lambda itself. Thus, bootstrap_trendfilter() has an edf argument instead of lambda. See the edf argument description and Examples section for guidance on how edf can be chosen.

Our recommendations for when to use each of the possible settings for the algorithm argument are shown in the table below. See Politsch et al. (2020a) for more details.

Scenario	Uncertainty quantification
`x` is unevenly sampled	`algorithm = "nonparametric"`
`x` is evenly sampled and measurement variances for `y` are available	`algorithm = "parametric"`
`x` is evenly sampled and measurement variances for `y` are not available	`algorithm = "wild"`

For our purposes, an evenly sampled data set with some discarded pixels (either sporadically or in large consecutive chunks) is still considered to be evenly sampled. When the inputs are evenly sampled on a transformed scale, we recommend transforming to that scale and carrying out the full trend filtering analysis on that scale. See Example 2 below for a case when the inputs are evenly sampled on the log10(x) scale.

References

Politsch et al. (2020a). Trend filtering – I. A modern statistical tool for time-domain astronomy and astronomical spectroscopy. MNRAS, 492(3), p. 4005-4018. [Publisher] [arXiv].
Politsch et al. (2020b). Trend Filtering – II. Denoising astronomical signals with varying degrees of smoothness. MNRAS, 492(3), p. 4019-4032. [Publisher] [arXiv].

Examples

# Example 1: Phase-folded light curve of an eclipsing binary star system
#
# The apparent brightness over time of a star system that has two suns
# that regularly eclipse one another from our vantage point on Earth. Here,
# the time series is stacked according to the orbital period of the binary
# system, with the primary eclipse occuring at `phase = 0` and the input
# domain ranging from -0.5 to 0.5.

data("eclipsing_binary")
head(eclipsing_binary)
#> # A tibble: 6 × 3
#>    phase  flux std_err
#>    <dbl> <dbl>   <dbl>
#> 1 -0.499 0.938  0.0102
#> 2 -0.498 0.930  0.0102
#> 3 -0.496 0.944  0.0102
#> 4 -0.494 0.963  0.0102
#> 5 -0.494 0.934  0.0102
#> 6 -0.494 0.949  0.0102

x <- eclipsing_binary$phase
y <- eclipsing_binary$flux
weights <- 1 / eclipsing_binary$std_err^2

cv_tf <- cv_trendfilter(
  x = x,
  y = y,
  weights = weights,
  max_iter = 1e4,
  obj_tol = 1e-6
)

boot_tf <- bootstrap_trendfilter(
  obj = cv_tf,
  algorithm = "nonparametric",
  edf = cv_tf$edf_min["MAE"]
)

# Example 2: The "Lyman-alpha forest" in the spectrum of a distant quasar

data("quasar_spectrum")
head(quasar_spectrum)
#> # A tibble: 6 × 3
#>   log10_wavelength   flux weights
#>              <dbl>  <dbl>   <dbl>
#> 1             3.55  0.424  0.0417
#> 2             3.55 -2.11   0.125
#> 3             3.55 -3.78   0.128
#> 4             3.55  3.60   0.121
#> 5             3.55 -4.89   0.134
#> 6             3.55  1.76   0.132 

x <- quasar_spectrum$log10_wavelength
y <- quasar_spectrum$flux
weights <- quasar_spectrum$weights

sure_tf <- sure_trendfilter(x, y, weights)
boot_tf <- bootstrap_trendfilter(
  obj = sure_tf,
  algorithm = "parametric",
  edf = sure_tf$edf_min
)

Arguments

Value

Details

References

See also

Examples