cv_trendfilter() — Source: R/hyperparameter-tuning.R (cv_trendfilter.Rd)
For every candidate hyperparameter value, estimate the trend filtering
model's out-of-sample error by \(V\)-fold cross validation. See the Details
section for guidelines on when `cv_trendfilter()` should be used versus
`sure_trendfilter()`.
cv_trendfilter(
  x,
  y,
  weights = NULL,
  nlambda = 250L,
  V = 10L,
  mc_cores = V,
  loss_funcs = NULL,
  fold_ids = NULL,
  ...
)
| Argument | Description |
|---|---|
| `x` | Vector of observed values for the input variable. |
| `y` | Vector of observed values for the output variable. |
| `weights` | Weights for the observed outputs, defined as the reciprocal variance of the additive noise that contaminates the output signal. When the noise is expected to have an equal variance \(\sigma^2\) for all observations, a scalar may be passed to `weights`. |
| `nlambda` | Number of hyperparameter values to evaluate during cross validation. Defaults to `250L`. |
| `V` | Number of folds that the data are partitioned into for V-fold cross validation. Must be at least 2 and no more than 10. Defaults to `10L`. |
| `mc_cores` | Number of cores to utilize for parallel computing. Defaults to `V`. |
| `loss_funcs` | A named list of one or more functions, each defining a loss function to be evaluated during cross validation. See the Loss functions section below for an example. |
| `fold_ids` | An integer vector defining a custom partition of the data for cross validation. |
| `...` | Additional named arguments to pass to the internal trend filtering call (e.g. `max_iter`, `obj_tol`). |
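As a hedged illustration (not the package's internal partitioning scheme), a valid `fold_ids` vector can be built by randomly assigning every observation an integer fold label in `1:V`:

```r
# Hypothetical sketch: build a custom fold_ids partition for V = 5 folds.
# cv_trendfilter() constructs its own partition when fold_ids = NULL; this
# only shows one valid way to supply your own.
set.seed(1)
n <- 100L                                  # number of observations
V <- 5L                                    # number of folds
fold_ids <- sample(rep(seq_len(V), length.out = n))
table(fold_ids)                            # balanced folds: 20 obs each
```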
An object of class `'cv_trendfilter'`. The object has subclass
`'trendfilter'` and may therefore be passed to generic stats functions such
as `predict()`, `fitted()`, and `residuals()`. More precisely, a
`'cv_trendfilter'` object is a list with the elements below, as well as all
elements from a `trendfilter()` call on the full data set.
- `lambda`: Vector of candidate hyperparameter values (always returned in descending order).
- `edf`: Number of effective degrees of freedom in the trend filtering estimator, for every candidate hyperparameter value in `lambda`.
- `error`: A named list of vectors, each representing the CV error curve for a loss function in `loss_funcs` (see below).
- `se_error`: Standard errors for the `error` curves, within a named list of the same structure.
- `lambda_min`: A named vector with length equal to `length(loss_funcs)`, containing the hyperparameter value that minimizes the CV error curve, for every loss function in `loss_funcs`.
- `lambda_1se`: A named vector with length equal to `length(loss_funcs)`, containing the "1-standard-error rule" hyperparameter, for every loss function in `loss_funcs`. The "1-standard-error rule" hyperparameter is the largest hyperparameter value in `lambda` that has a CV error within one standard error of `min(error)`. It serves as an Occam's razor-like heuristic: given two models with approximately equal performance (in terms of some loss function), it may be wise to opt for the simpler model, i.e. the model with the larger hyperparameter value / fewer effective degrees of freedom.
- `edf_min`: A named vector with length equal to `length(loss_funcs)`, containing the number of effective degrees of freedom in the trend filtering estimator that minimizes the CV error curve, for every loss function in `loss_funcs`.
- `edf_1se`: A named vector with length equal to `length(loss_funcs)`, containing the number of effective degrees of freedom in the "1-standard-error rule" trend filtering estimator, for every loss function in `loss_funcs`.
- `i_min`: A named vector with length equal to `length(loss_funcs)`, containing the index of `lambda` that minimizes the CV error curve, for every loss function in `loss_funcs`.
- `i_1se`: A named vector with length equal to `length(loss_funcs)`, containing the index of `lambda` that gives the "1-standard-error rule" hyperparameter value, for every loss function in `loss_funcs`.
- `fitted_values`: The fitted values of all trend filtering estimates, returned as a matrix with `length(lambda)` columns, with `fitted_values[, i]` corresponding to the trend filtering estimate with hyperparameter `lambda[i]`.
- `admm_params`: A list of the parameter values used by the ADMM algorithm used to solve the trend filtering convex optimization.
- `obj_func`: The relative change in the objective function over the ADMM algorithm's final iteration, for every candidate hyperparameter in `lambda`.
- `n_iter`: Total number of iterations taken by the ADMM algorithm, for every candidate hyperparameter in `lambda`. If an element of `n_iter` is exactly equal to `admm_params$max_iter`, then the ADMM algorithm stopped before reaching the objective tolerance `admm_params$obj_tol`. In these situations, you may need to increase the maximum number of tolerable iterations by passing a `max_iter` argument to `cv_trendfilter()` in order to ensure that the ADMM solution has converged to satisfactory precision.
- `loss_funcs`: A named list of functions that defines all loss functions (both internal and user-passed) evaluated during cross validation.
- `V`: The number of folds the data were split into for cross validation.
- `x`: Vector of observed values for the input variable.
- `y`: Vector of observed values for the output variable (if originally present, observations with `is.na(y)` or `weights == 0` are dropped).
- `weights`: Vector of weights for the observed outputs.
- `k`: Degree of the trend filtering estimates.
- `status`: For internal use. Output from the C solver.
- `call`: The function call.
- `scale_xy`: For internal use.
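To make the `lambda_1se` definition concrete, here is a minimal base-R sketch of the 1-standard-error rule on a made-up error curve (illustrative only; the package computes this internally):

```r
# Toy CV error curve over a descending lambda grid
lambda <- 10^seq(2, -2, length.out = 50)   # descending, as in the output
error <- (log10(lambda) - 0.5)^2 + 1       # toy convex CV error curve
se <- rep(0.3, 50)                         # toy standard errors
i_min <- which.min(error)                  # index of the minimizing lambda
# Largest lambda whose error is within one SE of the minimum: since lambda
# is in descending order, that is the smallest qualifying index.
i_1se <- min(which(error <= error[i_min] + se[i_min]))
c(lambda_min = lambda[i_min], lambda_1se = lambda[i_1se])
```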
Many common regression loss functions are defined internally, and a cross
validation error curve is returned for each. Custom loss functions may also
be passed via the `loss_funcs` argument. See the Loss functions section
below for definitions of the internal loss functions.

Generic functions such as `predict()`, `fitted()`, and `residuals()` may
be called on the `cv_trendfilter()` output.
Our recommendations for when to use `cv_trendfilter()` versus
`sure_trendfilter()` are summarized in the table below. See Section 3.5 of
Politsch et al. (2020a) for more details.

| Scenario | Hyperparameter optimization |
|---|---|
| `x` is unevenly sampled | `cv_trendfilter()` |
| `x` is evenly sampled and measurement variances for `y` are not available | `cv_trendfilter()` |
| `x` is evenly sampled and measurement variances for `y` are available | `sure_trendfilter()` |
For our purposes, an evenly sampled data set with some discarded pixels
(either sporadically or in wide consecutive chunks) is still considered to
be evenly sampled. When `x` is evenly sampled on a transformed scale, we
recommend transforming to that scale and carrying out the full trend
filtering analysis on that scale. See the `sure_trendfilter()` examples for
a case when the inputs are evenly sampled on the `log10(x)` scale.
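For instance (a minimal sketch with a made-up grid), inputs that are evenly spaced in `log10(x)` can be transformed before the analysis:

```r
# Hypothetical grid that is evenly sampled on the log10 scale
x <- 10^seq(0, 3, by = 0.05)
x_transformed <- log10(x)        # carry out trend filtering on this scale
all(abs(diff(x_transformed) - 0.05) < 1e-10)   # evenly spaced
```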
The following loss functions are automatically computed during cross
validation, and their CV error curves are returned in the `error` list
within the `cv_trendfilter()` output.
Mean absolute deviations error: \[MAE(\lambda) = \frac{1}{n} \sum_{i=1}^{n}|Y_i - \hat{f}(x_i; \lambda)|\]
Weighted mean absolute deviations error: \[WMAE(\lambda) = \sum_{i=1}^{n} |Y_i - \hat{f}(x_i; \lambda)|\frac{\sqrt{w_i}}{\sum_j\sqrt{w_j}}\]
Mean-squared error: \[MSE(\lambda) = \frac{1}{n} \sum_{i=1}^{n} |Y_i - \hat{f}(x_i; \lambda)|^2\]
Weighted mean-squared error: \[WMSE(\lambda) = \sum_{i=1}^{n}|Y_i - \hat{f}(x_i; \lambda)|^2\frac{w_i}{\sum_jw_j}\]
log-cosh error: \[logcosh(\lambda) = \frac{1}{n}\sum_{i=1}^{n} \log\left(\cosh\left(Y_i - \hat{f}(x_i; \lambda)\right)\right)\]
Weighted log-cosh error: \[wlogcosh(\lambda) = \sum_{i=1}^{n} \log\left(\cosh\left((Y_i - \hat{f}(x_i; \lambda))\sqrt{w_i}\right)\right)\]
Huber loss: \[Huber(\lambda) = \frac{1}{n}\sum_{i=1}^{n}L_{\lambda}(Y_i; \delta)\] \[\text{where}\;\;\;\;L_{\lambda}(Y_i; \delta) = \cases{ |Y_i - \hat{f}(x_i; \lambda)|^2, & $|Y_i - \hat{f}(x_i; \lambda)| \leq \delta$ \cr 2\delta|Y_i - \hat{f}(x_i; \lambda)| - \delta^2, & $|Y_i - \hat{f}(x_i; \lambda)| > \delta$}\]
Weighted Huber loss: \[wHuber(\lambda) = \sum_{i=1}^{n}L_{\lambda}(Y_i; \delta)\] \[\text{where}\;\;\;\;L_{\lambda}(Y_i; \delta) = \cases{ |Y_i - \hat{f}(x_i; \lambda)|^2w_i, & $|Y_i - \hat{f}(x_i; \lambda)|\sqrt{w_i} \leq \delta$ \cr 2\delta|Y_i - \hat{f}(x_i; \lambda)|\sqrt{w_i} - \delta^2, & $|Y_i - \hat{f}(x_i; \lambda)|\sqrt{w_i} > \delta$}\]
Mean-squared logarithmic error: \[MSLE(\lambda) = \frac{1}{n}\sum_{i=1}^{n} \left|\log(Y_i + 1) - \log(\hat{f}(x_i; \lambda) + 1)\right|^2 \]
where \(w_i :=\) `weights[i]`.
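As a concrete check of the definitions above, here are minimal base-R implementations of two of these losses (MAE and weighted MSE) evaluated on a tiny made-up data set; these are illustrative sketches, not the package's internal code:

```r
# MAE(lambda) = (1/n) * sum |y_i - fhat_i|
mae <- function(y, fhat) mean(abs(y - fhat))
# WMSE(lambda) = sum w_i * |y_i - fhat_i|^2 / sum w_j
wmse <- function(y, fhat, w) sum(w * (y - fhat)^2) / sum(w)

y <- c(1, 2, 3)
fhat <- c(1.5, 2, 2)       # stand-in for the trend filtering estimate
w <- c(1, 1, 2)            # weights, i.e. reciprocal noise variances
mae(y, fhat)               # (0.5 + 0 + 1) / 3 = 0.5
wmse(y, fhat, w)           # (1 * 0.25 + 0 + 2 * 1) / 4 = 0.5625
```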
When defining custom loss functions, each function within the named list
passed to `loss_funcs` should take three vector arguments, `tf_estimate`,
`y`, and `weights`, and return a single scalar value for the validation
loss. For example, if we wanted to optimize the hyperparameter by
minimizing an uncertainty-weighted median of the model's absolute errors,
we would pass the list below to `loss_funcs`:
MedAE <- function(tf_estimate, y, weights) {
  matrixStats::weightedMedian(abs(tf_estimate - y), sqrt(weights))
}

my_loss_funcs <- list(MedAE = MedAE)
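A quick sanity check (illustrative only) that a custom loss has the expected three-argument signature; this runs even without `matrixStats` installed, since the function body is not evaluated:

```r
# Same custom loss as above; weightedMedian is only called when the
# function is invoked, so defining it requires no extra packages.
MedAE <- function(tf_estimate, y, weights) {
  matrixStats::weightedMedian(abs(tf_estimate - y), sqrt(weights))
}
identical(names(formals(MedAE)), c("tf_estimate", "y", "weights"))  # TRUE
```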
Politsch et al. (2020a). Trend filtering – I. A modern statistical tool for time-domain astronomy and astronomical spectroscopy. MNRAS, 492(3), p. 4005-4018.
Politsch et al. (2020b). Trend filtering – II. Denoising astronomical signals with varying degrees of smoothness. MNRAS, 492(3), p. 4019-4032.
#> # A tibble: 6 × 3
#>    phase  flux std_err
#>    <dbl> <dbl>   <dbl>
#> 1 -0.499 0.938  0.0102
#> 2 -0.498 0.930  0.0102
#> 3 -0.496 0.944  0.0102
#> 4 -0.494 0.963  0.0102
#> 5 -0.494 0.934  0.0102
#> 6 -0.494 0.949  0.0102

x <- eclipsing_binary$phase
y <- eclipsing_binary$flux
weights <- 1 / eclipsing_binary$std_err^2

cv_tf <- cv_trendfilter(
  x = x,
  y = y,
  weights = weights,
  max_iter = 1e4,
  obj_tol = 1e-6
)
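After a call like the one above, it is worth confirming that no fit exhausted the iteration budget (see the `n_iter` element in the Value section). The snippet below mocks the relevant pieces of a `cv_trendfilter()` result so it runs standalone; with a real fit, replace the mock `cv_tf` with your own object:

```r
# Mocked stand-in for a cv_trendfilter() result (illustrative values only)
cv_tf <- list(
  n_iter = c(120L, 10000L, 87L),
  admm_params = list(max_iter = 10000L)
)
# Any fit whose iteration count equals max_iter stopped before reaching
# the objective tolerance and may warrant a larger max_iter.
hit_cap <- cv_tf$n_iter == cv_tf$admm_params$max_iter
sum(hit_cap)   # number of fits that stopped at the iteration cap
```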