cv_trendfilter() — Source: R/hyperparameter-tuning.R (cv_trendfilter.Rd)
For every candidate hyperparameter value, estimate the trend filtering
model's out-of-sample error by \(V\)-fold cross validation. See the Details
section for guidelines on when `cv_trendfilter()` should be used versus
`sure_trendfilter()`.
cv_trendfilter(
  x,
  y,
  weights = NULL,
  nlambda = 250L,
  V = 10L,
  mc_cores = V,
  loss_funcs = NULL,
  fold_ids = NULL,
  ...
)
| Argument | Description |
|---|---|
| `x` | Vector of observed values for the input variable. |
| `y` | Vector of observed values for the output variable. |
| `weights` | Weights for the observed outputs, defined as the reciprocal variance of the additive noise that contaminates the output signal. When the noise is expected to have an equal variance \(\sigma^2\) for all observations, a scalar may be passed to `weights`. |
| `nlambda` | Number of hyperparameter values to evaluate during cross validation. Defaults to `250L`. |
| `V` | Number of folds that the data are partitioned into for V-fold cross validation. Must be at least 2 and no more than 10. Defaults to `10L`. |
| `mc_cores` | Number of cores to utilize for parallel computing. Defaults to `V`. |
| `loss_funcs` | A named list of one or more functions, each defining a loss function to be evaluated during cross validation. See the Loss functions section below for an example. |
| `fold_ids` | An integer vector defining a custom partition of the data for cross validation. |
| `...` | Additional named arguments to pass to the internal trend filtering call (e.g. `max_iter`, `obj_tol`). |
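As a hedged illustration (not the package's internal partitioning scheme), a valid `fold_ids` vector can be built by randomly assigning every observation an integer fold label in `1:V`:

```r
# Hypothetical sketch: build a custom fold_ids partition for V = 5 folds.
# cv_trendfilter() constructs its own partition when fold_ids = NULL; this
# only shows one valid way to supply your own.
set.seed(1)
n <- 100L                                  # number of observations
V <- 5L                                    # number of folds
fold_ids <- sample(rep(seq_len(V), length.out = n))
table(fold_ids)                            # balanced folds: 20 obs each
```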
An object of class `'cv_trendfilter'`. The object has subclass
`'trendfilter'` and may therefore be passed to generic stats functions such
as `predict()`, `fitted()`, and `residuals()`. More precisely, a
`'cv_trendfilter'` object is a list with the elements below, as well as all
elements from a `trendfilter()` call on the full data set.
- `lambda`: Vector of candidate hyperparameter values (always returned in descending order).
- `edf`: Number of effective degrees of freedom in the trend filtering estimator, for every candidate hyperparameter value in `lambda`.
- `error`: A named list of vectors, each representing the CV error curve for a loss function in `loss_funcs` (see below).
- `se_error`: Standard errors for the `error` curves, within a named list of the same structure.
- `lambda_min`: A named vector with length equal to `length(loss_funcs)`, containing the hyperparameter value that minimizes the CV error curve, for every loss function in `loss_funcs`.
- `lambda_1se`: A named vector with length equal to `length(loss_funcs)`, containing the "1-standard-error rule" hyperparameter, for every loss function in `loss_funcs`. The "1-standard-error rule" hyperparameter is the largest hyperparameter value in `lambda` that has a CV error within one standard error of `min(error)`. It serves as an Occam's razor-like heuristic: given two models with approximately equal performance (in terms of some loss function), it may be wise to opt for the simpler model, i.e. the model with the larger hyperparameter value / fewer effective degrees of freedom.
- `edf_min`: A named vector with length equal to `length(loss_funcs)`, containing the number of effective degrees of freedom in the trend filtering estimator that minimizes the CV error curve, for every loss function in `loss_funcs`.
- `edf_1se`: A named vector with length equal to `length(loss_funcs)`, containing the number of effective degrees of freedom in the "1-standard-error rule" trend filtering estimator, for every loss function in `loss_funcs`.
- `i_min`: A named vector with length equal to `length(loss_funcs)`, containing the index of `lambda` that minimizes the CV error curve, for every loss function in `loss_funcs`.
- `i_1se`: A named vector with length equal to `length(loss_funcs)`, containing the index of `lambda` that gives the "1-standard-error rule" hyperparameter value, for every loss function in `loss_funcs`.
- `fitted_values`: The fitted values of all trend filtering estimates, returned as a matrix with `length(lambda)` columns, with `fitted_values[, i]` corresponding to the trend filtering estimate with hyperparameter `lambda[i]`.
- `admm_params`: A list of the parameter values used by the ADMM algorithm used to solve the trend filtering convex optimization.
- `obj_func`: The relative change in the objective function over the ADMM algorithm's final iteration, for every candidate hyperparameter in `lambda`.
- `n_iter`: Total number of iterations taken by the ADMM algorithm, for every candidate hyperparameter in `lambda`. If an element of `n_iter` is exactly equal to `admm_params$max_iter`, then the ADMM algorithm stopped before reaching the objective tolerance `admm_params$obj_tol`. In these situations, you may need to increase the maximum number of tolerable iterations by passing a `max_iter` argument to `cv_trendfilter()` in order to ensure that the ADMM solution has converged to satisfactory precision.
- `loss_funcs`: A named list of functions that defines all loss functions (both internal and user-passed) evaluated during cross validation.
- `V`: The number of folds the data were split into for cross validation.
- `x`: Vector of observed values for the input variable.
- `y`: Vector of observed values for the output variable (if originally present, observations with `is.na(y)` or `weights == 0` are dropped).
- `weights`: Vector of weights for the observed outputs.
- `k`: Degree of the trend filtering estimates.
- `status`: For internal use. Output from the C solver.
- `call`: The function call.
- `scale_xy`: For internal use.
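To make the `lambda_1se` definition concrete, here is a minimal base-R sketch of the 1-standard-error rule on a made-up error curve (illustrative only; the package computes this internally):

```r
# Toy CV error curve over a descending lambda grid
lambda <- 10^seq(2, -2, length.out = 50)   # descending, as in the output
error <- (log10(lambda) - 0.5)^2 + 1       # toy convex CV error curve
se <- rep(0.3, 50)                         # toy standard errors
i_min <- which.min(error)                  # index of the minimizing lambda
# Largest lambda whose error is within one SE of the minimum: since lambda
# is in descending order, that is the smallest qualifying index.
i_1se <- min(which(error <= error[i_min] + se[i_min]))
c(lambda_min = lambda[i_min], lambda_1se = lambda[i_1se])
```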
Many common regression loss functions are defined internally, and a cross
validation error curve is returned for each. Custom loss functions may also
be passed via the `loss_funcs` argument. See the Loss functions section
below for definitions of the internal loss functions.

Generic functions such as `predict()`, `fitted()`, and `residuals()` may
be called on the `cv_trendfilter()` output.
Our recommendations for when to use `cv_trendfilter()` versus
`sure_trendfilter()` are summarized in the table below. See Section 3.5 of
Politsch et al. (2020a) for more details.

| Scenario | Hyperparameter optimization |
|---|---|
| `x` is unevenly sampled | `cv_trendfilter()` |
| `x` is evenly sampled and measurement variances for `y` are not available | `cv_trendfilter()` |
| `x` is evenly sampled and measurement variances for `y` are available | `sure_trendfilter()` |
For our purposes, an evenly sampled data set with some discarded pixels
(either sporadically or in wide consecutive chunks) is still considered to
be evenly sampled. When `x` is evenly sampled on a transformed scale, we
recommend transforming to that scale and carrying out the full trend
filtering analysis on that scale. See the `sure_trendfilter()` examples for
a case when the inputs are evenly sampled on the `log10(x)` scale.
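For instance (a minimal sketch with a made-up grid), inputs that are evenly spaced in `log10(x)` can be transformed before the analysis:

```r
# Hypothetical grid that is evenly sampled on the log10 scale
x <- 10^seq(0, 3, by = 0.05)
x_transformed <- log10(x)        # carry out trend filtering on this scale
all(abs(diff(x_transformed) - 0.05) < 1e-10)   # evenly spaced
```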
The following loss functions are automatically computed during cross
validation, and their CV error curves are returned in the `error` list
within the `cv_trendfilter()` output.
Mean absolute deviations error: \[MAE(\lambda) = \frac{1}{n} \sum_{i=1}^{n}|Y_i - \hat{f}(x_i; \lambda)|\]
Weighted mean absolute deviations error: \[WMAE(\lambda) = \sum_{i=1}^{n} |Y_i - \hat{f}(x_i; \lambda)|\frac{\sqrt{w_i}}{\sum_j\sqrt{w_j}}\]
Mean-squared error: \[MSE(\lambda) = \frac{1}{n} \sum_{i=1}^{n} |Y_i - \hat{f}(x_i; \lambda)|^2\]
Weighted mean-squared error: \[WMSE(\lambda) = \sum_{i=1}^{n}|Y_i - \hat{f}(x_i; \lambda)|^2\frac{w_i}{\sum_jw_j}\]
log-cosh error: \[logcosh(\lambda) = \frac{1}{n}\sum_{i=1}^{n} \log\left(\cosh\left(Y_i - \hat{f}(x_i; \lambda)\right)\right)\]
Weighted log-cosh error: \[wlogcosh(\lambda) = \sum_{i=1}^{n} \log\left(\cosh\left((Y_i - \hat{f}(x_i; \lambda))\sqrt{w_i}\right)\right)\]
Huber loss: \[Huber(\lambda) = \frac{1}{n}\sum_{i=1}^{n}L_{\lambda}(Y_i; \delta)\] \[\text{where}\;\;\;\;L_{\lambda}(Y_i; \delta) = \cases{ |Y_i - \hat{f}(x_i; \lambda)|^2, & $|Y_i - \hat{f}(x_i; \lambda)| \leq \delta$ \cr 2\delta|Y_i - \hat{f}(x_i; \lambda)| - \delta^2, & $|Y_i - \hat{f}(x_i; \lambda)| > \delta$}\]
Weighted Huber loss: \[wHuber(\lambda) = \sum_{i=1}^{n}L_{\lambda}(Y_i; \delta)\] \[\text{where}\;\;\;\;L_{\lambda}(Y_i; \delta) = \cases{ |Y_i - \hat{f}(x_i; \lambda)|^2w_i, & $|Y_i - \hat{f}(x_i; \lambda)|\sqrt{w_i} \leq \delta$ \cr 2\delta|Y_i - \hat{f}(x_i; \lambda)|\sqrt{w_i} - \delta^2, & $|Y_i - \hat{f}(x_i; \lambda)|\sqrt{w_i} > \delta$}\]
Mean-squared logarithmic error: \[MSLE(\lambda) = \frac{1}{n}\sum_{i=1}^{n} \left|\log(Y_i + 1) - \log(\hat{f}(x_i; \lambda) + 1)\right|^2 \]
where \(w_i :=\) `weights[i]`.
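As a concrete check of the definitions above, here are minimal base-R implementations of two of these losses (MAE and weighted MSE) evaluated on a tiny made-up data set; these are illustrative sketches, not the package's internal code:

```r
# MAE(lambda) = (1/n) * sum |y_i - fhat_i|
mae <- function(y, fhat) mean(abs(y - fhat))
# WMSE(lambda) = sum w_i * |y_i - fhat_i|^2 / sum w_j
wmse <- function(y, fhat, w) sum(w * (y - fhat)^2) / sum(w)

y <- c(1, 2, 3)
fhat <- c(1.5, 2, 2)       # stand-in for the trend filtering estimate
w <- c(1, 1, 2)            # weights, i.e. reciprocal noise variances
mae(y, fhat)               # (0.5 + 0 + 1) / 3 = 0.5
wmse(y, fhat, w)           # (1 * 0.25 + 0 + 2 * 1) / 4 = 0.5625
```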
When defining custom loss functions, each function within the named list
passed to `loss_funcs` should take three vector arguments, `tf_estimate`,
`y`, and `weights`, and return a single scalar value for the validation
loss. For example, if we wanted to optimize the hyperparameter by
minimizing an uncertainty-weighted median of the model's absolute errors,
we would pass the list below to `loss_funcs`:
MedAE <- function(tf_estimate, y, weights) {
  matrixStats::weightedMedian(abs(tf_estimate - y), sqrt(weights))
}

my_loss_funcs <- list(MedAE = MedAE)
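A quick sanity check (illustrative only) that a custom loss has the expected three-argument signature; this runs even without `matrixStats` installed, since the function body is not evaluated:

```r
# Same custom loss as above; weightedMedian is only called when the
# function is invoked, so defining it requires no extra packages.
MedAE <- function(tf_estimate, y, weights) {
  matrixStats::weightedMedian(abs(tf_estimate - y), sqrt(weights))
}
identical(names(formals(MedAE)), c("tf_estimate", "y", "weights"))  # TRUE
```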
Politsch et al. (2020a). Trend filtering – I. A modern statistical tool for time-domain astronomy and astronomical spectroscopy. MNRAS, 492(3), p. 4005-4018.
Politsch et al. (2020b). Trend filtering – II. Denoising astronomical signals with varying degrees of smoothness. MNRAS, 492(3), p. 4019-4032.
#> # A tibble: 6 × 3
#>    phase  flux std_err
#>    <dbl> <dbl>   <dbl>
#> 1 -0.499 0.938  0.0102
#> 2 -0.498 0.930  0.0102
#> 3 -0.496 0.944  0.0102
#> 4 -0.494 0.963  0.0102
#> 5 -0.494 0.934  0.0102
#> 6 -0.494 0.949  0.0102

x <- eclipsing_binary$phase
y <- eclipsing_binary$flux
weights <- 1 / eclipsing_binary$std_err^2

cv_tf <- cv_trendfilter(
  x = x,
  y = y,
  weights = weights,
  max_iter = 1e4,
  obj_tol = 1e-6
)
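After a call like the one above, it is worth confirming that no fit exhausted the iteration budget (see the `n_iter` element in the Value section). The snippet below mocks the relevant pieces of a `cv_trendfilter()` result so it runs standalone; with a real fit, replace the mock `cv_tf` with your own object:

```r
# Mocked stand-in for a cv_trendfilter() result (illustrative values only)
cv_tf <- list(
  n_iter = c(120L, 10000L, 87L),
  admm_params = list(max_iter = 10000L)
)
# Any fit whose iteration count equals max_iter stopped before reaching
# the objective tolerance and may warrant a larger max_iter.
hit_cap <- cv_tf$n_iter == cv_tf$admm_params$max_iter
sum(hit_cap)   # number of fits that stopped at the iteration cap
```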