Title: | Tools for generalized quantile modeling |
---|---|
Description: | Tools for generalized quantile modeling: regularized quantile regression (with generalized lasso penalties and noncrossing constraints), cross-validation, quantile extrapolation, and quantile ensembles. |
Authors: | Ryan Tibshirani [aut, cre], Logan Brooks [aut] |
Maintainer: | Ryan Tibshirani <[email protected]> |
License: | GPL-2 |
Version: | 1.0.0 |
Built: | 2024-11-08 06:14:48 UTC |
Source: | https://github.com/ryantibs/quantgen |
Retrieve ensemble coefficients for estimating the conditional quantiles at given tau values.
## S3 method for class 'quantile_ensemble'
coef(object, ...)
object |
The quantile_ensemble object. |
... |
Additional arguments (not used). |
Retrieve generalized lasso coefficients for estimating the conditional quantiles at specified tau or lambda values.
## S3 method for class 'quantile_genlasso'
coef(object, s = NULL, ...)
object |
The quantile_genlasso object. |
s |
Vector of integers specifying the tau and lambda values to consider
for coefficients; for each |
... |
Additional arguments (not used). |
Combine (say) p matrices, each of dimension n x r, into an n x p x r array.
combine_into_array(mat, ...)
mat |
First matrix to combine into an array. Alternatively, a list of matrices to combine into an array. |
... |
Additional matrices to combine into an array. These additional arguments will be ignored if mat is a list. |
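As a rough illustration of the reshaping described above, the following base-R sketch stacks p matrices of dimension n x r into an n x p x r array. The helper name combine_sketch is hypothetical and this is not the package's implementation; it only shows the intended layout.

```r
# Hypothetical stand-in for illustration; not the package's implementation.
combine_sketch <- function(mats) {
  n <- nrow(mats[[1]])
  r <- ncol(mats[[1]])
  p <- length(mats)
  # Every matrix must share the same n x r shape.
  stopifnot(all(vapply(mats, function(m) all(dim(m) == c(n, r)), logical(1))))
  # Stack as n x r x p, then swap the last two dimensions to get n x p x r.
  aperm(array(unlist(mats), dim = c(n, r, p)), c(1, 3, 2))
}

m1 <- matrix(1:6, nrow = 3)    # component 1: 3 points x 2 quantile levels
m2 <- matrix(7:12, nrow = 3)   # component 2
arr <- combine_sketch(list(m1, m2))
dim(arr)       # 3 2 2
arr[, 1, ]     # recovers m1
```

An array laid out this way has the shape that quantile_ensemble expects for its qarr argument: (points) x (components) x (quantile levels).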
Run cross-validation for the quantile generalized lasso on a tau by lambda grid. For each tau, the lambda value minimizing the cross-validation error is reported.
cv_quantile_genlasso( x, y, d, tau, lambda = NULL, nlambda = 30, lambda_min_ratio = 0.001, weights = NULL, nfolds = 5, train_test_inds = NULL, intercept = TRUE, standardize = TRUE, lb = -Inf, ub = Inf, noncross = FALSE, x0 = NULL, lp_solver = c("glpk", "gurobi"), time_limit = NULL, warm_starts = TRUE, params = list(), transform = NULL, inv_trans = NULL, jitter = NULL, verbose = FALSE, sort = FALSE, iso = FALSE, nonneg = FALSE, round = FALSE )
nfolds |
Number of cross-validation folds. Default is 5. |
train_test_inds |
List of length two, with components named |
All arguments through verbose (except for nfolds and train_test_inds) are as in quantile_genlasso_grid and quantile_genlasso. Note that the noncross and x0 arguments are not passed to quantile_genlasso_grid for the calculation of cross-validation errors and optimal lambda values; they are only passed to quantile_genlasso for the final object that is fit to the full training set. Past verbose, the arguments are as in predict.quantile_genlasso, and control what happens with the predictions made on the validation sets.
A list with the following components:
qgl_obj |
A quantile_genlasso object, fit to the full training set. |
cv_mat |
Matrix of cross-validation errors (as measured by quantile loss), of dimension (number of tuning parameter values) x (number of quantile levels) |
lambda_min |
Vector of optimum lambda values, one per quantile level |
tau, lambda |
Vectors of tau and lambda values used |
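To make the relationship between cv_mat and lambda_min concrete, here is a small base-R sketch with made-up numbers. It assumes only what is stated above: rows of cv_mat index tuning parameter values, columns index quantile levels, and the reported lambda is the minimizer within each column.

```r
# Toy inputs (made up for illustration): 4 lambda values, 2 quantile levels.
lambda <- c(1, 0.1, 0.01, 0.001)
tau <- c(0.1, 0.5)
cv_mat <- matrix(c(0.9, 0.4, 0.5, 0.7,    # errors for tau = 0.1
                   0.8, 0.6, 0.3, 0.35),  # errors for tau = 0.5
                 nrow = length(lambda), ncol = length(tau))

# For each column (quantile level), pick the lambda with the smallest error.
lambda_min <- lambda[apply(cv_mat, 2, which.min)]
lambda_min   # 0.10 0.01
```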
Run cross-validation for the quantile lasso on a tau by lambda grid. For each tau, the lambda value minimizing the cross-validation error is reported.
cv_quantile_lasso( x, y, tau, lambda = NULL, nlambda = 30, lambda_min_ratio = 0.001, weights = NULL, no_pen_vars = c(), nfolds = 5, train_test_inds = NULL, intercept = TRUE, standardize = TRUE, lb = -Inf, ub = Inf, noncross = FALSE, x0 = NULL, lp_solver = c("glpk", "gurobi"), time_limit = NULL, warm_starts = TRUE, params = list(), transform = NULL, inv_trans = NULL, jitter = NULL, verbose = FALSE, sort = FALSE, iso = FALSE, nonneg = FALSE, round = FALSE )
nfolds |
Number of cross-validation folds. Default is 5. |
train_test_inds |
List of length two, with components named |
All arguments through verbose (except for nfolds and train_test_inds) are as in quantile_lasso_grid and quantile_lasso. Note that the noncross and x0 arguments are not passed to quantile_lasso_grid for the calculation of cross-validation errors and optimal lambda values; they are only passed to quantile_lasso for the final object that is fit to the full training set. Past verbose, the arguments are as in predict.quantile_lasso, and control what happens with the predictions made on the validation sets. The associated predict function is just that for the cv_quantile_genlasso class.
A list with the following components:
qgl_obj |
A quantile_lasso object, fit to the full training set. |
cv_mat |
Matrix of cross-validation errors (as measured by quantile loss), of dimension (number of tuning parameter values) x (number of quantile levels) |
lambda_min |
Vector of optimum lambda values, one per quantile level |
Construct a difference operator, of a given order, for use in trend filtering penalties.
get_diff_mat(p, k)
p |
Dimension (number of columns) of the difference matrix. |
k |
Order of the difference matrix. |
A sparse matrix of dimension (p - k) x p.
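For intuition about the shape and entries of such an operator, the sketch below builds a dense k-th order difference matrix in base R. The helper name diff_mat_sketch is hypothetical; the package function returns a sparse matrix and its internals may differ (including sign conventions).

```r
# Dense base-R sketch (the package returns a sparse matrix instead).
diff_mat_sketch <- function(p, k) {
  # Differencing the rows of the identity k times yields a (p - k) x p matrix.
  diff(diag(p), differences = k)
}

D <- diff_mat_sketch(p = 6, k = 2)
dim(D)              # 4 6
D[1, ]              # 1 -2 1 0 0 0
drop(D %*% (1:6))   # second differences of a linear sequence are all zero
```

Penalizing the l1 norm of D applied to the coefficient vector therefore encourages coefficients that are piecewise polynomial in their index, which is the idea behind trend filtering penalties.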
Compute lambda max for a quantile generalized lasso problem.
get_lambda_max(x, y, d, weights = NULL, lp_solver = c("glpk", "gurobi"))
This is not exact, but should be close to the exact value of $\lambda$ such that $D \hat\beta = 0$ at the solution $\hat\beta$ of the quantile generalized lasso problem. It is derived from the KKT conditions when $\tau = 1/2$.
Compute a lambda sequence for a quantile generalized lasso problem.
get_lambda_seq( x, y, d, nlambda, lambda_min_ratio, weights = NULL, intercept = TRUE, standardize = TRUE, lp_solver = c("glpk", "gurobi"), transform = NULL )
This function returns nlambda values log-spaced in between lambda_max, as computed by get_lambda_max, and lambda_max * lambda_min_ratio. If d is not specified, we will set it equal to the identity (hence interpret the problem as a quantile lasso problem).
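The log-spacing described above can be sketched in a few lines of base R. The helper name lambda_seq_sketch is hypothetical, lambda_max is passed in directly rather than computed, and the decreasing order is an assumption made for this illustration.

```r
# Hypothetical helper: nlambda values, equally spaced on the log scale,
# running from lambda_max down to lambda_max * lambda_min_ratio.
lambda_seq_sketch <- function(lambda_max, nlambda, lambda_min_ratio) {
  exp(seq(log(lambda_max), log(lambda_max * lambda_min_ratio),
          length.out = nlambda))
}

lam <- lambda_seq_sketch(lambda_max = 10, nlambda = 4, lambda_min_ratio = 1e-3)
lam   # 10, 1, 0.1, 0.01 (up to floating-point error)
```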
Returns functions that map $x \mapsto \log(ax + b)$ and $x \mapsto (\exp(x) - b)/a$. (These are inverses.)
log_pad(a = 1, b = 1)
exp_pad(a = 1, b = 1)
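The following sketch shows how such a transform pair might behave. It assumes that log_pad maps x to log(a*x + b) and exp_pad maps x to (exp(x) - b)/a, which for a = 1, b = 1 matches the log(1+y) and exp(x)-1 pair mentioned for count data elsewhere in this manual. The _sketch functions are hypothetical re-implementations, not the package's code.

```r
# Hypothetical re-implementations, for illustration only.
log_pad_sketch <- function(a = 1, b = 1) function(x) log(a * x + b)
exp_pad_sketch <- function(a = 1, b = 1) function(x) (exp(x) - b) / a

transform <- log_pad_sketch()   # x -> log(1 + x)
inv_trans <- exp_pad_sketch()   # x -> exp(x) - 1

y <- c(0, 1, 5, 20)             # count-like responses
all.equal(inv_trans(transform(y)), y)   # TRUE: the round trip recovers y
```

Functions of this shape are the kind of values one would pass as the transform and inv_trans arguments of the fitting functions.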
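Returns functions that map $x \mapsto \log\big(\frac{ax + b}{1 - ax + b}\big)$ and $x \mapsto \frac{\exp(x)(1 + b) - b}{a(1 + \exp(x))}$. (These are inverses.)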
logit_pad(a = 1, b = 0.01)
sigmd_pad(a = 1, b = 0.011)
Plot the cross-validation error curves, for each quantile level, as functions of the tuning parameter value.
## S3 method for class 'cv_quantile_genlasso'
plot(x, legend_pos = "topleft", ...)
x |
The cv_quantile_genlasso object. |
legend_pos |
Position for the legend; default is "topleft"; use NULL to suppress the legend. |
... |
Additional arguments (not used). |
Predict the conditional quantiles at a new set of predictor variables, using the generalized lasso coefficients tuned by cross-validation.
## S3 method for class 'cv_quantile_genlasso'
predict( object, newx, s = NULL, sort = FALSE, iso = FALSE, nonneg = FALSE, round = FALSE, ... )
This just calls the predict function on the quantile_genlasso object that is stored within the given cv_quantile_genlasso object.
Predict the conditional quantiles at a new set of ensemble realizations, using the ensemble coefficients at given tau values.
## S3 method for class 'quantile_ensemble'
predict( object, newq, s = NULL, sort = TRUE, iso = FALSE, nonneg = FALSE, round = FALSE, ... )
object |
The quantile_ensemble object. |
newq |
Array of new predicted quantiles, of dimension (number of new prediction points) x (number of ensemble components) x (number of quantile levels). |
sort |
Should the returned quantile estimates be sorted? Default is TRUE. |
iso |
Should the returned quantile estimates be passed through isotonic regression? Default is FALSE; if TRUE, takes priority over sort. |
nonneg |
Should the returned quantile estimates be truncated at 0? Natural for count data. Default is FALSE. |
round |
Should the returned quantile estimates be rounded? Natural for count data. Default is FALSE. |
... |
Additional arguments (not used). |
Predict the conditional quantiles at a new set of predictor variables, using the generalized lasso coefficients at specified tau or lambda values.
## S3 method for class 'quantile_genlasso'
predict( object, newx, s = NULL, sort = FALSE, iso = FALSE, nonneg = FALSE, round = FALSE, ... )
object |
The quantile_genlasso object. |
newx |
Matrix of new predictor variables at which predictions should be made. |
s |
Vector of integers specifying the tau and lambda values to consider
for predictions; for each |
sort |
Should the returned quantile estimates be sorted? Default is
FALSE. Note: this option only makes sense if the values in the stored
|
iso |
Should the returned quantile estimates be passed through isotonic regression? Default is FALSE; if TRUE, takes priority over sort. |
nonneg |
Should the returned quantile estimates be truncated at 0? Natural for count data. Default is FALSE. |
round |
Should the returned quantile estimates be rounded? Natural for count data. Default is FALSE. |
... |
Additional arguments (not used). |
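The sort, iso, nonneg, and round options all describe post-processing of predicted quantiles. The base-R sketch below applies analogous steps to one row of made-up predictions. It uses isoreg from the stats package for the isotonic step; the order in which the steps are chained here is an assumption for illustration, not a description of the package internals.

```r
# One row of raw predictions at increasing quantile levels; note the crossing.
tau <- c(0.1, 0.5, 0.9)
qhat <- c(2.4, -0.3, 3.8)

q_sorted <- sort(qhat)               # analogous to sort = TRUE
q_iso <- isoreg(tau, qhat)$yf        # analogous to iso = TRUE (monotone fit)
q_nonneg <- pmax(q_iso, 0)           # analogous to nonneg = TRUE
q_round <- round(q_nonneg)           # analogous to round = TRUE

q_sorted   # -0.3 2.4 3.8
q_iso      # 1.05 1.05 3.80
q_round    # 1 1 4
```

Sorting only permutes the estimates, whereas isotonic regression pools the crossing pair to its average, so the two options can give different answers on the same input.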
Predict the conditional quantiles at a new set of predictor variables, using the generalized lasso coefficients at given tau or lambda values.
## S3 method for class 'quantile_genlasso_grid'
predict( object, newx, sort = FALSE, iso = FALSE, nonneg = FALSE, round = FALSE, ... )
This function operates as in the predict.quantile_genlasso function for a quantile_genlasso object, but with a few key differences. First, the output is reformatted so that it is an array of dimension (number of prediction points) x (number of tuning parameter values) x (number of quantile levels). This output is generated from the full set of tau and lambda pairs stored in the given quantile_genlasso_grid object (selecting a subset is disallowed). Second, the arguments sort and iso operate on the appropriate slices of this array: for a fixed lambda value, we sort or run isotonic regression across all tau values.
This package provides tools for generalized quantile modeling: regularized quantile regression (with generalized lasso penalties and noncrossing constraints), cross-validation, quantile extrapolation, and quantile ensembles.
We recommend the "getting started" and other vignettes, provided online: https://ryantibs.github.io/quantgen/.
Fit ensemble weights, given a set of quantile predictions.
quantile_ensemble( qarr, y, tau, weights = NULL, tau_groups = rep(1, length(tau)), intercept = FALSE, nonneg = TRUE, unit_sum = TRUE, noncross = TRUE, q0 = NULL, lp_solver = c("glpk", "gurobi"), time_limit = NULL, params = list(), verbose = FALSE )
qarr |
Array of predicted quantiles, of dimension (number of prediction points) x (number of ensemble components) x (number of quantile levels). |
y |
Vector of responses (whose quantiles are being predicted by qarr). |
tau |
Vector of quantile levels at which predictions are made. Assumed to be distinct, and sorted in increasing order. |
weights |
Vector of observation weights (to be used in the loss function). Default is NULL, which is interpreted as a weight of 1 for each observation. |
tau_groups |
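Vector of group labels, having the same length as tau. Quantile levels that share a label share a common set of ensemble weights (see details). Default is rep(1, length(tau)), which places all quantile levels in a single group. |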
intercept |
Should an intercept be included in the ensemble model? Default is FALSE. |
nonneg |
Should the ensemble weights be constrained to be nonnegative? Default is TRUE. |
unit_sum |
Should the ensemble weights be constrained to sum to 1? Default is TRUE. |
noncross |
Should noncrossing constraints be enforced? Default is TRUE. Note: this option only matters when there is more than one group of ensemble weights, as determined by tau_groups. |
q0 |
Array of points used to define the noncrossing constraints. Must have dimension (number of points) x (number of ensemble components) x (number of quantile levels). Default is NULL, which means that we consider noncrossing constraints at the training points qarr. |
lp_solver |
One of "glpk" or "gurobi", indicating which LP solver to use. If possible, "gurobi" should be used because it is much faster and more stable; the default is "glpk", however, because it is open-source. |
time_limit |
This sets the maximum amount of time (in seconds) to allow Gurobi or GLPK to solve any single quantile generalized lasso problem (for a single tau and lambda value). Default is NULL, which means unlimited time. |
params |
List of control parameters to pass to Gurobi or GLPK. Default is list(). |
verbose |
Should progress be printed out to the console? Default is FALSE. |
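This function solves the following quantile ensemble optimization problem, over quantile levels $\tau_k, \; k=1,\ldots,r$:

$$\min_{\alpha_j, \, j=1,\ldots,p} \; \sum_{k=1}^r \sum_{i=1}^n w_i \psi_{\tau_k} \bigg( y_i - \sum_{j=1}^p \alpha_j q_{ijk} \bigg) \quad \text{subject to} \quad \sum_{j=1}^p \alpha_j = 1, \;\; \alpha_j \geq 0, \; j=1,\ldots,p,$$

for a response vector $y$ and quantile array $q$, where $q_{ijk}$ is an estimate of the quantile of $y_i$ at the level $\tau_k$, from ensemble component member $j$. Here $\psi_\tau(v) = \max\{\tau v, (\tau - 1) v\}$ is the "pinball" or "tilted $\ell_1$" loss. A more advanced version allows us to estimate a separate ensemble weight $\alpha_{jk}$ per component method $j$, per quantile level $k$:

$$\min_{\alpha_{jk}, \, j=1,\ldots,p, \, k=1,\ldots,r} \; \sum_{k=1}^r \sum_{i=1}^n w_i \psi_{\tau_k} \bigg( y_i - \sum_{j=1}^p \alpha_{jk} q_{ijk} \bigg) \quad \text{subject to} \quad \sum_{j=1}^p \alpha_{jk} = 1, \; k=1,\ldots,r, \;\; \alpha_{jk} \geq 0, \; j=1,\ldots,p, \; k=1,\ldots,r.$$

As a form of regularization, we can additionally incorporate noncrossing constraints into the above optimization, which take the form:

$$\sum_{j=1}^p \alpha_{jk} q_{jk} \leq \sum_{j=1}^p \alpha_{j,k+1} q_{j,k+1}, \quad k=1,\ldots,r-1, \;\; q \in \mathcal{Q},$$

where the quantile levels $\tau_k, \; k=1,\ldots,r$ are assumed to be in increasing order, and $\mathcal{Q}$ is a collection of points over which to enforce the noncrossing constraints. Finally, somewhere in between these two extremes is to allow one ensemble weight per component member $j$, per quantile group $g$. This can be interpreted as a set of further constraints which enforce equality between $\alpha_{jk}$ and $\alpha_{j\ell}$, for all $k, \ell$ that are in the same group $g$.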
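To illustrate how ensemble weights combine component quantiles, here is a base-R sketch that forms weighted combinations of a quantile array, first with one weight vector shared across levels and then with one weight vector per level. The weights and data are made up, no optimization is performed, and no intercept is included; this only shows the linear combination that the fitted weights would be applied to.

```r
set.seed(1)
n <- 5; p <- 3; r <- 2
qarr <- array(rnorm(n * p * r), dim = c(n, p, r))  # made-up component quantiles

# One weight per component, shared across levels: nonnegative, summing to 1.
alpha <- c(0.5, 0.3, 0.2)
stopifnot(all(alpha >= 0), isTRUE(all.equal(sum(alpha), 1)))

# Ensemble quantile at point i, level k: sum_j alpha[j] * qarr[i, j, k].
ens <- apply(qarr, 3, function(q) q %*% alpha)
dim(ens)   # 5 2

# One weight per component and per level: a p x r matrix, columns summing to 1.
alpha_mat <- cbind(c(0.5, 0.3, 0.2), c(0.1, 0.1, 0.8))
ens_k <- sapply(seq_len(r), function(k) qarr[, , k] %*% alpha_mat[, k])
dim(ens_k) # 5 2
```

Because the first column of alpha_mat equals alpha, the two versions agree at the first quantile level and differ at the second.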
A list with the following components:
alpha |
Vector or matrix of ensemble weights. If |
tau |
Vector of quantile levels used |
weights, tau_groups, ..., params |
Values of these other arguments used in the function call |
Extrapolate a set of quantiles at new quantile levels: parametric in the tails, nonparametric in the middle.
quantile_extrapolate( tau, qvals, tau_out = c(0.01, 0.025, seq(0.05, 0.95, by = 0.05), 0.975, 0.99), sort = TRUE, iso = FALSE, nonneg = FALSE, round = FALSE, qfun_left = qnorm, qfun_right = qnorm, n_tau_left = 1, n_tau_right = 1, middle = c("cubic", "linear"), param0 = NULL, param1 = NULL, grid_size = 1000, tol = 0.01, max_iter = 10 )
tau |
Vector of quantile levels. Assumed to be distinct, and sorted in increasing order. |
qvals |
Vector or matrix of quantiles; if a matrix, each row is a separate set of quantiles, at the same (common) quantile levels, given by tau. |
tau_out |
Vector of quantile levels at which to perform extrapolation. Default is a sequence of 23 quantile levels from 0.01 to 0.99. |
sort |
Should the returned quantile estimates be sorted? Default is TRUE. |
iso |
Should the returned quantile estimates be passed through isotonic regression? Default is FALSE; if TRUE, takes priority over sort. |
nonneg |
Should the returned quantile estimates be truncated at 0? Natural for count data. Default is FALSE. |
round |
Should the returned quantile estimates be rounded? Natural for count data. Default is FALSE. |
qfun_left, qfun_right |
Quantile functions on which to base extrapolation
in the left and right tails, respectively; each must be a function whose
first two arguments are a quantile level and a distribution parameter (such
as a mean parameter); these are assumed to be vectorized in the first
argument when the second argument is fixed, and also vectorized in the
second argument when the first argument is fixed. Default is qnorm for both. |
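n_tau_left, n_tau_right |
Integers between 1 and the length of tau, controlling how many of the leftmost and rightmost quantile levels, respectively, define the tail ranges (see details). Default is 1 for both. |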
middle |
One of "cubic" or "linear", indicating the interpolation method to use in the middle (outside of the tails, as determined by n_tau_left and n_tau_right). Default is "cubic". |
param0, param1, grid_size, tol, max_iter |
Arguments for the algorithm used for parameter-fitting for tail extrapolation. See details. |
This function interpolates/extrapolates an initial sparser set of quantiles, say $q_1, \ldots, q_m$ at the levels $\tau_1 < \ldots < \tau_m$, into a denser set of quantiles, say $q^*_1, \ldots, q^*_n$ at the levels $\tau^*_1 < \ldots < \tau^*_n$. At a high level, the strategy is to nonparametrically interpolate the quantiles whose levels fall in the interval $[\tau_1, \tau_m]$, and parametrically extrapolate the quantiles whose levels fall in $[\tau^*_1, \tau_1)$ or $(\tau_m, \tau^*_n]$. Let us call these the "middle" and "tail" strategies, respectively.

To give more details on the middle strategy: a monotone spline interpolant, either a cubic spline (if middle="cubic") or linear spline interpolant (if middle="linear"), is fit to the points

$$(\tau_i, q_i), \quad i=1,\ldots,m.$$

Denoting by $f$ this interpolant, we then set

$$q^*_i = f(\tau^*_i) \quad \text{for each } i \text{ such that } \tau^*_i \in [\tau_1, \tau_m].$$

To give more details on the tail strategy: in each tail, left and right, the user specifies a tail function $g(\cdot \,; \theta)$ which depends on a parameter $\theta$. This is done via the functions qfun_left and qfun_right; the default is qnorm for both, in which case $\theta$ represents the mean of the normal distribution (and the standard deviation is fixed at 1, as per the default in qnorm). Given this tail function, we then find the parameter value $\theta$ that best matches the given quantile, and use this for extrapolation. That is, for the left tail, we first fit $\theta$ such that

$$g(\tau_1; \theta) = q_1,$$

and we then set

$$q^*_i = g(\tau^*_i; \theta) \quad \text{for each } i \text{ such that } \tau^*_i < \tau_1.$$

The right tail is similar.

The fitting algorithm used for determining the parameter $\theta$ in each tail is a kind of iterative grid search that proceeds in "rounds". The arguments param0, param1 give the left and right endpoints of the initial interval used in the first round of the search (this interval typically contracts as the rounds proceed, but can also expand as needed); the argument grid_size is the number of grid points to consider in each round; the argument tol is the error tolerance for stopping; and the argument max_iter is the maximum number of rounds to consider. This fitting algorithm is robust to the case when the optimal parameter value that matches the given quantile, as per the above display, is not unique; in this case we take the mean of the range of optimal parameter values.

Finally, when the arguments n_tau_left and n_tau_right are changed from their defaults, then this changes the definition of the "middle" and the "tail" ranges, but otherwise the analogous strategies are employed. In fact, the middle strategy is unchanged, just applied to a different range. The tail strategy is similar, but now in each tail, left and right, we fit a separate parameter value $\theta$ for each given quantile level in the tail range (for example, for each of the two leftmost quantile levels if n_tau_left=2), and then take the mean of these parameters as a single parameter value on which to base tail extrapolation.
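The middle and tail strategies can be sketched in base R as follows, for the default case of one quantile level per tail and normal tails. This is only an illustration under stated assumptions: it uses splinefun with method = "hyman" as a stand-in monotone cubic interpolant, and it swaps the iterative grid search for uniroot to match the tail parameter. The data and the helper fit_theta are made up.

```r
tau <- c(0.25, 0.5, 0.75)     # given quantile levels
qvals <- c(-1.0, 0.2, 1.5)    # given quantiles (made-up numbers)
tau_out <- c(0.05, 0.4, 0.6, 0.95)

# Middle: monotone cubic interpolation through the points (tau, qvals).
f <- splinefun(tau, qvals, method = "hyman")

# Tails: with qnorm, the parameter is the mean (sd fixed at 1). Solve
# qnorm(tau0, mean = theta) = q0 for theta by root-finding (uniroot is a
# substitute here for the grid search described in the text).
fit_theta <- function(tau0, q0) {
  uniroot(function(theta) qnorm(tau0, mean = theta) - q0,
          interval = c(q0 - 10, q0 + 10), tol = 1e-10)$root
}
theta_left <- fit_theta(tau[1], qvals[1])
theta_right <- fit_theta(tau[3], qvals[3])

# Use the tail function outside [tau_1, tau_m], the interpolant inside.
q_out <- ifelse(tau_out < tau[1], qnorm(tau_out, mean = theta_left),
         ifelse(tau_out > tau[3], qnorm(tau_out, mean = theta_right),
                f(tau_out)))
q_out
```

For normal tails the matching parameter has the closed form q0 - qnorm(tau0), which gives a quick check on the root-finder; for other tail functions a numerical search of some kind is needed.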
A matrix of dimension (number of rows in qvals) x (length of tau_out), where each row is the extrapolation of the set of quantiles in the corresponding row of qvals, at the quantile levels specified in tau_out.
Compute quantile generalized lasso solutions.
quantile_genlasso( x, y, d, tau, lambda, weights = NULL, intercept = TRUE, standardize = TRUE, lb = -Inf, ub = Inf, noncross = FALSE, x0 = NULL, lp_solver = c("glpk", "gurobi"), time_limit = NULL, warm_starts = TRUE, params = list(), transform = NULL, inv_trans = NULL, jitter = NULL, verbose = FALSE )
x |
Matrix of predictors. If sparse, then passing it an appropriate
sparse |
y |
Vector of responses. |
d |
Matrix defining the generalized lasso penalty; see details. If
sparse, then passing it an appropriate sparse |
tau, lambda |
Vectors of quantile levels and tuning parameter values. If
these are not of the same length, the shorter of the two is recycled so
that they become the same length. Then, for each |
weights |
Vector of observation weights (to be used in the loss function). Default is NULL, which is interpreted as a weight of 1 for each observation. |
intercept |
Should an intercept be included in the regression model? Default is TRUE. |
standardize |
Should the predictors be standardized (to have unit variance) before fitting? Default is TRUE. |
lb, ub |
Lower and upper bounds, respectively, to place as constraints on
the coefficients in the optimization problem. These can be constants (to
place the same bound on each coefficient) or vectors of length equal to the
number of predictors (to place a potentially different bound on each
coefficient). Default is -Inf and Inf, respectively. |
noncross |
Should noncrossing constraints be applied? These force the
estimated quantiles to be properly ordered across all quantile levels being
considered. The default is FALSE. If TRUE, then noncrossing constraints are
applied to the estimated quantiles at all points specified by the next argument x0. |
x0 |
Matrix of points used to define the noncrossing
constraints. Default is NULL, which means that we consider noncrossing
constraints at the training points x. |
lp_solver |
One of "glpk" or "gurobi", indicating which LP solver to use. If possible, "gurobi" should be used because it is much faster and more stable; the default is "glpk", however, because it is open-source. |
time_limit |
This sets the maximum amount of time (in seconds) to allow Gurobi or GLPK to solve any single quantile generalized lasso problem (for a single tau and lambda value). Default is NULL, which means unlimited time. |
warm_starts |
Should warm starts be used in the LP solver (from one LP solve to the next)? Only supported for Gurobi. |
params |
List of control parameters to pass to Gurobi or GLPK. Default is list(). |
transform, inv_trans |
The first is a function to transform y before
solving the quantile generalized lasso; the second is the corresponding
inverse transform. For example: for count data, we might want to model
log(1+y) (which would be the transform, and the inverse transform would be
exp(x)-1). Both |
jitter |
Function for applying random jitter to y, which might help
optimization. For example: for count data, there can be lots of ties (with
or without transformation of y), which can make optimization more
difficult. The function |
verbose |
Should progress be printed out to the console? Default is FALSE. |
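This function solves the quantile generalized lasso problem, for each pair of quantile level $\tau$ and tuning parameter $\lambda$:

$$\min_{\beta_0, \beta} \; \sum_{i=1}^n w_i \psi_\tau(y_i - \beta_0 - x_i^T \beta) + \lambda \|D \beta\|_1,$$

for a response vector $y$ with components $y_i$, predictor matrix $X$ with rows $x_i$, and penalty matrix $D$. Here $\psi_\tau(v) = \max\{\tau v, (\tau - 1) v\}$ is the "pinball" or "tilted $\ell_1$" loss. When noncrossing constraints are applied, we instead solve one big joint optimization, over all quantile levels and tuning parameter values:

$$\min_{\beta_{0k}, \beta_k, \, k=1,\ldots,r} \; \sum_{k=1}^r \bigg( \sum_{i=1}^n w_i \psi_{\tau_k}(y_i - \beta_{0k} - x_i^T \beta_k) + \lambda_k \|D \beta_k\|_1 \bigg) \quad \text{subject to} \quad \beta_{0k} + x^T \beta_k \leq \beta_{0,k+1} + x^T \beta_{k+1}, \;\; k=1,\ldots,r-1, \;\; x \in \mathcal{X},$$

where the quantile levels $\tau_k, \; k=1,\ldots,r$ are assumed to be in increasing order, and $\mathcal{X}$ is a collection of points over which to enforce the noncrossing constraints.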
Either problem is readily converted into a linear program (LP), and solved using either Gurobi (which is free for academic use, and generally fast) or GLPK (which is free for everyone, but slower).
A list with the following components:
beta |
Matrix of generalized lasso coefficients, of dimension = (number of features + 1) x (number of quantile levels) assuming intercept=TRUE, else (number of features) x (number of quantile levels). Note: these coefficients will always be on the appropriate scale; they are always on the scale of original features, even if standardize=TRUE. |
status |
Vector of status flags returned by Gurobi's or GLPK's LP solver, of length = (number of quantile levels) |
tau, lambda |
Vectors of tau and lambda values used |
weights, intercept, ..., jitter |
Values of these other arguments used in the function call |
Ryan Tibshirani
Convenience function for computing quantile generalized lasso solutions on a tau by lambda grid.
quantile_genlasso_grid( x, y, d, tau, lambda = NULL, nlambda = 30, lambda_min_ratio = 0.001, weights = NULL, intercept = TRUE, standardize = TRUE, lb = -Inf, ub = Inf, lp_solver = c("glpk", "gurobi"), time_limit = NULL, warm_starts = TRUE, params = list(), transform = NULL, inv_trans = NULL, jitter = NULL, verbose = FALSE )
nlambda |
Number of lambda values to consider, for each quantile level. Default is 30. |
lambda_min_ratio |
Ratio of the minimum to maximum lambda value, for each quantile level. Default is 1e-3. |
This function forms a lambda vector either determined by the nlambda and lambda_min_ratio arguments, or the lambda argument; if the latter is specified, then it takes priority. Then, for each i and j, we solve a separate quantile generalized lasso problem at quantile level tau[i] and tuning parameter value lambda[j], using the quantile_genlasso function. All arguments (aside from nlambda and lambda_min_ratio) are as in the latter function; noncrossing constraints are disallowed.
Compute generalized lasso objective for a single tau and lambda value.
quantile_genlasso_objective(x, y, d, beta, tau, lambda)
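As a worked example of what such an objective looks like, the sketch below evaluates a weighted pinball loss plus an l1 penalty on d %*% beta for a tiny made-up data set. The function genlasso_objective_sketch is hypothetical: it takes the intercept as a separate beta0 argument and accepts weights, which are choices made for this illustration and need not match how the package function handles its beta argument.

```r
# Standard pinball (tilted absolute) loss: max(tau * v, (tau - 1) * v).
pinball <- function(v, tau) pmax(tau * v, (tau - 1) * v)

# Hypothetical helper: weighted pinball loss plus lambda * ||d beta||_1.
genlasso_objective_sketch <- function(x, y, d, beta0, beta, tau, lambda,
                                      weights = rep(1, length(y))) {
  resid <- y - beta0 - drop(x %*% beta)
  sum(weights * pinball(resid, tau)) + lambda * sum(abs(d %*% beta))
}

x <- cbind(c(1, 2, 3), c(0, 1, 0))
y <- c(1, 2, 4)
d <- diag(2)  # identity penalty matrix: reduces to the ordinary lasso penalty

obj <- genlasso_objective_sketch(x, y, d, beta0 = 0, beta = c(1, 0),
                                 tau = 0.5, lambda = 2)
obj   # 2.5: loss 0.5 (one residual of 1 at tau = 0.5) plus penalty 2 * 1
```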
Compute quantile lasso solutions.
quantile_lasso( x, y, tau, lambda, weights = NULL, no_pen_vars = c(), intercept = TRUE, standardize = TRUE, lb = -Inf, ub = Inf, noncross = FALSE, x0 = NULL, lp_solver = c("glpk", "gurobi"), time_limit = NULL, warm_starts = TRUE, params = list(), transform = NULL, inv_trans = NULL, jitter = NULL, verbose = FALSE )
x |
Matrix of predictors. If sparse, then passing it an appropriate
sparse |
y |
Vector of responses. |
tau, lambda |
Vectors of quantile levels and tuning parameter values. If
these are not of the same length, the shorter of the two is recycled so
that they become the same length. Then, for each |
weights |
Vector of observation weights (to be used in the loss function). Default is NULL, which is interpreted as a weight of 1 for each observation. |
no_pen_vars |
Indices of the variables that should be excluded from the lasso penalty. Default is c(). |
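This function solves the quantile lasso problem, for each pair of quantile level $\tau$ and tuning parameter $\lambda$:

$$\min_{\beta_0, \beta} \; \sum_{i=1}^n w_i \psi_\tau(y_i - \beta_0 - x_i^T \beta) + \lambda \|\beta\|_1,$$

for a response vector $y$ with components $y_i$, and predictor matrix $X$ with rows $x_i$. Here $\psi_\tau(v) = \max\{\tau v, (\tau - 1) v\}$ is the "pinball" or "tilted $\ell_1$" loss. When noncrossing constraints are applied, we instead solve one big joint optimization, over all quantile levels and tuning parameter values:

$$\min_{\beta_{0k}, \beta_k, \, k=1,\ldots,r} \; \sum_{k=1}^r \bigg( \sum_{i=1}^n w_i \psi_{\tau_k}(y_i - \beta_{0k} - x_i^T \beta_k) + \lambda_k \|\beta_k\|_1 \bigg) \quad \text{subject to} \quad \beta_{0k} + x^T \beta_k \leq \beta_{0,k+1} + x^T \beta_{k+1}, \;\; k=1,\ldots,r-1, \;\; x \in \mathcal{X},$$

where the quantile levels $\tau_k, \; k=1,\ldots,r$ are assumed to be in increasing order, and $\mathcal{X}$ is a collection of points over which to enforce the noncrossing constraints.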
Either problem is readily converted into a linear program (LP), and solved using either Gurobi (which is free for academic use, and generally fast) or GLPK (which is free for everyone, but slower).
All arguments not described above are as in the quantile_genlasso function. The associated coef and predict functions are just those for the quantile_genlasso class.
A list with the following components:
beta |
Matrix of lasso coefficients, of dimension = (number of features + 1) x (number of quantile levels) assuming intercept=TRUE, else (number of features) x (number of quantile levels). |
status |
Vector of status flags returned by Gurobi's or GLPK's LP solver, of length = (number of quantile levels) |
tau, lambda |
Vectors of tau and lambda values used |
weights, no_pen_vars, ..., jitter |
Values of these other arguments used in the function call |
Ryan Tibshirani
Convenience function for computing quantile lasso solutions on a tau by lambda grid.
quantile_lasso_grid( x, y, tau, lambda = NULL, nlambda = 30, lambda_min_ratio = 0.001, weights = NULL, no_pen_vars = c(), intercept = TRUE, standardize = TRUE, lb = -Inf, ub = Inf, lp_solver = c("glpk", "gurobi"), time_limit = NULL, warm_starts = TRUE, params = list(), transform = NULL, inv_trans = NULL, jitter = NULL, verbose = FALSE )
nlambda |
Number of lambda values to consider, for each quantile level. Default is 30. |
lambda_min_ratio |
Ratio of the minimum to maximum lambda value, for each quantile level. Default is 1e-3. |
This function forms a lambda vector either determined by the nlambda and lambda_min_ratio arguments, or the lambda argument; if the latter is specified, then it takes priority. Then, for each i and j, we solve a separate quantile lasso problem at quantile level tau[i] and tuning parameter value lambda[j], using the quantile_lasso function. All arguments (aside from nlambda and lambda_min_ratio) are as in the latter function; noncrossing constraints are disallowed. The associated predict function is just that for the quantile_genlasso_grid class.
Compute lasso objective for a single tau and lambda value.
quantile_lasso_objective(x, y, beta, tau, lambda)
Compute the quantile (tilted absolute) loss for a single tau value.
quantile_loss(yhat, y, tau)
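To show why this loss is the natural error measure for quantile estimates, the sketch below checks numerically that the constant minimizing the average tilted absolute loss at tau = 0.5 is the sample median. The helper pinball_loss_sketch is hypothetical; in particular, averaging over observations is an assumption of this sketch and may not match how the package function aggregates.

```r
# Hypothetical helper: average tilted absolute loss of predictions yhat.
pinball_loss_sketch <- function(yhat, y, tau) {
  v <- y - yhat
  mean(pmax(tau * v, (tau - 1) * v))
}

y <- c(1, 2, 3, 4, 100)          # made-up responses with one outlier
grid <- seq(0, 100, by = 0.5)    # candidate constant predictions
loss <- vapply(grid, function(c0) pinball_loss_sketch(c0, y, tau = 0.5),
               numeric(1))
grid[which.min(loss)]            # 3, the sample median
```

Changing tau shifts the minimizer toward the corresponding sample quantile, which is what makes the loss suitable for scoring cross-validated quantile predictions.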
Refit generalized lasso solutions at a new set of quantile levels, given an existing cv_quantile_genlasso object.
refit_quantile_genlasso( obj, x, y, d, tau_new, weights = NULL, intercept = NULL, standardize = NULL, lb = NULL, ub = NULL, noncross = FALSE, x0 = NULL, lp_solver = NULL, time_limit = NULL, warm_starts = NULL, params = NULL, transform = NULL, inv_trans = NULL, jitter = NULL, verbose = FALSE )
obj |
The cv_quantile_genlasso object. |
x |
Matrix of predictors. |
y |
Vector of responses. |
d |
Matrix defining the generalized lasso penalty. |
tau_new |
Vector of new quantile levels at which to fit new solutions. |
noncross |
Should noncrossing constraints be applied? These force the
estimated quantiles to be properly ordered across all quantile levels being
considered. The default is FALSE. If TRUE, then noncrossing constraints are
applied to the estimated quantiles at all points specified by the next argument x0. |
x0 |
Matrix of points used to define the noncrossing
constraints. Default is NULL, which means that we consider noncrossing
constraints at the training points x. |
verbose |
Should progress be printed out to the console? Default is FALSE. |
This function simply infers, for each quantile level in tau_new, a (very) roughly-CV-optimal tuning parameter value, then calls quantile_genlasso at the new quantile levels and corresponding tuning parameter values. If not specified, the arguments weights, intercept, standardize, lb, ub, lp_solver, time_limit, warm_starts, params, transform, inv_trans, jitter are all inherited from the given cv_quantile_genlasso object.
A quantile_genlasso object, with solutions at quantile levels tau_new.
Refit lasso solutions at a new set of quantile levels, given an existing cv_quantile_lasso object.
refit_quantile_lasso( obj, x, y, tau_new, weights = NULL, no_pen_vars = NULL, intercept = NULL, standardize = NULL, lb = NULL, ub = NULL, noncross = FALSE, x0 = NULL, lp_solver = NULL, time_limit = NULL, warm_starts = NULL, params = NULL, transform = NULL, inv_trans = NULL, jitter = NULL, verbose = FALSE )
obj |
The cv_quantile_lasso object. |
x |
Matrix of predictors. |
y |
Vector of responses. |
tau_new |
Vector of new quantile levels at which to fit new solutions. |
noncross |
Should noncrossing constraints be applied? These force the
estimated quantiles to be properly ordered across all quantile levels being
considered. The default is FALSE. If TRUE, then noncrossing constraints are
applied to the estimated quantiles at all points specified by the next argument x0. |
x0 |
Matrix of points used to define the noncrossing
constraints. Default is NULL, which means that we consider noncrossing
constraints at the training points x. |
verbose |
Should progress be printed out to the console? Default is FALSE. |
This function simply infers, for each quantile level in tau_new, a (very) roughly-CV-optimal tuning parameter value, then calls quantile_lasso at the new quantile levels and corresponding tuning parameter values. If not specified, the arguments weights, no_pen_vars, intercept, standardize, lp_solver, time_limit, warm_starts, params, transform, inv_trans, jitter are all inherited from the given cv_quantile_lasso object.
A quantile_lasso object, with solutions at quantile levels tau_new.
Function to generate random draws from $\mathrm{Unif}[a, b]$.
unif_jitter(a = 0, b = 0.01)
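As an illustration of how uniform jitter can break ties in count-like responses, here is a base-R sketch. The function unif_jitter_sketch is hypothetical; it assumes the jitter function takes a number of draws n and returns n uniform draws on [a, b], which is an assumption of this example rather than something confirmed by the text above.

```r
# Hypothetical stand-in: returns a function that draws n values from Unif[a, b].
unif_jitter_sketch <- function(a = 0, b = 0.01) function(n) runif(n, a, b)

set.seed(42)
jitter_fun <- unif_jitter_sketch()
y <- c(0, 0, 1, 1, 1, 3)                   # responses with many ties
y_jittered <- y + jitter_fun(length(y))

anyDuplicated(y_jittered)                  # 0: the ties are broken
max(abs(y_jittered - y)) <= 0.01           # TRUE: perturbation stays small
```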