\[ \newcommand{\argmin}{\mathop{\mathrm{argmin}}} \newcommand{\argmax}{\mathop{\mathrm{argmax}}} \newcommand{\minimize}{\mathop{\mathrm{minimize}}} \newcommand{\st}{\mathop{\mathrm{subject\,\,to}}} \]
This package provides tools for generalized quantile modeling: regularized quantile regression (with generalized lasso penalties and noncrossing constraints), cross-validation, quantile extrapolation, and quantile ensembles.
This package is not on CRAN yet, so it can be installed using the devtools
package:
Building the vignettes takes a substantial amount of time. They are not included in the package by default. If you want to include vignettes (for local access), then you can use:
Here we give some basic examples of how to fit a quantile lasso
model. For details on how this problem is defined (and how
quantgen casts it into a linear program), see the
[mathematical details vignette].
To fit a quantile lasso model, we can use
quantile_lasso(). Below, we do so at the quantile level
\(\tau = 0.8\):
library(quantgen)
set.seed(0)
n = 100
p = 10
x = matrix(rnorm(n*p), n, p)
x0 = rnorm(p)
mu = function(x) 2 + x[1] + x[2]
y = apply(x, 1, mu) + rnorm(n)
tau = 0.8
lambda = 2 * sqrt(get_lambda_max(x, y, Matrix::Diagonal(p)))
obj = quantile_lasso(x, y, tau=tau, lambda=lambda)
class(obj)## [1] "quantile_lasso" "quantile_genlasso"
## [1] 2.885 0.395 0.656 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
## [,1]
## [1,] 3.979133
The default in quantile_lasso() is to include an
intercept in the model (via intercept = TRUE) and to
standardize the predictors (to have unit norm, via
standardize = TRUE). Important note
(obvious in hindsight, but important nonetheless): in a quantile model,
the intercept cannot be omitted by simply centering the response and the
predictors, as it can in a standard regression model. The intercept
depends critically on the quantile level \(\tau\). Above, the fact that the estimated
intercept exceeds 2 is not an instance of poor estimation in
the quantile lasso model, but reflects the gap between the level 0.8 and
0.5 quantiles of the standard normal distribution (note that
qnorm(0.8) \(\approx
0.84\)).
The get_lambda_max() function computes (approximately)
the effective maximum of the regularization path for a penalized
quantile model with \(\tau = 0.5\),
given a predictor matrix, response vector, and penalty matrix. In the
case of the lasso, the penalty matrix is the identity.
We can see that the quantile_lasso() function returns an
object of class quantile_lasso (inherited from
quantile_genlasso), and comes with associated utilities
coef() and predict().
To fit solutions at multiple quantile levels, we can simply pass a
vector for the tau argument:
tau_vec = c(0.1, 0.5, 0.9)
obj = quantile_lasso(x, y, tau=tau_vec, lambda=lambda)
round(coef(obj), 3) # Matrix of dimension (p + 1) x (number of tau values)## 11 x 3 sparse Matrix of class "dgCMatrix"
## tau=0.1, lam=7.81 tau=0.5, lam=7.81 tau=0.9, lam=7.81
## [1,] 0.116 1.890 3.461
## [2,] 0.386 0.920 0.353
## [3,] 0.382 0.789 0.489
## [4,] 0.000 0.000 0.000
## [5,] 0.000 0.000 .
## [6,] 0.000 0.000 0.000
## [7,] 0.000 0.059 0.000
## [8,] 0.000 0.000 0.000
## [9,] . 0.000 .
## [10,] 0.000 0.000 .
## [11,] 0.000 0.000 0.000
## tau=0.1, lam=7.81 tau=0.5, lam=7.81 tau=0.9, lam=7.81
## [1,] 0.7093832 3.074489 4.259551
To fit solutions at multiple tuning parameter values, we can also
pass a vector for the lambda argument, in which case
quantile_lasso() internally recycles the tau
and lambda arguments so that they have equal length, and
then solves separate quantile lasso problems, each with a quantile level
tau[i] and tuning parameter lambda[i]. The two
most common use cases are to pass a single value for tau
and a vector for lambda, or to pass a vector for
tau and a single value for lambda.
Solving quantile lasso problems over a two-dimensional grid of
quantile levels and tuning parameter values is most convenient with
quantile_lasso_grid(). In this function, passing
lambda is optional; when not specified, the function
quantile_lasso_grid() internally sets lambda
to be a vector of length nlambda = 30, log-spaced between
the value computed by get_lambda_max() and
lambda_min_ratio = 0.001 times this value.
obj = quantile_lasso_grid(x, y, tau_vec)
# Matrix of dimension (p + 1) x (number of tau and lambda pairs)
dim(coef(obj)) ## [1] 11 90
# Array of dimension (number of x0 points) x (number of lambda values) x
# (number of tau values)
dim(predict(obj, new=x0)) ## [1] 1 30 3
The quantile_lasso() function is a special case of its a
parent function quantile_genlasso(), which allows for
specification of a general penalty matrix. Both
quantile_lasso() and quantile_genlasso() have
several other options that you can read about in their
documentation:
weights;noncross
and x0;gurobi package), which can be set with
lp_solver;transform (and
inv_trans).