Bayesian optimization for high-dimensional Lasso applications
Title: Bayesian optimization for high-dimensional Lasso applications
DNr: SNIC 2021/22-792
Project Type: SNIC Small Compute
Principal Investigator: Kenan Sehic <kenan.sehic@cs.lth.se>
Affiliation: Lunds universitet
Duration: 2021-10-08 – 2022-11-01
Classification: 10201
Keywords:

Abstract

Bayesian optimization (BO) has recently emerged as a powerful technique for the global optimization of expensive-to-evaluate black-box functions. Although BO is sample-efficient and robust for such problems, a critical limitation is the number of parameters it can optimize: BO is often stated to remain impractical beyond roughly 15-20 parameters. One of the most important goals in the field is therefore to extend BO to higher-dimensional search spaces, which is the main objective of this project. Our focus is to adapt the well-known high-dimensional BO method TuRBO to Lasso applications. While high-dimensional Lasso regression has appealing statistical guarantees, it requires one hyperparameter per feature, which results in a complex high-dimensional hyperparameter optimization (HPO) search space that the Lasso community typically avoids. Standard Lasso regression with a single hyperparameter (identical for all features) has been applied in various settings, such as detecting signals in brain imaging, genomics, or finance, where a dataset is commonly described by thousands of features of which only a few are important for prediction. Improved discoveries could be expected from the high-dimensional setting with one hyperparameter per feature. Although TuRBO can provide good results in high-dimensional settings, it still suffers from the curse of dimensionality because training is still done in the original high-dimensional search space. The proposal is to use low-dimensional projections locally within the trust regions of TuRBO to exploit the sparsity of a Lasso application. Furthermore, because Lasso applications do not come with clearly defined bounds, these trust regions could potentially be used to avoid specifying bounds in the first place.
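
To make the search space concrete, the following minimal Python sketch shows the two ingredients the abstract refers to: a Lasso objective with one penalty weight per feature, and a candidate drawn inside a box-shaped trust region through a low-dimensional random embedding. It is an illustration under stated assumptions, not the project's method. The names `weighted_lasso_objective` and `trust_region_candidate`, the 1/(2n) loss scaling (one common convention), the Gaussian embedding matrix, and the clipping step are all hypothetical choices made for this example. The surrogate model and the trust-region resizing rules of TuRBO are omitted.

```python
import random


def weighted_lasso_objective(X, y, beta, lambdas):
    """Squared-error loss plus a per-feature L1 penalty.

    X: list of n rows, each a list of p feature values
    y: list of n targets
    beta: list of p coefficients
    lambdas: list of p non-negative penalty weights (the hyperparameters)
    """
    n = len(y)
    residual_sq = 0.0
    for row, target in zip(X, y):
        prediction = sum(x_ij * b_j for x_ij, b_j in zip(row, beta))
        residual_sq += (target - prediction) ** 2
    # One weight per feature; standard Lasso is the special case
    # in which every entry of `lambdas` is the same value.
    penalty = sum(lam * abs(b) for lam, b in zip(lambdas, beta))
    return residual_sq / (2 * n) + penalty


def trust_region_candidate(center, side_length, embedding, rng):
    """Draw one candidate in a hypercube trust region via a low-dim embedding.

    center: list of D coordinates (assumed here to be the incumbent point)
    side_length: edge length of the hypercube around `center`
    embedding: D x d matrix (list of rows), an assumed random projection
    rng: a random.Random instance
    """
    d = len(embedding[0])
    z = [rng.uniform(-1.0, 1.0) for _ in range(d)]  # low-dimensional sample
    candidate = []
    for c_i, row in zip(center, embedding):
        step = sum(a * z_k for a, z_k in zip(row, z))
        step = max(-1.0, min(1.0, step))  # clip so the point stays in the box
        candidate.append(c_i + 0.5 * side_length * step)
    return candidate


if __name__ == "__main__":
    rng = random.Random(0)
    D, d = 50, 3  # 50 penalty weights searched through 3 latent directions
    embedding = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(D)]
    center = [0.0] * D
    candidate = trust_region_candidate(center, 0.8, embedding, rng)
    print(len(candidate), max(abs(c) for c in candidate) <= 0.4)
```

In this sketch the optimizer only samples `d` latent coordinates, while the objective still receives a full vector of `D` per-feature values. Because the box is defined relative to `center` rather than to fixed global limits, the sketch also hints at how a trust region can stand in for explicit bounds.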