Black-box optimization for efficient molecular design
	  
	  
  Abstract
  Bayesian optimization (BO) is a black-box technique for sample-efficient optimization. Recently, it has been applied in the latent spaces of deep autoencoder (DAE) models to optimize over structured, discrete, hard-to-enumerate search spaces (e.g., molecules, proteins). Here, the DAE dramatically simplifies the search space by mapping non-enumerable inputs into a continuous latent space where familiar BO tools can be readily applied. Despite this simplification, the latent space typically remains complex, which limits the efficiency of the optimization once latent-space queries are mapped back to the original molecular space. Thus, even with a well-suited latent space, these approaches do not necessarily provide a complete solution; rather, they may replace the structured optimization problem with a complex and unpredictable one. We aim to reduce this complexity and unpredictability by building on the PI's recent work on high-dimensional, complex Bayesian optimization. By reformulating the encoder to function as both an encoder and a complexity-lowering deep kernel for the surrogate model, we better align optimization in the latent space with the representation learned by the encoder. We hope to increase the efficiency of optimization, finding molecules with improved binding affinity in the process, as assessed on the Guacamol Antibody Design benchmarking suite. The project, a collaboration between LU, Cambridge and Penn, involves several experienced researchers in both BO and its application to molecular design.
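
  The idea of letting the encoder double as a deep kernel for the surrogate model can be illustrated with a minimal sketch. This is not the project's implementation: the one-layer `encode` function, the weight matrix `W`, the RBF kernel, the unit prior variance, and the upper-confidence-bound acquisition rule are all assumptions chosen to keep the example short and self-contained.

```python
import numpy as np

def encode(x, W):
    """Hypothetical stand-in encoder: a single tanh layer mapping inputs to latent codes."""
    return np.tanh(x @ W)

def deep_kernel(xa, xb, W, lengthscale=1.0):
    """RBF kernel evaluated on encoder outputs, so the encoder acts as the kernel's feature map."""
    za, zb = encode(xa, W), encode(xb, W)
    sq_dist = ((za[:, None, :] - zb[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2)

def gp_posterior(x_train, y_train, x_cand, W, noise=1e-4):
    """Gaussian process posterior mean and variance at candidate points (unit prior variance)."""
    K = deep_kernel(x_train, x_train, W) + noise * np.eye(len(x_train))
    Ks = deep_kernel(x_cand, x_train, W)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    var = 1.0 - (v**2).sum(axis=0)
    # Clip to guard against small negative values from round-off.
    return mean, np.clip(var, 1e-12, None)

def select_next(x_train, y_train, x_cand, W, beta=2.0):
    """Return the index of the candidate that maximizes an upper confidence bound."""
    mean, var = gp_posterior(x_train, y_train, x_cand, W)
    return int(np.argmax(mean + beta * np.sqrt(var)))

# Toy usage with random data standing in for featurized molecules (assumption).
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 2))           # encoder weights, fixed here; learned in practice
x_train = rng.normal(size=(10, 8))    # already-evaluated inputs
y_train = rng.normal(size=10)         # their objective values, e.g. a binding score
x_cand = rng.normal(size=(50, 8))     # candidate inputs to choose from
next_index = select_next(x_train, y_train, x_cand, W)
```

  In this sketch the surrogate and the encoder share the same parameters `W`, so any change to the encoder directly changes the similarity structure the surrogate sees. In practice those parameters would be trained jointly with the autoencoder rather than fixed at random.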