Data-driven synthetic biology for proteins
Title: Data-driven synthetic biology for proteins
SNIC Project: Berzelius-2022-20
Project Type: LiU Berzelius
Principal Investigator: Aleksej Zelezniak
Affiliation: Chalmers tekniska högskola
Duration: 2022-02-01 – 2022-08-01
Classification: 10203


Proteins have evolved for millions of years to adapt to various environmental conditions, with temperatures ranging from well above boiling all the way down to near freezing. Adapting proteins in silico to new conditions has been a long-standing goal in biotechnology. Existing methods have either relied on composition values or hand-picked numerical features, with little overall success, or leveraged the evolutionary data in orthologous groups that include the target properties. These evolution-based methods depend on the existence of orthologous groups with the desired property. However, new deep learning methods from natural language processing can be used to capture sequence features of proteins and learn grammars of protein expression. We want to adapt these models to learn protein features such as abundance, temperature, and stability for yeast and human proteins.