Design of radioprotective proteins with large DNA language models
Abstract
Radiation therapy is the only curative treatment option for certain cancers, but it can also cause extensive damage to healthy tissue adjacent to the tumor. In prostate cancer, for example, radiation therapy can damage the rectum and bladder. The aim of this project is to use the large DNA language model Evo 2 to generate novel radioprotective proteins that protect tumor-adjacent tissue from radiation damage. Evo 2 was trained on a dataset of about 9 trillion nucleotides and can be used to generate functional genes with sequences that do not occur in nature. In particular, it can simulate counterfactual evolution: designing new versions of an existing gene as if that gene had evolved in a different organism. Here, a radioprotective protein that exists only in tardigrades will serve as the starting point for designing human-like variants through counterfactual evolution with Evo 2. The model used here (evo2_40b, 40 billion parameters) requires two H100 GPUs or one H200 GPU to run.
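The stated GPU requirement can be sanity-checked with a back-of-the-envelope calculation. The sketch below assumes the weights are held in bf16 (2 bytes per parameter) and that about 80% of each GPU's memory is available for weights; that headroom factor is a hypothetical illustrative value, and activations, caches and framework overhead are otherwise ignored, so real usage will be higher. GPU capacities are taken as 80 GB for the H100 and 141 GB for the H200.

```python
import math

# Assumption: weights stored in bf16/fp16, i.e. 2 bytes per parameter.
BYTES_PER_PARAM = 2
N_PARAMS = 40e9  # evo2_40b

# Memory capacity per GPU in GB (H100 SXM: 80 GB, H200: 141 GB).
GPU_MEMORY_GB = {"H100": 80, "H200": 141}

# Hypothetical fraction of GPU memory available for weights; the rest is
# assumed to be reserved for activations, caches and framework overhead.
HEADROOM = 0.8


def weight_memory_gb(n_params: float, bytes_per_param: int) -> float:
    """Memory needed just to store the weights, in GB (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


def gpus_needed(weights_gb: float, gpu_gb: float, headroom: float = HEADROOM) -> int:
    """Smallest number of GPUs whose usable memory can hold the weights."""
    return math.ceil(weights_gb / (gpu_gb * headroom))


weights_gb = weight_memory_gb(N_PARAMS, BYTES_PER_PARAM)
print(f"evo2_40b weights: {weights_gb:.0f} GB")
for name, capacity in GPU_MEMORY_GB.items():
    print(f"{name}: {gpus_needed(weights_gb, capacity)} GPU(s)")
```

Under these assumptions the 80 GB of weights alone fill a single H100, so two are needed, while one H200 has room to spare. Lower-precision weights (for example FP8) would change this arithmetic.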
Up to 100,000 variants will be generated in the initial design campaign. These variants will be cloned into a pooled lentiviral library and transduced into a human cell line. The transduced cells will then be exposed to a radiation dose that would normally be lethal to them; cells carrying variants that confer radiation resistance are expected to survive. Genomic DNA will be isolated from the surviving cells, and the integrated variant sequences will be PCR-amplified from the lentiviral provirus and sequenced. Variants that confer the strongest radiation resistance should be highly overrepresented in the sequencing results relative to the library before irradiation. The top candidates will be validated and characterized in individual radiation exposure experiments, and the best of these will be tested in animal models.
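The enrichment readout can be made concrete with a minimal sketch. It assumes read counts per variant have already been obtained by matching sequenced amplicons to the designed sequences, and that the library was also sequenced before irradiation as a reference; the variant IDs and counts are hypothetical. It computes a log2 fold change in variant frequency, using a pseudocount so that variants absent after irradiation do not cause a division by zero.

```python
import math


def frequencies(counts: dict[str, int], pseudocount: float = 0.5) -> dict[str, float]:
    """Convert raw read counts to frequencies. The pseudocount keeps variants
    with zero reads at a small nonzero frequency so log2 stays finite."""
    total = sum(counts.values()) + pseudocount * len(counts)
    return {v: (c + pseudocount) / total for v, c in counts.items()}


def log2_enrichment(pre: dict[str, int], post: dict[str, int],
                    pseudocount: float = 0.5) -> dict[str, float]:
    """log2 fold change in each variant's frequency between the library before
    irradiation (pre) and the surviving cells (post)."""
    variants = sorted(set(pre) | set(post))
    # Variants missing from one sample are treated as having zero reads there.
    f_pre = frequencies({v: pre.get(v, 0) for v in variants}, pseudocount)
    f_post = frequencies({v: post.get(v, 0) for v in variants}, pseudocount)
    return {v: math.log2(f_post[v] / f_pre[v]) for v in variants}


def top_hits(enrichment: dict[str, float], n: int = 3) -> list[tuple[str, float]]:
    """Return the n most enriched variants, highest score first."""
    return sorted(enrichment.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Hypothetical read counts for four variants.
pre_counts = {"var_001": 100, "var_002": 100, "var_003": 100, "var_004": 100}
post_counts = {"var_001": 10, "var_002": 350, "var_003": 40}  # var_004 not recovered

scores = log2_enrichment(pre_counts, post_counts)
for variant, score in top_hits(scores):
    print(f"{variant}\t{score:+.2f}")
```

A single fold change per variant only illustrates the ranking logic; with replicate screens, a statistical model that accounts for sampling noise and replicate variability would be a more robust way to call hits.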