A transformer for prediction of MS2 spectrum intensities
SNIC Project: Berzelius-2022-7
Project Type: LiU Berzelius
Principal Investigator: Lukas Käll <lukas.kall@scilifelab.se>
Affiliation: Kungliga Tekniska högskolan
Duration: 2022-01-20 – 2022-08-01
Classification: 10203
Homepage: http://kaell.org


Machine learning has long been an integral part of interpreting data from mass spectrometry-based proteomics. Relatively recently, a machine-learning architecture appeared that has been employed successfully in other areas of bioinformatics: the Transformer. One of its key properties is that it enables so-called transfer learning, i.e. adapting networks trained for other tasks to new functionality with relatively few training examples. Here, we implemented a Transformer based on the pre-trained model TAPE for the task of predicting MS2 intensities. TAPE is a general model trained to predict missing residues in protein sequences. Although it was trained for a different task, we could modify its behavior by adding a prediction head at the end of the TAPE model and training it on the spectrum intensities from the training set of the well-known predictor Prosit. We demonstrate that the resulting predictor, which we call Prosit-Transformer, outperforms the recurrent neural network-based predictor Prosit, increasing the median angular similarity on Prosit's hold-out set from 0.908 to 0.923. To improve the results further, however, we need better GPU performance to shorten our weeks-long training cycle. We believe that transformers will significantly increase prediction accuracy for other types of predictions within mass spectrometry-based proteomics, particularly predictions that use amino acid sequences as input.
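
The abstract reports median angular similarity without defining it. As a minimal sketch, assuming the metric is the normalized spectral angle commonly used to compare predicted and observed fragment intensities (1 minus 2·arccos of the cosine similarity, divided by π), it could be computed as below. The function names and the toy intensity vectors are illustrative assumptions, not taken from the project's code.

```python
import math
from statistics import median


def angular_similarity(predicted, observed):
    """Normalized spectral angle between two equal-length intensity vectors.

    Assumed definition: 1 - 2 * arccos(cosine similarity) / pi.
    For non-negative intensities the result lies in [0, 1].
    """
    dot = sum(p * o for p, o in zip(predicted, observed))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    norm_o = math.sqrt(sum(o * o for o in observed))
    if norm_p == 0.0 or norm_o == 0.0:
        # Convention chosen for this sketch: an empty spectrum scores 0.
        return 0.0
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_p * norm_o)))
    return 1.0 - 2.0 * math.acos(cosine) / math.pi


def median_angular_similarity(pairs):
    """Median similarity over (predicted, observed) spectrum pairs."""
    return median(angular_similarity(p, o) for p, o in pairs)


# Toy data (hypothetical): two predicted/observed intensity pairs.
pairs = [
    ([0.9, 0.1, 0.0, 0.4], [1.0, 0.2, 0.0, 0.3]),
    ([0.2, 0.8, 0.5, 0.0], [0.1, 0.9, 0.4, 0.1]),
]
print(median_angular_similarity(pairs))
```

Under this definition, identical spectra score 1 and spectra with no shared peaks score 0, so a reported median reflects how closely the predicted intensity patterns match the observed ones across the hold-out set.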