A transformer for prediction of MS2 spectrum intensities
Lukas Käll <email@example.com>
Kungliga Tekniska högskolan
2021-09-17 – 2021-12-01
Machine learning has long been an integral part of interpreting data from mass spectrometry-based proteomics. Relatively recently, a machine-learning architecture called the Transformer appeared and has since been employed successfully in other areas of bioinformatics. A key property of Transformers is that they enable so-called transfer learning, i.e. adapting networks trained for other tasks to new functionality with relatively few training examples.
Here, we implemented a Transformer based on the pre-trained model TAPE for the task of predicting MS2 intensities. TAPE is a general model trained to predict missing residues in protein sequences. Although it was trained for a different task, we could modify its behavior by adding a prediction head at the end of the TAPE model and training it on the spectrum intensities from the training set of the well-known predictor Prosit.
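The idea of attaching a prediction head to a pre-trained encoder can be sketched in PyTorch as follows. This is a minimal illustration and not the Prosit-Transformer implementation: the names `IntensityHead` and `EncoderWithHead` are hypothetical, the encoder is a small stand-in embedding layer rather than the actual TAPE model, and the hidden size of 768, the output size of 174 intensities, the mean pooling, and the frozen encoder are all assumptions made for the example.

```python
import torch
from torch import nn


class IntensityHead(nn.Module):
    """Hypothetical head mapping per-residue embeddings to fragment intensities."""

    def __init__(self, hidden_size=768, n_outputs=174):
        super().__init__()
        # hidden_size and n_outputs are assumed values, not taken from the abstract.
        self.mlp = nn.Sequential(
            nn.Linear(hidden_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, n_outputs),
        )

    def forward(self, residue_embeddings):
        # residue_embeddings: (batch, sequence_length, hidden_size)
        pooled = residue_embeddings.mean(dim=1)  # average over residues (assumption)
        return self.mlp(pooled)


class EncoderWithHead(nn.Module):
    """Chains a pre-trained encoder with a newly initialised prediction head."""

    def __init__(self, encoder, head):
        super().__init__()
        self.encoder = encoder
        self.head = head

    def forward(self, token_ids):
        return self.head(self.encoder(token_ids))


# Stand-in for a pre-trained sequence encoder: any module that maps
# (batch, sequence_length) token ids to (batch, sequence_length, hidden_size).
vocab_size, hidden_size = 30, 768
encoder = nn.Embedding(vocab_size, hidden_size)

# One possible transfer-learning setup: freeze the encoder, train only the head.
for parameter in encoder.parameters():
    parameter.requires_grad = False

model = EncoderWithHead(encoder, IntensityHead(hidden_size))
token_ids = torch.randint(0, vocab_size, (4, 12))  # 4 peptides, 12 residues each
predicted_intensities = model(token_ids)           # shape: (4, 174)
```

Whether the encoder weights are frozen or fine-tuned together with the head is a design choice that the text above does not specify; freezing is shown here only because it keeps the number of trainable parameters small.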
We demonstrate that the predictor, which we call Prosit-Transformer, outperforms the recurrent neural network-based predictor Prosit, increasing the median angular similarity on Prosit's hold-out set from 0.908 to 0.923. Improving the results further, however, will require better GPU performance to shorten our weeks-long training cycle.
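For readers unfamiliar with the metric, the sketch below shows one common way to compute an angular similarity between a predicted and an observed intensity vector. It assumes the normalized spectral angle, 1 - 2·arccos(cosine similarity)/π; the abstract does not spell out the exact definition, and the function name `angular_similarity` is hypothetical.

```python
import math


def angular_similarity(predicted, observed):
    """Normalized spectral angle between two intensity vectors (assumed definition).

    Returns 1.0 for vectors pointing in the same direction and 0.0 for
    orthogonal vectors.
    """
    dot = sum(p * o for p, o in zip(predicted, observed))
    norm_predicted = math.sqrt(sum(p * p for p in predicted))
    norm_observed = math.sqrt(sum(o * o for o in observed))
    if norm_predicted == 0.0 or norm_observed == 0.0:
        # Convention chosen for this sketch: an empty spectrum scores 0.
        return 0.0
    cosine = dot / (norm_predicted * norm_observed)
    # Clamp to guard against floating-point rounding outside [-1, 1].
    cosine = max(-1.0, min(1.0, cosine))
    return 1.0 - 2.0 * math.acos(cosine) / math.pi


same_direction = angular_similarity([0.2, 0.5, 1.0], [0.4, 1.0, 2.0])
orthogonal = angular_similarity([1.0, 0.0], [0.0, 1.0])
```

Because both vectors are normalized to unit length first, the score depends only on the relative intensities of the peaks and not on their absolute scale.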
We believe that Transformers will significantly increase the accuracy of other predictions within mass spectrometry-based proteomics, particularly those that take amino acid sequences as input.