Prosit: Proteome-wide prediction of peptide tandem mass spectra by deep learning
Prosit (https://www.proteomicsdb.org/prosit/) is a deep learning framework for free and easy generation of custom in-silico spectral libraries for any organism and protease. It predicts high-quality HCD MS2 spectra as well as indexed retention times (iRT).
In mass spectrometry-based proteomics, the identification and quantification of peptides and proteins relies heavily on sequence database searching or spectral library matching. The lack of accurate predictive models for fragment ion intensities prevents these approaches from reaching their full potential. Here, we extended the ProteomeTools synthetic peptide library to 550k tryptic peptides and 21M high-quality tandem mass spectra. We used these data to train a deep neural network, termed Prosit, whose chromatographic retention time and fragment ion intensity predictions exceed the quality of the experimental data. Integrating Prosit into database search pipelines led to more identifications at >10× lower false discovery rates. We show the general applicability of Prosit by predicting spectra for other proteases, generating spectral libraries for data-independent acquisition, and improving the analysis of metaproteomes. Integration into ProteomicsDB allows search result re-scoring and custom spectral library generation for any organism based on peptide sequence alone.
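
Re-scoring with predicted intensities requires some measure of how well a predicted spectrum agrees with an observed one. The text above does not specify that measure, so the following is only a minimal sketch that assumes the normalized spectral angle, a common choice for comparing fragment intensity vectors. The function name `spectral_angle` and the example intensities are hypothetical, and both vectors are assumed to list the same fragment ions in the same order.

```python
import math


def spectral_angle(predicted, observed):
    """Normalized spectral angle between two intensity vectors.

    Returns a value in [0, 1]: 1 for identical intensity patterns,
    0 for orthogonal ones. Assumes both vectors are non-negative and
    index the same fragment ions in the same order.
    """
    if len(predicted) != len(observed):
        raise ValueError("intensity vectors must have the same length")

    dot = sum(p * o for p, o in zip(predicted, observed))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    norm_o = math.sqrt(sum(o * o for o in observed))
    if norm_p == 0.0 or norm_o == 0.0:
        # No signal in one of the spectra: treat as no similarity.
        return 0.0

    # Clamp to guard against floating-point drift outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_p * norm_o)))
    return 1.0 - 2.0 * math.acos(cosine) / math.pi


# Hypothetical relative intensities for four matched fragment ions.
predicted = [1.00, 0.45, 0.10, 0.30]
observed = [0.90, 0.50, 0.05, 0.35]
score = spectral_angle(predicted, observed)
```

Because the vectors are normalized before comparison, the score depends only on the relative intensity pattern and not on absolute signal, which is what makes it usable as an additional feature when re-scoring search results.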