Lysine succinylation (Ksucc) is an important post-translational modification (PTM) that attaches bulky, acidic succinyl groups (-CO-CH2-CH2-CO2H, 100.0186 Da) to positively charged lysine (Lys) residues of protein substrates, and it plays a critical role in a variety of biological processes (Alleyn, et al., 2018; Grimolizzi and Arranz, 2018; Hirschey and Zhao, 2015; Mills and O'Neill, 2014). Like Lys acetylation (Kac), Ksucc occurs in both histone and non-histone proteins and participates in regulating metabolism (Park, et al., 2013; Rardin, et al., 2013), immunity (Yang, et al., 2012), autophagy (Polletta, et al., 2015), genome stability (Li, et al., 2016) and gene expression (Wang, et al., 2017). Dysregulation of Ksucc is associated with human diseases such as cancer and neurodegenerative disorders (Alleyn, et al., 2018; Yang, et al., 2018). Thus, identifying modified substrates together with their exact sites is fundamental to understanding the molecular mechanisms and regulatory roles of Ksucc.

In this work, we implemented a hybrid-learning architecture that integrates deep-learning and conventional machine-learning algorithms into a single framework, and developed a novel tool, HybridSucc, for general and species-specific succinylation site prediction. To prepare the benchmark data set, we collected 26,243 known Ksucc sites in 8,830 proteins from 13 organisms: Homo sapiens, Mus musculus, Rattus norvegicus, Saccharomyces cerevisiae, Oryza sativa, Brachypodium distachyon, Solanum lycopersicum, Toxoplasma gondii, Escherichia coli, Vibrio parahaemolyticus, Bacillus subtilis, Corynebacterium glutamicum and Mycobacterium tuberculosis. Ten types of informative features were adopted for encoding protein sequences: amino acid composition (AAC) (Zhao, et al., 2015), composition of k-spaced amino acid pairs (CKSAAP) (Xu, et al., 2015), orthogonal binary coding (OBC) (Chen, et al., 2006), amino acid index (AAindex) (Xu, et al., 2015), autocorrelation functions (ACF) (Zhao, et al., 2015), Group-based Prediction System (GPS) similarity (Deng, et al., 2017), position-specific scoring matrix (PSSM) (Jia, et al., 2016), accessible surface area (ASA), secondary structure (SS) and backbone torsion angles (BTA) (Lopez, et al., 2017; Lopez, et al., 2018). HybridSucc achieved AUC values of 0.885 and 0.952 for general and human-specific prediction of Ksucc sites, respectively. In comparison, the accuracy of HybridSucc was 19.27% to 50.62% better than that of other existing tools.
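
To make three of the sequence-derived features named above more concrete, the following is a minimal Python sketch of AAC, OBC and CKSAAP encodings for a single peptide window. It is an illustration only and not the HybridSucc implementation: the function names, the 20-letter alphabet order, the handling of non-standard residues (skipped or encoded as zeros), and the normalization of CKSAAP by the number of valid pairs are all assumptions made for this example.

```python
from itertools import product

# Assumed alphabet: the 20 standard amino acids in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(peptide):
    """Amino acid composition: fraction of each standard residue (20 values)."""
    residues = [r for r in peptide if r in AMINO_ACIDS]
    n = len(residues)
    return [residues.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def obc(peptide):
    """Orthogonal binary coding: a 20-bit one-hot vector per position.

    Non-standard characters (for example a padding symbol) become 20 zeros.
    """
    vector = []
    for r in peptide:
        vector.extend(1 if r == a else 0 for a in AMINO_ACIDS)
    return vector


def cksaap(peptide, k):
    """Composition of k-spaced amino acid pairs (400 values).

    Counts residue pairs separated by exactly k positions and normalizes by
    the number of valid pairs found (an assumption of this sketch).
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = 0
    for i in range(len(peptide) - k - 1):
        pair = peptide[i] + peptide[i + k + 1]
        if pair in counts:  # skip pairs containing non-standard characters
            counts[pair] += 1
            total += 1
    return [counts[p] / total if total else 0.0 for p in pairs]


if __name__ == "__main__":
    window = "AAKAA"  # hypothetical toy window centered on a lysine
    print(len(aac(window)), len(obc(window)), len(cksaap(window, 1)))
```

In practice such vectors would be computed for a fixed-length window around each candidate lysine and concatenated, or fed to separate models; the window length and the values of k are choices this sketch does not specify.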

For publication of results please cite the following article:

HybridSucc: a hybrid-learning architecture for general and species-specific succinylation site prediction
Wanshan Ning, Haodong Xu, Han Cheng, Wankun Deng, Yaping Guo and Yu Xue.

In submission.