DV-iSucLys: Decision Voting to Improve Protein Lysine Succinylation Site Identification from Sequence Data
American Journal of Biomedical and Life Sciences
Volume 5, Issue 6, December 2017, Pages: 135-143
Received: Sep. 8, 2017;
Accepted: Oct. 8, 2017;
Published: Nov. 30, 2017
Md. Khaled Ben Islam, Department of Computer Science & Engineering, Rajshahi University of Engineering & Technology, Rajshahi, Bangladesh; Department of Computer Science & Engineering, Pabna University of Science & Technology, Pabna, Bangladesh
Md. Nazrul Islam Mondal, Department of Computer Science & Engineering, Rajshahi University of Engineering & Technology, Rajshahi, Bangladesh
Julia Rahman, Department of Computer Science & Engineering, Rajshahi University of Engineering & Technology, Rajshahi, Bangladesh
Md. Al Mehedi Hassan, Department of Computer Science & Engineering, Rajshahi University of Engineering & Technology, Rajshahi, Bangladesh
Identifying protein post-translational modifications (PTMs) is an important step in studying disease-associated mutations. Although a protein undergoes many chemical alterations after translation, the addition of a succinyl group to a lysine residue plays a vital role in regulating cellular metabolism and thus in disease. A common approach is to apply a classification algorithm to features derived from the structural, physicochemical, or biochemical information of a protein; this can yield satisfactory results up to a certain level. Although researchers have already developed many computational methods to identify whether a lysine residue is modified with a succinyl group after translation, most of them focus on improving a single decision made by a single method, on feature enrichment, or on developing a benchmark dataset. There is therefore still scope for improvement in characterising the lysine residues of a protein sequence by considering multiple predictors at once. In this study, an ensemble-based approach called DV-iSucLys is designed to characterise lysine residues by adapting three well-known and conceptually different classifiers and combining their decisions by voting. In addition, a benchmark succinylation dataset was assembled from existing benchmark datasets and recently updated succinylation data from the UniProt consortium, both to investigate the performance of the proposed approach and to support further research. Rigorous cross-validation results show that DV-iSucLys characterises succinylated lysine residues better than the existing predictors.
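The decision-voting idea summarised above can be illustrated with a minimal sketch. It assumes each of three classifiers has already produced a binary decision per candidate lysine site (1 for succinylated, 0 for not) and that the decisions are combined by simple majority. The abstract does not name the classifiers or specify the voting rule, so the function names, the sample decisions, and the majority rule are all illustrative assumptions rather than the authors' implementation.

```python
from typing import List, Sequence


def majority_vote(decisions: Sequence[int]) -> int:
    """Return 1 if more than half of the 0/1 decisions are 1, else 0.

    Assumption: an odd number of voters, so a tie cannot occur.
    """
    if len(decisions) % 2 == 0:
        raise ValueError("use an odd number of classifiers to avoid ties")
    if any(d not in (0, 1) for d in decisions):
        raise ValueError("decisions must be 0 or 1")
    return 1 if 2 * sum(decisions) > len(decisions) else 0


def vote_sites(per_classifier: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-site decisions from several classifiers.

    per_classifier[i][j] is classifier i's decision for lysine site j.
    """
    lengths = {len(d) for d in per_classifier}
    if len(lengths) != 1:
        raise ValueError("every classifier must decide on the same sites")
    # zip(*...) regroups the decisions so each tuple holds one site's votes.
    return [majority_vote(site_votes) for site_votes in zip(*per_classifier)]


# Hypothetical decisions from three classifiers for four candidate sites.
clf_a = [1, 0, 1, 0]
clf_b = [1, 1, 0, 0]
clf_c = [0, 1, 1, 0]
final = vote_sites([clf_a, clf_b, clf_c])
```

Majority voting is only useful when the individual classifiers make partly different errors, which is why the approach combines conceptually different classifiers.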