TY - JOUR
T1 - Identification of DNA-binding proteins by incorporating evolutionary information into pseudo amino acid composition via the top-n-gram approach
AU - Xu, Ruifeng
AU - Zhou, Jiyun
AU - Liu, Bin
AU - He, Yulan
AU - Zou, Quan
AU - Wang, Xiaolong
AU - Chou, Kuo Chen
N1 - Supplementary material: http://dx.doi.org/10.1080/07391102.2014.968624
PY - 2015
Y1 - 2015
N2 - DNA-binding proteins are crucial for various cellular processes and hence have become an important target for both basic research and drug development. With the avalanche of protein sequences generated in the postgenomic age, it is highly desired to establish an automated method for rapidly and accurately identifying DNA-binding proteins based on their sequence information alone. Owing to the fact that all biological species have developed beginning from a very limited number of ancestral species, it is important to take into account the evolutionary information in developing such a high-throughput tool. In view of this, a new predictor was proposed by incorporating the evolutionary information into the general form of pseudo amino acid composition via the top-n-gram approach. It was observed by comparing the new predictor with the existing methods via both jackknife test and independent data-set test that the new predictor outperformed its counterparts. It is anticipated that the new predictor may become a useful vehicle for identifying DNA-binding proteins. It has not escaped our notice that the novel approach to extract evolutionary information into the formulation of statistical samples can be used to identify many other protein attributes as well.
KW - Chou’s pseudo amino acid composition
KW - DNA-binding proteins
KW - evolutionary information
KW - top-n-gram-SVM
UR - http://www.scopus.com/inward/record.url?scp=84930573967&partnerID=8YFLogxK
UR - https://www.tandfonline.com/doi/full/10.1080/07391102.2014.968624
U2 - 10.1080/07391102.2014.968624
DO - 10.1080/07391102.2014.968624
M3 - Article
AN - SCOPUS:84930573967
SN - 0739-1102
VL - 33
SP - 1720
EP - 1730
JO - Journal of Biomolecular Structure and Dynamics
JF - Journal of Biomolecular Structure and Dynamics
IS - 8
ER -