EnDNA-Prot: identification of DNA-binding proteins by applying ensemble learning

Ruifeng Xu; Jiyun Zhou; Bin Liu; Lin Yao; Yulan He; Quan Zou; Xiaolong Wang

doi:10.1155/2014/294279

Ruifeng Xu, Jiyun Zhou, Bin Liu^*, Lin Yao, Yulan He, Quan Zou, Xiaolong Wang

^*Corresponding author for this work

Computer Science Research Group

Research output: Contribution to journal › Article › peer-review

Abstract

DNA-binding proteins are crucial for various cellular processes, such as recognition of specific nucleotides, regulation of transcription, and regulation of gene expression. Developing an effective model for identifying DNA-binding proteins is an urgent research problem. Many methods have been proposed, but most of them rely on a single classifier and cannot make full use of the large number of negative samples to improve predictive performance. This study proposes a predictor called enDNA-Prot for DNA-binding protein identification that employs ensemble learning. Experimental results showed that enDNA-Prot was comparable with DNA-Prot and outperformed DNAbinder and iDNA-Prot, with improvements of 3.97-9.52% in accuracy (ACC) and 0.08-0.19 in Matthews correlation coefficient (MCC). Furthermore, when the benchmark dataset was expanded with negative samples, enDNA-Prot outperformed the three existing methods by 2.83-16.63% in ACC and 0.02-0.16 in MCC. These results indicate that enDNA-Prot is an effective method for DNA-binding protein identification and that expanding the training dataset with negative samples can improve its performance. For the convenience of experimental scientists, we developed a user-friendly web server for enDNA-Prot that is freely accessible to the public.

Original language	English
Article number	294279
Number of pages	10
Journal	BioMed Research International
Volume	2014
DOIs	https://doi.org/10.1155/2014/294279
Publication status	Published - 26 May 2014

Bibliographical note

Copyright © 2014 Ruifeng Xu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Access to Document

10.1155/2014/294279

Identification of DNA-binding proteins by applying ensemble learning
Final published version, 1.26 MB. Licence: CC BY 3.0

http://www.hindawi.com/journals/bmri/2014/294279/

Cite this

@article{d05a09512d0644d4ba5f2df74d3dc6dc,

title = "{EnDNA-Prot}: identification of {DNA}-binding proteins by applying ensemble learning",

abstract = "DNA-binding proteins are crucial for various cellular processes, such as recognition of specific nucleotides, regulation of transcription, and regulation of gene expression. Developing an effective model for identifying DNA-binding proteins is an urgent research problem. Many methods have been proposed, but most of them rely on a single classifier and cannot make full use of the large number of negative samples to improve predictive performance. This study proposes a predictor called enDNA-Prot for DNA-binding protein identification that employs ensemble learning. Experimental results showed that enDNA-Prot was comparable with DNA-Prot and outperformed DNAbinder and iDNA-Prot, with improvements of 3.97-9.52% in accuracy (ACC) and 0.08-0.19 in Matthews correlation coefficient (MCC). Furthermore, when the benchmark dataset was expanded with negative samples, enDNA-Prot outperformed the three existing methods by 2.83-16.63% in ACC and 0.02-0.16 in MCC. These results indicate that enDNA-Prot is an effective method for DNA-binding protein identification and that expanding the training dataset with negative samples can improve its performance. For the convenience of experimental scientists, we developed a user-friendly web server for enDNA-Prot that is freely accessible to the public.",

author = "Ruifeng Xu and Jiyun Zhou and Bin Liu and Lin Yao and Yulan He and Quan Zou and Xiaolong Wang",

note = "Copyright {\textcopyright} 2014 Ruifeng Xu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.",

year = "2014",

month = may,

day = "26",

doi = "10.1155/2014/294279",

language = "English",

volume = "2014",

journal = "BioMed Research International",

issn = "2314-6133",

publisher = "Hindawi Limited",

}

TY - JOUR

T1 - EnDNA-Prot

T2 - identification of DNA-binding proteins by applying ensemble learning

AU - Xu, Ruifeng

AU - Zhou, Jiyun

AU - Liu, Bin

AU - Yao, Lin

AU - He, Yulan

AU - Zou, Quan

AU - Wang, Xiaolong

N1 - Copyright © 2014 Ruifeng Xu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

PY - 2014/5/26

Y1 - 2014/5/26

N2 - DNA-binding proteins are crucial for various cellular processes, such as recognition of specific nucleotides, regulation of transcription, and regulation of gene expression. Developing an effective model for identifying DNA-binding proteins is an urgent research problem. Many methods have been proposed, but most of them rely on a single classifier and cannot make full use of the large number of negative samples to improve predictive performance. This study proposes a predictor called enDNA-Prot for DNA-binding protein identification that employs ensemble learning. Experimental results showed that enDNA-Prot was comparable with DNA-Prot and outperformed DNAbinder and iDNA-Prot, with improvements of 3.97-9.52% in accuracy (ACC) and 0.08-0.19 in Matthews correlation coefficient (MCC). Furthermore, when the benchmark dataset was expanded with negative samples, enDNA-Prot outperformed the three existing methods by 2.83-16.63% in ACC and 0.02-0.16 in MCC. These results indicate that enDNA-Prot is an effective method for DNA-binding protein identification and that expanding the training dataset with negative samples can improve its performance. For the convenience of experimental scientists, we developed a user-friendly web server for enDNA-Prot that is freely accessible to the public.

AB - DNA-binding proteins are crucial for various cellular processes, such as recognition of specific nucleotides, regulation of transcription, and regulation of gene expression. Developing an effective model for identifying DNA-binding proteins is an urgent research problem. Many methods have been proposed, but most of them rely on a single classifier and cannot make full use of the large number of negative samples to improve predictive performance. This study proposes a predictor called enDNA-Prot for DNA-binding protein identification that employs ensemble learning. Experimental results showed that enDNA-Prot was comparable with DNA-Prot and outperformed DNAbinder and iDNA-Prot, with improvements of 3.97-9.52% in accuracy (ACC) and 0.08-0.19 in Matthews correlation coefficient (MCC). Furthermore, when the benchmark dataset was expanded with negative samples, enDNA-Prot outperformed the three existing methods by 2.83-16.63% in ACC and 0.02-0.16 in MCC. These results indicate that enDNA-Prot is an effective method for DNA-binding protein identification and that expanding the training dataset with negative samples can improve its performance. For the convenience of experimental scientists, we developed a user-friendly web server for enDNA-Prot that is freely accessible to the public.

UR - http://www.scopus.com/inward/record.url?scp=84902214411&partnerID=8YFLogxK

U2 - 10.1155/2014/294279

DO - 10.1155/2014/294279

M3 - Article

AN - SCOPUS:84902214411

SN - 2314-6133

VL - 2014

JO - BioMed Research International

JF - BioMed Research International

M1 - 294279

ER -
