Abstract
Allergy is an overreaction of the immune system to a previously encountered, ordinarily harmless substance, typically a protein, resulting in skin rash, swelling of mucous membranes, sneezing or wheezing, or other abnormal conditions. The use of modified proteins is increasingly widespread: their presence in food, in commercial products such as washing powder, and in medical therapeutics and diagnostics makes predicting and identifying potential allergens a crucial societal issue. The prediction of allergens has been explored widely using bioinformatics, with many tools developed in the last decade; many of these are freely available online. Here, we report a set of novel models for allergen prediction utilizing amino acid E-descriptors, auto- and cross-covariance (ACC) transformation, and several machine learning methods for classification, including logistic regression (LR), decision tree (DT), naïve Bayes (NB), random forest (RF), multilayer perceptron (MLP) and k nearest neighbours (kNN). The best-performing method was kNN, with 85.3% accuracy in 5-fold cross-validation. The resulting model has been implemented in a revised version of the AllerTOP server (http://www.ddg-pharmfac.net/AllerTOP).
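The pipeline named in the abstract (per-residue descriptors, auto- and cross-covariance transformation, kNN classification) can be illustrated with a minimal Python sketch. It is not the AllerTOP implementation. The descriptor table holds random placeholder numbers rather than the published E-descriptor values, and the number of descriptors per residue (five), the maximum lag, the normalisation by sequence length minus lag, the Euclidean distance and the value of k are all assumptions made for illustration. The function names `acc_transform` and `knn_predict` are hypothetical.

```python
import random
from collections import Counter
from math import dist

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
N_DESCRIPTORS = 5  # assumed number of E-descriptors per residue

# Placeholder table: random numbers, NOT the published E-descriptor values.
_rng = random.Random(0)
E_TABLE = {aa: [_rng.uniform(-1, 1) for _ in range(N_DESCRIPTORS)]
           for aa in AMINO_ACIDS}


def acc_transform(sequence, max_lag=2, table=E_TABLE):
    """Map a protein sequence of any length to a fixed-length ACC vector.

    For each lag and each descriptor pair (j, k), average the product of
    descriptor j at position i and descriptor k at position i + lag.
    j == k gives auto-covariance terms, j != k cross-covariance terms.
    """
    rows = [table[aa] for aa in sequence]
    n = len(rows)
    if n <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    n_desc = len(rows[0])
    features = []
    for lag in range(1, max_lag + 1):
        for j in range(n_desc):
            for k in range(n_desc):
                total = sum(rows[i][j] * rows[i + lag][k]
                            for i in range(n - lag))
                features.append(total / (n - lag))
    return features


def knn_predict(train_x, train_y, query, k=3):
    """Return the majority label among the k training vectors nearest to query."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Because the output length depends only on the number of descriptors and the maximum lag (here 5 × 5 × 2 = 50 values), sequences of different lengths become directly comparable, which is what allows a distance-based classifier such as kNN to be applied.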
Original language | English |
---|---|
Article number | 2278 |
Number of pages | 6 |
Journal | Journal of Molecular Modeling |
Volume | 20 |
Issue number | 6 |
Publication status | Published - 31 May 2014 |
Bibliographical note
This paper belongs to Topical Collection MIB 2013 (Modeling Interactions in Biomolecules VI). Funding: Bulgarian Science Fund (Grants DCVNP 02-1/2009 and IO1/7)
Keywords
- ACC transformation
- allergen prediction
- decision tree
- E-descriptors
- k nearest neighbours
- logistic regression
- multilayer perceptron
- naïve Bayes
- random forest