Accurate estimation of isoelectric point of protein and peptide based on amino acid sequences

Enrique Audain; Yassel Ramos; Henning Hermjakob; Darren R. Flower; Yasset Perez-Riverol

doi:10.1093/bioinformatics/btv674

Corresponding author: Yasset Perez-Riverol

Research output: Contribution to journal, Article, peer-review

Abstract

Motivation: In any macromolecular polyprotic system (for example protein, DNA or RNA), the isoelectric point, commonly referred to as the pI, can be defined as the point of singularity in a titration curve, corresponding to the solution pH value at which the net overall surface charge, and thus the electrophoretic mobility, of the ampholyte sums to zero. Many modern analytical biochemistry and proteomics methods depend on the isoelectric point as a principal feature for protein and peptide characterization. Protein separation by isoelectric point is a critical part of 2-D gel electrophoresis, a key precursor of proteomics, where discrete spots can be digested in-gel and the proteins subsequently identified by analytical mass spectrometry. Peptide fractionation according to pI is also widely used in current proteomics sample preparation procedures prior to LC-MS/MS analysis. Accurate theoretical prediction of pI would therefore expedite such analyses. Although pI calculation is widely used, it remains largely untested, which motivated our efforts to benchmark pI prediction methods.

Results: Using data from the PIP-DB database and one publicly available dataset as our gold-standard reference, we benchmarked pI calculation methods. We find that methods vary in their accuracy and are highly sensitive to the choice of basis set (the set of amino acid pKa values used). The machine-learning algorithms, especially the SVM-based algorithm, showed superior performance when studying peptide mixtures. In general, learning-based pI prediction methods (such as Cofactor, SVM and Branca) require a large training dataset, and their performance depends strongly on the quality of that data. In contrast to iterative methods, machine-learning algorithms have the advantage that new features can be added to improve prediction accuracy.

Contact: yperez@ebi.ac.uk

Availability and Implementation: The software and data are freely available at https://github.com/ypriverol/pIR.

Supplementary information: Supplementary data are available at Bioinformatics online.
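The iterative approach mentioned in the abstract can be illustrated with a short sketch. The net charge at a given pH is modelled with the Henderson-Hasselbalch equation, Z(pH) = sum over basic groups of 1/(1 + 10^(pH - pKa)) minus sum over acidic groups of 1/(1 + 10^(pKa - pH)), and the pI is the pH where Z = 0, found here by bisection. This is a generic textbook version, not the pIR implementation. The function names (`net_charge`, `isoelectric_point`) and the two pKa tables (`PKA_SET_A`, `PKA_SET_B`) are hypothetical, and the pKa values are illustrative assumptions rather than the basis sets benchmarked in the paper.

```python
from typing import Dict

# Hypothetical, illustrative pKa "basis sets" (assumed values for demonstration
# only; not the sets evaluated in the paper).
PKA_SET_A: Dict[str, float] = {
    "Nterm": 8.6, "Cterm": 3.6,
    "K": 10.8, "R": 12.5, "H": 6.5,
    "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1,
}
PKA_SET_B: Dict[str, float] = {
    "Nterm": 9.69, "Cterm": 2.34,
    "K": 10.5, "R": 12.4, "H": 6.0,
    "D": 3.86, "E": 4.25, "C": 8.33, "Y": 10.07,
}

POSITIVE = ("K", "R", "H")       # groups that carry + charge when protonated
NEGATIVE = ("D", "E", "C", "Y")  # groups that carry - charge when deprotonated


def net_charge(sequence: str, ph: float, pka: Dict[str, float]) -> float:
    """Net charge of a linear peptide at `ph` (Henderson-Hasselbalch)."""
    # One free N-terminus (basic) and one free C-terminus (acidic).
    charge = 1.0 / (1.0 + 10 ** (ph - pka["Nterm"]))
    charge -= 1.0 / (1.0 + 10 ** (pka["Cterm"] - ph))
    for residue in sequence.upper():
        if residue in POSITIVE:
            charge += 1.0 / (1.0 + 10 ** (ph - pka[residue]))
        elif residue in NEGATIVE:
            charge -= 1.0 / (1.0 + 10 ** (pka[residue] - ph))
    return charge


def isoelectric_point(sequence: str, pka: Dict[str, float],
                      tol: float = 1e-4) -> float:
    """Bisection search on [0, 14] for the pH at which net charge is zero."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2.0
        # Net charge decreases monotonically with pH, so a positive charge
        # means the pI lies at a higher pH.
        if net_charge(sequence, mid, pka) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


if __name__ == "__main__":
    peptide = "ACDEFGHIKLMNPQRSTVWY"  # hypothetical example peptide
    for name, table in (("set A", PKA_SET_A), ("set B", PKA_SET_B)):
        print(f"{name}: pI = {isoelectric_point(peptide, table):.2f}")
```

Running the script prints a pI for the same peptide under each pKa table; the values will generally differ, which is a small-scale illustration of the basis-set sensitivity described in the abstract, not a reproduction of the benchmark. The learning-based methods (Cofactor, SVM, Branca) are not represented here.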

Original language	English
Pages (from-to)	821-827
Number of pages	7
Journal	Bioinformatics
Volume	32
Issue number	6
Early online date	14 Nov 2015
DOIs	https://doi.org/10.1093/bioinformatics/btv674
Publication status	Published - 15 Mar 2016

Bibliographical note

© The Author 2015. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Funding: BBSRC ‘PROCESS’ grant (BB/K01997X/1).

Supplementary data available on the journal website.

Access to Document

10.1093/bioinformatics/btv674 (Licence: CC BY 3.0)

Isoelectric point of protein and peptide based on amino acid sequences
Final published version, 398 KB (Licence: CC BY 3.0)

http://bioinformatics.oxfordjournals.org/content/32/6/821 (Licence: CC BY 3.0)

Cite this

@article{31b232539ca547afb3029fa045fea07f,
title = "Accurate estimation of isoelectric point of protein and peptide based on amino acid sequences",
author = "Enrique Audain and Yassel Ramos and Henning Hermjakob and Flower, {Darren R.} and Yasset Perez-Riverol",
note = "{\textcopyright} The Author 2015. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. Funding: BBSRC {\textquoteleft}PROCESS{\textquoteright} grant (BB/K01997X/1). Supplementary data available on the journal website.",
year = "2016",
month = mar,
day = "15",
doi = "10.1093/bioinformatics/btv674",
language = "English",
volume = "32",
pages = "821--827",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "6",
}

TY  - JOUR
T1  - Accurate estimation of isoelectric point of protein and peptide based on amino acid sequences
AU  - Audain, Enrique
AU  - Ramos, Yassel
AU  - Hermjakob, Henning
AU  - Flower, Darren R.
AU  - Perez-Riverol, Yasset
N1  - © The Author 2015. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. Funding: BBSRC ‘PROCESS’ grant (BB/K01997X/1). Supplementary data available on the journal website.
PY  - 2016/3/15
Y1  - 2016/3/15
UR  - http://www.scopus.com/inward/record.url?scp=84962199714&partnerID=8YFLogxK
U2  - 10.1093/bioinformatics/btv674
DO  - 10.1093/bioinformatics/btv674
M3  - Article
C2  - 26568629
AN  - SCOPUS:84962199714
SN  - 1367-4803
VL  - 32
SP  - 821
EP  - 827
JO  - Bioinformatics
JF  - Bioinformatics
IS  - 6
ER  - 
