Protein lipograms

Jason Laurie, Amit K Chattopadhyay, Darren R Flower*

*Corresponding author for this work

Research output: Contribution to journal (Article)

Abstract

Linguistic analysis of protein sequences is an underexploited technique. Here, we capitalize on the concept of the lipogram to characterize sequences at the proteome level. A lipogram is a literary composition which omits one or more letters. A protein lipogram likewise omits one or more types of amino acid. In this article, we establish a usable terminology for the decomposition of a sequence collection in terms of the lipogram. Next, we characterize UniRef50 using a lipogram decomposition. At the global level, protein lipograms exhibit power-law properties. A clear correlation with metabolic cost is seen. Finally, we use the lipogram construction to assign proteomes to the four branches of the tree-of-life: archaea, bacteria, eukaryotes and viruses. We conclude from this pilot study that the lipogram demonstrates considerable potential as an additional tool for sequence analysis and proteome classification.
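The definition of a protein lipogram given in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the term "order" for the number of omitted residue types, and the toy sequences are all assumptions made for illustration, and only the 20 standard one-letter amino acid codes are considered.

```python
# Illustrative sketch only; names and toy sequences are hypothetical.
# The 20 standard amino acids in one-letter code.
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def missing_residues(sequence: str) -> frozenset:
    """Return the standard amino acid types that never occur in `sequence`."""
    return STANDARD_AMINO_ACIDS - set(sequence.upper())


def lipogram_order(sequence: str) -> int:
    """Number of omitted residue types (0 means the sequence is not a lipogram)."""
    return len(missing_residues(sequence))


def group_by_missing(sequences: dict) -> dict:
    """Group sequence names by the sorted string of residue types they omit."""
    groups = {}
    for name, seq in sequences.items():
        key = "".join(sorted(missing_residues(seq)))
        groups.setdefault(key, []).append(name)
    return groups


if __name__ == "__main__":
    # Toy sequences, not real proteins.
    toy = {
        "full": "ACDEFGHIKLMNPQRSTVWY",   # uses all 20 types
        "no_trp": "ACDEFGHIKLMNPQRSTVY",  # omits tryptophan (W)
    }
    for name, seq in toy.items():
        print(name, lipogram_order(seq), sorted(missing_residues(seq)))
    print(group_by_missing(toy))
```

Grouping by the set of omitted residues is one simple way to decompose a sequence collection; the terminology and decomposition used in the article itself may differ.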

Original language: English
Pages (from-to): 109-116
Number of pages: 8
Journal: Journal of Theoretical Biology
Volume: 430
Early online date: 15 Jul 2017
DOI: 10.1016/j.jtbi.2017.07.009
Publication status: Published - 7 Oct 2017

Bibliographical note

© 2017, Elsevier. Licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International http://creativecommons.org/licenses/by-nc-nd/4.0/

Keywords

  • Amino Acid Sequence
  • Archaea
  • Bacteria
  • Eukaryota
  • Evolution, Molecular
  • Pilot Projects
  • Proteins/chemistry
  • Proteome/classification
  • Viruses

Cite this

Laurie, Jason ; Chattopadhyay, Amit K ; Flower, Darren R. / Protein lipograms. In: Journal of Theoretical Biology. 2017 ; Vol. 430. pp. 109-116.
@article{a2f83a4351fe4d49936a5701e0120bda,
title = "Protein lipograms",
keywords = "Amino Acid Sequence, Archaea, Bacteria, Eukaryota, Evolution, Molecular, Pilot Projects, Proteins/chemistry, Proteome/classification, Viruses",
author = "Jason Laurie and Chattopadhyay, {Amit K} and Flower, {Darren R}",
note = "{\textcopyright} 2017, Elsevier. Licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International http://creativecommons.org/licenses/by-nc-nd/4.0/",
year = "2017",
month = "10",
day = "7",
doi = "10.1016/j.jtbi.2017.07.009",
language = "English",
volume = "430",
pages = "109--116",
journal = "Journal of Theoretical Biology",
issn = "0022-5193",
publisher = "Academic Press Inc."
}

TY - JOUR

T1 - Protein lipograms

AU - Laurie, Jason

AU - Chattopadhyay, Amit K

AU - Flower, Darren R

N1 - © 2017, Elsevier. Licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International http://creativecommons.org/licenses/by-nc-nd/4.0/

PY - 2017/10/7

Y1 - 2017/10/7

KW - Amino Acid Sequence

KW - Archaea

KW - Bacteria

KW - Eukaryota

KW - Evolution, Molecular

KW - Pilot Projects

KW - Proteins/chemistry

KW - Proteome/classification

KW - Viruses

UR - http://www.scopus.com/inward/record.url?scp=85024840569&partnerID=8YFLogxK

U2 - 10.1016/j.jtbi.2017.07.009

DO - 10.1016/j.jtbi.2017.07.009

M3 - Article

C2 - 28716385

VL - 430

SP - 109

EP - 116

JO - Journal of Theoretical Biology

JF - Journal of Theoretical Biology

SN - 0022-5193

ER -