Protein lipograms

Jason Laurie, Amit K Chattopadhyay, Darren R Flower*

*Corresponding author for this work

Research output: Contribution to journal; Article; peer-reviewed


Linguistic analysis of protein sequences is an underexploited technique. Here, we capitalize on the concept of the lipogram to characterize sequences at the proteome level. A lipogram is a literary composition which omits one or more letters. A protein lipogram likewise omits one or more types of amino acid. In this article, we establish a usable terminology for the decomposition of a sequence collection in terms of the lipogram. Next, we characterize UniRef50 using a lipogram decomposition. At the global level, protein lipograms exhibit power-law properties. A clear correlation with metabolic cost is seen. Finally, we use the lipogram construction to assign proteomes to the four branches of the tree-of-life: archaea, bacteria, eukaryotes and viruses. We conclude from this pilot study that the lipogram demonstrates considerable potential as an additional tool for sequence analysis and proteome classification.
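The definition in the abstract (a protein lipogram omits one or more types of amino acid) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of the 20 standard one-letter codes, and the labelling of each sequence by its sorted missing letters are assumptions made here for illustration only.

```python
from collections import defaultdict

# Assumption: only the 20 standard amino acids (one-letter codes) are considered.
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def missing_residues(sequence):
    """Return the standard amino acid types that never occur in the sequence."""
    present = set(sequence.upper()) & STANDARD_AMINO_ACIDS
    return STANDARD_AMINO_ACIDS - present


def lipogram_label(sequence):
    """Hypothetical label: the missing letters in sorted order ('' if none)."""
    return "".join(sorted(missing_residues(sequence)))


def group_by_lipogram(sequences):
    """Decompose a sequence collection into groups sharing the same label."""
    groups = defaultdict(list)
    for seq in sequences:
        groups[lipogram_label(seq)].append(seq)
    return dict(groups)


if __name__ == "__main__":
    # Toy sequences, invented for illustration; not taken from UniRef50.
    toy = ["ACDEFGHIKLMNPQRSTVWY", "ACDEFGHIKLMNPQRSTVY", "ACDEFGHIKLMNPQRSTVYY"]
    for label, members in group_by_lipogram(toy).items():
        print(repr(label), len(members))
```

Under this scheme, a sequence containing all 20 residue types receives the empty label, while a sequence lacking tryptophan receives the label `W`. Counting group sizes over a real collection would be one way to begin examining the distribution the abstract describes.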

Original language: English
Pages (from-to): 109-116
Number of pages: 8
Journal: Journal of Theoretical Biology
Early online date: 15 Jul 2017
Publication status: Published - 7 Oct 2017

Bibliographical note

© 2017, Elsevier. Licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International


Keywords

  • Amino Acid Sequence
  • Archaea
  • Bacteria
  • Eukaryota
  • Evolution, Molecular
  • Pilot Projects
  • Proteins/chemistry
  • Proteome/classification
  • Viruses

