On the structural repertoire of pools of short, random RNA sequences

Michael Stich, Carlos Briones, Susanna C. Manrubia

Research output: Contribution to journal (article)

Abstract

Detailed knowledge of the mapping between sequence and structure spaces in populations of RNA molecules is essential to better understand their present-day functional properties, to envisage a plausible early evolution of RNA in a prebiotic chemical environment, and to improve the design of in vitro evolution experiments, among other goals. Analyses of natural RNAs, together with in vitro and computational studies, show that certain RNA structural motifs are much more abundant than others, pointing to a complex relation between sequence and structure. Within this framework, we have computationally investigated the structural properties of a large pool (10^8 molecules) of single-stranded, 35-nt-long, random RNA sequences. The secondary structures obtained are ranked and classified into structure families. The number of structures in the main families is calculated analytically and compared with the numerical results. This permits quantification of the fraction of structure space covered by a large pool of sequences. We further show that the number of structural motifs and their frequencies are highly unbalanced with respect to nucleotide composition: simple structures such as stem-loops and hairpins arise from sequences depleted in G, while more complex structures require an enrichment of G. In general, we observe a strong correlation between nucleotide composition and subfamilies, each characterized by a fixed number of paired nucleotides. Our results are compared with the structural repertoire obtained in a second pool in which isolated base pairs are prohibited.
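The bookkeeping described in the abstract (drawing a random pool, measuring nucleotide composition, and ranking structures by family) can be illustrated with a minimal sketch. It makes several assumptions that are not taken from the paper: the folding step is left out entirely, so secondary structures are expected as dot-bracket strings from whatever folding program is used; the grouping key (number of hairpin loops, number of paired nucleotides) is a hypothetical stand-in for the authors' family and subfamily definitions; and all function names are illustrative.

```python
import random
import re
from collections import Counter

ALPHABET = "ACGU"


def random_pool(n_sequences, length=35, seed=0):
    """Draw sequences with every nucleotide equally likely at each position."""
    rng = random.Random(seed)
    return ["".join(rng.choice(ALPHABET) for _ in range(length))
            for _ in range(n_sequences)]


def composition(sequence):
    """Return the fraction of each nucleotide in the sequence."""
    counts = Counter(sequence)
    return {nt: counts[nt] / len(sequence) for nt in ALPHABET}


def paired_nucleotides(structure):
    """Number of paired positions in a dot-bracket string."""
    return 2 * structure.count("(")


def hairpin_loops(structure):
    """Count innermost base pairs, i.e. pairs that enclose only unpaired bases."""
    return len(re.findall(r"\(\.*\)", structure))


def rank_families(structures):
    """Group by (hairpin loops, paired nucleotides); most frequent group first.

    The grouping key is an assumption made for illustration only.
    """
    keys = Counter((hairpin_loops(s), paired_nucleotides(s)) for s in structures)
    return keys.most_common()


# Toy input: hand-written dot-bracket strings, not results from the paper.
toy_structures = ["((((....))))", "((((....))))", "((..))..((..))", "............"]
for key, count in rank_families(toy_structures):
    print(key, count)

pool = random_pool(5)
print(pool[0], composition(pool[0]))
```

Grouping on a small tuple keeps the ranking step independent of how structures are produced, so a different family definition can be swapped in without touching the pool generation or the composition analysis.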
Original language: English
Pages (from-to): 750-763
Number of pages: 14
Journal: Journal of Theoretical Biology
Volume: 252
Issue number: 4
DOI: 10.1016/j.jtbi.2008.02.018
Publication status: Published - 21 Jun 2008

Fingerprint

RNA
Nucleotides
nucleotide sequences
Inverted Repeat Sequences
Prebiotics
Nucleotide Motifs
family structure
Base Pairing
Molecules
functional properties
Chemical analysis
Secondary Structure
Structural properties
Complex Structure
Quantification
Population
stems

Bibliographical note

NOTICE: this is the author’s version of a work that was accepted for publication in Journal of theoretical biology. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Stich, M, Briones, C & Manrubia, SC, 'On the structural repertoire of pools of short, random RNA sequences' Journal of theoretical biology, vol. 252, no. 4 (2008) DOI http://dx.doi.org/10.1016/j.jtbi.2008.02.018

Keywords

  • RNA motif
  • genotype–phenotype map
  • RNA folding
  • RNA world
  • structural family

Cite this

Stich, Michael ; Briones, Carlos ; Manrubia, Susanna C. / On the structural repertoire of pools of short, random RNA sequences. In: Journal of Theoretical Biology. 2008 ; Vol. 252, No. 4. pp. 750-763.
@article{586d33d4c03244c790f4b44ad43b9bf3,
title = "On the structural repertoire of pools of short, random RNA sequences",
abstract = "Detailed knowledge of the mapping between sequence and structure spaces in populations of RNA molecules is essential to better understand their present-day functional properties, to envisage a plausible early evolution of RNA in a prebiotic chemical environment, and to improve the design of in vitro evolution experiments, among other goals. Analyses of natural RNAs, together with in vitro and computational studies, show that certain RNA structural motifs are much more abundant than others, pointing to a complex relation between sequence and structure. Within this framework, we have computationally investigated the structural properties of a large pool ($10^{8}$ molecules) of single-stranded, 35-nt-long, random RNA sequences. The secondary structures obtained are ranked and classified into structure families. The number of structures in the main families is calculated analytically and compared with the numerical results. This permits quantification of the fraction of structure space covered by a large pool of sequences. We further show that the number of structural motifs and their frequencies are highly unbalanced with respect to nucleotide composition: simple structures such as stem-loops and hairpins arise from sequences depleted in G, while more complex structures require an enrichment of G. In general, we observe a strong correlation between nucleotide composition and subfamilies, each characterized by a fixed number of paired nucleotides. Our results are compared with the structural repertoire obtained in a second pool in which isolated base pairs are prohibited.",
keywords = "RNA motif, genotype–phenotype map, RNA folding, RNA world, structural family",
author = "Michael Stich and Carlos Briones and Manrubia, {Susanna C.}",
note = "NOTICE: this is the author’s version of a work that was accepted for publication in Journal of theoretical biology. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Stich, M, Briones, C & Manrubia, SC, 'On the structural repertoire of pools of short, random RNA sequences' Journal of theoretical biology, vol. 252, no. 4 (2008) DOI http://dx.doi.org/10.1016/j.jtbi.2008.02.018",
year = "2008",
month = "6",
day = "21",
doi = "10.1016/j.jtbi.2008.02.018",
language = "English",
volume = "252",
pages = "750--763",
journal = "Journal of Theoretical Biology",
issn = "0022-5193",
publisher = "Academic Press Inc.",
number = "4",

}


TY - JOUR

T1 - On the structural repertoire of pools of short, random RNA sequences

AU - Stich, Michael

AU - Briones, Carlos

AU - Manrubia, Susanna C.

N1 - NOTICE: this is the author’s version of a work that was accepted for publication in Journal of theoretical biology. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Stich, M, Briones, C & Manrubia, SC, 'On the structural repertoire of pools of short, random RNA sequences' Journal of theoretical biology, vol. 252, no. 4 (2008) DOI http://dx.doi.org/10.1016/j.jtbi.2008.02.018

PY - 2008/6/21

Y1 - 2008/6/21

AB - Detailed knowledge of the mapping between sequence and structure spaces in populations of RNA molecules is essential to better understand their present-day functional properties, to envisage a plausible early evolution of RNA in a prebiotic chemical environment, and to improve the design of in vitro evolution experiments, among other goals. Analyses of natural RNAs, together with in vitro and computational studies, show that certain RNA structural motifs are much more abundant than others, pointing to a complex relation between sequence and structure. Within this framework, we have computationally investigated the structural properties of a large pool (10^8 molecules) of single-stranded, 35-nt-long, random RNA sequences. The secondary structures obtained are ranked and classified into structure families. The number of structures in the main families is calculated analytically and compared with the numerical results. This permits quantification of the fraction of structure space covered by a large pool of sequences. We further show that the number of structural motifs and their frequencies are highly unbalanced with respect to nucleotide composition: simple structures such as stem-loops and hairpins arise from sequences depleted in G, while more complex structures require an enrichment of G. In general, we observe a strong correlation between nucleotide composition and subfamilies, each characterized by a fixed number of paired nucleotides. Our results are compared with the structural repertoire obtained in a second pool in which isolated base pairs are prohibited.

KW - RNA motif

KW - genotype–phenotype map

KW - RNA folding

KW - RNA world

KW - structural family

UR - http://www.scopus.com/inward/record.url?scp=44449168461&partnerID=8YFLogxK

U2 - 10.1016/j.jtbi.2008.02.018

DO - 10.1016/j.jtbi.2008.02.018

M3 - Article

AN - SCOPUS:44449168461

VL - 252

SP - 750

EP - 763

JO - Journal of Theoretical Biology

JF - Journal of Theoretical Biology

SN - 0022-5193

IS - 4

ER -