2011 Rustbelt RNA Meeting
RRM

 

Talk abstracts

Talk on Friday 01:15-01:30pm submitted by James Roll

Inference of recurrent 3D RNA motifs from sequence

James Roll (Department of Mathematics and Statistics, Center for Biomolecular Sciences, Bowling Green State University, Bowling Green OH 43403), Craig L. Zirbel (Department of Mathematics and Statistics, Center for Biomolecular Sciences, Bowling Green State University, Bowling Green OH 43403), Anton I. Petrov (Department of Biological Sciences, Bowling Green State University, Bowling Green OH 43403), Neocles B. Leontis (Department of Chemistry, Center for Biomolecular Sciences, Bowling Green State University, Bowling Green OH 43403)

Abstract:
Correct prediction of RNA 3D structure from sequence is a major challenge and a bottleneck in genomic analysis. Most of the eukaryotic genome is transcribed, and most transcripts are non-coding RNA (ncRNA). Moreover, many ncRNAs comprise one or more structured domains. An important sub-goal in the prediction of global 3D structures is the inference of the structures of recurrent, modular motifs corresponding to hairpin and internal "loop" regions in secondary structures. Such recurrent 3D motifs can play architectural roles (e.g., kink-turns), serve to anchor RNA tertiary interactions (e.g., GNRA loops), or provide binding sites for proteins or ligands. Different sequences can form the same 3D motif. We extract all hairpin and internal loops from a non-redundant (NR) set of RNA 3D structures from the PDB/NDB and cluster them into geometrically similar families. For each motif, we construct a probabilistic model for sequence variability based on a hybrid Stochastic Context-Free Grammar/Markov Random Field (SCFG/MRF) method. To parameterize each model, we use all instances of the motif found in the NR dataset together with knowledge of RNA nucleotide interactions, especially isosteric basepairs and their substitution patterns. SCFG techniques can account for nested pairs and insertions, while MRF ideas can handle non-nested interactions. Given the sequence of a hairpin or internal loop as input, each SCFG/MRF model calculates the probability that the sequence forms the corresponding 3D motif. If the score is in the same range as the scores of sequences known to form the 3D structure, we infer that the new sequence forms the same 3D structure. As a first validation step, we used sequences taken from 3D structures as input; for these, the approach correctly infers the 3D structures of nearly all structured internal loops. A web server will be presented for users to input motif sequences and obtain the most likely 3D structures, or a statistical indication that the motif is likely to be new.

Keywords: RNA, Motif, SCFG
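
The decision rule described in the abstract (score a new loop sequence against a motif model, then compare the score with the range obtained for sequences known to form the motif) can be illustrated with a much simpler model. The sketch below is not the SCFG/MRF method: it swaps in an independent per-position nucleotide profile, which ignores basepair covariation, insertions, and non-nested interactions. The function names, the pseudocount, the example tetraloop sequences, and the choice of the lowest known-instance score as the cutoff are all assumptions made for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACGU"

def fit_profile(instances, pseudocount=1.0):
    """Estimate per-position nucleotide probabilities from equal-length instances.

    Simplifying assumption: positions are independent. The SCFG/MRF models in
    the abstract also capture basepair covariation and insertions.
    """
    length = len(instances[0])
    if any(len(s) != length for s in instances):
        raise ValueError("all instances must have the same length")
    total = len(instances) + pseudocount * len(ALPHABET)
    profile = []
    for i in range(length):
        counts = Counter(s[i] for s in instances)
        profile.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return profile

def log_score(profile, seq):
    """Log-probability of seq under the profile (-inf if it cannot be scored)."""
    if len(seq) != len(profile) or any(b not in ALPHABET for b in seq):
        return float("-inf")
    return sum(math.log(col[b]) for col, b in zip(profile, seq))

def forms_motif(profile, known_instances, seq):
    """Accept seq if it scores at least as well as the weakest known instance.

    Hypothetical cutoff: the abstract only says the score must fall "in the
    same range" as known sequences, without specifying how the range is set.
    """
    cutoff = min(log_score(profile, s) for s in known_instances)
    return log_score(profile, seq) >= cutoff

# Illustrative GNRA-type tetraloop sequences (made-up training set).
known = ["GAAA", "GCAA", "GUGA", "GGAA", "GAGA"]
profile = fit_profile(known)
```

With this toy training set, `forms_motif(profile, known, "GAAA")` accepts the sequence, while `forms_motif(profile, known, "UUCG")` rejects it because its score falls below that of every known instance.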