Poster abstracts
Poster number 11 submitted by Arun Kumar Boddapati
A computational framework for classifying RNA-binding proteins based on their preference for RNA structural motifs
Arun Kumar Boddapati (BioHealth Informatics and Indiana University-Purdue University), Sarath Chandra Janga (BioHealth Informatics and Indiana University-Purdue University)
Abstract:
RNA molecules contain conserved sequences or structural components. RNA molecules with structural components fold into 3D structures and contain secondary structures that are recognized as structural motifs, for instance hairpin loops, helices, internal loops, bulges and junction loops. RNA-binding proteins (RBPs) that predominantly recognize these “structural elements” can be categorized as structure-recognizing RBPs, while those exhibiting a preference for sequences can be classified as sequence-binding RBPs.
RNA-binding proteins are components of the post-transcriptional machinery of the cell that bind to RNA by recognizing motifs on it. Although several algorithms have been developed to identify the RNA recognition motifs of an RBP, most approaches assume either a sequence or a structural preference. In this study, we adopt an unbiased approach to classify 172 RBPs, based on their experimentally determined CLIP-sequencing binding site profiles, as recognizing sequence, structure, or both. Our approach employs the BEAM [1] (BEAr Motif finder) tool together with the BEAR [2] encoding of RNA motifs and RNAfold [3] to find conserved structural motifs across known RNA binding sites from various publicly available CLIP and eCLIP datasets.
The RBPs were categorized into sequence-recognizing and structure-recognizing proteins according to a newly proposed Motif Structure Score (MSS). The score considers the coverage of the conserved sequences across the ten most significant motifs reported by BEAM and assigns a value from 0 to 1 that captures the sequence-versus-structure recognition preference of an RBP. Based on the distribution of the Motif Structure Score, we classified RBPs scoring above a threshold of 0.6 as structure recognizing and those scoring below 0.4 as sequence recognizing.
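The thresholding step can be illustrated with a minimal Python sketch. It assumes an MSS value has already been computed for each RBP; the function name `classify_rbp`, the example protein labels, and the example scores are hypothetical. Labelling the intermediate band (0.4 to 0.6) as "both" is also an assumption, since the abstract only states the two outer cutoffs.

```python
def classify_rbp(mss: float,
                 structure_cutoff: float = 0.6,
                 sequence_cutoff: float = 0.4) -> str:
    """Assign a recognition class from a Motif Structure Score in [0, 1].

    Cutoffs follow the abstract: above 0.6 is structure recognizing,
    below 0.4 is sequence recognizing. Treating the band in between
    as "both" is an assumption made for this sketch.
    """
    if not 0.0 <= mss <= 1.0:
        raise ValueError(f"MSS must lie in [0, 1], got {mss}")
    if mss > structure_cutoff:
        return "structure"
    if mss < sequence_cutoff:
        return "sequence"
    return "both"


# Hypothetical scores, for illustration only (not results from the study).
example_scores = {"RBP_A": 0.82, "RBP_B": 0.15, "RBP_C": 0.50}
labels = {name: classify_rbp(score) for name, score in example_scores.items()}
```

Here `labels` maps each hypothetical protein to one of the three classes; scores exactly equal to a cutoff fall into the intermediate band, which is one possible reading of "above" and "below".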
Such an unbiased classification of RBPs based on available CLIP profiles can provide an ensemble of recognition profiles for RBPs, enabling a deeper understanding of their sequence/structural binding space and of their combinatorial regulatory mechanisms.
References:
1. Pietrosanto, M., et al., A novel method for the identification of conserved structural patterns in RNA: From small scale to high-throughput applications. Nucleic acids research, 2016. 44(18): p. 8600-8609.
2. Mattei, E., et al., A novel approach to represent and compare RNA secondary structures. Nucleic acids research, 2014. 42(10): p. 6146-6157.
3. Hofacker, I.L., RNA secondary structure analysis using the Vienna RNA package. Current protocols in bioinformatics, 2009: p. 12.2.1-12.2.16.
Keywords: RBP, Structure, Motif Structure Score