Poster abstracts

Poster number 167 submitted by Mingyi Zhu

DecoyFinder: Identification of Decoy Sequences in Sets of Homologous RNA Sequences

Mingyi Zhu (Center for RNA Biology, University of Rochester Medical Center, Rochester, NY, United States; Department of Biochemistry and Biophysics, University of Rochester Medical Center, Rochester, NY), Jeffrey Zuber (Center for RNA Biology, University of Rochester Medical Center, Rochester, NY, United States; Department of Biochemistry and Biophysics, University of Rochester Medical Center, Rochester, NY), Zhen Tan (Center for RNA Biology, University of Rochester Medical Center, Rochester, NY, United States; Department of Biochemistry and Biophysics, University of Rochester Medical Center, Rochester, NY), Gaurav Sharma (University of Rochester, Department of Electrical and Computer Engineering, Rochester, NY, United States; University of Rochester, Department of Computer Science, Rochester, NY, United States), David H. Mathews (Center for RNA Biology, University of Rochester Medical Center, Rochester, NY, United States; Department of Biochemistry and Biophysics, University of Rochester Medical Center, Rochester, NY)

Abstract:
Motivation: RNA structure plays essential roles in RNA function. Accurate RNA secondary structure modeling is therefore important for understanding RNA function and for targeting RNA with therapeutics. Using multiple homologous sequences, which share structure and function, secondary structure can be predicted with much higher accuracy than with a single sequence. It can be difficult, however, to establish a set of homologous sequences. We developed a method to identify sequences in a set of putative homologs that are in fact non-homologs.
Results: Previously, we developed TurboFold to estimate the conserved structure from multiple unaligned homologous sequences. Here, we report that TurboFold’s prediction accuracy is slightly, but significantly, reduced when the input set is contaminated with non-homologous sequences. We developed a method called DecoyFinder, which applies machine learning trained on features from TurboFold predictions to detect sequences that are not homologous to the other sequences in the set. This method identifies approximately 45% of non-homologous sequences while misidentifying 5% of true homologous sequences as non-homologs.
Availability: DecoyFinder will be incorporated in RNAstructure, which is provided free of charge as open-source software under the GPL v2 license.
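
Illustration: the operating point quoted in Results (about 45% of non-homologs detected at 5% misidentification of true homologs) can be read as a score cutoff chosen on the homologs and then applied to the non-homologs. The Python sketch below shows only that arithmetic. It is not the DecoyFinder implementation: the function names, the convention that a higher score means "more likely a decoy", and the toy scores (invented so that the example mirrors the quoted percentages) are all assumptions made for illustration.

```python
import math


def threshold_at_fpr(homolog_scores, max_fpr=0.05):
    """Pick a cutoff so that at most max_fpr of true homologs score above it.

    Hypothetical helper; assumes a higher score means "more decoy-like".
    """
    ranked = sorted(homolog_scores, reverse=True)
    # Number of true homologs we are allowed to misidentify.
    allowed = math.floor(max_fpr * len(ranked))
    if allowed >= len(ranked):
        return -math.inf  # every sequence may be flagged
    # The cutoff is the score of the first homolog that must NOT be flagged.
    return ranked[allowed]


def flagged_fraction(scores, cutoff):
    """Fraction of sequences whose score is strictly above the cutoff."""
    return sum(s > cutoff for s in scores) / len(scores)


# Toy classifier scores, invented for this illustration only.
homolog_scores = [0.05, 0.10, 0.12, 0.15, 0.18, 0.20, 0.22, 0.25, 0.28, 0.30,
                  0.32, 0.35, 0.38, 0.40, 0.42, 0.45, 0.50, 0.55, 0.60, 0.90]
decoy_scores = [0.10, 0.30, 0.40, 0.50, 0.55, 0.58, 0.62, 0.65, 0.70, 0.75,
                0.80, 0.85, 0.88, 0.92, 0.95, 0.20, 0.35, 0.45, 0.52, 0.59]

cutoff = threshold_at_fpr(homolog_scores, max_fpr=0.05)
misidentified = flagged_fraction(homolog_scores, cutoff)  # homologs flagged as decoys
detected = flagged_fraction(decoy_scores, cutoff)         # decoys correctly flagged

print(f"cutoff={cutoff:.2f} misidentified={misidentified:.2%} detected={detected:.2%}")
```

Fixing the misidentification rate first and then reporting the detection rate mirrors how the result is stated in the abstract. Whether DecoyFinder selects its cutoff this way is not stated in the source.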

Keywords: RNA secondary structure, RNA secondary structure prediction, Machine learning