Poster abstracts

Poster number 45 submitted by Mingyi Zhu

DecoyFinder: Identification of Decoy Sequences in Sets of Homologous RNA Sequences

Mingyi Zhu (Center for RNA Biology, University of Rochester Medical Center; Department of Biochemistry and Biophysics, University of Rochester Medical Center), Jeffrey Zuber (Center for RNA Biology, University of Rochester Medical Center; Department of Biochemistry and Biophysics, University of Rochester Medical Center), Zhen Tan (Center for RNA Biology, University of Rochester Medical Center; Department of Biochemistry and Biophysics, University of Rochester Medical Center), Gaurav Sharma (Department of Electrical and Computer Engineering, University of Rochester; Department of Computer Science, University of Rochester), David H. Mathews (Center for RNA Biology, University of Rochester Medical Center; Department of Biochemistry and Biophysics, University of Rochester Medical Center; Department of Biostatistics and Computational Biology)

Abstract:
Motivation: The structure of an RNA is essential to its function. Using multiple homologous sequences, which share structure and function, secondary structure can be predicted with much higher accuracy than with a single sequence. It can be difficult, however, to establish a set of homologous sequences. We developed a method to identify sequences in a set of putative homologs that are in fact non-homologs.
Results: Previously, we developed TurboFold to estimate the conserved structure from multiple unaligned homologous sequences. Here, we report that TurboFold’s prediction accuracy is reduced by the presence of contaminating non-homologous sequences, and that the reduction is significant. We developed a method called DecoyFinder, which applies machine learning trained on feature data from TurboFold to detect sequences that are not homologous with the other sequences in the set. DecoyFinder can identify approximately 45% of non-homologous sequences while misidentifying 5% of true homologous sequences as non-homologs.
Availability: DecoyFinder and TurboFold are incorporated in RNAstructure, which is free and open source under the GPL v2 license. RNAstructure can be downloaded at http://rna.urmc.rochester.edu/RNAstructure.html
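
Illustration: The operating point quoted in the Results (a fraction of non-homologs detected at a fixed rate of misidentified homologs) can be made concrete with a small sketch. It assumes a classifier that assigns each sequence a score, where a higher score means "more likely a decoy". The function names and the scores are hypothetical and made up for illustration; this is not DecoyFinder's implementation or part of the RNAstructure API.

```python
import math


def threshold_at_fpr(homolog_scores, max_fpr):
    """Pick a score threshold so that at most max_fpr of the true homologs
    score strictly above it (and would be misidentified as decoys)."""
    ranked = sorted(homolog_scores, reverse=True)
    # Number of homologs we may misidentify; the small tolerance guards
    # against floating-point rounding in max_fpr * len(ranked).
    allowed = math.floor(max_fpr * len(ranked) + 1e-9)
    if allowed >= len(ranked):
        return -math.inf  # every sequence may be flagged
    # Flagging only scores strictly above this value misidentifies
    # at most `allowed` homologs.
    return ranked[allowed]


def flagged_fraction(scores, threshold):
    """Fraction of sequences whose score is strictly above the threshold."""
    return sum(1 for s in scores if s > threshold) / len(scores)


# Made-up scores for 20 true homologs and 10 decoys (illustration only).
homolog_scores = [0.02, 0.05, 0.08, 0.10, 0.11, 0.15, 0.18, 0.20, 0.22, 0.25,
                  0.27, 0.30, 0.33, 0.35, 0.40, 0.42, 0.48, 0.55, 0.63, 0.91]
decoy_scores = [0.12, 0.30, 0.45, 0.58, 0.61, 0.66, 0.72, 0.80, 0.88, 0.95]

threshold = threshold_at_fpr(homolog_scores, max_fpr=0.05)
detected = flagged_fraction(decoy_scores, threshold)
misidentified = flagged_fraction(homolog_scores, threshold)
print(f"threshold={threshold}, decoys detected={detected:.0%}, "
      f"homologs misidentified={misidentified:.0%}")
```

Fixing the misidentification rate first and then reading off the detection rate is one common way to report such a trade-off; the threshold itself depends entirely on the score distribution of the data at hand.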

References:
Mathews, D. H. (2004). Using an RNA secondary structure partition function to determine confidence in base pairs predicted by free energy minimization. RNA, 10(8), 1178-1190. https://doi.org/10.1261/rna.7650904
Mathews, D. H., Disney, M. D., Childs, J. L., Schroeder, S. J., Zuker, M., & Turner, D. H. (2004). Incorporating chemical modification constraints into a dynamic programming algorithm for prediction of RNA secondary structure. Proc Natl Acad Sci U S A, 101(19), 7287-7292. https://doi.org/10.1073/pnas.0401799101
Tan, Z., Fu, Y., Sharma, G., & Mathews, D. H. (2017). TurboFold II: RNA structural alignment and secondary structure prediction informed by multiple homologs. Nucleic Acids Res, 45(20), 11570-11581. https://doi.org/10.1093/nar/gkx815

Keywords: RNA structure prediction, Machine learning, RNA secondary structure