Talk abstracts

Talk on Friday 03:15-03:30pm submitted by Raja Shekar Varma Kadumuri

Accurate transcriptome-wide identification of m5C RNA modification events at single molecule resolution from direct RNA sequencing of human cell lines

Raja Shekar Varma Kadumuri (Department of BioHealth Informatics, School of Informatics and Computing, Indiana University Purdue University), Mir Quoseena (Department of BioHealth Informatics, School of Informatics and Computing, Indiana University Purdue University), Sarath Chandra Janga (Department of BioHealth Informatics, School of Informatics and Computing, Indiana University Purdue University)

Abstract:
5-methylcytosine (m5C) is an abundant chemical modification of RNA known to regulate RNA processing[1], stability[2], mRNA export[3], cleavage, and translation. Although recent methods have enabled epitranscriptome profiling of m5C in various cell lines and tissues[3], current sequencing-based methods have major drawbacks, including complex, labor-intensive protocols, cross-reactivity to chemical compounds, and ambiguous mapping of m5C positions resulting from short-read sequencing technologies[4]. Nanopore-based direct RNA sequencing provides an opportunity to obtain full-length isoforms and detect m5C modifications simultaneously, with less labor-intensive processing and better accuracy[5, 6]. However, there is currently no tool to detect RNA modifications from direct RNA sequencing data. Here, we present RAVEN, a deep neural network-based framework built mainly on a Convolutional Neural Network (CNN) architecture, for the detection of m5C modifications from nanopore-based direct RNA sequencing data. Raven is trained and validated on ~10,000 experimentally known m5C and unmodified cytosine (C) signal signatures[3] from matched polyA direct RNA sequencing and bisulfite-treated short-read sequencing data generated in a human HeLa cell line. Raven takes the Albacore base-called fast5 files resulting from direct RNA sequencing as input and provides m5C modification predictions at the read level with an accuracy of ~80%, along with a statistical probability of an m5C modification at a genomic locus that takes into account the depth of the aligned reads. Raven also supports machine learning models, including k-nearest neighbor and AdaBoost classifiers, which showed 73-75% accuracy for m5C prediction. The current version of Raven can be accessed via its GitHub wiki page: https://github.iu.edu/rkadumu/Raven/wiki
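The abstract does not specify how read-level predictions are combined into a locus-level probability. As a minimal sketch of one plausible approach, and not necessarily the method Raven uses, the read-level calls covering a cytosine could be tested against the classifier's error rate with a one-sided binomial test. The function names and the 0.2 error rate (loosely motivated by the ~80% read-level accuracy) are assumptions for illustration only.

```python
from math import comb


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def site_pvalue(read_calls, error_rate=0.2):
    """Probability of seeing at least this many m5C calls at an unmodified site.

    read_calls: one 0/1 read-level prediction per aligned read at the locus
                (1 = read called as m5C). Hypothetical input format.
    error_rate: assumed chance that a read from an unmodified site is
                miscalled as m5C (an illustrative value, not from Raven).
    """
    depth = len(read_calls)
    if depth == 0:
        return 1.0  # no coverage, so no evidence for a modification
    modified = sum(read_calls)
    return binomial_tail(modified, depth, error_rate)


if __name__ == "__main__":
    # 10 reads cover the site and 7 of them are called as m5C.
    calls = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1]
    print(f"depth={len(calls)} p-value={site_pvalue(calls):.3g}")
```

A small p-value means the number of m5C calls is unlikely to arise from classification errors alone at that read depth, so deeper coverage gives the same modified fraction more statistical weight.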

References:
1 Kadumuri, R.V. and Janga, S.C. (2018) Epitranscriptomic Code and Its Alterations in Human Disease. Trends Mol Med
2 Hussain, S., et al. (2013) Characterizing 5-methylcytosine in the mammalian epitranscriptome. Genome Biology 14, 215
3 Yang, X., et al. (2017) 5-methylcytosine promotes mRNA export - NSUN2 as the methyltransferase and ALYREF as an m(5)C reader. Cell Research 27, 606-625
4 Wreczycka, K., et al. (2017) Strategies for analyzing bisulfite sequencing data. Journal of Biotechnology 261, 105-115
5 Liu, Q., et al. (2018) NanoMod: a computational tool to detect DNA modifications using Nanopore long-read sequencing data. bioRxiv
6 Stoiber, M.H., et al. (2017) De novo Identification of DNA Modifications Enabled by Genome-Guided Nanopore Signal Processing. bioRxiv

Keywords: 5-methylcytosine (m5C) modification, Deep learning neural network, Direct RNA nanopore sequencing