Poster abstracts

Poster number 98 submitted by Aniruddhan Govindaraman

R(A)PTOR - A tool for systematic identification of Poly(A) tails and 3’ unmapped regions from single molecule direct RNA-sequencing datasets

Aniruddhan Govindaraman (Department of BioHealth Informatics, School of Informatics and Computing, Indiana University Purdue University Indianapolis), Raja Shekar Varma Kadumuri (Department of BioHealth Informatics, School of Informatics and Computing, Indiana University Purdue University Indianapolis), Mir Quoseena (Department of BioHealth Informatics, School of Informatics and Computing, Indiana University Purdue University Indianapolis), Sarath Chandra Janga (Department of BioHealth Informatics, School of Informatics and Computing, Indiana University Purdue University Indianapolis)

Abstract:
The 3’ cleavage of pre-messenger RNA (pre-mRNA) followed by polyadenylation is a fundamental cellular process in eukaryotes. The poly(A) tail is usually described as a long chain of adenine nucleotides added to the 3’ terminus of a messenger RNA (mRNA) molecule during RNA processing; however, the terminal 3’ region is also known to harbor additional unmappable regions (UMRs) that include uridylation and guanylation [1]. Although short-read sequencing technologies are extensively used to study 3’ terminal poly(A) regions, a major drawback of these technologies is their inability to detect full-length homopolymeric sequences [1],[2]. Recent long-read technologies such as Nanopore sequencing enable sequencing of full-length transcripts at single-molecule resolution, but there are currently no tools for systematically analyzing 3’ terminal unmapped regions from direct RNA-sequencing datasets. We present RAPTOR (https://github.com/aniram118/RAPTOR), a command-line tool for 3’ terminal unmapped region analysis of Nanopore direct RNA-sequencing data. RAPTOR reports UMR lengths, UMR sequences, conserved poly(A) hexamer regions, nucleotide base composition, and the correlation between transcript length and UMR length, all at single-molecule resolution. In our benchmarking studies, we sequenced mRNA samples obtained from HepG2 and K562 cell lines, resulting in 243,802 and 598,428 reads, respectively. UMRs identified by RAPTOR exhibited a median length of 50-100 nt, in agreement with previous studies [1]. Our results also support an enrichment of previously known conserved poly(A) hexamers [3].
Nucleotide composition analysis of the identified 3’ UMRs showed an enrichment for A and U nucleotides in both HepG2 [A: 29%, U: 28%, G: 20%, C: 23%] and K562 [A: 30%, U: 29%, G: 19%, C: 22%]. Interestingly, guanylation was observed in the upstream and downstream portions of the UMRs, while uridylation occurred more often in the central portions, suggesting characteristic roles in mRNA stability. In addition, conserved motif analysis of the UMRs followed by RNA-binding protein (RBP) binding site analysis identified several RBPs, including HNRPK, PCB2, SART, SRSF9, HNRPR and RBM4, as enriched in the unmapped regions, suggestive of an unappreciated role of these RBPs in binding to the 3’ tails of mRNAs.
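As an illustration of the kind of per-read summary described above (UMR length, base composition, and poly(A) hexamer occurrence), a minimal Python sketch follows. It is not RAPTOR's implementation: the function names, the input format (a list of already-extracted UMR sequences), and the choice of the two canonical signal hexamers AAUAAA and AUUAAA are assumptions made for this example.

```python
from collections import Counter
from statistics import median

# Assumption: two canonical polyadenylation signal hexamers (RNA alphabet).
# The hexamer set that RAPTOR actually scans for is not stated in the abstract.
POLYA_HEXAMERS = ("AAUAAA", "AUUAAA")


def normalize(seq):
    """Upper-case a sequence and convert DNA-style T to RNA-style U."""
    return seq.upper().replace("T", "U")


def base_composition(umr):
    """Return the percentage of A, U, G and C in one UMR sequence."""
    counts = Counter(normalize(umr))
    total = sum(counts[b] for b in "AUGC")
    if total == 0:
        return {b: 0.0 for b in "AUGC"}
    return {b: 100.0 * counts[b] / total for b in "AUGC"}


def count_hexamers(umr, hexamers=POLYA_HEXAMERS):
    """Count occurrences (overlaps allowed) of each hexamer in one UMR."""
    seq = normalize(umr)
    windows = [seq[i:i + 6] for i in range(len(seq) - 5)]
    return {h: windows.count(h) for h in hexamers}


def summarize(umrs):
    """Summarize a collection of UMR sequences (hypothetical report layout)."""
    pooled = "".join(normalize(u) for u in umrs)
    return {
        "n_reads": len(umrs),
        "median_length": median(len(u) for u in umrs),
        "composition": base_composition(pooled),
        "reads_with_hexamer": sum(
            1 for u in umrs if any(count_hexamers(u).values())
        ),
    }


if __name__ == "__main__":
    # Toy sequences for demonstration only; not data from the study.
    example_umrs = ["AAUAAAGGUU", "AUUAAAAAAA", "GGCCUUUU"]
    print(summarize(example_umrs))
```

In practice the UMR sequences would first have to be extracted from the unaligned 3’ ends of mapped reads; that step is omitted here because the abstract does not describe how RAPTOR performs it.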

References:
1. Chang, H., Lim, J., Ha, M., & Kim, V. N. (2014). TAIL-seq: genome-wide determination of poly(A) tail length and 3′ end modifications. Molecular Cell, 53(6), 1044–1052. http://doi.org/10.1016/j.molcel.2014.02.007

2. Byrne, A., Beaudin, A. E., Olsen, H. E., Jain, M., Cole, C., Palmer, T., … Vollmers, C. (2017). Nanopore long-read RNAseq reveals widespread transcriptional variation among the surface receptors of individual B cells. Nature Communications, 8, 16027. http://doi.org/10.1038/ncomms16027

3. Pesole, G., Liuni, S., Grillo, G., Licciulli, F., Mignone, F., Gissi, C., & Saccone, C. (2002). UTRdb and UTRsite: specialized databases of sequences and functional elements of 5′ and 3′ untranslated regions of eukaryotic mRNAs. Update 2002. Nucleic Acids Research, 30(1), 335–340.

Keywords: Polyadenylation, Nanopore Sequencing