Poster abstracts

Poster number 3 submitted by Robert Wang

Robust discovery and quantification of transcript isoforms from error-prone long-read RNA sequencing data

Yuan Gao, Feng Wang (Center for Computational and Genomic Medicine, The Children's Hospital of Philadelphia), Robert Wang (Center for Computational and Genomic Medicine, The Children's Hospital of Philadelphia), Eric Kutschera, Yang Xu, Stephan Xie (Center for Computational and Genomic Medicine, The Children's Hospital of Philadelphia), Yuanyuan Wang, Kathryn E. Kadash-Edmondson (Center for Computational and Genomic Medicine, The Children's Hospital of Philadelphia), Lan Lin (Department of Pathology and Laboratory Medicine, University of Pennsylvania), Yi Xing (Center for Computational and Genomic Medicine, The Children's Hospital of Philadelphia)

Abstract:
Long-read RNA sequencing (RNA-seq) holds great potential for characterizing transcriptome variation and full-length transcript isoforms, but the relatively high error rate of current long-read sequencing platforms poses a major challenge. We present ESPRESSO, a computational tool for robust discovery and quantification of transcript isoforms from error-prone long reads. ESPRESSO jointly considers the alignments of all long reads mapped to a gene and uses the error profiles of individual reads to improve the identification of splice junctions and the discovery of their corresponding transcript isoforms. On both a synthetic spike-in RNA sample and human RNA samples, ESPRESSO outperforms multiple contemporary tools in both transcript isoform discovery and transcript isoform quantification. In total, we generated and analyzed ~1.1 billion nanopore RNA-seq reads covering 30 human tissue samples and three human cell lines. ESPRESSO and its companion dataset provide a useful resource for studying the RNA repertoire of eukaryotic transcriptomes.

Keywords: long-read RNA sequencing, transcript isoform variation, alternative splicing