Poster abstracts

Poster number 29 submitted by Defne Ceyhan

Predicting the subcellular localization of RNA-binding proteins

Defne Ceyhan (Memorial Sloan Kettering Cancer Center), Jessica Lin (Memorial Sloan Kettering Cancer Center), Cyrus Tam (Memorial Sloan Kettering Cancer Center), Ilyes Baali (Memorial Sloan Kettering Cancer Center), Kaitlin Laverty (Memorial Sloan Kettering Cancer Center), Quaid Morris (Memorial Sloan Kettering Cancer Center)

Abstract:
RNA-binding proteins (RBPs) bind specific RNA sequences or structures to regulate post-transcriptional processes, including splicing, nuclear export, localization, translation, and RNA degradation. The processes in which an RBP participates are closely related to its subcellular localization, that is, whether it binds RNA in nuclear or cytoplasmic compartments. A number of experimental methods have been developed to assess RBP function. In particular, enhanced crosslinking and immunoprecipitation (eCLIP) assays identify an RBP’s RNA-binding sites [1]. Peakhood, a recently developed tool, enables more accurate eCLIP analysis and determines the RNA context in which the RBP binds, specifically whether the bound RNA is spliced or unspliced [2]. Using Peakhood, we derived the proportions of exonic sites, spliced-transcript-context sites, and exon border sites from 244 eCLIP experiment datasets representing 168 unique RBPs and two cell lines [3]. We then determined the proportion of eCLIP peak sites overlapping a 3′ UTR. In addition to these site statistics, we explored nanopore RNA-seq data from chromatin and cytoplasmic compartments by mapping isoform-level quantifications against Peakhood-predicted RBP-bound transcripts [4]. Combining these features derived from eCLIP and RNA-seq data, we developed a machine learning model to predict the subcellular localization of RBPs. This classifier has the potential to provide additional insight into RBP cellular function. To further improve our approach, we intend to explore other methods for deriving “true” localization labels, as well as how filtering for eCLIP data quality and incorporating additional informative features affect classifier performance.
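
As an illustration of the site-level features described above, the following minimal Python sketch computes, for one eCLIP dataset, the proportion of sites carrying each annotation. The `Site` record, its field names, and the `site_proportions` helper are hypothetical and chosen for illustration only; they do not reflect Peakhood's actual output format or the authors' pipeline.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Site:
    """One eCLIP peak site with hypothetical boolean annotations."""
    exonic: bool           # site lies within an exon
    spliced_context: bool  # site assigned to a spliced-transcript context
    exon_border: bool      # site lies at an exon border
    in_3utr: bool          # site overlaps a 3' UTR


def site_proportions(sites: List[Site]) -> Dict[str, float]:
    """Return the fraction of sites that carry each annotation."""
    n = len(sites)
    if n == 0:
        raise ValueError("need at least one site")
    return {
        "exonic": sum(s.exonic for s in sites) / n,
        "spliced_context": sum(s.spliced_context for s in sites) / n,
        "exon_border": sum(s.exon_border for s in sites) / n,
        "in_3utr": sum(s.in_3utr for s in sites) / n,
    }


# Toy input: four made-up sites for a single (hypothetical) dataset.
example = [
    Site(exonic=True, spliced_context=True, exon_border=False, in_3utr=True),
    Site(exonic=True, spliced_context=False, exon_border=False, in_3utr=False),
    Site(exonic=False, spliced_context=False, exon_border=False, in_3utr=False),
    Site(exonic=True, spliced_context=True, exon_border=True, in_3utr=True),
]
features = site_proportions(example)
```

Each dataset would yield one such feature dictionary, which could then be combined with RNA-seq-derived features as input to a classifier; the choice of classifier is not specified here.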

References:
1. Van Nostrand, E. L. et al. (2016). Nature Methods 13.
2. Uhl, M. et al. (2022). Bioinformatics 38(4).
3. Van Nostrand, E. L. et al. (2020). Nature.
4. Ietswaart, R. et al. (2024). Molecular Cell 84(14).

Keywords: RNA-binding proteins, machine learning, subcellular localization