Accurate transcriptome-wide identification and quantification of alternative polyadenylation from RNA-seq data with APAIQ
- Yongkang Long (1, 2, 11)
- Bin Zhang (1, 2, 11)
- Shuye Tian (3, 11)
- Jia Jia Chan (4)
- Juexiao Zhou (1, 2)
- Zhongxiao Li (1, 2)
- Yisheng Li (3, 5)
- Zheng An (6)
- Xingyu Liao (1, 2)
- Yu Wang (7)
- Shiwei Sun (8)
- Ying Xu (9)
- Yvonne Tay (4, 10)
- Wei Chen (3)
- Xin Gao (1, 2)
- (1) Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955 Saudi Arabia
- (2) Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955 Saudi Arabia
- (3) Shenzhen Key Laboratory of Gene Regulation and Systems Biology, Department of Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, 518055 China
- (4) Cancer Science Institute of Singapore, National University of Singapore, 117599 Singapore
- (5) Shenzhen Haoshi Biotechnology Company, Limited, Bao An District, Shenzhen, 518000 China
- (6) Computational Systems Biology Lab, Department of Biochemistry and Molecular Biology and Institute of Bioinformatics, the University of Georgia, Athens, Georgia 30605, USA
- (7) Syneron Technology, Guangzhou, 510535 China
- (8) Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190 China
- (9) Systems Biology Lab for Metabolic Reprogramming, School of Medicine, Southern University of Science and Technology, Shenzhen, 518055 China
- (10) Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, 117597 Singapore
(11) These authors contributed equally to this work.
Abstract
Alternative polyadenylation (APA) enables a gene to generate multiple transcripts with different 3′ ends, and APA usage varies dynamically across cell types and conditions. Many computational methods have been developed to characterize sample-specific APA from the corresponding RNA-seq data, but they suffer from high error rates in both polyadenylation site (PAS) identification and quantification of PAS usage (PAU), as well as a bias toward 3′ untranslated regions. Here, we developed APAIQ, a tool for APA identification and quantification from RNA-seq data that accurately identifies PASs and quantifies PAU in a transcriptome-wide manner. Using 3′ end-seq data as the benchmark, we showed that APAIQ outperforms current methods, including DaPars2, Aptardi, mountainClimber, SANPolyA, and QAPA, on both PAS identification and PAU quantification. Finally, applying APAIQ to 421 RNA-seq samples from liver cancer patients, we identified >540 tumor-associated APA events and experimentally validated two intronic polyadenylation candidates, demonstrating its capacity to uncover cancer-related APA in a large-scale RNA-seq data set.
Footnotes
- [Supplemental material is available for this article.]
- Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.277177.122.
- Freely available online through the Genome Research Open Access option.
- Received August 3, 2022.
- Accepted February 28, 2023.
This article, published in Genome Research, is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.