Challenges in identifying mRNA transcript starts and ends from long-read sequencing data

Table 1.

Computational approaches to identify and quantify mRNA terminal ends in LRS data

| Approach | Known annotations | De novo identification | Terminal end selection | Orthogonal features | Full-length status | Reference |
|---|---|---|---|---|---|---|
| TAPIS | Optional | Clustering | Longest isoform with shared splice sites | Terminal end A's, terminal end adapters | No | (Abdel-Ghany et al. 2016) |
| StringTie2 | Optional | Graph-based | Unknown | No | No | (Kovaka et al. 2019) |
| FLAIR | Yes | Clustering | Read density | Terminal end adapters | Read level | (Tang et al. 2020) |
| TALON | Yes | Clustering | Clustering | Terminal end A's | Transcript level, FSM/ISM/NIC categories | (Wyman et al. 2020) |
| FLAMES | Yes | Clustering | Annotations | SRS RNA-seq | No | (Tian et al. 2021) |
| Bambu | Yes | Probabilistic ML model | Categorization and modeling | Terminal end A's | Read level and transcript level | (Chen et al. 2023) |
| ESPRESSO | Yes | Clustering | Splice junctions of terminal exons | No | Read level and transcript level with FSM/ISM/NIC categories | (Gao et al. 2023) |
| IsoQuant | Optional | Graph-based | Clustering | Terminal end A's | No | (Prjibelski et al. 2023) |
| IsoTools | Yes | Graph-based | Peak calling, read density | No | No | (Lienhard et al. 2023) |
| SQANTI3 | Recommended | Probabilistic ML model | Categorization and modeling | Terminal end A's, terminal end adapters, SRS RNA-seq, complementary SRS data sets | Transcript level, FSM/ISM/NIC categories | (Pardo-Palacios et al. 2024a) |

Genome Res. 34: 1719-1734
