Revolutionizing genomics and medicine—one long molecule at a time

  Ana Conesa1, Alexander Hoischen2, and Fritz J. Sedlazeck3,4,5
  1. Institute for Integrative Systems Biology (I2SysBio), Spanish National Research Council (CSIC), Paterna, 46980, Spain;
  2. Department of Human Genetics and Department of Internal Medicine; Radboudumc Research Institute for Medical Innovation; Radboud Centre for Infectious Diseases (RCI); Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands;
  3. Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA;
  4. Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA;
  5. Department of Computer Science, Rice University, Houston, Texas 77005, USA
  Corresponding authors: Ana.Conesa@csic.es, Alexander.Hoischen@radboudumc.nl, Fritz.Sedlazeck@bcm.edu
    Long-read sequencing (LRS) has matured: dramatically increased accuracy, ever-increasing throughput, and broader access now allow new and advanced studies, even at scale. This Special Issue of Genome Research on “Long-read DNA and RNA Sequencing Applications in Biology and Medicine” garnered a record number of submissions, reflecting both the intense and broad interest in these technologies and the next round of revolutionary genomic science they enable. This interest is rooted in the core feature shared by all long-read technologies, the ability to sequence much longer DNA molecules (from tens of kb to Mb scale), which brings several common benefits: improved sensitivity to structural variants (SVs) and detection of (complex) cytogenetic aberrations; the ability to assemble and phase genomes, with the potential to move from variants to alleles/haplotypes; sensitivity for detecting repeat expansions and contractions; and access to the “dark genome,” for example, repeat-rich and GC-rich areas of the human genome, including segmental duplications (SegDups) and other regions of sequence homology.

    In this first of two Special Issues, we have assembled a diverse collection of research and review articles highlighting novel applications and developments around long-read sequencing. Three trends emerge from this Special Issue: a steadily increasing number of human genome and transcriptome studies are now feasible, more assembly-based analyses demonstrate the power of long-read sequencing, and non-human studies showcase its value for non-reference and model species.

    Two-thirds of all manuscripts in this issue include the study of human samples, with a particular focus on germline testing and rare disease genetics. It is remarkable to see the successes of these studies, given that long-read sequencing has been on the market and in routine use for only a decade. Rare disease cohort studies assessing the added benefit of LRS are emerging (Hiatt et al. 2024; Eisfeldt et al. 2024a), and many more systematically designed studies are still under way. Currently, these focus on the identification of SVs and short tandem repeat (STR) expansions. In addition, efforts to provide complete sets of SVs are now possible at the population level (Gustafson et al. 2024), yielding SV sets with breakpoint resolution and dramatically altering the catalog of SVs in human populations. Intriguing studies focusing on very complex SVs appear in this issue (Guitart et al. 2024; Eisfeldt et al. 2024b; Bilgrav Saether et al. 2024), and additional tools and visualizations for all clinically or biologically relevant variant types are emerging (De Coster et al. 2024; Tesi et al. 2024; Zhou et al. 2024), as well as tools to call additional genomic information (Gocuk et al. 2024). Current costs are still prohibitive for using LRS as a first-tier, generic clinical test for all germline variants in human genetics, although this may rapidly change. In the meantime, Iyer et al. (2024) review approaches for cost-efficient targeted sequencing to enhance variant detection in certain regions of the genome.

    Advancements in both accuracy and read length have driven the rapid expansion of improved assembly-based analysis methods, which are now capable of delivering nearly telomere-to-telomere and chromosome-level assemblies, for non-human and non-reference species, too, as illustrated by Kamath et al. (2024), Koren et al. (2024), Gardner et al. (2024), and Byerly et al. (2024). Long-read data now offer unprecedented insights into some of the most complex regions of genomes (de Groot et al. 2024; Volarić et al. 2024) and pave the way for the development of the next generation of high-quality reference genomes (Li et al. 2024). Moreover, novel assays and analysis methods facilitate studying the relationship between the genome and the epigenome using long reads (Jha et al. 2024).

    Long reads can capture full-length transcript molecules, offering enhanced resolution for detecting alternative isoforms and accurately profiling transcriptome complexity. This Special Issue highlights the expanding applications of LRS in transcriptome studies, which are increasingly addressing complex questions in transcriptome biology.

    Several studies in this issue demonstrate the power of LRS in cancer research, where it has identified deep intronic variants and aberrant splicing across various cancer types (Gulsuner et al. 2024; Pacholewska et al. 2024), and characterized novel transcriptional features linked to tumor progression (Lee et al. 2024). Additionally, increased read-throughput and reduced costs are enabling large-scale studies to capture transcriptome variability across multiple samples. For example, Zhang et al. (2024) profiled isoform diversity in house mouse populations using long-read RNA-seq, while Adams and Vollmers (2024) combined R2C2 library preparation with nanopore sequencing to profile the transcriptome across numerous mouse tissues, creating a Tissue-Level Atlas of Mouse Isoforms (TAMI).

    Adaptations and advancements in experimental protocols paired with RNA LRS have allowed researchers to go beyond bulk transcriptomics. Numerous studies now employ long reads for single-cell and spatial transcriptomics analyses, as reviewed by Belchikov et al. (2024), to profile isoform usage across diverse cell types. Meanwhile, the integration of long-read RNA sequencing with a translatomic method (LR Frac-seq) introduced by Ritter et al. (2024) enables subcellular fractionation, revealing subcellular enrichment for thousands of transcripts, while the long-read Ribo-STAMP (LR-Ribo-STAMP) technique captures the translating transcriptome (Jagannatha et al. 2024). Single-molecule nanopore sequencing technologies uniquely enable the sequencing of native RNA molecules, offering exciting possibilities for studying RNA modifications and their roles in transcriptome function and disease (Diensthuber et al. 2024; Teng et al. 2024). Despite all these successes, important challenges, such as coverage needs, remain for long-read RNA applications, as reviewed by Calvo-Roitberg et al. (2024). Nevertheless, Zeglinski et al. (2024) demonstrate the value of nanopore sequencing even for quality control of gene therapy vectors.

    This Special Issue also highlights the power of long-read sequencing for rapid pathogen characterization and epidemiological control. For instance, Gomez-Simmonds et al. (2024) apply nanopore sequencing to profile numerous enterobacteria isolates, identifying sources of acquired antibiotic resistance, while Slizovskiy et al. (2024) present TELSeq, a targeted method to detect antimicrobial resistance genes (ARGs) and mobile genetic elements in order to study the transferability of ARGs. However, Lohde et al. (2024) caution that nanopore sequencing accuracy can be compromised by DNA methylation, causing errors that may lead to false isolate identification. This limitation could be addressed by using PCR-based sequencing, offering a promising pathway for rapid outbreak tracing with long-read technologies. Conversely, bacterial DNA modifications can be of scientific interest in their own right, as they impact bacterial function, and Liu et al. (2024) introduce a novel pipeline to accurately detect DNA modifications in prokaryotes.

    We would like to thank all the authors, reviewers, and the Genome Research editorial team, especially Dr. Hillary Sussman, for their hard work and significant contributions to this Special Issue. Given all these studies highlighting the benefits of long-read sequencing, it is clear that these technologies have already made a substantial impact in many studies. We look forward to the next decade of innovations around long reads as they reveal novel genomic and biological insights, and we are reminded of this quote from Carl Sagan: “Somewhere, something incredible is waiting to be known.”

    Competing interest statement

    A.C., A.H., and F.J.S. served as Guest Editors for this issue of Genome Research, and had access to all papers prior to publication. F.J.S. obtained research support from Illumina, PacBio, and ONT. For some research studies by A.H., reagent costs were in part shared between Radboudumc and Bionano, or Radboudumc and PacBio. A.C. obtained in-kind support from PacBio. A.C. participates with ONT in the EU-funded project LongTREC.

    This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see https://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
