Unraveling the hidden complexity of cancer through long-read sequencing
- Qiuhui Li1,
- Ayse G. Keskus2,
- Justin Wagner3,
- Michal B. Izydorczyk4,
- Winston Timp5,
- Fritz J. Sedlazeck4,6,7,
- Alison P. Klein8,
- Justin M. Zook3,
- Mikhail Kolmogorov2 and
- Michael C. Schatz1,8
- 1Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218, USA;
- 2Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, Bethesda, Maryland 20892, USA;
- 3Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, Maryland 20899, USA;
- 4Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA;
- 5Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland 21218, USA;
- 6Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA;
- 7Department of Computer Science, Rice University, Houston, Texas 77251, USA;
- 8Sidney Kimmel Comprehensive Cancer Center, Department of Oncology, Johns Hopkins Medicine, Baltimore, Maryland 21031, USA
Abstract
Cancer is fundamentally a disease of the genome, characterized by extensive genomic, transcriptomic, and epigenomic alterations. Most current studies use short-read sequencing, gene panels, or microarrays to explore these alterations; however, these technologies can systematically miss or misrepresent certain types of alterations, especially structural variants, complex rearrangements, and alterations within repetitive regions. Long-read sequencing is rapidly emerging as a transformative technology for cancer research by providing a comprehensive view across the genome, transcriptome, and epigenome, including the ability to detect alterations that previous technologies have overlooked. In this Perspective, we explore the current applications of long-read sequencing for both germline and somatic cancer analysis. We provide an overview of the computational methodologies tailored to long-read data and highlight key discoveries and resources within cancer genomics that were inaccessible with prior technologies. We also address future opportunities and persistent challenges, including the experimental and computational requirements for scaling to larger sample sizes, the hurdles in sequencing and analyzing complex cancer genomes, and the potential of machine learning and artificial intelligence technologies for cancer informatics. We further discuss how the telomere-to-telomere genome and the emerging human pangenome could enhance the resolution of cancer genome analysis, potentially revolutionizing early detection and disease monitoring in patients. Finally, we outline strategies for transitioning long-read sequencing from research applications to routine clinical practice.
This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.











