Abstract

Long-read sequencing provides a more complete view of the genome than short-read sequencing, with improved detection of structural variants, tandem repeats, and small variants (single nucleotide variants and insertions and deletions) in difficult-to-map regions. One limitation of long-read sequencing has been high input DNA requirements, with several micrograms required per sample. Here, we evaluate two methods of amplification-based long-read, whole-genome sequencing: ultra-low input HiFi (ULI-HiFi) sequencing and droplet multiple displacement amplification (dMDA) sequencing. When benchmarked against the Genome in a Bottle reference set (NA24385), we observe higher precision and recall for single nucleotide variants (SNVs) with ULI-HiFi than with the dMDA-amplified samples (SNV F1 scores of 99.82% for ULI-HiFi versus 89.46% for dMDA). Across a catalog of >1.6 million tandem repeats (TRs), ULI-HiFi achieves 90.4% perfect concordance and 98.9% accuracy when allowing for single motif differences. ULI-HiFi also illuminates medically important genes that were poorly mapped by short-read sequencing. We further apply ULI-HiFi to analyze normal, polyp, and adenocarcinoma samples from a patient with familial adenomatous polyposis (FAP), a hereditary form of colorectal cancer. We identify a TR that progressively expanded in length from normal to polyp to adenocarcinoma. This repeat is located in the 5' UTR of LIMD1, a reported tumor suppressor. Reporter assays reveal significantly reduced expression in colorectal cancer cell lines with increasing repeat length in the LIMD1 5' UTR. We conclude that ULI-HiFi improves the characterization of genetic variants in dark regions of genomes from patient samples, enabling a better understanding of human disease.
