Analytical validation of germline small variant detection using long-read HiFi genome sequencing

Table 2. Long-read HiFi genome sequencing SNV/indel (<50 bp) reproducibility (GIAB high confidence)

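| Variant class | Metric | Genome-wide: Low complexity | Genome-wide: Low mappability | Genome-wide: Segmental duplications | Genome-wide: All difficult regions | Genome-wide: Not in any difficult region | Genome-wide: All | RefSeq CDS: Not in any difficult region | RefSeq CDS: All |
|---|---|---|---|---|---|---|---|---|---|
| SNVs | Count | 170,332 | 181,786 | 113,753 | 584,487 | 2,721,331 | 3,305,818 | 14,023 | 20,467 |
| SNVs | Concordance | 99.10% | 99.51% | 99.30% | 99.54% | 99.95% | 99.88% | 99.91% | 99.85% |
| Insertions (1–5 bp) | Count | 139,069 | 4085 | 4495 | 151,456 | 69,083 | 221,146 | 48 | 115 |
| Insertions (1–5 bp) | Concordance | 97.41% | 99.38% | 99.26% | 97.61% | 99.94% | 98.35% | 99.66% | 99.17% |
| Insertions (6–15 bp) | Count | 15,335 | 335 | 490 | 16,341 | 6487 | 22,882 | 5 | 43 |
| Insertions (6–15 bp) | Concordance | 98.13% | 98.99% | 99.31% | 98.23% | 99.96% | 98.74% | 100.00% | 100.00% |
| Insertions (≥16 bp) | Count | 2527 | 118 | 119 | 2805 | 1847 | 4664 | 2 | 12 |
| Insertions (≥16 bp) | Concordance | 98.22% | 97.70% | 98.01% | 98.31% | 99.84% | 98.93% | 100.00% | 95.22% |
| Deletions (1–5 bp) | Count | 150,504 | 4459 | 4313 | 163,229 | 69,693 | 232,407 | 58 | 141 |
| Deletions (1–5 bp) | Concordance | 98.13% | 99.35% | 99.21% | 98.25% | 99.97% | 98.77% | 99.72% | 99.41% |
| Deletions (6–15 bp) | Count | 17,426 | 457 | 552 | 18,499 | 6652 | 24,940 | 10 | 47 |
| Deletions (6–15 bp) | Concordance | 98.38% | 99.17% | 98.96% | 98.46% | 99.96% | 98.86% | 100.00% | 100.00% |
| Deletions (≥16 bp) | Count | 3525 | 160 | 101 | 3771 | 1583 | 5220 | 3 | 16 |
| Deletions (≥16 bp) | Concordance | 99.21% | 98.92% | 99.01% | 99.26% | 99.96% | 99.50% | 100.00% | 100.00% |
| All indels | Count | 311,652 | 9518 | 9877 | 339,360 | 155,318 | 494,492 | 125 | 370 |
| All indels | Concordance | 97.90% | 99.32% | 99.21% | 98.05% | 99.95% | 98.64% | 99.73% | 99.37% |
| SNVs and indels | Count | 498,720 | 191,398 | 123,824 | 940,587 | 2,876,675 | 3,817,077 | 14,148 | 20,841 |
| SNVs and indels | Concordance | 98.27% | 99.50% | 99.30% | 98.96% | 99.95% | 99.71% | 99.91% | 99.84% |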
  • (indels) insertions/deletions, (RefSeq CDS) NCBI Reference Sequence gene coding sequence, (SNVs) single nucleotide variants.
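The table does not state how the pooled "SNVs and indels" concordance was derived from the per-class rows. One plausible reading, offered here only as an assumption, is a count-weighted mean of the per-class concordances. The sketch below applies that rule to the genome-wide "All" column; the function name `weighted_concordance` is hypothetical, and the inputs are copied from the SNVs and All indels rows. Note that the two component counts sum to 3,800,310 rather than the tabulated 3,817,077, so the result can only approximate the reported value.

```python
# Values copied from the genome-wide "All" column of Table 2:
# (variant count, concordance in percent).
rows = {
    "SNVs": (3_305_818, 99.88),
    "All indels": (494_492, 98.64),
}


def weighted_concordance(strata):
    """Count-weighted mean of per-class concordance percentages.

    This pooling rule is an assumption made for illustration; the source
    does not say how its pooled rows were computed.
    """
    strata = list(strata)
    total = sum(count for count, _ in strata)
    return sum(count * pct for count, pct in strata) / total


pooled = weighted_concordance(rows.values())
print(f"Approximate pooled concordance: {pooled:.2f}%")
```

Under this assumption the pooled figure lands within a few hundredths of a percentage point of the reported 99.71%, which suggests that the large SNV count dominates the combined concordance.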

This Article

  1. Genome Res. 35: 1391-1399
