Whole-genome long-read sequencing downsampling and its effect on variant-calling precision and recall


Figure 3.

Precision and recall for variant classes as a function of LRS coverage using assembly-based algorithms for HG002. (A) Recall for HG002 against the GIAB truth sets plotted against sequencing coverage for assembly-based callers, across all algorithms capable of calling SNVs. (B) Recall for HG002 against the HGSVC truth sets plotted against sequencing coverage for assembly-based callers, across all algorithms capable of calling indels. Recall for ONT assemblies is higher at low coverages but is surpassed by HiFi assemblies at 12×. (C) Recall for HG002 against the HGSVC Freeze 4 truth set plotted against sequencing coverage for assembly-based callers, across all algorithms capable of calling SVs. (D) Precision for HG002 against the HGSVC truth sets plotted against sequencing coverage for read-based callers, across all algorithms capable of calling SNVs. ONT precision is comparable to HiFi precision at high coverages but noticeably worse at coverages below 15×. (E) Precision plotted against sequencing coverage for assembly-based callers, across all algorithms capable of calling indels. As with read-based methods, values for all technologies and coverages remain low, likely because indels in complex regions are incompletely represented in the GIAB truth set. (F) Precision plotted against sequencing coverage for assembly-based callers, across all algorithms capable of calling SVs.
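Each panel reports precision or recall at a series of downsampled coverages. As a minimal sketch, assuming a benchmarking comparison that yields true-positive (TP), false-positive (FP) and false-negative (FN) counts, and assuming reads are subsampled uniformly at random, the two metrics and the fraction of reads to retain for a target coverage can be computed as below. The names `BenchmarkCounts` and `subsample_fraction` are hypothetical, and the numbers are illustrative only, not results from this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkCounts:
    """Hypothetical TP/FP/FN counts from comparing a call set to a truth set."""
    tp: int
    fp: int
    fn: int

    def precision(self) -> float:
        # Fraction of reported calls that match the truth set: TP / (TP + FP).
        called = self.tp + self.fp
        return self.tp / called if called else 0.0

    def recall(self) -> float:
        # Fraction of truth-set variants recovered by the caller: TP / (TP + FN).
        truth = self.tp + self.fn
        return self.tp / truth if truth else 0.0


def subsample_fraction(full_coverage: float, target_coverage: float) -> float:
    """Fraction of reads to keep to go from full_coverage to ~target_coverage.

    Assumes uniform random read subsampling, so expected depth scales linearly
    with the fraction of reads kept.
    """
    if full_coverage <= 0:
        raise ValueError("full_coverage must be positive")
    if not 0 < target_coverage <= full_coverage:
        raise ValueError("target_coverage must be in (0, full_coverage]")
    return target_coverage / full_coverage


if __name__ == "__main__":
    # Illustrative values only; not data from the study.
    full = 60.0
    for target in (5, 12, 15, 30):
        print(f"{target}x -> keep {subsample_fraction(full, target):.3f} of reads")
    counts = BenchmarkCounts(tp=900, fp=100, fn=300)
    print(f"precision={counts.precision():.2f} recall={counts.recall():.2f}")
```

Because precision and recall depend on different error counts (FP versus FN), a technology can gain recall at low coverage while still trailing in precision, which is why the panels plot the two metrics separately.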

Genome Res. 33: 2029-2040

Preprint Server