Table 3.
Quality Assessment Measurements at Various Stages of Atlas Assembly
| Reads | BCM trace-quality (TQ) analysis on all traceruns; primary feedback to production group. |
| | BCM-cross-repeat scan for wrong-organism repeats (similar to RepeatMasker-primspec). |
| | All reads scanned for first and last 50-base window with no contaminant matches and <1.25 expected errors. |
| | Head and tail beyond the windows trimmed off. Remaining insert required to have 100 bases of Phred quality ≥20 (for WGS) or 50 such bases (for BAC reads). |
| | Trimmed reads used to compute oligo frequencies (32-mers) over all WGS sequence; only oligos with frequency ≤12 (∼3 times the coverage) used to seed overlaps. |
| | Untrimmed BAC and WGS reads used in assembly (masked for contaminant and vector); a WGS read must have passed quality or be the mate of a passed and fished read. |
| eBAC assembly internal checks | Post-Phrap, paired ends used to split and trim contigs which are inconsistent internally or cannot be consistently scaffolded. |
| BAC purification QC | Each tracerun (96-well group) checked for coassembly against other traceruns. |
| | Coassembly indicated by participation in same contigs (one test) or in same scaffold (for comparison). |
| | Groups of traceruns not coassembling with the bulk of a project are pulled, and if comprising ≥200 passed reads, placed in a “synthetic project.” |
| | (Or relocated to their correct original project if possible, based on both sequence similarity and lab tracking proximity.) |
| Bactigging QC | Linearized sequence for each enriched BAC scaffold BLASTZ'd against others (rendered efficient by prefilter for shared WGS reads). |
| | Enriched BACs with excess overlaps flagged for closer examination in BAC purification. |
| | Bactigs reassembled, scaffolded into superbactigs, laid out by markers. Adjacent bactigs whose terminal BACs had low-confidence overlaps are re-examined for overlaps and joined if confirmed. |
| Mapping QC | Markers, mouse synteny, human synteny, and FPC all examined simultaneously along with superbactig data primarily driven by BAC ends; feedback between assembly layout of BACs and FPC mapping group. |
| Overall checks | Alignment with finished BACs (and multi-BAC regions from NISC), dot-plotting (BLASTZ and atlas-dot). Alignment and scoring using MUMmer. |
| | Large-scale alignment with Mouse, dot-plotting (BLASTZ and atlas-dot); see mapping QC. |
| | Duplications and collapses: |
| | Oligo analysis (24-mers): check for regions overrepresented in assembly (artifactual duplications), especially at bactig and superbactig boundaries; found and corrected a small number of cases (<6). |
| | Approximately 4% of unique WGS oligomers missing in final assembly (as compared with 1% in Mouse); oligomers with frequency 20-50 underrepresented in Mouse (by 10% to 20%); oligomer representation in Rnor 3.1 consistently ∼96% beyond frequency = 100. |
| | cDNA and EST alignment consistency. |
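
The read-trimming rule in the Reads rows of Table 3 can be illustrated with a minimal sketch. It assumes that the expected error count of a window is the sum of per-base error probabilities derived from Phred scores as 10^(-Q/10), and that contaminant matches are supplied as a per-base boolean mask. The function names, the mask representation, and the constants' names are hypothetical and are not taken from the Atlas code itself.

```python
from typing import Optional, Sequence, Tuple

WINDOW = 50                      # window length in bases (from Table 3)
MAX_EXPECTED_ERRORS = 1.25       # a window must have fewer expected errors than this
MIN_Q20_BASES = {"WGS": 100, "BAC": 50}  # required Phred >= 20 bases per read type


def expected_errors(quals: Sequence[int]) -> float:
    """Sum of per-base error probabilities, assuming p = 10^(-Q/10)."""
    return sum(10 ** (-q / 10) for q in quals)


def trim_read(
    quals: Sequence[int],
    contaminant: Optional[Sequence[bool]] = None,
) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the retained insert, or None if no window qualifies.

    The insert runs from the start of the first qualifying window to the
    end of the last qualifying window; bases outside are trimmed off.
    """
    n = len(quals)
    if contaminant is None:
        contaminant = [False] * n  # assumption: no contaminant matches known
    good_starts = [
        i
        for i in range(n - WINDOW + 1)
        if not any(contaminant[i:i + WINDOW])
        and expected_errors(quals[i:i + WINDOW]) < MAX_EXPECTED_ERRORS
    ]
    if not good_starts:
        return None
    return good_starts[0], good_starts[-1] + WINDOW


def passes_quality(
    quals: Sequence[int],
    read_type: str,
    contaminant: Optional[Sequence[bool]] = None,
) -> bool:
    """Check that the trimmed insert has enough bases of Phred quality >= 20."""
    span = trim_read(quals, contaminant)
    if span is None:
        return False
    start, end = span
    q20_bases = sum(1 for q in quals[start:end] if q >= 20)
    return q20_bases >= MIN_Q20_BASES[read_type]


if __name__ == "__main__":
    # Synthetic read: low-quality ends around a 120-base high-quality core.
    quals = [5] * 30 + [30] * 120 + [5] * 30
    print(trim_read(quals))
    print(passes_quality(quals, "WGS"))
```

This sketch rescans every window from scratch for clarity; a production implementation would more likely keep a running sum of error probabilities as the window slides.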











