Table 2.

Sequence Quality as a Function of Read Lengths

Interval columns give the percent fidelity of sequence, with the no. of ambiguous bases in parentheses, within 100-base intervals from the first readable base.

Organisms sequenced       1–100       101–200     201–300     301–400     401–500     501–600     601–700     Useful data range (bases)
E. coli apaG gene         99 (0)      99 (1)      100 (0)     99 (1)      100 (0)     99 (0)      95 (2)      700
E. coli HtpG gene         99 (1)      99 (1)      99 (0)      98 (2)      95 (5)      87 (5)      64 (14)     560
E. coli ldhA gene         97 (2)      100 (0)     100 (0)     98 (2)      98 (2)      97 (3)      91 (5)      740
S. pneumoniae pspA gene   98 (0)      100 (0)     100 (0)     100 (0)     100 (0)     99 (1)      91 (5)      790
U. urealyticum #1         96 (4)      100 (0)     100 (0)     100 (0)     99 (1)      96 (3)      90 (3)      655
U. urealyticum #2         93 (3)      98 (1)      100 (0)     100 (0)     99 (1)      99 (1)      89 (6)      689
U. urealyticum #3         93 (3)      98 (1)      100 (0)     100 (0)     100 (0)     99 (0)      77 (8)      674
U. urealyticum #4         96 (3)      99 (1)      99 (1)      97 (2)      92 (8)      96 (3)      68 (8)      650
M. fermentans[ii]         100 (0)     0[ii] 0[ii] 0[ii] 0[ii] 14725
Average                   96.8 (1.6)  99.1 (0.6)  99.8 (0.1)  99.1 (0.6)  97.8 (1.8)  96.5 (1.9)  83 (6.0)    712 ± 18

[i] Each unedited sequence determined from a genomic DNA template is aligned and compared with its corresponding GenBank database sequence. The fidelity of each sequence is listed as the percentage agreement in a 100-bp interval. The number following the percentage value is the number of ambiguous bases (N) for each interval. The useful data range is the length of sequence that would be employed after human editing of the computer-generated base-calls.

[ii] The M. fermentans initiation factor database sequence overlapped the new sequence for only 150 bases, limiting the comparison to that stretch.
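
The per-interval comparison described in footnote [i] can be sketched roughly as follows. This is a minimal illustration, not the procedure used to produce the table: the function name `interval_fidelity` is hypothetical, the two sequences are assumed to be already aligned to equal length with no gaps, and an ambiguous base (N) is assumed to count as a disagreement, which the footnote does not specify.

```python
def interval_fidelity(read, reference, interval=100):
    """Return a list of (percent agreement, N count) tuples, one per interval.

    Assumption: `read` and `reference` are pre-aligned, gap-free strings of
    equal length. An N in the read is counted as a mismatch (assumption).
    """
    if len(read) != len(reference):
        raise ValueError("sequences must be pre-aligned to equal length")

    results = []
    for start in range(0, len(read), interval):
        read_chunk = read[start:start + interval].upper()
        ref_chunk = reference[start:start + interval].upper()
        # Count positions where the base-call agrees with the database sequence.
        matches = sum(1 for a, b in zip(read_chunk, ref_chunk) if a == b)
        # Count ambiguous base-calls in the unedited read.
        n_count = read_chunk.count("N")
        percent = round(100 * matches / len(read_chunk))
        results.append((percent, n_count))
    return results


# Toy example: a 200-base reference with two ambiguous calls and one miscall.
reference = "ACGT" * 50
read = list(reference)
read[10] = "N"    # ambiguous base in the first interval
read[150] = "T"   # miscall in the second interval (reference base is G)
read[160] = "N"   # ambiguous base in the second interval
print(interval_fidelity("".join(read), reference))
```

Each tuple corresponds to one cell of the table, for example `(99, 1)` would be printed in the table as 99 (1). A final interval shorter than 100 bases is scored over its actual length.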