Markup | Genome Research

Table 2.

Sequence Quality as a Function of Read Lengths

Organisms sequenced	Percent fidelity of sequence and no. of ambiguous bases within 100-base intervals from first readable base	Useful data range (bases)
	1–100	101–200	201–300	301–400	401–500	501–600	601–700	
E. coli apaG gene	99 (0)	99 (1)	100 (0)	99 (1)	100 (0)	99 (0)	95 (2)	700
E. coli HtpG gene	99 (1)	99 (1)	99 (0)	98 (2)	95 (5)	87 (5)	64 (14)	560
E. coli ldhA gene	97 (2)	100 (0)	100 (0)	98 (2)	98 (2)	97 (3)	91 (5)	740
S. pneumoniae pspA gene	98 (0)	100 (0)	100 (0)	100 (0)	100 (0)	99 (1)	91 (5)	790
U. urealyticum #1	96 (4)	100 (0)	100 (0)	100 (0)	99 (1)	96 (3)	90 (3)	655
U. urealyticum #2	93 (3)	98 (1)	100 (0)	100 (0)	99 (1)	99 (1)	89 (6)	689
U. urealyticum #3	93 (3)	98 (1)	100 (0)	100 (0)	100 (0)	99 (0)	77 (8)	674
U. urealyticum #4	96 (3)	99 (1)	99 (1)	97 (2)	92 (8)	96 (3)	68 (8)	650
M. fermentans[ii]	100 (0)	0[ii]	0[ii]	0[ii]	0[ii]	1	4	725
Average	96.8 (1.6)	99.1 (0.6)	99.8 (0.1)	99.1 (0.6)	97.8 (1.8)	96.5 (1.9)	83 (6.0)	712 ± 18

[i] Each unedited sequence determined from a genomic DNA template is aligned and compared with its corresponding GenBank database sequence. The fidelity of each sequence is listed as the percentage agreement within each 100-bp interval. The number in parentheses following the percentage value is the number of ambiguous bases (N) in that interval. The useful data range is the length of sequence that would be employed after human editing of the computer-generated base-calls.

[ii] The M. fermentans initiation factor database sequence only overlapped the new sequence for 150 bases, limiting the comparison to that stretch.
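
The per-interval values described in footnote [i] can be illustrated with a minimal sketch. It is not the procedure used to build the table: the function name `interval_quality` is hypothetical, the read and the database sequence are assumed to be already aligned to equal length without gaps, and an ambiguous base (N) is assumed to count as a disagreement.

```python
def interval_quality(read, reference, width=100):
    """Return a (percent agreement, ambiguous-base count) pair per interval.

    Assumptions (not taken from the table's source): `read` and `reference`
    are pre-aligned, gap-free strings of equal length that start at the
    first readable base, and an N in the read counts as a disagreement.
    """
    if len(read) != len(reference):
        raise ValueError("sequences must be aligned to equal length")
    results = []
    for start in range(0, len(read), width):
        r = read[start:start + width].upper()
        g = reference[start:start + width].upper()
        # Count positions where the base-call agrees with the database base.
        matches = sum(1 for a, b in zip(r, g) if a == b)
        # Count ambiguous base-calls (N) in this interval of the read.
        ambiguous = r.count("N")
        results.append((round(100 * matches / len(r)), ambiguous))
    return results


# Toy 200-base example: one N in bases 1-100, one miscall in bases 101-200.
reference = "ACGT" * 50
read = list(reference)
read[5] = "N"      # ambiguous call in the first interval
read[150] = "A"    # miscall (G -> A) in the second interval
print(interval_quality("".join(read), reference))
# [(99, 1), (99, 0)]
```

Each tuple corresponds to one table cell in the form "percent (N count)", so the toy output reads as 99 (1) for bases 1–100 and 99 (0) for bases 101–200.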