The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes

Table 1.

Growth and current size of the CCDS set

  • a. NCBI build numbers and Ensembl release numbers (e.g., build 35.1 and release 23) are displayed in the Map Viewer and the Ensembl browser, respectively, and represent distinct whole-genome annotation runs. The values reported here reflect the input annotation data set used to calculate new candidates. (Hs) Homo sapiens; (Mm) Mus musculus.

  • b. If unexpected losses, indicated in the “Withdrawn, Other” column, are found again in a later build, the CCDS ID is reinstated with a “public” status. Reinstatement requires that the CDS structure be identical to the version that was previously lost. If the CDS structure has changed and is identical in both input data sets, the CCDS version number is incremented. For example, see CCDS2672.

  • c. Unexpected loss of consistent CDS annotation includes changed or removed annotation that is not tracked by the CCDS database as curation-based change. The large accidental loss in human build 36.2 resulted in improved tracking of annotation input data by both the NCBI and Ensembl annotation pipelines. Robust CCDS tracking continues to be a goal of annotation pipelines.
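
The reinstatement rule described in the second footnote can be read as a small decision procedure. The following is a minimal sketch of one reading of that rule, not the CCDS database's actual implementation. The `CcdsRecord` type, the `reinstate` function, the status strings, and the representation of a CDS structure as a tuple of exon intervals are all hypothetical names introduced for illustration. The sketch also assumes that both input data sets must agree before any reinstatement, since the CCDS set is defined by consistent annotation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumed representation: a CDS structure is a tuple of (start, end) exon intervals.
Structure = Tuple[Tuple[int, int], ...]


@dataclass(frozen=True)
class CcdsRecord:
    """Hypothetical record for one tracked CCDS entry."""
    ccds_id: str
    version: int
    structure: Structure
    status: str  # illustrative values: "withdrawn" or "public"


def reinstate(
    lost: CcdsRecord,
    ncbi: Optional[Structure],
    ensembl: Optional[Structure],
) -> Optional[CcdsRecord]:
    """Return the reinstated record, or None if the rule does not apply.

    ncbi / ensembl are the CDS structures found in the two input data
    sets of a later build (None if the annotation is absent).
    """
    # Assumption: no reinstatement unless both input data sets agree.
    if ncbi is None or ensembl is None or ncbi != ensembl:
        return None
    # Structure identical to the lost version: same version, public status.
    if ncbi == lost.structure:
        return CcdsRecord(lost.ccds_id, lost.version, lost.structure, "public")
    # Structure changed but consistent in both inputs: increment the version.
    return CcdsRecord(lost.ccds_id, lost.version + 1, ncbi, "public")
```

Under this reading, a lost record whose structure reappears unchanged keeps its version number, while a changed but consistently annotated structure returns with the version number incremented by one.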

This Article

  1. Genome Res. 19: 1316-1323
