The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes

Kim D. Pruitt; Jennifer Harrow; Rachel A. Harte; Craig Wallin; Mark Diekhans; Donna R. Maglott; Steve Searle; Catherine M. Farrell; Jane E. Loveland; Barbara J. Ruef; Elizabeth Hart; Marie-Marthe Suner; Melissa J. Landrum; Bronwen Aken; Sarah Ayling; Robert Baertsch; Julio Fernandez-Banet; Joshua L. Cherry; Val Curwen; Michael DiCuccio; Manolis Kellis; Jennifer Lee; Michael F. Lin; Michael Schuster; Andrew Shkeda; Clara Amid; Garth Brown; Oksana Dukhanina; Adam Frankish; Jennifer Hart; Bonnie L. Maidak; Jonathan Mudge; Michael R. Murphy; Terence Murphy; Jeena Rajan; Bhanu Rajput; Lillian D. Riddick; Catherine Snow; Charles Steward; David Webb; Janet A. Weber; Laurens Wilming; Wenyu Wu; Ewan Birney; David Haussler; Tim Hubbard; James Ostell; Richard Durbin; David Lipman

doi:10.1101/gr.080531.108

Resource

- ¹ National Center for Biotechnology Information, National Library of Medicine, Bethesda, Maryland 20894, USA;
- ² Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, United Kingdom;
- ³ Center for Biomolecular Science and Engineering, University of California, Santa Cruz, California 95064, USA;
- ⁴ Zebrafish Information Network, University of Oregon, Eugene, Oregon 97403-5291, USA;
- ⁵ The University of Manchester, Faculty of Life Sciences, Manchester Interdisciplinary Biocentre, Manchester M1 7DN, United Kingdom;
- ⁶ Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA;
- ⁷ Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02141, USA;
- ⁸ European Bioinformatics Institute, Hinxton, Cambridge CB10 1SD, United Kingdom
- ⁹ Corresponding author. E-mail [email protected]; fax (301) 480-2918.

Published June 4, 2009. Vol 19 Issue 7, pp. 1316-1323. https://doi.org/10.1101/gr.080531.108

Abstract

Effective use of the human and mouse genomes requires reliable identification of genes and their products. Although multiple public resources provide annotation, they use different methods, which can result in similar but not identical representations of genes, transcripts, and proteins. The collaborative consensus coding sequence (CCDS) project tracks identical protein annotations on the reference mouse and human genomes with a stable identifier (CCDS ID), and ensures that they are consistently represented on the NCBI, Ensembl, and UCSC Genome Browsers. Importantly, the project coordinates manual review of protein annotations that are inconsistent between sites, as well as annotations for which new evidence suggests a revision is needed, to progressively converge on a complete protein-coding set for the human and mouse reference genomes while maintaining a high standard of reliability and biological accuracy. To date, the project has identified 20,159 human and 17,707 mouse consensus coding regions from 17,052 human and 16,893 mouse genes. Three evaluation methods indicate that the entries in the CCDS set are highly likely to represent real proteins, more so than annotations from contributing groups not included in CCDS. The CCDS database thus centralizes the function of identifying well-supported, identically annotated, protein-coding regions.
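As a purely illustrative sketch of what "identical protein annotations" could mean computationally, the Python snippet below intersects two toy annotation sets and labels each shared coding region with a placeholder identifier. The data model (chromosome, strand, tuple of coding intervals), the function names `consensus_cds` and `assign_ids`, the `DEMO` prefix, and the example coordinates are all assumptions made for this example; they are not the project's actual pipeline, data, or identifier scheme.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical data model: a CDS is (chromosome, strand, coding intervals),
# where the intervals are (start, end) pairs in genomic coordinates.
Interval = Tuple[int, int]
Cds = Tuple[str, str, Tuple[Interval, ...]]


def consensus_cds(set_a: Iterable[Cds], set_b: Iterable[Cds]) -> List[Cds]:
    """Return the CDS structures that are exactly identical in both sets."""
    return sorted(set(set_a) & set(set_b))


def assign_ids(consensus: Iterable[Cds], prefix: str = "DEMO") -> Dict[Cds, str]:
    """Attach a placeholder identifier to each shared CDS (not a real CCDS ID)."""
    return {cds: f"{prefix}{i}" for i, cds in enumerate(consensus, start=1)}


# Toy annotation sets from two hypothetical annotation groups.
group_a = [
    ("chr1", "+", ((100, 200), (300, 450))),
    ("chr2", "-", ((50, 120),)),
]
group_b = [
    ("chr1", "+", ((100, 200), (300, 450))),
    ("chr2", "-", ((50, 123),)),  # differs by 3 bases at the end, so not identical
]

shared = consensus_cds(group_a, group_b)
ids = assign_ids(shared)
for cds, label in ids.items():
    print(label, cds)
```

Only the chr1 entry is shared here. The chr2 annotations differ at one boundary, which is the kind of near-match the abstract describes as "similar but not identical" and which the project resolves through manual review rather than automatic merging.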
