TY  - JOUR
A1  - Pruitt, Kim D.
A1  - Harrow, Jennifer
A1  - Harte, Rachel A.
A1  - Wallin, Craig
A1  - Diekhans, Mark
A1  - Maglott, Donna R.
A1  - Searle, Steve
A1  - Farrell, Catherine M.
A1  - Loveland, Jane E.
A1  - Ruef, Barbara J.
A1  - Hart, Elizabeth
A1  - Suner, Marie-Marthe
A1  - Landrum, Melissa J.
A1  - Aken, Bronwen
A1  - Ayling, Sarah
A1  - Baertsch, Robert
A1  - Fernandez-Banet, Julio
A1  - Cherry, Joshua L.
A1  - Curwen, Val
A1  - DiCuccio, Michael
A1  - Kellis, Manolis
A1  - Lee, Jennifer
A1  - Lin, Michael F.
A1  - Schuster, Michael
A1  - Shkeda, Andrew
A1  - Amid, Clara
A1  - Brown, Garth
A1  - Dukhanina, Oksana
A1  - Frankish, Adam
A1  - Hart, Jennifer
A1  - Maidak, Bonnie L.
A1  - Mudge, Jonathan
A1  - Murphy, Michael R.
A1  - Murphy, Terence
A1  - Rajan, Jeena
A1  - Rajput, Bhanu
A1  - Riddick, Lillian D.
A1  - Snow, Catherine
A1  - Steward, Charles
A1  - Webb, David
A1  - Weber, Janet A.
A1  - Wilming, Laurens
A1  - Wu, Wenyu
A1  - Birney, Ewan
A1  - Haussler, David
A1  - Hubbard, Tim
A1  - Ostell, James
A1  - Durbin, Richard
A1  - Lipman, David
T1  - The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes
Y1  - 2009/07/01
JF  - Genome Research
JO  - Genome Research
SP  - 1316
EP  - 1323
DO  - 10.1101/gr.080531.108
VL  - 19
IS  - 7
UR  - http://genome.cshlp.org/content/19/7/1316.abstract
N2  - Effective use of the human and mouse genomes requires reliable identification of genes and their products. Although multiple public resources provide annotation, different methods are used that can result in similar but not identical representation of genes, transcripts, and proteins. The collaborative consensus coding sequence (CCDS) project tracks identical protein annotations on the reference mouse and human genomes with a stable identifier (CCDS ID), and ensures that they are consistently represented on the NCBI, Ensembl, and UCSC Genome Browsers. Importantly, the project coordinates on manually reviewing inconsistent protein annotations between sites, as well as annotations for which new evidence suggests a revision is needed, to progressively converge on a complete protein-coding set for the human and mouse reference genomes, while maintaining a high standard of reliability and biological accuracy. To date, the project has identified 20,159 human and 17,707 mouse consensus coding regions from 17,052 human and 16,893 mouse genes. Three evaluation methods indicate that the entries in the CCDS set are highly likely to represent real proteins, more so than annotations from contributing groups not included in CCDS. The CCDS database thus centralizes the function of identifying well-supported, identically-annotated, protein-coding regions.
ER  - 