Compositional Gene Landscapes in Vertebrates
Abstract
The existence of a well-conserved linear relationship between the GC levels of genes' second and third codon positions (GC2, GC3) prompted us to focus on the landscape, or joint distribution, spanned by these two variables. In human, well-curated coding sequences now cover at least 15%–30% of the estimated total gene set. Our analysis of the landscape defined by this gene set revealed not only the well-documented linear crest, but also several peaks and valleys along that crest, a property that was also indicated in two other warm-blooded vertebrates represented by large gene databases, namely mouse and chicken. GC2 is the sum of eight amino acid frequencies, whereas GC3 is linearly related to the GC level of the chromosomal region containing the gene. The landscapes therefore portray relations between proteins and the DNA environments of the genes that encode them.
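The two variables can be illustrated with a minimal sketch, which is not the authors' pipeline. It computes GC2 and GC3 for a single in-frame coding sequence and checks that GC2 equals the summed frequency of eight amino acids. The abstract does not name those amino acids. The set used here (Ala, Arg, Cys, Gly, Pro, Ser, Thr, Trp) is an assumption derived from the standard genetic code, in which every codon for these residues has G or C at the second position. The function names and the example sequence are hypothetical, and the sequence is assumed to contain sense codons only, with the terminal stop codon removed.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Assumption: the "eight amino acids" are those whose codons all carry
# G or C at the second position (Ala, Arg, Cys, Gly, Pro, Ser, Thr, Trp).
GC2_AMINO_ACIDS = set("ARCGPSTW")


def split_codons(cds):
    """Split an in-frame coding sequence into a list of codons."""
    seq = cds.upper()
    if not seq or len(seq) % 3 != 0:
        raise ValueError("sequence must be non-empty and a multiple of 3 long")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def gc_at_position(codons, pos):
    """Fraction of codons with G or C at codon position pos (0, 1 or 2)."""
    return sum(codon[pos] in "GC" for codon in codons) / len(codons)


def gc2_from_amino_acids(codons):
    """GC2 recomputed as the summed frequency of the eight amino acids."""
    hits = sum(CODON_TABLE[codon] in GC2_AMINO_ACIDS for codon in codons)
    return hits / len(codons)


# Hypothetical toy sequence: Met-Ala-Ser-Arg-Trp-Phe, no stop codon.
codons = split_codons("ATGGCCTCAAGATGGTTT")
gc2 = gc_at_position(codons, 1)   # second codon position
gc3 = gc_at_position(codons, 2)   # third codon position
gc2_protein = gc2_from_amino_acids(codons)
```

Applying the same two functions to every gene in a set yields the (GC2, GC3) pairs whose joint distribution forms the landscape described above. The identity between `gc2` and `gc2_protein` holds only for sense codons, because the stop codon TGA also has G at the second position.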
Footnotes
- Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.2246704.
- 3 Present address: Atelier de Génomique Comparative, Genoscope, Centre National de Séquençage, CP 5706, 91057 Evry Cedex, France.
- 4 ENS/CNRS FRE 2433, Organismes Photosynthétiques et Environment, Département de Biologie, Ecole Normale Supérieure, 75230 Paris Cedex 05, France.
- 5 Corresponding author. E-MAIL bernardi{at}szn.it; FAX 39 081-764-1355.
- Accepted February 26, 2004.
- Received December 11, 2003.
- Cold Spring Harbor Laboratory Press











