
A summary of locus biotypes in GENCODE. This schematic details the major classes of loci found in the GENCODE v16 human gene set, with the total number of loci in each class given in square brackets. These counts are made at the locus level as opposed to the transcript level. GENCODE contains 194,034 transcripts in total, 81,626 of which have an annotated CDS. This corresponds to an average of 4.0 CDS transcripts per protein-coding gene, while 14,786 protein-coding genes contain more than one distinct CDS (i). Long intergenic RNAs (lincRNAs), antisense RNAs, and sense intronic RNAs are treated as sub-biotypes of lncRNA (ii–iv). In GENCODE, lincRNAs are models that do not overlap a protein-coding gene or pseudogene on either strand, antisense RNAs are models found on the opposite strand to exons or introns of protein-coding genes, and sense intronic RNAs are found entirely within the intron of a protein-coding gene. In total, GENCODE contains 22,444 lncRNA transcripts, an average of 1.7 per lncRNA locus. (v) The 9,173 loci classed as small noncoding RNA loci include the classic rRNA and tRNA genes, as well as more recently identified categories of loci such as miRNAs, snoRNAs, and piRNAs. The 13,419 pseudogenes found in GENCODE can be divided into three major classes: unprocessed, processed, and unitary (vi–viii). Unprocessed pseudogenes result from the genomic duplication of protein-coding genes; pseudogenization may result either from a partial duplication or from subsequent mutation. Processed pseudogenes are formed by the retroinsertion of mRNAs into the genome sequence, and these loci are thus typically intronless. Unitary pseudogenes are protein-coding genes that have been pseudogenized in the human lineage, as judged by comparison with an intact coding ortholog in another species. In addition to the classes shown in this diagram, GENCODE also contains 26 polymorphic pseudogenes: models in the reference assembly that are known to exist as intact protein-coding loci in other human genomes.
All classes of pseudogenes may be transcribed.
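
The positional definitions of the lncRNA sub-biotypes given above can be illustrated with a minimal sketch. This is not GENCODE's annotation pipeline; the `Gene` class, the `classify_lncrna` function, the `"other"` fallback label, and the order in which the rules are tested are all assumptions made for illustration. Coordinates are taken to be inclusive and on a single chromosome.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Gene:
    """Hypothetical minimal gene record (not a GENCODE data structure)."""
    start: int
    end: int
    strand: str                      # "+" or "-"
    biotype: str                     # "protein_coding" or "pseudogene"
    introns: List[Tuple[int, int]] = field(default_factory=list)


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if two inclusive intervals share at least one base."""
    return a_start <= b_end and b_start <= a_end


def classify_lncrna(start: int, end: int, strand: str, genes: List[Gene]) -> str:
    """Assign a sub-biotype to an lncRNA model using the caption's definitions."""
    hits = [g for g in genes if overlaps(start, end, g.start, g.end)]

    # lincRNA: no overlap with a protein-coding gene or pseudogene on either strand.
    if not hits:
        return "lincRNA"

    coding = [g for g in hits if g.biotype == "protein_coding"]

    # Antisense: on the opposite strand to exons or introns of a protein-coding gene.
    # Testing this rule first is an assumption; the caption gives no precedence.
    if any(g.strand != strand for g in coding):
        return "antisense"

    # Sense intronic: lies entirely within one intron of a same-strand coding gene.
    for g in coding:
        if any(i_start <= start and end <= i_end for i_start, i_end in g.introns):
            return "sense_intronic"

    # Anything else is outside the three definitions illustrated here.
    return "other"
```

For example, with a single plus-strand protein-coding gene spanning 100 to 1000 and an intron at 300 to 600, a model at 350 to 500 would be labelled antisense on the minus strand and sense intronic on the plus strand, while a model at 2000 to 2500 would be labelled a lincRNA.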











