Construction and evaluation of a new rat reference genome assembly, GRCr8, from long reads and long-range scaffolding

Table 6.

Comparison of NCBI annotation reports for mRatBN7.2, GRCr8, and GRCm39

Feature mRatBN7.2 GRCr8 GRCm39
A. Genes, pseudogenes, mRNAs, pseudotranscripts, and CDSs annotated in the assemblies
Genes and pseudogenes 42,054 47,357 50,561
 Protein-coding 21,990 23,154 22,186
 Noncoding 11,558 14,864 17,518
 Transcribed pseudogenes 157 117 474
 Nontranscribed pseudogenes 8005 8687 9869
 Genes with variants 16,116 17,767 18,531
 Immunoglobulin/T cell receptor gene segments 300 501 490
 Other 44 34 24
mRNAs 73,402 85,576 92,486
 Fully supported 72,688 84,885 92,206
 With >5% ab initio 369 371 113
 Partial 83 79 54
 With filled gap(s) 0 0 0
 Known RefSeq (NM_) 20,574 21,191 37,907
 Model RefSeq (XM_) 52,828 64,385 54,579
Noncoding RNAs 24,927 30,129 38,629
 Fully supported 21,377 26,261 35,041
 With >5% ab initio 0 0 0
 Partial 1 0 5
 With filled gap(s) 0 0 0
 Known RefSeq (NR_) 817 821 7443
 Model RefSeq (XR_) 23,041 28,207 29,867
Pseudotranscripts 159 121 533
 Fully supported 141 108 524
 With >5% ab initio 0 0 0
 Partial 0 0 0
 With filled gap(s) 0 0 0
 Known RefSeq (NR_) 43 43 461
 Model RefSeq (XR_) 116 78 72
CDSs 73,715 86,077 92,989
 Fully supported 72,688 84,885 92,206
 With >5% ab initio 432 441 155
 Partial 83 75 53
 With major correction(s) 413 208 30
 Known RefSeq (NP_) 20,574 21,191 37,920
 Model RefSeq (XP_) 52,841 64,385 54,579
B. Feature counts in the assemblies
Genes 33,592 38,052 39,728
All transcripts 98,329 115,705 131,115
 mRNA 73,402 85,576 92,486
 misc_RNA 4637 4764 10,029
 miRNA 796 796 2112
 tRNA 737 771 422
 lncRNA 16,266 21,024 23,611
 snoRNA 1288 1564 1331
 snRNA 1026 1013 999
 antisense_RNA 1 1 9
 rRNA 139 159 41
 telomerase_RNA 1 1 64
 RNase_MRP_RNA 1 1 1
 SRP_RNA 1 1
Single-exon transcripts 3200 3392 2941
 Coding transcripts (NM_/XM_) 3179 3371 2710
 Noncoding transcripts (NR_/XR_) 21 21 231
Exons 316,617 337,524 352,138
 In coding transcripts (NM_/XM_) 272,844 284,365 285,412
 In noncoding transcripts (NR_/XR_) 72,831 78,922 120,842
Introns 264,390 276,634 291,708
 In coding transcripts (NM_/XM_) 233,700 240,878 243,819
 In noncoding transcripts (NR_/XR_) 59,216 61,168 100,211
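The counts in Table 6 lend themselves to simple arithmetic checks. The following is a minimal sketch, not part of the original annotation reports, that copies a few rows from the table, computes the relative change from mRatBN7.2 to GRCr8, and checks that the Known (NM_) and Model (XM_) mRNA counts sum to the mRNA total for each assembly. The names `TABLE6`, `percent_change`, and `known_plus_model_matches_total` are hypothetical and chosen for illustration only.

```python
# Illustrative sketch: values are copied by hand from Table 6.
# Column order in each tuple: (mRatBN7.2, GRCr8, GRCm39).
TABLE6 = {
    "Genes and pseudogenes": (42054, 47357, 50561),
    "Protein-coding": (21990, 23154, 22186),
    "mRNAs": (73402, 85576, 92486),
    "Known RefSeq (NM_)": (20574, 21191, 37907),
    "Model RefSeq (XM_)": (52828, 64385, 54579),
}


def percent_change(old, new):
    """Relative change from old to new, expressed in percent."""
    return 100.0 * (new - old) / old


def known_plus_model_matches_total(table):
    """Check that NM_ + XM_ counts equal the mRNA total in every assembly."""
    totals = table["mRNAs"]
    known = table["Known RefSeq (NM_)"]
    model = table["Model RefSeq (XM_)"]
    return all(k + m == t for k, m, t in zip(known, model, totals))


if __name__ == "__main__":
    # Report the change between the two rat assemblies for each copied row.
    for feature, (rn7, grcr8, _mouse) in TABLE6.items():
        change = percent_change(rn7, grcr8)
        print(f"{feature}: {change:+.1f}% from mRatBN7.2 to GRCr8")
    print("NM_ + XM_ equals mRNA total:", known_plus_model_matches_total(TABLE6))
```

The same pattern could be extended to other rows, for example to compare the Known and Model counts for noncoding RNAs, provided the values are transcribed from the table in the same column order.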

This Article

  1. Genome Res. 34: 2081-2093
