Markup | Genome Research

Table 2.

Comparison of CodonBERT to prior methods on seven downstream tasks

Model	Flu vaccines	mRFP expression	Fungal expression	E. coli proteins	mRNA stability	Tc-riboswitch	SARS-CoV-2 vaccine degradation
Nucleotide-based
Plain TextCNN	0.72	0.62	0.53	0.39	0.01	0.41	0.55
RNABERT+TextCNN	0.65	0.40	0.41	0.39	0.16	0.47	0.64
RNA-FM+TextCNN	0.71	0.80	0.59	0.43	0.34	0.58	0.74
Codon-based
TF-IDF	0.68	0.57	0.68	0.44	0.54	0.49	0.69
Plain TextCNN	0.71	0.78	0.76	0.36	0.26	0.43	0.80
Codon2vec+TextCNN	0.72	0.77	0.61	0.43	0.33	0.56	0.70
CodonBERT	0.81	0.85	0.88	0.55	0.51	0.56	0.77

For regression tasks, the corresponding Spearman's rank correlation values are listed. For the classification task (E. coli protein data set), classification accuracy is calculated. The best values of correlation and accuracy for each task are in bold. The corresponding loss values are listed in Supplemental Table S1.
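
As an illustration of the two metrics named in the note above, the following is a minimal sketch of how Spearman's rank correlation and classification accuracy can be computed. It is not the evaluation code used for this table. The function names (`average_ranks`, `pearson`, `spearman`, `accuracy`) and the toy inputs are assumptions made for the example. Spearman's correlation is computed here as the Pearson correlation of the ranks, with tied values given the mean of their rank positions, which is the standard definition.

```python
import math
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the window over every element tied with the one at `start`.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns NaN if either input is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    return pearson(average_ranks(x), average_ranks(y))


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy values only; these are not data from the table.
    measured = [1.0, 2.0, 3.0, 4.0, 5.0]
    predicted = [5.0, 6.0, 7.0, 8.0, 7.0]
    print(spearman(measured, predicted))
    print(accuracy([1, 0, 1, 1], [1, 1, 1, 0]))
```

Because Spearman's correlation depends only on rank order, a model can score well on a regression task without predicting the measured values on the correct scale, which is why the loss values are reported separately.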