Table 2.

Comparison of CodonBERT to prior methods on seven downstream tasks

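| Model | Flu vaccines | mRFP expression | Fungal expression | E. coli proteins | mRNA stability | Tc-riboswitch | SARS-CoV-2 vaccine degradation |
|---|---|---|---|---|---|---|---|
| Nucleotide-based | | | | | | | |
| Plain TextCNN | 0.72 | 0.62 | 0.53 | 0.39 | 0.01 | 0.41 | 0.55 |
| RNABERT+TextCNN | 0.65 | 0.40 | 0.41 | 0.39 | 0.16 | 0.47 | 0.64 |
| RNA-FM+TextCNN | 0.71 | 0.80 | 0.59 | 0.43 | 0.34 | **0.58** | 0.74 |
| Codon-based | | | | | | | |
| TF-IDF | 0.68 | 0.57 | 0.68 | 0.44 | **0.54** | 0.49 | 0.69 |
| Plain TextCNN | 0.71 | 0.78 | 0.76 | 0.36 | 0.26 | 0.43 | **0.80** |
| Codon2vec+TextCNN | 0.72 | 0.77 | 0.61 | 0.43 | 0.33 | 0.56 | 0.70 |
| CodonBERT | **0.81** | **0.85** | **0.88** | **0.55** | 0.51 | 0.56 | 0.77 |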

[i] For regression tasks, the corresponding Spearman's rank correlation values are listed. For the classification task (E. coli protein data set), classification accuracy is calculated. The best values of correlation and accuracy for each task are in bold. The corresponding loss values are listed in Supplemental Table S1.
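
As an illustration of the two metrics named in the table note, the following minimal sketch computes Spearman's rank correlation (for the regression tasks) and classification accuracy (for the E. coli protein task) in plain Python. The function names and the toy inputs are hypothetical and are not taken from the study's code; the source does not state which library was used to compute the reported values.

```python
import math
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(y_true), average_ranks(y_pred))


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true class labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy values for illustration only (not data from the study).
measured = [0.2, 1.5, 0.9, 3.1]
predicted = [0.1, 1.1, 1.3, 2.8]
rho = spearman(measured, predicted)

labels = [0, 1, 1, 0]
predicted_labels = [0, 1, 0, 0]
acc = accuracy(labels, predicted_labels)
```

Because Spearman's correlation depends only on rank order, a model whose predictions are monotonically related to the measured values scores 1.0 even when the predicted magnitudes are off, which is why it suits comparisons across tasks with different measurement scales.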