RT Journal
A1 Lu, Zhi John
A1 Yip, Kevin Y.
A1 Wang, Guilin
A1 Shou, Chong
A1 Hillier, LaDeana W.
A1 Khurana, Ekta
A1 Agarwal, Ashish
A1 Auerbach, Raymond
A1 Rozowsky, Joel
A1 Cheng, Chao
A1 Kato, Masaomi
A1 Miller, David M.
A1 Slack, Frank
A1 Snyder, Michael
A1 Waterston, Robert H.
A1 Reinke, Valerie
A1 Gerstein, Mark
T1 Prediction and characterization of non-coding RNAs in C. elegans by integrating conservation, secondary structure and high throughput sequencing and array data
JF Genome Research 
JO Genome Research 
YR 2010 
FD December 22 
DO 10.1101/gr.110189.110 
SP gr.110189.110 
UL http://genome.cshlp.org/content/early/2010/12/21/gr.110189.110.abstract 
AB We present an integrative machine learning method, incRNA, for whole-genome identification of non-coding RNAs (ncRNAs). It combines a large amount of expression data, RNA secondary-structure stability, and evolutionary conservation at the protein and nucleic-acid levels. Using the incRNA model and data from the modENCODE consortium, we are able to separate known C. elegans ncRNAs from coding sequences and other genomic elements with a high level of accuracy (97% AUC on an independent validation set), and find >7,000 novel ncRNA candidates, of which >1,000 are located in the intergenic regions of the C. elegans genome. Based on the validation set, we estimate that 91% of the >7,000 novel ncRNA candidates are true positives. We then analyze fifteen novel ncRNA candidates by RT-PCR, detecting expression for fourteen. In addition, we characterize the properties of all the novel ncRNA candidates and find that they have distinct expression patterns across developmental stages and tend to use novel RNA structural families. We also find that they are often targeted by specific transcription factors (~59% of intergenic novel ncRNA candidates). Overall, our study identifies many new potential ncRNAs in C. elegans and provides a method that can be adapted to other organisms.