RT Journal
A1 Liu, Peng
A1 Ewald, Jessica
A1 Galvez, Jose Hector
A1 Head, Jessica
A1 Crump, Doug
A1 Bourque, Guillaume
A1 Basu, Niladri
A1 Xia, Jianguo
T1 Ultrafast functional profiling of RNA-seq data for nonmodel organisms
JF Genome Research
JO Genome Research
YR 2021
FD April 01
VO 31
IS 4
SP 713
OP 720
DO 10.1101/gr.269894.120
UL http://genome.cshlp.org/content/31/4/713.abstract
AB Computational time and cost remain a major bottleneck for RNA-seq data analysis of nonmodel organisms without reference genomes. To address this challenge, we have developed Seq2Fun, a novel, all-in-one, ultrafast tool to directly perform functional quantification of RNA-seq reads without transcriptome de novo assembly. The pipeline starts with raw read quality control: sequencing error correction, removing poly(A) tails, and joining overlapped paired-end reads. It then conducts a DNA-to-protein search by translating each read into all possible amino acid fragments and subsequently identifies possible homologous sequences in a well-curated protein database. Finally, the pipeline generates several informative outputs including gene abundance tables, pathway and species hit tables, an HTML report to visualize the results, and an output of clean reads annotated with mapped genes ready for downstream analysis. Seq2Fun does not have any intermediate steps of file writing and loading, making I/O very efficient. Seq2Fun is written in C++ and can run on a personal computer with a limited number of CPUs and memory. It can process >2,000,000 reads/min and is >120 times faster than conventional workflows based on de novo assembly, while maintaining high accuracy in our various test data sets.
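
The translation step mentioned in the abstract ("translating each read into all possible amino acid fragments") can be illustrated with a generic six-frame translation. The sketch below is not Seq2Fun's implementation, which is written in C++ and is not described in detail in this record. It assumes the standard genetic code, treats codons containing ambiguous bases as `X`, and splits each frame at stop codons. The function names and the `min_len` parameter are hypothetical and chosen only for illustration.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered over T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (non-ACGT characters are kept)."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_fragments(read, min_len=1):
    """Collect amino acid fragments from all six reading frames of one read.

    Assumption for this sketch: each frame is split at stop codons ('*') and
    fragments shorter than min_len are dropped.
    """
    read = read.upper()
    fragments = []
    for strand in (read, reverse_complement(read)):
        for offset in range(3):
            protein = translate(strand[offset:])
            fragments.extend(f for f in protein.split("*") if len(f) >= min_len)
    return fragments


if __name__ == "__main__":
    print(six_frame_fragments("ATGGCCTAA"))
```

In a real DNA-to-protein search, the resulting fragments would then be matched against an indexed protein database. That step is omitted here because the record does not specify how it is done.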