TransRate: reference-free quality assessment of de novo transcriptome assemblies

Steven Kelly3,4

1 University of Cambridge;
2 Stony Brook University;
3 University of Oxford

* Corresponding author; email: steven.kelly{at}plants.ox.ac.uk

Abstract

TransRate is a tool for reference-free quality assessment of de novo transcriptome assemblies. Using only the sequenced reads and the assembly as input, we show that multiple common artifacts of de novo transcriptome assembly can be readily detected. These include chimeras, structural errors, incomplete assembly, and base errors. TransRate evaluates these errors to produce a diagnostic quality score for each contig, and these contig scores are integrated to evaluate whole assemblies. Thus, TransRate can be used for de novo assembly filtering and optimisation, as well as for comparison of assemblies generated using different methods from the same input reads. Applying the method to a dataset of 155 published de novo transcriptome assemblies, we deconstruct the contribution that assembly method, read length, read quantity, and read quality make to the accuracy of de novo transcriptome assemblies, and reveal that variance in the quality of the input data explains 43% of the variance in the quality of published de novo transcriptome assemblies. Because TransRate is reference-free, it is suitable for assessment of assemblies of all types of RNA, including assemblies of long non-coding RNA, rRNA, mRNA, and mixed RNA samples.
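The flow described above, from per-contig scores to a whole-assembly score and then to filtering, can be illustrated with a minimal sketch. It is not TransRate's implementation: the component names, the example values, the multiplication rule for a contig score, the geometric mean used for the assembly score, and the cutoff are all assumptions chosen for illustration.

```python
import math

# Hypothetical per-contig component scores in [0, 1], one per error class
# named in the abstract. Names and values are illustrative assumptions,
# not TransRate output.
contigs = {
    "contig_1": {"base": 0.98, "coverage": 0.95, "structure": 0.99, "chimera": 0.97},
    "contig_2": {"base": 0.90, "coverage": 0.40, "structure": 0.85, "chimera": 0.30},
    "contig_3": {"base": 0.99, "coverage": 0.99, "structure": 0.98, "chimera": 0.99},
}


def contig_score(components):
    """Combine component scores by multiplication (assumed rule)."""
    return math.prod(components.values())


def assembly_score(scores):
    """Integrate contig scores with a geometric mean (assumed rule)."""
    scores = list(scores)
    if not scores or any(s <= 0 for s in scores):
        return 0.0
    return math.exp(sum(math.log(s) for s in scores) / len(scores))


def filter_contigs(contigs, cutoff):
    """Keep only contigs whose score reaches the cutoff."""
    return {
        name: parts
        for name, parts in contigs.items()
        if contig_score(parts) >= cutoff
    }


if __name__ == "__main__":
    before = assembly_score(contig_score(c) for c in contigs.values())
    kept = filter_contigs(contigs, cutoff=0.5)  # cutoff is arbitrary here
    after = assembly_score(contig_score(c) for c in kept.values())
    print(sorted(kept), before, after)
```

In this toy data set, removing the one low-scoring contig raises the aggregate score, which is the sense in which contig-level scores could drive assembly filtering and comparison.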

  • Received July 2, 2015.
  • Accepted May 27, 2016.

This manuscript is Open Access.

This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International license), as described at http://creativecommons.org/licenses/by/4.0/.

