Long-read sequencing of 111 rice genomes reveals significantly larger pan-genomes

  1. Chaochun Wei3,6
  1. Chinese Academy of Agricultural Sciences;
  2. Shanghai Jiao Tong University, School of Life Sciences and Biotechnology;
  3. Shanghai Jiao Tong University;
  4. Anhui Agricultural University;
  5. Institute of Crop Sciences, Chinese Academy of Agricultural Sciences
  • * Corresponding author; email: ccwei{at}sjtu.edu.cn
  • Abstract

    A pan-genome, the collection of all genomes from a population, has shown great potential in genomic studies, especially in crop science. The rice pan-genome constructed from second-generation sequencing (SGS) data is about 270 Mb larger than Nipponbare, the rice reference genome (NipRG), but it still suffers from incompleteness and loss of genomic context. Third-generation sequencing (TGS) with long reads can help to construct better pan-genomes. In this paper, we report a high-quality rice pan-genome construction method that introduces a series of new steps for handling long-read data: unmapped sequence block filtering, redundancy removal, and sequence block elongation. The long-read pan-genome constructed from 105 rice accessions contains 604 Mb of novel sequence relative to NipRG and is much more comprehensive than the one constructed from ~3000 rice genomes sequenced with short reads. Repetitive sequences are the main component of the novel sequences, which partially explains the difference between the TGS-based and SGS-based pan-genomes. With 6 wild rice accessions added, the rice pan-genome contains about 879 Mb of novel sequence and 19,000 novel genes in total. In addition, we have created high-quality reference genomes for all representative rice populations, including 5 gapless reference genomes. This study significantly advances our understanding of the rice pan-genome, and the pan-genome construction method for long-read data can be applied to accelerate a broad range of genomics studies.
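
The abstract names three processing steps for unmapped sequence blocks (filtering, redundancy removal, elongation) without giving their details. The toy sketch below only illustrates one plausible reading of those terms and is not the authors' implementation. The `Block` type, the function names, and the `min_len` and `max_gap` thresholds are all hypothetical. Exact substring containment stands in for whatever alignment-based redundancy detection a real pipeline would use, and "elongation" is interpreted here as merging nearby blocks on the same contig.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """An unmapped interval on an assembled contig (hypothetical type)."""
    contig: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def filter_blocks(blocks, min_len):
    """Drop blocks shorter than min_len (assumed length threshold)."""
    return [b for b in blocks if b.end - b.start >= min_len]


def remove_redundancy(blocks, contigs):
    """Keep the longest blocks; drop any block whose sequence is
    contained in an already kept sequence (stand-in for alignment)."""
    kept, kept_seqs = [], []
    for b in sorted(blocks, key=lambda b: b.end - b.start, reverse=True):
        seq = contigs[b.contig][b.start:b.end]
        if any(seq in k for k in kept_seqs):
            continue
        kept.append(b)
        kept_seqs.append(seq)
    return kept


def elongate(blocks, max_gap):
    """Merge blocks on the same contig separated by at most max_gap bases,
    so the merged block keeps the sequence between them as context."""
    merged = []
    for b in sorted(blocks, key=lambda b: (b.contig, b.start)):
        last = merged[-1] if merged else None
        if last and last.contig == b.contig and b.start - last.end <= max_gap:
            merged[-1] = Block(last.contig, last.start, max(last.end, b.end))
        else:
            merged.append(b)
    return merged


def build_novel_blocks(blocks, contigs, min_len, max_gap):
    """Apply the three steps in the order the abstract lists them."""
    blocks = filter_blocks(blocks, min_len)
    blocks = remove_redundancy(blocks, contigs)
    return elongate(blocks, max_gap)


# Made-up toy data, not taken from the study.
toy_contigs = {"c1": "ACGTACGTTTGGCCAAGGTT", "c2": "TTGGCCAA"}
toy_blocks = [
    Block("c1", 0, 6),
    Block("c1", 8, 16),
    Block("c2", 0, 8),    # same sequence as c1:8-16, so redundant
    Block("c1", 18, 20),  # too short for min_len=4
]
result = build_novel_blocks(toy_blocks, toy_contigs, min_len=4, max_gap=2)
print(result)
```

In this toy run the short block is filtered out, the duplicate on `c2` is removed, and the two remaining `c1` blocks, which lie 2 bases apart, are merged into a single block spanning positions 0 to 16.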

    • Received September 3, 2021.
    • Accepted March 31, 2022.

    This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see https://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.

    ACCEPTED MANUSCRIPT

    1. Genome Res. gr.276015.121 Published by Cold Spring Harbor Laboratory Press
