Centromere reference models for human chromosomes X and Y satellite arrays

Karen H. Miga; Yulia Newton; Miten Jain; Nicolas Altemose; Huntington F. Willard; W. James Kent

doi:10.1101/gr.159624.113

¹Duke Institute for Genome Sciences & Policy, Duke University, Durham, North Carolina 27708, USA;
²Center for Biomolecular Science & Engineering, University of California Santa Cruz, Santa Cruz, California 95064, USA

Abstract

The human genome sequence remains incomplete, with multimegabase-sized gaps representing the endogenous centromeres and other heterochromatic regions. Available sequence-based studies within these sites in the genome have demonstrated a role in centromere function and chromosome pairing, necessary to ensure proper chromosome segregation during cell division. A common genomic feature of these regions is the enrichment of long arrays of near-identical tandem repeats, known as satellite DNAs, which offer a limited number of variant sites to differentiate individual repeat copies across millions of bases. This substantial sequence homogeneity challenges available assembly strategies and, as a result, centromeric regions are omitted from ongoing genomic studies. To address this problem, we utilize monomer sequence and ordering information obtained from whole-genome shotgun reads to model two haploid human satellite arrays on chromosomes X and Y, resulting in an initial characterization of 3.83 Mb of centromeric DNA within an individual genome. To further expand the utility of each centromeric reference sequence model, we evaluate sites within the arrays for short-read mappability and chromosome specificity. Because satellite DNAs evolve in a concerted manner, we use these centromeric assemblies to assess the extent of sequence variation among 366 individuals from distinct human populations. We thus identify two satellite array variants in both X and Y centromeres, as determined by array length and sequence composition. This study provides an initial sequence characterization of a regional centromere and establishes a foundation to extend genomic characterization to these sites as well as to other repeat-rich regions within complex genomes.

Footnotes

³Corresponding author

E-mail kent@soe.ucsc.edu
[Supplemental material is available for this article.]
Article published online before print. Article, supplemental material, and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.159624.113.

Received July 1, 2013.
Accepted January 22, 2014.

This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 3.0 Unported), as described at http://creativecommons.org/licenses/by-nc/3.0/.
