Centromeric reference database and sequence annotation. Linear representation of the DYZ3 array is shown to completely replace the centromere gap placeholder in the chromosome Y reference assembly. Evaluation of monomer ordering across the array predicts 40 higher-order repeat units within a generated array of 227 kb. Increased resolution in the linearized centromeric array demonstrates the monomer sequence order along the bottom in blue shading (labels m1v–m34v), which defines the particular HOR arrangement and the variant sites and base changes observed in the data set (shown in purple). Each 24-bp sliding window across this region demonstrates the representation of these sequences within the HuRef WGS database, with peaks indicating sites that are overrepresented and likely due to exact homology with satellites outside of the Y array. The top 75th percentile mappable sites are provided to extend the survey across other individuals. Six individual array profiles are provided as an example of population-based data, where DYZ3 array group 1 (three individuals from the CEU population) is shown in blue, and array group 2 (three individuals from the CHS population) is shown in red.
