Deep sequencing of 1320 genes reveals the landscape of protein-truncating variants and their contribution to psoriasis in 19,973 Chinese individuals

  1. Liangdan Sun1,3,4,5
  1. 1Department of Dermatology, the First Affiliated Hospital of Anhui Medical University, Hefei 230032, China;
  2. 2School of Medicine, South China University of Technology, Guangzhou 510006, China;
  3. 3Key Laboratory of Dermatology (Anhui Medical University), Ministry of Education, Anhui, Hefei 230032, China;
  4. 4Inflammation and Immune Mediated Diseases Laboratory of Anhui Province, Hefei 230032, China;
  5. 5Anhui Provincial Institute of Translational Medicine, Hefei 230032, China;
  6. 6Guangdong Engineering Research Center of Life Sciences Bigdata, Shenzhen 518083, China;
  7. 7Department of Biology, University of Copenhagen, Copenhagen 2200, Denmark;
  8. 8Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08540, USA;
  9. 9Microsoft Corporation, Redmond, Washington 98052, USA;
  10. 10College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China;
  11. 11School of Public Health (Shenzhen), Sun Yat-sen University, Shenzhen 510006, Guangdong, China;
  12. 12The First Affiliated Hospital of Shenzhen University, Shenzhen Second People's Hospital, Shenzhen 518035, China
  1. 13 These authors contributed equally to this work.

  • Corresponding authors: jakey{at}princeton.edu, jinxin{at}scut.edu.cn, sunld{at}ahmu.edu.cn
  • Abstract

    Protein-truncating variants (PTVs) have important impacts on phenotype diversity and disease. However, their population genetics characteristics in more globally diverse populations are not well defined. Here, we describe patterns of PTVs in 1320 genes sequenced in 10,539 healthy controls and 9434 patients with psoriasis, all of Han Chinese ancestry. We identify 8720 PTVs, of which 77% are novel, and estimate 88% of all PTVs are deleterious and subject to purifying selection. Furthermore, we show that individuals with psoriasis have a significantly higher burden of PTVs compared to controls (P = 0.02). Finally, we identified 18 PTVs in 14 genes with unusually high levels of population differentiation, consistent with the action of local adaptation. Our study provides insights into patterns and consequences of PTVs.

    Protein-truncating variants (PTVs) are expected to have strong impacts on gene function and phenotypes (Rivas et al. 2015; Exome Aggregation Consortium et al. 2016; Cassa et al. 2017; DeBoever et al. 2018; Emdin et al. 2018; Ganna et al. 2018). Indeed, PTVs are enriched for disease-related mutations (Holbrook et al. 2004; Stenson et al. 2014), are generally rare and deleterious (Exome Aggregation Consortium et al. 2016), and are abundant in the genomes of individuals (MacArthur and Tyler-Smith 2010; Rivas et al. 2015; Saleheen et al. 2017). The potential contribution of PTVs to clinically important traits has engendered significant attention in large-scale sequencing studies (Karczewski et al. 2020). For example, the Exome Aggregation Consortium (ExAC) (Exome Aggregation Consortium et al. 2016) analyzed exome sequence data from 60,706 individuals of diverse ancestry and identified 3230 genes (17.7%) intolerant to PTVs. The Pakistan Risk of Myocardial Infarction Study (Saleheen et al. 2017) performed exome sequencing on 10,503 South Asians with a high rate of consanguinity and identified 49,138 rare PTVs estimated to knock out 1317 genes. Finally, Ganna et al. (2018) explored the impacts of PTVs on 13 quantitative traits and 10 diseases using exome sequence from 100,296 individuals.

    However, the role of PTVs in complex human autoimmune diseases has not been thoroughly evaluated. Moreover, patterns and characteristics of PTVs in the Chinese population remain understudied, even though the Chinese people are the world's largest population, comprising 1.4 billion people. To address this gap, we reanalyze targeted sequence data from 1320 genes in 19,973 individuals of Han Chinese ancestry (Tang et al. 2007; Sheng et al. 2014; Zhou et al. 2016; Zhen et al. 2019).

    Results

    Characteristics of PTVs in a large Chinese population

    Sequence data were subjected to standard quality control filters as previously described (Supplemental Materials; Supplemental Table S1, S2; Supplemental Fig. S1, S2; Tennessen et al. 2012; Fu et al. 2013; Exome Aggregation Consortium et al. 2016), resulting in a data set of 19,973 individuals, including 10,539 healthy control individuals and 9434 patients with psoriasis. High-coverage sequencing with an average depth of 45.8× ± 9.6× (mean ± sd) (Supplemental Table S3) was obtained for 1320 protein-coding genes (Supplemental Table S4).

    Overall, we identified 8720 high-confidence autosomal PTVs (stop-gain, frameshift, or splice-site mutations predicted to inactivate a gene) (Supplemental Table S5), including 6389 (73.3%) singletons and 1054 (12.1%) doubletons. Only 351 (4.0%) PTVs were observed 10 or more times and 14 (0.2%) PTVs were common (allele frequency ≥ 5%). Furthermore, 2179 (25.0%) PTVs were present in ExAC or dbSNP (Sherry et al. 2001), and thus, 6541 (75.0%) PTVs were novel and not previously reported (Fig. 1A). Among the discovered PTVs, 6721 (77.1%) and 7828 (90.0%) were novel when compared to ExAC or its East Asian (EAS-ExAC) population (Fig. 1B), respectively. Among 892 PTVs shared by the Chinese and EAS-ExAC population, allele frequencies were highly correlated (R2 of 0.98), but as expected, the correlation was much lower for rare variants (R2 of 0.57) (Fig. 1C). One PTV (rs200988634 at OTOA) (Fig. 1C) had a highly discordant allele frequency compared with EAS-ExAC, which may result from technical differences in variant calling or population structure, as the EAS-ExAC sample is comprised of individuals of East Asian descent, whereas our sample is predominantly from mainland China (Supplemental Table S6).

    Figure 1.

    Protein-truncating variants of 19,973 Chinese in 1320 targeted genes. (A) Allele frequency spectrum of known and novel PTVs. (B) Venn diagram of PTVs in the Chinese (CHN, n = 19,973), ExAC (n = 60,706), and East Asian (EAS-ExAC, n = 4327) population in 1320 genes. (C) Allele frequency comparisons of 892 overlapped PTVs between CHN and EAS-ExAC. The inset shows those of PTVs with a maximum allele frequency of 1%. (DF) Individual level summary of PTVs in variant types (D), genotypic composition (E), and consequence (F). (G) Number of pPTVs per individual in heterozygous and homozygous genotypes. (H) Relative proportions of pPTVs in heterozygous and homozygous genotypes. (I) Number of PTVs and pPTVs per individual in singletons and doubletons.

    Individual burdens of PTVs

    The per-individual burden of PTVs in our samples is 8.6 ± 2.2 among these 1320 targeted genes. We next quantified the number of PTVs due to SNVs, insertions, and deletions (Supplemental Table S7). We observed more PTVs due to SNVs than deletions (P < 2.2 × 10−16, Mann–Whitney U test) or insertions (P < 2.2 × 10−16, Mann–Whitney U test) (Fig. 1D). However, insertions and deletions accounted for a higher proportion of PTVs compared to SNVs (Supplemental Table S7). We also observed more PTVs caused by frameshifts than those of stop-gained and splice variants (P < 2.2 × 10−16 and P < 2.2 × 10−16, respectively; Mann–Whitney U test) (Fig. 1F). Among these 1320 genes, Chinese individuals carried PTV burdens of 0.1 ± 0.3 pathogenic heterozygotes and 0.0001 ± 0.01 pathogenic homozygotes (Fig. 1G) out of an average of 6.8 ± 2.3 and 1.8 ± 1.1 heterozygous and homozygous PTVs (Fig. 1E), respectively. Relative proportions of pathogenic PTVs (pPTVs) in heterozygotes were significantly higher than those in homozygotes (P < 2.2 × 10−16, Mann–Whitney U test) (Fig. 1H), consistent with the effects of purifying selection on homozygous pPTVs. We also inferred purifying selection was acting on singleton pPTVs by comparing allele frequencies between pPTVs and PTVs (Fig. 1I). Individual level summaries of PTVs in allele frequency and CDS position are shown in Supplemental Figures S3 and S4.

    Functional impacts of PTVs

    We used Combined Annotation Dependent Depletion (CADD) to characterize the functional impacts of PTVs (Kircher et al. 2014). CADD is a measure of variant deleteriousness that integrates diverse genomic features into a Phred-scaled score for each variant. PTVs with higher CADD values are more likely to have deleterious effects. Whereas a negative correlation was detected between PTVs' allele frequency and CADD (Supplemental Figs. S5, S6), rarer PTVs tend to be enriched in their deleterious effects (CADD ≥ 30), regardless of their position in the coding DNA sequence (CDS) (Fig. 2A). A similar trend was observed with missense variants, though the signal was less pronounced and few synonymous variants have predicted deleterious effects. We also observed a lower proportion of singleton PTVs with CADD ≥ 30 in the head (<10%) CDS than those in the body (10%–90%) or tail (>90%), which could be explained by their higher deleteriousness (Fig. 2A).

    Figure 2.

    Deleteriousness of PTVs. (A) Fractions of variants that are potentially deleterious (CADD ≥ 30) for each functional category and location of variants. Note, head, body, and tail are defined as CDS position of <10%, 10%–90%, and >90%, respectively. (B) Fractions of deleterious variants under purifying selection (f) for protein-truncating (PTV, red) and missense (Mis, green) variants across 19,973 Chinese (CHN, dark) and 4327 East Asians (EAS-ExAC, light). (CF) Fractions of deleterious variants that are subject to purifying selection (f) for PTVs across CHN from B grouped by CDS positions (C), variant types (D), consequence (E), and gene sets (F). Note that theoretical 95% confidence intervals are obtained via bootstrapping of 1000 replications and shown as error bars.

    To quantify the burden of deleterious PTVs, the fraction of sites under selection (f) was calculated (see Methods) in the CHN and EAS-ExAC population as a function of CDS position, variant types, consequence, and gene sets as previously described (Moon and Akey 2016). We observed that 87.9% and 54.3% of PTVs and missense variants (P < 2.2 × 10−16; OR = 6.1; 95% CI: 5.7–6.6, two-sided Fisher's exact test) were deleterious and subject to purifying selection in 19,973 Chinese, respectively (Fig. 2B). Our estimate of f suggests a considerably higher burden of deleterious PTVs in 19,973 CHN compared with that in 4327 EAS (fCHN = 87.9% vs. fEAS-ExAC = 78.5%, P < 2.2 × 10−16; OR = 2.0; 95% CI: 1.7–2.3, two-sided Fisher's exact test) (Fig. 2B) in these 1320 genes, likely due to sample size differences. Moreover, we could see a lower f (but almost the same scale) in the whole genome of 2234 Japanese (f∼70%) (Okada et al. 2018), in which such decrease could be due to smaller sample sizes and larger genome regions. PTVs occurring in the head CDS were significantly more deleterious than those in the body (P < 2.2 × 10−16; OR = 39.5; 95% CI: 22.4–77.3, two-sided Fisher's exact test) and tail (P < 2.2 × 10−16; OR = 13.4; 95% CI: 7.2–27.4, two-sided Fisher's exact test) (Fig. 2C). We found PTVs due to deletions were significantly more deleterious than those of insertions (P = 2.3 × 10−9; OR = 1.83; 95% CI: 1.5–2.2, two-sided Fisher's exact test) (Fig. 2D), consistent with Sjödin et al. (2010). PTVs of stop-gain were significantly more deleterious than those of frameshift (P = 8.8 × 10−4; OR = 1.3; 95% CI: 1.1–1.6, two-sided Fisher's exact test) and splice variants (P = 9.3 × 10−7; OR = 1.6; 95% CI: 1.3–1.9, two-sided Fisher's exact test) (Fig. 2E). The fraction of PTVs that were deleterious in psoriasis patients (fCase = 85.3%) was slightly lower (but not significant, P = 0.603, two-sided Fisher's exact test) than controls (fControl = 85.7%). Estimates of f for PTVs and missense variants in the worldwide population among 1320 genes are shown in Supplemental Figure S7.

    We also studied patterns of f across gene sets (Fig. 2F). The most deleterious genes of PTVs were ClinGen haploinsufficient (f = 98.7%) (Rehm et al. 2015) and ExAC constrained genes (f = 98.5%). As expected, estimates of f in PTVs associated with autosomal-dominant diseases (f = 97.2%) were significantly higher than those from genes harboring autosomal-recessive diseases (f = 90.0%; P = 1.95 × 10−6, two-sided Fisher's exact test). Moreover, positive selection in human olfactory receptor genes has been reported (Gilad et al. 2003; Nielsen et al. 2005), and we estimate 3.9% of PTVs in the olfactory receptor genes exhibit signals of weak positive selection.

    Leveraging PTVs to study gene constraint

    We used pLI, the probability of being loss-of-function (LoF)-intolerant (Exome Aggregation Consortium et al. 2016), as a gene-level constraint metric for PTV intolerance (see Methods; Supplemental Fig. S8). Our estimates of pLI were not correlated with sequencing depth or transcript length (Supplemental Fig. S9). We observed logarithmic increases in the proportion of constrained (pLI ≥ 0.9) and unconstrained (pLI ≤ 0.1) genes with increasing sample sizes (Fig. 3A) and estimate ∼100,000 individuals would be adequate to fully study gene constraint. In our data set of 19,973 individuals, we estimate 121 (9.9%) genes are constrained and 682 (55.6%) genes are unconstrained. Using two-sided Fisher's exact tests, we observed that the 121 constrained genes were significantly enriched for a number of annotation terms (Fig. 3B; Supplemental Table S8), including ClinGen haploinsufficiency (P = 6.5 × 10−5; OR = 35.3; 95% CI: 4.2–1621.2), FMRP interactors (P = 8.8 × 10−16; OR = 15.0; 95% CI: 7.3–31.8), dominant diseases (P = 5.1 × 10−6; OR: 4.7; 95% CI: 2.4–9.2), essential in mice (P = 1.8 × 10−10; OR = 4.6; 95% CI: 2.8–7.3) and GWAS hits (P = 0.02; OR = 1.6; 95% CI: 1.1–2.5). Unlike Ji et al. (2016), we found constrained genes were significantly depleted relative to unconstrained in 123 genes related to recessive diseases (P = 0.001; OR = 0.2; 95% CI: 0.1–0.6) (Fig. 3B).

    Figure 3.

    Gene constraint of PTVs. (A) Increased proportions of constrained (pLI ≥ 0.9), unconstrained (pLI ≤ 0.1), and constraint-classified (pLI ≥ 0.9 or pLI ≤ 0.1) genes with greater sample sizes. (B) (Left) Proportions of constrained versus unconstrained genes of 19,973 Chinese across gene sets. (Right) Enrichment or depletion of constrained genes in the gene sets of interest. P-values and odds ratios with 95% confidence intervals were calculated via two-sided Fisher's exact tests. The dots and lines would appear in red if P < 0.05, otherwise in black. Detailed information is shown in Supplemental Table S8.

    Contribution of PTVs to psoriasis

    Next, we tested whether patterns of PTVs varied among healthy controls and psoriatic patients. We observed 4909 and 5012 PTVs in 9434 healthy controls and 9434 psoriasis patients, respectively (Supplemental Table S5). Using logistic regression (see Methods), we observed that the burden of PTVs was modestly enriched in psoriasis patients compared to healthy controls (8.6 vs. 8.5 in patients and controls, respectively; P = 0.02, logistic regression). Even though the sequencing depth as a covariate exhibited case-control difference (P < 2.0 × 10−16), the fact that higher sequencing depth was observed in controls compared to those in cases (Supplemental Table S3) made our observations more robust. Similarly, heterozygous PTVs (6.8 vs. 6.7 in patients and controls, respectively; P = 0.04, logistic regression) (Fig. 4A) and PTVs due to SNPs (3.6 vs. 3.5 in patients and controls, respectively; P = 2.4 × 10−4, logistic regression) (Fig. 4B) were enriched in patients compared to controls. We also observed substantially higher amounts of pPTVs (0.110 vs. 0.088 in patients and controls, respectively; P = 4.4 × 10−6, logistic regression) and heterozygous pPTVs (P = 4.8 × 10−6, logistic regression) in cases relative to controls (Fig. 4C; Supplemental Table S9). Note, other subclasses of pPTVs were also significantly different between psoriasis cases and controls (Fig. 4D; Supplemental Table S9). No differences in the proportion of rare and common PTVs were observed among patients and controls (Supplemental Fig. S10).

    Figure 4.

    PTV enrichments in psoriasis patients compared to healthy controls. (A) Per-individual rates of heterozygous and homozygous PTVs and across cases and controls. (B) Per-individual rates of PTVs due to SNVs, deletions, and insertions across cases and controls. (C) Per-individual rates of heterozygous and homozygous pPTVs and across cases and controls. (D) Per-individual rates of pPTVs of SNVs, deletions, and insertions across cases and controls. Note that P-values are calculated through logistic regressions and shown in red if there exists significant difference between psoriasis cases and controls. (*) P < 0.05, (**) P < 0.01, (***) P < 0.001. Detailed information is shown in Supplemental Table S9. (Het) Heterozygotes, (Hom) homozygotes, (Del) deletions, (Ins) insertions.

    To discover novel genetic risk factors underlying psoriasis, we calculated per-individual counts of PTVs and missense variants at the gene level. Using multiple logistic regression analyses and Bonferroni correction (see Methods), we observed a significant excess of PTVs in GJB2 in psoriasis cases (0.05/person; P = 1.9 × 10−7; OR = 1.5; 95% CI: 1.3–1.8, logistic regression) compared to controls (0.03/person) (Table 1). We also identified significantly different amounts of missense variants between psoriasis patients and controls in six genes that were previously reported to be associated with psoriasis in the Chinese population (Table 1; Supplemental Fig. S11; Tang et al. 2007; Sun et al. 2010; Qin et al. 2014; Sheng et al. 2014; Shen et al. 2015; Zhou et al. 2016).

    Table 1.

    Gene-based association results for PTVs and missense variants in 9434 cases and 9434 controls

    Signatures of selection among PTVs

    We calculated FST (Weir and Cockerham 1984; Akey 2002) to identify PTVs that exhibit high variation in allelic frequencies between Chinese (CHN, n = 19,973), non-Finnish Europeans (NFE, n = 33,370), and Africans (AFR, n = 5203). A total of 18 PTVs in 14 genes (Fig. 5A; Supplemental Table S10) were identified with significant population allele frequency differences (Z[FST] > 4.3; Bonferroni-corrected P < 0.05). Of these 14 genes, 10 were previously reported to be under positive selection. We also detected four novel genes with high FST that were not previously implicated as targets of local adaptation, harboring genes involved in sensory perception of sound (TMPRSS3), drug metabolism and immune response to pathogens (FMO2), and immune response (DMKN, FCGR2A).

    Figure 5.

    PTVs with high population differentiation in allelic frequency between Chinese (CHN), non-Finnish European (NFE), and African (AFR) populations. (A) Target-wide Z(FST) Manhattan plot. Note that Weir and Cockerham FST was normalized and transformed into Z values and the Bonferroni correction set the significant Z(FST) cutoff at 4.3. (B) (Top) Gene structure of FUT2. (Bottom) Nucleotide sequence and allele frequency distribution of a stop-gained G-A PTV (rs601338) with a Z(FST) of 18.5. (C) (Top) Gene structure of EFCAB13 with five PTVs of significant Z(FST). (Bottom) Nucleotide sequence and allele frequency distribution of a splice-acceptor G-A PTV (rs76299620) with a Z(FST) of 12.2.

    A description of all genes with highly differentiated PTVs is provided in the Supplemental Materials, but here we focus on FUT2 and EFCAB13. FUT2 (Fig. 5B), encodes the enzyme fucosyltransferase 2, which influences the expression of ABO blood group antigens. The lead SNP, rs601338 (Z = 18.5) with increased derived allele frequency in European and Africans, is a stop-gain variant where the reference allele rs601338*G encodes the secretor allele and the rs601338*A variant encodes the nonsecretor allele. The derived allele rs601338*A has been implicated in higher circulating serum vitamin B12 levels, slower progression of HIV, resistance to the Norovirus, and susceptibility to Crohn's disease, type 1 diabetes, celiac disease, and inflammatory bowel disease (Ferrer-Admetlla et al. 2009; MacArthur et al. 2017), possibly via changes in energy metabolism in the gut microbiome (Tong et al. 2014; Galinsky et al. 2016). This SNP is also a very strong expression quantitative trait locus (eQTL) for expression in the esophagus, pancreas, skin, stomach, colon, pituitary, brain, and minor salivary gland in the GTEx data (Karlsson et al. 2014; Carithers and Moore 2015). Previous studies have suggested balancing selection of this SNP (rs601338) in the African and European populations (Ferrer-Admetlla et al. 2009; Ishigaki et al. 2017) and more recent selection in UK Biobank populations (Galinsky et al. 2016). Positive selection of FUT2 was also reported in an East Asian population (Ferrer-Admetlla et al. 2009).

    EFCAB13 (EF-hand calcium binding domain 13) includes five PTVs with significantly high FST (Fig. 5C), including a lead SNP splice-acceptor variant of rs76299620 (Z = 12.2). EFCAB13 contains the EF-hand domain, which is indispensable for calcium ion binding and related cellular processes (Safran et al. 2010). Analysis of GTEx data (Carithers and Moore 2015) showed that the EFCAB13 is highly expressed in testis and thyroid but is also widely expressed across many tissue types. EFCAB13 is subject to selection in Africans (Pickrell et al. 2009) and suggestive of recent positive selection in the UK Biobank population (Palamara et al. 2018).

    Discussion

    Here, we described the largest study of PTVs in the Chinese population to date. Our analysis provides quantitative estimates on the frequency of PTVs at the individual and population levels. However, our study has several limitations. For example, our results are based on 1320 genes, which may not be representative of the exome. Indeed, the 1320 targeted genes are overrepresented with immune-disease-related loci (∼47%) (Methods; Tang et al. 2014) and make up only ∼7% of ∼19,000 human protein coding genes. The overrepresentation of immune-disease loci may account, for example, for why the fraction of deleterious PTVs in these 1320 genes is 0.5%–6% higher compared to whole-exome data in worldwide populations of ExAC (Supplemental Fig. S12).

    Our observation that PTVs are enriched in psoriasis patients is consistent with previous studies in psychiatric disorders, including autism spectrum disorder (ASD), amyotrophic lateral sclerosis, intellectual disability, schizophrenia, bipolar disorder, and attention deficit hyperactivity disorder (ADHD) (Ganna et al. 2018; Farhan et al. 2019; Satterstrom et al. 2019). However, Satterstrom et al. (2019) observed that ASD and ADHD cases harbored a significant excess of novel (not found in ExAC), constrained (occurring in genes with a pLI of at least 0.9), and rare (10 copies or fewer in the population) PTVs, whereas we found rates of novel PTVs showed significant differences between psoriasis cases and controls (Supplemental Fig. S13).

    We identified 18 PTVs in 14 genes as candidates for positive selection. These results are consistent with the “less is more” hypothesis of adaptive gene loss (Olson 1999), which posits that loss of gene function may play a role in adaptive evolutionary change. Note, however, that large allele frequency differences may also arise from neutral processes such as genetic drift. For example, although rs28369860, a frameshift, has a much higher frequency in the African population compared to the European and East Asian populations (Supplemental Table S10), previous studies showed no evidence of selection in FMO2 (Veeramah et al. 2008). Thus, additional analyses are necessary to fully interpret the evidence for adaptive PTVs in these 14 genes.

    In summary, our study complements and extends previous large-scale studies of PTVs (MacArthur and Tyler-Smith 2010; Rivas et al. 2015; Exome Aggregation Consortium et al. 2016; Cassa et al. 2017; Saleheen et al. 2017; DeBoever et al. 2018; Emdin et al. 2018; Ganna et al. 2018; Karczewski et al. 2020; Van Hout et al. 2020) and illustrates the benefits of sampling diverse populations. Our findings help clarify per-individual effects and burdens of PTVs, their contribution to psoriatic disease, and their evolutionary significance.

    Methods

    Study sample and targeted resequencing

    The data set we used in this paper was initiated as independent second-stage validation of 1326 genes after first-stage GWAS discoveries of psoriasis-related loci (Tang et al. 2014). A total of 21,309 samples were collected through collaborations with multiple Chinese hospitals (Sheng et al. 2014; Tang et al. 2014; Zhou et al. 2016; Zhen et al. 2019). All individuals are of Han Chinese ancestry. Involved healthy control subjects were not found with psoriasis, autoimmune diseases, systemic disorders, or family history of psoriasis (including relatives within third-degree). Written informed consent was provided by every participant. The study was reviewed and approved by the Biomedical Ethics Board of Anhui Medical University, Hefei, China.

    Targeted 1326 genes included 622 (47.0%) known immune-disease–related genes, which included 57 genes in 44 psoriasis susceptibility loci identified by GWAS. Other selected genes were 742 top genes in the GWAS of whole-exome sequencing of 781 psoriasis cases and 676 controls.

    Deep sequencing was performed on 21,309 samples with 3.2 Mb coding sequence of 1326 genes. The details of library construction, targeted region capture, and sequencing were described in previous publications (Sheng et al. 2014; Tang et al. 2014; Zhou et al. 2016). Data preprocessing, including alignments, variant calling, and quality controls, was included in the Supplemental Materials. Note that the GRCh37 build assembly was utilized as reference sequence in our analysis in order to facilitate comparisons to previous PTV studies. The differences between GRCh38 and GRCh37 are most significant for highly repetitive genomic regions (Guo et al. 2017). Thus, because our analyses focused on single-copy exonic sequences, we do not expect our set of PTVs and results to substantially differ between GRCh37 and GRCh38. Genomic coordinates between GRCh38 and GRCh37 can be converted using the liftOver tool provided by the UCSC Genome Browser (Kent et al. 2002).

    Definitions of pathogenic PTVs

    PTVs are deemed as pathogenic if they meet either of the following two criteria: (1) Annotated as pathogenic or likely pathogenic with star 1+ by ClinVar (i.e., one submitter with assertion criteria, multiple submitters with assertion criteria, expert panel, or practice guideline) (Landrum et al. 2018). Note, that the term “likely pathogenic” is used to mean greater than 90% certainty of a variant being disease-causing (Richards et al. 2015). (2) Annotated as disease-causing mutations (“DM”) by HGMD (Stenson et al. 2017).

    Fraction of deleterious or advantageous variants

    The fraction of sites under selection pressure (f) was applied to quantify the burden of deleterious PTVs (Moon and Akey 2016). Computation of f involves comparing the site frequency spectrum (SFS) between a class of test sites and a class of putatively neutral variants (synonymous variants). Our tests sites include missense variants, PTVs, and their breakdowns according to CDS position (head: <10%, body: 10%–90%, and tail: >90%), variant types (SNV, insertion, and deletion), consequence (frameshift, splice, and stop-gain), and gene sets defined in the Supplemental Materials. Note that SNVs can cause PTVs through stop-gain or splicing variants, whereas PTVs caused by indels could be due to stop-gain, frameshift, or splicing variants. Positive and negative values of f represent estimates for the fraction of test sites that are under purifying and positive selection, respectively. Estimates of f are robust to various evolutionary and demographic confounding forces (Moon and Akey 2016).

    We defined the unfolded SFS as a vector η = (η1, η2, …, ηn−1), where ηi denotes number of sites that comprised i derived alleles and n − i ancestral alleles for a given context. Variants with minimum non-missingness of 99% were selected to compute the unfolded SFS, which resulted in ∼86.0% of targeted sites to be included. We classified ancestral alleles regarding ancestral genome sequence for Homo sapiens (GRCh37), which was constructed by The 1000 Genomes Project Consortium et al. (2015) and available via the website ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_ancestor_GRCh37_e59.tar.bz2. We only took into consideration biallelic sites with ancestral alleles of high-confidence calls. To account for missing data, we projected SFS down to 99% (19,773/19,973) of samples based on a hypergeometric distribution. We further generated the SFS for Chinese psoriasis cases, Chinese healthy controls, and other worldwide populations in ExAC using the same strategy.

    Gene constraint analysis with pLI

    We used pLI (Exome Aggregation Consortium et al. 2016) as the probability of being loss-of-function to infer gene constraint and intolerance to PTVs in our Chinese samples. Briefly, we applied ExAC's selection-neutral, sequence context-based mutational model to compare the observed number of rare (minor allele frequency, MAF < 0.1%) variants per gene to an expected number of those after depth correction (Supplemental Fig. S8A–C), then assessed its deviation with a corrected Z score for each mutational class (synonymous, missense, and protein-truncating) (Supplemental Fig. S8D) in 1226 cleaned transcripts. Finally, an expectation-maximization algorithm with inputs of the observed and expected PTV counts within each gene was used to classify genes into three categories: null, where pLI ≤ 0.1 (PTVs are tolerated); recessive, where pLI > 0.1 and pLI < 0.9 (heterozygous PTVs are tolerated); and haploinsufficient, where pLI ≥ 0.9 (heterozygous PTVs are not tolerated). For further details, see Samocha et al. (2014) and Exome Aggregation Consortium et al. (2016). Two-sided Fisher's exact tests were performed to test for enrichments or depletions of constrained genes (pLI ≥ 0.9) in the specific gene set than in the rest of the target genes.

    Association study of psoriasis using PTVs

    For the individual level of PTV burdens between cases and controls, P-values were calculated via the logistic regression, in which covariates of gender, age, average sequencing depth, and the top 10 principle components (PCs) from LD-pruned sequence data were included to control for possible confounding factors. For calculating gene-level P-values for classes of variants (protein-truncating, synonymous, and missense variants), covariates in the logistic regression model included gender, age, the first 10 PCs, mean depth at sites passing QC, percent of QCed targeted sites covered at a depth of at least 20, and number of corresponding synonymous and missense variants. Gene-based P-values were then subject to Bonferroni correction. The quantile-quantile (QQ) plot of gene-based associations is shown in Supplemental Figure S11.

    Population differentiation analysis using FST

    FST was calculated for each PTV to quantify allele frequency difference between our Chinese, non-Finnish Europeans, and African samples (Exome Aggregation Consortium et al. 2016). We calculated FST as previously described (Weir and Cockerham 1984; Akey 2002). To determine which lineage(s) the allele frequency changed in, we also calculated three pairwise sets of FST values, which included FST (CHN-NFE), FST (CHN-AFR) and FST (NFE-AFR). We excluded PTVs in which the minor allele has frequency estimates of 0 in at most two populations or for which no data are available for the NFE or AFR populations, which left 1824 PTVs for FST computation. A Bonferroni correction gave 18 PTVs with Z > 4.3 (comparable to P < 0.05/1824) for target-wide significance after normalization of FST.

    Data access

    All raw sequencing data and locus-based summary of filtered variant calls generated in this study have been submitted to the CNGB Sequence Archive (CNSA, https://db.cngb.org/cnsa/) (Guo et al. 2020) of China National GeneBank DataBase (CNGBdb) under accession numbers CNP0000412 and CNP0001423, respectively. The release of summary data was approved by the Ministry of Science and Technology of China (Project ID: 2021BAT0726).

    Competing interest statement

    J.M.A. is a paid consultant of Glenview Capital. Other authors declare no competing interests.

    Acknowledgments

    This work was supported in part by the National Science Fund for Excellent Young Scholars (No. 81222022), National Natural Science Foundation of China (No. 81773313, No. 81573035, No. 81972927, No. 61433009, No. 32000398), Natural Science Foundation of Guangdong Province, China (No. 2017A030306026, No. 2015A030308017), Anhui Institute of Translational Medicine (No. ZHYX2020A005), Guangdong-Hong Kong Joint Laboratory on Immunological and Genetic Kidney Diseases (No. 2019B121205005), and National Institutes of Health grant R01GM110068 (J.M.A.).

    Author contributions: L.S., J.M.A., X.J., H.X., and Q.Z. conceived and designed the study; L.S., X.J., and J.M.A. supervised analyses; H.X., Yong Z., L.L., T.Z., Yuanwei Z., and H.Y. conducted data preprocessing and analyses; W.F. and J.M.A. oversaw data quality control; S.M. provided the algorithm and directions to compute and analyze f; L.F., Yong Z., and X.J. provided computation resources; S.L. and R.C. provided feedback to improve manuscript writing; Q.Z., B.L., H.G., W.C., Q.X., Y.Y., L.Y., S.C., H.Z., L.C., R.Z., and X.H. participated in the retrieval of original data and the revision of the manuscript and provided different degrees of contribution to the integration of the manuscript data; M.B., Yuwen Z., Q.Z., X.J., and L.S. performed overall study coordination; H.X., J.M.A., and S.M. designed the figures; H.X. and J.M.A. primarily wrote the manuscript with contributions from all authors.

    Footnotes

    • Received June 27, 2020.
    • Accepted May 10, 2021.

    This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see https://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.

    References

    | Table of Contents

    Preprint Server