Negligible Male Gene Flow Across Ethnic Boundaries in India, Revealed by Analysis of Y-Chromosomal DNA Polymorphisms

  1. Nitai Pada Bhattacharyya (1),
  2. Priyadarshi Basu (1),
  3. Madhusudan Das (2),
  4. Srimanta Pramanik (1),
  5. Rajat Banerjee (2),
  6. Bidyut Roy (3),
  7. Susanta Roychoudhury (2), and
  8. Partha P. Majumder (3,4)

  (1) Crystallography and Molecular Biology Division, Saha Institute of Nuclear Physics, Calcutta, India; (2) Department of Human Genetics, Indian Institute of Chemical Biology, Calcutta, India; (3) Anthropology and Human Genetics Unit, Indian Statistical Institute, Calcutta 700 035, India

Abstract

From the historically prevalent social structure of Indian populations it may be predicted that there has been very little male gene flow across ethnic boundaries. To test this prediction, we have analyzed DNA samples of individuals belonging to 10 ethnic groups, speaking Indo-European or Austroasiatic languages and inhabiting the eastern and northern regions of India. Eight Y-chromosomal markers, two biallelic and six microsatellite, were studied. All populations were monomorphic for the deletion allele at the YAP (DYS287) locus and for the 119-bp allele at the DYS288 locus. Y-chromosomal haplotypes were constructed on the basis of one RFLP locus and five microsatellite loci. The haplotype distribution among the groups showed that different ethnic groups harbor nearly disjoint sets of haplotypes. This indicates that there has been virtually no male gene flow among ethnic groups. Analysis of molecular variance revealed that there was significant haplotypic variation between castes and tribes, but nonsignificant variation among ranked caste clusters. Haplotypic variation attributable to differences in geographical regions of habitat was also nonsignificant.

Although several early studies (Jakubiczka et al. 1989; Malaspina et al. 1990; Spurdle and Jenkins 1992; Dorit et al. 1995; Hammer 1995; Whitfield et al. 1995) pointed to a low level of variation in the Y chromosome, it has now been established beyond doubt that there are many Y-chromosomal markers that are highly polymorphic in all global populations (Deka et al. 1996; Ruiz-Linares et al. 1996; Santos and Pena 1996; Hammer et al. 1997; Karafet et al. 1997; Rodriguez-Delfin et al. 1997; Zerjal et al. 1997). Because the Y chromosome, except for its telomeric regions, is transmitted uniparentally (paternally) as a linkage group, it has turned out to be extremely useful in population genetic studies for establishing paternal lineages (Deka et al. 1996). Studies on Y-chromosomal variation permit the interesting possibility of contrasting male-specific histories of populations to female-specific ones, which are revealed by mitochondrial DNA (mtDNA) studies.

Population differentiation with respect to the Y chromosome has been studied in many regions of the world, and India represents one of the most ethnically and genetically diverse regions (Majumder 1998). Socially, the vast majority (∼80%) of the Indian population belong to the Hindu religious fold and are organized into ∼2000 caste groups, each of which belongs to a socially ranked (broadly, upper, middle, and lower) caste cluster. The social rank is dependent on occupation, certain beliefs of purity and pollution, and continued settlement in a particular geographical location (Thapar 1992). The tribal populations of India are organized into clan groups; there are ∼400 tribes in India. Additionally, there are several religious communities, such as Sikhs, Muslims, Christians, Jews, etc. Marriages between different religious groups are extremely infrequent. The caste structure is also fairly rigid, and each caste remains as an endogamous unit, although the levels of endogamy can vary substantially (Malhotra and Vasulu 1993). The extent of admixture among caste groups of the same social rank is higher than among those belonging to different social ranks. Boundaries of middle caste groups have been the most fluid; these groups have admixed with both upper and lower caste groups. Despite the admixture between caste groups, the genetic implication of the approved social rule of hypergamy, by which a man can marry a woman belonging to a caste of lower social rank and continue to retain his caste affiliation (the woman is absorbed in her husband’s caste subsequent to marriage), is that crossings of Y chromosomes across ranked caste-cluster boundaries have been negligible in historical times. We note that the converse union of a woman marrying a man of a lower social status and retaining her caste affiliation (hypogamy) has been discouraged, historically. In this extremely infrequent type of marriage, the woman moves to the husband’s caste, which is of a lower social rank.
Although the rules governing marriage, that is, admixture of genes among castes, are clearly delineated, the practices leading to intermixture of genes between castes and tribes or other religious groups have not been consistent. Often such marriages result in social ostracization and excommunication, forcing the spouses to move to other geographical areas; thereafter, they become absorbed by a local group, generally of a low social rank. In view of the interesting social norms governing marriage, it is of considerable interest to examine the degree of differentiation of population groups (including castes, tribes, and other religious groups) of India with respect to male lineages. While the present study was in progress, a similar study of 12 caste groups inhabiting a restricted geographical area (Andhra Pradesh) within India was completed (Bamshad et al. 1998). That study indicated a relative lack of male gene flow among castes compared to female gene flow. In this paper we report results of a study conducted among 10 population groups (8 caste and 2 tribal) inhabiting a much wider geographical area (three different states: West Bengal and Orissa in the eastern region and Uttar Pradesh in the northern region of the Republic of India) and covering two linguistic families (Indo-European and Austroasiatic), with respect to eight Y-chromosomal markers (two biallelic and six short-tandem repeat markers). Ethnic descriptions, sampling locations, and sample sizes of the 10 study populations are given in Table 1. Our study, in addition to permitting estimation of male gene flow across caste boundaries, also permitted estimation of such gene flow across caste and tribal boundaries and has a wider geographical coverage than the study of Bamshad et al. (1998).

Table 1.

Names of Study Populations, Locations of Sampling, Ethnological and Linguistic Descriptions, and Sample Sizes

RESULTS

All populations were monomorphic for the Y Alu polymorphic [YAP−(deletion)] allele at the DYS287 locus and for the 119-bp allele at the DYS288 locus. [Correspondences between repeat numbers and allele sizes (bp) at all STR loci were obtained from Kayser et al. (1997).] The remaining six loci were polymorphic in all populations.

Haplotypes constructed on the basis of data of the six polymorphic loci and their frequencies observed in the study populations are presented in Table 2. The 125 sampled individuals harbored 81 distinct haplotypes, indicating extensive Y-chromosomal diversity. It was also observed that the haplotypes were generally population-specific; that is, the sets of haplotypes observed in the different study populations were largely disjoint. Only 12 (15%) of the 81 distinct haplotypes were shared between populations. Among the northern Indian populations of Uttar Pradesh (Brahmin, Chamar, and Rajput), the total number of distinct haplotypes was 35, of which only 3 (8.6%) were shared among the populations (1 haplotype was shared between the upper caste Brahmin and middle caste Rajput; 2 were shared between Rajput and lower caste Chamar). Among the eastern Indian populations of West Bengal and Orissa (Brahmin, Agharia, Bagdi, Mahishya, Tanti, Lodha, and Santal), the total number of distinct haplotypes observed was 60, of which only 7 (11.7%) were shared among the populations. The upper caste Brahmin shared one haplotype with middle caste Agharia and another with Santal tribals; the Agharia also shared one other haplotype with lower caste Mahishya and two other haplotypes with Lodha tribals; the Mahishyas shared a haplotype with the Santal tribals; and one haplotype was shared by three groups—the lower caste Mahishya and Tanti and the two tribal groups of Lodha and Santal. Therefore, there was minimal sharing of haplotypes among the ethnic groups studied.
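The sharing counts reported above are, in effect, set intersections over the haplotypes observed in each population. The following is a minimal sketch of that bookkeeping, not the authors' procedure; the function name, the population labels, and the toy haplotypes are hypothetical and are not taken from Table 2.

```python
from itertools import combinations

def shared_haplotypes(pop_haplotypes):
    """Summarize haplotype sharing among populations.

    pop_haplotypes maps a population name to the set of haplotypes seen in it.
    Returns (all distinct haplotypes, haplotypes seen in two or more
    populations, overlap for every pair of populations).
    """
    distinct = set().union(*pop_haplotypes.values())
    shared = {h for h in distinct
              if sum(h in haps for haps in pop_haplotypes.values()) >= 2}
    pairwise = {(a, b): pop_haplotypes[a] & pop_haplotypes[b]
                for a, b in combinations(sorted(pop_haplotypes), 2)}
    return distinct, shared, pairwise

# Hypothetical toy data. Each haplotype is a tuple:
# (HindIII state, DYS19, DYS389I, DYS390, DYS391, DYS393)
toy = {
    "PopA": {("+", 15, 10, 24, 10, 13), ("-", 14, 9, 23, 11, 12)},
    "PopB": {("-", 14, 9, 23, 11, 12), ("+", 16, 10, 25, 10, 13)},
    "PopC": {("+", 15, 11, 22, 10, 14)},
}
distinct, shared, pairwise = shared_haplotypes(toy)
print(len(distinct), len(shared))  # 4 distinct haplotypes, 1 of them shared
```

Dividing the number of shared haplotypes by the number of distinct haplotypes gives proportions of the kind quoted in the text (for example, 12 of 81).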

Table 2.

Y-Chromosomal Haplotypes and Their Frequencies in 10 Ethnic Populations of India

However, because the sample sizes of the individual ethnic populations were small and because our hypothesis largely pertained to ranked caste cluster boundaries, we decided to carry out further analyses by pooling data of ethnic groups belonging to separate ranked caste clusters (upper, middle, and lower) and also of the two tribal groups. The sample sizes of these pooled categories were upper caste = 27, middle caste = 29, lower caste = 37, and tribal cluster = 32. The numbers of distinct haplotypes observed among upper, middle, and lower castes were, respectively, 18, 27, and 30. This number among the tribes was 20. The upper castes shared two haplotypes with the middle castes and three with the lower castes, but none with the tribals. The middle castes additionally shared five haplotypes with lower castes and three with tribes. The lower castes also shared three haplotypes with tribes. We noted that of these shared haplotypes, one haplotype was shared by all three caste clusters and another was shared by the middle and lower caste groups with the tribals. The haplotypes shared by these four clusters of populations are presented in Table 3. Therefore, even when individual ethnic populations are grouped in ranked clusters, there is very little haplotype sharing among clusters, indicating that there has been minimal gene flow even across these ethnic clusters. It is, however, noteworthy that the upper castes, while sharing haplotypes with middle and lower castes, do not share any haplotype with the tribes.

Table 3.

Y-Chromosomal Haplotypes Shared by Various Ethnic Clusters of India

We sought to examine whether the observed extent of haplotype sharing among these ranked clusters of populations was significantly greater than expected by chance. Because the frequencies of distinct haplotypes vary among clusters, it is not sufficient to consider only the number of distinct haplotypes shared; one must count the numbers of individuals in each pair of clusters that have identical haplotypes. For the six pairs of clusters, the numbers of individuals with identical haplotypes were upper caste and middle caste = 5, upper caste and lower caste = 3, upper caste and tribal cluster = 0, middle caste and lower caste = 8, middle caste and tribal cluster = 11, and lower caste and tribal cluster = 8. Based on 500 simulation runs (details provided in Methods), the upper 95% cutoff values of the distribution of the numbers of individuals who possess identical haplotypes by chance were, for the six pairs of clusters listed above, 14, 17, 16, 19, 16, and 19, respectively. Because all of the observed values are smaller than these cutoff values, the observed sharing of haplotypes between clusters of populations is not statistically significant.

The allele frequencies and haplotype diversities are presented in Figures 1 and 2, respectively. Considerable variation in allele frequencies has been observed at many loci among these clusters of populations. For example, at DYS391, not all alleles are observed in all the clusters. Although all clusters harbor very high levels of haplotype diversity (range, 0.9658–0.9975), the highest level of haplotype diversity (0.9975 ± 0.0099) is observed among middle caste populations.

Figure 1.

Allele frequency distributions at six polymorphic Y-chromosomal loci in four ethnic clusters of India.

Figure 2.

Y-chromosomal haplotype diversities in four ethnic clusters of India.

We have performed analyses of molecular variance (Excoffier et al. 1992) to quantitatively establish that there is much greater haplotypic variation within clusters of populations than between them. Because the analysis of molecular variance (AMOVA) takes into account not only the relative frequencies of haplotypes but also the number of mutational steps separating pairs of haplotypes, we excluded the αhHindIII RFLP locus and analyzed haplotypes defined by the five polymorphic microsatellite loci only. The individual ethnic populations were suitably grouped to enable examination of the effects of language, geography, and social rank on Y-chromosomal haplotype differentiation.

It may be noted from Table 1 that there is a complete confounding of linguistic and caste/tribal affiliations of populations. All caste populations speak languages that belong to the Indo-European family, whereas both tribal populations are Austroasiatic speakers. AMOVA results showed that although 95.75% of the Y-chromosomal microsatellite haplotypic variation was within populations belonging to the two language families (or within populations belonging to caste or tribal clusters), the extent of variation between these groups of populations was 4.25% [F(ST) = 0.0425]. This F(ST) value was statistically significant, indicating Y-chromosomal structuring due to differences in language or, equivalently for the present data set, due to caste–tribal differences. When we subdivided our data further by ranked caste categories (i.e., when the caste populations were grouped into the three separate ranked clusters), the additional variance thus explained was not statistically significant.
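To make the link between the variance percentages and the F(ST) value concrete, the following is a simplified one-level AMOVA sketch in the spirit of Excoffier et al. (1992). It is not the Arlequin implementation used in the study. The distance between two microsatellite haplotypes (sum of squared repeat-number differences) is an assumption, the function names are hypothetical, and the nested population level of the full analysis is omitted.

```python
def sq_dist(h1, h2):
    """Assumed distance: sum of squared repeat-number differences over loci."""
    return sum((a - b) ** 2 for a, b in zip(h1, h2))

def ssd(haps):
    """Sum of squared deviations: (1/n) * sum of d2(i, j) over pairs i < j."""
    n = len(haps)
    total = sum(sq_dist(haps[i], haps[j])
                for i in range(n) for j in range(i + 1, n))
    return total / n

def phi_st(groups):
    """One-level AMOVA. groups is a list of lists of haplotypes (tuples).

    Returns the among-group variance component as a fraction of the total,
    which is the quantity reported as F(ST) in the text.
    """
    pooled = [h for g in groups for h in g]
    n_total, n_groups = len(pooled), len(groups)
    ssd_within = sum(ssd(g) for g in groups)
    ssd_among = ssd(pooled) - ssd_within
    msd_among = ssd_among / (n_groups - 1)
    msd_within = ssd_within / (n_total - n_groups)
    # Average group size adjusted for unequal sample sizes
    n0 = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (n_groups - 1)
    var_among = (msd_among - msd_within) / n0
    var_within = msd_within
    return var_among / (var_among + var_within)
```

Under this decomposition, an among-group share of 4.25% of the total variance corresponds to a value of 0.0425. Significance would be assessed by recomputing the statistic after randomly permuting individuals among groups.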

To examine whether Y-chromosomal variation was significantly structured because of differences in geographical locations of habitat of the populations, we grouped the populations as “northern Indian inhabitants” and “eastern Indian inhabitants”. Only 8 (13.3%) of the 60 distinct haplotypes (defined by five polymorphic microsatellite loci) were shared between populations inhabiting these two geographical regions. (It may be noted that of the 81 observed haplotypes defined on the basis of six polymorphic loci, several haplotypes defined by five microsatellite loci occurred on both αhHindIII+ and αhHindIII− backgrounds; hence, the number of distinct haplotypes dropped from 81 to 60.) AMOVA results indicated that haplotypic variation attributable to geographical differences was not statistically significant.

Because microsatellite loci are known to have higher mutation rates than biallelic RFLP loci, we sought to examine whether insights into the evolutionary histories of these populations can be obtained by examining variations at the microsatellite loci separately for chromosomes that possess the αhHindIII restriction site and for chromosomes that do not possess this site. In the pooled set of chromosomes from all populations, there were 59 (47.2%) chromosomes that possessed the HindIII site and 66 (52.8%) that did not. This difference was not statistically significant at the 5% level. Although these two groups of chromosomes shared only 5 (8.33%) haplotypes out of 60 distinct haplotypes (defined by the 5 polymorphic microsatellite loci), the ranges and frequency distributions of repeat numbers at the microsatellite loci in these two groups of chromosomes were strikingly similar (Table 4). The variances among individuals of repeat numbers at the microsatellite loci in these two groups of chromosomes (αhHindIII+, αhHindIII−) were DYS19 = (0.74, 0.72), DYS389I = (0.46, 0.36), DYS390 = (1.45, 1.48), DYS391 = (0.69, 0.26), and DYS393 = (0.90, 0.72). Therefore, except for the DYS391 locus, at which the αhHindIII+ chromosomes showed greater variability, the extent of variability at all the other microsatellite loci was similar in both groups of chromosomes. Furthermore, the numbers of distinct microsatellite haplotypes in αhHindIII+ and αhHindIII− chromosomes were 39 and 42, respectively. The haplotype diversities in these two groups were, respectively, 0.97 ± 0.01 and 0.98 ± 0.01. Therefore, it appears that the antiquities of both of these groups of chromosomes are roughly equal. All populations harbor microsatellite haplotypes on both αhHindIII+ and αhHindIII− backgrounds. Differentiation of the populations, therefore, seems to have taken place after this locus became polymorphic.
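The text does not say which test was used to compare the 59 and 66 chromosome counts. As an illustration only, the sketch below assumes a chi-square goodness-of-fit test against an equal (50:50) split; the function name is hypothetical.

```python
def chi_square_equal_split(count_a, count_b):
    """Chi-square statistic (1 d.f.) for two counts against a 50:50 expectation."""
    expected = (count_a + count_b) / 2
    return ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected

# Standard chi-square critical value for 1 degree of freedom at alpha = 0.05
CRITICAL_5PCT_1DF = 3.841

stat = chi_square_equal_split(59, 66)
print(round(stat, 3), stat < CRITICAL_5PCT_1DF)  # 0.392 True
```

Under this assumed test the statistic is well below the 5% critical value, which agrees with the statement that the difference is not significant.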

Table 4.

Allele Frequencies at Five Y-Chromosomal Microsatellite Loci for Chromosomes on αhHindIII + and − Backgrounds

We have also examined the relationships among the haplotypes defined by the five microsatellite loci. The haplotype tree (not shown) comprised four major clusters of haplotypes; one haplotype (Table 2, haplotype 43) formed a single-point cluster. [This haplotype, observed among the tribal Santals, possessed a 15-repeat allele at the DYS391 locus, which was not observed in any other population.] Contrary to our expectations, however, the clusters of haplotypes did not correspond to the ethnic clusters.

DISCUSSION

The prevailing social customs of hypergamy and hypogamy in India are expected to restrict male gene flow among its ethnic groups. We have tested this prediction using several biallelic and microsatellite Y-chromosomal DNA markers.

The YAP element, which is found in varying frequencies in most global populations, is absent in all the populations included in the present study. The YAP+ frequency is very high among most African groups and low among European populations (see Table 7 of Passarino et al. 1998). The absence of this element among Indian populations confirms that Indians show relatively more genetic similarities with the Caucasoids than with the Negroids (Majumder 1998). It is, however, noteworthy that our earlier studies have revealed that the Austroasiatic tribal populations of India have some of the human-specific Alu elements in the nuclear genome at frequencies that are similar to those found in many African populations (Majumder et al. 1999). The ranges of repeat numbers and allele frequencies at the polymorphic microsatellite loci in the study populations are consistent with global estimates (Kayser et al. 1997). At most of these loci, the most frequent allele is not the same across populations. This may indicate different origins of the study populations or may be due to effects of genetic drift. The locus DYS288 was found to be monomorphic. Comparable data at this locus are not available from many other populations (Kayser et al. 1997).

High levels of haplotype diversity were noted in all clusters of populations. Consistent with our earlier findings, based on serum protein and enzyme polymorphisms, the ethnic populations of India harbor higher levels of genetic diversity than most comparable global regions (Majumder 1998). The highest level of haplotype diversity was found among the middle castes. This finding is not unexpected in view of the fact that the social boundaries of the middle caste groups have, historically, been the most fluid.

Nearly disjoint sets of haplotypes were found among the study populations. Because the effective population size with respect to the Y chromosome is only one-quarter of the autosomal effective population size, this phenomenon may be due to drift effects but is consistent with the prevailing norms governing marriage that severely restrict male gene flow across ethnic boundaries. In the caste hierarchy, the middle caste groups are expected to be the most fluid genetically. The data on haplotype sharing presented in this paper are largely consistent with this expectation. However, although within a restricted geographical region (eastern or northern India in the present study) the caste groups belonging to the upper social rank do not share haplotypes with groups belonging to the lower social rank, there is such sharing of haplotypes across geographical regions. This may indicate that when there are unions, within or outside of marriage, between an upper caste man and a lower caste woman inhabiting a geographical area, there is a tendency for them to move away to distant geographical areas and then affiliate with a lower, not upper, caste in the new location. The fact that tribal clusters share haplotypes with middle and lower castes, but not with upper castes, is also interesting. There are documented instances of tribal groups that after relinquishing the hunter–gatherer life style and adopting agriculture, were converted to castes—mostly lower castes (Bose 1953; Mandelbaum 1970). Sharing of haplotypes among clusters of populations was, however, not significantly higher than chance expectation.

The extent of molecular variance attributable to differences among socially ranked clusters was found to be statistically nonsignificant. This is striking and indicates that Y-chromosomal variation is not structured by social rank, consistent with the anthropological finding that there has been social mobility and variations in ranks of ethnic groups in India (Thapar 1992). However, our analysis has revealed that the extent of molecular variation at the Y-chromosomal microsatellite loci between castes, who are all Indo-European speakers in our sample, and tribes, who are all Austroasiatic speakers in our sample, is significant. No such significant variation was observed between geographical regions.

The comparison of chromosomes on αhHindIII+ and αhHindIII− backgrounds has also revealed an interesting aspect of the population differentiation in India. If one makes the reasonable assumption that the loss of a restriction site is more probable than its gain, it is clear that the αhHindIII site loss was a very ancient mutation and that the two alleles at this locus had reached nearly equal frequencies before the differentiation of the study populations into separate ethnic groups. The finding that the most frequent alleles at all microsatellite loci, except at DYS393, are the same on chromosomes with αhHindIII+ and αhHindIII− backgrounds further corroborates this view. However, the fact that the clustering of haplotypes did not correspond to the ethnic clusters is contrary to the simple expectation arising from the finding of highly disjoint sets of haplotypes among the populations. We are unable to provide a clear explanation of this lack of correspondence. We hypothesize that there was large Y-chromosomal haplotype diversity even before the people of India became organized into distinct social groups and that each social group was formed by a restricted number of male lineages. Collection of data on more biallelic loci and further study of variation at microsatellite loci on the multibiallelic locus backgrounds in these populations will contribute to a deeper understanding of the population history of India.

METHODS

Populations

One hundred twenty-five males belonging to 10 ethnic populations of India were studied. Their anthropological details and sample sizes are presented in Table 1. The sampled individuals were unrelated to one another at least to the first-cousin level.

DNA Isolation and Genotyping

From each selected individual, 5–10 ml of blood was drawn with consent. DNA was isolated following the protocol of Miller et al. (1988). PCR primers and conditions used for screening DYS19, DYS287, DYS288, DYS389I, DYS390, DYS391, and DYS393 loci were essentially the same as those given in Jobling and Tyler-Smith (1995). However, for the DYS19 and DYS391 loci, PCR-amplified products were run on 6% sequencing gels, transferred to Hybond N+ membranes, probed with one of the end-labeled primers, blotted, and autoradiographed. Band sizes were determined by comparing against locus-specific allelic ladders. For the DYS288, DYS389I, DYS390, and DYS393 loci, product sizes were determined from electrophoretographs, using GeneScan Analysis version 2.02 in an ABI-377 automated DNA sequencer. The αhHindIII site was screened, using the primers and protocols given in Santos et al. (1995).

Statistical Analysis

For estimating haplotype diversities and performing AMOVA, Arlequin version 1.1 (Schneider et al. 1997) was used. Significance of variance components was tested using the nonparametric permutation procedure described in Excoffier et al. (1992).
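Haplotype diversity values such as those shown in Figure 2 are commonly computed with Nei's unbiased estimator, h = n/(n − 1) × (1 − Σ p_i²), where p_i is the frequency of haplotype i among n chromosomes. The sketch below assumes that estimator; the exact formula applied by Arlequin 1.1 is not stated in the text, and the function name is hypothetical.

```python
from collections import Counter

def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype (gene) diversity for a list of haplotypes.

    Each element is one sampled chromosome; haplotypes must be hashable,
    for example tuples of repeat numbers.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two chromosomes")
    counts = Counter(haplotypes)
    sum_sq_freq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq_freq)
```

With this estimator a sample in which every chromosome carries a different haplotype has diversity 1, and a sample with a single haplotype has diversity 0, so values near 0.97 to 0.99 indicate that few chromosomes share a haplotype.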

For testing the significance of the observed numbers of shared haplotypes among individuals between pairs of the four ranked ethnic clusters, we carried out a permutation test as follows. The total number of sampled individuals in our study was 125; the numbers of individuals belonging to the four ranked clusters (upper caste, middle caste, lower caste, and tribal) were, respectively, 27, 29, 37, and 32. We first randomly permuted the haplotype data of the 125 individuals and then partitioned the data into four subsets of sizes 27, 29, 37, and 32, respectively. To obtain the number of individuals who shared haplotypes between any two subsets I and J (I < J; I, J = 1, 2, 3, 4), we compared individuals i and j (i ∈ I, j ∈ J; i < j) and checked whether they had the same haplotype. For all pairs of the four subsets, numbers of shared haplotypes were counted. This procedure was repeated 500 times. The upper 95% cutoff point of the frequency distribution (based on the 500 replications) of the number of haplotypes shared between subsets I and J was then calculated. If the actual number of shared haplotypes between the corresponding pair of clusters was less than this cutoff point, then the observed number of shared haplotypes was declared to be statistically nonsignificant at the 5% level.
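The permutation procedure described above can be sketched as follows. This is an illustrative reading, not the authors' code. In particular, the sharing statistic is assumed to be the number of individuals, in either subset, whose haplotype also occurs in the other subset; the text leaves the exact counting rule open. The function names, the seed, and the way the empirical 95% point is picked are likewise assumptions.

```python
import random
from itertools import combinations

def individuals_sharing(group_a, group_b):
    """Assumed statistic: individuals in either group whose haplotype
    is also present in the other group."""
    common = set(group_a) & set(group_b)
    return (sum(h in common for h in group_a)
            + sum(h in common for h in group_b))

def permutation_cutoffs(haplotypes, sizes, runs=500, quantile=0.95, seed=0):
    """Upper cutoff of the chance distribution of sharing for each pair of subsets.

    haplotypes: one entry per sampled individual.
    sizes: subset sizes, which must sum to len(haplotypes).
    """
    if sum(sizes) != len(haplotypes):
        raise ValueError("subset sizes must sum to the number of individuals")
    rng = random.Random(seed)
    pairs = list(combinations(range(len(sizes)), 2))
    null = {p: [] for p in pairs}
    pool = list(haplotypes)
    for _ in range(runs):
        rng.shuffle(pool)                      # random permutation of individuals
        subsets, start = [], 0
        for size in sizes:                     # partition into the ranked clusters
            subsets.append(pool[start:start + size])
            start += size
        for a, b in pairs:
            null[(a, b)].append(individuals_sharing(subsets[a], subsets[b]))
    cutoffs = {}
    for p, values in null.items():
        values.sort()
        cutoffs[p] = values[min(len(values) - 1, int(quantile * len(values)))]
    return cutoffs
```

With the 125 observed haplotypes and the cluster sample sizes reported in Results (27, 29, 37, and 32), a call such as `permutation_cutoffs(observed, (27, 29, 37, 32))` would return one cutoff per pair of clusters, against which the observed counts are compared.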

For constructing a tree of observed haplotypes defined by the five polymorphic microsatellite loci, we computed pairwise distances between haplotypes, using the squared Euclidean distance measure, and then applied the UPGMA and neighbor-joining clustering algorithms.
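As a rough illustration of the tree-building step, the sketch below computes squared Euclidean distances between haplotypes coded as tuples of repeat numbers and clusters them by UPGMA (average linkage). It is a naive reference implementation with hypothetical function names, not the software used in the study, and neighbor joining is not shown.

```python
def sq_euclidean(h1, h2):
    """Squared Euclidean distance between two haplotypes (tuples of repeat numbers)."""
    return sum((a - b) ** 2 for a, b in zip(h1, h2))

def upgma(haplotypes):
    """Cluster haplotypes by UPGMA and return the tree as nested tuples of indices."""
    # Each cluster is stored as (subtree, list of member indices).
    clusters = {i: (i, [i]) for i in range(len(haplotypes))}

    def avg_dist(members_a, members_b):
        # UPGMA distance: mean of all pairwise distances between the two clusters
        total = sum(sq_euclidean(haplotypes[i], haplotypes[j])
                    for i in members_a for j in members_b)
        return total / (len(members_a) * len(members_b))

    next_id = len(haplotypes)
    while len(clusters) > 1:
        ids = sorted(clusters)
        # Pick the closest pair of clusters (the first pair wins on ties)
        a, b = min(((x, y) for k, x in enumerate(ids) for y in ids[k + 1:]),
                   key=lambda p: avg_dist(clusters[p[0]][1], clusters[p[1]][1]))
        tree_a, members_a = clusters.pop(a)
        tree_b, members_b = clusters.pop(b)
        clusters[next_id] = ((tree_a, tree_b), members_a + members_b)
        next_id += 1
    return next(iter(clusters.values()))[0]

# Hypothetical two-locus haplotypes, for illustration only
toy = [(14, 10), (14, 11), (18, 10), (18, 11)]
print(upgma(toy))  # ((0, 1), (2, 3))
```

The returned tree groups haplotypes that differ by few repeat units first, which is the sense in which clusters of haplotypes are compared with ethnic clusters in Results.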

Acknowledgments

This work was supported by a grant from the Department of Biotechnology, Government of India. We are grateful to Badal Dey, Monami Roy, Madan Chakraborty, R.S. Balgir, and B.P. Dash for their participation in fieldwork for collection of samples. We are also grateful to Chris Tyler-Smith and Andres Ruiz-Linares for information and advice, and to Lynn Jorde and W. Scott Watkins for contributing some labeled primers for initiation of this work. Suggestions provided by an anonymous reviewer were extremely helpful.

The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 USC section 1734 solely to indicate this fact.

Footnotes

  • 4 Corresponding author.

  • E-MAIL ppm{at}isical.ac.in; FAX 91-33-577 6680.

    • Received February 16, 1999.
    • Accepted June 8, 1999.

REFERENCES
