Rice Transposable Elements: A Survey of 73,000 Sequence-Tagged-Connectors

Long Mao; Todd C. Wood; Yeisoo Yu; Muhammad A. Budiman; Jeff Tomkins; Sung-sick Woo; Maciek Sasinowski; Gernot Presting; David Frisch; Steve Goff; Ralph A. Dean; Rod A. Wing

doi:10.1101/gr.10.7.982

¹Clemson University Genomics Institute, Clemson, South Carolina 29634 USA; ²Novartis Agricultural Discovery Institute, San Diego, California 92121 USA

Abstract

As part of an international effort to sequence the rice genome, the Clemson University Genomics Institute is developing a sequence-tagged-connector (STC) framework. This framework includes the generation of deep-coverage BAC libraries from O. sativa ssp. japonica cv. Nipponbare and the sequencing of both ends of the genomic DNA insert of the BAC clones. Here, we report a survey of the transposable elements (TEs) in >73,000 STCs. A total of 6848 STCs were found homologous to regions of known TE sequences (E<10⁻⁵) by FASTX search of STCs against a set of 1358 TE protein sequences obtained from GenBank. Of these TE-containing STCs (TE–STCs), 88% (6027) are related to retroelements and the remaining are transposase homologs. Nearly all DNA transposons known previously in plants were present in the STCs, including maize Ac/Ds, En/Spm, Mutator, and mariner-like elements. In addition, 2746 STCs were found to contain regions homologous to known miniature inverted-repeat transposable elements (MITEs). The distribution of these MITEs in regions near genes was confirmed by EST comparisons to MITE-containing STCs, and our results showed that the association of MITEs with known EST transcripts varies by MITE type. Unlike the biased distribution of retroelements in maize, we found no evidence for the presence of gene islands when we correlated TE–STCs with a physical map of the CUGI BAC library. These analyses of TEs in nearly 50 Mb of rice genomic DNA provide an interesting and informative preview of the rice genome.

Transposable elements (TEs) are ubiquitous in all organisms (Burge and Howe 1989; Xiong and Eickbush 1990). In plants, TEs are classified into two main classes (Flavell et al. 1994). Retrotransposons comprise Class I and transpose via an RNA intermediate. Class I TEs include retrotransposons with long terminal repeats (LTRs) such as Ty1/Copia-like and Ty3/Gypsy-like, as well as non-LTR retrotransposons. Class II TEs transpose via a DNA intermediate and in plants have been found mainly in maize. Class II TEs include Ac/Ds, En/Spm, and Mutator (Federoff 1989). Miniature inverted-repeat transposable elements (MITEs), such as maize Tourist and Stowaway, fall into a newly described third class of TEs (Bureau and Wessler 1992, 1994a,b, 1996). The mechanism of transposition of MITEs is still unclear, although they have received considerable attention recently due to their high copy numbers and tendency to be associated with genes in maize (Wessler et al. 1995; Zhang et al. 2000).

Rice (Oryza sativa) is the main staple food for more than half of the world's population and is of great economic importance. Among the cereal grasses, rice has the smallest genome size (430 Mb) and, as revealed by comparative mapping, has substantial conservation of synteny with other cereal crops such as maize, sorghum, and wheat (Gale and Devos 1998). Consequently, rice is an ideal representative for cereal genomics studies and is the focus of an international effort to completely sequence its genome. Although numerous TEs have been reported in rice, no comprehensive investigation has been carried out on a genome-wide scale, because the majority of rice TEs were uncovered by chance or by limited assays using conserved regions such as reverse transcriptase of retrotransposons (Hirochika et al. 1992; Motohashi et al. 1996; Kumekawa et al. 1999). As part of the International Rice Genome Sequencing Project (IRGSP), a rice BAC library was constructed from a partial HindIII digest of the genome of the rice variety Nipponbare (Budiman 1999), and the ends of BAC clone inserts have been sequenced. BAC end sequences will serve as sequence-tagged-connectors (STCs) for selecting minimum overlapping clones for genome sequencing (Venter et al. 1996).

The generation of >73,000 Nipponbare STCs also provides an opportunity to preview TE content and distribution in the rice genome. The current STC library contains ∼48 Mb of rice genomic DNA after vector removal, with an average sequence read of 707 nucleotides. With an average insert of 128.5 kb, the CUGI rice BAC library is expected to cover ∼10 rice genome equivalents. Preliminary efforts to confirm the coverage of the library based strictly on sequence comparison of the STCs to finished rice BACs have shown that the estimated coverage is ∼10.4 genome equivalents (data not shown). Assuming that the HindIII sites are evenly distributed, our 73,000 STCs should be distributed one STC every 9 kb across the 430-Mb rice genome.

TEs are one of the major sources of repetitive sequences in cereal plants and have been a concern of the IRGSP as a potential source of problems in completing the rice genome sequence. Here, we report the TE content of the STC database and show that the rice genome probably contains a small fraction of TEs in comparison with other cereal genomes, such as maize. The small amount of TEs confirms rice as a well-chosen model crop genome. We note the discovery of several potentially novel TEs, and we investigate the location of TE–STCs on the current physical map of the CUGI rice BAC library. We find that the TEs appear to be randomly distributed with respect to potential genes, identified by similarity to rice ESTs.

RESULTS

TE Content of STC Library

To analyze the number and types of TE-like elements present in the STC database, we used FASTX (Pearson et al. 1997) to compare 73,362 BAC end sequences (STCs) with a set of 1358 TE protein sequences. At an expectation cut-off value of 10⁻⁵ or less, 6848 STCs were found to contain regions of homology to known transposable elements. The vast majority of STCs (6027) are homologous to retrotransposons, whereas the remaining 821 are homologous to various transposases of class II transposons (Table 1). STCs homologous to retrotransposons were further classified as Gypsy-like (4124), Copia-like (1401), and non-LTR (502) on the basis of classification of the most similar protein sequences. To assess the accuracy of our retrotransposon classification, we used TFASTX to search the STC database with protein sequences of representative Gypsy (rice RIRE2), Copia (maize Hopscotch), and non-LTR (rice CAA73800) retrotransposons as query sequences. For all three searches, we found a total of 1959 STCs with significant similarity (E<10⁻⁵). Divided by retrotransposon classification, the proportions of STCs identified in each class for both the FASTX and TFASTX searches were nearly identical (Fig. 1).
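The best-hit classification described above can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function name `classify_stcs`, the `TE_CLASS` lookup table, the STC names, and the assumption that the search output has already been parsed into (STC, protein accession, E-value) records are all hypothetical. Only the accessions and the 10⁻⁵ cut-off come from the text.

```python
from collections import Counter

E_CUTOFF = 1e-5  # expectation cut-off stated in the text

# Hypothetical lookup from TE protein accession to TE class.
TE_CLASS = {
    "BAA84458": "Gypsy-like",   # rice RIRE2
    "T02087": "Copia-like",     # maize Hopscotch
    "CAA73800": "non-LTR",      # rice non-LTR LINE
    "P08770": "transposase",    # maize Activator
}

def classify_stcs(hits, cutoff=E_CUTOFF):
    """Assign each STC the class of its most similar TE protein.

    hits: iterable of (stc_id, protein_accession, e_value) tuples.
    Returns a dict mapping stc_id -> TE class for every STC that has
    at least one hit with an E-value at or below the cutoff.
    """
    best = {}
    for stc_id, accession, e_value in hits:
        if e_value > cutoff:
            continue  # not significant, ignore this hit
        if stc_id not in best or e_value < best[stc_id][1]:
            best[stc_id] = (accession, e_value)
    return {
        stc: TE_CLASS.get(acc, "unclassified")
        for stc, (acc, _) in best.items()
    }

# Made-up hits for illustration only.
example_hits = [
    ("stc_001", "BAA84458", 1e-40),
    ("stc_001", "T02087", 1e-8),    # weaker hit to the same STC
    ("stc_002", "P08770", 3e-12),
    ("stc_003", "CAA73800", 0.5),   # above the cutoff, dropped
]
classes = classify_stcs(example_hits)
counts = Counter(classes.values())
print(counts)
```

Keeping only the lowest E-value per STC means that an STC matching both a Gypsy-like and a Copia-like protein is counted once, under the class of its closest homolog.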

Table 1.

Transposable Element Content of the Rice STC Database

Figure 1.

Classification of retrotransposons identified by FASTX and TFASTX searches. Fractions shown are percentages of total retrotransposon-containing STCs. FASTX searches were conducted using the rice STC database as queries to search the 1358-member TE database. Classification as Gypsy, Copia, or non-LTR was made on the basis of the most similar transposable element protein sequence. TFASTX searches were conducted using Gypsy-like rice RIRE2 (BAA84458, 1397 homologous STCs), Copia-like Hopscotch from maize (T02087, 528 homologous STCs), and a rice non-LTR LINE (CAA73800, 119 homologous STCs) as queries to search the STC database.

As a control, we performed an identical survey on 16,360 Arabidopsis STCs sequenced by Genoscope (http://www.genoscope.cns.fr/externe/arabidopsis/data/bac_ends) and compared the results from both species with the publicly available chromosomal sequences. In our FASTX survey of the Arabidopsis STCs, we found 1197 and 143 STCs homologous to retroelements and transposases, respectively. Although the actual numbers differ, the proportions of TEs in the rice and Arabidopsis STC databases are nearly the same, with 8.2% of the Arabidopsis STCs and 9.3% of the rice STCs showing homology to a TE. Within each species, retroelements account for 89.3% of Arabidopsis TE–STCs and 88.0% of rice TE–STCs (Fig. 2). The TE content of the chromosomal sequences from each plant shows slightly different proportions. The annotation of Arabidopsis chromosome 2 identified 563 TEs with 404 (71.7%) retroelements (Lin et al. 1999). Similarly, a survey of a 1-Mb PAC contig from rice chromosome 1 sequenced by the Rice Genome Research Program (http://www.dna.affrc.go.jp:82/genomicdata/GenomeFinished.html) revealed 68 unique regions homologous to TEs in TFASTX searches with the proteins of our 1358-member TE database. Of these 68 unique TE-like regions, 66.1% are homologous to retroelements (Fig. 2).
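The STC proportions quoted above follow directly from the reported counts. The short sketch below recomputes them; the dictionary layout and the helper name `percent` are illustrative choices, while the counts themselves are taken from the text.

```python
def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole

# Counts reported in the text for the two STC databases.
surveys = {
    "rice": {"stcs": 73362, "retro": 6027, "transposase": 821},
    "Arabidopsis": {"stcs": 16360, "retro": 1197, "transposase": 143},
}

summary = {}
for species, s in surveys.items():
    te_total = s["retro"] + s["transposase"]
    summary[species] = {
        "te_stcs": te_total,
        "pct_stcs_with_te": percent(te_total, s["stcs"]),
        "pct_retro_of_te": percent(s["retro"], te_total),
    }

for species, row in summary.items():
    print(
        f"{species}: {row['te_stcs']} TE-STCs, "
        f"{row['pct_stcs_with_te']:.1f}% of STCs, "
        f"{row['pct_retro_of_te']:.1f}% retroelements"
    )
```

Rounded to one decimal place, this gives 9.3% and 88.0% for rice and 8.2% and 89.3% for Arabidopsis, matching the figures in the paragraph above.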

Figure 2.

Proportions of retroelements found in redundant STCs, nonredundant STCs, and genomic sequences from Arabidopsis and rice. Transposable element homologies were identified as described in text. Classification of Arabidopsis chromosome 2 transposable elements was obtained from the chromosomal annotation (Lin et al. 1999). Total observed homologs are as follows: Nonredundant STCs: 350 rice transposases, 2754 rice retroelements; 101 Arabidopsis transposases, 628 Arabidopsis retroelements. Redundant STCs: 821 rice transposases, 6027 rice retroelements; 143 Arabidopsis transposases, 1197 Arabidopsis retroelements. Genomic DNA: 23 rice transposases, 45 rice retroelements; 159 Arabidopsis transposases, 404 Arabidopsis retroelements.

On the basis of these results, it is clear that the proportions of retroelements present in both the Arabidopsis and rice STC databases are slightly higher than preliminary estimates of the actual genomic content. The over-representation of retroelements is not likely to be the result of errors in the FASTX analysis, as the TEs of the 1-Mb rice PAC contig were analyzed in a similar way (TFASTX) and also showed a lower proportion of retroelements than identified in the rice STCs. Further, if we eliminate STC redundancy by examining only STCs that are <95% identical to each other, we find 729 TE–STCs in Arabidopsis (628 of which are retroelements) and 3104 TE–STCs in rice (2754 of which are retroelements). In both the redundant and nonredundant STC analyses, the ratio of retroelements to transposases is ∼9 to 1 (Fig. 2). Thus, the over-representation of retroelements appears to be inherent to both STC databases and may be due to cloning-site bias.

Novel TE Subfamilies in Rice STCs

Despite the over-representation of retroelements in the rice STCs, the current theoretical density of 1 STC every 9 kb across the rice genome affords us many possibilities to observe STCs homologous to TEs unknown previously or rarely discovered in rice. We found STCs homologous to maize Activator, En/Spm, and Mutator transposons as well as Mariner transposons and pararetrovirus coat proteins. Phylogenetic analyses of these sequences revealed two separate subfamilies of Activator, several subfamilies of Mariner paralogs in various plants, and a potentially novel endogenous pararetrovirus in rice.

Activator

We found 75 STCs with homology to maize Ac ORF1, but no STCs homologous to Ac ORF2. A Fitch-Margoliash (1967) protein phylogeny of Activator ORF1 sequences, including two rice Activator homologs identified in the STC database, showed two separate paralogs of Activator present in rice (Fig. 3A). Rice STC OSJNBa0076F14f is probably a rice ortholog of Activator, because the branching pattern of maize, pearl millet, and rice is the same as would be expected from a species phylogeny (Macrae et al. 1990, 1994; Paterson et al. 1996). Clearly, the rice STC OSJNBa0005B04f is a paralog of Activator and may have diverged from the line leading to Activator and snapdragon Tam3 early in plant evolution.

Figure 3.

Phylogenies of TE homologs in the rice STC database. All phylogenies were constructed using the Fitch-Margoliash (1967) method. (A) Phylogeny of Activator-like protein sequences, derived from a partial-length multiple sequence alignment of 197 amino acids. Sequences from top to bottom are maize Activator (P08770), pearl millet Activator homolog (1091678), rice STC OSJNBa0076F14f, snapdragon Tam3 (S13518), Arabidopsis putative transposase (AAD24567), rice STC (OSJNBa0005B04f), and human putative transposon (NP_004720). Translations of rice STCs were obtained from TFASTX alignments of maize Activator (P08770) with the STC database. (B) Phylogeny of Mariner-like protein sequences, derived from a partial-length multiple sequence alignment of 107 amino acids. Sequences from top to bottom are Arabidopsis thaliana genome survey sequence 1851xb.lb4 (AF005799), Medicago truncatula genome survey sequence 14-E-22–029 from the Crop Biotechnology Center, Texas A & M University (AQ841462), rice STCs OSJNBa0034B17f and OSJNBa0063J06f, soybean Mariner element soymar1 (AAC28384), and flatworm Girardia tigrina mariner-3 (CAA56859). Translations of rice STCs and other genome survey sequences were obtained from TFASTX alignments of soymar1 with the rice STC database and GenBank. (C) Phylogeny of pararetrovirus coat protein sequences, derived from a partial-length multiple sequence alignment of 220 amino acids. Sequences are from rice tungro bacilliform virus (RTBV, AAD30194), banana streak virus (BSV, CAA05264), cacao swollen shoot virus (CSSV, AAA03171), Commelina yellow mottle virus (CYMV, S11479), sugarcane bacilliform virus (SBV, S27938), and rice STC OSJNBa0074G14r. Translation of the rice STC was obtained from a TFASTX alignment with CYMV protein 3 (S11479).

En-Spm/Tam1

We found 324 STCs homologous to the TNP2 protein from the Antirrhinum TAM1 transposon (CAA40555), making it the most abundant class II transposon in the STC database. Over-representation could occur, as TNP2 is 752 amino acids long, and multiple STCs from the same genomic element may align to different regions of the TNP2 query. Nevertheless, the large quantity of TNP2 homologs implies that the rice genome contains a substantial number of En-Spm/Tam1-like transposons, even though no activity of En/Spm elements has been detected in rice so far.

Mutator

A total of 122 STCs were found to be homologous to the maize mudrA gene product, suggesting that the rice genome may contain Mutator-like elements; however, the most similar STC (OSJNBa0036C06f) is only 55.8% identical in a 238-amino acid alignment. The previously known rice mudrA homolog Os-MuDR (AB012392, Yoshida et al. 1998) is also not present in our STC database (the closest match is only 47.5% identical over a 120-amino acid alignment). Together, these results imply the presence of a number of mudrA paralogs in the rice genome.

Mariner

Five STCs were identified as homologous to the soybean mariner-like transposon soymar1 (AAC28384). A Fitch-Margoliash protein phylogeny of translations of these STC sequences together with other plant mariner homologs identified from GenBank reveals that the rice STCs are probably not orthologous to soymar1 (Fig. 3B). From the phylogeny, it appears that soymar1 and the other plant mariner-like elements diverged early in plant evolution. A minimum of two mariner paralogs appear in the rice STCs alone, and, if they are orthologous to each other, the Arabidopsis and Medicago genome survey sequences shown in the phylogeny comprise a fourth plant paralog of Mariner. During the preparation of this work, several mariner-like sequences have been identified and annotated in rice genomic sequences (AF172282, AP000837, AP000836); although to our knowledge, this is only the second published report of a monocot mariner homolog (Tarchini et al. 2000).

Pararetrovirus coat proteins

Although technically not TEs, fragments of a unique pararetrovirus sequence found in the tobacco genome (TPVL) are interspersed at an estimated frequency of 10³ per diploid genome (Jakowisch et al. 1999). Jakowisch et al. suggest that a special mechanism of pararetrovirus dispersion and integration is sustaining such an unusually high copy number in the tobacco genome. To assess whether similar pararetrovirus-like sequences exist in the rice genome, we compared 36 pararetrovirus protein sequences with the rice STC database using TFASTX. The results showed that only three STCs are homologous to a pararetrovirus coat protein sequence found in Commelina yellow mottle virus, rice tungro bacilliform virus, and banana streak virus. Further, a multiple sequence alignment (data not shown) revealed that these three were most likely from the same element that integrated at least three times in the genome. The very low frequency of these homologs suggests that high-copy pararetrovirus-like sequences, such as TPVL, are not present in the rice genome; however, a Fitch-Margoliash protein phylogeny of these coat proteins (Fig. 3C) shows that the rice STC sequence is most similar to the coat protein sequence from rice tungro bacilliform virus but is not identical. This divergence may have resulted from a very ancient integration of the protein sequence of the tungro bacilliform virus, or from the existence of an unknown rice pararetrovirus that is distantly related to the tungro bacilliform virus.

Miniature Inverted-repeat Transposable Elements

The first reported MITEs were the maize Tourist and Stowaway families (Bureau and Wessler 1992, 1994a,b), which were subsequently reported in rice (Bureau et al. 1996; Song et al. 1998). To identify MITEs in the rice STC database, a FASTA search (Pearson and Lipman 1988) was performed against the STC database by use of 23 known MITEs as query sequences (Bureau et al. 1996; Song et al. 1998). Because DNA–DNA sequence comparisons detect distant homology relationships poorly (States et al. 1991; Pearson 1997), the sequence of the lowest-scoring significant STC with a full-length alignment to a known MITE was also used as a query in a second FASTA search of the rice STC database. Even so, the total number of MITEs was almost certainly underestimated and should be considered as a minimum only.

A total of 2746 STCs were found to contain various MITEs as shown in Table 2. Several rice MITEs were represented abundantly, with nine MITEs showing homology to >100 STCs. The most abundant MITE in the rice STC database is Truncator, with 491 unique homologous STCs, followed by Tourist with 391 homologs, and Wanderer with 353 homologs. The two least frequent MITEs in the STC database are Krispie (no STC homolog) and Pop (11 STCs). Interestingly, apart from maize Tourist and Stowaway, no non-rice MITEs were present in our STC database. Searches with bell pepper Alien (X87869), Medicago Bigfoot (AJ237732), maize Heartbreaker (transcribed from Zhang et al. 2000), and sorghum S-1, S-2, and S-3 (annotated in AF010283) showed no homologous STCs. Furthermore, MITEs that were first discovered in African Oryza species (Crackle, Krispie, Pop, and Snap from O. longistaminata and p-SINE1 from O. glaberrima) appear to occur with less frequency than other rice MITEs. Whereas known Oryza sativa MITEs occur with an average number of 222.6, non-sativa MITEs occur with an average number of only 15. The lack of most of the non-rice MITEs and the biased representation of non-sativa MITEs in the STC database strongly support a species-specific distribution for MITEs.

Table 2.

MITEs Identified in the STC Database by FASTA Searches

Bureau and Wessler (1994a) have noted that the MITE Tourist appears to be associated with genes in maize, rice, and sorghum; however, their sample size was very low. Recent work on the maize Heartbreaker element confirms that these MITEs also appear to be associated with genes (Zhang et al. 2000). To ascertain whether this positional bias of MITEs extends to all MITEs in the rice genome, we used BLASTN (Altschul et al. 1997) to compare the rice STC library with the TIGR Rice Gene Index (OGI; Quackenbush et al. 2000). Our results show that 48.3% of MITE-containing STCs (MITE–STCs) are also homologous to a sequence in OGI (BLASTN E<10⁻⁷), whereas only 11.5% of MITE-lacking STCs show homology to an OGI sequence. This bias is more remarkable when one considers the average length of the STCs; when an STC shows homology to both an OGI and a MITE sequence, the MITE must be within only a few hundred nucleotides of the transcription region.
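One way to put a number on the bias described above is to express the two reported percentages as a fold enrichment and an odds ratio. The sketch below does so; the choice of these two summary statistics and the function names are illustrative assumptions, not an analysis reported in the paper.

```python
def fold_enrichment(p_with, p_without):
    """Ratio of the two match rates."""
    return p_with / p_without

def odds_ratio(p_with, p_without):
    """Odds of an OGI match with a MITE divided by the odds without one."""
    return (p_with / (1.0 - p_with)) / (p_without / (1.0 - p_without))

# Fractions reported in the text.
p_mite = 0.483      # MITE-containing STCs that match an OGI sequence
p_no_mite = 0.115   # MITE-lacking STCs that match an OGI sequence

fold = fold_enrichment(p_mite, p_no_mite)
oratio = odds_ratio(p_mite, p_no_mite)
print(f"fold enrichment: {fold:.1f}, odds ratio: {oratio:.1f}")
```

With the reported values, an STC carrying a MITE is roughly four times as likely to match an OGI sequence as an STC without one.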

Broken down by MITE, we find a surprising variation of gene positioning among the different MITE families (Fig. 4). Only 10.5% of 181 Explorer-containing STCs are also homologous to an OGI sequence, but nearly every Stowaway-containing STC (95.8% of 166) is also homologous to an OGI sequence. It is impossible to say whether our results indicate that certain MITEs do not insert near genes in the rice genome or that some MITEs insert further than a few hundred nucleotides from the transcription region. In either case, our results clearly demonstrated that the association of MITEs with genes is not uniform among different MITEs.

Figure 4.

MITEs are differentially associated with ESTs. Percentage of MITE-containing rice STCs that also show homology to a sequence in the Rice Gene Index (BLASTN E-value<10⁻⁶), displayed by MITE type. Only MITEs with >100 STC homologs are shown.

Rice TEs Are Not Clustered

TEs in plants with small genomes such as Arabidopsis (∼130 Mb) were shown to be clustered only at the pericentromeric regions (Lin et al. 1999; Mayer et al. 1999). Similarly, a Ty3/Gypsy-related DNA fragment from sorghum has been shown to be present in centromeres of sorghum, wheat, maize, and rye (Miller et al. 1998), and several centromeric repeats from the rice cultivar Indica are also retroelement-related (Dong et al. 1998). On the other hand, in grasses with large genomes such as maize (∼2500 Mb), retrotransposons can be clustered along the chromosomes, inserting between the genes (SanMiguel et al. 1996, 1998). Recent work has shown that the large size of the maize genome is largely due to retroelements that have inserted in the last 6 million years (SanMiguel et al. 1998). To analyze possible positional bias of TEs in the rice genome, we mapped our TE–STCs onto the physical map contigs assembled at CUGI. Presently, the CUGI physical map consists of 73,728 clones in 1018 contigs (G. Presting and R. Wing, unpubl.). To estimate gene location, we have mapped EST-containing STCs to this map as well.

We identified EST matches using BLASTN to search the rice gene index (OGI) as described above. STC matches from both the OGI and TE database searches were associated with their physical contigs, and the TE and EST contents of each contig were examined. If TEs were positioned in the rice genome away from genes, we would expect to see a negative correlation between TE and EST content of the physical map contigs, but our results show no correlation whatsoever (Fig. 5). This implies that the TEs and genes of the rice genome appear to be randomly distributed.
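The per-contig comparison described above can be sketched as a count of TE-matching and EST-matching STCs on each contig followed by a correlation coefficient. This is a minimal illustration under stated assumptions: the STC-to-contig mapping, the STC and contig names, and the use of raw counts with Pearson's r are all hypothetical, since the text does not specify how contig content was measured or which statistic was used.

```python
from collections import Counter
from math import sqrt

def count_per_contig(stc_to_contig, stc_ids):
    """Count how many of the given STCs fall on each contig."""
    return Counter(stc_to_contig[s] for s in stc_ids if s in stc_to_contig)

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)

# Made-up toy data: six STCs placed on three contigs.
stc_to_contig = {
    "s1": "ctg1", "s2": "ctg1",
    "s3": "ctg2", "s4": "ctg2",
    "s5": "ctg3", "s6": "ctg3",
}
te_stcs = ["s1", "s3", "s4"]          # STCs with a TE match
est_stcs = ["s2", "s3", "s5", "s6"]   # STCs with an EST match

contigs = sorted(set(stc_to_contig.values()))
te_counts = count_per_contig(stc_to_contig, te_stcs)
est_counts = count_per_contig(stc_to_contig, est_stcs)
te_vec = [te_counts[c] for c in contigs]
est_vec = [est_counts[c] for c in contigs]
r = pearson_r(te_vec, est_vec)
print(f"r = {r:.3f}")
```

A strongly negative r across contigs would indicate that TEs avoid gene-rich regions; a value near zero is what the text reports for the rice contigs.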

Figure 5.

A scatterplot of the EST and TE contents on 1018 rice contigs. TE homologs in the STC database were identified by FASTX searches (E<10⁻⁵), as described in text. EST homologs in the STC database were identified by BLASTN searches of the Rice gene index (E<10⁻⁶) using STCs as queries.

DISCUSSION

The TE Compositions in the Rice Genome

We analyzed the TE content in 73,362 STC sequences by a protein homology search of each STC against a set of 1358 TE proteins downloaded from GenBank. A total of 6848 STCs were found to contain regions homologous to the known TEs, representing 9.3% of the STCs in the rice STC database. In contrast to a survey of the TEs on a 1-Mb PAC contig from chromosome 1, our TE–STCs were primarily retroelements (88.0%). The TEs on the 1-Mb PAC contig were only 66.1% retrotransposon. The over-representation of retrotransposons in the rice STCs is not due to the redundancy of the rice database, and curiously enough, is also observable in 16,360 Arabidopsis STCs. Nevertheless, counting MITE, retrotransposon, and transposon alignments with the redundant STCs, we find that the TE–STCs discussed in this paper cover 2.2 Mb of genomic DNA, only 4.5% of the total sequenced nucleotides. Although the actual number of TEs will remain unclear until the whole rice genome is sequenced, our present analysis shows that TE content of the rice genome is probably <10%.

Our FASTX survey of the rice STCs also revealed that almost all known TEs are present in the rice genome. Sequences of 821 STCs were homologous to class II TEs, including maize Activator, En/Spm, and Mutator. Transposons that are rarely known in plants, such as mariner, were also present in the STC database. Phylogenetic analyses of the mariner elements identified in this study reveal the existence of multiple subfamilies of mariner in plants. We also identified what appears to be a novel variety of rice tungro bacilliform virus, which appears to be endogenous to the rice genome.

Our results also show the abundance of MITEs in the rice STC database. We found 2746 STCs that contain regions homologous to known MITEs. Some MITEs, such as maize Stowaway, are found in numerous species of plants, including both monocots and dicots (Bureau and Wessler 1994b), but our results clearly show a species-specific distribution of many MITE sequences. MITEs first identified in African rice species are present in only low copy numbers in the Nipponbare STC database. Furthermore, we also showed that the gene-preferring insertion bias of some MITEs may not be universal to all MITEs. Although both Explorer and Stowaway MITEs were found in >100 STCs, only 10.5% of Explorer-containing STCs compared with 98.5% of Stowaway-containing STCs were found to also contain regions homologous to a sequence in the rice gene index, indicating the presence of a gene. This difference may be due to true insertion bias of Explorer and Stowaway, positional bias (Explorer inserts near genes but far enough from the transcript to be undetectable in the STC database), or a representation bias in the rice gene index (Explorer inserts near genes that are transcribed infrequently and thus unlikely to be detected in an EST survey). In any case, our results clearly show the usefulness of MITEs for gene discovery as nearly half (48.3%) of the MITEs identified in the STC database were within a few hundred nucleotides from transcription regions. MITEs may be especially important for crop plants with large genomes, such as maize, barley, and wheat, for which no large-scale genome-sequencing project will be attempted in the near future.

The Distribution of TE–STCs Across the Rice Genome and Implications for Genome Sequencing

The completion of two Arabidopsis chromosomes (2 and 4) for the first time provides insight into the physical distribution of TEs along higher plant chromosomes (Lin et al. 1999; Mayer et al. 1999). Arabidopsis TEs are mainly clustered around the centromeres. Clusters of retrotransposons have been reported in the intergenic regions on the maize chromosomes where retrotransposons constitute up to 50% of the genome (SanMiguel et al. 1996). Although 340 kb of genomic DNA surrounding the Adh1 gene from rice has been analyzed, the insertion of large clusters of retroelements was not observed in the rice intergenic regions (Tarchini et al. 2000). Our analysis of the physical location of 6848 TE–STCs did not reveal obvious TE clustering regions in 1018 physical map contigs, confirming the results of Tarchini et al. (2000).

The STC strategy to identify a minimum tile of large-insert clones for genome sequencing has been applied to the human and Arabidopsis genome projects (Venter et al. 1996) and has proven to be highly effective (Kelley et al. 1999; Siegel et al. 1999). The low content of TEs in the STC database and their apparent random distribution on the physical map both confirm the quality of the rice genome as a model crop genome. The lack of large blocks of known retrotransposons, which require painstaking effort to resolve during sequence assembly, is good news for the rice genome sequencing community. With the international rice genome project now on track, a complete assay of the sequence composition and organization of the rice genome will soon become reality and will provide a more lucid picture of the role of transposable elements in the genome evolution of rice and related cereals.

METHODS

BAC End Sequencing

A total of 4 μl of BAC culture in LB freezing medium was inoculated into 4 ml of LB medium containing chloramphenicol and incubated for 20 hr at 37°C. BAC DNA was isolated using the Autogen 740 (Integrated Separation System) according to the manufacturer's instructions. DNA pellets were resuspended in 25 μl of 1 mM Tris·HCl (pH 7.5). A total of 20 μl was used as the template for sequencing reactions in a total volume of 30 μl (5 μl of ABI Big Dye (Perkin Elmer); 50 pmole primer; 1.75 μl sequencing buffer containing 800 mM Tris·HCl (pH 9.0) and 20 mM MgCl₂; 2.25 μl dH₂O). Cycle sequencing reactions were performed as one cycle for 4 min at 95°C, followed by 70 cycles of 15 sec at 95°C, 10 sec at 51°C, and 4 min at 60°C. Cycle-sequencing products were precipitated with ethanol containing 1/3 volume of 7.5 M NH₄OAc and run on ABI377 automatic sequencers. The sequence traces were then transferred to a Sun workstation and base called by Phred, and vector sequences were masked by the CROSS_MATCH software package (Ewing and Green 1998).

Sequence and Statistical Analysis

FASTX (Pearson et al. 1997) was used to compare all Nipponbare STCs with a database of 1358 transposable-element protein sequences obtained from GenBank, by use of batch Entrez. Additional transposable elements were detected by FASTA searches (Pearson and Lipman 1988) of the STC database using known MITEs as queries and by TFASTX (Pearson et al. 1997) searches using pararetrovirus protein sequences as queries. For phylogenetic analysis, CLUSTALW (Thompson et al. 1994) was used to generate multiple sequence alignments, and the PROTDIST and FITCH programs of the PHYLIP package (Felsenstein 1993) were used to estimate sequence distances and phylogenies, respectively. For all alignments used in phylogenies, translations of the STCs were derived from FASTX alignments and end gaps were trimmed. Statistics were calculated using Splus version 5. All FASTA, FASTX, and TFASTX searches were run on a Dell PowerEdge2300 server running LINUX 6.1; all other software was run on a Sun Ultra30 running Solaris 2.6. The complete CUGI STC database is available at ftp.genome.clemson.edu.
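The end-gap trimming step mentioned above can be illustrated with a short sketch. The text does not say how trimming was done, so the interpretation here (drop leading and trailing alignment columns until no sequence begins or ends with a gap run, leaving internal gaps untouched) is an assumption, as are the function name `trim_end_gaps` and the toy sequences.

```python
def trim_end_gaps(alignment, gap="-"):
    """Trim terminal gap runs from a multiple sequence alignment.

    alignment: dict mapping sequence name -> aligned string; all
    strings must have the same length. Columns are removed from both
    ends until no sequence starts or ends with a gap run. Internal
    gaps are kept.
    """
    seqs = list(alignment.values())
    if not seqs:
        return {}
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must have equal length")

    # Longest leading and trailing gap runs over all sequences.
    start = max(len(s) - len(s.lstrip(gap)) for s in seqs)
    end = length - max(len(s) - len(s.rstrip(gap)) for s in seqs)

    # If start >= end the slices are empty strings.
    return {name: s[start:end] for name, s in alignment.items()}

# Toy alignment; names and residues are made up.
aln = {
    "query": "--MKT-LLA",
    "stc_a": "MAMKTALL-",
    "stc_b": "-AMRTALLA",
}
trimmed = trim_end_gaps(aln)
for name, seq in trimmed.items():
    print(name, seq)
```

Trimming in this way leaves every sequence with the same alignment length, which is what distance programs that take aligned input expect.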

Acknowledgments

We thank the staff of the CUGI BAC/EST Resource, Sequencing, Physical Mapping, and Bioinformatics Centers for supplying the resources and generating and processing the sequence data used for this analysis. We especially thank Dr. P. San Miguel for sharing insights on cereal transposable elements and his critical reading of an earlier version of the manuscript and Mr. R. Kingsburry III for his help with initial computer analyses. This work was funded in part by grants from Novartis, NSF-MRI # 9724557 to R.A.W. and R.A.D., NSF Plant Genome # DBI-987276 to R.A.W., R.A.D., M.S., and D.F., and the Rockefeller Foundation RF98001#630 and the Coker Endowed Chair to R.A.W. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding agencies.

The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 USC section 1734 solely to indicate this fact.


Footnotes

Present addresses: ³Orion Genomics, St. Louis, Missouri 63108 USA; ⁴Department of Agronomy, Konkuk University, Seoul 143-701, South Korea; ⁵Institute for Computational Genomics, 110 Clemson, South Carolina 29631 USA; ⁶Department of Plant Pathology, North Carolina State University, Raleigh, North Carolina 27606 USA.
⁷Corresponding author.
E-MAIL rwing@clemson.edu; FAX (864) 656-4293.
Received January 14, 2000; accepted May 17, 2000.
Cold Spring Harbor Laboratory Press


REFERENCES

Altschul S.F., Madden T.L., Schäffer A.A., Zhang J., Zhang Z., Miller W., Lipman D.J. (1997) Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25:3389–3402.
Budiman M.A. (1999) “Construction and characterization of deep coverage BAC libraries for two model crops: Tomato and rice, and initiation of a chromosome walk to jointless-2 in tomato.” Ph.D. thesis (Texas A & M University, College Station, TX).
Bureau T.E., Wessler S.R. (1992) Tourist: A large family of small inverted repeat elements frequently associated with maize genes. Plant Cell 4:1283–1294.
Bureau T.E., Wessler S.R. (1994a) Mobile inverted-repeat elements of the Tourist family are associated with the genes of many cereal grasses. Proc. Natl. Acad. Sci. 91:1411–1415.
Bureau T.E., Wessler S.R. (1994b) Stowaway: A new family of inverted repeat elements associated with the genes of both monocotyledonous and dicotyledonous plants. Plant Cell 6:907–916.
Bureau T.E., Ronald P.C., Wessler S.R. (1996) A computer-based systematic survey reveals the predominance of small inverted-repeat elements in wild-type rice genes. Proc. Natl. Acad. Sci. 93:8524–8529.
Berg D.E., Howe M.M. (1989) Mobile DNA. (American Society for Microbiology, Washington, D.C.).
Dong F., Miller J.T., Jackson S.A., Wang G.L., Ronald P.C., Jiang J. (1998) Rice (Oryza sativa) centromeric regions consist of complex DNA. Proc. Natl. Acad. Sci. 95:8135–8140.
Ewing B., Green P. (1998) Base-calling of automated sequencer traces using Phred. I. Accuracy assessment. Genome Res. 8:186–194.
Fedoroff N.V. (1989) Maize transposable elements. In Mobile DNA, eds Berg D.E., Howe M.M. (American Society for Microbiology, Washington, D.C.), pp 375–411.
Felsenstein J. (1993) PHYLIP (Phylogeny Inference Package) version 3.5c. Distributed by the author. (Department of Genetics, University of Washington, Seattle, WA).
Fitch W.M., Margoliash E. (1967) Construction of phylogenetic trees. Science 155:279–284.
Flavell R.B., Bennett M.D., Smith J.B., Smith D.B. (1974) Genome size and the proportion of repeated nucleotide sequence DNA in plants. Biochem. Genet. 12:257–269.
Gale M.D., Devos K.M. (1998) Plant comparative genetics after 10 years. Science 282:656–659.
Hirochika H., Fukuchi A., Kikuchi F. (1992) Retrotransposon families in rice. Mol. Gen. Genet. 233:209–216.
Jakowitsch J., Mette M.F., van der Winden J., Matzke M.A., Matzke A.J.M. (1999) Integrated pararetroviral sequences define a unique class of dispersed repetitive DNA in plants. Proc. Natl. Acad. Sci. 96:13241–13246.
Kelley J.M., Field C.E., Craven M.B., Bocskai D., Kim U.J., Rounsley S.D., Adams M.D. (1999) High throughput direct end sequencing of BAC clones. Nucleic Acids Res. 27:1539–1546.
Kumekawa N., Ohtsubo H., Horiuchi T., Ohtsubo E. (1999) Identification and characterization of novel retrotransposons of the gypsy type in rice. Mol. Gen. Genet. 260:593–602.
Lin X., Kaul S., Rounsley S., Shea T.P., Benito M.-I., Town C.D., Fuji C.Y., Mason T., Bowman C.L., Barnstead M., et al. (1999) Sequence and analysis of chromosome 2 of the plant Arabidopsis thaliana. Nature 402:761–768.
MacRae A.F., Learn G.H., Jr., Karjala M., Clegg M.T. (1990) Presence of an Activator (Ac)-like sequence in Pennisetum glaucum (pearl millet). Plant Mol. Biol. 15:177–179.
MacRae A.F., Huttley G.A., Clegg M.T. (1994) Molecular evolutionary characterization of an Activator (Ac)-like transposable element sequence from pearl millet (Pennisetum glaucum) (Poaceae). Genetica 92:77–89.
Mayer K., Schuller C., Wambutt R., Murphy G., Volckaert G., Pohl T., Dusterhoft A., Stiekema W., Entian K.D., Terryn N., et al. (1999) Sequence and analysis of chromosome 4 of the plant Arabidopsis thaliana. Nature 402:769–777.
Miller J.T., Dong F., Jackson S.A., Song J., Jiang J. (1998) Retrotransposon-related DNA sequences in the centromeres of grass chromosomes. Genetics 150:1615–1623.
Motohashi R., Ohtsubo E., Ohtsubo H. (1996) Identification of Tnr3, a suppressor-mutator/enhancer-like transposable element from rice. Mol. Gen. Genet. 250:148–152.
Paterson A.H., Lan T.H., Reischmann K.P., Chang C., Lin Y.R., Liu S.C., Burow M.D., Kowalski S.P., Katsar C.S., DelMonte T.A., et al. (1996) Toward a unified genetic map of higher plants, transcending the monocot-dicot divergence. Nat. Genet. 14:380–382.
Pearson W.R. (1997) Identifying distantly related protein sequences. Comp. Appl. Biosci. 13:325–332.
Pearson W.R., Lipman D.J. (1988) Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. 85:2444–2448.
Pearson W.R., Wood T., Zhang Z., Miller W. (1997) Comparison of DNA sequences with protein sequences. Genomics 46:24–36.
Quackenbush J., Liang F., Holt I., Pertea G., Upton J. (2000) TIGR Gene Indices: Reconstruction and representation of expressed gene sequences. Nucleic Acids Res. 28:141–145.
SanMiguel P., Tikhonov A., Jin Y.K., Motchoulskaia N., Zakharov D., Melake-Berhan A., Springer P.S., Edwards K.J., Lee M., Avramova Z., Bennetzen J.L. (1996) Nested retrotransposons in the intergenic regions of the maize genome. Science 274:765–768.
SanMiguel P., Gaut B.S., Tikhonov A., Nakajima Y., Bennetzen J.L. (1998) The paleontology of intergene retrotransposons of maize. Nat. Genet. 20:43–45.
Siegel A.F., Trask B., Roach J.C., Mahairas G.G., Hood L., van den Engh G. (1999) Analysis of sequence-tagged-connector strategies for DNA sequencing. Genome Res. 9:297–307.
Song W.Y., Pi L.Y., Bureau T.E., Ronald P.C. (1998) Identification and characterization of 14 transposon-like elements in the noncoding regions of members of the Xa21 family of disease resistance genes in rice. Mol. Gen. Genet. 258:449–456.
States D.J., Gish W., Altschul S.F. (1991) Improved sensitivity of nucleic acid database searches using application-specific scoring matrices. Methods 3:66–70.
Tarchini R., Biddle P., Wineland R., Tingey S., Rafalski A. (2000) The complete sequence of 340 kb of DNA around the rice Adh1-Adh2 region reveals interrupted colinearity with maize chromosome 4. Plant Cell 12:381–391.
Thompson J.D., Higgins D.G., Gibson T.J. (1994) CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673–4680.
Venter J.C., Smith H.O., Hood L. (1996) A new strategy for genome sequencing. Nature 381:364–366.
Wessler S.R., Bureau T.E., White S.E. (1995) LTR-retrotransposons and MITEs: Important players in the evolution of plant genomes. Curr. Opin. Genet. Dev. 5:814–821.
Xiong Y., Eickbush T.H. (1990) Origin and evolution of retroelements based upon their reverse transcriptase sequences. EMBO J. 9:3353–3362.
Yoshida S., Tamaki K., Watanabe K., Fujino M., Nakamura C. (1998) A maize MuDR-like element expressed in rice callus subcultured with proline. Hereditas 129:95–99.
Zhang Q., Arbuckle J., Wessler S.R. (2000) Recent, extensive, and preferential insertion of members of the miniature inverted-repeat transposable element family Heartbreaker into genic regions in maize. Proc. Natl. Acad. Sci. 97:1160–1165.

