Genome-wide A-to-I RNA editing in fungi independent of ADAR enzymes

Yeasts and filamentous fungi do not have adenosine deaminase acting on RNA (ADAR) orthologs and are believed to lack A-to-I RNA editing, which is the most prevalent editing of mRNA in animals. However, during this study with the PUK1 (FGRRES_01058) pseudokinase gene important for sexual reproduction in Fusarium graminearum, we found that two tandem stop codons, UA1831GUA1834G, in its kinase domain were changed to UG1831GUG1834G by RNA editing in perithecia. To confirm A-to-I editing of PUK1 transcripts, strand-specific RNA-seq data were generated with RNA isolated from conidia, hyphae, and perithecia. PUK1 was almost specifically expressed in perithecia, and 90% of transcripts were edited to UG1831GUG1834G. Genome-wide analysis identified 26,056 perithecium-specific A-to-I editing sites. Unlike those in animals, 70.5% of A-to-I editing sites in F. graminearum occur in coding regions, and more than two-thirds of them result in amino acid changes, including editing of 69 PUK1-like pseudogenes with stop codons in ORFs. PUK1 orthologs and other pseudogenes also displayed stage-specific expression and editing in Neurospora crassa and F. verticillioides. Furthermore, F. graminearum differs from animals in the sequence preference and structure selectivity of A-to-I editing sites. Whereas A's embedded in RNA stems are targeted by ADARs, RNA editing in F. graminearum preferentially targets A's in hairpin loops, which is similar to the anticodon loop of tRNA targeted by adenosine deaminases acting on tRNA (ADATs). Overall, our results showed that A-to-I RNA editing occurs specifically during sexual reproduction and mainly in the coding regions in filamentous ascomycetes, involving adenosine deamination mechanisms distinct from metazoan ADARs.

Workbench 7.5, which employs statistical models for estimating the sequencing error rate with the default setting of 1% for required significance parameter.
To exclude amplification bias by PCR during library preparation, duplicate reads with identical start and end positions were removed from the read mappings by the duplicate mapped read removal plugin of CLC Genomics Workbench 7.5. The following filters were used to eliminate false-positives due to amplification bias, sequencing errors, and mapping errors: i) Read filter to remove non-specific matched reads likely related to erroneous mapping; ii) Coverage and count filters to retain only SNVs present in at least 5 reads with a minimum frequency of 3% and minimum coverage of 10 reads; iii) Base quality filter to remove reads with a central quality below 20 and neighborhood quality below 15 within 5 neighborhood radius; iv) Read direction and position filters to remove SNVs whose relative read direction and read position distribution are significantly different from the expected at the significance cut-off of 1%; and v) Multiple type of mismatches filter to discard SNV sites with more than one types of mismatch.
Identification of A-to-I editing sites in perithecia of F. graminearum: Consistent with the fact that the reference genome is of high quality and the same strain was used in this study, only 160 SNVs were identified in genome re-sequencing data of strain PH-1 (Cuomo et al. 2007). Authentic RNA SNVs were obtained by filtering out any genomic SNVs from the RNA variant sets. In total, 23,041 and 19,764 SNV sites were identified in the two independent biological replicates of perithecia, Rep1 and Rep2, respectively. Among these SNVs, 22,578 in Rep1 and 19,261 Rep2 correspond to A-to-G transitions. A total of 17,613 A-to-G transitions are common in both replicates, accounting for 91.4% of the A-to-G sites in Rep2. Because A-to-G variants are highly concordant between Rep1 and Rep2, we combined read mappings from these two replicates to maximize the statistical power for identification of A-to-I editing sites. A total of 26,056 A-to-G variants (referred to A-to-I editing sites in this study) were identified in the combined datasets at an estimated falsediscovery rate of 0.43%. The non A-to-G variants are much less abundant (463 and 503 for Rep1 and Rep2, respectively) and only 35.9% of them are common between Rep1 and Rep2, suggesting that they are not related to RNA editing.

Identification of genes specifically expressed or upregulated in perithecia:
The number of reads (counts) aligned to each predicted gene of PH-1 was calculated by FeatureCounts (Liao et al. 2014) based on the RNA-seq mappings. Differential expression analysis of genes in perithecia compared to conidia and hyphae stages was performed with the edgeRun package (Dimont et al. 2015) using the UCexactTest function with the Benjamini and Hochberg's algorithm to control the false discovery rate (FDR). To filter out weakly expressed genes, only genes with a minimum expression level of 1 count per million (cpm) in at least two RNA-seq libraries were included in the analysis. Genes with a FDR below 0.05 and fold-change greater than 2 were considered to be upregulated genes. Genes upregulated in perithecia with an expression value (cpm) larger than 1 in both replicates of perithecia but lower than 1 in the RNA-seq libraries of conidia and hyphae were considered to be specifically expressed in perithecia.

Analyses of sequence and structure features of A-to-I editing sites in F. graminearum:
Mutation site annotation and functional consequence prediction was performed with the SnpEff (Cingolani et al. 2012) and Amino Acid Changes tool of the CLC Genomics Workbench 7.5. Blast2GO (www.blast2go.com) was used for gene functional annotation.
RNA-seq mappings were viewed using Integrative Genomics Viewer (IGV) (Robinson et al. 2011). Sequence logo was generated with WebLogo 3 (Crooks et al. 2004). Two Sample Logo (Vacic et al. 2006) was used to estimate and visualize the differences between neighboring nucleotides of the A-to-I editing sites identified in this study and 30,000 A sites randomly chosen from the predicted cDNA sequences of the F. graminearum genome (King et al. 2015). Secondary structures of RNA sequences containing the edited A sites (30 bp upstream and 30 bp downstream from the edited As) were predicted with RNAFold (Lorenz et al. 2011). Statistical analysis was performed using R (R Core Team 2015). The P value less than 0.05, 0.01, 0.001, and 0.0001 was designated with one (*), two (**), three (***), and four (****) asterisks, respectively.

Analysis of A-to-I editing in F. verticillioides and N. crassa:
The polyA-selected but unstranded RNA-seq data of F. verticillioides (Sikhakolli et al. 2012) and N. crassa  were downloaded from the NCBI SRA database under accession numbers GSE61865 and GSE41484, respectively. For F. verticillioides, RNA-seq reads of hyphae (2 h after induction of the sexual cycle) and perithecia (4-dpf) samples were mapped to the genome sequence of strain 7600 (FV3) obtained from Broad institute (http://www.broadinstitute.org/). For N. crassa, RNA-seq reads of hyphae before the sexual crossing (N6-0 h) and 5-dpf perithecia (N6-120 h) samples were mapped to the genome sequence of N. crassa OR74A (NC12) obtained from Broad institute (http://www.broadinstitute.org/). Genome-scale identification of A-to-I editing sites in F. verticillioides used the method similar to that for F. graminearum. Since the RNA-seq reads are unpaired and only 36 bp in length, we did not attempt to remove duplicated reads from the RNA-seq mappings. Because the strain of F. verticillioides used to generate the RNA-seq data was different from the strain whose genome was sequenced (Sikhakolli et al. 2012), we removed all homozygous variant sites and sites with extreme degree of variation (>90%), which were most likely to be genomic variants.
Also because the library preparation protocol used to generate RNA-seq data of F. verticillioides was non-strand specific, we inferred the type of substitution based on the strand information of annotated genes. Variant sites without strand information were excluded from the final list.

Generation of the gene knockout mutants:
The split-marker approach (Catlett et al. 2003) was used to generate the gene replacement constructs for the five genes with PUK1-like editing events (Supplemental Table 4) and the FgTAD1 gene. Protoplasts of the wild-type strain PH-1 (Cuomo et al. 2007) were prepared and transformed with each gene replacement construct as described previously (Hou et al. 2002). For transformant selection, hygromycin B (Calbiochem, La Jolla, CA) was added to the final concentration of 250 mg/ml.
Hygromycin-resistant transformants were screed by PCR and putative knockout mutants were confirmed by Southern hybridization analyses.
Generation of the PUK1 TGGTGG and PUK1 TAATAA constructs and transformants: To generate the PUK1 TGGTGG allele, the PUK1 gene (including its 1.5 kb promoter region) was amplified by overlapping PCR with the TA 1831 G TA 1834 G to TG 1831 G TG 1834 G mutations introduced into the PCR primers. The resulting PCR products were cloned into XhoIdigested plasmid pFL2 (Zhou et al. 2011) by yeast gap repair (Bruno et al. 2004). The same approach was used to generate the PUK1 TAATAA allele with two stop codons (TAATAA) inserted right behind the two edited stop codons TA 1831 G TA 1834 G. The PUK1 TGGTGG and PUK1 TAATAA constructs rescued from Trp+ yeast transformants were confirmed by sequencing analysis and transformed into the puk1 deletion mutant. Geneticin-resistant transformants expressing the PUK1 TGGTGG and PUK1 TAATAA constructs were identified by PCR. For qRT-PCR analysis, RNA samples were isolated from vegetative hyphae harvested from 24 h YEPD cultures of PH-1 and transformants expressing the PUK1 TGGTGG or PUK1 TAATAA allele with the TRIzol reagent (Invitrogen, Carlsbad, CA). For each sample, at least three biological replicates were analyzed to calculate mean and standard deviation.