Birth of protein-coding exons by ancient domestication of LINE-1 retrotransposon



Figure 1.

Identification of Lyosin in the alligator genome assembly. (A) UCSC Genome Browser view of the MYL4 gene in mouse and American alligator. The noncanonical exon L was annotated in the American alligator MYL4 gene. (B) Schematic of the MYL4 and Lyosin protein structures in American alligator. In the MYL4 protein, the amino acid sequences encoded by exons 1 and 2 and by exons 3 to 6 correspond to the actin-binding site and the EF-hand calcium-binding domain, respectively. The Lyosin protein consists of amino acids encoded by exon L and exons 3 to 6. Exon L encodes a 246-amino-acid sequence similar to the L1 ORF1 protein (ORF1p), an RNA-binding protein. The L1 ORF2 protein (ORF2p) is an enzyme with endonuclease and reverse transcriptase activities. (C) Amino acid sequence alignment of L1-32_DR ORF1p and the region of the American alligator Lyosin protein encoded by exon L. (D) Protein structures of human L1 ORF1p (PDB: AF_AFQ9UN81F1) and the Lyosin protein, predicted by AlphaFold2 (Jumper et al. 2021) as implemented in ColabFold (Mirdita et al. 2022). ORF1p consists of a disordered N-terminal domain (NTD), a coiled-coil (CC), an RNA recognition motif (RRM), and a C-terminal domain (CTD). Lyosin contains the NTD, the CC, and a partial RRM. (E) The βαββαβ structure of the predicted Lyosin RRM domain. (F) Structural comparison of the predicted Lyosin RRM domain and the L1 ORF1p RRM domain determined by X-ray diffraction (Protein Data Bank [PDB; https://www.rcsb.org] 2W7A).
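Panel C compares ORF1p and the exon L region residue by residue. As an illustration only, the sketch below shows one common way to summarize such a pairwise alignment: percent identity computed over aligned columns in which neither sequence has a gap. This denominator is an assumption; alignment tools use several different conventions, and the figure does not state which (if any) was used. The function name and the toy sequences are hypothetical and are not the real L1-32_DR ORF1p or Lyosin sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Hypothetical helper: percent identity over ungapped aligned columns.

    Both inputs must be rows of the same pairwise alignment (equal length).
    Columns where either row has a gap are skipped, which is only one of
    several conventions for the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # gapped column: excluded from the comparison
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0  # no ungapped columns to compare
    return 100.0 * identical / compared


# Toy alignment rows (hypothetical, NOT the sequences shown in panel C)
orf1p_like = "MKRSR-QEEL"
exon_l_like = "MKRTRGQE-L"
print(f"{percent_identity(orf1p_like, exon_l_like):.1f}%")  # prints "87.5%"
```

Here 8 of the 10 columns are ungapped and 7 of those hold identical residues, giving 87.5%. Counting gapped columns in the denominator would give a lower value, which is why reported identities should always be read alongside the convention used.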


Genome Res. 35: 1287-1300
