"Quantification of percent sequence divergence"

Percent sequence divergence for each comparison was quantified from pairwise alignments of all exons.
The initial pairwise alignment requires the Fast Statistical Aligner (FSA) software (Bradley et al. 2009).
These alignments are generated with the following command:

    perl pairwise_aln_FSA.pl <species1> <species2> <FASTA1> <FASTA2>

This writes two new FASTA files, species1.species1_species2_fsa.fasta and species2.species1_species2_fsa.fasta.

After those alignments have been performed, a summary of the sites differentiating those species
can be obtained using the following command with those new FSA-aligned FASTA files:

    perl compare_pairwise.pl <species1 FASTA> <species2 FASTA> <species1_species2_SNPs>

This prints a summary to the screen and also writes individual SNP files.
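
To illustrate what "sites differentiating those species" means for one pair of aligned sequences, here is a
minimal Python sketch. It is not the code in compare_pairwise.pl; the function name count_divergent_sites and
the rule of skipping any column that contains a gap character are assumptions made for this example, and
ambiguous bases such as N are not given special treatment.

```python
def count_divergent_sites(aligned1, aligned2, gap="-"):
    """Count mismatched columns between two aligned sequences.

    Assumption for this sketch: columns where either sequence has a gap
    are skipped and do not count toward the number of compared sites.
    """
    if len(aligned1) != len(aligned2):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    divergent = 0
    for a, b in zip(aligned1.upper(), aligned2.upper()):
        if a == gap or b == gap:
            continue  # gapped column: not scored
        compared += 1
        if a != b:
            divergent += 1
    return divergent, compared
```

For two aligned strings "ACGT-A" and "ACCTTA", this returns one divergent site out of five compared columns.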

Lastly, the number of divergent sites per gene can be determined by providing a list of genes of interest,
the list of SNPs from the previous step, and the length of each gene:

    perl seq_div_from_set.pl <list of genes> <species1_species2_SNPs> <gene lengths>
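
The arithmetic behind a per-gene percent divergence can be sketched as follows. This is a hedged illustration,
not the logic of seq_div_from_set.pl: the function name percent_divergence, the use of dictionaries keyed by
gene name, and the choice to skip genes with no known length are all assumptions, since the input file formats
are not described here.

```python
def percent_divergence(divergent_sites, gene_lengths, genes):
    """Return {gene: percent divergence} for the genes of interest.

    divergent_sites: hypothetical dict of gene -> number of divergent sites
    gene_lengths:    hypothetical dict of gene -> length in bp
    genes:           iterable of gene names of interest
    """
    result = {}
    for gene in genes:
        length = gene_lengths.get(gene, 0)
        if length <= 0:
            continue  # assumption: genes without a usable length are skipped
        # Genes with no recorded divergent sites count as 0% divergent.
        result[gene] = 100.0 * divergent_sites.get(gene, 0) / length
    return result
```

Under these assumptions, a gene with 5 divergent sites over 1000 bp has a divergence of 0.5 percent.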


"Mapping sequencing reads to genes and alleles"

As stated in the methods section, reads were aligned to strain- or species-specific genomes using MOSAIK.

Reads were then lifted from their strain- or species-specific coordinates to dm3 (Drosophila melanogaster) coordinates
using the UCSC liftOver utility.

Reads were then intersected with a list of constitutive exons in dm3, and gaps between all strain- and
species-specific comparisons were filtered out, using the following command, which depends on the BEDtools module
intersectBed:

    perl convert.pl <read alignment in BED format> <constitutive_exons.bed> <gaps.bed>

Finally, based on the strain- or species-specific mapping of both mates of a paired-end read, reads were assigned to
a particular allele using the following command:

    perl classify.pl <constitutive_exons.bed> <species1> <species2> <merged alignment file> <output directory> <filename prefix>

For example, the within-D. mel. comparison of zhr and z30 would have looked something like this:

    perl classify.pl constitutive_exons.bed zhr z30 zhrXz30_merged.txt data zhrXz30_mRNA

As indicated in the classify.pl preamble, the merged alignment file has the following tab-delimited format
(different sequencing technologies will generate different styles of read names):

read_name  mate1species1    mate1species2     mate2species1    mate2species2
HWI-12345  4873_8748:geneX  None              4873_8748:geneX  None
HWI-23456  None             8487_10495:geneY  None             8487_10495:geneY
HWI-34567  123_627:geneZ    123_627:geneZ     123_627:geneZ    123_627:geneZ

The mate- and species-specific alignments describe the exon interval with start and stop coordinates separated by "_",
followed by a colon and the gene name, i.e. start_stop:gene. The start and stop coordinates are given as in BED format,
with the start coordinate being 0-based and the stop coordinate being 1-based. A mate with no alignment is recorded as None.
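
As a small illustration of the start_stop:gene field, the following Python sketch splits one field into its
parts. The names ExonHit and parse_hit are hypothetical and do not come from classify.pl. Because the
coordinates follow the BED convention, the interval length is simply stop minus start.

```python
from typing import NamedTuple, Optional


class ExonHit(NamedTuple):
    start: int  # 0-based start, as in BED
    stop: int   # 1-based stop, as in BED
    gene: str


def parse_hit(field: str) -> Optional[ExonHit]:
    """Parse 'start_stop:gene'; return None for an unaligned mate ('None')."""
    field = field.strip()
    if field == "None":
        return None
    # Split on the first colon so the gene name is kept whole.
    interval, gene = field.split(":", 1)
    start, stop = interval.split("_", 1)
    return ExonHit(int(start), int(stop), gene)


hit = parse_hit("4873_8748:geneX")
exon_length = hit.stop - hit.start  # BED-style interval length
```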

In this toy example, read HWI-12345 would be assigned to the species1 allele, read HWI-23456 would be assigned to the
species2 allele, and read HWI-34567 would be assigned to a "both" category (because it aligned equally well to both species).
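
The assignment in the toy example can be sketched in Python as below. This is only an illustration that
reproduces the three rows shown; the actual rules in classify.pl may differ. The function name classify_line
and the "unassigned" label for any other pattern of hits are assumptions, as is splitting strictly on tabs.

```python
def classify_line(line):
    """Classify one tab-delimited line of the merged alignment file.

    Returns (read_name, label). The label 'unassigned' is a hypothetical
    catch-all for patterns not covered by the toy example.
    """
    name, m1s1, m1s2, m2s1, m2s2 = line.rstrip("\n").split("\t")
    # True where a mate aligned to a given species, False where it is "None".
    mapped = tuple(f != "None" for f in (m1s1, m1s2, m2s1, m2s2))
    if mapped == (True, True, True, True):
        label = "both"
    elif mapped == (True, False, True, False):
        label = "species1"  # both mates hit species1 only
    elif mapped == (False, True, False, True):
        label = "species2"  # both mates hit species2 only
    else:
        label = "unassigned"
    return name, label


rows = [
    "HWI-12345\t4873_8748:geneX\tNone\t4873_8748:geneX\tNone",
    "HWI-23456\tNone\t8487_10495:geneY\tNone\t8487_10495:geneY",
    "HWI-34567\t123_627:geneZ\t123_627:geneZ\t123_627:geneZ\t123_627:geneZ",
]
labels = [classify_line(row) for row in rows]
```

A header line, if present in the file, would need to be skipped before calling classify_line.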