
CONTENTS.txt: a list of files in the Supplemental_Files archive for Mortazavi et al. (2010), 
Genome Research, <http://dx.doi.org/10.1101/gr.111021.110>.

1. Proteomes
  
    This directory contains a compressed ('gzip -9') file of the hybrid Brugia malayi proteome:

    Supplemental_Proteomes/brugpep.WS209.off_w_aug.fa.gz

which was generated from both official annotations and AUGUSTUS predictions to allow a complete
Brugia gene set to be compared in OrthoMCL with other nematode genomes.  Michael Paulini (WormBase
/ WTSI) subsequently corrected this proteome and placed an improved version of it in WormBase.
    
    It also contains a compressed file of the actual PS1010 proteome used for OrthoMCL:

    Supplemental_Proteomes/ps1010rel4.prot.fa.gz
    
Due to curation of the C. sp. 3 PS1010 genome by WormBase or GenBank, the official final versions
of this proteome may not exactly match the proteome used in the paper.
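
    A minimal Python sketch of one way to read these gzip-compressed proteome files, assuming
they follow ordinary FASTA layout (a '>' header line followed by sequence lines).  The function
names parse_fasta and proteome_lengths are hypothetical and are not part of the archive.

```python
import gzip
from typing import Dict, Iterable, Iterator, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (sequence_id, sequence) pairs from FASTA-formatted lines."""
    seq_id = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            # Assumption: the ID is the first whitespace-delimited header token.
            seq_id = line[1:].split()[0] if len(line) > 1 else ""
            chunks = []
        elif seq_id is not None:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def proteome_lengths(path: str) -> Dict[str, int]:
    """Map each protein ID in a gzip-compressed FASTA file to its length."""
    with gzip.open(path, "rt") as handle:
        return {seq_id: len(seq) for seq_id, seq in parse_fasta(handle)}
```

    For example, proteome_lengths("Supplemental_Proteomes/ps1010rel4.prot.fa.gz") would return
a dictionary of protein lengths that could be compared against a later official release.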

2. OrthoMCL

    This directory contains files pertinent to OrthoMCL analyses.

    OrthoMCL/orthomcl_7spp_ps1010.omcl.CDS_centric.txt -- the original, CDS-based output of OrthoMCL.
    OrthoMCL/cds2gene_7spp.tsv                         -- the conversion table used from CDSes to genes.
    OrthoMCL/orthomcl_7spp_ps1010.omcl.txt             -- the corrected, gene-centric OrthoMCL output used for analyses.

    (Note that there are some cases of a single protein in C. elegans being encoded by two or more 
CDSes.  Thus, CDS-to-gene mapping can be unique, but protein-to-gene mapping may not be.)
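
    The CDS-to-gene conversion can be sketched in Python as follows.  This is an illustration
under stated assumptions, not the procedure actually used for the paper: it assumes that the
conversion table has two tab-separated columns (CDS, then gene) and that each OrthoMCL line has
the form "GROUP_ID(n genes,m taxa): name(taxon) name(taxon) ...".  Neither layout is documented
here, and the names load_cds2gene and to_gene_centric are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Assumed member format: "name(taxon)", e.g. "T01B7.5a(elegans)".
MEMBER = re.compile(r"^(?P<name>.+)\((?P<taxon>[^()]+)\)$")


def load_cds2gene(lines: Iterable[str]) -> Dict[str, str]:
    """Read an assumed two-column, tab-separated CDS-to-gene table."""
    table = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[0]:
            table[fields[0]] = fields[1]
    return table


def to_gene_centric(
    group_line: str, cds2gene: Dict[str, str]
) -> Tuple[str, List[Tuple[str, str]]]:
    """Convert one CDS-centric group line to (group_id, [(gene, taxon), ...])."""
    header, _, members = group_line.partition(":")
    group_id = header.split("(")[0].strip()
    genes: List[Tuple[str, str]] = []
    for token in members.split():
        match = MEMBER.match(token)
        if match is None:
            continue  # skip tokens that do not fit the assumed format
        # Fall back to the CDS name when the table has no entry for it.
        gene = cds2gene.get(match["name"], match["name"])
        pair = (gene, match["taxon"])
        if pair not in genes:  # collapse several CDSes of the same gene
            genes.append(pair)
    return group_id, genes
```

    Collapsing duplicate (gene, taxon) pairs is a design choice of this sketch: it keeps a gene
with several CDSes from being counted more than once within an orthology group.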

3. elegans_elements

    This directory contains primary sequence and GTF files describing the final set of 2,672 DNA 
elements in C. elegans that were both highly conserved among C. elegans, C. briggsae, and PS1010 
and likely to encode regulatory sequences or previously unknown ncRNAs:

    elegans_elements/cons_2672_elements.fa        -- primary DNA sequence of the elements
    elegans_elements/cons_2672_elements.WS190.gtf -- GTF of their coordinates for WormBase release WS190
    elegans_elements/cons_2672_elements.WS215.gtf -- GTF of their coordinates for WormBase release WS215
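
    A minimal Python sketch for reading element coordinates from either GTF file.  It relies
only on the general GTF layout (nine tab-separated fields, with 1-based inclusive start and end
in fields 4 and 5); the contents of the attribute column in these particular files are not
assumed.  The names Element and parse_gtf are hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple


class Element(NamedTuple):
    seqname: str
    start: int       # 1-based, inclusive
    end: int         # 1-based, inclusive
    strand: str
    attributes: str  # kept verbatim; its layout is not assumed here

    @property
    def length(self) -> int:
        """Length in bases, given inclusive coordinates."""
        return self.end - self.start + 1


def parse_gtf(lines: Iterable[str]) -> Iterator[Element]:
    """Yield one Element per well-formed GTF line, skipping comments."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete nine-column GTF record
        yield Element(fields[0], int(fields[3]), int(fields[4]), fields[6], fields[8])
```

    Comparing each element's length under WS190 and WS215 coordinates is one quick check that
an element was carried over intact between the two releases.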

4. elegans_motifs

    This directory contains the two runs of MEME motif predictions made from the primary DNA 
sequence of the 2,672 conserved elements, along with output files from remapping these motifs 
to the elements with FIMO.  These data are in subdirectories elegans_motifs/Run_1 and elegans_motifs/Run_2.

5. PS1010_repeats

    This directory contains the genomic repeats predicted with RepeatModeler for the PS1010 genome:

    PS1010_repeats/ps1010rel4_consensi.classified.fa -- the actual repeat sequences.
    PS1010_repeats/NOT_FILTERED.txt -- a disclaimer: these were not filtered for protein-coding or ncRNA genes!
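
    Because the repeats are unfiltered, a first step before using them might be to tally them
by class.  The Python sketch below assumes that each FASTA header carries its classification
after a '#' (for example ">name#DNA/hAT"), as is common for classified RepeatModeler output;
that header layout is an assumption about this file, and repeat_class_counts is a hypothetical
name.

```python
from collections import Counter
from typing import Iterable


def repeat_class_counts(lines: Iterable[str]) -> Counter:
    """Count FASTA records per class label found after '#' in the header."""
    counts: Counter = Counter()
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        parts = line[1:].split()
        name = parts[0] if parts else ""
        _, sep, label = name.partition("#")
        # Headers without a '#label' part are grouped as "unlabeled".
        counts[label if sep and label else "unlabeled"] += 1
    return counts
```

    Records in classes of no interest, or with no label at all, could then be set aside for
closer inspection before any masking is done.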

