Title : Intergenic ORFs as elementary structural modules of de novo gene birth and protein evolution
Author: Chris Papadopoulos, Isabelle Callebaut, Jean-Christophe Gelly, Isabelle Hatin, 
		Olivier Namy, Maxime Renard, Olivier Lespinet, Anne Lopes
		
In this directory (Random_sequences) you can find all the data needed to reproduce the 
analyses on random sequences, which test the contribution of ORF size, nucleotide (GC) 
content and hydrophobic content. 

DIRECTORY ORGANIZATION:

		A. NFASTA : All the necessary input nucleotide sequences of all the ORF categories:
				
				a. Scer_Intergenic.nfasta : 105041 Intergenic ORFs 	
				b. Scer_CDS.nfasta : 6669 CDS ORFs
				c. Selectively.nfasta : 31 Highly translated IGORFs
				d. Occasionally.nfasta : 1235 Occasionally translated IGORFs
				e. Denovo.nfasta : 70 De novo gene ORFs
				f. AncFragmants.nfasta : 167 Ancestral IGORFs 
				
		B. Size_&_Hydrophobicity : Random amino acid sequences of various sizes and
								   various hydrophobic contents.  
								   
				a. Sequences_{minimum size}-{maximum size}_{% hydrophobicity}.fasta
				b. Sequences_{minimum size}-{maximum size}_{% hydrophobicity}.barcodes
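
A minimal Python sketch of how sequences of this kind could be drawn, given a size
range and a target percentage of hydrophobic residues. It is not the script used to
produce the files above. The hydrophobic alphabet (V, I, L, F, M, Y, W, the strong
hydrophobic residues of HCA), the uniform sampling within each class and the function
name random_protein are assumptions made for illustration.

```python
import random

# Assumption: the strong hydrophobic residues used by HCA.
HYDROPHOBIC = "VILFMYW"
# The 13 remaining standard amino acids.
OTHER = "ACDEGHKNPQRST"


def random_protein(min_size, max_size, hydrophobic_pct, rng):
    """Draw one random sequence with a fixed fraction of hydrophobic residues."""
    length = rng.randint(min_size, max_size)  # both bounds are included
    n_hydro = round(length * hydrophobic_pct / 100)
    residues = [rng.choice(HYDROPHOBIC) for _ in range(n_hydro)]
    residues += [rng.choice(OTHER) for _ in range(length - n_hydro)]
    rng.shuffle(residues)  # spread the hydrophobic residues along the sequence
    return "".join(residues)


rng = random.Random(0)  # fixed seed so the draw is reproducible
seq = random_protein(20, 30, 40, rng)
```

Fixing the number of hydrophobic residues before shuffling gives every sequence the
requested content exactly (up to rounding), rather than only on average.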
					
		C. IGORF_NT_freq_various_sizes : Random nucleotide sequences that follow the
										 nucleotide frequencies of IGORFs and the size
										 distribution of each distinct category of ORFs. 
										 
				a. AncIGORF-size_IGORF-freq.* : IGORF nucleotide frequencies & AncIGORF sizes
				b. Selectively-size_IGORF-freq.* : IGORF nucleotide frequencies & highly translated IGORF sizes
				c. Occasionally-size_IGORF-freq.* : IGORF nucleotide frequencies & occasionally translated IGORF sizes
				d. Denovo-size_IGORF-freq.* : IGORF nucleotide frequencies & de novo gene sizes
				e. CDS-size_IGORF-freq.* : IGORF nucleotide frequencies & CDS ORF sizes
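
The idea behind these files (nucleotide frequencies taken from one set of sequences,
sizes taken from another) can be sketched in Python as follows. This is an
illustration only, not the original script: the function names, the toy input
sequences and the choice to draw each position independently are assumptions.

```python
import random
from collections import Counter


def nucleotide_frequencies(sequences):
    """Return the overall A/C/G/T frequencies of a set of sequences."""
    counts = Counter()
    for s in sequences:
        counts.update(c for c in s.upper() if c in "ACGT")
    total = sum(counts.values())
    return {nt: counts[nt] / total for nt in "ACGT"}


def random_sequences(freqs, sizes, n, rng):
    """Draw n sequences: lengths resampled from `sizes`, bases drawn from `freqs`."""
    nts = list(freqs)
    weights = [freqs[nt] for nt in nts]
    out = []
    for _ in range(n):
        length = rng.choice(sizes)  # empirical size distribution
        out.append("".join(rng.choices(nts, weights=weights, k=length)))
    return out


# Toy stand-ins for the IGORF sequences and for the sizes of one ORF category.
igorfs = ["ATGAAATTT", "TTTAAACCG"]
sizes = [60, 90, 120]
freqs = nucleotide_frequencies(igorfs)
seqs = random_sequences(freqs, sizes, 5, random.Random(0))
```

Independent draws can create in-frame stop codons, so a real pipeline would need to
decide how to handle them; this sketch does not.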
				
		D. Scrumbled_nucleotides : Scrambled nucleotide sequences of all the ORF categories
		
				a. Selectively_randomized.* : Highly translated IGORFs scrambled
				b. Scer_IGORF_randomized.*  : IGORFs scrambled
				c. Scer_CDS_randomized.*    : CDS ORFs scrambled 
				d. Occasionally_randomized.*: Occasionally translated IGORFs scrambled 
				e. Denovo_randomized.*      : De novo gene ORFs scrambled
				f. Ancestral_randomized.*   : AncIGORFs scrambled
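
Scrambling keeps the length and nucleotide composition of each ORF and changes only
the order of its nucleotides. A minimal Python sketch, assuming a uniform shuffle of
the whole sequence (whether the original procedure treated the stop codons separately
is not stated here; the function name scramble is hypothetical):

```python
import random
from collections import Counter


def scramble(seq, rng):
    """Return a random permutation of the nucleotides of seq."""
    letters = list(seq)
    rng.shuffle(letters)  # in-place uniform shuffle
    return "".join(letters)


rng = random.Random(0)  # fixed seed so the result is reproducible
original = "ATGGCCATTGTAATGGGCCGC"
shuffled = scramble(original, rng)

# Composition and length are preserved by construction.
same_composition = Counter(original) == Counter(shuffled)
```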
				
		E. Random_IGR : In this directory we concatenated all the intergenic regions of the
						S. cerevisiae genome, scrambled their nucleotides and then used
						ORFtrack to extract all the Random ORFs. 
						
				a. Pipeline.sh : The pipeline that generated all the data
				b. Scer_IGR_concatenated.nfasta : Concatenated Intergenic regions
				c. Scer_IGR_randomized.nfasta : Randomized Intergenic "Genome"
				d. Scer_Random.pfasta : Random IGORFs amino acid sequences
				e. Scer_Random.nfasta : Random IGORFs nucleotide sequences
				f. Scer_Random.barcodes : Random IGORFs HCA barcodes
				g. mapping_orf_Scer_IGR_randomized.gff : Annotation of the IGORFs of the randomized intergenic "genome"
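
The ORF extraction step is done by ORFtrack in Pipeline.sh. To show what extracting
ORFs from a scrambled sequence means, here is a simplified Python sketch of a
stop-to-stop scan. It is not ORFtrack's implementation: it reads the forward strand
only, and the 60-nucleotide minimum length and the function name stop_to_stop_orfs
are assumptions made for the example.

```python
STOPS = {"TAA", "TAG", "TGA"}


def stop_to_stop_orfs(genome, min_len=60):
    """List (start, end, frame) of stop-free stretches between in-frame stops.

    Coordinates are 0-based, end-exclusive, and exclude both stop codons.
    Forward strand only; min_len is an assumed threshold in nucleotides.
    """
    genome = genome.upper()
    orfs = []
    for frame in range(3):
        start = None  # first position after the previous in-frame stop
        for pos in range(frame, len(genome) - 2, 3):
            if genome[pos:pos + 3] in STOPS:
                if start is not None and pos - start >= min_len:
                    orfs.append((start, pos, frame))
                start = pos + 3
    return orfs


# Toy sequence: stop, 20 sense codons, stop, 5 sense codons, stop.
genome = "TAA" + "GCT" * 20 + "TGA" + "GCT" * 5 + "TAG"
orfs = stop_to_stop_orfs(genome)  # -> [(3, 63, 0)]
```

Only the 60-nucleotide stretch is kept; the 15-nucleotide stretch falls below the
assumed threshold.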
		

