README.TXT

VELVET SOURCE 
October 16 2007
Daniel Zerbino

NOTE: The PDF manual in the same directory contains the same text as this text
file, only prettier.

----------------------------------------------------------------------------------
For impatient people, just type:
> make
> ./velveth sillyDirectory 21 -shortPaired data/test_reads.fa
> ./velvetg sillyDirectory
> less sillyDirectory/stats.txt
> ./velvetg sillyDirectory 0 100
> less sillyDirectory/contigs.fa
----------------------------------------------------------------------------------

> SUMMARY
	* A/ REQUIREMENTS
	* B/ COMPILING INSTRUCTIONS
	* C/ RUNNING INSTRUCTIONS
		- C.1/ Running velveth
		- C.2/ Running velvetg 
			.C.2.a/ Just reads
			.C.2.b/ With paired ends
			.C.2.c/ Using multiple categories
	* D/ FILE FORMATS 
	* E/ FOR MORE INFORMATION
		- E.1/ Website
		- E.2/ Mailing list
		- E.3/ Contact emails

----------------------------------------------------------------------------------
A/ REQUIREMENTS

	Velvet should function on any standard 64-bit Linux environment with
gcc. A large amount of physical memory (12 GB to start with; more is no luxury)
is recommended.

----------------------------------------------------------------------------------
B/ COMPILING INSTRUCTIONS

Normally, with a GNU environment, just type:

> make

Otherwise, compile each *.c file separately, then execute the default
instructions at the top of the Makefile.

----------------------------------------------------------------------------------
C/ RUNNING INSTRUCTIONS


C.1/ Running velveth 
--------------------

velveth simply takes in a number of sequence files (FASTA, FASTQ or Solexa
sequence file formats), produces a hashtable, then outputs two files,
./output_directory/Sequences and ./output_directory/Roadmaps, in the
./output_directory/ directory (creating this directory if necessary). These
files are later used by velvetg.

The syntax is as follows:

> ./velveth output_directory hash_length [[descriptors] filename]

The descriptors inform Velvet of two things: file format and read category.

Supported file formats are:
-fasta (default) 
-fastq
-eland

Read categories are:
-long (for Sanger, 454 or even reference sequences)
-short (default)
-shortPaired
-short2 (same as -short, but kept as a separate category, if for some reason
you want to keep two read sets apart)
-shortPaired2 (see above) 

For concision, descriptors are stable: they remain in effect until contradicted
by another descriptor. This allows you to list as many filenames as you wish
without having to re-type identical descriptors.

Example:
> ./velveth testdir 21 -fasta -short solexa1.fa solexa2.fa solexa3.fa -long capillary.fa

In this example, all the files are considered to be in FASTA format; only the
read category changes.

Since the default options are -fasta and -short, the previous example can also
be written as:

> ./velveth testdir 21 solexa*.fa -long capillary.fa
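To illustrate how stable descriptors carry over from one filename to the next,
here is a small Python sketch that assigns a format and a category to each
filename in a velveth argument list. It only models the behaviour described
above and is not Velvet's own parser; the name assign_descriptors is
hypothetical.

```python
# Hypothetical model of velveth's "stable" descriptors; not Velvet's parser.
FORMATS = {"-fasta", "-fastq", "-eland"}
CATEGORIES = {"-long", "-short", "-shortPaired", "-short2", "-shortPaired2"}


def assign_descriptors(args):
    """Return (filename, format, category) for every filename in args.

    A descriptor stays in effect until a later descriptor of the same
    kind (format or category) replaces it.
    """
    file_format, category = "fasta", "short"  # defaults stated in this README
    assigned = []
    for arg in args:
        if arg in FORMATS:
            file_format = arg.lstrip("-")
        elif arg in CATEGORIES:
            category = arg.lstrip("-")
        else:
            assigned.append((arg, file_format, category))
    return assigned


if __name__ == "__main__":
    example = ["-fasta", "-short", "solexa1.fa", "solexa2.fa",
               "solexa3.fa", "-long", "capillary.fa"]
    for entry in assign_descriptors(example):
        print(entry)
```

Run on the first example above, every file comes out as fasta, and only
capillary.fa switches to the long category.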

NOTE: for practical reasons, the hash length has to be an odd number <= 31. If
you do not respect these constraints, Velvet will simply decrement the hash
length to a value it can handle.
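As a rough sketch, assuming Velvet first caps the value at 31 and then steps an
even value down by one, the adjustment amounts to the following. The function
name effective_hash_length is hypothetical and not part of Velvet.

```python
MAX_HASH_LENGTH = 31  # upper bound given in this README


def effective_hash_length(requested):
    """Return the hash length Velvet would fall back to (assumed rule):
    cap the value at 31, then step an even value down to the odd one below."""
    k = min(requested, MAX_HASH_LENGTH)
    if k % 2 == 0:
        k -= 1
    return k


if __name__ == "__main__":
    for requested in (21, 22, 31, 35):
        print(requested, "->", effective_hash_length(requested))
```

So asking for 22 would give you 21, and anything above 31 would give you 31.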

NOTE: just typing 
> ./velveth
...will produce a short help message.

C.2/ Running velvetg
--------------------

	C.2.a/ Just reads

Initially, you simply run:
> ./velvetg output_directory 

This will produce a fasta file of long nodes (> 100 bp) and output some stats
to output_directory/stats.txt. You can read those stats with any decent table
reader (I use R, but even Excel should do the job). Experience shows that many
short, low-coverage nodes are left over from the initial correction. Choose a
coverage cutoff value as you see fit, then run:

> ./velvetg output_directory coverage_cutoff

... where coverage_cutoff is the floating point value you wish to use.

The output will be identical in format and will overwrite the previous files,
so copy any results you want to keep before re-running velvetg.
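If you prefer a script to a spreadsheet, the following Python sketch counts how
many nodes fall below a few candidate coverage cutoffs. It assumes stats.txt is
a tab-separated table with a header row; the coverage column name ("short1_cov"
below) is an assumption, so check it against the header of your own file. The
function name nodes_below_cutoff is hypothetical.

```python
import csv
import io


def nodes_below_cutoff(stats_file, coverage_column, cutoffs):
    """Count nodes whose coverage is below each candidate cutoff.

    Assumes a tab-separated table with a header row; coverage_column
    names the column to use (its exact name is an assumption).
    """
    reader = csv.DictReader(stats_file, delimiter="\t")
    coverages = [float(row[coverage_column]) for row in reader]
    return {cutoff: sum(1 for cov in coverages if cov < cutoff)
            for cutoff in cutoffs}


if __name__ == "__main__":
    # Made-up rows standing in for output_directory/stats.txt; with a real
    # file, pass open("output_directory/stats.txt", newline="") instead.
    demo = io.StringIO("ID\tlgth\tshort1_cov\n"
                       "1\t50\t1.5\n"
                       "2\t200\t12.0\n"
                       "3\t80\t4.0\n")
    print(nodes_below_cutoff(demo, "short1_cov", [2.0, 5.0, 10.0]))
```

A cutoff just above the bulk of the low-coverage nodes is a reasonable first
guess, to be refined by re-running velvetg.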

	C.2.b/ With paired ends

REMINDER: you must have flagged your reads as being paired ends when running
velveth (cf. supra).

To activate the use of read pairs, simply add another parameter, the maximum
insert length (or at least a rough estimate). You therefore type:

> ./velvetg output_directory coverage_cutoff max_insert_length

This means you must also specify a coverage cutoff. If for some reason you do
not want one, just give a negative value:

> ./velvetg output_directory -1 max_insert_length

NOTE: just typing 
> ./velvetg
...will produce a short help message.

	C.2.c/ Using multiple categories 

You may want to keep several kinds of short read sets separate. For example,
if you have two paired-end experiments with different insert lengths, mixing
them together would lose information. This is why Velvet allows for the use of
two short read channels (plus the long reads, which are yet another category).

To do so, you simply need to use the appropriate descriptors when hashing the
reads (see C.1). Put the shorter inserts in the first category. Afterwards,
give velvetg one maximum insert length per category:

> ./velvetg output_directory coverage_cutoff max_insert_length1 max_insert_length2

NOTE: Increasing the number of categories is possible; it is simply a bit more
expensive memory-wise.

NOTE: In the stats file, you will find all three categories (long,
short1 and short2) treated separately.
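The argument rules of C.2.a to C.2.c can be summarised in a small Python helper
that builds a velvetg command line. This is a hypothetical convenience function,
not part of Velvet: it substitutes -1 when insert lengths are given without a
cutoff, and refuses insert lengths listed longest first, since the shorter
inserts should be in the first category.

```python
def velvetg_args(directory, coverage_cutoff=None, insert_lengths=()):
    """Build a velvetg argument list following sections C.2.a to C.2.c.

    Hypothetical helper, not part of Velvet.
    """
    insert_lengths = list(insert_lengths)
    if insert_lengths != sorted(insert_lengths):
        # The shorter inserts are expected in the first category.
        raise ValueError("list the shorter insert length first")
    args = ["./velvetg", directory]
    if coverage_cutoff is None and insert_lengths:
        # Insert lengths need a cutoff slot before them; negative means none.
        coverage_cutoff = -1
    if coverage_cutoff is not None:
        args.append(str(coverage_cutoff))
    args.extend(str(length) for length in insert_lengths)
    return args


if __name__ == "__main__":
    print(" ".join(velvetg_args("output_directory")))
    print(" ".join(velvetg_args("output_directory", insert_lengths=[200])))
    print(" ".join(velvetg_args("output_directory", 5, [200, 2000])))
```

The returned list could be handed to Python's subprocess.run if you want to
script several velvetg runs with different cutoffs.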

----------------------------------------------------------------------------------
D/ FILE FORMATS

Velvet works mainly with fasta and fastq formats. 

For paired-end reads, the assumption is that each read is next to its mate.
In other words, if the reads are indexed from 0, then reads 0 and 1 are paired,
as are 2 and 3, 4 and 5, etc.

If for some reason you have forward and reverse reads in two different FASTA files
but in corresponding order, the bundled Perl script shuffleSequences.pl will
merge the two files into one as appropriate.

To use it, just type:
> ./shuffleSequences.pl forward_reads.fa reverse_reads.fa output.fa
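For illustration only, here is a Python sketch of the same interleaving idea;
it is not the bundled shuffleSequences.pl. It assumes both files list mates in
the same order. The helper mate_index shows that, in the merged file, read i
(counted from 0) is paired with read i XOR 1. All function names here are
hypothetical.

```python
def read_fasta(lines):
    """Parse FASTA lines into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line, []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def interleave(forward, reverse):
    """Alternate forward and reverse records so each read sits next to its mate."""
    if len(forward) != len(reverse):
        raise ValueError("both files must contain the same number of reads")
    merged = []
    for fwd, rev in zip(forward, reverse):
        merged.extend([fwd, rev])
    return merged


def mate_index(i):
    """0-based index of the mate of read i in an interleaved file."""
    return i ^ 1


def to_fasta(records):
    """Serialise (header, sequence) tuples back to FASTA text."""
    return "".join(f"{header}\n{sequence}\n" for header, sequence in records)


if __name__ == "__main__":
    forward = read_fasta([">pair1/1", "ACGT", ">pair2/1", "GGCC", "TT"])
    reverse = read_fasta([">pair1/2", "TTGA", ">pair2/2", "AACC"])
    print(to_fasta(interleave(forward, reverse)), end="")
```

With real files, read_fasta accepts an open file object, for example
read_fasta(open("forward_reads.fa")).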


----------------------------------------------------------------------------------
E/ FOR MORE INFORMATION

E.1/ Website
------------

For general information and FAQ, you can first take a look at:

www.ebi.ac.uk/~zerbino/velvet/


E.2/ Mailing list
-----------------

For questions/requests/etc. you can subscribe to the users' mailing list: velvet-users@ebi.ac.uk 

To do so, see http://listserver.ebi.ac.uk/mailman/listinfo/velvet-users


E.3/ Contact emails
-------------------

For specific questions/requests you can contact us at the following addresses:
- Daniel Zerbino: zerbino@ebi.ac.uk
- Ewan Birney: birney@ebi.ac.uk
