Sequence Assembly with CAFTOOLS

Table 1.

Summary of the CAFTOOLS

A. General CAF utilities, including tools for communication with other software
general CAF utilities
cafpad Converts an unpadded assembly to a padded one. All coordinate-dependent data are updated (written in C).
cafdepad Inverse of cafpad (C).
cafcat Concatenates and consolidates multiple CAF files into a single file. Also reports semantic errors (C).
cafmerge Merges two CAF files, replacing duplicated objects rather than concatenating them (cf. cafcat) (C).
caf2phrap Extracts a subset of sequences from a CAF file into three files: (1) FASTA DNA; (2) base quality; (3) a CAF stub of the remaining attributes. This is used to prepare data for phrap but also provides a general way to extract FASTA sequence data from CAF (C).
assembler support
cafphrap Takes a CAF file and an optional list of reads, assembles them with phrap and creates a new CAF file describing the assembly. No other postprocessing is done.
caffak Similar Perl wrapper for FAKII.
acembly support
caf2bly Converts a CAF file into an acembly database.
bly2caf Exports a CAF file from an acembly database.
cafbly Takes a CAF file and a script command file, reads the CAF file into an acembly database, runs the script, re-exports a CAF file on standard output, and cleans up.
gap support
caf2gap Converts a CAF file into a gap4 database. All CAF tags are converted to their gap4 equivalents (C).
gap2caf Creates a CAF representation of data in a gap4 database (C).
exp2caf Converts Staden Experiment files into CAF.
updatecaf
consed support
consed2caf Converts a phrap assembly or a consed database into CAF.
consed2gap Converts a phrap assembly or consed database into a gap4 database.
caf2phd Converts CAF reads to phred PHD files required for consed.
phd2caf Converts phred PHD files to CAF.
B. Specialized processing tools. Programs are written in Perl unless indicated otherwise.
tag generators
cafvector, caftagfeature, cafalu, cafcgi These wrappers extract contig sequence from CAF, screen it using blast or crossmatch against a library of sequences, and create tags for any matches found.
auto-editor
npedit Proposes edits for an assembly by examining the SCF traces in the context of the alignment. A new CAF file is generated listing the suggested edits as special edit tags that are parsed and acted on by ndedit (C).
ndedit Makes the edits proposed by npedit. Editing will change the DNA sequences of the reads. ndedit modifies the coordinates of all tags and base qualities appropriately (C).
clipping
ndclip Clips back all assembled reads according to the Clipping tags. We use this to postprocess phrap assemblies to restrict aligned reads to their higher-quality regions (C).
neclip Used after ndclip to extend back clipped reads where necessary to cover holes created (C).
cafsplit Alternative to neclip. Splits contigs at holes.
finishing
finish Analyzes the assembly to choose directed reads for the purpose of finishing.
cafcop Checks assembly for finishing errors and regions of insufficient sequence coverage.
clone overlap data management
Readraid Incorporates SCF traces and sequence of reads from overlapping regions of neighboring clones in the physical map.
Conraid Incorporates consensus sequence from overlapping regions of neighboring clones.
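
The coordinate updates that cafpad and cafdepad are described as performing can be illustrated with a minimal sketch. This is not the CAFTOOLS implementation (those tools are written in C); the function names, the '-' pad character, and the 0-based inclusive intervals are assumptions made for illustration only.

```python
def depad(padded, pad="-"):
    """Remove pad characters from a padded sequence.

    Returns the unpadded sequence and a list mapping each padded
    position to its unpadded position (None where the padded
    position holds a pad). The pad character is an assumption.
    """
    unpadded = []
    index_map = []
    for base in padded:
        if base == pad:
            index_map.append(None)
        else:
            index_map.append(len(unpadded))
            unpadded.append(base)
    return "".join(unpadded), index_map


def depad_interval(start, end, index_map):
    """Translate an inclusive, 0-based padded interval (for example a
    tag) into unpadded coordinates.

    Returns None if the interval covers only pads.
    """
    kept = [index_map[i] for i in range(start, end + 1)
            if index_map[i] is not None]
    if not kept:
        return None
    return kept[0], kept[-1]


# Hypothetical example: one read with three pads.
sequence, mapping = depad("AC-GT--A")
tag = depad_interval(1, 3, mapping)
```

Any coordinate-dependent attribute (tags, clipping positions, base qualities) would need the same translation, which is why a dedicated utility for the conversion is useful.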

This Article

  1. Genome Res. 8: 260-267
