Biased Distribution of Inverted and Direct Alus in the Human Genome: Implications for Insertion, Exclusion, and Genome Stability

Table 5.

Program Command Procedures to Interface with GenBank Data Base and Generate Alu Distribution Data Set

Filename | Function | Invoke command
C programs
pfollows3 | Finds adjacent pairs in map file within a specified distance | pfollows3 map 0 650 > source
vsub2 | Extracts library sequence files specified in map file | vsub2 map [library] Out
vflop | Changes the position of the columns | vflop
vext | Extracts sequences of given fragment boundaries from Out | vext
pplan | Annotates sequences, makes loci names unique | pplan
pcomp1 | Gets the reverse complement of the sequence | pcomp1 cd.nd.uniq cd.nd.comp
pflank3 | Aligns the first Alu sequence in pair with the second | pflank3 [.uniq] [.comp] [.align]
prenum2 | Gets original coordinates from .unique files and restores them to .align | prenum2 [.align] [.uniq1] [.uniq2] [align2]
Perl programs
get_cc_pairs | Extracts CC pairs from pfollows3-generated output | get_cc_pairs source > cc
get_cd_pairs | Extracts CD pairs from pfollows3-generated output | get_cd_pairs source > cd
get_dc_pairs | Extracts DC pairs from pfollows3-generated output | get_dc_pairs source > dc
get_dd_pairs | Extracts DD pairs from pfollows3-generated output | get_dd_pairs source > dd
reformat_grep | Calculates the alignment length and percent identity | reformat_grep cd.grep > cd.grep1
coordinates | Gets the unaligned fragment coordinates from seq. file | coordinates [.uniq]
get_descrip | Gets the gene/cosmid sequence description from seq. file | get_descrip [.uniq]
Bins | Echoes the alignment stats for data within a length range | Bins [low] [high] [infile]
Bins2 | Sorts data according to (b+c) | Bins2 [low] [high] [infile]
Bins3 | Sorts data according to % identity | Bins3 [low] [high] [infile]
Shell scripts
CON1 | Translates multiple spaces into a single space | CON1 ../cd > cd
CON | Translates spaces to new lines (tr " " "\n" < $1) | CON cd > temp
t | Calls vflop and vext to extract sequences from Out | t
rename | Sends parameters to pplan, appends extension .uniq | rename cc.st
ext_coord | Calls coordinates, vflop and get_definitions | ext_coord cc.nd.uniq
Batch files
Reformat | Invokes CON1 and CON, and renames temp [input] | Reformat
Extract Pairs | Calls t for cc, cd, dc, and dd files | Extract Pairs
Make_uniq | Calls rename for cd.nd, cd.st, cd.tot etc. | Make_uniq
Batch_align | Calls pcomp1 and pflank3 to align all .st with .nd files | Align_in_batch
Mv_align | Renames all .align2 files to .align files | Mv_align
Grep_ac | Gets the coordinates for the aligned sequences | Grep_ac
Grep_as | Gets the alignment statistics from the aligned sequences | Grep_as
Paste_greps | Puts the alignment coordinates and statistics on one line | Paste_greps
Get_albcsi | Calls reformat_grep for all of the pasted .grep files | Get_albcsi
Coord_extr | Calls ext_coord for all files with the extension .uniq | Coord_extr
Sort_by_len | Calls Bins, puts in upper and lower limits for a length | Sort_by_len
Sort_by_bc | Calls Bins2, puts in upper and lower limits for b+c length | Sort_by_bc
put_in_bins | Calls Bins3, puts in upper and lower limits for % identity | put_in_bins
  • Files are viewable at http://dir.niehs.nih.gov/ALU/methods.html except for the C programs, which were written at, are maintained by, and are available from the Genetic Information Research Institute.

  • These programs request parameters from inside the program.

  • The filenames cc, dc, and dd may be substituted for cd.

  • Extracts sequences from the 1st Alu in the pair, the 2nd Alu in the pair, and the entire region of the pair and renames them $1.nd, $1.st, and $1.tot respectively ($1 = [input file]).

  • The file extensions .nd and .tot may be substituted for .st.
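
The first steps of the table (finding adjacent pairs within a distance window, sorting them into CC, CD, DC, and DD classes, and reverse-complementing a sequence) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' pfollows3, get_*_pairs, or pcomp1 code. The Alu record, its field names, the reading of C and D as complement and direct orientation, and the definition of distance as the start of the second element minus the end of the first are all hypothetical; the map file format itself is not given in the table.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class Alu(NamedTuple):
    """Hypothetical map-file record; the real map format is not specified."""
    locus: str
    start: int
    end: int
    strand: str  # assumed encoding: "D" = direct, "C" = complement


def adjacent_pairs(
    elements: Iterable[Alu], min_gap: int = 0, max_gap: int = 650
) -> Iterator[Tuple[Alu, Alu]]:
    """Yield neighbouring Alus on the same locus whose gap is in range.

    The defaults mirror the "0 650" arguments shown for pfollows3; the gap
    definition (second.start - first.end) is an assumption.
    """
    ordered = sorted(elements, key=lambda a: (a.locus, a.start))
    for first, second in zip(ordered, ordered[1:]):
        if first.locus != second.locus:
            continue  # never pair elements from different loci
        gap = second.start - first.end
        if min_gap <= gap <= max_gap:
            yield first, second


def pair_class(first: Alu, second: Alu) -> str:
    """Return the two-letter orientation class: CC, CD, DC, or DD."""
    return first.strand + second.strand


_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]
```

Grouping the output of adjacent_pairs by pair_class gives four lists that play the role of the cc, cd, dc, and dd files, and reverse_complement stands in for the step that turns a .uniq sequence into its .comp counterpart before alignment.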

Genome Res. 11: 12-27
