Table 5.

Program Command Procedures to Interface with GenBank Data Base and Generate Alu Distribution Data Set[i]

Filename | Function | Invoke command
C programs
 pfollows3 | Finds adjacent pairs in map file within a specified distance | pfollows3 map 0 650 > source
 vsub2 | Extracts library sequence files specified in map file | vsub2 map [library] Out
 vflop[ii] | Changes the position of the columns | vflop
 vext[ii] | Extracts sequences of given fragment boundaries from Out | vext
 pplan[ii] | Sequence annotation, makes loci names unique | pplan
 pcomp1 | Gets the reverse complement of the sequence | pcomp1 cd.nd.uniq cd.nd.comp[iii]
 pflank3 | Aligns the first Alu sequence in pair with the second | pflank3 [.uniq] [.comp] [.align]
 prenum2 | Gets original coords. from .unique files and restores to .align | prenum2 [.align] [.uniq1] [.uniq2] [align2]
Perl programs
 get_cc_pairs | Extracts CC pairs from pfollows3-generated output | get_cc_pairs source > cc
 get_cd_pairs | Extracts CD pairs from pfollows3-generated output | get_cd_pairs source > cd
 get_dc_pairs | Extracts DC pairs from pfollows3-generated output | get_dc_pairs source > dc
 get_dd_pairs | Extracts DD pairs from pfollows3-generated output | get_dd_pairs source > dd
 reformat_grep | Calculates the alignment length and percent identity | reformat_grep cd.grep > cd.grep1
 coordinates | Gets the unaligned fragment coordinates from seq. file | coordinates [.uniq]
 get_descrip | Gets the gene/cosmid sequence description from seq. file | get_descrip [.uniq]
 Bins | Echoes the alignment stats for data within a length range | Bins [low] [high] [infile]
 Bins2 | Sorts data according to (b+c) | Bins2 [low] [high] [infile]
 Bins3 | Sorts data according to % identity | Bins3 [low] [high] [infile]
Shell scripts
 CON1 | Translates multiple spaces into a single space | CON1 ../cd > cd[iii]
 CON | Translates spaces to new lines (tr " " "\n" <$1) | CON cd > temp[iii]
 t | Calls vflop and vext to extract sequences[iv] from Out | t
 rename | Sends parameters to pplan, appends extension .uniq | rename cc.st[iii] [v]
 ext_coord | Calls coordinates, vflop and get_definitions | ext_coord cc.nd.uniq[iii] [v]
Batch files
 Reformat | Invokes CON1 and CON, and renames temp [input] | Reformat
 Extract Pairs | Calls t for cc, cd, dc, and dd files | Extract Pairs
 Make_uniq | Calls rename for cd.nd, cd.st, cd.tot etc.[iii] [v] | Make_uniq
 Batch_align | Calls pcomp1 and pflank3 to align all .st w/ .nd files | Align_in_batch
 Mv_align | Renames all .align2 files to .align files | Mv_align
 Grep_ac | Gets the coordinates for the aligned sequences | Grep_ac
 Grep_as | Gets the alignment statistics from the aligned sequences | Grep_as
 Paste_greps | Puts the alignment coordinates and statistics on one line | Paste_greps
 Get_albcsi | Calls reformat_grep for all of the pasted .grep files | Get_albcsi
 Coord_extr | Calls ext_coord for all files with the extension .uniq | Coord_extr
 Sort_by_len | Calls Bins, puts in upper and lower limits for a length | Sort_by_len
 Sort_by_bc | Calls Bins2, puts in upper and lower limits for b+c length | Sort_by_bc
 put_in_bins | Calls Bins3, puts in upper and lower limits for % identity | put_in_bins

[i] Files are viewable at http://dir.niehs.nih.gov/ALU/methods.html except for the C programs, which were written at the Genetic Information Research Institute, where they are maintained and available.

[ii] These programs request parameters from inside the program.

[iii] The filenames cc, dc, and dd may be substituted for cd.

[iv] Extracts sequences from the 1st Alu in the pair, the 2nd Alu in the pair, and the entire region of the pair and renames them $1.nd, $1.st, and $1.tot respectively ($1 = [input file]).

[v] The file extensions .nd and .tot may be substituted for .st.
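
As an illustration only, two of the steps listed in Table 5 can be sketched in Python: taking the reverse complement of a sequence (the function attributed to pcomp1) and finding adjacent element pairs within a specified distance (the function attributed to pfollows3). This is not the original C code. The record layout (locus, start, end), the function names, the reading of "0 650" as a minimum and maximum gap, and the definition of the gap as the second element's start minus the first element's end are all assumptions made for the sketch.

```python
from typing import List, Tuple

# Hypothetical record: (locus, start, end) for one Alu element. The real map
# file format read by pfollows3 is not described in the table.
Element = Tuple[str, int, int]

# Translation table for complementing bases; other characters (e.g. N) pass through.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def adjacent_pairs(
    elements: List[Element], min_gap: int, max_gap: int
) -> List[Tuple[Element, Element]]:
    """Return consecutive same-locus pairs whose gap lies in [min_gap, max_gap].

    The gap is assumed to be second.start - first.end.
    """
    ordered = sorted(elements)  # sorts by locus, then start, then end
    pairs = []
    for first, second in zip(ordered, ordered[1:]):
        if first[0] != second[0]:
            continue  # elements on different loci are never paired
        gap = second[1] - first[2]
        if min_gap <= gap <= max_gap:
            pairs.append((first, second))
    return pairs
```

Under these assumptions, a call such as adjacent_pairs(elements, 0, 650) would play the role of "pfollows3 map 0 650 > source", and reverse_complement would be applied to one member of each pair before alignment.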