Program Command Procedures to Interface with GenBank Data Base and Generate Alu Distribution Data Set
| Filename | Function | Invoke command |
| --- | --- | --- |
| C programs | | |
| pfollows3 | Finds adjacent pairs in map file within a specified distance | pfollows3 map 0 650 > source |
| vsub2 | Extracts library sequence files specified in map file | vsub2 map [library] Out |
| vflop | Changes the position of the columns | vflop |
| vext | Extracts sequences of given fragment boundaries from Out | vext |
| pplan | Sequence annotation, makes loci names unique | pplan |
| pcomp1 | Get the reverse complement of the sequence | pcomp1 cd.nd.uniq cd.nd.comp |
| pflank3 | Aligns the first Alu sequence in pair with the second | pflank3 [.uniq] [.comp] [.align] |
| prenum2 | Gets original coords. from .uniq files and restores them to .align | prenum2 [.align] [.uniq1] [.uniq2] [.align2] |
| Perl programs | | |
| get_cc_pairs | Extracts CC pairs from pfollows3-generated output | get_cc_pairs source > cc |
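| get_cd_pairs | Extracts CD pairs from pfollows3-generated output | get_cd_pairs source > cd |
| get_dc_pairs | Extracts DC pairs from pfollows3-generated output | get_dc_pairs source > dc |
| get_dd_pairs | Extracts DD pairs from pfollows3-generated output | get_dd_pairs source > dd |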
| reformat_grep | Calculates the alignment length and percent identity | reformat_grep cd.grep > cd.grep1 |
| coordinates | Gets the unaligned fragment coordinates from seq. file | coordinates [.uniq] |
| get_descrip | Gets the gene/cosmid sequence description from seq. file | get_descrip [.uniq] |
| Bins | Echoes the alignment stats for data within a length range | Bins [low] [high] [infile] |
| Bins2 | Sorts data according to (b+c) | Bins2 [low] [high] [infile] |
| Bins3 | Sorts data according to % identity | Bins3 [low] [high] [infile] |
| Shell scripts | | |
| CON1 | Translates multiple spaces into a single space | CON1 ../cd > cd |
| CON | Translates spaces to new lines (tr " " "\n" < $1) | CON cd > temp |
| t | Calls vflop and vext to extract sequences from Out | t |
| rename | Sends parameters to pplan and appends the extension .uniq | rename cc.st |
| ext_coord | Calls coordinates, vflop and get_definitions | ext_coord cc.nd.uniq |
| Batch files | | |
| Reformat | Invokes CON1 and CON, and renames temp to [input] | Reformat |
| Extract Pairs | Calls t for cc, cd, dc, and dd files | Extract Pairs |
| Make_uniq | Calls rename for cd.nd, cd.st, cd.tot etc. | Make_uniq |
| Batch_align | Calls pcomp1 and pflank3 to align all .st w/ .nd files | Align_in_batch |
| Mv_align | Renames all .align2 files to .align files | Mv_align |
| Grep_ac | Gets the coordinates for the aligned sequences | Grep_ac |
| Grep_as | Gets the alignment statistics from the aligned sequences | Grep_as |
| Paste_greps | Puts the alignment coordinates and statistics on one line | Paste_greps |
| Get_albcsi | Calls reformat_grep for all of the pasted .grep files | Get_albcsi |
| Coord_extr | Calls ext_coord for all files with the extension .uniq | Coord_extr |
| Sort_by_len | Calls Bins, puts in upper and lower limits for a length | Sort_by_len |
| Sort_by_bc | Calls Bins2, puts in upper and lower limits for b+c length | Sort_by_bc |
| put_in_bins | Calls Bins3, puts in upper and lower limits for % identity | put_in_bins |

- Files are viewable at http://dir.niehs.nih.gov/ALU/methods.html, except for the C programs, which were written and are maintained at the Genetic Information Research Institute and are available there.
- These programs request their parameters interactively, from inside the program.
- The filenames cc, dc, and dd may be substituted for cd.
- Extracts sequences from the 1st Alu in the pair, the 2nd Alu in the pair, and the entire region of the pair, and names them $1.nd, $1.st, and $1.tot, respectively ($1 = [input file]).
- The file extensions .nd and .tot may be substituted for .st.
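
The pair-finding step listed for pfollows3 (`pfollows3 map 0 650 > source`), followed by the split into CC, CD, DC, and DD pairs, can be illustrated with a minimal Python sketch. It is not the original C or Perl code. The record layout (`Element`), the function names, and the definition of distance as the gap between the end of the first element and the start of the second are all assumptions made for illustration; the real map file format is not described in the table.

```python
from typing import NamedTuple


class Element(NamedTuple):
    """One repeat annotation. This record layout is hypothetical."""
    locus: str
    start: int
    end: int
    orientation: str  # "C" or "D", assumed from the CC/CD/DC/DD pair names


def adjacent_pairs(elements, min_gap, max_gap):
    """Return neighbouring elements on the same locus whose gap is in range.

    The gap is taken as second.start - first.end, which is an assumption.
    """
    ordered = sorted(elements, key=lambda e: (e.locus, e.start))
    pairs = []
    for first, second in zip(ordered, ordered[1:]):
        if first.locus != second.locus:
            continue  # neighbours in the sorted list, but on different loci
        gap = second.start - first.end
        if min_gap <= gap <= max_gap:
            pairs.append((first, second))
    return pairs


def pair_class(pair):
    """Name a pair by the orientations of its two members, e.g. "CD"."""
    first, second = pair
    return first.orientation + second.orientation


# Made-up example records, only to exercise the functions above.
example = [
    Element("LOCUS1", 100, 400, "C"),
    Element("LOCUS1", 600, 900, "D"),
    Element("LOCUS1", 2000, 2300, "D"),
    Element("LOCUS2", 2400, 2700, "C"),
]
found = adjacent_pairs(example, 0, 650)
```

With the limits 0 and 650 taken from the invoke command, only the first two example records form a pair, and `pair_class` labels it CD, which corresponds to the kind of record get_cd_pairs would select.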

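The two reformatting steps described for CON1 and CON are simple text transformations. The following Python sketch reproduces the behavior stated in the table (runs of spaces collapsed to one space, then every space turned into a new line). The function names are hypothetical, and the sketch assumes that only the space character is affected, as in the `tr " " "\n"` command quoted for CON.

```python
import re


def squeeze_spaces(text):
    # CON1 step as described in the table: multiple spaces become one space.
    return re.sub(r" {2,}", " ", text)


def spaces_to_newlines(text):
    # CON step as described in the table: each space becomes a new line.
    return text.replace(" ", "\n")


def reformat(text):
    # Order follows the Reformat batch file: CON1 first, then CON.
    return spaces_to_newlines(squeeze_spaces(text))
```

Applying `reformat` to a line of space-separated fields yields one field per line, which appears to be the layout the later extraction steps expect, although the table does not state this explicitly.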

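The operation attributed to pcomp1, taking the reverse complement of a sequence before alignment, can be sketched in a few lines of Python. This is an illustration of the standard operation, not the original C program; the function name is hypothetical, and the handling of `N` and lowercase bases is an assumption.

```python
# Translation table for the four bases plus N, in both cases (an assumption;
# the alphabet accepted by pcomp1 is not stated in the table).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(sequence):
    # Complement every base, then reverse the whole string.
    return sequence.translate(_COMPLEMENT)[::-1]
```

For example, `reverse_complement("GATTACA")` returns `"TGTAATC"`. Characters outside the table pass through unchanged, so ambiguity codes other than N would need to be added before use on real data.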









