--- /dev/null
+
+
+
+ CLUSTAL 2.0.12 Multiple Sequence Alignments
+
+
+
+
+>> HELP NEW << NEW FEATURES/OPTIONS
+
+==UPGMA==
+ The UPGMA algorithm has been added to allow faster tree construction. The user now
+ has the choice of using Neighbour Joining or UPGMA. The default is still NJ, but the
+ user can change this by setting the clustering parameter.
+
+ -CLUSTERING= :NJ or UPGMA
+
+==ITERATION==
+
+ A remove first iteration scheme has been added. This can be used to improve the final
+ alignment or improve the alignment at each stage of the progressive alignment. During the
+ iteration step each sequence is removed in turn and realigned. If the resulting alignment
+ is better than the previous alignment it is kept. This process is repeated until the score
+ converges (the score is not improved) or until the maximum number of iterations is
+ reached. The user can iterate at each step of the progressive alignment by setting the
+ iteration parameter to TREE or just on the final alignment by seting the iteration
+ parameter to ALIGNMENT. The default is no iteration. The maximum number of iterations can
+ be set using the numiter parameter. The default number of iterations is 3.
+
+ -ITERATION= :NONE or TREE or ALIGNMENT
+
+ -NUMITER=n :Maximum number of iterations to perform
+
+==HELP==
+
+ -FULLHELP :Print out the complete help content
+
+==MISC==
+
+ -MAXSEQLEN=n :Maximum allowed sequence length
+
+ -QUIET :Reduce console output to minimum
+
+ -STATS=file :Log some alignents statistics to file
+
+
+>> HELP 1 << General help for CLUSTAL W (2.0.12)
+
+Clustal W is a general purpose multiple alignment program for DNA or proteins.
+
+SEQUENCE INPUT: all sequences must be in 1 file, one after another.
+7 formats are automatically recognised: NBRF-PIR, EMBL-SWISSPROT,
+Pearson (Fasta), Clustal (*.aln), GCG-MSF (Pileup), GCG9-RSF and GDE flat file.
+All non-alphabetic characters (spaces, digits, punctuation marks) are ignored
+except "-" which is used to indicate a GAP ("." in MSF-RSF).
+
+To do a MULTIPLE ALIGNMENT on a set of sequences, use item 1 from this menu to
+INPUT them; go to menu item 2 to do the multiple alignment.
+
+PROFILE ALIGNMENTS (menu item 3) are used to align 2 alignments. Use this to
+add a new sequence to an old alignment, or to use secondary structure to guide
+the alignment process. GAPS in the old alignments are indicated using the "-"
+character. PROFILES can be input in ANY of the allowed formats; just
+use "-" (or "." for MSF-RSF) for each gap position.
+
+PHYLOGENETIC TREES (menu item 4) can be calculated from old alignments (read in
+with "-" characters to indicate gaps) OR after a multiple alignment while the
+alignment is still in memory.
+
+
+The program tries to automatically recognise the different file formats used
+and to guess whether the sequences are amino acid or nucleotide. This is not
+always foolproof.
+
+FASTA and NBRF-PIR formats are recognised by having a ">" as the first
+character in the file.
+
+EMBL-Swiss Prot formats are recognised by the letters
+ID at the start of the file (the token for the entry name field).
+
+CLUSTAL format is recognised by the word CLUSTAL at the beginning of the file.
+
+GCG-MSF format is recognised by one of the following:
+ - the word PileUp at the start of the file.
+ - the word !!AA_MULTIPLE_ALIGNMENT or !!NA_MULTIPLE_ALIGNMENT
+ at the start of the file.
+ - the word MSF on the first line of the line, and the characters ..
+ at the end of this line.
+
+GCG-RSF format is recognised by the word !!RICH_SEQUENCE at the beginning of
+the file.
+
+
+If 85% or more of the characters in the sequence are from A,C,G,T,U or N, the
+sequence will be assumed to be nucleotide. This works in 97.3% of cases
+but watch out!
+
+
+>> HELP 2 << Help for multiple alignments
+
+If you have already loaded sequences, use menu item 1 to do the complete
+multiple alignment. You will be prompted for 2 output files: 1 for the
+alignment itself; another to store a dendrogram that describes the similarity
+of the sequences to each other.
+
+Multiple alignments are carried out in 3 stages (automatically done from menu
+item 1 ...Do complete multiple alignments now):
+
+1) all sequences are compared to each other (pairwise alignments);
+
+2) a dendrogram (like a phylogenetic tree) is constructed, describing the
+approximate groupings of the sequences by similarity (stored in a file).
+
+3) the final multiple alignment is carried out, using the dendrogram as a guide.
+
+
+PAIRWISE ALIGNMENT parameters control the speed-sensitivity of the initial
+alignments.
+
+MULTIPLE ALIGNMENT parameters control the gaps in the final multiple alignments.
+
+
+RESET GAPS (menu item 7) will remove any new gaps introduced into the sequences
+during multiple alignment if you wish to change the parameters and try again.
+This only takes effect just before you do a second multiple alignment. You
+can make phylogenetic trees after alignment whether or not this is ON.
+If you turn this OFF, the new gaps are kept even if you do a second multiple
+alignment. This allows you to iterate the alignment gradually. Sometimes, the
+alignment is improved by a second or third pass.
+
+SCREEN DISPLAY (menu item 8) can be used to send the output alignments to the
+screen as well as to the output file.
+
+You can skip the first stages (pairwise alignments; dendrogram) by using an
+old dendrogram file (menu item 3); or you can just produce the dendrogram
+with no final multiple alignment (menu item 2).
+
+
+OUTPUT FORMAT: Menu item 9 (format options) allows you to choose from 6
+different alignment formats (CLUSTAL, GCG, NBRF-PIR, PHYLIP, GDE, NEXUS, and FASTA).
+
+
+
+>> HELP 3 << Help for pairwise alignment parameters
+
+A distance is calculated between every pair of sequences and these are used to
+construct the dendrogram which guides the final multiple alignment. The scores
+are calculated from separate pairwise alignments. These can be calculated using
+2 methods: dynamic programming (slow but accurate) or by the method of Wilbur
+and Lipman (extremely fast but approximate).
+
+You can choose between the 2 alignment methods using menu option 8. The
+slow-accurate method is fine for short sequences but will be VERY SLOW for
+many (e.g. >100) long (e.g. >1000 residue) sequences.
+
+SLOW-ACCURATE alignment parameters:
+ These parameters do not have any affect on the speed of the alignments.
+They are used to give initial alignments which are then rescored to give percent
+identity scores. These % scores are the ones which are displayed on the
+screen. The scores are converted to distances for the trees.
+
+1) Gap Open Penalty: the penalty for opening a gap in the alignment.
+2) Gap extension penalty: the penalty for extending a gap by 1 residue.
+3) Protein weight matrix: the scoring table which describes the similarity
+ of each amino acid to each other.
+4) DNA weight matrix: the scores assigned to matches and mismatches
+ (including IUB ambiguity codes).
+
+
+FAST-APPROXIMATE alignment parameters:
+
+These similarity scores are calculated from fast, approximate, global align-
+ments, which are controlled by 4 parameters. 2 techniques are used to make
+these alignments very fast: 1) only exactly matching fragments (k-tuples) are
+considered; 2) only the 'best' diagonals (the ones with most k-tuple matches)
+are used.
+
+K-TUPLE SIZE: This is the size of exactly matching fragment that is used.
+INCREASE for speed (max= 2 for proteins; 4 for DNA), DECREASE for sensitivity.
+For longer sequences (e.g. >1000 residues) you may need to increase the default.
+
+GAP PENALTY: This is a penalty for each gap in the fast alignments. It has
+little affect on the speed or sensitivity except for extreme values.
+
+TOP DIAGONALS: The number of k-tuple matches on each diagonal (in an imaginary
+dot-matrix plot) is calculated. Only the best ones (with most matches) are
+used in the alignment. This parameter specifies how many. Decrease for speed;
+increase for sensitivity.
+
+WINDOW SIZE: This is the number of diagonals around each of the 'best'
+diagonals that will be used. Decrease for speed; increase for sensitivity.
+
+
+>> HELP 4 << Help for multiple alignment parameters
+
+These parameters control the final multiple alignment. This is the core of the
+program and the details are complicated. To fully understand the use of the
+parameters and the scoring system, you will have to refer to the documentation.
+
+Each step in the final multiple alignment consists of aligning two alignments
+or sequences. This is done progressively, following the branching order in
+the GUIDE TREE. The basic parameters to control this are two gap penalties and
+the scores for various identical-non-indentical residues.
+
+1) and 2) The GAP PENALTIES are set by menu items 1 and 2. These control the
+cost of opening up every new gap and the cost of every item in a gap.
+Increasing the gap opening penalty will make gaps less frequent. Increasing
+the gap extension penalty will make gaps shorter. Terminal gaps are not
+penalised.
+
+3) The DELAY DIVERGENT SEQUENCES switch delays the alignment of the most
+distantly related sequences until after the most closely related sequences have
+been aligned. The setting shows the percent identity level required to delay
+the addition of a sequence; sequences that are less identical than this level
+to any other sequences will be aligned later.
+
+
+
+4) The TRANSITION WEIGHT gives transitions (A <--> G or C <--> T
+i.e. purine-purine or pyrimidine-pyrimidine substitutions) a weight between 0
+and 1; a weight of zero means that the transitions are scored as mismatches,
+while a weight of 1 gives the transitions the match score. For distantly related
+DNA sequences, the weight should be near to zero; for closely related sequences
+it can be useful to assign a higher score.
+
+
+5) PROTEIN WEIGHT MATRIX leads to a new menu where you are offered a choice of
+weight matrices. The default for proteins in version 1.8 is the PAM series
+derived by Gonnet and colleagues. Note, a series is used! The actual matrix
+that is used depends on how similar the sequences to be aligned at this
+alignment step are. Different matrices work differently at each evolutionary
+distance.
+
+6) DNA WEIGHT MATRIX leads to a new menu where a single matrix (not a series)
+can be selected. The default is the matrix used by BESTFIT for comparison of
+nucleic acid sequences.
+
+Further help is offered in the weight matrix menu.
+
+
+7) In the weight matrices, you can use negative as well as positive values if
+you wish, although the matrix will be automatically adjusted to all positive
+scores, unless the NEGATIVE MATRIX option is selected.
+
+8) PROTEIN GAP PARAMETERS displays a menu allowing you to set some Gap Penalty
+options which are only used in protein alignments.
+
+
+>> HELP A << Help for protein gap parameters.
+
+1) RESIDUE SPECIFIC PENALTIES are amino acid specific gap penalties that reduce
+or increase the gap opening penalties at each position in the alignment or
+sequence. See the documentation for details. As an example, positions that
+are rich in glycine are more likely to have an adjacent gap than positions that
+are rich in valine.
+
+2) 3) HYDROPHILIC GAP PENALTIES are used to increase the chances of a gap within
+a run (5 or more residues) of hydrophilic amino acids; these are likely to
+be loop or random coil regions where gaps are more common. The residues that
+are "considered" to be hydrophilic are set by menu item 3.
+
+4) GAP SEPARATION DISTANCE tries to decrease the chances of gaps being too
+close to each other. Gaps that are less than this distance apart are penalised
+more than other gaps. This does not prevent close gaps; it makes them less
+frequent, promoting a block-like appearance of the alignment.
+
+5) END GAP SEPARATION treats end gaps just like internal gaps for the purposes
+of avoiding gaps that are too close (set by GAP SEPARATION DISTANCE above).
+If you turn this off, end gaps will be ignored for this purpose. This is
+useful when you wish to align fragments where the end gaps are not biologically
+meaningful.
+
+
+>> HELP 5 << Help for output format options.
+
+Six output formats are offered. You can choose any (or all 6 if you wish).
+
+CLUSTAL format output is a self explanatory alignment format. It shows the
+sequences aligned in blocks. It can be read in again at a later date to
+(for example) calculate a phylogenetic tree or add a new sequence with a
+profile alignment.
+
+GCG output can be used by any of the GCG programs that can work on multiple
+alignments (e.g. PRETTY, PROFILEMAKE, PLOTALIGN). It is the same as the GCG
+.msf format files (multiple sequence file); new in version 7 of GCG.
+
+PHYLIP format output can be used for input to the PHYLIP package of Joe
+Felsenstein. This is an extremely widely used package for doing every
+imaginable form of phylogenetic analysis (MUCH more than the the modest intro-
+duction offered by this program).
+
+NBRF-PIR: this is the same as the standard PIR format with ONE ADDITION. Gap
+characters "-" are used to indicate the positions of gaps in the multiple
+alignment. These files can be re-used as input in any part of clustal that
+allows sequences (or alignments or profiles) to be read in.
+
+GDE: this is the flat file format used by the GDE package of Steven Smith.
+
+NEXUS: the format used by several phylogeny programs, including PAUP and
+MacClade.
+
+GDE OUTPUT CASE: sequences in GDE format may be written in either upper or
+lower case.
+
+CLUSTALW SEQUENCE NUMBERS: residue numbers may be added to the end of the
+alignment lines in clustalw format.
+
+OUTPUT ORDER is used to control the order of the sequences in the output
+alignments. By default, the order corresponds to the order in which the
+sequences were aligned (from the guide tree-dendrogram), thus automatically
+grouping closely related sequences. This switch can be used to set the order
+to the same as the input file.
+
+PARAMETER OUTPUT: This option allows you to save all your parameter settings
+in a parameter file. This file can be used subsequently to rerun Clustal W
+using the same parameters.
+
+
+>> HELP 6 << Help for profile and structure alignments
+
+By PROFILE ALIGNMENT, we mean alignment using existing alignments. Profile
+alignments allow you to store alignments of your favourite sequences and add
+new sequences to them in small bunches at a time. A profile is simply an
+alignment of one or more sequences (e.g. an alignment output file from CLUSTAL
+W). Each input can be a single sequence. One or both sets of input sequences
+may include secondary structure assignments or gap penalty masks to guide the
+alignment.
+
+The profiles can be in any of the allowed input formats with "-" characters
+used to specify gaps (except for MSF-RSF where "." is used).
+
+You have to specify the 2 profiles by choosing menu items 1 and 2 and giving
+2 file names. Then Menu item 3 will align the 2 profiles to each other.
+Secondary structure masks in either profile can be used to guide the alignment.
+
+Menu item 4 will take the sequences in the second profile and align them to
+the first profile, 1 at a time. This is useful to add some new sequences to
+an existing alignment, or to align a set of sequences to a known structure.
+In this case, the second profile would not be pre-aligned.
+
+
+The alignment parameters can be set using menu items 5, 6 and 7. These are
+EXACTLY the same parameters as used by the general, automatic multiple
+alignment procedure. The general multiple alignment procedure is simply a
+series of profile alignments. Carrying out a series of profile alignments on
+larger and larger groups of sequences, allows you to manually build up a
+complete alignment, if necessary editing intermediate alignments.
+
+SECONDARY STRUCTURE OPTIONS. Menu Option 0 allows you to set 2D structure
+parameters. If a solved structure is available, it can be used to guide the
+alignment by raising gap penalties within secondary structure elements, so
+that gaps will preferentially be inserted into unstructured surface loops.
+Alternatively, a user-specified gap penalty mask can be supplied directly.
+
+A gap penalty mask is a series of numbers between 1 and 9, one per position in
+the alignment. Each number specifies how much the gap opening penalty is to be
+raised at that position (raised by multiplying the basic gap opening penalty
+by the number) i.e. a mask figure of 1 at a position means no change
+in gap opening penalty; a figure of 4 means that the gap opening penalty is
+four times greater at that position, making gaps 4 times harder to open.
+
+The format for gap penalty masks and secondary structure masks is explained
+in the help under option 0 (secondary structure options).
+
+
+>> HELP B << Help for secondary structure - gap penalty masks
+
+The use of secondary structure-based penalties has been shown to improve the
+accuracy of multiple alignment. Therefore CLUSTAL W now allows gap penalty
+masks to be supplied with the input sequences. The masks work by raising gap
+penalties in specified regions (typically secondary structure elements) so that
+gaps are preferentially opened in the less well conserved regions (typically
+surface loops).
+
+Options 1 and 2 control whether the input secondary structure information or
+gap penalty masks will be used.
+
+Option 3 controls whether the secondary structure and gap penalty masks should
+be included in the output alignment.
+
+Options 4 and 5 provide the value for raising the gap penalty at core Alpha
+Helical (A) and Beta Strand (B) residues. In CLUSTAL format, capital residues
+denote the A and B core structure notation. The basic gap penalties are
+multiplied by the amount specified.
+
+Option 6 provides the value for the gap penalty in Loops. By default this
+penalty is not raised. In CLUSTAL format, loops are specified by "." in the
+secondary structure notation.
+
+Option 7 provides the value for setting the gap penalty at the ends of
+secondary structures. Ends of secondary structures are observed to grow
+and-or shrink in related structures. Therefore by default these are given
+intermediate values, lower than the core penalties. All secondary structure
+read in as lower case in CLUSTAL format gets the reduced terminal penalty.
+
+Options 8 and 9 specify the range of structure termini for the intermediate
+penalties. In the alignment output, these are indicated as lower case.
+For Alpha Helices, by default, the range spans the end helical turn. For
+Beta Strands, the default range spans the end residue and the adjacent loop
+residue, since sequence conservation often extends beyond the actual H-bonded
+Beta Strand.
+
+CLUSTAL W can read the masks from SWISS-PROT, CLUSTAL or GDE format input
+files. For many 3-D protein structures, secondary structure information is
+recorded in the feature tables of SWISS-PROT database entries. You should
+always check that the assignments are correct - some are quite inaccurate.
+CLUSTAL W looks for SWISS-PROT HELIX and STRAND assignments e.g.
+
+FT HELIX 100 115
+FT STRAND 118 119
+
+The structure and penalty masks can also be read from CLUSTAL alignment format
+as comment lines beginning "!SS_" or "!GM_" e.g.
+
+!SS_HBA_HUMA ..aaaAAAAAAAAAAaaa.aaaAAAAAAAAAAaaaaaaAaaa.........aaaAAAAAA
+!GM_HBA_HUMA 112224444444444222122244444444442222224222111111111222444444
+HBA_HUMA VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGK
+
+Note that the mask itself is a set of numbers between 1 and 9 each of which is
+assigned to the residue(s) in the same column below.
+
+In GDE flat file format, the masks are specified as text and the names must
+begin with "SS_ or "GM_.
+
+Either a structure or penalty mask or both may be used. If both are included in
+an alignment, the user will be asked which is to be used.
+
+
+>> HELP C << Help for secondary structure - gap penalty mask output options
+
+The options in this menu let you choose whether or not to include the masks
+in the CLUSTAL W output alignments. Showing both is useful for understanding
+how the masks work. The secondary structure information is itself very useful
+in judging the alignment quality and in seeing how residue conservation
+patterns vary with secondary structure.
+
+
+>> HELP 7 << Help for phylogenetic trees
+
+1) Before calculating a tree, you must have an ALIGNMENT in memory. This can be
+input in any format or you should have just carried out a full multiple
+alignment and the alignment is still in memory.
+
+
+*************** Remember YOU MUST ALIGN THE SEQUENCES FIRST!!!! ***************
+
+
+The methods used are NJ (Neighbour Joining) and UPGMA. First
+you calculate distances (percent divergence) between all pairs of sequence from
+a multiple alignment; second you apply the NJ or UPGMA method to the distance matrix.
+
+2) EXCLUDE POSITIONS WITH GAPS? With this option, any alignment positions where
+ANY of the sequences have a gap will be ignored. This means that 'like' will be
+compared to 'like' in all distances, which is highly desirable. It also
+automatically throws away the most ambiguous parts of the alignment, which are
+concentrated around gaps (usually). The disadvantage is that you may throw away
+much of the data if there are many gaps (which is why it is difficult for us to
+make it the default).
+
+
+
+3) CORRECT FOR MULTIPLE SUBSTITUTIONS? For small divergence (say <10%) this
+option makes no difference. For greater divergence, it corrects for the fact
+that observed distances underestimate actual evolutionary distances. This is
+because, as sequences diverge, more than one substitution will happen at many
+sites. However, you only see one difference when you look at the present day
+sequences. Therefore, this option has the effect of stretching branch lengths
+in trees (especially long branches). The corrections used here (for DNA or
+proteins) are both due to Motoo Kimura. See the documentation for details.
+
+Where possible, this option should be used. However, for VERY divergent
+sequences, the distances cannot be reliably corrected. You will be warned if
+this happens. Even if none of the distances in a data set exceed the reliable
+threshold, if you bootstrap the data, some of the bootstrap distances may
+randomly exceed the safe limit.
+
+4) To calculate a tree, use option 4 (DRAW TREE NOW). This gives an UNROOTED
+tree and all branch lengths. The root of the tree can only be inferred by
+using an outgroup (a sequence that you are certain branches at the outside
+of the tree .... certain on biological grounds) OR if you assume a degree
+of constancy in the 'molecular clock', you can place the root in the 'middle'
+of the tree (roughly equidistant from all tips).
+
+5) TOGGLE PHYLIP BOOTSTRAP POSITIONS
+By default, the bootstrap values are correctly placed on the tree branches of
+the phylip format output tree. The toggle allows them to be placed on the
+nodes, which is incorrect, but some display packages (e.g. TreeTool, TreeView
+and Phylowin) only support node labelling but not branch labelling. Care
+should be taken to note which branches and labels go together.
+
+6) OUTPUT FORMATS: four different formats are allowed. None of these displays
+the tree visually. Useful display programs accepting PHYLIP format include
+NJplot (from Manolo Gouy and supplied with Clustal W), TreeView (Mac-PC), and
+PHYLIP itself - OR get the PHYLIP package and use the tree drawing facilities
+there. (Get the PHYLIP package anyway if you are interested in trees). The
+NEXUS format can be read into PAUP or MacClade.
+
+
+>> HELP 8 << Help for choosing a weight matrix
+
+For protein alignments, you use a weight matrix to determine the similarity of
+non-identical amino acids. For example, Tyr aligned with Phe is usually judged
+to be 'better' than Tyr aligned with Pro.
+
+There are three 'in-built' series of weight matrices offered. Each consists of
+several matrices which work differently at different evolutionary distances. To
+see the exact details, read the documentation. Crudely, we store several
+matrices in memory, spanning the full range of amino acid distance (from almost
+identical sequences to highly divergent ones). For very similar sequences, it
+is best to use a strict weight matrix which only gives a high score to
+identities and the most favoured conservative substitutions. For more divergent
+sequences, it is appropriate to use "softer" matrices which give a high score
+to many other frequent substitutions.
+
+1) BLOSUM (Henikoff). These matrices appear to be the best available for
+carrying out database similarity (homology searches). The matrices used are:
+Blosum 80, 62, 45 and 30. (BLOSUM was the default in earlier Clustal W
+versions)
+
+2) PAM (Dayhoff). These have been extremely widely used since the late '70s.
+We use the PAM 20, 60, 120 and 350 matrices.
+
+3) GONNET. These matrices were derived using almost the same procedure as the
+Dayhoff one (above) but are much more up to date and are based on a far larger
+data set. They appear to be more sensitive than the Dayhoff series. We use the
+GONNET 80, 120, 160, 250 and 350 matrices. This series is the default for
+Clustal W version 1.8.
+
+We also supply an identity matrix which gives a score of 1.0 to two identical
+amino acids and a score of zero otherwise. This matrix is not very useful.
+Alternatively, you can read in your own (just one matrix, not a series).
+
+A new matrix can be read from a file on disk, if the filename consists only
+of lower case characters. The values in the new weight matrix must be integers
+and the scores should be similarities. You can use negative as well as positive
+values if you wish, although the matrix will be automatically adjusted to all
+positive scores.
+
+
+
+For DNA, a single matrix (not a series) is used. Two hard-coded matrices are
+available:
+
+
+1) IUB. This is the default scoring matrix used by BESTFIT for the comparison
+of nucleic acid sequences. X's and N's are treated as matches to any IUB
+ambiguity symbol. All matches score 1.9; all mismatches for IUB symbols score 0.
+
+
+2) CLUSTALW(1.6). The previous system used by Clustal W, in which matches score
+1.0 and mismatches score 0. All matches for IUB symbols also score 0.
+
+INPUT FORMAT The format used for a new matrix is the same as the BLAST program.
+Any lines beginning with a # character are assumed to be comments. The first
+non-comment line should contain a list of amino acids in any order, using the
+1 letter code, followed by a * character. This should be followed by a square
+matrix of integer scores, with one row and one column for each amino acid. The
+last row and column of the matrix (corresponding to the * character) contain
+the minimum score over the whole matrix.
+
+
+>> HELP 9 << Help for command line parameters
+
+ DATA (sequences)
+
+-INFILE=file.ext :input sequences.
+-PROFILE1=file.ext and -PROFILE2=file.ext :profiles (old alignment).
+
+
+ VERBS (do things)
+
+-OPTIONS :list the command line parameters
+-HELP or -CHECK :outline the command line params.
+-FULLHELP :output full help content.
+-ALIGN :do full multiple alignment.
+-TREE :calculate NJ tree.
+-PIM :output percent identity matrix (while calculating the tree)
+-BOOTSTRAP(=n) :bootstrap a NJ tree (n= number of bootstraps; def. = 1000).
+-CONVERT :output the input sequences in a different file format.
+
+
+ PARAMETERS (set things)
+
+***General settings:****
+-INTERACTIVE :read command line, then enter normal interactive menus
+-QUICKTREE :use FAST algorithm for the alignment guide tree
+-TYPE= :PROTEIN or DNA sequences
+-NEGATIVE :protein alignment with negative values in matrix
+-OUTFILE= :sequence alignment file name
+-OUTPUT= :GCG, GDE, PHYLIP, PIR or NEXUS
+-OUTORDER= :INPUT or ALIGNED
+-CASE :LOWER or UPPER (for GDE output only)
+-SEQNOS= :OFF or ON (for Clustal output only)
+-SEQNO_RANGE=:OFF or ON (NEW: for all output formats)
+-RANGE=m,n :sequence range to write starting m to m+n
+-MAXSEQLEN=n :maximum allowed input sequence length
+-QUIET :Reduce console output to minimum
+-STATS= :Log some alignents statistics to file
+
+***Fast Pairwise Alignments:***
+-KTUPLE=n :word size
+-TOPDIAGS=n :number of best diags.
+-WINDOW=n :window around best diags.
+-PAIRGAP=n :gap penalty
+-SCORE :PERCENT or ABSOLUTE
+
+
+***Slow Pairwise Alignments:***
+-PWMATRIX= :Protein weight matrix=BLOSUM, PAM, GONNET, ID or filename
+-PWDNAMATRIX= :DNA weight matrix=IUB, CLUSTALW or filename
+-PWGAPOPEN=f :gap opening penalty
+-PWGAPEXT=f :gap opening penalty
+
+
+***Multiple Alignments:***
+-NEWTREE= :file for new guide tree
+-USETREE= :file for old guide tree
+-MATRIX= :Protein weight matrix=BLOSUM, PAM, GONNET, ID or filename
+-DNAMATRIX= :DNA weight matrix=IUB, CLUSTALW or filename
+-GAPOPEN=f :gap opening penalty
+-GAPEXT=f :gap extension penalty
+-ENDGAPS :no end gap separation pen.
+-GAPDIST=n :gap separation pen. range
+-NOPGAP :residue-specific gaps off
+-NOHGAP :hydrophilic gaps off
+-HGAPRESIDUES= :list hydrophilic res.
+-MAXDIV=n :% ident. for delay
+-TYPE= :PROTEIN or DNA
+-TRANSWEIGHT=f :transitions weighting
+-ITERATION= :NONE or TREE or ALIGNMENT
+-NUMITER=n :maximum number of iterations to perform
+-NOWEIGHTS :disable sequence weighting
+
+
+***Profile Alignments:***
+-PROFILE :Merge two alignments by profile alignment
+-NEWTREE1= :file for new guide tree for profile1
+-NEWTREE2= :file for new guide tree for profile2
+-USETREE1= :file for old guide tree for profile1
+-USETREE2= :file for old guide tree for profile2
+
+
+***Sequence to Profile Alignments:***
+-SEQUENCES :Sequentially add profile2 sequences to profile1 alignment
+-NEWTREE= :file for new guide tree
+-USETREE= :file for old guide tree
+
+
+***Structure Alignments:***
+-NOSECSTR1 :do not use secondary structure-gap penalty mask for profile 1
+-NOSECSTR2 :do not use secondary structure-gap penalty mask for profile 2
+-SECSTROUT=STRUCTURE or MASK or BOTH or NONE :output in alignment file
+-HELIXGAP=n :gap penalty for helix core residues
+-STRANDGAP=n :gap penalty for strand core residues
+-LOOPGAP=n :gap penalty for loop regions
+-TERMINALGAP=n :gap penalty for structure termini
+-HELIXENDIN=n :number of residues inside helix to be treated as terminal
+-HELIXENDOUT=n :number of residues outside helix to be treated as terminal
+-STRANDENDIN=n :number of residues inside strand to be treated as terminal
+-STRANDENDOUT=n:number of residues outside strand to be treated as terminal
+
+
+***Trees:***
+-OUTPUTTREE=nj OR phylip OR dist OR nexus
+-SEED=n :seed number for bootstraps.
+-KIMURA :use Kimura's correction.
+-TOSSGAPS :ignore positions with gaps.
+-BOOTLABELS=node OR branch :position of bootstrap values in tree display
+-CLUSTERING= :NJ or UPGMA
+
+
+>> HELP 0 << Help for tree output format options
+
+Four output formats are offered: 1) Clustal, 2) Phylip, 3) Just the distances
+4) Nexus
+
+None of these formats displays the results graphically. Many packages can
+display trees in the the PHYLIP format 2) below. It can also be imported into
+the PHYLIP programs RETREE, DRAWTREE and DRAWGRAM for graphical display.
+NEXUS format trees can be read by PAUP and MacClade.
+
+1) Clustal format output.
+This format is verbose and lists all of the distances between the sequences and
+the number of alignment positions used for each. The tree is described at the
+end of the file. It lists the sequences that are joined at each alignment step
+and the branch lengths. After two sequences are joined, it is referred to later
+as a NODE. The number of a NODE is the number of the lowest sequence in that
+NODE.
+
+2) Phylip format output.
+This format is the New Hampshire format, used by many phylogenetic analysis
+packages. It consists of a series of nested parentheses, describing the
+branching order, with the sequence names and branch lengths. It can be used by
+the RETREE, DRAWGRAM and DRAWTREE programs of the PHYLIP package to see the
+trees graphically. This is the same format used during multiple alignment for
+the guide trees.
+
+Use this format with NJplot (Manolo Gouy), supplied with Clustal W. Some other
+packages that can read and display New Hampshire format are TreeView (Mac/PC),
+TreeTool (UNIX), and Phylowin.
+
+3) The distances only.
+This format just outputs a matrix of all the pairwise distances in a format
+that can be used by the Phylip package. It used to be useful when one could not
+produce distances from protein sequences in the Phylip package but is now
+redundant (Protdist of Phylip 3.5 now does this).
+
+4) NEXUS FORMAT TREE. This format is used by several popular phylogeny programs,
+including PAUP and MacClade. The format is described fully in:
+Maddison, D. R., D. L. Swofford and W. P. Maddison. 1997.
+NEXUS: an extensible file format for systematic information.
+Systematic Biology 46:590-621.
+
+5) TOGGLE PHYLIP BOOTSTRAP POSITIONS
+By default, the bootstrap values are placed on the nodes of the phylip format
+output tree. This is inaccurate as the bootstrap values should be associated
+with the tree branches and not the nodes. However, this format can be read and
+displayed by TreeTool, TreeView and Phylowin. An option is available to
+correctly place the bootstrap values on the branches with which they are
+associated.