From: pvtroshin Date: Tue, 21 Jun 2011 14:34:00 +0000 (+0000) Subject: getting rid of ambiguous documentation folder. all prog_doc are now stored in website... X-Git-Url: http://source.jalview.org/gitweb/?a=commitdiff_plain;h=ec9abbd86a00354b78df7d026275f4ceaca65605;p=jabaws.git getting rid of ambiguous documentation folder. all prog_doc are now stored in website/prog_docs git-svn-id: link to svn.lifesci.dundee.ac.uk/svn/barton/ptroshin/JABA2@4299 e3abac25-378b-4346-85de-24260fe3988d --- diff --git a/binaries/help/AACon_manual.txt b/binaries/help/AACon_manual.txt deleted file mode 100644 index e9b4b53..0000000 --- a/binaries/help/AACon_manual.txt +++ /dev/null @@ -1,94 +0,0 @@ - -AA Conservation version 1.0b (2 September 2010) - -This program allows calculation of conservation of amino acids in -multiple sequence alignments. -It implements 17 different conservation scores as described by Valdar in -his paper (Scoring Residue Conservation, PROTEINS: Structure, Function -and Bioinformatics 48:227-241 (2002)) and SMERFS scoring algorithm as described -by Manning, Jefferson and Barton (The contrasting properties of conservation -and correlated phylogeny in protein functional residue prediction, -BMC Bioinformatics (2008)). - -The conservation algorithms supported are: - -KABAT, JORES, SCHNEIDER, SHENKIN, GERSTEIN, TAYLOR_GAPS, TAYLOR_NO_GAPS, -ZVELIBIL, KARLIN, ARMON, THOMPSON, NOT_LANCET, MIRNY, WILLIAMSON, -LANDGRAF, SANDER, VALDAR, SMERFS - -Input format is either a FASTA formatted file containing aligned sequences with -gaps or a Clustal alignment. The valid gap characters are *, -, space character, -X and . (a dot). By default program prints the results to the command window. -If the output file is provided the results are printed to the file in two -possible formats with or without an alignment. -If format is not specified, the program outputs conservation scores without -alignment. The scores are not normalized by default but they can be (see below). 
-SMERFS default parameters are window width of 7, column score is set to -the middle column (MID_SCORE), gap% cutoff of 0.1. Different parameters for SMERFS -can be provided (see below). Details of the program execution can be recorded to -a separate file if an appropriate file path is provided. - -List of command line arguments: - --m= precedes a comma separated list of method names - EXAMPLE: -m=KABAT,JORES,GERSTEIN - Optional, if no method is specified request for all is assumed. - --i= precedes a full path to the input FASTA file, required - --o= precedes a full path to the output file, optional, if no output file is - provided the program will output to the standard out. - --t= precedes the number of CPUs (CPU cores more precisely) to use. Optional, - defaults to all processors available on the machine. - --f= precedes the format of the results in the output file - two different formats are possible: - RESULT_WITH_ALIGNMENT - RESULT_NO_ALIGNMENT - Optional, if not specified RESULT_NO_ALIGNMENT is assumed - --d= precedes a full path to a file where program execution details are to be - listed. Optional, if not provided, no execution statistics are produced. - --g= precedes comma separated list of gap characters provided by the user, if - you're using an unusual gap character (not a -,., ,*,X) you have to - provide it. If you provide this list you have to list all the gaps - accepted, including those that were previously treated as a default. - Optional. - --n using this key causes the results to be normalized. - Normalized results have values between 0 and 1. Please note however, that - some results cannot be normalized. In such a case, the system returns the - unnormalized value and logs the issue to the standard error stream. - The following formula is used for normalization - n = (d - dmin)/(dmax - dmin) - Negative results are first converted to positive by adding the absolute value of - the most negative result. Optional. 
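The normalization described for the -n key can be sketched in Python as follows. This is a minimal illustration of the formula n = (d - dmin)/(dmax - dmin), not the AACon implementation. The function name `normalize` is hypothetical, and treating a column of identical scores as the "cannot be normalized" case is an assumption, since the help text does not say which results fall into that category.

```python
from typing import List


def normalize(scores: List[float]) -> List[float]:
    """Min-max normalise scores into the range 0..1 (illustrative sketch)."""
    if not scores:
        return []
    lowest = min(scores)
    # Negative results are first made non-negative by adding the absolute
    # value of the most negative result, as the help text describes.
    shift = abs(lowest) if lowest < 0 else 0.0
    shifted = [s + shift for s in scores]
    d_min, d_max = min(shifted), max(shifted)
    if d_max == d_min:
        # Assumption: identical scores cannot be normalised, so the
        # original values are returned unchanged.
        return list(scores)
    # n = (d - dmin) / (dmax - dmin)
    return [(d - d_min) / (d_max - d_min) for d in shifted]
```

For instance, the scores -2, 0 and 2 are shifted to 0, 2 and 4 and then mapped to 0, 0.5 and 1.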
- -SMERFS Only Parameters: - --smerfsGT= precedes SMERFS Gap Threshold - a gap percentage cutoff - - a float greater than 0 and smaller than or equal to 1. Optional defaults - to 0.1 - --smerfsCS= precedes SMERFS Column Score algorithm defines the window scores to - columns allocation , two methods are possible: - MID_SCORE - gives the window score to the middle column - MAX_SCORE - gives the column the highest score of all the windows it - belongs to. Optional defaults to MID_SCORE. - --smerfsWW= precedes Window Width parameter - an integer and an odd number. - Optional, defaults to 7 - - -EXAMPLE HOW TO RUN THE PROGRAM: -java -jar -m=KABAT,SMERFS -i=prot1 -o=prot1_results -n - -As a result of the execution KABAT and SMERFS scores will be calculated. -Input comes from prot1 file and an output without an alignment is recorded to -prot1_results file. - -Authors: Peter Troshin, Agnieszka Golicz, David Martin and Geoff Barton. -Please visit http://www.compbio.dundee.ac.uk/aacon for further information. - \ No newline at end of file diff --git a/binaries/help/clustalw-help.txt b/binaries/help/clustalw-help.txt deleted file mode 100644 index fd50671..0000000 --- a/binaries/help/clustalw-help.txt +++ /dev/null @@ -1,720 +0,0 @@ - - - - CLUSTAL 2.0.12 Multiple Sequence Alignments - - - - ->> HELP NEW << NEW FEATURES/OPTIONS - -==UPGMA== - The UPGMA algorithm has been added to allow faster tree construction. The user now - has the choice of using Neighbour Joining or UPGMA. The default is still NJ, but the - user can change this by setting the clustering parameter. - - -CLUSTERING= :NJ or UPGMA - -==ITERATION== - - A remove first iteration scheme has been added. This can be used to improve the final - alignment or improve the alignment at each stage of the progressive alignment. During the - iteration step each sequence is removed in turn and realigned. If the resulting alignment - is better than the previous alignment it is kept. 
This process is repeated until the score - converges (the score is not improved) or until the maximum number of iterations is - reached. The user can iterate at each step of the progressive alignment by setting the - iteration parameter to TREE or just on the final alignment by setting the iteration - parameter to ALIGNMENT. The default is no iteration. The maximum number of iterations can - be set using the numiter parameter. The default number of iterations is 3. - - -ITERATION= :NONE or TREE or ALIGNMENT - - -NUMITER=n :Maximum number of iterations to perform - -==HELP== - - -FULLHELP :Print out the complete help content - -==MISC== - - -MAXSEQLEN=n :Maximum allowed sequence length - - -QUIET :Reduce console output to minimum - - -STATS=file :Log some alignment statistics to file - - ->> HELP 1 << General help for CLUSTAL W (2.0.12) - -Clustal W is a general purpose multiple alignment program for DNA or proteins. - -SEQUENCE INPUT: all sequences must be in 1 file, one after another. -7 formats are automatically recognised: NBRF-PIR, EMBL-SWISSPROT, -Pearson (Fasta), Clustal (*.aln), GCG-MSF (Pileup), GCG9-RSF and GDE flat file. -All non-alphabetic characters (spaces, digits, punctuation marks) are ignored -except "-" which is used to indicate a GAP ("." in MSF-RSF). - -To do a MULTIPLE ALIGNMENT on a set of sequences, use item 1 from this menu to -INPUT them; go to menu item 2 to do the multiple alignment. - -PROFILE ALIGNMENTS (menu item 3) are used to align 2 alignments. Use this to -add a new sequence to an old alignment, or to use secondary structure to guide -the alignment process. GAPS in the old alignments are indicated using the "-" -character. PROFILES can be input in ANY of the allowed formats; just -use "-" (or "." for MSF-RSF) for each gap position. - -PHYLOGENETIC TREES (menu item 4) can be calculated from old alignments (read in -with "-" characters to indicate gaps) OR after a multiple alignment while the -alignment is still in memory. 
- - -The program tries to automatically recognise the different file formats used -and to guess whether the sequences are amino acid or nucleotide. This is not -always foolproof. - -FASTA and NBRF-PIR formats are recognised by having a ">" as the first -character in the file. - -EMBL-Swiss Prot formats are recognised by the letters -ID at the start of the file (the token for the entry name field). - -CLUSTAL format is recognised by the word CLUSTAL at the beginning of the file. - -GCG-MSF format is recognised by one of the following: - - the word PileUp at the start of the file. - - the word !!AA_MULTIPLE_ALIGNMENT or !!NA_MULTIPLE_ALIGNMENT - at the start of the file. - - the word MSF on the first line of the file, and the characters .. - at the end of this line. - -GCG-RSF format is recognised by the word !!RICH_SEQUENCE at the beginning of -the file. - - -If 85% or more of the characters in the sequence are from A,C,G,T,U or N, the -sequence will be assumed to be nucleotide. This works in 97.3% of cases -but watch out! - - ->> HELP 2 << Help for multiple alignments - -If you have already loaded sequences, use menu item 1 to do the complete -multiple alignment. You will be prompted for 2 output files: 1 for the -alignment itself; another to store a dendrogram that describes the similarity -of the sequences to each other. - -Multiple alignments are carried out in 3 stages (automatically done from menu -item 1 ...Do complete multiple alignments now): - -1) all sequences are compared to each other (pairwise alignments); - -2) a dendrogram (like a phylogenetic tree) is constructed, describing the -approximate groupings of the sequences by similarity (stored in a file). - -3) the final multiple alignment is carried out, using the dendrogram as a guide. - - -PAIRWISE ALIGNMENT parameters control the speed-sensitivity of the initial -alignments. - -MULTIPLE ALIGNMENT parameters control the gaps in the final multiple alignments. 
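The 85% rule for guessing the sequence type, given under HELP 1 above, can be sketched in Python as follows. This is an illustration of the stated rule only, not the Clustal W source. The function name `looks_like_nucleotide` is hypothetical, and counting only alphabetic characters is an assumption based on the statement that non-alphabetic characters are ignored on input.

```python
NUCLEOTIDE_CODES = set("ACGTUN")


def looks_like_nucleotide(sequence: str, threshold: float = 0.85) -> bool:
    """Guess whether a sequence is nucleotide using the 85% rule."""
    # Assumption: gaps, digits and punctuation are skipped before counting.
    letters = [c for c in sequence.upper() if c.isalpha()]
    if not letters:
        return False
    hits = sum(1 for c in letters if c in NUCLEOTIDE_CODES)
    # Nucleotide if at least 85% of the letters are A, C, G, T, U or N.
    return hits / len(letters) >= threshold
```

A short peptide that is rich in A, C, G and T would be misclassified by this rule, which is why the help text warns that the guess is not foolproof.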
- - -RESET GAPS (menu item 7) will remove any new gaps introduced into the sequences -during multiple alignment if you wish to change the parameters and try again. -This only takes effect just before you do a second multiple alignment. You -can make phylogenetic trees after alignment whether or not this is ON. -If you turn this OFF, the new gaps are kept even if you do a second multiple -alignment. This allows you to iterate the alignment gradually. Sometimes, the -alignment is improved by a second or third pass. - -SCREEN DISPLAY (menu item 8) can be used to send the output alignments to the -screen as well as to the output file. - -You can skip the first stages (pairwise alignments; dendrogram) by using an -old dendrogram file (menu item 3); or you can just produce the dendrogram -with no final multiple alignment (menu item 2). - - -OUTPUT FORMAT: Menu item 9 (format options) allows you to choose from 7 -different alignment formats (CLUSTAL, GCG, NBRF-PIR, PHYLIP, GDE, NEXUS, and FASTA). - - - ->> HELP 3 << Help for pairwise alignment parameters - -A distance is calculated between every pair of sequences and these are used to -construct the dendrogram which guides the final multiple alignment. The scores -are calculated from separate pairwise alignments. These can be calculated using -2 methods: dynamic programming (slow but accurate) or by the method of Wilbur -and Lipman (extremely fast but approximate). - -You can choose between the 2 alignment methods using menu option 8. The -slow-accurate method is fine for short sequences but will be VERY SLOW for -many (e.g. >100) long (e.g. >1000 residue) sequences. - -SLOW-ACCURATE alignment parameters: - These parameters do not have any effect on the speed of the alignments. -They are used to give initial alignments which are then rescored to give percent -identity scores. These % scores are the ones which are displayed on the -screen. The scores are converted to distances for the trees. 
- -1) Gap Open Penalty: the penalty for opening a gap in the alignment. -2) Gap extension penalty: the penalty for extending a gap by 1 residue. -3) Protein weight matrix: the scoring table which describes the similarity - of each amino acid to each other. -4) DNA weight matrix: the scores assigned to matches and mismatches - (including IUB ambiguity codes). - - -FAST-APPROXIMATE alignment parameters: - -These similarity scores are calculated from fast, approximate, global align- -ments, which are controlled by 4 parameters. 2 techniques are used to make -these alignments very fast: 1) only exactly matching fragments (k-tuples) are -considered; 2) only the 'best' diagonals (the ones with most k-tuple matches) -are used. - -K-TUPLE SIZE: This is the size of exactly matching fragment that is used. -INCREASE for speed (max= 2 for proteins; 4 for DNA), DECREASE for sensitivity. -For longer sequences (e.g. >1000 residues) you may need to increase the default. - -GAP PENALTY: This is a penalty for each gap in the fast alignments. It has -little effect on the speed or sensitivity except for extreme values. - -TOP DIAGONALS: The number of k-tuple matches on each diagonal (in an imaginary -dot-matrix plot) is calculated. Only the best ones (with most matches) are -used in the alignment. This parameter specifies how many. Decrease for speed; -increase for sensitivity. - -WINDOW SIZE: This is the number of diagonals around each of the 'best' -diagonals that will be used. Decrease for speed; increase for sensitivity. - - ->> HELP 4 << Help for multiple alignment parameters - -These parameters control the final multiple alignment. This is the core of the -program and the details are complicated. To fully understand the use of the -parameters and the scoring system, you will have to refer to the documentation. - -Each step in the final multiple alignment consists of aligning two alignments -or sequences. 
This is done progressively, following the branching order in -the GUIDE TREE. The basic parameters to control this are two gap penalties and -the scores for various identical-non-identical residues. - -1) and 2) The GAP PENALTIES are set by menu items 1 and 2. These control the -cost of opening up every new gap and the cost of every item in a gap. -Increasing the gap opening penalty will make gaps less frequent. Increasing -the gap extension penalty will make gaps shorter. Terminal gaps are not -penalised. - -3) The DELAY DIVERGENT SEQUENCES switch delays the alignment of the most -distantly related sequences until after the most closely related sequences have -been aligned. The setting shows the percent identity level required to delay -the addition of a sequence; sequences that are less identical than this level -to any other sequences will be aligned later. - - - -4) The TRANSITION WEIGHT gives transitions (A <--> G or C <--> T -i.e. purine-purine or pyrimidine-pyrimidine substitutions) a weight between 0 -and 1; a weight of zero means that the transitions are scored as mismatches, -while a weight of 1 gives the transitions the match score. For distantly related -DNA sequences, the weight should be near to zero; for closely related sequences -it can be useful to assign a higher score. - - -5) PROTEIN WEIGHT MATRIX leads to a new menu where you are offered a choice of -weight matrices. The default for proteins in version 1.8 is the PAM series -derived by Gonnet and colleagues. Note, a series is used! The actual matrix -that is used depends on how similar the sequences to be aligned at this -alignment step are. Different matrices work differently at each evolutionary -distance. - -6) DNA WEIGHT MATRIX leads to a new menu where a single matrix (not a series) -can be selected. The default is the matrix used by BESTFIT for comparison of -nucleic acid sequences. - -Further help is offered in the weight matrix menu. 
- - -7) In the weight matrices, you can use negative as well as positive values if -you wish, although the matrix will be automatically adjusted to all positive -scores, unless the NEGATIVE MATRIX option is selected. - -8) PROTEIN GAP PARAMETERS displays a menu allowing you to set some Gap Penalty -options which are only used in protein alignments. - - ->> HELP A << Help for protein gap parameters. - -1) RESIDUE SPECIFIC PENALTIES are amino acid specific gap penalties that reduce -or increase the gap opening penalties at each position in the alignment or -sequence. See the documentation for details. As an example, positions that -are rich in glycine are more likely to have an adjacent gap than positions that -are rich in valine. - -2) 3) HYDROPHILIC GAP PENALTIES are used to increase the chances of a gap within -a run (5 or more residues) of hydrophilic amino acids; these are likely to -be loop or random coil regions where gaps are more common. The residues that -are "considered" to be hydrophilic are set by menu item 3. - -4) GAP SEPARATION DISTANCE tries to decrease the chances of gaps being too -close to each other. Gaps that are less than this distance apart are penalised -more than other gaps. This does not prevent close gaps; it makes them less -frequent, promoting a block-like appearance of the alignment. - -5) END GAP SEPARATION treats end gaps just like internal gaps for the purposes -of avoiding gaps that are too close (set by GAP SEPARATION DISTANCE above). -If you turn this off, end gaps will be ignored for this purpose. This is -useful when you wish to align fragments where the end gaps are not biologically -meaningful. - - ->> HELP 5 << Help for output format options. - -Six output formats are offered. You can choose any (or all 6 if you wish). - -CLUSTAL format output is a self explanatory alignment format. It shows the -sequences aligned in blocks. 
It can be read in again at a later date to -(for example) calculate a phylogenetic tree or add a new sequence with a -profile alignment. - -GCG output can be used by any of the GCG programs that can work on multiple -alignments (e.g. PRETTY, PROFILEMAKE, PLOTALIGN). It is the same as the GCG -.msf format files (multiple sequence file); new in version 7 of GCG. - -PHYLIP format output can be used for input to the PHYLIP package of Joe -Felsenstein. This is an extremely widely used package for doing every -imaginable form of phylogenetic analysis (MUCH more than the modest intro- -duction offered by this program). - -NBRF-PIR: this is the same as the standard PIR format with ONE ADDITION. Gap -characters "-" are used to indicate the positions of gaps in the multiple -alignment. These files can be re-used as input in any part of clustal that -allows sequences (or alignments or profiles) to be read in. - -GDE: this is the flat file format used by the GDE package of Steven Smith. - -NEXUS: the format used by several phylogeny programs, including PAUP and -MacClade. - -GDE OUTPUT CASE: sequences in GDE format may be written in either upper or -lower case. - -CLUSTALW SEQUENCE NUMBERS: residue numbers may be added to the end of the -alignment lines in clustalw format. - -OUTPUT ORDER is used to control the order of the sequences in the output -alignments. By default, the order corresponds to the order in which the -sequences were aligned (from the guide tree-dendrogram), thus automatically -grouping closely related sequences. This switch can be used to set the order -to the same as the input file. - -PARAMETER OUTPUT: This option allows you to save all your parameter settings -in a parameter file. This file can be used subsequently to rerun Clustal W -using the same parameters. - - ->> HELP 6 << Help for profile and structure alignments - -By PROFILE ALIGNMENT, we mean alignment using existing alignments. 
Profile -alignments allow you to store alignments of your favourite sequences and add -new sequences to them in small bunches at a time. A profile is simply an -alignment of one or more sequences (e.g. an alignment output file from CLUSTAL -W). Each input can be a single sequence. One or both sets of input sequences -may include secondary structure assignments or gap penalty masks to guide the -alignment. - -The profiles can be in any of the allowed input formats with "-" characters -used to specify gaps (except for MSF-RSF where "." is used). - -You have to specify the 2 profiles by choosing menu items 1 and 2 and giving -2 file names. Then Menu item 3 will align the 2 profiles to each other. -Secondary structure masks in either profile can be used to guide the alignment. - -Menu item 4 will take the sequences in the second profile and align them to -the first profile, 1 at a time. This is useful to add some new sequences to -an existing alignment, or to align a set of sequences to a known structure. -In this case, the second profile would not be pre-aligned. - - -The alignment parameters can be set using menu items 5, 6 and 7. These are -EXACTLY the same parameters as used by the general, automatic multiple -alignment procedure. The general multiple alignment procedure is simply a -series of profile alignments. Carrying out a series of profile alignments on -larger and larger groups of sequences, allows you to manually build up a -complete alignment, if necessary editing intermediate alignments. - -SECONDARY STRUCTURE OPTIONS. Menu Option 0 allows you to set 2D structure -parameters. If a solved structure is available, it can be used to guide the -alignment by raising gap penalties within secondary structure elements, so -that gaps will preferentially be inserted into unstructured surface loops. -Alternatively, a user-specified gap penalty mask can be supplied directly. 
- -A gap penalty mask is a series of numbers between 1 and 9, one per position in -the alignment. Each number specifies how much the gap opening penalty is to be -raised at that position (raised by multiplying the basic gap opening penalty -by the number) i.e. a mask figure of 1 at a position means no change -in gap opening penalty; a figure of 4 means that the gap opening penalty is -four times greater at that position, making gaps 4 times harder to open. - -The format for gap penalty masks and secondary structure masks is explained -in the help under option 0 (secondary structure options). - - ->> HELP B << Help for secondary structure - gap penalty masks - -The use of secondary structure-based penalties has been shown to improve the -accuracy of multiple alignment. Therefore CLUSTAL W now allows gap penalty -masks to be supplied with the input sequences. The masks work by raising gap -penalties in specified regions (typically secondary structure elements) so that -gaps are preferentially opened in the less well conserved regions (typically -surface loops). - -Options 1 and 2 control whether the input secondary structure information or -gap penalty masks will be used. - -Option 3 controls whether the secondary structure and gap penalty masks should -be included in the output alignment. - -Options 4 and 5 provide the value for raising the gap penalty at core Alpha -Helical (A) and Beta Strand (B) residues. In CLUSTAL format, capital residues -denote the A and B core structure notation. The basic gap penalties are -multiplied by the amount specified. - -Option 6 provides the value for the gap penalty in Loops. By default this -penalty is not raised. In CLUSTAL format, loops are specified by "." in the -secondary structure notation. - -Option 7 provides the value for setting the gap penalty at the ends of -secondary structures. Ends of secondary structures are observed to grow -and-or shrink in related structures. 
Therefore by default these are given -intermediate values, lower than the core penalties. All secondary structure -read in as lower case in CLUSTAL format gets the reduced terminal penalty. - -Options 8 and 9 specify the range of structure termini for the intermediate -penalties. In the alignment output, these are indicated as lower case. -For Alpha Helices, by default, the range spans the end helical turn. For -Beta Strands, the default range spans the end residue and the adjacent loop -residue, since sequence conservation often extends beyond the actual H-bonded -Beta Strand. - -CLUSTAL W can read the masks from SWISS-PROT, CLUSTAL or GDE format input -files. For many 3-D protein structures, secondary structure information is -recorded in the feature tables of SWISS-PROT database entries. You should -always check that the assignments are correct - some are quite inaccurate. -CLUSTAL W looks for SWISS-PROT HELIX and STRAND assignments e.g. - -FT HELIX 100 115 -FT STRAND 118 119 - -The structure and penalty masks can also be read from CLUSTAL alignment format -as comment lines beginning "!SS_" or "!GM_" e.g. - -!SS_HBA_HUMA ..aaaAAAAAAAAAAaaa.aaaAAAAAAAAAAaaaaaaAaaa.........aaaAAAAAA -!GM_HBA_HUMA 112224444444444222122244444444442222224222111111111222444444 -HBA_HUMA VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGK - -Note that the mask itself is a set of numbers between 1 and 9 each of which is -assigned to the residue(s) in the same column below. - -In GDE flat file format, the masks are specified as text and the names must -begin with "SS_" or "GM_". - -Either a structure or penalty mask or both may be used. If both are included in -an alignment, the user will be asked which is to be used. - - ->> HELP C << Help for secondary structure - gap penalty mask output options - -The options in this menu let you choose whether or not to include the masks -in the CLUSTAL W output alignments. Showing both is useful for understanding -how the masks work. 
The secondary structure information is itself very useful -in judging the alignment quality and in seeing how residue conservation -patterns vary with secondary structure. - - ->> HELP 7 << Help for phylogenetic trees - -1) Before calculating a tree, you must have an ALIGNMENT in memory. This can be -input in any format or you should have just carried out a full multiple -alignment and the alignment is still in memory. - - -*************** Remember YOU MUST ALIGN THE SEQUENCES FIRST!!!! *************** - - -The methods used are NJ (Neighbour Joining) and UPGMA. First -you calculate distances (percent divergence) between all pairs of sequence from -a multiple alignment; second you apply the NJ or UPGMA method to the distance matrix. - -2) EXCLUDE POSITIONS WITH GAPS? With this option, any alignment positions where -ANY of the sequences have a gap will be ignored. This means that 'like' will be -compared to 'like' in all distances, which is highly desirable. It also -automatically throws away the most ambiguous parts of the alignment, which are -concentrated around gaps (usually). The disadvantage is that you may throw away -much of the data if there are many gaps (which is why it is difficult for us to -make it the default). - - - -3) CORRECT FOR MULTIPLE SUBSTITUTIONS? For small divergence (say <10%) this -option makes no difference. For greater divergence, it corrects for the fact -that observed distances underestimate actual evolutionary distances. This is -because, as sequences diverge, more than one substitution will happen at many -sites. However, you only see one difference when you look at the present day -sequences. Therefore, this option has the effect of stretching branch lengths -in trees (especially long branches). The corrections used here (for DNA or -proteins) are both due to Motoo Kimura. See the documentation for details. - -Where possible, this option should be used. 
However, for VERY divergent -sequences, the distances cannot be reliably corrected. You will be warned if -this happens. Even if none of the distances in a data set exceed the reliable -threshold, if you bootstrap the data, some of the bootstrap distances may -randomly exceed the safe limit. - -4) To calculate a tree, use option 4 (DRAW TREE NOW). This gives an UNROOTED -tree and all branch lengths. The root of the tree can only be inferred by -using an outgroup (a sequence that you are certain branches at the outside -of the tree .... certain on biological grounds) OR if you assume a degree -of constancy in the 'molecular clock', you can place the root in the 'middle' -of the tree (roughly equidistant from all tips). - -5) TOGGLE PHYLIP BOOTSTRAP POSITIONS -By default, the bootstrap values are correctly placed on the tree branches of -the phylip format output tree. The toggle allows them to be placed on the -nodes, which is incorrect, but some display packages (e.g. TreeTool, TreeView -and Phylowin) only support node labelling but not branch labelling. Care -should be taken to note which branches and labels go together. - -6) OUTPUT FORMATS: four different formats are allowed. None of these displays -the tree visually. Useful display programs accepting PHYLIP format include -NJplot (from Manolo Gouy and supplied with Clustal W), TreeView (Mac-PC), and -PHYLIP itself - OR get the PHYLIP package and use the tree drawing facilities -there. (Get the PHYLIP package anyway if you are interested in trees). The -NEXUS format can be read into PAUP or MacClade. - - ->> HELP 8 << Help for choosing a weight matrix - -For protein alignments, you use a weight matrix to determine the similarity of -non-identical amino acids. For example, Tyr aligned with Phe is usually judged -to be 'better' than Tyr aligned with Pro. - -There are three 'in-built' series of weight matrices offered. Each consists of -several matrices which work differently at different evolutionary distances. 
To -see the exact details, read the documentation. Crudely, we store several -matrices in memory, spanning the full range of amino acid distance (from almost -identical sequences to highly divergent ones). For very similar sequences, it -is best to use a strict weight matrix which only gives a high score to -identities and the most favoured conservative substitutions. For more divergent -sequences, it is appropriate to use "softer" matrices which give a high score -to many other frequent substitutions. - -1) BLOSUM (Henikoff). These matrices appear to be the best available for -carrying out database similarity (homology searches). The matrices used are: -Blosum 80, 62, 45 and 30. (BLOSUM was the default in earlier Clustal W -versions) - -2) PAM (Dayhoff). These have been extremely widely used since the late '70s. -We use the PAM 20, 60, 120 and 350 matrices. - -3) GONNET. These matrices were derived using almost the same procedure as the -Dayhoff one (above) but are much more up to date and are based on a far larger -data set. They appear to be more sensitive than the Dayhoff series. We use the -GONNET 80, 120, 160, 250 and 350 matrices. This series is the default for -Clustal W version 1.8. - -We also supply an identity matrix which gives a score of 1.0 to two identical -amino acids and a score of zero otherwise. This matrix is not very useful. -Alternatively, you can read in your own (just one matrix, not a series). - -A new matrix can be read from a file on disk, if the filename consists only -of lower case characters. The values in the new weight matrix must be integers -and the scores should be similarities. You can use negative as well as positive -values if you wish, although the matrix will be automatically adjusted to all -positive scores. - - - -For DNA, a single matrix (not a series) is used. Two hard-coded matrices are -available: - - -1) IUB. This is the default scoring matrix used by BESTFIT for the comparison -of nucleic acid sequences. 
X's and N's are treated as matches to any IUB -ambiguity symbol. All matches score 1.9; all mismatches for IUB symbols score 0. - - -2) CLUSTALW(1.6). The previous system used by Clustal W, in which matches score -1.0 and mismatches score 0. All matches for IUB symbols also score 0. - -INPUT FORMAT The format used for a new matrix is the same as the BLAST program. -Any lines beginning with a # character are assumed to be comments. The first -non-comment line should contain a list of amino acids in any order, using the -1 letter code, followed by a * character. This should be followed by a square -matrix of integer scores, with one row and one column for each amino acid. The -last row and column of the matrix (corresponding to the * character) contain -the minimum score over the whole matrix. - - ->> HELP 9 << Help for command line parameters - - DATA (sequences) - --INFILE=file.ext :input sequences. --PROFILE1=file.ext and -PROFILE2=file.ext :profiles (old alignment). - - - VERBS (do things) - --OPTIONS :list the command line parameters --HELP or -CHECK :outline the command line params. --FULLHELP :output full help content. --ALIGN :do full multiple alignment. --TREE :calculate NJ tree. --PIM :output percent identity matrix (while calculating the tree) --BOOTSTRAP(=n) :bootstrap a NJ tree (n= number of bootstraps; def. = 1000). --CONVERT :output the input sequences in a different file format. 
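The DATA and VERBS options listed above can be combined into a command line. The following Python sketch assembles such an argument list without running anything. The helper name `build_clustalw_args` and the executable name `clustalw2` are assumptions for illustration; only the -INFILE, -ALIGN, -TREE, -BOOTSTRAP and -CONVERT options are taken from the help text.

```python
from typing import List, Optional


def build_clustalw_args(infile: str,
                        verb: str = "ALIGN",
                        bootstraps: Optional[int] = None,
                        executable: str = "clustalw2") -> List[str]:
    """Assemble an argument list from one DATA option and one VERB."""
    # Verbs copied from the help text; the executable name is an assumption.
    allowed = {"ALIGN", "TREE", "BOOTSTRAP", "CONVERT"}
    verb = verb.upper()
    if verb not in allowed:
        raise ValueError("unsupported verb: " + verb)
    args = [executable, "-INFILE=" + infile]
    if verb == "BOOTSTRAP" and bootstraps is not None:
        # -BOOTSTRAP(=n): n is the number of bootstraps.
        args.append("-BOOTSTRAP=" + str(bootstraps))
    else:
        args.append("-" + verb)
    return args
```

The resulting list could be passed to `subprocess.run` on a machine where Clustal W is installed under that name.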
- - - PARAMETERS (set things) - -***General settings:**** --INTERACTIVE :read command line, then enter normal interactive menus --QUICKTREE :use FAST algorithm for the alignment guide tree --TYPE= :PROTEIN or DNA sequences --NEGATIVE :protein alignment with negative values in matrix --OUTFILE= :sequence alignment file name --OUTPUT= :GCG, GDE, PHYLIP, PIR or NEXUS --OUTORDER= :INPUT or ALIGNED --CASE :LOWER or UPPER (for GDE output only) --SEQNOS= :OFF or ON (for Clustal output only) --SEQNO_RANGE=:OFF or ON (NEW: for all output formats) --RANGE=m,n :sequence range to write starting m to m+n --MAXSEQLEN=n :maximum allowed input sequence length --QUIET :Reduce console output to minimum --STATS= :Log some alignment statistics to file - -***Fast Pairwise Alignments:*** --KTUPLE=n :word size --TOPDIAGS=n :number of best diags. --WINDOW=n :window around best diags. --PAIRGAP=n :gap penalty --SCORE :PERCENT or ABSOLUTE - - -***Slow Pairwise Alignments:*** --PWMATRIX= :Protein weight matrix=BLOSUM, PAM, GONNET, ID or filename --PWDNAMATRIX= :DNA weight matrix=IUB, CLUSTALW or filename --PWGAPOPEN=f :gap opening penalty --PWGAPEXT=f :gap extension penalty - - -***Multiple Alignments:*** --NEWTREE= :file for new guide tree --USETREE= :file for old guide tree --MATRIX= :Protein weight matrix=BLOSUM, PAM, GONNET, ID or filename --DNAMATRIX= :DNA weight matrix=IUB, CLUSTALW or filename --GAPOPEN=f :gap opening penalty --GAPEXT=f :gap extension penalty --ENDGAPS :no end gap separation pen. --GAPDIST=n :gap separation pen. range --NOPGAP :residue-specific gaps off --NOHGAP :hydrophilic gaps off --HGAPRESIDUES= :list hydrophilic res. --MAXDIV=n :% ident.
for delay --TYPE= :PROTEIN or DNA --TRANSWEIGHT=f :transitions weighting --ITERATION= :NONE or TREE or ALIGNMENT --NUMITER=n :maximum number of iterations to perform --NOWEIGHTS :disable sequence weighting - - -***Profile Alignments:*** --PROFILE :Merge two alignments by profile alignment --NEWTREE1= :file for new guide tree for profile1 --NEWTREE2= :file for new guide tree for profile2 --USETREE1= :file for old guide tree for profile1 --USETREE2= :file for old guide tree for profile2 - - -***Sequence to Profile Alignments:*** --SEQUENCES :Sequentially add profile2 sequences to profile1 alignment --NEWTREE= :file for new guide tree --USETREE= :file for old guide tree - - -***Structure Alignments:*** --NOSECSTR1 :do not use secondary structure-gap penalty mask for profile 1 --NOSECSTR2 :do not use secondary structure-gap penalty mask for profile 2 --SECSTROUT=STRUCTURE or MASK or BOTH or NONE :output in alignment file --HELIXGAP=n :gap penalty for helix core residues --STRANDGAP=n :gap penalty for strand core residues --LOOPGAP=n :gap penalty for loop regions --TERMINALGAP=n :gap penalty for structure termini --HELIXENDIN=n :number of residues inside helix to be treated as terminal --HELIXENDOUT=n :number of residues outside helix to be treated as terminal --STRANDENDIN=n :number of residues inside strand to be treated as terminal --STRANDENDOUT=n:number of residues outside strand to be treated as terminal - - -***Trees:*** --OUTPUTTREE=nj OR phylip OR dist OR nexus --SEED=n :seed number for bootstraps. --KIMURA :use Kimura's correction. --TOSSGAPS :ignore positions with gaps. --BOOTLABELS=node OR branch :position of bootstrap values in tree display --CLUSTERING= :NJ or UPGMA - - ->> HELP 0 << Help for tree output format options - -Four output formats are offered: 1) Clustal, 2) Phylip, 3) Just the distances -4) Nexus - -None of these formats displays the results graphically. Many packages can -display trees in the PHYLIP format 2) below.
It can also be imported into -the PHYLIP programs RETREE, DRAWTREE and DRAWGRAM for graphical display. -NEXUS format trees can be read by PAUP and MacClade. - -1) Clustal format output. -This format is verbose and lists all of the distances between the sequences and -the number of alignment positions used for each. The tree is described at the -end of the file. It lists the sequences that are joined at each alignment step -and the branch lengths. After two sequences are joined, it is referred to later -as a NODE. The number of a NODE is the number of the lowest sequence in that -NODE. - -2) Phylip format output. -This format is the New Hampshire format, used by many phylogenetic analysis -packages. It consists of a series of nested parentheses, describing the -branching order, with the sequence names and branch lengths. It can be used by -the RETREE, DRAWGRAM and DRAWTREE programs of the PHYLIP package to see the -trees graphically. This is the same format used during multiple alignment for -the guide trees. - -Use this format with NJplot (Manolo Gouy), supplied with Clustal W. Some other -packages that can read and display New Hampshire format are TreeView (Mac/PC), -TreeTool (UNIX), and Phylowin. - -3) The distances only. -This format just outputs a matrix of all the pairwise distances in a format -that can be used by the Phylip package. It used to be useful when one could not -produce distances from protein sequences in the Phylip package but is now -redundant (Protdist of Phylip 3.5 now does this). - -4) NEXUS FORMAT TREE. This format is used by several popular phylogeny programs, -including PAUP and MacClade. The format is described fully in: -Maddison, D. R., D. L. Swofford and W. P. Maddison. 1997. -NEXUS: an extensible file format for systematic information. -Systematic Biology 46:590-621. - -5) TOGGLE PHYLIP BOOTSTRAP POSITIONS -By default, the bootstrap values are placed on the nodes of the phylip format -output tree. 
This is inaccurate as the bootstrap values should be associated -with the tree branches and not the nodes. However, this format can be read and -displayed by TreeTool, TreeView and Phylowin. An option is available to -correctly place the bootstrap values on the branches with which they are -associated. diff --git a/binaries/help/iupred.txt b/binaries/help/iupred.txt deleted file mode 100644 index e023ac9..0000000 --- a/binaries/help/iupred.txt +++ /dev/null @@ -1,45 +0,0 @@ -INTERPRETATION OF THE OUTPUT: - -In the case of long and short types of disorder the output gives the -likelihood of disorder for each residue, i.e. it is a value between 0 and 1, -and higher values indicate higher probability of disorder. Residues with values -above 0.5 can be regarded as disordered, and at this cutoff 5% of globular -proteins are expected to be predicted as disordered (false positives). - -For the prediction type of globular domains it gives the number of globular -domains and lists their start and end positions in the sequence. This is followed -by the submitted sequence with residues of globular domains indicated by -uppercase letters. - - -SHORT SUMMARY OF THE METHOD - -Intrinsically unstructured/disordered proteins have no single well-defined -tertiary structure in their native, functional state. Our server recognizes -such regions from the amino acid sequence based on the estimated pairwise -energy content. The underlying assumption is that globular proteins make a -large number of interresidue interactions, providing the stabilizing energy to -overcome the entropy loss during folding. In contrast, IUPs have special -sequences that do not have the capacity to form sufficient interresidue -interactions. Taking a set of globular proteins with known structure, we have -developed a simple formalism that allows the estimation of the pairwise -interaction energies of these proteins.
It uses a quadratic expression in the -amino acid composition, which takes into account that the contribution of an -amino acid to order/disorder depends not only on its own chemical type, but also -on its sequential environment, including its potential interaction partners. -When this calculation is applied to IUP sequences, their estimated energies are -clearly shifted towards less favorable energies compared to globular proteins, -enabling the prediction of protein disorder on this ground. - - -References - -"The Pairwise Energy Content Estimated from Amino Acid Composition -Discriminates between Folded and Intrinsically Unstructured Proteins" -Zsuzsanna Dosztanyi, Veronika Csizmok, Peter Tompa and Istvan Simon -J. Mol. Biol. (2005) 347, 827-839. - -"IUPred: web server for the prediction of intrinsically unstructured -regions of proteins based on estimated energy content" -Zsuzsanna Dosztanyi, Veronika Csizmok, Peter Tompa and Istvan Simon -Bioinformatics (2005) 21, 3433-3434. diff --git a/binaries/help/mafft_manual.htm b/binaries/help/mafft_manual.htm deleted file mode 100644 index 3848d71..0000000 --- a/binaries/help/mafft_manual.htm +++ /dev/null @@ -1,721 +0,0 @@ - -Manpage of MAFFT - - -

MAFFT

-Section: Mafft Manual (1)
Updated: 2007-06-09
- - - - - -  -

NAME

- -
-mafft - Multiple alignment program for amino acid or nucleotide sequences -
- -  -

SYNOPSIS

- -
-
-
-mafft [options] input [> output] -
-linsi input [> output] -
-ginsi input [> output] -
-einsi input [> output] -
-fftnsi input [> output] -
-fftns input [> output] -
-nwns input [> output] -
-nwnsi input [> output] -
-mafft-profile group1 group2 [> output] -
-

-input, group1 and group2 must be in FASTA format. -

-
- -  -

DESCRIPTION

- -
-MAFFT is a multiple sequence alignment program for unix-like operating systems. It offers a range of multiple alignment methods. -
-  -

Accuracy-oriented methods:

- -

-

-*L-INS-i (probably most accurate; recommended for <200 sequences; -iterative refinement method incorporating local pairwise alignment -information): -
-
-mafft --localpair --maxiterate 1000 input [> output] -
-linsi input [> output] -
-
- -

-

*G-INS-i (suitable for sequences -of similar lengths; recommended for <200 sequences; iterative -refinement method incorporating global pairwise alignment information): -
-
-mafft --globalpair --maxiterate 1000 input [> output] -
-ginsi input [> output] -
-
- -

-

-*E-INS-i (suitable for sequences containing large unalignable regions; recommended for <200 sequences): -
-
-mafft --ep 0 --genafpair --maxiterate 1000 input [> output] -
-einsi input [> output] -
-

-

For E-INS-i, the ---ep -0 -option is recommended to allow large gaps. -
-
- -  -

Speed-oriented methods:

- -

-

-*FFT-NS-i (iterative refinement method; two cycles only): -
-
-mafft --retree 2 --maxiterate 2 input [> output] -
-fftnsi input [> output] -
-
- -

-

-*FFT-NS-i (iterative refinement method; max. 1000 iterations): -
-
-mafft --retree 2 --maxiterate 1000 input [> output] -
-
- -

-

-*FFT-NS-2 (fast; progressive method): -
-
-mafft --retree 2 --maxiterate 0 input [> output] -
-fftns input [> output] -
-
- -

-

-*FFT-NS-1 (very fast; recommended for >2000 sequences; progressive method with a rough guide tree): -
-
-mafft --retree 1 --maxiterate 0 input [> output] -
-
- -

-

-*NW-NS-i (iterative refinement method without FFT approximation; two cycles only): -
-
-mafft --retree 2 --maxiterate 2 --nofft input [> output] -
-nwnsi input [> output] -
-
- -

-

-*NW-NS-2 (fast; progressive method without the FFT approximation): -
-
-mafft --retree 2 --maxiterate 0 --nofft input [> output] -
-nwns input [> output] -
-
- -

-

-*NW-NS-PartTree-1 (recommended for ~10,000 to ~50,000 sequences; progressive method with the PartTree algorithm): -
-
-mafft --retree 1 --maxiterate 0 --nofft --parttree input [> output] -
-
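The size recommendations in the two lists above can be sketched as a small shell helper. This is only an illustration: the function names `mafft_strategy` and `count_seqs` are hypothetical and not part of MAFFT, and the choice of FFT-NS-2 for the range between 200 and 2000 sequences is an assumption (the manual gives no recommendation for that range). Only options quoted in the manual are used.

```shell
#!/bin/sh
# Hypothetical helper (not part of MAFFT): map a sequence count to the
# option sets listed in the manual above.
mafft_strategy() {
    n=$1
    if [ "$n" -lt 200 ]; then
        echo "--localpair --maxiterate 1000"                 # L-INS-i
    elif [ "$n" -le 2000 ]; then
        echo "--retree 2 --maxiterate 0"                     # FFT-NS-2 (assumed middle ground)
    elif [ "$n" -lt 10000 ]; then
        echo "--retree 1 --maxiterate 0"                     # FFT-NS-1
    else
        echo "--retree 1 --maxiterate 0 --nofft --parttree"  # NW-NS-PartTree-1
    fi
}

# Count FASTA records: every record starts with a '>' header line.
count_seqs() {
    grep -c '^>' "$1"
}

# Usage, assuming mafft is on the PATH and input.fa exists:
#   mafft $(mafft_strategy "$(count_seqs input.fa)") input.fa > output.fa
```

The option string is left unquoted in the usage line on purpose, so that the shell splits it into separate arguments.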
- -  -

Group-to-group alignments

- -
-
-
-mafft-profile group1 group2 [> output] -

-

or: -

-mafft --maxiterate 1000 --seed group1 --seed group2 /dev/null [> output] -

- - - -
-  -

OPTIONS

- -  -

Algorithm

- -
-

- ---auto -

-Automatically selects an appropriate strategy from L-INS-i, FFT-NS-i and FFT-NS-2, according to data -size. Default: off (always FFT-NS-2) -
- -

- ---6merpair -

-Distance is calculated based on the number of shared 6mers. Default: on -
- -

- ---globalpair -

-All pairwise alignments are computed with the Needleman-Wunsch -algorithm. More accurate but slower -than --6merpair. Suitable for a set of -globally alignable sequences. Applicable to -up to ~200 sequences. A combination with --maxiterate 1000 is recommended (G-INS-i). Default: off (6mer distance is used) -
- -

- ---localpair -

-All pairwise alignments are computed with the Smith-Waterman -algorithm. More accurate but slower -than --6merpair. Suitable for a set of -locally alignable sequences. Applicable to -up to ~200 sequences. A combination with --maxiterate 1000 is recommended (L-INS-i). Default: off (6mer distance is used) -
- -

- ---genafpair -

-All pairwise alignments are computed with a local -algorithm with the generalized affine gap cost -(Altschul 1998). More accurate but slower -than --6merpair. Suitable when large internal gaps -are expected. Applicable to -up to ~200 sequences. A combination with --maxiterate 1000 is recommended (E-INS-i). Default: off (6mer distance is used) -
- - - - - - - -

- ---fastapair -

-All pairwise alignments are computed with FASTA (Pearson and Lipman 1988). -FASTA is required. Default: off (6mer distance is used) -
- - - - - - - -

- ---weighti number -

-Weighting factor for the consistency term calculated from pairwise alignments. Valid when -any of --globalpair, --localpair, --genafpair, --fastapair or ---blastpair is selected. Default: 2.7 -
- -

- ---retree number -

-Guide tree is built number times in the -progressive stage. Valid with 6mer distance. Default: 2 -
- -

- ---maxiterate number -

-number cycles of iterative refinement are performed. Default: 0 -
- -

- ---fft -

-Use FFT approximation in group-to-group alignment. Default: on -
- -

- ---nofft -

-Do not use FFT approximation in group-to-group alignment. Default: off -
- -

- ---noscore -

-Alignment score is not checked in the iterative refinement stage. Default: off (score is checked) -
- -

- ---memsave -

-Use the Myers-Miller (1988) algorithm. Default: automatically turned on when the alignment length exceeds 10,000 (aa/nt). -
- -

- ---parttree -

-Use a fast tree-building method (PartTree, Katoh and Toh 2007) with -the 6mer distance. Recommended when a large number (> ~10,000) -of sequences are input. Default: off -
- -

- ---dpparttree -

-The PartTree algorithm is used with distances based on DP. Slightly -more accurate and slower than --parttree. Recommended when a large -number (> ~10,000) of sequences are input. Default: off -
- -

- ---fastaparttree -

The PartTree algorithm is used -with distances based on FASTA. Slightly more accurate and slower than ---parttree. Recommended when a large number (> ~10,000) of sequences -are input. FASTA is required. Default: off -
- -

- ---partsize number -

-The number of partitions in the PartTree algorithm. Default: 50 -
- -

- ---groupsize number -

-Do not make alignment larger than number sequences. Valid only with the --*parttree options. Default: the number of input sequences -
- -
- -  -

Parameter

- -
-

- ---op number -

-Gap opening penalty at group-to-group alignment. Default: 1.53 -
- -

- ---ep number -

-Offset value, which works like gap extension penalty, for -group-to-group alignment. Default: 0.123 -
- -

- ---lop number -

-Gap opening penalty at local pairwise -alignment. Valid when -the --localpair or --genafpair option is selected. Default: -2.00 -
- -

- ---lep number -

-Offset value at local pairwise alignment. Valid when -the --localpair or --genafpair option is selected. Default: 0.1 -
- -

- ---lexp number -

-Gap extension penalty at local pairwise alignment. Valid when -the --localpair or --genafpair option is selected. Default: -0.1 -
- -

- ---LOP number -

-Gap opening penalty to skip the alignment. Valid when the ---genafpair option is selected. Default: -6.00 -
- -

- ---LEXP number -

-Gap extension penalty to skip the alignment. Valid when the ---genafpair option is selected. Default: 0.00 -
- -

- ---bl number -

-BLOSUM number matrix (Henikoff and Henikoff 1992) is used. number=30, 45, 62 or 80. Default: 62 -
- -

- ---jtt number -

-JTT PAM number (Jones et al. 1992) matrix is used. number>0. Default: BLOSUM62 -
- -

- ---tm number -

-Transmembrane PAM number (Jones et al. 1994) matrix is used. number>0. Default: BLOSUM62 -
- -

- ---aamatrix matrixfile -

-Use a user-defined AA scoring matrix. The format of matrixfile is -the same as that of BLAST. Ignored when nucleotide sequences are input. Default: BLOSUM62 -
- -

- ---fmodel -

-Incorporate the AA/nuc composition information into -the scoring matrix. Default: off -
- -
- -  -

Output

- -
-

- ---clustalout -

-Output format: clustal format. Default: off (fasta format) -
- -

- ---inputorder -

-Output order: same as input. Default: on -
- -

- ---reorder -

-Output order: aligned. Default: off (inputorder) -
- -

- ---treeout -

-Guide tree is output to the input.tree file. Default: off -
- -

- ---quiet -

-Do not report progress. Default: off -
- -
- -  -

Input

- -
-

- ---nuc -

-Assume the sequences are nucleotide. Default: auto -
- -

- ---amino -

-Assume the sequences are amino acid. Default: auto -
- -

- ---seed alignment1 [--seed alignment2 --seed alignment3 ...] -

-Seed alignments given in alignment_n (fasta format) are aligned with -sequences in input. The alignment within every seed is preserved. -
- -
- -  -

FILES

- -
-

- -Mafft stores the input sequences and other files in a temporary directory, which by default is located in -/tmp. -

- -  -

ENVIRONMENT

- -
-

- -MAFFT_BINARIES -

-Indicates the location of the binary files used by mafft. By default, they are searched in -/usr/local/lib/mafft, but on Debian systems, they are searched in -/usr/lib/mafft. -
- -

- -FASTA_4_MAFFT -

-This variable can be set to tell mafft the location of the fasta34 program if it is not in the PATH. -
- -
- -  -

SEE ALSO

- -
-

- -

-mafft-homologs(1) -

- -  -

REFERENCES

- -
-
-  -

In English

- -

-

*Katoh and Toh (Bioinformatics -23:372-374, 2007) PartTree: an algorithm to build an approximate tree -from a large number of unaligned sequences (describes the PartTree -algorithm). -
- -

-

*Katoh, Kuma, Toh and Miyata -(Nucleic Acids Res. 33:511-518, 2005) MAFFT version 5: improvement in -accuracy of multiple sequence alignment (describes [ancestral versions -of] the G-INS-i, L-INS-i and E-INS-i strategies) -
- -

-

*Katoh, Misawa, Kuma and Miyata -(Nucleic Acids Res. 30:3059-3066, 2002) MAFFT: a novel method for rapid -multiple sequence alignment based on fast Fourier transform (describes -the FFT-NS-1, FFT-NS-2 and FFT-NS-i strategies) -
- -  -

In Japanese

- -

-

-*Katoh and Misawa (Seibutsubutsuri 46:312-317, 2006) Multiple Sequence Alignments: the Next Generation -
- -

-

-*Katoh and Kuma (Kagaku to Seibutsu 44:102-108, 2006) Jissen-teki Multiple Alignment -
- - -  -

AUTHORS

- -
-

- -Kazutaka Katoh <katoh_at_bioreg.kyushu-u.ac.jp> -

-

-
-Wrote Mafft. -
-

- -Charles Plessy <charles-debian-nospam_at_plessy.org> -

-

-
-Wrote this manpage in DocBook XML for the Debian distribution, using Mafft's homepage as a template. -
-
- -  -

COPYRIGHT

- -
-Copyright © 2002-2007 Kazutaka Katoh (mafft) -
- -Copyright © 2007 Charles Plessy (this manpage) -
- -

- -Mafft and its manpage are offered under the following conditions: -

Redistribution and use in source and binary forms, with or -without modification, are permitted provided that the following -conditions are met: -

-

- 1.Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. -
- -

-

2.Redistributions in binary -form must reproduce the above copyright notice, this list of conditions -and the following disclaimer in the documentation and/or other -materials provided with the distribution. -
- -

-

3.The name of the author may -not be used to endorse or promote products derived from this software -without specific prior written permission. -
- -

THIS SOFTWARE IS PROVIDED BY THE AUTHOR "AS IS" AND ANY EXPRESS OR -IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED -WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE -DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, -INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES -(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR -SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) -HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, -STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING -IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. -
- -

- -

- -


- 

Index

-
-
NAME
-
SYNOPSIS
-
DESCRIPTION
-
-
Accuracy-oriented methods:
-
Speed-oriented methods:
-
Group-to-group alignments
-
-
OPTIONS
-
-
Algorithm
-
Parameter
-
Output
-
Input
-
-
FILES
-
ENVIRONMENT
-
SEE ALSO
-
REFERENCES
-
-
In English
-
In Japanese
-
-
AUTHORS
-
COPYRIGHT
-
-
-This document was created by -man2html, -using the manual pages.
-Time: 02:26:04 GMT, August 14, 2007 - \ No newline at end of file diff --git a/binaries/help/muscle3.6.html b/binaries/help/muscle3.6.html deleted file mode 100644 index 36055f8..0000000 --- a/binaries/help/muscle3.6.html +++ /dev/null @@ -1,4171 +0,0 @@ - - - - - - - - -MUSCLE User Guide - - - - - - - -
- -

 

- -

 

- -

 

- -

 

- -

MUSCLE User Guide

- -

                                                                                                                                                                            

- -

 

- -

 

- -

 

- -

 

- -

 

- -

 

- -

Multiple sequence comparison -by log-expectation

- -

by Robert C. Edgar

- -

 

- -

Version 3.6

- -

September 2005

- -

 

- -

 

- -

http://www.drive5.com/muscle

- -

email: muscle (at) drive5.com

- -

 

- -

MUSCLE is updated regularly. -Send me an e-mail if you would like to be notified of new releases.

- -

 

- -

 

- -

Citation:

- -

 

- -

Edgar, -Robert C. (2004), MUSCLE: multiple sequence alignment with high accuracy and -high throughput, Nucleic Acids Research 32(5), 1792-97.

- -

 

- -

For a complete -description of the algorithm, see also:

- -

 

- -

Edgar, Robert C (2004), MUSCLE: a multiple sequence alignment method -with reduced time and space complexity. BMC Bioinformatics, 5(1):113.
-
Table of Contents

- -

1 -Introduction

- -

2 -Quick Start

- -

2.1 -Installation

- -

2.2 -Making an alignment

- -

2.3 -Large alignments

- -

2.4 -Faster speed

- -

2.5 -Huge alignments

- -

2.6 -Accuracy: caveat emptor

- -

2.7 -Pipelining

- -

2.8 -Refining an existing alignment

- -

2.9 -Using a pre-computed guide tree

- -

2.10 -Profile-profile alignment

- -

2.11 -Adding sequences to an existing alignment

- -

2.12 -Sequence clustering

- -

2.13 -Specifying a substitution matrix

- -

2.14 -Refining a long alignment

- -

3 -File Formats

- -

3.1 -Input files

- -

3.1.1 -Amino acid sequences

- -

3.1.2 -Nucleotide sequences

- -

3.1.3 -Determining sequence type

- -

3.2 -Output files

- -

3.2.1 -Sequence grouping

- -

3.3 -CLUSTALW format

- -

3.4 -MSF format

- -

3.5 -HTML format

- -

3.6 -Phylip format

- -

4 -Using MUSCLE

- -

4.1 -How the algorithm works

- -

4.2 -Command-line options

- -

4.3 -The maxiters option

- -

4.4 -The maxtrees option

- -

4.5 -The maxhours option

- -

4.6 -The maxmb option

- -

4.7 -The profile scoring function

- -

4.8 -Diagonal optimization

- -

4.9 -Anchor optimization

- -

4.10 -Log file

- -

4.11 -Progress messages

- -

4.12 -Running out of memory

- -

4.13 -Troubleshooting

- -

4.14 -Technical support

- -

5 -Command Line Reference

- -

 

- -
-
- -

1 Introduction

- -

MUSCLE is a program for creating multiple alignments of -amino acid or nucleotide sequences. A range of options is provided that give -you the choice of optimizing accuracy, speed, or some compromise between the -two. Default parameters are those that give the best average accuracy in our -tests. Using versions current at the time of writing, my tests show that MUSCLE -can achieve both better average accuracy and better speed than CLUSTALW or T‑Coffee, -depending on the chosen options. Many command line options are provided to vary -the internals of the algorithm; some of these will primarily be of interest to -algorithm developers who wish to better understand which features of the algorithm -are important in different circumstances.

- -

2 Quick Start

- -

The MUSCLE algorithm is delivered as a command-line program -called muscle. If you are running under Linux or Unix -you will be working at a shell prompt. If you are running under Windows, you should -be in a command window (nostalgically known to us older people as a DOS prompt). -If you don't know how to use command-line programs, you should get help from a -local guru.

- -

2.1 Installation

- -

Copy the muscle binary file to a directory that is -accessible from your computer. That's it—there are no configuration files, -libraries, environment variables or other settings to worry about. If you are -using Windows, then the binary file is named muscle.exe. From now on muscle -should be understood to mean "muscle if you are using Linux or Unix, muscle.exe if you are using Windows".

- -

2.2 Making -an alignment

- -

Make a FASTA file containing some sequences. (If you are not -familiar with FASTA format, it is described in detail later in this Guide.) For -now, just to make things fast, limit the number of sequences in the file to no -more than 50 and the sequence length to no more than 500. Call the input -file seqs.fa. (An example file named seqs.fa is distributed with the standard MUSCLE -package). Make sure the directory containing the muscle binary is in -your path. (If it isn't, you can run it by typing the full path name, and the -following example command lines must be changed accordingly). Now type:

- -

 

- -

muscle -in seqs.fa -out seqs.afa

- -

 

- -

You should see some progress messages. If muscle completes -successfully, it will create a file seqs.afa -containing the alignment. By default, output is created in "aligned -FASTA" format (hence the .afa extension). -This is just like regular FASTA except that gaps are added in order to align -the sequences. This is a nice format for computers but not very readable for -people, so to look at the alignment you will want an alignment viewer such as Belvu, or a script that converts FASTA to a more readable format. -You can also use the –clw command-line option -to request output in CLUSTALW format, which is easier to understand for people. -If muscle gives an error message and you don't know how to fix it, -please read the Troubleshooting section.
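As a rough illustration of "a script that converts FASTA to a more readable format", the sketch below prints each record of an aligned FASTA file as one row, with the sequence name followed by the full gapped sequence, so that columns line up on screen. The function name `afa_to_rows` and the 12-character name column are assumptions made for this example; nothing like it ships with MUSCLE.

```shell
#!/bin/sh
# Hypothetical viewer: print "name  sequence" rows from an aligned FASTA file.
afa_to_rows() {
    awk '
        # Header line: flush the previous record, then start a new one.
        /^>/ { if (have) printf "%-12s %s\n", name, seq
               name = substr($1, 2); seq = ""; have = 1; next }
        # Sequence line: join wrapped lines into one string.
        have { seq = seq $0 }
        # Flush the last record.
        END  { if (have) printf "%-12s %s\n", name, seq }
    ' "$1"
}

# Usage: afa_to_rows seqs.afa
```

Only the first word of each header line is used as the name, and names longer than 12 characters will push their row out of alignment.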

- -

 

- -

The default settings are designed to give the best accuracy, -so this may be all you need to know.

- -

2.3 Large -alignments

- -

If you have a large number of sequences (a few thousand), or -they are very long, then the default settings may be too slow for practical -use. A good compromise between speed and accuracy is to run just the first two -iterations of the algorithm. On average, this gives accuracy comparable to -T-Coffee and speeds much faster than CLUSTALW. This is done by the option –maxiters -2, as in the following example.

- -

 

- -

muscle -in seqs.fa -out seqs.afa -maxiters 2

- -

2.4 Faster -speed

- -

The –diags option enables -an optimization for speed by finding common words (6-mers in a compressed amino -acid alphabet) between the two sequences as seeds for diagonals. This is -related to optimizations in programs such as BLAST and FASTA: you get faster speed, -but sometimes lower average accuracy. For large numbers of closely related -sequences, this option works very well.

- -

 

- -

If you want the fastest possible speed, then the following -example shows the applicable options for proteins.

- -

 

- -

muscle -in seqs.fa -out seqs.afa -maxiters 1 -diags -sv --distance1 kbit20_3

- -

 

- -

For nucleotides, use:

- -

 

- -

muscle -in seqs.fa -out seqs.afa -maxiters 1 -diags

- -

 

- -

At the time of writing, muscle with these options is faster -than any other multiple sequence alignment program -that I have tested. The alignments are not bad, especially when the sequences -are closely related. However, as you might expect, this blazing speed comes at -the cost of the lowest average accuracy of the options that muscle -provides.

- -

2.5 Huge -alignments

- -

If you have a very large number of sequences (several -thousand), or they are very long, then the kbit20_3 option may cause -problems because it needs a relatively large amount of memory. It is better to use -the default distance measure, which is roughly 2× or 3× slower but needs less -memory, like this:

- -

 

- -

muscle -in seqs.fa -out seqs.afa -maxiters 1 -diags1 -sv
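The option sets from sections 2.2 to 2.5 can be collected in one place, for example in a wrapper script. The following is a minimal sketch: the function `muscle_opts` and its mode names are hypothetical, while the option strings are copied from the examples above.

```shell
#!/bin/sh
# Hypothetical helper: translate a mode name into the MUSCLE options
# shown in the Quick Start examples.
muscle_opts() {
    case "$1" in
        accurate)           echo "" ;;                                  # defaults (2.2)
        balanced)           echo "-maxiters 2" ;;                       # 2.3
        fastest-protein)    echo "-maxiters 1 -diags -sv -distance1 kbit20_3" ;;  # 2.4
        fastest-nucleotide) echo "-maxiters 1 -diags" ;;                # 2.4
        low-memory)         echo "-maxiters 1 -diags1 -sv" ;;           # 2.5
        *) echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

# Usage, assuming muscle is on the PATH:
#   muscle -in seqs.fa -out seqs.afa $(muscle_opts balanced)
```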

- -

2.6 Accuracy: -caveat emptor

- -

Why do I keep using the clumsy phrase "average -accuracy" instead of just saying "accuracy"? That's because the -quality of alignments produced by MUSCLE varies, as do those produced by other programs -such as CLUSTALW and T-Coffee. The state of the art leaves plenty of room for -improvement. Sometimes the fastest speed options to muscle give -alignments that are better than T-Coffee, though the reverse will more often be -the case. With challenging sets of sequences, it is a good idea to make several -different alignments using different muscle options and to try other programs -too. Regions where different alignments agree are more believable than regions -where they disagree.

- -

2.7 Pipelining

- -

Input can be taken from standard input, and output can be -written to standard output. This is the default, so our first example would -also work like this:

- -

 

- -

muscle < seqs.fa > seqs.afa

- -

2.8 Refining -an existing alignment

- -

You can ask muscle to try to improve an existing -alignment by using the –refine option. The input file must then be a -FASTA file containing an alignment. All sequences must be of equal length, gaps -can be specified using dots "." or dashes "–". For example:

- -

 

- -

muscle -in seqs.afa -out refined.afa -refine
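Because the refine option expects all sequences to have the same length, it can help to check the input first. The sketch below is one way to do that with awk; the function name `is_aligned_fasta` is hypothetical, and the check counts every character on a sequence line, so trailing spaces or carriage returns would be counted too.

```shell
#!/bin/sh
# Hypothetical pre-check: exit status 0 only if every record in the
# FASTA file has the same total length (gap characters included).
is_aligned_fasta() {
    awk '
        # New record: store the length of the previous one.
        /^>/ { if (n > 0) lens[n] = len; n++; len = 0; next }
        # Sequence line: add its length to the current record.
             { len += length($0) }
        END  { if (n == 0) exit 1           # no records at all
               lens[n] = len
               for (i = 2; i <= n; i++)
                   if (lens[i] != lens[1]) exit 1
               exit 0 }
    ' "$1"
}

# Usage, assuming muscle is on the PATH:
#   is_aligned_fasta seqs.afa && muscle -in seqs.afa -out refined.afa -refine
```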

- -

2.9 Using -a pre-computed guide tree

- -

The –usetree option allows -you to provide your own guide tree. For example,

- -

 

- -

muscle -in seqs.fa -out seqs.afa -usetree mytree.phy

- -

 

- -

The tree must be in Newick format, -as used by the Phylip package (hence the .phy extension). The Newick -format is described here:

- -

 

- -

        http://evolution.genetics.washington.edu/phylip/newicktree.html

- -

 

- -

WARNING. Do not use this -option just because you believe that you have an accurate evolutionary tree for -your sequences. The best guide tree for multiple alignment -is not in general the correct evolutionary tree. This can be understood -by the following argument. Alignment accuracy decreases with lower sequence -identity. It follows that given a set of profiles, the -two that can be aligned most accurately will tend to be the pair with the -highest identity, i.e. at the shortest evolutionary distance. This is exactly -the pair selected by the nearest-neighbor criterion which MUSCLE uses by -default. When mutation rates are variable, the evolutionary neighbor may -not be the nearest neighbor. This explains why a nearest-neighbor tree -may be superior to the true evolutionary tree for guiding a progressive alignment.

- -

 

- -

You will get a warning if you use the –usetree -option. To disable the warning, use –usetree_nowarn -instead,

- -

e.g.:

- -

 

- -

muscle -in seqs.fa -out seqs.afa -usetree_nowarn mytree.phy

- -

2.10 Profile-profile -alignment

- -

A fundamental step in the MUSCLE algorithm is aligning two -multiple sequence alignments. This operation is sometimes called -"profile-profile alignment". If you have two existing alignments of -related sequences you can use the –profile option of MUSCLE to align -those two alignments. Typical usage is:

- -

 

- -

muscle -profile -in1 one.afa -in2 two.afa -out both.afa

- -

 

- -

The alignments in one.afa -and two.afa, which must be in aligned FASTA -format, are aligned to each other, keeping input columns intact and inserting -columns of gaps where needed. Output is stored in both.afa.

- -

 

- -

MUSCLE does not compute a similarity measure or measure of -statistical significance (such as an E-value), so this option is not useful for -discriminating homologs from unrelated sequences. For this task, I recommend Sadreyev & Grishin's COMPASS -program.

- -

2.11 Adding -sequences to an existing alignment

- -

To add a sequence to an existing alignment that you wish to keep intact, use profile-profile alignment with the new sequence as a profile. For example, if you have an existing alignment existing_aln.afa and want to add a new sequence in new_seq.fa, use the following command:

- -

 

- -

muscle -profile -in1 existing_aln.afa -in2 new_seq.fa -out combined.afa

- -

 

- -

If you have more than one new sequence, you can align them to each other first and then add the resulting alignment, for example:

- -

 

- -

muscle -in new_seqs.fa -out new_seqs.afa

- -

muscle -profile -in1 existing_aln.afa -in2 new_seqs.afa -out combined.afa

- -

2.12 Sequence -clustering

- -

The first stage in MUSCLE is a fast clustering algorithm. -This may be of use in other applications. Typical usage is:

- -

 

- -

muscle -cluster -in seqs.fa -tree1 tree.phy -maxiters 1

- -

 

- -

The sequences will be clustered, and a tree written to tree.phy. Options -weight1, -distance1, -cluster1 and -root1 can be applied if desired. Note that by default, UPGMA clustering is used. You can use -neighborjoining if you prefer, but note that this is substantially slower than UPGMA for large numbers of sequences, and is also slightly less accurate. See discussion of -usetree above.

- -

2.13 Specifying -a substitution matrix

- -

You can specify your own substitution matrix by using the -matrix option. This reads a protein substitution matrix in NCBI or WU-BLAST format. The alphabet is assumed to be amino acid, and sum-of-pairs scoring is used. The -gapopen, -gapextend and -center parameters should be specified; normally you will specify a zero value for the center. Note that gap penalties MUST be negative. The environment variable MUSCLE_MXPATH can be used to specify a path where the matrices are stored. For example,

- -

 

- -

muscle -in seqs.fa -out seqs.afa -matrix blosum62 -gapopen -12.0 -gapextend -1.0 -center 0.0

- -

 

- -

You can hack a nucleotide matrix by pretending that AGCT are -amino acids and making a 20x20 matrix out of the original 4x4 matrix. Let me -know if this isn't clear, I can help you through it.

- -

2.14 Refining -a long alignment

- -

A long alignment can be refined using the –refinew option, which is primarily designed for -refining whole-genome nucleotide alignments. Usage is:

- -

 

- -

muscle -refinew -in input.afa -out output.afa

- -

 

- -

MUSCLE divides the input alignment into non-overlapping -windows and re-aligns each window from scratch, i.e. all gap characters are -discarded. The –refinewindow option may be -used to change the window length, which is 200 columns -by default.
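The windowing described above can be pictured with a short Python sketch. This is only an illustration, not code from MUSCLE: the function name window_ranges is hypothetical, and the treatment of a final, shorter window is an assumption.

```python
def window_ranges(n_columns, window=200):
    """Return (start, end) column ranges, end exclusive, covering n_columns.

    The windows do not overlap. The last window may be shorter than the
    others (an assumption made for this sketch).
    """
    if window <= 0:
        raise ValueError("window must be positive")
    return [(start, min(start + window, n_columns))
            for start in range(0, n_columns, window)]


if __name__ == "__main__":
    # A 450-column alignment with the default window length of 200.
    print(window_ranges(450))
```

Each range would then be re-aligned independently and the results concatenated in column order.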

- -

3 File Formats

- -

MUSCLE uses FASTA format for both input and output. For output only, it also offers CLUSTALW, MSF, HTML, Phylip sequential and Phylip interleaved formats. See the following command-line options: -clw, -clwstrict, -msf, -html, -phys, -phyi, -clwout, -clwstrictout, -msfout, -htmlout, -physout and -phyiout.

- -

 

- -

3.1 Input -files

- -

Input files must be in FASTA format. These are plain text -files (word processing files such as Word documents are not understood!). Unix, Windows and DOS text files are supported (end-of-line -may be NL or CR NL). There is no explicit limit on the length of a sequence, -however if you are running a 32-bit version of muscle then the maximum -will be very roughly 10,000 letters due to maximum addressable size of tables -required in memory. Each sequence starts with an annotation line, which is -recognized by having a greater-than symbol ">" as its first -character. There is no limit on the length of an annotation line (this is new -as of version 3.5), and there is no requirement that the annotation be unique. The -sequence itself follows on one or more subsequent lines, and is terminated -either by the next annotation line or by the end of the file.
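The rules above (an annotation line starting with ">", sequence data on one or more following lines, labels not required to be unique) can be sketched in Python as follows. This is a minimal illustration, not the parser used by muscle; the function name parse_fasta is hypothetical.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (annotation, sequence) pairs.

    A list is returned rather than a dictionary because annotation
    lines are not required to be unique.
    """
    records = []
    label, chunks = None, []
    for line in text.splitlines():  # accepts both NL and CR NL line endings
        if line.startswith(">"):
            if label is not None:
                records.append((label, "".join(chunks)))
            label, chunks = line[1:].strip(), []
        elif label is not None:
            # White space inside sequence data is dropped.
            chunks.append("".join(line.split()))
    if label is not None:
        records.append((label, "".join(chunks)))
    return records
```

Data that appears before the first annotation line is ignored in this sketch; a stricter reader might report it as an error instead.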

- -

3.1.1 Amino acid sequences

- -

The standard single-letter amino acid alphabet is used. Upper and lower case are both allowed; case is not significant. The special characters X, B, Z and U are understood. X means "unknown amino acid", B is D or N, Z is E or Q. U is understood to be the 21st amino acid Selenocysteine. White space (spaces, tabs and the end-of-line characters CR and NL) is allowed inside sequence data. Dots "." and dashes "-" in sequences are allowed and are discarded unless the input is expected to be aligned (e.g. for the -refine option).

- -

3.1.2 Nucleotide sequences

- -

The usual letters A, G, C, T and U stand for nucleotides. -The letters T and U are equivalent as far as MUSCLE is concerned. N is the -wildcard meaning "unknown nucleotide". R means A or G, Y means C or -T/U. Other wildcards, such as those used by RFAM, are not understood in this -version and will be replaced by Ns. If you would like support for other DNA / -RNA alphabets, please let me know.

- -

3.1.3 Determining sequence type

- -

By default, MUSCLE looks at the first 100 letters in the -input sequence data (excluding gaps). If 95% or more of those letters are valid -nucleotides (AGCTUN), then the file is treated as nucleotides, otherwise as -amino acids. This method almost always guesses correctly, but you can make sure -by specifying the sequence type on the command line. This is done using the –seqtype option, which can take the following values:

- -

 

- -

        –­seqtype protein                          Amino acid

- -

        –seqtype nucleo                          Nucleotide

- -

        –seqtype auto                               Automatic -detection (default).

- -
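The automatic detection rule described above (look at the first 100 letters, excluding gaps, and treat the input as nucleotides if 95% or more are in AGCTUN) can be sketched in Python. This is an illustration under stated assumptions, not the muscle source: the function name guess_seqtype is hypothetical, and returning "protein" for empty input is an arbitrary choice made here.

```python
def guess_seqtype(sequences, sample_size=100, threshold=0.95):
    """Guess "nucleo" or "protein" from the first letters of the input."""
    letters = []
    for seq in sequences:
        for ch in seq:
            if ch in ".-" or ch.isspace():
                continue  # gaps and white space are not counted
            letters.append(ch.upper())
            if len(letters) == sample_size:
                break
        if len(letters) == sample_size:
            break
    if not letters:
        return "protein"  # assumption: nothing to inspect
    nucleo = sum(1 for ch in letters if ch in "AGCTUN")
    return "nucleo" if nucleo / len(letters) >= threshold else "protein"
```

Because only the first letters are sampled, a protein file that happens to start with a long run of A, G, C or T could be misclassified, which is why specifying the type explicitly is the safer choice.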

3.2 Output -files

- -

By default, output is also written in FASTA format. All -letters are upper-case and gaps are represented by dashes "–". Output -is written to the following destination(s):

- -

 

- -

        If no other -output option is given, then standard output.

- -

        If -out <filename> -is given, to the specified file.

- -

        For all of the -xxxout options -(e.g. -fastaout, -clwout), -to the specified files.

- -

3.2.1 Sequence grouping

- -

By default, MUSCLE re-arranges sequences so that similar sequences are adjacent in the output file. (This is done by ordering sequences according to a prefix traversal of the guide tree). This makes the alignment easier to evaluate by eye. If you want the sequences to be output in the same order as the input file, you can use the -stable option.

- -

3.2.2 Output to multiple file -formats

- -

You can request output to more than one file format by using -the -xxxout options. For example, to get both -FASTA and CLUSTALW formats:

- -

 

- -

muscle -in seqs.fa -fastaout seqs.afa -clwout seqs.aln

- -

3.3 CLUSTALW -format

- -

You can request CLUSTALW output by using the -clw option. This should be compatible with CLUSTALW, with the exception of the program name in the file header. You can ask MUSCLE to impersonate CLUSTALW by writing "CLUSTAL W (1.81)" as the program name by using -clwstrict or -clwstrictout. Note that MUSCLE allows duplicate sequence labels, while CLUSTALW forbids duplicates. If you use the -stable option of muscle, then the order of the input sequences is preserved and sequences can be unambiguously identified by position even if labels are duplicated. If you have problems parsing MUSCLE output with scripts designed for CLUSTALW, please let me know and I'll do my best to provide a fix.

- -

3.4 MSF -format

- -

MSF format, as used in the GCG package, is requested by -using the –msf option. As with CLUSTALW -format, this is easier for people to read than FASTA. As of MUSCLE 3.52, the -MSF format has been tweaked to be more compatible with GCG. The following -differences remain.

- -

 

- -

(a) MUSCLE truncates sequence labels at the first white space or after 63 characters, whichever comes first. The GCG package apparently truncates after 10 characters. If this is a problem for you, please let me know and I'll add an option to truncate after 10 in a future version.

- -

 

- -

(b) MUSCLE allows duplicate sequence labels, while GCG forbids duplicates. If you use the -stable option of muscle, then the order of the input sequences is preserved and sequences can be unambiguously identified by position even if labels are duplicated.

- -

 

- -

Thanks to Eric Martel for help with improving GCG -compatibility.

- -

3.5 HTML -format

- -

I've added an experimental feature starting in version 3.4. To -get a Web page as output, use the –html option. The alignment is colored -using a color scheme from Eric Sonnhammer's Belvu editor, which is my personal favorite. A drawback of -this option is that the Web page typically contains a very large number of HTML -tags, which can be slow to display in the Internet Explorer browser. The -Netscape browser works much better. If you have any ideas about good ways to -make Web pages, please let me know.

- -

3.6 Phylip format

- -

The Phylip package supports two -different multiple sequence alignment file formats, called sequential and -interleaved respectively.

- -

4 Using MUSCLE

- -

In this section we give more details of the MUSCLE algorithm -and the more important options offered by the muscle implementation.

- -

4.1 How -the algorithm works

- -

I won't give a complete description of the MUSCLE algorithm -here—for that, you will have to read the papers. (See citations on title page -above). But hopefully a summary will help explain what some of the command-line -options do and how they might be useful in your work.

- -

 

- -

The first step is to calculate a tree. In CLUSTALW, this is done as follows. Each pair of input sequences is aligned, and used to compute the pair-wise identity of the pair. Identities are converted to a measure of distance. Finally, the distance matrix is converted to a tree using a clustering method (CLUSTALW uses neighbor-joining). If you have 1,000 sequences, there are (1,000 × 999)/2 = 499,500 pairs, so aligning every pair can take a while. MUSCLE uses a much faster, but somewhat more approximate, method to compute distances: it counts the number of short sub-sequences (known as k-mers, k-tuples or words) that two sequences have in common, without constructing an alignment. This is typically around 3,000 times faster than CLUSTALW's method, but the trees will generally be less accurate. We call this step "k-mer clustering".
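To make the idea of counting shared k-mers concrete, here is a small Python sketch. It is an assumption-laden illustration only: the measures actually used by MUSCLE (kmer6_6, kmer20_3 and so on) are defined differently, and the names kmer_counts, kmer_similarity and n_pairs are hypothetical.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_similarity(a, b, k=3):
    """Fraction of k-mers shared by a and b, between 0 and 1.

    No alignment is constructed; only k-mer counts are compared.
    """
    shorter = min(len(a), len(b)) - k + 1
    if shorter <= 0:
        return 0.0
    shared = kmer_counts(a, k) & kmer_counts(b, k)  # multiset intersection
    return sum(shared.values()) / shorter


def n_pairs(n):
    """Number of distinct sequence pairs among n sequences."""
    return n * (n - 1) // 2
```

A distance could then be taken as one minus the similarity. The pair count is the same whichever measure is used; the saving comes from each comparison being much cheaper than an alignment.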

- -

 

- -

The second step is to use the tree to construct what is -known as a progressive alignment. At each node of the binary tree, a pair-wise -alignment is constructed, progressing from the leaves towards the root. The -first alignment will be made from two sequences. Later alignments will be one -of the three following types: sequence-sequence, profile-sequence or -profile-profile, where "profile" means the multiple alignment of the sequences under a given internal node of -the tree. This is very similar to what CLUSTALW does once it has built a tree.

- -

 

- -

Now we have a multiple -alignment, which has been built very quickly compared with conventional -methods, mainly because of the distance calculation using k-mers rather than alignments. The quality of this alignment -is typically pretty good—it will often tie or beat a T-Coffee alignment on our -tests. However, on average, we find that it can be improved by proceeding -through the following steps.

- -

 

- -

From the multiple alignment, we can now compute the pair-wise identities of each pair of sequences. This gives us a new distance matrix, from which we estimate a new tree. We compare the old and new trees, and re-align subgroups where needed to produce a progressive multiple alignment from the new tree. If the two trees are identical, there is nothing to do; if there are no subtrees that agree (very unusual), then the whole progressive alignment procedure must be repeated from scratch. Typically we find that the tree is pretty stable near the leaves, but some re-alignments are needed closer to the root. This procedure (compute pair-wise identities, estimate new tree, compare trees, re-align) is iterated until the tree stabilizes or until a specified maximum number of iterations has been done. We call this process "tree refinement", although it also tends to improve the alignment.

- -

 

- -

We now keep the tree fixed and move to a new procedure which is designed to improve the multiple alignment. The set of sequences is divided into two subsets (i.e., we make a bipartition on the set of sequences). A profile is constructed for each of the two subsets based on the current multiple alignment. These two profiles are then re-aligned to each other using the same pair-wise alignment algorithm as used in the progressive stage. If this improves an "objective score" that measures the quality of the alignment, then the new multiple alignment is kept, otherwise it is discarded. By default, the objective score is the classic sum-of-pairs score that takes the (sequence weighted) average of the pair-wise alignment score of every pair of sequences in the alignment. Bipartitions are chosen by deleting an edge in the guide tree; each of the two resulting subtrees defines a subset of sequences. This procedure is called "tree-dependent refinement". One iteration of tree-dependent refinement tries bipartitions produced by deleting every edge of the tree in depth order moving from the leaves towards the center of the tree. Iterations continue until convergence or up to a specified maximum.
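The way deleting an edge splits the sequences into two subsets can be sketched in Python. This is an illustration only: the tree is represented as nested tuples with string leaves (an assumption made for the sketch), the function names are hypothetical, and the edge visiting order used by MUSCLE is not reproduced.

```python
def leaves(tree):
    """Return the leaf labels under a nested-tuple tree."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def bipartitions(tree):
    """Yield (inside, outside) leaf sets, one pair per edge of the tree.

    Deleting the edge above a subtree separates the leaves under that
    subtree from all other leaves. The two edges at the root describe
    the same split, so it is reported only once.
    """
    all_leaves = frozenset(leaves(tree))
    seen = set()

    def walk(node):
        if isinstance(node, str):
            return
        for child in node:
            inside = frozenset(leaves(child))
            outside = all_leaves - inside
            key = frozenset((inside, outside))
            if outside and key not in seen:
                seen.add(key)
                yield inside, outside
            yield from walk(child)

    yield from walk(tree)
```

In a refinement loop, each pair of subsets would be turned into two profiles, re-aligned, and the result kept only if the objective score improves.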

- -

 

- -

For convenience, the major -steps in MUSCLE are described as "iterations", though the first three -iterations all do quite different things and may take very different lengths of -time to complete. The tree-dependent refinement iterations 3, 4 ... are true -iterations and will take similar lengths of time.

- -

 

Iteration | Actions
1 | Distance matrix by k-mer clustering, estimate tree, progressive alignment according to this tree.
2 | Distance matrix by pair-wise identities from current multiple alignment, estimate tree, progressive alignment according to new tree, repeat until convergence or specified maximum number of times.
3, 4 ... | Tree-dependent refinement. One iteration visits every edge in the tree one time.
- -

4.2 Command-line -options

- -

There are two types of command-line options: value options -and flag options. Value options are followed by the value of the given -parameter, for example –in <filename>; flag options just stand for -themselves, such as –msf. All options are a -dash (not two dashes!) followed by a long name; there are no single-letter -equivalents. Value options must be separated from their values by white space -in the command line. Thus, muscle does not follow Unix, -Linux or Posix standards, for which we apologize. The -order in which options are given is irrelevant unless two options contradict, -in which case the right-most option silently wins.
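The conventions above (single dash, long names, values separated by white space, right-most option wins) can be illustrated with a short Python sketch. This is not the muscle parser: the function name parse_options and the way option names are passed in are assumptions, and contradictions between differently named options are not modelled.

```python
def parse_options(argv, value_options, flag_options):
    """Parse single-dash long options; the right-most occurrence wins."""
    settings = {}
    i = 0
    while i < len(argv):
        arg = argv[i]
        if not arg.startswith("-"):
            raise ValueError(f"expected an option, got {arg!r}")
        name = arg[1:]
        if name in value_options:
            if i + 1 >= len(argv):
                raise ValueError(f"option -{name} needs a value")
            # The next word is always the value, even if it starts with
            # a dash (e.g. a negative gap penalty). Later values
            # overwrite earlier ones.
            settings[name] = argv[i + 1]
            i += 2
        elif name in flag_options:
            settings[name] = True
            i += 1
        else:
            raise ValueError(f"unknown option -{name}")
    return settings
```

For example, parsing "-maxiters 2 -maxiters 4" with this sketch leaves maxiters set to "4", with no warning about the earlier value.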

- -

4.3 The -maxiters option

- -

You can control the number of iterations that MUSCLE does by specifying the -maxiters option. If you specify 1, 2 or 3, then this is exactly the number of iterations that will be performed. If the value is greater than 3, then muscle will continue up to the maximum you specify or until convergence is reached, whichever happens sooner. The default is 16. If you have a large number of sequences, refinement may be rather slow.

- -

4.4 The -maxtrees option

- -

This option controls the maximum number of new trees to create in iteration 2. Our experience suggests that a point of diminishing returns is typically reached after the first tree, so the default value is 1. If a larger value is given, the process will repeat until convergence or until this number of trees has been created, whichever comes first.

- -

4.5 The -maxhours option

- -

If you have a large alignment, muscle may take a long time to complete. It is sometimes convenient to say "I want the best alignment I can get in 24 hours" rather than specifying a set of options that will take an unknown length of time. This is done by using -maxhours, which specifies a floating-point number of hours. If this time is exceeded, muscle will write out the current alignment and stop. For example,

- -

 

- -

muscle -in huge.fa -out huge.afa -maxiters 9999 -maxhours 24.0

- -

 

- -

Note that the actual time may exceed the specified limit by -a few minutes while muscle finishes up on a step. It is also possible -for no alignment to be produced if the time limit is too small.

- -

4.6 The -maxmb option

- -

If the amount of memory needed by MUSCLE exceeds available -physical RAM, then the operating system will probably begin paging (i.e., -swapping memory to and from hard disk), causing MUSCLE to run very slowly. This -is especially problematic when MUSCLE is used for batch processing, where one or -two very large alignments can cause a batch to effectively hang. Starting in -version 3.52, MUSCLE attempts to limit the amount of memory used. If the limit -is exceeded, MUSCLE quits, saving the best alignment so far produced (if any). -MUSCLE attempts to determine the amount of physical RAM by making an -appropriate operating system call. Under Linux and Windows, this works well. On -other systems, particularly other flavors of Unix, -MUSCLE doesn't know how to query the system and assumes that there is 500 Mb of -RAM. To override this default, you can specify the maximum number of megabytes -to allocate by using the –maxmb option, for -example to set a limit of 1.5 Gb:

- -

 

- -

muscle -in huge.fa -out huge.afa -maxhours 1.0 -maxmb 1500

- -

 

- -

This feature has been hacked on top of code that wasn't -really designed for it. So it doesn't always work perfectly, but is better than -nothing. The ideal solution would be to implement linear space dynamic -programming code (e.g., the Myers-Miller algorithm) for situations where memory -is tight. One day I might do this if there is sufficient interest. If you are -interested in contributing the code, e.g. for a class project, please let me -know, I'll be glad to provide support.

- -

4.7 The -profile scoring function

- -

Three different protein profile scoring functions are supported: the log-expectation score (-le option) and a sum-of-pairs score using either the PAM200 matrix (-sp) or the VTML240 matrix (-sv). The log-expectation score is the default as it gives better results on our tests, but is typically between two and three times slower than the sum-of-pairs score. For nucleotides, -spn is currently the only option (which is of course the default for nucleotide data, so you don't need to specify this option).

- -

4.8 Diagonal -optimization

- -

Creating a pair-wise alignment by dynamic programming requires computing an L1 × L2 matrix, where L1 and L2 are the sequence lengths. A trick used in algorithms such as BLAST is to reduce the size of this matrix by using fast methods to find "diagonals", i.e. short regions of high similarity between the two sequences. This speeds up the algorithm at the expense of some reduction in accuracy. MUSCLE uses a technique called k-mer extension to find diagonals. It is disabled by default because of the slight reduction in average accuracy and can be turned on by specifying the -diags option. To enable diagonal optimization in the first iteration, use -diags1; to enable diagonal optimization in the second iteration, use -diags2. These are provided separately because it would be a reasonable strategy to enable diagonals in the first iteration but not the second (because the main goal of the first iteration is to construct a multiple alignment quickly in order to improve the distance matrix, which is not very sensitive to alignment quality; whereas the goal of the second iteration is to make the best possible progressive alignment).
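The general idea of finding diagonals from k-mer matches can be sketched in Python. This is a simplified illustration using exact-match seeds extended in both directions; it is not MUSCLE's k-mer extension code, and the function name find_diagonals and the defaults for k and min_length are assumptions chosen for the example.

```python
from collections import defaultdict


def find_diagonals(a, b, k=4, min_length=8):
    """Return sorted (start_a, start_b, length) runs of exact matches."""
    # Index every k-mer position in b.
    positions = defaultdict(list)
    for j in range(len(b) - k + 1):
        positions[b[j:j + k]].append(j)

    diagonals = set()
    for i in range(len(a) - k + 1):
        for j in positions.get(a[i:i + k], ()):
            # Extend the seed in both directions while letters match.
            start_a, start_b = i, j
            while start_a > 0 and start_b > 0 and a[start_a - 1] == b[start_b - 1]:
                start_a -= 1
                start_b -= 1
            end_a, end_b = i + k, j + k
            while end_a < len(a) and end_b < len(b) and a[end_a] == b[end_b]:
                end_a += 1
                end_b += 1
            if end_a - start_a >= min_length:
                diagonals.add((start_a, start_b, end_a - start_a))
    return sorted(diagonals)
```

Dynamic programming would then be restricted to the regions between the diagonals found, which is where the time saving comes from.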

- -

4.9 Anchor -optimization

- -

Tree-dependent refinement (iterations 3, 4 ...) can be sped up by dividing the alignment vertically into blocks. Block boundaries are found by identifying high-scoring columns (e.g., a perfectly conserved column of Cs or Ws would be a candidate). Each vertical block is then refined independently before reassembling the complete alignment, which is faster because of the L^2 factor in dynamic programming (e.g., suppose the alignment is split into two vertical blocks, then 2 × 0.5^2 = 0.5, so the dynamic programming time is roughly halved). The -noanchors option is used to disable this feature. This option has no effect if -maxiters 1 or -maxiters 2 is specified. On benchmark tests, enabling anchors has little or no effect on accuracy, but if you want to be very conservative and are striving for the best possible accuracy then -noanchors is a reasonable choice.
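The arithmetic behind the saving generalizes to any number of blocks. The following Python sketch assumes equal-sized blocks and a cost proportional to the square of the block length; the function name relative_dp_cost is hypothetical.

```python
def relative_dp_cost(n_blocks):
    """DP cost of n equal blocks relative to one unsplit alignment.

    Each block has length L / n, so its cost is (1 / n) ** 2 of the
    full L ** 2 cost, and there are n such blocks.
    """
    if n_blocks < 1:
        raise ValueError("n_blocks must be at least 1")
    return n_blocks * (1.0 / n_blocks) ** 2
```

Under these assumptions the cost falls as 1/n, so four blocks would need roughly a quarter of the dynamic programming time. Real anchor columns rarely split an alignment evenly, so the actual saving will be smaller.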

- -

4.10 Log -file

- -

You can specify a log file by using –log <filename> -or –loga <filename>. Using –log -causes any existing file to be deleted, –loga -appends to any existing file. A message will be written to the log file when muscle -starts and stops. Error and warning messages will also be written to the log. -If –verbose is specified, then more information will be written, -including the command line used to invoke muscle, the resulting internal -parameter settings, and also progress messages. The content and format of -verbose log file output is subject to change in future versions.

- -

 

- -

The use of a log file may seem contrary to Unix conventions for using standard output and standard -error. I like these conventions, but never found a fully satisfactory way to -use them. I like progress messages (see below), but they mess up a file if you -re-direct standard error and there are errors or warning messages too. I could -try to detect whether a standard file handle is a tty device or a disk file and change behavior -accordingly, but I regard this as too complicated and too hard for the user to -understand. On Windows it can be hard to re-direct standard file handles, -especially when working in a GUI debugger. Maybe one day I will figure out a -better solution (suggestions welcomed).

- -

 

- -

I highly recommend using -verbose and -log[a], especially when running muscle in batch mode. This enables you to verify whether a particular alignment was completed and to review any errors or warnings that occurred.

- -

4.11 Progress -messages

- -

By default, muscle writes progress messages to -standard error periodically so that you know it's doing something and get some -feedback about the time and memory requirements for the alignment. Here is a -typical progress message.

- -

 

- -

00:00:23     25 Mb (5%)  Iter   2  87.20%  Build guide tree

- -

 

- -

The fields are as follows.

- -

 

00:00:23 | Elapsed time since muscle started.
25 Mb (5%) | Peak memory use in megabytes (i.e., not the current usage, but the maximum amount of memory used since muscle started). The number in parentheses is the fraction of physical memory (see -maxmb option for more discussion).
Iter 2 | Iteration currently in progress.
87.20% | How much of the current step has been completed (percentage).
Build... | A brief description of the current step.
- -
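If you capture standard error from a batch run, the fields listed above can be pulled out of a progress message with a regular expression. The Python sketch below is based only on the single example message shown; the exact layout of progress lines is an assumption, and the names PROGRESS_RE and parse_progress are hypothetical.

```python
import re

# Pattern inferred from the example progress message; other layouts
# will simply fail to match.
PROGRESS_RE = re.compile(
    r"^(?P<elapsed>\d+:\d{2}:\d{2})\s+"
    r"(?P<peak_mb>\d+)\s+Mb\s+\((?P<mem_pct>\d+)%\)\s+"
    r"Iter\s+(?P<iteration>\d+)\s+"
    r"(?P<done_pct>\d+(?:\.\d+)?)%\s+"
    r"(?P<step>.+?)\s*$"
)


def parse_progress(line):
    """Return the fields of a progress message, or None if no match."""
    match = PROGRESS_RE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "elapsed": fields["elapsed"],
        "peak_mb": int(fields["peak_mb"]),
        "mem_pct": int(fields["mem_pct"]),
        "iteration": int(fields["iteration"]),
        "done_pct": float(fields["done_pct"]),
        "step": fields["step"],
    }
```

Since the format of progress output is not guaranteed to stay the same between versions, a script built on this should treat a None result as "unknown" rather than as an error.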

 

- -

The –quiet command-line option disables writing -progress messages to standard error. If the –verbose command-line option -is specified, a progress message will be written to the log file when each iteration completes. So –quiet and –verbose -are not contradictory.

- -

4.12 Running -out of memory

- -

The muscle code tries to deal gracefully with -low-memory conditions by using the following technique. A block of "emergency -reserve" memory is allocated when muscle starts. If a later request -to allocate memory fails, this reserve block is made available, and muscle -attempts to save the current alignment. With luck, the reserved memory will be -enough to allow muscle to save the alignment and exit gracefully with an -informative error message. See also the –maxmb -option.

- -

4.13 Troubleshooting

- -

Here is some general advice on what to do if muscle -fails and you don't understand what happened. The code is designed to fail -gracefully with an informative error message when something goes wrong, but -there will no doubt be situations I haven't anticipated (not to mention bugs).

- -

 

- -

Check the MUSCLE web site for updates, bug reports and other -relevant information.

- -

 

- -

        http://www.drive5.com/muscle

- -

 

- -

Check the input file to make sure it is in valid FASTA -format. Try giving it to another sequence analysis program that can accept -large FASTA files (e.g., the NCBI formatdb -utility) to see if you get an informative error message. Try dividing the file -into two halves and using each half individually as input. If one half fails -and the other does not, repeat until the problem is -localized as far as possible.

- -

 

- -

Use -log or -loga and -verbose and check the log file to see if there are any messages that give you a hint about the problem. Look at the peak memory requirements (reported in progress messages) to see if you may be exceeding the physical or virtual memory capacity of your computer.

- -

 

- -

If muscle crashes without giving an error message, or -hangs, then you may need to refer to the source code or use a debugger. A -"debug" version, muscled, may be provided. This is built from -the same source code but with the DEBUG macro defined and without compiler -optimizations. This version runs much more slowly (perhaps by a factor of three -or more), but does a lot more internal checking and may be able to catch -something that is going wrong in the code. The –­core option specifies -that muscle should not catch exceptions. When –core is specified, -an exception may result in a debugger trap or a core dump, depending on the -execution environment. The –nocore option has -the opposite effect. In muscle, –nocore -is the default, –­core is the default in muscled.

- -

4.14 Technical -support

- -

I am happy to provide support. But I am busy, and am -offering this program at no charge, so I ask you to make a reasonable effort to -figure things out for yourself before contacting me.

- -

5 Command Line Reference

- -

 


Value option | Legal values | Default | Description
anchorspacing | Integer | 32 | Minimum spacing between anchor columns.
center | Floating point | [1] | Center parameter. Should be negative.
cluster1, cluster2 | upgma, upgmb, neighborjoining | upgmb | Clustering method. cluster1 is used in iteration 1 and 2, cluster2 in later iterations.
clwout | File name | None | Write output in CLUSTALW format to given file name.
clwstrictout | File name | None | As -clwout, except that header is strictly compatible with CLUSTALW 1.81.
diagbreak | Integer | 1 | Maximum distance between two diagonals that allows them to merge into one diagonal.
diaglength | Integer | 24 | Minimum length of diagonal.
diagmargin | Integer | 5 | Discard this many positions at ends of diagonal.
distance1 | kmer6_6, kmer20_3, kmer20_4, kbit20_3, kmer4_6 | Kmer6_6 (amino) or Kmer4_6 (nucleo) | Distance measure for iteration 1.
distance2 | kmer6_6, kmer20_3, kmer20_4, kbit20_3, pctid_kimura, pctid_log | pctid_kimura | Distance measure for iterations 2, 3 ...
fastaout | File name | None | Write output in FASTA format to the given file.
gapopen | Floating point | [1] | The gap open score. Must be negative.
hydro | Integer | 5 | Window size for determining whether a region is hydrophobic.
hydrofactor | Floating point | 1.2 | Multiplier for gap open/close penalties in hydrophobic regions.
in | Any file name | standard input | Where to find the input sequences.
in1 | Any file name | None | Where to find an input alignment.
in2 | Any file name | None | Where to find an input alignment.
log | File name | None | Log file name (delete existing file).
loga | File name | None | Log file name (append to existing file).
matrix | File name | None | File name for substitution matrix in NCBI or WU-BLAST format. If you specify your own matrix, you should also specify: -gapopen <g>, -gapextend <e> -center 0.0. Note that <g> and <e> MUST be negative.
maxhours | Floating point | None | Maximum time to run in hours. The actual time may exceed the requested limit by a few minutes. Decimals are allowed, so 1.5 means one hour and 30 minutes.
maxiters | Integer 1, 2 ... | 16 | Maximum number of iterations.
maxmb | Integer | 80% of physical RAM, or 500 Mb if not known | Maximum memory to allocate in Mb.
maxtrees | Integer | 1 | Maximum number of new trees to build in iteration 2.
minbestcolscore | Floating point | [1] | Minimum score a column must have to be an anchor.
minsmoothscore | Floating point | [1] | Minimum smoothed score a column must have to be an anchor.
msfout | File name | None | Write output to given file name in MSF format.
objscore | sp, ps, dp, xp, spf, spm | spm | Objective score used by tree-dependent refinement. sp=sum-of-pairs score. spf=sum-of-pairs score (dimer approximation). spm=sp for < 100 seqs, otherwise spf. dp=dynamic programming score. ps=average profile-sequence score. xp=cross profile score.
out | File name | standard output | Where to write the alignment.
phyiout | File name | None | Write output in Phylip interleaved format to given file name.
physout | File name | None | Write output in Phylip sequential format to given file name.
refinewindow | Integer | 200 | Length of window for -refinew.
root1, root2 | pseudo, midlongestspan, minavgleafdist | pseudo | Method used to root tree; root1 is used in iteration 1 and 2, root2 in later iterations.
scorefile | File name | None | File name where to write a score file. This contains one line for each column in the alignment. The line contains the letters in the column followed by the average BLOSUM62 score over pairs of letters in the column.
seqtype | protein, nucleo, auto | auto | Sequence type.
smoothscoreceil | Floating point | [1] | Maximum value of column score for smoothing purposes.
smoothwindow | Integer | 7 | Window used for anchor column smoothing.
spscore | File name | | Compute SP objective score of multiple alignment.
SUEFF | Floating point value between 0 and 1 | 0.1 | Constant used in UPGMB clustering. Determines the relative fraction of average linkage (SUEFF) vs. nearest-neighbor linkage (1 - SUEFF).
tree1, tree2 | File name | None | Save tree produced in first or second iteration to given file in Newick (Phylip-compatible) format.
usetree | File name | None | Use given tree as guide tree. Must be in Newick (Phylip-compatible) format.
weight1, weight2 | none, henikoff, henikoffpb, gsc, clustalw, threeway | clustalw | Sequence weighting scheme. weight1 is used in iterations 1 and 2. weight2 is used for tree-dependent refinement. none=all sequences have equal weight. henikoff=Henikoff & Henikoff weighting scheme. henikoffpb=Modified Henikoff scheme as used in PSI-BLAST. clustalw=CLUSTALW method. threeway=Gotoh three-way method.
- -

 

-

Flag option | Set by default? | Description
anchors | yes | Use anchor optimization in tree dependent refinement iterations.
brenner | no | Use Steven Brenner's method for computing the root alignment.
cluster | no | Perform fast clustering of input sequences. Use the -tree1 option to save the tree.
dimer | no | Use dimer approximation for the SP score (faster, slightly less accurate).
clw | no | Write output in CLUSTALW format (default is FASTA).
clwstrict | no | Write output in CLUSTALW format with the "CLUSTAL W (1.81)" header rather than the MUSCLE version. This is useful when a post-processing step is picky about the file header.
core | yes in muscle, no in muscled | Do not catch exceptions.
diags | no | Use diagonal optimizations. Faster, especially for closely related sequences, but may be less accurate.
diags1 | no | Use diagonal optimizations in first iteration.
diags2 | no | Use diagonal optimizations in second iteration.
fasta | yes | Write output in FASTA format.
group | yes | Group similar sequences together in the output. This is the default. See also -stable.
html | no | Write output in HTML format (default is FASTA).
le | maybe | Use log-expectation profile score (VTML240). Alternatives are to use -sp or -sv. This is the default for amino acid sequences.
msf | no | Write output in MSF format (default is FASTA). Designed to be compatible with the GCG package.
noanchors | no | Disable anchor optimization. Default is -anchors.
nocore | no in muscle, yes in muscled | Catch exceptions and give an error message if possible.
phyi | no | Write output in Phylip interleaved format.
phys | no | Write output in Phylip sequential format.
profile | no | Compute profile-profile alignment. Input alignments must be given using -in1 and -in2 options.
quiet | no | Do not display progress messages.
refine | no | Input file is already aligned, skip first two iterations and begin tree dependent refinement.
refinew | no | Refine an alignment by dividing it into non-overlapping windows and re-aligning each window. Typically used for whole-genome nucleotide alignments.
sp | no | Use sum-of-pairs protein profile score (PAM200). Default is -le.
spscore | no | Compute alignment score of profile-profile alignment. Input alignments must be given using -in1 and -in2 options. These must be pre-aligned with gapped columns as needed, i.e. must be of the same length (have same number of columns).
spn | maybe | Use sum-of-pairs nucleotide profile score. This is the only option for nucleotides, and is therefore the default. The substitution scores and gap penalty scores are "borrowed" from BLASTZ.
stable | no | Preserve input order of sequences in output file. Default is to group sequences by similarity (-group).
sv | no | Use sum-of-pairs profile score (VTML240). Default is -le.
termgaps4 | yes | Use 4-way test for treatment of terminal gaps. (Cannot be disabled in this version).
termgapsfull | no | Terminal gaps penalized with full penalty. [1] Not fully supported in this version.
termgapshalf | yes | Terminal gaps penalized with half penalty. [1] Not fully supported in this version.
termgapshalflonger | no | Terminal gaps penalized with half penalty if gap relative to longer sequence, otherwise with full penalty. [1] Not fully supported in this version.
verbose | no | Write parameter settings and progress messages to log file.
version | no | Write version string to stdout and exit.

-
- -

 

- -

Notes

- -

[1] Default depends on the profile scoring function. To determine the default, use -verbose -log and check the log file.
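As a hedged illustration of how the two option tables combine, the sketch below builds one command line from value options (which take an argument) and flag options (which do not). The file names are hypothetical, and the command is executed only if a muscle binary is found on the PATH and the input file exists.

```shell
#!/bin/sh
# Hypothetical file names; replace them with your own.
input="seqs.fa"
output="aln.clw"

# Value options (-in, -out, -maxiters) take an argument;
# flag options (-clw, -stable, -quiet) do not.
muscle_cmd="muscle -in $input -out $output -maxiters 2 -clw -stable -quiet"
echo "$muscle_cmd"

# Run the command only if muscle is installed and the input file exists.
if command -v muscle >/dev/null 2>&1 && [ -f "$input" ]; then
    $muscle_cmd
fi
```

Printing the command before running it makes it easy to check which defaults were overridden.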

- -

 

- -
diff --git a/binaries/help/muscle3.7.txt b/binaries/help/muscle3.7.txt
deleted file mode 100644
index 7f23ee4..0000000
--- a/binaries/help/muscle3.7.txt
+++ /dev/null
@@ -1,32 +0,0 @@
-MUSCLE v3.7 by Robert C. Edgar
-
-http://www.drive5.com/muscle
-This software is donated to the public domain.
-Please cite: Edgar, R.C. Nucleic Acids Res 32(5), 1792-97.
-
-
-Basic usage
-
-    muscle -in -out
-
-Common options (for a complete list please see the User Guide):
-
-    -in Input file in FASTA format (default stdin)
-    -out Output alignment in FASTA format (default stdout)
-    -diags Find diagonals (faster for similar sequences)
-    -maxiters Maximum number of iterations (integer, default 16)
-    -maxhours Maximum time to iterate in hours (default no limit)
-    -maxmb Maximum memory to allocate in Mb (default 80% of RAM)
-    -html Write output in HTML format (default FASTA)
-    -msf Write output in GCG MSF format (default FASTA)
-    -clw Write output in CLUSTALW format (default FASTA)
-    -clwstrict As -clw, with 'CLUSTAL W (1.81)' header
-    -log[a] Log to file (append if -loga, overwrite if -log)
-    -quiet Do not write progress messages to stderr
-    -stable Output sequences in input order (default is -group)
-    -group Group sequences by similarity (this is the default)
-    -version Display version information and exit
-
-Without refinement (very fast, avg accuracy similar to T-Coffee): -maxiters 2
-Fastest possible (amino acids): -maxiters 1 -diags -sv -distance1 kbit20_3
-Fastest possible (nucleotides): -maxiters 1 -diags
diff --git a/binaries/help/probcons.pdf b/binaries/help/probcons.pdf
deleted file mode 100644
index 74dd22b..0000000
Binary files a/binaries/help/probcons.pdf and /dev/null differ
diff --git a/binaries/help/t_coffee.htm b/binaries/help/t_coffee.htm
deleted file mode 100644
index 2468a40..0000000
--- a/binaries/help/t_coffee.htm
+++ /dev/null
@@ -1,19848 +0,0 @@
-Manual
- -
- - - - - -
-

Technical

-
- -
- -
- - -

Centre National de la Recherche Scientifique (France)
Centro de Regulacio Genomica (Spain)

Cédric Notredame
www.tcoffee.org

T-Coffee:
Technical Documentation

T-Coffee Technical Documentation
(Version 8.01, July 2009)
www.tcoffee.org

T-Coffee, seq_reformat
PSI-Coffee, 3D-Coffee, M-Coffee, R-Coffee, APDB, iRMSD, T-RMSD

© Cédric Notredame, Centro de Regulacio Genomica, Centre National de la Recherche Scientifique, France

- -
- -
-
- -
- -

License and Terms of Use
T-Coffee is distributed under the GNU Public License
T-Coffee code can be re-used freely
T-Coffee can be incorporated in most pipelines: Plug-in/Plug-out…
Addresses and Contacts
Contributors
Addresses
Citations
T-Coffee
Mocca
CORE
Other Contributions
Bug Reports and Feedback
Installation of The T-Coffee Packages
Third Party Packages and On Demand Installations
Standard Installation of T-Coffee
Unix
Microsoft Windows/Cygwin
MAC osX, Linux
CLUSTER Installation
If you have PDB installed:
Installing BLAST for T-Coffee
Why Do I need BLAST with T-Coffee?
Using the EBI BLAST Client
Using the NCBI BLAST Client
Using another Client
Using a BLAST local version on UNIX
Using a BLAST local version on Windows/cygwin
Installing Other Companion Packages
Installation of PSI-Coffee and Expresso
Installation of M-Coffee
Automated Installation
Manual Installation
Installation of APDB and iRMSD
Installation of tRMSD
Installation of seq_reformat
Installation of extract_from_pdb
Installation of 3D-Coffee/Expresso
Automated Installation
Manual Installation
Installing Fugue for T-Coffee
Installation of R-Coffee
Automated Installation
Manual Installation
Installing ProbconsRNA for R-Coffee
Installing Consan for R-Coffee
Quick Start
T-COFFEE
M-Coffee
Expresso
R-Coffee
iRMSD and APDB
tRMSD
MOCCA
Recent Modifications
Reference Manual
Environment Variables
http_proxy_4_TCOFFEE
email_4_TCOFFEE
DIR_4_TCOFFEE
TMP_4_TCOFFEE
CACHE_4_TCOFFEE
NO_ERROR_REPORT_4_TCOFFEE
PDB_DIR
NO_WARNING_4_TCOFFEE
Setting up the T-Coffee environment variables
Well Behaved Parameters
Separation
Posix
Entering the right parameters
Parameters Syntax
No Flag
-parameters
-t_coffee_defaults
-mode
-score [Deprecated]
-evaluate
-convert [cw]
-do_align [cw]
Special Parameters
-version
-proxy
-email
-check_configuration
-cache
-update
-full_log
-other_pg
Input
Sequence Input
-infile [cw]
-in (Cf -in from the Method and Library Input section)
-get_type
-type [cw]
-seq
-seq_source
Structure Input
-pdb
Tree Input
-usetree
Structures, Sequences Methods and Library Input via the -in Flag
-in
Profile Input
-profile
-profile1 [cw]
-profile2 [cw]
Alignment Computation
Library Computation: Methods
-lalign_n_top
-align_pdb_param_file
-align_pdb_hasch_mode
Library Computation: Extension
-lib_list [Unsupported]
-do_normalise
-extend
-extend_mode
-max_n_pair
-seq_name_for_quadruplet
-compact
-clean
-maximise
-do_self
-seq_name_for_quadruplet
-weight
Tree Computation
-distance_matrix_mode
-quicktree [CW]
Pair-wise Alignment Computation
-dp_mode
-ktuple
-ndiag
-diag_mode
-diag_threshold
-sim_matrix
-matrix [CW]
-nomatch
-gapopen
-gapext
-fgapopen
-fgapext
-cosmetic_penalty
-tg_mode
Weighting Schemes
-seq_weight
Multiple Alignment Computation
-msa_mode
-one2all
-profile_comparison
-profile_mode
Alignment Post-Processing
-clean_aln
-clean_threshold
-clean_iteration
-clean_evaluation_mode
-iterate
CPU Control
Multithreading
-multi_thread [NOT Supported]
Limits
-mem_mode
-ulimit
-maxlen
Aligning more than 100 sequences with DPA
-maxnseq
-dpa_master_aln
-dpa_maxnseq
-dpa_min_score1
-dpa_min_score2
-dap_tree [NOT IMPLEMENTED]
Using Structures
Generic
-mode
-check_pdb_status
3D Coffee: Using SAP
Using/finding PDB templates for the Sequences
-template_file
-struc_to_use
Multiple Local Alignments
-domain/-mocca
-start
-len
-scale
-domain_interactive [Examples]
Output Control
Generic
Conventions Regarding Filenames
Identifying the Output files automatically
-no_warning
Alignments
-outfile
-output
-outseqweight
-case
-cpu
-outseqweight
-outorder [cw]
-inorder [cw]
-seqnos
Libraries
-out_lib
-lib_only
Trees
-newtree
Reliability Estimation
CORE Computation
-evaluate_mode
Generic Output
-run_name
-quiet
-align [CW]
APDB, iRMSD and tRMSD Parameters
-quiet [Same as T-Coffee]
-run_name [Same as T-Coffee]
-aln
-n_excluded_nb
-maximum_distance
-similarity_threshold
-local_mode
-filter
-print_rapdb [Unsupported]
-outfile [Same as T-Coffee]
-color_mode
Building a Server
Environment Variables
Output of the .dnd file
Permissions
Other Programs
Formats
Parameter files
Sequence Name Handling
Automatic Format Recognition
Structures
RNA Structures
Sequences
Alignments
Libraries
T-COFFEE_LIB_FORMAT_01
T-COFFEE_LIB_FORMAT_02
Library List
Substitution matrices
ClustalW Style [Deprecated]
BLAST Format [Recommended]
Sequences Weights
Known Problems
Technical Notes
Development
Command Line List
To Do…

- -

- -
- -
-
- -
- - - -

 

- -

T-Coffee is distributed under the GNU Public License

Please make sure you have agreed with the terms of the license attached to the package before using the T-Coffee package or its documentation. T-Coffee is free, open-source software distributed under a GPL license. This means that there are very few restrictions on its use, in either an academic or a non-academic environment.

T-Coffee code can be re-used freely

Our philosophy is that code is meant to be re-used, including ours. No permission is needed for the cut and paste of a few functions, although we are always happy to receive pieces of improved code.

T-Coffee can be incorporated in most pipelines: Plug-in/Plug-out…

Our philosophy is to ensure that as many methods as possible can be used as plug-ins within T-Coffee. Likewise, we will give as much support as possible to anyone wishing to turn T-Coffee into a plug-in for another method. For more details on how to do this, see the plug-in and the plug-out sections of the Tutorial Manual.

Again, you do not need our permission to use either T-Coffee or your method as a plug-in/out, but if you let us know, we will ensure the stability of T-Coffee within your system through future releases.

The current license only allows for the incorporation of T-Coffee in non-commercial pipelines (i.e. where you do not sell the pipeline, or access to it). If your pipeline is commercial, please get in touch with us.

- -

 

- - - -

Contributors

- -

T-Coffee is developed, maintained, monitored, used and debugged by a dedicated team that includes or has included:

            Cédric Notredame, Fabrice Armougom, Des Higgins, Sebastien Moretti, Orla O’Sullivan, Eamon O’Toole, Olivier Poirot, Karsten Suhre, Iain Wallace, Andreas Wilm

Addresses

We are always very eager to get some user feedback. Please do not hesitate to drop us a line at: cedric.notredame@europe.com. The latest updates of T-Coffee are always available on: www.tcoffee.org. At this address you will also find a link to some of the online T-Coffee servers, including Tcoffee@igs.

T-Coffee can be used to automatically check if an updated version is available; however, the program will not update automatically, as this can cause endless reproducibility problems.

- -
- -

PROMPT: t_coffee -update

- -
- -

 

- -
- -

Citations

- -
- -

It is important that you cite T-Coffee when you use it. Citing us is (almost) like giving us money: it helps us convince our institutions that what we do is useful and that they should keep paying our salaries and deliver donuts to our offices from time to time (not that they ever did, but it would be nice anyway).

Cite the server if you used it; otherwise, cite the original paper from 2000 (no, it was never named "T-Coffee 2000").

- - - - - - - - - -
-

Notredame C, Higgins DG, Heringa J.
T-Coffee: A novel method for fast and accurate multiple sequence alignment.
J Mol Biol. 2000 Sep 8;302(1):205-17.
PMID: 10964570 [PubMed - indexed for MEDLINE]

-
- -

Other useful publications include:

- -

T-Coffee

- - - - - - - - - -
-

Claude JB, Suhre K, Notredame C, Claverie JM, Abergel C.
CaspR: a web server for automated molecular replacement using homology modelling.
Nucleic Acids Res. 2004 Jul 1;32(Web Server issue):W606-9.
PMID: 15215460 [PubMed - indexed for MEDLINE]

-
- -

 

- - - - - - - - - -
-

Poirot O, Suhre K, Abergel C, O'Toole E, Notredame C.
3DCoffee@igs: a web server for combining sequences and structures into a multiple sequence alignment.
Nucleic Acids Res. 2004 Jul 1;32(Web Server issue):W37-40.
PMID: 15215345 [PubMed - indexed for MEDLINE]

-
- -

 

- - - - - - - - - -
-

O'Sullivan O, Suhre K, Abergel C, Higgins DG, Notredame C.
3DCoffee: combining protein sequences and structures within multiple sequence alignments.
J Mol Biol. 2004 Jul 2;340(2):385-95.
PMID: 15201059 [PubMed - indexed for MEDLINE]

-
- -

 

- - - - - - - - - -
-

Poirot O, O'Toole E, Notredame C.
Tcoffee@igs: A web server for computing, evaluating and combining multiple sequence alignments.
Nucleic Acids Res. 2003 Jul 1;31(13):3503-6.
PMID: 12824354 [PubMed - indexed for MEDLINE]

-
- -

 

- - - - - - - - - -
-

Notredame C.
Mocca: semi-automatic method for domain hunting.
Bioinformatics. 2001 Apr;17(4):373-4.
PMID: 11301309 [PubMed - indexed for MEDLINE]

-
- -

 

- - - - - - - - - -
-

Notredame C, Higgins DG, Heringa J.
T-Coffee: A novel method for fast and accurate multiple sequence alignment.
J Mol Biol. 2000 Sep 8;302(1):205-17.
PMID: 10964570 [PubMed - indexed for MEDLINE]

-
- -

 

- - - - - - - - - -
-

Notredame C, Holm L, Higgins DG.
COFFEE: an objective function for multiple sequence alignments.
Bioinformatics. 1998 Jun;14(5):407-22.
PMID: 9682054 [PubMed - indexed for MEDLINE]

-
- -

 

- -

Mocca

- - - - - - - - - -
-

Notredame C.
Mocca: semi-automatic method for domain hunting.
Bioinformatics. 2001 Apr;17(4):373-4.
PMID: 11301309 [PubMed - indexed for MEDLINE]

-
- -

CORE

- -

http://www.tcoffee.org/Publications/Pdf/core.pp.pdf

- -

Other Contributions

- -

We do not mean to steal code, but we will always try to re-use pre-existing code whenever that code exists, free of copyright, just like we expect people to do with our code. However, whenever this happens, we make a point of properly citing the source of the original contribution. If ever you recognize a piece of your code improperly cited, please drop us a note and we will be happy to correct that.

In the meantime, here are some important pieces of code from other packages that have been incorporated within the T-Coffee package. These include:

-The Sim algorithm of Huang and Miller which, given two sequences, computes the N best scoring local alignments.

-The tree reading/computing routines are taken from the ClustalW Package, courtesy of Julie Thompson, Des Higgins and Toby Gibson (Thompson, Higgins, Gibson, 1994, 4673-4680, vol. 22, Nucleic Acids Research).

-The implementation of the algorithm for aligning two sequences in linear space was adapted from Myers and Miller (CABIOS, 1988, 11-17, vol. 1).

-Various techniques and algorithms have been implemented. Whenever relevant, the source of the code/algorithm/idea is indicated in the corresponding function.

-64 Bits compliance was implemented by Benjamin Sohn, Performance Computing Center Stuttgart (HLRS), Germany

-David Mathog (Caltech) provided many fixes and useful feedback for improving the code and making the whole software behave more rationally

- -

Bug Reports and Feedback

-Prof David Jones (UCL) reported and corrected the PDB1K bug (now t_coffee/sap can align PDB sequences longer than 1000 AA).

-Johan Leckner reported several bugs related to the treatment of PDB structures, ensuring a consistent behavior between version 1.37 and current ones.

- -

 

- -

 

- -
- -

Installation of The T-Coffee Packages

Third Party Packages and On Demand Installations

T-Coffee is a complex package that interacts with much third-party software. If you only want a standalone version of T-Coffee, you may install that package on its own. If you want to use a more sophisticated flavor (3dcoffee, expresso, rcoffee, etc...), the installer will try to install all the third-party packages required.

Note that since version 7.56, T-Coffee will use 'on demand' installation and install the third party packages it needs *when* it needs them. This only works for packages that do not require specific licenses and that can be installed by the regular installer. Please let us know if you would like another third party package to be included.

Whenever on-demand installation or automated installation fails because of unforeseen system specificities, users should install the third party package manually. This documentation gives some tips we have found useful, but users are encouraged to check the original documentation.

- -

Standard Installation of T-Coffee

Unix

You need to have gcc, g77, CPAN, an internet connection and your root password (to install SOAP). If you cannot log in as root, ask (kindly) your system manager to install SOAP::Lite for you. You may do this before or after the installation of T-Coffee. Even without SOAP you will still be able to use the basic functions of T-Coffee (simplest usage).
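Before starting, the prerequisites listed above can be checked with a few lines of shell. This is a minimal sketch rather than part of the T-Coffee installer: the helper name find_missing is hypothetical, and it assumes CPAN is reachable through a command named cpan.

```shell
#!/bin/sh
# Print the names of the given tools that are not found on the PATH.
find_missing() {
    missing=""
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="$missing $tool"
        fi
    done
    # Strip the leading space before printing.
    echo "${missing# }"
}

# Assumption: CPAN is reachable through a command named "cpan".
missing_tools=$(find_missing gcc g77 cpan)
if [ -n "$missing_tools" ]; then
    echo "Missing prerequisites: $missing_tools"
else
    echo "All prerequisites found"
fi
```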

- -

 

- -

1. gunzip t_coffee.tar.gz

2. tar -xvf t_coffee.tar

3. cd t_coffee

4. ./install t_coffee

This installation will try to install EVERY flavor of T-Coffee along with the packages it requires. It will not re-install the packages that are already on your computer.

If you want a more specific installation, you can try:

- -
- -

   ./install t_coffee

- -

   ./install mcoffee

- -

   ./install 3dcoffee

- -

   ./install rcoffee

- -

   ./install psicoffee

- -

 

- -
- -

 

- -

Or even

- -
- -

   ./install all

- -
- -

 

- -

-All the corresponding executables will be downloaded automatically and installed in

   $HOME/.t_coffee/plugins

-If your executables are in a different location, give it to T-Coffee using the -plugins flag.

-If the installation of any of the companion packages fails, you should install it yourself using the provided link (see below) and following the authors' instructions.

-If you have not managed to install SOAP::Lite, you can re-install it later (from anywhere) following steps 1-2.

-This procedure attempts 3 things: installing and compiling T-Coffee (C program), installing and compiling TMalign (Fortran), and installing and compiling SOAP::Lite (Perl module).

-If you have never installed SOAP::Lite, CPAN will ask you many questions: say Yes to all.

-If everything went well, the procedure has created two executables in the bin directory: t_coffee and TMalign (make sure these executables are on your $PATH!).

- -

 

- -

Microsoft Windows/Cygwin

- -

Install Cygwin

- -

Download The Installer (NOT Cygwin/X)

- -

Click on view to list ALL the packages

- -

Select: gcc-core, make, wget

- -

Optional: ssh, xemacs, nano

- -

Run mkpasswd in Cygwin (as requested when you start Cygwin)

Install T-Coffee within Cygwin using the Unix procedure

Mac OS X, Linux

Make sure you have the Developer's kit installed (compilers and makefile)

Follow the Unix Procedure

- -

 

- -

CLUSTER Installation

- -

In order to run, T-Coffee must have a value for the http_proxy and for the E-mail. To provide them, you can either:

export the following values:

export http_proxy_4_TCOFFEE="proxy" or "" if no proxy

export EMAIL_4_TCOFFEE="your email"

OR

modify the file ~/.t_coffee/t_coffee_env

OR

add to your command line: t_coffee …. -proxy=<proxy> -email=<email>

if you have no proxy: t_coffee … -proxy -email=<email>
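The first and third alternatives can be sketched as follows for a machine without a proxy. The e-mail address is a hypothetical placeholder, and the last line only prints the equivalent command line instead of running t_coffee.

```shell
#!/bin/sh
# Assumption: no proxy is needed, so the proxy value is left empty.
export http_proxy_4_TCOFFEE=""
# Hypothetical address; use your real one.
export EMAIL_4_TCOFFEE="your.name@example.org"

# Equivalent command-line options for the no-proxy case.
tcoffee_opts="-proxy -email=$EMAIL_4_TCOFFEE"
echo "t_coffee <yourseq> $tcoffee_opts"
```

On a cluster, the two export lines would typically go into the job script so that every node sees the same values.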

- -

 

- -

 

- -

If you have PDB installed:

- -

Assuming you have a standard PDB installation in your file system

setenv (or export)  PDB_DIR <abs path>/data/structures/all/pdb/

OR

setenv (or export)  PDB_DIR <abs path>/structures/divided/pdb/

If you do not have PDB installed, don't worry, t_coffee will go and fetch any structure it needs directly from the PDB repository. It will simply be a bit slower than if you had PDB locally.
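The choice between the two layouts can be automated with a small helper. This is only a sketch: the function name set_pdb_dir and the mirror location /db/pdb are hypothetical, and the two sub-directories are the ones quoted above.

```shell
#!/bin/sh
# Set PDB_DIR to the first known layout found under the given root.
# Returns 1 (and leaves PDB_DIR untouched) if neither layout exists.
set_pdb_dir() {
    root="$1"
    for sub in data/structures/all/pdb structures/divided/pdb; do
        if [ -d "$root/$sub" ]; then
            PDB_DIR="$root/$sub/"
            export PDB_DIR
            return 0
        fi
    done
    return 1
}

# Hypothetical mirror location; replace it with your own absolute path.
if set_pdb_dir "/db/pdb"; then
    echo "Using local PDB: $PDB_DIR"
else
    echo "No local PDB found; t_coffee will fetch structures remotely"
fi
```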

- -

Installing BLAST for T-Coffee

- -

BLAST is a program that searches sequence databases for homologues of a query sequence. It works for proteins and nucleic acids. In theory BLAST is just a package like any other, but in practice things are a bit more complex. To run well, BLAST requires up-to-date databases (that can be fairly large, like NR or UNIPROT) and a powerful computer.

Fortunately, an increasing number of institutes and companies now provide BLAST clients that run over the net. This means that all you need is a small program that sends your query to the big server and gets the results back. This spares you the hassle of installing and maintaining BLAST, but of course it is less private and you rely on the network and the current load of these busy servers.

Thanks to its interaction with BLAST, T-Coffee can gather structures and protein profiles and deliver an alignment significantly more accurate than the default you would get with T-Coffee or any similar method.

Let us go through the various modes available for T-Coffee

Why Do I need BLAST with T-Coffee?

The most accurate modes of T-Coffee scan the databases for templates that they use to align the sequences. There are currently two types of templates for proteins:

structures (PDB) that can be found by a blastp against the PDB database and profiles that can be constructed with either a blastp or a psiblast against nr or uniprot.

- -

These templates are automatically built if you use:

   t_coffee <yourseq> -mode expresso

that fetches and uses PDB templates, or

   t_coffee <your seq> -mode psicoffee

that fetches and uses profile templates, or

   t_coffee <your seq> -mode accurate

that does everything and tries to use the best template. Now that you see why it is useful, let's see how to get BLAST up and running, from the easy solution to tailor-made ones.

- -

 

- -

Using the EBI BLAST Client

This is by far the easiest (and the default mode). The perl clients are already incorporated in T-Coffee, and all you need is the SOAP::Lite perl library. In theory, T-Coffee should have already installed this library during the standard installation. Yet, this requires having root access. If you did not have it at the time of the installation, or if you need your system administrator to install SOAP::Lite, simply follow the instructions provided on the website:

   http://search.cpan.org/~byrne/SOAP-Lite-0.60a

It really is worth the effort, since the EBI provides one of the best web services available, and most notably, the only public psiblast via a web service.

Another important point is that the EBI requires your E-mail address to process your queries. Normally, T-Coffee should have asked you to provide this address. If you have not, or if you have provided a phony address, you should correct this by directly editing the file

   ~/.t_coffee/email.txt

Be Careful! If you provide a fake E-mail, the EBI may suspend the service for all machines associated with your IP address (that could mean your entire lab, or entire institute, or even the entire country or, but I doubt it, the whole universe).

- -

Using the NCBI BLAST Client

The NCBI is the next best alternative. In my hands it was always a bit slower and, most of all, it does not incorporate PSI-BLAST (as a web service). A big miss. The NCBI web blast client is a small executable that you should install on your system following the instructions given on this link

ftp://ftp.ncbi.nih.gov/blast/executables/LATEST

Simply go for netbl, download the executable that corresponds to your architecture (cygwin users should go for the win executable). Despite all the files that come along with it, the executable blastcl3 is a stand-alone executable that you can safely move to your $BIN.

All you will then need to do is to make sure that T-Coffee uses the right client when you run it.

-blast_server=NCBI

No need for any E-mail here, but you don't get psiblast, and whenever T-Coffee wants to use it, blastp will be used instead.

- -

Using -another Client

- -

You may have your own client (lucky you). If that is so, all you need -is to make sure that this client is complient with the blast command line. If -your client is named foo.pl, all you need to to is run T-Coffee with

- -
- -

-blast_server=CLIENT_foo.pl

- -
- -

Foo will be called as if it were blastpgp, and it is your responsability -to make sure it can handle the following command line:

- -
- -

foo.pl -p <method> -d <db> -i <infile> -o <outfile> -m 7

- -
- -

method can either be blastp or psiblast.

- -

infile is a FASTA file

- -

-m 7 triggers the XML output. T-Coffee is able to parse both the EBI XML output and the NCBI XML output.

- -

 

- -

If foo.pl behaves differently, the easiest solution will probably be to write a wrapper around it so that wrapped_foo.pl behaves like blastpgp.
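A wrapper of this kind could look roughly like the sketch below. It only shows the translation step: the blastpgp-style options that T-Coffee passes (-p, -d, -i, -o, -m) are parsed and rewritten for a client called my_client. Both my_client and its --long options are hypothetical placeholders, to be replaced with whatever your own client actually expects.

```shell
#!/bin/sh
# Sketch of a wrapper: translate the blastpgp-style options used by T-Coffee
# into the syntax of a HYPOTHETICAL client. "my_client" and its --long
# options are placeholders, not a real program.
translate_blast_args() {
    method="" db="" infile="" outfile="" fmt=""
    OPTIND=1
    while getopts "p:d:i:o:m:" opt; do
        case "$opt" in
            p) method=$OPTARG ;;
            d) db=$OPTARG ;;
            i) infile=$OPTARG ;;
            o) outfile=$OPTARG ;;
            m) fmt=$OPTARG ;;
            *) return 1 ;;
        esac
    done
    # T-Coffee asks for XML output, which blastpgp produces with -m 7.
    [ "$fmt" = "7" ] || { echo "only -m 7 (XML) is supported" >&2; return 1; }
    echo "my_client --program=$method --database=$db --query=$infile --out=$outfile --format=xml"
}

# Print the command that would be run for a typical T-Coffee call.
translate_blast_args -p blastp -d pdb -i query.fasta -o hits.xml -m 7
```

In a real wrapper the final echo would be replaced by the actual call to your client, and the client would have to write XML that T-Coffee can parse into the requested output file.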

- -

 

- -

Using a BLAST local version on UNIX

- -

If you have blastpgp installed, you can run it instead of the remote clients by using:

- -
- -

-blast_server=LOCAL

- -
- -

The documentation for blastpgp can be found at:

- -
- -

www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastpgp.html

- -
- -

and the package is part of the standard BLAST distribution:

- -
- -

ftp://ftp.ncbi.nih.gov/blast/executables/LATEST

- -
- -

Depending on your system, your own skills, your requirements and on more parameters than I have fingers to count, installing a BLAST server suited to your needs can range from a 10-minute job to an achievement spread over several generations. So at this point, you should roam the NCBI website for suitable information.

- -

If you want to have your own BLAST server to run your own databases, you should know that it is possible to control both the database and the program used by BLAST:

- -

 

- -
- -

-prot_db: will specify the database used by all the psi-blast modes

- -

-pdb_db: will specify the database used by the pdb modes

- -
- -

Using a BLAST local version on Windows/cygwin

- -

For those of you using cygwin, be careful. While cygwin behaves like a UNIX system, the BLAST executable required for cygwin (win32) expects Windows paths and not UNIX paths. This has three important consequences:

- -

1- the ncbi file declaring the Data directory must be:

- -

         C:WINDOWS//ncbi.init  [at the root of your WINDOWS]

- -

2- the address mentioned in this file must be Windows formatted, for instance, on my system:

- -

Data=C:\cygwin\home\notredame\blast\data

- -

3- When you pass database addresses to BLAST, these must be in Windows format:

- -

         -protein_db="c:/somewhere/somewhereelse/database"

- -

(Using the slash (/) or the antislash (\) does not matter on new systems, but I would recommend against incorporating white spaces.)
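As an illustration of the conversion described above, the following sketch turns a cygwin-style path into the Windows form with forward slashes. The helper name to_windows_path is made up for this example and it only handles paths under /cygdrive/<letter>/; on a real cygwin installation the cygpath utility (for instance cygpath -m) is the more reliable way to do this.

```shell
# Sketch only: convert /cygdrive/<drive>/... into <drive>:/...
# to_windows_path is a hypothetical helper, not part of T-Coffee or BLAST.
to_windows_path() {
    printf '%s\n' "$1" | sed -E 's|^/cygdrive/([A-Za-z])/|\1:/|'
}

# Example: a database stored under the C: drive.
to_windows_path /cygdrive/c/blast/data/mydb
```

Paths that do not start with /cygdrive/ are printed unchanged, so this is not a substitute for cygpath when your data lives inside the cygwin tree itself.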

- -

Installing Other Companion Packages

- -

T-Coffee is meant to interact with as many packages as possible, either for aligning or using predictions. If you type:

- -
- -

   t_coffee

- -
- -

You will receive a list of supported packages that looks like the next table. In theory, most of these packages can be installed by T-Coffee.

- -

 

- -
- -

****** Pairwise Sequence Alignment Methods:
--------------------------------------------
fast_pair          built_in
exon3_pair         built_in
exon2_pair         built_in
exon_pair          built_in
slow_pair          built_in
proba_pair         built_in
lalign_id_pair     built_in
seq_pair           built_in
externprofile_pair built_in
hh_pair            built_in
profile_pair       built_in
cdna_fast_pair     built_in
cdna_cfast_pair    built_in
clustalw_pair      ftp://www.ebi.ac.uk/pub/clustalw
mafft_pair         http://www.biophys.kyoto-u.ac.jp/~katoh/programs/align/mafft/
mafftjtt_pair      http://www.biophys.kyoto-u.ac.jp/~katoh/programs/align/mafft/
mafftgins_pair     http://www.biophys.kyoto-u.ac.jp/~katoh/programs/align/mafft/
dialigntx_pair     http://dialign-tx.gobics.de/
dialignt_pair      http://dialign-t.gobics.de/
poa_pair           http://www.bioinformatics.ucla.edu/poa/
probcons_pair      http://probcons.stanford.edu/
muscle_pair        http://www.drive5.com/muscle/
t_coffee_pair      http://www.tcoffee.org
pcma_pair          ftp://iole.swmed.edu/pub/PCMA/
kalign_pair        http://msa.cgb.ki.se
amap_pair          http://bio.math.berkeley.edu/amap/
proda_pair         http://bio.math.berkeley.edu/proda/
prank_pair         http://www.ebi.ac.uk/goldman-srv/prank/
consan_pair        http://selab.janelia.org/software/consan/

****** Pairwise Structural Alignment Methods:
--------------------------------------------
align_pdbpair      built_in
lalign_pdbpair     built_in
extern_pdbpair     built_in
thread_pair        built_in
fugue_pair         http://www-cryst.bioc.cam.ac.uk/fugue/download.html
pdb_pair           built_in
sap_pair           http://www-cryst.bioc.cam.ac.uk/fugue/download.html
mustang_pair       http://www.cs.mu.oz.au/~arun/mustang/
tmalign_pair       http://zhang.bioinformatics.ku.edu/TM-align/

****** Multiple Sequence Alignment Methods:
--------------------------------------------
clustalw_msa       ftp://www.ebi.ac.uk/pub/clustalw
mafft_msa          http://www.biophys.kyoto-u.ac.jp/~katoh/programs/align/mafft/
mafftjtt_msa       http://www.biophys.kyoto-u.ac.jp/~katoh/programs/align/mafft/
mafftgins_msa      http://www.biophys.kyoto-u.ac.jp/~katoh/programs/align/mafft/
dialigntx_msa      http://dialign-tx.gobics.de/
dialignt_msa       http://dialign-t.gobics.de/
poa_msa            http://www.bioinformatics.ucla.edu/poa/
probcons_msa       http://probcons.stanford.edu/
muscle_msa         http://www.drive5.com/muscle/
t_coffee_msa       http://www.tcoffee.org
pcma_msa           ftp://iole.swmed.edu/pub/PCMA/
kalign_msa         http://msa.cgb.ki.se
amap_msa           http://bio.math.berkeley.edu/amap/
proda_msa          http://bio.math.berkeley.edu/proda/
prank_msa          http://www.ebi.ac.uk/goldman-srv/prank/

#######   Prediction Methods available to generate Templates
-------------------------------------------------------------
RNAplfold          http://www.tbi.univie.ac.at/~ivo/RNA/
HMMtop             www.enzim.hu/hmmtop/
GOR4               http://mig.jouy.inra.fr/logiciels/gorIV/
wublast_client     http://www.ebi.ac.uk/Tools/webservices/services/wublast
blastpgp_client    http://www.ebi.ac.uk/Tools/webservices/services/blastpgp
==========================================================

- -
- -

 

- -

 

- -

Installation of PSI-Coffee and Expresso

- -

PSI-Coffee is a mode of T-Coffee that runs a PSI-BLAST on each of your sequences and makes a multiple profile alignment. If you do not have any structural information, it is by far the most accurate mode of T-Coffee. To use it, you must have SOAP installed so that the EBI BLAST client can run on your system.

- -

It is a bit slow, but really worth it if your sequences are hard to align and if the accuracy of your alignment is important.

- -

To use this mode, try:

- -
- -

   t_coffee <yoursequence> -mode psicoffee

- -
- -

Note that because PSI-BLAST is time consuming, T-Coffee stores the runs in its cache (~/.t_coffee/cache) so that they do not need to be re-run. It means that if you re-align your sequences (or add a few extra sequences), things will be considerably faster.

- -

If your installation procedure has managed to compile TMalign, and if T-Coffee has access to the EBI BLAST server (or any other server), you can also do the following:

- -
- -

   t_coffee <yoursequence> -mode expresso

- -
- -

That will look for structural templates. And if both these modes are running fine, then you are ready for the best, the "crème de la crème":

- -
- -

   t_coffee <yoursequence> -mode accurate

- -
- -

Installation of M-Coffee

- -

 

- -

M-Coffee is a special mode of T-Coffee that makes it possible to combine the output of many multiple sequence alignment packages.

- -

Automated Installation

- -

In the T-Coffee distribution, type:

- -
- -

./install mcoffee

- -
- -

 

- -

In theory, this command should download and install every required package. If, however, it fails, you should switch to the manual installation (see next).

- -

By default these packages will be in:

- -
- -

$HOME/.t_coffee/plugins

- -
- -

If you want to have these companion packages in a different directory, you can either set the environment variable:

- -
- -

PLUGINS_4_TCOFFEE=<plugins dir>

- -
- -

Or use the command line flag -plugins (overrides every other setting):

- -
- -

t_coffee ... -plugins=<plugins dir>

- -
- -

 

- -

 

- -

Manual Installation

- -

M-Coffee requires a standard T-Coffee installation (cf. previous section) and the following packages to be installed on your system:

- -

        

- -
- -

Package           Where From
==========================================================
ClustalW          can interact with t_coffee
----------------------------------------------------------
Poa               http://www.bioinformatics.ucla.edu/poa/
----------------------------------------------------------
Muscle            http://www.drive5.com
----------------------------------------------------------
ProbCons          http://probcons.stanford.edu/
ProbConsRNA       http://probcons.stanford.edu/
----------------------------------------------------------
MAFFT             http://www.biophys.kyoto-u.ac.jp/~katoh/programs/align/mafft/
----------------------------------------------------------
Dialign-T         http://dialign-t.gobics.de/
Dialign-TX        http://dialign-tx.gobics.de/
----------------------------------------------------------
PCMA              ftp://iole.swmed.edu/pub/PCMA/
----------------------------------------------------------
kalign            http://msa.cgb.ki.se
----------------------------------------------------------
amap              http://bio.math.berkeley.edu/amap/
----------------------------------------------------------
proda_msa         http://bio.math.berkeley.edu/proda/
----------------------------------------------------------
prank_msa         http://www.ebi.ac.uk/goldman-srv/prank/

- -

 

- -
- -

 

- -

In our hands all these packages were very straightforward to compile and install on a standard cygwin or Linux configuration. Just make sure you have gcc, the C compiler, properly installed.

- -

Once the package is compiled and ready to use, make sure that the executable is on your path, so that t_coffee can find it automatically. Our favorite procedure is to create a bin directory in your home directory. If you do so, make sure this bin is in your path and fill it with all your executables (this is a standard Unix practice).
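The bin directory practice described above can be sketched as follows for a Bourne-type shell. The function name add_to_path and the demo directory are only illustrative; for everyday use you would typically call it with $HOME/bin from your shell start-up file.

```shell
# Sketch: create a directory if needed and prepend it to PATH exactly once.
# add_to_path is a hypothetical helper name used only for this example.
add_to_path() {
    mkdir -p "$1" || return 1
    case ":$PATH:" in
        *":$1:"*) ;;                          # already on PATH, nothing to do
        *) PATH="$1:$PATH"; export PATH ;;
    esac
}

# Demo with a throw-away directory; for real use: add_to_path "$HOME/bin"
demo_bin="${TMPDIR:-/tmp}/tcoffee_demo_bin"
add_to_path "$demo_bin"
```

Executables copied into that directory (muscle, probcons, mafft and so on) are then found by t_coffee without any further configuration.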

- -

If, for some reason, you do not want this directory to be on your path, or you want to specify a precise directory containing the executables, you can use:

- -
- -

   export PLUGINS_4_TCOFFEE=<dir>

- -
- -

By default this directory is set to $HOME/.t_coffee/plugins/$OS, but you can override it with the environment variable or using the flag:

- -
- -

   t_coffee ... -plugins=<dir>
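The override order just described (flag first, then environment variable, then default) can be pictured with the small sketch below. It is not T-Coffee's own code: resolve_plugins_dir is a hypothetical function, and the use of an OS shell variable for the last path component is an assumption made only to mirror the $OS placeholder in the text.

```shell
# Sketch of the lookup order for the plugins directory.
# $1 is the value given to -plugins on the command line (may be empty).
resolve_plugins_dir() {
    if [ -n "$1" ]; then
        echo "$1"                                   # -plugins flag wins
    elif [ -n "${PLUGINS_4_TCOFFEE:-}" ]; then
        echo "$PLUGINS_4_TCOFFEE"                   # then the environment
    else
        echo "$HOME/.t_coffee/plugins/${OS:-unknown}"   # then the default
    fi
}

# Example: no flag given, so the environment or the default is reported.
resolve_plugins_dir ""
```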

- -
- -

 

- -

If you cannot, or do not want to, use a single bin directory, you can set the following environment variables to the absolute path of the executable you want to use. Whenever they are set, these variables will supersede any other declaration. This is a convenient way to experiment with multiple package versions.

- -
- -

POA_4_TCOFFEE
CLUSTALW_4_TCOFFEE
TCOFFEE_4_TCOFFEE
MAFFT_4_TCOFFEE
MUSCLE_4_TCOFFEE
DIALIGNT_4_TCOFFEE
PRANK_4_TCOFFEE
DIALIGNTX_4_TCOFFEE

- -
- -

For three of these packages, you will need to copy some of the files into a special T-Coffee directory.

- -
- -

   cp POA_DIR/* ~/.t_coffee/mcoffee/

- -

   cp DIALIGN-T/conf/*  ~/.t_coffee/mcoffee

- -

   cp DIALIGN-TX/conf/*  ~/.t_coffee/mcoffee

- -
- -

Note that the following files are enough for default usage:

- -
- -

BLOSUM.diag_prob_t10   BLOSUM75.scr  blosum80_trunc.mat
dna_diag_prob_100_exp_330000  dna_diag_prob_200_exp_110000
BLOSUM.scr             BLOSUM90.scr  dna_diag_prob_100_exp_110000
dna_diag_prob_100_exp_550000  dna_diag_prob_250_exp_110000
BLOSUM75.diag_prob_t2  blosum80.mat  dna_diag_prob_100_exp_220000
dna_diag_prob_150_exp_110000  dna_matrix.scr

- -
- -

 

- -

If you would rather have the mcoffee directory in some other location, set the MCOFFEE_4_TCOFFEE environment variable to the proper directory:

- -
- -

   setenv MCOFFEE_4_TCOFFEE <directory containing mcoffee files>

- -
- -

Installation of APDB and iRMSD

- -

APDB and iRMSD are incorporated in T-Coffee. Once t_coffee is installed, you can invoke these programs by typing:

- -
- -

   t_coffee -other_pg apdb
   t_coffee -other_pg irmsd

- -
- -

Installation of tRMSD

- -

tRMSD comes along with t_coffee but it also requires the package phylip in order to be functional. Phylip can be obtained from:

- -

        

- -
- -

Package           Function

- -

===================================================

- -

---------------------------------------------------

- -

Phylip            Phylogenetic tree computation

- -

                  evolution.genetics.washington.edu/phylip.html

- -

---------------------------------------------------

- -
- -
- -

t_coffee -other_pg trmsd

- -
- -

 

- -

Installation of seq_reformat

- -

Seq_reformat is a reformatting package that is part of t_coffee. To use it (and see the available options), type:

- -
- -

   t_coffee -other_pg seq_reformat

- -
- -

Installation of extract_from_pdb

- -

Extract_from_pdb is a PDB reformatting package that is part of t_coffee. To use it (and see the available options), type:

- -
- -

   t_coffee -other_pg extract_from_pdb -h

- -
- -

Extract_from_pdb requires wget in order to automatically fetch PDB structures.

- -

 

- - - -

Installation of 3D-Coffee/Expresso

- -

3D-Coffee/Expresso is a special mode of T-Coffee that makes it possible to combine sequences and structures. The main difference between Expresso and 3D-Coffee is that Expresso fetches the structures itself.

- -

Automated Installation

- -

In the T-Coffee distribution, type:

- -
- -

./install expresso

- -

OR

- -

./install 3dcoffee

- -
- -

 

- -

In theory, this command should download and install every required package (except fugue). If, however, it fails, you should switch to the manual installation (see next).

- -

Manual Installation

- -

In order to make the most out of T-Coffee, you will need to install the following packages (make sure the executable is named as indicated below):

- -

        

- -
- -

Package           Function

- -

===================================================

- -

---------------------------------------------------

- -

wget              3DCoffee

- -

                  Automatic Downloading of Structures

- -

---------------------------------------------------

- -

sap               structure/structure comparisons

- -

(obtain it from W. Taylor, NIMR-MRC).

- -

---------------------------------------------------

- -

TMalign           zhang.bioinformatics.ku.edu/TM-align/

- -

---------------------------------------------------

- -

mustang           www.cs.mu.oz.au/~arun/mustang/

- -

---------------------------------------------------

- -

wublastclient     www.ebi.ac.uk/Tools/webservices/clients/wublast

- -

---------------------------------------------------

- -

Blast             www.ncbi.nlm.nih.gov

- -

---------------------------------------------------

- -

Fugue*            protein to structure alignment program

- -

                  http://www-cryst.bioc.cam.ac.uk/fugue/download.html

- -

                  ***NOT COMPULSORY***

- -
- -

 

- -

Once the package is installed, make sure that the executable is on your path, so that t_coffee can find it automatically.

- -

 

- -

The wublast client makes it possible to run BLAST at the EBI without having to install any database locally. It is an ideal solution if you are only using Expresso occasionally.

- -

 

- -

Installing Fugue for T-Coffee

- -

T-Coffee uses a standard fugue installation. You only need to install the following packages:

- -

 joy, melody, fugueali, sstruc, hbond

- -

If you have root privileges, you can install the common data in:

- -

cp fugue/classdef.dat           /data/fugue/SUBST/classdef.dat

- -

Otherwise, set the following environment variables:

- -

setenv MELODY_CLASSDEF <location>

- -

setenv MELODY_SUBST fugue/allmat.dat

- -

 

- -

All the other configuration files must be in the right location.

- -

Installation of R-Coffee

- -

R-Coffee is a special mode able to align RNA sequences while taking into account their secondary structure.

- -

Automated Installation

- -

In the T-Coffee distribution, type:

- -
- -

./install rcoffee

- -
- -

 

- -

In theory, this command should download and install every required package (except consan). If, however, it fails, you should switch to the manual installation (see next).

- -

Manual Installation

- -

R-Coffee only requires the package Vienna to be installed in order to compute multiple sequence alignments. To make the best out of it, you should also have all the packages required by M-Coffee.

- -

        

- -
- -

Package           Function

- -

===================================================

- -

---------------------------------------------------

- -

consan            R-Coffee

- -

                  Computes highly accurate pairwise Alignments

- -

                  ***NOT COMPULSORY***

- -

                  selab.janelia.org/software/consan/

- -

---------------------------------------------------

- -

RNAplfold         Computes RNA secondary Structures

- -

                  www.tbi.univie.ac.at/~ivo/RNA/

- -

---------------------------------------------------

- -

probconsRNA       probcons.stanford.edu/

- -

       

- -

---------------------------------------------------

- -

M-Coffee          T-Coffee and the most common MSA Packages

- -

                  (cf. M-Coffee in this installation guide)

- -
- -

Installing ProbconsRNA for R-Coffee

- -

Follow the installation procedure, but make sure you rename the probcons executable to probconsRNA.

- -

Installing Consan for R-Coffee

- -

In order to ensure a proper interface between consan and R-Coffee, you must make sure that the file mix80.mod is in the directory ~/.t_coffee/mcoffee or in the mcoffee directory otherwise declared.

- -

 

- -
- -

Quick Start

- -
- -

We only give you the very basics here. Please use the Tutorial for more detailed information on how to use our tools.

- - - -

T-COFFEE

- -

Write your sequences in the same file (Swiss-prot, Fasta or Pir) and type:

- -
- -

PROMPT: t_coffee sample_seq1.fasta

- -
- -

This will output two files:

- -

sample_seq1.aln: your Multiple Sequence Alignment

- -

sample_seq1.dnd: the guide tree (Newick format)

- -
- -

IMPORTANT:

- -

In theory, nucleic acids should be automatically detected and the default methods should be adapted appropriately. However, sometimes this may fail, either because the sequences are too short or because they contain too many ambiguity codes.

- -

When this happens, you are advised to explicitly set the type of your sequences.

- -

NOTE: the -mode=dna flag is no longer needed or supported.

- -
- -
- -

PROMPT: t_coffee sample_dnaseq1.fasta -type=dna

- -
- -

M-Coffee

- -

M-Coffee is a meta version of T-Coffee that makes it possible to combine the output of at least eight packages (Muscle, probcons, poa, dialignT, mafft, clustalw, PCMA and T-Coffee).

- -

If all these packages are already installed on your machine, you must:

- -

 

- -

1- set the following environment variables:

- -
- -

   export POA_DIR=[absolute path of the POA installation dir]

- -

   export DIALIGNT_DIR=[absolute path of the DIALIGN-T/conf dir]

- -
- -

Once this is done, write your sequences in the same file (Swiss-prot, Fasta or Pir) and type:

- -
- -

PROMPT: t_coffee sample_seq1.fasta -mode mcoffee

- -
- -

If the program starts complaining that one package or another is missing, this means you will have to go the hard way and install all these packages yourself... Proceed to the M-Coffee section for more detailed instructions.

- -

Expresso

- -

If you have installed the EBI wublast.pl client, Expresso will BLAST your sequences against PDB, identify the best targets and use these to align your proteins.

- -

 

- -
- -

PROMPT: t_coffee sample_seq1.fasta -mode expresso

- -
- -

If you did not manage to install all the required structural packages for Expresso, like Fugue or Sap, you can still run Expresso by selecting the structural packages you want to use yourself. For instance, if you'd rather use TM-Align than sap, try:

- -

        

- -
- -

PROMPT: t_coffee sample_seq1.fasta -template_file EXPRESSO -method TMalign_pair

- -
- -

 

- -

R-Coffee

- -

R-Coffee can be used to align RNA sequences, using their RNAplfold predicted secondary structures. The best results are obtained by using the consan pairwise method. If you have consan installed:

- -
- -

t_coffee sample_rnaseq1.fasta -special_mode rcoffee_consan

- -
- -

This will only work if your sequences are short enough (less than 200 nucleotides). A good alternative is the rmcoffee mode, which will run Muscle, ProbconsRNA and MAFFT and then use the secondary structures predicted by RNAplfold.

- -
- -

PROMPT: t_coffee sample_rnaseq1.fasta -mode rmcoffee

- -
- -

 

- -

If you want to decide for yourself which methods should be combined by R-Coffee, run:

- -
- -

PROMPT: t_coffee sample_rnaseq1.fasta -mode rcoffee -method lalign_id_pair slow_pair

- -
- -

 

- -

 

- -


iRMSD and APDB

- -

All you need is a file containing the alignment of sequences with a known structure. These sequences must be named according to their PDB ID, followed by the chain index (1aabA for instance). Not all the sequences need to have a known structure, but at least two of them must.

- -

Given the alignment:

- -

 

- -
- -

PROMPT: t_coffee -other_pg irmsd -aln 3d_sample4.aln

- -
- -

tRMSD

- -

tRMSD is a structure-based clustering method using the iRMSD to drive the clustering. tRMSD supports all the parameters supported by iRMSD or APDB.

- -

 

- -
- -

PROMPT: t_coffee -other_pg trmsd -aln 3d_sample5.aln -template_file 3d_sample5.template_list

- -
- -

3d_sample5.aln is a multiple alignment in which each sequence has a known structure. The file 3d_sample5.template_list is a FASTA-like file declaring the structure associated with each sequence, in the form:

- -
- -

> <seq_name> _P_ <PDB structure file or name>

- -
- -

 

- -
- -

******* 3d_sample5.template_list ********     

- -

>2UWI-3A _P_ 2UWI-3.pdb

- -

>2UWI-2A _P_ 2UWI-2.pdb

- -

>2UWI-1A _P_ 2UWI-1.pdb

- -

>2HEY-4R _P_ 2HEY-4.pdb

- -

...

- -

**************************************
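A template list in this format can be produced with a few lines of shell. The sketch below assumes, purely from the sample above, that the structure file name is the sequence name without its trailing chain letter plus a .pdb extension; make_template_list is a hypothetical helper, and your own naming scheme may differ.

```shell
# Sketch: print one "><seq_name> _P_ <pdb file>" line per sequence name.
# Assumption: the PDB file is the name minus its last character, plus ".pdb".
make_template_list() {
    for name in "$@"; do
        printf '>%s _P_ %s.pdb\n' "$name" "${name%?}"
    done
}

# Reproduce the first entries of the sample file shown above.
make_template_list 2UWI-3A 2UWI-2A 2UWI-1A 2HEY-4R
```

Redirecting the output to a file (for instance 3d_sample5.template_list) gives something that can be passed to -template_file.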

- -
- -

 

- -

The program then outputs a series of files:

- -
- -

Template Type: [3d_sample5.template_list] Mode Or File: [3d_sample5.template_list] [Start]
         [Sample Columns][TOT=   51][100 %][ELAPSED TIME:    0 sec.]
         [Tree Cmp][TOT=   13][ 92 %][ELAPSED TIME:    0 sec.]
 #### File Type=   TreeList Format=     newick Name= 3d_sample5.tot_pos_list
 #### File Type=       Tree Format=     newick Name= 3d_sample5.struc_tree10
 #### File Type=       Tree Format=     newick Name= 3d_sample5.struc_tree50
 #### File Type=       Tree Format=     newick Name= 3d_sample5.struc_tree100
 #### File Type= Colored MSA Format= score_html Name= 3d_sample5.struc_tree.html

- -

 

- -
- -

 

- -

3d_sample5.tot_pos_list       is a list of the tRMSD trees associated with every position.

3d_sample5.struc_tree100      is a consensus tree (phylip/consense) of the trees contained in the previous file. This file is the default output.

3d_sample5.struc_tree10       is a consensus tree (phylip/consense) of the 10% of trees having the highest average agreement with the rest.

3d_sample5.struc_tree50       is a consensus tree (phylip/consense) of the 50% of trees having the highest average agreement with the rest.

3d_sample5.struc_tree.html    is a colored version of the output showing in red the positions that give the highest support to 3d_sample5.struc_tree100.

- -

 

- -

 

- -

 

- -

MOCCA

- -

Write your sequences in the same file (Swiss-prot, Fasta or Pir) and type:

- -
- -

PROMPT: t_coffee -other_pg mocca sample_seq1.fasta

- -
- -

This command outputs one file (<your sequences>.mocca_lib) and starts an interactive menu.

- -
- -

Recent Modifications

- -
- -

Warning: This log of recent modifications is not as thorough and accurate as it should be.

- -

-5.80: novel assembly algorithm (linked_pair_wise); the primary library is now made of ProbCons-style pairwise alignments (proba_pair)

- -

-4.30 and upward: the FAQ has moved into a new tutorial document

- -

-4.30 and upward: -in will be deprecated and replaced by the flags -profile, -method, -aln, -seq, -pdb

- -

-4.02: -mode=dna is still available but no longer needed or supported. Use -type=protein or -type=dna if you need to force things

- -

-3.28: corrected a bug that prevented short sequences from being correctly aligned

- -

-Use of @ as a separator when specifying method parameters

- -

-The most notable modifications have to do with the structure of the input. From version 2.20, all files must be tagged to indicate their nature (A: alignment, S: Sequence, L: Library…). We are becoming stricter, but that’s for your own good…

- -

Another important modification has to do with the flag -matrix: it now controls the matrix being used for the computation.

- -
-
- - - -

 

- -

This reference manual gives a list of all the flags that can be used to modify the behavior of T-Coffee. For your convenience, we have grouped them according to their nature. To display a list of all the flags used in the version of T-Coffee you are using (along with their default values), type:

- -
- -

PROMPT: t_coffee

- -
- -

Or

- -
- -

PROMPT: t_coffee -help

- -
- -

Or

- -
- -

PROMPT: t_coffee -help -in

- -
- -

Or -help followed by any other parameter.

- -

 

- -

Environment Variables

- -

It is possible to modify T-Coffee’s behavior by setting any of the following environment variables. In the bash shell, use export VAR="value". In the C shell, use setenv VAR "value".

- -

http_proxy_4_TCOFFEE

- -

Sets the http_proxy and HTTP_proxy values used by T-Coffee.

- -

These values supersede http_proxy and HTTP_proxy. http_proxy_4_TCOFFEE is itself superseded by the command line values (-proxy and -email).

- -

If you have no proxy, just set this value to an empty string.

- -

email_4_TCOFFEE

- -

Sets the e-mail value provided to web services called upon by T-Coffee. Can be overridden by the flag -email.

- -

DIR_4_TCOFFEE

- -

By default this variable is set to $HOME/.t_coffee. This is where T-Coffee expects to find its cache, tmp dir and possibly any temporary data stored by the program.

- -

TMP_4_TCOFFEE

- -

By default this variable is set to $HOME/.t_coffee/tmp. This is where T-Coffee stores temporary files.

- -

CACHE_4_TCOFFEE

- -

By default this variable is set to $HOME/.t_coffee/cache. This is where T-Coffee stores any data expensive to obtain: pdb files, sap alignments....

- -

PLUGINS_4_TCOFFEE

- -

By default all the companion packages are searched for in the directory DIR_4_TCOFFEE/plugins/<OS>. This variable overrides the default. This variable can also be overridden by the -plugins T-Coffee flag.

- -

NO_ERROR_REPORT_4_TCOFFEE

- -

By default this variable is not set. Set it if you do not want the program to generate a verbose error output file (useful for running a server).

- -

PDB_DIR

- -

Indicates the location of your local PDB installation.

- -

NO_WARNING_4_TCOFFEE

- -

Suppresses all the warnings.
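To check which directories would be in effect for your session, the defaults listed above can be combined with any values already present in your environment. The function below is only an illustrative helper (show_tcoffee_dirs is not a T-Coffee command); it prints the environment value when one is set and the documented default otherwise.

```shell
# Sketch: report the directories described above, falling back to the
# documented defaults when the corresponding variable is not set.
show_tcoffee_dirs() {
    echo "DIR_4_TCOFFEE=${DIR_4_TCOFFEE:-$HOME/.t_coffee}"
    echo "TMP_4_TCOFFEE=${TMP_4_TCOFFEE:-$HOME/.t_coffee/tmp}"
    echo "CACHE_4_TCOFFEE=${CACHE_4_TCOFFEE:-$HOME/.t_coffee/cache}"
}

show_tcoffee_dirs
```

For example, exporting CACHE_4_TCOFFEE=/scratch/cache before calling the function changes only the last line of its output.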

- -

 

- -

Setting up the T-Coffee environment variables

- -

T-Coffee can have its own environment file. This environment is kept in a file named $HOME/.t_coffee/t_coffee_env and can be edited. The value of any legal variable can be modified through that file. For instance, here is an example of a configuration file when not requiring a proxy:

- -
- -

http_proxy_4_TCOFFEE=

- -

EMAIL_4_TCOFFEE=cedric.notredame@europe.com

- -
- -
- -

IMPORTANT:

- -

-proxy, -email >> t_coffee_env >> env
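An environment file like the one shown above can be generated from the shell. This is a minimal sketch: write_tcoffee_env is a made-up helper, the e-mail address is a placeholder, and the demo writes into a scratch directory rather than into $HOME/.t_coffee so that nothing existing is overwritten.

```shell
# Sketch: write a t_coffee_env file with an e-mail address and a proxy.
# $1: target directory, $2: e-mail address, $3: proxy (empty for none)
write_tcoffee_env() {
    mkdir -p "$1" || return 1
    {
        printf 'http_proxy_4_TCOFFEE=%s\n' "$3"
        printf 'EMAIL_4_TCOFFEE=%s\n' "$2"
    } > "$1/t_coffee_env"
}

# Demo in a scratch directory; for real use pass "$HOME/.t_coffee".
demo_dir="${TMPDIR:-/tmp}/tcoffee_env_demo"
write_tcoffee_env "$demo_dir" "you@example.org" ""
cat "$demo_dir/t_coffee_env"
```

Remember that values given with -proxy and -email on the command line still take precedence over whatever this file contains.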

- -

 

- -
- -

 

- -

 

- -

Well Behaved Parameters

- -

Separation

- -

You can use any kind of separator you want (i.e. ,; <space>=). The syntax used in this document is meant to be consistent with that of ClustalW. However, in order to take advantage of the automatic filename completion provided by many shells, you can replace “=” and “,” with a space.

- -

Posix

- -

T-Coffee is not POSIX compliant.

- -

Entering the right parameters

- -

There are many ways to enter parameters in T-Coffee, see the -parameter flag in

- -

 

- -
- -

Parameters Priority

- -

 

- -

In general you will not need to use these complicated parameters. Yet, if you find yourself typing long command lines on a regular basis, it may be worth reading this section.

- -

 

- -

One may easily feel confused by the various ways in which the parameters can be passed to t_coffee. The reason for these many mechanisms is that they allow several levels of intervention. For instance, you may install t_coffee for all the users and decide that the defaults we provide are not the proper ones… In this case, you will need to make your own t_coffee_default file.

- -

 

- -

Later on, a user may find that he/she needs to keep re-using a specific set of parameters, different from those in t_coffee_default, hence the possibility to write an extra parameter file: parameters. In summary:

- -

 

- -

-parameters > prompt parameters > -t_coffee_defaults > -mode

- -

 

- -

This means that -parameters supersedes all the others, while parameters provided via -mode are the weakest.

- -
- -

 

- -

 

- -

Parameters Syntax

- -

No Flag

- -

If no flag is used, <your sequence> must be the first argument. See format for further information.

- -
- -

PROMPT: t_coffee sample_seq1.fasta

- -
- -

Which is equivalent to:

- -
- -

PROMPT: t_coffee Ssample_seq1.fasta

- -
- -

When you do so, sample_seq1 is used as a name prefix for every file the program outputs.

- -

-parameters

- -

Usage: -parameters=parameters_file

- -

Default: no parameters file

- -

Indicates a file containing extra parameters. Parameters read this way behave as if they had been added on the right end of the command line, which they either supersede (one value parameter) or complete (list of values). For instance, the following file (sample_param_file.param) could be used:

- -
- -

*******sample_param_file.param********  

- -

      -in=Ssample_seq1.fasta,Mfast_pair

- -

      -output=msf_aln

- -

**************************************

- -
- -

Note: This is one of the exceptions (with -infile) where the identifier tag (S,A,L,M…) can be omitted. Any dataset provided this way will be assumed to be a sequence (S). These exceptions have been designed to keep the program compatible with ClustalW.

- -

Note: This parameter file can ONLY contain valid parameters. Comments are not allowed. Parameters passed this way will be checked like normal parameters.

- -

Used with:

- -
- -

PROMPT: t_coffee -parameters=sample_param_file.param

- -
- -

This will cause t_coffee to apply the fast_pair method to the sequences contained in sample_seq1.fasta. If you wish, you can also pipe these arguments into t_coffee by naming the parameter file "stdin" (as a rule, any file named stdin is expected to receive its content via the stdin):

- -

cat sample_param_file.param  | t_coffee -parameters=stdin
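The statement that parameters read from a file behave as if they had been added at the right end of the command line can be sketched as follows. This is a hedged illustration in Python, not the T-Coffee parser: the helper names append_parameter_file and last_value are hypothetical, and only the single-value case (the rightmost occurrence wins) is modelled.

```python
def append_parameter_file(argv, file_text):
    """Return the command line with the file's tokens appended on the right."""
    return list(argv) + file_text.split()


def last_value(tokens, flag):
    """Value of the rightmost '-flag=value' token, or None if the flag is absent."""
    value = None
    prefix = "-" + flag + "="
    for token in tokens:
        if token.startswith(prefix):
            value = token[len(prefix):]  # a later occurrence supersedes an earlier one
    return value
```

With the sample file shown above, an -output value given at the prompt would be superseded by -output=msf_aln from the file.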

- -

-t_coffee_defaults

- -

Usage: --t_coffee_defaults=<file_name>

- -

Default: not used.

- -

This flag tells the program to use -some default parameter file for t_coffee. The format of that file is the same -as the one used with -parameters. The file used is either:

- -

            1. -<file name> if a name has been specified

- -

            2.  ~/.t_coffee_defaults if no file was -specified

- -

            3. -The file indicated by the environment variable TCOFFEE_DEFAULTS

- -

-mode

- -

Usage: -mode= -hard coded mode

- -

Default: not used.

- -

It indicates that t_coffee will use -some hard coded parameters. These include:

- -

            quickaln: very fast approximate -alignment

- -

            dali: a mode used to combine dali -pairwise alignments

- -

            evaluate: defaults for evaluating an -alignment

- -

            3dcoffee: runs t_coffee with the -3dcoffee parameterization

- -

 

- -

Other modes exist that are not yet -fully supported

- -

-score [Deprecated]

- -

Usage: --score

- -

Default: not used

- -

Toggles on the evaluate mode and -causes t_coffee to evaluate a precomputed alignment provided via -infile=<alignment>. -The flag -output must be set to an appropriate format (i.e. --output=score_ascii, score_html or score_pdf). A better default parameterization -is obtained when using the flag -mode=evaluate.

- -

-evaluate

- -

Usage: --evaluate

- -

Default: not used

- -

Replaces -score. This flag toggles on -the evaluate mode and causes t_coffee to evaluate a pre-computed alignment -provided via -infile=<alignment>. The flag -output must be -set to an appropriate format (i.e. -output=score_ascii, score_html or -score_pdf).

- -

 

- -

The main purpose of -evaluate is to -let you control every aspect of the evaluation. It is nevertheless advisable to use the -pre-defined parameterization: -mode=evaluate.

- -
- -

PROMPT: -t_coffee -infile=sample_aln1.aln -mode=evaluate

- -

PROMPT: -t_coffee -infile=sample_seq1.aln -in  -Lsample_lib1.tc_lib -mode=evaluate

- -
- -

-convert [cw]

- -

Usage: --convert

- -

Default: turned off

- -

Toggles on the conversion mode and -causes T-Coffee to convert the sequences, alignments, libraries or structures -provided via the -infile and -in flags. The output format must be -set via the -output flag. This flag can also be used if you simply want -to compute a library (i.e. you have an alignment and you want to turn it into a -library).

- -

This flag is ClustalW compliant.

- -

-do_align [cw]

- -

Usage:  -do_align

- -

Default: turned on

- -

Special Parameters

- -

-version

- -

Usage: --version

- -

Default: not used

- -

Returns the current version number

- -

-proxy

- -

Usage: -proxy=<proxy>

- -

Default: not used

- -

Sets the proxy used by -HTTP_proxy AND http_proxy. Setting it at the prompt supersedes ANY other -setting.

- -

Note that if you use no -proxy, you should set

- -

            -proxy

- -

-email

- -

Usage: -email=<email>

- -

Default: not used

- -

Sets your email value as -provided to web services

- -

-check_configuration

- -

Usage: --check_configuration

- -

Default: not used

- -

Checks your system to determine -whether all the programs T-Coffee can interact with are installed.

- -

-cache

- -

Usage: --cache=<use, update, ignore, <filename>>

- -

Default: -cache=use

- -

By default, t_coffee stores in a -cache directory, the results of computationally expensive (structural -alignment) or network intensive (BLAST search) operations.

- -

-update

- -

Usage: --update

- -

Default: turned off

- -

Causes a wget access that checks whether -the t_coffee version you are using needs updating.

- -

-full_log

- -

Usage: --full_log=<filename>

- -

Default: turned off

- -

Causes t_coffee to output a full log -file that contains all the input/output files.

- -

-plugins

- -

Usage: -plugins=<dir>

- -

Default: default

- -

Specifies the directory in which the companion packages (other -multiple aligners used by M-Coffee, structural aligners, etc…) are kept. As an -alternative, you can also set the environment variable PLUGINS_4_TCOFFEE.

- -

The default is ~/.t_coffee/plugins/

- -

-other_pg

- -

Usage: --other_pg=<filename>

- -

Default: turned off

- -

Some rumours claim that Tetris is -embedded within T-Coffee and could be run using some special set of commands. -We wish to deny these rumours, although we may admit that several interesting -reformatting programs are now embedded in t_coffee and can be run through the --other_pg flag.

- -
- -

PROMPT: -t_coffee -other_pg=seq_reformat

- -

PROMPT: -t_coffee -other_pg=unpack_all

- -

PROMPT: -t_coffee -other_pg=unpack_extract_from_pdb

- -
- -

Input

- -

Sequence Input

- -

-infile [cw]

- -

To remain compatible with ClustalW, -it is possible to indicate the sequences with this flag

- -
- -

PROMPT: -t_coffee -infile=sample_seq1.fasta

- -
- -

Note: Common -multiple sequence alignment formats constitute valid input formats.

- -

Note: T-Coffee -automatically removes the gaps before doing the alignment. This behaviour is different -from that of ClustalW where the gaps are kept.

- -

-in (Cf –in from the Method and Library Input section)

- -

-get_type

- -

Usage: --get_type

- -

Default: turned off

- -

Forces t_coffee to identify the -sequences type (PROTEIN, DNA).

- -

-type [cw]

- -

Usage: --type=DNA ¦ PROTEIN¦ DNA_PROTEIN

- -

Default: -type=<automatically set>

- -

This flag sets the type of the -sequences. If omitted, the type is guessed automatically. This flag is -compatible with ClustalW.

- -
- -

Warning:  -In case of low complexity or short sequences, it is recommended to set -the type manually.

- -
- -

-seq

- -

Usage: --seq=[<P,S><name>,]

- -

Default: none

- -

-seq is now the recommended flag to provide -your sequences. It behaves mostly like the -in flag.

- -

-seq_source

- -

Usage: --seq_source=<ANY or  _LS or LS >

- -

Default: ANY.

- -

You may not want to combine all the -provided sequences into a single sequence list. You can do so by specifying that -you do not want to treat all the -in files as potential sequence sources.

- -

-seq_source=_LA indicates that -neither the sequences provided via the A (Alignment) tag nor those provided via the L (Library) -tag should be added to the sequence list.

- -

-seq_source=S means that only -sequences provided via the S tag will be considered. All the other sequences -will be ignored.

- -

Note:  This flag is mostly designed for interactions -between T-Coffee and T-CoffeeDPA (the large scale version of T-Coffee).

- -

Structure Input

- -

-pdb

- -

Usage:  -pdb=<pdbid1>,<pdbid2>…[Max 200]

- -

Default: None

- -

Reads or fetches a pdb file. It is -possible to specify a chain or even a sub-chain:

- -

PDBID(PDB_CHAIN)[opt] (FIRST,LAST)[opt]

- -

It is also possible to input structures via the –in flag. In that -case, you will need to use the TAG identifier:

- -

-in -Ppdb1 Ppdb2…

- -

Tree -Input

- -

-usetree

- -

Usage: --usetree=<tree file>

- -

Default: No file specified

- -

Format: newick tree format (ClustalW -Style)

- -

This flag indicates that rather than -computing a new dendrogram, t_coffee must use a pre-computed one. The tree -files are in PHYLIP (Newick) format and compatible with ClustalW. In most cases, using -a pre-computed tree will halve the computation time required by t_coffee. It is -also possible to use trees output by ClustalW, PHYLIP and any other program that writes this format.

- -

Structures, Sequences Methods and Library Input via the in Flag

- -
- -

The -in Flag and -its Identifier TAGS

- -

 

- -

<-in> is the real -grinder of T-Coffee. Sequences, methods and alignments all pass through it so that -T-Coffee can turn them into a single list of constraints (the library). -Everything is done automatically, with T-Coffee going through each file to -extract the sequences it contains. The methods are then applied to the -sequences. Pre-compiled constraint lists can also be provided. Each file -provided via this flag must be preceded with a symbol (Identifier TAG) that -indicates its nature to T-Coffee. The TAGs currently supported are the -following:

- -

 

- -

P         PDB structure

- -

S          for sequences (use it as well to treat an MSA as unaligned -sequences)

- -

 

- -

M        Methods used to build the library

- -

L         Pre-computed T-Coffee library

- -

A         Multiple Alignments that must be turned into a Library

- -

 

- -

X         Substitution matrices.

- -

R                     Profiles. These are legal multiple -alignments that will be treated as single sequences (the sequences they contain -will not be realigned).

- -

 

- -

If you do not want to use the TAGS, you will -need to use the following flags in replacement of -in. Do not use the TAGS when -using these flags:

- -

 

- -

-aln                             Alignments -   (A)

- -

-profile           Profiles -          (R)

- -

-method          Method -         (M)

- -

-seq                             Sequences -     (S)

- -

-lib                              Libraries        (L)

- -
- -

-in

- -

Usage: --in=[<P,S,A,L,M,X><name>,]

- -

Default: --in=Mlalign_id_pair,Mclustalw_pair

- -
- -

Note: -in can be replaced with the combined -usage of -aln, -profile, -pdb, -lib, -method.

- -
- -

See the box for an explanation of the --in flag. The following argument passed via -in

- -

 

- -
- -

PROMPT: -t_coffee --in=Ssample_seq1.fasta,Asample_aln1.aln,Asample_aln2.msf,Mlalign_id_pair,Lsample_lib1.tc_lib --outfile=outaln

- -
- -

 

- -

This command will trigger the following -chain of events:

- -

 

- -

1-Gather all the sequences

- -

Sequences within all the provided -files are pooled together. Format recognition is automatic. Duplicates are -removed (if they have the same name). Duplicates in a single file are only -tolerated in FASTA format file, although they will cause sequences to be -renamed.

- -

In the above case, the total set of -sequences will be made of the sequences contained in sample_seq1.fasta, -sample_aln1.aln, sample_aln2.msf and sample_lib1.tc_lib, plus the sequences initially -gathered by -infile.

- -

2-Turn alignments into libraries

- -

sample_aln1.aln and sample_aln2.msf -will be read and turned into libraries. Another library will be produced by -applying the method lalign_id_pair to the set of sequences previously obtained -(1). The final library used for the alignment will be the combination of all -this information.
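How an -in value breaks down into its Identifier TAGs can be sketched in a few lines of Python. This is an illustration under stated assumptions, not T-Coffee's own parser: the function name split_in_argument is hypothetical, and the sketch assumes that every entry carries a TAG (a file name that merely starts with one of the TAG letters would be misread).

```python
# Identifier TAGs as listed in this section (descriptions shortened).
TAGS = {
    "P": "PDB structure",
    "S": "sequences",
    "M": "method",
    "L": "library",
    "A": "alignment",
    "X": "substitution matrix",
    "R": "profile",
}


def split_in_argument(value):
    """Split an -in value such as 'Sseq.fasta,Mfast_pair' into (tag, name) pairs."""
    entries = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing comma
        tag, name = item[0], item[1:]
        if tag not in TAGS or not name:
            raise ValueError("missing or unknown identifier TAG in %r" % item)
        entries.append((tag, name))
    return entries
```

Applied to the -in value of the command above, this yields one S entry, two A entries, one M entry and one L entry, in the order given (which, as noted in the rules that follow, is irrelevant to T-Coffee).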

- -

Note as well the following rules:

- -

 

- -

1-Order: The order in which sequences, methods, alignments and libraries are -fed in is irrelevant.

- -

2-Heterogeneity: There is no need for each element (A, S, L) to -contain the same sequences.

- -

3-No Duplicate: Each file should contain only one copy of each -sequence. Duplicates are only allowed in FASTA files but will cause the -sequences to be renamed.

- -

4-Reconciliation: If two files (for instance two alignments) -contain different versions of the same sequence due to an indel, a new sequence -will be reconstructed and used instead:

- -

aln 1:hgab1   AAAAABAAAAA

- -

aln 2:hgab1   AAAAAAAAAACCC

- -

will cause the program to reconstruct -and use the following sequence

- -

hgab1   AAAAABAAAAACCC -

- -

This can be useful if you are trying -to combine several runs of blast, or structural information where residues may -have been deleted. However substitutions are forbidden. If two sequences with -the same name cannot be merged, they will cause the program to exit with an -information message.
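One way to picture the reconciliation of the hgab1 example is as a shortest common supersequence of the two versions. The Python sketch below is an assumption about the idea, not the algorithm T-Coffee actually uses; the function name merge_versions is hypothetical. Note that, unlike T-Coffee, this sketch does not reject substitutions: it would silently treat a substitution as two indels.

```python
def merge_versions(a, b):
    """Shortest sequence that contains both a and b as subsequences."""
    n, m = len(a), len(b)
    # lcs[i][j] = length of the longest common subsequence of a[i:] and b[j:]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])
    merged = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:              # residue shared by both versions
            merged.append(a[i])
            i += 1
            j += 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            merged.append(a[i])       # residue present only in a
            i += 1
        else:
            merged.append(b[j])       # residue present only in b
            j += 1
    merged.extend(a[i:])              # whatever remains of either version
    merged.extend(b[j:])
    return "".join(merged)
```

For the two versions shown above, merge_versions("AAAAABAAAAA", "AAAAAAAAAACCC") gives AAAAABAAAAACCC, the sequence quoted in the text.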

- -

5-Methods: The method describer can either be built in -(See ### for a list of all the available methods) or be a file describing the -method to be used. The exact syntax is provided in part 4 of this manual.

- -

6-Substitution Matrices: If the method is a substitution matrix (X) then -no other type of information should be provided. For instance:

- -
- -

PROMPT: -t_coffee sample_seq1.fasta -in=Xpam250mt  --gapopen=-10  -gapext=-1

- -
- -

This command results in a progressive -alignment carried out on the sequences in the input file. The procedure no longer uses -the T-Coffee consistency based algorithm, but switches to a standard -progressive alignment algorithm (like ClustalW or Pileup), which is much less accurate. -In this context, appropriate gap penalties should be provided. The matrices are -in the file source/matrices.h. Ad hoc matrices can also be provided by the -user (see the matrices format section at the end of this manual).

- -
- -

Warning: Xmatrix does not have the same -effect as using the -matrix flag.  The --matrix defines the matrix that will be used while compiling the library while -the Xmatrix defines the matrix used when assembling the final alignment.

- -
- -

Profile Input

- -

-profile

- -

Usage: --profile=[<name>,] maximum of 200 profiles.

- -

Default: no default

- -

This flag causes T-Coffee to treat -multiple alignments as single sequences, thus making it possible to make -multiple profile alignments. The profile-profile alignment is controlled by -profile_mode and -profile_comparison. When provided -with the -in flag, profiles must be -preceded with the letter R.

- -
- -

PROMPT: -t_coffee -profile sample_aln1.aln,sample_aln2.aln -outfile=profile_aln

- -

PROMPT: -t_coffee -in Rsample_aln1.aln,Rsample_aln2.aln,Mslow_pair,Mlalign_id_pair --outfile=profile_aln

- -
- -

Note that when using –template_file, -the program will also look for the templates associated with the profiles, even -if the profiles have been provided as templates themselves (however it will not -look for the template of the profile templates of the profile templates…)

- -

-profile1 [cw]

- -

Usage: --profile1=[<name>], one name only

- -

Default: no default

- -

Similar to the previous one and was -provided for compatibility with ClustalW.

- -

-profile2 [cw]

- -

Usage: --profile2=[<name>], one name only

- -

Default: no default

- -

Similar to the previous one and was -provided for compatibility with ClustalW.

- -

Alignment Computation

- -

Library Computation: Methods

- -

-lalign_n_top

- -

Usage: --lalign_n_top=<Integer>

- -

Default: -lalign_n_top=10

- -

Number of alignments reported by the -local method (lalign).

- -

-align_pdb_param_file

- -

Unsupported

- -

-align_pdb_hasch_mode

- -

Unsupported

- -

Library -Computation: Extension

- -

-lib_list [Unsupported]

- -

Usage:  --lib_list=<filename>

- -

Default:unset

- -

Use this flag if you do not want the library computation to take -into account all the possible pairs in your dataset. For instance

- -

Format:

- -
- -

      2 Name1 name2

- -

      2 Name1 name4

- -

      3 Name1 Name2 Name3…

- -
- -

            (the line 3 would -be used by a multiple alignment method).

- -

-do_normalise

- -

Usage:  --do_normalise=<0 or a positive value>

- -

Default:-do_normalise=1000

- -

Development -Only

- -

When using a value different from 0, this flag sets the score of the -highest scoring pair to 1000.

- -

-extend

- -

Usage:  -extend=<0,1 or a positive value>

- -

Default:-extend=1

- -

Development Only

- -

When turned on, this flag indicates -that the library extension should be carried out when performing the multiple -alignment. If -extend =0, the extension is not made, if it is set to 1, -the extension is made on all the pairs in the library. If the extension is set -to another positive value, the extension is only carried out on pairs having a weight -value superior to the specified limit.

- -

-extend_mode

- -

Usage:  -extend_mode=<string>

- -

Default:-extend_mode=very_fast_triplet

- -

Warning: Development Only

- -

Controls the algorithm for matrix -extension. Available modes include:

- -

relative_triplet                  Unsupported

- -

g_coffee                                           Unsupported

- -

g_coffee_quadruplets    Unsupported

- -

fast_triplet                        Fast triplet extension

- -

very_fast_triplet                             slow triplet -extension, limited to the -max_n_pair best sequence pairs when aligning -two profiles

- -

slow_triplet                       Exhaustive use of all the triplets

- -

mixt                                   Unsupported

- -

quadruplet                        Unsupported

- -

test                                     Unsupported

- -

matrix                                               Use of the matrix -matrix

- -

fast_matrix                      Use of the matrix -matrix. Profiles are -turned into consensus

- -

-max_n_pair

- -

Usage:  -max_n_pair=<integer>

- -

Default:-max_n_pair=10

- -

Development Only

- -

Controls the number of pairs -considered by the -extend_mode=very_fast_triplet. Setting it to 0 forces -all the pairs to be considered (equivalent to -extend_mode=slow_triplet).

- -

-seq_name_for_quadruplet

- -

Usage:  Unsupported

- -

-compact

- -

Usage:  Unsupported

- -

-clean

- -

Usage:  Unsupported

- -

-maximise

- -

Usage:  Unsupported

- -

-do_self

- -

Usage:  Flag -do_self

- -

Default: No

- -

This flag causes the extension to -be carried out within the sequences (as opposed to between sequences). This is -necessary when looking for internal repeats with Mocca.

- -

-seq_name_for_quadruplet

- -

Usage:  Unsupported

- -

-weight

- -

Usage:  -weight=<winsimN, sim or -sim_<matrix_name or matrix_file> or <integer value>

- -

Default: -weight=sim

- -

Weight defines the way alignments are -weighted when turned into a library.  Overweighting can be obtained with the -OW<X> weight mode.

- -

 

- -

winsimN indicates that the weight -assigned to a given pair will be equal to the percent identity within a window -of 2N+1 length centered on that pair. For instance winsim10 defines a window of -10 residues on each side of the pair being considered (21 residues in total). This gives its own weight to each -residue in the output library. In our hands, this type of weighting scheme has -not provided any significant improvement over the standard sim value.

- -
- -

PROMPT: -t_coffee sample_seq1.fasta -weight=winsim10 -out_lib=test.tc_lib
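The winsimN weight can be illustrated with a short Python sketch that computes the percent identity in the 2N+1 columns centered on one column of a pairwise alignment. The function name window_identity is hypothetical, and two details are assumptions that the text does not specify: gapped columns are skipped, and the window is simply truncated at the ends of the alignment.

```python
def window_identity(row_a, row_b, column, n):
    """Percent identity over the 2n+1 aligned columns centered on `column`."""
    start = max(0, column - n)                 # truncate at the left end
    end = min(len(row_a), column + n + 1)      # truncate at the right end
    compared = 0
    identical = 0
    for x, y in zip(row_a[start:end], row_b[start:end]):
        if x == "-" or y == "-":
            continue                           # assumption: gaps are not counted
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0
```

With n=10 this corresponds to the winsim10 setting used in the command above; the sim mode, by contrast, uses a single identity value for the whole pair of sequences.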

- -
- -

sim indicates that the weight equals -the average identity within the sequences containing the matched residues.

- -

OW<X> Will -cause the sim weight to be multiplied by X

- -

sim_matrix_name indicates the average -identity with two residues regarded as identical when their substitution value -is positive. The valid matrix names are in matrices.h (e.g. pam250mt). Matrices not found in this header are -considered to be filenames. See the format section for matrices. For instance, -weight=sim_pam250mt indicates that the -grouping used for similarity will be the set of classes with positive -substitutions.

- -
- -

PROMPT: -t_coffee sample_seq1.fasta -weight=winsim10 -out_lib=test.tc_lib

- -
- -

Other groups include

- -

sim_clustalw_col ( categories of clustalw marked with :)

- -

sim_clustalw_dot ( categories of -clustalw marked with .)

- -

Value indicates that all the pairs found in -the alignments must be given the same weight equal to value. This is useful -when the alignment one wishes to turn into a library must be given a -pre-specified score (for instance if they come from a structure -super-imposition program). Value is an integer:

- -
- -

PROMPT: -t_coffee sample_seq1.fasta -weight=1000 -out_lib=test.tc_lib

- -
- -

Tree Computation

- -

-distance_matrix_mode

- -

Usage: --distance_matrix_mode=<slow, fast, very_fast>

- -

Default: very_fast

- -

This flag indicates the method used -for computing the distance matrix (distance between every pair of sequences) -required for the computation of the dendrogram.

- -

slow   The -chosen dp_mode using the extended library.

- -

fast   - The fasta dp_mode using the -extended library.

- -

very_fast          The fasta dp_mode using blosum62mt.

- -

ktup    Ktup matching (Muscle kind)

- -

aln                      Read the distances on a precomputed MSA

- -

-quicktree [CW]

- -

Usage: --quicktree

- -

Description: Causes T-Coffee to compute a -fast approximate guide tree

- -

This flag is kept for compatibility with -ClustalW. It is equivalent to -distance_matrix_mode=very_fast, so the following two commands do the same thing:

- -
- -

PROMPT: -t_coffee sample_seq1.fasta -distance_matrix_mode=very_fast

- -

PROMPT: -t_coffee sample_seq1.fasta -quicktree

- -
- -

Pair-wise Alignment Computation

- -

 

- -
- -

Controlling Alignment Computation

- -

 

- -

Most -parameters in this section refer to the alignment modes fasta_pair_wise and -cfasta_pair_wise. When using these alignment modes, things proceed as follows:

- -

1-Sequences -are recoded using a degenerated alphabet provided with <-sim_matrix>

- -

2-Recoded -sequences are then hashed into ktuples of size <-ktup>

- -

3-Dynamic -programming runs on the <-ndiag> best diagonals whose score is -higher than <-diag_threshold>, the way diagonals are scored is -controlled via <-diag_mode> .

- -

4-The -Dynamic computation is made to optimize either the library scoring scheme (as -defined by the -in flag) or a substitution matrix as provided via the -matrix -flag. The penalty scheme is defined by -gapopen and -gapext. If -gapopen -is undefined, the value defined in -cosmetic_penalty is used instead.

- -

5-Terminal -gaps are scored according to -tg_mode
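Steps 2 and 3 above (hashing into ktuples and selecting diagonals) can be sketched as follows. This Python fragment is a simplified illustration, not the T-Coffee implementation: the function name best_diagonals is hypothetical, each exact ktuple match is assumed to score 1 (the behaviour described for -diag_mode=0), and the recoding of step 1 is left out.

```python
from collections import defaultdict


def best_diagonals(seq_a, seq_b, ktup, ndiag, diag_threshold=0):
    """Return up to ndiag (diagonal, score) pairs with score > diag_threshold."""
    # Index every ktuple of seq_a by its start position.
    positions = defaultdict(list)
    for i in range(len(seq_a) - ktup + 1):
        positions[seq_a[i:i + ktup]].append(i)
    # Each ktuple shared at (i, j) adds 1 to the score of diagonal i - j.
    scores = defaultdict(int)
    for j in range(len(seq_b) - ktup + 1):
        for i in positions.get(seq_b[j:j + ktup], ()):
            scores[i - j] += 1
    kept = [(d, s) for d, s in scores.items() if s > diag_threshold]
    kept.sort(key=lambda item: (-item[1], item[0]))  # best score first
    return kept[:ndiag]
```

Dynamic programming (step 4) would then be restricted to the cells lying on the returned diagonals.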

- -
- -

 

- -

 

- -

-dp_mode

- -

Usage:  -dp_mode=<string>

- -

Default: -dp_mode=cfasta_pair_wise

- -

This flag indicates the type of -dynamic programming used by the program:

- -
- -

PROMPT: -t_coffee sample_seq1.fasta -dp_mode myers_miller_pair_wise

- -
- -

gotoh_pair_wise: implementation of -the gotoh algorithm (quadratic in memory and time)

- -

myers_miller_pair_wise: -implementation of the Myers and Miller dynamic programming algorithm -(quadratic in time and linear in space). This algorithm is recommended for very -long sequences. It is about 2 times slower than gotoh and only accepts tg_mode=1 or 2 (i.e. gaps penalized for -opening).

- -

fasta_pair_wise: implementation of the fasta algorithm. -The sequence is hashed, looking for ktuple -words. Dynamic programming is only carried out on the ndiag best scoring diagonals. This is much faster but less accurate -than the two previous modes. This mode is controlled by the parameters -ktuple, --diag_mode and -ndiag.

- -

cfasta_pair_wise: c stands for -checked. It is the same algorithm. The dynamic programming is made on the ndiag best diagonals, and then on the -2*ndiag best diagonals, and so on until the scores converge. Complexity will depend on the -level of divergence of the sequences, but will usually be L*log(L), with an -accuracy comparable to the first two modes (this was checked on BaliBase). This -mode is controlled by the parameters -ktuple, -diag_mode and -ndiag.

- -

Note: Users may -find by looking into the code that other modes with fancy names exist -(viterby_pair_wise…). Unless mentioned in this documentation, these modes are -not supported.

- -

-ktuple

- -

Usage:  -ktuple=<value>

- -

Default: -ktuple=1 or 2

- -

Indicates the ktuple size for -cfasta_pair_wise dp_mode and fasta_pair_wise. It is set to 1 for proteins, and -2 for DNA. The alphabet used for protein can be a degenerated version, set with --sim_matrix.

- -

-ndiag

- -

Usage:  -ndiag=<value>

- -

Default: -ndiag=0

- -

Indicates the number of diagonals -used by the fasta_pair_wise algorithm (cf -dp_mode). When  -ndiag=0, n_diag=Log (length of the -smallest sequence)+1.

- -

When –ndiag and -–diag_threshold are set, diagonals are selected if and only if they fulfill -both conditions.

- -

-diag_mode

- -

Usage:  -diag_mode=<value>

- -

Default: -diag_mode=0

- -

Indicates the manner in which -diagonals are scored during the fasta hashing.

- -

0: indicates that the score of a -diagonal is equal to the sum of the scores of the exact matches it contains.

- -

1 indicates that this score is set -equal to the score of the best uninterrupted segment (useful when dealing with -fragments of sequences).

- -

-diag_threshold

- -

Usage:  -diag_threshold=<value>

- -

Default: -diag_threshold=0

- -

Sets the value of the threshold when -selecting diagonals.

- -

0: indicates that –ndiag should be -used to select the diagonals (cf –ndiag section).

- -

-sim_matrix

- -

Usage:  -sim_matrix=<string>

- -

Default: -sim_matrix=vasiliky

- -

Indicates the manner in which the -amino acid alphabet is degenerated when hashing in the fasta_pairwise dynamic -programming. Standard ClustalW matrices are all valid. They are used to define -groups of amino acids having positive substitution values. In T-Coffee, the -default is a 13 letter grouping named Vasiliky, with residues grouped as -follows:

- -

rk, de, qh, vilm, fy (other residues kept -alone).

- -

This alphabet is set with the flag -sim_matrix=vasiliky. -In order to keep the alphabet non degenerated, -sim_matrix=idmat can be -used to retain the standard alphabet.
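The effect of the Vasiliky grouping on a protein sequence can be illustrated with a small Python sketch. Only the five groups come from the text above; the function name recode and the choice of the first letter of each group as its representative are assumptions made for the example.

```python
# Groups quoted above: rk, de, qh, vilm, fy (other residues are kept alone).
GROUPS = ("RK", "DE", "QH", "VILM", "FY")

# Map every grouped residue to the first letter of its group.
RECODE = {residue: group[0] for group in GROUPS for residue in group}


def recode(sequence):
    """Rewrite a protein sequence in the degenerated 13 letter alphabet."""
    return "".join(RECODE.get(residue, residue) for residue in sequence.upper())
```

Recoding the 20 standard amino acids this way leaves 13 distinct letters, which matches the 13 letter grouping mentioned above. An identity grouping (the role played by -sim_matrix=idmat) would leave every residue unchanged.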

- -

-matrix [CW]

- -

Usage:  -matrix=<blosum62mt>

- -

Default: -matrix=blosum62mt

- -

The usage of this flag has been -modified from previous versions, due to frequent mistakes in its usage. This -flag sets the matrix that will be used by alignment methods within t_coffee -(slow_pair, lalign_id_pair). It does not affect external methods (like -clustal_pair, clustal_aln…).

- -

Users can also provide their own -matrices, using the matrix format described in the appendix.

- -

-nomatch

- -

Usage:  -nomatch=<positive value>

- -

Default: -nomatch=0

- -

Indicates the penalty to associate -with a match that has no support in the library. When using a library, all matches are positive or equal to 0. -Matches equal to 0 are unsupported by the library but non-penalized. Setting -nomatch to a positive value makes it possible to penalize these null -matches and prevent unrelated sequences from being aligned (this can be useful -when the alignments are meant to be used for structural modeling).

- -

-gapopen

- -

Usage:  -gapopen=<negative value>

- -

Default: -gapopen=0

- -

Indicates the penalty applied for -opening a gap. The penalty must be negative. If no value is provided when using -a substitution matrix, a value will be automatically computed.

- -

Here are some guidelines regarding -the tuning of gapopen and gapext. In T-Coffee matches get a score between 0 -(match) and 1000 (match perfectly consistent with the library). The default -cosmetic penalty is set to -50 (5% of a perfect match). If you want to tune --gapopen and see a strong effect, you should therefore consider values between 0 -and -1000.

- -

-gapext

- -

Usage:  -gapext=<negative value>

- -

Default: -gapext=0

- -

Indicates the penalty applied for -extending a gap (cf -gapopen)

- -

-fgapopen

- -

Unsupported

- -

-fgapext

- -

Unsupported

- -

-cosmetic_penalty

- -

Usage:  -cosmetic_penalty=<negative value>

- -

Default: -cosmetic_penalty=-50

- -

Indicates the penalty applied for -opening a gap. This penalty is set to a very low value. It will only have an -influence on the portions of the alignment that are unalignable. It will not -make them more correct, but only more pleasing to the eye ( i.e. Avoid -stretches of lonely residues).

- -

The cosmetic penalty is automatically -turned off if a substitution matrix is used rather than a library.

- -

-tg_mode

- -

Usage:  -tg_mode=<0, 1, or 2>

- -

Default: -tg_mode=1

- -

0: terminal gaps penalized with -gapopen + -gapext*len

- -

1: terminal gaps penalized with -gapext*len

- -

2: terminal gaps unpenalized.
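The three -tg_mode values can be summarized by a small function. This is a Python sketch of the formulas listed above, with a hypothetical function name; gapopen and gapext are the (negative) values of the corresponding flags and length is the length of the terminal gap.

```python
def terminal_gap_penalty(length, gapopen, gapext, tg_mode):
    """Penalty applied to a terminal gap of the given length."""
    if tg_mode == 0:
        return gapopen + gapext * length   # opening and extension
    if tg_mode == 1:
        return gapext * length             # extension only
    if tg_mode == 2:
        return 0                           # terminal gaps are free
    raise ValueError("tg_mode must be 0, 1 or 2")
```

For instance, with gapopen=-10 and gapext=-1, a terminal gap of length 3 costs -13 in mode 0, -3 in mode 1 and 0 in mode 2.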

- -

 

- -

Weighting Schemes

- -

-seq_weight

- -

Usage: --seq_weight=<t_coffee or <file_name>>

- -

Default: -seq_weight=t_coffee

- -

These are the individual weights -assigned to each sequence. The t_coffee weights try to compensate the bias in -consistency caused by redundancy in the sequences.

- -

            sim(A,B)=%similarity -between A and B, between 0 and 1.

- -

            weight(A)=1/sum(sim(A,X)^3)

- -

Weights are normalized so that their -sum equals the number of sequences. They are applied onto the primary library -in the following manner:

- -

            res_score(Ax,By)=Min(weight(A), -weight(B))*res_score(Ax, By)

- -

These are very simple weights. Their -main goal is to prevent a single sequence present in many copies from dominating -the alignment.
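The weighting formulas above can be written out as a short Python sketch. It is an illustration only: the function names are hypothetical, the similarity matrix is supplied by the caller, and the sum in weight(A) is assumed to run over all sequences including A itself (the text does not say whether A is included).

```python
def sequence_weights(sim):
    """sim[a][x] is the similarity (0..1) between sequences a and x."""
    n = len(sim)
    # weight(A) = 1 / sum(sim(A, X)^3)
    raw = [1.0 / sum(sim[a][x] ** 3 for x in range(n)) for a in range(n)]
    # Normalize so that the weights sum to the number of sequences.
    scale = n / sum(raw)
    return [w * scale for w in raw]


def weighted_score(score, weight_a, weight_b):
    """res_score(Ax, By) = Min(weight(A), weight(B)) * res_score(Ax, By)"""
    return min(weight_a, weight_b) * score
```

With two identical sequences and one unrelated sequence, the two copies each receive half the weight of the third one, which is the intended compensation for redundancy.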

- -

Note: The -library output by -out_lib is the un-weighted  -library.

- -

Note: Weights -can be output using the -outseqweight flag.

- -

Note: You can -use your own weights (see the format section).

- -

 

- -

Multiple Alignment Computation

- -

-msa_mode

- -

Usage: --msa_mode=<tree,graph,precomputed>

- -

Default: -msa_mode=tree

- -

Unsupported

- -

-one2all

- -

Usage: -one2all=<name>

- -

Default: not used

- -

Will generate a one to all -library with respect to the specified sequence and will then align all the -sequences in turn to that sequence, in a sequence determined by the order in -which the sequences were provided. 

- -

-profile_comparison

- -

Usage: --profile_comparison=<fullN,profile>

- -

Default: -profile_comparison=full50

- -

The -profile_comparison flag controls the -multiple profile alignments in T-Coffee. There are two instances where t_coffee -can make multiple profile alignments:

- -

1-When N, the number of sequences is -higher than –maxnseq, the program -switches to its multiple profile alignment mode (t_coffee_dpa).

- -

2-When MSAs are provided via the –profile flag or via –profile1 and –profile2.

- -

In these situations, the --profile_comparison value influences the alignment computation. The possible values are:

- -

-profile_comparison=profile, the MSAs -provided via -profile are vectorized and the function specified by --profile_mode is used to make profile-profile alignments. In that case, -the complexity is NL^2.

- -

-profile_comparison=fullN, N is an integer value that can be omitted. Full indicates that given two profiles, the alignment will be based -on a library that includes every possible pair of sequences between the two -profiles. If N is set, then the library will be restricted to the N most -similar pairs of sequences between the two profiles, as judged from a measure -made on a pairwise alignment of these two profiles.

- -

-profile_mode

- -

Usage: --profile_mode=<cw_profile_profile, muscle_profile_profile, multi_channel>

- -

Default: -profile_mode=cw_profile_profile

- -

When –profile_comparison=profile, this flag selects a profile scoring -function.

- -

Alignment Post-Processing

- -

-clean_aln

- -

Usage:  -clean_aln  -

- -

Default:-clean_aln

- -

This flag causes T-Coffee to -post-process the multiple alignment. Residues that have a reliability score -smaller than or equal to -clean_threshold (as given by an evaluation that uses --clean_evaluation_mode) are realigned to -the rest of the alignment. Residues with a score higher than the threshold -constitute a rigid framework that cannot be altered.

- -

The cleaning algorithm is greedy. It -starts from the top left segment of low consistency residues and works its way -left to right, top to bottom along the alignment. You can require this -operation to be carried out for several cycles using the -clean_iteration -flag.

- -

The rationale behind this operation -is mostly cosmetic. In order to ensure a decent looking alignment, the gop is -set to -20 and the gep to -1. There is no penalty for terminal gaps, and the -matrix is blosum62mt.

- -

Note: Gaps are -always considered to have a reliability score of 0.

- -

Note: The use of the -cleaning option can result in memory overflow when aligning large sequences.

- -

-clean_threshold

- -

Usage:  -clean_threshold=<0-9> 

- -

Default:-clean_threshold=1

- -

See -clean_aln for details.

- -

-clean_iteration

- -

Usage:  -clean_iteration=<value between 1 and -> 

- -

Default:-clean_iteration=1

- -

See -clean_aln for details.

- -

-clean_evaluation_mode

- -

Usage:  -clean_evaluation_mode=<evaluation_mode>

- -

Default:-clean_evaluation_mode=t_coffee_non_extended

- -

Indicates the mode used for the evaluation that identifies the segments to be realigned. See -evaluate_mode for the list of accepted modes.

- -

-iterate

- -

Usage: --iterate=<integer>

- -

Default: -iterate=0

- -

Sequences are extracted in turn and realigned to the MSA. If -iterate is set to -1, each sequence is realigned; otherwise the number of iterations is set by -iterate.

- -

CPU Control

- -

Multithreading

- -

-multi_core

- -

Usage:  -multi_core= templates_jobs_relax_msa

- -

Default: 0

- -

Specifies the steps of T-Coffee that should be multithreaded. By default, all relevant steps are.

- -
- -

PROMPT: t_coffee sample_seq2.fasta -multi_core jobs

- -
- -

-n_core

- -

Usage:  -n_core= <number of cores>

- -

Default: 0

- -

Default indicates that all -cores will be used, as indicated by the environment via:

- -
- -

PROMPT: t_coffee sample_seq2.fasta -multi_core jobs

- -
- -

 

- -

Limits

- -

-mem_mode

- -

Usage:  deprecated

- -

-ulimit

- -

Usage:  -ulimit=<value>

- -

Default: -ulimit=0

- -

Specifies the upper limit of memory -usage (in Megabytes). Processes exceeding this limit will automatically exit. A -value 0 indicates that no limit applies.

- -

-maxlen

- -

Usage:  -maxlen=<value, 0=nolimit>

- -

Default: -maxlen=1000

- -

Indicates the maximum length of the -sequences.

- -

Aligning more than 100 sequences with DPA

- -

-maxnseq

- -

Usage:  -maxnseq=<value, 0=nolimit>

- -

Default: -maxnseq=50

- -

Indicates the maximum number of -sequences before triggering the use of t_coffee_dpa.

- -

-dpa_master_aln

- -

Usage: --dpa_master_aln=<File, method>

- -

Default: -dpa_master_aln=NO

- -

When using dpa, t_coffee needs a seed -alignment that can be computed using any appropriate method. By default, -t_coffee computes a fast approximate alignment.

- -

A pre-alignment can be provided -through this flag, as well as any program using the following syntax:

- -

your_script -in <fasta_file> -out <file_name>

- -

-dpa_maxnseq

- -

Usage: --dpa_maxnseq=<integer value>

- -

Default: -dpa_maxnseq=30

- -

Maximum number of sequences aligned simultaneously when DPA is run. Given the tree computed from the master alignment, a node is sent to computation if it controls more than -dpa_maxnseq sequences OR if it controls a pair of sequences having less than -dpa_min_score2 percent ID.

- -

-dpa_min_score1

- -

Usage: --dpa_min_score1=<integer value>

- -

Default: -dpa_min_score1=95

- -

Threshold for not realigning the sequences within the master alignment. Given this alignment and the associated tree, sequences below a node are not realigned if none of them has less than -dpa_min_score1 % identity.

- -

-dpa_min_score2

- -

Usage: --dpa_min_score2

- -

Default: -dpa_min_score2

- -

Minimum percent identity used when DPA is run. Given the tree computed from the master alignment, a node is sent to computation if it controls more than -dpa_maxnseq sequences OR if it controls a pair of sequences having less than -dpa_min_score2 percent ID.
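The node selection rule described for -dpa_maxnseq and -dpa_min_score2 can be sketched as a small predicate. This is only an illustration in Python, not the T-Coffee implementation: the function name and its arguments are hypothetical, and because no default value is given here for -dpa_min_score2, the threshold has to be passed explicitly.

```python
def node_sent_to_computation(n_seq, min_pair_identity, dpa_min_score2, dpa_maxnseq=30):
    """Return True if a tree node would be sent to computation.

    n_seq             -- number of sequences controlled by the node
    min_pair_identity -- lowest pairwise percent identity below the node
    dpa_min_score2    -- value given to -dpa_min_score2 (no default assumed)
    dpa_maxnseq       -- value given to -dpa_maxnseq (documented default: 30)
    """
    # The node qualifies if it controls too many sequences OR if at least
    # one pair of its sequences falls below the identity threshold.
    return n_seq > dpa_maxnseq or min_pair_identity < dpa_min_score2
```

For instance, with a hypothetical threshold of 50, a node controlling 10 sequences whose least similar pair shares 40 percent identity would be sent to computation.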

- -

-dpa_tree [NOT IMPLEMENTED]

- -

Usage:  -dpa_tree=<filename>

- -

Default: -unset

- -

Guide tree used in DPA. This is a -newick tree where the distance associated with each node is set to the minimum -pairwise distance among all considered sequences.

- -

Using Structures

- -

Generic

- -

-mode

- -

Usage: -mode=3dcoffee

- -

Default: turned off

- -

Runs t_coffee with the 3dcoffee mode -(cf next section).

- -

-check_pdb_status

- -

Usage: --check_pdb_status

- -

Default: turned off

- -

Forces t_coffee to run -extract_from_pdb to check the pdb status of each sequence. This can -considerably slow down the program.

- -

 

- -

3D Coffee: Using SAP

- -

It is possible to use t_coffee to -compute multiple structural alignments. To do so, ensure that you have the sap -program installed.

- -
- -

PROMPT: t_coffee -pdb=struc1.pdb,struc2.pdb,struc3.pdb -method sap_pair

- -
- -

This command will combine the pairwise alignments produced by SAP. There are currently four methods that can be interfaced with t_coffee:

- -

sap_pair: -that uses the sap algorithm

- -

align_pdb: -uses a t_coffee implementation of sap, not as accurate.

- -

tmalign_pair (http://zhang.bioinformatics.ku.edu/TM-align/)

- -

mustang_pair -(http://www.cs.mu.oz.au/~arun/mustang)

- -

When providing a PDB file, the computation is only carried out on the first chain of this file. If your original file contains several chains, you should extract the chain you want to work on. You can use t_coffee -other_pg extract_from_pdb or any pdb handling program.

- -

If you are working with public PDB -files, you can use the PDB identifier and specify the chain by adding its index -to the identifier (i.e. 1pdbC). If your structure is an NMR structure, you are -advised to provide the program with one structure only.

- -

If you wish to align only a portion of the structure, you should extract it yourself from the pdb file, using t_coffee -other_pg extract_from_pdb or any pdb handling program.

- -

You can provide t_coffee with a mixture of sequences and structures. In this case, you should use the special mode:

- -
- -

PROMPT: t_coffee -mode 3dcoffee -seq 3d_sample3.fasta -template_file template_file.template

- -
- -

Using/finding PDB templates for the Sequences

- -

-template_file

- -

Usage: -template_file=<filename, SCRIPT_scriptname, SELF_TAG, SEQFILE_TAG_filename, no>

- -

Default: no

- -

This flag instructs t_coffee on the -templates that will be used when combining several types of information. For -instance, when using structural information, this file will indicate the -structural template that corresponds to your sequences. The identifier T -indicates that the file should be a FASTA like file, formatted as follows. -There are several ways to pass the templates:

- -

Predefined Modes

- -

EXPRESSO: will use the EBI server to find -_P_ templates

- -

PSIBLAST: will use the EBI server to find profiles

- -

 

- -

File name

- -

This file contains the sequence/template association. It uses a FASTA-like format, as follows:

- -
- -

><sequence name> _P_ -<pdb template>

- -

><sequence name> _G_ -<gene template>

- -

><sequence name> _R_ -<MSA template>

- -

><sequence name> _F_ -<RNA Secondary Structure>

- -

><sequence name> _T_ -<Transmembrane Secondary Structure>

- -

><sequence name> _E_ -<Protein Secondary Structure>

- -

 

- -
- -

Each template will be used in place -of the sequence with the appropriate method. For instance, structural templates -will be aligned with sap_pair and the information thus generated will be -transferred onto the alignment.
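As an illustration of the FASTA-like template file described above, the following Python sketch formats sequence/template associations as header lines. The helper name and the example identifiers are hypothetical; only the tags and the "><sequence name> <tag> <template>" layout come from this section. The check that a sequence has at most one template per type reflects the rules listed in this section.

```python
# Tags listed in the file-format description of this section.
KNOWN_TAGS = ("_P_", "_G_", "_R_", "_F_", "_T_", "_E_")

def format_template_lines(associations):
    """Turn (sequence_name, tag, template) triples into template-file lines."""
    seen = set()
    lines = []
    for seq_name, tag, template in associations:
        if tag not in KNOWN_TAGS:
            raise ValueError("unknown template tag: " + tag)
        # A sequence may have one template of each type, but only one per type.
        if (seq_name, tag) in seen:
            raise ValueError(seq_name + " already has a " + tag + " template")
        seen.add((seq_name, tag))
        lines.append(">" + seq_name + " " + tag + " " + template)
    return lines

if __name__ == "__main__":
    # Hypothetical sequence names and templates, for illustration only.
    for line in format_template_lines([("seq1", "_P_", "1abcA"),
                                       ("seq2", "_P_", "1abcA"),
                                       ("seq1", "_R_", "seq1_profile.aln")]):
        print(line)
```

Several sequences sharing one template (seq1 and seq2 above) is accepted, while a second _P_ entry for seq1 would raise an error.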

- -

Note the following rule:

- -

            -Each -sequence can have one template of each type (structural, genomics…)

- -

            -Each -sequence can only have one template of a given type

- -

            -Several -sequences can share the same template

- -

            -All -the sequences do not need to have a template

- -

The type of template on which a -method works is declared with the SEQ_TYPE parameter in the method -configuration file:

- -

            SEQ_TYPE          S: a method that uses sequences

- -

            SEQ_TYPE          PS: a pairwise method that aligns -sequences and structures

- -

            SEQ_TYPE          P: a method that aligns structures -(sap for instance)

- -

There are 4 tags identifying the -template type:

- -

_P_         Structural templates: a pdb identifier -OR a pdb file

- -

_G_        Genomic templates: a protein sequence where boundary amino acids have been recoded with (o:0, i:1, j:2)

- -

_R_        Profile Templates: a file containing a -multiple sequence alignment

- -

_F_         RNA secondary Structures

- -

 

- -

More than one template file can be -provided. There is no need to have one template for every sequence in the -dataset.

- -

_P_, _G_, -and _R_ are known as template TAGS

- -

2-SCRIPT_<scriptname>

- -

Indicates that filename is a script -that will be used to generate a valid template file. The script will run on a -file containing all your sequences using the following syntax:

- -

scriptname -infile=<your sequences> -outfile=<template_file>

- -

It is also possible to pass some parameters: use @ as a separator and # in place of the = sign. For instance, if you want to call a script named blast.pl with the following parameters:

- -

blast.pl -db=pdb -dir=/local/test

- -

Use

- -

SCRIPT_blast.pl@db#pdb@dir#/local/test

- -

Bear in mind that the input output -flags will then be concatenated to this command line so that t_coffee ends up -calling the program using the following system call:

- -

blast.pl -db=pdb -dir=/local/test --infile=<some tmp file> -outfile=<another tmp file>
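The encoding of script parameters can be sketched in Python as follows. Both helper names are hypothetical and this is not T-Coffee code; the sketch only mirrors the convention stated above (@ as separator, # in place of =) and the way the -infile and -outfile flags are appended to the final call.

```python
def encode_script_template(script, params):
    """Build the value passed to -template_file for a script with parameters."""
    parts = ["SCRIPT_" + script]
    for name, value in params.items():
        parts.append(name + "#" + value)   # '#' stands in for '='
    return "@".join(parts)                 # '@' separates the parameters

def expand_script_template(spec, infile, outfile):
    """Rebuild the command line that ends up being called for such a value."""
    if not spec.startswith("SCRIPT_"):
        raise ValueError("not a SCRIPT_ specification: " + spec)
    script, *params = spec[len("SCRIPT_"):].split("@")
    argv = [script]
    for param in params:
        name, _, value = param.partition("#")
        argv.append("-" + name + "=" + value)
    # The input and output flags are concatenated after the user parameters.
    argv += ["-infile=" + infile, "-outfile=" + outfile]
    return argv
```

With the blast.pl example, encode_script_template("blast.pl", {"db": "pdb", "dir": "/local/test"}) gives back the SCRIPT_blast.pl@db#pdb@dir#/local/test string shown above.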

- -

 

- -

3-SELF_TAG

- -

TAG can take the value of any of the known -TAGS (_S_, _G_, _P_). SELF indicates that the original name of the sequence -will be used to fetch the template:

- -
- -

PROMPT: t_coffee 3d_sample2.fasta -template_file SELF_P_

- -
- -

The previous command will work -because the sequences in 3d_sample3 are named

- -

4-SEQFILE_TAG_filename

- -

Use this flag if your templates are -in filename, and are named according to the sequences. For instance, if your -protein sequences have been recoded with Exon/Intron information, you should -have the recoded sequences names according to the original:

- -

SEQFILE_G_recodedprotein.fasta 

- -

-struc_to_use

- -

Usage: --struc_to_use=<struc1, struc2…>

- -

Default: -struc_to_use=NULL

- -

Restricts the 3Dcoffee to a set of -pre-defined structures.

- -

Multiple Local Alignments

- -

It is possible to compute multiple local alignments using the mocca routine. MOCCA is a routine that extracts all the local alignments that show some similarity with another predefined fragment.

- -

'mocca' is a perl script that calls t-coffee -and provides it with the appropriate parameters.

- -

-domain/-mocca

- -

Usage: --domain

- -

Default: not set

- -

This flag indicates that t_coffee will run using the domain mode. All the sequences will be concatenated, and the resulting sequence will be compared to itself using the lalign_rs_s_pair mode (lalign of the sequence against itself, keeping the lalign raw score). This step is the most computationally intensive, and it is advisable to save the resulting file.

- -
- -

PROMPT: t_coffee -in Ssample_seq1.fasta,Mlalign_rs_s_pair -out_lib=sample_lib1.mocca_lib -domain -start=100 -len=50

- -
- -

This instruction will use the -fragment 100-150 on the concatenated sequences, as a template for the extracted -repeats. The extraction will only be made once. The library will be placed in -the file <lib name>.

- -

 

- -

If you want, you can test other -coordinates for the repeat, such as

- -
- -

PROMPT: t_coffee -in sample_lib1.mocca_lib -domain -start=100 -len=60

- -
- -

This run will use the fragment -100-160, and will be much faster because it does not need to re-compute the -lalign library.

- -

-start

- -

Usage: --start=<int value>

- -

Default: not set

- -

This flag indicates the starting -position of the portion of sequence that will be used as a template for the -repeat extraction. The value assumes that all the sequences have been -concatenated, and is given on the resulting sequence.

- -

-len

- -

Usage: --len=<int value>

- -

Default: not set

- -

This flag indicates the length of the -portion of sequence that will be used as a template.

- -

-scale

- -

Usage: --scale=<int value>

- -

Default: -scale=-100

- -

This flag indicates the value of the -threshold for extracting the repeats. The actual threshold is equal to:

- -

            motif_len*scale

- -

Increase the scale <=> Increase sensitivity <=> More alignments (e.g. -50).
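The threshold formula can be checked with a few lines of Python. The function name is hypothetical; the only facts taken from this section are the product motif_len*scale and the default -scale=-100.

```python
def extraction_threshold(motif_len, scale=-100):
    """Threshold used for extracting repeats: motif_len * scale."""
    return motif_len * scale

if __name__ == "__main__":
    # A 50-residue motif with the default scale, then with -scale=-50.
    print(extraction_threshold(50))        # default scale of -100
    print(extraction_threshold(50, -50))   # larger (less negative) scale
```

With a motif of length 50, moving the scale from -100 to -50 raises the threshold from -5000 to -2500.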

- -

-domain_interactive [Examples]

- -

Usage: --domain_interactive

- -

Default: unset

- -

Launches an interactive mocca -session.

- -
- -

PROMPT: -t_coffee -in Lsample_lib3.tc_lib,Mlalign_rs_s_pair -domain -start=100 -len=60

- -
- -
- -

TOLB_ECOLI_212_26                     211 SKLAYVTFESGR--SALVIQTLANGAVRQV-ASFPRHNGAPAFSPDGSKLAFA

- -

TOLB_ECOLI_165_218    164 TRIAYVVQTNGGQFPYELRVSDYDGYNQFVVHRSPQPLMSPAWSPDGSKLAYV

- -

TOLB_ECOLI_256_306    255 SKLAFALSKTGS--LNLYVMDLASGQIRQV-TDGRSNNTEPTWFPDSQNLAFT

- -

TOLB_ECOLI_307_350    306 -------DQAGR--PQVYKVNINGGAPQRI-TWEGSQNQDADVSSDGKFMVMV

- -

TOLB_ECOLI_351_393    350 -------SNGGQ--QHIAKQDLATGGV-QV-LSSTFLDETPSLAPNGTMVIYS 

- -

                        1           *             *    -:          .   -.:.  :   

- -

 

- -

        MENU: Type Letter Flag[number] and -Return: ex |10

- -

        |x      --->Set     the START to x

- -

        >x      -->Set     the LEN   -to x

- -

        Cx      --->Set     the sCale to x

- -

        Sname   --->Save    the  Alignment

- -

        Bx      --->Save    Goes back x it

- -

        return  --->Compute the  Alignment

- -

        X       --->eXit

- -

 

- -

[ITERATION   1] [START=211] [LEN= 50] [SCALE=-100]      YOUR CHOICE:

- -

For instance, to set the -length of the domain to 40, type:

- -

 

- -

[ITERATION   1] [START=211] [LEN= 50] [SCALE=-100]      YOUR CHOICE:>40[return]

- -

[return]

- -

 

- -

Which will generate:

- -

 

- -

TOLB_ECOLI_212_252    211 -SKLAYVTFESGRSALVIQTLANGAVRQVASFPRHNGAPAF  -251

- -

TOLB_ECOLI_256_296    255 -SKLAFALSKTGSLNLYVMDLASGQIRQVTDGRSNNTEPTW  -295

- -

TOLB_ECOLI_300_340    299 -QNLAFTSDQAGRPQVYKVNINGGAPQRITWEGSQNQDADV  -339

- -

TOLB_ECOLI_344_383    343 -KFMVMVSSNGGQQHIAKQDLATGGV-QVLSSTFLDETPSL  -382

- -

TOLB_ECOLI_387_427    386 -TMVIYSSSQGMGSVLNLVSTDGRFKARLPATDGQVKFPAW  -426

- -

                        1   :     -:     :           ::         -.     40

- -

 

- -

 

- -

 

- -

 

- -

        MENU: Type Letter Flag[number] and -Return: ex |10

- -

        |x      --->Set     the START to x

- -

        >x      -->Set     the LEN   -to x

- -

        Cx      --->Set     the sCale to x

- -

        Sname   --->Save    the  Alignment

- -

        -Bx      -->Save    Goes back x it

- -

        return  --->Compute the  Alignment

- -

        X       --->eXit

- -

 

- -

[ITERATION   3] [START=211] [LEN= 40] [SCALE=-100]      YOUR CHOICE:

- -
- -

 

- -

If you want to indicate the -coordinates, relative to a specific sequence, type:

- -

  |<seq_name>:start

- -

Type S<your name> to save the -current alignment, and extract a new motif.

- -

Type X when you are done.

- -

Output Control

- -

Generic

- -

Conventions Regarding -Filenames

- -

stdout, stderr, stdin, no and /dev/null are valid filenames. stdout and stderr cause the corresponding file to be output on stdout or stderr. For an input file, stdin causes the program to request the corresponding file through a pipe. no suppresses the output, as does /dev/null.

- -

Identifying the Output files -automatically

- -

In the t_coffee output, each output appears in -a line:

- -
- -

##### FILENAME <name> TYPE -<Type> FORMAT <Format>
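A script wrapping t_coffee could collect these lines to find the files that were produced. The Python sketch below is an assumption-laden illustration: the function name is hypothetical, and it assumes that the name, type and format fields contain no whitespace, which this section does not state.

```python
import re

# One output declaration per line: ##### FILENAME <name> TYPE <Type> FORMAT <Format>
OUTPUT_LINE = re.compile(r"^#####\s+FILENAME\s+(\S+)\s+TYPE\s+(\S+)\s+FORMAT\s+(\S+)\s*$")

def list_output_files(log_text):
    """Return a list of dicts describing the output files found in a log."""
    outputs = []
    for line in log_text.splitlines():
        match = OUTPUT_LINE.match(line.strip())
        if match:
            name, file_type, file_format = match.groups()
            outputs.append({"name": name, "type": file_type, "format": file_format})
    return outputs
```

Lines that do not follow the pattern are simply ignored, so the function can be applied to the complete program output.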

- -
- -

-no_warning

- -

Usage:  -no_warning

- -

Default: Switched off

- -

Suppresses warning output.

- -

 

- -

Alignments

- -

-outfile

- -

Usage:  -outfile=<out_aln file,default,no>

- -

Default:-outfile=default

- -

Indicates the name of the alignment -output by t_coffee. If the default is used, the alignment is named <your sequences>.aln

- -

-output

- -

Usage:  -output=<format1,format2,...>

- -

Default:-output=clustalw

- -

Indicates the format used for -outputting the -outfile.

- -

Supported formats are:

- -

           

- -

clustalw_aln, clustalw       : ClustalW format.

- -

gcg, msf_aln                        : MSF alignment.

- -

pir_aln                                  : pir alignment.

- -

fasta_aln                             : fasta alignment.

- -

phylip                                   : Phylip format.

- -

pir_seq                                  : pir sequences (no gap).

- -

fasta_seq                             : fasta sequences (no gap).

- -

                           

- -

As well as:

- -

 

- -

score_ascii           : causes the output of a reliability flag

- -

score_html           : causes the output to be a reliability plot in HTML

- -

score_pdf             : idem in PDF (if ps2pdf is installed on your system).

- -

score_ps                               : -idem in postscript.

- -

 

- -

More than one format can be -indicated:

- -
- -

PROMPT: t_coffee sample_seq1.fasta -output=clustalw,gcg,score_html

- -
- -

A publication describing the CORE -index is available on:

- -

http://www.tcoffee.org/Publications/Pdf/core.pp.pdf

- -

-outseqweight

- -

Usage:  -outseqweight=<filename>

- -

Default: not used

- -

Indicates the name of the file in which the sequence weights should be saved.

- -

-case

- -

Usage:  -case=<keep,upper,lower>

- -

Default: -case=keep

- -

Instructs the program on the case to be used in -the output file (Clustalw uses upper case). The default keeps the case and -makes it possible to maintain a mixture of upper and lower case residues.

- -

If you need to change the case of your file, -you can use seq_reformat:

- -
- -

PROMPT: t_coffee -other_pg seq_reformat -in sample_aln1.aln -action +lower -output clustalw

- -
- -

-cpu

- -

Usage:  deprecated

- -

-outseqweight

- -

Usage: -outseqweight=<name of the file -containing the weights applied>

- -

Default: -outseqweight=no

- -

Will cause the program to output the weights -associated with every sequence in the dataset.

- -

-outorder [cw]

- -

Usage:  -outorder=<input OR aligned OR -filename>

- -

Default:-outorder=input

- -

Sets the order of the sequences in the output alignment: -outorder=input means the sequences are kept in the original order. -outorder=aligned means the sequences come in the order indicated by the tree. This order can be seen as a one-dimensional projection of the tree distances. -outorder=<filename>: the file must be a legal fasta file, whose sequence order will be used in the final alignment.

- -

-inorder [cw]

- -

Usage:  -inorder=<input OR aligned>

- -

Default:-inorder=aligned

- -

Multiple alignments based on dynamic programming depend slightly on the order in which the incoming sequences are provided. To prevent this effect, sequences are arbitrarily sorted at the beginning of the program (-inorder=aligned). However, this affects the sequence order within the library. You can switch this off by setting -inorder=input.

- -

-seqnos

- -

Usage:  -seqnos=<on or off>

- -

Default:-seqnos=off

- -

Causes the output alignment to contain residue -numbers at the end of each line:

- -
- -

T-COFFEE

- -

seq1 aaa---aaaa--------aa 9

- -

seq2 a-----aa-----------a 4

- -

 

- -

seq1 a-----------------a 11

- -

seq2 aaaaaaaaaaaaaaaaaaa 19

- -
- -

Libraries

- -

Although it does not necessarily do so explicitly, T-Coffee always ends up combining libraries. Libraries are collections of pairs of residues. Given a set of libraries, T-Coffee attempts to assemble the alignment with the highest level of consistency. You can think of the alignment as a timetable. Each library pair would be a request from students or teachers, and the job of T-Coffee would be to assemble the timetable that makes as many people as possible happy.

- -

-out_lib

- -

Usage:  --out_lib=<name of the library,default,no>

- -

Default:-out_lib=default

- -

 

- -

Sets the name of the library output. -Default implies <run_name>.tc_lib

- -

-lib_only

- -

Usage:  -lib_only

- -

Default: unset

- -

Causes the program to stop once the -library has been computed. Must be used in conjunction with the flag –out_lib

- -

Trees

- -

-newtree

- -

Usage: --newtree=<tree file>

- -

Default: No file specified

- -

Indicates the name of the file into which the guide tree will be written. The default will be <sequence_name>.dnd, or <run_name>.dnd. The tree is written in the parenthesis format known as Newick or New Hampshire and used by Phylip (see the format section).

- -
- -

Do NOT confuse this guide tree with a -phylogenetic tree.

- -
- -

Reliability Estimation

- -

CORE Computation

- -

The CORE is an index that indicates the consistency between the library of pairwise alignments and the final multiple alignment. Our experiments indicate that the higher this consistency, the more reliable the alignment. A publication describing the CORE index can be found at:

- -

http://www.tcoffee.org/Publications/Pdf/core.pp.pdf

- -

-evaluate_mode

- -

Usage: --evaluate_mode=<t_coffee_fast,t_coffee_slow,t_coffee_non_extended >

- -

Default: -evaluate_mode=t_coffee_fast

- -

This flag indicates the mode used to -normalize the t_coffee score when computing the reliability score.

- -

t_coffee_fast: Normalization is made using the highest score in the MSA. This evaluation mode was validated and, in our hands, pairs of residues with a score of 5 or higher have a 90% chance of being correctly aligned to one another.

- -

t_coffee_slow: Normalization is made using the library. This usually results in lower scores and a scoring scheme more sensitive to the number of sequences in the dataset. Note that this scoring scheme is no longer slower, thanks to the implementation of a faster heuristic algorithm.

- -

t_coffee_non_extended: the score of each -residue is the ratio between the sum of its non extended scores with the column -and the sum of all its possible non extended scores.

- -

These modes will be useful when generating colored versions of the output, with the -output flag:

- -
- -

PROMPT: t_coffee sample_seq1.fasta -evaluate_mode t_coffee_slow -output score_ascii,score_html

- -

PROMPT: t_coffee sample_seq1.fasta -evaluate_mode t_coffee_fast -output score_ascii,score_html

- -

PROMPT: t_coffee sample_seq1.fasta -evaluate_mode t_coffee_non_extended -output score_ascii,score_html

- -
- -

Generic Output

- -

-run_name

- -

Usage: --run_name=<your run name>

- -

Default: no default set

- -

This flag causes the prefix <your -sequences> to be replaced by <your run name> when renaming the default -output files.

- -

-quiet

- -

Usage: --quiet=<stderr,stdout,file name OR nothing>.

- -

Default:-quiet=stderr

- -

Redirects the standard output to stderr, stdout or a file. -quiet on its own redirects the output to /dev/null.

- -

-align [CW]

- -

This flag indicates that the program must -produce the alignment. It is here for compatibility with ClustalW.

- -

APDB, iRMSD and tRMSD Parameters

- -

 

- -
- -

Warning: These flags will only work within the APDB package, which can be invoked via the -other_pg parameter of T-Coffee:

- -

                                t_coffee -other_pg apdb -aln <your aln>

- -

 

- -
- -

-quiet [Same as T-Coffee]

- -

-run_name [Same as -T-Coffee]

- -

-aln

- -

Usage: --aln=<file_name>.

- -

Default:none

- -

Indicates the name of the file -containing the sequences that need to be evaluated. The sequences whose -structure is meant to be used must be named according to their PDB identifier.

- -

The format can be FASTA, CLUSTAL or any of the formats supported by T-Coffee. APDB only evaluates residues in upper case and ignores those in lower case. If your sequences are in lower case, you can convert them to upper case using seq_reformat:

- -
- -

PROMPT: t_coffee -other_pg seq_reformat -in 3d_sample4.aln -action +upper -output clustalw > 3d_sample4.cw_aln

- -
- -

The alignment can then be evaluated using the defaults of APDB:

- -
- -

PROMPT: t_coffee -other_pg apdb -aln 3d_sample4.aln

- -
- -

The alignment can contain as many -structures as you wish.

- -

-n_excluded_nb

- -

Usage: --n_excluded_nb=<integer>.

- -

Default:1

- -

When evaluating the local score of a -pair of aligned residues, the residues immediately next to that column should -not contribute to the measure. By default the first to the left and first to -the right are excluded.

- -

-maximum_distance

- -

Usage: --maximum_distance=<float>.

- -

Default:10

- -

Size of the neighborhood considered around every residue. If -local_mode is set to sphere, -maximum_distance is the radius of a sphere centered around each residue. If -local_mode is set to window, then -maximum_distance is the size of the half window (i.e. window_size = maximum_distance*2+1).
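The window arithmetic can be illustrated with a short Python sketch. The function names are hypothetical, columns are numbered from 0 for convenience, and clipping the window at the edges of the alignment is an assumption of this sketch rather than documented APDB behavior.

```python
def window_size(maximum_distance):
    """Full window size for a given half window: maximum_distance*2+1."""
    return 2 * int(maximum_distance) + 1

def window_columns(center, maximum_distance, aln_length):
    """Columns covered by the window centered on `center` (0-based)."""
    half = int(maximum_distance)
    # Assumption: the window is truncated at both ends of the alignment.
    start = max(0, center - half)
    end = min(aln_length, center + half + 1)
    return range(start, end)
```

With the default -maximum_distance=10, the window spans 21 columns whenever it does not touch either end of the alignment.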

- -

-similarity_threshold

- -

Usage: --similarity_threshold=<integer>.

- -

Default:70

- -

Fraction of the neighborhood that must be supportive for a pair of residues to be considered correct in APDB. The neighborhood is a sphere defined by -maximum_distance, and the support is defined by -md_threshold.

- -

-local_mode

- -

Usage: --local_mode=<sphere,window>.

- -

Default:sphere

- -

Defines the shape of a neighborhood, -either as a sphere or as a window.

- -

-filter

- -

Usage: --filter=<0.00-1.00>.

- -

Default:1.00

- -

Defines the centiles that should be kept when making the local measure. For instance, -filter=0.90 means that the last 10 centiles will be removed from the evaluation. The filtration is carried out on the iRMSD values.

- -

-print_rapdb [Unsupported]

- -

Usage: --print_rapdb (FLAG)

- -

Default:off

- -

This flag prints out the exact neighborhood of every considered pair of residues.

- -

-outfile [Same as T-Coffee]

- -

This flag controls the name of the colored APDB output file. This file will display either the local APDB score or the local iRMSD, depending on the value of -color_mode. The default format is defined by -output and is score_html.

- -

-color_mode

- -

Usage: --color_mode=<apdb, irmsd>

- -

Default:apdb

- -

This flag controls the content of the colored APDB output (local score). This file will display either the local APDB score or the local iRMSD.

- - - -

We maintain a T-Coffee server (www.tcoffee.org). We will be pleased to provide the sources to anyone who wants to set up a similar service.

- -

Environment Variables

- -

T-Coffee stores a lot of information in locations that may be unsuitable when running a server.

- -

By default, T-Coffee will generate and rely on the following directory structure:

- -

/home/youraccount/         #HOME_4_TCOFFEE

- -

HOME_4_TCOFFEE/.t_coffee/ #DIR_4_TCOFFEE

- -

DIR_4_TCOFFEE/cache        #CACHE_4_TCOFFEE

- -

DIR_4_TCOFFEE/tmp          #TMP_4_TCOFFEE

- -

DIR_4_TCOFFEE/methods            #METHODS_4_TCOFFEE

- -

DIR_4_TCOFFEE/mcoffee            #MCOFFEE_4_TCOFFEE

- -

 

- -

By default, all these -directories are automatically created, following the dependencies suggested -here.

- -

The first step is the -determination of the HOME. By default the program tries to use HOME_4_TCOFFEE, -then the HOME variable and TMP or TEMP if HOME is not set on your system or -your account. It is your responsibility to make sure that one of these -variables is set to some valid location where the T-Coffee process is allowed -to read and write.

- -

If no valid location can be found for HOME_4_TCOFFEE, the program exits. If you are running T-Coffee on a server, we recommend hard-setting the following locations, where your scratch is a valid location.

- -

 

- -

HOME_4_TCOFFEE="your scratch"

- -

TMP_4_TCOFFEE="your scratch"

- -

DIR_4_TCOFFEE="your scratch"

- -

CACHE_4_TCOFFEE="your scratch"

- -

NO_ERROR_REPORT_4_TCOFFEE=1

- -

Note that  it is a good idea to have a cron job that -cleans up this scratch area, once in a while.
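When T-Coffee is launched from a server-side script, these variables can be set on the child process only. The following Python sketch builds such an environment; the function name and the scratch path are hypothetical, and the variable names are the ones listed above.

```python
import os

def tcoffee_server_env(scratch, base_env=None):
    """Return a copy of the environment with the T-Coffee locations hard set."""
    env = dict(os.environ if base_env is None else base_env)
    for name in ("HOME_4_TCOFFEE", "TMP_4_TCOFFEE",
                 "DIR_4_TCOFFEE", "CACHE_4_TCOFFEE"):
        env[name] = scratch              # all locations point to the scratch area
    env["NO_ERROR_REPORT_4_TCOFFEE"] = "1"
    return env

if __name__ == "__main__":
    # "/scratch/tcoffee" is a placeholder for a writable scratch directory.
    env = tcoffee_server_env("/scratch/tcoffee")
    print(env["HOME_4_TCOFFEE"])
```

The returned dictionary can be passed as the env argument of subprocess.run when calling t_coffee, which leaves the environment of the server process itself untouched.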

- -

 

- -

Output of the .dnd file.

- -

A common source of error when running a server: -T-Coffee MUST output the .dnd file because it re-reads it to carry out the -progressive alignment. By default T-Coffee outputs this file in the directory -where the process is running. If the T-Coffee process does not have permission -to write in that directory, the computation will abort...

- -

To avoid this, simply specify the name of the -output tree:

- -

         -newtree=<writable -file (usually in /tmp)>

- -

Choose the name so that two processes cannot overwrite each other's dnd file.
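One way to obtain a tree file name that is unique per process is sketched below in Python. The helper name and the naming scheme (process id plus a random suffix) are assumptions made for this example; only the -newtree flag comes from this documentation.

```python
import os
import tempfile
import uuid

def tcoffee_command(seq_file, tree_dir=None):
    """Build a t_coffee command line with a unique, writable -newtree file."""
    if tree_dir is None:
        tree_dir = tempfile.gettempdir()   # usually /tmp on Unix systems
    # Process id plus random suffix, so concurrent jobs never share a .dnd file.
    tree_name = "tcoffee_%d_%s.dnd" % (os.getpid(), uuid.uuid4().hex)
    tree_path = os.path.join(tree_dir, tree_name)
    return ["t_coffee", seq_file, "-newtree=" + tree_path], tree_path
```

The function only builds the argument list; the caller is expected to run it (for instance with subprocess.run) and to delete the tree file afterwards.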

- -

Permissions

- -

The t_coffee process MUST be allowed to write in some scratch area, even when it is run by the user nobody. Make sure the /tmp/ partition is not protected.

- -

Other Programs

- -

T-Coffee may call various programs while it runs (lalign2list by default). Make sure your process knows where to find these executables.

- -
- -

Formats

- -
- -

Parameter files

- -

Parameter files used with -parameters, -t_coffee_defaults, -dali_defaults, etc. must contain a valid parameter string, in which line breaks are allowed. These files cannot contain any comments. The recommended format is one parameter per line:

- -
- -

      <parameter -name>=<value1>,<value2>....

- -

      <parameter -name>=.....

- -
- -

Sequence Name Handling

- -

Sequence name handling is meant to be fully -consistent with ClustalW (Version 1.75). This implies that in some cases the -names of your sequences may be edited when coming out of the program. Five -rules apply:

- -

 

- -
- -

Naming -Your Sequences the Right Way

- -

1-No -Space

- -

Names -that do contain spaces, for instance:

- -

            >seq1 human_myc

- -

will be -turned into

- -

            >seq1

- -

It is -your responsibility to make sure that the names you provide are not ambiguous -after such an editing. This editing is consistent with Clustalw (Version 1.75)

- -

 

- -

2-No -Strange Character

- -

Some non-alphabetical characters are replaced with underscores. These are: ';:()'

- -

Other -characters are legal and will be kept unchanged. This editing is meant to keep -in line with Clustalw (Version 1.75).

- -

 

- -

3-> is -NEVER legal (except as a header token in a FASTA file)

- -

 

- -

4-Name -length must be below 100 characters, although 15 is recommended for -compatibility with other programs.

- -

5-Duplicated sequence names (i.e. sequences with the same name in the same dataset) are allowed, but the sequences will be renamed according to their original order. When sequences come from multiple sources via the -in flag, consistency of the renaming is not guaranteed. You should avoid duplicated names, as they will cause your input to differ from your output, making it difficult to track data.
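To check names before submitting a dataset, the rules above can be approximated with the following Python sketch. It is not the T-Coffee code: the function names are hypothetical, the replaced characters are assumed to be exactly ; : ( and ), and duplicates are only reported because the renaming scheme is not described here.

```python
from collections import Counter

def sanitize_name(header):
    """Apply rules 1 to 4 to a FASTA header or a bare sequence name."""
    name = header[1:] if header.startswith(">") else header   # header token
    fields = name.split()
    if not fields:
        raise ValueError("empty sequence name")
    name = fields[0]                       # rule 1: cut at the first space
    for char in ";:()":                    # rule 2: replaced with underscores
        name = name.replace(char, "_")
    if ">" in name:                        # rule 3: '>' is never legal
        raise ValueError("'>' is not allowed in a name: " + name)
    if len(name) >= 100:                   # rule 4: below 100 characters
        raise ValueError("name too long: " + name)
    return name

def duplicated_names(names):
    """Rule 5: list the names that occur more than once, in order of appearance."""
    counts = Counter(names)
    return [name for name in counts if counts[name] > 1]
```

Running duplicated_names on the sanitized names shows which sequences would be affected by renaming, including names that only become identical after the editing of rules 1 and 2.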

- -
- -

Automatic Format Recognition

- -

Most common formats are automatically -recognized by t_coffee. See -in and the next section for more details. If your -format is not recognized, use readseq or clustalw to switch to another format. -We recommend Fasta.

- -

Structures

- -

PDB format is recognized by T-Coffee. T-Coffee uses extract_from_pdb (cf. -other_pg flag). extract_from_pdb is a small embedded module that can be used on its own to extract information from pdb files.

- -

RNA Structures

- -

RNA structures can either be coded as T-Coffee -libraries, with each line indicating two paired residues, or as alifold output. -The selex format is also partly supported (see the seq_reformat tutorial on RNA -sequences handling).

- -

Sequences

- -

Sequences can come in the following formats: fasta, pir, swiss-prot, clustal aln, msf aln and t_coffee aln. These are the formats that are automatically recognized. Please replace the '*' sign sometimes used for stop codons with an X.

- -

Alignments

- -

Alignments can come in the following formats: -msf, ClustalW, Fasta, Pir and t_coffee. The t_coffee format is very similar to -the ClustalW format, but slightly more flexible. Any interleaved format with -sequence name on each line will be correctly parsed:

- -
- -

<empty line>       [Facultative]n

- -

<line of text>    [Required]

- -

<line of text>               [Facultative]n

- -

<empty line>                 [Required]

- -

<empty line>                 [Facultative]n

- -

<seq1 -name><space><seq1>

- -

<seq2 -name><space><seq2>

- -

<seq3 -name><space><seq3>

- -

<empty line>                 [Required]

- -

<empty line>                 [Facultative]n

- -

<seq1 -name><space><seq1>

- -

<seq2 -name><space><seq2>

- -

<seq3 -name><space><seq3>

- -

<empty line>                 [Required]

- -

<empty line>                 [Facultative]n

- -
- -

An empty line is a line that does NOT contain -any amino acid. A line that contains only the ClustalW annotation (.:*) is empty.

- -

Spaces are forbidden in the name. When the -alignment is being read, non-character signs are ignored in the sequence field -(such as numbers, annotation…).

- -

Note: a -different number of lines in the different blocks will cause the program to -crash or hang.
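The reading rules for interleaved alignments can be sketched as follows. This is not the t_coffee parser, only a minimal Python illustration: the names is_empty and parse_interleaved are hypothetical, and the sketch assumes that letters and the '-' gap character are the only signs kept in the sequence field.

```python
import re


def is_empty(line):
    """A line is 'empty' if it holds nothing but blanks or annotation signs (.:*)."""
    return all(ch in " \t.:*" for ch in line)


def parse_interleaved(text):
    """Return {name: sequence} from an interleaved alignment with a text header."""
    lines = text.splitlines()
    i = 0
    while i < len(lines) and is_empty(lines[i]):      # optional leading empty lines
        i += 1
    while i < len(lines) and not is_empty(lines[i]):  # header line(s) of text
        i += 1
    chunks = {}
    for line in lines[i:]:
        if is_empty(line):
            continue
        parts = line.split(None, 1)                   # <name><space><sequence field>
        if len(parts) != 2:
            continue
        name, field = parts
        # Numbers and other non-character signs are dropped from the sequence field.
        chunks.setdefault(name, []).append(re.sub(r"[^A-Za-z-]", "", field))
    return {name: "".join(pieces) for name, pieces in chunks.items()}
```

Each block simply appends to the sequence collected so far, so the order of names inside a block does not matter in this sketch.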

- -

Libraries

- -

T-COFFEE_LIB_FORMAT_01

- -

This is currently the only supported format.

- -
- -

!<space> TC_LIB_FORMAT_01

- -

<nseq>

- -

<seq1 name> <seq1 -length> <seq1>

- -

<seq2 name> <seq2 -length> <seq2>

- -

<seq3 name> <seq3 -length> <seq3>

- -

!Comment

- -

(!Comment)n

- -

#Si1 Si2

- -

Ri1 Ri2 V1 (V2, V3)

- -

#1 2

- -

12 13 99 (12/0 vs 13/1, weight -99)

- -

12 14 70

- -

15 16 56

- -

#1 3

- -

12 13 99

- -

12 14 70

- -

15 16 56

- -

!<space>SEQ_1_TO_N

- -
- -

Si1: index of Sequence 1

- -

Ri1: index of residue 1 in seq1

- -

V1: Integer Value: Weight

- -

V2, V3: optional values

- -

Note 1: There is -a space between the ! and SEQ_1_TO_N

- -

Note 2: The last -line (! SEQ_1_TO_N) indicates that:

- -

Sequences and residues are numbered from 1 to -N, unless the token SEQ_1_TO_N is omitted, in which case the sequences are -numbered from 0 to N-1, and residues are from 1 to N.

- -

Residues do not need to be sorted, and neither -do the sequences. The same pair can appear several times in the library. For -instance, the following file would be legal:

- -
- -

#1 2

- -

12 13 99

- -

#1 2

- -

15 16 99

- -

#1 1

- -

12 14 70

- -
- -

It is also possible to declare -ranges of residues rather than single pairs. For instance, the following:

- -
- -

#0 1

- -

+BLOCK+  10 12 14 99

- -

+BLOCK+  15 30 40 99

- -

#0 2

- -

15 16 99

- -

#0 1

- -

12 14 70

- -
- -

The first statement BLOCK -declares a BLOCK of length 10, that starts on position 12 of sequence 1 and -position 14 of sequence 2 and where each pair of residues within the block has -a score of 99. The second BLOCK starts on residue 30 of 1, residue 40 of 2 and -extends for 15 residues.

- -

Blocks can overlap and be -incompatible with one another, just like single constraints.
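The constraint section of a library (single pairs and blocks) can be read as sketched below. This is a minimal Python illustration, not the t_coffee reader: the function name parse_constraints is hypothetical, optional values V2 and V3 are ignored, and a block of length n starting at residues r1 and r2 is assumed to pair r1+k with r2+k for k = 0 to n-1.

```python
def parse_constraints(text):
    """Return a list of ((s1, s2), r1, r2, weight) tuples from a library body."""
    constraints = []
    pair = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("!"):
            continue                      # comments, format line, ! SEQ_1_TO_N
        if line.startswith("#"):
            s1, s2 = line[1:].split()[:2]
            pair = (int(s1), int(s2))     # current pair of sequence indices
            continue
        if pair is None:
            continue                      # header lines before the first '#'
        fields = line.split()
        if fields[0] == "+BLOCK+":
            length, r1, r2, weight = map(int, fields[1:5])
            for k in range(length):       # expand the block into single pairs
                constraints.append((pair, r1 + k, r2 + k, weight))
        else:
            r1, r2, weight = map(int, fields[:3])
            constraints.append((pair, r1, r2, weight))
    return constraints
```

Because the same pair may legally appear several times, the sketch keeps every constraint in a list instead of indexing them by residue pair.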

- -

 

- -

T-COFFEE_LIB_FORMAT_02

- -

A simpler format is being developed; however, it -is not yet fully supported and is only mentioned here for development purposes.

- -
- -

! TC_LIB_FORMAT_02

- -

#S1 SEQ1 [OPTIONAL]

- -

#S2 SEQ2 [OPTIONAL]

- -

...

- -

!comment [OPTIONAL]

- -

S1 R1 Ri1 S2 R2 Ri2 V1 (V2 V3)

- -

=> N R1 -Ri1 S2 R2 Ri2 V1 (V2 V3)

- -

...

- -
- -

S1, S2: name of sequence 1 and 2

- -

SEQ1: sequence of S1

- -

Ri1, Ri2: index of the residues in their -respective sequence

- -

R1, R2: Residue type

- -

V1, V2, V3: integer Values (V2 and V3 are -optional)

- -

Value1, Value 2 and Value3 are optional.

- -

Library List

- -

These -are lists of pairs of sequences that must be used to compute a library. The -format is:

- -
- -

<nseq> <S1> <S2>

- -

2 hamg2 globav

- -

3 hamgw hemog singa

- -

...

- -
- -

Substitution -matrices.

- -

If the required substitution matrix is not available, -write your own in a file using the following format:

- -

ClustalW Style [Deprecated]

- -
- -

# CLUSTALW_MATRIX FORMAT

- -

$

- -

v1

- -

v2 v3

- -

v4 v5 v6

- -

...

- -

$

- -
- -

v1, v2... are integers, possibly negative.

- -

The order of the amino acids is: -ABCDEFGHIKLMNQRSTVWXYZ, which means that v1 is the substitution value for A vs -A, v2 for A vs B, v3 for B vs B, v4 for A vs C and so on.
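The position of a value in this lower-triangular layout follows directly from the amino acid order. The Python sketch below is only an illustration of that arithmetic; the names ORDER and value_index are hypothetical.

```python
ORDER = "ABCDEFGHIKLMNQRSTVWXYZ"


def value_index(a, b):
    """Return n such that vn is the substitution value for residues a and b."""
    i, j = sorted((ORDER.index(a), ORDER.index(b)))
    # The rows before row j hold j*(j+1)/2 values; row j then adds i+1 more.
    return j * (j + 1) // 2 + i + 1
```

For example, value_index("A", "C") gives 4, which matches v4 being the value for A vs C.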

- -

BLAST Format [Recommended]

- -
- -

# BLAST_MATRIX FORMAT

- -

# ALPHABET=AGCT

- -

 A G C T

- -

A 0 1 2 3

- -

G 0 2 3 4

- -

C 1 1 2 3

- -

...

- -
- -

The alphabet can be freely defined
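A matrix written in this layout can be loaded with a few lines of code. The Python sketch below is an illustration only: the function name parse_blast_matrix is hypothetical, and it assumes that every line starting with '#' is a comment and that the first remaining line lists the column symbols.

```python
def parse_blast_matrix(text):
    """Return {(row_symbol, column_symbol): value} from a BLAST-style matrix."""
    rows = [
        line.split()
        for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
    columns = rows[0]                     # header row: the alphabet
    matrix = {}
    for row in rows[1:]:
        symbol, values = row[0], row[1:]
        for column, value in zip(columns, values):
            matrix[(symbol, column)] = int(value)
    return matrix
```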

- -

Sequences Weights

- -

Create your own weight file, using the --seq_weight flag:

- -
- -

# SINGLE_SEQ_WEIGHT_FORMAT_01

- -

seq_name1 v1

- -

seq_name2 v2

- -

...

- -
- -

No duplicates are allowed. Sequences not included in -the set of sequences provided to t_coffee will be ignored. Order is free. V1 is -a float. Un-weighted sequences will see their weight set to 1.
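The rules for weight files can be sketched as follows. This is a minimal Python illustration rather than the t_coffee implementation: the function name read_weights is hypothetical, and raising an error on a duplicated name is an assumption about how "no duplicates" might be enforced.

```python
def read_weights(text, names):
    """Return {name: weight} for the sequences in names; missing ones get 1.0."""
    weights = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # blank line or format line
        name, value = line.split()
        if name in weights:
            raise ValueError("duplicate entry for " + name)
        weights[name] = float(value)
    # Entries for unknown sequences are ignored; un-weighted sequences get 1.
    return {name: weights.get(name, 1.0) for name in names}
```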

- -
-
- - - -

1-Sensitivity to sequence order: It is -difficult to implement an MSA algorithm totally insensitive to the order of -input of the sequences. In t_coffee, robustness is increased by sorting the -sequences alphabetically before aligning them. Beware that this can result in -confusing output where sequences with similar names are unexpectedly close to -one another in the final alignment.

- -

2-Nucleotide sequences with long stretches of -Ns will cause problems for lalign, especially when using Mocca. To avoid any -problem, filter out these nucleotides before running Mocca.

- -

3-Stop codons are sometimes coded with '*' in -protein sequences. This will cause the program to crash or hang. Please replace -the '*' signs with an X.

- -

4-Results can differ from one architecture to -another, due to rounding differences. This is caused by the tree estimation -procedure. If you want to make sure an alignment is reproducible, you should -keep the associated dendrogram.

- -

5-Deploying the program on a

- - - -

These notes are only meant for internal -development.

- -

Development

- -

The following examples are only meant for -internal development, and are used to ensure stability from release to release.

- -

profile2list

- -

prf1: profile containing one structure

- -

prf2: profile containing one structure

- -

 

- -
- -

PROMPT: -t_coffee  -Rsample_profile1.aln,Rsample_profile2.aln -mode=3dcoffee --outfile=aligned_prf.aln

- -

 

- -
- -

Command Line List

- -

These command lines have been checked before -every release (along with the other command lines in this documentation):

- -

 

- -

-external methods;

- -
- -

   PROMPT: t_coffee sample_seq1.fasta –in=Mclustalw_pair,Mclustalw_msa,Mslow_pair -–outfile=clustal_text

- -
- -

-fugue_client

- -
- -

PROMPT: -t_coffee –in Ssample_seq5.fasta Pstruc4.pdb Mfugue_pair

- -
- -

-A list of command lines kindly provided by -James Watson (used to crash the program before version 3.40)

- -
- -

PROMPT: -t_coffee -in Sseq.fas P2PTC Mfugue_pair

- -

PROMPT: -t_coffee -in S2seqs.fas Mfugue_pair -template_file SELF_P_

- -

PROMPT: -t_coffee -mode 3dcoffee -in Sseq.fas P2PTC

- -

PROMPT: -t_coffee -mode 3dcoffee -in S2seqs.fas -template_file SELF_P_

- -
- -

-A list of command lines that crashed the -program before 3.81

- -
- -

PROMPT: -t_coffee sample_seq6.fasta –in Mfast_pair Msap_pair Mfugue_pair –template_file -template_file6.template

- -
- -

            -A -command line to read “relaxed” pdb files...

- -
- -

PROMPT: -t_coffee –in Msap_pair Ssample_seq7.fasta –template_file -template_file7.template –weight 1001 –out_lib test_lib7.tc_lib –lib_only

- -
- -

            -Parsing -of MARNA libraries

- -
- -

PROMPT: -t_coffee –in Lmarna.tc_lib –outfile maran.test

- -
- -

            -Parsing -of long sequence lines:

- -
- -

PROMPT: -t_coffee –in Asample_aln5.aln –outfile test.aln

- -
- -

 

- -
- -

To Do

- -
- -

-implement UPGMA tree computation

- -

-implement seq2dpa_tree

- -

-debug dpa

- -

-Reconcile sequences and template when -reading the template

- -

-Add the server command lines to the checking -procedure

- -

 

- -

 

- -

 

- -

 

- -

 

- -
- - - -