binaries/src/ViennaRNA/Cluster/AnalyseSeqs.1

.TH ANALYSESEQS l
.ER
.SH NAME
AnalyseSeqs \- Analyse a set of sequences of common length
.SH SYNOPSIS
\fBAnalyseSeqs [\-X[\fIbswn\fP]] [\-Q] [\-M{mask}[+|!]] [\-D{H|A|G}] [\-d{S|H|D|B}]\fP
.SH DESCRIPTION
.I AnalyseSeqs
reads a set of sequences from stdin and applies a variety of
sequence analysis methods to them. Currently available are:
.br
statistical geometry for quadruples of sequences (THIS IS
PRELIMINARY AND NOT YET WELL TESTED);
.br
split decomposition;
neighbour joining and Ward's variance method for reconstructing
phylogenies using various distance measures.
For statistical geometry and the cluster methods PostScript output
is available.
.br
The program continues reading until it encounters one of the
separator characters '@' or '%'. Only sequences of alphabetical
characters or of a specified alphabet are processed; all other
lines are ignored. The program stops reading
when it encounters an EOF condition or when there are no
valid sequence data between two lines beginning with separator
characters.
.br
A list of taxa names can be specified in the input stream. The list
begins with a line beginning with '*'. Optionally, a file name prefix
[fn] for the PostScript output can be specified in this line.
The entries have the form 'x : Taxon',
where x is the number of the taxon, i.e., of the corresponding entry in
the list of input sequences. The taxa list need not be complete. It must
end, however, with a line beginning with '*' or any of the separator
characters. The taxa list is printed at the top of the output. The specified
taxa names are used as labels in the PostScript output.
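.PP
As an illustration of the taxa list format described above, the following
Python sketch parses such a list. It is not taken from the AnalyseSeqs
sources; the function name parse_taxa_list is hypothetical, and treating the
first token after the opening '*' as the file name prefix is an assumption.
.nf
```python
def parse_taxa_list(lines):
    """Parse a taxa list; return (prefix, {sequence_number: taxon_name})."""
    it = iter(lines)
    prefix = None
    # Skip ahead to the line that opens the taxa list.
    for line in it:
        if line.startswith("*"):
            # Assumption: the optional prefix is the first token after "*".
            tokens = line[1:].split()
            prefix = tokens[0] if tokens else None
            break
    else:
        # No opening "*" line was found.
        return None, {}
    taxa = {}
    for line in it:
        # A "*" line or one of the separator characters ends the list.
        if line[:1] in ("*", "@", "%"):
            break
        number, sep, name = line.partition(":")
        # Keep only entries of the form "x : Taxon"; skip anything else.
        if sep and number.strip().isdigit():
            taxa[int(number)] = name.strip()
    return prefix, taxa
```
.fi
Because the taxa list need not be complete, the result is a mapping from
sequence number to name rather than a dense list.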
.SH OPTIONS
.IP \fB\-X[bswn]\fP
specifies the analysis methods to be used.
.IP \fB[b]\fP
Statistical Geometry. A PostScript file named '[fn_]box.ps' giving a
graphical representation of the statistical geometry is created. The
resulting box is a good measure of 'tree likeness' of the data set.
This is the default.
.IP \fB[s]\fP
Split decomposition.
.IP \fB[w]\fP
Cluster analysis using Ward's method. A PostScript file named '[fn_]wards.ps'
is created containing a drawing of the tree.
.IP \fB[n]\fP
Cluster analysis using Saitou's neighbour joining method. A PostScript
file named '[fn_]nj.ps' is created containing a drawing of the tree.
.IP \fB\-Q\fP
indicates that a statistical geometry analysis is to be performed
comparing four data sets, for instance to confirm the significance of
a proposed phylogeny. This option is only useful for statistical
geometry analysis and hence the \-X option is ignored. Each of the
four data sets must be of the form
.br
* [filename_prefix]
.br
# number
.br
[list of taxa names]
.br
*
.br
list of sequences
.br
%
.br
where number is 1, 2, 3, or 4, identifying the four groups to be compared.
.IP \fB\-M{mask}[+|!]\fP
specifies a mask for the input. '{mask}' is either one
of the following letters, indicating a predefined alphabet, or
the % sign followed by all characters to be accepted. A + sign
at the very end of the mask indicates that the input is to be
handled case-sensitively; by default the input is converted to
upper case. A ! sign converts the input data to
RY code: GgAaXx -> R, UuCcKkTt -> Y, all other letters are
converted to *.
.IP \fB\-Ma\fP
all letters A-Z and a-z.
.IP \fB\-Mu\fP
uppercase letters.
.IP \fB\-Ml\fP
lowercase letters.
.IP \fB\-Mc\fP
digits [0-9].
.IP \fB\-Mn\fP
all alphanumeric characters.
.IP \fB\-MR\fP
RNA alphabet (GCAUgcau).
.IP \fB\-MD\fP
DNA alphabet (GCATgcat).
.IP \fB\-MA\fP
Amino acids in one-letter code.
.IP \fB\-MS\fP
Secondary structures coded as '^.()'.
.IP \fB\-M%alphabet\fP
use the specified alphabet.
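.PP
The RY conversion selected by the ! sign can be sketched in Python as
follows. This is only an illustration of the mapping given above, not the
program's own code; the name to_ry is hypothetical, and mapping every
character outside the two listed sets (not only letters) to * is an assumption.
.nf
```python
R_CODES = set("GgAaXx")    # characters converted to R
Y_CODES = set("UuCcKkTt")  # characters converted to Y

def to_ry(sequence):
    """Convert a sequence to RY code following the mapping for the ! sign."""
    out = []
    for ch in sequence:
        if ch in R_CODES:
            out.append("R")
        elif ch in Y_CODES:
            out.append("Y")
        else:
            # Assumption: every other character becomes "*".
            out.append("*")
    return "".join(out)
```
.fi
For example, to_ry("GAUCN") yields "RRYY*".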
.IP \fB\-D\fP
specifies the algorithm to be used for calculating the
distance matrix of the input data set. Available are
.IP \fB\-DH\fP
Hamming distance.
.IP \fB\-DA[,cost]\fP
Simple alignment distance according to Needleman and Wunsch.
A gap cost other than 1.0 can be specified after the comma.
.IP \fB\-DG[,cost1,cost2]\fP
Gotoh's distance with gap cost function
g(k) = cost2+cost1*(k-1). The condition cost2<=cost1 has to be fulfilled.
Default values are cost1=1.0 and cost2=1.0, yielding the same
distance as option A.
.br
ONLY THE HAMMING DISTANCE IS WELL TESTED SO FAR!
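.PP
To make the distance options more concrete, the following Python sketch
shows the Hamming distance and the gap cost function g(k) quoted above.
It is not the program's implementation; the names hamming_distance and
gap_cost are hypothetical, and the full Needleman and Wunsch and Gotoh
alignments are deliberately left out.
.nf
```python
def hamming_distance(a, b):
    """Number of positions at which two sequences of equal length differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))

def gap_cost(k, cost1=1.0, cost2=1.0):
    """Gap cost g(k) = cost2 + cost1*(k-1) for a gap of length k >= 1."""
    if k < 1:
        raise ValueError("gap length must be at least 1")
    return cost2 + cost1 * (k - 1)
```
.fi
With the default values cost1=1.0 and cost2=1.0 the function reduces to
g(k) = k, a cost of 1 per gap position, which is why the defaults give
the same distance as option A.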
.IP \fB\-d\fP
specifies the edit cost matrix to be used. Available are
.IP \fB\-dS\fP
Simple distance. Indels and substitutions of different characters
all have cost 1. The indel cost can be changed by specifying the
gap costs with the algorithm options \-DA and \-DG. This is the
default.
.IP \fB\-dH\fP
A distance matrix for RNA secondary structures, inspired by
Hogeweg's similarity measure (J.Mol.Biol 1988).
The gap function is set automatically.
.IP \fB\-dD\fP
Dayhoff's matrix for amino acid distances.
.IP \fB\-dB\fP
Distinguish purines and pyrimidines only.
CAUTION: this option influences only the calculation of distances.
It does NOT affect the computation of the statistical geometry, which is
done directly on the sequences. To do statistical geometry on
RY sequences, use the ! sign with the \-M option, for instance \-MR!.
.SH REFERENCES
The method of statistical geometry was introduced by
M. Eigen, R. Winkler-Oswatitsch and A.W.M. Dress
(Proc Natl Acad Sci, 85:1988,5912).
The method of split decomposition was proposed by
H.J. Bandelt and A.W.M. Dress
(Adv Math, 92:1992,47).
The variance method for cluster analysis is due to J.H. Ward
(J Amer Stat Ass, 58:1963,236).
The neighbour joining method was published by Saitou and Nei
(Mol Biol Evol, 4:1987,406).
.PP
This program is part of the Vienna RNA Package.
.SH WARNING
This is the beta test version. Some options or combinations
of options may still produce nonsense. Please send bug reports to
ivo@tbi.univie.ac.at.
.SH VERSION
This man page is part of the Vienna RNA Package version 1.2.
.SH AUTHOR
Peter F. Stadler, Ivo L. Hofacker.
.SH BUGS
Comments should be sent to ivo@itc.univie.ac.at.
.br