forester/archive/RIO/others/hmmer/squid/Man/shuffle.man

.TH "shuffle" 1 "@RELEASEDATE@" "@PACKAGE@ @RELEASE@" "@PACKAGE@ Manual"

.SH NAME
.TP
shuffle \- randomize the sequences in a sequence file

.SH SYNOPSIS
.B shuffle
.I [options]
.I seqfile

.SH DESCRIPTION

.B shuffle
reads a sequence file
.IR seqfile ,
randomizes each sequence, and prints the randomized sequences
in FASTA format on standard output. The sequence names
are unchanged, so each randomized sequence can be traced
back to its source sequence if necessary.

.PP
By default, each input sequence is simply shuffled, which preserves
its monosymbol composition exactly. To shuffle
each sequence while preserving both its monosymbol and disymbol
composition exactly, use the
.I -d
option.

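.PP
As an illustration only, the default shuffle can be sketched in Python
as a random permutation of the symbols of one sequence.
This is not the source code of
.BR shuffle ;
the function name
.I shuffle_sequence
and the fixed seed are assumptions made for the example.
.nf
```python
import random
from collections import Counter


def shuffle_sequence(seq, rng):
    """Return a random permutation of seq (hypothetical helper)."""
    symbols = list(seq)
    rng.shuffle(symbols)  # uniform in-place permutation
    return "".join(symbols)


rng = random.Random(42)  # fixed seed only to make the sketch repeatable
original = "ACGTACGGTTAACC"
shuffled = shuffle_sequence(original, rng)

# A permutation keeps every symbol count identical to the input.
assert Counter(shuffled) == Counter(original)
```
.fi
.PP
Because the output is a permutation of the input, the monosymbol
counts match exactly. Disymbol counts generally do not.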
.PP
The
.I -0
and
.I -1
options generate sequences with the same
Markov properties as each input sequence. With
.IR -0 ,
0th order Markov statistics (i.e. symbol composition)
are collected for each input sequence, and a new
sequence is generated from those symbol frequencies.
With
.IR -1 ,
the generated sequence has the same 1st order
Markov properties as the input sequence (i.e.
the same expected disymbol frequencies).

.PP
The default shuffle and
.I -0
are similar, as are
.I -d
and
.IR -1 .
The difference is that the shuffling algorithms preserve
composition exactly, while the Markov algorithms
generate sequences whose composition matches
that of the input only on average.

.PP
Other shuffling algorithms are also available,
as documented below in the options.

.SH OPTIONS

.TP
.B -0
Calculate 0th order Markov frequencies of each input sequence
(i.e. residue composition); generate the output sequence
using the same 0th order Markov frequencies.

.TP
.B -1
Calculate 1st order Markov frequencies for each input
sequence (i.e. diresidue composition); generate the output
sequence using the same 1st order Markov frequencies.
The first residue of the output sequence is always
the same as the first residue of the input sequence.

.TP
.B -d
Shuffle the input sequence while preserving both
monosymbol and disymbol composition exactly. Uses
an algorithm published by S.F. Altschul and B.W. Erickson,
Mol. Biol. Evol. 2:526-538, 1985.

.TP
.B -h
Print brief help; includes version number and summary of
all options, including expert options.

.TP
.B -l
Look only at the length of each input sequence; generate
an i.i.d. output protein sequence of that length,
using monoresidue frequencies typical of proteins
(taken from Swissprot 35).

.TP
.BI -n " <n>"
Make
.I <n>
different randomizations of each input sequence in
.IR seqfile ,
rather than the default of one.

.TP
.B -r
Generate the output sequence by reversing the
input sequence. Only one "randomization"
per input sequence is possible, so there is
no point in combining
.I -n
with reversal.

.TP
.BI -t " <n>"
Truncate each input sequence to a fixed length of exactly
.I <n>
residues. If the input sequence is shorter than
.IR <n> ,
it is discarded (so the output file may contain
fewer sequences than the input file).
If the input sequence is longer than
.IR <n> ,
a contiguous subsequence of that length is chosen at random.

.TP
.BI -w " <n>"
Regionally shuffle each input sequence in windows of size
.IR <n> ,
preserving local residue composition in each window.
This is probably a better shuffling algorithm for biosequences
with nonstationary residue composition (i.e. composition
that varies along the sequence, such as between
different isochores in human genome sequence).

.TP
.B -B
(Babelfish). Autodetect and read a sequence file format other than the
default (FASTA). Almost any common sequence file format is recognized
(including Genbank, EMBL, SWISS-PROT, PIR, and GCG unaligned sequence
formats, and Stockholm, GCG MSF, and Clustal alignment formats). See
the printed documentation for a complete list of supported formats.

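.PP
The regional shuffle selected by
.I -w
can be sketched in Python as below. This is an illustration under
stated assumptions, not the code used by
.BR shuffle :
the function name
.I window_shuffle
is hypothetical, and the sketch assumes consecutive, non-overlapping
windows counted from the start of the sequence, which this manual
does not specify.
.nf
```python
import random


def window_shuffle(seq, window, rng):
    """Shuffle seq independently inside consecutive windows."""
    if window < 1:
        raise ValueError("window must be a positive integer")
    pieces = []
    for start in range(0, len(seq), window):
        # The last chunk may be shorter than the window size.
        chunk = list(seq[start:start + window])
        rng.shuffle(chunk)
        pieces.append("".join(chunk))
    return "".join(pieces)
```
.fi
.PP
Each window is permuted on its own, so a region rich in one residue
stays rich in that residue in the output.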
.SH EXPERT OPTIONS

.TP
.BI --informat " <s>"
Specify that the sequence file is in format
.IR <s> ,
rather than the default FASTA format.
Common examples include Genbank, EMBL, GCG,
PIR, Stockholm, Clustal, MSF, or PHYLIP;
see the printed documentation for a complete list
of accepted format names.
This option overrides the default expected format (FASTA)
and the
.I -B
Babelfish autodetection option.

.TP
.B --nodesc
Do not output any sequence description in the output file,
only the sequence names.

.TP
.BI --seed " <s>"
Set the random number seed to
.IR <s> .
If you want reproducible results, use the same seed each time.
By default,
.B shuffle
uses a different seed each time, so it does not generate
the same output in subsequent runs with the same input.

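.PP
The effect of a fixed seed can be illustrated with a short Python
sketch. It uses the generator from the Python standard library, not the one
used by
.BR shuffle ,
and the function name
.I seeded_shuffle
is hypothetical.
.nf
```python
import random


def seeded_shuffle(seq, seed):
    """Shuffle seq with a private generator initialised from seed."""
    rng = random.Random(seed)
    symbols = list(seq)
    rng.shuffle(symbols)
    return "".join(symbols)


# The same seed and the same input give the same output on every call.
assert seeded_shuffle("ACGTACGT", 7) == seeded_shuffle("ACGTACGT", 7)
```
.fi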
.SH SEE ALSO

.PP
@SEEALSO@

.SH AUTHOR

@PACKAGE@ and its documentation are @COPYRIGHT@
HMMER - Biological sequence analysis with profile HMMs
Copyright (C) 1992-1999 Washington University School of Medicine
All Rights Reserved

    This source code is distributed under the terms of the
    GNU General Public License. See the files COPYING and LICENSE
    for details.
See COPYING in the source code distribution for more details, or contact me.

.nf
Sean Eddy
Dept. of Genetics
Washington Univ. School of Medicine
4566 Scott Ave.
St Louis, MO 63110 USA
Phone: 1-314-362-7666
FAX  : 1-314-362-7855
Email: eddy@genetics.wustl.edu
.fi