.TH "shuffle" 1 "@RELEASEDATE@" "@PACKAGE@ @RELEASE@" "@PACKAGE@ Manual"
shuffle - randomize the sequences in a sequence file
randomizes each sequence, and prints the randomized sequences
in FASTA format on standard output. The sequence names
are unchanged, so you can trace each randomized sequence
back to its source if necessary.
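The read, randomize, and print cycle described above is shown here as a minimal Python sketch. It is not the program's source code. The helper names (read_fasta, shuffle_seq, write_fasta, shuffle_fasta) are hypothetical, the FASTA parser is simplified, and the 60-residue output line width is an assumption. The default shuffle is modeled as a Fisher-Yates permutation (Python's random.shuffle), which preserves monosymbol composition exactly.

```python
import random
from typing import Iterable, Iterator, TextIO, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, description, sequence) per FASTA record (simplified parser)."""
    name, desc, chunks = None, "", []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, desc, "".join(chunks)
            fields = line[1:].split(None, 1)
            name = fields[0] if fields else ""
            desc = fields[1] if len(fields) > 1 else ""
            chunks = []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, desc, "".join(chunks)


def shuffle_seq(seq: str, rng: random.Random) -> str:
    """Fisher-Yates permutation: residue composition is preserved exactly."""
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


def write_fasta(out: TextIO, name: str, desc: str, seq: str, width: int = 60) -> None:
    """Write one record under its original name; width=60 is an assumed line length."""
    print(f">{name} {desc}" if desc else f">{name}", file=out)
    for i in range(0, len(seq), width):
        print(seq[i:i + width], file=out)


def shuffle_fasta(infile: Iterable[str], outfile: TextIO, rng: random.Random) -> None:
    """Read every record, shuffle its sequence, and print it under the same name."""
    for name, desc, seq in read_fasta(infile):
        write_fasta(outfile, name, desc, shuffle_seq(seq, rng))
```

For example, calling shuffle_fasta(sys.stdin, sys.stdout, random.Random()) after importing sys writes one shuffled copy of each input record to standard output. Each record keeps its original name.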
The default is to simply shuffle each input sequence, preserving
monosymbol composition exactly. To shuffle
each sequence while preserving both its monosymbol and disymbol
composition exactly, use the
options allow you to generate sequences with the same
Markov properties as each input sequence. With
for each input sequence, 0th order Markov statistics
are collected (i.e., symbol composition), and a new
sequence is generated with the same composition.
the generated sequence has the same 1st order
Markov properties as the input sequence (i.e.,
the same disymbol frequencies).
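The 0th and 1st order Markov generators described above can be sketched as follows. This is a hypothetical Python illustration, not the program's own code. The function markov0 samples each residue independently from the input's composition. The function markov1 samples each residue conditioned on the previous one, using the input's disymbol counts, and copies the first residue as documented for the 1st order option. The manual does not say how a residue with no observed successor is handled, so the fallback to 0th order composition is an assumption.

```python
import random
from collections import Counter, defaultdict
from typing import Dict


def markov0(seq: str, rng: random.Random) -> str:
    """0th order: i.i.d. residues drawn from the input's own composition."""
    if not seq:
        return ""
    counts = Counter(seq)
    symbols = list(counts)
    weights = [counts[s] for s in symbols]
    return "".join(rng.choices(symbols, weights=weights, k=len(seq)))


def markov1(seq: str, rng: random.Random) -> str:
    """1st order: each residue drawn from the input's transition counts out of
    the previous residue. The first residue is copied from the input."""
    if not seq:
        return ""
    trans: Dict[str, Counter] = defaultdict(Counter)
    for a, b in zip(seq, seq[1:]):
        trans[a][b] += 1
    composition = Counter(seq)
    out = [seq[0]]
    while len(out) < len(seq):
        nxt = trans.get(out[-1])
        if not nxt:
            # ASSUMPTION: a residue seen only at the very end of the input has
            # no observed successor; fall back to the 0th order composition.
            nxt = composition
        symbols = list(nxt)
        out.append(rng.choices(symbols, weights=[nxt[s] for s in symbols])[0])
    return "".join(out)
```

Both generators match the input statistics only in expectation. Two runs on the same input will usually give outputs whose exact counts differ from the input and from each other.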
Note that the default and
are similar: the shuffling algorithms preserve
composition exactly, while the Markov algorithms
reproduce the input composition only on average, so any
single generated sequence may differ somewhat from it.
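A doublet-preserving shuffle shows why the shuffling algorithms preserve composition exactly. The Python sketch below follows the Euler-path idea associated with the Altschul and Erickson (1985) method cited under the options. Each adjacent residue pair becomes a directed edge, and a walk that uses every edge exactly once yields a new sequence with identical monosymbol and disymbol counts. This is an illustration under that assumption, not the program's implementation. The rejection loop for choosing "last exit" edges and the names doublet_shuffle and _reaches are hypothetical, and no claim is made that every valid shuffle is sampled with equal probability.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List


def _reaches(v: str, last: str, last_idx: Dict[str, int], edges: Dict[str, List[str]]) -> bool:
    """True if following last-exit edges from v arrives at `last` without a cycle."""
    seen = set()
    while v != last:
        if v in seen:
            return False
        seen.add(v)
        v = edges[v][last_idx[v]]
    return True


def doublet_shuffle(seq: str, rng: random.Random) -> str:
    """Shuffle seq while keeping mono- and disymbol counts exactly (Euler-path sketch)."""
    if len(seq) < 3:
        return seq
    edges: Dict[str, List[str]] = defaultdict(list)
    for a, b in zip(seq, seq[1:]):
        edges[a].append(b)
    first, last = seq[0], seq[-1]

    # Rejection-sample one last-exit edge per symbol (except the final symbol)
    # until the last exits form a tree leading to the final symbol.
    while True:
        last_idx = {v: rng.randrange(len(succ)) for v, succ in edges.items() if v != last}
        if all(_reaches(v, last, last_idx, edges) for v in last_idx):
            break

    # Per symbol: shuffle the remaining exits and use the last exit last.
    order: Dict[str, List[str]] = {}
    for v, succ in edges.items():
        if v in last_idx:
            k = last_idx[v]
            rest = succ[:k] + succ[k + 1:]
            rng.shuffle(rest)
            order[v] = rest + [succ[k]]
        else:
            rest = succ[:]
            rng.shuffle(rest)
            order[v] = rest

    # Walk from the first symbol, consuming each edge exactly once.
    pos: Counter = Counter()
    walk = [first]
    for _ in range(len(seq) - 1):
        v = walk[-1]
        walk.append(order[v][pos[v]])
        pos[v] += 1
    return "".join(walk)


if __name__ == "__main__":
    rng = random.Random(42)
    s = "ACGTTGCAACGGTACCA"
    t = doublet_shuffle(s, rng)
    # Both checks hold for every output, not only on average.
    print(Counter(t) == Counter(s), Counter(zip(t, t[1:])) == Counter(zip(s, s[1:])))
```

Every output of this construction keeps the first and last residues in place, because the walk must start at the first symbol and can only end at the last one.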
Other shuffling algorithms are also available,
as documented below in the options.
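One of these alternatives, the regional (windowed) shuffle, is simple to sketch. In this hypothetical Python version, each consecutive, non-overlapping window is shuffled independently. Each window keeps its own residue composition, and residues never move between windows. The manual does not say how a shorter final window is handled, so shuffling it as its own block is an assumption.

```python
import random


def window_shuffle(seq: str, window: int, rng: random.Random) -> str:
    """Shuffle residues within consecutive non-overlapping windows of `window` residues.

    ASSUMPTION: a final window shorter than `window` is shuffled on its own.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    out = []
    for start in range(0, len(seq), window):
        block = list(seq[start:start + window])
        rng.shuffle(block)  # local composition of this window is preserved
        out.extend(block)
    return "".join(out)
```

Smaller windows track local composition more closely but leave less room for randomization. A window as long as the sequence reduces this to the default whole-sequence shuffle.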
Calculate 0th order Markov frequencies of each input sequence
(i.e., residue composition); generate an output sequence
using the same 0th order Markov frequencies.
Calculate 1st order Markov frequencies for each input
sequence (i.e., diresidue composition); generate an output
sequence using the same 1st order Markov frequencies.
The first residue of the output sequence is always
the same as the first residue of the input sequence.
Shuffle the input sequence while preserving both
monosymbol and disymbol composition exactly. Uses
an algorithm published by S.F. Altschul and B.W. Erickson,
Mol. Biol. Evol. 2:526-538, 1985.
Print brief help, including the version number and a summary
of all options (expert options included).
Look only at the length of each input sequence; generate
an i.i.d. output protein sequence of that length,
using monoresidue frequencies typical of proteins
(taken from SWISS-PROT release 35).
different randomizations of each input sequence in
rather than the default of one.
Generate the output sequence by reversing the
input sequence. (Because reversal is deterministic, only one
"randomization" per input sequence is possible, so there is
no point in requesting more than one if you use reversal.)
Truncate each input sequence to a fixed length of exactly
the specified number of residues. If an input sequence is
shorter than this length, it is discarded (so the output file
may contain fewer sequences than the input file).
If an input sequence is longer, a contiguous subsequence
of the specified length is chosen at random.
Regionally shuffle each input sequence in windows of the
specified size, preserving the local residue composition within
each window. This is probably a better shuffling algorithm for
biosequences with nonstationary residue composition (i.e.,
composition that varies along the sequence, such as between
different isochores in human genomic sequence).
(Babelfish). Autodetect and read a sequence file format other than the
default (FASTA). Almost any common sequence file format is recognized,
including GenBank, EMBL, SWISS-PROT, PIR, and GCG unaligned sequence
formats and Stockholm, GCG MSF, and Clustal alignment formats. See
the printed documentation for a complete list of supported formats.
.BI --informat " <s>"
Specify that the sequence file is in format
.I <s>
rather than the default FASTA format.
Common examples include GenBank, EMBL, GCG,
PIR, Stockholm, Clustal, MSF, or PHYLIP;
see the printed documentation for a complete list
of accepted format names.
This option overrides both the default expected format (FASTA)
and the Babelfish autodetection option.
Do not output sequence descriptions in the output file;
write only the sequence names.
Set the random number seed to the given value.
If you want reproducible results, use the same seed each time.
By default, a different seed is used on each run, so repeated
runs on the same input will generally produce different output.
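The truncation and seeding behavior of the options above can be sketched together. In this hypothetical Python example, truncate and truncate_all are illustrative names and not part of the program. Sequences shorter than the requested length are dropped. Longer ones yield a randomly placed contiguous window. Passing the same seed reproduces the same choices. With seed=None, Python's random.Random seeds itself from an operating-system randomness source (or the system time), which mirrors the default of using a different seed on each run.

```python
import random
from typing import Iterable, List, Optional


def truncate(seq: str, length: int, rng: random.Random) -> Optional[str]:
    """Return a random contiguous subsequence of exactly `length` residues,
    or None if seq is too short (the caller then drops that sequence)."""
    if len(seq) < length:
        return None
    start = rng.randrange(len(seq) - length + 1)
    return seq[start:start + length]


def truncate_all(seqs: Iterable[str], length: int, seed: Optional[int] = None) -> List[str]:
    """Hypothetical driver: a fixed seed makes the output reproducible."""
    rng = random.Random(seed)
    kept = []
    for s in seqs:
        t = truncate(s, length, rng)
        if t is not None:  # too-short sequences are discarded
            kept.append(t)
    return kept
```

Creating one random.Random per run, rather than using the module-level functions, keeps the seeded state local. As a result, two runs with the same seed and input make identical choices.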
@PACKAGE@ and its documentation are @COPYRIGHT@
HMMER - Biological sequence analysis with profile HMMs
Copyright (C) 1992-1999 Washington University School of Medicine
This source code is distributed under the terms of the
GNU General Public License. See the files COPYING and LICENSE
in the source code distribution for details, or contact me.
Washington Univ. School of Medicine
St Louis, MO 63110 USA
Phone: 1-314-362-7666
Email: eddy@genetics.wustl.edu