   1 .TH "sreformat" 1 "@RELEASEDATE@" "@PACKAGE@ @RELEASE@" "@PACKAGE@ Manual"
   2
   3 .SH NAME
   4 .TP
    5 sreformat - convert a sequence file to a different format
   6
   7 .SH SYNOPSIS
   8 .B sreformat
   9 .I [options]
  10 .I format
  11 .I seqfile
  12
  13 .SH DESCRIPTION
  14
  15 .B sreformat
  16 reads the sequence file
  17 .I seqfile
  18 in any supported format, reformats it
  19 into a new format specified by
  20 .I format,
  21 then prints the reformatted text.
  22
  23 .PP
  24 Supported input formats include (but are not limited to) the unaligned
  25 formats FASTA, Genbank, EMBL, SWISS-PROT, PIR, and GCG, and the
  26 aligned formats Stockholm, Clustal, GCG MSF, and Phylip.
  27
  28 .PP
  29 Available unaligned output file format codes
  30 include
  31 .I fasta
  32 (FASTA format);
  33 .I embl
   34 (EMBL/SWISS-PROT format);
  35 .I genbank
  36 (Genbank format);
  37 .I gcg
  38 (GCG single sequence format);
  39 .I gcgdata
  40 (GCG flatfile database format);
  41 .I strider
  42 (MacStrider format);
  43 .I zuker
  44 (Zuker MFOLD format);
  45 .I ig
  46 (Intelligenetics format);
  47 .I pir
  48 (PIR/CODATA flatfile format);
  49 .I squid
  50 (an undocumented St. Louis format);
  51 .I raw
  52 (raw sequence, no other information).
  53
   54 .PP
  55 The available aligned output file format
  56 codes include
  57 .I stockholm
  58 (PFAM/Stockholm format);
  59 .I msf
  60 (GCG MSF format);
  61 .I a2m
  62 (aligned FASTA format, called A2M by the UC Santa Cruz
  63 HMM group);
   64 .I phylip
   65 (Felsenstein's PHYLIP format); and
   66 .I selex
   67 (old SELEX/HMMER/Pfam annotated alignment format).
  68
   69 .PP
   70 All of these codes are interpreted case-insensitively
  71 (e.g. MSF, Msf, or msf all work).
  72
  73 .PP
  74 Unaligned format files cannot be reformatted to
  75 aligned formats.
  76 However, aligned formats can be reformatted
  77 to unaligned formats -- gap characters are
  78 simply stripped out.
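.PP
As a rough illustration of the gap stripping described above, the
following Python sketch removes gap characters from each aligned
sequence. It is not taken from sreformat itself: the helper name
strip_gaps and the set of gap characters are assumptions made for
this example.
.nf
```python
# Hypothetical helper, not part of sreformat: drop gap characters
# from each aligned sequence to obtain unaligned sequences.
GAP_CHARS = set("-._~")  # assumed gap alphabet for this sketch


def strip_gaps(aligned):
    """Return a dict mapping each name to its sequence with gaps removed."""
    return {
        name: "".join(ch for ch in seq if ch not in GAP_CHARS)
        for name, seq in aligned.items()
    }


alignment = {"seq1": "AC-GT..A", "seq2": "A--GTCCA"}
unaligned = strip_gaps(alignment)
print(unaligned)
```
.fi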
  79
  80 .PP
  81 This program was originally named
  82 .B reformat,
  83 but that name clashes with a GCG program of the same name.
  84
  85 .SH OPTIONS
  86
  87 .TP
  88 .B -a
   89 Enable alignment reformatting. By default, sreformat treats
   90 the input file as an unaligned sequence file
   91 (even if it is an alignment), and it will not allow you
  92 to convert an unaligned file to an alignment (for obvious
  93 reasons).
   94 .IP
  95 This may seem silly; surely if sreformat can autodetect and parse
  96 alignment file formats as input, it can figure out when it's got an
  97 alignment! There are two reasons.  One is just the historical
  98 structure of the code. The other is that FASTA unaligned format and
  99 A2M aligned format (aligned FASTA) are impossible to tell apart with
 100 100% confidence.
 101
 102 .TP
 103 .B -d
 104 DNA; convert U's to T's, to make sure a nucleic acid
 105 sequence is shown as DNA not RNA. See
 106 .B -r.
 107
 108 .TP
 109 .B -h
 110 Print brief help; includes version number and summary of
 111 all options, including expert options.
 112
 113 .TP
 114 .B -l
 115 Lowercase; convert all sequence residues to lower case.
 116 See
 117 .B -u.
 118
 119 .TP
 120 .B -r
 121 RNA; convert T's to U's, to make sure a nucleic acid
 122 sequence is shown as RNA not DNA. See
 123 .B -d.
 124
 125 .TP
 126 .B -u
 127 Uppercase; convert all sequence residues to upper case.
 128 See
 129 .B -l.
 130
 131 .TP
 132 .B -x
 133 For DNA sequences, convert non-IUPAC characters (such as X's) to N's.
 134 This is for compatibility with benighted people who insist on using X
 135 instead of the IUPAC ambiguity character N. (X is for ambiguity
 136 in an amino acid residue).
  137 .IP
  138 Warning: the code doesn't
  139 check that you are actually giving it DNA. It simply
  140 converts every non-IUPAC DNA symbol to N. So
  141 if you accidentally give it protein sequence, it will
  142 happily convert most every amino acid residue to an N.
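.IP
The conversion can be pictured with the Python sketch below. It is an
illustration only, not sreformat's code: the function name
to_iupac_dna is hypothetical, and the sketch assumes that the full set
of IUPAC nucleotide codes is accepted and that non-letter characters
(such as gaps) are left alone. Under those assumptions, amino acid
letters that happen to coincide with an IUPAC nucleotide code pass
through unchanged, and only the remaining letters become N.
.nf
```python
# Hypothetical sketch of the -x behaviour described above; sreformat's
# real implementation may differ in its details.
IUPAC_DNA = set("ACGTURYMKSWHBVDN")  # standard IUPAC nucleotide codes


def to_iupac_dna(seq):
    """Replace letters that are not IUPAC nucleotide codes with N."""
    return "".join(
        "N" if ch.isalpha() and ch.upper() not in IUPAC_DNA else ch
        for ch in seq
    )


dna = to_iupac_dna("ACGTXXACGT")
protein = to_iupac_dna("MKVLPEFQ")  # protein given by mistake
print(dna)
print(protein)
```
.fi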
 143
 144 .TP
 145 .B -B
 146 (Babelfish). Autodetect and read a sequence file format other than the
 147 default (FASTA). Almost any common sequence file format is recognized
 148 (including Genbank, EMBL, SWISS-PROT, PIR, and GCG unaligned sequence
 149 formats, and Stockholm, GCG MSF, and Clustal alignment formats). See
 150 the printed documentation for a complete list of supported formats.
 151
 152
 153 .SH EXPERT OPTIONS
 154
 155 .TP
 156 .BI --informat " <s>"
 157 Specify that the sequence file is in format
 158 .I <s>,
 159 rather than the default FASTA format.
 160 Common examples include Genbank, EMBL, GCG,
 161 PIR, Stockholm, Clustal, MSF, or PHYLIP;
 162 see the printed documentation for a complete list
 163 of accepted format names.
 164 This option overrides the default format (FASTA)
 165 and the
  166 .B -B
 167 Babelfish autodetection option.
 168
 169 .TP
 170 .B --mingap
 171 If
 172 .I seqfile
  173 is an alignment, remove any columns that consist entirely of gap
 174 characters, minimizing the overall length of the alignment.
 175 (Often useful if you've extracted a subset of aligned
 176 sequences from a larger alignment.)
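.IP
A minimal Python sketch of this column removal follows. It is not
sreformat's implementation: the function name remove_all_gap_columns
and the set of gap characters are assumptions chosen for the example,
and the alignment is represented simply as a list of equal-length
strings.
.nf
```python
# Hypothetical sketch of the --mingap idea: drop all-gap columns.
GAP_CHARS = set("-._~")  # assumed gap alphabet for this sketch


def remove_all_gap_columns(rows):
    """Drop every column in which all rows hold a gap character."""
    keep = [
        col
        for col in range(len(rows[0]))
        if any(row[col] not in GAP_CHARS for row in rows)
    ]
    return ["".join(row[col] for col in keep) for row in rows]


subset = ["AC--GT", "A.--GT", "AC-.G-"]
compact = remove_all_gap_columns(subset)
print(compact)
```
.fi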
 177
 178 .TP
 179 .B --pfam
 180 For SELEX alignment output format only, put the entire
 181 alignment in one block (don't wrap into multiple blocks).
 182 This is close to the format used internally by Pfam
 183 in Stockholm and Cambridge.
 184
 185 .TP
 186 .B --sam
 187 Try to convert gap characters to UC Santa Cruz SAM style, where a .
 188 means a gap in an insert column, and a - means a
 189 deletion in a consensus/match column. This only
 190 works for converting aligned file formats, and only
 191 if the alignment already adheres to the SAM convention
 192 of upper case for residues in consensus/match columns,
 193 and lower case for residues in insert columns. This is
 194 true, for instance, of all alignments produced by old
 195 versions of HMMER. (HMMER2 produces alignments
 196 that adhere to SAM's conventions even in gap character choice.)
 197 This option was added to allow Pfam alignments to be
 198 reformatted into something more suitable for profile HMM
 199 construction using the UCSC SAM software.
 200
 201 .TP
 202 .BI --samfrac " <x>"
 203 Try to convert the alignment gap characters and
 204 residue cases to UC Santa Cruz SAM style, where a .
 205 means a gap in an insert column and a - means a
 206 deletion in a consensus/match column, and
 207 upper case means match/consensus residues and
  208 lower case means inserted residues. This will only
 209 work for converting aligned file formats, but unlike the
 210 .B --sam
 211 option, it will work regardless of whether the file adheres
 212 to the upper/lower case residue convention. Instead, any
 213 column containing more than a fraction
 214 .I <x>
 215 of gap characters is interpreted as an insert column,
 216 and all other columns are interpreted as match columns.
 217 This option was added to allow Pfam alignments to be
 218 reformatted into something more suitable for profile HMM
 219 construction using the UCSC SAM software.
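.IP
The column rule can be sketched in Python as follows. This is an
illustration under stated assumptions rather than sreformat's code:
the function name to_sam_style and the set of gap characters are
hypothetical, and the alignment is a list of equal-length strings. A
column whose gap fraction exceeds x is treated as an insert column
(lower case residues, "." gaps); every other column is treated as a
match column (upper case residues, "-" gaps).
.nf
```python
# Hypothetical sketch of the --samfrac column rule described above.
GAP_CHARS = set("-._~")  # assumed gap alphabet for this sketch


def to_sam_style(rows, x):
    """Mark columns with gap fraction > x as inserts, the rest as matches."""
    nseq = len(rows)
    out = [[] for _ in rows]
    for col in range(len(rows[0])):
        column = [row[col] for row in rows]
        ngap = sum(ch in GAP_CHARS for ch in column)
        is_insert = ngap / nseq > x
        for i, ch in enumerate(column):
            if ch in GAP_CHARS:
                out[i].append("." if is_insert else "-")
            else:
                out[i].append(ch.lower() if is_insert else ch.upper())
    return ["".join(chars) for chars in out]


msa = ["ACgT-A", "A-.TCA", "AC.T-a"]
sam = to_sam_style(msa, 0.5)
print(sam)
```
.fi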
 220
 221 .SH SEE ALSO
 222
 223 .PP
 224 @SEEALSO@
 225
 226 .SH AUTHOR
 227
  228 @PACKAGE@ and its documentation are @COPYRIGHT@
 229 HMMER - Biological sequence analysis with profile HMMs
 230 Copyright (C) 1992-1999 Washington University School of Medicine
 231 All Rights Reserved
 232
 233     This source code is distributed under the terms of the
 234     GNU General Public License. See the files COPYING and LICENSE
 235     for details.
 236 See COPYING in the source code distribution for more details, or contact me.
 237
 238 .nf
 239 Sean Eddy
 240 Dept. of Genetics
 241 Washington Univ. School of Medicine
 242 4566 Scott Ave.
 243 St Louis, MO 63110 USA
 244 Phone: 1-314-362-7666
 245 FAX  : 1-314-362-7855
 246 Email: eddy@genetics.wustl.edu
 247 .fi
 248
 249