.TH "hmmpfam" 1 @RELEASEDATE@ "HMMER @RELEASE@" "HMMER Manual"

.SH NAME
.TP
hmmpfam - search one or more sequences against an HMM database

.SH SYNOPSIS
.B hmmpfam
.I [options]
.I hmmfile
.I seqfile

.SH DESCRIPTION

.B hmmpfam
reads a sequence file
.I seqfile
and compares each sequence in it, one at a time, against all the HMMs in
.I hmmfile
looking for significantly similar sequence matches.

.PP
.I hmmfile
will be looked for first in the current working directory,
then in a directory named by the environment variable
.I HMMERDB.
This lets administrators install HMM libraries such as
Pfam in a common location.
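.PP
The lookup order described above can be sketched in Python.
This is an illustration only, not code from HMMER: the helper name
find_hmmfile is hypothetical, and HMMERDB is assumed to name a
single directory, as described here.
.nf
```python
import os
from typing import Mapping, Optional


def find_hmmfile(hmmfile: str,
                 env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the first existing path for hmmfile, or None.

    Hypothetical helper: checks the current working directory first,
    then the directory named by the HMMERDB environment variable.
    """
    env = os.environ if env is None else env
    candidates = [hmmfile]
    dbdir = env.get("HMMERDB")
    if dbdir:
        candidates.append(os.path.join(dbdir, hmmfile))
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None
```
.fi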

.PP
There is a separate output report for each sequence in
.I seqfile.
This report consists of three sections: a ranked list
of the best scoring HMMs, a list of the
best scoring domains in order of their occurrence
in the sequence, and alignments for all the best scoring
domains.
A sequence score may be higher than a domain score for
the same sequence if there is more than one domain in the sequence;
the sequence score takes into account all the domains.
All HMMs whose scores pass the
.I -E
and
.I -T
cutoffs are shown in the first list, then
.I every
domain found for the HMMs in this list is
shown in the second list of domain hits.
If desired, E-value and bit score thresholds may also be applied
to the domain list using the
.I --domE
and
.I --domT
options.
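.PP
The two levels of filtering can be sketched in Python as follows.
This is a minimal illustration under stated assumptions, not HMMER's
implementation: the class and function names are hypothetical, the
comparisons are assumed to be inclusive, and the first list is assumed
to be ranked by bit score. The default cutoffs are the ones documented
for
.I -E, -T, --domE
and
.I --domT.
.nf
```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Domain:
    start: int      # position of the domain in the sequence
    score: float    # per-domain bit score
    evalue: float   # per-domain E-value


@dataclass
class HmmHit:
    name: str
    score: float    # per-sequence bit score
    evalue: float   # per-sequence E-value
    domains: List[Domain] = field(default_factory=list)


def filter_report(hits, E=10.0, T=float("-inf"),
                  domE=float("inf"), domT=float("-inf")):
    """Return (ranked HMM list, domain list) after applying cutoffs."""
    # First list: HMMs passing -E and -T, best score first.
    ranked = sorted((h for h in hits if h.evalue <= E and h.score >= T),
                    key=lambda h: h.score, reverse=True)
    # Second list: domains of those HMMs that pass --domE and --domT,
    # in order of their occurrence in the sequence.
    domains = sorted(((h.name, d) for h in ranked for d in h.domains
                      if d.evalue <= domE and d.score >= domT),
                     key=lambda pair: pair[1].start)
    return ranked, domains
```
.fi
With the default cutoffs, every domain of every HMM in the first list
also appears in the second list.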

.SH OPTIONS

.TP
.B -h
Print brief help; includes version number and summary of
all options, including expert options.

.TP
.B -n
Specify that models and sequences are nucleic acid, not protein.
Other HMMER programs autodetect this, but because of the order in
which
.B hmmpfam
accesses data, it can't reliably determine the correct "alphabet"
by itself.

.TP
.BI -A " <n>"
Limits the alignment output to the
.I <n>
best scoring domains.
.B -A0
shuts off the alignment output and can be used to reduce
the size of output files.

.TP
.BI -E " <x>"
Set the E-value cutoff for the per-sequence ranked hit list to
.I <x>,
where
.I <x>
is a positive real number. The default is 10.0. Hits with E-values
better than (less than) this threshold will be shown.

.TP
.BI -T " <x>"
Set the bit score cutoff for the per-sequence ranked hit list to
.I <x>,
where
.I <x>
is a real number.
The default is negative infinity; by default, the threshold
is controlled by E-value and not by bit score.
Hits with bit scores better than (greater than) this threshold
will be shown.

.TP
.BI -Z " <n>"
Calculate E-values as if we had seen a sequence database of
.I <n>
sequences. The default is arbitrarily set to 59021, the size of
Swissprot 34.
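.IP
As a rough illustration of what
.B -Z
changes, the Python sketch below assumes that an E-value is the
P-value of a score multiplied by the number of sequences searched.
The function name is hypothetical; see the User's Guide for the
exact calculation.
.nf
```python
DEFAULT_Z = 59021  # default stated above: the size of Swissprot 34


def evalue(pvalue: float, z: int = DEFAULT_Z) -> float:
    """Expected number of hits at least this good among z sequences.

    Assumption: E-value = P-value * Z.
    """
    return pvalue * z
```
.fi
Under this assumption, a larger
.I <n>
makes the same bit score less significant.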

.SH EXPERT OPTIONS

.TP
.B --acc
Report HMM accessions instead of names in the output reports.
Useful for high-throughput annotation, where the data are being
parsed for storage in a relational database.

.TP
.B --compat
Use the output format of HMMER 2.1.1, the 1998-2001 public
release; provided so 2.1.1 parsers don't have to be rewritten.

.TP
.BI --cpu " <n>"
Sets the maximum number of CPUs that the program
will run on. The default is to use all CPUs
in the machine. Overrides the HMMER_NCPU
environment variable. Only affects threaded
versions of HMMER (the default on most systems).
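.IP
The precedence described above (the
.B --cpu
option, then HMMER_NCPU, then all CPUs) can be sketched in Python.
The function name is hypothetical and this is not HMMER's own code.
.nf
```python
import os
from typing import Mapping, Optional


def effective_ncpu(cpu_option: Optional[int] = None,
                   env: Optional[Mapping[str, str]] = None) -> int:
    """Pick the CPU limit: --cpu, then HMMER_NCPU, then all CPUs."""
    env = os.environ if env is None else env
    if cpu_option is not None:
        return cpu_option
    if env.get("HMMER_NCPU"):
        return int(env["HMMER_NCPU"])
    # Fall back to every CPU the machine reports (at least one).
    return os.cpu_count() or 1
```
.fi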

.TP
.B --cut_ga
Use Pfam GA (gathering threshold) score cutoffs.
Equivalent
to -T <GA1> --domT <GA2>, but the GA1 and GA2 cutoffs
are read from each HMM in
.I hmmfile
individually. hmmbuild puts these cutoffs there
if the alignment file was annotated in a Pfam-friendly
alignment format (extended SELEX or Stockholm format) and
the optional GA annotation line was present. If these
cutoffs are not set in the HMM file,
.B --cut_ga
doesn't work.

.TP
.B --cut_tc
Use Pfam TC (trusted cutoff) score cutoffs. Equivalent
to -T <TC1> --domT <TC2>, but the TC1 and TC2 cutoffs
are read from each HMM in
.I hmmfile
individually. hmmbuild puts these cutoffs there
if the alignment file was annotated in a Pfam-friendly
alignment format (extended SELEX or Stockholm format) and
the optional TC annotation line was present. If these
cutoffs are not set in the HMM file,
.B --cut_tc
doesn't work.

.TP
.B --cut_nc
Use Pfam NC (noise cutoff) score cutoffs. Equivalent
to -T <NC1> --domT <NC2>, but the NC1 and NC2 cutoffs
are read from each HMM in
.I hmmfile
individually. hmmbuild puts these cutoffs there
if the alignment file was annotated in a Pfam-friendly
alignment format (extended SELEX or Stockholm format) and
the optional NC annotation line was present. If these
cutoffs are not set in the HMM file,
.B --cut_nc
doesn't work.

.TP
.BI --domE " <x>"
Set the E-value cutoff for the per-domain ranked hit list to
.I <x>,
where
.I <x>
is a positive real number.
The default is infinity; by default, all domains in the sequences
that passed the first threshold will be reported in the second list,
so that the number of domains reported in the per-sequence list is
consistent with the number that appear in the per-domain list.

.TP
.BI --domT " <x>"
Set the bit score cutoff for the per-domain ranked hit list to
.I <x>,
where
.I <x>
is a real number. The default is negative infinity;
by default, all domains in the sequences
that passed the first threshold will be reported in the second list,
so that the number of domains reported in the per-sequence list is
consistent with the number that appear in the per-domain list.
.I Important note:
only one domain in a sequence is absolutely controlled by this
parameter, or by
.B --domE.
The second and subsequent domains in a sequence have a de facto
bit score threshold of 0 because of the details of how HMMER
works. HMMER requires at least one pass through the main model
per sequence; to do more than one pass (more than one domain)
the multidomain alignment must have a better score than the
single domain alignment, and hence the extra domains must contribute
positive score. See the User's Guide for more detail.

.TP
.B --forward
Use the Forward algorithm instead of the Viterbi algorithm
to determine the per-sequence scores. Per-domain scores are
still determined by the Viterbi algorithm. Some have argued that
Forward is a more sensitive algorithm for detecting remote
sequence homologues; my experiments with HMMER have not
confirmed this, however.

.TP
.BI --informat " <s>"
Assert that the input
.I seqfile
is in format
.I <s>;
do not run Babelfish format autodetection. This increases
the reliability of the program somewhat, because
the Babelfish can make mistakes; particularly
recommended for unattended, high-throughput runs
of HMMER. Valid format strings include FASTA,
GENBANK, EMBL, GCG, PIR, STOCKHOLM, SELEX, MSF,
CLUSTAL, and PHYLIP. See the User's Guide for a complete
list.

.TP
.B --null2
Turn off the post hoc second null model. By default, each alignment
is rescored by a postprocessing step that takes into account possible
biased composition in either the HMM or the target sequence.
This is almost essential in database searches, especially with
local alignment models. There is a very small chance that this
postprocessing might remove real matches, and
in these cases
.B --null2
may improve sensitivity at the expense of reducing
specificity by letting biased composition hits through.

.TP
.B --pvm
Run on a Parallel Virtual Machine (PVM). The PVM must
already be running. The client program
.B hmmpfam-pvm
must be installed on all the PVM nodes.
The HMM database
.I hmmfile
and an associated GSI index file
.IR hmmfile. gsi
must also be installed on all the PVM nodes.
(The GSI index is produced by the program
.BR hmmindex .)
Because the PVM implementation is I/O bound,
it is highly recommended that each node have a
local copy of
.I hmmfile
rather than NFS-mounting a shared copy.
Optional PVM support must have been compiled into
HMMER for
.B --pvm
to function.

.TP
.B --xnu
Turn on XNU filtering of target protein sequences. Has no effect
on nucleic acid sequences. In trial experiments,
.B --xnu
appears to perform less well than the default
post hoc null2 model.

.SH SEE ALSO

.PP
Master man page, with full list of and guide to the individual man
pages: see
.B hmmer(1).
.PP
A User's Guide and tutorial come with the distribution:
.B Userguide.ps
[PostScript] and/or
.B Userguide.pdf
[PDF].
.PP
Finally, all documentation is also available online via WWW:
.B http://hmmer.wustl.edu/

.SH AUTHOR

This software and documentation is:
.nf
@COPYRIGHT@
HMMER - Biological sequence analysis with profile HMMs
Copyright (C) 1992-1999 Washington University School of Medicine
All Rights Reserved

    This source code is distributed under the terms of the
    GNU General Public License. See the files COPYING and LICENSE
    for details.
.fi
See the file COPYING in your distribution for complete details.

.nf
Sean Eddy
HHMI/Dept. of Genetics
Washington Univ. School of Medicine
4566 Scott Ave.
St Louis, MO 63110 USA
Phone: 1-314-362-7666
FAX  : 1-314-362-7855
Email: eddy@genetics.wustl.edu
.fi