1 HMMER 2.2 release notes
2 http://hmmer.wustl.edu/
3 SRE, Fri May 4 13:00:33 2001
4 ---------------------------------------------------------------
6 As it has been more than 2 years since the last HMMER release, this is
7 unlikely to be a comprehensive list of changes.
9 HMMER is now maintained under CVS. Anonymous read-only access to the
10 development code is permitted. To download the current snapshot:
11 > setenv CVSROOT :pserver:anonymous@skynet.wustl.edu:/repository/sre
13 [password is "anonymous"]
19 The following programs were added to the distribution:
21 - The program "afetch" can fetch an alignment from
22 a Stockholm format multiple alignment database (e.g. Pfam).
23 "afetch --index" creates the index files for such
26 - The program "shuffle" makes "randomized" sequences.
27 It supports a variety of sequence randomization methods,
28 including an implementation of Altschul/Erickson's
29 shuffling-while-preserving-digram-composition algorithm.
31 - The program "sindex" creates SSI indices from sequence
32 files, which "sfetch" can use to rapidly retrieve sequences
33 from databases. Previously, index files were constructed
34 with Perl scripts that were not supported as part of the
37 The following features were added:
39 - hmmsearch and hmmpfam can now use Pfam GA, TC, NC cutoffs,
40 if these have been picked up in the HMM file (by hmmbuild).
41 See the --cut_ga, --cut_tc, and --cut_nc options.
43 - "Stockholm format" alignments are supported, and have replaced
44 SELEX format as the default alignment format. Stockholm format
45 is the alignment format agreed upon by the Pfam Consortium,
46 providing extensible markup and annotation capabilities. HMMER
47 writes Stockholm format alignments by default. The program
48 sreformat can reformat alignments to other formats, including
49 Clustal and GCG MSF formats.
51 - To improve robustness, particularly in high-throughput annotation
52 pipelines, all programs now accept an option --informat <s>,
53 where <s> is the name of a sequence file format (FASTA, for
54 example). The format autodetection code that is used by default
55 is almost always right, and is very helpful in interactive use
56 (HMMER reads almost anything without you worrying much about
57 format issues). --informat bypasses the autodetector, asserts
58 a particular format, and decreases the likelihood that HMMER
59 misparses a sequence file.
62 hmmpfam --acc reports HMM accession numbers instead of
63 HMM names in output files. [Pfam infrastructure]
65 sreformat --nogap, when reformatting an alignment,
66 removes all columns containing any gap symbols; useful
67 as a prefilter for phylogenetic analysis.
69 - The real software version of HMMER is logged into
70 the HMMER2.0 line of ASCII save files, for better
71 version control (e.g. bug tracking, but there are
74 - GCG MSF format reading/writing is now much more robust,
75 thanks to assistance from Steve Smith at GCG.
77 - The PVM implementation of hmmcalibrate is now
78 parallelized in a finer grained fashion; single models
79 can be accelerated. (The previous version parallelized
80 by assigning models to processors, so could not
81 accelerate a single model calibration.)
83 - hmmemit can now take HMM libraries as input, not just
84 a single HMM at a time - useful for instance for producing
85 "consensus sequences" for every model in Pfam with one
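The column filtering described for "sreformat --nogap" above can be sketched as follows. This is only an illustration of the idea, not sreformat's implementation: the function name, the set of characters treated as gap symbols, and the toy alignment are all assumptions.

```python
# Assumed gap symbols; the set actually recognized by sreformat may differ.
GAP_CHARS = set("-._~")

def remove_gapped_columns(aligned_seqs):
    """Return the alignment with every column containing a gap symbol removed."""
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    # Keep a column only if no sequence has a gap symbol in it.
    keep = [i for i in range(length)
            if not any(s[i] in GAP_CHARS for s in aligned_seqs)]
    return ["".join(s[i] for i in keep) for s in aligned_seqs]

# Hypothetical three-sequence alignment, seven columns wide.
alignment = ["ACD-EFG", "AC.WEFG", "ACDWE-G"]
print(remove_gapped_columns(alignment))
```

Only columns in which every sequence has a residue survive, which is why the result suits methods that cannot handle gaps.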
88 The following changes may affect HMMER-compatible software:
90 - The name of the sequence retrieval program "getseq" was
91 changed to "sfetch" in this release. The name "getseq"
92 clashes with a Genetics Computer Group package program
93 of similar functionality.
95 - The output format for the headers of hmmsearch and hmmpfam
96 was changed. The accessions and descriptions of query
97 HMMs or sequences, respectively, are reported on separate
98 lines. An option ("--compat") is provided for reverting
99 to the previous format, if you don't want to rewrite your
100 parser(s) right away.
102 - hmmpfam now calculates E-values based on the actual
103 number of HMMs in the database that is searched, unless
104 overridden with the -Z option from the command line.
105 It used to use Z=59021 semi-arbitrarily to make results
106 jibe with a typical hmmsearch, but this just confused
107 people more than it helped. hmmpfam E-values will therefore
108 become more significant in this release by about 37x,
109 for a typical Pfam search (59021/1600 = 37).
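The arithmetic behind the 37x figure above can be checked with a short sketch. It assumes that an E-value is the per-comparison P-value multiplied by the search space size Z. The function name and the example P-value are hypothetical and are not taken from HMMER.

```python
def evalue(pvalue: float, z: int) -> float:
    """E-value under the assumption E = Z * P."""
    return z * pvalue

OLD_Z = 59021   # fixed Z formerly used by hmmpfam (from the notes above)
NEW_Z = 1600    # approximate number of HMMs in a typical Pfam search (from the notes above)

p = 1e-6        # hypothetical P-value for a single hit
old_e = evalue(p, OLD_Z)
new_e = evalue(p, NEW_Z)
ratio = old_e / new_e
print(f"old E = {old_e:.3g}, new E = {new_e:.3g}, ratio = {ratio:.1f}")
```

Under that assumption the ratio depends only on the two Z values, not on the P-value, so every hit in such a search shifts by the same factor.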
111 The following major bugs were fixed:
114 The following minor bugs were fixed:
115 - more argument casting to silence compiler warnings
116 [M. Regelson, Paracel]
118 - a potential reentrancy problem with setting the
119 alphabet type in the threads version was
120 fixed, but this problem is unlikely to have ever affected
121 anyone. [M. Sievers, Paracel].
123 - fixed a bug where hmmbuild on Solaris machines would crash
124 when presented with an alignment with an #=ID line.
125 Same bug caused a crash when building a model from a single
126 sequence FASTA file [A. Bateman, Sanger]
128 - The configure script was modified to deal better with
129 different vendors' implementations of pthreads, in response
130 to a DEC Digital UNIX compilation problem [W. Pearson,
133 - Automatic sequence file format detection was slightly
134 improved, fixing a bug in detecting GCG-reformatted
135 Swissprot files [reported by J. Holzwarth]
137 - hmmpfam-pvm and hmmindex had a bad interaction if an HMM file had
138 accession numbers as well as names (e.g., Pfam). The phenotype was
139 that hmmpfam-pvm would search each model twice: once for its name,
140 and once for its accession. hmmindex now uses a new
141 indexing scheme (SSI, replacing GSI). [multiple reports;
142 often manifested as a failure of the StL Pfam server to
143 install, because of an hmmindex --one2one option in the Makefile; this was
144 a local hack, never distributed in HMMER].
146 - a rare floating exception bug in ExtremeValueP() was fixed;
147 range-checking protections in the function were in error, and
148 a range error in a log() calculation appeared on
149 Digital UNIX platforms for a *very* tiny set of scores
150 for any given mu, lambda.
152 - The default null2 score correction was applied in
153 a way that was justifiable, but differed between per-seq
154 and per-domain scores; thus per-domain scores did not
155 necessarily add up to per-seq scores. In certain cases
156 this produced counterintuitive results. null2 is now
157 applied in a way that is still justifiable, and also
158 consistent; per-domain scores add up to the per-seq score.
159 [first reported by David Kerk]
161 - --domE and --domT did not work correctly in hmmpfam, because
162 the code assumed that E-values are monotonic with score.
163 In some cases, this could cause HMMER to fail to report some
164 significant domains. [Christiane VanSchlun, GCG]
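For the ExtremeValueP() item above, here is a minimal sketch of a range-checked extreme value (Gumbel) tail probability, P(S >= x) = 1 - exp(-exp(-lambda * (x - mu))). It is not HMMER's code and does not reproduce the log() calculation mentioned in the note; it only illustrates guarding such a function against floating point range errors. The function name, the overflow threshold, and the example parameters are assumptions.

```python
import math

def extreme_value_p(x: float, mu: float, lam: float) -> float:
    """P(S >= x) for a Gumbel distribution with location mu and scale lam."""
    t = -lam * (x - mu)
    if t > 700.0:
        # exp(t) would overflow a double; the tail probability is 1 here.
        return 1.0
    y = math.exp(t)   # underflows quietly to 0.0 for very high scores
    # -expm1(-y) equals 1 - exp(-y) but stays accurate when y is tiny.
    return -math.expm1(-y)

# Hypothetical parameters: mu = -10, lambda = 0.7.
for score in (-2000.0, -10.0, 0.0, 50.0, 2000.0):
    print(score, extreme_value_p(score, -10.0, 0.7))
```

The explicit check handles the overflow end of the range, and expm1() avoids the cancellation that 1 - exp(-y) would suffer for very small tail probabilities.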
166 The following obscure bugs were fixed (i.e., there were no reports of
167 anyone but me detecting these bugs):
169 - sreformat no longer core dumps when reformatting a
170 single sequence to an alignment format.
172 - Banner() was printing a line to stdout instead of its
173 file handle... but Banner is always called w/ stdout as
174 its filehandle in the current implementation.
175 [M. Regelson, Paracel]
177 - .gz file reading is only supported on POSIX OS's. A compile
178 time define, SRE_STRICT_ANSI, may be defined to allow compiling
179 on ANSI compliant but non-POSIX operating systems.
181 - Several problems with robustness w.r.t. unexpected
182 combinations of command line options were detected by
183 GCG quality control testing. [Christiane VanSchlun]
185 (At least) the following projects remain incomplete:
187 - Ian Holmes' posterior probability routines (POSTAL) are
188 partially assimilated; see postprob.c, display.c
190 - CPU times can now be reported for serial, threaded,
191 and PVM executions; this is only supported by hmmcalibrate
194 - Mixture Dirichlet priors now include some ongoing work
195 in collaboration with Michael Asman and Erik Sonnhammer
196 in Stockholm; also #=GC X-PRM, X-PRT, X-PRI support in
197 hmmbuild/Stockholm annotation.