$Name: fa_34_26_5 $ - $Id: readme.v34t0,v 1.167 2007/04/26 18:42:43 wrp Exp $

>>April 26, 2007

Modify scaleswn.c to prevent mle_cen() from hanging when it fails to
converge.  Also, free() more arrays in work_thr.c; initialize
m_msg.hist.entries=0 in comp_lib.c, and various clean-ups for a_res
encoded alignments.

>>March 22, 2007

Update faatran.c genetic codes (and documentation on -t option).  Update
ncbl2_mlib.c to parse non-NCBI format 12 databases better.

>>March 21, 2007	fasta-34_26_2

Fix conflict between "-S" and "-s matrix.file".

>>February 26, 2007	fasta-34_26_2

Fix problem with dropfs2.c (curv.start = lpos before initialized).

>>January 12, 2007

Fix a problem with pssm_asn_subs.c reading strings (sequences) longer
than 1024 bytes.

Remove searchfa.cgi, searchnn.cgi, cgi-lib.pl, my-cgi.pl - this code
was used for an ancient FASTA WWW implementation and has been replaced
by the FASTA_WWW package.

FASTA Version numbers are being modified to make releases easier to
track, thus fa34t26b5 has become fasta-34_26_1.  I would prefer to use
decimal versions, but CVS does not allow '.' in tags.

>>January 4, 2007	fasta-34_26_1

Include scripts for building Mac OS X Universal binaries on a PPC
machine.  Programs are compiled first with Makefile.os_x (gcc-3.3 for
PPC) and then installed into ./ppc/.  Programs are next compiled with
Makefile.os_x86 for i386, and the resulting executables installed into
./i386/.  Finally, the "make_osx_univ.sh" script is run to build the
universal binaries from the two executables using "lipo".

>>December 12, 2006

Fix some problems with p2_workcomp.c: (1) no longer initialize pad
characters for non-existent sequences. (2) deal with small libraries
consistently with the serial versions.

>>November 17, 2006	fa34t26b5

Fixed a problem reading ASN.1 format 2 PSSM's.  It is now possible to
download a PSI-BLAST PSSM RID and search properly.  Next, the query
sequence from the PSSM should be used instead of the provided query
sequence, so that the provided query sequence is ignored.

>>October 19, 2006	fa34t26b4

Fixed problem with SSE2 code when PSSM's are used.

>>October 6, 2006	fa34t26b3

A new set of WIN32 programs is now available that use the Intel C++
9.1 compiler, rather than the much older Borland Turbo-C compiler. All
of the unthreaded programs that are part of the Unix and MacOSX FASTA
distributions are now available.  Threaded (multiprocessor) versions
of the programs are available as well, as are sse2 accelerated versions
of ssearch34 (ssearch34sse2.exe, ssearch34sse2_t.exe).

The new WIN32 code also uses Microsoft's "nmake" program to build the
programs, which allows much greater consistency between the Unix and
Windows versions.


>>September 18, 2006

Static global alignment variables removed from dropnfa.c, dropfx.c,
dropfz2.c.  dropnfa.c, dropfx.c and dropfz2.c should be thread safe.
Together with the earlier changes, all the FASTA functions should now
be thread safe during the alignment process.

>>August 17, 2006

Begin removal of static variables from Smith-Waterman alignment
functions.  These variables kept the functions from being thread-safe.
Now dropgsw.c and dropnsw.c are thread-safe.

>>August 15, 2006	fa34t26b2

Fixed a problem with pv34compfx/mp34compfx (and fy) producing
improperly labeled alignments and de-allocating memory for the reverse
complement.

>>July 18, 2006

The library file name parsing programs now provide the option for
environment variable substitutions.  For example, if SLIB2=/slib2 is set
as an environment variable (e.g. export SLIB2=/slib2 for ksh and bash),
then

	fasta34 -q query.aa '${SLIB2}/swissprot.fa'  expands as expected.

While this is not important for command lines, where the Unix shell
would expand things anyway, it is very helpful for various
configuration files, such as files of file names, where:

	<${SLIB2}/blast
	swissprot.fa

now expands properly, and in FASTLIBS files the line:

	NCBI/Blast Swissprot$0S${SLIB2}/blast/swissprot.fa

expands properly.  Currently, environment variable expansion only
takes place for library file names, and the <directory in a file of
file names.
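
The expansion described above can be sketched in a few lines of
Python.  This is an illustration only, not the FASTA C code: the
function names are hypothetical, and both the choice to leave unknown
variables untouched and the expansion of individual entries in a file
of file names are assumptions.

```python
import os
import re

# Matches ${NAME}; a bare "$0S" field in a FASTLIBS line is left alone.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")

def expand_env(name, env=None):
    """Replace each ${VAR} in name; unknown variables are left untouched (assumption)."""
    env = os.environ if env is None else env
    return _VAR.sub(lambda m: env.get(m.group(1), m.group(0)), name)

def expand_file_of_names(lines, env=None):
    """Expand a file of file names: a '<dir' line sets the directory for later entries."""
    directory = ""
    out = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("<"):
            directory = expand_env(line[1:], env)
            continue
        entry = expand_env(line, env)
        out.append(f"{directory}/{entry}" if directory else entry)
    return out
```

With SLIB2=/slib2, the two-line file of file names shown above would
resolve to /slib2/blast/swissprot.fa under these assumptions.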

>>July 14, 2006	  fa34t26b1

Updated Farrar smith_waterman_sse2.c code to address possible bug
(code from Michael Farrar).  Include <sunmedia_intrin.h> for
compilation with Sun compiler with Makefile.sun_x86.

>>July 2, 2006	  fa34t26b0

This release provides an extremely efficient implementation of the
Smith-Waterman algorithm, written by Michael Farrar
(farrar.michael@gmail.com), that uses the SSE2 vector instructions.
The SSE2 code speeds up Smith-Waterman 8 - 10-fold in my tests, making
it comparable to Erik Lindahl's Altivec code for the Apple/IBM G4/G5
architecture.

The Farrar code is largely confined to smith_waterman_sse2.c and
smith_waterman_sse2.h, which are copyright (2006) by Michael Farrar,
and cannot be redistributed without his permission.  Mr. Farrar has
agreed to provide his code under the same policy used by FASTA -
i.e. the code can be used without permission, but not redistributed.

The Farrar code uses GCC version 4.0 SSE2 intrinsic functions to avoid
assembly language code.  Unfortunately, in my hands, "gcc -O3" causes
"out of memory" errors, and other problems, so "gcc -O" is used instead.

>>June 23, 2006   fa34t25d10

Modifications to comp_lib.c, compacc.c, and other files to ensure that
function-specific MAXTOT values are used properly.  MAXTOT is now
available as m_msg.max_tot, which is set in initfa.c (m_msg.max_tot =
MAXTOT) to ensure that functions that need very large MAXTOT values
(e.g. TFASTX) can get them.  tfastx can now search successfully with
titin, a 27,000 residue protein.

Other changes have been made to accommodate long query sequences.

A serious bug was found in fastx34(_t) that caused alignment
coordinates to be calculated improperly when the DNA sequence was much
longer than the protein sequence.

>>May 31, 2006	fa34t25d9

Fixed some problems with fasts/fastf alignments when -m 9 options were
used.  Unlike the other algorithms, the a_res structure does not
capture all the information to re-produce an alignment, so do_walign
now sets bptr->have_ares to indicate whether the a_res structure is
valid.

Various problems with bad library names, and short query titles were
also fixed.

Updated version number/date on all drop*.c functions.

>>May 24, 2006	fa34t25d8

Revised code for NCBI *.pal/*.nal databases has been tested on all
architectures, including Windows.

In addition, support for ASN.1 PSSM:2 files provided by the NCBI
PSI-BLAST WWW site is included.  This code will not work with
iteration 0 PSSM's (which have no PSSM information).  For ASN.1
PSSM's, which provide the matrix name (and in some cases the gap
penalties), the scoring matrix and gap penalties are set appropriately
if they were not specified on the command line. ASN.1 PSSM's are type 2:
	ssearch34 -P "pssm.asn1 2" .....

>>May 18, 2006

Support for NCBI Blast formatdb databases has been expanded.  The
FASTA programs can now read some NCBI *.pal and *.nal files, which are
used to specify subsets of databases.  Specifically, the
swissprot.00.pal and pdbaa.00.pal files are supported.  FASTA supports
files that refer to *.msk files (i.e. swissprot.00.pal refers to
swissprot.00.msk); it does not currently support .pal files that
simply list other .pal or database files (e.g. FASTA does not support
nr.pal or swissprot.pal).

In the process of providing this support, the routines used to read
ASN.1 binary formatdb files were substantially improved.  It is now
possible to see multiple description lines for a single sequence.

IS_BIG_ENDIAN has been removed from all of the Makefiles.  The code
now looks for the definition of __BIG_ENDIAN__ or _BIG_ENDIAN to
decide whether the architecture IS_BIG_ENDIAN.  If, for some reason,
one of these macros is not defined on a BIG_ENDIAN architecture, then
-DIS_BIG_ENDIAN is required.

>>May 12, 2006	CVS fa34t25d7

Corrected serious problem with coordinate display calculation for
fasta34 and ssearch34 - in some cases the coordinates and alignment
symbols were off by the length of the context (typically 30 residues).

Added capability to read ASN.1 binary PSSM information.  This
information is provided (in an encoded form) from the NCBI PSI-BLAST
WWW site.  (What is actually provided from the WWW site is a bzip2-ed
binary file that is converted to ASCII HEX.  The ASCII HEX file must
be converted to binary, and then bunzip'ed. This bunzip-ed file is
binary ASN.1.)  These files can also be generated by 

 blastpgp -J T -C pssm.asn1_bin -u 2
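
The conversion steps in the parenthetical above (ASCII HEX to binary,
then bunzip2) might look like this in Python.  This is a sketch under
the assumption that the downloaded text contains only hex digits and
whitespace; the function name is hypothetical and the result is not
parsed as ASN.1 here.

```python
import bz2

def decode_pssm_download(hex_text):
    """ASCII hex text -> bzip2-compressed bytes -> binary ASN.1 bytes."""
    # Drop line breaks and spaces before decoding the hex digits.
    compressed = bytes.fromhex("".join(hex_text.split()))
    return bz2.decompress(compressed)
```

The returned bytes would then be written to a file and passed to the
search program as the binary ASN.1 PSSM.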

I am parsing the ASN.1 binary manually, not using the NCBI toolkit, so
there may be some files that are not parsed properly - if so, let me
know.

(May 12, 2006 - The NCBI changed the format of the psi-blast ASN.1
PSSM - and has not yet provided documentation of the new structure, so
this code does not work. It does work with blastpgp v 2.2.13, but not
with the web site version 2.2.14.  A fix was provided 24-May-2006)

>>April 18, 2006

Small modification in mshowbest.c to provide more consistent display
widths with -m 9i in list of best hits.

>>April 11, 2006 CVS fa34t25d6

Corrected a problem introduced with the new, more efficient method for
displaying alignments.  For the tfast* programs, which must translate
the library sequence, translations were not done when alignments were
re-displayed.

Corrected an older problem with tfastx34 against very long sequence
databases - the code to more efficiently do the display alignment did
not use the correct sequence coordinates.

Modifications to dropfs2.c to ensure that exact peptide matches are
captured more frequently.

>>March 16, 2006 CVS fa34t25d5

Change to initfa.c to allow lower case DNA libraries using the
-DDNALIB_LC compile time option.

Modify p2_complib.c, p2_worklib.c (and doinit.c, msg.h) to allow the
-V annotation option for the parallel programs.  Also modify to allow
specification of the query range (but only for the first query, like
fasta34) for the parallel programs.

Modification of p2_workcomp.c to correct some problems presenting
percent similarity.  Also correct unreleased bugs in the alignment
routines that allow more efficient alignment re-calculation.

>>Nov 20, 2005

Changes to support asymmetric matrices - a scoring matrix read in from
a file can be asymmetric.  Default matrices are all symmetric.

>>Oct 24, 2005

Modifications extended to p2_complib.c/p2_workcomp.c.  Incorporation
of drop_func.h into p2_workcomp.c greatly simplifies things.  No
changes in communication - struct a_res_str is internal to
p2_workcomp.c.

Additional changes to do_walign() so that aln_func_vals() must be
called to set llfact, qlfact, etc in a_struct aln before or after
do_walign is called.  do_walign produces a_res_str a_res, which has
all the information necessary to produce a calcons() or calc_code()
alignment.

>>Oct 19, 2005 CVS fa34t26b0

Modifications to drop*.c and c_dispn.c to separate (and simplify) some
of the alignment coordinate calculations.  Before, the "a_struct" had
the coordinates of the alignment used in the display (seqc0, seqc1)
AND in the original sequences (aa0, aa1), as well as other information
used to calculate alignment coordinates.  In the new version, a_struct
coordinates always refer to seqc0,1, while a new structure, a_res_str,
has coordinates for aa0, aa1 as well as the alignment encoding in res[nres].
Eventually, this should make it possible to display multiple local
alignments from the same two sequences.

In addition, the file "drop_func.h" has been added to the project, and
is included by many of the files (all the drop*.c functions,
mshowbest.c, mshowalign.c) to ensure that the various functions are
declared and used consistently.

>>Sept 19, 2005	CVS fa34t25d4

Changes to support Mac OS 10.4 - Tiger (include sys/types.h in more
files).  Documentation update for prss34/prfx34. Modifications to
comp_lib.c to support prss34_t/prfx34_t.  Shuffle numbers for
prss/prfx can now be specified by "-k #".

>>Sept 2, 2005

The prss34 program has been modified to use the same display routines
as the other search programs.  To be more consistent with the other
programs, the old "-w shuffle-window-size" is now "-v window-size".

prss34/prfx34 will also show the optimal alignment for which the
significance is calculated by using the "-A" option. 

Since the new program reports results exactly like other
fasta/ssearch/fastxy34 programs, parsing for statistical significance
is considerably different.  The old format program can be made using
"make prss34o".

>>Aug 26, 2005

Modifications to save_best() in comp_lib.c to support prss34_t.  It
did not work before.

>>July 25, 2005

Modify mshowbest.c to suppress gi|12345 in HTML mode.

>>July 18, 2005	CVS fa34t25d3

Modifications to Makefile.tc to support NCBI formatdb formats under
Windows.

>>May 19, 2005  CVS fa34t25d2

Modifications to dropfs2.c to fix an obscure bug that occurred when
correctly ordered peptides aligned one residue apart.

>>May 5, 2005 CVS fa34t25d1

Modification to the -x option, so that both an "X:X" match score and
an "X:not-X" mismatch score can be specified.  (This score is also used
to give a positive score to a "*:*" match - the end of a reading frame -
while giving a negative score to "*:not-*".)

>>March 14, 2005  CVS fa34t25b4

Fixed some problems caused by padding characters required for
Smith-Waterman ALTIVEC in the parallel (p2_complib.c, p2_workcomp.c)
versions.

>>Feb 24, 2005	CVS fa34t25b3

Changes to comp_lib.c (and Makefile.pcom) to support prss34_t.

>>Feb 12, 2005

Modify dropfs.c to dynamically allocate space for alignments, so that
queries with a large number of fragments can still place all the
fragments on the alignment.  Also fix a problem produced by removing
-DBIGMEM from most of the Makefile's, but not fixing defs.h to use
BIGMEM sizes by default.

>>Jan 24, 2005

Include a new program, "print_pssm", which reads a blastpgp binary
checkpoint file and writes out the frequency values as text.  These
values can be used with a new option with ssearch34(_t) and prss34,
which provides the ability to read a text PSSM file.  To specify a
text PSSM, use the option -P "query.ckpt 1" where the "1" indicates a
text, rather than a binary checkpoint file.  "initfa.c" has also been
modified to work with PSSM files with zeros in the frequency
table.  Presumably these positions (at the ends) do not provide
information. (Jan 26, 2005) blastpgp actually uses BLOSUM62 values
when zero frequencies are provided, so read_pssm() has been modified
to use scoring matrix values for zero frequencies as well.

>>Jan 13, 2005

Change to initfa.c to have fasts34 do a protein comparison by default,
rather than an unknown sequence type.  Automatic checking for fasts34
does not work reliably, because queries can be very short.  Likewise
for fastm34.  [Jan 26, 2005] Undo this change, which broke DNA
comparison when "-n" was specified.

>>Jan 7, 2005

Changes to tatstats.h, dropfs2.c to allow larger numbers of peptides
to match when fasts is used to show coverage on a proteomics
experiment.  Previously fasts could match no more than 30 peptides;
that limit has been increased to 50.  In addition, ktup=2 can be used
to increase the likelihood that short exact matches trump longer
mismatched regions.

>>Nov 11, 2004	   CVS fa34t25

Finished merge of earlier fa34t24 branch with HEAD.  Correct
labeling of TFASTM.

>>Nov 4-8, 2004	   

Incorporation of Erik Lindahl's "anti-diagonal" Altivec code for
Smith-Waterman only.  Altivec SSEARCH is now faster than FASTA for
query sequences < 250 amino acids.

Small modifications to output score display to ensure that the correct
scores are shown, and that they are correctly labeled.

>>Aug 25,26, 2004  CVS fa34t24b3

Small change in output format for p34comp* programs in
">>>query_file#1 string" line before alignments.  This line is not present
in the non-parallel versions - it would be better for them to be consistent.

Change in last_stats.c to properly label fasts statistics with -z != 1.

Change in dropfs2.c to ensure that tatprobs are not precalculated with -z 4.

Modify -m 9i output option to show in HTML output.

Add "#ifdef NOOVERHANG" to dropfs2.c that causes overlapping
alignments to score a 0, rather than the partial overlap score.
Useful for SAGE alignments, because "fasts" requires global alignments
(except for overhangs, unless NOOVERHANG is defined).

>>Aug 23, 2004

Fix problem with very long definition lines with formatdb version4
ASN databases.  Fix mshowalign.c to re-enable "-L" option.

>>July 28, 2004 

Fix to re-enable -w window shuffle for PRSS.  Modify comp_lib.c
for PRSS to ensure that the unshuffled score and probability
are shown, even for very high probability alignments.

>>July 21, 2004

Modifications to support PostgreSQL databases with the same commands
as MySQL databases.  MySQL database libraries are type 16, PostgreSQL
are type 17.  Makefile.linux_sql and Makefile.pvm4_sql support both
database types simultaneously.

>>June 23, 2004 CVS fa34t24b2

Additional fixes to enable -n or -p with fasts34 and
fastm34. Makefile.pcom was fixed for fastm34_t.  A new file,
mgstm1.nts, of DNA fragments from mgstm1.seq, is included for testing
fasts34 and fastm34.

>>May 4, 2004  

Fixes to initfa.c to allow DNA:DNA for FASTS, FASTM.  This change
introduced a bug that broke FASTS completely, but was fixed June 18,
2004 (and retagged fa34t24b2).

>>April 23, 2004 CVS fa34t24b1

Fix bug in initfa.c that caused tfasts/tfastf not to examine all six
frames.

>>March 19, 2004 CVS fa34t24b0

Modify all the drop*.c files, plus mshowbest.c and mshowalign.c, to
display percent similarity, rather than percent ungapped.  An
aligned residue pair is counted as similar if its score is greater
than or equal to zero (the same criterion used for placing ".").  To
disable this change, remove -DSHOWSIM from the appropriate Makefile.*.
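
As an illustration of the similarity criterion (score >= 0), here is a
small Python sketch.  It is not the drop*.c code; the function name is
hypothetical, and using the full alignment length (gap columns
included) as the denominator is an assumption.

```python
def percent_identity_similarity(seq0, seq1, score):
    """seq0, seq1: aligned strings of equal length with '-' for gaps.
    score(a, b) returns the scoring matrix value for a residue pair."""
    if len(seq0) != len(seq1):
        raise ValueError("aligned sequences must have the same length")
    columns = len(seq0)
    identical = similar = 0
    for a, b in zip(seq0, seq1):
        if a == "-" or b == "-":
            continue  # gap columns are neither identical nor similar
        if a == b:
            identical += 1
        if score(a, b) >= 0:
            similar += 1
    if columns == 0:
        return 0.0, 0.0
    return 100.0 * identical / columns, 100.0 * similar / columns
```

Any pair with a non-negative score is counted as similar, so
identical pairs are included in the similarity count whenever the
matrix diagonal is non-negative.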

>>March 18, 2004 CVS fa34t23b8

Fix bug in initfa.c tables that caused prss to generally compare
proteins.

>>March 15, 2004 

Fix bug in calls to revcomp(); make revcomp() guarantee NULL termination.

>>March 2, 2004	CVS fa34t23b7

Fix a very embarrassing and surprising bug that caused insertions
in fasta alignments to appear in the wrong sequence.

>>Feb 7, 2004	CVS fa34t23b6

Change initfa.c to allow "-i" (reverse complement) and "-i -3" with
"fastx34" and "prfx34".  In addition, "prfx34" now examines both query
DNA strands in calculating the shuffled statistical significance.

>>Feb 5, 2004

Reverse assignments for G:U base pairing in initfa.c.

Fix memory allocation error caused by doubling DNA alignment width.

>>Jan 7, 2004	CVS fa34t23b5

Change in do_walign() in dropnfa.c to make final DNA alignments use a
band that is 2X as large as the search band width.

>>Dec 22, 2003	CVS fa34t23b4

Fix typo in p2_complib.c that prevented compilation.  Fix problem
with karlin.c for asymmetric matrices, such as used with -U.

>>Dec 10, 2003  CVS fa34t23b3

Fix problem in resetp()/initfa.c that disabled banded Smith-Waterman
DNA alignments.

Allow spam() to do extended alignments for DNA if one of the sequences
is < 50 nt.

Cause default ktup to drop for short sequences.  For protein < 50, ktup=1;
for DNA < 20, 50, 100 ktup = 1, 2, 3, respectively.
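
The length thresholds above can be written as a small lookup.  This
Python sketch is illustrative only: the function name is hypothetical,
it assumes the length tested is the query length, and the standard
defaults used for longer sequences (ktup=2 for protein, ktup=6 for
DNA) are assumptions not stated in this entry.

```python
def default_ktup(seq_len, is_dna, std_protein_ktup=2, std_dna_ktup=6):
    """Return the default ktup, dropping it for short sequences."""
    if is_dna:
        if seq_len < 20:
            return 1
        if seq_len < 50:
            return 2
        if seq_len < 100:
            return 3
        return std_dna_ktup      # assumed standard DNA default
    if seq_len < 50:
        return 1
    return std_protein_ktup      # assumed standard protein default
```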

>>Dec 7, 2003

A new option, "-U" is available for RNA sequence comparison.  "-U"
functions like "-n", indicating that the query is an RNA sequence.  In
addition, to account for "G:U" base pairs, "-U" modifies the scoring
matrices so that a "G:A" match has the same score as a "G:G" match,
and "T:C" match has the same score as a "T:T" match.  The asymmetric
matrix required changes in dropnfa.c that were similar to the changes
in dropgsw.c required for profiles.  In addition, m_msg.qdnaseq and
pst.dnaseq can now be SEQT_DNA, SEQT_RNA, SEQT_PROT, SEQT_UNK, or SEQT_OTHER.
m_msg.ldnaseq does not use SEQT_RNA, only SEQT_DNA.  A new member of
struct pstruct: int nt_align, is used to indicate nucleotide
alignments.
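
To make the asymmetry concrete, here is a minimal Python sketch of the
"-U" matrix adjustment.  The +5/-4 match/mismatch values and the
(query, library) key order are assumptions made for illustration; the
function name is hypothetical and this is not the initfa.c code.

```python
def rna_matrix(match=5, mismatch=-4):
    """Build a 4x4 nucleotide matrix keyed by (query, library) residue."""
    nts = "ACGT"
    m = {(q, l): (match if q == l else mismatch) for q in nts for l in nts}
    # G:U base pairs: G:A scores like G:G, and T:C scores like T:T.
    m[("G", "A")] = m[("G", "G")]
    m[("T", "C")] = m[("T", "T")]
    return m
```

Only one direction of each pair is changed, so m[("G", "A")] and
m[("A", "G")] differ, which is why the alignment code can no longer
assume a symmetric matrix.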

>>Nov 19, 2003

Changes to Makefile's to distinguish between tatstats_fs.o and
tatstats_ff.o.

>>Nov 2, 2003

Substantial changes to comp_lib.c, p2_complib.c, mshowbest.c, and
mshowalign.c to support more sophisticated display options.
Previously, one could have only one "-m #" option, even though several
of the options were orthogonal (-m 9c is independent of -m 1 and -m 2,
which is independent of -m 6 (HTML)).  The programs now use a bitmask
that allows independent options to be combined.  In particular -m 9c
can be combined with -m 6, which can be very helpful for runs that
need HTML output but can also exploit the encoding provided by -m 9c.
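
The bitmask idea can be sketched as follows in Python.  The flag names
and bit values are hypothetical and do not correspond to the constants
used in the FASTA source; treating "-m 9c" as implying "-m 9" is also
an assumption.

```python
from enum import IntFlag

class MFlag(IntFlag):
    M1 = 0x01        # hypothetical bit for -m 1
    M2 = 0x02        # hypothetical bit for -m 2
    M6_HTML = 0x04   # hypothetical bit for -m 6 (HTML)
    M9 = 0x08        # hypothetical bit for -m 9
    M9C = 0x10       # hypothetical bit for -m 9c (encoded alignment)

_OPTS = {
    "1": MFlag.M1,
    "2": MFlag.M2,
    "6": MFlag.M6_HTML,
    "9": MFlag.M9,
    "9c": MFlag.M9 | MFlag.M9C,
}

def parse_m_options(values):
    """OR together the bits for every -m value given on the command line."""
    mask = MFlag(0)
    for value in values:
        mask |= _OPTS[value]
    return mask
```

Because each option owns its own bit, "-m 9c -m 6" yields a mask in
which both the HTML bit and the encoding bit can be tested
independently.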

The "-m 9" option now also allows "-m 9i", which shows the standard
best score information, plus percent identity and alignment length.

>>Oct 26, 2003	CVS fa34t23b1

Additional fixes to Makefiles to enable tfastf34(_t).  Changes to
support ossearch34 (a non-Phil Green optimized Smith-Waterman).

>>Oct 8, 2003	CVS fa34t23b0

Fixes to get DNA queries working in both directions, and to fix PCOMPLIB
programs for "-V" option.  Currently, the parallel programs cannot use
the "-V" option.

>>Sept 25, 2003

A new option is available for annotating alignments.  -V '@#?!'
can be used to annotate sites in a sequence, e.g:
	>GTM1_HUMAN ...
	PMILGYWDIRGLAHAIRLLLEYTDS@S?YEEKKYT@MG
	DAPDYDRS@QWLNEKFKLGLDFPNLPYLIDGAHKIT
might mark known and expected (S,T) phosphorylation sites.  These
symbols are then displayed on the query coordinate line:

               10        20    @?  30  @     40  @     50        60
GTM1_H PMILGYWDIRGLAHAIRLLLEYTDSSYEEKKYTMGDAPDYDRSQWLNEKFKLGLDFPNLP
       ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
gtm1_h PMILGYWDIRGLAHAIRLLLEYTDSSYEEKKYTMGDAPDYDRSQWLNEKFKLGLDFPNLP
               10        20        30        40        50        60

This annotation is mostly designed to display post-translational
modifications detected by MassSpec with FASTS, but is also available
with FASTA and SSEARCH.
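
One way to separate the annotation symbols from the residues is
sketched below in Python.  It assumes, as the example display above
suggests, that each symbol annotates the residue immediately before
it; the function name is hypothetical and this is not the FASTA
parsing code.

```python
def split_annotations(annotated, symbols="@#?!"):
    """Return (plain sequence, {1-based residue position: symbol})."""
    residues = []
    marks = {}
    for ch in annotated:
        if ch in symbols:
            if residues:
                # The symbol marks the residue just before it.
                marks[len(residues)] = ch
        else:
            residues.append(ch)
    return "".join(residues), marks
```

Applied to the GTM1_HUMAN fragment above, this places "@" at residues
25, 33 and 43 and "?" at residue 26, matching the coordinate line in
the example.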

>>Sept 22, 2003	  CVS fa34t22b5

The Altivec Smith-Waterman code has been removed.

>>Sept 17, 2003	  CVS fa34t22b4

A variety of different bugs have been fixed.  (1) All the functions in
the old initsw.c are now in initfa.c; initsw.c will be removed.
Specifically, the Profile/PSSM code is now in initfa.c.  initfa.c is
now fully table driven. (2) various problems with prss34 and prfx34
have been fixed in initfa.c.  (3) An additional ncbl2_mlib.c buffer
overrun has been fixed. (4) fastf34 is now available in this package.
Its performance is very similar to, but not identical to, fastf33.  I
am tracking down the differences.  In general, the raw scores
calculated by both programs are the same, but the statistical analysis
seems to be slightly different.

>>July 30, 2003   CVS fa34t22b3

Fix bug in ncbl2_mlib.c that caused buffer overrun with blast/formatdb
v3 description lines.

>>July 28, 2003

The initfa.c file has been substantially re-structured to use a
table-driven approach to parameter setting, rather than the previous
confusing combinations of #ifdef's.  Two tables of parameters are
used, pgm_def_arr[] and msg_def_arr[], which specify values like the
program name, reference, scoring matrix, default gap penalties, etc.
msg_def_arr[] has the sequence types for the query, library, and
algorithm, as well as other parameters (qframe, nframe, nrelv, etc),
which greatly simplifies the sequence recognition logic.  ppst->pgm_id
can be used to identify the program that is running.  Eventually,
almost all of the program specific #ifdef's will be removed from
initfa.c.  initfa.c now provides initsw.c functionality, so that
initsw.c is no longer needed.

>>July 25, 2003

A new file is included - fasta.defaults - that lists the scoring
matrix, gap penalty, and other defaults for all of the fasta34
programs.  This file will be used soon to simplify parameter setting
for the FASTA programs, and should also be used by Javascript WWW
interfaces to the FASTA programs.

>>July 22, 2003    CVS fa34t22b2

Fixes to dropfs2.c, tatprobs.c to ensure that negative probabilities
cannot occur.  Negative probabilities were never seen with standard
matrices, but did occur with BL50.  Another optimization in dropfs.c
considerably improves fasts34 performance in some cases.

Fix a problem with formatdb v4 ASN.1 format files.

>>July 12, 2003

Fix a bug that prevented "-L" (long sequence descriptions) from
working.

>>July 9, 2003

Fix reverse complement (M:K) error.  Fix off-by-one error for FASTA
DNA alignments that caused the first aligned residue pair to be
missed.

>>July 4 - 8, 2003

Incorporate blast-def-line ASN.1 parsing so that NCBI formatdb version
4 files can be read.

>>June 26, 2003

The strategy for displaying the match/mismatch line (" .:" for -m 0)
has been changed dramatically to accommodate more sophisticated
strategies for indicating conservative replacements, e.g. because of
PSSM's.  In addition to seqc0 and seqc1, which hold the aligned
sequences for display, there is also seqca, which holds the alignment
symbol.  calcons(), do_show(), and discons() have all changed to
include seqca.  calcons() is somewhat more complex; discons() is much
simpler.  (June 29, 2003 - dropgsw.c calcons() now displays profile
similarity accurately - it is very very illuminating.)

>>June 16, 2003	version: fasta34t22

ssearch34 now supports PSI-BLAST PSSM/profiles.  Currently, it only
supports the "checkpoint" file produced by blastpgp, and only on
certain architectures where byte-reordering is unnecessary.  It has not
been tested extensively with the -S option.

	ssearch34 -P blast.ckpt -f -11 -g -1 -s BL62 query.aa library

This will use the frequency information in the blast.ckpt file to do a
position specific scoring matrix (PSSM) search using the
Smith-Waterman algorithm.  Because ssearch34 calculates scores for
each of the sequences in the database, we anticipate that PSSM
ssearch34 statistics will be more reliable than PSI-Blast statistics.

The Blast checkpoint file is mostly double precision frequency
numbers, which are represented in a machine specific way.  Thus, you
must generate the checkpoint file on the same machine that you run
ssearch34 or prss34 -P query.ckpt.  To generate a checkpoint file,
run:

blastpgp -j 2 -h 1e-6 -i query.fa -d swissprot -C query.ckpt -o /dev/null

(This searches swissprot for 2 iterations ("-j 2") using an E()
threshold of 1e-6 ("-h 1e-6"), saving the resulting position specific
frequencies in query.ckpt.  Note that the original query.fa and
query.ckpt must match.)
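
The machine dependence of raw double precision values can be seen
with a short Python sketch.  It does not read the actual checkpoint
layout (which is not described here); the function name is
hypothetical, and the buffer is assumed to hold nothing but packed
8-byte doubles.

```python
import struct

def read_doubles(buf, byteorder="="):
    """Unpack a buffer of 8-byte doubles; '=' means native byte order."""
    count = len(buf) // 8
    return struct.unpack(f"{byteorder}{count}d", buf[:count * 8])
```

Frequencies written on a little-endian machine and read back with
big-endian byte order come out as unrelated numbers, which is why the
checkpoint file should be generated on the machine that runs the
search.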

>>June 5, 2003

Fix to mshowbest.c to get -m 9 coordinates correct on reverse strand
with pv34comp*.  Some additional fixes for prfx34.

>>May 22, 2003

Changes to llgetaa.c, getseq.c, comp_lib.c to provide a different
library residue lookup table (sascii) for queries and libraries.  This
allows one to make a prfx34 (like prss34, but using the fastx
algorithm).  prfx34 is now available.

>>May 13,14 2003

Fixes to most of the drop*.c files, and mshowbest.c, to ensure that
coordinates displayed with -m 9(c) and the final alignment are
consistent.  They were consistent for fasta34/ssearch34/fasts34, but
not for fastx34/fasty34.  The alignment coordinate system has been
revised for consistency in all the drop*.c programs (coordinates
used to be off-by-one for some, but not other functions).

Fixes to -m 9c for fasty34/pv34compfy.  In addition, a problem was
fixed with fastx34/fasty34 that appeared when a protein sequence was
considerably longer than the DNA query, e.g. an EST vs titin (26K
residues).  This problem only appeared on pv34compfx/fy on Xserve's
under OS_X; but it should improve fastx34/fasty34 performance with
very long protein sequences on all platforms.

>>May 7,8 2003

Changes to p2_workcomp.c, compacc.c, and p_mw.h to fix persistent
bugs in the -m 9c display.  Previous pv34comp* programs would not
return the correct coded alignment if more than 100 alignments came
from the same node, or if an encoding was longer than 127 chars.

Also, fixes to p2_complib.c, comp_lib.c, to allow long query sequences
to be segmented.  Previously, only the first 20,000 residues were
used.  The segmented queries are not overlapped; segmented library
sequences are.
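
The difference between non-overlapping query segments and overlapping
library segments can be illustrated with a short Python sketch.  The
function name is hypothetical, and using 20,000 residues as the
segment size is an assumption based on the previous limit; the overlap
actually used for library sequences is not stated here.

```python
def segment(seq, max_len=20000, overlap=0):
    """Split seq into (start, piece) segments; overlap=0 mimics query segmentation."""
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must be >= 0 and smaller than max_len")
    step = max_len - overlap
    out = []
    start = 0
    while True:
        out.append((start, seq[start:start + max_len]))
        if start + max_len >= len(seq):
            break  # this segment reached the end of the sequence
        start += step
    return out
```

With overlap=0 an alignment that spans a segment boundary is split
between two segments, which is the cost of not overlapping the query
pieces.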

>>May 5, 2003

Changes to last_tat.c, scaleswt.c to ensure that all fasts alignments
that are likely to have significant scores are displayed.  In previous
implementations, if the query had more than 10 fragments, only the 100
best scores were shown.  Now, we rescore up to 2500 alignments.  The
new approach allows large mixtures to be used for searches, where some
of the fragments from the mixture match too many proteins
(e.g. actins).  Some differences between the fasts34 and pv34compfs
implementations have been fixed.  The two programs typically will not
give exactly the same results, because of small differences in the
sampling procedures, but the results are essentially equivalent.

>>Apr 11, 2003  CVS fa34t21b3

Fixes for "-E" and "-F" with ssearch34, which were inadvertently disabled.

A new option, "-t t", is available to specify that all the protein
sequences have implicit termination codons "*" at the end.  Thus, all
protein sequences are one residue longer, and full length matches are
extended one extra residue and get a higher score.  For
fastx34/tfastx34, this helps extend alignments to the very end in
cases where there may be a mismatch at the C-terminal residues.

-m 9c has also been modified to indicate locations of termination
codons ( *1).

>>Mar 17, 2003  CVS fa34t21b2

A new option on scoring matrices "-MS" (e.g. "BL50-MS") can be used to
turn the I/L, K/Q identities on or off.  Thus, to make "fastm34" use
the isobaric identities, use "-s M20-MS".  To turn them off for "fasts34",
use "-s M20".

More fixes for correct alignment coordinates.  There was a conflict between
-m 9 and -m 9c and subsequent alignment displays.

>>Mar 13, 2003	

Various fixes to produce correct fastm34 alignments.  Changes to all
functions to correct potential problem with -m 9 alignment coordinates
when both -m 9 and actual alignments are shown.

>>Feb 25,27, 2003

Modifications to re-activate showsum.c, which included corrections to
the showbest() call in p2_complib.c.

>>Feb 13, 2003	CVS fa34t21b1

Modifications to dropfx.c to dramatically improve alignment speed for
cases where the DNA sequence is considerably longer than the protein
sequence.  Previously, a 200 aa vs 5000 nt comparison would do a full
200 x 5000 Smith-Waterman alignment; with this modification, no more
than a 200 x 1200 (2x3x200) alignment is done.  This optimization has
not (yet) been applied to dropfz2.c (fasty/tfasty).
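
The arithmetic behind the 200 x 1200 bound can be checked with a few
lines of Python.  The function names are hypothetical, and how
dropfx.c positions the window along the DNA sequence is not covered
here; only the size bound (2 x 3 x protein length) is taken from the
text above.

```python
def fx_dna_window(n_aa, n_nt):
    """Upper bound on the DNA span aligned against an n_aa-residue protein."""
    return min(n_nt, 2 * 3 * n_aa)

def dp_cells(n_aa, n_nt, bounded=True):
    """Number of dynamic programming cells with or without the bound."""
    width = fx_dna_window(n_aa, n_nt) if bounded else n_nt
    return n_aa * width
```

For the 200 aa vs 5000 nt case this is 240,000 cells instead of
1,000,000, roughly a four-fold reduction.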

>>Feb 11, 2003

Small modifications to comp_lib.c, p2_complib.c, and nmgetlib.c to
pass openlib() a possibly old lmf_str.  This allows openlib() to
re-use memory mapped files.  closelib() no longer releases memory
mapped file buffers.  Under Linux, memory mapped file buffers were not
really released, so when comparing a set of sequences against nr, the
program could not mmap() the database after several searches.  This
will also speed up memory mapped multiple sequence searches.

>>Jan 28-31, 2003  CVS fa34t21b0

Fix another bug (all of v34t20) involved with overlapping long
sequences.  And another bug that occurred when using sampled
statistics, but appeared only on the SGI platform - thanks to Dmitri
Mikhailov.  Several other issues have been addressed based on more
instrumented runtime testing.

Fix an old (all v34) bug that caused problems with -z 11-16 (shuffled
sequence array was not allocated properly).  Fixed another bug with -z
6/16 when using threaded (_t) searches in fasta34_t.

Restructure statistical analysis functions (scaleswn.c, scaleswt.c) to
return the "final" statistical estimation routine done in pst.zsflag_f.
This allows the program to cope with searches against a single sequence
correctly.

Corrected an error for DNA sequences needing Altschul-Gish statistics.

>>Jan 25, 2003

Add option "-J start:stop" to pv34comp*/mp34comp*.  "-J x" used to
allow one to start at query sequence "x"; now both start and stop can
be specified.
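
A minimal sketch of how an argument of this form might be parsed.
parse_j_option() is a hypothetical helper for illustration, not the
pv34comp*/mp34comp* code.

```python
def parse_j_option(arg):
    """Parse "x" or "start:stop" into (start, stop).

    stop is None when only a start is given (the older "-J x" form).
    """
    start, _, stop = arg.partition(":")
    return int(start), (int(stop) if stop else None)
```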

>>Jan 14, 2003

Changes to apam.c to provide an error message on stderr when a scoring
matrix cannot be found.

Changes to dropfs2.c, initsw.c, initfa.c to provide -m9c information
for fasts34 searches.  Modify the alignment algorithm to use
probabilistic scores properly.

>>Dec 22, 2002

Change to compacc.c (sortbeste()) to do a second sort on zscore when
several sequences have E() == 0.
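
The effect of the second sort can be sketched with a two-part sort
key.  The dictionary fields below are illustrative assumptions, not
the beststr layout used by sortbeste().

```python
def sort_best(hits):
    """Order hits by ascending E(); ties (e.g. several hits with
    E() == 0) are broken by descending z-score."""
    return sorted(hits, key=lambda h: (h["escore"], -h["zscore"]))
```

Without the second key, hits with E() == 0 would keep an arbitrary
relative order even when their z-scores differ widely.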

>>Nov 27, 2002

Change FSEEK_T to fseek_t to keep Borland BCC5 happy. 

>>Nov 14-22, 2002  CVS fa34t20b6

Include compile-time define (-DPGM_DOC) that causes all the fasta
programs to provide the same command line echo that is provided by the
PVM and MPI parallel programs.  Thus, if you run the program:

    fasta34_t -q -S gtt1_drome.aa /slib/swissprot 12

the first lines of output from FASTA will be:

    # fasta34_t -q gtt1_drome.aa /slib/swissprot
     FASTA searches a protein or DNA sequence data bank
     version 3.4t20 Nov 10, 2002
    Please cite:
     W.R. Pearson & D.J. Lipman PNAS (1988) 85:2444-2448

This has been turned on by default in most FASTA Makefiles.  

Fix p2_complib.c so that qstats[] is always allocated before it is used.

Fix serious bug in non-threaded comp_lib.c that caused some high
scoring sequences to be missed by fasts34.  New tests are included in
test.sh to detect this problem in the future.

The shell sort algorithm in sortbeste(), sortbestz(), and sortbesto()
has been modified to use an improved algorithm that will not go
quadratic in pathological cases.
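
This note does not say which increment sequence the new sort uses.
As an assumption for illustration only, the sketch below uses Knuth's
3h+1 increments, a common choice that avoids the quadratic worst case
of simple gap-halving.

```python
def shell_sort(values):
    """Shell sort using 3h+1 increments (1, 4, 13, 40, ...)."""
    a = list(values)
    gap = 1
    while gap < len(a) // 3:
        gap = 3 * gap + 1
    while gap > 0:
        # Gapped insertion sort for the current increment.
        for i in range(gap, len(a)):
            item = a[i]
            j = i
            while j >= gap and a[j - gap] > item:
                a[j] = a[j - gap]
                j -= gap
            a[j] = item
        gap //= 3
    return a
```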

nmgetlib.c and mmgetaa.c have been modified to remove "^A" in libstr
when used with p2_complib.c.

Fix problem with MAXSEG in tatstats.h with IBM/AIX.

Changes to most Makefiles to use -DSAMP_STATS; fixes to p2_complib.c
for SAMP_STATS.

>>Oct 22, Nov 3, Nov 9, 2002   CVS tag fa34t20b5

Fix problem in comp_lib.c that caused the query sequence length to be
counted twice.

Fixed problem with prss34 (updated find_zp in showrss.c).

Correct shuffling function in several places.

Add jitter back to addhistz() - improves appearance with prss34.

Changes to fix problems with aln_code using -m 9c.

Fix to serious bug in scaleswt.c (fasts34, etc) that caused sorts on
the high scores to take much too long.  The program is now 10X faster,
and scales well on PVM/MPI.

Fix to llgetaa.c to work with new getseq() API with automatic alphabet
recognition.

>>Oct 12, 2002 CVS tag fa34t20b4

Several very obscure (and sometimes old) bugs that appeared in certain
MPI environments have been fixed.  This occurred because the pst.sq[]
array did not always have a '\0' at the end.  In addition,
mshowalign.c/p2_workcomp.c sometimes failed to put the '\0' at the end
of seqc0/seqc1.  Correct bug introduced in fa34t20b3 for fasts34(_t).

>>Oct 9, 2002 CVS tag fa34t20b3

Fix to apam.c build_xascii() to not zero-out qascii[0].  Fix
Makefile.pvm4.  Fix problem with -m 9c with compacc.c.

>>Sept 28, 2002 

Additional fixes to -m 9c in p2_complib.c/compacc.c/mshowbest.c.
Remove restriction in fasts34(_t) to less than 30 peptides (though no
more than 30 peptides can be aligned currently).

>>Sept 24, 2002

Fix p2_workcomp.c so that e_scores are delivered correctly when
last_calc flag is set, and -m 9c provides alignments when only one
best hit is present.

Fix comp_lib.c to use different maxn and overlap for each different
query sequence.  fasta34 and fasta34_t now have identical results when
a long sequence is searched.

Add '@C:101' support to memory mapped FASTA format files.

Fix mshowalign.c so that coordinates returned by cal_coord() use
loffset+l_off.

>>Sept 14, 2002	CVS tag fa34t20b2

Changes to p2_complib.c, compacc.c to fix statistics problems with
pv34compfs on query sequences with more than 10 fragments.

>>Aug 27, 2002

Modifications to mshowbest.c and drop*.c (and p2_workcomp.c,
compacc.c, doinit.c, etc.) to provide more information about the
alignment with the -m 9 option.  There is now a "-m 9c" option, which
displays an encoded alignment after the -m 9 alignment information.
The encoding is a string of the form: "=#mat+#ins=#mat-#del=#mat".
Thus, an alignment over 218 amino acids with no gaps (not necessarily
100% identical) would be =218.  The alignment:

       10        20        30        40        50          60         70  
GT8.7  NVRGLTHPIRMLLEYTDSSYDEKRYTMGDAPDFDRSQWLNEKFKL--GLDFPNLPYL-IDGSHKITQ
       :.::  . :: ::  .   .:::         : .:    ::.:   .: : ..:.. :::  :..:
XURTG  NARGRMECIRWLLAAAGVEFDEK---------FIQSPEDLEKLKKDGNLMFDQVPMVEIDG-MKLAQ
               20        30                 40        50        60        

would be encoded: "=23+9=13-2=10-1=3+1=5".  The alignment encoding is
with respect to the beginning of the alignment, not the beginning of
either sequence.  The beginning of the alignment in either sequence is
given by the an0/an1 values. This capability is particularly useful
for [t]fast[xy], where it can be used to indicate frameshift positions
"/#\#" compactly.  If "-m 9c" is used, the "The best scores" title
line includes "aln_code".
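
The encoding can be reproduced from a displayed alignment with a short
run-length pass.  This is a sketch, not the code in the drop*.c files.
The reading of "+" as a gap in the second sequence and "-" as a gap in
the first is inferred from the worked example above.

```python
def encode_alignment(seq0, seq1):
    """Run-length encode a gapped pairwise alignment.

    "=" run of aligned residues (identical or not)
    "+" run where seq1 has a gap
    "-" run where seq0 has a gap
    """
    if len(seq0) != len(seq1):
        raise ValueError("aligned sequences must have equal length")
    parts = []
    prev, run = None, 0
    for c0, c1 in zip(seq0, seq1):
        op = "-" if c0 == "-" else "+" if c1 == "-" else "="
        if op == prev:
            run += 1
        else:
            if prev is not None:
                parts.append(f"{prev}{run}")
            prev, run = op, 1
    if prev is not None:
        parts.append(f"{prev}{run}")
    return "".join(parts)


GT87 = ("NVRGLTHPIRMLLEYTDSSYDEKRYTMGDAPDFDRSQWLNEKFKL"
        "--GLDFPNLPYL-IDGSHKITQ")
XURTG = ("NARGRMECIRWLLAAAGVEFDEK---------FIQSPEDLEKLKK"
         "DGNLMFDQVPMVEIDG-MKLAQ")
print(encode_alignment(GT87, XURTG))  # prints =23+9=13-2=10-1=3+1=5
```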

>>Aug 14, 2002	CVS tag fa34t20

Changes to nmgetlib.c to allow multiple query searches coming from
STDIN, either through pipes or input redirection.  Thus, the command

       cat prot_test.lseg | fasta34 -q -S @ /seqlib/swissprot

produces 11 searches.  If you use the multiple query functions, the
query subset applies only to the first sequence.

Unfortunately, it is not possible to search against a STDIN library,
because the FASTA programs do not keep the entire library in memory
and need to be able to re-read high-scoring library sequences.  Since
it is not possible to fseek() against STDIN, searching against a STDIN
library is not possible.

>>Aug 5, 2002

fasts34(_t) and fastm34(_t) have been modified to allow searches with
DNA sequences.  This gives a new capability to search for DNA motifs,
or to search for ordered or unordered DNA sequences spaced at
arbitrary distances.

>>Aug 4, 2002

comp_lib.c has been modified to provide comp_mlib.c function.
comp_mlib.c is no longer used.  comp_lib.c with the "mlib" function
can now recognize protein or DNA sequences automatically, and reads
from stdin can now detect DNA/protein sequence types automatically.
Changes to compacc.c, getseq.c, doinit.c initfa.c, initsw.c, and
nmgetlib.c to support automatic sequence type detection.

>>July 28-31, 2002

(1) The various Makefile's have been "normalized".  The fast*34[_t]
    (Makefile.34m.common[_sql]), Makefile.pvm4[_sql], and
    Makefile.mpi4[_sql] make files all use a common set of filenames,
    described in Makefile.fcom.  This greatly simplifies adding
    programs, but requires that all *.o files be deleted when moving
    from fast*34* to pv34comp* to mp34comp*.

(2) showalign.c/p_showalign.c have been merged into mshowalign.c;
    showbest.c/manshowbest.c have been merged into mshowbest.c.  Some
    of the related files (showun.c, manshowun.c) have not been merged
    or tested.

(3) Code for ranking scores with valid e_value's incorporated.

(4) Bug fixes in p2_complib.c, so that fasts34/fasts34_t/pvcompfs
    provide identical statistics.

>>July 26, 2002

Makefile.pvm4_sql and Makefile.pvm4 have been substantially simplified
by providing the worker program name from the h_init() function in the
initfa.c/initsw.c files.

>>July 24, 2002

Substantial modifications to param.h, structs.h to ensure that no
sequence specific information is kept in struct pstruct.  This
structure now holds the pam[] matrix, and other scoring parameters,
but nothing that is dependent on aa0.  The aa0 dependent stuff (nm0,
Lambda, K, etc) is now stored in struct mngmsg.  This was mostly done
to support the pv34comp* programs, which have separate mngmsg
structures but the same pstructs.

The fasts34, fasts34_t, and pv34compfs/c34.workfs have all been tested
successfully.

>>July 19, 2002

Fix an old bug in the calculation of E()-values in DNA databases
longer than 2147483647 residues on machines with 32-bit longs.
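
2147483647 is the largest value a signed 32-bit long can hold.  The
sketch below emulates 32-bit storage to show the failure mode (the
usual two's-complement wraparound).  wrap_int32() is a hypothetical
helper; it does not show how the fix was made.

```python
def wrap_int32(n):
    """Value of n after being stored in a signed 32-bit integer,
    assuming two's-complement wraparound."""
    return (n + 2**31) % 2**32 - 2**31


# A residue count one past the limit becomes negative.
print(wrap_int32(2147483647), wrap_int32(2147483648))
```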


>>July 8, 2002

Modifications to comp_lib.c, initfa.c and new scaleswt.c, tatstats.c
to support FASTS with Tatusov statistics.

last_params() has been introduced to allow aa0 dependent changes in m_msg/pstr.

sortbest() has been moved into initfa.c/initsw.c to make it function specific.

find_z() takes an additional parameter, escore.

The do_work() results structure, beststr, and stat_str all accommodate
escores as well as integer scores (stat_str also saves segn and segl
but doesn't need them).

In scaleswt.c, process_hist() now knows much more about Tatusov statistics.

last_stats() provided to accommodate rank-based statistical corrections.

scale_scores() is the last function to modify the beststr scores
(final calculation of E-value).

Some sortbest*() calls and some bptr[i]->zscore=find_zp() loops have
been moved into scale_scores();

>>July 3,5, 2002

Modifications to allow mySQL comments (--) in "library.sql 16" files.
Thus, a first line of:

	--host seqdb user password;

is read by FASTA as the login information to a mySQL server, but is
ignored by mySQL.  "DO" commands in FASTA mySQL files can also be
rendered invisible to mySQL in this way.  See "do.sql".

Modifications to mysql_lib.c to allow very long SQL statements.  The
buffer is now dynamically reallocated in 4Kb chunks.

The fasta3.1 man page has been updated and re-organized.

>>June 26, 2002

Minor modifications to nmgetaa.c (openlib()) to use the same arguments
for searching and PRSS.  PRSS needs access to all of m_msg, but
searches do not.  Other small fixes to comp_mlib.c, towards the goal
of merging comp_mlib.c and comp_lib.c.

>>June 25, 2002

Modify the statistical estimation strategy to sample all the sequences
in the database, not just the first 60,000.  The histogram is still
based only on the first 60,000 scores and lengths, though all scores
and lengths are shown.  The fit to the data may be better than the
histogram indicates, but it should not be worse.

Currently, this modification is available only if the -DSAMPLE_STATS
option is defined.

>>June 23, 2002	CVS fa34t11d4

Fix a very long-standing bug in fasty/tfasty that caused 'NNN' to be
translated as 'S', rather than 'X'.  fastx/tfastx has done this
correctly for many years, but the fasty/tfasty code that I received
from Zheng Zhang was not implemented correctly (my fault, his code was
fine).

>>June 19, 2002

Added "-C #" option, where 6 <= # <= MAX_UID (20), to specify the
length of the sequence name display on the alignment labels.  Until
now, only 6 characters were ever displayed.  Now, up to MAX_UID
characters are available.

>>May 30, 2002	CVS fa34t11d3

Fixed problem with programs using the default -E cutoff when -b was
provided.  With this implementation, -E can override -b, but -b
overrides the default -E.

Fixed problem with 64-bit file offsets in param.h (change USE_FSEEK0
-> USE_FSEEKO, include -D_LARGEFILE_SOURCE and -D_LARGEFILE64_SOURCE
in Makefile.linux_sql).  Put limits on alignment display length (200
chars).  More checks for null returns from SQL queries.

>>Apr 17, 2002	CVS fa34t11d2

Fixed bug in mm_file.h/ncbl2_mlib.c that caused the SGI version to be
unable to read blast2 format files.

Changed "mp_*" tags to "pg_*" for -m 10 option.

>>Mar 30, 2002

Fix embarrassing bug in revcomp() (getseq.c) that failed to complement
the central nucleotide in a sequence with an odd number of residues.
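
An in-place reverse complement that swaps from both ends shows where
this kind of bug arises: if the loop stops when the two indices meet,
the middle residue of an odd-length sequence is never complemented.
The sketch below is illustrative and is not the getseq.c code.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Reverse complement by swapping inward from both ends."""
    s = list(seq)
    i, j = 0, len(s) - 1
    # "<=" rather than "<": when i == j (odd length) the central
    # residue must still be complemented.
    while i <= j:
        s[i], s[j] = s[j].translate(COMPLEMENT), s[i].translate(COMPLEMENT)
        i += 1
        j -= 1
    return "".join(s)
```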

Small changes to dropfs.c for more segments.

>>Mar 16, 2002

Added create_seq_demo.sql, nt_to_sql.pl to show how to build an SQL
protein sequence database that can be used with the mySQL
versions of the fasta34 programs.  Once the mySQL seq_demo database
has been installed, it can be searched using the command:

	fasta34 -q mgstm1.aa "seq_demo.sql 16"

mysql_lib.c has been modified to remove the restriction that mySQL
protein sequence unique identifiers be integers.  This allows the
program to be used with the PIRPSD database.  The RANLIB() function
call has been changed to include "libstr", to support SQL text keys.
Due to the size of libstr[], unique ID's must be < MAX_UID (20)
characters.

A "pirpsd.sql" file is available for searching the mySQL distribution
of the PIRPSD database.  PIRPSD is available from
ftp://nbrfa.georgetown.edu/pir_databases/psd/mysql.

>>Mar 6, 2002

Fix showbest.c showbest() to report pst.zdb_size as database size.
Fix dropnfa.c spam() to address off-by-one on end of run, and double
counting on backwards scan.  Fix dropnfa.c do_fasta() to fix another
problem introduced by -S.  Changes to comp_lib.c to ensure that both
the beginning and end of the query and library sequence have '\0'
present.  Changes to initfa.c, initsw.c to ensure that a match to a
lower-case letter with -S gets exactly the same score as a match to an
'X'.  Changes to mmgetlib.c to work with 64-bit longs in *.xin files.

>>Feb 26, 2002

Fixes to doinit.c, initfa.c, initsw.c to allow DNA matrices using the
"-s dna.mat" option.  A new matrix, "d50ry.mat" is available that
scores +5 for a match, -2 for a transition, and -5 for a
transversion. "d50ry.mat" corresponds to DNA PAM50 with transitions
twice as common as transversions.  When "-s dna.mat" is used, "-n"
MUST be used as well.
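
The scoring rule described for "d50ry.mat" can be written as a small
function.  This sketch covers only the four unambiguous bases and is
not a substitute for the matrix file, which may define other symbols.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def d50ry_score(a, b):
    """+5 match, -2 transition (purine<->purine or
    pyrimidine<->pyrimidine), -5 transversion."""
    if a == b:
        return 5
    pair = {a, b}
    if pair <= PURINES or pair <= PYRIMIDINES:
        return -2
    return -5
```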

Query sequence names ("aa", "nt") should be more accurate.

>>Feb 22, 2002

Fix to getseq.c to allow "plain" sequence files.

>>Feb 12, 2002

Minor fix to res_stats.c.

>>Jan 28, 2002

Fixes to resurrect res_stats.c.  res_stats (cc -o res_stats
res_stats.c scaleswn.c -lm) takes the output from a current "-R
file.res" file and calculates statistical significance - this allows
one to take exactly the same set of scores (and lengths) and calculate
statistical estimates using different strategies.

>>Jan 24, 2002

Modifications to mmgetlib.c, ncbl2_mlib.c to more robustly read memory
mapped files (*.xin, map_db) on machines lacking "native" 64-bit
longs.  If the machine provides some definition for a 64-bit long
(e.g. "long long", "int64_t"), things should work. 64-bit offsets into
memory mapped files work properly on Alpha, SGI, i386 Linux, and
MacOSX.  The current implementation depends either on 64 bit longs
(Compaq Alpha's pre 4.0G) or the <sys/inttype.h> file.  Makefile,
Makefile.alpha, and Makefile.linux have been modified.

Modifications to nmgetlib.c, mmgetlib.c to provide GI numbers and
Accession versions for Genbank searches.  If the GI:123456 number is
available, it will be used and the description line will be formatted:

	gi|123456|gb|ACC1234.1|LOCUS description

This should help FAST_PAN runs, where the version of a sequence
changes frequently.

>>Jan 10, 2002

Modifications to p2_complib.c, p2_workcomp.c to more reliably allocate
space for library sequence descriptions on the master and workers.

>>Jan 2-3, 2002		CVS fa34t10c/fa34t10d3

Fixes to comp_lib.c to support Macintosh and Windows/Turbo-C
compilation.  New Makefile.tc.  Macintosh version supports both
"Classic" and "Carbon" environments.

"<values.h>" has been replaced with the more modern "<limits.h>"

Fixes to p2_complib.c to support n_libstr (libstr length) in GETLIB().

comp_thr.c, complib.c removed.

>>Dec 16, 2001

Complete integration of comp_mlib.c with both the unthreaded and
threaded programs.  Comp_mlib allows fasta34 and fasta34_t to compare
a database with a second database, just as pv34compfa does.  Using
multiple queries with fasta34_t is not as efficient as pv34compfa (and
it cannot use networks of Unix workstations), but it is much easier to
use and install.

With the comp_mlib.c option, fasta34 cannot automatically recognize
DNA sequences, just as pv34compfa no longer recognizes DNA sequences.
You must use the "-n" option to search with DNA sequences.  The other
programs (fastx34, tfastx34, etc) "know" the type of the query and
database sequences, so "-n" is only required for fasta34(_t).

>>Dec 14, 2001		CVS tag fa34t10b

Fix problems reading DNA databases in blast2 format.

>>Dec 11, 2001

Changes to spam() in dropnfa.c so that, for DNA sequences, the
boundaries of a local alignment region are found with the same
algorithm as in previous versions of fasta.  For
protein sequences, the algorithm will extend the local region beyond
the "ktup" boundaries if a better score can be found.  For DNA
sequences, this raises the noise rather than increasing sensitivity,
so it is turned off and "ktup" boundaries are respected.  The old,
"ktup" boundary algorithm is available with -DNOSPAM_EXT.

This version also includes a working res_stats.c, which can be used to
test various statistical estimates on exactly the same set of scores.

Fixed problems with -m 9 percent identity for fastx/fasty/tfastx/tfasty.
These errors have been present since -m 9 was implemented.

>>Dec 10, 2001

Fix to map_db.c to work correctly with files > 2 Gb when 64-bit longs
are available.  It is not yet designed to work with ftello() and other
offset types.

>>Nov 11,21, 2001	CVS tag fa34t10a, fa34t10d1

Substantial changes to revcomp(), getseq(), and other functions to
correct problems with -S on DNA sequences.  Sequences with lower case
nucleotides were not recognized or reverse complemented properly.

Fix to dropnfa.c (v34t07, Nov 21, 2001) bg_align() to re-initialize
static globals - this fixes a problem encountered with pv34compfa.  A
new main program, comp_mlib.c has been added to the CVS archive,
although it is not referenced in any of the Makefiles.  comp_mlib.c
works like p2_complib.c and compares a library against another
library.

>>Nov 4, 2001

Change to dropnfa.c spam () while(1) -> while(lpos <= dmax->stop).
This fixes a problem with ktup=1 on Suns only, so far.

>>Oct 4, 2001		CVS tag fa34t10

Add comp_lib.c file, which merges complib.c (unthreaded) and
comp_thr.c (threaded) code into one file.

Modifications to nmgetlib.c, mmgetaa.c to allow Genbank flatfile
format without DESCRIPTION or ACCESSION lines.

Additional fix for -S with ktup=1.

>>Sept. 24, 2001

Fix to have correct gap-penalties for short scoring matrices with
tfastx/fastx.

>>Sept. 10, 2001	CVS tag fa34t05d6

Fix a bug introduced by -S fix in fa34t05d5.  Also, try to remove
changes in p34compfa compared to pv4compfa output.

>>Sept. 6, 2001		CVS tag fa34t05d5

Fix the -S dropnfa/fx/fz2 bug that was not actually fixed in
fa34t05d4.  Incorporate the correct scaleswn.c referred to in
fa34t05d4.

>>Sept. 5, 2001		CVS tag fa34t05d4

Fix problem with m_msg.quiet that prevented interactive prompts for
ktup, file name, etc with threaded programs.

Fix serious bug in dropnfa.c/dropfx.c/dropfz2.c that caused -S to work
improperly on sequences with effective length of 3 or less.

Change to scaleswn.c to make mle_cen(), mle_cen2() more robust to cases
where the top and bottom scores are the same.

Change p2_complib.c to avoid compiler complaints with (void *)wstage2p=NULL
on some platforms.

>>Aug. 30, 2001		CVS tag fa34t05d3

Fixed problem with uthr_subs.c for Suns, but changed Makefile.sun to
use pthreads rather than Sun Unix threads.  Removed SQL stuff from
Makefile.mpi4/pvm4 and added Makefile.mpi4_sql/pvm4_sql.  

fa34t05d2 - fix to map_db.c to provide *sascii.

fa34t05d1 - fixes to ibm_pthr_subs.c and Makefile.ibm from IBM.

>>Aug. 20, 2001		CVS tag fa34t05d0

The pvm/mpi complib programs have been substantially updated with
release 3.4.  See readme.v34t0 for more information.  With version
3.4, the MPI programs are mp34comp*, mu34comp*, etc.

A major effect of this change is to disable automatic sequence type
(protein/DNA) recognition with pv34compfa/mp34compfa.  By default,
protein libraries are assumed.  Thus, pv34compfa/mp34compfa require
the "-n" command line option when running pv34compfa/mp34compfa on DNA
sequence libraries.  This issue does not occur with the other
programs, which will recognize the appropriate sequence type, because
it is determined by the program (e.g. pv34compfx requires
DNA:protein).

Fixed substantial problem with 64-bit file offsets for Linux in
complib.c/comp_thr.c, p2_complib.c.  This problem, solved by Doug
Blair, was preventing the threaded versions from working properly in
memory mapped mode.

In all earlier versions of fasta, when very long sequences were
searched, the sequence length reported was that of the "chunk" that
was actually searched (typically 80,000-query_length) rather than the
actual library sequence length.  This peculiar behavior has now changed,
and the full length of the library sequence, not the sequence chunk,
is reported as the library sequence length.  Note that chunks are
still used, however, which can cause the same alignment to be shown
twice.  In addition, the "-m 9" output format has changed to report
the coordinates of the query and library sequence (see below), which
may be different from 1-sequence_length because the query and
library sequences may have been extracted from larger sequences.  Four
additional fields have been added, "pn0", "px0", "pn1", "px1", that are
the positions of the beginning (pn0/1) and end (px0/1) of the
query/library sequence.  pn0/1 would typically be changed with the
"@C:#" directive, described below.

Changes to doinit.c/initfa.c/initsw.c to provide a new function -
f_lastenv() - that allows function-specific adjustments to parameters
after the command line options have been read but before the first
sequence is read.  This change solved problems with "mp/pv34compfx -S".

fasts34/tfasts34 now recognize that 'I/L' are the same, as are 'Q/K'
(which are apparently indistinguishable by Mass-Spec).  The latter
identity is on by default, but can be turned off with "-h 0".
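
The two equivalences can be pictured as mapping residues to a
canonical form before comparison.  This is a conceptual sketch only:
the function names and the merge_qk flag (standing in for the "-h 0"
switch) are assumptions, and the programs may implement the rule
differently, for example through the scoring matrix.

```python
def canonical(residue, merge_qk=True):
    """Map L to I always, and K to Q unless merge_qk is False."""
    if residue == "L":
        return "I"
    if merge_qk and residue == "K":
        return "Q"
    return residue


def residues_match(a, b, merge_qk=True):
    """True if a and b are treated as the same residue."""
    return canonical(a, merge_qk) == canonical(b, merge_qk)
```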

The MPI/PVM versions of the programs have been tested extensively with
compfa, compfx, and comptfx.  Makefile.mpi4 now works properly.
Changes to p2_complib.c to support the PVM option "-T 1-4", which
allows one to run on nodes 1-4 of a (presumably larger) PVM virtual
machine.  This option has no effect on the mp34comp* programs.  The
old "-T 4" to run on 4 nodes, is also available.  If each node has 2
cpu's, as indicated in the "pvmd hostfile", both CPU's will be used
for a total, in this example, of 8 processes. This allows one to
specify a large PVM machine and use separate parts of it
independently.

Changes to nmgetlib.c to fix problems with longer dates in GCG files
(Y2K).  Fixes to faatran.c for extended alphabets and 'X's.  Various
code clean-ups to make "gcc -Wall" a little bit (not much) happier.

This is the first distributed fasta34 version.

================
>>Aug 9, 2001		CVS tag fa34t05

Corrections to initfa.c to allow -S to work with tfastx/y.
Fix to manshowbest.c for query position with -m 9.

>>July 18, 2001		CVS tag fa34t04

Various changes to complib.c, comp_thr.c, p2_complib.c, showbest.c,
showalign.c to deal with overlapping alignments in long sequences that
have been segmented.  When long sequences are segmented (lcont>0), the
eventual total length (n1tot_v) is saved at beststr->n1tot_p.  If
there was no lcont, then beststr->n1tot_p = NULL, and beststr->n1
should be used as the sequence length.  This has the advantage of
requiring space only when long sequences are encountered, and
requiring only one integer for several segments.

m_msg.noshow has been removed.

The -m 9 format has been changed - 5 fields have been added, 4
(pmn0/pmx0/pmn1/pmx1) provide the beginning and end coordinates of the
query and library sequence; the last (fs) reports the number of
frameshifts.  The names of the alignment boundaries have been changed
from min0/max0/min1/max1 to amn0/amx0/amn1/amx1 (Alignment miN/maX).

The SQL format has been extended to provide for statements that do
things but do not generate results, such as creating and selecting
into a temporary table, e.g.:
================
    do
    create temporary table seq_pos (
    id int unsigned not null auto_increment primary key,
    prot_id int unsigned not null default 0,
    start int unsigned not null default 0,
    length int unsigned not null default 0
    )
    ;
    do
    insert into seq_pos (prot_id, start, length)
      select id, 11, len-10
      from protein, annot
      where len > 100
      and annot.protein_id = protein.id
      and annot.pref=1
    ;
    select seq_pos.id,
       substring(protein.seq, start, length),
       concat("@C:", start, " ", descr)
    from protein, seq_pos, annot
    where protein.id = annot.protein_id
      and protein.id = seq_pos.prot_id
      and annot.pref = 1
    ;
    select prot_id,
       concat("@C:", start, " ", descr)
    from seq_pos, annot
    where annot.protein_id = seq_pos.prot_id
      and seq_pos.id = #
      and annot.pref = 1
     ;
================

  In the current implementation, these statements must start with "DO"
as the first two characters on the line, and come immediately after a
line ending with ';'.  The text from "DO" to the next ";", excluding
the "DO", is executed when the database connection is made.
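
The rule above can be sketched as a small line scanner.  This is not
the mysql_lib.c code.  split_do_statements() is a hypothetical helper,
and two details are assumptions: "DO" is matched case-insensitively
(the example uses lower case), and a statement ends at a line that
ends with ';'.

```python
def split_do_statements(lines):
    """Return (do_statements, remaining_lines)."""
    do_statements, remaining = [], []
    current = None        # parts of the DO statement being collected
    prev_ended = False    # did the previous line end with ';'?
    for raw in lines:
        line = raw.rstrip()
        if current is None and prev_ended and line[:2].upper() == "DO":
            current = [line[2:]]          # drop the "DO" itself
        elif current is not None:
            current.append(line)
        else:
            remaining.append(line)
        prev_ended = line.endswith(";")
        if current is not None and prev_ended:
            text = " ".join(part.strip() for part in current)
            do_statements.append(text.rstrip(";").strip())
            current = None
    return do_statements, remaining
```

Note that this simple test would also match any other line that
begins with "do" after a ';', so it is a sketch of the rule rather
than a robust parser.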

=====
>>July 12, 2001

The allocation of the work_info data structure used to send
information to the worker threads has been changed.  The old method
worked, possibly by accident.

A bug in p2_complib.c that caused E()-values to be calculated
improperly for the first query sequence has been fixed.

>>July 11, 2001	--> fa34t02

It is now possible to specify output coordinates in library sequences
by including the string: "@C:number" on the description line, e.g.

   >gtm1_human gi|12345 human glutathione transferase M1 @C:21

would label the first residue in the library sequence "21" rather than
"1".  This capability has been included to provide accurate
coordinates for searches done against subsequences generated by an SQL
query.  For example, one could use a query of the form:

 SELECT protein.id, substring(protein.seq,11,length(protein.seq)-20),
	concat(protein.name," @C:11 ",protein.descr)
 FROM protein;

to generate a sequence set with each sequence starting with residue
11.  Without the "@C:11" option on the description line, the program
would number the alignment positions starting at 1, even though the
first residue of the sequence really started at 11.  "@C:11" allows
one to correct the coordinate system.
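
A minimal sketch of how the offset could be extracted from a
description line and applied to displayed coordinates.  Both helper
names are hypothetical; the default of 1 when no "@C:" is present
follows the behavior described above.

```python
import re

_OFFSET = re.compile(r"@C:(\d+)")


def library_offset(description):
    """Coordinate of the first residue; 1 if no "@C:number" is given."""
    m = _OFFSET.search(description)
    return int(m.group(1)) if m else 1


def label_position(description, index):
    """Displayed coordinate of the residue at 0-based position index."""
    return library_offset(description) + index
```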

Currently, "@C:offset" is available only with library type 1 (fasta
format) and 16 (mySQL).

The SQL-generated database with "@C:offset" can be used with both the
fast*34(_t) programs and with pv34comp*.  However, the SQL syntax is
used differently in the fasta34 and pv34compfa programs.  fast*34(_t)
requires three SQL statements during a search: (1) a statement to
generate a large set of library sequences; (2) a statement to generate
a description of a single sequence, given a unique identifier provided
by (1); and (3) a statement to generate a single sequence given a
unique identifier provided by (1).  For fast*34 searches, the third
(3) SQL statement must provide the "@C:offset" information in the
third results field for the offset to be used.  It is optional in (1)
and (2).

The pv34comp* programs only require one SQL statement, statement (1)
above, which must provide three fields, a unique identifier, the
sequence, and a complete description that must include "@C:offset" if
substrings are used.  If SQL queries (2) and (3) are provided, they
are ignored.  Thus, the same files can be used by both programs, but
the "@C:offset" is required in different SQL queries by the fast*34
and pv34comp* programs.

Other changes:

Re-incorporation of GAP_OPEN option; fix to Altschul-Gish stats when
GAP_OPEN is used.

Re-incorporation of A. Mackey's spam() improvement in dropnfa.

Fixes to include file ordering to allow fast*34(_t) pv34comp* programs
to compile.

Fix to lascii[] for SQL database queries.

Fix to an old bug in comp_thr.c to send individual worker_info
structures to threads (does not fix LINUX threads problems, however).

=====
>>July 9, 2001

Considerable changes to support no-global library functions. 

(1) Separate ascii/sequence mapping arrays are used by the
    query-reading (qascii), library-reading (lascii), and sequence
    comparison function (pascii) routines.  As a result, there is no
    longer a need for tgetlib.o/lgetlib.o - lgetlib.o can serve both
    functions.

(2) This also allows us to remove all #ifdef TFAST/FASTX conditionals
    from complib.c/comp_thr.c/p2_complib.c.  We no longer need
    tcomp_thr.o, comp_thrx.o, etc.  We still have a variety of
    p2_complib.o variations to support the different c34.work* files.

(3) Because non-global openlib/getlib functions are available, exactly
    the same open/get functions are available for reading both the
    query and reference libraries in pv34comp* programs.  The
    host-specific openlib/getlib functions in hxgetaa.c are now
    provided by nmgetlib.c, etc. This has two effects:

    (a) it is now possible to compare a query database generated by an
        SQL query to a library database generated by a different SQL
        query.

    (b) pv34comp* has lost (at least in this version) the ability to
        automatically detect the query sequence type. To search with a
        DNA query, you MUST use "-n".

(4) the resetp() function is now responsible for almost all of the
    function specific (TFAST/FASTX/etc) initializations.  All of the
    function specific code has been removed from complib.c/comp_thr.c
    and most of it has been moved to initfa.c/resetp().

(5) manageacc.c has been merged into compacc.c (mostly prhist()).

=====
>>June 1, 2001

Many changes to accommodate a new - no global variable - strategy for
reading sequence databases.  Every time a file is opened, a struct
lmf_str is allocated which can be used for memory mapped files, ncbl2
files, and mysql files.

In addition, an open'ed file has a default sequence type: DNA or
protein, or one can open a file in a mode that will allow the sequence
type to be changed.

=====
>>May 18, 2001		CVS: fa33t09d0

A new compile time parameter - -DGAP_OPEN, is available to change the
definition of the "-f gap-open" parameter from the penalty for the
first residue in a gap to a true gap-open penalty, as is used in BLAST
and many other comparison algorithms.  This will probably become the
default for fasta in version 3.4.
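
The two definitions give the same gap costs when the penalties are
converted.  The sketch below uses negative penalties and example
values of -12 and -2; the function names are illustrative.  Under the
first-residue definition a gap of n residues costs f + (n-1)*g, and
under the gap-open definition it costs open + n*g, so open = f - g.

```python
def gap_cost_first_residue(first, ext, n):
    """Cost of an n-residue gap when "first" is the penalty for the
    first residue and "ext" the penalty for each additional one."""
    return first + (n - 1) * ext


def gap_cost_open(gap_open, ext, n):
    """Cost of an n-residue gap with a true gap-open penalty plus
    "ext" for every residue in the gap."""
    return gap_open + n * ext


first, ext = -12, -2
gap_open = first - ext   # -10 gives identical costs
print(gap_cost_first_residue(first, ext, 3), gap_cost_open(gap_open, ext, 3))
```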

Fixes to conflicts between "-S" and "-s matrix".  When a scoring
matrix file was specified, lower-case alignments were not displayed
with -S (although the scores were calculated properly).

More extensive testing of mysql_lib.c (mySQL query-libraries) with
the pv4comp* and mp4comp* programs.

=====
>>April 5, 2001		CVS: fa33t08d4b3

Changes in nmgetlib.c and ncbl2_mlib.c to return long sequence
descriptions for PCOMPLIB (pv4/mp3comp*).  Also fix p2_complib.c to
request DNA library for translated comparisons.

Fix for prss33(_t) to read both sequences from stdin.

=====
>>March 27, 2001	CVS: fa33t08d4

Modifications to allow 64-bit fseek/ftell on machines such as Sun and
Linux/Intel that support -D_FILE_OFFSET_BITS=64, -D_LARGE_FILE_SOURCE,
off_t, fseeko(), and ftello(); these are enabled with the option
-DUSE_FSEEKO.  Machines with 64-bit longs do not need this option.
Machines with 32-bit longs that allow files >2 Gb can access them with
the 64-bit file access functions fseeko() and ftello(), which work
with off_t file offsets instead of longs.
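
A minimal sketch of off_t-based positioning follows.  It uses only the
POSIX fseeko()/ftello() calls; seek_and_tell() is a hypothetical helper
and is not taken from the FASTA sources.

```c
/* Feature-test macros must precede all system headers. */
#define _FILE_OFFSET_BITS 64
#define _POSIX_C_SOURCE 200112L

#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Seek to an absolute offset and report where the stream ended up.
   Returns the new position, or (off_t)-1 on error.  Because the
   offset is an off_t rather than a long, positions beyond 2 Gb are
   representable even where long is 32 bits. */
off_t seek_and_tell(FILE *fp, off_t pos)
{
    assert(fp != NULL);
    if (fseeko(fp, pos, SEEK_SET) != 0)
        return (off_t)-1;
    return ftello(fp);
}
```

With plain fseek()/ftell(), the same helper would have to pass the
offset as a long, which is the limitation that -DUSE_FSEEKO avoids.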

=====
>>March 3, 2001		CVS: fa33t08d2

Corrected problems in nmgetaa.c and mysql_lib.c with parallel
programs, and one serious problem with alternate DNA scoring matrices
(initfa.c, initsw.c) not being set properly.  A subtle problem with
the merge of scaleswn.c and scaleswg.c is fixed.

>>February 17, 2001

Modified mysql_lib.c to use "#", rather than "%ld", to indicate the
position of the GID.  This change was made because sprintf() cannot be
used reliably to generate an SQL string, as '"' and '%' are used in 
such strings.
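
The substitution can be sketched as below.  This is not the code in
mysql_lib.c; subst_gid() is a hypothetical helper, and replacing only
the first "#" is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy tmpl to out, replacing the first '#' with the decimal GID.
   Returns 0 on success, or -1 if there is no '#' or out is too small. */
int subst_gid(const char *tmpl, long gid, char *out, size_t out_len)
{
    const char *mark;
    char num[32];
    size_t head, nlen, tail;

    assert(tmpl != NULL && out != NULL);
    mark = strchr(tmpl, '#');
    if (mark == NULL) return -1;

    /* Only the number goes through a printf-style call.  The template
       text is copied verbatim, so any '%' or '"' in it is untouched. */
    snprintf(num, sizeof(num), "%ld", gid);

    head = (size_t)(mark - tmpl);   /* text before the marker */
    nlen = strlen(num);             /* digits of the GID */
    tail = strlen(mark + 1);        /* text after the marker */
    if (head + nlen + tail + 1 > out_len) return -1;

    memcpy(out, tmpl, head);
    memcpy(out + head, num, nlen);
    memcpy(out + head + nlen, mark + 1, tail + 1);  /* includes '\0' */
    return 0;
}
```

For example, a template ending in "gid=#" (the table and column names
being whatever the user's query needs) becomes "gid=12345" for GID
12345, while a LIKE "%kinase%" clause elsewhere in the same string
passes through unchanged.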

=====
>>January 17, 2001
(no version change, date change)

Minor fixes to initfa.c, initsw.c to deal with DNA scoring matrices
properly. "-n -s dna.mat" is required for the sequence/matrix to be
recognized as DNA.

>>January 16, 2001
-->v34t00

Merge of the main CVS trunk - fa33t06 with the latest release branch,
fa33t08.

In addition, PCOMPLIB mods have been made to mysql_lib.c.  Because
p2_complib.c gets sequence description information during the first
read of the database, the mysql_query must be changed to return:
result[0]=GID, result[1]=description, result[2]=sequence.  In the
PCOMPLIB case, the other SQL queries (for GID description, sequence)
are not necessary but must still be provided.