   1
   2  $Name: fa_34_26_5 $ - $Id: readme.v34t0,v 1.167 2007/04/26 18:42:43 wrp Exp $
   3
   4 >>April 26, 2007
   5
   6 Modify scaleswn.c to prevent mle_cen() from hanging when it fails to
   7 converge.  Also, free() more arrays in work_thr.c; initialize
   8 m_msg.hist.entries=0 in comp_lib.c, and various clean-ups for a_res
   9 encoded alignments.
  10
  11 >>March 22, 2007
  12
  13 Update faatran.c genetic codes (and documentation on -t option).  Update
  14 ncbl2_mlib.c to parse non-NCBI format 12 databases better.
  15
  16 >>March 21, 2007        fasta-34_26_2
  17
  18 Fix conflict between "-S" and "-s matrix.file".
  19
  20 >>February 26, 2007     fasta-34_26_2
  21
  22 Fix problem with dropfs2.c (curv.start = lpos before initialized).
  23
  24 >>January 12, 2007
  25
  26 Fix a problem with pssm_asn_subs.c reading strings (sequences) longer
  27 than 1024 bytes.
  28
  29 Remove searchfa.cgi, searchnn.cgi, cgi-lib.pl, my-cgi.pl - this code
  30 was used for an ancient FASTA WWW implementation and has been replaced
  31 by the FASTA_WWW package.
  32
  33 FASTA Version numbers are being modified to make releases easier to
  34 track, thus fa34t26b5 has become fasta-34_26_1.  I would prefer to use
  35 decimal versions, but CVS does not allow '.' in tags.
  36
  37 >>January 4, 2007       fasta-34_26_1
  38
  39 Include scripts for building Mac OS X Universal binaries on a PPC
  40 machine.  Programs are compiled first with Makefile.os_x (gcc-3.3 for
  41 PPC) and then installed into ./ppc/.  Programs are next compiled with
  42 Makefile.os_x86 for i386, and the resulting executables installed into
  43 ./i386/.  Finally, the "make_osx_univ.sh" script is run to build the
  44 universal binaries from the two executables using "lipo".
  45
  46 >>December 12, 2006
  47
  48 Fix some problems with p2_workcomp.c: (1) no longer initialize pad
  49 characters for non-existent sequences. (2) deal with small libraries
  50 consistently with the serial versions.
  51
  52 >>November 17, 2006     fa34t26b5
  53
  54 Fixed a problem reading ASN.1 format 2 PSSM's.  It is now possible to
  55 download a PSI-BLAST PSSM RID and search properly.  As a next step, the
  56 query sequence from the PSSM should be used in place of the provided
  57 query sequence, so that the provided query sequence is ignored.
  58
  59 >>October 19, 2006      fa34t26b4
  60
  61 Fixed problem with SSE2 code when PSSM's are used.
  62
  63 >>October 6, 2006       fa34t26b3
  64
  65 A new set of WIN32 programs is now available that use the Intel C++
  66 9.1 compiler, rather than the much older Borland Turbo-C compiler. All
  67 of the unthreaded programs that are part of the Unix and MacOSX FASTA
  68 distributions are now available.  Threaded (multiprocessor) versions
  69 of the programs are available as well, as are SSE2-accelerated versions
  70 of ssearch34 (ssearch34sse2.exe, ssearch34sse2_t.exe).
  71
  72 The new WIN32 code also uses Microsoft's "nmake" program to build the
  73 programs, which allows much greater consistency between the Unix and
  74 Windows versions.
  75
  76
  77 >>September 18, 2006
  78
  79 Static global alignment variables removed from dropnfa.c, dropfx.c,
  80 dropfz2.c.  dropnfa.c, dropfx.c and dropfz2.c should be thread safe.
  81 Together with the earlier changes, all the FASTA functions should now
  82 be thread safe during the alignment process.
  83
  84 >>August 17, 2006
  85
  86 Begin removal of static variables from Smith-Waterman alignment
  87 functions.  These variables kept the functions from being thread-safe.
  88 Now dropgsw.c and dropnsw.c are thread-safe.
  89
  90 >>August 15, 2006       fa34t26b2
  91
  92 Fixed a problem with pv34compfx/mp34compfx (and fy) producing
  93 improperly labeled alignments and de-allocating memory for the reverse
  94 complement.
  95
  96 >>July 18, 2006
  97
  98 The library file name parsing programs now provide the option for
  99 environment variable substitutions.  For example, if SLIB2=/slib2 is set as
  100 an environment variable (e.g. export SLIB2=/slib2 for ksh and bash), then
 101
 102         fasta34 -q query.aa '${SLIB2}/swissprot.fa'  expands as expected.
 103
 104 While this is not important for command lines, where the Unix shell
 105 would expand things anyway, it is very helpful for various
 106 configuration files, such as files of file names, where:
 107
 108         <${SLIB2}/blast
 109         swissprot.fa
 110
 111 now expands properly, and in FASTLIBS files the line:
 112
 113         NCBI/Blast Swissprot$0S${SLIB2}/blast/swissprot.fa
 114
  115 expands properly.  Currently, environment variable expansion only
 116 takes place for library file names, and the <directory in a file of
 117 file names.
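
As an illustration only (this is not the code in the FASTA sources), the
${NAME} substitution described above could be sketched in C roughly as
follows.  The function name expand_env(), its return convention, and the
choice to leave unset variables untouched are all assumptions made for
this example.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not the FASTA source): copy `in` to `out`,
   replacing each ${NAME} with getenv("NAME").  Unset variables and
   unterminated "${" sequences are copied through unchanged.
   Returns 0 on success, -1 if `out` (of size out_len) is too small. */
int expand_env(const char *in, char *out, size_t out_len)
{
  size_t n = 0;

  if (out_len == 0) return -1;

  while (*in != '\0') {
    const char *text = in;      /* text to copy for this step */
    size_t len = 1;             /* default: copy one character */
    char name[128];

    if (in[0] == '$' && in[1] == '{') {
      const char *end = strchr(in + 2, '}');
      if (end != NULL && (size_t)(end - (in + 2)) < sizeof(name)) {
        size_t nlen = (size_t)(end - (in + 2));
        const char *val;
        memcpy(name, in + 2, nlen);
        name[nlen] = '\0';
        val = getenv(name);
        if (val != NULL) {      /* substitute the variable's value */
          text = val;
          len = strlen(val);
          in = end;             /* the in++ below steps past '}' */
        }
      }
    }
    if (n + len >= out_len) return -1;   /* keep room for the '\0' */
    memcpy(out + n, text, len);
    n += len;
    in++;
  }
  out[n] = '\0';
  return 0;
}
```

With SLIB2=/slib2 in the environment, calling expand_env() on
"${SLIB2}/swissprot.fa" would leave "/slib2/swissprot.fa" in the output
buffer.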
 118
 119 >>July 14, 2006   fa34t26b1
 120
 121 Updated Farrar smith_waterman_sse2.c code to address possible bug
 122 (code from Michael Farrar).  Include <sunmedia_intrin.h> for
 123 compilation with Sun compiler with Makefile.sun_x86.
 124
 125 >>July 2, 2006    fa34t26b0
 126
  127 This release provides an extremely efficient implementation of the
  128 Smith-Waterman algorithm for the SSE2 vector instructions, written
  129 by Michael Farrar (farrar.michael@gmail.com).  The SSE2 code speeds up
  130 Smith-Waterman 8 - 10-fold in my tests, making it comparable to Erik
  131 Lindahl's Altivec code for the Apple/IBM G4/G5 architecture.
 132
 133 The Farrar code is largely confined to smith_waterman_sse2.c and
 134 smith_waterman_sse2.h, which are copyright (2006) by Michael Farrar,
 135 and cannot be redistributed without his permission.  Mr. Farrar has
 136 agreed to provide his code under the same policy used by FASTA -
  137 i.e. the code can be used without permission, but not redistributed.
 138
 139 The Farrar code uses GCC version 4.0 SSE2 intrinsic functions to avoid
 140 assembly language code.  Unfortunately, in my hands, "gcc -O3" causes
 141 "out of memory" errors, and other problems, so "gcc -O" is used instead.
 142
 143 >>June 23, 2006   fa34t25d10
 144
 145 Modifications to comp_lib.c, compacc.c, and other files to ensure that
 146 function-specific MAXTOT values are used properly.  MAXTOT is now
 147 available as m_msg.max_tot, which is set in initfa.c (m_msg.max_tot =
 148 MAXTOT) to ensure that functions that need very large MAXTOT values
 149 (e.g. TFASTX) can get them.  tfastx can now search successfully with
 150 titin, a 27,000 residue protein.
 151
  152 Other changes have been made to accommodate long query sequences.
 153
 154 A serious bug was found in fastx34(_t) that caused alignment
 155 coordinates to be calculated improperly when the DNA sequence was much
 156 longer than the protein sequence.
 157
 158 >>May 31, 2006  fa34t25d9
 159
 160 Fixed some problems with fasts/fastf alignments when -m 9 options were
 161 used.  Unlike the other algorithms, the a_res structure does not
 162 capture all the information to re-produce an alignment, so do_walign
 163 now sets bptr->have_ares to indicate whether the a_res structure is
 164 valid.
 165
 166 Various problems with bad library names, and short query titles were
 167 also fixed.
 168
 169 Updated version number/date on all drop*.c functions.
 170
 171 >>May 24, 2006  fa34t25d8
 172
 173 Revised code for NCBI *.pal/*.nal databases has been tested on all
 174 architectures, including Windows.
 175
 176 In addition, support for ASN.1 PSSM:2 files provided by the NCBI
 177 PSI-BLAST WWW site is included.  This code will not work with
 178 iteration 0 PSSM's (which have no PSSM information).  For ASN.1
 179 PSSM's, which provide the matrix name (and in some cases the gap
 180 penalties), the scoring matrix and gap penalties are set appropriately
 181 if they were not specified on the command line. ASN.1 PSSM's are type 2:
 182         ssearch34 -P "pssm.asn1 2" .....
 183
 184 >>May 18, 2006
 185
 186 Support for NCBI Blast formatdb databases has been expanded.  The
 187 FASTA programs can now read some NCBI *.pal and *.nal files, which are
 188 used to specify subsets of databases.  Specifically, the
 189 swissprot.00.pal and pdbaa.00.pal files are supported.  FASTA supports
 190 files that refer to *.msk files (i.e. swissprot.00.pal refers to
 191 swissprot.00.msk); it does not currently support .pal files that
 192 simply list other .pal or database files (e.g. FASTA does not support
 193 nr.pal or swissprot.pal).
 194
 195 In the process of providing this support, the routines used to read
 196 ASN.1 binary formatdb files were substantially improved.  It is now
 197 possible to see multiple description lines for a single sequence.
 198
 199 IS_BIG_ENDIAN has been removed from all of the Makefiles.  The code
 200 now looks for the definition of __BIG_ENDIAN__ or _BIG_ENDIAN to
 201 decide whether the architecture IS_BIG_ENDIAN.  If, for some reason,
 202 one of these macros is not defined on a BIG_ENDIAN architecture, then
 203 -DIS_BIG_ENDIAN is required.
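
A minimal sketch of this kind of compile-time test is shown below; it is
not taken from the FASTA sources.  The helper read_be32() is a
hypothetical example of code that depends on the result: it assumes a
file that stores 4-byte integers in big-endian order and swaps bytes
only on little-endian machines.

```c
#include <stdint.h>
#include <string.h>

/* Decide at compile time, using the two macros named above.  If the
   compiler defines neither on a big-endian machine, -DIS_BIG_ENDIAN on
   the command line still works because of the !defined() guard. */
#if !defined(IS_BIG_ENDIAN) && (defined(__BIG_ENDIAN__) || defined(_BIG_ENDIAN))
#define IS_BIG_ENDIAN 1
#endif

/* Hypothetical helper: interpret 4 bytes stored in big-endian order. */
uint32_t read_be32(const unsigned char *buf)
{
  uint32_t v;

  memcpy(&v, buf, sizeof v);    /* raw bytes in host order */
#ifdef IS_BIG_ENDIAN
  return v;                     /* host order already matches the file */
#else
  return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
         ((v & 0x00FF0000u) >> 8)  | ((v & 0xFF000000u) >> 24);
#endif
}
```

Note that some systems define _BIG_ENDIAN in their headers on every
architecture, so a test like this should be checked on each new platform.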
 204
 205 >>May 12, 2006  CVS fa34t25d7
 206
 207 Corrected serious problem with coordinate display calculation for
 208 fasta34 and ssearch34 - in some cases the coordinates and alignment
 209 symbols were off by the length of the context (typically 30 residues).
 210
 211 Added capability to read ASN.1 binary PSSM information.  This
 212 information is provided (in an encoded form) from the NCBI PSI-BLAST
 213 WWW site.  (What is actually provided from the WWW site is a bzip2-ed
 214 binary file that is converted to ASCII HEX.  The ASCII HEX file must
 215 be converted to binary, and then bunzip'ed. This bunzip-ed file is
 216 binary ASN.1.)  These files can also be generated by
 217
 218  blastpgp -J T -C pssm.asn1_bin -u 2
 219
 220 I am parsing the ASN.1 binary manually, not using the NCBI toolkit, so
 221 there may be some files that are not parsed properly - if so, let me
 222 know.
 223
 224 (May 12, 2006 - The NCBI changed the format of the psi-blast ASN.1
 225 PSSM - and has not yet provided documentation of the new structure, so
 226 this code does not work. It does work with blastpgp v 2.2.13, but not
 227 with the web site version 2.2.14.  A fix was provided 24-May-2006)
 228
 229 >>April 18, 2006
 230
 231 Small modification in mshowbest.c to provide more consistent display
 232 widths with -m 9i in list of best hits.
 233
 234 >>April 11, 2006 CVS fa34t25d6
 235
 236 Corrected a problem introduced with the new, more efficient method for
 237 displaying alignments.  For the tfast* programs, which must translate
 238 the library sequence, translations were not done when alignments were
 239 re-displayed.
 240
 241 Corrected an older problem with tfastx34 against very long sequence
 242 databases - the code to more efficiently do the display alignment did
 243 not use the correct sequence coordinates.
 244
 245 Modifications to dropfs2.c to ensure that exact peptide matches are
 246 captured more frequently.
 247
 248 >>March 16, 2006 CVS fa34t25d5
 249
 250 Change to initfa.c to allow lower case DNA libraries using the
 251 -DDNALIB_LC compile time option.
 252
 253 Modify p2_complib.c, p2_worklib.c (and doinit.c, msg.h) to allow the
 254 -V annotation option for the parallel programs.  Also modify to allow
 255 specification of the query range (but only for the first query, like
 256 fasta34) for the parallel programs.
 257
 258 Modification of p2_workcomp.c to correct some problems presenting
 259 percent similarity.  Also correct unreleased bugs in the alignment
 260 routines that allow more efficient alignment re-calculation.
 261
 262 >>Nov 20, 2005
 263
 264 Changes to support asymmetric matrices - a scoring matrix read in from
 265 a file can be asymmetric.  Default matrices are all symmetric.
 266
 267 >>Oct 24, 2005
 268
 269 Modifications extended to p2_complib.c/p2_workcomp.c.  Incorporation
 270 of drop_func.h into p2_workcomp.c greatly simplifies things.  No
 271 changes in communication - struct a_res_str is internal to
 272 p2_workcomp.c.
 273
 274 Additional changes to do_walign() so that aln_func_vals() must be
 275 called to set llfact, qlfact, etc in a_struct aln before or after
 276 do_walign is called.  do_walign produces a_res_str a_res, which has
 277 all the information necessary to produce a calcons() or calc_code()
 278 alignment.
 279
 280 >>Oct 19, 2005 CVS fa34t26b0
 281
 282 Modifications to drop*.c and c_dispn.c to separate (and simplify) some
 283 of the alignment coordinate calculations.  Before, the "a_struct" had
 284 the coordinates of the alignment used in the display (seqc0, seqc1)
 285 AND in the original sequences (aa0, aa1), as well as other information
  286 used to calculate alignment coordinates.  In the new version, a_struct
 287 coordinates always refer to seqc0,1, while a new structure, a_res_str,
 288 has coordinates for aa0, aa1 as well as the alignment encoding in res[nres].
 289 Eventually, this should make it possible to display multiple local
 290 alignments from the same two sequences.
 291
 292 In addition, the file "drop_func.h" has been added to the project, and
 293 is included by many of the files (all the drop*.c functions,
 294 mshowbest.c, mshowalign.c) to ensure that the various functions are
 295 declared and used consistently.
 296
 297 >>Sept 19, 2005 CVS fa34t25d4
 298
 299 Changes to support Mac OS 10.4 - Tiger (include sys/types.h in more
 300 files).  Documentation update for prss34/prfx34. Modifications to
 301 comp_lib.c to support prss34_t/prfx34_t.  Shuffle numbers for
 302 prss/prfx can now be specified by "-k #".
 303
 304 >>Sept 2, 2005
 305
 306 The prss34 program has been modified to use the same display routines
 307 as the other search programs.  To be more consistent with the other
 308 programs, the old "-w shuffle-window-size" is now "-v window-size".
 309
 310 prss34/prfx34 will also show the optimal alignment for which the
 311 significance is calculated by using the "-A" option.
 312
 313 Since the new program reports results exactly like other
 314 fasta/ssearch/fastxy34 programs, parsing for statistical significance
  315 is considerably different.  The old format program can be made using
 316 "make prss34o".
 317
 318 >>Aug 26, 2005
 319
 320 Modifications to save_best() in comp_lib.c to support prss34_t.  It
 321 did not work before.
 322
 323 >>July 25, 2005
 324
 325 Modify mshowbest.c to suppress gi|12345 in HTML mode.
 326
 327 >>July 18, 2005 CVS fa34t25d3
 328
 329 Modifications to Makefile.tc to support NCBI formatdb formats under
 330 Windows.
 331
 332 >>May 19, 2005  CVS fa34t25d2
 333
 334 Modifications to dropfs2.c to fix an obscure bug that occurred when
 335 correctly ordered peptides aligned one residue apart.
 336
 337 >>May 5, 2005 CVS fa34t25d1
 338
  339 Modification to the -x option, so that both an "X:X" match score and
  340 an "X:not-X" mismatch score can be specified.  (This score is also
  341 used to give a positive score to a "*:*" match - the end of a
  342 reading frame - while giving a negative score to a "*:not-*"
  343 mismatch.)
 344
 345 >>March 14, 2005  CVS fa34t25b4
 346
 347 Fixed some problems caused by padding characters required for
 348 Smith-Waterman ALTIVEC in the parallel (p2_complib.c, p2_workcomp.c)
 349 versions.
 350
 351 >>Feb 24, 2005  CVS fa34t25b3
 352
 353 Changes to comp_lib.c (and Makefile.pcom) to support prss34_t.
 354
 355 >>Feb 12, 2005
 356
 357 Modify dropfs.c to dynamically allocate space for alignments, so that
 358 queries with a large number of fragments can still place all the
 359 fragments on the alignment.  Also fix a problem produced by removing
 360 -DBIGMEM from most of the Makefile's, but not fixing defs.h to use
 361 BIGMEM sizes by default.
 362
 363 >>Jan 24, 2005
 364
 365 Include a new program, "print_pssm", which reads a blastpgp binary
 366 checkpoint file and writes out the frequency values as text.  These
 367 values can be used with a new option with ssearch34(_t) and prss34,
 368 which provides the ability to read a text PSSM file.  To specify a
 369 text PSSM, use the option -P "query.ckpt 1" where the "1" indicates a
 370 text, rather than a binary checkpoint file.  "initfa.c" has also been
  371 modified to work with PSSM files with zeros in the frequency
 372 table.  Presumably these positions (at the ends) do not provide
 373 information. (Jan 26, 2005) blastpgp actually uses BLOSUM62 values
 374 when zero frequencies are provided, so read_pssm() has been modified
 375 to use scoring matrix values for zero frequencies as well.
 376
 377 >>Jan 13, 2005
 378
 379 Change to initfa.c to have fasts34 do a protein comparison by default,
 380 rather than an unknown sequence type.  Automatic checking for fasts34
 381 does not work reliably, because queries can be very short.  Likewise
  382 for fastm34.  [Jan 26, 2005] Undo this change, which broke DNA
 383 comparison when "-n" was specified.
 384
 385 >>Jan 7, 2005
 386
 387 Changes to tatstats.h, dropfs2.c to allow larger numbers of peptides
 388 to match when fasts is used to show coverage on a proteomics
  389 experiment.  Previously fasts could match no more than 30 peptides;
  390 that limit has been increased to 50.  In addition, ktup=2 can be used
  391 to increase the likelihood that short exact matches trump longer
  392 mismatched regions.
 393
 394 >>Nov 11, 2004     CVS fa34t25
 395
 396 Finished merge of earlier fa34t24 branch with HEAD.  Correct
 397 labeling of TFASTM.
 398
 399 >>Nov 4-8, 2004
 400
  401 Incorporation of Erik Lindahl's "anti-diagonal" Altivec code for
 402 Smith-Waterman, only.  Altivec SSEARCH is now faster than FASTA for
 403 query sequences < 250 amino acids.
 404
 405 Small modifications to output score display to ensure that the correct
 406 scores are shown, and that they are correctly labeled.
 407
 408 >>Aug 25,26, 2004  CVS fa34t24b3
 409
 410 Small change in output format for p34comp* programs in
 411 ">>>query_file#1 string" line before alignments.  This line is not present
 412 in the non-parallel versions - it would be better for them to be consistent.
 413
 414 Change in last_stats.c to properly label fasts statistics with -z != 1.
 415
 416 Change in dropfs2.c to ensure that tatprobs are not precalculated with -z 4.
 417
 418 Modify -m 9i output option to show in HTML output.
 419
 420 Add "#ifdef NOOVERHANG" to dropfs2.c that causes overlapping
 421 alignments to score a 0, rather than the partial overlap score.
 422 Useful for SAGE alignments, because "fasts" requires global alignments
  423 (except for overhangs, unless NOOVERHANG is defined).
 424
 425 >>Aug 23, 2004
 426
  427 Fix problem with very long definition lines with formatdb version 4
 428 ASN databases.  Fix mshowalign.c to re-enable "-L" option.
 429
 430 >>July 28, 2004
 431
 432 Fix to re-enable -w window shuffle for PRSS.  Modify comp_lib.c
 433 for PRSS to ensure that the unshuffled score and probability
  434 are shown, even for very high probability alignments.
 435
 436 >>July 21, 2004
 437
 438 Modifications to support PostgreSQL databases with the same commands
 439 as MySQL databases.  MySQL database libraries are type 16, PostgreSQL
 440 are type 17.  Makefile.linux_sql and Makefile.pvm4_sql support both
 441 database types simultaneously.
 442
 443 >>June 23, 2004 CVS fa34t24b2
 444
 445 Additional fixes to enable -n or -p with fasts34 and
 446 fastm34. Makefile.pcom was fixed for fastm34_t.  A new file,
 447 mgstm1.nts, of DNA fragments from mgstm1.seq, is included for testing
 448 fasts34 and fastm34.
 449
 450 >>May 4, 2004
 451
 452 Fixes to initfa.c to allow DNA:DNA for FASTS, FASTM.  This change
 453 introduced a bug that broke FASTS completely, but was fixed June 18,
 454 2004 (and retagged fa34t24b2).
 455
 456 >>April 23, 2004 CVS fa34t24b1
 457
 458 Fix bug in initfa.c that caused tfasts/tfastf not to examine all six
 459 frames.
 460
 465 >>March 19, 2004 CVS fa34t24b0
 466
 467 Modify all the drop*.c files, plus mshowbest.c and mshowalign.c, to
 468 display percent similarity, rather than percent ungapped.  An
 469 alignment is counted as similar if the score is greater than or equal
  470 to zero (the same criterion used for placing ".").  To disable this
 471 change, remove -DSHOWSIM from the appropriate Makefile.*.
 472
 473 >>March 18, 2004 CVS fa34t23b8
 474
 475 Fix bug in initfa.c tables that caused prss to generally compare
 476 proteins.
 477
 478 >>March 15, 2004
 479
 480 Fix bug in calls to revcomp(); make revcomp() guarantee NULL termination.
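
For illustration, a reverse-complement routine that always terminates
its output might look like the sketch below.  It works on upper-case
ASCII IUPAC codes, which is an assumption made for this example (the
real revcomp() may use a different residue encoding), and the names
sk_complement() and sk_revcomp() are hypothetical.

```c
#include <stddef.h>

/* Complement of one upper-case IUPAC nucleotide code.  Codes that are
   their own complement (N, S, W) and unknown characters are returned
   unchanged. */
static char sk_complement(char c)
{
  switch (c) {
  case 'A': return 'T';  case 'T': return 'A';
  case 'C': return 'G';  case 'G': return 'C';
  case 'M': return 'K';  case 'K': return 'M';   /* A/C <-> G/T */
  case 'R': return 'Y';  case 'Y': return 'R';   /* A/G <-> C/T */
  case 'B': return 'V';  case 'V': return 'B';
  case 'D': return 'H';  case 'H': return 'D';
  default:  return c;
  }
}

/* Reverse-complement the first n residues of seq in place.  The buffer
   must hold at least n+1 characters; seq[n] is always set to '\0', so
   the result is terminated even if the input was not. */
void sk_revcomp(char *seq, size_t n)
{
  size_t i, j;

  if (n == 0) { seq[0] = '\0'; return; }

  for (i = 0, j = n - 1; i < j; i++, j--) {
    char tmp = sk_complement(seq[i]);
    seq[i] = sk_complement(seq[j]);
    seq[j] = tmp;
  }
  if (i == j) seq[i] = sk_complement(seq[i]);  /* middle residue, odd n */
  seq[n] = '\0';
}
```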
 481
 482 >>March 2, 2004 CVS fa34t23b7
 483
 484 Fix a very embarrassing and surprising bug that caused insertions
 485 in fasta alignments to appear in the wrong sequence.
 486
 487 >>Feb 7, 2004   CVS fa34t23b6
 488
 489 Change initfa.c to allow "-i" (reverse complement) and "-i -3" with
 490 "fastx34" and "prfx34".  In addition, "prfx34" now examines both query
  491 DNA strands in calculating the shuffled statistical significance.
 492
 493 >>Feb 5, 2004
 494
  495 Reverse assignments for G:U base pairing in initfa.c.
 496
 497 Fix memory allocation error caused by doubling DNA alignment width.
 498
 499 >>Jan 7, 2004   CVS fa34t23b5
 500
 501 Change in do_walign() in dropnfa.c to make final DNA alignments use a
 502 band that is 2X as large as the search band width.
 503
 504 >>Dec 22, 2003  CVS fa34t23b4
 505
 506 Fix typo in p2_complib.c that prevented compilation.  Fix problem
  507 with karlin.c for asymmetric matrices, such as used with -U.
 508
 509 >>Dec 10, 2003  CVS fa34t23b3
 510
 511 Fix problem in resetp()/initfa.c that disabled banded Smith-Waterman
 512 DNA alignments.
 513
 514 Allow spam() to do extended alignments for DNA if one of the sequences
 515 is < 50 nt.
 516
 517 Cause default ktup to drop for short sequences.  For protein < 50, ktup=1;
 518 for DNA < 20, 50, 100 ktup = 1, 2, 3, respectively.
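
The thresholds above can be written as a small function.  This is a
sketch, not the code in the FASTA sources: the enum, the function name,
and the rule that a short query never raises ktup above the caller's
value are assumptions, and the normal default ktup is passed in rather
than assumed.

```c
/* Sequence-type codes for this sketch only (not FASTA's SEQT_* values). */
enum sk_seqtype { SK_PROTEIN, SK_DNA };

/* Return the ktup to use for a query of length n.  default_ktup is the
   value that would be used for a long query. */
int sk_short_query_ktup(enum sk_seqtype type, int n, int default_ktup)
{
  int k = default_ktup;

  if (type == SK_PROTEIN) {
    if (n < 50) k = 1;
  } else {
    if (n < 20)       k = 1;
    else if (n < 50)  k = 2;
    else if (n < 100) k = 3;
  }
  /* assumption: only ever lower ktup, never raise it */
  return (k < default_ktup) ? k : default_ktup;
}
```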
 519
 520 >>Dec 7, 2003
 521
 522 A new option, "-U" is available for RNA sequence comparison.  "-U"
 523 functions like "-n", indicating that the query is an RNA sequence.  In
 524 addition, to account for "G:U" base pairs, "-U" modifies the scoring
 525 matrices so that a "G:A" match has the same score as a "G:G" match,
  526 and a "T:C" match has the same score as a "T:T" match.  The asymmetric
  527 matrix required changes in dropnfa.c that were similar to the changes
  528 in dropgsw.c required for profiles.  In addition, m_msg.qdnaseq and pst.dnaseq
  529 can now be SEQT_DNA, SEQT_RNA, SEQT_PROT, SEQT_UNK, or SEQT_OTHER.
 530 m_msg.ldnaseq does not use SEQT_RNA, only SEQT_DNA.  A new member of
 531 struct pstruct: int nt_align, is used to indicate nucleotide
 532 alignments.
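
To show why the "-U" matrix is asymmetric, here is a minimal sketch that
builds a 4x4 match/mismatch matrix and then applies the two changes
described above.  It is not the FASTA code: the residue encoding, the
function name, and the convention that the first index is the query
residue and the second the library residue are assumptions.

```c
/* Hypothetical 4-letter nucleotide encoding for this sketch. */
enum { NT_A, NT_C, NT_G, NT_T };

/* Fill m[query][library] with match/mismatch scores, then make "G:A"
   score like "G:G" and "T:C" score like "T:T".  The reverse cells
   ("A:G", "C:T") keep the mismatch score, so m is no longer symmetric. */
void sk_make_rna_matrix(int m[4][4], int match, int mismatch)
{
  int q, l;

  for (q = 0; q < 4; q++)
    for (l = 0; l < 4; l++)
      m[q][l] = (q == l) ? match : mismatch;

  m[NT_G][NT_A] = m[NT_G][NT_G];
  m[NT_T][NT_C] = m[NT_T][NT_T];
}
```

Because m[NT_G][NT_A] and m[NT_A][NT_G] now differ, any code that
assumed score(a,b) == score(b,a) has to index the matrix in a fixed
query/library order.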
 533
 534 >>Nov 19, 2003
 535
 536 Changes to Makefile's to distinguish between tatstats_fs.o and
 537 tatstats_ff.o.
 538
 539 >>Nov 2, 2003
 540
 541 Substantial changes to comp_lib.c, p2_complib.c, mshowbest.c, and
 542 mshowalign.c to support more sophisticated display options.
  543 Previously, one could have only one "-m #" option, even though several
  544 of the options were orthogonal (-m 9c is independent of -m 1 and -m 2,
 545 which is independent of -m 6 (HTML)).  The programs now use a bitmask
 546 that allows independent options to be combined.  In particular -m 9c
 547 can be combined with -m 6, which can be very helpful for runs that
 548 need HTML output but can also exploit the encoding provided by -m 9c.
 549
 550 The "-m 9" option now also allows "-m 9i", which shows the standard
 551 best score information, plus percent identity and alignment length.
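
The bitmask idea can be sketched as follows.  The flag names and bit
values are hypothetical (the real definitions in the FASTA sources may
differ), and only a few "-m" values are handled; the point is simply
that each independent option sets its own bit.

```c
#include <string.h>

/* Hypothetical flag bits, one per independent display option. */
#define SK_M_STYLE1 0x01u   /* stands in for an alignment style, "-m 1" */
#define SK_M_HTML   0x02u   /* "-m 6"  */
#define SK_M_9C     0x04u   /* "-m 9c" */
#define SK_M_9I     0x08u   /* "-m 9i" */

/* Merge one "-m" argument into the current mask and return the result. */
unsigned sk_add_m_option(unsigned mask, const char *arg)
{
  if (strcmp(arg, "1") == 0)  return mask | SK_M_STYLE1;
  if (strcmp(arg, "6") == 0)  return mask | SK_M_HTML;
  if (strcmp(arg, "9c") == 0) return mask | SK_M_9C;
  if (strcmp(arg, "9i") == 0) return mask | SK_M_9I;
  return mask;   /* unrecognized values are ignored in this sketch */
}
```

After processing "-m 6 -m 9c", both (mask & SK_M_HTML) and
(mask & SK_M_9C) are non-zero, so the display code can test each option
separately.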
 552
 553 >>Oct 26, 2003  CVS fa34t23b1
 554
 555 Additional fixes to Makefiles to enable tfastf34(_t).  Changes to
 556 support ossearch34 (a non-Phil Green optimized Smith-Waterman).
 557
 558 >>Oct 8, 2003   CVS fa34t23b0
 559
 560 Fixes to get DNA queries working in both directions, and to fix PCOMPLIB
 561 programs for "-V" option.  Currently, the parallel programs cannot use
 562 the "-V" option.
 563
 564 >>Sept 25, 2003
 565
 566 A new option is available for annotating alignments.  -V '@#?!'
 567 can be used to annotate sites in a sequence, e.g:
 568         >GTM1_HUMAN ...
 569         PMILGYWDIRGLAHAIRLLLEYTDS@S?YEEKKYT@MG
 570         DAPDYDRS@QWLNEKFKLGLDFPNLPYLIDGAHKIT
 571 might mark known and expected (S,T) phosphorylation sites.  These
 572 symbols are then displayed on the query coordinate line:
 573
 574                10        20    @?  30  @     40  @     50        60
 575 GTM1_H PMILGYWDIRGLAHAIRLLLEYTDSSYEEKKYTMGDAPDYDRSQWLNEKFKLGLDFPNLP
 576        ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
 577 gtm1_h PMILGYWDIRGLAHAIRLLLEYTDSSYEEKKYTMGDAPDYDRSQWLNEKFKLGLDFPNLP
 578                10        20        30        40        50        60
 579
 580 This annotation is mostly designed to display post-translational
 581 modifications detected by MassSpec with FASTS, but is also available
 582 with FASTA and SSEARCH.
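
One way to separate the annotation symbols from the residues is sketched
below.  This is an illustration under assumptions, not the FASTA
implementation: the function name is hypothetical, and it assumes (as in
the example above) that each symbol follows the residue it marks.

```c
#include <string.h>

/* Split an annotated sequence into residues (seq) and a parallel
   annotation string (ann).  ann[i] is the symbol that followed residue
   i in the input, or ' ' if there was none.  `symbols` is the set given
   with -V, e.g. "@#?!".  Both output buffers must hold strlen(in)+1
   characters.  Returns the number of residues. */
size_t sk_split_annotation(const char *in, const char *symbols,
                           char *seq, char *ann)
{
  size_t n = 0;

  for (; *in != '\0'; in++) {
    if (strchr(symbols, *in) != NULL) {
      if (n > 0) ann[n - 1] = *in;   /* mark the preceding residue */
    } else {
      seq[n] = *in;
      ann[n] = ' ';
      n++;
    }
  }
  seq[n] = '\0';
  ann[n] = '\0';
  return n;
}
```

The ann string can then be merged into the coordinate line at the
columns of the marked residues.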
 583
 584 >>Sept 22, 2003   CVS fa34t22b5
 585
 586 The Altivec Smith-Waterman code has been removed.
 587
 588 >>Sept 17, 2003   CVS fa34t22b4
 589
 590 A variety of different bugs have been fixed.  (1) All the functions in
 591 the old initsw.c are now in initfa.c; initsw.c will be removed.
 592 Specifically, the Profile/PSSM code is now in initfa.c.  initfa.c is
 593 now fully table driven. (2) various problems with prss34 and prfx34
 594 have been fixed in initfa.c.  (3) An additional ncbl2_mlib.c buffer
 595 overrun has been fixed. (4) fastf34 is now available in this package.
 596 Its performance is very similar to, but not identical to, fastf33.  I
 597 am tracking down the differences.  In general, the raw scores
 598 calculated by both programs are the same, but the statistical analysis
 599 seems to be slightly different.
 600
 601 >>July 30, 2003   CVS fa34t22b3
 602
 603 Fix bug in ncbl2_mlib.c that caused buffer overrun with blast/formatdb
 604 v3 description lines.
 605
 606 >>July 28, 2003
 607
 608 The initfa.c file has been substantially re-structured to use a
 609 table-driven approach to parameter setting, rather than the previous
 610 confusing combinations of #ifdef's.  Two tables of parameters are
 611 used, pgm_def_arr[] and msg_def_arr[], which specify values like the
 612 program name, reference, scoring matrix, default gap penalties, etc.
 613 msg_def_arr[] has the sequence types for the query, library, and
 614 algorithm, as well as other parameters (qframe, nframe, nrelv, etc),
 615 which greatly simplifies the sequence recognition logic.  ppst->pgm_id
 616 can be used to identify the program that is running.  Eventually,
 617 almost all of the program specific #ifdef's will be removed from
 618 initfa.c.  initfa.c now provides initsw.c functionality, so that
 619 initsw.c is no longer needed.
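
The general shape of a table-driven approach is sketched below.  The
struct layout, field names, table contents, and lookup function are all
hypothetical placeholders (they are not the actual pgm_def_arr[] or
msg_def_arr[] definitions); the sketch only shows how one table row per
program can replace program-specific #ifdef's.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical row: one set of defaults per program. */
struct sk_pgm_def {
  const char *prog;      /* program name */
  const char *smatrix;   /* default scoring matrix name */
  int gap_open;          /* default gap-open penalty */
  int gap_ext;           /* default gap-extension penalty */
};

/* Placeholder values for illustration only. */
static const struct sk_pgm_def sk_pgm_def_arr[] = {
  { "prog_a", "MATRIX_A", -10, -2 },
  { "prog_b", "MATRIX_B", -12, -4 },
};

/* Look up the defaults for a program by name; NULL if not in the table. */
const struct sk_pgm_def *sk_find_pgm(const char *prog)
{
  size_t i;
  size_t n = sizeof(sk_pgm_def_arr) / sizeof(sk_pgm_def_arr[0]);

  for (i = 0; i < n; i++) {
    if (strcmp(sk_pgm_def_arr[i].prog, prog) == 0)
      return &sk_pgm_def_arr[i];
  }
  return NULL;
}
```

Adding a program then means adding a row to the table rather than
another #ifdef block.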
 620
 621 >>July 25, 2003
 622
 623 A new file is included - fasta.defaults - that lists the scoring
 624 matrix, gap penalty, and other defaults for all of the fasta34
 625 programs.  This file will be used soon to simplify parameter setting
 626 for the FASTA programs, and should also be used by Javascript WWW
 627 interfaces to the FASTA programs.
 628
 629 >>July 22, 2003    CVS fa34t22b2
 630
 631 Fixes to dropfs2.c, tatprobs.c to ensure that negative probabilities
 632 cannot occur.  Negative probabilities were never seen with standard
 633 matrices, but did occur with BL50.  Another optimization in dropfs.c
 634 considerably improves fasts34 performance in some cases.
 635
 636 Fix a problem with formatdb v4 ASN.1 format files.
 637
 638 >>July 12, 2003
 639
 640 Fix a bug that prevented "-L" (long sequence descriptions) from
 641 working.
 642
 643 >>July 9, 2003
 644
 645 Fix reverse complement (M:K) error.  Fix off-by-one error for FASTA
 646 DNA alignments that caused the first aligned residue pair to be
 647 missed.
 648
 649 >>July 4 - 8, 2003
 650
 651 Incorporate blast-def-line ASN.1 parsing so that NCBI formatdb version
 652 4 files can be read.
 653
 654 >>June 26, 2003
 655
 656 The strategy for displaying the match/mismatch line (" .:" for -m 0)
  657 has been changed dramatically to accommodate more sophisticated
 658 strategies for indicating conservative replacements, e.g. because of
 659 PSSM's.  In addition to seqc0 and seqc1, which hold the aligned
 660 sequences for display, there is also seqca, which holds the alignment
 661 symbol.  calcons(), do_show(), and discons() have all changed to
 662 include seqca.  calcons() is somewhat more complex; discons() is much
 663 simpler.  (June 29, 2003 - dropgsw.c calcons() now displays profile
 664 similarity accurately - it is very very illuminating.)
 665
 666 >>June 16, 2003 version: fasta34t22
 667
 668 ssearch34 now supports PSI-BLAST PSSM/profiles.  Currently, it only
  669 supports the "checkpoint" file produced by blastpgp, and only on
 670 certain architectures where byte-reordering is unnecessary.  It has not
 671 been tested extensively with the -S option.
 672
 673         ssearch34 -P blast.ckpt -f -11 -g -1 -s BL62 query.aa library
 674
  675 Will use the frequency information in the blast.ckpt file to do a
 676 position specific scoring matrix (PSSM) search using the
 677 Smith-Waterman algorithm.  Because ssearch34 calculates scores for
 678 each of the sequences in the database, we anticipate that PSSM
 679 ssearch34 statistics will be more reliable than PSI-Blast statistics.
 680
 681 The Blast checkpoint file is mostly double precision frequency
 682 numbers, which are represented in a machine specific way.  Thus, you
 683 must generate the checkpoint file on the same machine that you run
 684 ssearch34 or prss34 -P query.ckpt.  To generate a checkpoint file,
 685 run:
 686
 687 blastpgp -j 2 -h 1e-6 -i query.fa -d swissprot -C query.ckpt -o /dev/null
 688
  689 (This searches swissprot for 2 iterations ("-j 2") using an E()
  690 threshold of 1e-6 ("-h 1e-6"), saving the resulting position specific
  691 frequencies in query.ckpt.  Note that the original query.fa and
  692 query.ckpt must match.)
 693
 694 >>June 5, 2003
 695
 696 Fix to mshowbest.c to get -m 9 coordinates correct on reverse strand
 697 with pv34comp*.  Some additional fixes for prfx34.
 698
 699 >>May 22, 2003
 700
 701 Changes to llgetaa.c, getseq.c, comp_lib.c to provide a different
 702 library residue lookup table (sascii) for queries and libraries.  This
 703 allows one to make a prfx34 (like prss34, but using the fastx
 704 algorithm).  prfx34 is now available.
 705
 706 >>May 13,14 2003
 707
 708 Fixes to most of the drop*.c files, and mshowbest.c, to ensure that
 709 coordinates displayed with -m 9(c) and the final alignment are
 710 consistent.  They were consistent for fasta34/ssearch34/fasts34, but
  711 not for fastx34/fasty34.  The alignment coordinate system has
  712 been revised for consistency in all the drop*.c programs (coordinates
 713 used to be off-by-one for some, but not other functions).
 714
 715 Fixes to -m 9c for fasty34/pv34compfy.  In addition, a problem was
  716 fixed with fastx34/fasty34 that appeared when a protein sequence was
 717 considerably longer than the DNA query, e.g. an EST vs titin (26K
 718 residues).  This problem only appeared on pv34compfx/fy on Xserve's
 719 under OS_X; but it should improve fastx34/fasty34 performance with
 720 very long protein sequences on all platforms.
 721
 722 >>May 7,8 2003
 723
 724 Changes to p2_workcomp.c, compacc.c, and p_mw.h to fix persistent
 725 bugs in the -m 9c display.  Previous pv34comp* programs would not
 726 return the correct coded alignment if more than 100 alignments came
 727 from the same node, or if an encoding was longer than 127 chars.
 728
 729 Also, fixes to p2_complib.c, comp_lib.c, to allow long query sequences
 730 to be segmented.  Previously, only the first 20,000 residues were
 731 used.  The segmented queries are not overlapped; segmented library
 732 sequences are.
 733
 734 >>May 5, 2003
 735
 736 Changes to last_tat.c, scaleswt.c to ensure that all fasts alignments
 737 that are likely to have significant scores are displayed.  In previous
 738 implementations, if the query had more than 10 fragments, only the 100
 739 best scores were shown.  Now, we rescore up to 2500 alignments.  The
 740 new approach allows large mixtures to be used for searches, where some
 741 of the fragments from the mixture match too many proteins
 742 (e.g. actins).  Some differences between the fasts34 and pv34compfs
 743 implementations have been fixed.  The two programs typically will not
 744 give exactly the same results, because of small differences in the
 745 sampling procedures, but the results are essentially equivalent.
 746
 747 >>Apr 11, 2003  CVS fa34t21b3
 748
 749 Fixes for "-E" and "-F" with ssearch34, which were inadvertently disabled.
 750
 751 A new option, "-t t", is available to specify that all the protein
 752 sequences have implicit termination codons "*" at the end.  Thus, all
 753 protein sequences are one residue longer, and full length matches are
 754 extended one extra residue and get a higher score.  For
 755 fastx34/tfastx34, this helps extend alignments to the very end in
 756 cases where there may be a mismatch at the C-terminal residues.
 757
 758 -m 9c has also been modified to indicate locations of termination
 759 codons ( *1).
 760
 761 >>Mar 17, 2003  CVS fa34t21b2
 762
 763 A new option on scoring matrices "-MS" (e.g. "BL50-MS") can be used to
 764 turn the I/L, K/Q identities on or off.  Thus, to make "fastm34" use
 765 the isobaric identities, use "-s M20-MS".  To turn them off for "fasts34",
 766 use "-s M20".
 767
 768 More fixes for correct alignment coordinates.  There was a conflict between
 769 -m 9 and -m 9c and subsequent alignment displays.
 770
 771 >>Mar 13, 2003
 772
 773 Various fixes to produce correct fastm34 alignments.  Changes to all
 774 functions to correct potential problem with -m 9 alignment coordinates
 775 when both -m 9 and actual alignments are shown.
 776
 777 >>Feb 25,27, 2003
 778
 779 Modifications to re-activate showsum.c, which included corrections to
 780 the showbest() call in p2_complib.c.
 781
 782 >>Feb 13, 2003  CVS fa34t21b1
 783
 784 Modifications to dropfx.c to dramatically improve alignment speed for
 785 cases where the DNA sequence is considerably longer than the protein
 786 sequence.  Previously, a 200 aa vs 5000 nt comparison would do a full
 787 200 x 5000 Smith-Waterman alignment; with this modification, no more
 788 than a 200 x 1200 (2x3x200) alignment is done.  This optimization has
 789 not (yet) been applied to dropfz2.c (fasty/tfasty).
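
The bound quoted above is plain arithmetic and can be sketched as
follows.  This is only an illustration: dna_window_limit() is a
hypothetical helper name, not a function in dropfx.c, and the reading
of "2x3x200" as "twice the three nucleotides per residue" is an
assumption.

```c
#include <assert.h>

/* Hypothetical helper (not from dropfx.c): upper bound on the number
   of nucleotides aligned against a protein of n_aa residues, using
   the 2 x 3 x n_aa figure quoted in the entry above. */
int dna_window_limit(int n_aa, int n_nt)
{
    int limit;

    assert(n_aa >= 0 && n_nt >= 0);
    limit = 2 * 3 * n_aa;          /* 3 nt per codon, times 2 */
    return (n_nt < limit) ? n_nt : limit;
}
```

For the 200 aa vs 5000 nt case, dna_window_limit(200, 5000) gives the
1200 quoted above; a DNA sequence shorter than the bound is used whole.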
 790
 791 >>Feb 11, 2003
 792
 793 Small modifications to comp_lib.c, p2_complib.c, and nmgetlib.c to
 794 pass openlib() a possibly old lmf_str.  This allows openlib() to
 795 re-use memory mapped files.  closelib() no longer releases memory
 796 mapped file buffers.  Under Linux, memory mapped file buffers were not
 797 really released, so when comparing a set of sequences against nr, the
 798 program could not mmap() the database after several searches.  This
 799 will also speed up memory mapped multiple sequence searches.
 800
 801 >>Jan 28-31, 2003  CVS fa34t21b0
 802
 803 Fix another bug (all of v34t20) involved with overlapping long
 804 sequences.  And another bug that occurred when using sampled
 805 statistics, but appeared only on the SGI platform - thanks to Dmitri
 806 Mikhailov.  Several other issues have been addressed based on more
 807 instrumented runtime testing.
 808
 809 Fix an old (all v34) bug that caused problems with -z 11-16 (shuffled
 810 sequence array was not allocated properly).  Fixed another bug with -z
 811 6/16 when using threaded (_t) searches in fasta34_t.
 812
 813 Restructure statistical analysis functions (scaleswn.c, scaleswt.c) to
 814 return the "final" statistical estimation routine done in pst.zsflag_f.
 815 This allows the program to cope with searches against a single sequence
 816 correctly.
 817
 818 Corrected an error for DNA sequences needing Altschul-Gish statistics.
 819
 820 >>Jan 25, 2003
 821
 822 Add option "-J start:stop" to pv34comp*/mp34comp*.  "-J x" used to
 823 allow one to start at query sequence "x"; now both start and stop can
 824 be specified.
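
A minimal sketch of how a "start:stop" argument of this kind might be
parsed.  parse_j_range() is a hypothetical name and the -1 "not given"
convention is an assumption; this is not the code in p2_complib.c.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical parser for a "-J" argument.  Accepts "x" (start only)
   or "start:stop".  Returns the number of values read (0, 1, or 2);
   *stop is set to -1 when no stop value is given. */
int parse_j_range(const char *arg, int *start, int *stop)
{
    int n;

    assert(arg != NULL && start != NULL && stop != NULL);
    *start = 0;
    *stop = -1;
    n = sscanf(arg, "%d:%d", start, stop);
    if (n < 1) {                   /* no number at all (or empty string) */
        *start = 0;
        *stop = -1;
        return 0;
    }
    if (n == 1)                    /* old "-J x" form: start only */
        *stop = -1;
    return n;
}
```

With this sketch, "7" keeps the old start-only behavior and "5:10"
sets both ends.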
 825
 826 >>Jan 14, 2003
 827
 828 Changes to apam.c to provide an error message on stderr when a scoring
 829 matrix cannot be found.
 830
 831 Changes to dropfs2.c, initsw.c, initfa.c to provide -m9c information
 832 for fasts34 searches.  Modify the alignment algorithm to use
 833 probabilistic scores properly.
 834
 835 >>Dec 22, 2002
 836
 837 Change to compacc.c (sortbeste()) to do a second sort on zscore when
 838 several sequences have E() == 0.
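
The two-key ordering can be sketched as a comparison function.  The
struct below is a reduced, hypothetical stand-in for the real result
record, and the sketch uses the standard qsort() rather than the shell
sort that sortbeste() actually uses.

```c
#include <assert.h>
#include <stdlib.h>

/* Reduced, hypothetical record; the real structure has more fields. */
struct hit {
    double escore;   /* E() value: smaller is better */
    double zscore;   /* z-score: larger is better */
};

/* qsort() comparator: ascending E(), then descending z-score when
   the E() values tie (for example, several hits with E() == 0). */
int cmp_escore_then_z(const void *pa, const void *pb)
{
    const struct hit *a = pa;
    const struct hit *b = pb;

    assert(a != NULL && b != NULL);
    if (a->escore < b->escore) return -1;
    if (a->escore > b->escore) return 1;
    if (a->zscore > b->zscore) return -1;
    if (a->zscore < b->zscore) return 1;
    return 0;
}
```

Without the second key, hits that share E() == 0 would be left in
whatever order the sort happened to produce.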
 839
 840 >>Nov 27, 2002
 841
 842 Change FSEEK_T to fseek_t to keep Borland BCC5 happy.
 843
 844 >>Nov 14-22, 2002  CVS fa34t20b6
 845
 846 Include compile-time define (-DPGM_DOC) that causes all the fasta
 847 programs to provide the same command line echo that is provided by the
 848 PVM and MPI parallel programs.  Thus, if you run the program:
 849
 850     fasta34_t -q -S gtt1_drome.aa /slib/swissprot 12
 851
 852 the first lines of output from FASTA will be:
 853
 854     # fasta34_t -q gtt1_drome.aa /slib/swissprot
 855      FASTA searches a protein or DNA sequence data bank
 856      version 3.4t20 Nov 10, 2002
 857     Please cite:
 858      W.R. Pearson & D.J. Lipman PNAS (1988) 85:2444-2448
 859
 860 This has been turned on by default in most FASTA Makefiles.
 861
 862 Fix p2_complib.c so that qstats[] is always allocated before it is used.
 863
 864 Fix serious bug in non-threaded comp_lib.c that caused some high
 865 scoring sequences to be missed by fasts34.  New tests are included in
 866 test.sh to detect this problem in the future.
 867
 868 The shell sort algorithm in sortbeste(), sortbestz(), and sortbesto()
 869 has been modified to use an improved algorithm that will not go
 870 quadratic in pathological cases.
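
The entry above does not say which gap sequence was adopted.  As an
illustration only, the sketch below uses Knuth's 1, 4, 13, 40, ...
sequence, one common way to keep a shell sort away from quadratic
behavior; shell_sort_knuth() is a hypothetical name and sorts plain
doubles rather than the real score records.

```c
#include <assert.h>
#include <stddef.h>

/* Shell sort of doubles in ascending order with Knuth's 3h+1 gaps.
   The gap sequence is an assumption chosen for illustration. */
void shell_sort_knuth(double *v, size_t n)
{
    size_t gap = 1, i, j;
    double tmp;

    assert(v != NULL || n == 0);
    while (gap < n / 3)            /* largest gap in 1, 4, 13, 40, ... */
        gap = 3 * gap + 1;

    for (; gap > 0; gap /= 3) {    /* 40 -> 13 -> 4 -> 1 -> 0 */
        for (i = gap; i < n; i++) {
            tmp = v[i];
            for (j = i; j >= gap && v[j - gap] > tmp; j -= gap)
                v[j] = v[j - gap];
            v[j] = tmp;
        }
    }
}
```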
 871
 872 nmgetlib.c and mmgetaa.c have been modified to remove "^A" in libstr
 873 when used with p2_complib.c.
 874
 875 Fix problem with MAXSEG in tatstats.h with IBM/AIX.
 876
 877 Changes to most Makefiles to use -DSAMP_STATS; fixes to p2_complib.c
 878 for SAMP_STATS.
 879
 880 >>Oct 22, Nov 3, Nov 9, 2002   CVS tag fa34t20b5
 881
 882 Fix problem in comp_lib.c that caused the query sequence length to be
 883 counted twice.
 884
 885 Fixed problem with prss34 (updated find_zp in showrss.c).
 886
 887 Correct shuffling function in several places.
 888
 889 Add jitter back to addhistz() - improves appearance with prss34.
 890
 891 Changes to fix problems with aln_code using -m 9c.
 892
 893 Fix to serious bug in scaleswt.c (fasts34, etc) that caused sorts on
 894 the high scores to take much too long.  The program is now 10X faster,
 895 and scales well on PVM/MPI.
 896
 897 Fix to llgetaa.c to work with new getseq() API with automatic alphabet
 898 recognition.
 899
 900 >>Oct 12, 2002 CVS tag fa34t20b4
 901
 902 Several very obscure (and sometimes old) bugs that appeared in certain
 903 MPI environments have been fixed.  These occurred because the pst.sq[]
 904 array did not always have a '\0' at the end.  In addition,
 905 mshowalign.c/p2_workcomp.c sometimes failed to put the '\0' at the end
 906 of seqc0/seqc1.  Correct bug introduced in fa34t20b3 for fasts34(_t).
 907
 908 >>Oct 9, 2002 CVS tag fa34t20b3
 909
 910 Fix to apam.c build_xascii() to not zero-out qascii[0].  Fix
 911 Makefile.pvm4.  Fix problem with -m 9c in compacc.c.
 912
 913 >>Sept 28, 2002
 914
 915 Additional fixes to -m 9c in p2_complib.c/compacc.c/mshowbest.c.
 916 Remove restriction in fasts34(_t) to less than 30 peptides (though no
 917 more than 30 peptides can be aligned currently).
 918
 919 >>Sept 24, 2002
 920
 921 Fix p2_workcomp.c so that e_scores are delivered correctly when
 922 last_calc flag is set, and -m 9c provides alignments when only one
 923 best hit is present.
 924
 925 Fix comp_lib.c to use different maxn and overlap for each different
 926 query sequence.  fasta34 and fasta34_t now have identical results when
 927 a long sequence is searched.
 928
 929 Add '@C:101' support to memory mapped FASTA format files.
 930
 931 Fix mshowalign.c so that coordinates returned by cal_coord() use
 932 loffset+l_off.
 933
 934 >>Sept 14, 2002 CVS tag fa34t20b2
 935
 936 Changes to p2_complib.c, compacc.c to fix statistics problems with
 937 pv34compfs on query sequences with more than 10 fragments.
 938
 939 >>Aug 27, 2002
 940
 941 Modifications to mshowbest.c and drop*.c (and p2_workcomp.c,
 942 compacc.c, doinit.c, etc.) to provide more information about the
 943 alignment with the -m 9 option.  There is now a "-m 9c" option, which
 944 displays an encoded alignment after the -m 9 alignment information.
 945 The encoding is a string of the form: "=#mat+#ins=#mat-#del=#mat".
 946 Thus, an alignment over 218 amino acids with no gaps (not necessarily
 947 100% identical) would be =218.  The alignment:
 948
 949        10        20        30        40        50          60         70
 950 GT8.7  NVRGLTHPIRMLLEYTDSSYDEKRYTMGDAPDFDRSQWLNEKFKL--GLDFPNLPYL-IDGSHKITQ
 951        :.::  . :: ::  .   .:::         : .:    ::.:   .: : ..:.. :::  :..:
 952 XURTG  NARGRMECIRWLLAAAGVEFDEK---------FIQSPEDLEKLKKDGNLMFDQVPMVEIDG-MKLAQ
 953                20        30                 40        50        60
 954
 955 would be encoded: "=23+9=13-2=10-1=3+1=5".  The alignment encoding is
 956 with respect to the beginning of the alignment, not the beginning of
 957 either sequence.  The beginning of the alignment in either sequence is
 958 given by the an0/an1 values. This capability is particularly useful
 959 for [t]fast[xy], where it can be used to indicate frameshift positions
 960 "/#\#" compactly.  If "-m 9c" is used, the "The best scores" title
 961 line includes "aln_code".
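
The encoding can be reproduced from a gapped alignment with a short
run-length pass.  This is a sketch, not the code in the drop*.c files:
encode_aln() and emit_run() are hypothetical names, and the meaning of
'+' (gap in the library row) and '-' (gap in the query row) is
inferred from the GT8.7/XURTG example above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append "<op><count>" to out, staying within outsz bytes. */
static void emit_run(char *out, size_t outsz, char op, int count)
{
    size_t used = strlen(out);

    if (count > 0 && used < outsz)
        snprintf(out + used, outsz - used, "%c%d", op, count);
}

/* Encode aligned rows q (query) and l (library), equal length with
   '-' marking gaps, as "=#mat+#ins=#mat-#del=#mat". */
void encode_aln(const char *q, const char *l, char *out, size_t outsz)
{
    char op = 0, cur;
    int run = 0;
    size_t i, n;

    assert(q != NULL && l != NULL && out != NULL && outsz > 0);
    n = strlen(q);
    assert(strlen(l) == n);
    out[0] = '\0';
    for (i = 0; i < n; i++) {
        if (l[i] == '-')      cur = '+';   /* extra query residues */
        else if (q[i] == '-') cur = '-';   /* extra library residues */
        else                  cur = '=';   /* aligned pair */
        if (cur != op) {                   /* run ended: write it out */
            emit_run(out, outsz, op, run);
            op = cur;
            run = 0;
        }
        run++;
    }
    emit_run(out, outsz, op, run);
}
```

Feeding the two rows of the alignment shown above to encode_aln()
yields the same "=23+9=13-2=10-1=3+1=5" string.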
 962
 963 >>Aug 14, 2002  CVS tag fa34t20
 964
 965 Changes to nmgetlib.c to allow multiple query searches coming from
 966 STDIN, either through pipes or input redirection.  Thus, the command
 967
 968        cat prot_test.lseg | fasta34 -q -S @ /seqlib/swissprot
 969
 970 produces 11 searches.  If you use the multiple query functions, the
 971 query subset applies only to the first sequence.
 972
 973 Unfortunately, it is not possible to search against a STDIN library,
 974 because the FASTA programs do not keep the entire library in memory
 975 and need to be able to re-read high-scoring library sequences.  Since
 976 it is not possible to fseek() against STDIN, searching against a STDIN
 977 library is not possible.
 978
 979 >>Aug 5, 2002
 980
 981 fasts34(_t) and fastm34(_t) have been modified to allow searches with
 982 DNA sequences.  This gives a new capability to search for DNA motifs,
 983 or to search for ordered or unordered DNA sequences spaced at
 984 arbitrary distances.
 985
 986 >>Aug 4, 2002
 987
 988 comp_lib.c has been modified to provide comp_mlib.c function.
 989 comp_mlib.c is no longer used.  comp_lib.c with the "mlib" function
 990 can now recognize protein or DNA sequences automatically, and reads
 991 from stdin can now detect DNA/protein sequence types automatically.
 992 Changes to compacc.c, getseq.c, doinit.c initfa.c, initsw.c, and
 993 nmgetlib.c to support automatic sequence type detection.
 994
 995 >>July 28-31, 2002
 996
 997 (1) The various Makefile's have been "normalized".  The fast*34[_t]
 998     (Makefile.34m.common[_sql]), Makefile.pvm4[_sql], and
 999     Makefile.mpi4[_sql] make files all use a common set of filenames,
1000     described in Makefile.fcom.  This greatly simplifies adding
1001     programs, but requires that all *.o files be deleted when moving
1002     from fast*34* to pv34comp* to mp34comp*.
1003
1004 (2) showalign.c/p_showalign.c have been merged into mshowalign.c;
1005     showbest.c/manshowbest.c have been merged into mshowbest.c.  Some
1006     of the related files (showun.c, manshowun.c) have not been merged
1007     or tested.
1008
1009 (3) Code for ranking scores with valid e_value's incorporated.
1010
1011 (4) Bug fixes in p2_complib.c, so that fasts34/fasts34_t/pvcompfs
1012     provide identical statistics.
1013
1014 >>July 26, 2002
1015
1016 Makefile.pvm4_sql and Makefile.pvm4 have been substantially simplified
1017 by providing the worker program name from the h_init() function in the
1018 initfa.c/initsw.c files.
1019
1020 >>July 24, 2002
1021
1022 Substantial modifications to param.h, structs.h to ensure that no
1023 sequence specific information is kept in struct pstruct.  This
1024 structure now holds the pam[] matrix, and other scoring parameters,
1025 but nothing that is dependent on aa0.  The aa0 dependent stuff (nm0,
1026 Lambda, K, etc) is now stored in struct mngmsg.  This was mostly done
1027 to support the pv34comp* programs, which have separate mngmsg
1028 structures but the same pstructs.
1029
1030 The fasts34, fasts34_t, and pv34compfs/c34.workfs have all been tested
1031 successfully.
1032
1033 >>July 19, 2002
1034
1035 Fix an old bug in the calculation of E()-values in DNA databases
1036 longer than 2147483647 residues on machines with 32-bit longs.
1037
1038
1077 >>July 8, 2002
1078
1079 Modifications to comp_lib.c, initfa.c and new scaleswt.c, tatstats.c
1080 to support FASTS with Tatusov statistics.
1081
1082 last_params() has been introduced to allow aa0 dependent changes in m_msg/pstr.
1083
1084 sortbest() has been moved into initfa.c/initsw.c to make it function specific.
1085
1086 find_z() takes an additional parameter, escore.
1087
1088 The do_work() results structure, beststr, and stat_str all accommodate
1089 escores as well as integer scores (stat_str also saves segn and segl
1090 but doesn't need them).
1091
1092 In scaleswt.c, process_hist() now knows much more about Tatusov statistics.
1093
1094 last_stats() provided to accommodate rank-based statistical corrections.
1095
1096 scale_scores() is the last function to modify the beststr scores
1097 (final calculation of E-value).
1098
1099 Some sortbest*() calls and some bptr[i]->zscore=find_zp() loops have
1100 been moved into scale_scores().
1101
1102 >>July 3,5, 2002
1103
1104 Modifications to allow mySQL comments (--) in "library.sql 16" files.
1105 Thus, a first line of:
1106
1107         --host seqdb user password;
1108
1109 is read by FASTA as the login information to a mySQL server, but is
1110 ignored by mySQL.  "DO" commands in FASTA mySQL files can also be
1111 rendered invisible to mySQL in this way.  See "do.sql".
1112
1113 Modifications to mysql_lib.c to allow very long SQL statements.  The
1114 buffer is now dynamically reallocated in 4Kb chunks.
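
Growing a statement buffer in fixed 4 Kb steps can be sketched as
below.  The struct, its field names, and sql_buf_append() are
hypothetical and are not taken from mysql_lib.c; only the 4 Kb
increment comes from the entry above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SQL_CHUNK 4096   /* growth increment: 4 Kb */

/* Hypothetical growable buffer for one SQL statement. */
struct sql_buf {
    char  *text;   /* NUL-terminated statement, or NULL when empty */
    size_t len;    /* bytes used, excluding the terminating NUL */
    size_t cap;    /* bytes allocated */
};

/* Append s to b, growing in SQL_CHUNK steps.  Returns 0 on success,
   -1 if realloc() fails (b is left unchanged in that case). */
int sql_buf_append(struct sql_buf *b, const char *s)
{
    size_t add, need;

    assert(b != NULL && s != NULL);
    add = strlen(s);
    need = b->len + add + 1;
    if (need > b->cap) {
        /* round the required size up to a whole number of chunks */
        size_t new_cap = ((need + SQL_CHUNK - 1) / SQL_CHUNK) * SQL_CHUNK;
        char *p = realloc(b->text, new_cap);
        if (p == NULL) return -1;
        b->text = p;
        b->cap = new_cap;
    }
    memcpy(b->text + b->len, s, add + 1);   /* copies the NUL too */
    b->len += add;
    return 0;
}
```

Start from a zeroed struct sql_buf and call free() on its text field
when the statement is no longer needed.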
1115
1116 The fasta3.1 man page has been updated and re-organized.
1117
1118 >>June 26, 2002
1119
1120 Minor modifications to nmgetaa.c (openlib()) to use the same arguments
1121 for searching and PRSS.  PRSS needs access to all of m_msg, but
1122 searches do not.  Other small fixes to comp_mlib.c, towards the goal
1123 of merging comp_mlib.c and comp_lib.c.
1124
1125 >>June 25, 2002
1126
1127 Modify the statistical estimation strategy to sample all the sequences
1128 in the database, not just the first 60,000.  The histogram is still
1129 based only on the first 60,000 scores and lengths, though all scores
1130 and lengths are shown.  The fit to the data may be better than the
1131 histogram indicates, but it should not be worse.
1132
1133 Currently, this modification is available only if the -DSAMPLE_STATS
1134 option is defined.
1135
1136 >>June 23, 2002 CVS fa34t11d4
1137
1138 Fix a very long-standing bug in fasty/tfasty that caused 'NNN' to be
1139 translated as 'S', rather than 'X'.  fastx/tfastx has done this
1140 correctly for many years, but the fasty/tfasty code that I received
1141 from Zheng Zhang was not implemented correctly (my fault, his code was
1142 fine).
1143
1144 >>June 19, 2002
1145
1146 Added "-C #" option, where 6 <= # <= MAX_UID (20), to specify the
1147 length of the sequence name display on the alignment labels.  Until
1148 now, only 6 characters were ever displayed.  Now, up to MAX_UID
1149 characters are available.
1150
1151 >>May 30, 2002  CVS fa34t11d3
1152
1153 Fixed problem with programs using the default -E cutoff when -b was
1154 provided.  With this implementation, -E can override -b, but -b
1155 overrides the default -E.
1156
1157 Fixed problem with 64-bit file offsets in param.h (change USE_FSEEK0
1158 -> USE_FSEEKO, include -D_LARGEFILE_SOURCE and -D_LARGEFILE64_SOURCE
1159 in Makefile.linux_sql).  Put limits on alignment display length (200
1160 chars).  More checks for null returns from SQL queries.
1161
1162 >>Apr 17, 2002  CVS fa34t11d2
1163
1164 Fixed bug in mm_file.h/ncbl2_mlib.c that caused the SGI version to be
1165 unable to read blast2 format files.
1166
1167 Changed "mp_*" tags to "pg_*" for -m 10 option.
1168
1169 >>Mar 30, 2002
1170
1171 Fix embarrassing bug in revcomp() (getseq.c) that failed to complement
1172 the central nucleotide in a sequence with an odd number of residues.
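
This class of bug is easy to reintroduce: a loop that swaps and
complements the two ends never visits the middle base of an
odd-length sequence.  The sketch below is an illustration, not the
revcomp() in getseq.c; it works on plain ASCII letters, whereas the
real code may work on encoded residues.

```c
#include <assert.h>
#include <string.h>

/* Complement of one nucleotide; anything other than ACGT (either
   case) is returned unchanged in this sketch. */
static char comp_nt(char c)
{
    switch (c) {
    case 'A': return 'T';  case 'T': return 'A';
    case 'C': return 'G';  case 'G': return 'C';
    case 'a': return 't';  case 't': return 'a';
    case 'c': return 'g';  case 'g': return 'c';
    default:  return c;
    }
}

/* Reverse-complement a NUL-terminated sequence in place. */
void revcomp_inplace(char *seq)
{
    size_t n, i;

    assert(seq != NULL);
    n = strlen(seq);
    for (i = 0; i < n / 2; i++) {          /* swap and complement ends */
        char tmp = comp_nt(seq[i]);
        seq[i] = comp_nt(seq[n - 1 - i]);
        seq[n - 1 - i] = tmp;
    }
    if (n % 2 == 1)                        /* base the loop never visits */
        seq[n / 2] = comp_nt(seq[n / 2]);
}
```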
1173
1174 Small changes to dropfs.c for more segments.
1175
1176 >>Mar 16, 2002
1177
1178 Added create_seq_demo.sql, nt_to_sql.pl to show how to build an SQL
1179 protein sequence database that can be used with the mySQL
1180 versions of the fasta34 programs.  Once the mySQL seq_demo database
1181 has been installed, it can be searched using the command:
1182
1183         fasta34 -q mgstm1.aa "seq_demo.sql 16"
1184
1185 mysql_lib.c has been modified to remove the restriction that mySQL
1186 protein sequence unique identifiers be integers.  This allows the
1187 program to be used with the PIRPSD database.  The RANLIB() function
1188 call has been changed to include "libstr", to support SQL text keys.
1189 Due to the size of libstr[], unique ID's must be < MAX_UID (20)
1190 characters.
1191
1192 A "pirpsd.sql" file is available for searching the mySQL distribution
1193 of the PIRPSD database.  PIRPSD is available from
1194 ftp://nbrfa.georgetown.edu/pir_databases/psd/mysql.
1195
1196 >>Mar 6, 2002
1197
1198 Fix showbest.c showbest() to report pst.zdb_size as database size.
1199 Fix dropnfa.c spam() to address off-by-one on end of run, and double
1200 counting on backwards scan.  Fix dropnfa.c do_fasta() to fix another
1201 problem introduced by -S.  Changes to comp_lib.c to ensure that both
1202 the beginning and end of the query and library sequence have '\0'
1203 present.  Changes to initfa.c, initsw.c to ensure that a match to a
1204 lower-case letter with -S gets exactly the same score as a match to an
1205 'X'.  Changes to mmgetlib.c to work with 64-bit longs in *.xin files.
1206
1207 >>Feb 26, 2002
1208
1209 Fixes to doinit.c, initfa.c, initsw.c to allow DNA matrices using the
1210 "-s dna.mat" option.  A new matrix, "d50ry.mat" is available that
1211 scores +5 for a match, -2 for a transition, and -5 for a
1212 transversion. "d50ry.mat" corresponds to DNA PAM50 with transitions
1213 twice as common as transversions.  When "-s dna.mat" is used, "-n"
1214 MUST be used as well.
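
The three values in "d50ry.mat" follow from the purine/pyrimidine
class of each base.  The sketch below only restates the scores quoted
above for unambiguous bases; d50ry_score() is a hypothetical name, and
scoring ambiguity codes as 0 is an assumption, not what the matrix
file contains.

```c
#include <assert.h>
#include <ctype.h>

/* 1 for a purine (A, G), 0 for a pyrimidine (C, T), -1 otherwise. */
static int purine_class(int c)
{
    switch (toupper(c)) {
    case 'A': case 'G': return 1;
    case 'C': case 'T': return 0;
    default:            return -1;
    }
}

/* +5 match, -2 transition (same class), -5 transversion (different
   class).  Anything other than A, C, G, T scores 0 in this sketch. */
int d50ry_score(char a, char b)
{
    int ca = purine_class((unsigned char)a);
    int cb = purine_class((unsigned char)b);

    if (ca < 0 || cb < 0) return 0;
    if (toupper((unsigned char)a) == toupper((unsigned char)b)) return 5;
    return (ca == cb) ? -2 : -5;
}
```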
1215
1216 Query sequence names ("aa", "nt") should be more accurate.
1217
1218 >>Feb 22, 2002
1219
1220 Fix to getseq.c to allow "plain" sequence files.
1221
1222 >>Feb 12, 2002
1223
1224 Minor fix to res_stats.c.
1225
1226 >>Jan 28, 2002
1227
1228 Fixes to resurrect res_stats.c.  res_stats (cc -o res_stats
1229 res_stats.c scaleswn.c -lm) takes the output from a current "-R
1230 file.res" file and calculates statistical significance - this allows
1231 one to take exactly the same set of scores (and lengths) and calculate
1232 statistical estimates using different strategies.
1233
1234 >>Jan 24, 2002
1235
1236 Modifications to mmgetlib.c, ncbl2_mlib.c to more robustly read memory
1237 mapped files (*.xin, map_db) on machines lacking "native" 64-bit
1238 longs.  If the machine provides some definition for a 64-bit long
1239 (e.g. "long long", "int64_t"), things should work. 64-bit offsets into
1240 memory mapped files work properly on Alpha, SGI, i386 Linux, and
1241 MacOSX.  The current implementation depends either on 64 bit longs
1242 (Compaq Alpha's pre 4.0G) or the <sys/inttype.h> file.  Makefile,
1243 Makefile.alpha, and Makefile.linux have been modified.
1244
1245 Modifications to nmgetlib.c, mmgetlib.c to provide GI numbers and
1246 Accession versions for Genbank searches.  If the GI:123456 number is
1247 available, it will be used and the description line will be formatted:
1248
1249         gi|123456|gb|ACC1234.1|LOCUS description
1250
1251 This should help FAST_PAN runs, where the version of a sequence
1252 changes frequently.
1253
1254 >>Jan 10, 2002
1255
1256 Modifications to p2_complib.c, p2_workcomp.c to more reliably allocate
1257 space for library sequence descriptions on the master and workers.
1258
1259 >>Jan 2-3, 2002         CVS fa34t10c/fa34t10d3
1260
1261 Fixes to comp_lib.c to support Macintosh and Windows/Turbo-C
1262 compilation.  New Makefile.tc.  Macintosh version supports both
1263 "Classic" and "Carbon" environments.
1264
1265 "<values.h>" has been replaced with the more modern "<limits.h>"
1266
1267 Fixes to p2_complib.c to support n_libstr (libstr length) in GETLIB().
1268
1269 comp_thr.c, complib.c removed.
1270
1271 >>Dec 16, 2001
1272
1273 Complete integration of comp_mlib.c with both the unthreaded and
1274 threaded programs.  Comp_mlib allows fasta34 and fasta34_t to compare
1275 a database with a second database, just as pv34compfa does.  Using
1276 multiple queries with fasta34_t is not as efficient as pv34compfa (and
1277 it cannot use networks of Unix workstations), but it is much easier to
1278 use and install.
1279
1280 With the comp_mlib.c option, fasta34 cannot automatically recognize
1281 DNA sequences, just as pv34compfa no longer recognizes DNA sequences.
1282 You must use the "-n" option to search with DNA sequences.  The other
1283 programs (fastx34, tfastx34, etc) "know" the type of the query and
1284 database sequences, so "-n" is only required for fasta34(_t).
1285
1286 >>Dec 14, 2001          CVS tag fa34t10b
1287
1288 Fix problems reading DNA databases in blast2 format.
1289
1290 >>Dec 11, 2001
1291
1292 Changes to spam() in dropnfa.c so that, for DNA sequences, the
1293 boundaries of a local alignment region are found with the same
1294 algorithm as in previous versions of fasta.  For
1295 protein sequences, the algorithm will extend the local region beyond
1296 the "ktup" boundaries if a better score can be found.  For DNA
1297 sequences, this raises the noise rather than increasing sensitivity,
1298 so it is turned off and "ktup" boundaries are respected.  The old,
1299 "ktup" boundary algorithm is available with -DNOSPAM_EXT.
1300
1301 This version also includes a working res_stats.c, which can be used to
1302 test various statistical estimates on exactly the same set of scores.
1303
1304 Fixed problems with -m 9 percent identity for fastx/fasty/tfastx/tfasty.
1305 These errors have been present since -m 9 was implemented.
1306
1307 >>Dec 10, 2001
1308
1309 Fix to map_db.c to work correctly with files > 2 Gb when 64-bit longs
1310 are available.  It is not yet designed to work with ftello() and other
1311 offset types.
1312
1313 >>Nov 11,21, 2001       CVS tag fa34t10a, fa34t10d1
1314
1315 Substantial changes to revcomp(), getseq(), and other functions to
1316 correct problems with -S on DNA sequences.  Sequences with lower case
1317 nucleotides were not recognized or reverse complemented properly.
1318
1319 Fix to dropnfa.c (v34t07, Nov 21, 2001) bg_align() to re-initialize
1320 static globals - this fixes a problem encountered with pv34compfa.  A
1321 new main program, comp_mlib.c has been added to the CVS archive,
1322 although it is not referenced in any of the Makefiles.  comp_mlib.c
1323 works like p2_complib.c and compares a library against another
1324 library.
1325
1326 >>Nov 4, 2001
1327
1328 Change to dropnfa.c spam () while(1) -> while(lpos <= dmax->stop).
1329 This fixes a problem with ktup=1 on Suns only, so far.
1330
1331 >>Oct 4, 2001           CVS tag fa34t10
1332
1333 Add comp_lib.c file, which merges complib.c (unthreaded) and
1334 comp_thr.c (threaded) code into one file.
1335
1336 Modifications to nmgetlib.c, mmgetaa.c to allow Genbank flatfile
1337 format without DESCRIPTION or ACCESSION lines.
1338
1339 Additional fix for -S with ktup=1.
1340
1341 >>Sept. 24, 2001
1342
1343 Fix to have correct gap-penalties for short scoring matrices with
1344 tfastx/fastx.
1345
1346 >>Sept. 10, 2001        CVS tag fa34t05d6
1347
1348 Fix a bug introduced by -S fix in fa34t05d5.  Also, try to remove
1349 changes in p34compfa compared to pv4compfa output.
1350
1351 >>Sept. 6, 2001         CVS tag fa34t05d5
1352
1353 Fix the -S dropnfa/fx/fz2 bug that was not actually fixed in
1354 fa34t05d4.  Incorporate the correct scaleswn.c referred to in
1355 fa34t05d4.
1356
1357 >>Sept. 5, 2001         CVS tag fa34t05d4
1358
1359 Fix problem with m_msg.quiet that prevented interactive prompts for
1360 ktup, file name, etc with threaded programs.
1361
1362 Fix serious bug in dropnfa.c/dropfx.c/dropfz2.c that caused -S to work
1363 improperly on sequences with effective length of 3 or less.
1364
1365 Change to scaleswn.c to make mle_cen(), mle_cen2() more robust to cases
1366 where the top and bottom scores are the same.
1367
1368 Change p2_complib.c to avoid compiler complaints with (void *)wstage2p=NULL
1369 on some platforms.
1370
1371 >>Aug. 30, 2001         CVS tag fa34t05d3
1372
1373 Fixed problem with uthr_subs.c for Suns, but changed Makefile.sun to
1374 use pthreads rather than Sun Unix threads.  Removed SQL stuff from
1375 Makefile.mpi4/pvm4 and added Makefile.mpi4_sql/pvm4_sql.
1376
1377 fa34t05d2 - fix to map_db.c to provide *sascii.
1378
1379 fa34t05d1 - fixes to ibm_pthr_subs.c and Makefile.ibm from IBM.
1380
1381 >>Aug. 20, 2001         CVS tag fa34t05d0
1382
1383 The pvm/mpi complib programs have been substantially updated with
1384 release 3.4.  See readme.v34t0 for more information.  With version
1385 3.4, the MPI programs are mp34comp*, mu34comp*, etc.
1386
1387 A major effect of this change is to disable automatic sequence type
1388 (protein/DNA) recognition with pv34compfa/mp34compfa.  By default,
1389 protein libraries are assumed.  Thus, pv34compfa/mp34compfa require
1390 the "-n" command line option when running pv34compfa/mp34compfa on DNA
1391 sequence libraries.  This issue does not occur with the other
1392 programs, which will recognize the appropriate sequence type, because
1393 it is determined by the program (e.g. pv34compfx requires
1394 DNA:protein).
1395
1396 Fixed substantial problem with 64-bit file offsets for Linux in
1397 complib.c/comp_thr.c, p2_complib.c.  This problem, solved by Doug
1398 Blair, was preventing the threaded versions from working properly in
1399 memory mapped mode.
1400
1401 In all earlier versions of fasta, when very long sequences were
1402 searched, the sequence length reported was that of the "chunk" that
1403 was actually searched (typically 80,000-query_length) rather than the
1404 actual library sequence length.  This peculiar behavior has now changed,
1405 and the full length of the library sequence, not the sequence chunk,
1406 is reported as the library sequence length.  Note that chunks are
1407 still used, however, which can cause the same alignment to be shown
1408 twice.  In addition, the "-m 9" output format has changed to report
1409 the coordinates of the query and library sequence (see below), which
1410 may be different from 1-sequence_length because the query and
1411 library sequences may have been extracted from larger sequences.  Four
1412 additional fields have been added, "pn0", "px0", "pn1", "px1", that are
1413 the positions of the beginning (pn0/1) and end (px0/1) of the
1414 query/library sequence.  pn0/1 would typically be changed with the
1415 "@C:#" directive, described below.
1416
1417 Changes to doinit.c/initfa.c/initsw.c to provide a new function -
1418 f_lastenv() - that allows function-specific adjustments to parameters
1419 after the command line options have been read but before the first
1420 sequence is read.  This change solved problems with "mp/pv34compfx -S".
1421
1422 fasts34/tfasts34 now recognize that 'I/L' are the same, as are 'Q/K'
1423 (which are apparently indistinguishable by Mass-Spec).  The latter
1424 identity is on by default, but can be turned off with "-h 0".
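
One way to picture the isobaric identities is to map each residue to
a class representative before comparing.  This is a sketch with
hypothetical names (iso_class(), residues_match(), use_iso); the
programs themselves may implement the identities through the scoring
matrix instead.

```c
#include <assert.h>

/* Map a residue to a representative of its isobaric class:
   L -> I and Q -> K; every other residue maps to itself. */
static char iso_class(char r)
{
    if (r == 'L') return 'I';
    if (r == 'Q') return 'K';
    return r;
}

/* Nonzero when a and b count as identical.  use_iso is a hypothetical
   switch standing in for the option described above. */
int residues_match(char a, char b, int use_iso)
{
    assert(a != '\0' && b != '\0');
    if (use_iso)
        return iso_class(a) == iso_class(b);
    return a == b;
}
```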
1425
1426 The MPI/PVM versions of the programs have been tested extensively with
1427 compfa, compfx, and comptfx.  Makefile.mpi4 now works properly.
1428 Changes to p2complib.c to support the PVM option "-T 1-4", which
1429 allows one to run on nodes 1-4 of a (presumably larger) PVM virtual
1430 machine.  This option has no effect on the mp34comp* programs.  The
1431 old "-T 4" to run on 4 nodes, is also available.  If each node has 2
1432 cpu's, as indicated in the "pvmd hostfile", both CPU's will be used
1433 for a total, in this example, of 8 processes. This allows one to
1434 specify a large PVM machine and use separate parts of it
1435 independently.
1436
1437 Changes to nmgetlib.c to fix problems with longer dates in GCG files
1438 (Y2K).  Fixes to faatran.c for extended alphabets and 'X's.  Various
1439 code clean-ups to make "gcc -Wall" a little bit (not much) happier.
1440
1441 This is the first distributed fasta34 version.
1442
1443 ================
1444 >>Aug 9, 2001           CVS tag fa34t05
1445
1446 Corrections to initfa.c to allow -S to work with tfastx/y.
1447 Fix to manshowbest.c for query position with -m 9.
1448
1449 >>July 18, 2001         CVS tag fa34t04
1450
1451 Various changes to complib.c, comp_thr.c, p2_complib.c, showbest.c,
1452 showalign.c to deal with overlapping alignments in long sequences that
1453 have been segmented.  When long sequences are segmented (lcont>0), the
1454 eventual total length (n1tot_v) is saved at beststr->n1tot_p.  If
1455 there was no lcont, then beststr->n1tot_p = NULL, and beststr->n1
1456 should be used as the sequence length.  This has the advantage of
1457 requiring space only when long sequences are encountered, and
1458 requiring only one integer for several segments.
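
The lookup rule can be sketched as follows.  The struct is a reduced
stand-in that keeps only the two fields named above (n1 and n1tot_p);
the struct name and reported_length() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stand-in for beststr: only the two fields named above. */
struct beststr_sketch {
    int  n1;        /* length of this segment */
    int *n1tot_p;   /* shared total length, or NULL if never segmented */
};

/* Length to report for the library sequence behind this hit. */
int reported_length(const struct beststr_sketch *b)
{
    assert(b != NULL);
    return (b->n1tot_p != NULL) ? *b->n1tot_p : b->n1;
}
```

Several segments of one long sequence can point at the same integer,
which is why only one is needed per sequence.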
1459
1460 m_msg.noshow has been removed.
1461
1462 The -m 9 format has been changed - 5 fields have been added, 4
1463 (pmn0/pmx0/pmn1/pmx1) provide the beginning and end coordinates of the
1464 query and library sequence; the last (fs) reports the number of
1465 frameshifts.  The names of the alignment boundaries have been changed
1466 from min0/max0/min1/max1 to amn0/amx0/amn1/amx1 (Alignment miN/maX).
1467
The SQL format has been extended to provide for statements that do
things but do not generate results, such as creating and selecting
into a temporary table, e.g.:
1470 ================
    do
    create temporary table seq_pos (
    id int unsigned not null auto_increment primary key,
    prot_id int unsigned not null default 0,
    start int unsigned not null default 0,
    length int unsigned not null default 0
    )
    ;
    do
    insert into seq_pos (prot_id, start, length)
      select id, 11, len-10
      from protein, annot
      where len > 100
      and annot.protein_id = protein.id
      and annot.pref=1
    ;
    select seq_pos.id,
       substring(protein.seq, start, length),
       concat("@C:", start, " ", descr)
    from protein, seq_pos, annot
    where protein.id = annot.protein_id
      and protein.id = seq_pos.prot_id
      and annot.pref = 1
    ;
    select prot_id,
       concat("@C:", start, " ", descr)
    from seq_pos, annot
    where annot.protein_id = seq_pos.prot_id
      and seq_pos.id = #
      and annot.pref = 1
    ;
1502 ================
1503
1504   In the current implementation, these statements must start with "DO"
1505 as the first two characters on the line, and come immediately after a
1506 line ending with ';'.  The text from "DO" to the next ";", excluding
1507 the "DO", is executed when the database connection is made.
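
The rule above can be sketched as follows.  This is not the code in
mysql_lib.c; the function names are hypothetical, "do" is accepted in
either case (an assumption), and a ';' inside a quoted SQL string is
not handled.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* True if `line` starts with "DO" followed by white space or end of text. */
static int is_do_line(const char *line)
{
    return (line[0] == 'D' || line[0] == 'd') &&
           (line[1] == 'O' || line[1] == 'o') &&
           (line[2] == '\0' || isspace((unsigned char)line[2]));
}

/* True if `line` is a line start and the previous line ends with ';'
 * (trailing blanks on that line are ignored). */
static int prev_line_ends_with_semicolon(const char *sql, const char *line)
{
    const char *q;

    if (line == sql || line[-1] != '\n') return 0;
    q = line - 1;                          /* the '\n' before `line` */
    while (q > sql && q[-1] != '\n' && isspace((unsigned char)q[-1])) q--;
    return q > sql && q[-1] == ';';
}

/* Find the next "DO" statement at or after `from`.  The text between
 * "DO" and the next ';' is copied into `out` with surrounding white
 * space trimmed.  Returns a pointer just past the ';', or NULL if no
 * further "DO" statement exists. */
const char *next_do_statement(const char *sql, const char *from,
                              char *out, size_t outlen)
{
    const char *line = from;

    if (outlen == 0) return NULL;
    while (line != NULL && *line != '\0') {
        if (is_do_line(line) && prev_line_ends_with_semicolon(sql, line)) {
            const char *body = line + 2;   /* skip the "DO" itself */
            const char *end = strchr(body, ';');
            size_t n;

            if (end == NULL) return NULL;
            while (body < end && isspace((unsigned char)*body)) body++;
            n = (size_t)(end - body);
            while (n > 0 && isspace((unsigned char)body[n - 1])) n--;
            if (n >= outlen) n = outlen - 1;
            memcpy(out, body, n);
            out[n] = '\0';
            return end + 1;
        }
        line = strchr(line, '\n');         /* advance to the next line */
        if (line != NULL) line++;
    }
    return NULL;
}
```

Calling next_do_statement() repeatedly, each time passing the pointer
returned by the previous call as `from`, yields the statements in
order; each would then be executed once when the connection is made.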
1508
=====
>>July 12, 2001
1510
1511 The allocation of the work_info data structure used to send
1512 information to the worker threads has been changed.  The old method
1513 worked, possibly by accident.
1514
1515 A bug in p2_complib.c that caused E()-values to be calculated
1516 improperly for the first query sequence has been fixed.
1517
1518 >>July 11, 2001 --> fa34t02
1519
1520 It is now possible to specify output coordinates in library sequences
1521 by including the string: "@C:number" on the description line, e.g.
1522
1523    >gtm1_human gi|12345 human glutathione transferase M1 @C:21
1524
1525 would label the first residue in the library sequence "21" rather than
1526 "1".  This capability has been included to provide accurate
1527 coordinates for searches done against subsequences generated by an SQL
1528 query.  For example, one could use a query of the form:
1529
1530  SELECT protein.id, substring(protein.seq,11,length(protein.seq)-20),
1531         concat(protein.name," @C:11 ",protein.descr)
1532  FROM protein;
1533
1534 to generate a sequence set with each sequence starting with residue
1535 11.  Without the "@C:11" option on the description line, the program
1536 would number the alignment positions starting at 1, even though the
1537 first residue of the sequence really started at 11.  "@C:11" allows
1538 one to correct the coordinate system.
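
A minimal sketch of reading the offset from a description line is
shown below.  It is not the parser used by the library-reading code;
the function name is hypothetical, and falling back to 1 when the tag
is absent or has no digits is an assumption based on the text above.

```c
#include <stdlib.h>
#include <string.h>

/* Coordinate to assign to the first residue of a library sequence:
 * the number following "@C:" in the description, or 1 if there is no
 * "@C:" tag or no digits follow it. */
long first_residue_coord(const char *descr)
{
    const char *tag = strstr(descr, "@C:");
    char *end;
    long val;

    if (tag == NULL) return 1;
    tag += 3;                      /* skip "@C:" */
    val = strtol(tag, &end, 10);
    if (end == tag) return 1;      /* "@C:" not followed by a number */
    return val;
}

/* Displayed coordinate of the residue at 0-based index i. */
long residue_coord(const char *descr, long i)
{
    return first_residue_coord(descr) + i;
}
```

For the gtm1_human example above, first_residue_coord() returns 21,
so the tenth residue (index 9) would be labeled 30.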
1539
1540 Currently, "@C:offset" is available only with library type 1 (fasta
1541 format) and 16 (mySQL).
1542
1543 The SQL-generated database with "@C:offset" can be used with both the
1544 fast*34(_t) programs and with pv34comp*.  However, the SQL syntax is
1545 used differently in the fasta34 and pv34compfa programs.  fast*34(_t)
1546 requires three SQL statements during a search: (1) a statement to
1547 generate a large set of library sequences; (2) a statement to generate
1548 a description of a single sequence, given a unique identifier provided
1549 by (1); and (3) a statement to generate a single sequence given a
1550 unique identifier provided by (1).  For fast*34 searches, the third
1551 (3) SQL statement must provide the "@C:offset" information in the
1552 third results field for the offset to be used.  It is optional in (1)
1553 and (2).
1554
The pv34comp* programs require only one SQL statement, statement (1)
above, which must provide three fields: a unique identifier, the
sequence, and a complete description that must include "@C:offset" if
substrings are used.  If SQL queries (2) and (3) are provided, they
are ignored.  Thus, the same files can be used by both programs, but
the "@C:offset" is required in different SQL queries by the fast*34
and pv34comp* programs.
1562
1563 Other changes:
1564
1565 Re-incorporation of GAP_OPEN option; fix to Altschul-Gish stats when
1566 GAP_OPEN is used.
1567
1568 Re-incorporation of A. Mackey's spam() improvement in dropnfa.
1569
Fixes to include file ordering to allow the fast*34(_t) and pv34comp*
programs to compile.
1572
1573 Fix to lascii[] for SQL database queries.
1574
1575 Fix to an old bug in comp_thr.c to send individual worker_info
1576 structures to threads (does not fix LINUX threads problems, however).
1577
1578 =====
1579 >>July 9, 2001
1580
Considerable changes to support non-global library functions.
1582
1583 (1) Separate ascii/sequence mapping arrays are used by the
1584     query-reading (qascii), library-reading (lascii), and sequence
1585     comparison function (pascii) routines.  As a result, there is no
1586     longer a need for tgetlib.o/lgetlib.o - lgetlib.o can serve both
1587     functions.
1588
1589 (2) This also allows us to remove all #ifdef TFAST/FASTX conditionals
1590     from complib.c/comp_thr.c/p2_complib.c.  We no longer need
1591     tcomp_thr.o, comp_thrx.o, etc.  We still have a variety of
1592     p2_complib.o variations to support the different c34.work* files.
1593
(3) Because non-global openlib/getlib functions are available, exactly
    the same open/get functions are available for reading both the
    query and reference libraries in pv34comp* programs.  The
    host-specific openlib/getlib functions in hxgetaa.c are now
    provided by nmgetlib.c, etc.  This has two effects:
1599
1600     (a) it is now possible to compare a query database generated by an
1601         SQL query to a library database generated by a different SQL
1602         query.
1603
1604     (b) pv34comp* has lost (at least in this version) the ability to
1605         automatically detect the query sequence type. To search with a
1606         DNA query, you MUST use "-n".
1607
(4) The resetp() function is now responsible for almost all of the
    function-specific (TFAST/FASTX/etc) initializations.  All of the
    function-specific code has been removed from complib.c/comp_thr.c
    and most of it has been moved to initfa.c/resetp().
1612
1613 (5) manageacc.c has been merged into compacc.c (mostly prhist()).
1614
1615 =====
1616 >>June 1, 2001
1617
Many changes to accommodate a new strategy, with no global variables,
for reading sequence databases.  Every time a file is opened, a struct
lmf_str is allocated, which can be used for memory-mapped files, ncbl2
files, and mysql files.
1622
In addition, an opened file has a default sequence type, DNA or
protein, or one can open a file in a mode that will allow the sequence
type to be changed.
1626
1627 =====
1628 >>May 18, 2001          CVS: fa33t09d0
1629
A new compile-time parameter, -DGAP_OPEN, is available to change the
definition of the "-f gap-open" parameter from the penalty for the
first residue in a gap to a true gap-open penalty, as is used in BLAST
and many other comparison algorithms.  This will probably become the
default for fasta in version 3.4.
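
The difference between the two definitions can be sketched as below.
The function names are hypothetical, and penalties are written as
negative scores; this illustrates the arithmetic only and is not the
code in the alignment routines.

```c
/* Cost of a gap of `len` residues when "-f" is the penalty for the
 * first residue in the gap and "-g" is the penalty for each
 * additional residue (the definition without -DGAP_OPEN). */
int gap_cost_first_residue(int f_first, int g_ext, int len)
{
    return f_first + (len - 1) * g_ext;
}

/* Cost of a gap of `len` residues when "-f" is a true gap-open
 * penalty and every residue in the gap, including the first, also
 * pays "-g" (the BLAST-style definition, with -DGAP_OPEN). */
int gap_cost_gap_open(int f_open, int g_ext, int len)
{
    return f_open + len * g_ext;
}

/* Convert a first-residue penalty to the equivalent gap-open penalty. */
int first_residue_to_gap_open(int f_first, int g_ext)
{
    return f_first - g_ext;
}
```

For example, "-f -12 -g -2" under the first-residue definition gives
the same gap costs as "-f -10 -g -2" under the gap-open definition: a
gap of three residues costs -16 either way.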
1635
1636 Fixes to conflicts between "-S" and "-s matrix".  When a scoring
1637 matrix file was specified, lower-case alignments were not displayed
1638 with -S (although the scores were calculated properly).
1639
More extensive testing of mysql_lib.c (mySQL query-libraries) with
the pv4comp* and mp4comp* programs.
1642
1643 =====
1644 >>April 5, 2001         CVS: fa33t08d4b3
1645
1646 Changes in nmgetlib.c and ncbl2_mlib.c to return long sequence
1647 descriptions for PCOMPLIB (pv4/mp3comp*).  Also fix p2_complib.c to
1648 request DNA library for translated comparisons.
1649
1650 Fix for prss33(_t) to read both sequences from stdin.
1651
1652 =====
1653 >>March 27, 2001        CVS: fa33t08d4
1654
Modifications to allow 64-bit fseek/ftell on machines like Sun and
Linux/Intel that support -D_FILE_OFFSET_BITS=64, -D_LARGEFILE_SOURCE,
off_t, and fseeko()/ftello(), enabled with the option -DUSE_FSEEKO.
Machines with 64-bit longs do not need this option.  Machines with
32-bit longs that allow files >2 Gb can access them with the 64-bit
file access functions, including fseeko() and ftello(), which work
with off_t file offsets instead of longs.
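
One way to arrange this kind of compile-time switch is sketched
below.  The macro and type names (FILE_SEEK, FILE_TELL, file_pos_t)
are hypothetical and are not the ones used in the FASTA sources.
When building with -DUSE_FSEEKO, -D_FILE_OFFSET_BITS=64 should be
given on the compiler command line so that it is seen before any
system header.

```c
#include <stdio.h>
#include <sys/types.h>   /* off_t */

#ifdef USE_FSEEKO
/* 64-bit offsets even where long is 32 bits */
typedef off_t file_pos_t;
#define FILE_SEEK(fp, off, whence) fseeko((fp), (off), (whence))
#define FILE_TELL(fp)              ftello(fp)
#else
/* plain C library calls; sufficient where long is 64 bits */
typedef long file_pos_t;
#define FILE_SEEK(fp, off, whence) fseek((fp), (off), (whence))
#define FILE_TELL(fp)              ftell(fp)
#endif

/* Size of an open file in bytes, or -1 on error.  The current file
 * position is restored before returning. */
file_pos_t file_size(FILE *fp)
{
    file_pos_t here = FILE_TELL(fp);
    file_pos_t size;

    if (here < 0 || FILE_SEEK(fp, 0, SEEK_END) != 0) return -1;
    size = FILE_TELL(fp);
    if (FILE_SEEK(fp, here, SEEK_SET) != 0) return -1;
    return size;
}
```

The rest of the code then uses only file_pos_t, FILE_SEEK, and
FILE_TELL, so the choice between long and off_t is made in one place.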
1662
1663 =====
1664 >>March 3, 2001         CVS: fa33t08d2
1665
1666 Corrected problems in nmgetaa.c and mysql_lib.c with parallel
1667 programs, and one serious problem with alternate DNA scoring matrices
1668 (initfa.c, initsw.c) not being set properly.  A subtle problem with
1669 the merge of scaleswn.c and scaleswg.c is fixed.
1670
1671 >>February 17, 2001
1672
1673 Modified mysql_lib.c to use "#", rather than "%ld", to indicate the
1674 position of the GID.  This change was made because sprintf() cannot be
1675 used reliably to generate an SQL string, as '"' and '%' are used in
1676 such strings.
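
A minimal sketch of this kind of substitution is shown below.  It is
not the code in mysql_lib.c; the function name is hypothetical, and
replacing only the first '#' is an assumption.  The SQL text is
copied verbatim and never used as a format string, so any '%' or '"'
in it is harmless.

```c
#include <stdio.h>
#include <string.h>

/* Copy `tmpl` into `out`, replacing the first '#' with the decimal
 * value of `gid`.  Returns 0 on success, or -1 if `tmpl` has no '#'
 * or `out` (of size `outlen`) is too small. */
int sql_insert_gid(const char *tmpl, long gid, char *out, size_t outlen)
{
    const char *hash = strchr(tmpl, '#');
    char num[32];
    size_t pre, nlen, post;

    if (hash == NULL) return -1;
    /* only the fixed format "%ld" is ever given to sprintf() */
    nlen = (size_t)sprintf(num, "%ld", gid);
    pre = (size_t)(hash - tmpl);       /* text before the '#' */
    post = strlen(hash + 1);           /* text after the '#'  */
    if (pre + nlen + post + 1 > outlen) return -1;

    memcpy(out, tmpl, pre);
    memcpy(out + pre, num, nlen);
    memcpy(out + pre + nlen, hash + 1, post + 1);  /* includes '\0' */
    return 0;
}
```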
1677
1678 =====
1679 >>January 17, 2001
1680 (no version change, date change)
1681
1682 Minor fixes to initfa.c, initsw.c to deal with DNA scoring matrices
1683 properly. "-n -s dna.mat" is required for the sequence/matrix to be
1684 recognized as DNA.
1685
1686 >>January 16, 2001
1687 -->v34t00
1688
1689 Merge of the main CVS trunk - fa33t06 with the latest release branch,
1690 fa33t08.
1691
1692 In addition, PCOMPLIB mods have been made to mysql_lib.c.  Because
1693 p2_complib.c gets sequence description information during the first
1694 read of the database, the mysql_query must be changed to return:
1695 result[0]=GID, result[1]=description, result[2]=sequence.  In the
1696 PCOMPLIB case, the other SQL queries (for GID description, sequence)
1697 are not necessary but must still be provided.