$Name: fa_34_26_5 $ - $Id: readme.pvm_3.3,v 1.13 2000/08/04 18:45:15 wrp Exp $

================
pvcomp* - FAQs, November, 1999

(The comments below apply to the pv3comp* programs.  This problem has
been addressed in the pv4comp* programs by dramatically changing
the way databases are distributed.)

I believe that the number one reason why the pvcomp* programs do not
work properly is that the second library is not fully specified; it
must be given with its full path.  If you simply type:

        pv3compfa query.lib database.lib

the program will not be able to find database.lib on the worker machines.
You need to use:

        pv3compfa query.lib /home/user/lib/database.lib

and /home/user/lib/database.lib must be accessible to all of the worker
nodes.
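
A wrapper script can catch this mistake before any workers are started.
The following is a minimal sketch, not part of the FASTA distribution:
the function name is hypothetical, and it only checks that the library
name is a full path.  It cannot tell whether the file is visible on the
worker nodes.

```python
import posixpath


def library_path_problem(library):
    """Return a warning string if `library` is not a full path, else None.

    Hypothetical helper: it only inspects the path string.  Whether the
    file is reachable from every worker node (for example through a
    shared file system) still has to be checked on those nodes.
    """
    if not posixpath.isabs(library):
        return ("library '%s' is a relative path; workers may not find it"
                % library)
    return None


if __name__ == "__main__":
    for lib in ("database.lib", "/home/user/lib/database.lib"):
        print(lib, "->", library_path_problem(lib) or "ok")
```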

To find error messages from the workers, look at /tmp/pvml.uid, where
uid is your numeric Unix user ID.
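
As a small illustration, the log file name can be built from the
numeric user ID.  This is only a sketch; the helper names are
hypothetical, and the file has to be read on the machine where it was
written.

```python
import os


def pvm_log_path(uid=None):
    """Return the log file name /tmp/pvml.<uid> for a numeric uid."""
    if uid is None:
        uid = os.getuid()  # numeric uid of the current user (Unix only)
    return "/tmp/pvml.%d" % uid


def tail(path, n=20):
    """Return the last n lines of a text file, or [] if it does not exist."""
    try:
        with open(path, errors="replace") as fh:
            return fh.readlines()[-n:]
    except FileNotFoundError:
        return []


if __name__ == "__main__":
    for line in tail(pvm_log_path()):
        print(line, end="")
```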

================
Program summary:

Programs to produce conventional scores and alignments:

pv3compfa       protein vs protein, DNA vs DNA
pv3compsw       protein vs protein, DNA vs DNA
pv3compfx/y     DNA vs protein
pv3comptfx/y    protein vs DNA

Programs to summarize the effectiveness of a search (require
super-family-labeled databases):

ps3compfa       protein vs protein, DNA vs DNA
ps3compsw       protein vs protein, DNA vs DNA
ps3compfx/y     DNA vs protein
ps3comptfx/y    protein vs DNA

Programs to report the scores and alignments of the highest scoring
unrelated sequence (require super-family-labeled databases).  These
programs are used to evaluate the super-family labeling.

pu3compfa       protein vs protein, DNA vs DNA
pu3compsw       protein vs protein, DNA vs DNA
pu3compfx/y     DNA vs protein
pu3comptfx/y    protein vs DNA

Note that the current parallel implementations distribute the second
database among 'N' parallel workers by dividing it into 'N'
approximately equal parts: for each part, the program seeks into the
middle of the database and starts at the next entry it finds.  This
strategy fails when the database is a single long sequence (the first
worker gets the entire database, the others get nothing).
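
The seek-and-scan strategy and its failure case can be sketched as
follows.  This is an illustration only, not the code used by the
pvcomp* programs: it assumes a FASTA-format library held in memory as
bytes, and the function names are made up for the example.

```python
def start_offsets(data, n_workers):
    """Return the byte offset at which each worker's share begins.

    `data` is the whole library in FASTA format (each entry starts with
    '>' at the beginning of a line).  Worker i's nominal start is
    i * len(data) // n_workers; from there we scan forward to the next
    entry.  If no entry follows, the worker gets an empty share
    (offset == len(data)).
    """
    offsets = [0]
    for i in range(1, n_workers):
        guess = i * len(data) // n_workers
        # Start one byte early so an entry beginning exactly at `guess`
        # is not skipped.
        nxt = data.find(b"\n>", max(guess - 1, 0))
        offsets.append(len(data) if nxt < 0 else nxt + 1)
    return offsets


def shares(data, n_workers):
    """Split `data` into one bytes object per worker."""
    starts = start_offsets(data, n_workers) + [len(data)]
    return [data[starts[i]:starts[i + 1]] for i in range(n_workers)]


if __name__ == "__main__":
    many = b">a\nAAAA\n>b\nCCCC\n>c\nGGGG\n>d\nTTTT\n"
    single = b">chr\n" + b"ACGT" * 10 + b"\n"
    print([len(s) for s in shares(many, 4)])    # one entry per worker
    print([len(s) for s in shares(single, 4)])  # first worker gets it all
```

With a single long entry there is no later '>' to find, so every worker
after the first starts at the end of the file.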

================
Release notes:

--> July 18, 2000

Increase SQSZ in pxgetaa.c to 200000 for long Genbank entries.  This
may still not be long enough.  This increase may allow overlaps to
occur.

--> July 10, 2000

Corrections to the code for breaking up very long sequences.  The last
portion of a long sequence did not have the correct offset.
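
The bookkeeping involved in breaking up a long sequence can be
illustrated with a short sketch.  This is not the code in pxgetaa.c;
the function name, the 0-based offsets, and the overlap parameter are
assumptions made for the example.

```python
def split_long_sequence(seq, max_len, overlap=0):
    """Return (offset, piece) pairs covering `seq`.

    Each piece is at most `max_len` residues long; `offset` is the
    0-based position of the piece's first residue in the original
    sequence.  Consecutive pieces share `overlap` residues.
    """
    if max_len <= overlap:
        raise ValueError("max_len must be larger than overlap")
    pieces = []
    start = 0
    while True:
        pieces.append((start, seq[start:start + max_len]))
        if start + max_len >= len(seq):
            break  # this piece reached the end of the sequence
        start += max_len - overlap
    return pieces


if __name__ == "__main__":
    for offset, piece in split_long_sequence("ABCDEFGHIJ", 4):
        print(offset, piece)
```

A simple consistency check is that every piece, including the last one,
equals the slice of the original sequence that begins at its offset.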

--> July 1, 2000

Modified pxgetaa.c to read Genbank flatfiles.

Additional pieces of a long sequence no longer have a '+' at the
beginning.

--> June 12, 2000

Restructured p_complib.c, p_workcomp.c to make the -m 9 display more
consistent with the fast33(_t) set of programs.  The alignment (%_id,
swscore, boundary) information is now calculated at the do_opt() stage
of the calculation.  This rearrangement uncovered a problem with the
do_opt() stage (s_func=1) that has been fixed.  This has not yet been
tested with the MPI implementation.

Many changes were made to allow k_H, k_comp information to be passed
back so that the -z 6 scaleswn.c (proc_hist_mle2) function could be
used.

--> February 6, 2000

Corrected some problems in proc_hist_ml() so that hist_db_size and
num_db_entries are reinitialized correctly.

--> January 20, 2000

   The structure of the p[vsu]comp* programs has not changed, but the
code has been modified to accommodate both PVM and MPI versions of
the programs from the same source code.  Thus, all of the PVM-specific
code is now surrounded by #ifdef PVM_SRC/#endif.  The source files
pvcomplib.c and pvworkcomp.c have been replaced by p_complib.c and
p_workcomp.c, respectively.  Additional changes were made to ensure
that "FIRSTNODE" is used appropriately.  In general, FIRSTNODE=0 for
PVM programs (although with > 8 nodes, FIRSTNODE=1 may be more
effective), but FIRSTNODE=1 for MPI programs.
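
One way to read FIRSTNODE is as the index of the first node that runs
a search worker.  The sketch below assumes that interpretation, with
nodes numbered from 0 and node 0 hosting the manager process; the
function name is hypothetical and the code is not taken from the
sources.

```python
def worker_ids(n_nodes, firstnode):
    """Return the node indices that run search workers.

    Assumption for this sketch: nodes are numbered 0..n_nodes-1, node 0
    hosts the manager, and workers occupy nodes firstnode..n_nodes-1.
    """
    if firstnode not in (0, 1):
        raise ValueError("FIRSTNODE is expected to be 0 or 1")
    return list(range(firstnode, n_nodes))


if __name__ == "__main__":
    print(worker_ids(4, 0))  # manager's node also runs a worker
    print(worker_ids(4, 1))  # manager's node is left free
```

Under this reading, FIRSTNODE=1 gives up one worker in exchange for
leaving the manager's node free to collect results.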

  Modest changes were made to reduce warning messages during
compilation.

--> January, 2000

   Modification to hxgetaa.c, pxgetaa.c to handle library sequences,
such as those from NCBI/NR, with very long comment lines.  Additional
modifications to correct problems with long comments and long DNA
sequences in pv3comptfx/tfy.

--> v3.33       December, 1999

Substantial updates to pvcomplib.c/pvworkcomp.c to improve efficiency
and to provide pv3compf[xy] and pv3comptf[xy].  Previous versions of
pvcomplib.c/pvworkcomp.c passed the entire struct mngmsg (structs.h)
each time a new query was initiated or alignments were required.  This
version sends struct mngmsg only once and sends struct qmng_str
(w_msg.h), which is much smaller, for the queries and alignments.  In
addition, the buffer size for results is now variable (but can be as
large as 1200, vs 600 previously), which may improve performance when
large numbers of workers are available.  The maximum number of library
sequences per worker has been raised to 200,000 from 50,000.
Nevertheless, very large databases (est_human) may have too many
entries to be examined by 4 workers.
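
The per-worker limit translates into a minimum number of workers for a
given library.  The arithmetic is sketched below, assuming the entries
are divided evenly among the workers (the actual split is only
approximate); the function names and the entry counts in the example
are made up.

```python
import math

MAX_SEQS_PER_WORKER = 200_000  # limit stated in the v3.33 notes


def min_workers(n_entries, per_worker=MAX_SEQS_PER_WORKER):
    """Smallest worker count whose combined capacity holds n_entries."""
    return max(1, math.ceil(n_entries / per_worker))


def fits(n_entries, n_workers, per_worker=MAX_SEQS_PER_WORKER):
    """True if an evenly divided library stays within the per-worker limit."""
    return math.ceil(n_entries / n_workers) <= per_worker


if __name__ == "__main__":
    # Hypothetical library sizes, for illustration only.
    for n in (800_000, 1_000_000):
        print(n, "entries: 4 workers ok?", fits(n, 4),
              "minimum workers:", min_workers(n))
```

With 4 workers the combined capacity is 4 x 200,000 = 800,000 entries,
so any library larger than that needs more workers.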

pv3comptf[xy] are likely to have problems with very long
sequences.  pv3compf[xy]/tf[xy] have not been tested extensively.

--> v3.32 December, 1999

Substantial corrections to showsum.c (showbest()) for the case of DNA
queries, where two scores are calculated for each query.  As a result
of the changes, bptr[] no longer mapped exactly to best[], which
caused a bug that was very difficult to track down.  To ensure that
bptr[]=best[], bptr[] is now re-initialized for each query.

The output format has changed significantly as well.  Lots of
redundant /** **/ comments have been removed.  An E() value has been
added to the "equ num:" line in showsum.c.

The organization of the inner while() loop in pvcomplib.c has been
modified so that a new query sequence can be sent to a worker as soon
as that worker is available, rather than waiting for all workers to
finish and for the statistical analysis to complete.
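
A toy timing model shows why sending queries without waiting helps
when workers finish at different times.  It is a deliberately
simplified sketch, not a description of the loop in pvcomplib.c: the
function names are hypothetical, the times are arbitrary units, and the
second function assumes that only the last query's statistical analysis
cannot be overlapped with searching.

```python
def barrier_time(times, stats_time):
    """Total time when every query waits for all workers plus statistics.

    times[w][q] is the time worker w needs to search query q.
    """
    n_queries = len(times[0])
    total = 0
    for q in range(n_queries):
        total += max(worker[q] for worker in times) + stats_time
    return total


def pipelined_time(times, stats_time):
    """Total time when each worker starts its next query immediately.

    Simplifying assumption: statistics for earlier queries overlap with
    searching, so only the final analysis adds to the total.
    """
    return max(sum(worker) for worker in times) + stats_time


if __name__ == "__main__":
    times = [[1, 3], [3, 1]]  # two workers, two queries
    print("with barrier:", barrier_time(times, 1))
    print("pipelined:   ", pipelined_time(times, 1))
```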

--> v3.30       October, 1999

The p*comp*/c.work* programs have been renamed to pv3compfa,
ps3compfa, etc., and c3.work*, so that the older version 3.2 programs
can co-exist with this version.

Corrected problem with "-n" option that prevented it from functioning
properly.  Include "ACGTCN" in check for DNA query library.

(from readme.pvm_3.2)

--> August, 1999

Corrected problem with opt_cut initialization that only appeared
with p?compfa programs.

--> v3.26       July, 1999

pvcomp* programs now use the same method for working with forward and
reverse strands as the standard fast*3(_t) programs.  Thus, statistics
for DNA sequences should be very similar for pvcompfa and fasta3 or
fasta3_t.

                February, 1999

With release fasta32t02 of the FASTA package, the alignment
routines for pvcompfa, pvcompsw, etc. now work properly
again.

The PVM versions of the FASTA and Smith-Waterman search programs
should now be functionally identical to the multithreaded (fasta3_t,
ssearch3_t) and non-threaded (fasta3, ssearch3) versions.

The programs have also been updated to provide similar -m 10
information to the non-pvm versions.  There are some slight
differences, because the pvcomp* versions are designed to work with
multiple sequences.  But, in general, a script that looks for /^>>>/
to start an alignment set and /^>>><<</ to end the set should work
properly.
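
Such a script might look like the sketch below.  It relies only on the
two patterns given above; the function name is hypothetical and the
sample lines in the example are made up, not real -m 10 output.

```python
import re

START = re.compile(r"^>>>")
END = re.compile(r"^>>><<<")


def alignment_sets(lines):
    """Group output lines into alignment sets.

    A set opens at a line matching /^>>>/ and closes at a line matching
    /^>>><<</.  The end pattern is tested first because an end line
    also matches the start pattern.
    """
    sets = []
    current = None
    for line in lines:
        if END.match(line):
            if current is not None:
                sets.append(current)
            current = None
        elif START.match(line):
            if current is not None:  # tolerate a missing end marker
                sets.append(current)
            current = [line]
        elif current is not None:
            current.append(line)
    if current is not None:  # output ended without a final end marker
        sets.append(current)
    return sets


if __name__ == "__main__":
    sample = ["header", ">>>q1", "a", ">>><<<",
              ">>>q2", "b", "c", ">>><<<", "trailer"]
    for s in alignment_sets(sample):
        print(s)
```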

--> v3.23       March, 1999

Modified Makefile.pvm, showsum.c so that showsum.c is used by
both the complib/_thr and pvcomplib (pvm parallel) versions.

Corrected bug in reading first query for DNA sequences.

--> v3.25       May, 1999

Fixed pvm_showalign.c so that FIRSTNODE (in msg.h) can be 1, rather
than 0.  #define FIRSTNODE 1 is recommended when the virtual machine
has 8 or more nodes.