1 .TH PVCOMPFA/PVCOMPSW/v3.4 1 "January, 2003"
4 \- scan a protein or DNA sequence library for similar
5 sequences using the FASTA algorithm in parallel on a network of
9 \- scan a protein or DNA sequence library for similar
10 sequences using the Smith-Waterman algorithm in parallel on a network
11 of machines running pvm3.
14 \- evaluate sequence comparison parameters using the FASTA
15 algorithm and superfamily-annotated libraries.
18 \- evaluate sequence comparison parameters using the
19 Smith-Waterman algorithm and superfamily-annotated libraries.
23 [-Q|q -B -b # -d # -E # -f # -g # -H -i -J # -n -o -p #
26 \& -r "+n/-m" \& -S -s
28 \& -w # -1 ] query-library reference-library [
32 [\-QBbcefgHiJnopRrSsw1] \- interactive mode
35 [-Q|q -B -b # -e -f delval -g gapval -i
38 \& -r "+n/-m" \& -S -s
41 ] query-library reference-library [
46 [\-QBbefgnpRrsS] \- interactive mode
52 compare all of the sequences in one DNA or protein sequence library
53 (the query library) with all of the entries in a reference sequence
54 library using the FASTA (pv34compfa) or Smith-Waterman (pv34compsw)
55 algorithms. For example,
57 can compare a library of protein sequences to all of the sequences in
58 the NBRF PIR protein sequence database.
62 are designed to run in parallel on networks of Unix workstations using
63 the PVM parallel programming system. (For more information on PVM,
64 send email to "netlib@ornl.gov" with the message "send index for pvm3").
67 uses the rapid sequence comparison algorithm
68 described in Pearson and Lipman, Proc. Natl. Acad. Sci. USA (1988) 85:2444.
69 The program can be invoked either with command line arguments or in
70 interactive mode. The optional third argument,
72 sets the sensitivity and speed of the search. If
74 similar regions in the two sequences being compared are found by
75 looking at pairs of aligned residues; if
77 single aligned amino acids are examined.
79 can be set to 2 or 1 for protein sequences, or from 1 to 6 for DNA sequences.
83 is not specified is 2 for proteins and 6 for DNA.
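.PP
To make the role of ktup concrete, the following Python sketch shows the word-lookup idea behind it: every ktup-length word of the query is indexed, and each identical word found in a library sequence votes for the diagonal on which the two copies line up. It is a simplified illustration under that assumption, not the programs' own implementation, and the name ktup_hits is hypothetical.
.nf
```python
from collections import defaultdict


def ktup_hits(query, library_seq, ktup=2):
    """Count shared words of length ktup on each diagonal (j - i)."""
    # Index every ktup-length word of the query by its start positions.
    index = defaultdict(list)
    for i in range(len(query) - ktup + 1):
        index[query[i:i + ktup]].append(i)

    # Scan the library sequence; each shared word votes for the
    # diagonal on which the query and library copies line up.
    diagonals = defaultdict(int)
    for j in range(len(library_seq) - ktup + 1):
        for i in index.get(library_seq[j:j + ktup], ()):
            diagonals[j - i] += 1
    return dict(diagonals)
```
.fi
.PP
For example, ktup_hits("ACDEFG", "XACDEFG") returns {1: 5}, because all five shared pairs fall on diagonal 1. With ktup=1 every shared residue produces a hit, so more diagonals can receive votes; this is the extra sensitivity, and extra work, described above.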
86 compares a library of query sequences (there need be only one) to a
87 reference sequence library. Normally
89 sorts the output by the
93 option, sequences are ranked by their
95 score. Alternatively, the
97 option causes optimized scores to be calculated for every sequence
98 whose initial score exceeds a threshold, and the output to be sorted by the optimized
102 uses the rigorous Smith-Waterman algorithm to compare protein or
103 DNA sequences. The gap penalties and scoring matrices can be
115 \&) will automatically decide whether the query sequence is DNA or
116 protein by reading the query sequence as protein and determining
117 whether the `amino-acid composition' is more than 85% A+C+G+T.
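.PP
The composition test can be sketched in a few lines of Python. The 85% threshold comes from the description above; skipping non-letter characters and ignoring case are assumptions of this sketch, and looks_like_dna is a hypothetical name.
.nf
```python
def looks_like_dna(seq, threshold=0.85):
    """Return True if A+C+G+T exceed threshold of the letters in seq."""
    # Assumption: non-letters are skipped and case is ignored.
    residues = [c for c in seq.upper() if c.isalpha()]
    if not residues:
        return False
    acgt = sum(1 for c in residues if c in "ACGT")
    return acgt / len(residues) > threshold
```
.fi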
126 that evaluate the quality of a search by reporting how many
127 high-scoring related sequences and low-scoring unrelated sequences
128 were found. These programs require that both the query library and
129 the reference library be annotated with superfamily numbers for every
130 sequence in the library.
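.PP
As a rough illustration of this kind of bookkeeping, the Python sketch below takes the scores for one query together with the superfamily number of each library sequence, and counts related sequences at or above a score threshold and unrelated sequences below it. The fixed threshold and the name evaluate_search are assumptions; the statistics the programs actually report may be computed differently.
.nf
```python
def evaluate_search(query_sfnum, hits, threshold):
    """Count related hits scoring >= threshold and unrelated hits below it.

    hits is a list of (score, sfnum) pairs for one query sequence.
    """
    related_high = sum(1 for score, sf in hits
                       if sf == query_sfnum and score >= threshold)
    unrelated_low = sum(1 for score, sf in hits
                        if sf != query_sfnum and score < threshold)
    return related_high, unrelated_low
```
.fi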
136 now support all the options of the fasta3(_t) programs.
139 Report z-scores, rather than bit-scores, in the list of best hits.
142 The number of similarity scores to be shown (10 by default).
145 Expectation value limit for displaying best scores.
148 The number of alignments to be shown.
151 (delval) penalty for the first residue in a gap. -12 by default for proteins.
154 (gapval) penalty for additional residues in a gap after the first. -2
155 by default for proteins.
158 turn on histogram display (off by default).
161 invert (reverse complement) the DNA query sequence.
164 start at the M-th sequence in the query library and continue to the
165 N-th. By default, J=1 and the search begins with the first sequence
166 and ends with the last, but it sometimes makes sense to start in the
167 middle of the query library if an earlier run only partially completed, and to
168 finish early if the analysis will be run on several parallel
172 Force the program to use DNA sequence parameters.
175 Number of "slave" processors to use. Typically, this is one less than
176 the number of processors available with
178 so that one processor can be used to collate results. With
180 \&, it is more efficient to use every processor as a slave and
184 Quiet option. The programs will not prompt for input.
191 to write out the sequence identifier, superfamily number (if available),
192 and similarity scores to
194 for every sequence in the library. These results are not sorted.
197 specify the DNA match/mismatch scores, e.g. "+3/-2". The default is "+5/-4".
198 The "+" and "-" signs are required.
201 Treat lower case residues as low complexity regions.
204 the filename of an alternative scoring matrix file.
211 sort similarity scores by
218 (OPTCUT) the threshold for optimization with the
223 (no-optimize); causes
225 not to perform the default optimization on all of the sequences in the library
233 Width for limited optimization (32 by default).
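.PP
The \-r, \-f and \-g values enter the Smith-Waterman score as in this Python sketch of the local-alignment recurrence with affine gaps (Gotoh's formulation). Following the option descriptions above, it assumes a gap of k residues scores delval + (k - 1) * gapval, with both values negative. The function names are hypothetical, and this quadratic-time reference version is not the optimized code the programs use.
.nf
```python
def parse_match_mismatch(spec):
    """Parse an -r style string such as "+5/-4" into (match, mismatch)."""
    match_str, mismatch_str = spec.split("/")
    if not (match_str.startswith("+") and mismatch_str.startswith("-")):
        raise ValueError("expected a string of the form +n/-m")
    return int(match_str), int(mismatch_str)


def sw_score(a, b, match, mismatch, delval, gapval):
    """Best local alignment score of a and b with affine gap penalties."""
    neg = float("-inf")
    n = len(b)
    h_prev = [0] * (n + 1)    # H (best local score) for the previous row
    f_prev = [neg] * (n + 1)  # F: alignments ending in a gap in b
    best = 0
    for i in range(1, len(a) + 1):
        h_row = [0] * (n + 1)
        f_row = [neg] * (n + 1)
        e = neg               # E: alignments ending in a gap in a
        for j in range(1, n + 1):
            # Opening a gap costs delval; each extra residue costs gapval.
            e = max(h_row[j - 1] + delval, e + gapval)
            f_row[j] = max(h_prev[j] + delval, f_prev[j] + gapval)
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: scores never drop below zero.
            h_row[j] = max(0, h_prev[j - 1] + s, e, f_row[j])
            best = max(best, h_row[j])
        h_prev, f_prev = h_row, f_row
    return best
```
.fi
.PP
For example, with the default "+5/-4" and gap values of -12 and -2, sw_score("ACGTACGT", "ACGTTACGT", 5, -4, -12, -2) gives 28: eight matches (40) less one single-residue gap (-12).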
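.PP
The inversion requested with \-i is the usual reverse complement, sketched here in Python. Handling only the four unambiguous bases (in either case) and passing every other character through unchanged are simplifications of this sketch, and reverse_complement is a hypothetical name.
.nf
```python
# Map each base to its complement; other characters pass through unchanged.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    # Reverse the sequence, then complement every base.
    return seq[::-1].translate(COMPLEMENT)
```
.fi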
236 Query library files must be in Pearson/FASTA format, e.g.
239 >seq-id | sfnum descriptive line
240 tmlyrghi... (sequence)
248 recognize the following library formats: 0 - Pearson/FASTA; 1 - GenBank tape;
249 2 - NBRF/PIR Codata; 3 - EMBL/SWISS-PROT; 5 - NBRF/PIR VMS.
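.PP
Reading the Pearson/FASTA layout (format 0) with its superfamily annotation might look like the following Python sketch. It assumes each header is exactly ">seq-id | sfnum descriptive line" with sfnum a single token; the function names are hypothetical, and the other library formats are not handled.
.nf
```python
def parse_header(line):
    """Split ">seq-id | sfnum descriptive line" into its three parts."""
    if not line.startswith(">"):
        raise ValueError("not a FASTA header line")
    seq_id, _, rest = line[1:].partition("|")
    sfnum, _, description = rest.strip().partition(" ")
    return seq_id.strip(), sfnum, description.strip()


def read_annotated_fasta(lines):
    """Yield (seq_id, sfnum, description, sequence) for each entry."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header ends the previous entry, if any.
            if header is not None:
                yield header + ("".join(chunks),)
            header, chunks = parse_header(line), []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header + ("".join(chunks),)
```
.fi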
251 .I Scoring matrices \-
252 These programs use a different scoring (PAM) matrix file format from that of FASTA;
253 they use the PAM matrix format that is used by BLASTP
254 and produced by Altschul's "pam.c" program in the BLAST package.
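.PP
A reader for such a matrix file could look like the Python sketch below. It assumes the usual NCBI BLAST layout: comment lines beginning with #, then a header row of residue letters, then one row per residue that starts with the residue letter and lists its scores. Whether every "pam.c" output file follows this layout exactly is an assumption, and parse_blast_matrix is a hypothetical name.
.nf
```python
def parse_blast_matrix(text):
    """Return {(row_residue, column_residue): score} from matrix text."""
    # Drop blank lines and comment lines that start with '#'.
    rows = [line.split() for line in text.splitlines()
            if line.strip() and not line.lstrip().startswith("#")]
    columns = rows[0]  # header row: residue letters
    matrix = {}
    for fields in rows[1:]:
        row_residue, scores = fields[0], fields[1:]
        for column_residue, value in zip(columns, scores):
            matrix[(row_residue, column_residue)] = int(value)
    return matrix
```
.fi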
256 The program has been tested extensively only with type 0 and type 5
257 files. This documentation file may not be up to date.