.TH FASTA/TFASTA/FASTX/TFASTXv3 1 local
fasta3, fasta3_t \- scan a protein or DNA sequence library for similar
tfasta3, tfasta3_t \- compare a protein sequence to a DNA sequence
library, translating the DNA sequence library `on-the-fly'.
fastx3, fastx3_t \- compare a DNA sequence to a protein sequence
database, comparing the translated DNA sequence in forward and
tfastx3, tfastx3_t \- compare a protein sequence to a DNA sequence
database, calculating similarities with frameshifts to the forward and
fasty3, fasty3_t \- compare a DNA sequence to a protein sequence
database, comparing the translated DNA sequence in forward and reverse
tfasty3, tfasty3_t \- compare a protein sequence to a DNA sequence
database, calculating similarities with frameshifts to the forward and
fasts3, fasts3_t \- compare unordered peptides to a protein sequence database
tfasts3, tfasts3_t \- compare unordered peptides to a translated DNA
fastf3, fastf3_t \- compare mixed peptides to a protein sequence database
tfastf3, tfastf3_t \- compare mixed peptides to a translated DNA
ssearch3, ssearch3_t \- compare a protein or DNA sequence to a
sequence database using the Smith-Waterman algorithm.
prss3, prfx3 \- estimate statistical significance of an alignment by
comparing the score to the distribution of similarity scores generated
by shuffling the second sequence. prss3 uses Smith-Waterman. prfx3
uses the fastx algorithm.
Release 3.x of the FASTA package provides a modular set of sequence
comparison programs that can run on conventional single processor
computers or in parallel on multiprocessor computers. Seven different
programs \- fasta3, fastx3, fasty3, tfastx3, tfasty3, tfasta3, and
ssearch3 \- are currently available.
All of the comparison programs share a set of basic command line
options; additional options are available for individual comparison
The fasta3_t, fastx3_t, fasty3_t, tfasta3_t, tfastx3_t, tfasty3_t and
ssearch3_t programs are threaded versions that will run in parallel on
Digital Equipment, Sun, and SGI multiprocessor computers.
.SH Options for comparison functions
These versions of the fasta programs have been modified to accept a
query sequence from the Unix "stdin" data stream. This makes it much
easier to use fasta3 and its relatives as part of a WWW page. To
indicate that stdin is to be used, use "@" as the query
sequence file name. "@" can also be used to specify a
subset of the query sequence to be used, e.g.:
cat query.aa | fasta3 -q @:50-150 s
would search the 's' database with residues 50-150 of query.aa. FASTA
cannot automatically detect the sequence type (protein vs DNA) when
"stdin" is used, so the '-n' option is required for DNA.
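The stdin conventions above can be sketched as a small shell helper. This is only an illustration: the function name stdin_query_cmd and its argument order are hypothetical, and the helper prints the command line instead of running it, so it can be tried without the FASTA programs installed. It uses only the "@", "@:start-end", -q, and -n forms described above.

```shell
# Hypothetical helper (not part of the FASTA package): print a fasta3
# command line that takes its query from stdin.
# usage: stdin_query_cmd <library> <protein|dna> [start-end]
stdin_query_cmd() {
    lib=$1
    seqtype=$2
    range=${3:-}

    # "@" means read the query from stdin; "@:start-end" selects a subset.
    query="@"
    if [ -n "$range" ]; then
        query="@:$range"
    fi

    # The sequence type cannot be detected on stdin, so DNA needs -n.
    opts="-q"
    if [ "$seqtype" = "dna" ]; then
        opts="$opts -n"
    fi

    echo "fasta3 $opts $query $lib"
}

stdin_query_cmd s protein 50-150   # prints: fasta3 -q @:50-150 s
stdin_query_cmd s dna              # prints: fasta3 -q -n @ s
```

The printed command is meant to be placed at the end of a pipeline, for example after "cat query.aa |".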
Sort by "init1" score.
(TFASTA3, TFASTX/Y3 only) use only forward frame translations
"SHOWALL" option attempts to align all of both sequences in FASTA and SSEARCH.
force Smith-Waterman alignment for output. Smith-Waterman is the
default for protein sequences and FASTX3, but not for TFASTA3 or DNA
comparisons with FASTA3.
number of best scores to show (must be < -E cutoff if -E is given)
show z-scores rather than bit scores
threshold for band optimization (FASTA, FASTX)
(fasta34t11d4) length of name abbreviation in alignments, default = 6.
number of best alignments to show (must be < -e cutoff)
turn on debugging mode. Enables checks on sequence alphabet that
cause problems with tfastx3, tfasty3, tfasta3.
expectation value upper limit for score and alignment display.
Defaults are 10.0 for FASTA3 and SSEARCH3 protein searches, 5.0 for
translated DNA/protein comparisons, and 2.0 for DNA/DNA searches.
penalty for opening a gap (or first residue for older versions)
expectation value lower limit for score and alignment display.
-F 1e-6 prevents library sequences with E()-values lower than 1e-6
from being displayed. This allows the user to focus on more distant
penalty for additional residues in a gap
(FASTX3, TFASTX3, FASTY3, TFASTY3 only) penalty for a frameshift between
(FASTY3, TFASTY3 only) penalty for a frameshift within a codon.
turn off histogram display
(DNA only) reverse complement the query sequence. (TFASTX) compare against
only the reverse complement of the library sequence.
specify FASTLIBS file
report long sequence description in alignments
\-m 0,1,2,3,4,5,6,9,10 alignment display options. \fC-m 0, 1, 2, 3\fP
display different types of alignments. \fC-m 4\fP provides an
alignment "map" on the query. \fC-m 5\fP combines the alignment map
and a \fC-m 0\fP alignment. \fC-m 6\fP provides HTML output.
\fC-m 9\fP does not change the alignment output, but provides
alignment coordinate and percent identity information with the best
scores report. \fC-m 9c\fP adds encoded alignment information to the
\fC-m 9\fP output; \fC-m 9i\fP provides only percent identity and alignment
length information with the best scores. With current versions of the
FASTA programs, independent \fC-m\fP options can be combined;
e.g. \fC-m 1 -m 9c -m 6\fP.
molecular weight (residue) cutoffs. -M "101-200" examines only sequences that are 101-200 residues long.
force query to nucleotide sequence
break long library sequences into blocks of # residues. Useful for
bacterial genomes, which have only one sequence entry. -N 2000 works
well for bacterial genomes.
(FASTA) turn fasta band optimization off during initial phase. This was
the behavior of fasta1.x versions.
quiet option; do not prompt for input
values for match/mismatch for DNA comparisons. \fC+n\fP is
used for the maximum positive value and \fC-m\fP is used for the
maximum negative value. Values between max and min are rescaled, but
residue pairs having the value -1 continue to be -1.
save all scores to statistics file (previously -r file)
specify substitution matrix. BLOSUM50 is used by default;
PAM250, PAM120, and BLOSUM62 can be specified by setting -s P120,
P250, or BL62. With this version, many more scoring matrices are
available, including BLOSUM80 (BL80), and MDM10, MDM20, MDM40 (Jones,
Taylor, and Thornton, 1992 CABIOS 8:275-282; specified as -s M10, -s
M20, -s M40). Alternatively, BLASTP1.4 format scoring matrix files can
be specified. BL80, BL62, and P120 are scaled in 1/2 bit units; all
the other matrices use 1/3 bit units. DNA scoring matrices can also
be specified with the "-r" option.
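As a rough illustration of the matrix names and scaling described above, the following shell sketch maps a full matrix name to its -s abbreviation and converts a raw matrix score to nominal bits. The function names (matrix_opt, units_per_bit, raw_to_bits) are hypothetical, only the abbreviations listed in this entry are covered, and the conversion reflects the nominal matrix scale only; it is not the bit score the programs report from their statistical estimates.

```shell
# Hypothetical helpers (not part of the FASTA package).

# matrix_opt <full name>: print the -s option for a matrix named above.
matrix_opt() {
    case $1 in
        PAM120)   echo "-s P120" ;;
        PAM250)   echo "-s P250" ;;
        BLOSUM62) echo "-s BL62" ;;
        BLOSUM80) echo "-s BL80" ;;
        MDM10)    echo "-s M10" ;;
        MDM20)    echo "-s M20" ;;
        MDM40)    echo "-s M40" ;;
        *)        echo "no abbreviation listed for: $1" >&2; return 1 ;;
    esac
}

# units_per_bit <abbreviation>: 2 for the 1/2 bit matrices, 3 otherwise.
units_per_bit() {
    case $1 in
        BL80|BL62|P120) echo 2 ;;
        *)              echo 3 ;;
    esac
}

# raw_to_bits <raw score> <abbreviation>: nominal bits for a raw score.
raw_to_bits() {
    awk -v s="$1" -v u="$(units_per_bit "$2")" 'BEGIN { printf "%.1f\n", s / u }'
}

matrix_opt BLOSUM62      # prints: -s BL62
raw_to_bits 30 BL62      # prints: 15.0
raw_to_bits 30 P250      # prints: 10.0
```

A raw score of 30 therefore corresponds to different nominal bit values under BL62 and P250, which is worth keeping in mind when comparing raw scores across matrices.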
treat lower case letters in the query or database as low complexity
regions that are equivalent to 'X' during the initial database scan,
but are treated as normal residues for the final alignment display.
Statistical estimates are based on the 'X'ed out sequence used during
the initial search. Protein databases (and query sequences) can be
generated in the appropriate format using John Wootton's "pseg"
program, available from ftp://ncbi.nlm.nih.gov/pub/seg/pseg. Once you
have compiled the "pseg" program, use the command:
\fCpseg database.fasta -z 1 -q > database.lc_seg\fP
Translation table - tfasta3, fastx3, tfastx3, fasty3, and
tfasty3 now support the BLAST translation tables. See
\fChttp://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi\fP.
In addition, "\-t t" or "\-t t#" turns on the addition of an implicit termination
codon to a protein:translated DNA match. That is, each protein
sequence implicitly ends with "*", which matches the termination codons
for the appropriate genetic code. "\-t t#" sets implicit termination
and a different genetic code.
(threaded, parallel only) number of threads or workers to use (set by
default to 4 at compile time).
Do RNA sequence comparisons: treat 'T' as 'U', allow G:U base pairs (by
scoring "G-A" and "T-C" as "G-G" -1). Search only one strand.
Allow special annotation characters in query sequence. These characters
will be displayed in the alignments on the coordinate number line.
\-w # line width for similarity score and sequence alignment output.
\-W # context length (default is 1/2 of line width -w) for alignment
programs, like fasta and ssearch, that provide additional sequence context.
scores used for 'X:X', 'N:N', and '*:*' matches, and for the corresponding
'X:not-X', etc., mismatches, overriding the values
specified in the scoring matrix. If only one value is given, it is
used for both values.
offsets query, library sequence for numbering alignments
Width for band optimization; by default 16 for DNA and protein ktup=2;
32 for protein ktup=1.
Specify statistical calculation. Default is -z 1, which uses
regression against the length of the library sequence. -z 0 disables
statistics. -z 2 provides maximum likelihood estimates for lambda and K,
censoring the 250 lowest and 250 highest scores. -z 3 uses Altschul
and Gish's statistical estimates for specific protein BLOSUM scoring
matrices and gap penalties. -z 4,5: an alternate regression method.
\-z 6 uses a composition based maximum likelihood estimate based
on the method of Mott (1992) Bull. Math. Biol. 54:59-75.
-z 11,12,14,15,16: compute the regression against scores of randomly
shuffled copies of the library sequences. Twice as many comparisons
are performed, but accurate estimates can be generated from databases
of related sequences. -z 11 uses the -z 1 regression strategy, etc.
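The relationship between the plain and shuffled -z values can be sketched in shell as follows. The function name describe_z is hypothetical, and the rule that a shuffled value uses the strategy of the value ten lower is an assumption that extends the "-z 11 uses the -z 1 regression strategy, etc." remark to the other listed values.

```shell
# Hypothetical helper (not part of the FASTA package): summarize a -z value
# using only the values listed in this entry.
describe_z() {
    z=$1
    case $z in
        0)
            echo "statistics disabled" ;;
        1|2|3|4|5|6)
            echo "strategy -z $z, library sequences as given" ;;
        11|12|14|15|16)
            # Assumption: -z 1N pairs with the -z N strategy, as stated for 11.
            echo "strategy -z $((z - 10)), shuffled library sequences" ;;
        *)
            echo "-z $z is not listed" >&2
            return 1 ;;
    esac
}

describe_z 1     # prints: strategy -z 1, library sequences as given
describe_z 11    # prints: strategy -z 1, shuffled library sequences
describe_z 15    # prints: strategy -z 5, shuffled library sequences
```

Note that 13 is deliberately rejected, since the list of shuffled values above skips it.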
Set the apparent database size used for expectation value calculations
(used for protein/protein FASTA and SSEARCH, and for FASTX, FASTY, TFASTX,
.SH Environment variables:
location of library choice file (-l FASTLIBS)
default scoring matrix (-s SMATRIX)
the format string used to define the option to re-search the
the format string used to define the option to look up the library
sequence in Entrez, or some other database.