.TH FASTA/TFASTA/FASTX/TFASTXv3 1 local
.SH NAME
fasta3, fasta3_t \- scan a protein or DNA sequence library for similar
sequences

tfasta3, tfasta3_t \- compare a protein sequence to a DNA sequence
library, translating the DNA sequence library `on-the-fly'.

fastx3, fastx3_t \ - compare a DNA sequence to a protein sequence
database, comparing the translated DNA sequence in forward and
reverse frames.

tfastx3, tfastx3_t \ - compare a protein sequence to a DNA sequence
database, calculating similarities with frameshifts to the forward and
reverse orientations.

fasty3, fasty3_t \ - compare a DNA sequence to a protein sequence
database, comparing the translated DNA sequence in forward and reverse
frames.

tfasty3, tfasty3_t \ - compare a protein sequence to a DNA sequence
database, calculating similarities with frameshifts to the forward and
reverse orientations.

fasts3, fasts3_t \- compare unordered peptides to a protein sequence database

tfasts3, tfasts3_t \- compare unordered peptides to a translated DNA
sequence database

fastf3, fastf3_t \- compare mixed peptides to a protein sequence database

tfastf3, tfastf3_t \- compare mixed peptides to a translated DNA
sequence database

ssearch3, ssearch3_t \- compare a protein or DNA sequence to a
sequence database using the Smith-Waterman algorithm.

prss3, prfx3 \- estimate statistical significance of an alignment by
comparing the score to the distribution of similarity scores generated
by shuffling the second sequence.  prss3 uses Smith-Waterman.  prfx3
uses the fastx algorithm.

.SH DESCRIPTION

Release 3.x of the FASTA package provides a modular set of sequence
comparison programs that can run on conventional single processor
computers or in parallel on multiprocessor computers. Seven different
programs \- fasta3, fastx3, fasty3, tfastx3, tfasty3, tfasta3, and
ssearch3 \- are currently available.

All of the comparison programs share a set of basic command line
options; additional options are available for individual comparison
functions.

The fasta3_t, fastx3_t, fasty3_t, tfasta3_t, tfastx3_t, tfasty3_t and
ssearch3_t programs are threaded versions that will run in parallel on
Digital Equipment, Sun, and SGI multiprocessor computers.

.SH Options for comparison functions
.LP
These versions of the fasta programs have been modified to accept a
query sequence from the unix "stdin" data stream.  This makes it much
easier to use fasta3 and its relatives as part of a WWW page. To
indicate that stdin is to be used, use "@" as the query
sequence file name.  "@" can also be used to specify a
subset of the query sequence to be used, e.g:
.sp
.ti 0.5i
cat query.aa | fasta3 -q @:50-150 s
.sp
would search the 's' database with residues 50-150 of query.aa.  FASTA
cannot automatically detect the sequence type (protein vs DNA) when
"stdin" is used, so the '-n' option is required for DNA.
.TP
\-1
Sort by "init1" score.
.TP
\-3
(TFASTA3, TFASTX/Y3 only) use only forward frame translations
.TP
\-a #
"SHOWALL" option attempts to align all of both sequences in FASTA and SSEARCH.
.TP
\-A
force Smith-Waterman alignment for output.  Smith-Waterman is the
default for protein sequences and FASTX3, but not for TFASTA3 or DNA
comparisons with FASTA3.
.TP
\-b #
number of best scores to show (must be < -E cutoff if -E is given)
.TP
\-B
show z-scores rather than bit scores
.TP
\-c #
threshold for band optimization (FASTA, FASTX)
.TP
\-C #
(fasta34t11d4) length of name abbreviation in alignments, default = 6.
.TP
\-d #
number of best alignments to show ( must be < -e cutoff)
.TP
\-D
turn on debugging mode.  Enables checks on sequence alphabet that
cause problems with tfastx3, tfasty3, tfasta3.
.TP
\-E #
expectation value upper limit for score and alignment display.
Defaults are 10.0 for FASTA3 and SSEARCH3 protein searches, 5.0 for
translated DNA/protein comparisons, and 2.0 for DNA/DNA searches.
.TP
\-f #
penalty for opening a gap (or first residue for older versions)
.TP
\-F #
expectation value lower limit for score and alignment display.
-F 1e-6 prevents library sequences with E()-values lower than 1e-6
from being displayed. This allows the use to focus on more distant
relationships.
.TP
\-g #
penalty for additional residues in a gap
.TP
\-h #
(FASTX3, TFASTX3, FASTY3, TFASTY3 only) penalty for a frameshift between
two codons.
.TP
\-j #
(FASTY3, TFASTY3 only) penalty for a frameshift within a codon.
.TP
\-H
turn off histogram display
.TP
\-i
(DNA only) reverse complement the query sequence. (TFASTX) compare against
only the reverse complement of the library sequence.
.TP
\-l str
specify FASTLIBS file
.TP
\-L
report long sequence description in alignments
.TP
\-m 0,1,2,3,4,5,6,9,10 alignment display options.  \fC-m 0, 1, 2, 3\fP
display different types of alignments.  \fC-m 4\fP provides an
alignment "map" on the query. \fC-m 5\fP combines the alignment map
and a \fC-m 0\fP alignment.  \fC-m 6\fP provides an HTML output.
\fC-m 9\fP does not change the alignment output, but provides
alignment coordinate and percent identity information with the best
scores report.  \fC-m 9c\fP adds encoded alignment information to the
\fC-m 9\fP; \fC-m 9i\fP provides only percent identity and alignment
length information with the best scores.  With current versions of the
FASTA programs, independent \fC-m\fP options can be combined;
e.g. \fC-m 1 -m 9c -m 6\fP.
.TP
\-M #-#
molecular weight (residue) cutoffs.  -M "101-200" examines only sequences that are 101-200 residues long.
.TP
\-n
force query to nucleotide sequence
.TP
\-N #
break long library sequences into blocks of # residues.  Useful for
bacterial genomes, which have only one sequence entry.  -N 2000 works
well for well for bacterial genomes.
.TP
\-o
(FASTA) turn fasta band optimization off during initial phase.  This was
the behavior of fasta1.x versions.
.TP
\-O file
send output to file
.TP
\-q/-Q
quiet option; do not prompt for input
.TP
\-r "+n/-m" 
values for match/mismatch for DNA comparisons. \fC+n\fP is
used for the maximum positive value and \fC-m\fP is used for the
maximum negative value. Values between max and min, are rescaled, but
residue pairs having the value -1 continue to be -1.
.TP 
\-R file
save all scores to statistics file (previously -r file)
.TP
\-s name
specify substitution matrix.  BLOSUM50 is used by default;
PAM250, PAM120, and BLOSUM62 can be specified by setting -s P120,
P250, or BL62.  With this version, many more scoring matrices are
available, including BLOSUM80 (BL80), and MDM10, MDM20, MDM40 (Jones,
Taylor, and Thornton, 1992 CABIOS 8:275-282; specified as -s M10, -s
M20, -s M40). Alternatively, BLASTP1.4 format scoring matrix files can
be specified.  BL80, BL62, and P120 are scaled in 1/2 bit units; all
the other matrices use 1/3 bit units.  DNA scoring matrices can also
be specified with the "-r" option.
.TP
\-S
treat lower case letters in the query or database as low complexity
regions that are equivalent to 'X' during the initial database scan,
but are treated as normal residues for the final alignment display.
Statistical estimates are based on the 'X'ed out sequence used during
the initial search. Protein databases (and query sequences) can be
generated in the appropriate format using John Wooton's "pseg"
program, available from ftp://ncbi.nlm.nih.gov/pub/seg/pseg.  Once you
have compiled the "pseg" program, use the command:
.IP
\fCpseg database.fasta -z 1 -q  > database.lc_seg\fP
.TP
\-t #
Translation table - tfasta3, fastx3, tfastx3, fasty3, and
tfasty3 now support the BLAST tranlation tables.  See
\fChttp://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi\fP.
.IP
In addition, "\-t t" or "\-t t#" turns on the addition of an implicit termination
codon to a protein:translated DNA match.  That is, each protein
sequence implicitly ends with "*", which matches the termination codes
for the appropriate genetic code.  "\-t t#" sets implicit termination
and a different genetic code.
.TP
\-T #
(threaded, parallel only) number of threads or workers to use (set by
default to 4 at compile time).
.TP
\-U
Do RNA sequence comparisons: treat 'T' as 'U', allow G:U base pairs (by 
scoring "G-A" and "T-C" as "G-G" -1).  Search only one strand.
.TP
\-V "?$%*"
Allow special annotation characters in query sequence.  These characters
will be displayed in the alignments on the coordinate number line.
.TP
\-w # line width for similarity score, sequence alignment, output.
.TP
\-W # context length (default is 1/2 of line width -w) for alignment,
like fasta and ssearch, that provide additional sequence context.
.TP
\-x #match,#mismatch
scores used for matches to 'X:X','N:N', '*:*' matches, and the corresponding
'X:not-X', etc, mismatches, overriding the values
specified in the scoring matrix.  If only one value is given, it is
used for both values.
.TP
\-X "#,#"
offsets query, library sequence for numbering alignments
.TP
\-y #
Width for band optimization; by default 16 for DNA and protein ktup=2;
32 for protein ktup=1;
.TP
\-z #
Specify statistical calculation. Default is -z 1, which uses
regression against the length of the library sequence. -z 0 disables
statistics.  -z 2 provides maximum likelihood estimates for lambda and K,
censoring the 250 lowest and 250 highest scores. -z 3 uses Altschul
and Gish's statistical estimates for specific protein BLOSUM scoring
matrices and gap penalties. -z 4,5: an alternate regression method.
\-z 6 uses a composition based maximum likelihood estimate based
on the method of Mott (1992) Bull. Math. Biol. 54:59-75.
-z 11,12,14,15,16: compute the regression against scores of randomly
shuffled copies of the library sequences.  Twice as many comparisons
are performed, but accurate estimates can be generated from databases
of related sequences. -z 11 uses the -z 1 regression strategy, etc.
.TP
\-Z db_size
Set the apparent database size used for expectation value calculations
(used for protein/protein FASTA and SSEARCH, and for FASTX, FASTY, TFASTX,
and TFASTY).
.SH Environment variables:
.TP
FASTLIBS
location of library choice file (-l FASTLIBS)
.TP
SMATRIX
default scoring matrix (-s SMATRIX)
.TP
SRCH_URL
the format string used to define the option to re-search the
database.
.TP
REF_URL
the format string used to define the option to lookup the library
sequence in entrez, or some other database.

.SH AUTHOR
Bill Pearson
.br
wrp@virginia.EDU