.TH FASTA/TFASTA/FASTX/TFASTXv3 1 local
.SH NAME
fasta3, fasta3_t \- scan a protein or DNA sequence library for similar
sequences

tfasta3, tfasta3_t \- compare a protein sequence to a DNA sequence
library, translating the DNA sequence library `on-the-fly'.

fastx3, fastx3_t \- compare a DNA sequence to a protein sequence
database, comparing the translated DNA sequence in forward and
reverse frames.

tfastx3, tfastx3_t \- compare a protein sequence to a DNA sequence
database, calculating similarities with frameshifts to the forward and
reverse orientations.

fasty3, fasty3_t \- compare a DNA sequence to a protein sequence
database, comparing the translated DNA sequence in forward and reverse
frames.

tfasty3, tfasty3_t \- compare a protein sequence to a DNA sequence
database, calculating similarities with frameshifts to the forward and
reverse orientations.

fasts3, fasts3_t \- compare unordered peptides to a protein sequence database

tfasts3, tfasts3_t \- compare unordered peptides to a translated DNA
sequence database

fastf3, fastf3_t \- compare mixed peptides to a protein sequence database

tfastf3, tfastf3_t \- compare mixed peptides to a translated DNA
sequence database

ssearch3, ssearch3_t \- compare a protein or DNA sequence to a
sequence database using the Smith-Waterman algorithm.

prss3, prfx3 \- estimate statistical significance of an alignment by
comparing the score to the distribution of similarity scores generated
by shuffling the second sequence.  prss3 uses Smith-Waterman.  prfx3
uses the fastx algorithm.

.SH DESCRIPTION

Release 3.x of the FASTA package provides a modular set of sequence
comparison programs that can run on conventional single processor
computers or in parallel on multiprocessor computers. Seven different
programs \- fasta3, fastx3, fasty3, tfastx3, tfasty3, tfasta3, and
ssearch3 \- are currently available.

All of the comparison programs share a set of basic command line
options; additional options are available for individual comparison
functions.

The fasta3_t, fastx3_t, fasty3_t, tfasta3_t, tfastx3_t, tfasty3_t and
ssearch3_t programs are threaded versions that will run in parallel on
Digital Equipment, Sun, and SGI multiprocessor computers.

.SH Options for comparison functions
.LP
These versions of the fasta programs have been modified to accept a
query sequence from the unix "stdin" data stream.  This makes it much
easier to use fasta3 and its relatives as part of a WWW page. To
indicate that stdin is to be used, use "@" as the query
sequence file name.  "@" can also be used to specify a
subset of the query sequence to be used, e.g.:
.sp
.ti 0.5i
cat query.aa | fasta3 -q @:50-150 s
.sp
would search the 's' database with residues 50-150 of query.aa.  FASTA
cannot automatically detect the sequence type (protein vs DNA) when
"stdin" is used, so the '-n' option is required for DNA.
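.PP
The "@" convention described above can be sketched in Python. This is only an illustration: fasta_stdin_argv() and slice_residues() are hypothetical helper names, not part of the FASTA package, and treating the 50-150 range as 1-based and inclusive is an assumption.
.PP
.nf
```python
# Sketch only: fasta_stdin_argv() and slice_residues() are hypothetical
# helpers, not part of the FASTA package.


def fasta_stdin_argv(library, start=None, end=None, dna=False, program="fasta3"):
    """Build an argument list that reads the query from stdin ("@")."""
    argv = [program, "-q"]
    if dna:
        # The sequence type cannot be detected on stdin, so DNA needs -n.
        argv.append("-n")
    query = "@"
    if start is not None and end is not None:
        # "@:50-150" selects a subset of the query read from stdin.
        query = "@:%d-%d" % (start, end)
    argv += [query, library]
    return argv


def slice_residues(sequence, start, end):
    """Return residues start..end, assuming 1-based inclusive coordinates."""
    return sequence[start - 1:end]


if __name__ == "__main__":
    # Mirrors the command shown above; the query itself would be piped in.
    print(" ".join(fasta_stdin_argv("s", 50, 150)))
```
.fi
.PP
The argument list could be handed to a process launcher with the query text supplied on standard input; slice_residues() only shows which residues the subset notation is assumed to cover.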
.TP
\-1
Sort by "init1" score.
.TP
\-3
(TFASTA3, TFASTX/Y3 only) use only forward frame translations
.TP
\-a #
"SHOWALL" option attempts to align all of both sequences in FASTA and SSEARCH.
.TP
\-A
force Smith-Waterman alignment for output.  Smith-Waterman is the
default for protein sequences and FASTX3, but not for TFASTA3 or DNA
comparisons with FASTA3.
.TP
\-b #
number of best scores to show (must be < -E cutoff if -E is given)
.TP
\-B
show z-scores rather than bit scores
.TP
\-c #
threshold for band optimization (FASTA, FASTX)
.TP
\-C #
(fasta34t11d4) length of name abbreviation in alignments, default = 6.
.TP
\-d #
number of best alignments to show (must be < -e cutoff)
.TP
\-D
turn on debugging mode.  Enables checks on sequence alphabet that
cause problems with tfastx3, tfasty3, tfasta3.
.TP
\-E #
expectation value upper limit for score and alignment display.
Defaults are 10.0 for FASTA3 and SSEARCH3 protein searches, 5.0 for
translated DNA/protein comparisons, and 2.0 for DNA/DNA searches.
.TP
\-f #
penalty for opening a gap (or first residue for older versions)
.TP
\-F #
expectation value lower limit for score and alignment display.
-F 1e-6 prevents library sequences with E()-values lower than 1e-6
from being displayed. This allows the user to focus on more distant
relationships.
.TP
\-g #
penalty for additional residues in a gap
.TP
\-h #
(FASTX3, TFASTX3, FASTY3, TFASTY3 only) penalty for a frameshift between
two codons.
.TP
\-j #
(FASTY3, TFASTY3 only) penalty for a frameshift within a codon.
.TP
\-H
turn off histogram display
.TP
\-i
(DNA only) reverse complement the query sequence. (TFASTX) compare against
only the reverse complement of the library sequence.
.TP
\-l str
specify FASTLIBS file
.TP
\-L
report long sequence description in alignments
.TP
\-m 0,1,2,3,4,5,6,9,10
alignment display options.  \fC-m 0, 1, 2, 3\fP
display different types of alignments.  \fC-m 4\fP provides an
alignment "map" on the query. \fC-m 5\fP combines the alignment map
and a \fC-m 0\fP alignment.  \fC-m 6\fP provides an HTML output.
\fC-m 9\fP does not change the alignment output, but provides
alignment coordinate and percent identity information with the best
scores report.  \fC-m 9c\fP adds encoded alignment information to the
\fC-m 9\fP output; \fC-m 9i\fP provides only percent identity and alignment
length information with the best scores.  With current versions of the
FASTA programs, independent \fC-m\fP options can be combined;
e.g. \fC-m 1 -m 9c -m 6\fP.
.TP
\-M #-#
molecular weight (residue) cutoffs.  -M "101-200" examines only sequences that are 101-200 residues long.
.TP
\-n
force query to nucleotide sequence
.TP
\-N #
break long library sequences into blocks of # residues.  Useful for
bacterial genomes, which have only one sequence entry.  -N 2000 works
well for bacterial genomes.
.TP
\-o
(FASTA) turn fasta band optimization off during initial phase.  This was
the behavior of fasta1.x versions.
.TP
\-O file
send output to file
.TP
\-q/-Q
quiet option; do not prompt for input
.TP
\-r "+n/-m"
values for match/mismatch for DNA comparisons. \fC+n\fP is
used for the maximum positive value and \fC-m\fP is used for the
maximum negative value. Values between max and min are rescaled, but
residue pairs having the value -1 continue to be -1.
.TP
\-R file
save all scores to statistics file (previously -r file)
.TP
\-s name
specify substitution matrix.  BLOSUM50 is used by default;
PAM250, PAM120, and BLOSUM62 can be specified by setting -s P250,
P120, or BL62.  With this version, many more scoring matrices are
available, including BLOSUM80 (BL80), and MDM10, MDM20, MDM40 (Jones,
Taylor, and Thornton, 1992 CABIOS 8:275-282; specified as -s M10, -s
M20, -s M40). Alternatively, BLASTP1.4 format scoring matrix files can
be specified.  BL80, BL62, and P120 are scaled in 1/2 bit units; all
the other matrices use 1/3 bit units.  DNA scoring matrices can also
be specified with the "-r" option.
.TP
\-S
treat lower case letters in the query or database as low complexity
regions that are equivalent to 'X' during the initial database scan,
but are treated as normal residues for the final alignment display.
Statistical estimates are based on the 'X'ed out sequence used during
the initial search. Protein databases (and query sequences) can be
generated in the appropriate format using John Wootton's "pseg"
program, available from ftp://ncbi.nlm.nih.gov/pub/seg/pseg.  Once you
have compiled the "pseg" program, use the command:
.IP
\fCpseg database.fasta -z 1 -q  > database.lc_seg\fP
.TP
\-t #
Translation table - tfasta3, fastx3, tfastx3, fasty3, and
tfasty3 now support the BLAST translation tables.  See
\fChttp://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi\fP.
.IP
In addition, "\-t t" or "\-t t#" turns on the addition of an implicit termination
codon to a protein:translated DNA match.  That is, each protein
sequence implicitly ends with "*", which matches the termination codons
for the appropriate genetic code.  "\-t t#" sets implicit termination
and a different genetic code.
.TP
\-T #
(threaded, parallel only) number of threads or workers to use (set by
default to 4 at compile time).
.TP
\-U
Do RNA sequence comparisons: treat 'T' as 'U', allow G:U base pairs (by
scoring "G-A" and "T-C" as "G-G" -1).  Search only one strand.
.TP
\-V "?$%*"
Allow special annotation characters in query sequence.  These characters
will be displayed in the alignments on the coordinate number line.
.TP
\-w #
line width for similarity score, sequence alignment, output.
.TP
\-W #
context length (default is 1/2 of line width -w) for alignment programs,
like fasta and ssearch, that provide additional sequence context.
.TP
\-x #match,#mismatch
scores used for 'X:X', 'N:N', and '*:*' matches, and the corresponding
'X:not-X', etc., mismatches, overriding the values
specified in the scoring matrix.  If only one value is given, it is
used for both values.
.TP
\-X "#,#"
offsets query, library sequence for numbering alignments
.TP
\-y #
Width for band optimization; by default 16 for DNA and protein ktup=2;
32 for protein ktup=1.
.TP
\-z #
Specify statistical calculation. Default is -z 1, which uses
regression against the length of the library sequence. -z 0 disables
statistics.  -z 2 provides maximum likelihood estimates for lambda and K,
censoring the 250 lowest and 250 highest scores. -z 3 uses Altschul
and Gish's statistical estimates for specific protein BLOSUM scoring
matrices and gap penalties. -z 4,5: an alternate regression method.
\-z 6 uses a composition-based maximum likelihood estimate based
on the method of Mott (1992) Bull. Math. Biol. 54:59-75.
-z 11,12,14,15,16: compute the regression against scores of randomly
shuffled copies of the library sequences.  Twice as many comparisons
are performed, but accurate estimates can be generated from databases
of related sequences. -z 11 uses the -z 1 regression strategy, etc.
.TP
\-Z db_size
Set the apparent database size used for expectation value calculations
(used for protein/protein FASTA and SSEARCH, and for FASTX, FASTY, TFASTX,
and TFASTY).
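.PP
To make the -M, -E and -F descriptions above concrete, here is a minimal Python sketch of the filtering they describe. The helper names are hypothetical, the code is not taken from the FASTA sources, and treating each limit as inclusive is an assumption.
.PP
.nf
```python
# Sketch only: these helpers are hypothetical and merely restate the
# -M, -E and -F descriptions; they are not FASTA source code.


def parse_length_range(text):
    """Parse an -M style range such as "101-200" into (low, high)."""
    low, high = text.split("-")
    return int(low), int(high)


def length_examined(length, m_range):
    """True if a library sequence length falls inside the -M range."""
    low, high = m_range
    return low <= length <= high


def evalue_displayed(evalue, upper=10.0, lower=None):
    """True if an E()-value lies between the -F and -E limits.

    upper plays the role of -E (10.0 is the documented default for
    protein searches) and lower plays the role of -F.  Treating both
    limits as inclusive is an assumption.
    """
    if evalue > upper:
        return False
    if lower is not None and evalue < lower:
        return False
    return True


if __name__ == "__main__":
    m_range = parse_length_range("101-200")
    print(length_examined(150, m_range))
    print(evalue_displayed(1e-9, lower=1e-6))
```
.fi
.PP
With a lower limit of 1e-6, very strong matches are hidden, so only the more distant relationships remain in the report.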
.SH Environment variables:
.TP
FASTLIBS
location of library choice file (-l FASTLIBS)
.TP
SMATRIX
default scoring matrix (-s SMATRIX)
.TP
SRCH_URL
the format string used to define the option to re-search the
database.
.TP
REF_URL
the format string used to define the option to lookup the library
sequence in entrez, or some other database.
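.PP
The correspondence between FASTLIBS and -l, and between SMATRIX and -s, might be mirrored in a wrapper script along the following lines. This is a sketch under assumptions: the helper names are hypothetical, and letting an explicit value take precedence over the environment variable is an assumed policy, not documented behavior.
.PP
.nf
```python
import os

# Sketch only: resolve_setting() and library_and_matrix_args() are
# hypothetical wrapper helpers, not part of the FASTA package.


def resolve_setting(explicit, env_name, environ=None):
    """Return the explicit value if given, else the environment value."""
    environ = os.environ if environ is None else environ
    if explicit is not None:
        # Assumed policy: an explicit value wins over the environment.
        return explicit
    return environ.get(env_name)


def library_and_matrix_args(fastlibs=None, smatrix=None, environ=None):
    """Build the -l and -s arguments from explicit values or the environment."""
    args = []
    libs = resolve_setting(fastlibs, "FASTLIBS", environ)
    if libs:
        args += ["-l", libs]
    matrix = resolve_setting(smatrix, "SMATRIX", environ)
    if matrix:
        args += ["-s", matrix]
    return args


if __name__ == "__main__":
    print(library_and_matrix_args())
```
.fi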

.SH AUTHOR
Bill Pearson
.br
wrp@virginia.EDU