X-Git-Url: http://source.jalview.org/gitweb/?a=blobdiff_plain;ds=inline;f=website%2Farchive%2Fbinaries%2Fmac%2Fsrc%2Ffasta34%2Ffasta3.1;fp=website%2Farchive%2Fbinaries%2Fmac%2Fsrc%2Ffasta34%2Ffasta3.1;h=2275c0dae41453c642390fc685f2fe4a76b6cc47;hb=dbde3fb6f00b9bb770343631a517c0e599db8528;hp=0000000000000000000000000000000000000000;hpb=85f830bbd51a7277994bd4233141016304e210c9;p=jabaws.git
diff --git a/website/archive/binaries/mac/src/fasta34/fasta3.1 b/website/archive/binaries/mac/src/fasta34/fasta3.1
new file mode 100644
index 0000000..2275c0d
--- /dev/null
+++ b/website/archive/binaries/mac/src/fasta34/fasta3.1
@@ -0,0 +1,288 @@
+.TH FASTA/TFASTA/FASTX/TFASTXv3 1 local
+.SH NAME
+fasta3, fasta3_t \- scan a protein or DNA sequence library for similar
+sequences
+
+tfasta3, tfasta3_t \- compare a protein sequence to a DNA sequence
+library, translating the DNA sequence library `on-the-fly'.
+
+fastx3, fastx3_t \- compare a DNA sequence to a protein sequence
+database, comparing the translated DNA sequence in forward and
+reverse frames.
+
+tfastx3, tfastx3_t \- compare a protein sequence to a DNA sequence
+database, calculating similarities with frameshifts to the forward and
+reverse orientations.
+
+fasty3, fasty3_t \- compare a DNA sequence to a protein sequence
+database, comparing the translated DNA sequence in forward and reverse
+frames.
+
+tfasty3, tfasty3_t \- compare a protein sequence to a DNA sequence
+database, calculating similarities with frameshifts to the forward and
+reverse orientations.
+
+fasts3, fasts3_t \- compare unordered peptides to a protein sequence database
+
+tfasts3, tfasts3_t \- compare unordered peptides to a translated DNA
+sequence database
+
+fastf3, fastf3_t \- compare mixed peptides to a protein sequence database
+
+tfastf3, tfastf3_t \- compare mixed peptides to a translated DNA
+sequence database
+
+ssearch3, ssearch3_t \- compare a protein or DNA sequence to a
+sequence database using the Smith-Waterman algorithm.
+
+prss3, prfx3 \- estimate statistical significance of an alignment by
+comparing the score to the distribution of similarity scores generated
+by shuffling the second sequence. prss3 uses Smith-Waterman. prfx3
+uses the fastx algorithm.
+
+.SH DESCRIPTION
+
+Release 3.x of the FASTA package provides a modular set of sequence
+comparison programs that can run on conventional single processor
+computers or in parallel on multiprocessor computers. Seven different
+programs \- fasta3, fastx3, fasty3, tfastx3, tfasty3, tfasta3, and
+ssearch3 \- are currently available.
+
+All of the comparison programs share a set of basic command line
+options; additional options are available for individual comparison
+functions.
+
+The fasta3_t, fastx3_t, fasty3_t, tfasta3_t, tfastx3_t, tfasty3_t and
+ssearch3_t programs are threaded versions that will run in parallel on
+Digital Equipment, Sun, and SGI multiprocessor computers.
+
+.SH Options for comparison functions
+.LP
+These versions of the fasta programs have been modified to accept a
+query sequence from the unix "stdin" data stream. This makes it much
+easier to use fasta3 and its relatives as part of a WWW page. To
+indicate that stdin is to be used, use "@" as the query
+sequence file name. "@" can also be used to specify a
+subset of the query sequence to be used, e.g.:
+.sp
+.ti 0.5i
+cat query.aa | fasta3 -q @:50-150 s
+.sp
+would search the 's' database with residues 50-150 of query.aa. FASTA
+cannot automatically detect the sequence type (protein vs DNA) when
+"stdin" is used, so the '-n' option is required for DNA.
+.TP
+\-1
+Sort by "init1" score.
+.TP
+\-3
+(TFASTA3, TFASTX/Y3 only) use only forward frame translations
+.TP
+\-a #
+"SHOWALL" option attempts to align all of both sequences in FASTA and SSEARCH.
+.TP
+\-A
+force Smith-Waterman alignment for output. Smith-Waterman is the
+default for protein sequences and FASTX3, but not for TFASTA3 or DNA
+comparisons with FASTA3.
+.TP
+\-b #
+number of best scores to show (must be < -E cutoff if -E is given)
+.TP
+\-B
+show z-scores rather than bit scores
+.TP
+\-c #
+threshold for band optimization (FASTA, FASTX)
+.TP
+\-C #
+(fasta34t11d4) length of name abbreviation in alignments, default = 6.
+.TP
+\-d #
+number of best alignments to show (must be < -e cutoff)
+.TP
+\-D
+turn on debugging mode. Enables checks on sequence alphabet that
+cause problems with tfastx3, tfasty3, tfasta3.
+.TP
+\-E #
+expectation value upper limit for score and alignment display.
+Defaults are 10.0 for FASTA3 and SSEARCH3 protein searches, 5.0 for
+translated DNA/protein comparisons, and 2.0 for DNA/DNA searches.
+.TP
+\-f #
+penalty for opening a gap (or first residue for older versions)
+.TP
+\-F #
+expectation value lower limit for score and alignment display.
+-F 1e-6 prevents library sequences with E()-values lower than 1e-6
+from being displayed. This allows the user to focus on more distant
+relationships.
+.TP
+\-g #
+penalty for additional residues in a gap
+.TP
+\-h #
+(FASTX3, TFASTX3, FASTY3, TFASTY3 only) penalty for a frameshift between
+two codons.
+.TP
+\-j #
+(FASTY3, TFASTY3 only) penalty for a frameshift within a codon.
+.TP
+\-H
+turn off histogram display
+.TP
+\-i
+(DNA only) reverse complement the query sequence. (TFASTX) compare against
+only the reverse complement of the library sequence.
+.TP
+\-l str
+specify FASTLIBS file
+.TP
+\-L
+report long sequence description in alignments
+.TP
+\-m 0,1,2,3,4,5,6,9,10 alignment display options. \fC-m 0, 1, 2, 3\fP
+display different types of alignments. \fC-m 4\fP provides an
+alignment "map" on the query. \fC-m 5\fP combines the alignment map
+and a \fC-m 0\fP alignment. \fC-m 6\fP provides HTML output.
+\fC-m 9\fP does not change the alignment output, but provides
+alignment coordinate and percent identity information with the best
+scores report. \fC-m 9c\fP adds encoded alignment information to the
+\fC-m 9\fP output; \fC-m 9i\fP provides only percent identity and alignment
+length information with the best scores. With current versions of the
+FASTA programs, independent \fC-m\fP options can be combined;
+e.g. \fC-m 1 -m 9c -m 6\fP.
+.TP
+\-M #-#
+molecular weight (residue) cutoffs. -M "101-200" examines only sequences that are 101-200 residues long.
+.TP
+\-n
+force query to nucleotide sequence
+.TP
+\-N #
+break long library sequences into blocks of # residues. Useful for
+bacterial genomes, which have only one sequence entry. -N 2000 works
+well for bacterial genomes.
+.TP
+\-o
+(FASTA) turn fasta band optimization off during initial phase. This was
+the behavior of fasta1.x versions.
+.TP
+\-O file
+send output to file
+.TP
+\-q/-Q
+quiet option; do not prompt for input
+.TP
+\-r "+n/-m"
+values for match/mismatch for DNA comparisons. \fC+n\fP is
+used for the maximum positive value and \fC-m\fP is used for the
+maximum negative value. Values between max and min are rescaled, but
+residue pairs having the value -1 continue to be -1.
+.TP
+\-R file
+save all scores to statistics file (previously -r file)
+.TP
+\-s name
+specify substitution matrix. BLOSUM50 is used by default;
+PAM120, PAM250, and BLOSUM62 can be specified by setting -s P120,
+P250, or BL62. With this version, many more scoring matrices are
+available, including BLOSUM80 (BL80), and MDM10, MDM20, MDM40 (Jones,
+Taylor, and Thornton, 1992 CABIOS 8:275-282; specified as -s M10, -s
+M20, -s M40). Alternatively, BLASTP1.4 format scoring matrix files can
+be specified. BL80, BL62, and P120 are scaled in 1/2 bit units; all
+the other matrices use 1/3 bit units. DNA scoring matrices can also
+be specified with the "-r" option.
+.TP
+\-S
+treat lower case letters in the query or database as low complexity
+regions that are equivalent to 'X' during the initial database scan,
+but are treated as normal residues for the final alignment display.
+Statistical estimates are based on the 'X'ed out sequence used during
+the initial search. Protein databases (and query sequences) can be
+generated in the appropriate format using John Wootton's "pseg"
+program, available from ftp://ncbi.nlm.nih.gov/pub/seg/pseg. Once you
+have compiled the "pseg" program, use the command:
+.IP
+\fCpseg database.fasta -z 1 -q > database.lc_seg\fP
+.TP
+\-t #
+Translation table - tfasta3, fastx3, tfastx3, fasty3, and
+tfasty3 now support the BLAST translation tables. See
+\fChttp://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi\fP.
+.IP
+In addition, "\-t t" or "\-t t#" turns on the addition of an implicit termination
+codon to a protein:translated DNA match. That is, each protein
+sequence implicitly ends with "*", which matches the termination codons
+for the appropriate genetic code. "\-t t#" sets implicit termination
+and a different genetic code.
+.TP
+\-T #
+(threaded, parallel only) number of threads or workers to use (set by
+default to 4 at compile time).
+.TP
+\-U
+Do RNA sequence comparisons: treat 'T' as 'U', allow G:U base pairs (by
+scoring "G-A" and "T-C" as "G-G" -1). Search only one strand.
+.TP
+\-V "?$%*"
+Allow special annotation characters in query sequence. These characters
+will be displayed in the alignments on the coordinate number line.
+.TP
+\-w # line width for similarity score, sequence alignment, output.
+.TP
+\-W # context length (default is 1/2 of line width -w) for alignment programs,
+like fasta and ssearch, that provide additional sequence context.
+.TP
+\-x #match,#mismatch
+scores used for 'X:X', 'N:N', and '*:*' matches and for the corresponding
+mismatches ('X:not-X', etc.), overriding the values
+specified in the scoring matrix. If only one value is given, it is
+used for both values.
+.TP
+\-X "#,#"
+offsets for the query and library sequences, used when numbering alignments
+.TP
+\-y #
+Width for band optimization; by default 16 for DNA and protein ktup=2;
+32 for protein ktup=1.
+.TP
+\-z #
+Specify statistical calculation. Default is -z 1, which uses
+regression against the length of the library sequence. -z 0 disables
+statistics. -z 2 provides maximum likelihood estimates for lambda and K,
+censoring the 250 lowest and 250 highest scores. -z 3 uses Altschul
+and Gish's statistical estimates for specific protein BLOSUM scoring
+matrices and gap penalties. -z 4,5: an alternate regression method.
+\-z 6 uses a composition-based maximum likelihood estimate based
+on the method of Mott (1992) Bull. Math. Biol. 54:59-75.
+-z 11,12,14,15,16: compute the regression against scores of randomly
+shuffled copies of the library sequences. Twice as many comparisons
+are performed, but accurate estimates can be generated from databases
+of related sequences. -z 11 uses the -z 1 regression strategy, etc.
+.TP
+\-Z db_size
+Set the apparent database size used for expectation value calculations
+(used for protein/protein FASTA and SSEARCH, and for FASTX, FASTY, TFASTX,
+and TFASTY).
+.SH Environment variables:
+.TP
+FASTLIBS
+location of library choice file (-l FASTLIBS)
+.TP
+SMATRIX
+default scoring matrix (-s SMATRIX)
+.TP
+SRCH_URL
+the format string used to define the option to re-search the
+database.
+.TP
+REF_URL
+the format string used to define the option to look up the library
+sequence in Entrez, or some other database.
+
+.SH AUTHOR
+Bill Pearson
+.br
+wrp@virginia.EDU