X-Git-Url: http://source.jalview.org/gitweb/?a=blobdiff_plain;f=sources%2Freadseq%2FReadseq.help;fp=sources%2Freadseq%2FReadseq.help;h=08fdc080be1755d130c48e88c12584a30e0aca5a;hb=a5e6297d655a784603d499da5a025d5d5fa78783;hp=0000000000000000000000000000000000000000;hpb=df24dcd3c415c000592af419f2c9304a4e05c2ee;p=jpred.git

diff --git a/sources/readseq/Readseq.help b/sources/readseq/Readseq.help
new file mode 100644
index 0000000..08fdc08
--- /dev/null
+++ b/sources/readseq/Readseq.help
@@ -0,0 +1,229 @@
+
+  * ReadSeq.Help -- 30 Dec 92
+  *
+  * Reads and writes nucleic/protein sequences in various
+  * formats. Data files may have multiple sequences.
+  *
+  * Copyright 1990 by d.g.gilbert
+  * biology dept., indiana university, bloomington, in 47405
+  * e-mail: gilbertd@bio.indiana.edu
+  *
+  * This program may be freely copied and used by anyone.
+  * Developers are encouraged to incorporate parts in their
+  * programs, rather than devise their own private sequence
+  * format.
+  *
+  * This should compile and run with any ANSI C compiler.
+  * Please advise me of any bugs, additions or corrections.
+
+Readseq is particularly useful as it automatically detects many
+sequence formats, and interconverts among them.
+
+Formats which readseq currently understands:
+
+    * IG/Stanford, used by Intelligenetics and others
+    * GenBank/GB, genbank flatfile format
+    * NBRF format
+    * EMBL, EMBL flatfile format
+    * GCG, single sequence format of GCG software
+    * DNAStrider, for common Mac program
+    * Fitch format, limited use
+    * Pearson/Fasta, a common format used by Fasta programs and others
+    * Zuker format, limited use. Input only.
+    * Olsen, format printed by Olsen VMS sequence editor. Input only.
+    * Phylip3.2, sequential format for Phylip programs
+    * Phylip, interleaved format for Phylip programs (v3.3, v3.4)
+    * Plain/Raw, sequence data only (no name, document, numbering)
+    + MSF multi sequence format used by GCG software
+    + PAUP's multiple sequence (NEXUS) format
+    + PIR/CODATA format used by PIR
+    + ASN.1 format used by NCBI
+    + Pretty print with various options for nice looking output. Output only.
+
+See the included "Formats" file for detail on file formats.
+
+
+Example usage:
+    readseq
+        -- for interactive use
+
+    readseq my.1st.seq my.2nd.seq -all -format=genbank -output=my.gb
+        -- convert all of two input files to one genbank format output file
+
+    readseq my.seq -all -form=pretty -nameleft=3 -numleft -numright -numtop -match
+        -- output to standard output a file in a pretty format
+
+    readseq my.seq -item=9,8,3,2 -degap -CASE -rev -f=msf -out=my.rev
+        -- select 4 items from input, degap, reverse, and uppercase them
+
+    cat *.seq | readseq -pipe -all -format=asn > bunch-of.asn
+        -- pipe a bunch of data thru readseq, converting all to asn
+
+
+The brief usage of readseq is as follows. The "[]" denote
+optional parts of the syntax:
+
+readseq -help
+readSeq (27Dec92), multi-format molbio sequence reader.
+usage: readseq [-options] in.seq > out.seq
+ options
+    -a[ll]              select All sequences
+    -c[aselower]        change to lower case
+    -C[ASEUPPER]        change to UPPER CASE
+    -degap[=-]          remove gap symbols
+    -i[tem=2,3,4]       select Item number(s) from several
+    -l[ist]             List sequences only
+    -o[utput=]out.seq   redirect Output
+    -p[ipe]             Pipe (command line, stdout)
+    -r[everse]          change to Reverse-complement
+    -v[erbose]          Verbose progress
+    -f[ormat=]#         Format number for output, or
+    -f[ormat=]Name      Format name for output:
+         1. IG/Stanford           10. Olsen (in-only)
+         2. GenBank/GB            11. Phylip3.2
+         3. NBRF                  12. Phylip
+         4. EMBL                  13. Plain/Raw
+         5. GCG                   14. PIR/CODATA
+         6. DNAStrider            15. MSF
+         7. Fitch                 16. ASN.1
+         8. Pearson/Fasta         17. PAUP
+         9. Zuker                 18. Pretty (out-only)
+
+   Pretty format options:
+    -wid[th]=#                sequence line width
+    -tab=#                    left indent
+    -col[space]=#             column space within sequence line on output
+    -gap[count]               count gap chars in sequence numbers
+    -nameleft, -nameright[=#] name on left/right side [=max width]
+    -nametop                  name at top/bottom
+    -numleft, -numright       seq index on left/right side
+    -numtop, -numbot          index on top/bottom
+    -match[=.]                use match base for 2..n species
+    -inter[line=#]            blank line(s) between sequence blocks
+
+
+Notes:
+
+In use, readseq will respond to command line arguments, or to
+interactive use. Command line arguments cannot be combined
+but must each follow a switch character (-). In this release,
+the command line options are now words, with an equals (=)
+to separate parameter(s) from the command. You cannot put a
+space between a command and its parameter, as is usual for
+Unix programs (this is to preserve compatibility with VMS).
+The command line syntax of the earlier versions is still
+supported.
+
+See the file Formats for details of the sequence formats which
+are supported by readseq. The auto-detection feature of
+readseq which distinguishes these formats looks for some of the
+unique keywords and symbols that are found in each format. It
+is not infallible at this, though it attempts to exclude unknown
+formats. In general, if you feed to readseq a sequence file that
+you know is one of these common formats, you are okay. If you feed
+it data that might be oddball formats, or non-sequence data,
+you might well get garbage results. Also, different developers
+are always thinking up minor twists on these common formats
+(like PAUP requiring a blank line between blocks of Phylip format,
+or IG adding form feeds between sequences), which may cause hassles.
+
+In general, output supports only minimal subsets of each format
+needed for sequence data exchanges. Features, descriptions
+and other format-unique information are discarded.
+
+The pretty format requires additional options to generate a
+nice output. Try the various pretty options to see what you like.
+Pretty format is OUTPUT only; readseq cannot read a Pretty format
+file.
+
+Readseq is NOT optimized for LARGE files. It generally makes several
+reads thru each input file (one per sequence output at present, future
+version may optimize this). It should handle input and output files
+and sequences of any size, but will slow down quite a bit for very large
+(multi megabyte) sized files. It is NOT recommended for converting
+databanks or large subsets thereof. It is primarily directed at the
+small files that researchers use to maintain their personal data, which
+they frequently need to interconvert for the various analysis programs
+which so frequently require a special format.
+
+For users of the Olsen multi sequence editor (VMS): the Olsen format
+here is produced with the print command:
+    print/out=some.file
+Use Genbank output from readseq to produce a format that this
+editor can read, and use the command
+    load/genbank some.file
+Dan Davison has a VMS program that will convert to/from the
+Olsen native binary data format. E-mail davison@uh.edu
+
+Warning: Phylip format input is now supported (30Dec92); however, the
+auto-detection of Phylip format is very probabilistic and messy,
+especially distinguishing sequential from interleaved versions. It
+is not recommended that one use readseq to convert files from Phylip
+format to others unless essential.
+
+
+This program is available thru Internet gopher, as
+
+    gopher ftp.bio.indiana.edu
+    browse into the IUBio-Software+Data/molbio/readseq/ folder
+    select the readseq.shar document
+
+Or thru anonymous FTP in this manner:
+    my_computer> ftp ftp.bio.indiana.edu (or IP address 129.79.224.25)
+    username: anonymous
+    password: my_username@my_computer
+    ftp> cd molbio/readseq
+    ftp> get readseq.shar
+    ftp> bye
+
+readseq.shar is a Unix shell archive of the readseq files.
+This file can be edited by any text editor to reconstitute the
+original files, for those who do not have a Unix system or an
+Unshar program. Read the top of this .shar file for further
+instructions.
+
+There are also pre-compiled executables for the following computers:
+Silicon Graphics Iris, Sparc (Sun Sparcstation & clones), VMS-Vax,
+Macintosh. Use binary ftp to transfer these, except Macintosh. The
+Mac version is just the command-line program in a window, not very
+handy.
+
+C source files:
+    readseq.c ureadseq.c ureadasn.c ureadseq.h
+
+Document files:
+    Readme (this doc)
+    Formats (description of sequence file formats)
+    add.gdemenu (GDE program users can add this to the .GDEmenu file)
+    Stdfiles -- test sequence files
+    Makefile -- Unix make file
+    Make.com -- VMS make file
+    *.std -- files for testing validity of readseq
+
+
+Recent changes (see also readseq.c for all history of changes):
+
+4 May 92
++ added 32 bit CRC checksum as alternative to GCG 6.5bit checksum
+Aug 92
+= fixed Olsen format input to handle files w/ more sequences,
+  not to mess up when more than one seq has same identifier,
+  and to convert number masks to symbols.
+= IG format fix to understand ^L
+30 Dec 92
+* revised command-line & interactive interface. Suggested form is now
+    readseq infile -format=genbank -output=outfile -item=1,3,4 ...
+  but remains compatible with prior commandlines:
+    readseq infile -f2 -ooutfile -i3 ...
++ added GCG MSF multi sequence file format
++ added PIR/CODATA format
++ added NCBI ASN.1 sequence file format
++ added Pretty, multi sequence pretty output (only)
++ added PAUP multi seq format
++ added degap option
++ added Gary Williams (GWW, G.Williams@CRC.AC.UK) reverse-complement option.
++ added support for reading Phylip formats (interleave & sequential)
+* string fixes, dropped need for compiler flags NOSTR, FIXTOUPPER, NEEDSTRCASECMP
+* changed 32bit checksum to default, -DSMALLCHECKSUM for GCG version
+
+
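Editorial sketch (not part of the patch above): the 30 Dec 92 change note
says the new word-style options remain compatible with the prior short
form, e.g. "-format=genbank -output=outfile" versus "-f2 -ooutfile". The
Python below is a minimal illustration of how the two spellings could be
reduced to the same (option, value) pairs. It is not taken from readseq.c.
The names parse_option and format_label are hypothetical, only the three
value-taking options from that change note are covered, and matching an
option by its first letter is an assumption made for this sketch.

```python
# Hypothetical helper, NOT readseq's own parser. It handles only the three
# value-taking options shown in the "30 Dec 92" change note.
VALUE_OPTIONS = {"f": "format", "o": "output", "i": "item"}

# Format numbers copied from the "-f[ormat=]#" table in the usage text.
FORMAT_NAMES = [
    "IG/Stanford", "GenBank/GB", "NBRF", "EMBL", "GCG", "DNAStrider",
    "Fitch", "Pearson/Fasta", "Zuker", "Olsen", "Phylip3.2", "Phylip",
    "Plain/Raw", "PIR/CODATA", "MSF", "ASN.1", "PAUP", "Pretty",
]


def parse_option(arg):
    """Split one readseq-style argument into (option name, value)."""
    if len(arg) < 2 or not arg.startswith("-"):
        raise ValueError("not an option: %r" % arg)
    body = arg[1:]
    if "=" in body:
        # Word style: -format=genbank, -f=msf, -out=my.rev
        word, value = body.split("=", 1)
    else:
        # Prior style: one letter followed directly by the value, e.g. -f2
        word, value = body[0], body[1:]
    # Assumption for this sketch: the first letter identifies the option.
    name = VALUE_OPTIONS.get(word[:1])
    if name is None:
        raise ValueError("option not covered by this sketch: %r" % arg)
    return name, value


def format_label(value):
    """Map a format number from the usage table to its listed name."""
    if value.isdigit() and 1 <= int(value) <= len(FORMAT_NAMES):
        return FORMAT_NAMES[int(value) - 1]
    return value  # names are passed through unchanged


if __name__ == "__main__":
    for arg in ["-format=genbank", "-f2", "-ooutfile", "-item=1,3,4"]:
        print(arg, "->", parse_option(arg))
```

Under these assumptions, parse_option("-f2") and parse_option("-format=2")
give the same pair, and format_label("2") looks up entry 2 of the table,
GenBank/GB. Flag options such as -all or -degap are deliberately left out.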