.TH "sfetch" 1 "@RELEASEDATE@" "@PACKAGE@ @RELEASE@" "@PACKAGE@ Manual"

.SH NAME
.TP
sfetch - get a sequence from a flatfile database.

.SH SYNOPSIS
.B sfetch
.I [options]
.I seqname

.SH DESCRIPTION

.B sfetch
retrieves the sequence named
.I seqname
from a sequence database.

.PP
Which database is used is controlled by the
.B -d
and
.B -D
options, for "little databases" and "big
databases" respectively.
The directory location of "big databases" can
be specified by environment variables,
such as $SWDIR for Swissprot, and $GBDIR
for Genbank (see
.B -D
for the complete list).
A complete file path must be specified
for "little databases".
By default, if neither option is specified
and the name looks like a Swissprot identifier
(e.g. it has a _ character), the $SWDIR
environment variable is used to attempt
to retrieve the sequence
.I seqname
from Swissprot.

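.PP
The selection rules described above can be sketched in Python.
This is only an illustration of the documented behavior, not code from
.BR sfetch ;
the helper name
.I choose_source
and its return values are hypothetical, and checking
.B -d
before
.B -D
is an assumption, since this page does not state a precedence.
How a file is located inside a database directory is not shown.
.PP
.nf
```python
import os

# Database codes and environment variables as listed under the -D option.
DB_ENV = {
    "sw": "SWDIR",
    "pir": "PIRDIR",
    "em": "EMBLDIR",
    "gb": "GBDIR",
    "wp": "WORMDIR",
    "owl": "OWLDIR",
}


def choose_source(seqname, seqfile=None, dbcode=None, env=None):
    """Return (kind, location) for a lookup; hypothetical helper."""
    env = os.environ if env is None else env
    if seqfile is not None:
        # -d: a little database, given as a complete file path.
        # Assumption: -d is checked before -D.
        return ("file", seqfile)
    if dbcode is not None:
        # -D: a big database, located through its environment variable.
        return ("directory", env.get(DB_ENV[dbcode]))
    if "_" in seqname:
        # Default: the name looks like a Swissprot identifier.
        return ("directory", env.get("SWDIR"))
    # Neither option given and no Swissprot-like name: nothing to search.
    return (None, None)
```
.fi
.PP
In this sketch a name such as CALM_HUMAN with no options resolves to the
directory named by $SWDIR, while a name without a _ character resolves to nothing.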
.PP
A variety of other options are available which allow
retrieval of subsequences
.RI ( -f,-t );
retrieval by accession number instead of
by name
.RI ( -a );
reformatting the extracted sequence into a variety
of other formats
.RI ( -F );
etc.

.PP
If the database has been GSI indexed, sequence
retrieval will be extremely efficient; otherwise,
retrieval may be painfully slow (the entire
database may have to be read into memory to
find
.IR seqname ).
GSI indexing
is recommended for all large or permanent
databases.

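.PP
The difference between indexed and unindexed retrieval can be illustrated
with a minimal Python sketch.
It does not read or write the GSI file format; it only assumes a FASTA file
and builds an in-memory map from sequence name to byte offset, so that a
lookup can seek directly to a record instead of scanning the whole file.
The function names
.I build_offset_index
and
.I fetch
are hypothetical.
.PP
.nf
```python
import io


def build_offset_index(handle: io.BufferedIOBase) -> dict:
    """Map each FASTA name to the byte offset of its header line."""
    index = {}
    while True:
        offset = handle.tell()
        line = handle.readline()
        if not line:
            break
        if line.startswith(b">"):
            # The name is the first word after the ">" character.
            fields = line[1:].split()
            if fields:
                index[fields[0].decode()] = offset
    return index


def fetch(handle: io.BufferedIOBase, index: dict, name: str) -> str:
    """Seek straight to a record and return it, header line included."""
    handle.seek(index[name])
    record = [handle.readline()]
    for line in iter(handle.readline, b""):
        if line.startswith(b">"):
            break  # start of the next record
        record.append(line)
    return b"".join(record).decode()
```
.fi
.PP
Building the index costs one full pass over the file; each later lookup
reads only the requested record.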
.PP
This program was originally named
.BR getseq ,
and was renamed because it clashed with a GCG
program of the same name.

.SH OPTIONS

.TP
.B -a
Interpret
.I seqname
as an accession number, not an identifier.

.TP
.BI -d " <seqfile>"
Retrieve the sequence from a sequence file named
.IR <seqfile> .
If a GSI index
.I <seqfile>.gsi
exists, it is used to speed up the retrieval.

.TP
.BI -f " <from>"
Extract a subsequence starting from position
.IR <from> ,
rather than from 1. See
.BR -t .
If
.I <from>
is greater than
.I <to>
(as specified by the
.B -t
option), then the sequence is extracted as
its reverse complement (it is assumed to be
a nucleic acid sequence).

.TP
.B -h
Print brief help; includes version number and summary of
all options, including expert options.

.TP
.BI -o " <outfile>"
Direct the output to a file named
.IR <outfile> .
By default, output goes to stdout.

.TP
.BI -r " <newname>"
Rename the sequence to
.I <newname>
in the output after extraction. By default, the original
sequence identifier is retained. Useful, for instance,
if retrieving a sequence fragment; the coordinates of
the fragment might be added to the name (this is what Pfam
does).

.TP
.BI -t " <to>"
Extract a subsequence that ends at position
.IR <to> ,
rather than at the end of the sequence. See
.BR -f .
If
.I <to>
is less than
.I <from>
(as specified by the
.B -f
option), then the sequence is extracted as
its reverse complement (it is assumed to be
a nucleic acid sequence).

.TP
.B -B
(Babelfish). Autodetect and read a sequence file format other than the
default (FASTA). Almost any common sequence file format is recognized
(including Genbank, EMBL, SWISS-PROT, PIR, and GCG unaligned sequence
formats, and Stockholm, GCG MSF, and Clustal alignment formats). See
the printed documentation for a complete list of supported formats.

.TP
.BI -D " <database>"
Retrieve the sequence from the main sequence database
coded
.IR <database> .
For each code, there is an environment
variable that specifies the directory path to that
database.
Recognized codes and their corresponding environment
variables are
.I -Dsw
(Swissprot, $SWDIR);
.I -Dpir
(PIR, $PIRDIR);
.I -Dem
(EMBL, $EMBLDIR);
.I -Dgb
(Genbank, $GBDIR);
.I -Dwp
(Wormpep, $WORMDIR); and
.I -Dowl
(OWL, $OWLDIR).
Each database is read in its native flatfile format.

.TP
.BI -F " <format>"
Reformat the extracted sequence into a different format.
(By default, the sequence is extracted from the database
in the same format as the database.) Available formats
are
.B embl, fasta, genbank, gcg, strider, zuker, ig, pir, squid,
and
.BR raw .

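.PP
The coordinate rules for
.B -f
and
.B -t
above can be sketched in Python.
This is an illustration under stated assumptions, not the
.B sfetch
implementation: positions are taken to be 1-based and inclusive, the
helper name
.I subsequence
is hypothetical, and only the unambiguous bases A, C, G and T are
complemented.
.PP
.nf
```python
# Complement table for unambiguous DNA bases only (an assumption of
# this sketch; ambiguity codes are left unchanged).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def subsequence(seq, start=None, end=None):
    """Extract start..end, 1-based and inclusive; hypothetical helper."""
    start = 1 if start is None else start      # default for -f
    end = len(seq) if end is None else end     # default for -t
    if start <= end:
        return seq[start - 1:end]
    # start > end: take end..start, then reverse complement it,
    # treating the sequence as nucleic acid.
    return seq[end - 1:start].translate(COMPLEMENT)[::-1]
```
.fi
.PP
With the sequence ACGTTGCA, positions 2 to 5 give CGTT, and positions
5 to 2 give AACG.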
.SH EXPERT OPTIONS

.TP
.BI --informat " <s>"
Specify that the sequence file is in format
.IR <s> ,
rather than the default FASTA format.
Common examples include Genbank, EMBL, GCG,
PIR, Stockholm, Clustal, MSF, or PHYLIP;
see the printed documentation for a complete list
of accepted format names.
This option overrides the default format (FASTA)
and the
.B -B
Babelfish autodetection option.

.SH SEE ALSO

.PP
@SEEALSO@

.SH AUTHOR

@PACKAGE@ and its documentation is @COPYRIGHT@
HMMER - Biological sequence analysis with profile HMMs
Copyright (C) 1992-1999 Washington University School of Medicine
All Rights Reserved

    This source code is distributed under the terms of the
    GNU General Public License. See the files COPYING and LICENSE
    for details.
See COPYING in the source code distribution for more details, or contact me.

.nf
Sean Eddy
Dept. of Genetics
Washington Univ. School of Medicine
4566 Scott Ave.
St Louis, MO 63110 USA
Phone: 1-314-362-7666
FAX  : 1-314-362-7855
Email: eddy@genetics.wustl.edu
.fi