websoft/data/blast/sequin.hlp

   1 <HTML> <HEAD>
   2
   3 <TITLE>Sequin help documentation</TITLE>
   4
   5 <!-- if you use the following meta tags, uncomment them.
   6  <meta name="author" content="sequindoc">
   7  <META NAME="keywords" CONTENT="national center for biotechnology information, ncbi, national library of medicine, nlm, national institutes of health, nih, database, archive, bookshelf, pubmed, pubmed central, bioinformatics, biomedicine, sequence submission, sequin, bankit, submitting sequences">
   8  <META NAME="description" CONTENT="Sequin is a stand-alone software tool developed by the National Center for Biotechnology Information (NCBI) for submitting and updating entries to the GenBank, EMBL, or DDBJ sequence databases. "> -->
   9  <link rel="stylesheet" href="ncbi_sequin.css">
  10
  11 </HEAD>
  12
  13 <body bgcolor="#FFFFFF" text="#000000" link="#0033CC" vlink="#0033CC">
  14 <!-- change the link and vlink colors from the original orange (link="#CC6600" vlink="#CC6600") -->
  15
  16 <!--  the header   -->
  17 <table border="0" width="600" cellspacing="0" cellpadding="0">
  18   <tr>
  19     <td width="140"><a href="http://www.ncbi.nlm.nih.gov"> <img src="http://www.ncbi.nlm.nih.gov/corehtml/left.GIF" width="130" height="45" border="0"></a></td>
  20     <td width="360" class="head1" valign="BOTTOM"> <span class="H1">Sequin Help Documentation</span></td>
  21 <!--    <td width="100" valign="BOTTOM">Your Logo</td> -->
  22   </tr>
  23 </table>
  24
  25 <!--  the quicklinks bar   -->
  26 <table CLASS="TEXT" border="0" width="600" cellspacing="0" cellpadding="3" bgcolor="#000000">
  27   <tr CLASS="TEXT"  align="CENTER">
  28     <td width="100"><a href="index.html" class="BAR">Sequin</a></td>
  29     <td width="100"><a href="http://www.ncbi.nlm.nih.gov/Entrez/" class="BAR">Entrez</a></td>
  30     <td width="100"><a href="http://www.ncbi.nlm.nih.gov/BLAST/" class="BAR">BLAST</a></td>
  31     <td width="100"><a href="http://www.ncbi.nlm.nih.gov/omim/" class="BAR">OMIM</a></td>
  32     <td width="100"><a href="http://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html" class="BAR">Taxonomy</a></td>
  33     <td width="100"><a href="http://www.ncbi.nlm.nih.gov/Structure/" class="BAR">Structure</a></td>
  34   </tr>
  35 </table>
  36
  37 <!--  the contents   -->
   38 <P>&nbsp;
  39
  40 <H2>Table of Contents</H2>
  41
  42 <HR>
  43
  44 >Introduction
  45
  46 #Sequin is a program designed to aid in the submission of sequences to
  47 the GenBank, EMBL, and DDBJ sequence databases.  It was written at the
  48 National Center for Biotechnology Information, part of the National
  49 Library of Medicine at the National Institutes of Health.  This section
  50 of the help document provides a basic overview of how to submit
  51 sequences using the Sequin forms.  Subsequent sections provide detailed
  52 instructions for entering information on each form.
  53
  54 *The Help Documentation
  55
  56 #The Sequin help documentation is available in both on-line and World
  57 Wide Web (http://www.ncbi.nlm.nih.gov/Sequin/sequin.hlp.html) formats.
  58 The text of the on-line version scrolls as you progress through the
  59 Sequin forms.  Specific words or phrases can be identified with the
  60 "find" command at the top of the window.  The on-line document can also
  61 be saved as a text file, or printed directly to a printer.  Click on the
  62 window that contains the help documentation.  Under the Sequin File
  63 menu, choose Export Help... to save the documentation as a text file.
  64 To print the documentation without saving it first, click on the help
  65 window, and choose Print from the Sequin File menu.
  66
  67 *Organization of Forms
  68
  69 #Information is entered into Sequin on a number of different forms. Each
  70 form is made up of pages, which are indicated by folder tabs at the top
  71 of the form.  You can move to the desired page by clicking on the
  72 appropriate folder tab.  You can also move between pages of a form by
  73 clicking on the "Next page" or "Prev page" buttons at the bottom of the
  74 screen.  You can move to the previous form or the next form by clicking
  75 on the "Prev form" or "Next form" buttons on the first or last pages of
  76 a form, respectively.
  77
  78 #There are numerous ways to enter information onto a page of a form,
  79 #including text fields, radio buttons, check boxes, scrolling boxes,
  80 #pop-up menus and spreadsheets.
  81
  82 #You may also use tables to import annotation of source information.
  83 #The formatting of these tables will be discussed below.
  84
  85 *Overview of Sequin
  86
  87 #If you are using Sequin for the first time, you will be prompted to
  88 fill out four forms:  the Welcome to Sequin form, the Submitting
  89 Authors Form, the Sequence Format form, and the Organism and Sequences
  90 Form. After you have filled out these forms, a window will appear that
  91 contains the Sequin record viewer. This viewer allows you to access
  92 many other forms in which you can edit fields filled out in the
  93 initial forms, as well as add further information.  Detailed
  94 instructions on how to fill out the forms and use the record viewer are
  95 presented below.
  96
  97 >Welcome to Sequin Form
  98
  99 #First, indicate with one of the three radio buttons whether you are
 100 submitting the sequence to the GenBank, EMBL, or DDBJ database.  If you
 101 are working on a sequence submission for the first time, click on
 102 "Start New Submission".  If you are modifying an existing submission
 103 record, click on "Read Existing Record". If you would like to quit from
 104 Sequin, click on "Quit Program".
 105
 106 #You can also use "Read Existing Record" to read in a FASTA-formatted
 107 #sequence file
 108 for analysis purposes.  The sequence will be displayed in Sequin and
 109 can be analyzed with tools such as CDD Search, but it should not be
 110 submitted because it does not have the appropriate annotations.
 111
 112 #If you are running Sequin in its network-aware mode, you will see
 113 another button labeled "Download from Entrez".  This option allows you
 114 to update an existing database record using Sequin. The record will be
 115 downloaded from GenBank into Sequin using NCBI's Entrez retrieval
 116 system.  The contents of the record will appear in Sequin, and you can
 117 edit them by updating the sequence or the annotations, as necessary.  If
 118 you do not see the button labeled "Download from Entrez" on the Welcome
 119 to Sequin form, you are not running Sequin in its network-aware mode.
 120 To make Sequin network-aware, see the
 121 <A HREF="#NetConfigure">
 122 instructions
 123 </A>
 124 later in the help documentation.
 125
 126 #You can update only those records that you have submitted, not those
 127 submitted by others.  To update an existing record, first select which
 128 of the databases you will be sending the update to.  This should be the
 129 database to which the original record was submitted.  If you do not
 130 know which database to use, send the record to GenBank and the NCBI
 131 staff will forward it to the appropriate database.  Next, click on the
 132 button "Download from Entrez". Enter the nucleotide Accession number or
 133 GI of the sequence on the first form. Then enter "yes" if you are
 134 planning to submit the record as an update to one of the databases.
 135 Fill out the Submitting Authors form.
 136
 137 <A HREF="#EditSubmitterInfo">
 138 Instructions
 139 </A>
 140
 141 for this form are found in the Sequin help documentation under "Edit
 142 Submitter Info" under the Sequin File menu.  The record will then open
 143 in the record viewer.  Explanations of how to add annotations or update
 144 sequences are presented in the documentation entitled
 145
 146 <A HREF="#EditingtheRecord">
 147 "Editing the record"
 148 </A>
 149 and
 150 <A HREF="#SequenceEditor">
 151 "Sequence Editor"
 152 </A>
 153
 154  respectively.  You will not see the Submitting Authors Form, the
 155 Sequence Format Form, or the Organism and Sequences Form.  Note that
 156 updates, as well as new records, must be emailed to the appropriate
 157 database.  Sequin does not support direct submission of records over the
 158 Internet.
 159
 160 #Additional configuration options are available under the Misc menu.
 161 You can toggle between the stand-alone and network-aware modes of
 162 Sequin.  The default mode of Sequin, which is sufficient for most
 163 sequence submissions, is stand-alone.  In its network-aware mode, Sequin
 164 can exchange data with NCBI and, for example, retrieve sequences
 165 from Entrez and perform Taxonomy searches.  The network-aware mode of
 166 Sequin is described in detail in the
 167 <A HREF="#NetConfigure">
 168 Net Configure
 169 </A>
 170 section below.  You can also start the NCBI DeskTop, which is for
 171 advanced Sequin users only.
 172
 173 >Submitting Authors Form
 174
 175 #Information from this form will be used as a citation for the sequence
 176 entry itself.  It can contain the same information found in citations
 177 associated with the formal publication of the sequence.
 178
 179 #At the bottom of each form are two buttons.  Click "Prev form" (first
 180 page in a form) or "Prev page" (subsequent pages in a form) to go to the
 181 previous form or page.  Click "Next form" (last page on a form) or "Next
 182 page" (earlier pages on a form) to move to the next form or page.
 183
 184 #Form pages can also be saved individually by using the "Export" function
 185 under the File menu.  If you are processing multiple submissions, you
 186 can use the "Import" function under the File menu to paste previously
 187 entered information directly on the page.
 188
 189 #The Contact, Authors, and Affiliation pages can be saved as a block so
 190 that you can use this information for your next submission.  For your
 191 first Sequin submission, fill in the requested information on the
 192 Submitting Authors form and proceed with the preparation of the
 193 submission.  Choose Export Submitter Info under the File menu to export
 194 this to a file.  You can then import this information in subsequent
 195 submissions using the Import Submitter Info in the File menu.  You will
 196 need to fill in the manuscript title for each submission, however.
 197
 198 *Submission Page
 199
 200 **When May We Release Your Sequence Record?
 201
 202 #Please select one of the two radio buttons.  If you select
 203 #"Immediately After Processing", the
 204 entry will be released to the public after the database staff has added
 205 it to the database. If you select "Release Date", fields will appear in
 206 which you can indicate the date on which the sequences should be
 207 released to the public.  The submission will then be held back until
 208 formal publication of the sequence or GenBank Accession number, or
 209 until the release date, whichever comes first.
 210
 211 **Tentative Title for Manuscript
 212
 213 #Please enter a title that appropriately describes the sequence entry.
 214  Later in the submission process, you will have the
 215 opportunity to change this information and add details for published
 216 or in press references.
 217
 218 *Contact Page
 219
 220 #Please enter the name, telephone and fax numbers, and email address of
 221 the person who is submitting the sequence.  This is the person who will
 222 be contacted regarding the sequence submission.  The phone, fax, and
 223 email address will not be visible in the database record, but are
 224 essential for contact by the database staff.
 225
 226 *Authors Page
 227
 228 #Please enter the names of the people who should receive scientific
 229 credit for the generation of sequences in this entry.  The person on
 230 the Contact page is automatically listed as the first author.  This
 231 information can be changed if necessary. The author names should be
 232 entered in the order first name, middle initial, surname.  You can add
 233 as many authors to this page as you wish.  After you type in the name
 234 of the third author, the box becomes a spreadsheet, and you can scroll
 235 down to the next line by using the space bar.  The consortium box
 236 should only be used for consortium names, not institute or department
 237 names.
 238
 239 *Affiliation Page
 240
 241 #Please enter information about the principal institution where the
 242 sequencing was performed.  This is not necessarily the same as the
 243 workplace of the person described on the Contact page.  This information
 244 will show up in the reference section of the record, with the title
 245 Direct Submission.
 246
 247 >Sequence Format Form
 248
 249 #Use this form to indicate the type, format and category of sequence
 250 #you are submitting.
 251
 252 #Sequin can process single nucleotide sequences, gapped sequences and
 253 sets of related sequences.  If the sequences are related in terms of
 254 coming from the same publication, or the same organism, they may be
 255 candidates for a Batch submission.  Biologically related sequences may
 256 be classified as environmental samples, population, phylogenetic,
 257 mutation, or segmented sets as appropriate.  Segmented sets consist of
 258 a collection of non-overlapping sequences covering a specific genetic
 259 region.  In all cases, although the sequences are handled as a single
 260 submission, each sequence in a set will receive its own database
 261 Accession number and can be annotated independently.
 262
 263 #Sequin can display the alignments of sequences that are submitted as
 264 part of an aligned phylogenetic, population, mutation, or
 265 environmental samples set.  Such sequences can be submitted in FASTA,
 266 Contiguous (FASTA+GAP, NEXUS, MACAW), or Interleaved (PHYLIP, NEXUS)
 267 formats.  If the sequences are in FASTA format, Sequin can generate an
 268 alignment. If the sequences have already been aligned in FASTA+GAP,
 269 PHYLIP, MACAW, or NEXUS, Sequin will not change the alignment. If one
 270 of the sequences in your alignment is already present in the
 271 GenBank/EMBL/DDBJ database, you must mark that sequence so that it does
 272 not receive a new Accession number. Instead of supplying that sequence
 273 with a new Sequence Identifier, give it the identifier accU12345, where
 274 U12345 is the Accession number of the sequence.
 275
 276 #Single sequences, gapped sequences, segmented sequences, and batch
 277 submissions must be submitted in FASTA format.
 278
 279 *Submission Type
 280
 281 #Use the radio buttons to indicate which of the following types of
 282 submissions you are creating:
 283
 284 #-Single sequence: a single mRNA or genomic DNA sequence.  If you are
 285 submitting multiple sequences from the same publication, consider a
 286 Batch Submission.  If you decide to submit multiple Sequin files, each
 287 with one or more sequences, please send each file in a separate email
 288 message.
 289
 290 #-Segmented sequence: a collection of non-overlapping, non-contiguous
 291 sequences that cover a specified genetic region.  A standard example is
 292 a set of genomic DNA sequences that encode exons from a gene along with
 293 fragments of their flanking introns.  If the segmented set is part of
 294 an alignment, however, select the appropriate Population, Phylogenetic,
 295 or Mutation study button.
 296
 297 #-Gapped sequence: a single, non-contiguous mRNA or genomic DNA sequence.
 298 A gapped sequence contains specified gaps of known or unknown length
 299 where the exact nucleotide sequence has not been determined.  The FASTA
 300 format for gapped sequences is slightly different and is explained
 301 below.
 302
 303 #-Population study: a set of sequences that were derived by sequencing
 304 the same gene from different isolates of the same organism.
 305
 306 #-Phylogenetic study: a set of sequences that were derived by sequencing
 307 the same gene from different organisms.
 308
 309 #-Mutation study: a set of sequences that were derived by sequencing
 310 multiple mutations of a single gene.
 311
 312 #-Environmental samples: a set of sequences that were derived by
 313 sequencing the same gene from a population of unclassified or unknown
 314 organisms.
 315
 316 #-Batch submission: a set of related sequences that are not part of a
 317 population, mutation, or phylogenetic study. The sequences should be
 318 related in some way, such as coming from the same publication or
 319 organism.  You should plan that all sequences will be released to the
 320 public on the same date.
 321
 322 *Sequence Data Format
 323
 324 #If you are submitting a single, gapped, or segmented sequence, or a
 325 batch submission, your sequence must be in FASTA format, described
 326 below.  If you are submitting a set of sequences as part of a
 327 population, phylogenetic, or mutation study, you have a choice of
 328 sequence formats.  You may submit the set as individual sequences in
 329 FASTA format.  Alternatively, you can submit the sequences as part of
 330 an alignment.  Sequin currently accepts the alignment formats
 331 FASTA+GAP, PHYLIP, MACAW, NEXUS Interleaved, and NEXUS Contiguous.
 332
 333 *Submission Category
 334
 335 #Use the radio buttons to indicate whether your sequence corresponds to
 336 an original submission or a third-party annotation submission.  If you
 337 have directly sequenced the nucleotide sequence in your laboratory,
 338 your submission would be considered an original submission.
 339
 340 #If you have downloaded the sequence from GenBank and added your
 341 own annotations to it, your entry may be eligible for submission to the
 342 Third-Party Annotation Database
 343
 344 <A HREF="http://www.ncbi.nlm.nih.gov/Genbank/tpa.html">
 345 (TPA)
 346 </A>
 347 .
 348
 349 #In order to be released into the TPA database, the sequence must appear
 350 in a peer-reviewed publication in a biological journal.  If you select
 351 this option, a pop-up box will appear upon the completion of the
 352 Sequence Format form.  You must provide some description of the
 353 biological experiments used as evidence for the annotation of your TPA
 354 submission in this box.
 355
 356 #You will be asked later in the submission process to provide the GenBank
 357 Accession number(s) of the primary sequence(s) from which your TPA
 358 submission was derived.
 359
 360 >Organism and Sequences Form
 361
 362 #This form is made up of four pages.  If your sequences are imported as
 363 properly formatted FASTA files, minimal input will be necessary
 364 in these pages.
 365
 366 >FASTA Format for Nucleotide Sequences
 367
 368 #In FASTA format the line before the nucleotide sequence, called the
 369 FASTA definition line, must begin with a greater-than sign (">"), followed by a
 370 unique SeqID (sequence identifier).  The SeqID must be unique for each
 371 nucleotide sequence and should not contain any spaces.  Use of brackets
 372 ("[]") in the SeqID is also prohibited.  The identifier will be
 373 replaced with an Accession number by the database staff when your
 374 submission is processed.
 375
 376 #Information about the source organism from which the sequence was
 377 obtained follows the SeqID and must be in the format [modifier=text].
 378 Do not put spaces around the "=".  At minimum, the scientific name of
 379 the organism should be included.  Optional modifiers can be added to
 380 provide additional information.  A complete list of available source
 381 <A HREF="http://www.ncbi.nlm.nih.gov/Sequin/modifiers.html">
 382 modifiers
 383 </A>
 384  and their format is available.
 385
 386 #The final optional component of the FASTA definition line is the
 387 sequence title, which will be used as the DEFINITION field in the final
 388 flatfile.  The title should contain a brief description of the
 389 sequence.  There is a preferred format for nucleotide and protein
 390 titles and Sequin can generate them automatically using the Generate
 391 Definition Line function under the Annotate menu in the record viewer.
 392
 393 #Note that in all cases, the FASTA definition line must not contain any hard
 394 returns.  All information must be on a single line of text.  If you
 395 have trouble importing your FASTA sequences, please double check that
 396 no returns were added to the FASTA definition line by your editing
 397 software.
 398
 399 #Examples of properly formatted FASTA definition lines for nucleotide
 400 sequences are:
 401
 402 <KBD><PRE>>Seq1 [organism=Mus musculus] [strain=C57BL/6] Mus musculus neuropilin 1 (Nrp1) mRNA, complete cds.
 403 </PRE></KBD>
 404 <KBD><PRE>>ABCD [organism=Plasmodium falciparum] [isolate=ABCD] Plasmodium falciparum isolate ABCD merozoite surface protein 2 (msp2) gene, partial cds.
 405 </PRE></KBD>
 406 <KBD><PRE>>DNA.new [organism=Homo sapiens] [chromosome=17] [map=17q21] [moltype=mRNA] Homo sapiens breast and ovarian cancer susceptibility protein (BRCA1) mRNA, complete cds.
 407 </PRE></KBD>
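
#The definition line rules above can be checked before a file is imported.  The following is a minimal Python sketch, not part of Sequin: the function name parse_defline and the error messages are hypothetical, and Sequin's own parser may accept or reject lines differently.

```python
import re

# Matches one [modifier=text] block with no nested brackets.
MODIFIER_RE = re.compile(r"\[([^=\[\]]+)=([^\[\]]*)\]")


def parse_defline(line):
    """Return (seqid, modifiers, title) for one FASTA definition line."""
    line = line.rstrip("\r\n")
    if "\n" in line or "\r" in line:
        raise ValueError("definition line must be a single line of text")
    if not line.startswith(">"):
        raise ValueError('definition line must begin with ">"')
    body = line[1:]
    if not body or body[0].isspace():
        raise ValueError("missing SeqID")
    fields = body.split(None, 1)
    seqid = fields[0]
    if "[" in seqid or "]" in seqid:
        raise ValueError("SeqID must not contain brackets")
    rest = fields[1] if len(fields) > 1 else ""
    modifiers = {}
    for match in MODIFIER_RE.finditer(rest):
        name, value = match.group(1), match.group(2)
        if name != name.strip() or value != value.lstrip():
            raise ValueError('no spaces are allowed around "="')
        modifiers[name] = value
    # Whatever remains after removing the modifiers is the optional title.
    title = MODIFIER_RE.sub("", rest).strip()
    return seqid, modifiers, title
```

#Passing the first example line above to this sketch yields the SeqID Seq1, the modifiers organism and strain, and the remaining text as the title.
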
 408 #The line after the FASTA definition line begins the nucleotide
 409 sequence.  Unlike the FASTA definition line, the nucleotide sequence
 410 itself can contain returns.  It is recommended that each line of
 411 sequence be no longer than 80 characters.  Please only use IUPAC
 412 symbols within the nucleotide sequence.  For sequences that are not
 413 contained within an alignment, do not use "?" or "-" characters.  These
 414 will be stripped from the sequence.  Use the IUPAC approved symbol "N"
 415 for ambiguous characters instead.
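
#To find characters that would be stripped or rejected, the sequence lines can be scanned before import.  This is an illustrative Python sketch only; the function name find_non_iupac is hypothetical, and the symbol set is the standard IUPAC nucleotide alphabet (including U), which is assumed to match what Sequin accepts.

```python
# Standard IUPAC nucleotide symbols, including ambiguity codes.
IUPAC_NUCLEOTIDES = set("ACGTURYSWKMBDHVN")


def find_non_iupac(sequence):
    """Return (position, character) pairs for non-IUPAC characters.

    Positions are 1-based and count only non-whitespace characters,
    so hard returns inside the sequence do not shift them.
    """
    problems = []
    position = 0
    for char in sequence:
        if char.isspace():
            continue
        position += 1
        if char.upper() not in IUPAC_NUCLEOTIDES:
            problems.append((position, char))
    return problems
```

#An empty result means that every character is an IUPAC symbol.  Any "?" or "-" reported here should be replaced with "N" in an unaligned sequence.
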
 416
 417 #A single file containing multiple FASTA sequences can be imported into
 418 Sequin.  Make sure that the FASTA definition line for each sequence is
 419 formatted as above.
 420
 421 #If the FASTA definition line is not properly formatted, a pop-up box
 422 will appear upon importing the nucleotide FASTA.  The top box in this
 423 pop-up will list any errors in the FASTA definition lines, including
 424 missing SeqIDs, duplicate SeqIDs for different sequences, or improperly
 425 formatted modifiers.  You can add or edit this information in the
 426 spreadsheet provided.  The toggle at the bottom of the pop-up allows
 427 you to select whether all sequences or only those with errors are
 428 listed in the spreadsheet above. After making changes, click on Refresh
 429 Error List to ensure that all errors have been corrected.  You must
 430 correct any errors involving the SeqID in order to proceed with your
 431 submission.
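
#Missing and duplicate SeqIDs can also be found before import.  The Python sketch below is an assumption-laden illustration, not Sequin's own check: the function name seqid_problems is hypothetical, and the input is assumed to be a plain multi-sequence FASTA file without gap lines.

```python
from collections import Counter


def seqid_problems(fasta_text):
    """Return (missing_count, duplicate_ids) for a multi-sequence FASTA file."""
    ids = []
    missing = 0
    for line in fasta_text.splitlines():
        if not line.startswith(">"):
            continue  # sequence data, not a definition line
        fields = line[1:].split()
        # A definition line with nothing, a space, or a modifier directly
        # after ">" has no SeqID.
        if not fields or line[1:2].isspace() or fields[0].startswith("["):
            missing += 1
        else:
            ids.append(fields[0])
    duplicates = sorted(s for s, count in Counter(ids).items() if count > 1)
    return missing, duplicates
```
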
 432
 433 *FASTA Format for Segmented Sequence
 434
 435 #Each segment of a segmented sequence must have its own SeqID, but the
 436 organism name and other modifiers are only indicated in the FASTA
 437 definition line of the first segment.  Square brackets are used to
 438 delimit the members of the segmented set.  For example,
 439
 440 ![
 441 !>A-0V-1-Apart1 [organism=Gallus gallus] [clone=C]
 442 !TCACTCTTTGGCAAC
 443 !>A-0V-1-Apart2
 444 !GACCCGTCGTCATAATAAAGATAGAGGGGCAACTAAAGGAAGCTCTA
 445 !TTAGATACAGGAGCAGATGATACAGTATTAGAAGAAAT
 446 !]
 447
 448 *FASTA Format for Gapped Sequence
 449
 450 #The FASTA definition line for a gapped sequence follows the same format
 451 as above.  To indicate a gap within the sequence, enter a hard return
 452 within the sequence at the point of the gap, then insert an extra line
 453 starting with a greater-than sign (">") and a question mark ("?").  If the gap size
 454 is unknown, enter "unk100" after the question mark.  If the gap size is
 455 known, enter the length of the gap after the question mark.  For
 456 example,
 457
 458 !>Dobi [organism=Canis familiaris] [breed=Doberman pinscher]
 459 !AAATGCATGGGTAAAAGTAGTAGAAGAGAAGGCTTTTAGCCCAGAAGTAATACCCATGTTTTCAGCATTA
 460 !GGAAAAAGGGCTGTTG
 461 !>?unk100
 462 !TGGATGACAGAAACCTTGTTGGTCCAAAATGCAAACCCAGATKGTAAGACCATTTTAAAAGCATTGGGTC
 463 !TTAGAAATAGGGCAACACAGAACAAAAAT
 464 !>?234
 465 !AAAAATAAAAGCATTAGTAGAAATTTGTACAGAACTGGAAAAGGAAGGAAAAATTTCAAAAATTGGGCCT
 466 !GAAAACCCATACAATACTCCGGG
 467
 468 will generate a sequence containing two gaps.  The first gap is of
 469 unknown length, the second is 234 nucleotides long.
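
#The gapped layout can be read back into alternating sequence and gap pieces.  The following Python sketch assumes one gapped record per input and represents a gap of unknown length as None; the function name parse_gapped_fasta is hypothetical and is not part of Sequin.

```python
def parse_gapped_fasta(text):
    """Return (defline, parts) for one gapped FASTA record.

    parts is a list of ("seq", bases) and ("gap", length) tuples,
    where length is None for a gap of unknown size (">?unk100").
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">") or lines[0].startswith(">?"):
        raise ValueError("record must start with a FASTA definition line")
    defline = lines[0]
    parts = []
    current = []
    for line in lines[1:]:
        if line.startswith(">?"):
            if current:
                parts.append(("seq", "".join(current)))
                current = []
            size = line[2:]
            if size == "unk100":
                parts.append(("gap", None))
            elif size.isdigit():
                parts.append(("gap", int(size)))
            else:
                raise ValueError("unrecognized gap line: " + line)
        elif line.startswith(">"):
            raise ValueError("this sketch handles a single record only")
        else:
            current.append(line)
    if current:
        parts.append(("seq", "".join(current)))
    return defline, parts
```
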
 470
 471 *FASTA+GAP Format for Aligned Nucleotide Sequences
 472
 473 #A number of programs output sets of aligned sequences in FASTA format.
 474 Frequently, to align these sequences, gaps must be inserted.  Specify
 475 relevant gap and ambiguous characters in the appropriate box on the
 476
 477 <A HREF="#NucleotidePage">
 478 Nucleotide Page
 479 </A>
 480
 481 form.  Each sequence, including gaps, must be the same length.  The
 482 gaps will only show up in the alignment, not in the individual sequence
 483 in the database.
 484
 485 #Sequences in FASTA+GAP format resemble FASTA sequences.  The previous
 486 section on
 487
 488 <A HREF="#FASTAFormatforNucleotideSequences">
 489 FASTA Format for Nucleotide Sequences
 490 </A>
 491
 492 has instructions for formatting FASTA sequences.  If one of the
 493 sequences in your alignment is already present in the GenBank/EMBL/DDBJ
 494 database, you must mark that sequence so that it does not receive a new
 495 Accession number.  To do this, use a SeqID in the format accU12345,
 496 where U12345 is the Accession number of the pre-existing sequence.  All
 497 sequences in FASTA+GAP format should be in the same file.
 498
 499 #The following is an example of FASTA+GAP format:
 500
 501 !>A-0V-1-A [organism=Gallus gallus] [clone=C]
 502 !TCACTCTTTGGCAACGACCCGTCGTCATAATAAAGATAGAGGGGCAACTAAAGGAAGCTCTA
 503 !TTAGATACAGGAGCAGATGATACAGTATTAGAAGAAAT
 504 !
 505 !>A-0V-2-A [organism=Drosophila melanogaster] [strain=D]
 506 !TCACTCTTTGGCAAC---GCGTCGTCACAATAAAGATAGAGGGGCAACTAAAGGAAGCTCTA
 507 !TTAGATACAGGAGCAGATGATACAGTATTAGAAGAAAT
 508 !
 509 !>A-0V-3-A [organism=Caenorhabditis elegans] [strain=E]
 510 !TCACTCTTTGGCAAC---GCGTCGTCACAATAAAGATAGAGGGGCAACTAAAGGAAGCTCTA
 511 !TTAGATACAGGAGCAGATGATACAGTATTAGAAGAAAT
 512 !
 513 !>A-0V-4-A [organism=Rattus norvegicus] [strain=F]
 514 !TCACTCTTTGGCAACGACCCGTCGTCACAATAAAGATAGAGGGGCAACTAAAGGAAGCTCTA
 515 !TTAGATACAGGAGCAGATGATACAGTATTAGAAGAAAT
 516 !
 517 !>A-0V-7-A [organism=Aspergillus nidulans] [strain=G]
 518 !TCACTCTTTGGCAACGACCAGTCGTCACAATAAAGATAGAGGGGCAACTAAAGGAAGCTCTA
 519 !TTAGATACAGGAGCAGATGATACAGTATTAGAAGAAAT
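
#Because every sequence in a FASTA+GAP file must be the same length once gaps are counted, it can help to verify this before import.  The Python sketch below is illustrative only: the names read_fasta_gap and ungapped are hypothetical, and "-" is assumed to be the gap character.

```python
def read_fasta_gap(text):
    """Return {seqid: aligned_sequence}; raise if the lengths differ."""
    records = {}
    seqid = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("definition line has no SeqID")
            seqid = fields[0]
            if seqid in records:
                raise ValueError("duplicate SeqID: " + seqid)
            records[seqid] = ""
        elif seqid is None:
            raise ValueError("sequence data before first definition line")
        else:
            records[seqid] += line
    if len({len(seq) for seq in records.values()}) > 1:
        raise ValueError("aligned sequences are not all the same length")
    return records


def ungapped(aligned_sequence, gap="-"):
    """Return the sequence as it would appear without alignment gaps."""
    return aligned_sequence.replace(gap, "")
```

#The ungapped helper mirrors the statement above that gaps appear only in the alignment, not in the individual sequence.
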
 520
 521 *PHYLIP Format for Aligned Nucleotide Sequences
 522
 523 #A number of programs output sets of aligned sequences in PHYLIP format.
 524
 525 #The following is an example of PHYLIP format.
 526
 527 !     5    100
 528 !A-0V-1-A   TCACTCTTTG GCAACGACCC GTCGTCATAA TAAAGATAGA GGGGCAACTA
 529 !A-0V-2-A   TCACTCTTTG GCAAC---GC GTCGTCACAA TAAAGATAGA GGGGCAACTA
 530 !A-0V-3-A   TCACTCTTTG GCAAC---GC GTCGTCACAA TAAAGATAGA GGGGCAACTA
 531 !A-0V-4-A   TCACTCTTTG GCAACGACCC GTCGTCACAA TAAAGATAGA GGGGCAACTA
 532 !A-0V-7-A   TCACTCTTTG GCAACGACCA GTCGTCACAA TAAAGATAGA GGGGCAACTA
 533 !
 534 !
 535 !           AAGGAAGCTC TATTAGATAC AGGAGCAGAT GATACAGTAT TAGAAGAAAT
 536 !           AAGGAAGCTC TATTAGATAC AGGAGCAGAT GATACAGTAT TAGAAGAAAT
 537 !           AAGGAAGCTC TATTAGATAC AGGAGCAGAT GATACAGTAT TAGAAGAAAT
 538 !           AAGGAAGCTC TATTAGATAC AGGAGCAGAT GATACAGTAT TAGAAGAAAT
 539 !           AAGGAAGCTC TATTAGATAC AGGAGCAGAT GATACAGTAT TAGAAGAAAT
 540
 541 #In this example, the first line indicates that there are 5 sequences,
 542 each with 100 nt of sequence.  The following five lines contain the
 543 Sequence IDs, followed by the sequences. Specifically, the sequence
 544 identifier for the first sequence is A-0V-1-A.  Note that subsequent
 545 blocks of sequence do not contain the Sequence ID. If one of the
 546 sequences in your alignment is already present in the GenBank/EMBL/DDBJ
 547 database, you must mark that sequence so that it does not receive a new
 548 Accession number.  To do this, use a SeqID in the format accU12345,
 549 where U12345 is the Accession number of the pre-existing sequence.
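
#The header counts can be checked against the sequence blocks.  This Python sketch assumes the interleaved layout shown above, with each Sequence ID separated from its sequence by whitespace rather than held in a fixed-width name column; the function name parse_phylip is hypothetical.

```python
def parse_phylip(text):
    """Return {seqid: sequence} for an interleaved PHYLIP alignment."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty input")
    # First line: number of sequences, then number of characters in each.
    ntax, nchar = (int(field) for field in lines[0].split())
    body = lines[1:]
    if ntax <= 0 or len(body) < ntax or len(body) % ntax != 0:
        raise ValueError("number of lines does not match the header")
    ids, seqs = [], []
    for index, line in enumerate(body):
        fields = line.split()
        if index < ntax:
            # First block: Sequence ID followed by sequence chunks.
            ids.append(fields[0])
            seqs.append("".join(fields[1:]))
        else:
            # Later blocks: sequence chunks only, in the same order.
            seqs[index % ntax] += "".join(fields)
    for seqid, seq in zip(ids, seqs):
        if len(seq) != nchar:
            raise ValueError("wrong length for " + seqid)
    return dict(zip(ids, seqs))
```
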
 550
 551 #Specify relevant gap and ambiguous characters in the appropriate box on the
 552 <A HREF="#NucleotidePage">
 553 Nucleotide Page
 554 </A>
 555 form.
 556
 557 #You can modify the PHYLIP format so that Sequin can
 558 determine the correct organism and any other modifiers for each
 559 sequence.  An example of such modifications appears below in the section on
 560 <A HREF="#SourceModifiersforPHYLIPandNEXUS">
 561 Source Modifiers for PHYLIP and NEXUS
 562 </A>
 563 .
 564 #Alternatively, you can leave your sequence alignment in
 565 standard PHYLIP format and enter the organism, strain, chromosome, etc.
 566 information on the following
 567
 568 <A HREF="#SourceModifiersForm">
 569 Source Modifiers form
 570 </A>
 571 .
 572
 573 *NEXUS Format for Aligned Nucleotide Sequences
 574
 575 #A number of programs output sets of aligned sequences in one of two
 576 NEXUS formats, NEXUS Interleaved and NEXUS Contiguous.
 577
 578 #NEXUS files can contain ? for "missing" at the 5' and 3' ends of
 579 sequences, as long as this parameter is properly defined within the
 580 header of the NEXUS file.
 581
 582 #The following is an example of NEXUS Interleaved format.
 583
 584 !#NEXUS
 585 !
 586 !begin data;
 587 !   dimensions ntax=5 nchar=100;
 588 !   format datatype=dna missing=? gap=- interleave;
 589 !   matrix
 590 !
 591 !A-0V-1-A   TCACTCTTTG GCAACGACCC GTCGTCATAA TAAAGATAGA GGGGCAACTA
 592 !A-0V-2-A   TCACTCTTTG GCAAC---GC GTCGTCACAA TAAAGATAGA GGGGCAACTA
 593 !A-0V-3-A   TCACTCTTTG GCAAC---GC GTCGTCACAA TAAAGATAGA GGGGCAACTA
 594 !A-0V-4-A   TCACTCTTTG GCAACGACCC GTCGTCACAA T????ATAGA GGGGCAACTA
 595 !A-0V-7-A   TCACTCTTTG GCAACGACCA GTCGTCACAA TAAAGATAGA GGGGCAACTA
 596 !
 597 !
 598 !A-0V-1-A   AAGGAAGCTC TATTAGATAC AGGAGCAGAT GATACAGTAT TAGAAGAAAT
 599 !A-0V-2-A   AAGGAAGCTC TATTAGATAC AGGAGCAGAT GATACAGTAT TAGAAGAAAT
 600 !A-0V-3-A   AAGGAAGCTC TATTAGATAC AGGAGCAGAT GATACAGTAT TAGAAGAAAT
 601 !A-0V-4-A   AAGGAAGCTC TATTAGATAC AGGAGCAGAT GATACAGTAT TAGAAGAAAT
 602 !A-0V-7-A   AAGGAAGCTC TATTAGATAC AGGAGCAGAT GATACAGTAT TAGAAGAAAT
 603
 604 #In this example, the first few lines provide information about the data
 605 in the sequence alignment. The following five lines contain the
 606 Sequence IDs, followed by the sequences. Specifically, the sequence
 607 identifier for the first sequence is A-0V-1-A. Note that subsequent
 608 blocks of sequence also contain the Sequence ID. If one of the
 609 sequences in your alignment is already present in the GenBank/EMBL/DDBJ
 610 database, you must mark that sequence so that it does not receive a new
 611 Accession number.  To do this, use a SeqID in the format accU12345,
 612 where U12345 is the Accession number of the pre-existing sequence.
 613 Also, Sequin will replace the "?" characters in the sequences with "N"s
 614 since they are defined as "missing" data in the header.  You should
 615 specify relevant gap and ambiguous characters in the appropriate box on
 616 the
 617
 618 <A HREF="#NucleotidePage">
 619
 620 Nucleotide Page
 621
 622 </A>
 623
 624 form.

 625 #The following is an example of NEXUS Contiguous format.
 626
 627 !#NEXUS
 628 !BEGIN DATA;
 629 !DIMENSIONS NTAX=5 NCHAR=100;
 630 !FORMAT MISSING=? GAP=- DATATYPE=DNA ;
 631 !MATRIX
 632 !
 633 !A-0V-1-A
 634 !TCACTCTTTGGCAACGACCCGTCGTCATAATAAAGATAGAGGGGCAACTAAAGGAAGCTCTA
 635 !TTAGATACAGGAGCAGATGATACAGTATTAGAAGAAAT
 636 !
 637 !A-0V-2-A
 638 !TCACTCTTTGGCAAC---GCGTCGTCACAATAAAGATAGAGGGGCAACTAAAGGAAGCTCTA
 639 !TTAGATACAGGAGCAGATGATACAGTATTAGAAGAAAT
 640 !
 641 !A-0V-3-A
 642 !TCACTCTTTGGCAAC---GCGTCGTCACAATAAAGATAGAGGGGCAACTAAAGGAAGCTCTA
 643 !TTAGATACAGGAGCAGATGATACAGTATTAGAAGAAAT
 644 !
 645 !A-0V-4-A
 646 !TCACTCTTTGGCAACGACCCGTCGTCACAATAAAGATAGAGGGGCAACTAAAGGAAGCTCTA
 647 !TTAGATACAGGAGCAGATGATACAGTATTAGAAGAAAT
 648 !
 649 !A-0V-7-A
 650 !TCACTCTTTGGCAACGACCAGTCGTCACAATAAAGATAGAGGGGCAACTAAAGGAAGCTCTA
 651 !TTAGATACAGGAGCAGATGATACAGTATTAGAAGAAAT
 652
 653 #In this example, the first few lines provide information about the data
 654 in the sequence alignment.  The following five lines contain the
 655 Sequence IDs, followed by the sequences. Specifically, the sequence
 656 identifier for the first sequence is A-0V-1-A.  Note that subsequent
 657 blocks of sequence also contain the Sequence ID.  If one of the
 658 sequences in your alignment is already present in the GenBank/EMBL/DDBJ
 659 database, you must mark that sequence so that it does not receive a
 660 new Accession number.  To do this, use a SeqID in the format accU12345,
 661 where U12345 is the Accession number of the pre-existing sequence.
 662
 663 #You can modify either NEXUS format so that Sequin can
 664 determine the correct organism and any other modifiers for each
 665 sequence.  An example of such modifications is given below in the section on
 666 <A HREF="#SourceModifiersforPHYLIPandNEXUS">
 667 Source Modifiers for PHYLIP and NEXUS
 668 </A>
 669 .
 670 Alternatively, you can leave your sequence alignment in
 671 standard NEXUS format and enter the organism, strain, chromosome, etc.
 672 information on the following
 673
 674 <A HREF="#SourceModifiersForm">
 675 Source Modifiers form
 676 </A>
 677 .
 678
 679 **Source Modifiers for PHYLIP and NEXUS
 680
 681 #You can modify the PHYLIP or NEXUS formats so that Sequin can determine
 682 the correct organism and any other modifiers for each sequence by
 683 adding lines at the end of the file.  The first line applies to the
 684 first sequence, the second line to the second sequence, and so on.  You
 685 must have one line for each sequence.  These inserted lines contain
 686 modifiers formatted as in the FASTA definition line, but do not begin
 687 with a SeqID.  Instead, the SeqID is present at the beginning of the
 688 sequence lines as shown above.
 689
 690 #Each of the initial lines starts with the character ">".  The
 691 scientific organism name follows in brackets.  Optional modifiers also
 692 follow in brackets.  For further information on the data that can go in
 693 the lines preceding the sequences, see the instructions entitled "FASTA
 694 Format for Nucleotide Sequences",
 695
 696 <A HREF="#FASTAFormatforNucleotideSequences">
 697 above.
 698 </A>
 699
 700 #The following lines indicating the organisms and strain of each sequence
 701 would follow immediately after the sequence in the PHYLIP and NEXUS
 702 examples, above.
 703
 704 !>[organism=Gallus gallus] [clone=C]
 705 !>[organism=Drosophila melanogaster] [strain=D]
 706 !>[organism=Caenorhabditis elegans] [strain=E]
 707 !>[organism=Rattus norvegicus] [strain=F]
 708 !>[organism=Aspergillus nidulans] [strain=G]
 709
 710 #The number of lines of source information must exactly match the number
 711 of sequences provided.
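#The one-line-per-sequence rule can be checked before import with a short
script.  The following is a minimal Python sketch, not part of Sequin; the
function name parse_modifier_lines and the regular expression are
illustrative assumptions, and the only format assumed is the
">[modifier=text]" layout shown in the example above.

```python
import re

# Matches one bracketed modifier such as [organism=Gallus gallus].
MODIFIER_RE = re.compile(r"\[([^=\]]+)=([^\]]*)\]")


def parse_modifier_lines(lines, expected_count):
    """Parse '>[key=value] ...' lines into one dict per sequence.

    Hypothetical helper: raises ValueError if a line does not start with
    '>', if an organism is missing, or if the number of lines differs
    from the number of sequences in the alignment.
    """
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between modifier lines
        if not line.startswith(">"):
            raise ValueError(f"modifier line must start with '>': {line!r}")
        records.append(
            {key.strip(): value.strip()
             for key, value in MODIFIER_RE.findall(line)}
        )
    if len(records) != expected_count:
        raise ValueError(
            f"expected {expected_count} modifier lines, found {len(records)}"
        )
    for number, record in enumerate(records, start=1):
        if "organism" not in record:
            raise ValueError(f"modifier line {number} has no organism")
    return records


example = [
    ">[organism=Gallus gallus] [clone=C]",
    ">[organism=Drosophila melanogaster] [strain=D]",
]
parsed = parse_modifier_lines(example, expected_count=2)
```

#Because the lines are matched to sequences purely by position, the sketch
keeps them in file order and reports the first problem it finds.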
 712
 713 #Alternatively, you can leave your sequence alignment in
 714 standard NEXUS or PHYLIP format and enter the organism, strain, chromosome, etc.
 715 information on the following
 716
 717 <A HREF="#OrganismPage">
 718 Organism Page
 719 </A>
 720 .
 721
 722 *Importing Aligned Sets of Segmented Sequences
 723
 724 #Sequin can also read segmented sets that are part of an alignment if
 725 the sequences are in FASTA or FASTA+GAP format.  Each segment should
 726 have its own Sequence ID, but organism name and source modifiers should
 727 only be indicated for the first segment from each sequence. Square
 728 brackets are used to delimit the members of a set.  For example,
 729
 730 ![
 731 !>A-0V-1-Apart1 [organism=Gallus gallus] [strain=C]
 732 !TCACTCTTTGGCAAC
 733 !>A-0V-1-Apart2
 734 !GACCCGTCGTCATAATAAAGATAGAGGGGCAACTAAAGGAAGCTCTA
 735 !TTAGATACAGGAGCAGATGATACAGTATTAGAAGAAAT
 736 !]
 737 ![
 738 !>A-0V-2-Apart1 [organism=Drosophila melanogaster] [strain=D]
 739 !TCACTCTTTGGCAAC
 740 !>A-0V-2-Apart2
 741 !GAAGCGTCGTCACAATAAAGATAGAGGGGCAACTAAAGGAAGCTCTA
 742 !TTAGATACAGGAGCAGATGATACAGTATTAGAAGAAAT
 743 !]
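
#If segmented sets are generated by a script, the bracketed layout above can
be produced as follows.  This is a minimal Python sketch under the
assumptions stated in the text (one SeqID per segment, organism and
modifiers only on the first segment); the function name
format_segmented_set is hypothetical and not part of Sequin.

```python
def format_segmented_set(segments, organism, modifiers=None):
    """Return one bracket-delimited segmented set as FASTA text.

    segments  -- list of (seq_id, sequence) tuples, in order
    organism  -- scientific name, written on the first segment only
    modifiers -- optional dict of extra source modifiers (first segment only)
    """
    lines = ["["]
    for index, (seq_id, sequence) in enumerate(segments):
        defline = f">{seq_id}"
        if index == 0:
            # Source information belongs only on the first segment.
            defline += f" [organism={organism}]"
            for key, value in (modifiers or {}).items():
                defline += f" [{key}={value}]"
        lines.append(defline)
        lines.append(sequence)
    lines.append("]")
    return "\n".join(lines)


block = format_segmented_set(
    [("A-0V-1-Apart1", "TCACTCTTTGGCAAC"),
     ("A-0V-1-Apart2", "GACCCGTCGTCATAATAAAG")],
    organism="Gallus gallus",
    modifiers={"strain": "C"},
)
```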
 744
 745 >Nucleotide Page
 746
 747 #The options on this page will vary depending on the
 748 <A HREF="#SubmissionType">
 749 Submission Type
 750 </A>
 751  and
 752 <A HREF="#SequenceDataFormat">
 753 Sequence Data Format
 754 </A>
 755 selected earlier.  Segmented sets and gapped sequences must be imported
 756 as properly formatted FASTA files.  Details about importing alignment
 757 files are
 758 <A HREF="#NucleotidePageforAlignedDataFormats">
 759 below
 760 </A>
 761 .
 762
 763 *Nucleotide Page for FASTA Data Format
 764
 765 **Create Alignment
 766
 767 #If you have selected a Population study, Phylogenetic study, Mutation
 768 study, or Environmental samples set as a
 769 <A HREF="#SubmissionType">
 770 Submission Type
 771 </A>
 772 a check box will appear at the top of the Nucleotide Page.  If you
 773 check 'Create Alignment', Sequin will attempt to generate an alignment
 774 of the sequences within your submission.
 775
 776 **Import Nucleotide FASTA
 777
 778 #Use this button to import your properly formatted
 779 <A HREF="#FASTAFormatforNucleotideSequences">
 780 FASTA file
 781 </A>
 782 .  You will see a window containing information about the imported
 783 sequence(s).  Please check the number of sequences, Sequence IDs
 784 (SeqIDs) and length of each sequence to make sure they are correct.  If
 785 you have included source information within the FASTA definition line,
 786 this will also be listed.
 787
 788 **Add/Modify Sequences
 789
 790 #This option allows you to add or modify sequences without using a
 791 previously formatted FASTA file, but is not available if you have
 792 selected a Segmented sequence or Gapped sequence as a
 793 <A HREF="#SubmissionType">
 794 Submission Type
 795 </A>
 796 .  On the Specify Sequences box you can either import a nucleotide FASTA
 797 or add a new sequence.  If you choose Add New Sequence, a new box will
 798 pop up where you can either import an existing sequence file or
 799 directly paste or type the nucleotide sequence.
 800
 801 #If you add a sequence where the FASTA definition line is not properly
 802 formatted, a pop-up box will appear.  The top box in this pop-up will
 803 list any errors in the FASTA definition lines, including missing
 804 SeqIDs, duplicate SeqIDs for different sequences, or improperly
 805 formatted modifiers.  You can add or edit this information in the
 806 spreadsheet provided.  The toggle at the bottom of the pop-up allows
 807 you to select whether all sequences or only those with errors are
 808 listed in the spreadsheet above.  After making changes, click on
 809 Refresh Error List to ensure that all errors have been corrected.  You
 810 must correct any errors involving the SeqID in order to proceed with
 811 your submission.  Click on Accept to save your sequences and return to
 812 the Specify Sequences box.
 813
 814 #In the Specify Sequences box, you can choose to add another sequence or
 815 select a sequence from the list and choose to edit or delete it.  You
 816 can also delete all sequences at this point.  You will need to click on
 817 Done to save your sequences and return to the Nucleotide Page.
 818
 819 **Clear Sequences
 820
 821 #This option will remove all imported nucleotide sequences.
 822
 823 **Specify Molecule
 824
 825 #A database sequence can represent one of several different molecule
 826 types. The default molecule is genomic DNA.  If the sequence was not
 827 derived from genomic DNA, you can edit that information here. If you
 828 are submitting multiple sequences you can apply one molecule type to
 829 all sequences or apply the molecule type to each sequence individually.
 830   From the Molecule pop-up menu, select the type of molecule that was
 831 sequenced.
 832
 833 #-Genomic DNA: Sequence derived directly from the DNA of an organism.
 834 Note: The DNA sequence of an rRNA gene has this molecule type, as does
 835 that from a naturally-occurring plasmid.
 836
 837 #-Genomic RNA: Sequence derived directly from the genomic RNA of certain
 838 organisms, such as viruses.
 839
 840 #-Precursor RNA: An RNA transcript before it is processed into mRNA,
 841 rRNA, tRNA, or other cellular RNA species.
 842
 843 #-mRNA[cDNA]: A cDNA sequence derived from mRNA.
 844
 845 #-Ribosomal RNA: A sequence derived from the RNA in ribosomes.  This
 846 should only be selected if the RNA itself was isolated and sequenced.
 847 If the gene for the ribosomal RNA was sequenced, select Genomic DNA.
 848
 849 #-Transfer RNA: A sequence derived from the RNA in a transfer RNA, for
 850 example, the sequence of a cDNA derived from tRNA.
 851
 852 #-Small nuclear RNA: A sequence derived from small nuclear RNA, for
 853 example, the sequence of a cDNA derived from snRNA.
 854
 855 #-Small cytoplasmic RNA: A sequence derived from small cytoplasmic RNA,
 856 for example, the sequence of a cDNA derived from small cytoplasmic RNA.
 857
 858 #-Other-Genetic: A synthetically derived sequence including cloning
 859 vectors and tagged fusion constructs.
 860
 861 #-cRNA: A sequence derived from complementary RNA transcribed from DNA,
 862 mainly used for viral submissions.
 863
 864 #-Small nucleolar RNA: A sequence derived from small nucleolar RNA, for
 865 example, the sequence of a cDNA derived from snoRNA.
 866
 867 **Specify Topology
 868
 869 #Most sequences have a Linear topology and this is the default. You
 870 should change this setting to Circular only if the sequence is complete
 871 and it has a circular topology. For example, a complete plasmid or a
 872 complete mitochondrial genome would have a Circular topology, but a
 873 single gene from a plasmid or mitochondrion would have a Linear
 874 topology.  If you are submitting multiple sequences you can apply one
 875 topology to all sequences or set the topology for each sequence
 876 individually.
 877
 878 *Nucleotide Page for Aligned Data Formats
 879
 880 **Sequence Characters
 881
 882 #If you are submitting a set of aligned sequences, you can specify sequence
 883 characters used in your alignment here.  Sequin requires that you
 884 define any non-IUPAC nucleotide characters in your alignment file.  The
 885 five types of variable characters are listed under Sequence Characters.
 886
 887 #Every sequence within an alignment file must contain the same number of
 888 characters (nucleotides + gaps).  Gap characters are used to represent the
 889 spaces between contiguous nucleotides in an alignment.  Gaps that appear at
 890 the beginning or end of a sequence are treated differently than gaps that
 891 appear between nucleotides and each must be defined.  GenBank prefers to
 892 use a hyphen (-) to represent gaps. If you use a different character to
 893 represent a gap, you will need to add this character to the list in the
 894 Beginning Gap, Middle Gap, or End Gap boxes.
 895
 896 #Ambiguous characters represent nucleotides that are known to exist, but
 897 whose identity has not been experimentally validated.  GenBank prefers to
 898 use 'n' to represent any ambiguous nucleotides.  If you are using a
 899 different character to represent an ambiguous base, you will need to add
 900 this character to the list in the Ambiguous/Unknown box.  Sequin will
 901 convert these characters to 'n's when your file is imported.
 902
 903 #Match characters denote nucleotides that are identical in every member of
 904 an alignment.  GenBank prefers the use of a colon (:) to represent match
 905 characters.  If you are using a different character to represent a match
 906 character, you will need to add this character to the list in the Match box.
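
#The two requirements above (equal length for every aligned sequence, and a
definition for every non-IUPAC character) can be checked before import.  The
following is a minimal Python sketch, not Sequin's own validation; the
function name check_alignment and its parameters are illustrative
assumptions that mirror the Gap, Ambiguous/Unknown, and Match boxes.

```python
# IUPAC nucleotide codes, including the ambiguity codes.
IUPAC_NUCLEOTIDES = set("ACGTURYSWKMBDHVN")


def check_alignment(seqs, gap_chars="-", ambiguous_chars="", match_chars=""):
    """Return a list of problem descriptions for an alignment.

    seqs is a dict mapping SeqID to the aligned sequence string.  The
    three *_chars arguments list the extra characters the submitter has
    defined (hypothetical stand-ins for the boxes on the form).
    """
    problems = []

    # Every row must have the same number of characters (bases + gaps).
    lengths = {len(sequence) for sequence in seqs.values()}
    if len(lengths) > 1:
        problems.append(
            "sequences differ in length: "
            + ", ".join(f"{sid}={len(s)}" for sid, s in seqs.items())
        )

    # Any character that is neither IUPAC nor explicitly defined is flagged.
    allowed = (IUPAC_NUCLEOTIDES | set(gap_chars)
               | set(ambiguous_chars) | set(match_chars))
    allowed = {char.upper() for char in allowed}
    for seq_id, sequence in seqs.items():
        unknown = sorted(set(sequence.upper()) - allowed)
        if unknown:
            problems.append(
                f"{seq_id}: undefined characters {''.join(unknown)}"
            )
    return problems


alignment = {
    "A-0V-1-A": "TCACTCTTTGGCAACGACC",
    "A-0V-2-A": "TCACTCTTTGGCAAC---G",
}
problems = check_alignment(alignment)
```

#An empty list means that no problems were found by these two checks; it does
not guarantee that Sequin will accept the file.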
 907
 908 **Import Nucleotide Alignment
 909
 910 #Once you have imported the alignment using the Import Nucleotide
 911 Alignment button, you can edit the molecule information using the
 912 <A HREF="#SpecifyMolecule">
 913 Specify Molecule
 914 </A>
 915  and
 916 <A HREF="#SpecifyTopology">
 917 Specify Topology
 918 </A>
 919 buttons explained above.  Note that you cannot access the
 920 <A HREF="#Add/ModifySequences">
 921 Add/Modify Sequences
 922 </A>
 923  dialog for submissions of aligned sequences.
 924
 925 >Organism Page
 926
 927 #Information about the organism from which the sequence was derived
 928 should be entered or edited on this page.  If there are any potential
 929 problems with the organism information previously provided in either
 930 the
 931 <A HREF="#FASTAFormatforNucleotideSequences">
 932 FASTA definition line
 933 </A>
 934  or entered in the
 935 <A HREF="#Add/ModifySequences">
 936 Add/Modify Sequences
 937 </A>
 938  dialog, a window listing these problems will appear at the top of the
 939 form.  Please review these problems and edit using the
 940
 941 <A HREF="#AddSourceModifiers">
 942 Add Source Modifiers
 943 </A> button as necessary.  At minimum, you must supply
 944 the scientific name of the organism from which the sequence was
 945 obtained in order to proceed with your submission.
 946
 947 #The second window is a summary of the organism information provided so
 948 far. Double clicking on a line of text within this window will launch a
 949 modifier-specific editing window.  In each of these windows, you can
 950 edit the available information for the specific modifier.  In most
 951 cases, you have the choice to edit the modifier for each sequence
 952 separately, or to enter text and select Apply above value to all
 953 sequences.  These changes will be reflected in the windows of the
 954 Organism page immediately upon closing the modifier-specific editor.
 955
 956 *Add Organisms, Locations, and Genetic Codes
 957
 958 #If you have not added organism information using either the
 959 <A HREF="#FASTAFormatforNucleotideSequences">
 960 FASTA definition line
 961 </A>
 962  or the
 963 <A HREF="#Add/ModifySequences">
 964 Add/Modify Sequences
 965 </A>
 966 dialog, you can use the Add Organisms, Locations, and Genetic Codes button to
 967 do so at this point.  This button will launch the Multiple Organism
 968 Editor pop-up where you may add or edit existing information concerning
 969 the
 970 <A HREF="#Organism">
 971 Organism
 972 </A>
 973  name,
 974 <A HREF="#Location">
 975 Location
 976 </A>
 977  and
 978 <A HREF="#GeneticCode">
 979 Genetic Code
 980 </A>
 981 .  The SeqID of each sequence is listed at the left of the spreadsheet
 982 format.  You can change the information in the spreadsheet individually
 983 or globally for all sequences.
 984
 985 **Organism
 986
 987 #The scrollable list at the top of the pop-up contains the scientific
 988 names of many organisms. To reach a name on the list, type the first
 989 few letters of the scientific name into the box above the list or the
 990 appropriate box in the spreadsheet.  The list will scroll to the names
 991 beginning with those letters, and you can select the organism within
 992 the list itself.  You can then use the arrow button to copy this name
 993 into the appropriate box in the spreadsheet.
 994
 995 #To apply the same scientific name to all sequences in the submission,
 996 click on the Organism button in the spreadsheet column header.  A
 997 separate pop-up box will appear with the same organism list.  You can
 998 select a name from this list and choose Accept to apply this name to
 999 all sequences.
1000
1001 #If you have any questions about the scientific name of an organism, see
1002 the NCBI
1003 <A HREF="http://www.ncbi.nlm.nih.gov/Taxonomy/tax.html">
1004 Taxonomy Browser
1005 </A>
1006 http://www.ncbi.nlm.nih.gov/Taxonomy/tax.html
1007
1008 #If the name of the organism is not on the list, type it in directly. If
1009 you do not know the scientific name, please be as specific as you can
1010 and include a unique identifier, such as a clone, isolate, strain or
1011 voucher number, or cultivar name, e.g., Nostoc ATCC29106, uncultured
1012 spirochete Im403, Lauraceae sp. Vásquez 25230 (MO), Rosa hybrid
1013 cultivar 'Kazanlik'.  Also, if applicable, indicate if the name is
1014 unpublished as of the time of submission.  Additional information such
1015 as strain, isolate, or serotype can be entered later in the submission
1016 process.
1017
1018 **Location
1019
1020 #The default Location for all sequences is "Genomic".  If the sequence
1021 is not genomic, select the alternative location (i.e., organelle) from
1022 the pull-down list.  You can change the location of all sequences
1023 globally by clicking on the Location button in the spreadsheet header.
1024 The following is a brief description of the choices in this list:
1025
1026 #-Genomic:  chromosome.  This category includes
1027 mitochondrial and chloroplast proteins that are encoded by the nuclear
1028 genome.
1029
1030 #-Chloroplast:  a chlorophyllous plastid.
1031
1032 #-Chromoplast:  a non-chlorophyllous, pigmented plastid, found in
1033 fruits and flowers.
1034
1035 #-Kinetoplast: a specialized type of mitochondrion found exclusively
1036 in Kinetoplastida (e.g., Leishmania).  NOTE: kinetoplast should
1037 be applied ONLY to members of the Kinetoplastida (trypanosomes and
1038 bodonids).
1039
1040 #-Mitochondrion: a semi-autonomous, self-reproducing organelle that
1041 occurs in the cytoplasm of most eukaryotic cells.
1042
1043 #-Plastid: any of a class of double membrane-bound, light-harvesting
1044 organelles (or derived from same).  NOTE: plastid should be used
1045 ONLY when a more precise term, e.g., chloroplast, is not
1046 applicable.
1047
1048 #-Macronuclear: a specialized type of nucleus found exclusively in the
1049 ciliated protists (e.g., Tetrahymena).  NOTE: macronucleus
1050 should be applied ONLY to members of the Ciliophora.
1051
1052 #-Extrachromosomal:  other extrachromosomal elements not listed here,
1053 such as a B chromosome or an F factor.
1054
1055 #-Plasmid: extrachromosomal genetic element found in bacterial species.
1056 Note this does not include the cloning vector used to propagate
1057 the sequence of interest.
1058
1059 #-Cyanelle:  a specialized type of plastid found exclusively in
1060 glaucocystophytes (e.g., Cyanophora).  NOTE: cyanelle should be
1061 applied ONLY to members of the Glaucocystophyceae.
1062
1063 #-Proviral:  a virus that is integrated into a host cell chromosome.
1064
1065 #-Virion:  a completed virus particle.
1066
1067 #-Nucleomorph:  a reduced nuclear remnant found in Chlorarachniophyceae
1068 (e.g., Chlorarachnion) and Cryptophyta (e.g., Cryptomonas).  NOTE:
1069 nucleomorph should be applied ONLY to members of the
1070 Chlorarachniophyceae or Cryptophyta.
1071
1072 #-Apicoplast:   a reduced plastid characteristic of apicomplexans
1073 (e.g., Plasmodium).  NOTE: apicoplast should be applied ONLY to
1074 members of the Apicomplexa.
1075
1076 #-Leucoplast:  a plastid lacking pigments of any type.
1077
1078 #-Proplastid:  an immature plastid.
1079
1080 #-Endogenous_virus:  a virus that has integrated permanently into the
1081 host genome, and which is inherited vertically through the
1082 germ-line of the host.
1083
1084 #-Hydrogenosome:  an organelle that produces hydrogen and ATP and is
1085 found mainly in ciliates, fungi and trichomonads.  Hydrogenosomes may
1086 be reduced mitochondria.
1087
1088 **Genetic Code
1089
1090 #If you selected a scientific organism name from the scrollable list
1091 described above, this field will be filled out automatically.  However,
1092 if the organism is not on the list, this field will default to the
1093 "Standard" genetic code.  If this is incorrect, you can select the
1094 correct genetic code from the pull-down list.  To globally change the
1095 genetic code for all sequences which are not automatically filled out,
1096 click on the Genetic Code button in the spreadsheet header.
1097
1098 #For more information regarding the genetic codes available, see the NCBI
1099 <A HREF="http://www.ncbi.nlm.nih.gov/htbin-post/Taxonomy/wprintgc?mode=c">
1100 Taxonomy page
1101 </A>.
1102 http://www.ncbi.nlm.nih.gov/htbin-post/Taxonomy/wprintgc?mode=c
1103
1104 *Import Source Modifiers
1105
1106 #Using this button allows you to import a tab-delimited table of source
1107 modifiers.  The first column in the table must contain the Sequence
1108 Identifiers (SeqIDs) used earlier in the submission and each subsequent
1109 column must contain a different source modifier.  The first row in the
1110 table must contain the labels for each column.  The label for the
1111 Sequence Identifiers column should be in the format "Seq_ID".  A list
1112 of available
1113 <A HREF="http://www.ncbi.nlm.nih.gov/Sequin/modifiers.html">
1114 modifiers
1115 </A>
1116 in the format to be used in the column headers can be found at
1117 http://www.ncbi.nlm.nih.gov/Sequin/modifiers.html .
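
#A table in this layout can be written with any spreadsheet program or with a
short script.  The following is a minimal Python sketch using the standard
csv module; the function name write_modifier_table is hypothetical, and the
column labels "organism" and "strain" are assumptions borrowed from the
FASTA modifier names used earlier in this document.  Check the modifiers
list at the address above for the exact labels.

```python
import csv
import io


def write_modifier_table(rows, modifier_names, out):
    """Write a tab-delimited source modifier table to the file object out.

    rows           -- dict mapping SeqID to a dict of modifier values
    modifier_names -- column labels, in the order they should appear
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    # First row: column labels, starting with the SeqID column.
    writer.writerow(["Seq_ID"] + list(modifier_names))
    # One row per sequence; a missing modifier becomes an empty cell.
    for seq_id, modifiers in rows.items():
        writer.writerow(
            [seq_id] + [modifiers.get(name, "") for name in modifier_names]
        )


buffer = io.StringIO()
write_modifier_table(
    {
        "A-0V-1-A": {"organism": "Gallus gallus", "strain": "C"},
        "A-0V-2-A": {"organism": "Drosophila melanogaster", "strain": "D"},
    },
    ["organism", "strain"],
    buffer,
)
table_text = buffer.getvalue()
```

#To produce a file instead of a string, pass a file opened with
open(path, "w", newline="") in place of the StringIO buffer.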
1118
1119 *Add Source Modifiers
1120
1121 #Using this button will launch the Specify Source Modifiers pop-up box
1122 where you can add or edit any source modifier.  You can also import a
1123 source modifier table or export the existing source modifiers in table
1124 format from this page.
1125
1126 #The Select Modifier dialog allows you to select a modifier from the
1127 pull-down list and edit the value of this modifier for each sequence or
1128 globally add a value to all sequences.
1129
1130 #The two windows in this pop-up provide information about the current
1131 source modifiers for the sequences in your submission.  The top window
1132 provides a summary of these modifiers and the lower window lists the
1133 values of each modifier for each sequence.  If any sequences have
1134 missing organism names or have source information that is identical to
1135 another sequence, the SeqIDs will be shown in red in this window.
1136 Double-clicking on a modifier value in this window will launch a pop-up
1137 where you can edit this value.  Double-clicking on the modifier name
1138 used in the header will launch a modifier-specific pop-up where you can
1139 globally edit the modifier value for all sequences or change the value
1140 for individual sequences.
1141
1142 *Clear All Source Modifiers
1143
1144 #This button will clear all modifiers previously entered in either the
1145 FASTA definition lines or the submission dialogs.  This includes the
1146 organism name which is required for submission.
1147
1148 >Protein Page
1149
1150 #This page allows you to provide the protein sequence translated from
1151 the nucleotide sequence that you just entered.  If the nucleotide
1152 sequence is alternatively spliced or contains multiple open reading
1153 frames, enter all of the protein sequences on this page.  Each protein
1154 sequence will appear in the database record as a coding sequence (CDS)
1155 feature.  Sequin will automatically determine which nucleotide
1156 sequences code for the protein and indicate the nucleotide sequence
1157 interval on the database record. Sequin also provides tools that allow
1158 you to view a graphical representation of all the open reading frames
1159 in your nucleotide sequence and to convert these reading frames into
1160 CDS features.  These tools are described later in the help
1161 documentation under the
1162
1163 <A HREF="#ORFFinder">
1164 ORF Finder.
1165 </A>
1166
1167 *Conceptual Translation Confirmed by Peptide Sequencing
1168
1169 #Most protein entries are computer-generated conceptual translations of
1170 a nucleic acid sequence.  If you have confirmed this translation by
1171 direct sequencing either of the entire protein or of peptides derived
1172 from the protein, please check this box.
1173
1174 *Incomplete at NH3 end/Incomplete at COOH end
1175
1176 #If the sequence is lacking amino acids at the amino- or
1177 carboxy-terminal end of the protein, please check the appropriate box.
1178
1179 *Create Initial mRNA with CDS Intervals
1180
1181 #If you check this box, Sequin will make an mRNA feature with the same
1182 initial intervals (i.e., range of sequence) as the CDS feature.  After
1183 the record has been assembled, you should edit the mRNA feature location
1184 to add the 5' UTR and 3' UTR intervals.  This may be done either in the
1185 mRNA editor or in the sequence editor.
1186
1187 *Import Protein FASTA
1188
1189 #You can import a single or multiple protein sequences contained within
1190 a previously generated protein FASTA file.
1191
1192 **FASTA Format for Protein Sequences
1193
1194 #The basic FASTA format is the same as that used for
1195 <A HREF="#FASTAFormatforNucleotideSequences">
1196 nucleotide sequences
1197 </A>
1198 , with a FASTA definition line followed by the sequence itself.
1199
1200 #In order to match the protein sequence to the correct nucleotide
1201 sequence, you must use the same Sequence Identifier (SeqID) that you
1202 used to identify the nucleotide sequence.  Thus in cases of
1203 alternatively spliced genes, a single protein FASTA file can contain
1204 two unique sequences that have the same SeqID.  Both coding regions
1205 will be added to the same nucleotide sequence.
1206
1207 #The available modifiers for use in a protein FASTA definition line are
1208 different from those for a nucleotide FASTA definition line.  They are
1209 limited to information about the protein or gene itself and are
1210 illustrated in the examples below.  The format remains [modifier=text].
1211
1212 #Note that in all cases, the FASTA definition line must not contain any hard
1213 returns.  All information must be on a single line of text.
1214
1215 #Examples of properly formatted protein FASTA definition lines are:
1216
1217 <KBD><PRE>>Seq1 [protein=neuropilin 1] [gene=Nrp1]</PRE></KBD>
1218
1219 <KBD><PRE>>ABCD [protein=merozoite surface protein 2] [gene=msp2] [protein_desc=MSP2]</PRE></KBD>
1220
1221 <KBD><PRE>>DNA.new [protein=breast and ovarian cancer susceptibility protein] [gene=BRCA1] [note=breast cancer 1, early onset]</PRE></KBD>
1222
1223 #The protein name should be included in the entry; all other fields are
1224 optional.
1225
1226 #The line after the FASTA definition line begins the amino acid
1227 sequence.  It is recommended that each line of sequence be no longer
1228 than 80 characters.  Please only use IUPAC symbols within the amino
1229 acid sequence. Non-IUPAC amino acid symbols will be stripped from the
1230 sequence.
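
#The rules above (a SeqID on every definition line, a protein name, and
possibly several proteins sharing one nucleotide SeqID) can be checked
before import.  The following is a minimal Python sketch, not Sequin's own
parser; the function name parse_protein_fasta is hypothetical, and the
amino acid strings and the second protein name in the example are
placeholders rather than real data.

```python
import re
from collections import defaultdict

# Matches one bracketed modifier such as [protein=neuropilin 1].
MODIFIER_RE = re.compile(r"\[([^=\]]+)=([^\]]*)\]")


def parse_protein_fasta(text):
    """Return a dict mapping SeqID to a list of (modifiers, sequence).

    Several proteins may share a SeqID, for example with alternatively
    spliced genes, so each SeqID maps to a list.
    """
    records = defaultdict(list)
    seq_id, modifiers, chunks = None, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                records[seq_id].append((modifiers, "".join(chunks)))
            parts = line[1:].split(None, 1)
            if not parts or parts[0].startswith("["):
                raise ValueError(f"definition line has no SeqID: {line!r}")
            seq_id = parts[0]
            modifiers = {key.strip(): value.strip()
                         for key, value in MODIFIER_RE.findall(line)}
            if "protein" not in modifiers:
                raise ValueError(f"no protein name given for {seq_id}")
            chunks = []
        else:
            chunks.append(line)
    if seq_id is not None:
        records[seq_id].append((modifiers, "".join(chunks)))
    return dict(records)


# Placeholder residues and a placeholder second product name.
example = """>Seq1 [protein=neuropilin 1] [gene=Nrp1]
MKVLAAGIVG
LLAQSTWE
>Seq1 [protein=placeholder second product] [gene=Nrp1]
MKVLAAGIVG
"""
proteins = parse_protein_fasta(example)
```

#In this sketch both records are stored under "Seq1", mirroring the way two
coding regions are added to the same nucleotide sequence.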
1231
1232 #After you import your sequence, a window will appear with information
1233 about the sequence.  The first line will describe the number of protein
1234 sequences imported and the total length in amino acids of
1235 all sequences. Each sequence is numbered, and its length,
1236 unique identifier (SeqID), Gene symbol, Protein name, and title
1237 (Definition line) as supplied in the FASTA definition line are listed.
1238
1239 >Annotation Page
1240
1241 #Note: This page will not be available if you have selected a segmented
1242 or gapped sequence as the
1243 <A HREF="#SubmissionType">
1244 Submission Type
1245 </A>
1246 .
1247
1248 #On this page, you can add a
1249 <A HREF="#gene">
1250 gene
1251 </A>
1252 ,
1253 <A HREF="#rRNA">
1254 ribosomal RNA
1255 </A>
1256  or
1257 <A HREF="#CDS">
1258 CDS
1259 </A>
1260  feature across the entire span of each sequence you are submitting.
1261 You cannot specify locations within each sequence using this page.
1262 More options are available under the
1263
1264 <A HREF="#AnnotateMenu">
1265 Annotate Menu
1266 </A>
1267 in the record viewer.
1268
1269 #If the feature should be partial at one or both ends, check the
1270 appropriate box and then fill in the text boxes for the relevant
1271 feature.
1272
1273 #You may add a title to all sequences if this was not included in the
1274 FASTA definition line.  This will be used as the DEFINITION field in
1275 the final flatfile.  The title should contain a brief description of
1276 the sequence.  There is a preferred format for nucleotide and protein
1277 titles and Sequin can generate them automatically using the Generate
1278 Definition Line function under the Annotate menu in the record viewer.
1279
1280 >Assembly Tracking
1281
1282 #You will only see this form if you had previously indicated that the
1283 entry is a Third-Party Annotation submission.  You must provide the
1284 GenBank Accession number(s) of the primary sequence used to assemble
1285 your TPA sequence.  We cannot accept primary sequences corresponding
1286 to Reference Sequences or those from proprietary databases.  More
1287 information about this can be found on the
1288
1289 <A HREF="http://www.ncbi.nlm.nih.gov/Genbank/tpa.html">
1290 TPA
1291 </A>
1292 home page.
1293
1294 #If a proper GenBank Accession is entered in the first column of the
1295 Assembly Tracking form, the GenBank staff can map the coordinates for
1296 you.  You do not need to fill out the 'from' and 'to' columns.  Note
1297 that multiple accessions may be entered to provide full coverage of the
1298 assembled sequence.
1299
1300 #You may also generate an Assembly Tracking form in the record viewer
1301 under the Annotate menu.  Select Descriptors and TPA Assembly from the
1302 pull-down menu in order to generate the Assembly Tracking form.
1303
1304 >Editing the Record
1305
1306 *Overview
1307
1308 #After you finish the Organism and Sequences Form, Sequin will process
1309 your entry based on the information you have entered.  The window you
1310 see now is called the record viewer.  This is also the window you will
1311 see if you are submitting an update to an existing record.  The
1312 instructions after this point are the same whether you are submitting a
1313 new record or an update.
1314
1315 #In the default window of the record viewer, you will see your entry
1316 approximately as it would appear in the database.  Most of the
1317 information that you entered earlier in the submission process is
1318 present in the viewer; other information, such as the contact, is still
1319 present in the record but will not be visible in the database entry.  If
1320 you have provided a conceptual translation of the nucleotide sequence,
1321 the translation will be listed as a CDS Feature.  Sequin automatically
1322 determines which nucleotides encode the protein, and lists them,
1323 even if the nucleotide sequence contains introns and exons.
1324
1325 #You can save the entry to a file by selecting Save or Save As under the
1326 File menu.  This is not the same as saving the entry for submission to
1327 the database.  It is a good idea to save the file at this point so that
1328 if you make any unwanted changes during the editing process you can
1329 revert to the original copy.  If you wish to edit the entry later, click
1330 on "Read Existing Record" on the Welcome to Sequin form and choose
1331 the file.
1332
1333 #It is likely that the entry could be processed now for submission to
1334 the database.  However, you may wish to add information to
1335 the entry. This information may be in the form of Descriptors or
1336 Features.  Descriptors are annotations that apply to an
1337 entire sequence, or an entire set of sequences, and Features are
1338 annotations that apply to a specific sequence interval.  For example,
1339 you may want to change the Reference Descriptor to add a published
1340 manuscript, or to annotate the sequence by adding features such as a
1341 signal peptide or polyA signal.
1342
1343 #Information in the record viewer can be edited in different ways.  One
1344 way to modify information is to double click within the block of
1345 information you wish to edit.  Many blocks, such as "Definition",
1346 "Source", "Reference", or "Features" can be edited.
1347
1348 #To add information, create a new descriptor
1349 or feature by selecting the appropriate form from the Misc or Features
1350 menus. These options are described later in this help document.
1351
1352 #Finally, you may need to edit the sequence itself.
1353 <A HREF="#SequenceEditor">
1354 Instructions
1355 </A>
1356 for working with the sequence are presented in the documentation for the
1357 Sequence Editor.
1358
1359 *Submitting the Finished Record to the Database
1360
1361 #Once you are satisfied that you have added all the appropriate
1362 information, you must process your entry for submission to the database.
1363  Select "Validate" under the Search menu.  This function detects
1364 discrepancies between the format of your submission and that required by
1365 the database selected for entry.
1366
1367 #If Sequin detects problems with the format of your record, you will see a
1368 screen listing the validation errors as well as suggestions for how to fix the
1369 discrepancies.  Single clicking on an error message scrolls the record viewer
1370 to the feature that is causing the error.  Double clicking on the error message
1371 launches a new form on which you can enter information to correct the problem.
1372 If you are annotating a set of multiple sequences, shift-click to scroll to the
1373 target sequence and feature.  You can also dismiss the suggestion and proceed
1374 on your own. When you think you have corrected all the problems, click on
1375 "Revalidate".
1376
1377 #Message:  Select Verbose, Normal, Terse, or Table. Verbose gives a more
1378 detailed explanation of the problem.
1379
1380 #Filter:  Select the error messages you wish to see.  You can select
1381 ALL, SEQ_INST (errors regarding the sequence itself, its type, or
1382 length), SEQ_DESCR (descriptor errors), SEQ_FEAT (feature errors), or
1383 errors specific to your record.
1384
1385 #Severity:  Select the types of error messages you wish to see.  You
1386 will see the type of message selected, as well as any messages warning
1387 of more serious problems.
1388
1389 #There are four types of error messages: Info, Warning, Error, and
1390 Reject. Info is the least severe, and Reject is the most severe.  You
1391 may submit the record even if it does contain errors.  However, we
1392 encourage you to fix as many problems as possible.  Note that some
1393 messages may be merely suggestions, not discrepancies.  A possible
1394 Warning message is that a splice site does not match the consensus.
1395 This may be a legitimate result, but you may wish to recheck the
1396 sequence.  A possible Error message is that the conceptual translation
1397 of the sequence that you supplied does not encode an open reading
1398 frame.  In this case, you should check that you translated the sequence
1399 in the correct reading frame.  A possible Reject message is that you
1400 neglected to include the name of the organism from which the sequence
1401 was derived.  The name of the organism is absolutely required for a
1402 database entry.
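
#The Severity setting described above can be pictured as a minimum-level filter. The following is only a sketch in Python, not Sequin's own code; the Severity class, the filter_messages function, and the sample messages are hypothetical names chosen for illustration.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Severity levels in increasing order, as listed in the help text."""
    INFO = 1
    WARNING = 2
    ERROR = 3
    REJECT = 4


def filter_messages(messages, minimum):
    """Keep messages at the selected severity or any more serious level."""
    return [(sev, text) for sev, text in messages if sev >= minimum]


# Hypothetical messages modeled on the examples in the text.
messages = [
    (Severity.INFO, "example informational note"),
    (Severity.WARNING, "splice site does not match the consensus"),
    (Severity.ERROR, "translation does not encode an open reading frame"),
    (Severity.REJECT, "organism name is missing"),
]

for sev, text in filter_messages(messages, Severity.WARNING):
    print(f"{sev.name}: {text}")
```

#With Warning selected, the sketch keeps the Warning, Error, and Reject messages and drops the Info message.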
1403
1404 #If Sequin does not detect any problems with the format of your record,
1405 you will see a message that "Validation test succeeded".
1406
1407 #To prepare the submission, click the "Done" button on the record
1408 viewer, or select "Prepare Submission" under the File menu. You will be
1409 prompted to save the file.  Email this file to the database at the
1410 address shown.  You MUST email the file; Sequin does not submit the
1411 file automatically over the network.  The email addresses for the
1412 databases are:
1413
1414 !-GenBank:  gb-sub@ncbi.nlm.nih.gov
1415 !-EMBL:  datasubs@ebi.ac.uk
1416 !-DDBJ: ddbjsub@ddbj.nig.ac.jp
1417
1418 #After your entry is complete, close the record viewer.  You will be
1419 returned to the Welcome to Sequin form and can begin another entry.
1420
1421 >The Record Viewer
1422
1423 *Target Sequence
1424
1425 #This pop-up menu shows a list of SeqIDs of all nucleotide and protein
1426 sequences associated with the Sequin entry.  Use the menu to select the
1427 sequences displayed in the record viewer, as well as the sequences you
1428 want to "target", that is, the sequences to which you want to apply a
1429 descriptor (see
1430 <A HREF="#Descriptors">
1431 Descriptors
1432 </A>
1433  in the Sequin help documentation).  You may select either an individual
1434 sequence by name or a set of sequences, such as All Sequences, or
1435 SEG_dna if you have a segmented nucleotide set.  You may change the
1436 selection at any time.
1437
1438 *Display Format
1439
1440 #You may change the display format of the record viewer to any of the
1441 formats described below. Editing a field in one display format will
1442 change that field in all formats.  Subsequent pop-up menus will appear
1443 depending on which format is selected.
1444
1445 **GenBank
1446
1447 #This display format allows you to see the submission as it would appear
1448 as a GenBank or DDBJ entry.  It is the default format.
1449
1450 #The Mode pop-up default setting is Sequin.  Release mode shows certain
1451 qualifiers and db_xrefs in RefSeq entries which are non-collaborative.
1452 Entrez mode is used for web display and can show new elements that have
1453 not yet finished their four-month quarantine period. Dump mode requires
1454 that the accession slot be populated.  In most cases, there is no need
1455 to change from the default Sequin mode.
1456
1457 #The Style pop-up allows different views of segmented records.  The
1458 default is Normal.  Segment style is the traditional representation of
1459 segmented sequences, while Contig style displays a CONTIG line with a
1460 join of accessions instead of raw sequence.  Master style shows
1461 features mapped to the segmented sequence coordinates instead of the
1462 coordinates of the individual parts.
1463
1464 **Graphic
1465
1466 #This display format shows the entry in a graphical view.  The top bar
1467 represents the nucleotide sequence.  Lower arrows or bars represent
1468 different features on the sequence.  Double click on an arrow or bar to
1469 launch the appropriate editing window. Any sequence highlighted in the
1470 Sequence Editor will be boxed on the graphical view of the sequence.
1471 To see a graphical representation of a segmented set (see
1472
1473 <A HREF="#Submissiontype">
1474 Submission type
1475 </A>,
1476 above), the Target Sequence must be set to
1477 SEG_dna.
1478
1479 #The Style pop-up menu allows you to see the display in different styles
1480 and colors.
1481
1482 #The Scale pop-up menu allows you to see the display in different sizes.
1483 The smaller the number, the larger the display.
1484
1485 **Sequence
1486
1487 #This display format shows the nucleotide sequence in the record along
1488 with any annotated features (such as CDS or mRNA).  You can only view a
1489 single sequence at a time with this option.  You can use the Features
1490 pop-up menu to change the display of the features.  With the numbering
1491 pop-up menu, select where you want the sequence numbers to be
1492 indicated: at the side of the window, at the top of each sequence line,
1493 or not at all.
1494
1495 **Alignment
1496
1497 #This display format shows sets of aligned sequences, such as those
1498 imported as part of a population, phylogenetic, mutation, or
1499 environmental samples set. When toggled to All Sequences in the Target
1500 Sequence pop-up, the alignment of all entries will be displayed.  To
1501 more closely analyze similarities, you can select a single entry in the
1502 Target Sequence pop-up.  The complete sequence of the entry selected
1503 will be displayed.  Nucleotides in the other sequences that differ
1504 from the selected sequence will be shown as letters, while identical
1505 nucleotides will be shown as periods.  You can also display features annotated on
1506 the selected target sequence or all sequences using the Feature display
1507 toggle.  To launch the alignment editor, select
1508 <A HREF="#AlignmentAssistant">
1509 Alignment Assistant
1510 </A>
1511 from the record viewer Edit menu.
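
#The period convention described above can be sketched in a few lines of Python. This is an illustration only, assuming that all sequences are already aligned to the same length; the function name dot_view and the sample sequences are hypothetical, and Sequin's actual display code is not shown here.

```python
def dot_view(target, others):
    """Return display rows: the target in full, then each other sequence
    with positions identical to the target replaced by a period."""
    rows = [target]
    for seq in others:
        row = "".join("." if a == b else b for a, b in zip(target, seq))
        rows.append(row)
    return rows


# Hypothetical aligned sequences of equal length.
for row in dot_view("ACGTAC", ["ACGAAC", "TCGTAC"]):
    print(row)
```

#Only the positions that differ from the target remain as letters, which makes variable sites easy to spot.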
1512
1513 **EMBL
1514
1515 #This display format allows you to see the submission as it would appear
1516 as an EMBL entry.
1517
1518 **Table
1519
1520 #This display format shows the annotation in a five-column, tab-delimited
1521 <A HREF="table.html">table</A>
1522  format. This format can be imported to add annotation to a record that
1523 has none.
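
#As a rough illustration of the five-column layout, the Python sketch below writes a small table. It assumes the usual arrangement of such tables: a ">Feature" header line with the SeqID, then lines of start, stop, and feature key, with qualifier name and value on following lines indented by three tabs. The function name feature_table, the SeqID, and the feature values are hypothetical; consult the linked table page for the authoritative description.

```python
def feature_table(seq_id, features):
    """Build a tab-delimited feature table as a single string."""
    lines = [f">Feature {seq_id}"]
    for feat in features:
        intervals = feat["intervals"]
        first_start, first_stop = intervals[0]
        # Columns 1-3: start, stop, feature key.
        lines.append(f"{first_start}\t{first_stop}\t{feat['key']}")
        # Additional intervals of the same feature carry only start and stop.
        for start, stop in intervals[1:]:
            lines.append(f"{start}\t{stop}")
        # Columns 4-5: qualifier name and value, after three empty columns.
        for name, value in feat.get("qualifiers", []):
            lines.append(f"\t\t\t{name}\t{value}")
    return "\n".join(lines) + "\n"


table = feature_table(
    "lcl|Seq1",
    [
        {"key": "gene", "intervals": [(1, 200)], "qualifiers": [("gene", "abc")]},
        {"key": "CDS", "intervals": [(1, 200)],
         "qualifiers": [("product", "example protein")]},
    ],
)
print(table, end="")
```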
1524
1525 **FASTA
1526
1527 #This display shows the sequence and Definition line only, without any
1528 annotations, in a format called the FASTA format.  This is a format used
1529 by many molecular biology analysis programs.  You cannot edit in this
1530 display mode.
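
#For readers unfamiliar with FASTA, the Python sketch below writes a record in that general shape: a header line beginning with ">" that carries the SeqID and Definition line, followed by the sequence wrapped to a fixed width. The function name to_fasta, the width of 60, and the sample values are assumptions for illustration, not the exact output of Sequin.

```python
def to_fasta(seq_id, title, sequence, width=60):
    """Format one sequence as FASTA text with wrapped sequence lines."""
    header = f">{seq_id} {title}".rstrip()
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header] + lines) + "\n"


# Hypothetical SeqID, title, and sequence.
print(to_fasta("Seq1", "Example title", "ACGT" * 20), end="")
```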
1531
1532 **Quality
1533
1534 #This display format shows quality score data if it has been included in
1535 the submission.
1536
1537 **ASN.1
1538
1539 #This display shows the entry in Abstract Syntax Notation 1, a data
1540 description language used by the NCBI.  You cannot edit in this display
1541 mode.
1542
1543 **XML
1544
1545 #This display format shows the entry in XML, a markup language used by
1546 some databases.  You cannot edit in this display mode.
1547
1548 **INSDSeq
1549
1550 #This display format shows the entry in the XML format used by the INSD.
1551  You cannot edit in this display mode.
1552
1553 **Desktop
1554
1555 #The NCBI DeskTop displays the internal
1556 structure of the record being viewed in Sequin.  The
1557 <A HREF="#NCBIDeskTop">
1558 DeskTop
1559 </A>
1560 is explained under the Misc menu.
1561
1562 *Done
1563
1564 #This button allows you to validate the entry when you are finished with
1565 the submission.  See
1566 <A HREF="#SubmittingtheFinishedRecordtotheDatabase">
1567 Submitting the Finished Record to the Database
1568 </A>
1569  in the Sequin help documentation.
1570
1571 *Controls for Downloaded Entries
1572
1573 #If you have downloaded a sequence from Entrez, you will see an
1574 additional button labeled PubMed.  This button will launch a web
1575 browser containing the target sequence as it appears in Entrez.  From
1576 here, you can access any Entrez-supported Links, including related
1577 sequences and associated references in PubMed.
1578
1579 >Descriptors
1580
1581 *Overview
1582
1583 #Descriptors are annotations that apply to an entire sequence, or an
1584 entire set of sequences, in a given entry.  They do not have a specific
1585 location on a sequence, as they apply to the entire sequence.  They can
1586 be contrasted to
1587 <A HREF="#Features">
1588 Features,
1589 </A>
1590 which apply to a specific interval of the sequence.
1591
1592 #You may edit descriptors in one of two ways.
1593
1594 #(1) In the record viewer, double click within the text of the
1595 descriptor to bring up a form on which information can be added.
1596
1597 #(2) Choose the option Descriptors from the Annotate menu.
1598
1599 *Annotate Menu
1600
1601 #This menu allows you either to create new descriptors or to modify
1602 existing ones.  Select the descriptor that you wish to modify.
1603
1604 #When you first select a descriptor, you will see a window called
1605 "Descriptor Target Control".  Using the target control pop-up menu,
1606 select the sequences you wish this descriptor to cover.  The name(s)
1607 listed correspond to the SeqID(s) given to the nucleotide or amino acid
1608 sequences when they were imported into Sequin.  The default
1609 selection for this menu is set in the Target Sequence pop-up menu on
1610 the record viewer.  You may choose to have the descriptor cover just
1611 one sequence, or a set of sequences in your entry.  If you are creating
1612 a new descriptor, select "Create New".  If you wish to modify a
1613 previous descriptor, select "Edit Old".
1614
1615 #The following is a list of some of the descriptors that can be added.
1616 Two additional descriptors, those for
1617 <A HREF="#Publications">
1618 Publications
1619 </A>
1620 and
1621 <A HREF="#BiologicalSourceDescriptororFeature">
1622 Biological Source,
1623 </A>
1624 are described in other sections.
1625
1626 **TPA Assembly
1627
1628 #If you indicated that your sequence is a TPA submission, a
1629 <A HREF="#AssemblyTracking">
1630 TPA Assembly
1631 </A>
1632  was created from the information regarding primary accession numbers.
1633 This Assembly information can be edited here.  Note that it is not
1634 necessary to enter nucleotide location in the "from" and "to" columns.
1635
1636 **Update Date
1637
1638 #This is for database staff use only.  Please do not modify the date.
1639
1640 **Create Date
1641
1642 #This is for database staff use only.  Please do not modify the date.
1643
1644 **Region
1645
1646 #This descriptor provides general information about the genetic context
1647 of the sequence.  For example, if your nucleotide sequence is cloned
1648 from the region surrounding the Huntington's Disease gene, you could
1649 enter that information here.  Providing information for this descriptor
1650 is optional.
1651
1652 **Name
1653
1654 #Alternative place for a descriptive name for the sequence.  This
1655 information will not appear in the flatfile view, but will be
1656 maintained in the ASN.1.
1657
1658 **Comment
1659
1660 #This descriptor is used to list any additional information that you
1661 wish to provide about the sequence. Use of this descriptor is optional.
1662  Most information can be better annotated using the appropriate
1663 features and qualifiers rather than a generic comment descriptor.
1664
1665 **Title
1666
1667 #This descriptor contains the information that will go on the Definition
1668 line of the database entry.  If you supplied a title for your
1669 nucleotide sequence when you imported it into Sequin, that information
1670 is here.  If you wish to change the Definition line, or if you did not
1671 supply a title when you submitted the sequence, edit this Descriptor.
1672 For more information on creating proper Definition lines, please see
1673 the Sequin help documentation for the
1674
1675 <A HREF="#NucleotideDefinitionLine(Title)">
1676 Nucleotide Definition Line (Title)
1677 </A>.
1678
1679 **Molecule Description
1680
1681 #This descriptor indicates the characteristics of the molecule from
1682 which the sequence was derived. The information that you have already
1683 entered can be edited here.  In most cases, the molecule and class are
1684 the only choices which should be edited from the default values.
1685
1686 ***Molecule
1687
1688 #A GenBank sequence can represent one of several different molecule
1689 types. Enter in the Molecule pop-up menu the type of molecule that was
1690 sequenced.  Brief descriptions of the choices in this pop-up menu were
1691 given previously.
1692
1693 ***Completedness
1694
1695 #Choose the appropriate option from the pop-up menu.
1696
1697 #-Complete:  Use this designation when a complete molecule, such as a
1698 complete mitochondrial genome, is being submitted.
1699
1700 #-Partial:  Use this designation when an incomplete unit, such as the
1701 partial coding sequence of a gene, is being submitted.
1702
1703 #-No left:  Use this designation when an incomplete unit, such as the
1704 partial coding sequence of a gene, or a partial protein sequence, is
1705 being submitted.  The sequence has no left if it is incomplete on the
1706 5', or amino-terminal, end.
1707
1708 #-No right:  Use this designation when an incomplete unit, such as the
1709 partial coding sequence of a gene, or a partial protein sequence, is
1710 being submitted.  The sequence has no right if it is incomplete on the
1711 3', or carboxy-terminal, end.
1712
1713 #-No ends:  Use this designation when an incomplete unit, such as the
1714 partial coding sequence of a gene, or a partial protein sequence, is
1715 being submitted.  The sequence has no ends if it is incomplete at both
1716 the 5' and 3', or amino- and carboxy- terminal, ends.
1717
1718 #-Other:  Use this designation when none of the above descriptions apply.
1719
1720 ***Technique
1721
1722 #From the pop-up menu, select the technique that was used to generate the
1723 sequence.
1724
1725 #-Standard: standard sequencing technique.
1726
1727 #-EST:
1728 <A HREF="http://www.ncbi.nlm.nih.gov/dbEST/index.html">
1729 Expressed Sequence Tag
1730 </A>
1731 : single-pass, low-quality mRNA sequences
1732 derived from cDNAs.  These sequences will appear in the EST division.
1733
1734 #-STS:
1735 <A HREF="http://www.ncbi.nlm.nih.gov/dbSTS/index.html">
1736   Sequence Tagged Site
1737 </A>
1738 : short sequences that are operationally
1739 unique in a genome and that define a specific position on the physical
1740 map.  These sequences will appear in the STS division.
1741
1742 #-Survey:
1743 <A HREF="http://www.ncbi.nlm.nih.gov/dbGSS/index.html">
1744 single-pass genomic sequence
1745 </A>
1746 .  These sequences will appear in
1747 the Genome Survey Sequence (GSS) division.
1748
1749 #-Genetic Map: Genetic map information, for example, in the Genomes division.
1750
1751 #-Physical Map: Physical map information, for example, in the Genomes division.
1752
1753 #-Derived: A sequence assembled into a contig from shorter sequences.
1754
1755 #-Concept-trans: A protein translation generated with the appropriate
1756 genetic code.
1757
1758 #-Seq-pept: Protein sequence was generated by direct sequencing of a
1759 peptide.
1760
1761 #-Both: Protein sequence was generated by conceptual translation and
1762 confirmed by peptide sequencing.
1763
1764 #-Seq-pept-Overlap: Protein sequence was generated by sequencing
1765 multiple peptides, and the order of peptides was determined by overlap
1766 in their sequences.
1767
1768 #-Seq-pept-Homol: Protein sequence was generated by sequencing
1769 multiple peptides, and the order of peptides was determined by homology
1770 with another protein.
1771
1772 #-Concept-Trans-A: Conceptual translation of the nucleotide sequence
1773 provided by the author of the entry.
1774
1775 #-HTGS 0:
1776 <A HREF="http://www.ncbi.nlm.nih.gov/HTGS/">
1777 High Throughput Genome Sequence
1778 </A>
1779 , Phase 0.  These sequences
1780 are produced by high-throughput sequencing projects and will be in the
1781 HTG division.
1782
1783 #-HTGS 1:
1784 <A HREF="http://www.ncbi.nlm.nih.gov/HTGS/">
1785 High Throughput Genome Sequence
1786 </A>
1787 , Phase 1.  These sequences
1788 are produced by high-throughput sequencing projects and will be in the
1789 HTG division.
1790
1791 #-HTGS 2:
1792 <A HREF="http://www.ncbi.nlm.nih.gov/HTGS/">
1793 High Throughput Genome Sequence
1794 </A>
1795 , Phase 2.  These sequences
1796 are produced by high-throughput sequencing projects and will be in the
1797 HTG division.
1798
1799 #-HTGS 3:
1800 <A HREF="http://www.ncbi.nlm.nih.gov/HTGS/">
1801 High Throughput Genome Sequence
1802 </A>
1803 , Phase 3.  These sequences
1804 are produced by high-throughput sequencing projects and will be in the
1805 HTG division.
1806
1807 #-FLI_cDNA: Full Length Insert cDNA.  Sequence corresponds to entire cDNA but
1808 not necessarily entire transcript. These sequences are produced by large
1809 sequencing projects.
1810
1811 #-HTC: High Throughput cDNA. These sequences are produced by large sequencing
1812 projects.
1813
1814 #-WGS:
1815 <A HREF="http://www.ncbi.nlm.nih.gov/Genbank/wgs.html">
1816 Whole Genome Shotgun
1817 </A>
1818 .  These sequences are produced by large sequencing projects and follow a
1819 separate submission process.
1820
1821 #-Barcode: Nucleotide sequence is part of Barcodes of Life project.  This
1822 selection should only be used by members of the Consortium for the
1823 Barcodes of Life.
1824
1825 #-Other: Do not use this designation.
1826
1827 ***Class
1828
1829 #From the pop-up menu, select the type of molecule that was sequenced.
1830
1831 #-DNA:  DNA
1832
1833 #-RNA:  RNA
1834
1835 #-Protein:  Protein
1836
1837 #-Nucleotide: Do not select this item
1838
1839 #-Other:  Do not select this item
1840
1841 ***Topology
1842
1843 #From the pop-up menu, select the topology of the sequenced molecule.
1844
1845 #-Linear:  Linear molecule (most sequences).
1846
1847 #-Circular:  Circular molecule (such as a complete plasmid or mitochondrion).
1848
1849 #-Tandem:  Do not select this item.
1850
1851 #-Other:  Do not select this item.
1852
1853 ***Strand
1854
1855 #From the pop-up menu, select whether the sequence was derived from an
1856 organism with a single- or double-stranded genome.  This is used primarily for
1857 viral submissions.
1858
1859 #-Single:  The organism contains only a single-stranded genome, for
1860 example, ssRNA viruses.
1861
1862 #-Double:  The organism contains only a double-stranded genome, for
1863 example, dsDNA viruses.
1864
1865 #-Mixed:  Do not select this item.
1866
1867 #-Mixed Rev:  Do not select this item.
1868
1869 #-Other:  Do not select this item.
1870
1871 **Biological Source
1872
1873 #The Biological Source descriptor is described in more detail
1874 <A HREF="#BiologicalSourceDescriptororFeature">
1875 below.
1876 </A>
1877
1878 >Features
1879
1880 *Overview
1881
1882 #Features are annotations which apply to one or more intervals on a
1883 sequence. They can be contrasted to
1884 <A HREF="#Descriptors">
1885 Descriptors,
1886 </A>
1887 which apply to an entire sequence or an entire set of sequences.
1888 Features will be added to the Target Sequence selected in the record
1889 viewer pop-up menu.
1890
1891 #You may add or modify features in one of three ways.
1892
1893 #(1) In the record viewer, double click on the text of an existing
1894 feature to bring up a form on which information can be added or edited.
1895
1896 #(2) Choose the feature from the Annotate menu to add a new feature.
1897
1898 #(3) Choose the feature from the Sequence Editor Features menu to add a
1899 new feature.
1900
1901 #The features listed in the Annotate menu and the Sequence Editor
1902 Features menu are identical, and the instructions for adding them are
1903 the same, with one exception.  If you annotate them in the Annotate
1904 menu, you must provide the nucleotide sequence location of the feature.
1905  However, if you add features from the Sequence Editor, you can
1906 highlight the sequence that the feature covers, and the location of the
1907 sequence will be automatically entered in the feature location box.
1908
1909 *Annotate Menu
1910
1911 #This menu allows you to add or modify features on the sequence selected
1912 in the Target Sequence pop-up menu of the record viewer.  Features are
1913 grouped into six categories.  Select the feature that you would like to
1914 mark on your sequence.  A new form will appear.
1915
1916 #Feature forms share a common design.  The first page is specific to the
1917 particular feature, e.g., Coding Region or Gene.  The second page lists
1918 Properties of the Feature.  The third page describes the Location of the
1919 feature.  Details about the common second and third pages are provided
1920 below.
1921
1922 **Properties Page
1923
1924 ***General Subpage
1925
1926 #Enter general comments about the feature here.
1927
1928 #Select any of the flags if necessary.  If this sequence contains only a
1929 partial representation of the feature you are describing, check the
1930 "Partial" box.  Check the "Exception" box if the feature annotates a
1931 post-transcriptional modification of the nucleotide sequence, such as
1932 ribosomal slippage or RNA editing.  This is generally used only on CDS
1933 features.  The evidence dialogs will only be editable if information
1934 has been entered in the Evidence subpage.
1935
1936 #If a gene feature overlaps the feature you are editing, the gene symbol
1937 will appear in the pull-down menu.  If you want to add the name of a
1938 new gene, select new, and enter its name and optional description.  By
1939 default, mapping between the feature and the gene is done by overlap,
1940 that is, the gene associated with the feature is the gene whose
1941 location overlaps with the location of the feature.   Under some
1942 circumstances, for example, if the sequences of two genes overlap, you
1943 may wish the feature to apply to a different gene.  In this case,
1944 select cross-reference, and select the name of the new gene in the
1945 pop-up menu. If you do not want the feature to map to any existing
1946 gene, select suppress.  You may also edit information on the Gene
1947 feature form by clicking on Edit Gene Feature.
1948
1949 ***Comment Subpage
1950
1951 #Add any comments about the feature here, especially if you checked the
1952 "Exception" box on the General Subpage.
1953
1954 ***Citations Subpage
1955
1956 #This page is used to list any citations that specifically apply to the
1957 feature you are annotating.  The citation must have already been entered
1958 into the record (see
1959 <A HREF="#Publications">
1960 Publications
1961 </A>
1962  in the Sequin help documentation).  Click on Edit Citations, and
1963 place a check mark in the box next to the publication you want to cite.
1964 However, we discourage the use of citations on features.
1965
1966 ***Cross-Refs Subpage
1967
1968 #This is a read-only page used to cross-reference this entry to entries
1969 in external databases (databases other than GenBank, EMBL/EBI, and
1970 DDBJ), such as dbEST or FLYBASE.  For more information on this topic,
1971 see the International Nucleotide Sequence Database Collaboration
1972
1973 <A HREF="http://www.ncbi.nlm.nih.gov/collab/db_xref.html">
1974 page
1975 </A>.
1976 http://www.ncbi.nlm.nih.gov/collab/db_xref.html
1977
1978 ***Evidence Subpage
1979
1980 #This page is primarily used by large sequencing centers to explain
1981 annotation prediction methods and its use is optional.  More details
1982 about these qualifiers can be found in the
1983 <A HREF="http://www.ncbi.nlm.nih.gov/GenBank/evidence.html">
1984 genome submission guidelines
1985 </A>.
1986 The two choices of evidence are Experiment or Inference.
1987
1988 #Wet-bench, experimental evidence can be entered as free text in the
1989 Experiment section.  Please be as brief as possible.
1990
1991 #The Inference section allows for information to be added in cases where
1992 the feature is annotated based solely on sequence similarity or
1993 prediction software. In order to fill in text, you must select one of
1994 the options from the Category pull-down menu.  Different pull-down and
1995 text boxes will appear depending on the selection you choose from the
1996 Category menu.  If you select one of the 'similar to' categories, you
1997 must include the name of the database and the corresponding accession
1998 number of the sequence used as the basis for the annotation.  If you
1999 choose one of the prediction categories, you must include the name and
2000 version of the prediction program used as the basis for the annotation.
2001
2002 #For example, if your annotation of a coding region was based on
2003 similarity to the sequence and annotation in GenBank Accession number
2004 AY411252, you would select "similar to DNA sequence" from the pull-down
2005 menu and then select "INSD" in the Database pull-down.  You would then
2006 type "AY411252.1" in the Accession text box.  If the annotation is
2007 based on the Genscan prediction algorithm, you would select "ab initio
2008 prediction" from the pull-down menu, select "Genscan" in the Program
2009 pull-down and enter 2.0 in the Program Version text box.  If the
2010 database or program used is not listed in the appropriate pull-down
2011 list, select Other from the list.  A new text box will appear where you
2012 can enter the name of the database or program used.  You still must
2013 include the appropriate accession number or version in the subsequent
2014 text box.
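
#The two examples above each combine three pieces of information: a category, a database or program, and an accession or version. The Python sketch below joins them into a colon-separated inference string of the kind used in flatfile /inference qualifiers. The function name inference_qualifier is hypothetical, and how Sequin stores the value internally is not shown here.

```python
def inference_qualifier(category, source, identifier):
    """Join category, database or program, and accession or version.

    All three parts are required, mirroring the form described in the text.
    """
    for part in (category, source, identifier):
        if not part:
            raise ValueError("category, source, and identifier are all required")
    return f'/inference="{category}:{source}:{identifier}"'


print(inference_qualifier("similar to DNA sequence", "INSD", "AY411252.1"))
print(inference_qualifier("ab initio prediction", "Genscan", "2.0"))
```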
2015
2016 ***Identifiers Subpage
2017
2018 #This is a read-only page used by the database staff for tracking
2019 features within the record.
2020
2021 **Location Page
2022
2023 #This page allows you to select the location of the feature you are
2024 citing. Each feature must have a sequence interval associated with it.
2025 In most cases, Sequin will limit the option to the nucleic acid or
2026 protein sequence as appropriate.
2027
2028 #Check the 5' Partial or 3' Partial box if the feature in your nucleic
2029 acid sequence is missing residues at the 5' or 3' ends, respectively.
2030 Check the NH2 Partial or COOH Partial if the feature in your amino acid
2031 sequence is missing residues at the amino- or carboxy-terminal ends,
2032 respectively.  If you checked "Partial" on the Properties page, you
2033 must check the 5' Partial box, the 3' Partial box, or both.
2034
2035 #Enter the sequence range of the feature.  The numbers should correspond
2036 to the nucleotide sequence interval if the SeqID is set to a nucleotide
2037 sequence, and to an amino acid sequence interval if the SeqID is set to
2038 a protein sequence.  If the feature spans multiple non-contiguous
2039 intervals on the sequence, indicate the beginning and end points of each
2040 interval. If each interval is separate, and should not be joined with
2041 the others to describe the feature, check the Intersperse intervals with
2042 gaps box (for example, when annotating multiple primer binding sites).
2043 If the feature is composed of several intervals that should all be
2044 joined together, do not check the box (for example, when annotating mRNA
2045 on a genomic DNA sequence).
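
#To show how these choices affect the location, the Python sketch below builds a flatfile-style location string for a plus-strand feature. It assumes the feature table conventions of "<" and ">" for 5' and 3' partial ends, join() for intervals that are joined, and order() for intervals that are not joined. That the Intersperse intervals with gaps box corresponds to order() is an assumption, and the function name location_string is hypothetical.

```python
def location_string(intervals, partial5=False, partial3=False, joined=True):
    """Format 1-based, inclusive plus-strand intervals as a location string."""
    parts = []
    last = len(intervals) - 1
    for i, (start, stop) in enumerate(intervals):
        left = "<" if partial5 and i == 0 else ""      # 5' partial marker
        right = ">" if partial3 and i == last else ""  # 3' partial marker
        parts.append(f"{left}{start}..{right}{stop}")
    if len(parts) == 1:
        return parts[0]
    operator = "join" if joined else "order"
    return f"{operator}({','.join(parts)})"


print(location_string([(1, 50), (100, 200)]))                # mRNA on genomic DNA
print(location_string([(10, 30), (60, 80)], joined=False))   # separate primer sites
print(location_string([(1, 200)], partial5=True, partial3=True))
```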
2046
2047 #For nucleic acid Features only:  From the pop-up menu, select the
2048 strand on which the feature is found.
2049
2050 #-Plus:  Plus strand, or coding strand.
2051
2052 #-Minus:  Minus strand, or non-coding strand.
2053
2054 #-Both:  Both strands.
2055
2056 #-Reverse:  Do not select this item.
2057
2058 #-Other:  Do not select this item.
2059
2060 #Use the pop-up menu to select the SeqID of the sequence you are
2061 describing by the location.
2062
2063 #If you are working on a set of sequences which contain an alignment,
2064 you will see a toggle at the bottom of the Location Page where you can
2065 select to add or view the location of the feature using the Sequence
2066 Coordinates of the target sequence or the Alignment Coordinates.  In
2067 either case, the feature will only be added to the target sequence.  If
2068 you want to add features to all members of the set using the alignment
2069 coordinates, you must use the
2070
2071 <A HREF="http://www.ncbi.nlm.nih.gov/Sequin/sequin.hlp.html#Workingwithsetsofalignedsequences">
2072 Alignment Assistant
2073 </A>
2074 .
2075 #A brief description of the available features follows.  A detailed
2076 explanation of how to use the coding region (CDS) feature is included.
2077 The DDBJ/EMBL/GenBank feature table definition
2078 <A HREF="http://www.ncbi.nlm.nih.gov/collab/FT/index.html">
2079 page
2080 </A>
2081 http://www.ncbi.nlm.nih.gov/collab/FT/index.html
2082  provides detailed information about other features.
2083
2084 *attenuator
2085
2086 #1) region of DNA at which regulation of termination of transcription
2087 occurs, which controls the expression of some bacterial operons; 2)
2088 sequence segment located between the promoter and the first structural
2089 gene that causes partial termination of transcription.
2090
2091 *C_region
2092
2093 #Constant region of immunoglobulin light and heavy chains, and T-cell
2094 receptor alpha, beta, and gamma chains.  Includes one or more exons,
2095 depending on the particular chain.
2096
2097 *CAAT_signal
2098
2099 #CAAT box; part of a conserved sequence located about 75 bp upstream of
2100 the start point of eukaryotic transcription units that may be involved
2101 in RNA polymerase binding; consensus=GG(C or T)CAATCT.
2102
2103 *CDS
2104
2105 #coding sequence; sequence of nucleotides that corresponds with the
2106 sequence of amino acids in a protein (location includes stop codon).
2107 Feature includes amino acid conceptual translation.
2108
2109 **Coding Region Page
2110
2111 #Most users add a coding region to their sequence when they fill out the
2112 Organism and Sequences form.  However, you may need to edit the coding
2113 region, or add additional ones.  To add one, choose CDS under the Coding Regions
2114 and Transcripts submenu of the Features menu; to edit an existing
2115 CDS, double click on it in the record viewer. If you appended the partial
2116 sequence of a coding region to the Organism and Sequences form, you will
2117 probably need to edit the Coding Region feature to avoid validation
2118 error messages about the location of the coding region.
2119
2120 ***General (Product) Subpage
2121
2122 #Choose the genetic code that should be used to translate the
2123 nucleotide sequence.  For more information, and for the translation
2124 tables themselves, see the NCBI Taxonomy
2125 <A HREF="http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=c">
2126 page
2127 </A>.
2128 If the genetic code is already populated from the taxonomy database, do
2129 not change this selection.
2130
2131 #Choose the reading frame in which to translate the sequence.  Do not
2132 fill in the Protein Product or SeqID selections.
2133
2134 #Supply additional information about the protein by clicking on Edit
2135 Protein Information to launch the Protein feature forms. The protein
2136 name must have already been filled out on the Protein subpage.
2137
2138 #Checking retranslate on accept will translate the nucleotide sequence
2139 according to the interval(s) indicated on the Locations page when you
2140 click on Accept to exit the editor.  This new translation will replace
2141 any earlier translations you have supplied.  This should not be a
2142 problem if the interval was indicated appropriately.
2143
2144 #If the coding sequence that you supply is a partial sequence and you
2145 have checked a Partial box on the Location subpage, it is a good idea to
2146 check the Synchronize Partials box.  In this case, Sequin will ensure
2147 that all other appropriate features (such as protein) are also marked as
2148 partial.
2149
2150 #When editing existing CDS features, choose the sequence you want to
2151 view by selecting its name under the Product pop-up menu.  You may also
2152 import a new protein sequence by selecting Import Protein FASTA under
2153 the File menu. The sequence should be formatted as described above for
2154 the Organism and Sequences form.
2155
2156 #After you have imported a protein sequence, click on Predict Interval.
2157 This function will predict the interval on the nucleotide sequence to
2158 which the coding region applies. If you do not select this function,
2159 the interval will likely be wrong, and you will get an error message
2160 when you attempt to validate the record. If your sequence is a 5' or 3'
2161 partial, you must first indicate this manually on the Location Page.
2162
2163 #You may also have Sequin generate the protein sequence from the
2164 nucleotide sequence by clicking on Translate Product. However, you must
2165 first indicate the location and partialness of the coding region on the
2166 Location page in order to obtain the correct translation.
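
#As a rough illustration of what the genetic code and reading frame
selections control, the following Python sketch translates a nucleotide
sequence with the Standard code (translation table 1).  It is not Sequin's
implementation.  The function name translate, its parameters, and the example
sequence are hypothetical, and the sketch assumes a single plus-strand
interval, ignores a trailing partial codon, and reports any codon it does not
recognize as X.

```python
from itertools import product

BASES = "TCAG"
# Amino acids of the Standard code (translation table 1), with codons
# ordered TTT, TTC, TTA, TTG, TCT, ... (third base varies fastest).
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AAS)}


def translate(seq, frame=1, code=STANDARD_CODE):
    """Translate seq starting in reading frame 1, 2, or 3.

    '*' marks a stop codon; 'X' marks a codon not found in the table.
    """
    if frame not in (1, 2, 3):
        raise ValueError("frame must be 1, 2, or 3")
    seq = seq.upper().replace("U", "T")
    protein = []
    # Step through complete codons only; a trailing partial codon is skipped.
    for i in range(frame - 1, len(seq) - 2, 3):
        protein.append(code.get(seq[i:i + 3], "X"))
    return "".join(protein)
```

Calling translate("ATGGCCTAA") reads the codons ATG, GCC, and TAA, while
translate("ATGGCCTAA", frame=2) starts one base later and reads TGG and CCT,
which shows how a wrong reading frame selection changes the conceptual
translation.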
2167
2168 ***Protein Subpage
2169
2170 #Use this page to enter or edit a name or description of the protein
2171 product.  For a new sequence, enter information directly into the
2172 boxes.  You can edit descriptions of an existing sequence by clicking
2173 on Edit Protein Feature, which will bring up the Protein feature form.
2174 The Launch Product Viewer displays the flatfile view of the protein
2175 record generated from the information in the CDS feature.
2176
2177 ***Exceptions Subpage
2178
2179 #Exceptions describe places where there is a posttranslational
2180 modification. Enter the amino acid position at which the modification
2181 occurs, and select the amino acid that is actually represented in the
2182 protein from the pop-up list. Sequin will change the amino acid number
2183 to a nucleotide interval.  Please provide some explanation for the
2184 exception in a comment.
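
#The conversion from an amino acid number to a nucleotide interval is simple
arithmetic when the coding region is a single interval on the plus strand.
The Python sketch below illustrates that case only.  The function name
codon_interval and its parameters are hypothetical, coordinates are assumed to
be 1-based and inclusive, and coding regions with several intervals or on the
minus strand are not handled.

```python
def codon_interval(aa_pos, cds_start, frame=1):
    """Return the (start, stop) nucleotide interval of one codon.

    aa_pos    -- 1-based amino acid position in the protein
    cds_start -- 1-based nucleotide position where the CDS location begins
    frame     -- reading frame (1, 2, or 3) of the first complete codon
    """
    if aa_pos < 1:
        raise ValueError("aa_pos must be 1 or greater")
    if frame not in (1, 2, 3):
        raise ValueError("frame must be 1, 2, or 3")
    # Skip frame-1 leading bases, then three bases per preceding residue.
    first = cds_start + (frame - 1) + 3 * (aa_pos - 1)
    return first, first + 2
```

For a coding region that begins at nucleotide 101 in frame 1, amino acid 10
maps to nucleotides 128 through 130.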
2185
2186 *conflict
2187
2188 #Independent determinations of the "same" sequence differ at this site
2189 or region.
2190
2191 *D-loop
2192
2193 #Displacement loop; a region within mitochondrial DNA in which a short
2194 stretch of RNA is paired with one strand of DNA, displacing the
2195 original partner DNA strand in this region; also used to describe the
2196 displacement of a region of one strand of duplex DNA by a single
2197 stranded invader in the reaction catalyzed by RecA protein.
2198
2199 *D_segment
2200
2201 #Diversity segment of immunoglobulin heavy chain, and T-cell receptor
2202 beta chain.
2203
2204 *enhancer
2205
2206 #A cis-acting sequence that increases the utilization of (some)
2207 eukaryotic promoters and can function in either orientation and in any
2208 location (upstream or downstream) relative to the promoter.
2209
2210 *exon
2211
2212 #Region of genome that codes for a portion of spliced mRNA; may contain
2213 5' UTR, all CDSs, and 3' UTR.
2214
2215 *gap
2216
2217 #Gap in the sequence, only applied to gaps of unknown length.  The
2218 location span of the gap feature is 100 base pairs, indicated by 100 "n"s
2219 in the sequence.  The qualifier /estimated_length=unknown is mandatory.
2220
2221 *GC_signal
2222
2223 #GC box; a conserved GC-rich region located upstream of the start point
2224 of eukaryotic transcription units that may occur in multiple copies or
2225 in either orientation; consensus=GGGCGG.
2226
2227 *gene
2228
2229 #Region of biological interest identified as a gene and for which a name
2230 has been assigned.
2231
2232 *iDNA
2233
2234 #Intervening DNA; DNA which is eliminated through any of several kinds
2235 of recombination.
2236
2237 *intron
2238
2239 #A segment of DNA that is transcribed, but removed from within the
2240 transcript, by splicing together the sequences (exons) on either side of
2241 it.
2242
2243 *J_segment
2244
2245 #Joining segment of immunoglobulin light and heavy chains, and T-cell
2246 receptor alpha, beta, and gamma chains.
2247
2248 *LTR
2249
2250 #Long terminal repeat, a sequence directly repeated at both ends of a
2251 defined sequence, of the sort typically found in retroviruses.
2252
2253 *mat_peptide
2254
2255 #Mature peptide or protein coding sequence; coding sequence for the
2256 mature or final peptide or protein product following post-translational
2257 modification. The location does not include the stop codon (unlike the
2258 corresponding CDS).
2259
2260 *misc_binding
2261
2262 #Site in nucleic acid that covalently or non-covalently binds another
2263 moiety that cannot be described by any other Binding key (primer_bind or
2264 protein_bind).
2265
2266 *misc_difference
2267
2268 #Feature sequence is different from that presented in the entry and
2269 cannot be described by any other Difference key (conflict, unsure,
2270 old_sequence, mutation, variation, allele, or modified_base).
2271
2272 *misc_feature
2273
2274 #Region of biological interest which cannot be described by any other
2275 feature key.
2276
2277 *misc_recomb
2278
2279 #Site of any generalized, site-specific, or replicative recombination
2280 event where there is a breakage and reunion of duplex DNA that cannot be
2281 described by other recombination keys (iDNA and virion) or qualifiers of
2282 source key (/insertion_seq, /transposon, /proviral).
2283
2284 *misc_RNA
2285
2286 #Any transcript or RNA product that cannot be defined by other RNA keys
2287 (prim_transcript, precursor_RNA, mRNA, 5'clip, 3'clip, 5'UTR, 3'UTR,
2288 exon, intron, polyA_site, rRNA, tRNA, scRNA, snoRNA, and snRNA).
2289
2290 *misc_signal
2291
2292 #Any region containing a signal controlling or altering gene function or
2293 expression that cannot be described by other Signal keys (promoter,
2294 CAAT_signal, TATA_signal, -35_signal, -10_signal, GC_signal, RBS,
2295 polyA_signal, enhancer, attenuator, terminator, and rep_origin).
2296
2297 *misc_structure
2298
2299 #Any secondary or tertiary structure or conformation that cannot be
2300 described by other Structure keys (stem_loop and D-loop).
2301
2302 *modified_base
2303
2304 #The indicated nucleotide is a modified nucleotide and should be
2305 substituted for by the indicated molecule (given in the mod_base
2306 qualifier value).
2307
2308 *mRNA
2309
2310 #messenger RNA; includes 5' untranslated region (5' UTR), coding sequences
2311 (CDS, exon) and 3' untranslated region (3' UTR).
2312
2313 *N_region
2314
2315 #Extra nucleotides inserted between rearranged immunoglobulin segments.
2316
2317 *old_sequence
2318
2319 #The presented sequence revises a previous version of the sequence at
2320 this location.
2321
2322 *operon
2323
2324 #Region containing polycistronic transcript under the control of the same
2325 regulatory sequences.
2326
2327 *oriT
2328
2329 #Origin of transfer; region of DNA where transfer is initiated during the
2330 process of conjugation or mobilization.
2331
2332 *polyA_signal
2333
2334 #Recognition region necessary for endonuclease cleavage of an RNA
2335 transcript that is followed by polyadenylation; consensus=AATAAA.
2336
2337 *polyA_site
2338
2339 #Site on an RNA transcript to which adenine residues will be added by
2340 post-transcriptional polyadenylation.
2341
2342 *precursor_RNA
2343
2344 #Any RNA species that is not yet the mature RNA product; may include 5'
2345 clipped region (5' clip), 5' untranslated region (5' UTR), coding
2346 sequences (CDS, exon), intervening sequences (intron), 3' untranslated
2347 region (3' UTR), and 3' clipped region (3' clip).
2348
2349 *prim_transcript
2350
2351 #Primary (initial, unprocessed) transcript; includes 5' clipped region
2352 (5' clip), 5' untranslated region (5' UTR), coding sequences (CDS, exon),
2353 intervening sequences (intron), 3' untranslated region (3' UTR), and 3'
2354 clipped region (3' clip).
2355
2356 *primer_bind
2357
2358 #Non-covalent primer binding site for initiation of replication,
2359 transcription, or reverse transcription. Includes site(s) for synthetic
2360 elements, e.g., PCR primers.
2361
2362 *promoter
2363
2364 #Region on a DNA molecule involved in RNA polymerase binding to initiate
2365 transcription.
2366
2367 *protein_bind
2368
2369 #Non-covalent protein binding site on nucleic acid.
2370
2371 *RBS
2372
2373 #Ribosome binding site.
2374
2375 *repeat_region
2376
2377 #Region of genome containing repeating units.
2378
2379 *repeat_unit
2380
2381 #Single repeat element.
2382
2383 *rep_origin
2384
2385 #Origin of replication; starting site for duplication of nucleic acid to
2386 give two identical copies.
2387
2388 *rRNA
2389
2390 #Mature ribosomal RNA; the RNA component of the ribonucleoprotein
2391 particle (ribosome) that assembles amino acids into proteins.
2392
2393 *S_region
2394
2395 #Switch region of immunoglobulin heavy chains. Involved in the
2396 rearrangement of heavy chain DNA leading to the expression of a
2397 different immunoglobulin class from the same B-cell.
2398
2399 *satellite
2400
2401 #Many tandem repeats (identical or related) of a short basic repeating
2402 unit; many have a base composition or other property different from the
2403 genome average that allows them to be separated from the bulk (main
2404 band) genomic DNA.
2405
2406 *scRNA
2407
2408 #Small cytoplasmic RNA; any one of several small cytoplasmic RNA
2409 molecules present in the cytoplasm and (sometimes) nucleus of a
2410 eukaryote.
2411
2412 *sig_peptide
2413
2414 #Signal peptide coding sequence; coding sequence for an N-terminal
2415 domain of a secreted protein; this domain is involved in attaching
2416 nascent polypeptide to the membrane; leader sequence.
2417
2418 *snRNA
2419
2420 #Small nuclear RNA involved in pre-mRNA splicing and processing.
2421
2422 *snoRNA
2423
2424 #Small nucleolar RNA molecules generally involved in rRNA modification
2425 and processing.
2426
2427 *source
2428
2429 #Identifies the biological source of the specified span of the sequence.
2430 This key is mandatory. Every entry will have, as a minimum, a single
2431 source key spanning the entire sequence. More than one source key per
2432 sequence is permitted.
2433
2434 *stem_loop
2435
2436 #Hairpin; a double-helical region formed by base-pairing between
2437 adjacent (inverted) complementary sequences in a single strand of RNA or
2438 DNA.
2439
2440 *STS
2441
2442 #Sequence Tagged Site. Short, single-copy DNA sequence that
2443 characterizes a mapping landmark on the genome and can be detected by
2444 PCR. A region of the genome can be mapped by determining the order of a
2445 series of STSs.
2446
2447 *TATA_signal
2448
2449 #TATA box; Goldberg-Hogness box; a conserved AT-rich heptamer found
2450 about 25 bp before the start point of each eukaryotic RNA polymerase II
2451 transcript unit that may be involved in positioning the enzyme for
2452 correct initiation; consensus=TATA(A or T)A(A or T).
2453
2454 *terminator
2455
2456 #Sequence of DNA located either at the end of the transcript or adjacent
2457 to a promoter region that causes RNA polymerase to terminate
2458 transcription; may also be site of binding of repressor protein.
2459
2460 *transit_peptide
2461
2462 #Transit peptide coding sequence; coding sequence for an N-terminal
2463 domain of a nuclear-encoded organellar protein; this domain is involved
2464 in post-translational import of the protein into the organelle.
2465
2466 *tRNA
2467
2468 #Mature transfer RNA, a small RNA molecule (75-85 bases long) that
2469 mediates the translation of a nucleic acid sequence into an amino acid
2470 sequence.
2471
2472 *unsure
2473
2474 #Author is unsure of exact sequence in this region.
2475
2476 *V_region
2477
2478 #Variable region of immunoglobulin light and heavy chains, and T-cell
2479 receptor alpha, beta, and gamma chains.  Codes for the variable amino
2480 terminal portion.  Can be made up from V_segments, D_segments,
2481 N_regions, and J_segments.
2482
2483 *V_segment
2484
2485 #Variable segment of immunoglobulin light and heavy chains, and T-cell
2486 receptor alpha, beta, and gamma chains.  Codes for most of the variable
2487 region (V_region) and the last few amino acids of the leader peptide.
2488
2489 *variation
2490
2491 #A related strain contains stable mutations from the same gene (e.g.,
2492 RFLPs, polymorphisms, etc.) that differ from the presented sequence at
2493 this location (and possibly others).
2494
2495 *3'clip
2496
2497 #3'-most region of a precursor transcript that is clipped off during
2498 processing.
2499
2500 *3'UTR
2501
2502 #Region near or at the 3' end of a mature transcript (usually following
2503 the stop codon) that is not translated into a protein; trailer.
2504
2505 *5'clip
2506
2507 #5'-most region of a precursor transcript that is clipped off during
2508 processing.
2509
2510 *5'UTR
2511
2512 #Region near or at the 5' end of a mature transcript (usually preceding
2513 the initiation codon) that is not translated into a protein; leader.
2514
2515 * -10_signal
2516
2517 #Pribnow box; a conserved region about 10 bp upstream of the start point
2518 of bacterial transcription units that may be involved in binding RNA
2519 polymerase; consensus=TAtAaT.
2520
2521 * -35_signal
2522
2523 #A conserved hexamer about 35 bp upstream of the start point of
2524 bacterial transcription units; consensus = TTGACa or TGTTGACA.
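
#Several of the signal features above quote a consensus sequence
(CAAT_signal, GC_signal, polyA_signal, and TATA_signal).  As a rough screening
aid, those consensus strings can be written as regular expressions, as in the
Python sketch below.  The function name find_consensus and the example
sequences are hypothetical.  Only the forward strand is searched, the -10 and
-35 signals are left out because their consensus includes weakly conserved
positions, and a match is only a candidate: real signals often diverge from
the consensus, and a matching string is not evidence of function.

```python
import re

# Consensus sequences quoted in the feature descriptions, as regular
# expressions; "(C or T)" becomes the character class [CT].
CONSENSUS = {
    "CAAT_signal": "GG[CT]CAATCT",
    "GC_signal": "GGGCGG",
    "polyA_signal": "AATAAA",
    "TATA_signal": "TATA[AT]A[AT]",
}


def find_consensus(seq):
    """Return (feature key, start, stop) for each exact consensus match.

    Positions are 1-based and inclusive, on the forward strand only.
    """
    seq = seq.upper()
    hits = []
    for name, pattern in CONSENSUS.items():
        for match in re.finditer(pattern, seq):
            hits.append((name, match.start() + 1, match.end()))
    # Report candidates in order of their position on the sequence.
    return sorted(hits, key=lambda hit: hit[1])
```

Each hit could then be reviewed by hand before the corresponding feature is
annotated with that location.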
2525
2526 >Biological Source Descriptor or Feature
2527
2528 #This annotation is very important, as an entry cannot be processed by
2529 the databases unless it includes some basic information about the
2530 organism from which the sequence was derived.  This basic information was
2531 entered previously in the submission, in the Organism and Sequences
2532 Form.  The more detailed Organism Information form allows you to alter
2533 or add to the data you entered earlier.
2534
2535 *Overview:  Descriptor or Feature?
2536
2537 #Sequin allows two types of biological source information to be entered,
2538 Biological Source Descriptors and Biological Source Features. Biological
2539 Source Descriptors, like other descriptors, provide organism information
2540 about an entire sequence, or an entire set of sequences, in an entry.
2541 Biological Source Features, like other features, provide organism
2542 information about a specific interval on a given sequence.
2543
2544 #In most cases, you will want to use a Biological Source Descriptor, because
2545 all the sequences in the entry will derive from the same source.
2546 However, if you have sequenced a transgenic molecule, for example, one
2547 that is part plant and part bacterial, you would use Biological Source
2548 Features to annotate which sequence was derived from plant and which from
2549 bacteria.
2550
2551 #To add a Biological Source Descriptor, select Biological Source under
2552 the Descriptor section of the Annotate menu.  To add a Biological
2553 Source Feature, select Biological Source under the Bibliographic and
2554 Comments section of the Annotate menu.
2555
2556 #Annotating a Biological Source Descriptor or Feature is similar to
2557 annotating any descriptor or feature.  For help in creating descriptors
2558 and features, see the appropriate section of the help documentation.
2559 The following are instructions for filling out Biological
2560 Source-specific forms.
2561
2562 *Organism Page
2563
2564 **Names Subpage
2565
2566 #The scrollable list contains the scientific names of many organisms.
2567 To reach a name on the list, either type the first few letters of the
2568 scientific name, or use the thumb bar.  Click on a name from the list to
2569 fill out the scientific name field.  If there is a common name for the
2570 organism, that field will be filled out automatically.  You may also
2571 directly type in the scientific name.  If you have any questions about
2572 the scientific or common name of an organism, see the NCBI
2573 <A HREF="http://www.ncbi.nlm.nih.gov/Taxonomy/tax.html">
2574 taxonomy browser
2575 </A>
2576 http://www.ncbi.nlm.nih.gov/Taxonomy/tax.html
2577
2578 **Location Subpage
2579
2580 ***Location of Sequence
2581
2582 #From the selection list, please select the location of the genome that
2583 contains your sequence.  Most entries will have a "Genomic" location.
2584 Brief descriptions of the choices in this pop-up menu were listed
2585 previously.
2586
2587 ***Origin of Sequence
2588
2589 #This menu is for the use of database personnel.  Please leave this
2590 field empty.  The Biological focus box should be checked in rare cases
2591 where multiple source features are annotated.
2592
2593 **Genetic Codes Subpage
2594
2595 #Please use these fields to select the nuclear and mitochondrial genetic
2596 code that should be used to translate the nucleic acid sequence.  The
2597 nuclear genetic code for most eukaryotes is "Standard".  If you selected
2598 an organism name from the scrollable list described above, this field
2599 was filled out automatically.  Do not change these fields if they have
2600 been filled out automatically.
2601
2602 #For more information regarding the translation tables available, see
2603 the NCBI Taxonomy
2604
2605 <A HREF="http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=c">
2606 page
2607 </A>.
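
#To see why the choice of genetic code matters, the Python sketch below
compares the Standard code (translation table 1) with the vertebrate
mitochondrial code (translation table 2) and lists the codons that are read
differently.  The function name code_differences is hypothetical, and the two
amino acid strings are written out by hand in the TCAG codon order used on the
translation table page, so check them against that page before relying on
them.

```python
from itertools import product

BASES = "TCAG"
# Codons in the order TTT, TTC, TTA, TTG, TCT, ... (third base varies fastest).
CODONS = ["".join(c) for c in product(BASES, repeat=3)]

# One amino acid letter per codon, '*' for a stop codon.
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
VERT_MITO = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"


def code_differences(code_a, code_b):
    """Map each codon that differs to its (code_a, code_b) amino acids."""
    return {
        codon: (a, b)
        for codon, a, b in zip(CODONS, code_a, code_b)
        if a != b
    }
```

Here code_differences(STANDARD, VERT_MITO) reports four codons: TGA (stop
versus W), ATA (I versus M), and AGA and AGG (R versus stop).  A mitochondrial
coding region translated with the Standard code would therefore appear to
contain internal stop codons.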
2608
2609 **Lineage Subpage
2610
2611 #This information is normally entered by the database staff.  They will
2612 use the
2613 <A HREF="http://www.ncbi.nlm.nih.gov/Taxonomy/tax.html">
2614 Taxonomy database
2615 </A>
2616 http://www.ncbi.nlm.nih.gov/Taxonomy/tax.html
2617  maintained by the NCBI/GenBank.
2618
2619 #If you disagree with the lineage supplied, please notify the database
2620 staff.
2621
2622 #If you are running Sequin in its
2623 <A HREF="#NetConfigure">
2624 network-aware
2625 </A>
2626 mode, you will see a button labeled "Lookup Taxonomy".  Click on this
2627 button to perform an automatic look-up of the taxonomic lineage of the
2628 organism.  Sequin will perform the look-up by accessing the Taxonomy
2629 database and will fill out the Taxonomic Lineage and
2630 Division fields.
2631
2632 #If you have any comments about the taxonomic lineage determined by
2633 Sequin, please submit these comments with your entry.  Under the Sequin
2634 File menu, select Edit Submitter Info.  Enter your comments in the box
2635 entitled "Special Instructions to Database Staff", on the Submission
2636 page.
2637
2638 *Modifiers Page
2639
2640 #This page allows you to enter additional information about the source
2641 and/or organism.  Entering information is optional.
2642
2643 **Source Subpage
2644
2645 #Choose a modifier from the pull-down menu on the left side of the page
2646 and type the appropriate name on the right side of the page.  If you do
2647 not find appropriate modifiers in the scroll down list, you can enter
2648 additional source information as text in the field at the bottom of the
2649 page.  You may select as many modifiers as you want.
2650
2651 #The following is a description of the available modifiers:
2652
2653 #-Cell-line:  Cell line from which sequence derives.
2654
2655 #-Cell-type:  Type of cell from which sequence derives.
2656
2657 #-Chromosome:  Chromosome to which the gene maps.
2658
2659 #-Clone:  Name of clone from which sequence was obtained.
2660
2661 #-Clone-lib:  Name of library from which sequence was obtained.
2662
2663 #-Collected-by:  Name of person who collected sample.  Do not use
2664 accented or non-ASCII characters.
2665
2666 #-Collection-date:  Date sample was collected.  Must use format
2667 23-Mar-2005, Mar-2005, or 2005.
2668
2669 #-Country: The country of origin of DNA samples used for epidemiological
2670 or population studies.
2671
2672 #-Dev-stage:  Developmental stage of organism.
2673
2674 #-Endogenous-virus-name:  Name of inactive virus that is integrated into
2675 the chromosome of its host cell and can therefore exhibit vertical
2676 transmission.
2677
2678 #-Environmental-sample: Identifies sequence derived by direct molecular
2679 isolation from an unidentified organism.  Do not include extra text when
2680 using this modifier.
2681
2682 #-Frequency:  Frequency of occurrence of a feature.
2683
2684 #-Fwd-PCR-primer-name:  Name or designation of forward primer used for
2685 amplification.
2686
2687 #-Fwd-PCR-primer-seq:  Sequence of forward primer used for amplification.
2688
2689 #-Genotype:  Genotype of the organism.
2690
2691 #-Germline:  If the sequence shown is DNA and a member of the
2692 immunoglobulin family, this qualifier is used to denote that the sequence
2693 is from unrearranged DNA.  Do not include extra text when using this
2694 modifier.
2695
2696 #-Haplotype:  Haplotype of the organism.
2697
2698 #-Identified-by:  Name of person who identified sample.  Do not use
2699 accented or non-ASCII characters.
2700
2701 #-Isolation-source:  Describes the local geographical source of the organism
2702 from which the sequence was derived.
2703
2704 #-Lab-host:  Laboratory host used to propagate the organism from which
2705 the sequence was derived.
2706
2707 #-Lat-Lon:  Latitude and longitude of location where sample was
2708 collected.  Preferred format is decimal degrees N/S E/W.
2709
2710 #-Map:  Map location of the gene.
2711
2712 #-Plasmid-name:  Name of plasmid from which the sequence was obtained.
2713
2714 #-Plastid-name:  Name of plastid from which the sequence was obtained.
2715
2716 #-Pop-variant:  Name of the population variant from which the sequence was
2717 obtained.
2718
2719 #-Rearranged:  If the sequence shown is DNA and a member of the
2720 immunoglobulin family, this qualifier is used to denote that the sequence
2721 is from rearranged DNA.  Do not include extra text when using this
2722 modifier.
2723
2724 #-Rev-PCR-primer-name:  Name or description of reverse primer used for
2725 amplification.
2726
2727 #-Rev-PCR-primer-seq:  Sequence of reverse primer used for amplification.
2728
2729 #-Segment: Name of viral genome fragmented into two or more nucleic acid
2730 molecules.
2731
2732 #-Sex:  Sex of the organism from which the sequence derives.
2733
2734 #-Subclone:  Name of subclone from which sequence was obtained.
2735
2736 #-Tissue-lib:  Tissue library from which the sequence was obtained.
2737
2738 #-Tissue-type:  Type of tissue from which sequence derives.
2739
2740 #-Transgenic:  Identified organism that was the recipient of transgenic
2741 DNA.  Do not include extra text when using this modifier.
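
#A few of the modifiers above have format rules that can be checked before
the values are typed into the form.  The Python sketch below covers the
Collection-date formats (23-Mar-2005, Mar-2005, or 2005), the ASCII-only rule
for Collected-by and Identified-by, and a decimal degrees N/S E/W string for
Lat-Lon.  All function names are hypothetical.  The two-digit day, the English
month abbreviations, and the two decimal places used for Lat-Lon are
assumptions, and Sequin's own validation may differ.

```python
import datetime
import re

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
# Optional "DD-" is allowed only when "Mon-" is present; the year is required.
DATE_RE = re.compile(
    r"(?:(?:([0-9]{2})-)?(" + "|".join(MONTHS) + r")-)?([0-9]{4})"
)


def is_valid_collection_date(text):
    """Accept 23-Mar-2005, Mar-2005, or 2005; reject everything else."""
    match = DATE_RE.fullmatch(text)
    if match is None:
        return False
    day, month, year = match.groups()
    if day is None:
        return True  # month-year or year only
    try:
        # Rejects impossible days such as 31-Feb-2005.
        datetime.date(int(year), MONTHS.index(month) + 1, int(day))
    except ValueError:
        return False
    return True


def is_plain_ascii(text):
    """True if text has no accented or other non-ASCII characters."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


def format_lat_lon(lat, lon, places=2):
    """Format signed decimal degrees as 'dd.dd N|S ddd.dd E|W'."""
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("latitude or longitude out of range")
    ns = "N" if lat >= 0 else "S"
    ew = "E" if lon >= 0 else "W"
    return f"{abs(lat):.{places}f} {ns} {abs(lon):.{places}f} {ew}"
```

For example, is_valid_collection_date("2005-03-23") is False because the
value is not in one of the three accepted forms, and
format_lat_lon(38.98, -77.11) gives a string in the N/S E/W style.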
2742
2743 **Organism Subpage
2744
2745 #Choose a modifier from the pull-down menu on the left side of the page
2746 and type the appropriate name on the right side of the page.  If you do
2747 not find appropriate modifiers in the scroll down list, you can enter
2748 additional organism information as text in the field at the bottom of
2749 the page. You may select as many modifiers as you want.
2750
2751 #The following is a description of the available modifiers:
2752
2753 #-Acronym:  Standard synonym (usually of a virus) based on the initials
2754 of the formal name.  An example is HIV-1.
2755
2756 #-Anamorph: The scientific name applied to the asexual phase of a fungus.
2757
2758 #-Authority: The author or authors of the organism name from which sequence
2759 was obtained.
2760
2761 #-Biotype:  See biovar.
2762
2763 #-Biovar:  Variety of a species (usually a fungus, bacterium, or virus)
2764 characterized by some specific biological property (often geographical,
2765 ecological, or physiological).  Same as biotype.
2766
2767 #-Breed: The named breed from which sequence was obtained (usually applied
2768 to domesticated mammals).
2769
2770 #-Chemovar:  Variety of a species (usually a fungus, bacterium, or virus)
2771 characterized by its biochemical properties.
2772
2773 #-Common:  Common name of the organism from which sequence was obtained.
2774
2775 #-Cultivar:  Cultivated variety of plant from which sequence was obtained.
2776
2777 #-Ecotype: The named ecotype (population adapted to a local habitat) from
2778 which sequence was obtained (customarily applied to populations of
2779 Arabidopsis thaliana).
2780
2781 #-Forma: The forma (lowest taxonomic unit governed by the nomenclatural
2782 codes) of organism from which sequence was obtained. This term is usually
2783 applied to plants and fungi.
2784
2785 #-Forma-specialis: The physiologically distinct form from which sequence
2786 was obtained (usually restricted to certain parasitic fungi).
2787
2788 #-Group:  Do not select this item.
2789
2790 #-Isolate:  Identification or description of the specific individual
2791 from which this sequence was obtained.  An example is Patient X14.
2792
2793 #-Old name:  Do not select this item.
2794
2795 #-Pathovar:  Variety of a species (usually a fungus, bacterium, or virus)
2796 characterized by the biological target of the pathogen.  Examples
2797 include Pseudomonas syringae pathovar tomato and Pseudomonas syringae
2798 pathovar tabaci.
2799
2800 #-Serogroup:  See serotype.
2801
2802 #-Serotype:  Variety of a species (usually a fungus, bacterium, or virus)
2803 characterized by its antigenic properties.  Same as serogroup and
2804 serovar.
2805
2806 #-Serovar:  See serotype.
2807
2808 #-Specific-host:  When the sequence submission is from an organism that
2809 exists in a symbiotic, parasitic, or other special relationship with
2810 some second organism, use this modifier to identify the name of the
2811 host species.
2812
2813 #-Specimen-voucher: An identifier of the individual or collection of the
2814 source organism and the place where it is currently stored, usually an
2815 institution.
2816
2817 #-Strain:  Strain of organism from which sequence was obtained.
2818
2819 #-Subgroup:  Do not select this item.
2820
2821 #-Sub-species:  Subspecies of organism from which sequence was obtained.
2822
2823 #-Substrain:  Sub-strain of organism from which sequence was obtained.
2824
2825 #-Subtype:  Subtype of organism from which sequence was obtained.
2826
2827 #-Synonym: The synonym (alternate scientific name) of the organism name
2828 from which sequence was obtained.
2829
2830 #-Teleomorph: The scientific name applied to the sexual phase of a fungus.
2831
2832 #-Type:  Type of organism from which sequence was obtained.
2833
2834 #-Variety:  Variety of organism from which sequence was obtained.
2835
2836 **GenBank Subpage
2837
2838 #Please do not use this form.  This field is reserved for information from
2839 NCBI's taxonomy database.
2840
2841 *Miscellaneous Page
2842
2843 **Synonyms Subpage
2844
2845 #If there are alternative names for the organism from which the sequence
2846 was derived, enter them here.  Please be aware that this is the
2847 appropriate field only for alternative names for the organism, not for
2848 alternative gene or protein names.
2849
2850 **Cross-Refs Subpage
2851
2852 #This page is for use by database staff only.
2853
2854 >Publications
2855
2856 *Overview:  Descriptor or Feature?
2857
2858 #Sequin allows two types of publications to be entered, Publication
2859 Descriptors and Publication Features.  Publication Descriptors are
2860 bibliographic references that, like other descriptors, cover an entire
2861 sequence, or an entire set of sequences, in an entry.  Publication
2862 Features are bibliographic references that, like other features, cover
2863 a specific interval on a given sequence.
2864
2865 #Publications are entered into the Reference field of the database
2866 entry. References are citations of unpublished, in press, or published
2867 works that are relevant to the submitted sequence. Publications
2868 should provide information regarding the principal cloning and
2869 determination of the sequence within the record.
2870
2871 #In general, there is one publication describing a sequence, and a
2872 Publication Descriptor should be used. To enter a Publication
2873 Descriptor, select Publications under the Annotate menu and click on
2874 Publication Descriptor.
2875
2876 #However, if one publication describes the cloning of the 5' end of a
2877 gene, and another publication describes the cloning of the 3' end of
2878 the gene, Publication Features may be used.  To make a Publication
2879 Feature, choose Publication Feature in the Publications section of the
2880 Annotate menu.  Enter the information about the publication, and then
2881 enter the nucleotide interval to which the publication refers on the
2882 Location page.
2883
2884 *Citation on Entry Form
2885
2886 **Status
2887
2888 #Using the radio buttons, select one of the three options:
2889
2890 #-Unpublished: Select this option if a manuscript has been written but
2891 not yet submitted or has been submitted for publication but has not yet
2892 been accepted.
2893
2894 #-In Press: The article has been accepted for publication but is not yet
2895 in print.
2896
2897 #-Published: The article has been published.
2898
2899 **Class
2900
2901 #Using the radio buttons, select the type of publication in which the
2902 sequence will appear.
2903
2904 #-Journal
2905
2906 #-Book Chapter
2907
2908 #-Book
2909
2910 #-Thesis/Monograph
2911
2912 #-Proceedings Chapter:  Abstract from a meeting
2913
2914 #-Proceedings:  A meeting
2915
2916 #-Patent
2917
2918 #-Online Publication: Used for journals which publish strictly online and
2919 do not issue print copies.
2920
2921 #-Submission
2922
2923 **Scope
2924
2925 #Using the radio buttons, select one of the options.
2926
2927 #-Refers to the entire sequence: Most publications should be classified
2928 as such.
2929
2930 #-Refers to part of the sequence: For use only when a publication
2931 discusses only part of the presented sequence.  You must enter the
2932 locations in the location tab in later forms.  This selection is only
2933 valid when adding a Publication Feature, not a Publication Descriptor.
2934
2935 #-Cites a feature on the sequence: This selection should only be made in
2936 limited cases.  Its use must coincide with the use of the /citation
2937 qualifier on the given feature.
2938
2939 #After you have filled out the Citation on Entry form, click on
2940 "Proceed" to see the next form.
2941
2942 *Citation Information Form (General)
2943
2944 **Authors Page
2945
2946 ***Names Subpage
2947
2948 #Please enter the names of the authors.  Note that the first name of the
2949 author is listed first.  You can add as many authors to this page as
2950 necessary. After you type in the name of the third author, the box
2951 becomes a spreadsheet, and you can scroll down to the next line by
2952 using the thumb bar.  The suffix toggle allows the addition of common
2953 suffixes to the author name.  The consortium field should be used when
2954 a consortium is responsible for the sequencing or publication of the
2955 data.  The consortium should not be the department or institute
2956 affiliation of the authors.  Individual authors may be listed along
2957 with a consortium name.
2958
2959 ***Affiliation Subpage
2960
2961 #Please enter information about the institution where the sequencing was
2962 performed.
2963
2964 #Other pages in the Citation Information Form will be different,
2965 depending on the Class of publication selected in the Citation on Entry
2966 Form. Instructions for filling out the Citation Information Form for
2967 Journals are included here.
2968
2969 *Citation Information Form (If Selected Class Was Journal)
2970
2971 **Title Page
2972
2973 #Enter the title of the manuscript in the box.
2974
2975 **Journal Page
2976
2977 #Fill in the appropriate Journal, Volume, Issue, Pages, Day, and Year
2978 fields by typing information into the boxes.  Select the month with the
2979 pop-up menu. If necessary, choose an option from the Erratum pop-up
2980 menu and explain the erratum.
2981
2982 #If you are running Sequin in its
2983 <A HREF="#NetConfigure">
2984 network-aware
2985 </A>
2986 mode, the program will look up the Title, Author, and Journal
2987 information in the MEDLINE database if you supply it with some minimal
2988 information.  For example, if you know the MUID (MEDLINE Unique
2989 Identifier) of the publication, enter it in the appropriate box and
2990 select "Lookup By MUID."  Sequin will automatically retrieve the rest
2991 of the information.  One way to find the MUID of the publication is to
2992 look up the publication with the NCBI's
2993
2994 <A HREF="http://www.ncbi.nlm.nih.gov/Entrez">
2995 Entrez
2996 </A>
2997 service. Alternatively, if you do not know the MUID, enter the Journal,
2998 Volume, Pages, and Year.  Then select "Lookup Article".  Sequin will
2999 retrieve the missing Title and Author information.
3000
3001 #The PubStatus toggle is used by database staff.  If you have used the
3002 "Lookup by MUID" or "Lookup by PMID" functions, this field may be
3003 populated.  Please do not edit the information.
3004
3005 **Remark Page
3006
3007 #This page is reserved for use by the database staff.
3008
3009 >File Menu
3010
3011 *About Sequin
3012
3013 #Details about the current version of Sequin.
3014
3015 *Help
3016
3017 #Launches the help documentation.
3018
3019 *Open
3020
3021 #Open an existing entry.  This option will open a record that has been
3022 previously saved in Sequin.  Furthermore, for analysis purposes, it can also
3023 open
3024 a FASTA-formatted sequence file.  The sequence will be displayed in Sequin and
3025 can be analyzed with tools such as CDD Search, but it should not be submitted,
3026 because it does not have the appropriate annotations.
3027
3028 *Close
3029
3030 #Close this entry.
3031
3032 *Export
3033
3034 #Exports the currently displayed format to a file.  Do not use export
3035 ASN1 for submission of sequences to the database.
3036
3037 *Duplicate View
3038
3039 #Duplicates the entry.  You can then view the entry simultaneously in
3040 different Display Formats.
3041
3042 *Save
3043
3044 #Saves the entry.  Note:  This merely saves the entry so you can go back
3045 and edit it.  It does not prepare the entry for submission to the
3046 database, that is, it does not validate the entry.
3047
3048 *Save As
3049
3050 #See Save.
3051
3052 *Save as Binary Seq-entry
3053
3054 #Saves the file in a compressed format.  Use this option only when the
3055 file is to be imported into other analysis programs.  Do not use this
3056 option to save files for submission directly to GenBank.
3057
3058 *Restore
3059
3060 #Replaces the displayed record with a previously saved version.  This
3061 feature is useful if you have made unwanted changes since you last saved
3062 the record.
3063
3064 *Prepare Submission
3065
3066 #Prepares the entry for submission to the database.  See
3067 <A HREF="#SubmittingtheFinishedRecordtotheDatabase">
3068 Submitting the Finished Record to the Database
3069 </A>
3070  in the Sequin help documentation.
3071
3072 *Print
3073
3074 #Prints the window that is currently selected.  The selected window can
3075 be one of the Sequin forms or pages, or the help documentation.
3076
3077 *Quit
3078
3079 #Exit from Sequin.
3080
3081 >Edit Menu
3082
3083 *Copy
3084
3085 #Copy the selected item.
3086
3087 *Clear
3088
3089 #Clear the selected item.
3090
3091 *Edit Sequence
3092
3093 #To edit a single sequence, select the sequence identifier in the Target
3094 Sequence pop-up menu, and click on Edit sequence.  The sequence editor
3095 will be launched for that sequence.  The
3096 <A HREF="#SequenceEditor">
3097 sequence editor
3098 </A>
3099 is discussed in more detail below.
3100
3101 *Alignment Assistant
3102
3103 #This option will launch the Alignment Assistant which is discussed in
3104 more detail
3105
3106 <A HREF="#Workingwithsetsofalignedsequences">
3107 below
3108 </A>
3109 .
3110
3111 *Edit Submitter Info
3112
3113 #Opens up the Submission Instructions form, which allows you to enter
3114 additional information about the person submitting the record.  Much of
3115 this information was entered on the first form in Sequin, the Submitting
3116 Authors form.
3117
3118 #You can also save the information from the Submitting Authors form
3119 here, so that you can use it in subsequent Sequin submissions.  Click
3120 on "Edit Submitter Info" and, under the File menu in the resulting
3121 Submission Instructions form, click on Export Submitter Info to save
3122 the information to a file.  For subsequent Sequin submissions, if you
3123 have already saved the submitter information, click on Import Submitter
3124 Info under the File menu on the Submission page of the Submitting
3125 Authors form.
3126
3127 **Submission Page
3128
3129 #Indicate the type of submission.  If it is a new submission, select
3130 New.  If you are updating an existing submission in order to resubmit it
3131 to the databases, select Update.  Check either the "Yes" or "No" radio
3132 button to indicate if the record should be released before publication.
3133 If you select "Yes", the entry will be released to the public after the
3134 database staff has added it to the database. If you select "No", fields
3135 will appear in which you can indicate the date on which the sequences
3136 should be released to the public.  The submission will then be held back
3137 until formal publication of the sequence or
3138 GenBank Accession number, or until the Release Date, whichever comes
3139 first. If you have any special instructions, enter them in the box at
3140 the bottom of the page.
3141
3142 **Contact Page
3143
3144 #Update the name, affiliation, or contact numbers of the person
3145 submitting the record.  Please supply a fax number to facilitate
3146 communication with database staff.
3147
3148 **Citation Page
3149
3150 #Update the names and affiliation of the people who should receive
3151 scientific credit for the generation of sequences in this entry.  The
3152 address should list the principal institution in which the sequencing
3153 and/or analysis was carried out.  If you are submitting the record as
3154 an update to the databases, explain the reason for the update on the
3155 Description subpage.
3156
3157 *Update Sequence
3158
3159 #This selection allows you to replace a sequence with another sequence,
3160 merge two sequences that overlap at their ends, 'patch' a corrected
3161 fragment of a sequence to the current sequence, or copy features from
3162 one sequence to another.
3163
3164 #Use Single Sequence to import a sequence in FASTA or ASN.1 format (for
3165 example, a sequence record that has already been saved in Sequin). If
3166 you are running Sequin in
3167
3168 <A HREF="#NetConfigure">
3169 Network Aware mode,
3170 </A>
3171 you can use Download Accession to import a record from Entrez. The
3172 Multiple Sequences option allows you to update multiple sequences using
3173 either FASTA or ASN.1 formats.  In either format, each sequence
3174 identifier must be identical in the new and old sequences.
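
#Because each sequence identifier must be identical in the new and old
sequences, it can help to compare the identifiers in two FASTA files before
importing them.  The following is a minimal sketch in Python, not part of
Sequin.  The function names fasta_ids and compare_ids are hypothetical, and
the sketch assumes that the identifier is the first whitespace-delimited
token after the '>' on each definition line.

```python
def fasta_ids(text):
    """Return the identifiers found on the '>' definition lines of FASTA text.

    Assumption: the identifier is the first whitespace-delimited token
    that follows the '>' character.
    """
    ids = []
    for line in text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            ids.append(fields[0] if fields else "")
    return ids


def compare_ids(old_text, new_text):
    """Return (ids only in the new file, ids only in the old file)."""
    old = set(fasta_ids(old_text))
    new = set(fasta_ids(new_text))
    return sorted(new - old), sorted(old - new)


old_fasta = ">seq1 original\nACGT\n>seq2\nGGCC\n"
new_fasta = ">seq1\nACGTA\n>seq3\nTT\n"
only_new, only_old = compare_ids(old_fasta, new_fasta)
print("only in new file:", only_new)
print("only in old file:", only_old)
```

#If either list is non-empty, the identifiers do not match and should be
corrected before using the Multiple Sequences option.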
3175
3176 #After you import the updated sequence, a new window will open that
3177 displays two graphical views and the text of the alignment of the new
3178 and old sequence. The first graphic displays the relative length of the
3179 two sequences and the length of the overlapping region between
3180 sequences.  The second graphic represents any inserts, deletions, or
3181 point changes within the aligned region between the new and old
3182 sequences.  Clicking on a region in this graphic will scroll to the
3183 corresponding nucleotide sequence in the alignment text below.
3184
3185 #The Sequence Update box to the left shows the action that will be
3186 performed upon updating the sequence, i.e., no change, replace, extend
3187 5', extend 3', or patch.  The patch function allows you to replace an
3188 internal fragment of the sequence without affecting flanking regions.
3189 You can also override the alignment between the new and old sequences
3190 using the Ignore alignment checkbox to force a sequence change of
3191 replace, extend 5', or extend 3'.  This option allows you to append new
3192 sequence to the existing sequence when there is no overlap.
3193
3194 #If the current sequence has annotation, you can use the Existing
3195 Features box to determine whether the annotation should remain or be
3196 removed upon updating the sequence.  The Do not remove option is the
3197 default.  However, you may choose to remove annotated features only in
3198 the aligned area, outside the aligned area, or to remove all currently
3199 annotated features.
3200
3201 #When updating via Download Accession or an ASN.1 file, the Import
3202 Features box allows you to specify whether features from the new file
3203 should be imported to the existing record.   The dialog offers
3204 different options for cases where the features on the new file are
3205 identical to those on the existing record.
3206
3207 #If you are using the Multiple Sequences option, you may choose to
3208 review the sequences and update them one by one using the Update this
3209 Sequence box at the bottom of the window.  You may skip a sequence
3210 update or choose to update all sequences at once without reviewing them
3211 in the Update Sequence dialog.
3212
3213 #In any case, please carefully review the sequence and annotation in the
3214 record viewer after using the Update Sequence function.
3215
3216 *Extend Sequence
3217
3218 #This selection functions similarly to the
3219
3220 <A HREF="#UpdateSequence">
3221 Update Sequence
3222 </A>
3223
3224 function.  However, you can only extend the existing sequence in either
3225 the 5' or 3' direction in cases with no overlap between the existing
3226 and new sequences.
3227
3228 *Feature Propagate
3229
3230 #This selection allows you to propagate any annotated feature from one
3231 sequence in an aligned set to other sequences within the set. For
3232 example, if one nucleotide sequence in the alignment contains a CDS
3233 feature, you can annotate a similar CDS on the other nucleotide
3234 sequences in the set.
3235
3236 #The default source of features to be propagated is the first member
3237 of the set.  If you would like to use a different entry as the source of
3238 the features, scope to that entry in the Target Sequence menu before
3239 selecting Feature Propagate from the Edit menu.
3240
3241 #The Feature Propagate window allows you to select which sequences
3242 should receive the new annotation and which features will be
3243 propagated. You can also select whether the features will be extended
3244 or split at gaps in the alignment.  The split at gaps selection will
3245 produce two features, one on either side of the gap within the
3246 alignment.  If you are propagating a CDS feature, you may specify that
3247 the translation end or extend through internal stop codons.  You may
3248 also extend the translation after the stop codon on the source entry by
3249 choosing to translate the CDS after partial 3' boundary.  If the CDS
3250 that you are propagating to other records is partial on either end, you
3251 should select the 'Cleanup CDS partials after propagation' check box.
3252 This will retain the partial nature of the CDS features on all records.
3253 The fuse adjacent propagated intervals function will create one
3254 feature from two of the same type that contain abutting nucleotide
3255 intervals due to the nature of the alignment used for propagation.
3256
3257 *Add Sequence
3258
3259 #This selection allows you to add a new sequence to an existing
3260 population, mutation, phylogenetic, or environmental sample set.
3261 You may import the new entry in FASTA format or ASN.1 format (for
3262 example, a sequence record that has been saved in Sequin).
3263
3264 *Parse File to Source
3265
3266 #This selection allows you to add unique information for one source
3267 qualifier for each of the records in a batch or set.  The input file
3268 must be in the format of a tab-delimited, two column table.  The first
3269 column should list the SeqID exactly as it was listed in the original
3270 FASTA file.  The second column should list the text value for the
3271 desired source qualifier for each record. Once the file has been
3272 imported, a pop-up box will appear with the source qualifiers listed in
3273 the pull down menus.  The qualifiers are separated into three menus:
3274 one for taxonomic information, one for the Organism modifiers and one
3275 for the Source modifiers.   For example, in order to add the clone
3276 designations 57 and 49 to the sequences labeled seq1 and seq2, the table
3277
3278 seq1    57
3279 seq2    49
3280
3281 should be used and clone should be selected from the Source modifiers
3282 pull-down menu.
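
#If the qualifier values already exist in a spreadsheet or script, the
two-column table can be generated and checked before it is imported.  The
following is a minimal sketch in Python, not part of Sequin.  The function
names build_source_table and check_source_table are hypothetical, and the
set of SeqIDs is assumed to have been collected from the original FASTA file
beforehand.

```python
def build_source_table(values):
    """Build tab-delimited, two-column text: SeqID, then qualifier value."""
    lines = []
    for seq_id, value in values.items():
        value = str(value)
        if "\t" in seq_id or "\t" in value:
            raise ValueError("tab characters are not allowed inside a column")
        lines.append(f"{seq_id}\t{value}")
    return "\n".join(lines) + "\n"


def check_source_table(text, known_ids):
    """Return a list of problems; an empty list means the table looks usable."""
    problems = []
    seen = set()
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        columns = line.split("\t")
        if len(columns) != 2:
            problems.append(f"line {number}: expected 2 columns, found {len(columns)}")
            continue
        seq_id = columns[0]
        if seq_id not in known_ids:
            problems.append(f"line {number}: SeqID {seq_id!r} is not in the FASTA file")
        if seq_id in seen:
            problems.append(f"line {number}: SeqID {seq_id!r} is listed more than once")
        seen.add(seq_id)
    return problems


# The clone example from the text above.
table = build_source_table({"seq1": "57", "seq2": "49"})
print(table, end="")
print(check_source_table(table, {"seq1", "seq2"}))
```

#The check compares each SeqID exactly, including case, because the first
column must match the SeqID as it was listed in the original FASTA file.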
3283
3284 >Search Menu
3285
3286 *Find ASN.1
3287
3288 #Under this command, you can find and replace strings of letters in
3289 those fields of your submission that contain manually entered data.
3290 The fields that can be altered are Locus, Definition, Accession,
3291 Keywords, Source, Reference, and Features. To use this option, select
3292 Find and fill the Find and Replace lines with the appropriate text.
3293 Note that you cannot edit the sequence in this way.
3294
3295 *Find FlatFile
3296
3297 #Under this command, you can find strings of letters in all fields of
3298 your submission.  You can use the Find First and Find Next buttons to
3299 identify the specified text sequentially through the flatfile.
3300
3301 *Find by Gene
3302
3303 #This option allows you to move quickly in the record viewer to a gene
3304 feature containing the specified gene symbol.
3305
3306 *Find by Protein
3307
3308 #This option allows you to move quickly in the record viewer to a CDS
3309 feature containing the specified product name.
3310
3311 *Find by Position
3312
3313 #This option allows you to move quickly in the record viewer to any
3314 feature annotated at the specified nucleotide location.
3315
3316 *Validate
3317
3318 #This option detects discrepancies between the format of your submission
3319 and that required by the database selected for entry.  If discrepancies
3320 are present, it suggests ways in which to correct them. See the topic on
3321
3322 <A HREF="#SubmittingtheFinishedRecordtotheDatabase">
3323 Submitting the Finished Record to the Database
3324 </A>
3325  in the Sequin help documentation.
3326
3327 *CDD Search
3328
3329 #Performs a CDD BLAST search of the selected sequence against the
3330 NCBI's
3331 <A HREF="http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml">
3332 Conserved Domain Database
3333 </A>
3334 .  To do a CDD BLAST search, Sequin must be in its network aware mode.
3335
3336 #CDD currently contains domains derived from two popular collections,
3337 Smart and Pfam, plus contributions from colleagues at NCBI.  The source
3338 databases also provide descriptions and links to citations.  Since
3339 conserved domains correspond to compact structural units, CDs contain
3340 links to 3D-structure via Cn3D whenever possible.
3341
3342 #The results of the CDD search will be displayed in the record
3343 viewer.  These results are for your use only and should be removed
3344 from the record before submission.
3345
3346 *Vector Screen
3347
3348 #This option allows you to run a BLAST search of your nucleotide
3349 sequence(s) against NCBI's
3350 <A HREF="http://www.ncbi.nlm.nih.gov/VecScreen/UniVec.html">
3351 UniVec
3352 </A>
3353 database.  We highly recommend that you run this analysis and remove
3354 any vector contamination before submission.  The UniVec database
3355 contains only one copy of every unique sequence segment from a large
3356 number of vectors.   It also contains sequences for commonly used
3357 adapters, linkers, and primers.
3358
3359 #To run Vector screen on a submission containing multiple sequences,
3360 scope to ALL SEQUENCES in the Target Sequence pull-down before running
3361 the analysis.  If there are many sequences, a status bar will appear
3362 indicating the progress of the search.  If no contamination is found, a
3363 pop-up box will appear to notify you.  If contamination is found, a
3364 miscellaneous feature will be annotated on the flatfile with the
3365 location of the contamination.  Details will include the relative
3366 strength of the BLAST hit.  You must trim the nucleotide sequence to
3367 remove this feature before submission.
3368
3369 *ORF Finder
3370
3371 #The ORF Finder shows a graphical representation of all the open reading
3372 frames (ORFs) in the nucleotide sequence.  This tool allows you to
3373 select ORFs and have them appear as coding sequence (CDS) features on
3374 the sequence record.
3375
3376 #The ORFs, indicated by colored boxes, are defined as the longest
3377 sequence containing a start codon and a stop codon.  If the
3378 entire nucleotide sequence is an open reading frame but does not
3379 contain an initial start or a terminal stop codon, it will be indicated
3380 as an ORF as well.  All six reading frames are shown; the top three
3381 boxes represent the plus strands, and the bottom three boxes the minus
3382 strands.  The nucleotide sequence intervals of the ORFs are displayed in
3383 descending length order on the right side of the window.  Intervals on
3384 the complementary (minus) strand are indicated by a 'c'.  ORFs can be
3385 selected by clicking either directly on them or on the sequence
3386 interval.  The ORF length button selects the length of ORFs that are
3387 displayed.  For example, the default of 10 shows all ORFs that are
3388 greater than 10 nucleotides in length.  Clicking on the box labelled ORF
3389 changes the display; potential start codons are indicated in white, and
3390 stop codons in red.  ORFs can be selected in this display also.  The
3391 definition of start and stop codons is dependent on the genetic code
3392 that was selected. Be sure to choose the appropriate genetic code for
3393 translating the sequence before opening the ORF finder.
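
#The six-frame search described above can be illustrated with a short
Python sketch.  This is not Sequin's implementation.  It assumes the
standard genetic code with ATG as the only start codon, reports only ORFs
that have both a start and a stop codon, and omits the special case of a
sequence that is open along its entire length.  The function name find_orfs
and its min_len parameter are hypothetical.

```python
STARTS = {"ATG"}                # assumption: standard code, ATG start only
STOPS = {"TAA", "TAG", "TGA"}   # assumption: standard code stop codons
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_len=10):
    """Return (start, stop, strand) tuples, longest ORF first.

    Coordinates are 1-based positions on the plus strand.  The strand is
    '+' for the plus strand and 'c' for the complementary strand.  Only
    ORFs longer than min_len nucleotides are kept.
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("c", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon in STARTS:
                    start = i               # first start codon since the last stop
                elif start is not None and codon in STOPS:
                    end = i + 3             # the stop codon is part of the ORF
                    if end - start > min_len:
                        if strand == "+":
                            orfs.append((start + 1, end, strand))
                        else:
                            # convert back to plus-strand coordinates
                            orfs.append((n - end + 1, n - start, strand))
                    start = None
    orfs.sort(key=lambda orf: orf[1] - orf[0], reverse=True)
    return orfs


print(find_orfs("CCATGAAATTTTAGGG"))   # [(3, 14, '+')]
```

#Taking the first start codon after each stop codon gives the longest ORF
for that stop, which matches the definition given above.  A different
genetic code would require different STARTS and STOPS sets.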
3394
3395 *Select Target
3396
3397 #This option changes the sequence that is selected in the Target
3398 Sequence pop-up.  Type the SeqID of the sequence in the box, and the
3399 record viewer will be updated to display that sequence.
3400
3401 >Misc Menu
3402
3403 *Style Manager
3404
3405 #The Style manager allows you to choose between different formats in
3406 which to view the Graphical Display Format.  The graphical display is
3407 selected by choosing the Graphic display format on the record viewer.
3408 Using the Style Manager, you can also copy the style or modify it to
3409 suit your needs.
3410
3411 *Net Configure
3412
3413 #By default, Sequin runs as a stand-alone program.  However,
3414 the program can also be configured to exchange information with the NCBI
3415 (GenBank) over the Internet.  The network-aware mode of Sequin is
3416 identical to the stand-alone mode, but it contains some additional
3417 useful options.
3418
3419 #Sequin will only function in its network-aware mode if the computer on
3420 which it resides has a direct Internet connection.  Electronic mail
3421 access to the Internet is insufficient.  In general, if you can install
3422 and use a WWW browser on your system, you should be able to install and
3423 use network-aware Sequin.  Check with your system administrator or
3424 Internet provider if you are uncertain as to whether you have direct
3425 Internet connectivity.
3426
3427 #There are two ways to change Sequin into its network-aware mode.  If
3428 you are still on the initial Welcome to Sequin form, select Net
3429 Configure under the Misc menu.  If you have already worked on a Sequin
3430 submission and are looking at the record in the record viewer, select
3431 the Net Configure option from the Misc menu.
3432
3433 #Most users will be able to use the default (Normal) settings on the
3434 Network Configuration page; select Accept to complete the configuration
3435 process.
3436
3437 #If a "Normal" Connection does not work, you may need to select the
3438 Firewall Connection.  Contact your system administrator to determine
3439 what to enter into the Proxy and Port fields.  If you do not have
3440 access to the domain name server (DNS), uncheck this box.
3441
3442 #The Timeout pop-up selects the length of time that your local copy of
3443 Sequin will wait for a reply from the NCBI server.  You may need to set
3444 this number higher (e.g., 60 seconds or 5 minutes) if you are outside
3445 of the United States or have a slow or unreliable Internet connection.
3446
3447 #If you have problems setting up the network configuration, contact
3448
3449 <a href="mailto:info@ncbi.nlm.nih.gov">
3450 info@ncbi.nlm.nih.gov.
3451 </a>
3452
3453 #If you would like to change Sequin back to its stand-alone mode, select
3454 Net Configure again from the Misc menu.  Click on Connection: None.
3455
3456 #The network-aware mode of Sequin allows you to perform a number of
3457 additional, important functions.  These functions all appear as
3458 additional menu items.  A brief description of these functions follows.
3459 Further descriptions are available as indicated elsewhere in the help
3460 documentation.
3461
3462 **Updating Existing GenBank Records
3463
3464 #Using Sequin in its network-aware mode, you can download an existing
3465 GenBank record from Entrez using the GenBank accession number or GI
3466 identification number (NID). You can then use Sequin to make any
3467 necessary changes to the record, and resubmit it to GenBank as a
3468 sequence update.
3469
3470 <A HREF="#WelcometoSequinForm">
3471 Instructions
3472 </A>
3473 for submitting sequence updates are presented under the Welcome to
3474 Sequin Form. You can download any record from Entrez and look at it in
3475 Sequin. However, you can only formally update those records that you
3476 have submitted, since submitters retain editorial control of their
3477 records.
3478
3479 **Performing a PubMed Look-Up
3480
3481 #In its network-aware mode, Sequin can import the relevant sections of a
3482 PubMed record directly into a sequence submission record.  Rather than
3483 typing in the entire citation, you can enter minimal information, such
3484 as the PubMed Unique Identifier (PMID), or Journal name, volume, year,
3485 and pages.  The
3486
3487 <A HREF="#JournalPage">
3488 PubMed lookup
3489 </A>
3490 is explained in the section of the documentation entitled Publications.
3491
3492 **Performing a Taxonomy Look-up
3493 #In its network-aware mode, Sequin can look
3494 up the taxonomic lineage of an organism from the NCBI's Taxonomy
3495 database.  This look-up is normally performed by the NCBI database staff
3496 after the record has been submitted to GenBank.  If you look up the
3497 taxonomy before submitting the sequence, you can make a note in the
3498 record of any disagreements.  The
3499 <A HREF="#LineageSubpage">
3500 taxonomy lookup
3501 </A>
3502 is explained in the section of the documentation covering
3503 Biological Source: Organism page: Lineage subpage.
3504
3505 **Accessing the NCBI DeskTop
3506 #The NCBI DeskTop displays the internal
3507 structure of the record being viewed in Sequin.  The
3508 <A HREF="#NCBIDeskTop">
3509 DeskTop
3510 </A>
3511 is explained under the Misc menu.
3512
3513 *NCBI DeskTop
3514
3515 #This option is only available if you are running Sequin in its
3516 <A HREF="#NetConfigure">
3517 network-aware
3518 </A>
3519 mode.
3520
3521 #The NCBI DeskTop provides a view of the internal structure of the
3522 Sequin record, the ASN.1.  Its display resembles a Venn diagram and
3523 shows all the structures represented in the ASN.1 data model.
3524
3525 #In addition, a number of undocumented software tools from the NCBI can
3526 be accessed from the DeskTop.  These tools are components of the NCBI
3527 portable software Toolkit.  You can also customize these functions using
3528 the Toolkit with your own software tools.  The Toolkit and its
3529 documentation are available from the NCBI by anonymous
3530 <A HREF="ftp://ftp.ncbi.nih.gov/toolbox/README">
3531 FTP.
3532 </A>
3533
3534 #The DeskTop should only be used by very seasoned users.  At this time,
3535 we are not providing any documentation for these specialized functions.
3536
3537 >Annotate Menu
3538
3539 #This menu allows you to enter features and descriptors on the sequence.
3540
3541 #The first six options, Genes and Named Regions, Coding Regions and
3542 Transcripts, Structural RNAs, Bibliographic and Comments, Sites and
3543 Bonds, and Remaining Features refer to types of Features that can be
3544 added to the sequence. Features are described in more detail in the
3545 above section entitled
3546 <A HREF="#Features">
3547 Features.
3548 </A>
3549
3550 #If you are submitting a set of similar sequences, you can add the same
3551 feature across the entire span of each by using the Batch Feature Apply
3552 option.  The feature must span the entire nucleotide sequence of each
3553 member; you cannot annotate specific nucleotide locations using this
3554 option (for this, see
3555
3556 <A HREF="#FeaturePropagate">Feature Propagate</A>).
3557
3558 For each feature, you will be prompted to designate whether the feature
3559 is 5' or 3' partial and whether it is on the plus or minus strand.  You
3560 may also add a comment or other qualifier to the feature.  The Add
3561 Qualifier option allows you to add a qualifier to an existing feature.
3562 You must specify the feature and qualifier in the Add Qualifier pop-up
3563 box.  Source qualifiers can be added to all entries using the Add
3564 Source Qualifier option.  Qualifiers specific to the CDS and gene can
3565 be added using Add CDS-Gene-Prot-mRNA, and RNA qualifiers using Add RNA
3566 Qual.  In each case, a pop-up box appears with qualifier options
3567 appropriate for that feature.
3568
3569 #The Batch Feature Edit function allows you to edit existing qualifiers.
3570  For each menu choice, a pop-up box allows you to select the feature
3571 containing the qualifier and the specific qualifier to be edited.  You
3572 can use the Find/Replace text boxes to edit the information contained
3573 within the qualifier.
3574
3575 #The Publications option allows you to add a Publication Feature or
3576 Publication Descriptor to the record.  Publications are described in
3577 more detail in the above section entitled
3578
3579 <A HREF="#Publications">
3580 Publications.
3581 </A>
3582
3583 #The Descriptors option allows you to add Descriptors to the
3584 record.  Descriptors are described in more detail in the section
3585 entitled
3586 <A HREF="#Descriptors">
3587 Descriptors,
3588 </A>
3589 above.
3590
3591 #The Generate Definition Line option will generate a title for your
3592 sequence based on the information provided in the record.  This option
3593 will work for single sequences as well as sets of sequences, and can
3594 handle complex annotations with multiple features.  The title will
3595 follow GenBank conventions, but may be modified by the database staff
3596 if it is not appropriate.  The generated title will replace any
3597 title you entered elsewhere in the submission, for example, any title
3598 that was attached to the nucleotide sequence.  For a description of
3599 definition lines, see
3600
3601 <A HREF="#NucleotideDefinitionLine(Title)">
3602 Nucleotide Definition Line (Title)
3603 </A>
3604 , above.
3605
3606 >Options Menu
3607
3608 *Font
3609
3610 #Use this item to change the display font.  From the pop-up menus,
3611 choose the style and size of type.  For additional changes, mark the
3612 Bold, Italic, or Underline check boxes. The default font is 10-point
3613 Courier.
3614
3615 >Sequence Editor
3616
3617 #This editor allows you to modify the nucleotide or amino acid sequences
3618 and corresponding annotation in your entry.  Although the Sequence Editor
3619 does allow you to undo changes you make to the sequence, we strongly
3620 suggest that you save a copy of the entry before launching the Sequence
3621 Editor so that you can revert to it if necessary.
3622
3623 *Starting the Sequence Editor
3624
3625 #The sequence that appears in the editor is dependent on the sequence(s)
3626 selected in the Target Sequence pull-down list.  There are two ways to
3627 launch the sequence editor for nucleotide sequences.  First, you can
3628 double-click within the sequence in any display format of the record viewer.
3629 A window containing the DNA sequence will appear.  Second, in the record
3630 viewer, select the sequence that you would like to edit in the Target
3631 Sequence pop-up menu.  Click on Edit Sequence under the Edit menu. You
3632 can launch the editor for protein sequences by selecting the protein
3633 sequence in the Target Sequence pop-up menu and double-clicking within
3634 the protein sequence. A window containing the protein sequence will
3635 appear.
3636
3637 *Moving around the Sequence Editor
3638
3639 #The cursor can be moved with the mouse or the arrow keys.  The display
3640 window will change to show the position of the cursor.  The sequence
3641 location of the first residue on each line is indicated on the left side
3642 of the window.  The cursor location, or the range of sequences selected
3643 by the mouse, is shown in the upper left corner of the window.  If you
3644 want to move the cursor to a specific location, type the number in the
3645 box on the top left of the sequence editor window, and hit the Go to
3646 button.  If you want to look at a specific sequence, but not move the
3647 cursor to it, type the number in the upper right box of the window and
3648 hit the Look at button.
3649
3650 *Editing Sequence and Existing Annotation
3651
3652 #Select a piece of sequence by highlighting it with the mouse.  To
3653 select the entire sequence, click on a sequence location number on the
3654 left side of the window.  Any sequence that is highlighted in the
3655 Sequence Editor will show up as a box on the sequence when it is viewed
3656 in the Graphic Display Format.
3657
3658 #One way to insert and delete residues is with the mouse.  Move the
3659 cursor to the appropriate location and type.  Text will be inserted to
3660 the left of the cursor.  Delete sequence with the backspace or delete
3661 key.  Text will be deleted to the left of the cursor.  To delete a block
3662 of sequence, highlight it with the mouse and use the delete or backspace
3663 key.
3664
3665 #Another way to insert and delete residues is with options under the Edit
3666 menu of the Sequence Editor.  Use Cut to remove, or Copy to copy,
3667 highlighted residues.  Copied residues can then be pasted elsewhere
3668 within the sequence by using the Paste option.
3669
3670 #Features annotated via the record viewer will be displayed in a
3671 graphical format within the sequence editor.  CDS features will be
3672 displayed as a blue line across the appropriate nucleotide location.  All
3673 other features will be displayed as a black line. To the left of the
3674 line, the name of the feature is displayed.  In the case of CDS or mRNA
3675 features, the product name is shown.  For gene features, the gene locus
3676 is shown.
3677
3678 #Double-clicking on the feature will launch the feature editor just as in
3679 the record viewer.  However, you can also change the nucleotide location
3680 of any feature within the graphical view.  To move the entire feature,
3681 select the feature and drag it to the appropriate location while holding
3682 down the mouse button.  To alter the 5' or 3' end of a feature, click on
3683 the feature's end and drag to the new location while holding down the
3684 mouse button.
3685
3686 #Before moving the nucleotide locations of a CDS feature, it may be
3687 useful to view the codons in the current translation.  You can do this by
3688 clicking on the feature line and releasing the mouse button.  A grid will
3689 be displayed that shows the triplet location for the current annotation.
3690 Once you have changed the nucleotide location of a CDS feature in the
3691 graphical view, you can see the new translation by using the Translate
3692 CDS button at the bottom of the window.
3693
3694 #To save changes you have made to the sequence, press the Accept button
3695 at the bottom of the Sequence Editor display window.  If you do not want
3696 to save the changes, press the Cancel button at the bottom of the
3697 Sequence Editor display window.  Selecting either Accept or Cancel will
3698 quit the Sequence Editor and return you to the record viewer.  Any
3699 changes you make will not become a permanent part of the Sequin record
3700 until you Save the record in the record viewer.
3701
3702 #New features can be added using the Features menu.
3703
3704 *Sequence Editor Window Buttons
3705
3706 **Go to
3707
3708 #Moves the cursor to the indicated location.
3709
3710 **Look at
3711
3712 #Moves the window to the indicated location without moving the cursor.
3713
3714 **Merge Feature Mode/Split Feature Mode
3715
3716 #In merge mode, any new sequence that is entered into a region spanned
3717 by an existing feature becomes part of that feature.  For example, if
3718 you enter new sequence in the middle of a CDS, that sequence will be
3719 translated as part of the CDS.  In split mode, the new sequence
3720 interrupts the feature.  For example, if you enter new sequence in the
3721 middle of a CDS, the CDS will be interrupted by that sequence (see the
3722 location of the CDS in the record viewer).
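
#The difference between the two modes can be sketched in a few lines of Python.  This is only an illustration, not Sequin's own code: the function name insert_sequence, the list-of-intervals representation, and the handling of insertions exactly at a feature boundary are all assumptions made for the example.

```python
def insert_sequence(feature, pos, length, mode):
    """Return a feature's intervals after `length` residues are inserted
    immediately before 1-based position `pos`.

    `feature` is a list of (start, stop) intervals, 1-based and inclusive.
    `mode` is "merge" or "split" (hypothetical names for the two modes).
    """
    if mode not in ("merge", "split"):
        raise ValueError("mode must be 'merge' or 'split'")
    updated = []
    for start, stop in feature:
        if pos <= start:
            # Insertion is upstream of the interval: shift it as a whole.
            updated.append((start + length, stop + length))
        elif pos > stop:
            # Insertion is downstream of the interval: nothing changes.
            updated.append((start, stop))
        elif mode == "merge":
            # The new residues become part of the feature.
            updated.append((start, stop + length))
        else:
            # The new residues interrupt the feature.
            updated.append((start, pos - 1))
            updated.append((pos + length, stop + length))
    return updated


cds = [(10, 30)]
print(insert_sequence(cds, 20, 6, "merge"))
print(insert_sequence(cds, 20, 6, "split"))
```

#In this sketch, inserting six residues before position 20 of a CDS at 10..30 gives a single interval 10..36 in merge mode and the two intervals 10..19 and 26..36 in split mode.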
3723
3724 **Numbering
3725
3726 #Allows the sequence location numbering to be hidden, displayed on the
3727 side, or displayed on the top of the sequence.
3728
3729 **Grid
3730
3731 #Allows the display to show a grid separating each feature and sequence
3732 for easier viewing.
3733
3734 **Show/Hide Features
3735
3736 #This box toggles between hiding and showing the features on a sequence.
3737
3738 **Accept
3739
3740 #Closes the Sequence Editor after saving all of the changes made to
3741 sequences and features.
3742
3743 **Cancel
3744
3745 #Closes the Sequence Editor without saving any changes made to sequences or
3746 features.
3747
3748 **Translate CDS
3749
3750 #Allows translation of coding region features after the location has been
3751 changed within the graphical view.
3752
3753 *Sequence Editor File Menu
3754
3755 **Export
3756
3757 #Allows the export of a range of sequence as a FASTA file or text file.
3758 Using the text option will also export overlapping features if they are
3759 displayed.  If the features are first hidden, only the sequence will be
3760 exported.  All protein translations displayed at the time of export will
3761 be exported as well.
3762
3763 **Accept
3764
3765 #Closes the Sequence Editor after saving all of the changes made to
3766 sequence and features.
3767
3768 **Cancel
3769
3770 #Closes the Sequence Editor without saving any changes made to sequences
3771 or features.
3772
3773 *Sequence Editor Edit Menu
3774
3775 **Undo
3776
3777 #Undoes all actions performed in the Sequence Editor since the last save.
3778
3779 **Redo
3780
3781 #Restores changes removed with the Undo option.
3782
3783 **Cut
3784
3785 #Removes the highlighted sequence.  This sequence can be pasted elsewhere.
3786
3787 **Paste
3788
3789 #Pastes a cut or copied sequence to the right of the cursor.
3790
3791 **Copy
3792
3793 #Copies the highlighted sequence.  This sequence can be pasted elsewhere.
3794
3795 **Find
3796
3797 #Allows you to find DNA or amino acid sequence patterns in your sequence.
3798 The search is case-insensitive.  To find an exact match to a DNA
3799 sequence pattern, type the pattern in the box.  The number of items found
3800 will be displayed and you can step through each instance with the Find
3801 Next button.  To find the reverse complement of the pattern, click on
3802 the reverse complement box at the bottom of the pop-up box.
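
#As a rough illustration of what an exact, case-insensitive DNA search with a reverse complement option involves, consider the following Python sketch.  The function names and the choice to search for the reverse complement instead of (rather than in addition to) the typed pattern are assumptions for the example and are not taken from Sequin.

```python
def reverse_complement(dna):
    """Return the reverse complement of an unambiguous DNA string."""
    return dna.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def find_pattern(sequence, pattern, use_reverse_complement=False):
    """Return the 1-based start positions of every exact match.

    The comparison ignores case.  Overlapping matches are reported.
    """
    query = pattern.upper()
    if not query:
        return []
    if use_reverse_complement:
        query = reverse_complement(query)
    seq = sequence.upper()
    hits = []
    start = seq.find(query)
    while start != -1:
        hits.append(start + 1)              # convert to 1-based coordinates
        start = seq.find(query, start + 1)  # continue one residue further on
    return hits


print(find_pattern("acgtGATTACAaa", "gattaca"))
print(find_pattern("TTTGTAATCTT", "GATTACA", use_reverse_complement=True))
```

#The length of the returned list corresponds to the number of items found, and stepping through the list corresponds to pressing Find Next.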
3803
3804 #To find an exact match to an amino acid sequence pattern, type that
3805 sequence in the box, and click on "translate sequence".  Sequin will look
3806 for all occurrences of that pattern in all six reading frames.  The
3807 DNA sequence encoding that protein sequence in any of the six reading
3808 frames will be highlighted.
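
#A six-frame search of this kind can be sketched as follows.  This is an assumption-laden illustration, not Sequin's implementation: it uses the standard genetic code only, numbers the frames +1 to +3 and -1 to -3, and reports the nucleotide range of each hit on the submitted strand.  All names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons; codons with ambiguous bases become X."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def find_protein_pattern(sequence, pattern):
    """Return (frame, start, stop) for each exact hit in six frames.

    start and stop are 1-based nucleotide coordinates on the submitted
    strand, with start <= stop even for hits in the negative frames.
    """
    seq = sequence.upper()
    pattern = pattern.upper()
    if not pattern:
        return []
    n = len(seq)
    hits = []
    for strand, template in ((1, seq), (-1, reverse_complement(seq))):
        for offset in range(3):
            protein = translate(template[offset:])
            idx = protein.find(pattern)
            while idx != -1:
                first = offset + 3 * idx             # 0-based, on template
                last = first + 3 * len(pattern) - 1  # 0-based, on template
                if strand == 1:
                    hits.append((offset + 1, first + 1, last + 1))
                else:
                    # Convert back to coordinates on the submitted strand.
                    hits.append((-(offset + 1), n - last, n - first))
                idx = protein.find(pattern, idx + 1)
    return hits


print(find_protein_pattern("ccATGGCTtt", "MA"))
```

#Each returned range is the stretch of DNA that would be highlighted for that hit.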
3809
3810 **Translate CDS
3811
3812 #Allows translation of coding region features after the location has been
3813 changed within the graphical view.
3814
3815 **Complement
3816
3817 #Shows the complement of the submitted strand underneath the original.
3818
3819 **Reading Frames
3820
3821 #Shows the translation of the selected coding sequence in the indicated
3822 reading frame.  You can select any one of the six reading frames, all six
3823 reading frames, all positive frames, or all negative frames.
3824
3825 **Protein Mismatches
3826
3827 #Indicates amino acids that do not match the conceptual translation
3828 following a nucleotide sequence change.  The original amino acid sequence
3829 will be displayed until the Translate CDS function is used.  Differences
3830 will be indicated by a red box around the amino acid abbreviation.
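
#The comparison behind this display amounts to a position-by-position check of the original protein against a fresh translation.  The short Python sketch below shows the idea; the function name and the use of "-" for a position missing from the shorter sequence are assumptions for the example.

```python
from itertools import zip_longest


def protein_mismatches(original, retranslated):
    """Return (position, original_residue, new_residue) for each difference.

    Positions are 1-based.  A "-" marks a position that exists in only one
    of the two sequences.
    """
    mismatches = []
    pairs = zip_longest(original, retranslated, fillvalue="-")
    for position, (old, new) in enumerate(pairs, start=1):
        if old != new:
            mismatches.append((position, old, new))
    return mismatches


print(protein_mismatches("MKTAY", "MKSAY"))
```

#Each reported position corresponds to a residue that would be boxed in red until Translate CDS is used.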
3831
3832 **On-the-fly Protein Translations
3833
3834 #Creates a second amino acid sequence in the display which retranslates
3835 as the nucleotide sequence is changed to allow side-by-side comparison to
3836 the original amino acid sequence.
3837
3838 *Sequence Editor Features Menu
3839
3840 #The menu contains a long list of all features that can be annotated on a
3841 sequence.  These features are the same as those that are accessible
3842 through the main Sequin Annotate menu.
3843
3844 #You can annotate features either in the Annotate menu or in the Sequence
3845 Editor. If you annotate them in the Annotate menu, you must type in the
3846 nucleotide sequence location of the feature.  However, if you add
3847 features from the Sequence Editor, you can highlight the sequence that
3848 the feature covers, and the location of the sequence will be
3849 automatically entered in the feature location box.  Additional
3850 explanations of how to annotate features are provided in the section on
3851 <A HREF="#Features">
3852 Features.
3853 </A>
3854
3855 >Working with Sets of Aligned Sequences
3856
3857 #Sequin allows you to work with aligned sets of closely related
3858 nucleotide sequences that are part of a population, phylogenetic, or
3859 mutation study.  If the sequences are imported in a pre-aligned format,
3860 such as PHYLIP, Sequin uses this alignment.  If the sequences are
3861 imported individually in FASTA format, Sequin can generate its own
3862 alignment.
3863
3864 #You can view the aligned sequences in the Sequence Alignment Editor. In
3865 the record viewer, select All Sequences in the Target Sequences menu,
3866 and select the Alignment Display Format.
3867
3868 #The Alignment Assistant is launched by selecting Alignment Assistant
3869 from the Edit menu in the record viewer. It can be used to apply
3870 features to the whole set of sequences using the alignment coordinates.
3871 Rather than calculating the nucleotide coordinates for every feature on
3872 every nucleotide sequence, you may select the feature's location using
3873 its alignment coordinates and apply it to every member of the set
3874 simultaneously.  Sequin will calculate the nucleotide locations as they
3875 apply to each member of the set.
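
#The coordinate conversion described above can be illustrated with a small Python sketch.  It assumes that each aligned sequence is a string of equal length with "-" marking gaps, and that a feature is skipped for a sequence that has only gaps in the selected range; the function names and these conventions are assumptions, not a description of Sequin's internals.

```python
def alignment_to_sequence_range(aligned_row, aln_start, aln_stop, gap="-"):
    """Map a 1-based inclusive alignment range onto one sequence.

    Returns the (start, stop) nucleotide coordinates on the ungapped
    sequence, or None if the row contains only gaps in that range.
    """
    before = sum(1 for c in aligned_row[:aln_start - 1] if c != gap)
    inside = sum(1 for c in aligned_row[aln_start - 1:aln_stop] if c != gap)
    if inside == 0:
        return None
    return (before + 1, before + inside)


def apply_to_alignment(alignment, aln_start, aln_stop):
    """Compute the feature location on every member of the alignment."""
    return {
        seq_id: alignment_to_sequence_range(row, aln_start, aln_stop)
        for seq_id, row in alignment.items()
    }


alignment = {
    "seq1": "ATG--CCGTA",
    "seq2": "ATGAACCG-A",
    "seq3": "---AA-----",
    "seq4": "ATG-----TA",
}
for seq_id, location in apply_to_alignment(alignment, 4, 8).items():
    print(seq_id, location)
```

#A single selection of alignment columns 4 to 8 yields a different nucleotide range on each sequence because the gaps differ from row to row.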
3876
3877 *Alignment Assistant Window Buttons
3878
3879 **Go to
3880
3881 #The Go to alignment position and Go to sequence position buttons both
3882 scroll the Alignment Assistant so that the requested position is
3883 visible. If the requested position is already visible, nothing will
3884 happen.  Unlike in the Sequence Editor window, the Go to buttons do not
3885 control the cursor position.
3886
3887 **Numbering
3888
3889 #Allows the sequence location numbering to be hidden, displayed on the
3890 side, or displayed on the top of the sequence.
3891
3892 **Grid
3893
3894 #Allows the display to show a grid separating each feature and sequence for easier viewing.
3895
3896 **Features Toggle
3897
3898 #It is possible to view annotated features in the Alignment Assistant.
3899 The features are displayed as a bar underneath the coordinates for that
3900 feature. The identity of the feature is displayed in the left-hand
3901 column.  The default selection is to have the features Hidden.  You may
3902 display the features associated only with the Target Sequence or
3903 features annotated on All Sequences in the alignment.
3904
3905 *Alignment Assistant File Menu
3906
3907 **Export
3908
3909 #Allows you to export the alignment to a file in three different
3910 formats.  The contiguous and interleaved options export the alignment
3911 accordingly in FASTA+GAP format.  The text representation option saves
3912 the alignment as it appears in the Alignment Assistant.  Note that with
3913 this option features are included if they are displayed at the time of
3914 export.
3915
3916 **Close
3917
3918 #Closes the Alignment Assistant window and saves any changes made.
3919
3920 *Alignment Assistant Edit Menu
3921
3922 **Remove Sequences from Alignment
3923
3924 #Allows you to remove selected sequence(s) from the alignment.  Select
3925 the sequence by clicking on it.  You can select multiple sequences by
3926 holding down the control key.  The sequence will then be highlighted in
3927 grey.  Note that this option will remove the sequence from the
3928 alignment, but it is still present in your submission.
3929
3930 **Validate Alignment
3931
3932 #Checks for problems with the alignment.  If errors are reported, please
3933 review and attempt to fix your alignment before submission.
3934
3935 **Propagate Features
3936
3937 #This function is the same as that available under the Edit Menu in the
3938 record viewer.  A full description is available
3939
3940 <A HREF="#FeaturePropagate">
3941 above
3942 </A>
3943 .
3944
3945 *Alignment Assistant View Menu
3946
3947 **Target
3948
3949 #Allows you to select a sequence within the alignment as the target
3950 sequence.  This can also be done by double-clicking on the sequence
3951 within the alignment.  The SeqID of the target sequence will be
3952 displayed in red.  Features can be displayed on the target sequence
3953 only and it is the sequence used for comparison in the
3954
3955 <A HREF="#ShowSubstitutions">
3956 Show Substitutions
3957 </A>
3958 view.
3959
3960 **Show Substitutions
3961
3962 #Changes the alignment view so that identities are replaced with a "."
3963 and only substitutions are shown.  The substitutions and identities are
3964 relative to the selected target sequence.
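
#The view can be mimicked with a few lines of Python.  The sketch assumes equal-length aligned rows, a case-insensitive comparison, and that gap characters are always shown as gaps; these choices and the function name are assumptions for illustration only.

```python
def show_substitutions(alignment, target_id, gap="-"):
    """Replace residues identical to the target sequence with "."."""
    target = alignment[target_id]
    view = {}
    for seq_id, row in alignment.items():
        if seq_id == target_id:
            view[seq_id] = row  # the target itself is shown in full
            continue
        view[seq_id] = "".join(
            "." if res != gap and res.upper() == ref.upper() else res
            for res, ref in zip(row, target)
        )
    return view


alignment = {"target": "ATGCCA", "seqA": "ATGTCA", "seqB": "A-GCCT"}
for seq_id, row in show_substitutions(alignment, "target").items():
    print(seq_id.ljust(8), row)
```

#Choosing a different target sequence changes which positions count as substitutions, since every row is compared against the target.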
3965
3966 *Alignment Assistant Features Menu
3967
3968 #Allows the annotation of features on a single sequence or all sequences
3969 within the alignment.  All features available in this menu are the same
3970 as those accessible through the main Sequin Annotate menu.
3971
3972 #Select the feature location by clicking the start location on one of
3973 the sequences and, keeping the mouse button depressed, dragging the
3974 cursor to the end of the feature location.  The selected area will now be
3975 underlined and red, and the alignment coordinates of this area will be
3976 displayed in the upper left of the Alignment Assistant window.
3977
3978 **Apply to Target Sequence
3979
3980 #Allows you to choose a feature to be applied only to the target
3981 sequence.  The locations may be entered manually or can be determined
3982 based on highlighting the sequence as described above.
3983
3984 **Apply to Alignment
3985
3986 #Allows you to add the selected feature to all sequences within your
3987 alignment based on the alignment coordinates you have selected.  Note
3988 that in the feature pop-up boxes in this menu, the Location will always
3989 be entered as the location relative to the alignment coordinates.
3990
3991 <HR>
3992
3993 <CENTER>
3994 <P>&nbsp;
3995 <P CLASS=medium1><B>Questions or Comments?</B>
3996 <BR>Write to the <A HREF="mailto:info@ncbi.nlm.nih.gov">NCBI Service
3997 Desk</A></P>
3998 <P CLASS=medium1>Revised December 2, 2005
3999
4000 </CENTER>
4001
4002 <!--  end of content  -->
4003
4004 </body>
4005 </html>
4006