From dd2bb0b112c9628cc58c6a43c52602d03ee82346 Mon Sep 17 00:00:00 2001
From: Ben Soares
Date: Mon, 25 Jul 2022 17:09:43 +0100
Subject: [PATCH] JAL-4036 updated help about new UniProt API

---
 help/help/html/features/uniprotqueryfields.html    | 585 ++++++++------------
 .../help/html/features/uniprotsequencefetcher.html |  18 +-
 2 files changed, 239 insertions(+), 364 deletions(-)

diff --git a/help/help/html/features/uniprotqueryfields.html b/help/help/html/features/uniprotqueryfields.html
index 182b206..66082f2 100644
--- a/help/help/html/features/uniprotqueryfields.html
+++ b/help/help/html/features/uniprotqueryfields.html
@@ -33,359 +33,238 @@ syntax).

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
FieldExampleDescription
accessionaccession:P62988Lists all entries with the primary or secondary accession - number P62988.
activeactive:no Lists all obsolete entries.
annotation - annotation:(type:non-positional)
- annotation:(type:positional)
annotation:(type:mod_res - "Pyrrolidone carboxylic acid" evidence:experimental) -
Lists all entries with: -
    -
  • any general annotation (comments [CC])
  • -
  • any sequence annotation (features [FT])
  • -
  • at least one amino acid modified with a Pyrrolidone - carboxylic acid group
  • -
-
author author:ashburner Lists all entries with at least one reference co-authored - by Michael Ashburner.
cdantigen cdantigen:CD233 Lists all entries whose cluster of differentiation number - is CD233.
citation - citation:("intracellular structural proteins")
- citation:(author:ashburner journal:nature) citation:9169874 -
Lists all entries with a literature citation: -
    -
  • containing the phrase "intracellular structural - proteins" in either title or abstract
  • -
  • co-authored by Michael Ashburner and published in - Nature
  • -
  • with the PubMed identifier 9169874
  • -
-
cluster cluster:UniRef90_A5YMT3 Lists all entries in the UniRef 90% identity cluster - whose representative sequence is UniProtKB entry A5YMT3.
count - annotation:(type:transmem count:5)
- annotation:(type:transmem count:[5 TO *])
- annotation:(type:cofactor count:[3 TO *]) -
Lists all entries with: -
    -
  • exactly 5 transmembrane regions
  • -
  • 5 or more transmembrane regions
  • -
  • 3 or more Cofactor comments
  • -
-
created - created:[20121001 TO *]
reviewed:yes AND - created:[current TO *] -
Lists all entries created since October 1st 2012.
- Lists all new UniProtKB/Swiss-Prot entries in the last release. -
database - database:(type:pfam)
database:(type:pdb 1aut) -
Lists all entries with: -
    -
  • a cross-reference to the Pfam database
  • -
  • a cross-reference to the PDB database entry 1aut
  • -
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
rest.uniprot.org fieldrest.uniprot.org exampleDescription
accessionaccession:P62988The old behaviour was to list all entries with primary or secondary accession number P62988. The new behaviour will list all primary / canonical isoform accessions P62988. To search over secondary accessions, we have introduced the sec_acc field.
activeactive:falseLists all obsolete entries.
Refer to the page: Sequence AnnotationsLists all entries with:
  1. any general annotation (comments [CC])
  2. any sequence annotation (features [FT])
  3. at least one amino acid modified with a Pyrrolidone carboxylic acid group
lit_authorlit_author:ashburnerLists all entries with at least one reference co-authored by Michael Ashburner.
protein_nameprotein_name:CD233Lists all entries whose cluster of differentiation number is CD233 (see cdlist.txt).
chebichebi:18420Lists all entries which are associated with the small molecule corresponding to ChEBI identifier 18420, Mg(2+) (see How can I search UniProt for chemical or reaction data?).
uniprot_id (/uniref), then uniref_cluster_90 (/uniprotkb)
  1. uniprot_id:A5YMT3 to find cluster UniRef90_P00395
  2. uniref_cluster_90:UniRef90_P00395
Find all entries in the UniRef 90% identity cluster whose representative sequence is UniProtKB entry A5YMT3 (about UniRef).
xref_count_pdb (or xref_count)xref_count_pdb:[20 TO *]Lists all entries with 20 or more cross-references to PDB
date_createddate_created:[2012-10-01 TO *]Lists all entries created since October 1st 2012.
database, xref
  1. database:pfam
  2. xref:pdb-1aut
Lists all entries with:
  1. a cross-reference to the Pfam database
  2. a cross-reference to the PDB database entry 1aut
(see Databases cross-referenced in UniProtKB and Database mapping)
ecec:3.2.1.23Lists all beta-galactosidases (Enzyme nomenclature database).
Refer to the pages: Comments or Sequence AnnotationsLists all entries with:
  1. a signal sequence whose positions have been experimentally proven
  2. experimentally proven phosphoserine sites
  3. a function manually asserted according to rules
(see Evidence attribution)
existenceexistence:3See Protein existence criteria.
familyfamily:serpinLists all entries belonging to the Serpin family of proteins (Index of protein domains and families).
fragmentfragment:trueLists all entries with an incomplete sequence.
genegene:HPSELists all entries for proteins encoded by gene HPSE, but also by HPSE2.
gene_exactgene_exact:HPSELists all entries for proteins encoded by gene HPSE, but excluding variations like HPSE2 or HPSE_0.
gogo:0015629Lists all entries associated with the GO term Actin cytoskeleton and any subclasses
virus_host_name, virus_host_idvirus_host_id:10090Lists all entries for viruses infecting Mus musculus (Mouse)
accession_idaccession_id:P00750Returns the entry with the primary accession number P00750.
inchikeyinchikey:WQZGKKKJIJFFOK-GASJEMHNSA-NReturns entries associated with the small molecule identified by the InChIKey WQZGKKKJIJFFOK-GASJEMHNSA-N, i.e. D-glucopyranose (see How can I search UniProt for chemical or reaction data?). To get the CHEBI identifier for an Inchikey value, one can now use the advanced search builder.
protein_nameprotein_name:AnakinraLists all entries whose protein name includes the "International Nonproprietary Name" Anakinra.
interactorinteractor:P00520Lists all entries describing interactions with the protein described by entry P00520.
keyword
  1. keyword:toxin
  2. keyword:KW-0800
  1. Lists all entries associated with a keyword matching "Toxin" in its name or description (UniProtKB Keywords).
  2. Lists all entries associated with the UniProtKB keyword Toxin.
lengthlength:[500 TO 700]Lists all entries describing sequences of length between 500 and 700 residues.
massmass:[500000 TO *]Lists all entries describing sequences with a mass of at least 500,000 Da.
cc_mass_spectrometrycc_mass_spectrometry:maldiLists all entries for proteins identified by: matrix-assisted laser desorption/ionization (MALDI), crystallography (X-Ray). The method field searches names of physico-chemical identification methods in the 'Biophysicochemical properties' subsection of the 'Function' section, the 'Publications' and 'Cross-references' sections.
date_modifieddate_modified:[2019-01-01 TO 2019-03-01] AND active:trueLists all active entries that were last modified between January and March 2019.
protein_nameprotein_name:"prion protein"Lists all entries for prion proteins.
organelleorganelle:MitochondrionLists all entries for proteins encoded by a gene of the mitochondrial chromosome.
organism_name, organism_id
  1. organism_name:"Ovis aries"
  2. organism_id:9940
  3. organism_name:sheep
Lists all entries for proteins expressed in sheep (first 2 examples) and organisms whose name contains the term "sheep" (UniProt taxonomy).
plasmidplasmid:ColE1Lists all entries for proteins encoded by a gene of plasmid ColE1 (Controlled vocabulary of plasmids).
proteomeproteome:UP000005640Lists all entries from the human proteome.
proteomecomponentproteomecomponent:"chromosome 1" AND organism_id:9606Lists all entries from the human chromosome 1.
sec_accsec_acc:P02023Lists all entries that were created from a merge with entry P02023 (see FAQ).
reviewedreviewed:trueLists all UniProtKB/Swiss-Prot entries (about UniProtKB).
scopescope:mutagenesisLists all entries containing a reference that was used to gather information about mutagenesis (Entry view: "Cited for", See 'Publications' section of the user manual).
sec_accsec_acc:P62988Lists all entries containing a secondary accession P62988.
sequenceaccession:P05067-9 AND is_isoform:trueLists all entries containing a link to isoform 9 of the sequence described in entry P05067. Allows searching by specific sequence identifier.
date_sequence_modified
  1. date_sequence_modified:[2012-01-01 TO 2012-03-01]
  2. reviewed:true AND date_sequence_modified:[2012-01-01 TO *]
  1. Lists all entries whose sequences were last modified between January and March 2012.
  2. Lists all UniProtKB/Swiss-Prot entries whose sequences were modified after the start of 2012.
strainstrain:wistarLists all entries containing a reference relevant to strain wistar (Lists of strains in reference comments and Taxonomy help: organism strains).
taxonomy_name, taxonomy_id
  1. taxonomy_name:mammal
  2. taxonomy_id:40674
Lists all entries for proteins expressed in Mammals. This field is used to retrieve entries for all organisms classified below a given taxonomic node (taxonomy classification).
tissuetissue:liverLists all entries containing a reference describing the protein sequence obtained from a clone isolated from liver (Controlled vocabulary of tissues).
cc_webresourcecc_webresource:wikipediaLists all entries for proteins that are described in Wikipedia.
-
domain domain:VWFA Lists all entries with a Von Willebrand factor type A - domain described in the 'Family and Domains' section.
ec ec:3.2.1.23 Lists all beta-galactosidases.
evidence - annotation:(type:signal evidence:ECO_0000269)
- (type:mod_res phosphoserine evidence:ECO_0000269)
- annotation:(type:function AND evidence:ECO_0000255) -
Lists all entries with: -
    -
  • a signal sequence whose positions have been - experimentally proven
  • -
  • experimentally proven phosphoserine sites
  • -
  • a function manually asserted according to rules
  • -
-
family family:serpin Lists all entries belonging to the Serpin family of - proteins.
fragment fragment:yes Lists all entries with an incomplete sequence.
gene gene:HSPC233 Lists all entries for proteins encoded by gene HSPC233.
go - go:cytoskeleton
go:0015629 -
Lists all entries associated with: -
    -
  • a GO term containing the word "cytoskeleton"
  • -
  • the GO term Actin cytoskeleton and any subclasses
  • -
-
host - host:mouse
host:10090
host:40674 -
Lists all entries for viruses infecting: -
    -
  • organisms with a name containing the word "mouse"
  • -
  • Mus musculus (Mouse)
  • -
  • all mammals (all taxa classified under the taxonomy - node for Mammalia)
  • -
-
idid:P00750Returns the entry with the primary accession number - P00750.
inn inn:Anakinra Lists all entries whose "International Nonproprietary - Name" is Anakinra.
interactor interactor:P00520 Lists all entries describing interactions with the - protein described by entry P00520.
keyword keyword:toxin Lists all entries associated with the keyword Toxin.
length length:[500 TO 700] Lists all entries describing sequences of length between - 500 and 700 residues.
lineage - This field is a synonym for the field taxonomy. -
mass mass:[500000 TO *] Lists all entries describing sequences with a mass of at - least 500,000 Da.
method - method:maldi
method:xray -
Lists all entries for proteins identified by: - matrix-assisted laser desorption/ionization (MALDI), - crystallography (X-Ray). The method field searches - names of physico-chemical identification methods in the - 'Biophysicochemical properties' subsection of the 'Function' - section, the 'Publications' and 'Cross-references' sections. -
mnemonic mnemonic:ATP6_HUMAN Lists all entries with entry name (ID) ATP6_HUMAN. - Searches also obsolete entry names.
modified - modified:[20120101 TO 20120301]
reviewed:yes AND - modified:[current TO *] -
Lists all entries that were last modified between January - and March 2012.
Lists all UniProtKB/Swiss-Prot entries - modified in the last release. -
name name:"prion protein" Lists all entries for prion proteins.
organelle organelle:Mitochondrion Lists all entries for proteins encoded by a gene of the - mitochondrial chromosome.
organism - organism:"Ovis aries"
organism:9940
- organism:sheep
-
Lists all entries for proteins expressed in sheep (first - 2 examples) and organisms whose name contains the term "sheep". -
plasmid plasmid:ColE1 Lists all entries for proteins encoded by a gene of - plasmid ColE1.
proteome proteome:UP000005640 Lists all entries from the human proteome.
proteomecomponent proteomecomponent:"chromosome 1" and - organism:9606 Lists all entries from the human chromosome 1.
replaces replaces:P02023 Lists all entries that were created from a merge with - entry P02023.
reviewed reviewed:yes Lists all UniProtKB/Swiss-Prot entries.
scope scope:mutagenesis Lists all entries containing a reference that was used to - gather information about mutagenesis.
sequence sequence:P05067-9 Lists all entries containing a link to isoform 9 of the - sequence described in entry P05067. Allows searching by specific - sequence identifier.
sequence_modified - sequence_modified:[20120101 TO 20120301]
reviewed:yes - AND sequence_modified:[current TO *] -
Lists all entries whose sequences were last modified - between January and March 2012.
Lists all - UniProtKB/Swiss-Prot entries whose sequences were modified in - the last release. -
source source:intact Lists all entries containing a GO term whose annotation - source is the IntAct database.
strain strain:wistar Lists all entries containing a reference relevant to - strain wistar.
taxonomy taxonomy:40674 Lists all entries for proteins expressed in Mammals. This - field is used to retrieve entries for all organisms classified - below a given taxonomic node taxonomy classification).
tissue tissue:liver Lists all entries containing a reference describing the - protein sequence obtained from a clone isolated from liver.
web web:wikipedia Lists all entries for proteins that are described in - Wikipedia.
-
\ No newline at end of file
+
diff --git a/help/help/html/features/uniprotsequencefetcher.html b/help/help/html/features/uniprotsequencefetcher.html
index 25d1a17..1d368ef 100644
--- a/help/help/html/features/uniprotsequencefetcher.html
+++ b/help/help/html/features/uniprotsequencefetcher.html
@@ -32,14 +32,11 @@
 allows sequences to be located via gene name, keywords, or even via manual cross-referencing from UniProt or other bioinformatics websites.
-

- Please Note:Versions of Jalview older than 2.11.2.3 may need a configuration change - in order to access freetext search. Please see this post: - https://discourse.jalview.org/t/uniprot-free-text-search-not-working-in-jalview-2-11-2-2-and-earlier/1825 - in Jalview's discussion forum for a workaround.
+ Please Note: UniProt updated their API in July 2022. Versions of Jalview older than 2.11.2.4 will not work with the July 2022 UniProt free text search. +
+ The new UniProt API uses a different search syntax for ranges of dates and numbers, and different query fields for advanced searches (see the UniProtKB query fields page). The general syntax for combining queries remains the same. Because of these differences, your previously saved searches will not appear in the dropdown list next to the search box. If you need them, they are stored in your ~/.jalview_properties file under the label CACHE.UNIPROT_FTS. To transfer them to the new API search, copy the values to the CACHE.UNIPROT_2022_FTS label (or rename the existing label if the new one does not exist). Transferred searches that use old field names or date formats may need updating to the new syntax.
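The transfer of saved searches described above can be sketched in Python. This is an illustration only, not part of Jalview: the helper names `split_property` and `copy_saved_searches` are hypothetical, the file is treated as simple `key=value` lines, and the cached value is copied as an opaque string because the help text does not say how Jalview delimits the entries inside it.

```python
OLD_KEY = "CACHE.UNIPROT_FTS"        # label named in the help text
NEW_KEY = "CACHE.UNIPROT_2022_FTS"   # label named in the help text


def split_property(line):
    """Return (key, value) for a simple 'key=value' line, else None.

    Assumption: comments start with '#' or '!' and keys contain no '='.
    """
    stripped = line.strip()
    if not stripped or stripped.startswith(("#", "!")) or "=" not in stripped:
        return None
    key, value = stripped.split("=", 1)
    return key.strip(), value.strip()


def copy_saved_searches(lines):
    """Return a new list of lines with OLD_KEY's value copied to NEW_KEY.

    Only the case where NEW_KEY is absent is handled. If NEW_KEY already
    exists the lines come back unchanged, since merging two cached values
    would require knowing Jalview's internal delimiter.
    """
    props = dict(p for p in map(split_property, lines) if p is not None)
    if OLD_KEY not in props or NEW_KEY in props:
        return list(lines)
    return list(lines) + [f"{NEW_KEY}={props[OLD_KEY]}"]
```

To try it, read the file with `pathlib.Path.home().joinpath(".jalview_properties").read_text().splitlines()`, pass the result to `copy_saved_searches`, and write the returned lines to a copy of the file first. Close Jalview and keep a backup before editing the real file.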

To open the UniProt Sequence Fetcher, select UniProt as the database from any Sequence Fetcher dialog (opened @@ -78,8 +75,7 @@

  • Complex queries with the UniProt query Syntax The text box also allows complex queries to be entered. The table below provides a brief overview - of the supported syntax (see query - fields for UniProtKB): + of the supported syntax (see the UniProtKB query fields page for more details): @@ -144,7 +140,7 @@ acids. - + @@ -171,7 +167,7 @@ like to be displayed or removed.

    The UniProt Free Text Search Interface was introduced in - Jalview 2.10.0 + Jalview 2.10.0 and updated to the July 2022 API in Jalview 2.11.2.4

    - \ No newline at end of file + -- 1.7.10.2
    human antigen
    citation:(author:Arai author:Chung)(lit_author:Arai) AND (lit_author:Chung) All entries with a publication that was coauthored by two specific authors.
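The field and date changes in this patch can be illustrated with a small Python sketch that rewrites an old-style query into the new syntax. It is a rough aid under stated assumptions, not UniProt's or Jalview's own converter: the function `translate_query` and the `FIELD_RENAMES` table are hypothetical, and the mapping covers only renames that can be read off the old and new tables above. Nested forms such as `citation:(author:Arai author:Chung)` and the `current` date keyword are not restructured and would still need editing by hand.

```python
import re

# Old field name -> new field name, taken from the two tables in this patch.
FIELD_RENAMES = {
    "author": "lit_author",
    "created": "date_created",
    "modified": "date_modified",
    "sequence_modified": "date_sequence_modified",
    "id": "accession_id",
    "replaces": "sec_acc",
    "name": "protein_name",
    "web": "cc_webresource",
}

BOOL_RE = re.compile(r"\b(reviewed|fragment|active):(yes|no)\b")
DATE_RANGE_RE = re.compile(
    r"\b(created|modified|sequence_modified):\[(\S+) TO (\S+)\]"
)
FIELD_RE = re.compile(r"(?<![\w.])([a-z_]+):")


def _to_dashed(endpoint):
    """Turn YYYYMMDD into YYYY-MM-DD; leave '*' and anything else alone."""
    m = re.fullmatch(r"(\d{4})(\d{2})(\d{2})", endpoint)
    return "-".join(m.groups()) if m else endpoint


def translate_query(old_query):
    """Rewrite an old-style query string using the new field names."""
    # 1. yes/no flags become true/false.
    q = BOOL_RE.sub(
        lambda m: f"{m.group(1)}:{'true' if m.group(2) == 'yes' else 'false'}",
        old_query,
    )
    # 2. Date ranges on the old date fields get dashed dates.
    q = DATE_RANGE_RE.sub(
        lambda m: f"{m.group(1)}:[{_to_dashed(m.group(2))} TO {_to_dashed(m.group(3))}]",
        q,
    )
    # 3. Rename whole field names; unknown fields pass through unchanged.
    return FIELD_RE.sub(
        lambda m: FIELD_RENAMES.get(m.group(1), m.group(1)) + ":", q
    )
```

For example, `translate_query("created:[20121001 TO *]")` gives `date_created:[2012-10-01 TO *]`, matching the new table's example. Restricting the date rewrite to ranges on date fields avoids mangling other eight-digit values such as PubMed identifiers.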