Sequence Features File
The Sequence Features File provides a simple way of getting your own sequence features into Jalview. It also allows feature display styles and filters to be saved and imported into another alignment. Users familiar with the earliest versions of Jalview will know that features files were originally termed 'groups' files, and that the format was designed to be space efficient so that sequence features could be rendered in the Jalview applet.
Features files are imported into Jalview in the following ways:
--features <Features filename>
Sequence Features File Format
A features file is a simple ASCII text file, where each line contains tab separated text fields. No comments are allowed. Its structure consists of three blocks:
The first set of lines contains feature type definitions and their colours:
<Feature Type> <Feature Style>
Each feature type definition assigns a style to features of the given type. <Feature Style> can be either a simple colour, or a more complex Graduated Colour Scheme that shades features according to their description, score, or other attributes.
Assigning a colour for a <Feature Type>
A single colour is specified either as a red,green,blue 24 bit
triplet in hexadecimal (e.g. 00ff00) or as comma separated numbers
(each ranging from 0 to 255).
(For help with colour values, see https://www.w3schools.com/colors/colors_converter.asp.)
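The two colour notations described above can be read with a few lines of Python. This is a minimal sketch, not Jalview's own parser: the function name parse_colour is hypothetical, and it handles only the hexadecimal and comma separated forms.

```python
def parse_colour(spec):
    """Return (red, green, blue) for 'rrggbb' hex or 'r,g,b' decimal text."""
    spec = spec.strip()
    if "," in spec:
        # Comma separated decimal numbers, each from 0 to 255.
        parts = [int(p) for p in spec.split(",")]
        if len(parts) != 3 or any(not 0 <= p <= 255 for p in parts):
            raise ValueError(f"bad RGB triplet: {spec!r}")
        return (parts[0], parts[1], parts[2])
    if len(spec) == 6:
        # 24 bit hexadecimal triplet; int() raises ValueError for non-hex text.
        value = int(spec, 16)
        return ((value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF)
    raise ValueError(f"unrecognised colour: {spec!r}")


print(parse_colour("00ff00"))
print(parse_colour("0,105,215"))
```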
Specifying a Graduated Colourscheme
Data dependent feature colourschemes are defined by a series of "|" separated fields:
[label or score or attribute|<attName>|]<mincolor>|<maxcolor>|[absolute|]<minvalue>|<maxvalue>[|<novalue>][|<thresholdtype>|[<threshold value>]]
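As an illustration, the simplest form of this specification (<mincolor>|<maxcolor>|<minvalue>|<maxvalue>, with any later fields kept unparsed) can be split and used to shade a score. This is a sketch under stated assumptions: the function names are hypothetical, the optional leading and 'absolute' fields are not handled, and linear interpolation between the two colours is an assumption about how shading works rather than something this document specifies.

```python
def hex_to_rgb(colour):
    """Convert a six-digit hexadecimal colour such as 'ccffcc' to (r, g, b)."""
    value = int(colour, 16)
    return ((value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF)


def parse_simple_graduated(spec):
    """Split '<mincolor>|<maxcolor>|<minvalue>|<maxvalue>[|...]' into parts."""
    fields = spec.split("|")
    if len(fields) < 4:
        raise ValueError(f"too few fields: {spec!r}")
    return {
        "min_colour": hex_to_rgb(fields[0]),
        "max_colour": hex_to_rgb(fields[1]),
        "min_value": float(fields[2]),
        "max_value": float(fields[3]),
        "rest": fields[4:],  # e.g. threshold fields, left uninterpreted here
    }


def shade(score, scheme):
    """Linearly interpolate a colour for score (assumed shading rule)."""
    span = scheme["max_value"] - scheme["min_value"]
    if span == 0:
        return scheme["min_colour"]
    # Clamp the score into the [min, max] range before interpolating.
    t = max(0.0, min(1.0, (score - scheme["min_value"]) / span))
    return tuple(
        round(lo + t * (hi - lo))
        for lo, hi in zip(scheme["min_colour"], scheme["max_colour"])
    )


scheme = parse_simple_graduated("ccffcc|333300|-3.9|4.5|above|-2.0")
print(shade(-3.9, scheme), shade(4.5, scheme))
```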
The second block is optional, and allows one or more filters to be defined for each feature type.
Only features that satisfy the filter conditions will be displayed.
Begin with a line which is just STARTFILTERS, and end with a line which is just ENDFILTERS.
Each line has the format:
featureType <tab> (filtercondition1) [and|or] (filtercondition2) [and|or]...
The parentheses are not needed if there is only one condition. Combine multiple conditions with either 'and' or 'or' (but not a mixture).
Each filter condition has the form:
Label or Score or AttributeName condition [value]
where either the label (description), the (numeric) score, or a (text or numeric) attribute is tested against the condition.
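The filter line layout above can be taken apart as follows. This is a hedged sketch rather than Jalview's parser: the function names are hypothetical, attribute names are assumed to contain no spaces, and conditions are only split into their parts, not evaluated.

```python
import re


def parse_condition(text):
    """Split 'Field condition [value]' into a (field, condition, value) tuple."""
    parts = text.split(None, 2)
    if len(parts) < 2:
        raise ValueError(f"incomplete condition: {text!r}")
    value = parts[2] if len(parts) > 2 else None
    return (parts[0], parts[1], value)


def parse_filter_line(line):
    """Return (feature_type, joiner, conditions) for one filter line."""
    feature_type, expression = line.rstrip("\n").split("\t", 1)
    expression = expression.strip()
    groups = re.findall(r"\(([^()]*)\)", expression)
    if not groups:
        # A single condition may be written without parentheses.
        return (feature_type, None, [parse_condition(expression)])
    found = re.findall(r"\)\s*(and|or)\s*\(", expression, flags=re.IGNORECASE)
    joiners = {j.lower() for j in found}
    if len(joiners) > 1:
        raise ValueError("'and' and 'or' may not be mixed")
    joiner = joiners.pop() if joiners else None
    return (feature_type, joiner, [parse_condition(g) for g in groups])


print(parse_filter_line("kdHydrophobicity\t(Score LT 1.5) OR (Score GE 2.8)"))
```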
The remaining lines in the file are sequence feature data. Features are either non-positional - attached to a whole sequence (as specified by its ID), or positional, so attached to a specific range on a sequence. In addition to a type, features can also include descriptive text and a score, and depending on the format used, many additional attributes.
Importing Generalised Feature Format (GFF) feature data
Jalview has its own tabular format (described below) for describing sequence features, which allows HTML descriptions (including URLs) to be defined for each feature. However, sequence feature definitions can also be provided in GFF2 (http://gmod.org/wiki/GFF2) or GFF3 (http://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md) format. To do this, a line containing only 'GFF' should precede any GFF data (this mixed format capability was added in Jalview 2.6).
Feature attributes can be included as name=value pairs in GFF3 column 9, including (since Jalview 2.11.1.0) 'nested' sub-attributes, for example:
alleles=G,A,C;AF=6;CSQ=SIFT=deleterious,tolerated,PolyPhen=possibly_damaging(0.907)
where SIFT and PolyPhen are sub-attributes of CSQ. This data is preserved if features are exported in GFF format (but not, currently, in Jalview format).
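One way to read the nested attribute example is sketched below. The function name parse_attributes is hypothetical, and the rule that a comma separated token without '=' continues the previous sub-attribute's value is an assumption made to reproduce the example, not a description of Jalview's own parsing.

```python
def parse_attributes(column9):
    """Parse 'name=value;...' text, expanding nested 'sub=value' lists."""
    attributes = {}
    for pair in column9.split(";"):
        if not pair:
            continue
        name, _, value = pair.partition("=")
        if "=" not in value:
            # Plain attribute: keep the value as text.
            attributes[name] = value
            continue
        # Nested attribute: tokens containing '=' start a new sub-attribute;
        # other tokens are appended to the current one (assumed rule).
        sub_attributes = {}
        current = None
        for token in value.split(","):
            if "=" in token:
                current, _, sub_value = token.partition("=")
                sub_attributes[current] = sub_value
            elif current is not None:
                sub_attributes[current] += "," + token
        attributes[name] = sub_attributes
    return attributes


example = (
    "alleles=G,A,C;AF=6;"
    "CSQ=SIFT=deleterious,tolerated,PolyPhen=possibly_damaging(0.907)"
)
print(parse_attributes(example))
```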
Jalview's sequence feature format
Each feature is specified as a tab-separated series of columns as defined below:
description sequenceId sequenceIndex start end featureType score (optional)
This format allows two alternate ways of referring to a sequence, either by its text ID, or its index (base 0) in an associated alignment. Normally, sequence features are associated with sequences rather than alignments, and the sequenceIndex field is given as "-1". In order to specify a sequence by its index in a particular alignment, the sequenceId should be given as "ID_NOT_SPECIFIED", otherwise the sequenceId field will be used in preference to the sequenceIndex field.
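The column layout can be mapped onto a small record type, as in the sketch below. The class and function names are hypothetical. Treating a feature as non-positional when start and end are both 0, and dropping scores that cannot be read as numbers, follows the rules stated elsewhere in this document.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Feature:
    description: str
    sequence_id: str
    sequence_index: int
    start: int
    end: int
    feature_type: str
    score: Optional[float] = None

    @property
    def non_positional(self):
        # Start and end of 0 attach the feature to the whole sequence.
        return self.start == 0 and self.end == 0


def parse_feature_line(line):
    """Parse one tab-separated Jalview feature line into a Feature."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 6:
        raise ValueError(f"expected at least 6 columns, got {len(cols)}")
    score = None
    if len(cols) > 6:
        try:
            score = float(cols[6])
        except ValueError:
            score = None  # malformed scores are ignored
    return Feature(cols[0], cols[1], int(cols[2]), int(cols[3]),
                   int(cols[4]), cols[5], score)


line = "Hydrophobicity score by kD\tQ93XJ9_SOLTU\t-1\t48\t48\tkdHydrophobicity\t1.8"
print(parse_feature_line(line))
```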
The description may contain simple HTML document body tags if
enclosed by "<html></html>" and these will be
rendered as formatted tooltips in the Jalview Application (the
Jalview applet is not capable of rendering HTML tooltips, so all
formatting tags will be removed).
Attaching Links to Sequence Features
Any anchor tags in an html formatted
description line will be translated into URL links. A link symbol
will be displayed adjacent to any feature which includes links, and
these are made available from the links submenu
of the popup menu which is obtained by right-clicking when a link
symbol is displayed in the tooltip.
Non-positional features
Specify the start and end for
a feature to be 0 in order to attach it to the
whole sequence. Non-positional features are shown in a tooltip when
the mouse hovers over the sequence ID panel, and any embedded links
can be accessed from the popup menu.
Scores
Scores can be associated with sequence features, and used to sort
sequences or shade the alignment (this was added in Jalview 2.5).
The score field is optional, and malformed scores will be ignored.
Feature annotations can be collected into named groups by prefixing definitions with lines of the form:
startgroup groupname
... and subsequently post-fixing the group with:
endgroup groupname
Feature grouping was introduced in version 2.08, and is used to control whether a set of features is hidden or shown together in the Sequence Feature Settings dialog box.
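The grouping markers can be processed with a simple loop, sketched here. The function name group_features is hypothetical. The sketch assumes that the keyword and group name are separated by a tab and that the keywords are matched case-insensitively. Features outside any group are collected under the key None.

```python
def group_features(lines):
    """Map each group name (or None) to the feature lines it encloses."""
    groups = {}
    current = None
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        cols = line.split("\t")
        keyword = cols[0].lower()
        if len(cols) == 2 and keyword == "startgroup":
            current = cols[1]
            groups.setdefault(current, [])
        elif len(cols) == 2 and keyword == "endgroup":
            current = None
        else:
            groups.setdefault(current, []).append(line)
    return groups


lines = [
    "Your Own description here\tFER_CAPAA\t-1\t3\t93\tdomain",
    "STARTGROUP\tkd",
    "Hydrophobicity score by kD\tQ93XJ9_SOLTU\t-1\t48\t48\tkdHydrophobicity\t1.8",
    "ENDGROUP\tkd",
]
for name, members in group_features(lines).items():
    print(name, len(members))
```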
A complete example is shown below:
domain	red
metal ion-binding site	00ff00
transit peptide	0,105,215
chain	225,105,0
modified residue	105,225,35
signal peptide	0,155,165
helix	ff0000
strand	00ff00
coil	cccccc
kdHydrophobicity	ccffcc|333300|-3.9|4.5|above|-2.0
STARTFILTERS
metal ion-binding site	Label Contains sulfur
kdHydrophobicity	(Score LT 1.5) OR (Score GE 2.8)
ENDFILTERS
Your Own description here	FER_CAPAA	-1	3	93	domain
Your Own description here	FER_CAPAN	-1	48	144	chain
Your Own description here	FER_CAPAN	-1	50	140	domain
Your Own description here	FER_CAPAN	-1	136	136	modified residue
Your Own description here	FER1_LYCES	-1	1	47	transit peptide
Your Own description here	Q93XJ9_SOLTU	-1	1	48	signal peptide
Your Own description here	Q93XJ9_SOLTU	-1	49	144	chain
STARTGROUP	secondarystructure
PDB secondary structure annotation	FER1_SPIOL	-1	52	59	strand
PDB secondary structure annotation	FER1_SPIOL	-1	74	80	helix
ENDGROUP	secondarystructure
STARTGROUP	kd
Hydrophobicity score by kD	Q93XJ9_SOLTU	-1	48	48	kdHydrophobicity	1.8
ENDGROUP	kd
GFF
FER_CAPAA	GffGroup	domain	3	93	.	.