phyloXML is an XML language to describe evolutionary trees and associated
data. Version: 1.20. License: dual-licensed under the LGPL or Ruby's License. Copyright (c)
2008-2016 Christian M Zmasek.
'phyloxml' is the name of the root element. It contains an
arbitrary number of 'phylogeny' elements (each representing one phylogeny), possibly
followed by elements from other namespaces.
Element Phylogeny is used to represent a phylogeny. The required
attribute 'rooted' indicates whether the phylogeny is rooted. The
attribute 'rerootable' can be used to indicate that the phylogeny must not be
rooted differently (e.g. because it is associated with root-dependent data, such as gene
duplications). The attribute 'type' can be used to indicate the type of phylogeny (e.g.
'gene tree'). It is recommended to use the attribute 'branch_length_unit' if the
phylogeny has branch lengths. Element 'clade' is used in a recursive manner to describe
the topology of a phylogenetic tree.
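As a rough illustration of the root element and the 'phylogeny' attributes described above, the following Python sketch reads a small hand-written document with the standard library's xml.etree.ElementTree. The namespace URI 'http://www.phyloxml.org', the sample tree, and all attribute values are assumptions made for this example; the fragment has not been validated against the schema.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for phyloXML; adjust it if your files declare another one.
NS = {"phy": "http://www.phyloxml.org"}

# Hand-written sample: one rooted gene tree and one unrooted tree.
DOC = """<phyloxml xmlns="http://www.phyloxml.org">
  <phylogeny rooted="true" rerootable="false" type="gene tree"
             branch_length_unit="substitutions per site">
    <name>example</name>
    <clade>
      <clade branch_length="0.1"><name>A</name></clade>
      <clade branch_length="0.2"><name>B</name></clade>
    </clade>
  </phylogeny>
  <phylogeny rooted="false">
    <clade><name>C</name></clade>
  </phylogeny>
</phyloxml>"""

root = ET.fromstring(DOC)

# Each 'phylogeny' child of the root represents one phylogeny.
phylogenies = root.findall("phy:phylogeny", NS)

# 'rooted' is required; 'rerootable' and 'type' are optional and may be None.
summary = [(p.get("rooted"), p.get("rerootable"), p.get("type")) for p in phylogenies]
print(summary)
```

Attribute values come back as strings (or None when absent), so a consumer still has to interpret 'true' and 'false' itself.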
Element Clade is used in a recursive manner to describe the topology of
a phylogenetic tree. The parent branch length of a clade can be described either with
the 'branch_length' element or the 'branch_length' attribute (it is not recommended to
use both at the same time, though). Usage of the 'branch_length' attribute allows for a
less verbose description. Element 'confidence' is used to indicate the support for a
clade/parent branch. Element 'events' is used to describe such events as
gene duplications at the root node/parent branch of a clade. Element 'width' is the
branch width for this clade (including parent branch). Both 'color' and 'width' elements
apply to the whole clade unless overridden in sub-clades. Attribute 'id_source' is
used to link other elements to a clade (on the xml-level).
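The recursive use of 'clade' and the two ways of giving a parent branch length can be sketched as follows. This is a minimal example under assumptions: the namespace URI, the sample names, and the numbers are invented for illustration, and the helper functions branch_length and walk are hypothetical, not part of any phyloXML library.

```python
import xml.etree.ElementTree as ET

NS = {"phy": "http://www.phyloxml.org"}  # assumed namespace URI

# One sub-clade uses the 'branch_length' attribute, the other the element.
CLADE_XML = """<clade xmlns="http://www.phyloxml.org">
  <clade branch_length="0.06">
    <name>A</name>
    <confidence type="bootstrap">89</confidence>
  </clade>
  <clade>
    <name>B</name>
    <branch_length>0.102</branch_length>
  </clade>
</clade>"""


def branch_length(clade):
    """Return the parent branch length from the attribute or the element, else None."""
    value = clade.get("branch_length")
    if value is None:
        value = clade.findtext("phy:branch_length", default=None, namespaces=NS)
    return None if value is None else float(value)


def walk(clade, depth=0):
    """Yield (depth, name, branch_length) for a clade and, recursively, its sub-clades."""
    name = clade.findtext("phy:name", default=None, namespaces=NS)
    yield depth, name, branch_length(clade)
    for child in clade.findall("phy:clade", NS):
        yield from walk(child, depth + 1)


rows = list(walk(ET.fromstring(CLADE_XML)))
print(rows)
```

The attribute is checked first here only as a design choice for the sketch; since using both forms at once is discouraged, a stricter reader might instead report a clade that carries both.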
Element Taxonomy is used to describe taxonomic information for a clade.
Element 'code' is intended to store UniProt/Swiss-Prot style organism codes (e.g.
'APLCA' for the California sea hare 'Aplysia californica'). Element 'authority' is used
to keep the authority, such as 'J. G. Cooper, 1863', associated with the
'scientific_name'. Element 'id' is used for a unique identifier of a taxon (for example
'6500' with 'ncbi_taxonomy' as 'provider' for the California sea hare). Attribute
'id_source' is used to link other elements to a taxonomy (on the
xml-level).
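The California sea hare example can be written out and read back as below. This is only a sketch: the namespace URI and the 'id_source' value 't1' are assumptions, and the fragment is not validated against the schema.

```python
import xml.etree.ElementTree as ET

NS = {"phy": "http://www.phyloxml.org"}  # assumed namespace URI

# Values taken from the description above; 't1' is a made-up id_source.
TAXONOMY_XML = """<taxonomy xmlns="http://www.phyloxml.org" id_source="t1">
  <id provider="ncbi_taxonomy">6500</id>
  <code>APLCA</code>
  <scientific_name>Aplysia californica</scientific_name>
  <authority>J. G. Cooper, 1863</authority>
</taxonomy>"""

taxonomy = ET.fromstring(TAXONOMY_XML)
taxon_id = taxonomy.find("phy:id", NS)

# Flatten the element into a plain dictionary for easier inspection.
record = {
    "id_source": taxonomy.get("id_source"),
    "provider": taxon_id.get("provider"),
    "id": taxon_id.text,
    "code": taxonomy.findtext("phy:code", namespaces=NS),
    "scientific_name": taxonomy.findtext("phy:scientific_name", namespaces=NS),
    "authority": taxonomy.findtext("phy:authority", namespaces=NS),
}
print(record)
```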
Element Sequence is used to represent a molecular sequence (Protein,
DNA, RNA) associated with a node. 'symbol' is a short (at most 20 characters) symbol of
the sequence (e.g. 'ACTM') whereas 'name' is used for the full name (e.g. 'muscle
Actin'). 'gene_name' can be used when protein and gene names differ. 'location' is used
for the location of a sequence on a genome/chromosome. The actual sequence can be stored
with the 'mol_seq' element. Attribute 'type' is used to indicate the type of sequence
('dna', 'rna', or 'protein'). One intended use for 'id_ref' is to link a sequence to a
taxonomy (via the taxonomy's 'id_source') in case of multiple sequences and taxonomies
per node.
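The 'id_ref' to 'id_source' link between sequences and taxonomies might be resolved as in the following sketch. The namespace URI, the identifiers 't1' and 't2', and the second taxonomy and sequence ('HUMAN', 'ACTB') are illustrative assumptions added only to show a node with more than one of each.

```python
import xml.etree.ElementTree as ET

NS = {"phy": "http://www.phyloxml.org"}  # assumed namespace URI

# A node carrying two taxonomies and two sequences, linked via id_ref/id_source.
CLADE_XML = """<clade xmlns="http://www.phyloxml.org">
  <taxonomy id_source="t1"><code>APLCA</code></taxonomy>
  <taxonomy id_source="t2"><code>HUMAN</code></taxonomy>
  <sequence id_ref="t1" type="protein">
    <symbol>ACTM</symbol>
    <name>muscle Actin</name>
  </sequence>
  <sequence id_ref="t2" type="protein">
    <symbol>ACTB</symbol>
  </sequence>
</clade>"""

clade = ET.fromstring(CLADE_XML)

# Index the taxonomies of this node by their 'id_source' attribute.
by_source = {t.get("id_source"): t for t in clade.findall("phy:taxonomy", NS)}

# Map each sequence symbol to the organism code of the taxonomy it refers to.
links = {}
for seq in clade.findall("phy:sequence", NS):
    taxon = by_source.get(seq.get("id_ref"))
    symbol = seq.findtext("phy:symbol", namespaces=NS)
    links[symbol] = None if taxon is None else taxon.findtext("phy:code", namespaces=NS)

print(links)
```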
Element 'mol_seq' is used to store molecular sequences. The 'is_aligned'
attribute is used to indicate that this molecular sequence is aligned with all other
sequences in the same phylogeny for which 'is_aligned' is true as well (which, in most
cases, means that gaps were introduced, and that all sequences for which 'is_aligned' is
true must have the same length).
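The equal-length expectation for aligned sequences lends itself to a simple consistency check. The sketch below is an assumption-laden illustration: the namespace URI and the toy sequences are invented, and aligned_lengths is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

PHY = "{http://www.phyloxml.org}"  # assumed namespace URI, in ElementTree's tag notation

# Two aligned sequences (one with a gap) and one unaligned sequence.
PHYLOGENY_XML = """<phylogeny xmlns="http://www.phyloxml.org" rooted="true">
  <clade>
    <clade>
      <sequence type="protein"><mol_seq is_aligned="true">MK-LV</mol_seq></sequence>
    </clade>
    <clade>
      <sequence type="protein"><mol_seq is_aligned="true">MKALV</mol_seq></sequence>
    </clade>
    <clade>
      <sequence type="protein"><mol_seq>MKV</mol_seq></sequence>
    </clade>
  </clade>
</phylogeny>"""


def aligned_lengths(phylogeny):
    """Return the set of lengths of all mol_seq elements flagged as aligned."""
    lengths = set()
    for mol_seq in phylogeny.iter(PHY + "mol_seq"):
        # Accept both lexical forms of a true XML Schema boolean.
        if mol_seq.get("is_aligned") in ("true", "1"):
            lengths.add(len((mol_seq.text or "").strip()))
    return lengths


lengths = aligned_lengths(ET.fromstring(PHYLOGENY_XML))
# More than one distinct length would indicate an inconsistent alignment.
consistent = len(lengths) <= 1
print(lengths, consistent)
```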
Element Accession is used to capture the local part in a sequence
identifier (e.g. 'P17304' in 'UniProtKB:P17304', in which case the 'source' attribute
would be 'UniProtKB').
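Splitting a full identifier into the 'source' attribute and the local part could look like this. It is a sketch only: to_accession is a hypothetical helper, the element is built without a namespace for brevity, and identifiers without a colon are rejected as a simplifying assumption.

```python
import xml.etree.ElementTree as ET


def to_accession(identifier):
    """Build an 'accession' element from an identifier of the form 'source:local'."""
    source, sep, local = identifier.partition(":")
    if not sep or not source or not local:
        raise ValueError(f"expected 'source:local', got {identifier!r}")
    elem = ET.Element("accession", {"source": source})
    elem.text = local
    return elem


accession = to_accession("UniProtKB:P17304")
serialized = ET.tostring(accession, encoding="unicode")
print(serialized)
```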
Used to store accessions to additional resources.
This is used to describe the domain architecture of a protein. Attribute
'length' is the total length of the protein.
Represents an individual domain in a domain architecture. The
name/unique identifier is described via the 'id' attribute. 'confidence' can be used to
store, for example, E-values.
Events at the root node of a clade (e.g. one gene duplication).
The names and/or counts of binary characters present, gained, and lost
at the root of a clade.
A literature reference for a clade. It is recommended to use the 'doi'
attribute instead of the free text 'desc' element whenever possible.
The annotation of a molecular sequence. It is recommended to annotate by
using the optional 'ref' attribute (some examples of acceptable values for the ref
attribute: 'GO:0008270', 'KEGG:Tetrachloroethene degradation', 'EC:1.1.1.1'). Optional
element 'desc' allows for a free text description. Optional element 'confidence' is used
to state the type and value of support for an annotation. Similarly, optional attribute
'evidence' is used to describe the evidence for an annotation as free text (e.g.
'experimental'). Optional element 'property' allows for further, typed and referenced
annotations from external resources.
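The three example 'ref' values share a 'prefix:value' shape, which a consumer might check before trusting an annotation. The regular expression below is an assumption made for this sketch (it is not quoted from the text above), and looks_like_ref is a hypothetical helper.

```python
import re

# Assumed shape of a 'ref' value: a prefix, a colon, then a value that may
# contain letters, digits, underscores, dots, hyphens, and whitespace.
REF_PATTERN = re.compile(r"[A-Za-z0-9_]+:[A-Za-z0-9_.\-\s]+")


def looks_like_ref(value):
    """Return True if the whole string has the assumed 'prefix:value' shape."""
    return REF_PATTERN.fullmatch(value) is not None


examples = ["GO:0008270", "KEGG:Tetrachloroethene degradation", "EC:1.1.1.1"]
checks = {value: looks_like_ref(value) for value in examples}
print(checks)
```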
Property allows for typed and referenced properties from external
resources to be attached to 'Phylogeny', 'Clade', and 'Annotation'. The value of a
property is its mixed (free text) content. Attribute 'datatype' indicates the type of a
property and is limited to xsd-datatypes (e.g. 'xsd:string', 'xsd:boolean',
'xsd:integer', 'xsd:decimal', 'xsd:float', 'xsd:double', 'xsd:date', 'xsd:anyURI').
Attribute 'applies_to' indicates the item to which a property applies (e.g. 'node'
for the parent node of a clade, 'parent_branch' for the parent branch of a clade).
Attribute 'id_ref' allows a property to be attached specifically to one element (on the
xml-level). Optional attribute 'unit' is used to indicate the unit of the property. An
example: <property datatype="xsd:integer" ref="NOAA:depth" applies_to="clade"
unit="METRIC:m"> 200 </property>
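Reading the example property and converting its text content according to 'datatype' might be sketched as follows. The converter table covers only some of the listed datatypes and is an assumption of this example, as is the hypothetical helper property_value; unknown datatypes fall back to plain strings.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# The example property from the text, without a namespace for brevity.
PROPERTY_XML = (
    '<property datatype="xsd:integer" ref="NOAA:depth" '
    'applies_to="clade" unit="METRIC:m"> 200 </property>'
)

# Assumed mapping from a few xsd datatypes to Python converters.
CONVERTERS = {
    "xsd:string": str,
    "xsd:boolean": lambda text: text in ("true", "1"),
    "xsd:integer": int,
    "xsd:decimal": Decimal,
    "xsd:float": float,
    "xsd:double": float,
}


def property_value(elem):
    """Convert the text content of a property according to its 'datatype'."""
    convert = CONVERTERS.get(elem.get("datatype"), str)
    return convert((elem.text or "").strip())


prop = ET.fromstring(PROPERTY_XML)
value = property_value(prop)
print(prop.get("ref"), value, prop.get("unit"), prop.get("applies_to"))
```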
A uniform resource identifier. In general, this is expected to be a URL
(for example, to link to an image on a website, in which case the 'type' attribute might
be 'image' and 'desc' might be 'image of a California sea hare').
A general purpose confidence element. For example this can be used to
express the bootstrap support value of a clade (in which case the 'type' attribute is
'bootstrap').
A general purpose identifier element. It allows the provider
(or authority) of an identifier to be indicated.
The geographic distribution of the items of a clade (species,
sequences), intended for phylogeographic applications. The location can be described
by free text in the 'desc' element and/or by the coordinates of one or more
'Points' (similar to the 'Point' element in Google's KML format) or by 'Polygons'.
The coordinates of a point with an optional altitude (used by element
'Distribution'). The required attribute 'geodetic_datum' indicates the
geodetic datum (also called 'map datum'; for example, Google's KML uses 'WGS84').
Attribute 'alt_unit' is the unit for the altitude (e.g. 'meter').
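A point might be read as in the sketch below. The child element names 'lat', 'long', and 'alt', the namespace URI, and the coordinates are assumptions made for this example (they are not stated in the text above), and read_point is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

NS = {"phy": "http://www.phyloxml.org"}  # assumed namespace URI

# Illustrative coordinates; child element names are assumed.
POINT_XML = """<point xmlns="http://www.phyloxml.org" geodetic_datum="WGS84" alt_unit="meter">
  <lat>32.88</lat>
  <long>-117.24</long>
  <alt>104</alt>
</point>"""


def read_point(elem):
    """Return the coordinates of a point; the altitude is optional."""
    datum = elem.get("geodetic_datum")
    if datum is None:
        raise ValueError("attribute 'geodetic_datum' is required")
    alt = elem.findtext("phy:alt", namespaces=NS)
    return {
        "datum": datum,
        "lat": float(elem.findtext("phy:lat", namespaces=NS)),
        "long": float(elem.findtext("phy:long", namespaces=NS)),
        "alt": None if alt is None else float(alt),
        "alt_unit": elem.get("alt_unit"),
    }


point = read_point(ET.fromstring(POINT_XML))
print(point)
```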
A polygon defined by a list of 'Points' (used by element
'Distribution').
A date associated with a clade/node. Its value can be numerical by using
the 'value' element and/or free text with the 'desc' element (e.g. 'Silurian'). If a
numerical value is used, it is recommended to employ the 'unit' attribute to indicate
the type of the numerical value (e.g. 'mya' for 'million years ago'). The elements
'minimum' and 'maximum' are used to indicate a range/confidence
interval.
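A date with a free-text description, a numerical value, and a range could be handled as follows. The numbers are illustrative assumptions, as is the namespace URI; read_date is a hypothetical helper, and the range check is a sanity test added for this sketch rather than a rule stated above.

```python
import xml.etree.ElementTree as ET

NS = {"phy": "http://www.phyloxml.org"}  # assumed namespace URI

# Illustrative numbers, in million years ago ('mya').
DATE_XML = """<date xmlns="http://www.phyloxml.org" unit="mya">
  <desc>Silurian</desc>
  <value>425</value>
  <minimum>416.0</minimum>
  <maximum>443.7</maximum>
</date>"""


def read_date(elem):
    """Return unit, description, value, and range of a date; absent numbers become None."""
    def number(tag):
        text = elem.findtext(f"phy:{tag}", namespaces=NS)
        return None if text is None else float(text)

    return {
        "unit": elem.get("unit"),
        "desc": elem.findtext("phy:desc", namespaces=NS),
        "value": number("value"),
        "minimum": number("minimum"),
        "maximum": number("maximum"),
    }


date = read_date(ET.fromstring(DATE_XML))
# Sanity check: the value should lie within the stated range.
in_range = date["minimum"] <= date["value"] <= date["maximum"]
print(date, in_range)
```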
This indicates the color of a clade when rendered (the color applies to
the whole clade unless overridden by the color(s) of sub-clades).
This is used to express a typed relationship between two sequences. For
example it could be used to describe an orthology (in which case attribute 'type' is
'orthology').
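Resolving a typed relationship back to the two sequences it connects might look like the sketch below. The attribute names 'id_ref_0' and 'id_ref_1', the element name 'sequence_relation', the namespace URI, and the sample identifiers and symbols are all assumptions of this example; none of them are stated in the text above.

```python
import xml.etree.ElementTree as ET

NS = {"phy": "http://www.phyloxml.org"}  # assumed namespace URI
PHY = "{http://www.phyloxml.org}"

# Two sequences with made-up id_source values, plus one assumed relation element.
PHYLOGENY_XML = """<phylogeny xmlns="http://www.phyloxml.org" rooted="true">
  <clade>
    <clade><sequence id_source="s1"><symbol>SEQ_A</symbol></sequence></clade>
    <clade><sequence id_source="s2"><symbol>SEQ_B</symbol></sequence></clade>
  </clade>
  <sequence_relation id_ref_0="s1" id_ref_1="s2" type="orthology"/>
</phylogeny>"""

phylogeny = ET.fromstring(PHYLOGENY_XML)

# Index every sequence in the tree by its 'id_source' attribute.
symbols = {
    seq.get("id_source"): seq.findtext("phy:symbol", namespaces=NS)
    for seq in phylogeny.iter(PHY + "sequence")
}

# Resolve each relation to (type, symbol of first sequence, symbol of second sequence).
relations = [
    (rel.get("type"), symbols.get(rel.get("id_ref_0")), symbols.get(rel.get("id_ref_1")))
    for rel in phylogeny.findall("phy:sequence_relation", NS)
]
print(relations)
```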
This is used to express a typed relationship between two clades. For
example it could be used to describe multiple parents of a clade.