From fb039131770b5f74ce2d4668ebea037eebfdec84 Mon Sep 17 00:00:00 2001 From: cmzmasek Date: Thu, 14 Jul 2016 19:28:13 -0700 Subject: [PATCH] new version --- .../resources/phyloxml_schema/1.20/phyloxml.xsd | 593 ++++++++++++++++++++ 1 file changed, 593 insertions(+) create mode 100644 forester/resources/phyloxml_schema/1.20/phyloxml.xsd diff --git a/forester/resources/phyloxml_schema/1.20/phyloxml.xsd b/forester/resources/phyloxml_schema/1.20/phyloxml.xsd new file mode 100644 index 0000000..f4e5872 --- /dev/null +++ b/forester/resources/phyloxml_schema/1.20/phyloxml.xsd @@ -0,0 +1,593 @@ + + + + + + + + + + + + + + + + + + + + + + + phyloXML is an XML language to describe evolutionary trees and associated data. Version: 1.10. + License: dual-licensed under the LGPL or Ruby's License. Copyright (c) 2008-2011 Christian M Zmasek. + + + + + + + 'phyloxml' is the name of the root element. Phyloxml contains an arbitrary number of + 'phylogeny' elements (each representing one phylogeny) possibly followed by elements from other namespaces. + + + + + + + + + + + Element Phylogeny is used to represent a phylogeny. The required attribute 'rooted' is used + to indicate whether the phylogeny is rooted or not. The attribute 'rerootable' can be used to indicate that + the phylogeny is not allowed to be rooted differently (i.e. because it is associated with root dependent + data, such as gene duplications). The attribute 'type' can be used to indicate the type of phylogeny (i.e. + 'gene tree'). It is recommended to use the attribute 'branch_length_unit' if the phylogeny has branch + lengths. Element clade is used in a recursive manner to describe the topology of a phylogenetic + tree. + + + + + + + + + + + + + + + + + + + + + + Element Clade is used in a recursive manner to describe the topology of a phylogenetic tree. 
+ The parent branch length of a clade can be described either with the 'branch_length' element or the + 'branch_length' attribute (it is not recommended to use both at the same time, though). Usage of the + 'branch_length' attribute allows for a less verbose description. Element 'confidence' is used to indicate + the support for a clade/parent branch. Element 'events' is used to describe such events as gene duplications + at the root node/parent branch of a clade. Element 'width' is the branch width for this clade (including + parent branch). Both 'color' and 'width' elements apply for the whole clade unless overwritten in + sub-clades. Attribute 'id_source' is used to link other elements to a clade (on the xml-level). + + + + + + + + + + + + + + + + + + + + + + + + + + + Element Taxonomy is used to describe taxonomic information for a clade. Element 'code' is + intended to store UniProt/Swiss-Prot style organism codes (e.g. 'APLCA' for the California sea hare 'Aplysia + californica') or other styles of mnemonics (e.g. 'Aca'). Element 'authority' is used to keep the authority, + such as 'J. G. Cooper, 1863', associated with the 'scientific_name'. Element 'id' is used for a unique + identifier of a taxon (for example '6500' with 'ncbi_taxonomy' as 'provider' for the California sea hare). + Attribute 'id_source' is used to link other elements to a taxonomy (on the xml-level). + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Element Sequence is used to represent a molecular sequence (Protein, DNA, RNA) associated + with a node. 'symbol' is a short symbol (at most 20 characters) of the sequence (e.g. 'ACTM') whereas + 'name' is used for the full name (e.g. 'muscle Actin'). 'gene_name' can be used when protein and gene names differ. + 'location' is used for the location of a sequence on a genome/chromosome. 
The actual sequence can be stored with the + 'mol_seq' element. Attribute 'type' is used to indicate the type of sequence ('dna', 'rna', or 'protein'). + One intended use for 'id_ref' is to link a sequence to a taxonomy (via the taxonomy's 'id_source') in case + of multiple sequences and taxonomies per node. + + + + + + + + + + + + + + + + + + + + + + + + + + Element 'mol_seq' is used to store molecular sequences. The 'is_aligned' attribute is used + to indicate that this molecular sequence is aligned with all other sequences in the same phylogeny for + which 'is_aligned' is true as well (which, in most cases, means that gaps were introduced, and that all + sequences for which 'is_aligned' is true must have the same length). + + + + + + + + + + + + + + + + + + Element Accession is used to capture the local part in a sequence identifier (e.g. 'P17304' + in 'UniProtKB:P17304', in which case the 'source' attribute would be 'UniProtKB'). + + + + + + + + + + + + Used to store accessions to additional resources. + + + + + + + + + This is used to describe the domain architecture of a protein. Attribute 'length' is the total + length of the protein. + + + + + + + + + To represent an individual domain in a domain architecture. The name/unique identifier is + described via the 'id' attribute. 'confidence' can be used to store (e.g.) E-values. + + + + + + + + + + + + + + Events at the root node of a clade (e.g. one gene duplication). + + + + + + + + + + + + + + + + + + + + + + + The names and/or counts of binary characters present, gained, and lost at the root of a + clade. + + + + + + + + + + + + + + + + + + + + + + A literature reference for a clade. It is recommended to use the 'doi' attribute instead of + the free text 'desc' element whenever possible. + + + + + + + + + + The annotation of a molecular sequence. 
It is recommended to annotate by using the optional + 'ref' attribute (some examples of acceptable values for the ref attribute: 'GO:0008270', + 'KEGG:Tetrachloroethene degradation', 'EC:1.1.1.1'). Optional element 'desc' allows for a free text + description. Optional element 'confidence' is used to state the type and value of support for an annotation. + Similarly, optional attribute 'evidence' is used to describe the evidence for an annotation as free text + (e.g. 'experimental'). Optional element 'property' allows for further, typed and referenced annotations from + external resources. + + + + + + + + + + + + + + + + Property allows for typed and referenced properties from external resources to be attached + to 'Phylogeny', 'Clade', and 'Annotation'. The value of a property is its mixed (free text) content. + Attribute 'datatype' indicates the type of a property and is limited to xsd-datatypes (e.g. 'xsd:string', + 'xsd:boolean', 'xsd:integer', 'xsd:decimal', 'xsd:float', 'xsd:double', 'xsd:date', 'xsd:anyURI'). Attribute + 'applies_to' indicates the item to which a property applies (e.g. 'node' for the parent node of a clade, + 'parent_branch' for the parent branch of a clade). Attribute 'id_ref' allows a property to be attached + specifically to one element (on the xml-level). Optional attribute 'unit' is used to indicate the unit of + the property. An example: <property datatype="xsd:integer" ref="NOAA:depth" applies_to="clade" + unit="METRIC:m"> 200 </property> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + A uniform resource identifier. In general, this is expected to be a URL (for example, to + link to an image on a website, in which case the 'type' attribute might be 'image' and 'desc' might be + 'image of a California sea hare'). + + + + + + + + + + + + A general purpose confidence element. 
For example, this can be used to express the bootstrap + support value of a clade (in which case the 'type' attribute is 'bootstrap'). + + + + + + + + + + + + A general purpose identifier element. Allows indicating the provider (or authority) of an + identifier. + + + + + + + + + + + The geographic distribution of the items of a clade (species, sequences), intended for + phylogeographic applications. The location can be described either by free text in the 'desc' element and/or + by the coordinates of one or more 'Points' (similar to the 'Point' element in Google's KML format) or by + 'Polygons'. + + + + + + + + + + The coordinates of a point with an optional altitude (used by element 'Distribution'). + The required attribute 'geodetic_datum' is used to indicate the geodetic datum (also called 'map datum'; + for example, Google's KML uses 'WGS84'). Attribute 'alt_unit' is the unit for the altitude (e.g. 'meter'). + + + + + + + + + + + + + A polygon defined by a list of 'Points' (used by element 'Distribution'). + + + + + + + + + + A date associated with a clade/node. Its value can be numerical by using the 'value' element + and/or free text with the 'desc' element (e.g. 'Silurian'). If a numerical value is used, it is recommended + to employ the 'unit' attribute to indicate the type of the numerical value (e.g. 'mya' for 'million years + ago'). The elements 'minimum' and 'maximum' are used to indicate a range/confidence + interval. + + + + + + + + + + + + + This indicates the color of a clade when rendered (the color applies to the whole clade + unless overwritten by the color(s) of sub clades). + + + + + + + + + + + + This is used to express a typed relationship between two sequences. For example it could be + used to describe an orthology (in which case attribute 'type' is 'orthology'). + + + + + + + + + + + + + + + + + + + + + + + + + This is used to express a typed relationship between two clades. 
For example it could be + used to describe multiple parents of a clade. + + + + + + + + + + + + + + + + + -- 1.7.10.2
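
Illustration only, not part of the patch: a minimal Python sketch of how the recursive 'clade' structure documented in the schema might be read. The sample document is hypothetical and deliberately simplified: it omits the XML namespace and is not validated against the XSD. The helper names (branch_length, walk, summarize) are made up for this example. It shows both ways of giving a parent branch length (attribute or element) and a 'confidence' element of type 'bootstrap'.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free sample; element and attribute names follow
# the schema documentation ('phyloxml', 'phylogeny', 'rooted', 'clade',
# 'branch_length', 'confidence', 'taxonomy', 'code').
DOC = """
<phyloxml>
  <phylogeny rooted="true">
    <clade>
      <clade branch_length="0.06">
        <confidence type="bootstrap">89</confidence>
        <taxonomy><code>APLCA</code></taxonomy>
      </clade>
      <clade>
        <branch_length>0.23</branch_length>
        <taxonomy><code>Aca</code></taxonomy>
      </clade>
    </clade>
  </phylogeny>
</phyloxml>
"""


def branch_length(clade):
    """Return the parent branch length from the attribute or, failing that,
    from the child element; None if neither is present."""
    attr = clade.get("branch_length")
    if attr is not None:
        return float(attr)
    child = clade.find("branch_length")
    return float(child.text) if child is not None else None


def walk(clade, depth=0):
    """Yield (depth, clade) pairs in pre-order, following nested clades."""
    yield depth, clade
    for sub in clade.findall("clade"):
        yield from walk(sub, depth + 1)


def summarize(xml_text):
    """List (depth, taxonomy code, branch length) for every clade."""
    root = ET.fromstring(xml_text)
    rows = []
    for phylogeny in root.findall("phylogeny"):
        for top in phylogeny.findall("clade"):
            for depth, clade in walk(top):
                code = clade.findtext("taxonomy/code")
                rows.append((depth, code, branch_length(clade)))
    return rows


if __name__ == "__main__":
    for row in summarize(DOC):
        print(row)
```

A real phyloXML file declares a namespace on the root element, so the lookups above would need namespace-qualified tag names.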