X-Git-Url: http://source.jalview.org/gitweb/?a=blobdiff_plain;f=wiki%2FGSDI.wiki;h=ac4db98a08c33c77f229a54db9d7c12d9bb62511;hb=7b769dbf71669ab3b9237158f4942add7ea570e2;hp=a875f25883bc57f65884e13bbfa6a4a2a8c44c4f;hpb=c5ab23290f8f52cf0f3446283297ce2e087cd902;p=jalview.git

diff --git a/wiki/GSDI.wiki b/wiki/GSDI.wiki
index a875f25..ac4db98 100644
--- a/wiki/GSDI.wiki
+++ b/wiki/GSDI.wiki
@@ -1,25 +1,62 @@
-#summary preprocessing of gene trees for speciation/duplication inference
+#summary generalized speciation duplication inference
-= Generalized Speciation Duplication Inference
+= Generalized Speciation Duplication Inference =
 == Purpose ==
-Infer duplication events on a gene tree given a trusted species tree.
+To infer duplication events on a gene tree given a trusted species tree.
 == Usage ==
 {{{
 java -Xmx1024m -cp
-path/to/forester.jar org.forester.application.gene_tree_preprocess
+path/to/forester.jar org.forester.application.gsdi [-options]
 }}}
+=== Options ===
+ * -g: to allow stripping of gene tree nodes without a matching species in the species tree
+
+ * -m: use the most parsimonious duplication model for GSDI: assign nodes as speciations which would otherwise be assigned as potential duplications due to polytomies in the species tree
+ * -q: to allow species trees in formats other than phyloXML (i.e. Newick, NHX, Nexus)
-== Details ==
+ * -b: to use the SDIse algorithm instead of the GSDI algorithm (for binary species trees)
-Output consists of three files:
- * input-name_preprocessed_gene_tree.phylo.xml
- * input-name_species_present.txt
- * input-name_removed_nodes.txt
+==== Gene tree ====
+Must be in phyloXML format, with taxonomy and sequence data in appropriate fields ([http://forester.googlecode.com/files/wnt_gene_tree.xml example]).
+
+==== Species tree ====
+Must be in phyloXML format unless option -q is used ([http://forester.googlecode.com/files/species.xml example]).
+
+=== Output ===
+
+Besides the main output of a gene tree with duplications and speciations assigned to all of its internal nodes, this program also produces the following:
+ * a log file, ending in `"_gsdi_log.txt"` ([http://forester.googlecode.com/files/wnt_gsdi_log.txt example])
+ * a species tree file which contains only the external nodes which were needed for the reconciliation, ending in `"_species_tree_used.xml"`
+ * if the gene tree contains species with scientific species names such as "Pyrococcus horikoshii strain ATCC 700860" and a mapping cannot be established based on these, GSDI will attempt to map by removing the "strain" (or "subspecies") information; the names remapped this way are listed in a file ending in `"_gsdi_remapped.txt"`.
+
+=== Taxonomic mapping between gene and species tree ===
+
+GSDI can establish a taxonomic mapping between gene and species tree based on the following three data fields:
+ * scientific names (e.g. "Pyrococcus horikoshii")
+ * taxonomic identifiers (e.g. "35932" from UniProt or NCBI)
+ * taxonomy codes (e.g. "PYRHO")
+
+
+
+=== Example ===
+`gsdi -g -q gene_tree.xml tree_of_life.nwk out.xml`
+
+
+=== Example files ===
+ * [http://forester.googlecode.com/files/wnt_gene_tree.xml gene tree]
+ * [http://forester.googlecode.com/files/species.xml species tree]
+ * [http://forester.googlecode.com/files/wnt_gsdi_log.txt log file (output)]
+
+
+== Reference ==
+
+Zmasek CM and Eddy SR "A simple algorithm to infer gene duplication and speciation events on a gene tree" [http://bioinformatics.oxfordjournals.org/content/17/9/821.abstract Bioinformatics, 17, 821-828]
+
 == Download ==