wiki/RIO.wiki

   1 #summary resampled inference of orthologs
   2
   3 = RIO: Resampled Inference of Orthologs =
   4
   5 == Purpose ==
   6
   7 RIO (Resampled Inference of Orthologs) is a method for automated phylogenomics based on explicit phylogenetic inference. RIO analyses are performed over resampled phylogenetic trees to estimate the reliability of orthology assignments.
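To illustrate how resampled trees can yield a reliability estimate, here is a minimal sketch that assumes support for a gene pair is simply the fraction of resampled trees in which the pair was inferred to be orthologous. The function name `orthology_support` and its input format are hypothetical and are not part of forester; RIO's actual computation and output table may differ.

```python
from collections import Counter


def orthology_support(per_tree_ortholog_pairs):
    """Fraction of trees in which each gene pair was inferred as orthologous.

    per_tree_ortholog_pairs holds one iterable of (gene_a, gene_b) pairs per
    resampled tree. This input format is an assumption made for illustration.
    """
    n_trees = len(per_tree_ortholog_pairs)
    if n_trees == 0:
        raise ValueError("at least one tree is required")
    counts = Counter()
    for pairs in per_tree_ortholog_pairs:
        # frozenset treats (a, b) and (b, a) as the same pair; building a set
        # first ensures a pair is counted at most once per tree.
        counts.update({frozenset(pair) for pair in pairs})
    return {tuple(sorted(pair)): n / n_trees for pair, n in counts.items()}


# Four hypothetical resampled trees and the ortholog pairs inferred on each.
trees = [
    [("BCL2_HUMAN", "BCL2_MOUSE")],
    [("BCL2_HUMAN", "BCL2_MOUSE"), ("BCL2_HUMAN", "BCLX_MOUSE")],
    [("BCL2_MOUSE", "BCL2_HUMAN")],
    [],
]
support = orthology_support(trees)
```

A pair found in three of the four trees gets a support of 0.75, while a pair found in only one tree gets 0.25.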
   8
   9
  10 == Usage ==
  11
  12 `java -Xmx2048m -cp forester.jar org.forester.application.rio [options] <gene trees> <species tree> <outfile> [logfile]`
  13
  14 === Options ===
  15   * `-f=<first>` : first gene tree to analyze (0-based index) (default: analyze all gene trees)
  16   * `-l=<last>` : last gene tree to analyze (0-based index) (default: analyze all gene trees)
   17   * `-r=<re-rooting>` : re-rooting method for gene trees; possible values are 'none', 'midpoint', or 'outgroup' (default: by minimizing duplications)
  18   * `-o=<outgroup>` : for rooting by outgroup, name of outgroup (external gene tree node)
   19   * `-b` : to use SDIR instead of GSDIR (faster, but non-binary species trees are disallowed, as are all other options)
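The usage line and options above can be assembled programmatically. The following sketch builds the argument list for a call such as `subprocess.run`. The helper name `build_rio_command` and its keyword parameters are hypothetical; only the flags and the class name come from this page. The check that `-r=outgroup` needs an outgroup name is an assumption based on the description of `-o`.

```python
def build_rio_command(gene_trees, species_tree, outfile, logfile=None, *,
                      first=None, last=None, rerooting=None, outgroup=None,
                      use_sdir=False, jar="forester.jar", max_heap="2048m"):
    """Return the RIO command line as a list of arguments (hypothetical helper)."""
    cmd = ["java", f"-Xmx{max_heap}", "-cp", jar, "org.forester.application.rio"]
    if first is not None:
        cmd.append(f"-f={first}")  # first gene tree, 0-based index
    if last is not None:
        cmd.append(f"-l={last}")   # last gene tree, 0-based index
    if rerooting is not None:
        if rerooting not in ("none", "midpoint", "outgroup"):
            raise ValueError(f"unknown re-rooting method: {rerooting!r}")
        # Assumption: rooting by outgroup is only meaningful with a name.
        if rerooting == "outgroup" and outgroup is None:
            raise ValueError("rooting by outgroup needs an outgroup name")
        cmd.append(f"-r={rerooting}")
    if outgroup is not None:
        cmd.append(f"-o={outgroup}")
    if use_sdir:
        cmd.append("-b")           # SDIR instead of GSDIR
    cmd += [gene_trees, species_tree, outfile]
    if logfile is not None:
        cmd.append(logfile)
    return cmd


cmd = build_rio_command("gene_trees.nh", "species.xml", "outtable.tsv",
                        "log.txt", rerooting="outgroup", outgroup="XVL1_ECOLI")
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once `forester.jar` is in the working directory.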
  20
  21
  22 ==== Gene trees ====
   23 Gene trees should ideally be in [http://www.biomedcentral.com/1471-2105/10/356/ phyloXML] format, with taxonomy and sequence data in the appropriate fields. New Hampshire (Newick) and Nexus formats are also accepted, as long as species information can be extracted from the gene names (e.g. "HUMAN" from "BCL2_HUMAN") ([http://forester.googlecode.com/files/gene_trees_rio.nh example]).
  24 All gene trees must be *completely binary*.
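As an illustration of the naming convention mentioned above, this sketch takes the text after the last underscore of a gene name as the species code. It covers only the pattern shown in the example ("HUMAN" from "BCL2_HUMAN"); the function name is hypothetical, and forester's own extraction rules may be more elaborate.

```python
def species_from_gene_name(gene_name):
    """Return the text after the last underscore, e.g. 'HUMAN' for 'BCL2_HUMAN'."""
    prefix, sep, suffix = gene_name.rpartition("_")
    # Reject names without an underscore or with an empty part on either side.
    if not sep or not prefix or not suffix:
        raise ValueError(f"no species suffix found in {gene_name!r}")
    return suffix


species = [species_from_gene_name(name) for name in ("BCL2_HUMAN", "XVL1_ECOLI")]
```

Running such a check over all leaf names before starting an analysis is one way to spot gene names that carry no species information.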
  25
  26
  27 ==== Species tree ====
   28 The species tree should ideally be in [http://www.biomedcentral.com/1471-2105/10/356/ phyloXML] format, but can also be in New Hampshire (Newick) or Nexus format ([http://forester.googlecode.com/files/species_tree_rio.xml example]).
  29 The species tree is allowed to have nodes with more than two descendants (polytomies), as long as the (slower) GSDIR ([GSDI GSDI] re-rooting) algorithm is used.
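Whether a tree is completely binary or contains polytomies can be checked before running RIO. The sketch below counts the direct descendants of every internal node in a Newick string. It is a simplified scanner rather than forester's parser, and it assumes that labels contain no quoted parentheses or commas. The function names are hypothetical.

```python
def child_counts(newick):
    """Number of direct descendants of each internal node in a Newick string."""
    stack = []    # one running child count per currently open parenthesis
    counts = []
    for ch in newick:
        if ch == "(":
            stack.append(1)       # an opened node has at least one child
        elif ch == "," and stack:
            stack[-1] += 1        # a comma starts another sibling
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced parentheses")
            counts.append(stack.pop())
    if stack:
        raise ValueError("unbalanced parentheses")
    return counts


def is_completely_binary(newick):
    """True if every internal node has exactly two descendants."""
    return all(n == 2 for n in child_counts(newick))


def has_polytomy(newick):
    """True if any internal node has more than two descendants."""
    return any(n > 2 for n in child_counts(newick))


binary_tree = "((A:0.1,B:0.2),(C,D));"
polytomy_tree = "((A,B,C),D);"
```

Under the rules described above, `is_completely_binary` would need to hold for every gene tree, while a species tree for which `has_polytomy` is true requires the GSDIR algorithm, so `-b` cannot be used.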
  30
  31
  32 ==== Note about memory ====
   33 Because Java's default memory allocation is too small for even moderately large data sets, it is necessary to increase it with the `-Xmx2048m` command-line option.
  34
  35
  36 === Examples ===
  37 `rio gene_trees.nh species.xml outtable.tsv log.txt`
  38
  39 `rio gene_trees.nh species.xml outtable.tsv log.txt -r=outgroup -o=XVL1_ECOLI`
  40
  41 `rio gene_trees.nh species.xml outtable.tsv log.txt -f=0 -l=49`
  42
  43 === Example files ===
  44   * [http://forester.googlecode.com/files/gene_trees_rio.nh gene trees file]
  45   * [http://forester.googlecode.com/files/species_tree_rio.xml species tree file]
  46
  47
  48 == References ==
  49
  50 Zmasek CM and Eddy SR "RIO: Analyzing proteomes by automated phylogenomics using resampled inference of orthologs" [http://www.biomedcentral.com/1471-2105/3/14/ BMC Bioinformatics 2002, 3:14]
  51
   52 Zmasek CM and Eddy SR "A simple algorithm to infer gene duplication and speciation events on a gene tree" [http://bioinformatics.oxfordjournals.org/content/17/9/821.abstract Bioinformatics 2001, 17:821-828]
  53
  54 Han M and Zmasek CM "phyloXML: XML for evolutionary biology and comparative genomics" [http://www.biomedcentral.com/1471-2105/10/356/ BMC Bioinformatics 2009, 10:356]
  55
  56
  57 == Download ==
  58
  59 Download forester.jar here: http://code.google.com/p/forester/downloads/list