Edited wiki page PhyloBioRuby through web user interface.

diff --git a/wiki/PhyloBioRuby.wiki b/wiki/PhyloBioRuby.wiki

index cf2e5d5..91baa22 100644
--- a/wiki/PhyloBioRuby.wiki
+++ b/wiki/PhyloBioRuby.wiki
@@ -1,65 +1,132 @@
  #summary Tutorial for multiple sequence alignments and phylogenetic methods in BioRuby -- under development!
  
+
+
  = Introduction =
  
-Tutorial for multiple sequence alignments and phylogenetic methods in !BioRuby -- under development!
+Under development!
+
+Tutorial for multiple sequence alignments and phylogenetic methods in [http://bioruby.open-bio.org/ BioRuby].
+
+Eventually, this is expected to be placed on the official !BioRuby page.
+
+Author: [http://www.cmzmasek.net/ Christian M Zmasek], Sanford-Burnham Medical Research Institute
+
+ 
+Copyright (C) 2011 Christian M Zmasek
  
  
  = Multiple Sequence Alignments =
  
+
  == Multiple Sequence Alignment Input and Output ==
  
-=== Reading in a Multiple Sequence Alignments from a File ===
+=== Reading in a Multiple Sequence Alignment from a File ===
  
-_... to be done_
+The following example shows how to read in a *ClustalW*-formatted multiple sequence alignment.
  
  {{{
  #!/usr/bin/env ruby
  require 'bio'
  
+# Reads in a ClustalW-formatted multiple sequence alignment
+# from a file named "infile_clustalw.aln" and stores it in 'report'.
+report = Bio::ClustalW::Report.new(File.read('infile_clustalw.aln'))
+
+# Accesses the actual alignment.
+align = report.alignment
+
+# Goes through all sequences in 'align' and prints the
+# actual molecular sequence.
+align.each do |entry|
+  puts entry.seq
+end
  }}}
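To show what the ClustalW format looks like, and what parsing `Bio::ClustalW::Report` does for you, here is a minimal plain-Ruby sketch. It is not !BioRuby's implementation, and `parse_clustalw` is a hypothetical helper. It assumes a well-formed file: a header line starting with `CLUSTAL`, blocks of `name sequence` lines (an optional trailing residue count is ignored), and conservation lines that begin with whitespace.

```ruby
# Minimal sketch of a ClustalW (.aln) reader in plain Ruby; this is NOT
# Bio::ClustalW::Report, only an illustration of the file layout.
# Assumes a well-formed file:
#   - a header line starting with "CLUSTAL",
#   - blocks of "name  aligned-sequence [residue-count]" lines,
#   - conservation lines ("*", ":", ".") that begin with whitespace.
def parse_clustalw(text)
  seqs = {} # name => aligned sequence, in order of first appearance
  text.each_line do |line|
    next if line.start_with?('CLUSTAL') # header line
    next if line.strip.empty?           # blank separator lines
    next if line =~ /\A\s/              # conservation lines
    name, seq = line.split              # a trailing residue count is ignored
    next if seq.nil?
    seqs[name] = seqs.fetch(name, '') + seq
  end
  seqs
end

example = <<~ALN
  CLUSTAL W (1.83) multiple sequence alignment

  seq_a      KMLFGVVFFF
  seq_b      KML--VVFF-
             ***  ****

  seq_a      GG
  seq_b      GG
             **
ALN

parse_clustalw(example).each { |name, seq| puts "#{name}: #{seq}" }
# seq_a: KMLFGVVFFFGG
# seq_b: KML--VVFF-GG
```

The sequence of each entry is simply the concatenation of its pieces across all blocks.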
  
+ 
  
  === Writing a Multiple Sequence Alignment to a File ===
  
-_... to be done_
+
+The following example shows how to write a multiple sequence alignment in *FASTA* format:
  
  {{{
  #!/usr/bin/env ruby
  require 'bio'
  
+# Creates a new file named "outfile.fasta" and writes the multiple
+# sequence alignment 'align' (e.g. the one read in above) to it in FASTA format.
+File.open('outfile.fasta', 'w') do |f|
+  f.write(align.output(:fasta))
+end
+}}}
+
+The following symbols determine the output format:
+
+  * ClustalW: `:clustal`
+  * FASTA:    `:fasta`
+  * PHYLIP interleaved (will truncate sequence names to no more than 10 characters): `:phylip`
+  * PHYLIP non-interleaved (will truncate sequence names to no more than 10 characters): `:phylipnon`
+  * MSF: `:msf`
+  * Molphy: `:molphy`
+
+
+For example, the following writes the alignment in PHYLIP's non-interleaved format:
+
+{{{
+File.write('outfile.phylip', align.output(:phylipnon))
  }}}
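The name truncation mentioned in the list above comes from classic PHYLIP, which reserves exactly 10 characters for each name. The following plain-Ruby sketch (not `Bio::Alignment#output`; `to_phylip_sequential` is a hypothetical helper) writes the sequential, i.e. non-interleaved, layout for a Hash of name => aligned sequence. Rather than silently producing duplicates, it raises an error when two names become identical after truncation, a choice made for this sketch only, because PHYLIP programs could not tell such entries apart.

```ruby
# Plain-Ruby sketch of PHYLIP sequential (non-interleaved) output.
# Not BioRuby's implementation; 'seqs' is a Hash of name => aligned sequence.
def to_phylip_sequential(seqs)
  lengths = seqs.values.map(&:length).uniq
  unless lengths.size == 1
    raise ArgumentError, 'expected a non-empty set of equal-length (aligned) sequences'
  end

  # Classic PHYLIP uses a fixed-width name field of 10 characters.
  names = seqs.keys.map { |name| name[0, 10] }
  if names.uniq.size != names.size
    raise ArgumentError, 'names are no longer unique after truncation to 10 characters'
  end

  header = "#{seqs.size} #{lengths.first}\n"
  rows = names.zip(seqs.values).map { |name, seq| name.ljust(10) + seq + "\n" }
  header + rows.join
end

aln = { 'sequence_alpha' => 'KMLFGG', 'b' => 'KM--GG' }
puts to_phylip_sequential(aln)
# 2 6
# sequence_aKMLFGG
# b         KM--GG
```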
  
  
  
-== Calculating Multiple Sequence Alignments  ==
+== Calculating Multiple Sequence Alignments ==
  
  !BioRuby can be used to execute a variety of multiple sequence alignment
-programs (such as [http://mafft.cbrc.jp/alignment/software/ MAFFT], [http://probcons.stanford.edu/ Probcons], [http://www.clustal.org/ ClustalW], [http://www.drive5.com/muscle/ Muscle]). 
+programs (such as [http://mafft.cbrc.jp/alignment/software/ MAFFT], [http://probcons.stanford.edu/ Probcons], [http://www.clustal.org/ ClustalW], [http://www.drive5.com/muscle/ Muscle], and [http://www.tcoffee.org/Projects_home_page/t_coffee_home_page.html T-Coffee]). 
 The following examples show how to use MAFFT and Muscle.
  
  
  === MAFFT ===
  
+The following example uses the MAFFT program to align four sequences
+and then prints the result to the screen.
+Please note that if the MAFFT executable is in your PATH, `mafft = Bio::MAFFT.new('mafft', options)` can be used instead of explicitly indicating the path as in the example.
+
  {{{
  #!/usr/bin/env ruby
  require 'bio'
  
+# 'seqs' is either an array of sequences or a multiple sequence 
+# alignment. In general this is read in from a file as described in ?.
+# For the purpose of this tutorial, it is generated in code.
+seqs = ["KMLFGVVFFFGG",
+        "LMGGHHF",
+        "GKKKKGHHHGHRRRGR",
+        "KKKKGHHHGHRRRGR"] 
+
+
  # Calculates the alignment using the MAFFT program on the local
  # machine with options '--maxiterate 1000 --localpair'
  # and stores the result in 'report'.
  options = ['--maxiterate', '1000', '--localpair']
  mafft = Bio::MAFFT.new('path/to/mafft', options)
-report = mafft.query_align( seqs)
+report = mafft.query_align(seqs)
  
-# Accesses the actual alignment
+# Accesses the actual alignment.
  align = report.alignment
  
  # Prints each sequence to the console.
-report.align.each { |s| puts s.to_s }
-#
+align.each { |s| puts s.to_s }
+
  }}}
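`Bio::MAFFT` takes care of writing the input, calling the program and reading its output. For readers curious about what such a wrapper has to do, here is a rough plain-Ruby sketch built on Ruby's standard `Open3` and `Tempfile` libraries. It is not how !BioRuby implements it, and `mafft_command`, `parse_fasta` and `run_mafft` are hypothetical helpers. It assumes MAFFT is invoked as `mafft [options] input_file` and prints the aligned sequences in FASTA format to standard output, which is MAFFT's default behavior.

```ruby
require 'open3'
require 'tempfile'

# Builds the argument vector: program, its options, then the input file.
def mafft_command(program, options, input_path)
  [program, *options, input_path]
end

# Minimal FASTA reader: returns a Hash of name => sequence.
# The name is the first word after '>'; the rest of the header is dropped.
def parse_fasta(text)
  seqs = {}
  name = nil
  text.each_line do |line|
    line = line.strip
    next if line.empty?
    if line.start_with?('>')
      name = line[1..-1].split.first
      seqs[name] = ''.dup
    elsif name
      seqs[name] << line
    end
  end
  seqs
end

# Writes 'seqs' (name => sequence) to a temporary FASTA file, runs the
# aligner on it without a shell, and parses the FASTA it prints to stdout.
def run_mafft(seqs, program: 'mafft', options: [])
  Tempfile.create(['mafft_input', '.fasta']) do |file|
    seqs.each { |name, seq| file.write(">#{name}\n#{seq}\n") }
    file.flush
    out, err, status = Open3.capture3(*mafft_command(program, options, file.path))
    raise "MAFFT failed: #{err}" unless status.success?
    parse_fasta(out)
  end
end

# Usage (requires a local MAFFT installation):
# aligned = run_mafft({ 's1' => 'KMLFGVVFFFGG', 's2' => 'LMGGHHF' },
#                     options: ['--maxiterate', '1000', '--localpair'])
# aligned.each { |name, seq| puts "#{name} #{seq}" }
```

Passing the arguments as an array to `Open3.capture3` runs the program directly instead of through a shell, so file names and options need no quoting.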
  
+References:
+
+ * Katoh, K., Toh, H. (2008) "Recent developments in the MAFFT multiple sequence alignment program" Briefings in Bioinformatics 9:286-298
+
+ * Katoh, K., Toh, H. (2010) "Parallelization of the MAFFT multiple sequence alignment program" Bioinformatics 26:1899-1900
+
+
  
  === Muscle ===
  
@@ -67,26 +134,41 @@ report.align.each { |s| puts s.to_s }
  #!/usr/bin/env ruby
  require 'bio'
  
+# 'seqs' is either an array of sequences or a multiple sequence 
+# alignment. In general this is read in from a file as described in ?.
+# For the purpose of this tutorial, it is generated in code.
+seqs = ["KMLFGVVFFFGG",
+        "LMGGHHF",
+        "GKKKKGHHHGHRRRGR",
+        "KKKKGHHHGHRRRGR"] 
+
  # Calculates the alignment using the Muscle program on the local
  # machine with options '-quiet -maxiters 64'
  # and stores the result in 'report'.
  options = ['-quiet', '-maxiters', '64']
  muscle = Bio::Muscle.new('path/to/muscle', options)
-report = muscle.query_align( seqs)
+report = muscle.query_align(seqs)
  
-# Accesses the actual alignment
+# Accesses the actual alignment.
  align = report.alignment
  
  # Prints each sequence to the console.
-report.align.each { |s| puts s.to_s }
-#
+align.each { |s| puts s.to_s }
+
  }}}
  
+References:
+
+ * Edgar, R.C. (2004) "MUSCLE: multiple sequence alignment with high accuracy and high throughput" Nucleic Acids Res 32(5):1792-1797
+
+=== Other Programs ===
+
+[http://probcons.stanford.edu/ Probcons], [http://www.clustal.org/ ClustalW], and [http://www.tcoffee.org/Projects_home_page/t_coffee_home_page.html T-Coffee] can be used in the same manner as the programs above. 
+
  
  == Manipulating Multiple Sequence Alignments ==
  
-It is probably a good idea to 'clean up' multiple sequence to be used
-for phylogenetic inference. For instance, columns with more than 50% gaps can be deleted, like so:
+Multiple sequence alignments to be used for phylogenetic inference are often 'cleaned up' in some manner. For instance, some researchers prefer to delete columns with more than 50% gaps. The following code is an example of how to do that in !BioRuby.
  
  
  _... to be done_
@@ -98,6 +180,8 @@ require 'bio'
  }}}
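Until the !BioRuby example is written, here is a plain-Ruby sketch of the 50% rule. It works on an array of equal-length aligned strings, which could be collected from an alignment with `s.to_s` inside an `align.each` loop like the ones in the examples above. `remove_gappy_columns` is a hypothetical helper, not a !BioRuby method. A column is kept when at most half of its characters are gaps, so only columns with *more than* 50% gaps are deleted.

```ruby
# Plain-Ruby sketch (no BioRuby calls): deletes every column in which more
# than 'max_gap_fraction' of the sequences have a gap character.
# 'seqs' is an Array of equal-length aligned Strings.
def remove_gappy_columns(seqs, max_gap_fraction = 0.5, gap_chars = ['-'])
  return [] if seqs.empty?
  length = seqs.first.length
  unless seqs.all? { |s| s.length == length }
    raise ArgumentError, 'sequences must be aligned (equal length)'
  end
  keep = (0...length).select do |i|
    gaps = seqs.count { |s| gap_chars.include?(s[i]) }
    gaps.to_f / seqs.size <= max_gap_fraction # keep columns with few enough gaps
  end
  seqs.map { |s| keep.map { |i| s[i] }.join }
end

aligned = ['KML-FG',
           'K--HF-',
           'KM-HFG',
           'K--HFG']
puts remove_gappy_columns(aligned)
# KM-FG
# K-HF-
# KMHFG
# K-HFG
```

Here the third column (three gaps out of four) is removed, while the second column (exactly 50% gaps) is kept.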
  
  
+----
+
  = Phylogenetic Trees =
  
  == Phylogenetic Tree Input and Output ==
@@ -114,6 +198,8 @@ require 'bio'
  
  Also, see: https://www.nescent.org/wg_phyloinformatics/BioRuby_PhyloXML_HowTo_documentation
  
+
+
  === Writing of Phylogenetic Trees ===
  
  _... to be done_
@@ -127,9 +213,14 @@ require 'bio'
  Also, see: https://www.nescent.org/wg_phyloinformatics/BioRuby_PhyloXML_HowTo_documentation
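Newick, the most widely used plain-text tree format, nests subtrees in parentheses, separates siblings with commas, appends branch lengths after a colon, and ends with a semicolon. The plain-Ruby sketch below writes that format from a small `Node` struct defined only for this example. !BioRuby has its own tree class (`Bio::Tree`), which this code does not use. Labels containing whitespace or Newick punctuation are wrapped in single quotes, with embedded quotes doubled.

```ruby
# Hypothetical node type for this sketch only (not Bio::Tree).
Node = Struct.new(:name, :branch_length, :children) do
  def leaf?
    children.nil? || children.empty?
  end
end

# Quotes a label if it contains whitespace or Newick punctuation.
def newick_label(name)
  return '' if name.nil?
  label = name.to_s
  return label unless label =~ /[\s(),:;\[\]']/
  "'" + label.gsub("'", "''") + "'"
end

# Recursively writes the subtree rooted at 'node' (without the final ';').
def newick(node)
  text = node.leaf? ? '' : '(' + node.children.map { |c| newick(c) }.join(',') + ')'
  text += newick_label(node.name)
  text += ":#{node.branch_length}" unless node.branch_length.nil?
  text
end

def to_newick(root)
  newick(root) + ';'
end

tree = Node.new(nil, nil, [
  Node.new('A', 0.1),
  Node.new(nil, 0.05, [Node.new('B', 0.2), Node.new('C d', 0.3)])
])
puts to_newick(tree) # => (A:0.1,(B:0.2,'C d':0.3):0.05);
```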
  
  
+
  == Phylogenetic Inference ==
  
-*Currently !BioRuby does not contain any wrappers for phylogenetic inference, I am progress of writing a RAxML wrapper followed by a wrapper for FastMA.*
+_Currently !BioRuby does not contain wrappers for phylogenetic inference programs, so I am in the process of writing a RAxML wrapper, followed by a wrapper for FastME..._
+
+_What about pairwise distance calculation?_
+
+
  
  == Maximum Likelihood ==
  
@@ -144,6 +235,16 @@ require 'bio'
  }}}
  
  
+=== PhyML ===
+
+_... to be done_
+
+{{{
+#!/usr/bin/env ruby
+require 'bio'
+
+}}}
+
  == Pairwise Distance Based Methods ==
  
  === FastME ===
@@ -157,11 +258,34 @@ require 'bio'
  }}}
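FastME builds a tree from a matrix of pairwise distances, which it reads in PHYLIP format. While the wrapper is still to be done, the plain-Ruby sketch below shows one simple way to get such a matrix from an alignment. It uses uncorrected p-distances (the fraction of differing sites), computed only over columns where neither sequence has a gap. This simplification keeps the example short; in practice a model-based correction is usually applied to p-distances before building a tree. `p_distance` and `phylip_distance_matrix` are hypothetical helpers, not !BioRuby methods.

```ruby
# Uncorrected p-distance between two aligned sequences: the fraction of
# differing sites among the columns where neither sequence has a gap.
def p_distance(a, b, gap = '-')
  raise ArgumentError, 'sequences must have equal length' unless a.length == b.length
  compared = 0
  differences = 0
  a.each_char.with_index do |x, i|
    y = b[i]
    next if x == gap || y == gap
    compared += 1
    differences += 1 if x.upcase != y.upcase
  end
  raise ArgumentError, 'no comparable sites' if compared.zero?
  differences.to_f / compared
end

# Square PHYLIP distance matrix: number of taxa on the first line, then one
# row per taxon with its name padded to 10 characters (longer names are
# truncated), followed by its distances to all taxa.
def phylip_distance_matrix(seqs)
  names = seqs.keys
  rows = names.map do |n1|
    distances = names.map { |n2| format('%.5f', p_distance(seqs[n1], seqs[n2])) }
    n1[0, 10].ljust(10) + ' ' + distances.join(' ')
  end
  "#{names.size}\n" + rows.join("\n") + "\n"
end

aln = { 'a' => 'KMLF-G', 'b' => 'KMLFHG', 'c' => 'RMIF-G' }
puts phylip_distance_matrix(aln)
```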
  
  
+
+=== PHYLIP? ===
+
+
+
+== Support Calculation? ==
+
+=== Bootstrap Resampling? ===
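A common way to assess support is the nonparametric bootstrap. Alignment columns are resampled with replacement to create pseudo-replicate alignments of the original length, a tree is inferred from each replicate, and the support for a clade is the fraction of replicate trees that contain it. The plain-Ruby sketch below covers only the resampling step, on an array of equal-length strings; tree inference and clade counting are not shown. `bootstrap_replicate` is a hypothetical helper, and a seeded `Random` keeps the replicates reproducible.

```ruby
# Creates one bootstrap pseudo-replicate of an alignment given as an
# Array of equal-length Strings: columns are drawn with replacement.
def bootstrap_replicate(seqs, rng = Random.new)
  return [] if seqs.empty?
  length = seqs.first.length
  columns = Array.new(length) { rng.rand(length) } # sampled column indices
  seqs.map { |s| columns.map { |i| s[i] }.join }
end

aligned = ['KMLFGG',
           'KM-FGG',
           'RMIFG-']
rng = Random.new(42) # fixed seed for reproducible replicates
replicates = Array.new(100) { bootstrap_replicate(aligned, rng) }
puts replicates.first
```

Each replicate would then be passed to the tree-building program in place of the original alignment.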
+
+
+----
+
  = Analyzing Phylogenetic Trees =
  
+== PAML ==
+
+
  == Gene Duplication Inference ==
  
  _need to further test and then import GSoC 'SDI' work..._
  
  
-== Others? ==
\ No newline at end of file
+== Others? ==
+
+
+----
+
+= Putting It All Together =
+
+Example of a small "pipeline"-type program running a minimal phylogenetic analysis: starting with a set of sequences and ending with a phylogenetic tree.
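As a placeholder for the planned example, the plain-Ruby sketch below chains the steps together for sequences that are assumed to be already aligned; the alignment step itself would use one of the programs above. The tree is built with UPGMA, which assumes a constant rate of evolution (a molecular clock). UPGMA is used here only because it fits in a few lines; it is not one of the programs this tutorial plans to wrap (RAxML, FastME). Uncorrected p-distances and the 50% gap filter are likewise simplifying assumptions, and sequence names are assumed to be valid unquoted Newick labels. All method names are hypothetical.

```ruby
# Plain-Ruby sketch of a minimal distance-based pipeline (no BioRuby):
# aligned sequences -> gap filtering -> p-distances -> UPGMA -> Newick.

# Drops columns in which more than half of the sequences have a gap.
def filter_columns(seqs)
  length = seqs.values.first.length
  keep = (0...length).select do |i|
    seqs.values.count { |s| s[i] == '-' } * 2 <= seqs.size
  end
  seqs.transform_values { |s| keep.map { |i| s[i] }.join }
end

# Uncorrected p-distance over the columns where neither sequence has a gap.
def p_distance(a, b)
  pairs = a.chars.zip(b.chars).reject { |x, y| x == '-' || y == '-' }
  raise ArgumentError, 'no comparable sites' if pairs.empty?
  pairs.count { |x, y| x != y }.to_f / pairs.size
end

# UPGMA: repeatedly merges the two closest clusters; the distance from the
# merged cluster to another one is the size-weighted mean of its parts.
# Each cluster is [newick_text, size, height].
def upgma(names, dist)
  clusters = names.map { |n| [n, 1, 0.0] }
  d = dist.map(&:dup) # work on a copy of the matrix
  while clusters.size > 1
    i, j = (0...clusters.size).to_a.combination(2).min_by { |p, q| d[p][q] }
    ci = clusters[i]
    cj = clusters[j]
    height = d[i][j] / 2.0
    text = "(#{ci[0]}:#{(height - ci[2]).round(5)},#{cj[0]}:#{(height - cj[2]).round(5)})"
    new_row = (0...clusters.size).map do |k|
      (d[i][k] * ci[1] + d[j][k] * cj[1]) / (ci[1] + cj[1])
    end
    # Remove the two merged clusters (higher index first), then add the new one.
    [j, i].each do |k|
      clusters.delete_at(k)
      d.delete_at(k)
      d.each { |row| row.delete_at(k) }
      new_row.delete_at(k)
    end
    d.each_with_index { |row, k| row << new_row[k] }
    d << new_row + [0.0]
    clusters << [text, ci[1] + cj[1], height]
  end
  clusters.first[0] + ';'
end

def run_pipeline(aligned)
  seqs = filter_columns(aligned)
  names = seqs.keys
  dist = names.map { |x| names.map { |y| p_distance(seqs[x], seqs[y]) } }
  upgma(names, dist)
end

aligned = { 'a' => 'KMLFGG-', 'b' => 'KMLFGA-', 'c' => 'RMIFGAH' }
puts run_pipeline(aligned)
```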
+ 
\ No newline at end of file