#summary Tutorial for multiple sequence alignments and phylogenetic methods in BioRuby -- under development!


= Introduction =

This is a tutorial for multiple sequence alignments and phylogenetic methods in !BioRuby. It is still under development!


= Multiple Sequence Alignments =


== Multiple Sequence Alignment Input and Output ==

=== Reading in a Multiple Sequence Alignment from a File ===

Reading in a ClustalW-formatted multiple sequence alignment:

{{{
#!/usr/bin/env ruby
require 'bio'

# Reads in a clustalw formatted multiple sequence alignment
# from a file named "infile_clustalw.aln" and stores it in 'report'.
report = Bio::ClustalW::Report.new(File.read('infile_clustalw.aln'))

# Accesses the actual alignment.
align = report.alignment

# Goes through all sequences in 'align' and prints the
# actual molecular sequence.
align.each do |entry|
  puts entry.seq
end
}}}

 
=== Writing a Multiple Sequence Alignment to a File ===

Writing a multiple sequence alignment in FASTA format:

{{{
#!/usr/bin/env ruby
require 'bio'

# 'align' is a Bio::Alignment, for instance the one read in above.
# Creates a new file named "outfile.fasta" and writes
# multiple sequence alignment 'align' to it in FASTA format.
File.open('outfile.fasta', 'w') do |f|
  f.write(align.output(:fasta))
end
}}}


Writing a multiple sequence alignment in ClustalW format:

{{{
#!/usr/bin/env ruby
require 'bio'

# 'align' is a Bio::Alignment, for instance the one read in above.
# Creates a new file named "outfile.aln" and writes
# multiple sequence alignment 'align' to it in ClustalW format.
File.open('outfile.aln', 'w') do |f|
  f.write(align.output(:clustal))
end
}}}


== Calculating Multiple Sequence Alignments ==

!BioRuby can be used to execute a variety of multiple sequence alignment
programs (such as [http://mafft.cbrc.jp/alignment/software/ MAFFT], [http://probcons.stanford.edu/ Probcons], [http://www.clustal.org/ ClustalW], [http://www.drive5.com/muscle/ Muscle], and [http://www.tcoffee.org/Projects_home_page/t_coffee_home_page.html T-Coffee]). 
The following examples show how to use MAFFT and Muscle.


=== MAFFT ===

The following example uses the MAFFT program to align four sequences
and then prints the result to the screen.
If the MAFFT executable is in your `PATH`, the program name can be given instead of an explicit path, as in `mafft = Bio::MAFFT.new('mafft', options)`.

{{{
#!/usr/bin/env ruby
require 'bio'

# 'seqs' is either an array of sequences or a multiple sequence 
# alignment. In general this is read in from a file as described in ?.
# For the purpose of this tutorial, it is generated in code.
seqs = ["KMLFGVVFFFGG",
        "LMGGHHF",
        "GKKKKGHHHGHRRRGR",
        "KKKKGHHHGHRRRGR"] 


# Calculates the alignment using the MAFFT program on the local
# machine with options '--maxiterate 1000 --localpair'
# and stores the result in 'report'.
options = ['--maxiterate', '1000', '--localpair']
mafft = Bio::MAFFT.new('path/to/mafft', options)
report = mafft.query_align(seqs)

# Accesses the actual alignment.
align = report.alignment

# Prints each sequence to the console.
align.each { |s| puts s.to_s }

}}}

References:

 * Katoh K, Toh H (2008) "Recent developments in the MAFFT multiple sequence alignment program" Briefings in Bioinformatics 9:286-298

 * Katoh K, Toh H (2010) "Parallelization of the MAFFT multiple sequence alignment program" Bioinformatics 26:1899-1900


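=== Muscle ===

The following example aligns the same four sequences with the Muscle program and prints the result to the screen.
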
{{{
#!/usr/bin/env ruby
require 'bio'

# 'seqs' is either an array of sequences or a multiple sequence 
# alignment. In general this is read in from a file as described in ?.
# For the purpose of this tutorial, it is generated in code.
seqs = ["KMLFGVVFFFGG",
        "LMGGHHF",
        "GKKKKGHHHGHRRRGR",
        "KKKKGHHHGHRRRGR"] 

# Calculates the alignment using the Muscle program on the local
# machine with options '-quiet -maxiters 64'
# and stores the result in 'report'.
options = ['-quiet', '-maxiters', '64']
muscle = Bio::Muscle.new('path/to/muscle', options)
report = muscle.query_align(seqs)

# Accesses the actual alignment.
align = report.alignment

# Prints each sequence to the console.
align.each { |s| puts s.to_s }

}}}

References:

 * Edgar RC (2004) "MUSCLE: multiple sequence alignment with high accuracy and high throughput" Nucleic Acids Research 32(5):1792-1797

=== Other Programs ===

[http://probcons.stanford.edu/ Probcons], [http://www.clustal.org/ ClustalW], and [http://www.tcoffee.org/Projects_home_page/t_coffee_home_page.html T-Coffee] can be used in the same manner as the programs above, through the `Bio::Probcons`, `Bio::ClustalW`, and `Bio::Tcoffee` classes, respectively.


== Manipulating Multiple Sequence Alignments ==

Multiple sequence alignments used for phylogenetic inference are often 'cleaned up' beforehand. For instance, some researchers prefer to delete columns with more than 50% gaps. The following code is an example of how to do that in !BioRuby.


_... to be done_

{{{
#!/usr/bin/env ruby
require 'bio'

}}}
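While that example is pending, here is a minimal sketch of the 50% gap filter in plain Ruby. It assumes the aligned sequences are available as an array of equal-length strings (they can be collected from a Bio::Alignment 'align' with `seqs = []; align.each { |s| seqs << s.to_s }`), treats `-` and `.` as gap characters, and uses a hypothetical helper, `remove_gappy_columns`, which is not part of !BioRuby.

{{{
#!/usr/bin/env ruby

# Hypothetical helper (not part of BioRuby): removes alignment columns in
# which the fraction of gap characters exceeds 'max_gap_fraction'.
# 'seqs' is an array of aligned sequences given as plain strings.
def remove_gappy_columns(seqs, max_gap_fraction = 0.5, gap_chars = ['-', '.'])
  return [] if seqs.empty?
  length = seqs.first.length
  unless seqs.all? { |s| s.length == length }
    raise ArgumentError, 'sequences must be aligned (equal length)'
  end

  # Indices of the columns with at most 'max_gap_fraction' gaps.
  keep = (0...length).select do |i|
    gaps = seqs.count { |s| gap_chars.include?(s[i]) }
    gaps.to_f / seqs.size <= max_gap_fraction
  end

  # Rebuilds each sequence from the kept columns only.
  seqs.map { |s| keep.map { |i| s[i] }.join }
end

aligned = ['AC-GT-',
           'AC--T-',
           'A--GTA']

# Prints "ACGT", "AC-T" and "A-GT", one per line.
puts remove_gappy_columns(aligned)
}}}

Column 3 (all gaps) and column 6 (two of three sequences gapped) exceed the 50% threshold and are dropped; all other columns are kept.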


----

= Phylogenetic Trees =

== Phylogenetic Tree Input and Output ==

=== Reading in of Phylogenetic Trees ===

_... to be done_

{{{
#!/usr/bin/env ruby
require 'bio'

}}}
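As a minimal sketch of reading a tree, the example below uses !BioRuby's `Bio::Newick` class, whose `tree` method returns a `Bio::Tree`. The Newick string is made up for illustration and held in memory; in practice it would usually come from a file, e.g. via `File.read`.

{{{
#!/usr/bin/env ruby
require 'bio'

# A small example tree in Newick format (made up for illustration).
newick_string = '((A:0.1,B:0.2):0.3,C:0.4);'

# Parses the Newick string and accesses the resulting Bio::Tree.
tree = Bio::Newick.new(newick_string).tree

# Prints the name of every leaf (terminal node).
tree.leaves.each { |node| puts node.name }
}}}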

See also: https://www.nescent.org/wg_phyloinformatics/BioRuby_PhyloXML_HowTo_documentation


=== Writing of Phylogenetic Trees ===

_... to be done_

{{{
#!/usr/bin/env ruby
require 'bio'

}}}
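As a minimal sketch of writing a tree, the example below builds a two-leaf `Bio::Tree` by hand with `Bio::Tree::Node`, `Bio::Tree::Edge` (whose argument is the branch length) and `add_edge`, then writes it in Newick format with `output(:newick)`, mirroring `align.output(:fasta)` for alignments. The file name "outfile.nwk" and the tree itself are illustrative assumptions.

{{{
#!/usr/bin/env ruby
require 'bio'

# Builds a small tree by hand: an unnamed root with two leaves, A and B.
tree = Bio::Tree.new
root = Bio::Tree::Node.new
leaf_a = Bio::Tree::Node.new('A')
leaf_b = Bio::Tree::Node.new('B')

# Connects the root to each leaf; the edge argument is the branch length.
tree.add_edge(root, leaf_a, Bio::Tree::Edge.new(0.1))
tree.add_edge(root, leaf_b, Bio::Tree::Edge.new(0.2))
tree.root = root

# Creates a new file named "outfile.nwk" and writes the tree
# to it in Newick format.
File.open('outfile.nwk', 'w') do |f|
  f.write(tree.output(:newick))
end
}}}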

See also: https://www.nescent.org/wg_phyloinformatics/BioRuby_PhyloXML_HowTo_documentation


== Phylogenetic Inference ==

_Currently !BioRuby does not contain wrappers for phylogenetic inference programs, so I am in the process of writing a RAxML wrapper, to be followed by a wrapper for FastME..._

_What about pairwise distance calculation?_
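Distance-based methods start from a matrix of pairwise distances between the aligned sequences. A minimal sketch, assuming plain equal-length strings as input, computes the simplest such measure, the uncorrected p-distance (fraction of differing sites), skipping sites where either sequence has a gap. The helpers `p_distance` and `p_distance_matrix` are hypothetical and not part of !BioRuby; model-corrected distances (e.g. Jukes-Cantor) would transform these values further.

{{{
#!/usr/bin/env ruby

# Hypothetical helper (not part of BioRuby): uncorrected p-distance
# between two aligned sequences given as plain strings. Sites where
# either sequence has a gap are skipped (pairwise deletion).
# Returns nil if no sites can be compared.
def p_distance(a, b, gap_chars = ['-', '.'])
  unless a.length == b.length
    raise ArgumentError, 'sequences must be aligned (equal length)'
  end
  compared = 0
  differences = 0
  a.chars.zip(b.chars).each do |x, y|
    next if gap_chars.include?(x) || gap_chars.include?(y)
    compared += 1
    differences += 1 if x.upcase != y.upcase
  end
  compared.zero? ? nil : differences.to_f / compared
end

# Hypothetical helper: symmetric matrix of pairwise p-distances,
# rows and columns in the same order as 'seqs'.
def p_distance_matrix(seqs)
  seqs.map { |a| seqs.map { |b| p_distance(a, b) } }
end

aligned = ['ACGTACGT', 'ACGTACGA', 'AC-TTCGA']

# Prints the matrix, one row per line, with three decimals.
p_distance_matrix(aligned).each do |row|
  puts row.map { |d| d.nil? ? 'NA' : format('%.3f', d) }.join(' ')
end
}}}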


== Maximum Likelihood ==

=== RAxML ===

_... to be done_

{{{
#!/usr/bin/env ruby
require 'bio'

}}}


=== PhyML ===

_... to be done_

{{{
#!/usr/bin/env ruby
require 'bio'

}}}

== Pairwise Distance Based Methods ==

=== FastME ===

_... to be done_

{{{
#!/usr/bin/env ruby
require 'bio'

}}}


=== PHYLIP? ===


== Support Calculation? ==

=== Bootstrap Resampling? ===
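In nonparametric bootstrapping, alignment columns are sampled with replacement to build pseudo-alignments of the original length; a tree is inferred from each replicate, and the support for a clade is the fraction of replicate trees that contain it. A minimal sketch of the resampling step, assuming plain equal-length strings as input and using a hypothetical helper, `bootstrap_replicate`, that is not part of !BioRuby:

{{{
#!/usr/bin/env ruby

# Hypothetical helper (not part of BioRuby): builds one bootstrap
# replicate of an alignment by sampling columns with replacement.
# 'seqs' is an array of aligned sequences given as plain strings.
def bootstrap_replicate(seqs, rng = Random.new)
  return [] if seqs.empty?
  length = seqs.first.length
  unless seqs.all? { |s| s.length == length }
    raise ArgumentError, 'sequences must be aligned (equal length)'
  end

  # Draws 'length' column indices uniformly at random, with replacement.
  columns = Array.new(length) { rng.rand(length) }

  # Assembles each sequence from the drawn columns, so that every
  # sequence uses the same columns in the same order.
  seqs.map { |s| columns.map { |i| s[i] }.join }
end

aligned = ['ACGTTA', 'ACGATA', 'TCGTTA']

# A fixed seed makes the replicates reproducible.
rng = Random.new(42)
3.times { puts bootstrap_replicate(aligned, rng).inspect }
}}}

Each replicate would then be passed to the tree inference program of choice.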


----

= Analyzing Phylogenetic Trees =

== PAML ==


== Gene Duplication Inference ==

_The GSoC 'SDI' work still needs further testing before it can be imported..._


== Others? ==