#summary Tutorial for multiple sequence alignments and phylogenetic methods in BioRuby -- under development!


= Introduction =

This tutorial covers multiple sequence alignments and phylogenetic methods in !BioRuby. It is still under development.


= Multiple Sequence Alignments =


== Multiple Sequence Alignment Input and Output ==

=== Reading in a Multiple Sequence Alignment from a File ===

Reading in a ClustalW-formatted multiple sequence alignment:

{{{
#!/usr/bin/env ruby
require 'bio'

# Reads in a ClustalW-formatted multiple sequence alignment
# from a file named "infile_clustalw.aln" and stores it in 'report'.
report = Bio::ClustalW::Report.new(File.read('infile_clustalw.aln'))

# Accesses the actual alignment.
align = report.alignment

# Goes through all sequences in 'align' and prints the
# actual molecular sequence.
align.each do |entry|
  puts entry.seq
end
}}}

 
=== Writing a Multiple Sequence Alignment to a File ===

Writing a multiple sequence alignment in FASTA format:

{{{
#!/usr/bin/env ruby
require 'bio'

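# 'align' is a multiple sequence alignment, here read in from
# a ClustalW file as shown in the previous section.
align = Bio::ClustalW::Report.new(File.read('infile_clustalw.aln')).alignment

# Creates a new file named "outfile.fasta" and writes
# multiple sequence alignment 'align' to it in FASTA format.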
File.open('outfile.fasta', 'w') do |f|
  f.write(align.output(:fasta))
end
}}}


Writing a multiple sequence alignment in ClustalW format:

{{{
#!/usr/bin/env ruby
require 'bio'

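# 'align' is a multiple sequence alignment, here read in from
# a ClustalW file as shown in the previous section.
align = Bio::ClustalW::Report.new(File.read('infile_clustalw.aln')).alignment

# Creates a new file named "outfile.aln" and writes
# multiple sequence alignment 'align' to it in ClustalW format.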
File.open('outfile.aln', 'w') do |f|
  f.write(align.output(:clustal))
end
}}}


== Calculating Multiple Sequence Alignments ==

!BioRuby can be used to execute a variety of multiple sequence alignment
programs (such as [http://mafft.cbrc.jp/alignment/software/ MAFFT], [http://probcons.stanford.edu/ Probcons], [http://www.clustal.org/ ClustalW], [http://www.drive5.com/muscle/ Muscle], and [http://www.tcoffee.org/Projects_home_page/t_coffee_home_page.html T-Coffee]). 
The following examples show how to use MAFFT and Muscle.


=== MAFFT ===

The following example uses the MAFFT program to align four sequences
and then prints the result to the screen.
Note that if the MAFFT executable is on the search path, `mafft = Bio::MAFFT.new('mafft', options)` can be used instead of giving the full path as in the example.

{{{
#!/usr/bin/env ruby
require 'bio'

# 'seqs' is either an array of sequences or a multiple sequence 
# alignment. In general this is read in from a file as described in ?.
# For the purpose of this tutorial, it is generated in code.
seqs = ["KMLFGVVFFFGG",
        "LMGGHHF",
        "GKKKKGHHHGHRRRGR",
        "KKKKGHHHGHRRRGR"] 


# Calculates the alignment using the MAFFT program on the local
# machine with options '--maxiterate 1000 --localpair'
# and stores the result in 'report'.
options = ['--maxiterate', '1000', '--localpair']
mafft = Bio::MAFFT.new('path/to/mafft', options)
report = mafft.query_align(seqs)

# Accesses the actual alignment.
align = report.alignment

# Prints each sequence to the console.
align.each { |s| puts s.to_s }

}}}

References:

 * Katoh, Toh (2008) "Recent developments in the MAFFT multiple sequence alignment program" Briefings in Bioinformatics 9:286-298 

 * Katoh, Toh (2010) "Parallelization of the MAFFT multiple sequence alignment program" Bioinformatics 26:1899-1900


=== Muscle ===

The following example uses the Muscle program to align four sequences
and then prints the result to the screen.

{{{
#!/usr/bin/env ruby
require 'bio'

# 'seqs' is either an array of sequences or a multiple sequence 
# alignment. In general this is read in from a file as described in ?.
# For the purpose of this tutorial, it is generated in code.
seqs = ["KMLFGVVFFFGG",
        "LMGGHHF",
        "GKKKKGHHHGHRRRGR",
        "KKKKGHHHGHRRRGR"] 

# Calculates the alignment using the Muscle program on the local
# machine with options '-quiet -maxiters 64'
# and stores the result in 'report'.
options = ['-quiet', '-maxiters', '64']
muscle = Bio::Muscle.new('path/to/muscle', options)
report = muscle.query_align(seqs)

# Accesses the actual alignment.
align = report.alignment

# Prints each sequence to the console.
align.each { |s| puts s.to_s }

}}}

References:

 * Edgar, R.C. (2004) "MUSCLE: multiple sequence alignment with high accuracy and high throughput" Nucleic Acids Res 32(5):1792-1797

=== Other Programs ===

[http://probcons.stanford.edu/ Probcons], [http://www.clustal.org/ ClustalW], and [http://www.tcoffee.org/Projects_home_page/t_coffee_home_page.html T-Coffee] can be used in the same manner as the programs above. 


== Manipulating Multiple Sequence Alignments ==

Multiple sequence alignments to be used for phylogenetic inference are often 'cleaned up' in some manner. For instance, some researchers prefer to delete columns with more than 50% gaps. The following code is an example of how to do that in !BioRuby.


_... to be done_

{{{
#!/usr/bin/env ruby
require 'bio'

}}}
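
As a starting point, the column filter described in this section can be sketched in plain Ruby, working on the aligned sequences as strings. This is only a sketch under stated assumptions: `remove_gappy_columns`, its parameters, and the example rows are hypothetical names made up for illustration and are not part of !BioRuby, and `-` and `.` are assumed to be the gap characters. The rows could be collected from an alignment with `align.each` and `to_s`, as in the examples above.

```ruby
#!/usr/bin/env ruby

# Hypothetical helper (not a BioRuby method): returns copies of the
# aligned rows without the columns whose gap fraction exceeds
# 'max_gap_fraction'. Assumes '-' and '.' mark gaps.
def remove_gappy_columns(rows, max_gap_fraction = 0.5, gap_chars = '-.')
  return [] if rows.empty?

  length = rows.first.length
  unless rows.all? { |r| r.length == length }
    raise ArgumentError, 'all rows must have the same length'
  end

  # Indices of the columns that are kept.
  keep = (0...length).select do |col|
    gaps = rows.count { |r| gap_chars.include?(r[col]) }
    gaps.to_f / rows.size <= max_gap_fraction
  end

  # Rebuilds every row from the kept columns only.
  rows.map { |r| keep.map { |col| r[col] }.join }
end

# Made-up example alignment; columns 3 and 6 are 75% gaps.
rows = ['KM-LF-G',
        'KM--F-G',
        'K--LFAG',
        'KMQLF-G']

cleaned = remove_gappy_columns(rows)
cleaned.each { |r| puts r }
# Prints:
# KMLFG
# KM-FG
# K-LFG
# KMLFG
```

Columns with exactly 50% gaps are kept here, since the text speaks of columns with more than 50% gaps.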


----

= Phylogenetic Trees =

== Phylogenetic Tree Input and Output ==

=== Reading in Phylogenetic Trees ===

_... to be done_

{{{
#!/usr/bin/env ruby
require 'bio'

}}}

Also, see: https://www.nescent.org/wg_phyloinformatics/BioRuby_PhyloXML_HowTo_documentation


=== Writing Phylogenetic Trees ===

_... to be done_

{{{
#!/usr/bin/env ruby
require 'bio'

}}}

Also, see: https://www.nescent.org/wg_phyloinformatics/BioRuby_PhyloXML_HowTo_documentation


== Phylogenetic Inference ==

_Currently, !BioRuby does not contain wrappers for phylogenetic inference programs. I am in the process of writing a RAxML wrapper, to be followed by a wrapper for FastME..._

_What about pairwise distance calculation?_
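
One simple candidate is the uncorrected p-distance: the fraction of compared columns at which two aligned sequences differ. The following plain-Ruby sketch is an assumption about what such a calculation could look like, not an existing !BioRuby method. `p_distance` and `distance_matrix` are hypothetical names, and skipping columns with a gap in either sequence is an assumed convention.

```ruby
#!/usr/bin/env ruby

# Hypothetical helper (not a BioRuby method): uncorrected p-distance
# between two aligned sequences of equal length. Columns with a gap
# in either sequence are skipped. Returns nil if no column is left.
def p_distance(a, b, gap_chars = '-.')
  raise ArgumentError, 'sequences differ in length' unless a.length == b.length

  compared = 0
  differences = 0
  a.each_char.with_index do |x, i|
    y = b[i]
    next if gap_chars.include?(x) || gap_chars.include?(y)
    compared += 1
    differences += 1 if x != y
  end
  compared.zero? ? nil : differences.to_f / compared
end

# Hypothetical helper: all-against-all distances as nested arrays.
def distance_matrix(rows)
  rows.map { |a| rows.map { |b| p_distance(a, b) } }
end

# Made-up aligned sequences.
rows = ['KMLFG',
        'KM-FG',
        'KQLFA']

matrix = distance_matrix(rows)
matrix.each { |line| puts line.inspect }
# Prints:
# [0.0, 0.0, 0.4]
# [0.0, 0.0, 0.5]
# [0.4, 0.5, 0.0]
```

The p-distance does not correct for multiple substitutions at the same site, so it tends to underestimate the distance between divergent sequences.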


== Maximum Likelihood ==

=== RAxML ===

_... to be done_

{{{
#!/usr/bin/env ruby
require 'bio'

}}}


=== PhyML ===

_... to be done_

{{{
#!/usr/bin/env ruby
require 'bio'

}}}

== Pairwise Distance Based Methods ==

=== FastME ===

_... to be done_

{{{
#!/usr/bin/env ruby
require 'bio'

}}}


=== PHYLIP? ===


== Support Calculation? ==

=== Bootstrap Resampling? ===
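
Nonparametric bootstrap resampling builds each replicate by drawing alignment columns with replacement until the replicate is as long as the original alignment. A minimal plain-Ruby sketch of that step, assuming the alignment is available as strings of equal length; `bootstrap_columns` is a hypothetical helper, not a !BioRuby method, and the fixed seed is only there to make the example repeatable.

```ruby
#!/usr/bin/env ruby

# Hypothetical helper (not a BioRuby method): returns one bootstrap
# replicate of the aligned rows, built by sampling column indices
# with replacement. 'rng' can be seeded for repeatable results.
def bootstrap_columns(rows, rng = Random.new)
  return [] if rows.empty?

  length = rows.first.length
  unless rows.all? { |r| r.length == length }
    raise ArgumentError, 'all rows must have the same length'
  end

  # The same column picks are applied to every row, so that
  # columns stay intact.
  picks = Array.new(length) { rng.rand(length) }
  rows.map { |r| picks.map { |col| r[col] }.join }
end

# Made-up aligned sequences.
rows = ['KMLFG',
        'KM-FG',
        'KQLFA']

replicate = bootstrap_columns(rows, Random.new(42))
replicate.each { |r| puts r }
```

Each replicate would then go through the same tree inference as the original alignment, and the support for a clade would be the fraction of replicate trees that contain it.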


----

= Analyzing Phylogenetic Trees =

== PAML ==


== Gene Duplication Inference ==

_need to further test and then import GSoC 'SDI' work..._


== Others? ==