#summary Tutorial for multiple sequence alignments and phylogenetic methods in BioRuby -- under development!

= Introduction =

Tutorial for multiple sequence alignments and phylogenetic methods in !BioRuby -- under development!

= Multiple Sequence Alignments =

== Multiple Sequence Alignment Input and Output ==

=== Reading in a Multiple Sequence Alignment from a File ===

Reading in a clustalw-formatted multiple sequence alignment:

{{{
#!/usr/bin/env ruby
require 'bio'

# Reads in a clustalw-formatted multiple sequence alignment
# from a file named "infile_clustalw.aln" and stores it in 'report'.
report = Bio::ClustalW::Report.new(File.read('infile_clustalw.aln'))

# Accesses the actual alignment.
align = report.alignment

# Goes through all sequences in 'align' and prints the
# actual molecular sequence.
align.each do |entry|
  puts entry.seq
end
}}}

=== Writing a Multiple Sequence Alignment to a File ===

Writing a multiple sequence alignment in fasta format:

{{{
#!/usr/bin/env ruby
require 'bio'

# 'align' is an existing multiple sequence alignment,
# for example one read in as shown above.

# Creates a new file named "outfile.fasta" and writes
# multiple sequence alignment 'align' to it in fasta format.
File.open('outfile.fasta', 'w') do |f|
  f.write(align.output(:fasta))
end
}}}

Writing a multiple sequence alignment in clustalw format:

{{{
#!/usr/bin/env ruby
require 'bio'

# 'align' is an existing multiple sequence alignment,
# for example one read in as shown above.

# Creates a new file named "outfile.aln" and writes
# multiple sequence alignment 'align' to it in clustalw format.
File.open('outfile.aln', 'w') do |f|
  f.write(align.output(:clustal))
end
}}}

== Calculating Multiple Sequence Alignments ==

!BioRuby can be used to execute a variety of multiple sequence alignment programs (such as [http://mafft.cbrc.jp/alignment/software/ MAFFT], [http://probcons.stanford.edu/ Probcons], [http://www.clustal.org/ ClustalW], [http://www.drive5.com/muscle/ Muscle], and [http://www.tcoffee.org/Projects_home_page/t_coffee_home_page.html T-Coffee]). The following examples show how to use MAFFT and Muscle.
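The wrappers in this section run external programs, so the corresponding executable has to be present on the local machine. As a minimal sketch in plain Ruby (it is not part of !BioRuby, `find_executable` is a hypothetical helper name, and a Unix-like system is assumed), the following looks up a program in the directories listed in the `PATH` environment variable. The path it returns could then be handed to a wrapper constructor in place of a hard-coded one.

```ruby
#!/usr/bin/env ruby

# Hypothetical helper, not part of BioRuby: returns the full path of the
# first executable file called 'name' found in the directories listed in
# 'search_path', or nil if none is found.
# Assumes a Unix-like system (no handling of extensions such as ".exe").
def find_executable(name, search_path = ENV['PATH'].to_s)
  search_path.split(File::PATH_SEPARATOR).each do |dir|
    next if dir.empty?
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Reports where (and whether) the two programs used in this
# tutorial are installed.
%w[mafft muscle].each do |program|
  path = find_executable(program)
  puts "#{program}: #{path || 'not found on PATH'}"
end
```

Checking for `nil` before constructing a wrapper makes it possible to report a missing installation up front instead of failing when the alignment is run.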
=== MAFFT ===

The following example uses the MAFFT program to align four sequences and then prints the result to the screen. If the path to the MAFFT executable is properly set, `mafft = Bio::MAFFT.new(options)` can be used instead of explicitly giving a path.

{{{
#!/usr/bin/env ruby
require 'bio'

# 'seqs' is either an array of sequences or a multiple sequence
# alignment. In general this is read in from a file as described in ?.
# For the purpose of this tutorial, it is generated in code.
seqs = ["KMLFGVVFFFGG",
        "LMGGHHF",
        "GKKKKGHHHGHRRRGR",
        "KKKKGHHHGHRRRGR"]

# Calculates the alignment using the MAFFT program on the local
# machine with options '--maxiterate 1000 --localpair'
# and stores the result in 'report'.
options = ['--maxiterate', '1000', '--localpair']
mafft = Bio::MAFFT.new('path/to/mafft', options)
report = mafft.query_align(seqs)

# Accesses the actual alignment.
align = report.alignment

# Prints each sequence to the console.
align.each { |s| puts s.to_s }
}}}

References:

  * Katoh, Toh (2008) "Recent developments in the MAFFT multiple sequence alignment program" Briefings in Bioinformatics 9:286-298
  * Katoh, Toh (2010) "Parallelization of the MAFFT multiple sequence alignment program" Bioinformatics 26:1899-1900

=== Muscle ===

{{{
#!/usr/bin/env ruby
require 'bio'

# 'seqs' is either an array of sequences or a multiple sequence
# alignment. In general this is read in from a file as described in ?.
# For the purpose of this tutorial, it is generated in code.
seqs = ["KMLFGVVFFFGG",
        "LMGGHHF",
        "GKKKKGHHHGHRRRGR",
        "KKKKGHHHGHRRRGR"]

# Calculates the alignment using the Muscle program on the local
# machine with options '-quiet -maxiters 64'
# and stores the result in 'report'.
options = ['-quiet', '-maxiters', '64']
muscle = Bio::Muscle.new('path/to/muscle', options)
report = muscle.query_align(seqs)

# Accesses the actual alignment.
align = report.alignment

# Prints each sequence to the console.
align.each { |s| puts s.to_s }
}}}

References:

  * Edgar, R.C. (2004) "MUSCLE: multiple sequence alignment with high accuracy and high throughput" Nucleic Acids Res 32(5):1792-1797

=== Other Programs ===

[http://probcons.stanford.edu/ Probcons], [http://www.clustal.org/ ClustalW], and [http://www.tcoffee.org/Projects_home_page/t_coffee_home_page.html T-Coffee] can be used in the same manner as the programs above.

== Manipulating Multiple Sequence Alignments ==

Multiple sequence alignments to be used for phylogenetic inference are often 'cleaned up' in some manner. For instance, some researchers prefer to delete columns with more than 50% gaps. The following code is an example of how to do that in !BioRuby.

_... to be done_

{{{
#!/usr/bin/env ruby
require 'bio'
}}}

----

= Phylogenetic Trees =

== Phylogenetic Tree Input and Output ==

=== Reading in Phylogenetic Trees ===

_... to be done_

{{{
#!/usr/bin/env ruby
require 'bio'
}}}

Also, see: https://www.nescent.org/wg_phyloinformatics/BioRuby_PhyloXML_HowTo_documentation

=== Writing Phylogenetic Trees ===

_... to be done_

{{{
#!/usr/bin/env ruby
require 'bio'
}}}

Also, see: https://www.nescent.org/wg_phyloinformatics/BioRuby_PhyloXML_HowTo_documentation

== Phylogenetic Inference ==

_Currently !BioRuby does not contain wrappers for phylogenetic inference programs, thus I am in the process of writing a RAxML wrapper, followed by a wrapper for FastME..._

_What about pairwise distance calculation?_

== Maximum Likelihood ==

=== RAxML ===

_... to be done_

{{{
#!/usr/bin/env ruby
require 'bio'
}}}

=== PhyML ===

_... to be done_

{{{
#!/usr/bin/env ruby
require 'bio'
}}}

== Pairwise Distance Based Methods ==

=== FastME ===

_... to be done_

{{{
#!/usr/bin/env ruby
require 'bio'
}}}

=== PHYLIP? ===

== Support Calculation? ==

=== Bootstrap Resampling? ===

----

= Analyzing Phylogenetic Trees =

== PAML ==

== Gene Duplication Inference ==

_need to further test and then import GSoC 'SDI' work..._

== Others? ==