#summary Tutorial for multiple sequence alignments and phylogenetic methods in BioRuby -- under development!
Tutorial for multiple sequence alignments and phylogenetic methods in !BioRuby -- under development!
= Multiple Sequence Alignments =
== Multiple Sequence Alignment Input and Output ==
=== Reading in a Multiple Sequence Alignment from a File ===
=== Writing a Multiple Sequence Alignment to a File ===
== Calculating Multiple Sequence Alignments ==
!BioRuby can be used to execute a variety of multiple sequence alignment
programs (such as [http://mafft.cbrc.jp/alignment/software/ MAFFT], [http://probcons.stanford.edu/ Probcons], [http://www.clustal.org/ ClustalW], [http://www.drive5.com/muscle/ Muscle], and [http://www.tcoffee.org/Projects_home_page/t_coffee_home_page.html T-Coffee]).
The following examples show how to use MAFFT and Muscle.
# 'seqs' is either an array of sequences or a multiple sequence
# alignment. In general this is read in from a file as described in
# "Reading in a Multiple Sequence Alignment from a File".
# For the purpose of this tutorial, it is generated in code.
seqs = ["KMLFGVVFFFGG",
# Calculates the alignment using the MAFFT program on the local
# machine with options '--maxiterate 1000 --localpair'
# and stores the result in 'report'.
options = ['--maxiterate', '1000', '--localpair']
mafft = Bio::MAFFT.new('path/to/mafft', options)
report = mafft.query_align(seqs)
# Accesses the actual alignment.
align = report.alignment
# Prints each sequence to the console.
align.each { |s| puts s.to_s }
 * Katoh, Toh (2008) "Recent developments in the MAFFT multiple sequence alignment program" Briefings in Bioinformatics 9:286-298
 * Katoh, Toh (2010) "Parallelization of the MAFFT multiple sequence alignment program" Bioinformatics 26:1899-1900
# 'seqs' is either an array of sequences or a multiple sequence
# alignment. In general this is read in from a file as described in
# "Reading in a Multiple Sequence Alignment from a File".
# For the purpose of this tutorial, it is generated in code.
seqs = ["KMLFGVVFFFGG",
# Calculates the alignment using the Muscle program on the local
# machine with options '-quiet -maxiters 64'
# and stores the result in 'report'.
options = ['-quiet', '-maxiters', '64']
muscle = Bio::Muscle.new('path/to/muscle', options)
report = muscle.query_align(seqs)
# Accesses the actual alignment.
align = report.alignment
# Prints each sequence to the console.
align.each { |s| puts s.to_s }
 * Edgar, R.C. (2004) "MUSCLE: multiple sequence alignment with high accuracy and high throughput" Nucleic Acids Res 32(5):1792-1797
=== Other Programs ===
[http://probcons.stanford.edu/ Probcons], [http://www.clustal.org/ ClustalW], and [http://www.tcoffee.org/Projects_home_page/t_coffee_home_page.html T-Coffee] can be used in the same manner as the programs above.
== Manipulating Multiple Sequence Alignments ==
Multiple sequence alignments to be used for phylogenetic inference are often 'cleaned up' in some manner first. For instance, some researchers prefer to delete columns with more than 50% gaps. The following code is an example of how to do that in !BioRuby.
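
As a minimal sketch of the column filtering itself, the example below works on plain Ruby strings and does not use the !BioRuby alignment classes. The helper name `remove_gappy_columns`, its parameters, and the sample sequences are assumptions made for illustration. A column is dropped only if its gap fraction is strictly greater than the threshold, so a column with exactly 50% gaps is kept.

```ruby
# Plain-Ruby sketch, not a BioRuby API: 'remove_gappy_columns' is a
# hypothetical helper that takes aligned sequences as equal-length strings.
def remove_gappy_columns(rows, max_gap_fraction = 0.5, gap_chars = '-.')
  return [] if rows.empty?

  length = rows.first.length
  unless rows.all? { |r| r.length == length }
    raise ArgumentError, 'all aligned sequences must have the same length'
  end

  # Indices of the columns whose gap fraction does not exceed the threshold.
  keep = (0...length).select do |col|
    gaps = rows.count { |r| gap_chars.include?(r[col]) }
    gaps.to_f / rows.size <= max_gap_fraction
  end

  # Rebuild every sequence from the retained columns only.
  rows.map { |r| keep.map { |col| r[col] }.join }
end

# Made-up aligned sequences, for illustration only.
aligned = ['KM-LFG-V',
           'KM--FGAV',
           'K---FG-V',
           'KMALFG-V']

cleaned = remove_gappy_columns(aligned)
cleaned.each { |s| puts s }
```

The resulting strings can be wrapped in an alignment object again if later steps expect one.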
= Phylogenetic Trees =
== Phylogenetic Tree Input and Output ==
=== Reading in of Phylogenetic Trees ===
Also, see: https://www.nescent.org/wg_phyloinformatics/BioRuby_PhyloXML_HowTo_documentation
=== Writing of Phylogenetic Trees ===
Also, see: https://www.nescent.org/wg_phyloinformatics/BioRuby_PhyloXML_HowTo_documentation
== Phylogenetic Inference ==
_Currently !BioRuby does not contain wrappers for phylogenetic inference programs, so I am in the process of writing a RAxML wrapper, followed by a wrapper for FastME..._
_What about pairwise distance calculation?_
== Maximum Likelihood ==
== Pairwise Distance Based Methods ==
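
Distance-based methods start from a matrix of pairwise distances between the aligned sequences. The sketch below computes the simplest such measure, the uncorrected p-distance (the proportion of compared positions that differ), in plain Ruby. It is not a !BioRuby API: the helper names `p_distance` and `distance_matrix` and the sample sequences are assumptions made for illustration, and positions where either sequence has a gap are skipped, which is one common convention among several.

```ruby
# Plain-Ruby sketch, not a BioRuby API. 'p_distance' is a hypothetical helper
# returning the fraction of compared positions at which two aligned
# sequences differ. Positions with a gap in either sequence are skipped.
def p_distance(a, b, gap_chars = '-.')
  unless a.length == b.length
    raise ArgumentError, 'sequences must have the same aligned length'
  end

  compared = 0
  differing = 0
  a.chars.zip(b.chars).each do |x, y|
    next if gap_chars.include?(x) || gap_chars.include?(y)
    compared += 1
    differing += 1 unless x == y
  end

  # No comparable positions: the distance is undefined.
  compared.zero? ? nil : differing.to_f / compared
end

# Hypothetical helper: full symmetric matrix of pairwise p-distances.
def distance_matrix(rows)
  rows.map { |a| rows.map { |b| p_distance(a, b) } }
end

# Made-up aligned sequences, for illustration only.
rows = ['ACGT-A',
        'ACCT-A',
        'TCGTAA']

matrix = distance_matrix(rows)
matrix.each { |row| puts row.map { |d| format('%.2f', d) }.join(' ') }
```

Programs that infer trees from distances usually expect corrected (model-based) distances rather than raw p-distances, so this only illustrates the shape of the input.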
== Support Calculation? ==
=== Bootstrap Resampling? ===
= Analyzing Phylogenetic Trees =
== Gene Duplication Inference ==
_need to further test and then import GSoC 'SDI' work..._