#summary Tutorial for multiple sequence alignments and phylogenetic methods in BioRuby -- under development!

= Introduction =

This tutorial covers multiple sequence alignments and phylogenetic methods in !BioRuby. It is still under development.

= Multiple Sequence Alignments =

== Multiple Sequence Alignment Input and Output ==

=== Reading in a Multiple Sequence Alignment from a File ===

_... to be done_

{{{
#!/usr/bin/env ruby
require 'bio'

}}}
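
Until this section is written, the following is a minimal sketch of one possible approach, assuming the alignment is stored as aligned (gapped) FASTA. It reads the entries with `Bio::FlatFile` and `Bio::FastaFormat` and collects them in a `Bio::Alignment::OriginalAlignment`. The file name `example_alignment.fasta` and the sequences are made up for illustration; the script writes the file first so that it can be run as is.

{{{
#!/usr/bin/env ruby
require 'bio'
require 'tmpdir'

# Hypothetical example data: three aligned protein sequences in FASTA format.
fasta_text = <<FASTA
>seq1
MKV-LAGH
>seq2
MKVALAGH
>seq3
M-V-LAGH
FASTA

# Writes the example data to a temporary file (assumed name).
path = File.join(Dir.mktmpdir, 'example_alignment.fasta')
File.open(path, 'w') { |f| f.write(fasta_text) }

# Reads every FASTA entry and stores it in an alignment object,
# using the FASTA definition line as the key of each sequence.
align = Bio::Alignment::OriginalAlignment.new
Bio::FlatFile.open(Bio::FastaFormat, path) do |ff|
  ff.each_entry do |entry|
    align.store(entry.definition, entry.seq)
  end
end

puts "Number of sequences: #{align.keys.size}"
puts "Alignment length:    #{align.alignment_length}"
align.each_pair { |name, seq| puts "#{name}\t#{seq}" }
}}}

Other alignment formats would need a different parser class in place of `Bio::FastaFormat`.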

=== Writing a Multiple Sequence Alignment to a File ===

_... to be done_

{{{
#!/usr/bin/env ruby
require 'bio'

}}}
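
As a placeholder, here is a minimal sketch of one way to write an alignment, assuming FASTA is an acceptable output format. It formats a `Bio::Alignment::OriginalAlignment` with its `output` method and writes the resulting string with plain Ruby file I/O. The sequence names, the sequences, and the file name `example_alignment_out.fasta` are made up for illustration.

{{{
#!/usr/bin/env ruby
require 'bio'
require 'tmpdir'

# Hypothetical example alignment, built in code from name/sequence pairs.
align = Bio::Alignment::OriginalAlignment.new
align.store('seq1', 'MKV-LAGH')
align.store('seq2', 'MKVALAGH')
align.store('seq3', 'M-V-LAGH')

# Formats the alignment as FASTA text. Other format symbols, such as
# :clustal or :phylip, can be passed to 'output' in the same way.
fasta_text = align.output(:fasta)

# Writes the text to a file (the path is made up for this example).
path = File.join(Dir.mktmpdir, 'example_alignment_out.fasta')
File.open(path, 'w') { |f| f.write(fasta_text) }

puts fasta_text
}}}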

== Calculating Multiple Sequence Alignments ==

!BioRuby can be used to execute a variety of multiple sequence alignment programs (such as [http://mafft.cbrc.jp/alignment/software/ MAFFT], [http://probcons.stanford.edu/ Probcons], [http://www.clustal.org/ ClustalW], [http://www.drive5.com/muscle/ Muscle], and [http://www.tcoffee.org/Projects_home_page/t_coffee_home_page.html T-Coffee]).
The following examples show how to use MAFFT and Muscle.

=== MAFFT ===

{{{
#!/usr/bin/env ruby
require 'bio'

# 'seqs' is either an array of sequences or a multiple sequence
# alignment. In general, it is read in from a file (see "Reading in a
# Multiple Sequence Alignment from a File" above).
# For the purpose of this tutorial, it is generated in code.
seqs = ["KMLFGVVFFFGG",
        "LMGGHHF",
        "GKKKKGHHHGHRRRGR",
        "KKKKGHHHGHRRRGR"]

# Calculates the alignment using the MAFFT program on the local
# machine with options '--maxiterate 1000 --localpair'
# and stores the result in 'report'.
options = ['--maxiterate', '1000', '--localpair']
mafft = Bio::MAFFT.new('path/to/mafft', options)
report = mafft.query_align(seqs)

# Accesses the actual alignment.
align = report.alignment

# Prints each sequence to the console.
align.each { |s| puts s.to_s }
}}}

References:

 * Katoh, Toh (2008) "Recent developments in the MAFFT multiple sequence alignment program" Briefings in Bioinformatics 9:286-298

 * Katoh, Toh (2010) "Parallelization of the MAFFT multiple sequence alignment program" Bioinformatics 26:1899-1900

=== Muscle ===

{{{
#!/usr/bin/env ruby
require 'bio'

# 'seqs' is either an array of sequences or a multiple sequence
# alignment. In general, it is read in from a file (see "Reading in a
# Multiple Sequence Alignment from a File" above).
# For the purpose of this tutorial, it is generated in code.
seqs = ["KMLFGVVFFFGG",
        "LMGGHHF",
        "GKKKKGHHHGHRRRGR",
        "KKKKGHHHGHRRRGR"]

# Calculates the alignment using the Muscle program on the local
# machine with options '-quiet -maxiters 64'
# and stores the result in 'report'.
options = ['-quiet', '-maxiters', '64']
muscle = Bio::Muscle.new('path/to/muscle', options)
report = muscle.query_align(seqs)

# Accesses the actual alignment.
align = report.alignment

# Prints each sequence to the console.
align.each { |s| puts s.to_s }
}}}

References:

 * Edgar, R.C. (2004) "MUSCLE: multiple sequence alignment with high accuracy and high throughput" Nucleic Acids Res 32(5):1792-1797

=== Other Programs ===

[http://probcons.stanford.edu/ Probcons], [http://www.clustal.org/ ClustalW], and [http://www.tcoffee.org/Projects_home_page/t_coffee_home_page.html T-Coffee] can be used in the same manner as the programs above.

== Manipulating Multiple Sequence Alignments ==

Oftentimes, multiple sequence alignments to be used for phylogenetic inference are 'cleaned up' in some manner. For instance, some researchers prefer to delete columns with more than 50% gaps. The following code is an example of how to do that in !BioRuby.

_... to be done_

{{{
#!/usr/bin/env ruby
require 'bio'

}}}
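
Until the !BioRuby version is written, the column filter described above can be sketched in plain Ruby. This sketch does not use any !BioRuby alignment class: it assumes the alignment is available as an array of equal-length strings (for example, obtained by calling `to_s` on each aligned sequence) and that `-` is the gap character. The method name `remove_gappy_columns` and the example sequences are made up for illustration.

{{{
#!/usr/bin/env ruby

# Removes every column in which the fraction of gap characters exceeds
# 'max_gap_fraction'. 'rows' must be an array of equal-length strings.
def remove_gappy_columns(rows, max_gap_fraction = 0.5, gap = '-')
  return [] if rows.empty?
  # Turns the rows into an array of columns (raises IndexError if the
  # rows differ in length).
  columns = rows.map(&:chars).transpose
  kept = columns.reject do |column|
    column.count(gap).to_f / column.size > max_gap_fraction
  end
  # Turns the kept columns back into one string per sequence.
  rows.each_index.map { |i| kept.map { |column| column[i] }.join }
end

# Hypothetical example alignment; columns 3 and 6 consist of 75% gaps.
aligned = ["MK-VL-AG",
           "MK-VLSAG",
           "M--VL-AG",
           "MKAVL--G"]

cleaned = remove_gappy_columns(aligned)
cleaned.each { |row| puts row }
}}}

Columns with exactly 50% gaps are kept, since only columns with more than 50% gaps are deleted.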

----

= Phylogenetic Trees =

== Phylogenetic Tree Input and Output ==

=== Reading in Phylogenetic Trees ===

_... to be done_

{{{
#!/usr/bin/env ruby
require 'bio'

}}}

Also, see: https://www.nescent.org/wg_phyloinformatics/BioRuby_PhyloXML_HowTo_documentation
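
For trees in Newick format, a minimal sketch could look as follows. It assumes the tree is available as a Newick string and parses it with `Bio::Newick` into a `Bio::Tree` object. The example tree and its node names are made up for illustration.

{{{
#!/usr/bin/env ruby
require 'bio'

# Hypothetical example tree in Newick format. To read from a file instead,
# pass the file contents, e.g. Bio::Newick.new(File.read('path/to/tree')).
newick_text = '((A:0.1,B:0.2):0.3,C:0.4);'

# Parses the Newick string into a Bio::Tree object.
tree = Bio::Newick.new(newick_text).tree

# Lists the names of the external nodes (leaves).
leaf_names = tree.leaves.map { |node| node.name }.sort
puts "Leaves: #{leaf_names.join(', ')}"

# Sums the branch lengths on the path between two named nodes.
node_a = tree.get_node_by_name('A')
node_b = tree.get_node_by_name('B')
dist_ab = tree.distance(node_a, node_b)
puts "Distance A-B: #{dist_ab}"
}}}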

=== Writing Phylogenetic Trees ===

_... to be done_

{{{
#!/usr/bin/env ruby
require 'bio'

}}}

Also, see: https://www.nescent.org/wg_phyloinformatics/BioRuby_PhyloXML_HowTo_documentation
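
For Newick output, a minimal sketch could look as follows. It assumes a `Bio::Tree` object is already available (here it is parsed from a made-up Newick string), formats it with `output_newick`, and writes the result with plain Ruby file I/O. The file name `example_tree.nwk` is hypothetical.

{{{
#!/usr/bin/env ruby
require 'bio'
require 'tmpdir'

# Hypothetical example tree, parsed from a Newick string.
tree = Bio::Newick.new('((A:0.1,B:0.2):0.3,C:0.4);').tree

# Formats the tree as Newick text.
newick_out = tree.output_newick

# Writes the text to a file (the path is made up for this example).
path = File.join(Dir.mktmpdir, 'example_tree.nwk')
File.open(path, 'w') { |f| f.write(newick_out) }

puts newick_out
}}}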

== Phylogenetic Inference ==

_Currently, !BioRuby does not contain wrappers for phylogenetic inference programs, so I am in the process of writing a RAxML wrapper, to be followed by a wrapper for FastME..._

_What about pairwise distance calculation?_
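
One simple candidate for the pairwise distance calculation raised above is the uncorrected p-distance: the fraction of compared alignment positions at which two sequences differ. The following is a plain-Ruby sketch, not a !BioRuby API. It assumes equal-length aligned strings with `-` as the gap character and skips every position at which either sequence has a gap. The method names `p_distance` and `distance_matrix` and the example sequences are made up for illustration.

{{{
#!/usr/bin/env ruby

# Returns the uncorrected p-distance between two aligned sequences:
# mismatching positions divided by compared (gap-free) positions.
# Returns NaN if no position can be compared.
def p_distance(a, b, gap = '-')
  raise ArgumentError, 'sequences must have equal length' unless a.length == b.length
  compared = 0
  mismatches = 0
  a.chars.zip(b.chars).each do |x, y|
    next if x == gap || y == gap
    compared += 1
    mismatches += 1 unless x == y
  end
  compared.zero? ? Float::NAN : mismatches.to_f / compared
end

# Returns the full symmetric matrix of pairwise p-distances.
def distance_matrix(rows)
  rows.map { |a| rows.map { |b| p_distance(a, b) } }
end

# Hypothetical example alignment.
rows = ["ACGT-A",
        "ACGTTA",
        "AC-TTC"]

matrix = distance_matrix(rows)
matrix.each { |line| puts line.map { |d| format('%.3f', d) }.join(' ') }
}}}

Distance-based programs generally expect corrected distances, so a matrix like this one would only be a starting point.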

== Maximum Likelihood ==

=== RAxML ===

_... to be done_

{{{
#!/usr/bin/env ruby
require 'bio'

}}}

=== PhyML ===

_... to be done_

{{{
#!/usr/bin/env ruby
require 'bio'

}}}

== Pairwise Distance Based Methods ==

=== FastME ===

_... to be done_

{{{
#!/usr/bin/env ruby
require 'bio'

}}}

=== PHYLIP? ===

== Support Calculation? ==

=== Bootstrap Resampling? ===

----

= Analyzing Phylogenetic Trees =

== PAML ==

== Gene Duplication Inference ==

_need to further test and then import GSoC 'SDI' work..._

== Others? ==