wiki/PhyloBioRuby.wiki

#summary Tutorial for multiple sequence alignments and phylogenetic methods in BioRuby -- under development!

= Introduction =

Under development!

Tutorial for multiple sequence alignments and phylogenetic methods in [http://bioruby.open-bio.org/ BioRuby].

Eventually, this is expected to be placed on the official !BioRuby page.

Author: [http://www.cmzmasek.net/ Christian M Zmasek], Sanford-Burnham Medical Research Institute

Copyright (C) 2011 Christian M Zmasek

= Multiple Sequence Alignments =

== Multiple Sequence Alignment Input and Output ==

=== Reading in a Multiple Sequence Alignment from a File ===

Reading in a ClustalW-formatted multiple sequence alignment:

{{{
#!/usr/bin/env ruby
require 'bio'

# Reads in a ClustalW-formatted multiple sequence alignment
# from a file named "infile_clustalw.aln" and stores it in 'report'.
report = Bio::ClustalW::Report.new(File.read('infile_clustalw.aln'))

# Accesses the actual alignment.
align = report.alignment

# Goes through all sequences in 'align' and prints the
# actual molecular sequence.
align.each do |entry|
  puts entry.seq
end
}}}

=== Writing a Multiple Sequence Alignment to a File ===

Writing a multiple sequence alignment in FASTA format:

{{{
#!/usr/bin/env ruby
require 'bio'

# 'align' is a multiple sequence alignment, here read in
# as in the previous example.
align = Bio::ClustalW::Report.new(File.read('infile_clustalw.aln')).alignment

# Creates a new file named "outfile.fasta" and writes
# multiple sequence alignment 'align' to it in FASTA format.
File.open('outfile.fasta', 'w') do |f|
  f.write(align.output(:fasta))
end
}}}

Writing a multiple sequence alignment in ClustalW format:

{{{
#!/usr/bin/env ruby
require 'bio'

# 'align' is a multiple sequence alignment, here read in
# as in the previous example.
align = Bio::ClustalW::Report.new(File.read('infile_clustalw.aln')).alignment

# Creates a new file named "outfile.aln" and writes
# multiple sequence alignment 'align' to it in ClustalW format.
File.open('outfile.aln', 'w') do |f|
  f.write(align.output(:clustal))
end
}}}

== Calculating Multiple Sequence Alignments ==

!BioRuby can be used to execute a variety of multiple sequence alignment
programs (such as [http://mafft.cbrc.jp/alignment/software/ MAFFT], [http://probcons.stanford.edu/ Probcons], [http://www.clustal.org/ ClustalW], [http://www.drive5.com/muscle/ Muscle], and [http://www.tcoffee.org/Projects_home_page/t_coffee_home_page.html T-Coffee]).
In the following, examples for using MAFFT and Muscle are shown.

=== MAFFT ===

The following example uses the MAFFT program to align four sequences
and then prints the result to the screen.
The example indicates the path to the MAFFT executable explicitly. If the executable can be found on the system's search path, the program name alone can be given instead, as in `mafft = Bio::MAFFT.new('mafft', options)`.

{{{
#!/usr/bin/env ruby
require 'bio'

# 'seqs' is either an array of sequences or a multiple sequence
# alignment. In general this is read in from a file as described in ?.
# For the purpose of this tutorial, it is generated in code.
seqs = ["KMLFGVVFFFGG",
        "LMGGHHF",
        "GKKKKGHHHGHRRRGR",
        "KKKKGHHHGHRRRGR"]

# Calculates the alignment using the MAFFT program on the local
# machine with options '--maxiterate 1000 --localpair'
# and stores the result in 'report'.
options = ['--maxiterate', '1000', '--localpair']
mafft = Bio::MAFFT.new('path/to/mafft', options)
report = mafft.query_align(seqs)

# Accesses the actual alignment.
align = report.alignment

# Prints each sequence to the console.
align.each { |s| puts s.to_s }
}}}

References:

 * Katoh, Toh (2008) "Recent developments in the MAFFT multiple sequence alignment program" Briefings in Bioinformatics 9:286-298

 * Katoh, Toh (2010) "Parallelization of the MAFFT multiple sequence alignment program" Bioinformatics 26:1899-1900

=== Muscle ===

{{{
#!/usr/bin/env ruby
require 'bio'

# 'seqs' is either an array of sequences or a multiple sequence
# alignment. In general this is read in from a file as described in ?.
# For the purpose of this tutorial, it is generated in code.
seqs = ["KMLFGVVFFFGG",
        "LMGGHHF",
        "GKKKKGHHHGHRRRGR",
        "KKKKGHHHGHRRRGR"]

# Calculates the alignment using the Muscle program on the local
# machine with options '-quiet -maxiters 64'
# and stores the result in 'report'.
options = ['-quiet', '-maxiters', '64']
muscle = Bio::Muscle.new('path/to/muscle', options)
report = muscle.query_align(seqs)

# Accesses the actual alignment.
align = report.alignment

# Prints each sequence to the console.
align.each { |s| puts s.to_s }
}}}

References:

 * Edgar, R.C. (2004) "MUSCLE: multiple sequence alignment with high accuracy and high throughput" Nucleic Acids Res 32(5):1792-1797

=== Other Programs ===

[http://probcons.stanford.edu/ Probcons], [http://www.clustal.org/ ClustalW], and [http://www.tcoffee.org/Projects_home_page/t_coffee_home_page.html T-Coffee] can be used in the same manner as the programs above.

== Manipulating Multiple Sequence Alignments ==

Oftentimes, multiple sequence alignments to be used for phylogenetic inference are 'cleaned up' in some manner. For instance, some researchers prefer to delete columns with more than 50% gaps. The following code is an example of how to do that in !BioRuby.

_... to be done_

{{{
#!/usr/bin/env ruby
require 'bio'

}}}
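
Until the !BioRuby version is written, the column filtering described above can be sketched in plain Ruby. This is only an illustration under stated assumptions, not the !BioRuby API: it works on an array of equal-length aligned strings (such rows could be collected with `s.to_s` while iterating over an alignment, as in the examples above), it treats only '-' as a gap character by default, and the helper name `remove_gappy_columns`, its parameters, and the toy alignment are all hypothetical.

```ruby
#!/usr/bin/env ruby

# Hypothetical helper (not part of BioRuby): returns copies of the aligned
# rows in which every column whose fraction of gap characters is greater
# than 'max_gap_ratio' has been removed.
# Assumption: only the characters in 'gap_chars' count as gaps.
def remove_gappy_columns(rows, max_gap_ratio = 0.5, gap_chars = '-')
  return [] if rows.empty?

  length = rows.first.length
  unless rows.all? { |row| row.length == length }
    raise ArgumentError, 'all aligned sequences must have the same length'
  end

  # Collects the indices of the columns to keep.
  keep = (0...length).select do |col|
    gaps = rows.count { |row| gap_chars.include?(row[col]) }
    gaps.to_f / rows.size <= max_gap_ratio
  end

  # Rebuilds each row from the kept columns only.
  rows.map { |row| keep.map { |col| row[col] }.join }
end

# Toy alignment, made up for this example.
rows = ['AC-GT-',
        'A--GTA',
        'AC-G--',
        'A-TGT-']

cleaned = remove_gappy_columns(rows)
p cleaned # => ["ACGT", "A-GT", "ACG-", "A-GT"]
```

In the toy alignment, the third and sixth columns consist of 75% gaps and are dropped, whereas the second column has exactly 50% gaps and is kept, since only columns with more than 50% gaps are deleted.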

----

= Phylogenetic Trees =

== Phylogenetic Tree Input and Output ==

=== Reading in Phylogenetic Trees ===

_... to be done_

{{{
#!/usr/bin/env ruby
require 'bio'

}}}

Also, see: https://www.nescent.org/wg_phyloinformatics/BioRuby_PhyloXML_HowTo_documentation

=== Writing Phylogenetic Trees ===

_... to be done_

{{{
#!/usr/bin/env ruby
require 'bio'

}}}

Also, see: https://www.nescent.org/wg_phyloinformatics/BioRuby_PhyloXML_HowTo_documentation

== Phylogenetic Inference ==

_Currently, !BioRuby does not contain wrappers for phylogenetic inference programs; thus I am in the process of writing a RAxML wrapper, to be followed by a wrapper for FastME..._

_What about pairwise distance calculation?_

== Maximum Likelihood ==

=== RAxML ===

_... to be done_

{{{
#!/usr/bin/env ruby
require 'bio'

}}}

=== PhyML ===

_... to be done_

{{{
#!/usr/bin/env ruby
require 'bio'

}}}

== Pairwise Distance Based Methods ==

=== FastME ===

_... to be done_

{{{
#!/usr/bin/env ruby
require 'bio'

}}}

=== PHYLIP? ===

== Support Calculation? ==

=== Bootstrap Resampling? ===

----

= Analyzing Phylogenetic Trees =

== PAML ==

== Gene Duplication Inference ==

_need to further test and then import GSoC 'SDI' work..._

== Others? ==