#summary Tutorial for multiple sequence alignments and phylogenetic methods in BioRuby -- under development!

= Introduction =

Tutorial for multiple sequence alignments and phylogenetic methods in !BioRuby -- under development!

= Multiple Sequence Alignments =

== Multiple Sequence Alignment Input and Output ==

=== Reading in a Multiple Sequence Alignment from a File ===

Reading in a clustalw formatted multiple sequence alignment:

{{{
#!/usr/bin/env ruby
require 'bio'

# Reads in a clustalw formatted multiple sequence alignment
# from a file named "infile_clustalw.aln" and stores it in 'report'.
report = Bio::ClustalW::Report.new(File.read('infile_clustalw.aln'))

# Accesses the actual alignment.
align = report.alignment

# Goes through all sequences in 'align' and prints the
# actual molecular sequence.
align.each do |entry|
  puts entry.seq
end
}}}

=== Writing a Multiple Sequence Alignment to a File ===

Writing a multiple sequence alignment in FASTA format:

{{{
#!/usr/bin/env ruby
require 'bio'

# 'align' is a multiple sequence alignment, for example the one
# obtained from 'report.alignment' when reading an alignment from a file.

# Creates a new file named "outfile.fasta" and writes
# multiple sequence alignment 'align' to it in fasta format.
File.open('outfile.fasta', 'w') do |f|
  f.write(align.output(:fasta))
end
}}}

Writing a multiple sequence alignment in ClustalW format:

{{{
#!/usr/bin/env ruby
require 'bio'

# 'align' is a multiple sequence alignment, for example the one
# obtained from 'report.alignment' when reading an alignment from a file.

# Creates a new file named "outfile.aln" and writes
# multiple sequence alignment 'align' to it in clustal format.
File.open('outfile.aln', 'w') do |f|
  f.write(align.output(:clustal))
end
}}}

== Calculating Multiple Sequence Alignments ==

!BioRuby can be used to execute a variety of multiple sequence alignment
programs (such as [http://mafft.cbrc.jp/alignment/software/ MAFFT], [http://probcons.stanford.edu/ Probcons], [http://www.clustal.org/ ClustalW], [http://www.drive5.com/muscle/ Muscle], and [http://www.tcoffee.org/Projects_home_page/t_coffee_home_page.html T-Coffee]).
In the following, examples using MAFFT and Muscle are shown.

=== MAFFT ===

The following example uses the MAFFT program to align four sequences
and then prints the result to the screen.
If the MAFFT executable is on the `PATH`, the program name alone, as in `mafft = Bio::MAFFT.new('mafft', options)`, can be used instead of explicitly giving a path.

{{{
#!/usr/bin/env ruby
require 'bio'

# 'seqs' is either an array of sequences or a multiple sequence
# alignment. In general this is read in from a file as described in
# "Reading in a Multiple Sequence Alignment from a File".
# For the purpose of this tutorial, it is generated in code.
seqs = ["KMLFGVVFFFGG",
        "LMGGHHF",
        "GKKKKGHHHGHRRRGR",
        "KKKKGHHHGHRRRGR"]

# Calculates the alignment using the MAFFT program on the local
# machine with options '--maxiterate 1000 --localpair'
# and stores the result in 'report'.
options = ['--maxiterate', '1000', '--localpair']
mafft = Bio::MAFFT.new('path/to/mafft', options)
report = mafft.query_align(seqs)

# Accesses the actual alignment.
align = report.alignment

# Prints each sequence to the console.
align.each { |s| puts s.to_s }
}}}

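Whether the program name alone is enough depends on the executable being reachable through the `PATH`. The following is a minimal sketch in plain Ruby (it makes no !BioRuby calls) for checking this before a wrapper object is created. The helper name `find_executable` and its `path` parameter are hypothetical and introduced only for illustration; platform-specific file extensions are not handled.

```ruby
#!/usr/bin/env ruby

# Hypothetical helper (not part of BioRuby): returns the full path of
# 'program' if an executable file with that name is found in one of the
# directories listed in 'path', or nil otherwise.
def find_executable(program, path = ENV['PATH'].to_s)
  path.split(File::PATH_SEPARATOR).each do |dir|
    next if dir.empty?
    candidate = File.join(dir, program)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Looks for MAFFT and reports what was found.
mafft_path = find_executable('mafft')
if mafft_path
  puts "MAFFT found at #{mafft_path}"
else
  puts 'MAFFT not found on the PATH; give an explicit path instead'
end
```

If the lookup returns a path, that path can be passed as the first argument of the wrapper constructor in the same way as `'path/to/mafft'` in the example above.
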
References:

 * Katoh, Toh (2008) "Recent developments in the MAFFT multiple sequence alignment program" Briefings in Bioinformatics 9:286-298

 * Katoh, Toh (2010) "Parallelization of the MAFFT multiple sequence alignment program" Bioinformatics 26:1899-1900

=== Muscle ===

The following example uses the Muscle program to align the same four sequences
and then prints the result to the screen.

{{{
#!/usr/bin/env ruby
require 'bio'

# 'seqs' is either an array of sequences or a multiple sequence
# alignment. In general this is read in from a file as described in
# "Reading in a Multiple Sequence Alignment from a File".
# For the purpose of this tutorial, it is generated in code.
seqs = ["KMLFGVVFFFGG",
        "LMGGHHF",
        "GKKKKGHHHGHRRRGR",
        "KKKKGHHHGHRRRGR"]

# Calculates the alignment using the Muscle program on the local
# machine with options '-quiet -maxiters 64'
# and stores the result in 'report'.
options = ['-quiet', '-maxiters', '64']
muscle = Bio::Muscle.new('path/to/muscle', options)
report = muscle.query_align(seqs)

# Accesses the actual alignment.
align = report.alignment

# Prints each sequence to the console.
align.each { |s| puts s.to_s }
}}}

References:

 * Edgar, R.C. (2004) "MUSCLE: multiple sequence alignment with high accuracy and high throughput" Nucleic Acids Res 32(5):1792-1797

=== Other Programs ===

[http://probcons.stanford.edu/ Probcons], [http://www.clustal.org/ ClustalW], and [http://www.tcoffee.org/Projects_home_page/t_coffee_home_page.html T-Coffee] can be used in the same manner as the programs above.

== Manipulating Multiple Sequence Alignments ==

Oftentimes, multiple sequence alignments to be used for phylogenetic inference are 'cleaned up' in some manner. For instance, some researchers prefer to delete columns with more than 50% gaps. The following code is an example of how to do that in !BioRuby.

_... to be done_

{{{
#!/usr/bin/env ruby
require 'bio'

}}}

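As a starting point for the column filtering described above, here is a minimal sketch in plain Ruby. It assumes the alignment is available as an array of equal-length strings with `-` as the gap character, and it does not use the !BioRuby alignment classes. The method name `remove_gappy_columns` and the `max_gap_fraction` parameter are hypothetical.

```ruby
#!/usr/bin/env ruby

# Hypothetical helper (not part of BioRuby): removes every column in
# which the fraction of gap characters is greater than 'max_gap_fraction'.
# 'rows' is an array of aligned sequences (strings of equal length).
def remove_gappy_columns(rows, max_gap_fraction = 0.5, gap = '-')
  return [] if rows.empty?

  length = rows.first.length
  unless rows.all? { |r| r.length == length }
    raise ArgumentError, 'all aligned sequences must have the same length'
  end

  # Indices of the columns that are kept.
  keep = (0...length).select do |col|
    gaps = rows.count { |r| r[col] == gap }
    gaps.to_f / rows.size <= max_gap_fraction
  end

  # Rebuilds each sequence from the kept columns only.
  rows.map { |r| keep.map { |col| r[col] }.join }
end

# Small example alignment, generated in code for illustration.
aligned = ['KM-LF-G',
           'KM--F-G',
           'K--LF-G',
           'KM-LFAG']

remove_gappy_columns(aligned).each { |row| puts row }
```

A column with exactly 50% gaps is kept, since only columns with more than 50% gaps are meant to be deleted.
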
----

= Phylogenetic Trees =

== Phylogenetic Tree Input and Output ==

=== Reading in of Phylogenetic Trees ===

_... to be done_

{{{
#!/usr/bin/env ruby
require 'bio'

}}}

Also, see: https://www.nescent.org/wg_phyloinformatics/BioRuby_PhyloXML_HowTo_documentation

=== Writing of Phylogenetic Trees ===

_... to be done_

{{{
#!/usr/bin/env ruby
require 'bio'

}}}

Also, see: https://www.nescent.org/wg_phyloinformatics/BioRuby_PhyloXML_HowTo_documentation

== Phylogenetic Inference ==

_Currently !BioRuby does not contain wrappers for phylogenetic inference programs, thus I am in the process of writing a RAxML wrapper, followed by a wrapper for FastME..._

_What about pairwise distance calculation?_

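Regarding the question of pairwise distances: one simple option is the uncorrected p-distance, the fraction of compared positions at which two aligned sequences differ. The sketch below is plain Ruby rather than a !BioRuby API. It assumes aligned strings of equal length with `-` as the gap character and skips positions where either sequence has a gap. The names `p_distance` and `distance_matrix` are hypothetical.

```ruby
#!/usr/bin/env ruby

# Hypothetical helper (not part of BioRuby): uncorrected p-distance
# between two aligned sequences. Positions with a gap in either
# sequence are ignored. Returns nil if no position can be compared.
def p_distance(a, b, gap = '-')
  raise ArgumentError, 'sequences must have the same length' unless a.length == b.length

  compared = 0
  different = 0
  a.chars.zip(b.chars).each do |x, y|
    next if x == gap || y == gap
    compared += 1
    different += 1 if x != y
  end
  compared.zero? ? nil : different.to_f / compared
end

# Hypothetical helper: square matrix of p-distances for all pairs.
def distance_matrix(rows)
  rows.map { |a| rows.map { |b| p_distance(a, b) } }
end

# Small example alignment, generated in code for illustration.
aligned = ['ACGT-A',
           'ACGTTA',
           'AGGT-C']

distance_matrix(aligned).each do |row|
  puts row.map { |d| d.nil? ? 'n/a' : format('%.3f', d) }.join(' ')
end
```

The p-distance applies no correction for multiple substitutions at the same site, so it is only a rough stand-in for the model-based distances that dedicated programs compute.
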
== Maximum Likelihood ==

=== RAxML ===

_... to be done_

{{{
#!/usr/bin/env ruby
require 'bio'

}}}

=== PhyML ===

_... to be done_

{{{
#!/usr/bin/env ruby
require 'bio'

}}}

== Pairwise Distance Based Methods ==

=== FastME ===

_... to be done_

{{{
#!/usr/bin/env ruby
require 'bio'

}}}

=== PHYLIP? ===

== Support Calculation? ==

=== Bootstrap Resampling? ===

----

= Analyzing Phylogenetic Trees =

== PAML ==

== Gene Duplication Inference ==

_need to further test and then import GSoC 'SDI' work..._

== Others? ==
 270
 271 == Others? ==