wiki/PhyloBioRuby.wiki

#summary Tutorial for multiple sequence alignments and phylogenetic methods in BioRuby -- under development!



= Introduction =

This tutorial covers multiple sequence alignments and phylogenetic methods in !BioRuby. It is still under development.



= Multiple Sequence Alignments =


== Multiple Sequence Alignment Input and Output ==

=== Reading in a Multiple Sequence Alignment from a File ===

Reading in a ClustalW-formatted multiple sequence alignment:

{{{
#!/usr/bin/env ruby
require 'bio'

# Reads in a ClustalW-formatted multiple sequence alignment
# from a file named "infile_clustalw.aln" and stores it in 'report'.
report = Bio::ClustalW::Report.new(File.read('infile_clustalw.aln'))

# Accesses the actual alignment.
align = report.alignment

# Goes through all sequences in 'align' and prints the
# actual molecular sequence.
align.each do |entry|
  puts entry.seq
end
}}}



=== Writing a Multiple Sequence Alignment to a File ===

Writing a multiple sequence alignment in FASTA format:

{{{
#!/usr/bin/env ruby
require 'bio'

# 'align' is the alignment read in as in the previous example.
report = Bio::ClustalW::Report.new(File.read('infile_clustalw.aln'))
align = report.alignment

# Creates a new file named "outfile.fasta" and writes
# multiple sequence alignment 'align' to it in FASTA format.
File.open('outfile.fasta', 'w') do |f|
  f.write(align.output(:fasta))
end
}}}


Writing a multiple sequence alignment in ClustalW format:

{{{
#!/usr/bin/env ruby
require 'bio'

# 'align' is the alignment read in as in the previous example.
report = Bio::ClustalW::Report.new(File.read('infile_clustalw.aln'))
align = report.alignment

# Creates a new file named "outfile.aln" and writes
# multiple sequence alignment 'align' to it in ClustalW format.
File.open('outfile.aln', 'w') do |f|
  f.write(align.output(:clustal))
end
}}}


== Calculating Multiple Sequence Alignments ==

!BioRuby can be used to execute a variety of multiple sequence alignment
programs (such as [http://mafft.cbrc.jp/alignment/software/ MAFFT], [http://probcons.stanford.edu/ Probcons], [http://www.clustal.org/ ClustalW], [http://www.drive5.com/muscle/ Muscle], and [http://www.tcoffee.org/Projects_home_page/t_coffee_home_page.html T-Coffee]).
The following examples show how to use MAFFT and Muscle.


=== MAFFT ===

{{{
#!/usr/bin/env ruby
require 'bio'

# 'seqs' is either an array of sequences or a multiple sequence
# alignment. In general this is read in from a file as described in ?.
# For the purpose of this tutorial, it is generated in code.
seqs = ["KMLFGVVFFFGG",
        "LMGGHHF",
        "GKKKKGHHHGHRRRGR",
        "KKKKGHHHGHRRRGR"]

# Calculates the alignment using the MAFFT program on the local
# machine with options '--maxiterate 1000 --localpair'
# and stores the result in 'report'.
options = ['--maxiterate', '1000', '--localpair']
mafft = Bio::MAFFT.new('path/to/mafft', options)
report = mafft.query_align(seqs)

# Accesses the actual alignment.
align = report.alignment

# Prints each sequence to the console.
align.each { |s| puts s.to_s }
}}}

References:

 * Katoh, Toh (2008) "Recent developments in the MAFFT multiple sequence alignment program" Briefings in Bioinformatics 9:286-298

 * Katoh, Toh (2010) "Parallelization of the MAFFT multiple sequence alignment program" Bioinformatics 26:1899-1900



=== Muscle ===

{{{
#!/usr/bin/env ruby
require 'bio'

# 'seqs' is either an array of sequences or a multiple sequence
# alignment. In general this is read in from a file as described in ?.
# For the purpose of this tutorial, it is generated in code.
seqs = ["KMLFGVVFFFGG",
        "LMGGHHF",
        "GKKKKGHHHGHRRRGR",
        "KKKKGHHHGHRRRGR"]

# Calculates the alignment using the Muscle program on the local
# machine with options '-quiet -maxiters 64'
# and stores the result in 'report'.
options = ['-quiet', '-maxiters', '64']
muscle = Bio::Muscle.new('path/to/muscle', options)
report = muscle.query_align(seqs)

# Accesses the actual alignment.
align = report.alignment

# Prints each sequence to the console.
align.each { |s| puts s.to_s }
}}}

References:

 * Edgar, R.C. (2004) "MUSCLE: multiple sequence alignment with high accuracy and high throughput" Nucleic Acids Res 32(5):1792-1797

=== Other Programs ===

[http://probcons.stanford.edu/ Probcons], [http://www.clustal.org/ ClustalW], and [http://www.tcoffee.org/Projects_home_page/t_coffee_home_page.html T-Coffee] can be used in the same manner as the programs above.


== Manipulating Multiple Sequence Alignments ==

Multiple sequence alignments to be used for phylogenetic inference are often 'cleaned up' in some manner. For instance, some researchers prefer to delete columns with more than 50% gaps. The following code is an example of how to do that in !BioRuby.


_... to be done_

{{{
#!/usr/bin/env ruby
require 'bio'

}}}
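
As a stopgap until the example above is written, the following is a minimal sketch of the column filter in plain Ruby. It assumes the aligned sequences are available as equal-length strings (for instance collected with `entry.seq` as in the reading example). `remove_gappy_columns` is a hypothetical helper name, not a !BioRuby method. Only columns whose gap fraction is strictly greater than the threshold are dropped, so a column with exactly 50% gaps is kept.

```ruby
#!/usr/bin/env ruby

# Hypothetical helper (not part of BioRuby): removes every column in
# which the fraction of gap characters exceeds 'max_gap_fraction'.
# 'rows' is an array of equal-length aligned sequence strings.
def remove_gappy_columns(rows, max_gap_fraction = 0.5, gap_chars = '-.')
  return [] if rows.empty?

  length = rows.first.length
  unless rows.all? { |r| r.length == length }
    raise ArgumentError, 'all aligned sequences must have the same length'
  end

  # Collects the indices of the columns that pass the filter.
  keep = (0...length).select do |col|
    gaps = rows.count { |r| gap_chars.include?(r[col]) }
    gaps.to_f / rows.size <= max_gap_fraction
  end

  # Rebuilds each sequence from the retained columns only.
  rows.map { |r| keep.map { |col| r[col] }.join }
end

# Toy alignment, made up for illustration.
aligned = ['KM--GV-F',
           'LM--GH-F',
           'GK-KGH--',
           'KKAKGHR-']

remove_gappy_columns(aligned).each { |s| puts s }
```

With the toy alignment above, the third and seventh columns (75% gaps each) are removed and the script prints `KM-GVF`, `LM-GHF`, `GKKGH-`, and `KKKGH-`.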


----

= Phylogenetic Trees =

== Phylogenetic Tree Input and Output ==

=== Reading in Phylogenetic Trees ===

_... to be done_

{{{
#!/usr/bin/env ruby
require 'bio'

}}}

Also, see: https://www.nescent.org/wg_phyloinformatics/BioRuby_PhyloXML_HowTo_documentation



=== Writing Phylogenetic Trees ===

_... to be done_

{{{
#!/usr/bin/env ruby
require 'bio'

}}}

Also, see: https://www.nescent.org/wg_phyloinformatics/BioRuby_PhyloXML_HowTo_documentation



== Phylogenetic Inference ==

_Currently, !BioRuby does not contain wrappers for phylogenetic inference programs, so I am in the process of writing a RAxML wrapper, followed by a wrapper for FastME..._

_What about pairwise distance calculation?_
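
One simple answer to the question above, pending a proper !BioRuby-based solution, is the uncorrected p-distance: the fraction of compared sites at which two aligned sequences differ. The sketch below is plain Ruby. `p_distance` and `distance_matrix` are hypothetical helper names, and skipping every column in which either sequence has a gap is an assumption made for illustration.

```ruby
#!/usr/bin/env ruby

# Hypothetical helper (not part of BioRuby): proportion of differing
# sites between two aligned sequences. Columns in which either
# sequence has a gap are skipped (an assumption of this sketch).
def p_distance(a, b, gap = '-')
  raise ArgumentError, 'sequences must have equal length' unless a.length == b.length

  compared = 0
  differing = 0
  a.chars.zip(b.chars).each do |x, y|
    next if x == gap || y == gap
    compared += 1
    differing += 1 unless x == y
  end

  # No comparable sites means the distance is undefined.
  compared.zero? ? Float::NAN : differing.to_f / compared
end

# Builds the full symmetric matrix of pairwise p-distances.
def distance_matrix(rows)
  rows.map { |a| rows.map { |b| p_distance(a, b) } }
end

# Toy alignment, made up for illustration.
aligned = ['ACGT-A',
           'ACGTTA',
           'AC-TTG']

distance_matrix(aligned).each { |row| puts row.join("\t") }
```

The p-distance underestimates the true number of substitutions for divergent sequences, so distance-based tree programs usually apply a model-based correction on top of it.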



== Maximum Likelihood ==

=== RAxML ===

_... to be done_

{{{
#!/usr/bin/env ruby
require 'bio'

}}}


=== PhyML ===

_... to be done_

{{{
#!/usr/bin/env ruby
require 'bio'

}}}

== Pairwise Distance Based Methods ==

=== FastME ===

_... to be done_

{{{
#!/usr/bin/env ruby
require 'bio'

}}}



=== PHYLIP? ===



== Support Calculation? ==

=== Bootstrap Resampling? ===
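
As a placeholder for this section, the sketch below shows the resampling step of a nonparametric bootstrap in plain Ruby: alignment columns are drawn with replacement until the replicate is as long as the original alignment. `bootstrap_replicate` is a hypothetical helper name, not a !BioRuby method, and the fixed random seed is only there to make the illustration repeatable.

```ruby
#!/usr/bin/env ruby

# Hypothetical helper (not part of BioRuby): draws one bootstrap
# replicate by sampling alignment columns with replacement.
# 'rows' is an array of equal-length aligned sequence strings.
def bootstrap_replicate(rows, rng = Random.new)
  return [] if rows.empty?

  length = rows.first.length
  # One randomly chosen source column per column of the replicate.
  cols = Array.new(length) { rng.rand(length) }
  rows.map { |r| cols.map { |c| r[c] }.join }
end

# Toy alignment, made up for illustration.
aligned = ['ACGTAC',
           'ACGTTC',
           'ACTTTG']

rng = Random.new(42) # fixed seed, assumption for repeatability
replicates = Array.new(3) { bootstrap_replicate(aligned, rng) }

replicates.each_with_index do |rep, i|
  puts "Replicate #{i + 1}:"
  rep.each { |s| puts "  #{s}" }
end
```

Each replicate would then be passed to the tree inference program, and the support for a branch is the fraction of replicate trees that contain it.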


----

= Analyzing Phylogenetic Trees =

== PAML ==


== Gene Duplication Inference ==

_need to further test and then import GSoC 'SDI' work..._


== Others? ==
 266
 267 == Others? ==