#summary Tutorial for multiple sequence alignments and phylogenetic methods in BioRuby -- under development!



= Introduction =

Tutorial for multiple sequence alignments and phylogenetic methods in !BioRuby -- under development!



= Multiple Sequence Alignments =


== Multiple Sequence Alignment Input and Output ==

=== Reading in a Multiple Sequence Alignment from a File ===

Reading in a ClustalW-formatted multiple sequence alignment:

{{{
#!/usr/bin/env ruby
require 'bio'

# Reads in a ClustalW-formatted multiple sequence alignment
# from a file named "infile_clustalw.aln" and stores it in 'report'.
report = Bio::ClustalW::Report.new(File.read('infile_clustalw.aln'))

}}}
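


=== Writing a Multiple Sequence Alignment to a File ===

Writing a multiple sequence alignment to a FASTA-formatted file:

{{{
#!/usr/bin/env ruby
require 'bio'

# Reads in the alignment as above, then writes it to "outfile.fasta" in FASTA format.
report = Bio::ClustalW::Report.new(File.read('infile_clustalw.aln'))
File.open('outfile.fasta', 'w') { |f| f.write(report.alignment.output(:fasta)) }
}}}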



== Calculating Multiple Sequence Alignments ==

!BioRuby can be used to execute a variety of multiple sequence alignment
programs (such as [http://mafft.cbrc.jp/alignment/software/ MAFFT], [http://probcons.stanford.edu/ Probcons], [http://www.clustal.org/ ClustalW], [http://www.drive5.com/muscle/ Muscle], and [http://www.tcoffee.org/Projects_home_page/t_coffee_home_page.html T-Coffee]).
In the following, examples of using MAFFT and Muscle are shown.


=== MAFFT ===

{{{
#!/usr/bin/env ruby
require 'bio'

# 'seqs' is either an array of sequences or a multiple sequence
# alignment. In general, this is read in from a file as described in
# "Reading in a Multiple Sequence Alignment from a File".
# For the purpose of this tutorial, it is generated in code.
seqs = ["KMLFGVVFFFGG",
        "LMGGHHF",
        "GKKKKGHHHGHRRRGR",
        "KKKKGHHHGHRRRGR"]


# Calculates the alignment using the MAFFT program on the local
# machine with options '--maxiterate 1000 --localpair'
# and stores the result in 'report'.
options = ['--maxiterate', '1000', '--localpair']
mafft = Bio::MAFFT.new('path/to/mafft', options)
report = mafft.query_align(seqs)

# Accesses the actual alignment.
align = report.alignment

# Prints each sequence to the console.
align.each { |s| puts s.to_s }

}}}

References:

 * Katoh K, Toh H (2008) "Recent developments in the MAFFT multiple sequence alignment program" Briefings in Bioinformatics 9(4):286-298

 * Katoh K, Toh H (2010) "Parallelization of the MAFFT multiple sequence alignment program" Bioinformatics 26(15):1899-1900



=== Muscle ===

{{{
#!/usr/bin/env ruby
require 'bio'

# 'seqs' is either an array of sequences or a multiple sequence
# alignment. In general, this is read in from a file as described in
# "Reading in a Multiple Sequence Alignment from a File".
# For the purpose of this tutorial, it is generated in code.
seqs = ["KMLFGVVFFFGG",
        "LMGGHHF",
        "GKKKKGHHHGHRRRGR",
        "KKKKGHHHGHRRRGR"]

# Calculates the alignment using the Muscle program on the local
# machine with options '-quiet -maxiters 64'
# and stores the result in 'report'.
options = ['-quiet', '-maxiters', '64']
muscle = Bio::Muscle.new('path/to/muscle', options)
report = muscle.query_align(seqs)

# Accesses the actual alignment.
align = report.alignment

# Prints each sequence to the console.
align.each { |s| puts s.to_s }

}}}

References:

 * Edgar RC (2004) "MUSCLE: multiple sequence alignment with high accuracy and high throughput" Nucleic Acids Research 32(5):1792-1797

=== Other Programs ===

[http://probcons.stanford.edu/ Probcons], [http://www.clustal.org/ ClustalW], and [http://www.tcoffee.org/Projects_home_page/t_coffee_home_page.html T-Coffee] can be used in the same manner as the programs above, through the classes `Bio::Probcons`, `Bio::ClustalW`, and `Bio::Tcoffee`, respectively.


== Manipulating Multiple Sequence Alignments ==

Multiple sequence alignments to be used for phylogenetic inference are often 'cleaned up' in some manner. For instance, some researchers prefer to delete all columns in which more than 50% of the characters are gaps. The following code is an example of how to do that in !BioRuby.


_... to be done_

{{{
#!/usr/bin/env ruby
require 'bio'

}}}
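The column filter itself does not require any particular !BioRuby class. The following is a minimal sketch in plain Ruby, assuming the alignment is available as an array of equal-length aligned strings with gaps written as '-' or '.'; `remove_gappy_columns` is a hypothetical helper, not part of !BioRuby. A column is removed only if more than half of its characters are gaps, so a column with exactly 50% gaps is kept.

{{{
#!/usr/bin/env ruby

# Hypothetical helper (not part of BioRuby): removes every column in which
# the fraction of gap characters exceeds 'max_gap_fraction'.
# Assumes 'rows' are aligned sequences (strings) of equal length.
def remove_gappy_columns(rows, max_gap_fraction = 0.5, gap_chars = ['-', '.'])
  return rows if rows.empty?

  length = rows.first.length
  unless rows.all? { |r| r.length == length }
    raise ArgumentError, 'aligned sequences must all have the same length'
  end

  # Indices of the columns with at most 'max_gap_fraction' gaps.
  keep = (0...length).select do |i|
    gaps = rows.count { |r| gap_chars.include?(r[i]) }
    gaps.to_f / rows.size <= max_gap_fraction
  end

  # Rebuilds each sequence from the kept columns only.
  rows.map { |r| keep.map { |i| r[i] }.join }
end

aligned = ['KML-FG--V',
           'LM--GG--H',
           'KMLFGG-HV',
           'K-LFGGH-V']

# Prints KML-FGV, LM--GGH, KMLFGGV and K-LFGGV, one per line.
puts remove_gappy_columns(aligned)
}}}

In the example, the seventh and eighth columns (three gaps out of four each) are removed, while the fourth column (two gaps out of four) is kept.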


----

= Phylogenetic Trees =

== Phylogenetic Tree Input and Output ==

=== Reading in Phylogenetic Trees ===

Reading in a Newick-formatted phylogenetic tree:

{{{
#!/usr/bin/env ruby
require 'bio'

# Reads in a Newick-formatted tree from a file named "infile.nwk" and stores it (as a Bio::Tree) in 'tree'.
tree = Bio::Newick.new(File.read('infile.nwk')).tree

}}}

See also: https://www.nescent.org/wg_phyloinformatics/BioRuby_PhyloXML_HowTo_documentation

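

=== Writing Phylogenetic Trees ===

Writing a phylogenetic tree to a Newick-formatted file:

{{{
#!/usr/bin/env ruby
require 'bio'

# Creates a small example tree and writes it to "outfile.nwk" in Newick format.
tree = Bio::Newick.new('((A,B),C);').tree
File.open('outfile.nwk', 'w') { |f| f.write(tree.output(:newick)) }
}}}

See also: https://www.nescent.org/wg_phyloinformatics/BioRuby_PhyloXML_HowTo_documentation


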
== Phylogenetic Inference ==

_Currently, !BioRuby does not contain wrappers for phylogenetic inference programs, so I am in the process of writing a RAxML wrapper, to be followed by a wrapper for FastME..._

_What about pairwise distance calculation?_



== Maximum Likelihood ==

=== RAxML ===

_... to be done_

{{{
#!/usr/bin/env ruby
require 'bio'

}}}
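Until !BioRuby provides a RAxML wrapper, RAxML can be run directly from Ruby. The following is a minimal sketch using Ruby's standard `Open3` library, assuming a RAxML executable named `raxmlHPC` is on the PATH and the alignment has been saved in PHYLIP format; `raxml_command` and `run_raxml` are hypothetical helpers, not part of !BioRuby. Only the standard RAxML options `-s`, `-n`, `-m` and `-p` are used.

{{{
#!/usr/bin/env ruby
require 'open3'

# Hypothetical helper (not part of BioRuby): builds the command line for a
# RAxML maximum likelihood search.
#   -s  input alignment (PHYLIP format)
#   -n  run name, appended to the names of all output files
#   -m  substitution model (e.g. PROTGAMMAWAG for proteins, GTRGAMMA for DNA)
#   -p  random number seed for the parsimony starting tree
def raxml_command(alignment, run_name, model: 'PROTGAMMAWAG', seed: 12345,
                  program: 'raxmlHPC')
  [program, '-s', alignment, '-n', run_name, '-m', model, '-p', seed.to_s]
end

# Hypothetical helper: runs RAxML in the current directory and returns the
# name of the file holding the best-scoring ML tree (in Newick format).
# RAxML refuses to overwrite the output files of an earlier run that used
# the same run name.
def run_raxml(alignment, run_name, **opts)
  output, status = Open3.capture2e(*raxml_command(alignment, run_name, **opts))
  raise "RAxML failed:\n#{output}" unless status.success?
  "RAxML_bestTree.#{run_name}"
end

# Example usage (requires RAxML and a PHYLIP file named "infile.phy"):
#   tree_file = run_raxml('infile.phy', 'run1')
#   tree = Bio::Newick.new(File.read(tree_file)).tree
}}}

Keeping the construction of the command separate from its execution makes it easy to inspect or log the exact RAxML call before running it.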


=== PhyML ===

_... to be done_

{{{
#!/usr/bin/env ruby
require 'bio'

}}}
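PhyML can also be run directly from Ruby with the standard `Open3` library. The following is a minimal sketch, assuming a PhyML 3.x executable named `phyml` is on the PATH and the alignment is in PHYLIP format; `phyml_command` and `run_phyml` are hypothetical helpers, not part of !BioRuby. The returned tree file name assumes PhyML's usual convention of appending `_phyml_tree.txt` to the input file name, which is worth checking against the installed version.

{{{
#!/usr/bin/env ruby
require 'open3'

# Hypothetical helper (not part of BioRuby): builds a PhyML 3.x command line.
#   -i  input alignment (PHYLIP format)
#   -d  data type: 'aa' for amino acids, 'nt' for nucleotides
#   -m  substitution model (e.g. LG or WAG for 'aa', GTR or HKY85 for 'nt')
#   -b  number of bootstrap replicates (0: no branch support computed)
def phyml_command(alignment, data_type: 'aa', model: 'LG', bootstrap: 0,
                  program: 'phyml')
  [program, '-i', alignment, '-d', data_type, '-m', model,
   '-b', bootstrap.to_s]
end

# Hypothetical helper: runs PhyML and returns the name of the tree file,
# assuming the "<input>_phyml_tree.txt" naming convention.
def run_phyml(alignment, **opts)
  output, status = Open3.capture2e(*phyml_command(alignment, **opts))
  raise "PhyML failed:\n#{output}" unless status.success?
  "#{alignment}_phyml_tree.txt"
end

# Example usage (requires PhyML and a PHYLIP file named "infile.phy"):
#   tree = Bio::Newick.new(File.read(run_phyml('infile.phy'))).tree
}}}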

== Pairwise Distance-Based Methods ==

=== FastME ===

_... to be done_

{{{
#!/usr/bin/env ruby
require 'bio'

}}}
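FastME, like other distance-based methods, starts from a matrix of pairwise distances rather than from the alignment itself. The following is a minimal sketch in plain Ruby that computes uncorrected p-distances (the fraction of differing sites, skipping sites with a gap in either sequence) and formats them as a square PHYLIP-style distance matrix, the input format FastME reads. `p_distance` and `phylip_distance_matrix` are hypothetical helpers, not part of !BioRuby; p-distances are only the simplest choice, and model-corrected distances are usually preferable for real analyses.

{{{
#!/usr/bin/env ruby

# Hypothetical helper (not part of BioRuby): uncorrected p-distance between
# two aligned sequences, i.e. the fraction of compared sites that differ.
# Sites with a gap ('-') in either sequence are skipped.
def p_distance(a, b, gap = '-')
  raise ArgumentError, 'sequences must be aligned' unless a.length == b.length

  compared = 0
  differences = 0
  a.chars.zip(b.chars).each do |x, y|
    next if x == gap || y == gap
    compared += 1
    differences += 1 if x != y
  end
  raise ArgumentError, 'no comparable sites' if compared.zero?
  differences.to_f / compared
end

# Hypothetical helper: square distance matrix in PHYLIP format (number of
# taxa on the first line, then one row per taxon, name padded to 10 chars).
def phylip_distance_matrix(names, seqs)
  rows = names.each_with_index.map do |name, i|
    dists = seqs.map { |s| format('%.5f', p_distance(seqs[i], s)) }
    ([name.ljust(10)] + dists).join(' ')
  end
  ([names.size.to_s] + rows).join("\n") + "\n"
end

names = %w[seq_a seq_b seq_c]
seqs  = ['KMLFGVVF', 'KMLFGAVF', 'KM-FGAAF']
print phylip_distance_matrix(names, seqs)
}}}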



=== PHYLIP? ===



== Support Calculation? ==

=== Bootstrap Resampling? ===

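Bootstrap support is usually obtained by resampling alignment columns with replacement, inferring a tree from each pseudo-replicate, and recording how often each clade recurs. The resampling step can be sketched in plain Ruby as follows, assuming equal-length aligned strings; `bootstrap_replicate` is a hypothetical helper, not part of !BioRuby. The same random column indices are applied to every sequence, since columns (sites), not individual characters, are the unit being resampled.

{{{
#!/usr/bin/env ruby

# Hypothetical helper (not part of BioRuby): one nonparametric bootstrap
# pseudo-replicate of an alignment. Column indices are drawn uniformly at
# random with replacement, and the same indices are used for every sequence.
# Assumes 'rows' is a non-empty array of aligned strings of equal length.
def bootstrap_replicate(rows, rng = Random.new)
  length = rows.first.length
  columns = Array.new(length) { rng.rand(length) }
  rows.map { |r| columns.map { |i| r[i] }.join }
end

aligned = ['KMLFGVVF',
           'KMLFGAVF',
           'KM-FGAAF']

# 100 pseudo-replicates; a fixed seed makes the resampling reproducible.
rng = Random.new(42)
replicates = Array.new(100) { bootstrap_replicate(aligned, rng) }
puts replicates.first
}}}

Each replicate would then be analyzed with the same inference method as the original alignment; the support for a clade is the fraction of replicate trees that contain it.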

----

= Analyzing Phylogenetic Trees =

== PAML ==


== Gene Duplication Inference ==

_Need to test further and then import the GSoC 'SDI' work..._


== Others? ==