JAL-2805 JAL-2847 JAL-281 getTreeFile made public

diff --git a/wiki/PhyloBioRuby.wiki b/wiki/PhyloBioRuby.wiki

index 0c3686c..4b6b45f 100644
--- a/wiki/PhyloBioRuby.wiki
+++ b/wiki/PhyloBioRuby.wiki
@@ -10,10 +10,10 @@ Tutorial for multiple sequence alignments and phylogenetic methods in [http://bi
  
  Eventually, this is expected to be placed on the official !BioRuby page.
  
-Author: [http://www.cmzmasek.net/ Christian M Zmasek], Sanford-Burnham Medical Research Institute
+Author: [https://sites.google.com/site/cmzmasek/ Christian Zmasek], Sanford-Burnham Medical Research Institute
  
   
-Copyright (C) 2011 Christian M Zmasek
+Copyright (C) 2011 Christian M Zmasek. All rights reserved.
  
  
  = Multiple Sequence Alignment =
@@ -23,6 +23,41 @@ Copyright (C) 2011 Christian M Zmasek
  
  === Reading in a Multiple Sequence Alignment from a File ===
  
+The following example reads sequences from a file and determines the file format automatically:
+{{{
+#!/usr/bin/env ruby
+require 'bio'
+
+seq_ary = Array.new
+ff = Bio::FlatFile.auto('bcl2.fasta')
+ff.each_entry do |entry|
+  seq_ary.push(entry)
+  puts entry.entry_id          # prints the identifier of the entry
+  puts entry.definition        # prints the definition of the entry
+  puts entry.seq               # prints the sequence data of the entry
+end
+
+# Creates a multiple sequence alignment (possibly unaligned) named
+# 'seqs' from array 'seq_ary'.
+seqs = Bio::Alignment.new(seq_ary)
+seqs.each { |seq| puts seq.to_s }
+
+# Writes multiple sequence alignment (possibly unaligned) 'seqs'
+# to a file in PHYLIP format.
+File.open('out0.phylip', 'w') do |f|
+  f.write(seqs.output(:phylip))
+end
+
+# Writes multiple sequence alignment (possibly unaligned) 'seqs'
+# to a file in FASTA format.
+File.open('out0.fasta', 'w') do |f|
+  f.write(seqs.output(:fasta))
+end
+}}}
+
+
+==== ClustalW Format ====
+
  The following example shows how to read in a *ClustalW*-formatted multiple sequence alignment.
  
  {{{
@@ -43,7 +78,48 @@ msa.each do |entry|
  end
  }}}
  
- 
+==== FASTA Format ====
+
+The following example shows how to read in a *FASTA*-formatted multiple sequence file. (_This seems a little clumsy; I wonder if there is a more direct way that avoids creating an array._)
+{{{
+#!/usr/bin/env ruby
+require 'bio'
+
+# Reads in a FASTA-formatted multiple sequence alignment (which does
+# not have to be aligned, though) and stores its sequences in
+# array 'seq_ary'.
+seq_ary = Array.new
+fasta_seqs = Bio::Alignment::MultiFastaFormat.new(File.read('infile.fasta'))
+fasta_seqs.entries.each do |seq|
+  seq_ary.push(seq)
+end
+
+# Creates a multiple sequence alignment (possibly unaligned) named
+# 'seqs' from array 'seq_ary'.
+seqs = Bio::Alignment.new(seq_ary)
+
+# Prints each sequence to the console.
+seqs.each { |seq| puts seq.to_s }
+
+# Writes multiple sequence alignment (possibly unaligned) 'seqs'
+# to a file in PHYLIP format.
+File.open('outfile.phylip', 'w') do |f|
+  f.write(seqs.output(:phylip))
+end
+}}}
+
+Relevant API documentation:
+
+ * [http://bioruby.open-bio.org/rdoc/classes/Bio/ClustalW/Report.html Bio::ClustalW::Report]
+ * [http://bioruby.open-bio.org/rdoc/classes/Bio/Alignment.html Bio::Alignment]
+ * [http://bioruby.open-bio.org/rdoc/classes/Bio/Sequence.html Bio::Sequence]
+
+=== Creating a Multiple Sequence Alignment ===
+
+
+=== Creating a Multiple Sequence Alignment from a Database ===
+
+?
  
  === Writing a Multiple Sequence Alignment to a File ===
  
@@ -63,17 +139,17 @@ end
  
  ==== Setting the Output Format ====
  
-The following constants determine the output format.
+The following symbols determine the output format:
  
-  * ClustalW: `:clustal`
-  * FASTA:    `:fasta`
-  * PHYLIP interleaved (will truncate sequence names to no more than 10 characters): `:phylip`
-  * PHYLIP non-interleaved (will truncate sequence names to no more than 10 characters): `:phylipnon`
-  * MSF: `:msf`
-  * Molphy: `:molphy`
+  * `:clustal` for ClustalW
+  * `:fasta` for FASTA
+  * `:phylip` for PHYLIP interleaved (will truncate sequence names to no more than 10 characters)
+  * `:phylipnon` for PHYLIP non-interleaved (will truncate sequence names to no more than 10 characters)
+  * `:msf` for MSF
+  * `:molphy` for Molphy
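
Because both PHYLIP variants limit sequence names to 10 characters, distinct names that share their first 10 characters can no longer be told apart by that prefix alone. The following is a minimal sketch of a check that could be run before writing PHYLIP output. It assumes that truncation keeps the leading characters; `truncation_collisions` and the example names are hypothetical and not part of !BioRuby, and how !BioRuby itself handles such names is not covered here.

```ruby
#!/usr/bin/env ruby

# Hypothetical helper (not part of BioRuby): groups sequence names by
# their first 'limit' characters and returns only those groups in which
# two or more distinct names share the same truncated form.
# Assumption: truncation keeps the leading characters of each name.
def truncation_collisions(names, limit = 10)
  names.uniq.group_by { |name| name[0, limit] }
       .select { |_short, full_names| full_names.size > 1 }
end

# Example names (made up for illustration).
names = ['Bcl2_Homo_sapiens', 'Bcl2_Homo_erectus', 'Bax_Mus']

# Prints each truncated name that is shared by more than one full name.
truncation_collisions(names).each do |short, full_names|
  puts "#{short}: #{full_names.join(', ')}"
end
```

If the check reports any groups, renaming the affected sequences before writing the file keeps the names unambiguous.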
  
  
-For example, the following writes PHYLIP's non-interleaved format:
+For example, the following writes in PHYLIP's non-interleaved format:
  
  {{{
  f.write(align.output(:phylipnon))
@@ -82,8 +158,6 @@ f.write(align.output(:phylipnon))
  
  === Formatting of Individual Sequences ===
  
-_... to be done_
-
  !BioRuby can format molecular sequences in a variety of formats.
  Individual sequences can be formatted in, for example, GenBank format, as shown in the following examples.
  
@@ -97,9 +171,16 @@ For Bio::!FlatFile entries:
  entry.to_biosequence.output(:genbank)
  }}}
  
-Constants for available formats are:
-  * Genbank :genbank
-
+The following symbols determine the output format:
+  * `:genbank` for GenBank
+  * `:embl` for EMBL
+  * `:fasta` for FASTA
+  * `:fasta_ncbi` for NCBI-type FASTA
+  * `:raw` for raw sequence
+  * `:fastq` for FASTQ (includes quality scores)
+  * `:fastq_sanger` for Sanger-type FASTQ 
+  * `:fastq_solexa` for Solexa-type FASTQ 
+  * `:fastq_illumina` for Illumina-type FASTQ 
  
  == Calculating Multiple Sequence Alignments ==
  
@@ -364,12 +445,27 @@ Currently no direct support in !BioRuby.
  
  === Pairwise Sequence Distance Estimation ===
  
+_... to be done_
  
-=== Optimality Criteria Based Methods ===
+{{{
+#!/usr/bin/env ruby
+require 'bio'
+
+}}}
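
Pending the example above, one simple notion of pairwise distance is the uncorrected p-distance: the fraction of compared alignment columns at which two sequences differ. The following plain-Ruby sketch illustrates the idea only. It does not use any !BioRuby API; `p_distance` and the example sequences are hypothetical, and skipping columns that contain a gap is an assumption of this sketch.

```ruby
#!/usr/bin/env ruby

# Hypothetical helper (not part of BioRuby): uncorrected p-distance
# between two aligned sequences of equal length. Columns in which either
# sequence has a gap ('-') are skipped (an assumption of this sketch).
# Returns nil if no column can be compared.
def p_distance(seq_a, seq_b)
  raise ArgumentError, 'sequences must have equal length' unless seq_a.length == seq_b.length

  compared = 0
  differing = 0
  seq_a.upcase.each_char.zip(seq_b.upcase.each_char) do |a, b|
    next if a == '-' || b == '-'
    compared += 1
    differing += 1 if a != b
  end
  compared.zero? ? nil : differing.to_f / compared
end

# Example aligned sequences (made up for illustration).
aligned = {
  'seq1' => 'ATGCATGCAT',
  'seq2' => 'ATGCTTGCAT',
  'seq3' => 'ATG-TTGGAT'
}

# Prints the distance for each unordered pair of sequences.
aligned.keys.combination(2) do |name_a, name_b|
  d = p_distance(aligned[name_a], aligned[name_b])
  puts "#{name_a} #{name_b} #{d}"
end
```

The p-distance underestimates the true number of substitutions for divergent sequences, so in practice a model-based correction is usually applied on top of it.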
+
+
+=== Optimality Criteria Based on Pairwise Distances ===
  
  
  ==== Minimum Evolution: FastME ====
  
+_... to be done_
+
+{{{
+#!/usr/bin/env ruby
+require 'bio'
+
+}}}
  
  === Algorithmic Methods Based on Pairwise Distances ===