X-Git-Url: http://source.jalview.org/gitweb/?a=blobdiff_plain;f=wiki%2FPhyloBioRuby.wiki;h=a3be9609e4686c7e01d25b590957af1e70c44b88;hb=3b13b835acf5bc4ed44cf28682a846e748f9b46b;hp=44136a654f67c20a3b4c1b0f0b54a27345d076d9;hpb=bacc8852471ae415fe71942d006671f8ead35e4f;p=jalview.git

diff --git a/wiki/PhyloBioRuby.wiki b/wiki/PhyloBioRuby.wiki
index 44136a6..a3be960 100644
--- a/wiki/PhyloBioRuby.wiki
+++ b/wiki/PhyloBioRuby.wiki
@@ -23,6 +23,18 @@ Copyright (C) 2011 Christian M Zmasek. All rights reserved.
 
 === Reading in a Multiple Sequence Alignment from a File ===
 
+The following example reads a sequence file and automatically determines its format.
+{{{
+ff = Bio::FlatFile.auto('bcl2.fasta')
+ff.each_entry do |entry|
+  puts entry.entry_id          # identifier of the entry
+  puts entry.definition        # definition of the entry
+  puts entry.seq               # sequence data of the entry
+end
+}}}
+
+==== ClustalW Format ====
+
 The following example shows how to read in a *ClustalW*-formatted multiple sequence alignment.
 
 {{{
@@ -43,7 +55,41 @@ msa.each do |entry|
 end
 }}}
 
- 
+==== FASTA Format ====
+
+The following example shows how to read in a *FASTA*-formatted multiple sequence file. (_This seems a little clumsy; there may be a more direct way that avoids creating an array._)
+{{{
+#!/usr/bin/env ruby
+require 'bio'
+
+# Reads in a FASTA-formatted multiple sequence alignment (which does
+# not have to be aligned, though) and stores its sequences in
+# array 'seq_ary'.
+seq_ary = Array.new
+fasta_seqs = Bio::Alignment::MultiFastaFormat.new(File.read('infile.fasta'))
+fasta_seqs.entries.each do |seq|
+  seq_ary.push(seq)
+end
+
+# Creates a multiple sequence alignment (possibly unaligned) named
+# 'seqs' from array 'seq_ary'.
+seqs = Bio::Alignment.new(seq_ary)
+
+# Prints each sequence to the console.
+seqs.each { |seq| puts seq.to_s }
+
+# Writes multiple sequence alignment (possibly unaligned) 'seqs'
+# to a file in PHYLIP format.
+File.open('outfile.phylip', 'w') do |f|
+  f.write(seqs.output(:phylip))
+end
+}}}
+
+Relevant API documentation:
+
+ * [http://bioruby.open-bio.org/rdoc/classes/Bio/ClustalW/Report.html Bio::ClustalW::Report]
+ * [http://bioruby.open-bio.org/rdoc/classes/Bio/Alignment.html Bio::Alignment]
+ * [http://bioruby.open-bio.org/rdoc/classes/Bio/Sequence.html Bio::Sequence]
 
 === Writing a Multiple Sequence Alignment to a File ===
 
@@ -82,8 +128,6 @@ f.write(align.output(:phylipnon))
 
 === Formatting of Individual Sequences ===
 
-_... to be done_
-
 !BioRuby can format molecular sequences in a variety of formats.
 Individual sequences can be formatted to (e.g.) Genbank format as shown in the following examples.
 
@@ -101,14 +145,12 @@ The following symbols determine the output format:
   * `:genbank` for Genbank
   * `:embl` for EMBL
   * `:fasta` for FASTA
-  * `:fasta_ncbi` for 
-  * `:raw` for
-  * `:fastq` for
-  * `:fastq_sanger` for
-  * `:fastq_solexa` for
-  * `:fastq_illumina` for
-  * `:fasta_numeric` for
-  * `:qual` for
+  * `:fasta_ncbi` for NCBI-type FASTA
+  * `:raw` for raw sequence
+  * `:fastq` for FASTQ (includes quality scores)
+  * `:fastq_sanger` for Sanger-type FASTQ
+  * `:fastq_solexa` for Solexa-type FASTQ
+  * `:fastq_illumina` for Illumina-type FASTQ
 
 == Calculating Multiple Sequence Alignments ==