X-Git-Url: http://source.jalview.org/gitweb/?a=blobdiff_plain;f=wiki%2FPhyloBioRuby.wiki;h=4b6b45ffa63c3c20de0a76453f618aa7b8714b94;hb=9e1353590a88991e7593d38c8e307f713a0f3d5b;hp=0c3686cf6b8344971d57c0ee2e07f26889625991;hpb=4f756dfa268304c6838caf37151f2ea1f3630b57;p=jalview.git
diff --git a/wiki/PhyloBioRuby.wiki b/wiki/PhyloBioRuby.wiki
index 0c3686c..4b6b45f 100644
--- a/wiki/PhyloBioRuby.wiki
+++ b/wiki/PhyloBioRuby.wiki
@@ -10,10 +10,10 @@ Tutorial for multiple sequence alignments and phylogenetic methods in [http://bi
 
 Eventually, this is expected to be placed on the official !BioRuby page.
 
-Author: [http://www.cmzmasek.net/ Christian M Zmasek], Sanford-Burnham Medical Research Institute
+Author: [https://sites.google.com/site/cmzmasek/ Christian Zmasek], Sanford-Burnham Medical Research Institute
 
 
-Copyright (C) 2011 Christian M Zmasek
+Copyright (C) 2011 Christian M Zmasek. All rights reserved.
 
 
 = Multiple Sequence Alignment =
@@ -23,6 +23,41 @@ Copyright (C) 2011 Christian M Zmasek
 
 === Reading in a Multiple Sequence Alignment from a File ===
 
+The following example reads in a sequence file whose format is determined automatically.
+{{{
+#!/usr/bin/env ruby
+require 'bio'
+
+seq_ary = Array.new
+ff = Bio::FlatFile.auto('bcl2.fasta')
+ff.each_entry do |entry|
+  seq_ary.push(entry)
+  puts entry.entry_id # prints the identifier of the entry
+  puts entry.definition # prints the definition of the entry
+  puts entry.seq # prints the sequence data of the entry
+end
+
+# Creates a multiple sequence alignment (possibly unaligned) named
+# 'seqs' from array 'seq_ary'.
+seqs = Bio::Alignment.new(seq_ary)
+seqs.each { |seq| puts seq.to_s } # prints each sequence to the console
+
+# Writes multiple sequence alignment (possibly unaligned) 'seqs'
+# to a file in PHYLIP format.
+File.open('out0.phylip', 'w') do |f|
+  f.write(seqs.output(:phylip))
+end
+
+# Writes multiple sequence alignment (possibly unaligned) 'seqs'
+# to a file in FASTA format.
+File.open('out0.fasta', 'w') do |f|
+  f.write(seqs.output(:fasta))
+end
+}}}
+
+
+==== ClustalW Format ====
+
 The following example shows how to read in a *ClustalW*-formatted multiple sequence alignment.
 {{{
@@ -43,7 +78,48 @@ msa.each do |entry|
 end
 }}}
 
-
+==== FASTA Format ====
+
+The following example shows how to read in a *FASTA*-formatted multiple sequence file. (_This seems a little clumsy; I wonder whether there is a more direct way that avoids creating an array._)
+{{{
+#!/usr/bin/env ruby
+require 'bio'
+
+# Reads in a FASTA-formatted multiple sequence alignment (which does
+# not have to be aligned, though) and stores its sequences in
+# array 'seq_ary'.
+seq_ary = Array.new
+fasta_seqs = Bio::Alignment::MultiFastaFormat.new(File.read('infile.fasta'))
+fasta_seqs.entries.each do |seq|
+  seq_ary.push(seq)
+end
+
+# Creates a multiple sequence alignment (possibly unaligned) named
+# 'seqs' from array 'seq_ary'.
+seqs = Bio::Alignment.new(seq_ary)
+
+# Prints each sequence to the console.
+seqs.each { |seq| puts seq.to_s }
+
+# Writes multiple sequence alignment (possibly unaligned) 'seqs'
+# to a file in PHYLIP format.
+File.open('outfile.phylip', 'w') do |f|
+  f.write(seqs.output(:phylip))
+end
+}}}
+
+Relevant API documentation:
+
+  * [http://bioruby.open-bio.org/rdoc/classes/Bio/ClustalW/Report.html Bio::ClustalW::Report]
+  * [http://bioruby.open-bio.org/rdoc/classes/Bio/Alignment.html Bio::Alignment]
+  * [http://bioruby.open-bio.org/rdoc/classes/Bio/Sequence.html Bio::Sequence]
+
+=== Creating a Multiple Sequence Alignment ===
+
+
+=== Creating a Multiple Sequence Alignment from a Database ===
+
+?
 
 === Writing a Multiple Sequence Alignment to a File ===
 
@@ -63,17 +139,17 @@ end
 
 ==== Setting the Output Format ====
 
-The following constants determine the output format.
+The following symbols determine the output format:
 
-  * ClustalW: `:clustal`
-  * FASTA: `:fasta`
-  * PHYLIP interleaved (will truncate sequence names to no more than 10 characters): `:phylip`
-  * PHYLIP non-interleaved (will truncate sequence names to no more than 10 characters): `:phylipnon`
-  * MSF: `:msf`
-  * Molphy: `:molphy`
+  * `:clustal` for ClustalW
+  * `:fasta` for FASTA
+  * `:phylip` for PHYLIP interleaved (will truncate sequence names to no more than 10 characters)
+  * `:phylipnon` for PHYLIP non-interleaved (will truncate sequence names to no more than 10 characters)
+  * `:msf` for MSF
+  * `:molphy` for Molphy
 
-For example, the following writes PHYLIP's non-interleaved format:
+For example, the following writes in PHYLIP's non-interleaved format:
 
 {{{
 f.write(align.output(:phylipnon))
@@ -82,8 +158,6 @@ f.write(align.output(:phylipnon))
 
 === Formatting of Individual Sequences ===
 
-_... to be done_
-
 !BioRuby can format molecular sequences in a variety of formats.
 Individual sequences can be formatted to (e.g.) Genbank format as shown in the following examples.
 
@@ -97,9 +171,16 @@ For Bio::!FlatFile entries:
 entry.to_biosequence.output(:genbank)
 }}}
 
-Constants for available formats are:
-  * Genbank :genbank
-
+The following symbols determine the output format:
+  * `:genbank` for Genbank
+  * `:embl` for EMBL
+  * `:fasta` for FASTA
+  * `:fasta_ncbi` for NCBI-type FASTA
+  * `:raw` for raw sequence
+  * `:fastq` for FASTQ (includes quality scores)
+  * `:fastq_sanger` for Sanger-type FASTQ
+  * `:fastq_solexa` for Solexa-type FASTQ
+  * `:fastq_illumina` for Illumina-type FASTQ
 
 == Calculating Multiple Sequence Alignments ==
 
@@ -364,12 +445,27 @@ Currently no direct support in !BioRuby.
 
 === Pairwise Sequence Distance Estimation ===
 
+_... to be done_
 
-=== Optimality Criteria Based Methods ===
+{{{
+#!/usr/bin/env ruby
+require 'bio'
+
+}}}
+
+
+=== Optimality Criteria Based on Pairwise Distances ===
 
 ==== Minimal Evolution: FastME ====
 
+_... to be done_
+
+{{{
+#!/usr/bin/env ruby
+require 'bio'
+
+}}}
 
 
 === Algorithmic Methods Based on Pairwise Distances ===
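The _Pairwise Sequence Distance Estimation_ section in the last hunk is still a stub: it contains _... to be done_ and an empty code block. Below is a minimal pure-Ruby sketch of what such an example might compute. It makes no BioRuby calls, and this sketch does not establish whether BioRuby offers its own distance estimation.

The sketch rests on these assumptions:
 * the input sequences are already aligned (equal length);
 * gap and missing-data characters (`-`, `.`, `?`) are skipped pairwise;
 * the Jukes-Cantor (JC69) correction, d = -3/4 ln(1 - 4/3 p), is used as one standard choice for nucleotides.

`GAP_CHARS`, `p_distance`, `jc69_distance`, and `distance_matrix` are hypothetical helpers, not part of any library API.

```ruby
#!/usr/bin/env ruby
# Minimal sketch (pure Ruby, no BioRuby calls; helper names are hypothetical):
# pairwise distances between sequences that are already aligned (equal length).

# Characters treated as gaps or missing data; a site is skipped for a pair
# if either sequence has one of these there (pairwise deletion).
GAP_CHARS = ['-', '.', '?'].freeze

# Uncorrected p-distance: mismatching sites / compared (gap-free) sites.
def p_distance(seq_a, seq_b)
  unless seq_a.length == seq_b.length
    raise ArgumentError, 'sequences must be aligned (equal length)'
  end
  compared = 0
  mismatches = 0
  seq_a.upcase.chars.zip(seq_b.upcase.chars).each do |a, b|
    next if GAP_CHARS.include?(a) || GAP_CHARS.include?(b)
    compared += 1
    mismatches += 1 unless a == b
  end
  raise ArgumentError, 'no comparable sites' if compared.zero?
  mismatches.to_f / compared
end

# Jukes-Cantor (JC69) corrected distance for nucleotide sequences:
# d = -3/4 * ln(1 - 4/3 * p), which is undefined for p >= 0.75.
def jc69_distance(seq_a, seq_b)
  pd = p_distance(seq_a, seq_b)
  raise ArgumentError, 'p-distance too large for JC69 correction' if pd >= 0.75
  -0.75 * Math.log(1.0 - (4.0 / 3.0) * pd)
end

# Symmetric distance matrix (array of rows) for a Hash of name => sequence;
# the given block computes the distance for one pair of sequences.
def distance_matrix(seqs)
  names = seqs.keys
  names.map do |a|
    names.map { |b| a == b ? 0.0 : yield(seqs[a], seqs[b]) }
  end
end

# Example usage with three short, made-up aligned DNA sequences.
aligned = {
  'seq_a' => 'ACGTACGTAC',
  'seq_b' => 'ACGTACGTTC',
  'seq_c' => 'ACG-ACGAAC'
}
matrix = distance_matrix(aligned) { |x, y| jc69_distance(x, y) }
aligned.keys.each_with_index do |name, i|
  puts format('%-10s %s', name, matrix[i].map { |d| format('%.4f', d) }.join(' '))
end
```

Passing `{ |x, y| p_distance(x, y) }` instead gives uncorrected p-distances. JC69 assumes equal base frequencies and equal substitution rates between all bases, so it is only a simple starting point for nucleotide data.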