From: jprocter Date: Tue, 8 Mar 2005 16:42:01 +0000 (+0000) Subject: more precise methods information in help description. X-Git-Tag: Release_2_0~619 X-Git-Url: http://source.jalview.org/gitweb/?a=commitdiff_plain;h=998ae297436877c6086f657e0a02c376cb5eb4dd;p=jalview.git more precise methods information in help description. --- diff --git a/help/help.jhm b/help/help.jhm index 0bf7ee8..c843321 100755 --- a/help/help.jhm +++ b/help/help.jhm @@ -16,7 +16,6 @@ - diff --git a/help/helpIndex.xml b/help/helpIndex.xml index d38cad8..62ea52f 100755 --- a/help/helpIndex.xml +++ b/help/helpIndex.xml @@ -8,8 +8,8 @@ - - + + diff --git a/help/html/calculations/pca.html b/help/html/calculations/pca.html index 12fbab8..e824d7d 100755 --- a/help/html/calculations/pca.html +++ b/help/html/calculations/pca.html @@ -2,28 +2,55 @@ Principal Component Analysis

Principal Component Analysis

-

This is a method of clustering sequences based on the method developed by G. - Casari, C. Sander and A. Valencia. Structural Biology volume 2, no. 2, February - 1995 . Extra information can also be found at the SeqSpace server at the EBI. -
- The version implemented here only looks at the clustering of whole sequences - and not individual positions in the alignment to help identify functional residues. - For large alignments plans are afoot to implement a web service to do this 'residue - space' PCA remotely.

-

When the Principal component analysis option is selected all the sequences - ( or just the selected ones) are used in the calculation and for large numbers - of sequences this could take quite a time. When the calculation is finished - a new window is displayed showing the projections of the sequences along the - 2nd, 3rd and 4th vectors giving a 3dimensional view of how the sequences cluster. +

This calculation creates a spatial representation of the +similarities within a selected group, or all of the sequences in +an alignment. After the calculation finishes, a 3D viewer displays the +set of sequences as points in 'similarity space', and similar +sequences tend to lie near each other in the space.

+

Note: The calculation is computationally expensive, and may fail for very large sets of sequences - + usually because the JVM has run out of memory. The next release of + Jalview will execute this calculation through a web service.

+

Principal components analysis is a technique for examining the +structure of complex datasets. The components are a set of dimensions +formed from the measured values in the dataset, and the principal +component is the one with the greatest magnitude, or length. The +sets of measurements that differ the most should lie at either end of +this principal axis, and the other axes correspond to less extreme +patterns of variation in the dataset.

-

This 3d view can be rotated by holding the left mouse button down in the PCA - window and moving it. The user can also zoom in and out by using the up and - down arrow keys.

-

Individual points can be selected using the mouse and selected sequences show - up green in the PCA window and the usual grey background/white text in the alignment - and tree windows.

-

Different eigenvectors can be used to do the projection by changing the selected - dimensions in the 3 menus underneath the 3d window.
+ +

In this case, the components are generated by an eigenvector +decomposition of the matrix formed from the sum of BLOSUM scores at +each aligned position between each pair of sequences. The basic method +is described in the paper by G. Casari, C. Sander and +A. Valencia. Structural Biology volume 2, no. 2, February 1995 (pubmed) + and implemented at the SeqSpace server (http://industry.ebi.ac.uk/SeqSpace) at the EBI.
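As a rough illustration of the calculation described above, the sketch below builds a pairwise score matrix and extracts only its leading component. It rests on assumptions that do not come from Jalview: a toy match/mismatch score stands in for BLOSUM, power iteration stands in for a full eigenvector decomposition, and the function names are hypothetical.

```python
import math


def similarity_matrix(seqs, score):
    """Sum score(a, b) over aligned positions for every pair of sequences."""
    n = len(seqs)
    return [
        [sum(score(a, b) for a, b in zip(seqs[i], seqs[j])) for j in range(n)]
        for i in range(n)
    ]


def principal_eigenvector(matrix, iterations=200):
    """Approximate the dominant eigenpair of a symmetric matrix.

    Uses power iteration from a uniform start vector. This assumes the
    start vector is not orthogonal to the dominant eigenvector, which
    holds for matrices with non-negative entries such as the toy one below.
    """
    n = len(matrix)
    vec = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:
            break
        vec = [x / norm for x in nxt]
    # Rayleigh quotient of the unit vector gives the eigenvalue estimate.
    product = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v * p for v, p in zip(vec, product))
    return eigenvalue, vec


def toy_score(a, b):
    """Hypothetical stand-in for a BLOSUM lookup: 1 for a match, else 0."""
    return 1 if a == b else 0


matrix = similarity_matrix(["AC", "AD"], toy_score)
eigenvalue, vector = principal_eigenvector(matrix)
```

Each entry of the returned vector is the coordinate of one sequence along the leading axis. A full decomposition would yield the further axes that the viewer lets you choose between.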

+ +

The PCA Viewer

+

This is an interactive display of the sequences positioned within + the similarity space. The colour of each sequence point is the same + as the sequence group colouring, white if no colour has been + defined for the sequence, and green if the sequence is part of + the currently selected group. +

+

The 3d view can be rotated by dragging the mouse with the + left mouse button pressed. The view can also be + zoomed in and out with the up and down arrow + keys.

+

A tool tip gives the sequence ID corresponding to a point in the + space, and clicking a point toggles the selection of the + corresponding sequence in the alignment window. Rectangular region + based selection is also possible, by holding the 's' key whilst + left-clicking and dragging the mouse over the display. +

+

Initially, the display shows the first three components of the + similarity space, but any eigenvector can be used by changing the selected + dimension for the x, y, or z axis through each one's menu located + below the 3d display. +

+ diff --git a/help/html/calculations/tree.html b/help/html/calculations/tree.html index 16d7bbc..a8dbe59 100755 --- a/help/html/calculations/tree.html +++ b/help/html/calculations/tree.html @@ -1,28 +1,85 @@ Tree Calculation -

UPGMA tree

-

If this option is selected then all sequences are used to generate a UPGMA - tree. The pairwise distances used to cluster the sequences are the percentage - mismatch between two sequences. For a reliable phylogenetic tree I recommend - other programs (phylowin, phylip) should be used as they have the speed to use - better distance methods and bootstrapping. Again, plans are afoot for a server - to do this and to be able to read in tree files generated by other programs. -
- When the tree has been calculated a new window is displayed showing the tree - with labels on the leaves showing the sequence ids. The user can select the - ids with the mouse and the selected sequences will also be selected in the alignment - window and the PCA window if that analysis has been calculated.

+

Calculation of trees from alignments

+

Trees are calculated on either the complete alignment, or just the +currently selected group of sequences. There are four different +calculations, using one of two distance measures and constructing the +tree from one of two algorithms : +

+

Distance Measures

+

Trees are calculated on the basis of a measure of similarity +between each pair of sequences in the alignment : +

    +
  • PID
    The percentage identity between the two +sequences at each aligned position. +
  • BLOSUM62
    The sum of BLOSUM62 scores for the +residue pair at each aligned position. +
+
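The two measures listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not Jalview's code: the function names are hypothetical, columns gapped in both sequences are skipped when computing PID, and the substitution table is supplied by the caller (the values in the example are a tiny toy table, not a full BLOSUM62 matrix).

```python
def percent_identity(seq_a, seq_b, gap_chars="-."):
    """Percentage of compared columns holding the same residue."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in gap_chars and b in gap_chars:
            continue  # assumption: columns gapped in both sequences are ignored
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def summed_pair_score(seq_a, seq_b, pair_scores, default=0):
    """Sum a substitution score over every aligned column.

    pair_scores maps (residue, residue) tuples to scores; a pair is looked
    up in both orders, and 'default' is used when it is absent.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    total = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        total += pair_scores.get((a, b), pair_scores.get((b, a), default))
    return total


toy_scores = {("A", "A"): 4, ("A", "S"): 1}  # toy table for illustration only
pid = percent_identity("ACDE", "ACDF")
score = summed_pair_score("AS", "AA", toy_scores)
```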

+

Tree Construction Methods

+

Jalview currently supports two kinds of agglomerative clustering +methods. These are not intended to substitute for rigorous +phylogenetic tree construction, and may fail on very large alignments. +

    +
  • UPGMA tree
    + UPGMA stands for Unweighted Pair-Group Method using Arithmetic + averages. Clusters are iteratively formed and extended by finding a + non-member sequence with the lowest average dissimilarity over the + cluster members. +

    +
  • +
  • Neighbour Joining tree
    + First described in 1987 by Saitou and Nei, this method applies a + greedy algorithm to find the tree with the shortest branch + lengths.
    + This method, as implemented in Jalview, is considerably more + expensive than UPGMA. +
  • +
+
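For readers unfamiliar with agglomerative clustering, the following is a minimal sketch of UPGMA in its textbook form: at each step the two clusters with the lowest average pairwise distance are merged. It is not Jalview's implementation. The tree representation (a leaf is a name, an internal node is a `(left, right, height)` tuple) and the `dist` dictionary keyed by `frozenset` pairs are assumptions made for the example.

```python
from itertools import combinations


def average_distance(members_a, members_b, dist):
    """Mean distance over all pairs drawn from the two clusters."""
    total = sum(dist[frozenset((a, b))] for a in members_a for b in members_b)
    return total / (len(members_a) * len(members_b))


def upgma(names, dist):
    """Build a UPGMA tree from leaf names and pairwise distances."""
    if not names:
        raise ValueError("at least one sequence is required")
    # Each cluster is (subtree, member_list); leaves start as single clusters.
    clusters = [(name, [name]) for name in names]
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest average distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_distance(
                clusters[p[0]][1], clusters[p[1]][1], dist
            ),
        )
        (tree_i, mem_i), (tree_j, mem_j) = clusters[i], clusters[j]
        # The new node sits at half the average distance between its children.
        height = average_distance(mem_i, mem_j, dist) / 2
        merged = ((tree_i, tree_j, height), mem_i + mem_j)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


distances = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 8.0,
    frozenset(("B", "C")): 8.0,
}
tree = upgma(["A", "B", "C"], distances)
```

This version rescans every cluster pair on each pass, which is simple but slow. It is meant to show the idea rather than to scale to large alignments.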

+

+

The Tree Viewing Window

+

+ When the tree has been calculated a window is displayed showing the + tree, with the leaves labelled with sequence ids.

Selecting the 'show distances' checkbox will put branch lengths on the branches. These branch lengths are the percentage mismatch between two nodes.

-

 

-

Neighbour Joining tree

-

The distances between sequences for this tree are generated in the same way - as for the UPGMA tree. The method of clustering is the neighbour joining method - which doesn't just pick the two closest leaves to cluster together but compensates - for long edges by subtracting from the distances the average distance from each - leaf to all the others.
- Selection and output options are the same as for the UPGMA tree.
+ +

+ Selecting sequence ids at the leaves of the tree selects sequences + in the original alignment. These selections are reflected in any + other analysis windows open on the same alignment.

+

+ Clicking on an internal node of the tree will rearrange the tree + diagram, inverting the ordering of the branches at that node. +

+

+ Clicking anywhere along the extent of the tree (but not on a leaf or + internal node) defines a tree 'partition', by cutting every branch + of the tree spanning the depth where the mouse-click occurred. Groups + are created containing sequences at the leaves of each connected + subtree. These groups are each given a different colour, which is + reflected in other windows in the same way as if the sequence ids + were selected, and can be edited in the same way as user defined + sequence groups.
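The partitioning step can be pictured with the short sketch below. It assumes a hypothetical tree representation (a leaf is a sequence id string, an internal node is a `(left, right, height)` tuple with height measured up from the leaves) and is not taken from Jalview's source. Every subtree whose root lies at or below the cut becomes one group.

```python
def leaves(tree):
    """Collect the leaf names of a subtree, left to right."""
    if isinstance(tree, str):
        return [tree]
    left, right, _height = tree
    return leaves(left) + leaves(right)


def partition(tree, cut_height):
    """Split the tree into groups by cutting all branches crossing cut_height."""
    # A leaf, or a node at or below the cut, stays together as one group.
    if isinstance(tree, str) or tree[2] <= cut_height:
        return [leaves(tree)]
    left, right, _height = tree
    return partition(left, cut_height) + partition(right, cut_height)


example_tree = ("C", ("A", "B", 1.0), 4.0)
groups = partition(example_tree, 2.0)
```

Moving the cut towards the root merges groups, and moving it towards the leaves splits them until every sequence stands alone.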

+

Tree partitions are useful for comparing clusterings produced by +different methods and measures. They are also an effective way of +identifying specific patterns of conservation and mutation +corresponding to the overall phylogenetic structure, when combined +with the conservation +based colour scheme.

+ + +

External Sources for Phylogenetic Tree Construction

+

A number of programs exist for the reliable construction of + phylogenetic trees, which can cope with large numbers of sequences, + use better distance methods and can perform bootstrapping. See the + Phylogenetic Web + Services page for directly accessible methods. It will also be + possible to read trees into Jalview directly, in the near future. +

+ diff --git a/help/html/colourSchemes/conservation.html b/help/html/colourSchemes/conservation.html index 29756ba..3ac2c13 100755 --- a/help/html/colourSchemes/conservation.html +++ b/help/html/colourSchemes/conservation.html @@ -1,25 +1,29 @@ Conservation Calculation -

Conservation Colours

-

This option is based on the AMAS method of multiple sequence alignment analysis - (Livingstone C.D. and Barton G.J. (1993), Protein Sequence Alignments: A Strategy - for the Hierarchical Analysis of Residue Conservation.CABIOS Vol. 9 No. 6 (745-756)). -
- Hierarchical analysis is based on each residue having certain physico-chemical - properties.

-

The alignment can first be divided into groups. This is best done by first - creating an average distance tree (Calculate->Average distance tree). Selecting - a position on the tree will cluster the sequences into groups depending on the - position selected. Each group is coloured a different colour which is used for - both the ids in the tree and alignment windows and the sequences themselves. - If a PCA window is visible a visual comparison can be made between the clustering - based on the tree and the PCA.

-

The grouping by tree may not be satisfactory and the user may want to edit - the groups to put any outliers together.

-

When the conservation option is selected the existing colour scheme is modified - so that the most conserved columns in each group have the most intense colours - and the least conserved are the palest.

-

 

+

Colouring by Conservation

+

This is an approach to alignment colouring based on the one used in + the AMAS method of multiple sequence alignment analysis (Livingstone + C.D. and Barton G.J. (1993), Protein Sequence Alignments: A Strategy + for the Hierarchical Analysis of Residue Conservation. CABIOS Vol. 9 + No. 6 (745-756)). +

+

Conservation is measured as a numerical index reflecting the + conservation of physico-chemical properties in the alignment: + Identities score highest, and the next most conserved group contains + substitutions to amino acids lying in the same physico-chemical + class.

+

For an already coloured alignment, the conservation index at each + alignment position is used to modify the shading intensity of the + colour at that position. This means that the most conserved columns + in each group have the most intense colours, and the least conserved + are the palest. The slider controls the contrast between these + extremes.
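One way to picture the shading is as a blend of the base colour towards white, with less conserved columns faded further. The sketch below is an illustration only: the normalised conservation value in [0, 1], the `contrast` parameter standing in for the slider, and the linear blend are all assumptions, not Jalview's actual formula.

```python
def shade(rgb, conservation, contrast=1.0):
    """Fade an (r, g, b) colour towards white as conservation drops.

    conservation: 0.0 (least conserved) to 1.0 (most conserved), assumed
    to be a normalised form of the conservation index.
    contrast: 0.0 leaves every colour unchanged, 1.0 fades the least
    conserved columns all the way to white (hypothetical slider value).
    """
    if not 0.0 <= conservation <= 1.0:
        raise ValueError("conservation must lie in [0, 1]")
    if not 0.0 <= contrast <= 1.0:
        raise ValueError("contrast must lie in [0, 1]")
    fade = (1.0 - conservation) * contrast
    return tuple(round(c + (255 - c) * fade) for c in rgb)


fully_conserved = shade((255, 0, 0), 1.0)
unconserved = shade((255, 0, 0), 0.0)
```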

+

Conservation can be calculated over the whole alignment, or just + within specific groups of sequences (such as those defined by + phylogenetic tree partitioning). + The option 'apply to all groups' controls whether the contrast + slider value will be applied to the indices for the currently + selected group, or all groups defined over the alignment.