X-Git-Url: http://source.jalview.org/gitweb/?a=blobdiff_plain;ds=sidebyside;f=help%2Fhtml%2Fcalculations%2Fpca.html;h=3529cae8f9159e0691061cdbfeddbef0b2eee416;hb=ead01b555e8b1d6322cecdf7780222c73143a03a;hp=c38d9ac812f81b944c1909a63935924f5d418988;hpb=37de9310bec3501cbc6381e0c3dcb282fcaad812;p=jalview.git

diff --git a/help/html/calculations/pca.html b/help/html/calculations/pca.html
index c38d9ac..3529cae 100755
--- a/help/html/calculations/pca.html
+++ b/help/html/calculations/pca.html
@@ -26,16 +26,23 @@

Principal Component Analysis

+

+ A principal component analysis can be performed via the calculations dialog which is accessed + by selecting Calculate→Calculate Tree or + PCA.... +

This calculation creates a spatial representation of the similarities within a selected group, or all of the sequences in an alignment. After the calculation finishes, a 3D viewer displays the set of sequences as points in 'similarity space', and similar sequences tend to lie near each other in the space.

- Caveats
The calculation is computationally expensive, - and may fail for very large sets of sequences - usually because the - JVM has run out of memory. A future release of Jalview will be able - to avoid this by executing the calculation via a web service. + Caveats
The calculation can be computationally + expensive, and may fail for very large sets of sequences - usually + because the JVM has run out of memory. However, the PCA + implementation in Jalview 2.10.2 employs more memory-efficient + matrix storage structures, allowing larger PCAs to be performed.

@@ -53,33 +60,15 @@ Calculating PCAs for aligned sequences
Jalview can perform PCA analysis on both proteins and nucleotide sequence alignments. In both cases, components are generated by an - eigenvector decomposition of the matrix formed from the sum of - substitution matrix scores at each aligned position between each - pair of sequences - computed with one of the available score - matrices, such as BLOSUM62, + eigenvector decomposition of the matrix formed from pairwise similarity + scores between each pair of sequences. The similarity score model is + selected on the calculations dialog, and + may use one of the available score matrices, such as + BLOSUM62, PAM250, or the simple single - nucleotide substitution matrix. The options available for - calculation are given in the Change - Parameters menu. -

-

- PCA Calculation modes
The default Jalview - calculation mode (indicated when Jalview - PCA Calculation is ticked in the Change - Parameters menu) is to perform a PCA on a matrix where elements - in the upper diagonal give the sum of scores for mutating in one - direction, and the lower diagonal is the sum of scores for mutating - in the other. For protein substitution models like BLOSUM62, this - gives an asymmetric matrix, and a different PCA to a matrix produced - with the method described in the paper by G. Casari, C. Sander and - A. Valencia. Structural Biology volume 2, no. 2, February 1995 (pubmed) - and implemented at the SeqSpace server at the EBI. This method - preconditions the matrix by multiplying it with its transpose, and - can be employed in the PCA viewer by unchecking the Jalview - PCA Calculation option in the Change - Parameters menu. + nucleotide substitution matrix, or by sequence percentage identity, + or sequence feature similarity.

@@ -89,8 +78,10 @@ within the similarity space, as points in a rotateable 3D scatterplot. The colour of each sequence point is the same as the sequence group colours, white if no colour has been defined for the - sequence, and green if the sequence is part of a the currently - selected group.

+ sequence, and grey if the sequence is part of the currently selected + group. The viewer also employs depth cueing, so points appear darker + the farther away they are, and become brighter as they are rotated + towards the front of the view.

The 3d view can be rotated by dragging the mouse with the left mouse button pressed. The view can also be zoomed in and out with @@ -141,5 +132,26 @@ left-clicking and dragging the mouse over the display. --> added to the Jalview desktop in v2.7. The Reset button and Change Parameters menu were added in Jalview 2.8. Support for PAM250 based PCA was added in Jalview 2.8.1. +

+

+ Reproducing PCA calculations performed with older + Jalview releases
Jalview 2.10.2 included a revised PCA + implementation which treats gaps and non-standard residues in the + same way as the method described in the paper + by G. Casari, C. Sander and A. Valencia, Structural Biology volume + 2, no. 2, February 1995 (pubmed), + which was implemented at the SeqSpace server at the EBI. To reproduce + calculations performed with earlier Jalview releases, it is necessary + to execute the following Groovy script: +

+    jalview.analysis.scoremodels.ScoreMatrix.scoreGapAsAny=true
+    jalview.analysis.scoremodels.ScoreModels.instance.BLOSUM62.@matrix[4][1]=3
+    
+ This script enables the legacy PCA mode where gaps were treated as + 'X', and modifies the BLOSUM62 matrix so it is asymmetric for + mutations between C and R (this was a typo in the original Jalview + BLOSUM62 matrix which was fixed in 2.10.2). +
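
The "Calculating PCAs for aligned sequences" hunk describes components as coming from an eigenvector decomposition of a matrix of pairwise similarity scores. The following is a minimal sketch of that idea only, not Jalview's implementation: it assumes a simple identity-count score model (a stand-in for the percentage identity option), and the function names `identity_similarity`, `similarity_matrix` and `top_components` are hypothetical. It uses power iteration with deflation, which is valid here because an identity-count matrix is symmetric and positive semi-definite.

```python
import math


def identity_similarity(a, b):
    """Hypothetical score model: count aligned columns with the same non-gap symbol."""
    return sum(1 for x, y in zip(a, b) if x == y and x != "-")


def similarity_matrix(seqs, score=identity_similarity):
    """Build the square matrix of pairwise similarity scores."""
    n = len(seqs)
    return [[float(score(seqs[i], seqs[j])) for j in range(n)] for i in range(n)]


def top_components(matrix, k=2, iterations=500):
    """Return the k largest (eigenvalue, eigenvector) pairs.

    Assumes a symmetric positive semi-definite matrix; uses power
    iteration, deflating the matrix after each component is found.
    """
    n = len(matrix)
    m = [row[:] for row in matrix]  # work on a copy
    components = []
    for c in range(k):
        # Deterministic, slightly uneven start vector.
        v = [1.0 + 0.01 * (i + c) for i in range(n)]
        for _ in range(iterations):
            w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # nothing left to extract
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the converged vector.
        eigenvalue = sum(
            v[i] * sum(m[i][j] * v[j] for j in range(n)) for i in range(n)
        )
        components.append((eigenvalue, v))
        # Deflate: remove the component just found.
        for i in range(n):
            for j in range(n):
                m[i][j] -= eigenvalue * v[i] * v[j]
    return components


if __name__ == "__main__":
    aligned = ["ACDE", "ACDE", "WYWY", "WYWF"]
    for value, vector in top_components(similarity_matrix(aligned)):
        print(round(value, 3), [round(x, 3) for x in vector])
```

In this toy alignment the two identical sequences receive equal coordinates on the first component, which is the sense in which similar sequences lie near each other in the similarity space. An asymmetric matrix, such as the legacy BLOSUM62 variant mentioned above, would need a general eigen-solver rather than this symmetric-only sketch.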