Principal Component Analysis
This calculation creates a spatial representation of the similarities within a selected group of sequences, or among all of the sequences in an alignment. After the calculation finishes, a 3D viewer displays the set of sequences as points in 'similarity space', where similar sequences tend to lie near each other.
Caveats
The calculation is computationally expensive, and may fail
for very large sets of sequences - usually because the JVM has run out
of memory. A future release of Jalview will be able to avoid this by
executing the calculation via a web service.
About PCA
Principal components analysis is a technique for examining the structure of complex data sets. The components are a set of dimensions formed from the measured values in the data set, and the principal component is the one with the greatest magnitude, or length. The sets of measurements that differ the most should lie at either end of this principal axis, and the other axes correspond to less extreme patterns of variation in the data set.
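As an illustration of the idea only (this is not Jalview's implementation), the sketch below finds the principal axis of a small, made-up set of 2D measurements by power iteration on their covariance matrix. The function names and the data are hypothetical.

```python
import math


def covariance_matrix(points):
    """Sample covariance of equal-length numeric tuples (rows are observations)."""
    n = len(points)
    dims = len(points[0])
    means = [sum(p[d] for p in points) / n for d in range(dims)]
    centred = [[p[d] - means[d] for d in range(dims)] for p in points]
    return [[sum(row[a] * row[b] for row in centred) / (n - 1)
             for b in range(dims)] for a in range(dims)]


def principal_axis(matrix, iterations=100):
    """Approximate the dominant eigenvector of a matrix by power iteration."""
    v = [1.0] * len(matrix)
    for _ in range(iterations):
        w = [sum(m * x for m, x in zip(row, v)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Made-up measurements whose mean is already zero in both dimensions.
points = [(-2, -1), (-1, -1), (0, 0), (1, 1), (2, 1)]
axis = principal_axis(covariance_matrix(points))

# Position of each measurement along the principal axis.
projections = [sum(c * a for c, a in zip(p, axis)) for p in points]
```

In this toy data set the two measurements that differ the most, (-2, -1) and (2, 1), receive the smallest and largest projections, so they sit at opposite ends of the principal axis.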
Calculating PCAs for aligned sequences
Jalview can
perform PCA on both protein and nucleotide sequence
alignments. In both cases, components are generated by an eigenvector
decomposition of the matrix formed from the sum of substitution matrix
scores at each aligned position between each pair of sequences -
computed with one of the available score matrices, such as
BLOSUM62, PAM250, or the simple single
nucleotide substitution matrix. The options available for
calculation are given in the
Change Parameters menu.
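The pairwise score matrix described above can be sketched roughly as follows. This is a minimal illustration, not Jalview's code: `toy_score` is a hypothetical stand-in with made-up values (it is not BLOSUM62 or PAM250), and scoring a gap as zero is an assumption of this example.

```python
def toy_score(a, b):
    """Hypothetical scores: +1 match, -1 mismatch, 0 if either symbol is a gap."""
    if a == "-" or b == "-":
        return 0
    return 1 if a == b else -1


def pairwise_score_matrix(seqs, score=toy_score):
    """Sum the per-column substitution scores for every pair of aligned sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must all have the same length")
    n = len(seqs)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Entry (i, j) scores sequence i against sequence j, column by column.
            matrix[i][j] = sum(score(a, b) for a, b in zip(seqs[i], seqs[j]))
    return matrix


alignment = ["ACGT", "ACGA", "TC-A"]
scores = pairwise_score_matrix(alignment)
# scores == [[4, 2, -1], [2, 4, 1], [-1, 1, 3]]
```

The components would then come from an eigenvector decomposition of a matrix like `scores`; that step is left out here.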
PCA Calculation modes
The default Jalview calculation mode
(indicated when Jalview PCA Calculation is
ticked in the Change Parameters menu) is to
perform a PCA on a matrix where elements above the diagonal give
the sum of scores for mutating in one direction, and elements below
the diagonal give the sum of scores for mutating in the other. For protein
substitution models like BLOSUM62, this gives an asymmetric matrix,
and a different PCA from that produced by the method described in the
paper by G. Casari, C. Sander and A. Valencia. Structural Biology
volume 2, no. 2, February 1995 (pubmed)
and implemented at the SeqSpace server at the EBI. This method
preconditions the matrix by multiplying it with its transpose, and can be employed in the PCA viewer by unchecking the Jalview
PCA Calculation option in the Change
Parameters menu.
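To see why multiplying a matrix with its transpose changes the input to the PCA, the sketch below applies that step to a small asymmetric matrix with made-up values. The product is always symmetric. The order of multiplication used here (matrix times transpose) is an assumption; the text above does not specify it, and the helper names are hypothetical.

```python
def transpose(m):
    """Return the transpose of a matrix held as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def is_symmetric(m):
    """True if m[i][j] == m[j][i] for every pair of indices."""
    return all(m[i][j] == m[j][i]
               for i in range(len(m)) for j in range(len(m)))


# Made-up asymmetric score matrix: entry (0, 1) differs from entry (1, 0).
asym = [[4, 1],
        [3, 5]]

conditioned = matmul(asym, transpose(asym))
# conditioned == [[17, 17], [17, 34]], which is symmetric
```

A symmetric matrix has real eigenvalues and orthogonal eigenvectors, which is one reason a preconditioning step of this kind is convenient before an eigenvector decomposition.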
The PCA Viewer
This is an interactive display of the sequences positioned within the similarity space, as points in a rotatable 3D scatterplot. Each sequence point takes the colour of its sequence group; it is white if no colour has been defined for the sequence, and green if the sequence is part of the currently selected group.
The 3D view can be rotated by dragging the mouse with the left mouse button pressed. The view can also be zoomed in and out with the up and down arrow keys (and the mouse scroll wheel, if present). Labels will be shown for each sequence if the entry in the View menu is checked, and the plot background colour can be changed from the View→Background Colour.. dialog box. The File menu allows the view to be saved (File→Save submenu) as an EPS or PNG image or printed, and the original alignment data and the matrix resulting from its PCA to be retrieved. The coordinates for the whole PCA space, or just the current view, may also be exported as CSV files for visualization in another program or further analysis.
Options for coordinates export are:
A tool tip gives the sequence ID corresponding to a point in the space, and clicking a point toggles the selection of the corresponding sequence in the associated alignment window views. By default, points are only associated with the alignment view from which the PCA was calculated, but this may be changed via the View→Associate Nodes sub-menu.
Initially, the display shows the first three components of the similarity space, but any eigenvector can be used by changing the selected dimension for the x, y, or z axis through each one's menu, located below the 3D display. The Reset button will reset axis and rotation settings to their defaults.
The output of points and transformed point coordinates was added to the Jalview desktop in v2.7. The Reset button and Change Parameters menu were added in Jalview 2.8. Support for PAM250 based PCA was added in Jalview 2.8.1.