<!--
 * Jalview - A Sequence Alignment Editor and Viewer ($$Version-Rel$$)
 * Copyright (C) $$Year-Rel$$ The Jalview Authors
 * This file is part of Jalview.
 * Jalview is free software: you can redistribute it and/or
 * modify it under the terms of the GNU General Public License
 * as published by the Free Software Foundation, either version 3
 * of the License, or (at your option) any later version.
 * Jalview is distributed in the hope that it will be useful, but
 * WITHOUT ANY WARRANTY; without even the implied warranty
 * of MERCHANTABILITY or FITNESS FOR A PARTICULAR
 * PURPOSE. See the GNU General Public License for more details.
 * You should have received a copy of the GNU General Public License
 * along with Jalview. If not, see <http://www.gnu.org/licenses/>.
 * The Jalview Authors are detailed in the 'AUTHORS' file.
 -->
<title>Principal Component Analysis</title>
<p>
<strong>Principal Component Analysis</strong>
</p>
<p>This calculation creates a spatial representation of the
similarities within a selected group, or all of the sequences in an
alignment. After the calculation finishes, a 3D viewer displays the
set of sequences as points in 'similarity space', and similar
sequences tend to lie near each other in the space.</p>
<p>
<em>Caveats</em><br />The calculation is computationally expensive,
and may fail for very large sets of sequences, usually because the
JVM has run out of memory. A future release of Jalview will be able
to avoid this by executing the calculation via a web service.
</p>
<p>
<strong>About PCA</strong>
</p>
<p>Principal components analysis is a technique for examining the
structure of complex data sets. The components are a set of
dimensions formed from the measured values in the data set, and the
principal component is the one with the greatest magnitude, or
length. The sets of measurements that differ the most should lie at
either end of this principal axis, and the other axes correspond to
less extreme patterns of variation in the data set.</p>
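<p>As a rough illustration of the idea, the Python sketch below finds the principal axis of a small two-dimensional data set by power iteration on its covariance matrix. The data values and the function names covariance_matrix and principal_axis are invented for this example; they are not part of Jalview, and Jalview's own calculation works on a sequence similarity matrix rather than on raw measurements.</p>

```python
import math


def covariance_matrix(rows):
    """Return the sample covariance matrix of equal-length measurement rows."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[d] for r in rows) / n for d in range(dims)]
    centred = [[r[d] - means[d] for d in range(dims)] for r in rows]
    return [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(dims)]
            for i in range(dims)]


def principal_axis(matrix, iterations=200):
    """Approximate the dominant eigenvector and eigenvalue by power iteration."""
    size = len(matrix)
    vec = [1.0] * size
    for _ in range(iterations):
        # Multiply by the matrix, then rescale to unit length.
        nxt = [sum(matrix[i][j] * vec[j] for j in range(size)) for i in range(size)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    # Rayleigh quotient: the variance captured along the unit vector.
    eigenvalue = sum(vec[i] * sum(matrix[i][j] * vec[j] for j in range(size))
                     for i in range(size))
    return vec, eigenvalue


# Hypothetical measurements: two values per sample that rise together.
data = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.0)]
axis, variance = principal_axis(covariance_matrix(data))
print(axis, variance)
```

<p>In this toy data set the two measurements rise together, so the principal axis points roughly along the diagonal and almost all of the variance lies along it.</p>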
<p>
<em>Calculating PCAs for aligned sequences</em><br />Jalview can
perform PCA on both protein and nucleotide sequence
alignments. In both cases, components are generated by an
eigenvector decomposition of the matrix formed from the sum of
substitution matrix scores at each aligned position between each
pair of sequences, computed with one of the available score
matrices, such as <a href="scorematrices.html#blosum62">BLOSUM62</a>,
<a href="scorematrices.html#pam250">PAM250</a>, or the <a
href="scorematrices.html#simplenucleotide">simple single
nucleotide substitution matrix</a>. The options available for
calculation are given in the <strong><em>Change
Parameters</em></strong> menu.
</p>
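<p>The construction of the pairwise matrix can be sketched in Python as follows. This is a minimal sketch under stated assumptions: toy_score is a made-up stand-in and does not contain real BLOSUM62 or PAM250 values, and scoring a gap as zero is an assumption for illustration that may not match Jalview's own gap handling. The eigenvector decomposition step is not shown.</p>

```python
def toy_score(a, b):
    """Hypothetical substitution score; NOT the real BLOSUM62 or PAM250 values."""
    if a == "-" or b == "-":
        return 0  # assumption: a gap contributes nothing to the sum
    return 2 if a == b else -1


def pairwise_score_matrix(seqs, score=toy_score):
    """Sum the substitution scores over aligned positions for every sequence pair."""
    width = len(seqs[0])
    if any(len(s) != width for s in seqs):
        raise ValueError("aligned sequences must all have the same length")
    n = len(seqs)
    return [[sum(score(a, b) for a, b in zip(seqs[i], seqs[j]))
             for j in range(n)]
            for i in range(n)]


# Hypothetical three-sequence nucleotide alignment.
alignment = ["ACGT-A", "ACGTTA", "TCGA-A"]
matrix = pairwise_score_matrix(alignment)
for row in matrix:
    print(row)
```

<p>Entry (i, j) of the result is the summed score between sequence i and sequence j, so closely related sequences receive larger values than divergent ones.</p>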
<p>
<em>PCA Calculation modes</em><br /> The default Jalview
calculation mode (indicated when <em><strong>Jalview
PCA Calculation</strong></em> is ticked in the <strong><em>Change
Parameters</em></strong> menu) is to perform a PCA on a matrix where elements
above the diagonal give the sum of scores for mutating in one
direction, and elements below the diagonal give the sum of scores for
mutating in the other. For protein substitution models like BLOSUM62, this
gives an asymmetric matrix, and therefore a different PCA from the one produced
with the method described in the paper by G. Casari, C. Sander and
A. Valencia, Structural Biology volume 2, no. 2, February 1995 (<a
href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=7749921">pubmed</a>),
and implemented at the SeqSpace server at the EBI. That method
preconditions the matrix by multiplying it with its transpose, and
can be employed in the PCA viewer by unchecking the <strong><em>Jalview
PCA Calculation</em></strong> option in the <strong><em>Change
Parameters</em></strong> menu.
</p>
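<p>The preconditioning step can be illustrated with a short Python sketch. The matrix values here are hypothetical, and the order of multiplication (the matrix times its transpose, rather than the transpose times the matrix) is an assumption, since the text above does not specify it. The sketch shows only that the product is symmetric even when the input is not.</p>

```python
def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def is_symmetric(m):
    """True if the square matrix equals its own transpose."""
    return all(m[i][j] == m[j][i] for i in range(len(m)) for j in range(len(m)))


# Hypothetical asymmetric score matrix for three sequences: entries above and
# below the diagonal differ, as in the default calculation mode described above.
scores = [[10, 7, 3],
          [6, 12, 4],
          [2, 5, 9]]

# Assumed preconditioning: multiply the matrix by its transpose.
conditioned = matmul(scores, transpose(scores))
print(is_symmetric(scores), is_symmetric(conditioned))
```

<p>Because the conditioned matrix differs from the original, the two modes can place the same sequences at different positions in the plot.</p>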
<img src="pcaviewer.gif">
<p>
<strong>The PCA Viewer</strong>
</p>
<p>This is an interactive display of the sequences positioned
within the similarity space, as points in a rotatable 3D
scatterplot. Each sequence point takes the colour of its
sequence group, is white if no colour has been defined for the
sequence, and is green if the sequence is part of the currently
selected group.</p>
<p>
The 3D view can be rotated by dragging the mouse with the <strong>left
mouse button</strong> pressed. The view can also be zoomed in and out with
the up and down <strong>arrow keys</strong> (and the scroll wheel of the
mouse, if present). Labels will be shown for each sequence if the
entry in the View menu is checked, and the plot background colour
can be changed from the View→Background Colour.. dialog box. The File
menu allows the view to be saved (<strong>File→Save</strong>
submenu) as an EPS or PNG image or printed, and the original
alignment data and the matrix resulting from its PCA to be
retrieved. The coordinates for the whole PCA space, or just the
current view, may also be exported as CSV files for visualization in
another program or further analysis.
</p>
<p>Options for coordinates export are:</p>
<ul>
<li>Output Values - complete dump of analysis (TxT* matrix
computed from sum of scores for all pairs of aligned residues
from i->j and j->i, conditioned matrix to be diagonalised,
tridiagonal form, major eigenvalues found)</li>
<li>Output Points - The eigenvector matrix - rows correspond to
sequences, columns correspond to each dimension in the PCA</li>
<li>Transformed Points - The 3D coordinates for each sequence
as shown in the PCA plot</li>
</ul>
<p>
A tool tip gives the sequence ID corresponding to a point in the
space, and clicking a point toggles the selection of the
corresponding sequence in the associated alignment window views.
<!-- Rectangular region
based selection is also possible, by holding the 'S' key whilst
left-clicking and dragging the mouse over the display. -->
By default, points are only associated with the alignment view from
which the PCA was calculated, but this may be changed via the <strong>View→Associate
Nodes</strong> sub-menu.
</p>
<p>
Initially, the display shows the first three components of the
similarity space, but any eigenvector can be used by changing the
selected dimension for the x, y, or z axis through the menu for each
axis, located below the 3D display. The <strong><em>Reset</em></strong>
button will reset axis and rotation settings to their defaults.
</p>
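<p>Choosing axes amounts to picking three columns from a table of points in which rows are sequences and columns are PCA dimensions. The Python sketch below illustrates this with invented coordinates; select_axes is a hypothetical helper and does not correspond to any Jalview function.</p>

```python
def select_axes(points, x=0, y=1, z=2):
    """Return one (x, y, z) triple per sequence from the chosen component columns."""
    return [(row[x], row[y], row[z]) for row in points]


# Hypothetical points: one row per sequence, one column per PCA dimension.
points = [
    [0.9, 0.1, -0.3, 0.05],
    [0.8, -0.2, 0.4, 0.01],
    [-0.7, 0.5, 0.1, -0.02],
]

default_view = select_axes(points)                   # first three components
alternate_view = select_axes(points, x=0, y=2, z=3)  # y and z axes swapped to later components
print(default_view[0], alternate_view[0])
```

<p>Resetting the view corresponds to returning to the default column choice of the first three components.</p>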
<p>
<em>The output of points and transformed point coordinates was
added to the Jalview desktop in v2.7.</em> <em>The Reset button
and Change Parameters menu were added in Jalview 2.8.</em> <em>Support
for PAM250 based PCA was added in Jalview 2.8.1.</em>
</p>