</p>
<p>Like the <a href="pca.html">PCA</a> function in Jalview, PaSiMap
analysis creates a spatial representation of
- how similar sequences are within a selected group, or all of the sequences in an
- alignment. After the calculation finishes, a 3D viewer displays the
- set of sequences as points in 'similarity space', and similar
- sequences tend to lie near each other in the space.</p>
- <p>Similarities in the PaSiMap calculation are calculated from
- pairwise alignments of all pairs of input sequences. Since this can be
- very time consuming, the maximum number sequences that can be used to
- calculate a PaSiMap is limited by available time. Jalview will provide
+ how similar sequences are within a selected group, or all of
+ the sequences in an alignment. However, instead of using
+ similarities calculated from the current alignment, PaSiMap
+ calculates a pairwise alignment for each pair of sequences,
+ which can take some time. After the calculation finishes, a
+ 3D viewer displays the set of sequences as points in 'similarity
+ space', and similar sequences tend to lie near each other in the
+ space.</p>
+ <p>Since similarities in the PaSiMap calculation are calculated from
+ pairwise alignments of all pairs of input sequences, the maximum number of sequences that can be used to
+ calculate a PaSiMap is limited to 20000. Jalview will provide
an estimate of how long the calculation will take, and a 'Cancel' button
- allows it to be stopped if desired.
+ so the calculation can be stopped if desired.
</p>
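<p>As a rough illustration of why the calculation can take some time: aligning
every pair among <em>n</em> sequences requires <em>n</em>(<em>n</em>-1)/2 pairwise
alignments. The Python sketch below only counts those pairs; the function name is
hypothetical and is not part of Jalview.</p>

```python
def pairwise_alignment_count(n_sequences: int) -> int:
    """Number of unordered sequence pairs, i.e. n * (n - 1) / 2.

    Hypothetical helper for illustration only; not a Jalview function.
    """
    if n_sequences < 0:
        raise ValueError("n_sequences must be non-negative")
    return n_sequences * (n_sequences - 1) // 2


if __name__ == "__main__":
    # The pair count grows quadratically with the number of sequences.
    for n in (100, 1000, 20000):
        print(n, pairwise_alignment_count(n))
```

<p>At the 20000 sequence limit this is close to 200 million pairwise alignments.</p>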
<p>
<strong>About PaSiMap</strong>
</p>
<p>
-
- Principal components analysis is a technique for examining the
- structure of complex data sets. The components are a set of
- dimensions formed from the measured values in the data set, and the
- principal component is the one with the greatest magnitude, or
- length. The sets of measurements that differ the most should lie at
- either end of this principal axis, and the other axes correspond to
- less extreme patterns of variation in the data set.</p>
-
- <p>
- <em>Calculating PCAs for aligned sequences</em><br />Jalview can
- perform PCA analysis on both proteins and nucleotide sequence
- alignments. In both cases, components are generated by an
- eigenvector decomposition of the matrix formed from pairwise similarity
- scores between each pair of sequences. The similarity score model is
- selected on the <a href="calculations.html">calculations dialog</a>, and
- may use one of the available score matrices, such as
- <a href="scorematrices.html#blosum62">BLOSUM62</a>,
- <a href="scorematrices.html#pam250">PAM250</a>, or the <a
- href="scorematrices.html#simplenucleotide">simple single
- nucleotide substitution matrix</a>, or by sequence percentage identity,
- or sequence feature similarity.
+ The PaSiMap technique has been shown to be an effective way of
+ visualising patterns of similarity amongst closely related
+ sequences (e.g. repeats, such as Titin). The approach takes as input
+ a set of pairwise alignment scores, rather than scores derived
+ from a multiple alignment. These scores are used to compute <i>q</i>,
+ which ranges between 0 (random) and 1 (high similarity). <i>q</i>
+ reflects how good the alignment is compared with an alignment
+ of two random sequences with the same amino acid composition.
+ </p><p>The matrix of <em>q</em> scores is then analysed with
+ <em>cc_analysis</em>. This method produces a spatial projection of
+ each sequence around an origin, where sequences sharing similar
+ features lie on similar projected angles to the origin, and their distance
+ is only affected by 'random variation'.
</p>
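<p>One way to read the description above is to compare sequences by the angle
between their coordinate vectors rather than by how far apart the points are.
The Python sketch below is not the <em>cc_analysis</em> method itself, and the
coordinates are made up for illustration; it also assumes that 'distance' means
distance from the origin.</p>

```python
import math


def angle_between(p, q):
    """Angle in degrees between two points, viewed as vectors from the origin."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0 or norm_q == 0:
        raise ValueError("angle is undefined for a point at the origin")
    # Clamp to [-1, 1] to guard against floating point rounding.
    cos_theta = max(-1.0, min(1.0, dot / (norm_p * norm_q)))
    return math.degrees(math.acos(cos_theta))


# Hypothetical coordinates, not real PaSiMap output.
seq_a = (0.8, 0.1, 0.0)
seq_b = (0.4, 0.05, 0.0)  # same direction as seq_a, but closer to the origin
seq_c = (0.0, 0.6, 0.2)   # points in a different direction

if __name__ == "__main__":
    print(angle_between(seq_a, seq_b))  # close to zero: same direction
    print(angle_between(seq_a, seq_c))  # large angle: different direction
```

<p>In this toy example the first two points differ only in their distance from
the origin, so the angle between them is essentially zero, while the third point
lies at a large angle to both.</p>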
- <img src="pcaviewer.png">
+ <p> <img src="pasimapviewer.png">
<p>
- <strong>The PCA Viewer</strong>
+ <strong>The PaSiMap Viewer</strong>
</p>
<p>This is an interactive display of the sequences positioned
within the similarity space, as points in a rotatable 3D
- scatterplot. The colour of each sequence point is the same as the
+ scatterplot, based on the <a href="pca.html">PCA viewer</a>.
+ The colour of each sequence point is the same as the
sequence group colours, white if no colour has been defined for the
sequence, and grey if the sequence is part of the currently selected
group. The viewer also employs depth cueing, so points appear darker
changed from the View→Background Colour.. dialog box. The File
menu allows the view to be saved (<strong>File→Save</strong>
submenu) as an EPS or PNG image or printed, and the original
- alignment data and matrix resulting from its PCA analysis to be
- retrieved. The coordinates for the whole PCA space, or just the
+ alignment data and matrix resulting from the PaSiMap analysis to be
+ retrieved. The coordinates for the whole PaSiMap space, or just the
current view may also be exported as CSV files for visualization in
another program or further analysis.
<p>
- <p>Options for coordinates export are:</p>
+ <p>Options for coordinate export allow the data to be easily imported
+ into R for further analysis. For a worked example, take a look at the
+ STAR protocol paper (Morell, submitted) and the
+ <a href="https://github.com/MorellThomas/plot_pasimap_data/releases/tag/v1.1">GitHub
+ repository</a> for scripts.</p>
<ul>
<li>Output Values - complete dump of analysis (TxT* matrix
computed from sum of scores for all pairs of aligned residues from
<li>Output Points - The eigenvector matrix - rows correspond to
sequences, columns correspond to each dimension in the PCA</li>
<li>Transformed Points - The 3D coordinates for each sequence
- as shown in the PCA plot</li>
+ as shown in the PaSiMap plot</li>
</ul>
-
- <p>
- A tool tip gives the sequence ID corresponding to a point in the
- space, and clicking a point toggles the selection of the
- corresponding sequence in the associated alignment window views.
- <!-- Rectangular region
-based selection is also possible, by holding the 'S' key whilst
-left-clicking and dragging the mouse over the display. -->
- By default, points are only associated with the alignment view from
- which the PCA was calculated, but this may be changed via the <strong>View→Associate
- Nodes</strong> sub-menu.
- </p>
- <p>
- Initially, the display shows the first three components of the
- similarity space, but any eigenvector can be used by changing the
- selected dimension for the x, y, or z axis through each one's menu
- located below the 3d display. The <strong><em>Reset</em></strong>
- button will reset axis and rotation settings to their defaults.
- </p>
<p>
- <p>
- <em>The output of points and transformed point coordinates was
- added to the Jalview desktop in v2.7.</em> <em>The Reset button
- and Change Parameters menu were added in Jalview 2.8.</em> <em>Support
- for PAM250 based PCA was added in Jalview 2.8.1.</em><em>In Jalview 2.11, support for saving and restoring PCAs in Project files was added, and the Change parameters menu removed.</em>
- </p>
- <p>
- <strong>Reproducing PCA calculations performed with older
- Jalview releases</strong> Jalview 2.10.2 included a revised PCA
- implementation which treated Gaps and non-standard residues in the
- same way as a matrix produced with the method described in the paper
- by G. Casari, C. Sander and A. Valencia. Structural Biology volume
- 2, no. 2, February 1995 (<a
- href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=7749921">pubmed</a>)
- and implemented at the SeqSpace server at the EBI. To reproduce
- calculations performed with earlier Jalview releases it is necessary
- to execute the following Groovy script:
- <pre>
- jalview.analysis.scoremodels.ScoreMatrix.scoreGapAsAny=true
- jalview.analysis.scoremodels.ScoreModels.instance.BLOSUM62.@matrix[4][1]=3
- </pre>
- This script enables the legacy PCA mode where gaps were treated as
- 'X', and to modify the BLOSUM62 matrix so it is asymmetric for
- mutations between C to R (this was a typo in the original Jalview
- BLOSUM62 matrix which was fixed in 2.10.2).
- </p>
+ Please see the original PaSiMap publication: <br/>Su K,
+ Mayans O, Diederichs K and Fleming JR (2022) "Pairwise sequence similarity
+ mapping with PaSiMap: Reclassification of immunoglobulin domains from titin as case study" in
+ <em>Computational and Structural Biotechnology Journal</em> <strong>20</strong> 5409-5419<br/>
+ <a href="https://doi.org/10.1016/j.csbj.2022.09.034">https://doi.org/10.1016/j.csbj.2022.09.034</a></p>
</body>
</html>