From 84393a0dcdd5056f3b75b28ecce838c409d9b6ff Mon Sep 17 00:00:00 2001
From: Jim Procter
Date: Mon, 9 Sep 2024 18:12:06 +0100
Subject: [PATCH] JAL-4418 initial pasimap documentation and updated release notes for recent merges

---
 help/help/html/calculations/pasimap.html   | 125 ++++++++++------------------
 help/markdown/releases/release-2_11_4_0.md |   7 +-
 2 files changed, 47 insertions(+), 85 deletions(-)

diff --git a/help/help/html/calculations/pasimap.html b/help/help/html/calculations/pasimap.html
index b24377a..249c528 100755
--- a/help/help/html/calculations/pasimap.html
+++ b/help/help/html/calculations/pasimap.html
@@ -34,51 +34,46 @@

Like the PCA function in Jalview, PaSiMap analysis creates a spatial representation of - how similar sequences are within a selected group, or all of the sequences in an - alignment. After the calculation finishes, a 3D viewer displays the - set of sequences as points in 'similarity space', and similar - sequences tend to lie near each other in the space.

-

Similarities in the PaSiMap calculation are calculated from - pairwise alignments of all pairs of input sequences. Since this can be - very time consuming, the maximum number sequences that can be used to - calculate a PaSiMap is limited by available time. Jalview will provide + how similar sequences are within a selected group, or all of + the sequences in an alignment. However, instead of using + similarities calculated from the current alignment, PaSiMap + calculates a pairwise alignment for each pair of sequences, + which can take some time. After the calculation finishes, a + 3D viewer displays the set of sequences as points in 'similarity + space', and similar sequences tend to lie near each other in the + space.

+

Since similarities in the PaSiMap calculation are calculated from + pairwise alignments of all pairs of input sequences, the maximum number of sequences that can be used to + calculate a PaSiMap is limited to 20000. Jalview will provide an estimate of how long the calculation will take, and a 'Cancel' button - allows it to be stopped if desired. + so the calculation can be stopped if desired.
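Every pair of input sequences is aligned, so the work grows quadratically: n sequences need n(n-1)/2 pairwise alignments. The sketch below illustrates that arithmetic and a naive time estimate. The per-alignment cost is an assumed input, and both function names are hypothetical. This is not how Jalview computes its own estimate.

```python
def pair_count(n: int) -> int:
    """Number of unordered sequence pairs, i.e. pairwise alignments needed."""
    return n * (n - 1) // 2


def naive_estimate_seconds(n: int, seconds_per_alignment: float) -> float:
    """Hypothetical estimate: number of pairs times an assumed average cost per alignment."""
    return pair_count(n) * seconds_per_alignment


if __name__ == "__main__":
    for n in (100, 1000, 20000):
        print(n, pair_count(n))
```

At the 20000-sequence limit this comes to 199,990,000 alignments. At an assumed 1 ms per alignment, that is roughly 55 hours of computation.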

About PaSiMap

- - Principal components analysis is a technique for examining the - structure of complex data sets. The components are a set of - dimensions formed from the measured values in the data set, and the - principal component is the one with the greatest magnitude, or - length. The sets of measurements that differ the most should lie at - either end of this principal axis, and the other axes correspond to - less extreme patterns of variation in the data set.

- -

- Calculating PCAs for aligned sequences
Jalview can - perform PCA analysis on both proteins and nucleotide sequence - alignments. In both cases, components are generated by an - eigenvector decomposition of the matrix formed from pairwise similarity - scores between each pair of sequences. The similarity score model is - selected on the
calculations dialog, and - may use one of the available score matrices, such as - BLOSUM62, - PAM250, or the simple single - nucleotide substitution matrix, or by sequence percentage identity, - or sequence feature similarity. + The PaSiMap technique has been shown to be an effective way of + visualising patterns of similarity amongst closely related + sequences (e.g. repeats, such as Titin). The approach takes as input + a set of pairwise alignment scores, rather than from scores derived + from a multiple alignment. These scores are used to compute q - + which ranges between 0 (random) and 1 (high similarity). q + reflects how good the alignment is as compared to an alignment + of two random sequences with the same amino acid composition. +

The matrix of q scores is then analysed with + cc_analysis. This method produces a spatial projection of + each sequence around an origin, where sequences sharing similar + features lie at similar angles from the origin, and each sequence's distance + from the origin is affected only by 'random variation'.

- +

- The PCA Viewer + The PaSiMap Viewer

This is an interactive display of the sequences positioned within the similarity space, as points in a rotateable 3D - scatterplot. The colour of each sequence point is the same as the + scatterplot, based on the PCA viewer. + The colour of each sequence point is the same as the sequence group colours, white if no colour has been defined for the sequence, and grey if the sequence is part of the currently selected group. The viewer also employs depth cueing, so points appear darker @@ -95,12 +90,16 @@ changed from the View→Background Colour.. dialog box. The File menu allows the view to be saved (File→Save submenu) as an EPS or PNG image or printed, and the original - alignment data and matrix resulting from its PCA analysis to be - retrieved. The coordinates for the whole PCA space, or just the + alignment data and matrix resulting from the PaSiMap analysis to be + retrieved. The coordinates for the whole PaSiMap space, or just the current view may also be exported as CSV files for visualization in another program or further analysis.

-

Options for coordinates export are:

+

The exported coordinates can easily be imported + into R for further analysis. For a worked example, see the + STAR protocol paper (Morrell, submitted) and the + GitHub + repository of scripts.
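Angles between points reflect shared features, while distance from the origin reflects random variation. A natural first step after export is therefore to compute angles and vector lengths.

This hypothetical Python sketch (R is the suggested tool, but the steps are the same) assumes a CSV layout of a header row followed by `id,x,y,z` rows. Jalview's actual export format is not described here, so adjust the parsing to match the file you export.

```python
import csv
import io
import math
from typing import Dict, List


def read_coordinates(text: str) -> Dict[str, List[float]]:
    """Parse CSV text assumed to hold a header row, then 'id,x,y,z' rows (hypothetical layout)."""
    rows = csv.reader(io.StringIO(text))
    next(rows)  # skip the assumed header row
    return {row[0]: [float(v) for v in row[1:]] for row in rows if row}


def length(v: List[float]) -> float:
    """Distance of a point from the origin."""
    return math.sqrt(sum(c * c for c in v))


def angle_degrees(u: List[float], v: List[float]) -> float:
    """Angle between two points as seen from the origin."""
    cos = sum(a * b for a, b in zip(u, v)) / (length(u) * length(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


if __name__ == "__main__":
    coords = read_coordinates("id,x,y,z\nseqA,1,0,0\nseqB,0,2,0\nseqC,3,0,0\n")
    print(angle_degrees(coords["seqA"], coords["seqC"]), angle_degrees(coords["seqA"], coords["seqB"]))
```

In this toy input, seqA and seqC lie along the same direction but at different distances from the origin. Under the interpretation above, they share the same systematic features and differ only in the amount of random variation.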

- -

- A tool tip gives the sequence ID corresponding to a point in the - space, and clicking a point toggles the selection of the - corresponding sequence in the associated alignment window views. - - By default, points are only associated with the alignment view from - which the PCA was calculated, but this may be changed via the View→Associate - Nodes sub-menu. -

-

- Initially, the display shows the first three components of the - similarity space, but any eigenvector can be used by changing the - selected dimension for the x, y, or z axis through each one's menu - located below the 3d display. The Reset - button will reset axis and rotation settings to their defaults. -

-

- The output of points and transformed point coordinates was - added to the Jalview desktop in v2.7. The Reset button - and Change Parameters menu were added in Jalview 2.8. Support - for PAM250 based PCA was added in Jalview 2.8.1.In Jalview 2.11, support for saving and restoring PCAs in Project files was added, and the Change parameters menu removed. -

-

- Reproducing PCA calculations performed with older - Jalview releases Jalview 2.10.2 included a revised PCA - implementation which treated Gaps and non-standard residues in the - same way as a matrix produced with the method described in the paper - by G. Casari, C. Sander and A. Valencia. Structural Biology volume - 2, no. 2, February 1995 (pubmed) - and implemented at the SeqSpace server at the EBI. To reproduce - calculations performed with earlier Jalview releases it is necessary - to execute the following Groovy script: -

-    jalview.analysis.scoremodels.ScoreMatrix.scoreGapAsAny=true
-    jalview.analysis.scoremodels.ScoreModels.instance.BLOSUM62.@matrix[4][1]=3
-    
- This script enables the legacy PCA mode where gaps were treated as - 'X', and to modify the BLOSUM62 matrix so it is asymmetric for - mutations between C to R (this was a typo in the original Jalview - BLOSUM62 matrix which was fixed in 2.10.2). -

+ Please see the original PaSiMap publication:
Su K, + Mayans O, Diederichs K and Fleming JR (2022) "Pairwise sequence similarity + mapping with PaSiMap: Reclassification of immunoglobulin domains from titin as case study" in + Computational and Structural Biotechnology Journal 20:5409-5419
+ https://doi.org/10.1016/j.csbj.2022.09.034

diff --git a/help/markdown/releases/release-2_11_4_0.md b/help/markdown/releases/release-2_11_4_0.md
index ff5dd45..b47ebc3 100644
--- a/help/markdown/releases/release-2_11_4_0.md
+++ b/help/markdown/releases/release-2_11_4_0.md
@@ -6,20 +6,23 @@ channel: "develop"
 ## New Features
--- Calculate PASiMap projection for sequences - ported by Thomas Morell ( U. Konstanz)
+- Calculate PaSiMap projection for sequences - ported by Thomas Morell (U. Konstanz)
 - Consensus secondary structure visualization for alignments
 - Show data source for 'reference annotation' from 3D structure (e.g. Secondary Structure)
 - Calculate tree or PCA using secondary structure annotation
 - allow adjustment of gap opening, extension, and score model for built in pairwise alignment
+- Pairwise alignment can be performed with different substitution matrices
+- PCA, pairwise alignment, tree and PaSiMap window titles include the matrix and additional parameters used
 ### Experimental Features (enable via Desktop's tools menu)
+- PCA and PaSiMap panels allow right-click and drag to select several sequences at once
 - Jalview sensibly handles opening or a drag'n'drop of several .features, .annotations and .newick files onto an alignment
 ### development and deployment
--- Add a command-line wrapper script to macOS bundle, linux and Windows installations (bash, powershell and .bat wrappers)
+- Add a command-line wrapper script to macOS bundle, linux and Windows installations (bash, powershell and .bat wrappers)
 - Redirect stdout and stderr to file when launched from getdown
 - Add error message when launched directly from Jalview Installer DMG volume
 - Allow jalview auto-updates to download to and work from separate user-space directory
-- 
1.7.10.2