--- /dev/null
+
+ SOV program measures secondary structure prediction accuracy
+
+ Copyright by Adam Zemla (11/16/1996)
+ Email: adamz@llnl.gov
+
+-------------------------------------------------------------------------------
+
+ Usage: sov <input_data>
+
+ Readme file: README.sov
+
+-------------------------------------------------------------------------------
+
+ SOV
+
+ Measure Description
+
+ Adam Zemla & Ceslovas Venclovas
+
+-------------------------------------------------------------------------------
+
+ Secondary structure prediction accuracy evaluation
+ SOV (Segment OVerlap) measure
+
+ Introduction
+
+
+Evaluating the accuracy of a secondary structure prediction is not as simple
+a task as it may seem. The traditionally used Q3 measure, which gives the
+overall percentage of residues predicted correctly, can be very misleading.
+Measures that concentrate on how well secondary structure elements are
+predicted, rather than individual residues, appear to better reflect the
+nature of three-dimensional protein structure. To make the evaluation of
+secondary structure prediction more structurally meaningful, we have defined
+the segment overlap measure (SOV). The SOV measure, first proposed by Rost
+et al. (JMB, 1994, 235, 13-26), is redefined here. The full scientific
+description of the current version of the SOV measure, together with a
+discussion of secondary structure prediction accuracy evaluation, is
+published in Zemla et al., PROTEINS: Structure, Function, and Genetics,
+34, 1999, pp. 220-223.
+
+The aim of this program is to make it possible to evaluate predictions and
+to compare the performance of prediction accuracy measures. Given both the
+predicted and the observed secondary structure assignments, the program
+evaluates the accuracy of the secondary structure prediction. Accuracy is
+reported for the overall three-state prediction (helix, strand, coil) and
+for each conformational state separately. The measures used are:
+
+ Q3 - traditional per-residue prediction accuracy Qindex
+ SOV - Segment OVerlap measure (the definition of Zemla et al. - PROTEINS:
+ Structure, Function, and Genetics, 34, 1999, pp. 220-223)
+
+
+
+ Q3 measure
+
+Qindex (Qhelix, Qstrand, Qcoil, Q3) gives the percentage of residues predicted
+correctly as helix, strand or coil, or over all three conformational states.
+The definition of Qindex is as follows.
+
+For a single conformational state:
+
+              number of residues correctly predicted in state i
+        Qi = --------------------------------------------------- * 100,
+                   number of residues observed in state i
+
+
+where i is either helix, strand or coil.
+
+For all three states:
+
+              number of residues correctly predicted
+        Q3 = ---------------------------------------- * 100
+                      number of all residues
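+
+The two definitions above can be sketched in a few lines of Python. This is
+an illustration only, not the code of the sov program: the function name
+q_index is hypothetical, and returning 100.0 for a state with no observed
+residues is an assumption taken from the STRAND column of the sample output
+in this README.
+

```python
def q_index(observed, predicted, state=None):
    """Qi for one state, or Q3 over all residues when state is None."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    pairs = list(zip(observed, predicted))
    if state is not None:
        # Keep only residues that are observed in the requested state.
        pairs = [(o, p) for o, p in pairs if o == state]
    if not pairs:
        # Assumption: an empty state is reported as 100.0.
        return 100.0
    correct = sum(1 for o, p in pairs if o == p)
    return 100.0 * correct / len(pairs)


observed = "CCCHHHCC"
predicted = "CCHHHHCC"
print(q_index(observed, predicted))       # 87.5
print(q_index(observed, predicted, "H"))  # 100.0
print(q_index(observed, predicted, "E"))  # 100.0 (no strand observed)
print(q_index(observed, predicted, "C"))  # 80.0
```

+
+The strings used here are the observed and predicted assignments from the
+input examples in this README.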
+
+
+
+ SOV measure
+
+Segment OVerlap quantity measure for a single conformational state:
+
+
+                 1    SUM   MINOV(S1;S2) + DELTA(S1;S2)
+     SOV(i) =  ----   SUM   --------------------------- * LEN(S1)
+               N(i)   SUM          MAXOV(S1;S2)
+                      S(i)
+
+
+S1 and S2 are the observed and predicted secondary structure segments
+ (in state i, which can be either H, E or C);
+LEN(S1) is the number of residues in the segment S1;
+MINOV(S1;S2) is the length of actual overlap of S1 and S2, i.e.
+ the extent for which both segments have residues in state i,
+ for example H;
+MAXOV(S1;S2) is the length of the total extent for which either of
+ the segments S1 or S2 has a residue in state i;
+DELTA(S1;S2) is the integer value defined as being equal to the
+ MIN{(MAXOV(S1;S2)- MINOV(S1;S2)); MINOV(S1;S2);
+ INT(LEN(S1)/2); INT(LEN(S2)/2)}
+
+The sum is taken over S(i), the set of all pairs of segments {S1;S2}
+ where S1 and S2 have at least one residue in state i
+ in common;
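+
+As a worked example of the quantities defined above, the following Python
+sketch computes MINOV, MAXOV and DELTA for one pair of overlapping segments.
+The function name overlap_terms and the representation of a segment as an
+inclusive (start, end) pair of residue numbers are assumptions made for this
+illustration.
+

```python
def overlap_terms(s1, s2):
    """Return (MINOV, MAXOV, DELTA) for two inclusive (start, end) segments."""
    (a1, b1), (a2, b2) = s1, s2
    # MINOV: extent where both segments are in the state.
    minov = min(b1, b2) - max(a1, a2) + 1
    if minov <= 0:
        raise ValueError("segments have no residue in common")
    # MAXOV: extent where either segment is in the state.
    maxov = max(b1, b2) - min(a1, a2) + 1
    len1 = b1 - a1 + 1
    len2 = b2 - a2 + 1
    # DELTA: the smallest of the four values listed in the definition.
    delta = min(maxov - minov, minov, len1 // 2, len2 // 2)
    return minov, maxov, delta


# Observed helix at residues 4-6, predicted helix at residues 3-6.
print(overlap_terms((4, 6), (3, 6)))  # (3, 4, 1)
```

+
+For this pair the term of the sum is (3 + 1) / 4 * 3 = 3, so the three
+observed helix residues are counted in full even though the predicted
+helix is one residue longer.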
+
+N(i) is the number of residues in state i defined as follows:
+
+            SUM              SUM
+     N(i) = SUM LEN(S1)  +   SUM LEN(S1)
+            SUM              SUM
+            S(i)             S'(i)
+
+The two sums are taken over S(i) and S'(i):
+
+S(i) is the set of all the pairs of segments {S1;S2},
+ where S1 and S2 have at least one residue in state i
+ in common;
+
+S'(i) is the set of segments S1 that do not produce
+ any segment pair.
+
+
+Segment OVerlap quantity measure for all three states:
+
+
+              1    SUM  SUM   MINOV(S1;S2) + DELTA(S1;S2)
+      SOV =  ---   SUM  SUM   --------------------------- * LEN(S1)
+              N    SUM  SUM          MAXOV(S1;S2)
+                    i   S(i)
+
+where the normalization value N is a sum of N(i) over all three
+conformational states (i = HELIX, STRAND, COIL):
+
+          SUM
+      N = SUM N(i)
+          SUM
+           i
+
+
+"SOV observed" indicates that S1 is the observed segment and S2 is the
+predicted one. "SOV predicted" indicates that S1 is the predicted segment
+and S2 is the observed one.
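+
+The SOV definitions above can be put together in a short Python sketch. It
+follows the formulas as written in this README and is not the source of the
+sov program, which may differ in details. The names segments, sov_terms and
+sov are hypothetical, and returning 100.0 when N is zero is an assumption
+based on the STRAND column of the sample output.
+

```python
from itertools import groupby


def segments(ss, state):
    """Inclusive 0-based (start, end) runs of `state` in the string `ss`."""
    out, pos = [], 0
    for key, run in groupby(ss):
        n = len(list(run))
        if key == state:
            out.append((pos, pos + n - 1))
        pos += n
    return out


def sov_terms(s1_string, s2_string, state):
    """Return (sum over S(i) of the SOV terms, N(i)) for one state."""
    total, norm = 0.0, 0
    segs2 = segments(s2_string, state)
    for a1, b1 in segments(s1_string, state):
        len1 = b1 - a1 + 1
        paired = False
        for a2, b2 in segs2:
            minov = min(b1, b2) - max(a1, a2) + 1
            if minov <= 0:
                continue  # no residue in common: not a pair in S(i)
            paired = True
            maxov = max(b1, b2) - min(a1, a2) + 1
            len2 = b2 - a2 + 1
            delta = min(maxov - minov, minov, len1 // 2, len2 // 2)
            total += (minov + delta) / maxov * len1
            norm += len1  # LEN(S1) is counted once per pair in S(i)
        if not paired:
            norm += len1  # S1 belongs to S'(i)
    return total, norm


def sov(s1_string, s2_string, states="HEC"):
    """SOV over `states`; pass a single letter to get SOV(i)."""
    total, norm = 0.0, 0
    for state in states:
        t, n = sov_terms(s1_string, s2_string, state)
        total += t
        norm += n
    # Assumption: with no residues to normalise by, report 100.0.
    return 100.0 if norm == 0 else 100.0 * total / norm


observed, predicted = "CCCHHHCC", "CCHHHHCC"
print(sov(observed, predicted))        # SOV observed: 100.0
print(sov(observed, predicted, "C"))   # SOV(coil):    100.0
print(sov("HHHHHHHH", "HHHCCHHH"))     # SOV observed: 50.0
print(sov("HHHCCHHH", "HHHHHHHH"))     # SOV predicted: 37.5
```

+
+The last two calls swap the roles of S1 and S2 for a helix that is predicted
+as two fragments. The result changes from 50.0 to 37.5, which shows that the
+measure is not symmetric in the observed and predicted assignments.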
+
+
+-------------------------------------------------------------------------------
+
+ Data format of prediction
+
+Data for secondary structure prediction accuracy evaluation should be prepared
+in COLUMN format:
+
+ First column: protein sequence (AA) in one-letter code
+ Second column: observed (OSEC) secondary structure
+ Third column: predicted (PSEC) secondary structure
+
+Secondary structure conformational states can be helix (H), strand (E) or
+coil (C). Note: 'L' can be used instead of 'C' for coil, but 'C' and 'L'
+must not be mixed in the same data file. Columns are delimited by spaces.
+
+
+Example.1 of input data format:
+*******************************
+
+AA OSEC PSEC
+M C C
+Q C C
+T C H
+R H H
+S H H
+I H H
+G C C
+V C C
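+
+A minimal Python sketch of reading this column format is given below. The
+function name parse_columns is hypothetical, and two choices are assumptions
+made for the illustration: 'L' is normalised to 'C', and any state other
+than H, E or C is rejected. The sov program itself may handle these cases
+differently.
+

```python
def parse_columns(text):
    """Parse space-delimited AA / OSEC / PSEC rows into three strings."""
    aa, osec, psec = [], [], []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[:3] == ["AA", "OSEC", "PSEC"]:
            continue  # skip blank lines and the header row
        residue, obs, pred = fields[:3]
        # Assumption: 'L' is an alternative spelling of coil 'C'.
        obs = "C" if obs == "L" else obs
        pred = "C" if pred == "L" else pred
        if obs not in "HEC" or pred not in "HEC":
            raise ValueError("unexpected state in line: " + line)
        aa.append(residue)
        osec.append(obs)
        psec.append(pred)
    return "".join(aa), "".join(osec), "".join(psec)


example = """AA OSEC PSEC
M C C
Q C C
T C H
R H H
S H H
I H H
G C C
V C C
"""
print(parse_columns(example))  # ('MQTRSIGV', 'CCCHHHCC', 'CCHHHHCC')
```
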
+
+
+-------------------------------------------------------------------------------
+
+Three other formats of the input data are also allowed:
+
+Example.2 of input data format:
+*******************************
+
+ AA OSEC PSEC NUM
+ M C C 1
+ Q C C 2
+ T C H 3
+ R H H 4
+ S H H 5
+ I H H 6
+ G C C 7
+ V C C 8
+
+
+Example.3 of input data format:
+*******************************
+
+>OSEQ
+CCCHHHCC
+>PSEQ
+CCHHHHCC
+>AA
+MQTRSIGV
+
+
+Example.4 of input data format:
+*******************************
+
+SSP 1 M C C
+SSP 2 Q C C
+SSP 3 T C H
+SSP 4 R H H
+SSP 5 S H H
+SSP 6 I H H
+SSP 7 G C C
+SSP 8 V C C
+
+-------------------------------------------------------------------------------
+
+Output:
+*******
+
+ SECONDARY STRUCTURE PREDICTION
+ NUMBER OF RESIDUES PREDICTED: LENGTH = 8
+ AA OSEC PSEC NUM
+ M C C 1
+ Q C C 2
+ T C H 3
+ R H H 4
+ S H H 5
+ I H H 6
+ G C C 7
+ V C C 8
+ -----------------------
+
+ SECONDARY STRUCTURE PREDICTION ACCURACY EVALUATION. N_AA = 8
+
+              ALL    HELIX   STRAND     COIL
+
+ Q3  :      87.5    100.0    100.0     80.0
+
+ SOV :     100.0    100.0    100.0    100.0
+
+ -----------------------
+
+
+