SOV program measures secondary structure prediction accuracy
Copyright by Adam Zemla (11/16/1996)
-------------------------------------------------------------------------------
Usage: sov <input_data>
Readme file: README.sov
-------------------------------------------------------------------------------
Adam Zemla & Ceslovas Venclovas
-------------------------------------------------------------------------------
Secondary structure prediction accuracy evaluation
SOV (Segment OVerlap) measure

Evaluating the accuracy of secondary structure prediction is not as simple a
task as it may seem. The traditionally used Q3 measure, which gives the
overall percentage of residues predicted correctly, can be very misleading.
Measures that concentrate on how well secondary structure elements are
predicted, rather than individual residues, seem to reflect the nature of
three-dimensional protein structure better. In an effort to make the
evaluation of secondary structure prediction more structurally meaningful, we
have defined the segment overlap measure (SOV). The SOV measure, first
proposed by Rost et al. (JMB, 1994, 235, 13-26), is redefined here. The paper
containing the full scientific description of the current version of the SOV
measure, and a discussion of secondary structure prediction accuracy
evaluation, was published by Zemla et al. (PROTEINS: Structure, Function, and
Genetics, 34, 1999, pp. 220-223).

The aim of this program is to make it possible to evaluate predictions and to
compare the performance of prediction accuracy measures. Given both predicted
and observed secondary structure assignments, the program evaluates the
accuracy of the secondary structure prediction. The evaluation is done for the
overall three-state (helix, strand, coil) prediction accuracy and for each
single conformational state. The measures used are:

Q3  - traditional per-residue prediction accuracy Qindex
SOV - Segment OVerlap measure (the definition of Zemla et al. - PROTEINS:
      Structure, Function, and Genetics, 34, 1999, pp. 220-223)

Qindex: (Qhelix, Qstrand, Qcoil, Q3) gives the percentage of residues predicted
correctly as helix, strand, or coil, or for all three conformational states.
The definition of Qindex is as follows.

For a single conformational state:

          number of residues correctly predicted in state i
    Qi = --------------------------------------------------- * 100,
               number of residues observed in state i

where i is either helix, strand or coil.

For all three states:

          number of residues correctly predicted
    Q3 = ---------------------------------------- * 100
                  number of all residues
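
As an illustration of the Qindex definition above, here is a minimal Python
sketch. The function name q_index and the example strings are hypothetical and
are not part of the sov program. Returning None for a state that never occurs
in the observed structure is an assumption of this sketch, not documented
behavior of sov.

```python
def q_index(observed, predicted):
    """Per-state Qi and overall Q3 (in percent) for two equal-length strings.

    Both strings use the states H (helix), E (strand) and C (coil).
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")

    result = {}
    for state in "HEC":
        # Denominator: residues observed in this state.
        n_observed = sum(1 for o in observed if o == state)
        # Numerator: residues observed AND predicted in this state.
        n_correct = sum(
            1 for o, p in zip(observed, predicted) if o == state and p == state
        )
        # Assumption of this sketch: undefined Qi is reported as None.
        result[state] = 100.0 * n_correct / n_observed if n_observed else None

    # Q3: all correctly predicted residues over all residues.
    n_all_correct = sum(1 for o, p in zip(observed, predicted) if o == p)
    result["Q3"] = 100.0 * n_all_correct / len(observed)
    return result


if __name__ == "__main__":
    # Hypothetical 10-residue example, not taken from the sov distribution.
    scores = q_index("HHHHCCEEEC", "HHHCCCEEEE")
    print(scores["Q3"], scores["H"], scores["E"])  # 80.0 75.0 100.0
```

In this example 8 of 10 residues agree, so Q3 is 80.0, while the coil state
scores lower (2 of 3 observed coil residues) than the helix and strand states.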

Segment OVerlap quantity measure for a single conformational state:

                1     SUM   MINOV(S1;S2) + DELTA(S1;S2)
    SOV(i) =   ---    SUM   --------------------------- * LEN(S1)
               N(i)   SUM          MAXOV(S1;S2)
                      S(i)

where:

S1 and S2 are the observed and predicted secondary structure segments
   (in state i, which can be either H, E or C);
LEN(S1) is the number of residues in the segment S1;
MINOV(S1;S2) is the length of the actual overlap of S1 and S2, i.e.
   the extent for which both segments have residues in state i;
MAXOV(S1;S2) is the length of the total extent for which either of
   the segments S1 or S2 has a residue in state i;
DELTA(S1;S2) is the integer value defined as being equal to
   MIN{(MAXOV(S1;S2) - MINOV(S1;S2)); MINOV(S1;S2);
       INT(LEN(S1)/2); INT(LEN(S2)/2)}

The sum is taken over S(i), all the pairs of segments {S1;S2}
   where S1 and S2 have at least one residue in state i in common.

N(i) is the number of residues in state i, defined as follows:

    N(i) =  SUM   LEN(S1)  +  SUM    LEN(S1)
            S(i)              S'(i)

The two sums are taken over S(i) and S'(i):
S(i) is the set of all the pairs of segments {S1;S2}
   where S1 and S2 have at least one residue in state i in common;
S'(i) is the set of segments S1 that do not produce any segment pair.
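
The pair-level quantities defined above can be sketched in Python as follows.
Segments are represented here as (start, end) tuples with inclusive residue
indices; this representation and all function names are assumptions made for
illustration and are not taken from the sov source code.

```python
def seg_len(seg):
    """LEN: number of residues in a segment given as (start, end), inclusive."""
    start, end = seg
    return end - start + 1


def minov(s1, s2):
    """MINOV: length of the actual overlap (0 if the segments do not overlap)."""
    return max(0, min(s1[1], s2[1]) - max(s1[0], s2[0]) + 1)


def maxov(s1, s2):
    """MAXOV: total extent covered by either segment.

    Only meaningful for overlapping segments, which is the only case
    in which a pair {S1;S2} enters the sum.
    """
    return max(s1[1], s2[1]) - min(s1[0], s2[0]) + 1


def delta(s1, s2):
    """DELTA: integer allowance for small shifts at the segment ends."""
    return min(
        maxov(s1, s2) - minov(s1, s2),
        minov(s1, s2),
        seg_len(s1) // 2,  # INT(LEN(S1)/2)
        seg_len(s2) // 2,  # INT(LEN(S2)/2)
    )


def pair_term(s1, s2):
    """Contribution of one overlapping pair to the sum in SOV(i)."""
    return (minov(s1, s2) + delta(s1, s2)) / maxov(s1, s2) * seg_len(s1)


if __name__ == "__main__":
    # Hypothetical pair: observed helix at residues 0-3, predicted at 0-2.
    s1, s2 = (0, 3), (0, 2)
    print(minov(s1, s2), maxov(s1, s2), delta(s1, s2))  # 3 4 1
    print(pair_term(s1, s2))  # 4.0
```

In this example the predicted helix is one residue short, but DELTA is 1, so
the pair contributes the full LEN(S1) = 4 to the sum. This shows how DELTA
tolerates small deviations at segment ends.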

Segment OVerlap quantity measure for all three states:

             1    SUM  SUM    MINOV(S1;S2) + DELTA(S1;S2)
    SOV =   ---   SUM  SUM    --------------------------- * LEN(S1)
             N    SUM  SUM           MAXOV(S1;S2)
                   i   S(i)

where the normalization value N is the sum of N(i) over all three
conformational states (i = HELIX, STRAND, COIL):

    N = N(HELIX) + N(STRAND) + N(COIL)

SOV observed indicates that S1 is the observed fragment and S2 is the
predicted one.
SOV predicted indicates that S1 is the predicted fragment and S2 is the
observed one.
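
The following Python sketch puts the per-state and three-state definitions
together. It is an illustration written from the formulas in this file, not
the code of the sov program. The function names, the (start, end) segment
representation, and the example strings are hypothetical. Scaling by 100 is an
assumption, chosen so that the values are percentages like those in the
program output; None is returned when the normalization value is zero.

```python
from itertools import groupby


def segments(ss, state):
    """Return (start, end) inclusive index pairs of all runs of `state`."""
    segs, pos = [], 0
    for key, run in groupby(ss):
        n = len(list(run))
        if key == state:
            segs.append((pos, pos + n - 1))
        pos += n
    return segs


def sov_terms(ref, other, state):
    """Return (weighted sum, N(i)) for one state; S1 comes from `ref`."""
    total, norm = 0.0, 0
    other_segs = segments(other, state)
    for s1 in segments(ref, state):
        len1 = s1[1] - s1[0] + 1
        paired = False
        for s2 in other_segs:
            mn = min(s1[1], s2[1]) - max(s1[0], s2[0]) + 1  # MINOV
            if mn <= 0:
                continue  # no residue in common, so not a pair in S(i)
            paired = True
            mx = max(s1[1], s2[1]) - min(s1[0], s2[0]) + 1  # MAXOV
            len2 = s2[1] - s2[0] + 1
            d = min(mx - mn, mn, len1 // 2, len2 // 2)  # DELTA
            total += (mn + d) / mx * len1
            norm += len1  # LEN(S1) counted once per pair in S(i)
        if not paired:
            norm += len1  # S1 belongs to S'(i)
    return total, norm


def sov(ref, other, states="HEC"):
    """Per-state and overall SOV in percent (factor 100 is an assumption)."""
    result, total_all, norm_all = {}, 0.0, 0
    for state in states:
        total, norm = sov_terms(ref, other, state)
        result[state] = 100.0 * total / norm if norm else None
        total_all += total
        norm_all += norm
    result["ALL"] = 100.0 * total_all / norm_all if norm_all else None
    return result


if __name__ == "__main__":
    observed, predicted = "HHHHCCEEEC", "HHHCCCEEEE"  # hypothetical data
    print(sov(observed, predicted)["ALL"])  # SOV observed:  90.0
    print(sov(predicted, observed)["ALL"])  # SOV predicted: 100.0
```

Swapping the two arguments switches between SOV observed and SOV predicted.
In this example the single observed coil residue at the end has no overlapping
predicted coil segment, so it enters N(COIL) through S'(i) and lowers SOV
observed, while SOV predicted is unaffected.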
-------------------------------------------------------------------------------
Data format of prediction

Data for secondary structure prediction accuracy evaluation should be prepared
as follows:

First column:  protein sequence (AA) in one-letter code
Second column: observed (OSEC) secondary structure
Third column:  predicted (PSEC) secondary structure
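
A reader for this three-column layout might look like the Python sketch below.
It is not the parser used by sov. The function name is hypothetical, and
splitting columns on whitespace is an assumption of the sketch. Lines that do
not consist of exactly three one-character fields are skipped, which is also
an assumption. The 'L' alias for coil described in this section is mapped to
'C', and a file mixing 'C' and 'L' is rejected.

```python
def parse_three_column(text):
    """Parse 'AA OSEC PSEC' lines into three strings of equal length."""
    aa, osec, psec = [], [], []
    for line in text.splitlines():
        fields = line.split()  # assumption: whitespace-delimited columns
        if len(fields) != 3 or any(len(f) != 1 for f in fields):
            continue  # assumption: skip blank lines and anything else
        aa.append(fields[0])
        osec.append(fields[1])
        psec.append(fields[2])

    # 'L' may replace 'C' for coil, but the two must not be mixed.
    used = set(osec) | set(psec)
    if "C" in used and "L" in used:
        raise ValueError("coil must be written as either 'C' or 'L', not both")

    def normalize(column):
        # Only the structure columns are normalized; the amino acid
        # column keeps 'L' (leucine) unchanged.
        return "".join(column).replace("L", "C")

    return "".join(aa), normalize(osec), normalize(psec)


if __name__ == "__main__":
    # Hypothetical input, not one of the examples shipped with sov.
    sample = "A L L\nK H H\nL H L\n"
    print(parse_three_column(sample))  # ('AKL', 'CHH', 'CHC')
```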

Secondary structure conformational states can be either helix (H), strand (E)
or coil (C). Note: 'L' can be used instead of 'C' for the coil assignment,
but not a mixture of 'C' and 'L' in the same data file. Delimiters of columns

Example.1 of input data format:
*******************************

-------------------------------------------------------------------------------

Three other formats of the input data are also allowed:

Example.2 of input data format:
*******************************

Example.3 of input data format:
*******************************

Example.4 of input data format:
*******************************

-------------------------------------------------------------------------------

SECONDARY STRUCTURE PREDICTION
NUMBER OF RESIDUES PREDICTED: LENGTH = 8

-----------------------

SECONDARY STRUCTURE PREDICTION ACCURACY EVALUATION. N_AA = 8

            ALL    HELIX   STRAND     COIL

 Q3  :     87.5    100.0    100.0     80.0

 SOV :    100.0    100.0    100.0    100.0

-----------------------