SOV program measures secondary structure prediction accuracy
Copyright by Adam Zemla (11/16/1996)
Email: adamz@llnl.gov
-------------------------------------------------------------------------------
Usage: sov
Readme file: README.sov
-------------------------------------------------------------------------------

                        SOV Measure Description
                    Adam Zemla & Ceslovas Venclovas
-------------------------------------------------------------------------------

Secondary structure prediction accuracy evaluation
SOV (Segment OVerlap) measure


Introduction

Evaluating the accuracy of a secondary structure prediction is not as simple
a task as it may seem. The traditionally used Q3 measure, which gives the
overall percentage of residues predicted correctly, can be very misleading.
Measures that concentrate on how well secondary structure elements are
predicted, rather than individual residues, appear to better reflect the
nature of three-dimensional protein structure. In an effort to make the
evaluation of secondary structure prediction more structurally meaningful,
we have defined a segment overlap measure (SOV).

The SOV measure, first proposed by Rost et al. (JMB, 1994, 235, 13-26), is
redefined here. The paper containing the full scientific description of the
current version of the SOV measure, together with a discussion of secondary
structure prediction accuracy evaluation, was published by Zemla et al.
(PROTEINS: Structure, Function, and Genetics, 34, 1999, pp. 220-223).

The aim of this program is to make it possible to evaluate predictions and
to compare the performance of prediction accuracy measures. Given both
predicted and observed secondary structure assignments, the program evaluates
the accuracy of the secondary structure prediction. The evaluation is done
for the overall three-state (helix, strand, coil) prediction and for each
single conformational state.

The measures used are:

  Q3  - traditional per-residue prediction accuracy Qindex
  SOV - Segment OVerlap measure (the definition of Zemla et al.
        - PROTEINS: Structure, Function, and Genetics, 34, 1999, pp. 220-223)


Q3 measure

Qindex (Qhelix, Qstrand, Qcoil, Q3) gives the percentage of residues
predicted correctly as helix, strand or coil, or for all three conformational
states. The definition of Qindex is as follows.

For a single conformational state:

         number of residues correctly predicted in state i
  Qi  =  -------------------------------------------------  * 100,
              number of residues observed in state i

where i is either helix, strand or coil.

For all three states:

         number of residues correctly predicted
  Q3  =  --------------------------------------  * 100
                 number of all residues


SOV measure

Segment OVerlap quantity measure for a single conformational state:

              1    SUM   MINOV(S1;S2) + DELTA(S1;S2)
  SOV(i)  =  ----  SUM   ---------------------------  * LEN(S1)
             N(i)  SUM          MAXOV(S1;S2)
                   S(i)

where

  S1 and S2     are the observed and predicted secondary structure segments
                (in state i, which can be either H, E or C);
  LEN(S1)       is the number of residues in the segment S1;
  MINOV(S1;S2)  is the length of the actual overlap of S1 and S2, i.e.
                the extent for which both segments have residues in state i,
                for example H;
  MAXOV(S1;S2)  is the length of the total extent for which either of the
                segments S1 or S2 has a residue in state i;
  DELTA(S1;S2)  is the integer value defined as

                  MIN{ (MAXOV(S1;S2) - MINOV(S1;S2)); MINOV(S1;S2);
                       INT(LEN(S1)/2); INT(LEN(S2)/2) }

The sum is taken over S(i), the set of all pairs of segments {S1;S2} where
S1 and S2 have at least one residue in state i in common.

N(i) is the number of residues in state i, defined as follows:

            SUM              SUM
  N(i)  =   SUM  LEN(S1)  +  SUM  LEN(S1)
            SUM              SUM
            S(i)             S'(i)

The two sums are taken over S(i) and S'(i):

  S(i)   is the set of all pairs of segments {S1;S2} where S1 and S2 have
         at least one residue in state i in common;
  S'(i)  is the set of segments S1 that do not produce any segment pair.

Segment OVerlap quantity measure for all three states:

            1   SUM  SUM   MINOV(S1;S2) + DELTA(S1;S2)
  SOV  =   ---  SUM  SUM   ---------------------------  * LEN(S1)
            N   SUM  SUM          MAXOV(S1;S2)
                 i   S(i)

where the normalization value N is the sum of N(i) over all three
conformational states (i = HELIX, STRAND, COIL):

         SUM
  N  =   SUM  N(i)
         SUM
          i

"SOV observed" indicates that S1 is the observed fragment and S2 is the
predicted one.
"SOV predicted" indicates that S1 is the predicted fragment and S2 is the
observed one.

-------------------------------------------------------------------------------

Data format of prediction

Data for secondary structure prediction accuracy evaluation should be
prepared in COLUMN format:

  First column:   protein sequence (AA) in one-letter code
  Second column:  observed (OSEC) secondary structure
  Third column:   predicted (PSEC) secondary structure

Secondary structure conformational states can be either helix (H),
strand (E) or coil (C).

Note: 'L' may be used instead of 'C' for the coil assignment, but 'C' and 'L'
      must not be mixed in the same data file.

Columns are delimited by spaces.
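
The column layout described above can be read with a few lines of Python.
This is only a sketch and not part of the SOV program: the function name
parse_columns is hypothetical, and skipping a header row of the form
"AA OSEC PSEC" is an assumption about how input files begin. Any columns
after the third are ignored, and 'L' is normalised to 'C' after checking
that the two coil codes are not mixed.

```python
def parse_columns(text):
    """Return (aa, observed, predicted) strings from three-column input.

    Hypothetical helper for illustration; not the SOV program's own parser.
    """
    aa, osec, psec = [], [], []
    for line in text.splitlines():
        fields = line.split()          # columns are delimited by spaces
        if len(fields) < 3:
            continue                   # blank or incomplete line
        if fields[:3] == ["AA", "OSEC", "PSEC"]:
            continue                   # assumed header row
        residue, obs, pred = fields[:3]
        aa.append(residue)
        osec.append(obs)
        psec.append(pred)

    observed, predicted = "".join(osec), "".join(psec)
    used = set(observed) | set(predicted)
    if not used <= set("HECL"):
        raise ValueError("states must be H, E, or C (or L for coil)")
    if "C" in used and "L" in used:
        raise ValueError("'C' and 'L' must not be mixed in one data file")

    # Normalise the alternative coil code 'L' to 'C'.
    return ("".join(aa),
            observed.replace("L", "C"),
            predicted.replace("L", "C"))


sample = """AA OSEC PSEC
M C C
Q C C
T C H
R H H
S H H
I H H
G C C
V C C
"""
print(parse_columns(sample))
# ('MQTRSIGV', 'CCCHHHCC', 'CCHHHHCC')
```
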
Example.1 of input data format:
*******************************
AA  OSEC PSEC
M   C    C
Q   C    C
T   C    H
R   H    H
S   H    H
I   H    H
G   C    C
V   C    C

-------------------------------------------------------------------------------

Three other formats of the input data are also allowed:

Example.2 of input data format:
*******************************
AA  OSEC PSEC  NUM
M   C    C     1
Q   C    C     2
T   C    H     3
R   H    H     4
S   H    H     5
I   H    H     6
G   C    C     7
V   C    C     8

Example.3 of input data format:
*******************************
>OSEQ
CCCHHHCC
>PSEQ
CCHHHHCC
>AA
MQTRSIGV

Example.4 of input data format:
*******************************
SSP   1   M   C   C
SSP   2   Q   C   C
SSP   3   T   C   H
SSP   4   R   H   H
SSP   5   S   H   H
SSP   6   I   H   H
SSP   7   G   C   C
SSP   8   V   C   C

-------------------------------------------------------------------------------

Output:
*******

 SECONDARY STRUCTURE PREDICTION
 NUMBER OF RESIDUES PREDICTED: LENGTH = 8

 AA  OSEC PSEC  NUM
 M   C    C     1
 Q   C    C     2
 T   C    H     3
 R   H    H     4
 S   H    H     5
 I   H    H     6
 G   C    C     7
 V   C    C     8
 -----------------------

 SECONDARY STRUCTURE PREDICTION ACCURACY EVALUATION. N_AA = 8

            ALL    HELIX   STRAND     COIL
 Q3  :     87.5    100.0    100.0     80.0
 SOV :    100.0    100.0    100.0    100.0
 -----------------------
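
The Qindex and SOV definitions can be checked against the sample output
above with a short Python sketch. It is an illustration written from the
formulas in this file, not the code of the SOV program itself; the function
names (segments, q_index, sov_terms, sov) are hypothetical. Two assumptions
are made: the SOV value is multiplied by 100 so that it is on the same
percentage scale as the sample output, and a state that is absent from the
data yields None, whereas the sample output reports 100.0 for STRAND.

```python
from itertools import groupby


def segments(ss, state):
    """Return inclusive (start, end) index pairs for runs of `state`."""
    segs, pos = [], 0
    for key, run in groupby(ss):
        n = len(list(run))
        if key == state:
            segs.append((pos, pos + n - 1))
        pos += n
    return segs


def q_index(observed, predicted, state=None):
    """Percentage of residues predicted correctly (Q3 if state is None)."""
    pairs = [(o, p) for o, p in zip(observed, predicted)
             if state is None or o == state]
    if not pairs:
        return None  # assumption: undefined when the state is not observed
    return 100.0 * sum(o == p for o, p in pairs) / len(pairs)


def sov_terms(s1_string, s2_string, state):
    """Return (weighted sum, N(i)) for one state; S1 comes from s1_string."""
    total, norm = 0.0, 0
    s2_segs = segments(s2_string, state)
    for a1, b1 in segments(s1_string, state):
        len1 = b1 - a1 + 1
        paired = False
        for a2, b2 in s2_segs:
            minov = min(b1, b2) - max(a1, a2) + 1
            if minov <= 0:
                continue  # no residue in common, so not a pair in S(i)
            paired = True
            len2 = b2 - a2 + 1
            maxov = max(b1, b2) - min(a1, a2) + 1
            delta = min(maxov - minov, minov, len1 // 2, len2 // 2)
            total += (minov + delta) / maxov * len1
            norm += len1  # term of the sum over S(i)
        if not paired:
            norm += len1  # term of the sum over S'(i)
    return total, norm


def sov(observed, predicted, states="HEC"):
    """SOV as a percentage over `states`, with S1 = observed segments."""
    parts = [sov_terms(observed, predicted, s) for s in states]
    total = sum(t for t, _ in parts)
    norm = sum(n for _, n in parts)
    return 100.0 * total / norm if norm else None


observed, predicted = "CCCHHHCC", "CCHHHHCC"
print(q_index(observed, predicted))        # 87.5
print(q_index(observed, predicted, "C"))   # 80.0
print(sov(observed, predicted))            # 100.0
print(sov(observed, predicted, "H"))       # 100.0
```

Swapping the two arguments of sov gives the "SOV predicted" variant, in
which S1 is taken from the predicted assignment.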