X-Git-Url: http://source.jalview.org/gitweb/?a=blobdiff_plain;f=sources%2Fsov%2FREADME.sov;fp=sources%2Fsov%2FREADME.sov;h=4cdb1c6102795c1308536095bac6bb70c0419684;hb=81362e35a140cd040e948b921053e74267f8a6e3;hp=0000000000000000000000000000000000000000;hpb=2cf032f4b987ba747c04159965aed78e3820d942;p=jpred.git

diff --git a/sources/sov/README.sov b/sources/sov/README.sov
new file mode 100644
index 0000000..4cdb1c6
--- /dev/null
+++ b/sources/sov/README.sov
@@ -0,0 +1,244 @@
+
+        SOV program measures secondary structure prediction accuracy
+
+        Copyright by Adam Zemla (11/16/1996)
+        Email: adamz@llnl.gov
+
+-------------------------------------------------------------------------------
+
+        Usage: sov
+
+        Readme file: README.sov
+
+-------------------------------------------------------------------------------
+
+                                   SOV
+
+                           Measure Description
+
+                     Adam Zemla & Ceslovas Venclovas
+
+-------------------------------------------------------------------------------
+
+            Secondary structure prediction accuracy evaluation
+                      SOV (Segment OVerlap) measure
+
+                               Introduction
+
+
+The evaluation of secondary structure prediction accuracy is not as simple a
+task as it may seem. The traditionally used Q3 measure, which gives the
+overall percentage of residues predicted correctly, can be very misleading.
+Measures that concentrate on how well secondary structure elements, rather
+than individual residues, are predicted seem to reflect the nature of
+three-dimensional protein structure better. In an effort to make the
+evaluation of secondary structure prediction more structurally meaningful, we
+have defined the segment overlap measure (SOV). The SOV measure, first
+proposed by Rost et al. (JMB, 1994, 235, 13-26), is redefined here. A full
+scientific description of the current version of the SOV measure, together
+with a discussion of secondary structure prediction accuracy evaluation, is
+given by Zemla et al., PROTEINS: Structure, Function, and Genetics, 34, 1999,
+pp. 220-223.
+
+The aim of this program is to make it possible to evaluate predictions and to
+compare the performance of prediction accuracy measures. Given both predicted
+and observed secondary structure assignments, the program evaluates the
+accuracy of the secondary structure prediction. Accuracy is evaluated for the
+overall three-state (helix, strand, coil) prediction and for each
+conformational state separately. The measures used are:
+
+   Q3  - traditional per-residue prediction accuracy Qindex
+   SOV - Segment OVerlap measure (the definition of Zemla et al. - PROTEINS:
+         Structure, Function, and Genetics, 34, 1999, pp. 220-223)
+
+
+
+                               Q3 measure
+
+Qindex (Qhelix, Qstrand, Qcoil, Q3) gives the percentage of residues predicted
+correctly as helix, strand or coil, or over all three conformational states.
+Qindex is defined as follows.
+
+For a single conformational state:
+
+              number of residues correctly predicted in state i
+       Qi  =  ------------------------------------------------- * 100,
+                   number of residues observed in state i
+
+
+where i is either helix, strand or coil.
+
+For all three states:
+
+              number of residues correctly predicted
+       Q3  =  -------------------------------------- * 100
+                      number of all residues
+
+
+
+                               SOV measure
+
+Segment OVerlap quantity measure for a single conformational state:
+
+
+                 1     SUM    MINOV(S1;S2) + DELTA(S1;S2)
+   SOV(i)  =   ----    SUM    --------------------------- * LEN(S1)
+               N(i)    SUM           MAXOV(S1;S2)
+                       S(i)
+
+
+S1 and S2      are the observed and predicted secondary structure segments
+               (in state i, which can be either H, E or C);
+LEN(S1)        is the number of residues in the segment S1;
+MINOV(S1;S2)   is the length of the actual overlap of S1 and S2, i.e.
+               the extent for which both segments have residues in state i,
+               for example H;
+MAXOV(S1;S2)   is the length of the total extent for which either of
+               the segments S1 or S2 has a residue in state i;
+DELTA(S1;S2)   is the integer value defined as being equal to
+               MIN{(MAXOV(S1;S2) - MINOV(S1;S2)); MINOV(S1;S2);
+                   INT(LEN(S1)/2); INT(LEN(S2)/2)}
+
+THE SUM        is taken over S(i), all the pairs of segments {S1;S2},
+               where S1 and S2 have at least one residue in state i
+               in common;
+
+N(i)           is the number of residues in state i, defined as follows:
+
+               SUM             SUM
+   N(i)  =     SUM  LEN(S1) +  SUM  LEN(S1)
+               SUM             SUM
+               S(i)            S'(i)
+
+The two sums are taken over S(i) and S'(i), where
+
+S(i)           is the set of all the pairs of segments {S1;S2},
+               where S1 and S2 have at least one residue in state i
+               in common;
+
+S'(i)          is the set of segments S1 that do not produce
+               any segment pair.
+
+
+Segment OVerlap quantity measure for all three states:
+
+
+              1    SUM   SUM    MINOV(S1;S2) + DELTA(S1;S2)
+   SOV  =    ---   SUM   SUM    --------------------------- * LEN(S1)
+              N    SUM   SUM           MAXOV(S1;S2)
+                    i    S(i)
+
+where the normalization value N is the sum of N(i) over all three
+conformational states (i = HELIX, STRAND, COIL):
+
+             SUM
+   N  =      SUM  N(i)
+             SUM
+              i
+
+
+"SOV observed" indicates that S1 is the observed segment and S2 the predicted
+one. "SOV predicted" indicates that S1 is the predicted segment and S2 the
+observed one.
+
+
+-------------------------------------------------------------------------------
+
+                        Data format of prediction
+
+Data for secondary structure prediction accuracy evaluation should be prepared
+in COLUMN format:
+
+   First column:   protein sequence (AA) in one-letter code
+   Second column:  observed (OSEC) secondary structure
+   Third column:   predicted (PSEC) secondary structure
+
+Secondary structure conformational states can be either helix (H), strand (E)
+or coil (C). Note: 'L' may be used instead of 'C' for coil, but 'C' and 'L'
+must not be mixed in the same data file.
+Columns must be separated by spaces.
+
+
+Example.1 of input data format:
+*******************************
+
+AA  OSEC  PSEC
+M   C     C
+Q   C     C
+T   C     H
+R   H     H
+S   H     H
+I   H     H
+G   C     C
+V   C     C
+
+
+-------------------------------------------------------------------------------
+
+Three other formats of the input data are also allowed:
+
+Example.2 of input data format:
+*******************************
+
+ AA  OSEC  PSEC  NUM
+ M   C     C       1
+ Q   C     C       2
+ T   C     H       3
+ R   H     H       4
+ S   H     H       5
+ I   H     H       6
+ G   C     C       7
+ V   C     C       8
+
+
+Example.3 of input data format:
+*******************************
+
+>OSEQ
+CCCHHHCC
+>PSEQ
+CCHHHHCC
+>AA
+MQTRSIGV
+
+
+Example.4 of input data format:
+*******************************
+
+SSP   1  M  C  C
+SSP   2  Q  C  C
+SSP   3  T  C  H
+SSP   4  R  H  H
+SSP   5  S  H  H
+SSP   6  I  H  H
+SSP   7  G  C  C
+SSP   8  V  C  C
+
+-------------------------------------------------------------------------------
+
+Output:
+*******
+
+ SECONDARY STRUCTURE PREDICTION
+ NUMBER OF RESIDUES PREDICTED: LENGTH = 8
+ AA  OSEC  PSEC  NUM
+ M   C     C       1
+ Q   C     C       2
+ T   C     H       3
+ R   H     H       4
+ S   H     H       5
+ I   H     H       6
+ G   C     C       7
+ V   C     C       8
+ -----------------------
+
+ SECONDARY STRUCTURE PREDICTION ACCURACY EVALUATION. N_AA = 8
+
+              ALL    HELIX   STRAND     COIL
+
+ Q3   :      87.5    100.0    100.0     80.0
+
+ SOV  :     100.0    100.0    100.0    100.0
+
+ -----------------------
+
+
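The Q3 and SOV definitions given earlier in this file can be checked against
the sample output with a short script. The following is a minimal Python
sketch, not the code of the sov program itself. The function names (segments,
q_index, sov_terms, sov) are hypothetical. Two points are assumptions made
only to match the sample output: values are scaled by 100 to give percentages,
and a state with no residues (STRAND in the example) is reported as 100.0.

```python
def segments(ss, state):
    """Return (start, end) pairs, end exclusive, for each run of `state` in `ss`."""
    runs, start = [], None
    for k, ch in enumerate(ss):
        if ch == state and start is None:
            start = k
        elif ch != state and start is not None:
            runs.append((start, k))
            start = None
    if start is not None:
        runs.append((start, len(ss)))
    return runs


def q_index(obs, pred, state=None):
    """Per-residue accuracy in percent; state=None gives the three-state Q3."""
    idx = [k for k, ch in enumerate(obs) if state is None or ch == state]
    if not idx:
        return 100.0  # assumption: mirrors the STRAND column of the sample output
    return 100.0 * sum(obs[k] == pred[k] for k in idx) / len(idx)


def sov_terms(s1_string, s2_string, state):
    """Return (weighted overlap sum, N(i)) for one state; S1 comes from s1_string."""
    total, norm = 0.0, 0
    s2_segments = segments(s2_string, state)
    for a1, b1 in segments(s1_string, state):
        len1 = b1 - a1
        paired = False
        for a2, b2 in s2_segments:
            minov = min(b1, b2) - max(a1, a2)      # actual overlap
            if minov <= 0:
                continue                           # no residue in common
            paired = True
            len2 = b2 - a2
            maxov = max(b1, b2) - min(a1, a2)      # total extent of the pair
            delta = min(maxov - minov, minov, len1 // 2, len2 // 2)
            total += (minov + delta) / maxov * len1
            norm += len1                           # contribution of S(i) to N(i)
        if not paired:
            norm += len1                           # contribution of S'(i) to N(i)
    return total, norm


def sov(s1_string, s2_string, states="HEC"):
    """SOV in percent over the given states (one state, or all three)."""
    parts = [sov_terms(s1_string, s2_string, s) for s in states]
    total = sum(t for t, _ in parts)
    norm = sum(n for _, n in parts)
    if norm == 0:
        return 100.0  # assumption: mirrors the STRAND column of the sample output
    return 100.0 * total / norm


obs, pred = "CCCHHHCC", "CCHHHHCC"   # OSEC and PSEC columns of Example.1
print("Q3 :", q_index(obs, pred), [q_index(obs, pred, s) for s in "HEC"])
print("SOV:", sov(obs, pred), [sov(obs, pred, s) for s in "HEC"])
```

Calling sov(obs, pred) treats the observed string as S1 ("SOV observed");
swapping the arguments, sov(pred, obs), gives "SOV predicted". For the
Example.1 data the sketch gives Q3 = 87.5 (helix 100.0, strand 100.0, coil
80.0) and SOV = 100.0 for all columns, in agreement with the sample output.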