+Get rid of binaries/help directory!\r
TODO: \r
Registry - 1 week\r
webservices - 1 week\r
--- /dev/null
+INTERPRETATION OF THE OUTPUT:\r
+\r
+In the case of long and short types of disorder, the output gives the\r
+likelihood of disorder for each residue, i.e. a value between 0 and 1,\r
+where higher values indicate a higher probability of disorder. Residues with\r
+values above 0.5 can be regarded as disordered; at this cutoff, 5% of globular\r
+proteins are expected to be predicted as disordered (false positives).\r
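A minimal Java sketch of applying the 0.5 cutoff, assuming the per-residue scores have already been read into a double array. The class and method names are hypothetical and are not part of IUPred or JABAWS:

```java
import java.util.ArrayList;
import java.util.List;

class DisorderCutoff {
    // Cutoff suggested in the IUPred notes: scores above 0.5 are read as disordered.
    static final double CUTOFF = 0.5;

    /** Returns the 1-based positions of residues whose score is strictly above the cutoff. */
    static List<Integer> disorderedPositions(double[] scores) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < scores.length; i++) {
            if (scores[i] > CUTOFF) {
                positions.add(i + 1); // sequence positions are reported 1-based
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        double[] scores = {0.12, 0.48, 0.51, 0.93, 0.50};
        System.out.println(disorderedPositions(scores)); // prints [3, 4]
    }
}
```

A score of exactly 0.5 is not counted here because the notes say "above 0.5"; whether a borderline residue should count is a choice left to the user.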
+ \r
+For the globular domain prediction type, the output gives the number of\r
+globular domains and lists their start and end positions in the sequence. This\r
+is followed by the submitted sequence, with residues of globular domains shown\r
+in uppercase letters.\r
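The uppercase convention can be turned back into domain coordinates. This sketch assumes the annotated sequence has already been joined into a single string without line breaks or spaces, and it reports 1-based inclusive ranges. GlobularDomains and domainRanges are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;

class GlobularDomains {
    /** Returns {start, end} pairs (1-based, inclusive) for each maximal run of uppercase letters. */
    static List<int[]> domainRanges(String annotated) {
        List<int[]> ranges = new ArrayList<>();
        int start = -1; // 0-based index where the current uppercase run began, or -1 if none
        for (int i = 0; i < annotated.length(); i++) {
            boolean upper = Character.isUpperCase(annotated.charAt(i));
            if (upper && start < 0) {
                start = i;
            } else if (!upper && start >= 0) {
                ranges.add(new int[] {start + 1, i}); // run ended at index i - 1, i.e. 1-based i
                start = -1;
            }
        }
        if (start >= 0) { // a run that reaches the end of the sequence
            ranges.add(new int[] {start + 1, annotated.length()});
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (int[] r : domainRanges("mskKLVFAEQlpsRIDGHpq")) {
            System.out.println(r[0] + "-" + r[1]); // prints 4-10, then 14-18
        }
    }
}
```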
+\r
+\r
+SHORT SUMMARY OF THE METHOD\r
+\r
+Intrinsically unstructured/disordered proteins (IUPs) have no single\r
+well-defined tertiary structure in their native, functional state. Our server\r
+recognizes such regions from the amino acid sequence based on the estimated\r
+pairwise energy content. The underlying assumption is that globular proteins\r
+make a large number of inter-residue interactions, providing the stabilizing\r
+energy to overcome the entropy loss during folding. In contrast, IUP sequences\r
+lack the capacity to form sufficient inter-residue interactions. Taking a set\r
+of globular proteins with known structure, we have developed a simple formalism\r
+that allows the estimation of the pairwise interaction energies of these\r
+proteins. It uses a quadratic expression in the amino acid composition, which\r
+takes into account that the contribution of an amino acid to order/disorder\r
+depends not only on its own chemical type, but also on its sequential\r
+environment, including its potential interaction partners. When this\r
+calculation is applied to IUP sequences, their estimated energies are clearly\r
+shifted towards less favorable values compared to globular proteins, enabling\r
+the prediction of protein disorder on this basis.\r
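Read literally, "a quadratic expression in the amino acid composition" is a quadratic form. The block below is a sketch under stated assumptions, not the published IUPred formula: n_j is taken to be the count (or frequency) of amino acid type j among a residue's potential partners, and P is an assumed 20 x 20 matrix of energy parameters fitted on globular proteins. The exact windowing and normalization are given in the references below.

```latex
E(\mathbf{n}) = \mathbf{n}^{\mathsf{T}} P \, \mathbf{n}
             = \sum_{i=1}^{20} \sum_{j=1}^{20} n_i \, P_{ij} \, n_j
             = \sum_{i=1}^{20} n_i \, e_i ,
\qquad
e_i = \sum_{j=1}^{20} P_{ij} \, n_j
```

The per-residue term e_i depends on the residue's own type (row i of P) and on the composition of its surroundings (the n_j), which is the dependence the summary describes.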
+\r
+ \r
+References\r
+\r
+"The Pairwise Energy Content Estimated from Amino Acid Composition \r
+Discriminates between Folded and Intrinsically Unstructured Proteins"\r
+Zsuzsanna Dosztanyi, Veronika Csizmok, Peter Tompa and Istvan Simon\r
+J. Mol. Biol. (2005) 347, 827-839.\r
+\r
+"IUPred: web server for the prediction of intrinsically unstructured \r
+regions of proteins based on estimated energy content"\r
+Zsuzsanna Dosztanyi, Veronika Csizmok, Peter Tompa and Istvan Simon\r
+Bioinformatics (2005) 21, 3433-3434.\r
\r
### Clustal configuration ###\r
-local.clustalw.bin.windows=binaries/clustalw2.exe\r
+local.clustalw.bin.windows=binaries/windows/clustalw2.exe\r
local.clustalw.bin=binaries/src/clustalw/src/clustalw2\r
cluster.clustalw.bin=/homes/pvtroshin/workspace/jaba2/binaries/src/clustalw/src/clustalw2\r
# Parameter names that come from RunnerConfig -> Parameters.xml file are ultimately all lowercased for comparison!\r
clustalw.cluster.settings=-l h_cpu=24:00:00 -l h_vmem=6000M -l ram=6000M\r
\r
### Muscle configuration ###\r
-local.muscle.bin.windows=binaries/muscle.exe\r
+local.muscle.bin.windows=binaries/windows/muscle.exe\r
local.muscle.bin=binaries/src/muscle/muscle\r
# Beware: the version of muscle on the cluster is older and does not support some\r
# of the newer version's attributes; thus, it will not work with the Muscle.java wrapper!\r
local.jronn.bin.windows=D:\\Java\\jdk1.6.0_24\\bin\\java.exe\r
local.jronn.bin=/sw/java/latest/bin/java\r
cluster.jronn.bin=/sw/java/latest/bin/java\r
-jronn.jar.file=binaries/jronn3.1.jar\r
+jronn.jar.file=binaries/windows/jronn3.1.jar\r
# jronn.parameters.file=conf/settings/JronnParameters.xml\r
jronn.limits.file=conf/settings/JronnLimits.xml\r
#TODO jronn.jvm.options=-Xms32M -Xmx512M\r
globplot.cluster.settings=-l h_cpu=24:00:00 -l h_vmem=6000M -l ram=6000M\r
\r
### IUPred configuration ### \r
-#local.iupred.bin.windows= \r
+local.iupred.bin.windows=binaries/windows/iupred/iupred.exe\r
local.iupred.bin=binaries/src/iupred/iupred\r
+# This must point to the directory containing the iupred binary and the other files it depends on\r
iupred.bin.env=IUPred_PATH#/homes/pvtroshin/workspace/jaba2/binaries/src/iupred\r
cluster.iupred.bin=/homes/pvtroshin/workspace/jaba2/binaries/src/iupred/iupred\r
iupred.parameters.file=conf/settings/IUPredParameters.xml\r
iupred.cluster.settings=-l h_cpu=24:00:00 -l h_vmem=6000M -l ram=6000M\r
\r
### AACon configuration ###\r
+# This is just the path to the standard Java executable\r
local.aacon.bin.windows=D:\\Java\\jdk1.6.0_24\\bin\\java.exe\r
local.aacon.bin=/sw/java/latest/bin/java\r
cluster.aacon.bin=/sw/java/latest/bin/java\r
-aacon.jar.file=binaries/aaconservation.jar\r
+# Path to the AACon library\r
+aacon.jar.file=binaries/windows/aaconservation.jar\r
aacon.parameters.file=conf/settings/AAConParameters.xml\r
aacon.presets.file=conf/settings/AAConPresets.xml\r
aacon.limits.file=conf/settings/AAConLimits.xml\r
--- /dev/null
+
+AA Conservation version 1.0b (2 September 2010)
+
+This program calculates the conservation of amino acids in multiple sequence
+alignments.
+It implements 17 different conservation scores as described by Valdar in
+his paper (Scoring Residue Conservation, PROTEINS: Structure, Function
+and Bioinformatics 48:227-241 (2002)) and the SMERFS scoring algorithm as
+described by Manning, Jefferson and Barton (The contrasting properties of
+conservation and correlated phylogeny in protein functional residue
+prediction, BMC Bioinformatics (2008)).
+
+The conservation algorithms supported are:
+
+KABAT, JORES, SCHNEIDER, SHENKIN, GERSTEIN, TAYLOR_GAPS, TAYLOR_NO_GAPS,
+ZVELIBIL, KARLIN, ARMON, THOMPSON, NOT_LANCET, MIRNY, WILLIAMSON,
+LANDGRAF, SANDER, VALDAR, SMERFS
+
+The input is either a FASTA formatted file containing aligned sequences with
+gaps or a Clustal alignment. The valid gap characters are *, -, the space
+character, X and . (a dot). By default, the program prints the results to the
+standard output. If an output file is provided, the results are written to the
+file in one of two formats: with or without the alignment.
+If the format is not specified, the program outputs conservation scores without
+the alignment. The scores are not normalized by default, but they can be (see
+below). The default SMERFS parameters are a window width of 7, the column score
+assigned to the middle column (MID_SCORE) and a gap percentage cutoff of 0.1.
+Different parameters for SMERFS can be provided (see below). Details of the
+program execution can be recorded to a separate file if an appropriate file
+path is provided.
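The default gap characters, and the replacement rule described for -g, can be illustrated with a small Java sketch. It assumes an alignment column is given as a string with one character per sequence; the class, the method names and the per-column gap fraction are illustrative assumptions, not AACon's API:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class GapChars {
    // Default gap characters listed in this README: *, -, space, X and '.'.
    static final Set<Character> DEFAULT_GAPS = new HashSet<>(Arrays.asList('*', '-', ' ', 'X', '.'));

    /** Builds a gap set from a -g style list; per the README, it replaces the defaults entirely. */
    static Set<Character> parseGapList(String commaSeparated) {
        Set<Character> gaps = new HashSet<>();
        for (String token : commaSeparated.split(",")) {
            if (!token.isEmpty()) {
                gaps.add(token.charAt(0)); // a " " token keeps the space character
            }
        }
        return gaps;
    }

    /** Fraction of characters in one column that are gaps under the given gap set. */
    static double gapFraction(String column, Set<Character> gaps) {
        if (column.isEmpty()) {
            return 0.0;
        }
        int count = 0;
        for (char c : column.toCharArray()) {
            if (gaps.contains(c)) {
                count++;
            }
        }
        return (double) count / column.length();
    }

    public static void main(String[] args) {
        System.out.println(gapFraction("A-X.A", DEFAULT_GAPS));           // prints 0.6
        System.out.println(gapFraction("A-~.A", parseGapList("-,~")));    // prints 0.4
    }
}
```

In the second call '.' is no longer a gap, because a user-supplied list has to name every accepted gap character.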
+
+List of command line arguments:
+
+-m= precedes a comma-separated list of method names
+ EXAMPLE: -m=KABAT,JORES,GERSTEIN
+ Optional; if no method is specified, all methods are calculated.
+
+-i= precedes a full path to the input alignment file (FASTA or Clustal
+ format). Required.
+
+-o= precedes a full path to the output file. Optional; if no output file is
+ provided, the program writes to the standard output.
+
+-t= precedes the number of CPUs (CPU cores, more precisely) to use. Optional;
+ defaults to all processors available on the machine.
+
+-f= precedes the format of the results in the output file;
+ two formats are possible:
+ RESULT_WITH_ALIGNMENT
+ RESULT_NO_ALIGNMENT
+ Optional; if not specified, RESULT_NO_ALIGNMENT is assumed.
+
+-d= precedes a full path to a file where program execution details are to be
+ listed. Optional; if not provided, no execution statistics are produced.
+
+-g= precedes a comma-separated list of user-supplied gap characters. If you
+ are using an unusual gap character (not -, ., space, * or X), you have to
+ provide it. If you provide this list, you have to list all accepted gap
+ characters, including those that are otherwise treated as gaps by default.
+ Optional.
+
+-n using this key causes the results to be normalized.
+ Normalized results have values between 0 and 1. Please note, however, that
+ some results cannot be normalized. In such cases, the program returns the
+ unnormalized value and logs the issue to the standard error stream.
+ The following formula is used for normalization:
+ n = (d - dmin)/(dmax - dmin)
+ where d is the score being normalized and dmin and dmax are the minimum and
+ maximum scores. Negative results are first converted to positive by adding
+ the absolute value of the most negative result. Optional.
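The -n normalization can be sketched in Java for one method's scores. The README does not say which results cannot be normalized; one case where the formula itself is undefined is dmax == dmin, and the sketch handles it by mirroring the described fallback (raw values returned, message on standard error). The class and method names are hypothetical:

```java
import java.util.Arrays;

class Normalize {
    /** Min-max normalization n = (d - dmin) / (dmax - dmin) over one method's raw scores. */
    static double[] normalize(double[] raw) {
        if (raw.length == 0) {
            return raw.clone();
        }
        // Shift negative scores up by the absolute value of the most negative score.
        double min = Arrays.stream(raw).min().getAsDouble();
        double shift = min < 0 ? -min : 0.0;
        double[] d = Arrays.stream(raw).map(x -> x + shift).toArray();
        double dmin = Arrays.stream(d).min().getAsDouble();
        double dmax = Arrays.stream(d).max().getAsDouble();
        if (dmax == dmin) {
            // Formula undefined: return the unnormalized values and report the issue.
            System.err.println("Cannot normalize: all scores are equal");
            return raw.clone();
        }
        return Arrays.stream(d).map(x -> (x - dmin) / (dmax - dmin)).toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(normalize(new double[] {-2, 0, 2}))); // prints [0.0, 0.5, 1.0]
    }
}
```

Shifting every score by the same constant does not change the min-max result, so the shift step is kept here only to follow the described procedure.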
+
+SMERFS Only Parameters:
+
+-smerfsGT= precedes the SMERFS Gap Threshold, a gap percentage cutoff:
+ a float greater than 0 and less than or equal to 1. Optional, defaults
+ to 0.1.
+
+-smerfsCS= precedes the SMERFS Column Score algorithm, which defines how window
+ scores are allocated to columns. Two methods are possible:
+ MID_SCORE - gives the window score to the middle column
+ MAX_SCORE - gives the column the highest score of all the windows it
+ belongs to. Optional, defaults to MID_SCORE.
+
+-smerfsWW= precedes the SMERFS Window Width parameter, an odd integer.
+ Optional, defaults to 7.
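How MID_SCORE and MAX_SCORE allocate window scores to columns can be sketched as follows. Assumptions not stated in this README: windows slide one column at a time, only windows lying fully inside the alignment are scored, and columns that receive no window score are left as NaN. The names are hypothetical:

```java
import java.util.Arrays;

class ColumnScores {
    // windowScores[k] scores the window covering columns k .. k + width - 1.
    static void check(double[] windowScores, int columns, int width) {
        if (width < 1 || width % 2 == 0) {
            throw new IllegalArgumentException("window width must be a positive odd integer");
        }
        if (windowScores.length != columns - width + 1) {
            throw new IllegalArgumentException("expected columns - width + 1 window scores");
        }
    }

    /** MID_SCORE: each window gives its score to its middle column. */
    static double[] midScore(double[] windowScores, int columns, int width) {
        check(windowScores, columns, width);
        double[] col = new double[columns];
        Arrays.fill(col, Double.NaN);
        int half = width / 2;
        for (int k = 0; k < windowScores.length; k++) {
            col[k + half] = windowScores[k];
        }
        return col;
    }

    /** MAX_SCORE: each column gets the highest score among the windows that contain it. */
    static double[] maxScore(double[] windowScores, int columns, int width) {
        check(windowScores, columns, width);
        double[] col = new double[columns];
        Arrays.fill(col, Double.NaN);
        for (int k = 0; k < windowScores.length; k++) {
            for (int c = k; c < k + width; c++) {
                if (Double.isNaN(col[c]) || windowScores[k] > col[c]) {
                    col[c] = windowScores[k];
                }
            }
        }
        return col;
    }

    public static void main(String[] args) {
        double[] windows = {0.2, 0.9, 0.4}; // 5 columns, width 3
        System.out.println(Arrays.toString(midScore(windows, 5, 3))); // prints [NaN, 0.2, 0.9, 0.4, NaN]
        System.out.println(Arrays.toString(maxScore(windows, 5, 3))); // prints [0.2, 0.9, 0.9, 0.9, 0.4]
    }
}
```

MAX_SCORE gives every column a value, while MID_SCORE leaves (width - 1) / 2 columns at each end unscored under these assumptions.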
+
+
+EXAMPLE OF HOW TO RUN THE PROGRAM:
+java -jar <jar name> -m=KABAT,SMERFS -i=prot1 -o=prot1_results -n
+
+As a result of the execution, normalized KABAT and SMERFS scores will be
+calculated. The input is read from the prot1 file, and the output without an
+alignment is written to the prot1_results file.
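The command-line example can also be launched from Java. This sketch assumes java is on the PATH and uses the jar path from the aacon.jar.file setting in this change set (binaries/windows/aaconservation.jar) in place of <jar name>; RunAACon and buildCommand are hypothetical names:

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RunAACon {
    /** Assembles: java -jar <jar> -m=<methods> -i=<input> -o=<output> [-n] */
    static List<String> buildCommand(String jar, List<String> methods, String input,
                                     String output, boolean normalize) {
        List<String> cmd = new ArrayList<>(Arrays.asList("java", "-jar", jar,
                "-m=" + String.join(",", methods), "-i=" + input, "-o=" + output));
        if (normalize) {
            cmd.add("-n"); // request normalized scores
        }
        return cmd;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Assumption: jar path taken from the aacon.jar.file setting.
        String jar = "binaries/windows/aaconservation.jar";
        List<String> cmd = buildCommand(jar, Arrays.asList("KABAT", "SMERFS"),
                "prot1", "prot1_results", true);
        System.out.println(String.join(" ", cmd));
        if (new File(jar).isFile()) { // only run when the jar is actually present
            Process p = new ProcessBuilder(cmd).inheritIO().start();
            System.out.println("exit code: " + p.waitFor());
        }
    }
}
```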
+
+Authors: Peter Troshin, Agnieszka Golicz, David Martin and Geoff Barton.
+Please visit http://www.compbio.dundee.ac.uk/aacon for further information.
+
\ No newline at end of file
--- /dev/null
+INTERPRETATION OF THE OUTPUT:\r
+\r
+In the case of long and short types of disorder, the output gives the\r
+likelihood of disorder for each residue, i.e. a value between 0 and 1,\r
+where higher values indicate a higher probability of disorder. Residues with\r
+values above 0.5 can be regarded as disordered; at this cutoff, 5% of globular\r
+proteins are expected to be predicted as disordered (false positives).\r
+ \r
+For the globular domain prediction type, the output gives the number of\r
+globular domains and lists their start and end positions in the sequence. This\r
+is followed by the submitted sequence, with residues of globular domains shown\r
+in uppercase letters.\r
+\r
+\r
+SHORT SUMMARY OF THE METHOD\r
+\r
+Intrinsically unstructured/disordered proteins (IUPs) have no single\r
+well-defined tertiary structure in their native, functional state. Our server\r
+recognizes such regions from the amino acid sequence based on the estimated\r
+pairwise energy content. The underlying assumption is that globular proteins\r
+make a large number of inter-residue interactions, providing the stabilizing\r
+energy to overcome the entropy loss during folding. In contrast, IUP sequences\r
+lack the capacity to form sufficient inter-residue interactions. Taking a set\r
+of globular proteins with known structure, we have developed a simple formalism\r
+that allows the estimation of the pairwise interaction energies of these\r
+proteins. It uses a quadratic expression in the amino acid composition, which\r
+takes into account that the contribution of an amino acid to order/disorder\r
+depends not only on its own chemical type, but also on its sequential\r
+environment, including its potential interaction partners. When this\r
+calculation is applied to IUP sequences, their estimated energies are clearly\r
+shifted towards less favorable values compared to globular proteins, enabling\r
+the prediction of protein disorder on this basis.\r
+\r
+ \r
+References\r
+\r
+"The Pairwise Energy Content Estimated from Amino Acid Composition \r
+Discriminates between Folded and Intrinsically Unstructured Proteins"\r
+Zsuzsanna Dosztanyi, Veronika Csizmok, Peter Tompa and Istvan Simon\r
+J. Mol. Biol. (2005) 347, 827-839.\r
+\r
+"IUPred: web server for the prediction of intrinsically unstructured \r
+regions of proteins based on estimated energy content"\r
+Zsuzsanna Dosztanyi, Veronika Csizmok, Peter Tompa and Istvan Simon\r
+Bioinformatics (2005) 21, 3433-3434.\r