Java Bioinformatics Analyses Web Services (JABAWS) manual

JABAWS Manual

JABAWS Virtual Appliance

When to use virtual appliance

The appliance best suits individual users who want to use JABA web services locally, without an Internet connection, want to keep their data private, and use Windows as their main OS. The appliance is a self-contained unit of software and as such may also be an attractive option for Linux, UNIX or Mac users, but they can always deploy the WAR distribution instead.
To run the appliance you need a relatively powerful computer. The appliance comes preconfigured to use 1 CPU and 512 MB of memory; the minimum amount of memory required is about 378 MB.

VirtualBox appliance configuration

VirtualBox can be used to run JABAWS services from Windows, Linux, Solaris or Mac host operating systems. Use the VirtualBox "Import Appliance" option to import JABAWS. Please bear in mind that to benefit from multiple CPU support under VirtualBox you need to enable hardware virtualization extensions, such as Intel VT-x or AMD-V, in the BIOS of your computer. Unfortunately, we were unable to find a reliable way to do this on a Mac, so some Macs running VirtualBox will be limited to one CPU only, irrespective of the number of CPUs of the host machine.

We found that, by default, virtualization extensions are enabled in VirtualBox irrespective of whether your computer supports them. You will get the VERR_VMX_MSR_LOCKED_OR_DISABLED error if your computer does not support the extensions or if their support is disabled. To solve the problem, deselect the checkboxes shown in the screenshot below.

VirtualBox JABAWS VM configuration screen shot displaying virtualization settings.

VT-x extension on VirtualBox

VMware Player appliance configuration

The free VMware Player can be used to run JABAWS services from Windows and Linux host operating systems; there is no support for Mac at the time of writing. However, VMware Fusion, a commercial VMware product, offers virtual machine support for Mac computers too.

To run the JABAWS server on VMware Player, unpack the JABAWS VM into a folder on your local hard drive. Open VMware Player, click "Open Virtual Machine", point the Player to the location of the JABAWS VM, and choose the JABAWS.vmx file to open the appliance.

When you play the machine for the first time, the Player might tell you that "This virtual machine may have been moved or copied." and ask which applies; answer that you copied it. That is all.

JABAWS Appliance details

By default, the JABAWS virtual appliance is configured with 512 MB of memory and 1 CPU, but you are free to change these settings. If you have more than one CPU or CPU core on your computer, you can make them available to the JABAWS virtual machine by editing the virtual machine settings. Please bear in mind that more CPU power will not make a single calculation go faster, but it will enable the VM to do calculations in parallel. Similarly, you can add more memory to the virtual machine. More memory lets your VM deal with larger tasks, e.g. work with large alignments.

The VMware Player screen shot below displays JABAWS VM CPU settings.

JABAWS appliance configuration:

VMware info
- Date of creation: 8 October 2010
- CPUs : 1
- RAM : 512 MB
- Networking : Host only (the VM has no access to the outside network, nothing from outside network can access the VM)
- Hard disk : 20 GB (expanding)
- VMware tools : Installed

OS info
- OS : TurnKey Linux, based on Ubuntu 8.04 JeOS (Just enough Operating System)
- Installation : Oracle Java 6, Tomcat 6, JABAWS v. 1.0
- Hostname : tomcat
- Patches : applied up to the date of creation
- IPv4 address : dhcp
- IPv6 address : auto
- DNS name : none
- Name server : dhcp
- Route : dhcp
- Keyboard : US_intl

Login credentials
- Root password: jabaws

Services

- Default virtual console: Alt+F7
- Tomcat web server
  Access: http://VM_IP
  JABAWS URL: http://VM_IP/jabaws
- Web Shell
  Access: https://VM_IP:12320/
- Webmin
  Access: https://VM_IP:12321/
- SSH/SFTP
  Access: root@VM_IP

Where VM_IP is the IP address of the VM. Under VMware Player host-only networking, the first VM may have the IP address 192.168.227.128. Under VirtualBox host-only networking, the first VM may have the IP address 192.168.56.101.

Configuring Jalview to work with your JABAWS VM

After you have booted the JABAWS VM, you should see a screen similar to the one below; the IP address of your VM may be different. To enable Jalview to work with your JABAWS appliance, go to Jalview->Tools->Preferences->Web Services->New Service URL and add the JABAWS URL into the box provided. For more information please refer to the Jalview help pages.

JABAWS welcome screen

If you click on Advanced Menu, you will see the configuration console, similar to the one below.

JABAWS welcome screen

If you need to configure a static IP address, the configuration console will help you with this. It is also best to shut down the VM from the configuration console.

JABAWS Installation

System Requirements

JABAWS requires a Java web application server compliant with version 2.4 of the Java Servlet specification, and a Java 6 runtime environment. We recommend using an official Oracle Java 6 runtime environment, and Apache-Tomcat web application server version 6, but other versions may work as well.

Installing the JABAWS WAR file

JABAWS is distributed as a web application archive (WAR). To deploy JABAWS in Apache-Tomcat, simply drop the WAR file into the webapps directory of a running Tomcat, and it will do the rest. For any other web application server, please follow your server's specific deployment procedure for WAR files. If you are installing on a Windows machine, then at this point your JABAWS installation will already be up and running, and you can try its services out using the JABAWS test client. Installations on other operating systems require a final step to ensure JABAWS can locate and execute the binary programs it needs.

Preparing executables for use with JABAWS

JABAWS's web services use command line programs to do the actual analysis, so it must have access to programs which can be executed on your platform. The native executables bundled with JABAWS for Windows (32-bit) and Linux (i386) should work on those systems. However, the source code for these programs is also provided, so you can recompile them for your own architecture and exploit any optimizations that your system can provide. Alternatively, if you already have binaries on your system, you can simply change the paths in JABAWS's configuration files so that these are used instead.

Using the pre-compiled i386 binaries on Linux

Before the binaries that are bundled with JABAWS can be used, they must first be made executable using the provided 'setexecflag.sh' script:

1. cd to <webapplicationpath>/binaries/src
2. Run sh setexecflag.sh
3. Make sure the supplied binaries work under your OS. To do this, run each binary without any command line options or input files. If you see an error message complaining about missing libraries or other problems, then you probably need to recompile the binaries.
4. Restart Tomcat.
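
Part of the check above can be automated. The following is a minimal sketch and not part of JABAWS: the class ExecCheck and its method notExecutable are hypothetical names, and the list of binaries is whatever you pass in. It only tests that each file exists and has the executable flag set; it cannot detect missing shared libraries, so running each binary by hand is still needed.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of JABAWS.
final class ExecCheck {

    // Returns the paths that are missing or lack the executable flag.
    static List<String> notExecutable(List<String> paths) {
        List<String> bad = new ArrayList<String>();
        for (String path : paths) {
            File file = new File(path);
            if (!file.isFile() || !file.canExecute()) {
                bad.add(path);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Pass the binaries to check as command line arguments.
        for (String path : notExecutable(Arrays.asList(args))) {
            System.out.println("Not executable: " + path);
        }
    }
}
```

Any path printed by this sketch most likely still needs setexecflag.sh to be run, or does not exist at that location.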

That's it! JABAWS should work at this point. Try it out using the JABAWS test client. If it does not work, read on, or have a look at the tips on deploying on Tomcat.
Note: You may want to enable logging, see below for instructions on how to do that.

Recompiling the bundled programs for your system

If you have a fully equipped build environment on your (POSIX-like) system, then you should be able to recompile the programs from the source distributions which are included in the JABAWS war file. A script called 'compilebin.sh' is provided to automate this task.

1. In a terminal window, change the working directory to binaries/src
2. Execute the compilebin.sh script,
   either use: chmod +x compilebin.sh; ./compilebin.sh > compilebin.out
   or: sh compilebin.sh > compilebin.out
3. Now run sh setexecflag.sh
   If any of the binaries was not recompiled, a 'file not found' error will be raised.
4. Finally, restart Tomcat (or your servlet container), and use the JABAWS test client to check that JABAWS can use the new binaries.

If you couldn't compile everything, it may be that your system does not have all the tools required for compiling the programs. At the very least, check that you have gcc, g++ and make installed on your system. If not, install these packages and repeat the compilation steps. You should also review the compilebin.sh output, which was redirected to compilebin.out, and any errors output to the terminal. Finally, try obtaining pre-compiled binaries for your OS.
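
A quick way to see whether the basic build tools are present is to look for them on the PATH. The sketch below is an illustration only, with hypothetical names (BuildToolCheck, missing); it assumes that gcc, g++ and make are installed under exactly those file names, which is usual on Linux but not guaranteed on every system.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of JABAWS.
final class BuildToolCheck {

    // Returns the tools that are not found as executable files
    // in any directory listed in pathValue.
    static List<String> missing(String pathValue, String... tools) {
        List<String> notFound = new ArrayList<String>();
        for (String tool : tools) {
            boolean found = false;
            for (String dir : pathValue.split(File.pathSeparator)) {
                if (dir.isEmpty()) {
                    continue; // skip empty PATH entries
                }
                File candidate = new File(dir, tool);
                if (candidate.isFile() && candidate.canExecute()) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                notFound.add(tool);
            }
        }
        return notFound;
    }

    public static void main(String[] args) {
        String path = System.getenv("PATH");
        List<String> notFound = missing(path == null ? "" : path, "gcc", "g++", "make");
        System.out.println("Missing build tools: " + notFound);
    }
}
```

An empty list does not prove that compilation will succeed, since individual programs may need further libraries, but a non-empty list explains most early failures of compilebin.sh.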

Reuse the binaries that are already in your system

If you would like to use the binaries you already have, then you just need to let JABAWS know where they are. To do this, edit: conf/Executable.properties

When specifying paths to executables that already exist on your system, make sure you provide an absolute path, or one relative to the JABAWS directory inside webapps. For example, the default path for clustalw is defined as
local.clustalw.bin=binaries/src/clustalw/src/clustalw2
Alternatively, instead of changing Executable.properties, you could replace the executables bundled with JABAWS with the ones that you have, or make symlinks to them. Then the default configuration will work for you. More information about the Executable.properties file is given below.
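
The path rule above can be illustrated with a short sketch. It mirrors the rule as described here rather than JABAWS's own code: the class ExecPath, the method resolve and the Tomcat location /opt/tomcat/webapps/jabaws are assumptions made for the example.

```java
import java.io.File;

// Hypothetical helper, not part of JABAWS.
final class ExecPath {

    // Absolute paths are used as given; anything else is taken
    // relative to the JABAWS directory inside webapps.
    static File resolve(File webappDir, String configured) {
        File file = new File(configured);
        return file.isAbsolute() ? file : new File(webappDir, configured);
    }

    public static void main(String[] args) {
        // Assumed deployment location, adjust to your installation.
        File webapp = new File("/opt/tomcat/webapps/jabaws");
        System.out.println(resolve(webapp, "binaries/src/clustalw/src/clustalw2"));
        System.out.println(resolve(webapp, "/usr/bin/clustalw2"));
    }
}
```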

Obtaining alignment programs for your operating system from elsewhere

You could search for pre-packaged compiled executables in your system package repository or, alternatively, download pre-compiled binaries from each alignment program's home page. Then either replace the supplied executables with the downloaded ones, or modify the paths in Executable.properties as described above.

Configuring JABAWS

There are three parts of the system you can configure: the local engine, the cluster engine, and the paths to individual executables for each engine. These settings are stored in configuration files within the web application directory (for an overview, take a look at the war file content table).

Default JABA Web Services Configuration

Initially, JABAWS is configured with only the local engine enabled, with job output written to a directory called "jobsout" within the web application itself. This means that JABAWS will work out of the box, but may not be suitable for serving a whole lab or institute.

Local Engine Configuration

The Local execution engine configuration is defined in the properties file conf/Engine.local.properties. The supported configuration settings are:
engine.local.enable=true - enable or disable the local engine; valid values are true | false
local.tmp.directory=D:\\clusterengine\\testoutput - a directory to use for temporary file storage; optional, defaults to the Java temporary directory
engine.local.thread.number=4 - number of threads for task execution; valid values are between 1 and 2x, where x is the number of cores available in the system. Optional; defaults to the number of cores if there are 4 or fewer, and to the number of cores minus 1 otherwise.
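
The thread number rule can be written down as a small sketch. This is an illustration of the rule as stated above, not JABAWS's implementation; the names LocalThreads, defaultThreads and isValid are hypothetical, and reading the upper bound as twice the number of cores is an interpretation of the description.

```java
// Hypothetical helper, not part of JABAWS.
final class LocalThreads {

    // Default described above: all cores when there are 4 or fewer,
    // otherwise one core fewer than available.
    static int defaultThreads(int cores) {
        return cores <= 4 ? cores : cores - 1;
    }

    // Assumed valid range: from 1 up to twice the number of cores.
    static boolean isValid(int threads, int cores) {
        return threads >= 1 && threads <= 2 * cores;
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("Cores: " + cores);
        System.out.println("Default thread number: " + defaultThreads(cores));
        System.out.println("Largest valid thread number: " + (2 * cores));
    }
}
```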

If you are planning to use the local engine heavily (which you have to if you do not have a cluster), it is a good idea to increase the amount of memory available to the web application server. If you are using Apache-Tomcat, you can define its memory settings in the JAVA_OPTS environment variable. To specify which JVM to use for Apache-Tomcat, put the full path to the JRE installation in the JAVA_HOME environment variable (we recommend using the Sun Java Virtual Machine (JVM) in preference to OpenJDK). Below is an example of code which can be added to the <tomcat_dir>/bin/setenv.sh script to define which JVM to use and the memory settings for the Tomcat server. The Tomcat startup script (catalina.sh) executes setenv.sh automatically on each server start.
export JAVA_HOME=/homes/ws-dev2/jdk1.6.0_17/
export JAVA_OPTS="-server -Xincgc -Xms512m -Xmx1024m"
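
To see which JVM and heap limit a given set of options produces, a small program can print them. This is only a sketch with a hypothetical class name (HeapInfo); it reports on the JVM it is run in, so it reflects Tomcat's settings only if it is launched with the same JAVA_HOME and the same options as in setenv.sh.

```java
// Hypothetical helper, not part of JABAWS.
final class HeapInfo {

    // Maximum heap the current JVM will attempt to use, in megabytes.
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("JVM location: " + System.getProperty("java.home"));
        System.out.println("JVM vendor: " + System.getProperty("java.vendor"));
        System.out.println("Max heap (MB): " + maxHeapMb());
    }
}
```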

Cluster Engine Configuration

Supported configuration settings:
engine.cluster.enable=true - enable or disable the cluster engine; valid values are true | false, defaults to false
cluster.tmp.directory=/homes/clustengine/testoutput - a directory to use for temporary file storage. Required. The value must be an absolute path, must be different from the directory defined for the local engine, and the directory must be accessible from all cluster nodes.
For the cluster engine to work, the SGE_ROOT and LD_LIBRARY_PATH environment variables have to be defined. They tell the cluster engine where to find the DRMAA libraries. These variables should be defined when the web application server starts up, e.g.

SGE_ROOT=/gridware/sge
LD_LIBRARY_PATH=/gridware/sge/lib/lx24-amd64
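
Whether the two variables are visible to a Java process can be checked with a few lines of code. The sketch below uses hypothetical names (ClusterEnvCheck, missing) and only tests that the variables are set to a non-empty value; it does not verify that the DRMAA libraries can actually be loaded from those locations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of JABAWS.
final class ClusterEnvCheck {

    // Returns the names of the required variables that are unset or empty.
    static List<String> missing(Map<String, String> env) {
        List<String> notSet = new ArrayList<String>();
        for (String name : new String[] {"SGE_ROOT", "LD_LIBRARY_PATH"}) {
            String value = env.get(name);
            if (value == null || value.trim().isEmpty()) {
                notSet.add(name);
            }
        }
        return notSet;
    }

    public static void main(String[] args) {
        // Checks the environment of this process; run it from the same
        // shell or startup script that launches the web application server.
        System.out.println("Missing variables: " + missing(System.getenv()));
    }
}
```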

Finally, do not forget to configure executables for cluster execution. They may be the same as those used for local execution, or they may be different. Please refer to the Executable Configuration section for further details.

Executable Configuration

All the executable programs are configured in the conf/Executable.properties file. Each executable is configured with a number of options. They are:
local.X.bin.windows=<path to executable under windows system, optional>
local.X.bin=<path to the executable under non-windows system, optional>
cluster.X.bin=<path to the executable on the cluster, all cluster nodes must see it, optional>
X.bin.env=<semicolon separated list of environment variables for executable, use hash symbol as name value separator, optional>
X.--aamatrix.path=<path to the directory containing substitution matrices, optional>
X.presets.file=<path to the preset configuration file, optional>
X.parameters.file=<path to the parameters configuration file, optional>
X.limits.file=<path to the limits configuration file, optional>
X.cluster.settings=<list of the cluster specific options, optional>

Where X is a short executable wrapper class name.

The default JABAWS configuration includes paths to local executables to be run by the local engine only. All cluster related settings are commented out, but they are there for you as an example. The cluster engine is disabled by default. To configure an executable for cluster execution, uncomment the X.cluster settings and change them appropriately.

By default, limits are set well in excess of what you may want to offer to users outside your lab, to make sure that tasks are never rejected. The default limit is 100000 sequences of 100000 letters on average for all of the JABA web services. You can adjust the limits according to your needs by editing the conf/settings/<X>Limit.xml files.
After you have completed the editing, your configuration may look like this:
local.mafft.bin.windows=
local.mafft.bin=binaries/mafft
cluster.mafft.bin=/homes/cengine/mafft
mafft.bin.env=MAFFT_BINARIES#/homes/cengine/mafft;FASTA_4_MAFFT#/bin/fasta34;
mafft.--aamatrix.path=binaries/matrices
mafft.presets.file=conf/settings/MafftPresets.xml
mafft.parameters.file=conf/settings/MafftParameters.xml
mafft.limits.file=conf/settings/MafftLimits.xml
mafft.cluster.settings=-q bigmem.q -l h_cpu=24:00:00 -l h_vmem=6000M -l ram=6000M
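
A configuration like the one above can be read with the standard java.util.Properties class. This is a minimal sketch, assuming the file follows the usual Java properties syntax; the class ExecutableConfig and its methods are hypothetical names, and the code is not taken from JABAWS.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper, not part of JABAWS.
final class ExecutableConfig {

    // Parses properties text such as the content of Executable.properties.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException("Could not read configuration", e);
        }
        return props;
    }

    // Returns the local (non-Windows) path for an executable,
    // or null if the property is absent or left empty.
    static String localBin(Properties props, String executable) {
        String value = props.getProperty("local." + executable + ".bin");
        return (value == null || value.trim().isEmpty()) ? null : value.trim();
    }

    public static void main(String[] args) {
        String text = "local.mafft.bin.windows=\n"
                + "local.mafft.bin=binaries/mafft\n"
                + "cluster.mafft.bin=/homes/cengine/mafft\n";
        Properties props = parse(text);
        System.out.println("Local mafft binary: " + localBin(props, "mafft"));
        System.out.println("Cluster mafft binary: " + props.getProperty("cluster.mafft.bin"));
    }
}
```

Note that a key with nothing after the equals sign, such as local.mafft.bin.windows= above, is read as an empty string rather than as a missing property, which is why the sketch treats both cases the same way.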

Please note that relative paths must only be specified for files that reside inside the web application directory; all other paths must be supplied as absolute paths!

Furthermore, you should avoid using environment variables within the paths or options - since these will not be evaluated correctly. Instead, please explicitly specify the absolute path to anything normally evaluated from an environment variable at execution time.

If you are using JABAWS to submit jobs to the cluster (with the cluster engine enabled), executables must be available on all cluster nodes the task can be sent to. Paths to the executables on the cluster, e.g. cluster.<exec_name>.bin, must also be absolute.

Executables can be located anywhere in your system. They do not have to reside on the server, as long as the web application server can access and execute them.

Cluster settings are treated as a black box: the system passes whatever is specified in this line directly to the cluster submission library. This is how DRMAA itself treats these settings. More precisely, the DRMAA JobTemplate.setNativeSpecification() function will be called.

Defining Environment Variables for Executables

Environment variables can be defined in the property x.bin.env, where x is one of the five executables supported by JABAWS. Several environment variables can be specified on the same line. For example:
mafft.bin.env=MAFFT_BINARIES#/homes/cengine/mafft;FASTA_4_MAFFT#/bin/fasta34;

The example above defines two environment variables with the names MAFFT_BINARIES and FASTA_4_MAFFT and the values /homes/cengine/mafft and /bin/fasta34 respectively. A semicolon is used as the separator between different environment variables, whereas a hash is used as the separator between the name and the value of a variable.
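
The format can be illustrated with a small parser. This is a sketch of the syntax as described above, not the code JABAWS uses; the class BinEnvParser and its parse method are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of JABAWS.
final class BinEnvParser {

    // Splits "NAME#VALUE;NAME#VALUE;" into an ordered name-to-value map.
    static Map<String, String> parse(String value) {
        Map<String, String> env = new LinkedHashMap<String, String>();
        for (String pair : value.split(";")) {
            pair = pair.trim();
            if (pair.isEmpty()) {
                continue; // tolerate a trailing semicolon
            }
            int hash = pair.indexOf('#');
            if (hash <= 0) {
                throw new IllegalArgumentException("Expected NAME#VALUE but got: " + pair);
            }
            env.put(pair.substring(0, hash).trim(), pair.substring(hash + 1).trim());
        }
        return env;
    }

    public static void main(String[] args) {
        String line = "MAFFT_BINARIES#/homes/cengine/mafft;FASTA_4_MAFFT#/bin/fasta34;";
        for (Map.Entry<String, String> entry : parse(line).entrySet()) {
            System.out.println(entry.getKey() + " = " + entry.getValue());
        }
    }
}
```

Because the hash separates name and value, a value that itself contains a hash is cut at the first one by this sketch; whether JABAWS behaves the same way is not stated here.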

Configure JABAWS to Work with Mafft

If you use the default configuration, you do not need to read any further. The default configuration will work for you without any changes. However, if you want to install Mafft yourself, there are a couple more steps to do.

The Mafft executable needs to know the location of other files supplied with Mafft. In addition, some Mafft functions depend on the fasta executable, which is not supplied with Mafft but is a separate package. Mafft needs to know the location of the fasta34 executable.

To let Mafft know where the other files from its package are, change the value of the MAFFT_BINARIES environment variable. To let Mafft know where the fasta34 executable is, set the value of the FASTA_4_MAFFT environment variable to the location of the fasta34 program. Alternatively, the location of fasta34 can be added to the PATH variable instead. If you are using the executables supplied with JABAWS, the path to the Mafft binaries would be like <relative path to web application directory>/binaries/src/mafft/binaries and the path to the fasta34 binary would be <relative path to web application directory>/binaries/src/fasta34/fasta34. You can specify the location of the Mafft binaries, as well as the fasta34 program, elsewhere by providing an absolute path to them. All these settings are defined in the conf/Executable.properties file.

For Developers

Web service functions description

All JABA multiple sequence alignment web services comply with the same interface, thus the functions described below are available from all the services.

Functions for initiating the alignment
String id = align(List<FastaSequence> list)
String id = customAlign(List<FastaSequence> sequenceList, List<Option> optionList)
String id = presetAlign(List<FastaSequence> sequenceList, Preset preset)

Functions pertaining to job monitoring and control
JobStatus status = getJobStatus(String id)
Alignment al = getResult(String id)
boolean cancelled = cancelJob(String id)
ChunkHolder chunk = pullExecStatistics(String id, long marker)

Functions relating to service features discovery
RunnerConfig rc = getRunnerOptions()
Limit limit = getLimit(String name)
LimitsManager lm = getLimits()
PresetManager pm = getPresets()

Please refer to the data model javadoc for a detailed description of each method.
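
The typical call sequence (submit, poll the status, fetch the result) can be sketched as follows. To keep the example self-contained, it does not use the JABAWS client library: SketchService and SketchStatus are stand-ins invented here that only mirror the names align, getJobStatus and getResult from the list above. In a real client their roles are played by the MsaWS interface and the JobStatus type, whose exact status values and exceptions are documented in the javadoc and may differ from this sketch.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the submit / poll / fetch pattern; not JABAWS client code.
final class AlignAndWait {

    // Submits the sequences, polls until the job leaves the RUNNING state,
    // then returns the result or fails if the job did not finish normally.
    static <S, A> A run(SketchService<S, A> ws, List<S> sequences, long pollMillis) {
        String id = ws.align(sequences);
        SketchStatus status = ws.getJobStatus(id);
        while (status == SketchStatus.RUNNING) {
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for job " + id, e);
            }
            status = ws.getJobStatus(id);
        }
        if (status != SketchStatus.FINISHED) {
            throw new IllegalStateException("Job " + id + " ended with status " + status);
        }
        return ws.getResult(id);
    }

    public static void main(String[] args) {
        // In-memory fake that finishes immediately, only to make the sketch runnable.
        SketchService<String, String> fake = new SketchService<String, String>() {
            public String align(List<String> sequences) { return "job-1"; }
            public SketchStatus getJobStatus(String id) { return SketchStatus.FINISHED; }
            public String getResult(String id) { return "result of " + id; }
        };
        System.out.println(run(fake, Arrays.asList("seq1", "seq2"), 500L));
    }
}

// Hypothetical, reduced set of job states used by this sketch only.
enum SketchStatus { RUNNING, FINISHED, FAILED }

// Hypothetical stand-in for a multiple sequence alignment service.
interface SketchService<S, A> {
    String align(List<S> sequences);
    SketchStatus getJobStatus(String id);
    A getResult(String id);
}
```

Polling with a pause between status requests keeps the load on the server low; the interval is a parameter so that it can be matched to the expected job duration.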

Structure of the template command line client

Packages	Classes and Interfaces
compbio.data.msa	MsaWS the interface for all multiple sequence alignment web services
compbio.data.sequence	JABAWS data types
compbio.metadata	JABAWS meta data types
compbio.ws.client	JABAWS command line client

The additional utility libraries this client depends upon are compbio-util-1.3.jar and compbio-annotation-1.0.jar.
Please refer to the data model javadoc for a detailed description of each class and its methods.

JABA Web Services Internals

Testing JABA Web Services

You can use a command line client (part of the client only package) to test your JABAWS installation as described here. If you downloaded a JABAWS server package, you can use <your_jaba_context_name>/WEB-INF/lib/jaba-client.jar to test the JABAWS installation as described in the how-to. If you downloaded the source code, you could run a number of test suites defined in the build.xml ant build file.

JABAWS Log Files

JABAWS can be configured to log what it is doing. This comes in handy if you would like to see who is using your web services or need to track down problems. JABAWS uses log4j for logging, and an example log4j configuration is bundled with the JABAWS war file. You will find it in the /WEB-INF/classes/log4j.properties file. All the lines in this file are commented out. The reason logging is disabled by default is simple: log4j has to know the exact location where the log files should be stored, and this is not known until deployment time. To enable logging, define the logDir property in log4j.properties and uncomment the section of the file that corresponds to your needs. More information is given in the log4j.properties file itself. Restart Tomcat or the JABAWS web application to apply the settings.

After you have done this, assuming that you made no other changes to the log4j.properties file, you should see the application log file called activity.log. The amount of information logged can be adjusted using different logging levels; it decreases in the following order of log levels: TRACE, DEBUG, INFO, WARN, ERROR, FATAL.
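
How a level acts as a threshold can be shown without log4j itself. The sketch below defines its own enum purely for illustration (LogLevels and Level are hypothetical names, not log4j classes): a message is written only if its level is at or above the configured one.

```java
// Illustration of level thresholds; does not use the log4j library.
final class LogLevels {

    // Levels in order of increasing severity, as listed above.
    enum Level {
        TRACE, DEBUG, INFO, WARN, ERROR, FATAL;

        // True if a message of this level is written when the
        // logger is configured with the given threshold.
        boolean isLoggedAt(Level threshold) {
            return this.compareTo(threshold) >= 0;
        }
    }

    public static void main(String[] args) {
        Level configured = Level.INFO;
        for (Level level : Level.values()) {
            System.out.println(level + " written at " + configured + ": "
                    + level.isLoggedAt(configured));
        }
    }
}
```

So a configuration set to TRACE produces the most output, and one set to FATAL the least.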

If you would like to know who is using your services, you might want to enable tomcat access logging.

JABAWS War File Content

Directory	Content description
conf/	contains configuration files such as Executable.properties, Engine.local.properties, Engine.cluster.properties
conf/settings	Contains individual executable description files. In particular XXXParameters.xml, XXXPresets.xml, XXXLimits.xml where XXX is the name of the executable
jobsout/	Contains directories generated when running an individual executable. E.g. input and output files and some other task related data. (optional)
binaries/	Directory contains native executables - programs, windows binaries (optional)
binaries/src	Contains source of native executables and Linux i386 binaries.
binaries/matrices	Substitution matrices
WEB-INF	Web application descriptor
WEB-INF/lib	Web application libraries
WEB-INF/classes	log4j.properties - log configuration file (optional)
Help Pages
/	help pages, index.html is the starting page
dm_javadoc	javadoc for JABAWS client (the link is available from How To pages)
prog_docs	documentation for programmes that JABAWS uses
images	images referenced by html pages
