X-Git-Url: http://source.jalview.org/gitweb/?a=blobdiff_plain;ds=sidebyside;f=sources%2Falscript%2Fdoc%2Falscript.tex;fp=sources%2Falscript%2Fdoc%2Falscript.tex;h=9e699aba3d34227116c42087c2d273e816f95b15;hb=4a806b607a11fc8a5296b11d5b863bbf8a448808;hp=0000000000000000000000000000000000000000;hpb=443c228bf0712d71e7fa34b5a2dc4b2b2e79f13f;p=jpred.git diff --git a/sources/alscript/doc/alscript.tex b/sources/alscript/doc/alscript.tex new file mode 100644 index 0000000..9e699ab --- /dev/null +++ b/sources/alscript/doc/alscript.tex @@ -0,0 +1,1745 @@ +\documentstyle[12pt,times,html]{article} +\begin{document} +\parindent 0in +%\raggedright +\bibliographystyle{jmb} +\nocite{TitlesOn} +\begin{titlepage} +%\begin{singlespace} +\begin{center} +\begin{Huge} +\begin{bf} +\vskip 0.5in + +User Guide to + +ALSCRIPT - Sequence alignment to PostScript\\ + +Version 2.03\\ + +\end{bf} +\end{Huge} + +\vskip 0.5in + +\begin{large} +{\em Geoffrey J. Barton} +\vskip 0.5in + + +Laboratory of Molecular Biophysics\\ +University of Oxford\\ +Rex Richards Building\\ +South Parks Road\\ +Oxford OX1 3QU\\ +U.K.\\ + +\vskip 0.25in + +\end{large} +\vskip 0.25in +Tel: (44) 1865-275368\\ +Fax: (44) 1865-510454\\ +e-mail: gjb@bioch.ox.ac.uk + +\vskip 0.25in + +REFERENCE:\\ + Barton, G. J. (1993),\\ + ALSCRIPT a tool to format multiple sequence alignments\\ + Protein Engineering, Volume 6, No. 1, pp.37-40.\\ +\vskip 0.25in + +\end{center} +\end{titlepage} +\tableofcontents + +\section{Update History} +\begin{verbatim} + +VERSION 1.0 19 June 1992 +Version 1.1 26 June 1992 +Version 1.2 21 October 1992: Add multiple blocks per page option. +Version 1.3 15 November 1992: First Distribution. +Version 1.4 6 December 1992: Add Colour commands. +Version 1.4.1 1 February 1993: Small bug fixes - FULL RELEASE VERSION. +Version 1.4.2 15 February 1993: Make silent_mode toggle. +Version 1.4.3 1 March 1993: Fix bug in colour option. +Version 1.4.4 24 March 1993: Add mask features (should be version 1.5). 
+Version 1.4.5 25 May 1993: Include alsnum program in distribution. + 7 June 1993: Fix NO_NUMBERS bug in documentation + Change defaults for -q option to use MASK. + +Version 2.0 23 May 1995: Numerous changes and additions including the + option to colour backgrounds + differently, omission of idents on + second and subsequent lines, helix, + strand and other special characters, + relative numbering, error checking of + ranges on input, bounding box, + screening, conservation colouring... +Version 2.03 5 June 1996: Small bug fixes - patches incorporated. + BACKGROUND_REGION and BOUNDING_BOX commands moved to + step 1 section. + +\end{verbatim} + +\section{Read This First - VERSION 2.0} + +This manual describes an interim release of ALSCRIPT that includes +many additional features over the previously distributed Version +1.4.5. I had hoped to make a lot more changes and improvements before +distributing the new version; however, I have not had the time to do +this. I am distributing Version 2.0 since the new features have been +used in a number of published alignment figures. Please see +\hyperref{New Features in Version 2.0}{Section }{ }{new2} for details +of the new features. + +\section{Related Programs} + +The AMPS package (Barton, 1990). This performs multiple sequence +alignments and databank scanning. + +AMAS (Livingstone and Barton, 1993, CABIOS, 9, 745-756). Analysis of +Multiply Aligned Sequences. This package uses a sophisticated +set-based method to identify patterns of residue conservation in +multiple sequence alignments. + +All programs are available by anonymous ftp from geoff.biop.ox.ac.uk. +Please see the README file for details of licensing and registration. +You can read manuals for the programs and some related papers at +http://geoff.biop.ox.ac.uk/. + +\section{Availability} + +ALSCRIPT is available free of charge for academic and non-commercial +purposes. Distribution is by anonymous ftp from geoff.biop.ox.ac.uk. 
+See the README file on the ftp server for details. You need to +register with G. J. Barton before downloading the software. + +\section{Installing ALSCRIPT} + +See the appropriate section for the computer type you are using: + +For PCs see \hyperref{386 MS-DOS installation}{see Section }{}{app4}. +For Unix see \hyperref{Unix installation}{see Section }{ }{app5}. +For VMS see \hyperref{VMS installation}{see Section }{ }{app6}. + +\section{Brief Description of ALSCRIPT} + +ALSCRIPT takes a multiple sequence alignment in AMPS (Barton \& +Sternberg, 1987, Barton, 1990) block-file format and a set of +formatting commands and produces a PostScript file that may be printed +on a PostScript laser printer, or viewed using a PostScript previewer +(e.g. Sun Microsystem's PageView program). CLUSTAL and GCG format +multiple alignment files may also be used (see below). ALSCRIPT is NOT a +multiple sequence alignment program, nor is it an alignment editor. + +Given a block-file and pointsize (character width/height), ALSCRIPT +calculates how many residues can be fitted across the page, and how +many sequences will fit down the page, it then prints the alignment at +the chosen pointsize on as many pages as are needed. Running ALSCRIPT with +a smaller or larger pointsize will automatically re-scale the alignment +to fit on fewer or more pages as appropriate. The actual page +dimensions may be re-set to any value, so if you have access to an A3 +PostScript printer, or typesetting machine, alignments can readily be +scaled to maximise the available space. + +Each output page has three basic regions. The left hand edge contains +identifier codes for each sequence. The main part of the page holds the +alignment, and the top part, the position numbers and tick marks. ALSCRIPT +commands make use of a character coordinate system for font changes, +and other formatting commands. 
Thus, any residue in the alignment may +be referred to by its sequence position number (x-axis) and sequence +number (y-axis), similarly, ranges of residue positions, or sequences +may also be defined in the character coordinate system. + +The basic ALSCRIPT commands allow the following functionality: + +Fonts: Any PostScript font at any size may be defined and used on +individual residues, regions or identifier codes. + +Boxing: Simple rectangular boxes may be drawn around any part of the +alignment. Particular residue types may be selected and automatically +"surrounded" by lines. For example, if the characters 'G' and 'P' are +selected, then lines will not be drawn between G and P characters, but +only where G and P border with other characters. + +Shading: Grey shading of any level from black to white may be applied +to any region of the alignment, either as a rectangular region, or as +residue specific shading. e.g. "shade all Cys residues between +positions 6 and 30" + +Text: Specific text strings may be added to the alignment at any +position and in any font or font size. + +Lines: Horizontal or vertical lines may be drawn to the left, right, top +or bottom of any residue position or group of positions. + +Colour: Characters or character backgrounds may be independently +coloured. + +The example block file "example1.blc" and command file "example1.als" +illustrate most of these commands in action. + +Although written with the aim of producing figures for journal +submission, ALSCRIPT may be used as a tool for interpreting multiple +sequence alignments. For example, the boxing, shading and font changing +facilities can be applied to highlight amino acids of a particular type +and thus draw attention to clusters of positive or negative charge, +hydrophobics, etc. 
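+
+As a minimal sketch of the character coordinate system, the following
+STEP 2 fragment combines two of the effects listed above. It assumes a
+10 sequence alignment at least 30 positions long; the grey value 0.6 is
+illustrative, and the four-number region form of SHADE\_CHARS\ is assumed
+by analogy with the region commands described later in this guide.
+
+\begin{verbatim}
+#Illustrative sketch only - region values are assumptions
+SURROUND_CHARS GP ALL           #lines around all G and P
+SHADE_CHARS C 6 1 30 10 0.6     #shade Cys, positions 6-30, sequences 1-10
+\end{verbatim}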
+ + +\section{New Features in Version 1.4.4} + +This version introduces the MASK family of commands which allow +boxing, shading etc to be applied according to the frequency of +occurrence of the character types at each position in the alignment. +For example, it is possible to box positions where one character is +seen in more than N of the sequences. It is also possible to +box/shade etc the most frequently occurring character at each +position. Commands exist to select which characters will be used in +the calculation of frequencies and which will be excluded; thus boxing +can be based upon two or more character types at a position. MASK +commands also exist to show residues identical to one sequence in the +set. See the section on MASK below for details. + +NOTE: Although boxing according to the frequency of amino acids seen +at a position is a popular method of representation, it is not usually +the most informative. An analysis that takes into account the +physico-chemical properties of the amino acids and also relates the +amino acid similarities to the overall similarity between the +sequences is more helpful in identifying functionally important +residues. The AMAS program (Livingstone and Barton, 1993) applies a +flexible hierarchical set-based approach to this problem. + +\section{New Features in Version 1.4.5 - Program alsnum} + +Version 1.4.5 includes the program "alsnum". This is a temporary solution to +the residue numbering problem. Ultimately, these functions will be included +as alscript commands. + +alsnum creates a set of TEXT commands that can be incorporated into an +alscript command file to place sequential numbers at any position in the +alignment. The numbers ignore gaps, so the numbering will correspond to +the specific sequence position rather than the alignment. + +To use the program: + +1. Decide where you want the numbers to be placed. For example, you might +want the numbers above the third sequence in the alignment. 
If so, make +an extra sequence space above the third sequence using the ADD\_SEQ\ command. + +2. Decide what is the number of the first residue of the sequence to be +numbered. This will not always be 1 since you may be aligning fragments +or domains. + +3. Decide the numbering interval (e.g. every 10th amino acid). + +4. Run the program. + +For example, if you want to add numbers according to sequence 37 of a +block file (junk.blc), calling the first residue of the sequence 12, +and with an interval of 5, and the numbers are to be placed at the +location of sequence 3 in the alignment. Type: + +alsnum 37 12 5 3 $<$ junk.blc $>$ junk.text + +5. Add the resulting TEXT commands from junk.text to your alscript +command file. + +\section{New Features in Version 2.0} +\label{new2} + +Error checking is now done on all ranges input. If you run ALSCRIPT +2.0 on a file that worked with ALSCRIPT 1.4.5, and it complains about +out of range numbers, then check your ranges carefully. If you think +you are right, then send me a minimal example of the problem and I +will investigate. Versions of ALSCRIPT before 2.0 would often work +happily with out of range numbers and produce perfectly OK output. + +The files {\bf ipns.als} and {\bf ipns.blc} show example command and +block file that use most of these new commands. See the {\em +examples} directory. + +\subsection{New Step 1 Commands} + +SCREENSIZE 120 + +Usually you should not need to change this value, it alters the +screening used by the printer. A value of 120 is used by default. On +most 300dpi black and white printers this gives much smoother greys +than the default used in earlier versions of ALSCRIPT. + +PIR\_SAVE\ filename + +Will cause the block file to be saved into the file ``filename'' in +PIR format. This can be useful for moving block file alignments to other +programs. + +MSF\_SAVE\ filename + +Will cause the block file to be saved into the file ``filename'' in +something that approximates GCG .msf format. 
{\em Warning! This has not +been fully tested.} + +NUMBER\_COLOUR\ 4 + +Sets the colour used for numbering at the top of the alignment (no +American spelling at the moment). In this example, colour number 4 has been +defined (see the DEFINE\_COLOUR command if you are not sure what this means). + +SINGLE\_PAGE + +If this is set, ALSCRIPT assumes everything will be plotted on one +page. At the moment, all this does is write the bounding box for the +figure, thereby encapsulating the PostScript. This {\em may} allow the +output of alscript to be imported into word processors etc, but +probably not all. + +ID\_ONLY\_ON\_FIRST\_LINE + +If this is present, then sequence identifiers will only be printed on +the first line of the alignment. Often this looks better for small +alignments than the default. + +BACKGROUND\_COLOUR\ 7 + +Sets the colour used for the background to the alignment. This can be +useful for preparing figures for projection. At the moment this only +works reliably when SINGLE\_PAGE is also set. + + +BOUNDING\_BOX\ x\ y\ x1\ y1 + +Defines the bounding box for the figure. This is set in points (1/72 inch). +NOTE: In version 2.0 this was a STEP 2 Command. + +BACKGROUND\_REGION\ x\ y\ x1\ y1 + +Defines the region to colour as background - the default is set up for A4 paper +so US users may have to fiddle with this. Values are points (1/72 inch). +NOTE: In version 2.0 this was a STEP 2 Command. + +\subsection{New Step 2 Commands} + + +COLOUR\_TEXT\_REGION\ x\ y\ x1\ y1 colour + +Sets the colour for TEXT command output. Similar syntax to COLOUR\_REGION, FONT\_REGION etc. + +COLOUR\_LINE\_REGION\ x\ y\ x1\ y1 colour + +Sets the colour for LINEs in a region. + +CALCONS\ x\ y\ x1\ y1 + +Calculates conservation values according to Zvelebil {\em et al.} for the designated +region. (See Livingstone \& Barton 1993 for details and further references.) + +MASK\ CONSERVATION\ cutoff + +If CALCONS has been used, then mask residues according to the conservation cutoff. + +e.g. 
MASK CONSERVATION 10 would mask all identities, MASK CONSERVATION 6 would +mask reasonably conserved positions. See examples for more on this command. + +HELIX\ x1\ y\ x2 + +Draw a helix from x1 to x2 of sequence y. + +STRAND\ x1\ y\ x2 + +Draw a strand from x1 to x2 of sequence y. + +COIL\ x1\ y\ x2 + +Draw a coil (horizontal line) from x1 to x2 of sequence y. + +RELATIVE\_TO + +Set reference numbers to work relative to sequence number . This +means that in all subsequent commands, ALSCRIPT will translate your x +values into absolute position values in the alignment. This is {\em +extremely} useful since you can annotate your alignment using your +favourite sequences as a reference point. You no longer have to +translate every x position into the alignment position. + + is optional. If present, it specifies what the first +residue in the displayed sequence is. For example, you may be showing +residues 200-500 of a sequence, so would be 200 rather than +the default of 1. {\em Warning - this is a very new feature and +bounds checking is not fully enabled for it}. + +You can use RELATIVE\_TO several times in the command file to annotate +different sequences. RELATIVE\_TO 0 resets to the ``normal'' +alignment numbering. + +\subsection{New special TEXT commands} + +Some special TEXT commands have been added to allow drawing of alternative +shapes etc. In fact this is how the HELIX, STRAND and COIL commands are implemented. The text commands are all prefixed by an @ symbol. + +e.g. TEXT 3 6 ``@fuparrow'' + +will draw a filled up arrow at position 3,6. + +The alternative text commands are: + +@leftarrow - an open left pointing arrow. + +@fleftarrow - a filled left pointing arrow. + +@uparrow - an open up pointing arrow. + +@fuparrow - a filled up pointing arrow. + +@downarrow - an open down pointing arrow. + +@fdownarrow - a filled down pointing arrow. + +@circle - an open circle. + +@fcircle - a filled circle. 
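+
+A minimal sketch of how the secondary structure and special TEXT
+commands might be combined to annotate an alignment. It assumes a 10
+sequence block file; the residue ranges and the arrow position are
+invented for illustration.
+
+\begin{verbatim}
+#STEP 1 - before SETUP
+ADD_SEQ 10 2               #two blank rows (11 and 12) after sequence 10
+#STEP 2 - after SETUP
+HELIX 5 11 18              #helix over positions 5 to 18 in row 11
+COIL 19 11 24              #coil over positions 19 to 24 in row 11
+STRAND 25 11 32            #strand over positions 25 to 32 in row 11
+TEXT 20 12 "@fuparrow"     #filled up arrow at position 20 in row 12
+\end{verbatim}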
+ +I plan to make this option more flexible in the near future. + +\section{Running ALSCRIPT} +\subsection{Basic Use} + +I recommend you read through this section, then scan the commands in +\hyperref{ALSCRIPT Command Summary}{Section }{ }{app1} to get a feel for what ALSCRIPT can do. + +See \hyperref{Alternative ways of invoking ALSCRIPT}{Section }{for +alternative methods of invoking ALSCRIPT}{app7}. In this section, the +interactive method is described. The QUICK START method shown in +\hyperref{alternative ways of invoking ALSCRIPT}{Section }{ }{app7} +is useful to format a sequence alignment quickly +using standard pointsize and shading. + +ALSCRIPT is designed to work with AMPS block file format multiple +alignments. If you have a multiple alignment generated by CLUSTAL V +or the GCG package, then it must be translated to AMPS block file +format. + +To translate a GCG .MSF file: Type: msf2blc. +To translate a CLUSTAL PIR format file, or any PIR format file: clus2blc. + +Both programs prompt for the name of an input file, and an output +block file name. A good convention to follow is to name all blockfiles +with the extension ".blc". + +To run ALSCRIPT simply type: + +alscript + +you will then be prompted for the name of the ALSCRIPT command file. +Having typed the filename, the commands will be executed as you have +specified. + +A Simple Command File (example.als) + +The file example.blc contains a small multiple sequence alignment. +The following ALSCRIPT command file will convert this into a +PostScript alignment file in 12 point Helvetica. 
+ +\begin{verbatim} +#Comments in ALSCRIPT command files start with a # +# +#Commands are free format - separated by blank, tab or comma characters +# +BLOCK_FILE example.blc #define the block file to format +OUTPUT_FILE example.ps #where to put the result +LANDSCAPE #landscape paper orientation +POINTSIZE 12 #12 point default pointsize +DEFINE_FONT 0 Helvetica DEFAULT #set font 0 to be Helvetica +SETUP #Tell the program to get on with it. +\end{verbatim} + +Now try changing the POINTSIZE value to 5. ALSCRIPT will re-format the +alignment to make best use of the available paper. + +These are all STEP 1 commands - they refer to overall layout and system +settings - for example, the paper size or maximum sequence length. +Other commonly used STEP 1 commands are IDENT\_WIDTH\ which reserves more +or less width for the sequence identifier codes, NUMBER\_SEQS\ which adds +a number to each sequence and LINE\_WIDTH\_FACTOR\ which allows the +thickness of all boxing lines to be adjusted. See +\hyperref{ALSCRIPT Command Summary}{Section }{ }{app1} +for details of these and all other STEP 1 commands. + +The simple example outlined above can be modified with a variety of +STEP 2 commands. + +For example, the file example2.als: + +\begin{verbatim} +# FILE example2.als +# +#Commands are free format - separated by blank, tab or comma characters +# +BLOCK_FILE example.blc #define the block file to format +OUTPUT_FILE example2.ps #where to put the result +LANDSCAPE #landscape paper orientation +POINTSIZE 12 #12 point default pointsize +DEFINE_FONT 0 Helvetica DEFAULT #set font 0 to be Helvetica +DEFINE_FONT 1 Helvetica REL 0.5 #set font 1 to be half sized Helvetica +DEFINE_FONT 3 Helvetica-Bold DEFAULT #set font 3 to be Bold Helvetica +DEFINE_FONT 4 Times-BoldItalic DEFAULT #font 4 is Times-BoldItalic +NUMBER_SEQS #Number the sequences at left hand side +SETUP #Tell the program to get on with it. +# +#step 2 commands come after the SETUP command +# +#Here are some examples... 
+# +SURROUND_CHARS GP ALL #draw lines around all G and P +SHADE_CHARS ILVW ALL 0.6 #shade all I L V and W with value 0.6 +BOX_REGION 1 1 2 20 0.8 #rectangular box from positions 1 to 2 of sequences 1 to 20 +FONT_CHARS C ALL 3 #Use font 3 (BOLD Helvetica) for C characters +ID_FONT ALL 1 #set identifiers in font 1 + +\end{verbatim} + +There are many possible ways of combining these commands and the +others shown in +\hyperref{ALSCRIPT Command Summary}{Section }{ }{app1}. +In general, if you apply multiple +commands to the same residue, the effect of the last applied command +persists where there would otherwise be a conflict. Thus the +intersection of two overlapping SHADed regions would be shaded +according to the second SHADE command, not some mixture of the two. +Similarly for FONT commands. BOX and SURROUND commands behave in the +opposite sense, all BOXing and SURROUNDing persists regardless of how +many commands you issue. This makes it possible for example, to +SURROUND two different sets of residues as follows: + +\begin{verbatim} + +SURROUND_CHARS DE ALL +SURROUND_CHARS DEHKR ALL + +\end{verbatim} + +This would result in D and E characters being partitioned from the rest as well +as D E H K R characters (see Example output). + +\subsection{More complex effects - Text Lines, and Masks} + +Text, lines and masking are meant to be used to annotate the multiple +alignment. The TEXT command allows any piece of text to be located +anywhere on the alignment. Clearly, however it makes little sense to +superimpose the text over the alignment though this can be done! Accordingly, +you must first make a space to put the text in. Usually, this will be a +few lines below the multiple sequence alignment, but you may want to +add text at the top, or somewhere in between two sequences. You can make +space in two ways. Either by editing the block-file to introduce "dummy" +sequences at the locations you want, or by making use of the ADD\_SEQ\ +command. 
+ +The ADD\_SEQ\ command has two arguments, the sequence after which you want +further sequences to be added, and how many blank sequences you need. +Thus, we can reserve space for 5 lines of text underneath a 10 sequence +multiple alignment with the following command. + +ADD\_SEQ\ 10 5 + +we can then put text below the alignment at the 20th residue. + +TEXT 20 13 "Active Site His" + +or any other position. + +Similarly, we could draw a vertical line to point out which residue we mean + +LINE LEFT 20 13 11 + +And change the font of the text to number 7 (whatever that has been set to): + +FONT\_RESIDUE\ 20 13 7 + +You can have multiple ADD\_SEQ\ commands, but they must occur in sequence order. +Thus: + +\begin{verbatim} +ADD_SEQ 0 5 +ADD_SEQ 5 12 +\end{verbatim} +is legal. +But +\begin{verbatim} +ADD_SEQ 5 12 +ADD_SEQ 0 5 +\end{verbatim} + +Is NOT!! NO CHECKING IS performed by the program for this error - so beware! + +Note that add\_seq\ commands refer to the actual sequence number as +implied by the block file, not the number after applying the add\_seq\ +command. Thus, for a four sequence block file, if you want to add +space for three sequences before sequence 1 and two sequences after +sequence 3, the commands would be: + +\begin{verbatim} +ADD_SEQ 0 3 +ADD_SEQ 3 2 +\end{verbatim} + +Text added with the TEXT command will not be split across page breaks, +so you may in some circumstances need to fiddle a little with the +location/pointsize for the text to get the desired result. + +Masking is a technique for drawing irregular shaped outlines, or +shaded regions - this should not be confused with the MASK family of commands +described below. For example a histogram can be added to the bottom of +an alignment by first defining some dummy sequences in the block-file +that have letters building up the shape of the histogram, then using +the SURROUND\_CHARS\ or SHADE\_CHARS\ commands together with the SUB\_CHARS\ +command to produce the desired effect. 
An example of this operation +being used to show frequencies of secondary structure predictions is +shown in example1.als and in the Protein Engineering paper. + +\section{Using Colour} + +Version 1.4 includes commands to allow the independent colouring of +characters, or their backgrounds. Colours are defined in a similar +manner to fonts using the DEFINE\_COLOUR\ command (American spelling also +allowed). For example: + +DEFINE\_COLOUR\ 7 1 0 0 + +defines colour number 7 to be red - see +\hyperref{ALSCRIPT Command Summary}{Section }{ }{app1} +for full details of +this command. Colours 99 and 100 are pre-defined to white and black. +ALSCRIPT assumes the paper colour is white. + +The command to colour the text of a character or text string is: + +CCOL\_CHARS\ + +The command to colour the background of a character is: + +SCOL\_CHARS\ + +Both have similar syntax to the FONT\_CHARS command. + +COLOUR\_REGION\ and COLOUR\_RES\ have similar syntax to SHADE\_REGION\ and +SHADE\_RES. + +An example command file that uses colour is shown in example3.als. + +\section{The MASK command family} + +The idea behind the MASK command is to build up a set of character +positions that will subsequently be boxed, shaded, set in a particular +font, etc. For example, let's say we want to box the most frequently +occurring character at each position in an alignment. + +The command + +mask SETUP + +tells ALSCRIPT to prepare a mask. + +mask FRE ALL + +specifies that the most frequently occurring character at each position +in the alignment will be masked. This command can be restricted to a +region of the alignment using: mask FRE sx sy ex ey, where sx etc +define the region in the same way as for font\_region\ and other commands. + +mask BOX ALL + +tells ALSCRIPT to create the boxing lines that will separate the masked +characters from non-masked characters - this command may also be +restricted to a region of the alignment. 
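+
+Put together, the sequence described above might look like the following
+sketch. The region in the last two lines is an assumption (positions 1 to
+50 of sequences 1 to 10) chosen only to show the restricted form.
+
+\begin{verbatim}
+mask SETUP                 #prepare a mask
+mask FRE ALL               #flag the most frequent character per position
+mask BOX ALL               #draw lines around the masked characters
+#Restricted alternative - region values are illustrative assumptions
+#mask FRE 1 1 50 10
+#mask BOX 1 1 50 10
+\end{verbatim}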
+ +The mask can be reset for re-use using the command: + +mask RESET + +Two further commands define which characters can be used when +calculating the mask. This allows gap characters or selected amino acids +to be excluded from the calculation to avoid unwanted boxing. + +mask LEGAL "AVLI" + +defines A, V, L and I as the only characters that will be used when +calculating the mask. + +mask ILLEGAL ".-\_" + +defines ., - and \_\ as characters that will not be used when calculating +the mask. NOTE: the blank character " " cannot be defined in this way. +To avoid boxing " " characters, replace blanks with another character +(using SUB\_CHARS),\ calculate the mask, then substitute back. + +\subsection{Summary of mask commands} + +mask SETUP +\# Prepares for masking + +mask LEGAL $<$qstring$>$ +\# defines characters to include in ID or FRE calcs - optional. + +mask ILLEGAL $<$qstring$>$ +\# defines characters to exclude in ID or FRE calcs - optional. + +mask ID ALL N +\# Calculates a mask that flags the character that occurs at least N +times at a position. The word ALL can be replaced by four numbers +that define a region of the alignment. + +mask FRE ALL +\# calculates a mask that flags the most frequently occurring amino acid +at each position. ALL may be replaced by four numbers defining a region +of the alignment. + +Multiple mask FRE or mask ID commands may be applied, using different +LEGAL and ILLEGAL character definitions. In this way more complex +effects can be built up. + +The mask command also allows characters that are identical to one +sequence to be masked. + +mask AGREE ALL N + +will mask all positions that are identical to the Nth sequence. Thus, +for sequences that are very similar to a newly determined sequence, all +characters identical to the new sequence can be boxed or shaded, or set +in a different font or colour etc... + +mask NOT ALL + +allows the mask to be inverted. Thus, all positions that are NOT in the +mask now form the mask. 
So, having done a mask AGREE, a mask NOT will +allow the positions that are not identical to the selected sequence to +be highlighted or substituted. + +mask SUB ALL $<$char$>$ + +substitutes all characters in the mask with the character $<$char$>$. + +mask REGION ALL +applies a mask to all residues in the defined region. + +The following effects can now be applied to the masked characters: + +mask BOX ALL +\# boxes the masked residues - ie surrounds them by lines. + +mask SHADE ALL $<$grey$>$ +\# shades the masked residues by grey value. + +mask FONT ALL $<$fontnum$>$ +\# Uses font fontnum to output the masked residues. + +mask INVERSE ALL +\# Inverts the masked characters - ie outputs them in white. + +mask CCOL ALL $<$colnum$>$ +\# outputs the masked characters in the defined colour. + +mask SCOL ALL $<$colnum$>$ +\# outputs the backgrounds of the masked characters in the defined +colour. + +In all commands, the word ALL can be replaced by four numbers defining +the region to which the command is applied. + +mask RESET +\# resets the entire mask for re-use + + +\section{Printing ALSCRIPT Files} + +ALSCRIPT produces files in PostScript which may be printed on a PostScript +printer (e.g. an Apple LaserWriter). If you don't have a PostScript printer, +then you may still be able to use ALSCRIPT if you get hold of the +GhostScript software. This is a free package that interprets PostScript +commands and can produce output on a large number of different types of +printer. GhostScript runs on most hardware types (including PCs) and can also +display output to the screen. The package can be obtained from many different +sites on the Internet (In the UK try src.doc.ic.ac.uk). + +The actual command you need to type to send a PostScript file to the printer +will depend on your system. Consult your system manager for help. + +Be warned, ALSCRIPT can create extremely large PostScript files if +lots of boxing and shading is done on big alignments. 
On older +printers such output may take a long time to process. + +\section{Conclusion} + +ALSCRIPT provides a powerful set of formatting and editing commands +specifically tailored for multiple sequence alignments. It is best +used in conjunction with a PostScript previewer such as Sun's +PageView or GhostView since this allows the effect of changing a command to be +seen quickly. In the absence of such a tool, simpler effects can be +tested out without destroying too many trees in the Laser Printer! + +Like most programs, ALSCRIPT is evolving as I find new problems to display, so +if you have any suggestions, I shall endeavour to include them in a later +version. + +\section{Appendices} + +\subsection{ALSCRIPT Command Summary} +\label{app1} + +WARNING: Very little error checking is performed on command input. If +you give the wrong number of arguments to a command, then unexpected +things may happen, or the program will crash very inelegantly. I hope +to fix this in the next version of the program. In the meantime, make sure +you give the correct number of arguments to each command. + +All commands up to the first space character may be entered in UPPER +or lower case or MiXEd case. Qualifiers for commands (e.g. REL) must +be written in UPPER case. + +Command Reference: + +\begin{verbatim} +<int> = enter an integer (e.g. 240) +<float> = enter a floating point number (e.g. 0.45) +<string> = enter a string (e.g. ARNDql) +<qstring> = enter a quoted string (e.g. "Active Site") +<char> = enter a single character. +\end{verbatim} + +\subsubsection{STEP 1 COMMANDS} + +These all refer either to system settings (e.g. the maximum allowed +sequence length) or to general page layout features (e.g. the longest +and shortest side of the paper on which you are plotting). + +\subsubsection{REQUIRED STEP 1 COMMANDS} + +BLOCK\_FILE\ $<$string$>$ + +Gives the name of the file that contains the multiple sequence alignment +to be formatted. File names should be fully qualified i.e. 
not +relative to the current directory. If no block file command is given, +ALSCRIPT will expect to read the block file from standard input. + +OUTPUT\_FILE\ $<$string$>$ + +Defines the output file name. This command should be near the beginning of +the command list. +e.g. OUTPUT\_FILE\ Figure1.ps + +You MUST define an output file unless the -p option +(See \hyperref{Alternative ways of invoking ALSCRIPT}{Section }{ }{app7}) +is used. + +DEFINE\_FONT\ $<$int$>$ $<$string$>$ ($<$int$>$/DEFAULT)/(REL $<$float$>$) + +Defines a font to use later: +e.g. +\begin{verbatim} +DEFINE_FONT 0 Helvetica 10 +DEFINE_FONT 2 Times-Roman 2 +\end{verbatim} + +defines font number 0 to be 10 point Helvetica, and font number 2 to be +2 point Times-Roman. Font 0 is always used as the default font. You MUST +define at least the font 0 font. + +DEFINE\_FONT\ 3 Times-Roman DEFAULT + +sets font 3 to be Times-Roman at whatever the default pointsize +is as set by the POINTSIZE command. + +DEFINE\_FONT\ 4 Helvetica REL 0.5 + +sets font 4 to be helvetica at half the default pointsize. + +NOTE: Font names must be written exactly as shown in +\hyperref{PostScript Fonts}{Section }{ }{app3}. + + +SETUP + +Signals the end of the STEP 1 commands. + +\subsubsection{OPTIONAL STEP 1 COMMANDS} + + +ADD\_SEQ\ $<$int$>$ $<$int$>$ + +Allows extra sequence positions to be created in an existing alignment. This +permits additional annotations to be interspaced either above, below, or +anywhere in the middle of an alignment. For example: + +ADD\_SEQ\ 0 10 + +would create an additional 10 sequences - all set to the blank character +before the first sequence in the block file that has been read in. + +ADD\_SEQ\ 3 1 + +would add an extra sequence after sequence 3. + +IMPORTANT: If you use the ADD\_SEQ\ facility to add sequences anywhere except +after the last sequence, then remember that the sequence +numbers will alter. All formatting commands that follow this command +must use the new sequence numbering. 
Thus in the first example: + +ADD\_SEQ\ 0 10 + +what was sequence number 1 becomes sequence 11. Sequences 1-10 are +the new blank sequences to be used for annotation. Note that the +sequence numbers only change for commands AFTER the SETUP command, +thus, multiple add\_seq\ commands refer to the sequence number as +implied by the block file. + +POINTSIZE $<$int$>$ + +Defines the pointsize to be used to scale the plot and space the characters. +Default is 10 point. + +NUMBER\_SEQS + +If present, then the sequence number is output with the identifier +code. This is useful for finding the coordinates of residues to box +or otherwise highlight. + +LANDSCAPE + +Specifies that alignments will be plotted with the longest paper axis +horizontal. (Can get longer alignments on a page this way). + +PORTRAIT + +Specifies that alignments will be plotted with the longest paper axis vertical +(can get more sequences on a page this way). + +IDENT\_WIDTH\ $<$int$>$ + +Units are characters. + +Reserves $<$int$>$ characters at left of every page for plotting +identifiers. Note that not all this space need be used, if a smaller +pointsize is used to plot out the identifier codes, than is used for +the main alignment. + +LINE\_WIDTH\_FACTOR\ $<$float$>$ + +Value greater than 0 that scales the default line width. The linewidth +is obtained by multiplying the pointsize by this factor. + + +X\_SPACE\_FACTOR\ $<$float$>$ + +Y\_SPACE\_FACTOR\ $<$float$>$ + +This determines the spacing between adjacent residues in the X and Y +directions. The spacing is calculated as: POINTSIZE + POINTSIZE * +X\_SPACE\_FACTOR or POINTSIZE + POINTSIZE * Y\_SPACE\_FACTOR\ as +appropriate. Defaults are 0.2 and 0.0 respectively. + +X\_SHIFT\_FACTOR\ $<$float$>$ + +Y\_SHIFT\_FACTOR\ $<$float$>$ + +These determine the shift relative to the residue drawing position +that is given to the boxing lines. 
The shift is calculated as +follows + +(POINTSIZE + POINTSIZE * X\_SPACE\_FACTOR)\ * X\_SHIFT\_FACTOR\ +similarly for Y\_SHIFT\_FACTOR. + +The defaults are 0.3 and 0.0 respectively. + +Fiddling with the X\_SPACE/SHIFT\ values is useful to fine tune the +appearance of the alignment. + + +MAX\_INPUT\_LEN\ $<$int$>$ + +Units are characters. +Defines the maximum number of characters possible in the input line +length. This must be greater than the maximum number of sequences +(MAX\_NSEQ). + +e.g. MAX\_INPUT\_LEN\ 600 + +Increases the default value of 500 characters to 600 characters. + +MAX\_NSEQ\ $<$int$>$ + +Units are characters. Defines the maximum number of sequences that +may be read by the program. This parameter has a large default (500). +You may need to reduce it on computers with small memories. + +MAX\_ILEN\ $<$int$>$ + +Units are characters. +The maximum length allowed for a sequence identifier code. + +MAX\_SEQ\_LEN\ $<$int$>$ + +Defines the maximum length allowed for a sequence alignment - this may +need to be reduced from the 8000 default value on smaller computers. + +X\_OFFSET\ $<$int$>$ +Units of points (1/72 inch). + +Defines the offset along the X-axis that the alignments will be shifted prior +to printing. Fiddle with this value to get a nice offset from the bottom left +hand corner of the page if your page size is not A4. + +Y\_OFFSET\ $<$int$>$ +Units of points (1/72 inch). + +As for X\_OFFSET,\ only Y axis. + +MAX\_SIDE\ $<$int$>$ +Units of inches. + +Defines the length of the longest side of the printer page. + +MIN\_SIDE\ $<$int$>$ +Units of inches. + +Defines the length of the shortest side of the printer page. + +VERTICAL\_SPACING\ $<$int$>$ + +Defines the vertical spacing in character units between blocks of sequences +when more than one block will fit on a page - default is 0. 
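+
+As a worked illustration of the spacing and shift formulae given above
+for X\_SPACE\_FACTOR\ and X\_SHIFT\_FACTOR,\ take the default POINTSIZE of
+10 together with the default factors (0.2 and 0.0 for spacing, 0.3 and
+0.0 for shift). This is only a sketch of the arithmetic as the manual
+states it. It assumes that the results are in points and that the Y shift
+follows the same pattern as the X shift:
+
+\[
+\begin{array}{rcl}
+\mbox{X spacing} & = & 10 + 10 \times 0.2 = 12 \\
+\mbox{Y spacing} & = & 10 + 10 \times 0.0 = 10 \\
+\mbox{X shift} & = & (10 + 10 \times 0.2) \times 0.3 = 3.6 \\
+\mbox{Y shift} & = & (10 + 10 \times 0.0) \times 0.0 = 0
+\end{array}
+\]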
+ + +DEFINE\_COLOUR\ $<$int$>$ $<$float$>$ $<$float$>$ $<$float$>$ + +DEFINE\_COLOR + +Defines a colour - the first number is a number by which the colour will +be referred. The following three numbers are the intensities of red, +green and blue respectively. Thus: + +DEFINE\_COLOUR\ 1 0 0.2 0.8 + +sets colour number 1 to be a colour with no red, 0.2 green and 0.8 blue. +The exact appearance of this colour will depend on the output device. +If you find suitable combinations of colours for your printer, then +please let me know and I shall distribute your suggestions with the +program. + +DO\_TICKS + +If present, then tick marks are drawn below the numbers at the top of the page. +Otherwise no ticks are shown. + +NUMBER\_INT\ $<$int$>$ + +Specifies the interval for writing residue position numbers. Default is 10 + +NO\_NUMBERS + +Switches all residue numbering off. + + +\subsubsection{STEP 2 COMMANDS} + + +All these are optional formatting commands. + +IMPORTANT PLEASE READ THIS NOTE: + +For those commands that accept region definitions (e.g. SURROUND\_CHARS) it +is easiest to think of the region being defined in terms of X and Y +coordinates, where X is the sequence residue coordinate and Y is the +sequence number coordinate. Thus 3 7 means the 3rd residue in sequence 7. +3 7 12 42 means the rectangular box bounded by residue 3 of sequence 7 and +residue 12 of sequence 42. + +SURROUND\_CHARS\ $<$string$>$ ALL + +Draw lines round, but not between the characters that are in the string. +e.g. + +SURROUND\_CHARS\ GP ALL + +will draw lines round all G and P characters in the alignment, but not +between adjacent G and P characters. + +SURROUND\_CHARS\ $<$string$>$ $<$int$>$ $<$int$>$ $<$int$>$ $<$int$>$ + +Similar command, but the surrounding is restricted to the region defined by +the four integers. + +e.g. + +SURROUND\_CHARS\ ILVW 3 12 7 32 + +would surround ILVW characters that occur in the region defined +from residue positions 3-7 of sequences 12 to 32. 
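+
+To make the region convention concrete, the four integers can be read as
+$x_1\ y_1\ x_2\ y_2$, where $x$ is the residue position and $y$ is the
+sequence number. Assuming both bounds are inclusive (the SHADE\_CHARS\
+example below says so explicitly), the region 3 12 7 32 used above covers
+
+\[
+(x_2 - x_1 + 1) \times (y_2 - y_1 + 1) = (7 - 3 + 1) \times (32 - 12 + 1)
+= 5 \times 21 = 105
+\]
+
+character positions: 5 residue columns in each of 21 sequences.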
+ +SHADE\_CHARS\ $<$string$>$ ALL $<$float$>$ + +Shade all characters in the $<$string$>$ by the grey value given by $<$float$>$. +e.g. + +SHADE\_CHARS\ GP ALL 0.5 + +would shade all G and P characters in the alignment by the grey value 0.5. + +SHADE\_CHARS\ $<$string$>$ $<$int$>$ $<$int$>$ $<$int$>$ $<$int$>$ $<$float$>$ + +restricts the shading to the region defined by the four integers. Thus + +SHADE\_CHARS\ ILVW 3 12 7 32 0.7 + +would shade I L V and W characters from residues 3-7 of sequences 12-32 +inclusive with a grey value of 0.7. + +FONT\_CHARS\ $<$string$>$ ALL $<$int$>$ + +e.g. + +FONT\_CHARS\ GP ALL 7 + +would use font 7 to write out all G and P characters. Font 7 MUST have been +defined using the DEFINE\_FONT commands above. + +FONT\_CHARS\ $<$string$>$ $<$int$>$ $<$int$>$ $<$int$>$ $<$int$>$ $<$int$>$ + +Similar to previous command, but restricts the effect to the region defined +by the first four integers. The font must have been defined by the +DEFINE\_FONT\ command. + +e.g. + +FONT\_CHARS\ ILVW 3 45 9 70 7 + +Would set the font to 7 for I L V and W characters for residues 3-9 of +sequences 45-70 inclusive. The font must have been defined by the +DEFINE\_FONT\ command. + +FONT\_REGION\ $<$int$>$ $<$int$>$ $<$int$>$ $<$int$>$ $<$int$>$ + +Define the font to use throughout the region specified by the first four +integers. + +e.g. + +FONT\_REGION\ 3 12 20 40 10 + +Use font 10 for residues from residues 3-20 of sequences 12-40. The font +must have been defined using the DEFINE\_FONT command. + +FONT\_RESIDUE\ $<$int$>$ $<$int$>$ $<$int$>$ + +Set the font for use with a single residue position - most useful when used +with the TEXT command. + +e.g. + +FONT\_RESIDUE\ 3 7 2 + +Use font 2 for residue 3 of sequence 7. Font 2 must have been defined using +the DEFINE\_FONT\ command. + +LINE $<$string$>$ $<$int$>$ $<$int$>$ $<$int$>$ + +There are four commands of this type for drawing horizontal or vertical lines +on the alignment. 
+ +LINE LEFT $<$int$>$ $<$int$>$ $<$int$>$ + +Draw a line to the left of the character positions indicated. + +e.g. + +LINE LEFT 3 12 24 + +Draw a vertical line starting at residue 3 of sequence 12 and ending at +residue 3 of sequence 24. + +LINE TOP 3 12 24 + +Draw a horizontal line above the character positions from residue 3 of +sequence 12 to residue 24 of sequence 12. + +Similar commands are: + +LINE BOTTOM $<$int$>$ $<$int$>$ $<$int$>$ Draw a line at bottom of character position. + +LINE RIGHT $<$int$>$ $<$int$>$ $<$int$>$ Draw a line at right of character position. + +BOX\_REGION\ $<$int$>$ $<$int$>$ $<$int$>$ $<$int$>$ + +Draw a box around the region indicated by the four integers. + +e.g. + +BOX\_REGION\ 2 5 30 7 + +Would box from residue 2 of sequence 5 to residue 30 of sequence 7. + +SHADE\_REGION\ $<$int$>$ $<$int$>$ $<$int$>$ $<$int$>$ $<$float$>$ + +Shade the region indicated by the integers with the grey value shown by the +float. +e.g. + +SHADE\_REGION\ 30 40 35 46 0.2 + +Would shade from residue 30-35 of sequences 40-46 with a grey value of 0.2. + +SHADE\_RES\ $<$int$>$ $<$int$>$ $<$float$>$ + +Shade just one amino acid with the grey value. + +e.g. + +SHADE\_RES\ 3 4 0.7 + +Shades residue 3 of sequence 4 with a grey value of 0.7. (Note: this can also be achieved with the +SHADE\_REGION\ command, but that requires 2 extra numbers.) + +TEXT $<$int$>$ $<$int$>$ $<$qstring$>$ + +Place the text string at the location indicated. + +e.g. + +TEXT 30 70 "Active Site His" + +would put the text Active Site His starting at position 30 of sequence +70. (Use FONT\_RESIDUE\ or FONT\_REGION\ commands to set the font of the +text). Text added with the TEXT command will not be split across page +breaks, so you may in some circumstances need to fiddle a little with +the location/pointsize for the text to get the desired result. + +ID\_FONT\ ALL $<$int$>$ + +Set the font for all identifier codes to the font number shown by $<$int$>$. +e.g.
+ +ID\_FONT\ ALL 3 + +Would set all the identifier codes to font 3. + +ID\_FONT\ $<$int$>$ $<$int$>$ + +Set the font for a specific identifier to font number. +e.g. + +ID\_FONT\ 12 4 + +Use font 4 for the identifier of sequence 12, default font for all other +identifiers. + +SUB\_CHARS\ ALL $<$char$>$ $<$char$>$ + +Substitute the characters indicated. + +e.g. + +SUB\_CHARS\ ALL + * + +would change all occurrences of + to * in the alignment. + +SUB\_CHARS\ $<$int$>$ $<$int$>$ $<$int$>$ $<$int$>$ $<$char$>$ $<$char$>$ + +restrict the substitution to the region shown. + +e.g. + +SUB\_CHARS\ 1 1 7 8 \% * + +would substitute * for \% from residue 1-7 of sequences 1-8. +NOTE: To substitute for or with the space character use the word SPACE. +e.g. to change all space characters to -. + +SUB\_CHARS\ ALL SPACE - + +SUB\_ID\ $<$int$>$ $<$qstring$>$ + +Replace the numbered identifier by the string. +e.g. + +SUB\_ID\ 34 "Predicted Secondary Structure" + +would replace whatever the identifier of sequence 34 was, by the text shown. +This is useful when used in conjunction with the ADD\_SEQ\ command shown under +the STEP 1 commands. + +INVERSE\_CHARS\ $<$string$>$ ALL/Range (similar syntax to FONT\_CHARS\ but no +font number) + +Print the selected characters in white. This clearly will only work +if you first use the SHADE\_CHARS command to shade the characters with +something other than white. + +CCOL\_CHARS\ $<$string$>$ ALL $<$int$>$ + +Colour all characters in the $<$string$>$ by the colour defined by $<$int$>$. + +e.g. + +CCOL\_CHARS\ GP ALL 12 + +would colour all G and P characters in the alignment by the colour 12. +This colour MUST have been defined by the DEFINE\_COLOUR\ command. + +CCOL\_CHARS\ $<$string$>$ $<$int$>$ $<$int$>$ $<$int$>$ $<$int$>$ $<$int$>$ + +restricts the colouring to the region defined by the first four integers. Thus + +CCOL\_CHARS\ ILVW 3 12 7 32 7 + +would colour I L V and W characters from residues 3-7 of sequences 12-32 +inclusive with the colour 7.
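+
+The dependency between the two steps can be sketched as a command file
+fragment. Colour 7 is defined among the STEP 1 commands, before SETUP,
+and only then used by CCOL\_CHARS.\ The red, green and blue values
+(pure red) are an arbitrary illustrative choice, assumed to lie between
+0 and 1 as in the DEFINE\_COLOUR\ example, and the other required STEP 1
+commands (OUTPUT\_FILE\ and DEFINE\_FONT\ for font 0) are left out for brevity:
+
+\begin{verbatim}
+DEFINE_COLOUR 7 1.0 0.0 0.0
+SETUP
+CCOL_CHARS ILVW 3 12 7 32 7
+\end{verbatim}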
+ +SCOL\_CHARS:\ This has identical syntax to CCOL\_CHARS,\ but colours the +background of the character, rather than the letter itself. + +COLOUR\_REGION\ $<$int$>$ $<$int$>$ $<$int$>$ $<$int$>$ $<$int$>$ + +COLOR\_REGION + +Colour the region indicated by the integers with the colour number given +as the last number. + +e.g. + +COLOUR\_REGION\ 30 40 35 46 2 + +Would colour from residue 30-35 of sequences 40-46 with the colour 2. + +COLOUR\_RES\ $<$int$>$ $<$int$>$ $<$int$>$ + +Colour just one amino acid with the defined colour. + +e.g. + +COLOUR\_RES\ 3 4 7 + +Colours residue 3 of sequence 4 with the colour 7. (Note: this can also be achieved with the +COLOUR\_REGION\ command, but that requires 2 extra numbers.) + + +\subsection{AMPS Block file format} +\label{app2} + +The first part of a block-file contains the identifier codes of the +sequences that are to follow. Each code is prefixed by the $>$ symbol. Codes +must not contain spaces. + +e.g. +\begin{verbatim} +>HAHU +>Trypsin +>A0046 +>Seq1 + +\end{verbatim} + +etc. + +ALSCRIPT counts the number of $>$ symbols in the beginning of the file +until a * symbol is found. The * signals the beginning of the +multiple alignment which is stored VERTICALLY, thus columns are +individual sequences, whilst rows are aligned positions. The * symbol +must lie over the first sequence. A further star in the same column +signals the end of the alignment. ALSCRIPT uses the number of $>$ +symbols at the beginning of the file to work out how many columns to +read from the * position. It is therefore important that the only $>$ +symbols in the file are those that define the identifiers, and the +only * symbols are those defining the start and end of the multiple +alignment. The block file can contain additional text, providing that +there are no more $>$ or * symbols in the file than those used to define +the identifiers or alignment start and end. + +A simple, small block-file is shown here.
+ +\begin{verbatim} +>Seq_1 +>A0231 +>HAHU +>Four_Alpha +>Globin +>GLobin_C +* +ARNDLQ +AAAAAA +PPPPPP +PP PPP +WW WWW +LLLLLL +IIVVLL +* +\end{verbatim} + + +\subsection{PostScript Fonts} +\label{app3} +\begin{verbatim} +Times-Roman, +Times-Italic, +Times-Bold, +Times-BoldItalic, +Helvetica, +Helvetica-Oblique, +Helvetica-Bold, +Helvetica-BoldOblique, +Courier, +Courier-Oblique, +Courier-Bold, +Courier-BoldOblique, +AvantGarde-Book, +AvantGarde-BookOblique, +AvantGarde-Demi, +AvantGarde-DemiOblique, +Bookman-Demi, +Bookman-DemiItalic, +Bookman-Light, +Bookman-LightItalic, +Helvetica-Narrow, +Helvetica-Narrow-Bold, +Helvetica-Narrow-BoldOblique, +Helvetica-Narrow-Oblique, +NewCenturySchlbk-Roman, +NewCenturySchlbk-Bold, +NewCenturySchlbk-Italic, +NewCenturySchlbk-BoldItalic, +Palatino-Roman, +Palatino-Bold, +Palatino-Italic, +Palatino-BoldItalic, +ZapfChancery-MediumItalic, +Symbol +\end{verbatim} + +\subsection{386 DOS installation} +\label{app4} + + +IMPORTANT - The programs on this disk will ONLY WORK on a PC with a 386 +or better processor. See the Technical Notes section for details of why. + +Directions: + +\begin{enumerate} + +\item + Create a directory on your hard disk. + e.g. mkdir ALSCRIPT. + +\item + Copy the contents of the floppy disk into this directory. +\begin{verbatim} + e.g. copy a:*.* c:\alscript. +\end{verbatim} + +\item + Edit your AUTOEXEC.BAT file and add +\begin{verbatim} + C:\ALSCRIPT to your path. +\end{verbatim} + +\item + Edit your AUTOEXEC.BAT file and add the following two lines. + set DOS4GVM=@ALSCRIPT.VMC + set DOS4G=quiet +\end{enumerate} + +The first line is an instruction to read settings from the file +ALSCRIPT.VMC. This sets up a permanent swap file on your hard disk. +By default, the swap file is about 12MBytes in size. If you do not have +this much free space on your disk, then edit the ALSCRIPT.VMC file +to reduce the swap file size, or alternatively, do not put this line +in your autoexec.bat.
+ +The programs will run without this swap file, but you will be limited in + the size of alignment you can process by the amount of RAM you have +installed. I have only tested this program on a 486/33 with 8MBytes RAM +and a 386/33 with 4MBytes so I do not know the practical limitations of +machines with smaller memories. Any feedback would be appreciated. + +5. Type AUTOEXEC.BAT to initialise the changes, or better still, reset +the computer. + +6. You should now be able to run all three programs in the package from +anywhere on your disk. msf2blc, clus2blc and alscript. If you get +memory allocation errors when you try to run alscript, then use the +MAX\_NSEQ\ and MAX\_SEQ\_LEN\ commands to reduce the default limits. If the +program still won't run, then think about buying some more memory!! + +The programs msf2blc and clus2blc should run OK, but if you try to +process alignments that are too large for your computer, you may get a +"malloc error" which will stop the program. If this happens and you are +not using the virtual memory option discussed above, then try adding the +line set DOS4GVM=@filename to your autoexec.bat file. If you +don't have enough disk space to do this, then buy a bigger disk, or more +memory. + +\subsection{TECHNICAL NOTES} + +The executables included in this package were compiled with the WATCOM C +compiler. This is a full 32 bit compiler that makes good use of the 386 +processor and does not work on the 16 bit 286. It also has the +advantage of allowing the flat memory model to be used. In practice +this means that porting programs like alscript from Unix computers like +the Sun, is straightforward. In order to access the memory of the +computer in this way, an extra program called a dos extender is required +- this is called DOS4GW.EXE. DOS4GW is automatically invoked every time +you run one of the programs and is responsible for managing the memory and +creating the swap file discussed above. 
+ +\subsection{Unix Installation} +\label{app5} + +ALSCRIPT is distributed with executables for Sun (SunOS 4.1.3), Silicon +Graphics (IRIX 5.3), DEC ALPHA OSF/1 and Sun Solaris (2.4). The executables +are stored in the subdirectories bin/sun, bin/sgi, bin/osf and bin/sol. If +these are OK for your system, then just add the appropriate directory to your +path, or put links in /usr/local/bin or somewhere that is on all users' paths. + +The source code for ALSCRIPT is contained in a directory hierarchy. +The top directory contains a README file and the BUILD script. +Subdirectories are: {\bf examples} which contains example command and +alignment files, {\bf doc} which contains \LaTeX\ and PostScript copies +of the manual - a subdirectory of this contains an HTML version of the +manual, and {\bf src} which contains the source code and Makefiles for +the package. There may also be a directory called {\bf bin}. If +present this will contain subdirectories with executables for the +programs in the package. Makefiles to build alscript, msf2blc, +clus2blc and alsnum are included in the {\bf src} directory. Versions +for Sun (acc compiler .sun), Silicon Graphics (.sgi) and DEC OSF/1 (.osf) +are included. + +There is a utility csh script called BUILD. Simply type ./BUILD sun to +compile alscript on the Sun, ./BUILD sgi for Silicon Graphics or ./BUILD +gcc for use with the gcc compiler. See instructions in the file BUILD. +The BUILD script will create a /bin directory and subdirectory +if not already present. You can create makefiles for different computers +and the BUILD script should still function. + + +\subsection{VAX/VMS Installation} +\label{app6} + +The standard VAX C compiler is not ANSI. Accordingly, ALSCRIPT will require +changes to the source code to compile on a VAX. + +The DEC C++ compiler works OK for alscript. Alscript will also compile +on DEC ALPHA under OpenVMS. A descrip.mms file is included for this +purpose.
+ +{\em WARNING: I've not tested Version 2.0 of ALSCRIPT on VMS} + +\subsection{Alternative ways of invoking ALSCRIPT} +\label{app7} + +The documentation above describes the interactive mode of running ALSCRIPT. +However, it may be more convenient to run the program as a pipe under +Unix or MS-DOS. Examples are shown here. + +ALSCRIPT is a program for producing pretty versions of multiple +sequence alignments. ALSCRIPT will also format single sequences. A +full description of the program is given in the file "alscript.doc". + +Ways of running alscript: + +\begin{enumerate} + +\item + Interactive mode: just type alscript. + You will be prompted for a command file name. The command file will + define the AMPS blocfile, and the name of the file to store the PostScript + output - see alscript.doc for details. + +\item + alscript $<$command\_file$>$\ has the same effect as 1, but does not prompt for + the command file. + e.g. alscript example1.als + +\item + alscript -q $<$ $<$blocfile$>$ $>$ $<$PostScript$>$ + Quick mode - uses default commands, reads alignment from stdin, + writes PostScript to stdout. This mode creates a command file + called ALPSQ.COM. + + e.g. alscript -q $<$ example1.blc $>$ example1.ps + +\item + alscript -f $<$command\_file$>$ + Similar effect to 2. + +\item + alscript -f $<$command\_file$>$\ -s + Silent operation: No messages are written to stderr, unless fatal. + Silent operation may be toggled by the silent\_mode\ command + in the command file. + +\item + alscript -f $<$command\_file$>$\ -p $<$ $<$blocfile$>$ $>$ $<$PostScript$>$ + Make alscript work like a pipe - blocfile is read from stdin, + PostScript is written + to stdout. Messages are written to stderr. To suppress messages include + the -s flag too. + + e.g. alscript -f example1.als -p -s $<$ example1.blc $>$ example1.ps + +\end{enumerate} + +Using alscript as a pipe has the advantage of allowing the blocfile to +be created on the fly by the programs msf2blc or clus2blc.
For example +if we have a GCG .msf file called "pileup.msf" we can run alscript with +default shading/fonts and send the results straight to the PostScript +printer "lpr" as follows: + +msf2blc -q $<$pileup.msf | alscript -q -s | lpr + +\subsection{Program Crashes and Known Bugs} +\label{app8} + +We've used ALSCRIPT on Sun Workstations and Silicon Graphics for some +time, with very large alignments and command files with thousands of +commands. All seems to work OK, the program has not crashed on us at +all!! + +However, the command interpreter in ALSCRIPT is very simple and +the program will crash if you give any command the wrong number of +arguments (e.g. leaving out the shade value in a shade\_chars\ command). + +If you do make the program crash, have checked all the documentation +and your numbers, and the program still crashes. Then send me +the command file and block file that causes the crash and I will try +to investigate. + +Suggestions for improvements to the program are always welcome. + +\subsection{Wish List for next version!!} +\label{app9} + +A command interpreter that does more error checking will be included. +Currently, no checking is done to make sure that the correct number of +arguments are given to a command. + +Sequences will be able to be given unique labels and region commands refer +to these labels or ranges of labels. This will permit a sequence to be +deleted or added to the alignment without having to update the .als file. + +The relative numbering option will be extended to allow numbering relative +to a position. e.g. 456+7 would be 7 residues after position 456. This +will allow annotation of positions that may be in insertions relative to the +reference sequence. + +Special TEXT commands will be extended to allow alternative shapes to +be drawn and scaled in various ways. + +Tree drawing and generalised graphics. An option to draw arbitrary lines +on an alignment will be added. 
This will permit line graphics to be added +to an alignment. The initial reason for this will be to show dendrograms +(trees) alongside the alignment, but simple line graphs could also be plotted +under the alignment. + +Fiddle factors will be introduced to allow fine positioning of +individual characters. For example, if you like your ``I'' characters +to be centred rather than left justified, this will be possible. + +In single\_page mode, it will be possible to add arbitrary text to an +alignment for final annotation, e.g. titles etc. + +Variable height/width sequence lines will be permitted (maybe). + +\subsection{Acknowledgements} + +I thank all those who have emailed me with suggestions for +improvements to alscript. I've tried to include some of these in the +current distribution (e.g. screening). + +\subsection{References} +\label{app10} + +\begin{verbatim} + +1. Barton, G. J. (1993), + "ALSCRIPT A tool to format multiple sequence alignments", + Protein Engineering, Volume 6, No. 1, pp.37-40. + +2. Barton, G. J. (1990), + "Protein Multiple Sequence Alignment and Flexible Pattern Matching", + Methods in Enzymology, + 183,403-428. + +3. Barton, G. J. and Sternberg, M. J. E. (1987), + "A Strategy for the Rapid Multiple Alignment of Protein Sequences: + Confidence Levels From Tertiary Structure Comparisons", + Journal of Molecular Biology, + 198,327-337 + +4. Higgins, D. G. and Sharp, P. M. (1989), + "Fast and sensitive multiple sequence alignments on a microcomputer", + CABIOS, + 5,151--153 + +5. Devereux, J. Haeberli, P. Smithies, O. (1984), + "A comprehensive set of sequence analysis programs for the VAX", + Nucleic Acids Res. + 12, 387-395 + +6. Livingstone, C. D. and Barton, G. J. (1993), + "Protein Sequence Alignments: A Strategy for the Hierarchical analysis + of residue conservation" + Computer Applications in the Biosciences, + 9, 745-756. 
+ +\end{verbatim} +\end{document}