1 \documentstyle[12pt,times,html]{article}
5 \bibliographystyle{jmb}
16 ALSCRIPT - Sequence alignment to PostScript\\
26 {\em Geoffrey J. Barton}
30 Laboratory of Molecular Biophysics\\
31 University of Oxford\\
32 Rex Richards Building\\
41 Tel: (44) 1865-275368\\
42 Fax: (44) 1865-510454\\
43 e-mail: gjb@bioch.ox.ac.uk
48 Barton, G. J. (1993),\\
49 ALSCRIPT a tool to format multiple sequence alignments\\
50 Protein Engineering, Volume 6, No. 1, pp.37-40.\\
57 \section{Update History}
Version 1.0 19 June 1992
61 Version 1.1 26 June 1992
62 Version 1.2 21 October 1992: Add multiple blocks per page option.
63 Version 1.3 15 November 1992: First Distribution.
64 Version 1.4 6 December 1992: Add Colour commands.
65 Version 1.4.1 1 February 1993: Small bug fixes - FULL RELEASE VERSION.
66 Version 1.4.2 15 February 1993: Make silent_mode toggle.
67 Version 1.4.3 1 March 1993: Fix bug in colour option.
68 Version 1.4.4 24 March 1993: Add mask features (should be version 1.5).
69 Version 1.4.5 25 May 1993: Include alsnum program in distribution.
70 7 June 1993: Fix NO_NUMBERS bug in documentation
71 Change defaults for -q option to use MASK.
73 Version 2.0 23 May 1995: Numerous changes and additions including the
74 option to colour backgrounds
differently, omission of idents on
76 second and subsequent lines, helix,
77 strand and other special characters,
78 relative numbering, error checking of
79 ranges on input, bounding box,
80 screening, conservation colouring...
81 Version 2.03 5 June 1996: Small bug fixes - patches incorporated.
82 BACKGROUND_REGION and BOUNDING_BOX commands moved to
87 \section{Read This First - VERSION 2.0}
89 This manual describes an interim release of ALSCRIPT that includes
90 many additional features over the previously distributed Version
91 1.4.5. I had hoped to make a lot more changes and improvements before
92 distributing the new version, however I have not had the time to do
93 this. I am distributing Version 2.0 since the new features have been
94 used in a number of published alignment figures. Please see
\hyperref{New Features in Version 2.0}{Section }{ }{new2} for details
98 \section{Related Programs}
100 The AMPS package (Barton, 1990). This performs multiple sequence
101 alignments and databank scanning.
103 AMAS (Livingstone and Barton, 1993, CABIOS, 9, 745-756). Analysis of
104 Multiply Aligned Sequences. This package uses a sophisticated
105 set-based method to identify patterns of residue conservation in
106 multiple sequence alignments.
108 All programs are available by anonymous ftp from geoff.biop.ox.ac.uk.
Please see the README file for details of licensing and registration.
110 You can read manuals for the programs and some related papers at
111 http://geoff.biop.ox.ac.uk/.
113 \section{Availability}
115 ALSCRIPT is available free of charge for academic and non-commercial
116 purposes. Distribution is by anonymous ftp from geoff.biop.ox.ac.uk.
117 See the README file on the ftp server for details. You need to
118 register with G. J. Barton before downloading the software.
120 \section{Installing ALSCRIPT}
122 See the appropriate section for the computer type you are using:
For PCs see \hyperref{386 MS-DOS installation}{Section }{}{app4}.
For Unix see \hyperref{Unix installation}{Section }{ }{app5}.
For VMS see \hyperref{VMS installation}{Section }{ }{app6}.
128 \section{Brief Description of ALSCRIPT}
130 ALSCRIPT takes a multiple sequence alignment in AMPS (Barton \&
131 Sternberg, 1987, Barton, 1990) block-file format and a set of
132 formatting commands and produces a PostScript file that may be printed
133 on a PostScript laser printer, or viewed using a PostScript previewer
(e.g. Sun Microsystems' PageView program). CLUSTAL and GCG format
135 multiple alignment files may also be used (see below). ALSCRIPT is NOT a
136 multiple sequence alignment program, nor is it an alignment editor.
138 Given a block-file and pointsize (character width/height), ALSCRIPT
139 calculates how many residues can be fitted across the page, and how
many sequences will fit down the page; it then prints the alignment at
141 the chosen pointsize on as many pages as are needed. Running ALSCRIPT with
142 a smaller or larger pointsize will automatically re-scale the alignment
143 to fit on fewer or more pages as appropriate. The actual page
144 dimensions may be re-set to any value, so if you have access to an A3
145 PostScript printer, or typesetting machine, alignments can readily be
146 scaled to maximise the available space.
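As a rough illustration of the page-fitting arithmetic described above, the following Python sketch estimates how many residues fit across a page, how many sequences fit down it, and how many pages result. It is not ALSCRIPT's own code: the A4 landscape dimensions, the margin, the identifier width and all function and parameter names are assumptions made for the example, and it ignores the numbering row and the multiple-blocks-per-page option. The spacing rule and its default factors (0.2 and 0.0) are those given for X\_SPACE\_FACTOR and Y\_SPACE\_FACTOR later in this manual.

```python
import math


def estimate_layout(n_seqs, aln_len, pointsize,
                    page_long=842.0, page_short=595.0,  # assumed: A4 landscape, in points
                    margin=36.0,                         # assumed: half-inch margin all round
                    ident_width=10,                      # assumed: characters reserved for identifiers
                    x_space_factor=0.2, y_space_factor=0.0):
    """Return (residues per line, sequences per page, pages needed)."""
    # Step between adjacent characters: POINTSIZE + POINTSIZE * SPACE_FACTOR.
    dx = pointsize + pointsize * x_space_factor
    dy = pointsize + pointsize * y_space_factor

    # Space left for the alignment once margins and identifiers are removed.
    usable_w = page_long - 2 * margin - ident_width * dx
    usable_h = page_short - 2 * margin

    per_line = int(usable_w // dx)
    per_page = int(usable_h // dy)
    if per_line < 1 or per_page < 1:
        raise ValueError("pointsize too large for the page")

    # Simplification: one block of the alignment per page.
    pages = math.ceil(aln_len / per_line) * math.ceil(n_seqs / per_page)
    return per_line, per_page, pages


if __name__ == "__main__":
    for size in (12, 5):
        print(size, estimate_layout(n_seqs=10, aln_len=300, pointsize=size))
```

Under these assumptions a 10 sequence, 300 position alignment needs fewer pages at 5 point than at 12 point, which is the re-scaling behaviour the paragraph above describes.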
148 Each output page has three basic regions. The left hand edge contains
149 identifier codes for each sequence. The main part of the page holds the
150 alignment, and the top part, the position numbers and tick marks. ALSCRIPT
151 commands make use of a character coordinate system for font changes,
152 and other formatting commands. Thus, any residue in the alignment may
153 be referred to by its sequence position number (x-axis) and sequence
number (y-axis). Similarly, ranges of residue positions, or sequences
155 may also be defined in the character coordinate system.
157 The basic ALSCRIPT commands allow the following functionality:
159 Fonts: Any PostScript font at any size may be defined and used on
160 individual residues, regions or identifier codes.
162 Boxing: Simple rectangular boxes may be drawn around any part of the
163 alignment. Particular residue types may be selected and automatically
164 "surrounded" by lines. For example, if the characters 'G' and 'P' are
165 selected, then lines will not be drawn between G and P characters, but
166 only where G and P border with other characters.
168 Shading: Grey shading of any level from black to white may be applied
169 to any region of the alignment, either as a rectangular region, or as
170 residue specific shading. e.g. "shade all Cys residues between
173 Text: Specific text strings may be added to the alignment at any
174 position and in any font or font size.
176 Lines: Horizontal or vertical lines may be drawn to the left, right, top
177 or bottom of any residue position or group of positions.
179 Colour: Characters or character backgrounds may be independently
182 The example block file "example1.blc" and command file "example1.als"
183 illustrate most of these commands in action.
185 Although written with the aim of producing figures for journal
186 submission, ALSCRIPT may be used as a tool for interpreting multiple
187 sequence alignments. For example, the boxing, shading and font changing
188 facilities can be applied to highlight amino acids of a particular type
189 and thus draw attention to clusters of positive or negative charge,
193 \section{New Features in Version 1.4.4}
195 This version introduces the MASK family of commands which allows
196 boxing, shading etc to be applied according to the frequency of
occurrence of the character types at each position in the alignment.
198 For example, it is possible to box positions where one character is
199 seen in more than N of the sequences. It is also possible to
200 box/shade etc the most frequently occurring character at each
201 position. Commands exist to select which characters will be used in
202 the calculation of frequencies and which will be excluded, thus boxing
203 can be based upon two or more character types at a position. MASK
204 commands also exist to show residues identical to one sequence in the
205 set. See the section on MASK below for details.
207 NOTE: Although boxing according to the frequency of amino acids seen
208 at a position is a popular method of representation it is not usually
209 the most informative. An analysis that takes into account the
210 physico-chemical properties of the amino acids and also relates the
211 amino acid similarities to the overall similarity between the
212 sequences is more helpful in identifying functionally important
213 residues. The AMAS program (Livingstone and Barton, 1993) applies a
214 flexible hierarchical set-based approach to this problem.
216 \section{New Features in Version 1.4.5 - Program alsnum}
218 Version 1.4.5 includes the program "alsnum". This is a temporary solution to
219 the residue numbering problem. Ultimately, these functions will be included
220 as alscript commands.
222 alsnum creates a set of TEXT commands that can be incorporated into an
223 alscript command file to place sequential numbers at any position in the
224 alignment. The numbers ignore gaps, so the numbering will correspond to
225 the specific sequence position rather than the alignment.
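The gap-skipping numbering described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the alsnum source: the function name, the set of gap characters, and the choice to label residues whose number is an exact multiple of the interval are all assumptions. The TEXT syntax (x position, y position, quoted string) follows the TEXT examples elsewhere in this manual.

```python
def number_commands(aligned_seq, first_number, interval, target_row,
                    gap_chars=" .-_"):
    """Build TEXT commands that number one aligned sequence, ignoring gaps.

    aligned_seq  -- one row of the alignment, gaps included
    first_number -- residue number of the first non-gap character
    interval     -- label every residue whose number is a multiple of this (assumed rule)
    target_row   -- sequence position (y) at which the numbers are placed
    gap_chars    -- characters treated as gaps (an assumption)
    """
    commands = []
    residue = first_number - 1
    for column, ch in enumerate(aligned_seq, start=1):
        if ch in gap_chars:
            continue  # gaps occupy a column but do not advance the numbering
        residue += 1
        if residue % interval == 0:
            commands.append(f'TEXT {column} {target_row} "{residue}"')
    return commands


if __name__ == "__main__":
    for line in number_commands("AC--DEFGH", first_number=12, interval=5, target_row=3):
        print(line)
```

In this sketch the label is placed at the alignment column of the residue, while the number printed is the residue's position in its own sequence.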
229 1. Decide where you want the numbers to be placed. For example, you might
230 want the numbers above the third sequence in the alignment. If so, make
231 an extra sequence space above the third sequence using the ADD\_SEQ\ command.
233 2. Decide what is the number of the first residue of the sequence to be
234 numbered. This will not always be 1 since you may be aligning fragments
237 3. Decide the numbering interval (e.g. every 10th amino acid).
For example, to add numbers according to sequence 37 of a
block file (junk.blc), calling the first residue of the sequence 12,
with an interval of 5, and placing the numbers at the
location of sequence 3 in the alignment, type:
246 alsnum 37 12 5 3 $<$ junk.blc $>$ junk.text
248 5. Add the resulting TEXT commands from junk.text to your alscript
251 \section{New Features in Version 2.0}
254 Error checking is now done on all ranges input. If you run ALSCRIPT
255 2.0 on a file that worked with ALSCRIPT 1.4.5, and it complains about
256 out of range numbers, then check your ranges carefully. If you think
257 you are right, then send me a minimal example of the problem and I
258 will investigate. Versions of ALSCRIPT before 2.0 would often work
259 happily with out of range numbers and produce perfectly OK output.
The files {\bf ipns.als} and {\bf ipns.blc} show an example command file and
block file that use most of these new commands. See the {\em
265 \subsection{New Step 1 Commands}
Usually you should not need to change this value; it alters the
270 screening used by the printer. A value of 120 is used by default. On
271 most 300dpi black and white printers this gives much smoother greys
272 than the default used in earlier versions of ALSCRIPT.
276 Will cause the block file to be saved into the file ``filename'' in
277 PIR format. This can be useful for moving block file alignments to other
282 Will cause the block file to be saved into the file ``filename'' in
283 something that approximates GCG .msf format. {\em Warning! This has not
288 Sets the colour used for numbering at the top of the alignment (no
289 American spelling at the moment). In this example, colour number 4 has been
290 defined (See the DEFINE\_COLOUR command if you are not sure what this means).
294 If this is set, ALSCRIPT assumes everything will be plotted on one
295 page. At the moment, all this does is write the bounding box for the
296 figure, so encapsulating the PostScript. This {\em may} allow the
297 output of alscript to be imported into word processors etc, but
300 ID\_ONLY\_ON\_FIRST\_LINE
302 If this is present, then sequence identifiers will only be printed on
303 the first line of the alignment. Often this looks better for small
304 alignments than the default.
306 BACKGROUND\_COLOUR\ 7
308 Sets the colour used for the background to the alignment. This can be
309 useful for preparing figures for projection. At the moment this only
works reliably when SINGLE\_PAGE is also set.
313 BOUNDING\_BOX\ x\ y\ x1\ y1
315 Defines the bounding box for the figure. This is set in points (1/72 inch).
316 NOTE: In version 2.0 this was a STEP 2 Command.
318 BACKGROUND\_REGION\ x\ y\ x1\ y1
320 Defines the region to colour as background - the default is set up for A4 paper
321 so US users may have to fiddle with this. Values are points (1/72 inch).
322 NOTE: In version 2.0 this was a STEP 2 Command.
324 \subsection{New Step 2 Commands}
327 COLOUR\_TEXT\_REGION\ x\ y\ x1\ y1 colour
329 Sets the colour for TEXT command output. Similar syntax to COLOUR\_REGION, FONT\_REGION etc.
331 COLOUR\_LINE\_REGION\ x\ y\ x1\ y1 colour
333 Set the colour for LINEs in a region.
335 CALCONS\ x\ y\ x1\ y1
337 Calculate conservation values according to Zvelebil {\em et al.} for the designated
338 region. (See Livingstone \& Barton 1993 for details and further refs)
340 MASK\ CONSERVATION\ cutoff
If CALCONS has been used, then mask residues according to the conservation cutoff.
344 e.g. MASK CONSERVATION 10 would mask all identities, MASK CONSERVATION 6 would
345 mask reasonably conserved positions. See examples for more on this command.
349 Draw a helix from x1 to x2 of sequence y.
353 Draw a strand from x1 to x2 of sequence y.
357 Draw a coil (horizontal line) from x1 to x2 of sequence y.
RELATIVE\_TO $<$seqnum$>$ $<$startnum$>$
Set reference numbers to work relative to sequence number $<$seqnum$>$. This
362 means that in all subsequent commands, ALSCRIPT will translate your x
363 values into absolute position values in the alignment. This is {\em
364 extremely} useful since you can annotate your alignment using your
365 favourite sequences as a reference point. You no longer have to
366 translate every x position into the alignment position.
$<$startnum$>$ is optional. If present, it specifies what the first
residue in the displayed sequence is. For example, you may be showing
residues 200-500 of a sequence, so $<$startnum$>$ would be 200 rather than
371 the default of 1. {\em Warning - this is a very new feature and
372 bounds checking is not fully enabled for it}.
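The translation that RELATIVE\_TO performs can be pictured with the following Python sketch, which converts a residue number in the reference sequence into an absolute alignment position by counting non-gap characters. It is a hypothetical illustration rather than ALSCRIPT's code: the function name and the set of gap characters are assumptions.

```python
def relative_to_absolute(aligned_seq, residue_number, start_num=1,
                         gap_chars=" .-_"):
    """Return the 1-based alignment column holding `residue_number`.

    aligned_seq -- the reference sequence as it appears in the alignment
    start_num   -- number of the first displayed residue (default 1)
    gap_chars   -- characters treated as gaps (an assumption)
    """
    count = start_num - 1
    for column, ch in enumerate(aligned_seq, start=1):
        if ch in gap_chars:
            continue  # gaps take up a column but are not residues
        count += 1
        if count == residue_number:
            return column
    raise ValueError("residue number is outside the displayed sequence")


if __name__ == "__main__":
    # Residue 3 of "MK--LV" is the L, which sits in alignment column 5.
    print(relative_to_absolute("MK--LV", 3))
    # With a start number of 200 the same L is residue 202.
    print(relative_to_absolute("MK--LV", 202, start_num=200))
```

The explicit range check in the sketch is there because the warning above notes that bounds checking is not yet complete for this feature.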
374 You can use RELATIVE\_TO several times in the command file to annotate
375 different sequences. RELATIVE\_TO 0 resets to the ``normal''
378 \subsection{New special TEXT commands}
380 Some special TEXT commands have been added to allow drawing of alternative
381 shapes etc. In fact this is how the HELIX, STRAND and COIL commands are implemented. The text commands are all prefixed by an @ symbol.
383 e.g. TEXT 3 6 ``@fuparrow''
385 will draw a filled up arrow at position 3,6.
387 The alternative text commands are:
389 @leftarrow - an open left pointing arrow.
391 @fleftarrow - a filled left pointing arrow.
393 @uparrow - an open up pointing arrow.
395 @fuparrow - a filled up pointing arrow.
397 @downarrow - an open down pointing arrow.
399 @fdownarrow - a filled down pointing arrow.
401 @circle - an open circle.
403 @fcircle - a filled circle.
405 I plan to make this option more flexible in the near future.
407 \section{Running ALSCRIPT}
408 \subsection{Basic Use}
410 I recommend you read through this section, then scan the commands in
411 \hyperref{ALSCRIPT Command Summary}{Section }{ }{app1} to get a feel for what ALSCRIPT can do.
413 See \hyperref{Alternative ways of invoking ALSCRIPT}{Section }{for
414 alternative methods of invoking ALSCRIPT}{app7}. In this section, the
415 interactive method is described. The QUICK START method shown in
416 \hyperref{alternative ways of invoking ALSCRIPT}{Section }{ }{app7}
417 is useful to format a sequence alignment quickly
418 using standard pointsize and shading.
420 ALSCRIPT is designed to work with AMPS block file format multiple
421 alignments. If you have a multiple alignment generated by CLUSTAL V
422 or the GCG package, then it must be translated to AMPS block file
425 To translate a GCG .MSF file: Type: msf2blc.
426 To translate a CLUSTAL PIR format file, or any PIR format file: clus2blc.
428 Both programs prompt for the name of an input file, and an output
429 block file name. A good convention to follow is to name all blockfiles
430 with the extension ".blc".
432 To run ALSCRIPT simply type:
436 you will then be prompted for the name of the ALSCRIPT command file.
437 Having typed the filename, the commands will be executed as you have
440 A Simple Command File (example.als)
442 The file example.blc contains a small multiple sequence alignment.
443 The following ALSCRIPT command file will convert this into a
444 PostScript alignment file in 12 point Helvetica.
447 #Comments in ALSCRIPT command files start with a #
449 #Commands are free format - separated by blank, tab or comma characters
451 BLOCK_FILE example.blc #define the block file to format
452 OUTPUT_FILE example.ps #where to put the result
453 LANDSCAPE #landscape paper orientation
454 POINTSIZE 12 #12 point default pointsize
455 DEFINE_FONT 0 Helvetica DEFAULT #set font 0 to be Helvetica
456 SETUP #Tell the program to get on with it.
Now try changing the POINTSIZE value to 5; ALSCRIPT will re-format the
460 alignment to make best use of the available paper.
462 These are all STEP 1 commands - they refer to overall layout, and system
463 settings - for example, the paper size or maximum sequence length.
464 Other commonly used STEP 1 commands are IDENT\_WIDTH\ which reserves more
465 or less width for the sequence identifier codes, NUMBER\_SEQS\ which adds
466 a number to each sequence and LINE\_WIDTH\_FACTOR\ which allows the
467 thickness of all boxing lines to be adjusted. See
468 \hyperref{ALSCRIPT Command Summary}{Section }{ }{app1}
469 for details of these and all other STEP 1 commands.
471 The simple example outlined above can be modified with a variety of
474 for example file example2.als:
479 #Commands are free format - separated by blank, tab or comma characters
481 BLOCK_FILE example.blc #define the block file to format
482 OUTPUT_FILE example2.ps #where to put the result
483 LANDSCAPE #landscape paper orientation
484 POINTSIZE 12 #12 point default pointsize
485 DEFINE_FONT 0 Helvetica DEFAULT #set font 0 to be Helvetica
486 DEFINE_FONT 1 Helvetica REL 0.5 #set font 1 to be half sized Helvetica
487 DEFINE_FONT 3 Helvetica-Bold DEFAULT #set font 3 to be Bold Helvetica
488 DEFINE_FONT 4 Times-BoldItalic DEFAULT #font 4 is Times-BoldItalic
489 NUMBER_SEQS #Number the sequences at left hand side
490 SETUP #Tell the program to get on with it.
492 #step 2 commands come after the SETUP command
494 #Here are some examples...
496 SURROUND_CHARS GP ALL #draw lines around all G and P
497 SHADE_CHARS ILVW ALL 0.6 #shade all I L V and W with value 0.6
498 BOX_REGION 1 1 2 20 0.8 #rectangular box from positions 1 to 2 of sequences 1 to 20
499 FONT_CHARS C ALL 3 #Use font 3 (BOLD Helvetica) for C characters
500 ID_FONT ALL 1 #set identifiers in font 1
504 There are many possible ways of combining these commands and the
506 \hyperref{ALSCRIPT Command Summary}{Section }{ }{app1}.
507 In general, if you apply multiple
508 commands to the same residue, the effect of the last applied command
509 persists where there would otherwise be a conflict. Thus the
510 intersection of two overlapping SHADed regions would be shaded
511 according to the second SHADE command, not some mixture of the two.
512 Similarly for FONT commands. BOX and SURROUND commands behave in the
513 opposite sense, all BOXing and SURROUNDing persists regardless of how
many commands you issue. This makes it possible, for example, to
515 SURROUND two different sets of residues as follows:
519 SURROUND_CHARS DE ALL
520 SURROUND_CHARS DEHKR ALL
524 This would result in D and E characters being partitioned from the rest as well
525 as D E H K R characters (see Example output).
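One way to picture why SURROUND commands accumulate is to treat each command as producing a set of boundary edges and to take the union of those sets. The Python sketch below does this for a small grid of characters. It is an assumed model for illustration only (the function name, the zero-based row and column indices and the edge representation are all hypothetical), not a description of how ALSCRIPT stores its lines.

```python
def surround_edges(rows, chars):
    """Return the set of (row, column, side) edges that outline `chars`.

    An edge is drawn on a side of a selected cell only when the
    neighbouring cell on that side is not selected (or is off the grid),
    so no line appears between two selected characters.
    Rows and columns are zero-based in this sketch.
    """
    selected = set(chars)

    def is_selected(r, c):
        return 0 <= r < len(rows) and 0 <= c < len(rows[r]) and rows[r][c] in selected

    sides = (("top", -1, 0), ("bottom", 1, 0), ("left", 0, -1), ("right", 0, 1))
    edges = set()
    for r, row in enumerate(rows):
        for c, ch in enumerate(row):
            if ch not in selected:
                continue
            for side, dr, dc in sides:
                if not is_selected(r + dr, c + dc):
                    edges.add((r, c, side))
    return edges


if __name__ == "__main__":
    rows = ["DKA"]
    acidic = surround_edges(rows, "DE")
    charged = surround_edges(rows, "DEHKR")
    # Lines persist, so the result of both commands is the union of their edges.
    print(sorted(acidic | charged))
```

In the union, the line between D and K comes from the first command and the line between K and A from the second, so D and E are partitioned from the other charged residues while the whole charged group is still outlined.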
527 \subsection{More complex effects - Text Lines, and Masks}
529 Text, lines and masking are meant to be used to annotate the multiple
530 alignment. The TEXT command allows any piece of text to be located
anywhere on the alignment. Clearly, however, it makes little sense to
superimpose the text over the alignment, though this can be done! Accordingly,
533 you must first make a space to put the text in. Usually, this will be a
534 few lines below the multiple sequence alignment, but you may want to
535 add text at the top, or somewhere in between two sequences. You can make
space in two ways: either by editing the block-file to introduce "dummy"
537 sequences at the locations you want, or by making use of the ADD\_SEQ\
540 The ADD\_SEQ\ command has two arguments, the sequence after which you want
541 further sequences to be added, and how many blank sequences you need.
542 Thus, we can reserve space for 5 lines of text underneath a 10 sequence
543 multiple alignment with the following command.
547 we can then put text below the alignment at the 20th residue.
549 TEXT 20 13 "Active Site His"
551 or any other position.
553 Similarly, we could draw a vertical line to point out which residue we mean
557 And change the font of the text to number 7 (whatever that has been set to):
559 FONT\_RESIDUE\ 20 13 7
561 You can have multiple ADD\_SEQ\ commands, but they must occur in sequence order.
575 Is NOT!! NO CHECKING IS performed by the program for this error - so beware!
577 Note that add\_seq\ commands refer to the actual sequence number as
578 implied by the block file, not the number after applying the add\_seq\
579 command. Thus, for a four sequence block file, if you want to add
580 space for three sequences before sequence 1 and two sequences after
581 sequence 3, the commands would be:
588 Text added with the TEXT command will not be split across page breaks,
589 so you may in some circumstances need to fiddle a little with the
590 location/pointsize for the text to get the desired result.
592 Masking is a technique for drawing irregular shaped outlines, or
593 shaded regions - this should not be confused with the MASK family of commands
594 described below. For example a histogram can be added to the bottom of
595 an alignment by first defining some dummy sequences in the block-file
596 that have letters building up the shape of the histogram, then using
597 the SURROUND\_CHARS\ or SHADE\_CHARS\ commands together with the SUB\_CHARS\
598 command to produce the desired effect. An example of this operation
599 being used to show frequencies of secondary structure predictions is
600 shown in example1.als and in the Protein Engineering paper.
602 \section{Using Colour}
604 Version 1.4 includes commands to allow the independent colouring of
605 characters, or their backgrounds. Colours are defined in a similar
606 manner to fonts using the DEFINE\_COLOUR\ command (American spelling also
607 allowed). For example:
609 DEFINE\_COLOUR\ 7 1 0 0
611 defines colour number 7 to be red - see
612 \hyperref{ALSCRIPT Command Summary}{Section }{ }{app1}
614 this command. Colours 99 and 100 are pre-defined to white and black.
615 ALSCRIPT assumes the paper colour is white.
617 The command to colour the text of a character or text string is:
621 the command to colour the background of a character is
625 both have similar syntax to the FONT\_CHARS command.
627 COLOUR\_REGION\ and COLOUR\_RES\ have similar syntax to SHADE\_REGION\ and
630 An example command file that uses colour is shown in example3.als.
632 \section{The MASK command family}
634 The idea behind the MASK command is to build up a set of character
635 positions that will subsequently be boxed, shaded, set in a particular
font, etc. For example, let's say we want to box the most frequently
occurring character at each position in an alignment.
643 tells ALSCRIPT to prepare a mask.
specifies that the most frequently occurring character at each position
648 in the alignment will be masked. This command can be restricted to a
649 region of the alignment using: mask FRE sx sy ex ey, where sx etc
650 define the region in the same way as for font\_region\ and other commands.
654 Tells ALSCRIPT to create the boxing lines that will separate the masked
655 characters from non-masked characters - this command may also be
656 restricted to a region of the alignment.
658 The mask can be reset for re-use using the command:
662 Two further commands define which characters can be used when
663 calculating the mask. This allows gap-characters, or other amino acids
664 to be excluded from the calculation to avoid unwanted boxing.
defines A, V, L and I as the only characters that will be used when
669 calculating the mask.
673 defines .- and \_\ as characters that will not be used when calculating
674 the mask. NOTE: the blank character " " cannot be defined in this way.
To avoid boxing " " characters, replace blanks with another character
(using SUB\_CHARS),\ calculate the mask, then substitute back.
678 \subsection{Summary of mask commands}
681 \# Prepares for masking
683 mask LEGAL $<$qstring$>$
684 \# defines characters to include in ID or FRE calcs - optional.
686 mask ILLEGAL $<$qstring$>$
687 \# defines characters to exclude in ID or FRE calcs - optional.
690 \# Calculates a mask that flags the character that occurs at least N
691 times at a position. The word ALL can be substituted by four numbers
692 that define a region of the alignment.
\# calculates a mask that flags the most frequently occurring amino acid
696 at each position. ALL may be replaced by four numbers defining a region
699 Multiple mask FRE or mask ID commands may be applied, using different
700 LEGAL and ILLEGAL character definitions. In this way more complex
701 effects can be built up.
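The frequency-based masks can be sketched in Python as follows. This is an illustration of the idea only, under stated assumptions: the function names are hypothetical, positions are zero-based (row, column) pairs, ties for the most frequent character are all masked, and LEGAL and ILLEGAL are modelled as simple character filters applied before counting. ALSCRIPT's own behaviour in these corner cases may differ.

```python
from collections import Counter


def column_counts(rows, col, legal=None, illegal=""):
    """Count the characters in one column, honouring LEGAL/ILLEGAL filters."""
    counts = Counter()
    for row in rows:
        ch = row[col]
        if ch in illegal:
            continue
        if legal is not None and ch not in legal:
            continue
        counts[ch] += 1
    return counts


def mask_fre(rows, legal=None, illegal=""):
    """Mask the most frequently occurring character at each position."""
    mask = set()
    for col in range(len(rows[0])):
        counts = column_counts(rows, col, legal, illegal)
        if not counts:
            continue  # nothing countable in this column
        top = max(counts.values())
        winners = {ch for ch, n in counts.items() if n == top}
        mask.update((r, col) for r, row in enumerate(rows) if row[col] in winners)
    return mask


def mask_id(rows, n, legal=None, illegal=""):
    """Mask characters that occur at least n times at a position."""
    mask = set()
    for col in range(len(rows[0])):
        counts = column_counts(rows, col, legal, illegal)
        frequent = {ch for ch, k in counts.items() if k >= n}
        mask.update((r, col) for r, row in enumerate(rows) if row[col] in frequent)
    return mask


if __name__ == "__main__":
    rows = ["AC-", "AG-", "TG-"]
    print(sorted(mask_fre(rows, illegal=".-_")))
    print(sorted(mask_id(rows, 3, illegal=".-_")))
```

Excluding the gap characters before counting is what keeps the all-gap third column out of the mask in this example; without the filter it would be masked as the most frequent character.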
703 The mask command also allows characters that are identical to one
704 sequence to be masked.
708 will mask all positions that are identical to the Nth sequence. Thus,
for sequences that are very similar to a newly determined sequence, all
710 characters identical to the new sequence can be boxed or shaded, or set
711 in a different font or colour etc...
715 allows the mask to be inverted. Thus, all positions that are NOT in the
716 mask now form the mask. So, having done a mask AGREE, a mask NOT will
717 allow the positions that are not identical to the selected sequence to
718 be highlighted or substituted.
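The relationship between mask AGREE and mask NOT can be shown with a short Python sketch. As before, this is a hypothetical model rather than ALSCRIPT's implementation: the function names are invented, positions are zero-based (row, column) pairs, and the reference sequence is assumed to be included in its own mask.

```python
def mask_agree(rows, n):
    """Mask every position whose character matches sequence n (1-based)."""
    reference = rows[n - 1]
    return {(r, c)
            for r, row in enumerate(rows)
            for c, ch in enumerate(row)
            if ch == reference[c]}


def mask_not(rows, mask):
    """Invert a mask: positions not in the mask now form the mask."""
    everything = {(r, c) for r, row in enumerate(rows) for c in range(len(row))}
    return everything - mask


if __name__ == "__main__":
    rows = ["ACD", "AGD", "TCD"]
    agree = mask_agree(rows, 1)
    print(sorted(mask_not(rows, agree)))  # positions that differ from sequence 1
```

Applying the inversion twice returns the original mask, so in this model AGREE followed by NOT selects exactly the positions that differ from the chosen sequence.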
720 mask SUB ALL $<$char$>$
722 substitutes all characters in the mask with the character $<$char$>$.
725 applies a mask to all residues in the defined region.
727 The following effects can now be applied to the masked characters:
730 \# boxes the masked residues - ie surrounds them by lines.
732 mask SHADE ALL $<$grey$>$
733 \# shades the masked residues by grey value.
735 mask FONT ALL $<$fontnum$>$
736 \# Uses font fontnum to output the masked residues.
739 \# Inverts the masked characters - ie outputs them in white.
741 mask CCOL ALL $<$colnum$>$
742 \# outputs the masked characters in the defined colour.
744 mask SCOL ALL $<$colnum$>$
745 \# outputs the backgrounds of the masked characters in the defined
748 In all commands, the word ALL can be replaced by four numbers defining
749 the region to which the command is applied.
752 \# resets the entire mask for re-use
755 \section{Printing ALSCRIPT Files}
757 ALSCRIPT produces files in PostScript which may be printed on a PostScript
758 printer (e.g. an Apple LaserWriter). If you don't have a PostScript printer,
759 then you may still be able to use ALSCRIPT if you get hold of the
760 GhostScript software. This is a free package that interprets PostScript
761 commands and can produce output on a large number of different types of
762 printer. GhostScript runs on most hardware types (including PCs) and can also
763 display output to the screen. The package can be obtained from many different
764 sites on the Internet (In the UK try src.doc.ic.ac.uk).
766 The actual command you need to type to send a PostScript file to the printer
767 will depend on your system. Consult your system manager for help.
769 Be warned, ALSCRIPT can create extremely large PostScript files if
770 lots of boxing and shading is done on big alignments. On older
771 printers such output may take a long time to process.
775 ALSCRIPT provides a powerful set of formatting and editing commands
776 specifically tailored for multiple sequence alignments. It is best
777 used in conjunction with a PostScript previewer such as Sun's
778 PageView or GhostView since this allows the effect of changing a command to be
779 seen quickly. In the absence of such a tool, simpler effects can be
780 tested out without destroying too many trees in the Laser Printer!
782 Like most programs, ALSCRIPT is evolving as I find new problems to display, so
783 if you have any suggestions - I shall endeavour to include them in a later
788 \subsection{ALSCRIPT Command Summary}
791 WARNING: Very little error checking is performed on command input. If
792 you give the wrong number of arguments to a command, then unexpected
793 things may happen, or the program will crash very inelegantly. I hope
to fix this in the next version of the program; in the meantime, make sure
795 you give the correct number of arguments to each command.
797 All commands up to the first space character may be entered in UPPER
798 or lower case or MiXEd case. Qualifiers for commands (e.g. REL) must
799 be written in UPPER case.
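The input conventions described in this manual (free-format fields separated by blank, tab or comma characters, comments introduced by \#, command names in any case, quoted strings) can be illustrated with a small Python tokeniser. This is a sketch written for this manual's examples, not the parser used by ALSCRIPT: the function name and the treatment of a \# inside a quoted string are assumptions. Only the command name is upper-cased; qualifiers such as REL are passed through unchanged because they must already be in upper case.

```python
import re

# A quoted string, a comment running to end of line, or a bare word.
TOKEN = re.compile(r'"(?P<quoted>[^"]*)"|(?P<comment>#.*)|(?P<word>[^\s,"#]+)')


def parse_line(line):
    """Split one command-file line into (COMMAND, [arguments]).

    Returns None for blank lines and lines holding only a comment.
    """
    tokens = []
    for match in TOKEN.finditer(line):
        if match.lastgroup == "comment":
            break  # ignore everything after an unquoted '#'
        tokens.append(match.group(match.lastgroup))
    if not tokens:
        return None
    # Command names are case-insensitive; arguments are left as typed.
    return tokens[0].upper(), tokens[1:]


if __name__ == "__main__":
    print(parse_line("block_file example.blc   #define the block file to format"))
    print(parse_line('TEXT 20 13 "Active Site His"'))
    print(parse_line("DEFINE_FONT 1,Helvetica\tREL 0.5"))
```

The sketch does not check the number of arguments, which, as the warning above says, remains the user's responsibility.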
804 <int> = enter an integer (e.g. 240)
805 <float> = enter a floating point number (e.g. 0.45)
806 <string> = enter a string (e.g. ARNDql)
807 <qstring> = enter a quoted string (e.g. "Active Site")
808 <char> = enter a single character.
811 \subsubsection{STEP 1 COMMANDS}
813 These all refer to either system settings - e.g. the maximum allowed
814 sequence length, or to general page layout features. e.g. the longest
815 and shortest side of the paper on which you are plotting.
817 \subsubsection{REQUIRED STEP 1 COMMANDS}
819 BLOCK\_FILE\ $<$string$>$
821 Gives the name of the file that contains the multiple sequence alignment
822 to be formatted. File names should be fully qualified i.e. not
823 relative to the current directory. If no block file command is given,
824 ALSCRIPT will expect to read the block file from standard input.
826 OUTPUT\_FILE\ $<$string$>$
828 Defines the output file name. This command should be near the beginning of
830 e.g. OUTPUT\_FILE\ Figure1.ps
832 You MUST define an output file unless the -p option
833 (See \hyperref{Alternative ways of invoking ALSCRIPT}{Section }{ }{app7})
836 DEFINE\_FONT\ $<$int$>$ $<$string$>$ ($<$int$>$/DEFAULT)/(REL $<$float$>$)
838 Defines a font to use later:
841 DEFINE_FONT 0 Helvetica 10
842 DEFINE_FONT 2 Times-Roman 2
845 defines font number 0 to be 10 point Helvetica, and font number 2 to be
846 2 point Times-Roman. Font 0 is always used as the default font. You MUST
847 define at least the font 0 font.
849 DEFINE\_FONT\ 3 Times-Roman DEFAULT
851 sets font 3 to be Times-Roman at whatever the default pointsize
852 is as set by the POINTSIZE command.
854 DEFINE\_FONT\ 4 Helvetica REL 0.5
sets font 4 to be Helvetica at half the default pointsize.
858 NOTE: Font names must be written exactly as shown in
859 \hyperref{PostScript Fonts}{Section }{ }{app3}.
864 Signals the end of the STEP 1 commands.
866 \subsubsection{OPTIONAL STEP 1 COMMANDS}
869 ADD\_SEQ\ $<$int$>$ $<$int$>$
871 Allows extra sequence positions to be created in an existing alignment. This
872 permits additional annotations to be interspaced either above, below, or
873 anywhere in the middle of an alignment. For example:
877 would create an additional 10 sequences - all set to the blank character
878 before the first sequence in the block file that has been read in.
882 would add an extra sequence after sequence 3.
884 IMPORTANT: If you use the ADD\_SEQ\ facility to add sequences anywhere except
885 after the last sequence, then remember that the sequence
886 numbers will alter. All formatting commands that follow this command
887 must use the new sequence numbering. Thus in the first example:
891 what was sequence number 1 becomes sequence 11. Sequences 1-10 are
892 the new blank sequences to be used for annotation. Note that the
893 sequence numbers only change for commands AFTER the SETUP command,
894 thus, multiple add\_seq\ commands refer to the sequence number as
895 implied by the block file.
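The renumbering rule can be checked with a small Python sketch that maps block-file sequence numbers to the numbers that apply after SETUP. It is a hypothetical helper, not part of ALSCRIPT: the function name is invented, and it assumes that a first argument of 0 means ``before the first sequence''. Each pair is the two ADD\_SEQ arguments (the sequence after which to add, and how many blank sequences to add), always given in block-file numbering.

```python
def renumber(n_seqs, add_seq_commands):
    """Map each block-file sequence number to its number after ADD_SEQ.

    add_seq_commands -- list of (after, count) pairs in block-file
                        numbering; they must be in sequence order.
    """
    afters = [after for after, _ in add_seq_commands]
    if afters != sorted(afters):
        raise ValueError("ADD_SEQ commands must occur in sequence order")

    mapping = {}
    for original in range(1, n_seqs + 1):
        # Every blank sequence inserted above this one pushes it down.
        shift = sum(count for after, count in add_seq_commands if after < original)
        mapping[original] = original + shift
    return mapping


if __name__ == "__main__":
    # Ten blank sequences before the first: sequence 1 becomes 11.
    print(renumber(4, [(0, 10)]))
    # Three before sequence 1 and two after sequence 3 of a four sequence file.
    print(renumber(4, [(0, 3), (3, 2)]))
```

The order check in the sketch matters because the program itself performs no such check on ADD\_SEQ commands.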
899 Defines the pointsize to be used to scale the plot and space the characters.
904 If present, then the sequence number is output with the identifier
905 code. This is useful for finding the coordinates of residues to box
906 or otherwise highlight.
910 Specifies that alignments will be plotted with the longest paper axis
911 horizontal (longer alignments fit on a page this way).
915 Specifies that alignments will be plotted with the longest paper axis vertical
916 (more sequences fit on a page this way).
918 IDENT\_WIDTH\ $<$int$>$
920 Units are characters.
922 Reserves $<$int$>$ characters at left of every page for plotting
923 identifiers. Note that not all this space need be used, if a smaller
924 pointsize is used to plot out the identifier codes, than is used for
927 LINE\_WIDTH\_FACTOR\ $<$float$>$
929 Value greater than 0 that scales the default line width. The linewidth
930 is obtained by multiplying the pointsize by this factor.
933 X\_SPACE\_FACTOR\ $<$float$>$
935 Y\_SPACE\_FACTOR\ $<$float$>$
937 These determine the spacing between adjacent residues in the X and Y
938 directions. The spacing is calculated as: POINTSIZE + POINTSIZE *
939 X\_SPACE\_FACTOR or POINTSIZE + POINTSIZE * Y\_SPACE\_FACTOR\ as
940 appropriate. Defaults are 0.2 and 0.0 respectively.
942 X\_SHIFT\_FACTOR\ $<$float$>$
944 Y\_SHIFT\_FACTOR\ $<$float$>$
946 These determine the shift relative to the residue drawing position
947 that is given to the boxing lines. The shift is calculated as
950 (POINTSIZE + POINTSIZE * X\_SPACE\_FACTOR)\ * X\_SHIFT\_FACTOR\
951 similarly for Y\_SHIFT\_FACTOR.
953 The defaults are 0.3 and 0.0 respectively.
955 Fiddling with the X\_SPACE/SHIFT\ values is useful to fine tune the
956 appearance of the alignment.
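The spacing and shift formulas above can be checked with a few lines of Python. This is only an arithmetic sketch of the documented formulas using the documented defaults; the function names are illustrative assumptions, and a POINTSIZE of 10 is assumed for the example.

```python
def spacing(pointsize, space_factor):
    """Distance in points between adjacent residues along one axis."""
    return pointsize + pointsize * space_factor


def box_shift(pointsize, space_factor, shift_factor):
    """Offset in points applied to boxing lines along one axis."""
    return spacing(pointsize, space_factor) * shift_factor


# Documented defaults: X_SPACE_FACTOR 0.2, Y_SPACE_FACTOR 0.0,
# X_SHIFT_FACTOR 0.3, Y_SHIFT_FACTOR 0.0. POINTSIZE 10 is an assumption.
POINTSIZE = 10
x_step = spacing(POINTSIZE, 0.2)          # about 12 points
y_step = spacing(POINTSIZE, 0.0)          # 10 points
x_shift = box_shift(POINTSIZE, 0.2, 0.3)  # about 3.6 points
y_shift = box_shift(POINTSIZE, 0.0, 0.0)  # 0 points
```

With the defaults, residues are therefore spaced slightly wider than they are tall, and boxing lines are shifted only in the X direction.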
959 MAX\_INPUT\_LEN\ $<$int$>$
961 Units are characters.
962 Defines the maximum number of characters allowed in an input line.
963 This must be greater than the maximum number of sequences
966 e.g. MAX\_INPUT\_LEN\ 600
968 Increases the default value of 500 characters to 600 characters.
972 Units are characters. Defines the maximum number of sequences that
973 may be read by the program. This parameter has a large default (500).
974 You may need to reduce it on computers with small memories.
978 Units are characters.
979 The maximum length allowed for a sequence identifier code.
981 MAX\_SEQ\_LEN\ $<$int$>$
983 Defines the maximum length allowed for a sequence alignment - this may
984 need to be reduced from the 8000 default value on smaller computers.
987 Units of points (1/72 inch).
989 Defines the offset along the X-axis that the alignments will be shifted prior
990 to printing. Fiddle with this value to get a nice offset from the bottom left
991 hand corner of the page if your page size is not A4.
994 Units of points (1/72 inch).
996 As for X\_OFFSET,\ but along the Y axis.
1001 Defines the length of the longest side of the printer page.
1003 MIN\_SIDE\ $<$int$>$
1006 Defines the length of the shortest side of the printer page.
1008 VERTICAL\_SPACING\ $<$int$>$
1010 Defines the vertical spacing in character units between blocks of sequences
1011 when more than one block will fit on a page - default is 0.
1014 DEFINE\_COLOUR\ $<$int$>$ $<$float$>$ $<$float$>$ $<$float$>$
1018 Defines a colour - the first number is the number by which the colour will
1019 be referred to. The following three numbers are the intensities of red,
1020 green and blue respectively. Thus:
1022 DEFINE\_COLOUR\ 1 0 0.2 0.8
1024 sets colour number 1 to be a colour with no red, 0.2 green and 0.8 blue.
1025 The exact appearance of this colour will depend on the output device.
1026 If you find suitable combinations of colours for your printer, then
1027 please let me know and I shall distribute your suggestions with the
1032 If present, then tick marks are drawn below the numbers at the top of the page.
1033 Otherwise no ticks are shown.
1035 NUMBER\_INT\ $<$int$>$
1037 Specifies the interval for writing residue position numbers. Default is 10.
1041 Switches all residue numbering off.
1044 \subsubsection{STEP 2 COMMANDS}
1047 All these are optional formatting commands.
1049 IMPORTANT PLEASE READ THIS NOTE:
1051 For those commands that accept region definitions (e.g. SURROUND\_CHARS) it
1052 is easiest to think of the region being defined in terms of X and Y
1053 coordinates, where X is the sequence residue coordinate and Y is the
1054 sequence number coordinate. Thus 3 7 means the 3rd residue in sequence 7.
1055 3 7 12 42 means the rectangular box bounded by residue 3 of sequence 7 and
1056 residue 12 of sequence 42.
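The region convention can be made concrete with a short Python sketch. The {\tt Region} class and {\tt parse\_region} helper are hypothetical names introduced for illustration; the sketch only encodes the rule stated above, that the four integers are residue and sequence of one corner followed by residue and sequence of the opposite corner, and it assumes both bounds are inclusive.

```python
from typing import NamedTuple


class Region(NamedTuple):
    start_res: int  # X of the first corner (residue position)
    start_seq: int  # Y of the first corner (sequence number)
    end_res: int    # X of the opposite corner
    end_seq: int    # Y of the opposite corner

    def contains(self, residue, sequence):
        """True if the given residue of the given sequence lies in the box."""
        return (self.start_res <= residue <= self.end_res
                and self.start_seq <= sequence <= self.end_seq)


def parse_region(text):
    """Turn the four integers of a region definition into a Region."""
    x1, y1, x2, y2 = (int(token) for token in text.split())
    return Region(x1, y1, x2, y2)


# The example from the note: residue 3 of sequence 7 to residue 12 of sequence 42.
box = parse_region("3 7 12 42")
```

Reading the numbers as X, Y pairs in this way avoids the common mistake of giving both residue bounds first and both sequence bounds second.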
1058 SURROUND\_CHARS\ $<$string$>$ ALL
1060 Draw lines round, but not between, the characters that are in the string.
1063 SURROUND\_CHARS\ GP ALL
1065 will draw lines round all G and P characters in the alignment, but not
1066 between adjacent G and P characters.
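One way to picture the "round, but not between" rule is as an edge test on a grid: a side of a selected character is drawn only when the neighbouring position on that side is not itself selected. The Python sketch below illustrates that idea on a tiny invented alignment. It is an assumption about how the behaviour could be modelled, not a description of how ALSCRIPT is implemented, and the side names are labels chosen for the sketch.

```python
def surround_edges(rows, chars):
    """Return the set of (sequence, residue, side) edges to draw (1-based).

    rows holds one string per sequence; chars is the set of characters
    to surround, e.g. "GP".
    """
    selected = set(chars)

    def is_selected(s, r):
        # Positions outside the alignment count as not selected.
        return 0 <= s < len(rows) and 0 <= r < len(rows[s]) and rows[s][r] in selected

    edges = set()
    for s, row in enumerate(rows):
        for r, ch in enumerate(row):
            if ch not in selected:
                continue
            neighbours = {
                "TOP": (s - 1, r),
                "BOTTOM": (s + 1, r),
                "LEFT": (s, r - 1),
                "RIGHT": (s, r + 1),
            }
            for side, (ns, nr) in neighbours.items():
                # Draw only where the neighbour is not also selected.
                if not is_selected(ns, nr):
                    edges.add((s + 1, r + 1, side))
    return edges


# Two invented sequences: G and P are adjacent in the first, G sits below G.
edges = surround_edges(["AGPA", "AGAA"], "GP")
```

In this example no line is drawn between the adjacent G and P of the first sequence, nor between the two G characters that sit one above the other.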
1068 SURROUND\_CHARS\ $<$string$>$ $<$int$>$ $<$int$>$ $<$int$>$ $<$int$>$
1070 Similar command, but the surrounding is restricted to the region defined by
1075 SURROUND\_CHARS\ ILVW 3 12 7 32
1077 would surround ILVW characters that occur in the region defined
1078 from residue positions 3-7 of sequences 12 to 32.
1080 SHADE\_CHARS\ $<$string$>$ ALL $<$float$>$
1082 Shade all characters in the $<$string$>$ by the grey value given by $<$float$>$.
1085 SHADE\_CHARS\ GP ALL 0.5
1087 would shade all G and P characters in the alignment by the grey value 0.5.
1089 SHADE\_CHARS\ $<$string$>$ $<$int$>$ $<$int$>$ $<$int$>$ $<$int$>$ $<$float$>$
1091 restricts the shading to the region defined by the four integers. Thus
1093 SHADE\_CHARS\ ILVW 3 12 7 32 0.7
1095 would shade I L V and W characters from residues 3-7 of sequences 12-32
1096 inclusive with a grey value of 0.7.
1098 FONT\_CHARS\ $<$string$>$ ALL $<$int$>$
1102 FONT\_CHARS\ GP ALL 7
1104 would use font 7 to write out all G and P characters. Font 7 MUST have been
1105 defined using the DEFINE\_FONT commands above.
1107 FONT\_CHARS\ $<$string$>$ $<$int$>$ $<$int$>$ $<$int$>$ $<$int$>$ $<$int$>$
1109 Similar to previous command, but restricts the effect to the region defined
1110 by the first four integers. The font must have been defined by the
1111 DEFINE\_FONT\ command.
1115 FONT\_CHARS\ ILVW 3 45 9 70 7
1117 Would set the font to 7 for I L V and W characters for residues 3-9 of
1118 sequences 45-70 inclusive. The font must have been defined by the
1119 DEFINE\_FONT\ command.
1121 FONT\_REGION\ $<$int$>$ $<$int$>$ $<$int$>$ $<$int$>$ $<$int$>$
1123 Define the font to use throughout the region specified by the first four
1128 FONT\_REGION\ 3 12 20 40 10
1130 Use font 10 for residues 3-20 of sequences 12-40. The font
1131 must have been defined using the DEFINE\_FONT\ command.
1133 FONT\_RESIDUE\ $<$int$>$ $<$int$>$ $<$int$>$
1135 Set the font for use with a single residue position - most useful when used
1136 with the TEXT command.
1140 FONT\_RESIDUE\ 3 7 2
1142 Use font 2 for residue 3 of sequence 7. Font 2 must have been defined using
1143 the DEFINE\_FONT\ command.
1145 LINE $<$string$>$ $<$int$>$ $<$int$>$ $<$int$>$
1147 There are four commands of this type for drawing horizontal or vertical lines
1150 LINE LEFT $<$int$>$ $<$int$>$ $<$int$>$
1152 Draw a line to the left of the character positions indicated.
1158 Draw a vertical line starting at residue 3 of sequence 12 and ending at
1159 residue 3 of sequence 24.
1163 Draw a horizontal line above the character positions from residue 3 of
1164 sequence 12 to residue 24 of sequence 12.
1166 Similar commands are:
1168 LINE BOTTOM $<$int$>$ $<$int$>$ $<$int$>$ Draw a line at bottom of character position.
1170 LINE RIGHT $<$int$>$ $<$int$>$ $<$int$>$ Draw a line at right of character position.
1172 BOX\_REGION\ $<$int$>$ $<$int$>$ $<$int$>$ $<$int$>$
1174 Draw a box around the region indicated by the four integers.
1178 BOX\_REGION\ 2 5 30 7
1180 Would box from residue 2 of sequence 5 to residue 30 of sequence 7.
1182 SHADE\_REGION\ $<$int$>$ $<$int$>$ $<$int$>$ $<$int$>$ $<$float$>$
1184 Shade the region indicated by the integers with the grey value shown by the
1188 SHADE\_REGION\ 30 40 35 46 0.2
1190 Would shade from residue 30-35 of sequences 40-46 with a grey value of 0.2.
1192 SHADE\_RES\ $<$int$>$ $<$int$>$ $<$float$>$
1194 Shade just one amino acid with the grey value.
1200 Shades residue 3 of sequence 7. (Note: this can also be achieved with the
1202 SHADE\_REGION\ command, but requires 2 extra numbers)
1204 TEXT $<$int$>$ $<$int$>$ $<$qstring$>$
1206 Place the text string at the location indicated.
1210 TEXT 30 70 "Active Site His"
1212 would put the text Active Site His starting at position 30 of sequence
1213 70. (Use FONT\_RESIDUE\ or FONT\_REGION\ commands to set the font of the
1214 text). Text added with the TEXT command will not be split across page
1215 breaks, so you may in some circumstances need to fiddle a little with
1216 the location/pointsize for the text to get the desired result.
1218 ID\_FONT\ ALL $<$int$>$
1220 Set the font for all identifier codes to the font number shown by $<$int$>$.
1225 Would set all the identifier codes to font 3.
1227 ID\_FONT\ $<$int$>$ $<$int$>$
1229 Set the font for a specific identifier to font number.
1234 Use font 4 for the identifier of sequence 12, default font for all other
1237 SUB\_CHARS\ ALL $<$char$>$ $<$char$>$
1239 Substitute the characters indicated.
1245 would change all occurrences of + to * in the alignment.
1247 SUB\_CHARS\ $<$int$>$ $<$int$>$ $<$int$>$ $<$int$>$ $<$char$>$ $<$char$>$
1249 Restricts the substitution to the region shown.
1253 SUB\_CHARS\ 1 1 7 8 \% *
1255 would substitute * for \% from residue 1-7 of sequences 1-8.
1256 NOTE: To substitute for or with the space character use the word SPACE.
1257 e.g. to change all space characters to -.
1259 SUB\_CHARS\ ALL SPACE -
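The effect of a substitution, including the SPACE keyword and the optional region, can be sketched in Python as follows. The function {\tt sub\_chars} and its arguments are hypothetical; the region tuple follows the residue, sequence, residue, sequence order used throughout STEP 2, and the alignments shown are invented test data.

```python
def sub_chars(rows, old, new, region=None):
    """Return rows with `old` replaced by `new`, optionally inside a region.

    rows holds one string per sequence. region is
    (start_res, start_seq, end_res, end_seq), 1-based and inclusive;
    None stands for ALL. The word SPACE stands for the space character.
    """
    old = " " if old == "SPACE" else old
    new = " " if new == "SPACE" else new
    if region is None:
        region = (1, 1, max((len(r) for r in rows), default=0), len(rows))
    x1, y1, x2, y2 = region

    result = []
    for seq_no, row in enumerate(rows, start=1):
        chars = list(row)
        if y1 <= seq_no <= y2:
            # Only touch residue positions inside the region.
            for res_no in range(x1, min(x2, len(chars)) + 1):
                if chars[res_no - 1] == old:
                    chars[res_no - 1] = new
        result.append("".join(chars))
    return result


everywhere = sub_chars(["A+ +", "++B "], "+", "*")          # ALL + *
spaces = sub_chars(["A B", "C D"], "SPACE", "-")            # ALL SPACE -
restricted = sub_chars(["%%%", "%%%"], "%", "*", (1, 1, 2, 1))
```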
1261 SUB\_ID\ $<$int$>$ $<$qstring$>$
1263 Replace the numbered identifier by the string.
1266 SUB\_ID\ 34 "Predicted Secondary Structure"
1268 would replace whatever the identifier of sequence 34 was, by the text shown.
1269 This is useful when used in conjunction with the ADD\_SEQ\ command shown under
1270 the STEP 1 commands.
1272 INVERSE\_CHARS\ $<$string$>$ ALL/Range (similar syntax to FONT\_CHARS\ but no
1275 Print the selected characters in white. This clearly will only work
1276 if you first use the SHADE\_CHARS command to shade the characters with
1277 something other than white.
1279 CCOL\_CHARS\ $<$string$>$ ALL $<$int$>$
1281 Colour all characters in the $<$string$>$ by the colour defined by $<$int$>$.
1285 CCOL\_CHARS\ GP ALL 12
1287 would colour all G and P characters in the alignment by the colour 12.
1288 This colour MUST have been defined by the DEFINE\_COLOUR\ command.
1290 CCOL\_CHARS\ $<$string$>$ $<$int$>$ $<$int$>$ $<$int$>$ $<$int$>$ $<$int$>$
1292 restricts the colouring to the region defined by the four integers. Thus
1294 CCOL\_CHARS\ ILVW 3 12 7 32 7
1296 would colour I L V and W characters from residues 3-7 of sequences 12-32
1297 inclusive with the colour 7.
1299 SCOL\_CHARS:\ This has identical syntax to CCOL\_CHARS,\ but colours the
1300 background of the character, rather than the letter itself.
1302 COLOUR\_REGION\ $<$int$>$ $<$int$>$ $<$int$>$ $<$int$>$ $<$int$>$
1306 Colour the region indicated by the integers with the colour number given
1311 COLOUR\_REGION\ 30 40 35 46 2
1313 Would colour from residue 30-35 of sequences 40-46 with the colour 2.
1315 COLOUR\_RES\ $<$int$>$ $<$int$>$ $<$int$>$
1317 Colour just one amino acid with the defined colour.
1323 Colours residue 3 of sequence 7. (Note: this can also be achieved with the
1324 COLOUR\_REGION\ command, but requires 2 extra numbers)
1327 \subsection{AMPS Block file format}
1330 The first part of a block-file contains the identifier codes of the
1331 sequences that are to follow. Each code is prefixed by the $>$ symbol; codes
1332 must not contain spaces.
1345 ALSCRIPT counts the number of $>$ symbols in the beginning of the file
1346 until a * symbol is found. The * signals the beginning of the
1347 multiple alignment which is stored VERTICALLY, thus columns are
1348 individual sequences, whilst rows are aligned positions. The * symbol
1349 must lie over the first sequence. A further star in the same column
1350 signals the end of the alignment. ALSCRIPT uses the number of $>$
1351 symbols at the beginning of the file to work out how many columns to
1352 read from the * position. It is therefore important that the only $>$
1353 symbols in the file are those that define the identifiers, and the
1354 only * symbols are those defining the start and end of the multiple
1355 alignment. The block file can contain additional text, providing that
1356 there are no more $>$ or * symbols in the file than those used to define
1357 the identifiers or alignment start and end.
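The reading rule described above (count the $>$ symbols, find the first *, then read that many columns from the * position until a second * appears in the same column) can be sketched in Python. This is a minimal illustration written from the description in this section, not a port of the ALSCRIPT reader; the function name {\tt read\_block\_file} and the tiny three-sequence example are invented for the purpose.

```python
def read_block_file(text):
    """Return (identifiers, sequences) from AMPS block-file text."""
    lines = text.splitlines()
    ids = []
    start = None
    for i, line in enumerate(lines):
        if "*" in line:
            # The first * marks the start of the vertical alignment.
            start = i
            break
        if ">" in line:
            # The identifier is the space-free code that follows the > symbol.
            tokens = line.split(">", 1)[1].split()
            ids.append(tokens[0] if tokens else "")
    if start is None:
        raise ValueError("no * found: alignment start is missing")

    column = lines[start].index("*")
    sequences = [[] for _ in ids]
    for line in lines[start + 1:]:
        if len(line) > column and line[column] == "*":
            # A second * in the same column ends the alignment.
            break
        row = line.ljust(column + len(ids))
        # Each row is one aligned position; each column is one sequence.
        for k, residues in enumerate(sequences):
            residues.append(row[column + k])
    return ids, ["".join(residues) for residues in sequences]


example = "\n".join([
    ">seq_one",
    ">seq_two",
    ">seq_three",
    "* start of alignment",
    "ACA",
    "AC-",
    "GCA",
    "*",
])
ids, seqs = read_block_file(example)
```

Reading down the columns of the invented example gives the three sequences AAG, CCC and A-A, which shows why stray $>$ or * symbols elsewhere in the file would upset the column count.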
1359 A simple, small block-file is shown here.
1380 \subsection{PostScript Fonts}
1390 Helvetica-BoldOblique,
1394 Courier-BoldOblique,
1396 AvantGarde-BookOblique,
1398 AvantGarde-DemiOblique,
1402 Bookman-LightItalic,
1404 Helvetica-Narrow-Bold,
1405 Helvetica-Narrow-BoldOblique,
1406 Helvetica-Narrow-Oblique,
1407 NewCenturySchlbk-Roman,
1408 NewCenturySchlbk-Bold,
1409 NewCenturySchlbk-Italic,
1410 NewCenturySchlbk-BoldItalic,
1415 ZapfChancery-MediumItalic.
1419 \subsection{386 DOS installation}
1423 IMPORTANT - The programs on this disk will ONLY WORK on a PC with a 386
1424 or better processor. See the Technical Notes section for details of why.
1431 Create a directory on your hard disk.
1432 e.g. mkdir ALSCRIPT.
1435 Copy the Contents of the floppy disk into this directory.
1437 e.g. copy a:*.* c:\alscript.
1441 Edit your AUTOEXEC.BAT file and add
1443 C:\ALSCRIPT to your path.
1447 Edit your AUTOEXEC.BAT file and add the following two lines.
1448 set DOS4GVM=@ALSCRIPT.VMC
1452 The first line is an instruction to read instructions from the file
1453 ALSCRIPT.VMC. This sets up a permanent swap file on your hard disk.
1454 By default, the swap file is about 12MBytes in size. If you do not have
1455 this much free space on your disk, then edit the ALSCRIPT.VMC file
1456 to reduce the swap file size, or alternatively, do not put this line
1457 in your autoexec.bat.
1459 The programs will run without this swap file, but you will be limited in
1460 the size of alignment you can process by the amount of RAM you have
1461 installed. I have only tested this program on a 486/33 with 8MBytes RAM
1462 and a 386/33 with 4MBytes so I do not know the practical limitations of
1463 machines with smaller memories. Any feedback would be appreciated.
1465 5. Type AUTOEXEC.BAT to initialise the changes, or better still, reset
1468 6. You should now be able to run all three programs in the package
1469 (msf2blc, clus2blc and alscript) from anywhere on your disk. If you get
1470 memory allocation errors when you try to run alscript, then use the
1471 MAX\_NSEQ\ and MAX\_SEQ\_LEN\ commands to reduce the default limits. If the
1472 program still won't run, then think about buying some more memory!!
1474 The programs msf2blc and clus2blc should run OK, but if you try to
1475 process alignments that are too large for your computer, you may get a
1476 "malloc error" which will stop the program. If this happens and you are
1477 not using the virtual memory option discussed above, then try adding the
1478 line set DOS4GVM=@filename to your autoexec.bat file. If you
1479 don't have enough disk space to do this, then buy a bigger disk, or more
1482 \subsection{TECHNICAL NOTES}
1484 The executables included in this package were compiled with the WATCOM C
1485 compiler. This is a full 32 bit compiler that makes good use of the 386
1486 processor and does not work on the 16 bit 286. It also has the
1487 advantage of allowing the flat memory model to be used. In practice
1488 this means that porting programs like alscript from Unix computers like
1489 the Sun, is straightforward. In order to access the memory of the
1490 computer in this way, an extra program called a dos extender is required
1491 - this is called DOS4GW.EXE. DOS4GW is automatically invoked every time
1492 you run one of the programs and is responsible for managing the memory and
1493 creating the swap file discussed above.
1495 \subsection{Unix Installation}
1498 ALSCRIPT is distributed with executables for Sun (SunOS 4.1.3), Silicon
1499 Graphics (IRIX 5.3), DEC ALPHA OSF/1 and Sun Solaris (2.4). The executables
1500 are stored in the subdirectories bin/sun, bin/sgi, bin/osf and bin/sol. If
1501 these are OK for your system, then just add the appropriate directory to your
1502 path, or put links in /usr/local/bin or somewhere that is on all users' paths.
1504 The source code for ALSCRIPT is contained in a directory hierarchy.
1505 The top directory contains a README file and the BUILD script.
1506 Subdirectories are: {\bf examples} which contains example command and
1507 alignment files, {\bf doc} which contains \LaTeX\ and PostScript copies
1508 of the manual - a subdirectory of this contains an HTML version of the
1509 manual, and {\bf src} which contains the source code and Makefiles for
1510 the package. There may also be a directory called {\bf bin}. If
1511 present this will contain subdirectories with executables for the
1512 programs in the package. Makefiles to build alscript, msf2blc,
1513 clus2blc and alsnum are included in the {\bf src} directory. Versions
1514 for Sun (acc compiler .sun), Silicon Graphics (.sgi), DEC OSF/1 (.osf)
1517 There is a utility csh script called BUILD. Simply type ./BUILD sun to
1518 compile alscript on the Sun, ./BUILD sgi for Silicon Graphics or ./BUILD
1519 gcc for use with the gcc compiler. See instructions in the file BUILD.
1520 The BUILD script will create a bin directory and subdirectory
1521 if not already present. You can create makefiles for different computers
1522 and the BUILD script should still function.
1525 \subsection{VAX/VMS Installation}
1528 The standard VAX C compiler is not ANSI. Accordingly, ALSCRIPT will require
1529 changes to the source code to compile on a VAX.
1531 The DEC C++ compiler works OK for alscript. Alscript will also compile
1532 on DEC ALPHA under OpenVMS. A descrip.mms file is included for this
1535 {\em WARNING: I've not tested Version 2.0 of ALSCRIPT on VMS}
1537 \subsection{Alternative ways of invoking ALSCRIPT}
1540 The documentation above describes the interactive mode of running ALSCRIPT.
1541 However, it may be more convenient to run the program as a pipe under
1542 Unix or MS-DOS. Examples are shown here.
1544 ALSCRIPT is a program for producing pretty versions of multiple
1545 sequence alignments. ALSCRIPT will also format single sequences. A
1546 full description of the program is given in the file "alscript.doc".
1548 Ways of running alscript:
1553 Interactive mode: just type alscript.
1554 You will be prompted for a command file name. The command file will
1555 define the AMPS blocfile, and name of the file to store the PostScript
1556 output - see alscript.doc for details.
1559 alscript $<$command\_file$>$\ has the same effect as 1, but does not prompt for
1561 e.g. alscript example1.als
1564 alscript -q $<$ $<$blocfile$>$ $>$ $<$PostScript$>$
1565 Quick mode - uses default commands, reads alignment from stdin,
1566 writes PostScript to stdout. This mode creates a command file
1569 e.g. alscript -q $<$ example1.blc $>$ example1.ps
1572 alscript -f $<$command\_file$>$
1573 Similar effect to 2.
1576 alscript -f $<$command\_file$>$\ -s
1577 Silent operation: No messages are written to stderr, unless fatal.
1578 Silent operation may be toggled by the silent\_mode\ command
1579 in the command file.
1582 alscript -f $<$command\_file$>$\ -p $<$ $<$blocfile$>$ $>$ $<$PostScript$>$
1583 Make alscript work like a pipe - blocfile is read from stdin,
1584 PostScript is written
1585 to stdout. Messages are written to stderr. To suppress messages include
1588 e.g. alscript -f example1.als -p -s $<$ example1.blc $>$ example1.ps
1592 Using alscript as a pipe has the advantage of allowing the blocfile to
1593 be created on the fly by the programs msf2blc or clus2blc. For example
1594 if we have a GCG .msf file called "pileup.msf" we can run alscript with
1595 default shading/fonts and send the results straight to the PostScript
1596 printer "lpr" as follows:
1598 msf2blc -q $<$pileup.msf | alscript -q -s | lpr
1600 \subsection{Program Crashes and Known Bugs}
1603 We've used ALSCRIPT on Sun Workstations and Silicon Graphics for some
1604 time, with very large alignments and command files with thousands of
1605 commands. All seems to work OK; the program has not crashed on us at
1608 However, the command interpreter in ALSCRIPT is very simple and
1609 the program will crash if you give any command the wrong number of
1610 arguments (e.g. leaving out the shade value in a shade\_chars\ command).
1612 If the program still crashes after you have checked the documentation
1613 and your numbers, then send me
1614 the command file and block file that cause the crash and I will try
1617 Suggestions for improvements to the program are always welcome.
1619 \subsection{Wish List for next version!!}
1622 A command interpreter that does more error checking will be included.
1623 Currently, no checking is done to make sure that the correct number of
1624 arguments are given to a command.
1626 It will be possible to give sequences unique labels, and region commands will refer
1627 to these labels or ranges of labels. This will permit a sequence to be
1628 deleted or added to the alignment without having to update the .als file.
1630 The relative numbering option will be extended to allow numbering relative
1631 to a position. e.g. 456+7 would be 7 residues after position 456. This
1632 will allow annotation of positions that may be in insertions relative to the
1635 Special TEXT commands will be extended to allow alternative shapes to
1636 be drawn and scaled in various ways.
1638 Tree drawing and generalised graphics. An option to draw arbitrary lines
1639 on an alignment will be added. This will permit line graphics to be added
1640 to an alignment. The initial reason for this will be to show dendrograms
1641 (trees) alongside the alignment, but simple line graphs could also be plotted
1642 under the alignment.
1644 Fiddle factors will be introduced to allow fine positioning of
1645 individual characters. For example, if you like your ``I'' characters
1646 to be centred rather than left justified, this will be possible.
1648 In single\_page mode, it will be possible to add arbitrary text to an
1649 alignment for final annotation, e.g. titles etc.
1651 Variable height/width sequence lines will be permitted (maybe).
1653 \subsection{Acknowledgements}
1655 I thank all those who have emailed me with suggestions for
1656 improvements to alscript. I've tried to include some of these in the
1657 current distribution (e.g. screening).
1659 \subsection{References}
1664 1. Barton, G. J. (1993),
1665 "ALSCRIPT A tool to format multiple sequence alignments",
1666 Protein Engineering, Volume 6, No. 1, pp.37-40.
1668 2. Barton, G. J. (1990),
1669 "Protein Multiple Sequence Alignment and Flexible Pattern Matching",
1670 Methods in Enzymology,
1673 3. Barton, G. J. and Sternberg, M. J. E. (1987),
1674 "A Strategy for the Rapid Multiple Alignment of Protein Sequences:
1675 Confidence Levels From Tertiary Structure Comparisons",
1676 Journal of Molecular Biology,
1679 4. Higgins, D. G. and Sharp, P. M. (1989),
1680 "Fast and sensitive multiple sequence alignments on a microcomputer",
1684 5. Devereux, J., Haeberli, P. and Smithies, O. (1984),
1685 "A comprehensive set of sequence analysis programs for the VAX",
1689 6. Livingstone, C. D. and Barton, G. J. (1993),
1690 "Protein Sequence Alignments: A Strategy for the Hierarchical analysis
1691 of residue conservation",
1692 Computer Applications in the Biosciences,