WSTester updated to work plus hopefully all the other changes that need to go into...

[jabaws.git] / binaries / src / ViennaRNA / man / RNAlib.texinfo
diff --git a/binaries/src/ViennaRNA/man/RNAlib.texinfo b/binaries/src/ViennaRNA/man/RNAlib.texinfo

new file mode 100644 (file)

index 0000000..448c02f
--- /dev/null
+++ b/binaries/src/ViennaRNA/man/RNAlib.texinfo
@@ -0,0 +1,1589 @@
+\input texinfo  @c -*- Mode: texinfo; fill-column: 74 -*-
+@c Last changed Time-stamp: <2010-03-11 21:47:46 ivo>
+@set revision $Id: RNAlib.texinfo,v 1.25 2009/01/13 17:49:58 ivo Exp $
+@setfilename RNAlib.info
+@settitle Vienna RNA Package
+@ifclear Version
+@set Version 1.6
+@end ifclear
+@iftex
+@afourpaper
+@end iftex
+@ifinfo
+@setchapternewpage odd
+@end ifinfo
+@titlepage
+@sp 10
+@title{Vienna RNA Package}
+@sp
+@subtitle{A Library for folding and comparing RNA secondary structures}
+
+@vskip 0pt plus 1filll
+Copyright @copyright{1994-2002} @author{Ivo Hofacker and Peter Stadler}
+@c @smallexample
+Revision @value{revision}
+@c @end smallexample
+@end titlepage
+
+@node Top, Introduction, (dir), (dir)
+@comment node-name, next,          previous, up
+
+@ifinfo
+This file documents the Vienna RNA Package Version @value{Version}
+
+Copyright @copyright{1994-2002} Ivo Hofacker and Peter Stadler
+@end ifinfo
+
+@menu
+* Introduction::
+* Folding Routines::            Functions for Folding RNA Secondary Structures
+* Parsing and Comparing::       Functions to Manipulate Structures
+* Utilities::                   Odds and Ends
+* Example::                     A Small Example Program
+* References::
+* Function Index::
+* Variable Index::
+@end menu
+
+@node Introduction, Folding Routines, Top, Top
+@chapter Introduction
+
+The core of the Vienna RNA Package is formed by a collection of routines
+for the prediction and comparison of RNA secondary structures. These
+routines can be accessed through stand-alone programs, such as RNAfold,
+RNAdistance etc., which should be sufficient for most users. For those who
+wish to develop their own programs we provide a library which can be
+linked to your own code.
+
+This document describes the library and will be primarily useful to
+programmers. However, it also contains details about the implementation
+that may be of interest to advanced users. The stand-alone programs are
+described in separate man pages.  The latest version of the package
+including source code and html versions of the documentation can be found
+at
+@ifclear texi2html
+@url{http://www.tbi.univie.ac.at/~ivo/RNA/}.
+@end ifclear
+@ifset texi2html
+@ifhtml
+the <a href="http://www.tbi.univie.ac.at/~ivo/RNA/">ViennaRNA page</a>.
+@end ifhtml
+@end ifset
+This manual documents version @value{Version}.
+
+Please send comments and bug reports to
+@ifclear texi2html
+@email{ivo@@tbi.univie.ac.at}.
+@end ifclear
+@ifset texi2html
+@ifhtml
+<a href="mailto:rna@tbi.univie.ac.at">&lt;Ivo.Hofacker@tbi.univie.ac.at&gt;</a>.
+@end ifhtml
+@end ifset
+
+@node Folding Routines, Parsing and Comparing, Introduction, Top
+@comment  node-name,  next,  previous,  up
+@chapter  Folding Routines
+
+@menu
+* mfe Fold::                    Calculating Minimum Free Energy Structures
+* PF Fold::                     Calculating Partition Functions and Pair Probabilities
+* Inverse Fold::                Searching for Predefined Structures
+* Suboptimal folding::          Enumerating Suboptimal Structures
+* Cofolding::                   Folding of 2 RNA molecules
+* Local Fold::                  Predicting local structures of large sequences
+* Alignment Fold::              Predicting Consensus Structures from Alignment
+* Fold Vars::                   Global Variables for the Folding Routines
+* Param Files::                 Reading Energy Parameters from File
+@end menu
+
+@node mfe Fold, PF Fold, Folding Routines, Folding Routines
+
+@section Minimum free Energy Folding
+The library provides a fast dynamic programming minimum free energy
+folding algorithm as described by @cite{Zuker & Stiegler (1981)}.
+Associated functions are:
+
+
+@deftypefun float fold (char* @var{sequence}, char* @var{structure})
+folds the @var{sequence} and returns the minimum free energy in kcal/mol;
+the mfe structure in bracket notation (@pxref{notations}) is returned in
+@var{structure}. Sufficient space for string of the same length as
+@var{sequence} must be allocated for @var{structure} before calling
+@code{fold()}. If @code{fold_constrained} (@pxref{Fold Vars,,Variables})
+is 1, the @var{structure} string is interpreted on input as a list of
+constraints for the folding. The characters `` | x < > `` mark bases that
+are paired, unpaired, paired upstream, or downstream, respectively;
+matching brackets `` ( ) `` denote base pairs, dots ``.'' are used for
+unconstrained bases. Constrained folding works by assigning bonus energies
+to all structures that comply with the constraint.
+@end deftypefun
+
+@deftypefun float energy_of_struct (char* @var{sequence}, char* @var{structure})
+calculates the energy of @var{sequence} on the @var{structure}
+@end deftypefun
+
+The @code{circ_fold()} and @code{energy_of_circ_struct()} provide
+analogous functions for folding and evaluating cricular RNA molecules.
+
+@deftypefun void   initialize_fold (int @var{length})
+Obsolete function kept for backward compatibility.
+Allocates memory for mfe folding sequences not longer than @var{length},
+and sets up pairing matrix and energy parameters.
+@end deftypefun
+
+@deftypefun void   free_arrays ()
+frees the memory allocated by @code{fold()}.
+@end deftypefun
+
+@deftypefun void   update_fold_params ()
+call this to recalculate the pair matrix and energy parameters after a
+change in folding parameters like @code{temperature}
+(@pxref{Fold Vars,,Variables}).
+@end deftypefun
+
+Prototypes for these functions are declared in @file{fold.h}.
+
+@node PF Fold, Inverse Fold, mfe Fold, Folding Routines
+@section Partition Function Folding
+
+Instead of the minimum free energy structure the partition function of
+all possible structures and from that the pairing probability for every
+possible pair can be calculated, using a dynamic programming algorithm
+as described by @cite{McCaskill (1990)}. The  following
+functions are provided:
+
+@deftypefun float pf_fold (char* @var{sequence}, char* @var{structure})
+calculates the partition function @math{Z} of @var{sequence} and returns
+the free energy of the ensemble @math{F} in kcal/mol, where @math{F=-kT
+ln(Z)}. If @var{structure} is not a NULL pointer on input, it contains on
+return a string consisting of the letters `` . , | @{ @} ( ) `` denoting
+bases that are essentially unpaired, weakly paired, strongly paired without
+preference, weakly upstream (downstream) paired, or strongly up-
+(down-)stream paired bases, respectively.  If @code{fold_constrained}
+(@pxref{Fold Vars,,Variables}) is 1, the @var{structure} string is
+interpreted on input as a list of constraints for the folding. The
+character ``x`` marks bases that must be unpaired, matching brackets `` ( )
+`` denote base pairs, all other characters are ignored. Any pairs
+conflicting with the constraint will be forbidden. This usually sufficient
+to ensure the constraints are honored.  If @code{do_backtrack} (@pxref{Fold
+Vars,,Variables}) has been set to 0 base pairing probabilities will not
+be computed (saving CPU time), otherwise the @code{pr[iindx[i]-j]}
+(@pxref{Fold Vars,,Variables}) will contain the probability that bases i
+and j pair.
+@end deftypefun
+
+@deftypefun void init_pf_fold (int @var{length})
+allocates memory for folding sequences not longer than @var{length};
+sets up pairing matrix and energy parameters. Has to be called before
+the first call to @code{pf_fold()}.
+@end deftypefun
+
+@deftypefun void free_pf_arrays (void)
+frees the memory allocated by @code{init_pf_fold()}.
+@end deftypefun
+
+@deftypefun void update_pf_params (int @var{length})
+Call this function to recalculate the pair matrix and energy parameters
+after a change in folding parameters like temperature
+(@pxref{Fold Vars,,Variables}).
+@end deftypefun
+
+@deftypefun double mean_bp_dist (int @var{length})
+computes the mean base pair distance in the equilibrium ensemble as a
+measure of the structural diversity.  It is given by @math{<d> =
+\sum_{a,b} p_a * p_b * d(a,b)}, where the sum goes over all pairs of
+possible structure @math{a,b}, @math{p_a} is the Boltzmann weight of
+structure @math{a}, and @math{d(a,b)} is the base pair distance (see
+@code{bp_distance()} in @xref{Distances}.). The mean base pair distances
+can be computed efficiently from the pair probabilities @math{p_ij} as
+@math{<d> = \sum_{ij} p_{ij} * (1-p_{ij})}. Uses the global @code{pr} array
+filled by a previous call to @code{pf_fold()}.
+@end deftypefun
+
+@deftypefun char *centroid (int @var{length}, double *@var{dist})
+Computes the centroid structure, i.e. the structure having the lowest
+average base pair distance to all structures in the Boltzmann
+ensemble. This can be computed trivially from the pair probabilities by
+choosing all base pairs that have probability greater 0.5. The distance of
+the centroid to the ensemble is returned in @var{dist}.
+@end deftypefun
+
+Prototypes for these functions are declared in @file{part_func.h}.
+
+@deftypefun float *MEA (plist *@var{pl}, char *@var{structure}, double @var{gamma})
+Computes a MEA (maximum expected accuracy) structure, such that expected
+accuracy @math{ A(S) = \sum_{(i,j) \in S} 2 \gamma p_{ij} + \sum_{i \notin
+S} p^u_i} is maximised. Higher values of @math{\gamma}
+result in more base pairs of lower probability and thus higher
+sensitivity. Low values of gamm result in structures containing only
+highly likely pairs (high specificity).
+The code of the MEA function also demonstrates the use of sparse dynamic
+programming scheme to reduce the time and memory complexity of folding.
+@end deftypefun
+The corresponding protoype is declared in @file{MEA.h}.
+
+@node Inverse Fold, Suboptimal folding, PF Fold, Folding Routines
+@section  Inverse Folding
+
+We provide two functions that search for sequences with a given
+structure, thereby inverting the folding routines.
+
+
+@deftypefun float inverse_fold (char* @var{start}, char* @var{target})
+searches for a sequence with minimum free energy structure
+@var{target}, starting with sequence @var{start}. It returns 0 if the
+search was successful, otherwise a structure distance to @var{target}
+is returned. The found sequence is returned in @var{start}. If @code{give_up}
+@vindex give_up
+is set to 1, the function will return as soon as it is
+clear that the search will be unsuccessful, this speeds up the algorithm
+if you are only interested in exact solutions.
+Since @code{inverse_fold()} calls @code{fold()} you have to allocate memory
+for folding by calling @code{initialize_fold()}
+@end deftypefun
+
+@deftypefun float inverse_pf_fold (char* @var{start}, char* @var{target})
+searches for a sequence with maximum probability to fold into structure
+@var{target} using the partition function algorithm. It returns
+-kT log(p) where p is the frequency of @var{target} in the ensemble of
+possible structures. This is usually much slower than @code{inverse_fold()}.
+Since @code{inverse_pf_fold()} calls @code{pf_fold()} you have to allocate
+memory for folding by calling @code{init_pf_fold()}
+@end deftypefun
+
+@deftypevar char *symbolset
+The global variable @code{char *symbolset} points to the allowed bases,
+initially @code{"AUGC"}.  It can be used to design sequences from reduced
+alphabets.
+@end deftypevar
+
+Prototypes for these functions are declared in @file{inverse.h}.
+
+@node Suboptimal folding, Cofolding, Inverse Fold, Folding Routines
+@section  Suboptimal folding
+
+@deftp {data type} SOLUTION  struct @{float energy; char *structure;@}
+@end deftp
+
+@deftypefun SOLUTION* subopt (char* @var{sequence}, char* @var{constraint}, int* @var{delta}, FILE* @var{fp})
+produces @strong{all} suboptimal secondary structures within
+@var{delta}*0.01 kcal/mol of the optimum. The results are either
+directly written to a @var{fp} (if @var{fp} is not @code{NULL}), or
+(@var{fp}@code{==NULL}) returned in a @code{SOLUTION *} list terminated
+by an entry were the @code{structure} pointer is @code{NULL}.
+@end deftypefun
+
+Prototypes for these functions are declared in @file{subopt.h}.
+
+@node Cofolding, Local Fold, Suboptimal folding, Folding Routines
+@section  Predicting hybridization structures of two molecules
+
+The function of an RNA molecule often depends on its interaction with
+other RNAs. The following routines therefore allow to predict structures
+formed by two RNA molecules upon hybridization.
+One approach to co-folding two RNAs consists of concatenating the two
+sequences and keeping track of the concatenation point in all energy
+evaluations. Correspondingly, many of the @code{cofold()} and
+@code{co_pf_fold()} routines below take one sequence string as argument
+and use the the global variable @var{cut_point} to mark the concatenation
+point. Note that while the @code{RNAcofold} program uses the '&' character
+to mark the chain break in its input, you should not use an '&' when using
+the library routines (set @var{cut_point} instead).
+In a second approach to co-folding two RNAs, cofolding is seen as a
+stepwise process. In the first step the probability of an unpaired region
+is calculated and in a second step this probability of an unpaired region
+is  multiplied with the probability of an interaction between the two RNAs.
+This approach is implemented for the interaction between a long
+target sequence and a short ligand RNA. Function @code{pf_unstru()}
+calculates the partition function over all unpaired regions in the input
+sequence. Function @code{pf_interact()}, which calculates the
+partition function over all possible interactions between two
+sequences, needs both sequence as separate strings as input.
+
+@deftypevar int cut_point
+@code{cut_point} marks the position (starting from 1) of the first
+nucleotide of the second molecule within the concatenated sequence.
+The default value of -1 stands for single molecule folding. The
+@code{cut_point} variable is also used by @code{PS_rna_plot()} and
+@code{PS_dot_plot()} to mark the chain break in postscript plots.
+@end deftypevar
+
+@deftypefun float cofold (char @var{*sequence}, char @var{*structure})
+The analog to the  @code{fold()} function. Computes the minimum free
+energy of two RNA molecules interacting. If @code{cut_point==-1} results
+should be the same as with @code{fold()}.
+@end deftypefun
+
+@deftypefun  void  free_co_arrays (void)
+Frees arrays allocated by @code{cofold()}.
+@end deftypefun
+
+@c @deftypefun char *tokenize (char @var{*line})
+@c Takes the sequence as it is taken from stdin or a file, sets the
+@c @var{cut_point} global variable to the position of the '&' character
+@c and cuts out the '&' character to return the sequence as it will be
+@c used by all cofold functions.
+@c @end deftypefun
+
+@deftypefun void   initialize_cofold (int @var{length})
+Allocates memory for mfe cofolding sequences not longer than @var{length},
+and sets up pairing matrix and energy parameters.
+Explicitly calling  @code{initialize_cofold()} is normally not necessary,
+as it will be called automagically before folding.
+@end deftypefun
+
+@c @deftypefun extern struct plist *get_mfe_plist (struct plist @var{*pl})
+@c This writes the global @var{*pr} array into the new structure
+@c @var{plist}, creating an array of the base pairs of the minimum free
+@c energy structure (and nothing else). Default probability is 0.95.
+@c Used when @var{*pr} is going to be freed or overwritten by other
+@c functions (e.g. @code{ co_pf_fold()},@code{cofold()}, etc).
+@c @end deftypefun
+
+Prototypes for these functions are declared in @file{cofold.h}.
+
+
+@section  Partition Function Cofolding
+As for folding one RNA molecule, this computes the partition function
+of all possible structures and the base pair probabilities. Uses the
+same global @var{pf_scale} variable to avoid overflows.
+
+To simplify the implementation the partition function computation is done
+internally in a null model that does not include the duplex initiation
+energy, i.e. the entropic penalty for producing a dimer from two
+monomers). The resulting free energies and pair probabilities are initially
+relative to that null model. In a second step the free energies can be
+corrected to include the dimerization penalty, and the pair probabilities
+can be divided into the conditional pair probabilities given that a re
+dimer is formed or not formed.
+
+
+@deftp {data type} cofoldF  struct @{double F0AB, FAB, FcAB, FA, FB;@}
+@end deftp
+
+@deftypefun cofoldF co_pf_fold (char* @var{sequence}, char* @var{structure})
+The analog to the @code{pf_fold()} function, it computes the partition
+function of two interacting RNA molecules as well as base pair
+probabilities.
+It returns a struct including free energies computed for different
+models. F0AB is the free energy in the null model; FAB is corrected
+to include the initiation penalty, FcAB only includes real dimers. FA and
+FB are the free energies of the 2 molecules, resp. As with @code{cofold()}
+the @var{cut_point} global variable has to be set to mark the chain break
+between the molecules.
+@end deftypefun
+
+
+@deftypefun void   free_co_pf_arrays (void)
+frees partition function cofold arrays allocated in @code{ co_pf_fold()}.
+@end deftypefun
+
+@deftypefun void   init_co_pf_fold (int @var{length})
+Obsolete function kept for backward compatibility.
+Allocates memory for partition function cofolding sequences not longer
+than @var{length}, and sets up pairing matrix and energy parameters.
+@end deftypefun
+
+@deftp {data type} plist  struct @{int i; int j; float p;@}
+@end deftp
+
+@deftypefun extern struct plist *get_plist (struct plist @var{*pl}, int @var{length}, double @var{cut_off})
+Makes a list of base pairs out of the global @var{*pr} array,
+ignoring all base pairs with a probability less than @var{cut_off}.
+This is useful since the @var{pr} array gets overwritten by subsequent
+foldings.
+@end deftypefun
+
+Prototypes for these functions are declared in @file{co_part_func.h}.
+
+@section  Cofolding all Dimeres, Concentrations
+After computing the partition functions of all possible dimeres one
+can compute the probabilities of base pairs, the concentrations out of
+start concentrations and sofar and soaway.
+
+@deftypefun void compute_probabilities (double @var{FAB}, double @var{FEA}, double @var{FEB}, struct plist  @var{*prAB}, struct plist  @var{*prA}, struct plist  @var{*prB}, int @var{Alength})
+Given the pair probabilities and free energies (in the null model) for a
+dimer AB and the two constituent monomers A and B, compute the conditional pair
+probabilities given that a dimer AB actually forms.
+Null model pair probabilities are given as a list as produced by
+@code{get_plist()}), the dimer probabilities @var{prAB} are modified in
+place.
+@end deftypefun
+
+Dimer formation is inherently concentration dependent. Given the free
+energies of the monomers A and B and dimers AB, AA, and BB one can compute
+the equilibrium concentrations, given input concentrations of A and B, see
+e.g.~@cite{Dimitrov & Zuker (2004)}
+
+@deftp {data type} ConcEnt  struct @{double A0; double B0;double ABc;double AAc; double BBc; double Ac; double Bc;@}
+@end deftp
+
+@deftypefun extern struct ConcEnt  *get_concentrations(double @var{FEAB}, double @var{FEAA}, double @var{FEBB}, double @var{FEA}, double @var{FEB}, double * @var{startconc})
+Takes an array  @var{startconc} of input concentrations with alternating
+entries for the initial concentrations of molecules A and B (terminated by
+two zeroes), then computes the resulting equilibrium concentrations
+from the free energies for the dimers. Dimer free energies should be the
+dimer-only free energies, i.e. the FcAB entries from the @code{cofoldF} struct.
+@end deftypefun
+
+@c @deftypefun int make_probsum (int @var{length}, char @var{*name})
+@c Computes the probability of any base to be paired to a base with a
+@c number lower than @var{length} (used for e.g. compute probability to
+@c be paired within one molecule).
+@c @end deftypefun
+
+Prototypes for these functions are declared in @file{co_part_func.h}.
+
+@section Partition Function Cofolding as stepwise process
+
+In this approach to cofolding the interaction between two RNA molecules is
+seen as a stepwise process. In a first step, the target molecule has to
+adopt a structure in which a binding site is accessible. In a second step,
+the ligand molecule will hybridize with a region accessible to an
+interaction. Consequently the algorithm is designed as a two step process:
+The first step is the calculation of the probability
+that a region within the target is unpaired, or equivalently, the
+calculation of the free energy needed to expose a region. In the second step
+we compute the free energy of an interaction for every possible binding
+site.
+
+@deftp {data type} pu_contrib struct @{double **H; double **I; double **M; double **E; int length;@}
+@end deftp
+
+@deftypefun @code{pu_contrib} *pf_unstru (char @var{*sequence}, int @var{u})
+This function calculates the partition function over all unpaired regions
+in @var{*sequence} of a maximal length @var{u}. You have to call function
+@code{pf_fold()} for @var{*sequence} befor calling @code{pf_unstru()}. If
+you want to calculate unpaired regions for a constrained structure, set
+variable @var{structure} in function @code{pf_fold()} to the constrain string.
+Function @code{pf_unstru} returns a
+@code{pu_contrib} struct containing four arrays of dimension [i = 1 to length
+of @var{*sequence}][j = 0 to @var{u}-1] containing all possible contributions
+to the probabilities of unpaired regions of maximum length @var{u}.
+Each array in @code{pu_contrib} contains one of the contributions to the
+total probability of being unpaired: The probability of being unpaired
+within an exterior loop is in array @code{pu_contrib->E}, the probability
+of being unpaired within a hairpin loop is in array @code{pu_contrib->H},
+the probability of being unpaired within an interior loop is in array
+@code{pu_contrib->I} and probability of being unpaired within a multi-loop
+is in array @code{pu_contrib->M}. The total probability of being unpaired
+is the sum of the four arrays of @code{pu_contrib}.
+Function @code{pf_unstru} frees everything allocated automatically. To
+free the output structure call @code{free_pu_contrib}.
+@end deftypefun
+
+@deftypefun void  free_pu_contrib (@code{pu_contrib} @var{*p_con})
+frees the output of function @code{pf_unstru}.
+@end deftypefun
+
+@deftp {data type} interact struct @{double *Pi; double *I; double Gikjl; double Gikjl_wo; int i, int k, int j, int l, int length;@}
+@end deftp
+
+@deftypefun interact *pf_interact (const char @var{*s1}, const char @var{*s2}, @code{pu_contrib} @var{*p_c}, @code{pu_contrib} @var{*p_c2},int @var{w}, char @var{*cstruc}, int @var{incr3}, int @var{incr5})
+Calculates the probability of a local interaction between sequence
+@var{*s1} and sequence @var{*s2}, considering the probability that the
+region of interaction is unpaired within @var{*s1} and  @var{*s2}. The
+longer sequence has to be given as @var{*s1}. The shorter sequence has to
+be given as @var{*s2}. Function @code{pf_unstru()} has to be called
+for @var{*s1} and @var{*s2}, where the probabilities of  being unpaired
+have to be given in @var{*p_c} and @var{*p_c2}, respectively. If you do
+not want to include the probabilities of  being unpaired for @var{*s2} set
+@var{*p_c2} to NULL. If variable char @var{*cstruc} is not NULL,
+constrained folding is done: The available constrains for intermolecular
+interaction are: '.' (no constrain), 'x' (the base has no intermolecular
+interaction) and '|' (the corresponding base has to be paired
+intermolecularily).
+The parameter @var{w} determines the maximal length of the interaction. The
+parameters @var{incr5} and @var{incr3} allows inclusion of
+unpaired residues left (@var{incr5}) and right (@var{incr3}) of the region
+of interaction in @var{*s1}. If the @var{incr} options are used, function
+@code{pf_unstru()} has to be called with
+@var{w}=@var{w}+@var{incr5}+@var{incr3} for the longer sequence @var{*s1}.
+Function @code{pf_interact()} returns a structure of type @code{interact} which
+contains the probability of the best local interaction including residue i
+in Pi and the minimum free energy in Gi, where i is the position in sequence
+@var{*s1}. The member Gikjl of structure @code{interact} is
+the best interaction between region [k,i] k<i in longer sequence
+@var{*s1} and region [j,l] j<l in @var{*s2}. Gikjl_wo is Gikjl without the
+probability of beeing unpaired.
+Use @code{free_interact)} to free the returned structure, all
+other stuff is freed inside @code{pf_interact()}.
+@end deftypefun
+
+@deftypefun void  free_interact (@code{interact} @var{*pin})
+frees the output of function @code{pf_interact}.
+@end deftypefun
+
+Prototypes for these functions are declared in @file{part_func_up.h}.
+
+@node Local Fold, Alignment Fold, Cofolding, Folding Routines
+@section Searching for local structures in large molecules
+
+Local structures can be predicted by a modified version of the
+@code{fold()} algorithm that restricts the span of all base pairs.
+
+@deftypefun float  Lfold (char *@var{string}, char *@var{structure}, int @var{maxdist})
+The local analog to @code{fold()}. Computes the minimum free energy
+structure including only base pairs with a span smaller than
+@end deftypefun
+
+@var{maxdist}.
+
+@deftypefun struct plist *pfl_fold (char *@var{sequence}, int @var{winSize},int @var{pairSize}, float @var{cutoffb}, double **@var{pU},struct plist **@var{dpp2}, FILE *@var{pUfp}, FILE *@var{spup}))
+pfl_fold computes partition functions for every window of size
+@var{winSize} possible in a RNA molecule, allowing only pairs with a span
+smaller than @var{pairSize}. It returns the mean pair probabilities averaged
+over all windows containing the pair in @var{pl}. @var{winSize} should
+always be >= @var{pairSize}. Note that in contrast to @code{Lfold()},
+bases outside of the window do not influence the structure at all. Only
+probabilities higher than @var{cutoff} are kept.
+
+The remaining arguments are optional in the sense that they can be set to
+NULL unless the corresponding functionality is wanted.
+If @var{pU} is supplied (i.e is not the NULL pointer), @code{pfl_fold()}
+will also compute local @emph{accessibilities}, as the mean probability that regions of length @code{u} or
+smaller are unpaired. The parameter @code{ulength} is supplied in @code{pU[0][0]}. On return
+the @var{pU} array will contain these probabilities, with the entry on
+@code{pU[x][y]} containing the mean probability that x and the y following
+bases are unpaired.
+If @var{dpp2} is supplied (i.e is not the NULL pointer), @code{pfl_fold()}
+will also compute the probability of stacked pairs, more precisely the
+conditional probabilities of base pairs (i,j) given that the enclosing
+pair (i-1,j+1) is formed.
+The file pointers @var{pUfp} and @var{spup} file pointer is provided, the unpaired probabilities
+will direcly be printed and not saved, which can be useful when memory is
+an issue).
+If the  file pointer is provided, the base pair probabilities
+will direcly be printed and not saved, which can be useful when memory is
+an issue).
+
+@end deftypefun
+
+@deftypefun void  init_pf_foldLP (int @var{length})
+Analog to @code{init_pf_fold()}. @var{length} is the length  of the sequence.
+@end deftypefun
+
+@deftypefun void  free_pf_arraysLP (void)
+frees the arrays used by @code{pfl_fold()}
+@end deftypefun
+
+@deftypefun void  update_pf_paramsLP (int @var{length})
+Analog to @code{update_pf_params()}. @var{length} is the length  of the sequence.
+@end deftypefun
+
+@node Alignment Fold, Fold Vars, Local Fold, Folding Routines
+@section   Predicting Consensus Structures from an Alignment
+
+Consensus structures can be predicted by a modified version of the
+@code{fold()} algorithm that takes a set of aligned sequences instead
+of a single sequence. The energy function consists of the mean energy
+averaged over the sequences, plus a covariance term that favors pairs
+with consistent and compensatory mutations and penalizes pairs that
+cannot be formed by all structures. For details see @cite{Hofacker (2002)}.
+
+@deftypefun float alifold (char** @var{sequences}, char* @var{structure})
+Predicts the consensus structure for the aligned @var{sequences}
+and returns the minimum free energy; the mfe structure in bracket
+notation (@pxref{notations}) is returned in @var{structure}.
+Sufficient space must be allocated for @var{structure} before calling
+@code{alifold()}.
+@end deftypefun
+
+@deftypefun void free_alifold_arrays ()
+frees the memory allocated by @code{alifold()}.
+@end deftypefun
+
+@deftypefun void update_alifold_params ()
+call this to recalculate the pair matrix and energy parameters after a
+change in folding parameters like @code{temperature}
+(@pxref{Fold Vars,,Variables}).
+@end deftypefun
+
+@deftypefun float alipf_fold (char** @var{sequences}, char* @var{structure}, pair_info **@var{pi})
+The partition function version of @code{alifold()} works in analogy to
+@code{pf_fold()}. Pair probabilities and information about sequence
+covariations are returned via the @var{pi} variable as a list of
+@code{pair_info} structs.  The list is terminated by the first entry with
+@code{pi.i = 0}.
+@end deftypefun
+
+@deftp {data type} pair_info  struct @{short i,j; float p; float ent; short bp[8]; char comp;@}
+for each base pair @code{i,j} the structure lists: its probability
+@code{p}, an entropy-like measure for its well-definedness @code{ent},
+and in @code{bp[]} the frequency of each type of pair. @code{bp[0]}
+contains the number of non-compatible sequences, @code{bp[1]} the
+number of CG pairs, etc.
+@end deftp
+
+@deftypevar double  cv_fact
+@code{cv_fact} controls the weight of the covariance term in the
+energy function. Default is 1.
+@end deftypevar
+
+@deftypevar double nc_fact
+controls the magnitude of the penalty for non-compatible sequences in
+the covariance term. Default is 1.
+@end deftypevar
+
+@node Fold Vars, Param Files , Alignment Fold, Folding Routines
+@section   Global Variables for the Folding Routines
+
+The following global variables change the behavior the folding
+algorithms or contain additional information after folding.
+
+@deftypevar int  noGU
+do not allow GU pairs if equal 1; default is 0.
+@end deftypevar
+
+@deftypevar int  no_closingGU
+if 1 allow GU pairs only inside stacks, not as closing pairs;
+default is 0.
+@end deftypevar
+
+@deftypevar int  noLonelyPairs
+Disallow all pairs which can @strong{only} occur as lonely pairs (i.e. as helix
+of length 1). This avoids lonely base pairs in the predicted structures in
+most cases.
+@end deftypevar
+
+@deftypevar int  tetra_loop
+include special stabilizing energies for some tetra loops; default is 1.
+@end deftypevar
+
+@deftypevar int  energy_set
+if 1 or 2: fold sequences from an artificial alphabet ABCD..., where A
+pairs B, C pairs D, etc. using either GC (1) or AU parameters (2);
+default is 0, you probably don't want to change it.
+@end deftypevar
+
+@deftypevar float temperature
+rescale energy parameters to a temperature of @code{temperature} C.
+Default is 37C. You have to call the update_@dots{}_params() functions after
+changing this parameter.
+@end deftypevar
+
+@deftypevar int dangles
+if set to 0 no stabilizing energies are assigned to bases adjacent to
+helices in free ends and multiloops (so called dangling ends). Normally
+(@code{dangles = 1}) dangling end energies are assigned only to unpaired
+bases and a base cannot participate simultaneously in two dangling ends. In
+the partition function algorithm @code{pf_fold()} these checks are neglected.
+If @code{dangles} is set to 2, the @code{fold()} and
+@code{energy_of_struct()} function will also follow this convention. This
+treatment of dangling ends gives more favorable energies to helices
+directly adjacent to one another, which can be beneficial since such
+helices often do engage in stabilizing interactions through co-axial
+stacking.@*
+If @code{dangles = 3} co-axial stacking is explicitly included for
+adjacent helices in mutli-loops. The option affects only mfe folding
+and energy evaluation (@code{fold()} and @code{energy_of_struct()}), as
+well as suboptimal folding (@code{subopt()}) via re-evaluation of energies.
+Co-axial stacking with one intervening mismatch is not considered so far.
+
+Default is 1, @code{pf_fold()} treats 1 as 2.
+@end deftypevar
+
+@deftypevar char* nonstandards
+Lists additional base pairs that will be allowed to form in addition to
+GC, CG, AU, UA, GU and UG. Nonstandard base pairs are given a stacking
+energy of 0.
+@end deftypevar
+
+@deftypevar int cut_point
+To evaluate the energy of a duplex structure (a structure formed by two
+strands), concatenate the to sequences and set @var{cut_point} to the
+first base of the second strand in the concatenated sequence. Reset to
+0 or -1 (the default) to evaluate normal single sequence structures.
+@end deftypevar
+
+@deftypevar {struct bond @{ int i,j;@}} base_pair
+Contains a list of base pairs after a call to @code{fold()}.
+@code{base_pair[0].i} contains the total number of pairs.
+@end deftypevar
+
+@deftypevar double* pr
+contains the base pair probability matrix after a call to @code{pf_fold()}.
+@end deftypevar
+
+@deftypevar int* iindx
+index array to move through pr. The probability for base i and j to form
+a pair is in @code{pr[iindx[i]-j]}.
+@end deftypevar
+
+@deftypevar float  pf_scale
+a scaling factor used by @code{pf_fold()} to avoid overflows. Should be set
+to approximately exp@math{((-F/kT)/length)}, where @math{F} is an estimate
+for the ensemble free energy, for example the minimum free energy. You must
+call @code{update_pf_params()} or @code{init_pf_fold()} after changing this
+parameter. If pf_scale is -1 (the default) , an estimate will be provided
+automatically when calling @code{init_pf_fold()} or
+@code{update_f_params()}. The automatic estimate is usually insufficient
+for sequences more than a few hundred bases long.
+@end deftypevar
+
+@deftypevar int    fold_constrained
+If 1, calculate constrained minimum free energy structures. @xref{mfe Fold},
+for more information. Default is 0;
+@end deftypevar
+
+@deftypevar int    do_backtrack
+if 0, do not calculate pair probabilities in @code{pf_fold()}; this is about
+twice as fast. Default is 1.
+@end deftypevar
+
+@deftypevar char backtrack_type
+only for use by @code{inverse_fold()}; 'C': force (1,N) to be paired,
+'M' fold as if the sequence were inside a multi-loop. Otherwise the
+usual mfe structure is computed.
+@end deftypevar
+
+include @file{fold_vars.h} if you want to change any of these variables
+from their defaults.
+
+@node Param Files, , Fold Vars, Folding Routines
+@section Energy Parameter Files
+
+A default set of parameters, identical to the one described in @cite{Mathews
+et.al. (1999)}, is compiled into the library.
+Alternately, parameters can be read from and written to a file.
+
+@deftypefun void read_parameter_file (const char @var{fname}[])
+reads energy parameters from file @var{fname}. See below for the format of
+the parameter file.
+@end deftypefun
+@deftypefun void write_parameter_file (const char @var{fname}[])
+writes current energy parameters to the file @var{fname}.
+@end deftypefun
+
+The following describes the file format expected by
+@code{read_parameter_file()}. All energies should be given as integers in
+units of 0.01kcal/mol.
+
+Various loop parameters depend in general on the pairs closing the loops,
+as well as unpaired bases in the loops. Internally, the library
+distinguishes 8 types of pairs (CG=1, GC=2, GU=3, UG=4, AU=5, UA=6,
+nonstandard=7, 0= no pair), and 5 types of bases (A=1, C=2, G=3, U=4 and 0
+for anything else). Parameters belonging to pairs of type 0 are not listed
+in the parameter files, but values for nonstandard pairs (type 7) and
+nonstandard bases (type 0) are. Thus, a table for symmetric size 2 interior
+loops would have 7*7*5*5 entries (2 pairs, two unpaired bases).
+
+The order of entries always uses the closing pair or pairs as the first
+indices followed by the unpaired bases in 5' to 3' direction.  To determine
+the type of a pair consider the base at the (local) 5' end of each strand
+first, i.e. use the pairs @math{(i,j)} and @math{(q,p)} for an interior
+loop with @math{i<p<q<j}. This is probably better explained by an example.
+Consider the symmetric size 4 interior loop
+@example
+                     5'-GAUA-3'
+                     3'-CGCU-5'
+@end example
+the first pair is GC, the second UA (not AU!) the unpaired bases are (in 5'
+to 3' direction, starting at the first pair) A U C G. Thus we need entry
+[2,6,1,4,2,3] of the corresponding table. Because the loop is
+symmetric you could equally well describe it by UA GC C G A U, i.e. entry
+[6,2,2,3,1,4]. Be careful to preserve this symmetry when editing
+parameter tables!
+
+
+@need 1000
+The first line of the file should read
+@*@t{## RNAfold parameter file}
+
+@noindent lines of the form@*
+@noindent @t{# token}@*
+mark the beginning of a list of energy parameters of the type specified by
+token. The following tokens are recognized:
+
+@noindent @t{# stack_energies}@*
+The list of free energies for stacked pairs, indexed by the two closing
+pairs. The list should be formatted as symmetric an 7*7 matrix,conforming
+to the order explained above. As an example the stacked pair
+@example
+                     5'-GU-3'        5'-AC-3'
+                     3'-CA-5'        3'-UG-5'
+@end example
+@noindent corresponds to the entry [2,5], which should be identical to [5,2].
+Note that the format has changed from previous releases, to make it
+consistent with other loop parameters.
+
+@noindent @t{# stack_enthalpies}@*
+enthalpies for stacked pairs, used to rescale stacking energies to
+temperatures other than 37C. Same format as stack_energies.
+
+@noindent @t{# hairpin}@*
+Free energies of hairpin loops as a function of size. The list should
+contain 31 entries on one or more lines. Since the minimum size of a
+hairpin loop is 3 and we start counting with 0, the first three values
+should be INF to indicate a forbidden value.
+
+@noindent @t{# bulge}@*
+Free energies of bulge loops. Should contain 31 entries, the first one
+being INF.
+
+@noindent @t{# internal_loop}@*
+Free energies of internal loops. Should contain 31 entries, the first 4
+being INF (since smaller loops are tabulated).
+
+@noindent @t{# mismatch_interior}@*
+Free energies for the interaction between the closing pair of an interior
+loop and the two unpaired bases adjacent to the helix. This is a three
+dimensional array indexed by the type of the closing pair (see above) and
+the two unpaired bases. Since we distinguish 5 bases the list contains
+7*5*5 entries and should be formatted either as an 7*25 matrix or 7 5*5
+matrices. The order is such that for example the mismatch
+@example
+                              5'-CU-3'
+                              3'-GC-5'
+@end example
+@noindent corresponds to entry [1,4,2] (CG=1, U=4, C=2), (in this notation
+the first index runs from 1 to 7, second and third from 0 to 4)
+
+@noindent @t{# mismatch_hairpin}@*
+Same as above for hairpin loops.
+
+@noindent @t{# mismatch_enthalpies}@*
+Corresponding enthalpies for rescaling to temperatures other than 37C.
+
+@noindent @t{# int11_energies}@*
+Free energies for symmetric size 2 interior loops. 7*7*5*5 entries formatted
+as 49 5*5 matrices, or seven 7*25 matrices. Example:
+@example
+                              5'-CUU-3'
+                              3'-GCA-5'
+@end example
+corresponds to entry [1,5,4,2], which should be identical to [5,1,2,4].
+
+@noindent @t{# int21_energies}@*
+Free energies for size 3 (2+1) interior loops. 7*7*5*5*5 entries formatted
+in 5*5 or 5*25 matrices. The strand with a single unpaired base is listed
+first, example:
+@example
+                              5'-CU U-3'
+                              3'-GCCA-5'
+@end example
+corresponds to entry [1,5,4,2,2].
+
+@noindent @t{# int22_energies}@*
+Free energies for symmetric size 4 interior loops. To reduce the size of
+parameter files this table only lists canonical bases (A,C,G,U) resulting in
+a 7*7*4*4*4*4 table. See above for an example.
+
+@noindent @t{# dangle5}@*
+Energies for the interaction of an unpaired base on the 5' side and
+adjacent to a helix in multiloops and free ends (the equivalent of mismatch
+energies in interior and hairpin loops). The array is indexed by the type
+of pair closing the helix and the unpaired base and, therefore, forms a 8*5
+matrix. For example the dangling base in
+@example
+                              5'-GC-3'
+                              3'- G-5'
+@end example
+@noindent corresponds to entry [1,3] (CG=1, G=3);
+
+@noindent @t{# dangle3}@*
+Same as above for bases on the 3' side of a helix.
+@example
+                              5'- A-3'
+                              3'-AU-5'
+@end example
+@noindent corresponds to entry [5,1] (AU=5, A=1);
+
+@noindent @t{# ML_params}@*
+For the energy of a multi-loop a function of the form
+@code{E = cu*n_unpaired + ci*loop_degree + cc}
+is used where n_unpaired is the number of unpaired bases in the loop and
+loop_degree is the number of helices forming the loop. In addition a
+``terminal AU'' penalty is applied to AU and GU pairs in the loop.
+The line following the token should contain these four values, in the order
+@code{cu cc ci termAU}. The terminal AU penalty is also used for the
+exterior loop and size 3 hairpins, for other loop types it is already
+included in the mismatch energies.
+
+@noindent @t{# Tetraloops}@*
+Some tetraloops particularly stable tetraloops are assigned an energy
+bonus. Up to forty tetraloops and their bonus energies can be listed
+following the token, one sequence per line. For example:
+@example
+       GAAA    -200
+@end example
+@noindent assigns a bonus energy of -2 kcal/mol to tetraloops containing
+the sequence GAAA.
+
+@noindent @t{# END}@*
+Anything beyond this token will be ignored.
+
+A parameter file need not be complete, it might may contain only a subset
+of interaction parameters, such as only stacking energies. However, for
+each type of interaction listed, all entries have to be present.
+A @samp{*} may be used to indicate entries of a list that are to retain
+their default value. For loop energies a @samp{x} may be used to indicate that
+the value is to be extrapolated from the values for smaller loop sizes.
+Parameter files may contain C-style comments, i.e. any text between
+@code{/*} and @code{*/} will be ignored. However, you may have no more
+than one comment per line and no multi-line comments.
+
+A parameter file listing the default parameter set should accompany your
+distribution as @file{default.par}, the file @file{old.par} contains
+parameters used in version 1.1b of the Package.
+
+
+@node Parsing and Comparing, Utilities, Folding Routines, Top
+@chapter Parsing and Comparing of Structures
+
+@menu
+* notations::                   Representations of Secondary Structures.
+* Parsing::                     Functions for Parsing and Coarse Graining.
+* Distances::                   Distances for RNA Secondary Structures
+@end menu
+
+@node notations, Parsing, Parsing and Comparing, Parsing and Comparing
+@section Representations of Secondary Structures
+
+The standard representation of a secondary structure is the ``bracket
+notation'', where matching brackets symbolize base pairs and unpaired
+bases are shown as dots. Alternatively, one may use two types of node
+labels, 'P' for paired and 'U' for unpaired; a dot is then replaced by
+'(U)', and each closed bracket is assigned an additional identifier 'P'.
+We call this the expanded notation. In @cite{Fontana et al. (1993)} a
+condensed
+representation of the secondary structure is proposed, the so-called
+homeomorphically irreducible tree (HIT) representation. Here a stack is
+represented as a single pair of matching brackets labeled 'P' and
+weighted by the number of base pairs.  Correspondingly, a contiguous
+strain of unpaired bases is shown as one pair of matching brackets
+labeled 'U' and weighted by its length.  Generally any string consisting
+of matching brackets and identifiers is equivalent to a plane tree with
+as many different types of nodes as there are identifiers.
+
+@cite{Bruce Shapiro (1988)} proposed a coarse grained representation, which,
+does not retain the full information of the secondary structure. He
+represents the different structure elements by single matching brackets
+and labels them as 'H' (hairpin loop), 'I' (interior loop), 'B'
+(bulge), 'M' (multi-loop), and 'S' (stack). We extend his alphabet by an
+extra letter for external elements 'E'. Again these identifiers may be
+followed by a weight corresponding to the number of unpaired bases or
+base pairs in the structure element.  All tree representations (except
+for the dot-bracket form) can be encapsulated into a virtual root
+(labeled 'R'), see the example below.
+
+The following example illustrates the different linear tree representations
+used by the package. All lines show the same secondary structure.
+
+@example
+a) .((((..(((...)))..((..)))).)).
+   (U)(((((U)(U)((((U)(U)(U)P)P)P)(U)(U)(((U)(U)P)P)P)P)(U)P)P)(U)
+b) (U)(((U2)((U3)P3)(U2)((U2)P2)P2)(U)P2)(U)
+c) (((H)(H)M)B)
+   ((((((H)S)((H)S)M)S)B)S)
+   (((((((H)S)((H)S)M)S)B)S)E)
+d) ((((((((H3)S3)((H2)S2)M4)S2)B1)S2)E2)R)
+@end example
+
+Above: Tree representations of secondary structures.  a) Full structure:
+the first line shows the more convenient condensed notation which is
+used by our programs; the second line shows the rather clumsy expanded
+notation for completeness, b) HIT structure, c) different versions of
+coarse grained structures: the second line is exactly Shapiro's
+representation, the first line is obtained by neglecting the stems.
+Since each loop is closed by a unique stem, these two lines are
+equivalent.  The third line is an extension taking into account also the
+external digits.  d) weighted coarse structure, this time including the
+virtual root.
+
+For the output of aligned structures from string editing, different
+representations are needed, where we put the label on both sides.
+The above examples for tree representations would then look like:
+
+@example
+a) (UU)(P(P(P(P(UU)(UU)(P(P(P(UU)(UU)(UU)P)P)P)(UU)(UU)(P(P(UU)(U...
+b) (UU)(P2(P2(U2U2)(P2(U3U3)P3)(U2U2)(P2(U2U2)P2)P2)(UU)P2)(UU)
+c) (B(M(HH)(HH)M)B)
+   (S(B(S(M(S(HH)S)(S(HH)S)M)S)B)S)
+   (E(S(B(S(M(S(HH)S)(S(HH)S)M)S)B)S)E)
+d) (R(E2(S2(B1(S2(M4(S3(H3)S3)((H2)S2)M4)S2)B1)S2)E2)R)
+@end example
+
+Aligned structures additionally contain the gap character '_'.
+
+
+@node Parsing, Distances, notations, Parsing and Comparing
+@section Parsing and Coarse Graining of Structures
+
+Several functions are provided for parsing structures and converting to
+different representations.
+
+@deftypefun char* expand_Full (char* @var{full})
+converts the @var{full} structure from bracket notation to the
+expanded notation including root.
+@end deftypefun
+
+@deftypefun char* b2HIT (char* @var{full})
+converts the @var{full} structure from bracket notation to the HIT
+notation including root.
+@end deftypefun
+
+@deftypefun char* b2C (char* @var{full})
+converts the @var{full} structure from bracket notation to the a
+coarse grained notation using the 'H' 'B' 'I' 'M' and 'R' identifiers.
+@end deftypefun
+
+@deftypefun char* b2Shapiro (char* @var{full})
+converts the @var{full} structure from bracket notation to the
+@emph{weighted} coarse grained notation using the 'H' 'B' 'I' 'M' 'S' 'E' and
+'R' identifiers.
+@end deftypefun
+
+@deftypefun char* expand_Shapiro (char* @var{coarse})
+inserts missing 'S' identifiers in unweighted coarse grained structures
+as obtained from @code{b2C()}.
+@end deftypefun
+
+@deftypefun char* add_root (char* @var{any})
+adds a root to an un-rooted tree in any except bracket notation.
+@end deftypefun
+
+@deftypefun char* unexpand_Full (char* @var{expanded})
+restores the bracket notation from an expanded full or HIT tree, that is
+any tree using only identifiers 'U' 'P' and 'R'.
+@end deftypefun
+
+@deftypefun char* unweight (char* @var{expanded})
+strip weights from any weighted tree.
+@end deftypefun
+
+All the above functions allocate memory for the strings they return.
+
+@deftypefun void unexpand_aligned_F (char* @var{align}[2])
+converts two aligned structures in expanded notation as produced by
+@code{tree_edit_distance()} function back to bracket notation with '_'
+as the gap character. The result overwrites the input.
+@end deftypefun
+
+@deftypefun void parse_structure (char* @var{full})
+Collects a statistic of structure elements of the @var{full} structure in
+bracket notation, writing to the following global variables:
+@end deftypefun
+
+@deftypevar int    loop_size[]
+contains a list of all loop sizes. @code{loop_size[0]} contains the
+number of external bases.
+@end deftypevar
+
+@deftypevar int    loop_degree[]
+contains the corresponding list of loop degrees.
+@end deftypevar
+
+@deftypevar int    helix_size[]
+contains a list of all stack sizes.
+@end deftypevar
+
+@deftypevar int    loops
+contains the number of loops ( and therefore of stacks ).
+@end deftypevar
+
+@deftypevar int    pairs
+contains the number of base pairs in the last parsed structure.
+@end deftypevar
+
+@deftypevar int unpaired
+contains the number of unpaired bases.
+@end deftypevar
+
+
+Prototypes for the above functions can be found in @file{RNAstruct.h}.
+
+@node Distances,  , Parsing, Parsing and Comparing
+@section Distance Measures
+
+@menu
+* Tree Distances::              Using Tree Edit Distances to Compare Structures
+* String Distances::            Using String Alignment for Comparison
+* Profile Distances::           Comparing Pair probability Matrices
+@end menu
+
+A simple measure of dissimilarity between secondary structures of equal
+length is the base pair distance, given by the number of pairs present in
+only one of the two structures being compared. I.e. the number of base
+pairs that have to be opened or closed to transform one structure into the
+other. It is therefore particularly useful for comparing structures on the
+same sequence. It is implemented by
+
+@deftypefun int bp_distance (char* @var{s1}, char* @var{s2})
+returns the ``base pair'' distance between two secondary structures @var{s1}
+and @var{s2}, which should have the same length.
+@end deftypefun
+
+For other cases a distance measure that allows for gaps is preferable.
+We can define distances between structures as edit distances between
+trees or their string representations. In the case of string distances
+this is the same as ``sequence alignment''. Given a set of edit operations
+and edit costs, the edit distance is given by the minimum sum of the
+costs along an edit path converting one object into the other. Edit
+distances like these always define a metric. The edit operations used by us
+are insertion, deletion and replacement of nodes.
+String editing does not pay attention to the matching of brackets, while
+in tree editing matching brackets represent a single node of the tree.
+Tree editing is therefore usually preferable, although somewhat
+slower. String edit distances are always smaller or equal to tree edit
+distances.
+
+The different level of detail in the structure representations defined
+above naturally leads to different measures of distance. For full
+structures we use a cost of 1 for deletion or insertion of an unpaired
+base and 2 for a base pair. Replacing an unpaired base for a pair incurs
+a cost of 1.
+
+Two cost matrices are provided for coarse grained structures:
+
+@example
+/*  Null,   H,   B,   I,   M,   S,   E     */
+   @{   0,   2,   2,   2,   2,   1,   1@},   /* Null replaced */
+   @{   2,   0,   2,   2,   2, INF, INF@},   /* H    replaced */
+   @{   2,   2,   0,   1,   2, INF, INF@},   /* B    replaced */
+   @{   2,   2,   1,   0,   2, INF, INF@},   /* I    replaced */
+   @{   2,   2,   2,   2,   0, INF, INF@},   /* M    replaced */
+   @{   1, INF, INF, INF, INF,   0, INF@},   /* S    replaced */
+   @{   1, INF, INF, INF, INF, INF,   0@},   /* E    replaced */
+
+
+/*  Null,   H,   B,   I,   M,   S,   E    */
+   @{   0, 100,   5,   5,  75,   5,   5@},   /* Null replaced */
+   @{ 100,   0,   8,   8,   8, INF, INF@},   /* H    replaced */
+   @{   5,   8,   0,   3,   8, INF, INF@},   /* B    replaced */
+   @{   5,   8,   3,   0,   8, INF, INF@},   /* I    replaced */
+   @{  75,   8,   8,   8,   0, INF, INF@},   /* M    replaced */
+   @{   5, INF, INF, INF, INF,   0, INF@},   /* S    replaced */
+   @{   5, INF, INF, INF, INF, INF,   0@},   /* E    replaced */
+@end example
+
+The lower matrix uses the costs given in @cite{Shapiro (1990)}.
+All distance functions use the following global variables:
+
+
+@deftypevar int   cost_matrix
+if 0, use the default cost matrix (upper matrix in example); otherwise
+use Shapiro's costs (lower matrix).
+@end deftypevar
+
+@deftypevar int   edit_backtrack
+produce an alignment of the two structures being compared by
+tracing the editing path giving the minimum distance.
+@end deftypevar
+
+@deftypevar char* aligned_line[2]
+contains the two aligned structures after a call to one of the distance
+functions with
+@code{edit_backtrack} set to 1. @xref{notations}, for
+details on the representation of structures.
+@end deftypevar
+
+@node Tree Distances, String Distances, Distances, Distances
+@subsection Functions for Tree Edit Distances
+
+
+@deftypefun Tree* make_tree (char* @var{xstruc})
+constructs a @code{Tree} ( essentially the postorder list ) of the
+structure @var{xstruc}, for use in
+@code{tree_edit_distance()}.
+@var{xstruc} may be any rooted structure representation.
+@end deftypefun
+
+@deftypefun float tree_edit_distance (Tree* @var{T1}, Tree* @var{T2})
+calculates the edit distance of the two trees @var{T1} and @var{T2}.
+@end deftypefun
+
+@deftypefun void free_tree (Tree* @var{t})
+frees the memory allocated for @var{t}.
+@end deftypefun
+
+Prototypes for the above functions can be found in @file{treedist.h}. The
+type @code{Tree} is defined in @file{dist_vars.h}, which is automatically
+included with @file{treedist.h}
+
+@node String Distances, Profile Distances, Tree Distances, Distances
+@subsection Functions for String Alignment
+
+@deftypefun swString* Make_swString (char* @var{xstruc})
+converts the structure @var{xstruc} into a format suitable for
+@code{string_edit_distance()}.
+@end deftypefun
+
+@deftypefun float string_edit_distance (swString* @var{T1}, swString* @var{T2})
+calculates the string edit distance of @var{T1} and @var{T2}.
+@end deftypefun
+
+Prototypes for the above functions can be found in @file{stringdist.h}.
+
+@node Profile Distances,  , String Distances, Distances
+@subsection Functions for Comparison of Base Pair Probabilities
+
+For comparison of base pair probability matrices, the matrices are first
+condensed into probability profiles which are the compared by alignment.
+
+@deftypefun float* Make_bp_profile (int @var{length})
+reads the base pair probability matrix @code{pr} (@pxref{Fold
+Vars,,Variables}) and calculates a profile, i.e. a vector containing for
+each base the probabilities of being unpaired, upstream, or downstream
+paired, respectively. The returned array is suitable for
+@code{profile_edit_distance}.
+@end deftypefun
+
+@deftypefun float    profile_edit_distance (float* @var{T1}, float* @var{T2})
+calculates an alignment distance of the two profiles @var{T1} and @var{T2}.
+@end deftypefun
+
+@deftypefun void     free_profile (float* @var{T})
+frees the memory allocated for the profile @var{T}.
+Backward compatibility only. You can just use plain @code{free()}
+@end deftypefun
+
+Prototypes for the above functions can be found in @file{profiledist.h}.
+
+
+@node Utilities, Example, Parsing and Comparing, Top
+@chapter Utilities
+
+The following utilities are used and therefore provided by the library:
+
+@deftypefun int PS_dot_plot (char* @var{sequence}, char* @var{filename})
+reads base pair probabilities produced by @code{pf_fold()} from the
+global array @code{pr} and the pair list @code{base_pair} produced by
+@code{fold()} and produces a postscript ``dot plot'' that is written to
+@var{filename}. The ``dot plot'' represents each base pairing
+probability by a square of corresponding area in a upper triangle
+matrix. The lower part of the matrix contains the minimum free energy
+structure.
+@end deftypefun
+
+@deftypefun int PS_rna_plot (char* @var{sequence}, char* @var{structure}, char* @var{filename})
+produces a secondary structure graph in PostScript and writes it to
+@var{filename}. Note that this function has changed from previous versions
+and now expects the structure to be plotted in dot-bracket notation as an
+argument. It does not make use of the global @code{base_pair} array anymore.
+@end deftypefun
+
+@deftypefun int PS_rna_plot_a (char* @var{sequence}, char* @var{structure}, char* @var{filename}, char* @var{pre}, char* @var{post})
+Same as @code{PS_rna_plot} but adds extra PostScript macros for various
+annotations (see generated PS code). The @var{pre} and @var{post}
+variables contain PostScript code that is verbatim copied in the
+resulting PS file just before and after the structure plot.
+@end deftypefun
+
+@deftypefun int svg_RNA_plot (char* @var{sequence}, char* @var{structure}, char* @var{filename})
+produces a secondary structure plot in SVG format and writes it to
+@var{filename}.
+@end deftypefun
+
+@deftypefun int gmlRNA (char* @var{sequence}, char* @var{structure}, char* @var{filename}, char @var{option})
+produces a secondary structure graph in the Graph Meta Language gml and
+writes it to @var{filename}. If @code{option} is an uppercase letter the
+@code{sequence} is used to label nodes, if @code{option} equals @kbd{'X'}
+or @kbd{'x'} the resulting file will coordinates for an initial layout of
+the graph.
+@end deftypefun
+
+@deftypevar int rna_plot_type
+switches between different layout algorithms for drawing secondary
+structures in @code{PS_rna_plot} and @code{gmlRNA}. Current possibility are
+0 for a simple radial drawing or 1 for the modified radial drawing taken from
+the @code{naview} program of @cite{Bruccoleri & Heinrich (1988)}.
+@end deftypevar
+
+@deftypefun char* random_string (int @var{l}, char* @var{symbols})
+generates a ``random'' string of characters from @var{symbols} with
+length @var{l}.
+@end deftypefun
+
+@deftypefun int    hamming (char* @var{s1}, char* @var{s2})
+returns the number of positions in which @var{s1} and @var{s2} differ,
+the so called ``Hamming'' distance. @var{s1} and @var{s2} should have the
+same length.
+@end deftypefun
+
+@deftypefun {unsigned char*} pack_structure (char* @var{struc})
+returns a binary string encoding the secondary structure @var{struc}
+using a 5:1 compression scheme. The string is NULL terminated and can
+therefore be used with standard string functions such as strcmp().
+Useful for programs that need to keep many structures in memory.
+@end deftypefun
+
+@deftypefun char*  unpack_structure (unsigned char* @var{packed})
+translate a compressed binary string produced by pack_structure() back into
+the familiar dot bracket notation.
+@end deftypefun
+
+@deftypefun short* make_pair_table (char* @var{structure})
+returns a newly allocated table, such that:  table[i]=j if (i.j) pair or 0
+if i is unpaired, table[0] contains the length of the @var{structure}.
+@end deftypefun
+
+@deftypefun char* time_stamp (void)
+returns a string containing the current date in the format
+``Fri Mar 19 21:10:57 1993''.
+@end deftypefun
+
+@deftypefun void   nrerror (char* @var{message})
+writes @var{message} to stderr and aborts the program.
+@end deftypefun
+
+@deftypefun double urn ()
+returns a pseudo random number in the range [0..1[, usually implemented
+by calling @code{erand48()}.
+@end deftypefun
+
+@deftypevar {unsigned short} xsubi[3]
+is used by @code{urn ()}. These should be set to some random number seeds
+before the first call to @code{urn ()}.
+@end deftypevar
+
+@deftypefun int    int_urn (int @var{from}, int @var{to})
+generates a pseudo random integer in the range [@var{from}, @var{to}].
+@end deftypefun
+
+@deftypefun void* space (unsigned int @var{size})
+returns a pointer to @var{size} bytes of allocated and 0 initialized
+memory; aborts with an error if memory is not available.
+@end deftypefun
+
+@deftypefun char* get_line (FILE* @var{fp})
+reads a line of arbitrary length from the stream @var{*fp}, and returns
+a pointer to the resulting string. The necessary memory is allocated and
+should be released using @code{free()} when the string is no longer needed.
+@end deftypefun
+
+Prototypes for @code{PS_rna_plot()} and @code{PS_dot_plot()} reside in
+@file{PS_dot.h}, the other functions are declared in @file{utils.h}.
+
+@node Example, References, Utilities, Top
+@chapter A Small Example Program
+
+The following program exercises most commonly used functions of the library.
+The program folds two sequences using both the mfe and partition function
+algorithms and calculates the tree edit and profile distance of the
+resulting structures and base pairing probabilities.
+
+
+@example
+#include  <stdio.h>
+#include  <math.h>
+#include  "utils.h"
+#include  "fold_vars.h"
+#include  "fold.h"
+#include  "part_func.h"
+#include  "inverse.h"
+#include  "RNAstruct.h"
+#include  "treedist.h"
+#include  "stringdist.h"
+#include  "profiledist.h"
+
+void main()
+@{
+   char *seq1="CGCAGGGAUACCCGCG", *seq2="GCGCCCAUAGGGACGC",
+       *struct1,* struct2,* xstruc;
+   float e1, e2, tree_dist, string_dist, profile_dist, kT;
+   Tree *T1, *T2;
+   swString *S1, *S2;
+   float **pf1, **pf2;
+
+   /* fold at 30C instead of the default 37C */
+   temperature = 30.;      /* must be set *before* initializing  */
+   /* allocate memory for fold(), could be skipped */
+   initialize_fold(strlen(seq1));
+
+   /* allocate memory for structure and fold */
+   struct1 = (char* ) space(sizeof(char)*(strlen(seq1)+1));
+   e1 =  fold(seq1, struct1);
+
+   struct2 = (char* ) space(sizeof(char)*(strlen(seq2)+1));
+   e2 =  fold(seq2, struct2);
+
+   free_arrays();     /* free arrays used in fold() */
+
+   /* produce tree and string representations for comparison */
+   xstruc = expand_Full(struct1);
+   T1 = make_tree(xstruc);
+   S1 = Make_swString(xstruc);
+   free(xstruc);
+
+   xstruc = expand_Full(struct2);
+   T2 = make_tree(xstruc);
+   S2 = Make_swString(xstruc);
+   free(xstruc);
+
+   /* calculate tree edit distance and aligned structures with gaps */
+   edit_backtrack = 1;
+   tree_dist = tree_edit_distance(T1, T2);
+   free_tree(T1); free_tree(T2);
+   unexpand_aligned_F(aligned_line);
+   printf("%s\n%s  %3.2f\n", aligned_line[0], aligned_line[1], tree_dist);
+
+   /* same thing using string edit (alignment) distance */
+   string_dist = string_edit_distance(S1, S2);
+   free(S1); free(S2);
+   printf("%s  mfe=%5.2f\n%s  mfe=%5.2f  dist=%3.2f\n",
+       aligned_line[0], e1, aligned_line[1], e2, string_dist);
+
+   /* for longer sequences one should also set a scaling factor for
+      partition function folding, e.g: */
+   kT = (temperature+273.15)*1.98717/1000.;  /* kT in kcal/mol */
+   pf_scale = exp(-e1/kT/strlen(seq1));
+   init_pf_fold(strlen(seq1));
+
+   /* calculate partition function and base pair probabilities */
+   e1 = pf_fold(seq1, struct1);
+   pf1 = Make_bp_profile(strlen(seq1));
+
+   e2 = pf_fold(seq2, struct2);
+   pf2 = Make_bp_profile(strlen(seq2));
+
+   free_pf_arrays();  /* free space allocated for pf_fold() */
+
+   profile_dist = profile_edit_distance(pf1, pf2);
+   printf("%s  free energy=%5.2f\n%s  free energy=%5.2f  dist=%3.2f\n",
+       aligned_line[0], e1, aligned_line[1], e2, profile_dist);
+
+   free_profile(pf1); free_profile(pf2);
+@}
+@end example
+
+In a typical Unix environment you would compile this program using:
+@kbd{cc -c example.c -I@var{hpath}}
+and link using
+@kbd{cc -o example example.o -L@var{lpath} -lRNA -lm}
+where @var{hpath} and @var{lpath} point to the location of the header
+files and library, respectively.
+
+@node References, Function Index, Example, Top
+@chapter References
+
+@itemize -
+@item D.H. Mathews, J. Sabina, M. Zuker and H. Turner (1999)@*
+   Expanded sequence dependence of thermodynamic parameters provides
+   robust prediction of RNA secondary structure, JMB, 288: 911-940
+
+@item M. Zuker and P. Stiegler (1981)@*
+   Optimal  computer  folding  of large RNA sequences using
+   thermodynamic and auxiliary information, Nucl Acid Res 9: 133-148
+
+@item D.A. Dimitrov, M.Zuker(2004)@*
+   Prediction of hybridization and melting for double stranded nucleic
+   acids, Biophysical J. 87: 215-226,
+
+@item J.S. McCaskill (1990)@*
+   The equilibrium partition function and base pair binding
+   probabilities for RNA secondary structures, Biopolymers 29: 1105-1119
+
+@item D.H. Turner, N. Sugimoto and S.M. Freier (1988)@*
+   RNA structure prediction, Ann Rev Biophys Biophys Chem 17: 167-192
+
+@item J.A. Jaeger, D.H. Turner and M. Zuker (1989)@*
+   Improved predictions of secondary structures for RNA,
+   Proc. Natl. Acad. Sci. 86: 7706-7710
+
+@item L. He, R. Kierzek, J. SantaLucia, A.E. Walter and D.H. Turner (1991)@*
+   Nearest-Neighbor Parameters For GU Mismatches,
+   Biochemistry 30: 11124-11132
+
+@item A.E. Peritz, R. Kierzek, N, Sugimoto, D.H. Turner (1991)@*
+   Thermodynamic Study of Internal Loops in Oligoribonucleotides ... ,
+   Biochemistry 30: 6428--6435
+
+@item A. Walter, D. Turner, J. Kim, M. Lyttle, P. M@"uller, D. Mathews and M. Zuker (1994)@*
+   Coaxial stacking of helices enhances binding of Oligoribonucleotides..,
+   Proc. Natl. Acad. Sci. 91: 9218-9222
+
+@item B.A. Shapiro, (1988)@*
+   An algorithm for comparing multiple  RNA secondary structures,
+   CABIOS 4, 381-393
+
+@item B.A. Shapiro and K. Zhang (1990)@*
+   Comparing multiple RNA secondary structures using tree comparison,
+   CABIOS 6, 309-318
+
+@item R. Bruccoleri and G. Heinrich (1988)@*
+   An improved algorithm for nucleic acid secondary structure display,
+   CABIOS 4, 167-173
+
+@item W. Fontana , D.A.M. Konings, P.F. Stadler, P. Schuster (1993) @*
+   Statistics of RNA secondary structures, Biopolymers 33, 1389-1404
+
+@item W. Fontana, P.F. Stadler, E.G. Bornberg-Bauer, T. Griesmacher, I.L.
+   Hofacker, M. Tacker, P. Tarazona, E.D. Weinberger, P. Schuster (1993)@*
+   RNA folding and combinatory landscapes, Phys. Rev. E 47: 2083-2099
+
+@item I.L. Hofacker, W. Fontana, P.F. Stadler, S. Bonhoeffer, M. Tacker, P.
+   Schuster (1994) Fast Folding and Comparison of RNA Secondary Structures.
+   Monatshefte f. Chemie 125: 167-188
+
+@item I.L. Hofacker (1994) The Rules of the Evolutionary Game for RNA:
+   A Statistical Characterization of the Sequence to Structure Mapping in RNA.
+   PhD Thesis, University of Vienna.
+
+@item I.L. Hofacker, M. Fekete, P.F. Stadler (2002).
+   Secondary Structure Prediction for Aligned RNA Sequences.
+   J. Mol. Biol. 319:1059-1066
+
+@item S.H. Bernhart, I.L. Hofacker, S. Will, A. Gruber, P.F. Stadler (2008).@*
+   RNAalifold: improved consensus structure prediction for RNA
+   alignments. BMC Bioinformatics, 9:474, 2008.  
+
+@item I.L. Hofacker and Peter F. Stadler (2006).@* 
+   Memory efficient folding algorithms for circular RNA secondary
+   structures. Bioinformatics, 22:1172–1176.
+
+@item S.H. Bernhart, H. Tafer, U. Mückstein, Ch. Flamm, P.F. Stadler, 
+   I.L. Hofacker (2006). Partition function and base pairing probabilities
+   of RNA heterodimers. Algorithms Mol. Biol., 1:3.
+
+@item S.H. Bernhart, I.L. Hofacker, P.F. Stadler (2006).@* 
+   Local RNA base pairing probabilities in large sequences.
+   Bioinformatics, 22:614–615.
+
+@item  U. Mückstein, H. Tafer, J. Hackermüller, S.H. Bernhart,
+   P.F. Stadler, I.L Hofacker (2006). 
+   Thermodynamics of RNA-RNA binding. Bioinformatics, 22:1177–1182.   
+
+@item I.L. Hofacker, B. Priwitzer, P.F. Stadler (2004).@* 
+   Prediction of locally stable RNA secondary structures for genome-wide
+   surveys. Bioinformatics, 20:186–190.
+
+@item D. Adams (1979) @*
+   The hitchhiker's guide to the galaxy, Pan Books, London
+@end itemize
+
+@node Function Index, Variable Index, References, Top
+
+@unnumbered Function Index
+@printindex fn
+
+@node Variable Index,  , Function Index, Top
+
+@unnumbered Variable Index
+@printindex vr
+
+@contents
+@bye
+
+@c  LocalWords:  texinfo setfilename RNAlib settitle syncodeindex vr fn iftex
+@c  LocalWords:  afourpaper ifinfo setchapternewpage titlepage sp vskip pt dir
+@c  LocalWords:  filll mfe PF Params Zuker Stiegler pxref Freier al findex mol
+@c  LocalWords:  struct int params McCaskill pf pr iindx init func kT
+@c  LocalWords:  vindex symbolset noGU GU closingGU tetra ABCD AU rescale UA CG
+@c  LocalWords:  multiloops nonstandards UG GAAG GA AG exp xref vars Shapiro's
+@c  LocalWords:  homeomorphically UU HH emph un unexpand unweight RNAstruct INF
+@c  LocalWords:  xstruc treedist swString stringdist bp profiledist PS rna Fri
+@c  LocalWords:  nrerror stderr congruential xsubi fp utils dist seq RNAfold CA
+@c  LocalWords:  CGCAGGGAUACCCGCG GCGCCCAUAGGGACGC initialising strlen sizeof
+@c  LocalWords:  printf Nucl Biopolymers Sugimoto Biophys Chem Proc Natl Sci CU
+@c  LocalWords:  Kierzek SantaLucia CABIOS Zhang Konings Bornberg RNAdistance
+@c  LocalWords:  Griesmacher Tarazona Weinberger Phys Monatshefte enthalpies cu
+@c  LocalWords:  Chemie hitchhiker's printindex ci cc Tetraloops tetraloops url
+@c  LocalWords:  GAAA html http www tbi univie ac ivo Param deftypefun var ln
+@c  LocalWords:  deftypevar const fname samp erand kbd hpath lpath lRNA lm Acad
+@c  LocalWords:  Peritz Oligoribonucleotides Lyttle uller Bonhoeffer ifclear lt
+@c  LocalWords:  smallexample texi ifset ifhtml href ViennaRNA mailto gt noLP
+@c  LocalWords:  initially co noindent gmlRNA gml naview Bruccoleri struc AUGC
+@c  LocalWords:  strcmp LocalWords Hofacker Stadler noLonelyPairs GC mutli GAUA
+@c  LocalWords:  CGCU CUU GCA GCCA termAU PostScript Zuker JMB combinatory pre