--- /dev/null
+This is RNAlib.info, produced by makeinfo version 4.13 from RNAlib.texinfo.
+
+\1f
+File: RNAlib.info, Node: Top, Next: Introduction, Prev: (dir), Up: (dir)
+
+ This file documents the Vienna RNA Package Version 1.6
+
+ Copyright (C)1994-2002 Ivo Hofacker and Peter Stadler
+
+* Menu:
+
+* Introduction::
+* Folding Routines:: Functions for Folding RNA Secondary Structures
+* Parsing and Comparing:: Functions to Manipulate Structures
+* Utilities:: Odds and Ends
+* Example:: A Small Example Program
+* References::
+* Function Index::
+* Variable Index::
+
+\1f
+File: RNAlib.info, Node: Introduction, Next: Folding Routines, Prev: Top, Up: Top
+
+1 Introduction
+**************
+
+The core of the Vienna RNA Package is formed by a collection of routines
+for the prediction and comparison of RNA secondary structures. These
+routines can be accessed through stand-alone programs, such as RNAfold,
+RNAdistance etc., which should be sufficient for most users. For those
+who wish to develop their own programs we provide a library which can be
+linked to your own code.
+
+ This document describes the library and will be primarily useful to
+programmers. However, it also contains details about the implementation
+that may be of interest to advanced users. The stand-alone programs are
+described in separate man pages. The latest version of the package
+including source code and html versions of the documentation can be
+found at `http://www.tbi.univie.ac.at/~ivo/RNA/'. This manual
+documents version 1.6.
+
+ Please send comments and bug reports to <ivo@tbi.univie.ac.at>.
+
+\1f
+File: RNAlib.info, Node: Folding Routines, Next: Parsing and Comparing, Prev: Introduction, Up: Top
+
+2 Folding Routines
+******************
+
+* Menu:
+
+* mfe Fold:: Calculating Minimum Free Energy Structures
+* PF Fold:: Calculating Partition Functions and Pair Probabilities
+* Inverse Fold:: Searching for Predefined Structures
+* Suboptimal folding:: Enumerating Suboptimal Structures
+* Cofolding:: Folding of 2 RNA molecules
+* Local Fold:: Predicting local structures of large sequences
+* Alignment Fold:: Predicting Consensus Structures from Alignment
+* Fold Vars:: Global Variables for the Folding Routines
+* Param Files:: Reading Energy Parameters from File
+
+\1f
+File: RNAlib.info, Node: mfe Fold, Next: PF Fold, Prev: Folding Routines, Up: Folding Routines
+
+2.1 Minimum free Energy Folding
+===============================
+
+The library provides a fast dynamic programming minimum free energy
+folding algorithm as described by `Zuker & Stiegler (1981)'.
+Associated functions are:
+
+ -- Function: float fold (char* SEQUENCE, char* STRUCTURE)
+ folds the SEQUENCE and returns the minimum free energy in kcal/mol;
+ the mfe structure in bracket notation (*note notations::) is
+ returned in STRUCTURE. Sufficient space for string of the same
+ length as SEQUENCE must be allocated for STRUCTURE before calling
+ `fold()'. If `fold_constrained' (*note Variables: Fold Vars.) is
+ 1, the STRUCTURE string is interpreted on input as a list of
+ constraints for the folding. The characters " | x < > " mark bases
+ that are paired, unpaired, paired upstream, or downstream,
+ respectively; matching brackets " ( ) " denote base pairs, dots
+ "." are used for unconstrained bases. Constrained folding works by
+ assigning bonus energies to all structures that comply with the
+ constraint.
+
+ -- Function: float energy_of_struct (char* SEQUENCE, char* STRUCTURE)
+ calculates the energy of SEQUENCE on the STRUCTURE
+
+ The `circ_fold()' and `energy_of_circ_struct()' provide analogous
+functions for folding and evaluating cricular RNA molecules.
+
+ -- Function: void initialize_fold (int LENGTH)
+ Obsolete function kept for backward compatibility. Allocates
+ memory for mfe folding sequences not longer than LENGTH, and sets
+ up pairing matrix and energy parameters.
+
+ -- Function: void free_arrays ()
+ frees the memory allocated by `fold()'.
+
+ -- Function: void update_fold_params ()
+ call this to recalculate the pair matrix and energy parameters
+ after a change in folding parameters like `temperature' (*note
+ Variables: Fold Vars.).
+
+ Prototypes for these functions are declared in `fold.h'.
+
+\1f
+File: RNAlib.info, Node: PF Fold, Next: Inverse Fold, Prev: mfe Fold, Up: Folding Routines
+
+2.2 Partition Function Folding
+==============================
+
+Instead of the minimum free energy structure the partition function of
+all possible structures and from that the pairing probability for every
+possible pair can be calculated, using a dynamic programming algorithm
+as described by `McCaskill (1990)'. The following functions are
+provided:
+
+ -- Function: float pf_fold (char* SEQUENCE, char* STRUCTURE)
+ calculates the partition function Z of SEQUENCE and returns the
+ free energy of the ensemble F in kcal/mol, where F=-kT ln(Z). If
+ STRUCTURE is not a NULL pointer on input, it contains on return a
+ string consisting of the letters " . , | { } ( ) " denoting bases
+ that are essentially unpaired, weakly paired, strongly paired
+ without preference, weakly upstream (downstream) paired, or
+ strongly up- (down-)stream paired bases, respectively. If
+ `fold_constrained' (*note Variables: Fold Vars.) is 1, the
+ STRUCTURE string is interpreted on input as a list of constraints
+ for the folding. The character "x" marks bases that must be
+ unpaired, matching brackets " ( ) " denote base pairs, all other
+ characters are ignored. Any pairs conflicting with the constraint
+ will be forbidden. This usually sufficient to ensure the
+ constraints are honored. If `do_backtrack' (*note Variables: Fold
+ Vars.) has been set to 0 base pairing probabilities will not be
+ computed (saving CPU time), otherwise the `pr[iindx[i]-j]' (*note
+ Variables: Fold Vars.) will contain the probability that bases i
+ and j pair.
+
+ -- Function: void init_pf_fold (int LENGTH)
+ allocates memory for folding sequences not longer than LENGTH;
+ sets up pairing matrix and energy parameters. Has to be called
+ before the first call to `pf_fold()'.
+
+ -- Function: void free_pf_arrays (void)
+ frees the memory allocated by `init_pf_fold()'.
+
+ -- Function: void update_pf_params (int LENGTH)
+ Call this function to recalculate the pair matrix and energy
+ parameters after a change in folding parameters like temperature
+ (*note Variables: Fold Vars.).
+
+ -- Function: double mean_bp_dist (int LENGTH)
+ computes the mean base pair distance in the equilibrium ensemble
+ as a measure of the structural diversity. It is given by <d> =
+ \sum_a,b p_a * p_b * d(a,b), where the sum goes over all pairs of
+ possible structure a,b, p_a is the Boltzmann weight of structure
+ a, and d(a,b) is the base pair distance (see `bp_distance()' in
+ *Note Distances::.). The mean base pair distances can be computed
+ efficiently from the pair probabilities p_ij as <d> = \sum_ij p_ij
+ * (1-p_ij). Uses the global `pr' array filled by a previous call
+ to `pf_fold()'.
+
+ -- Function: char *centroid (int LENGTH, double *DIST)
+ Computes the centroid structure, i.e. the structure having the
+ lowest average base pair distance to all structures in the
+ Boltzmann ensemble. This can be computed trivially from the pair
+ probabilities by choosing all base pairs that have probability
+ greater 0.5. The distance of the centroid to the ensemble is
+ returned in DIST.
+
+ Prototypes for these functions are declared in `part_func.h'.
+
+ -- Function: float *MEA (plist *PL, char *STRUCTURE, double GAMMA)
+ Computes a MEA (maximum expected accuracy) structure, such that
+ expected accuracy A(S) = \sum_(i,j) \in S 2 \gamma p_ij + \sum_i
+ \notin S p^u_i is maximised. Higher values of \gamma result in
+ more base pairs of lower probability and thus higher sensitivity.
+ Low values of gamm result in structures containing only highly
+ likely pairs (high specificity). The code of the MEA function
+ also demonstrates the use of sparse dynamic programming scheme to
+ reduce the time and memory complexity of folding.
+ The corresponding protoype is declared in `MEA.h'.
+
+\1f
+File: RNAlib.info, Node: Inverse Fold, Next: Suboptimal folding, Prev: PF Fold, Up: Folding Routines
+
+2.3 Inverse Folding
+===================
+
+We provide two functions that search for sequences with a given
+structure, thereby inverting the folding routines.
+
+ -- Function: float inverse_fold (char* START, char* TARGET)
+ searches for a sequence with minimum free energy structure TARGET,
+ starting with sequence START. It returns 0 if the search was
+ successful, otherwise a structure distance to TARGET is returned.
+ The found sequence is returned in START. If `give_up' is set to 1,
+ the function will return as soon as it is clear that the search
+ will be unsuccessful, this speeds up the algorithm if you are only
+ interested in exact solutions. Since `inverse_fold()' calls
+ `fold()' you have to allocate memory for folding by calling
+ `initialize_fold()'
+
+ -- Function: float inverse_pf_fold (char* START, char* TARGET)
+ searches for a sequence with maximum probability to fold into
+ structure TARGET using the partition function algorithm. It returns
+ -kT log(p) where p is the frequency of TARGET in the ensemble of
+ possible structures. This is usually much slower than
+ `inverse_fold()'. Since `inverse_pf_fold()' calls `pf_fold()' you
+ have to allocate memory for folding by calling `init_pf_fold()'
+
+ -- Variable: char *symbolset
+ The global variable `char *symbolset' points to the allowed bases,
+ initially `"AUGC"'. It can be used to design sequences from
+ reduced alphabets.
+
+ Prototypes for these functions are declared in `inverse.h'.
+
+\1f
+File: RNAlib.info, Node: Suboptimal folding, Next: Cofolding, Prev: Inverse Fold, Up: Folding Routines
+
+2.4 Suboptimal folding
+======================
+
+ -- data type: SOLUTION struct {float energy; char *structure;}
+
+ -- Function: SOLUTION* subopt (char* SEQUENCE, char* CONSTRAINT, int*
+ DELTA, FILE* FP)
+ produces *all* suboptimal secondary structures within DELTA*0.01
+ kcal/mol of the optimum. The results are either directly written
+ to a FP (if FP is not `NULL'), or (FP`==NULL') returned in a
+ `SOLUTION *' list terminated by an entry were the `structure'
+ pointer is `NULL'.
+
+ Prototypes for these functions are declared in `subopt.h'.
+
+\1f
+File: RNAlib.info, Node: Cofolding, Next: Local Fold, Prev: Suboptimal folding, Up: Folding Routines
+
+2.5 Predicting hybridization structures of two molecules
+========================================================
+
+The function of an RNA molecule often depends on its interaction with
+other RNAs. The following routines therefore allow to predict structures
+formed by two RNA molecules upon hybridization. One approach to
+co-folding two RNAs consists of concatenating the two sequences and
+keeping track of the concatenation point in all energy evaluations.
+Correspondingly, many of the `cofold()' and `co_pf_fold()' routines
+below take one sequence string as argument and use the the global
+variable CUT_POINT to mark the concatenation point. Note that while the
+`RNAcofold' program uses the '&' character to mark the chain break in
+its input, you should not use an '&' when using the library routines
+(set CUT_POINT instead). In a second approach to co-folding two RNAs,
+cofolding is seen as a stepwise process. In the first step the
+probability of an unpaired region is calculated and in a second step
+this probability of an unpaired region is multiplied with the
+probability of an interaction between the two RNAs. This approach is
+implemented for the interaction between a long target sequence and a
+short ligand RNA. Function `pf_unstru()' calculates the partition
+function over all unpaired regions in the input sequence. Function
+`pf_interact()', which calculates the partition function over all
+possible interactions between two sequences, needs both sequence as
+separate strings as input.
+
+ -- Variable: int cut_point
+ `cut_point' marks the position (starting from 1) of the first
+ nucleotide of the second molecule within the concatenated sequence.
+ The default value of -1 stands for single molecule folding. The
+ `cut_point' variable is also used by `PS_rna_plot()' and
+ `PS_dot_plot()' to mark the chain break in postscript plots.
+
+ -- Function: float cofold (char *SEQUENCE, char *STRUCTURE)
+ The analog to the `fold()' function. Computes the minimum free
+ energy of two RNA molecules interacting. If `cut_point==-1' results
+ should be the same as with `fold()'.
+
+ -- Function: void free_co_arrays (void)
+ Frees arrays allocated by `cofold()'.
+
+ -- Function: void initialize_cofold (int LENGTH)
+ Allocates memory for mfe cofolding sequences not longer than
+ LENGTH, and sets up pairing matrix and energy parameters.
+ Explicitly calling `initialize_cofold()' is normally not
+ necessary, as it will be called automagically before folding.
+
+ Prototypes for these functions are declared in `cofold.h'.
+
+2.6 Partition Function Cofolding
+================================
+
+As for folding one RNA molecule, this computes the partition function
+of all possible structures and the base pair probabilities. Uses the
+same global PF_SCALE variable to avoid overflows.
+
+ To simplify the implementation the partition function computation is
+done internally in a null model that does not include the duplex
+initiation energy, i.e. the entropic penalty for producing a dimer from
+two monomers). The resulting free energies and pair probabilities are
+initially relative to that null model. In a second step the free
+energies can be corrected to include the dimerization penalty, and the
+pair probabilities can be divided into the conditional pair
+probabilities given that a re dimer is formed or not formed.
+
+ -- data type: cofoldF struct {double F0AB, FAB, FcAB, FA, FB;}
+
+ -- Function: cofoldF co_pf_fold (char* SEQUENCE, char* STRUCTURE)
+ The analog to the `pf_fold()' function, it computes the partition
+ function of two interacting RNA molecules as well as base pair
+ probabilities. It returns a struct including free energies
+ computed for different models. F0AB is the free energy in the null
+ model; FAB is corrected to include the initiation penalty, FcAB
+ only includes real dimers. FA and FB are the free energies of the
+ 2 molecules, resp. As with `cofold()' the CUT_POINT global
+ variable has to be set to mark the chain break between the
+ molecules.
+
+ -- Function: void free_co_pf_arrays (void)
+ frees partition function cofold arrays allocated in `
+ co_pf_fold()'.
+
+ -- Function: void init_co_pf_fold (int LENGTH)
+ Obsolete function kept for backward compatibility. Allocates
+ memory for partition function cofolding sequences not longer than
+ LENGTH, and sets up pairing matrix and energy parameters.
+
+ -- data type: plist struct {int i; int j; float p;}
+
+ -- Function: extern struct plist *get_plist (struct plist *PL, int
+ LENGTH, double CUT_OFF)
+ Makes a list of base pairs out of the global *PR array, ignoring
+ all base pairs with a probability less than CUT_OFF. This is
+ useful since the PR array gets overwritten by subsequent foldings.
+
+ Prototypes for these functions are declared in `co_part_func.h'.
+
+2.7 Cofolding all Dimeres, Concentrations
+=========================================
+
+After computing the partition functions of all possible dimeres one can
+compute the probabilities of base pairs, the concentrations out of
+start concentrations and sofar and soaway.
+
+ -- Function: void compute_probabilities (double FAB, double FEA,
+ double FEB, struct plist *PRAB, struct plist *PRA, struct
+ plist *PRB, int ALENGTH)
+ Given the pair probabilities and free energies (in the null model)
+ for a dimer AB and the two constituent monomers A and B, compute
+ the conditional pair probabilities given that a dimer AB actually
+ forms. Null model pair probabilities are given as a list as
+ produced by `get_plist()'), the dimer probabilities PRAB are
+ modified in place.
+
+ Dimer formation is inherently concentration dependent. Given the free
+energies of the monomers A and B and dimers AB, AA, and BB one can
+compute the equilibrium concentrations, given input concentrations of A
+and B, see e.g.~`Dimitrov & Zuker (2004)'
+
+ -- data type: ConcEnt struct {double A0; double B0;double ABc;double
+ AAc; double BBc; double Ac; double Bc;}
+
+ -- Function: extern struct ConcEnt *get_concentrations(double FEAB,
+ double FEAA, double FEBB, double FEA, double FEB, double *
+ STARTCONC)
+ Takes an array STARTCONC of input concentrations with alternating
+ entries for the initial concentrations of molecules A and B
+ (terminated by two zeroes), then computes the resulting
+ equilibrium concentrations from the free energies for the dimers.
+ Dimer free energies should be the dimer-only free energies, i.e.
+ the FcAB entries from the `cofoldF' struct.
+
+ Prototypes for these functions are declared in `co_part_func.h'.
+
+2.8 Partition Function Cofolding as stepwise process
+====================================================
+
+In this approach to cofolding the interaction between two RNA molecules
+is seen as a stepwise process. In a first step, the target molecule has
+to adopt a structure in which a binding site is accessible. In a second
+step, the ligand molecule will hybridize with a region accessible to an
+interaction. Consequently the algorithm is designed as a two step
+process: The first step is the calculation of the probability that a
+region within the target is unpaired, or equivalently, the calculation
+of the free energy needed to expose a region. In the second step we
+compute the free energy of an interaction for every possible binding
+site.
+
+ -- data type: pu_contrib struct {double **H; double **I; double **M;
+ double **E; int length;}
+
+ -- Function: `pu_contrib' *pf_unstru (char *SEQUENCE, int U)
+ This function calculates the partition function over all unpaired
+ regions in *SEQUENCE of a maximal length U. You have to call
+ function `pf_fold()' for *SEQUENCE befor calling `pf_unstru()'. If
+ you want to calculate unpaired regions for a constrained
+ structure, set variable STRUCTURE in function `pf_fold()' to the
+ constrain string. Function `pf_unstru' returns a `pu_contrib'
+ struct containing four arrays of dimension [i = 1 to length of
+ *SEQUENCE][j = 0 to U-1] containing all possible contributions to
+ the probabilities of unpaired regions of maximum length U. Each
+ array in `pu_contrib' contains one of the contributions to the
+ total probability of being unpaired: The probability of being
+ unpaired within an exterior loop is in array `pu_contrib->E', the
+ probability of being unpaired within a hairpin loop is in array
+ `pu_contrib->H', the probability of being unpaired within an
+ interior loop is in array `pu_contrib->I' and probability of being
+ unpaired within a multi-loop is in array `pu_contrib->M'. The
+ total probability of being unpaired is the sum of the four arrays
+ of `pu_contrib'. Function `pf_unstru' frees everything allocated
+ automatically. To free the output structure call `free_pu_contrib'.
+
+ -- Function: void free_pu_contrib (`pu_contrib' *P_CON)
+ frees the output of function `pf_unstru'.
+
+ -- data type: interact struct {double *Pi; double *I; double Gikjl;
+ double Gikjl_wo; int i, int k, int j, int l, int length;}
+
+ -- Function: interact *pf_interact (const char *S1, const char *S2,
+ `pu_contrib' *P_C, `pu_contrib' *P_C2,int W, char *CSTRUC,
+ int INCR3, int INCR5)
+ Calculates the probability of a local interaction between sequence
+ *S1 and sequence *S2, considering the probability that the region
+ of interaction is unpaired within *S1 and *S2. The longer
+ sequence has to be given as *S1. The shorter sequence has to be
+ given as *S2. Function `pf_unstru()' has to be called for *S1 and
+ *S2, where the probabilities of being unpaired have to be given
+ in *P_C and *P_C2, respectively. If you do not want to include the
+ probabilities of being unpaired for *S2 set *P_C2 to NULL. If
+ variable char *CSTRUC is not NULL, constrained folding is done:
+ The available constrains for intermolecular interaction are: '.'
+ (no constrain), 'x' (the base has no intermolecular interaction)
+ and '|' (the corresponding base has to be paired
+ intermolecularily). The parameter W determines the maximal length
+ of the interaction. The parameters INCR5 and INCR3 allows
+ inclusion of unpaired residues left (INCR5) and right (INCR3) of
+ the region of interaction in *S1. If the INCR options are used,
+ function `pf_unstru()' has to be called with W=W+INCR5+INCR3 for
+ the longer sequence *S1. Function `pf_interact()' returns a
+ structure of type `interact' which contains the probability of the
+ best local interaction including residue i in Pi and the minimum
+ free energy in Gi, where i is the position in sequence *S1. The
+ member Gikjl of structure `interact' is the best interaction
+ between region [k,i] k<i in longer sequence *S1 and region [j,l]
+ j<l in *S2. Gikjl_wo is Gikjl without the probability of beeing
+ unpaired. Use `free_interact)' to free the returned structure, all
+ other stuff is freed inside `pf_interact()'.
+
+ -- Function: void free_interact (`interact' *PIN)
+ frees the output of function `pf_interact'.
+
+ Prototypes for these functions are declared in `part_func_up.h'.
+
+\1f
+File: RNAlib.info, Node: Local Fold, Next: Alignment Fold, Prev: Cofolding, Up: Folding Routines
+
+2.9 Searching for local structures in large molecules
+=====================================================
+
+Local structures can be predicted by a modified version of the `fold()'
+algorithm that restricts the span of all base pairs.
+
+ -- Function: float Lfold (char *STRING, char *STRUCTURE, int MAXDIST)
+ The local analog to `fold()'. Computes the minimum free energy
+ structure including only base pairs with a span smaller than
+
+ MAXDIST.
+
+ -- Function: struct plist *pfl_fold (char *SEQUENCE, int WINSIZE,int
+ PAIRSIZE, float CUTOFFB, double **PU,struct plist **DPP2,
+ FILE *PUFP, FILE *SPUP))
+ pfl_fold computes partition functions for every window of size
+ WINSIZE possible in a RNA molecule, allowing only pairs with a span
+ smaller than PAIRSIZE. It returns the mean pair probabilities
+ averaged over all windows containing the pair in PL. WINSIZE should
+ always be >= PAIRSIZE. Note that in contrast to `Lfold()', bases
+ outside of the window do not influence the structure at all. Only
+ probabilities higher than CUTOFF are kept.
+
+ The remaining arguments are optional in the sense that they can be
+ set to NULL unless the corresponding functionality is wanted. If
+ PU is supplied (i.e is not the NULL pointer), `pfl_fold()' will
+ also compute local _accessibilities_, as the mean probability that
+ regions of length `u' or smaller are unpaired. The parameter
+ `ulength' is supplied in `pU[0][0]'. On return the PU array will
+ contain these probabilities, with the entry on `pU[x][y]'
+ containing the mean probability that x and the y following bases
+ are unpaired. If DPP2 is supplied (i.e is not the NULL pointer),
+ `pfl_fold()' will also compute the probability of stacked pairs,
+ more precisely the conditional probabilities of base pairs (i,j)
+ given that the enclosing pair (i-1,j+1) is formed. The file
+ pointers PUFP and SPUP file pointer is provided, the unpaired
+ probabilities will direcly be printed and not saved, which can be
+ useful when memory is an issue). If the file pointer is
+ provided, the base pair probabilities will direcly be printed and
+ not saved, which can be useful when memory is an issue).
+
+
+ -- Function: void init_pf_foldLP (int LENGTH)
+ Analog to `init_pf_fold()'. LENGTH is the length of the sequence.
+
+ -- Function: void free_pf_arraysLP (void)
+ frees the arrays used by `pfl_fold()'
+
+ -- Function: void update_pf_paramsLP (int LENGTH)
+ Analog to `update_pf_params()'. LENGTH is the length of the
+ sequence.
+
+\1f
+File: RNAlib.info, Node: Alignment Fold, Next: Fold Vars, Prev: Local Fold, Up: Folding Routines
+
+2.10 Predicting Consensus Structures from an Alignment
+======================================================
+
+Consensus structures can be predicted by a modified version of the
+`fold()' algorithm that takes a set of aligned sequences instead of a
+single sequence. The energy function consists of the mean energy
+averaged over the sequences, plus a covariance term that favors pairs
+with consistent and compensatory mutations and penalizes pairs that
+cannot be formed by all structures. For details see `Hofacker (2002)'.
+
+ -- Function: float alifold (char** SEQUENCES, char* STRUCTURE)
+ Predicts the consensus structure for the aligned SEQUENCES and
+ returns the minimum free energy; the mfe structure in bracket
+ notation (*note notations::) is returned in STRUCTURE. Sufficient
+ space must be allocated for STRUCTURE before calling `alifold()'.
+
+ -- Function: void free_alifold_arrays ()
+ frees the memory allocated by `alifold()'.
+
+ -- Function: void update_alifold_params ()
+ call this to recalculate the pair matrix and energy parameters
+ after a change in folding parameters like `temperature' (*note
+ Variables: Fold Vars.).
+
+ -- Function: float alipf_fold (char** SEQUENCES, char* STRUCTURE,
+ pair_info **PI)
+ The partition function version of `alifold()' works in analogy to
+ `pf_fold()'. Pair probabilities and information about sequence
+ covariations are returned via the PI variable as a list of
+ `pair_info' structs. The list is terminated by the first entry
+ with `pi.i = 0'.
+
+ -- data type: pair_info struct {short i,j; float p; float ent; short
+ bp[8]; char comp;}
+ for each base pair `i,j' the structure lists: its probability `p',
+ an entropy-like measure for its well-definedness `ent', and in
+ `bp[]' the frequency of each type of pair. `bp[0]' contains the
+ number of non-compatible sequences, `bp[1]' the number of CG
+ pairs, etc.
+
+ -- Variable: double cv_fact
+ `cv_fact' controls the weight of the covariance term in the energy
+ function. Default is 1.
+
+ -- Variable: double nc_fact
+ controls the magnitude of the penalty for non-compatible sequences
+ in the covariance term. Default is 1.
+
+\1f
+File: RNAlib.info, Node: Fold Vars, Next: Param Files, Prev: Alignment Fold, Up: Folding Routines
+
+2.11 Global Variables for the Folding Routines
+==============================================
+
+The following global variables change the behavior the folding
+algorithms or contain additional information after folding.
+
+ -- Variable: int noGU
+ do not allow GU pairs if equal 1; default is 0.
+
+ -- Variable: int no_closingGU
+ if 1 allow GU pairs only inside stacks, not as closing pairs;
+ default is 0.
+
+ -- Variable: int noLonelyPairs
+ Disallow all pairs which can *only* occur as lonely pairs (i.e. as
+ helix of length 1). This avoids lonely base pairs in the predicted
+ structures in most cases.
+
+ -- Variable: int tetra_loop
+ include special stabilizing energies for some tetra loops; default
+ is 1.
+
+ -- Variable: int energy_set
+ if 1 or 2: fold sequences from an artificial alphabet ABCD...,
+ where A pairs B, C pairs D, etc. using either GC (1) or AU
+ parameters (2); default is 0, you probably don't want to change it.
+
+ -- Variable: float temperature
+ rescale energy parameters to a temperature of `temperature' C.
+ Default is 37C. You have to call the update_..._params() functions
+ after changing this parameter.
+
+ -- Variable: int dangles
+ if set to 0 no stabilizing energies are assigned to bases adjacent
+ to helices in free ends and multiloops (so called dangling ends).
+ Normally (`dangles = 1') dangling end energies are assigned only
+ to unpaired bases and a base cannot participate simultaneously in
+ two dangling ends. In the partition function algorithm `pf_fold()'
+ these checks are neglected. If `dangles' is set to 2, the
+ `fold()' and `energy_of_struct()' function will also follow this
+ convention. This treatment of dangling ends gives more favorable
+ energies to helices directly adjacent to one another, which can be
+ beneficial since such helices often do engage in stabilizing
+ interactions through co-axial stacking.
+ If `dangles = 3' co-axial stacking is explicitly included for
+ adjacent helices in mutli-loops. The option affects only mfe
+ folding and energy evaluation (`fold()' and `energy_of_struct()'),
+ as well as suboptimal folding (`subopt()') via re-evaluation of
+ energies. Co-axial stacking with one intervening mismatch is not
+ considered so far.
+
+ Default is 1, `pf_fold()' treats 1 as 2.
+
+ -- Variable: char* nonstandards
+ Lists additional base pairs that will be allowed to form in
+ addition to GC, CG, AU, UA, GU and UG. Nonstandard base pairs are
+ given a stacking energy of 0.
+
+ -- Variable: int cut_point
+ To evaluate the energy of a duplex structure (a structure formed
+ by two strands), concatenate the to sequences and set CUT_POINT to
+ the first base of the second strand in the concatenated sequence.
+ Reset to 0 or -1 (the default) to evaluate normal single sequence
+ structures.
+
+ -- Variable: struct bond { int i,j;} base_pair
+ Contains a list of base pairs after a call to `fold()'.
+ `base_pair[0].i' contains the total number of pairs.
+
+ -- Variable: double* pr
+ contains the base pair probability matrix after a call to
+ `pf_fold()'.
+
+ -- Variable: int* iindx
+ index array to move through pr. The probability for base i and j
+ to form a pair is in `pr[iindx[i]-j]'.
+
+ -- Variable: float pf_scale
+ a scaling factor used by `pf_fold()' to avoid overflows. Should be
+ set to approximately exp((-F/kT)/length), where F is an estimate
+ for the ensemble free energy, for example the minimum free energy.
+ You must call `update_pf_params()' or `init_pf_fold()' after
+ changing this parameter. If pf_scale is -1 (the default) , an
+ estimate will be provided automatically when calling
+ `init_pf_fold()' or `update_f_params()'. The automatic estimate is
+ usually insufficient for sequences more than a few hundred bases
+ long.
+
+ -- Variable: int fold_constrained
+ If 1, calculate constrained minimum free energy structures. *Note
+ mfe Fold::, for more information. Default is 0;
+
+ -- Variable: int do_backtrack
+ if 0, do not calculate pair probabilities in `pf_fold()'; this is
+ about twice as fast. Default is 1.
+
+ -- Variable: char backtrack_type
+ only for use by `inverse_fold()'; 'C': force (1,N) to be paired,
+ 'M' fold as if the sequence were inside a multi-loop. Otherwise the
+ usual mfe structure is computed.
+
+ include `fold_vars.h' if you want to change any of these variables
+from their defaults.
+
+\1f
+File: RNAlib.info, Node: Param Files, Prev: Fold Vars, Up: Folding Routines
+
+2.12 Energy Parameter Files
+===========================
+
+A default set of parameters, identical to the one described in `Mathews
+et.al. (1999)', is compiled into the library. Alternately, parameters
+can be read from and written to a file.
+
+ -- Function: void read_parameter_file (const char FNAME[])
+ reads energy parameters from file FNAME. See below for the format
+ of the parameter file.
+
+ -- Function: void write_parameter_file (const char FNAME[])
+ writes current energy parameters to the file FNAME.
+
+ The following describes the file format expected by
+`read_parameter_file()'. All energies should be given as integers in
+units of 0.01kcal/mol.
+
+ Various loop parameters depend in general on the pairs closing the
+loops, as well as unpaired bases in the loops. Internally, the library
+distinguishes 8 types of pairs (CG=1, GC=2, GU=3, UG=4, AU=5, UA=6,
+nonstandard=7, 0= no pair), and 5 types of bases (A=1, C=2, G=3, U=4
+and 0 for anything else). Parameters belonging to pairs of type 0 are
+not listed in the parameter files, but values for nonstandard pairs
+(type 7) and nonstandard bases (type 0) are. Thus, a table for
+symmetric size 2 interior loops would have 7*7*5*5 entries (2 pairs,
+two unpaired bases).
+
+ The order of entries always uses the closing pair or pairs as the
+first indices followed by the unpaired bases in 5' to 3' direction. To
+determine the type of a pair consider the base at the (local) 5' end of
+each strand first, i.e. use the pairs (i,j) and (q,p) for an interior
+loop with i<p<q<j. This is probably better explained by an example.
+Consider the symmetric size 4 interior loop
+ 5'-GAUA-3'
+ 3'-CGCU-5'
+ the first pair is GC, the second UA (not AU!) the unpaired bases are
+(in 5' to 3' direction, starting at the first pair) A U C G. Thus we
+need entry [2,6,1,4,2,3] of the corresponding table. Because the loop is
+symmetric you could equally well describe it by UA GC C G A U, i.e.
+entry [6,2,2,3,1,4]. Be careful to preserve this symmetry when editing
+parameter tables!
+
+ The first line of the file should read
+## RNAfold parameter file
+
+lines of the form
+# token
+mark the beginning of a list of energy parameters of the type specified
+by token. The following tokens are recognized:
+
+# stack_energies
+The list of free energies for stacked pairs, indexed by the two closing
+pairs. The list should be formatted as symmetric an 7*7
+matrix,conforming to the order explained above. As an example the
+stacked pair
+ 5'-GU-3' 5'-AC-3'
+ 3'-CA-5' 3'-UG-5'
+ corresponds to the entry [2,5], which should be identical to [5,2].
+Note that the format has changed from previous releases, to make it
+consistent with other loop parameters.
+
+# stack_enthalpies
+enthalpies for stacked pairs, used to rescale stacking energies to
+temperatures other than 37C. Same format as stack_energies.
+
+# hairpin
+Free energies of hairpin loops as a function of size. The list should
+contain 31 entries on one or more lines. Since the minimum size of a
+hairpin loop is 3 and we start counting with 0, the first three values
+should be INF to indicate a forbidden value.
+
+# bulge
+Free energies of bulge loops. Should contain 31 entries, the first one
+being INF.
+
+# internal_loop
+Free energies of internal loops. Should contain 31 entries, the first 4
+being INF (since smaller loops are tabulated).
+
+# mismatch_interior
+Free energies for the interaction between the closing pair of an
+interior loop and the two unpaired bases adjacent to the helix. This is
+a three dimensional array indexed by the type of the closing pair (see
+above) and the two unpaired bases. Since we distinguish 5 bases the
+list contains 7*5*5 entries and should be formatted either as an 7*25
+matrix or 7 5*5 matrices. The order is such that for example the
+mismatch
+ 5'-CU-3'
+ 3'-GC-5'
+ corresponds to entry [1,4,2] (CG=1, U=4, C=2), (in this notation the
+first index runs from 1 to 7, second and third from 0 to 4)
+
+# mismatch_hairpin
+Same as above for hairpin loops.
+
+# mismatch_enthalpies
+Corresponding enthalpies for rescaling to temperatures other than 37C.
+
+# int11_energies
+Free energies for symmetric size 2 interior loops. 7*7*5*5 entries
+formatted as 49 5*5 matrices, or seven 7*25 matrices. Example:
+ 5'-CUU-3'
+ 3'-GCA-5'
+ corresponds to entry [1,5,4,2], which should be identical to
+[5,1,2,4].
+
+# int21_energies
+Free energies for size 3 (2+1) interior loops. 7*7*5*5*5 entries
+formatted in 5*5 or 5*25 matrices. The strand with a single unpaired
+base is listed first, example:
+ 5'-CU U-3'
+ 3'-GCCA-5'
+ corresponds to entry [1,5,4,2,2].
+
+# int22_energies
+Free energies for symmetric size 4 interior loops. To reduce the size of
+parameter files this table only lists canonical bases (A,C,G,U)
+resulting in a 7*7*4*4*4*4 table. See above for an example.
+
+# dangle5
+Energies for the interaction of an unpaired base on the 5' side and
+adjacent to a helix in multiloops and free ends (the equivalent of
+mismatch energies in interior and hairpin loops). The array is indexed
+by the type of pair closing the helix and the unpaired base and,
+therefore, forms a 8*5 matrix. For example the dangling base in
+ 5'-GC-3'
+ 3'- G-5'
+ corresponds to entry [1,3] (CG=1, G=3);
+
+# dangle3
+Same as above for bases on the 3' side of a helix.
+ 5'- A-3'
+ 3'-AU-5'
+ corresponds to entry [5,1] (AU=5, A=1);
+
+# ML_params
+For the energy of a multi-loop a function of the form `E =
+cu*n_unpaired + ci*loop_degree + cc' is used where n_unpaired is the
+number of unpaired bases in the loop and loop_degree is the number of
+helices forming the loop. In addition a "terminal AU" penalty is
+applied to AU and GU pairs in the loop. The line following the token
+should contain these four values, in the order `cu cc ci termAU'. The
+terminal AU penalty is also used for the exterior loop and size 3
+hairpins, for other loop types it is already included in the mismatch
+energies.
+
+# Tetraloops
+Some tetraloops particularly stable tetraloops are assigned an energy
+bonus. Up to forty tetraloops and their bonus energies can be listed
+following the token, one sequence per line. For example:
+ GAAA -200
+ assigns a bonus energy of -2 kcal/mol to tetraloops containing the
+sequence GAAA.
+
+# END
+Anything beyond this token will be ignored.
+
+ A parameter file need not be complete, it might may contain only a
+subset of interaction parameters, such as only stacking energies.
+However, for each type of interaction listed, all entries have to be
+present. A `*' may be used to indicate entries of a list that are to
+retain their default value. For loop energies a `x' may be used to
+indicate that the value is to be extrapolated from the values for
+smaller loop sizes. Parameter files may contain C-style comments, i.e.
+any text between `/*' and `*/' will be ignored. However, you may have
+no more than one comment per line and no multi-line comments.
+
+ A parameter file listing the default parameter set should accompany
+your distribution as `default.par', the file `old.par' contains
+parameters used in version 1.1b of the Package.
+
+\1f
+File: RNAlib.info, Node: Parsing and Comparing, Next: Utilities, Prev: Folding Routines, Up: Top
+
+3 Parsing and Comparing of Structures
+*************************************
+
+* Menu:
+
+* notations:: Representations of Secondary Structures.
+* Parsing:: Functions for Parsing and Coarse Graining.
+* Distances:: Distances for RNA Secondary Structures
+
+\1f
+File: RNAlib.info, Node: notations, Next: Parsing, Prev: Parsing and Comparing, Up: Parsing and Comparing
+
+3.1 Representations of Secondary Structures
+===========================================
+
+The standard representation of a secondary structure is the "bracket
+notation", where matching brackets symbolize base pairs and unpaired
+bases are shown as dots. Alternatively, one may use two types of node
+labels, 'P' for paired and 'U' for unpaired; a dot is then replaced by
+'(U)', and each closed bracket is assigned an additional identifier 'P'.
+We call this the expanded notation. In `Fontana et al. (1993)' a
+condensed representation of the secondary structure is proposed, the
+so-called homeomorphically irreducible tree (HIT) representation. Here
+a stack is represented as a single pair of matching brackets labeled
+'P' and weighted by the number of base pairs. Correspondingly, a
+contiguous strain of unpaired bases is shown as one pair of matching
+brackets labeled 'U' and weighted by its length. Generally any string
+consisting of matching brackets and identifiers is equivalent to a
+plane tree with as many different types of nodes as there are
+identifiers.
+
+ `Bruce Shapiro (1988)' proposed a coarse grained representation,
+which, does not retain the full information of the secondary structure.
+He represents the different structure elements by single matching
+brackets and labels them as 'H' (hairpin loop), 'I' (interior loop), 'B'
+(bulge), 'M' (multi-loop), and 'S' (stack). We extend his alphabet by an
+extra letter for external elements 'E'. Again these identifiers may be
+followed by a weight corresponding to the number of unpaired bases or
+base pairs in the structure element. All tree representations (except
+for the dot-bracket form) can be encapsulated into a virtual root
+(labeled 'R'), see the example below.
+
+ The following example illustrates the different linear tree
+representations used by the package. All lines show the same secondary
+structure.
+
+ a) .((((..(((...)))..((..)))).)).
+ (U)(((((U)(U)((((U)(U)(U)P)P)P)(U)(U)(((U)(U)P)P)P)P)(U)P)P)(U)
+ b) (U)(((U2)((U3)P3)(U2)((U2)P2)P2)(U)P2)(U)
+ c) (((H)(H)M)B)
+ ((((((H)S)((H)S)M)S)B)S)
+ (((((((H)S)((H)S)M)S)B)S)E)
+ d) ((((((((H3)S3)((H2)S2)M4)S2)B1)S2)E2)R)
+
+ Above: Tree representations of secondary structures. a) Full
+structure: the first line shows the more convenient condensed notation
+which is used by our programs; the second line shows the rather clumsy
+expanded notation for completeness, b) HIT structure, c) different
+versions of coarse grained structures: the second line is exactly
+Shapiro's representation, the first line is obtained by neglecting the
+stems. Since each loop is closed by a unique stem, these two lines are
+equivalent. The third line is an extension taking into account also the
+external digits. d) weighted coarse structure, this time including the
+virtual root.
+
+ For the output of aligned structures from string editing, different
+representations are needed, where we put the label on both sides. The
+above examples for tree representations would then look like:
+
+ a) (UU)(P(P(P(P(UU)(UU)(P(P(P(UU)(UU)(UU)P)P)P)(UU)(UU)(P(P(UU)(U...
+ b) (UU)(P2(P2(U2U2)(P2(U3U3)P3)(U2U2)(P2(U2U2)P2)P2)(UU)P2)(UU)
+ c) (B(M(HH)(HH)M)B)
+ (S(B(S(M(S(HH)S)(S(HH)S)M)S)B)S)
+ (E(S(B(S(M(S(HH)S)(S(HH)S)M)S)B)S)E)
+ d) (R(E2(S2(B1(S2(M4(S3(H3)S3)((H2)S2)M4)S2)B1)S2)E2)R)
+
+ Aligned structures additionally contain the gap character '_'.
+
+\1f
+File: RNAlib.info, Node: Parsing, Next: Distances, Prev: notations, Up: Parsing and Comparing
+
+3.2 Parsing and Coarse Graining of Structures
+=============================================
+
+Several functions are provided for parsing structures and converting to
+different representations.
+
+ -- Function: char* expand_Full (char* FULL)
+ converts the FULL structure from bracket notation to the expanded
+ notation including root.
+
+ -- Function: char* b2HIT (char* FULL)
+ converts the FULL structure from bracket notation to the HIT
+ notation including root.
+
+ -- Function: char* b2C (char* FULL)
+ converts the FULL structure from bracket notation to the a coarse
+ grained notation using the 'H' 'B' 'I' 'M' and 'R' identifiers.
+
+ -- Function: char* b2Shapiro (char* FULL)
+ converts the FULL structure from bracket notation to the
+ _weighted_ coarse grained notation using the 'H' 'B' 'I' 'M' 'S'
+ 'E' and 'R' identifiers.
+
+ -- Function: char* expand_Shapiro (char* COARSE)
+ inserts missing 'S' identifiers in unweighted coarse grained
+ structures as obtained from `b2C()'.
+
+ -- Function: char* add_root (char* ANY)
+ adds a root to an un-rooted tree in any except bracket notation.
+
+ -- Function: char* unexpand_Full (char* EXPANDED)
+ restores the bracket notation from an expanded full or HIT tree,
+ that is any tree using only identifiers 'U' 'P' and 'R'.
+
+ -- Function: char* unweight (char* EXPANDED)
+ strip weights from any weighted tree.
+
+ All the above functions allocate memory for the strings they return.
+
+ -- Function: void unexpand_aligned_F (char* ALIGN[2])
+ converts two aligned structures in expanded notation as produced by
+ `tree_edit_distance()' function back to bracket notation with '_'
+ as the gap character. The result overwrites the input.
+
+ -- Function: void parse_structure (char* FULL)
+ Collects a statistic of structure elements of the FULL structure in
+ bracket notation, writing to the following global variables:
+
+ -- Variable: int loop_size[]
+ contains a list of all loop sizes. `loop_size[0]' contains the
+ number of external bases.
+
+ -- Variable: int loop_degree[]
+ contains the corresponding list of loop degrees.
+
+ -- Variable: int helix_size[]
+ contains a list of all stack sizes.
+
+ -- Variable: int loops
+ contains the number of loops ( and therefore of stacks ).
+
+ -- Variable: int pairs
+ contains the number of base pairs in the last parsed structure.
+
+ -- Variable: int unpaired
+ contains the number of unpaired bases.
+
+ Prototypes for the above functions can be found in `RNAstruct.h'.
+
+\1f
+File: RNAlib.info, Node: Distances, Prev: Parsing, Up: Parsing and Comparing
+
+3.3 Distance Measures
+=====================
+
+* Menu:
+
+* Tree Distances:: Using Tree Edit Distances to Compare Structures
+* String Distances:: Using String Alignment for Comparison
+* Profile Distances:: Comparing Pair probability Matrices
+
+ A simple measure of dissimilarity between secondary structures of
+equal length is the base pair distance, given by the number of pairs
+present in only one of the two structures being compared. I.e. the
+number of base pairs that have to be opened or closed to transform one
+structure into the other. It is therefore particularly useful for
+comparing structures on the same sequence. It is implemented by
+
+ -- Function: int bp_distance (char* S1, char* S2)
+ returns the "base pair" distance between two secondary structures
+ S1 and S2, which should have the same length.
+
+ For other cases a distance measure that allows for gaps is
+preferable. We can define distances between structures as edit
+distances between trees or their string representations. In the case of
+string distances this is the same as "sequence alignment". Given a set
+of edit operations and edit costs, the edit distance is given by the
+minimum sum of the costs along an edit path converting one object into
+the other. Edit distances like these always define a metric. The edit
+operations used by us are insertion, deletion and replacement of nodes.
+String editing does not pay attention to the matching of brackets, while
+in tree editing matching brackets represent a single node of the tree.
+Tree editing is therefore usually preferable, although somewhat slower.
+String edit distances are always smaller or equal to tree edit
+distances.
+
+ The different level of detail in the structure representations
+defined above naturally leads to different measures of distance. For
+full structures we use a cost of 1 for deletion or insertion of an
+unpaired base and 2 for a base pair. Replacing an unpaired base for a
+pair incurs a cost of 1.
+
+ Two cost matrices are provided for coarse grained structures:
+
+ /* Null, H, B, I, M, S, E */
+ { 0, 2, 2, 2, 2, 1, 1}, /* Null replaced */
+ { 2, 0, 2, 2, 2, INF, INF}, /* H replaced */
+ { 2, 2, 0, 1, 2, INF, INF}, /* B replaced */
+ { 2, 2, 1, 0, 2, INF, INF}, /* I replaced */
+ { 2, 2, 2, 2, 0, INF, INF}, /* M replaced */
+ { 1, INF, INF, INF, INF, 0, INF}, /* S replaced */
+ { 1, INF, INF, INF, INF, INF, 0}, /* E replaced */
+
+
+ /* Null, H, B, I, M, S, E */
+ { 0, 100, 5, 5, 75, 5, 5}, /* Null replaced */
+ { 100, 0, 8, 8, 8, INF, INF}, /* H replaced */
+ { 5, 8, 0, 3, 8, INF, INF}, /* B replaced */
+ { 5, 8, 3, 0, 8, INF, INF}, /* I replaced */
+ { 75, 8, 8, 8, 0, INF, INF}, /* M replaced */
+ { 5, INF, INF, INF, INF, 0, INF}, /* S replaced */
+ { 5, INF, INF, INF, INF, INF, 0}, /* E replaced */
+
+ The lower matrix uses the costs given in `Shapiro (1990)'. All
+distance functions use the following global variables:
+
+ -- Variable: int cost_matrix
+ if 0, use the default cost matrix (upper matrix in example);
+ otherwise use Shapiro's costs (lower matrix).
+
+ -- Variable: int edit_backtrack
+ produce an alignment of the two structures being compared by
+ tracing the editing path giving the minimum distance.
+
+ -- Variable: char* aligned_line[2]
+ contains the two aligned structures after a call to one of the
+ distance functions with `edit_backtrack' set to 1. *Note
+ notations::, for details on the representation of structures.
+
+\1f
+File: RNAlib.info, Node: Tree Distances, Next: String Distances, Prev: Distances, Up: Distances
+
+3.3.1 Functions for Tree Edit Distances
+---------------------------------------
+
+ -- Function: Tree* make_tree (char* XSTRUC)
+ constructs a `Tree' ( essentially the postorder list ) of the
+ structure XSTRUC, for use in `tree_edit_distance()'. XSTRUC may
+ be any rooted structure representation.
+
+ -- Function: float tree_edit_distance (Tree* T1, Tree* T2)
+ calculates the edit distance of the two trees T1 and T2.
+
+ -- Function: void free_tree (Tree* T)
+ frees the memory allocated for T.
+
+ Prototypes for the above functions can be found in `treedist.h'. The
+type `Tree' is defined in `dist_vars.h', which is automatically
+included with `treedist.h'
+
+\1f
+File: RNAlib.info, Node: String Distances, Next: Profile Distances, Prev: Tree Distances, Up: Distances
+
+3.3.2 Functions for String Alignment
+------------------------------------
+
+ -- Function: swString* Make_swString (char* XSTRUC)
+ converts the structure XSTRUC into a format suitable for
+ `string_edit_distance()'.
+
+ -- Function: float string_edit_distance (swString* T1, swString* T2)
+ calculates the string edit distance of T1 and T2.
+
+ Prototypes for the above functions can be found in `stringdist.h'.
+
+\1f
+File: RNAlib.info, Node: Profile Distances, Prev: String Distances, Up: Distances
+
+3.3.3 Functions for Comparison of Base Pair Probabilities
+---------------------------------------------------------
+
+For comparison of base pair probability matrices, the matrices are first
+condensed into probability profiles which are the compared by alignment.
+
+ -- Function: float* Make_bp_profile (int LENGTH)
+ reads the base pair probability matrix `pr' (*note Variables: Fold
+ Vars.) and calculates a profile, i.e. a vector containing for each
+ base the probabilities of being unpaired, upstream, or downstream
+ paired, respectively. The returned array is suitable for
+ `profile_edit_distance'.
+
+ -- Function: float profile_edit_distance (float* T1, float* T2)
+ calculates an alignment distance of the two profiles T1 and T2.
+
+ -- Function: void free_profile (float* T)
+ frees the memory allocated for the profile T. Backward
+ compatibility only. You can just use plain `free()'
+
+ Prototypes for the above functions can be found in `profiledist.h'.
+
+\1f
+File: RNAlib.info, Node: Utilities, Next: Example, Prev: Parsing and Comparing, Up: Top
+
+4 Utilities
+***********
+
+The following utilities are used and therefore provided by the library:
+
+ -- Function: int PS_dot_plot (char* SEQUENCE, char* FILENAME)
+ reads base pair probabilities produced by `pf_fold()' from the
+ global array `pr' and the pair list `base_pair' produced by
+ `fold()' and produces a postscript "dot plot" that is written to
+ FILENAME. The "dot plot" represents each base pairing probability
+ by a square of corresponding area in a upper triangle matrix. The
+ lower part of the matrix contains the minimum free energy
+ structure.
+
+ -- Function: int PS_rna_plot (char* SEQUENCE, char* STRUCTURE, char*
+ FILENAME)
+ produces a secondary structure graph in PostScript and writes it to
+ FILENAME. Note that this function has changed from previous
+ versions and now expects the structure to be plotted in
+ dot-bracket notation as an argument. It does not make use of the
+ global `base_pair' array anymore.
+
+ -- Function: int PS_rna_plot_a (char* SEQUENCE, char* STRUCTURE, char*
+ FILENAME, char* PRE, char* POST)
+ Same as `PS_rna_plot' but adds extra PostScript macros for various
+ annotations (see generated PS code). The PRE and POST variables
+ contain PostScript code that is verbatim copied in the resulting
+ PS file just before and after the structure plot.
+
+ -- Function: int svg_RNA_plot (char* SEQUENCE, char* STRUCTURE, char*
+ FILENAME)
+ produces a secondary structure plot in SVG format and writes it to
+ FILENAME.
+
+ -- Function: int gmlRNA (char* SEQUENCE, char* STRUCTURE, char*
+ FILENAME, char OPTION)
+ produces a secondary structure graph in the Graph Meta Language
+ gml and writes it to FILENAME. If `option' is an uppercase letter
+ the `sequence' is used to label nodes, if `option' equals `'X'' or
+ `'x'' the resulting file will coordinates for an initial layout of
+ the graph.
+
+ -- Variable: int rna_plot_type
+ switches between different layout algorithms for drawing secondary
+ structures in `PS_rna_plot' and `gmlRNA'. Current possibility are
+ 0 for a simple radial drawing or 1 for the modified radial drawing
+ taken from the `naview' program of `Bruccoleri & Heinrich (1988)'.
+
+ -- Function: char* random_string (int L, char* SYMBOLS)
+ generates a "random" string of characters from SYMBOLS with length
+ L.
+
+ -- Function: int hamming (char* S1, char* S2)
+ returns the number of positions in which S1 and S2 differ, the so
+ called "Hamming" distance. S1 and S2 should have the same length.
+
+ -- Function: unsigned char* pack_structure (char* STRUC)
+ returns a binary string encoding the secondary structure STRUC
+ using a 5:1 compression scheme. The string is NULL terminated and
+ can therefore be used with standard string functions such as
+ strcmp(). Useful for programs that need to keep many structures
+ in memory.
+
+ -- Function: char* unpack_structure (unsigned char* PACKED)
+ translate a compressed binary string produced by pack_structure()
+ back into the familiar dot bracket notation.
+
+ -- Function: short* make_pair_table (char* STRUCTURE)
+ returns a newly allocated table, such that: table[i]=j if (i.j)
+ pair or 0 if i is unpaired, table[0] contains the length of the
+ STRUCTURE.
+
+ -- Function: char* time_stamp (void)
+ returns a string containing the current date in the format "Fri
+ Mar 19 21:10:57 1993".
+
+ -- Function: void nrerror (char* MESSAGE)
+ writes MESSAGE to stderr and aborts the program.
+
+ -- Function: double urn ()
+ returns a pseudo random number in the range [0..1[, usually
+ implemented by calling `erand48()'.
+
+ -- Variable: unsigned short xsubi[3]
+ is used by `urn ()'. These should be set to some random number
+ seeds before the first call to `urn ()'.
+
+ -- Function: int int_urn (int FROM, int TO)
+ generates a pseudo random integer in the range [FROM, TO].
+
+ -- Function: void* space (unsigned int SIZE)
+ returns a pointer to SIZE bytes of allocated and 0 initialized
+ memory; aborts with an error if memory is not available.
+
+ -- Function: char* get_line (FILE* FP)
+ reads a line of arbitrary length from the stream *FP, and returns
+ a pointer to the resulting string. The necessary memory is
+ allocated and should be released using `free()' when the string is
+ no longer needed.
+
+ Prototypes for `PS_rna_plot()' and `PS_dot_plot()' reside in
+`PS_dot.h', the other functions are declared in `utils.h'.
+
+\1f
+File: RNAlib.info, Node: Example, Next: References, Prev: Utilities, Up: Top
+
+5 A Small Example Program
+*************************
+
+The following program exercises most commonly used functions of the
+library. The program folds two sequences using both the mfe and
+partition function algorithms and calculates the tree edit and profile
+distance of the resulting structures and base pairing probabilities.
+
+ #include <stdio.h>
+ #include <math.h>
+ #include "utils.h"
+ #include "fold_vars.h"
+ #include "fold.h"
+ #include "part_func.h"
+ #include "inverse.h"
+ #include "RNAstruct.h"
+ #include "treedist.h"
+ #include "stringdist.h"
+ #include "profiledist.h"
+
+ void main()
+ {
+ char *seq1="CGCAGGGAUACCCGCG", *seq2="GCGCCCAUAGGGACGC",
+ *struct1,* struct2,* xstruc;
+ float e1, e2, tree_dist, string_dist, profile_dist, kT;
+ Tree *T1, *T2;
+ swString *S1, *S2;
+ float **pf1, **pf2;
+
+ /* fold at 30C instead of the default 37C */
+ temperature = 30.; /* must be set *before* initializing */
+ /* allocate memory for fold(), could be skipped */
+ initialize_fold(strlen(seq1));
+
+ /* allocate memory for structure and fold */
+ struct1 = (char* ) space(sizeof(char)*(strlen(seq1)+1));
+ e1 = fold(seq1, struct1);
+
+ struct2 = (char* ) space(sizeof(char)*(strlen(seq2)+1));
+ e2 = fold(seq2, struct2);
+
+ free_arrays(); /* free arrays used in fold() */
+
+ /* produce tree and string representations for comparison */
+ xstruc = expand_Full(struct1);
+ T1 = make_tree(xstruc);
+ S1 = Make_swString(xstruc);
+ free(xstruc);
+
+ xstruc = expand_Full(struct2);
+ T2 = make_tree(xstruc);
+ S2 = Make_swString(xstruc);
+ free(xstruc);
+
+ /* calculate tree edit distance and aligned structures with gaps */
+ edit_backtrack = 1;
+ tree_dist = tree_edit_distance(T1, T2);
+ free_tree(T1); free_tree(T2);
+ unexpand_aligned_F(aligned_line);
+ printf("%s\n%s %3.2f\n", aligned_line[0], aligned_line[1], tree_dist);
+
+ /* same thing using string edit (alignment) distance */
+ string_dist = string_edit_distance(S1, S2);
+ free(S1); free(S2);
+ printf("%s mfe=%5.2f\n%s mfe=%5.2f dist=%3.2f\n",
+ aligned_line[0], e1, aligned_line[1], e2, string_dist);
+
+ /* for longer sequences one should also set a scaling factor for
+ partition function folding, e.g: */
+ kT = (temperature+273.15)*1.98717/1000.; /* kT in kcal/mol */
+ pf_scale = exp(-e1/kT/strlen(seq1));
+ init_pf_fold(strlen(seq1));
+
+ /* calculate partition function and base pair probabilities */
+ e1 = pf_fold(seq1, struct1);
+ pf1 = Make_bp_profile(strlen(seq1));
+
+ e2 = pf_fold(seq2, struct2);
+ pf2 = Make_bp_profile(strlen(seq2));
+
+ free_pf_arrays(); /* free space allocated for pf_fold() */
+
+ profile_dist = profile_edit_distance(pf1, pf2);
+ printf("%s free energy=%5.2f\n%s free energy=%5.2f dist=%3.2f\n",
+ aligned_line[0], e1, aligned_line[1], e2, profile_dist);
+
+ free_profile(pf1); free_profile(pf2);
+ }
+
+ In a typical Unix environment you would compile this program using:
+`cc -c example.c -IHPATH' and link using `cc -o example example.o
+-LLPATH -lRNA -lm' where HPATH and LPATH point to the location of the
+header files and library, respectively.
+
+\1f
+File: RNAlib.info, Node: References, Next: Function Index, Prev: Example, Up: Top
+
+6 References
+************
+
+ - D.H. Mathews, J. Sabina, M. Zuker and H. Turner (1999)
+ Expanded sequence dependence of thermodynamic parameters provides
+ robust prediction of RNA secondary structure, JMB, 288: 911-940
+
+ - M. Zuker and P. Stiegler (1981)
+ Optimal computer folding of large RNA sequences using
+ thermodynamic and auxiliary information, Nucl Acid Res 9: 133-148
+
+ - D.A. Dimitrov, M.Zuker(2004)
+ Prediction of hybridization and melting for double stranded nucleic
+ acids, Biophysical J. 87: 215-226,
+
+ - J.S. McCaskill (1990)
+ The equilibrium partition function and base pair binding
+ probabilities for RNA secondary structures, Biopolymers 29:
+ 1105-1119
+
+ - D.H. Turner, N. Sugimoto and S.M. Freier (1988)
+ RNA structure prediction, Ann Rev Biophys Biophys Chem 17: 167-192
+
+ - J.A. Jaeger, D.H. Turner and M. Zuker (1989)
+ Improved predictions of secondary structures for RNA, Proc.
+ Natl. Acad. Sci. 86: 7706-7710
+
+ - L. He, R. Kierzek, J. SantaLucia, A.E. Walter and D.H. Turner
+ (1991)
+ Nearest-Neighbor Parameters For GU Mismatches, Biochemistry 30:
+ 11124-11132
+
+ - A.E. Peritz, R. Kierzek, N, Sugimoto, D.H. Turner (1991)
+ Thermodynamic Study of Internal Loops in Oligoribonucleotides ... ,
+ Biochemistry 30: 6428-6435
+
+ - A. Walter, D. Turner, J. Kim, M. Lyttle, P. Mu"ller, D. Mathews
+ and M. Zuker (1994)
+ Coaxial stacking of helices enhances binding of
+ Oligoribonucleotides.., Proc. Natl. Acad. Sci. 91: 9218-9222
+
+ - B.A. Shapiro, (1988)
+ An algorithm for comparing multiple RNA secondary structures,
+ CABIOS 4, 381-393
+
+ - B.A. Shapiro and K. Zhang (1990)
+ Comparing multiple RNA secondary structures using tree comparison,
+ CABIOS 6, 309-318
+
+ - R. Bruccoleri and G. Heinrich (1988)
+ An improved algorithm for nucleic acid secondary structure display,
+ CABIOS 4, 167-173
+
+ - W. Fontana , D.A.M. Konings, P.F. Stadler, P. Schuster (1993)
+ Statistics of RNA secondary structures, Biopolymers 33, 1389-1404
+
+ - W. Fontana, P.F. Stadler, E.G. Bornberg-Bauer, T. Griesmacher, I.L.
+ Hofacker, M. Tacker, P. Tarazona, E.D. Weinberger, P. Schuster
+ (1993)
+ RNA folding and combinatory landscapes, Phys. Rev. E 47: 2083-2099
+
+ - I.L. Hofacker, W. Fontana, P.F. Stadler, S. Bonhoeffer, M. Tacker,
+ P. Schuster (1994) Fast Folding and Comparison of RNA
+ Secondary Structures. Monatshefte f. Chemie 125: 167-188
+
+ - I.L. Hofacker (1994) The Rules of the Evolutionary Game for RNA:
+ A Statistical Characterization of the Sequence to Structure
+ Mapping in RNA. PhD Thesis, University of Vienna.
+
+ - I.L. Hofacker, M. Fekete, P.F. Stadler (2002). Secondary
+ Structure Prediction for Aligned RNA Sequences. J. Mol. Biol.
+ 319:1059-1066
+
+ - S.H. Bernhart, I.L. Hofacker, S. Will, A. Gruber, P.F. Stadler
+ (2008).
+ RNAalifold: improved consensus structure prediction for RNA
+ alignments. BMC Bioinformatics, 9:474, 2008.
+
+ - I.L. Hofacker and Peter F. Stadler (2006).
+ Memory efficient folding algorithms for circular RNA secondary
+ structures. Bioinformatics, 22:1172–1176.
+
+ - S.H. Bernhart, H. Tafer, U. Mückstein, Ch. Flamm, P.F. Stadler,
+ I.L. Hofacker (2006). Partition function and base pairing
+ probabilities of RNA heterodimers. Algorithms Mol. Biol., 1:3.
+
+ - S.H. Bernhart, I.L. Hofacker, P.F. Stadler (2006).
+ Local RNA base pairing probabilities in large sequences.
+ Bioinformatics, 22:614–615.
+
+ - U. Mückstein, H. Tafer, J. Hackermüller, S.H. Bernhart, P.F.
+ Stadler, I.L Hofacker (2006). Thermodynamics of RNA-RNA
+ binding. Bioinformatics, 22:1177–1182.
+
+ - I.L. Hofacker, B. Priwitzer, P.F. Stadler (2004).
+ Prediction of locally stable RNA secondary structures for
+ genome-wide surveys. Bioinformatics, 20:186–190.
+
+ - D. Adams (1979)
+ The hitchhiker's guide to the galaxy, Pan Books, London
+
+\1f
+File: RNAlib.info, Node: Function Index, Next: Variable Index, Prev: References, Up: Top
+
+Function Index
+**************
+
+\0\b[index\0\b]
+* Menu:
+
+* *centroid: PF Fold. (line 57)
+* *MEA: PF Fold. (line 67)
+* *pf_interact: Cofolding. (line 183)
+* *pf_unstru: Cofolding. (line 154)
+* add_root: Parsing. (line 31)
+* alifold: Alignment Fold. (line 14)
+* alipf_fold: Alignment Fold. (line 29)
+* b2C: Parsing. (line 18)
+* b2HIT: Parsing. (line 14)
+* b2Shapiro: Parsing. (line 22)
+* bp_distance: Distances. (line 20)
+* co_pf_fold: Cofolding. (line 70)
+* cofold: Cofolding. (line 36)
+* compute_probabilities: Cofolding. (line 109)
+* energy_of_struct: mfe Fold. (line 25)
+* expand_Full: Parsing. (line 10)
+* expand_Shapiro: Parsing. (line 27)
+* fold: mfe Fold. (line 11)
+* free_alifold_arrays: Alignment Fold. (line 20)
+* free_arrays: mfe Fold. (line 36)
+* free_co_arrays: Cofolding. (line 41)
+* free_co_pf_arrays: Cofolding. (line 81)
+* free_interact: Cofolding. (line 211)
+* free_pf_arrays: PF Fold. (line 38)
+* free_pf_arraysLP: Local Fold. (line 49)
+* free_profile: Profile Distances. (line 20)
+* free_pu_contrib: Cofolding. (line 175)
+* free_tree: Tree Distances. (line 15)
+* get_line: Utilities. (line 98)
+* gmlRNA: Utilities. (line 39)
+* hamming: Utilities. (line 56)
+* init_co_pf_fold: Cofolding. (line 85)
+* init_pf_fold: PF Fold. (line 33)
+* init_pf_foldLP: Local Fold. (line 46)
+* initialize_cofold: Cofolding. (line 44)
+* initialize_fold: mfe Fold. (line 31)
+* int_urn: Utilities. (line 91)
+* inverse_fold: Inverse Fold. (line 10)
+* inverse_pf_fold: Inverse Fold. (line 21)
+* Lfold: Local Fold. (line 10)
+* Make_bp_profile: Profile Distances. (line 10)
+* make_pair_table: Utilities. (line 71)
+* Make_swString: String Distances. (line 7)
+* make_tree: Tree Distances. (line 7)
+* mean_bp_dist: PF Fold. (line 46)
+* nrerror: Utilities. (line 80)
+* pack_structure: Utilities. (line 60)
+* parse_structure: Parsing. (line 48)
+* pf_fold: PF Fold. (line 13)
+* plist: Local Fold. (line 18)
+* profile_edit_distance: Profile Distances. (line 17)
+* PS_dot_plot: Utilities. (line 9)
+* PS_rna_plot: Utilities. (line 19)
+* PS_rna_plot_a: Utilities. (line 27)
+* random_string: Utilities. (line 52)
+* read_parameter_file: Param Files. (line 11)
+* space: Utilities. (line 94)
+* string_edit_distance: String Distances. (line 11)
+* struct: Cofolding. (line 93)
+* subopt: Suboptimal folding. (line 10)
+* svg_RNA_plot: Utilities. (line 34)
+* time_stamp: Utilities. (line 76)
+* tree_edit_distance: Tree Distances. (line 12)
+* unexpand_aligned_F: Parsing. (line 43)
+* unexpand_Full: Parsing. (line 34)
+* unpack_structure: Utilities. (line 67)
+* unweight: Parsing. (line 38)
+* update_alifold_params: Alignment Fold. (line 23)
+* update_fold_params: mfe Fold. (line 39)
+* update_pf_params: PF Fold. (line 41)
+* update_pf_paramsLP: Local Fold. (line 52)
+* urn: Utilities. (line 83)
+* write_parameter_file: Param Files. (line 15)
+
+\1f
+File: RNAlib.info, Node: Variable Index, Prev: Function Index, Up: Top
+
+Variable Index
+**************
+
+\0\b[index\0\b]
+* Menu:
+
+* *symbolset: Inverse Fold. (line 29)
+* aligned_line: Distances. (line 76)
+* backtrack_type: Fold Vars. (line 100)
+* base_pair: Fold Vars. (line 69)
+* cost_matrix: Distances. (line 68)
+* cut_point <1>: Fold Vars. (line 62)
+* cut_point: Cofolding. (line 29)
+* cv_fact: Alignment Fold. (line 44)
+* dangles: Fold Vars. (line 36)
+* do_backtrack: Fold Vars. (line 96)
+* edit_backtrack: Distances. (line 72)
+* energy_set: Fold Vars. (line 26)
+* fold_constrained: Fold Vars. (line 92)
+* give_up: Inverse Fold. (line 13)
+* helix_size: Parsing. (line 59)
+* iindx: Fold Vars. (line 77)
+* loop_degree: Parsing. (line 56)
+* loop_size: Parsing. (line 52)
+* loops: Parsing. (line 62)
+* nc_fact: Alignment Fold. (line 48)
+* no_closingGU: Fold Vars. (line 13)
+* noGU: Fold Vars. (line 10)
+* noLonelyPairs: Fold Vars. (line 17)
+* nonstandards: Fold Vars. (line 57)
+* pairs: Parsing. (line 65)
+* pf_scale: Fold Vars. (line 81)
+* pr: Fold Vars. (line 73)
+* rna_plot_type: Utilities. (line 46)
+* temperature: Fold Vars. (line 31)
+* tetra_loop: Fold Vars. (line 22)
+* unpaired: Parsing. (line 68)
+* xsubi: Utilities. (line 87)
+
+
+\1f
+Tag Table:
+Node: Top\7f0
+Node: Introduction\7f602
+Node: Folding Routines\7f1612
+Node: mfe Fold\7f2423
+Node: PF Fold\7f4424
+Node: Inverse Fold\7f8436
+Node: Suboptimal folding\7f10073
+Node: Cofolding\7f10754
+Node: Local Fold\7f22112
+Node: Alignment Fold\7f24820
+Node: Fold Vars\7f27147
+Node: Param Files\7f31766
+Node: Parsing and Comparing\7f39083
+Node: notations\7f39493
+Node: Parsing\7f43016
+Node: Distances\7f45657
+Node: Tree Distances\7f49520
+Node: String Distances\7f50298
+Node: Profile Distances\7f50828
+Node: Utilities\7f51907
+Node: Example\7f56543
+Node: References\7f60052
+Node: Function Index\7f64180
+Node: Variable Index\7f69658
+\1f
+End Tag Table