+++ /dev/null
- GNU GENERAL PUBLIC LICENSE
- Version 2, June 1991
-
- Copyright (C) 1989, 1991 Free Software Foundation, Inc.
- 675 Mass Ave, Cambridge, MA 02139, USA
- Everyone is permitted to copy and distribute verbatim copies
- of this license document, but changing it is not allowed.
-
- Preamble
-
- The licenses for most software are designed to take away your
-freedom to share and change it. By contrast, the GNU General Public
-License is intended to guarantee your freedom to share and change free
-software--to make sure the software is free for all its users. This
-General Public License applies to most of the Free Software
-Foundation's software and to any other program whose authors commit to
-using it. (Some other Free Software Foundation software is covered by
-the GNU Library General Public License instead.) You can apply it to
-your programs, too.
-
- When we speak of free software, we are referring to freedom, not
-price. Our General Public Licenses are designed to make sure that you
-have the freedom to distribute copies of free software (and charge for
-this service if you wish), that you receive source code or can get it
-if you want it, that you can change the software or use pieces of it
-in new free programs; and that you know you can do these things.
-
- To protect your rights, we need to make restrictions that forbid
-anyone to deny you these rights or to ask you to surrender the rights.
-These restrictions translate to certain responsibilities for you if you
-distribute copies of the software, or if you modify it.
-
- For example, if you distribute copies of such a program, whether
-gratis or for a fee, you must give the recipients all the rights that
-you have. You must make sure that they, too, receive or can get the
-source code. And you must show them these terms so they know their
-rights.
-
- We protect your rights with two steps: (1) copyright the software, and
-(2) offer you this license which gives you legal permission to copy,
-distribute and/or modify the software.
-
- Also, for each author's protection and ours, we want to make certain
-that everyone understands that there is no warranty for this free
-software. If the software is modified by someone else and passed on, we
-want its recipients to know that what they have is not the original, so
-that any problems introduced by others will not reflect on the original
-authors' reputations.
-
- Finally, any free program is threatened constantly by software
-patents. We wish to avoid the danger that redistributors of a free
-program will individually obtain patent licenses, in effect making the
-program proprietary. To prevent this, we have made it clear that any
-patent must be licensed for everyone's free use or not licensed at all.
-
- The precise terms and conditions for copying, distribution and
-modification follow.
-\f
- GNU GENERAL PUBLIC LICENSE
- TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
-
- 0. This License applies to any program or other work which contains
-a notice placed by the copyright holder saying it may be distributed
-under the terms of this General Public License. The "Program", below,
-refers to any such program or work, and a "work based on the Program"
-means either the Program or any derivative work under copyright law:
-that is to say, a work containing the Program or a portion of it,
-either verbatim or with modifications and/or translated into another
-language. (Hereinafter, translation is included without limitation in
-the term "modification".) Each licensee is addressed as "you".
-
-Activities other than copying, distribution and modification are not
-covered by this License; they are outside its scope. The act of
-running the Program is not restricted, and the output from the Program
-is covered only if its contents constitute a work based on the
-Program (independent of having been made by running the Program).
-Whether that is true depends on what the Program does.
-
- 1. You may copy and distribute verbatim copies of the Program's
-source code as you receive it, in any medium, provided that you
-conspicuously and appropriately publish on each copy an appropriate
-copyright notice and disclaimer of warranty; keep intact all the
-notices that refer to this License and to the absence of any warranty;
-and give any other recipients of the Program a copy of this License
-along with the Program.
-
-You may charge a fee for the physical act of transferring a copy, and
-you may at your option offer warranty protection in exchange for a fee.
-
- 2. You may modify your copy or copies of the Program or any portion
-of it, thus forming a work based on the Program, and copy and
-distribute such modifications or work under the terms of Section 1
-above, provided that you also meet all of these conditions:
-
- a) You must cause the modified files to carry prominent notices
- stating that you changed the files and the date of any change.
-
- b) You must cause any work that you distribute or publish, that in
- whole or in part contains or is derived from the Program or any
- part thereof, to be licensed as a whole at no charge to all third
- parties under the terms of this License.
-
- c) If the modified program normally reads commands interactively
- when run, you must cause it, when started running for such
- interactive use in the most ordinary way, to print or display an
- announcement including an appropriate copyright notice and a
- notice that there is no warranty (or else, saying that you provide
- a warranty) and that users may redistribute the program under
- these conditions, and telling the user how to view a copy of this
- License. (Exception: if the Program itself is interactive but
- does not normally print such an announcement, your work based on
- the Program is not required to print an announcement.)
-\f
-These requirements apply to the modified work as a whole. If
-identifiable sections of that work are not derived from the Program,
-and can be reasonably considered independent and separate works in
-themselves, then this License, and its terms, do not apply to those
-sections when you distribute them as separate works. But when you
-distribute the same sections as part of a whole which is a work based
-on the Program, the distribution of the whole must be on the terms of
-this License, whose permissions for other licensees extend to the
-entire whole, and thus to each and every part regardless of who wrote it.
-
-Thus, it is not the intent of this section to claim rights or contest
-your rights to work written entirely by you; rather, the intent is to
-exercise the right to control the distribution of derivative or
-collective works based on the Program.
-
-In addition, mere aggregation of another work not based on the Program
-with the Program (or with a work based on the Program) on a volume of
-a storage or distribution medium does not bring the other work under
-the scope of this License.
-
- 3. You may copy and distribute the Program (or a work based on it,
-under Section 2) in object code or executable form under the terms of
-Sections 1 and 2 above provided that you also do one of the following:
-
- a) Accompany it with the complete corresponding machine-readable
- source code, which must be distributed under the terms of Sections
- 1 and 2 above on a medium customarily used for software interchange; or,
-
- b) Accompany it with a written offer, valid for at least three
- years, to give any third party, for a charge no more than your
- cost of physically performing source distribution, a complete
- machine-readable copy of the corresponding source code, to be
- distributed under the terms of Sections 1 and 2 above on a medium
- customarily used for software interchange; or,
-
- c) Accompany it with the information you received as to the offer
- to distribute corresponding source code. (This alternative is
- allowed only for noncommercial distribution and only if you
- received the program in object code or executable form with such
- an offer, in accord with Subsection b above.)
-
-The source code for a work means the preferred form of the work for
-making modifications to it. For an executable work, complete source
-code means all the source code for all modules it contains, plus any
-associated interface definition files, plus the scripts used to
-control compilation and installation of the executable. However, as a
-special exception, the source code distributed need not include
-anything that is normally distributed (in either source or binary
-form) with the major components (compiler, kernel, and so on) of the
-operating system on which the executable runs, unless that component
-itself accompanies the executable.
-
-If distribution of executable or object code is made by offering
-access to copy from a designated place, then offering equivalent
-access to copy the source code from the same place counts as
-distribution of the source code, even though third parties are not
-compelled to copy the source along with the object code.
-\f
- 4. You may not copy, modify, sublicense, or distribute the Program
-except as expressly provided under this License. Any attempt
-otherwise to copy, modify, sublicense or distribute the Program is
-void, and will automatically terminate your rights under this License.
-However, parties who have received copies, or rights, from you under
-this License will not have their licenses terminated so long as such
-parties remain in full compliance.
-
- 5. You are not required to accept this License, since you have not
-signed it. However, nothing else grants you permission to modify or
-distribute the Program or its derivative works. These actions are
-prohibited by law if you do not accept this License. Therefore, by
-modifying or distributing the Program (or any work based on the
-Program), you indicate your acceptance of this License to do so, and
-all its terms and conditions for copying, distributing or modifying
-the Program or works based on it.
-
- 6. Each time you redistribute the Program (or any work based on the
-Program), the recipient automatically receives a license from the
-original licensor to copy, distribute or modify the Program subject to
-these terms and conditions. You may not impose any further
-restrictions on the recipients' exercise of the rights granted herein.
-You are not responsible for enforcing compliance by third parties to
-this License.
-
- 7. If, as a consequence of a court judgment or allegation of patent
-infringement or for any other reason (not limited to patent issues),
-conditions are imposed on you (whether by court order, agreement or
-otherwise) that contradict the conditions of this License, they do not
-excuse you from the conditions of this License. If you cannot
-distribute so as to satisfy simultaneously your obligations under this
-License and any other pertinent obligations, then as a consequence you
-may not distribute the Program at all. For example, if a patent
-license would not permit royalty-free redistribution of the Program by
-all those who receive copies directly or indirectly through you, then
-the only way you could satisfy both it and this License would be to
-refrain entirely from distribution of the Program.
-
-If any portion of this section is held invalid or unenforceable under
-any particular circumstance, the balance of the section is intended to
-apply and the section as a whole is intended to apply in other
-circumstances.
-
-It is not the purpose of this section to induce you to infringe any
-patents or other property right claims or to contest validity of any
-such claims; this section has the sole purpose of protecting the
-integrity of the free software distribution system, which is
-implemented by public license practices. Many people have made
-generous contributions to the wide range of software distributed
-through that system in reliance on consistent application of that
-system; it is up to the author/donor to decide if he or she is willing
-to distribute software through any other system and a licensee cannot
-impose that choice.
-
-This section is intended to make thoroughly clear what is believed to
-be a consequence of the rest of this License.
-\f
- 8. If the distribution and/or use of the Program is restricted in
-certain countries either by patents or by copyrighted interfaces, the
-original copyright holder who places the Program under this License
-may add an explicit geographical distribution limitation excluding
-those countries, so that distribution is permitted only in or among
-countries not thus excluded. In such case, this License incorporates
-the limitation as if written in the body of this License.
-
- 9. The Free Software Foundation may publish revised and/or new versions
-of the General Public License from time to time. Such new versions will
-be similar in spirit to the present version, but may differ in detail to
-address new problems or concerns.
-
-Each version is given a distinguishing version number. If the Program
-specifies a version number of this License which applies to it and "any
-later version", you have the option of following the terms and conditions
-either of that version or of any later version published by the Free
-Software Foundation. If the Program does not specify a version number of
-this License, you may choose any version ever published by the Free Software
-Foundation.
-
- 10. If you wish to incorporate parts of the Program into other free
-programs whose distribution conditions are different, write to the author
-to ask for permission. For software which is copyrighted by the Free
-Software Foundation, write to the Free Software Foundation; we sometimes
-make exceptions for this. Our decision will be guided by the two goals
-of preserving the free status of all derivatives of our free software and
-of promoting the sharing and reuse of software generally.
-
- NO WARRANTY
-
- 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
-FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
-OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
-PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
-OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
-MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
-TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
-PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
-REPAIR OR CORRECTION.
-
- 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
-WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
-REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
-INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
-OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
-TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
-YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
-PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
-POSSIBILITY OF SUCH DAMAGES.
-
- END OF TERMS AND CONDITIONS
-\f
- Appendix: How to Apply These Terms to Your New Programs
-
- If you develop a new program, and you want it to be of the greatest
-possible use to the public, the best way to achieve this is to make it
-free software which everyone can redistribute and change under these terms.
-
- To do so, attach the following notices to the program. It is safest
-to attach them to the start of each source file to most effectively
-convey the exclusion of warranty; and each file should have at least
-the "copyright" line and a pointer to where the full notice is found.
-
- <one line to give the program's name and a brief idea of what it does.>
- Copyright (C) 19yy <name of author>
-
- This program is free software; you can redistribute it and/or modify
- it under the terms of the GNU General Public License as published by
- the Free Software Foundation; either version 2 of the License, or
- (at your option) any later version.
-
- This program is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- GNU General Public License for more details.
-
- You should have received a copy of the GNU General Public License
- along with this program; if not, write to the Free Software
- Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
-
-Also add information on how to contact you by electronic and paper mail.
-
-If the program is interactive, make it output a short notice like this
-when it starts in an interactive mode:
-
- Gnomovision version 69, Copyright (C) 19yy name of author
- Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
- This is free software, and you are welcome to redistribute it
- under certain conditions; type `show c' for details.
-
-The hypothetical commands `show w' and `show c' should show the appropriate
-parts of the General Public License. Of course, the commands you use may
-be called something other than `show w' and `show c'; they could even be
-mouse-clicks or menu items--whatever suits your program.
-
-You should also get your employer (if you work as a programmer) or your
-school, if any, to sign a "copyright disclaimer" for the program, if
-necessary. Here is a sample; alter the names:
-
- Yoyodyne, Inc., hereby disclaims all copyright interest in the program
- `Gnomovision' (which makes passes at compilers) written by James Hacker.
-
- <signature of Ty Coon>, 1 April 1989
- Ty Coon, President of Vice
-
-This General Public License does not permit incorporating your program into
-proprietary programs. If your program is a subroutine library, you may
-consider it more useful to permit linking proprietary applications with the
-library. If this is what you want to do, use the GNU Library General
-Public License instead of this License.
+++ /dev/null
-
-RIO - Phylogenomic Protein Function Analysis
-
-____________________________________________
-
-
-
-
-RIO/FORESTER : http://www.genetics.wustl.edu/eddy/forester/
-RIO webserver: http://www.rio.wustl.edu/
-
-Reference: Zmasek C.M. and Eddy S.R. (2002)
- RIO: Analyzing proteomes by automated phylogenomics using
- resampled inference of orthologs.
- BMC Bioinformatics 3:14
- http://www.biomedcentral.com/1471-2105/3/14/
-
- It is highly recommended that you read this paper before
- installing and/or using RIO. (Included in the RIO
- distribution as PDF: "RIO.pdf".)
-
-
-Preconditions: A Unix system, Java 1.2 or higher, Perl, gcc or cc,
- ... and some experience with Perl and Unix.
-
-
-
-1. Compilation
-______________
-
-
-This describes how to compile the various components of RIO.
-
-
- "gunzip RIO1.x.tar.gz"
-
- "tar -xvf RIO1.x.tar"
-
-
-
-
-in directory "RIO1.x/C":
-
- "make"
-
-
-
-in directory "RIO1.x/hmmer" (version of HMMER is "2.2g"):
-
-(if you already have a local copy of HMMER 2.2g installed, this step
-is not necessary, but in this case you need to change variables "$HMMALIGN",
-"$HMMSEARCH", "$HMMBUILD", "$HMMFETCH", and "$SFE" to point to the
-corresponding HMMER programs)
-
- "./configure"
-
- "make"
-
-
-
-in directory "RIO1.x/java" (requires JDK 1.2 or greater):
-
- "javac forester/tools/*java"
-
- "javac ATVapp.java"
-
-
-
-
-in directory "RIO1.x/puzzle_dqo":
-
- "./configure"
-
- "make"
-
-
-
-in directory "RIO1.x/puzzle_mod":
-
- "./configure"
-
- "make"
-
-
-
-in directory "RIO1.x/phylip_mod/src":
-
- "make install"
-
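The build steps above can be collected into a single script. The following is only a sketch: the RIO_ROOT and DRY_RUN variables and the run_in and build_rio helpers are illustrative assumptions, not part of the RIO distribution. By default it only prints the commands. The two hmmer steps can be dropped if a local copy of HMMER 2.2g is used instead.

```shell
#!/bin/sh
# Sketch only: RIO_ROOT, DRY_RUN, run_in and build_rio are illustrative
# names, not part of the RIO distribution.
RIO_ROOT="${RIO_ROOT:-RIO1.x}"   # top-level directory of the unpacked tarball
DRY_RUN="${DRY_RUN:-1}"          # 1 = only print the commands, 0 = run them

# run_in <subdir> <command string>: run the command inside RIO_ROOT/<subdir>.
run_in() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "cd $RIO_ROOT/$1 && $2"
    else
        (cd "$RIO_ROOT/$1" && sh -c "$2")
    fi
}

# The steps of section 1, in the order given there; stops at the first failure.
build_rio() {
    run_in C              "make" &&
    run_in hmmer          "./configure" &&
    run_in hmmer          "make" &&
    run_in java           "javac forester/tools/*java" &&
    run_in java           "javac ATVapp.java" &&
    run_in puzzle_dqo     "./configure" &&
    run_in puzzle_dqo     "make" &&
    run_in puzzle_mod     "./configure" &&
    run_in puzzle_mod     "make" &&
    run_in phylip_mod/src "make install"
}
```

Calling build_rio with DRY_RUN=1 lists the ten commands without touching anything; with DRY_RUN=0 each command is run in its directory.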
-
-
-
-2. Setting the variables in "RIO1.x/perl/rio_module.pm"
-_______________________________________________________
-
-
-Most global variables used in "RIO1.x/perl/rio.pl" are set in
-the perl module "RIO1.x/perl/rio_module.pm".
-This module pretty much "controls everything".
-
-It is necessary to set the variables which point to:
-
--- the rio directory itself: $PATH_TO_FORESTER
-
- (example: $PATH_TO_FORESTER = "/home/czmasek/linux/RIO1.1/";)
-
-
--- your Java virtual machine: $JAVA
-
- (example: $JAVA = "/home/czmasek/linux/j2sdk1.4.0/bin/java";)
-
-
--- a directory where temporary files can be created: $TEMP_DIR_DEFAULT
-
-
-
- Example:
- Now that $PATH_TO_FORESTER, $JAVA, $TEMP_DIR_DEFAULT are set,
- it is possible to run rio.pl based on the example precalculated distances
- in "/example_data/":
-
- % RIO1.1/perl/rio.pl 1 A=aconitase Q=RIO1.1/LEU2_HAEIN N=QUERY_HAEIN O=out0 p I
-
- To use RIO to analyze your protein sequences, please continue setting
- variables and preparing data.
-
-
-
--- your local copy of the Pfam database (see http://pfam.wustl.edu/)
- (if only precalculated distances are being used, these variables do not
- matter):
-
- $PFAM_FULL_DIRECTORY -- the directory containing the "full" alignments
- (Pfam-A.full) see below (3.)
-
- $PFAM_SEED_DIRECTORY -- the directory containing the "seed" alignments
- (Pfam-A.seed) see below (3.)
-
- $PFAM_HMM_DB -- the Pfam HMM library file (Pfam_ls)
- see below (3.)
-
-
--- $TREMBL_ACDEOS_FILE and $SWISSPROT_ACDEOS_FILE: see below (4. and 5.).
-
-
--- list of species (SWISS-PROT codes) which can be analyzed: $SPECIES_NAMES_FILE
- (for most purposes $PATH_TO_FORESTER."data/species/tree_of_life_bin_1-4_species_list"
- should be sufficient, hence this variable does not necessarily need to be changed)
-
-
--- a default species tree in NHX format: $SPECIES_TREE_FILE_DEFAULT
- (for most purposes $PATH_TO_FORESTER."data/species/tree_of_life_bin_1-4.nhx"
- should be sufficient, hence this variable does not necessarily need to be changed)
-
-
--- Only if precalculated distances are being used:
- $MATRIX_FOR_PWD, $RIO_PWD_DIRECTORY, $RIO_BSP_DIRECTORY,
- $RIO_NBD_DIRECTORY, $RIO_ALN_DIRECTORY, and $RIO_HMM_DIRECTORY:
- please see below (6.)
-
-
-
-
-
-
-IMPORTANT: Need to redo steps 3., 4., 5., and 6. if species
- in the master species tree and/or the species list
- are added and/or changed or if a new version of Pfam is used!!
-
-
-
-
-
-3. Downloading and processing of Pfam
-_____________________________________
-
-
-
-Please note: Even if you already have a local copy of the
-Pfam database, you still need to perform steps c. through k.
-
-a. download
- - "Pfam_ls" (PFAM HMM library, glocal alignment models)
- - "Pfam-A.full" (full alignments of the curated families)
- - "Pfam-A.seed" (seed alignments of the curated families)
- [and ideally "prior.tar.gz"]
- from http://pfam.wustl.edu/ or ftp.genetics.wustl.edu/pub/eddy/pfam-x/
-
-b. "gunzip" and "tar -xvf" these downloaded files, if necessary
-
-c. create a new directory named "Full" and move "Pfam-A.full" into it
-
-d. in directory "Full" execute "RIO1.x/perl/pfam2slx.pl Pfam-A.full"
-
-e. set variable $PFAM_FULL_DIRECTORY in "RIO1.x/perl/rio_module.pm"
- to point to this "Full" directory
-
-f. create a new directory named "Seed" and move "Pfam-A.seed" into it
-
-g. in directory "Seed" execute "RIO1.x/perl/pfam2slx.pl Pfam-A.seed"
-
-h. set variable $PFAM_SEED_DIRECTORY in "RIO1.x/perl/rio_module.pm"
- to point to this "Seed" directory
-
-i. execute "RIO1.x/hmmer/binaries/hmmindex Pfam_ls" (in same
- directory as "Pfam_ls") resulting in "Pfam_ls.ssi"
-
-j. set environment variable HMMERDB to point to the directory where
- "Pfam_ls" and "Pfam_ls.ssi" reside
- (for example "setenv HMMERDB /home/czmasek/PFAM7.3/")
-
-k. set variable $PFAM_HMM_DB in "RIO1.x/perl/rio_module.pm"
- to point to the "Pfam_ls" file
- (for example $PFAM_HMM_DB = "/home/czmasek/PFAM7.3/Pfam_ls";)
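Steps c., d., f., g. and i. can be scripted roughly as follows. This is a sketch, not part of RIO: the prepare_pfam function and the RIO_ROOT, PFAM2SLX and HMMINDEX variables are assumptions made for illustration, and RIO_ROOT has to be an absolute path because the function changes directory. Steps e., h., j. and k. (editing rio_module.pm and setting HMMERDB) still have to be done by hand.

```shell
#!/bin/sh
# Sketch only: prepare_pfam, RIO_ROOT, PFAM2SLX and HMMINDEX are illustrative
# names. RIO_ROOT should be an absolute path to the unpacked RIO1.x directory
# ("/path/to/RIO1.x" is a placeholder).
RIO_ROOT="${RIO_ROOT:-/path/to/RIO1.x}"
PFAM2SLX="${PFAM2SLX:-$RIO_ROOT/perl/pfam2slx.pl}"
HMMINDEX="${HMMINDEX:-$RIO_ROOT/hmmer/binaries/hmmindex}"

# prepare_pfam <dir>: <dir> must contain Pfam_ls, Pfam-A.full and Pfam-A.seed.
prepare_pfam() {
    (
        cd "$1" || exit 1
        # c./d.: "Full" directory, then convert the full alignments
        mkdir -p Full && mv Pfam-A.full Full/ || exit 1
        (cd Full && "$PFAM2SLX" Pfam-A.full) || exit 1
        # f./g.: "Seed" directory, then convert the seed alignments
        mkdir -p Seed && mv Pfam-A.seed Seed/ || exit 1
        (cd Seed && "$PFAM2SLX" Pfam-A.seed) || exit 1
        # i.: index the HMM library, producing Pfam_ls.ssi
        "$HMMINDEX" Pfam_ls
    )
}
```

In sh or bash, the counterpart of the csh "setenv" line in step j. is "export HMMERDB=/home/czmasek/PFAM7.3/".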
-
-
-
-
-4. Extraction of ID, DE, and species from a SWISS-PROT sprot.dat file
-_____________________________________________________________________
-
-
-This creates the file from which RIO will get the sequence descriptions for
-sequences from SWISS-PROT.
-(RIO1.x/data/ does not contain an example for this, since SWISS-PROT is
-copyrighted.)
-
-
-a. download SWISS-PROT "sprotXX.dat" from
- "ftp://ca.expasy.org/databases/swiss-prot/release/"
-
-b. "extractSWISS-PROT.pl <infile> <outfile> [species list]"
-
- ("extractSWISS-PROT.pl" is in "RIO1.x/perl")
-
- example:
- "extractSWISS-PROT.pl sprot40.dat sp40_ACDEOS RIO1.x/data/species/tree_of_life_bin_1-4_species_list"
-
-c. the output file should be placed in "RIO1.x/data" and the
- variable $SWISSPROT_ACDEOS_FILE in "RIO1.x/perl/rio_module.pm" should point
- to this output.
-
-
-
-
-5. Extraction of AC, DE, and species from a TrEMBL trembl.dat file
-__________________________________________________________________
-
-
-This creates the file from which RIO will get the sequence descriptions for
-sequences from TrEMBL.
-(RIO1.x/data/ already contains an example: "trembl20_ACDEOS_1-4")
-
-a. download TrEMBL "trembl.dat.gz" from
- "ftp://ca.expasy.org/databases/sp_tr_nrdb/"
-
-b. "gunzip trembl.dat.gz"
-
-c. "extractTrembl.pl <infile> <outfile> [species list]"
-
- ("extractTrembl.pl" is in "RIO1.x/perl")
-
- example:
- "extractTrembl.pl trembl.dat trembl17.7_ACDEOS_1-4 RIO1.x/data/species/tree_of_life_bin_1-4_species_list"
-
-d. the output file should be placed in "RIO1.x/data/" and the
- variable $TREMBL_ACDEOS_FILE in "RIO1.x/perl/rio_module.pm" should point
- to this output.
-
-
-
-Now you could go directly to 7. to run the examples.
-
-
-
-6. Precalculation of pairwise distances (optional): pfam2pwd.pl
-_______________________________________________________________
-
-
-This step is of course only necessary if you want to use RIO on
-precalculated pairwise distances. The precalculation is time consuming
-(on the order of one to two weeks on ten processors).
-It is best to run it on a few machines, dividing up the input data.
-
-The program to do this is "RIO1.x/perl/pfam2pwd.pl".
-
-Please note: "pfam2pwd.pl" creates a logfile in the same directory
- where it places the pairwise distance output ($MY_RIO_PWD_DIRECTORY).
-
-
-
-The following variables in "RIO1.x/perl/pfam2pwd.pl" need to be set
-("pfam2pwd.pl" gets most of its information from "rio_module.pm"):
-
-
-"$MY_PFAM_FULL_DIRECTORY":
- This is the directory where the Pfam full alignments reside, processed
- as described in 3.a to 3.d.
-
-
-
-"$ALGNS_TO_USE_LIST_FILE":
- If left empty, all alignments in $MY_PFAM_FULL_DIRECTORY are used
- to calculate pairwise distances.
- If this points to a file listing names of Pfam alignments,
- only those listed are used.
- The file can either be a simple newline-delimited list, or can have
- the same format as the "Summary of changes" list
- ("FI PF03214 RGP NEW SEED HMM_ls HMM_fs FULL DESC")
- which is part of the Pfam distribution.
- One purpose of this is to use the list of "too large" alignments
- in the logfile produced by "pfam2pwd.pl" to run "pfam2pwd.pl" with
- a smaller species list (as can be set with "$MY_SPECIES_NAMES_FILE")
- on large alignments.
-
-
-
-"$MY_SPECIES_NAMES_FILE" -- Dealing with too large alignments:
-
- This is most important. It determines the species whose sequences
- are being used (sequences from species not listed in $MY_SPECIES_NAMES_FILE
- are ignored). Normally, one would use the same list as RIO uses
- ($SPECIES_NAMES_FILE in "rio_module.pm"):
-
- my $MY_SPECIES_NAMES_FILE = $SPECIES_NAMES_FILE;
-
- For certain large families (such as protein kinases), one must use
- a species file which contains fewer species in order to be able to finish
- the calculations in reasonable time:
-
- my $MY_SPECIES_NAMES_FILE = $PATH_TO_FORESTER."data/tree_of_life_bin_1-4_species_list_NO_RAT_RABBIT_MONKEYS_APES_SHEEP_GOAT_HAMSTER
-
- An additional way to reduce the number of sequences in an alignment is
- to only use sequences originating from SWISS-PROT. This is done by
- placing the following line of code into pfam2pwd.pl:
-
- $TREMBL_ACDEOS_FILE = $PATH_TO_FORESTER."data/NO_TREMBL";
-
-
-
-"$MY_RIO_PWD_DIRECTORY",
-"$MY_RIO_BSP_DIRECTORY",
-"$MY_RIO_NBD_DIRECTORY",
-"$MY_RIO_ALN_DIRECTORY",
-"$MY_RIO_HMM_DIRECTORY":
- These determine where to place the output.
- After all the data has been calculated, the corresponding variables
- in RIO1.x/perl/rio_module.pm ("$RIO_PWD_DIRECTORY", etc.) need to be set
- so that they point to the appropriate values. Having different variables
- allows you to precalculate distances and at the same time use RIO on
- previously precalculated distances.
-
-
-
-"$MY_TEMP_DIR":
- A directory to create temporary files in.
-
-
-
-"$MIN_SEQS":
- Alignments in which the number of sequences after pruning (determined
- by "$MY_SPECIES_NAMES_FILE") is lower than $MIN_SEQS, are ignored
- (no calculation of pwds).
-
-
-
-"$MAX_SEQS":
- Alignments in which the number of sequences after pruning (determined
- by "$MY_SPECIES_NAMES_FILE") is greater than $MAX_SEQS, are ignored
- (no calculation of pwds).
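Taken together, $MIN_SEQS and $MAX_SEQS amount to a range check on the number of sequences left after pruning. A minimal sketch of that rule, with a made-up function name and argument order; this is not the code used by "pfam2pwd.pl":

```shell
#!/bin/sh
# Sketch only: the function name and argument order are illustrative.
# alignment_is_used <num_seqs_after_pruning> <min_seqs> <max_seqs>
# Succeeds (status 0) if pairwise distances would be calculated, i.e. the
# count is neither lower than <min_seqs> nor greater than <max_seqs>.
alignment_is_used() {
    [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}
```

An alignment with exactly $MIN_SEQS or exactly $MAX_SEQS sequences is still used, since only counts lower than the minimum or greater than the maximum are ignored.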
-
-
-
-"$MY_SEED":
- Seed for the random number generator for bootstrapping (must be 4n+1).
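The "4n+1" requirement means the seed must leave a remainder of 1 when divided by 4 (1, 5, 9, 13, ...). A small sketch for checking a candidate value before editing "pfam2pwd.pl"; the is_valid_seed helper is hypothetical and not part of RIO:

```shell
#!/bin/sh
# Sketch only: is_valid_seed is an illustrative name, not part of RIO.
# Succeeds if the argument is a positive integer of the form 4n+1.
is_valid_seed() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
        0?*)         return 1 ;;   # reject leading zeros (would parse as octal)
    esac
    [ $(( $1 % 4 )) -eq 1 ]
}
```

For example, "is_valid_seed 41" succeeds (41 = 4*10 + 1), while "is_valid_seed 42" fails.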
-
-
-
-"$MY_MATRIX":
- This is used to choose the model to be used for the (ML)
- distance calculation:
- 0 = JTT
- 2 = BLOSUM 62
- 3 = mtREV24
- 5 = VT
- 6 = WAG
- PAM otherwise
- After all the data has been calculated, variable "$MATRIX_FOR_PWD"
- in RIO1.x/perl/rio_module.pm needs to be set to the same value.
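The mapping from the numeric value to the model can be written out as a lookup, which may help when checking that $MY_MATRIX and $MATRIX_FOR_PWD agree. The matrix_name helper below is a hypothetical illustration of the table above, not a function provided by RIO:

```shell
#!/bin/sh
# Sketch only: matrix_name is an illustrative helper, not part of RIO.
# Prints the model selected by a given $MY_MATRIX / $MATRIX_FOR_PWD value.
matrix_name() {
    case "$1" in
        0) echo "JTT" ;;
        2) echo "BLOSUM 62" ;;
        3) echo "mtREV24" ;;
        5) echo "VT" ;;
        6) echo "WAG" ;;
        *) echo "PAM" ;;   # any other value
    esac
}
```

Note that values such as 1 or 4 are not listed in the table and therefore fall through to PAM.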
-
-
-
-Once pairwise distances are calculated, the following variables in
-"rio_module.pm" need to be set accordingly:
-$MATRIX_FOR_PWD : corresponds to $MY_MATRIX in pfam2pwd.pl
-$RIO_PWD_DIRECTORY : corresponds to $MY_RIO_PWD_DIRECTORY in pfam2pwd.pl
-$RIO_BSP_DIRECTORY : corresponds to $MY_RIO_BSP_DIRECTORY in pfam2pwd.pl
-$RIO_NBD_DIRECTORY : corresponds to $MY_RIO_NBD_DIRECTORY in pfam2pwd.pl
-$RIO_ALN_DIRECTORY : corresponds to $MY_RIO_ALN_DIRECTORY in pfam2pwd.pl
-$RIO_HMM_DIRECTORY : corresponds to $MY_RIO_HMM_DIRECTORY in pfam2pwd.pl
-...of course, if Pfam has been updated, the corresponding variables in rio_module.pm
-($PFAM_FULL_DIRECTORY, etc.) need to be updated, too.
-
-
-
-
-
-
-IMPORTANT: Need to redo steps 3., 4., 5., and 6. if species
- in the master species tree and/or the species list
- are added and/or changed or if a new version of Pfam is used!
-
-
-
-
-7. Example of a phylogenomic analysis using "rio.pl"
-____________________________________________________
-
-
-Without using precalculated distances (for this, all the variables above
-need to point to the correct locations, in particular to your local and processed
-Pfam database):
-
- % RIO1.1/perl/rio.pl 3 A=/path/to/my/pfam/Full/aconitase H=aconitase Q=RIO1.1/LEU2_HAEIN N=QUERY_HAEIN O=out3 p I C E
-
-
-
-Without using precalculated distances (for this, all the variables above
-need to point to the correct locations, in particular to your local and processed
-Pfam database) using a query sequence which is already in the alignment:
-
- % RIO1.1/perl/rio.pl 4 A=/path/to/my/pfam/Full/aconitase N=LEU2_LACLA/5-449 O=out4 p I C E
-
-
-
-Using the example precalculated distances in "/example_data/"
-($RIO_PWD_DIRECTORY, etc. need to point to $PATH_TO_FORESTER."example_data/"):
-
- % RIO1.1/perl/rio.pl 1 A=aconitase Q=RIO1.1/LEU2_HAEIN N=QUERY_HAEIN O=out1 p I C E
-
-
-
-Using a query sequence which is already in the precalculated distances in "/example_data/"
-($RIO_PWD_DIRECTORY, etc. need to point to $PATH_TO_FORESTER."example_data/"):
-
- % RIO1.1/perl/rio.pl 2 A=aconitase N=LEU2_LACLA/5-449 O=out2 p I C E
-
-
-
-For detailed instructions on how to use rio.pl, see the source code,
-or type "rio.pl" without any arguments.
-
-
-
-
-Christian Zmasek
-zmasek@genetics.wustl.edu
-05/26/02
-