Installing Bioperl on Windows
Posted by luyued on 2011-02-11 17:02
Introduction
This installation guide was written by Barry Moore, Nathan Haigh and other Bioperl authors based on the original work of Paul Boutros. The guide was updated for the BioPerl wiki by Chris Fields and Nathan Haigh.
Please report problems and/or fixes to the BioPerl mailing list.
Requirements
NOTE - Only ActivePerl >= 5.8.8.819 is supported by the BioPerl team. Earlier versions may work, but we do not support them. ActivePerl 5.10 also works.
One reason for this requirement is that ActivePerl >= 5.8.8.819 uses Perl Package Manager 4 (PPM4). PPM4 is superior to earlier versions and includes a graphical user interface (GUI). In short, it's easier for us to produce and maintain a package for installation via PPM and also easier for you to do the install! Proceed with earlier versions at your own risk.
To install ActivePerl:
1) Download the ActivePerl MSI from ActiveState
2) Run the ActivePerl Installer (accepting all defaults is fine).
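To confirm that the installed Perl meets the minimum stated under Requirements, a short script along the following lines can be run from a cmd window. This is a minimal sketch rather than an official BioPerl check: the helper name meets_minimum is made up for illustration, and Win32::BuildNumber() is only expected to exist in ActivePerl builds, so the call is wrapped in eval.

```perl
use strict;
use warnings;

# Returns 1 when the given Perl version/build meets the guide's minimum
# (ActivePerl 5.8.8 build 819), otherwise 0.
# $build may be undef when the build number is not available.
sub meets_minimum {
    my ($perl_version, $build) = @_;
    return 0 if $perl_version < 5.008008;
    return 1 if $perl_version > 5.008008;
    # Exactly 5.8.8: the ActivePerl build number decides.
    return (defined $build && $build >= 819) ? 1 : 0;
}

# Win32::BuildNumber() is specific to ActivePerl; eval keeps the script
# alive on Perls that do not provide it.
my $build = eval { Win32::BuildNumber() };

printf "Perl %vd, ActivePerl build %s: %s\n",
    $^V,
    (defined $build ? $build : 'n/a'),
    (meets_minimum("$]", $build)
        ? 'meets the minimum'
        : 'is older than the supported minimum');
```

Saved as, say, check_perl.pl (a hypothetical file name), it is run with 'perl check_perl.pl'.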
Installation using the Perl Package Manager
GUI Installation
1) Start the Perl Package Manager GUI from the Start menu.
2) Go to Edit >> Preferences and click the Repositories tab. Add a new repository for each of the following (note that some URLs differ depending on the Perl version). NOTE - The DB_File installed with ActivePerl 5.10 and above is a stub (i.e. it does not work). The Trouchelle repository below has a working DB_File.
Repositories to add
│ Name │ perl 5.8 │ perl 5.10 │
│ BioPerl-Regular Releases │ http://bioperl.org/DIST │ http://bioperl.org/DIST │
│ BioPerl-Release Candidates │ http://bioperl.org/DIST/RC │ http://bioperl.org/DIST/RC │
│ Kobes │ http://theoryx5.uwinnipeg.ca/ppms │ http://cpan.uwinnipeg.ca/PPMPackages/10xx/ │
│ Bribes │ http://www.Bribes.org/perl/ppm │ http://www.Bribes.org/perl/ppm │
│ Trouchelle │ http://trouchelle.com/ppm │ http://trouchelle.com/ppm10 │
│ tcool │ http://ppm.tcool.org/archives/ │ NA │
3) Select View >> All Packages.
4) In the search box type bioperl.
5) Right-click the latest version of Bioperl available and choose install. (Note for users of previous Bioperl releases: you should not have to use the Bundle-BioPerl package anymore.)
5a) From bioperl 1.5.2 onward, all 'optional' prerequisites will be marked for installation. If some of them report that they need a command-line installation (e.g. XML::SAX::ExpatXS), and you want those particular prerequisites, stop now (skip step 6) and see the 'Command-line Installation' section.
6) Click the green arrow (Run marked actions) to complete the installation.
Command-line Installation
Use the ActiveState ppm-shell:
* Open a cmd window by going to Start >> Run and typing 'cmd' and pressing return.
* Start the PPM shell:
C:> ppm-shell
ppm>
* Make sure you have the module PPM-Repositories. Try installing it:
ppm> install PPM-Repositories
* For BioPerl 1.6.1, we require at least the following repositories. You may have some present already.
ppm> repo add http://bioperl.org/DIST
ppm> repo add uwinnipeg
ppm> repo add trouchelle
Because you have installed PPM-Repositories, PPM will know your Perl version, and select the correct repo from the table above.
* Install BioPerl (not "bioperl").
ppm> install BioPerl
If you are running ActiveState Perl 5.10, you may have a glitch involving SOAP::Lite. Use the following workaround:
* Get the index numbers for your active repositories:
ppm> repo
│ id │ pkgs │ name │
│ 1 │ 11431 │ ActiveState Package Repository │
│ 2 │ 14 │ bioperl.org │
│ 3 │ 291 │ uwinnipeg │
│ 4 │ 11755 │ trouchelle │
* Execute the following commands. (The session here is based on the above table. Substitute the correct index numbers for your situation.)
rem -turn off ActiveState, trouchelle repos
ppm> repo off 1
ppm> repo off 4
rem -to get SOAP-Lite-0.69 from uwinnipeg...
ppm> install SOAP-Lite
rem -turn ActiveState, trouchelle back on...
ppm> repo on 1
ppm> repo on 4
rem -now try...
ppm> install BioPerl
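Once the install finishes, a quick way to see whether Perl can find the new modules is to load one and print its version. The sketch below does that for Bio::Root::Version, the module that carries the BioPerl release number in the 1.5.2 and 1.6 series; module_version is a hypothetical helper name used only for this example.

```perl
use strict;
use warnings;

# Return the version of an installed module, or undef if it cannot be loaded.
sub module_version {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Bio::Root::Version -> Bio/Root/Version.pm
    return undef unless eval { require $file; 1 };
    return $module->VERSION;
}

my $version = module_version('Bio::Root::Version');
print defined $version
    ? "BioPerl $version is installed\n"
    : "BioPerl was not found in \@INC\n";
```

If the script reports that BioPerl was not found, it is worth checking that the perl on your PATH is the ActivePerl that PPM installed into.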
Installation using CPAN or manual installation
Installation using PPM is preferred since it is easier, but if you run into problems, or a PPM isn't available for the version/package of BioPerl you want, or you want to choose which optional dependencies to install, you can install manually by downloading the appropriate package or by using CPAN. Both methods require nmake to be installed, CPAN to be upgraded to >= v1.81, Module::Build to be installed (>= v0.2805) and Test::Harness to be upgraded to >= v2.62:
1) Download nmake
2) Double-click the download to run it; it extracts 3 files. Move NMAKE.EXE and NMAKE.ERR to a directory in your PATH, for example your Perl bin directory, normally C:\Perl\bin.
1) Open a cmd window by going to Start >> Run and typing 'cmd' into the box and pressing return.
2) Type 'cpan' to enter the CPAN shell.
3) At the cpan> prompt, type 'install CPAN' to upgrade to the latest version.
4) Quit (by typing 'q') and reload cpan. You may be asked some configuration questions; accepting defaults is fine.
5) At the cpan> prompt, type 'o conf prefer_installer MB' to tell CPAN to prefer to use Build.PL scripts for installation. Type 'o conf commit' to save that choice.
6) At the cpan> prompt, type 'install Module::Build'.
7) At the cpan> prompt, type 'install Test::Harness'.
You can now follow the Unix instructions for installing using CPAN, or install manually:
8) Download the .zip version of the package you want.
9) Extract the archive in the normal way.
10) In a cmd window 'cd' to the directory you extracted to. For example, if you extracted to the directory 'Temp', type 'cd Temp\bioperl-1.5.2_100'.
11) Type 'perl Build.PL' and answer the questions appropriately.
12) Type 'perl Build test'. All the tests should pass; if they don't, let us know. Your usage of BioPerl may not be affected by the failure, so you can choose to continue anyway.
13) Type 'perl Build install' to install BioPerl.
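If a build step fails, one thing worth ruling out is an outdated prerequisite. The following sketch compares the installed CPAN, Module::Build and Test::Harness against the minimum versions quoted at the start of this section. It relies only on Perl's built-in UNIVERSAL::VERSION check; prerequisite_status is an illustrative name, not part of BioPerl or CPAN.

```perl
use strict;
use warnings;

# Minimum versions quoted at the start of this section.
my %minimum = (
    'CPAN'          => '1.81',
    'Module::Build' => '0.2805',
    'Test::Harness' => '2.62',
);

# Report 'ok', 'too old' or 'missing' for one module.
sub prerequisite_status {
    my ($module, $min_version) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return 'missing' unless eval { require $file; 1 };
    # UNIVERSAL::VERSION dies when the installed version is below $min_version.
    return eval { $module->VERSION($min_version); 1 } ? 'ok' : 'too old';
}

for my $module (sort keys %minimum) {
    printf "%-14s >= %-7s %s\n", $module, $minimum{$module},
        prerequisite_status($module, $minimum{$module});
}
```

Any module reported as 'missing' or 'too old' can be installed or upgraded from the cpan> prompt as described in steps 3 to 7.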
Bioperl
Bioperl is a large collection of Perl modules (extensions to the Perl language) that aid in the task of writing Perl code to deal with sequence data in a myriad of ways. Bioperl provides objects for various types of sequence data and their associated features and annotations. It provides interfaces for analysis of these sequences with a wide variety of external programs (BLAST, FASTA, clustalw and EMBOSS to name just a few). It provides interfaces to various types of databases both remote (GenBank, EMBL etc.) and local (MySQL, flat files, GFF etc.) for storage and retrieval of sequences. And finally, with its associated documentation and mailing lists, Bioperl represents a community of bioinformatics professionals working in Perl who are committed to supporting both development of Bioperl and the new users who are drawn to the project.
While most bioinformatics and computational biology applications are developed in UNIX/Linux environments, more and more programs are being ported to other operating systems like Windows, and many users (often biologists with little background in programming) are looking for ways to automate bioinformatics analyses in the Windows environment.
Perl and Bioperl can be installed natively on Windows NT/2000/XP. Most of the functionality of Bioperl is available with this type of install. Much of the heavy lifting in bioinformatics is done by programs originally developed in lower level languages like C and Pascal (e.g. BLAST, clustalw, Staden etc). Bioperl simply acts as a wrapper for running and parsing output from these external programs.
Some of those programs (BLAST for example) are ported to Windows. These can be installed and work quite happily with Bioperl in the native Windows environment. Some external programs such as Staden and the EMBOSS suite of programs can only be installed on Windows by using Cygwin and its gcc C compiler (see Bioperl in Cygwin, below). Recent attempts to port EMBOSS to Windows, however, have been mostly successful:
* EMBOSS ftp site
* EMBOSS 2.10
If you have a fairly simple project in mind, want to start using Bioperl quickly, only have access to a computer running Windows, and/or don't mind bumping up against some limitations then Bioperl on Windows may be a good place for you to start. For example, downloading a bunch of sequences from GenBank and sorting out the ones that have a particular annotation or feature works great. Running a bunch of your sequences against remote or local BLAST, parsing the output and storing it in a MySQL database would be fine also.
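As a rough illustration of the first task mentioned above, the sketch below fetches records from GenBank by accession and reports those carrying a feature with a given primary tag (for example CDS). It uses Bio::DB::GenBank and the get_SeqFeatures and primary_tag methods from the Bioperl core; the script name, the helper features_with_tag and the command-line layout are assumptions made for this example, and remote retrieval depends on your Bioperl version and on NCBI being reachable.

```perl
use strict;
use warnings;

# Return the features of a sequence object whose primary tag matches $tag.
sub features_with_tag {
    my ($seq, $tag) = @_;
    return grep { $_->primary_tag eq $tag } $seq->get_SeqFeatures;
}

# Usage: perl filter_by_feature.pl <tag> <accession> [<accession> ...]
if (@ARGV >= 2) {
    require Bio::DB::GenBank;    # loaded only when sequences are actually fetched
    my ($tag, @accessions) = @ARGV;
    my $genbank = Bio::DB::GenBank->new;

    for my $accession (@accessions) {
        # eval traps the exception Bioperl throws when retrieval fails.
        my $seq = eval { $genbank->get_Seq_by_acc($accession) };
        unless ($seq) {
            warn "Could not retrieve $accession\n";
            next;
        }
        my @matches = features_with_tag($seq, $tag);
        printf "%s has %d %s feature(s)\n", $accession, scalar @matches, $tag
            if @matches;
    }
}
else {
    print "Usage: perl filter_by_feature.pl <tag> <accession> [<accession> ...]\n";
}
```

Keeping the filter in its own subroutine means the same test can be applied to sequences read from local files.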
Be aware that most Bioperl developers are working in some type of a UNIX environment (Linux, OS X, Cygwin). If you have problems with Bioperl that are specific to the Windows environment, you may be blazing new ground and your pleas for help on the Bioperl mailing list may get few responses (you can but try!) - simply because no one knows the answer to your Windows-specific problem. If this is or becomes a problem for you then you are better off working in some type of UNIX-like environment. One solution to this problem that will keep you working on a Windows machine is to install Cygwin, a UNIX emulation environment for Windows. A number of Bioperl users are using this approach successfully and it is discussed in more detail below.
Perl on Windows
There are a couple of ways of installing Perl on a Windows machine. The most common and easiest is to get the most recent build from ActiveState, a software company that provides free builds of Perl for Windows users. The current (October 2006) build is ActivePerl 5.8.8.819. Bioperl also works on Perl 5.6.x but due to installation problems etc, only ActivePerl 5.8.8.819 or later is supported. To install ActivePerl on Windows:
1) Download the ActivePerl MSI from http://www.activestate.com/Products/ActivePerl/.
2) Run the ActivePerl Installer (accepting all defaults is fine).
You can also build Perl yourself (which requires a C compiler) or download one of the other binary distributions. The Perl source for building it yourself is available from CPAN, as are a few other binary distributions that are alternatives to ActiveState. This approach is not recommended unless you have specific reasons for doing so and know what you're doing. If that's the case you probably don't need to be reading this guide.
Cygwin is a UNIX emulation environment for Windows and comes with its own copy of Perl.
Information on Cygwin and Bioperl is found below.
Bioperl on Windows
Perl is a programming language that has been extended a lot by the addition of external modules.
These modules work with the core language to extend the functionality of Perl.
Bioperl is one such extension to Perl. These modular extensions to Perl sometimes depend on the functionality of other Perl modules and this creates a dependency. You can't install module X unless you have already installed module Y. Some Perl modules are so fundamentally useful that the Perl developers have included them in the core distribution of Perl - if you've installed Perl then these modules are already installed. Other modules are freely available from CPAN, but you'll have to install them yourself if you want to use them. Bioperl has such dependencies.
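The "module X needs module Y" rule above amounts to installing modules in dependency order. The sketch below shows one common way to derive such an order, a depth-first walk over a dependency map. The module names and the map are invented for illustration, and this is not a description of how PPM or CPAN work internally.

```perl
use strict;
use warnings;

# Hypothetical dependency map: each module lists what must be installed first.
my %needs = (
    'Module::X' => ['Module::Y'],
    'Module::Y' => ['Module::Z'],
    'Module::Z' => [],
);

# Return the requested modules and their dependencies in an order where
# every module appears after everything it depends on.
sub install_order {
    my ($deps, @targets) = @_;
    my (%state, @order);
    my $visit;
    $visit = sub {
        my ($module) = @_;
        my $seen = $state{$module} || '';
        return if $seen eq 'done';
        die "Circular dependency involving $module\n" if $seen eq 'visiting';
        $state{$module} = 'visiting';
        $visit->($_) for sort @{ $deps->{$module} || [] };
        $state{$module} = 'done';
        push @order, $module;
    };
    $visit->($_) for sort @targets;
    return @order;
}

# prints: Module::Z -> Module::Y -> Module::X
print join(' -> ', install_order(\%needs, 'Module::X')), "\n";
```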
Bioperl is actually a large collection of Perl modules (over 1000 currently) and these modules are split into seven packages. These seven packages are:
│ Bioperl Group │ Functions │
│ bioperl (the core) │ Most of the main functionality of Bioperl │
│ bioperl-run │ Wrappers to a lot of external programs │
│ bioperl-ext │ Interaction with some alignment functions and the Staden package │
│ bioperl-db │ Using Bioperl with BioSQL and local relational databases │
│ bioperl-microarray │ Microarray specific functions │
│ bioperl-pedigree │ Manipulating genotype, marker, and individual data for linkage studies │
│ bioperl-gui │ Some preliminary work on a graphical user interface to some Bioperl functions │
The Bioperl core is what most new users will want to start with. Bioperl (the core) and the Perl modules that it depends on can be easily installed with the Perl Package Manager (PPM). PPM is an ActivePerl utility for installing Perl modules on systems using ActivePerl. PPM will look online (you have to be connected to the internet of course) for files (these files end with .ppd) that tell it how to install the modules you want and what other modules your new modules depend on. It will then download and install your modules and all dependent modules for you.
These .ppd files are stored online in PPM repositories. ActiveState maintains the largest PPM repository, and when you installed ActivePerl, PPM was installed with directions for using the ActiveState repositories. Unfortunately the ActiveState repositories are far from complete and other ActivePerl users maintain their own PPM repositories to fill in the gaps. Installing Bioperl will require you to direct PPM to look in three new repositories as detailed in the PPM installation instructions above.
Once PPM knows where to look for Bioperl and its dependencies you simply tell PPM to search for packages with a particular name, select those of interest and then tell PPM to install the selected packages.
Beyond the Core
You may find that you want some of the features of other Bioperl groups like bioperl-ext or bioperl-pipeline. Currently, plans include setting up PPM packages for installing these parts of Bioperl; check this by doing a Bioperl search in PPM. If these are not available, though, you can use the following instructions for installing the other distributions.
For bioperl-run, bioperl-db and bioperl-network v1.5.2 or higher you can use the PPD or CPAN installation instructions above. For other packages you will need nmake (see also the CPAN installation instructions), and a willingness to experiment. You'll have to read the installation documents for each component that you want to install, and use nmake where the instructions call for make, like so:
perl Makefile.PL
nmake
nmake test
nmake install
nmake test will likely produce lots of warnings; many of these can be safely ignored. You will have to determine from the installation documents what dependencies are required, and you will have to get them, read their documentation and install them first. It is recommended that you look through the PPM repositories for any modules before resorting to using nmake as there isn't any guarantee modules built using nmake will work. The details of this are beyond the scope of this guide. Read the documentation. Search Google. Try your best, and if you get stuck consult with others on the BioPerl mailing list.
Setting environment variables
Some modules and tools, such as Bio::Tools::Run::StandAloneBlast and clustal_w, require that environment variables are set; a few examples are listed here. Different versions of Windows utilize different methods for setting these variables. NOTE: The instructions that come with the BLAST executables for setting up BLAST on Windows are out-of-date. Go to the following web address for instructions on setting up standalone BLAST for Windows: http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/pc_setup.html
* For Windows XP, go here. This does not require a reboot, but shells that are already open will not reflect any changes made to the environment.
* For older versions (Windows 95 to ME), generally editing the C:\autoexec.bat file to add a variable works. This requires a reboot. Here's an example:
set BLASTDB=C:\blast\data
For either case, you can check the variable this way:
C:\Documents and Settings\Administrator>echo %BLASTDB%
C:\blast\data
Some versions of Windows may have problems differentiating forward and back slashes used for directories. In general, always use backslashes (\). If something isn't working properly try reversing the slashes to see if it helps.
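Because Perl sees the same environment as the cmd window it was started from, a short script can confirm that a variable is visible and points at a real directory. The sketch below checks BLASTDB, the variable used in the example above; check_env_dir is a hypothetical helper name, and other variables can be added to the list as needed.

```perl
use strict;
use warnings;

# Describe the state of one environment variable that should name a directory.
sub check_env_dir {
    my ($name) = @_;
    my $value = $ENV{$name};
    return "$name is not set"
        unless defined $value && length $value;
    return "$name points to '$value', which is not a directory"
        unless -d $value;
    return "$name is set to '$value'";
}

print check_env_dir($_), "\n" for qw(BLASTDB);
```

Run it from a newly opened cmd window so that recent changes to the environment are picked up.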
For quirks in setting Cygwin environment variables, see the example below.
Installing bioperl-db
bioperl-db now works on Windows without installing Cygwin. This has primarily been tested on WinXP using MySQL5, but it is expected that other bioperl-db supported databases (PostgreSQL, Oracle) should work.
You will need Bioperl 1.5.2, a relational database (I use MySQL5 here as an example), and the Perl modules DBI and DBD::mysql, which can be installed from PPM as described above (make sure the additional repositories for Kobes and Bribes are added; they will have the latest releases). Do NOT try using nmake with these modules as they will not build correctly under Windows! The PPM builds, by Randy Kobes, have been modified and tested specifically for Windows and ActivePerl.
NOTE: we plan on having a PPM for bioperl-db available along with the regular bioperl 1.5.2 release PPM. We will post instructions at that time on using PPM to install bioperl-db.
To begin, follow the instructions detailed in the Installation Guide for adding the three new repositories (Bioperl, Kobes and Bribes). Then install the following packages:
1) DBI
2) DBD-mysql
The next step involves creating a database. The following steps are for MySQL5:
>mysqladmin -u root -p create bioseqdb
Enter password: **********
The database needs to be loaded with the BioSQL schema, which can be downloaded as a tarball here.
>mysql -u root -p bioseqdb < biosqldb-mysql.sql
Enter password: **********
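To check from Perl that DBI and DBD::mysql can reach the new database and that the schema tables are present, something like the following sketch can be used. It assumes the database name bioseqdb from the commands above and a MySQL server on localhost; the script name and the mysql_dsn helper are made up for this example, and passing the password on the command line is only reasonable for a quick local test.

```perl
use strict;
use warnings;

# Build a DBI data source name for the MySQL driver.
sub mysql_dsn {
    my ($database, $host) = @_;
    return "dbi:mysql:database=$database;host=$host";
}

# Usage: perl check_biosql.pl <user> <password>
if (@ARGV == 2) {
    require DBI;    # loaded only when a connection is actually attempted
    my ($user, $password) = @ARGV;

    # RaiseError makes connection and query failures die with a message.
    my $dbh = DBI->connect(mysql_dsn('bioseqdb', 'localhost'),
                           $user, $password,
                           { RaiseError => 1, PrintError => 0 });

    my $tables = $dbh->selectcol_arrayref('SHOW TABLES');
    printf "%d table(s) found in bioseqdb\n", scalar @$tables;
    print "  $_\n" for @$tables;
    $dbh->disconnect;
}
else {
    print "Usage: perl check_biosql.pl <user> <password>\n";
}
```

An empty table list suggests the schema file was not loaded into bioseqdb.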
Download bioperl-db from the anonymous Git repository. Use the following to install the modules:
perl Makefile.PL
nmake
Now, for testing out bioperl-db, make a copy of the file DBHarness.conf.example in the bioperl-db test subdirectory (bioperl-db\t). Rename it to DBHarness.biosql.conf, and modify it for your database setup (particularly the user, password, database name, and driver). Save the file, change back to the main bioperl-db directory, and run 'nmake test'. You may see lots of the following lines,
....
Subroutine Bio::Annotation::Reference::(eq redefined at C:/Perl/lib/overload.pm line 25,
line 1.
Subroutine new redefined at C:\Perl\src\bioperl\bioperl-live/Bio\Annotation\Reference.pm line 80,
line 1.
....
which can be safely ignored (these come from ActivePerl's excessively paranoid -w flag). All tests should pass. NOTE: tests should be run with a clean database with the BioSQL schema loaded, but without taxonomy loaded (see below).
To install, run:
nmake install
It is recommended that you load the taxonomy database using the script load_ncbi_taxonomy.pl included in biosql-schema\scripts. You will need to download the latest taxonomy files. This can be accomplished using the -download flag in load_ncbi_taxonomy.pl, but it will not 'untar' the file correctly unless you have GNU tar present in your PATH (which most Windows users will not have), thus causing the following error:
>load_ncbi_taxonomy.pl -download -driver mysql -dbname bioseqdb -dbuser root -dbpass **********
The system cannot find the path specified.
Loading NCBI taxon database in taxdata:
... retrieving all taxon nodes in the database
... reading in taxon nodes from nodes.dmp
Couldn't open data file taxdata/nodes.dmp: No such file or directory rollback ineffective with
AutoCommit enabled at C:\Perl\src\bioperl\biosql-schema\scripts\load_ncbi_taxonomy.pl line 818.
Rollback ineffective while AutoCommit is on at
C:\Perl\src\bioperl\biosql-schema\scripts\load_ncbi_taxonomy.pl line 818.
rollback failed: Rollback ineffective while AutoCommit is on
Use a file decompression utility like 7-Zip to 'untar' the files in the folder (if using 7-Zip, this can be accomplished by right-clicking on the file and using the option 'Extract here'). Rerun the script without the -download flag to load the taxonomic information. Be patient, as this can take quite a while:
>load_ncbi_taxonomy.pl -driver mysql -dbname bioseqdb -dbuser root -dbpass **********
Loading NCBI taxon database in taxdata:
... retrieving all taxon nodes in the database
... reading in taxon nodes from nodes.dmp
... insert / update / delete taxon nodes
... (committing nodes)
... rebuilding nested set left/right values
... reading in taxon names from names.dmp
... deleting old taxon names
... inserting new taxon names
... cleaning up
Done.
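As an alternative to 7-Zip for the unpacking step, the archive can be extracted with Perl's Archive::Tar module, which ships with Perl 5.10 and may need to be installed separately on 5.8. The following is a minimal sketch: the script name and the extract_tarball helper are hypothetical, reading a gzipped archive additionally needs IO::Zlib, and Archive::Tar holds the whole archive in memory, so 7-Zip may remain the better choice on machines with little RAM.

```perl
use strict;
use warnings;
use Archive::Tar;
use Cwd qw(getcwd);

# Extract every entry of $tarball into $dest_dir and return the entry count.
sub extract_tarball {
    my ($tarball, $dest_dir) = @_;

    # The archive is read here, before the directory change below.
    my $tar = Archive::Tar->new($tarball)
        or die "Could not read $tarball\n";

    my $start_dir = getcwd();
    chdir $dest_dir or die "Cannot change to $dest_dir: $!\n";
    my @extracted = $tar->extract;    # no arguments: extract everything
    chdir $start_dir or die "Cannot return to $start_dir: $!\n";

    die "Nothing was extracted from $tarball\n" unless @extracted;
    return scalar @extracted;
}

# Usage: perl untar.pl <archive> <destination directory>
if (@ARGV == 2) {
    my $count = extract_tarball(@ARGV);
    print "Extracted $count entries into $ARGV[1]\n";
}
else {
    print "Usage: perl untar.pl <archive> <destination directory>\n";
}
```

The archive to pass is whatever file the -download step left in the taxdata folder, with taxdata itself as the destination.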
Now, load the database with your sequences using the script l
This installation guide was written by Barry Moore, Nathan Haigh and other Bioperl authors based on the original work of Paul Boutros. The guide was updated for the BioPerl wiki by Chris Fields and Nathan Haigh.
Please report problems and/or fixes to the BioPerl mailing list.
Requirements
NOTE - Only ActivePerl >= 5.8.8.819 is supported by the BioPerl team. Earlier versions may work, but we do not support them. ActivePerl 5.10 also works.
One of the reason for this requirement is that ActivePerl >= 5.8.8.819 now use Perl Package Manager 4 (PPM4). PPM4 is now superior to earlier versions and also includes a Graphical User Interface (GUI). In short, it's easier for us to produce and maintain a package for installation via PPM and also easier for you to do the install! Proceed with earlier versions at your own risk.
To install ActivePerl:
1) Download the ActivePerl MSI from ActiveState
2) Run the ActivePerl Installer (accepting all defaults is fine).
Installation using the Perl Package Manager
GUI Installation
1) Start the Perl Package Manager GUI from the Start menu.
2) Go to Edit >> Preferences and click the Repositories tab. Add a new repository for each of the following (note the difference based on the perl version). NOTE - The DB_File installed with ActivePerl 5.10 and above is a stub (i.e. it does not work). The Trouchelle database below has a working DB_File.
Repositories to add Name perl 5.8 perl 5.10
BioPerl-Regular Releases http://bioperl.org/DIST http://bioperl.org/DIST
BioPerl-Release Candidates http://bioperl.org/DIST/RC http://bioperl.org/DIST/RC
Kobes http://theoryx5.uwinnipeg.ca/ppms http://cpan.uwinnipeg.ca/PPMPackages/10xx/
Bribes http://www.Bribes.org/perl/ppm http://www.Bribes.org/perl/ppm
Trouchelle http://trouchelle.com/ppm http://trouchelle.com/ppm10
tcool http://ppm.tcool.org/archives/ NA
3) Select View >> All Packages.
4) In the search box type bioperl.
5) Right click the latest version of Bioperl available and choose install. (Note for users of previous Bioperl releases: you should not have to use the Bundle-BioPerl package anymore.)
5a) From bioperl 1.5.2 onward, all 'optional' pre-requisites will be marked for installation. If you see that some of them complain about needing a command-line installation (eg. XML::SAX::ExpatXS), and you want those particular pre-requisites, stop now (skip step 6) and see the 'Command-line Installation' section.
6) Click the green arrow (Run marked actions) to complete the installation.
Comand-line Installation
Use the ActiveState ppm-shell:
* Open a cmd window by going to Start >> Run and typing 'cmd' and pressing return.
* Do
C:> ppm-shell
ppm>
* Make sure you have the module PPM-Repositories. Try installing it:
ppm> install PPM-Repositories
* For BioPerl 1.6.1, we require at least the following repositories. You may have some present already.
ppm> repo add http://bioperl.org/DIST
ppm> repo add uwinnipeg
ppm> repo add trouchelle
Because you have installed PPM-Repositories, PPM will know your Perl version, and select the correct repo from the table above.
* Install BioPerl (not "bioperl").
ppm> install BioPerl
If you are running ActiveState Perl 5.10, you may have a glitch involving SOAP::Lite. Use the following workaround:
* Get the index numbers for your active repositories:
ppm> repo
│ id │ pkgs │ name │
│ 1 │ 11431 │ ActiveState Package Repository │
│ 2 │ 14 │ bioperl.org │
│ 3 │ 291 │ uwinnipeg │
│ 4 │ 11755 │ trouchelle │
* Execute the following commands. (The session here is based on the above table. Substitute the correct index numbers for your situation.)
rem -turn off ActiveState, trouchelle repos
ppm> repo off 1
ppm> repo off 4
rem -to get SOAP-Lite-0.69 from uwinnipeg...
ppm> install SOAP-Lite
rem -turn ActiveState, trouchelle back on...
ppm> repo on 1
ppm> repo on 4
rem -now try...
ppm> install BioPerl
Installation using CPAN or manual installation
Installation using PPM is preferred since it is easier, but if you run into problems, or a PPM isn't available for the version/package of BioPerl you want, or you want to choose which optional dependencies to install, you can install manually by downloading the appropriate package or by using CPAN. In fact both methods ultimately need nmake to be installed, CPAN to be upgraded to >= v1.81, Module::Build to be installed (>= v0.2805) and Test::Harness to be upgraded to >= v2.62:
1) Download nmake
2) Double-click to run it, which extracts 3 files. Move both NMAKE.EXE and the NMAKE.ERR files to a place in your PATH; if set up properly, you can move these to your Perl bin directory, normally C:\Perl\bin.
1) Open a cmd window by going to Start >> Run and typing 'cmd' into the box and pressing return.
2) Type 'cpan' to enter the CPAN shell.
3) At the cpan> prompt, type 'install CPAN' to upgrade to the latest version.
4) Quit (by typing 'q') and reload cpan. You may be asked some configuration questions; accepting defaults is fine.
5) At the cpan> prompt, type 'o conf prefer_installer MB' to tell CPAN to prefer to use Build.PL scripts for installation. Type 'o conf commit' to save that choice.
6) At the cpan> prompt, type 'install Module::Build'.
7) At the cpan> prompt, type 'install Test::Harness'.
You can now follow the unix instructions for installing using CPAN, or install manually:
8) Download the .zip version of the package you want.
9) Extract the archive in the normal way.
10) In a cmd window 'cd' to the directory you extracted to. Eg. if you extracted to directory 'Temp', 'cd Temp\bioperl-1.5.2_100'
11) Type 'perl Build.PL' and answer the questions appropriately.
12) Type 'perl Build test'. All the tests should pass, but if they don't let us know. Your usage of BioPerl may not be affected by the failure, so you can choose to continue anyway.
13) Type 'perl Build install' to install BioPerl.
Bioperl
Bioperl is a large collection of Perl modules (extensions to the Perl language) that aid in the task of writing Perl code to deal with sequence data in a myriad of ways. Bioperl provides objects for various types of sequence data and their associated features and annotations. It provides interfaces for analysis of these sequences with a wide variety of external programs (BLAST, FASTA, clustalw and EMBOSS to name just a few). It provides interfaces to various types of databases both remote (GenBank, EMBL etc) and local (MySQL, Flat_databases flat files, GFF etc.) for storage and retrieval of sequences. And finally with its associated documentation and mailing lists, Bioperl represents a community of bioinformatics professionals working in Perl who are committed to supporting both development of Bioperl and the new users who are drawn to the project.
While most bioinformatics and computational biology applications are developed in UNIX/Linux environments, more and more programs are being ported to other operating systems like Windows, and many users (often biologists with little background in programming) are looking for ways to automate bioinformatics analyses in the Windows environment.
Perl and Bioperl can be installed natively on Windows NT/2000/XP. Most of the functionality of Bioperl is available with this type of install. Much of the heavy lifting in bioinformatics is done by programs originally developed in lower level languages like C and Pascal (e.g. BLAST, clustalw, Staden etc). Bioperl simply acts as a wrapper for running and parsing output from these external programs.
Some of those programs (BLAST for example) are ported to Windows. These can be installed and work quite happily with Bioperl in the native Windows environment. Some external programs such as Staden and the EMBOSS suite of programs can only be installed on Windows by using Cygwin and its gcc C compiler (see Bioperl in Cygwin, below). Recent attempts to port EMBOSS to Windows, however, have been mostly successful:
* EMBOSS ftp site
* EMBOSS 2.10
If you have a fairly simple project in mind, want to start using Bioperl quickly, only have access to a computer running Windows, and/or don't mind bumping up against some limitations then Bioperl on Windows may be a good place for you to start. For example, downloading a bunch of sequences from GenBank and sorting out the ones that have a particular annotation or feature works great. Running a bunch of your sequences against remote or local BLAST, parsing the output and storing it in a MySQL database would be fine also.
Be aware that most Bioperl developers are working in some type of a UNIX environment (Linux, OS X, Cygwin). If you have problems with Bioperl that are specific to the Windows environment, you may be blazing new ground and your pleas for help on the Bioperl mailing list may get few responses (you can but try!) - simply because no one knows the answer to your Windows specific problem. If this is or becomes a problem for you then you are better off working in some type of UNIX-like environment. One solution to this problem that will keep you working on a Windows machine it to install Cygwin, a UNIX emulation environment for Windows. A number of Bioperl users are using this approach successfully and it is discussed in more detail below.
Perl on Windows
There are a couple of ways of installing Perl on a Windows machine. The most common and easiest is to get the most recent build from ActiveState, a software company that provides free builds of Perl for Windows users. The current (October 2006) build is ActivePerl 5.8.8.819. Bioperl also works on Perl 5.6.x but due to installation problems etc, only ActivePerl 5.8.8.819 or later is supported. To install ActivePerl on Windows:
1) Download the ActivePerl MSI from http://www.activestate.com/Products/ActivePerl/.
2) Run the ActivePerl Installer (accepting all defaults is fine).
You can also build Perl yourself (which requires a C compiler) or download one of the other binary distributions. The Perl source for building it yourself is available from CPAN, as are a few other binary distributions that are alternatives to ActiveState. This approach is not recommended unless you have specific reasons for doing so and know what you're doing. If that's the case you probably don't need to be reading this guide.
Cygwin is a UNIX emulation environment for Windows and comes with its own copy of Perl.
Information on Cygwin and Bioperl is found below.
Bioperl on Windows
Perl is a programming language that has been extended a lot by the addition of external modules.
These modules work with the core language to extend the functionality of Perl.
Bioperl is one such extension to Perl. These modular extensions to Perl sometimes depend on the functionality of other Perl modules and this creates a dependency. You can't install module X unless you have already installed module Y. Some Perl modules are so fundamentally useful that the Perl developers have included them in the core distribution of Perl - if you've installed Perl then these modules are already installed. Other modules are freely available from CPAN, but you'll have to install them yourself if you want to use them. Bioperl has such dependencies.
Bioperl is actually a large collection of Perl modules (over 1000 currently) and these modules are split into seven packages. These seven packages are:
Bioperl Group Functions
bioperl (the core) Most of the main functionality of Bioperl
bioperl-run Wrappers to a lot of external programs
bioperl-ext Interaction with some alignment functions and the Staden package
bioperl-db Using Bioperl with BioSQL and local relational databases
bioperl-microarray Microarray specific functions
bioperl-pedigree manipulating genotype, marker, and individual data for linkage studies
bioperl-gui Some preliminary work on a graphical user interface to some Bioperl functions
The Bioperl core is what most new users will want to start with. Bioperl (the core) and the Perl modules that it depends on can be easily installed with the Perl Package Manager (PPM). PPM is an ActivePerl utility for installing Perl modules on systems using ActivePerl. PPM looks online (so you must be connected to the internet) for files ending in .ppd that tell it how to install the modules you want and which other modules they depend on. It then downloads and installs your modules and all of their dependencies for you.
These .ppd files are stored online in PPM repositories. ActiveState maintains the largest PPM repository, and when you installed ActivePerl, PPM was installed already configured to use the ActiveState repositories. Unfortunately, the ActiveState repositories are far from complete, and other ActivePerl users maintain their own PPM repositories to fill in the gaps. Installing Bioperl requires you to direct PPM to look in three new repositories, as detailed in the PPM installation guide.
Once PPM knows where to look for Bioperl and its dependencies you simply tell PPM to search for packages with a particular name, select those of interest and then tell PPM to install the selected packages.
Beyond the Core
You may find that you want some of the features of other Bioperl groups like bioperl-ext or bioperl-pipeline. Currently, plans include setting up PPM packages for installing these parts of Bioperl; check this by doing a Bioperl search in PPM. If these are not available, though, you can use the following instructions for installing the other distributions.
For bioperl-run, bioperl-db and bioperl-network v1.5.2 or higher you can use the PPD or CPAN installation instructions above. For other packages you will need nmake (see also the CPAN installation instructions), and a willingness to experiment. You'll have to read the installation documents for each component that you want to install, and use nmake where the instructions call for make, like so:
perl Makefile.PL
nmake
nmake test
nmake install
nmake test will likely produce lots of warnings, many of which can be safely ignored. You will have to determine from the installation documents which dependencies are required, then get them, read their documentation and install them first. It is recommended that you look through the PPM repositories for any modules before resorting to nmake, as there is no guarantee that modules built using nmake will work. The details of this are beyond the scope of this guide. Read the documentation. Search Google. Try your best, and if you get stuck, consult with others on the BioPerl mailing list.
Setting environment variables
Some modules and tools, such as Bio::Tools::Run::StandAloneBlast and clustal_w, require that environment variables are set; a few examples are listed here. Different versions of Windows use different methods for setting these variables. NOTE: The instructions that come with the BLAST executables for setting up BLAST on Windows are out-of-date. Go to the following web address for instructions on setting up standalone BLAST for Windows: http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/pc_setup.html
* For Windows XP, go here. This does not require a reboot, but shells that are already open will not reflect the changes made to the environment; open a new shell to see them.
* For older versions (Windows 95 to ME), generally editing the C:\autoexec.bat file to add a variable works. This requires a reboot. Here's an example:
set BLASTDB=C:\blast\data
For either case, you can check the variable this way:
C:\Documents and Settings\Administrator>echo %BLASTDB%
C:\blast\data
Some versions of Windows may have problems differentiating forward and back slashes used for directories. In general, always use backslashes (\). If something isn't working properly try reversing the slashes to see if it helps.
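As a complement to the echo check above, the following minimal Python sketch reports whether a variable is set and shows its value with forward slashes converted to backslashes. The function names are hypothetical and are not part of Bioperl or BLAST; BLASTDB is simply the example variable used in this guide.

```python
import ntpath  # Windows path rules, usable on any platform
import os


def normalize_windows_path(path):
    """Convert forward slashes to backslashes and collapse redundant separators."""
    return ntpath.normpath(path)


def describe_variable(name, environ=None):
    """Return a one-line report on an environment variable (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    value = environ.get(name)
    if not value:
        return f"{name} is not set"
    return f"{name}={normalize_windows_path(value)}"


print(describe_variable("BLASTDB"))
```

Run it from a newly opened shell, since shells that were already open when the variable was set will not see the change.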
For quirks in setting Cygwin environment variables, see the example below.
Installing bioperl-db
bioperl-db now works on Windows without installing Cygwin. This has primarily been tested on WinXP using MySQL5, but other bioperl-db supported databases (PostgreSQL, Oracle) are expected to work.
You will need Bioperl 1.5.2, a relational database (I use MySQL5 here as an example), and the Perl modules DBI and DBD::mysql, which can be installed from PPM as described above (make sure the additional repositories for Kobes and Bribes are added; they will have the latest releases). Do NOT try using nmake with these modules as they will not build correctly under Windows! The PPM builds, by Randy Kobes, have been modified and tested specifically for Windows and ActivePerl.
NOTE: we plan on having a PPM for bioperl-db available along with the regular bioperl 1.5.2 release PPM. We will post instructions at that time on using PPM to install bioperl-db.
To begin, follow the instructions in the Installation Guide for adding the three new repositories (Bioperl, Kobes and Bribes). Then install the following packages:
1) DBI
2) DBD-mysql
The next step involves creating a database. The following steps are for MySQL5:
>mysqladmin -u root -p create bioseqdb
Enter password: **********
The database needs to be loaded with the BioSQL schema, which can be downloaded as a tarball here.
>mysql -u root -p bioseqdb < biosqldb-mysql.sql
Enter password: **********
Download bioperl-db from the anonymous Git repository. Use the following to install the modules:
perl Makefile.PL
nmake
Now, for testing out bioperl-db, make a copy of the file DBHarness.conf.example in the bioperl-db test subdirectory (bioperl-db\t). Rename it to DBHarness.biosql.conf, and modify it for your database setup (particularly the user, password, database name, and driver). Save the file, change back to the main bioperl-db directory, and run 'nmake test'. You may see lots of the following lines,
....
Subroutine Bio::Annotation::Reference::(eq redefined at C:/Perl/lib/overload.pm line 25,
Subroutine new redefined at C:\Perl\src\bioperl\bioperl-live/Bio\Annotation\Reference.pm line 80,
....
which can be safely ignored (these come from ActivePerl's excessively paranoid -w flag). All tests should pass. NOTE: tests should be run against a clean database with the BioSQL schema loaded, but without taxonomy loaded (see below).
To install, run:
nmake install
It is recommended that you load the taxonomy database using the script load_ncbi_taxonomy.pl included in biosql-schema\scripts. You will need to download the latest taxonomy files. This can be accomplished using the -download flag in load_ncbi_taxonomy.pl, but it will not 'untar' the file correctly unless you have GNU tar present in your PATH (which most Windows users will not have), thus causing the following error:
>load_ncbi_taxonomy.pl -download -driver mysql -dbname bioseqdb -dbuser root -dbpass **********
The system cannot find the path specified.
Loading NCBI taxon database in taxdata:
... retrieving all taxon nodes in the database
... reading in taxon nodes from nodes.dmp
Couldn't open data file taxdata/nodes.dmp: No such file or directory rollback ineffective with
AutoCommit enabled at C:\Perl\src\bioperl\biosql-schema\scripts\load_ncbi_taxonomy.pl line 818.
Rollback ineffective while AutoCommit is on at
C:\Perl\src\bioperl\biosql-schema\scripts\load_ncbi_taxonomy.pl line 818.
rollback failed: Rollback ineffective while AutoCommit is on
Use a file decompression utility like 7-Zip to 'untar' the files in the folder (if using 7-Zip, this can be accomplished by right-clicking on the file and using the option 'Extract here'). Rerun the script without the -download flag to load the taxonomic information. Be patient, as this can take quite a while:
>load_ncbi_taxonomy.pl -driver mysql -dbname bioseqdb -dbuser root -dbpass **********
Loading NCBI taxon database in taxdata:
... retrieving all taxon nodes in the database
... reading in taxon nodes from nodes.dmp
... insert / update / delete taxon nodes
... (committing nodes)
... rebuilding nested set left/right values
... reading in taxon names from names.dmp
... deleting old taxon names
... inserting new taxon names
... cleaning up
Done.
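If no decompression utility is at hand, the 'untar' step described above can also be done with a short Python script. This is only a sketch under stated assumptions: it assumes the downloaded archive is a gzip-compressed tar file, and the file name taxdump.tar.gz and its location inside the taxdata folder are assumptions, so adjust them to match what the script actually downloaded.

```python
import tarfile
from pathlib import Path


def extract_taxdump(archive, dest="taxdata"):
    """Unpack a .tar.gz archive into `dest` and return the sorted file names found there."""
    dest_path = Path(dest)
    dest_path.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest_path)
    return sorted(p.name for p in dest_path.iterdir())


if __name__ == "__main__":
    # Assumed archive location; change it if your download is stored elsewhere.
    archive = Path("taxdata") / "taxdump.tar.gz"
    if archive.exists():
        print(extract_taxdump(archive))
```

After extraction, the taxdata folder should contain nodes.dmp and names.dmp, the two files the loading output above refers to.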
Now, load the database with your sequences using the script l