compiling Pygr on Windows?

Christopher Lee

Apr 16, 2009, 5:04:10 PM
to Pygr Development Group
Hi Istvan, anyone else,
can you give me some pointers on compiling Pygr on Windows? It looks
like I am able to compile Python extension modules (e.g. from pygr
import cnestedlist raises no errors), but basic POSIX functionality
like fdopen() appears to fail on this platform (Windows XP
Professional, compiling with mingw32; see details below). For example
in pygr.seqfmt.read_fasta_lengths(), fdopen() on a file descriptor
returns NULL, i.e. it fails to open a stream, so SequenceFileDB
objects appear to be empty (it fails to index any of the sequences in
the database). Based on running the test suite, my setup only seems
to run the pure Python parts of Pygr correctly; the extension modules
are not working for me.
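
One way to narrow this down is to try the same operation from pure Python. The sketch below is only an illustration: count_fasta_records is a hypothetical helper, not part of Pygr. It wraps a raw file descriptor with os.fdopen() and counts FASTA header lines. If this succeeds while fdopen() inside the compiled extension still returns NULL, that would suggest the problem lies in the C runtime the extension was linked against rather than in the file itself (an assumption, not something verified here).

```python
import os


def count_fasta_records(path):
    """Count FASTA header lines by reading through a raw file descriptor."""
    fd = os.open(path, os.O_RDONLY)   # low-level descriptor, like C open()
    stream = os.fdopen(fd, "r")       # Python-level counterpart of C fdopen()
    try:
        return sum(1 for line in stream if line.startswith(">"))
    finally:
        stream.close()                # also closes the underlying descriptor
```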

If someone has Pygr compiling successfully on Windows, please tell me
how to set that up. Thanks!!!

FYI, here is my setup:
- Windows XP Professional installed in a VMWare Fusion 2.0.4 virtual
machine, running on my Macbook Pro.
- Python 2.5.4 installed via the standard windows installer
- mingw 5.1.4 installed using their automatic installer, including gcc
3 (not gcc 4)
- Pyrex 0.9.8.5
- I used pexports / libtool to replace the libpython25.a in
C:\Python25\libs, as people on the web said was necessary to enable
mingw32 to compile Python extensions. Is that correct? Other pages
seem to say that the gcc 3 in mingw may have problems which might be
related to the fdopen() failure... Another guy on the web says he has
fixed gcc 4 in mingw to compile Python extensions properly... On the
other hand gcc 4 is only available as a mingw alpha release...

Marek seems to be seeing the same problems on his Windows XP PC, using
a similar Python / mingw setup (see his details below).

Yours,

Chris

Begin forwarded message from Marek Szuba:

> On Wed, 15 Apr 2009 19:37:35 -0700
> Christopher Lee <le...@chem.ucla.edu> wrote:
>
>> can you tell me exactly what you had to do to get Python extension
>> modules to build properly with mingw? I ran into the whole "Python
>> 2.5 was compiled with MS visual studio..." problem and followed
>> advice on the web to replace libpython25.a with one built using
>> pexports / libtool. Combining that with the --compiler=mingw32 seems
>> to build extension files, which Python can import without complaint.
>> However, I don't think the extensions are working properly -- tons
>> of tests fail.
> I'm actually in a very similar situation to yours - the only thing I've
> done differently is skip the pexports / libtool step, since I read that
> this has no longer been necessary since Python 2.5 was released. Result -
> everything builds fine but there are many errors, plus Python crashes
> when I reach NLMSA tests.
>
> --
> MS

Istvan Albert

Apr 17, 2009, 4:05:20 PM
to pygr-dev


On Apr 16, 5:04 pm, Christopher Lee <l...@chem.ucla.edu> wrote:

> can you give me some pointers on compiling Pygr on Windows?  It looks  
> like I am able to compile Python extension modules (e.g. from pygr  
> import cnestedlist raises no errors), but basic POSIX functionality  
> like fdopen() appears to fail on this platform (Windows XP  

I believe that to compile a native Windows application you'll need to
use a Microsoft compiler. Anything else is probably asking for
trouble. The two options that people wanting to use pygr on Windows
could take are :

- installing cygwin and compiling pygr within cygwin as if it were
unix, no changes should be necessary. It should compile fine although
I remember once having trouble with bsdb not being installed properly

- using the Microsoft compiler to create native windows libraries.
This again works and compiles right out of the box with no changes
necessary. The nice thing is that it produces a binary installer
that is really easy to distribute.

The third option that I often see mentioned is using mingw32 with various
options. I know very little about it, but what I do know is that there is
always some issue or trouble and it does not work properly. As long as
there are the other two options it does not seem necessary.

And by the way, even if pygr were only available for python 2.5
(on windows), it would still be a fairly good solution. I hope this
will not rub the linux fans the wrong way, but I'll say that it is far
easier to have multiple pythons on windows than linux. You download it
and double click and you are done. Different pythons go to separate
directories, with no cross dependencies. This is unlike unix type systems
where keeping various python versions/libraries separate is a little
trickier, as they may depend on other libraries, runtimes etc. With
windows all python dependency libraries are distributed and stored
inside the respective python distribution. So it is a far less
limiting factor.

Istvan

Marek Szuba

Apr 17, 2009, 6:27:27 PM
to pygr...@googlegroups.com
On Fri, 17 Apr 2009 13:05:20 -0700 (PDT)
Istvan Albert <istvan...@gmail.com> wrote:

> I believe that to compile a native Windows application you'll need to
> use a Microsoft compiler.

It's definitely the easiest way, yes. Still, it would be great if we
could get MinGW-based builds working:
- it would greatly simplify Windows building - with Visual Studio you
need 2008 for 2.6 versions, 2003 for 2.5 and 2.4, Bob knows what
version for 2.3... At least we don't actually have to buy all this
software, with most major universities and labs having bulk licences
for MS products;
- I would very much like to get us out of vendor lock-in with
Microsoft, for obvious reasons.
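
To see which Microsoft compiler a given Python was built with, the interpreter can be asked directly. This is a small sketch, not an official tool: describe_compiler and the MSC_VERSIONS table are made up for illustration, and the table lists only the two compiler versions relevant to this thread (python.org builds of 2.4 and 2.5 report "MSC v.1310", builds of 2.6 report "MSC v.1500").

```python
import platform
import re

# _MSC_VER values reported by CPython builds made with Microsoft compilers.
MSC_VERSIONS = {
    1310: "Visual Studio .NET 2003 (VC 7.1)",
    1500: "Visual Studio 2008 (VC 9.0)",
}


def describe_compiler(compiler_string):
    """Map a platform.python_compiler() string to a Visual Studio release."""
    match = re.search(r"MSC v\.(\d+)", compiler_string)
    if match is None:
        return "not an MSVC build: " + compiler_string
    version = int(match.group(1))
    return MSC_VERSIONS.get(version, "unlisted MSVC version %d" % version)


if __name__ == "__main__":
    # Report on the interpreter that is running this script.
    print(describe_compiler(platform.python_compiler()))
```

An extension generally needs to be built against the same C runtime as the interpreter it will be loaded into, which is why the compiler version matters here.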

> - installing cygwin and compiling pygr within cygwin as if it were
> unix, no changes should be necessary. It should compile fine although
> I remember once having trouble with bsdb not being installed properly

While on the subject of Cygwin (which, by the way, indeed appears to
work fine) - have you by any chance ever got MySQLdb to work in this
environment? I've briefly tried both ways and it's a mess, definitely
not something I would expect an ordinary Pygr user to deal with:
- a full-Cygwin installation requires compiling MySQL client libraries
from the source, which may or may not work - with the latest version I
had to edit the configure script to override one of the checks, then
copy sys/ttydefaults.h into Cygwin from my Linux box - and is a lot of
hassle in either case;
- apparently it should be possible, with the usual pexports / libtool
trickery, to build Cygwin MySQLdb against native MySQL - but that
involves even more hacking - the aforementioned trickery aside, the
Win32 installation of MySQL doesn't seem to contain mysql_config which
setup.py tries to use when under Cygwin.

> I'll say that it is far easier to have multiple pythons on windows
> than linux. You download it and double click and you are done.
> Different pythons go to separate directories no cross dependencies.

I don't quite agree here - on one hand it's entirely possible, even if
mostly unnecessary, to install into e.g. /usr/local/pythonX.X under
Linux/Unix, on the other at least starting with 2.4 there are no
cross-dependencies between Python versions even if you compile and
install them by hand rather than using a
click-and-you're-done-just-like-under-Windows distribution package
(everything, from executables and the runtime library through the big
Python directory in lib/ to man pages has the version-number postfix,
and even if you have all of them in the search path you can switch
between them simply by calling the correct executable)... However,
that's beside the point Pygr-wise so we needn't pursue the matter
further.

--
MS

Istvan Albert

Apr 17, 2009, 7:54:52 PM
to pygr-dev


On Apr 17, 6:27 pm, Marek Szuba <mare...@gmail.com> wrote:

> Still, it would be great if we
> could get MinGW-based builds working:

Sure, that could be a goal to pursue. Alas I don't have much to
contribute in this area as I am not using the mingw compiler myself.

Istvan

Marek Szuba

Apr 17, 2009, 8:18:33 PM
to pygr...@googlegroups.com
On Fri, 17 Apr 2009 16:54:52 -0700 (PDT)
Istvan Albert <istvan...@gmail.com> wrote:

> > Still, it would be great if we
> > could get MinGW-based builds working:
> Sure, that could be a goal to pursue. Alas I don't have much to
> contribute in this area as I am not using the mingw compiler myself.

I guess we could start by figuring out whether the problem we've
encountered is already known... Chris, could you either send me the
output of your tests (judging from your comments you've looked into
the matter more deeply than just firing runtest) or ask MinGW
people about this yourself? Also, I suppose we should probably get a
copy of Visual Studio 2003 to keep things going while the original
problem is being addressed.

--
MS

C. Titus Brown

Apr 17, 2009, 9:22:05 PM
to pygr...@googlegroups.com
On Fri, Apr 17, 2009 at 03:27:27PM -0700, Marek Szuba wrote:
-> On Fri, 17 Apr 2009 13:05:20 -0700 (PDT)
-> Istvan Albert <istvan...@gmail.com> wrote:
->
-> > I believe that to compile a native Windows application you'll need to
-> > use a Microsoft compiler.
->
-> It's definitely the easiest way, yes. Still, it would be great if we
-> could get MinGW-based builds working:
-> - it would greatly simplify Windows building - with Visual Studio you
-> need 2008 for 2.6 versions, 2003 for 2.5 and 2.4, Bob knows what
-> version for 2.3... At least we don't actually have to buy all this
-> software, with most major universities and labs having bulk licences
-> for MS products;
-> - I would very much like to get us out of vendor lock-in with
-> Microsoft, for obvious reasons.

I don't think that a pygr compiled by mingw will work with the binary
Python downloads from python.org. As such, it's going to be more of a
curiosity than a serious deployment option for people.

--titus

Christopher Lee

Apr 17, 2009, 10:32:04 PM
to pygr...@googlegroups.com

On Apr 17, 2009, at 3:27 PM, Marek Szuba wrote:

> Still, it would be great if we
> could get MinGW-based builds working:
> - it would greatly simplify Windows building - with Visual Studio you
> need 2008 for 2.6 versions, 2003 for 2.5 and 2.4, Bob knows what
> version for 2.3... At least we don't actually have to buy all this
> software, with most major universities and labs having bulk licences
> for MS products;

Istvan, since you're compiling Pygr for Python 2.5, does that mean you
are using Visual Studio 2003? At Amazon, the cost appears to be
$800... is there a less expensive way to jump into this?

-- Chris

Christopher Lee

Apr 17, 2009, 10:34:29 PM
to pygr...@googlegroups.com

On Apr 17, 2009, at 1:05 PM, Istvan Albert wrote:

>>
>
> I believe that to compile a native Windows application you'll need to
> use a Microsoft compiler. Anything else is probably asking for
> trouble. The two options that people wanting to use pygr on Windows
> could take are :
>
> - installing cygwin and compiling pygr within cygwin as if it were
> unix, no changes should be necessary. It should compile fine although
> I remember once having trouble with bsdb not being installed properly
>
> - using the Microsoft compiler to create native windows libraries.
> This again works and compiles right out of the box with no changes
> necessary. The nice thing is that it produces a binary installer
> that is really easy to distribute.

Using these options, do all (or most) of the extension module tests
pass for you? It would be great if you could send an example output
of running the tests on Windows, so I know what I should be aiming
for...

>
>
> The third option that I often see mentioned is using mingw32 with various
> options. I know very little about it, but what I do know is that there is
> always some issue or trouble and it does not work properly. As long as
> there are the other two options it does not seem necessary.

OK, this makes sense. We should use the best established tools on the
platform, and mingw seems to be the least well established on Windows,
compared to Microsoft or cygwin...

Thanks for setting me straight on this.

-- Chris

C. Titus Brown

Apr 17, 2009, 10:42:00 PM
to pygr...@googlegroups.com
On Fri, Apr 17, 2009 at 07:32:04PM -0700, Christopher Lee wrote:
-> On Apr 17, 2009, at 3:27 PM, Marek Szuba wrote:
->
-> > Still, it would be great if we
-> > could get MinGW-based builds working:
-> > - it would greatly simplify Windows building - with Visual Studio you
-> > need 2008 for 2.6 versions, 2003 for 2.5 and 2.4, Bob knows what
-> > version for 2.3... At least we don't actually have to buy all this
-> > software, with most major universities and labs having bulk licences
-> > for MS products;
->
-> Istvan, since you're compiling Pygr for Python 2.5, does that mean you
-> are using Visual Studio 2003? At Amazon, the cost appears to be
-> $800... is there a less expensive way to jump into this?

Many universities have free or inexpensive site-wide VS licenses.

I can give you remote access to a Windows machine if you like; it's the
one we're setting up for the Windows buildbot.

Snakebite (snakebite.org) will make VS available to interested
developers, too. But that's a bit in the future.

cheers,
--titus
--
C. Titus Brown, c...@msu.edu

Christopher Lee

Apr 18, 2009, 12:58:44 AM
to pygr...@googlegroups.com

On Apr 17, 2009, at 1:05 PM, Istvan Albert wrote:

>
> - installing cygwin and compiling pygr within cygwin as if it were
> unix, no changes should be necessary. It should compile fine although
> I remember once having trouble with bsdb not being installed properly

I installed cygwin and built Pygr using it without any problems. I
hit two problems in the test suite:

- seqdb_test.SequenceFileDB_Creation_Test hangs because Titus' test
deletes the shelve file from the filesystem using os.unlink() without
first making sure that the shelve is closed. On Windows this leads to
mayhem; trying to open the shelve later hangs. This is a problem with
the testing framework, rather than Pygr itself. But it does highlight
the importance of ensuring that shelve files get closed properly.
Currently we rely on shelve's default behavior (i.e. when the shelve
object is garbage collected, it closes itself). That doesn't seem
wrong, but it creates a risk that users may try to do things like
delete the seqlen file without having first closed the sequence file db.

The only idea that comes to mind is to make explicitly closing a
sequence database object a requirement of the SequenceDB API, i.e.
you must execute db.close() when you are done with the database. Does
that seem like a good requirement to add?

- another hang in DNAAnnotation_Test: looks like this is due to the
same problem. In both cases, a mysterious __db file appears in place
of the expected shelve, in this case __db.tryannot.seqIDdict. My
guess is that since the same test is run 3 times (once each by
metabase_test, pygrdata2_test and pygrdata_test) we're again getting
into some situation where it tries to reopen or recreate a shelve that
is still open by some other object?
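
The ordering at issue (close the shelve first, delete its files afterwards) can be sketched as follows. This is a minimal illustration using only the standard library; build_and_discard_index and the "seqlen" file name are hypothetical and do not come from the Pygr test suite. The whole temporary directory is removed instead of a single file, because the file names a shelve creates depend on the dbm backend in use.

```python
import os
import shelve
import shutil
import tempfile


def build_and_discard_index():
    """Create a shelve, close it explicitly, and only then delete its files."""
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "seqlen")   # hypothetical index name
    db = shelve.open(path)
    try:
        db["seq1"] = 1200                    # sequence ID -> length
        n_entries = len(db)
    finally:
        db.close()                           # release the file handle first
    shutil.rmtree(workdir)                   # no open handle remains at this point
    return n_entries, os.path.exists(workdir)
```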

Yours,

Chris

C. Titus Brown

Apr 18, 2009, 9:51:21 AM
to pygr...@googlegroups.com
On Fri, Apr 17, 2009 at 09:58:44PM -0700, Christopher Lee wrote:

-> On Apr 17, 2009, at 1:05 PM, Istvan Albert wrote:
->
-> >
-> > - installing cygwin and compiling pygr within cygwin as if it were
-> > unix, no changes should be necessary. It should compile fine although
-> > I remember once having trouble with bsdb not being installed properly
->
-> I installed cygwin and built Pygr using it without any problems. I
-> hit two problems in the test suite:
->
-> - seqdb_test.SequenceFileDB_Creation_Test hangs because Titus' test
-> deletes the shelve file from the filesystem using os.unlink() without
-> first making sure that the shelve is closed. On Windows this leads to
-> mayhem; trying to open the shelve later hangs. This is a problem with
-> the testing framework, rather than Pygr itself. But it does highlight
-> the importance of ensuring that shelve files get closed properly.
-> Currently we rely on shelve's default behavior (i.e. when the shelve
-> object is garbage collected, it closes itself). That doesn't seem
-> wrong, but it creates a risk that users may try to do things like
-> delete the seqlen file without having first closed the sequence file db?
->
-> The only idea that comes to mind is to make explicitly closing a
-> sequence database object a requirement of the SequenceDB API? i.e.
-> you must execute db.close() when you are done with the database. Does
-> that seem like a good requirement to add?

Yes; I thought about suggesting it during my review but couldn't think
of a functional reason for doing so; it just seemed neat. Now we have a
reason, though ;)

(Providing the function is sufficient; I don't think we need to require
that it be called in most circumstances.)

Istvan Albert

Apr 18, 2009, 2:21:11 PM
to pygr-dev


On Apr 17, 10:32 pm, Christopher Lee <l...@chem.ucla.edu> wrote:

> Istvan, since you're compiling Pygr for Python 2.5, does that mean you  
> are using Visual Studio 2003?  At Amazon, the cost appears to be  
> $800... is there a less expensive way to jump into this?

The educational version used to be around 30 dollars when purchased
from the university computer store.

But this product has since been discontinued (it was replaced by other
versions), so you'd need to look in other places to get it, maybe eBay.
Or use Titus's buildbot.
Or purchase a license for 2008 but get the 2003 CDs from us, sort
of a technical workaround ... that keeps the spirit of the license.

Istvan

Christopher Lee

Apr 19, 2009, 1:26:14 AM
to pygr...@googlegroups.com

On Apr 18, 2009, at 11:21 AM, Istvan Albert wrote:

>
> But by today this product is discontinued (got replaced with other
> versions) so you'd need to look on other places to get it, maybe ebay.

Yes, this is the problem I was puzzled about. Marek had told me it
was now hard to track down a copy of VS2003, which was used for
compiling the windows versions of python through at least 2.5...

>
> Or use Titus's buildbot.
> Or purchase a license for 2008 but get the 2003 CDs from us, sort
> of a technical workaround ... that keeps the spirit of the license.

If this is an acceptable interpretation of the license, this sounds
like a great solution. I will go ahead and buy a VS license via UCLA.

Thanks!!

Chris

Christopher Lee

Apr 19, 2009, 1:51:01 AM
to pygr...@googlegroups.com

OK, I will go through Pygr and recommend objects that keep a file
open, for adding mandatory close() methods. The obvious examples
include SequenceFileDB, NLMSA, and various shelve based storages like
Graph.

>
>
> (Providing the function is sufficient; I don't think we need to
> require that it be called in most circumstances.)

I think the documentation has to say "you should always close() the
object when done with it." It is true that you can probably get away
without that in most cases (especially on UNIX), because the object
will close itself when garbage collected. We've been relying on that
behavior all these years on UNIX without any problem. But mandating
this as standard operating procedure will save some people from
baffling bugs that arise due to some interaction between the order of
deleting a file vs. when garbage collection actually deletes the
object. This seems to matter on Windows, as even our initial testing
of the test suite hits that issue twice.
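
For callers who do want a deterministic close, the standard library's contextlib.closing() already works with any object that has a close() method. The sketch below assumes a hypothetical LengthIndex class standing in for something like SequenceFileDB; it is not Pygr's actual API.

```python
import contextlib
import os
import shelve
import shutil
import tempfile


class LengthIndex(object):
    """Hypothetical stand-in for an object that keeps a shelve open."""

    def __init__(self, path):
        self.lengths = shelve.open(path)

    def close(self):
        self.lengths.close()


def demo():
    workdir = tempfile.mkdtemp()
    try:
        # closing() calls index.close() on exit, even if the body raises
        with contextlib.closing(LengthIndex(os.path.join(workdir, "idx"))) as index:
            index.lengths["chr1"] = 42
            value = index.lengths["chr1"]
    finally:
        shutil.rmtree(workdir)   # runs after close(), so no handle is still open
    return value
```

With this pattern the documentation could recommend closing without forcing every caller to write try/finally by hand.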

IMPLEMENTATION
I want to implement this in a way that avoids having to insert code
that checks whether the file or shelve object is valid in each routine
that accesses them (i.e. to catch whether the object has already been
closed, and to raise a clear error message in that case). I propose
to create a descriptor class that only gets accessed if a particular
file attribute is missing. It will have a __set__() method that
allows you to save the file attribute (saves it to the object's
__dict__, so that it will be retrieved in future requests for this
attribute); a __delete__() method that deletes it from __dict__; and a
__get__() which will therefore only be accessed if the file attribute
is missing (because it was already closed) and will raise an exception
with a clear explanatory message ("you already closed this object, you
oaf"). We would then add this descriptor as a property to any class
that keeps such an open file object. The file opening methods in the
class __init__ would be unchanged -- they just save the file / shelve
to that attribute as usual. All code accessing the attribute remains
unchanged. The new close() method for the class would close the
shelve, then delete the attribute, exposing this descriptor. Any
future attempt to access the attribute will invoke the descriptor and
produce the appropriate error message.
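
A rough sketch of this idea, with one deviation from the description above. A descriptor that defines __set__() or __delete__() is a data descriptor, and Python consults a data descriptor before the instance __dict__, so its __get__() would run on every access rather than only after close(). The sketch therefore defines __get__() alone (a non-data descriptor) and relies on ordinary attribute assignment and deletion to manage the instance __dict__. ClosedAttribute and GuardedIndex are hypothetical names, and on Python 2 the classes must be new-style (derived from object) for descriptors to work.

```python
import shelve


class ClosedAttribute(object):
    """Non-data descriptor: only reached when the instance __dict__ has no entry."""

    def __init__(self, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:          # accessed on the class itself
            return self
        raise ValueError("%s is no longer available: close() was already "
                         "called on this object" % self.name)


class GuardedIndex(object):
    """Hypothetical class that keeps an open shelve in self.lengths."""

    lengths = ClosedAttribute("lengths")

    def __init__(self, path):
        self.lengths = shelve.open(path)   # stored in the instance __dict__

    def close(self):
        if "lengths" in self.__dict__:     # makes close() safe to call twice
            self.lengths.close()
            del self.lengths               # exposes the class-level descriptor
```

Code that reads self.lengths stays unchanged; after close() the same expression raises the explanatory ValueError.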

Does that sound OK to you? Any criticisms or suggestions? I want to
implement a systematic solution with a good error message, because I
have unpleasant memories of tracking down a problem with a prematurely
closed shelve object... When you close() a shelve, what actually
happens is it overwrites its berkeleyDB hash attribute with the value
zero (0)! So any further attempt to look up a key in the shelve gets
a cryptic error message like "int object has no index method"!!!
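
The exact failure depends on the Python version, so it is worth checking on the interpreter at hand. The helper below is a hypothetical diagnostic that returns whatever exception a read after close() raises. Recent Python 3 releases replace the shelve's backing dictionary with a placeholder that raises ValueError ("invalid operation on closed shelf"); the 2.5-era behaviour described above is different.

```python
import shelve


def lookup_after_close(path):
    """Return the exception raised by reading from a shelve after close()."""
    db = shelve.open(path)
    db["seq1"] = 1200
    db.close()
    try:
        db["seq1"]
    except Exception as exc:      # deliberately broad: the type varies by version
        return exc
    return None
```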

Marek Szuba

Apr 20, 2009, 6:25:38 PM
to pygr...@googlegroups.com
On Fri, 17 Apr 2009 18:22:05 -0700
"C. Titus Brown" <c...@msu.edu> wrote:

> I don't think that a pygr compiled by mingw will work with the binary
> Python downloads from python.org. As such, it's going to be more of a
> curiosity than a serious deployment option for people.

Hmm, I was convinced that building extensions with MinGW actually makes
them portable... See e.g.:
http://docs.python.org/install/index.html#gnu-c-cygwin-mingw
or
http://boodebr.org/main/python/build-windows-extensions

This is the primary reason why I wanted to use MinGW rather than VS.

--
MS

Namshin Kim

Apr 20, 2009, 8:40:10 PM
to pygr...@googlegroups.com
I am using a full Cygwin installation for my Windows environment. For the bsddb issue, I installed bsddb3 and changed a few things in the site-packages directory. I installed MySQL from the source package. For the MySQLdb installation, keep in mind that the version of setuptools that gets installed automatically during the MySQLdb build is not compatible with MySQLdb v1.2.2. Thus, I first install the *latest* version of setuptools from PyPI and then compile MySQLdb. The Windows version of MySQL (not the tarball) or other tools may not be compatible with some Python modules due to missing components, I think.

Paul Rigor (gmail)

Apr 20, 2009, 9:16:11 PM
to pygr...@googlegroups.com
Microsoft provides Visual C++ express (or Visual Studio Express).  This should come with the necessary CLI tools.

If you're using cygwin, you can use the gnu autotools and modify your build scripts to use M$'s compiler.  I've done this to compile python along with custom c/c++ based projects... It was to port a huge code base from unix->windows.  Why? Don't ask...

Paul
--
Paul Rigor
Graduate Student
Institute for Genomics and Bioinformatics
Donald Bren School of Information and Computer Sciences
University of California in Irvine
248 ICS2 Bldg.
+1 (760) 536 - 6767 (skype)

C. Titus Brown

Apr 20, 2009, 11:33:18 PM
to pygr...@googlegroups.com
On Mon, Apr 20, 2009 at 03:25:38PM -0700, Marek Szuba wrote:
->
-> On Fri, 17 Apr 2009 18:22:05 -0700
-> "C. Titus Brown" <c...@msu.edu> wrote:
->
-> > I don't think that a pygr compiled by mingw will work with the binary
-> > Python downloads from python.org. As such, it's going to be more of a
-> > curiosity than a serious deployment option for people.
-> Hmm, I was convinced that building extensions with MinGW actually makes
-> them portable... See e.g.:
-> http://docs.python.org/install/index.html#gnu-c-cygwin-mingw
-> or
-> http://boodebr.org/main/python/build-windows-extensions
->
-> This is the primary reason why I wanted to use MinGW rather than VS.

OK, good luck! I have to admit to not having the patience to debug
problems on Windows ;)

Paul Rigor (gmail)

Apr 21, 2009, 12:53:54 AM
to pygr...@googlegroups.com
Just use GNU's autotools (primarily 'make') on cygwin, which provides a 'familiar' command-line interface to a linux build environment. You should be able to compile python (if needed) and pygr against the Windows libs (and python libs) and not cygwin1.lib or mingw.lib. It should be straightforward and just a matter of linking against the correct library files (".lib" on windows, a.k.a. ".a" on *nix).

I currently run windows on a virtual machine on my mac.  I don't have a windows machine on hand to play with.  But email me directly with questions and I can help you get started...

Paul

C. Titus Brown

Apr 21, 2009, 1:46:31 AM
to pygr...@googlegroups.com
On Sat, Apr 18, 2009 at 10:51:01PM -0700, Christopher Lee wrote:
-> > -> The only idea that comes to mind is to make explicitly closing a
-> > -> sequence database object a requirement of the SequenceDB API? i.e.
-> > -> you must execute db.close() when you are done with the database.
-> > -> Does that seem like a good requirement to add?
-> >
-> > Yes; I thought about suggesting it during my review but couldn't think
-> > of a functional reason for doing so; it just seemed neat. Now we
-> > have a reason, though ;)
->
-> OK, I will go through Pygr and recommend objects that keep a file
-> open, for adding mandatory close() methods. The obvious examples
-> include SequenceFileDB, NLMSA, and various shelve based storages like
-> Graph.

sounds good.

-> > (Providing the function is sufficient; I don't think we need to
-> > require that it be called in most circumstances.)
->
-> I think the documentation has to say "you should always close() the
-> object when done with it." It is true that you can probably get away
-> without that in most cases (especially on UNIX), because the object
-> will close itself when garbage collected. We've been relying on that
-> behavior all these years on UNIX without any problem. But mandating
-> this as standard operating procedure will save some people from
-> baffling bugs that arise due to some interaction between the order of
-> deleting a file vs. when garbage collection actually deletes the
-> object. This seems to matter on Windows, as even our initial testing
-> of the test suite hits that issue twice.

I disagree; what's wrong with the default Python behavior of closing
things when GCed?! The test environment is special because it re-uses
files, which should not generally be a problem in pygr use.

I'd suggest waiting until we see it become a problem in daily usage. So
far it hasn't been a problem for me, but I'm not using pygr very heavily
at the moment, I guess.

-> IMPLEMENTATION
-> I want to implement this in a way that avoids having to insert code
-> that checks whether the file or shelve object is valid in each routine
-> that accesses them (i.e. to catch whether the object has already been
-> closed, and to raise a clear error message in that case). I propose
-> to create a descriptor class that only gets accessed if a particular
-> file attribute is missing. It will have a __set__() method that
-> allows you to save the file attribute (saves it to the object's
-> __dict__, so that it will be retrieved in future requests for this
-> attribute); a __delete__() method that deletes it from __dict__; and a
-> __get__() which will therefore only be accessed if the file attribute
-> is missing (because it was already closed) and will raise an exception
-> with a clear explanatory message ("you already closed this object, you
-> oaf"). We would then add this descriptor as a property to any class
-> that keeps such an open file object. The file opening methods in the
-> class __init__ would be unchanged -- they just save the file / shelve
-> to that attribute as usual. All code accessing the attribute remains
-> unchanged. The new close() method for the class would close the
-> shelve, then delete the attribute, exposing this descriptor. Any
-> future attempt to access the attribute will invoke the descriptor and
-> produce the appropriate error message.
->
-> Does that sound OK to you?

Yes, but I'd like to take a look at an implementation before it gets
checked in, if possible ;)

cheers,

Christopher Lee

Apr 21, 2009, 3:30:35 AM
to pygr...@googlegroups.com

On Apr 20, 2009, at 10:46 PM, C. Titus Brown wrote:

>
> -> I think the documentation has to say "you should always close() the
> -> object when done with it." It is true that you can probably get away
> -> without that in most cases (especially on UNIX), because the object
> -> will close itself when garbage collected. We've been relying on that
> -> behavior all these years on UNIX without any problem. But mandating
> -> this as standard operating procedure will save some people from
> -> baffling bugs that arise due to some interaction between the order of
> -> deleting a file vs. when garbage collection actually deletes the
> -> object. This seems to matter on Windows, as even our initial testing
> -> of the test suite hits that issue twice.
>
> I disagree; what's wrong with the default Python behavior of closing
> things when GCed?! The test environment is special because it re-uses
> files, which should not generally be a problem in pygr use.

Yes, on UNIX I have never encountered a problem, but on Windows even
our basic test suite has hit this problem twice (testing on Cygwin).
To me that implies that relying on GC behavior is not adequate
protection on the Windows platform, and that Windows users will have
problems. Such problems will probably be subtle and baffling (e.g.
highly order / context sensitive). So you could view this as one of
those "safety first" best-practices that you follow (even if 95% of
the time it's not actually necessary) because of the 5% of cases where
it saves you from real misery. (e.g. I wasted several hours on Friday
tracking down this problem in the test suite, instead of doing
productive work. It seems ironic to me that according to your
proposal this problem would be called "not a Pygr bug" but would
instead be blamed on the user code, in this case the
SequenceFileDB_Creation_Test test cases. And I'm still not even sure
I understand why this bug occurs, i.e. why your "del db" statement
doesn't achieve the equivalent of a db.close()... Is this just a
demonstration that GC timing is unpredictable?)

It seems to me that providing a close() method but then telling people
it's optional sends a mixed message, which I think many people will
just tune out. Python programmers are used to playing fast and loose
with file object closing (I frequently see code like "for line in
file(filename, 'r'): ..."). How many Python programmers rigorously
follow the strict pattern? i.e.

    ifile = file(filename, 'r')
    try:
        pass  # do some stuff...
    finally:  # close ifile no matter what happens above...
        ifile.close()
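
The same guarantee can be written more compactly with a with block, which closes the file when the block exits, whether normally or through an exception. A minimal sketch (count_lines is a hypothetical helper; on Python 2.5 the module would also need "from __future__ import with_statement" as its first statement):

```python
def count_lines(filename):
    """Count lines; the file is closed when the block exits, even on error."""
    with open(filename, "r") as ifile:
        return sum(1 for _ in ifile)
```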

We can get away with playing fast and loose on UNIX, but Windows seems
really finicky about file closing -- a file left unclosed will cause
certain operations with that file to hang / fail etc. This is sad,
because I personally find the auto-close-on-GC pattern very Pythonic,
especially when the file is opened read-only. For one thing, it
avoids the whole question of how an object should behave if the user
tries to use it *after* calling its close() method. Python shelve
objects make a mess out of this issue, by raising a totally
incomprehensible error message in this case.

>
>
> I'd suggest waiting until we see it become a problem in daily usage. So
> far it hasn't been a problem for me, but I'm not using pygr very heavily
> at the moment, I guess.

Yes, the real question is whether this problem will actually happen in
normal Pygr usage on Windows. Unfortunately, I have zero real usage
experience on Windows.

Can those of you who do real work on Windows give us your opinion? Do
you think this issue matters, on Windows? Should we make close()
mandatory?

>
>
> Yes, but I'd like to take a look at an implementation before it gets
> checked in, if possible ;)

Absolutely.

Yours with thanks,

Chris

Istvan Albert
Apr 21, 2009, 9:45:22 AM
to pygr-dev


On Apr 21, 3:30 am, Christopher Lee <l...@chem.ucla.edu> wrote:

> We can get away with playing fast and loose on UNIX, but Windows seems
> really finicky about file closing

> Can those of you who do real work on Windows give us your opinion?  Do  
> you think this issue matters, on Windows?  Should we make close()  
> mandatory?

Even on Windows, Python will close the file when you exit the program
or when the file handle gets garbage collected. But under Windows you
cannot remove a file that is not yet closed; I think that's the main
source of the errors that you are seeing.

I think this is mainly a problem in a test setting, where you are
repeatedly creating and removing the same files, so calling close()
in teardown should solve that.
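
A minimal sketch of that teardown pattern, assuming a standard-library
shelve as a stand-in for a Pygr database object (the class and file
names are made up for illustration):

```python
import os
import shelve
import shutil
import tempfile
import unittest


class ShelveCreationTest(unittest.TestCase):
    """Hypothetical test case; a shelve stands in for a Pygr database."""

    def setUp(self):
        self.tmpdir = tempfile.mkdtemp()
        self.db = shelve.open(os.path.join(self.tmpdir, 'testdb'))

    def tearDown(self):
        # Close explicitly *before* deleting: Windows refuses to remove
        # a file that is still open, and GC timing is not guaranteed.
        self.db.close()
        shutil.rmtree(self.tmpdir)

    def test_store(self):
        self.db['seq1'] = 'ACGT'
        self.assertEqual(self.db['seq1'], 'ACGT')
```

Because tearDown runs after every test, each test starts from a clean
directory regardless of when the garbage collector gets around to the
old objects.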

Regarding the implementation, it sounds a little complicated. Wouldn't
it be better if you instead subclassed the Shelve and trapped the error
message "int object has no index method" (which is generated when the
shelve is already closed), replacing it with a more friendly error?
That way you don't need to worry about too many other details.

Istvan


C. Titus Brown
Apr 21, 2009, 9:50:08 AM
to pygr...@googlegroups.com
On Tue, Apr 21, 2009 at 12:30:35AM -0700, Christopher Lee wrote:
-> On Apr 20, 2009, at 10:46 PM, C. Titus Brown wrote:
-> > -> I think the documentation has to say "you should always close() the
-> > -> object when done with it." It is true that you can probably get
-> > away
-> > -> without that in most cases (especially on UNIX), because the object
-> > -> will close itself when garbage collected. We've been relying on
-> > that
-> > -> behavior all these years on UNIX without any problem. But
-> > mandating
-> > -> this as standard operating procedure will save some people from
-> > -> baffling bugs that arise due to some interaction between the
-> > order of
-> > -> deleting a file vs. when garbage collection actually deletes the
-> > -> object. This seems to matter on Windows, as even our initial
-> > testing
-> > -> of the test suite hits that issue twice.
-> >
-> > I disagree; what's wrong with the default Python behavior of closing
-> > things when GCed?! The test environment is special because it re-uses
-> > files, which should not generally be a problem in pygr use.
->
-> Yes, on UNIX I have never encountered a problem, but on Windows even
-> our basic test suite has hit this problem twice (testing on Cygwin).
-> To me that implies that relying on GC behavior is not adequate
-> protection on the Windows platform, and that Windows users will have
-> problems. Such problems will probably be subtle and baffling (e.g.
-> highly order / context sensitive). So you could view this as one of
-> those "safety first" best-practices that you follow (even if 95% of
-> the time it's not actually necessary) because of the 5% of cases where
-> it saves you from real misery. (e.g. I wasted several hours on Friday
-> tracking down this problem in the test suite, instead of doing
-> productive work. It seems ironic to me that according to your
-> proposal this problem would be called "not a Pygr bug" but would
-> instead be blamed on the user code, in this case the
-> SequenceFileDB_Creation_Test test cases. And I'm still not even sure
-> I understand why this bug occurs, i.e. why your "del db" statement
-> doesn't achieve the equivalent of a db.close()... Is this just a
-> demonstration that GC timing is unpredictable?)

Yes, I think so.

We've had to force GC in the test suite in other areas, right? I think
the cache test code uses it.
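
For reference, a sketch of what forcing a collection looks like. The
Handle class is a made-up stand-in; the point is that "del" only removes
a name, so an object that is still referenced elsewhere (here through a
reference cycle) is not reclaimed until the cycle collector runs:

```python
import gc
import weakref


class Handle(object):
    """Hypothetical stand-in for a database object that owns an open file."""

    def __init__(self):
        self.me = self  # a reference cycle, as a cache or back-pointer might create


freed = []
db = Handle()
# The weakref callback fires when the object is actually reclaimed.
watcher = weakref.ref(db, lambda ref: freed.append('db'))

del db          # removes the name only; the cycle still holds the object
gc.collect()    # force a collector pass so the object is reclaimed now
```

After the gc.collect() call the callback has run, which is the moment an
auto-close-on-GC object would finally release its file.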

--titus

Christopher Lee
Apr 22, 2009, 12:37:27 AM
to pygr...@googlegroups.com

On Apr 20, 2009, at 10:46 PM, C. Titus Brown wrote:

>
> Yes, but I'd like to take a look at an implementation before it gets
> checked in, if possible ;)

I created a new branch must_close that implements the proposed "guard"
descriptor that raises an error message if the file / shelve has
already been closed. I pushed this branch to your github repository;
see it here:

http://github.com/ctb/pygr/network
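
To illustrate the general idea of such a guard (this is only a sketch,
not the code in the must_close branch; FileGuard, ClosedError and SeqDB
are hypothetical names, and a dict stands in for the shelve):

```python
class ClosedError(ValueError):
    """Raised on access to a resource that was already closed."""


class FileGuard(object):
    """Descriptor that stores a resource and refuses access after close."""

    def __init__(self, name):
        self.slot = '_guarded_' + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        resource = obj.__dict__.get(self.slot)
        if resource is None:
            raise ClosedError('%s was already closed; reopen it before use'
                              % type(obj).__name__)
        return resource

    def __set__(self, obj, value):
        obj.__dict__[self.slot] = value


class SeqDB(object):
    """Hypothetical container; a dict stands in for the underlying shelve."""
    index = FileGuard('index')

    def __init__(self):
        self.index = {'seq1': 4}

    def close(self):
        self.index = None  # later reads of self.index raise ClosedError

    def __len__(self):
        return len(self.index)
```

Any method that touches self.index after close() then fails with one
clear message, instead of whatever error the underlying shelve happens
to raise.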

-- Chris

Christopher Lee
Apr 23, 2009, 12:42:44 AM
to pygr...@googlegroups.com
On Apr 20, 2009, at 10:46 PM, C. Titus Brown wrote:

> Yes, but I'd like to take a look at an implementation before it gets
> checked in, if possible ;)

Hi Titus,
since this branch (must_close) now passes all tests on Windows /
Cygwin (see my earlier message), I am eager to get your review of this
"open file guard" implementation so we can push it to master, or do
whatever additional work you think is needed. You can get the code
from your github repository; the branch name is must_close.

Thanks!

Chris

Marek Szuba
May 11, 2009, 8:31:40 PM
to pygr...@googlegroups.com
Hello everyone,

While looking into something completely different I noticed today
Pygr still has got problems with paths with spaces in them, at least
under Windows: running blast_test.py yields multiple failures calling
formatdb and blastall, with failures of the former quite clearly
being caused by spaces in paths:

INFO blast.run_formatdb: Building index: [...]
[NULL_Caption] ERROR: Could not open C:\Documents

Warning: caught OSError: [...]

In the case of the latter there is no explicit error message, just
OSError output, but judging from how similar the OSError output looks
in both cases, spaces aren't escaped there either.

--
MS

Christopher Lee
May 11, 2009, 9:25:37 PM
to pygr...@googlegroups.com
Hi Marek,
this is odd. Assuming you are using a version of Python > 2.3,
FilePopen just uses Python subprocess.Popen, which presumably works on
Windows in a way that correctly supports Windows style paths. So it
is hard to see how this problem could happen. Also, we already showed
that FilePopen works on Windows and correctly handles Windows style
paths, because we use it to invoke the XMLRPC server process in our
tests on Windows...

I'm afraid that to fix this, we'll need a bug report that specifies a
precise sequence of steps to reproduce the bug.

On May 11, 2009, at 5:31 PM, Marek Szuba wrote:

> While looking into something completely different I noticed today
> Pygr still has got problems with paths with spaces in them, at least
> under Windows: running blast_test.py yields multiple failures calling
> formatdb and blastall, with failures of the former quite clearly
> being caused by spaces in paths:

Both run_formatdb() and start_blast() use FilePopen, which in turn
just uses Python's standard library subprocess.Popen. It's puzzling
that subprocess.Popen would screw up Windows paths... We will have to
catch the problem in the act in order to debug this.
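
One quick check that separates the two possibilities is to pass a
spaced path through subprocess.Popen as a list element and see what the
child receives. In this sketch the path is made up, and the Python
interpreter stands in for formatdb:

```python
import subprocess
import sys

# A made-up Windows-style path containing spaces.
path = r'C:\Documents and Settings\user\data\seqs.fasta'

# The child simply echoes its first argument; sys.executable stands in
# for an external tool such as formatdb.
child = 'import sys; sys.stdout.write(sys.argv[1])'
proc = subprocess.Popen([sys.executable, '-c', child, path],
                        stdout=subprocess.PIPE)
received = proc.communicate()[0].decode()
```

If received equals path, the argument arrived intact as a single
argument, and any remaining failure lies in how the external program
itself handles the path.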

>
>
> INFO blast.run_formatdb: Building index: [...]
> [NULL_Caption] ERROR: Could not open C:\Documents

who returns this message? Some part of Pygr? The external program
formatdb?

-- Chris

Namshin Kim
May 12, 2009, 5:04:36 AM
to pygr...@googlegroups.com
Isn't it formatdb that causes this problem? I think we need to print
out the formatdb command to see whether there are quotation marks
around every path. If not, we need to add them in order to process
correctly in windows environment because windows path has a lot of
*spaces*. Just a thought.

Istvan Albert
May 12, 2009, 9:42:54 AM
to pygr-dev

On May 12, 5:04 am, Namshin Kim <deepr...@gmail.com> wrote:

> not, we need to add them in order to process correctly in windows
> environment because windows path has a lot of *spaces*. Just a thought.

I think the solution is to discourage people creating spaces in path
names.

This is a problem under Unix just as well, you can create names with
spaces there too, only that under unix we are used to command lines
and we'd have to work a little harder to create such path names.
Whereas when using a graphical file explorer it is easy to create a
seemingly more readable name.

Either way, spaces in paths are huge trouble; sooner or later they blow
up in one's face anyhow (if not with pygr then with some other tool), so
IMHO it is much better to disallow them and crash out than to bend over
backwards to fix them.

Istvan

Marek Szuba
May 12, 2009, 2:49:31 PM
to pygr...@googlegroups.com
On Tue, 12 May 2009 06:42:54 -0700 (PDT)
Istvan Albert <istvan...@gmail.com> wrote:

> I think the solution is to discourage people creating spaces in path
> names.

I don't think this is doable, given even home directories under Windows
contain at least two spaces in the path... Windows may by default allow
any users to create new directories almost all over its discs but IMHO
that's no reason to encourage such bad behaviour.

--
MS

Christopher Lee
May 12, 2009, 3:04:02 PM
to pygr...@googlegroups.com
This problem is a bit disturbing. We are using subprocess.Popen,
which is supposed to handle different platforms (like Windows) and
quoting of arguments correctly -- but apparently something is wrong
either with it or how we're using it. It's worth tracking down a case
where it fails to behave as we expect, not only because we want to
support windows, but also because there is a good chance that such a
problem could occur on other platforms too.

Marek, please file a bug report with detailed steps to reproduce the
problem.

Thanks!

Chris

Marek Szuba
May 12, 2009, 4:21:57 PM
to pygr...@googlegroups.com
On Mon, 11 May 2009 18:25:37 -0700
Christopher Lee <le...@chem.ucla.edu> wrote:

> I'm afraid that to fix this, we'll need a bug report that specifies
> a precise sequence of steps to reproduce the bug.

This is pretty easy to reproduce:

1. Get a Windows machine;
2. Install BLAST on it somewhere and add the relevant path to PATH so
that Pygr can find it;
3. Build Pygr in a directory with spaces in names, e.g. your Windows
home directory;
4. Enter tests/;
5. Make sure no old formatdb output is present;
6. Run blast_test.py.

> > [NULL_Caption] ERROR: Could not open C:\Documents
> who returns this message? Some part of Pygr? The external program
> formatdb?

This one actually comes from formatdb. Moreover, I have run formatdb by
hand with exactly the same options:
- with quotation marks in place I got the aforementioned message;
- without them, I got
[NULL_Caption] ERROR: Arguments must start with '-' (the
offending argument #3 was: 'and')
I also get the 'Arguments must start...' error from blastall if I
omit quotation marks so since we don't see it in test output, the
failure is caused by something else.
This suggests it's formatdb and not Pygr which has got problems with
spaces this time... Whew, false alarm. I've reported the problem to
BLAST developers - but even if they do fix it soon, for the sake of
users of old versions we may want to add a warning about this to either
our BLAST code or our documentation.

--
MS

Christopher Lee
May 12, 2009, 4:50:27 PM
to pygr...@googlegroups.com

On May 12, 2009, at 1:21 PM, Marek Szuba wrote:
>>
>>> [NULL_Caption] ERROR: Could not open C:\Documents
>> who returns this message? Some part of Pygr? The external program
>> formatdb?
> This one actually comes from formatdb. Moreover, I have run formatdb
> by
> hand with exactly the same options:
> - with quotation marks in place I got the aforementioned message;

Are you saying that even with correctly quoted path strings passed to
formatdb, it fails with the above error message? Then it is a NCBI
formatdb bug and there is nothing Pygr can do to make formatdb
properly handle paths containing whitespace...


>
> - without them, I got
> [NULL_Caption] ERROR: Arguments must start with '-' (the
> offending argument #3 was: 'and')
> I also get the 'Arguments must start...' error from blastall if I
> omit quotation marks so since we don't see it in test output, the
> failure is caused by something else.

Sorry, I didn't quite understand. Have you found any way of getting
blastall to run properly on a database path containing whitespace,
running it by hand? I guess if formatdb failed, the necessary blast
index files will be missing and blast will not be able to run anyway.

>
> This suggests it's formatdb and not Pygr which has got problems with
> spaces this time... Whew, false alarm. I've reported the problem to
> BLAST developers - but even if they do fix it soon, for the sake of
> users of old versions we may want to add a warning about this to
> either
> our BLAST code or our documentation.

Perhaps we can do better than that. Pygr has a nice mechanism in
place for specifying a search path in which to find / create blast
index files -- they don't need to be in the same directory as the
original fasta file. I guess the main issue is that we would have to
decide on a standard location to use specifically for windows (with no
whitespace in the path), then create a symbolic link from there to the
fasta file (otherwise we'd have to copy the whole fasta file to the
standard location). And we would probably add support for the user to
specify an environment variable to set the blast index search path.
And we should add some totally clear error messages for this situation
on Windows.
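
As a rough idea of what such an error message could look like (a
sketch only; check_blast_path is a hypothetical helper, not part of
Pygr):

```python
import os


def check_blast_path(path):
    """Return the absolute path, or raise a clear error if it has whitespace.

    Hypothetical helper, not part of Pygr.
    """
    full = os.path.abspath(path)
    if any(ch.isspace() for ch in full):
        raise OSError('formatdb/blastall cannot open paths containing '
                      'whitespace: %r. Move the file, or point the BLAST '
                      'index search path at a directory without spaces.'
                      % full)
    return full
```

Calling this before launching formatdb would replace the cryptic
"[NULL_Caption] ERROR: Could not open C:\Documents" with a message that
names the actual cause.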

-- Chris

Marek Szuba
May 14, 2009, 2:16:40 PM
to pygr...@googlegroups.com
On Tue, 12 May 2009 13:50:27 -0700
Christopher Lee <le...@chem.ucla.edu> wrote:

> Are you saying that even with correctly quoted path strings passed
> to formatdb, it fails with the above error message? Then it is a
> NCBI formatdb bug and there is nothing Pygr can do to make formatdb
> properly handle paths containing whitespace...

Indeed. Letting the right people know (BTW. I have received information
from BLAST support that my bug report has been passed to developers)
and possibly adding a warning in Pygr appears to exhaust our options.

> Sorry, I didn't quite understand. Have you found any way of getting
> blastall to run properly on a database path containing whitespace,
> running it by hand?

No, I haven't.

> I guess the main issue is that we would have to decide on a standard
> location to use specifically for windows (with no whitespace in the
> path), then create a symbolic link from there to the fasta file

Actually, this is not a Windows-only problem - just before sending in
the bug report I tried this on leelab2, with the same result. The
reason why I spotted it under Windows first is that spaced paths are
more prevalent there...
As for working around it, there is a fairly simple alternative which
would allow us not to pollute the file system with Pygr-specific
working directories (the standard per-user temporary directory wouldn't
do here because it is inside "Documents and Settings" as well): use
relative paths! Directories within the Pygr source tree are under
our control and none of them have got spaces in their names, so having
formatdb and blastall access them as "data/file_to_open" rather than
"/path/to/pygr/tests/data/file_to_open" would render the issue of
spaced paths unimportant. I've checked just to be extra sure: both
formatdb and blastall appear to have no problems with relative paths
for input/output.
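
One way to sketch the relative-path idea: run the tool from the file's
own directory and pass only the file name, so the spaced part of the
path never appears on the command line. run_with_relative_path is a
hypothetical helper, and the Python interpreter stands in for formatdb:

```python
import os
import shutil
import subprocess
import sys
import tempfile


def run_with_relative_path(argv_prefix, filepath):
    """Run a command from the file's own directory, passing only its basename.

    Hypothetical helper: argv_prefix is the command plus options, and the
    file name is appended as the last argument.
    """
    dirname, basename = os.path.split(os.path.abspath(filepath))
    proc = subprocess.Popen(list(argv_prefix) + [basename], cwd=dirname,
                            stdout=subprocess.PIPE)
    output = proc.communicate()[0]
    return proc.returncode, output


# Demonstration in a directory whose name contains spaces.
workdir = tempfile.mkdtemp(prefix='spaced dir ')
fasta = os.path.join(workdir, 'seqs.fasta')
handle = open(fasta, 'w')
try:
    handle.write('>seq1\nACGT\n')
finally:
    handle.close()

# The child reports the argument it received if that file exists.
child = ('import os, sys; name = sys.argv[1]; '
         'sys.stdout.write(name if os.path.exists(name) else "missing")')
code, out = run_with_relative_path([sys.executable, '-c', child], fasta)
shutil.rmtree(workdir)
```

This relies on the external tool accepting relative names for its input
and output, as reported above for formatdb and blastall.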

--
MS
