Compiling Python 2.7 extensions with VS 2015


Matthew Brett

Mar 19, 2016, 4:12:00 PM
to mingwpy
Hi,

I just saw this:

http://pybind11.readthedocs.org/en/latest/faq.html#working-with-ancient-visual-studio-2009-builds-on-windows

<quote>
The official Windows distributions of Python are compiled using truly
ancient versions of Visual Studio that lack good C++11 support. Some
users implicitly assume that it would be impossible to load a plugin
built with Visual Studio 2015 into a Python distribution that was
compiled using Visual Studio 2009. However, no such issue exists: it’s
perfectly legitimate to interface DLLs that are built with different
compilers and/or C libraries. Common gotchas to watch out for involve
not free()-ing memory region that that were malloc()-ed in another
shared library, using data structures with incompatible ABIs, and so
on. pybind11 is very careful not to make these types of mistakes.
</quote>

Superficially, that seems to contradict our previous interpretation of
this: https://msdn.microsoft.com/en-us/library/ms235460.aspx

So, who is right? What do you think the pybind11 folks mean by
"pybind11 is very careful not to make these types of mistakes."?

Cheers,

Matthew

Nathaniel Smith

Mar 19, 2016, 4:41:37 PM
to mingwpy

No, that's correct: it is possible to have multiple CRTs in the same process, if you are very careful to never let them see each other. So this means never calling malloc() in one DLL and free() in another, and never calling open() in one DLL and read() in another, etc. It's very easy to screw this up, and actually impossible to get right for numpy (specifically when doing file I/O: Python assumes that file descriptor numbers are global across the process, so numpy needs to call read() on descriptors that were open()ed by Python). So as a general solution for the masses, the only reasonable strategy is the one we're taking with mingwpy of making sure the compilers are just compatible in general. But for special cases, if you're very careful, understand the issues, and are prepared to audit and debug all the code you're building, then mixing CRTs is technically possible.

One case where I guess we might be being over-careful is BLAS libraries, which have a narrow enough API that they might be immune to these issues... But since we have to fix the compiler anyway for numpy and scipy and everything, it's a lot easier and less error-prone to just use the fixed compiler for BLAS too.

-n

Matthew Brett

Mar 19, 2016, 4:53:40 PM
to mingwpy
What is a 'file descriptor' in Windows? Does this differ between VS runtimes?

I guess the other big issue is where an extension calls malloc and
Python / another extension frees the memory. Is that a common
pattern, that extension code will allocate memory using its own malloc,
and expect Python / other extension code to free it?

Cheers,

Matthew

carlkl

Mar 19, 2016, 5:01:48 PM
to mingwpy
If you never mix file descriptors (or file streams) from different runtimes, and ensure that memory allocated by one runtime is never freed by the other, then mixing runtimes can work. But as file streams are very commonly passed across DLL boundaries, this is asking for trouble.

Accelerated BLAS libraries are known to work. SciPy may also work, as the old mingw32 superpacks always linked to msvcrt instead of the CPython C runtime.

VS 2015 has changed the semantics of file streams. A VS 2015-compatible mingwpy has to be recompiled and will need some changes in the mingw-crt header files as well.

Carl

Carl Kleffner

Mar 19, 2016, 5:05:21 PM
to min...@googlegroups.com
BTW, I stumbled across pybind11 some days ago. This library is very interesting, as it may replace boost-python in many places, e.g. in Pythran.


carlkl

Mar 19, 2016, 5:15:51 PM
to mingwpy
BTW, I stumbled across pybind11 some days ago. This library is very interesting, as it may replace boost-python in many places, e.g. in Pythran.

C.

Matthew Brett

Mar 19, 2016, 5:28:45 PM
to mingwpy
Hi,

On Sat, Mar 19, 2016 at 2:01 PM, carlkl <cmkle...@gmail.com> wrote:
> If you don't ever mix file descriptors (file streams) of different runtimes
> and ensure that memory management isn't done in both runtimes mixing of
> runtimes can be done. As file streams are very commonly used across DLL
> boundaries, this cries for problems.
>
> Accelerated BLAS libraries are known to work. Scipy may also work, as the
> old mingw32 superpacks always links to msvcrt instead of the CPython
> C-runtimes.
>
> VS 2015 has changed the semantics of file streams. A VS2015 compatible
> mingwpy has to be recompiled and will need some changes in the mingw-crt
> header files as well.

I get the idea in general, but could you fill out the detail a bit to
help me understand?

What exactly gets passed down with - say - a Python file object into
an extension? I guess it's some sort of C struct? Is that struct
incompatible between VS versions? What exactly could - say - a VS
2015 extension do to a 2008 file descriptor that might cause problems?
Is there some reference to look this up? I'd be happy to write
this up to help make the problem more concrete.

Cheers,

Matthew

Nathaniel Smith

Mar 19, 2016, 5:33:55 PM
to mingwpy
A 'file descriptor' in general is an integer returned by open() that
can be passed to read() or write() or close(). (Among other APIs.)
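
To make that concrete, here is a minimal sketch using Python's `os` module, which wraps the C runtime's low-level open/read/write/close calls. The temporary file path is only there for illustration.

```python
import os
import tempfile

# Scratch file used only for this illustration
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# os.open() returns a plain integer: the file descriptor
fd = os.open(path, os.O_CREAT | os.O_WRONLY)
is_int = isinstance(fd, int)
os.write(fd, b"hello")
os.close(fd)

# The same kind of integer is what read() expects
fd = os.open(path, os.O_RDONLY)
data = os.read(fd, 5)
os.close(fd)
```

Nothing in the integer itself says which runtime issued it, which is why the table that gives it meaning matters so much.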

On Unix, file descriptors are indexes into a table maintained by the
kernel. On Windows, the kernel uses "file handles" instead, so the CRT
keeps an internal array mapping file descriptors to file handles,
something vaguely like:

fds = [None] * MAX_FDS

def crt_open(path):
    # Find the first unused slot
    for fd, handle in enumerate(fds):
        if handle is None:
            break
    else:
        # Every slot is taken
        raise OSError("too many open files")
    # Open the file and assign it an fd
    handle = windows_open(path)
    fds[fd] = handle
    return fd

def crt_read(fd):
    # Translate the fd back into the Windows handle
    handle = fds[fd]
    return windows_read(handle)

The problem is that different CRT libraries each have their own fds
array, so msvc2008_open stashes the file handle in msvc2008_fds, and
then msvcrt_read looks for it in msvcrt_fds, and then everything falls
apart.
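
A runnable toy version of that failure, as a sketch only: `FakeCRT` and the instance names `msvcr90` and `msvcrt` are hypothetical stand-ins, the "handles" are just strings, and a real CRT would more likely return garbage or crash than raise a clean error.

```python
class FakeCRT:
    """Toy model of one C runtime with its own private fd table."""

    def __init__(self, name, max_fds=8):
        self.name = name
        self.fds = [None] * max_fds  # fd -> pretend OS handle

    def open(self, path):
        # Hand out the first unused slot as the fd
        for fd, handle in enumerate(self.fds):
            if handle is None:
                self.fds[fd] = "HANDLE(%s)" % path
                return fd
        raise OSError("too many open files")

    def read(self, fd):
        # Look the fd up in *this* runtime's table only
        handle = self.fds[fd]
        if handle is None:
            raise OSError("%s: bad file descriptor %d" % (self.name, fd))
        return "data from %s" % handle


msvcr90 = FakeCRT("msvcr90")  # stand-in for the runtime Python was built with
msvcrt = FakeCRT("msvcrt")    # stand-in for the runtime an extension links to

fd = msvcr90.open("a.txt")    # "Python" opens the file
same_crt = msvcr90.read(fd)   # works: same table

try:
    msvcrt.read(fd)           # the other runtime has never seen this fd
    cross_crt_error = None
except OSError as exc:
    cross_crt_error = str(exc)

# Worse: once the other runtime opens something, the same number
# silently refers to a different file.
msvcrt.open("b.txt")
wrong_file = msvcrt.read(fd)
```

The last two lines are the nastier case: the fd number is valid in both tables but means something different in each.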

> I guess the other big issue is where an extension calls malloc and
> Python / another extension frees the memory. Is that a common
> pattern, that extension code will create memory using its own malloc,
> and expect Python / other extension code to free it?

No, Python usually avoids this -- each Python type is responsible for
both allocating and freeing its memory, plus Python exports memory
management functions that most libraries should be using instead of
calling malloc.
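
The same ownership rule can be sketched with a toy model. Everything here is hypothetical (`FakeHeap`, `Buffer`, and the heap names), and a real cross-runtime free() typically corrupts the heap or crashes rather than raising a tidy exception.

```python
class FakeHeap:
    """Toy model of one runtime's private heap (not a real allocator)."""

    def __init__(self, name):
        self.name = name
        self.live = set()  # blocks currently allocated from this heap
        self._counter = 0

    def malloc(self):
        self._counter += 1
        block = (self.name, self._counter)
        self.live.add(block)
        return block

    def free(self, block):
        if block not in self.live:
            raise RuntimeError(
                "%s: freeing a block it never allocated" % self.name)
        self.live.remove(block)


class Buffer:
    """Hypothetical type that allocates and frees with the same heap."""

    def __init__(self, heap):
        self._heap = heap
        self._block = heap.malloc()

    def release(self):
        self._heap.free(self._block)


python_heap = FakeHeap("python_crt")
extension_heap = FakeHeap("extension_crt")

# The safe pattern: the type that allocated is the one that frees
buf = Buffer(extension_heap)
buf.release()
leaked = len(extension_heap.live)

# The unsafe pattern: allocate in one runtime, free in the other
block = extension_heap.malloc()
try:
    python_heap.free(block)
    mismatch_error = None
except RuntimeError as exc:
    mismatch_error = str(exc)
```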

-n

--
Nathaniel J. Smith -- https://vorpus.org

Matthew Brett

Mar 19, 2016, 5:49:38 PM
to mingwpy
Ah - so the problem with files is not ABI incompatibility, but the
fact that each runtime has a separate table of file descriptors.

Is there a good reference for that somewhere?

I guess then, that reading a file object in runtime B that has been
opened in runtime A will nearly always give nonsense or crash?

Cheers,

Matthew

Nathaniel Smith

Mar 19, 2016, 6:32:20 PM
to mingwpy

For file descriptors, yes. For stdio FILE* you have the classic ABI compatibility issues too :-). And so on...

> Is there a good reference for that somewhere?

Not that I know of, but there's really no other way to implement file descriptors on Windows, so...

> I guess then, that reading a file object in runtime B that has been
> opened in runtime A will nearly always give nonsense or crash?

Yes.

-n

Matthew Brett

Mar 19, 2016, 6:39:46 PM
to mingwpy
Are you saying that that is your best guess? I'm sure that's better
than my best guess, but it would be good to have some confirmation.

Matthew

Ian Henriksen

Mar 19, 2016, 8:02:06 PM
to mingwpy
On Saturday, March 19, 2016 at 2:41:37 PM UTC-6, Nathaniel Smith wrote:

> No, that's correct: it is possible to have multiple CRTs in the same process, if you are very careful to never let them see each other. So this means never calling malloc() in one DLL and free() in another, and never calling open() in one DLL and read() in another, etc. It's very easy to screw this up, and actually impossible to get right for numpy (specifically when doing file I/O: Python assumes that file descriptor numbers are global across the process, so numpy needs to call read() on descriptors that were open()ed by Python). So as a general solution for the masses, the only reasonable strategy is the one we're taking with mingwpy of making sure the compilers are just compatible in general. But for special cases, if you're very careful, understand the issues, and are prepared to audit and debug all the code you're building, then mixing CRTs is technically possible.
>
> One case where I guess we might be being over-careful is BLAS libraries, which have a narrow enough API that they might be immune to these issues... But since we have to fix the compiler anyway for numpy and scipy and everything, it's a lot easier and less error-prone to just use the fixed compiler for BLAS too.
>
> -n


One of the benefits of new mingwpy toolchains will be that they let everyone avoid having to mess with stuff
like this. It's awfully easy to get something like this wrong and the resulting errors are very hard to debug.
I'm not aware of any other issues beyond malloc/free and file descriptors, but it wouldn't be surprising if there
were other problems hiding here as well. IMHO, making a toolchain that is compatible with the older MSVC
runtimes is by far the best way forward for older Python versions. Hopefully the UCRT will help resolve some
of these issues in the long run.
Best,
-Ian

Erik Bray

Mar 22, 2016, 8:03:13 AM
to min...@googlegroups.com
If it is a guess it's a good guess because it's correct (though there
are really only so many ways to do this). But it sounds like you're
asking about the specifics in which case, if you're curious and have
an older version of MSVC you can look, for example, in

C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\crt\src\internal.h

Here the `ioinfo` struct is defined, and an array of `ioinfo` called
`__pioinfo` is declared. The actual `__pioinfo` pointer lives and is
initialized in ioinit.c. The module

C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\crt\src\open.c

gives a decent demonstration of how this array is used by the CRT
library, and how it is used in wrapping the Win32 system calls. You
can also see some real world code I once wrote around these structures
here:

https://github.com/astropy/astropy/blob/master/astropy/io/fits/util.py#L528

as well as in the implementation of the _PyVerify_fd() function in
CPython. Unfortunately MSVC 2015 no longer exports the __pioinfo
array, and has hidden away most of these implementation details [1].
So the code I linked to just above will no longer work with
ucrtbase.dll and friends. Several other projects have had to
implement workarounds to this change. That said, there's no reason to
believe that the implementation has changed significantly.

Erik

[1] https://connect.microsoft.com/VisualStudio/feedback/details/1279133

carlkl

Mar 22, 2016, 11:37:45 AM
to mingwpy