cythonize generating unicode extension names (and breaking distutils)

Didrik Pinte

Apr 2, 2012, 2:05:47 PM

to cython-users

Hi,

This is the second release for which I have had to maintain a workaround/patch
for this issue. When using cythonize on Mac OS X (and I got another report
from an Ubuntu user), I get the following exception:

Traceback (most recent call last):
  File "setup.py", line 186, in <module>
    ext_modules = collect_extensions(),
  File "setup.py", line 164, in collect_extensions
    ) for dirpath in cython_extension_directories
  File "/Library/Frameworks/Python.framework/Versions/7.0/lib/python2.7/site-packages/Cython/Build/Dependencies.py", line 495, in cythonize
    aliases=aliases)
  File "/Library/Frameworks/Python.framework/Versions/7.0/lib/python2.7/site-packages/Cython/Build/Dependencies.py", line 479, in create_extension_list
    **kwds))
  File "/Library/Frameworks/Python.framework/Versions/7.0/lib/python2.7/distutils/extension.py", line 106, in __init__
    assert type(name) is StringType, "'name' must be a string"
AssertionError: 'name' must be a string
make: *** [build] Error 1

This can easily be fixed by forcing a conversion of the generated name to a str:

$ git diff
diff --git a/Cython/Build/Dependencies.py b/Cython/Build/Dependencies.py
index a031d60..62177c7 100644
--- a/Cython/Build/Dependencies.py
+++ b/Cython/Build/Dependencies.py
@@ -473,7 +473,7 @@ def create_extension_list(patterns, exclude=[], ctx=None, al
if template is not None:
sources += template.sources[1:]
module_list.append(exn_type(
- name=module_name,
+ name=str(module_name),
sources=sources,
**kwds))
m = module_list[-1]

Now, I am not particularly happy with this solution because it does not
address the root cause. My input to cythonize uses only str, never
unicode strings.

Does anybody have an idea what causes this?

-- Didrik

guyomes

May 31, 2012, 6:49:10 PM

to cython-users

Hi,

I just ran into the same problem, and after some investigation of the code,
it seems that the unicode strings are introduced when the dependencies are
read from the files.

More precisely:
- in Cython/Build/Dependencies.py,
- the function 'parse_dependencies'
- opens files with 'Cython.Utils.open_source_file'
- then fh.read() will likely return unicode strings if your system
has UTF-8 files, and the module names end up as unicode too.
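A minimal sketch of the mechanism described above, in Python. It does not use Cython's actual `open_source_file` or `parse_dependencies`; the `read_cimports` helper and its regular expression are hypothetical simplifications that only handle plain `cimport a.b` lines. The point is that reading a file in text mode with an explicit encoding produces unicode objects on Python 2, so every name parsed out of that text is unicode as well.

```python
import io
import re

# Hypothetical, simplified pattern for plain "cimport a.b" lines; Cython's
# real parse_dependencies also handles "from ... cimport", includes, etc.
CIMPORT_RE = re.compile(r'^[ \t]*cimport[ \t]+([\w.]+)', re.M)


def read_cimports(path):
    """Return module names cimported by a source file (simplified sketch)."""
    # Text-mode io.open with an explicit encoding returns unicode on
    # Python 2 (str on Python 3), and re.findall on unicode text returns
    # unicode matches, so each parsed module name is unicode too.
    with io.open(path, encoding='utf-8') as fh:
        source = fh.read()
    return CIMPORT_RE.findall(source)
```

On Python 2, the names returned here have type `unicode`, which is exactly what later fails the exact-type `StringType` check in distutils.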

In the end, I think it is right to read file contents as unicode when
that is how they are encoded (e.g. UTF-8). So your patch seems to be the best
approach to satisfy the ASCII-name convention of Extension.

To make the patch safer, I would use:

- module_name.encode('ascii')

instead of str(module_name). This raises an error if any non-ASCII
characters were written on the cimport lines in the source files.
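A minimal sketch of the `module_name.encode('ascii')` variant suggested above, wrapped in a hypothetical helper `ascii_module_name` that turns any encoding failure into a clear error naming the offending module. Note that on Python 2 this yields a byte `str`, whereas on Python 3 it yields `bytes`.

```python
def ascii_module_name(module_name):
    """Encode an extension name to ASCII, failing loudly on non-ASCII text.

    Hypothetical helper: on Python 2 the result is a byte str (what
    distutils' Extension expects there); on Python 3 it is bytes.
    """
    try:
        return module_name.encode('ascii')
    except UnicodeError as exc:
        # UnicodeError covers both the encode error for unicode input and
        # the implicit ASCII decode error for non-ASCII byte strings on Py2.
        raise ValueError('non-ASCII module name %r: %s' % (module_name, exc))
```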

Guillaume

MinRK

May 31, 2012, 7:50:58 PM

to cython...@googlegroups.com

On Thu, May 31, 2012 at 3:49 PM, guyomes <gmo...@gmail.com> wrote:

> To make the patch safer, I would use:
>
> - module_name.encode('ascii')
>
> instead of str(module_name). This raises an error if any non-ASCII
> characters were written on the cimport lines in the source files.

A few notes:

* distutils does not require that paths be ASCII, so encoding with sys.getfilesystemencoding() might be more appropriate than 'ascii'.

* If you care about Python 3, make sure you don't encode to bytes there, since all versions of Python strictly require that the name be 'str'.
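A minimal sketch that combines both notes, using a hypothetical helper `to_native_str` (not part of Cython or distutils): on Python 2 it encodes unicode names with the filesystem encoding rather than ASCII, and on Python 3 it always hands back text, never bytes.

```python
import sys


def to_native_str(name):
    """Coerce an extension name to the interpreter's native str type."""
    if isinstance(name, str):
        # Already native: a byte string on Python 2, text on Python 3.
        return name
    # On Python 2 this can be None on some Unix systems; fall back to ASCII.
    encoding = sys.getfilesystemencoding() or 'ascii'
    if sys.version_info[0] < 3:
        # Python 2: name is unicode; encode with the filesystem encoding
        # instead of ASCII, since distutils paths need not be ASCII.
        return name.encode(encoding)
    # Python 3: name is bytes; decode to text so that str, not bytes,
    # reaches distutils.
    return name.decode(encoding)
```

In the Cython patch discussed above, this would take the place of `str(module_name)`; that substitution is an assumption for illustration only.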

This is actually a distutils bug: it inappropriately type-checks for str instead of accepting any string type. If you go in and disable distutils' type checking, unicode paths work just fine (at least in my experiments on OS X, having explored this issue just yesterday in IPython).
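A small sketch contrasting the two kinds of check. The function names are hypothetical; `strict_name_check` follows the spirit of the `type(name) is StringType` assertion from the traceback, while `lenient_name_check` accepts any string type, as argued above. On Python 2 the strict check rejects `u'pkg.mod'` and the lenient one accepts it; on Python 3 both accept plain `str`, but the exact-type test still rejects `str` subclasses.

```python
try:
    STRING_TYPES = (basestring,)  # Python 2: covers both str and unicode
except NameError:
    STRING_TYPES = (str,)  # Python 3: str is already text


def strict_name_check(name):
    # Exact-type test: only the native str type itself passes.
    return type(name) is str


def lenient_name_check(name):
    # isinstance test: any string type (and subclasses) passes.
    return isinstance(name, STRING_TYPES)
```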

-MinRK

guyomes

Jun 1, 2012, 6:14:26 AM

to cython-users

Indeed, similar issues have been reported for distutils
(http://bugs.python.org/issue13943), so I submitted this issue with a
patch:

http://bugs.python.org/issue14978