Cython 0.20 beta 1

Robert Bradshaw

Jan 4, 2014, 12:01:01 AM
to Core developer mailing list of the Cython compiler, cython...@googlegroups.com
I just uploaded the first beta release for Cython 0.20. It's tagged as
0.20b1 in the git repository, or you can find it at
http://cython.org/release/Cython-0.20b1.tar.gz . A summary of the
changes can be found at
https://github.com/cython/cython/blob/0.20b1/CHANGES.rst . Please try
it out and let us know if you have any issues.

- Robert

Czarek Tomczak

Jan 6, 2014, 5:37:09 AM
to cython...@googlegroups.com, Core developer mailing list of the Cython compiler
Hi Robert,

What is the syntax for C++ template functions?

Best regards,
Czarek

Robert Bradshaw

Jan 6, 2014, 11:54:33 PM
to cython...@googlegroups.com
Similar to classes, the template parameters follow the name:
https://sage.math.washington.edu:8091/hudson/job/cython-docs/doclinks/1/src/userguide/wrapping_CPlusPlus.html#templates

Czarek Tomczak

Jan 7, 2014, 5:30:06 AM
to cython...@googlegroups.com
On Tue, Jan 7, 2014 at 5:54 AM, Robert Bradshaw <robe...@gmail.com> wrote:
Similar to classes, the template parameters follow the name:
https://sage.math.washington.edu:8091/hudson/job/cython-docs/doclinks/1/src/userguide/wrapping_CPlusPlus.html#templates

Thanks!

-Czarek

Andreas van Cranenburgh

Jan 7, 2014, 6:02:59 AM
to cython...@googlegroups.com, Core developer mailing list of the Cython compiler
There is a new warning:

    Non-trivial type declarators in shared declaration.

It wasn't immediately obvious to me what this means (trivial & shared could be lots of things), but looking at the code it seems to suggest that declarations should be on their own line except for primitive types.

I also get an error in code with string formatting. Turns out "%s" % foo used to be translated with PyNumber_Remainder, but now it uses __Pyx_PyString_Format with type checking which fails. This is probably an issue in my code (trying to make things work in 2 and 3) but string formatting is not mentioned in the changelog.
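
For context on why the old generated code called PyNumber_Remainder: in Python, `%` on a string goes through the same generic binary operator as numeric remainder, and the string type supplies the formatting behaviour. A minimal sketch in plain Python 3 (not Cython output; the variable names are only illustrative):

```python
import operator

fmt = "foo%s"
args = ("bar",)

# The % operator and the generic "mod" operation are the same thing;
# for strings, that operation performs formatting.
via_operator = fmt % args
via_generic_mod = operator.mod(fmt, args)

# For numbers, the very same operation computes a remainder.
numeric = operator.mod(7, 3)
```

A compiler that knows the left operand is a string can therefore call a dedicated formatting helper instead of the generic operation, which appears to be the change observed here.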

Stefan Behnel

Jan 7, 2014, 6:25:20 AM
to cython...@googlegroups.com, Core developer mailing list of the Cython compiler
Hi,

thanks for reporting.

Andreas van Cranenburgh, 07.01.2014 12:02:
> There is a new warning:
>
> Non-trivial type declarators in shared declaration.
>
> It wasn't immediately obvious to me what this means (trivial & shared could
> be lots of things), but looking at the code it seems to suggest that
> declarations should be on their own line except for primitive types.

Right. It's meant to prepare a potential switch of the way pointers are
declared, as well as making it less likely to get the declarations wrong. C
is surprisingly relaxed about these things, but Cython shouldn't be.

I guess the message could be improved, though. Maybe add something like
"mix of pointers and values" as a hinting example.


> I also get an error in code with string formatting. Turns out "%s" % foo
> used to be translated with PyNumber_Remainder, but now it
> uses __Pyx_PyString_Format with type checking which fails. This is probably
> an issue in my code (trying to make things work in 2 and 3) but string
> formatting is not mentioned in the changelog.

I added an entry. Could you provide a code snippet that shows what you were
doing? Just to be sure it's really a problem in your code and not a wrong
assumption in Cython. Optimisations shouldn't break code.

Stefan

Andreas van Cranenburgh

Jan 7, 2014, 6:54:14 AM
to cython...@googlegroups.com, Core developer mailing list of the Cython compiler, stef...@behnel.de


On Tuesday, January 7, 2014 12:25:20 PM UTC+1, Stefan Behnel wrote:
I added an entry. Could you provide a code snippet that shows what you were
doing? Just to be sure it's really a problem in your code and not a wrong
assumption in Cython. Optimisations shouldn't break code.

Here goes:

$ cat t.pyx
cdef str foo
bar = u'bar'
foo = 'foo%s' % (bar, )
print(foo)
$ python -c 'import pyximport; pyximport.install(); import t'  # Cython 0.19.2
foobar
$ python -c 'import pyximport; pyximport.install(); import t'  # Cython 0.20b1
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/acranenb/.local/lib/python2.7/site-packages/Cython-0.20b1-py2.7-linux-x86_64.egg/pyximport/pyximport.py", line 431, in load_module
    language_level=self.language_level)
  File "/home/acranenb/.local/lib/python2.7/site-packages/Cython-0.20b1-py2.7-linux-x86_64.egg/pyximport/pyximport.py", line 210, in load_module
    mod = imp.load_dynamic(name, so_path)
  File "t.pyx", line 3, in init t (/home/acranenb/.pyxbld/temp.linux-x86_64-2.7/pyrex/t.c:824)
    foo = 'foo%s' % (bar, )
ImportError: Building module t failed: ['TypeError: Expected str, got unicode\n']

Stefan Behnel

Jan 7, 2014, 9:09:37 AM
to Core developer mailing list of the Cython compiler, Cython-users
CC-ing cython-users again, since others might run into this, too.

Stefan Behnel, 07.01.2014 13:29:
> Andreas van Cranenburgh, 07.01.2014 12:54:
>> On Tuesday, January 7, 2014 12:25:20 PM UTC+1, Stefan Behnel wrote:
>>> I added an entry. Could you provide a code snippet that shows what you
>>> were doing? Just to be sure it's really a problem in your code and not a
>>> wrong assumption in Cython. Optimisations shouldn't break code.
>>
>> Here goes:
>>
>> $ cat t.pyx
>> cdef str foo
>> bar = u'bar'
>> foo = 'foo%s' % (bar, )
>> print(foo)
>> $ python -c 'import pyximport; pyximport.install(); import t' # Cython
>> 0.19.2
>> foobar
>
> Interesting. I assume "python" is Py2.x.
>
>
>> $ python -c 'import pyximport; pyximport.install(); import t' # Cython
>> 0.20b1
>> Traceback (most recent call last):
>> File "<string>", line 1, in <module>
>> File
>> "/home/acranenb/.local/lib/python2.7/site-packages/Cython-0.20b1-py2.7-linux-x86_64.egg/pyximport/pyximport.py",
>> line 431, in load_module
>> language_level=self.language_level)
>> File
>> "/home/acranenb/.local/lib/python2.7/site-packages/Cython-0.20b1-py2.7-linux-x86_64.egg/pyximport/pyximport.py",
>> line 210, in load_module
>> mod = imp.load_dynamic(name, so_path)
>> File "t.pyx", line 3, in init t
>> (/home/acranenb/.pyxbld/temp.linux-x86_64-2.7/pyrex/t.c:824)
>> foo = 'foo%s' % (bar, )
>> ImportError: Building module t failed: ['TypeError: Expected str, got
>> unicode\n']
>
> Hmm, the problem here is not the string formatting, it's the explicitly
> typed "foo". The string formatting expands the plain str formatting string
> into a Unicode string in Py2, but that's not a "str" any more, so the
> subsequent assignment fails due to Cython's type check. You could simplify
> your code to this:
>
> bar = u'bar'
> cdef str foo = bar
>
> (if you do this inside of a function, Cython's type inference should be
> able to deduce a compile time error, but it doesn't do that at the global
> module level)
>
> I wonder why it doesn't produce an error in Cython 0.19 for you...

Tried it, and it seems like it simply doesn't generate a type check at all.
That's clearly wrong and apparently fixed in 0.20, although I don't
remember a change specifically in that corner.

What you can do in Cython 0.20 to make it work with a statically typed
variable, is to use "cdef basestring foo" as a type instead of "str", but
that doesn't work in older Cython versions.

Or, in fact, you could not type it at all. Cython 0.20 should be able to
figure it out all by itself now (as it understands what "some_string %
something_else" actually does) and automatically type the result as basestring.

Stefan
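
The type check discussed in this message can be modelled roughly in plain Python. This is a sketch under assumptions, not Cython's generated code: `check_typed_assignment` is a hypothetical helper, and it assumes the check accepts None or an exact type match, which is my reading of the behaviour described above. Python 3 has no separate unicode type, so the example uses bytes versus str to play the role of Py2's str versus unicode.

```python
def check_typed_assignment(value, expected_type):
    """Model of a typed assignment: None or the exact type passes, else TypeError."""
    if value is None or type(value) is expected_type:
        return value
    raise TypeError(
        "Expected %s, got %s" % (expected_type.__name__, type(value).__name__)
    )


# Passes: the value already has the declared type.
ok = check_typed_assignment(b"foo", bytes)

# Fails: a text string is assigned to a variable declared as a byte string,
# analogous to assigning a Py2 unicode object to "cdef str foo".
try:
    check_typed_assignment("foobar", bytes)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
```

Leaving the variable untyped corresponds to skipping this check altogether.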

Czarek Tomczak

Jan 9, 2014, 7:04:18 AM
to cython...@googlegroups.com, Core developer mailing list of the Cython compiler
Robert,

I've just tested CEF Python with Cython 0.20 beta 1. I get a type error when running the app:

TypeError: Expected str, got unicode

I've debugged it and it seems that the problem is that json.loads() returns unicode strings when using Cython 0.20, and byte strings when using Cython 0.19.2. Here is my code:

  # str1 is of type "str"
  jsonData = str1[len(cefPythonMessageHash):]
  # jsonData is still of type "str"
  message = json.loads(jsonData)
  # message["functionName"] is now of type "unicode"!

This is still exactly the same version of Python and the same json module when running with different versions of Cython. Why the different results? I've searched the Cython sources and I couldn't find any occurrence of the "json" keyword. So why does json.loads() behave differently with different Cython versions? My Python version is 2.7.2 and the json module version is 2.0.9.

Not sure if this could be related. In the setup file I have some cython directives to use bytes c strings:

    cython_directives={
        # Any conversion to unicode must be explicit using .decode().
        "c_string_type": "bytes",
        "c_string_encoding": "utf-8",
    },

Another problem I noticed are some compile warnings in __Pyx_PyInt_From_int64():

cefpython.cpp(84198) : warning C4244: 'argument' : conversion from 'int64' to 'long', possible loss of data
cefpython.cpp(84200) : warning C4244: 'argument' : conversion from 'int64' to 'unsigned long', possible loss of data
cefpython.cpp(84206) : warning C4244: 'argument' : conversion from 'int64' to 'long', possible loss of data
cefpython.cpp(84328) : warning C4244: 'argument' : conversion from '__int64' to 'long', possible loss of data
cefpython.cpp(84330) : warning C4244: 'argument' : conversion from '__int64' to 'unsigned long', possible loss of data
cefpython.cpp(84336) : warning C4244: 'argument' : conversion from '__int64' to 'long', possible loss of data
cefpython.cpp(84454) : warning C4244: 'argument' : conversion from 'uint64' to 'long', possible loss of data
cefpython.cpp(84456) : warning C4244: 'argument' : conversion from 'uint64' to 'unsigned long', possible loss of data
cefpython.cpp(84462) : warning C4244: 'argument' : conversion from 'uint64' to 'long', possible loss of data

These warnings do not occur when using Cython 0.19.2.

The source of the __Pyx_PyInt_From_int64 function:

static CYTHON_INLINE PyObject* __Pyx_PyInt_From_int64(int64 value) {
    const int64 neg_one = (int64) -1, const_zero = 0;
    const int is_unsigned = neg_one > const_zero;
    if (is_unsigned) {
        if (sizeof(int64) < sizeof(unsigned long)) {
            return PyInt_FromLong(value); /* <<< warning on line 84198 */
        } else if (sizeof(int64) <= sizeof(unsigned long)) {
            return PyLong_FromUnsignedLong(value); /* <<< warning on line 84200 */
        } else if (sizeof(int64) <= sizeof(unsigned long long)) {
            return PyLong_FromUnsignedLongLong(value);
        }
    } else {
        if (sizeof(int64) <= sizeof(long)) {
            return PyInt_FromLong(value);
        } else if (sizeof(int64) <= sizeof(long long)) {
            return PyLong_FromLongLong(value);
        }
    }
    {
        int one = 1; int little = (int)*(unsigned char *)&one;
        unsigned char *bytes = (unsigned char *)&value;
        return _PyLong_FromByteArray(bytes, sizeof(int64),
                                     little, !is_unsigned);
    }
} 

Best regards,
Czarek

Stefan Behnel

Jan 9, 2014, 7:57:31 AM
to cython...@googlegroups.com
Czarek Tomczak, 09.01.2014 13:04:
> I've just tested CEF Python <https://code.google.com/p/cefpython/> with
> Cython 0.20 beta 1. I get type error when running app:
>
> TypeError: Expected str, got unicode
>
> I've debugged it and seems that the problem is that json.loads() returns
> unicode strings when using Cython 0.20, and bytes strings when using Cython
> 0.19.2.

This can't have anything to do with Cython.


> Here is my code:
>
> # str1 is of type "str"

Do you mean it's statically typed that way or is it just the runtime type?


> jsonData = str1[len(cefPythonMessageHash):]
> # jsonData is still of type "str"
> message = json.loads(jsonData)
> # message["functionName"] is now of type "unicode"!

Then the json module decoded it into a unicode string, which is totally
reasonable.

1) Are you sure that what you got back before was of type "str"?

2) What does your code do with this value afterwards?

Could you provide a complete code snippet, including all statically
declared types etc.?

My guess is that you are assigning the value to a variable typed as "str".
If so, don't do that. Either use no static typing at all here, or type it
as "basestring" (new in 0.20). "str" is not "unicode" in Py2, and an
assignment of a unicode object to a str typed variable will fail. (It
apparently didn't in 0.19, and that was a bug.)

https://sage.math.washington.edu:8091/hudson/job/cython-docs/doclinks/1/src/tutorial/strings.html#python-string-types-in-cython-code


> This is still exactly the same version of python and the same json module
> when running with different versions of Cython. Why the different results?
> I've searched the Cython sources and I couldn't find any occurence of the
> "json" keyword. So why does json.loads() behave differently with different
> Cython versions?

I'm sure it doesn't.


> My Python version is 2.7.2 and the json module version is 2.0.9.
>
> Not sure if this could be related. In the setup file I have some cython
> directives to use bytes c strings:
>
> cython_directives={
>> # Any conversion to unicode must be explicit using .decode().
>> "c_string_type": "bytes",
>> "c_string_encoding": "utf-8",
>> },

Hmm, this is a funny setup. I wonder what these two actually do in that
combination. Is there a reason why you added them?

As for the C4244 compile warnings in __Pyx_PyInt_From_int64(): my guess is
that you can disregard them, because they are for code that the C compiler
discards. Still, it would be better for Cython to avoid them by using
explicit casts, e.g.

    if (sizeof(int64) < sizeof(unsigned long)) {
        return PyInt_FromLong((long)value);  // <- cast added here!
    }

Stefan
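
To see why the warnings concern code the compiler discards: every sizeof comparison in the quoted __Pyx_PyInt_From_int64() is a compile-time constant, so only one branch can be reached on a given platform. The sketch below mirrors that branch selection in Python using ctypes sizes. `pick_conversion` is a hypothetical name; it only returns the name of the C-API call the quoted C code would reach, and it is not part of Cython.

```python
import ctypes


def pick_conversion(value_type, is_unsigned):
    """Return the name of the conversion call the quoted C code would select."""
    size = ctypes.sizeof(value_type)
    if is_unsigned:
        if size < ctypes.sizeof(ctypes.c_ulong):
            return "PyInt_FromLong"
        elif size <= ctypes.sizeof(ctypes.c_ulong):
            return "PyLong_FromUnsignedLong"
        elif size <= ctypes.sizeof(ctypes.c_ulonglong):
            return "PyLong_FromUnsignedLongLong"
    else:
        if size <= ctypes.sizeof(ctypes.c_long):
            return "PyInt_FromLong"
        elif size <= ctypes.sizeof(ctypes.c_longlong):
            return "PyLong_FromLongLong"
    # Fallback used when no native integer type is wide enough.
    return "_PyLong_FromByteArray"


# The result depends on the platform's size of long (for example, 4 bytes
# with MSVC on Windows, 8 bytes on 64-bit Linux).
signed_choice = pick_conversion(ctypes.c_int64, is_unsigned=False)
unsigned_choice = pick_conversion(ctypes.c_uint64, is_unsigned=True)
```

Where long is 4 bytes, a signed 64-bit value ends up in the long long branch, so the narrowing calls that trigger the warnings are never executed for it.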

Czarek Tomczak

Jan 9, 2014, 9:08:50 AM
to cython...@googlegroups.com, stef...@behnel.de
On Thursday, January 9, 2014 1:57:31 PM UTC+1, Stefan Behnel wrote:
2) What does your code do with this value afterwards?

Here is the code:

ctypedef object py_string

cdef JavascriptCallback CreateJavascriptCallback(py_string functionName):
    Debug("Created javascript callback, callbackId=%s, functionName=%s" % \
            (callbackId, functionName))

And it throws an error in Cython 0.20:

TypeError: Expected str, got unicode

The functionName is of type unicode. In Cython 0.19.2 it works fine. But it looks like in Cython 0.20 I need to cast it explicitly using str()? Or should I define py_string as basestring?

You were right that json.loads() returned unicode strings in both 0.19.2 and 0.20, I must have messed up something when printing debug information.

>> cython_directives={
>>     # Any conversion to unicode must be explicit using .decode().
>>     "c_string_type": "bytes",
>>     "c_string_encoding": "utf-8",
>> },
> Hmm, this is a funny setup. I wonder what these two actually do in that
> combination. Is there a reason why you added them?

This is for backwards compatibility. The code runs on both Python 2.7 and Python 3, and there are many conditions in the code that check the Python version and act accordingly. In one of the previous Cython versions, C string types were bytes on Python 2.7 by default, and unicode on Python 3. That changed in one of the Cython releases, and lots of errors started appearing because the code was taking for granted that C string types are bytes in Python 2.7. So the fix was either to modify 20 files or to add these Cython directives to the setup. The latter option was chosen.

-Czarek
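
On the json.loads() point settled above: the json module returns text strings for JSON string values regardless of how the calling code was compiled. A minimal sketch in plain Python 3, where the text type is str (in the Python 2.7 setup of this thread it would be unicode); the payload is a shortened version of the one used in the thread:

```python
import json

# Bytes as they might arrive from the C/C++ side, assumed to be UTF-8 encoded.
raw = b'{"what": "javascript-callback", "callbackId": 123, "functionName": "xxx"}'

# Decode explicitly at the boundary, then parse.
message = json.loads(raw.decode("utf-8"))
function_name = message["functionName"]
```

Any variable or argument that receives `function_name` therefore has to accept text, not a byte string.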

Czarek Tomczak

Jan 9, 2014, 9:18:00 AM
to cython...@googlegroups.com, stef...@behnel.de
On Thursday, January 9, 2014 3:08:50 PM UTC+1, Czarek Tomczak wrote:
>> cython_directives={
>>     # Any conversion to unicode must be explicit using .decode().
>>     "c_string_type": "bytes",
>>     "c_string_encoding": "utf-8",
>> },
> Hmm, this is a funny setup. I wonder what these two actually do in that
> combination. Is there a reason why you added them?

This is for backwards compatibility. The code runs on both Python 2.7 and Python 3, and there are many conditions in the code that check the Python version and act accordingly. In one of the previous Cython versions, C string types were bytes on Python 2.7 by default, and unicode on Python 3. That changed in one of the Cython releases, and lots of errors started appearing because the code was taking for granted that C string types are bytes in Python 2.7. So the fix was either to modify 20 files or to add these Cython directives to the setup. The latter option was chosen.


Ahh, you asked about the combination of these two. What is wrong with that? In python 2.7 unicode file paths are broken (there was a discussion about that some time ago on this group). I must use utf-8 bytes. What other options do I have?

-Czarek

Stefan Behnel

Jan 9, 2014, 9:24:02 AM
to cython...@googlegroups.com
Czarek Tomczak, 09.01.2014 15:08:
> On Thursday, January 9, 2014 1:57:31 PM UTC+1, Stefan Behnel wrote:
>>
>> 2) What does your code do with this value afterwards?
>
> Here is the code:
>
> ctypedef object py_string
>>
>> cdef JavascriptCallback CreateJavascriptCallback(py_string functionName):
>>
> Debug("Created javascript callback, callbackId=%s, functionName=%s" % \
>> (callbackId, functionName))
>
>
> And it throws an error in Cython 0.20:
>
> TypeError: Expected str, got unicode
>
>
> The functionName is of type unicode.

Let me guess. The "Debug" function is defined as

def Debug(str input): ...


> In Cython 0.19.2 it works fine.

Correction: in Cython 0.19.2 you might not be getting the error that you
should get.


> looks like in Cython 0.20 I need to cast it explicitilly using str()?

That is rarely a good idea, especially in Py2. Leads to all sorts of subtle
bugs.


> Or should I define py_string as basestring?

Depends. Again: could you provide the complete code snippet, *please* ?

And it would be even better if you could provide a code snippet that is so
complete that I could even run it through the compiler.

Stefan

Stefan Behnel

Jan 9, 2014, 9:41:30 AM
to cython...@googlegroups.com
Czarek Tomczak, 09.01.2014 15:18:
> On Thursday, January 9, 2014 3:08:50 PM UTC+1, Czarek Tomczak wrote:
>>
>>> cython_directives={
>>>>> # Any conversion to unicode must be explicit using .decode().
>>>>> "c_string_type": "bytes",
>>>>> "c_string_encoding": "utf-8",
>>>>> },
>>> Hmm, this is a funny setup. I wonder what these two actually do in that
>>> combination. Is there a reason why you added them?
>>
>>
>> This is for backwards compatibility. The code runs on both Python 2.7 and
>> Python 3, there are many conditions in code that check python version and
>> act accordingly. In one of previous cython versions c string types were
>> bytes on Python 2.7 by default, and unicode on Python 3.

Sorry, what? When was that?


>> It all changed in
>> one of cython releases, lots of errors started appearing, because the code
>> was taking for granted that c string types are bytes in Python 2.7.

And they still map to them, in both Py2 and Py3, unless you override the
mapping explicitly using the above two config options.


>> So the
>> fix was either to modify 20 files or to add these cython directives to
>> setup. The latter option was chosen.
>
> Ahh, you asked about the combination of these two. What is wrong with that?

Hmm, not sure exactly, the documentation is seriously lacking here.

I think it means that C strings turn into bytes on conversion to Python
objects and that Unicode strings turn into UTF-8 encoded C strings.

Is that what you wanted?

I'm also not sure about the behaviour when you do something like
"<unicode>some_c_string", may or may not work.


> In
> python 2.7 unicode file paths are broken (there was a discussion about that
> some time ago on this group). I must use utf-8 bytes. What other options do
> I have?

Be explicit in your code about when you encode and decode. Always.

Stefan
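
One way to follow the "be explicit" advice is to funnel every conversion through two small helpers at the boundary. This is a sketch in plain Python 3 under assumptions: `to_text` and `to_bytes` are hypothetical names, and UTF-8 is assumed as the default because that is the encoding mentioned for the C strings in this thread.

```python
def to_text(value, encoding="utf-8", errors="strict"):
    """Return a text string, decoding only when the input is bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding, errors)
    if isinstance(value, str):
        return value
    raise TypeError("expected bytes or str, got %s" % type(value).__name__)


def to_bytes(value, encoding="utf-8", errors="strict"):
    """Return a byte string, encoding only when the input is text."""
    if isinstance(value, str):
        return value.encode(encoding, errors)
    if isinstance(value, bytes):
        return value
    raise TypeError("expected bytes or str, got %s" % type(value).__name__)


# Decode once when data comes in, encode once when it goes out.
incoming = to_text(b"caf\xc3\xa9")
outgoing = to_bytes("caf\u00e9")
```

With helpers like these, the rest of the code can work with one string type only, and the encoding is named in exactly two places.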

Czarek Tomczak

Jan 9, 2014, 10:16:17 AM
to cython...@googlegroups.com, stef...@behnel.de
On Thursday, January 9, 2014 3:24:02 PM UTC+1, Stefan Behnel wrote:
Let me guess. The "Debug" function is defined as

   def Debug(str input): ...

Good guess. That message is printed, but also written to a log file, so it should rather be bytes (or maybe not?). Hmm, I remember having some problems with unicode characters that couldn't be decoded when writing to a file.


> Or should I define py_string as basestring?

Depends. Again: could you provide the complete code snippet, *please* ?
 
And it would be even better if you could provide a code snippet that is so 
complete that I could even run it through the compiler. 

Here is the complete code snippet:

import json
import Cython
print("Cython version = %s" % Cython.__version__)
 
ctypedef object py_string
g_debug = True
g_debugFile = "debug.log"
 
cpdef object Debug(str msg):
    if not g_debug:
        return
    msg = "cefpython: "+str(msg)
    print(msg)
    if g_debugFile:
        try:
            with open(g_debugFile, "a") as file:
                file.write(msg+"\n")
        except:
            print("cefpython: WARNING: failed writing to debug file: %s" % (
                    g_debugFile))
 
cpdef object test():
    cdef py_string cefPythonMessageHash = "####cefpython####"
    cdef bytes messageString = <bytes>"""####cefpython####
            {"what":"javascript-callback","callbackId":123,
             "frameId":123,"functionName":"xxx"}"""
    cdef py_string jsonData = messageString[len(cefPythonMessageHash):]
    print("type of jsonData = %s" % type(jsonData))
    cdef object message = json.loads(jsonData)
    print("type of message[functionName] = %s" % type(message["functionName"]))
    msg = "Created javascript callback, callbackId=%s, functionName=%s" % \
            (message["callbackId"], message["functionName"])
    Debug(msg)
    return None

And the output logs:

C:\cefpython\json-loads-bug>call python "test2.py"
Cython version = 0.20b1
type of jsonData = <type 'str'>
type of message[functionName] = <type 'unicode'>

Traceback (most recent call last):
  File "test2.py", line 2, in <module>
    test.test()
  File "test.pyx", line 23, in test.test (test.cpp:1377)
    cpdef object test():
  File "test.pyx", line 34, in test.test (test.cpp:1314)
    Debug(msg)

TypeError: Expected str, got unicode 

Thanks for taking a look. 

-Czarek

Czarek Tomczak

Jan 9, 2014, 10:42:29 AM
to cython...@googlegroups.com, stef...@behnel.de
On Thursday, January 9, 2014 3:41:30 PM UTC+1, Stefan Behnel wrote:
>> This is for backwards compatibility. The code runs on both Python 2.7 and
>> Python 3, there are many conditions in code that check python version and
>> act accordingly. In one of previous cython versions c string types were
>> bytes on Python 2.7 by default, and unicode on Python 3.

Sorry, what? When was that?

>> It all changed in
>> one of cython releases, lots of errors started appearing, because the code
>> was taking for granted that c string types are bytes in Python 2.7.

And they still map to them, in both Py2 and Py3, unless you override the
mapping explicitly using the above two config options.


Okay, never mind what I said, I got confused, it must have been a different thing. Cython 0.19 introduced the c_string_type and c_string_encoding directives. I remember that I was forced to add these directives, otherwise I was getting some errors about not being explicit during conversion somewhere in the code (or something like that, I don't exactly remember).

I think it means that C strings turn into bytes on conversion to Python
objects and that Unicode strings turn into UTF-8 encoded C strings.

Is that what you wanted?

Yes. Chromium/CEF provides C strings with UTF-8 encoding. So when mixing it with Cython, I think it's a good idea to also make C strings use UTF-8 encoding by default. Cython and Chromium exchange these strings in both directions: when sending data to JavaScript and when receiving data from JavaScript.

I'm also not sure about the behaviour when you do something like
"<unicode>some_c_string", may or may not work. 

> In
> python 2.7 unicode file paths are broken (there was a discussion about that
> some time ago on this group). I must use utf-8 bytes. What other options do
> I have?

Be explicit in your code about when you encode and decode. Always.

Here are my utility functions for converting strings between CEF <> Cython:

cdef py_string CefToPyString(
        ConstCefString& cefString):
    cdef cpp_string cppString
    if cefString.empty():
        return ""
    IF UNAME_SYSNAME == "Windows":
        cdef wchar_t* wcharstr = <wchar_t*> cefString.c_str()
        return WidecharToPyString(wcharstr)
    ELSE:
        cppString = cefString.ToString()
        if PY_MAJOR_VERSION < 3:
            return <bytes>cppString
        else:
            return <unicode>((<bytes>cppString).decode(
                    g_applicationSettings["string_encoding"],
                    errors=BYTES_DECODE_ERRORS))

cdef void PyToCefString(
        py_string pyString,
        CefString& cefString
        ) except *:
    if PY_MAJOR_VERSION < 3:
        if type(pyString) == unicode:
            pyString = <bytes>(pyString.encode(
                    g_applicationSettings["string_encoding"],
                    errors=UNICODE_ENCODE_ERRORS))
    else:
        # The unicode type is not defined in Python 3.
        if type(pyString) == str:
            pyString = <bytes>(pyString.encode(
                    g_applicationSettings["string_encoding"],
                    errors=UNICODE_ENCODE_ERRORS))
    cdef cpp_string cppString = pyString
    # Using cefString.FromASCII() will result in DCHECK failures
    # when a non-ascii character is encountered.
    cefString.FromString(cppString)
 
All string utility functions:

Thanks,
Czarek

Stefan Behnel

Jan 9, 2014, 11:21:56 AM
to cython...@googlegroups.com

Czarek Tomczak, 09.01.2014 16:16:
> On Thursday, January 9, 2014 3:24:02 PM UTC+1, Stefan Behnel wrote:
>>
>> Let me guess. The "Debug" function is defined as
>>
>> def Debug(str input): ...
>>
>
> Good guess. That message is printed, but also written to a log file, so it
> rather should be bytes (or maybe not?). Hmm I remember having some problems
> with unicode characters that coudln't be decoded when writing to a file.

Hmm, maybe you meant "encoded"?


> Source code of the Debug() function:
> https://code.google.com/p/cefpython/source/browse/cefpython/utils.pyx?r=2f6f611c2fcf#20
> The call to Debug() is on line 13 here:
> https://code.google.com/p/cefpython/source/browse/cefpython/javascript_callback_cef3.pyx?r=2f6f611c2fcf
>
>> Or should I define py_string as basestring?
>>
>> Depends. Again: could you provide the complete code snippet, *please* ?
>
> And it would be even better if you could provide a code snippet that is so
> complete that I could even run it through the compiler.
>
>
> Here is the complete code snippet:
>
>> import json
>> import Cython
>> print("Cython version = %s" % Cython.__version__)
>
>> ctypedef object py_string

This typedef looks a bit funny, but I guess you're only doing that in order
to make it easier to change it to an exact type later?


>> g_debug = True
>> g_debugFile = "debug.log"
>
>
>> cpdef object Debug(str msg):

It's generally a bad idea to type an input argument with "str". It's ok in
Py3-only code, but it fails to do The Right Thing with Python 2's
str/unicode string ambiguity. You could use basestring here, but I'd rather
leave it untyped. There's nothing really to gain here.


>> if not g_debug:
>> return
>> msg = "cefpython: "+str(msg)

You already typed "msg" as "str" above, so converting it to str() here is a
no-op. You can remove either of the two.


>> print(msg)
>> if g_debugFile:
>> try:
>> with open(g_debugFile, "a") as file:
>> file.write(msg+"\n")

ISTM that what you want in this file is text, so why not open it in text
(i.e. Unicode) mode with a proper encoding?

Note that print() isn't safe for arbitrary output, though, unless you also
control the system encoding of sys.stdout.


>> except:
>> print("cefpython: WARNING: failed writing to debug file: %s" %
>> (
>> g_debugFile))
>>
>
>
>> cpdef object test():
>> cdef py_string cefPythonMessageHash = "####cefpython####"
>> cdef bytes messageString = <bytes>"""####cefpython####
>> {"what":"javascript-callback","callbackId":123,
>> "frameId":123,"functionName":"xxx"}"""

You should use the "b" prefix for byte strings. Also, unprefixed strings
auto-coerce in Cython, so the <bytes> cast is even redundant.


>> cdef py_string jsonData = messageString[len(cefPythonMessageHash):]

Here, you are mixing str and bytes. That is generally a bad idea. You
should make it clear in your code what you are processing, bytes or text,
and use the appropriate string type.


>> print("type of jsonData = %s" % type(jsonData))
>> cdef object message = json.loads(jsonData)
>> print("type of message[functionName] = %s" %
>> type(message["functionName"]))
>> msg = "Created javascript callback, callbackId=%s, functionName=%s" % \
>> (message["callbackId"], message["functionName"])
>> Debug(msg)

This is what I meant. You are taking a string of which you can't know the
type in Py2 and pass it into a function that requires a str value. This
fails if the string is a unicode string. It's only guaranteed to work in Py3.

Stefan
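
The suggestions in this review (leave the argument untyped, treat the log as text, open the file with an explicit encoding) could look roughly like this in plain Python. It is a sketch under assumptions, not the cefpython implementation: `debug` and `demo` are hypothetical names, UTF-8 is an assumed log encoding, and printing is left out because, as noted above, print() depends on the encoding of sys.stdout.

```python
import io
import os
import tempfile


def debug(msg, log_path, encoding="utf-8"):
    """Append one log line as text; bytes input is decoded explicitly first."""
    if isinstance(msg, bytes):
        msg = msg.decode(encoding, "replace")
    line = "cefpython: " + msg
    # Text mode with a named encoding: the file layer does the encoding.
    with io.open(log_path, "a", encoding=encoding) as log_file:
        log_file.write(line + "\n")
    return line


def demo():
    """Write one text and one bytes message to a temporary log and read it back."""
    handle, log_path = tempfile.mkstemp(suffix=".log")
    os.close(handle)
    try:
        debug("functionName=\u017c\u00f3\u0142w", log_path)
        debug(b"callbackId=123", log_path)
        with io.open(log_path, "r", encoding="utf-8") as log_file:
            return log_file.read().splitlines()
    finally:
        os.remove(log_path)
```

io.open() is available in both Python 2.7 and Python 3, so the same call works in a code base that targets both.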

Stefan Behnel

Jan 9, 2014, 11:45:44 AM
to cython...@googlegroups.com
Czarek Tomczak, 09.01.2014 16:42:
> On Thursday, January 9, 2014 3:41:30 PM UTC+1, Stefan Behnel wrote:
>>
>>>> This is for backwards compatibility. The code runs on both Python 2.7
>> and
>>>> Python 3, there are many conditions in code that check python version
>> and
>>>> act accordingly. In one of previous cython versions c string types were
>>>> bytes on Python 2.7 by default, and unicode on Python 3.
>>
>> Sorry, what? When was that?
>>
>>>> It all changed in
>>>> one of cython releases, lots of errors started appearing, because the
>> code
>>>> was taking for granted that c string types are bytes in Python 2.7.
>>
>> And they still map to them, in both Py2 and Py3, unless you override the
>> mapping explicitly using the above two config options.
>>
>>
> Okay, never mind what I said, I got confused, it must have been a different
> thing. Cython 0.19 introduced the c_string_type and c_string_encoding
> directives. I remember that I was forced to add these directives, otherwise
> I was getting some errors about not being explicit during conversion
> somewhere in the code (or something like that, I don't exactly remember).

That sounds to me (from a distance, admittedly) like you took the wrong
path. Instead, you should have fixed your code.


>> I think it means that C strings turn into bytes on conversion to Python
>> objects and that Unicode strings turn into UTF-8 encoded C strings.
>>
>> Is that what you wanted?
>
> Yes. Chromium/CEF provides C strings with utf-8 encoding. So when mixing it
> with Cython, I think it's a good idea to also make C strings have utf-8
> encoding by default. Cython and Chromium exchange with these strings in
> both ways, when sending data to javascript and receiving from javascript.

Why not always work with unicode strings? I assume you are dealing with
text here, right? Wanting to support both cases in one code base, i.e.
Unicode strings and encoded byte strings, is just screaming for trouble and
hassle, IMHO.

Obviously, it depends also on what you are doing with the content of these
strings, but if you are passing them into Python space at some point,
you'll want to decode them anyway, so why not do it right at the border to CEF?


> Here are my utility functions for converting strings between CEF <> Cython:

Hm, this is really badly formatted, but I'll see if I can make sense of it.


> cdef py_string CefToPyString(
> ConstCefString& cefString):
> cdef cpp_string cppString
> if cefString.empty():
> return ""

Ok, so this is a str value ...


> IF UNAME_SYSNAME == "Windows":
> cdef wchar_t* wcharstr = <wchar_t*> cefString.c_str()
> return WidecharToPyString(wcharstr)

.... no idea what this is ...


> ELSE:
> cppString = cefString.ToString()
> if PY_MAJOR_VERSION < 3:
> return <bytes>cppString

... but here you are returning bytes, although only in Py2, so this is
actually a "str". Casting it to <bytes> is ok, though, because it's
Py2-only code. Note that coercion to bytes is the default behaviour for
C/C++ strings, though, so my guess is that the cast is actually redundant.


> else:
> return <unicode>((<bytes>cppString).decode(
> g_applicationSettings["string_encoding"],
> errors=BYTES_DECODE_ERRORS))

And here you are returning a unicode string, but only in Py3, so this is a
"str" again. Fine. No need to cast it to <unicode>, though, that's a no-op
again.

Assuming that cppString is an actual C++ std::string, casting it to <bytes>
first is also redundant and costly. Instead, call .decode() on it directly,
Cython supports that.

I take it that this function is supposed to always return a "str" value,
both in Py2 and Py3. I already commented on this above.


> cdef void PyToCefString(
> py_string pyString,
> CefString& cefString
> ) except *:
> if PY_MAJOR_VERSION < 3:
> if type(pyString) == unicode:

What about subtypes?

> pyString = <bytes>(pyString.encode(
> g_applicationSettings["string_encoding"],
> errors=UNICODE_ENCODE_ERRORS))

Cython can generate more efficient code if you cast pyString instead of the
result, i.e.

(<unicode>pyString).encode(...)


> else:
> # The unicode type is not defined in Python 3.

But it's defined in Cython, so the following is dead code:


> if type(pyString) == str:
> pyString = <bytes>(pyString.encode(
> g_applicationSettings["string_encoding"],
> errors=UNICODE_ENCODE_ERRORS))

These two conversion functions look generally ok. I think the problem is
more your general usage of string types in the code.

Stefan

Czarek Tomczak

Jan 9, 2014, 11:56:39 AM1/9/14
to cython...@googlegroups.com, stef...@behnel.de
On Thursday, January 9, 2014 5:21:56 PM UTC+1, Stefan Behnel wrote:
>> ctypedef object py_string

This typedef looks a bit funny, but I guess you're only doing that in order
to make it easier to change it to an exact type later?

In Python 2.7 I use byte strings by default, in Python 3 unicode strings. Hmm, I wasn't aware of the "basestring" type, I should probably use it. I prefer that all functions state explicitly what types of parameters they accept. The reason for it is to have more static typing I guess, better error detection during compiling.

>>     print(msg) 
>>     if g_debugFile:
>>         try:
>>             with open(g_debugFile, "a") as file:
>>                 file.write(msg+"\n")

ISTM that what you want in this file is text, so why not open it in text
(i.e. Unicode) mode with a proper encoding?

Note that print() isn't safe for arbitrary output, though, unless you also
control the system encoding of sys.stdout.

Found this code in some other file, this is probably what you want me to do?

    if type(errorMsg) == bytes:
        errorMsg = errorMsg.decode(encoding=appEncoding, errors="replace")
    try:
        with codecs.open(errorFile, mode="a", encoding=appEncoding) as fp:
            fp.write("\n[%s] %s\n" % (
                    time.strftime("%Y-%m-%d %H:%M:%S"), errorMsg))
    except:
        print("cefpython: WARNING: failed writing to error file: %s" % (
                errorFile))
 
..

>>     cdef py_string jsonData = messageString[len(cefPythonMessageHash):]

Here, you are mixing str and bytes. That is generally a bad idea. You
should make it clear in your code what you are processing, bytes or text,
and use the appropriate string type.

messageString will be unicode in Python 3 and bytes in Python 2.7, that's why I use py_string type here.
 
-Czarek

Stefan Behnel

Jan 9, 2014, 12:04:08 PM1/9/14
to cython...@googlegroups.com
Czarek Tomczak, 09.01.2014 17:56:
> On Thursday, January 9, 2014 5:21:56 PM UTC+1, Stefan Behnel wrote:
>>
>>>> ctypedef object py_string
>>
>> This typedef looks a bit funny, but I guess you're only doing that in
>> order
>> to make it easier to change it to an exact type later?
>
> In Python 2.7 I use bytes strings by default, in Python 3 unicode strings.
> Hmm I wasn't aware of "basestring" type, I should probably use it.

If you use bytes in Py2 and unicode in Py3, then str is your type. The
problem you are facing, however, is that you do *not* only have bytes in
Py2. Lots of places can give you unicode strings, such as string formatting
or user code. If you want to allow that, basestring will do it. However,
after accepting the value, you have to normalise it in Py2 to make sure you
really have a byte string.
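
A minimal sketch of that normalisation step in plain Python. The function name `normalise_to_bytes` and the UTF-8 default are assumptions for illustration, not cefpython code; `type(u"")` is used only so that the same source runs on both Py2 and Py3:

```python
text_type = type(u"")   # unicode on Py2, str on Py3

def normalise_to_bytes(value, encoding="utf-8"):
    """Return `value` as a byte string, encoding text if necessary."""
    if isinstance(value, bytes):
        return value                    # already encoded, pass through
    if isinstance(value, text_type):
        return value.encode(encoding)   # text: encode exactly once, here
    raise TypeError("expected a string, got %s" % type(value).__name__)

already_bytes = normalise_to_bytes(b"abc")
from_text = normalise_to_bytes(u"caf\xe9")
```

Anything that is neither bytes nor text is rejected early instead of failing somewhere deeper in the conversion code.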


> I prefer
> that all functions state explicitly what types of parameters they accept.
> The reason for it is to have more static typing I guess, better error
> detection during compiling.

Fair enough. Note that bytes != str != unicode != basestring in Cython, though.


>>>     print(msg)
>>>     if g_debugFile:
>>>         try:
>>>             with open(g_debugFile, "a") as file:
>>>                 file.write(msg+"\n")
>>
>> ISTM that what you want in this file is text, so why not open it in text
>> (i.e. Unicode) mode with a proper encoding?
>>
>> Note that print() isn't safe for arbitrary output, though, unless you also
>> control the system encoding of sys.stdout.
>>
>
> Found this code in some other file, this is probably what you want me to do?
>
> if type(errorMsg) == bytes:
>     errorMsg = errorMsg.decode(encoding=appEncoding, errors="replace")
> try:
>     with codecs.open(errorFile, mode="a", encoding=appEncoding) as fp:
>         fp.write("\n[%s] %s\n" % (
>                 time.strftime("%Y-%m-%d %H:%M:%S"), errorMsg))
> except:
>     print("cefpython: WARNING: failed writing to error file: %s" % (
>             errorFile))

Something like that, yes. There's also io.open() in Py2.6+, which is
essentially Py3's open() builtin.
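
A small sketch of the io.open() variant. The helper name `append_log`, the temporary file and the UTF-8 encoding are assumptions for illustration, not taken from your code:

```python
import io
import os
import tempfile
import time

def append_log(path, message, encoding="utf-8"):
    """Append one timestamped line of text to `path`."""
    if isinstance(message, bytes):
        # Decode at the edge so that only text reaches the file object.
        message = message.decode(encoding, "replace")
    # io.open() in text mode encodes on write, on Py2.6+ as well as Py3.
    with io.open(path, mode="a", encoding=encoding) as fp:
        fp.write(u"[%s] %s\n" % (time.strftime("%Y-%m-%d %H:%M:%S"), message))

log_path = os.path.join(tempfile.mkdtemp(), "debug.log")
append_log(log_path, u"caf\xe9")        # text input
append_log(log_path, b"caf\xc3\xa9")    # UTF-8 encoded input

with io.open(log_path, encoding="utf-8") as fp:
    lines = fp.read().splitlines()
```

Both calls end up writing the same text, because the bytes are decoded before they reach the file.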

Stefan

Chris Barker

Jan 9, 2014, 12:04:28 PM1/9/14
to cython-users, stef...@behnel.de
On Thu, Jan 9, 2014 at 8:56 AM, Czarek Tomczak <czarek....@gmail.com> wrote:
In Python 2.7 I use bytes strings by default, in Python 3 unicode strings.

Why not unicode in both cases? You are either ANSI-only or you are unicode -- and with Chrome, I can't imagine you could count on ANSI only for anything. So why not unicode everywhere?

What does the Chrome lib use for unicode strings in its C++ code? All you should need is a translator from python unicode to that -- probably a more or less one-line encode and decode function.
 

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris....@noaa.gov

Czarek Tomczak

Jan 9, 2014, 12:29:04 PM1/9/14
to cython...@googlegroups.com, stef...@behnel.de
On Thursday, January 9, 2014 5:45:44 PM UTC+1, Stefan Behnel wrote:
Why not always work with unicode strings? I assume you are dealing with
text here, right? Wanting to support both cases in one code base, i.e.
Unicode strings and encoded byte strings, is just screaming for trouble and
hassle, IMHO.

One thing is backwards compatibility. Users are already expecting that the cefpython API returns byte strings in Python 2.7. I'm not sure if it is a good idea to make such a major change now. It could break code in user apps. This would also require some significant time for changes and for testing it thoroughly.
 
Obviously, it depends also on what you are doing with the content of these
strings, but if you are passing them into Python space at some point,
you'll want to decode them anyway, so why not do it right at the border to CEF?

Until now, it wasn't really needed to convert these strings to unicode strings, but this is probably because I haven't been working much with languages other than English. I still haven't received too much demand from users for automatic unicode support. And I have this concern of breaking backwards compatibility, so I haven't yet done too much in that direction. In Python 3 all should work great though.
 
> ELSE:
> cppString = cefString.ToString()
> if PY_MAJOR_VERSION < 3:
> return <bytes>cppString

... but here you are returning bytes, although only in Py2, so this is
actually a "str". Casting it to <bytes> is ok, though, because it's
Py2-only code. Note that coercion to bytes is the default behaviour for
C/C++ strings, though, so my guess is that the cast is actually redundant.

I know these are redundant. At one moment I got confused by all this unicode hell, the unfortunate naming of encode/decode doesn't help. To make the code more clear I added these explicit casts so that it's easier to read, to see what's really happening.
 
> else:
> return <unicode>((<bytes>cppString).decode(
> g_applicationSettings["string_encoding"],
> errors=BYTES_DECODE_ERRORS))

And here you are returning a unicode string, but only in Py3, so this is a
"str" again. Fine. No need to cast it to <unicode>, though, that's a no-op
again.

Assuming that cppString is an actual C++ std::string, casting it to <bytes>
first is also redundant and costly. Instead, call .decode() on it directly,
Cython supports that.

I take it that this function is supposed to always return a "str" value,
both in Py2 and Py3. I already commented on this above.  
.. 
> pyString = <bytes>(pyString.encode(
> g_applicationSettings["string_encoding"],
> errors=UNICODE_ENCODE_ERRORS))

Cython can generate more efficient code if you cast pyString instead of the
result, i.e.

    (<unicode>pyString).encode(...)

I don't think that performance of these functions is a concern at the moment, but thanks for the hints, as who knows it might change in the future.
 
> cdef void PyToCefString(
> py_string pyString,
> CefString& cefString
> ) except *:
> if PY_MAJOR_VERSION < 3:
> if type(pyString) == unicode:

What about subtypes?

Ahhh ;-) I should probably check with isinstance(pyString, unicode)?
And the same for "type(pyString) == str", should become isinstance(pyString, str)?
 
> else:
> # The unicode type is not defined in Python 3.

But it's defined in Cython, so the following is dead code:
 
> if type(pyString) == str: 
> pyString = <bytes>(pyString.encode( 
> g_applicationSettings["string_encoding"], 
> errors=UNICODE_ENCODE_ERRORS))  

No no no, there was a condition checking for python version earlier:

    if PY_MAJOR_VERSION < 3:
        if type(pyString) == unicode:
            ...

    else:
        # The unicode type is not defined in Python 3.
        if type(pyString) == str:
            ...

The comment "The unicode type is not defined in Python 3." was regarding that I had to check for PY_MAJOR_VERSION before I could check if the type is unicode, otherwise I would get an error in Python 3:

if py2.7:
 ..
else: # py 3
  if type == str # in py 3 str == unicode

But I already know that this is not the case in Cython, as the unicode type is always defined. So it's a bit of redundant code.

Thanks for all the comments.

-Czarek 

Czarek Tomczak

Jan 9, 2014, 12:44:22 PM1/9/14
to cython...@googlegroups.com, stef...@behnel.de
On Thursday, January 9, 2014 6:04:08 PM UTC+1, Stefan Behnel wrote:
> I prefer
> that all functions state explicitly what types of parameters they accept.
> The reason for it is to have more static typing I guess, better error
> detection during compiling.

Fair enough. Note that bytes != str != unicode != basestring in Cython, though.

Okay, so if I understand correctly, I cannot define py_string as basestring, it must be object. I cannot have static typing in function parameters to accept only strings? I would have to add isinstance(s, basestring) in the function body, but that's too much hassle.

-Czarek

Czarek Tomczak

Jan 9, 2014, 12:52:55 PM1/9/14
to cython...@googlegroups.com, stef...@behnel.de
Hi Chris,

On Thursday, January 9, 2014 6:04:28 PM UTC+1, Chris Barker wrote:
On Thu, Jan 9, 2014 at 8:56 AM, Czarek Tomczak <czarek....@gmail.com> wrote:
In Python 2.7 I use bytes strings by default, in Python 3 unicode strings.

Why not unicode in both cases? You are either ANSI-only or you are unicode -- and with Chrome, I can't imagine you could count on ANSI only for anything. So why not unicode everywhere?

One reason is backwards compatibility. User apps might break, as they already assume that cefpython strings in Py27 are bytes. Although this could be configurable through some option, if someone would like to have unicode strings. On the other hand, supporting two different code paths would definitely make the code more complex, and there wasn't much demand for better unicode support yet, so this probably isn't high priority.

What does the Chrome lib use for unicode strings in its C++ code? All you should need is a translator from python unicode to that -- probably a more or less one-line encode and decode function.

Chrome uses the ICU library for unicode support.

The performance for these strings is rather not an issue right now.

-Czarek

Czarek Tomczak

Jan 9, 2014, 12:55:22 PM1/9/14
to cython...@googlegroups.com, stef...@behnel.de
On Thursday, January 9, 2014 6:29:04 PM UTC+1, Czarek Tomczak wrote:
> else:
> # The unicode type is not defined in Python 3.

But it's defined in Cython, so the following is dead code:
 
> if type(pyString) == str: 
> pyString = <bytes>(pyString.encode( 
> g_applicationSettings["string_encoding"], 
> errors=UNICODE_ENCODE_ERRORS))  

No no no, there was a condition checking for python version earlier:
.......

Ahh, okay, I understand now what you meant by dead code..

-Czarek

Stefan Behnel

Jan 9, 2014, 2:03:51 PM1/9/14
to cython...@googlegroups.com
Czarek Tomczak, 09.01.2014 18:29:
> On Thursday, January 9, 2014 5:45:44 PM UTC+1, Stefan Behnel wrote:
>> Why not always work with unicode strings? I assume you are dealing with
>> text here, right? Wanting to support both cases in one code base, i.e.
>> Unicode strings and encoded byte strings, is just screaming for trouble
>> and hassle, IMHO.
>
> One thing is backwards compatibility. Users are already expecting that
> cefpython API returns bytes strings in Python 2.7.

Ah, yes. That is a rather unfortunate design that can't be changed without
breaking basically all code there is. What you have done here is that you
have moved the burden of doing the utf8<->unicode conversion to the user
side, outside of your library. So, all those little helper functions that
you have now written for Python 3 support already existed, in pretty much
all application code that uses your library.

You should consider increasing the major version of your package at some
point and just getting these things straight.


> I'm not sure if this is
> a good idea to make such a major change now. It could break code in user
> apps. This would also require some significant time for changes and testing
> it thoroughly.

Sure. On the upside, Unicode is much easier to get right when you use it
than when you avoid it. Plus, you already have that code from your Py3
port, don't you? Anyone who wants to port their code to Py3 that uses your
library will have to do whatever is needed to port it to the new interface.


>> Obviously, it depends also on what you are doing with the content of these
>> strings, but if you are passing them into Python space at some point,
>> you'll want to decode them anyway, so why not do it right at the border to
>> CEF?
>
> Until now, it wasn't really needed to convert these strings to unicode
> strings, but this is probably because I haven't been working much with languages
> other than English. I still haven't received too much demand from users for
> automatic unicode support. And I have this concern of breaking backwards
> compatibility, so I haven't yet done too much into that direction. In
> python 3 all should work great though.

Makes me even more convinced that breaking backwards compatibility is a
good idea in this case. Going all the way and doing it also for Py2 will
help you clean up your code.


>>> cdef void PyToCefString(
>>> py_string pyString,
>>> CefString& cefString
>>> ) except *:
>>> if PY_MAJOR_VERSION < 3:
>>> if type(pyString) == unicode:
>>
>> What about subtypes?
>
> Ahhh ;-) I should probably check with isinstance(pyString, unicode)?
> And the same for "type(pyString) == str", should become
> isinstance(pyString, str)?

Right. And Cython will translate them into fast type checking functions for
you.
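
As a rough illustration in plain Python of why the exact type comparison misses subtypes (the `Label` class is a made-up stand-in for any user-defined string subclass):

```python
class Label(str):
    """A trivial str subclass, standing in for any user-defined string type."""

value = Label("hello")

exact_match = type(value) == str        # False: Label is not exactly str
subtype_match = isinstance(value, str)  # True: Label inherits from str
```

With the `type(...) ==` test, a value like this would silently skip the encoding branch.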

Stefan

Czarek Tomczak

Jan 9, 2014, 2:42:27 PM1/9/14
to cython...@googlegroups.com, stef...@behnel.de
Thank you again Stefan for all the thorough comments.

Best regards,
Czarek

Chris Barker

Jan 9, 2014, 4:01:11 PM1/9/14
to cython-users
On Thu, Jan 9, 2014 at 9:52 AM, Czarek Tomczak <czarek....@gmail.com> wrote:

Why not unicode in both cases? You are either ANSI-only or you are unicode -- and with Chrome, I can't imagine you could count on ANSI only for anything. So why not unicode everywhere?

One reason is backwards compatibility. User apps might break, as they already assume that cefpython strings in Py27 are bytes.

so py27 users are getting utf-8 byte strings? Yow!
 
and there wasn't much demand for a better unicode support yet, so probably this isn't high priority.

really? that surprises me -- I'm very much an English-only speaker, but I still need special symbols, etc, and unicode is a better way to deal with those. I'd be whining if I had to encode/decode stuff going in and out of CEF.

It may be that:

a lot of your users are doing ascii only and haven't really noticed, but things will break/be weird at some point.

or

your users are appreciative of your fabulous library, and are not prone to complain about having to do some hand-encoding/decoding in their code!
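
A quick sketch of why ASCII-only users may never notice, in plain Python. The sample strings and the helper `decodes_as_ascii` are made up for illustration:

```python
ascii_text = u"price"
symbol_text = u"5 \u20ac"   # "5 €", one non-ASCII symbol

ascii_bytes = ascii_text.encode("utf-8")
symbol_bytes = symbol_text.encode("utf-8")

# For pure ASCII, the UTF-8 bytes are as long as the text itself ...
same_length = len(ascii_bytes) == len(ascii_text)
# ... but the euro sign takes three bytes, so lengths and indices drift.
extra_bytes = len(symbol_bytes) - len(symbol_text)

def decodes_as_ascii(raw):
    """Report whether a byte string survives a strict ASCII decode."""
    try:
        raw.decode("ascii")
        return True
    except UnicodeDecodeError:
        return False

ascii_ok = decodes_as_ascii(ascii_bytes)
symbol_ok = decodes_as_ascii(symbol_bytes)
```

Code that treats the byte strings as text keeps working until the first such symbol shows up.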

Chrome uses ICU library for unicode support.

nice -- then all you should need are two functions: to convert to/from python unicode objects.

But I agree -- changing the API now is unfortunate, and it's hard to decide when to do it.

If you have a way to poll your users, it might be worth finding out what people are doing and what they would want.
 
-Chris
