Similar to classes, the template parameters follow the name:
https://sage.math.washington.edu:8091/hudson/job/cython-docs/doclinks/1/src/userguide/wrapping_CPlusPlus.html#templates
I added an entry. Could you provide a code snippet that shows what you were
doing? Just to be sure it's really a problem in your code and not a wrong
assumption in Cython. Optimisations shouldn't break code.
TypeError: Expected str, got unicode
cython_directives={
    # Any conversion to unicode must be explicit using .decode().
    "c_string_type": "bytes",
    "c_string_encoding": "utf-8",
},
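For reference, directives like these can also be applied project-wide from the build script; a minimal sketch, assuming a plain Cython.Build setup (the module name is a placeholder; `compiler_directives` is the keyword `cythonize()` accepts for this):

```python
from distutils.core import setup
from Cython.Build import cythonize

setup(
    ext_modules=cythonize(
        "cefpython.pyx",  # placeholder module name
        compiler_directives={
            # Any conversion to unicode must be explicit using .decode().
            "c_string_type": "bytes",
            "c_string_encoding": "utf-8",
        },
    ),
)
```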
cefpython.cpp(84198) : warning C4244: 'argument' : conversion from 'int64' to 'long', possible loss of data
cefpython.cpp(84200) : warning C4244: 'argument' : conversion from 'int64' to 'unsigned long', possible loss of data
cefpython.cpp(84206) : warning C4244: 'argument' : conversion from 'int64' to 'long', possible loss of data
cefpython.cpp(84328) : warning C4244: 'argument' : conversion from '__int64' to 'long', possible loss of data
cefpython.cpp(84330) : warning C4244: 'argument' : conversion from '__int64' to 'unsigned long', possible loss of data
cefpython.cpp(84336) : warning C4244: 'argument' : conversion from '__int64' to 'long', possible loss of data
cefpython.cpp(84454) : warning C4244: 'argument' : conversion from 'uint64' to 'long', possible loss of data
cefpython.cpp(84456) : warning C4244: 'argument' : conversion from 'uint64' to 'unsigned long', possible loss of data
cefpython.cpp(84462) : warning C4244: 'argument' : conversion from 'uint64' to 'long', possible loss of data
static CYTHON_INLINE PyObject* __Pyx_PyInt_From_int64(int64 value) {
    const int64 neg_one = (int64) -1, const_zero = 0;
    const int is_unsigned = neg_one > const_zero;
    if (is_unsigned) {
        if (sizeof(int64) < sizeof(unsigned long)) {
            return PyInt_FromLong(value);             <<< warning on line 84198
        } else if (sizeof(int64) <= sizeof(unsigned long)) {
            return PyLong_FromUnsignedLong(value);    <<< warning on line 84200
        } else if (sizeof(int64) <= sizeof(unsigned long long)) {
            return PyLong_FromUnsignedLongLong(value);
        }
    } else {
        if (sizeof(int64) <= sizeof(long)) {
            return PyInt_FromLong(value);
        } else if (sizeof(int64) <= sizeof(long long)) {
            return PyLong_FromLongLong(value);
        }
    }
    {
        int one = 1; int little = (int)*(unsigned char *)&one;
        unsigned char *bytes = (unsigned char *)&value;
        return _PyLong_FromByteArray(bytes, sizeof(int64),
                                     little, !is_unsigned);
    }
}
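The fallback branch at the bottom hands the raw bytes of the value to `_PyLong_FromByteArray`, after probing the machine's endianness at run time. The same round trip, sketched with Python integers:

```python
import sys

# What the _PyLong_FromByteArray fallback does, in Python terms: take the
# raw bytes of a signed 64-bit value and rebuild the integer from them,
# honouring the machine's byte order.
value = -1234567890123456789          # fits in a signed 64-bit integer

# The C code detects endianness via *(unsigned char *)&one;
# Python exposes the same fact as sys.byteorder.
little = (sys.byteorder == "little")

raw = value.to_bytes(8, sys.byteorder, signed=True)   # sizeof(int64) == 8
roundtrip = int.from_bytes(raw, sys.byteorder, signed=True)
assert roundtrip == value
```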
2) What does your code do with this value afterwards?
ctypedef object py_string

cdef JavascriptCallback CreateJavascriptCallback(py_string functionName):
    Debug("Created javascript callback, callbackId=%s, functionName=%s" % \
        (callbackId, functionName))
TypeError: Expected str, got unicode
> cython_directives={
>> # Any conversion to unicode must be explicit using .decode().
>> "c_string_type": "bytes",
>> "c_string_encoding": "utf-8",
>> },
Hmm, this is a funny setup. I wonder what these two actually do in that
combination. Is there a reason why you added them?
>> cython_directives={
>>     # Any conversion to unicode must be explicit using .decode().
>>     "c_string_type": "bytes",
>>     "c_string_encoding": "utf-8",
>> },
> Hmm, this is a funny setup. I wonder what these two actually do in that
> combination. Is there a reason why you added them?

This is for backwards compatibility. The code runs on both Python 2.7 and Python 3; there are many conditions in the code that check the Python version and act accordingly. In one of the previous Cython versions, C string types were bytes on Python 2.7 by default and unicode on Python 3. That changed in a later Cython release and lots of errors started appearing, because the code took for granted that C string types are bytes on Python 2.7. So the fix was either to modify 20 files or to add these Cython directives to setup. The latter option was chosen.
Let me guess. The "Debug" function is defined as
def Debug(str input): ...
> Or should I define py_string as basestring?
Depends. Again: could you provide the complete code snippet, *please* ?
And it would be even better if you could provide a code snippet that is so
complete that I could even run it through the compiler.
import json
import Cython

print("Cython version = %s" % Cython.__version__)

ctypedef object py_string

g_debug = True
g_debugFile = "debug.log"

cpdef object Debug(str msg):
    if not g_debug:
        return
    msg = "cefpython: "+str(msg)
    print(msg)
    if g_debugFile:
        try:
            with open(g_debugFile, "a") as file:
                file.write(msg+"\n")
        except:
            print("cefpython: WARNING: failed writing to debug file: %s" % (
                g_debugFile))

cpdef object test():
    cdef py_string cefPythonMessageHash = "####cefpython####"
    cdef bytes messageString = <bytes>"""####cefpython####
{"what":"javascript-callback","callbackId":123,
"frameId":123,"functionName":"xxx"}"""
    cdef py_string jsonData = messageString[len(cefPythonMessageHash):]
    print("type of jsonData = %s" % type(jsonData))
    cdef object message = json.loads(jsonData)
    print("type of message[functionName] = %s" % type(message["functionName"]))
    msg = "Created javascript callback, callbackId=%s, functionName=%s" % \
        (message["callbackId"], message["functionName"])
    Debug(msg)
    return None
C:\cefpython\json-loads-bug>call python "test2.py"
Cython version = 0.20b1
type of jsonData = <type 'str'>
type of message[functionName] = <type 'unicode'>
Traceback (most recent call last):
  File "test2.py", line 2, in <module>
    test.test()
  File "test.pyx", line 23, in test.test (test.cpp:1377)
    cpdef object test():
  File "test.pyx", line 34, in test.test (test.cpp:1314)
    Debug(msg)
TypeError: Expected str, got unicode
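The root cause is visible in the transcript: under Python 2, `json.loads()` returns `unicode` objects for JSON string values even when fed a byte string, so a function declared as `def Debug(str msg)` rejects the result. A small sketch of the same behaviour (run here under Python 3, where `str` is the unicode type):

```python
import json

# json.loads() always produces text for JSON string values, never bytes --
# even if the JSON document itself originally arrived as an encoded byte string.
message = json.loads('{"what": "javascript-callback", "functionName": "xxx"}')
function_name = message["functionName"]
assert isinstance(function_name, str)      # text (unicode), not bytes

# Code that insists on bytes must therefore encode explicitly at this point:
as_bytes = function_name.encode("utf-8")
assert isinstance(as_bytes, bytes)
```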
>> This is for backwards compatibility. The code runs on both Python 2.7 and
>> Python 3, there are many conditions in code that check python version and
>> act accordingly. In one of previous cython versions c string types were
>> bytes on Python 2.7 by default, and unicode on Python 3.
Sorry, what? When was that?
>> It all changed in
>> one of cython releases, lots of errors started appearing, because the code
>> was taking for granted that c string types are bytes in Python 2.7.
And they still map to them, in both Py2 and Py3, unless you override the
mapping explicitly using the above two config options.
I think it means that C strings turn into bytes on conversion to Python
objects and that Unicode strings turn into UTF-8 encoded C strings.
Is that what you wanted?
I'm also not sure about the behaviour when you do something like
"<unicode>some_c_string", may or may not work.
> In
> python 2.7 unicode file paths are broken (there was a discussion about that
> some time ago on this group). I must use utf-8 bytes. What other options do
> I have?
Be explicit in your code about when you encode and decode. Always.
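One way to follow that advice is to confine all encoding and decoding to two small helpers at the CEF boundary; a sketch, where the helper names and the `STRING_ENCODING` constant are hypothetical stand-ins for `g_applicationSettings["string_encoding"]`:

```python
STRING_ENCODING = "utf-8"  # hypothetical stand-in for g_applicationSettings["string_encoding"]

def py_to_cef_bytes(text):
    """Encode Python text into the byte string CEF expects; bytes pass through."""
    if isinstance(text, bytes):
        return text
    return text.encode(STRING_ENCODING, errors="replace")

def cef_to_py_text(raw):
    """Decode bytes coming back from CEF into Python text; text passes through."""
    if isinstance(raw, str):
        return raw
    return raw.decode(STRING_ENCODING, errors="replace")

# Everything past this boundary then deals with exactly one string type.
assert cef_to_py_text(b"caf\xc3\xa9") == u"caf\xe9"
assert py_to_cef_bytes(u"caf\xe9") == b"caf\xc3\xa9"
```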
cdef py_string CefToPyString(
        ConstCefString& cefString):
    cdef cpp_string cppString
    if cefString.empty():
        return ""
    IF UNAME_SYSNAME == "Windows":
        cdef wchar_t* wcharstr = <wchar_t*> cefString.c_str()
        return WidecharToPyString(wcharstr)
    ELSE:
        cppString = cefString.ToString()
        if PY_MAJOR_VERSION < 3:
            return <bytes>cppString
        else:
            return <unicode>((<bytes>cppString).decode(
                g_applicationSettings["string_encoding"],
                errors=BYTES_DECODE_ERRORS))

cdef void PyToCefString(
        py_string pyString,
        CefString& cefString
        ) except *:
    if PY_MAJOR_VERSION < 3:
        if type(pyString) == unicode:
            pyString = <bytes>(pyString.encode(
                g_applicationSettings["string_encoding"],
                errors=UNICODE_ENCODE_ERRORS))
    else:
        # The unicode type is not defined in Python 3.
        if type(pyString) == str:
            pyString = <bytes>(pyString.encode(
                g_applicationSettings["string_encoding"],
                errors=UNICODE_ENCODE_ERRORS))
    cdef cpp_string cppString = pyString
    # Using cefString.FromASCII() will result in DCHECK failures
    # when a non-ascii character is encountered.
    cefString.FromString(cppString)
>> ctypedef object py_string
This typedef looks a bit funny, but I guess you're only doing that in order
to make it easier to change it to an exact type later?
>> print(msg)
>> if g_debugFile:
>> try:
>> with open(g_debugFile, "a") as file:
>> file.write(msg+"\n")
ISTM that what you want in this file is text, so why not open it in text
(i.e. Unicode) mode with a proper encoding?
Note that print() isn't safe for arbitrary output, though, unless you also
control the system encoding of sys.stdout.
if type(errorMsg) == bytes:
    errorMsg = errorMsg.decode(encoding=appEncoding, errors="replace")
try:
    with codecs.open(errorFile, mode="a", encoding=appEncoding) as fp:
        fp.write("\n[%s] %s\n" % (
            time.strftime("%Y-%m-%d %H:%M:%S"), errorMsg))
except:
    print("cefpython: WARNING: failed writing to error file: %s" % (
        errorFile))
>> cdef py_string jsonData = messageString[len(cefPythonMessageHash):]
Here, you are mixing str and bytes. That is generally a bad idea. You
should make it clear in your code what you are processing, bytes or text,
and use the appropriate string type.
In Python 2.7 I use bytes strings by default, in Python 3 unicode strings.
Why not always work with unicode strings? I assume you are dealing with
text here, right? Wanting to support both cases in one code base, i.e.
Unicode strings and encoded byte strings, is just screaming for trouble and
hassle, IMHO.
Obviously, it depends also on what you are doing with the content of these
strings, but if you are passing them into Python space at some point,
you'll want to decode them anyway, so why not do it right at the border to CEF?
> ELSE:
> cppString = cefString.ToString()
> if PY_MAJOR_VERSION < 3:
> return <bytes>cppString
... but here you are returning bytes, although only in Py2, so this is
actually a "str". Casting it to <bytes> is ok, though, because it's
Py2-only code. Note that coercion to bytes is the default behaviour for
C/C++ strings, though, so my guess is that the cast is actually redundant.
> else:
> return <unicode>((<bytes>cppString).decode(
> g_applicationSettings["string_encoding"],
> errors=BYTES_DECODE_ERRORS))
And here you are returning a unicode string, but only in Py3, so this is a
"str" again. Fine. No need to cast it to <unicode>, though, that's a no-op
again.
Assuming that cppString is an actual C++ std::string, casting it to <bytes>
first is also redundant and costly. Instead, call .decode() on it directly,
Cython supports that.
I take it that this function is supposed to always return a "str" value,
both in Py2 and Py3. I already commented on this above.
..
> pyString = <bytes>(pyString.encode(
> g_applicationSettings["string_encoding"],
> errors=UNICODE_ENCODE_ERRORS))
Cython can generate more efficient code if you cast pyString instead of the
result, i.e.
(<unicode>pyString).encode(...)
> cdef void PyToCefString(
> py_string pyString,
> CefString& cefString
> ) except *:
> if PY_MAJOR_VERSION < 3:
> if type(pyString) == unicode:
What about subtypes?
> else:
> # The unicode type is not defined in Python 3.
But it's defined in Cython, so the following is dead code:
> if type(pyString) == str:
> pyString = <bytes>(pyString.encode(
> g_applicationSettings["string_encoding"],
> errors=UNICODE_ENCODE_ERRORS))
if PY_MAJOR_VERSION < 3:
    if type(pyString) == unicode:
        ...
else:
    # The unicode type is not defined in Python 3.
    if type(pyString) == str:
        ...

if py2.7:
    ..
else:  # py 3
    if type == str:  # in py 3 str == unicode
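The point is that the `else:` branch only ever runs on Python 3, where `str` *is* the unicode type and no separate `unicode` builtin exists, so checking for `str` there is equivalent to the Python 2 `unicode` check. Illustrated under Python 3:

```python
import builtins
import sys

assert sys.version_info[0] >= 3          # this sketch assumes Python 3

s = u"text"                              # the u-prefix is accepted but redundant in Py3
assert type(s) is str                    # str *is* the unicode text type
assert not hasattr(builtins, "unicode")  # no separate 'unicode' builtin exists

# So encoding explicitly is the only way to get an encoded byte string:
b = s.encode("utf-8")
assert type(b) is bytes
```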
> I prefer
> that all functions state explicitilly what types of parameters they accept.
> The reason for it is to have more static typing I guess, better error
> detection during compiling.
Fair enough. Note that bytes != str != unicode != basestring in Cython, though.
On Thu, Jan 9, 2014 at 8:56 AM, Czarek Tomczak <czarek....@gmail.com> wrote:
> In Python 2.7 I use bytes strings by default, in Python 3 unicode strings.
Why not unicode in both cases? You are either ANSI-only or you are unicode -- and with Chrome, I can't imagine you could count on ANSI only for anything. So why not unicode everywhere?
What does the Chrome lib use for unicode strings in its C++ code? All you should need is a translator from python unicode to that -- probably a more or less one-line encode and decode function.
> else:
> # The unicode type is not defined in Python 3.
But it's defined in Cython, so the following is dead code:
> if type(pyString) == str:
> pyString = <bytes>(pyString.encode(
> g_applicationSettings["string_encoding"],
> errors=UNICODE_ENCODE_ERRORS))
No no no, there was a condition checking for python version earlier:
.......
> Why not unicode in both cases? You are either ANSI-only or you are unicode
> -- and with Chrome, I can't imagine you could count on ANSI only for
> anything. So why not unicode everywhere?
One reason is backwards compatibility. User apps might break, as they already assume that cefpython strings on Python 2.7 are bytes, and there wasn't much demand for better unicode support yet, so this probably isn't a high priority.