wchar_t buffer. You'll need to ensure you deallocate the buffer once it's no longer being used though, unlike PyUnicode_AsUnicode().
--
---
You received this message because you are subscribed to the Google Groups "cython-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cython-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/cython-users/CALGmxELPQjMPWA8-AvrHugEFkGPj7ti0DKa%2BcZ1Vkcf_kDNj1g%40mail.gmail.com.
I think in the past we've recommended:
wchar_str = PyUnicode_AsWideCharString(unicodeobj, NULL)
# use wchar_str ...
PyMem_Free(wchar_str)
Python seems to use that quite a bit internally for handling paths (cpython/modules/getpath.c, cpython/Python/fileutils.c) and Windows registry keys (cpython/PC/winreg.c). However, I'm not absolutely sure what it does encoding-wise.
I imagine your UTF-16 bytes scheme would also be fine.
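As a rough way to see what PyUnicode_AsWideCharString produces without compiling anything, the same two C-API calls can be exercised from plain Python through ctypes. This is only a sketch: `roundtrip_wide` is a hypothetical helper name, and in Cython you would call the functions directly instead. The buffer holds native `wchar_t` units, which are 2 bytes wide on Windows and 4 bytes on most Unix systems.

```python
import ctypes

# Declare the two C-API functions; ctypes.pythonapi holds the GIL during calls.
as_wide = ctypes.pythonapi.PyUnicode_AsWideCharString
as_wide.argtypes = [ctypes.py_object, ctypes.POINTER(ctypes.c_ssize_t)]
as_wide.restype = ctypes.c_void_p  # keep the raw address so it can be freed later

mem_free = ctypes.pythonapi.PyMem_Free
mem_free.argtypes = [ctypes.c_void_p]
mem_free.restype = None

# Width of one wchar_t on this platform, in bytes.
WCHAR_SIZE = ctypes.sizeof(ctypes.c_wchar)


def roundtrip_wide(text):
    """Copy `text` into a newly allocated wchar_t buffer, read it back, free it."""
    size = ctypes.c_ssize_t(0)
    ptr = as_wide(text, ctypes.byref(size))
    if not ptr:
        raise MemoryError("PyUnicode_AsWideCharString returned NULL")
    try:
        # "use wchar_str ...": here the characters are only read back.
        return ctypes.wstring_at(ptr, size.value)
    finally:
        # The caller owns the buffer and has to release it.
        mem_free(ptr)
```

The try/finally mirrors the point made earlier in the thread: unlike the old PyUnicode_AsUnicode(), the buffer is not owned by the string object, so it has to be freed on every code path.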
But I never quite got it to work. This is what I tried:
IF UNAME_SYSNAME == 'Windows':
    cdef bytes bytes_flag = "wb".encode('utf-16')
    cdef bytes bytes_filepath = file_path.encode('utf-16')
    fp = _wfopen(<wchar_t*> bytes_filepath, <wchar_t*> bytes_flag)
    ^
------------------------------------------------------------
py_gd\py_gd.pyx:82:48: Python objects cannot be cast to pointers of primitive types
and that's the Cython error -- so how do I get the pointer to the underlying array? memoryview? buffer?
We need to update the docs -- they are pretty old, still reference Python2, and also recommend the now deprecated API :-(
An update to the docs would definitely be helpful.
I wonder if it's worth putting something directly into Cython for this so we don't all have to figure out on our own.
The main reason that's hard to do automatically in Cython is
because the lifetimes are no longer tied to a Python object so
Cython would have to work out when to release the memory. With
`bytes` -> `char*` and `unicode` -> `Py_UNICODE*` the
storage is owned internally by the Python object.
Obviously a lot of the time it just needs to live for the
duration of the statement it's in, but some of the time users will
stash the data away for later.
I'd cast to `char*` (to get the underlying data using Cython's predefined conversion) and then to `wchar_t*`:
fp = _wfopen(<wchar_t*><char*>bytes_filepath, <wchar_t*><char*>bytes_flag)
(note that Cython seems happy casting a bytes object to a char*, so why not a wchar_t*?)
Because `char*` is always right - every `bytes` object holds a
`char*` array (because underneath it's always just an array of C
chars), but reinterpreting it as `wchar_t*` may or may not make
sense depending on the encoding. If you encoded it as ascii then
`wchar_t*` would be wrong.
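To make the "depends on the encoding" point concrete, here is a small pure-Python sketch of what the bytes look like before they are reinterpreted as `wchar_t*`. It assumes the Windows case, where `wchar_t` is 16 bits wide; the file name is a placeholder. Two details stand out: the plain `'utf-16'` codec prepends a byte-order mark, and a wide C string needs a full two-byte zero terminator.

```python
import codecs

path = "out.png"  # hypothetical file name

with_bom = path.encode("utf-16")   # the generic codec prepends a byte-order mark
no_bom = path.encode("utf-16-le")  # explicit byte order, no BOM

# True: the first two bytes are a BOM, which _wfopen would see as a character.
has_bom = with_bom[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)

# Wide-character C functions read until a zero wchar_t, so encode the
# terminator explicitly instead of relying on what follows the bytes data.
wide_ready = (path + "\0").encode("utf-16-le")
```

Under these assumptions, `wide_ready` is the kind of buffer the double cast `<wchar_t*><char*>` could sensibly point at, while `with_bom` is not.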
I agree that opening a file from a Python path string in C shouldn't be as
hard as it is. There should be at least a FAQ entry for this.
I could imagine having a
cython.fopen(path: str, mode: str) -> FILE*
that does the right thing on different platforms, at C compile time. It
would use the wchar APIs for the file path on Windows (and ASCII encoding
for the 'mode') and encode the file path to the local file system encoding
(which usually is but may not always be UTF-8) on *nix systems. Sounds
doable. Someone out there probably already has the code for this and could
contribute it.
I also wonder if this shouldn't be in CPython's C-API (as well?). Seems
worth filing a ticket on their side.
BTW, while looking up the details, I noticed that fopen() also supports
UTF-8 encoded file paths on Windows, see the section on Unicode support here:
https://learn.microsoft.com/de-de/cpp/c-runtime-library/reference/fopen-wfopen?view=msvc-170
That might actually be the easiest way to handle this: just append
", ccs=UTF-8"
to the mode if you're on Windows and then encode the file path to UTF-8
normally.
Anyone up for writing a FAQ entry on this?
https://github.com/cython/cython/blob/master/docs/src/userguide/faq.rst
Stefan
A slight tangent to the main point of discussion but the other thing I believe you can do is:
1. open the file in Python
2. use PyObject_AsFileDescriptor to get a file descriptor integer from the file
3. use the POSIX fdopen to get a FILE* from the file descriptor integer.
That means you can leave the opening of the file and handling the encoding in Python, but still get the C file pointer.
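The three steps can be tried out at the Python level before writing any C. This is an analogue, not the C code itself: for file objects, PyObject_AsFileDescriptor calls the object's `fileno()` method, and `os.fdopen` plays the role that C's `fdopen` would play in producing a second buffered handle. The file name is a placeholder, and duplicating the descriptor with `os.dup` is one possible choice so that each handle can be closed on its own.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.bin")  # hypothetical file name

    # Step 1: let Python open the file, so Python deals with the path encoding.
    with open(path, "wb") as py_file:
        py_file.write(b"from Python\n")
        py_file.flush()  # empty Python's buffer before touching the raw descriptor

        # Step 2: this integer is what PyObject_AsFileDescriptor would return.
        fd = py_file.fileno()

        # Step 3: wrap a duplicate of the descriptor in a second buffered handle.
        # Both descriptors share one file offset, so this write lands after the first.
        with os.fdopen(os.dup(fd), "wb") as second:
            second.write(b"from the second handle\n")

    with open(path, "rb") as check:
        contents = check.read()
```

Flushing before handing the descriptor over matters because Python and the second handle each keep their own buffer on top of the same underlying file.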
I believe scipy do it:
'Chris Barker' via cython-users wrote on 14.03.24 at 20:43:
> Is there no cPython API function for "get a FILE* from an already open
> file"?
I think the main issue here is that, in many cases, files opened by Python
are not plain file objects but some kind of stream these days.
For the simple (bytes) cases, PyObject_AsFileDescriptor seems as good as it
gets.
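A quick way to see the difference between a plain file and "some kind of stream" is to ask the object for its descriptor. The sketch below uses a hypothetical helper, `has_file_descriptor`, to show that an in-memory stream has no descriptor to hand to C, while a real file does.

```python
import io
import tempfile


def has_file_descriptor(obj):
    """Return True if `obj` exposes an OS-level file descriptor via fileno()."""
    try:
        obj.fileno()
    except (AttributeError, OSError, ValueError):
        # io.UnsupportedOperation derives from both OSError and ValueError.
        return False
    return True


in_memory = io.BytesIO(b"data")  # a stream with no OS-level file behind it
memory_ok = has_file_descriptor(in_memory)

with tempfile.TemporaryFile() as real_file:
    real_ok = has_file_descriptor(real_file)
```

A check along these lines could guard the file-descriptor route and fall back to reading the data in Python when no descriptor is available.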