Compile-Time PY_MAJOR_VERSION?

Brock Mendel

Sep 20, 2017, 9:19:11 PM
to cython-users
Continuing with the theme of trying to implement some of pandas' C code in .pyx files, take `get_c_string` from _libs/src/numpy_helper.h:

```
PANDAS_INLINE char* get_c_string(PyObject* obj) {
#if PY_VERSION_HEX >= 0x03000000
    return PyUnicode_AsUTF8(obj);
#else
    return PyString_AsString(obj);
#endif
}
```

My attempt to do this in cython looks like:

```
from cpython.version cimport PY_MAJOR_VERSION
IF PY_MAJOR_VERSION >= 3:
    cdef char* get_c_string(object obj):
        return PyUnicode_AsUTF8(obj)
ELSE:
    cdef char* get_c_string(object obj):
        return PyString_AsString(obj)
```

This leads to the compile-time error `Compile-time name 'PY_MAJOR_VERSION' not defined`.

Is there something equivalent to PY_MAJOR_VERSION which _is_ defined at compile-time?

Robert Bradshaw

Sep 20, 2017, 11:35:25 PM
to cython...@googlegroups.com
No, because there's no way to know at Cython compile time whether the
resulting C code will be used with Python 2 or Python 3.

But this is exactly the kind of thing that Cython handles for you
(more generally in fact). Just write

    cdef char* get_c_string(object o):
        return o

Though there's really no need to have this method at all because
Cython inserts it for you automatically; just do the assignment where
you need it, e.g.

cdef char* c_string = o

or

call_method_taking_char_star(o)

and it'll do the right thing.

Stefan Behnel

Sep 21, 2017, 1:13:31 AM
to cython...@googlegroups.com
Robert Bradshaw wrote on 21.09.2017 at 05:34:
Almost. You still have to tell it which encoding to use:

http://docs.cython.org/en/latest/src/tutorial/strings.html#auto-encoding-and-decoding

Stefan
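
For readers wondering why an encoding has to be named at all, here is a plain-Python illustration (it is not Cython's API, just the underlying issue): the same text maps to different byte sequences depending on the encoding, so a `char*` cannot be produced from a text object without choosing one. The sample string is an arbitrary example.

```python
# The same text yields different bytes under different encodings,
# which is why a conversion to char* needs an explicit encoding.
text = "caf\u00e9"  # "café", written with an escape to keep the source ASCII

utf8_bytes = text.encode("utf-8")      # two bytes for the accented letter
latin1_bytes = text.encode("latin-1")  # one byte for the accented letter

print(utf8_bytes)
print(latin1_bytes)

# Decoding with the wrong encoding does not round-trip the text.
mismatch = utf8_bytes.decode("latin-1")
print(mismatch == text)
```

The linked section of the Cython docs describes the directives that select the encoding for the automatic conversion.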

Nils Bruin

Sep 21, 2017, 10:59:30 AM
to cython-users
There is an undocumented parameter to "cythonize":

With
cythonize(...,compile_time_env={'PY_VERSION_HEX':sys.hexversion},...)

you'll have a Cython compile-time variable that gives you the version number of the Python that is running Cython. We use this in Sage to compile a module that reaches into some CPython internals that have changed in recent versions (see https://trac.sagemath.org/ticket/22305). For Sage, cythonization happens on the target platform, but in general this need not be the case. If you do know your target version, you could of course inject that value instead.
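
To make the `compile_time_env` suggestion concrete, here is a minimal sketch of how that dictionary might be built in a `setup.py`. The helper `version_hex` and the file name `module.pyx` are hypothetical names introduced for illustration; the bit layout follows CPython's `sys.hexversion` encoding. The `cythonize` call is left as a comment so the snippet runs without Cython installed.

```python
import sys


def version_hex(major, minor, micro=0, level=0xF, serial=0):
    """Pack a version the way sys.hexversion / PY_VERSION_HEX does.

    level is 0xA (alpha), 0xB (beta), 0xC (release candidate) or 0xF (final).
    This helper is hypothetical, not part of Cython or CPython.
    """
    return (major << 24) | (minor << 16) | (micro << 8) | (level << 4) | serial


# Case 1: cythonization happens on the target platform, so the running
# interpreter's version is the right one to expose.
compile_time_env = {"PY_VERSION_HEX": sys.hexversion}

# Case 2: the target version is known and differs from the interpreter
# running Cython, so the value is injected by hand (here: Python 2.7 final).
target_env = {"PY_VERSION_HEX": version_hex(2, 7)}

# In setup.py this would be passed along the lines of:
#   from Cython.Build import cythonize
#   cythonize("module.pyx", compile_time_env=compile_time_env)

print(hex(compile_time_env["PY_VERSION_HEX"]))
print(hex(target_env["PY_VERSION_HEX"]))
```

In the .pyx file the value would then be tested with `IF PY_VERSION_HEX >= 0x03000000:`. The caveat from earlier in the thread still applies: the generated C is then tied to whichever version was injected.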

Brock Mendel

Sep 21, 2017, 3:50:34 PM
to cython-users
> Though there's really no need to have this method at all because
> Cython inserts it for you automatically; just do the assignment where
> you need it, e.g.
>
> cdef char* c_string = o 

In a perfect world I'd be happy to use that, but the pandas maintainers are insisting on micro-benchmarks for essentially anything touching cython or C.  I've managed to construct cython equivalents for nearly everything in numpy_helper.h but in most cases the generated C has RefNanny bits that hit performance just enough to earn the wrath of the reviewers.

Stefan Behnel

Sep 21, 2017, 5:35:25 PM
to cython...@googlegroups.com
Brock Mendel wrote on 21.09.2017 at 21:23:
>> Though there's really no need to have this method at all because
>> Cython inserts it for you automatically; just do the assignment where
>> you need it, e.g.
>>
>> cdef char* c_string = o
>
> In a perfect world I'd be happy to use that, but the pandas maintainers are
> insisting on micro-benchmarks for essentially anything touching cython or
> C. I've managed to construct cython equivalents for nearly everything in
> numpy_helper.h
> <https://github.com/pandas-dev/pandas/blob/master/pandas/_libs/src/numpy_helper.h> but
> in most cases the generated C has RefNanny bits that hit performance just
> enough to earn the wrath of the reviewers.

The refnanny is discarded completely in standard builds, so that's not it.

One thing that I could imagine is that Cython needs to do safety
refcounting in some cases that you could avoid in manually written code -
*if you are sure that it's not needed*.

These tiny CPU cycles can still add up to measurable time differences if
they are called thousands or millions of times.

OTOH, micro benchmarks only measure one tiny thing. Switching to a
higher-level language often works as an enabler for implementing
algorithmic improvements that would be too difficult to maintain in a
low-level language. But that obviously depends entirely on the existing
code and the problem at hand...

Stefan

Brock Mendel

Sep 21, 2017, 11:52:46 PM
to cython-users
[Note to any readers who got here by googling for the original topic: this thread has strayed and is about to stray further]

> The refnanny is discarded completely in standard builds, so that's not it. 

Not sure I follow.

For a concrete example, pandas/_libs/src/util.pxd gets `is_integer_object` from `numpy_helper.h`, where it is defined as:

```
PANDAS_INLINE int is_integer_object(PyObject* obj) {
    return (!PyBool_Check(obj)) && PyArray_IsIntegerScalar(obj);
}
```

A first-pass implementation in cython is:

```
from cpython cimport PyBool_Check

cdef extern from "numpy/ndarrayobject.h":
    bint PyArray_IsIntegerScalar(obj)


cdef inline bint is_integer_object(object obj):
    return PyBool_Check(obj) and PyArray_IsIntegerScalar(obj) 
```

cythonizing this produces (after trimming comments):

```
static CYTHON_INLINE int __pyx_f_4pdsm_6tslibs_4util_is_integer_object(PyObject *__pyx_v_obj) {
  int __pyx_r;
  __Pyx_RefNannyDeclarations
  int __pyx_t_1;
  int __pyx_t_2;
  __Pyx_RefNannySetupContext("is_integer_object", 0);

  __pyx_t_2 = ((!(PyBool_Check(__pyx_v_obj) != 0)) != 0);
  if (__pyx_t_2) {
  } else {
    __pyx_t_1 = __pyx_t_2;
    goto __pyx_L3_bool_binop_done;
  }
  __pyx_t_2 = (PyArray_IsIntegerScalar(__pyx_v_obj) != 0);
  __pyx_t_1 = __pyx_t_2;
  __pyx_L3_bool_binop_done:;
  __pyx_r = __pyx_t_1;
  goto __pyx_L0;

  /* function exit code */
  __pyx_L0:;
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}
```

Do the references to RefNannyFooBar here mean that this is not a "standard build"?  Declaring the function as `nogil` gets rid of the RefNanny references, but that requires some gymnastics that merit a separate thread.

Robert Bradshaw

Sep 22, 2017, 12:01:06 AM
to cython...@googlegroups.com
Look at how these are #defined.

Note that you're missing a "not" on the PyBool_Check here.
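
A pure-Python sketch of what the missing `not` changes. This only mirrors the intent of the C helper: `numbers.Integral` is used as a stdlib stand-in for `PyArray_IsIntegerScalar` (an assumption made for illustration), and the second function name is hypothetical.

```python
import numbers


def is_integer_object(obj):
    """Intended behaviour: integers yes, booleans no.

    bool is a subclass of int in Python, so it has to be excluded first.
    """
    return (not isinstance(obj, bool)) and isinstance(obj, numbers.Integral)


def is_integer_object_missing_not(obj):
    """The version from the post, with the `not` dropped by accident."""
    return isinstance(obj, bool) and isinstance(obj, numbers.Integral)


for value in (3, True, 3.0):
    print(repr(value), is_integer_object(value), is_integer_object_missing_not(value))
```

Without the `not`, the check accepts only booleans, which is the opposite of what the C helper does.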

> Declaring the function as `nogil` gets rid of the RefNanny
> references, but that requires some gymnastics that merit a separate thread.

Brock Mendel

Sep 22, 2017, 11:45:15 AM
to cython-users
> Look at how these are #defined
 
Got it, thanks.  I implicitly assumed that `__Pyx_RefNannyFinishContext()` had to be a function call.  Not a whole lot of C intuition.

Looks like you've answered this exact question before: https://groups.google.com/forum/#!msg/cython-users/JsTM2NigkS4/HwAySfcmkyoJ. As a representative of the ignorant public, I appreciate your patience.

> Note that you're missing a "not" on the PyBool_Check here.

Good catch.  Luckily that typo was just here and didn't make it into the branch.