coding maxstar

Neal Becker

Nov 9, 2012, 7:26:17 AM
to num...@googlegroups.com
My problem is to code maxstar function.  For 2 arguments, it is:

def maxstar2 (a, b):
    return max (a, b ) + log1p (exp (-abs (a - b)))

What I have now uses 'ndarray' to convert the corresponding C++ code into a binary ufunc.

The way this would be applied is to pass 2 large vectors, and the maxstar2 would be applied to the pairs of elements of these vectors.

Ultimately, it is intended to operate on more than 2 inputs, but I already have recursive code in Python based on maxstar2 which extends to an arbitrary number of inputs, and for large vectors the Python overhead is probably OK.

So I'm wondering if I can use numexpr here.  The above maxstar2 isn't usable directly - it has to be converted to a binary ufunc first (as written it only works on 2 scalars - not 2 vectors). 
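For comparison, a minimal NumPy sketch of an elementwise maxstar2 is shown here. The name `maxstar2_np` is made up for illustration and is not part of the original code. It relies on the identity max*(a, b) = log(exp(a) + exp(b)), which is also what `np.logaddexp` computes, so that function serves as a cross-check.

```python
import numpy as np

def maxstar2_np(a, b):
    """Elementwise max*(a, b) = log(exp(a) + exp(b)) for scalars or arrays.

    Hypothetical helper name, used only for this illustration.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # np.maximum is the elementwise (binary ufunc) counterpart of max().
    return np.maximum(a, b) + np.log1p(np.exp(-np.abs(a - b)))

a = np.array([0.0, 1.0, -3.0, 50.0])
b = np.array([0.0, 2.0, 4.0, -50.0])
result = maxstar2_np(a, b)
# np.logaddexp evaluates log(exp(a) + exp(b)) directly.
reference = np.logaddexp(a, b)
```

Because every step is a ufunc, the usual broadcasting rules apply, so the two inputs only need compatible shapes.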

Francesc Alted

Nov 9, 2012, 8:52:17 AM
to num...@googlegroups.com
If I understand you correctly, you can evaluate the above expression as:

ne.evaluate('where(a>b, a, b) + log1p (exp (-abs (a - b)))')

Here is an actual example:

In []: a = np.arange(1e7)

In []: b = a + 1

In []: time np.where(a>b, a, b) + np.log1p (np.exp (-np.abs (a - b)))
CPU times: user 1.11 s, sys: 1.20 s, total: 2.31 s
Wall time: 6.36 s
Out[]:
array([ 1.31326169e+00, 2.31326169e+00, 3.31326169e+00, ...,
9.99999831e+06, 9.99999931e+06, 1.00000003e+07])

In []: time ne.evaluate('where(a>b, a, b) + log1p (exp (-abs (a - b)))')
CPU times: user 0.55 s, sys: 0.12 s, total: 0.68 s
Wall time: 0.43 s
Out[]:
array([ 1.31326169e+00, 2.31326169e+00, 3.31326169e+00, ...,
9.99999831e+06, 9.99999931e+06, 1.00000003e+07])

[not using VML here]
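The one-liner above could be wrapped in a reusable function along these lines. This is only a sketch: the function body, the `MAXSTAR2_EXPR` constant, and the NumPy fallback used when numexpr is not installed are assumptions made for illustration.

```python
import numpy as np

try:
    import numexpr as ne
except ImportError:  # assumption: fall back to plain NumPy without numexpr
    ne = None

# Same expression as in the example above, kept as a module-level constant.
MAXSTAR2_EXPR = "where(a > b, a, b) + log1p(exp(-abs(a - b)))"

def maxstar2_ne(a, b):
    """Elementwise max*(a, b), evaluated by numexpr when it is available."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    if ne is not None:
        # Pass the operands explicitly instead of relying on frame lookup.
        return ne.evaluate(MAXSTAR2_EXPR, local_dict={"a": a, "b": b})
    return np.where(a > b, a, b) + np.log1p(np.exp(-np.abs(a - b)))

x = np.arange(1e5)
y = x + 1
out = maxstar2_ne(x, y)
```

Passing `local_dict` keeps the function independent of whatever names happen to exist in the caller's scope.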

--
Francesc Alted

Neal Becker

Nov 9, 2012, 9:07:19 AM
to num...@googlegroups.com
Wow!  I'm impressed!  Even without MKL, I'm measuring:

c++ version:
%time test1()
CPU times: user 4.16 s, sys: 0.00 s, total: 4.16 s
Wall time: 4.15 s

numexpr version:
In [34]: %time test1()
CPU times: user 3.53 s, sys: 0.01 s, total: 3.53 s
Wall time: 1.87 s


The full function, which accepts n-ary args is:

def maxstar (*args):
    if len(args) == 1:
        return args[0]
    elif len(args) == 2:
        return maxstar2 (*args)
    else:
        # Floor division keeps the slice index an integer.
        return maxstar2 (
            maxstar (*args[:len(args)//2]),
            maxstar (*args[len(args)//2:])
            )

I'm guessing I should just leave this function as pure Python; there is nothing numexpr can do to help here.
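As a side note, max* is associative, because max*(a, b) = log(exp(a) + exp(b)). A plain left fold should therefore agree with the balanced recursion up to rounding. The sketch below uses only the standard library and scalar inputs. `maxstar_reduce` is a hypothetical name, and the scalar `maxstar2` here stands in for whichever two-argument version is in use.

```python
import math
from functools import reduce

def maxstar2(a, b):
    # Scalar two-argument max*, as defined at the top of the thread.
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

def maxstar_reduce(*args):
    """Left fold of maxstar2 over one or more inputs (hypothetical helper)."""
    if not args:
        raise ValueError("maxstar needs at least one argument")
    return reduce(maxstar2, args)

values = (0.5, -1.0, 2.0, 3.5)
folded = maxstar_reduce(*values)
# Direct log-sum-exp of the same values, used as a cross-check.
direct = math.log(sum(math.exp(v) for v in values))
```

The same fold works with a vectorized two-argument function, in which case each argument would be a whole array rather than a scalar.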




--
You received this message because you are subscribed to the Google Groups "numexpr" group.
To post to this group, send email to num...@googlegroups.com.
To unsubscribe from this group, send email to numexpr+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/numexpr?hl=en.


Francesc Alted

Nov 9, 2012, 9:23:39 AM
to num...@googlegroups.com
On 11/9/12 3:07 PM, Neal Becker wrote:
> Wow! I'm impressed! Even without MKL, I'm measuring:
>
> c++ version:
> %time test1()
> CPU times: user 4.16 s, sys: 0.00 s, total: 4.16 s
> Wall time: 4.15 s
>
> numexpr version:
> In [34]: %time test1()
> CPU times: user 3.53 s, sys: 0.01 s, total: 3.53 s
> Wall time: 1.87 s

Hmm, not sure why numexpr should be faster than the C++ version (just
looking at the CPU times, so the aggregated time using all cores). My
guess is that your CPU has two cores (so using a CPU with more cores you
can expect better speedup).

>
>
> The full function, which accepts n-ary args is:
>
> def maxstar (*args):
>     if len(args) == 1:
>         return args[0]
>     elif len(args) == 2:
>         return maxstar2 (*args)
>     else:
>         return maxstar2 (
>             maxstar (*args[:len(args)//2]),
>             maxstar (*args[len(args)//2:])
>             )
>
> I'm guessing I should just leave this function as pure Python; there is
> nothing numexpr can do to help here.

Nah, I don't think trying to use numexpr here is worth the effort.

--
Francesc Alted

Neal Becker

Nov 9, 2012, 10:10:00 AM
to num...@googlegroups.com
You're correct: it seems numexpr was faster than the C++ version because the C++ version, which goes through a numarray ufunc, must introduce some overhead.

I wrote a c++ version that iterates over the arrays using iterators directly, and it is much faster.  Also, this time I made sure to use 1 core in both cases.

The c++ version:

In [14]: time test1 (maxstar2)
CPU times: user 9.04 s, sys: 0.01 s, total: 9.05 s
Wall time: 9.07 s

The ne version:

In [16]: time test1 (maxstar2_ne)
CPU times: user 15.12 s, sys: 0.05 s, total: 15.16 s
Wall time: 15.19 s
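For anyone repeating the single-core comparison, the following sketch shows one way to pin numexpr to one thread. The array size is arbitrary, and the NumPy fallback used when numexpr is not installed is an assumption made for illustration. `set_num_threads` returns the previous setting, which makes it easy to restore afterwards.

```python
import numpy as np

try:
    import numexpr as ne
except ImportError:  # assumption: fall back to NumPy without numexpr
    ne = None

a = np.arange(1e5)
b = a + 1
expr = "where(a > b, a, b) + log1p(exp(-abs(a - b)))"

if ne is not None:
    previous = ne.set_num_threads(1)  # returns the previous thread count
    try:
        single = ne.evaluate(expr, local_dict={"a": a, "b": b})
    finally:
        ne.set_num_threads(previous)  # restore the earlier setting
else:
    single = np.logaddexp(a, b)
```

Wrapping the call in `%time` or `%timeit` under IPython would then give a CPU time that is comparable to a single-threaded C++ loop.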




Francesc Alted

Nov 9, 2012, 10:15:18 AM
to num...@googlegroups.com
On 11/9/12 4:10 PM, Neal Becker wrote:
> You're correct: it seems numexpr was faster than the C++ version because
> the C++ version, which goes through a numarray ufunc, must introduce some
> overhead.
>
> I wrote a c++ version that iterates over the arrays using iterators
> directly, and it is much faster. Also, this time I made sure to use 1
> core in both cases.
>
> The c++ version:
>
> In [14]: time test1 (maxstar2)
> CPU times: user 9.04 s, sys: 0.01 s, total: 9.05 s
> Wall time: 9.07 s
>
> The ne version:
>
> In [16]: time test1 (maxstar2_ne)
> CPU times: user 15.12 s, sys: 0.05 s, total: 15.16 s
> Wall time: 15.19 s

Ah yes, that makes more sense indeed. numexpr is always going to add
some overhead due to its internal VM. Hmm, Numba follows another
approach (it leverages the LLVM infrastructure) and maybe it can help you too.
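A rough sketch of what the Numba route might look like uses `numba.vectorize` to compile the scalar formula into a ufunc. The names `_maxstar2_scalar` and `maxstar2_nb` are made up for illustration. The `np.vectorize` fallback, which is slow and only keeps the example runnable without Numba, is likewise an assumption.

```python
import math
import numpy as np

try:
    from numba import vectorize
except ImportError:  # assumption: degrade to np.vectorize without Numba
    vectorize = None

def _maxstar2_scalar(a, b):
    # Scalar formula; Numba compiles this body for each array element.
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

if vectorize is not None:
    # An explicit signature makes Numba compile eagerly for float64 inputs.
    maxstar2_nb = vectorize(["float64(float64, float64)"])(_maxstar2_scalar)
else:
    maxstar2_nb = np.vectorize(_maxstar2_scalar, otypes=[float])

a = np.arange(1000, dtype=np.float64)
b = a + 1
out = maxstar2_nb(a, b)
```

Whether this beats a hand-written C++ loop would need to be measured on the actual workload.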

--
Francesc Alted
