It seems that using numpy.array([a,b,c]) is around 20 times slower
than creating an empty array and adding the individual elements.
Other things that don't work well either:
numpy.concatenate([a,b,c]).reshape(3,-1)
numpy.concatenate([[a],[b],[c]])
Is there a better way to efficiently create the array?
See the following snippet:
---------------------------------------
import time
import numpy as nx
print 'numpy version', nx.version.version
t = time.time()
# test array
a = nx.arange(1000*1000)
print 'a ',time.time()-t
t = time.time()
# create array in the normal way..
b0 = nx.array([a,a,a])
print 'b0',time.time()-t
t = time.time()
# create using empty array
b1 = nx.empty((3,) + a.shape)
b1[0] = a
b1[1] = a
b1[2] = a
print 'b1',time.time()-t
print 'equal', nx.all(b0 == b1)
-----------------------------------------
Produces the output:
numpy version 1.3.0
a 0.0019519329071
b0 0.286643981934
b1 0.0116579532623
equal True
What was your timing for concatenate? It wins for me given the shape of a.
In [1]: import numpy as np
In [2]: a = np.arange(1000*1000)
In [3]: timeit b0 = np.array([a,a,a])
1 loops, best of 3: 216 ms per loop
In [4]: timeit b1 = np.empty(((3,)+a.shape)); b1[0]=a;b1[1]=a;b1[2]=a
100 loops, best of 3: 19.3 ms per loop
In [5]: timeit b2 = np.c_[a,a,a].T
10 loops, best of 3: 30.5 ms per loop
In [6]: timeit b3 = np.concatenate([a,a,a]).reshape(3,-1)
100 loops, best of 3: 9.33 ms per loop
Skipper
I've always just built arrays using nx.array([...]) in the past, though,
and was surprised that it performs so badly.
Can anyone provide an explanation (or a pointer) for such differences?
Thanks
One more.
In [26]: timeit b4 = np.vstack((a,a,a))
100 loops, best of 3: 9.46 ms per loop
Skipper
The difference between the numpy.concatenate version and numpy.array is fairly
impressive, though; I get a factor of > 50x. It would be nice to know why.
Ah, then you might want hstack. There are also column_stack and
row_stack if you need to go that route.
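For 1-d inputs those helpers behave roughly like this (a quick illustrative check):

import numpy as np

a = np.arange(3)
print(np.hstack((a, a, a)).shape)        # (9,)   1-d arrays are joined end to end
print(np.vstack((a, a, a)).shape)        # (3, 3) each input becomes a row
print(np.column_stack((a, a, a)).shape)  # (3, 3) each 1-d input becomes a column
print(np.row_stack((a, a, a)).shape)     # (3, 3) row_stack is an alias for vstack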
> The difference between the numpy.concatenate version and numpy.array is fairly
> impressive, though; I get a factor of > 50x. It would be nice to know why.
>
Sorry, I don't have any deep insight here. There is probably just
overhead in the array creation. Consider what happens if you try to use
hstack and company on lists.
In [1]: import numpy as np
In [2]: a = np.arange(1000*1000)
In [3]: b = a.tolist()
In [4]: timeit b0 = np.array((a,a,a))
1 loops, best of 3: 217 ms per loop
In [5]: timeit b1 = np.vstack((b,b,b))
1 loops, best of 3: 380 ms per loop
Skipper
This thread reminds me of something I've thought about for a while: Would
NumPy benefit from an np.ndarraylist subclass of np.ndarray, that has an
O(1) amortized append like Python lists? (Other methods of Python lists
(pop, extend) would be worth considering as well.) Or will we get the
same performance by combining a Python list and ndarray?
Sturla
> This thread reminds me of something I've thought about for a while: Would
> NumPy benefit from an np.ndarraylist subclass of np.ndarray, that has an
> O(1) amortized append like Python lists? (Other methods of Python lists
> (pop, extend) would be worth considering as well.) Or will we get the
> same performance by combining a Python list and ndarray?
This idea, an appendable ndarray, has been discussed before; the
conclusion was that yes, it's sometimes useful, and in fact I think
there was code written for it. The idea was an ndarray subclass that
allowed appending through realloc and simple indexing, but made copies
when slices were taken (so that realloc wouldn't move the memory out
from under it). It could be "frozen" to an ordinary ndarray when
appending was done.
To answer the OP's question, np.array is a do-what-I-mean function
that examines its argument and deduces the shape, size, and dtype for
the new array that it constructs. For example, if you pass it a python
list, it must walk through the list and examine the objects to find
the numeric dtype that contains them (e.g. integer, real, or complex);
if there are any peculiar objects in there it will construct an object
array. (I don't know whether it tries int(), float(), and complex() on
every object or whether it examines their type information.) In any
case, all this imposes some bookkeeping overhead that is unnecessary
for np.concatenate. For three-dimensional arrays you might try
np.dstack, by the way, or you can concatenate along a new axis (not
that reshaping is usually expensive):
In [1]: a = np.random.randn(4,5)
In [2]: b = a[np.newaxis,...]
In [3]: np.concatenate([b,b,b], axis=0).shape
Out[3]: (3, 4, 5)
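As a quick illustration of that dtype deduction (the exact integer width is
platform dependent):

import numpy as np

print(np.array([1, 2, 3]).dtype)         # an integer dtype, e.g. int64
print(np.array([1, 2.0, 3]).dtype)       # float64  - promoted because of the float
print(np.array([1, 2.0, 3j]).dtype)      # complex128 - promoted because of the complex
print(np.array([1, 2, object()]).dtype)  # object - a peculiar element forces an object array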
Anne
> Would NumPy benefit from an np.ndarraylist subclass of np.ndarray, that has an
> O(1) amortized append like Python lists?
yes, it would.
> (Other methods of Python lists
> (pop, extend) would be worth considering as well.) Or will we get the
> same performance by combining a Python list and ndarray?
I don't think so.
In general, the advice is to accumulate in a Python list, then turn it
into an array. And this works pretty well.
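Something like this, for instance (a minimal sketch):

import numpy as np

values = []
for x in range(1000000):
    values.append(x * 0.5)   # every element is stored as a full Python float object
result = np.array(values)    # one conversion into a compact float64 array at the end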
However, Python lists hold Python objects, so it's a bit inefficient, at
least in terms of memory use.
I wrote such an accumulating array a while back. It uses a numpy array
internally to store the data -- oversized at first, and expanded as needed
with ndarray.resize().
The result was that it was a bit slower than the list approach in most
cases -- though it should use less memory, and it had an advantage in
accumulating more complex numpy data types.
I've thought about trying to write it in C or Cython, rather than Python
-- I think the extra function call overhead on indexing and all may be
fairly expensive. There may be other ways to optimize it as well.
It's also only 1-d, though it wouldn't be hard to make it n-d, but
expandable only on the first dimension.
I've enclosed my current draft, along with some code to test and help
profile it.
Consider it a prototype, but I've used it successfully for a few things.
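Roughly, the idea looks something like this (an illustrative sketch only, not
the actual draft):

import numpy as np

class Accumulator(object):
    """1-d appendable buffer backed by an over-allocated ndarray (sketch)."""

    def __init__(self, dtype=np.float64, grow=0.25):
        self._data = np.empty(16, dtype=dtype)  # oversized internal buffer
        self._len = 0                           # number of valid elements
        self._grow = grow                       # over-allocate by 25% by default

    def append(self, value):
        if self._len == self._data.shape[0]:
            newsize = int(self._data.shape[0] * (1.0 + self._grow)) + 1
            # refcheck=False is safe only because no view of _data ever escapes
            self._data.resize(newsize, refcheck=False)
        self._data[self._len] = value
        self._len += 1

    def __len__(self):
        return self._len

    def __getitem__(self, index):
        # always hand out copies, never views of the internal buffer
        return np.copy(self._data[:self._len][index])

    def toarray(self):
        # compact to an ordinary, exactly-sized ndarray
        return self._data[:self._len].copy()

The copy in __getitem__ is what keeps the in-place resize safe, at the cost of
the result not being a first-class ndarray.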
-Chris
list of ndarrays:
    N*elsize when building up
    2*N*elsize when compacting
    N*elsize when used after compacting

appendable array:
    2*N*elsize when building up
    2*N*elsize when used before compacting
    3*N*elsize when compacting
    N*elsize when used after compacting
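To put rough numbers on that (assuming N = 10 million float64 values, so
elsize = 8 bytes):

N, elsize = 10**7, 8.0          # ten million float64 values
MB = 1024**2

print(N * elsize / MB)          # ~76 MB  - list of chunks while building up
print(2 * N * elsize / MB)      # ~153 MB - peak while compacting the chunk list,
                                #           or an appendable array built with 2x growth
print(3 * N * elsize / MB)      # ~229 MB - peak while compacting the appendable array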
Personally I am not convinced about an appendable array, either speed-
or memory-wise; I just wanted to hear what other people think. And
there is one more thing that makes me sceptical: We cannot use realloc()
to grow an appendable array. That is because it would segfault if there
are array views in use (the views might suddenly get a dangling data
pointer). And prohibiting realloc(), the performance of an appendable
array might not be fantastic (still O(1) on average though). But as Anne
said, this has probably been discussed in detail before.
not very.
> At least not enough
> to convince me to spend my time on it. The memory issue is more
> important though.
I agree.
> But arrays with O(1) amortized append are memory
> inefficient on their own.
well, yes, but only by the amount that you over-allocate -- I've seen
references to methods that allocate twice as much memory as needed on each
expansion, so that would be a maximum of a 2X hit.
In poking around the Python code, it looks like lists and array.arrays
over-allocate by 1/16 (once the arrays are large, anyway). In my code,
it defaults to over-allocating by 1/4. In my very limited testing, I
didn't see a performance gain from allocating more than that. So it's a
maximum of a 25% memory hit -- and it's user configurable.
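A rough way to see the trade-off between the over-allocation factor and the
number of reallocations (an illustration only, assuming a starting size of 16
elements and a target of a million):

def n_resizes(factor, start=16, target=10**6):
    # count how many times the buffer has to grow to reach `target` elements
    size, count = start, 0
    while size < target:
        size = int(size * (1.0 + factor)) + 1
        count += 1
    return count

print(n_resizes(1.0 / 16))   # roughly 180 reallocations
print(n_resizes(1.0 / 4))    # roughly 50 reallocations
print(n_resizes(1.0))        # roughly 16 reallocations (doubling)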
> So it might make more sense to save a list of
> equally long short buffers (e.g. ndarrays), and flatten to a contiguous
> buffer when needed.
I did that in a C extension I wrote a while back for reading big text
files. I found that the size of the buffer hardly mattered to performance.
In my newer code, I found it a lot easier to over-allocate, and even if
you over-allocate by 1/10 or 1/16, performance is still pretty good.
> Personally I am not convinced about an appendable array, neither speed
> nor memory wise, I just wanted to hear what other people think.
I still think it's a good idea -- though clearly not important enough
for me to have gone further with my code (though I am using it now).
For me, the reasons are memory and the ability to deal natively with
numpy data types that don't map directly to Python types. I think there
could be a real benefit to loadtxt() for instance.
> there is one more thing that makes me sceptical: We cannot use realloc()
> to grow an appendable array. That is because it would segfault if there
> are array views in use (the views might suddenly get a dangling data
> pointer). And prohibiting realloc(), the performance of an appendable
> array might not be fantastic
I used ndarray.resize, which I'm pretty sure uses realloc(). And that's
why I didn't subclass ndarray, but rather keep an ndarray as the buffer,
which never gets a view on it. When slicing, a copy is returned.
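For what it's worth, this is the failure mode that returning copies avoids --
ndarray.resize refuses to reallocate while another array references the buffer
(a small check):

import numpy as np

buf = np.empty(16)
view = buf[:4]        # the view holds a reference to buf's data
try:
    buf.resize(32)    # refcheck (the default) refuses to move the memory
except ValueError as err:
    print(err)        # "cannot resize an array that references or is referenced ..."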
But you're right that this limits the utility, as it's not a first-class
citizen of numpy -- you still use it like you do a list: accumulate in
it, then convert to a regular numpy array for further processing.
I'd love to see someone with more C chops than me pick this up -- but
Sturla's right -- it's hard to find the motivation for the relatively
small gains.
-Chris