Go vs. Python: Speed and memory usage for list sort

4,738 views
Skip to first unread message

Reinhard Wobst

unread,
Dec 27, 2011, 12:15:04 PM12/27/11
to golang-nuts
Hi all,

originally I was going to compare memory usage and speed of sorting
with "native sort interface" (i.e., defining Len(), Swap() and Less()
for the object to sort), and with a generalized sort with less code
were items are copied to a slice of interfaces. This was to be be
compared with the analog in Python to demonstrate how fast Go is.

However, already the first result was so unexpected that I abandoned
further tests. You can find the simple program under

http://play.golang.org/p/bMebZh2sCS

- however, is does not run there since os.Getpid() is not defined on
the "play server" (and I must use the Linux pseudo device /dev/proc/
status to see the memory usage). The task is simple: A table with 100
columns and 10000 rows (i.e., 1 million cells) is filled with random
31bit integers and then sorted 100 times after column 0,1,...,99. The
same is done in Python.

Results:

Memory usage:
40 MB peak for Go program
62 MB peak for Python script

Execution time for sort:
49 sec for Go program
1.3 sec for Python script

The memory usage is already surprising since Python generally uses
much memory; here it creates an extra list of pairs for "key-sorting"
of lists. However, the Go program works with 4MB of data only but has
the 10fold peak memory usage. Why? The usage of slices can not be the
reason.

The performance, however, is simply shocking. In
http://groups.google.com/group/golang-nuts/browse_thread/thread/720ae9e4f6eddb2b/323bc8d35b8cfdf7
I made some considerations about sort efficiency - considering the
above results, they are useless yet. Only my remark "BTW, Python
implements
very sophisticated sorting algorithms" seems "more" true than I
thought :-(

I would be glad to see that I made some basic error ...

Reinhard

Han-Wen Nienhuys

unread,
Dec 27, 2011, 12:28:16 PM12/27/11
to Reinhard Wobst, golang-nuts
On Tue, Dec 27, 2011 at 3:15 PM, Reinhard Wobst <zwi...@gmx.de> wrote:
> Hi all,
>
> originally I was going to compare memory usage and speed of sorting
> with "native sort interface" (i.e., defining Len(), Swap() and Less()
> for the object to sort), and with a generalized sort with less code
> were items are copied to a slice of interfaces. This was to be be
> compared with the analog in Python to demonstrate how fast Go is.


..
type Row [COL]int32
type Table []Row
func (lst Table) Swap(i, j int)      { lst[j], lst[i] = lst[i], lst[j] }
..

is this really what you want? Every swap operation is copying 10000
int32s around.

Your program performance will be dominated by the cost of copying flat
arrays of data.

You could consider storing slices rather than 10000-large arrays.

--
Han-Wen Nienhuys
Google Engineering Belo Horizonte
han...@google.com

Message has been deleted

Gustavo Niemeyer

unread,
Dec 27, 2011, 12:38:24 PM12/27/11
to Reinhard Wobst, golang-nuts
> However, already the first result was so unexpected that I abandoned
> further tests. You can find the simple program under

You're comparing apples and supernovas. Print a sample of the Python
table. You're iterating 100 times switching rows up and down, without
sorting anything.

> The performance, however, is simply shocking. In

(...)


> I would be glad to see that I made some basic error ...

Be happy then. ;-)

--
Gustavo Niemeyer
http://niemeyer.net
http://niemeyer.net/plus
http://niemeyer.net/twitter
http://niemeyer.net/blog

-- I'm not absolutely sure of anything.

Ian Lance Taylor

unread,
Dec 27, 2011, 12:39:37 PM12/27/11
to Han-Wen Nienhuys, Reinhard Wobst, golang-nuts
Han-Wen Nienhuys <han...@google.com> writes:

Yes.

More generally the Go program seems to sort the single slice of large
arrays COL times. The Python program seems to sort each each column
independently.

Ian

Gustavo Niemeyer

unread,
Dec 27, 2011, 12:44:15 PM12/27/11
to Ian Lance Taylor, Han-Wen Nienhuys, Reinhard Wobst, golang-nuts
> Yes.
>
> More generally the Go program seems to sort the single slice of large
> arrays COL times.  The Python program seems to sort each each column
> independently.

It's worse than that.. it doesn't even sort the columns. It swaps the
entire rows col times, where col == 100. At the end, rows will be
ordered by the magnitude of the last column element.

unread,
Dec 27, 2011, 2:02:45 PM12/27/11
to golang-nuts
On Dec 27, 6:15 pm, Reinhard Wobst <zwin...@gmx.de> wrote:
> Results:
>
> Memory usage:
> 40 MB peak for Go program
> 62 MB peak for Python script

On my notebook (32-bit x86 CPU), I see 5 MB for Go, and 20 MB for
Python.

> Execution time for sort:
> 49 sec for Go program
> 1.3 sec for Python script

If you use slices (that is: type Row []int32), Go will be faster than
Python.

The Go source code can be made slightly smaller if you use
http://golang.org/pkg/io/ioutil/#ReadFile

Reinhard Wobst

unread,
Dec 27, 2011, 2:19:26 PM12/27/11
to golang-nuts
Thanks, these were two silly errors - of course, I wanted to work with
slices and not with arrays (I know the difference!), and I
interchanged ROW and COL. The corrected version can be found under
http://play.golang.org/p/1T3xwaodZb
now, and the test finishes after 0.8 sec, i.e. it is about 50% faster
than Python. The world is saved :-) Nevertheless, the difference is
less than one might expect, in time and, before all, in memory usage.

> It's worse than that.. it doesn't even sort the columns. It swaps the
> entire rows col times, where col == 100. At the end, rows will be
> ordered by the magnitude of the last column element.

Of course - I did 100 sort operations and neglected the first 99
results, this was intentional. It is the usual keyfunction sort in
Python (BTW, Python3 tries to force this kind of sort exclusively -
and I am not a friend of Python3, though I have been very familar with
this language for 10 years now).

Sorting each columns separately allows to estimate the average sort
time much better than filling 100 times an array with random numbers
and then sorting it - no magic in it.

So after holidays, I will test what I originally want ...

Happy New Year!

Reinhard

Jesse McNelis

unread,
Dec 27, 2011, 2:58:12 PM12/27/11
to Reinhard Wobst, golang-nuts
On 28/12/11 06:19, Reinhard Wobst wrote:
> now, and the test finishes after 0.8 sec, i.e. it is about 50% faster
> than Python. The world is saved :-) Nevertheless, the difference is
> less than one might expect, in time and, before all, in memory usage.

You can also take advantage of the way Go allows you to control memory
layout by making the entire table contiguous and thus make better use of
cache.

func NewTable() Table {
tab := Table(make([]Row, 0,ROW))
t := make([]int32,ROW*COL)
for i:=0;i<ROW;i++ {
tab = append(tab,Row(t[i*COL:(i+1)*COL]))
}
return tab
}

http://play.golang.org/p/UFTafXCJAR


Reinhard Wobst

unread,
Dec 27, 2011, 3:17:10 PM12/27/11
to golang-nuts
> func NewTable() Table {
>          tab := Table(make([]Row, 0,ROW))
>          t := make([]int32,ROW*COL)
>          for i:=0;i<ROW;i++ {
>                  tab = append(tab,Row(t[i*COL:(i+1)*COL]))
>          }
>          return tab

Yes, this is the "C-ish" style - I come from there :-) Of course, it
is a bit faster (0.6s instead of 0.8s), but the memory usage is almost
the same. No wonder since a slice is roughly an overhead of 3
pointers. I still wonder what causes the usage of 40MB VmPeak ...

Reinhard

unread,
Dec 27, 2011, 4:30:45 PM12/27/11
to golang-nuts
On Dec 27, 9:17 pm, Reinhard Wobst <zwin...@gmx.de> wrote:
> Yes, this is the "C-ish" style - I come from there :-) Of course, it
> is a bit faster (0.6s instead of 0.8s), but the memory usage is almost
> the same. No wonder since a slice is roughly an overhead of 3
> pointers. I still wonder what causes the usage of 40MB VmPeak ...

VmPeak does *not* mean that the Linux process is actually using the
memory.

The relevant fields are VmRSS and VmHWM.

Gustavo Niemeyer

unread,
Dec 27, 2011, 5:17:04 PM12/27/11
to Reinhard Wobst, golang-nuts
(...)

> Sorting each columns separately allows to estimate the average sort
> time much better than filling 100 times an array with random numbers
> and then sorting it - no magic in it.

And you say "no magic" with a straight face after reading this thread? ;)

Andrew Francis

unread,
Dec 27, 2011, 9:21:13 PM12/27/11
to golang-nuts
Hi Reinhard:

On Dec 27, 2:19 pm, Reinhard Wobst <zwin...@gmx.de> wrote:

> now, and the test finishes after 0.8 sec, i.e. it is about 50% faster
> than Python. The world is saved :-) Nevertheless, the difference is
> less than one might expect, in time and, before all, in memory usage.

I would be curious to see the results if you used PyPy with its JIT.

Cheers,
Andrew

unread,
Dec 28, 2011, 5:21:22 AM12/28/11
to golang-nuts, fij...@gmail.com
In my experience, PyPy is faster in cases they are showing on PyPy
website. In case of codes unrelated to the tests PyPy team is dealing
with, the outcome of using PyPy instead of CPython is questionable.

PyPy runs Reinhard's table sort benchmark at the same speed as
CPython2.7 (32-bit x86 CPU). PyPy uses 80% more memory than
CPython2.7.

Majority of the table sort in CPython runs as C code (function
"listsort" in "Objects/listobject.c", "long_richcompare" in "Objects/
longobject.c", etc).

Using a 64-bit CPU may yield somewhat different results.

Carl

unread,
Dec 28, 2011, 7:48:36 AM12/28/11
to golang-nuts
something else to consider is that gccgo is currently undergoing some
major changes for the go 1 release. Once that new compiler is
finished, you may get more differing results if you use that compiler
to get more granular control over optimizations such as speed vs
size, also targeting specific machines and architectures to exploit
the availability of various hardware functions.

Maciej Fijalkowski

unread,
Dec 28, 2011, 5:34:43 AM12/28/11
to ⚛, golang-nuts

Hey,

Thanks for pointing me the thread on golang-nuts, however, I don't
really know what should I reply to that. Do you have any precise
question in mind?

Cheers,
fijal

unread,
Dec 29, 2011, 11:41:19 AM12/29/11
to golang-nuts
On Dec 28, 11:34 am, Maciej Fijalkowski <fij...@gmail.com> wrote:
> On Wed, Dec 28, 2011 at 12:21 PM, ⚛ <0xe2.0x9a.0...@gmail.com> wrote:
> > In my experience, PyPy is faster in cases they are showing on PyPy
> > website. In case of codes unrelated to the tests PyPy team is dealing
> > with, the outcome of using PyPy instead of CPython is questionable.
>
> > PyPy runs Reinhard's table sort benchmark at the same speed as
> > CPython2.7 (32-bit x86 CPU). PyPy uses 80% more memory than
> > CPython2.7.
>
> > Majority of the table sort in CPython runs as C code (function
> > "listsort" in "Objects/listobject.c", "long_richcompare" in "Objects/
> > longobject.c", etc).
>
> > Using a 64-bit CPU may yield somewhat different results.
>
> Hey,
>
> Thanks for pointing me the thread on golang-nuts, however, I don't
> really know what should I reply to that. Do you have any precise
> question in mind?

As a sidenote, the Python & PyPy versions run roughly 15 times slower
than a C/C++ version of the benchmark.

Considering that the PyPy version of listsort seems to be implemented
in pure Python, PyPy's performance is fairly good.

I would be interested in knowing whether PyPy is able to find out that
the table has type "list of list of integer" and whether it can act
based on that knowledge. In Python, the table has type "object", a
more refined type would be "list of object", an even more refined type
would be "list of list of object", then finally (if we stop at this
point) it would be "list of list of integer". Which one from these 4
options is PyPy most close to?

Kevin Gillette

unread,
Jan 11, 2012, 4:48:33 PM1/11/12
to golang-nuts
It's entirely possible to crank a bit more performance out of the
CPython version of the code -- using a list comprehension or map
instead of the 'for ... in ...:' nested loops on lines 104-108 of your
corrected version (a replacement could be):

from random import randint # make it local, save lookup speed
Table = [[randint(0, UPPER) for i in xrange(COL)] for j in
xrange(ROW)]

If you're doing a plain for loop in python for something that can be
reduced to a list comprehension, it becomes a question of comparing
implementations to implementations rather than languages to languages
(you have to use the idioms designed into each language for a given
task). Python has much better optimization of comprehensions than it
does for for loops (which it doesn't optimize). For sorting, CPython
has very efficient C doing all the heavy lifting, and implements
timsort, which has the normal O(nlogn) for random data, but as little
as n time for pre-sorted data (so runs of sorted data are dealt with
in whole). The function you pass into sort is the only place the sort
possibly operation re-enters the vm -- if you pass in something
written in C (like out of the operator module), it'll never re-enter
the vm, and while keep close pace with go (at this point, a matter of
years of CPython tuning) -- so the further optimization for lines
114-115 would be:

import operator
for scol in xrange(COL):
Table.sort(key=operator.itemgetter(scol))

If you're doing any numeric processing in python, it's pythonic to use
optimized library tools (including something like numpy), in which
case the pure pythonized parts take on the role of a control language.
Reply all
Reply to author
Forward
0 new messages