To count number of quadruplets with sum = 0

n00m

unread,

Mar 15, 2007, 6:42:03 PM3/15/07

to

http://www.spoj.pl/problems/SUMFOUR/

3
0 0 0 0
0 0 0 0
-1 -1 1 1
Answer for this input data is 33.

My solution for the problem is
======================================================================

import time
t = time.clock()

q,w,e,r,sch,h = [],[],[],[],0,{}

f = open("D:/m4000.txt","rt")

n = int(f.readline())

for o in range(n):
row = map(long, f.readline().split())
q.append(row[0])
w.append(row[1])
e.append(row[2])
r.append(row[3])

f.close()

for x in q:
for y in w:
if h.has_key(x+y):
h[x+y] += 1
else:
h[x+y] = 1

for x in e:
for y in r:
sch += h.get(-(x+y),0)

q,w,e,r,h = None,None,None,None,None

print sch
print time.clock() - t

===============================================================

Alas it gets "time limit exceeded".
On my home PC (1.6 GHz, 512 MB RAM) and for 4000 input rows it
executes ~1.5 min.
Any ideas to speed it up say 10 times? Or the problem only for C-like
langs?

Matimus

unread,

Mar 15, 2007, 7:22:32 PM3/15/07

to

> Any ideas to speed it up say 10 times? Or the problem only for C-like
> langs?

I dout this will speed it up by a factor of 10, but in your solution
you are mapping the values in the input file to longs. The problem
statement states that the maximum value for any of the numbers is
2**28. I assume the reason for this is precisely to allow you to use
32-bit integers. So, you can safely use int instead, and it _might_
speed up a little.

n00m

unread,

Mar 15, 2007, 8:04:36 PM3/15/07

to

Your suggestion speeded it up 1.5 times (on my 4000 test input rows)!

Steven Bethard

unread,

Mar 15, 2007, 10:19:16 PM3/15/07

to

This won't help much, but you can rewrite the above as::

x_y = x + y
h[x_y] = h.get(x_y, 1)

Or if you're using Python 2.5, try::

h = collections.defaultdict(itertools.repeat(0).next)

...

for x in q:
for y in w:

h[x + y] += 1
...

Not likely to get you an order of magnitude though.

> for x in e:
> for y in r:
> sch += h.get(-(x+y),0)

If you use the collections.defaultdict approach above, this becomes::

for x in e:
for y in r:

sch += h[-(x + y)]

Note that you should also probably put all your code into a function --
looking up function locals is quicker than looking up module globals.

STeVe

Paul Rubin

unread,

Mar 15, 2007, 11:05:03 PM3/15/07

to

"n00m" <n0...@narod.ru> writes:
> http://www.spoj.pl/problems/SUMFOUR/
> 3
> 0 0 0 0
> 0 0 0 0
> -1 -1 1 1
> Answer for this input data is 33.

f = open('input1')
npairs = int(f.readline())

quads = [map(int, f.readline().split()) for i in xrange(npairs)]
assert len(quads) == npairs

da = {}

for p in quads:
for q in quads:
z = p[2] + q[3]
da[z] = da.get(z,0) + 1

print sum([da.get(-(p[0]+q[1]), 0) for p in quads for q in quads])

Paul Rubin

unread,

Mar 15, 2007, 11:08:46 PM3/15/07

to

Paul Rubin <http://phr...@NOSPAM.invalid> writes:
> print sum([da.get(-(p[0]+q[1]), 0) for p in quads for q in quads])

The above should say:

print sum(da.get(-(p[0]+q[1]), 0) for p in quads for q in quads)

I had to use a listcomp instead of a genexp for testing, since I'm
still using python 2.3, and I forgot to patch that before pasting to
the newsgroup. But the listcomp will burn a lot of memory when the
list is large.

I'd be interested in the running time of the above for your large data set.
Note that it still uses potentially O(n**2) space.

n00m

unread,

Mar 16, 2007, 1:49:03 AM3/16/07

to

Steven, I ran this:

import time, collections, itertools
t = time.clock()

q,w,e,r,sch = [],[],[],[],0

h = collections.defaultdict(itertools.repeat(0).next)

f = open("D:/m4000.txt","rt")

for o in range(int(f.readline())):
row = map(int, f.readline().split())

q.append(row[0])
w.append(row[1])
e.append(row[2])
r.append(row[3])

f.close()

for x in q:
for y in w:

h[x+y] += 1

for x in e:
for y in r:

sch += h[-(x + y)]

q,w,e,r,h = None,None,None,None,None

print sch
print time.clock() - t

========= and it almost froze my PC...
=== but it was faster than my code on input file with 1000 rows:
====== 2.00864607094s VS 3.14631077413s

n00m

unread,

Mar 16, 2007, 1:57:13 AM3/16/07

to

Paul,

import time
t = time.clock()

f = open("D:/m4000.txt","rt")

npairs = int(f.readline())
quads = [map(int, f.readline().split()) for i in xrange(npairs)]

f.close()
da = {}
for p in quads:

for q in quads:
z = p[2] + q[3]
da[z] = da.get(z,0) + 1

print sum(da.get(-(p[0]+q[1]), 0) for p in quads for q in quads)

print time.clock() - t

Two first outputs is of above (your) code; next two - of my code:

>>> ================================ RESTART ===
>>>
0
68.9562762865
>>> ================================ RESTART ===
>>>
0
68.0813539151
>>> ================================ RESTART ===
>>>
0
62.5012896891
>>> ================================ RESTART ===
>>>
0
62.5030784639

Paul Rubin

unread,

Mar 16, 2007, 1:59:52 AM3/16/07

to

"n00m" <n0...@narod.ru> writes:
> h = collections.defaultdict(itertools.repeat(0).next)

Something wrong with
h = collections.defaultdict(int)
?????

> for x in e:
> for y in r:
> sch += h[-(x + y)]

That scares me a little: I think it makes a new entry in h, for the
cases where -(x+y) is not already in h. You want:

for x in e:
for y in r:

sch += h.get(-(x+y), 0)

Paul Rubin

unread,

Mar 16, 2007, 2:09:55 AM3/16/07

to

"n00m" <n0...@narod.ru> writes:
> Two first outputs is of above (your) code; next two - of my code:

Yeah, I see now that we both used the same algorithm. At first glance
I thought you had done something much slower. The 10 second limit
they gave looks like they intended to do it about this way, but with a
compiled language. 68 seconds isn't so bad for 4000 entries in pure
CPython. Next thing to do I think is use psyco or pyrex.

Michael Spencer

unread,

Mar 16, 2007, 1:11:40 AM3/16/07

to pytho...@python.org

Perhaps a bit faster using slicing to get the lists and avoiding dict.get:

def sumfour(src):
l = map(int, src.split())
dct={}
s=0
A, B, C, D = l[1::4], l[2::4], l[3::4], l[4::4]
for a in A:
for b in B:
if a+b in dct:
dct[a+b] += 1
else:
dct[a+b] = 1
for c in C:
for d in D:
if -c-d in dct:
s+= dct[-c-d]
return s

if __name__ == '__main__':
import sys
print sumfour(sys.stdin.read())

Michael

n00m

unread,

Mar 16, 2007, 2:28:04 AM3/16/07

to

Steve,
imo strangely enough but your suggestion to replace "if...: else:..."
with

x_y = x + y
h[x_y] = h.get(x_y, 1)

s=l=o=w=e=d the thing by ~1 sec.

n00m

unread,

Mar 16, 2007, 2:41:58 AM3/16/07

to

For those who interested, my test input files:

http://rapidshare.com/files/21267938/m1000.txt

http://rapidshare.com/files/21268386/m4000.txt

Paul Rubin

unread,

Mar 16, 2007, 3:19:36 AM3/16/07

to

"n00m" <n0...@narod.ru> writes:
> http://rapidshare.com/files/21267938/m1000.txt
> http://rapidshare.com/files/21268386/m4000.txt

I get 33190970 for the first set and 0 for the second set.

The first set only makes 38853 distinct dictionary entries, I guess
because the numbers are all fairly small so there's a lot of duplication.
The second set makes 8246860 distinct entries.

I got 128.42 sec runtime on the second set, on a 1.2 ghz Pentium M.
That was with the listcomp in the summation, which burned a lot of
extra memory, but I didn't bother writing out a summation loop.

I guess I can see a few other possible micro-optimizations but no
obvious algorithm improvements.

Paul McGuire

unread,

Mar 16, 2007, 4:02:32 AM3/16/07

to

On Mar 16, 12:49 am, "n00m" <n...@narod.ru> wrote:

> for o in range(int(f.readline())):
> row = map(int, f.readline().split())
> q.append(row[0])
> w.append(row[1])
> e.append(row[2])
> r.append(row[3])

Does this help at all in reading in your data?

numlines = f.readline()
rows = [ map(int,f.readline().split()) for _ in range(numlines) ]
q,w,e,r = zip(rows)

-- Paul

Marc 'BlackJack' Rintsch

unread,

Mar 16, 2007, 4:18:33 AM3/16/07

to

In <7x6491k...@ruckus.brouhaha.com>, Paul Rubin wrote:

> "n00m" <n0...@narod.ru> writes:
>> h = collections.defaultdict(itertools.repeat(0).next)
>
> Something wrong with
> h = collections.defaultdict(int)
> ?????

According to a post by Raymond Hettinger it's faster to use that iterator
instead of `int`.

Ciao,
Marc 'BlackJack' Rintsch

Anton Vredegoor

unread,

Mar 16, 2007, 9:29:56 AM3/16/07

to

n00m wrote:

> 62.5030784639

Maybe this one could save a few seconds, it works best when there are
multiple occurrences of the same value.

A.

from time import time

def freq(L):
D = {}
for x in L:
D[x] = D.get(x,0)+1
return D

def test():
t = time()
f = file('m4000.txt')
f.readline()
L = []
for line in f:
L.append(map(int,line.split()))

q,w,e,r = map(freq,zip(*L))
sch,h = 0,{}
for xk,xv in q.iteritems():
for yk,yv in w.iteritems():
if h.has_key(xk+yk):
h[xk+yk] += xv*yv
else:
h[xk+yk] = xv*yv

for xk,xv in e.iteritems():
for yk,yv in r.iteritems():
if h.has_key(-(xk+yk)):
sch += h[-(xk+yk)]*xv*yv

print sch
print time()-t

if __name__=='__main__':
test()

Steven Bethard

unread,

Mar 16, 2007, 11:37:28 AM3/16/07

to

Yep. It's because the .next() method takes no arguments, while int()
takes varargs because you can do::

int('2')
int('2', 8)

Calling a no-args function is substantially faster than calling a
varargs function.

STeVe

Paul Rubin

unread,

Mar 16, 2007, 11:56:01 AM3/16/07

to

Steven Bethard <steven....@gmail.com> writes:
> > According to a post by Raymond Hettinger it's faster to use that
> > iterator instead of `int`.
> Yep. It's because the .next() method takes no arguments, while int()

> takes varargs because you can do:: ...

Heh, good point. Might be worth putting a special hack in defaultdict
to recognize the common case of defaultdict(int).

marek...@wp.pl

unread,

Mar 16, 2007, 4:05:14 PM3/16/07

to

My attempt uses a different approach: create two sorted arrays, n^2
elements each; and then iterate over them looking for matching
elements (only one pass is required). I managed to get 58,2250612857 s
on my 1,7 MHz machine. It requires numpy for decent performance,
though.

import numpy
import time

def parse_input():
al, bl, cl, dl = [], [], [], []
for i in xrange(int(raw_input())):
a, b, c, d = map(int, raw_input().split())
al.append(a)
bl.append(b)
cl.append(c)
dl.append(d)
return al, bl, cl, dl

def count_zero_sums(al, bl, cl, dl):
n = len(al) # Assume others are equal

# Construct al extended (every element is repeated n times)
ale = numpy.array(al).repeat(n)
del al
# Construct bl extended (whole array is repeated n times)
ble = numpy.zeros((n*n,), int)
for i in xrange(n): ble[i*n:(i+1)*n] = bl
del bl
# Construct abl - sorted list of all sums of a, b for a, b in al, bl
abl = numpy.sort(ale + ble)
del ale, ble

# Construct cl extended (every element is repeated n times)
cle = numpy.array(cl).repeat(n)
del cl
# Construct dl extended (whole array is repeated n times)
dle = numpy.zeros((n*n,), int)
for i in xrange(n): dle[i*n:(i+1)*n] = dl
del dl
# Construct cdl - sorted list of all negated sums of a, b for a, b in
cl, dl
cdl = numpy.sort(-(cle + dle))
del cle, dle

# Iterate over arrays, count matching elements
result = 0
i, j = 0, 0
n = n*n
try:
while True:
while abl[i] < cdl[j]:
i += 1
while abl[i] > cdl[j]:
j += 1
if abl[i] == cdl[j]:
# Found matching sequences
ii = i + 1
while ii < n and abl[ii] == abl[i]: ii += 1
jj = j + 1
while jj < n and cdl[jj] == cdl[j]: jj += 1
result += (ii - i)*(jj - j)
i, j = ii, jj
except IndexError:
pass

return result

t = time.clock()
print count_zero_sums(*parse_input())
print time.clock() - t

n00m

unread,

Mar 17, 2007, 2:05:23 PM3/17/07

to

i have no NumPy to test it...
without Psyco Anton's code is the winner: ~48sec vs ~58sec of my code
But with Psyco my runtime is ~28sec; Anton's - ~30sec (PC: 1.6 ghz,
512 mb)
Not so bad.. keeping in mind that 256000 billions quadruplets to
check :)

import psyco, time
psyco.full()
t = time.clock()

def main():

q,w,e,r,sch,h = [],[],[],[],0,{}

f = open("D:/m4000.txt","rt")

for o in range(f.readline()):
row = map(int, f.readline().split())

q.append(row[0])
w.append(row[1])
e.append(row[2])
r.append(row[3])
f.close()

for x in q:
for y in w:
if h.has_key(x+y):
h[x+y] += 1
else:
h[x+y] = 1

for x in e:
for y in r:

if h.has_key(-(x+y)):
sch += h[-(x+y)]

q,w,e,r,h = None,None,None,None,None
print sch

main()
print time.clock() - t

bearoph...@lycos.com

unread,

Mar 17, 2007, 2:43:41 PM3/17/07

to

n00m:

> i have no NumPy to test it...
> without Psyco Anton's code is the winner: ~48sec vs ~58sec of my code
> But with Psyco my runtime is ~28sec; Anton's - ~30sec (PC: 1.6 ghz,
> 512 mb)
> Not so bad.. keeping in mind that 256000 billions quadruplets to
> check :)

I have oiled it a bit, you can try the speed of this too (for dicts in
is always faster than has_key).

from collections import defaultdict
import time, psyco

def main():
sch = 0
q,w,e,r = [],[],[],[]
h = defaultdict(int)

datafile = file("m1000.txt")
datafile.next()
xrows = (map(int, line.split()) for line in datafile)
q, w, e, r = zip(*xrows)

for x in q:
for y in w:

h[x+y] += 1

for x in e:
for y in r:

if -x-y in h:
sch += h[-x-y]

print sch

t = time.clock()
psyco.full()
main()
print round(time.clock() - t, 2), "secs"

Paul Rubin

unread,

Mar 17, 2007, 2:58:38 PM3/17/07

to

bearoph...@lycos.com writes:
> for x in e:
> for y in r:
> if -x-y in h:
> sch += h[-x-y]

I wonder whether

g = h.get

for x in e:
for y in r:
if -x-y in h:

sch += g(-x-y, 0)

might be a little bit faster. Also, -x-y could be saved in a variable.

It's unfortunate that array.array objects don't support the .sort()
operation. It would be interesting to compare the sorting-based scheme
with the hashing-based one under psyco. O(n**2*log(n)) local memory
references might be faster than O(n**2) hash operations and random
lookups that almost always miss the cache.

mark....@gmail.com

unread,

Mar 17, 2007, 3:00:02 PM3/17/07

to

FWIW, the original program can also be compiled with Shed Skin (http://
mark.dufour.googlepages.com), an experimental (static-)Python-to-C++
compiler, resulting in a speedup of about 8 times for a single test
with 500 tuples. here's a slightly modified version that works with
Shed Skin CVS at least:

import time
t = time.clock()

q,w,e,r,sch,h = [],[],[],[],0,{}

f = open("m33.txt","rt")

n = int(f.readline())

for o in range(n):
row = [int(x) for x in f.readline().split()]

q.append(row[0])
w.append(row[1])
e.append(row[2])
r.append(row[3])

f.close()

for x in q:
for y in w:

if h.has_key(x+y):
h[x+y] += 1
else:

h[x+y] = 1

for x in e:
for y in r:

sch += h.get(-(x+y),0)

print sch
print time.clock() - t

Thanks,
Mark Dufour (Shed Skin author - send me bug reports!)

bearoph...@lycos.com

unread,

Mar 17, 2007, 5:15:15 PM3/17/07

to

Mark Dufour:

> FWIW, the original program can also be compiled with Shed Skin (http://
> mark.dufour.googlepages.com), an experimental (static-)Python-to-C++
> compiler, resulting in a speedup of about 8 times for a single test
> with 500 tuples.

If we want to play, then this is a literal translation to D (I am not
much good in D yet, so maybe there are ways to improve this code a
bit), as you can see D syntax isn't that bad (I think sometimes D
copies some Python syntax):

// Compile with: dmd -O -release solver.d
import std.stdio, std.stream, std.string;

void main() {
size_t sch;
size_t[size_t] h;
size_t[] q,w,e,r;

int nrow = -1;
auto datafile = new File("m4000.txt", FileMode.In);
foreach(char[] row; datafile) {
if (nrow == -1) {
q.length = row.atoi();
w.length = row.atoi();
e.length = row.atoi();
r.length = row.atoi();
} else {
char[][] srow = row.split();
q[nrow] = srow[0].atoi();
w[nrow] = srow[1].atoi();
e[nrow] = srow[2].atoi();
r[nrow] = srow[3].atoi();
}
nrow++;
}

foreach (x; q)
foreach (y; w)
h[x+y]++;

foreach (x; e)
foreach (y; r) {
size_t* pointer = -x-y in h;
if (pointer)
sch += *pointer;
// simpler but slower:
// if (-x-y in h)
// sch += h[-x-y];
}

writefln(sch);
}

On my PC with the 1000 lines file this is about 2.2 times faster than
my Psyco version and about 4.6 times faster than the same code
without Psyco.

Bye,
bearophile

n00m

unread,

Mar 17, 2007, 11:46:06 PM3/17/07

to

bearophileH!

I gave to your "oil" svrl runs ("z in dict" instd of "dict.has_key"
saved only ~0.4 sec).
The result is (and I'm completely lost in all these
*optimizations* :)):

>>> ================================ RESTART ===
>>>
0
34.78 secs (bearophileH)
>>> ================================ RESTART ===
>>>
0
34.77 secs (bearophileH)
>>> ================================ RESTART ===
>>>
0
34.76 secs (bearophileH)
>>> ================================ RESTART ===
>>>
0
27.2460938471 secs (n00m)
>>> ================================ RESTART ===
>>>
0
27.2953596058 secs (n00m)
>>> ================================ RESTART ===
>>>
0
27.3709929614 secs (n00m)
>>>

In the name of exactness this is the tested codes:

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

from collections import defaultdict
import time, psyco

def main():
sch = 0
q,w,e,r = [],[],[],[]
h = defaultdict(int)

datafile = file("D:/m4000.txt")

datafile.next()
xrows = (map(int, line.split()) for line in datafile)
q, w, e, r = zip(*xrows)

for x in q:
for y in w:

h[x+y] += 1

for x in e:
for y in r:

if -x-y in h:
sch += h[-x-y]

print sch

t = time.clock()
psyco.full()
main()

print round(time.clock() - t, 2), "secs (bearophileH)"
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
import psyco, time

def main():

q,w,e,r,sch,h = [],[],[],[],0,{}

f = open("D:/m4000.txt","rt")

for o in range(int(f.readline())):
row = map(int, f.readline().split())

q.append(row[0])
w.append(row[1])
e.append(row[2])
r.append(row[3])
f.close()

for x in q:
for y in w:

if (x+y) in h:

h[x+y] += 1
else:
h[x+y] = 1

for x in e:
for y in r:

if -(x+y) in h:
sch += h[-(x+y)]

q,w,e,r,h = None,None,None,None,None
print sch

t = time.clock()
psyco.full()
main()
print time.clock() - t,"secs (n00m)"
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

PS Both Python & Psyco of 2.5 ver.

n00m

unread,

Mar 17, 2007, 11:54:33 PM3/17/07

to

my dial-up line's too slow for downloading 4mb of shedskin-0.0.20.exe

n00m

unread,

Mar 18, 2007, 1:35:09 AM3/18/07

to

>>> ================================ RESTART ===
>>>
0
30.4740708665 secs (Anton Vredegoor)
>>> ================================ RESTART ===
>>>
0
30.4132625795 secs (Anton Vredegoor)
>>> ================================ RESTART ===
>>>
0
30.4812175849 secs (Anton Vredegoor)
>>>

+++++++++++++++++++++++++++++++++++++++++++++++++++++++
import psyco, time

def freq(L):
D = {}
for x in L:
D[x] = D.get(x,0)+1
return D

def test():
f = file('D:/m4000.txt')

f.readline()
L = []
for line in f:
L.append(map(int,line.split()))

q,w,e,r = map(freq,zip(*L))
sch,h = 0,{}
for xk,xv in q.iteritems():
for yk,yv in w.iteritems():
if h.has_key(xk+yk):
h[xk+yk] += xv*yv
else:
h[xk+yk] = xv*yv

for xk,xv in e.iteritems():
for yk,yv in r.iteritems():
if h.has_key(-(xk+yk)):
sch += h[-(xk+yk)]*xv*yv

print sch

t = time.clock()
psyco.full()
test()
print time.clock() - t,"secs (Anton Vredegoor)"
+++++++++++++++++++++++++++++++++++++++++++++++++++++++

PS
The same PC: AMD Sempron +2600 (1.6 GHz); 512 MB RAM.

Jorge Godoy

unread,

Mar 18, 2007, 7:19:18 AM3/18/07

to

"n00m" <n0...@narod.ru> writes:

> my dial-up line's too slow for downloading 4mb of shedskin-0.0.20.exe

<joke>
Don't worry! We can email it to you. :-D
</joke>

--
Jorge Godoy <jgo...@gmail.com>

mark....@gmail.com

unread,

Mar 26, 2007, 10:09:26 AM3/26/07

to

> FWIW, the original program can also be compiled with Shed Skin (http://
> mark.dufour.googlepages.com), an experimental (static-)Python-to-C++
> compiler, resulting in a speedup of about 8 times for a single test
> with 500 tuples. here's a slightly modified version that works with
> Shed Skin CVS at least:

after optimizing dicts a bit for Shedskin 0.0.21 (http://
mark.dufour.googlepages.com), the speedup for this program is now
about 16.5 times for the same test.