Large dict problems


Raphael C

Aug 7, 2013, 3:29:40 PM
to shedskin...@googlegroups.com
I am testing this toy script with Shed Skin and see both an unexpected crash and a slowdown compared with standard Python 2.7.3 (CPython).

#!/usr/bin/python
import sys
from collections import defaultdict
fin = open(sys.argv[1])

dict = defaultdict(list)

for line in fin:
    parts = line.split()
    dict[parts[0]].append(parts[1])

I create data with

paste <(seq 15000000) <(seq 2 15000001) >  largefile.txt

and test in cpython and get

time ./read.py largefile.txt 

real 0m41.771s
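
For anyone who wants to reproduce this without bash process substitution, the paste/seq command above can be approximated in Python. This is only a sketch: the function name write_pairs is made up for illustration, and it assumes paste's default tab separator, so each line is "i<TAB>i+1".

```python
def write_pairs(path, n):
    """Write n lines of the form 'i<TAB>i+1' for i = 1..n.

    Hypothetical helper, not part of the original post; it is meant to
    mirror `paste <(seq n) <(seq 2 n+1)` with paste's default tab separator.
    """
    with open(path, "w") as out:
        for i in range(1, n + 1):
            out.write("%d\t%d\n" % (i, i + 1))
```

Calling write_pairs("largefile.txt", 15000000) should then give a file equivalent to the one used in the timings here.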


With shedskin 0.9.4 I get 

time ./read largefile.txt 
GC Warning: Repeated allocation of very large block (appr. size 134221824):
May lead to memory leak and poor performance.
(use a 64-bit system to possibly avoid GC warnings, or use shedskin -g to disable them)
GC Warning: Out of Memory!  Returning NIL!
(use a 64-bit system to possibly avoid GC warnings, or use shedskin -g to disable them)
Segmentation fault (core dumped)


If I reduce the file size to

paste <(seq 10000000) <(seq 2 10000001) >  file.txt

I get

time ./read largefile.txt
GC Warning: Repeated allocation of very large block (appr. size 33558528):
May lead to memory leak and poor performance.
(use a 64-bit system to possibly avoid GC warnings, or use shedskin -g to disable them)

real 0m29.354s

With cpython I get

time ./read.py largefile.txt 
real 0m26.570s


Raphael


srepmub

Aug 17, 2013, 5:28:42 PM
to shedskin...@googlegroups.com
Sorry for the late reply. Please see this thread for some comments about a similar problem:

https://groups.google.com/forum/#!searchin/shedskin-discuss/gc$20problem/shedskin-discuss/pW3y3be6sSI/aiUolHCrwswJ

Raphael C

Aug 18, 2013, 12:18:16 PM
to shedskin...@googlegroups.com
Thanks, but that link does not work for me, at least not on my Android tablet.
Raphael