I am testing the toy script below and see both an unexpected crash and a slowdown versus standard Python 2.7.3 (CPython).
#!/usr/bin/python
import sys
from collections import defaultdict

fin = open(sys.argv[1])
d = defaultdict(list)  # map each first-column key to a list of second-column values
for line in fin:
    parts = line.split()
    d[parts[0]].append(parts[1])
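For comparison, the same logic can be written with a plain dict and setdefault instead of defaultdict; this is only a sketch (I have not verified that it changes Shedskin's allocation pattern):

#!/usr/bin/python
import sys

fin = open(sys.argv[1])
d = {}
for line in fin:
    parts = line.split()
    # setdefault creates the list the first time a key is seen
    d.setdefault(parts[0], []).append(parts[1])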
I create the test data with:
paste <(seq 15000000) <(seq 2 15000001) > largefile.txt
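If process substitution is unavailable, a roughly equivalent generator in pure Python (assuming paste's default tab delimiter) is:

#!/usr/bin/python
# writes lines "1\t2" ... "15000000\t15000001", like the paste command above
out = open('largefile.txt', 'w')
for i in xrange(1, 15000001):
    out.write('%d\t%d\n' % (i, i + 1))
out.close()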
Testing with CPython, I get:
time ./read.py largefile.txt
real 0m41.771s
With Shedskin 0.9.4 I get:
time ./read largefile.txt
GC Warning: Repeated allocation of very large block (appr. size 134221824):
May lead to memory leak and poor performance.
(use a 64-bit system to possibly avoid GC warnings, or use shedskin -g to disable them)
GC Warning: Out of Memory! Returning NIL!
(use a 64-bit system to possibly avoid GC warnings, or use shedskin -g to disable them)
Segmentation fault (core dumped)
If I reduce the file size with
paste <(seq 10000000) <(seq 2 10000001) > largefile.txt
I get
time ./read largefile.txt
GC Warning: Repeated allocation of very large block (appr. size 33558528):
May lead to memory leak and poor performance.
(use a 64-bit system to possibly avoid GC warnings, or use shedskin -g to disable them)
real 0m29.354s
With CPython I get:
time ./read.py largefile.txt
real 0m26.570s
Raphael