I am iterating over a large table, simply counting the number of rows, and I noticed that the memory usage is very high (about 2 GB). I passed the fill_cache=False parameter, but that didn't help.
Here is some code to replicate the problem:
import leveldb

# Create a large table with random data (the exact data doesn't matter).
db = leveldb.LevelDB("/mnt/dustin/test_level_db")
for x in xrange(1000 * 1000 * 1000):
    db.Put("%010d" % x, str(x ** 3))
Once this table is created, I ran a new process:
import leveldb
import os

db = leveldb.LevelDB("/mnt/dustin/test_level_db")
num_rows = 0
for (key, value) in db.RangeIter(include_value=True, fill_cache=False):
    num_rows += 1
    if num_rows % 10000000 == 0:
        print "Iterated over %dM rows" % (num_rows / 1000000)
        os.system("ps u -p %d" % os.getpid())
which produced the following output:
Iterated over 10M rows
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
dustin 28964 97.8 1.2 250968 219576 pts/1 S+ 13:03 0:06 python test_level_db.py
Iterated over 20M rows
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
dustin 28964 98.5 2.4 479872 448408 pts/1 R+ 13:03 0:13 python test_level_db.py
Iterated over 30M rows
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
dustin 28964 98.5 3.8 713816 682164 pts/1 S+ 13:03 0:20 python test_level_db.py
Iterated over 40M rows
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
dustin 28964 102 5.1 949700 917876 pts/1 S+ 13:03 0:27 python test_level_db.py
Iterated over 50M rows
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
dustin 28964 101 6.4 1186420 1154532 pts/1 S+ 13:03 0:34 python test_level_db.py
Iterated over 60M rows
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
dustin 28964 100 7.6 1396724 1364628 pts/1 S+ 13:03 0:41 python test_level_db.py
Iterated over 70M rows
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
dustin 28964 100 8.1 1491676 1460924 pts/1 S+ 13:03 0:48 python test_level_db.py
Iterated over 80M rows
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
dustin 28964 100 8.2 1511856 1481052 pts/1 S+ 13:03 0:55 python test_level_db.py
Iterated over 90M rows
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
dustin 28964 99.7 8.3 1523380 1492436 pts/1 S+ 13:03 1:01 python test_level_db.py
Iterated over 100M rows
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
dustin 28964 99.5 8.3 1532064 1501004 pts/1 S+ 13:03 1:08 python test_level_db.py
As you can see, the memory usage (both the VSZ and RSS) grows steadily with the number of rows iterated, reaching about 1.5 GB by 100M rows.
Any ideas on why this is happening, or how to stop it?
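For what it's worth, since the loop only counts rows, the values never need to be materialized. A keys-only pass, sketched below with the same RangeIter parameters used above (count_rows is a helper name I made up, not part of py-leveldb), at least removes the per-value string allocation from the picture:

```python
def count_rows(db):
    # Count rows without materializing values: include_value=False
    # makes RangeIter yield keys only, and fill_cache=False asks
    # LevelDB not to populate its block cache during the scan.
    num_rows = 0
    for key in db.RangeIter(include_value=False, fill_cache=False):
        num_rows += 1
    return num_rows
```

This doesn't rule out the iterator itself pinning memory, but it narrows down what could be responsible for the growth.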
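Separately, if forking ps at every checkpoint ever becomes a bottleneck, the RSS can be read straight from /proc instead. A minimal sketch, assuming Linux (rss_kb is a helper name I made up):

```python
import os

def rss_kb(pid=None):
    # Resident set size in kB, read from /proc/<pid>/status (Linux only).
    # Returns None if the status file can't be read.
    pid = os.getpid() if pid is None else pid
    try:
        with open("/proc/%d/status" % pid) as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])  # field is in kB
    except IOError:
        pass
    return None
```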