Eliot,
Thanks for the reply. Capping the file doubling at 2GB works great; that
alleviates my fears of partially filled 64GB files.
> Would it be possible to send a sample data set with how much storage it's
> taking up in other engines?
The data is a basic set of key/value pairs where the keys are incremental
indexes and the values are frequency counts. Nothing fancy, so if you create
a file with 2.8 million random integers in it, one per line, that will be
identical to what I am loading. Below is the Python code that loads the file
and indexes the collection on the ID:
from pymongo.connection import Connection
from pymongo import ASCENDING

def main(in_file):
    conn = Connection("server_machine")
    db = conn["reads_090504"]
    col = db["read_to_freq"]
    with open(in_file) as in_handle:
        for read_index, freq in enumerate(in_handle):
            col.insert(dict(read_id=read_index, frequency=int(freq)))
    col.create_index("read_id", ASCENDING)
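For completeness, the script takes the input file as its only argument; the
driver isn't shown above, but it is just something along the lines of:

import sys

if __name__ == "__main__":
    main(sys.argv[1])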
For the Tokyo Cabinet comparison, the data set is the same; the keys are the
indexes and the values are JSON string dictionaries of dict(frequency=freq),
to be as similar as possible to what I am loading into MongoDB. The database
is a B-tree with compression:
test.tcb#opts=ld#bnum=1000000#lcnum=10000
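To be concrete about the values, the key/value pairs going into the B-tree
are built like this (just a sketch of the pair construction; the actual load
loop feeds these pairs to the put call of whichever Tokyo Cabinet binding is
in use, with the tuning options passed through the file name string above):

import json

def tc_pairs(in_file):
    # Same content as the MongoDB documents: the key is the read index,
    # the value is a JSON dictionary holding the frequency count.
    with open(in_file) as in_handle:
        for read_index, freq in enumerate(in_handle):
            yield str(read_index), json.dumps(dict(frequency=int(freq)))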
> Also, could you send the output of
> validate()? That will tell us a little bit more about the usage.
validate
details: 0x2aaac13c8c80 ofs:8c8c80
firstExtent:0:2a00 ns:reads_090504.read_to_freq
lastExtent:2:58f1000 ns:reads_090504.read_to_freq
# extents:18
datasize?:146157464 nrecords?:2810718 lastExtentSize:33205248
padding:1
first extent:
loc:0:2a00 xnext:0:24a00 xprev:null
ns:reads_090504.read_to_freq
size:3072 firstRecord:0:2ab0 lastRecord:0:3594
2810718 objects found, nobj:2810718
191128952 bytes data w/headers
146157464 bytes data wout/headers
deletedList: 0100000000000000010
deleted: n: 8 size: 3294744
nIndexes:2
reads_090504.read_to_freq.$_id_ keys:2810718
reads_090504.read_to_freq.$read_id_1 keys:2810718
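If it's useful, those figures work out to roughly 52 bytes of data per
document plus about 16 bytes of record header:

# quick check from the validate() figures above
nrecords = 2810718
with_headers = 191128952    # bytes data w/headers
data_only = 146157464       # bytes data wout/headers
print(with_headers / float(nrecords))                # ~68 bytes per record
print(data_only / float(nrecords))                   # ~52 bytes per record
print((with_headers - data_only) / float(nrecords))  # ~16 bytes of header per record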
Thanks again for taking a look. Let me know if I can provide any other
information,
Brad