Java client (1.1-dp) insert errors with 2.0.0 community edition (build-722)

Steve

Apr 23, 2012, 4:00:05 PM
to Couchbase
I am inserting ~771k documents read from a text file in key:json_str
format. At some point after about 750k documents are inserted, about
11k are not inserted. I have not checked whether the documents are
contiguous in the source file or scattered through the tail end of it,
but I know the very last records of the source are inserted.

I can collect the non-inserted records into another file and then
insert them with the original update routine.

Also, this is reproducible but not consistent: after flushing the
bucket and repeating the process, I get a different number of records
inserted (delta = 1976).

The PDF documentation shows this pattern for inserts:
client.set(key, 0, value)
client.get(key)

but does not say the set *must* be followed by a get. Without the get
I see fewer than half the records inserted. The documentation is also
unclear on the TTL argument: it does not say what 0,
Integer.MAX_VALUE, or a negative value means, or even that the
argument is a time to live.
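
For reference, my reading of the memcached protocol convention for
expiry values, sketched below. That this server build follows the same
convention is an assumption on my part, and ExpiryReading is a
hypothetical helper of mine, not part of the client. Negative values
are deliberately left unclassified.

```java
import java.time.Instant;

class ExpiryReading {
    // Memcached convention: offsets of up to 30 days are relative to "now".
    static final int THIRTY_DAYS_SECONDS = 60 * 60 * 24 * 30;

    // Describes how an expiry value would be read under that convention.
    static String describe(int exp, Instant now) {
        if (exp < 0) {
            return "negative: not classified here";
        }
        if (exp == 0) {
            return "never expires";
        }
        if (exp <= THIRTY_DAYS_SECONDS) {
            // Small values count seconds from the current time.
            return "relative: expires at " + now.plusSeconds(exp);
        }
        // Larger values are taken as absolute Unix timestamps.
        return "absolute: expires at " + Instant.ofEpochSecond(exp);
    }
}
```

Under this reading, describe(Integer.MAX_VALUE, Instant.now()) reports
an absolute expiry of 2038-01-19T03:14:07Z rather than "never".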

Other than the get, is there anything I am missing?
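
One thing I may be missing: as far as I can tell, set in this client
is asynchronous and returns a Future<Boolean>, so the loop never
learns whether a given document was stored, and the synchronous get
may only be helping by slowing the loop down. A minimal sketch of
checking every result follows. AsyncStore and CheckedLoader are
hypothetical names of mine, standing in for the real client so the
example runs without a server.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Hypothetical stand-in for the client's asynchronous set(key, exp, value).
interface AsyncStore {
    Future<Boolean> set(String key, int exp, String value);
}

class CheckedLoader {
    // Issues one set per document, then waits on every future and
    // returns the keys whose set did not report success.
    static List<String> load(AsyncStore store, Map<String, String> docs) {
        Map<String, Future<Boolean>> pending = new LinkedHashMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            pending.put(doc.getKey(), store.set(doc.getKey(), 0, doc.getValue()));
        }
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, Future<Boolean>> p : pending.entrySet()) {
            try {
                // A false (or null) result means the write was not stored.
                if (!Boolean.TRUE.equals(p.getValue().get())) {
                    failed.add(p.getKey());
                }
            } catch (InterruptedException | ExecutionException ex) {
                if (ex instanceof InterruptedException) {
                    Thread.currentThread().interrupt(); // keep the interrupt flag
                }
                failed.add(p.getKey()); // count the key as not stored
            }
        }
        return failed;
    }
}
```

With ~771k documents it would be better to call load on batches of a
few thousand rather than hold every future at once. The returned list
is exactly the set of keys that need to be inserted again.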

Steve

Apr 25, 2012, 11:56:32 AM
to Couchbase
I was able to repeat this behavior with another set of data, this
time a file containing about 2 million records. Inserts started
getting rejected after about 400k documents had been added and were
subsequently only inserted intermittently, in chunks, for a total of
only about 1 million inserts. I found that if I increased the bucket's
RAM from 128M to 512M, all 2 million documents could be inserted in
one pass. The problem, however, is that I need to insert the original
770k, these 2 million, and at least a million more documents, but a
trial run shows that the 2.8 million cannot be inserted in a single
pass -- 2.6 million can be inserted in one pass and the remainder
later.

There are at least two issues here:
1) Hand-tuning the RAM requirements does not scale, especially when
the backend is meant to persist data to disk, the data is always
growing, and the average size of a document can change from moment to
moment.
2) The failure mode of the set operation does not follow the principle
of least surprise. Instead of failing with an exception, data is
silently dropped.
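
A possible workaround for issue 2, sketched under two assumptions I
have not verified: that a false result from the set future means the
server rejected the write, and that the rejection is temporary while
the server drains data to disk. RetryingSetter is a hypothetical
helper of mine, and asyncSet is a hypothetical wrapper around the
client such as (k, v) -> client.set(k, 0, v).

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.function.BiFunction;

class RetryingSetter {
    // Tries to store one document, doubling the pause after each rejected
    // attempt. Returns true once a set reports success, false if every
    // attempt was rejected or the thread was interrupted.
    static boolean setWithBackoff(
            BiFunction<String, String, Future<Boolean>> asyncSet,
            String key, String value, int maxAttempts, long initialDelayMs) {
        long delayMs = initialDelayMs;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                boolean stored;
                try {
                    stored = Boolean.TRUE.equals(asyncSet.apply(key, value).get());
                } catch (ExecutionException ex) {
                    stored = false; // treat a failed operation like a rejection
                }
                if (stored) {
                    return true;
                }
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMs); // give the server time to catch up
                    delayMs *= 2;
                }
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }
}
```

The point of returning a boolean is that the caller can turn a final
false into an exception or a log entry, so nothing is dropped silently.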

OK -- another issue too: views seem to take excessive disk space, 20x
the data size.