I have an application using BigCouch. When it first starts, it tries to initialize a clean DB with ~400 docs (~500 KB) in a single bulk update. This works fine for clusters with N=1. With N=3 (it should be odd, right?) it occasionally succeeds, but more often it fails: sometimes it crashes BigCouch and sometimes it just prints lots of nasty errors. Settings: Q=256, R=2, W=2.
I'm running BigCouch 1.1.1 on clusters of 3 CentOS VMs with 3 cores and 6 GB of RAM. Looking into the problem, I see the CPU pegged at 100% utilization, and running truss on the beam process shows it spending 98% of its system time in "futex" calls. (This could be a red herring, though: most of the CPU time is user, not system.)
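For reference, here is a rough count of what one such request involves under these settings. It assumes that the N copies of a shard live on different nodes and that document IDs hash uniformly across the Q shards; both are assumptions on my part, not something I have measured:

```latex
\begin{aligned}
\text{shard copies} &= Q \cdot N = 256 \cdot 3 = 768
  \quad\Rightarrow\quad 768 / 3 = 256 \text{ per node (every node holds every shard)} \\
\text{quorum overlap:}\quad R + W &= 2 + 2 = 4 > N = 3 \\
\mathbb{E}[\text{shards hit by 400 docs}] &= Q\left(1 - \left(1 - \tfrac{1}{Q}\right)^{400}\right)
  = 256\left(1 - \left(\tfrac{255}{256}\right)^{400}\right) \approx 202.5
\end{aligned}
```

If those assumptions hold, a single bulk request writes to roughly 200 shard files on each node at once, about 600 shard writes across the cluster.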
This seems like it should be a pretty trivial bulk update for this much hardware, but I get the same results on our own VMware and on Amazon's EC2. The behavior is also the same whether the insert comes from our application or is reproduced with curl. Smaller reads and writes to the DB work fine. On the rare occasions when the bulk insert succeeds, the documents are fine and the application proceeds normally (although in the future there will be more bulk operations throughout). The IDs are not sequential (sequential IDs are recommended for performance), but retrying with sequential IDs did not change anything.
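For anyone who wants to reproduce this outside the application, here is a minimal Python sketch of the same single-request insert. It is roughly equivalent to curl -X POST -H "Content-Type: application/json" -d @docs.json http://HOST:5984/DB/_bulk_docs. The host, port and database name are hypothetical placeholders, and the sketch assumes the standard CouchDB _bulk_docs API as exposed on BigCouch's clustered port:

```python
import json
import urllib.request

# Hypothetical connection settings: point these at one node's clustered port.
COUCH_URL = "http://localhost:5984"
DB_NAME = "testdb"


def build_bulk_body(docs):
    """Wrap documents in the {"docs": [...]} envelope that _bulk_docs expects."""
    return json.dumps({"docs": docs}).encode("utf-8")


def bulk_insert(docs, url=COUCH_URL, db=DB_NAME, timeout=120):
    """Send all docs in ONE _bulk_docs POST, like the application does at startup.

    Returns the parsed JSON response: a list with one {"id", "rev"} or
    {"id", "error", "reason"} object per document.
    """
    req = urllib.request.Request(
        f"{url}/{db}/_bulk_docs",
        data=build_bulk_body(docs),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # A crashed or unresponsive node surfaces here as an HTTPError/URLError or timeout.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def failed_docs(results):
    """Return the per-document entries that report an error instead of a new rev."""
    return [r for r in results if "error" in r]
```

To use it, pass the list of ~400 documents to bulk_insert and then call failed_docs on the result. That separates individual rejected documents from a failure of the whole request, which would show up as an exception from urlopen instead.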
Any suggestions? Here are a few details if they help:
http://pastehtml.com/view/celvd4fm8.txt --- JSON to insert if you want to reproduce
http://pastehtml.com/view/celvr44c0.txt --- logfile entries with crash info
Thanks for any help!