Cube prematurely reporting successful insertions to mongodb?

Don Spaulding

Oct 26, 2012, 7:03:43 PM
to cube...@googlegroups.com
Hi everyone,

I was able to get up and running with cube very quickly.  I've got a week's worth of historical data sitting in a text file, and I wrote a simple python script[0] to throw it into cube.  The script makes use of the python cube client[1].  The format of the file is one JSON string per line, FWIW.
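
For reference, the core of the script looks roughly like this.  This is a simplified sketch that posts straight to the collector's /1.0/event/put endpoint over raw HTTP instead of going through the client library, and the file name is a placeholder, so don't read it as the actual contents of [0]:

    import json
    import urllib.request

    # Cube's collector listens on port 1080 by default; events are posted
    # as a JSON array to /1.0/event/put.
    COLLECTOR_URL = "http://localhost:1080/1.0/event/put"

    def post_events(lines):
        # Assumes each line of the file is already a complete Cube event,
        # i.e. an object with "type", "time", and (optionally) "data".
        events = [json.loads(line) for line in lines]
        req = urllib.request.Request(
            COLLECTOR_URL,
            data=json.dumps(events).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            # As observed below, a 200 only means the collector accepted
            # the request, not that the events are visible in mongodb yet.
            return resp.status

    if __name__ == "__main__":
        with open("events.log") as f:  # hypothetical file name
            print(post_events(f))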

When running the import script with just one process, it seems to work fine (but it is incredibly slow to load the 1.4M events in the file).  When I use multiple processes (by uncommenting the appropriate lines in the script), my import script gets wildly ahead of cube.  It reports that around 140,000 events have been sent, even though a mongo shell shows only ~30,000 documents in the myevent_events collection.  My import script then stalls for a while (basically until the mongodb collection catches up to 140K events), then works its way up to 300,000 and waits for cube to catch up again.  I get only "200 OK" responses.  I estimate that I'm hitting 2,000 reqs/sec while it's running.
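
The multi-process variant has roughly this shape (the worker count and batch size here are illustrative, not the real values from [0]; post_events is the helper from the sketch above):

    from multiprocessing import Pool

    # post_events as defined in the previous sketch (must be importable
    # at module level for multiprocessing to pickle it).

    BATCH_SIZE = 500  # illustrative; bigger batches mean fewer HTTP round-trips

    def batches(path, size):
        # Yield lists of raw JSON lines, `size` lines at a time.
        batch = []
        with open(path) as f:
            for line in f:
                batch.append(line)
                if len(batch) == size:
                    yield batch
                    batch = []
        if batch:
            yield batch

    if __name__ == "__main__":
        with Pool(processes=4) as pool:
            pool.map(post_events, batches("events.log", BATCH_SIZE))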

Why does it appear that cube reports a 200 response before mongodb actually holds the event?  Is this by design?

What's the recommended way to scale up the Node processes serving the collector?
