I'm wondering what the "best practice" is for processing every document in a large collection from a (Ruby) script.
I'm trying to build stats on a collection containing 33 million fairly complex documents; I'm currently traversing them by simply running:
... do something with the data
However, this takes a while to start (I assume while building the cursor), and on my DB it fails after a couple of hours with:
/home/ubuntu/.rvm/gems/ruby-1.9.2-p0/gems/mongo-1.0.8/lib/mongo/connection.rb:784:in `check_response_flags': Query response returned CURSOR_NOT_FOUND. Either an invalid cursor was specified, or the cursor may have timed out on the server. (Mongo::OperationFailure)
and in the Mongo logs all I can see is:
... a bunch of successful logs, then:
Mon Sep 27 01:36:56 [conn601] getmore ysa.rawEvents cid:627945061045667031 getMore: {} bytes:1049629 nreturned:707 955ms
Mon Sep 27 01:39:07 [conn601] getmore ysa.rawEvents cid:627945061045667031 getMore: {} bytes:1048698 nreturned:727 147ms
Mon Sep 27 01:40:04 [conn601] getmore ysa.rawEvents cid:627945061045667031 getMore: {} bytes:1084876 nreturned:651 1271310283ms
Mon Sep 27 02:03:01 [conn601] getmore ysa.rawEvents cid:627945061045667031 getMore: {} bytes:1048609 nreturned:651 105ms
Mon Sep 27 02:04:34 [conn601] getmore ysa.rawEvents cid:627945061045667031 getMore: {} bytes:1051472 nreturned:644 589ms
Mon Sep 27 02:04:56 [conn601] getmore ysa.rawEvents cid:627945061045667031 getMore: {} bytes:1051822 nreturned:687 118ms
Mon Sep 27 02:22:57 [conn601] getMore: cursorid not found ysa.rawEvents 627945061045667031
Mon Sep 27 02:22:57 [conn601] getmore ysa.rawEvents cid:627945061045667031 bytes:20 nreturned:0 134ms
Now, I'm fairly sure (well, I hope!) that there is no data corruption; my guess is that something timed out somewhere. I'm not sure what's going on with that huge time at 01:40, though.
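One way to check the timeout guess is to measure the pauses between consecutive getmore calls in the log excerpt above. The sketch below does that and flags any pause longer than a threshold. The 600-second value is the usual server default for idle cursors (10 minutes). The helper names (`seconds_of_day`, `long_gaps`) are made up for illustration, and the code assumes all log lines fall on the same day, which holds for this excerpt.

```ruby
# Sketch: flag long pauses between consecutive mongod log lines.
# Assumes each line contains an HH:MM:SS timestamp and that all lines fall
# on the same day (true for the excerpt above).
IDLE_TIMEOUT_SECS = 600 # usual server default for idle cursors (10 minutes)

# Timestamps copied from the log excerpt; the rest of each line is shortened.
LOG_LINES = [
  "Mon Sep 27 01:36:56 [conn601] getmore ysa.rawEvents",
  "Mon Sep 27 01:39:07 [conn601] getmore ysa.rawEvents",
  "Mon Sep 27 01:40:04 [conn601] getmore ysa.rawEvents",
  "Mon Sep 27 02:03:01 [conn601] getmore ysa.rawEvents",
  "Mon Sep 27 02:04:34 [conn601] getmore ysa.rawEvents",
  "Mon Sep 27 02:04:56 [conn601] getmore ysa.rawEvents",
  "Mon Sep 27 02:22:57 [conn601] getMore: cursorid not found ysa.rawEvents",
  "Mon Sep 27 02:22:57 [conn601] getmore ysa.rawEvents"
]

# Converts the first HH:MM:SS in a log line into seconds since midnight.
def seconds_of_day(line)
  h, m, s = line[/\d\d:\d\d:\d\d/].split(":").map(&:to_i)
  h * 3600 + m * 60 + s
end

# Returns [from, to, gap_in_seconds] for every gap longer than threshold.
def long_gaps(lines, threshold = IDLE_TIMEOUT_SECS)
  stamps = lines.map { |l| [l[/\d\d:\d\d:\d\d/], seconds_of_day(l)] }
  stamps.each_cons(2).map { |(t1, s1), (t2, s2)|
    [t1, t2, s2 - s1] if s2 - s1 > threshold
  }.compact
end

long_gaps(LOG_LINES).each { |from, to, gap| puts "#{from} -> #{to}: #{gap}s idle" }
```

On this excerpt, two pauses exceed 10 minutes (01:40:04 to 02:03:01, and 02:04:56 to 02:22:57). Only the second is followed by "cursorid not found", so this is suggestive rather than conclusive. The bogus duration on the 01:40:04 line may also mean the server's clock misbehaved around then.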
Given that I know nothing is writing to the collection, and I don't care about query order, is there some better way to process every document in the collection than this?
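For comparison, one common alternative to a single long-lived cursor is to walk the collection in `_id` order with many short queries ("range" or keyset pagination). Each query asks for the next N documents whose `_id` is greater than the last one processed. Since nothing is writing to the collection, this should visit every document exactly once, and no single cursor has to survive hours of processing. This is a minimal sketch of that control flow only. `fetch_batch` is a hypothetical stand-in for the real driver query, and an in-memory array replaces the collection.

```ruby
# Keyset ("range") pagination sketch: walk documents in _id order using many
# short queries instead of one long-lived cursor.
# `fetch_batch` is hypothetical: with a real driver it would run a query like
# {"_id" => {"$gt" => last_id}}, sorted by _id ascending, limited to batch_size.
def each_in_batches(fetch_batch, batch_size)
  last_id = nil
  loop do
    batch = fetch_batch.call(last_id, batch_size)
    break if batch.empty?
    batch.each { |doc| yield doc }
    last_id = batch.last["_id"] # next query resumes strictly after this _id
  end
end

# In-memory stand-in for the collection (hypothetical data, integer _ids).
DOCS = (1..10).map { |i| { "_id" => i, "value" => i * i } }

fake_fetch = lambda do |after_id, limit|
  DOCS.select { |d| after_id.nil? || d["_id"] > after_id }
      .sort_by { |d| d["_id"] }
      .first(limit)
end

total = 0
each_in_batches(fake_fetch, 3) { |doc| total += doc["value"] }
puts total # 385, the sum of squares 1..10
```

Each batch query can use the default index on `_id`, so it stays cheap. If the script records the last `_id` it finished, it can also resume from that point after a crash instead of starting over.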
(The server is MongoDB 1.6.2 running on Ubuntu 10.04 on an Amazon EC2 instance, with the Ruby client v1.0.8.)
- Korny
--
Kornelis Sietsma korny at my surname dot com
kornys on twitter/fb/gtalk/gwave
www.sietsma.com/korny
"Every jumbled pile of person has a thinking part
that wonders what the part that isn't thinking
isn't thinking of"