Writing and reading BSON dumps


Brendan Loudermilk

Jan 16, 2015, 8:38:33 PM
to node-mong...@googlegroups.com
Hi all,

We're currently evaluating strategies for archiving data in our MongoDB cluster, which is growing quite large with stale data. We initially considered mongodump but found it insufficient: archiving data in a user-friendly way (e.g. anything older than the last 1000 messages in a chat thread) requires some slightly complex aggregations and many queries. I've spent the last few hours poking around node-mongodb-native, node-mongodb-core, and js-bson trying to come up with a solution. My current approach combines raw queries in node-mongodb-native with each(): write each document buffer to a file, then close the file once the cursor is exhausted. Here's what that looks like:

MongoClient.connect(config.source_mongodb.uri, function (err, db) {
  db.collection("groups", { raw: true }).find({ }, { limit: 10 }).each(function (err, doc) {
    if (doc) {
      stream.write(doc);
    } else {
      stream.end();
      db.close();
    }
  });
});


On the read side I'm trying to use deserializeStream() from js-bson, but I can't read back out of the file without getting a parse error:

var stream = fs.createReadStream("/Users/bloudermilk/Desktop/something.bson"),
    docs = new Array(10);

BSON.deserializeStream(stream, 0, 10, docs);

Throws:

/Users/bloudermilk/Projects/archiver/node_modules/bson/lib/bson/bson.js:1221
    if(size < 5 || size > buffer.length) throw new Error("corrupt bson message")
                                             ^
Error: corrupt bson message
    at Function.BSON.deserialize (/Users/bloudermilk/Projects/talkchain-archiver/node_modules/bson/lib/bson/bson.js:1221:46)
    at Function.BSON.deserializeStream (/Users/bloudermilk/Projects/talkchain-archiver/node_modules/bson/lib/bson/bson.js:1107:41)
    at Object.<anonymous> (/Users/bloudermilk/Projects/talkchain-archiver/read.js:9:6)
    at Module._compile (module.js:456:26)
    at Object.Module._extensions..js (module.js:474:10)
    at Module.load (module.js:356:32)
    at Function.Module._load (module.js:312:12)
    at Function.Module.runMain (module.js:497:10)
    at startup (node.js:119:16)
    at node.js:929:3

I guess my first question is, is this a sane way of archiving MongoDB data as BSON? If so, any idea why the seemingly simple code above isn't working? If not, any recommended solutions?

Thanks!

Tim Kuijsten

Jan 16, 2015, 9:28:01 PM
to node-mong...@googlegroups.com, bloud...@gmail.com
Hmm, I did some experiments with BSON.deserialize and everything worked out
quite nicely [1]. Not sure why you're getting those errors.

[1] see https://www.npmjs.com/package/bson-stream

-Tim

Brendan Loudermilk

Jan 16, 2015, 9:36:01 PM
to Tim Kuijsten, node-mong...@googlegroups.com
Ah, I just figured it out. Apparently deserializeStream() doesn't accept a read stream as I expected; it wants a buffer. Reading the full file into memory with fs.readFileSync() and passing that buffer worked fine, as did your (very helpful) utility. Thanks, Tim!
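
For anyone hitting the same error: a file produced by writing raw documents back to back is simply a concatenation of BSON documents, and each one starts with its own total size as a little-endian int32. The following is a minimal sketch of splitting such a buffer into individual documents, assuming the whole file fits in memory. The helper name splitBsonDocuments is hypothetical and is not part of js-bson.

```javascript
// Hypothetical helper (not part of js-bson): split a buffer holding
// back-to-back BSON documents into one Buffer per document.
function splitBsonDocuments(buffer) {
  const docs = [];
  let offset = 0;

  while (offset < buffer.length) {
    // Every BSON document begins with a 4-byte little-endian length
    // that counts the whole document, including the length itself.
    if (buffer.length - offset < 4) {
      throw new Error("truncated BSON length prefix at offset " + offset);
    }
    const size = buffer.readInt32LE(offset);

    // 5 bytes is the smallest valid document: the 4-byte length plus
    // the terminating 0x00 byte.
    if (size < 5 || offset + size > buffer.length) {
      throw new Error("corrupt BSON document at offset " + offset);
    }

    // subarray() returns a view on the same memory, so nothing is copied.
    docs.push(buffer.subarray(offset, offset + size));
    offset += size;
  }

  return docs;
}
```

Each returned slice holds exactly one document, so the result of splitBsonDocuments(fs.readFileSync(path)) could then be handed to BSON.deserialize() one element at a time.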


Brendan Loudermilk

@bloudermilk on GitHub and Twitter
