Yup, I wrote the original files with the original Java driver, but
IMHO that wasn't a bug in the Java driver.
If I check the GridFS chunks directly with the mongo shell, the old
chunks (created with the Java driver) return the following:
> db.fs.internalstorage.chunks.findOne();
{
"_id" : ObjectId("4eb14d448e197a2072344990"),
"files_id" : ObjectId("4eb14d448e197a2071344990"),
"n" : 0,
"data" : BinData(2,"<binary data>")
}
The chunks written with the mongofiles command-line utility return the
following:
> db.fs.chunks.findOne();
{
"_id" : ObjectId("4eb7119ff0b485ea49275160"),
"files_id" : ObjectId("4eb7119f88f77a50b478a630"),
"n" : 0,
"data" : BinData(0,"<binary data>")
}
The data in the new files (written by the mongofiles command-line
utility) is stored as BinData(0, ...) and the data in the old files (old
Java driver) is stored as BinData(2, ...). If you look at the BSON spec
( http://bsonspec.org/#/specification ) and search for "subtype", you
can see that subtypes 0 and 2 are just different ways to store binary
data: subtype 2 is the old generic way and subtype 0 is the new generic
way. The spec has a note on how the old way works (hover over the "i"
behind it): a 32-bit integer holding the length is stored in front of
the binary data, which is consistent with my results.
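
To make the difference concrete, here is a rough sketch of how the value of a binary element is laid out for both subtypes, going by my reading of the spec. The function name encode_binary_value is made up for this example and is not part of any driver; it only uses Python's standard library.

```python
import struct

def encode_binary_value(payload: bytes, subtype: int) -> bytes:
    """Encode the value of a BSON binary element: int32 length, subtype byte, bytes.

    Hypothetical helper for illustration only, not a driver API.
    """
    if subtype == 2:
        # Old binary: the byte array itself starts with a second
        # little-endian int32 holding the length of the actual data.
        payload = struct.pack("<i", len(payload)) + payload
    return struct.pack("<i", len(payload)) + bytes([subtype]) + payload

new_style = encode_binary_value(b"abc", 0)
old_style = encode_binary_value(b"abc", 2)
print(new_style.hex())  # 0300000000616263
print(old_style.hex())  # 070000000203000000616263
```

The subtype 2 value is 4 bytes longer, and those 4 bytes sit right in front of the real data, which is what a reader sees if it treats the byte array as plain data.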
I'm no expert on BSON or on how the BSON parser works, but it seems to
me that it should be possible to check the subtype in the returned
BSON, strip the 32-bit integer if it's subtype 2, and do nothing
otherwise.
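
Something along these lines is what I have in mind. It is only a sketch: unwrap_binary is a name I made up, and I'm assuming the parser hands over the subtype together with the raw bytes of the binary field, which may not match how the real parser is structured.

```python
import struct

OLD_BINARY_SUBTYPE = 2  # "Binary (Old)" in the BSON spec

def unwrap_binary(subtype: int, data: bytes) -> bytes:
    """Return the actual data, dropping the extra int32 that subtype 2 carries.

    Hypothetical helper: assumes `data` is the raw byte array of the
    binary element, exactly as stored.
    """
    if subtype != OLD_BINARY_SUBTYPE:
        return data  # subtype 0 (and others): nothing to strip
    if len(data) < 4:
        raise ValueError("subtype 2 data is too short to hold a length prefix")
    (inner_len,) = struct.unpack_from("<i", data)  # little-endian int32
    if inner_len != len(data) - 4:
        raise ValueError("subtype 2 length prefix does not match the data size")
    return data[4:]
```

The length check is there so that data which merely claims to be subtype 2 is rejected instead of silently losing its first 4 bytes.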