GridFS - problem with reading (binary?) data

Kim Gressens

Nov 18, 2011, 11:38:41 AM

to node-mongodb-native

Hi,

I'm not sure if anyone knows but after quite a few hours of debugging
I've noticed that node-mongodb-native does not support the old way in
which binary data was encoded in GridFS.

There are two possible ways MongoDB can store binary data in GridFS: one
with a subtype of 0 (which is the default) and one with a subtype of 2
(which is legacy). (documented at http://bsonspec.org/#/specification)
When I tested the driver with GridFS data that was stored as subtype 2,
the data that was returned consistently had four extra bytes in front
(0x00 0x00 0x40 0x00), which garbled many binary files.
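
For illustration only: the BSON spec says the payload of the old binary
subtype (2) starts with an int32 giving the number of bytes that follow,
and BSON integers are little-endian. Assuming the four bytes quoted above
are that length prefix, this small Node.js sketch shows how they would
decode. The variable names are made up for the example and this is not
driver code.

```javascript
// The four leading bytes exactly as quoted in the post above.
const prefix = Buffer.from([0x00, 0x00, 0x40, 0x00]);

// BSON stores int32 values little-endian, so read the prefix that way.
const declaredLength = prefix.readInt32LE(0);

console.log(declaredLength); // 4194304 (0x00400000)
```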

Just wanted to share my problems. Maybe somebody can write an easy
patch, and if not, at least this behaviour is documented here ;-)

Thanks for the great driver btw!

christkv

Nov 18, 2011, 12:54:42 PM

to node-mongodb-native

What version of the driver did you use originally? It will help me
figure out whether it's a bug in the driver you used or a
consistent issue.

Kim Gressens

Nov 18, 2011, 2:06:45 PM

to node-mongodb-native

The GridFS files were stored using version 2.3 of the official Java
driver. The Java driver's implementation changed in v2.6.
The changelog of the Java driver can be found at
https://github.com/mongodb/mongo-java-driver/wiki/Release-Notes (see
breaking changes in 2.6)

The driver for reading the GridFS files was mon...@0.9.7-0

Kim Gressens

Nov 18, 2011, 2:08:30 PM

to node-mongodb-native

Oh yes, and when I stored the file using the mongofiles command-line
utility, reading the same file worked perfectly for me. So it was
definitely the legacy subtype storage that garbled everything.

christkv

Nov 18, 2011, 2:22:06 PM

to node-mongodb-native

So did you write the original files using the Java driver? I just
need to understand whether it's possible to fix this without breaking
other files stored with the legacy type.
If my fix only works around a bug in the Java driver, it will break
other people's files written with other drivers that do it correctly.
In that case we need to find another solution that will not cause
incompatible behaviour.

Kim Gressens

Nov 18, 2011, 2:37:42 PM

to node-mongodb-native

Yup, I wrote the original files with the original Java driver, but it
wasn't a bug in the Java driver imho.

If I check the GridFS chunks directly with the mongo shell, the old
chunks (created with the Java driver) return the following:
> db.fs.internalstorage.chunks.findOne();
{
    "_id" : ObjectId("4eb14d448e197a2072344990"),
    "files_id" : ObjectId("4eb14d448e197a2071344990"),
    "n" : 0,
    "data" : BinData(2,"<binary data>")
}

The chunks written with the mongofiles command-line utility return the
following:
> db.fs.chunks.findOne();
{
    "_id" : ObjectId("4eb7119ff0b485ea49275160"),
    "files_id" : ObjectId("4eb7119f88f77a50b478a630"),
    "n" : 0,
    "data" : BinData(0,"<binary data>")
}

The new files (written by the mongofiles command-line utility) are
stored as BinData(0, ...) and the old files (old Java driver) as
BinData(2, ...). If you look at the bsonspec site
( http://bsonspec.org/#/specification ) and search for "subtype", you
can see that subtypes 0 and 2 are just different ways to store binary
data. Subtype 2 is the old generic way and subtype 0 is the new generic
way. There is some explanation of how the old way works (hover over the
"i" behind it), and it seems a 32-bit integer is added before the
binary data, which is consistent with my results.

I'm no expert on BSON or on how the BSON parser works, but to me it
seems that it should be possible to check the subtype in the returned
BSON, strip the 32-bit integer if it's subtype 2, and do nothing
otherwise.
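
A minimal sketch of that idea in Node.js, assuming the caller already has
the binary subtype and the raw payload as a Buffer. The helper name
stripLegacyLengthPrefix and the constant are hypothetical, not part of
node-mongodb-native. As an extra precaution the prefix is only stripped
when it matches the number of bytes that follow, so data that does not
follow the old layout is returned untouched.

```javascript
// BSON binary subtype for the old "Binary (Old)" encoding.
const BINARY_SUBTYPE_OLD = 2;

// Hypothetical helper: returns the payload without the legacy int32
// length prefix when the data really looks like a subtype 2 payload.
function stripLegacyLengthPrefix(subType, data) {
  // Subtype 0 (and anything else) is returned unchanged.
  if (subType !== BINARY_SUBTYPE_OLD) return data;
  // Too short to contain a 4-byte prefix: leave it alone.
  if (data.length < 4) return data;

  // The prefix is a little-endian int32 holding the inner length.
  const innerLength = data.readInt32LE(0);
  // Only strip when the prefix agrees with the remaining byte count.
  if (innerLength !== data.length - 4) return data;

  return data.subarray(4);
}

// Usage: build a fake subtype 2 payload (prefix + "hello") and strip it.
const payload = Buffer.from('hello');
const legacy = Buffer.alloc(4 + payload.length);
legacy.writeInt32LE(payload.length, 0);
payload.copy(legacy, 4);

console.log(stripLegacyLengthPrefix(2, legacy).toString()); // hello
console.log(stripLegacyLengthPrefix(0, legacy).length);     // 9
```

Checking the prefix against the actual length is one possible way to
avoid mangling files from writers that tagged data as subtype 2 without
the inner length, but whether such files exist is an open question in
this thread.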
