Indexing hash field, string vs. binary

weasel

Oct 4, 2010, 5:49:02 AM
to mongodb-user
Hi,

I have a collection of documents where each document has a field
called "hash", which is the SHA1 of another field (a URL). This hash
field is a 40-character hex string and has an index on it. When I
query the collection, I thought it would be better to have my
application generate the SHA1 hash and query on the hash instead of
the URL.

Currently I store the hash as a string, and I was wondering whether I
should store it as binary instead; would the index be more efficient?

I used to do the same thing with MySQL - e.g. for an MD5 hash field,
instead of using a CHAR(32) type, I used a BINARY(16) type and indexed
that field.

Should I do the same with my Mongo collection? I suppose I would have
to use MongoBinData to do the conversion from the string hash to the
binary form? Something like this maybe (PHP):

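$url = "http://www.example.com/";

// sha1($url, true) returns the raw 20-byte digest directly,
// so no separate hex-to-binary conversion is needed
$doc = array("URL" => $url, "hash" => new MongoBinData(sha1($url, true)));

$docs->save($doc);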

Eliot Horowitz

Oct 4, 2010, 6:03:47 AM
to mongod...@googlegroups.com
Yes, using binary would be a little more efficient, simply because it's
20 bytes instead of 40 for SHA1 (or 16 instead of 32 for MD5).
I think your PHP syntax is correct as well.
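
To make the size difference concrete, here is a minimal sketch in plain PHP (no MongoDB driver involved). The helper name hash_sizes is hypothetical and only for illustration; it compares the hex form of a SHA1 digest with its raw binary form, which is what would end up in the indexed field.

```php
<?php
// Hypothetical helper for illustration; not part of any MongoDB driver.
// Returns the byte length of the hex and raw forms of a URL's SHA1 digest.
function hash_sizes($url) {
    $hex = sha1($url);        // hexadecimal string representation
    $raw = sha1($url, true);  // raw binary digest (second argument = raw output)
    return array("hex" => strlen($hex), "raw" => strlen($raw));
}

$sizes = hash_sizes("http://www.example.com/");
echo "hex: {$sizes['hex']} bytes, raw: {$sizes['raw']} bytes\n";
// prints "hex: 40 bytes, raw: 20 bytes"
```

The raw form is what you would wrap in MongoBinData, and bin2hex() turns it back into the 40-character string if you need to display it. Queries then have to pass the binary form as well, since a hex string will not match a binary value.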

weasel

Oct 6, 2010, 4:58:50 AM
to mongodb-user
Thanks for the quick reply. I will use binary data then.