Using shorter ids (hashids) for mongodb objectids with mongoose


Ryan Schmidt

May 28, 2013, 8:10:30 AM
to nod...@googlegroups.com
I'm using mongodb with mongoose, and using normal mongodb objectids as my primary key. I like objectids but in URLs they're a bit long, and I'd like to use something shorter.

I've seen modules like shortid which generate shorter strings, but I'm not convinced of their uniqueness. I am convinced of the uniqueness of objectids and would like to find a way to just compress them down a bit. Then I found the module hashids, which is able to do just that.

Now the question is: how do I best incorporate hashids into my models? Do I somehow override each model's id property so that it's automatically converted into a hashid right after reading from the database and converted back to an objectid right before saving? Or do I make a virtual field "myid" or something? How have others handled this?

Alan Hoffmeister

May 28, 2013, 8:20:27 AM
to nodejs
Why not generate a small amount of letters and numbers and check its
existence against the database? If nothing is found you just got a
unique small "hash" and you can save it in any field of the document.
--
Att,
Alan Hoffmeister



Ryan Schmidt

May 28, 2013, 9:10:37 AM
to nod...@googlegroups.com
Because it seems like a good idea to *know* that an ID is unique, without having to possibly generate multiple IDs and query the database server in a loop to find one that happens to be unique.


Another module I found, cuid, explains it this way:

> Because of the nature of this problem, it's possible to build an app from the ground up and scale it to a million users before this problem rears its head. By the time you notice the problem (when your peak hour use requires dozens of ids to be created per ms), if your db doesn't have unique constraints on the id because you thought your guids were safe, you're in a world of hurt. Your users start to see data that doesn't belong to them because the db just returns the first ID match it finds.
>
> Alternatively, you've played it safe and you only let your database create ids. Writes only happen on a master database, and load is spread out over read replicas. But with this kind of strain, you have to start scaling your database writes horizontally, too, and suddenly your application starts to crawl (if the db is smart enough to guarantee unique ids between write hosts), or you start getting id collisions between different db hosts, so your write hosts don't agree about which ids represent which data.

https://github.com/dilvie/cuid

Mongodb objectids already have the same characteristics as cuids -- a timestamp; a machine id; a process id; and a counter with randomness:

http://docs.mongodb.org/manual/reference/object-id/

Since mongodb objectids contain a machine id, they can be generated on multiple database servers without collisions, so they don't suffer from the constraints the author of cuid describes above. And I don't at this time need the ability to create such IDs from the client; it's sufficient for me that they be created by the database server.


So again, the question is not "how do I generate short IDs that might not be unique that will cause me problems when they aren't". The question is "how do I take the unique IDs that my database has generated and automatically transform them so that I can keep the original ID in the database but use the transformed value in routes, views, etc. with a minimum of fuss."



George Snelling

May 29, 2013, 1:37:00 AM
to nod...@googlegroups.com
FWIW, we scratched our heads over the same problem, gave up, and wrote our own _id generator. It's a glorified timestamp with a big random seed after the milliseconds part, formatted to be read by humans and look reasonable in urls. Since the high-order part increases with time, it shards well. We found it much easier to simply check for a unique index violation error on insert and retry with a new key whenever that happens than to solve the problem you're trying to solve.

But if you do come up with a good solution I'd love to see it :)

-g   
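
The insert-and-retry approach described above could look roughly like the sketch below. It is not George's generator: `makeId`, `insertWithRetry`, and the in-memory `FakeCollection` are hypothetical stand-ins so the example runs without a database, and the insert is synchronous for brevity. The only MongoDB-specific detail assumed is that a unique index violation surfaces as an error with code 11000.

```js
const crypto = require('crypto');

// Hypothetical in-memory stand-in for a collection with a unique index on _id.
class FakeCollection {
  constructor() { this.docs = new Map(); }
  insert(doc) {
    if (this.docs.has(doc._id)) {
      const err = new Error('duplicate key');
      err.code = 11000; // MongoDB's duplicate-key error code
      throw err;
    }
    this.docs.set(doc._id, doc);
    return doc;
  }
}

// Hypothetical id generator: base-36 timestamp followed by random hex digits.
function makeId(randomBytes) {
  return Date.now().toString(36) + crypto.randomBytes(randomBytes).toString('hex');
}

// Try to insert; on a duplicate-key error, generate a fresh id and retry.
function insertWithRetry(collection, fields, generateId, maxAttempts) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return collection.insert(Object.assign({ _id: generateId() }, fields));
    } catch (err) {
      if (err.code !== 11000) throw err; // unrelated failure: give up
    }
  }
  throw new Error('no unique id after ' + maxAttempts + ' attempts');
}
```

A call such as `insertWithRetry(things, {title: 'x'}, function () { return makeId(3); }, 5)` would only loop when a collision actually happens, so the common case costs a single insert.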

Martin Wawrusch

May 29, 2013, 1:40:21 AM
to nod...@googlegroups.com
Why not simply use base56 encoding of the object id?


Stuart Bentley

May 29, 2013, 1:42:33 AM
to nod...@googlegroups.com
```js
modelSchema.virtual('hashid').get(function () {
  var oidhex = this._id.toHexString()
  return hashids.encrypt(parseInt(oidhex.slice(0,12),16),parseInt(oidhex.slice(12),16);
});

modelSchema.virtual('hashid').set(function (hashid) {
  var halves = hashids.decrypt(hashid);
  var zeroes = '000000000000';
  this._id = new ObjectID((zeroes+halves[0]).slice(-12)+(zeroes+halves[1]).slice(-12));
}
```

Note that this is untested code written in the Google Groups editor. I have no Mongoose or hashids experience - this is going exclusively from the documentation. Where the `hashids` object comes from is left as an exercise for the reader.

Stuart P. Bentley

May 29, 2013, 1:48:58 AM
to nod...@googlegroups.com
Case in point, here's a revised version with correctly balanced parentheses:

```js
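// Getter: encode the 24-hex-digit ObjectId as two 48-bit integers.
modelSchema.virtual('hashid').get(function () {
  var oidhex = this._id.toHexString();
  return hashids.encrypt(parseInt(oidhex.slice(0,12),16),parseInt(oidhex.slice(12),16));
});

// Setter: decode the two integers and rebuild the ObjectId. Each half is
// converted back to hex (not decimal) and left-padded to 12 digits.
modelSchema.virtual('hashid').set(function (hashid) {
  var halves = hashids.decrypt(hashid);
  var zeroes = '000000000000';
  this._id = new ObjectID((zeroes+halves[0].toString(16)).slice(-12)+(zeroes+halves[1].toString(16)).slice(-12));
});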
```
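
The virtuals above depend on each 12-hex-digit half of the objectid fitting exactly in a JavaScript number, and on the halves being turned back into hex when the id is rebuilt. A minimal sketch of just that arithmetic, with no Mongoose or hashids involved, might look like this; `splitObjectIdHex` and `joinObjectIdHex` are illustrative names, not part of either library.

```js
// Split a 24-hex-digit ObjectId string into two integers of 48 bits each.
// 48 bits stays below 2^53, so parseInt returns the exact value.
function splitObjectIdHex(oidhex) {
  return [parseInt(oidhex.slice(0, 12), 16), parseInt(oidhex.slice(12), 16)];
}

// Rebuild the hex string: convert each half back to hex (base 16, not the
// default decimal) and left-pad it to 12 digits.
function joinObjectIdHex(halves) {
  return halves
    .map(function (n) { return ('000000000000' + n.toString(16)).slice(-12); })
    .join('');
}
```

The left-padding matters whenever a half starts with zeros, since `toString(16)` drops them.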

Ryan Schmidt

May 30, 2013, 3:35:53 PM
to nod...@googlegroups.com
Thanks for your responses.


On May 29, 2013, at 00:40, Martin Wawrusch wrote:

> Why not simply use base56 encoding of the object id?

I haven't tried base56 (is there a module or built-in function you'd recommend for that?) but a mongodb objectid begins with 4 bytes of timestamp, which will be very similar for large periods of time, and then 3 bytes of machineid and 2 bytes of processid, which will be identical for large periods of time, so a base-anything encoding of such ids would tend to look quite similar. I'd like my user-facing ids to look "more random" than that.
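
To make the layout concrete, here is a small sketch that pulls those fields out of an objectid's hex string. It assumes the layout described above (4-byte timestamp, 3-byte machine id, 2-byte process id) with the remaining 3 bytes being the counter; `parseObjectIdHex` and the two sample ids are made up for illustration.

```js
// Decode the fields of a 24-hex-digit ObjectId string, assuming the layout
// 4-byte timestamp, 3-byte machine id, 2-byte process id, 3-byte counter.
function parseObjectIdHex(oidhex) {
  return {
    timestamp: parseInt(oidhex.slice(0, 8), 16),  // seconds since the Unix epoch
    machine: oidhex.slice(8, 14),
    pid: parseInt(oidhex.slice(14, 18), 16),
    counter: parseInt(oidhex.slice(18, 24), 16)
  };
}

// Two made-up ids created in the same second by the same process:
var a = parseObjectIdHex('51a4f2d6a1b2c30d4e00002a');
var b = parseObjectIdHex('51a4f2d6a1b2c30d4e00002b');
// Only the counter differs, so the first 18 hex digits are identical.
```

Any plain base conversion keeps that shared prefix visible, which is why such ids look alike.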


On May 29, 2013, at 00:48, Stuart P. Bentley wrote:

> ```js
> modelSchema.virtual('hashid').get(function () {
> var oidhex = this._id.toHexString();
> return hashids.encrypt(parseInt(oidhex.slice(0,12),16),parseInt(oidhex.slice(12),16));
> });
>
> modelSchema.virtual('hashid').set(function (hashid) {
> var halves = hashids.decrypt(hashid);
> var zeroes = '000000000000';
> this._id = new ObjectID((zeroes+halves[0].toString(16)).slice(-12)+(zeroes+halves[1].toString(16)).slice(-12));
> });
> ```

Thanks, this is along the lines I was originally thinking. I just have to train myself to set and get the "hashid" field instead of the "id" field. I'll use this for now. Since I may need a hashid on multiple models, I made a function to add the virtuals which I can call when defining each model.

I was hoping for actual real-world experience though. How do I find a database record with a hashid? To find by objectid, I just do:

Thing.findOne({_id: req.params.thingid}, function(err, thing) {...});

It seems like even if finding on a virtual field works, it would be slow, since the index would be on the id, not the hashid. And as it turns out I can't get it to work; there's no error, it just doesn't return any results. So instead I've done:

Thing.findOne({_id: fromHashId(req.params.thingid)}, function(err, thing) {...});

where fromHashId does the same conversion as your virtual('hashid').set() function: it turns a hashid back into an objectid.
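
Because the virtual is never stored, a lookup has to convert the incoming hashid and then query the indexed `_id` field. A minimal sketch of such a pair of helpers follows. The names `makeIdCodec`, `toHashId`, and `fromHashId` are illustrative, and `codec` is assumed to be any object with `encrypt(a, b)` and `decrypt(string)` methods shaped like the hashids calls used earlier in the thread. The result of `fromHashId` is a hex string, not yet an ObjectID instance.

```js
// Build hashid <-> ObjectId-hex converters around a hashids-like codec.
// `codec` is an assumption: any object with encrypt(a, b) and decrypt(str).
function makeIdCodec(codec) {
  var zeroes = '000000000000';
  return {
    // ObjectId hex string -> short id
    toHashId: function (oidhex) {
      return codec.encrypt(parseInt(oidhex.slice(0, 12), 16),
                           parseInt(oidhex.slice(12), 16));
    },
    // short id -> ObjectId hex string, or null if it does not decode
    fromHashId: function (hashid) {
      var halves = codec.decrypt(hashid);
      if (!halves || halves.length !== 2) return null;
      return (zeroes + halves[0].toString(16)).slice(-12) +
             (zeroes + halves[1].toString(16)).slice(-12);
    }
  };
}
```

Returning null for an id that does not decode lets the route answer with a 404 before touching the database.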


On May 29, 2013, at 00:37, George Snelling wrote:

> FWIW, we scratched our heads over the same problem, gave up, and wrote our own _id generator. It's a glorified timestamp with a big random seed after the milliseconds part, formatted to be read by humans and look reasonable in urls. Since the high-order part increases with time, it shards well. We found it much easier to simply check for a unique index violation error on insert and retry with a new key whenever that happens than to solve the problem you're trying to solve.

That's good to know, thanks. What are other people actually using for their short ids, regardless of backend storage system? Are you generating them yourself? How are you dealing with collisions? Has it been a problem?

I want the impossible! :) I want short ids that people posting urls to twitter will appreciate, but I don't want collisions or the overhead of verifying that there aren't any.

Alex Kocharin

May 31, 2013, 5:56:18 AM
to nod...@googlegroups.com

> I'd like my user-facing ids to look "more random" than that.

In other words, you want this:
4273efa0006b70
4273efa000b400
4273efa000f7e0
4273efa0013810

To look like this?:
8fe615a3dfd19b0c
0673b4d04ffa4b17
67126afba95997b4
292717c7064e6019


Well, there's a standard way of doing that. :)
------------------------------------------------------------------------
var crypto = require('crypto');
var password = Buffer.from('mysuperpuperpassword');
// Note: createCipher/createDecipher were deprecated in later Node.js releases.

// encoding
var date = Buffer.alloc(8);
date.writeDoubleBE(Date.now(), 0);
// Drop the last byte (always zero for a millisecond timestamp) so that the
// 7 bytes plus cipher padding fit in a single 8-byte block.
date = date.slice(0, 7);
var cipher = crypto.createCipher('CAST-cbc', password);
var result = cipher.update(date, null, 'hex') + cipher.final('hex');
console.log('encoded thingy:', result); // 16 hex digits

// decoding
var decipher = crypto.createDecipher('CAST-cbc', password);
var final = decipher.update(result, 'hex', 'hex') + decipher.final('hex');
console.log('decoded thingy:', final);
// Restore the dropped zero byte before reading the double back.
console.log('original date:', new Date(Buffer.from(final+'00', 'hex').readDoubleBE(0)));
------------------------------------------------------------------------
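
The same idea applied to a full 12-byte objectid could be sketched as follows on Node.js versions where `createCipher` is no longer usable. This swaps in a different cipher from the code above: AES-128 in ECB mode on exactly one block, with the key derived by `crypto.scryptSync`. The salt, and the names `scramble` and `unscramble`, are made up for the example.

```js
const crypto = require('crypto');

// Derive a 16-byte AES key from a passphrase. The salt is a fixed, made-up
// value so the same passphrase always yields the same key.
const key = crypto.scryptSync('mysuperpuperpassword', 'example-salt', 16);

// Encrypt a 12-byte id (24 hex digits) as exactly one 16-byte AES block.
function scramble(oidhex) {
  const block = Buffer.concat([Buffer.from(oidhex, 'hex'), Buffer.alloc(4)]);
  const cipher = crypto.createCipheriv('aes-128-ecb', key, null);
  cipher.setAutoPadding(false); // input is already one full block
  return Buffer.concat([cipher.update(block), cipher.final()]).toString('hex');
}

// Reverse of scramble(): decrypt the block and drop the 4 zero bytes.
function unscramble(hex) {
  const decipher = crypto.createDecipheriv('aes-128-ecb', key, null);
  decipher.setAutoPadding(false);
  const block = Buffer.concat([decipher.update(Buffer.from(hex, 'hex')), decipher.final()]);
  return block.slice(0, 12).toString('hex');
}
```

The mapping is deliberately deterministic, since the same objectid must always produce the same public id. The price is length: the output is always 32 hex digits, longer than the objectid it hides.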

Ryan Schmidt

May 31, 2013, 12:40:34 PM
to nod...@googlegroups.com

On May 31, 2013, at 04:56, Alex Kocharin wrote:

>> I'd like my user-facing ids to look "more random" than that.
>
> In other words, you want this:
> 4273efa0006b70
> 4273efa000b400
> 4273efa000f7e0
> 4273efa0013810
>
> To look like this?:
> 8fe615a3dfd19b0c
> 0673b4d04ffa4b17
> 67126afba95997b4
> 292717c7064e6019
>
>
> Well, there's a standard way of doing that. :)

Ok, thanks for letting me know about crypto. So you would suggest objectid -> crypto -> base56? If so, what's a good module for doing base56?

Alex Kocharin

May 31, 2013, 5:01:44 PM
to nod...@googlegroups.com

If you want small changes in the input to affect all bits in the output, then yes, that's what good ciphers are doing. If you just want users to clearly distinguish one value from another, that will do fine. I just hope you aren't going to use a 64-bit cipher to ensure unpredictability of ids...

I don't know what base56 is... But you can just use base64. YouTube uses URL-safe base64, which replaces the last two characters of the alphabet ("+" and "/") with "-" and "_".

But anyway... a mongodb id is 12 bytes, which is 16 characters when base64-encoded. That's too long, and I'd very much like to see a solution to create shorter or more user-friendly ids.
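
The length arithmetic is easy to check: 12 bytes is 96 bits, and base64 carries 6 bits per character, so the result is 16 characters with no padding. A minimal sketch using only Node's built-in Buffer follows; `toBase64Url` and `fromBase64Url` are illustrative names.

```js
// Encode a 24-hex-digit ObjectId string as 16 URL-safe base64 characters.
function toBase64Url(oidhex) {
  return Buffer.from(oidhex, 'hex').toString('base64')
    .replace(/\+/g, '-')
    .replace(/\//g, '_');
}

// Decode the URL-safe form back to the 24-hex-digit string.
function fromBase64Url(str) {
  var b64 = str.replace(/-/g, '+').replace(/_/g, '/');
  return Buffer.from(b64, 'base64').toString('hex');
}
```

That saves 8 characters over the 24-digit hex form, but the result still shares a long prefix between ids created close together.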

Did you think about assigning an auto-incrementing number to a message? Or a string like "user_number"? Or whatever... mongodb _ids are very much necessary, but they don't always need to be exposed to the user.

Ryan Schmidt

Jun 2, 2013, 7:49:56 PM
to nod...@googlegroups.com

On May 31, 2013, at 16:01, Alex Kocharin wrote:

> If you want small changes in the input to affect all bits in the output, then yes, that's what good ciphers are doing. If you just want users to clearly distinguish one value from another, that will do fine. I just hope you aren't going to use a 64-bit cipher to ensure unpredictability of ids…

I don't care if someone guesses an id; the resources would be public, like youtube videos or shortened urls. And for any ids that aren't supposed to be public, they'd be properly protected against unauthorized access.


> I don't know what base56 is... But you can just use base64. YouTube uses URL-safe base64, which replaces the last two characters of the alphabet ("+" and "/") with "-" and "_".
>
> But anyway... a mongodb id is 12 bytes, which is 16 characters when base64-encoded. That's too long, and I'd very much like to see a solution to create shorter or more user-friendly ids.
>
> Did you think about assigning an auto-incrementing number to a message? Or a string like "user_number"? Or whatever... mongodb _ids are very much necessary, but they don't always need to be exposed to the user.

I have briefly considered that. And using hashids with incrementing integers does produce pleasing short random-looking strings.

But mongodb doesn't have an autoincrement feature. They have documentation explaining how to fake it, and why it's problematic:

http://docs.mongodb.org/manual/tutorial/create-an-auto-incrementing-field/

Ivan Akimov

Aug 28, 2013, 6:47:07 PM
to nod...@googlegroups.com
Hi,

Just wanted to add that MongoDB/hex support was added in Hashids version 0.3.0 (https://npmjs.org/package/hashids) through the encryptHex() and decryptHex() functions.
Disclosure: I wrote it.

Cheers,
Ivan