Custom ID serialization


Javier

Jan 10, 2011, 11:15:37 AM
to mongodb-user
Hi,

My clients are generating 14-byte custom IDs as follows:

timestamp (4 bytes)
user (4 bytes)
machine (3 bytes)
increment (3 bytes)

How can I store my IDs in Mongo while keeping the ascending order for
faster inserts? I can think of serializing my ID as binary data, but
would that preserve the ascending order?

Thanks

Nat

Jan 10, 2011, 11:24:56 AM
to mongodb-user
You just need to make sure you serialize them in big-endian order, i.e.

byte[0..3] = timestamp
byte[4..7] = user
byte[8..10] = machine
byte[11..13] = increment
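
A minimal sketch of that layout in Python, assuming all four fields are unsigned integers. The function name `pack_custom_id` and the sample values are made up for illustration; they are not from any driver.

```python
import struct

def pack_custom_id(timestamp: int, user: int, machine: int, increment: int) -> bytes:
    """Pack the four fields into 14 bytes, most significant byte first."""
    if not 0 <= machine < 1 << 24 or not 0 <= increment < 1 << 24:
        raise ValueError("machine and increment must each fit in 3 bytes")
    return (
        struct.pack(">II", timestamp, user)   # bytes 0..7: timestamp, user
        + machine.to_bytes(3, "big")          # bytes 8..10: machine
        + increment.to_bytes(3, "big")        # bytes 11..13: increment
    )

# Hypothetical sample values: the second ID is created one second later.
older = pack_custom_id(1294676137, 42, 7, 1)
newer = pack_custom_id(1294676138, 1, 0, 0)
print(len(older), older < newer)
```

Because the timestamp occupies the leading bytes and is written most significant byte first, a plain byte-by-byte comparison of two IDs agrees with their creation order.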

Javier

Jan 10, 2011, 11:30:19 AM
to mongodb-user
But what data type should I use? binary data?

Andreas Jung

Jan 10, 2011, 11:33:14 AM
to mongod...@googlegroups.com

You may check the Python code for generating an ObjectID from Python:

def __generate(self):
    """Generate a new value for this ObjectId.
    """
    oid = ""

    # 4 bytes current time
    oid += struct.pack(">i", int(time.time()))

    # 3 bytes machine
    oid += ObjectId._machine_bytes

    # 2 bytes pid
    oid += struct.pack(">H", os.getpid() % 0xFFFF)

    # 3 bytes inc
    ObjectId._inc_lock.acquire()
    oid += struct.pack(">i", ObjectId._inc)[1:4]
    ObjectId._inc = (ObjectId._inc + 1) % 0xFFFFFF
    ObjectId._inc_lock.release()

    self.__id = oid


...likely doable with any other language.

-aj


--
ZOPYX Limited | zopyx group
Charlottenstr. 37/1 | The full-service network for Zope & Plone
D-72070 Tübingen | Produce & Publish
www.zopyx.com | www.produce-and-publish.com
------------------------------------------------------------------------
E-Publishing, Python, Zope & Plone development, Consulting



Scott Hernandez

Jan 10, 2011, 11:33:47 AM
to mongod...@googlegroups.com
Yes.


Javier

Jan 10, 2011, 12:22:04 PM
to mongodb-user
Great answers. Thanks to all

On Jan 10, 17:33, Scott Hernandez <scotthernan...@gmail.com> wrote:
> Yes.

Javier

Jan 10, 2011, 3:50:27 PM
to mongodb-user
Hi again,

I'm still struggling to optimize my custom IDs. Mongo
uses BSON for network transfer and data storage, and according to the
BSON spec binary data is represented as follows:

int32 subtype (byte*)

where int32 is a 4-byte integer holding the length of the binary data.
This raises some questions for me:

- My 14-byte IDs will be stored with an overhead of 4+1 bytes. How
much can this overhead affect performance? Does Mongo's ObjectId add
the same overhead, or does it avoid it because the subtype and data
length are fixed?
- My IDs are serialized to mongod and stored in an unordered way
(the first bytes are the data length). Does this affect the loading
speed of the b-tree for the _id index when doing inserts?
- In order to send my IDs across the wire, I have seen that Mongo
ObjectIds are converted to hex strings with a 100% overhead (12 to 24
bytes). Is there a reason not to use base64, which adds only 33%
overhead?

Thanks

Scott Hernandez

Jan 10, 2011, 4:33:07 PM
to mongod...@googlegroups.com
On Mon, Jan 10, 2011 at 12:50 PM, Javier <javierf...@gmail.com> wrote:
> Hi again,
>
> I'm still struggling to optimize my custom IDs. Mongo
> uses BSON for network transfer and data storage, and according to the
> BSON spec binary data is represented as follows:
>
> int32 subtype (byte*)
>
> where int32 is a 4-byte integer holding the length of the binary data.
> This raises some questions for me:
>
> - My 14-byte IDs will be stored with an overhead of 4+1 bytes. How
> much can this overhead affect performance? Does Mongo's ObjectId add
> the same overhead, or does it avoid it because the subtype and data
> length are fixed?

An ObjectId is a BSON type of its own and has only the single byte of overhead for the type.

> - My IDs are serialized to mongod and stored in an unordered way
> (the first bytes are the data length). Does this affect the loading
> speed of the b-tree for the _id index when doing inserts?

All indexes include the full value, which in this case includes the full
5-byte binary header (4-byte length plus 1-byte subtype).
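
A rough sketch of that binary layout, assuming the generic binary subtype 0x00 and leaving out the element's type byte and field name, which every BSON element carries. `bson_binary_value` is a hypothetical helper, not a driver function.

```python
import struct

BINARY_GENERIC = 0x00  # BSON "generic binary" subtype

def bson_binary_value(payload: bytes, subtype: int = BINARY_GENERIC) -> bytes:
    """Encode the value part of a BSON binary element.

    Layout per the BSON spec: int32 length (little-endian), one subtype
    byte, then the payload bytes unchanged.
    """
    return struct.pack("<i", len(payload)) + bytes([subtype]) + payload

custom_id = bytes(range(14))           # stand-in for a 14-byte custom ID
encoded = bson_binary_value(custom_id)

header_size = len(encoded) - len(custom_id)
print(header_size)    # 5: four length bytes plus one subtype byte
print(len(encoded))   # 19 bytes, versus 12 for an ObjectId value
```

The header is the same for every 14-byte ID with the same subtype, so it adds size but does not change how two such IDs compare with each other.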

> - In order to send my IDs across the wire, I have seen that Mongo
> ObjectIds are converted to hex strings with a 100% overhead (12 to 24
> bytes). Is there a reason not to use base64, which adds only 33%
> overhead?

It is not hex-encoded on the wire; that is just how some systems print it. It
is sent across as just 12 bytes, as the spec defines.
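
For the text encodings themselves, the sizes in the question can be checked with a short sketch. The 12 bytes here are a stand-in value, not a real ObjectId.

```python
import base64
import binascii

object_id_bytes = bytes(range(12))   # stand-in for a 12-byte ObjectId

hex_text = binascii.hexlify(object_id_bytes).decode("ascii")
b64_text = base64.b64encode(object_id_bytes).decode("ascii")

print(len(object_id_bytes), len(hex_text), len(b64_text))  # 12 24 16

# Hex text sorts the same way as the raw bytes; standard base64 does not,
# because its alphabet is not in ASCII order.
low, high = b"\x00", b"\xfc"
hex_keeps_order = binascii.hexlify(low) < binascii.hexlify(high)
b64_keeps_order = base64.b64encode(low) < base64.b64encode(high)
print(hex_keeps_order, b64_keeps_order)
```

So base64 is smaller as text, but the hex form keeps string comparisons consistent with byte order, which may matter if the text form is ever sorted or compared.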

Javier

Jan 10, 2011, 4:54:05 PM
to mongodb-user
Thanks for your answers Scott.

One of the recommendations from Mongo is to create IDs in ascending
order for faster inserts. From your answer I understand that even if I
do so, when serializing them as binary data that order will be lost
and inserts will be slower than with ObjectId. Is this correct?

In my last point I was referring to the way Mongo drivers serialize
ObjectIds to JSON.

On Jan 10, 22:33, Scott Hernandez <scotthernan...@gmail.com> wrote:

Scott Hernandez

Jan 10, 2011, 5:14:10 PM
to mongod...@googlegroups.com
On Mon, Jan 10, 2011 at 1:54 PM, Javier <javierf...@gmail.com> wrote:
> Thanks for your answers Scott.
>
> One of the recommendations from Mongo is to create IDs in ascending
> order for faster inserts. From your answer I understand that even if I
> do so, when serializing them as binary data that order will be lost
> and inserts will be slower than with ObjectId. Is this correct?

When you create a binary field it is stored in byte order, exactly as you
pack the bytes (which is why Robert suggested packing them big-endian).
What you create will be stored directly in the document and index. It
will only be slower in the respect that it is an extra 6 bytes that
must be matched during lookups in the index. Really, it should be
almost unnoticeable, since it is rarely CPU-bound.

The ObjectId format and your format both start with a timestamp, and
therefore both ascend in order over time (of creation). Right?
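
A small sketch of why the byte order of the leading timestamp matters, using made-up consecutive timestamps. As far as I understand MongoDB's documented sort order, binary values are compared by length first, then subtype, then byte by byte, so for fixed-length IDs with one subtype only the byte comparison decides.

```python
import struct

# 1000 consecutive timestamps (seconds), already in creation order.
timestamps = list(range(1294676000, 1294677000))

big_endian = [struct.pack(">I", t) for t in timestamps]
little_endian = [struct.pack("<I", t) for t in timestamps]

# Byte-wise sorting keeps creation order only for the big-endian encoding.
big_endian_sorted = sorted(big_endian) == big_endian
little_endian_sorted = sorted(little_endian) == little_endian
print(big_endian_sorted, little_endian_sorted)  # True False
```

With the little-endian packing the fastest-changing byte comes first, so new keys land all over the index instead of at its right-hand edge.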

> In my last point I was referring to the way Mongo drivers serialize
> ObjectIds to JSON.

Yes, don't serialize to json :)

Javier

Jan 10, 2011, 6:51:30 PM
to mongodb-user
Understood. Thanks for being so helpful

On Jan 10, 23:14, Scott Hernandez <scotthernan...@gmail.com> wrote: