[mongodb-user] Possible collision in ObjectID generation


Matic

May 18, 2010, 5:50:31 PM
to mongodb-user
Hey!

Where does the MongoDB driver in PHP get the machine ID part of the
ObjectID from? Would PHP workers running on the same physical machine
get the same machine ID part? I'm using OpenVZ, which is an OS-level
"virtualization" with zero performance overhead. OpenVZ simply
partitions (like partitions on HDDs) the host operating system into
several parts, each with its own isolated processes (and PIDs).
Processes of all the running VPSs can be seen from the host system.
All the PIDs are unique when viewed from the host system, but some PIDs
of PHP processes can be identical if you're running several VPSs and
looking from inside a VPS (which is what the Mongo driver does). That's
because each process has 2 PIDs: 1 real (visible to the host only) and
1 virtual (visible only from inside that VPS). The same virtual PID can
be shared between several processes, as long as they're in different
VPSs.

So let's say we're in the same 1-second time window, on the same
machine, and have a PID collision between 2 PHP workers, each running
in a separate VPS. I presume the 3-byte counter starts from the same
position every time. If I'm correct, this would result in the same
ObjectID being generated by both PHP workers, causing a collision.
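
To make the scenario concrete, here is a minimal Python sketch of the classic 12-byte ObjectID layout (4-byte timestamp, 3-byte machine field, 2-byte PID, 3-byte counter). The helper name `make_object_id` and the MD5-based machine field are assumptions for illustration, not the PHP driver's actual code. The only point is that two workers feeding in identical values end up with identical IDs.

```python
import hashlib
import struct


def make_object_id(timestamp, hostname, pid, counter):
    """Pack the four ObjectID fields into 12 bytes and return them as hex.

    Illustrative only: the machine field here is the first 3 bytes of an
    MD5 of the hostname, which is an assumption, not the driver's real hash.
    """
    machine = hashlib.md5(hostname.encode("utf-8")).digest()[:3]
    raw = (
        struct.pack(">I", timestamp)                 # 4-byte big-endian seconds
        + machine                                    # 3-byte machine field
        + struct.pack(">H", pid & 0xFFFF)            # 2-byte process ID
        + struct.pack(">I", counter & 0xFFFFFF)[1:]  # 3-byte counter
    )
    return raw.hex()


# Two workers in different VPSs: same second, same resolved name,
# same virtual PID, counter starting from the same value.
worker_a = make_object_id(1274219431, "localhost", 4242, 0)
worker_b = make_object_id(1274219431, "localhost", 4242, 0)
print(worker_a == worker_b)  # True: the IDs collide
```

Changing any one of the four inputs (for example, a per-VPS hostname feeding the machine field) is what would break the tie.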

The only solution I see to this problem (while still using ObjectID) is
to run insert() with safe mode enabled and, if a collision is detected,
generate another ObjectID (which will increase the counter) and try
inserting again.

Happy day,
Matic

--
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To post to this group, send email to mongod...@googlegroups.com.
To unsubscribe from this group, send email to mongodb-user...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=en.

Kristina Chodorow

May 18, 2010, 5:59:41 PM
to mongod...@googlegroups.com
> Where does the MongoDB driver in PHP get the machine ID part of the ObjectID from?

It uses gethostbyname("localhost") and then hashes it.  So, if you can assign the different VPSs different hostnames (can you?), you should be fine, or you could do as you suggested with the safe insert.

Matic

May 18, 2010, 6:06:54 PM
to mongodb-user
Yes, you can change the hostname inside a VPS.
But running this in PHP:

echo gethostbyname('localhost');

returns 127.0.0.1, which will be the same for all VPSs.


Kristina Chodorow

May 18, 2010, 6:15:18 PM
to mongod...@googlegroups.com
Sorry, I meant the C gethostbyname() function, not the PHP one. It returns a hostent struct with an h_name field, described as:

> The official name of the host (PC). If using the DNS or similar resolution system, it is the Fully Qualified Domain Name (FQDN) that caused the server to return a reply. If using a local hosts file, it is the first entry after the IPv4 address.

...which is what is being hashed. (source: http://msdn.microsoft.com/en-us/library/ms738552(v=VS.85).aspx)

Matic

May 18, 2010, 6:58:22 PM
to mongodb-user
So will it read /etc/hosts or /etc/hostname?

vps-001:~# cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
# Auto-generated hostname. Please do not remove this comment.
10.1.10.1 vps-001

vps-001:~# cat /etc/hostname
vps-001

What I'm saying is that "localhost" will return/resolve to 127.0.0.1
on every machine. I presume if your function returned the hostname (as
defined in /etc/hostname), it wouldn't require any parameters, but it
does in your example ('localhost'). If you want a C function that
returns the hostname, check http://linux.die.net/man/2/gethostname .
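
The difference between the two lookups can be checked quickly from Python, whose `socket` module wraps the same kind of C calls. This is only a small sketch: the function names `resolved_name` and `own_hostname` are made up for illustration, and what they return depends on each machine's resolver configuration.

```python
import socket


def resolved_name(query="localhost"):
    """Return the canonical name from a gethostbyname()-style lookup of query.

    Returns None if the name cannot be resolved on this machine.
    """
    try:
        name, _aliases, _addresses = socket.gethostbyname_ex(query)
        return name
    except OSError:
        return None


def own_hostname():
    """Return this machine's own hostname, as gethostname() reports it."""
    return socket.gethostname()


if __name__ == "__main__":
    # On a typical setup the first value is the same on every VPS,
    # while the second follows each VPS's configured hostname.
    print("lookup of 'localhost':", resolved_name())
    print("gethostname():        ", own_hostname())
```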


Лоик

May 18, 2010, 7:17:50 PM
to mongod...@googlegroups.com
It should read /etc/hostname, since hosts is only supposed to be a list of hosts you defined manually.

In other words, if each VPS has its own different /etc/hostname, it should work.

Suno Ano

May 19, 2010, 6:00:19 AM
to mongod...@googlegroups.com

[skipping a lot of lines ...]

Лоик> In other words, if each VPS has its own different /etc/hostname,
Лоик> it should work.

Yes, OpenVZ works like this: every VE (Virtual Environment) can have
its own unique hostname:
http://sunoano.name/ws/openvz.html#ve_with_static_ipv4_address

Aside from this, even without walking through the code that creates
ObjectIDs, I would assume it is based on an algorithm that should be
able to create unique IDs even if the hostnames were the same.

Johannes Reichardt

May 19, 2010, 6:54:18 AM
to mongod...@googlegroups.com
I don't want to hijack this thread, but I also wonder why the MongoId is
an object. In the end it's just a simple string, but it adds some
programming overhead to account for the possibility of a Mongo object...

Tim Hawkins

May 19, 2010, 6:58:57 AM
to mongod...@googlegroups.com
Because it encapsulates methods that extract the creation timestamp from the ID.
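
As a rough illustration of what such a method does: the first 4 bytes of an ObjectID are a big-endian Unix timestamp, so the creation time can be read straight out of the hex form. The function name `creation_time` and the sample ID below are hypothetical.

```python
import struct
from datetime import datetime, timezone


def creation_time(object_id_hex):
    """Return the creation time encoded in a 24-character hex ObjectID."""
    raw = bytes.fromhex(object_id_hex)
    if len(raw) != 12:
        raise ValueError("an ObjectID is exactly 12 bytes")
    (seconds,) = struct.unpack(">I", raw[:4])  # big-endian Unix timestamp
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# Hypothetical ID: only the first four bytes (the timestamp) matter here.
print(creation_time("4bf2f5a7" + "00" * 8).isoformat())
```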

Kristina Chodorow

May 19, 2010, 8:25:43 AM
to mongod...@googlegroups.com
Also, a string would take up 29 bytes of storage.  A MongoId takes up 12.
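
Those figures are consistent with how BSON encodes values: a string value is stored as a 4-byte length, the UTF-8 bytes, and a trailing null byte, whereas an ObjectID value is just the 12 raw bytes. A quick Python check of the arithmetic (the function name and sample ID are only for illustration):

```python
def bson_string_value_size(text):
    """Size in bytes of a BSON string value: int32 length + UTF-8 bytes + NUL."""
    return 4 + len(text.encode("utf-8")) + 1


OBJECT_ID_SIZE = 12                 # raw bytes in an ObjectID value
hex_form = "4bf2f5a7" + "00" * 8    # 24 hex characters for those 12 bytes

print(bson_string_value_size(hex_form))  # 29
print(OBJECT_ID_SIZE)                    # 12
```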

Matic

May 19, 2010, 9:28:43 AM
to mongodb-user
Kristina, please look at this:
http://stackoverflow.com/questions/2865583/gethostbyname-in-c

I compiled one of those code examples and I do indeed get "localhost" as
the result, which means the MongoDB driver does as well. People replied
that gethostbyname() is not the best way to determine the hostname;
they recommend using gethostname() instead.

When the "hostname" is hashed, is the entire string hashed or only a
part of it? I'm asking because when you have a server farm, hostnames
are very dull, like vps-phpworker-001, vps-phpworker-002,
vps-phpworker-003, ... If only the first 8 characters were hashed,
you'd get the same result for every hostname mentioned, because the
uniqueness comes from the tail of the hostname string.

Happy day,
Matic


Kristina Chodorow

May 19, 2010, 9:52:24 AM
to mongod...@googlegroups.com
I'll have to read up on it a bit (make sure it's equivalent on other platforms), but I agree, changing it to gethostname() seems like a good idea.


> When the "hostname" is hashed, is the entire string hashed or only a
> part of it? I'm asking because when you have a server farm, hostnames
> are very dull like vps-phpworker-001, vps-phpworker-002,
> vps-phpworker-003, ... If only the first 8 characters would be hashed,
> you'd get the same result for every mentioned hostname because
> uniqueness comes from the tail part of the hostname string.
 
No worries, the entire string is hashed.
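
The concern and the answer can be illustrated with a short Python sketch. MD5 is used here purely as a stand-in hash, and `machine_field` is a hypothetical helper; the driver's own hash function may well differ. The point is only the contrast between hashing a fixed-length prefix and hashing the whole hostname.

```python
import hashlib


def machine_field(hostname, prefix_len=None):
    """Return a 3-byte machine field derived from hostname.

    MD5 is a stand-in hash for illustration; the driver's own hash may differ.
    If prefix_len is given, only that many leading characters are hashed.
    """
    text = hostname if prefix_len is None else hostname[:prefix_len]
    return hashlib.md5(text.encode("utf-8")).digest()[:3]


hosts = ["vps-phpworker-001", "vps-phpworker-002", "vps-phpworker-003"]

# Hashing only the first 8 characters: every host hashes "vps-phpw",
# so this set contains a single value.
print({machine_field(h, prefix_len=8).hex() for h in hosts})

# Hashing the entire string: the differing suffix feeds into the hash.
print({machine_field(h).hex() for h in hosts})
```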