Convert array of timestamp to MongoDB ObjectId using PHP

hrishikesh bidkar

Nov 14, 2019, 3:43:17 AM
to mongodb-user

I am using a third-party API to get values from MongoDB using PHP. The API returns each Mongo ObjectId as an array of its component fields, as shown below. How can I convert it back to the original MongoDB ObjectId?


Array(
        [timestamp] => 1573559942
        [machineIdentifier] => some value
        [processIdentifier] => some Value
        [counter] => 8306872
        [date] => 2019-11-12T11:59:02.000+0000
        [time] => 1573559942000
        [timeSecond] => 1573559942
)

Tim Hawkins

Nov 14, 2019, 4:03:44 AM
to mongodb-user
Try the following, derived from https://stackoverflow.com/questions/14370143/create-mongodb-objectid-from-date-in-the-past-using-php-driver

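```
<?php
// Rebuilds a legacy MongoId from its decomposed fields.
// Assumes machineIdentifier, processIdentifier and counter are integers
// occupying 3, 2 and 3 bytes respectively.
function createObjectId($arr)
{
    // 4-byte timestamp, big-endian.
    $bin = pack('N', $arr['timestamp']);

    // 3-byte machine identifier and 2-byte process identifier.
    // Big-endian is an assumption here; if the result does not match,
    // try substr(pack('V', ...), 0, 3) and pack('v', ...) for little-endian.
    $bin .= substr(pack('N', $arr['machineIdentifier']), 1, 3);
    $bin .= pack('n', $arr['processIdentifier']);

    // 3-byte counter, big-endian: pack as 32 bits and drop the high byte.
    $bin .= substr(pack('N', $arr['counter']), 1, 3);

    // Hex-encode the 12 bytes into the 24-character string MongoId expects.
    return new MongoId(bin2hex($bin));
}
?>
```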

--
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.
 
For other MongoDB technical support options, see: https://docs.mongodb.com/manual/support/
---
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/mongodb-user/8a9e0f78-6483-46e4-a7f7-e0c690859283%40googlegroups.com.

hrishikesh bidkar

Nov 14, 2019, 5:06:27 AM
to mongodb-user
It converts it to an ID, but the result does not match the original ID in MongoDB.

Jeremy Mikola

Nov 14, 2019, 11:47:51 AM
to mongod...@googlegroups.com
I should point out that Derick's answer above was from 2013, back when many drivers were using a machine and process identifier for the middle five bytes of an ObjectId. Some drivers have always used random values for those fields, and the drivers team collectively decided to standardize on that behavior (i.e. random values) in the ObjectId specification published earlier this year. The MongoDB manual's ObjectId reference documents those five bytes as random data and this is also why you won't find accessor methods for a machine and process identifier in the PHP driver's MongoDB\BSON\ObjectId class, unlike the legacy PHP driver's MongoId class.
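To make the layout concrete, here is a minimal sketch that splits a 24-character hexadecimal ObjectId into its 4-byte timestamp, the five middle bytes, and the 3-byte counter. The helper name `decomposeObjectId()` is hypothetical (it is not part of either PHP driver), and the sample ID reuses the timestamp and counter from the original post with made-up middle bytes.

```php
<?php
// Hypothetical helper: splits a hex ObjectId into timestamp, middle bytes and counter.
function decomposeObjectId(string $hex): array
{
    $bin = hex2bin($hex);
    if ($bin === false || strlen($bin) !== 12) {
        throw new InvalidArgumentException('Expected 24 hexadecimal characters');
    }

    return [
        // Bytes 0-3: seconds since the Unix epoch, big-endian.
        'timestamp' => unpack('N', substr($bin, 0, 4))[1],
        // Bytes 4-8: machine + process identifier in older drivers, random data today.
        'middle' => bin2hex(substr($bin, 4, 5)),
        // Bytes 9-11: big-endian counter, padded to 32 bits so unpack('N') can read it.
        'counter' => unpack('N', "\x00" . substr($bin, 9, 3))[1],
    ];
}

// Sample ID: timestamp and counter from the original post, made-up middle bytes.
$parts = decomposeObjectId('5dca9e8601020304057ec0b8');
```

Here `$parts['timestamp']` is 1573559942 and `$parts['counter']` is 8306872, matching the values in the original post.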

Having said that, the code in Derick's answer above was also demonstrating how to reconstruct an ObjectId from just a timestamp, and he was creating the machine and process identifiers himself. In the OP's case here, we have no idea what "machineIdentifier" and "processIdentifier" are. They are only reported as "some value". The main challenge here will be ensuring that both of those values are

If you're using PHP, the pack() function will certainly be helpful to ensure that the ObjectId components are encoded with the proper endian-ness. Note that the timestamp and counter components must be big-endian. Historically, I believe the server used little-endian for the machine and process identifiers (see: https://stackoverflow.com/q/23539486), so you'll need to ensure you use the correct pack() argument for those two values if your goal is to recreate the very same ObjectId found in the driver. I'll note that Derick's code did not use pack() at all for the MD5-hashed hostname and he may have incorrectly used big-endian byte order for the process identifier (i.e. pack('n')) -- this wasn't terribly important in his example because he wasn't trying to recreate an exact ObjectId and only sought to generate one. In that case, his method provided just as much entropy as if he had used the correct byte order. I'll reiterate that entropy was the only purpose for those middle five bytes, which is why we've standardized on using a purely random value today.
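As a rough illustration of the byte-order point, the sketch below assembles the 12 bytes with pack() and lets the caller choose the byte order for the two middle fields. `assembleObjectIdHex()` is a hypothetical helper, the machine and process values are made up, and it assumes both identifiers arrive as integers (3 and 2 bytes wide), which the original post does not confirm.

```php
<?php
// Hypothetical helper: builds a 24-character hex ObjectId string from integer parts.
// $littleEndianMiddle selects the byte order used for the machine and process identifiers.
function assembleObjectIdHex(
    int $timestamp,
    int $machine,
    int $process,
    int $counter,
    bool $littleEndianMiddle = true
): string {
    // Timestamp: 4 bytes, always big-endian.
    $bin = pack('N', $timestamp);

    if ($littleEndianMiddle) {
        $bin .= substr(pack('V', $machine), 0, 3); // low three bytes, little-endian
        $bin .= pack('v', $process);               // two bytes, little-endian
    } else {
        $bin .= substr(pack('N', $machine), 1, 3); // low three bytes, big-endian
        $bin .= pack('n', $process);               // two bytes, big-endian
    }

    // Counter: 3 bytes, always big-endian.
    $bin .= substr(pack('N', $counter), 1, 3);

    return bin2hex($bin);
}

// Made-up identifiers; timestamp and counter are taken from the original post.
$littleEndian = assembleObjectIdHex(1573559942, 0x010203, 0x0405, 8306872, true);
$bigEndian    = assembleObjectIdHex(1573559942, 0x010203, 0x0405, 8306872, false);
```

The two results differ only in the middle five bytes (`0302010504` versus `0102030405`), so comparing both against a known ID from the database is one way to find out which byte order the generating driver used.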

In addition to Derick's code example, you may find it helpful to review https://github.com/mongofill/mongofill/blob/master/src/MongoId.php, which belongs to a community-developed library that implemented the legacy PHP driver in pure PHP code. Its MongoId class also uses pack() to assemble a 12-byte ObjectId value from individual fields, and I believe it does correctly use little-endian for the machine and process identifier components, and big-endian for the timestamp and counter components.

FWIW, I reported a documentation ticket many moons ago about discussing the endian-ness requirements of ObjectId components in the MongoDB manual (DOCS-9787). For the time being, this information is only covered in the driver specification linked above.

 

Tim Hawkins

Nov 14, 2019, 9:02:12 PM
to mongodb-user
Presumably the OP is trying to reassemble the ObjectId from the parts he is getting, to allow querying of the original data. In that case it does not matter whether the definition calls for random or non-random elements, only that the ID is reassembled from the data he has to hand (the "some value" fields). It's important that the regenerated ID does use that data, otherwise it won't match.


Jeremy Mikola

Nov 15, 2019, 11:19:41 AM
to mongod...@googlegroups.com
On Thu, Nov 14, 2019 at 9:01 PM Tim Hawkins <tim.th...@gmail.com> wrote:
Presumably the OP is trying to reassemble the ObjectId from the parts he is getting, to allow querying of the original data. In that case it does not matter whether the definition calls for random or non-random elements, only that the ID is reassembled from the data he has to hand (the "some value" fields). It's important that the regenerated ID does use that data, otherwise it won't match.

It wasn't clear what driver the OP was using, so I thought it important to clarify why the legacy and current driver APIs differ.

Understanding that the original purpose of those fields was to provide entropy helps explain why Derick's code was suitable for the Stack Overflow case (i.e. creating some ObjectId from a single timestamp) but should not be used as-is for the OP's use case (i.e. exactly recreating an ObjectId from its parts). In particular, the Stack Overflow code...
  • used three bytes from the 32-byte hexadecimal string returned by md5(), rather than the 16-byte binary value that can be returned by specifying true for the second md5() argument
  • encoded the process ID with big-endian byte order, instead of little-endian
Both of those issues are addressed correctly in the MongoId::assembleId() code from the Mongofill project.
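The two points above can be seen in a few lines of PHP. This is only a sketch with a made-up hostname and process ID, not a reproduction of either the Stack Overflow code or Mongofill's implementation.

```php
<?php
$hostname = 'example-host';               // made-up hostname for illustration

$hexDigest = md5($hostname);              // 32 hexadecimal characters (ASCII text)
$rawDigest = md5($hostname, true);        // 16 raw bytes

// Taking three "bytes" from the hex string yields three ASCII hex digits,
// which carry only 12 bits of the digest.
$fromHex = substr($hexDigest, 0, 3);

// Taking three bytes from the raw digest yields 24 bits of the digest.
$fromRaw = substr($rawDigest, 0, 3);

$pid = 0x1234;                            // made-up process ID
$bigEndianPid    = bin2hex(pack('n', $pid)); // "1234"
$littleEndianPid = bin2hex(pack('v', $pid)); // "3412"
```

The first three raw bytes correspond to the first six hexadecimal characters of the digest, which is why slicing the hex string gives a different machine identifier than slicing the binary value.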

