Complex upsert with Perl's MongoDB driver

531 views
Skip to first unread message

Neil H Watson

unread,
Oct 1, 2012, 9:11:58 PM10/1/12
to mongod...@googlegroups.com
I'm trying to create an update/upsert to a complex json.  My expected results are.
{
        "_id" : "job 1",
        "host" : [
                {
                        "timestamp" : [
                                "2012-10-02 00:58:15", "2012-10-02 01:58:15"
                        ],
                        "_id" : "10.0.0.1",
                        "fqhn" : "host1.example.com"
                },
                {
                        "timestamp" : [
                                "2012-10-02 00:58:40", "2012-10-02 01:58:40",
                        ],
                        "_id" : "10.0.0.2",
                        "fqhn" : "host2.example.com"
                }
         ]
}

So, host is an array of embedded documents.  Timestamp is an array in that document.  However, this is the results I am getting.  I seem to be getting a full new host array entry instead.

{
        "_id" : "Job 1",
        "host" : [
                {
                        "timestamp" : [
                                "2012-10-02 00:58:15"
                        ],
                        "_id" : "10.0.0.1",
                        "fqhn" : "host1.example.com"
                },
                {
                        "timestamp" : [
                                "2012-10-02 00:58:40"
                        ],
                        "_id" : "10.0.0.2",
                        "fqhn" : "host2.example.com"
                },
                {
                        "timestamp" : [
                                "2012-10-02 00:59:14"
                        ],
                        "_id" : "10.0.0.2",
                        "fqhn" : "host2.example.com"
                }
        ]
}

I'm using Perl's MongoDB driver.  The update section is currently this:
                        $collection->update (
                                { '_id' => "$job" },
                                        { '$addToSet' => {
                                                'host' =>
                                                        {
                                                                '_id' => "$ip_address",
                                                                'fqhn' => "$fqhn",
                                                                'timestamp' => [ "$timestamp" ]
                                                        }
                                                }
                                        },
                                { 'upsert' => 1, 'safe' => 1 }
                        );
I must be something to do with the timestamp part.  The examples in the POD do not cover this type of complexity.  What have I done wrong?




            

Asya Kamsky

unread,
Oct 2, 2012, 1:10:24 PM10/2/12
to mongod...@googlegroups.com
Your code is in fact doing an $addToSet to host array (which is why you are getting a new element).

Are you trying to add a new timestamp to an existing element?  If so, which one?

Asya

Neil H Watson

unread,
Oct 2, 2012, 7:32:30 PM10/2/12
to mongod...@googlegroups.com


On Tuesday, October 2, 2012 1:10:24 PM UTC-4, Asya Kamsky wrote:
Your code is in fact doing an $addToSet to host array (which is why you are getting a new element).

Are you trying to add a new timestamp to an existing element?  If so, which one?

I would like to add a timestamp to the array in the document who's IP address '_id' matches.  If no such _id matches I'd like to add a new host document.

Asya Kamsky

unread,
Oct 3, 2012, 1:14:23 AM10/3/12
to mongod...@googlegroups.com
Here is the correct way of doing this update (in shell syntax):

db.collection.update({"_id" : "job 1","host._id" : "10.0.0.1"}, 
                                    {$push:{"host.$.timestamp":"new-timestamp-value"}})
This will push a new timestamp value to the appropriate embedded "host" subdocument's timestamp array.

Since you're using the word upsert, I'm guessing you think that if host._id doesn't mat any array elements, you want to push a full new subdocument to this array, but that's not what an upsert is.   An upsert only works on the top document level.   If the document you want to update doesn't exist then it (the entire document) will be created.

If that's how you need it to work (in a single update) then I think you will need to alter your schema.
If instead of array of host elements you have a collection of them (and each host has an associated "job_id" by which you can query to get all host documents for "job 1") then you can use upserts to either add new information to an existing "host" document or create it if it doesn't exist.

The way you have your schema now, I think you would have to do two updates or a query/test and then an update, and of course, that's not atomic like an upsert is.

Asya

Neil H Watson

unread,
Oct 3, 2012, 7:29:34 AM10/3/12
to mongod...@googlegroups.com
Thanks for the tip.  I'll look into it.  I'm not set on this scheme.  My criteria is the following and I welcome all suggestions.

1.The job _id is not something I know before hand.  I do not know how may such jobs there will be.
2. An unknown number of hosts will run this job.  I _id the host via the IP address but, also lookup the fqhn at insert time for convenience.
3. Each time the job is run I want to record the completion timestamp along with the historical timestamps of the same job (ensure there are no duplicates)
4. Any update/insert/upsert could be an existing job and existing host, an existing job and new host, or new job.
5. Suggestions for bulk insert/update/upserts all welcome as I'll receive the data in batches.

Neil H Watson

unread,
Oct 5, 2012, 9:03:18 PM10/5/12
to mongodb-user
On Oct 3, 1:14 am, Asya Kamsky <a...@10gen.com> wrote:
> Here is the correct way of doing this update (in shell syntax):
>
> db.collection.update({"_id" : "job 1","host._id" : "10.0.0.1"},
>
> {$push:{"host.$.timestamp":"new-timestamp-value"}})

I tried this, with two updates, and something odd happens.

Perl code:

# Second update:
$collection->update(
{ '_id' => "$job", 'host._id' => "$ip_address" },
{ '$push' => {
'host.$.timestamp' => "$timestamp"
}
},
{ 'safe' => 1 }
);

And this appeared
{
"_id" : "Job 1",
"host" : {
"$" : {
"timestamp" : [
"2012-10-05 20:55:09"
]
},
"_id" : "10.0.0.1",
"fqhn" : "ettin.example.com"
}
}

It is my understanding that the dollar sign in host.$.timestamp is an
operator. Why did perl treat it like a string?

Asya Kamsky

unread,
Oct 13, 2012, 6:46:09 PM10/13/12
to mongod...@googlegroups.com
Neil,

My apologies, I did not mention a restriction that you have to keep in mind here.

"The positional operator cannot be combined with an upsert since it requires a matching array element. If your update results in an insert then the "$" will literally be used as the field name."

I think this is exactly what you were seeing (although I don't see the upsert flag set in the specific example that you showed, but your result matches what would happen if you used the positional operator with an upsert and the document did not exist).

I'm not sure if this will tilt your decision to stick with this schema or switch to one that may match the requirement to do these sort of updates.

You can still do this with two updates, but your step would be to query if the _id of array element exists - below you show how you are doing the update but not what your first query (update?) is doing...

Asya

Asya Kamsky

unread,
Oct 13, 2012, 6:57:25 PM10/13/12
to mongod...@googlegroups.com
I gave this a little thought and this is what I think would be very simple:

Currently you have this  document representing each job:
{
        "_id" : "job 1",
        "host" : [
                {
                        "timestamp" : [
                                "2012-10-02 00:58:15", "2012-10-02 01:58:15"
                        ],
                        "_id" : "10.0.0.1",
                        "fqhn" : "host1.example.com"
                },
                {
                        "timestamp" : [
                                "2012-10-02 00:58:40", "2012-10-02 01:58:40",
                        ],
                        "_id" : "10.0.0.2",
                        "fqhn" : "host2.example.com"
                }
         ]
}

I'm suggesting changing it to have each document represent job/host combination
so now the above single document would instead be these two:

{
        "jobName" : "job 1",
        "host" : "10.0.0.1",
       "timestamp" : [

                                "2012-10-02 00:58:15", "2012-10-02 01:58:15"
                        ],
         "fqhn" : "host1.example.com"
 },
 {
        "jobName" : "job 1",
        "host" : "10.0.0.2",

        "timestamp" : [
                                "2012-10-02 00:58:40", "2012-10-02 01:58:40",
                        ],

         "fqhn" : "host2.example.com"
 }

Now your updates/upserts are like this:
db.collection.update({"jobName":"job 5", "host": 10.0.0.2}, {$push:{timestamp:"new time here"}}, true)
This takes care of new job (upserts), existing job/new host (upserts), existing job, existing host: updates existing document with new timestamp.

Assuming these are the only interactions (and later you query only by job name) you would need an index on {jobName:1, host:1} - I would just let the _id be generated by the driver.

If you really want to have _id unique index used to enforce uniqueness of job/host combination you can make your _id field {jobName:"name", host: "ip"} - but be careful, you can still use this index to query on {_id:{jobName:"x",host:"y"}} but you *cannot* use it to query on  {"_id.jobName":"Z"} since that's not indexed.

Asya

Neil H Watson

unread,
Oct 15, 2012, 2:33:49 PM10/15/12
to mongodb-user
Your suggestions were helpful. Thank you.
Reply all
Reply to author
Forward
0 new messages