So, host is an array of embedded documents. Timestamp is an array in that document. However, this is the results I am getting. I seem to be getting a full new host array entry instead.
I'm using Perl's MongoDB driver. The update section is currently this: $collection->update ( { '_id' => "$job" }, { '$addToSet' => { 'host' => { '_id' => "$ip_address", 'fqhn' => "$fqhn", 'timestamp' => [ "$timestamp" ] } } }, { 'upsert' => 1, 'safe' => 1 } ); I must be something to do with the timestamp part. The examples in the POD do not cover this type of complexity. What have I done wrong?
> So, host is an array of embedded documents. Timestamp is an array in that > document. However, this is the results I am getting. I seem to be getting > a full new host array entry instead.
> 'timestamp' => [ "$timestamp" ] > } > } > }, > { 'upsert' => 1, 'safe' => 1 } > ); > I must be something to do with the timestamp part. The examples in the > POD do not cover this type of complexity. What have I done wrong?
{$push:{"host.$.timestamp":"new-timestamp-value"}}) This will push a new timestamp value to the appropriate embedded "host" subdocument's timestamp array.
Since you're using the word upsert, I'm guessing you think that if host._id doesn't mat any array elements, you want to push a full new subdocument to this array, but that's not what an upsert is. An upsert only works on the top document level. If the document you want to update doesn't exist then it (the entire document) will be created.
If that's how you need it to work (in a single update) then I think you will need to alter your schema. If instead of array of host elements you have a collection of them (and each host has an associated "job_id" by which you can query to get all host documents for "job 1") then you can use upserts to either add new information to an existing "host" document or create it if it doesn't exist.
The way you have your schema now, I think you would have to do two updates or a query/test and then an update, and of course, that's not atomic like an upsert is.
On Tuesday, October 2, 2012 7:32:30 PM UTC-4, Neil H Watson wrote:
> On Tuesday, October 2, 2012 1:10:24 PM UTC-4, Asya Kamsky wrote:
>> Your code is in fact doing an $addToSet to host array (which is why you >> are getting a new element).
>> Are you trying to add a new timestamp to an existing element? If so, >> which one?
>> I would like to add a timestamp to the array in the document who's IP > address '_id' matches. If no such _id matches I'd like to add a new host > document.
Thanks for the tip. I'll look into it. I'm not set on this scheme. My criteria is the following and I welcome all suggestions.
1.The job _id is not something I know before hand. I do not know how may such jobs there will be. 2. An unknown number of hosts will run this job. I _id the host via the IP address but, also lookup the fqhn at insert time for convenience. 3. Each time the job is run I want to record the completion timestamp along with the historical timestamps of the same job (ensure there are no duplicates) 4. Any update/insert/upsert could be an existing job and existing host, an existing job and new host, or new job. 5. Suggestions for bulk insert/update/upserts all welcome as I'll receive the data in batches.
"The positional operator cannot be combined with an upsert since it requires a matching array element. If your update results in an insert then the "$" will literally be used as the field name."
I think this is exactly what you were seeing (although I don't see the upsert flag set in the specific example that you showed, but your result matches what would happen if you used the positional operator with an upsert and the document did not exist).
I'm not sure if this will tilt your decision to stick with this schema or switch to one that may match the requirement to do these sort of updates.
You can still do this with two updates, but your step would be to query if the _id of array element exists - below you show how you are doing the update but not what your first query (update?) is doing...
Now your updates/upserts are like this: db.collection.update({"jobName":"job 5", "host": 10.0.0.2}, {$push:{timestamp:"new time here"}}, true) This takes care of new job (upserts), existing job/new host (upserts), existing job, existing host: updates existing document with new timestamp.
Assuming these are the only interactions (and later you query only by job name) you would need an index on {jobName:1, host:1} - I would just let the _id be generated by the driver.
If you really want to have _id unique index used to enforce uniqueness of job/host combination you can make your _id field {jobName:"name", host: "ip"} - but be careful, you can still use this index to query on {_id:{jobName:"x",host:"y"}} but you *cannot* use it to query on {"_id.jobName":"Z"} since that's not indexed.
On Wednesday, October 3, 2012 4:29:35 AM UTC-7, Neil H Watson wrote:
> Thanks for the tip. I'll look into it. I'm not set on this scheme. My > criteria is the following and I welcome all suggestions.
> 1.The job _id is not something I know before hand. I do not know how may > such jobs there will be. > 2. An unknown number of hosts will run this job. I _id the host via the > IP address but, also lookup the fqhn at insert time for convenience. > 3. Each time the job is run I want to record the completion timestamp > along with the historical timestamps of the same job (ensure there are no > duplicates) > 4. Any update/insert/upsert could be an existing job and existing host, an > existing job and new host, or new job. > 5. Suggestions for bulk insert/update/upserts all welcome as I'll receive > the data in batches.