Before anything, I should note that if the number of owners for a
single item grows past a certain boundary, you'll have an overflow in
the size of the document. The following assumes you're already aware
of that, and don't feel like you should have a second collection to
hold the ownership relations.
Now, from what I can tell, you already have the information in a
single document, you just want to have it more accessible. The only
valid use case for this is if you put an index on the owner count. If
creating such an index is not your goal, I strongly suggest you do the
calculations on the client side, and use $where for any queries which
require the calculated value (Yet do not require an index on it).
The rest of this post assumes you want an index on the owner count.
Well, the obvious thing you need to do in order to index the owner
count is to make it a field. That much is obvious. The problem is with
keeping it up to date.
Now, in a prefect world, MongoDB would allow you to make a complex
update which adds the owner, increments the count, and updates the
field, in a single action. I've made another post on the subject,
titled "Complex Update", and there is a link to a related JIRA there.
However, since such a feature is not yet available, the rest of the
post will assume it is not an option.
Now, you're already half-way there with the concept of an incrementing
field for each owner. But now you need matching fields for the sums,
both total and unique. But, you also need to know when to increment
it, and when to decrement it. The key here is to use findAndModify: By
replacing the increments with findAndModify, you can determine if the
value for an owner was previously 0, and update the unique count then.
For example:
var doc = db.itemstats.findAndModify({'_id' : 1, '
owners.name' :
"joe" }, {'$inc' : { 'owners.$.count' : 1 }});
var found = false;
var i;
for (i in doc.owners) {
if (doc.owners[i].name == "joe") {
found = (doc.owners[i].count == 0);
break;
}
}
if (found) {
db.itemstats.update({'_id' : 1}, {'$inc' : { 'unique_owners' :
1 }});
}
db.itemstats.update({'_id' : 1}, {'$inc' : { 'total' : 1 }});
You can deal with decrementing in a similar fashion by checking that
the document after the decrementation has 0 ownership for that owner.
This should ensure that the counts will be eventually consistent so
long as the client does not fail. However, if a single client fail,
the mismatch would remain permanently, even after subsequent
operations.
A slightly more expensive yet safe method is to compute the counts on
the client side for each increment. The issue with doing that is that
you cannot hold a lock on the document while doing the calculation.
What you can do is use the following optimistic locking pattern:
1. After changing the owners, use findAndModify to increment and
return a field "change_count" in the document.
2. Get the owner set for the document. This can be done in the same
action as above.
3. Compute the counts for the owner set.
4. Perform the following operation: db.itemstats.update({'_id' : 1,
$where: "this.calculated_count != this.change_count ||
this.change_count == "+change_count}, {$set: {count: newcount, unique:
newunique, calculated_count: change_count}}); // (Apologies for the
$where injection, I don't know how to use scope with $where, nor how
to rewrite that query to not use $where)
The 4th step is the most important. It does two actions:
-If the value is still changing (High update rate), it makes the
update.
-If no more updates are being made, and the last writer has already
written the calculated counts, it makes sure no further writes are
possible. This prevents stale data from overwriting the most up to
date data. The last writer is guaranteed to have an up to date owner
set, since it retrieves it after the last change has been made.
In other words, the method provides both intermediate counts during
rapid updates, as well as eventual consistency once update cease. What
it does lose is that intermediate counts may be received out-of-order
(For instance, when adding 5 owners to a document with 3 owners,
repeated queries may return the sequence 3, 4, 7, 6, 5, 8). Only the
final count is guaranteed to be, indeed, the last returned value.
What it gains is that if a client crashes, so long as it is not the
last update, the count will remain correct. If the last update
crashed, any subsequent update will correct it. Additionally, it is
easy to detect that the last update was not yet completed (or
crashed), by checking that change_count != calculated_count. A
"garbage collector" may go over the documents to check for such
faults, or repair returned results dynamically during queries.
Note that neither method will guarantee that the owner sets and saved
counts would constantly match at every read, only that they would
closely follow each other, with a slight temporal shift, and be
consistent shortly after updating ends.
If you need, for example, to search for items with 10 owners, and must
not receive any owner sets with 9 or 11 owners, you will have to take
the results of searching based on the owner count, and further
filtering them down based on the owner sets, either with $where or on
the client side.
Note that while this will discard items that went from 10 owners to 11
owners, but have not yet updated the count, it will not add items that
went from 11 owners to 10 owners. However, since all operations are
only atomic per-document, a result which discards items which have not
yet been fully updated is valid (This can be seen as equivalent to the
update having been made after the query), and is the best that can be
guaranteed anyway.
> 1. Counting the total number of items
> - I solved this one by having a separate $inc that modifies a "count"
> field within the object, but I'd still like to be able to calculate the
> total on the fly.
> 2. Counting the total number of unique owners
> - No sensible solution for this found, I am at a loss...