Best way to track ownership of certain items using MongoDB ?

30 views
Skip to first unread message

Steve Rabouin

unread,
May 27, 2012, 8:46:00 PM5/27/12
to mongod...@googlegroups.com
Hello everyone,

I'm still pretty new to MongoDB and I am trying to optimize how I'm doing things .. And one of them is calculating everything on the fly, in the database, instead of creating results on demand. This way, I have statistics on items available in real-time and it's much more interesting for the users than looking at cached pages because re-calculating statistics on millions of items is just not efficient.

I'm running into a challenge in designing this.. I've tried multiple ways to get the result but I can't seem to figure this one out.

I want to track who owns a certain item, and someone can own multiple copies of that item. For the purpose of this example, let's assume this item is a DVD.

I thought the best way to create this collection would be something like this:

db.itemstats.update({'_id' : 1}, {'$addToSet' : { 'owners' : { 'name' : "joe" } } }, true, false) 
db.itemstats.update({'_id' : 1}, {'$addToSet' : { 'owners' : { 'name' : "beef" } } }, true, false) 
db.itemstats.update({'_id' : 2}, {'$addToSet' : { 'owners' : { 'name' : "beef" } } }, true, false) 
db.itemstats.update({'_id' : 1, 'owners.name' : "joe" }, {'$inc' : { 'owners.$.count' : 1 }});
db.itemstats.update({'_id' : 1, 'owners.name' : "joe" }, {'$inc' : { 'owners.$.count' : 1 }});
db.itemstats.update({'_id' : 2, 'owners.name' : "beef" }, {'$inc' : { 'owners.$.count' : 1 }});

This would return something like this:

{ "_id" : 1, "owners" : [ { "count" : 2, "name" : "joe" }, { "name" : "beef" } ] }
{ "_id" : 2, "owners" : [ { "count" : 1, "name" : "beef" } ] }

My challenge comes with:

  1. Counting the total number of items
    • I solved this one by having a separate $inc that modifies a "count" field within the object, but I'd still like to be able to calculate the total on the fly.
  2. Counting the total number of unique owners
    • No sensible solution for this found, I am at a loss...
Would there be a better way to store this data, and how would I go about counting distinct owners, without sending all this data to the client?

Thanks in advance!

Steve

Saar Korren

unread,
May 28, 2012, 8:15:03 AM5/28/12
to mongodb-user
Before anything, I should note that if the number of owners for a
single item grows past a certain boundary, you'll have an overflow in
the size of the document. The following assumes you're already aware
of that, and don't feel like you should have a second collection to
hold the ownership relations.

Now, from what I can tell, you already have the information in a
single document, you just want to have it more accessible. The only
valid use case for this is if you put an index on the owner count. If
creating such an index is not your goal, I strongly suggest you do the
calculations on the client side, and use $where for any queries which
require the calculated value (Yet do not require an index on it).
The rest of this post assumes you want an index on the owner count.

Well, the obvious thing you need to do in order to index the owner
count is to make it a field. That much is obvious. The problem is with
keeping it up to date.
Now, in a prefect world, MongoDB would allow you to make a complex
update which adds the owner, increments the count, and updates the
field, in a single action. I've made another post on the subject,
titled "Complex Update", and there is a link to a related JIRA there.
However, since such a feature is not yet available, the rest of the
post will assume it is not an option.

Now, you're already half-way there with the concept of an incrementing
field for each owner. But now you need matching fields for the sums,
both total and unique. But, you also need to know when to increment
it, and when to decrement it. The key here is to use findAndModify: By
replacing the increments with findAndModify, you can determine if the
value for an owner was previously 0, and update the unique count then.

For example:
var doc = db.itemstats.findAndModify({'_id' : 1, 'owners.name' :
"joe" }, {'$inc' : { 'owners.$.count' : 1 }});
var found = false;
var i;
for (i in doc.owners) {
if (doc.owners[i].name == "joe") {
found = (doc.owners[i].count == 0);
break;
}
}
if (found) {
db.itemstats.update({'_id' : 1}, {'$inc' : { 'unique_owners' :
1 }});
}
db.itemstats.update({'_id' : 1}, {'$inc' : { 'total' : 1 }});

You can deal with decrementing in a similar fashion by checking that
the document after the decrementation has 0 ownership for that owner.

This should ensure that the counts will be eventually consistent so
long as the client does not fail. However, if a single client fail,
the mismatch would remain permanently, even after subsequent
operations.

A slightly more expensive yet safe method is to compute the counts on
the client side for each increment. The issue with doing that is that
you cannot hold a lock on the document while doing the calculation.
What you can do is use the following optimistic locking pattern:
1. After changing the owners, use findAndModify to increment and
return a field "change_count" in the document.
2. Get the owner set for the document. This can be done in the same
action as above.
3. Compute the counts for the owner set.
4. Perform the following operation: db.itemstats.update({'_id' : 1,
$where: "this.calculated_count != this.change_count ||
this.change_count == "+change_count}, {$set: {count: newcount, unique:
newunique, calculated_count: change_count}}); // (Apologies for the
$where injection, I don't know how to use scope with $where, nor how
to rewrite that query to not use $where)

The 4th step is the most important. It does two actions:
-If the value is still changing (High update rate), it makes the
update.
-If no more updates are being made, and the last writer has already
written the calculated counts, it makes sure no further writes are
possible. This prevents stale data from overwriting the most up to
date data. The last writer is guaranteed to have an up to date owner
set, since it retrieves it after the last change has been made.

In other words, the method provides both intermediate counts during
rapid updates, as well as eventual consistency once update cease. What
it does lose is that intermediate counts may be received out-of-order
(For instance, when adding 5 owners to a document with 3 owners,
repeated queries may return the sequence 3, 4, 7, 6, 5, 8). Only the
final count is guaranteed to be, indeed, the last returned value.
What it gains is that if a client crashes, so long as it is not the
last update, the count will remain correct. If the last update
crashed, any subsequent update will correct it. Additionally, it is
easy to detect that the last update was not yet completed (or
crashed), by checking that change_count != calculated_count. A
"garbage collector" may go over the documents to check for such
faults, or repair returned results dynamically during queries.

Note that neither method will guarantee that the owner sets and saved
counts would constantly match at every read, only that they would
closely follow each other, with a slight temporal shift, and be
consistent shortly after updating ends.
If you need, for example, to search for items with 10 owners, and must
not receive any owner sets with 9 or 11 owners, you will have to take
the results of searching based on the owner count, and further
filtering them down based on the owner sets, either with $where or on
the client side.
Note that while this will discard items that went from 10 owners to 11
owners, but have not yet updated the count, it will not add items that
went from 11 owners to 10 owners. However, since all operations are
only atomic per-document, a result which discards items which have not
yet been fully updated is valid (This can be seen as equivalent to the
update having been made after the query), and is the best that can be
guaranteed anyway.
>    1. Counting the total number of items
>    - I solved this one by having a separate $inc that modifies a "count"
>       field within the object, but I'd still like to be able to calculate the
>       total on the fly.
>       2. Counting the total number of unique owners
>    - No sensible solution for this found, I am at a loss...
Reply all
Reply to author
Forward
0 new messages