Explicit blocking

278 views
Skip to first unread message

Saar Korren

unread,
May 9, 2012, 10:24:10 AM5/9/12
to mongodb-user
I am currently planning the migration of a large-scale service from
MySQL to MongoDB. While in most use cases involved in the system
MongoDB seems to be a superior solution (Especially considering most
of the SQL queries were already optimized in a manner that is more
consistent with MongoDB's paradigms), there is one feature of which I
could not find an equivalent in MongoDB: Explicit locks.

Currently, I'm using explicit locks for three purposes:
1. Schema migration - This will be completely removed with the move to
MongoDB.
2. Thrash reduction in job-queue polling. I use a lock to reduce
collisions between workers polling job queues. The polling queries
themselves are already designed to be atomic and collision free.
However, a collision would result in a wasted poll-spin all but one of
the workers, and with a sufficiently large quantity of workers a
pessimistic lock has proven to drastically boost performance. This is
not mandatory, but I would rather keep it.
3. Transforming asynchronous calls into a synchronous callback. This
part is mandatory. I have one synchronous script which can block up to
30 seconds waiting for a call to another script which would release
it. I was able to achieve a semaphore-like blocking system using
MySQL's GET_LOCK, IS_USED_LOCK, and KILL. It's a bit of a hack, but it
serves its purpose, to a point.

Point number 3 above is the most important one for me. As far as I
could tell, MongoDB has no method to explicitly block for any
controllable event, and, as the wait can reach several seconds even on
normal operations, a spin-lock is simply not an option for me.
Obviously, if there was a "wait for change" command for documents,
even an optimistic one with no hard guarantees about changes made,
that would solve my problem (Using a semi-blocking semi-spin
signalling system). But as far as I could tell, there isn't.

I suppose I could continue to use MySQL user locks, but that comes
with all the flaws of still relying on the MySQL database.

Is there any recommended method or system to work alongside MongoDB
for explicit signalling and blocking?
(Preferably one that shares MongoDB's properties regarding failover
and potential sharding. And, of course, the lock manager being on a
separate server from the blocking and signalling scripts)

Spencer T Brody

unread,
May 9, 2012, 11:35:32 AM5/9/12
to mongod...@googlegroups.com
MongoDB does not currently have any general signaling and blocking capabilities.  If you just want to be notified when a document is updated, however, one thing you could do is tail the oplog.  The oplog records every update that happens for use in replication.  Because the oplog is a capped collection, you can use a tailable cursor on it.  Unlike a normal cursor, when you query a tailable cursor and it runs out of results, rather than returning it will block until there are new results to return.  You could write something to query the oplog with a tailable cursor so that you'll receive notification of every new write right when it happens.

More information on the oplog is here: http://www.mongodb.org/display/DOCS/Replica+Sets+-+Oplog, and information on tailable cursors is here: http://www.mongodb.org/display/DOCS/Tailable+Cursors

Saar Korren

unread,
May 10, 2012, 3:43:19 AM5/10/12
to mongodb-user
Sounds like I could just use a tailable cursor with my own capped
collection to create a global optimistic signalling system. However,
the documentation on tailable cursors says nothing about blocking if
no document exist, save for a comment in the first example. Are you
certain about the blocking behavior? Would it prevent a busy-wait spin-
lock? (I wouldn't want to max my CPU on a signal wait) Is this just a
case of unclear documentation?

Actually, re-reading the examples, it seems to be related to the
QueryOption_AwaitData flag. What does it do for cursors which are not
tailable? (The documentation is quite confusing)

Spencer T Brody

unread,
May 10, 2012, 11:10:38 AM5/10/12
to mongod...@googlegroups.com
You're right, the documentation here was a bit confusing.  I just updated it to make the blocking behavior more explicit.

Using a tailable cursor will not cause a busy-wait - replication uses tailable cursors to read the oplog and that would be really bad if replication maxed out your CPU.


--
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To post to this group, send email to mongod...@googlegroups.com.
To unsubscribe from this group, send email to mongodb-user...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=en.


Saar Korren

unread,
May 13, 2012, 2:57:03 AM5/13/12
to mongodb-user
Okay, just a couple more questions:
1. It is mentioned tailing cursors do not use indexes. I assume this
only refers to the sorting order, but like I said, the documentation
is confusing. Is it still possible to tail a collection based on a
specific value query?
2. While I could probably find this out myself through some
experimentation, it could save time if you could answer this: Is it
possible to produce a tailable cursor from an insert? That is, insert
a dummy document, and then wait for documents that are inserted
"approximately after" the inserted document? (Preferably with some
additional constraints)
3. Do tailable cursors support sharding? (Assuming the tail is on a
single shard, of course. It would just make more sense to have a
single collection with several tailable shards than to have a separate
collection for each blockable event)
4. The documentation mentions a timeout on tailable cursors, which is
good, but it does not mention what said timeout is. Is it possible to
configure the timeout (per cursor or per connection or per
collection)? Since I do need to handle a 30 seconds timeout, if the
timeout is much longer than that I could be stuck with needlessly
prolonged connections.

On May 10, 6:10 pm, Spencer T Brody <spen...@10gen.com> wrote:
> You're right, the documentation here was a bit confusing.  I just updated
> it to make the blocking behavior more explicit.
> There is some more documentation about the QueryOption_AwaitData flag here:http://api.mongodb.org/cplusplus/1.5.1/namespacemongo.html#a7261673f7...
> .

Spencer T Brody

unread,
May 14, 2012, 3:46:01 PM5/14/12
to mongod...@googlegroups.com
1) Yes, you can still do a value query when using a tailable cursor.  Tailable cursors should always sort on $natural - it will iterate over the documents in insertion order, not using any index.  But as it considers every document in insertion order, it can check that the document satisfies a given query expression and only return it if it does
2) There is no built-in way to do this, but you could do it by including a timestamp or objectId in the documents as they get inserted, and query for documents where the timestamp is greater than the one that you create with the insert.
3) It is currently not possible to shard a capped collection.
4) In 2.0.x releases the timeout is approximately 4 seconds and there is no way to configure that.  If you want your code to be able to wait for new documents for longer than that, you will need to put the querying client code in a loop that will retry the query from where it left off if it times out.  The documentation at http://www.mongodb.org/display/DOCS/Tailable+Cursors has some examples of doing this.

Saar Korren

unread,
May 20, 2012, 7:28:22 AM5/20/12
to mongodb-user
On the topic of question 2: It is possible to get an ObjectId from an
insert, that much I know. What I'm wondering is if I can make a query
that starts from an ObjectId, and proceeds in the natural order. Would
{"_id":{"$gte":my_obj._id}} work as expected?

On a slightly different note, I've stumbled into an issue which might
be a show-stopper for me. The PHP driver seems to have no way to turn
on flag 5 (QueryOption_AwaitData) on a cursor. I've looked into the
code, and only flags 1 (QueryOption_CursorTailable), 2
(QueryOption_SlaveOk), 4 (QueryOption_NoCursorTimeout), and 7 appear
to be supported through special functions. There does not appear to be
an option to set arbitrary flags.

Short of modifying and compiling a custom MongoDB PHP driver, I see no
solution to this. And that is hardly a viable option, due to
deployment difficulties involving a custom driver.

On May 14, 10:46 pm, Spencer T Brody <spen...@10gen.com> wrote:
> 1) Yes, you can still do a value query when using a tailable cursor.
>  Tailable cursors should always sort on $natural - it will iterate over the
> documents in insertion order, not using any index.  But as it considers
> every document in insertion order, it can check that the document satisfies
> a given query expression and only return it if it does
> 2) There is no built-in way to do this, but you could do it by including a
> timestamp or objectId in the documents as they get inserted, and query for
> documents where the timestamp is greater than the one that you create with
> the insert.
> 3) It is currently not possible to shard a capped collection.
> 4) In 2.0.x releases the timeout is approximately 4 seconds and there is no
> way to configure that.  If you want your code to be able to wait for new
> documents for longer than that, you will need to put the querying client
> code in a loop that will retry the query from where it left off if it times
> out.  The documentation athttp://www.mongodb.org/display/DOCS/Tailable+Cursorshas some examples of

Sam Millman

unread,
May 20, 2012, 12:22:42 PM5/20/12
to mongod...@googlegroups.com
"On the topic of question 2: It is possible to get an ObjectId from an
insert, that much I know. What I'm wondering is if I can make a query
that starts from an ObjectId, and proceeds in the natural order. Would
{"_id":{"$gte":my_obj._id}} work as expected?"

Possibly, well depends but normally no. _id is not ordered on disk and in order for that query to be reliable you would need to sort by _id.

Though there are cases where _id with natural order might be feasible......

Why not just extend the mongo cursor into your own? I do it all the time. Add a variable flag to the extended class and bam you have your stuff, unless I'm missing something from not reading the rest of the thread.

See also the IRC channel -- freenode.net#mongodb

Sam Millman

unread,
May 20, 2012, 12:26:26 PM5/20/12
to mongod...@googlegroups.com
Actually in the way your using the query, it might be reliable and should be unless I've forgotten something. Since even though it evaluates on insertion order, as Spencer says, it does evaluate the query.

Saar Korren

unread,
May 21, 2012, 3:03:22 AM5/21/12
to mongodb-user
I don't see how subclassing MongoCursor would help, considering the
flags exist only in the private "opts" field of the native object, and
not in the exposed class. Since the field is not exposed in any way,
and since it is used directly in serializing the cursor options for
the outgoing packet, I have no way to manually modify the cursor's
options.

I've contemplated using db.eval, but I doubt that would work, since
the code is evaluated on the server.

On May 20, 7:26 pm, Sam Millman <sam.mill...@gmail.com> wrote:
> Actually in the way your using the query, it might be reliable and should
> be unless I've forgotten something. Since even though it evaluates on
> insertion order, as Spencer says, it does evaluate the query.
>
> On 20 May 2012 17:22, Sam Millman <sam.mill...@gmail.com> wrote:
>
>
>
>
>
>
>
> > "On the topic of question 2: It is possible to get an ObjectId from an
> > insert, that much I know. What I'm wondering is if I can make a query
> > that starts from an ObjectId, and proceeds in the natural order. Would
> > {"_id":{"$gte":my_obj._id}} work as expected?"
>
> > Possibly, well depends but normally no. _id is not ordered on disk and in
> > order for that query to be reliable you would need to sort by _id.
>
> > Though there are cases where _id with natural order might be feasible......
>
> > Why not just extend the mongo cursor into your own? I do it all the time.
> > Add a variable flag to the extended class and bam you have your stuff,
> > unless I'm missing something from not reading the rest of the thread.
>
> ...
>
> read more »

Sam Millman

unread,
May 21, 2012, 3:12:46 AM5/21/12
to mongod...@googlegroups.com
db.eval would also be a horrid thing to use, nay call it "evil" since it doesn't work across sharding and there are other limitations to it.

Hmm maybe this is something to post a JIRA about?

Saar Korren

unread,
May 21, 2012, 10:20:07 AM5/21/12
to mongodb-user
Seems like someone already did, earlier today:
https://jira.mongodb.org/browse/PHP-389

It doesn't specifically deal with the flag I need, but it would
include it, if the requested solution is implemented.

On May 21, 10:12 am, Sam Millman <sam.mill...@gmail.com> wrote:
> db.eval would also be a horrid thing to use, nay call it "evil" since it
> doesn't work across sharding and there are other limitations to it.
>
> Hmm maybe this is something to post a JIRA about?
>
> > > >>www.mongodb.org/display/DOCS/Tailable+Cursorshassomeexamples of
> ...
>
> read more »

Sam Millman

unread,
May 21, 2012, 10:25:25 AM5/21/12
to mongod...@googlegroups.com
Yea seems 10gen is onto this this Scott reported it

> ...
>
> read more »

Glenn Maynard

unread,
May 21, 2012, 10:55:35 AM5/21/12
to mongod...@googlegroups.com
On Sun, May 20, 2012 at 11:22 AM, Sam Millman <sam.m...@gmail.com> wrote:
"On the topic of question 2: It is possible to get an ObjectId from an
insert, that much I know. What I'm wondering is if I can make a query
that starts from an ObjectId, and proceeds in the natural order. Would
{"_id":{"$gte":my_obj._id}} work as expected?"

Possibly, well depends but normally no. _id is not ordered on disk and in order for that query to be reliable you would need to sort by _id.

Though there are cases where _id with natural order might be feasible......

What he's describing sounds useful, though.  It's the same as the typical pattern:

items1 = coll.find({key: 'value'}).sort({_id: 1}).limit(100)
items2 = coll.find({key: 'value', $gt: {_id: items1_last_id}}).sort({_id: 1}).limit(100)
items3 = coll.find({key: 'value', $gt: {_id: items2_last_id}}).sort({_id: 1}).limit(100)

to read documents block by block (without any complex state or skip() overhead), but with a natural sort instead of ID sort, so it'd be much faster off disk.

--
Glenn Maynard


Sam Millman

unread,
May 21, 2012, 10:57:36 AM5/21/12
to mongod...@googlegroups.com
Indeed and infact when doing paging over millions of docs you should do range based queries instead of skip() but due to some reason I've now forgotten the query is only useful if you sort by what your ranging. I'm trying to remember why.....

Spencer T Brody

unread,
May 21, 2012, 11:03:44 AM5/21/12
to mongod...@googlegroups.com
If you do a query like {"_id":{"$gte":my_obj._id}} with a tailable cursor, it will scan through the entire collection in $natural order, returning any documents that match the query condition.  It will not, however, be able to jump directly to the place in the collection where the _ids are greater than the value provided as the query will not be using an index.

As for the PHP driver and QueryOption_AwaitData, you've already discovered PHP-389.  Until that is fixed, you could still use tailable cursors, it just means that if you try to fetch more results and no new documents have been inserted, the getmore will return immediately instead of blocking.

Saar Korren

unread,
May 22, 2012, 5:37:31 AM5/22/12
to mongodb-user
On the topic of QueryOption_AwaitData, without it tailable cursors
don't serve the very purpose I need them for: Database-controlled non-
busy blocking. I'll have to wait until it is fixed before I can make
the shift from SQL user-locks.

On the topic of skipping. Is there any way to obtain a "natural index"
of an object, and query by that?

The reason I'm asking is because I'm thinking of a pattern like this
(Using JavaScript for simplicity):
// Set timeout and job
var timeoutTime = new Date().getTime()+25;
var work_obj = {input: data, done: false};
db.workqueue.insert(work_obj);
async_call_worker(work_obj._id, data);
try
{
var work_done_obj;
while (1)
{
// Set up pre-check signal marker
// Optionally: Try to locate a previous signal marker, in case
there are multiple consumers for the work
// This is only to reduce the marker count. It is valid to have
multiple markers
var cursor = db.signals.find({worker:
work_obj._id}).sort({"$natural": -1});
var sig_obj;
if (cursor.hasNext())
{
sig_obj = cursor.next();
}
else
{
// No previous marker found. Create a new one
// Based on the timing, multiple consumers may create multiple
markers. This is not an error
sig_obj = {worker: work_obj._id};
db.signals.insert(sig_obj);
}
// Test if the work was completed already
// This may be necessary if the work was completed before we
inserted our marker above, as it would be inserted after the marker
set by the worker
if ((work_done_obj = db.workqueue.findOne({_id: work_obj._id,
done: true})) !== null) return work_done_obj.output;
// Create a tailable cursor iterating over the markers, starting
with the one we previously set
cursor = db.signals.find({worker: work_obj._id, "$natural":
{"$gte": sig_obj["$natural"]}}).sort({"$natural": 1});
cursor.addOption(2);
cursor.addOption(32);
// Skip the marker we set - it's not interesting
// Also, handle the case where it was already removed
if (!cursor.hasNext()) continue;
cursor.next();
// Wait for the next marker. Hopefully, it was set by the worker
while (cursor.hasNext())
{
if (timeoutTime-(new Date().getTime()) < 0) return null;
cursor.next();
// Check if the work was completed. This marker could have been
set by the worker, but also by another consumer
if ((work_done_obj = db.workqueue.findOne({_id: work_obj._id,
done: true})) !== null) return work_done_obj.output;
}
if (timeoutTime-(new Date().getTime()) < 0) return null;
}
}
finally {
// Remove all signal objects.
// If there are other consumers for the work, they will receive a
dead cursor, and simply create a new signal object.
db.signals.remove({worker: work_obj._id});
}

With a worker pattern of:
function work_done(work_id, data) {
// Update the work, marking it as complete
db.workqueue.update({_id: work_id}, {"$set": {output: data,
done:true});
// Set up a marker, telling any currently waiting consumers to
recheck if the work is done
// A consumer that has already set its marker would detect this
marker in the tail loop, and check if the work is complete, thus being
released from the wait
// A consumer that has not yet set its marker would detect that the
work is complete in the check between setting the marker and the wait
loop
// Slight gotcha: If all consumers already died, there is no
guarantee this signal would be removed
db.signals.insert({worker: work_id});
}

(In my actual use-case, "async_call_worker" and "work_done" are cross-
server CURL calls, not function calls)

As you see, besides the efficiency of skipping a long query, it also
reduces calls to db.workqueue.findOne as it skips any previously
existing entries in the signals collection. I suppose I can manage the
latter by simply adding a loop that "consumes" all documents in
db.signals until reaching sig_obj, but that seems quite inefficient.

P.S.
After considering the above pattern, I'm wondering if removing the
existing marker in the worker thread instead of inserting a new one
might be a viable alternative, as it would produce a dead cursor.
Would that be enough to release the cursor's wait?

On May 21, 6:03 pm, Spencer T Brody <spen...@10gen.com> wrote:
> If you do a query like {"_id":{"$gte":my_obj._id}} with a tailable cursor,
> it will scan through the entire collection in $natural order, returning any
> documents that match the query condition.  It will not, however, be able to
> jump directly to the place in the collection where the _ids are greater
> than the value provided as the query will not be using an index.
>
> As for the PHP driver and QueryOption_AwaitData, you've already
> discovered PHP-389.  Until that is fixed, you could still use tailable
> cursors, it just means that if you try to fetch more results and no new
> documents have been inserted, the getmore will return immediately instead
> of blocking.
>
> > on a...
>
> read more »

Glenn Maynard

unread,
May 22, 2012, 1:49:34 PM5/22/12
to mongod...@googlegroups.com
On Mon, May 21, 2012 at 9:57 AM, Sam Millman <sam.m...@gmail.com> wrote:
Indeed and infact when doing paging over millions of docs you should do range based queries instead of skip() but due to some reason I've now forgotten the query is only useful if you sort by what your ranging. I'm trying to remember why.....

Well, you need to have a consistent sort order, so when you start up the query again you continue where you left off (or at least close, if the data was changed).  The problem is just that there's no way to use $natural as the sort order for this technique, because there's no way to filter based on natural order.  That seems useful, since scanning on $natural is very fast.

(I don't have any actual, real-world need for this right now, though, so until I do I won't file a ticket.)

--
Glenn Maynard


Spencer T Brody

unread,
May 23, 2012, 11:11:29 AM5/23/12
to mongod...@googlegroups.com
Hard to pick apart that much code, but I do have a few notes.  First, I'm not entirely sure what you mean by a "natural index".  $natural represents the order of the documents on disk, which for a capped collection is insertion order.  There's no way to make an index on that, because there's no way to represent on disk location in a query, and since sequential access on spinning disks is fast, usually querying a capped collection in $natural order is very fast anyway.  When you use a tailable cursor, the cursor scans from the oldest documents in the capped collection to the newest, in on-disk order, returning any documents it encounters that match the query filter.  Also, you should be aware that you cannot delete documents from a capped collection, you must simply wait for them to be removed once the capped collection is full and old documents start being removed automatically.  You can find more about the usage and restrictions of capped collections here: http://www.mongodb.org/display/DOCS/Capped+Collections#CappedCollections-UsageandRestrictions

Saar Korren

unread,
May 24, 2012, 2:35:18 AM5/24/12
to mongodb-user
Hmm. So if there's no removal, hard iteration is my only option. It's
too bad there's no "skip to object" option for tailable cursor, but I
guess I can do that by iterating until the returned object has an
"_id" I'm expecting. Doing so on a natural order may be fast compared
to iterating an index sort, but in a large collection it is still much
slower than directly seeking to a specific item. Also, the extra
efficiency applies only in the case where no other clients are trying
to access the database at the same time.

It would have been nice to be able to tell a tailable cursor "start
here". Actually, if considering the order on disk, I don't see why it
would be difficult to have an "index" on it. It has an implicit index
based on the location of the item, no? Simply return the object's
position on disk, and you can seek back to that position. That's how
tailable cursors work (internally) anyway, no?

For the pattern, let me try an alternative, simpler, pseudo code:

Consumer:
Use global variable change
last_change = now()
While worker is not done
Wait until change>last_change or random()
last_change = now()

Worker:
Use global variable change
Set worker is done
change=now()

The "random" is to indicate that the wait is optimistic - it can be
released due to timeout, connection fault, or any arbitrary reason.
Because I can wait for new documents to be added, I can do "Wait until
change>". It's that last ">last_change" that is the problem, since I
cannot start the cursor at the object I've previously designated as
"now()".

On May 23, 6:11 pm, Spencer T Brody <spen...@10gen.com> wrote:
> Hard to pick apart that much code, but I do have a few notes.  First, I'm
> not entirely sure what you mean by a "natural index".  $natural represents
> the order of the documents on disk, which for a capped collection is
> insertion order.  There's no way to make an index on that, because there's
> no way to represent on disk location in a query, and since sequential
> access on spinning disks is fast, usually querying a capped collection in
> $natural order is very fast anyway.  When you use a tailable cursor, the
> cursor scans from the oldest documents in the capped collection to the
> newest, in on-disk order, returning any documents it encounters that match
> the query filter.  Also, you should be aware that you cannot delete
> documents from a capped collection, you must simply wait for them to be
> removed once the capped collection is full and old documents start being
> removed automatically.  You can find more about the usage and restrictions
> of capped collections here:http://www.mongodb.org/display/DOCS/Capped+Collections#CappedCollecti...
> > > >www.mongodb.org/display/DOCS/Tailable+Cursorshassomeexamples of
> > > > > doing this.
>
> > > > > On Sun, May 13, 2012 at 2:57 AM, Saar Korren <s...@a.co.il> wrote:
> > > > > > Okay, just a couple more questions:
> > > > > > 1. It is mentioned tailing cursors do not use indexes. I assume
> > this
> > > > > > only refers to the sorting order, but like I said, the
> > documentation
> > > > > > is confusing. Is it still possible to tail a collection based on a
> > > > > > specific value query?
> > > > > > 2. While I could probably find this out myself through some
> > > > > > experimentation, it could save time if you could answer this: Is it
> > > > > > possible to produce a tailable cursor from an insert? That is,
> > insert
> > > > > > a dummy document, and then wait for documents that are inserted
> > > > > > "approximately after" the inserted document? (Preferably with some
> > > > > > additional constraints)
> > > > > > 3. Do tailable cursors support sharding? (Assuming the tail is on a
> > > > > > single shard, of course. It would just make more sense to have a
> > > > > > single collection with several tailable shards than to have a
> > separate
> > > > > > collection for each blockable event)
> > > > > > 4. The
>
> ...
>
> read more »

Spencer T Brody

unread,
May 28, 2012, 11:57:36 AM5/28/12
to mongod...@googlegroups.com
That's an interesting idea, I'm not sure how difficult it would be to implement, or how much gain you would get from it since you'd likely still need to iterate through the extents to find the appropriate starting disk location, but it would certainly be better than returning all the data for the first part of the collection if it isn't needed.  If this is something that would be helpful for you to have in a future release, you should open a feature request at jira.mongodb.org.

> ...
>
> read more »

Reply all
Reply to author
Forward
0 new messages