On the topic of QueryOption_AwaitData: without it, tailable cursors
don't serve the very purpose I need them for, namely database-controlled,
non-busy blocking. I'll have to wait until it is fixed before I can
switch over from SQL user locks.
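Until then, the only stopgap seems to be client-side polling, which is exactly the busy waiting I'm trying to avoid. A minimal sketch of that fallback, where pollUntil and its injected check, now, and sleep callbacks are hypothetical names used only for illustration (in the mongo shell, the built-in sleep(ms) could be passed as the sleep callback):

```javascript
// Hypothetical fallback: call check() repeatedly until it returns a non-null
// value or timeoutMs has elapsed. The clock (now) and the delay (sleep) are
// injected so the sketch can run without a database or a real timer.
function pollUntil(check, timeoutMs, intervalMs, now, sleep) {
  var deadline = now() + timeoutMs;
  while (true) {
    var result = check();
    if (result !== null) return result;
    var remaining = deadline - now();
    if (remaining <= 0) return null;          // timed out
    sleep(Math.min(intervalMs, remaining));   // never sleep past the deadline
  }
}

// Demonstration with a fake clock: the "work" completes once 10 ms have passed.
var clock = 0;
var polls = 0;
var output = pollUntil(
  function () { polls++; return clock >= 10 ? "done" : null; },
  25,                                // timeout in ms
  5,                                 // polling interval in ms
  function () { return clock; },     // fake now()
  function (ms) { clock += ms; }     // fake sleep()
);
```

In real use, check would wrap something like the db.workqueue.findOne({_id: ..., done: true}) lookup. Every poll is a round trip to the server, which is the cost AwaitData is meant to remove.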
On the topic of skipping: is there any way to obtain a "natural index"
of an object and query by it?
I'm asking because I'm thinking of a pattern like this
(using JavaScript for simplicity):
// Set timeout and job
var timeoutTime = new Date().getTime() + 25;
var work_obj = {input: data, done: false};
db.workqueue.insert(work_obj);
async_call_worker(work_obj._id, data);
try
{
    var work_done_obj;
    while (1)
    {
        // Set up pre-check signal marker
        // Optionally: Try to locate a previous signal marker, in case
        // there are multiple consumers for the work
        // This is only to reduce the marker count. It is valid to have
        // multiple markers
        var cursor = db.signals.find({worker: work_obj._id}).sort({"$natural": -1});
        var sig_obj;
        if (cursor.hasNext())
        {
            sig_obj = cursor.next();
        }
        else
        {
            // No previous marker found. Create a new one
            // Based on the timing, multiple consumers may create multiple
            // markers. This is not an error
            sig_obj = {worker: work_obj._id};
            db.signals.insert(sig_obj);
        }
        // Test if the work was completed already
        // This may be necessary if the work was completed before we
        // inserted our marker above, as it would be inserted after the
        // marker set by the worker
        if ((work_done_obj = db.workqueue.findOne({_id: work_obj._id, done: true})) !== null)
            return work_done_obj.output;
        // Create a tailable cursor iterating over the markers, starting
        // with the one we previously set
        // NOTE: filtering on "$natural" is hypothetical here; it stands in
        // for the "natural index" I'm asking about
        cursor = db.signals.find({worker: work_obj._id,
            "$natural": {"$gte": sig_obj["$natural"]}}).sort({"$natural": 1});
        cursor.addOption(2);  // Tailable
        cursor.addOption(32); // AwaitData
        // Skip the marker we set - it's not interesting
        // Also, handle the case where it was already removed
        if (!cursor.hasNext()) continue;
        cursor.next();
        // Wait for the next marker. Hopefully, it was set by the worker
        while (cursor.hasNext())
        {
            if (timeoutTime - (new Date().getTime()) < 0) return null;
            cursor.next();
            // Check if the work was completed. This marker could have been
            // set by the worker, but also by another consumer
            if ((work_done_obj = db.workqueue.findOne({_id: work_obj._id, done: true})) !== null)
                return work_done_obj.output;
        }
        if (timeoutTime - (new Date().getTime()) < 0) return null;
    }
}
finally
{
    // Remove all signal objects.
    // If there are other consumers for the work, they will receive a
    // dead cursor, and simply create a new signal object.
    db.signals.remove({worker: work_obj._id});
}
With a worker pattern of:
function work_done(work_id, data) {
    // Update the work, marking it as complete
    db.workqueue.update({_id: work_id}, {"$set": {output: data, done: true}});
    // Set up a marker, telling any currently waiting consumers to
    // recheck if the work is done
    // A consumer that has already set its marker would detect this
    // marker in the tail loop, and check if the work is complete, thus
    // being released from the wait
    // A consumer that has not yet set its marker would detect that the
    // work is complete in the check between setting the marker and the
    // wait loop
    // Slight gotcha: If all consumers already died, there is no
    // guarantee this signal would be removed
    db.signals.insert({worker: work_id});
}
(In my actual use case, "async_call_worker" and "work_done" are
cross-server CURL calls, not function calls.)
As you can see, besides avoiding a long scan, querying by natural index
would also reduce calls to db.workqueue.findOne, since it skips any
previously existing entries in the signals collection. I suppose I can
manage the latter by simply adding a loop that "consumes" all documents in
db.signals until reaching sig_obj, but that seems quite inefficient.
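For reference, the "consume until sig_obj" loop I have in mind would look roughly like this. It is only a sketch: skipPastMarker and arrayCursor are hypothetical names, and arrayCursor is just an array-backed stand-in so the example runs outside the shell. It assumes nothing about the cursor beyond hasNext() and next().

```javascript
// Hypothetical helper: advance a cursor-like object past the marker whose
// _id matches markerId. Returns true if the marker was found, false if the
// cursor ran out first (for example, because the marker was already removed).
function skipPastMarker(cursor, markerId) {
  while (cursor.hasNext()) {
    var doc = cursor.next();
    // Compare string forms so object-valued ids can be compared by value.
    if (String(doc._id) === String(markerId)) return true;
  }
  return false;
}

// Array-backed stand-in for a cursor, for illustration only.
function arrayCursor(docs) {
  var i = 0;
  return {
    hasNext: function () { return i < docs.length; },
    next: function () { return docs[i++]; }
  };
}

var cursor = arrayCursor([{_id: 1}, {_id: 2}, {_id: 3}, {_id: 4}]);
var found = skipPastMarker(cursor, 2);
// The cursor is now positioned just after the marker with _id 2.
```

Every document before the marker still has to be fetched and discarded by the client, which is the inefficiency I'd like to avoid.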
P.S.
After considering the above pattern, I'm wondering if removing the
existing marker in the worker thread instead of inserting a new one
might be a viable alternative, as it would produce a dead cursor.
Would that be enough to release the cursor's wait?
On May 21, 6:03 pm, Spencer T Brody <spen...@10gen.com> wrote:
> If you do a query like {"_id":{"$gte":my_obj._id}} with a tailable cursor,
> it will scan through the entire collection in $natural order, returning any
> documents that match the query condition. It will not, however, be able to
> jump directly to the place in the collection where the _ids are greater
> than the value provided as the query will not be using an index.
>
> As for the PHP driver and QueryOption_AwaitData, you've already
> discovered PHP-389. Until that is fixed, you could still use tailable
> cursors, it just means that if you try to fetch more results and no new
> documents have been inserted, the getmore will return immediately instead
> of blocking.