Natural Order not guaranteed - what does it mean?


jdill

Sep 2, 2010, 11:43:49 AM
to mongodb-user
I am trying to decide if I need to use a capped collection in the
following situation.

We have a 'tags' collection that receives potentially 1,000 inserts a
second.

We have clients that will be polling this data every second.

Each time the client receives records, it needs to advance its
'start' (skip) position to be able to grab the next set of records.
Since in any given second, a client could grab, say, only 1/2 of the
records that are going to be inserted in that second, next time it
comes back, I need to be sure that it will grab the rest of what was
inserted in that second.

This means I need to either rely on "natural order" or "sort" by
something that increments on every insert...is this correct?

I see that in the docs it says "although the order is often close to
insertion order, it is not guaranteed to be". I also see that
objectId has the timestamp and counter "inc" that "ensure a MOSTLY
increasing order".

Question: So under what circumstances would sorting by objectId fail
to ensure natural order? We are inserting data into a single master
server, but with multiple threads / connections inserting at same
time.

I don't really want to use a capped collection if I don't have to,
because of the update/drop limitations.

Thanks in advance.

Jeremy

Kyle Banker

Sep 2, 2010, 12:02:49 PM
to mongod...@googlegroups.com
If you look at the ObjectId specification, you'll see that they're essentially precise to the second, as the first four bytes store seconds since the epoch. Beyond that, you're relying on values like machine id, process id, and a counter. Plus, there's latency between the time an object id is created and the time the document is committed to disk, since object ids are constructed on the client side. That largely explains why object id order isn't guaranteed to match natural order.
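To make the layout Kyle describes concrete: under the classic ObjectId spec, the 12 bytes split into a 4-byte timestamp, a 3-byte machine id, a 2-byte process id, and a 3-byte counter. A minimal sketch in plain Python (the hex id below is fabricated for illustration, not a real id from this thread):

```python
import datetime

def decode_object_id(hex_id):
    """Split a 12-byte ObjectId hex string into its classic fields."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("ObjectId must be 12 bytes")
    return {
        "timestamp": int.from_bytes(raw[0:4], "big"),   # seconds since epoch
        "machine":   int.from_bytes(raw[4:7], "big"),   # machine id hash
        "pid":       int.from_bytes(raw[7:9], "big"),   # process id
        "counter":   int.from_bytes(raw[9:12], "big"),  # per-process counter
    }

# A fabricated example id. Note the timestamp is only second-granular:
# ids created within the same second are ordered by the remaining fields,
# which is why the ordering is only "mostly" increasing.
fields = decode_object_id("4c7fb0c5aabbcc0102000001")
created = datetime.datetime.fromtimestamp(fields["timestamp"],
                                          datetime.timezone.utc)
```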

You shouldn't have any trouble sorting by object id, but you of course need to make sure that no two clients process the same documents. I can think of a handful of ways you could accomplish this, but I would need to know more about your use case.


--
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To post to this group, send email to mongod...@googlegroups.com.
To unsubscribe from this group, send email to mongodb-user...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=en.


jdill

Sep 2, 2010, 12:40:52 PM
to mongodb-user
Thanks Kyle,

So where does "counter" come from? Does it increment? In any given
timestamp second, could there be a duplicate "counter" value?

I am preventing insertion of duplicate documents, so I don't think we
would run into any problem with more than one client processing same
document.

The problem I see is this: if there is a delay between generating the
_id and writing the document, then when I query sorted by objectId
there is no way to be sure that, at any given instant, I am getting a
correctly ordered list (complete enough to trust that I have all
records between the first and last in the result). New items could
still sneak into the result later, which would throw the skip position
out of whack. Skip might cause new records to be missed, and it could
also return a record I already received, since the same query might
now match more documents. Right?
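This race is easy to model in memory. A toy sketch (plain Python, no MongoDB involved) of skip-based paging over an _id-sorted view, where one document's _id was generated early but committed late:

```python
def page(ids, start, limit=10):
    """Model of find().sort('_id', 1).skip(start).limit(limit)."""
    return sorted(ids)[start:start + limit]

committed = {"id-01", "id-02", "id-04"}   # id-03 was generated first
                                          # but hasn't been committed yet

first = page(committed, start=0)          # client sees 3 documents
next_start = len(first)                   # advances its skip to 3

committed.add("id-03")                    # the late insert finally lands

second = page(committed, start=next_start)
# second == ['id-04']: the client receives id-04 a second time,
# and id-03 (now at position 2, behind the skip) is never seen.
```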



Kyle Banker

Sep 2, 2010, 12:57:57 PM
to mongod...@googlegroups.com
See below:

On Thu, Sep 2, 2010 at 12:40 PM, jdill <jer...@dilltree.com> wrote:
> Thanks Kyle,
>
> So where does "counter" come from?  Does it increment?  In any given
> timestamp second, could there be a duplicate "counter" value?

There will never be a duplicate counter value for a given client doing inserts (machine id, process id), but across clients, there will be duplicates, of course.

> I am preventing insertion of duplicate documents, so I don't think we
> would run into any problem with more than one client processing same
> document.
>
> Problem I see is, if there is a delay between generation of _id and
> writing document, when doing a query sorting by objectId, there is no
> way to be sure that in any given instant I am getting correctly
> ordered list (enough to trust that I have all records between first
> and last in result).  New items could finally sneak into result, which
> will throw skip command out of wack.  Skip might cause missing of new
> records, and also would return a record I already received since there
> might be more now in the same query. Right?


Yes, you're technically correct. One solution would be to ensure the data is committed by periodically issuing a call to
getLastError, and then make sure that when you process the data, you're staying well behind that commit point. Of course,
this requires that the clocks on all your servers are in sync. With some experimentation and creativity, an approach like this
could probably work.

The most rigorous way to handle this would be to use findAndModify to update each document when it's processed, but that
is more costly.
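Kyle's stay-behind-the-commit idea could be sketched by deriving a boundary _id from a cutoff time and querying only below it, exploiting the fact that an ObjectId's leading four bytes are its timestamp. A hypothetical helper (the window length and names here are made up, not from the thread):

```python
import time

SAFETY_WINDOW = 5  # seconds; should comfortably exceed commit latency

def boundary_hex(now=None):
    """Hex ObjectId whose timestamp is (now - SAFETY_WINDOW) and whose
    remaining bytes are zero; every id below it is 'old enough'."""
    cutoff = int(now if now is not None else time.time()) - SAFETY_WINDOW
    return cutoff.to_bytes(4, "big").hex() + "00" * 8

# The client would then issue something like
#   find({'_id': {'$lt': ObjectId(boundary_hex())}}).sort('_id', 1)
# so that no late insert can still land in front of the returned batch
# (assuming server clocks are reasonably in sync, as Kyle notes).
```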
 

jdill

Sep 2, 2010, 1:00:45 PM
to mongodb-user
A capped collection is looking better all the time. I think natural
order is my best bet here.

Markus Gattol

Sep 2, 2010, 4:09:43 PM
to mongod...@googlegroups.com
How strict does your ordering have to be? Would it matter if, in some
situations, the order were off by just a few or even just one
document[0]?

I think you could (depending on your use case) turn to capped
collections and just "assume" insertion order is the same order the
documents were created in (even if that is only roughly true).

I have some use cases where being a bit off in terms of ordering by
creation time is no problem, and the speed gain makes the tradeoff
well worth it.

[0] http://www.markus-gattol.name/ws/mongodb.html#natural_order

jdill

Sep 24, 2010, 12:21:48 PM
to mongodb-user
Sorry to drag this back up, but I am finally getting around to fully
resolving this.

When you ask how strict the ordering has to be, I think the answer is
'very strict'. The notion is really simple: I have an API, and
clients will do polling. The polling is documented like this:

===================
"Polling" can be done for splits. Polling in our case just means going
back to see what is new since the last time you received any results.

Simply set the start parameter to "last" + 1.

For example, let's say you have been polling the following, and the
last time you got results, the "last" was 18. Your next request should
look like this:

/events/[EVENT NAME]/categories/elite-women/splits/5K?start=19

Depending on how much data you expect to receive, and how large of a
response you want to be able to handle, this may be a good time to use
max=0 so that you can get all new records without having to call the
"next" URL.

/events/[EVENT NAME]/categories/elite-women/splits/5K?start=19&max=0

NOTE: If there are no results returned, you will receive an error
object with a type of "no_results" on each request.
===================
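The polling contract documented above is simple to exercise. A toy client sketch (the `fetch` callable stands in for the HTTP request to `...?start=N&max=0`; the names and the in-memory "server" are made up for illustration):

```python
def poll(fetch, last_seen):
    """One polling round: request everything from last_seen + 1 onward.
    Returns the new 'last' position and the new records."""
    results = fetch(last_seen + 1)        # -> list of (position, record)
    if not results:
        return last_seen, []              # server would answer "no_results"
    return results[-1][0], [rec for _, rec in results]

# Toy server state: positions are 1-based and never reorder.
splits = [(i, f"runner-{i}") for i in range(1, 25)]
fetch = lambda start: [(p, r) for p, r in splits if p >= start]

last, batch = poll(fetch, last_seen=18)   # the example from the docs
# batch now holds positions 19..24 and last has advanced to 24
```

The correctness of this whole scheme rests on positions never reordering, which is exactly the guarantee at issue in this thread.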

So, as you can see, the results of my query (in this case elite-women
at the 5K point) have to be in the same order every time. I then use
skip to advance the query and get the next set of results. The
problem with non-guaranteed order is that, within an instant, if
document #19 (as sorted by objectId) is ABC, and then an insert
completes so that #19 becomes XYZ and ABC shifts to #20, it is
possible for a client to fetch ABC with a start of 19, then issue a
start of 20 and fetch ABC again, never seeing XYZ.

I have to ensure that records are never skipped with a polling
command.

One option that I considered was to only allow one client to insert
records into mongo, one by one, basically using a queue of some
sort. But this is still optimistic (not guaranteed), isn't it?


Thanks,

Jeremy



Kristina Chodorow

Sep 25, 2010, 5:43:03 AM
to mongod...@googlegroups.com
You are correct: MongoDB won't guarantee that it keeps things in insertion order unless you use a capped collection.


Markus Gattol

Sep 25, 2010, 9:20:20 AM
to mongod...@googlegroups.com

[skipping a lot of lines ...]

jdill> One option that I considered was to only allow 1 client to
jdill> insert records into mongo one by one...by basically using a
jdill> queue of some sort. But, this is still considered optimistic
jdill> (not guaranteed), isn't it?

If you use the combination of a *single client inserting into a capped
collection*, then insertion order will match creation order, i.e. if
you move the database cursor across your collection it will see
documents sorted by creation time.

If you deviate from that combo, i.e. several clients and/or a normal
collection, then you probably will not get insertion_order ==
creation_time.

http://www.markus-gattol.name/ws/mongodb.html#natural_order

jdill

Sep 27, 2010, 11:03:38 AM
to mongodb-user
Thanks for taking the time to understand and advise on my issue. I
have tested with capped collections, and it seems that is going to be
my solution for now.

Any chance of a future collection type that isn't capped, but that
somehow supports natural order? It's kind of rough that in order to
get the natural-order feature I have to predefine how large the
collection is going to be. Predicting collection sizes can be
challenging, and if I am wrong (making a collection too small), very
bad things will happen as data starts to purge.
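For what it's worth, the purge behavior jdill is worried about can be pictured with a fixed-size buffer: a capped collection acts like a ring buffer that preserves insertion (natural) order but silently overwrites the oldest documents once full. `collections.deque` with `maxlen` models that behavior in plain Python:

```python
from collections import deque

tags = deque(maxlen=3)   # a collection "sized" for only 3 documents

for doc in ["t1", "t2", "t3", "t4", "t5"]:
    tags.append(doc)     # once full, each insert evicts the oldest doc

# Iteration still follows insertion order, but t1 and t2 are gone:
# list(tags) == ['t3', 't4', 't5']
```

Undersizing the collection means exactly this: the oldest, possibly still-unpolled records disappear without any error.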