Limit by id

F21

May 17, 2013, 10:28:18 PM

to aran...@googlegroups.com, f21.g...@gmail.com

I have a query where I sort some data, for example:

FOR x IN mycollection
SORT x.someproperty
RETURN x

Before I return, I want to limit the results. Normally LIMIT X, Y would work, but in this case I do not want to perform simple paging. I want to get a list of sorted documents and then return X documents starting from the document whose id is Y. This is because I am implementing an activity feed, and because new items are added at the front of the feed, LIMIT X, Y would not work.
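To illustrate the problem described above, here is a minimal Python sketch of offset-based paging over a feed that grows at the front. The list `feed` and the helper `page` are hypothetical stand-ins for the sorted collection and for LIMIT offset, count; they are not part of AQL.

```python
# Hypothetical in-memory feed, newest first; the integers stand in for documents.
feed = [5, 4, 3, 2, 1]


def page(items, offset, count):
    """Emulate LIMIT offset, count on an already sorted list."""
    return items[offset:offset + count]


# First request: the two newest items.
first = page(feed, 0, 2)

# Two new items arrive at the front of the feed before the next request.
feed = [7, 6] + feed

# Second request with offset 2: the offset now points at the same items
# that were already delivered on the first page.
second = page(feed, 2, 2)

print(first, second)
```

Both pages contain the same items, which is why a fixed offset does not work for a feed that receives new entries at the front.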

Essentially, I need to implement something like this: https://dev.twitter.com/docs/working-with-timelines

Is this something that's possible to implement with AQL?

Jan Steemann

May 21, 2013, 3:03:46 AM

to aran...@googlegroups.com, F21

I think you will need some index on the date/time an entry is added to
the feed. Incoming entries can then be retrieved using that index.

The first time you query the feed, when you do not yet know what date &
time to restrict to, the query would be:

FOR x IN mycollection
FILTER x.userId == ...
SORT x.dt DESC
LIMIT y
RETURN x

Next time you query the feed, you can pass the "_id" and "dt" attribute
of the bottommost x element you got:

FOR x IN mycollection
FILTER x.userId == ... && x.dt <= $lastDt && TO_NUMBER(x._id) < $lastId
SORT x.dt DESC
LIMIT y
RETURN x

Then you should see only unique elements on each page.
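The cursor idea above can be sketched in plain Python. This is an in-memory illustration under stated assumptions, not ArangoDB code: the field names `dt` and `key`, the sample documents, and the helper `feed_page` are all hypothetical. Note that the sketch compares the pair (dt, key) as a tuple, which is a slightly stricter variant of the combined filter in the query above.

```python
# Hypothetical documents: "dt" is the time the entry was added to the feed,
# "key" is a numeric stand-in for the document key.
docs = [
    {"key": 1, "dt": 100},
    {"key": 2, "dt": 100},
    {"key": 3, "dt": 101},
    {"key": 4, "dt": 102},
]


def feed_page(items, count, last_dt=None, last_key=None):
    """Return up to `count` items, newest first, strictly older than the cursor."""
    ordered = sorted(items, key=lambda d: (d["dt"], d["key"]), reverse=True)
    if last_dt is not None:
        # Keep only items that sort before the last item already delivered.
        ordered = [d for d in ordered if (d["dt"], d["key"]) < (last_dt, last_key)]
    return ordered[:count]


# First request: no cursor yet.
first = feed_page(docs, 2)

# A new entry arrives at the front of the feed.
docs.append({"key": 5, "dt": 103})

# Second request: pass the values of the bottommost item of the first page.
last = first[-1]
second = feed_page(docs, 2, last["dt"], last["key"])

print([d["key"] for d in first], [d["key"] for d in second])
```

Because the second request is anchored to the last delivered item rather than to an offset, the newly added entry does not shift the page boundaries.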

F21

May 23, 2013, 9:26:09 PM

to aran...@googlegroups.com, F21, j.ste...@triagens.de

Hi Jan,

Thanks for your help. It is interesting how you filter on the _id property. Are _id properties always incrementing (a newer document will always have a greater _id than the previous one) and never reused?

This is assuming we do not set any keyOptions on the collection.

Cheers!

Jan Steemann

May 24, 2013, 3:01:50 AM

to aran...@googlegroups.com, F21

The "_id" attribute is composed of the collection name, a "/" and the
document's "_key" attribute.
That means "_id" is a string with the same prefix for all documents in
the same collection.

It would be better to filter on "_key" instead, because it is shorter
and does not contain the always-same prefix, just the document key that
you assigned (or the database assigned) when the document was created.
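As a small illustration of the composition described above, the following Python sketch splits an "_id" string into its two parts. The helper name `split_id` and the sample value are hypothetical.

```python
def split_id(doc_id):
    """Split an "_id" of the form "<collection>/<key>" into (collection, key)."""
    collection, key = doc_id.split("/", 1)
    return collection, key


# Hypothetical "_id" value for a document in a collection named "mycollection".
collection, key = split_id("mycollection/12345")
print(collection, key)
```

The collection part is identical for every document in the collection, so only the key part carries information that distinguishes documents.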

The "_key" attribute is a string. If you don't assign a "_key" value
yourself, the database will create one for you. The "_key" values the
database creates are numeric values that are ever-increasing: newer
documents get higher numeric values than older ones.
However, as you can mix your own keys with database-assigned keys, the
"_key" attribute is still a string.

Sorting on "_key" will do a string sort, and this causes the usual
problems (e.g. "2" > "100"). If you want to treat keys as numbers and
sort by that, you need to convert them to numbers first and then sort.
That might not be efficient and might not work if you use any
non-numeric keys.
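The string-versus-number sorting issue can be shown with a short Python sketch. The key values and the helper `to_number_or_none` are made up for illustration; Python's string comparison is used here only as an analogy for a string sort on "_key".

```python
# Hypothetical key values, stored as strings.
keys = ["2", "100", "31"]

# String sort compares character by character, so "100" comes before "2".
as_strings = sorted(keys)

# Converting to numbers first gives the expected numeric order.
as_numbers = sorted(keys, key=int)


def to_number_or_none(key):
    """Convert a key to an int, or return None if the key is not numeric."""
    try:
        return int(key)
    except ValueError:
        return None


print(as_strings, as_numbers, to_number_or_none("my-own-key"))
```

The last call shows why numeric conversion breaks down once user-assigned, non-numeric keys are mixed in: there is no number to sort by.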

In the end, filtering on "_key" using equality should be efficient, but
any other operations on "_key" which cannot use equality checks (such as
range comparisons, sorting) are probably not "right".

From my point of view it will make sense to eventually add some
attribute or functions which can be used to retrieve documents in
insertion or update order. That might solve a lot of practical problems.

Best regards
Jan

