How to design a query that needs to limit at a value of a sum

99 views
Skip to first unread message

Mike

unread,
Jan 14, 2018, 3:11:10 PM1/14/18
to ArangoDB
Hi, 
this is my problem, i have a list of documents from a query, each has an age attribute, my challenge is that i need to work on (update in place)
a few whose age is <= a total age say "Z",  so the first n that sum to  less than or equal to "Z", and not bother with the rest, how would i do this? Z is usually big enough to include at least one person, usually much more. 
"FOR  unselected in persons FILTER doc.selected==false  FOR person  IN unselected    {how to limit to only a few}  UPDATE  person  WITH {"selected": true} IN people  RETURN NEW" 
esentially choosing a few documents at a time the sum of whose ages are leq Z. 
any clever ideas?

Simran Brucherseifer

unread,
Jan 15, 2018, 6:39:36 AM1/15/18
to ArangoDB
Hi Mike,

so your data might look like

{ "name": "John", "age": 35 }
{ "name": "Lisa", "age": 8 }
{ "name": "Tim", "age": 17 }
{ "name": "Marry", "age": 84 }

and you want to set { "selected": true } on them until the sum of ages reaches Z, let's say 50.

John would be selected (sum = 35), Lisa as well (sum = 35 + 8 = 43), but not Tim (sum = 35 + 8 + 17 = 60) and thus also not Marry.

LIMIT can not be a runtime-evaluated expression, and a summation with COLLECT AGGREGATE can not be stopped with a FILTER (will only post-filter result).
Thus, the only way to implement this in vanilla AQL that I found is to sum up increasing slices of the data (John, John+Lisa, John+Lisa+Tim, John+Lisa+Tim+Marry), FILTER out all which which exceed Z and return the last "good" slice:

LET data = [
    { "name": "John", "age": 35 },
    { "name": "Lisa", "age": 8 },
    { "name": "Tim", "age": 17 },
    { "name": "Marry", "age": 84 }
]

FOR i IN 1..LENGTH(data)
    FILTER SUM(SLICE(data, 0, i)[*].age) <= 50
    RETURN data[i-1]

This is not particularly efficient, but should work for tiny datasets. If you want to do this on larger datasets, I would recommend to write a JavaScript transaction that requests all documents, but only processes the first n (until the age sum reaches Z), then update those documents. If you know that the maximum number of documents will never be above e.g. 100, then you can apply a limit in the initial query to improve server performance.

Two things that crossed my mind:
- You don't have a SORT statement in your query, which means the order in which documents are returned and selected may change between any two queries. Doesn't it matter for your use case which documents get selected?
- You want to UPDATE certain documents with a "selected" attribute. What do you want to use this for though? Would it be possible run a single query server-side, and select and process the rest on the client-side maybe?

Mike Atambo

unread,
Jan 20, 2018, 3:17:06 PM1/20/18
to aran...@googlegroups.com
Wow, 
nice,  first  there is no  sort, since it does not matter which ones are selected, any previously unselected document can be chosen. 
Im using this in an AQL  TRANSACTION, where i pick a set of documents whose sum of age does not exceed some value, and mutate them atomically and return them to the client (its important that they are acted on atomically). 

Mike

--
You received this message because you are subscribed to a topic in the Google Groups "ArangoDB" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/arangodb/BWTNkZ9e9W4/unsubscribe.
To unsubscribe from this group and all its topics, send an email to arangodb+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Reply all
Reply to author
Forward
0 new messages