AQL list add/remove functions

372 views
Skip to first unread message

Tomas Bosak

unread,
Oct 23, 2014, 10:58:17 AM10/23/14
to aran...@googlegroups.com
Hi folks,

does AQL support adding and removing of list elements? For example something like this:

LET array = []
array.add(1)
array.add(2)
array.remove(0)
RETURN array

will return [ 2 ]

Jan Steemann

unread,
Oct 23, 2014, 12:02:01 PM10/23/14
to aran...@googlegroups.com
Hi Tomas,

not yet.
I think if they are added, they should work something like this (please
ignore the function names for now):

LET array = [ ]

LET temp1 = ADD(array, 1), temp2 = ADD(temp1, 2), temp3 = REMOVE(temp2, 0)

RETURN temp3


The reason is that on the main level, each statement must be one of FOR,
LET, RETURN etc. A simple "array.add(...)" wouldn't work on the top
level, but only inside expressions.
Additionally, an AQL function should treat its input as immutable data,
and if that data are going to be modified, it needs to be returned in a
new variable. This is sensible as the query optimizer can clearly find
dependencies between variables and statements and shuffle them around
only when sensible, without unintentionally modifying query results.

I think if we follow this approach, it would work. What do you think?

Best regards
Jan
> --
> You received this message because you are subscribed to the Google
> Groups "ArangoDB" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to arangodb+u...@googlegroups.com
> <mailto:arangodb+u...@googlegroups.com>.
> For more options, visit https://groups.google.com/d/optout.

Tomas Bosak

unread,
Oct 23, 2014, 1:57:52 PM10/23/14
to aran...@googlegroups.com, j.ste...@triagens.de
Seems legit, thanks Jan.

One more question: any updates regarding AQL support for conditional logic through native IF/ELSE/SWITCH statements?

Jan Steemann

unread,
Oct 24, 2014, 8:46:13 AM10/24/14
to aran...@googlegroups.com
Hi Tomas,

we have refactored a lot of the AQL internals in the devel branch
recently. We have decided to not introduce too much control flow logic
into AQL with dedicated statements, at least not something full-featured
like this:

IF someCondition THEN
FOR i IN ...
RETURN i
ELSE
FOR j IN ...
REMOVE j IN ...
ENDIF

The reason for not introducing "real" control flow statements is that it
would make query plan execution and distribution a nightmare. For
example, in the above case, the query would become either a retrieval or
a removal query, depending on the condition. This would be really
problematic for our implementation.

Allowing control flow on sub-levels would also be problematic. Consider
the following query:

FOR doc IN ...
IF someConditionBasedOnDoc THEN
FILTER someFilterConditionForDoc...
ENDIF

RETURN i

In the above query, index usage would be disabled if the condition
depends on the variable produced by the FOR loop. There might also be
further side-effects of control flow that would disable optimizations.


I think the only sensible way to have if/then/else or switch/case
functionality in AQL is by adding them as functions, e.g.

FOR i IN ...
RETURN IF(someCondition, trueValue, falseValue)

LET cases = { "foo": fooValue, "bar": barValue }
LET defaultValue = "not found!";

FOR i IN ...
RETURN SWITCH(value, cases, defaultValue)

What do you think about this?

Best regards
Jan
> > an email to arangodb+u...@googlegroups.com <javascript:>
> > <mailto:arangodb+u...@googlegroups.com <javascript:>>.
> > For more options, visit https://groups.google.com/d/optout
> <https://groups.google.com/d/optout>.
>

Tomas Bosak

unread,
Oct 25, 2014, 5:21:50 AM10/25/14
to aran...@googlegroups.com, j.ste...@triagens.de
Thanks for explanation Jan. Can you please provide more examples on the IF function? Because in the form of

    IF(someCondition, trueValue, falseValue) 

it doesn't really seem much different than ternary operator.

The problem I'm usually facing is that I need to store the result of some subquery into variable where the subquery code is dependent on previous condition, for example I now use this construct:

FOR item IN SomeCollection
    RETURN (item.someField == 1) ? (
        FOR subItem IN SomeCollection2
            FILTER subItem.fieldX == true ...
            RETURN subItem
    ) : (
        FOR subItem IN SomeCollection3
            FILTER subItem.fieldY == false ...
            RETURN subItem
    )

As you can see ternary operator works here, but if I would need to deal with three or more different subqueries then it would get more complicated and the query itself would contain a lot of "noise".

Dan Donoghue

unread,
Nov 9, 2014, 6:55:05 PM11/9/14
to aran...@googlegroups.com
If anyone's interested in adding or removing from an array, you can do it with the UNION and SLICE functions. I don't know if there's a better way of doing this

Adding:

LET doc = DOCUMENT( 'collection', 'document' )
UPDATE doc
WITH
{ myArray:
  UNION
(
    doc
.myArray,
   
[ @newItem ]
 
)
}
IN collection


Removing:

LET doc = DOCUMENT( 'collection', 'document' )
LET pos
= POSITION( doc.myArray, @remItem, true )
UPDATE doc
WITH
{ myArray:
  UNION
(
    SLICE
( doc.myArray, 0, pos ),
    SLICE
( doc.myArray, pos + 1 )
 
)
}
IN collection

Jan

unread,
Nov 13, 2014, 7:31:19 AM11/13/14
to aran...@googlegroups.com
Hi Dan,

thanks for sharing!
I think what you did is the way to go in the absense of dedicated add-to-list or remove-from-list AQL functions.
I think we will add such functions in the 2.3 branch, as they provide a lot of benefit.

Best regards
Jan

Jan

unread,
Nov 13, 2014, 7:56:14 AM11/13/14
to aran...@googlegroups.com, j.ste...@triagens.de
Hi Tomas,

you're right, a potential IF() function would be the function equivalent to the ternary operator. A potential SWITCH() function could be the equivalent to a switch/case statement in a programming language.
There is one very important difference though: Both functions would work in case the different branches are just simple expressions.
Here are two examples how they could work:

LET a = IF(x == 1, 'is one', 'is not one')
LET b = SWITCH(x, [ x == 1, 'foo' ], [ x == 2, 'bar' ], [ x == 4, 'baz' ], 'unknown')

In general, these functions would also work if the expressions would be subqueries. The problem is that if subqueries are passed as function parameters, they will need to be executed.
The following SWITCH would thus need to execute all the subqueries first and then pick the correct branch, which isn't ideal:

LET b = SWITCH(x, [ x == 1, (FOR ... RETURN ...) ], [ x == 2, (FOR ... RETURN ...) ], [ x == 4, (FOR ... RETURN ...) ], 'unknown')

Adding "real" control flow functionality with conditional executes will make query optimization and distribution in the cluster much much harder.
So currently I can't think of something sensible that supports "real" control flow and allows for efficient execution at the same time.

In this case it might even be better to split the one query into two:
- in the first query, fetch the value that would have driven the SWITCH condition
- have the application issue a dedicated second query, based on the return value of the first

Obviously that means two queries instead of one, but it might be more efficient. If two client/server roundtrips are overkill, the small bit of decision logic could also be put into a server-side action that issues the first and second query on the server side, avoiding the extra roundtrip.


What do you think?

Best regards
Jan

Tomas Bosak

unread,
Nov 18, 2014, 10:33:19 AM11/18/14
to aran...@googlegroups.com, j.ste...@triagens.de
Hi Jan,

so, due to query optimization complexity executing subqueries within IF/SWITCH statements would be hard and each subquery will have to be executed first before applying decision logic from flow control statements. In other words the problem isn't in decision logic itself, but in subsequent subqueries that would have to be called if the control flow branch is active. Do I understand it correctly?

I think the solution with two/multiple queries is feasible, though queries that consist of several nested loops where e.g. each FOR statement is followed by some decision logic with multiple subqueries creates complex scenarios where part of the business logic is in AQL and part on the application/server side. And I think this is the main issue - AQL gives you the power to offload significant part of your application/server side business logic related to persisted data without having to issue multiple requests (when compared for example to SQL counterpart), however without 'real' control flow functionality you are forced to break up the chain of this logic partially into AQL and application/server side.

I know that there is another solution - use Foxx/actions. I have to say I'm not a big fan of it (although I consider it a good concept) because it's tied primarily to JavaScript and although it provides control flow closer to the database server, you still have to break up business logic relevant to data between AQL and application itself.

Would it be possible to put subqueries into temporary functions which would be optimized/executed only once called/activated?

Jan Steemann

unread,
Nov 20, 2014, 6:12:55 AM11/20/14
to aran...@googlegroups.com
Hi Tomas,

we have been thinking a bit about conditional execution of calculations
and of subqueries. Currently, our idea for this looks like this:

A LET statement will always assign an expression result to a variable,
so it will need to evaluate its expression, always. If an expression
contains a subquery, the subquery will be executed first, always. There
is no simple way to make the subquery execution conditional, because the
subquery and the calculation (LET assignment) are split into two
independent processing steps.

However, if we introduced a conditional LET statement, we could make
evaluation lazy. A potential syntax could look like this in general:

LET variable = expression IF condition

And with a more concrete example:

LET a = (FOR i IN ... RETURN i) IF type == 'foo'

This could be split into the following nodes:

- a calculation that computes the result for the condition (normally
this will be rather cheap)
- a subquery that will only be executed for values which have the
condition variable set to true or a tru-ish value
- a calculation node that assigns the value to variable "a". this will
only be executed for values which have the condition variable set to
true or a tru-ish value

I think this solution would allow conditional execution, saving the
overhead of unnecessarily executed subqueries. And it could be used for
other things than subqueries, e.g. to avoid expensive function calls etc.

What do you think?

Best regards
Jan


> <mailto:arangodb+u...@googlegroups.com>.
> For more options, visit https://groups.google.com/d/optout.

Tomas Bosak

unread,
Nov 25, 2014, 4:35:16 AM11/25/14
to aran...@googlegroups.com
Hi Jan,

conditional LET statement looks like a very interesting concept. Would it be possible to also use ELSE IF/ELSE statements with this approach? e.g. something like this:

FOR item IN Items
    LET a = (
        FOR otherItem IN OtherItems
            FILTER otherItem.Parent == item._id
            RETURN otherItem
    ) IF item.Type == 1
    LET a = (
        FOR differentItem IN DifferentItems
            FILTER differentItem.Parent == item._id
            RETURN differentItem
    ) ELSE IF item.Type == 2
    FILTER a != null
    RETURN a

Jan Steemann

unread,
Dec 1, 2014, 8:41:15 AM12/1/14
to aran...@googlegroups.com
Hi Tomas,

I think there would be two problems with that:

1) variables must not be redefined in currently AQL, and in the examples
there are two assignments to variable "a". This will be highly
problematic internally though not necessarily from a user standpoint.

2) ideally these conditional LETs would really be fully independent of
other statements. In your example, the second LET also depends on the
result of the IF condition of the first, and we'd like to avoid that so
the different LETs can be moved around by the optimizer when sensible.


I think it could work as follows:

LET a1 = (
FOR otherItem IN OtherItems
FILTER otherItem.Parent == item._id
RETURN otherItem
) IF item.Type == 1
LET a2 = (
FOR differentItem IN DifferentItems
FILTER differentItem.Parent == item._id
RETURN differentItem
) IF item.Type == 2
LET a3 = (
...
) IF item.Type == 3

LET result = NOT_NULL(a1, a2, a3)

FILTER result != null

This would still be relatively cheap because all that would need to be
done is to evaluate the IF conditions and the assignments for which the
conditions are true-ish. This does not necessarily need to be a single
one as the IF conditions do not need to be mutually exclusive.

What do you think?

Best regards
Jan


Tomas Bosak

unread,
Dec 3, 2014, 10:53:09 AM12/3/14
to aran...@googlegroups.com
Hi Jan,

this looks pretty sensible and should provide enough flexiblity to create AQL queries with more complex control flow logic. Thanks a lot!
Reply all
Reply to author
Forward
0 new messages