Does Arango AQL support what is similar to EXIST in SQL?

Aymer

unread,

Oct 30, 2015, 11:17:14 AM10/30/15

to ArangoDB

Hi,

The following join:-

FOR d IN col-1

FOR w IN col-2

FILTER w.uid == 'uid1' && d.attr1 == w.attr2

return d

results in duplicate since there are multiple records in col-2 matches this condition d.attr2 == w.attr1 and this is a situation where EXISTS in SQL help.

Anything idea to avoid duplicates? as a workaround I'm using DISTINCT while returning, but its very slow.

Thanks

Aym

Jan Steemann

unread,

Nov 4, 2015, 8:03:22 AM11/4/15

to aran...@googlegroups.com

Hi,

I tried to reproduce the problem using the following setup of data:

db._create("col1");
db._create("col2");
db.col2.ensureIndex({ type: "hash", fields: [ "uid", "attr2" ] });
for (i = 0; i < 500000; ++i) {
db.col1.insert({ attr1: i % 10000 });
}
for (i = 0; i < 500000; ++i) {
db.col2.insert({ uid: "uid" + (i % 10000), attr2: i % 10000 });
}

That produces 500,000 documents in collections `col1` and `col2` each, with a few matches between them.

Running the original query produces 2,500 matches, including duplicate ´d`s:

FOR d IN col1
FOR w IN col2

FILTER w.uid == 'uid1' && d.attr1 == w.attr2

RETURN d

This query took 2.92 seconds to run on my local machine in 2.7.

When adding a duplicate removal via DISTINCT, the query text slightly changes but runs in about the same time (2.94 seconds):

FOR d IN col1
FOR w IN col2

FILTER w.uid == 'uid1' && d.attr1 == w.attr2

RETURN DISTINCT d

The `EXISTS` keywords does not exist in AQL, but it can be substituted using a subquery and a regular FILTER. Such query will iterate over each `d` only once, and filter out the `d`s for which no match in `w` exists:

    FOR d IN col1
      LET matching = (
        FOR w IN col2

FILTER w.uid == 'uid1' && d.attr1 == w.attr2

          RETURN 1
        )
        FILTER LENGTH(matching) > 0
        RETURN d

The above query ran in 4.98 seconds.

So yes, there is an equivalent to `EXISTS` but at least with the test data I used it does not provide a benefit over using `DISTINCT`.
In fact, the `DISTINCT` can be quite fast if it can use the "hash" method internally. But it may be different with other input data.

As an aside: I also ran the above queries on the same data set with 2.8 (currently in devel), and the execution times for the first two queries went down to 1.25 seconds (from 2.9x seconds) and down to 3.06 seconds from 4.98 seconds for the `EXISTS` query.

I hope this helps.

Best regards

Jan

--
You received this message because you are subscribed to the Google Groups "ArangoDB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to arangodb+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

CoDEmanX

unread,

Nov 4, 2015, 6:48:27 PM11/4/15

to ArangoDB

Is that without indexes in all cases? (Just to clarify)

Jan

unread,

Nov 5, 2015, 3:35:27 AM11/5/15

to ArangoDB

My example contained the following index as can be seen in my setup code above:

db.col2.ensureIndex({ type: "hash", fields: [ "uid", "attr2" ] });

Not sure about the original author's query though.

Aymer

unread,

Nov 12, 2015, 7:45:28 AM11/12/15

to ArangoDB

Thanks Jan,

I'll continue with distinct for now.

Rgds

Aym

Reply all

Reply to author

Forward