Index not used for query

27 views
Skip to first unread message

Andreas Jung

unread,
May 19, 2019, 9:29:45 AM5/19/19
to ArangoDB
Running ArangoDB 3.4.5 on a collection with 100.000 items.
Each item has singe-valued field _type and there are about 20 different values for this field.
The field is indexed by a hash index: https://nimb.ws/yh4rRb

This query performs a full table scan instead of using the index.
Am I missing something here?

For doc in import
    return distinct doc._type

Query String:
 For doc in import
     return distinct doc._type

Execution plan:
 Id   NodeType                   Est.   Comment
  1   SingletonNode                 1   * ROOT
  2   EnumerateCollectionNode   95238     - FOR doc IN import   /* full collection scan, projections: `_type` */
  3   CalculationNode           95238       - LET #1 = doc.`_type`   /* attribute expression */   /* collections used: doc : import */
  4   CollectNode               76190       - COLLECT #3 = #1   /* distinct */
  5   ReturnNode                76190       - RETURN #3

Indexes used:
 none

Optimization rules applied:
 Id   RuleName
  1   reduce-extraction-to-projection


Jan

unread,
May 20, 2019, 2:39:47 AM5/20/19
to ArangoDB
Hi,

as of ArangoDB 3.4, `RETURN DISTINCT` (which is turned into a `COLLECT` statement by the optimizer) is not able to make use of indexes.
The same is true for any other form of `COLLECT`. Indexes will be used for filtering and sorting, but right now they are not used for any sort of aggregation.

Best regards
Jan

Andreas Jung

unread,
May 20, 2019, 2:42:38 AM5/20/19
to aran...@googlegroups.com
So what is the right way for performing a fast aggregation and grouping? Search views?

-- 
Andreas Jung
--
You received this message because you are subscribed to the Google Groups "ArangoDB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to arangodb+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/arangodb/2c5efa6f-f1e6-4db9-8f18-07e19997e92e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Jan

unread,
May 20, 2019, 3:10:14 AM5/20/19
to ArangoDB
Hi,

I can't tell if views will handle this query very differently, but I doubt that when using a view the extra `COLLECT` step will be removed.
That's not to say that there will be no performance difference when using views (I simply don't know). But when it comes to the `COLLECT`, I guess it will still stay around in the query even if a view is used.

I guess the query could be most improved by implementing the distinct operation directly on the index, which would remove the extra `COLLECT` step entirely.
But that functionality is still a future to-do and not available yet.

Best regards
Jan

Am Sonntag, 19. Mai 2019 15:29:45 UTC+2 schrieb Andreas Jung:

filtered

unread,
May 20, 2019, 3:58:12 AM5/20/19
to aran...@googlegroups.com
The answer does not convince me.

Are you saying that a full table scan is always required e.g. when you want to determine
the distinct values of a particular field and count the related documents with this value independent of 
the existence of an index or not?

--
You received this message because you are subscribed to the Google Groups "ArangoDB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to arangodb+u...@googlegroups.com.

Jan

unread,
May 20, 2019, 5:07:28 AM5/20/19
to ArangoDB
 I was trying to say that in the current version it will always do a full scan to find the distinct values, as the indexes currently do not participate in any operations such as DISTINCT / COLLECT.
What I was also trying to say is that the best remedy for this would be to implement the DISTINCT operation directly in the index scan, but that this is a feature that is not there yet.

Best regards
Jan

Am Montag, 20. Mai 2019 09:58:12 UTC+2 schrieb filtered:
The answer does not convince me.

Are you saying that a full table scan is always required e.g. when you want to determine
the distinct values of a particular field and count the related documents with this value independent of 
the existence of an index or not?

To unsubscribe from this group and stop receiving emails from it, send an email to aran...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages