In my project, I have a fairly large prefix tree, potentially containing millions of nodes (about 250K nodes in my development instance), managed in OrientDB (pointing to other vertices in my graph).
The nodes of the prefix tree are represented by a Token vertex type. Each Token has a 'key' property and is connected to its child vertices by a 'child' edge type. So, a sequence like "hello world" would be represented as:
root -child-> "hello" -child-> "world"
Currently, I have a NOTUNIQUE_HASH_INDEX on Token.key and I am querying the data structure like this:
SELECT EXPAND(OUT('child')[key=:k]) FROM :p
where k is the child key I am looking for and p is the RID of the parent node.
Generally, performance is pretty good, but I am looking for ideas on improving the query, the indexing, or both for this use case. In particular, queries starting at the root node, which has many children, take noticeably longer than the other, less-connected nodes.
Any suggestions? Thanks in advance!
-- John