Hi all,
We are running some tests to choose which graph database to work with; at the moment we are considering Neo4j and ArangoDB. We have installed both on the same computer and loaded exactly the same data into each, so that we can compare them.
Our use case is a recommendation engine, and for testing we are using a very basic model: Device-Watched->Content, over a limited time range.
- There are 282k Device nodes in a device collection
- There are 114k Watched relations in a watched collection
- There are 37k Content nodes in a content collection
The very first question to answer is: "Top watched content"
* In Neo4j, the query below has an execution time of 141 ms:
MATCH (c:Content)
RETURN c.content, size((c)<-[:WATCHED]-()) AS vistos
ORDER BY vistos DESC
* We then tried ArangoDB with the following queries (the execution time is given in the comment above each one):
//ArangoDB (graph mode) : 5.242 s
FOR c IN contents
LET contenidos =
(FOR v IN 1..1 INBOUND c WATCHED RETURN v)
LET nr = LENGTH(contenidos)
SORT nr DESC
RETURN { contenido: c.content, num: nr }
//ArangoDB (documents mode) : 8.790 s
FOR w IN WATCHED
FOR c IN contents
FILTER c._id == w._to
COLLECT contenido = w._to INTO contenidos
LET num = LENGTH(contenidos[*])
SORT num DESC
RETURN {contenido : contenido, nombre: contenidos[0].c.content, num}
//ArangoDB (documents mode) : 9.494 s
FOR w IN WATCHED
FOR c IN contents
FILTER c._id == w._to
COLLECT contenido = w._to INTO contenidos = c.content
LET num = LENGTH(contenidos[*])
SORT num DESC
RETURN {contenido : contenido, nombre: contenidos[0], num}
//ArangoDB (documents mode) : 1.247 s
FOR w IN WATCHED
COLLECT contenido = w._to WITH COUNT INTO top
SORT top DESC
FOR c IN contents
FILTER c._id == contenido
RETURN { contenido: c.content, num: top}
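To make clear what the last (fastest) AQL query is doing, here is a minimal Python sketch of the same count-then-look-up logic. The sample edges, keys, and titles are made up for illustration and are not taken from our database; `top_watched` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical toy data standing in for the WATCHED edges and the contents documents.
watched = [
    {"_from": "devices/1", "_to": "contents/a"},
    {"_from": "devices/2", "_to": "contents/a"},
    {"_from": "devices/3", "_to": "contents/a"},
    {"_from": "devices/1", "_to": "contents/b"},
    {"_from": "devices/2", "_to": "contents/c"},
    {"_from": "devices/3", "_to": "contents/c"},
]
contents = {
    "contents/a": {"content": "Movie A"},
    "contents/b": {"content": "Movie B"},
    "contents/c": {"content": "Movie C"},
    "contents/d": {"content": "Movie D"},  # never watched
}


def top_watched(watched, contents):
    """Count edges per target first, then look up each content document once."""
    # Equivalent in spirit to: COLLECT contenido = w._to WITH COUNT INTO top
    counts = Counter(edge["_to"] for edge in watched)
    # Equivalent in spirit to: SORT top DESC
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    # Equivalent in spirit to: FOR c IN contents FILTER c._id == contenido
    return [
        {"contenido": contents[content_id]["content"], "num": num}
        for content_id, num in ranked
        if content_id in contents
    ]


result = top_watched(watched, contents)
for row in result:
    print(row)
```

Note that, like the last AQL query, this sketch only returns content with at least one Watched edge, whereas the Cypher query and the graph-mode AQL query also list content with zero views.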
The ArangoDB numbers are much worse than Neo4j's, so we suspect we are missing something important, but we can't see anything wrong. Has anyone had a similar experience?
Regards,
Xavi