Poor performance on relatively big db

46 views
Skip to first unread message

Matias Burak

unread,
Dec 24, 2016, 4:55:26 AM12/24/16
to Neo4j
We are working with the latest 3.1 version, we have a database of around 3.5M nodes and 100M relationships.
Doing some basic queries is very slow that makes the application unusable.

neo4j-sh (?)$ profile MATCH (n1:`Club`)<-[r1:CLUB]-(n:`Player`) WHERE n.`international` = true RETURN COUNT(*) AS `Count`;
+---------+
| Count   |
+---------+
| 2181771 |
+---------+
1 row
28676 ms

Compiler CYPHER 3.1

Planner COST

Runtime INTERPRETED

+-------------------+----------------+---------+---------+-------------+-----------------------------------------------+
| Operator          | Estimated Rows | Rows    | DB Hits | Variables   | Other                                         |
+-------------------+----------------+---------+---------+-------------+-----------------------------------------------+
| +ProduceResults   |           1027 |       1 |       0 | Count       | Count                                         |
| |                 +----------------+---------+---------+-------------+-----------------------------------------------+
| +EagerAggregation |           1027 |       1 |       0 | Count       |                                               |
| |                 +----------------+---------+---------+-------------+-----------------------------------------------+
| +Filter           |        1054596 | 2181771 | 4872405 | n, n1, r1   | n.international == {  AUTOBOOL0} AND n:Player |
| |                 +----------------+---------+---------+-------------+-----------------------------------------------+
| +Expand(All)      |        2445865 | 2445865 | 2445868 | n, r1 -- n1 | (n1)<-[r1:CLUB]-(n)                           |
| |                 +----------------+---------+---------+-------------+-----------------------------------------------+
| +NodeByLabelScan  |              3 |       3 |       4 | n1          | :Club                                         |
+-------------------+----------------+---------+---------+-------------+-----------------------------------------------+

Total database accesses: 7318277

There are around 2.3M players and 3 clubs. 
It's taking 28secs the first time, then in takes around 9-10 secs.
It's really annoying because it's a very simple query.
Is there anything we are doing wrong? Is this something we can do about or it's the expected behaviour? This is running on a 16gbRAM machine, getting more RAM is the only way to improve it?

Thanks,
Matias. 
 

Max De Marzi Jr.

unread,
Dec 25, 2016, 11:27:13 AM12/25/16
to Neo4j
Try:

MATCH (n:`Player`) WHERE n.`international` = true RETURN SUM(size((n)-[:CLUB]->()) ) AS `Count`;

Matias Burak

unread,
Dec 25, 2016, 2:17:30 PM12/25/16
to ne...@googlegroups.com
Well, actually I was simplifying the query but what we really need to do is to filter by some property in club too (like club.name = 'FC Barcelona') so that solution wouldn't work...

--
You received this message because you are subscribed to a topic in the Google Groups "Neo4j" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/neo4j/uZtQWJOC1HE/unsubscribe.
To unsubscribe from this group and all its topics, send an email to neo4j+un...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Max De Marzi Jr.

unread,
Dec 26, 2016, 11:00:41 PM12/26/16
to Neo4j

Have you created Indexes for the fields you will be searching on? For example:

CREATE INDEX ON :Club(name)

What is the actual query you are trying to optimize and what is the actual query execution plan of the query?

Matias Burak

unread,
Dec 27, 2016, 3:38:43 PM12/27/16
to Neo4j
neo4j-sh (?)$ profile MATCH (n1:`Club`)<-[r1:CLUB]-(n:`Player`) WHERE n.`international` = true AND n1.name='FC Barcelona' RETURN COUNT(*) AS `Count`;
+---------+
| Count   |
+---------+
| 2181771 |
+---------+
1 row
7912 ms

Compiler CYPHER 3.1

Planner COST

Runtime INTERPRETED

+-------------------+----------------+---------+---------+-------------+--------------------------------------------------+
| Operator          | Estimated Rows | Rows    | DB Hits | Variables   | Other                                            |
+-------------------+----------------+---------+---------+-------------+--------------------------------------------------+
| +ProduceResults   |            593 |       1 |       0 | Count       | Count                                            |
| |                 +----------------+---------+---------+-------------+--------------------------------------------------+
| +EagerAggregation |            593 |       1 |       0 | Count       |                                                  |
| |                 +----------------+---------+---------+-------------+--------------------------------------------------+
| +Filter           |         351532 | 2181771 | 4869817 | n, n1, r1   | n.international == {  AUTOBOOL0} AND n:Player    |
| |                 +----------------+---------+---------+-------------+--------------------------------------------------+
| +Expand(All)      |         815288 | 2444571 | 2444572 | n, r1 -- n1 | (n1)<-[r1:CLUB]-(n)                              |
| |                 +----------------+---------+---------+-------------+--------------------------------------------------+
| +NodeIndexSeek    |              1 |       1 |       2 | n1          | :Club(name)                                      |
+-------------------+----------------+---------+---------+-------------+--------------------------------------------------+

Total database accesses: 7314391

That's the actual query and the execution plan. And yes, we do have indexes on :Club(name) and :Player(international)

Michael Hunger

unread,
Dec 28, 2016, 6:06:45 AM12/28/16
to ne...@googlegroups.com
I would turn boolean properties, like international into a label.


like this:
MATCH (n1:`Club`)<-[r1:CLUB]-(n:International) WHERE n1.name='FC Barcelona' RETURN COUNT(*) AS `Count`

Something seems to be off with your data too.

As there are hardly 2.2M international players in FC Barcelona (and 2.4M in total) it will take a bit to expand / load them.
What kind of Hardware do you run this on?

Or is this some computer game and players there?

Please also try something like this:

MATCH (c:Club {name:"FC Barcelona"})
MATCH (p:Player:International) WHERE (p)-[:CLUB]->(c)
RETURN count(*);

Michael


--
You received this message because you are subscribed to the Google Groups "Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email to neo4j+unsubscribe@googlegroups.com.

Matias Burak

unread,
Dec 29, 2016, 9:10:37 AM12/29/16
to Neo4j
Thanks Michael, maybe using the boolean property as label would help but it's very complicated for us to do that because we are allowing the users to build their own data model, so there might be a lot of boolean properties, and we would have to turn all those properties into labels, making the database very complex and the way to retrieve data too.
Also, it's not just about booleans, we also have other cases like filtering by strings instead of boolean, and that's slow too.
The players/clubs graph was something i made up to protect our data. 
This is what we get when we try to filter on 2 properties on related nodes.

neo4j-sh (?)$ profile MATCH (n1:`Label1`)<-[r1:REL1]-(n:`Label2`) WHERE n.`prop1` = 'aaa' AND n1.`prop2`='bbb' RETURN COUNT(*) AS `Count`;
+---------+
| Count   |
+---------+
| 2127237 |
+---------+
1 row
5747 ms

Compiler CYPHER 3.1

Planner COST

Runtime INTERPRETED

+-------------------+----------------+---------+---------+-------------+-----------------------------------------------+
| Operator          | Estimated Rows | Rows    | DB Hits | Variables   | Other                                         |
+-------------------+----------------+---------+---------+-------------+-----------------------------------------------+
| +ProduceResults   |            498 |       1 |       0 | Count       | Count                                         |
| |                 +----------------+---------+---------+-------------+-----------------------------------------------+
| +EagerAggregation |            498 |       1 |       0 | Count       |                                               |
| |                 +----------------+---------+---------+-------------+-----------------------------------------------+
| +Filter           |         248298 | 2127237 | 4815283 | n, n1, r1   | n.prop1 == {  AUTOSTRING0} AND n:Label2       |
| |                 +----------------+---------+---------+-------------+-----------------------------------------------+
| +Expand(All)      |         815288 | 2444571 | 2444572 | n, r1 -- n1 | (n1)<-[r1:REL1]-(n)                           |
| |                 +----------------+---------+---------+-------------+-----------------------------------------------+
| +NodeIndexSeek    |              1 |       1 |       2 | n1          | :Label1(prop2)                                |
+-------------------+----------------+---------+---------+-------------+-----------------------------------------------+

Total database accesses: 7259857
To unsubscribe from this group and stop receiving emails from it, send an email to neo4j+un...@googlegroups.com.

Michael Hunger

unread,
Dec 29, 2016, 4:07:29 PM12/29/16
to ne...@googlegroups.com
It depends a bit.

If you can aggressively filter on both sides (doesn't seem so with your > 2M results).

Then you can do an index lookup for both sides and do a WHERE on the cross product.

The other thing you can try is this (make sure n1 is the label with smaller cardinality, which you can determine upfront quickly, by running MATCH (:Label)-[:TYPE]->()  return count(*)

profile 
MATCH (n:`Label2`) WHERE n.`prop1` = 'aaa' WITH collect(n) as nodes
MATCH (n1:`Label1`)<-[:REL1]-(n) WHERE n1.`prop2`='bbb' AND n IN nodes 
RETURN COUNT(*) AS `Count`;


profile 
MATCH (n1:`Label1`) WHERE n1.`prop1` = 'bbb'
USING INDEX n1:Label1(prop1)
MATCH (n:`Label2`)-[:REL1]->(n1) WHERE n.`prop2`='aaa'
USING INDEX n:Label2(prop2)
RETURN COUNT(*) AS `Count`;

profile 
MATCH (n1:`Label1`) WHERE n1.`prop1` = 'bbb'
USING INDEX n1:Label1(prop1)
MATCH (n:`Label2`) WHERE n.`prop2`='aaa' AND (n)-[:REL1]->(n1)
USING INDEX n:Label2(prop2)
RETURN COUNT(*) AS `Count`;

Please try these variants, I can imagine reasons for either being faster.
And let us know.

Your store is not that big (3M nodes 100M rels) but have you made sure it is completely memory mapped?

Michael

To unsubscribe from this group and stop receiving emails from it, send an email to neo4j+unsubscribe@googlegroups.com.

Michael Hunger

unread,
Dec 29, 2016, 4:12:25 PM12/29/16
to ne...@googlegroups.com
In 3.1  (and then also 3.2)
you could also use the compiled runtime (except for the count)

So a query like

cypher runtime=compiledExperimentalFeatureNotSupportedForProductionUse profile 
MATCH (n1:`Label1`)<-[r1:REL1]-(n:`Label2`) WHERE n.`prop1` = 'aaa' AND n1.`prop2`='bbb' RETURN 1

should use the compiled runtime and run faster than the interpreted runtime. In 3.2 it will be again (runtime=compiled) or actually the default in enterprise.

Michael

Matias Burak

unread,
Dec 30, 2016, 9:32:36 AM12/30/16
to ne...@googlegroups.com
I think most of the slowness is when doing count. If I just get a small set doing limit it works fine.

Also, while trying to test your suggestions, I wan’t able to get USING INDEX to work, is there anything wrong in the syntax?

neo4j> profile
       MATCH (n1:`Label1`) WHERE n1.`prop1` = 'bbb'
       USING INDEX n1:Label1(prop1)
       MATCH (n:`Label2`)-[:REL1]->(n1) WHERE n.`prop2`='aaa'
       USING INDEX n:Label2(prop2)
       RETURN COUNT(*) AS `Count`;
Invalid input 'S': expected 'n/N' (line 4, column 2 (offset: 56))
"USING INDEX n1:Label1(prop1)"
            ^

Also, how can I make sure is it completely memory mapped? I have 16gb on the environment I’m testing it and and I left the neo4j defaults for memory.

Matias Burak

unread,
Dec 30, 2016, 9:56:21 AM12/30/16
to ne...@googlegroups.com
Ok, the USING INDEX had to go before WHERE, but doing that the query never finishes…

Also, the first query you proposed is slower than the original.
Reply all
Reply to author
Forward
0 new messages