Slow node creation/commits with unique constraint in 2.0.0

Moss Prescott

Dec 20, 2013, 1:36:05 PM
to ne...@googlegroups.com
Hi,

I am experimenting with 2.0.0 and notice that it is approx. an order of magnitude slower to add nodes when a unique constraint is defined, versus an index on the same label and property. I've attached a simple program that demonstrates the behavior, adding increasing numbers of simple, unconnected nodes, with either an index or a constraint.

I'm pretty sure the index and constraint are working properly (in the sense that they improve query performance), based on other tests I've done.

I'm getting org.neo4j:neo4j:2.0.0 from maven central, and running on jdk 1.7.0_40 on Fedora with no particular flags (but apparently max heap of 3GB).

Here's the output of the test program:

indexing...
waiting...
adding 1,000 nodes...
added...
committing...
...done; 0.8s

adding constraint...
adding 1,000 nodes...
added...
committing...
...done; 3.3s

indexing...
waiting...
adding 10,000 nodes...
added...
committing...
...done; 2.9s

adding constraint...
adding 10,000 nodes...
added...
committing...
...done; 44.6s

indexing...
waiting...
adding 100,000 nodes...
added...
committing...
...done; 6.3s

adding constraint...
adding 100,000 nodes...

It runs for quite a while before even getting to the commit on the last transaction, at 100% CPU and with the heap climbing to about 1 GB.

Maybe 100k is too many nodes in a single transaction? But as you can see it's more than 10x slower even at much more modest sizes. I also tried adding the same number of nodes, but spread across many smaller transactions, and it's still much slower with the constraint.
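
As a quick check on the numbers above, the slowdown factor for each run size can be computed directly from the reported timings. This is only a small arithmetic sketch; the class and method names are made up for illustration, and the 100,000-node constraint run is left out because no time was reported for it.

```java
import java.util.Locale;

public class SlowdownRatios {
    // Timings in seconds, copied from the test output above.
    // The 100,000-node constraint run is omitted: no time was reported for it.
    static final int[] NODE_COUNTS = {1_000, 10_000};
    static final double[] INDEX_SECONDS = {0.8, 2.9};
    static final double[] CONSTRAINT_SECONDS = {3.3, 44.6};

    // How many times longer the constraint run took than the index run.
    static double slowdown(double constraintSeconds, double indexSeconds) {
        return constraintSeconds / indexSeconds;
    }

    public static void main(String[] args) {
        for (int i = 0; i < NODE_COUNTS.length; i++) {
            double factor = slowdown(CONSTRAINT_SECONDS[i], INDEX_SECONDS[i]);
            System.out.printf(Locale.ROOT,
                    "%d nodes: constraint run took %.1fx as long%n",
                    NODE_COUNTS[i], factor);
        }
    }
}
```

That works out to roughly 4x at 1,000 nodes and roughly 15x at 10,000 nodes, so the gap widens as the transaction grows.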

Hopefully I'm doing something dumb here. Can anyone suggest a fix or confirm that this isn't working the way it should?

Thanks,
- moss
SlowConstraint.java

Mark Needham

Dec 21, 2013, 1:21:07 PM
to ne...@googlegroups.com
Hi Moss,

Yep you're right, I've noticed this on some benchmarks as well. We're working on improving that - it should be a little slower than a normal index but not 10x slower as in your tests.

However, unless you are creating the nodes covered by the unique constraint concurrently, you may be able to work around it by using a normal index,

e.g. CREATE INDEX ON :Label(key)

And then using e.g. MERGE (l:Label {key: {key}}). That isn't thread safe: if you ran that query concurrently and the node with key x hadn't been created yet, you might end up with two copies. However, it would ensure that you don't create the same node twice after the initial creation.
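
The race described above is the classic "check, then create" problem. The sketch below is plain Java, not Neo4j code: a list stands in for the node store, and a barrier forces the unlucky interleaving (both threads look up the key before either creates it) so the outcome is reproducible. All names here are hypothetical.

```java
import java.util.List;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CyclicBarrier;

public class CheckThenCreateRace {
    // Stand-in for the node store; each entry is the key of one "node".
    private final List<String> nodes = new CopyOnWriteArrayList<>();

    // Unlocked "look up, then create if missing", like MERGE backed only by an index.
    // The barrier makes both threads finish the lookup before either one creates.
    void mergeWithoutLock(String key, CyclicBarrier bothHaveChecked) {
        boolean alreadyExists = nodes.contains(key);
        try {
            bothHaveChecked.await();
        } catch (InterruptedException | BrokenBarrierException e) {
            throw new IllegalStateException(e);
        }
        if (!alreadyExists) {
            nodes.add(key);
        }
    }

    // Runs the same merge on two threads and returns how many nodes exist afterwards.
    static int runTwoConcurrentMerges(String key) {
        CheckThenCreateRace store = new CheckThenCreateRace();
        CyclicBarrier bothHaveChecked = new CyclicBarrier(2);
        Runnable merge = () -> store.mergeWithoutLock(key, bothHaveChecked);
        Thread first = new Thread(merge);
        Thread second = new Thread(merge);
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return store.nodes.size();
    }

    public static void main(String[] args) {
        System.out.println("nodes with key x: " + runTwoConcurrentMerges("x"));
    }
}
```

Both threads see "not found" and both create, so the store ends up with two nodes for the same key. Once one node exists, any later lookup finds it, which matches the point that duplicates can only appear around the initial creation.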

Cheers
Mark

Moss Prescott

Dec 21, 2013, 3:04:18 PM
to ne...@googlegroups.com
So if I understand correctly, the constraint provides a stronger guarantee of uniqueness than I can get with either MERGE or doing some kind of check myself through the Java API. In my mind that makes it an interesting feature, so I'll look out for improvements in future releases.

Thanks,
- moss

Eric Fulton

May 25, 2016, 5:50:37 PM
to Neo4j
I'm still seeing a 10x performance hit from adding a uniqueness constraint. Is this on the roadmap to be addressed?

Michael Hunger

May 25, 2016, 5:56:32 PM
to ne...@googlegroups.com
Can you give us more details?

Michael

Eric Fulton

May 25, 2016, 6:10:52 PM
to Neo4j

When I add data to the database, I use a merge something like this:
MERGE (n:Thing { uid : <the uid>}) ON CREATE SET n.prop1  = blah1, n.prop2 = blah2 ...

Since I'm working in a multi-threaded environment, it's possible to have the same node written twice, and I've seen many instances where duplicate nodes were created. So I added a uniqueness constraint on the uid:

CREATE CONSTRAINT ON (n:Thing) ASSERT n.uid IS UNIQUE

Before creating the constraint, those merges were taking 1-50 ms; now they take 20-300 ms. Other operations, like adding relationships, seem to be taking quite a bit longer too.

Eric Fulton

May 25, 2016, 6:15:16 PM
to Neo4j
This is version 2.3.3

Clark Richey

May 25, 2016, 6:21:08 PM
to ne...@googlegroups.com
I would definitely expect there to be additional overhead now, because an index lookup against uid has to happen before the write can happen. That has to take some time. I also wonder if you are getting some lock contention from doing this multithreaded.

Michael Hunger

May 25, 2016, 7:05:40 PM
to ne...@googlegroups.com
Do you do one operation per transaction or multiple statements per transaction?

I presume you had CREATEs before you had the constraint, not MERGEs? Because a MERGE without an index or constraint does a full scan on the label.

I also presume you use parameters, not literal values?

MERGE (n:Thing { uid : {uid} }) ON CREATE SET n.prop1 = {prop1}, n.prop2 = {prop2}
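
The usual reason for preferring parameters is that the query text stays the same from call to call, so a plan compiled once can be reused. The sketch below is a toy model of that idea in plain Java, assuming a cache keyed by the exact query string; it is not Neo4j's actual plan cache, and every name in it is hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class ToyPlanCache {
    private final Map<String, String> plansByQueryText = new HashMap<>();
    private int compilations = 0;

    // "Compiles" a query only the first time its exact text is seen.
    // Parameter values are ignored here because they do not affect the cache key.
    void run(String queryText, Map<String, Object> params) {
        plansByQueryText.computeIfAbsent(queryText, text -> {
            compilations++;
            return "plan for: " + text;
        });
    }

    // Each uid is baked into the query text, so every query string is different.
    static int compilationsWithLiterals(int queryCount) {
        ToyPlanCache cache = new ToyPlanCache();
        for (int i = 0; i < queryCount; i++) {
            String query = "MERGE (n:Thing { uid : '" + i + "' })";
            cache.run(query, Collections.<String, Object>emptyMap());
        }
        return cache.compilations;
    }

    // The uid travels as a parameter, so the query text never changes.
    static int compilationsWithParameters(int queryCount) {
        ToyPlanCache cache = new ToyPlanCache();
        String query = "MERGE (n:Thing { uid : {uid} })";
        for (int i = 0; i < queryCount; i++) {
            cache.run(query, Collections.<String, Object>singletonMap("uid", i));
        }
        return cache.compilations;
    }

    public static void main(String[] args) {
        System.out.println("literals:   " + compilationsWithLiterals(100) + " compilations");
        System.out.println("parameters: " + compilationsWithParameters(100) + " compilations");
    }
}
```

In this model, 100 literal queries compile 100 times, while 100 parameterized queries compile once.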

Did you add other indexes too?

Eric Fulton

May 26, 2016, 12:25:48 PM
to Neo4j
So before the constraint, I was still using merges because I really did want something akin to uniqueness for these nodes.  We had an index on the "uid" so maybe that helped with the full scan?
Yes, I use parameters, not literal values.  

Michael Hunger

May 26, 2016, 5:33:02 PM
to ne...@googlegroups.com
Can you share the query plan from before and after the constraint?

Oh if you had an index on uid then you had the benefit of the "fast scan" but without the penalty of "asserting uniqueness across the index".

Michael

Eric Fulton

May 26, 2016, 6:29:01 PM
to Neo4j
I'm sorry, I'm not sure exactly what you mean by query plan. I haven't changed any of the queries from before the addition of the uniqueness constraint. I'm still doing the same "MERGE ..." query (like above, but with parameters, as you pointed out). Do you mean that, or something else?

Thanks!
Eric

Michael Hunger

May 26, 2016, 8:17:03 PM
to ne...@googlegroups.com
I meant:

With an index you get the fast lookup but no penalty.
With a constraint you have the uniqueness guarantee via a lock and a check against the constraint on write, both of which cost.

But I'll ask if that can be alleviated.
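
To picture where the extra cost comes from, here is a minimal sketch of a "lock, then check, then write" path in plain Java. It is not how Neo4j implements constraints internally: a single global lock and an in-memory set are simplifying assumptions, and all names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

public class LockAndCheckStore {
    private final Set<String> uniqueKeys = new HashSet<>();
    private final ReentrantLock writeLock = new ReentrantLock();

    // Every write takes the lock and checks for an existing key before inserting.
    // Returns true if a new "node" was created, false if the key already existed.
    boolean createIfAbsent(String key) {
        writeLock.lock();
        try {
            return uniqueKeys.add(key);
        } finally {
            writeLock.unlock();
        }
    }

    // Starts several writers that all try to create the same key and
    // returns how many of them actually created it.
    static int countCreated(String key, int threadCount) {
        LockAndCheckStore store = new LockAndCheckStore();
        AtomicInteger created = new AtomicInteger();
        Thread[] writers = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            writers[i] = new Thread(() -> {
                if (store.createIfAbsent(key)) {
                    created.incrementAndGet();
                }
            });
            writers[i].start();
        }
        try {
            for (Thread writer : writers) {
                writer.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return created.get();
    }

    public static void main(String[] args) {
        System.out.println("writers that created the node: " + countCreated("x", 8));
    }
}
```

Only one writer ever creates the key, which is the guarantee, but every writer has to queue for the lock and perform the check, which is the cost.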

Could you test whether it performs better in 3.0.2?

Michael

John Begley

May 1, 2017, 4:37:43 AM
to Neo4j
Hi, was there an update on this? The core problem I'm seeing on a graph I'm working on is very similar. Basically, I am creating a graph of people. In order to match people, I am creating separate nodes around key matching criteria (like date of birth, first name, family name and gender). With cleanish data going in, half the person nodes will have a relationship to the male gender node and half to the female gender node. When I ramp up the performance tests (with a unique constraint on gender), performance degrades hugely, but if I remove the constraint I get multiple gender nodes, which doesn't help with matching people!

Is there a common pattern/approach for solving this?

Thanks,
John

Clark Richey

May 1, 2017, 9:47:04 AM
to ne...@googlegroups.com
John,
Generally speaking, I don't recommend breaking all of those properties down into individual nodes. That approach pretty much obviates many of the benefits of a property graph, and what you end up with is essentially RDF, with all of the associated performance problems.

The links below are some of my favorite posts on good data modeling practices, although the list certainly isn't 100% comprehensive.

John Begley

May 1, 2017, 10:25:34 AM
to Neo4j
Hey Clark,

Thanks for the reply and the links. I'm not sure I described my issue well enough: I'm using the gender, DOB, and first-name nodes as a method of implementing the scoring and matching logic. For clarity, I have a person node with all its attributes self-contained, but in addition I'm making a relationship to a deliberately separate node for gender etc. to act as part of the matching mechanism.

So I'd like to persist with the strategy as it's an elegant and extensible solution for matching. But obviously if the performance isn't up to it then I have to think again. So I'm kind of keen to work through the performance issues rather than rewrite the application.

I hope that's clearer.

Cheers
John
