Performance when deleting large numbers of nodes


John Fry

Jun 18, 2016, 16:41:08
To Neo4j
Hello All,

I have a graph of about 200M relationships, and I often need to delete a large number of them.
With the proxy code below (a stand-in for my actual code) I am seeing huge memory usage and memory thrashing when deleting about 15M relationships.

When it hits tx.close(), all CPU cores start working at close to 100% utilization and thrash for more than 30 minutes.
I need this to work in under 5 minutes, ideally.

(Note: when I make large numbers of changes to properties, or create large numbers of new properties, I don't see such issues.)

Any advice? Why is this happening?

Regards, John.



    int txc = 0;

    // serially delete the links
    try (Transaction tx = db.beginTx()) {
        for (int i = 0; i < deletedLinks.size(); i++) {
            Relationship rel = db.getRelationshipById(deletedLinks.get(i));
            rel.delete();
            txc++;
            if (txc > 50000) {
                txc = 0;
                tx.success();
            }
        }
        tx.success();
        tx.close();
    }
    catch (Exception e) {
        System.out.println("Exception link deletion: " + e.getMessage());
    }

Clark Richey

Jun 18, 2016, 16:43:19
To ne...@googlegroups.com
You need to commit periodically. Holding that many changes in a single transaction isn't efficient.

Sent from my iPhone

John Fry

Jun 18, 2016, 17:03:33
To Neo4j
Thanks Clark - is there any good/recommended way to nest the commits?

Thx JF

Clark Richey

Jun 18, 2016, 17:25:20
To ne...@googlegroups.com
Don't nest them. Just create a counter, and every x deletes commit the transaction and open a new one.

Sent from my iPhone

John Fry

Jun 18, 2016, 17:29:25
To Neo4j

Clark - this works, but it is still slow. I guess multithreading may help some...



    int txc = 0;

    // try-with-resources from the first version dropped here, so the
    // transaction variable can be closed and reopened inside the loop
    Transaction tx = db.beginTx();
    for (int i = 0; i < deletedLinks.size(); i++) {
        Relationship rel = db.getRelationshipById(deletedLinks.get(i));
        rel.delete();
        txc++;
        if (txc > 50000) {
            txc = 0;
            tx.success();
            tx.close();
            tx = db.beginTx();
        }
    }
    tx.success();
    tx.close();
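[Editor's note: the same batching can also keep the try-with-resources and exception safety of the first version by opening one transaction per batch instead of reassigning a single transaction variable. A minimal sketch under the same assumptions as the snippets above (embedded Java API, a deletedLinks list of relationship IDs, and the 50,000 batch size from the thread); deleteRelationships and BATCH_SIZE are illustrative names:]

    // Sketch: one try-with-resources transaction per batch, so a failure
    // can't leak an open transaction. Names mirror the snippets above.
    static void deleteRelationships(GraphDatabaseService db, List<Long> deletedLinks) {
        final int BATCH_SIZE = 50000;
        for (int from = 0; from < deletedLinks.size(); from += BATCH_SIZE) {
            int to = Math.min(from + BATCH_SIZE, deletedLinks.size());
            try (Transaction tx = db.beginTx()) {
                for (int i = from; i < to; i++) {
                    db.getRelationshipById(deletedLinks.get(i)).delete();
                }
                tx.success(); // mark the batch for commit; close() commits it
            }
        }
    }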

Clark Richey

Jun 18, 2016, 17:54:05
To ne...@googlegroups.com
Yes. That's a lot to delete; doing it in parallel will definitely help.

Sent from my iPhone

Michael Hunger

Jun 18, 2016, 19:20:26
To ne...@googlegroups.com
Shouldn't be slow. A faster disk and concurrent batches would help.

Sent from my iPhone

Chris Vest

Jun 20, 2016, 05:11:46
To ne...@googlegroups.com
If you perform deletes in parallel, it can be worth investing some time in making the code smart enough to choose disjoint data sets for the transactions that run in parallel; e.g. no node should be the start or end node in more than one parallel transaction at a time. That way they won't contend on locks or, worse, run into deadlocks and roll back.

--
Chris Vest
System Engineer, Neo Technology
[ skype: mr.chrisvest, twitter: chvest ]
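
[Editor's note: a minimal sketch of the disjoint partitioning Chris describes, assuming the same embedded Java API as the snippets above and that the ID list fits in memory, as in John's code. deleteInParallel, PARTITIONS, and BATCH_SIZE are illustrative names, not from the thread. Relationships are greedily grouped so that no node is shared between two partitions; relationships whose endpoints are already claimed by different partitions fall back to a sequential pass at the end:]

    import java.util.*;
    import java.util.concurrent.*;

    import org.neo4j.graphdb.GraphDatabaseService;
    import org.neo4j.graphdb.Relationship;
    import org.neo4j.graphdb.Transaction;

    public class ParallelRelationshipDelete {
        private static final int PARTITIONS = 4;     // illustrative: one per worker thread
        private static final int BATCH_SIZE = 50000; // commit interval, as in John's code

        public static void deleteInParallel(GraphDatabaseService db, List<Long> relIds)
                throws InterruptedException {
            Map<Long, Integer> nodeToPartition = new HashMap<>();
            List<List<Long>> partitions = new ArrayList<>();
            for (int i = 0; i < PARTITIONS; i++) partitions.add(new ArrayList<>());
            List<Long> leftovers = new ArrayList<>();

            // Reading start/end nodes also requires a transaction.
            try (Transaction tx = db.beginTx()) {
                int next = 0;
                for (long relId : relIds) {
                    Relationship rel = db.getRelationshipById(relId);
                    long start = rel.getStartNode().getId();
                    long end = rel.getEndNode().getId();
                    Integer ps = nodeToPartition.get(start);
                    Integer pe = nodeToPartition.get(end);
                    if (ps != null && pe != null && !ps.equals(pe)) {
                        leftovers.add(relId); // endpoints claimed by different partitions
                        continue;
                    }
                    int p = ps != null ? ps : (pe != null ? pe : next++ % PARTITIONS);
                    nodeToPartition.put(start, p);
                    nodeToPartition.put(end, p);
                    partitions.get(p).add(relId);
                }
                tx.success();
            }

            // No two workers ever touch the same node, so they cannot contend
            // on node locks or deadlock against each other.
            ExecutorService pool = Executors.newFixedThreadPool(PARTITIONS);
            for (List<Long> partition : partitions) {
                pool.submit(() -> deleteBatched(db, partition));
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS);

            deleteBatched(db, leftovers); // sequential pass for the conflicting rest
        }

        private static void deleteBatched(GraphDatabaseService db, List<Long> relIds) {
            // Same periodic-commit pattern as earlier in the thread.
            for (int from = 0; from < relIds.size(); from += BATCH_SIZE) {
                int to = Math.min(from + BATCH_SIZE, relIds.size());
                try (Transaction tx = db.beginTx()) {
                    for (int i = from; i < to; i++) {
                        db.getRelationshipById(relIds.get(i)).delete();
                    }
                    tx.success();
                }
            }
        }
    }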
