JVM Out of Heap Memory After creating about 200,000 Nodes


Mohammad Habbab

Mar 21, 2012, 5:20:17 PM
to ne...@googlegroups.com
Hi, I'm running the following code on a 64-bit JVM (JDK 1.7.0) with 4GB of RAM, and at a certain point in GetStressTest1() the exception below recurs over and over, no matter what JVM startup configuration I use!

Code:

package neo4j;

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.RelationshipType;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.index.Index;
import org.neo4j.kernel.EmbeddedGraphDatabase;

public class Neo4J {

    // Relationship types as implied by the usage below
    private enum RelTypes implements RelationshipType { USER_NS, USER }

    static EmbeddedGraphDatabase gdb;
    static String DB_PATH = "./GDatabase";
    static Index<Node> userIndexer;
    static Node userNS;
    static int NODES_COUNT = 200000;

    public static void main(String[] args) {
        gdb = new EmbeddedGraphDatabase(DB_PATH);
        RegisterShutdownHook(gdb);
        Transaction tx = gdb.beginTx();
        try {
            userIndexer = gdb.index().forNodes("users");
            userNS = gdb.createNode();
            gdb.getReferenceNode().createRelationshipTo(userNS, RelTypes.USER_NS);
            tx.success();
        } catch (Exception e) {
            System.out.println("Error in main Method ! won't perform any tests !");
            return;
        } finally {
            tx.finish();
        }
        StoreStressTest1(0, NODES_COUNT);
        //StoreStressTest2(NODES_COUNT/2, NODES_COUNT);
        GetStressTest1();
    }

    public static String Id2UserName(final int id) {
        return "user" + id + "@neo4j.org";
    }

    private static void CreateIndexInsert(final int id, final String username) {
        Node usernode = gdb.createNode();
        usernode.setProperty("username", username);
        userIndexer.add(usernode, "" + id, username);
        userNS.createRelationshipTo(usernode, RelTypes.USER);
    }

    private static long GetNode() {
        int id2find = (int) (Math.random() * NODES_COUNT);
        long start = System.currentTimeMillis();
        Node user2find = userIndexer.get("" + id2find, Id2UserName(id2find)).getSingle();
        long end = System.currentTimeMillis();
        return end - start;
    }

    private static void RegisterShutdownHook(final GraphDatabaseService graphDb) {
        Runtime.getRuntime().addShutdownHook(new Thread() {
            @Override
            public void run() {
                graphDb.shutdown();
            }
        });
    }

    private static void StoreStressTest1(int first, int last) {
        long start = 0, end = 0;
        Transaction tx = gdb.beginTx();
        try {
            start = System.currentTimeMillis();
            for (int i = first; i < last; i++) {
                CreateIndexInsert(i, Id2UserName(i));
            }
            tx.success();
        } catch (Exception e) {
            System.out.println("Message: " + e.getMessage());
        } finally {
            tx.finish();
            end = System.currentTimeMillis();
            System.out.println("(SS1)Insert Time for " + (last - first) + " Nodes: " + (end - start));
        }
    }

    private static void GetStressTest1() {
        long accumulator = 0;
        Transaction tx = gdb.beginTx();
        try {
            for (int i = 0; i < NODES_COUNT; i++) {
                accumulator += GetNode();
            }
            tx.success();
            System.out.println("Time for " + NODES_COUNT + " Retrieves is " + accumulator);
        } catch (Exception e) {
            System.out.println("Message: " + e.getMessage());
        } finally {
            tx.finish();
        }
    }
}

Here's the output given by the console:
run:
(SS1)Insert Time for 200000 Nodes: 280068
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at org.apache.lucene.index.SegmentNorms.bytes(SegmentNorms.java:156)
at org.apache.lucene.index.SegmentNorms.bytes(SegmentNorms.java:143)
at org.apache.lucene.index.SegmentReader.norms(SegmentReader.java:599)
at org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:107)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:577)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:445)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:426)
at org.neo4j.index.impl.lucene.Hits.getMoreDocs(Hits.java:132)
at org.neo4j.index.impl.lucene.Hits.<init>(Hits.java:105)
at org.neo4j.index.impl.lucene.LuceneIndex.search(LuceneIndex.java:381)
at org.neo4j.index.impl.lucene.LuceneIndex.query(LuceneIndex.java:282)
at org.neo4j.index.impl.lucene.LuceneIndex.get(LuceneIndex.java:206)
at neo4j.Neo4J.GetNode(Neo4J.java:64)
at neo4j.Neo4J.GetStressTest1(Neo4J.java:122)
at neo4j.Neo4J.main(Neo4J.java:46)
Java Result: 1
BUILD SUCCESSFUL (total time: 13 minutes 48 seconds)

Michael Hunger

Mar 21, 2012, 5:24:19 PM
to ne...@googlegroups.com
Please batch your transactions so that you commit after 10k to 20k nodes.
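
Something like this, roughly (an untested sketch; the batch size and helper name are just illustrative):

    // Sketch: commit in batches of ~10k inserts instead of one huge transaction.
    private static void StoreStressTest1Batched(int first, int last) {
        final int BATCH_SIZE = 10000; // anywhere in the 10k-20k range
        Transaction tx = gdb.beginTx();
        try {
            for (int i = first; i < last; i++) {
                CreateIndexInsert(i, Id2UserName(i));
                if ((i - first + 1) % BATCH_SIZE == 0) {
                    tx.success();
                    tx.finish();        // commit this batch and free its state
                    tx = gdb.beginTx(); // open the next batch
                }
            }
            tx.success();
        } finally {
            tx.finish();
        }
    }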

Michael

Mohammad Habbab

Mar 21, 2012, 5:29:49 PM
to ne...@googlegroups.com
Thank you for the fast response, Michael, but my Neo4j application requires that I do small numbers of inserts and deletes per user per request, and Neo4j will be my only online DB server. So how would you approach a solution programmatically, in your opinion?

Michael Hunger

Mar 21, 2012, 6:41:00 PM
to ne...@googlegroups.com
But you're inserting 200k nodes; that's not a small transaction, is it?

Small transactions (a few nodes) have a larger overhead compared to medium-sized ones (10k-20k), but they work fine nonetheless.

Michael

Mohammad Habbab

Mar 22, 2012, 4:56:29 AM
to ne...@googlegroups.com
True... the 200K nodes are part of my testing for Neo4j. But what I don't understand is this: I committed the insert, and it worked successfully. The output shows that the exception occurs in the GetStressTest1() function (the function responsible for getting a random node from the database a large number of times). So what's keeping my 200K nodes in memory, and why isn't the garbage collector doing some cleaning? I did a memory profile and inserted some calls to the garbage collector in an updated version of the code, plus I changed the GetNode() function to make sure I call close() on the IndexHits<Node> result from my query. Still, the resources are not getting freed at all... thank you in advance!
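
For reference, my changed GetNode() looks roughly like this (reconstructed, since I didn't paste the updated code):

    // Variant of GetNode() that explicitly closes the IndexHits result.
    // (In Neo4j 1.x, getSingle() should already close the hits, so the
    // explicit close() is just belt and braces.)
    private static long GetNode() {
        int id2find = (int) (Math.random() * NODES_COUNT);
        long start = System.currentTimeMillis();
        IndexHits<Node> hits = userIndexer.get("" + id2find, Id2UserName(id2find));
        try {
            Node user2find = hits.getSingle();
        } finally {
            hits.close(); // release the underlying Lucene searcher resources
        }
        return System.currentTimeMillis() - start;
    }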

Michael Hunger

Mar 22, 2012, 6:21:17 AM
to ne...@googlegroups.com
I looked into your code.
After all, you are just exercising the Lucene query engine and none of Neo4j's graph operations.

I changed the index add + get to use "username" as the key. Otherwise the 200k different generated key strings keep Lucene busy, which is not the intended use: it takes Lucene more and more time to handle 200k different keys (as opposed to values), and this is also the cause of the OOM.
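
I.e. roughly like this (a sketch; only the key argument changes):

    // Index and look up with the fixed key "username", so Lucene sees one
    // field with 200k values instead of 200k distinct fields.
    private static void CreateIndexInsert(final int id, final String username) {
        Node usernode = gdb.createNode();
        usernode.setProperty("username", username);
        userIndexer.add(usernode, "username", username); // was: "" + id
        userNS.createRelationshipTo(usernode, RelTypes.USER);
    }

    private static long GetNode() {
        int id2find = (int) (Math.random() * NODES_COUNT);
        long start = System.currentTimeMillis();
        Node user2find = userIndexer.get("username", Id2UserName(id2find)).getSingle(); // was: "" + id2find
        return System.currentTimeMillis() - start;
    }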

With this change, and batching every 10k, it inserted the nodes on my system in 9.4 sec and retrieved them in 18 sec.

Michael

Mattias Persson

Mar 30, 2012, 5:56:15 AM
to ne...@googlegroups.com
Yes, Lucene will choke when you add many distinct keys.

2012/3/22 Michael Hunger <michael...@neotechnology.com>

--
Mattias Persson, [mat...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com