Importing DBLP into Neo4j, the database doesn't start....

Mohammad Hossain Namaki

Apr 28, 2016, 3:37:42 PM
to Neo4j
Dear all,
I've written Java code to import DBLP into Neo4j. It uses the BatchInserter to create the dataset. However, after running for two days with 20 GB of memory, the code is stuck at the db.shutdown() line. The current Neo4j store (e.g. dblp.db) is 3.2 GB, and the files inside it were last modified 2 days ago, yet the program has not finished. When I download the store and try to open it with Neo4j Server or the Java API, it cannot start.

Could you please guide me on that?
The number of entities to be added is about 6,600,000 and the number of relationships to be added is about 11,550,000.

I've attached the messages.log file and the source code. I'm using the Neo4j 2.3.0 jar file.

Here is a summary of the part of the source code that creates the database:


    Map<String, String> config = new HashMap<String, String>();
    config.put("dbms.pagecache.memory", "50000M");
    config.put("dbms.pagecache.pagesize", "8g");
    config.put("node_auto_indexing", "true");
    db = BatchInserters.inserter("dblp2.db", config);
    indexProvider = new LuceneBatchInserterIndexProvider(db);
    index = indexProvider.nodeIndex("dblpIndex", MapUtil.stringMap("type", "exact"));
    index.setCacheCapacity(KEY_PROPERTY, 500000001);
    ........
    Label label = DynamicLabel.label(author);
    long nodeId = db.createNode(null, label);
    ......
    if ((totalEntity % 50000) == 0) {
        index.flush();
    }
    .......
    Long nodeId = db.createNode(props, labels.toArray(new Label[labels.size()]));
    .......
    db.createRelationship(entityKeyNodeMap.get(key), distinctAuthors.get(author), RelTypes.WRITTEN_BY, null);
    ......
    if ((totalEntity % 50000) == 0) {
        index.flush();
        System.out.println("relationship: " + totalEntity);
    }

    System.out.println("indexProvider shutting down");
    indexProvider.shutdown();

    System.out.println("db shutting down");
    db.shutdown();
    System.out.println("program is finished!");
    }

messages.log
DBLPImporter.zip

Mohammad Hossain Namaki

Apr 29, 2016, 5:04:41 PM
to Neo4j
In the end I stopped the process and started the Neo4j server on that database. After about 2 hours it created the "neostore.counts.db" file, and I can now run it. However, it is rather slow to work with.

I think this is related to the node/relationship index. The index directory contains only "segments_1" and "segments.gen". How can I create an index over the labels of an already created graph?

Michael Hunger

Apr 29, 2016, 10:46:44 PM
to ne...@googlegroups.com
You are not supposed to set the page size.

And leave off that cache capacity.
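
To put the configured sizes in perspective, here is a rough back-of-the-envelope sketch in Java. It assumes the fixed record sizes given in the Neo4j 2.x manual for the store files (15 bytes per node record, 34 bytes per relationship record). The class and method names are made up for illustration, and properties, labels, and indexes are not counted.

```java
/** Rough store-size estimate. Record sizes are assumptions taken from the Neo4j 2.x manual. */
final class StoreSizeEstimate {
    static final long NODE_RECORD_BYTES = 15L;          // assumed size of one node record
    static final long RELATIONSHIP_RECORD_BYTES = 34L;  // assumed size of one relationship record

    static long nodeStoreBytes(long nodes) {
        return nodes * NODE_RECORD_BYTES;
    }

    static long relationshipStoreBytes(long relationships) {
        return relationships * RELATIONSHIP_RECORD_BYTES;
    }

    public static void main(String[] args) {
        long nodes = 6_600_000L;           // count quoted in the original post
        long relationships = 11_550_000L;  // count quoted in the original post

        long nodeBytes = nodeStoreBytes(nodes);
        long relBytes = relationshipStoreBytes(relationships);

        System.out.println("node store bytes:         " + nodeBytes);
        System.out.println("relationship store bytes: " + relBytes);
        System.out.println("total bytes:              " + (nodeBytes + relBytes));
    }
}
```

Under those assumptions the node and relationship stores together come to roughly half a gigabyte, and the whole store reported in the thread is 3.2 GB. A page cache of 50000M is therefore far larger than the data, and also larger than the 20 GB of memory mentioned in the post.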


Michael Hunger

Apr 29, 2016, 10:47:59 PM
to ne...@googlegroups.com
Create a normal schema index, not a manual Lucene index.

There is a method on the batch inserter to create deferred schema indexes and constraints.
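
A minimal sketch of that suggestion, assuming the Neo4j 2.3 batch-inserter API (`createDeferredSchemaIndex` on `BatchInserter`) and the Neo4j 2.3 jar on the classpath. The `Author` label, the `name` property, the sample value, and the page cache size are placeholders for illustration, not taken from the attached importer.

```java
import java.io.File;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.neo4j.graphdb.DynamicLabel;
import org.neo4j.graphdb.Label;
import org.neo4j.unsafe.batchinsert.BatchInserter;
import org.neo4j.unsafe.batchinsert.BatchInserters;

public class DeferredIndexSketch {
    public static void main(String[] args) throws IOException {
        Map<String, String> config = new HashMap<String, String>();
        // Placeholder value; no page size and no Lucene cache capacity are set.
        config.put("dbms.pagecache.memory", "4g");

        BatchInserter inserter = BatchInserters.inserter(new File("dblp2.db"), config);
        try {
            Label author = DynamicLabel.label("Author"); // hypothetical label

            // Register a schema index on :Author(name). It is deferred, so it is
            // built after the import instead of being updated on every insert.
            inserter.createDeferredSchemaIndex(author).on("name").create();

            Map<String, Object> props = new HashMap<String, Object>();
            props.put("name", "Example Author"); // hypothetical property value
            long nodeId = inserter.createNode(props, author);
            System.out.println("created node " + nodeId);
        } finally {
            // Shutdown flushes the store; it must complete for the store to be usable.
            inserter.shutdown();
        }
    }
}
```

During the import itself, key-to-node-id lookups can stay in ordinary in-memory maps such as the `entityKeyNodeMap` already used in the summary. For a store that has already been built, the Cypher statement `CREATE INDEX ON :Author(name)` (Neo4j 2.x syntax, same placeholder names) creates the same kind of schema index.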


Mohammad Hossain Namaki

May 2, 2016, 7:19:54 PM
to Neo4j
Dear Michael,
Sorry, I don't understand this:
"You are not supposed to set the pagesize

And leave off that cache capacity"

Does it mean that I should remove the lines that set the page size and the cache capacity?

Thanks.

