Re: [Neo4j] loading huge graphs

Jacob Hansson

Oct 2, 2012, 11:52:13 AM
to ne...@googlegroups.com

The caches in Neo4j have eviction policies, so only frequently used data is kept there. When you store billions of nodes in Neo4j, only a fraction of them will sit in the caches at any time.
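
To make the eviction idea concrete, here is a minimal sketch of a least-recently-used cache built on java.util.LinkedHashMap. It only illustrates the general principle; it is not Neo4j's cache implementation, and the class names LruCacheSketch and LruCache as well as the capacity of 2 are made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; this is NOT how Neo4j implements its caches.
class LruCacheSketch {

    /** A fixed-capacity map that evicts the least recently accessed entry. */
    static class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int capacity;

        LruCache(int capacity) {
            // accessOrder = true keeps entries ordered by most recent access.
            super(16, 0.75f, true);
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            // Called after each put; returning true drops the eldest entry.
            return size() > capacity;
        }
    }

    public static void main(String[] args) {
        LruCache<Long, String> cache = new LruCache<>(2);
        cache.put(1L, "node-1");
        cache.put(2L, "node-2");
        cache.get(1L);            // touch node 1, so node 2 becomes the eldest
        cache.put(3L, "node-3");  // exceeds the capacity, so node 2 is evicted
        System.out.println(cache.keySet()); // prints [1, 3]
    }
}
```

However many entries pass through such a cache, at most `capacity` of them are held in memory at once, which is why the total number of nodes in the store does not have to fit in the cache.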

Sent from my phone, please excuse typos and brevity.

On Oct 2, 2012 4:17 PM, "Gergely Svigruha" <sger...@gmail.com> wrote:
Hi,

I have a question regarding loading huge graphs into the Neo4j DB. I've read that Neo4j is capable of handling even a few billion vertices. On the other hand, all the code examples I've found use a cache (typically a java.util.HashMap) for the nodes when loading the edges, and I'm not sure that Java can handle such a big HashMap. So what is the best way to load such a huge graph?

Thanks!

Greg


James Thornton

Oct 2, 2012, 12:27:05 PM
to ne...@googlegroups.com
If you are preprocessing your data, I typically use Redis for that. The fastest way to load data into Neo4j is the batch importer (https://github.com/jexp/batch-import), which imports from a CSV file.

-  James
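
As an in-process alternative for the key-to-node-ID lookup that the original question worries about, the sketch below stores the mapping in two primitive long arrays instead of a HashMap<Long, Long>, which avoids one boxed Long pair and one entry object per key. It is only a sketch under stated assumptions: the names LongIdMapSketch and LongLongMap are hypothetical, stored IDs must be greater than 0 (0 marks an empty slot), and because Java arrays are indexed by int it tops out at some hundreds of millions of keys rather than billions.

```java
// Hypothetical sketch: an open-addressing map from long keys to long IDs.
class LongIdMapSketch {

    static class LongLongMap {
        private long[] keys = new long[16];
        private long[] values = new long[16]; // 0 means "slot is empty"
        private int size;

        private static int slot(long key, int mask) {
            // Multiplicative hash to spread sequential keys over the table.
            long h = key * 0x9E3779B97F4A7C15L;
            return ((int) (h >>> 32)) & mask;
        }

        /** Returns the stored ID, or 0 if the key is absent. */
        long get(long key) {
            int mask = keys.length - 1;
            int i = slot(key, mask);
            while (values[i] != 0) {
                if (keys[i] == key) return values[i];
                i = (i + 1) & mask; // linear probing
            }
            return 0;
        }

        void put(long key, long value) {
            if (value <= 0) throw new IllegalArgumentException("value must be > 0");
            // Keep the table at most half full so probing always terminates.
            if ((size + 1) * 2L > keys.length) grow();
            if (insert(keys, values, key, value)) size++;
        }

        int size() {
            return size;
        }

        private static boolean insert(long[] ks, long[] vs, long key, long value) {
            int mask = ks.length - 1;
            int i = slot(key, mask);
            while (vs[i] != 0) {
                if (ks[i] == key) { vs[i] = value; return false; } // overwrite
                i = (i + 1) & mask;
            }
            ks[i] = key;
            vs[i] = value;
            return true;
        }

        private void grow() {
            long[] ks = new long[keys.length * 2];
            long[] vs = new long[values.length * 2];
            for (int i = 0; i < keys.length; i++) {
                if (values[i] != 0) insert(ks, vs, keys[i], values[i]);
            }
            keys = ks;
            values = vs;
        }
    }

    public static void main(String[] args) {
        LongLongMap ids = new LongLongMap();
        long nextId = 0;
        long[] input = {36201234567L, 36309876543L, 36201234567L}; // made-up keys
        for (long key : input) {
            if (ids.get(key) == 0) ids.put(key, ++nextId); // assign IDs on first sight
        }
        System.out.println(ids.get(36309876543L)); // prints 2
        System.out.println(ids.size());            // prints 2
    }
}
```

At a load factor of at most one half, each stored key costs between 32 and 64 bytes of array space depending on how recently the table doubled, so whether this is a real saving over a boxed map depends on the JVM and is worth measuring on your own data.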

Peter Neubauer

Oct 2, 2012, 3:23:46 PM
to ne...@googlegroups.com
Thanks James,
I had written exactly the same answer but it got stuck in the outbox :)

Cheers,

/peter neubauer

G: neubauer.peter
S: peter.neubauer
P: +46 704 106975
L: http://www.linkedin.com/in/neubauer
T: @peterneubauer

Wanna learn something new? Come to http://graphconnect.com

Gergely Svigruha

Oct 2, 2012, 9:51:18 PM
to ne...@googlegroups.com
Thank you, I think what I need is the batch CSV importer :)

Greg

Gergely Svigruha

Oct 3, 2012, 1:38:14 AM
to ne...@googlegroups.com
I just ran into another issue. After creating the nodes, I try to create the edges using the BatchInserter.createRelationship(nodeId1, nodeId2, relationType, properties) method, but I get an exception:
java.io.IOException: The process cannot access the file because another process has locked a portion of the file

Can you help me figure out what is causing this?
Thanks.

Greg

Michael Hunger

Oct 3, 2012, 3:27:19 AM
to ne...@googlegroups.com
Can you share the code you used?

Michael


Gergely Svigruha

Oct 3, 2012, 3:51:48 AM
to ne...@googlegroups.com
This is the code I use. Unfortunately, my input doesn't contain node IDs, which the CSV importer recommended earlier requires, so I have to create the IDs myself. A previous version read the input only once and created the nodes and edges simultaneously, but it failed with the same error after ~30M edges / 90M.

import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.neo4j.graphdb.RelationshipType;
import org.neo4j.helpers.collection.MapUtil;
import org.neo4j.kernel.impl.util.FileUtils;
import org.neo4j.unsafe.batchinsert.BatchInserter;
import org.neo4j.unsafe.batchinsert.BatchInserters;

public class GraphImporter_v2 {

    private long nodeIdx = 0;
    // Maps the number found in the input file to the node id assigned here.
    private Map<Long, Long> idxMap = new HashMap<Long, Long>();

    enum RelType implements RelationshipType {
        CALLS
    }

    private void createNode(long pnum, BatchInserter db) {
        if (!idxMap.containsKey(pnum)) {
            nodeIdx++;
            idxMap.put(pnum, nodeIdx);
            Map<String, Object> prop = new HashMap<String, Object>();
            prop.put("Number", pnum);
            db.createNode(nodeIdx, prop);
        }
    }

    private long getNodeNum(long pnum) throws Exception {
        if (idxMap.containsKey(pnum)) {
            return idxMap.get(pnum);
        } else {
            throw new Exception("Missing number: " + pnum);
        }
    }

    public static void main(String[] args) {
        GraphImporter_v2 importer = new GraphImporter_v2();
        importer.load(args[0], args[1]);
    }

    private void load(String inputFile, String dbpath) {
        try {
            File graphDb = new File(dbpath);
            if (graphDb.exists()) {
                FileUtils.deleteRecursively(graphDb);
            }
            long edges = 0;
            long errorRows = 0;
            Map<String, String> config = new HashMap<String, String>();
            config = MapUtil.load(new File("batch.properties"));
            BatchInserter db = BatchInserters.inserter(dbpath, config);

            // First pass: create one node per distinct number.
            BufferedReader reader = new BufferedReader(new FileReader(new File(inputFile)));
            reader.readLine(); // skip the header row
            String line = null;
            while ((line = reader.readLine()) != null) {
                String[] lineData = line.split(",");
                try {
                    createNode(Long.valueOf(lineData[0].replace("\"", "")), db);
                    createNode(Long.valueOf(lineData[1].replace("\"", "")), db);
                } catch (NumberFormatException e) {
                    errorRows++;
                }
                edges++;
            }

            System.out.println("Total edges: " + edges);
            System.out.println("Error edges: " + errorRows);
            reader.close();

            // Second pass: create the relationships.
            reader = new BufferedReader(new FileReader(new File(inputFile)));
            System.out.println("Loading edges..");
            long node1 = 0;
            long node2 = 0;
            reader.readLine(); // skip the header row
            line = null;
            while ((line = reader.readLine()) != null) {
                String[] lineData = line.split(",");
                try {
                    node1 = getNodeNum(Long.valueOf(lineData[0].replace("\"", "")));
                    node2 = getNodeNum(Long.valueOf(lineData[1].replace("\"", "")));
                    Map<String, Object> prop = new HashMap<String, Object>();
                    prop.put("Duration", Integer.valueOf(lineData[2]));
                    prop.put("Cnt", Integer.valueOf(lineData[3]));
                    prop.put("Charge", Integer.valueOf(lineData[4]));
                    db.createRelationship(node1, node2, RelType.CALLS, prop);
                } catch (NumberFormatException e) {}
            }

            db.shutdown();
            reader.close();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (Throwable e) {
            e.printStackTrace();
        }
    }
}

Michael Hunger

Oct 3, 2012, 4:43:02 AM
to ne...@googlegroups.com
Probably a leftover file lock from a previous run.

Try closing the readers and shutting down the db in a try ... finally block.
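
A minimal sketch of that cleanup pattern, using only the standard library. The Store interface is a hypothetical stand-in for the batch inserter (only a shutdown() method is assumed), and CleanupSketch, TrackingReader, and importLines are made-up names. The nested finally is one possible design: it makes sure the reader is closed even if shutting down the store throws.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

class CleanupSketch {

    /** Hypothetical stand-in for the batch inserter; only shutdown() matters here. */
    interface Store {
        void shutdown();
    }

    /** A reader that remembers whether close() was called, for demonstration. */
    static class TrackingReader extends BufferedReader {
        boolean closed;

        TrackingReader(String text) {
            super(new StringReader(text));
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    /** Counts the input rows and always releases both resources afterwards. */
    static int importLines(BufferedReader reader, Store store) throws IOException {
        int rows = 0;
        try {
            while (reader.readLine() != null) {
                rows++; // real code would create nodes or relationships here
            }
            return rows;
        } finally {
            try {
                store.shutdown();
            } finally {
                reader.close(); // runs even if shutdown() throws
            }
        }
    }

    public static void main(String[] args) throws IOException {
        TrackingReader reader = new TrackingReader("a,b\nc,d\n");
        try {
            importLines(reader, () -> {
                throw new IllegalStateException("shutdown failed");
            });
        } catch (IllegalStateException e) {
            // prints "shutdown threw, reader closed: true"
            System.out.println("shutdown threw, reader closed: " + reader.closed);
        }
    }
}
```

If both resources are released inside a single try block, an exception from the first call skips the second one, which is the case the nested finally avoids.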

Is this Windows?

Sent from mobile device

Gergely Svigruha

Oct 3, 2012, 5:14:08 AM
to ne...@googlegroups.com
I put them in the finally block, but the problem still occurs. I even started with a new db, so I don't think there can be any leftover locks. The problem always occurs at the same edge (the same row in the input file).

Yes, this is Windows 7.

Michael Hunger

Oct 3, 2012, 5:36:27 AM
to ne...@googlegroups.com
Do you have the full exception and your graphdb/messages.log?

Sent from mobile device

Gergely Svigruha

Oct 3, 2012, 6:05:45 AM
to ne...@googlegroups.com
Yes, sorry, I should have started with that. There are two exceptions. The first occurs when the program tries to insert the edge.
The second occurs when the program invokes db.shutdown() on the BatchInserter. Unfortunately messages.log is empty, I assume because the program was unable to shut down the db. I use neo4j-community-1.8.RC1-windows.

Exception 1:

java.io.IOException: The process cannot access the file because another process has locked a portion of the file
        at java.nio.MappedByteBuffer.force0(Native Method)
        at java.nio.MappedByteBuffer.force(Unknown Source)
        at org.neo4j.kernel.impl.nioneo.store.MappedPersistenceWindow.force(MappedPersistenceWindow.java:93)
        at org.neo4j.kernel.impl.nioneo.store.LockableWindow.writeOutAndClose(LockableWindow.java:64)
        at org.neo4j.kernel.impl.nioneo.store.LockableWindow.writeOutAndCloseIfFree(LockableWindow.java:146)
        at org.neo4j.kernel.impl.nioneo.store.PersistenceWindowPool.doRefreshBricks(PersistenceWindowPool.java:514)
        at org.neo4j.kernel.impl.nioneo.store.PersistenceWindowPool.refreshBricks(PersistenceWindowPool.java:460)
        at org.neo4j.kernel.impl.nioneo.store.PersistenceWindowPool.acquire(PersistenceWindowPool.java:127)
        at org.neo4j.kernel.impl.nioneo.store.CommonAbstractStore.acquireWindow(CommonAbstractStore.java:520)
        at org.neo4j.kernel.impl.nioneo.store.NodeStore.getRecord(NodeStore.java:76)
        at org.neo4j.unsafe.batchinsert.BatchInserterImpl.getNodeRecord(BatchInserterImpl.java:904)
        at org.neo4j.unsafe.batchinsert.BatchInserterImpl.createRelationship(BatchInserterImpl.java:455)
        at GraphImporter_v2.load(GraphImporter_v2.java:101)
        at GraphImporter_v2.main(GraphImporter_v2.java:45)

Exception 2:

org.neo4j.kernel.impl.nioneo.store.UnderlyingStorageException: Unable to close store jakarta_g_db\neostore.nodestore.db
        at org.neo4j.kernel.impl.nioneo.store.CommonAbstractStore.close(CommonAbstractStore.java:699)
        at org.neo4j.kernel.impl.nioneo.store.NeoStore.closeStorage(NeoStore.java:236)
        at org.neo4j.kernel.impl.nioneo.store.CommonAbstractStore.close(CommonAbstractStore.java:634)
        at org.neo4j.unsafe.batchinsert.BatchInserterImpl.shutdown(BatchInserterImpl.java:700)
        at GraphImporter_v2.load(GraphImporter_v2.java:120)
        at GraphImporter_v2.main(GraphImporter_v2.java:45)
Caused by: java.io.IOException: The requested operation cannot be performed on a file with a user-mapped section open
        at sun.nio.ch.FileDispatcherImpl.truncate0(Native Method)
        at sun.nio.ch.FileDispatcherImpl.truncate(Unknown Source)
        at sun.nio.ch.FileChannelImpl.truncate(Unknown Source)
        at org.neo4j.kernel.impl.nioneo.store.CommonAbstractStore.close(CommonAbstractStore.java:669)
        ... 5 more

The modified code:

        BatchInserter db = null;
        BufferedReader reader = null;
        try {
            File graphDb = new File(dbpath);
            if (graphDb.exists()) {
                FileUtils.deleteRecursively(graphDb);
            }
            long edges = 0;
            long errorRows = 0;
            Map<String, String> config = new HashMap<String, String>();
            config = MapUtil.load(new File("batch.properties"));
            db = BatchInserters.inserter(dbpath, config);
            reader = new BufferedReader(new FileReader(new File(inputFile)));
            reader.readLine();
            String line = null;
            while ((line = reader.readLine()) != null) {
                String[] lineData = line.split(",");
                try {
                    createNode(Long.valueOf(lineData[0].replace("\"", "")), db);
                    createNode(Long.valueOf(lineData[1].replace("\"", "")), db);
                } catch (NumberFormatException e) {
                    errorRows++;
                }
                edges++;
                if (edges % 1000000 == 0) {
                    System.out.println("Edges: " + edges + "(" + errorRows + "); Nodes: " + nodeIdx);
                }
            }

            System.out.println("Total edges: " + edges);
            System.out.println("Error edges: " + errorRows);
            reader.close();
            reader = new BufferedReader(new FileReader(new File(inputFile)));
            System.out.println("Loading edges..");
            long node1 = 0;
            long node2 = 0;
            reader.readLine();
            edges = 0;
            line = null;
            while ((line = reader.readLine()) != null) {
                String[] lineData = line.split(",");
                try {
                    node1 = getNodeNum(Long.valueOf(lineData[0].replace("\"", "")));
                    node2 = getNodeNum(Long.valueOf(lineData[1].replace("\"", "")));
                    Map<String, Object> prop = new HashMap<String, Object>();
                    prop.put("Duration", Integer.valueOf(lineData[2]));
                    prop.put("Cnt", Integer.valueOf(lineData[3]));
                    prop.put("Charge", Integer.valueOf(lineData[4]));
                    db.createRelationship(node1, node2, RelType.CALLS, prop);
                } catch (NumberFormatException e) {}
                edges++;
                if (edges % 1000000 == 0) {
                    System.out.println("Edges: " + edges + "; Nodes: " + nodeIdx);
                }
            }

        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (Throwable e) {
            e.printStackTrace();
        } finally {
            try {
                if (db != null) {
                    db.shutdown();
                }
                if (reader != null) {
                    reader.close();
                }
            } catch (Throwable e) {
                e.printStackTrace();
            }
        }
    }
}