loading huge graphs
From: Gergely Svigruha <sgerg...@gmail.com>
Date: Tue, 2 Oct 2012 07:16:15 -0700 (PDT)
Local: Tues, Oct 2 2012 10:16 am
Subject: loading huge graphs

Hi,

I have a question about loading huge graphs into the Neo4j DB. I've
read that Neo4j can handle even a few billion vertices. On the other
hand, all the code examples I've found use a cache (typically
java.util.HashMap) for the nodes when loading the edges, and I'm not sure
that Java can handle such a big HashMap. So what is the best way to load
such a huge graph?

Thanks!

Greg
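
One way to sidestep a boxed HashMap<Long, Long> during import is to keep the external keys in a sorted primitive array and use each key's position as its node id. The following is only a minimal sketch of that idea and is not something proposed in this thread; the class name CompactIdIndex and its methods are hypothetical. A long[] costs 8 bytes per key, while a boxed map entry typically costs several times that.

```java
import java.util.Arrays;

/** Hypothetical helper: maps external keys to dense ids 0..n-1 without boxing. */
final class CompactIdIndex {
    private final long[] sortedKeys;

    /** Builds the index from raw keys; duplicate keys are collapsed. */
    CompactIdIndex(long[] rawKeys) {
        sortedKeys = Arrays.stream(rawKeys).sorted().distinct().toArray();
    }

    /** Returns the dense id for a key, or -1 if the key is unknown. */
    long idOf(long key) {
        int pos = Arrays.binarySearch(sortedKeys, key);
        return pos >= 0 ? pos : -1;
    }

    int size() {
        return sortedKeys.length;
    }

    public static void main(String[] args) {
        CompactIdIndex index = new CompactIdIndex(new long[] {4420L, 17L, 4420L, 903L});
        System.out.println(index.size());     // number of distinct keys
        System.out.println(index.idOf(903L)); // position of 903 in sorted order
        System.out.println(index.idOf(5L));   // -1, key was never added
    }
}
```

A single Java array holds at most about two billion entries, so a larger key set would have to be split across several arrays, and lookups cost O(log n) instead of the O(1) of a hash map.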


 
From: Jacob Hansson <jacob.hans...@neotechnology.com>
Date: Tue, 2 Oct 2012 17:52:13 +0200
Local: Tues, Oct 2 2012 11:52 am
Subject: Re: [Neo4j] loading huge graphs

The caches in Neo4j have eviction policies, so only commonly used data is
stored there. When you store billions of nodes in Neo4j, only a fraction
of them will sit in the caches.
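
To illustrate what an eviction policy does in general, here is a small least-recently-used cache built on java.util.LinkedHashMap. This is a generic sketch and makes no claim about how Neo4j's own caches are implemented; the class name LruCache is made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative LRU cache; not Neo4j's actual cache implementation. */
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        // accessOrder=true keeps entries ordered from least to most recently used.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    /** Called by LinkedHashMap after each insert; returning true drops the eldest entry. */
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruCache<Long, String> cache = new LruCache<>(2);
        cache.put(1L, "node-1");
        cache.put(2L, "node-2");
        cache.get(1L);           // touch node 1, so node 2 becomes the eldest entry
        cache.put(3L, "node-3"); // exceeds the capacity and evicts node 2
        System.out.println(cache.keySet()); // [1, 3]
    }
}
```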

Sent from my phone, please excuse typos and brevity.
On Oct 2, 2012 4:17 PM, "Gergely Svigruha" <sgerg...@gmail.com> wrote:


 
From: James Thornton <james.thorn...@gmail.com>
Date: Tue, 2 Oct 2012 09:27:05 -0700 (PDT)
Local: Tues, Oct 2 2012 12:27 pm
Subject: Re: loading huge graphs

If you need to preprocess your data, I typically use Redis for that. The
fastest way to load data into Neo4j is the batch importer
(https://github.com/jexp/batch-import), which imports from a CSV file.

-  James
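
A preprocessing step of this kind could look roughly like the sketch below, which assigns sequential ids to raw keys and emits node rows and relationship rows. The tab-separated column layout, the header names, and the class EdgeListPreprocessor are assumptions made for illustration only; the importer's README is the authority on the real file format. The in-memory map is where an external store such as Redis would come in for very large inputs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical preprocessing step: turns "from,to" rows into node and relationship rows. */
final class EdgeListPreprocessor {
    final List<String> nodeRows = new ArrayList<>();
    final List<String> relRows = new ArrayList<>();
    private final Map<String, Long> ids = new LinkedHashMap<>();

    EdgeListPreprocessor() {
        nodeRows.add("Number");          // assumed header: a single property column
        relRows.add("start\tend\ttype"); // assumed header layout
    }

    /** Assigns the next sequential id (starting at 1) the first time a key is seen. */
    private long idFor(String key) {
        Long id = ids.get(key);
        if (id == null) {
            id = (long) ids.size() + 1;
            ids.put(key, id);
            nodeRows.add(key);
        }
        return id;
    }

    /** Registers both endpoints of one input row and records the relationship. */
    void addEdge(String csvLine) {
        String[] cols = csvLine.split(",");
        long from = idFor(cols[0].trim());
        long to = idFor(cols[1].trim());
        relRows.add(from + "\t" + to + "\tCALLS");
    }

    public static void main(String[] args) {
        EdgeListPreprocessor pre = new EdgeListPreprocessor();
        pre.addEdge("100,200");
        pre.addEdge("200,300");
        pre.nodeRows.forEach(System.out::println);
        pre.relRows.forEach(System.out::println);
    }
}
```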


 
From: Peter Neubauer <peter.neuba...@neotechnology.com>
Date: Tue, 2 Oct 2012 21:23:46 +0200
Local: Tues, Oct 2 2012 3:23 pm
Subject: Re: [Neo4j] Re: loading huge graphs
Thanks James,
I had written exactly the same answer but it got stuck in the outbox :)

Cheers,

/peter neubauer

G:  neubauer.peter
S:  peter.neubauer
P:  +46 704 106975
L:   http://www.linkedin.com/in/neubauer
T:   @peterneubauer

Wanna learn something new? Come to http://graphconnect.com


 
From: Gergely Svigruha <sgerg...@gmail.com>
Date: Tue, 2 Oct 2012 18:51:18 -0700 (PDT)
Local: Tues, Oct 2 2012 9:51 pm
Subject: Re: [Neo4j] Re: loading huge graphs

Thank you, I think what I need is the batch - CSV importer:)

Greg

On Wednesday, October 3, 2012 at 2:24:09 UTC+7, Peter Neubauer wrote:


 
From: Gergely Svigruha <sgerg...@gmail.com>
Date: Tue, 2 Oct 2012 22:38:14 -0700 (PDT)
Local: Wed, Oct 3 2012 1:38 am
Subject: Re: [Neo4j] Re: loading huge graphs

I just ran into another issue. After creating the nodes, I try to create
the edges using the BatchInserter.createRelationship(nodeId1, nodeId2,
relationType, properties) method, but I get an exception:
java.io.IOException: The process cannot access the file because another
process has locked a portion of the file.

Can you help me figure out what is causing this?
Thanks.

Greg

On Wednesday, October 3, 2012 at 8:51:18 UTC+7, Gergely Svigruha wrote:


 
From: Michael Hunger <michael.hun...@neotechnology.com>
Date: Wed, 3 Oct 2012 09:27:19 +0200
Local: Wed, Oct 3 2012 3:27 am
Subject: Re: [Neo4j] Re: loading huge graphs

Can you share the code you used?

Michael

On 03.10.2012 at 07:38, Gergely Svigruha wrote:


 
From: Gergely Svigruha <sgerg...@gmail.com>
Date: Wed, 3 Oct 2012 00:51:48 -0700 (PDT)
Local: Wed, Oct 3 2012 3:51 am
Subject: Re: [Neo4j] Re: loading huge graphs

This is the code I use. Unfortunately, my input doesn't contain node IDs,
which the previously recommended CSV importer requires, so I have to
create the IDs myself. I have an earlier version which reads the input
only once and creates the nodes and edges simultaneously, but I got the
same error with that after ~30M edges / 90M.

import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.neo4j.graphdb.RelationshipType;
import org.neo4j.helpers.collection.MapUtil;
import org.neo4j.kernel.impl.util.FileUtils;
import org.neo4j.unsafe.batchinsert.BatchInserter;
import org.neo4j.unsafe.batchinsert.BatchInserters;

public class GraphImporter_v2 {

    private long nodeIdx = 0;
    // Maps each number from the input to the node id assigned to it.
    private Map<Long, Long> idxMap = new HashMap<Long, Long>();

    enum RelType implements RelationshipType {
        CALLS
    }

    private void createNode(long pnum, BatchInserter db) {
        if (!idxMap.containsKey(pnum)) {
            nodeIdx++;
            idxMap.put(pnum, nodeIdx);
            Map<String, Object> prop = new HashMap<String, Object>();
            prop.put("Number", pnum);
            db.createNode(nodeIdx, prop);
        }
    }

    private long getNodeNum(long pnum) throws Exception {
        if (idxMap.containsKey(pnum)) {
            return idxMap.get(pnum);
        } else {
            throw new Exception("Missing number: " + pnum);
        }
    }

    public static void main(String[] args) {
        GraphImporter_v2 importer = new GraphImporter_v2();
        importer.load(args[0], args[1]);
    }

    private void load(String inputFile, String dbpath) {
        try {
            File graphDb = new File(dbpath);
            if (graphDb.exists()) {
                FileUtils.deleteRecursively(graphDb);
            }
            long edges = 0;
            long errorRows = 0;
            Map<String, String> config = new HashMap<String, String>();
            config = MapUtil.load(new File("batch.properties"));
            BatchInserter db = BatchInserters.inserter(dbpath, config);

            // First pass: create a node for every distinct number.
            BufferedReader reader = new BufferedReader(new FileReader(new File(inputFile)));
            reader.readLine(); // skip the first line of the input
            String line = null;
            while ((line = reader.readLine()) != null) {
                String[] lineData = line.split(",");
                try {
                    createNode(Long.valueOf(lineData[0].replace("\"", "")), db);
                    createNode(Long.valueOf(lineData[1].replace("\"", "")), db);
                } catch (NumberFormatException e) {
                    errorRows++;
                }
                edges++;
            }

            System.out.println("Total edges: " + edges);
            System.out.println("Error edges: " + errorRows);
            reader.close();

            // Second pass: create the relationships.
            reader = new BufferedReader(new FileReader(new File(inputFile)));
            System.out.println("Loading edges..");
            long node1 = 0;
            long node2 = 0;
            reader.readLine(); // skip the first line of the input
            line = null;
            while ((line = reader.readLine()) != null) {
                String[] lineData = line.split(",");
                try {
                    node1 = getNodeNum(Long.valueOf(lineData[0].replace("\"", "")));
                    node2 = getNodeNum(Long.valueOf(lineData[1].replace("\"", "")));
                    Map<String, Object> prop = new HashMap<String, Object>();
                    prop.put("Duration", Integer.valueOf(lineData[2]));
                    prop.put("Cnt", Integer.valueOf(lineData[3]));
                    prop.put("Charge", Integer.valueOf(lineData[4]));
                    db.createRelationship(node1, node2, RelType.CALLS, prop);
                } catch (NumberFormatException e) {}
            }

            db.shutdown();
            reader.close();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (Throwable e) {
            e.printStackTrace();
        }
    }
}

On Wednesday, October 3, 2012 at 14:27:31 UTC+7, Michael Hunger wrote:


 
From: Michael Hunger <michael.hun...@neopersistence.com>
Date: Wed, 3 Oct 2012 10:43:02 +0200
Local: Wed, Oct 3 2012 4:43 am
Subject: Re: [Neo4j] Re: loading huge graphs

Probably a leftover file lock from a previous run.

Try closing the readers and shutting down the db in a try ... finally block.

Is this Windows?
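
The try ... finally suggestion can be sketched as follows. The Resource class is a hypothetical stand-in for the reader and the batch inserter, so the example runs without Neo4j on the classpath; it only shows that the cleanup code runs even when the import fails midway.

```java
import java.util.ArrayList;
import java.util.List;

final class CleanupDemo {
    /** Stand-in for a reader or the batch inserter; records when it is released. */
    static final class Resource implements AutoCloseable {
        private final String name;
        private final List<String> log;

        Resource(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        @Override
        public void close() {
            log.add("closed " + name);
        }
    }

    /** Runs an import that fails midway and returns what happened, in order. */
    static List<String> runFailingImport() {
        List<String> log = new ArrayList<>();
        Resource db = null;
        Resource reader = null;
        try {
            db = new Resource("db", log);
            reader = new Resource("reader", log);
            throw new IllegalStateException("simulated failure");
        } catch (IllegalStateException e) {
            log.add("caught " + e.getMessage());
        } finally {
            // Runs whether or not the body threw; release in reverse order of acquisition.
            if (reader != null) {
                reader.close();
            }
            if (db != null) {
                db.close();
            }
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(runFailingImport());
    }
}
```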

Sent from mobile device

On 03.10.2012 at 09:51, Gergely Svigruha <sgerg...@gmail.com> wrote:


 
From: Gergely Svigruha <sgerg...@gmail.com>
Date: Wed, 3 Oct 2012 02:14:08 -0700 (PDT)
Local: Wed, Oct 3 2012 5:14 am
Subject: Re: [Neo4j] Re: loading huge graphs

I put them in the finally block, but the problem still occurs. I even
started with a new db, so I don't think there can be any leftover locks.
The problem always occurs at the same edge (row in the input file).

Yes, this is Windows 7.

On Wednesday, October 3, 2012 at 15:43:15 UTC+7, Michael Hunger wrote:


 
From: Michael Hunger <michael.hun...@neopersistence.com>
Date: Wed, 3 Oct 2012 11:36:27 +0200
Local: Wed, Oct 3 2012 5:36 am
Subject: Re: [Neo4j] Re: loading huge graphs

Do you have the full exception and your graphdb/messages.log?

Sent from mobile device

On 03.10.2012 at 11:14, Gergely Svigruha <sgerg...@gmail.com> wrote:


 
From: Gergely Svigruha <sgerg...@gmail.com>
Date: Wed, 3 Oct 2012 03:05:45 -0700 (PDT)
Local: Wed, Oct 3 2012 6:05 am
Subject: Re: [Neo4j] Re: loading huge graphs

Yes, sorry, I should've started with that. There are two exceptions. The
first occurs when the program tries to insert the edge. The second occurs
when the program invokes db.shutdown() on the BatchInserter.
Unfortunately, messages.log is empty, I assume because the program was
unable to shut down the db. I use neo4j-community-1.8.RC1-windows.

*Exception 1:*

java.io.IOException: The process cannot access the file because another process has locked a portion of the file
        at java.nio.MappedByteBuffer.force0(Native Method)
        at java.nio.MappedByteBuffer.force(Unknown Source)
        at org.neo4j.kernel.impl.nioneo.store.MappedPersistenceWindow.force(MappedPersistenceWindow.java:93)
        at org.neo4j.kernel.impl.nioneo.store.LockableWindow.writeOutAndClose(LockableWindow.java:64)
        at org.neo4j.kernel.impl.nioneo.store.LockableWindow.writeOutAndCloseIfFree(LockableWindow.java:146)
        at org.neo4j.kernel.impl.nioneo.store.PersistenceWindowPool.doRefreshBricks(PersistenceWindowPool.java:514)
        at org.neo4j.kernel.impl.nioneo.store.PersistenceWindowPool.refreshBricks(PersistenceWindowPool.java:460)
        at org.neo4j.kernel.impl.nioneo.store.PersistenceWindowPool.acquire(PersistenceWindowPool.java:127)
        at org.neo4j.kernel.impl.nioneo.store.CommonAbstractStore.acquireWindow(CommonAbstractStore.java:520)
        at org.neo4j.kernel.impl.nioneo.store.NodeStore.getRecord(NodeStore.java:76)
        at org.neo4j.unsafe.batchinsert.BatchInserterImpl.getNodeRecord(BatchInserterImpl.java:904)
        at org.neo4j.unsafe.batchinsert.BatchInserterImpl.createRelationship(BatchInserterImpl.java:455)
        at GraphImporter_v2.load(GraphImporter_v2.java:101)
        at GraphImporter_v2.main(GraphImporter_v2.java:45)

*Exception 2:*

org.neo4j.kernel.impl.nioneo.store.UnderlyingStorageException: Unable to close store jakarta_g_db\neostore.nodestore.db
        at org.neo4j.kernel.impl.nioneo.store.CommonAbstractStore.close(CommonAbstractStore.java:699)
        at org.neo4j.kernel.impl.nioneo.store.NeoStore.closeStorage(NeoStore.java:236)
        at org.neo4j.kernel.impl.nioneo.store.CommonAbstractStore.close(CommonAbstractStore.java:634)
        at org.neo4j.unsafe.batchinsert.BatchInserterImpl.shutdown(BatchInserterImpl.java:700)
        at GraphImporter_v2.load(GraphImporter_v2.java:120)
        at GraphImporter_v2.main(GraphImporter_v2.java:45)
Caused by: java.io.IOException: The requested operation cannot be performed on a file with a user-mapped section open
        at sun.nio.ch.FileDispatcherImpl.truncate0(Native Method)
        at sun.nio.ch.FileDispatcherImpl.truncate(Unknown Source)
        at sun.nio.ch.FileChannelImpl.truncate(Unknown Source)
        at org.neo4j.kernel.impl.nioneo.store.CommonAbstractStore.close(CommonAbstractStore.java:669)
        ... 5 more
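
Both traces pass through memory-mapped file I/O: the first fails in MappedByteBuffer.force, the second in FileChannelImpl.truncate on a file that still has a mapped section open. The sketch below uses only java.nio to show those two operations on a temporary file. It does not reproduce Neo4j's store code, and the class and method names are made up for the example. On Windows, the truncate step is known to fail while a mapping of the file still exists; on other systems it usually succeeds.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedFileDemo {
    /** Writes one long through a memory mapping, then tries to truncate the file. */
    static String writeThenTruncate(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Map the first 16 bytes; the file grows to that size if it is shorter.
            MappedByteBuffer window = channel.map(FileChannel.MapMode.READ_WRITE, 0, 16);
            window.putLong(0, 42L);
            window.force(); // flush the mapped region to disk
            try {
                channel.truncate(8); // may throw on Windows while the mapping is alive
                return "truncated";
            } catch (IOException e) {
                return "truncate failed: " + e.getMessage();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mapped-demo", ".bin");
        file.toFile().deleteOnExit();
        System.out.println(writeThenTruncate(file));
    }
}
```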

*The modified code:*

import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.neo4j.graphdb.RelationshipType;
import org.neo4j.helpers.collection.MapUtil;
import org.neo4j.kernel.impl.util.FileUtils;
import org.neo4j.unsafe.batchinsert.BatchInserter;
import org.neo4j.unsafe.batchinsert.BatchInserters;

public class GraphImporter_v2 {

    private long nodeIdx = 0;
    // Maps each number from the input to the node id assigned to it.
    private Map<Long, Long> idxMap = new HashMap<Long, Long>();

    enum RelType implements RelationshipType {
        CALLS
    }

    private void createNode(long pnum, BatchInserter db) {
        if (!idxMap.containsKey(pnum)) {
            nodeIdx++;
            idxMap.put(pnum, nodeIdx);
            Map<String, Object> prop = new HashMap<String, Object>();
            prop.put("Number", pnum);
            db.createNode(nodeIdx, prop);
        }
    }

    private long getNodeNum(long pnum) throws Exception {
        if (idxMap.containsKey(pnum)) {
            return idxMap.get(pnum);
        } else {
            throw new Exception("Missing number: " + pnum);
        }
    }

    public static void main(String[] args) {
        GraphImporter_v2 importer = new GraphImporter_v2();
        importer.load(args[0], args[1]);
    }

    private void load(String inputFile, String dbpath) {
        BatchInserter db = null;
        BufferedReader reader = null;
        try {
            File graphDb = new File(dbpath);
            if (graphDb.exists()) {
                FileUtils.deleteRecursively(graphDb);
            }
            long edges = 0;
            long errorRows = 0;
            Map<String, String> config = new HashMap<String, String>();
            config = MapUtil.load(new File("batch.properties"));
            db = BatchInserters.inserter(dbpath, config);

            // First pass: create a node for every distinct number.
            reader = new BufferedReader(new FileReader(new File(inputFile)));
            reader.readLine(); // skip the first line of the input
            String line = null;
            while ((line = reader.readLine()) != null) {
                String[] lineData = line.split(",");
                try {
                    createNode(Long.valueOf(lineData[0].replace("\"", "")), db);
                    createNode(Long.valueOf(lineData[1].replace("\"", "")), db);
                } catch (NumberFormatException e) {
                    errorRows++;
                }
                edges++;
                if (edges % 1000000 == 0) {
                    System.out.println("Edges: " + edges + "(" + errorRows + "); Nodes: " + nodeIdx);
                }
            }

            System.out.println("Total edges: " + edges);
            System.out.println("Error edges: " + errorRows);
            reader.close();

            // Second pass: create the relationships.
            reader = new BufferedReader(new FileReader(new File(inputFile)));
            System.out.println("Loading edges..");
            long node1 = 0;
            long node2 = 0;
            reader.readLine(); // skip the first line of the input
            edges = 0;
            line = null;
            while ((line = reader.readLine()) != null) {
                String[] lineData = line.split(",");
                try {
                    node1 = getNodeNum(Long.valueOf(lineData[0].replace("\"", "")));
                    node2 = getNodeNum(Long.valueOf(lineData[1].replace("\"", "")));
                    Map<String, Object> prop = new HashMap<String, Object>();
                    prop.put("Duration", Integer.valueOf(lineData[2]));
                    prop.put("Cnt", Integer.valueOf(lineData[3]));
                    prop.put("Charge", Integer.valueOf(lineData[4]));
                    db.createRelationship(node1, node2, RelType.CALLS, prop);
                } catch (NumberFormatException e) {}
                edges++;
                if (edges % 1000000 == 0) {
                    System.out.println("Edges: " + edges + "; Nodes: " + nodeIdx);
                }
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (Throwable e) {
            e.printStackTrace();
        } finally {
            try {
                if (db != null) {
                    db.shutdown();
                }
                if (reader != null) {
                    reader.close();
                }
            } catch (Throwable e) {
                e.printStackTrace();
            }
        }
    }
}

On Wednesday, October 3, 2012 at 16:36:35 UTC+7, Michael Hunger wrote:

...
