Date: Wed, 3 Oct 2012 02:14:08 -0700 (PDT)
From: Gergely Svigruha
To: neo4j@googlegroups.com
Subject: Re: [Neo4j] Re: loading huge graphs

I put them in the finally block but the problem still occurs... I even started a new db, so I think there cannot be any leftover locks... The problem always occurs at the same edge (row in the input file).

Yes, this is Windows 7.
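
Since the failure is reported to hit the same input row every time, it may help to print that row and its line number before anything else. The following is only a minimal sketch: `RowTracer`, `RowHandler` and `firstFailingRow` are hypothetical names that are not part of the importer, and the handler stands in for whatever is done per row (for example the relationship insert).

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class RowTracer {

    /** Hypothetical callback standing in for the per-row insert logic. */
    interface RowHandler {
        void handle(String[] fields) throws Exception;
    }

    /**
     * Feeds every data row to the handler and returns a description of the
     * first row that makes it throw, or null if all rows pass.
     */
    static String firstFailingRow(Reader input, RowHandler handler) {
        try (LineNumberReader reader = new LineNumberReader(input)) {
            reader.readLine(); // skip the header row, as the importer does
            String line;
            while ((line = reader.readLine()) != null) {
                try {
                    handler.handle(line.split(","));
                } catch (Exception e) {
                    // getLineNumber() counts the lines read so far, header included
                    return "line " + reader.getLineNumber() + ": " + line + " -> " + e;
                }
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The malformed row is the third line of this sample input.
        String sample = "a,b,duration\n\"1\",\"2\",10\n\"1\",\"x\",20\n";
        String report = firstFailingRow(new StringReader(sample),
                fields -> Long.valueOf(fields[1].replace("\"", "")));
        System.out.println(report);
    }
}
```

Knowing the exact row makes it possible to check whether the data on that line is unusual or whether the failure only depends on how much has been written so far.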

On Wednesday, October 3, 2012 at 15:43:15 UTC+7, Michael Hunger wrote:
Probably a leftover file lock from a previous run.

Try to close the readers and shut down the db in a try ... finally block.

Is this Windows?

Sent from mobile device
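
The try ... finally advice could look roughly like the sketch below. It is an illustration under assumptions, not the actual importer: `Inserter` and `RecordingInserter` are hypothetical stand-ins so the example runs without Neo4j on the classpath; in the real code the `finally` block would call `shutdown()` on the batch inserter.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

final class CleanShutdown {

    /** Hypothetical stand-in for the batch inserter; only shutdown matters here. */
    interface Inserter {
        void insert(String row);
        void shutdown();
    }

    /** Records calls so the order of insert and shutdown can be inspected. */
    static final class RecordingInserter implements Inserter {
        final List<String> events = new ArrayList<String>();

        public void insert(String row) {
            if (row.isEmpty()) {
                throw new IllegalArgumentException("empty row");
            }
            events.add("insert " + row);
        }

        public void shutdown() {
            events.add("shutdown");
        }
    }

    /** Inserts every line; the reader and the inserter are released even on failure. */
    static void load(BufferedReader reader, Inserter db) throws IOException {
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                db.insert(line);
            }
        } finally {
            try {
                reader.close();  // release the input file handle first
            } finally {
                db.shutdown();   // still runs if close() throws
            }
        }
    }

    public static void main(String[] args) throws IOException {
        RecordingInserter db = new RecordingInserter();
        try {
            // The empty second row makes insert() throw in the middle of the load.
            load(new BufferedReader(new StringReader("a\n\nb\n")), db);
        } catch (IllegalArgumentException e) {
            System.out.println("load failed: " + e.getMessage());
        }
        System.out.println(db.events);
    }
}
```

The nested `finally` is there so that a failing `close()` cannot skip the shutdown.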

On 03.10.2012 at 09:51, Gergely Svigruha <sger...@gmail.com> wrote:

This is the code I use. Unfortunately, the input I have doesn't contain node ids as required by the previously recommended CSV importer, so I have to create the ids myself. I have a previous version which reads the input only once and creates the nodes and edges simultaneously, but I had the same error with that after ~30M edges / 90M.

import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.neo4j.graphdb.RelationshipType;
import org.neo4j.helpers.collection.MapUtil;
import org.neo4j.kernel.impl.util.FileUtils;
import org.neo4j.unsafe.batchinsert.BatchInserter;
import org.neo4j.unsafe.batchinsert.BatchInserters;

public class GraphImporter_v2 {

    private long nodeIdx = 0;
    private Map<Long, Long> idxMap = new HashMap<Long, Long>();

    enum RelType implements RelationshipType {
        CALLS
    }

    private void createNode(long pnum, BatchInserter db) {
        if (!idxMap.containsKey(pnum)) {
            nodeIdx++;
            idxMap.put(pnum, nodeIdx);
            Map<String, Object> prop = new HashMap<String, Object>();
            prop.put("Number", pnum);
            db.createNode(nodeIdx, prop);
        }
    }

    private long getNodeNum(long pnum) throws Exception {
        if (idxMap.containsKey(pnum)) {
            return idxMap.get(pnum);
        } else {
            throw new Exception("Missing number: " + pnum);
        }
    }

    public static void main(String[] args) {
        GraphImporter_v2 importer = new GraphImporter_v2();
        importer.load(args[0], args[1]);
    }

    private void load(String inputFile, String dbpath) {
        try {
            File graphDb = new File(dbpath);
            if (graphDb.exists()) {
                FileUtils.deleteRecursively(graphDb);
            }

            long edges = 0;
            long errorRows = 0;
            Map<String, String> config = new HashMap<String, String>();
            config = MapUtil.load(new File("batch.properties"));
            BatchInserter db = BatchInserters.inserter(dbpath, config);

            BufferedReader reader = new BufferedReader(new FileReader(new File(inputFile)));
            reader.readLine();
            String line = null;
            while ((line = reader.readLine()) != null) {
                String[] lineData = line.split(",");
                try {
                    createNode(Long.valueOf(lineData[0].replace("\"", "")), db);
                    createNode(Long.valueOf(lineData[1].replace("\"", "")), db);
                } catch (NumberFormatException e) {
                    errorRows++;
                }

                edges++;
            }

            System.out.println("Total edges: " + edges);
            System.out.println("Error edges: " + errorRows);

            reader.close();
            reader = new BufferedReader(new FileReader(new File(inputFile)));
            System.out.println("Loading edges..");
            long node1 = 0;
            long node2 = 0;
            reader.readLine();
            line = null;
            while ((line = reader.readLine()) != null) {
                String[] lineData = line.split(",");
                try {
                    node1 = getNodeNum(Long.valueOf(lineData[0].replace("\"", "")));
                    node2 = getNodeNum(Long.valueOf(lineData[1].replace("\"", "")));
                    Map<String, Object> prop = new HashMap<String, Object>();
                    prop.put("Duration", Integer.valueOf(lineData[2]));
                    prop.put("Cnt", Integer.valueOf(lineData[3]));
                    prop.put("Charge", Integer.valueOf(lineData[4]));
                    db.createRelationship(node1, node2, RelType.CALLS, prop);
                } catch (NumberFormatException e) {}
            }

            db.shutdown();
            reader.close();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (Throwable e) {
            e.printStackTrace();
        }
    }
}


On Wednesday, October 3, 2012 at 14:27:31 UTC+7, Michael Hunger wrote:
Can you share the code you used?

Michael

On 03.10.2012 at 07:38, Gergely Svigruha wrote:

I just had another issue. After creating the nodes, I try to create the edges using the BatchInserter.createRelationship(nodeId1, nodeId2, relationType, properties) function, but I get an exception:
java.io.IOException: The process cannot access the file because another process has locked a portion of the file.

Can you help me find the cause of this?
Thanks.

Greg
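
One way to check the leftover-lock idea is to try taking a file lock on a store file from a separate small program while no import is running. The sketch below uses only the standard `java.nio` API; `LockProbe` and `canLock` are hypothetical names, and which file to probe is left open. It should not be run inside the importer's own JVM, because on some systems closing a channel releases the locks that JVM holds on the same file.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;

final class LockProbe {

    /** Returns true if an exclusive lock on the whole file can be taken right now. */
    static boolean canLock(File file) {
        if (!file.isFile()) {
            // opening in "rw" mode would silently create a missing file
            throw new IllegalArgumentException("not a file: " + file);
        }
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw");
             FileChannel channel = raf.getChannel()) {
            FileLock lock = channel.tryLock();
            if (lock == null) {
                return false; // an overlapping lock is held by another program
            }
            lock.release();
            return true;
        } catch (OverlappingFileLockException e) {
            return false; // already locked inside this JVM
        } catch (IOException e) {
            return false; // the file could not be opened or locked
        }
    }

    public static void main(String[] args) throws IOException {
        // Probe the file given on the command line, or a temp file as a demo.
        File target = args.length > 0 ? new File(args[0]) : File.createTempFile("probe", ".tmp");
        System.out.println(target + " lockable: " + canLock(target));
    }
}
```

A `false` result only says that something holds a lock at that moment; it does not say which process holds it.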

On Wednesday, October 3, 2012 at 8:51:18 UTC+7, Gergely Svigruha wrote:
Thank you, I think what I need is the batch CSV importer :)

Greg

On Wednesday, October 3, 2012 at 2:24:09 UTC+7, Peter Neubauer wrote:
Thanks James,
I had written exactly the same answer but it got stuck in the outbox :)

Cheers,

/peter neubauer

G:  neubauer.peter
S:  peter.neubauer
P:  +46 704 106975
L:   http://www.linkedin.com/in/neubauer
T:   @peterneubauer

Wanna learn something new? Come to
http://graphconnect.com


On Tue, Oct 2, 2012 at 6:27 PM, James Thornton <james.t...@gmail.com> wrote:
> If you are preprocessing your data, I typically use Redis. And the fastest
> way to load data into Neo4j is to use the batch importer
> (https://github.com/jexp/batch-import), which imports from a CSV file.
>
> -  James
>
>
> On Tuesday, October 2, 2012 9:16:15 AM UTC-5, Gergely Svigruha wrote:
>>
>> Hi,
>>
>> I have a question regarding loading huge graphs into the Neo4j DB. I've
>> read that Neo4j is capable of handling even a few billion vertices. On the
>> other hand, all the code examples I've found use a cache (typically
>> java.util.HashMap) for the nodes when loading the edges, and I'm not sure
>> that Java can handle such a big HashMap. So what is the best way to load
>> such a huge graph?
>>
>> Thanks!
>>
>> Greg
