I have a question about loading huge graphs into Neo4j. I've read that Neo4j can handle even a few billion vertices. On the other hand, all the code examples I've found use a cache (typically a java.util.HashMap) for the nodes while loading the edges, and I'm not sure Java can handle such a big HashMap. So what is the best way to load such a huge graph?
The caches in Neo4j have eviction policies, so only frequently used data is
kept there. When you store billions of nodes in Neo4j, only a fraction
of them will sit in the caches at any time.
On Oct 2, 2012 4:17 PM, "Gergely Svigruha" <sgerg...@gmail.com> wrote:
> I have a question regarding loading huge graphs into the Neo4j DB. I've
> read that Neo4j is capable of handling even a few billion vertices. On the
> other hand all the code examples I've found use a cache (typically
> java.util.HashMap) for the nodes when loading the edges and I'm not sure
> that Java can handle such a big HashMap. So what is the best way to load
> such a huge graph?
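To make the eviction idea concrete, here is a minimal sketch of a least-recently-used cache built on java.util.LinkedHashMap. It is only an illustration of the general technique; it is not Neo4j's cache implementation, and the class names `LruCacheSketch` and `LruCacheDemo` are made up for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; this is NOT how Neo4j implements its caches.
class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCacheSketch(int maxEntries) {
        // accessOrder = true: iteration order runs from least to most recently used
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Invoked after each insertion; returning true evicts the eldest entry
        return size() > maxEntries;
    }
}

class LruCacheDemo {
    public static void main(String[] args) {
        LruCacheSketch<Long, String> cache = new LruCacheSketch<>(2);
        cache.put(1L, "node 1");
        cache.put(2L, "node 2");
        cache.get(1L);            // touch node 1, so node 2 becomes the eldest
        cache.put(3L, "node 3");  // capacity exceeded, node 2 is evicted
        System.out.println(cache.keySet()); // prints [1, 3]
    }
}
```

The point is that the cache size is bounded no matter how many nodes pass through it, which is different from a HashMap that has to hold every node at once.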
If you need to preprocess your data, I typically use Redis for that. And the fastest way to load data into Neo4j is to use the batch importer (https://github.com/jexp/batch-import), which imports from a CSV file.
I just ran into another issue. After creating the nodes I try to create the edges using the BatchInserter.createRelationship(nodeId1, nodeId2, relationType, properties) function, but I get an exception:
java.io.IOException: The process cannot access the file because another process has locked a portion of the file.
Can you help me find out what could be causing this? Thanks.
Greg
> On Wednesday, October 3, 2012 at 8:51:18 UTC+7, Gergely Svigruha wrote:
> Thank you, I think what I need is the batch CSV importer :)
> Greg
> On Wednesday, October 3, 2012 at 2:24:09 UTC+7, Peter Neubauer wrote:
> Thanks James,
> I had written exactly the same answer but it got stuck in the outbox :)
> On Tue, Oct 2, 2012 at 6:27 PM, James Thornton <james.t...@gmail.com> wrote:
> > If you are preprocessing your data, I typically use Redis. And the fastest
> > way to load data into Neo4j is to use the batch importer
> > (https://github.com/jexp/batch-import), which imports from a CSV file.
> > - James
This is the code I use. Unfortunately the input I have doesn't contain node IDs as required by the previously recommended CSV importer, so I have to create the IDs myself. I have an earlier version which reads the input only once and creates the nodes and edges simultaneously, but I got the same error with that after ~30M edges (of 90M).
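For readers facing the same "no node IDs in the input" situation, one way to generate IDs without a giant HashMap is a two-pass approach over a sorted primitive array. The sketch below is not the code referred to above and is not part of Neo4j or the batch importer. It assumes each node is identified in the input by a numeric (long) key, and the class names `SortedIdIndex` and `SortedIdIndexDemo` are hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Neo4j or the batch importer.
// Assumption: every node is identified in the input by a numeric (long) key.
class SortedIdIndex {
    private final long[] sortedKeys; // distinct external keys in ascending order

    SortedIdIndex(long[] keysWithDuplicates) {
        long[] copy = keysWithDuplicates.clone();
        Arrays.sort(copy);
        // Compact in place so that each distinct key is kept exactly once
        int n = 0;
        for (int i = 0; i < copy.length; i++) {
            if (i == 0 || copy[i] != copy[i - 1]) {
                copy[n++] = copy[i];
            }
        }
        sortedKeys = Arrays.copyOf(copy, n);
    }

    int size() {
        return sortedKeys.length;
    }

    // Generated id = position of the key in the sorted array, or -1 if unknown
    long idOf(long externalKey) {
        int pos = Arrays.binarySearch(sortedKeys, externalKey);
        return pos >= 0 ? pos : -1;
    }
}

class SortedIdIndexDemo {
    public static void main(String[] args) {
        // Edge list as (source key, target key) pairs; the values are made up
        long[][] edges = { {500L, 42L}, {42L, 7L}, {7L, 500L} };

        // Pass 1: gather every key that appears on either end of an edge
        long[] keys = new long[edges.length * 2];
        for (int i = 0; i < edges.length; i++) {
            keys[2 * i] = edges[i][0];
            keys[2 * i + 1] = edges[i][1];
        }
        SortedIdIndex index = new SortedIdIndex(keys);

        // Pass 2: translate each edge to the generated ids
        for (long[] e : edges) {
            System.out.println(index.idOf(e[0]) + " -> " + index.idOf(e[1]));
        }
    }
}
```

This costs roughly 8 bytes per distinct node plus the temporary key array, at the price of a binary search per lookup. A single Java array is limited to about 2.1 billion elements, so beyond that the keys would have to be split across several arrays or kept in an external store.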
> On Wednesday, October 3, 2012 at 14:27:31 UTC+7, Michael Hunger wrote:
>> Can you share the code you used?
>> Michael
I put them in the finally block but the problem still occurs. I even started a new db, so I don't think there can be any leftover locks. The problem always occurs at the same edge (the same row in the input file).
Yes, this is Windows 7.
On Wednesday, October 3, 2012 at 15:43:15 UTC+7, Michael Hunger wrote:
> Probably a leftover file lock from a previous run.
> Try closing the readers and shutting down the db in a try ... finally block.
> Is this Windows?
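The try ... finally suggestion could look roughly like the sketch below. It is not the code from this thread. `EdgeSink` is a hypothetical stand-in for the batch inserter so that the example compiles without Neo4j on the classpath, and the "from to" line format is an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical stand-in for the real inserter; both methods are simplifications.
interface EdgeSink {
    void addEdge(long from, long to);
    void shutdown();
}

class SafeLoadSketch {
    // Reads "from to" pairs, one per line, and always releases both resources.
    static int load(Reader source, EdgeSink sink) throws IOException {
        BufferedReader reader = new BufferedReader(source);
        int count = 0;
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                String[] parts = line.split("\\s+");
                sink.addEdge(Long.parseLong(parts[0]), Long.parseLong(parts[1]));
                count++;
            }
        } finally {
            try {
                reader.close();
            } finally {
                // Runs even if the loop or reader.close() threw an exception
                sink.shutdown();
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        EdgeSink printing = new EdgeSink() {
            public void addEdge(long from, long to) {
                System.out.println(from + " -> " + to);
            }
            public void shutdown() {
                System.out.println("shutdown called");
            }
        };
        load(new StringReader("1 2\n2 3\n"), printing);
    }
}
```

The nested finally makes sure the shutdown still happens when closing the reader fails, so a crashed run is less likely to leave the store files locked for the next attempt.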
Yes, sorry, I should've started with that. There are two exceptions. The first occurs when the program tries to insert the edge.
The second one occurs when the program invokes db.shutdown() on the BatchInserter db. Unfortunately messages.log is empty, I assume because the program was unable to shut down the db. I use neo4j-community-1.8.RC1-windows.
*Exception 1:*
java.io.IOException: The process cannot access the file because another process has locked a portion of the file
    at java.nio.MappedByteBuffer.force0(Native Method)
    at java.nio.MappedByteBuffer.force(Unknown Source)
    at org.neo4j.kernel.impl.nioneo.store.MappedPersistenceWindow.force(MappedPersistenceWindow.java:93)
    at org.neo4j.kernel.impl.nioneo.store.LockableWindow.writeOutAndClose(LockableWindow.java:64)
    at org.neo4j.kernel.impl.nioneo.store.LockableWindow.writeOutAndCloseIfFree(LockableWindow.java:146)
    at org.neo4j.kernel.impl.nioneo.store.PersistenceWindowPool.doRefreshBricks(PersistenceWindowPool.java:514)
    at org.neo4j.kernel.impl.nioneo.store.PersistenceWindowPool.refreshBricks(PersistenceWindowPool.java:460)
    at org.neo4j.kernel.impl.nioneo.store.PersistenceWindowPool.acquire(PersistenceWindowPool.java:127)
    at org.neo4j.kernel.impl.nioneo.store.CommonAbstractStore.acquireWindow(CommonAbstractStore.java:520)
    at org.neo4j.kernel.impl.nioneo.store.NodeStore.getRecord(NodeStore.java:76)
    at org.neo4j.unsafe.batchinsert.BatchInserterImpl.getNodeRecord(BatchInserterImpl.java:904)
    at org.neo4j.unsafe.batchinsert.BatchInserterImpl.createRelationship(BatchInserterImpl.java:455)
    at GraphImporter_v2.load(GraphImporter_v2.java:101)
    at GraphImporter_v2.main(GraphImporter_v2.java:45)
*Exception 2:*
org.neo4j.kernel.impl.nioneo.store.UnderlyingStorageException: Unable to close store jakarta_g_db\neostore.nodestore.db
    at org.neo4j.kernel.impl.nioneo.store.CommonAbstractStore.close(CommonAbstractStore.java:699)
    at org.neo4j.kernel.impl.nioneo.store.NeoStore.closeStorage(NeoStore.java:236)
    at org.neo4j.kernel.impl.nioneo.store.CommonAbstractStore.close(CommonAbstractStore.java:634)
    at org.neo4j.unsafe.batchinsert.BatchInserterImpl.shutdown(BatchInserterImpl.java:700)
    at GraphImporter_v2.load(GraphImporter_v2.java:120)
    at GraphImporter_v2.main(GraphImporter_v2.java:45)
Caused by: java.io.IOException: The requested operation cannot be performed on a file with a user-mapped section open
    at sun.nio.ch.FileDispatcherImpl.truncate0(Native Method)
    at sun.nio.ch.FileDispatcherImpl.truncate(Unknown Source)
    at sun.nio.ch.FileChannelImpl.truncate(Unknown Source)
    at org.neo4j.kernel.impl.nioneo.store.CommonAbstractStore.close(CommonAbstractStore.java:669)
    ... 5 more
> Do you have the full exception and your graphdb/messages.log?