BatchInserter-Intermediate cache to hold reference for domainID-Neoid


Ganesh Selvaraj

Jul 10, 2016, 4:26:49 AM
to Neo4j
Hi,

I am using BatchInserter in my work for the initial import of a large amount of data.
I am aware that there is a neo4j import tool, but I am not using it because we do some transformations programmatically.

When using BatchInserter, after creating the nodes I store the domainId-to-Neoid references in a map. Later, while creating relationships, I look up the corresponding Neoid in the map for every relationship.
Since I am dealing with a lot of relationships, the map lookups are slow (I tried a Java map and MapDB maps).
Can someone help with ideas to speed up the lookup process? I am also keen to know how the neo4j import tool handles lookups, as it is super fast.
Your help with this is much appreciated.

Thanks
Ganesh
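
One way to make the domainId-to-Neoid lookup described above cheaper is to avoid boxed map entries altogether. The following is a minimal sketch, not a description of how the neo4j import tool works internally. It assumes the domain IDs are unique and are (or can be converted to) `long` values, and that all nodes are created before any relationship is looked up. The class name `SortedIdMapper` and its methods are hypothetical and not part of any Neo4j API.

```java
import java.util.Arrays;

/** Sketch: domain-ID to node-ID lookup held in two parallel long[] arrays. */
final class SortedIdMapper {
    private final long[] domainIds;
    private final long[] nodeIds;
    private int size;
    private boolean frozen;

    SortedIdMapper(int capacity) {
        domainIds = new long[capacity];
        nodeIds = new long[capacity];
    }

    /** Call once per created node; no hashing or sorting happens here. */
    void put(long domainId, long nodeId) {
        if (frozen) throw new IllegalStateException("put after freeze");
        domainIds[size] = domainId;
        nodeIds[size] = nodeId;
        size++;
    }

    /** Call once after all nodes exist and before the first lookup. */
    void freeze() {
        sort(0, size - 1);
        frozen = true;
    }

    /** Returns the node ID for a domain ID, or -1 if it is unknown. */
    long get(long domainId) {
        if (!frozen) throw new IllegalStateException("get before freeze");
        int i = Arrays.binarySearch(domainIds, 0, size, domainId);
        return i >= 0 ? nodeIds[i] : -1L;
    }

    // Quicksort keyed on domainIds; every swap moves the nodeIds entry too.
    private void sort(int lo, int hi) {
        while (lo < hi) {
            long pivot = domainIds[lo + (hi - lo) / 2];
            int i = lo, j = hi;
            while (i <= j) {
                while (domainIds[i] < pivot) i++;
                while (domainIds[j] > pivot) j--;
                if (i <= j) { swap(i, j); i++; j--; }
            }
            // Recurse into the smaller part, loop on the larger one.
            if (j - lo < hi - i) { sort(lo, j); lo = i; }
            else { sort(i, hi); hi = j; }
        }
    }

    private void swap(int a, int b) {
        long d = domainIds[a]; domainIds[a] = domainIds[b]; domainIds[b] = d;
        long n = nodeIds[a]; nodeIds[a] = nodeIds[b]; nodeIds[b] = n;
    }

    public static void main(String[] args) {
        SortedIdMapper mapper = new SortedIdMapper(4);
        mapper.put(900L, 0L);   // domain ID 900 was stored as node 0
        mapper.put(17L, 1L);
        mapper.put(5000L, 2L);
        mapper.put(42L, 3L);
        mapper.freeze();
        System.out.println(mapper.get(42L));
    }
}
```

Each entry costs 16 bytes in this layout, and a lookup is a binary search over a primitive array. If the domain IDs are strings, they would first have to be mapped to unique `long` values, which this sketch does not attempt.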


Ganesh Selvaraj

Jul 10, 2016, 6:32:59 AM
to Neo4j
Update: I tried BatchInserterIndexProvider with both Lucene and MapDB (the same code as here: http://www.programcreek.com/java-api-examples/index.php?api=org.neo4j.unsafe.batchinsert.BatchInserters ).
The lookup still takes a long time. For 100+ million relationships it takes 6+ hours.

Michael Hunger

Jul 10, 2016, 7:53:22 AM
to ne...@googlegroups.com
Perhaps you can share your code?
I think the cause is more likely a page cache setting that is too low.


Sent from my iPhone
--
You received this message because you are subscribed to the Google Groups "Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email to neo4j+un...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Ganesh Selvaraj

Jul 10, 2016, 6:28:34 PM
to ne...@googlegroups.com
Hi Michael,

Thanks for your response. I have now used this config with BatchInserter, but it is still very slow.

cache_type=none
use_memory_mapped_buffers=true
# 14 bytes per node
neostore.nodestore.db.mapped_memory=30G
# 33 bytes per relationships
neostore.relationshipstore.db.mapped_memory=30G
# 38 bytes per property
neostore.propertystore.db.mapped_memory=50G
# 60 bytes per long-string block
neostore.propertystore.db.strings.mapped_memory=50G
neostore.propertystore.db.index.keys.mapped_memory=50G
neostore.propertystore.db.index.mapped_memory=50G
dbms.pagecache.memory =50G
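
The comments in the config above give a size per record, so a rough store size follows from multiplying by the record count. The sketch below only does that arithmetic. The relationship count comes from the "100+ million" figure earlier in the thread; the node count is a hypothetical placeholder, and whether these per-record sizes match the Neo4j version in use is an assumption. The class name `StoreSizeEstimate` is made up for this example.

```java
/** Sketch: rough store sizes from the per-record byte counts in the config comments. */
final class StoreSizeEstimate {
    static final long BYTES_PER_NODE = 14;          // "# 14 bytes per node"
    static final long BYTES_PER_RELATIONSHIP = 33;  // "# 33 bytes per relationships"

    /** Record count times record size, failing loudly on overflow. */
    static long storeBytes(long records, long bytesPerRecord) {
        return Math.multiplyExact(records, bytesPerRecord);
    }

    static double toGiB(long bytes) {
        return bytes / (1024.0 * 1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long relationships = 100_000_000L; // from the thread: "100+ million relationships"
        long nodes = 50_000_000L;          // hypothetical, not stated in the thread

        long relBytes = storeBytes(relationships, BYTES_PER_RELATIONSHIP);
        long nodeBytes = storeBytes(nodes, BYTES_PER_NODE);

        System.out.printf("relationship store: %.2f GiB%n", toGiB(relBytes));
        System.out.printf("node store:         %.2f GiB%n", toGiB(nodeBytes));
    }
}
```

For 100 million relationships at 33 bytes each this comes to 3.3 billion bytes, roughly 3 GiB, which can be compared against the memory actually available on the machine and the values configured above.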

*****************************************************************************************************
Explaining my code (using BatchInserterIndex):

private BatchInserterIndexProvider indexProvider = new LuceneBatchInserterIndexProvider(getInserter());


1) Index creation:

BatchInserterIndex index = indexProvider.nodeIndex(indexName, EXACT_CONFIG);
index.setCacheCapacity("ID", 10000000);

2) Loading into the index:
index.add(NeonodeID, keyValPair(ID, "asdasdasdasdasd"));

3) Using the index for lookup:
index.get("ID", "asdasdasdasdasd").getSingle();


Please let me know if I am missing something here. Your help in this is much appreciated.

Thanks






--
Cheers
Ganesh

Ganesh Selvaraj

Jul 11, 2016, 8:36:12 PM
to Neo4j
Quick question:
Is this project, https://github.com/jexp/batch-import , the same as the neo4j import tool that ships with neo4j installations?



Chris Vest

Jul 12, 2016, 3:18:10 AM
to ne...@googlegroups.com
No, import-tool is a different thing.

--
Chris Vest
System Engineer, Neo Technology
[ skype: mr.chrisvest, twitter: chvest ]


Ganesh Selvaraj

Jul 12, 2016, 4:35:17 AM
to ne...@googlegroups.com
Thanks Chris. Is it open source? Can we look at the source code?

Thanks




--
Cheers
Ganesh

Michael Hunger

Jul 12, 2016, 4:40:23 AM
to ne...@googlegroups.com
Sure, it is part of neo4j. Just check out the community/import-tool submodule here on GitHub:

It uses some other parts of neo4j internally, like the CSV reader and the parallel batch importer API.