Hi,
we are taking our first steps with Neo4j and have tried several ways to create an initial database.
1) We used the Java API with an embedded database.
Here
https://github.com/linked-swissbib/swissbib-metafacture-commands/blob/neo4j-tests/src/main/java/org/swissbib/linked/mf/writer/NeoIndexer.java#L76
we close a transaction that covers 20,000 nodes with relationships to around 40,000 other nodes.
We are surprised that Transaction.close() needs up to 30 seconds to write these nodes to disk.
2) Then I wanted to compare my results with the neo4j-import script that ships with the Neo4j server.
With this method I am having difficulties with the format of the CSV files.
My small examples:
first node file:
lsId:ID(localsignature),:LABEL
"NEBIS/002527587",LOCALSIGNATURE
"OCoLC/637556711",LOCALSIGNATURE
second node file:
brId:ID(bibliographicresource),active,:LABEL
146404300,true,BIBLIOGRAPHICRESOURCE
relationship file:
:START_ID(bibliographicresource),:END_ID(localsignature),:TYPE
146404300,"NEBIS/002527587",SIGNATUREOF
146404300,"OCoLC/637556711",SIGNATUREOF
./neo4j-import --into [path-to-db]/test.db/ --nodes files/br.csv --nodes files/br.csv --relationships:SIGNATUREOF files/signatureof.csv
This throws the following exception:
Done in 191ms
Prepare node index
Exception in thread "Thread-3" org.neo4j.unsafe.impl.batchimport.cache.idmapping.string.DuplicateInputIdException: Id '146404300' is defined more than once in bibliographicresource, at least at /home/swissbib/environment/tools/neo4j-community-2.3.2/bin/files/br.csv:2 and /home/swissbib/environment/tools/neo4j-community-2.3.2/bin/files/br.csv:2
at org.neo4j.unsafe.impl.batchimport.input.BadCollector$2.exception(BadCollector.java:107)
at org.neo4j.unsafe.impl.batchimport.input.BadCollector.checkTolerance(BadCollector.java:176)
at org.neo4j.unsafe.impl.batchimport.input.BadCollector.collectDuplicateNode(BadCollector.java:96)
at org.neo4j.unsafe.impl.batchimport.cache.idmapping.string.EncodingIdMapper.detectDuplicateInputIds(EncodingIdMapper.java:590)
at org.neo4j.unsafe.impl.batchimport.cache.idmapping.string.EncodingIdMapper.buildCollisionInfo(EncodingIdMapper.java:494)
at org.neo4j.unsafe.impl.batchimport.cache.idmapping.string.EncodingIdMapper.prepare(EncodingIdMapper.java:282)
at org.neo4j.unsafe.impl.batchimport.IdMapperPreparationStep.process(IdMapperPreparationStep.java:54)
at org.neo4j.unsafe.impl.batchimport.staging.LonelyProcessingStep$1.run(LonelyProcessingStep.java:56)
Duplicate input ids that would otherwise clash can be put into separate id space, read more about how to use id spaces in the manual:
http://neo4j.com/docs/2.3.2/import-tool-header-format.html#import-tool-id-spaces
Caused by:Id '146404300' is defined more than once in bibliographicresource, at least at /home/swissbib/environment/tools/neo4j-community-2.3.2/bin/files/br.csv:2 and /home/swissbib/environment/tools/neo4j-community-2.3.2/bin/files/br.csv:2
I can't see any difference between my files and the documentation at
http://neo4j.com/docs/2.3.2/import-tool-header-format.html#import-tool-id-spaces
because I tried to use the ID space notation (as far as I can see...).
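To make explicit how I understand the ID space notation, here is a minimal Java sketch of the duplicate check as I assume it behaves. It is not the import tool's actual code: the class name IdSpaceCheck and the method findDuplicates are made up for illustration, the CSV parsing is a naive split on commas, and the file name other.csv is hypothetical. The idea is that an id only clashes with an identical id declared in the same ID space.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: a guess at the rule behind DuplicateInputIdException,
// not the real neo4j-import implementation.
class IdSpaceCheck {
    // Matches a header field such as "brId:ID(bibliographicresource)" or ":ID".
    private static final Pattern ID_FIELD =
            Pattern.compile("^[^:]*:ID(?:\\(([^)]*)\\))?$");

    // names.get(i) labels files.get(i); the first line of each file is its header.
    static List<String> findDuplicates(List<String> names, List<List<String>> files) {
        Map<String, String> firstSeen = new HashMap<>(); // "space|id" -> "file:line"
        List<String> duplicates = new ArrayList<>();
        for (int f = 0; f < files.size(); f++) {
            List<String> lines = files.get(f);
            String[] header = lines.get(0).split(",");
            int idColumn = -1;
            String idSpace = "";
            for (int c = 0; c < header.length; c++) {
                Matcher m = ID_FIELD.matcher(header[c].trim());
                if (m.matches()) {
                    idColumn = c;
                    idSpace = m.group(1) == null ? "" : m.group(1);
                    break;
                }
            }
            if (idColumn < 0) {
                continue; // no :ID column in this file
            }
            for (int i = 1; i < lines.size(); i++) {
                if (lines.get(i).trim().isEmpty()) {
                    continue;
                }
                // Naive parsing: assumes no commas inside quoted values.
                String[] cols = lines.get(i).split(",");
                String id = cols[idColumn].trim().replaceAll("^\"|\"$", "");
                String where = names.get(f) + ":" + (i + 1);
                String previous = firstSeen.putIfAbsent(idSpace + "\u0000" + id, where);
                if (previous != null) {
                    duplicates.add("Id '" + id + "' is defined more than once in "
                            + idSpace + ", at least at " + previous + " and " + where);
                }
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<String> ls = Arrays.asList(
                "lsId:ID(localsignature),:LABEL",
                "\"NEBIS/002527587\",LOCALSIGNATURE",
                "\"OCoLC/637556711\",LOCALSIGNATURE");
        List<String> br = Arrays.asList(
                "brId:ID(bibliographicresource),active,:LABEL",
                "146404300,true,BIBLIOGRAPHICRESOURCE");
        // Hypothetical extra file reusing the same id in the same ID space.
        List<String> other = Arrays.asList(
                "brId:ID(bibliographicresource),active,:LABEL",
                "146404300,false,BIBLIOGRAPHICRESOURCE");

        System.out.println(findDuplicates(
                Arrays.asList("ls.csv", "br.csv"), Arrays.asList(ls, br)));
        System.out.println(findDuplicates(
                Arrays.asList("br.csv", "other.csv"), Arrays.asList(br, other)));
    }
}
```

With the two node files shown above, I would expect this check to report nothing, and to report a clash only when the same id is read a second time within the bibliographicresource space.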
Thanks for any hints!
Günter