Hi,
I am extremely new to LDIF and am trying to use it on some N-Quads files from Bio2RDF. I have not gotten very far because I cannot get one of the files to load at all. This file is around 5.5 GB and contains at least 17 million quads. The other files I have tried, which are much smaller, load without problems. The import job starts, as can be seen from this console output:
Import Job Sider_meddra_freq_parsed.0 started (quad / hourly)
[INFO] Loading from /home/bonnie/ldif-0.5.2/examples/mytest/dumps/sider-meddra_freq_parsed.nq
and the status monitor shows it loading quads fairly quickly until it gets to around 15.5 or 16 million (the exact point has varied). At that point the import job slows to a crawl and eventually stops adding quads, though the process is still running and holding lots of memory. As far as I can tell, it must be hanging on something. Since I am new to this tool, I am really at a loss as to how to debug this. I am not getting any informative output on the console; the message above is the last thing to appear.
I have tried running this with both the in-memory and the triple-store-backed versions, and there is no difference. It appears to me that the import process is independent of the triple store backend. I am running on an Ubuntu machine with 16 GB of memory. As long as I run with the maximum heap size set to 10 GB, it does not run out of memory, so that is not the problem.
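One thing I could try next is checking whether the file has a malformed quad near the point where the import stalls. Below is a minimal sketch in Python. It is not part of LDIF, and the function name, the 10,000-character threshold, and the guess that a bad line is involved at all are my own assumptions. It only does a rough check (statement ends with a "." and is not unusually long), not a full N-Quads parse.

```python
from typing import Iterable, Iterator, Tuple


def suspicious_lines(
    lines: Iterable[str], start: int = 1, max_len: int = 10_000
) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, reason) for lines at or after `start` that look odd.

    Hypothetical helper: a rough sanity check, not a real N-Quads validator.
    """
    for number, line in enumerate(lines, 1):
        if number < start:
            continue  # skip the part of the file that loaded quickly
        text = line.strip()
        if not text or text.startswith("#"):
            continue  # blank lines and comments are not statements
        if len(text) > max_len:
            yield number, "very long line"
        elif not text.endswith("."):
            yield number, "missing terminating '.'"


# Example use on the dump, starting near where the import slows down:
#
# with open("sider-meddra_freq_parsed.nq", encoding="utf-8", errors="replace") as f:
#     for number, reason in suspicious_lines(f, start=15_000_000):
#         print(number, reason)
```

The file is read line by line, so this should not need much memory even for the 5.5 GB dump.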
Any advice?
Thanks,
Bonnie MacKellar