Hi Zachary,

Thanks for the pointer to the previous thread - apologies for not having searched the forum more thoroughly before posting.
Anyway, just to confirm what you no doubt expected: I can load 35 million triples pretty quickly by using the bulk loader as part of a db create. I also realise I can load a TriG file via db create, which is handy for me as I would generally want to initialise a new db with data in various named graphs.
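For the record, the sort of command I'm running is roughly the following (flags from memory, so worth double-checking against the docs):

    # create a new db and bulk load files at creation time;
    # a TriG file preserves the named graph structure
    stardog-admin db create -n mydb data.trig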
Part of the reason I was trying out the 'data add' option is that an important use case for me will be adding reasonably big files to an already running database, where the bulk loader isn't really a viable option. I appreciate that this requires a decent chunk of memory.
I've done a few experiments with my test data and found that a 2 GB heap will happily load a 120 MB/750k-triple file using stardog data add, but runs out of memory with a 160 MB/1 million-triple file. So that gives me a useful rule of thumb - does it correspond roughly to what you would expect? In the cases where it doesn't run out of memory, it loads the triples pretty quickly.
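For reference, what I'm doing is along these lines (I'm assuming STARDOG_JAVA_ARGS is the right knob for the heap, so treat this as illustrative):

    # bump the JVM heap before starting the server,
    # then add data to the already running db
    export STARDOG_JAVA_ARGS="-Xmx2g"
    stardog data add mydb big_file.nt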
I'll see if I can set up Stardog on a machine with more RAM than my laptop and try some bigger experiments.
--
Thanks Kendall - that would definitely be useful. But as long as I know roughly what to expect in terms of max file sizes, the current approach should work ok for me. I can just throw enough RAM at it to cover the majority of cases, and split files into chunks for edge cases.
On 5 Aug 2013, at 16:38, Kendall Clark <ken...@clarkparsia.com> wrote:
> Evren is going to be addressing this issue with Stardog intermediate writes (i.e., large writes after database creation) to avoid forcing people to do awkward wipe-and-loads.
One follow-up question: if I use /transaction/begin and /transaction/commit in the HTTP protocol, would all data added during the transaction be held in memory until it is committed? In that case I'd have to bear heap sizes in mind when grouping several biggish data adds into one transaction?
My ideal solution would be remote adds over HTTP, but I fully understand that a step-by-step development approach is best!
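Just so we're talking about the same thing, the kind of sequence I have in mind is roughly the following - the port and endpoint paths are my reading of the HTTP protocol docs, so they may not be exact:

    # begin a transaction, add data inside it, then commit
    curl -u admin:admin -X POST http://localhost:5822/mydb/transaction/begin
    # -> returns a transaction id, <txid>
    curl -u admin:admin -X POST -H "Content-Type: text/turtle" \
         --data-binary @chunk1.ttl http://localhost:5822/mydb/<txid>/add
    curl -u admin:admin -X POST http://localhost:5822/mydb/transaction/commit/<txid>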
>
> If that makes it into the 2.0 release, it will have some limitations (only local adds, not remote) which we will work on lifting in the 2.x cycle.
>
> Cheers,
> Kendall
>