I'm not sure what is causing the slowness, but it probably has
something to do with how many resources the virtual machine can spare
for Stardog.
Loading these three data sources on my desktop (iMac running OS X
10.6.8, 2.8GHz Intel i7, 16GB RAM) with default Stardog settings (JVM
memory set to 2GB) gave the following results, where data source 2
loads faster but the difference is not that large:
~/programs/stardog/stardog-1.0$ ./stardog-admin create -n lodib5m_1
~/programs/lodib-0.1/usecases/5M/sources/1/data.nq
Bulk loading data to new database.
Data load complete. Loaded 1,626,795 triples in 00:00:34 @ 46.7K triples/sec.
Successfully created database 'lodib5m_1'.
~/programs/stardog/stardog-1.0$ ./stardog-admin create -n lodib5m_2
~/programs/lodib-0.1/usecases/5M/sources/2/data.nq
Bulk loading data to new database.
Data load complete. Loaded 1,786,369 triples in 00:00:28 @ 63.8K triples/sec.
Successfully created database 'lodib5m_2'.
~/programs/stardog/stardog-1.0$ ./stardog-admin create -n lodib5m_3
~/programs/lodib-0.1/usecases/5M/sources/3/data.nq
Bulk loading data to new database.
Data load complete. Loaded 1,584,828 triples in 00:00:36 @ 43.7K triples/sec.
Successfully created database 'lodib5m_3'.
It is not unusual to see a 10% or 20% difference in loading times,
especially when the same data source is loaded repeatedly, because of
how the OS caches pages from disk. It is also usual to see loading
times change between different data sources. For example, see the
loading times we report in [1]. The nearly 20-fold difference you see
between data sources is very unusual, though. Maybe the load on the
machine varied between runs and affected the load performance?
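
To make the comparison concrete, here is a minimal Python sketch that
parses the "Data load complete" lines from both machines and computes
how many times slower each VM load was than the desktop load. The
function names (parse_load, slowdown) and the regular expression are
illustrative assumptions, not part of Stardog; the log lines are copied
from this thread.

```python
import re

# Assumed log format, taken from the output quoted in this thread:
# "Loaded <triples> triples in HH:MM:SS @ ..."
LOAD_LINE = re.compile(r"Loaded ([\d,]+) triples in (\d+):(\d+):(\d+)")


def parse_load(line):
    """Return (triples, elapsed_seconds) parsed from one load report line."""
    match = LOAD_LINE.search(line)
    if match is None:
        raise ValueError("not a load report line: %r" % line)
    triples = int(match.group(1).replace(",", ""))
    hours, minutes, seconds = (int(match.group(i)) for i in (2, 3, 4))
    return triples, hours * 3600 + minutes * 60 + seconds


def slowdown(vm_line, desktop_line):
    """Return how many times longer the VM load took than the desktop load."""
    vm_triples, vm_seconds = parse_load(vm_line)
    desktop_triples, desktop_seconds = parse_load(desktop_line)
    if vm_triples != desktop_triples:
        raise ValueError("the two lines describe different data sets")
    return vm_seconds / desktop_seconds


# First-run VM numbers from the original post, keyed by data source.
VM_RUNS = {
    1: "Loaded 1,626,795 triples in 00:14:30 @ 1.9K triples/sec",
    2: "Loaded 1,786,369 triples in 00:00:44 @ 40.4K triples/sec.",
    3: "Loaded 1,584,828 triples in 00:11:09 @ 2.4K triples/sec.",
}

# Desktop numbers from the runs shown above.
DESKTOP_RUNS = {
    1: "Loaded 1,626,795 triples in 00:00:34 @ 46.7K triples/sec.",
    2: "Loaded 1,786,369 triples in 00:00:28 @ 63.8K triples/sec.",
    3: "Loaded 1,584,828 triples in 00:00:36 @ 43.7K triples/sec.",
}

for source in sorted(VM_RUNS):
    ratio = slowdown(VM_RUNS[source], DESKTOP_RUNS[source])
    print("source %d: VM load took %.1fx as long" % (source, ratio))
```

With the numbers above this gives roughly 25.6x for source 1, 1.6x for
source 2 and 18.6x for source 3, so only sources 1 and 3 show the
unusual gap.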
Best,
Evren
[1]
http://stardog.com/docs/performance/
> I am evaluating Stardog for loading a large amount of triples ( up to 5
> billion ). as a first step I am testing how well it performs on 4GB 4CPU
> Virtual Machine running on Virtual Box under Windows 7.
>
> I am loading the LODIB generated datasets (5M triples) and I get following
> strange results:
>
> Source 1 1 564.1 MB
> Data load complete. Loaded 1,626,795 triples in 00:14:30 @ 1.9K triples/sec
> -> ?
> Source 2 447.9 MB
> Data load complete. Loaded 1,786,369 triples in 00:00:44 @ 40.4K
> triples/sec. -> that's fast
> Source 3 569.5 MB
> Data load complete. Loaded 1,584,828 triples in 00:11:09 @ 2.4K triples/sec.
>
> A second run for source 3 gave:
> Data load complete. Loaded 1,584,828 triples in 00:07:29 @ 3.5K triples/sec.
>
> I don't understand how the difference can be explained, but prob it has to
> do smth with garbage collection? As during the loading of the triples memory
> consumption was 100% and prob Source 2 just 'fitted' in memory.
>
> Any thoughts?
>
> Also when idle and Stardog running loaded memory usage is about 3GB! Will it
> work on the cloud actually on a small instance with let's say 512 MB
> available for stardog?
>