Re: [stardog-users] Unexplainable difference in loading times


Kendall Clark

Jun 22, 2012, 10:04:45 AM
to sta...@clarkparsia.com
On Fri, Jun 22, 2012 at 9:50 AM, Laurens De Vocht <laur...@gmail.com> wrote:

I don't understand how the difference can be explained, but it probably has something to do with garbage collection? During the loading of the triples, memory consumption was at 100%, and Source 2 probably just fitted in memory.

That is enormous variation, far beyond what we've ordinarily seen.
 
Any thoughts?

My first guess is that you should run a database for which you want good performance on a real computer, not on a virtual one (where I/O can be very bad, etc).

Beyond that, it could be GC, it could be something else.
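If you want to check the GC theory on your end, turn on GC logging for the JVM that actually does the load and re-run the slow import. A rough sketch only; STARDOG_JAVA_ARGS is an assumption about how your install passes JVM options (the GC flags themselves are standard HotSpot options), so adjust as needed:

# Assumption: STARDOG_JAVA_ARGS is how this install passes JVM options.
export STARDOG_JAVA_ARGS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/tmp/stardog-gc.log"
# Restart Stardog so the options take effect, then re-run the slow load, e.g.:
./stardog-admin create -n lodib5m_1 /path/to/sources/1/data.nq
# Long or frequent "Full GC" pauses in /tmp/stardog-gc.log during the load
# would point at heap pressure rather than disk I/O.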

We'll take a look and see if we can reproduce.
 
Also, when idle with the data loaded and Stardog running, memory usage is about 3 GB! Will it actually work in the cloud on a small instance with, say, 512 MB available for Stardog?


Al Baker, who's on this list, is the world's expert on running Stardog on memory constrained systems, so maybe he'll chime in here.

Cheers,
Kendall

Robert Butler

Jun 22, 2012, 11:38:54 AM
to sta...@clarkparsia.com
I thought I would weigh in on running Stardog in the cloud as well since that is what we do at Pancake Technology:

Will it actually work in the cloud on a small instance with, say, 512 MB available for Stardog?

In the cloud in general, your mileage will vary with your provider. I've run Stardog instances in both Amazon EC2 and Rackspace. Amazon's performance is generally not great for disk-I/O-bound systems and can be very unpredictable on small instances. Rackspace performs much better for only a small increase in price relative to the performance you get; comparable Rackspace machines win hands down. So it's critical to evaluate your cloud provider (or private cloud infrastructure) and test Stardog on it directly.
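As a quick sanity check of a provider's disk I/O before committing to it, something as crude as timing a direct-I/O write and a cold read will usually expose the bad cases. A rough sketch using standard GNU dd (run it a few times; the numbers swing a lot on shared instances):

# Rough write throughput, bypassing the page cache (GNU dd).
dd if=/dev/zero of=/tmp/io-test bs=1M count=1024 oflag=direct
# Rough cold-read throughput: drop the page cache first (needs root).
sync && sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
dd if=/tmp/io-test of=/dev/null bs=1M
rm /tmp/io-test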

I currently have Stardog running in the Rackspace cloud on the following instance sizes (most of the memory is reserved for Stardog on these boxes):
 - 512 MB dev box
 - 1024 MB dev box
 - 256 MB prod box
 - 2048 MB prod box

The query rates and data sizes on these boxes are relatively small for what Stardog can handle in that amount of memory. Stardog does run in under 256 MB of RAM on a production machine with no stability or lag issues. The key is to figure out the memory size needed for your particular data set and performance requirements.
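To figure that out for a given data set, I'd just re-run the same bulk load under a few different heap caps and watch where the load time or stability falls off. This is a sketch only: it assumes the scripts pick up JVM options from STARDOG_JAVA_ARGS and that the server/drop subcommands look like this in your version, so adjust to match your install:

for heap in 512m 1g 2g; do
  export STARDOG_JAVA_ARGS="-Xmx$heap"
  # Restart the server so the new heap cap applies (command names assumed).
  ./stardog-admin server stop && ./stardog-admin server start
  # Time the same bulk load at each heap size; data.nq is a placeholder path.
  /usr/bin/time -p ./stardog-admin create -n sizing_test data.nq
  ./stardog-admin drop -n sizing_test
done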

As a side note, I've personally seen huge performance and lag issues when running disk-I/O-intensive apps inside a VM on non-server-grade hardware and software. I would expect the VM to be causing your intermittent performance issues.

Hope that helps,
Robert

Robert Butler
President
Pancake Technology, LLC

P.O. Box 271416
Flower Mound, TX 75027


Evren Sirin

Jun 22, 2012, 12:21:56 PM
to sta...@clarkparsia.com
I'm not sure what is causing the slowness, but it probably has
something to do with how many resources the virtual machine can spare
for Stardog.

Loading these three data sources on my desktop (iMac running OS X
10.6.8, 2.8 GHz Intel i7, 16 GB RAM) with default Stardog settings (JVM
memory set to 2 GB) gave the following results. Data source 2 loads
faster, but the difference is not that large:

~/programs/stardog/stardog-1.0$ ./stardog-admin create -n lodib5m_1
~/programs/lodib-0.1/usecases/5M/sources/1/data.nq
Bulk loading data to new database.
Data load complete. Loaded 1,626,795 triples in 00:00:34 @ 46.7K triples/sec.
Successfully created database 'lodib5m_1'.

~/programs/stardog/stardog-1.0$ ./stardog-admin create -n lodib5m_2
~/programs/lodib-0.1/usecases/5M/sources/2/data.nq
Bulk loading data to new database.
Data load complete. Loaded 1,786,369 triples in 00:00:28 @ 63.8K triples/sec.
Successfully created database 'lodib5m_2'.

~/programs/stardog/stardog-1.0$ ./stardog-admin create -n lodib5m_3
~/programs/lodib-0.1/usecases/5M/sources/3/data.nq
Bulk loading data to new database.
Data load complete. Loaded 1,584,828 triples in 00:00:36 @ 43.7K triples/sec.
Successfully created database 'lodib5m_3'.

It is not unusual to see a 10% or 20% difference in loading times,
especially when the same data source is loaded repeatedly, due to how
the OS caches pages from disk. It is also common for loading times to
differ between data sources; for example, see the loading times we
report in [1]. The nearly 20x difference you see between data sources
is very unusual, though. Maybe the load on the machine varied between
runs, affecting load performance?
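
One way to separate page-cache effects from genuine variance is to
drop and re-create the same database several times in a row and
compare the reported rates, e.g. (the drop syntax here is a sketch;
adjust to your version):

for i in 1 2 3 4 5; do
  ./stardog-admin create -n lodib5m_1 ~/programs/lodib-0.1/usecases/5M/sources/1/data.nq
  ./stardog-admin drop -n lodib5m_1
done
# The first run reads the file cold from disk; later runs should hit the OS
# page cache, so a modest speedup is expected.  A 10-20x swing between runs
# would point at the VM or host, not at Stardog.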

Best,
Evren

[1] http://stardog.com/docs/performance/

On Fri, Jun 22, 2012 at 9:50 AM, Laurens De Vocht <laur...@gmail.com> wrote:
> I am evaluating Stardog for loading a large amount of triples (up to 5
> billion). As a first step I am testing how well it performs on a 4 GB, 4 CPU
> virtual machine running on VirtualBox under Windows 7.
>
> I am loading the LODIB-generated datasets (5M triples) and I get the following
> strange results:
>
> Source 1  564.1 MB
> Data load complete. Loaded 1,626,795 triples in 00:14:30 @ 1.9K triples/sec
> -> ?
> Source 2  447.9 MB
> Data load complete. Loaded 1,786,369 triples in 00:00:44 @ 40.4K
> triples/sec. -> that's fast
> Source 3 569.5 MB
> Data load complete. Loaded 1,584,828 triples in 00:11:09 @ 2.4K triples/sec.
>
> A second run for source 3 gave:
> Data load complete. Loaded 1,584,828 triples in 00:07:29 @ 3.5K triples/sec.
>
> I don't understand how the difference can be explained, but it probably has
> something to do with garbage collection? During the loading of the triples,
> memory consumption was at 100%, and Source 2 probably just fitted in memory.
>
> Any thoughts?
>
> Also, when idle with the data loaded and Stardog running, memory usage is
> about 3 GB! Will it actually work in the cloud on a small instance with, say,
> 512 MB available for Stardog?
>

Kendall Clark

Jun 25, 2012, 6:37:21 AM
to sta...@clarkparsia.com
It's hard for us to address an issue we can't reproduce, and we've failed to reproduce your report.

Cheers,
Kendall

On Monday, June 25, 2012, Laurens De Vocht wrote:
Maybe, yes, but I did the same test with Jena TDB (default Tomcat settings, same JVM configuration) and the file with 5 million triples loaded in 86 seconds (all sources loaded in under 90 seconds). As far as I can see, Stardog could be faster than Jena TDB, but for some reason it hangs.
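
(For anyone who wants to reproduce the comparison from the command line, a TDB bulk load looks roughly like the sketch below; this is not necessarily the exact route I took through Tomcat, and the flags may differ by TDB version.)

# Rough command-line equivalent of the TDB bulk load (tdbloader ships with Jena TDB).
tdbloader --loc /tmp/tdb-lodib /path/to/sources/1/data.nq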

I wonder what's happening.


On Friday, June 22, 2012 6:21:56 PM UTC+2, Evren Sirin wrote:
Maybe the load on the machine varied between runs, affecting load performance?



--
Cheers,
Kendall

Robert Butler

Jun 25, 2012, 7:24:01 AM
to sta...@clarkparsia.com
The only difference w.r.t. configuration is the max memory size passed to the JVM.

- Robert

On Jun 25, 2012, at 3:04 AM, Laurens De Vocht <laur...@gmail.com> wrote:

Is there any difference between the configuration of these boxes, dev vs. prod? Or is it just the JVM settings that are adapted to the available memory?
 
I currently have Stardog running in the Rackspace cloud on the following instance sizes (most of the memory is reserved for Stardog on these boxes):
 - 512 MB dev box
 - 1024 MB dev box
 - 256 MB prod box
 - 2048 MB prod box
