GC settings for Embedded Neo4j in WebApp


George Vincent

Feb 14, 2014, 10:55:24 AM
to ne...@googlegroups.com
Hi there,

I'm thinking of using an embedded Neo4j graph in a web application. I anticipate that the graph size will be ~10 GB. What would be a good GC configuration to start with?

This will run inside a Tomcat app.

Any help will be highly appreciated.

thanks,
George

Mark Needham

Feb 15, 2014, 5:50:23 AM
to ne...@googlegroups.com
Hi George,

It depends on how much RAM you have on the machine. Generally we suggest memory mapping as much of the store as possible (http://docs.neo4j.org/chunked/stable/configuration-io-examples.html) and then using the memory you have left over for your JVM heap.

The actual numbers depend on the spec of the machine so you'd have to give more information on that.

Any reason you want to use it embedded rather than using the server?

Cheers
Mark



--
You received this message because you are subscribed to the Google Groups "Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email to neo4j+un...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

George Vincent

Feb 17, 2014, 11:31:53 PM
to ne...@googlegroups.com
Thanks Mark. I'm evaluating a situation where I may have 8GB RAM or 16GB RAM. But it's not guaranteed that RAM will always be larger than the data size. The server may have 2 dual/quad core CPUs.

If I understand correctly, the link suggests allocating mapped memory based on the on-disk size of the node store and relationship store. Is that far more important than allocating Java heap to cache nodes/relationships?

I presume setting the memory mapped configuration will not affect GC. So, is it recommended to go with the CMS collector for the available heap (after setting the memory mapping)?

The reason to pick embedded is that we can wrap a webapp around it and have all the application logic, including Cypher queries, live in this application. We can have any number of clients talking to this app.

Thanks again. 

Mark Needham

Feb 18, 2014, 3:34:53 AM
to ne...@googlegroups.com
Hi George,

> I presume setting the memory mapped configuration will not affect GC

Assuming that you're not on Windows, the memory mapping is done off heap, so there's no GC to worry about.

> Is that far more important than allocating java heap to cache nodes/relationships?

I haven't measured the exact numbers, but I'd suggest memory mapping as much of the store files as you can and then using what you have left over for the cache (i.e. heap size).
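
The "map the store first, heap gets the rest" rule of thumb can be turned into rough arithmetic. The following is only a sketch: the class name, the 1 GB reserved for the OS, and the 2 GB minimum heap are assumptions chosen for illustration, not Neo4j recommendations. It uses the 8 GB and 16 GB machines and the ~10 GB store mentioned earlier in the thread.

```java
/** Rough memory split for a JVM that embeds a store with memory-mapped files. */
final class MemoryBudget {
    static final long GB = 1024L * 1024L * 1024L;

    final long mappedBytes; // off-heap, memory-mapped store files
    final long heapBytes;   // JVM heap, which also holds the object cache

    private MemoryBudget(long mappedBytes, long heapBytes) {
        this.mappedBytes = mappedBytes;
        this.heapBytes = heapBytes;
    }

    /**
     * @param totalRam  physical RAM on the machine
     * @param osReserve RAM left for the OS and other processes (assumed value)
     * @param minHeap   smallest heap the webapp can run with (assumed value)
     * @param storeSize on-disk size of the store files
     */
    static MemoryBudget plan(long totalRam, long osReserve, long minHeap, long storeSize) {
        long available = totalRam - osReserve;
        if (available < minHeap) {
            throw new IllegalArgumentException("not enough RAM for the minimum heap");
        }
        // Map as much of the store as fits once the minimum heap is set aside.
        long mapped = Math.min(storeSize, available - minHeap);
        // Whatever is left over becomes heap.
        long heap = available - mapped;
        return new MemoryBudget(mapped, heap);
    }

    public static void main(String[] args) {
        for (long ramGb : new long[] {8, 16}) {
            MemoryBudget b = plan(ramGb * GB, 1 * GB, 2 * GB, 10 * GB);
            System.out.printf("%d GB RAM: map %d GB, heap %d GB%n",
                    ramGb, b.mappedBytes / GB, b.heapBytes / GB);
        }
    }
}
```

With these assumed reserves, the 8 GB machine can map only about half of the store, while the 16 GB machine can map all of it and still has heap to spare.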

> The reason to pick embedded server is that we can wrap a webapp around it, and have all the application logic including Cypher queries live in this application

You could still do a similar thing using Neo4j server. That way you won't mix up the GC cycles of your application and Neo4j, which can be annoying at times.

Cheers
Mark

George Vincent

Feb 18, 2014, 8:44:20 AM
to ne...@googlegroups.com
Thanks a lot, Mark! 

>> Assuming that you're not on windows the memory mapping stuff is done off heap so no GC to worry about. 
Correct. This will be running on Linux.

>> I haven't measured the exact numbers but I'd suggest memory mapping as much of the store files as you can and then use what you have left over for the cache. (i.e. heap size). 
Sounds good.

>> So, is it recommended to go with CMS collector with the available heap?
Any thoughts on the GC settings to go with for the application (with the webapp + embedded server setup)?

Mark Needham

Feb 18, 2014, 9:32:32 AM
to ne...@googlegroups.com
D'oh, sorry, I didn't answer that bit. Yeah, CMS is what most people go with. I think I've seen one user go with G1, but CMS should work fine.
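
On HotSpot JVMs of that era, CMS is selected with the `-XX:+UseConcMarkSweepGC` flag and G1 with `-XX:+UseG1GC`. For Tomcat these are usually passed through `CATALINA_OPTS`. Whichever one is chosen, it helps to confirm from inside the webapp that the flag actually took effect. The sketch below uses only the standard `java.lang.management` API; the class name `GcReport` is made up for this example.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Reports which collectors the running JVM uses and how much work they have done. */
final class GcReport {

    /** One line per collector: name, collection count, accumulated time in ms. */
    static List<String> describeCollectors() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Count and time can be -1 if the JVM does not track them.
            lines.add(String.format("%s: %d collections, %d ms total",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describeCollectors()) {
            System.out.println(line);
        }
    }
}
```

Logging this once at startup and again during a load test gives a first impression of how often the collectors run and how much time they take, before reaching for full GC logs.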

George Vincent

Feb 18, 2014, 4:09:12 PM
to ne...@googlegroups.com
Thanks Mark. Do you have any suggestions for a CMS collector configuration to start with, given that in the Community edition Neo4j relies on GC to evict the cache as it fills up?






--
Thanks,
George

Ph: +1 617 771 8517 (US)

Michael Hunger

Feb 18, 2014, 4:56:10 PM
to ne...@googlegroups.com
Can you describe your use-case (queries) and usage patterns (concurrency) a bit? Then it might become more obvious why the caches get filled that quickly and which other cache config could help.

And you can always get enterprise and try out the hpc cache.

Michael

George Vincent

Feb 18, 2014, 11:16:52 PM
to ne...@googlegroups.com
Hi Michael - The graph resembles a social network. Users know other users. General use cases are to find the list of users I know, the list of users that my friends know (second degree), mutual friends between two users, etc. In my load test, I was trying to exercise second-degree queries for 1000 unique users with a throughput of 200 requests/min (1 request for 1 user). The second-degree queries are not paginated, so they may be pulling quite a lot of nodes and caching them.
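
To make the two query shapes concrete, here is a plain-Java sketch over an in-memory adjacency map. It is not Neo4j API code. The class and method names are hypothetical, the "knows" relationship is treated as undirected, and second degree is taken to exclude the user and their direct friends; all three are assumptions.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy social graph: who knows whom, stored as an undirected adjacency map. */
final class FriendGraph {
    private final Map<String, Set<String>> knows = new HashMap<>();

    void addFriendship(String a, String b) {
        knows.computeIfAbsent(a, k -> new TreeSet<>()).add(b);
        knows.computeIfAbsent(b, k -> new TreeSet<>()).add(a);
    }

    Set<String> friendsOf(String user) {
        return knows.getOrDefault(user, Collections.emptySet());
    }

    /** Second degree: friends of my friends, excluding me and my direct friends. */
    Set<String> friendsOfFriends(String user) {
        Set<String> direct = friendsOf(user);
        Set<String> result = new TreeSet<>();
        for (String friend : direct) {
            result.addAll(friendsOf(friend)); // one expansion per direct friend
        }
        result.remove(user);
        result.removeAll(direct);
        return result;
    }

    /** Users that both a and b know directly. */
    Set<String> mutualFriends(String a, String b) {
        Set<String> result = new TreeSet<>(friendsOf(a));
        result.retainAll(friendsOf(b));
        return result;
    }

    public static void main(String[] args) {
        FriendGraph g = new FriendGraph();
        g.addFriendship("alice", "bob");
        g.addFriendship("alice", "carol");
        g.addFriendship("bob", "dave");
        g.addFriendship("carol", "dave");
        g.addFriendship("carol", "erin");
        System.out.println(g.friendsOfFriends("alice"));
        System.out.println(g.mutualFriends("alice", "dave"));
    }
}
```

The loop in `friendsOfFriends` shows why unpaginated second-degree queries touch so many nodes: the work grows with the sum of the friend counts of every direct friend, and each node visited is a candidate for the cache.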

I use the default cache config. Perhaps I should try the weak cache config?

I read that the hpc cache does not rely on GC for cache eviction. Yes, that's an option.
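
For readers unfamiliar with what "relying on GC for cache eviction" means in practice, the following sketch shows the general technique with soft references. It is not Neo4j's cache implementation; the class `SoftCache` is a hypothetical stand-in that only illustrates the mechanism.

```java
import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Toy cache whose values the garbage collector may clear under memory pressure. */
final class SoftCache<K, V> {
    private final ConcurrentMap<K, SoftReference<V>> entries = new ConcurrentHashMap<>();

    void put(K key, V value) {
        entries.put(key, new SoftReference<>(value));
    }

    /** Returns the cached value, or null if it was never cached or the GC cleared it. */
    V get(K key) {
        SoftReference<V> ref = entries.get(key);
        if (ref == null) {
            return null;
        }
        V value = ref.get();
        if (value == null) {
            // The collector reclaimed the value; drop the stale entry.
            entries.remove(key, ref);
        }
        return value;
    }

    int size() {
        return entries.size();
    }

    public static void main(String[] args) {
        SoftCache<Long, String> cache = new SoftCache<>();
        String node = "node-1"; // strong reference keeps the value alive
        cache.put(1L, node);
        System.out.println(cache.get(1L));
        System.out.println(cache.get(2L));
    }
}
```

Swapping `SoftReference` for `java.lang.ref.WeakReference` gives the weak variant of the same idea: values become collectable at the next GC cycle once nothing else references them, instead of being kept until memory runs low. In both cases the cache has no size limit of its own, so the heap fills until the collector steps in, which is why the collector choice and the cache behaviour are tied together here.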

Thanks for your help!
George 

Michael Hunger

Feb 19, 2014, 3:17:02 AM
to ne...@googlegroups.com
Can you list the queries you ran?
You should be able to sustain much higher throughput.

Also, what were the amounts returned? Avg, max, percentiles?
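
One way to collect those numbers is to record the result count of every request during the load test and summarise it afterwards. The sketch below is one possible way to do that in plain Java. The class name is hypothetical, percentiles use the nearest-rank method, and the sample sizes in `main` are made up for illustration; they are not measurements from this thread.

```java
import java.util.Arrays;

/** Summary statistics over the number of rows each query returned. */
final class ResultSizeStats {

    static double average(long[] sizes) {
        long sum = 0;
        for (long s : sizes) {
            sum += s;
        }
        return (double) sum / sizes.length;
    }

    static long max(long[] sizes) {
        long best = sizes[0];
        for (long s : sizes) {
            best = Math.max(best, s);
        }
        return best;
    }

    /** Nearest-rank percentile: the smallest value with at least p percent of samples at or below it. */
    static long percentile(long[] sizes, double p) {
        long[] sorted = sizes.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Made-up result counts, one per request.
        long[] sizes = {12, 40, 7, 350, 95, 18, 60, 1200, 33, 51};
        System.out.printf("avg=%.1f max=%d p50=%d p90=%d p99=%d%n",
                average(sizes), max(sizes),
                percentile(sizes, 50), percentile(sizes, 90), percentile(sizes, 99));
    }
}
```

A large gap between the median and the high percentiles would suggest that a few well-connected users dominate the load, which matters more for cache sizing than the average does.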

Sent from mobile device