High CPU usage

jam+

Nov 11, 2013, 11:33:36 PM

to project-...@googlegroups.com

Hi,

Our VDM cluster is currently showing high CPU usage.

At the moment I have no idea what the root cause is.

There are no errors or exceptions in the logs and nothing looks unstable; all I can see is CPU usage reaching 100%.

Disk space

/dev/md0 63G 45G 16G 75% /opt/vdm

server.properties

node.id=0
max.threads=100
client.max.connections.per.node=50
client.connection.timeout.ms=5000

############### DB options ######################
data.directory=/opt/vdm/service/data
http.enable=true
socket.enable=true
slop.pusher.enable=true
slop.frequency.ms=300000

# BDB
bdb.write.transactions=true
bdb.flush.transactions=false
bdb.cache.size=11000m

# Mysql
mysql.host=localhost
mysql.port=1521
mysql.user=root
mysql.password=3306
mysql.database=test

#NIO connector settings.
enable.nio.connector=true

storage.configs=voldemort.store.bdb.BdbStorageConfiguration, voldemort.store.readonly.ReadOnlyStorageConfiguration, voldemort.store.memory.CacheStorageConfiguration

Here is my jstack dump: https://cloudup.com/cNJDQplqskC

Hope someone can give me a hint, thanks!

jam+

Nov 11, 2013, 11:42:46 PM

BTW, here is top:

9246 csrunner 15 0 12.9g 10g 5120 S 102.2 68.8 76729:04 java

here is free:

jam[0]0$ free -m
             total       used       free     shared    buffers     cached
Mem:         15360      15325         34          0         36       4182
-/+ buffers/cache:      11105       4254
Swap:         2047        505       1542

Memory usage is almost 70%, and the system is dipping into swap.
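To put numbers on that, the `free -m` output above can be parsed with a few lines of Python. This is only a sketch: the helper name `parse_free` is made up for illustration, and it assumes the older `free` layout shown here, with a separate `-/+ buffers/cache` row.

```python
# Sketch: derive usage percentages from the `free -m` output pasted above.
# `parse_free` is a hypothetical helper, not part of any tool.

FREE_OUTPUT = """\
             total       used       free     shared    buffers     cached
Mem:         15360      15325         34          0         36       4182
-/+ buffers/cache:      11105       4254
Swap:         2047        505       1542
"""


def parse_free(text):
    """Map each row label (text before the colon) to its list of integer columns."""
    rows = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if sep:  # the header row has no colon and is skipped
            rows[label.strip()] = [int(value) for value in rest.split()]
    return rows


rows = parse_free(FREE_OUTPUT)
total_mb = rows["Mem"][0]
# "used" on the -/+ row excludes buffers and page cache, so it is closer
# to what processes actually hold.
app_used_mb = rows["-/+ buffers/cache"][0]
swap_total_mb, swap_used_mb = rows["Swap"][0], rows["Swap"][1]

app_used_pct = 100 * app_used_mb / total_mb
swap_used_pct = 100 * swap_used_mb / swap_total_mb

print(f"memory used by processes: {app_used_mb} MB ({app_used_pct:.0f}%)")
print(f"swap used: {swap_used_mb} MB ({swap_used_pct:.0f}%)")
```

With the values from this thread, processes hold roughly 72% of physical memory and about a quarter of swap is in use.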

jam+

Nov 12, 2013, 4:09:39 AM

Here is some more information.

Memory analysis:

1. https://i.cloudup.com/uhR7aXEBBI.png

2. https://i.cloudup.com/kreu4cF7rZ.png

Thanks.

Esteban Donato

Nov 12, 2013, 6:48:08 AM

Did you check whether you are running full GCs (FGC) too frequently? What is your max heap size? It looks like all of your heap is being used by the BDB cache.
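One way to check this is to sample `jstat -gcutil <pid> <interval>` and watch the FGC (full GC event count) and FGCT (cumulative full GC seconds) columns. The sketch below shows the arithmetic in Python. The sample rows are invented for illustration and are not from the server in this thread, and `parse_gcutil` and `full_gc_summary` are hypothetical helper names.

```python
# Sketch: estimate full-GC pressure from `jstat -gcutil <pid> 1000` samples.
# SAMPLE is invented for illustration; real output comes from the running JVM.

SAMPLE = """\
  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
  0.00  41.20  63.10  98.70  59.80   8120  310.500   100  250.000  560.500
  0.00  41.20  71.90  98.90  59.80   8120  310.500   101  250.900  561.400
  0.00  41.20  80.30  99.10  59.80   8120  310.500   102  251.800  562.300
"""


def parse_gcutil(text):
    """Turn jstat -gcutil text into a list of {column name: float} dicts."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    header, data = lines[0], lines[1:]
    return [dict(zip(header, map(float, row))) for row in data]


def full_gc_summary(samples, interval_s):
    """Compare the first and last sample, taken interval_s seconds apart."""
    first, last = samples[0], samples[-1]
    elapsed_s = interval_s * (len(samples) - 1)
    full_gcs = last["FGC"] - first["FGC"]
    full_gc_time_s = last["FGCT"] - first["FGCT"]
    return {
        "full_gcs_per_minute": 60 * full_gcs / elapsed_s,
        "fraction_of_time_in_full_gc": full_gc_time_s / elapsed_s,
        "old_gen_percent": last["O"],
    }


summary = full_gc_summary(parse_gcutil(SAMPLE), interval_s=1.0)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

A steadily climbing FGC count together with an old generation (O) that stays near 100% would be consistent with a heap that is too small for the cache. With CMS, concurrent cycles can also increment FGC, so treat the counter as a trend rather than an exact count of stop-the-world collections.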

Brendan Harris (a.k.a. stotch on irc.oftc.net)

Nov 12, 2013, 11:33:09 AM

Hi Jam,

Like Esteban asked, please give us your JVM config (the full config).

Also ...

storage.configs=voldemort.store.bdb.BdbStorageConfiguration, voldemort.store.readonly.ReadOnlyStorageConfiguration, voldemort.store.memory.CacheStorageConfiguration

I don't recommend running more than one storage configuration on a single JVM instance. The objects, their creation rate and lifespans are very different from storage engine to storage engine. The JVM's automated GC is not generally adequate for such a complex system of objects. The JVM's GC activity could keep the CPU very busy.

Thanks,

Brendan

jam+

Nov 12, 2013, 10:26:49 PM

Thank you for all your replies.

Here is the JVM config (I shortened the classpath part):

java -Xms12g -Xmx12g -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:CMSInitiatingOccupancyFraction=70 -XX:SurvivorRatio=2 -XX:+PrintTenuringDistribution -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/opt/vdm-0.95/bin/../logs/gc-vdm.log -XX:NewSize=512m -XX:MaxNewSize=512m -XX:MaxPermSize=160M -Dlog4j.configuration=file:///opt/vdm-0.95/conf/log4j.properties -Dcom.sun.management.jmxremote.port=9001 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.library.path=/opt/vdm-0.95/bin/../lib/boot -classpath /opt/vdm-0.95/lib/.. -Dwrapper.key=B6OXdyhr5fM8dD83 -Dwrapper.port=32000 -Dwrapper.jvm.port.min=31000 -Dwrapper.jvm.port.max=31999 -Dwrapper.pid=24254 -Dwrapper.version=3.2.3 -Dwrapper.native_library=wrapper -Dwrapper.service=TRUE -Dwrapper.cpu.timeout=10 -Dwrapper.jvmid=1 org.tanukisoftware.wrapper.WrapperSimpleApp voldemort.server.VoldemortServer

As for storage.configs, we will figure out how to separate these instances.

Thanks for the advice!

Brendan Harris (a.k.a. stotch on irc.oftc.net)

Nov 12, 2013, 11:04:58 PM

So, your bdb.cache.size is 11g and your JVM heap size is 12g. 0.5g of the heap is for newgen, leaving roughly 11.5g for oldgen, which is inevitably where your bdb cache objects will land and remain for a long time. If you want a cache size that large, you're going to need to bump your heap size up to at least 16g and you should probably give at least 1g to newgen. You're probably spending most of your time in GC, which is probably what is keeping the CPU busy. You may need a much larger heap than even 16g (and larger newgen) depending upon your throughput rate. How many queries per second are you serving?
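The arithmetic behind this can be written out as a short Python sketch. It uses only the values quoted in this thread; the helper names are illustrative, and it assumes the `m` and `g` suffixes mean MiB and GiB and that the permanent generation (`-XX:MaxPermSize`) sits outside the `-Xmx` heap.

```python
# Back-of-the-envelope heap arithmetic using the settings quoted in this thread.
# All sizes are in MiB; function and variable names are illustrative only.


def old_gen_mib(heap_mib, new_gen_mib):
    """Old generation = total heap minus the young generation."""
    return heap_mib - new_gen_mib


def cache_share(cache_mib, pool_mib):
    """Fraction of a memory pool that the BDB cache setting would occupy."""
    return cache_mib / pool_mib


heap = 12 * 1024   # -Xms12g -Xmx12g
new_gen = 512      # -XX:NewSize=512m -XX:MaxNewSize=512m
bdb_cache = 11000  # bdb.cache.size=11000m

old_gen = old_gen_mib(heap, new_gen)

print(f"old gen: {old_gen} MiB ({old_gen / 1024:.1f}g)")
print(f"cache / heap: {cache_share(bdb_cache, heap):.0%}")
print(f"cache / old gen: {cache_share(bdb_cache, old_gen):.0%}")
print(f"old gen left after cache: {old_gen - bdb_cache} MiB")
```

With these inputs the old generation comes to 11776 MiB (11.5g), and the cache setting alone accounts for about 90% of the heap and about 93% of the old generation, leaving 776 MiB for everything else that gets tenured. Note that `free -m` earlier in the thread reports 15360 MB of physical memory, so a 16g heap would not fit on this host as it stands.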

Also, if you're running voldemort 0.95 (I assume that is what /opt/vdm-0.95 is), you're _very_ out of date and should upgrade to 1.2.0+ to get all of the performance improvements.

Lastly, with the bdb cache consuming 90% of the jvm heap, there's probably no room for the read-only and in-memory storage engines to run properly, so you're probably just stuck in a non-stop GC loop trying to allocate for all three engines in barely enough heap space for even one engine.

Brendan

jam+

Nov 12, 2013, 11:13:09 PM

Thanks, this is really helpful! We will look into rearranging the settings and may upgrade to 1.3 as well.

Thanks again!
