.jdb file size keeps on increasing and the server crashes


Samyuktha M S

Feb 8, 2013, 12:14:48 AM2/8/13
to project-...@googlegroups.com
Hi,
 
We are using a Voldemort 0.80.1 server with more than a million records stored in the Voldemort DB.
The .jdb file size keeps increasing. Over the course of a week, the server hangs and no data can be retrieved.
We had to restart the server pointing to a new data path; the existing data is lost and we have to dump the entire dataset again.
 
The .jdb files store the data that we dump to Voldemort, right? I.e., if I put the value A=B, it will be stored in a .jdb file? Please correct me if I am wrong.
Does it also store any information for each transaction? Say I query Voldemort for key A; does it store any info relating to that transaction as well?
 
Please throw some light on this.
 
Thanks,
Samyuktha.

Carlos Tasada

Feb 8, 2013, 6:58:29 AM2/8/13
to project-...@googlegroups.com
Hi Samyuktha,

First of all, you're using quite an old Voldemort version. In any case, there are a few things you should check:
- How are your cleaners configured? Check here http://www.project-voldemort.com/voldemort/configuration.html (a sketch of the relevant settings follows this list)
- I have seen something similar when BDB gets corrupted. Check your logs and your OS for potential problems.
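
For reference, the cleaner knobs live in server.properties; a minimal sketch showing the defaults (the values are illustrative and must be tuned for your workload):

    # server.properties -- cleaner settings (defaults shown)
    bdb.cleaner.min.file.utilization=5
    bdb.cleaner.minUtilization=50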

Is there any log on the server pointing to some problem? How many servers are in your cluster? Are you using the auto-expiry feature?

Regards,
Carlos.



Brendan Harris

Feb 8, 2013, 12:32:17 PM2/8/13
to project-...@googlegroups.com
Samyuktha,

In addition to what Carlos said, you also need a large enough bdb cache size, so that the cleaners can fit enough of the index into the cache to do their cleaning. If your cache is too small, your cleaner threads can fall behind and cause your jdb structure to grow in an unbounded fashion (rarely reclaiming enough space). Your max logfile size (jdb size) can also impact the performance of the cleaners. And I'd definitely recommend upgrading to a much more recent release.
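
As a rough sketch (the property names follow Voldemort's bdb.* settings; the values are illustrative, not recommendations):

    # server.properties
    bdb.cache.size=2G            # must be large enough for the cleaners to work the index
    bdb.max.logfile.size=60MB    # size of each .jdb file; affects cleaner throughput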

Also, no transaction data is stored in the structure, just keys, versions, timestamps, and values. It's log-structured (append-only), so all modifications to existing keys get written to the end of the current jdb file, and the old files are deleted over time as the keys in them are invalidated by the more recent jdb files. How quickly those old jdb files get deleted can improve or degrade depending on your configuration, how many keys you have, and how many updates to existing keys you do.
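
To make the log-structured behavior concrete, a simplified hypothetical timeline:

    000001.jdb: put A=B, put C=D    <- oldest log file
    000002.jdb: put A=E             <- the A=B record in 000001.jdb is now obsolete
    Once 000001.jdb falls below the utilization threshold, the cleaner copies
    its remaining live records (C=D) forward and deletes the file.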

Brendan

Samyuktha M S

Feb 11, 2013, 1:44:30 AM2/11/13
to project-...@googlegroups.com
Hi Carlos & Brendan,

As you suggested, we will try to upgrade to the latest version. But since the products are live, it will take some time to upgrade and test.

Until then, can you suggest something for the existing problem?

It's a single-node cluster.

Please find the configuration details (server.properties):

# The ID of *this* particular cluster node

max.threads=300
#enable.memory.engine=true

############### DB options ######################

http.enable=true
socket.enable=true

# BDB
bdb.write.transactions=false
bdb.flush.transactions=false
bdb.cache.size=100M

# Mysql
mysql.host=localhost
mysql.port=1521
mysql.user=<user>
mysql.password=<pass>
mysql.database=test

#NIO connector settings.
enable.nio.connector=false
#enable.readonly.engine=true
#file.fetcher.class=voldemort.store.readonly.fetcher.HdfsFetcher
storage.configs=voldemort.store.bdb.BdbStorageConfiguration, voldemort.store.readonly.ReadOnlyStorageConfiguration

voldemort-server.sh launches the server with the following JVM options:

/usr/java/jdk1.6.0_26/bin/java   -XX:+UseConcMarkSweepGC -Xloggc:/opt/Voldemort/voldemort-0.80.1/bin/log/gc.log -Dlog4j.configuration=src/java/log4j.properties -Xms256M -Xmx256M -server -Dcom.sun.management.jmxremote -cp $CLASSPATH voldemort.server.VoldemortServer $@


The existing key count is about 60,000,000.
On average there are about 50,000 writes (new adds plus updates of existing keys).

The bdb folder size is approximately 10G now

Please suggest.

Thanks,
samyuktha.




Brendan Harris

Feb 11, 2013, 4:27:56 AM2/11/13
to project-...@googlegroups.com
Hi Samyuktha,

For future reference, don't put your usernames/passwords or any identifying secure info on the board, even for test databases.

Since you have about 60M keys and many updates to those keys, you're going to need a lot more than 100 megabytes for the bdb.cache.size. Your cleaner threads are probably falling way behind. If you can upgrade to 0.96, you can expose metrics via MBeans that show how much cache you need and how far behind your cleaner threads are. You can also use the com.sleepycat.je.util.DbPrintLog utility to get an idea of how large your index is, which helps you estimate how much bdb cache you need. Also, if you have a lot of concurrent writes to individual keys, you could have a very large index.
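
For example, an invocation along these lines (a sketch only; check the exact flags against your JE version and adjust the jar and environment paths for your install):

    # Summarize the log, including how much of it is index (IN/BIN) entries
    java -cp je-4.x.y.jar com.sleepycat.je.util.DbPrintLog -h /path/to/bdb/env -S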

The first thing you should do is increase your bdb.cache.size to something like 2G to start, and then see if the jdb files start getting cleaned up. If you don't see jdb files getting deleted more regularly, increase it further. Once you start seeing progress, you can tune down from there until you find a comfortable balance of memory vs. GC time/frequency/delay. You'll also need to increase your heap size so there is more than enough OldGen space to host the entire bdb cache plus other objects (i.e., at least 30% more memory than your bdb.cache.size).
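
Concretely, that first step might look like this (illustrative numbers only; leave headroom for the OS):

    # server.properties
    bdb.cache.size=2G

    # voldemort-server.sh -- heap sized ~30%+ above the bdb cache
    java -XX:+UseConcMarkSweepGC -Xms3G -Xmx3G -server ... voldemort.server.VoldemortServer $@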

You might also want to add in the following:
bdb.cleaner.min.file.utilization=0

This makes the cleaner rely on the overall environment utilization threshold alone (50% by default), and will help avoid conflicting compaction rules.

Unfortunately, with your version of voldemort, you cannot increase the number of bdb cleaner threads and you cannot see the cleaner backlog in real-time or the bdb cache usage. So, there are not a lot of options for you. But the cache increase and disabling the min.file.utilization should definitely give you some improvement.

Lastly, if you're not using any read-only persistence stores, you should remove the ReadOnlyStorageConfiguration or you will just be wasting heap space.

Brendan


Samyuktha M S

Feb 11, 2013, 5:06:34 AM2/11/13
to project-...@googlegroups.com
Thanks a lot, Brendan, for your suggestions.
We will implement them and report back with the results.

And we will be careful about user and password details in the future.

Vinoth Chandar

Feb 11, 2013, 2:53:28 PM2/11/13
to project-...@googlegroups.com
I can give you a config to try out, but I'm not sure how many configs are supported in 0.80.1, as Brendan pointed out. How much RAM do you have? I am unable to quickly map the branches on GitHub to the release number.
Are you doing 50,000 insert/updates per second? Do you have SSDs or spinning disks? You may not even have enough IOPS to write that much.

If you can paste the bdb configuration params from VoldemortConfig.java, I can give you some configs to test out.

Vinoth Chandar

Feb 11, 2013, 6:17:48 PM2/11/13
to project-...@googlegroups.com
Brendan helped me pull up the release on github

    https://github.com/voldemort/voldemort/blob/release-0801/src/java/voldemort/server/VoldemortConfig.java

From this, your best options are:

this.bdbCacheSize = props.getBytes("bdb.cache.size", 200 * 1024 * 1024); // Bump this up to 70% of the maximum heap size you can set.
this.bdbCheckpointBytes = props.getLong("bdb.checkpoint.interval.bytes", 20 * 1024 * 1024); // Bump this up to 1GB or so, so it does not interfere with online traffic. Downside is a longer recovery time.
this.bdbCheckpointMs = props.getLong("bdb.checkpoint.interval.ms", 30 * Time.MS_PER_SECOND); // Bump this up to 5 minutes or so.
this.bdbCleanerMinFileUtilization = props.getInt("bdb.cleaner.min.file.utilization", 5); // Set this to 0.
this.bdbCleanerMinUtilization = props.getInt("bdb.cleaner.minUtilization", 50);
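
Translated into server.properties, those suggestions would look something like this (the values are illustrative and should be tuned for your heap and traffic):

    bdb.cache.size=2G                          # ~70% of the max heap you can afford
    bdb.checkpoint.interval.bytes=1073741824   # ~1GB between checkpoints
    bdb.checkpoint.interval.ms=300000          # ~5 minutes
    bdb.cleaner.min.file.utilization=0
    bdb.cleaner.minUtilization=50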

You can manually configure BDB-JE by dropping a je.properties file in your environment directory (refer to the BDB-JE docs).
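
A hypothetical je.properties sketch (these are standard BDB-JE property names, but treat the values as placeholders; this may also be the only way to raise the cleaner thread count that 0.80.1's own config can't change):

    # je.properties in the BDB environment directory -- illustrative only
    je.cleaner.threads=2
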
Again, you should first figure out how much data you can write to the disk using a dd test. No storage engine is going to be able to top that.
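
Something like this gives a rough sequential-write ceiling (adjust the path for your data disk; oflag=direct bypasses the page cache for a more honest number):

    # Write 1GB to the data disk and report throughput
    dd if=/dev/zero of=/path/to/bdb/disk/ddtest bs=1M count=1024 oflag=direct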

Thanks
Vinoth

Brendan Harris

Feb 12, 2013, 1:41:11 AM2/12/13
to project-...@googlegroups.com
Samyuktha,

I'd like to add a couple notes to what Vinoth said inline ...

On Monday, February 11, 2013 3:17:48 PM UTC-8, Vinoth Chandar wrote:
this.bdbCacheSize = props.getBytes("bdb.cache.size", 200 * 1024 * 1024); // Bump this up to 70% of the maximum heap size you can set.

I'd make this 70% of your OldGen size.

I am not sure how experienced you are with JVM tuning, but your OldGen size is generally the heap size minus the NewGen and PermGen sizes. Your cached objects (bdb.cache.size) will live fairly long and are very likely to tenure and promote to OldGen (though this depends somewhat on your request rate, NewGen size, and NewGen GC/tenuring parameters).
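
A hypothetical sizing example (these are standard HotSpot flags; the numbers are placeholders):

    # 4G heap with a 512M NewGen leaves ~3.5G OldGen (less PermGen);
    # 70% of that suggests a bdb.cache.size around 2.3G
    java -server -Xms4G -Xmx4G -XX:NewSize=512M -XX:MaxNewSize=512M \
         -XX:+UseConcMarkSweepGC ... voldemort.server.VoldemortServer $@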

this.bdbCheckpointBytes = props.getLong("bdb.checkpoint.interval.bytes", 20 * 1024 * 1024); // Bump this up to 1GB or so, so it does not interfere with online traffic. Downside is a longer recovery time.

Be careful with this one if you're not on SSDs. You might want to play around with it a bit, starting with a lower number of bytes and increasing over time to see if things improve. To truly test the results, you need an ungraceful shutdown (sudden server power-down, SIGKILL, etc.) to tell you how long your startup will take. The larger the byte interval, the longer it will take to start the server in the event of a failure.
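
A quick hypothetical way to measure that, assuming the stock start script and paths from your install:

    # Simulate a crash, then time the recovery on restart
    kill -9 $(pgrep -f voldemort.server.VoldemortServer)
    time bin/voldemort-server.sh <config dir>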

Brendan

Samyuktha M S

Feb 13, 2013, 1:02:03 AM2/13/13
to project-...@googlegroups.com
Hi Vinoth,
We do not get 50,000 inserts or updates per second. The data loaded into Voldemort is approximately 50,000 records (inserts + updates) per day.
This data arrives at intervals, say once every 4 hours, in a file, and we dump it into Voldemort.

We have 8GB of RAM.




