Usage of RRDCachedBackend in RRD4J


AdityaVaja

Mar 27, 2015, 2:40:38 PM
to rrd4j-...@googlegroups.com
Hi,

I'm using version 2.2 from the online maven repo: http://mvnrepository.com/artifact/org.rrd4j/rrd4j/2.2

I have a periodic Java task that executes every 45 seconds and updates 4K (4,000) RRD files. The archive interval is 60 seconds. I've kept the periodic task at 45 seconds because updating the files may take 10-15 seconds, and the job reschedules itself at the end of each run.

After adding some logging, I found that updating the 4K files takes 16 or more seconds (sometimes 25). Getting the counter values is a relatively costly operation that I would not want to do more often than every 45 seconds.
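As an aside, the reschedule-at-the-end-of-each-run pattern described above corresponds to a fixed-delay schedule, which guarantees runs never overlap even when an update cycle runs long. A minimal sketch with plain JDK classes, not rrd4j code; the class name and parameters are invented for illustration, and the intervals are scaled down to milliseconds so the sketch finishes quickly (in production the delay would be 45 seconds):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class FixedDelayUpdater {
    // Run a dummy "update all RRD files" task with a fixed delay between runs,
    // for roughly totalMs, and report how many times it ran.
    static int runFor(long totalMs, long delayMs, long workMs) throws InterruptedException {
        AtomicInteger runs = new AtomicInteger();
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        // scheduleWithFixedDelay starts counting the delay only after the
        // previous run finishes, so a slow update cycle can never overlap the next one.
        ses.scheduleWithFixedDelay(() -> {
            try {
                Thread.sleep(workMs); // stand-in for updating the 4K files
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            runs.incrementAndGet();
        }, 0, delayMs, TimeUnit.MILLISECONDS);
        Thread.sleep(totalMs);
        ses.shutdownNow();
        return runs.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // In production this would be delayMs = 45_000 with the real update job.
        System.out.println("runs: " + runFor(200, 10, 5));
    }
}
```

The alternative, scheduleAtFixedRate, would try to keep a strict 45-second clock and queue up late runs, which is exactly what a long update cycle should not do here.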

What is the best way to reduce the time spent writing to disk? I'm looking at two options:
  1. Use RRDCachedBackend http://jrds.fr/cachingbe
  2. Install an SSD (not really, but worst case)
Since I have a Maven project, I'm not sure how to include RRDCachedBackend. Ideally there would be another Maven repo from which I could pull it in as a dependency and be done. Keeping a separate jar would put the onus on me to check for updates and update the jar locally.

Any pointers would be appreciated, even a link to stackoverflow :)

--
Aditya Vaja
Graduate Student | Computer Science | NCSU

Fabrice Bacchella

Mar 30, 2015, 4:34:26 AM
to rrd4j-...@googlegroups.com
You're right. I have a custom Maven repository (http://jrds.fr/jenkins/plugin/repository/everything/) linked to jrds' Jenkins, but RRDCachedBackend is not built using Maven, so that's no good. I will write a Maven build as soon as I have some spare time.

--
You received this message because you are subscribed to the Google Groups "rrd4j-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rrd4j-discus...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

AdityaVaja

Mar 31, 2015, 4:36:46 PM
to rrd4j-...@googlegroups.com
Hi Fabrice,

Thanks for the quick response. I found an easier way: include the files directly from https://github.com/fbacchella/jrds/tree/HeavyCaching/src/jrds/caching, remove the (minor) dependencies on jrds, compile directfd.c, and include it.

Only to realize that the caching backend caches pages per file, and performance-wise it's actually better in my case to keep the default memory-mapped file backend, which uses NIO :)

For reference, if anyone wants to use just the caching backend without importing the jar and its dependency on jrds:
  1. copy the Java files at the link above into your project under some package (ideally create a new one in the hierarchy called caching or jrds)
  2. comment out the calls to jrds library methods in the Java files manually
  3. update the JNI function names in directfd.c to match the package hierarchy where you placed the Java files
  4. compile the C file using the following command
    1. gcc -shared -fPIC -I/usr/lib/jvm/java-7-openjdk-amd64/include/ -I/usr/lib/jvm/java-7-openjdk-amd64/include/linux/ directfd.c -o libdirect.so
  5. if the direct.h include in the C file causes an error, include jni.h instead and fix any remaining errors
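A note on step 3: JNI resolves a native method to a C symbol whose name is derived from the fully-qualified class name, which is why moving the Java files to a different package forces a rename in directfd.c. A simplified sketch of the mangling rule (it ignores JNI's escaping of underscores and unicode and the overload suffix, and the class and method names are hypothetical, not the actual names in the jrds sources):

```java
public class JniSymbol {
    // Simplified JNI name mangling: Java_<package and class, '.' -> '_'>_<method>.
    // (Real JNI also escapes '_' as "_1" and appends the argument signature
    // for overloaded native methods.)
    static String symbolFor(String fqcn, String method) {
        return "Java_" + fqcn.replace('.', '_') + "_" + method;
    }

    public static void main(String[] args) {
        // Hypothetical class name: if you copy the files into com.example.caching
        // instead of jrds.caching, the C symbols must change accordingly.
        System.out.println(symbolFor("jrds.caching.DirectFileRead", "open"));
    }
}
```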
Just for comparison I also tried to use MemoryBackend, but I guess the way the files are accessed has to change. I could not find an example of how to use MemoryBackend. Would you have an idea about that?

Thanks again for your help!

Fabrice Bacchella

Mar 31, 2015, 6:00:07 PM
to rrd4j-...@googlegroups.com
On 31 March 2015 at 22:36, AdityaVaja <wolver...@gmail.com> wrote:

Hi Fabrice,

Thanks for the quick response. I found an easier way: include the files directly from https://github.com/fbacchella/jrds/tree/HeavyCaching/src/jrds/caching, remove the (minor) dependencies on jrds, compile directfd.c, and include it.

Only to realize that the caching backend caches pages per file, and performance-wise it's actually better in my case to keep the default memory-mapped file backend, which uses NIO :)

Not sure. I found that NIO can issue big reads that are too big for rrd4j usage, where small IOs are much better. It leads to a lot of useless data in the cache. The caching backend is interesting if you set the cache flush interval to something bigger than the collect step, so it can merge many writes.
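The merging effect described here is write-behind caching: updates within the flush interval land only in memory, and one disk write per dirty page absorbs however many in-memory updates happened since the last flush. A toy sketch of the idea, not the RRDCachedBackend code; the class and method names are invented:

```java
import java.util.HashMap;
import java.util.Map;

public class WriteBehindCache {
    // page index -> latest contents; repeated writes to a page overwrite each other
    private final Map<Long, byte[]> dirtyPages = new HashMap<>();

    // Writes only touch memory, so they are cheap no matter how often they happen.
    void write(long pageIndex, byte[] data) {
        dirtyPages.put(pageIndex, data);
    }

    // Called every syncInterval: one disk write per dirty page, regardless of
    // how many updates that page received since the last flush.
    int flush() {
        int written = dirtyPages.size();
        // (real code would write each dirty page back to the file here)
        dirtyPages.clear();
        return written;
    }

    public static void main(String[] args) {
        WriteBehindCache cache = new WriteBehindCache();
        // five collect steps hit the same page before the flush interval expires...
        for (int step = 0; step < 5; step++) {
            cache.write(0, new byte[] { (byte) step });
        }
        // ...so the flush issues a single disk write instead of five
        System.out.println("pages written: " + cache.flush());
    }
}
```

This only pays off when the flush interval spans several collect steps, which is exactly the configuration being recommended.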


For reference, if anyone wants to use just the caching backend without importing the jar and its dependency on jrds:
  1. copy the Java files at the link above into your project under some package (ideally create a new one in the hierarchy called caching or jrds)
  2. comment out the calls to jrds library methods in the Java files manually
  3. update the JNI function names in directfd.c to match the package hierarchy where you placed the Java files
  4. compile the C file using the following command
    1. gcc -shared -fPIC -I/usr/lib/jvm/java-7-openjdk-amd64/include/ -I/usr/lib/jvm/java-7-openjdk-amd64/include/linux/ directfd.c -o libdirect.so
  5. if the direct.h include in the C file causes an error, include jni.h instead and fix any remaining errors
Why all this work? Did ant fail for you?

Just for comparison I also tried to use MemoryBackend, but I guess the way the files are accessed has to change. I could not find an example of how to use MemoryBackend. Would you have an idea about that?

What changes are needed? Except for the file path, there should be none.

AdityaVaja

Mar 31, 2015, 6:28:01 PM
to rrd4j-...@googlegroups.com
Inline.

On Tue, Mar 31, 2015 at 3:00 PM, Fabrice Bacchella <fabrice....@orange.fr> wrote:

On 31 March 2015 at 22:36, AdityaVaja <wolver...@gmail.com> wrote:

Hi Fabrice,

Thanks for the quick response. I found an easier way: include the files directly from https://github.com/fbacchella/jrds/tree/HeavyCaching/src/jrds/caching, remove the (minor) dependencies on jrds, compile directfd.c, and include it.

Only to realize that the caching backend caches pages per file, and performance-wise it's actually better in my case to keep the default memory-mapped file backend, which uses NIO :)

Not sure. I found that NIO can issue big reads that are too big for rrd4j usage, where small IOs are much better. It leads to a lot of useless data in the cache. The caching backend is interesting if you set the cache flush interval to something bigger than the collect step, so it can merge many writes.

I tried with pageCache(2) and syncInterval(300), i.e. a sync every 5 intervals. Since the data is only 64 * 12 = 768 bytes, two pages = 8 kB should be enough for 8 intervals. However, there are 4K files updated every minute, so that may be the reason. It was 3-4 times slower than NIO. Since memory is not an issue, I thought NIO is a better fit.


For reference, if anyone wants to use just the caching backend without importing the jar and its dependency on jrds:
  1. copy the Java files at the link above into your project under some package (ideally create a new one in the hierarchy called caching or jrds)
  2. comment out the calls to jrds library methods in the Java files manually
  3. update the JNI function names in directfd.c to match the package hierarchy where you placed the Java files
  4. compile the C file using the following command
    1. gcc -shared -fPIC -I/usr/lib/jvm/java-7-openjdk-amd64/include/ -I/usr/lib/jvm/java-7-openjdk-amd64/include/linux/ directfd.c -o libdirect.so
  5. if the direct.h include in the C file causes an error, include jni.h instead and fix any remaining errors
Why all this work? Did ant fail for you?

I had to pull in as few dependencies as possible. Plus we already use the slf4j logger, so no point in adding another one to the mix :)
 

Just for comparison I also tried to use MemoryBackend, but I guess the way the files are accessed has to change. I could not find an example of how to use MemoryBackend. Would you have an idea about that?

What changes are needed? Except for the file path, there should be none.

Ah, I forgot that. I changed it and it works. Thanks!

 

Fabrice Bacchella

Mar 31, 2015, 6:50:37 PM
to rrd4j-...@googlegroups.com


I tried with pageCache(2) and syncInterval(300), i.e. a sync every 5 intervals. Since the data is only 64 * 12 = 768 bytes, two pages = 8 kB should be enough for 8 intervals. However, there are 4K files updated every minute, so that may be the reason. It was 3-4 times slower than NIO. Since memory is not an issue, I thought NIO is a better fit.

That's not the way it works. Each file is sliced into 4 KiB pages and slices are loaded as needed. You're allowing only 2 of those slices in memory at the same time, for all the files together. That's the way mmap works anyway, but with more aggressive page loading and more frequent flushes. But it uses all the available memory. I'm surprised it's only 4 times slower with so little memory.

You should count 2-3 pages for each open file, to keep as much data as possible in memory and delay flushes.
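Putting numbers on that rule of thumb: each open file needs its own pages in the cache, so with 4K files even a small per-file allowance adds up, and pageCache(2) shared across all of them guarantees constant eviction. A back-of-the-envelope sketch (the page size and file count come from the thread; the class and method names are invented, not the backend's API):

```java
public class PageCacheSizing {
    static final int PAGE_SIZE = 4096; // the backend slices each file into 4 KiB pages

    // Pages needed to keep `files` files hot at `pagesPerFile` pages each.
    static int pagesNeeded(int files, int pagesPerFile) {
        return files * pagesPerFile;
    }

    public static void main(String[] args) {
        int files = 4000;  // the thread's 4K RRD files
        int perFile = 2;   // low end of the 2-3 pages-per-file rule of thumb
        int pages = pagesNeeded(files, perFile);
        long bytes = (long) pages * PAGE_SIZE;
        System.out.println(pages + " pages ~= " + bytes / (1024 * 1024) + " MiB");
        // versus pageCache(2): only 2 pages total, shared by all 4000 files
    }
}
```

About 31 MiB of cache versus the 8 KiB that pageCache(2) allows, which is why the cache thrashed.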

To check NIO's memory usage, try the command "free"; the column to look at is "cached".