Removing and restoring rolled-up Chronicle Queue files on demand


Vachagan Balayan

Jun 8, 2016, 11:13:53 AM
to Chronicle
Here is the use case: we have data-collector microservices that harvest a lot of data from many different sources, process it, and in the end produce lots of events which are stored in a Chronicle Queue (latest version, 4.3.3, if that matters).

The machine that hosts these microservices has limited storage, so I want to add logic that uploads old rolled-up Chronicle files to remote storage and deletes them locally. It will keep track of which files are accessible locally and which are not, and of their index ranges (so we never get a read request for data that is not physically present); if we get a request to read data that is not available locally, we download the necessary files on demand...

Note that all of this must happen without interrupting Chronicle; the service keeps appending data...

So the question is: is it possible to delete a rolled-up data file from Chronicle without interrupting it? And if we put the file back in place exactly as it was, will Chronicle work as expected when we try to read from it?
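
A minimal sketch of the bookkeeping described above, assuming plain file copies for the transfer; RemoteStore is a hypothetical stand-in for whatever the remote storage is (S3, etc.), and skipping the most recently modified .cq4 file is only a heuristic assumption to avoid touching the cycle currently being appended to:

import java.io.IOException;
import java.nio.file.*;
import java.util.*;

// Hypothetical remote-storage abstraction; not part of Chronicle.
interface RemoteStore {
    void upload(Path localFile) throws IOException;
    void download(String fileName, Path target) throws IOException;
}

public class QueueFileArchiver {

    // Rolled files that look safe to archive: every .cq4 file except the
    // most recently modified one, which may still be the active cycle.
    static List<Path> archivable(Path queueDir) throws IOException {
        List<Path> cq4 = new ArrayList<>();
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(queueDir, "*.cq4")) {
            ds.forEach(cq4::add);
        }
        cq4.sort(Comparator.comparingLong(p -> p.toFile().lastModified()));
        return cq4.size() <= 1 ? Collections.emptyList() : cq4.subList(0, cq4.size() - 1);
    }

    // Upload a rolled file, remember its name, then remove the local copy.
    static void archive(Path file, RemoteStore remote, Set<String> archivedNames) throws IOException {
        remote.upload(file);
        archivedNames.add(file.getFileName().toString());
        Files.delete(file);
    }

    // Put a previously archived cycle file back exactly where it was,
    // so it can be read again.
    static void restore(Path queueDir, String fileName, RemoteStore remote) throws IOException {
        Path target = queueDir.resolve(fileName);
        if (Files.notExists(target))
            remote.download(fileName, target);
    }
}

Whether Chronicle happily re-reads a restored file is exactly the question asked here; as the follow-up below observes, it appears to, because the cycle is derived from the file name.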

Vachagan Balayan

Jun 8, 2016, 12:29:07 PM
to Chronicle
I'm experimenting with Chronicle to figure out the above. It seems to work even when I remove the older rolled files; if I start reading, it will just read from whatever file is present (it seems the file name is what is used to determine the roll cycle)...

So I run a thread that writes a short text using appender.writeText() every second, and the roll cycle is one minute... what happens is Chronicle does roll every minute, but every file is 80+ MB. Why is that?

I've tried building the queue with .blockSize(1024), assuming that's bytes... but it's still producing 84 MB files...
This is version 4.3.2 (I tried 4.4.4-SNAPSHOT but it fails with a Licensing class-not-found exception).

Could you please describe how I can use space efficiently? I'd like to pack my information as compactly as possible (without compression).
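
For reference, a minimal sketch of the experiment described above (Chronicle Queue 4.x API; the exact builder/appender method names may differ slightly between 4.3.x and later 4.x releases, and the block size here is only an illustrative value):

import net.openhft.chronicle.queue.ExcerptAppender;
import net.openhft.chronicle.queue.RollCycles;
import net.openhft.chronicle.queue.impl.single.SingleChronicleQueue;
import net.openhft.chronicle.queue.impl.single.SingleChronicleQueueBuilder;

public class MinuteRollWriter {
    public static void main(String[] args) throws InterruptedException {
        try (SingleChronicleQueue queue = SingleChronicleQueueBuilder
                .binary("queue-minute")          // directory holding the .cq4 files
                .rollCycle(RollCycles.MINUTELY)  // roll to a new file every minute
                .blockSize(4 << 20)              // 4 MB chunks instead of the large default
                .build()) {
            ExcerptAppender appender = queue.acquireAppender(); // createAppender() on some older 4.x builds
            for (int i = 0; i < 300; i++) {      // ~5 minutes of one write per second
                appender.writeText("event " + i);
                Thread.sleep(1000);
            }
        }
    }
}

As the replies below point out, the 80+ MB the OS reports is largely the mapped/allocated size of the file rather than the data written; du shows the space actually used.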

Rob Austin

Jun 8, 2016, 2:45:56 PM
to java-ch...@googlegroups.com
Which OS are you using?

Just because the OS shows the file as 80+ MB does not mean that Chronicle has written to the whole 80+ MB.

In other words, I think it comes down to how memory-mapped files are represented by the OS rather than how Chronicle is storing its data.


Peter Lawrey

Jun 8, 2016, 3:16:06 PM
to java-ch...@googlegroups.com

Yes, what you suggest is possible in theory. However, instead of downloading the whole file, you could access an individual entry remotely from a machine running Chronicle Engine.
E.g.:

X writes messages.

Messages are copied to Y as a whole file or progressively. X deletes files that have been successfully written to Y.

When X needs an old entry, it looks locally first, then to Y for the copied entry.

Peter.
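
A minimal sketch of that flow, assuming the builder exposes the roll/release callback Peter mentions at the end of the thread (in current 4.x releases this is the StoreFileListener hook; whether 4.3.x has it is an assumption), with uploadToY / fetchFromY as hypothetical stand-ins for however X talks to Y:

import net.openhft.chronicle.queue.impl.StoreFileListener;
import net.openhft.chronicle.queue.impl.single.SingleChronicleQueue;
import net.openhft.chronicle.queue.impl.single.SingleChronicleQueueBuilder;

import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;

public class ArchiveOnRoll {

    // Hypothetical transfer helpers for machine Y (scp, S3, Chronicle Engine, ...).
    static void uploadToY(File rolledFile) { /* copy the file to Y */ }
    static void fetchFromY(String fileName, Path target) { /* copy it back from Y */ }

    public static void main(String[] args) {
        StoreFileListener onRelease = (cycle, file) -> {
            // Called when the queue releases a cycle file, typically after rolling on.
            // A real implementation should check the file is not the current cycle
            // before shipping it to Y and dropping the local copy.
            uploadToY(file);
            file.delete();
        };

        SingleChronicleQueue queue = SingleChronicleQueueBuilder
                .binary("events")
                .storeFileListener(onRelease)
                .build();
        // ... X keeps appending to the queue as normal ...
    }

    // Before reading an old entry on X, make sure its cycle file is back in place.
    static void ensureLocal(Path queueDir, String cycleFileName) {
        Path local = queueDir.resolve(cycleFileName);
        if (Files.notExists(local))
            fetchFromY(cycleFileName, local);
    }
}

Pulling back a whole cycle file is the coarse-grained version of this; the Chronicle Engine route Peter mentions would avoid copying an entire file just to read one entry.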

Vachagan Balayan

Jun 12, 2016, 4:13:57 AM
to Chronicle
Rob, I'm using macOS,
but the production code will definitely run on Linux. Does Linux create files of the size Chronicle actually wrote, or does it behave like this as well?

In this screenshot each of these files has only about 100 KB of actual data... I suppose it's the OS that allocates more space when the file is persisted...
I don't care much about macOS; I just need to know what will happen when it's running on Linux.

If my queue has written about 100 KB in this cycle, will it create a 100 KB file or a bigger one?


Peter Lawrey

Jun 12, 2016, 7:32:37 AM
to java-ch...@googlegroups.com

You can check the amount of space actually used with du -k file. The file itself can use 256 KB for the initial indexes, so if you wrote 100 KB it might use up to 400 KB. You can tune this down if the overhead or block size is too large.

Peter.
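
If that index overhead matters, a hedged sketch of the knobs involved (builder option names as in current 4.x releases; whether all of them exist in 4.3.x, and what the defaults are, is an assumption, and the values are illustrative only):

import net.openhft.chronicle.queue.impl.single.SingleChronicleQueue;
import net.openhft.chronicle.queue.impl.single.SingleChronicleQueueBuilder;

public class SmallFootprintQueue {
    public static void main(String[] args) {
        try (SingleChronicleQueue queue = SingleChronicleQueueBuilder
                .binary("small-queue")
                .blockSize(1 << 20)   // 1 MB chunks
                .indexCount(8 << 10)  // entries per index array (power of 2)
                .indexSpacing(16)     // index every 16th excerpt (power of 2)
                .build()) {
            // write as usual; the fixed index/block overhead per rolled file is now smaller
        }
    }
}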

Rob Austin

Jun 12, 2016, 12:00:11 PM
to java-ch...@googlegroups.com
Also, some OSes, especially Windows, will show the size the memory-mapped file was allocated, not what was actually used. What is usually important is to check the size used, since on Linux that is your constraint when running out of space.

Rob


Vachagan Balayan

Jun 23, 2016, 9:36:33 AM
to Chronicle
Rob, I've deployed my app to an Ubuntu Linux AWS instance. In a nutshell, it collects some data and writes to a Chronicle Queue which rolls, for example, HOURLY;
right after the queue rolls, I always end up with 80 MB files on the hard drive...

Peter, I can see that the current mapped file does not actually consume that much space, but when it rolls, it takes 80 MB of hard drive space...
I understand that this queue was designed for much higher frequency, where 80 MB is not what you end up with when it rolls, but the problem is that I have many different streams of data with very different update frequencies, so some queues might fill up 1 GB in an hour while others might write 20 MB in an hour... And as I'm using it as generic storage for raw event data, I want to make sure it uses the hard drive efficiently (all this data goes to Amazon and is stored for a looong time...).

I can prepare a sample app that will show this, just by creating a queue with a MINUTE roll cycle and filling it with, say, 1 MB of data;
when the roll happens you'll see an 80 MB file (on macOS or Ubuntu Linux)...

If I can do this myself I'd be happy to contribute; just give me some insight into why this is happening and what needs to be done to fix it...

Rob Austin

Jun 23, 2016, 9:41:08 AM
to java-ch...@googlegroups.com

On 23 Jun 2016, at 14:36, Vachagan Balayan <vachagan...@gmail.com> wrote:

I can prepare a sample app that will show this, just by creating a queue with rollup of MINUTE and filling with say 1M data
and when rollup happens you'll see 80M file (on macos or ubuntu linux)...

so when you run, on Linux,

du -h

what file size do you see?

Peter Lawrey

Jun 23, 2016, 11:42:40 AM
to java-ch...@googlegroups.com

You can reduce the chunk size to, say, 1 MB; however, for long-term storage I suggest compressing the files, which should bring them down to around 1/10th of the size.

Vachagan Balayan

Jun 23, 2016, 12:31:19 PM
to Chronicle
Oh thanks, stupid mistake on my part, I messed up the block size...

Regarding compression, I didn't consider it because I thought the data is not repetitive and probably won't compress well... I'll definitely try it...

Is there some compression functionality in Chronicle itself, or shall I compress/decompress externally?
If so, is there any demo code for that?

Thanks in advance

Vachagan Balayan

Jun 23, 2016, 12:56:27 PM
to Chronicle
Thanks Rob, it seems that I'd been configuring the wrong queue instance with the small block size, and the one that was actually writing was using the default block size...

Rob Austin

Jun 23, 2016, 1:06:57 PM
to java-ch...@googlegroups.com
OK, thanks for letting me know.

On 23 Jun 2016, at 17:56, Vachagan Balayan <vachagan...@gmail.com> wrote:

Thanks Rob, it seems that i've been configuring wrong queue instance with the small block size, and the one who actually wrote were going with the default block size...

Vachagan Balayan

Jun 24, 2016, 1:48:18 AM
to Chronicle
I've tried the default Deflater and it gave about 1/2 of the size on best speed (pretty similar on best compression as well),

but this is still pretty far from 1/10th of the size. What would you recommend, Peter? Is there anything in the OpenHFT libs that I could use for compression?

I suppose I could gzip the result after Deflater, but anyway, if there is something already available from you or your awesome team I'd gladly use that :)


Peter Lawrey

Jun 24, 2016, 2:57:50 AM
to java-ch...@googlegroups.com

There is a callback when the file has rolled. You can use this to trigger compression; however, I assume most people use a cron job.

Regards, Peter.
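
A minimal sketch of that compression step using plain java.util.zip, which could be invoked from the roll callback or from a cron-style sweep over everything except the newest .cq4 file (gzipAndRemove and the .cq4.gz naming are just illustrative choices):

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.*;
import java.util.zip.GZIPOutputStream;

public class CompressRolledFiles {

    // Gzip one rolled cycle file and delete the original once the copy is complete.
    static void gzipAndRemove(Path cq4File) throws IOException {
        Path gz = cq4File.resolveSibling(cq4File.getFileName() + ".gz");
        try (InputStream in = Files.newInputStream(cq4File);
             OutputStream out = new GZIPOutputStream(Files.newOutputStream(gz))) {
            byte[] buf = new byte[64 * 1024];
            for (int n; (n = in.read(buf)) != -1; )
                out.write(buf, 0, n);
        }
        Files.delete(cq4File);
    }

    // Cron-style sweep: compress every .cq4 file except the most recently
    // modified one, which may still be the cycle being appended to.
    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args[0]);
        Path newest = null;
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(dir, "*.cq4")) {
            for (Path p : ds)
                if (newest == null || p.toFile().lastModified() > newest.toFile().lastModified())
                    newest = p;
        }
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(dir, "*.cq4")) {
            for (Path p : ds)
                if (!p.equals(newest))
                    gzipAndRemove(p);
        }
    }
}

Note that a compressed file has to be decompressed back to its original .cq4 name before Chronicle can read it again.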
