IMPORTANT: How to keep disk space under control?


Konstantin Erman

Aug 15, 2014, 10:10:55 AM8/15/14
to get...@googlegroups.com
This problem may look a little far-fetched, but in fact it is far more practical and important than it seems.

For production we need lots of VMs on short notice, so we rent them from Windows Azure or Amazon's cloud. To minimize costs, those VMs are as small as reasonably possible, and that includes disk space. So the production environment where the Seq client runs is really limited on disk space, and we must watch it very carefully.

The scenario we absolutely must avoid is one where, due to the nature of the data being processed, the amount of logging increases dramatically, the logs (rolling buffer files) consume the available disk space, and the production machine crashes in the middle of a job for lack of disk space.

When I think about it, the most convenient solution from our (client) point of view would be to specify the maximum amount of disk space the rolling buffer files are allowed to occupy (a quota). When this quota is reached, the Serilog Seq client should do something like delete old (already sent) data and continue, or in the worst case stop using buffer files until more disk space becomes available.

For all practical purposes it is less harmful to lose some logs than to crash a production machine.
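The quota behavior described above could be sketched roughly as follows. This is a minimal illustration in Python (the Seq client itself is .NET); the `.jsnl` buffer-file suffix and the assumption that the oldest files have already been shipped (and are therefore safe to delete) are mine, not guarantees from the Seq client.

```python
import os

def enforce_buffer_quota(directory, quota_bytes, suffix=".jsnl"):
    """Delete the oldest buffer files until their total size fits quota_bytes.

    Assumption: files with the oldest modification times have already been
    shipped to Seq, so losing them is preferable to exhausting the disk.
    The ".jsnl" suffix is an illustrative guess at the buffer naming.
    """
    files = [os.path.join(directory, name)
             for name in os.listdir(directory) if name.endswith(suffix)]
    files.sort(key=os.path.getmtime)  # oldest first
    total = sum(os.path.getsize(f) for f in files)
    deleted = []
    while files and total > quota_bytes:
        victim = files.pop(0)         # sacrifice the oldest file
        total -= os.path.getsize(victim)
        os.remove(victim)
        deleted.append(victim)
    return deleted
```

A watchdog like this could run periodically alongside the logging process as a stopgap until the sink supports a quota natively.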

Do you have something like that? If it is not immediately available, what would be the best way to implement it on our side?

Thank you!
Konstantin

nblum...@nblumhardt.com

Aug 15, 2014, 3:56:03 PM8/15/14
to get...@googlegroups.com
Hi there - indeed an important point. The Serilog FileSink and RollingFileSink both implement a maximum log file size, and RollingFileSink extends this with a cap on the number of rolling files that will be retained. I think the second aspect - the number of files - is already covered by the Seq client's deletion of shipped logs.

The first parameter, however - the maximum file size - is not exposed, although it is present:

https://github.com/continuousit/seq-client/blob/master/src/Seq.Client.Serilog/Client/Serilog/DurableSeqSink.cs#L33

(The maximum file size is currently passed as null.)

I've created a ticket in the seq-client repository to expose this setting. It needs a little care to consider all the implications, ensure binary compatibility, and follow the obsoletion scheme we use. I'll endeavour to address it in the next week or so, but in the meantime rebuilding the package with the parameter set would work for you.

Regards,
Nick

Konstantin Erman

Aug 15, 2014, 5:22:22 PM8/15/14
to get...@googlegroups.com
I'm glad we are on the same page regarding the importance of keeping disk space under control.

Understood about the maximum log file size. We can definitely wait a week to use that property once it is published in an official build.

I need clarification about the number of rolling files, though. First of all, as far as I remember, RollingFileSink uses a hard-coded number of files to retain - two. It would be great to make that configurable as well. But there is a more important aspect. You make it sound as if a rolling buffer file should be deleted as soon as all its data have been sent over to Seq, but in my experience those buffer files stay around until their number exceeds two; only then does the oldest one get deleted. This happens even when all data have certainly been sent to Seq.
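The retention behavior described above (oldest file deleted once more than two buffer files exist) amounts to a count-based trim, which could be sketched like this. Again a Python illustration of the idea only; the file names and `.jsnl` suffix are assumptions.

```python
import os

def trim_retained_files(directory, retained_count=2, suffix=".jsnl"):
    """Keep only the most recent `retained_count` buffer files.

    Mirrors a hard-coded retain-two policy: everything older than the
    newest `retained_count` files is deleted, regardless of total size.
    """
    files = [os.path.join(directory, name)
             for name in os.listdir(directory) if name.endswith(suffix)]
    files.sort(key=os.path.getmtime, reverse=True)  # newest first
    for stale in files[retained_count:]:
        os.remove(stale)
    return files[:retained_count]
```

Note that a count-based limit like this bounds the number of files but not their total size, which is exactly the gap discussed below.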

Let's get back to the original requirement: we need control over the maximum total disk space consumed by the logging services. For instance, we may specify that the buffer files must not occupy more than 1 GB of disk space, regardless of the number of files. Whichever combination of properties you can support to achieve that would be good.

Konstantin

Konstantin Erman

Aug 23, 2014, 1:21:12 AM8/23/14
to get...@googlegroups.com
Nick, I fetched the latest NuGet package with the maximum file size exposed - thank you for that!

At the same time I realized that we still don't have any guaranteed cap on disk usage, simply because we are not guaranteed to have just one buffer file per day! An arbitrary number of buffer files (with an _xxx suffix) may be created, so limiting the size of any one file does not really guarantee anything.

To reiterate the final goal: we don't really care how many buffer files may be created or how large each one can grow. We only care about keeping the total size of those files under control - everything else is a technical detail that is not really important to end users.

Please consider how this can be accommodated.

Konstantin


nblum...@nblumhardt.com

Aug 25, 2014, 5:41:09 PM8/25/14
to get...@googlegroups.com
Hi Konstantin,

Glad you spotted the new version - the change is actually thanks to Jasmin Sehic, who submitted the improvement as a PR. (Somehow I broke GitHub's pull-request tracking for the changes, so it doesn't show up properly on the GitHub site.)

Regarding retained files, yes - the changes prevent runaway buffer growth within a single day (or part of a day, if a file-locking issue causes an intra-day roll), but over time buffer files will accumulate if events cannot be shipped. The durable sink uses a different algorithm for managing retained files (only the most recent two, when shipping is working effectively), so I'm hesitant to complicate it by enabling the RollingFileSink's file retention limit without substantial consideration. I've raised https://github.com/continuousit/seq-client/issues/20 to track this.

Thanks for the feedback - I'll give it more thought, as you suggest.

Regards,
Nick