Re: [boost] [log] Patch: rotating file collector

402 views
Skip to first unread message

Andrey Semashev

unread,
Apr 23, 2013, 8:18:44 AM4/23/13
to bo...@lists.boost.org
On Tue, Apr 23, 2013 at 3:58 PM, Alexander Arhipenko <arhi...@gmail.com>wrote:

>
> I've added implementation of rotating file collector -
> the patch against boost log trunk attached.
> This is not the final version, I'm going at least to finish most of the
> TODOs.
>
> However, it would be nice to receive feedback and questions from
> library author as well as from people interested in this feature.
>
> Andrey, could you please find several minutes to review the attached patch?
>

Thanks for the patch. I didn't look deeply into it yet but from the first
glance it looks like you changed the existing file collector rather than
introduce an alternative one. Am I right? If so, could this be made as a
pure addition rather than modification? I would prefer to keep the current
behavior also available for the users.

Also, I would like to discuss following question with you :
> rotate_file invocation in ~text_file_backend -
> I would prefer to have this call during text_file_backend construction
> phase.
>

I'll have to take a closer look later to answer that but it seems odd to
have to rotate something when nothing have been written yet. The
rotate_file call is intended to place the written file to the storage
governed by the collector, and at construction this file hasn't been
created yet. Could you describe why you need this?

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Andrey Semashev

unread,
Apr 23, 2013, 4:27:43 PM4/23/13
to bo...@lists.boost.org
On Tuesday 23 April 2013 16:18:44 you wrote:
> On Tue, Apr 23, 2013 at 3:58 PM, Alexander Arhipenko
<arhi...@gmail.com>wrote:
> > I've added implementation of rotating file collector -
> > the patch against boost log trunk attached.
> > This is not the final version, I'm going at least to finish most of the
> > TODOs.
> >
> > However, it would be nice to receive feedback and questions from
> > library author as well as from people interested in this feature.
> >
> > Andrey, could you please find several minutes to review the attached
> > patch?
>
> Thanks for the patch. I didn't look deeply into it yet but from the first
> glance it looks like you changed the existing file collector rather than
> introduce an alternative one. Am I right? If so, could this be made as a
> pure addition rather than modification? I would prefer to keep the current
> behavior also available for the users.

I took a closer look now and my first impression was wrong - the original
behavior is still accessible, this is good. However, I have several notes:

1. I would still prefer the interface to be distinct for different types of
collectors, and not based on the file name pattern presence. These are
distinct objects with different behaviour and it should be apparent from the
collector construction what behavior it will implement.

I'm prepared to rename the current file::make_collector method to better
describe the current behavior (I'm open to suggestions on the new name). The
new collector should be created with a different function, also appropriately
named (file::make_stacking_collector?). It's also probably a good idea to
extract collectors and the base file::collector interface to separate headers.

2. The new collector should also be kept in the collectors repository. This
way it is possible to create multiple sinks collecting files in the common
target directory, without different collectors interferring with each other.
Basically, you'll have to move file_collector_hook and
enable_shared_from_this< file_collector > base classes to file_collector_base,
as well as m_pRepository.

This introduces an additional possible issue if the user tries to create
different sinks with different collectors targeting the same directory. This
case should be detected (e.g. by checking the type of the collector in
file_collector_repository::get_collector) and handle it with an exception
(log::setup_error).

3. In rotating_file_collector::scan_for_files, invalid file name pattern
should also be handled with setup_error.

4. I think, it should be possible to avoid the do_rollover method.
file_collector_base::store_file should be moved to file_collector, and there
should be a separate rotating_file_collector::store_file method as well. Any
common subroutines (like checking available free space, erasing old files,
etc.) can be extracted as distinct functions into file_collector_base and
called from store_file methods of the derivatives.

5. You don't seem to rename the collected files when the file pattern contains
date/time (see date_rollover_policy). Is that right? Obviously, you will only
have the date and time point of the rotation and not the file creation.

6. Actually, the date/time support for the new collector seems quite broken.
The current behavior, when the file is originally named by the sink, ensures
that the date/time in the file name corresponds to the first records in the
file. If the date/time is added by the collector, the timestamp will
correspond to the latest written records, which it counterintuitive. The check
in rotating_file_collector::scan_for_files prevents users from setting
date/time in the sink, so basically, it will only be usable if there is only a
file counter placeholder in the file name pattern.

I'm not sure what is the best way to solve this. Maybe it's better to avoid
dealing with date/time placeholders in the collector at all and only work with
file counters. But scan_for_files should be changed so that it is possible to
set a pattern with date/time placeholders in the sink backend. In that case I
think it's reasonable to remove the file name pattern from the file collector
construction and simply always add file counters right before the extension.
That way the file name pattern is only set in the sink backend, which
simplifies the interface a lot.

7. You should probably rebase your patch to the Boost trunk, as the library
has been merged recently. For now it's pretty close to the SourceForge SVN, so
you should not have any problems. I'm going to work in Boost SVN now, so the
SourceForge SVN will get outdated eventually.

> Also, I would like to discuss following question with you :
> > rotate_file invocation in ~text_file_backend -
> > I would prefer to have this call during text_file_backend construction
> > phase.
>
> I'll have to take a closer look later to answer that but it seems odd to
> have to rotate something when nothing have been written yet. The
> rotate_file call is intended to place the written file to the storage
> governed by the collector, and at construction this file hasn't been
> created yet. Could you describe why you need this?

This question remains actual.

Alexander Arhipenko

unread,
Apr 24, 2013, 11:09:37 AM4/24/13
to bo...@lists.boost.org
Hi Andrey,

First of all, thanks for such a detailed review.

Please, find my answers/proposals below.

On Tue, Apr 23, 2013 at 11:27 PM, Andrey Semashev
<andrey....@gmail.com> wrote:
> On Tuesday 23 April 2013 16:18:44 you wrote:
>> On Tue, Apr 23, 2013 at 3:58 PM, Alexander Arhipenko
> <arhi...@gmail.com>wrote:
>> > I've added implementation of rotating file collector -
>> > the patch against boost log trunk attached.
>> [snip]
>> ...
>>
>> Thanks for the patch. I didn't look deeply into it yet but from the first
>> glance it looks like you changed the existing file collector ...
>> [snip]
>> ...
>
> I took a closer look now and my first impression was wrong - the original
> behavior is still accessible, this is good. However, I have several notes:
>
> 1. I would still prefer the interface to be distinct for different types of
> collectors, and not based on the file name pattern presence. These are
> distinct objects with different behaviour and it should be apparent from the
> collector construction what behavior it will implement.
>

Agree to that. I also had an intention to provide separate function for
rotating collector, but decided to expand existing one just to save typing.

> I'm prepared to rename the current file::make_collector method to better
> describe the current behavior (I'm open to suggestions on the new name). The
> new collector should be created with a different function, also appropriately
> named (file::make_stacking_collector?). It's also probably a good idea to
> extract collectors and the base file::collector interface to separate headers.
>

I see 2 options here.
First, have 2 functions: make_collector, make_rotating_collector.
Second, have:
a) enum rotation_method { rotate_sequential, rotate_??? }
(enum collection_method ?)
and
b) pass additional parameter rotation_method to make_collector function

Personally I would prefer second approach.

> 2. The new collector should also be kept in the collectors repository. This
> way it is possible to create multiple sinks collecting files in the common
> target directory, without different collectors interferring with each other.
> Basically, you'll have to move file_collector_hook and
> enable_shared_from_this< file_collector > base classes to file_collector_base,
> as well as m_pRepository.
>
> This introduces an additional possible issue if the user tries to create
> different sinks with different collectors targeting the same directory. This
> case should be detected (e.g. by checking the type of the collector in
> file_collector_repository::get_collector) and handle it with an exception
> (log::setup_error).
>

The reason I haven't added rotating collector to repository
is the "possible issue" you are describing:
several collectors targeting one directory but different files.
In current implementation,
it's impossible to have more than 1 collector for one directory.
It is okay when the collector responsibilities are:
* files copying to target directory
* old files removal (according to max size and/or min free space setup)
...but does not work for rotating collector case.
To elaborate more, consider use case:
* /var/log - is a directory where all the system log files reside
* foo.log - foo service log file
* bar.log - bar service log file
Both foo and bar log files have to be rotated, e. g.:
foo.log (latest), foo.log.1 (previous), ..., foo.log.N (earliest).
Thus, user have to setup 2 collectors and will got log::setup_error.

The easiest way to make described use case working -
do not store rotating collectors in repository.
The more sophisticated approach would be:
use additional criteria for collector identification (file_name pattern?).

Please, comment on this.

> 3. In rotating_file_collector::scan_for_files, invalid file name pattern
> should also be handled with setup_error.
>

Agree, will do.

> 4. I think, it should be possible to avoid the do_rollover method.
> file_collector_base::store_file should be moved to file_collector, and there
> should be a separate rotating_file_collector::store_file method as well. Any
> common subroutines (like checking available free space, erasing old files,
> etc.) can be extracted as distinct functions into file_collector_base and
> called from store_file methods of the derivatives.
>

I don't see any advantages of eliminating do_rollover method
in favour of common subroutine methods (as well as vice versa ;) ).
I don't mind to implement approach you are describing,
but maybe you can provide additional arguments why do we need this change?

> 5. You don't seem to rename the collected files when the file pattern contains
> date/time (see date_rollover_policy). Is that right? Obviously, you will only
> have the date and time point of the rotation and not the file creation.
>

You're right that collected files are not renamed for date/time patterns.
I don't have any strong opinion about whether file name should indicate
file creation or file rotation time.
However, gut feeling inclines me to the second case (file rotation time).

Let's consider existing logging components.
E.g. Python's logging module provides TimedRotatingFileHandler to deal with
time based rotation.
This handler indicates file creation time in the rotated file name.
But the files are rotated periodically (file sizes could differ).

On the opposite, I have following real life use case.
User logins to system and wants to monitor events for day 23.
There are 2 log files in the log directory: file.log and file.log.2013-04-21.
If filename indicates creation time, then it's hard to guess
what file to look at: records could be found in both files.
If 2013-04-21 indicates rotation time,
then it's clear that the record for day 23 is in file.log.

It will be interesting to hear opinions from you.

> 6. Actually, the date/time support for the new collector seems quite broken.
> The current behavior, when the file is originally named by the sink, ensures
> that the date/time in the file name corresponds to the first records in the
> file. If the date/time is added by the collector, the timestamp will
> correspond to the latest written records, which it counterintuitive. The check
> in rotating_file_collector::scan_for_files prevents users from setting
> date/time in the sink, so basically, it will only be usable if there is only a
> file counter placeholder in the file name pattern.
>
> I'm not sure what is the best way to solve this. Maybe it's better to avoid
> dealing with date/time placeholders in the collector at all and only work with
> file counters. But scan_for_files should be changed so that it is possible to
> set a pattern with date/time placeholders in the sink backend. In that case I
> think it's reasonable to remove the file name pattern from the file collector
> construction and simply always add file counters right before the extension.
> That way the file name pattern is only set in the sink backend, which
> simplifies the interface a lot.
>

Idea of dealing with counters only seems reasonable to me.
I need to analyze this a little bit more...
Please, also consider my comments regarding point 5.

> 7. You should probably rebase your patch to the Boost trunk, as the library
> has been merged recently. For now it's pretty close to the SourceForge SVN, so
> you should not have any problems. I'm going to work in Boost SVN now, so the
> SourceForge SVN will get outdated eventually.
>

Will do.

>> Also, I would like to discuss following question with you :
>> > rotate_file invocation in ~text_file_backend -
>> > I would prefer to have this call during text_file_backend construction
>> > phase.
>>
>> I'll have to take a closer look later to answer that but it seems odd to
>> have to rotate something when nothing have been written yet. The
>> rotate_file call is intended to place the written file to the storage
>> governed by the collector, and at construction this file hasn't been
>> created yet. Could you describe why you need this?
>
> This question remains actual.
>
>

I'll try to explain why I'm requesting this.
Point 1 - when rotating collector is used,
file name that is currently being written is somewhat constant.
But when application shut down, it becames something like
file_name.log + ".1" (counter name pattern case).
I would like to have file name the same for both cases:
when service is running and when not running.
Also, if user collects log files to e.g. /var/log/archive folder but
the log file itself resides in /var/log directory,
then latest application log should be seeked in 2 place which is annoying.

Point 2 - user changed collector configuration (decreased min free space).
After starting application he would expect that some files will be removed,
but this will occur only on next rotation phase.
So, I would re-phrase:
* it's preferable to *try* rotate_file
during text_file_backend construction phase
* there is no need to rotate_file during text_file_backend destruction


Andrey, I will wait for you response and continue working on patch.

Thanks and Regards

Andrey Semashev

unread,
May 1, 2013, 8:53:20 AM5/1/13
to bo...@lists.boost.org
Ok, I finally got to this, sorry for the delay.

On Wednesday 24 April 2013 18:09:37 Alexander Arhipenko wrote:
>
> > I'm prepared to rename the current file::make_collector method to better
> > describe the current behavior (I'm open to suggestions on the new name).
> > The new collector should be created with a different function, also
> > appropriately named (file::make_stacking_collector?). It's also probably
> > a good idea to extract collectors and the base file::collector interface
> > to separate headers.
> I see 2 options here.
> First, have 2 functions: make_collector, make_rotating_collector.
> Second, have:
> a) enum rotation_method { rotate_sequential, rotate_??? }
> (enum collection_method ?)
> and
> b) pass additional parameter rotation_method to make_collector function
>
> Personally I would prefer second approach.

I would rather prefer different factory functions for different collectors so
that the parameters are independent and have the appropriate meaning for each
collector type.

I would also prefer another name for the new collector since I'm afraid
"rotation" term would become too overloaded since it is used not with regard
to rotating files in the sink backend. "Stacking" collector maybe? "Rolling"
collector?
Well, the rule of the thumb the library follows is that only one process is
managing the target directory. This means:

a. If some other process manages the storage, you just don't set up the
collector and leave it to this process to collect the rotated log files from
the sink. In this case we do nothing in Boost.Log to help it.

b. If Boost.Log is managing the storage, you set up the collector and mostly
rely on the fact that noone else is messing around with the storage. This
means that we don't try to detect sudden file deletions or additions to the
storage, although should that happen the collector should handle the situation
gracefully. This means that if there are two services (foo and bar) running,
they should be collecting their log files to separate directories. This is
fairly common in Linux, at least, since many services would just create
directories within /var/log and put their logs there. If this is not
acceptable, then go for case (a).

Now, in (b) you still have to implement Boost.Log so that it is possible to
create multiple sinks collecting to the same target directory, and this is
solved by the collectors repository. The problem of having different
collectors targeting the same directory can be solved by simply prohibiting
such configuration, because the desired behavior is not clear in this case. So
if you put the new collector to the repository and add a check for its type
before adding, you should be fine.

> > 4. I think, it should be possible to avoid the do_rollover method.
> > file_collector_base::store_file should be moved to file_collector, and
> > there should be a separate rotating_file_collector::store_file method as
> > well. Any common subroutines (like checking available free space, erasing
> > old files, etc.) can be extracted as distinct functions into
> > file_collector_base and called from store_file methods of the
> > derivatives.
>
> I don't see any advantages of eliminating do_rollover method
> in favour of common subroutine methods (as well as vice versa ;) ).
> I don't mind to implement approach you are describing,
> but maybe you can provide additional arguments why do we need this change?

This is a design clarity question. The do_rollover method is specific to the
new collector, and by exposing it to the base class you're leaking these
specifics to the base class, which is supposed to be oblivious to it. Also,
the method is virtual and you already have a virtual table dispatch when
store_file is called. Not that the performance really matters in this case,
but in general this counts as a design flaw to me.

> > 5. You don't seem to rename the collected files when the file pattern
> > contains date/time (see date_rollover_policy). Is that right? Obviously,
> > you will only have the date and time point of the rotation and not the
> > file creation.
> You're right that collected files are not renamed for date/time patterns.
> I don't have any strong opinion about whether file name should indicate
> file creation or file rotation time.
> However, gut feeling inclines me to the second case (file rotation time).

I think, most, if not all systems I had experience with (including proprietary
developments) put file creation time in the file name. And this is quite
expected, I think. I mean, if you're looking for a file that might contain
logs for a specific time point, you would typically assume the file name to
indicate the beginning of the time frame and not the end of it.

Of course, we could put both time stamps into the file name, but that's
another story.

> On the opposite, I have following real life use case.
> User logins to system and wants to monitor events for day 23.
> There are 2 log files in the log directory: file.log and
> file.log.2013-04-21. If filename indicates creation time, then it's hard to
> guess
> what file to look at: records could be found in both files.
> If 2013-04-21 indicates rotation time,
> then it's clear that the record for day 23 is in file.log.
>
> It will be interesting to hear opinions from you.

If I have a login date (20013-04-23), I would look into file.log.2013-04-21
first. Call it whatever you like, but my intuition tells me that the file name
indicates the date of the file creation. :) Also, the last modification time
might give you a hint on when the file ends, although it depends on your file
manager how apparent this attribute is.

You can avoid the confusion if you name the active file with its creation time
as well (which is what Boost.Log does, currently). So if you have
file.log.2013-04-21 and file.log.2013-04-25 then it's pretty obvoius which
file contains the records you seek.

> >> Also, I would like to discuss following question with you :
> >> > rotate_file invocation in ~text_file_backend -
> >> > I would prefer to have this call during text_file_backend construction
> >> > phase.
> >>
> >> I'll have to take a closer look later to answer that but it seems odd to
> >> have to rotate something when nothing have been written yet. The
> >> rotate_file call is intended to place the written file to the storage
> >> governed by the collector, and at construction this file hasn't been
> >> created yet. Could you describe why you need this?
> >
> > This question remains actual.
>
> I'll try to explain why I'm requesting this.
> Point 1 - when rotating collector is used,
> file name that is currently being written is somewhat constant.
> But when application shut down, it becames something like
> file_name.log + ".1" (counter name pattern case).

This behavior is chosen to guarantee there are no files piling up in the
directory where the sink backend writes them. Imagine there are timestamps in
the file name, so the next time you start your application the file names will
be different and the old files will be left unmanaged. So the files are
collected when the sink is destroyed and then managed by the collector after
the application restarts and runs scan_for_files.

Another problem can happen even with file counters only, if file appending is
not enabled (the default). After scan_for_files doesn't find the last file's
counter, the sink will reconstruct the file name that is the same as it was in
the previous run and will overwrite the old file. This kind of gotcha I would
like to avoid.

> I would like to have file name the same for both cases:
> when service is running and when not running.
> Also, if user collects log files to e.g. /var/log/archive folder but
> the log file itself resides in /var/log directory,
> then latest application log should be seeked in 2 place which is annoying.

Well, right now I don't have ideas on how to support this without interferring
with my previous points. You see, when file collectors are used, the file that
the sink writes is a temporary, really.

Actually, I think if you open it on Windows while the sink attempts to rotate
it, you can screw up the rotation and loose some records while the sink tries
to recreate the file. I haven't tried this but if I'm right then you're
basically not recommended to open the file while it's being written.

> Point 2 - user changed collector configuration (decreased min free space).
> After starting application he would expect that some files will be removed,
> but this will occur only on next rotation phase.
> So, I would re-phrase:
> * it's preferable to *try* rotate_file
> during text_file_backend construction phase

You're not talking about file rotation here but rather about checking the
target storage constraints when initializing the sink. This might be
worthwhile although, as I said before, we don't expect someone else doing
something with the storage behind the scene. Besides, the next time a file is
stored, the collector will check the constraints anyway _before_ storing the
file, so the additional check on the sink construction isn't really needed.

> * there is no need to rotate_file during text_file_backend destruction

See my previous points.

Rob Stewart

unread,
May 2, 2013, 8:09:13 AM5/2/13
to bo...@lists.boost.org
On May 1, 2013, at 8:53 AM, Andrey Semashev <andrey....@gmail.com> wrote:

> On Wednesday 24 April 2013 18:09:37 Alexander Arhipenko wrote:

[snip]

> I would also prefer another name for the new collector since I'm afraid "rotation" term would become too overloaded since it is used not with regard to rotating files in the sink backend. "Stacking" collector maybe? "Rolling" collector?

I'm ignorant of you library terminology, but "rotation" is appropriate if you maintain a set of numbered files, so a given file's number changes as it ages. In that case, the contents rotates through the numbered filenames.

OTOH, if the files are dated, then they are not renamed when a new file is created, so the log output spills over into a new file when the date changes. In that case, "rollover" makes sense.

>>> 5. You don't seem to rename the collected files when the file pattern contains date/time (see date_rollover_policy). Is that right? Obviously, you will only have the date and time point of the rotation and not the file creation.
>> You're right that collected files are not renamed for date/time patterns. I don't have any strong opinion about whether file name should indicate file creation or file rotation time.
>> However, gut feeling inclines me to the second case (file rotation time).
>
> I think, most, if not all systems I had experience with (including proprietary developments) put file creation time in the file name. And this is quite expected, I think. I mean, if you're looking for a file that might contain logs for a specific time point, you would typically assume the file name to
> indicate the beginning of the time frame and not the end of it.
>
> Of course, we could put both time stamps into the file name, but that's another story.

I agree that the creation date/time makes the most sense. Both creation and close dates/times would be ideal, but with a set of creation-dates/times-only files, the end date/time can be inferred from that on the next file.

>> On the opposite, I have following real life use case.
>> User logins to system and wants to monitor events for day 23.
>> There are 2 log files in the log directory: file.log and
>> file.log.2013-04-21. If filename indicates creation time, then it's hard to guess what file to look at: records could be found in both files.
>> If 2013-04-21 indicates rotation time, then it's clear that the record for day 23 is in file.log.
>>
>> It will be interesting to hear opinions from you.
>
> If I have a login date (20013-04-23), I would look into file.log.2013-04-21 first. Call it whatever you like, but my intuition tells me that the file name indicates the date of the file creation. :) Also, the last modification time might give you a hint on when the file ends, although it depends on your file
> manager how apparent this attribute is.

If the current file has no date/time in the name, that's a problem. An entry for the 23rd can only be located by searching. I'd start by running head on the most recent file. If the first entry is past the desired time, then I'd know to look in the file with 21 in the name.

> You can avoid the confusion if you name the active file with its creation time as well (which is what Boost.Log does, currently). So if you have
> file.log.2013-04-21 and file.log.2013-04-25 then it's pretty obvoius which file contains the records you seek.

Right

> Actually, I think if you open it on Windows while the sink attempts to rotate it, you can screw up the rotation and loose some records while the sink tries to recreate the file. I haven't tried this but if I'm right then you're basically not recommended to open the file while it's being written.

You must allow for following the current file or reading previous files, at all times, as much as possible. What's not possible just takes more effort. :)


___
Rob

(Sent from my portable computation engine)

Andrey Semashev

unread,
May 2, 2013, 8:54:29 AM5/2/13
to bo...@lists.boost.org
On Thursday 02 May 2013 08:09:13 Rob Stewart wrote:
> On May 1, 2013, at 8:53 AM, Andrey Semashev <andrey....@gmail.com>
wrote:
>
> > I would also prefer another name for the new collector since I'm afraid
> > "rotation" term would become too overloaded since it is used not with
> > regard to rotating files in the sink backend. "Stacking" collector
> > maybe? "Rolling" collector?
> I'm ignorant of you library terminology, but "rotation" is appropriate if
> you maintain a set of numbered files, so a given file's number changes as
> it ages. In that case, the contents rotates through the numbered filenames.
>
> OTOH, if the files are dated, then they are not renamed when a new file is
> created, so the log output spills over into a new file when the date
> changes. In that case, "rollover" makes sense.

The library supports both date/time stamped files and file counters. And when
the file is rotated, it is simply put into the storage, and a new file name is
generated (with the new timestamp and/or the next counter value). This process
of switching the actively written file is called "rotation" (at least, that's
the meaning I put into the term). English is not my native language, so there
may be a better word for it.

What Alexander suggested is a different mechanism of storing the files, when
the previous files are renamed (their counters are incremented) and the new
file is named so that it has the least counter value. So the storage behaves
kind of like a stack of files, that's why I also suggested "stacking"
collector. But in any case, whether or not this new mechanism is used, file
rotation will still take place, so I think it is a good idea to name the
collector differently to avoid the confusion.

> > Actually, I think if you open it on Windows while the sink attempts to
> > rotate it, you can screw up the rotation and loose some records while the
> > sink tries to recreate the file. I haven't tried this but if I'm right
> > then you're basically not recommended to open the file while it's being
> > written.
> You must allow for following the current file or reading previous files, at
> all times, as much as possible. What's not possible just takes more effort.
> :)

I agree. That shouldn't be a problem on systems when an open file can be
deleted or renamed, which covers all UNIX-like systems, if I'm not mistaken. I
have always been annoyed with the different behavior of Windows in this
regard, especially in conjunction with some antivirus services that love to
poke their noses into files. In fact, that was one reason I tried to minimize
the number of file system operations that need to be performed on a file
rotation.

Anyway, I'll need to try this next time I'm running Windows to see the impact.

Alexander Arhipenko

unread,
May 6, 2013, 8:09:21 AM5/6/13
to bo...@lists.boost.org
Hi Andrey,

Again, thanks for you comments.

I've fixed several items mentioned in code review.
To be precise: 1, 3, 4.
1: factory function name, make_rollover_collector was chosen.
3:
> 3. In rotating_file_collector::scan_for_files, invalid file name pattern
> should also be handled with setup_error.
>
4:
> 4. I think, it should be possible to avoid the do_rollover method.
> file_collector_base::store_file should be moved to file_collector, and there
> should be a separate rotating_file_collector::store_file method as well. Any
> common subroutines (like checking available free space, erasing old files,
> etc.) can be extracted as distinct functions into file_collector_base and
> called from store_file methods of the derivatives.
>

Also, patch was rebased against boost trunk.

Regarding points 5 and 6: after analyzing this case,
I've dropped support for datetime pattern in rollover collector.
Please, find some explanations below.
First, using last write time for the file name pattern was not adopted
unambiguously by community. Also, there is a counter example in existing
component - python TimedRotatingFileHandler, that uses creation timestamp.
Second, if user wants to use file creation timestamp in rotated file name,
she can use existing make_collector function for that purpose.
You've also mentioned an option of using datetime placeholder in file sink:

> ... Maybe it's better to avoid
> dealing with date/time placeholders in the collector at all and only work with
> file counters. But scan_for_files should be changed so that it is possible to
> set a pattern with date/time placeholders in the sink backend.

I liked that option, but existing implementation should be enhanced to
support this: currently during rollover phase files are simply renamed.
I.e., fileN -> fileN+1, fileN-1 -> fileN, ..., file -> file1.
Thus, datetime placeholder will be invalid after rollover -
it will point to the wrong date.
If you think this option is very nice to have - please tell.


There are still 2 points we need to reach an agreement on:

a. > > 2. The new collector should also be kept in the collectors repository.

b. > * there is no need to rotate_file during text_file_backend destruction.

In regards to a. - collectors repository.
I agree that only one file_collector should serve one target directory.
Regarding rollover collectors - they are different beasts.
They do not serve the whole directory, but only file series in that directory.
So, I propose 2 alternatives:
a) use target directory and file name pattern as collector "identity"
or
b) do not put rollover collector into repository

Now quoting some of your arguments against:

> b. If Boost.Log is managing the storage, you set up the collector and mostly
> rely on the fact that noone else is messing around with the storage. This
> means that we don't try to detect sudden file deletions or additions to the
> storage, although should that happen the collector should handle the situation
> gracefully. This means that if there are two services (foo and bar) running,
> they should be collecting their log files to separate directories. This is
> fairly common in Linux, at least, since many services would just create
> directories within /var/log and put their logs there. If this is not
> acceptable, then go for case (a).
> [snip]
> ... The problem of having different
> collectors targeting the same directory can be solved by simply prohibiting
> such configuration, because the desired behavior is not clear in this case.

Agree that Boost.Log should not detect any "external" file deletions or
additions. However, I don't see any problems of having 2 _rollover_
collectors targeting one directory.
Each of them will monitor own file series and manipulates (add, remove)
only on that series.
The only 1 point of "race condition" we can imagine -
that is minimum free space requirements.
However, this is still not a problem: even if both collectors perform
rotation at one point in time and minimum free space requirement is not met,
each collector will just remove its file(s),
freeing a little bit more of free space.

Also, some words regarding Linux example.
On my Ubuntu 12.04 there are both files that reside in separate directories:
/var/log/samba, /var/log/mongodb, /var/log/nginx
as well as in root log directory:
/var/log/syslog.log*, /var/log/dpkg.log*, /var/log/auth.log*.
That means that having files from different services in one directory
is existing practice. We need to give user an ability to cover this cases
in Boost.Log library.


In regards to b. - rotate_file during text_file_backend destruction.
You've provided a good argument that I can't argue against:
>[snip]
> ... Imagine there are timestamps in the file name,
> so the next time you start your application the file names will
> be different and the old files will be left unmanaged. So the files are
> collected when the sink is destroyed and then managed by the collector after
> the application restarts and runs scan_for_files.

However, I still think that having active log file with persistent name
will be a good benefit for many cases
(one of the examples - /var/log/*.log* files mentioned above).
After some thinking, I've came to conclusion that when
* file sink does not have placeholders (i.e., log file name is static),
* active log resides in collection directory (this point is not mandatory)
- then we can safely do not rotate such file during sink destruction.
So, I propose to implement the behavior described above, i.e.:
log file is not rotated during sink destruction when it's filename is static.

Regards
rollover_collector.patch

Andrey Semashev

unread,
May 10, 2013, 1:55:44 PM5/10/13
to bo...@lists.boost.org
On Mon, May 6, 2013 at 4:09 PM, Alexander Arhipenko <arhi...@gmail.com>wrote:

> Hi Andrey,
>
> Regarding points 5 and 6: after analyzing this case,
> I've dropped support for datetime pattern in rollover collector.
> You've also mentioned an option of using datetime placeholder in file sink:
>
> > ... Maybe it's better to avoid
> > dealing with date/time placeholders in the collector at all and only
> work with
> > file counters. But scan_for_files should be changed so that it is
> possible to
> > set a pattern with date/time placeholders in the sink backend.
>
> I liked that option, but existing implementation should be enhanced to
> support this: currently during rollover phase files are simply renamed.
> I.e., fileN -> fileN+1, fileN-1 -> fileN, ..., file -> file1.
> Thus, datetime placeholder will be invalid after rollover -
> it will point to the wrong date.
> If you think this option is very nice to have - please tell.
>

Why, the date will stay relevant even after the collector renames the file
(assuming it doesn't change the date). Say, you have these files:

file-2013-05-01.3.log
file-2013-05-03.2.log
file-2013-05-05.1.log

and you rotate file-2013-05-07.log. After rotation you should have:

file-2013-05-01.4.log
file-2013-05-03.3.log
file-2013-05-05.2.log
file-2013-05-07.1.log

So the dates in the file names still indicate the beginning of the time
frame of the files.

I admit, however, that it may look strange to use both dates and file
counters with rollover collector. Dates are useful for lookup purposes and
it shouldn't be difficult to support them. On the other hand the current
collector already supports them. I have no strong opinion on whether to
support dates in rollover collector or not, I'll leave it to you.

In regards to a. - collectors repository.
> I agree that only one file_collector should serve one target directory.
> Regarding rollover collectors - they are different beasts.
>

No, they are not, actually. Let's see why.


> Agree that Boost.Log should not detect any "external" file deletions or
> additions. However, I don't see any problems of having 2 _rollover_
> collectors targeting one directory.
> Each of them will monitor own file series and manipulates (add, remove)
> only on that series.
> The only 1 point of "race condition" we can imagine -
> that is minimum free space requirements.
> However, this is still not a problem: even if both collectors perform
> rotation at one point in time and minimum free space requirement is not
> met,
> each collector will just remove its file(s),
> freeing a little bit more of free space.
>

Nope, that case is still broken. Suppose you have two sinks that have two
rollover collectors with the common target directory. Disregarding thread
contention, the library will feed records to sinks sequentially, always in
the same order. When the space limit for the target directory is reached,
only one of the collectors will typically invoke old files deletion, and
the other collector will typically not.

You see, storage limits are set for the target storage and not for the sink
or set of files. While maintaining these limits the collector should
typically ensure that the files are being deleted in a fair manner. Always
deleting the oldest file (regardless of the sink that file came from)
maintains this behavior, assuming that all sinks rotate files at a more or
less similar rate. This approach also works with rollover collector, IMHO,
and that is why I think only one collector should be managing any given
storage.

I think I can see why you want to make the collectors separate - this would
simplify managing the queues of files from different sinks. You can
implement multiple queues of files within a single collector, which will
still exclusively manage its target directory. You can also implement
additional limits for each file queue, if you like, but then you'll have to
come up with some rules that will choose the files to delete in a fair
manner, and this is also much easier to implement within a single collector.

Also, some words regarding Linux example.
> On my Ubuntu 12.04 there are both files that reside in separate
> directories:
> /var/log/samba, /var/log/mongodb, /var/log/nginx
> as well as in root log directory:
> /var/log/syslog.log*, /var/log/dpkg.log*, /var/log/auth.log*.
> That means that having files from different services in one directory
> is existing practice. We need to give user an ability to cover this cases
> in Boost.Log library.
>

Yes, some services allow that, although I suspect most if not all of the
files in /var/log are written by syslog daemon. My point was that storing
files in service-specific directories is a fairly common practice already.


> However, I still think that having active log file with persistent name
> will be a good benefit for many cases
> (one of the examples - /var/log/*.log* files mentioned above).
> After some thinking, I've came to conclusion that when
> * file sink does not have placeholders (i.e., log file name is static),
> * active log resides in collection directory (this point is not mandatory)
> - then we can safely do not rotate such file during sink destruction.
> So, I propose to implement the behavior described above, i.e.:
> log file is not rotated during sink destruction when it's filename is
> static.
>

I don't think it is safe to implicitly rely on configuration in this case.
After all, directories and file name patterns may change after application
restart. And such subtle change in behavior without being explicitly
requested by user also doesn't look like a good idea.

I don't really like this, but we could add an option to the sink (a flag,
rotate_on_destroy), which by default would be true to maintain the current
behavior. If the user knows what he's doing, he would be able to disable
rotation on destructor.
Reply all
Reply to author
Forward
0 new messages