Hi Janis,
On Thursday, June 22, 2017 at 2:30:53 AM UTC+2, Janis Papanagnou wrote:
[...]
> It's not quite clear to me what exactly you intend to do with the filter file;
> does it contain _specific_ records, or is "<some time expression>" a substitute
> for a _pattern_ that shall match any expression?
Uhm, let me be more explicit with an example:
<file.ref>
TAG: 01 [2017.05.00 23:32:42 432] Warning ...
TAG: 02 [2017.06.02 18:32:42 436] Warning ...
TAG: 01 [2017.07.03 15:33:42 472] Warning ...
<file.log>
TAG: 01 [2017.05.00 23:32:42 432] Warning ...
TAG: 02 [2017.06.02 18:32:42 436] Warning ...
TAG: 01 [2017.07.03 15:33:42 472] Warning ...
TAG: 05 [2017.07.04 02:33:43 433] Error ...
TAG: 06 [2017.08.04 13:53:43 432] Error ...
TAG: 05 [2017.09.04 02:32:52 441] Error ...
TAG: 05 [2017.10.04 02:33:34 644] Error ...
<file.filter>
# This expression shall have an explanatory comment.
TAG: 05 [<some time expression>] Error ...
<file.filtered.out>
TAG: 01 [2017.05.00 23:32:42 432] Warning ...
TAG: 02 [2017.06.02 18:32:42 436] Warning ...
TAG: 01 [2017.07.03 15:33:42 472] Warning ...
TAG: 06 [2017.08.04 13:53:43 432] Error ...
<file.diff>
TAG: 06 [2017.08.04 13:53:43 432] Error ...
> And (I suspect so) does your
> filter file contain more than one filter expression?
Correct. The likelihood that a comment line in file.filter matches a string in file.log is practically zero, so we can safely treat a comment as a regex that never matches.
> It's the easiest to use if you have only patterns (i.e. no comments, or if
> no comments match the data) in the filter file.
>
> Say, you have in your filter file this pattern...
>
> TAG: 05 \[.*\] Error ...
>
> (escapes are necessary due to the regexp meta-characters used) then you can
> use this construct for comparison...
So each line is a regex, correct?
>
> diff <( grep -vf file.filter file.log ) file.ref
That is not bad at all.
>
> If your filter file is more complex you need to extend that (e.g. to comment
> filtering/skipping).
Well, indeed the process is a bit more complex than that.
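For the comment skipping, I guess something along these lines would do. This is only a sketch: filter_log is a name I made up, I assume comments start with '#', and reading the pattern list from stdin via '-f -' is something I only know to work with GNU grep. Blank lines are dropped too, because an empty pattern would match every log line and the result would be empty.

```shell
# Sketch: print the lines of a log that match none of the filter patterns,
# ignoring comment lines and blank lines in the filter file.
# filter_log is a hypothetical helper name, not an existing tool.
filter_log() {
    # $1 = filter file (one regex per line, '#' starts a comment line)
    # $2 = log file
    # First grep: drop comment lines and blank lines from the filter file.
    # Second grep: read the remaining patterns from stdin (-f -) and print
    # only the log lines that match none of them (-v).
    grep -Ev '^[[:space:]]*(#|$)' "$1" | grep -vf - "$2"
}

# Usage with the file names from the example (bash, process substitution):
#   diff <(filter_log file.filter file.log) file.ref
```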
Sometimes my file.log matches file.ref perfectly, except for the time stamps. How could I 'ignore' this type of difference? I can of course 'sed away' the time part of both files; I am just wondering whether diff itself has some mechanism to handle that.
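To be concrete about the 'sed away' variant, this is roughly what I have in mind. It is a sketch under the assumption that the time stamp is always the first [...] field on a line; strip_time and the [TIME] placeholder are names I made up.

```shell
# Sketch: replace the first bracketed field on each line with a fixed
# placeholder, so that lines differing only in the time stamp compare equal.
# strip_time is a hypothetical helper name, not an existing tool.
strip_time() {
    # \[      a literal opening bracket
    # [^]]*   any run of characters other than ']'
    # ]       the closing bracket
    # Without the g flag only the first match on each line is replaced.
    # Reads the named files, or stdin when no file is given.
    sed 's/\[[^]]*]/[TIME]/' "$@"
}

# Usage with the file names from the example (bash, process substitution):
#   diff <(strip_time file.log) <(strip_time file.ref)
```

The drawback is that the diff output then shows the placeholder instead of the real time stamps.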
Thanks a lot for your feedback,
Al