
diffing files' content with 'waving' expressions


alb

Jun 21, 2017, 6:54:55 PM
Hi everyone,

I have a logfile that I need to compare to a reference logfile, but I need to
ignore a set of expressions in the diffing process. Let's see with an example:

<file.ref>
TAG: 01 [2017.05.00 23:32:42 432] Warning ...
TAG: 02 [2017.06.02 18:32:42 436] Warning ...
TAG: 01 [2017.07.03 15:33:42 472] Warning ...

<file.log>
TAG: 01 [2017.05.00 23:32:42 432] Warning ...
TAG: 02 [2017.06.02 18:32:42 436] Warning ...
TAG: 01 [2017.07.03 15:33:42 472] Warning ...
TAG: 05 [2017.07.04 02:33:43 433] Error ...

<file.filter>
# This expression shall have an explanatory comment.
TAG: 05 [<some time expression>] Error ...

Now, if I apply the filter to file.log *before* diffing the reference, I would
have no differences.

I've looked up grep -f, not sure if it's really the best option.
Any pointer/suggestion/comment is appreciated!

Al

--
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?

Janis Papanagnou

Jun 21, 2017, 8:30:53 PM
On 22.06.2017 00:54, alb wrote:
> Hi everyone,
>
> I have a logfile that I need to compare to a reference logfile, but I need to
> ignore a set of expressions in the diffing process. Let's see with an example:
>
> <file.ref>
> TAG: 01 [2017.05.00 23:32:42 432] Warning ...
> TAG: 02 [2017.06.02 18:32:42 436] Warning ...
> TAG: 01 [2017.07.03 15:33:42 472] Warning ...
>
> <file.log>
> TAG: 01 [2017.05.00 23:32:42 432] Warning ...
> TAG: 02 [2017.06.02 18:32:42 436] Warning ...
> TAG: 01 [2017.07.03 15:33:42 472] Warning ...
> TAG: 05 [2017.07.04 02:33:43 433] Error ...
>
> <file.filter>
> # This expression shall have an explanatory comment.
> TAG: 05 [<some time expression>] Error ...
>
> Now, if I apply the filter to file.log *before* diffing the reference, I would
> have no differences.

It's not quite clear to me what exactly you intend to do with the filter file;
does it contain _specific_ records, or is "<some time expression>" a substitute
for a _pattern_ that shall match any expression? And (I suspect so) does your
filter file contain more than one filter expression?

>
> I've looked up grep -f, not sure if it's really the best option.
> Any pointer/suggestion/comment is appreciated!

It's the easiest to use if you have only patterns (i.e. no comments, or if
no comments match the data) in the filter file.

Say, you have in your filter file this pattern...

TAG: 05 \[.*\] Error ...

(escapes are necessary due to the regexp meta-characters used) then you can
use this construct for comparison...

diff <( grep -vf file.filter file.log ) file.ref

If your filter file is more complex you need to extend that (e.g. to comment
filtering/skipping).
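
The comment skipping mentioned above could look roughly like the sketch below. It assumes that comments start with '#' (as in the example file.filter) and that blank lines should be dropped too, since an empty pattern matches every line and would make grep -v discard the whole log. The temporary directory and the "patterns" file name are made up for illustration, and the output is piped into "diff -" instead of using process substitution so that it also runs in a plain POSIX sh.

```shell
#!/bin/sh
# Sketch only: sample data is copied from the thread, file names are illustrative.
tmp=$(mktemp -d)
cd "$tmp" || exit 1

printf '%s\n' \
  'TAG: 01 [2017.05.00 23:32:42 432] Warning ...' \
  'TAG: 02 [2017.06.02 18:32:42 436] Warning ...' \
  'TAG: 01 [2017.07.03 15:33:42 472] Warning ...' > file.ref

cp file.ref file.log
printf '%s\n' 'TAG: 05 [2017.07.04 02:33:43 433] Error ...' >> file.log

# Filter file with a comment and a blank line.
# Only '[' is escaped here; ']' is an ordinary character outside a bracket expression.
cat > file.filter <<'EOF'
# This expression shall have an explanatory comment.

TAG: 05 \[.*] Error
EOF

# Keep only real patterns: drop '#' comments and empty/whitespace-only lines.
grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' file.filter > patterns

# Remove the filtered lines from the log, then compare with the reference.
grep -v -f patterns file.log | diff - file.ref
status=$?
echo "diff exit status: $status"
```

With the sample data the filtered log equals the reference, so diff prints nothing and returns 0.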

Janis

>
> Al
>

al.b...@gmail.com

Jun 22, 2017, 8:37:27 AM
Hi Janis,

On Thursday, June 22, 2017 at 2:30:53 AM UTC+2, Janis Papanagnou wrote:
[]
> It's not quite clear to me what exactly you intend to do with the filter file;
> does it contain _specific_ records, or is "<some time expression>" a substitute
> for a _pattern_ that shall match any expression?

uhm, let me be more explicit with an example:

<file.ref>
TAG: 01 [2017.05.00 23:32:42 432] Warning ...
TAG: 02 [2017.06.02 18:32:42 436] Warning ...
TAG: 01 [2017.07.03 15:33:42 472] Warning ...

<file.log>
TAG: 01 [2017.05.00 23:32:42 432] Warning ...
TAG: 02 [2017.06.02 18:32:42 436] Warning ...
TAG: 01 [2017.07.03 15:33:42 472] Warning ...
TAG: 05 [2017.07.04 02:33:43 433] Error ...
TAG: 06 [2017.08.04 13:53:43 432] Error ...
TAG: 05 [2017.09.04 02:32:52 441] Error ...
TAG: 05 [2017.10.04 02:33:34 644] Error ...

<file.filter>
# This expression shall have an explanatory comment.
TAG: 05 [<some time expression>] Error ...

<file.filtered.out>
TAG: 01 [2017.05.00 23:32:42 432] Warning ...
TAG: 02 [2017.06.02 18:32:42 436] Warning ...
TAG: 01 [2017.07.03 15:33:42 472] Warning ...
TAG: 06 [2017.08.04 13:53:43 432] Error ...

<file.diff>
TAG: 06 [2017.08.04 13:53:43 432] Error ...

> And (I suspect so) does your
> filter file contain more than one filter expression?

Correct. The likelihood that a comment in file.filter matches a line in
file.log is nearly zero, so it is safe to treat comments as regexes that
never match.

> It's the easiest to use if you have only patterns (i.e. no comments, or if
> no comments match the data) in the filter file.
>
> Say, you have in your filter file this pattern...
>
> TAG: 05 \[.*\] Error ...
>
> (escapes are necessary due to the regexp meta-characters used) then you can
> use this construct for comparison...

So each line is a regex, correct?
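
For what it's worth, grep -f reads one pattern per line and treats each as a basic regular expression unless -F (fixed strings) or -E is given. The small check below illustrates why the escaping matters: a log line used verbatim as a regex does not even match itself, because the bracketed timestamp is read as a bracket expression that stands for a single character. The variable names are only for illustration.

```shell
#!/bin/sh
# Sketch only: one log line from the thread, tried as three kinds of pattern.
line='TAG: 05 [2017.07.04 02:33:43 433] Error ...'

# 1. Verbatim as a regex: "[2017... 433]" becomes a one-character bracket expression.
regex_hits=$(printf '%s\n' "$line" | grep -c -e "$line")

# 2. Verbatim as a fixed string (-F): no character is special.
fixed_hits=$(printf '%s\n' "$line" | grep -c -F -e "$line")

# 3. As a regex with the opening bracket escaped and the timestamp wildcarded.
escaped_hits=$(printf '%s\n' "$line" | grep -c -e 'TAG: 05 \[.*] Error')

echo "regex=$regex_hits fixed=$fixed_hits escaped=$escaped_hits"
```

So a filter file of literal lines would need grep -F, while a filter file of patterns needs the brackets escaped.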

>
> diff <( grep -vf file.filter file.log ) file.ref

That is not bad at all.

>
> If your filter file is more complex you need to extend that (e.g. to comment
> filtering/skipping).

Well, indeed the process is a bit more complex than that.
Sometimes my file.log matches file.ref perfectly, except for the timestamps.
How can I 'ignore' this type of difference? I can indeed 'sed away' the time
part of the file; I was just wondering whether diff has some mechanism to
handle that.
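
A minimal sketch of the 'sed away' approach, assuming the timestamp is always the first [...] group on a line: both files get the same placeholder before diffing. The "[TIME]" placeholder, the *.masked file names and the altered timestamps in file.log are made up for illustration.

```shell
#!/bin/sh
# Sketch only: mask the bracketed timestamp on both sides, then diff.
tmp=$(mktemp -d)
cd "$tmp" || exit 1

printf '%s\n' \
  'TAG: 01 [2017.05.00 23:32:42 432] Warning ...' \
  'TAG: 02 [2017.06.02 18:32:42 436] Warning ...' \
  'TAG: 01 [2017.07.03 15:33:42 472] Warning ...' > file.ref

# Same messages as the reference, but with different (invented) timestamps.
printf '%s\n' \
  'TAG: 01 [2017.05.01 09:12:01 118] Warning ...' \
  'TAG: 02 [2017.06.03 10:45:17 902] Warning ...' \
  'TAG: 01 [2017.07.04 11:03:59 007] Warning ...' > file.log

# Replace the first "[...]" on each line with a fixed placeholder.
mask='s/\[[^]]*]/[TIME]/'
sed "$mask" file.ref > ref.masked
sed "$mask" file.log > log.masked

diff log.masked ref.masked
status=$?
echo "diff exit status: $status"
```

As far as diff itself goes, GNU diff has -I RE (--ignore-matching-lines), but it ignores a change only when all of its lines match RE. It works on whole lines, so it cannot mask just the timestamp part of a line.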

Thanks a lot for your feedback,

Al