CJK Characters from windows_eventlog2


Mark Sutyak

Jul 27, 2021, 12:28:47 PM
to Fluentd Google Group
When using the option "read_all_channels true" and pulling data from the more obscure event logs, there are many instances on some systems where the message description cannot be found. This somehow results in winevt appending Chinese/Japanese/Korean (CJK) characters to the end of the line: sometimes one, sometimes dozens. The characters seem random, since they form no discernible message.

Here is an example:

   "The message ID for the desired message could not be found.\r\n瑶㈯㐮眯湩癥"

Locating the original event in Windows shows no additional characters after the carriage return.

Any help pointing me toward the core script/code that handles this extraction would be appreciated.

Thanks,

Mark

Mark Sutyak

Aug 2, 2021, 4:47:31 PM
to Fluentd Google Group
Here are additional details, possibly enough to submit a bug report, though I'm not sure where to file one.

A coworker took a quick look at this in different character encodings and realized these characters are not CJK at all, but rather a botched Unicode interpretation. I took that lead and wrote a quick piece of code to iterate through each character, cast it to an integer, and then convert that integer from its two Unicode (UTF-16) bytes to UTF-8. When I do that, I see actual readable data, but not data that is supposed to be there.
It's as if the C code in the winevt plugin has read too far in memory and started concatenating bytes it's not supposed to be grabbing.
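To check the coworker's theory, the minimal Python sketch below reproduces the symptom only; it is not the winevt plugin's actual code path. It takes ordinary ASCII bytes and decodes them as UTF-16LE, which is roughly what happens if code expecting a wide (UTF-16) string is pointed at an 8-bit buffer. The function name `misread_as_utf16le` is hypothetical.

```python
def misread_as_utf16le(ascii_text: str) -> str:
    """Reproduce the symptom: read 8-bit ASCII bytes as if they were UTF-16LE.

    Each pair of ASCII bytes becomes one 16-bit code unit. Because printable
    ASCII bytes are 0x20-0x7E, the resulting code points often fall in the
    CJK blocks, which is why the garbage looks like Chinese/Japanese/Korean.
    """
    raw = ascii_text.encode("ascii")
    if len(raw) % 2:
        raw += b"\x00"  # pad so the last byte still forms a full code unit
    return raw.decode("utf-16-le")


if __name__ == "__main__":
    garbled = misread_as_utf16le("IsCurrent='true'/>")
    print(garbled)
    print([ord(ch) for ch in garbled])
```

Running it on `IsCurrent='true'/>` and on `</BookmarkList>` yields the same integers that appear in the output table in this post, which supports the idea that plain text from some other buffer is being read as UTF-16.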

Here is the input text: "Description":"The resource loader failed to find MUI file.
㐳㈸‧獉畃牲湥㵴琧畲❥㸯਍⼼潂歯慭歲楌瑳>>䏐涔倀者䈼潯浫牡䱫獩㹴਍†䈼潯浫牡桃湡敮㵬洧捩潲潳瑦眭湩潤獷欭牥敮⵬湰⽰潣普杩牵瑡潩❮删捥牯䥤㵤㈧㐱✱䤠䍳牵敲瑮✽牴敵⼧ാ㰊䈯潯浫牡䱫獩㹴㸀",

Something that stands out is the "3", which is the "end of text" (ETX) control character. For now, I can add a check for it in my code to mark where the valid text ends.
You can see that once it gets past the "10" (the line feed character), everything goes a bit wonky: each remaining integer holds two ASCII bytes, so it prints as two readable characters.
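The ETX check mentioned above could look like the following minimal Python sketch. It assumes the description arrives as an already-decoded string and that an ETX (U+0003) always precedes the garbage, which holds for the "MUI file" sample; it is unknown whether every affected event contains one. The function name `trim_at_etx` is hypothetical.

```python
from __future__ import annotations

ETX = "\x03"  # U+0003 END OF TEXT, seen just before the garbage in the sample


def trim_at_etx(description: str) -> tuple[str, bool]:
    """Return (text before the first ETX, whether anything was cut off).

    If no ETX is present, the description is returned unchanged.
    """
    head, sep, _tail = description.partition(ETX)
    return head, bool(sep)


if __name__ == "__main__":
    sample = "The resource loader failed to find MUI file.\x03\n\u3433\u3238"
    print(trim_at_etx(sample))
```

Returning the flag alongside the text makes it easy to count or log how often events are affected, instead of silently dropping the tail.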

Here is a snippet of the output:
Columns are: integer value - character(s)

77 - M
85 - U
73 - I
32 - (space)
102 - f
105 - i
108 - l
101 - e
46 - .
3 - (ETX, end of text)
10 - (LF, line feed)

13363 - 34
12856 - 82
8231 - ' (apostrophe followed by a space)
29513 - Is
30019 - Cu
29298 - rr
28261 - en
15732 - t=
29735 - 't
30066 - ru
10085 - e'
15919 - />
2573 - (CR LF)

12092 - </
28482 - Bo
27503 - ok
24941 - ma
27506 - rk
26956 - Li
29811 - st
62 - >
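The code that produced the table isn't shown, so here is a minimal Python sketch of the same idea: iterate over the characters, print each code point, and print its two UTF-16LE bytes as text. Dropping zero bytes is an assumption, chosen so that ordinary ASCII characters print as themselves, as they do in the table. `dump_code_units` is a hypothetical name.

```python
from __future__ import annotations


def dump_code_units(text: str) -> list[tuple[int, str]]:
    """Pair each character's code point with its UTF-16LE bytes read as Latin-1.

    Zero bytes are dropped so ASCII characters (high byte 0x00) show as
    themselves, while misread characters split back into two ASCII bytes.
    """
    rows = []
    for ch in text:
        # 2 bytes per BMP character (4 above U+FFFF); "surrogatepass" keeps
        # lone surrogates from raising, since misread data may contain them
        raw = ch.encode("utf-16-le", "surrogatepass")
        rows.append((ord(ch), raw.decode("latin-1").replace("\x00", "")))
    return rows


if __name__ == "__main__":
    # The first few garbled characters of the "MUI file" sample
    for value, chars in dump_code_units("\u3433\u3238\u2027\u7349\u7543"):
        print(f"{value} - {chars}")
```

This prints `13363 - 34`, `12856 - 82`, and so on, matching the rows above.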