iRODS audit plugin: ELK Stack works as expected but multiple errors in logstash


mauro....@cmcc.it

Nov 2, 2020, 3:35:15 PM11/2/20
to iRODS-Chat

Hi All,

I'm using the iRODS audit plugin together with the ELK stack and they are working as expected.
However, I noticed that disk usage is growing due to some big files saved in /var/log/logstash.

These files contain a lot of errors similar to the following one:

[2020-11-02T20:28:32,069][ERROR][logstash.codecs.json     ] JSON parse error, original data now in message field {:error=>#<LogStash::Json::ParserError: Unexpected character ('\' (code 92)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
 at [Source: (String)"\u0000Sw\xB1\u0000\u0000\u0004\xA7__BEGIN_JSON__{\"proxy_auth_info_ppid\": \"0\", \"pid\": \"73213\", \"time_stamp\": \"1604348912054\", \"user_user_other_info_user_modify\": \"\", \"hostname\": \"icat02.cmcc.scc\", \"std_string\": \"native\", \"rule_name\": \"audit_pep_auth_agent_auth_request_pre\", \"user_auth_info_flag\": \"0\", \"user_user_other_info_user_info\": \"\", \"proxy_user_other_info_user_info\": \"\", \"proxy_rods_zone\": \"idas\", \"user_sys_uid\": \"0\", \"auth_scheme\": \"native"[truncated 881 chars]; line: 1, column: 2]>, :data=>"\\u0000Sw\\xB1\\u0000\\u0000\\u0004\\xA7__BEGIN_JSON__{\\\"proxy_auth_info_ppid\\\": \\\"0\\\", \\\"pid\\\": \\\"73213\\\", \\\"time_stamp\\\": \\\"1604348912054\\\", \\\"user_user_other_info_user_modify\\\": \\\"\\\", \\\"hostname\\\": \\\"icat02.cmcc.scc\\\", \\\"std_string\\\": \\\"native\\\", \\\"rule_name\\\": \\\"audit_pep_auth_agent_auth_request_pre\\\", \\\"user_auth_info_flag\\\": \\\"0\\\", \\\"user_user_other_info_user_info\\\": \\\"\\\", \\\"proxy_user_other_info_user_info\\\": \\\"\\\", \\\"proxy_rods_zone\\\": \\\"idas\\\", \\\"user_sys_uid\\\": \\\"0\\\", \\\"auth_scheme\\\": \\\"native\\\", \\\"proxy_auth_info_host\\\": \\\"\\\", \\\"zone_name\\\": \\\"\\\", \\\"digest\\\": \\\"\\\", \\\"client_addr\\\": \\\"10.0.1.76\\\", \\\"proxy_auth_info_auth_flag\\\": \\\"0\\\", \\\"proxy_user_name\\\": \\\"rods\\\", \\\"proxy_auth_info_auth_scheme\\\": \\\"\\\", \\\"proxy_auth_info_auth_str\\\": \\\"\\\", \\\"proxy_auth_info_flag\\\": \\\"0\\\", \\\"proxy_sys_uid\\\": \\\"0\\\", \\\"proxy_user_other_info_user_comments\\\": \\\"\\\", \\\"proxy_user_other_info_user_create\\\": \\\"\\\", \\\"user_user_name\\\": \\\"rods\\\", \\\"proxy_user_other_info_user_modify\\\": \\\"\\\", \\\"user_user_type\\\": \\\"\\\", \\\"user_auth_info_auth_scheme\\\": \\\"\\\", \\\"proxy_user_type\\\": \\\"\\\", \\\"user_name\\\": \\\"\\\", \\\"user_auth_info_auth_flag\\\": \\\"0\\\", \\\"user_auth_info_auth_str\\\": \\\"\\\", \\\"user_auth_info_host\\\": \\\"\\\", \\\"user_auth_info_ppid\\\": \\\"0\\\", \\\"user_rods_zone\\\": \\\"idas\\\", \\\"user_user_other_info_user_comments\\\": \\\"\\\", \\\"user_user_other_info_user_create\\\": \\\"\\\", \\\"std_string_ptr\\\": \\\"\\\"}__END_JSON__"}
[2020-11-02T20:28:32,069][WARN ][logstash.codecs.json     ] Received an event that has a different character encoding than you configured. {:text=>"\\u0000Sw\\xA1y__BEGIN_JSON__{\\\"action\\\": \\\"END\\\", \\\"pid\\\": \\\"73213\\\", \\\"time_stamp\\\": \\\"1604348912067\\\", \\\"hostname\\\": \\\"icat02.cmcc.scc\\\"}__END_JSON__", :expected_charset=>"UTF-8"}
[2020-11-02T20:28:32,069][ERROR][logstash.codecs.json     ] JSON parse error, original data now in message field {:error=>#<LogStash::Json::ParserError: Unexpected character ('\' (code 92)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
 at [Source: (String)"\u0000Sw\xA1y__BEGIN_JSON__{\"action\": \"END\", \"pid\": \"73213\", \"time_stamp\": \"1604348912067\", \"hostname\": \"icat02.cmcc.scc\"}__END_JSON__"; line: 1, column: 2]>, :data=>"\\u0000Sw\\xA1y__BEGIN_JSON__{\\\"action\\\": \\\"END\\\", \\\"pid\\\": \\\"73213\\\", \\\"time_stamp\\\": \\\"1604348912067\\\", \\\"hostname\\\": \\\"icat02.cmcc.scc\\\"}__END_JSON__"}

Could you please help me understand where the error is and how I can fix it?

Thank you,
Mauro

Terrell Russell

Nov 2, 2020, 4:01:05 PM11/2/20
to irod...@googlegroups.com
It appears to be some garbage data in the message itself...

This is where we're building that string...

That's probably always been there - we haven't changed the format since our first release.
Happy to have other eyes help find what's going on.

Terrell





Mauro Tridici

Nov 3, 2020, 3:11:25 AM11/3/20
to irod...@googlegroups.com
Hello Terrell,

thank you for your feedback.
I’m trying to understand the error and the code in the .cpp file involved.

If I’m not wrong, msg_str should start with “__BEGIN_JSON__”,

std::string msg_str = std::string("__BEGIN_JSON__") + std::string(tmp_buf) + std::string("__END_JSON__");

but in the logstash log I see a prefix like "\u0000Sw\xB1\u0000\u0000\u0004\xA7” in front of it, and I don’t know what I can do to remove it.

Have a great day,
Mauro

mauro....@cmcc.it

Dec 18, 2020, 6:59:55 AM12/18/20
to iRODS-Chat
Good morning,

any news about how to fix this issue?
There are so many error messages and they are filling the available disk space...

If it can help, I found something in the logstash configuration file (inside the Docker container) that could be related to this issue.

[...]
    # Remove AMQP header and footer information from message
    #ruby {
    #    code => "event.set('message', event.get('message').sub(/.*__BEGIN_JSON__/, ''))
    #             event.set('message', event.get('message').sub(/__END_JSON__.*/, ''))"
    #}

    if "_jsonparsefailure" in [tags] {
        mutate {
                  gsub => [ "message", "[\\]","" ]
                  gsub => [ "message", ".*__BEGIN_JSON__", ""]
                  gsub => [ "message", "__END_JSON__", ""]

        }
        mutate { remove_tag => [ "tags", "_jsonparsefailure" ] }
        json { source => "message" }

    }

    # Parse the JSON message
    json {
        source       => "message"
        remove_field => ["message"]
    }
[...]


Thank you in advance,
Mauro

dmoore.renci

Jan 4, 2021, 10:10:30 AM1/4/21
to iRODS-Chat
hi Mauro,
  Apart from everything else, I believe the problem of disk space filling up could be solved by rotating the logstash log files.
  With regard to the JSON parse errors, I wonder if your system possibly has any data object or collection names containing characters that would interfere with proper JSON formatting, particularly quotes or backslashes?
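  For illustration only (a contrived example), a double quote in an object name would produce a payload that no longer parses:

    # contrived example: a double quote inside obj_path breaks the JSON
    echo '{"obj_path": "/tempZone/home/rods/test"file"}' | jq .
    # jq fails to parse this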
  Let us know anything you may have customized in your iRODS system (rule bases, ongoing delayed jobs, loaded plugins) or auditing setup!
best,
Daniel Moore
Applications Engineer
iRODS Consortium

Mauro Tridici

Jan 5, 2021, 5:34:27 AM1/5/21
to irod...@googlegroups.com
Hi Daniel,

thank you for taking care of my issue and for your suggestion. I created a cron job to delete the logstash log file periodically (every 3 hours).
The file size grows very fast; it can reach about 20GB in a very short time.
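A minimal sketch of that kind of cron entry (the log file name is an assumption here, adjust it to your installation; truncating instead of deleting lets logstash keep writing to the same file):

    # run every 3 hours; /var/log/logstash/logstash-plain.log is assumed
    0 */3 * * * truncate -s 0 /var/log/logstash/logstash-plain.log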

Anyway, I can confirm that no data object or collection names contain quotes or backslashes.

The iRODS system configuration is almost the default one (if I understand your question correctly).
I will send you, in a separate private message, the content of the /etc/irods folder (configuration files + .r and .re files + auditing setup).
In addition, I should mention that there is also a quota rule running periodically.

Many thanks for any suggestions you can provide.
Regards,
Mauro

Coposky, Jason Matthew

Jan 5, 2021, 7:29:05 AM1/5/21
to irod...@googlegroups.com
Mauro,

You can tune which dynamic policy enforcement points are emitted as messages with the configuration:

"pep_regex_to_match" : "audit_.*"

This can be used to adjust how much information is sent to the Elastic stack.  By default, every possible dynamic PEP is serialized and sent.

For instance, to send just the API dynamic PEPs you could set it to:

"pep_regex_to_match" : "audit_pep_api_.*"

You can reference the full list here:


Thanks,

------
Jason Coposky
Executive Director, iRODS Consortium
RENCI at the University of North Carolina at Chapel Hill
w: (919)445-9675
m: (919)522-0517
jas...@renci.org




mauro....@cmcc.it

Jan 7, 2021, 4:15:21 AM1/7/21
to iRODS-Chat
Hello Jason,

many thanks for your suggestion.
I tried to simplify and reduce the audit PEP regular expression, but it didn't solve the issue.
Daniel reproduced the issue in his environment and is trying to fix it.

Thank you,
Mauro

Brett Hartley

Aug 2, 2021, 9:57:37 AM8/2/21
to iRODS-Chat
Hi Mauro, Daniel,

What was the solution to the JSON parse errors? I've just hit the same issue in a test zone with no extra rules, and only one object: /tempZone/home/rods/testfile

Thanks,
Brett

Terrell Russell

Aug 4, 2021, 10:28:14 PM8/4/21
to irod...@googlegroups.com

Venustiano

Oct 9, 2021, 4:59:03 PM10/9/21
to iRODS-Chat
We have observed that the plugin not only produces invalid JSON but also creates a key/field "buf" (when a file is uploaded using iput), with the content of the file added as the value of "buf".

For example, given a file (attached) containing 


error_message
"error: limit is  top 500000  SELECT TOP 1000000 dec, ra, psfMag_g, psfMag_u, Photoz.z FROM Galaxy, Photoz WHERE Photoz.objID = Galaxy.objID "


logstash produces

logstash           | [2021-10-09T19:54:01,829][WARN ][logstash.codecs.json     ][main][e9bc8284bcc6075cb06bf135426d2f5c59727b5956e4f3fd3cac8c097f0e525f] Received an event that has a different character encoding than you configured. {:text=>"\\u0000Sw\\xB1\\u0000\\u0000\\u0006>__BEGIN_JSON__{\\\"proxy_user_other_info_user_modify\\\": \\\"\\\", \\\"user_auth_info_host\\\": \\\"\\\", \\\"proxy_user_name\\\": \\\"user\\\", \\\"dataIncluded\\\": \\\"\\\", \\\"time_stamp\\\": \\\"1633809241826\\\", \\\"proxy_sys_uid\\\": \\\"0\\\", \\\"rule_name\\\": \\\"audit_pep_api_data_obj_put_pre\\\", \\\"hostname\\\": \\\"rdms-irods\\\", \\\"proxy_auth_info_ppid\\\": \\\"0\\\", \\\"auth_scheme\\\": \\\"native\\\", \\\"std_string\\\": \\\"api_instance\\\", \\\"proxy_auth_info_auth_scheme\\\": \\\"\\\", \\\"pid\\\": \\\"769\\\", \\\"user_auth_info_auth_str\\\": \\\"\\\", \\\"client_addr\\\": \\\"10.3.17.1\\\", \\\"proxy_auth_info_flag\\\": \\\"0\\\", \\\"proxy_auth_info_auth_flag\\\": \\\"3\\\", \\\"proxy_auth_info_auth_str\\\": \\\"\\\", \\\"proxy_auth_info_host\\\": \\\"\\\", \\\"proxy_rods_zone\\\": \\\"rugZone\\\", \\\"user_sys_uid\\\": \\\"0\\\", \\\"proxy_user_other_info_user_comments\\\": \\\"\\\", \\\"user_auth_info_auth_flag\\\": \\\"3\\\", \\\"user_auth_info_auth_scheme\\\": \\\"\\\", \\\"user_user_other_info_user_create\\\": \\\"\\\", \\\"user_user_other_info_user_comments\\\": \\\"\\\", \\\"proxy_user_other_info_user_create\\\": \\\"\\\", \\\"opr_type\\\": \\\"1\\\", \\\"user_rods_zone\\\": \\\"rugZone\\\", \\\"proxy_user_other_info_user_info\\\": \\\"\\\", \\\"len\\\": \\\"159\\\", \\\"user_user_type\\\": \\\"\\\", \\\"proxy_user_type\\\": \\\"\\\", \\\"ERROR\\\": \\\"[portalOprOut**] not supported\\\", \\\"user_auth_info_flag\\\": \\\"0\\\", \\\"user_auth_info_ppid\\\": \\\"0\\\", \\\"user_user_name\\\": \\\"user\\\", \\\"user_user_other_info_user_info\\\": \\\"\\\", \\\"user_user_other_info_user_modify\\\": \\\"\\\", \\\"create_mode\\\": \\\"0\\\", \\\"dataSize\\\": \\\"159\\\", \\\"dataType\\\": \\\"generic\\\", \\\"data_size\\\": \\\"159\\\", \\\"defRescName\\\": \\\"rootResc\\\", \\\"num_threads\\\": \\\"0\\\", \\\"obj_path\\\": \\\"/rugZone/home/user/SDSS_1M.csv\\\", \\\"offset\\\": \\\"0\\\", \\\"open_flags\\\": \\\"2\\\", \\\"buf\\\": \\\"error_message\\\\r\\\\n\\\\\\\"error: limit is  top 500000  SELECT TOP 1000000 dec, ra, psfMag_g, psfMag_u, Photoz.z FROM Galaxy, Photoz WHERE Photoz.objID = Galaxy.objID \\\\\\\"\\\\r\\\\n\\\"}__END_JSON__", :expected_charset=>"UTF-8"}


In addition, the audit_pep_api_data_obj_put_pre PEP is not ingested into Elasticsearch.

Has anyone observed similar behaviour?

As a workaround, we exclude the buf field using the following code in the logstash conf:

        mutate {
                  gsub => [ "message", "[\\]","" ]
                  gsub => [ "message", ".*__BEGIN_JSON__", ""]
                  gsub => [ "message", "__END_JSON__", ""]
                  gsub => [ "message", ", \"buf\":(.*?)\"(, |\})","\2"]
        }

We have also observed that the content of binary files is ingested into Elasticsearch (in the buf field). Why? This is problematic for huge files.

Venus
SDSS_1M.csv

Terrell Russell

Oct 12, 2021, 9:26:43 AM10/12/21
to irod...@googlegroups.com
Venus,

Agreed - I've created this issue to remove the `buf` (and potentially other parameters) from being sent to the auditing infrastructure...


Terrell



Rudi Broekhuizen

Dec 28, 2022, 4:22:02 AM12/28/22
to iRODS-Chat
Hi, I receive messages like this in my RabbitMQ queue using iRODS 4.3.0 with the audit plugin installed:

b'\x00Sw\xb1\x00\x00\x02\r{"@timestamp":1672216601627,"attriCnt":"0","continueInx":"0","continueInx__2":"0","hostname":"irods","maxRows":"256","options":"0","pid":1804,"rowCnt":"0","rowOffset":"0","rule_name":"audit_pep_database_gen_query_post","select_401":"1","select_407":"1","select_411":"1","select_412":"1","select_413":"1","select_415":"1","select_419":"1","select_420":"1","select_421":"1","select_423":"1","std_string":"postgres","std_string_ptr":"","totalRowCount":"0","where_403":"=\'file.0171\'","where_501":"=\'/tempZone/home/rods/test.25\'"}'

I run into two issues:
- Every message starts with b'\x00Sw\xb1\x00\x00\x02\r and ends with a single quote
- The JSON part is not valid JSON; single quote characters do not need to be escaped

cat message.json | jq .
parse error: Invalid escape at line 1, column 753 (
"=\'file.0171\'")

I expect the plugin to produce valid JSON lines.
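As a stopgap, something along these lines strips the framing bytes and the invalid escapes before handing the payload to jq (an ad-hoc sketch, not a real fix):

    # drop the invalid \' escapes and anything in front of the first '{',
    # then re-check the result with jq
    sed -e "s/\\\\'/'/g" -e 's/^[^{]*{/{/' message.json | jq .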



Alan King

Jan 4, 2023, 10:44:22 AM1/4/23
to irod...@googlegroups.com
Hi,

The audit plugin is currently undergoing a major rewrite which will address this very issue. The fixes should be included in the next release of the audit plugin for the 4.3 server.

You can read more about that investigation here as well: https://github.com/irods/irods_rule_engine_plugin_audit_amqp/issues/101

In the meantime, you can check out the ELK stack demo we use for training, which has recently been updated to accommodate AMQP 1.0: https://github.com/irods/contrib/tree/main/irods_audit_elk_stack Please note the "CAVEAT EMPTOR", but this could be a jumping-off point for implementing a solution.

If these are not addressing your particular concern, please file an issue here: https://github.com/irods/irods_rule_engine_plugin_audit_amqp/issues



--
Alan King
Senior Software Developer | iRODS Consortium