"field [uid] already exists" jamming up Logstash pipeline on new SO install


Kevin Branch

Jul 4, 2020, 12:42:43 AM7/4/20
to securit...@googlegroups.com
Hi,

I have a very new standalone SO system, package-installed on a fresh Ubuntu 16.04 server.
After installation it was initially pumping plenty of events, but hours later nothing new would show up in Kibana unless I restarted Logstash, and even then I would only get a short spurt of events in Kibana before the pipeline stopped again, apparently because of these errors that keep repeating in logstash.log:

[2020-07-04T04:20:32,313][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 500 ({"type"=>"exception", "reason"=>"java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: field [uid] already exists", "caused_by"=>{"type"=>"illegal_argument_exception", "reason"=>"java.lang.IllegalArgumentException: field [uid] already exists", "caused_by"=>{"type"=>"illegal_argument_exception", "reason"=>"field [uid] already exists"}}, "header"=>{"processor_type"=>"rename"}})

[2020-07-04T04:20:32,313][INFO ][logstash.outputs.elasticsearch] Retrying individual bulk actions that failed or were rejected by the previous bulk request. {:count=>1}
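For what it's worth, the "processor_type"=>"rename" header suggests the ingest pipeline's rename processor is refusing to rename a field onto a uid field that is already present in the document. A rough Python analogue of that failure mode (a sketch of the behavior, not the actual Elasticsearch code):

```python
def rename_field(doc: dict, field: str, target: str) -> dict:
    """Rough analogue of the ingest 'rename' processor: it refuses to
    rename onto a target field that already exists in the document."""
    if target in doc:
        raise ValueError(f"field [{target}] already exists")
    doc[target] = doc.pop(field)
    return doc

# Renaming onto a fresh target works:
print(rename_field({"id": "C0ZaPQ"}, "id", "uid"))   # {'uid': 'C0ZaPQ'}

# ...but if the document already carries a 'uid', the processor bails out,
# which is what the 500 response above is reporting:
try:
    rename_field({"id": "x", "uid": "C0ZaPQ"}, "id", "uid")
except ValueError as e:
    print(e)   # field [uid] already exists
```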


I even reran sosetup, but the system continues to behave the same way and throw the same Logstash complaints.

Looking at sostat, the only thing that stands out to me is the syslog-ng drops related to the logstash destination, which makes sense since Logstash is blocking the pipeline.

Short of reinstalling Ubuntu 16.04 server itself, is there something else I can do to get a cleaner rebuild of SO than just rerunning sosetup alone?

Please advise.  I am unsure where to turn next.

Thanks,
Kevin


sostat.txt

Kevin Branch

Jul 4, 2020, 1:15:49 PM7/4/20
to securit...@googlegroups.com
I was able to trace it down to bro_ssl.  If I comment out the "source(s_bro_ssl);" line in syslog-ng.conf and restart Logstash, then things get flowing.  Zero bro_ssl records were making it to Kibana, though I am unsure why Logstash gets jammed up on them.  Every time I restarted Logstash, a burst of records would make it through until a bro_ssl record was encountered, at which point the Logstash pipeline would grind to a halt.

Here's the most recent record in my /nsm/bro/spool/logger/ssl.log
{"ts":"2020-07-04T16:11:44.436863Z","uid":"C0ZaPQKzQcZauSjn4","id.orig_h":"172.16.81.226","id.orig_p":53098,"id.resp_h":"172.217.14.202","id.resp_p":443,"version":"TLSv13","cipher":"TLS_AES_128_GCM_SHA256","curve":"x25519","server_name":"addons-pa.clients6.google.com","resumed":true,"established":true}
Does it look in any way malformed to you?
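At least the line parses as valid JSON; a quick sanity check in Python (the string below is the record pasted above):

```python
import json

# The ssl.log line from /nsm/bro/spool/logger/ssl.log, verbatim:
record = (
    '{"ts":"2020-07-04T16:11:44.436863Z","uid":"C0ZaPQKzQcZauSjn4",'
    '"id.orig_h":"172.16.81.226","id.orig_p":53098,'
    '"id.resp_h":"172.217.14.202","id.resp_p":443,'
    '"version":"TLSv13","cipher":"TLS_AES_128_GCM_SHA256","curve":"x25519",'
    '"server_name":"addons-pa.clients6.google.com",'
    '"resumed":true,"established":true}'
)

data = json.loads(record)   # would raise if the line were malformed
print(data["uid"])          # C0ZaPQKzQcZauSjn4
```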

I have no files in /etc/logstash/custom so I assume everything is standard on the Logstash config side.

Kevin

Kevin Branch

Jul 6, 2020, 10:33:05 PM7/6/20
to securit...@googlegroups.com, Ken Fredericksen
Well, I finally gave up and clean-slated the system.  Rerunning sosetup, even repeatedly, did not resolve the issue, but pulling out all the SO packages/files/directories let me start over cleanly without reinstalling Ubuntu 16.04 itself, which would have been hard since the system is in a different country than I am.  After that, I apt installed the SO package set again, sosetup ran nicely, and Logstash did not jam up...  until some log entries emerged from Apache2 that were parsed such that the srcip field was populated with a non-IP value.  That caused the Logstash geoip processor to balk at the value and halt the entire Logstash pipeline.  Is this intended behavior?  It seems that a field value that is not valid for geoip lookups should not be grounds for stopping the pipeline entirely.  Here is an example log entry that does that:

Example log in /var/log/apache2/other_vhosts_access.log

nsm.domain.ca:81 nsm.domain.ca - - [06/Jul/2020:22:48:35 +0000] "OPTIONS * HTTP/1.0\" 200 126 "-" "Apache/2.4.18 (Ubuntu) OpenSSL/1.0.2g (internal dummy connection)  

gets parsed as follows:

**Phase 1: Completed pre-decoding.
       full event: 'nsm.domain.ca:81 nsm.domain.ca - - [06/Jul/2020:22:48:35 +0000] "OPTIONS * HTTP/1.0\" 200 126 "-" "Apache/2.4.18 (Ubuntu) OpenSSL/1.0.2g (internal dummy connection)  '
       timestamp: '(null)'
       hostname: 'calnsm'
       program_name: '(null)'
       log: 'nsm.domain.ca:81 nsm.domain.ca - - [06/Jul/2020:22:48:35 +0000] "OPTIONS * HTTP/1.0\" 200 126 "-" "Apache/2.4.18 (Ubuntu) OpenSSL/1.0.2g (internal dummy connection)  '

**Phase 2: Completed decoding.
       decoder: 'web-accesslog'
       srcip: 'nsm.domain.ca'
       protocol: 'OPTIONS'
       url: '*'
       id: '200'

resulting in an archives.json record of 

{"timestamp":"2020-07-06T22:48:37.400+0000","agent":{"id":"000","name":"calnsm.domain.ca"},"manager":{"name":"calnsm.domain.ca"},"id":"1594075717.3281941","full_log":"nsm.domain.ca:81 nsm.domain.ca - - [06/Jul/2020:22:48:35 +0000] \"OPTIONS * HTTP/1.0\" 200 126 \"-\" \"Apache/2.4.18 (Ubuntu) OpenSSL/1.0.2g (internal dummy connection)\"","decoder":{"parent":"web-accesslog","name":"web-accesslog"},"data":{"protocol":"OPTIONS","srcip":"nsm.domain.ca","id":"200","url":"*"},"location":"/var/log/apache2/other_vhosts_access.log"}  

which, because of the non-IP value the decoder placed in the srcip field, upsets the Logstash geoip filter:

[2020-07-06T20:39:03,717][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 500 ({"type"=>"exception", "reason"=>"java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: 'nsm.domain.ca' is not an IP string literal.", "caused_by"=>{"type"=>"illegal_argument_exception", "reason"=>"java.lang.IllegalArgumentException: 'nsm.domain.ca' is not an IP string literal.", "caused_by"=>{"type"=>"illegal_argument_exception", "reason"=>"'nsm.domain.ca' is not an IP string literal."}}, "header"=>{"processor_type"=>"geoip"}})

and for some reason this causes Logstash to jam completely, endlessly retrying the same impossible geoip operation on that non-IP value:

[2020-07-06T20:39:03,717][INFO ][logstash.outputs.elasticsearch] Retrying individual bulk actions that failed or were rejected by the previous bulk request. {:count=>1}


With the diverse nature of logs that SO might be collecting, it seems quite likely that invalid srcip field values will pop up from time to time, and it seems incredibly fragile for Logstash to stop the pipeline whenever that happens.  Is there more to this issue than I am seeing?  I suppose if a non-IP value in the srcip field is expected to break the pipeline, then we could add some Logstash logic to only invoke the geoip filter after first validating that the srcip value is a valid IP (IPv4 or IPv6).  But first, tell me I'm missing something.  We can't be endlessly hovering over Logstash to see whether it has been shut down again by another rogue value that clogged the pipe.  There must be more to the story.
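Such a guard could be as simple as checking the value with Python's ipaddress module before ever handing it to geoip (a hypothetical helper to illustrate the idea, not part of the stock SO config):

```python
import ipaddress

def is_valid_ip(value: str) -> bool:
    """True if value is a syntactically valid IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False

print(is_valid_ip("172.16.81.226"))   # True  -> safe to geoip
print(is_valid_ip("nsm.domain.ca"))   # False -> skip the geoip filter
```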

I'm doing OK at present by simply commenting out the s_ossec log source in syslog-ng.conf, but more importantly I want to know how to properly debug and remediate Logstash in situations like this.

Eager to understand better,
Kevin



Wes Lambert

Jul 6, 2020, 10:38:13 PM7/6/20
to securit...@googlegroups.com, Ken Fredericksen
Hi Kevin,

Thanks for the feedback, and sorry for the issue you are experiencing.  We've had similar reports of non-IP/null srcip values and related geoip issues, and we are looking into it.  Thanks again for your detailed explanation and patience!

Thanks,
Wes 


Wes Lambert

Jul 7, 2020, 11:35:32 AM7/7/20
to securit...@googlegroups.com, Ken Fredericksen
Please try modifying /etc/elasticsearch/ingest/common so that the geoip processors for both destination_ip and source_ip look like the following (adding "ignore_failure": true):

"geoip":  {

                "field": "destination_ip",

                "target_field": "destination_geo",

                "database_file": "GeoLite2-City.mmdb",

                "ignore_missing": true,

                "ignore_failure": true,

                "properties": ["ip", "country_name", "region_iso_code", "region_name", "city_name", "timezone", "location"]

        }


Then run so-elasticsearch-pipelines, and confirm that the updated pipeline is loaded:

curl -s localhost:9200/_ingest/pipeline/common | jq . | grep geoip -A5

Let me know if you still run into issues after doing so.



Thanks,

Wes

Kevin Branch

Jul 7, 2020, 6:07:05 PM7/7/20
to securit...@googlegroups.com
Thanks Wes,

I did as you suggested, re-enabled my s_ossec log source, and then restarted syslog-ng and Logstash.  The pipeline now keeps flowing all the way to Kibana.  Here and there in logstash.log I still see the geoip "is not an IP string literal" complaints, but I no longer see the "Retrying individual bulk actions that failed or were rejected" messages.  That is a good sign that Logstash still sees those illegal srcip values but no longer blocks on them.

Kevin


Wes Lambert

Jul 7, 2020, 8:46:18 PM7/7/20
to securit...@googlegroups.com
Thanks for confirming, Kevin!
