Elastic Alpha - dropping logs to remote logstash / ELK


James Gordon

Sep 19, 2017, 10:26:31 AM
to security-onion
Hi all,

First off, big thanks to Doug and team for their work with Security Onion and the awesome job integrating Elastic into the SO stack! I'm looking forward to leveraging Bro in Elastic on Security Onion.

I stood up an SO Elastic alpha release box on the same network as one of my production Security Onion sensors. I'm trying to ship Bro logs from the prod sensor to the new elastic box to test out elastic, but the vast majority of the logs are dropping in transit according to syslog-ng on the source sensor. I had this functionality working on a previous release of the elastic preview - I can't remember if that was TP1 or TP2, but I don't believe network congestion is the limiting factor here. For context, on the previous TP I had running, the old ELK instance was seeing about 50 million logs per day. I've had this alpha running for about 18 hours now and it's only imported about 300,000 logs.

To send the logs, I copied and pasted the relevant syslog-ng configuration lines out of the new SO-ELK alpha server and put them in the source sensor. I modified the output to send to the IP of the ELK box, and added a UFW rule to allow connectivity.

I've used the `syslog-ng-ctl stats` command to confirm that the syslog to logstash transport step is where my logs are being dropped.

I've tried the following:
* Switched the syslog-ng output between TCP and UDP - both resulted in about the same amount of loss.
* Increased RAM and worker count on logstash - no changes observed.
* Set up a syslog-ng receiver on the ELK box, wrote the Bro logs to disk via syslog-ng, and used syslog-ng to send the contents of that file into logstash. The source SO sensor reported FAR fewer dropped logs with this approach - but syslog-ng on the ELK box reported a large amount of drops from that file source to the logstash destination. This makes me think it's a logstash input / input rate issue.
* Modified syslog-ng settings such as log_fifo_size and flow-control. Nothing I tried here made a difference.
* Briefly stopped sending syslog from the source sensor to our SIEM, thinking I was putting too much stress on syslog - but this didn't make a difference either.
* Commented out all source log lines except for source(s_bro_conn);. Still observed a high rate of log loss.
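For reference, the log_fifo_size / flow-control combination mentioned above would look roughly like this (a hypothetical sketch with illustrative values, not the exact lines from my config):

```
# Enlarge the destination's output buffer and enable flow control so
# syslog-ng throttles the source instead of dropping messages when
# logstash can't keep up. Values here are illustrative only.
destination d_logstash_bro {
    tcp("IP.Of.ELK.Box" port(6050)
        log_fifo_size(100000)   # output buffer size, in messages
    );
};

log {
    source(s_bro_conn);
    destination(d_logstash_bro);
    flags(flow-control);        # pause reading rather than drop
};
```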

user@sourceSensor:/var/log$ sudo syslog-ng-ctl stats | grep logstash_bro
destination;d_logstash_bro;;a;processed;17253139
dst.tcp;d_logstash_bro#0;tcp,Dest.SO.ELK.IP:6050;a;dropped;17229709
dst.tcp;d_logstash_bro#0;tcp,Dest.SO.ELK.IP:6050;a;processed;17253139
dst.tcp;d_logstash_bro#0;tcp,Dest.SO.ELK.IP:6050;a;stored;10000
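For context, the stats above work out to roughly a 99.9% drop rate at the TCP destination. A short script (assuming the standard semicolon-separated `syslog-ng-ctl stats` output shown above) makes this easy to compute:

```python
# Compute the drop rate from syslog-ng-ctl stats output.
# Fields are semicolon-separated: source;id;instance;state;type;number
stats = """\
destination;d_logstash_bro;;a;processed;17253139
dst.tcp;d_logstash_bro#0;tcp,Dest.SO.ELK.IP:6050;a;dropped;17229709
dst.tcp;d_logstash_bro#0;tcp,Dest.SO.ELK.IP:6050;a;processed;17253139
dst.tcp;d_logstash_bro#0;tcp,Dest.SO.ELK.IP:6050;a;stored;10000"""

counters = {}
for line in stats.splitlines():
    fields = line.split(";")
    if fields[0] == "dst.tcp":          # only the TCP destination counters
        counters[fields[4]] = int(fields[5])

drop_pct = 100 * counters["dropped"] / counters["processed"]
print(f"dropped {counters['dropped']} of {counters['processed']} ({drop_pct:.1f}%)")
```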


Here are the relevant syslog-ng configuration lines on the sensor / source machine (note I didn't modify any of the log source lines, so I'm intentionally leaving those out at this time):

destination d_logstash_bro { tcp("IP.Of.ELK.Box" port(6050) template("$(format-json --scope selected_macros --scope nv_pairs --exclude DATE --key ISODATE)\n")); };

log {
    source(s_bro_conn);
    ... # rest of the bro log sources
    log { filter(f_bro_headers); flags(final); };
    log { destination(d_logstash_bro); };
};
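On the receiving side, the matching logstash input presumably looks something like this (a hypothetical sketch - the actual Security Onion pipeline config lives under /usr/share/logstash/pipeline inside the so-logstash container, and the details there may differ):

```
input {
  tcp {
    port  => 6050
    codec => json_lines   # one JSON document per line, matching the
                          # format-json template in the destination above
  }
}
```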

So I guess my questions are:
1. Is anyone else seeing logs dropped between syslog-ng and logstash on a standalone ELK Alpha box (taking the network factor entirely out of the equation)? Running `syslog-ng-ctl stats` and checking the logstash destination should show this.
2. Has anyone else attempted sending these logs from a production SO sensor to an ELK TP or alpha release and encountered this issue?
3. Are there any tuning options that may help? Recommendations from anyone who has encountered similar issues?


I've attached sostat-redacted output for both the source and destination systems in case there's any useful information in there for troubleshooting this.

Thanks in advance for any input / help!

James Gordon

sostat-ELK
sostat-source

Doug Burks

Sep 19, 2017, 2:05:19 PM
to securit...@googlegroups.com
Hi James,

You say that you increased logstash RAM and worker count, how did you
change these settings and what did you set them to?



--
Doug Burks

James Gordon

Sep 19, 2017, 2:53:52 PM
to security-onion


Hi Doug,

I increased the logstash (and ES) RAM values in /etc/nsm/securityonion.conf. Setup initially dedicated 8227m to both of these - I upped them to 12000m. The machine has 32 GB of RAM.

# Elasticsearch options
ELASTICSEARCH_ENABLED="yes"
ELASTICSEARCH_HOST="localhost"
ELASTICSEARCH_PORT=9200
#ELASTICSEARCH_HEAP="8227m"
ELASTICSEARCH_HEAP="12000m"
ELASTICSEARCH_OPTIONS=""

# Logstash options
LOGSTASH_ENABLED="yes"
#LOGSTASH_HEAP="8227m"
LOGSTASH_HEAP="12000m"
LOGSTASH_OPTIONS=""

I modified /etc/logstash/logstash.yml to increase the pipeline workers - I went from 1 to 4 workers to see if this made any improvement.

root@securityonion-elk:~# cat /etc/logstash/logstash.yml
path.config: /usr/share/logstash/pipeline
queue.type: persisted
queue.max_bytes: 1gb
pipeline.workers: 4
path.logs: /var/log/logstash

After making these changes I restarted elastic with so-elastic-restart. I built the ELK machine on a test network and moved it to production to ingest logs, so so-elastic-restart timed out getting docker configs and used cached values. I don't see the number of pipeline workers reflected in the running logstash process, but the heap size change definitely applied.

root@securityonion-elk:~# ps -ef | grep logstash
jgordon 14758 14741 7 12:10 ? 00:29:10 /usr/bin/java -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+DisableExplicitGC -Djava.awt.headless=true -Dfile.encoding=UTF-8 -XX:+HeapDumpOnOutOfMemoryError -Djava.security.egd=file:/dev/urandom -Xmx12000m -Xms12000m -Xss2048k -Djffi.boot.library.path=/usr/share/logstash/vendor/jruby/lib/jni -Xbootclasspath/a:/usr/share/logstash/vendor/jruby/lib/jruby.jar -classpath : -Djruby.home=/usr/share/logstash/vendor/jruby -Djruby.lib=/usr/share/logstash/vendor/jruby/lib -Djruby.script=jruby -Djruby.shell=/bin/sh org.jruby.Main /usr/share/logstash/lib/bootstrap/environment.rb logstash/runner.rb


Thanks,

James Gordon

Wes

Sep 27, 2017, 6:53:42 AM
to security-onion

James,

Are you still experiencing issues with this, or were you able to get this resolved?

Thanks,
Wes

James Gordon

Sep 27, 2017, 9:24:49 AM
to security-onion

Wes,

I rebuilt the ELK server last night. Just rebuilding it fixed the dropped logs as reported by syslog-ng on the production sensor, but only a very small portion of the logs was still making it into elasticsearch. I noticed some error logs in /var/log/logstash/logstash.log for DomainStats timing out - I disabled domain stats in /etc/nsm/securityonion.conf, rebooted, and things are working much better! I will note that I had to increase the logstash worker count to keep up with the logs. It fell behind pretty quickly last night - after an hour of ingesting logs it was 45 minutes behind - so I bumped the logstash worker count to 8, and it was up to date when I checked on it this morning. Over the past 14 hours ELK has ingested just under 30 million logs.
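As a rough sanity check on those numbers, 30 million logs over 14 hours works out to an average of around 600 events per second sustained:

```python
# Back-of-the-envelope ingest rate from the figures above.
logs_ingested = 30_000_000          # logs indexed over the window
window_hours = 14                   # observation window from the post
rate_per_sec = logs_ingested / (window_hours * 3600)
print(f"average ingest rate: {rate_per_sec:.0f} events/sec")
# -> average ingest rate: 595 events/sec
```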

As mentioned in my response to Doug, I built the ELK instance in a test / lab environment that has open access to the internet, then swapped it over to production. It does not currently have proxy rules in place to connect outbound to the internet, so I'm assuming that's the cause of the domainstats timeouts.

Thanks,

James Gordon

Eric Appelboom

Oct 18, 2017, 9:19:33 AM
to security-onion

Had a similar issue with domainstats slowing down the indexing of logs;
the system went through long periods of sporadic log indexing.

Sorted by setting DOMAIN_STATS_ENABLED="no" in /etc/nsm/securityonion.conf

[2017-10-18T12:50:54,205][ERROR][logstash.filters.rest ] error in rest filter {:request=>[:get, "http://domainstats:20000/domain/creation_date/angsrvr.com", {}], :json=>false, :code=>nil, :body=>nil, :client_error=>#<Manticore::StreamClosedException: Could not read from stream: Read timed out>}

Looks like the DOCKER iptables chain has no forwarding rule for domainstats:

iptables -L DOCKER
Chain DOCKER (2 references)
target prot opt source destination
ACCEPT tcp -- anywhere 172.17.0.4 tcp dpt:9300
ACCEPT tcp -- anywhere 172.17.0.4 tcp dpt:9200
ACCEPT tcp -- anywhere 172.17.0.5 tcp dpt:6053
ACCEPT tcp -- anywhere 172.17.0.5 tcp dpt:6052
ACCEPT tcp -- anywhere 172.17.0.5 tcp dpt:6051
ACCEPT tcp -- anywhere 172.17.0.5 tcp dpt:6050
ACCEPT tcp -- anywhere 172.17.0.6 tcp dpt:5601

The container is running but not listening on tcp/20000:
# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
399aeb648082 securityonionsolutions/so-curator "/bin/bash" 16 minutes ago Up 16 minutes so-curator
3c38a258221a securityonionsolutions/so-elastalert "/opt/start-elasta..." 16 minutes ago Up 16 minutes so-elastalert
8ec4f2d73a44 securityonionsolutions/so-kibana "/bin/sh -c /usr/l..." 16 minutes ago Up 16 minutes 127.0.0.1:5601->5601/tcp so-kibana
5e40eb533ae5 securityonionsolutions/so-logstash "/usr/local/bin/do..." 16 minutes ago Up 16 minutes 5044/tcp, 9600/tcp, 127.0.0.1:6050-6053->6050-6053/tcp so-logstash
0ce595e769bc securityonionsolutions/so-elasticsearch "/bin/bash bin/es-..." 16 minutes ago Up 16 minutes 127.0.0.1:9200->9200/tcp, 127.0.0.1:9300->9300/tcp so-elasticsearch
a9ecb1effec9 securityonionsolutions/so-domainstats "/bin/sh -c '/usr/..." 16 minutes ago Up 16 minutes 20000/tcp so-domainstats
8552535e82e1 securityonionsolutions/so-freqserver "/bin/sh -c '/usr/..." 16 minutes ago Up 16 minutes 10004/tcp so-freqserver

Doug Burks

Oct 18, 2017, 9:34:13 AM
to securit...@googlegroups.com
Hi Eric,

Replies inline.

On Wed, Oct 18, 2017 at 9:19 AM, Eric Appelboom <eappe...@gmail.com> wrote:
>
>
> Had similar issue with domainstats slowing down the indexing of logs
> the systems were long periods of sporadic log indexing.
>
> Sorted by setting DOMAIN_STATS_ENABLED="no" in /etc/nsm/securityonion.conf
>
> [2017-10-18T12:50:54,205][ERROR][logstash.filters.rest ] error in rest filter {:request=>[:get, "http://domainstats:20000/domain/creation_date/angsrvr.com", {}], :json=>false, :code=>nil, :body=>nil, :client_error=>#<Manticore::StreamClosedException: Could not read from stream: Read timed out>}
>
> Looks like DOCKER Iptables chain has no forwarding rule for domainstats
>
> iptables -L DOCKER
> Chain DOCKER (2 references)
> target prot opt source destination
> ACCEPT tcp -- anywhere 172.17.0.4 tcp dpt:9300
> ACCEPT tcp -- anywhere 172.17.0.4 tcp dpt:9200
> ACCEPT tcp -- anywhere 172.17.0.5 tcp dpt:6053
> ACCEPT tcp -- anywhere 172.17.0.5 tcp dpt:6052
> ACCEPT tcp -- anywhere 172.17.0.5 tcp dpt:6051
> ACCEPT tcp -- anywhere 172.17.0.5 tcp dpt:6050
> ACCEPT tcp -- anywhere 172.17.0.6 tcp dpt:5601

That is correct. The new version of securityonion-elastic no longer
publishes that port. However, other Docker containers should still be
able to connect to domainstats:20000 over the internal Docker network.
From http://blog.securityonion.net/2017/10/security-advisory-for-security-onion.html:
"securityonion-elastic - 20171011-1ubuntu1securityonion1 makes the
following changes:

so-kibana publishes port 5601 to 127.0.0.1 only
so-elasticsearch publishes ports 9200 and 9300 to 127.0.0.1 only
so-logstash publishes ports 6050, 6051, 6052, and 6053 to 127.0.0.1 only
so-freqserver no longer publishes port 10004
so-domainstats no longer publishes port 20000"

> The container is running but not listening on tcp/20000:
> # docker ps -a
> CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
> 399aeb648082 securityonionsolutions/so-curator "/bin/bash" 16 minutes ago Up 16 minutes so-curator
> 3c38a258221a securityonionsolutions/so-elastalert "/opt/start-elasta..." 16 minutes ago Up 16 minutes so-elastalert
> 8ec4f2d73a44 securityonionsolutions/so-kibana "/bin/sh -c /usr/l..." 16 minutes ago Up 16 minutes 127.0.0.1:5601->5601/tcp so-kibana
> 5e40eb533ae5 securityonionsolutions/so-logstash "/usr/local/bin/do..." 16 minutes ago Up 16 minutes 5044/tcp, 9600/tcp, 127.0.0.1:6050-6053->6050-6053/tcp so-logstash
> 0ce595e769bc securityonionsolutions/so-elasticsearch "/bin/bash bin/es-..." 16 minutes ago Up 16 minutes 127.0.0.1:9200->9200/tcp, 127.0.0.1:9300->9300/tcp so-elasticsearch
> a9ecb1effec9 securityonionsolutions/so-domainstats "/bin/sh -c '/usr/..." 16 minutes ago Up 16 minutes 20000/tcp so-domainstats

This last line appears to report that the container is listening on 20000/tcp.

Is this Security Onion box allowed to connect to whois servers on the
Internet over port 43?


--
Doug Burks