/nsm/elasticsearch partition full - stopped working


Christian Sommer

Aug 29, 2018, 6:37:20 AM
to security-onion
Hi,

right now some of our sensors refuse to accept new data in Elasticsearch, because the Elasticsearch partition is 95% full. (/nsm is on a separate partition.)

/dev/sda1 73T 62T 7,4T 90% /nsm
/dev/sdd1 459G 70M 435G 1% /nsm/logstash
/dev/sdd2 12T 11T 571G 95% /nsm/elasticsearch

Which process is responsible for freeing up space on the elasticsearch partition?
How can I solve this problem?

BR
Christian

Wes Lambert

Aug 29, 2018, 5:51:53 PM
to securit...@googlegroups.com
This is normally handled by Curator.

What does the ES log look like?

Thanks,
Wes

--
Follow Security Onion on Twitter!
https://twitter.com/securityonion
---
You received this message because you are subscribed to the Google Groups "security-onion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to security-onio...@googlegroups.com.
To post to this group, send email to securit...@googlegroups.com.
Visit this group at https://groups.google.com/group/security-onion.
For more options, visit https://groups.google.com/d/optout.



Erwin

Aug 30, 2018, 4:41:41 AM
to security-onion
Hi Wes,

we see the following error:
Suppressed: java.lang.IllegalArgumentException: unable to consistently parse [cluster.routing.allocation.disk.watermark.low=30gb], [cluster.routing.allocation.disk.watermark.high=20gb], and [cluster.routing.allocation.disk.watermark.flood_stage=95%] as percentage or bytes

Caused by: org.elasticsearch.ElasticsearchParseException: failed to parse setting [cluster.routing.allocation.disk.watermark.flood_stage] with value [95%] as a size in bytes: unit is missing or unrecognized

[2018-08-28T00:06:06,072][INFO ][org.elasticsearch.cluster.routing.allocation.DiskThresholdMonitor] rerouting shards: [high disk watermark exceeded on one or more nodes]
[2018-08-28T00:06:39,227][WARN ][org.elasticsearch.cluster.routing.allocation.DiskThresholdMonitor] high disk watermark [90%] exceeded on [UZb8XTHOQCqBuwfHwaMzqA][UZb8XTH][/usr/share/elasticsearc nodes/0] free: 578.1gb[5%], shards will be relocated away from this node
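The first exception above means ES could not parse the three watermark settings consistently: two are byte values (30gb, 20gb) and one is a percentage (95%), and Elasticsearch requires all three to use the same style. A consistent all-percentage configuration might look like this (a sketch with illustrative values; on Security Onion these settings may be managed by the platform itself, so check before editing elasticsearch.yml by hand):

```yaml
# elasticsearch.yml: all three disk watermarks in the same (percentage) style
cluster.routing.allocation.disk.watermark.low: 85%
cluster.routing.allocation.disk.watermark.high: 90%
cluster.routing.allocation.disk.watermark.flood_stage: 95%
```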


Regards,
Erwin

Wes Lambert

Aug 30, 2018, 9:31:10 AM
to securit...@googlegroups.com
What is the result of the following?

grep LOG_SIZE_LIMIT /etc/nsm/securityonion.conf

How does that number compare with the output of df -h or the space assigned to Elasticsearch?

Thanks,
Wes


Erwin

Aug 31, 2018, 2:18:07 AM
to security-onion
Hi Wes,

grep LOG_SIZE_LIMIT /etc/nsm/securityonion.conf
LOG_SIZE_LIMIT=11900

df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdd2        12T   11T  571G  95% /nsm/elasticsearch


We have deleted some indices on the first node, and everything started working again.
I left the 2nd node full so we can troubleshoot on that one.

Regards,
Erwin

Wes Lambert

Aug 31, 2018, 1:52:54 PM
to securit...@googlegroups.com
Hi Erwin,

Try dropping the value for LOG_SIZE_LIMIT to see if that helps.

Curator uses this value to determine when to purge data.

Thanks,
Wes


Erwin

Sep 1, 2018, 3:08:35 AM
to security-onion
Hi Wes,


Dropping the value would be the same as deleting them manually, right?
Could this be a problem, as we set the days to keep open to 60?

I already lowered this to 50; that is the value we can keep.

BTW, we are on the latest release.


Regards,
Erwin

Wes Lambert

Sep 1, 2018, 6:34:02 AM
to securit...@googlegroups.com
Hi Erwin,

DAYSTOKEEP only refers to alert data stored in securityonion_db.

Thanks,
Wes


Erwin

Sep 3, 2018, 4:54:12 AM
to security-onion
Hi Wes,

Sorry, wrong wording: I meant the Curator close days:
# Curator options
CURATOR_ENABLED="yes"
CURATOR_CLOSE_DAYS=50

Could this be the problem?
I already lowered it to 50, but we are still reaching the watermark. Any idea what we could do next?


Regards,
Erwin

Erwin

Sep 3, 2018, 11:21:45 AM
to security-onion
Hi Wes,

I found the issue:
2018-09-03 15:15:04,881 ERROR Failed to complete action: close. <class 'curator.exceptions.FailedExecution'>: Exception encountered. Rerun with loglevel DEBUG and/or check Elasticsearch logs for more information. Exception: TransportError(403, u'cluster_block_exception', u'blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];')


I could use the command:
curl -XPUT -H "Content-Type: application/json" http://localhost:9200/_all/_settings -d '{"index.blocks.read_only_allow_delete": null}'

But isn't there a better way to make this permanent?
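Before clearing the block on `_all` blindly, it can help to see which indices actually carry it. A rough sketch of filtering the flat settings output (the sample JSON below is hypothetical; real input would come from `curl -s 'localhost:9200/_all/_settings?flat_settings=true'`):

```shell
# Hypothetical flat-settings response for two indices (shape assumed from ES 6.x)
settings='{"logstash-bro-2018.08.01":{"settings":{"index.blocks.read_only_allow_delete":"true"}},"logstash-bro-2018.09.01":{"settings":{"index.number_of_shards":"5"}}}'

# Keep only index names whose settings object contains the read-only block
blocked=$(printf '%s' "$settings" |
  grep -o '"[^"]*":{"settings":{[^}]*"index.blocks.read_only_allow_delete":"true"' |
  cut -d'"' -f2)
echo "$blocked"   # prints logstash-bro-2018.08.01
```

The block on a single index can then be cleared with the same `curl -XPUT` as above, substituting the index name for `_all`.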

Regards,
Erwin

Erwin

Sep 3, 2018, 11:23:31 AM
to security-onion
Edit: the command does not solve the issue.

any idea? :)

Regards

Erwin

Sep 3, 2018, 12:33:19 PM
to security-onion
Sorry for the spam.

I have now deleted the indices manually, step by step, roughly 30 days' worth. Everything is OK now, but the question remains: how could that happen?

Curator stopped doing its job :(
A workaround would be a script to delete the oldest indices.


Regards,
Erwin

Wes Lambert

Sep 4, 2018, 8:51:05 AM
to securit...@googlegroups.com
To be clear, CURATOR_CLOSE_DAYS only affects the closing of indices -- not the deletion.  So, if you have more than 30 days of indices, the older ones will begin being closed by Curator.  They will only be purged if they exceed the threshold in GB defined for LOG_SIZE_LIMIT in /etc/nsm/securityonion.conf.

If your LOG_SIZE_LIMIT value is greater than the percentage of disk specified for the watermark threshold for ES, then Curator will never delete the indices.

Try reducing LOG_SIZE_LIMIT so that it is only ~80% of the disk assigned to /nsm/elasticsearch.
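The ~80% rule of thumb can be sanity-checked with a little arithmetic (a sketch; the helper name is made up):

```shell
# Suggest a LOG_SIZE_LIMIT (in GB) that stays safely below ES's disk
# watermarks: roughly 80% of the partition holding /nsm/elasticsearch.
suggest_log_size_limit() {
  local disk_gb="$1"             # total partition size in GB
  echo $(( disk_gb * 80 / 100 ))
}

# Example: the 12T /dev/sdd2 partition above is roughly 12288 GB
suggest_log_size_limit 12288     # prints 9830
```

By comparison, the LOG_SIZE_LIMIT=11900 reported earlier is about 97% of that 12T partition, well above the 90% high watermark, which matches the symptom that Curator never purged before ES locked the indices.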

Thanks,
Wes


Erwin

Sep 24, 2018, 4:39:14 AM
to security-onion
Dear Wes,


we have the same problem again: the indices move to read-only and Curator is not allowed to delete them... so it is just a matter of time before we run into a full disk again.

The problem also seems to be discussed here: https://groups.google.com/forum/#!topic/security-onion/ZW9uc8J_UUc

2018-09-24 08:32:04,002 INFO Closing selected indices: [u'logstash-syslog-2018.08.04', u'logstash-bro-2018.08.01', u'logstash-bro-2018.08.02', u'logstash-bro-2018.08.03', u'logstash-bro-2018.08.04', u'logstash-bro-2018.08.05', u'logstash-ids-2018.08.05', u'logstash-ids-2018.08.04', u'logstash-ids-2018.08.03', u'logstash-ids-2018.08.02', u'logstash-ids-2018.08.01', u'logstash-syslog-2018.08.01', u'logstash-syslog-2018.08.03', u'logstash-syslog-2018.08.02', u'logstash-syslog-2018.08.05']
2018-09-24 08:32:04,007 ERROR Failed to complete action: close. <class 'curator.exceptions.FailedExecution'>: Exception encountered. Rerun with loglevel DEBUG and/or check Elasticsearch logs for more information. Exception: TransportError(403, u'cluster_block_exception', u'blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];')


We could write our own script to take the oldest index out of read-only and delete it, but shouldn't that be done by Curator?

Surely deleting the oldest one manually can't be the expected behaviour?

Do you have any idea how to get this working normally?


thank you,
Erwin

Erwin

Sep 24, 2018, 4:42:38 AM
to security-onion
/dev/sdd2 12T 5,8T 4,9T 55% /nsm/elasticsearch

Wes Lambert

Sep 25, 2018, 10:59:08 AM
to securit...@googlegroups.com
Hi Erwin,

I'll have to look into this to see if I notice anything strange with Curator, or something that might be affecting normal operation.

Thanks,
Wes

On Mon, Sep 24, 2018 at 4:42 AM Erwin <ak...@chello.at> wrote:
/dev/sdd2        12T  5,8T  4,9T  55% /nsm/elasticsearch


Erwin

Sep 26, 2018, 3:57:39 AM
to security-onion
Thank you Wes,

If you need anything, just let me know. I will have a look here daily to support you.

Regards,
Erwin

Wes Lambert

Sep 27, 2018, 3:44:33 PM
to securit...@googlegroups.com
Hi Erwin,

Could you please provide the output of the following from your storage node(s)?

curl -s localhost:9200/_cat/indices

curl -s localhost:9200/_cat/indices | grep close

Thanks,
Wes


Erwin

Oct 2, 2018, 3:59:54 AM
to security-onion
Hi Wes,

please see the attached files.



From the curator-log:

Closing selected indices: [u'logstash-bro-2018.08.13', u'logstash-bro-2018.08.12', u'logstash-bro-2018.08.11', u'logstash-bro-2018.08.10', u'logstash-ids-2018.08.07', u'logstash-bro-2018.08.01', u'logstash-bro-2018.08.02', u'logstash-bro-2018.08.03', u'logstash-bro-2018.08.04', u'logstash-bro-2018.08.05', u'logstash-bro-2018.08.06', u'logstash-bro-2018.08.07', u'logstash-bro-2018.08.08', u'logstash-bro-2018.08.09', u'logstash-ids-2018.08.13', u'logstash-ids-2018.08.05', u'logstash-ids-2018.08.04', u'logstash-ids-2018.08.03', u'logstash-ids-2018.08.02', u'logstash-ids-2018.08.01', u'logstash-ids-2018.08.09', u'logstash-syslog-2018.08.12', u'logstash-syslog-2018.08.13', u'logstash-syslog-2018.08.10', u'logstash-syslog-2018.08.11', u'logstash-ids-2018.08.10', u'logstash-ids-2018.08.11', u'logstash-ids-2018.08.12', u'logstash-ids-2018.08.06', u'logstash-syslog-2018.08.09', u'logstash-syslog-2018.08.08', u'logstash-syslog-2018.08.01', u'logstash-syslog-2018.08.03', u'logstash-syslog-2018.08.02', u'logstash-syslog-2018.08.05', u'logstash-syslog-2018.08.04', u'logstash-syslog-2018.08.07', u'logstash-syslog-2018.08.06', u'logstash-ids-2018.08.08']
2018-10-02 07:59:04,034 ERROR Failed to complete action: close. <class 'curator.exceptions.FailedExecution'>: Exception encountered. Rerun with loglevel DEBUG and/or check Elasticsearch logs for more information. Exception: TransportError(403, u'cluster_block_exception', u'blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];')


Regards,
Erwin


Attachments: node2_output.txt, node1_output.txt

Erwin

Oct 2, 2018, 4:14:55 AM
to security-onion
JFI: my current workaround is deleting them manually via wildcard.

Erwin

Oct 8, 2018, 7:04:59 AM
to security-onion
Hi Wes,


did you have a chance to look further into that issue?


Regards,
Erwin

Wes Lambert

Oct 8, 2018, 8:22:04 AM
to securit...@googlegroups.com
Hi Erwin,

It turns out Curator cannot delete closed indices.  This is because Elasticsearch reports back that closed indices are 0 bytes in size; therefore, when Curator looks at disk space, it does not know that there could actually be closed indices sitting on disk, taking up space.  This leads to a misrepresentation of available space, and the disk can eventually fill up or surpass the flood stage watermark (95%), resulting in ES locking indices to be read-only.

A short-term fix could be:

 - Alter the current Curator close job to use "dry-run" (the job will still run but won't actually close any indices), then delete the current closed indices and reset the current read-only indices to be writeable again.
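A sketch of the delete-and-unlock half of that fix, written as a generator so it can be previewed before running (the `_cat/indices` column layout and the endpoints assume a default local ES on port 9200; feed it real `curl -s localhost:9200/_cat/indices` output, then pipe the result to `sh` to execute):

```shell
# Read `_cat/indices` output on stdin and emit (without running) the cleanup
# commands: one DELETE per closed index, then clear the read-only block.
emit_cleanup() {
  # closed indices report no health value, so "close" is the first field
  awk '$1 == "close" {print "curl -s -XDELETE localhost:9200/" $2}'
  echo "curl -s -XPUT -H 'Content-Type: application/json'" \
       "localhost:9200/_all/_settings" \
       "-d '{\"index.blocks.read_only_allow_delete\": null}'"
}

# Preview against a hypothetical sample line
printf '      close logstash-bro-2018.08.01 DeF\n' | emit_cleanup
```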

We have a more proper fix currently in testing that we hope will help resolve this issue.

Thanks,
Wes
 


Erwin

Oct 8, 2018, 8:52:58 AM
to security-onion
Hi Wes,


thanks for the reply; good to hear you found the issue.
Looking forward to getting this done in the next release :)

If you need further input, I would be glad to assist you.


Regards,
Erwin

Wes Lambert

Oct 8, 2018, 8:57:31 AM
to securit...@googlegroups.com
Sure thing -- and to clarify, it's not that Curator can't delete closed indices at all, it's that it can't delete them based on disk_space.  You could actually change the delete for closed indices to occur based on a number of days, and so on.

Thanks,
Wes


Daniel Sullivan

Oct 12, 2018, 1:28:40 PM
to security-onion
I will also be very interested to see this proper fix.

Brandon Stephens

Oct 29, 2018, 1:25:00 PM
to security-onion
On Monday, October 8, 2018 at 8:57:31 AM UTC-4, Wes wrote:
Wes,

Can you please comment with instructions on how to make this change? In looking at https://www.elastic.co/guide/en/elasticsearch/client/curator/current/options.html

I don't see any straightforward option to enable, and I would prefer not to mess with the Docker container. We keep getting stung by closed indices not being deleted. I would be perfectly happy deleting an index regardless of size after 60 days.

thanks,

Brandon

Wes Lambert

Oct 31, 2018, 4:10:47 PM
to securit...@googlegroups.com
Hi Brandon,

We will be releasing a patch for this soon:


You could wait for that and see if it helps, or just set up a delete action similar to the following in /etc/curator/action/delete.yml:

actions:
  1:
    action: delete_indices
    description: >-
      Delete indices when age is exceeded.
    options:
      ignore_empty_list: True
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: logstash-
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 60


Thanks,
Wes
