/nsm/elasticsearch partition full - stopped working


Christian Sommer

Aug 29, 2018, 6:37:20 AM
to security-onion
Hi,

right now some of our sensors refuse to accept new data in Elasticsearch, because the Elasticsearch partition is 95% full. (/nsm is on a separate partition.)

/dev/sda1 73T 62T 7,4T 90% /nsm
/dev/sdd1 459G 70M 435G 1% /nsm/logstash
/dev/sdd2 12T 11T 571G 95% /nsm/elasticsearch

Which process is responsible for freeing up space on the elasticsearch partition?
How can I solve this problem?

BR
Christian

Wes Lambert

Aug 29, 2018, 5:51:53 PM
to securit...@googlegroups.com
This is normally handled by Curator.

What does the ES log look like?

Thanks,
Wes

--
Follow Security Onion on Twitter!
https://twitter.com/securityonion
---
You received this message because you are subscribed to the Google Groups "security-onion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to security-onio...@googlegroups.com.
To post to this group, send email to securit...@googlegroups.com.
Visit this group at https://groups.google.com/group/security-onion.
For more options, visit https://groups.google.com/d/optout.



Erwin

Aug 30, 2018, 4:41:41 AM
to security-onion
Hi Wes,

we see the following error:
Suppressed: java.lang.IllegalArgumentException: unable to consistently parse [cluster.routing.allocation.disk.watermark.low=30gb], [cluster.routing.allocation.disk.watermark.high=20gb], and [cluster.routing.allocation.disk.watermark.flood_stage=95%] as percentage or bytes

Caused by: org.elasticsearch.ElasticsearchParseException: failed to parse setting [cluster.routing.allocation.disk.watermark.flood_stage] with value [95%] as a size in bytes: unit is missing or unrecognized

[2018-08-28T00:06:06,072][INFO ][org.elasticsearch.cluster.routing.allocation.DiskThresholdMonitor] rerouting shards: [high disk watermark exceeded on one or more nodes]
[2018-08-28T00:06:39,227][WARN ][org.elasticsearch.cluster.routing.allocation.DiskThresholdMonitor] high disk watermark [90%] exceeded on [UZb8XTHOQCqBuwfHwaMzqA][UZb8XTH][/usr/share/elasticsearc nodes/0] free: 578.1gb[5%], shards will be relocated away from this node
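The first exception above means ES could not parse the three watermark settings consistently: two are byte values (30gb, 20gb) and one is a percentage (95%), and Elasticsearch requires all three to use the same style. A consistent all-percentage configuration might look like this (a sketch with illustrative values; on Security Onion these settings may be managed by the platform itself, so check before editing elasticsearch.yml by hand):

```yaml
# elasticsearch.yml: all three disk watermarks in the same (percentage) style
cluster.routing.allocation.disk.watermark.low: 85%
cluster.routing.allocation.disk.watermark.high: 90%
cluster.routing.allocation.disk.watermark.flood_stage: 95%
```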


Regards,
Erwin

Wes Lambert

Aug 30, 2018, 9:31:10 AM
to securit...@googlegroups.com
What is the result of the following?

grep LOG_SIZE_LIMIT /etc/nsm/securityonion.conf

How does that number compare with the output of df -h or the space assigned to Elasticsearch?

Thanks,
Wes


Erwin

Aug 31, 2018, 2:18:07 AM
to security-onion
Hi Wes,

grep LOG_SIZE_LIMIT /etc/nsm/securityonion.conf
LOG_SIZE_LIMIT=11900

df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdd2        12T   11T  571G  95% /nsm/elasticsearch


We have deleted some indices on the first node, and everything started working again.
I left the 2nd node full so we can troubleshoot on that one.

Regards,
Erwin

Wes Lambert

Aug 31, 2018, 1:52:54 PM
to securit...@googlegroups.com
Hi Erwin,

Try dropping the value for LOG_SIZE_LIMIT to see if that helps.

Curator uses this value to determine when to purge data.

Thanks,
Wes


Erwin

Sep 1, 2018, 3:08:35 AM
to security-onion
Hi Wes,


Dropping the value would be the same as deleting them manually, right?
Could this be a problem, as we set the days to keep open to 60?

I already lowered this to 50; that is the value we can keep.

BTW, we are on the latest release.


Regards,
Erwin

Wes Lambert

Sep 1, 2018, 6:34:02 AM
to securit...@googlegroups.com
Hi Erwin,

DAYSTOKEEP only refers to alert data stored in securityonion_db.

Thanks,
Wes


Erwin

Sep 3, 2018, 4:54:12 AM
to security-onion
Hi Wes,

Sorry, wrong wording: I meant the Curator close days:
# Curator options
CURATOR_ENABLED="yes"
CURATOR_CLOSE_DAYS=50

Could this be the problem?
I already lowered it to 50, but we are still reaching the watermark. Any idea what we could do next?


Regards,
Erwin

Erwin

Sep 3, 2018, 11:21:45 AM
to security-onion
Hi Wes,

I found the issue:
2018-09-03 15:15:04,881 ERROR Failed to complete action: close. <class 'curator.exceptions.FailedExecution'>: Exception encountered. Rerun with loglevel DEBUG and/or check Elasticsearch logs for more information. Exception: TransportError(403, u'cluster_block_exception', u'blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];')


I could use the command:
curl -XPUT -H "Content-Type: application/json" http://localhost:9200/_all/_settings -d '{"index.blocks.read_only_allow_delete": null}'

But isn't there a better way to make this permanent?
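Before clearing the block on `_all` blindly, it can help to see which indices actually carry it. A rough sketch of filtering the flat settings output (the sample JSON below is hypothetical; real input would come from `curl -s 'localhost:9200/_all/_settings?flat_settings=true'`):

```shell
# Hypothetical flat-settings response for two indices (shape assumed from ES 6.x)
settings='{"logstash-bro-2018.08.01":{"settings":{"index.blocks.read_only_allow_delete":"true"}},"logstash-bro-2018.09.01":{"settings":{"index.number_of_shards":"5"}}}'

# Keep only index names whose settings object contains the read-only block
blocked=$(printf '%s' "$settings" |
  grep -o '"[^"]*":{"settings":{[^}]*"index.blocks.read_only_allow_delete":"true"' |
  cut -d'"' -f2)
echo "$blocked"   # prints logstash-bro-2018.08.01
```

The block on a single index can then be cleared with the same `curl -XPUT` as above, substituting the index name for `_all`.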

Regards,
Erwin

Erwin

Sep 3, 2018, 11:23:31 AM
to security-onion
Edit: the command does not solve the issue.

any idea? :)

Regards

Erwin

Sep 3, 2018, 12:33:19 PM
to security-onion
Sorry for the spam.

I have now deleted the indices manually, step by step, roughly 30 days' worth. Everything is OK now, but the question remains: how could that happen?

Curator stopped doing its job :(
A workaround would be a script to delete the oldest indices.


Regards,
Erwin

Wes Lambert

Sep 4, 2018, 8:51:05 AM
to securit...@googlegroups.com
To be clear, CURATOR_CLOSE_DAYS only affects the closing of indices -- not the deletion.  So, if you have more than 30 days of indices, the older ones will begin being closed by Curator.  They will only be purged if they exceed the threshold in GB defined for LOG_SIZE_LIMIT in /etc/nsm/securityonion.conf.

If your LOG_SIZE_LIMIT value is greater than the percentage of disk specified for the watermark threshold for ES, then Curator will never delete the indices.

Try reducing LOG_SIZE_LIMIT so that it is only ~80% of the disk assigned to /nsm/elasticsearch.
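The ~80% rule of thumb can be sanity-checked with a little arithmetic (a sketch; the helper name is made up):

```shell
# Suggest a LOG_SIZE_LIMIT (in GB) that stays safely below ES's disk
# watermarks: roughly 80% of the partition holding /nsm/elasticsearch.
suggest_log_size_limit() {
  local disk_gb="$1"             # total partition size in GB
  echo $(( disk_gb * 80 / 100 ))
}

# Example: the 12T /dev/sdd2 partition above is roughly 12288 GB
suggest_log_size_limit 12288     # prints 9830
```

By comparison, the LOG_SIZE_LIMIT=11900 reported earlier is about 97% of that 12T partition, well above the 90% high watermark, which matches the symptom that Curator never purged before ES locked the indices.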

Thanks,
Wes


Erwin

Sep 24, 2018, 4:39:14 AM
to security-onion
Dear Wes,


we have the same problem again: the indices move to read-only and Curator is not allowed to delete them... so it is just a matter of time before we run into a full disk again.

The problem also seems to be discussed here: https://groups.google.com/forum/#!topic/security-onion/ZW9uc8J_UUc

2018-09-24 08:32:04,002 INFO Closing selected indices: [u'logstash-syslog-2018.08.04', u'logstash-bro-2018.08.01', u'logstash-bro-2018.08.02', u'logstash-bro-2018.08.03', u'logstash-bro-2018.08.04', u'logstash-bro-2018.08.05', u'logstash-ids-2018.08.05', u'logstash-ids-2018.08.04', u'logstash-ids-2018.08.03', u'logstash-ids-2018.08.02', u'logstash-ids-2018.08.01', u'logstash-syslog-2018.08.01', u'logstash-syslog-2018.08.03', u'logstash-syslog-2018.08.02', u'logstash-syslog-2018.08.05']
2018-09-24 08:32:04,007 ERROR Failed to complete action: close. <class 'curator.exceptions.FailedExecution'>: Exception encountered. Rerun with loglevel DEBUG and/or check Elasticsearch logs for more information. Exception: TransportError(403, u'cluster_block_exception', u'blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];')


We could write our own script to take the oldest index out of read-only and delete it, but shouldn't that be done by Curator?

Surely deleting the oldest one manually can't be the expected behaviour?

Do you have any idea how to get this working normally?


thank you,
Erwin

Erwin

Sep 24, 2018, 4:42:38 AM
to security-onion
/dev/sdd2 12T 5,8T 4,9T 55% /nsm/elasticsearch

Wes Lambert

Sep 25, 2018, 10:59:08 AM
to securit...@googlegroups.com
Hi Erwin,

I'll have to look into this to see if I notice anything strange with Curator, or something that might be affecting normal operation.

Thanks,
Wes

On Mon, Sep 24, 2018 at 4:42 AM Erwin <ak...@chello.at> wrote:
/dev/sdd2        12T  5,8T  4,9T  55% /nsm/elasticsearch


Erwin

Sep 26, 2018, 3:57:39 AM
to security-onion
Thank you Wes,

If you need anything, just let me know. I will have a look here daily to support you.

Regards,
Erwin

Wes Lambert

Sep 27, 2018, 3:44:33 PM
to securit...@googlegroups.com
Hi Erwin,

Could you please provide the output of the following from your storage node(s)?

curl -s localhost:9200/_cat/indices

curl -s localhost:9200/_cat/indices | grep close

Thanks,
Wes


Erwin

Oct 2, 2018, 3:59:54 AM
to security-onion
Hi Wes,

please see the attached files.



From the curator-log:

Closing selected indices: [u'logstash-bro-2018.08.13', u'logstash-bro-2018.08.12', u'logstash-bro-2018.08.11', u'logstash-bro-2018.08.10', u'logstash-ids-2018.08.07', u'logstash-bro-2018.08.01', u'logstash-bro-2018.08.02', u'logstash-bro-2018.08.03', u'logstash-bro-2018.08.04', u'logstash-bro-2018.08.05', u'logstash-bro-2018.08.06', u'logstash-bro-2018.08.07', u'logstash-bro-2018.08.08', u'logstash-bro-2018.08.09', u'logstash-ids-2018.08.13', u'logstash-ids-2018.08.05', u'logstash-ids-2018.08.04', u'logstash-ids-2018.08.03', u'logstash-ids-2018.08.02', u'logstash-ids-2018.08.01', u'logstash-ids-2018.08.09', u'logstash-syslog-2018.08.12', u'logstash-syslog-2018.08.13', u'logstash-syslog-2018.08.10', u'logstash-syslog-2018.08.11', u'logstash-ids-2018.08.10', u'logstash-ids-2018.08.11', u'logstash-ids-2018.08.12', u'logstash-ids-2018.08.06', u'logstash-syslog-2018.08.09', u'logstash-syslog-2018.08.08', u'logstash-syslog-2018.08.01', u'logstash-syslog-2018.08.03', u'logstash-syslog-2018.08.02', u'logstash-syslog-2018.08.05', u'logstash-syslog-2018.08.04', u'logstash-syslog-2018.08.07', u'logstash-syslog-2018.08.06', u'logstash-ids-2018.08.08']
2018-10-02 07:59:04,034 ERROR Failed to complete action: close. <class 'curator.exceptions.FailedExecution'>: Exception encountered. Rerun with loglevel DEBUG and/or check Elasticsearch logs for more information. Exception: TransportError(403, u'cluster_block_exception', u'blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];')


Regards,
Erwin


Attachments: node2_output.txt, node1_output.txt

Erwin

Oct 2, 2018, 4:14:55 AM
to security-onion
JFI: my current workaround is deleting them manually via wildcard.

Erwin

Oct 8, 2018, 7:04:59 AM
to security-onion
Hi Wes,


did you have a chance to look further into that issue?


Regards,
Erwin

Wes Lambert

Oct 8, 2018, 8:22:04 AM
to securit...@googlegroups.com
Hi Erwin,

It turns out Curator cannot delete closed indices.  This is because Elasticsearch reports back that closed indices are 0 bytes in size; therefore, when Curator looks at disk space, it does not know that there could actually be closed indices sitting on disk, taking up space.  This leads to a misrepresentation of available space, and the disk can eventually fill up or surpass the flood stage watermark (95%), resulting in ES locking indices to be read-only.

A short-term fix could be:

 - Alter the current Curator close job to use "dry-run" (the job will still run but won't actually close any indices), then delete the current closed indices and reset the current read-only indices to be writeable again.
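A sketch of the delete-and-unlock half of that fix, written as a generator so it can be previewed before running (the `_cat/indices` column layout and the endpoints assume a default local ES on port 9200; feed it real `curl -s localhost:9200/_cat/indices` output, then pipe the result to `sh` to execute):

```shell
# Read `_cat/indices` output on stdin and emit (without running) the cleanup
# commands: one DELETE per closed index, then clear the read-only block.
emit_cleanup() {
  # closed indices report no health value, so "close" is the first field
  awk '$1 == "close" {print "curl -s -XDELETE localhost:9200/" $2}'
  echo "curl -s -XPUT -H 'Content-Type: application/json'" \
       "localhost:9200/_all/_settings" \
       "-d '{\"index.blocks.read_only_allow_delete\": null}'"
}

# Preview against a hypothetical sample line
printf '      close logstash-bro-2018.08.01 DeF\n' | emit_cleanup
```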

We have a more proper fix currently in testing that we hope will help resolve this issue.

Thanks,
Wes
 


Erwin

Oct 8, 2018, 8:52:58 AM
to security-onion
Hi Wes,


thanks for the reply; good to hear you found the issue.
Looking forward to getting this done in the next release :)

If you need further input, I would be glad to assist you.


Regards,
Erwin

Wes Lambert

Oct 8, 2018, 8:57:31 AM
to securit...@googlegroups.com
Sure thing -- and to clarify, it's not that Curator can't delete closed indices at all, it's that it can't delete them based on disk_space.  You could actually change the delete for closed indices to occur based on a number of days, and so on.

Thanks,
Wes


Daniel Sullivan

Oct 12, 2018, 1:28:40 PM
to security-onion
I will also be very interested to see this proper fix.

Brandon Stephens

Oct 29, 2018, 1:25:00 PM
to security-onion
On Monday, October 8, 2018 at 8:57:31 AM UTC-4, Wes wrote:
Wes,

Can you please comment with instructions on how to make this change? In looking at https://www.elastic.co/guide/en/elasticsearch/client/curator/current/options.html

I don't see any straightforward option to enable, and I would prefer not to mess with the Docker container. We keep getting stung by closed indices not being deleted. I would be perfectly happy deleting an index regardless of size after 60 days.

thanks,

Brandon

Wes Lambert

Oct 31, 2018, 4:10:47 PM
to securit...@googlegroups.com
Hi Brandon,

We will be releasing a patch for this soon:


You could wait for that and see if it helps, or just set up a delete action similar to the following in /etc/curator/action/delete.yml:

actions:
  1:
    action: delete_indices
    description: >-
      Delete indices when age is exceeded.
    options:
      ignore_empty_list: True
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: logstash-
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 60


Thanks,
Wes
