Curator not deleting old indices

Daniel Sullivan

Sep 5, 2018, 12:04:02 PM
to security-onion
I have a single heavy node attached to a master. Disk on the heavy node is at 95%, which causes Elastic to shut down. Curator is on, and the log size is limited:

LOG_SIZE_LIMIT=600

CURATOR_ENABLED="yes"
CURATOR_CLOSE_DAYS=10
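
(I believe these live in /etc/nsm/securityonion.conf on a standard Security Onion install; the path is my assumption, but they can be checked with something like:)

grep -E 'LOG_SIZE_LIMIT|CURATOR' /etc/nsm/securityonion.conf
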
------

Elastic is well over the limit:

> du /nsm/elasticsearch -h -s
1.5T /nsm/elasticsearch


---------

Here is a sample from /var/log/curator/curator.log:


2018-09-05 14:11:01,586 INFO Preparing Action ID: 1, "delete_indices"
2018-09-05 14:11:01,587 INFO Preparing Action ID: 1, "close"
2018-09-05 14:11:01,592 INFO Trying Action ID: 1, "delete_indices": Delete indices when $disk_space value (in GB) is exceeded.
2018-09-05 14:11:01,594 INFO Trying Action ID: 1, "close": Close indices older than 10 days (based on index name), for logstash- prefixed indices.
2018-09-05 14:11:02,928 INFO Skipping action "delete_indices" due to empty list: <class 'curator.exceptions.NoIndices'>
2018-09-05 14:11:02,928 INFO Action ID: 1, "delete_indices" completed.
2018-09-05 14:11:02,928 INFO Job completed.
2018-09-05 14:11:02,948 INFO Closing selected indices: [u'logstash-syslog-2018.08.23', u'logstash-syslog-2018.08.22', u'logstash-syslog-2018.08.26', u'logstash-syslog-2018.08.25', u'logstash-syslog-2018.08.24', u'logstash-beats-2018.08.24', u'logstash-beats-2018.08.25', u'logstash-beats-2018.08.26', u'logstash-beats-2018.08.22', u'logstash-beats-2018.08.23', u'logstash-bro-2018.08.22', u'logstash-bro-2018.08.23', u'logstash-bro-2018.08.26', u'logstash-bro-2018.08.24', u'logstash-bro-2018.08.25', u'logstash-ids-2018.08.23', u'logstash-ids-2018.08.22', u'logstash-ids-2018.08.25', u'logstash-ids-2018.08.24', u'logstash-ids-2018.08.26']
2018-09-05 14:11:02,953 ERROR Failed to complete action: close. <class 'curator.exceptions.FailedExecution'>: Exception encountered. Rerun with loglevel DEBUG and/or check Elasticsearch logs for more information. Exception: TransportError(403, u'cluster_block_exception', u'blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];')


I had the same issue prior to a complete reinstall a few months ago.
Any ideas on this?

Wes Lambert

Sep 6, 2018, 8:00:10 AM
to securit...@googlegroups.com
Hi Daniel,

What is the output of the following?

grep disk_space: /etc/curator/action/delete.yml

Thanks,
Wes




Daniel Sullivan

Sep 6, 2018, 8:55:33 AM
to security-onion
grep disk_space: /etc/curator/action/delete.yml
disk_space: 600

Wes Lambert

Sep 6, 2018, 9:03:49 AM
to securit...@googlegroups.com
Hi Daniel,

What is the size of your indices? Roughly how many events per second, and how many GB/TB per day, are you ingesting?

Thanks,
Wes

Daniel Sullivan

Sep 6, 2018, 9:34:26 AM
to security-onion
Averaging 200 logs/second = 16M logs/day.

I do not know how many GB/TB per day, but I can tell you that I have been watching this for weeks as it has been creeping up. I don't believe anything has been deleted since 6/21, so the daily volume can't be much.

Ran the following and sorted the output: curl 'localhost:9200/_cat/indices?v'

https://pastebin.com/4Mm9xmty

Wes Lambert

Sep 6, 2018, 10:08:02 AM
to securit...@googlegroups.com
Hi Daniel,

Try deleting indices (oldest first) until your disk usage gets under the threshold amount.

Ex.

curl -XDELETE localhost:9200/logstash-blah.<OLDEST-DATE>
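
(To find the oldest indices and keep an eye on disk usage while you delete, something like the following should work; the sort/bytes flags need a reasonably recent Elasticsearch:)

curl 'localhost:9200/_cat/indices/logstash-*?v&s=index&bytes=gb'
df -h /nsm/elasticsearch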

Then, for your newest indices, curl the index settings to see if the read-only block is set to true:

curl localhost:9200/logstash-blah.<NEWEST-DATE>/_settings

If read-only is set to true, then you will need to modify the settings by doing something like:

Ex. 

curl -XPUT -H 'Content-Type: application/json' localhost:9200/logstash-syslog.2018.09.06 -d'{"index.blocks.read_only_allow_delete": null}'

Thanks,
Wes

Daniel Sullivan

Sep 6, 2018, 10:29:38 AM
to security-onion
Thanks, Wes.

Removed a large number of old indices and got the volume down to 573GB, which is under the 600GB threshold. The root problem may still exist, though.

"read_only_allow_delete" is true on all open indices. Ran to correct:
curl -XPUT -H 'Content-Type: application/json' localhost:9200/logstash-beats-2018.09.04 -d'{"index.blocks.read_only_allow_delete": null}'

Result:
{"error":{"root_cause":[{"type":"cluster_block_exception","reason":"blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"}],"type":"cluster_block_exception","reason":"blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"},"status":403}

Wes Lambert

Sep 6, 2018, 10:38:23 AM
to securit...@googlegroups.com
Hi Daniel,

It looks like I made a mistake in that last command. Try PUT'ing to /logstash-beats-2018.09.04/_settings instead of the previous URL.

Thanks,
Wes

Wes Lambert

Sep 6, 2018, 10:39:04 AM
to securit...@googlegroups.com
Ex. 

curl -XPUT -H 'Content-Type: application/json' localhost:9200/logstash-beats-2018.09.04/_settings   -d'{"index.blocks.read_only_allow_delete": null}'  
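
(If all of your open indices have the block set, a wildcard should clear them in one shot:)

curl -XPUT -H 'Content-Type: application/json' localhost:9200/_all/_settings -d'{"index.blocks.read_only_allow_delete": null}'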

Daniel Sullivan

Sep 6, 2018, 11:36:15 AM
to security-onion
Wes,

Verified that "read_only_allow_delete" is no longer set after running that revised command. Applied it successfully to all open indices.
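
(A quick way to double-check, using the same index as an example:)

curl 'localhost:9200/logstash-beats-2018.09.04/_settings?pretty' | grep read_only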

Another look at /var/log/curator/curator.log:
2018-09-06 15:20:02,360 INFO Preparing Action ID: 1, "close"
2018-09-06 15:20:02,367 INFO Trying Action ID: 1, "close": Close indices older than 10 days (based on index name), for logstash- prefixed indices.
2018-09-06 15:20:02,412 INFO Preparing Action ID: 1, "delete_indices"
2018-09-06 15:20:02,420 INFO Trying Action ID: 1, "delete_indices": Delete indices when $disk_space value (in GB) is exceeded.
2018-09-06 15:20:02,986 INFO Skipping action "delete_indices" due to empty list: <class 'curator.exceptions.NoIndices'>
2018-09-06 15:20:02,986 INFO Action ID: 1, "delete_indices" completed.
2018-09-06 15:20:02,987 INFO Job completed.
2018-09-06 15:20:02,989 INFO Skipping action "close" due to empty list: <class 'curator.exceptions.NoIndices'>
2018-09-06 15:20:02,989 INFO Action ID: 1, "close" completed.
2018-09-06 15:20:02,989 INFO Job completed.

Another look at indices shows that they are within limits (I manually deleted only until 2018.08.01, so I know that something automated got in there): https://pastebin.com/YfYQw6r1


This is just a guess at the root cause: the Curator job was dying because it could not close the open (read-only) indices, so it never got to deleting old closed indices once usage went over the 600GB limit?
If that is the case, how do we ensure that "read_only_allow_delete":"true" does not cause this issue again?
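
(From what I can tell, Elasticsearch sets "index.blocks.read_only_allow_delete" on its own once disk usage crosses its flood-stage watermark, 95% by default, which matches the 95% disk usage above, and it does not clear the block by itself when space is freed. Keeping the Curator disk_space threshold comfortably below that watermark should prevent a repeat. For checking or, cautiously, adjusting the watermark; the include_defaults flag needs a fairly recent Elasticsearch, and the 97% value is only an illustration, not a recommendation:)

curl 'localhost:9200/_cluster/settings?include_defaults=true&flat_settings=true' | grep watermark
curl -XPUT -H 'Content-Type: application/json' localhost:9200/_cluster/settings -d'{"transient": {"cluster.routing.allocation.disk.watermark.flood_stage": "97%"}}'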

Appreciate your help!

Daniel Sullivan

Sep 6, 2018, 11:44:41 AM
to security-onion
Disregard that note about something else deleting the old indices... there was a period where SO wasn't operating, so that is why there is a gap in time prior to 8/13.

The question still stands: why was this happening to begin with, and how do we ensure it doesn't happen again?

Erwin

Sep 25, 2018, 6:40:28 AM
to security-onion
Hi Daniel,

We have the exact same issue:
https://groups.google.com/forum/#!topic/security-onion/r8o0W6NM1KY

Did you change something?

We also see that older indices get set to read-only and Curator is not able to delete the old logs. After some period of time, the disk is full and Elastic stops working...

Regards,
Erwin

Daniel Sullivan

Sep 25, 2018, 10:17:53 PM
to security-onion
Nothing that I am aware of has changed other than Ubuntu being upgraded to 16.04 months prior. I haven't checked the state of the indices in a while. Will do so in a day or so, but I can't imagine the issue going away.

Tony Butt

Sep 27, 2018, 9:20:13 PM
to security-onion
On Wednesday, 26 September 2018 02:17:53 UTC, Daniel Sullivan wrote:
> Nothing that I am aware of has changed other than Ubuntu being upgraded to 16.04 months prior. I haven't checked the state of the indices in a while. Will do so in a day or so, but I can't imagine the issue going away.

We had the same problem, and I found that, for some reason, the cron jobs to run the Curator process were not present.
I manually ran the curator-delete task and copied in the files from the git repo; it now seems OK.
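
(If anyone needs to do the same, the delete action can likely be run by hand, assuming the curator binary is available on the host and the config path below is right; on some Security Onion builds Curator runs inside Docker instead, so adjust accordingly. The delete.yml path is the one mentioned earlier in this thread:)

curator --config /etc/curator/config/curator.yml /etc/curator/action/delete.yml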

Francois Lachance

Sep 28, 2018, 6:07:16 PM
to security-onion
Tony,

Are you saying that there was no file "curator-delete" in the /etc/cron.d directory?

Tony Butt

Nov 11, 2018, 5:56:19 PM
to security-onion
Francois,
That is correct. The cron job is there now, and Curator is running, but the indices are still not being deleted. I'm in the process of working through the other suggestions in this topic.

So far, I have been able to manually delete old indices, so we are still working OK.

Tony

Francois Lachance

Nov 13, 2018, 1:02:37 PM
to security-onion
The latest update to SO should have this fixed. Just run "sudo soup" on your nodes, starting with the master.

Wes Lambert

Nov 19, 2018, 8:16:02 AM
to securit...@googlegroups.com
The latest version of the Curator closed index delete script is currently in testing:


Thanks,
Wes
