Delete data from elasticsearch automatically


Ayush Agarwal

Oct 4, 2019, 1:59:48 AM
to Wazuh mailing list
Hi,

Is there a way to automatically delete data from Elasticsearch after, say, 30 or 60 days?
Currently, the /var/lib/elasticsearch/nodes/0/indices directory takes up most of my disk space, and I want Elasticsearch data to be deleted automatically after a certain period of time.

Thanks!
Ayush Agarwal

Pablo Torres

Oct 4, 2019, 2:52:40 AM
to Wazuh mailing list
Hi Ayush,

Yes, that's possible. You can use Elastic Curator. Here is an example of how to set up Elastic Curator to delete wazuh-alerts* indices older than 30 days:


2. Once it's installed, create the Curator configuration file:
mkdir ~/.curator/
touch ~/.curator/curator.yml

3. Open that file (~/.curator/curator.yml) and add the following configuration to it (do not forget to replace <elastic-ip> with your Elasticsearch host IP):
client:
  hosts:
    - <elastic-ip>
  port: 9200
  url_prefix:
  use_ssl: False
  certificate:
  client_cert:
  client_key:
  ssl_no_validate: False
  http_auth:
  timeout: 30
  master_only: False

logging:
  loglevel: INFO
  logfile:
  logformat: default
  blacklist: ['elasticsearch', 'urllib3']

4. Now let's create the action file:
touch ~/.curator/delete_indices.yaml

Open that new file (~/.curator/delete_indices.yaml) and add this configuration:
actions:
  1:
    action: delete_indices
    description: >-
      Delete indices older than 30 days (based on index name), for wazuh-alerts-3.x-
      prefixed indices. Ignore the error if the filter does not result in an
      actionable list of indices (ignore_empty_list) and exit cleanly.
    options:
      ignore_empty_list: True
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: wazuh-alerts-3.x-
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 30

5. Elastic Curator is now configured to delete indices older than 30 days. To run it, use this command:
curator --config ~/.curator/curator.yml ~/.curator/delete_indices.yaml

Every time you run this command manually, indices older than 30 days will be deleted.
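
For illustration, the selection that this action file asks Curator to make (prefix match, then age taken from the date in the index name) can be sketched in Python. This is not Curator's code: the function name indices_older_than and the sample index list are made up for the example, and Curator's exact handling of the cutoff boundary may differ.

```python
from datetime import datetime, timedelta


def indices_older_than(names, prefix, days, now, timestring="%Y.%m.%d"):
    """Return index names with `prefix` whose date suffix is more than `days` days before `now`.

    Hypothetical helper that mimics a pattern filter followed by an age filter
    with source: name. It does not talk to Elasticsearch.
    """
    cutoff = now - timedelta(days=days)
    selected = []
    for name in names:
        if not name.startswith(prefix):
            continue  # pattern filter: wrong prefix
        try:
            stamp = datetime.strptime(name[len(prefix):], timestring)
        except ValueError:
            continue  # the rest of the name is not a date in the expected format
        if stamp < cutoff:
            selected.append(name)
    return selected


# Sample index names, invented for the example.
names = [
    "wazuh-alerts-3.x-2019.08.01",
    "wazuh-alerts-3.x-2019.10.03",
    "wazuh-monitoring-3.x-2019.08.01",
    ".kibana_1",
]
print(indices_older_than(names, "wazuh-alerts-3.x-", 30, now=datetime(2019, 10, 4)))
# prints ['wazuh-alerts-3.x-2019.08.01']
```

Because the prefix filter runs first, indices such as .kibana_1 are never candidates for deletion in this sketch.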


What if we want to remove old indices automatically, without running that command by hand? We can set up a cron job to do that:

crontab -e
Now add this to the crontab file:
0 12 * * * curator --config ~/.curator/curator.yml ~/.curator/delete_indices.yaml
In this example, the cron job runs the previous Elastic Curator command every day at 12:00 PM (noon), deleting indices older than 30 days. To run it at midnight instead, use 0 0 * * * as the schedule.


We can add multiple actions to the action file we created (~/.curator/delete_indices.yaml). The previous example deletes only wazuh-alerts-* indices. If you also want to delete wazuh-monitoring-* indices, just add this new action:
...
 
  2:
    action: delete_indices
    description: >-
      Delete indices older than 30 days (based on index name), for wazuh-monitoring-3.x-
      prefixed indices. Ignore the error if the filter does not result in an
      actionable list of indices (ignore_empty_list) and exit cleanly.
    options:
      ignore_empty_list: True
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: wazuh-monitoring-3.x-
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 30




This is just one example of how to delete data from Elasticsearch automatically. If it does not work for you, another option is the Elasticsearch Index Lifecycle Management (ILM) API. You can find more info about ILM here: https://www.elastic.co/guide/en/elasticsearch/reference/6.7/index-lifecycle-management.html


Let me know if it helps

Kind regards,
Pablo Torres

Ayush Agarwal

Oct 10, 2019, 1:19:30 AM
to Wazuh mailing list
Hi Pablo,

Thanks for your help!
However, this does not delete any files from the /var/lib/elasticsearch/nodes/0/indices directory, and the size of the directory remains the same.
This directory takes up most of the disk space and grows every day. I am not sure what I can delete from it without losing any dashboards or Kibana settings.

Could you please guide me on this?

Thanks!
Ayush Agarwal

Ayush Agarwal

Oct 10, 2019, 4:26:20 AM
to Wazuh mailing list
Finally, I managed to delete the old indices. However, I used the two commands below to show and delete them:

curator_cli --host localhost show_indices --filter_list '{"filtertype":"age","source":"name","timestring":"%Y.%m.%d","unit":"days","unit_count":30,"direction":"older"}'
curator_cli --host localhost delete_indices --filter_list '{"filtertype":"age","source":"name","timestring":"%Y.%m.%d","unit":"days","unit_count":30,"direction":"older"}'
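
Note that the filter in these two commands has no prefix condition, so it matches any index whose name contains a date older than 30 days. As a rough sketch, the filter list can be built in Python and quoted for the shell, which also makes it easy to add a prefix filter. The helper name curator_cli_command is made up for this example, and it assumes that --filter_list accepts a JSON array of filters.

```python
import json
import shlex


def curator_cli_command(action, prefix, days, host="localhost"):
    """Build a curator_cli command line restricted to one index prefix.

    Hypothetical helper: it only assembles a string and does not run anything.
    """
    filters = [
        {"filtertype": "pattern", "kind": "prefix", "value": prefix},
        {
            "filtertype": "age",
            "source": "name",
            "direction": "older",
            "timestring": "%Y.%m.%d",
            "unit": "days",
            "unit_count": days,
        },
    ]
    return " ".join([
        "curator_cli", "--host", shlex.quote(host), action,
        "--filter_list", shlex.quote(json.dumps(filters)),
    ])


# Preview first with show_indices, then switch to delete_indices.
print(curator_cli_command("show_indices", "wazuh-alerts-3.x-", 30))
print(curator_cli_command("delete_indices", "wazuh-alerts-3.x-", 30))
```

Running show_indices before delete_indices, as in the commands above, is a cheap way to confirm which indices the filters select.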

Adri Valle

Oct 10, 2019, 4:59:54 AM
to Wazuh mailing list

Hi Ayush,

Since Elasticsearch 6.6
Elasticsearch has included a new capability since version 6.6: index lifecycle management (ILM). It lets you control how indices are handled as they age by attaching a lifecycle policy to the index template used to create them.

I advise you to replace your Curator files and crontab entries with a new ILM (index lifecycle management) policy.

I'll explain how to create a new policy and how to apply it to the wazuh-alerts-3.x-* indices.

Note that there are several options, but I'll explain only the delete option because that is what you need.

First, open the Dev Tools console in Kibana.

After that, create a new ILM policy:

PUT _ilm/policy/delete_after_30_days
{
  "policy": {
    "phases": {
      "delete": {
        "min_age": "30d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

The policy above, called delete_after_30_days, has a delete phase with the condition min_age: "30d". The min_age condition checks the minimum age of the index, so the phase applies only to indices that are 30 days old or more. When an index meets this condition, the delete action removes it.
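
Since the original question mentioned both 30 and 60 days, here is a small Python sketch that generates the same policy body for any retention period. The function name delete_policy and the policy name delete_after_60_days are made up for the example. The printed request can be pasted into the Kibana console.

```python
import json


def delete_policy(days):
    """Return an ILM policy body that deletes an index once it is `days` days old.

    Hypothetical helper that only builds the JSON body; it sends nothing to Elasticsearch.
    """
    return {
        "policy": {
            "phases": {
                "delete": {
                    "min_age": f"{days}d",
                    "actions": {"delete": {}},
                }
            }
        }
    }


# Print a console-style request for a 60-day retention policy.
print("PUT _ilm/policy/delete_after_60_days")
print(json.dumps(delete_policy(60), indent=2))
```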

Once you have created the ILM policy, update the wazuh template to assign the policy to it.

PUT _template/wazuh
{
  "index_patterns": ["wazuh-alerts-3.x-*"],
  "settings": {
    "index.lifecycle.name": "delete_after_30_days"
  }
}

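NOTE: The index_patterns parameter is needed to match the indices whose names start with this pattern, and index.lifecycle.name must be the name of the ILM policy created earlier.
NOTE: You can create as many policies as you want, but you can apply only one per index pattern.
NOTE: PUT _template/<name> replaces an existing template with the same name instead of merging into it. If a wazuh template already exists (check with GET _template/wazuh), include its current mappings and settings in the request body so they are not lost.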

After that, you can check whether the indices are now managed by the ILM policy by executing GET wazuh-alerts-3.x-*/_ilm/explain:

...
...
...
},
"wazuh-alerts-3.x-2019.10.05" : {
 "index" : "wazuh-alerts-3.x-2019.10.05",
 "managed" : false
},
"wazuh-alerts-3.x-2019.10.09" : {
 "index" : "wazuh-alerts-3.x-2019.10.09",
 "managed" : false
},
"wazuh-alerts-3.x-2019.10.10" : {
 "index" : "wazuh-alerts-3.x-2019.10.10",
 "managed" : true,
 "policy" : "delete_after_30_days",
 "lifecycle_date_millis" : 1570697276930,
 "phase" : "new",
 "phase_time_millis" : 1570697276968,
 "action" : "complete",
 "action_time_millis" : 1570697276968,
 "step" : "complete",
 "step_time_millis" : 1570697276968
},
"wazuh-alerts-3.x-2019.10.03" : {
 "index" : "wazuh-alerts-3.x-2019.10.03",
 "managed" : false
},
"wazuh-alerts-3.x-2019.10.04" : {
 "index" : "wazuh-alerts-3.x-2019.10.04",
 "managed" : false
},
...
...
...

As you can see, several indices are not managed by the policy and only one is. This is because a template is applied only when an index is created, so existing indices are not managed by the policy. You have two choices: migrate the existing indices so that the new template with the ILM policy applies to them, or delete by hand the old indices that you no longer need. Newly created indices will have the policy applied, and each one will be removed once 30 days have passed since its creation.
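
To find out which indices still need manual attention, the explain response can be filtered with a few lines of Python. This is a sketch only: the function name unmanaged_indices is made up, the sample reproduces part of the output above, and it assumes the response wraps the per-index entries in a top-level "indices" object (that part of the output is elided above).

```python
def unmanaged_indices(explain_response):
    """Return the sorted names of indices that no ILM policy manages.

    Hypothetical helper that works on an already parsed _ilm/explain response.
    """
    return sorted(
        name
        for name, info in explain_response["indices"].items()
        if not info.get("managed", False)
    )


# Part of the explain output shown above, as a Python dict.
sample = {
    "indices": {
        "wazuh-alerts-3.x-2019.10.05": {
            "index": "wazuh-alerts-3.x-2019.10.05",
            "managed": False,
        },
        "wazuh-alerts-3.x-2019.10.09": {
            "index": "wazuh-alerts-3.x-2019.10.09",
            "managed": False,
        },
        "wazuh-alerts-3.x-2019.10.10": {
            "index": "wazuh-alerts-3.x-2019.10.10",
            "managed": True,
            "policy": "delete_after_30_days",
        },
    }
}
print(unmanaged_indices(sample))
# prints ['wazuh-alerts-3.x-2019.10.05', 'wazuh-alerts-3.x-2019.10.09']
```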

NOTE: Index lifecycle management checks policies every 10 minutes by default. If you want to change the interval, you can do so with the following request:

PUT /_cluster/settings
{
  "persistent": {
    "indices.lifecycle.poll_interval": <interval>
  }
}

I hope it helps. If you have more doubts or problems, please don’t hesitate to ask again.

Regards,

Adri
