Is it safe to delete Druid's folder var/tmp?

Nikita Salnikov-Tarnovski

Jun 1, 2016, 4:31:40 AM
to Druid User
Good morning.

I am testing batch data ingestion in Druid. In my installation, the folder $DRUID/var/tmp has grown to almost 1 TB (terabyte, yes), while $DRUID/var/druid is a mere 350 GB. Is it safe to delete the $DRUID/var/tmp folder to free up space?

Thank you in advance,
Nikita

Gian Merlino

Jun 1, 2016, 12:42:21 PM
to druid...@googlegroups.com
Hey Nikita,

It should be safe to delete anything ingestion-related from tmp if you are not currently doing an ingestion job. Most processes should be cleaning up after themselves, though; could I ask what is in there right now?

Gian

--
You received this message because you are subscribed to the Google Groups "Druid User" group.
To unsubscribe from this group and stop receiving emails from it, send an email to druid-user+...@googlegroups.com.
To post to this group, send email to druid...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-user/149265c8-947b-4ae7-864e-2e48d8bda551%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
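
Gian's condition ("not currently doing an ingestion job") can be turned into a guard. A minimal sketch, assuming the Overlord is reachable at http://localhost:8090 and the tmp directory is /opt/druid/var/tmp (both placeholders, overridable via OVERLORD_URL and TMP_DIR). It asks the Overlord's /druid/indexer/v1/runningTasks and /druid/indexer/v1/pendingTasks endpoints whether anything is active and clears the directory only if both lists are empty. The function names are hypothetical; there is still a small race between the check and the delete, and the directory may be shared with other users of java.io.tmpdir, so treat this as a starting point.

```bash
#!/usr/bin/env bash
# Sketch: clear a Druid tmp directory only when the Overlord reports no
# running or pending tasks. Both defaults below are placeholders (assumptions).
OVERLORD_URL="${OVERLORD_URL:-http://localhost:8090}"
TMP_DIR="${TMP_DIR:-/opt/druid/var/tmp}"

# Succeeds if the given JSON text is an empty array such as "[]" or "[ ]".
is_empty_json_array() {
  local compact
  compact="$(printf '%s' "$1" | tr -d '[:space:]')"
  [ "$compact" = "[]" ]
}

# Succeeds only if the Overlord answers and both task lists are empty.
# Any curl failure is treated as "not idle", so nothing gets deleted.
no_active_tasks() {
  local running pending
  running="$(curl -fsS "$OVERLORD_URL/druid/indexer/v1/runningTasks")" || return 1
  pending="$(curl -fsS "$OVERLORD_URL/druid/indexer/v1/pendingTasks")" || return 1
  is_empty_json_array "$running" && is_empty_json_array "$pending"
}

# Hypothetical helper: wipe TMP_DIR's contents if the cluster looks idle.
clean_tmp_if_idle() {
  if no_active_tasks; then
    # ${TMP_DIR:?} aborts if TMP_DIR is empty, so this never becomes rm -rf /*
    rm -rf "${TMP_DIR:?}"/*
    echo "cleared $TMP_DIR"
  else
    echo "tasks active or Overlord unreachable; leaving $TMP_DIR alone" >&2
    return 1
  fi
}
```

Usage: export OVERLORD_URL and TMP_DIR for your deployment, source the file, then call clean_tmp_if_idle. A non-zero exit means nothing was deleted.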

charles.allen

Jun 1, 2016, 2:54:10 PM
to Druid User
Is $DRUID/var/tmp set as the Java tmp directory?

Nikita Salnikov-Tarnovski

Jun 2, 2016, 2:12:48 AM
to Druid User
I have not modified the java.io.tmpdir property; I assume I am using the quickstart default settings for all my nodes.

I currently still have about 40 tasks running or pending. var/tmp contains around 3,400 subfolders with names like "base948886206048112452flush":

tmp/base948886206048112452flush/
└── merged
    ├── 00000.smoosh
    ├── meta.smoosh
    └── version.bin

1 directory, 3 files
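
To answer "what is in there?" quickly, here is a small sketch that counts the base*flush directories and totals their disk usage in kilobytes. The name pattern comes from the listing above; the default path (var/tmp relative to the Druid install) and the function names are assumptions.

```bash
#!/usr/bin/env bash
# Sketch: summarise leftover "base*flush" directories in a Druid tmp dir.
# The default path (var/tmp under the Druid install) is an assumption.

# Print how many base*flush directories sit directly under the given dir.
count_flush_dirs() {
  local dir="${1:-var/tmp}"
  find "$dir" -mindepth 1 -maxdepth 1 -type d -name 'base*flush' | wc -l | tr -d ' '
}

# Print their combined disk usage in kilobytes (0 if there are none).
flush_dirs_kb() {
  local dir="${1:-var/tmp}"
  # du -sk prints "<KB><TAB><path>" per directory; awk sums column 1.
  find "$dir" -mindepth 1 -maxdepth 1 -type d -name 'base*flush' -exec du -sk {} + \
    | awk '{ total += $1 } END { print total + 0 }'
}
```

For example, count_flush_dirs "$DRUID/var/tmp" and flush_dirs_kb "$DRUID/var/tmp" give the two numbers Gian and Fangjin were asking about.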

Fangjin Yang

Jun 3, 2016, 7:49:47 PM
to Druid User
Hi Nikita, what are the sizes of your smoosh files? I am wondering if you are persisting to disk more often than you need to for your setup.
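
A rough size histogram of the smoosh data files answers this question directly. A sketch, assuming the layout shown earlier in the thread (base*flush/merged/00000.smoosh); it skips meta.smoosh, and the bucket boundaries are arbitrary choices. Many files in the smallest bucket would be consistent with frequent small persists, though it is not proof on its own.

```bash
#!/usr/bin/env bash
# Sketch: bucket the sizes of smoosh data files (00000.smoosh, 00001.smoosh, ...)
# under a tmp dir. meta.smoosh is skipped; bucket edges are arbitrary.

smoosh_size_buckets() {
  local dir="${1:-var/tmp}"
  # wc -c < file prints the byte count (portable across GNU and BSD).
  find "$dir" -type f -name '[0-9]*.smoosh' \
    -exec sh -c 'for f in "$@"; do wc -c < "$f"; done' _ {} + \
    | awk '
        $1 < 10 * 1024 * 1024   { small++;  next }
        $1 < 100 * 1024 * 1024  { medium++; next }
        $1 < 1024 * 1024 * 1024 { large++;  next }
                                { huge++ }
        END {
          printf "<10MB %d\n10MB-100MB %d\n100MB-1GB %d\n>=1GB %d\n",
                 small, medium, large, huge
        }'
}
```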

Nikita Salnikov-Tarnovski

Jun 4, 2016, 7:25:45 AM
to Druid User
From 1MB up to 1GB+.

I currently have no indexing jobs running, but the var/tmp folder is still full of files: more than 3,600 of them.

Thanks for your help, guys :)

Nikita

mo...@sweetcouch.com

Sep 26, 2016, 3:00:01 AM
to Druid User
Hi Nikita,
Did you figure out why this is happening (files not being cleared on their own)? I am facing the same issue.

Thanks,
Mo

Nikita Salnikov-Tarnovski

Sep 29, 2016, 9:29:30 AM
to Druid User
I haven't had such problems recently.

julien...@millmobile.com

Jun 2, 2017, 5:13:02 AM
to Druid User
Hi,
 
Also facing the same issue here. We are on Druid 0.10.0 and using the Hadoop indexer.
What version are you using, Nikita?

Thanks,
Julien

den...@gmail.com

Jul 12, 2017, 4:19:05 PM
to Druid User
I'm having the very same issue and am already tired of looking for a solution.
Also on the Hadoop indexer...

Gian Merlino

Jul 12, 2017, 4:41:38 PM
to druid...@googlegroups.com
It should be safe to clear out the tmp directory when no indexing is running. Also, if you use a remote Hadoop cluster, or if you use local-mode native indexing (the "index" task in Druid), then this should not be an issue. I believe it should only be an issue with Hadoop in local mode (which isn't recommended in production anyway).

Gian

den...@gmail.com

Jul 12, 2017, 7:07:25 PM
to Druid User
Can you please provide a reference stating that the Hadoop indexer isn't recommended in production, and explain why?

Is there any sane way (besides a cron job) to clean up these files?

Thanks!

Gian Merlino

Jul 12, 2017, 7:17:15 PM
to druid...@googlegroups.com
Hadoop indexer in YARN mode is totally good in production. It's just Hadoop in _local mode_ that isn't normally suggested for production. That's really just meant for testing and dev. I think a cron job is your best bet for cleaning up the files that it generates.

Gian
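
A cron job along the lines Gian suggests could look roughly like this: delete top-level tmp entries matching a name pattern that have not been modified for a given number of minutes. This is a sketch, not a Druid-provided tool; the function name, pattern and threshold are assumptions. A long-running task may still need flush directories it persisted hours earlier, so pick a threshold comfortably longer than your longest ingestion task.

```bash
#!/usr/bin/env bash
# Sketch of a cron-friendly cleanup for leftover Druid tmp entries.
# Not a Druid tool: the name, pattern and threshold are assumptions.

# Remove top-level entries of DIR whose names match PATTERN and whose
# modification time is more than MAX_AGE_MIN minutes ago; print each path.
clean_old_tmp() {
  local dir="$1" pattern="$2" max_age_min="$3"
  find "$dir" -mindepth 1 -maxdepth 1 -name "$pattern" -mmin +"$max_age_min" \
    -print -exec rm -rf {} +
}

# Run once when called with exactly three arguments (e.g. from cron).
if [ "$#" -eq 3 ]; then
  clean_old_tmp "$@"
fi
```

A hypothetical crontab entry (hourly, removing flush directories untouched for a day, with an illustrative script path): 0 * * * * /opt/druid/bin/clean-old-tmp.sh /opt/druid/var/tmp 'base*flush' 1440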

Johnson Johnson

Oct 11, 2017, 10:08:25 PM
to Druid User
Hey, channel: I am having a similar issue. I am on Druid v0.10.0, and I am not seeing the smoosh files in the Java tmpdir getting cleaned up; this happens on all nodes, at different times. It's causing our disk checks to sound off. Does anyone know whether upgrading fixes this issue, or has anyone implemented a workaround? Thanks!

Johnson Johnson

Oct 11, 2017, 10:09:10 PM
to Druid User
Also, does (or should) Druid automatically clean up these smoosh files on its own?

Alberto González Mesas

Dec 12, 2017, 7:39:50 AM
to Druid User
Hi guys, is there any config parameter to automatically delete old tmp files?

Thanks!

Johnson Johnson

Dec 12, 2017, 7:54:17 AM
to druid...@googlegroups.com
No, I don’t believe so.

I think I got around this with a cron job that removes old files under the directory set by the tmpdir flag in the Java opts.

--
John Knepper

Alberto González Mesas

Dec 12, 2017, 11:10:06 AM
to Druid User
Aha, in the end I used systemd:

 # cat overlord.service
...
ExecStartPre=/opt/druid/bin/r_iotmpdirs overlord
...

And likewise in the MiddleManager unit:

ExecStartPre=/opt/druid/bin/r_iotmpdirs middlemanager

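# cat /opt/druid/bin/r_iotmpdirs

#!/bin/bash

# Remove old directories under /opt/druid/tmp/{overlord,middleManager}
service="$1"

OVE_DIR="/opt/druid/tmp/overlord"
MID_DIR="/opt/druid/tmp/middleManager"

if [ "$service" == "overlord" ]
then
    # Overlord: wipe everything (this runs before the service starts).
    # ${OVE_DIR:?} aborts instead of expanding to "/*" if the variable is empty.
    rm -rf "${OVE_DIR:?}"/*
elif [ "$service" == "middlemanager" ]
then
    # MiddleManager: remove only top-level entries whose status changed at least
    # 2 full days ago (-ctime +1; find truncates ages to whole days).
    # -mindepth 1 keeps $MID_DIR itself; -maxdepth 1 avoids descending into
    # directories that rm -rf has already removed.
    find "$MID_DIR" -mindepth 1 -maxdepth 1 -ctime +1 -exec rm -rf {} +
fi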

Johnson Johnson

Dec 12, 2017, 12:27:33 PM
to druid...@googlegroups.com
I think if you also set the Java tmpdir to that same location, you can just remove everything underneath that one path.

Punnaiah Guptha

Mar 4, 2022, 9:33:57 PM
to Druid User
We have observed the same issue in production. Because the tmp folder held so many files and so much data, each Peon took more than 10 minutes to process.
After cleaning up the tmp folder, the files were processed quickly.

Please let me know if any configuration needs to be enabled.

Punnaiah Guptha

Mar 5, 2022, 9:28:29 AM
to Druid User
Hi, is there any solution or configuration for this?

Can I delete the temp folder files while the applications are running?


Mark Herrera

Mar 7, 2022, 11:25:15 AM
to Druid User
Is this possibly a JVM configuration issue?

The relevant parameter might be -Djava.io.tmpdir=<a path>. I've taken the liberty of copying and pasting the relevant text below:

Various parts of Druid use temporary files to interact with the file system. These files can become quite large. This means that systems that have small /tmp directories can cause problems for Druid. Therefore, set the JVM tmp directory to a location with ample space.

Also consider the following when configuring the JVM tmp directory:

  • The temp directory should not be volatile tmpfs.
  • This directory should also have good read and write speed.
  • Avoid NFS mount.
  • The org.apache.druid.java.util.metrics.SysMonitor requires execute privileges on files in java.io.tmpdir. If you are using the system monitor, do not set java.io.tmpdir to noexec.
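
The checklist above (not tmpfs, not NFS, not noexec) can be checked from the shell before pointing java.io.tmpdir at a directory. A minimal Linux-only sketch, assuming util-linux findmnt is installed; the function names are hypothetical, and "nfs*" is a simple match on the reported filesystem type. It does not measure read/write speed.

```bash
#!/usr/bin/env bash
# Sketch: warn if a candidate java.io.tmpdir sits on tmpfs, NFS, or a noexec
# mount. check_tmpdir relies on util-linux findmnt (Linux only).

# Print one warning per problem for a filesystem type and its comma-separated
# mount options; return non-zero if any problem was found.
check_tmpdir_mount() {
  local fstype="$1" options="$2" rc=0
  case "$fstype" in
    tmpfs) echo "WARN: tmpfs is volatile"; rc=1 ;;
    nfs*)  echo "WARN: NFS mount"; rc=1 ;;
  esac
  case ",$options," in
    *,noexec,*) echo "WARN: noexec mount (SysMonitor needs execute)"; rc=1 ;;
  esac
  return "$rc"
}

# Look up the mount that holds DIR and run the checks on it.
check_tmpdir() {
  local dir="$1" fstype options
  read -r fstype options < <(findmnt -n -o FSTYPE,OPTIONS --target "$dir")
  check_tmpdir_mount "$fstype" "$options"
}
```

For example, check_tmpdir /opt/druid/var/tmp (an illustrative path) prints nothing and exits 0 when none of the problems apply.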
