Multiple nsm_sensor_clean processes start until the server runs out of memory / CPU - appears to be due to 2+ million files in /nsm/bro/extracted

Brant Hale

Jun 13, 2016, 2:03:47 PM
to security-onion
I am getting multiple nsm_sensor_clean processes that keep accumulating until the server is unresponsive. It looks to be due to a directory taking longer than 1 minute to list. I am not sure whether it makes sense to break this folder down into subfolders based on date (big change to Bro?) or to change the cleanup script to either delete a different way or check whether it is already running.

Any thoughts?

Brant Hale

Here are more details:

I believe I have tracked it down to this part of the sensor_clean process:

# find the oldest extracted files in /nsm/bro/extracted/ and exclude today
ls -l --time-style="long-iso" /nsm/bro/extracted/*-* | grep $OLDEST_EXTRACT_DATE | awk '{print $8}' |while read FILE
do
echo_msg 1 "removing extracted file: $FILE"
rm -f "$FILE"
done
REMOVED="yes"
fi


If I try to run the ls from the script, it takes longer than 1 minute to complete. I believe the cron job starts a new process each minute before the previous instance has had a chance to complete.

ls -l --time-style="long-iso" /nsm/bro/extracted/*-*

h@so:/nsm/bro$ time ls -l --time-style="long-iso" /nsm/bro/extracted/*-*
-bash: /bin/ls: Argument list too long

real 1m20.120s
user 1m16.225s
sys 0m3.787s


This directory has about 3 million files, and a normal rm cannot be run against it.

haleb@so:/nsm/bro/extracted$ ls -al | wc -l
2978261
haleb@so:/nsm/bro/extracted$ rm -f *
-bash: /bin/rm: Argument list too long
haleb@so:/nsm/bro/extracted$
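
For context, "Argument list too long" happens because the shell expands the glob into millions of arguments, which is more than the kernel accepts for a single exec. Below is a minimal sketch of a per-day delete that avoids the glob entirely by letting find walk the directory and remove matches itself. It assumes GNU find and GNU date. The function name purge_extracted_by_date is hypothetical and is not part of nsm_sensor_clean.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of nsm_sensor_clean): remove the files in
# one directory whose modification time falls on a given day (YYYY-MM-DD).
# Assumes GNU find (-newermt, -delete) and GNU date (-d).
purge_extracted_by_date() {
    local dir="$1" day="$2" next_day
    # Compute the day after, so the match window is one calendar day.
    next_day="$(date -d "$day +1 day" +%F)" || return 1
    # No shell glob is expanded here, so the exec argument limit is never hit.
    # -newermt is a strict "newer than" test, so a file stamped exactly at
    # midnight belongs to the previous day's window.
    find "$dir" -maxdepth 1 -type f -name '*-*' \
        -newermt "$day 00:00:00" ! -newermt "$next_day 00:00:00" \
        -print -delete
}
```

Usage would look like `purge_extracted_by_date /nsm/bro/extracted "$OLDEST_EXTRACT_DATE"`, assuming that variable holds a date in YYYY-MM-DD form. Each matching path is printed before it is deleted, which roughly mirrors the per-file message in the original loop.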


top showing numerous nsm_sensor_clean processes, a new one starting every minute:
23998 root 20 0 358224 328860 404 R 100.0 0.2 13:21.24 nsm_sensor_clea
20336 root 20 0 358228 332372 404 R 99.9 0.3 22:22.62 nsm_sensor_clea
20723 root 20 0 358216 329804 404 R 99.9 0.3 21:21.36 nsm_sensor_clea
21739 root 20 0 358224 329708 404 R 99.9 0.2 20:20.65 nsm_sensor_clea
21867 root 20 0 358224 332840 404 R 99.9 0.3 19:21.98 nsm_sensor_clea
22206 root 20 0 358220 330904 404 R 99.9 0.3 18:23.17 nsm_sensor_clea
22276 root 20 0 358220 327548 404 R 99.9 0.2 17:23.18 nsm_sensor_clea
23154 root 20 0 358216 328852 404 R 99.9 0.2 16:24.42 nsm_sensor_clea
23385 root 20 0 358224 332716 404 R 99.9 0.3 15:21.96 nsm_sensor_clea
23647 root 20 0 358224 327552 404 R 99.9 0.2 14:20.12 nsm_sensor_clea
24090 root 20 0 358220 327548 404 R 99.9 0.2 12:24.20 nsm_sensor_clea
24185 root 20 0 358220 329224 404 R 99.9 0.2 11:24.23 nsm_sensor_clea
24564 root 20 0 358220 329200 404 R 99.9 0.2 10:22.71 nsm_sensor_clea
24697 root 20 0 358224 327552 404 R 99.9 0.2 9:23.09 nsm_sensor_clea
24901 root 20 0 358228 329144 404 R 99.9 0.2 8:21.60 nsm_sensor_clea
25119 root 20 0 358228 328864 404 R 99.9 0.2 7:24.31 nsm_sensor_clea
25466 root 20 0 358324 329356 404 R 99.9 0.2 5:23.95 nsm_sensor_clea
25591 root 20 0 358328 327676 404 R 99.9 0.2 4:22.40 nsm_sensor_clea
26023 root 20 0 358324 327676 404 R 99.9 0.2 2:24.23 nsm_sensor_clea
26113 root 20 0 358328 328712 404 R 99.9 0.2 1:23.07 nsm_sensor_clea
26344 root 20 0 358344 329156 404 R 99.9 0.2 0:23.52 nsm_sensor_clea

Brant Hale

Jun 13, 2016, 2:12:52 PM
to securit...@googlegroups.com
More info: I see the "Argument list too long" error, which would keep the /nsm/bro/extracted files from ever getting cleaned once the sensor gets to this point. I am looking for a way to keep it from getting here. Perhaps check the number of files in /nsm/bro/extracted and clean it when it crosses a threshold, to keep it from getting to this state?
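
A rough sketch of what such a threshold check could look like, assuming GNU find. The function names count_files and over_threshold are made up for illustration and do not exist in the Security Onion scripts. Counting via find avoids both the glob expansion and the sorting that ls does.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration only.

# Count regular files directly inside a directory without expanding a glob.
# find prints one dot per file and wc counts the dots.
count_files() {
    find "$1" -maxdepth 1 -type f -printf '.' | wc -c
}

# Succeed (exit 0) when the directory holds more files than the limit.
over_threshold() {
    local dir="$1" limit="$2" count
    count="$(count_files "$dir")"
    [ "$count" -gt "$limit" ]
}
```

For example, `over_threshold /nsm/bro/extracted 500000 && echo "needs cleaning"`. The 500000 figure is an arbitrary placeholder, not a tested limit.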




--
Follow Security Onion on Twitter!
https://twitter.com/securityonion
---
You received this message because you are subscribed to the Google Groups "security-onion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to security-onio...@googlegroups.com.
To post to this group, send email to securit...@googlegroups.com.
Visit this group at https://groups.google.com/group/security-onion.
For more options, visit https://groups.google.com/d/optout.

Wes

Jun 13, 2016, 2:31:47 PM
to security-onion
In this case, I suppose you could use something like pidof to check if sensor-clean is already running or not--if it is, wait until it finishes:

Ex. http://stackoverflow.com/questions/16807876/shell-script-execution-check-if-it-is-already-running-or-not
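
As an alternative to pidof, here is a minimal sketch of a single-instance guard using flock(1) from util-linux. Note that it skips the run when another instance holds the lock rather than waiting for it, which should amount to the same thing for a job that cron starts every minute. The function name run_exclusively and the lock file path in the usage note are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a command only if no other holder has the lock.
# Returns 99 when the lock is already held, otherwise the command's status.
run_exclusively() {
    local lockfile="$1"
    shift
    (
        # -n: fail immediately instead of waiting for the lock.
        flock -n 9 || exit 99
        "$@"
    ) 9>"$lockfile"   # fd 9 stays open (and locked) for the subshell's lifetime
}
```

Usage might be `run_exclusively /var/lock/nsm_sensor_clean.lock nsm_sensor_clean`, where the lock path is a guess. The kernel releases the lock when the process exits, so a crashed run does not leave a stale lock behind the way a PID file can.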

Doug, should there be an issue opened for this?

Thanks,
Wes

Doug Burks

Jun 13, 2016, 5:41:53 PM
to securit...@googlegroups.com
Yes, I've created Issue 942 for this:
https://github.com/Security-Onion-Solutions/security-onion/issues/942
--
Doug Burks

Doug Burks

Jun 14, 2016, 6:36:37 AM
to securit...@googlegroups.com
Hi Brant,

Are you still running with the default of just extracting EXEs or did
you configure Bro to extract other file types as well?

Brant Hale

Jun 14, 2016, 8:47:40 PM
to security-onion

Doug,

I just have EXE extraction. I do have quite a few internal EXEs, so my case may be abnormal. I will go back and see what my growth is on them. I suspect the timeout on the directory listing is the tipping point. I have been trying to figure out how many files is too many in a directory.

Let me know if I can help out in any way and as always thanks!

Brant

Doug Burks

Jul 6, 2016, 3:31:34 PM
to securit...@googlegroups.com
Hi Brant,

Let's do some testing. Please run the following commands and include
all output:

time ls -l --time-style="long-iso" /nsm/bro/extracted/*-* > /dev/null

time ls -l --time-style="long-iso" /nsm/bro/extracted/ > /dev/null

time find /nsm/bro/extracted/ -type f -printf '%T+ %p\n' | sort >/dev/null

Doug Burks

Jul 6, 2016, 4:45:34 PM
to securit...@googlegroups.com
That's probably not a fair test since the last command included a
sort, so let's try this instead:

time ls -l --time-style="long-iso" /nsm/bro/extracted/*-* > /dev/null

time ls -l --time-style="long-iso" /nsm/bro/extracted/ > /dev/null

time find /nsm/bro/extracted/ -type f -printf '%T+ %p\n' >/dev/null
--
Doug Burks
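
If the find variant turns out to be the faster of the three, the oldest day could be derived from the same kind of output. This is only a sketch, assuming GNU find. The function oldest_extract_day is hypothetical, and unlike the original script it does not exclude today.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the earliest modification day (YYYY-MM-DD)
# among the regular files directly inside a directory.
oldest_extract_day() {
    # find emits one ISO date per file; awk keeps the smallest string seen,
    # which avoids sorting millions of lines. ISO dates compare correctly
    # as plain strings. Prints nothing when the directory has no files.
    find "$1" -maxdepth 1 -type f -printf '%TY-%Tm-%Td\n' |
        awk 'NR == 1 || $0 < min { min = $0 } END { if (NR) print min }'
}
```

For example, `oldest_extract_day /nsm/bro/extracted`.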

Brant Hale

Aug 17, 2016, 1:41:43 PM
to securit...@googlegroups.com
I have cleaned up the files and don't have a good test case now. The normal cleanup script has been working to keep the file list low.

I will see if I can get a test box to fail.

Brant