Old pcaps not removed


C. L. Martinez

Oct 22, 2015, 3:16:05 AM
to moloc...@googlegroups.com
Hi all,

I am having serious problems with my Moloch host. Today there is no
space left on the disk assigned to Moloch:

root@molochhst01:/var/log/moloch# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/dm-0 20G 1.6G 17G 9% /
udev 10M 0 10M 0% /dev
tmpfs 1.6G 316K 1.6G 1% /run
tmpfs 4.0G 0 4.0G 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 4.0G 0 4.0G 0% /sys/fs/cgroup
/dev/mapper/datavol-configvol 12G 34M 12G 1% /data/config
/dev/sda2 347M 33M 292M 10% /boot
/dev/mapper/nsmvol-logpcapvol 2.6T 2.6T 20K 100% /nsm

In config.ini, I have configured "freeSpaceG = 15". According to
viewer.log, moloch-viewer is trying to remove old indexes:

_source:
{ num: 2535,
node: 'plzfnsm01',
name: '/nsm/moloch/plzfnsm01-151016-00002535.pcap',
first: 1445039481 },
sort: [ 1445039481 ] }
Deleting { _index: 'files_v3',
_type: 'file',
_id: 'plzfnsm01-2536',
_score: null,
_source:
{ num: 2536,
node: 'plzfnsm01',
name: '/nsm/moloch/plzfnsm01-151016-00002536.pcap',
first: 1445039575 },
sort: [ 1445039575 ] }
Deleting { _index: 'files_v3',
_type: 'file',
_id: 'plzfnsm01-2537',
_score: null,
_source:
{ num: 2537,
node: 'plzfnsm01',
name: '/nsm/moloch/plzfnsm01-151016-00002537.pcap',
first: 1445039707 },
sort: [ 1445039707 ] }
Deleting { _index: 'files_v3',
_type: 'file',
_id: 'plzfnsm01-2538',
_score: null,
_source:
{ num: 2538,
node: 'plzfnsm01',
name: '/nsm/moloch/plzfnsm01-151016-00002538.pcap',
first: 1445039831 },
sort: [ 1445039831 ] }

but the Elasticsearch log shows a lot of errors:

[2015-10-22 07:11:52,107][WARN ][cluster.action.shard ]
[plzfsiem03] [sessions-151022][0] received shard failed for
[sessions-151022][0], node[1lLflWuAR_yqojcMXRXopw], [P],
s[INITIALIZING], indexUUID [H_DF2dh1RgOpEUI1qM9p1w], reason [shard
failure [failed
recovery][IndexShardGatewayRecoveryException[[sessions-151022][0]
failed to recover shard]; nested: TranslogCorruptedException[translog
corruption while reading from stream]; nested:
ElasticsearchException[failed to read
[session][151022-L0jdtbEFzglH27t_eCS6gVKA]]; nested:
ElasticsearchIllegalArgumentException[No version type match [77]]; ]]
[2015-10-22 07:11:52,140][WARN ][index.engine ]
[plzfsiem03] [stats][0] failed to sync translog
[2015-10-22 07:11:52,141][WARN ][indices.cluster ]
[plzfsiem03] [[stats][0]] marking and sending shard failed due to
[failed recovery]
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException:
[stats][0] failed to recover shard
at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:290)
at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:112)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.index.translog.TranslogCorruptedException:
translog corruption while reading from stream
at org.elasticsearch.index.translog.ChecksummedTranslogStream.read(ChecksummedTranslogStream.java:72)
at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:260)
... 4 more
Caused by: org.elasticsearch.ElasticsearchException: failed to read
[stat][plzfnsm01]
at org.elasticsearch.index.translog.Translog$Index.readFrom(Translog.java:522)
at org.elasticsearch.index.translog.ChecksummedTranslogStream.read(ChecksummedTranslogStream.java:68)
... 5 more
Caused by: org.elasticsearch.ElasticsearchIllegalArgumentException: No
version type match [-1]
at org.elasticsearch.index.VersionType.fromValue(VersionType.java:307)
at org.elasticsearch.index.translog.Translog$Index.readFrom(Translog.java:519)
... 6 more
[2015-10-22 07:11:52,141][WARN ][cluster.action.shard ]
[plzfsiem03] [stats][0] received shard failed for [stats][0],
node[1lLflWuAR_yqojcMXRXopw], [P], s[INITIALIZING], indexUUID
[AFL4fTJkTgCiTAphxc2EEg], reason [shard failure [failed
recovery][IndexShardGatewayRecoveryException[[stats][0] failed to
recover shard]; nested: TranslogCorruptedException[translog corruption
while reading from stream]; nested: ElasticsearchException[failed to
read [stat][plzfnsm01]]; nested:
ElasticsearchIllegalArgumentException[No version type match [-1]]; ]]
[2015-10-22 07:11:52,878][WARN ][index.engine ]
[plzfsiem03] [sessions-151022][0] failed to sync translog
[2015-10-22 07:11:52,880][WARN ][indices.cluster ]
[plzfsiem03] [[sessions-151022][0]] marking and sending shard failed
due to [failed recovery]
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException:
[sessions-151022][0] failed to recover shard
at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:290)
at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:112)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.index.translog.TranslogCorruptedException:
translog corruption while reading from stream
at org.elasticsearch.index.translog.ChecksummedTranslogStream.read(ChecksummedTranslogStream.java:72)
at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:260)
... 4 more
Caused by: org.elasticsearch.ElasticsearchException: failed to read
[session][151022-L0jdtbEFzglH27t_eCS6gVKA]
at org.elasticsearch.index.translog.Translog$Index.readFrom(Translog.java:522)
at org.elasticsearch.index.translog.ChecksummedTranslogStream.read(ChecksummedTranslogStream.java:68)
... 5 more
Caused by: org.elasticsearch.ElasticsearchIllegalArgumentException: No
version type match [77]
at org.elasticsearch.index.VersionType.fromValue(VersionType.java:307)
at org.elasticsearch.index.translog.Translog$Index.readFrom(Translog.java:519)
... 6 more
[2015-10-22 07:11:52,880][WARN ][cluster.action.shard ]
[plzfsiem03] [sessions-151022][0] received shard failed for
[sessions-151022][0], node[1lLflWuAR_yqojcMXRXopw], [P],
s[INITIALIZING], indexUUID [H_DF2dh1RgOpEUI1qM9p1w], reason [shard
failure [failed
recovery][IndexShardGatewayRecoveryException[[sessions-151022][0]
failed to recover shard]; nested: TranslogCorruptedException[translog
corruption while reading from stream]; nested:
ElasticsearchException[failed to read
[session][151022-L0jdtbEFzglH27t_eCS6gVKA]]; nested:
ElasticsearchIllegalArgumentException[No version type match [77]]; ]]
[2015-10-22 07:11:52,908][WARN ][index.engine ]
[plzfsiem03] [stats][0] failed to sync translog
[2015-10-22 07:11:52,909][WARN ][indices.cluster ]
[plzfsiem03] [[stats][0]] marking and sending shard failed due to
[failed recovery]
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException:
[stats][0] failed to recover shard
at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:290)
at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:112)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.index.translog.TranslogCorruptedException:
translog corruption while reading from stream
at org.elasticsearch.index.translog.ChecksummedTranslogStream.read(ChecksummedTranslogStream.java:72)
at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:260)
... 4 more
Caused by: org.elasticsearch.ElasticsearchException: failed to read
[stat][plzfnsm01]
at org.elasticsearch.index.translog.Translog$Index.readFrom(Translog.java:522)
at org.elasticsearch.index.translog.ChecksummedTranslogStream.read(ChecksummedTranslogStream.java:68)
... 5 more
Caused by: org.elasticsearch.ElasticsearchIllegalArgumentException: No
version type match [-1]
at org.elasticsearch.index.VersionType.fromValue(VersionType.java:307)
at org.elasticsearch.index.translog.Translog$Index.readFrom(Translog.java:519)
... 6 more
[2015-10-22 07:11:52,910][WARN ][cluster.action.shard ]
[plzfsiem03] [stats][0] received shard failed for [stats][0],
node[1lLflWuAR_yqojcMXRXopw], [P], s[INITIALIZING], indexUUID
[AFL4fTJkTgCiTAphxc2EEg], reason [shard failure [failed
recovery][IndexShardGatewayRecoveryException[[stats][0] failed to
recover shard]; nested: TranslogCorruptedException[translog corruption
while reading from stream]; nested: ElasticsearchException[failed to
read [stat][plzfnsm01]]; nested:
ElasticsearchIllegalArgumentException[No version type match [-1]]; ]]

Elasticsearch is installed on a third host. Result: both hard disks
(moloch and elasticsearch) are full.

How can I avoid this situation? Is it a bug?

Andy

Oct 22, 2015, 8:06:00 AM
to Moloch Full Packet Capture
Are the files it is trying to remove still there ('/nsm/moloch/plzfnsm01-151016-00002535.pcap', for example)? If so, do you have a permission/owner problem? Depending on the version you are running, you might be hitting the minimum file limit (https://github.com/aol/moloch/wiki/FAQ#PCAP_Deletion).

Elasticsearch doesn't like full disks. You will need to run the daily script (or db.pl expire) and reduce the number of days. You can delete the indexes manually for now.
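A rough sketch of picking indexes for manual deletion, assuming the daily session indexes follow the sessions-YYMMDD naming that appears in the Elasticsearch log in this thread. The function name, the ES_HOST variable, and the keep count are made up for illustration, and the curl lines in the comments are only one possible way to feed and use it. Review the output before deleting anything.

```shell
#!/bin/sh
# old_session_indexes KEEP
# Reads index names on stdin (one per line) and prints every
# sessions-YYMMDD index except the newest KEEP of them.
# Names that do not look like daily session indexes are ignored.
old_session_indexes() {
    grep '^sessions-[0-9]\{6\}$' | sort -u | awk -v keep="$1" '
        { names[NR] = $0 }
        END { for (i = 1; i <= NR - keep; i++) print names[i] }'
}

# Possible usage (ES_HOST is a placeholder for your Elasticsearch node):
#   curl -s "$ES_HOST:9200/_cat/indices?h=index" | old_session_indexes 14
# and, once the list has been checked, for each printed name:
#   curl -XDELETE "$ES_HOST:9200/<index name>"
```

Sorting works here because YYMMDD names sort chronologically as plain text within the same century.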

Andy

Oct 22, 2015, 9:44:46 AM
to Moloch Full Packet Capture

>> Are the files it is trying to remove still there?
>> ('/nsm/moloch/plzfnsm01-151016-00002535.pcap' for example)
>
> Yes, they are.
>
>> If so do you have
>> a permission/owner problem?
>
> No. Moloch processes run as root.


Have you set dropUser/dropGroup? If so, the viewer will drop privileges to that user, so you need to make sure that user can delete the files.
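One quick way to check this, as a sketch only: removing a file needs write and search permission on the directory that holds it, so the test below looks at the directory rather than at the pcap files themselves. The function name is invented for illustration, and the check ignores sticky-bit directories. Run it as the account named in dropUser.

```shell
#!/bin/sh
# can_delete_in DIR
# Succeeds if the current user has write and search (execute) permission
# on DIR, which is what removing a file inside DIR requires.
# Sticky-bit directories add an ownership rule that is not checked here.
can_delete_in() {
    [ -d "$1" ] && [ -w "$1" ] && [ -x "$1" ]
}

# Possible usage, run as the dropUser account:
#   can_delete_in /nsm/moloch && echo "can delete" || echo "cannot delete"
```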



>> Depending on the version you are running you
>> might be hitting the min file limit
>> (https://github.com/aol/moloch/wiki/FAQ#PCAP_Deletion)
>
> Uhmm ... Maybe the problem is with the freeSpaceG option. I have
> configured "freeSpaceG = 15". I am using the moloch 0.11.5 release. Do I
> need to use "freeSpaceG = 15%" instead?


>> For elasticsearch, it doesn't like full disks, you will need to run the
>> daily script (or db.pl expire) and reduce the number of days.  You can
>> delete the indexes manually for now.
>
> I run daily script every midnight ...

Right, but you might need to reduce the number of days if the ES disk is full.

Matt C

Oct 22, 2015, 10:41:10 AM
to Moloch Full Packet Capture


On Thursday, October 22, 2015 at 9:44:46 AM UTC-4, Andy wrote:

>> Are the files it is trying to remove still there?
>> ('/nsm/moloch/plzfnsm01-151016-00002535.pcap' for example)
>
>Yes, they are.



I haven't pinned down the exact circumstances, but from time to time file deletes fail for me as well.  The pcap file is removed from the files index in ES, but it isn't deleted from the disk.  I suspect it's related to load, as my servers are undersized for the job.   If I don't periodically check the files index and delete any pcaps that are no longer in it, my disk fills up.  For example on my most heavily loaded box right now I have this:

$ ls /raw/moloch/*pcap | wc -l
19775
$ curl -s 'localhost:9200/files_v3/_search?size=100000&pretty' |grep 'raw/moloch'| wc -l
19657

The files that exist on disk but not in the index don't show up in the viewer log, so it seems like the viewer hasn't attempted to delete them.
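The comparison above only counts files. A sketch of listing the actual leftovers could look like the following, assuming the disk listing and the index listing have each been saved to a text file with one pcap path per line. The function name and the file names in the comments are hypothetical, and the grep pattern is just one way to pull paths out of the pretty-printed search output. Check the result by hand before removing anything.

```shell
#!/bin/sh
# orphan_pcaps DISK_LIST INDEX_LIST
# Prints paths that appear in DISK_LIST but not in INDEX_LIST.
# Both arguments are text files with one pcap path per line.
orphan_pcaps() {
    disk_sorted=$(mktemp)
    index_sorted=$(mktemp)
    sort -u "$1" > "$disk_sorted"
    sort -u "$2" > "$index_sorted"
    # comm -23 keeps only the lines unique to the first (disk) file.
    comm -23 "$disk_sorted" "$index_sorted"
    rm -f "$disk_sorted" "$index_sorted"
}

# Possible usage with the paths from the example above:
#   ls /raw/moloch/*pcap > disk.txt
#   curl -s 'localhost:9200/files_v3/_search?size=100000&pretty' \
#     | grep -o '/raw/moloch/[^"]*\.pcap' > index.txt
#   orphan_pcaps disk.txt index.txt
```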

- Matt