Keep getting "disk quota exceeded" log on WAL file


Rakesh Jain

Aug 7, 2018, 9:08:09 AM
to Prometheus Users

Hello team,

I am running Prometheus version 2.3.1 (branch: HEAD, revision: 188ca45).

Prometheus runs in an LXC container on Proxmox VE, and I was getting the following error:
WAL log samples: log series: write /var/lib/prometheus/wal/001530: disk quota exceeded

This was because the disk was actually full. I then increased the container size from 32GB to 80GB, and the change was reflected immediately:
rakesh.jain@labs-monitor:~$ df -hT
Filesystem                        Type  Size  Used  Avail  Use%  Mounted on
zfshome/images/subvol-111-disk-1  zfs   82G   32G   51G    39%   /

But we still get the same error for the same old WAL file (001530).

daemon.log says:
Aug 7 12:23:53 labs-monitor prometheus[21529]: level=warn ts=2018-08-07T12:23:53.076092475Z caller=scrape.go:713 component="scrape manager" scrape_pool=test-icmp target="http://labs-monitor.eng.fireeye.com:9115/probe?module=icmp&target=hpc123a.eng.fireeye.com" msg="append failed" err="WAL log samples: log series: write /var/lib/prometheus/wal/001530: disk quota exceeded"
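The text "disk quota exceeded" in that log line is how Go renders the Linux errno EDQUOT, which is distinct from ENOSPC ("no space left on device"). On a ZFS subvolume, EDQUOT is returned when a quota or refquota on the dataset is reached, which is likely how a Proxmox container's disk-size limit surfaces here. The sketch below is an illustrative addition, not from the thread: it maps a failed write's errno to its cause and reports free space on the filesystem holding the WAL directory. The default path is taken from the post; `classify_write_error` and `free_space_report` are hypothetical helper names, and `errno.EDQUOT` assumes Linux.

```python
import errno
import os
import shutil
import sys

# Path taken from the thread; adjust for your own setup.
WAL_DIR = "/var/lib/prometheus/wal"


def classify_write_error(err: OSError) -> str:
    """Map a failed write to a short cause (EDQUOT is Linux-specific)."""
    if err.errno == errno.EDQUOT:
        return "quota exceeded (EDQUOT)"
    if err.errno == errno.ENOSPC:
        return "no space left on device (ENOSPC)"
    return os.strerror(err.errno) if err.errno else str(err)


def free_space_report(path: str) -> dict:
    """Return total/used/free bytes for the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    pct = round(100 * usage.used / usage.total, 1) if usage.total else 0.0
    return {"total": usage.total, "used": usage.used, "free": usage.free, "pct_used": pct}


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else WAL_DIR
    if not os.path.isdir(path):
        print(f"{path}: not a directory on this host")
    else:
        for key, value in free_space_report(path).items():
            print(f"{key:>8}: {value}")
```

If the reported free space looks healthy but writes still fail with EDQUOT, the quota the kernel enforces and the size `df` shows may be worth comparing, or the process may simply be holding on to an earlier failed state, as the replies below suggest.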

Here is the WAL directory listing:

rakesh.jain@labs-monitor:/var/lib/prometheus/wal$ ls -ltr
total 446203
-rw-r--r-- 1 prometheus prometheus 268428668 Jul 20 12:06 001527
-rw-r--r-- 1 prometheus prometheus 268434456 Jul 20 12:31 001528
-rw-r--r-- 1 prometheus prometheus 268428273 Jul 20 12:56 001529
-rw-r--r-- 1 prometheus prometheus 268435456 Jul 20 12:59 001530
-rw-r--r-- 1 prometheus prometheus 62241086 Aug 7 11:27 000001

rakesh.jain@labs-monitor:/var/lib/prometheus/wal$ du -sh *
6.8M 000001
137M 001527
137M 001528
137M 001529
19M 001530
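The `ls` and `du` numbers above disagree (for example 268434456 bytes vs 137M for 001528) because `ls -l` prints each file's apparent size while `du` prints the space actually allocated on disk. On ZFS, compression (if enabled on the dataset) is a common reason for that gap; for 001530, which shows exactly 268435456 bytes (256 MiB) but only 19M allocated, a preallocated or partly sparse segment is one plausible reading. The sketch below is an illustrative addition, not from the thread: it prints both figures per file using `os.stat` (`st_size` vs `st_blocks * 512`). The default path is from the post; `segment_sizes` and `mib` are hypothetical helper names.

```python
import os
import sys
from typing import List, Tuple

WAL_DIR = "/var/lib/prometheus/wal"  # path from the thread


def segment_sizes(wal_dir: str) -> List[Tuple[str, int, int]]:
    """Return (name, apparent_bytes, allocated_bytes) for each regular file.

    apparent_bytes is st_size (what `ls -l` prints); allocated_bytes is
    st_blocks * 512 (roughly what `du` prints). st_blocks is POSIX-only.
    """
    rows = []
    for name in sorted(os.listdir(wal_dir)):
        full = os.path.join(wal_dir, name)
        if not os.path.isfile(full):
            continue  # skip subdirectories such as checkpoints
        st = os.stat(full)
        rows.append((name, st.st_size, st.st_blocks * 512))
    return rows


def mib(n: int) -> str:
    """Format a byte count as MiB, right-aligned for table output."""
    return f"{n / (1024 * 1024):8.1f} MiB"


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else WAL_DIR
    if not os.path.isdir(path):
        print(f"{path}: not a directory on this host")
    else:
        for name, apparent, allocated in segment_sizes(path):
            print(f"{name}  apparent={mib(apparent)}  allocated={mib(allocated)}")
```

Either way, neither figure by itself explains the error; the free space reported for the filesystem is what matters for new writes.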

Rakesh Jain

Aug 7, 2018, 2:45:43 PM
to Prometheus Users
Any help would be appreciated, please.

Christian Hoffmann

Aug 7, 2018, 3:33:11 PM
to Rakesh Jain, Prometheus Users
On 08/07/2018 03:08 PM, Rakesh Jain wrote:
> Because the disk was actually full. Later I increased the container size
> to 80GB from 32GB. and its reflected immediately.
> rakesh.jain@labs-monitor:~$ df -hT
> Filesystem Type Size Used Avail Use% Mounted on
> zfshome/images/subvol-111-disk-1 zfs 82G 32G 51G 39% /
>
> But we still get the same error for the same old wal file.
Is this even after a restart of Prometheus?

It is a known issue that Prometheus will not recover perfectly from
disk-full situations [1]. Although the linked issue mentions a different
error message, this could still be the same behavior, IMO.

So, if you haven't tried yet, you might try restarting Prometheus.

Kind regards,
Christian

Rakesh Jain

Aug 7, 2018, 3:55:48 PM
to Prometheus Users
Thanks for the reply, Christian. No, I have not tried that yet. A restart should not break anything, right? The only change I made was increasing the disk size.

Rakesh Jain

Aug 7, 2018, 5:01:21 PM
to Prometheus Users
Please let me know if I can go ahead and restart, because I am worried after reading this issue: https://github.com/prometheus/prometheus/issues/4028 (Data Loss after restart)

Simon Pasquier

Aug 8, 2018, 2:25:52 AM
to Rakesh Jain, Prometheus Users
When the system runs out of disk space, all bets are off as far as data sanity and consistency are concerned.
What I would do:
- stop Prometheus
- make a copy of the data directory
- start Prometheus
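The three steps above can be scripted roughly as follows. This is a minimal sketch, not something posted in the thread, and it assumes a systemd unit named `prometheus` and a backup location `/var/backups/prometheus`; both names are hypothetical and should be adjusted. The data directory `/var/lib/prometheus` is inferred as the parent of the WAL path in the post. Because the original problem was a full disk, the backup destination should have enough free space for a full copy.

```python
import os
import shutil
import subprocess
import sys
import time

DATA_DIR = "/var/lib/prometheus"        # parent of the wal/ directory in the thread
BACKUP_ROOT = "/var/backups/prometheus"  # hypothetical destination; needs free space
UNIT = "prometheus"                      # hypothetical systemd unit name


def backup_data_dir(data_dir: str, backup_root: str) -> str:
    """Copy data_dir into a new timestamped directory under backup_root.

    Prometheus should be stopped first so the WAL and blocks are not
    being written while they are copied.
    """
    stamp = time.strftime("%Y%m%dT%H%M%S")
    name = os.path.basename(data_dir.rstrip(os.sep))
    dest = os.path.join(backup_root, f"{name}-{stamp}")
    shutil.copytree(data_dir, dest)  # creates parents; fails if dest exists
    return dest


def systemctl(action: str, unit: str) -> None:
    """Run `systemctl <action> <unit>` and raise if it fails."""
    subprocess.run(["systemctl", action, unit], check=True)


def main() -> None:
    systemctl("stop", UNIT)
    # If the copy raises, Prometheus is deliberately left stopped so the
    # original data is not touched before a backup exists.
    dest = backup_data_dir(DATA_DIR, BACKUP_ROOT)
    print(f"copied {DATA_DIR} -> {dest}")
    systemctl("start", UNIT)


if __name__ == "__main__":
    if "--yes" in sys.argv[1:]:
        main()
    else:
        print("dry run: pass --yes to stop, copy and restart Prometheus")
```

Leaving the service stopped when the copy fails is a design choice that follows the order of the steps: the restart only happens once a copy exists.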

Regarding the issue you referenced, it doesn't seem to be related to disk space starvation.


Rakesh Jain

Aug 12, 2018, 7:09:46 AM
to Prometheus Users
Thanks, Simon Pasquier.

It started working after restarting Prometheus, though we took a backup first.

Thanks all for your valuable suggestions.