Journald corrupts its files and completely stops logging messages

1,635 views
Skip to first unread message

Tobia Conforto

unread,
Apr 8, 2015, 1:43:15 PM4/8/15
to coreo...@googlegroups.com
Hello

I have just installed the latest CoreOS stable. I have not configured fleetd or anything like that. I only have 3 containers installed as local units.

My server has been up for 8 days and apparently has only logged messages for the first 3 days. The last 5 days have completely vanished. I noticed this when a systemctl status would not show any messages at all at the bottom.

One of the containers is a Nginx server that is configured to send all messages to stdout/err and I was supposed to keep all http logs, so this is quite bad.

- How can I repair the logs so that it starts logging again? I tried restarting systemd-journald, to no avail.
- Why would journalctl start dropping messages instead of opening a new log file or something like that? That's horrible!

This is the output of journalctl --verify:

PASS: /var/log/journal/afb231cca89d416b968c0625041b7aad/sys...@000512aa78f616fc-12f45fdc0951f873.journal~                                                            
PASS: /var/log/journal/afb231cca89d416b968c0625041b7aad/sys...@000512aa53773236-0f72287c155348a3.journal~                                                            
PASS: /var/log/journal/afb231cca89d416b968c0625041b7aad/sys...@000512aabfe309ae-fa02e2d324d3056a.journal~                                                            
1029a88: (null) decompression failed: Bad message                                                                                                                    
1029a88: invalid object contents: Bad message                                                                                                                        
File corruption detected at /var/log/journal/afb231cca89d416b968c0625041b7aad/system.journal:1029a88 (of 25165824 bytes, 67%).                                       
FAIL: /var/log/journal/afb231cca89d416b968c0625041b7aad/system.journal (Bad message)

This is my /var/log/journal/*/ directory:

-rw-r----- 1 root systemd-journal 25165824 Apr  8 17:22 system.journal
-rw-r----- 1 root systemd-journal 16777216 Apr  1 14:13 sys...@000512aa53773236-0f72287c155348a3.journal~
-rw-r----- 1 root systemd-journal  8388608 Apr  1 14:24 sys...@000512aa78f616fc-12f45fdc0951f873.journal~
-rw-r----- 1 root systemd-journal  8388608 Apr  1 14:44 sys...@000512aabfe309ae-fa02e2d324d3056a.journal~

Please help!

Tobia

Tobia Conforto

unread,
Apr 8, 2015, 2:17:56 PM4/8/15
to coreo...@googlegroups.com
More info: running dmesg shows hundreds of these lines:

[611929.600167] systemd-journald[26050]: Failed to write entry (10 items, 284 bytes), ignoring: Cannot assign requested address
[611929.613331] systemd-journald[26050]: Failed to write entry (10 items, 332 bytes), ignoring: Cannot assign requested address
[611929.626620] systemd-journald[26050]: Failed to write entry (22 items, 698 bytes), ignoring: Cannot assign requested address
[611929.639864] systemd-journald[26050]: Failed to write entry (23 items, 683 bytes), ignoring: Cannot assign requested address

Why was journald trying to get an address assigned and why was it failing? Netstat only shows it being connected to 4 unix domain sockets.

Anyways, a pkill -USR2 systemd-journald seems to have fixed the issue, for now. This made it finally rotate the corrupted file and start logging again.

I'm still astonished at the lack of reliability of this tool. What can I do to prevent this mess from happening again?

What are CoreOS users doing to safeguard their logs? What is the best way to keep the logs generated by production containers, for example?

Tobia

Alexei Kounine

unread,
Apr 10, 2015, 4:31:19 AM4/10/15
to coreo...@googlegroups.com
I have the exact same problem. It never happened before (I am running Stable v607 on a 5 node cluster). I got this problem on 2 machines.
Restarting journalctl doesn't work (sudo systemctl restart systemd-journald)

I can see these logs in dmesg too:
[ 3726.267237] systemd-journald[408]: Failed to write entry (22 items, 708 bytes), ignoring: Cannot assign requested address
[ 3727.011141] systemd-journald[408]: Failed to write entry (23 items, 693 bytes), ignoring: Cannot assign requested address
[ 3727.308361] systemd-journald[408]: Failed to write entry (18 items, 520 bytes), ignoring: Cannot assign requested address

Could you please help me with identifying where the problem comes from?

Alexei Kounine

unread,
May 2, 2015, 7:28:13 AM5/2/15
to coreo...@googlegroups.com
Does anyone have the same problem?

I upgraded to 633.1.0 and still get the same journal corruption problem from time to time (i run GCE instances).

The symptoms are simple:
- journalctl doesn't refresh anymore
- dmesg shows thousands of those lines:
[83961.009470] systemd-journald[393]: Failed to write entry (18 items, 822 bytes), ignoring: Cannot assign requested address
- journalctl --verify shows a corruption error

The only thing i can do when it happens is to make force a log rotation

$ ps -A | grep journal
15617 ?        00:00:01 systemd-journal

$ sudo  kill -SIGUSR2 15617

I thought the 2 first cases i had were isolated, but it seems to happen every now and then.

Thanks for your help,

Alexei

Tobia Conforto

unread,
May 4, 2015, 4:16:50 AM5/4/15
to coreo...@googlegroups.com
Yes, it keeps happening to me too every once in a while.
Same description and actions as in your case.
I have CoreOS (stable channel) installed on a bare metal server at Hetzner.

This is not a small issue, the log is quite an important part of a system.
Is there any test I can run to help pinpoint the issue?

Tobia

Matthew Henniges

unread,
May 28, 2015, 12:56:47 PM5/28/15
to coreo...@googlegroups.com
I've faced this as well, I believe at least in my case it has to do with the unicode non-breaking-space character being in the log line.

Salvatore Poliandro

unread,
May 29, 2015, 11:29:53 AM5/29/15
to coreo...@googlegroups.com
there is an open bug report for this : https://github.com/coreos/bugs/issues/322 

I have a timer set that reloads journald ever hour, its hacky and ends up trimming logs, but it keeps the load under control.



Salvatore Poliandro III
Operations Lead
http://keep.com

(o) 857-366-9345

(c) 551-427-9904


--
You received this message because you are subscribed to the Google Groups "CoreOS User" group.
To unsubscribe from this group and stop receiving emails from it, send an email to coreos-user...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply all
Reply to author
Forward
0 new messages