How to handle systems that don't have RTC battery

49 views
Skip to first unread message

Steven Lu

unread,
Jul 19, 2023, 2:58:32 AM7/19/23
to lnav
I'm working with hundreds of embedded linux devices trying to make sense of runtime behavior where we have insane quantities of log output (lnav is phenomenal by the way at consuming this much content without getting bogged down), and for which whenever the devices reboot they will get their clocks reset to january 28 2018 and syslog will report this time until NTP kicks in to fix the time. 

What this means is when I load any log files into lnav, it is being very smart about applying the constraint that all lines in a given log file are chronological. Indeed this is the case, but let's say syslog.34.gz contains 1000 lines on July 18 2023, then it reboots and shows 400 lines showing Jan 28 2018, then it gets corrected back to July 18 2023 for 300 more lines, then reboots again generating 500 lines on Jan 28 and 6000 lines on July 18, then what lnav does is work backward putting 6000 lines correctly for july 18 2023, 500 lines in jan 28 2023, 300 lines in july 18 2022, 400 lines in jan 2022, and 1000 lines in july 2021. 

As you might imagine once I load in a large number of rotated syslog files i will get a jumble of interlaced content that doesn't present anything useful. One thing to note is that if I could filter out all jan 28 content before feeding it into lnav, all the logfiles cover a period of like 5 days and it would certainly do the right thing. 

So I'm going to start making a script to try to accurately determine when the time went back to jan 28 and when NTP corrects the time, and .... I should be able to backfill in some sort of reasonable computed times actually. my question is: Does something exist built into lnav that might help me control the time guessing/handling behavior? I know it's a long shot. 

Thanks

lnav support

unread,
Jul 19, 2023, 9:22:41 AM7/19/23
to Steven Lu, lnav

Quick question, do these syslog message have the year in their timestamp?

Thanks,

Tim

------- Original Message -------
--
You received this message because you are subscribed to the Google Groups "lnav" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lnav+uns...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/lnav/dcec24b2-a83d-4381-b616-bbf93c7e5e5dn%40googlegroups.com.

Steven Lu

unread,
Jul 19, 2023, 7:26:36 PM7/19/23
to lnav support, lnav
They do, since it's a grab bag of all the services that log on a linux system. but as far as the application is concerned, only something like 1% of the actual log entries have the year in them. But there will be periodic messages that contain a date. It's not clear which exact lines have date/times actually parsed and understood by lnav, I'm curious if there is a way to find that out also. 

Steven Lu

unread,
Jul 19, 2023, 8:53:13 PM7/19/23
to lnav support, lnav
I just thought of another basic thing that I did not consider until now. I can take advantage of the constraint that the order is preserved across log files by simply taking my collection of log files, catting them in order (though their mostly being gzipped will make this slightly tricky), and feeding it to lnav as a single file or in the stdin, then it will address the problem I stated earlier without having to doctor any timestamps. 

Steven Lu

unread,
Jul 19, 2023, 9:45:56 PM7/19/23
to lnav support, lnav
Another burning question! Why when i type / to search is the first character entered for the search term hidden? There is a search preview (and also the search result) and both show that the initial character is not being lost. It's just not echoed out. Why is this? 

lnav support

unread,
Jul 20, 2023, 11:45:48 AM7/20/23
to Steven Lu, lnav

Hmm, I have not seen this search behavior before.  Can you record a
gif of it happening and include it here or as an issue on the github
repo?

What version are you using?  The latest is v0.11.2.

I'm not sure what to do about the time issue at the moment.  I'll
keep thinking about it.

thanks,

Tim

------- Original Message -------

Steven Lu

unread,
Jul 20, 2023, 2:23:01 PM7/20/23
to lnav support, lnav
The weird search field behavior please find attached. Actually it's the first two chars (/ and first char) that are missing... 

I noticed that it's fine and does not do this in Linux, it shows both a slash and the first char like you'd expect. 

Separately i'm finding loading over around 5 million log lines reliably crashes lnav under macos and linux. It's scaling quite well up till this point. Have you done testing around this? Maybe it's just a simple buffer size issue.  Attached log and have many more of those too.
Screen Recording 2023-07-20 at 14.17.38 480p.mov
crash-2023-07-20-14-15-01.74176.log

Steven Lu

unread,
Jul 20, 2023, 10:07:05 PM7/20/23
to lnav support, lnav
Sorry, another question: is it possible to search across line breaks? I'd like to find all the cold-reboot situations, so for example I need to search for July 20 on one line followed by pretty much anything and then Jan on the next line. It's hard to figure out, google shows me this topic though which seems to indicate the architecture of regex matching isn't suitable for the need. 

Steven Lu

unread,
Jul 20, 2023, 10:07:39 PM7/20/23
to lnav support, lnav

lnav support

unread,
Jul 21, 2023, 1:38:49 PM7/21/23
to steve...@gmail.com, ln...@googlegroups.com

Searching across multiple lines in the search prompt is not supported.  But, this is probably better solved through the SQL interface.  You can write a query to tag messages that match a condition.  I’m traveling now, so I can’t write an example right now.

Can you send some sample log messages when the ntp kicks in.  Maybe I can add some code to detect that and correct the timestamps.

Tim

Steven Lu

unread,
Jul 22, 2023, 1:20:43 PM7/22/23
to lnav support, ln...@googlegroups.com
These are Nvidia Tegra embedded Linux devices. It's a well-known issue that they have a tendency to reset to some arbitrary past time on cold boot without a battery for the clock.
On these systems, with the way we've configured them, the time goes back to normal with: 

Jul 20 13:49:25 localhost chronyd[6470]: Selected source 10.83.48.1
Jul 20 13:49:25 localhost chronyd[6470]: System clock wrong by 172705847.272658 seconds, adjustment started
Jul 20 13:49:25 localhost chronyd[6470]: System clock was stepped by 172705847.272658 seconds


I don't have any examples from my desktop linux system which would use systemd-timesyncd, NTP will be very quietly doing its thing behind the scenes adjusting system time by fractions of a second typically. 

Thanks!

P.S. I am recommending to my colleagues that when browsing log files larger than 5 million lines of log to do so with less instead of lnav, because lnav will crash when exceeding that much log volume. Is this a known issue?

lnav support

unread,
Jul 24, 2023, 1:48:01 AM7/24/23
to Steven Lu, ln...@googlegroups.com


------- Original Message -------

On Saturday, July 22nd, 2023 at 10:20 AM, Steven Lu <steve...@gmail.com> wrote:

These are Nvidia Tegra embedded Linux devices. It's a well-known issue that they have a tendency to reset to some arbitrary past time on cold boot without a battery for the clock.
On these systems, with the way we've configured them, the time goes back to normal with:

Jul 20 13:49:25 localhost chronyd[6470]: Selected source 10.83.48.1
Jul 20 13:49:25 localhost chronyd[6470]: System clock wrong by 172705847.272658 seconds, adjustment started
Jul 20 13:49:25 localhost chronyd[6470]: System clock was stepped by 172705847.272658 seconds

I think this is something that I can work with... I'll explore a bit.

I don't have any examples from my desktop linux system which would use systemd-timesyncd, NTP will be very quietly doing its thing behind the scenes adjusting system time by fractions of a second typically.

Thanks!

P.S. I am recommending to my colleagues that when browsing log files larger than 5 million lines of log to do so with less instead of lnav, because lnav will crash when exceeding that much log volume. Is this a known issue?

When lnav sees the January timestamp, it thinks a year has passed.  So, it goes back
through the previous log messages and updates the timestamps to make them
from the previous year.  Looking at the crash log you sent, I think there were quite
a lot of these rollovers, to the point where it went before the unix epoch and lnav
got really confused.  Not totally sure, this is the problem, but I think fixing the main
issue will fix this.  (I thought I had put a safeguard for this case, but that might not
be working totally correctly)

For reference, I have loaded multi-million lines of logs into lnav without a problem.
The hardcoded limit on the number of lines per file is 256 million.

tim

Steven Lu

unread,
Jul 24, 2023, 7:21:09 PM7/24/23
to lnav support, ln...@googlegroups.com
I consistently get crashes beyond 5M lines but I can def do a test at some point to see if this is happening on ANY file or just on these particular kinds of logs i'm dealing with but the behavior is very consistent. I can open up 25 million lines in `less` for example without issue. 

I don't think I had any that had so many of these events (more than 40) to go that far back but it's certainly plausible! 

lnav support

unread,
Jul 26, 2023, 7:08:09 PM7/26/23
to Steven Lu, ln...@googlegroups.com

I think I found the code path that was causing the assert to fire.  Can
you try one of the artifacts attached to this build to see if it works
for you:


Thanks,

Tim
Reply all
Reply to author
Forward
0 new messages