Parsing a time as a prefix of a larger string


ben...@gmail.com

Mar 15, 2022, 11:35:06 PM
to golang-nuts
We're making a log processing program that needs to parse times from the prefix of a larger string, in this case in a log line such as:

2006-01-02 15:04:05 INFO this is a log message

We need to parse the "2006-01-02 15:04:05" part as a timestamp. Unfortunately, time.Parse always returns an error if there's extra text after the timestamp.

If the timestamp were in a fixed format (like the one above) we could just hard-code it to grab the first two fields, or even the first N characters. However, in our case the format of the timestamp is user-controlled, so we're planning to let the user specify a custom time.Parse layout for their logs.

We can hack around this with the following "introspect the error message" workaround (see runnable code at https://go.dev/play/p/CWuSk0te7-p):

        t, err := time.Parse(layout, line)
        if e, ok := err.(*time.ParseError); ok && strings.HasPrefix(e.Message, ": extra text: ") {
                prefix := line[:len(line)-len(e.ValueElem)]
                t, _ = time.Parse(layout, prefix) // parsing just the prefix should succeed
                err = nil
        }
        // use t, err

This works, and it seems unlikely the Go team will change that "extra text" message, but that's certainly not guaranteed, and it's bad form to rely on the formatting of error messages. It also means we need to call time.Parse again, when it's already done the work once.

Does anyone have suggestions for how to best parse "time prefixes"? Currently we're thinking of either the above hack, or copying the time.Parse code into our repo and modifying it to suit (there's a clear place that returns "extra text", so that would be annoying, but easy enough to do).

This seems like it would be a useful feature for others, and I'm surprised there are not open issues about this already (I couldn't find any). Maybe I could open an issue to suggest that ParseError could have a new method ExtraText as follows:

// ExtraText returns the extra text after the parsed time value in the original string.
// If this error is not an "extra text" error, it returns "" and the zero Time.
func (*ParseError) ExtraText() (extra string, parsed Time)

But of course, even if that method were added, it would have to wait till at least Go 1.19, so for our project we need a way forward either way.

Thanks,
Ben

peterGo

Mar 16, 2022, 12:12:47 PM
to golang-nuts
On Tuesday, March 15, 2022 at 11:35:06 PM UTC-4 Ben wrote:
> We're making a log processing program that needs to parse times from the prefix of a larger string, in this case in a log line such as:
>
> 2006-01-02 15:04:05 INFO this is a log message
>
> We need to parse the "2006-01-02 15:04:05" part as a timestamp. Unfortunately, time.Parse always returns an error if there's extra text after the timestamp.
>
> If the timestamp were in a fixed format (like the one above) we could just hard-code it to grab the first two fields, or even the first N characters. However, in our case the format of the timestamp is user-controlled, so we're planning to let the user specify a custom time.Parse layout for their logs.
>
> Ben

Ben,
 
How does the user control the format of the timestamp? How do you get the time.Parse layout?

Peter
 

Sean Liao

Mar 16, 2022, 5:57:05 PM
to golang-nuts
My understanding is that most things have the user specify a top level field parser (json, regex, csv, format string, etc), and time parsing is only done on an extracted field.

ben...@gmail.com

Mar 16, 2022, 8:44:07 PM
to golang-nuts
> How does the user control the format of the timestamp? How do you get the time.Parse layout?

The project is a lightweight service manager, so the user "controls" the format of the timestamp based on the service they're running. For example, if they're running nginx, it will output logs (and timestamps) in a certain format, and so on.

The service manager has configuration per service, and that's where we're going to specify the time.Parse (or perhaps regex) layout.

-Ben
 

Rob Pike

Mar 16, 2022, 10:03:08 PM
to ben...@gmail.com, golang-nuts
I would approach the problem a different way and ask the question, how
do I split the string to separate the time? The time parser doesn't
have to be the one to do this. For instance, you could require a
marker (if the word INFO or its substitute isn't already one), such as
a spaced hyphen:

2006-01-02 15:04:05 - INFO this is a log message

-rob

ben...@gmail.com

Mar 18, 2022, 7:59:39 PM
to golang-nuts
Thanks for the feedback, folks. I think we'll end up using an explicit "log-trim" regex for this.

-Ben