A couple of years ago I wrote a Python script to parse Java thread dumps. I have been converting that script into Go.
I noticed that for one particular regex, Python is at least five to six times faster than Go on each line of the file (under 5 ms vs. 30 ms). This
really adds up: the total runtime is 3 seconds for Python vs. 24 seconds for Go.
Python regex:
'\d{4}-\d\d-\d\d\s\d\d:\d\d:\d\d,\d{3}\s\w{4,5}\s.+?\s-\s(.*)'
Go regex:
"\\d{4}-\\d\\d-\\d\\d\\s\\d\\d:\\d\\d:\\d\\d,\\d{3}\\s\\w{4,5}\\s.+?\\s-\\s(.*)"
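As an aside, the Go pattern can be written as a raw string literal (backticks), which avoids doubling every backslash. A minimal sketch confirming that both spellings yield the same pattern (the names logPattern and escaped are illustrative):

```go
package main

import (
	"fmt"
	"regexp"
)

// The pattern as a raw string literal: backslashes are taken literally,
// so they do not need to be doubled as in an interpreted "..." string.
var logPattern = regexp.MustCompile(`\d{4}-\d\d-\d\d\s\d\d:\d\d:\d\d,\d{3}\s\w{4,5}\s.+?\s-\s(.*)`)

// The double-escaped form from the question, for comparison.
const escaped = "\\d{4}-\\d\\d-\\d\\d\\s\\d\\d:\\d\\d:\\d\\d,\\d{3}\\s\\w{4,5}\\s.+?\\s-\\s(.*)"

func main() {
	// String() returns the source text the regexp was compiled from.
	fmt.Println(logPattern.String() == escaped) // prints: true
}
```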
Sample data:
2012-06-10 12:54:27,162 INFO CORBA.Daemon.App.out - java.lang.Thread.State: WAITING (on object monitor)
What I want is the text to the right of the dash: java.lang.Thread.State: WAITING (on object monitor).
Both are pre-compiled, so that's not the issue.
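To time the match on its own, separate from file I/O, one option is testing.Benchmark from the standard library. This is a sketch that assumes the sample line above is representative of the file; benchmarkSubmatch is a hypothetical name, and the numbers will vary by machine and Go version.

```go
package main

import (
	"fmt"
	"regexp"
	"testing"
)

// Compiled once at package level, so only the matching is timed.
var logPattern = regexp.MustCompile(`\d{4}-\d\d-\d\d\s\d\d:\d\d:\d\d,\d{3}\s\w{4,5}\s.+?\s-\s(.*)`)

const sampleLine = "2012-06-10 12:54:27,162 INFO CORBA.Daemon.App.out - java.lang.Thread.State: WAITING (on object monitor)"

// benchmarkSubmatch runs one FindStringSubmatch call per iteration.
func benchmarkSubmatch(b *testing.B) {
	for i := 0; i < b.N; i++ {
		logPattern.FindStringSubmatch(sampleLine)
	}
}

func main() {
	// testing.Init registers the testing flags; it is needed when calling
	// testing.Benchmark from an ordinary program rather than via "go test".
	testing.Init()
	res := testing.Benchmark(benchmarkSubmatch)
	fmt.Printf("%d ns/op over %d iterations\n", res.NsPerOp(), res.N)
}
```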
I'm calling this on the compiled pattern:
logPattern.FindStringSubmatch(line)
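For reference, a minimal sketch of how FindStringSubmatch exposes the capture group: index 0 is the whole match and index 1 is the first parenthesized group. extractMessage is a hypothetical helper name, not taken from the original script.

```go
package main

import (
	"fmt"
	"regexp"
)

var logPattern = regexp.MustCompile(`\d{4}-\d\d-\d\d\s\d\d:\d\d:\d\d,\d{3}\s\w{4,5}\s.+?\s-\s(.*)`)

// extractMessage returns capture group 1 (the text after " - ")
// and whether the line matched at all.
func extractMessage(line string) (string, bool) {
	m := logPattern.FindStringSubmatch(line)
	if m == nil {
		return "", false // no match: FindStringSubmatch returns nil
	}
	return m[1], true
}

func main() {
	line := "2012-06-10 12:54:27,162 INFO CORBA.Daemon.App.out - java.lang.Thread.State: WAITING (on object monitor)"
	msg, ok := extractMessage(line)
	fmt.Println(ok, msg) // prints: true java.lang.Thread.State: WAITING (on object monitor)
}
```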
I know I could use strings.Split, and because of the performance difference I have in fact started using Split together with a simpler regex match check. But that's not the point of this post:
I want to understand why Go is slower here, so I can either fix my regex or avoid a pitfall I don't understand yet.
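A minimal sketch of the split-based workaround described above, under stated assumptions: the "simpler regex" here is a hypothetical anchored check for the timestamp prefix, and strings.Cut (Go 1.18+) stands in for strings.Split. It is looser than the original pattern, since it does not check the log level or that a logger name precedes the dash.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// prefixPattern is an assumed "simpler regex": it only checks that the line
// starts with a timestamp, with no lazy quantifier and no capture group.
var prefixPattern = regexp.MustCompile(`^\d{4}-\d\d-\d\d\s\d\d:\d\d:\d\d,\d{3}\s`)

// messageAfterDash (hypothetical helper) returns everything after the first
// " - " on a line that starts with a timestamp.
func messageAfterDash(line string) (string, bool) {
	if !prefixPattern.MatchString(line) {
		return "", false
	}
	// strings.Cut splits around the first occurrence only, so any
	// " - " inside the message itself is preserved.
	_, after, found := strings.Cut(line, " - ")
	return after, found
}

func main() {
	line := "2012-06-10 12:54:27,162 INFO CORBA.Daemon.App.out - java.lang.Thread.State: WAITING (on object monitor)"
	msg, ok := messageAfterDash(line)
	fmt.Println(ok, msg)
}
```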
Any suggestions?